2026-02-21T08:04:42.6048060Z Current runner version: '2.331.0' 2026-02-21T08:04:42.6053604Z Runner name: 'i-0846a4fea0bb8a298-1002' 2026-02-21T08:04:42.6054372Z Runner group name: 'default' 2026-02-21T08:04:42.6055275Z Machine name: 'ebfe9a6a9984' 2026-02-21T08:04:42.6057957Z ##[group]GITHUB_TOKEN Permissions 2026-02-21T08:04:42.6059909Z Contents: read 2026-02-21T08:04:42.6060424Z Metadata: read 2026-02-21T08:04:42.6061032Z ##[endgroup] 2026-02-21T08:04:42.6062900Z Secret source: Actions 2026-02-21T08:04:42.6063575Z Prepare workflow directory 2026-02-21T08:04:42.6597101Z Prepare all required actions 2026-02-21T08:04:42.6631489Z Getting action download info 2026-02-21T08:04:43.0108885Z Download action repository 'actions/checkout@v6' (SHA:de0fac2e4500dabe0009e67214ff5f5447ce83dd) 2026-02-21T08:04:43.3233026Z Download action repository 'actions/setup-python@v6' (SHA:a309ff8b426b58ec0e2a45f0f869d46889d02405) 2026-02-21T08:04:43.6843934Z Download action repository 'astral-sh/setup-uv@v7' (SHA:eac588ad8def6316056a12d4907a9d4d84ff7a3b) 2026-02-21T08:04:44.0756729Z Download action repository 'pytorch/test-infra@main' (SHA:bb8f04ff3961233c844fde6533c7c6c5f0857909) 2026-02-21T08:04:46.6871344Z Download action repository 'actions/upload-artifact@v6' (SHA:b7c566a772e6b6bfb58ed0dc250532a479d7789f) 2026-02-21T08:04:47.1678117Z Getting action download info 2026-02-21T08:04:47.3323226Z Uses: pytorch/helion/.github/workflows/benchmark.yml@refs/heads/main (874a7d0cadab18218a84ad3579d329dc95c51820) 2026-02-21T08:04:47.3327474Z ##[group] Inputs 2026-02-21T08:04:47.3327757Z runner: linux.aws.h100 2026-02-21T08:04:47.3328014Z python-version: 3.12 2026-02-21T08:04:47.3328274Z image: nvidia/cuda:12.8.1-devel-ubuntu24.04 2026-02-21T08:04:47.3328560Z runtime-version: cu128 2026-02-21T08:04:47.3328824Z container-options: --gpus all 2026-02-21T08:04:47.3329073Z alias: h100 2026-02-21T08:04:47.3329282Z kernels: int4_gemm 2026-02-21T08:04:47.3329494Z env-vars: 2026-02-21T08:04:47.3329690Z 
custom-args: 2026-02-21T08:04:47.3330267Z run_h100: true 2026-02-21T08:04:47.3330517Z run_b200: true 2026-02-21T08:04:47.3330719Z run_mi325x: true 2026-02-21T08:04:47.3330934Z ##[endgroup] 2026-02-21T08:04:47.3331261Z Complete job name: run-h100 (int4_gemm) / benchmark-cu128-int4_gemm-py3.12-h100 2026-02-21T08:04:47.4347536Z ##[group]Checking docker version 2026-02-21T08:04:47.4359190Z ##[command]/usr/bin/docker version --format '{{.Server.APIVersion}}' 2026-02-21T08:04:47.4572510Z '1.53' 2026-02-21T08:04:47.4590813Z Docker daemon API version: '1.53' 2026-02-21T08:04:47.4591292Z ##[command]/usr/bin/docker version --format '{{.Client.APIVersion}}' 2026-02-21T08:04:47.4794605Z '1.52' 2026-02-21T08:04:47.4815701Z Docker client API version: '1.52' 2026-02-21T08:04:47.4820361Z ##[endgroup] 2026-02-21T08:04:47.4822953Z ##[group]Clean up resources from previous jobs 2026-02-21T08:04:47.4828108Z ##[command]/usr/bin/docker ps --all --quiet --no-trunc --filter "label=60cdf3" 2026-02-21T08:04:47.5037743Z ##[command]/usr/bin/docker network prune --force --filter "label=60cdf3" 2026-02-21T08:04:47.5179435Z ##[endgroup] 2026-02-21T08:04:47.5179726Z ##[group]Create local container network 2026-02-21T08:04:47.5188549Z ##[command]/usr/bin/docker network create --label 60cdf3 github_network_3a5cb79b050545ab97b2786230e17941 2026-02-21T08:04:47.5696904Z 6e0b70498d91b4fabaa1a94b87781b8cdbf7871c9bd6c11d8cce903d575a3ba1 2026-02-21T08:04:47.5722121Z ##[endgroup] 2026-02-21T08:04:47.5749217Z ##[group]Starting job container 2026-02-21T08:04:47.5777830Z ##[command]/usr/bin/docker pull nvidia/cuda:12.8.1-devel-ubuntu24.04 2026-02-21T08:04:48.0485440Z 12.8.1-devel-ubuntu24.04: Pulling from nvidia/cuda 2026-02-21T08:04:48.2293892Z 73389fbd088f: Pulling fs layer 2026-02-21T08:04:48.2294241Z 93e2721b7ddd: Pulling fs layer 2026-02-21T08:04:48.2294505Z 7209097bfb98: Pulling fs layer 2026-02-21T08:04:48.2294769Z cbb9175a9bc5: Pulling fs layer 2026-02-21T08:04:48.2295027Z abf026459f52: Pulling fs 
layer 2026-02-21T08:04:48.2295271Z a102f36d092c: Pulling fs layer 2026-02-21T08:04:48.2296832Z 545a3ada5b6b: Pulling fs layer 2026-02-21T08:04:48.2297643Z 5a7813e071bf: Pulling fs layer 2026-02-21T08:04:48.2297914Z 05ec76e31584: Pulling fs layer 2026-02-21T08:04:48.2298165Z 3d6ab8c799cd: Pulling fs layer 2026-02-21T08:04:48.2298431Z 398182656c47: Pulling fs layer 2026-02-21T08:04:48.3932430Z 93e2721b7ddd: Download complete 2026-02-21T08:04:48.4934083Z 3d6ab8c799cd: Download complete 2026-02-21T08:04:48.4935268Z 73389fbd088f: Download complete 2026-02-21T08:04:48.4938553Z 545a3ada5b6b: Download complete 2026-02-21T08:04:48.4941154Z a102f36d092c: Download complete 2026-02-21T08:04:48.4943699Z 398182656c47: Download complete 2026-02-21T08:04:48.4946076Z 7209097bfb98: Download complete 2026-02-21T08:04:48.8934989Z 5a7813e071bf: Download complete 2026-02-21T08:04:49.7352404Z 05ec76e31584: Download complete 2026-02-21T08:04:55.2549371Z 5a7813e071bf: Pull complete 2026-02-21T08:07:20.5934755Z cbb9175a9bc5: Download complete 2026-02-21T08:08:03.9928714Z abf026459f52: Download complete 2026-02-21T08:08:08.4941678Z a102f36d092c: Pull complete 2026-02-21T08:08:16.2940274Z 05ec76e31584: Pull complete 2026-02-21T08:08:16.3934442Z 73389fbd088f: Pull complete 2026-02-21T08:08:16.3939225Z 398182656c47: Pull complete 2026-02-21T08:10:26.0934399Z 3d6ab8c799cd: Pull complete 2026-02-21T08:10:26.0952586Z cbb9175a9bc5: Pull complete 2026-02-21T08:10:26.1933574Z 545a3ada5b6b: Pull complete 2026-02-21T08:10:26.1936616Z 7209097bfb98: Pull complete 2026-02-21T08:14:35.2940469Z abf026459f52: Pull complete 2026-02-21T08:14:35.3829179Z 93e2721b7ddd: Pull complete 2026-02-21T08:14:35.3829625Z Digest: sha256:520292dbb4f755fd360766059e62956e9379485d9e073bbd2f6e3c20c270ed66 2026-02-21T08:14:35.3848699Z Status: Downloaded newer image for nvidia/cuda:12.8.1-devel-ubuntu24.04 2026-02-21T08:14:35.3849172Z docker.io/nvidia/cuda:12.8.1-devel-ubuntu24.04 2026-02-21T08:14:35.3948390Z 
##[command]/usr/bin/docker create --name a713bb541b394d40a9e372a466ae966c_nvidiacuda1281develubuntu2404_ca6977 --label 60cdf3 --workdir /__w/helion/helion --network github_network_3a5cb79b050545ab97b2786230e17941 --gpus all -e "HOME=/github/home" -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/bob/_work":"/__w" -v "/home/bob/externals":"/__e":ro -v "/home/bob/_work/_temp":"/__w/_temp" -v "/home/bob/_work/_actions":"/__w/_actions" -v "/home/bob/_work/_tool":"/__w/_tool" -v "/home/bob/_work/_temp/_github_home":"/github/home" -v "/home/bob/_work/_temp/_github_workflow":"/github/workflow" --entrypoint "tail" nvidia/cuda:12.8.1-devel-ubuntu24.04 "-f" "/dev/null" 2026-02-21T08:14:35.4979377Z dfaa2b0d874681230c8443d6e8ad7743474c44dc43659fe49fb8f6fcfb8fa204 2026-02-21T08:14:35.5004861Z ##[command]/usr/bin/docker start dfaa2b0d874681230c8443d6e8ad7743474c44dc43659fe49fb8f6fcfb8fa204 2026-02-21T08:14:35.9667572Z dfaa2b0d874681230c8443d6e8ad7743474c44dc43659fe49fb8f6fcfb8fa204 2026-02-21T08:14:35.9690796Z ##[command]/usr/bin/docker ps --all --filter id=dfaa2b0d874681230c8443d6e8ad7743474c44dc43659fe49fb8f6fcfb8fa204 --filter status=running --no-trunc --format "{{.ID}} {{.Status}}" 2026-02-21T08:14:35.9850511Z dfaa2b0d874681230c8443d6e8ad7743474c44dc43659fe49fb8f6fcfb8fa204 Up Less than a second 2026-02-21T08:14:35.9870583Z ##[command]/usr/bin/docker inspect --format "{{range .Config.Env}}{{println .}}{{end}}" dfaa2b0d874681230c8443d6e8ad7743474c44dc43659fe49fb8f6fcfb8fa204 2026-02-21T08:14:35.9991468Z HOME=/github/home 2026-02-21T08:14:35.9991679Z GITHUB_ACTIONS=true 2026-02-21T08:14:35.9992071Z CI=true 2026-02-21T08:14:35.9992398Z PATH=/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2026-02-21T08:14:35.9992837Z NVARCH=x86_64 2026-02-21T08:14:36.0000295Z NVIDIA_REQUIRE_CUDA=cuda>=12.8 brand=unknown,driver>=470,driver<471 brand=grid,driver>=470,driver<471 brand=tesla,driver>=470,driver<471 
brand=nvidia,driver>=470,driver<471 brand=quadro,driver>=470,driver<471 brand=quadrortx,driver>=470,driver<471 brand=nvidiartx,driver>=470,driver<471 brand=vapps,driver>=470,driver<471 brand=vpc,driver>=470,driver<471 brand=vcs,driver>=470,driver<471 brand=vws,driver>=470,driver<471 brand=cloudgaming,driver>=470,driver<471 brand=unknown,driver>=535,driver<536 brand=grid,driver>=535,driver<536 brand=tesla,driver>=535,driver<536 brand=nvidia,driver>=535,driver<536 brand=quadro,driver>=535,driver<536 brand=quadrortx,driver>=535,driver<536 brand=nvidiartx,driver>=535,driver<536 brand=vapps,driver>=535,driver<536 brand=vpc,driver>=535,driver<536 brand=vcs,driver>=535,driver<536 brand=vws,driver>=535,driver<536 brand=cloudgaming,driver>=535,driver<536 brand=unknown,driver>=550,driver<551 brand=grid,driver>=550,driver<551 brand=tesla,driver>=550,driver<551 brand=nvidia,driver>=550,driver<551 brand=quadro,driver>=550,driver<551 brand=quadrortx,driver>=550,driver<551 brand=nvidiartx,driver>=550,driver<551 brand=vapps,driver>=550,driver<551 brand=vpc,driver>=550,driver<551 brand=vcs,driver>=550,driver<551 brand=vws,driver>=550,driver<551 brand=cloudgaming,driver>=550,driver<551 brand=unknown,driver>=560,driver<561 brand=grid,driver>=560,driver<561 brand=tesla,driver>=560,driver<561 brand=nvidia,driver>=560,driver<561 brand=quadro,driver>=560,driver<561 brand=quadrortx,driver>=560,driver<561 brand=nvidiartx,driver>=560,driver<561 brand=vapps,driver>=560,driver<561 brand=vpc,driver>=560,driver<561 brand=vcs,driver>=560,driver<561 brand=vws,driver>=560,driver<561 brand=cloudgaming,driver>=560,driver<561 brand=unknown,driver>=565,driver<566 brand=grid,driver>=565,driver<566 brand=tesla,driver>=565,driver<566 brand=nvidia,driver>=565,driver<566 brand=quadro,driver>=565,driver<566 brand=quadrortx,driver>=565,driver<566 brand=nvidiartx,driver>=565,driver<566 brand=vapps,driver>=565,driver<566 brand=vpc,driver>=565,driver<566 brand=vcs,driver>=565,driver<566 
brand=vws,driver>=565,driver<566 brand=cloudgaming,driver>=565,driver<566 2026-02-21T08:14:36.0007486Z NV_CUDA_CUDART_VERSION=12.8.90-1 2026-02-21T08:14:36.0007700Z CUDA_VERSION=12.8.1 2026-02-21T08:14:36.0007898Z LD_LIBRARY_PATH=/usr/local/cuda/lib64 2026-02-21T08:14:36.0008126Z NVIDIA_VISIBLE_DEVICES=all 2026-02-21T08:14:36.0008350Z NVIDIA_DRIVER_CAPABILITIES=compute,utility 2026-02-21T08:14:36.0008587Z NV_CUDA_LIB_VERSION=12.8.1-1 2026-02-21T08:14:36.0008781Z NV_NVTX_VERSION=12.8.90-1 2026-02-21T08:14:36.0009152Z NV_LIBNPP_VERSION=12.3.3.100-1 2026-02-21T08:14:36.0009417Z NV_LIBNPP_PACKAGE=libnpp-12-8=12.3.3.100-1 2026-02-21T08:14:36.0009779Z NV_LIBCUSPARSE_VERSION=12.5.8.93-1 2026-02-21T08:14:36.0010070Z NV_LIBCUBLAS_PACKAGE_NAME=libcublas-12-8 2026-02-21T08:14:36.0010299Z NV_LIBCUBLAS_VERSION=12.8.4.1-1 2026-02-21T08:14:36.0010514Z NV_LIBCUBLAS_PACKAGE=libcublas-12-8=12.8.4.1-1 2026-02-21T08:14:36.0010757Z NV_LIBNCCL_PACKAGE_NAME=libnccl2 2026-02-21T08:14:36.0010957Z NV_LIBNCCL_PACKAGE_VERSION=2.25.1-1 2026-02-21T08:14:36.0011172Z NCCL_VERSION=2.25.1-1 2026-02-21T08:14:36.0011356Z NV_LIBNCCL_PACKAGE=libnccl2=2.25.1-1+cuda12.8 2026-02-21T08:14:36.0011585Z NVIDIA_PRODUCT_NAME=CUDA 2026-02-21T08:14:36.0011769Z NV_CUDA_CUDART_DEV_VERSION=12.8.90-1 2026-02-21T08:14:36.0011985Z NV_NVML_DEV_VERSION=12.8.90-1 2026-02-21T08:14:36.0012189Z NV_LIBCUSPARSE_DEV_VERSION=12.5.8.93-1 2026-02-21T08:14:36.0012412Z NV_LIBNPP_DEV_VERSION=12.3.3.100-1 2026-02-21T08:14:36.0012646Z NV_LIBNPP_DEV_PACKAGE=libnpp-dev-12-8=12.3.3.100-1 2026-02-21T08:14:36.0013050Z NV_LIBCUBLAS_DEV_VERSION=12.8.4.1-1 2026-02-21T08:14:36.0013317Z NV_LIBCUBLAS_DEV_PACKAGE_NAME=libcublas-dev-12-8 2026-02-21T08:14:36.0013599Z NV_LIBCUBLAS_DEV_PACKAGE=libcublas-dev-12-8=12.8.4.1-1 2026-02-21T08:14:36.0013864Z NV_CUDA_NSIGHT_COMPUTE_VERSION=12.8.1-1 2026-02-21T08:14:36.0014162Z NV_CUDA_NSIGHT_COMPUTE_DEV_PACKAGE=cuda-nsight-compute-12-8=12.8.1-1 2026-02-21T08:14:36.0014463Z NV_NVPROF_VERSION=12.8.90-1 
2026-02-21T08:14:36.0014679Z NV_NVPROF_DEV_PACKAGE=cuda-nvprof-12-8=12.8.90-1 2026-02-21T08:14:36.0014922Z NV_LIBNCCL_DEV_PACKAGE_NAME=libnccl-dev 2026-02-21T08:14:36.0015144Z NV_LIBNCCL_DEV_PACKAGE_VERSION=2.25.1-1 2026-02-21T08:14:36.0015385Z NV_LIBNCCL_DEV_PACKAGE=libnccl-dev=2.25.1-1+cuda12.8 2026-02-21T08:14:36.0015648Z LIBRARY_PATH=/usr/local/cuda/lib64/stubs 2026-02-21T08:14:36.0022497Z ##[endgroup] 2026-02-21T08:14:36.0031322Z ##[group]Waiting for all services to be ready 2026-02-21T08:14:36.0032906Z ##[endgroup] 2026-02-21T08:14:36.0212802Z ##[group]Run echo "Detected NVIDIA image" 2026-02-21T08:14:36.0213156Z echo "Detected NVIDIA image" 2026-02-21T08:14:36.0213472Z nvidia-smi || echo "nvidia-smi not found" 2026-02-21T08:14:36.0216155Z shell: bash -l {0} 2026-02-21T08:14:36.0216827Z env: 2026-02-21T08:14:36.0217008Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T08:14:36.0217233Z ##[endgroup] 2026-02-21T08:14:36.0807926Z Detected NVIDIA image 2026-02-21T08:14:36.2432560Z Sat Feb 21 08:14:36 2026 2026-02-21T08:14:36.2432961Z +-----------------------------------------------------------------------------------------+ 2026-02-21T08:14:36.2433538Z | NVIDIA-SMI 580.95.05 Driver Version: 580.95.05 CUDA Version: 13.0 | 2026-02-21T08:14:36.2434078Z +-----------------------------------------+------------------------+----------------------+ 2026-02-21T08:14:36.2434616Z | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | 2026-02-21T08:14:36.2435208Z | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | 2026-02-21T08:14:36.2435672Z | | | MIG M. 
| 2026-02-21T08:14:36.2436039Z |=========================================+========================+======================| 2026-02-21T08:14:36.2842119Z | 0 NVIDIA H100 80GB HBM3 On | 00000000:64:00.0 Off | 0 | 2026-02-21T08:14:36.2843089Z | N/A 32C P0 72W / 500W | 0MiB / 81559MiB | 0% Default | 2026-02-21T08:14:36.2843833Z | | | Disabled | 2026-02-21T08:14:36.2844523Z +-----------------------------------------+------------------------+----------------------+ 2026-02-21T08:14:36.2845815Z 2026-02-21T08:14:36.2846141Z +-----------------------------------------------------------------------------------------+ 2026-02-21T08:14:36.2847164Z | Processes: | 2026-02-21T08:14:36.2847898Z | GPU GI CI PID Type Process name GPU Memory | 2026-02-21T08:14:36.2848590Z | ID ID Usage | 2026-02-21T08:14:36.2849176Z |=========================================================================================| 2026-02-21T08:14:36.4271871Z | No running processes found | 2026-02-21T08:14:36.4272421Z +-----------------------------------------------------------------------------------------+ 2026-02-21T08:14:36.5187721Z ##[group]Run set -x 2026-02-21T08:14:36.5187988Z set -x 2026-02-21T08:14:36.5188165Z apt-get update 2026-02-21T08:14:36.5188457Z apt-get install -y git 2026-02-21T08:14:36.5188777Z shell: bash -l {0} 2026-02-21T08:14:36.5188936Z env: 2026-02-21T08:14:36.5189089Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T08:14:36.5189298Z ##[endgroup] 2026-02-21T08:14:36.5872757Z + apt-get update 2026-02-21T08:14:36.6845708Z Get:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64 InRelease [1581 B] 2026-02-21T08:14:36.8224885Z Get:2 http://security.ubuntu.com/ubuntu noble-security InRelease [126 kB] 2026-02-21T08:14:36.8244315Z Get:3 http://archive.ubuntu.com/ubuntu noble InRelease [256 kB] 2026-02-21T08:14:36.8402259Z Get:4 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64 Packages [1218 kB] 2026-02-21T08:14:37.2768237Z Get:5 
http://security.ubuntu.com/ubuntu noble-security/multiverse amd64 Packages [34.8 kB] 2026-02-21T08:14:37.3488189Z Get:6 http://archive.ubuntu.com/ubuntu noble-updates InRelease [126 kB] 2026-02-21T08:14:37.3830065Z Get:7 http://security.ubuntu.com/ubuntu noble-security/restricted amd64 Packages [3196 kB] 2026-02-21T08:14:37.4823343Z Get:8 http://archive.ubuntu.com/ubuntu noble-backports InRelease [126 kB] 2026-02-21T08:14:37.6170244Z Get:9 http://archive.ubuntu.com/ubuntu noble/main amd64 Packages [1808 kB] 2026-02-21T08:14:37.9182562Z Get:10 http://security.ubuntu.com/ubuntu noble-security/main amd64 Packages [1857 kB] 2026-02-21T08:14:37.9353510Z Get:11 http://archive.ubuntu.com/ubuntu noble/restricted amd64 Packages [117 kB] 2026-02-21T08:14:37.9424711Z Get:12 http://archive.ubuntu.com/ubuntu noble/universe amd64 Packages [19.3 MB] 2026-02-21T08:14:37.9768322Z Get:13 http://security.ubuntu.com/ubuntu noble-security/universe amd64 Packages [1207 kB] 2026-02-21T08:14:38.3994719Z Get:14 http://archive.ubuntu.com/ubuntu noble/multiverse amd64 Packages [331 kB] 2026-02-21T08:14:38.4021483Z Get:15 http://archive.ubuntu.com/ubuntu noble-updates/restricted amd64 Packages [3381 kB] 2026-02-21T08:14:38.4744412Z Get:16 http://archive.ubuntu.com/ubuntu noble-updates/multiverse amd64 Packages [38.1 kB] 2026-02-21T08:14:38.4747423Z Get:17 http://archive.ubuntu.com/ubuntu noble-updates/universe amd64 Packages [2016 kB] 2026-02-21T08:14:38.5152649Z Get:18 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 Packages [2240 kB] 2026-02-21T08:14:38.5634360Z Get:19 http://archive.ubuntu.com/ubuntu noble-backports/universe amd64 Packages [34.6 kB] 2026-02-21T08:14:38.5636935Z Get:20 http://archive.ubuntu.com/ubuntu noble-backports/main amd64 Packages [49.5 kB] 2026-02-21T08:14:39.2081845Z Fetched 37.5 MB in 3s (14.6 MB/s) 2026-02-21T08:14:39.9571230Z Reading package lists... 
2026-02-21T08:14:39.9732031Z W: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2404/x86_64/InRelease: Key is stored in legacy trusted.gpg keyring (/etc/apt/trusted.gpg), see the DEPRECATION section in apt-key(8) for details. 2026-02-21T08:14:39.9743803Z + apt-get install -y git 2026-02-21T08:14:40.7484591Z Reading package lists... 2026-02-21T08:14:40.9221624Z Building dependency tree... 2026-02-21T08:14:40.9222026Z Reading state information... 2026-02-21T08:14:41.0666085Z The following additional packages will be installed: 2026-02-21T08:14:41.0667360Z git-man krb5-locales less libbrotli1 libbsd0 libcbor0.10 libcurl3t64-gnutls 2026-02-21T08:14:41.0668288Z libedit2 liberror-perl libexpat1 libfido2-1 libgssapi-krb5-2 libk5crypto3 2026-02-21T08:14:41.0670360Z libkeyutils1 libkrb5-3 libkrb5support0 libnghttp2-14 libpsl5t64 librtmp1 2026-02-21T08:14:41.0671781Z libssh-4 libx11-6 libx11-data libxau6 libxcb1 libxdmcp6 libxext6 libxmuu1 2026-02-21T08:14:41.0672702Z openssh-client publicsuffix xauth 2026-02-21T08:14:41.0679297Z Suggested packages: 2026-02-21T08:14:41.0679664Z gettext-base git-daemon-run | git-daemon-sysvinit git-doc git-email git-gui 2026-02-21T08:14:41.0680891Z gitk gitweb git-cvs git-mediawiki git-svn krb5-doc krb5-user keychain 2026-02-21T08:14:41.0681315Z libpam-ssh monkeysphere ssh-askpass 2026-02-21T08:14:41.1344002Z The following NEW packages will be installed: 2026-02-21T08:14:41.1344499Z git git-man krb5-locales less libbrotli1 libbsd0 libcbor0.10 2026-02-21T08:14:41.1345202Z libcurl3t64-gnutls libedit2 liberror-perl libexpat1 libfido2-1 2026-02-21T08:14:41.1345764Z libgssapi-krb5-2 libk5crypto3 libkeyutils1 libkrb5-3 libkrb5support0 2026-02-21T08:14:41.1349631Z libnghttp2-14 libpsl5t64 librtmp1 libssh-4 libx11-6 libx11-data libxau6 2026-02-21T08:14:41.1353589Z libxcb1 libxdmcp6 libxext6 libxmuu1 openssh-client publicsuffix xauth 2026-02-21T08:14:41.2047177Z 0 upgraded, 31 newly installed, 0 to remove and 92 not upgraded. 
2026-02-21T08:14:41.2047819Z Need to get 8886 kB of archives. 2026-02-21T08:14:41.2048396Z After this operation, 38.0 MB of additional disk space will be used. 2026-02-21T08:14:41.2049499Z Get:1 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 krb5-locales all 1.20.1-6ubuntu2.6 [14.8 kB] 2026-02-21T08:14:41.2480589Z Get:2 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 less amd64 590-2ubuntu2.1 [142 kB] 2026-02-21T08:14:41.3078236Z Get:3 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 libbsd0 amd64 0.12.1-1build1.1 [41.2 kB] 2026-02-21T08:14:41.3181225Z Get:4 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 libexpat1 amd64 2.6.1-2ubuntu0.4 [88.2 kB] 2026-02-21T08:14:41.3397232Z Get:5 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 libkrb5support0 amd64 1.20.1-6ubuntu2.6 [34.4 kB] 2026-02-21T08:14:41.3429800Z Get:6 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 libk5crypto3 amd64 1.20.1-6ubuntu2.6 [82.0 kB] 2026-02-21T08:14:41.3494183Z Get:7 http://archive.ubuntu.com/ubuntu noble/main amd64 libkeyutils1 amd64 1.6.3-3build1 [9490 B] 2026-02-21T08:14:41.3502985Z Get:8 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 libkrb5-3 amd64 1.20.1-6ubuntu2.6 [348 kB] 2026-02-21T08:14:41.3759224Z Get:9 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 libgssapi-krb5-2 amd64 1.20.1-6ubuntu2.6 [143 kB] 2026-02-21T08:14:41.3838695Z Get:10 http://archive.ubuntu.com/ubuntu noble/main amd64 libcbor0.10 amd64 0.10.2-1.2ubuntu2 [25.8 kB] 2026-02-21T08:14:41.3839425Z Get:11 http://archive.ubuntu.com/ubuntu noble/main amd64 libedit2 amd64 3.1-20230828-1build1 [97.6 kB] 2026-02-21T08:14:41.3861477Z Get:12 http://archive.ubuntu.com/ubuntu noble/main amd64 libfido2-1 amd64 1.14.0-1build3 [83.5 kB] 2026-02-21T08:14:41.3878125Z Get:13 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 libnghttp2-14 amd64 1.59.0-1ubuntu0.2 [74.3 kB] 2026-02-21T08:14:41.3901295Z Get:14 http://archive.ubuntu.com/ubuntu noble/main 
amd64 libpsl5t64 amd64 0.21.2-1.1build1 [57.1 kB] 2026-02-21T08:14:41.3921618Z Get:15 http://archive.ubuntu.com/ubuntu noble/main amd64 libxau6 amd64 1:1.0.9-1build6 [7160 B] 2026-02-21T08:14:41.3923476Z Get:16 http://archive.ubuntu.com/ubuntu noble/main amd64 libxdmcp6 amd64 1:1.1.3-0ubuntu6 [10.3 kB] 2026-02-21T08:14:41.3926891Z Get:17 http://archive.ubuntu.com/ubuntu noble/main amd64 libxcb1 amd64 1.15-1ubuntu2 [47.7 kB] 2026-02-21T08:14:41.3984876Z Get:18 http://archive.ubuntu.com/ubuntu noble/main amd64 libx11-data all 2:1.8.7-1build1 [115 kB] 2026-02-21T08:14:41.4065917Z Get:19 http://archive.ubuntu.com/ubuntu noble/main amd64 libx11-6 amd64 2:1.8.7-1build1 [650 kB] 2026-02-21T08:14:41.4174313Z Get:20 http://archive.ubuntu.com/ubuntu noble/main amd64 libxext6 amd64 2:1.3.4-1build2 [30.4 kB] 2026-02-21T08:14:41.4177756Z Get:21 http://archive.ubuntu.com/ubuntu noble/main amd64 libxmuu1 amd64 2:1.1.3-3build2 [8958 B] 2026-02-21T08:14:41.4180524Z Get:22 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 openssh-client amd64 1:9.6p1-3ubuntu13.14 [906 kB] 2026-02-21T08:14:41.4299488Z Get:23 http://archive.ubuntu.com/ubuntu noble/main amd64 publicsuffix all 20231001.0357-0.1 [129 kB] 2026-02-21T08:14:41.4314335Z Get:24 http://archive.ubuntu.com/ubuntu noble/main amd64 xauth amd64 1:1.1.2-1build1 [25.6 kB] 2026-02-21T08:14:41.4338138Z Get:25 http://archive.ubuntu.com/ubuntu noble/main amd64 libbrotli1 amd64 1.1.0-2build2 [331 kB] 2026-02-21T08:14:41.4370307Z Get:26 http://archive.ubuntu.com/ubuntu noble/main amd64 librtmp1 amd64 2.4+20151223.gitfa8646d.1-2build7 [56.3 kB] 2026-02-21T08:14:41.4373823Z Get:27 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 libssh-4 amd64 0.10.6-2ubuntu0.3 [190 kB] 2026-02-21T08:14:41.4389385Z Get:28 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 libcurl3t64-gnutls amd64 8.5.0-2ubuntu10.6 [333 kB] 2026-02-21T08:14:41.4435704Z Get:29 http://archive.ubuntu.com/ubuntu noble/main amd64 liberror-perl all 0.17029-2 
[25.6 kB] 2026-02-21T08:14:41.4438679Z Get:30 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 git-man all 1:2.43.0-1ubuntu7.3 [1100 kB] 2026-02-21T08:14:41.4534611Z Get:31 http://archive.ubuntu.com/ubuntu noble-updates/main amd64 git amd64 1:2.43.0-1ubuntu7.3 [3680 kB] 2026-02-21T08:14:41.6173168Z debconf: delaying package configuration, since apt-utils is not installed 2026-02-21T08:14:41.6548612Z Fetched 8886 kB in 0s (26.9 MB/s) 2026-02-21T08:14:41.7099004Z Selecting previously unselected package krb5-locales. 2026-02-21T08:14:41.7136959Z (Reading database ... 2026-02-21T08:14:41.7149639Z (Reading database ... 5% 2026-02-21T08:14:41.7149878Z (Reading database ... 10% 2026-02-21T08:14:41.7150079Z (Reading database ... 15% 2026-02-21T08:14:41.7150277Z (Reading database ... 20% 2026-02-21T08:14:41.7150475Z (Reading database ... 25% 2026-02-21T08:14:41.7150667Z (Reading database ... 30% 2026-02-21T08:14:41.7150863Z (Reading database ... 35% 2026-02-21T08:14:41.7151056Z (Reading database ... 40% 2026-02-21T08:14:41.7151252Z (Reading database ... 45% 2026-02-21T08:14:41.7151441Z (Reading database ... 50% 2026-02-21T08:14:41.7151637Z (Reading database ... 55% 2026-02-21T08:14:41.7151829Z (Reading database ... 60% 2026-02-21T08:14:41.7152027Z (Reading database ... 65% 2026-02-21T08:14:41.7156430Z (Reading database ... 70% 2026-02-21T08:14:41.7169698Z (Reading database ... 75% 2026-02-21T08:14:41.7173304Z (Reading database ... 80% 2026-02-21T08:14:41.7178857Z (Reading database ... 85% 2026-02-21T08:14:41.7189277Z (Reading database ... 90% 2026-02-21T08:14:41.7195691Z (Reading database ... 95% 2026-02-21T08:14:41.7195919Z (Reading database ... 100% 2026-02-21T08:14:41.7196268Z (Reading database ... 15165 files and directories currently installed.) 2026-02-21T08:14:41.7201219Z Preparing to unpack .../00-krb5-locales_1.20.1-6ubuntu2.6_all.deb ... 2026-02-21T08:14:41.7292382Z Unpacking krb5-locales (1.20.1-6ubuntu2.6) ... 
2026-02-21T08:14:41.8188527Z Selecting previously unselected package less. 2026-02-21T08:14:41.8198999Z Preparing to unpack .../01-less_590-2ubuntu2.1_amd64.deb ... 2026-02-21T08:14:41.8311559Z Unpacking less (590-2ubuntu2.1) ... 2026-02-21T08:14:41.9108549Z Selecting previously unselected package libbsd0:amd64. 2026-02-21T08:14:41.9119218Z Preparing to unpack .../02-libbsd0_0.12.1-1build1.1_amd64.deb ... 2026-02-21T08:14:41.9336133Z Unpacking libbsd0:amd64 (0.12.1-1build1.1) ... 2026-02-21T08:14:42.0085440Z Selecting previously unselected package libexpat1:amd64. 2026-02-21T08:14:42.0095850Z Preparing to unpack .../03-libexpat1_2.6.1-2ubuntu0.4_amd64.deb ... 2026-02-21T08:14:42.0196768Z Unpacking libexpat1:amd64 (2.6.1-2ubuntu0.4) ... 2026-02-21T08:14:42.0933581Z Selecting previously unselected package libkrb5support0:amd64. 2026-02-21T08:14:42.0946033Z Preparing to unpack .../04-libkrb5support0_1.20.1-6ubuntu2.6_amd64.deb ... 2026-02-21T08:14:42.1048578Z Unpacking libkrb5support0:amd64 (1.20.1-6ubuntu2.6) ... 2026-02-21T08:14:42.1792511Z Selecting previously unselected package libk5crypto3:amd64. 2026-02-21T08:14:42.1802455Z Preparing to unpack .../05-libk5crypto3_1.20.1-6ubuntu2.6_amd64.deb ... 2026-02-21T08:14:42.1915262Z Unpacking libk5crypto3:amd64 (1.20.1-6ubuntu2.6) ... 2026-02-21T08:14:42.2574013Z Selecting previously unselected package libkeyutils1:amd64. 2026-02-21T08:14:42.2584369Z Preparing to unpack .../06-libkeyutils1_1.6.3-3build1_amd64.deb ... 2026-02-21T08:14:42.2684094Z Unpacking libkeyutils1:amd64 (1.6.3-3build1) ... 2026-02-21T08:14:42.3357528Z Selecting previously unselected package libkrb5-3:amd64. 2026-02-21T08:14:42.3367643Z Preparing to unpack .../07-libkrb5-3_1.20.1-6ubuntu2.6_amd64.deb ... 2026-02-21T08:14:42.3469383Z Unpacking libkrb5-3:amd64 (1.20.1-6ubuntu2.6) ... 2026-02-21T08:14:42.4317067Z Selecting previously unselected package libgssapi-krb5-2:amd64. 
2026-02-21T08:14:42.4327443Z Preparing to unpack .../08-libgssapi-krb5-2_1.20.1-6ubuntu2.6_amd64.deb ... 2026-02-21T08:14:42.4411329Z Unpacking libgssapi-krb5-2:amd64 (1.20.1-6ubuntu2.6) ... 2026-02-21T08:14:42.5152719Z Selecting previously unselected package libcbor0.10:amd64. 2026-02-21T08:14:42.5163052Z Preparing to unpack .../09-libcbor0.10_0.10.2-1.2ubuntu2_amd64.deb ... 2026-02-21T08:14:42.5246169Z Unpacking libcbor0.10:amd64 (0.10.2-1.2ubuntu2) ... 2026-02-21T08:14:42.6027632Z Selecting previously unselected package libedit2:amd64. 2026-02-21T08:14:42.6038822Z Preparing to unpack .../10-libedit2_3.1-20230828-1build1_amd64.deb ... 2026-02-21T08:14:42.6119983Z Unpacking libedit2:amd64 (3.1-20230828-1build1) ... 2026-02-21T08:14:42.6947587Z Selecting previously unselected package libfido2-1:amd64. 2026-02-21T08:14:42.6957687Z Preparing to unpack .../11-libfido2-1_1.14.0-1build3_amd64.deb ... 2026-02-21T08:14:42.7055233Z Unpacking libfido2-1:amd64 (1.14.0-1build3) ... 2026-02-21T08:14:42.7829257Z Selecting previously unselected package libnghttp2-14:amd64. 2026-02-21T08:14:42.7839099Z Preparing to unpack .../12-libnghttp2-14_1.59.0-1ubuntu0.2_amd64.deb ... 2026-02-21T08:14:42.7927862Z Unpacking libnghttp2-14:amd64 (1.59.0-1ubuntu0.2) ... 2026-02-21T08:14:42.8604586Z Selecting previously unselected package libpsl5t64:amd64. 2026-02-21T08:14:42.8622198Z Preparing to unpack .../13-libpsl5t64_0.21.2-1.1build1_amd64.deb ... 2026-02-21T08:14:42.8730354Z Unpacking libpsl5t64:amd64 (0.21.2-1.1build1) ... 2026-02-21T08:14:42.9531131Z Selecting previously unselected package libxau6:amd64. 2026-02-21T08:14:42.9543073Z Preparing to unpack .../14-libxau6_1%3a1.0.9-1build6_amd64.deb ... 2026-02-21T08:14:42.9639074Z Unpacking libxau6:amd64 (1:1.0.9-1build6) ... 2026-02-21T08:14:43.0418424Z Selecting previously unselected package libxdmcp6:amd64. 2026-02-21T08:14:43.0428418Z Preparing to unpack .../15-libxdmcp6_1%3a1.1.3-0ubuntu6_amd64.deb ... 
2026-02-21T08:14:43.0527063Z Unpacking libxdmcp6:amd64 (1:1.1.3-0ubuntu6) ... 2026-02-21T08:14:43.1339233Z Selecting previously unselected package libxcb1:amd64. 2026-02-21T08:14:43.1350308Z Preparing to unpack .../16-libxcb1_1.15-1ubuntu2_amd64.deb ... 2026-02-21T08:14:43.1449763Z Unpacking libxcb1:amd64 (1.15-1ubuntu2) ... 2026-02-21T08:14:43.2100173Z Selecting previously unselected package libx11-data. 2026-02-21T08:14:43.2109938Z Preparing to unpack .../17-libx11-data_2%3a1.8.7-1build1_all.deb ... 2026-02-21T08:14:43.2226391Z Unpacking libx11-data (2:1.8.7-1build1) ... 2026-02-21T08:14:43.4161656Z Selecting previously unselected package libx11-6:amd64. 2026-02-21T08:14:43.4187687Z Preparing to unpack .../18-libx11-6_2%3a1.8.7-1build1_amd64.deb ... 2026-02-21T08:14:43.4298739Z Unpacking libx11-6:amd64 (2:1.8.7-1build1) ... 2026-02-21T08:14:43.5179941Z Selecting previously unselected package libxext6:amd64. 2026-02-21T08:14:43.5194945Z Preparing to unpack .../19-libxext6_2%3a1.3.4-1build2_amd64.deb ... 2026-02-21T08:14:43.5292604Z Unpacking libxext6:amd64 (2:1.3.4-1build2) ... 2026-02-21T08:14:43.6030069Z Selecting previously unselected package libxmuu1:amd64. 2026-02-21T08:14:43.6042683Z Preparing to unpack .../20-libxmuu1_2%3a1.1.3-3build2_amd64.deb ... 2026-02-21T08:14:43.6155588Z Unpacking libxmuu1:amd64 (2:1.1.3-3build2) ... 2026-02-21T08:14:43.7051114Z Selecting previously unselected package openssh-client. 2026-02-21T08:14:43.7081288Z Preparing to unpack .../21-openssh-client_1%3a9.6p1-3ubuntu13.14_amd64.deb ... 2026-02-21T08:14:43.7218594Z Unpacking openssh-client (1:9.6p1-3ubuntu13.14) ... 2026-02-21T08:14:43.8281743Z Selecting previously unselected package publicsuffix. 2026-02-21T08:14:43.8296049Z Preparing to unpack .../22-publicsuffix_20231001.0357-0.1_all.deb ... 2026-02-21T08:14:43.8398143Z Unpacking publicsuffix (20231001.0357-0.1) ... 2026-02-21T08:14:43.9089732Z Selecting previously unselected package xauth. 
2026-02-21T08:14:43.9100332Z Preparing to unpack .../23-xauth_1%3a1.1.2-1build1_amd64.deb ... 2026-02-21T08:14:43.9237651Z Unpacking xauth (1:1.1.2-1build1) ... 2026-02-21T08:14:43.9957320Z Selecting previously unselected package libbrotli1:amd64. 2026-02-21T08:14:43.9967492Z Preparing to unpack .../24-libbrotli1_1.1.0-2build2_amd64.deb ... 2026-02-21T08:14:44.0083855Z Unpacking libbrotli1:amd64 (1.1.0-2build2) ... 2026-02-21T08:14:44.0902741Z Selecting previously unselected package librtmp1:amd64. 2026-02-21T08:14:44.0913597Z Preparing to unpack .../25-librtmp1_2.4+20151223.gitfa8646d.1-2build7_amd64.deb ... 2026-02-21T08:14:44.1032599Z Unpacking librtmp1:amd64 (2.4+20151223.gitfa8646d.1-2build7) ... 2026-02-21T08:14:44.1833363Z Selecting previously unselected package libssh-4:amd64. 2026-02-21T08:14:44.1843915Z Preparing to unpack .../26-libssh-4_0.10.6-2ubuntu0.3_amd64.deb ... 2026-02-21T08:14:44.1905534Z Unpacking libssh-4:amd64 (0.10.6-2ubuntu0.3) ... 2026-02-21T08:14:44.2739069Z Selecting previously unselected package libcurl3t64-gnutls:amd64. 2026-02-21T08:14:44.2755911Z Preparing to unpack .../27-libcurl3t64-gnutls_8.5.0-2ubuntu10.6_amd64.deb ... 2026-02-21T08:14:44.2855765Z Unpacking libcurl3t64-gnutls:amd64 (8.5.0-2ubuntu10.6) ... 2026-02-21T08:14:44.3546164Z Selecting previously unselected package liberror-perl. 2026-02-21T08:14:44.3560095Z Preparing to unpack .../28-liberror-perl_0.17029-2_all.deb ... 2026-02-21T08:14:44.3633655Z Unpacking liberror-perl (0.17029-2) ... 2026-02-21T08:14:44.4219260Z Selecting previously unselected package git-man. 2026-02-21T08:14:44.4233320Z Preparing to unpack .../29-git-man_1%3a2.43.0-1ubuntu7.3_all.deb ... 2026-02-21T08:14:44.4327578Z Unpacking git-man (1:2.43.0-1ubuntu7.3) ... 2026-02-21T08:14:44.5277347Z Selecting previously unselected package git. 2026-02-21T08:14:44.5291570Z Preparing to unpack .../30-git_1%3a2.43.0-1ubuntu7.3_amd64.deb ... 2026-02-21T08:14:44.5437351Z Unpacking git (1:2.43.0-1ubuntu7.3) ... 
2026-02-21T08:14:44.8090352Z Setting up libexpat1:amd64 (2.6.1-2ubuntu0.4) ... 2026-02-21T08:14:44.8345712Z Setting up libxau6:amd64 (1:1.0.9-1build6) ... 2026-02-21T08:14:44.8676082Z Setting up libkeyutils1:amd64 (1.6.3-3build1) ... 2026-02-21T08:14:44.8972717Z Setting up libcbor0.10:amd64 (0.10.2-1.2ubuntu2) ... 2026-02-21T08:14:44.9332266Z Setting up libbrotli1:amd64 (1.1.0-2build2) ... 2026-02-21T08:14:44.9640773Z Setting up libpsl5t64:amd64 (0.21.2-1.1build1) ... 2026-02-21T08:14:44.9950981Z Setting up libnghttp2-14:amd64 (1.59.0-1ubuntu0.2) ... 2026-02-21T08:14:45.0257343Z Setting up less (590-2ubuntu2.1) ... 2026-02-21T08:14:45.0828736Z Setting up krb5-locales (1.20.1-6ubuntu2.6) ... 2026-02-21T08:14:45.1152964Z Setting up libkrb5support0:amd64 (1.20.1-6ubuntu2.6) ... 2026-02-21T08:14:45.1511214Z Setting up liberror-perl (0.17029-2) ... 2026-02-21T08:14:45.1830564Z Setting up libx11-data (2:1.8.7-1build1) ... 2026-02-21T08:14:45.2157185Z Setting up librtmp1:amd64 (2.4+20151223.gitfa8646d.1-2build7) ... 2026-02-21T08:14:45.2438312Z Setting up libk5crypto3:amd64 (1.20.1-6ubuntu2.6) ... 2026-02-21T08:14:45.2741033Z Setting up git-man (1:2.43.0-1ubuntu7.3) ... 2026-02-21T08:14:45.3012435Z Setting up libkrb5-3:amd64 (1.20.1-6ubuntu2.6) ... 2026-02-21T08:14:45.3310534Z Setting up libfido2-1:amd64 (1.14.0-1build3) ... 2026-02-21T08:14:45.3601073Z Setting up libbsd0:amd64 (0.12.1-1build1.1) ... 2026-02-21T08:14:45.3862450Z Setting up publicsuffix (20231001.0357-0.1) ... 2026-02-21T08:14:45.4133005Z Setting up libxdmcp6:amd64 (1:1.1.3-0ubuntu6) ... 2026-02-21T08:14:45.4395239Z Setting up libxcb1:amd64 (1.15-1ubuntu2) ... 2026-02-21T08:14:45.4624898Z Setting up libedit2:amd64 (3.1-20230828-1build1) ... 2026-02-21T08:14:45.4875491Z Setting up libgssapi-krb5-2:amd64 (1.20.1-6ubuntu2.6) ... 2026-02-21T08:14:45.5184725Z Setting up libssh-4:amd64 (0.10.6-2ubuntu0.3) ... 2026-02-21T08:14:45.5479572Z Setting up libx11-6:amd64 (2:1.8.7-1build1) ... 
2026-02-21T08:14:45.5769307Z Setting up libxmuu1:amd64 (2:1.1.3-3build2) ... 2026-02-21T08:14:45.6065903Z Setting up openssh-client (1:9.6p1-3ubuntu13.14) ... 2026-02-21T08:14:45.7471902Z Setting up libcurl3t64-gnutls:amd64 (8.5.0-2ubuntu10.6) ... 2026-02-21T08:14:45.7756770Z Setting up libxext6:amd64 (2:1.3.4-1build2) ... 2026-02-21T08:14:45.7978016Z Setting up git (1:2.43.0-1ubuntu7.3) ... 2026-02-21T08:14:45.8401757Z Setting up xauth (1:1.1.2-1build1) ... 2026-02-21T08:14:45.8674783Z Processing triggers for libc-bin (2.39-0ubuntu8.4) ... 2026-02-21T08:14:45.9702146Z ##[group]Run actions/checkout@v6 2026-02-21T08:14:45.9702403Z with: 2026-02-21T08:14:45.9702578Z repository: pytorch/helion 2026-02-21T08:14:45.9702901Z token: *** 2026-02-21T08:14:45.9703068Z ssh-strict: true 2026-02-21T08:14:45.9703231Z ssh-user: git 2026-02-21T08:14:45.9703425Z persist-credentials: true 2026-02-21T08:14:45.9703618Z clean: true 2026-02-21T08:14:45.9703783Z sparse-checkout-cone-mode: true 2026-02-21T08:14:45.9703995Z fetch-depth: 1 2026-02-21T08:14:45.9704387Z fetch-tags: false 2026-02-21T08:14:45.9704551Z show-progress: true 2026-02-21T08:14:45.9704711Z lfs: false 2026-02-21T08:14:45.9704851Z submodules: false 2026-02-21T08:14:45.9705009Z set-safe-directory: true 2026-02-21T08:14:45.9705188Z env: 2026-02-21T08:14:45.9705330Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T08:14:45.9705525Z ##[endgroup] 2026-02-21T08:14:45.9743678Z ##[command]/usr/bin/docker exec dfaa2b0d874681230c8443d6e8ad7743474c44dc43659fe49fb8f6fcfb8fa204 sh -c "cat /etc/*release | grep ^ID" 2026-02-21T08:14:46.1819527Z Syncing repository: pytorch/helion 2026-02-21T08:14:46.1820638Z ##[group]Getting Git version info 2026-02-21T08:14:46.1820929Z Working directory is '/__w/helion/helion' 2026-02-21T08:14:46.1821329Z [command]/usr/bin/git version 2026-02-21T08:14:46.1821525Z git version 2.43.0 2026-02-21T08:14:46.1829044Z ##[endgroup] 2026-02-21T08:14:46.1841798Z Temporarily overriding 
HOME='/__w/_temp/54de2f90-9cf0-4542-96b1-51253bd2d02a' before making global git config changes 2026-02-21T08:14:46.1842446Z Adding repository directory to the temporary git global config as a safe directory 2026-02-21T08:14:46.1846261Z [command]/usr/bin/git config --global --add safe.directory /__w/helion/helion 2026-02-21T08:14:46.1903734Z Deleting the contents of '/__w/helion/helion' 2026-02-21T08:14:46.1920525Z ##[group]Initializing the repository 2026-02-21T08:14:46.1923439Z [command]/usr/bin/git init /__w/helion/helion 2026-02-21T08:14:46.1958210Z hint: Using 'master' as the name for the initial branch. This default branch name 2026-02-21T08:14:46.1958793Z hint: is subject to change. To configure the initial branch name to use in all 2026-02-21T08:14:46.1959325Z hint: of your new repositories, which will suppress this warning, call: 2026-02-21T08:14:46.1959700Z hint: 2026-02-21T08:14:46.1959947Z hint: git config --global init.defaultBranch <name> 2026-02-21T08:14:46.1960263Z hint: 2026-02-21T08:14:46.1960544Z hint: Names commonly chosen instead of 'master' are 'main', 'trunk' and 2026-02-21T08:14:46.1961061Z hint: 'development'. 
The just-created branch can be renamed via this command: 2026-02-21T08:14:46.1961453Z hint: 2026-02-21T08:14:46.1961631Z hint: git branch -m <name> 2026-02-21T08:14:46.1984200Z Initialized empty Git repository in /__w/helion/helion/.git/ 2026-02-21T08:14:46.1993971Z [command]/usr/bin/git remote add origin https://github.com/pytorch/helion 2026-02-21T08:14:46.2066239Z ##[endgroup] 2026-02-21T08:14:46.2066704Z ##[group]Disabling automatic garbage collection 2026-02-21T08:14:46.2070278Z [command]/usr/bin/git config --local gc.auto 0 2026-02-21T08:14:46.2099956Z ##[endgroup] 2026-02-21T08:14:46.2100274Z ##[group]Setting up auth 2026-02-21T08:14:46.2101571Z Removing SSH command configuration 2026-02-21T08:14:46.2106176Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2026-02-21T08:14:46.2137703Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2026-02-21T08:14:46.2462289Z Removing HTTP extra header 2026-02-21T08:14:46.2465428Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2026-02-21T08:14:46.2496967Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2026-02-21T08:14:46.2770691Z Removing includeIf entries pointing to credentials config files 2026-02-21T08:14:46.2774849Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir: 2026-02-21T08:14:46.2806435Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url 2026-02-21T08:14:46.3084244Z [command]/usr/bin/git config --file /__w/_temp/git-credentials-9c9762d5-bda2-41fd-b36a-d25e9eae85c5.config http.https://github.com/.extraheader 
AUTHORIZATION: basic *** 2026-02-21T08:14:46.3119768Z [command]/usr/bin/git config --local includeIf.gitdir:/__w/helion/helion/.git.path /__w/_temp/git-credentials-9c9762d5-bda2-41fd-b36a-d25e9eae85c5.config 2026-02-21T08:14:46.3150389Z [command]/usr/bin/git config --local includeIf.gitdir:/__w/helion/helion/.git/worktrees/*.path /__w/_temp/git-credentials-9c9762d5-bda2-41fd-b36a-d25e9eae85c5.config 2026-02-21T08:14:46.3180477Z [command]/usr/bin/git config --local includeIf.gitdir:/github/workspace/.git.path /github/runner_temp/git-credentials-9c9762d5-bda2-41fd-b36a-d25e9eae85c5.config 2026-02-21T08:14:46.3210941Z [command]/usr/bin/git config --local includeIf.gitdir:/github/workspace/.git/worktrees/*.path /github/runner_temp/git-credentials-9c9762d5-bda2-41fd-b36a-d25e9eae85c5.config 2026-02-21T08:14:46.3250235Z ##[endgroup] 2026-02-21T08:14:46.3250547Z ##[group]Fetching the repository 2026-02-21T08:14:46.3257820Z [command]/usr/bin/git -c protocol.version=2 fetch --no-tags --prune --no-recurse-submodules --depth=1 origin +874a7d0cadab18218a84ad3579d329dc95c51820:refs/remotes/origin/main 2026-02-21T08:14:46.7567995Z From https://github.com/pytorch/helion 2026-02-21T08:14:46.7568462Z * [new ref] 874a7d0cadab18218a84ad3579d329dc95c51820 -> origin/main 2026-02-21T08:14:46.7595408Z [command]/usr/bin/git branch --list --remote origin/main 2026-02-21T08:14:46.7630170Z origin/main 2026-02-21T08:14:46.7638550Z [command]/usr/bin/git rev-parse refs/remotes/origin/main 2026-02-21T08:14:46.7665119Z 874a7d0cadab18218a84ad3579d329dc95c51820 2026-02-21T08:14:46.7668270Z ##[endgroup] 2026-02-21T08:14:46.7668714Z ##[group]Determining the checkout info 2026-02-21T08:14:46.7670162Z ##[endgroup] 2026-02-21T08:14:46.7673836Z [command]/usr/bin/git sparse-checkout disable 2026-02-21T08:14:46.7720660Z [command]/usr/bin/git config --local --unset-all extensions.worktreeConfig 2026-02-21T08:14:46.7754486Z ##[group]Checking out the ref 2026-02-21T08:14:46.7754863Z [command]/usr/bin/git 
checkout --progress --force -B main refs/remotes/origin/main 2026-02-21T08:14:46.8223593Z Switched to a new branch 'main' 2026-02-21T08:14:46.8223917Z branch 'main' set up to track 'origin/main'. 2026-02-21T08:14:46.8230049Z ##[endgroup] 2026-02-21T08:14:46.8268887Z [command]/usr/bin/git log -1 --format=%H 2026-02-21T08:14:46.8296689Z 874a7d0cadab18218a84ad3579d329dc95c51820 2026-02-21T08:14:46.8505616Z ##[group]Run actions/setup-python@v6 2026-02-21T08:14:46.8505899Z with: 2026-02-21T08:14:46.8506054Z python-version: 3.12 2026-02-21T08:14:46.8506243Z check-latest: false 2026-02-21T08:14:46.8506710Z token: *** 2026-02-21T08:14:46.8506886Z update-environment: true 2026-02-21T08:14:46.8507081Z allow-prereleases: false 2026-02-21T08:14:46.8507284Z freethreaded: false 2026-02-21T08:14:46.8507445Z env: 2026-02-21T08:14:46.8507594Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T08:14:46.8507795Z ##[endgroup] 2026-02-21T08:14:46.8511344Z ##[command]/usr/bin/docker exec dfaa2b0d874681230c8443d6e8ad7743474c44dc43659fe49fb8f6fcfb8fa204 sh -c "cat /etc/*release | grep ^ID" 2026-02-21T08:14:47.0713998Z ##[group]Installed versions 2026-02-21T08:14:47.0724910Z Version 3.12 was not found in the local cache 2026-02-21T08:14:47.4995538Z Version 3.12 is available for downloading 2026-02-21T08:14:47.4996752Z Download from "https://github.com/actions/python-versions/releases/download/3.12.12-18393146713/python-3.12.12-linux-24.04-x64.tar.gz" 2026-02-21T08:14:47.9326382Z Extract downloaded archive 2026-02-21T08:14:47.9488967Z [command]/usr/bin/tar xz --warning=no-unknown-keyword --overwrite -C /__w/_temp/7e8ff2df-974f-4b69-b743-11f7d6eca546 -f /__w/_temp/abdb4da6-d5fd-4ddd-a94b-a08d07ee43a4 2026-02-21T08:14:49.9255409Z Execute installation script 2026-02-21T08:14:49.9378841Z Check if Python hostedtoolcache folder exist... 2026-02-21T08:14:49.9379183Z Creating Python hostedtoolcache folder... 
2026-02-21T08:14:49.9389888Z Create Python 3.12.12 folder 2026-02-21T08:14:49.9401450Z Copy Python binaries to hostedtoolcache folder 2026-02-21T08:14:50.6153464Z Create additional symlinks (Required for the UsePythonVersion Azure Pipelines task and the setup-python GitHub Action) 2026-02-21T08:14:50.6193087Z Upgrading pip... 2026-02-21T08:14:52.4979665Z Looking in links: /tmp/tmpf93d9u08 2026-02-21T08:14:52.4981671Z Requirement already satisfied: pip in /__w/_tool/Python/3.12.12/x64/lib/python3.12/site-packages (25.0.1) 2026-02-21T08:14:52.5040084Z ##[error]WARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager, possibly rendering your system unusable. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv. Use the --root-user-action option if you know what you are doing and want to suppress this warning. 2026-02-21T08:14:53.2698857Z ##[error]WARNING: The directory '/github/home/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag. 
2026-02-21T08:14:53.4584229Z Collecting pip 2026-02-21T08:14:53.5116347Z Downloading pip-26.0.1-py3-none-any.whl.metadata (4.7 kB) 2026-02-21T08:14:53.5247394Z Downloading pip-26.0.1-py3-none-any.whl (1.8 MB) 2026-02-21T08:14:53.5803629Z ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 60.1 MB/s eta 0:00:00 2026-02-21T08:14:53.5803920Z 2026-02-21T08:14:53.5925730Z Installing collected packages: pip 2026-02-21T08:14:53.5927484Z Attempting uninstall: pip 2026-02-21T08:14:53.5946327Z Found existing installation: pip 25.0.1 2026-02-21T08:14:53.6312830Z Uninstalling pip-25.0.1: 2026-02-21T08:14:53.6369807Z Successfully uninstalled pip-25.0.1 2026-02-21T08:14:54.4582382Z Successfully installed pip-26.0.1 2026-02-21T08:14:54.5092463Z Create complete file 2026-02-21T08:14:54.5135371Z Successfully set up CPython (3.12.12) 2026-02-21T08:14:54.5135923Z ##[endgroup] 2026-02-21T08:14:54.5391343Z ##[group]Run astral-sh/setup-uv@v7 2026-02-21T08:14:54.5391606Z with: 2026-02-21T08:14:54.5391769Z activate-environment: false 2026-02-21T08:14:54.5392049Z working-directory: /home/bob/_work/helion/helion 2026-02-21T08:14:54.5392449Z github-token: *** 2026-02-21T08:14:54.5392617Z enable-cache: auto 2026-02-21T08:14:54.5393113Z cache-dependency-glob: **/*requirements*.txt **/*requirements*.in **/*constraints*.txt **/*constraints*.in **/pyproject.toml **/uv.lock **/*.py.lock 2026-02-21T08:14:54.5393641Z restore-cache: true 2026-02-21T08:14:54.5393809Z save-cache: true 2026-02-21T08:14:54.5393965Z prune-cache: true 2026-02-21T08:14:54.5394118Z cache-python: false 2026-02-21T08:14:54.5394301Z ignore-nothing-to-cache: false 2026-02-21T08:14:54.5394513Z ignore-empty-workdir: false 2026-02-21T08:14:54.5394719Z add-problem-matchers: true 2026-02-21T08:14:54.5394911Z resolution-strategy: highest 2026-02-21T08:14:54.5395122Z env: 2026-02-21T08:14:54.5395304Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T08:14:54.5395539Z pythonLocation: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:14:54.5395861Z 
PKG_CONFIG_PATH: /__w/_tool/Python/3.12.12/x64/lib/pkgconfig 2026-02-21T08:14:54.5396166Z Python_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:14:54.5396431Z Python2_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:14:54.5396856Z Python3_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:14:54.5397167Z LD_LIBRARY_PATH: /__w/_tool/Python/3.12.12/x64/lib:/usr/local/cuda/lib64 2026-02-21T08:14:54.5397464Z ##[endgroup] 2026-02-21T08:14:54.5402845Z ##[command]/usr/bin/docker exec dfaa2b0d874681230c8443d6e8ad7743474c44dc43659fe49fb8f6fcfb8fa204 sh -c "cat /etc/*release | grep ^ID" 2026-02-21T08:14:54.7967266Z (node:802) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead. 2026-02-21T08:14:54.7968071Z (Use `node --trace-deprecation ...` to show where the warning was created) 2026-02-21T08:14:54.8085216Z Trying to find version for uv in: /__w/helion/helion/uv.toml 2026-02-21T08:14:54.8085609Z Could not find file: /__w/helion/helion/uv.toml 2026-02-21T08:14:54.8086430Z Trying to find version for uv in: /__w/helion/helion/pyproject.toml 2026-02-21T08:14:54.8096685Z Could not determine uv version from uv.toml or pyproject.toml. Falling back to latest. 2026-02-21T08:14:54.8098241Z Getting latest version from GitHub API... 2026-02-21T08:14:55.0308822Z manifest-file not provided, reading from local file. 2026-02-21T08:14:55.0350259Z manifest-file does not contain version 0.10.4, arch x86_64, platform unknown-linux-gnu. Falling back to GitHub releases. 2026-02-21T08:14:55.0351432Z Downloading uv from "https://github.com/astral-sh/uv/releases/download/0.10.4/uv-x86_64-unknown-linux-gnu.tar.gz" ... 
2026-02-21T08:14:55.2623863Z [command]/usr/bin/tar xz --warning=no-unknown-keyword --overwrite -C /__w/_temp/7f251331-9857-47fb-8930-e4e9907a11aa -f /__w/_temp/35bc44e8-157f-4a74-ae0e-6758932b831a 2026-02-21T08:14:55.6831063Z Added /github/home/.local/bin to the path 2026-02-21T08:14:55.6834074Z Added /__w/_tool/uv/0.10.4/x86_64 to the path 2026-02-21T08:14:55.6835595Z Set UV_PYTHON_INSTALL_DIR to /github/home/.local/share/uv/python 2026-02-21T08:14:55.6836024Z Added /github/home/.local/share/uv/python to the path 2026-02-21T08:14:55.6842224Z Successfully installed uv version 0.10.4 2026-02-21T08:14:55.8500013Z ##[group]Run uv venv --python 3.12 2026-02-21T08:14:55.8500318Z uv venv --python 3.12 2026-02-21T08:14:55.8500670Z shell: bash -l {0} 2026-02-21T08:14:55.8500832Z env: 2026-02-21T08:14:55.8500978Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T08:14:55.8501223Z pythonLocation: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:14:55.8501530Z PKG_CONFIG_PATH: /__w/_tool/Python/3.12.12/x64/lib/pkgconfig 2026-02-21T08:14:55.8501845Z Python_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:14:55.8502113Z Python2_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:14:55.8502373Z Python3_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:14:55.8502697Z LD_LIBRARY_PATH: /__w/_tool/Python/3.12.12/x64/lib:/usr/local/cuda/lib64 2026-02-21T08:14:55.8503072Z UV_PYTHON_INSTALL_DIR: /github/home/.local/share/uv/python 2026-02-21T08:14:55.8503336Z ##[endgroup] 2026-02-21T08:14:56.0001775Z Using CPython 3.12.12 interpreter at: /__w/_tool/Python/3.12.12/x64/bin/python3.12 2026-02-21T08:14:56.0002338Z Creating virtual environment at: .venv 2026-02-21T08:14:56.0007848Z Activate with: source .venv/bin/activate 2026-02-21T08:14:56.0092209Z ##[group]Run source .venv/bin/activate 2026-02-21T08:14:56.0092530Z source .venv/bin/activate 2026-02-21T08:14:56.0092934Z uv pip install -U "torch==2.9.*" --index-url https://download.pytorch.org/whl/cu128 2026-02-21T08:14:56.0093574Z shell: 
bash -l {0} 2026-02-21T08:14:56.0093741Z env: 2026-02-21T08:14:56.0093887Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T08:14:56.0094127Z pythonLocation: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:14:56.0094437Z PKG_CONFIG_PATH: /__w/_tool/Python/3.12.12/x64/lib/pkgconfig 2026-02-21T08:14:56.0094735Z Python_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:14:56.0095090Z Python2_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:14:56.0095353Z Python3_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:14:56.0095670Z LD_LIBRARY_PATH: /__w/_tool/Python/3.12.12/x64/lib:/usr/local/cuda/lib64 2026-02-21T08:14:56.0096024Z UV_PYTHON_INSTALL_DIR: /github/home/.local/share/uv/python 2026-02-21T08:14:56.0096312Z ##[endgroup] 2026-02-21T08:14:56.4291649Z Resolved 26 packages in 296ms 2026-02-21T08:14:56.4356860Z Downloading torch (859.1MiB) 2026-02-21T08:14:56.4366827Z Downloading triton (162.6MiB) 2026-02-21T08:14:56.4403748Z Downloading nvidia-nccl-cu12 (307.4MiB) 2026-02-21T08:14:56.4482769Z Downloading nvidia-cuda-nvrtc-cu12 (84.0MiB) 2026-02-21T08:14:56.4499329Z Downloading sympy (6.0MiB) 2026-02-21T08:14:56.4500601Z Downloading networkx (2.0MiB) 2026-02-21T08:14:56.4747019Z Downloading nvidia-cufft-cu12 (184.2MiB) 2026-02-21T08:14:56.4757734Z Downloading nvidia-nvshmem-cu12 (118.9MiB) 2026-02-21T08:14:56.4833589Z Downloading nvidia-cusparse-cu12 (274.9MiB) 2026-02-21T08:14:56.4893088Z Downloading nvidia-cuda-cupti-cu12 (9.8MiB) 2026-02-21T08:14:56.4974393Z Downloading nvidia-cublas-cu12 (566.8MiB) 2026-02-21T08:14:56.5040711Z Downloading nvidia-cufile-cu12 (1.1MiB) 2026-02-21T08:14:56.5141783Z Downloading nvidia-cusolver-cu12 (255.1MiB) 2026-02-21T08:14:56.5184655Z Downloading nvidia-cusparselt-cu12 (273.9MiB) 2026-02-21T08:14:56.5209665Z Downloading nvidia-nvjitlink-cu12 (37.4MiB) 2026-02-21T08:14:56.5352213Z Downloading nvidia-curand-cu12 (60.7MiB) 2026-02-21T08:14:56.5398293Z Downloading nvidia-cudnn-cu12 (674.0MiB) 2026-02-21T08:14:56.8207636Z Downloaded 
nvidia-cufile-cu12 2026-02-21T08:14:56.9349508Z Downloaded networkx 2026-02-21T08:14:57.7828631Z Downloaded sympy 2026-02-21T08:14:58.4712433Z Downloaded nvidia-cuda-cupti-cu12 2026-02-21T08:14:59.3180520Z Downloaded triton 2026-02-21T08:15:01.4678701Z Downloaded nvidia-nvjitlink-cu12 2026-02-21T08:15:02.3291105Z Downloaded nvidia-cuda-nvrtc-cu12 2026-02-21T08:15:03.2681635Z Downloaded nvidia-curand-cu12 2026-02-21T08:15:09.2482742Z Downloaded nvidia-nvshmem-cu12 2026-02-21T08:15:12.5844533Z Downloaded nvidia-cufft-cu12 2026-02-21T08:15:14.2728651Z Downloaded nvidia-nccl-cu12 2026-02-21T08:15:15.2073726Z Downloaded torch 2026-02-21T08:15:16.4123962Z Downloaded nvidia-cusparse-cu12 2026-02-21T08:15:17.3659300Z Downloaded nvidia-cusolver-cu12 2026-02-21T08:15:17.9674755Z Downloaded nvidia-cusparselt-cu12 2026-02-21T08:15:20.1985686Z Downloaded nvidia-cublas-cu12 2026-02-21T08:15:21.6918735Z Downloaded nvidia-cudnn-cu12 2026-02-21T08:15:21.6998347Z Prepared 26 packages in 25.27s 2026-02-21T08:15:21.7043796Z warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance. 2026-02-21T08:15:21.7048086Z If the cache and target directories are on different filesystems, hardlinking may not be supported. 2026-02-21T08:15:21.7051718Z If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning. 
2026-02-21T08:15:24.6773495Z Installed 26 packages in 2.97s 2026-02-21T08:15:24.6774645Z + filelock==3.20.0 2026-02-21T08:15:24.6775088Z + fsspec==2025.12.0 2026-02-21T08:15:24.6775457Z + jinja2==3.1.6 2026-02-21T08:15:24.6775788Z + markupsafe==3.0.2 2026-02-21T08:15:24.6776131Z + mpmath==1.3.0 2026-02-21T08:15:24.6776442Z + networkx==3.6.1 2026-02-21T08:15:24.6778262Z + nvidia-cublas-cu12==12.8.4.1 2026-02-21T08:15:24.6778559Z + nvidia-cuda-cupti-cu12==12.8.90 2026-02-21T08:15:24.6778818Z + nvidia-cuda-nvrtc-cu12==12.8.93 2026-02-21T08:15:24.6779072Z + nvidia-cuda-runtime-cu12==12.8.90 2026-02-21T08:15:24.6779317Z + nvidia-cudnn-cu12==9.10.2.21 2026-02-21T08:15:24.6779543Z + nvidia-cufft-cu12==11.3.3.83 2026-02-21T08:15:24.6779779Z + nvidia-cufile-cu12==1.13.1.3 2026-02-21T08:15:24.6779998Z + nvidia-curand-cu12==10.3.9.90 2026-02-21T08:15:24.6780229Z + nvidia-cusolver-cu12==11.7.3.90 2026-02-21T08:15:24.6780463Z + nvidia-cusparse-cu12==12.5.8.93 2026-02-21T08:15:24.6780698Z + nvidia-cusparselt-cu12==0.7.1 2026-02-21T08:15:24.6780935Z + nvidia-nccl-cu12==2.27.5 2026-02-21T08:15:24.6781159Z + nvidia-nvjitlink-cu12==12.8.93 2026-02-21T08:15:24.6781387Z + nvidia-nvshmem-cu12==3.3.20 2026-02-21T08:15:24.6781624Z + nvidia-nvtx-cu12==12.8.90 2026-02-21T08:15:24.6781830Z + setuptools==70.2.0 2026-02-21T08:15:24.6782024Z + sympy==1.14.0 2026-02-21T08:15:24.6782212Z + torch==2.9.1+cu128 2026-02-21T08:15:24.6782390Z + triton==3.5.1 2026-02-21T08:15:24.6782578Z + typing-extensions==4.15.0 2026-02-21T08:15:24.8759895Z ##[group]Run source .venv/bin/activate 2026-02-21T08:15:24.8760222Z source .venv/bin/activate 2026-02-21T08:15:24.8760549Z SETUPTOOLS_SCM_PRETEND_VERSION="0.0.0" uv pip install -e .'[dev]' 2026-02-21T08:15:24.8760927Z python -c "import helion; print(helion.__name__)" 2026-02-21T08:15:24.8761380Z shell: bash -l {0} 2026-02-21T08:15:24.8761537Z env: 2026-02-21T08:15:24.8761694Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T08:15:24.8761930Z pythonLocation: 
/__w/_tool/Python/3.12.12/x64 2026-02-21T08:15:24.8762513Z PKG_CONFIG_PATH: /__w/_tool/Python/3.12.12/x64/lib/pkgconfig 2026-02-21T08:15:24.8762817Z Python_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:15:24.8763077Z Python2_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:15:24.8763338Z Python3_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:15:24.8763646Z LD_LIBRARY_PATH: /__w/_tool/Python/3.12.12/x64/lib:/usr/local/cuda/lib64 2026-02-21T08:15:24.8764053Z UV_PYTHON_INSTALL_DIR: /github/home/.local/share/uv/python 2026-02-21T08:15:24.8764324Z ##[endgroup] 2026-02-21T08:15:27.6762473Z Resolved 30 packages in 2.44s 2026-02-21T08:15:27.6774820Z Building helion @ file:///__w/helion/helion 2026-02-21T08:15:27.6917211Z Downloading virtualenv (5.6MiB) 2026-02-21T08:15:27.6930387Z Downloading scikit-learn (8.5MiB) 2026-02-21T08:15:27.6932775Z Downloading scipy (33.4MiB) 2026-02-21T08:15:27.7092284Z Downloading pygments (1.2MiB) 2026-02-21T08:15:27.7093518Z Downloading numpy (15.8MiB) 2026-02-21T08:15:27.8648391Z Built helion @ file:///__w/helion/helion 2026-02-21T08:15:27.8675290Z Downloaded pygments 2026-02-21T08:15:27.8684529Z Downloaded virtualenv 2026-02-21T08:15:28.3097301Z Downloaded scikit-learn 2026-02-21T08:15:28.3600836Z Downloaded numpy 2026-02-21T08:15:28.8807467Z Downloaded scipy 2026-02-21T08:15:28.8870856Z Prepared 27 packages in 1.21s 2026-02-21T08:15:28.8880053Z Uninstalled 1 package in 0.81ms 2026-02-21T08:15:28.8890703Z warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance. 2026-02-21T08:15:28.8891447Z If the cache and target directories are on different filesystems, hardlinking may not be supported. 2026-02-21T08:15:28.8892195Z If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning. 
2026-02-21T08:15:30.0540862Z Installed 29 packages in 1.16s 2026-02-21T08:15:30.0541475Z + cfgv==3.5.0 2026-02-21T08:15:30.0541701Z + distlib==0.4.0 2026-02-21T08:15:30.0541955Z + expecttest==0.3.0 2026-02-21T08:15:30.0542190Z + filecheck==1.0.3 2026-02-21T08:15:30.0542605Z - filelock==3.20.0 2026-02-21T08:15:30.0542877Z + filelock==3.24.3 2026-02-21T08:15:30.0543193Z + helion==0.0.0 (from file:///__w/helion/helion) 2026-02-21T08:15:30.0543856Z + hypothesis==6.151.9 2026-02-21T08:15:30.0544141Z + identify==2.6.16 2026-02-21T08:15:30.0544484Z + iniconfig==2.3.0 2026-02-21T08:15:30.0544877Z + joblib==1.5.3 2026-02-21T08:15:30.0545187Z + markdown-it-py==4.0.0 2026-02-21T08:15:30.0545532Z + mdurl==0.1.2 2026-02-21T08:15:30.0545886Z + nodeenv==1.10.0 2026-02-21T08:15:30.0546197Z + numpy==2.4.2 2026-02-21T08:15:30.0546897Z + packaging==26.0 2026-02-21T08:15:30.0547330Z + platformdirs==4.9.2 2026-02-21T08:15:30.0547623Z + pluggy==1.6.0 2026-02-21T08:15:30.0547948Z + pre-commit==4.5.1 2026-02-21T08:15:30.0548370Z + psutil==7.2.2 2026-02-21T08:15:30.0548662Z + pygments==2.19.2 2026-02-21T08:15:30.0549028Z + pytest==9.0.2 2026-02-21T08:15:30.0562128Z + pytest-timeout==2.4.0 2026-02-21T08:15:30.0562397Z + pyyaml==6.0.3 2026-02-21T08:15:30.0562588Z + rich==14.3.3 2026-02-21T08:15:30.0562757Z + scikit-learn==1.8.0 2026-02-21T08:15:30.0562945Z + scipy==1.17.0 2026-02-21T08:15:30.0563128Z + sortedcontainers==2.4.0 2026-02-21T08:15:30.0563329Z + threadpoolctl==3.6.0 2026-02-21T08:15:30.0563535Z + virtualenv==20.38.0 2026-02-21T08:15:44.2975721Z helion 2026-02-21T08:15:45.2238705Z ##[group]Run set -x 2026-02-21T08:15:45.2238969Z set -x 2026-02-21T08:15:45.2239157Z source .venv/bin/activate 2026-02-21T08:15:45.2239382Z uv pip install pip 2026-02-21T08:15:45.2239623Z uv pip install quack-kernels --no-deps 2026-02-21T08:15:45.2239942Z mkdir -p benchmarks/ && pushd benchmarks/ 2026-02-21T08:15:45.2240283Z git clone https://github.com/pytorch-labs/tritonbench/ 
2026-02-21T08:15:45.2240591Z pushd tritonbench/ 2026-02-21T08:15:45.2240817Z git submodule update --init --recursive 2026-02-21T08:15:45.2241356Z uv pip install -r requirements.txt 2026-02-21T08:15:45.2241604Z python install.py --liger 2026-02-21T08:15:45.2241830Z uv pip install -e . --no-deps 2026-02-21T08:15:45.2242043Z popd 2026-02-21T08:15:45.2242185Z popd 2026-02-21T08:15:45.2242479Z shell: bash -l {0} 2026-02-21T08:15:45.2242638Z env: 2026-02-21T08:15:45.2242823Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T08:15:45.2243063Z pythonLocation: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:15:45.2243371Z PKG_CONFIG_PATH: /__w/_tool/Python/3.12.12/x64/lib/pkgconfig 2026-02-21T08:15:45.2243684Z Python_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:15:45.2243962Z Python2_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:15:45.2244227Z Python3_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:15:45.2244547Z LD_LIBRARY_PATH: /__w/_tool/Python/3.12.12/x64/lib:/usr/local/cuda/lib64 2026-02-21T08:15:45.2244919Z UV_PYTHON_INSTALL_DIR: /github/home/.local/share/uv/python 2026-02-21T08:15:45.2245198Z ##[endgroup] 2026-02-21T08:15:45.3249617Z + source .venv/bin/activate 2026-02-21T08:15:45.3249963Z ++ '[' -z '' ']' 2026-02-21T08:15:45.3250149Z ++ '[' -n x ']' 2026-02-21T08:15:45.3250358Z ++ SCRIPT_PATH=.venv/bin/activate 2026-02-21T08:15:45.3250724Z ++ '[' .venv/bin/activate = /__w/_temp/4e95467d-c69a-4706-88f6-d50d54b852a1.sh ']' 2026-02-21T08:15:45.3251129Z ++ deactivate nondestructive 2026-02-21T08:15:45.3251352Z ++ unset -f pydoc 2026-02-21T08:15:45.3251543Z ++ '[' -z '' ']' 2026-02-21T08:15:45.3251701Z ++ '[' -z '' ']' 2026-02-21T08:15:45.3251867Z ++ hash -r 2026-02-21T08:15:45.3252016Z ++ '[' -z '' ']' 2026-02-21T08:15:45.3252189Z ++ unset VIRTUAL_ENV 2026-02-21T08:15:45.3252377Z ++ unset VIRTUAL_ENV_PROMPT 2026-02-21T08:15:45.3252619Z ++ '[' '!' 
nondestructive = nondestructive ']' 2026-02-21T08:15:45.3252888Z ++ VIRTUAL_ENV=/__w/helion/helion/.venv 2026-02-21T08:15:45.3255046Z ++ '[' linux-gnu = cygwin ']' 2026-02-21T08:15:45.3255266Z ++ '[' linux-gnu = msys ']' 2026-02-21T08:15:45.3256751Z ++ export VIRTUAL_ENV 2026-02-21T08:15:45.3256943Z ++ '[' -z '' ']' 2026-02-21T08:15:45.3257116Z ++ unset SCRIPT_PATH 2026-02-21T08:15:45.3258017Z ++ _OLD_VIRTUAL_PATH=/github/home/.local/share/uv/python:/__w/_tool/uv/0.10.4/x86_64:/github/home/.local/bin:/__w/_tool/Python/3.12.12/x64/bin:/__w/_tool/Python/3.12.12/x64:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2026-02-21T08:15:45.3259671Z ++ PATH=/__w/helion/helion/.venv/bin:/github/home/.local/share/uv/python:/__w/_tool/uv/0.10.4/x86_64:/github/home/.local/bin:/__w/_tool/Python/3.12.12/x64/bin:/__w/_tool/Python/3.12.12/x64:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 2026-02-21T08:15:45.3260622Z ++ export PATH 2026-02-21T08:15:45.3260811Z ++ '[' xhelion '!=' x ']' 2026-02-21T08:15:45.3261017Z ++ VIRTUAL_ENV_PROMPT=helion 2026-02-21T08:15:45.3261243Z ++ export VIRTUAL_ENV_PROMPT 2026-02-21T08:15:45.3261443Z ++ '[' -z '' ']' 2026-02-21T08:15:45.3261617Z ++ '[' -z '' ']' 2026-02-21T08:15:45.3261787Z ++ _OLD_VIRTUAL_PS1= 2026-02-21T08:15:45.3261973Z ++ PS1='(helion) ' 2026-02-21T08:15:45.3262157Z ++ export PS1 2026-02-21T08:15:45.3262322Z ++ alias pydoc 2026-02-21T08:15:45.3262489Z ++ true 2026-02-21T08:15:45.3262634Z ++ hash -r 2026-02-21T08:15:45.3262801Z + uv pip install pip 2026-02-21T08:15:45.4164916Z Resolved 1 package in 81ms 2026-02-21T08:15:45.4259189Z Downloading pip (1.7MiB) 2026-02-21T08:15:45.8479108Z Downloaded pip 2026-02-21T08:15:45.8596628Z Prepared 1 package in 443ms 2026-02-21T08:15:45.8954019Z warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance. 
2026-02-21T08:15:45.8954780Z If the cache and target directories are on different filesystems, hardlinking may not be supported. 2026-02-21T08:15:45.8955521Z If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning. 2026-02-21T08:15:46.0514011Z Installed 1 package in 191ms 2026-02-21T08:15:46.0514581Z + pip==26.0.1 2026-02-21T08:15:46.0553824Z + uv pip install quack-kernels --no-deps 2026-02-21T08:15:46.0749341Z Resolved 1 package in 10ms 2026-02-21T08:15:46.1729667Z Prepared 1 package in 97ms 2026-02-21T08:15:46.1760461Z warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance. 2026-02-21T08:15:46.1761244Z If the cache and target directories are on different filesystems, hardlinking may not be supported. 2026-02-21T08:15:46.1762005Z If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning. 2026-02-21T08:15:46.2086119Z Installed 1 package in 35ms 2026-02-21T08:15:46.2086426Z + quack-kernels==0.2.10 2026-02-21T08:15:46.2126561Z + mkdir -p benchmarks/ 2026-02-21T08:15:46.2141852Z + pushd benchmarks/ 2026-02-21T08:15:46.2142145Z /__w/helion/helion/benchmarks /__w/helion/helion 2026-02-21T08:15:46.2142516Z + git clone https://github.com/pytorch-labs/tritonbench/ 2026-02-21T08:15:46.2157597Z Cloning into 'tritonbench'... 
2026-02-21T08:15:48.7639923Z + pushd tritonbench/ 2026-02-21T08:15:48.7640201Z + git submodule update --init --recursive 2026-02-21T08:15:48.7640707Z /__w/helion/helion/benchmarks/tritonbench /__w/helion/helion/benchmarks /__w/helion/helion 2026-02-21T08:15:48.8345077Z Submodule 'submodules/ThunderKittens' (https://github.com/HazyResearch/ThunderKittens.git) registered for path 'submodules/ThunderKittens' 2026-02-21T08:15:48.9081274Z Submodule 'submodules/aiter' (https://github.com/ROCm/aiter.git) registered for path 'submodules/aiter' 2026-02-21T08:15:49.0610341Z Submodule 'submodules/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'submodules/cutlass' 2026-02-21T08:15:49.2489930Z Submodule 'submodules/flash-attention' (https://github.com/Dao-AILab/flash-attention.git) registered for path 'submodules/flash-attention' 2026-02-21T08:15:49.3524262Z Submodule 'submodules/generative-recommenders' (https://github.com/facebookresearch/generative-recommenders.git) registered for path 'submodules/generative-recommenders' 2026-02-21T08:15:49.4232769Z Submodule 'submodules/xformers' (https://github.com/facebookresearch/xformers.git) registered for path 'submodules/xformers' 2026-02-21T08:15:49.4265064Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/ThunderKittens'... 2026-02-21T08:15:55.0992206Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/aiter'... 2026-02-21T08:16:26.5251231Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/cutlass'... 2026-02-21T08:16:32.6830382Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/flash-attention'... 2026-02-21T08:16:35.2660884Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/generative-recommenders'... 2026-02-21T08:16:38.1836194Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/xformers'... 
2026-02-21T08:16:44.1362572Z Submodule path 'submodules/ThunderKittens': checked out '25f7568450b412a1984a4f619fb28373df06fa1b' 2026-02-21T08:16:44.6221612Z Submodule path 'submodules/aiter': checked out '1f5b378dcc9d9b0bcd9456c8c767b7424a5e8190' 2026-02-21T08:16:44.8017404Z Submodule '3rdparty/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'submodules/aiter/3rdparty/composable_kernel' 2026-02-21T08:16:44.8055111Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/aiter/3rdparty/composable_kernel'... 2026-02-21T08:16:55.4513619Z Submodule path 'submodules/aiter/3rdparty/composable_kernel': checked out 'e31a7a4f29b371c32ea9daf9211b6ae1fed2fa40' 2026-02-21T08:16:56.4931336Z Submodule path 'submodules/cutlass': checked out 'ad7b2f5e84fcfa124cb02b91d5bd26d238c0459e' 2026-02-21T08:16:56.6172732Z Submodule path 'submodules/flash-attention': checked out '43375aab2893018dfb7950db1cfa623c14946ad6' 2026-02-21T08:16:56.7103209Z Submodule 'csrc/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'submodules/flash-attention/csrc/composable_kernel' 2026-02-21T08:16:56.7765800Z Submodule 'csrc/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'submodules/flash-attention/csrc/cutlass' 2026-02-21T08:16:56.7804985Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/flash-attention/csrc/composable_kernel'... 2026-02-21T08:17:06.2697913Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/flash-attention/csrc/cutlass'... 
2026-02-21T08:17:13.9239343Z Submodule path 'submodules/flash-attention/csrc/composable_kernel': checked out 'e8709c24f403173ad21a2da907d1347957e324fb' 2026-02-21T08:17:15.0853126Z Submodule path 'submodules/flash-attention/csrc/cutlass': checked out 'b1d6e2c9b334dfa811e4183dfbd02419249e4b52' 2026-02-21T08:17:15.3977821Z Submodule path 'submodules/generative-recommenders': checked out '88512dbd71b053226bc4ef8ec1630e3db53e55e5' 2026-02-21T08:17:15.5220041Z Submodule 'generative_recommenders/ops/cpp/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'submodules/generative-recommenders/generative_recommenders/ops/cpp/cutlass' 2026-02-21T08:17:15.5250882Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/generative-recommenders/generative_recommenders/ops/cpp/cutlass'... 2026-02-21T08:17:24.9706021Z Submodule path 'submodules/generative-recommenders/generative_recommenders/ops/cpp/cutlass': checked out 'dc4817921edda44a549197ff3a9dcf5df0636e7b' 2026-02-21T08:17:25.0917946Z Submodule path 'submodules/xformers': checked out '8fc8ec5a4d6498ff81c0c418b89bbaf133ae3a44' 2026-02-21T08:17:25.1721165Z Submodule 'third_party/composable_kernel_tiled' (https://github.com/ROCm/composable_kernel.git) registered for path 'submodules/xformers/third_party/composable_kernel_tiled' 2026-02-21T08:17:25.2876639Z Submodule 'third_party/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'submodules/xformers/third_party/cutlass' 2026-02-21T08:17:25.4002673Z Submodule 'third_party/flash-attention' (https://github.com/Dao-AILab/flash-attention.git) registered for path 'submodules/xformers/third_party/flash-attention' 2026-02-21T08:17:25.8526920Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/xformers/third_party/composable_kernel_tiled'... 2026-02-21T08:17:32.5227916Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/xformers/third_party/cutlass'... 
2026-02-21T08:17:36.9202830Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/xformers/third_party/flash-attention'... 2026-02-21T08:17:38.8083616Z Submodule path 'submodules/xformers/third_party/composable_kernel_tiled': checked out '4f54fa30583704f34da2ac50372d524cae6bad7d' 2026-02-21T08:17:39.6509110Z Submodule path 'submodules/xformers/third_party/cutlass': checked out 'e9627ce55b42fd2599f58cd4396da9380954def0' 2026-02-21T08:17:39.7279937Z Submodule path 'submodules/xformers/third_party/flash-attention': checked out '979702c87a8713a8e0a5e9fee122b90d2ef13be5' 2026-02-21T08:17:39.7896124Z Submodule 'csrc/composable_kernel' (https://github.com/ROCm/composable_kernel.git) registered for path 'submodules/xformers/third_party/flash-attention/csrc/composable_kernel' 2026-02-21T08:17:39.7964928Z Submodule 'csrc/cutlass' (https://github.com/NVIDIA/cutlass.git) registered for path 'submodules/xformers/third_party/flash-attention/csrc/cutlass' 2026-02-21T08:17:39.8000486Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/xformers/third_party/flash-attention/csrc/composable_kernel'... 2026-02-21T08:17:46.1358574Z Cloning into '/__w/helion/helion/benchmarks/tritonbench/submodules/xformers/third_party/flash-attention/csrc/cutlass'... 
2026-02-21T08:17:53.6387266Z Submodule path 'submodules/xformers/third_party/flash-attention/csrc/composable_kernel': checked out '888317e698e9803c62bd38568abc9e05d7709f33' 2026-02-21T08:17:54.5799796Z Submodule path 'submodules/xformers/third_party/flash-attention/csrc/cutlass': checked out 'c506e16788cb08416a4a57e11a9067beeee29420' 2026-02-21T08:17:54.5843293Z + uv pip install -r requirements.txt 2026-02-21T08:17:54.5922224Z Using Python 3.12.12 environment at: /__w/helion/helion/.venv 2026-02-21T08:17:55.3737301Z Resolved 30 packages in 779ms 2026-02-21T08:17:55.3968129Z Downloading fonttools (4.7MiB) 2026-02-21T08:17:55.5908999Z Downloading hf-xet (3.2MiB) 2026-02-21T08:17:55.5912454Z Downloading tokenizers (3.0MiB) 2026-02-21T08:17:55.5912959Z Downloading kiwisolver (1.4MiB) 2026-02-21T08:17:55.5946984Z Downloading transformers (10.3MiB) 2026-02-21T08:17:55.5949055Z Downloading matplotlib (8.3MiB) 2026-02-21T08:17:55.5950931Z Downloading pillow (6.7MiB) 2026-02-21T08:17:55.7124405Z Downloaded kiwisolver 2026-02-21T08:17:55.7716096Z Downloaded tokenizers 2026-02-21T08:17:55.7818550Z Downloaded hf-xet 2026-02-21T08:17:55.8615752Z Downloaded pillow 2026-02-21T08:17:55.8754900Z Downloaded fonttools 2026-02-21T08:17:55.9037679Z Downloaded matplotlib 2026-02-21T08:17:56.3840721Z Downloaded transformers 2026-02-21T08:17:56.4096708Z Prepared 23 packages in 1.03s 2026-02-21T08:17:56.4139719Z warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance. 2026-02-21T08:17:56.4140523Z If the cache and target directories are on different filesystems, hardlinking may not be supported. 2026-02-21T08:17:56.4141288Z If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning. 
2026-02-21T08:17:56.9486789Z Installed 23 packages in 538ms 2026-02-21T08:17:56.9487070Z + certifi==2026.1.4 2026-02-21T08:17:56.9487301Z + charset-normalizer==3.4.4 2026-02-21T08:17:56.9487527Z + contourpy==1.3.3 2026-02-21T08:17:56.9487707Z + cycler==0.12.1 2026-02-21T08:17:56.9487882Z + fonttools==4.61.1 2026-02-21T08:17:56.9488059Z + hf-xet==1.2.0 2026-02-21T08:17:56.9488239Z + huggingface-hub==0.36.2 2026-02-21T08:17:56.9488438Z + idna==3.11 2026-02-21T08:17:56.9488638Z + kiwisolver==1.4.9 2026-02-21T08:17:56.9488835Z + matplotlib==3.10.8 2026-02-21T08:17:56.9489022Z + nvidia-ml-py==13.590.48 2026-02-21T08:17:56.9489219Z + pillow==12.1.1 2026-02-21T08:17:56.9489387Z + pyparsing==3.3.2 2026-02-21T08:17:56.9491608Z + python-dateutil==2.9.0.post0 2026-02-21T08:17:56.9491909Z + regex==2026.2.19 2026-02-21T08:17:56.9492128Z + requests==2.32.5 2026-02-21T08:17:56.9492332Z + safetensors==0.7.0 2026-02-21T08:17:56.9492542Z + six==1.17.0 2026-02-21T08:17:56.9492744Z + tabulate==0.9.0 2026-02-21T08:17:56.9492935Z + tokenizers==0.21.4 2026-02-21T08:17:56.9493147Z + tqdm==4.67.3 2026-02-21T08:17:56.9493334Z + transformers==4.53.0 2026-02-21T08:17:56.9493539Z + urllib3==2.6.3 2026-02-21T08:17:56.9597089Z + python install.py --liger 2026-02-21T08:18:02.4953076Z Using Python 3.12.12 environment at: /__w/helion/helion/.venv 2026-02-21T08:18:02.4994153Z Audited 6 packages in 4ms 2026-02-21T08:18:03.0560110Z INFO:__main__:[tritonbench] installing liger-kernels... 2026-02-21T08:18:03.0635522Z Using Python 3.12.12 environment at: /__w/helion/helion/.venv 2026-02-21T08:18:03.1157887Z Resolved 1 package in 50ms 2026-02-21T08:18:03.2120870Z Prepared 1 package in 96ms 2026-02-21T08:18:03.2151156Z warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance. 2026-02-21T08:18:03.2151872Z If the cache and target directories are on different filesystems, hardlinking may not be supported. 
2026-02-21T08:18:03.2152581Z If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning. 2026-02-21T08:18:03.2685295Z Installed 1 package in 56ms 2026-02-21T08:18:03.2685616Z + liger-kernel-nightly==0.7.0.dev20260219183429 2026-02-21T08:18:03.2726959Z INFO:__main__:[tritonbench] installation complete! 2026-02-21T08:18:03.6290632Z + uv pip install -e . --no-deps 2026-02-21T08:18:04.0022548Z Using Python 3.12.12 environment at: /__w/helion/helion/.venv 2026-02-21T08:18:04.0055564Z Resolved 1 package in 1ms 2026-02-21T08:18:04.0226052Z Building tritonbench @ file:///__w/helion/helion/benchmarks/tritonbench 2026-02-21T08:18:05.9558575Z Built tritonbench @ file:///__w/helion/helion/benchmarks/tritonbench 2026-02-21T08:18:05.9758118Z Prepared 1 package in 1.97s 2026-02-21T08:18:05.9762299Z warning: Failed to hardlink files; falling back to full copy. This may lead to degraded performance. 2026-02-21T08:18:05.9763018Z If the cache and target directories are on different filesystems, hardlinking may not be supported. 2026-02-21T08:18:05.9763784Z If this is intentional, set `export UV_LINK_MODE=copy` or use `--link-mode=copy` to suppress this warning. 
2026-02-21T08:18:06.0817190Z Installed 1 package in 105ms 2026-02-21T08:18:06.0817795Z + tritonbench==0.0.1 (from file:///__w/helion/helion/benchmarks/tritonbench) 2026-02-21T08:18:06.0964510Z + popd 2026-02-21T08:18:06.0964688Z + popd 2026-02-21T08:18:06.0964921Z /__w/helion/helion/benchmarks /__w/helion/helion 2026-02-21T08:18:06.0965222Z /__w/helion/helion 2026-02-21T08:18:06.1028254Z ##[group]Run rm -rf /tmp/torchinductor_*/ || true 2026-02-21T08:18:06.1028780Z rm -rf /tmp/torchinductor_*/ || true 2026-02-21T08:18:06.1029026Z  2026-02-21T08:18:06.1029200Z source .venv/bin/activate 2026-02-21T08:18:06.1029395Z  2026-02-21T08:18:06.1029585Z TEST_REPORTS_DIR=$(pwd)/test/test-reports 2026-02-21T08:18:06.1029885Z mkdir -p "$TEST_REPORTS_DIR" 2026-02-21T08:18:06.1030105Z echo "$TEST_REPORTS_DIR" 2026-02-21T08:18:06.1030294Z  2026-02-21T08:18:06.1030445Z KERNEL_LIST="int4_gemm" 2026-02-21T08:18:06.1030665Z for kernel in ${KERNEL_LIST//,/ }; do 2026-02-21T08:18:06.1030937Z  echo "==========================================" 2026-02-21T08:18:06.1031237Z  echo "Running benchmark for kernel: $kernel" 2026-02-21T08:18:06.1031515Z  echo "==========================================" 2026-02-21T08:18:06.1031747Z  2026-02-21T08:18:06.1032003Z  # Get available implementations and baseline for this kernel 2026-02-21T08:18:06.1032542Z  KERNEL_INFO=$(python benchmarks/run.py --list-impls-for-benchmark-ci --op $kernel | grep "^$kernel:") 2026-02-21T08:18:06.1033082Z  IMPLS=$(echo "$KERNEL_INFO" | sed -n 's/.*impls=\([^ ]*\).*/\1/p') 2026-02-21T08:18:06.1033488Z  BASELINE=$(echo "$KERNEL_INFO" | sed -n 's/.*baseline=\([^ ]*\).*/\1/p') 2026-02-21T08:18:06.1033794Z  2026-02-21T08:18:06.1033945Z  if [[ -z "$IMPLS" ]]; then 2026-02-21T08:18:06.1034270Z  echo "Warning: No implementations found for kernel $kernel, skipping..." 
2026-02-21T08:18:06.1034607Z  continue 2026-02-21T08:18:06.1034766Z  fi 2026-02-21T08:18:06.1034932Z  if [[ -z "$BASELINE" ]]; then 2026-02-21T08:18:06.1035259Z  echo "Warning: No baseline found for kernel $kernel, skipping..." 2026-02-21T08:18:06.1035564Z  continue 2026-02-21T08:18:06.1035735Z  fi 2026-02-21T08:18:06.1035905Z  echo "Using baseline: $BASELINE" 2026-02-21T08:18:06.1036194Z  echo "Available implementations for $kernel: $IMPLS" 2026-02-21T08:18:06.1036593Z  2026-02-21T08:18:06.1036793Z  # Do autotuning but do not record the results 2026-02-21T08:18:06.1037055Z  python benchmarks/run.py \ 2026-02-21T08:18:06.1037283Z  --op $kernel \ 2026-02-21T08:18:06.1037513Z  --metrics speedup,accuracy \ 2026-02-21T08:18:06.1037779Z  --latency-measure-mode triton_do_bench \ 2026-02-21T08:18:06.1038051Z  --cudagraph \ 2026-02-21T08:18:06.1038243Z  --only $IMPLS \ 2026-02-21T08:18:06.1038478Z  --only-match-mode prefix-with-baseline \ 2026-02-21T08:18:06.1038733Z  --baseline $BASELINE \ 2026-02-21T08:18:06.1038947Z  --atol 1e-2 \ 2026-02-21T08:18:06.1039136Z  --rtol 1e-2 \ 2026-02-21T08:18:06.1039357Z  --input-sample-mode equally-spaced-k \ 2026-02-21T08:18:06.1039607Z  --keep-going \ 2026-02-21T08:18:06.1039787Z   2026-02-21T08:18:06.1040185Z  2026-02-21T08:18:06.1040325Z  # Relax the GPU 2026-02-21T08:18:06.1040503Z  sleep 2m 2026-02-21T08:18:06.1040655Z  2026-02-21T08:18:06.1040826Z  # Run again with cache and record results 2026-02-21T08:18:06.1041217Z  HELION_PRINT_OUTPUT_CODE=1 HELION_ASSERT_CACHE_HIT=1 python benchmarks/run.py \ 2026-02-21T08:18:06.1041583Z  --op $kernel \ 2026-02-21T08:18:06.1041804Z  --metrics speedup,accuracy \ 2026-02-21T08:18:06.1042076Z  --latency-measure-mode triton_do_bench \ 2026-02-21T08:18:06.1042368Z  --cudagraph \ 2026-02-21T08:18:06.1042568Z  --only $IMPLS \ 2026-02-21T08:18:06.1043033Z  --only-match-mode prefix-with-baseline \ 2026-02-21T08:18:06.1043331Z  --baseline $BASELINE \ 2026-02-21T08:18:06.1043542Z  --atol 1e-2 \ 
2026-02-21T08:18:06.1043730Z  --rtol 1e-2 \ 2026-02-21T08:18:06.1043959Z  --input-sample-mode equally-spaced-k \ 2026-02-21T08:18:06.1044265Z  --output "$TEST_REPORTS_DIR/helionbench.json" \ 2026-02-21T08:18:06.1044546Z  --append-to-output \ 2026-02-21T08:18:06.1044757Z  --keep-going \ 2026-02-21T08:18:06.1044946Z   2026-02-21T08:18:06.1045091Z  2026-02-21T08:18:06.1045299Z  echo "✅ Completed benchmark for kernel: $kernel" 2026-02-21T08:18:06.1045567Z done 2026-02-21T08:18:06.1045713Z  2026-02-21T08:18:06.1045919Z if [[ ! -s "$TEST_REPORTS_DIR/helionbench.json" ]]; then 2026-02-21T08:18:06.1046247Z  echo "❌ helionbench.json is missing or empty" 2026-02-21T08:18:06.1046624Z  exit 1 2026-02-21T08:18:06.1046779Z fi 2026-02-21T08:18:06.1046961Z cat "$TEST_REPORTS_DIR/helionbench.json" 2026-02-21T08:18:06.1047332Z shell: bash -l {0} 2026-02-21T08:18:06.1047493Z env: 2026-02-21T08:18:06.1047649Z HELION_AUTOTUNE_LOG_LEVEL: INFO 2026-02-21T08:18:06.1047889Z pythonLocation: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:18:06.1048202Z PKG_CONFIG_PATH: /__w/_tool/Python/3.12.12/x64/lib/pkgconfig 2026-02-21T08:18:06.1048499Z Python_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:18:06.1048765Z Python2_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:18:06.1049027Z Python3_ROOT_DIR: /__w/_tool/Python/3.12.12/x64 2026-02-21T08:18:06.1049401Z LD_LIBRARY_PATH: /__w/_tool/Python/3.12.12/x64/lib:/usr/local/cuda/lib64 2026-02-21T08:18:06.1049766Z UV_PYTHON_INSTALL_DIR: /github/home/.local/share/uv/python 2026-02-21T08:18:06.1050039Z ##[endgroup] 2026-02-21T08:18:06.1885961Z /__w/helion/helion/test/test-reports 2026-02-21T08:18:06.1886277Z ========================================== 2026-02-21T08:18:06.1886737Z Running benchmark for kernel: int4_gemm 2026-02-21T08:18:06.1887008Z ========================================== 2026-02-21T08:18:12.4372211Z Using baseline: preprocessed_eager_int4_gemm 2026-02-21T08:18:12.4373035Z Available implementations for int4_gemm: 
helion_int4_gemm_tritonbench,preprocessed_torch_compile_int4_gemm,preprocessed_triton_int4_gemm 2026-02-21T08:18:19.2929792Z Applying custom args for int4_gemm: {'num_inputs': 10} 2026-02-21T08:18:19.3448990Z Running int4_gemm benchmark with Helion implementation... 2026-02-21T08:18:19.3449273Z 2026-02-21T08:18:19.5942144Z Equally-spaced-k mode: Selected 10 equally spaced inputs (total available: 32) 2026-02-21T08:18:19.5942803Z WARNING:tritonbench.utils.triton_op:Input IDs to run: [0, 3, 7, 10, 14, 17, 21, 24, 28, 31] 2026-02-21T08:18:19.5950243Z 2026-02-21T08:18:19.5959353Z 0%| | 0/10 [00:00; 2026-02-21T08:52:40.5981478Z .reg .b16 %rs<209>; 2026-02-21T08:52:40.5981645Z .reg .b32 %r<1745>; 2026-02-21T08:52:40.5982216Z [77s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:52:40.5984015Z Config: @helion.kernel(config=helion.Config(block_sizes=[16, 64, 16], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[8], load_eviction_policies=['first', 'last'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=2, num_stages=3, num_warps=8, pid_type='persistent_interleaved', range_flattens=[None, False], range_multi_buffers=[None, True], range_num_stages=[1, 0], range_unroll_factors=[3, 2], range_warp_specializes=[]), static_shapes=True) 2026-02-21T08:52:40.5986009Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:52:40.5986291Z `ptxas` stderr: 2026-02-21T08:52:40.5987061Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 591 in function _helion_matmul_bf16_int4. Try to compile with register target of 34 or higher. 
2026-02-21T08:52:40.5987735Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:52:40.5987936Z 2026-02-21T08:52:40.5988458Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmppxcbvji7.ptx -o /tmp/tmppxcbvji7.ptx.o 2026-02-21T08:52:40.5989149Z 2026-02-21T08:52:40.5989565Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T08:52:40.5989867Z .reg .b64 %rd<285>; 2026-02-21T08:52:40.5990201Z .loc 1 19 0 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:19:0 2026-02-21T08:52:40.5990576Z $L__func_begin0: 2026-02-21T08:52:40.5990878Z .loc 1 19 0 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:19:0 2026-02-21T08:52:40.5991170Z 2026-02-21T08:52:40.5991231Z // %bb.0: 2026-02-21T08:52:40.5991429Z ld.param.b64 %rd42, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T08:52:40.5991680Z $L__tmp0: 2026-02-21T08:52:40.5991961Z .loc 1 21 67 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:21:67 2026-02-21T08:52:40.5992322Z mov.u32 %r1711, %ctaid.x; 2026-02-21T08:52:40.5992563Z ld.param.b64 %rd44, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T08:52:40.5992858Z ld.param.b64 %rd62, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T08:52:40.5993101Z mov.u32 %r152, %ctaid.y; 2026-02-21T08:52:40.5993331Z ld.param.b64 %rd79, [_helion_matmul_bf16_int4_param_3]; 2026-02-21T08:52:40.5993591Z mov.u32 %r153, %ctaid.z; 2026-02-21T08:52:40.5993759Z mov.u32 %r154, %nctaid.x; 2026-02-21T08:52:40.5993940Z mov.u32 %r155, %nctaid.y; 2026-02-21T08:52:40.5994116Z mad.lo.s32 %r156, %r153, %r155, %r152; 2026-02-21T08:52:40.5994334Z mad.lo.s32 %r157, %r156, %r154, %r1711; 2026-02-21T08:52:40.5994537Z shl.b32 %r158, %r157, 8; 2026-02-21T08:52:40.5994710Z cvt.s64.s32 %rd80, %r158; 2026-02-21T08:52:40.5994883Z add.s64 %rd58, %rd79, %rd80; 2026-02-21T08:52:40.5995062Z mov.u32 %r2, %tid.x; 2026-02-21T08:52:40.5995227Z setp.lt.u32 %p1, %r2, 32; 
2026-02-21T08:52:40.5995404Z shl.b32 %r159, %r2, 2; 2026-02-21T08:52:40.5995603Z mov.b32 %r160, global_smem; 2026-02-21T08:52:40.5995791Z add.s32 %r136, %r160, %r159; 2026-02-21T08:52:40.5995975Z mov.b32 %r145, 0; 2026-02-21T08:52:40.5996132Z // begin inline asm 2026-02-21T08:52:40.5996315Z @%p1 st.shared.b32 [ %r136 + 0 ], %r145; 2026-02-21T08:52:40.5996681Z // end inline asm 2026-02-21T08:52:40.5996852Z bar.warp.sync -1; 2026-02-21T08:52:40.5997024Z setp.eq.b32 %p191, %r2, 0; 2026-02-21T08:52:40.5997214Z cvt.u64.u32 %rd43, %r160; 2026-02-21T08:52:40.5997393Z // begin inline asm 2026-02-21T08:52:40.5997742Z @%p191 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd43 + 0 ], %rd44; 2026-02-21T08:52:40.5998104Z // end inline asm 2026-02-21T08:52:40.5998269Z // begin inline asm 2026-02-21T08:52:40.5998559Z @%p191 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x1; 2026-02-21T08:52:40.5998880Z // end inline asm 2026-02-21T08:52:40.5999036Z mov.b32 %r138, 16; 2026-02-21T08:52:40.5999189Z // begin inline asm 2026-02-21T08:52:40.5999483Z @%p191 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x0, %r138; 2026-02-21T08:52:40.5999837Z // end inline asm 2026-02-21T08:52:40.5999990Z // begin inline asm 2026-02-21T08:52:40.6000304Z @%p191 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x1, %r138; 2026-02-21T08:52:40.6000645Z // end inline asm 2026-02-21T08:52:40.6000801Z mov.b32 %r140, 7168; 2026-02-21T08:52:40.6000960Z // begin inline asm 2026-02-21T08:52:40.6001425Z @%p191 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x0, %r140; 2026-02-21T08:52:40.6001784Z // end inline asm 2026-02-21T08:52:40.6001942Z mov.b32 %r141, 4096; 2026-02-21T08:52:40.6002105Z // begin inline asm 2026-02-21T08:52:40.6002398Z @%p191 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x1, %r141; 2026-02-21T08:52:40.6002756Z // end inline asm 2026-02-21T08:52:40.6002900Z mov.b64 %rd51, 
7168; 2026-02-21T08:52:40.6003057Z // begin inline asm 2026-02-21T08:52:40.6003366Z @%p191 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd43 + 0 ], 0x0, %rd51; 2026-02-21T08:52:40.6003729Z // end inline asm 2026-02-21T08:52:40.6003877Z mov.b32 %r142, 1; 2026-02-21T08:52:40.6004024Z // begin inline asm 2026-02-21T08:52:40.6004469Z @%p191 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x0, %r142; 2026-02-21T08:52:40.6004845Z // end inline asm 2026-02-21T08:52:40.6004995Z // begin inline asm 2026-02-21T08:52:40.6005323Z @%p191 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x1, %r142; 2026-02-21T08:52:40.6005689Z // end inline asm 2026-02-21T08:52:40.6005834Z // begin inline asm 2026-02-21T08:52:40.6006118Z @%p191 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x0; 2026-02-21T08:52:40.6006586Z // end inline asm 2026-02-21T08:52:40.6006743Z // begin inline asm 2026-02-21T08:52:40.6007067Z @%p191 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x0; 2026-02-21T08:52:40.6007417Z // end inline asm 2026-02-21T08:52:40.6007566Z // begin inline asm 2026-02-21T08:52:40.6007855Z @%p191 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x0; 2026-02-21T08:52:40.6008194Z // end inline asm 2026-02-21T08:52:40.6008349Z // begin inline asm 2026-02-21T08:52:40.6008623Z @%p191 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x0; 2026-02-21T08:52:40.6008950Z // end inline asm 2026-02-21T08:52:40.6009098Z // begin inline asm 2026-02-21T08:52:40.6009531Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd58 + 0 ], [ %rd43 + 0 ], 0x80; 2026-02-21T08:52:40.6010008Z // end inline asm 2026-02-21T08:52:40.6010161Z // begin inline asm 2026-02-21T08:52:40.6010411Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd58 + 0 ], 0x80; 2026-02-21T08:52:40.6010728Z @%p1 
cp.async.bulk.commit_group ; 2026-02-21T08:52:40.6010959Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:52:40.6011166Z // end inline asm 2026-02-21T08:52:40.6011315Z bar.sync 0; 2026-02-21T08:52:40.6011472Z cvta.global.u64 %rd180, %rd58; 2026-02-21T08:52:40.6011819Z .loc 1 23 65 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:23:65 2026-02-21T08:52:40.6012190Z add.s64 %rd76, %rd58, 128; 2026-02-21T08:52:40.6012373Z bar.sync 0; 2026-02-21T08:52:40.6012524Z // begin inline asm 2026-02-21T08:52:40.6012694Z @%p1 st.shared.b32 [ %r136 + 0 ], %r145; 2026-02-21T08:52:40.6012908Z // end inline asm 2026-02-21T08:52:40.6013108Z bar.warp.sync -1; 2026-02-21T08:52:40.6013282Z // begin inline asm 2026-02-21T08:52:40.6013605Z @%p191 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd43 + 0 ], %rd62; 2026-02-21T08:52:40.6013958Z // end inline asm 2026-02-21T08:52:40.6014110Z // begin inline asm 2026-02-21T08:52:40.6014382Z @%p191 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x1; 2026-02-21T08:52:40.6014692Z // end inline asm 2026-02-21T08:52:40.6014845Z // begin inline asm 2026-02-21T08:52:40.6015128Z @%p191 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x0, %r138; 2026-02-21T08:52:40.6015462Z // end inline asm 2026-02-21T08:52:40.6015604Z mov.b32 %r147, 64; 2026-02-21T08:52:40.6015765Z // begin inline asm 2026-02-21T08:52:40.6016043Z @%p191 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x1, %r147; 2026-02-21T08:52:40.6016374Z // end inline asm 2026-02-21T08:52:40.6016868Z // begin inline asm 2026-02-21T08:52:40.6017175Z @%p191 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x0, %r140; 2026-02-21T08:52:40.6017526Z // end inline asm 2026-02-21T08:52:40.6017673Z // begin inline asm 2026-02-21T08:52:40.6017973Z @%p191 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x1, %r147; 2026-02-21T08:52:40.6018314Z // end inline asm 
2026-02-21T08:52:40.6018466Z mov.b64 %rd69, 14336; 2026-02-21T08:52:40.6018634Z // begin inline asm 2026-02-21T08:52:40.6018943Z @%p191 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd43 + 0 ], 0x0, %rd69; 2026-02-21T08:52:40.6019305Z // end inline asm 2026-02-21T08:52:40.6019452Z // begin inline asm 2026-02-21T08:52:40.6019902Z @%p191 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x0, %r142; 2026-02-21T08:52:40.6020270Z // end inline asm 2026-02-21T08:52:40.6020420Z // begin inline asm 2026-02-21T08:52:40.6020745Z @%p191 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x1, %r142; 2026-02-21T08:52:40.6021111Z // end inline asm 2026-02-21T08:52:40.6021263Z // begin inline asm 2026-02-21T08:52:40.6021541Z @%p191 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd43 + 0 ], 0xa; 2026-02-21T08:52:40.6021875Z // end inline asm 2026-02-21T08:52:40.6022023Z // begin inline asm 2026-02-21T08:52:40.6022327Z @%p191 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x0; 2026-02-21T08:52:40.6022680Z // end inline asm 2026-02-21T08:52:40.6022845Z // begin inline asm 2026-02-21T08:52:40.6023141Z @%p191 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x1; 2026-02-21T08:52:40.6023482Z // end inline asm 2026-02-21T08:52:40.6023634Z // begin inline asm 2026-02-21T08:52:40.6023909Z @%p191 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x0; 2026-02-21T08:52:40.6024237Z // end inline asm 2026-02-21T08:52:40.6024382Z // begin inline asm 2026-02-21T08:52:40.6024911Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd76 + 0 ], [ %rd43 + 0 ], 0x80; 2026-02-21T08:52:40.6025408Z // end inline asm 2026-02-21T08:52:40.6025561Z // begin inline asm 2026-02-21T08:52:40.6025814Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd76 + 0 ], 0x80; 2026-02-21T08:52:40.6026124Z @%p1 
cp.async.bulk.commit_group ; 2026-02-21T08:52:40.6026352Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T08:52:40.6026699Z // end inline asm 2026-02-21T08:52:40.6026856Z bar.sync 0; 2026-02-21T08:52:40.6027010Z cvta.global.u64 %rd177, %rd76; 2026-02-21T08:52:40.6027359Z .loc 1 28 97 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:28:97 2026-02-21T08:52:40.6027730Z sub.s32 %r161, 711, %r1711; 2026-02-21T08:52:40.6027916Z mul.hi.u32 %r162, %r161, 1041204193; 2026-02-21T08:52:40.6028125Z shr.u32 %r163, %r162, 6; 2026-02-21T08:52:40.6028299Z mul.hi.u32 %r164, %r163, 1431655766; 2026-02-21T08:52:40.6028628Z mad.lo.s32 %r1736, %r164, 792, %r1711; 2026-02-21T08:52:40.6028974Z .loc 1 41 45 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:41:45 2026-02-21T08:52:40.6029329Z and.b32 %r4, %r2, 31; 2026-02-21T08:52:40.6029498Z shr.u32 %r5, %r2, 5; 2026-02-21T08:52:40.6029652Z shr.u32 %r6, %r2, 3; 2026-02-21T08:52:40.6029814Z bfe.u32 %r7, %r2, 3, 5; 2026-02-21T08:52:40.6029979Z or.b32 %r8, %r7, 32; 2026-02-21T08:52:40.6030286Z .loc 1 54 38 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:54:38 2026-02-21T08:52:40.6030636Z and.b32 %r9, %r2, 7; 2026-02-21T08:52:40.6030795Z shl.b32 %r10, %r9, 2; 2026-02-21T08:52:40.6031108Z .loc 1 72 38 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:72:38 2026-02-21T08:52:40.6031457Z and.b32 %r11, %r2, 16; 2026-02-21T08:52:40.6031765Z .loc 1 28 97 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:28:97 2026-02-21T08:52:40.6032283Z setp.ge.s32 %p37, %r1711, %r1736; 2026-02-21T08:52:40.6032484Z shl.b32 %r1695, %r2, 3; 2026-02-21T08:52:40.6032651Z shr.u32 %r1696, %r2, 1; 2026-02-21T08:52:40.6032816Z shl.b32 %r1697, %r2, 5; 2026-02-21T08:52:40.6032989Z shl.b32 %r1698, %r2, 4; 2026-02-21T08:52:40.6033155Z shl.b32 %r1699, %r2, 1; 2026-02-21T08:52:40.6033317Z and.b32 %r1700, %r2, 24; 2026-02-21T08:52:40.6033490Z and.b32 %r1701, %r2, 15; 2026-02-21T08:52:40.6033651Z shl.b32 %r1702, %r9, 4; 
2026-02-21T08:52:40.6033818Z shr.u32 %r1703, %r2, 2; 2026-02-21T08:52:40.6033984Z shl.b32 %r1704, %r5, 3; 2026-02-21T08:52:40.6034143Z shl.b32 %r1705, %r5, 7; 2026-02-21T08:52:40.6034311Z shl.b32 %r1706, %r4, 3; 2026-02-21T08:52:40.6034473Z shl.b32 %r1707, %r2, 6; 2026-02-21T08:52:40.6034804Z bfe.s32 %r1708, %r2, 2, 1; 2026-02-21T08:52:40.6034984Z and.b32 %r1709, %r6, 16; 2026-02-21T08:52:40.6035154Z shl.b32 %r1710, %r7, 13; 2026-02-21T08:52:40.6035319Z cvt.u64.u32 %rd276, %r10; 2026-02-21T08:52:40.6035511Z setp.eq.b32 %p243, %r11, 0; 2026-02-21T08:52:40.6035691Z @%p37 bra $L__BB0_9; 2026-02-21T08:52:40.6035876Z // %bb.1: // %.lr.ph 2026-02-21T08:52:40.6036260Z .loc 1 0 97 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:0:97 2026-02-21T08:52:40.6036754Z and.b32 %r166, %r1695, 2040; 2026-02-21T08:52:40.6036941Z and.b32 %r168, %r1696, 24; 2026-02-21T08:52:40.6037116Z xor.b32 %r12, %r166, %r168; 2026-02-21T08:52:40.6037298Z add.s32 %r13, %r160, %r12; 2026-02-21T08:52:40.6037468Z add.s32 %r14, %r13, 2048; 2026-02-21T08:52:40.6037642Z add.s32 %r15, %r13, 8192; 2026-02-21T08:52:40.6037807Z add.s32 %r16, %r13, 10240; 2026-02-21T08:52:40.6037986Z add.s32 %r17, %r13, 4096; 2026-02-21T08:52:40.6038159Z add.s32 %r18, %r13, 6144; 2026-02-21T08:52:40.6038331Z add.s32 %r19, %r13, 12288; 2026-02-21T08:52:40.6038507Z add.s32 %r20, %r13, 14336; 2026-02-21T08:52:40.6038679Z and.b32 %r171, %r1697, 3072; 2026-02-21T08:52:40.6038864Z and.b32 %r173, %r1698, 448; 2026-02-21T08:52:40.6039036Z and.b32 %r175, %r1699, 6; 2026-02-21T08:52:40.6039210Z or.b32 %r177, %r173, %r1700; 2026-02-21T08:52:40.6039380Z or.b32 %r178, %r177, %r171; 2026-02-21T08:52:40.6039567Z or.b32 %r21, %r178, %r175; 2026-02-21T08:52:40.6039740Z xor.b32 %r22, %r21, 8; 2026-02-21T08:52:40.6039902Z xor.b32 %r23, %r21, 16; 2026-02-21T08:52:40.6040069Z xor.b32 %r24, %r21, 24; 2026-02-21T08:52:40.6040228Z and.b32 %r180, %r1696, 112; 2026-02-21T08:52:40.6040404Z or.b32 %r25, %r180, %r1701; 
2026-02-21T08:52:40.6040572Z shl.b32 %r181, %r1701, 7; 2026-02-21T08:52:40.6040741Z and.b32 %r184, %r1703, 60; 2026-02-21T08:52:40.6040910Z xor.b32 %r185, %r1702, %r184; 2026-02-21T08:52:40.6041094Z or.b32 %r186, %r185, %r181; 2026-02-21T08:52:40.6041271Z add.s32 %r187, %r160, 16384; 2026-02-21T08:52:40.6041447Z add.s32 %r26, %r187, %r186; 2026-02-21T08:52:40.6041624Z xor.b32 %r188, %r186, 64; 2026-02-21T08:52:40.6041790Z add.s32 %r27, %r187, %r188; 2026-02-21T08:52:40.6041970Z and.b32 %r190, %r1704, 56; 2026-02-21T08:52:40.6042137Z or.b32 %r191, %r190, %r4; 2026-02-21T08:52:40.6042321Z shl.b32 %r192, %r191, 4; 2026-02-21T08:52:40.6042488Z add.s32 %r193, %r160, 18432; 2026-02-21T08:52:40.6042664Z add.s32 %r434, %r193, %r192; 2026-02-21T08:52:40.6042833Z and.b32 %r194, %r1698, 112; 2026-02-21T08:52:40.6043011Z or.b32 %r197, %r1705, %r1706; 2026-02-21T08:52:40.6043183Z and.b32 %r198, %r197, 384; 2026-02-21T08:52:40.6043359Z and.b32 %r200, %r1707, 512; 2026-02-21T08:52:40.6043537Z add.s32 %r201, %r193, %r194; 2026-02-21T08:52:40.6043709Z add.s32 %r202, %r201, %r200; 2026-02-21T08:52:40.6043885Z add.s32 %r303, %r202, %r198; 2026-02-21T08:52:40.6044059Z bfe.u32 %r203, %r187, 4, 14; 2026-02-21T08:52:40.6044236Z cvt.u64.u32 %rd81, %r203; 2026-02-21T08:52:40.6044420Z or.b64 %rd3, %rd81, 4611686293313683456; 2026-02-21T08:52:40.6044632Z add.s32 %r204, %r160, 16416; 2026-02-21T08:52:40.6044804Z bfe.u32 %r205, %r204, 4, 14; 2026-02-21T08:52:40.6045156Z cvt.u64.u32 %rd82, %r205; 2026-02-21T08:52:40.6045344Z or.b64 %rd4, %rd82, 4611686293313683456; 2026-02-21T08:52:40.6045545Z add.s32 %r206, %r160, 16448; 2026-02-21T08:52:40.6045729Z bfe.u32 %r207, %r206, 4, 14; 2026-02-21T08:52:40.6045904Z cvt.u64.u32 %rd83, %r207; 2026-02-21T08:52:40.6046087Z or.b64 %rd5, %rd83, 4611686293313683456; 2026-02-21T08:52:40.6046287Z add.s32 %r208, %r160, 16480; 2026-02-21T08:52:40.6046593Z bfe.u32 %r209, %r208, 4, 14; 2026-02-21T08:52:40.6046780Z cvt.u64.u32 %rd84, %r209; 
2026-02-21T08:52:40.6046964Z or.b64 %rd6, %rd84, 4611686293313683456; 2026-02-21T08:52:40.6047160Z add.s32 %r210, %r160, 17408; 2026-02-21T08:52:40.6047338Z bfe.u32 %r211, %r210, 4, 14; 2026-02-21T08:52:40.6047515Z cvt.u64.u32 %rd85, %r211; 2026-02-21T08:52:40.6047840Z or.b64 %rd7, %rd85, 4611686293313683456; 2026-02-21T08:52:40.6048055Z add.s32 %r212, %r160, 17440; 2026-02-21T08:52:40.6048225Z bfe.u32 %r213, %r212, 4, 14; 2026-02-21T08:52:40.6048399Z cvt.u64.u32 %rd86, %r213; 2026-02-21T08:52:40.6048578Z or.b64 %rd8, %rd86, 4611686293313683456; 2026-02-21T08:52:40.6048779Z add.s32 %r214, %r160, 17472; 2026-02-21T08:52:40.6048946Z bfe.u32 %r215, %r214, 4, 14; 2026-02-21T08:52:40.6049122Z cvt.u64.u32 %rd87, %r215; 2026-02-21T08:52:40.6049299Z or.b64 %rd9, %rd87, 4611686293313683456; 2026-02-21T08:52:40.6049492Z add.s32 %r216, %r160, 17504; 2026-02-21T08:52:40.6049683Z bfe.u32 %r217, %r216, 4, 14; 2026-02-21T08:52:40.6049854Z cvt.u64.u32 %rd88, %r217; 2026-02-21T08:52:40.6050041Z or.b64 %rd10, %rd88, 4611686293313683456; 2026-02-21T08:52:40.6050240Z shl.b32 %r218, %r5, 4; 2026-02-21T08:52:40.6050405Z or.b32 %r219, %r218, %r4; 2026-02-21T08:52:40.6065942Z shl.b32 %r220, %r219, 5; 2026-02-21T08:52:40.6066163Z and.b32 %r221, %r220, 1888; 2026-02-21T08:52:40.6066373Z and.b32 %r223, %r1708, 144; 2026-02-21T08:52:40.6066722Z xor.b32 %r225, %r223, %r1709; 2026-02-21T08:52:40.6066924Z add.s32 %r226, %r160, %r221; 2026-02-21T08:52:40.6067111Z add.s32 %r30, %r226, %r225; 2026-02-21T08:52:40.6067299Z shl.b32 %r227, %r219, 3; 2026-02-21T08:52:40.6067493Z and.b32 %r228, %r227, 384; 2026-02-21T08:52:40.6067680Z add.s32 %r653, %r202, %r228; 2026-02-21T08:52:40.6068033Z .loc 1 28 97 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:28:97 2026-02-21T08:52:40.6068423Z mad.wide.u32 %rd89, %r9, 8, %rd42; 2026-02-21T08:52:40.6068732Z add.s64 %rd11, %rd89, 320; 2026-02-21T08:52:40.6069077Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 
2026-02-21T08:52:40.6069469Z or.b32 %r230, %r1710, %r10; 2026-02-21T08:52:40.6069662Z or.b32 %r33, %r230, 262304; 2026-02-21T08:52:40.6069891Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T08:52:40.6070204Z // Child Loop BB0_3 Depth 2 2026-02-21T08:52:40.6070474Z // Child Loop BB0_5 Depth 2 2026-02-21T08:52:40.6070745Z // Child Loop BB0_7 Depth 2 2026-02-21T08:52:40.6071122Z .loc 1 34 35 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:34:35 2026-02-21T08:52:40.6071485Z shr.s32 %r275, %r1711, 31; 2026-02-21T08:52:40.6071676Z shr.u32 %r276, %r275, 29; 2026-02-21T08:52:40.6071864Z add.s32 %r277, %r1711, %r276; 2026-02-21T08:52:40.6072194Z .loc 1 35 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:35:33 2026-02-21T08:52:40.6072556Z and.b32 %r278, %r277, -8; 2026-02-21T08:52:40.6072899Z .loc 1 36 39 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:36:39 2026-02-21T08:52:40.6073261Z sub.s32 %r279, 448, %r278; 2026-02-21T08:52:40.6073581Z .loc 1 36 52 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:36:52 2026-02-21T08:52:40.6073939Z min.s32 %r280, %r279, 8; 2026-02-21T08:52:40.6074257Z .loc 1 37 45 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:37:45 2026-02-21T08:52:40.6074852Z sub.s32 %r281, %r1711, %r278; 2026-02-21T08:52:40.6075168Z .loc 1 38 51 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:38:51 2026-02-21T08:52:40.6075523Z div.s32 %r282, %r281, %r280; 2026-02-21T08:52:40.6075856Z .loc 1 37 64 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:37:64 2026-02-21T08:52:40.6076234Z mul.lo.s32 %r283, %r282, %r280; 2026-02-21T08:52:40.6076435Z sub.s32 %r284, %r281, %r283; 2026-02-21T08:52:40.6076883Z .loc 1 37 30 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:37:30 2026-02-21T08:52:40.6077241Z add.s32 %r285, %r284, %r278; 2026-02-21T08:52:40.6077694Z .loc 1 39 27 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:39:27 2026-02-21T08:52:40.6078076Z shl.b32 %r575, %r285, 
4; 2026-02-21T08:52:40.6078398Z .loc 1 40 27 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:40:27 2026-02-21T08:52:40.6078770Z shl.b32 %r576, %r282, 6; 2026-02-21T08:52:40.6079099Z .loc 1 41 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:41:32 2026-02-21T08:52:40.6079460Z or.b32 %r286, %r576, %r7; 2026-02-21T08:52:40.6079653Z or.b32 %r287, %r576, %r8; 2026-02-21T08:52:40.6079975Z .loc 1 55 53 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:53 2026-02-21T08:52:40.6080336Z shl.b32 %r288, %r286, 13; 2026-02-21T08:52:40.6080504Z shl.b32 %r289, %r287, 13; 2026-02-21T08:52:40.6080833Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6081201Z add.s32 %r923, %r160, 20480; 2026-02-21T08:52:40.6081390Z // begin inline asm 2026-02-21T08:52:40.6081671Z @%p191 mbarrier.init.shared::cta.b64 [%r923], 1; 2026-02-21T08:52:40.6081905Z // end inline asm 2026-02-21T08:52:40.6082069Z bar.sync 0; 2026-02-21T08:52:40.6082225Z add.s32 %r924, %r160, 20488; 2026-02-21T08:52:40.6082417Z // begin inline asm 2026-02-21T08:52:40.6082627Z @%p191 mbarrier.init.shared::cta.b64 [%r924], 1; 2026-02-21T08:52:40.6082868Z // end inline asm 2026-02-21T08:52:40.6083027Z add.s32 %r921, %r160, 20496; 2026-02-21T08:52:40.6083220Z // begin inline asm 2026-02-21T08:52:40.6083414Z @%p191 mbarrier.init.shared::cta.b64 [%r921], 1; 2026-02-21T08:52:40.6083639Z // end inline asm 2026-02-21T08:52:40.6083791Z bar.sync 0; 2026-02-21T08:52:40.6083939Z add.s32 %r922, %r160, 20504; 2026-02-21T08:52:40.6084119Z // begin inline asm 2026-02-21T08:52:40.6084318Z @%p191 mbarrier.init.shared::cta.b64 [%r922], 1; 2026-02-21T08:52:40.6084548Z // end inline asm 2026-02-21T08:52:40.6084847Z .loc 1 55 60 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:60 2026-02-21T08:52:40.6085209Z or.b32 %r291, %r288, %r10; 2026-02-21T08:52:40.6085401Z or.b32 %r292, %r289, %r10; 2026-02-21T08:52:40.6085718Z .loc 1 55 32 // 
c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:32 2026-02-21T08:52:40.6086087Z mad.wide.s32 %rd90, %r291, 2, %rd42; 2026-02-21T08:52:40.6086296Z mad.wide.s32 %rd91, %r292, 2, %rd42; 2026-02-21T08:52:40.6086619Z mov.b32 %r236, 8; 2026-02-21T08:52:40.6086915Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6087263Z // begin inline asm 2026-02-21T08:52:40.6087515Z cp.async.ca.shared.global [ %r13 + 0 ], [ %rd90 + 0 ], 0x8, %r236; 2026-02-21T08:52:40.6087791Z // end inline asm 2026-02-21T08:52:40.6087947Z // begin inline asm 2026-02-21T08:52:40.6088168Z cp.async.ca.shared.global [ %r14 + 0 ], [ %rd91 + 0 ], 0x8, %r236; 2026-02-21T08:52:40.6088446Z // end inline asm 2026-02-21T08:52:40.6088607Z cp.async.commit_group; 2026-02-21T08:52:40.6088950Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6089494Z // begin inline asm 2026-02-21T08:52:40.6089733Z @%p191 mbarrier.arrive.expect_tx.shared.b64 _, [%r923], 256; 2026-02-21T08:52:40.6090011Z // end inline asm 2026-02-21T08:52:40.6090321Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6090705Z bar.sync 0; 2026-02-21T08:52:40.6090870Z elect.sync %r293|%p51, -1; 2026-02-21T08:52:40.6091100Z and.pred %p43, %p1, %p51; 2026-02-21T08:52:40.6091287Z add.s32 %r240, %r160, 19456; 2026-02-21T08:52:40.6091483Z mov.b32 %r242, 0; 2026-02-21T08:52:40.6091641Z // begin inline asm 2026-02-21T08:52:40.6092073Z @%p43 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r240], [%rd180, {%r575, %r242}], [%r923]; 2026-02-21T08:52:40.6092541Z // end inline asm 2026-02-21T08:52:40.6092974Z .loc 1 55 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:32 2026-02-21T08:52:40.6093359Z cvt.s64.s32 %rd103, %r288; 2026-02-21T08:52:40.6093556Z or.b64 %rd104, %rd103, %rd276; 2026-02-21T08:52:40.6093772Z shl.b64 %rd105, %rd104, 1; 
2026-02-21T08:52:40.6093956Z add.s64 %rd106, %rd42, %rd105; 2026-02-21T08:52:40.6094147Z add.s64 %rd93, %rd106, 64; 2026-02-21T08:52:40.6094331Z cvt.s64.s32 %rd107, %r289; 2026-02-21T08:52:40.6094509Z or.b64 %rd108, %rd107, %rd276; 2026-02-21T08:52:40.6094703Z shl.b64 %rd109, %rd108, 1; 2026-02-21T08:52:40.6094880Z add.s64 %rd110, %rd42, %rd109; 2026-02-21T08:52:40.6095075Z add.s64 %rd94, %rd110, 64; 2026-02-21T08:52:40.6095409Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6095782Z // begin inline asm 2026-02-21T08:52:40.6096023Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd93 + 0 ], 0x8, %r236; 2026-02-21T08:52:40.6096313Z // end inline asm 2026-02-21T08:52:40.6096633Z // begin inline asm 2026-02-21T08:52:40.6097044Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd94 + 0 ], 0x8, %r236; 2026-02-21T08:52:40.6097506Z // end inline asm 2026-02-21T08:52:40.6097673Z cp.async.commit_group; 2026-02-21T08:52:40.6098007Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6098371Z // begin inline asm 2026-02-21T08:52:40.6098601Z @%p191 mbarrier.arrive.expect_tx.shared.b64 _, [%r921], 256; 2026-02-21T08:52:40.6098867Z // end inline asm 2026-02-21T08:52:40.6099166Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6099516Z bar.sync 0; 2026-02-21T08:52:40.6099675Z elect.sync %r294|%p52, -1; 2026-02-21T08:52:40.6099874Z and.pred %p45, %p1, %p52; 2026-02-21T08:52:40.6100058Z add.s32 %r249, %r160, 19968; 2026-02-21T08:52:40.6100241Z mov.b32 %r251, 16; 2026-02-21T08:52:40.6100397Z // begin inline asm 2026-02-21T08:52:40.6100820Z @%p45 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r249], [%rd180, {%r575, %r251}], [%r921]; 2026-02-21T08:52:40.6101282Z // end inline asm 2026-02-21T08:52:40.6101586Z .loc 1 55 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:32 2026-02-21T08:52:40.6101952Z 
add.s64 %rd96, %rd106, 128; 2026-02-21T08:52:40.6102145Z add.s64 %rd97, %rd110, 128; 2026-02-21T08:52:40.6102467Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6102814Z // begin inline asm 2026-02-21T08:52:40.6103046Z cp.async.ca.shared.global [ %r17 + 0 ], [ %rd96 + 0 ], 0x8, %r236; 2026-02-21T08:52:40.6103318Z // end inline asm 2026-02-21T08:52:40.6103472Z // begin inline asm 2026-02-21T08:52:40.6103701Z cp.async.ca.shared.global [ %r18 + 0 ], [ %rd97 + 0 ], 0x8, %r236; 2026-02-21T08:52:40.6103970Z // end inline asm 2026-02-21T08:52:40.6104137Z cp.async.commit_group; 2026-02-21T08:52:40.6104453Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6104985Z // begin inline asm 2026-02-21T08:52:40.6105206Z @%p191 mbarrier.arrive.expect_tx.shared.b64 _, [%r924], 256; 2026-02-21T08:52:40.6105476Z // end inline asm 2026-02-21T08:52:40.6105767Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6106119Z bar.sync 0; 2026-02-21T08:52:40.6106293Z elect.sync %r295|%p53, -1; 2026-02-21T08:52:40.6106605Z and.pred %p47, %p1, %p53; 2026-02-21T08:52:40.6106800Z add.s32 %r258, %r160, 19712; 2026-02-21T08:52:40.6106988Z mov.b32 %r260, 32; 2026-02-21T08:52:40.6107150Z // begin inline asm 2026-02-21T08:52:40.6107556Z @%p47 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r258], [%rd180, {%r575, %r260}], [%r924]; 2026-02-21T08:52:40.6108018Z // end inline asm 2026-02-21T08:52:40.6108451Z .loc 1 55 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:32 2026-02-21T08:52:40.6108900Z add.s64 %rd99, %rd106, 192; 2026-02-21T08:52:40.6109085Z add.s64 %rd100, %rd110, 192; 2026-02-21T08:52:40.6109405Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6109749Z // begin inline asm 2026-02-21T08:52:40.6109980Z cp.async.ca.shared.global [ %r19 + 0 ], [ %rd99 + 0 
], 0x8, %r236; 2026-02-21T08:52:40.6110259Z // end inline asm 2026-02-21T08:52:40.6110418Z // begin inline asm 2026-02-21T08:52:40.6110657Z cp.async.ca.shared.global [ %r20 + 0 ], [ %rd100 + 0 ], 0x8, %r236; 2026-02-21T08:52:40.6110930Z // end inline asm 2026-02-21T08:52:40.6111090Z cp.async.commit_group; 2026-02-21T08:52:40.6111410Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6111770Z // begin inline asm 2026-02-21T08:52:40.6111990Z @%p191 mbarrier.arrive.expect_tx.shared.b64 _, [%r922], 256; 2026-02-21T08:52:40.6112254Z // end inline asm 2026-02-21T08:52:40.6112545Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6112891Z bar.sync 0; 2026-02-21T08:52:40.6113044Z elect.sync %r296|%p54, -1; 2026-02-21T08:52:40.6113226Z and.pred %p49, %p1, %p54; 2026-02-21T08:52:40.6113408Z add.s32 %r267, %r160, 20224; 2026-02-21T08:52:40.6113580Z mov.b32 %r269, 48; 2026-02-21T08:52:40.6113747Z // begin inline asm 2026-02-21T08:52:40.6114155Z @%p49 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r267], [%rd180, {%r575, %r269}], [%r922]; 2026-02-21T08:52:40.6114611Z // end inline asm 2026-02-21T08:52:40.6114911Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6115265Z shl.b32 %r297, %r282, 19; 2026-02-21T08:52:40.6115442Z or.b32 %r298, %r1710, %r297; 2026-02-21T08:52:40.6115627Z mad.wide.s32 %rd277, %r298, 2, %rd11; 2026-02-21T08:52:40.6115839Z or.b32 %r1712, %r33, %r297; 2026-02-21T08:52:40.6116014Z mov.b32 %r453, 0f00000000; 2026-02-21T08:52:40.6116206Z mov.b32 %r1715, 1; 2026-02-21T08:52:40.6116358Z mov.b32 %r1714, -1; 2026-02-21T08:52:40.6116667Z mov.b64 %rd278, 0; 2026-02-21T08:52:40.6116828Z mov.b32 %r1713, %r242; 2026-02-21T08:52:40.6116993Z mov.b32 %r454, %r453; 2026-02-21T08:52:40.6117157Z mov.b32 %r455, %r453; 2026-02-21T08:52:40.6117313Z mov.b32 %r456, %r453; 2026-02-21T08:52:40.6117544Z 
$L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:40.6117839Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:40.6118107Z setp.lt.u64 %p74, %rd278, 4032; 2026-02-21T08:52:40.6118298Z add.s32 %r521, %r1714, 1; 2026-02-21T08:52:40.6118480Z setp.gt.s32 %p75, %r521, 1; 2026-02-21T08:52:40.6118681Z selp.b32 %r1714, 0, %r521, %p75; 2026-02-21T08:52:40.6118880Z selp.b32 %r522, 1, 0, %p75; 2026-02-21T08:52:40.6119063Z xor.b32 %r1713, %r1713, %r522; 2026-02-21T08:52:40.6119394Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6119910Z cp.async.wait_group 2; 2026-02-21T08:52:40.6120079Z bar.sync 0; 2026-02-21T08:52:40.6120241Z shl.b32 %r523, %r1714, 12; 2026-02-21T08:52:40.6120419Z add.s32 %r525, %r160, %r523; 2026-02-21T08:52:40.6120742Z .loc 1 59 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:59:32 2026-02-21T08:52:40.6121094Z add.s32 %r526, %r525, %r21; 2026-02-21T08:52:40.6121274Z ld.shared.b16 %rs1, [%r526]; 2026-02-21T08:52:40.6121465Z ld.shared.b16 %rs2, [%r526+512]; 2026-02-21T08:52:40.6121663Z ld.shared.b16 %rs3, [%r526+32]; 2026-02-21T08:52:40.6121863Z ld.shared.b16 %rs4, [%r526+544]; 2026-02-21T08:52:40.6122053Z add.s32 %r527, %r525, %r22; 2026-02-21T08:52:40.6122238Z ld.shared.b16 %rs5, [%r527]; 2026-02-21T08:52:40.6122545Z ld.shared.b16 %rs6, [%r527+512]; 2026-02-21T08:52:40.6122768Z ld.shared.b16 %rs7, [%r527+32]; 2026-02-21T08:52:40.6122971Z ld.shared.b16 %rs8, [%r527+544]; 2026-02-21T08:52:40.6123164Z add.s32 %r528, %r525, %r23; 2026-02-21T08:52:40.6123357Z ld.shared.b16 %rs9, [%r528]; 2026-02-21T08:52:40.6123542Z ld.shared.b16 %rs10, [%r528+512]; 2026-02-21T08:52:40.6123746Z ld.shared.b16 %rs11, [%r528+32]; 2026-02-21T08:52:40.6123938Z ld.shared.b16 %rs12, [%r528+544]; 2026-02-21T08:52:40.6124137Z add.s32 %r529, %r525, %r24; 2026-02-21T08:52:40.6124314Z ld.shared.b16 %rs13, [%r529]; 2026-02-21T08:52:40.6124504Z ld.shared.b16 %rs14, [%r529+512]; 2026-02-21T08:52:40.6124695Z 
ld.shared.b16 %rs15, [%r529+32]; 2026-02-21T08:52:40.6124892Z ld.shared.b16 %rs16, [%r529+544]; 2026-02-21T08:52:40.6125086Z cvt.f32.bf16 %r369, %rs1; 2026-02-21T08:52:40.6125260Z cvt.f32.bf16 %r370, %rs2; 2026-02-21T08:52:40.6125435Z cvt.f32.bf16 %r371, %rs5; 2026-02-21T08:52:40.6125603Z cvt.f32.bf16 %r372, %rs6; 2026-02-21T08:52:40.6125779Z cvt.f32.bf16 %r381, %rs9; 2026-02-21T08:52:40.6125950Z cvt.f32.bf16 %r382, %rs10; 2026-02-21T08:52:40.6126143Z cvt.f32.bf16 %r383, %rs13; 2026-02-21T08:52:40.6126319Z cvt.f32.bf16 %r384, %rs14; 2026-02-21T08:52:40.6126622Z cvt.f32.bf16 %r393, %rs3; 2026-02-21T08:52:40.6126809Z cvt.f32.bf16 %r394, %rs4; 2026-02-21T08:52:40.6126983Z cvt.f32.bf16 %r395, %rs7; 2026-02-21T08:52:40.6127159Z cvt.f32.bf16 %r396, %rs8; 2026-02-21T08:52:40.6127327Z cvt.f32.bf16 %r405, %rs11; 2026-02-21T08:52:40.6127503Z cvt.f32.bf16 %r406, %rs12; 2026-02-21T08:52:40.6127672Z cvt.f32.bf16 %r407, %rs15; 2026-02-21T08:52:40.6127849Z cvt.f32.bf16 %r408, %rs16; 2026-02-21T08:52:40.6128172Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6128535Z shl.b32 %r530, %r1714, 3; 2026-02-21T08:52:40.6128703Z add.s32 %r299, %r923, %r530; 2026-02-21T08:52:40.6128882Z // begin inline asm 2026-02-21T08:52:40.6129039Z 2026-02-21T08:52:40.6129162Z { 2026-02-21T08:52:40.6129306Z .reg .pred complete; 2026-02-21T08:52:40.6129478Z waitLoop: 2026-02-21T08:52:40.6129708Z mbarrier.try_wait.parity.shared.b64 complete, [%r299], %r1713; 2026-02-21T08:52:40.6130002Z @!complete bra.uni waitLoop; 2026-02-21T08:52:40.6130179Z } 2026-02-21T08:52:40.6130251Z 2026-02-21T08:52:40.6130310Z // end inline asm 2026-02-21T08:52:40.6130613Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6130985Z shl.b32 %r532, %r1714, 8; 2026-02-21T08:52:40.6131162Z add.s32 %r534, %r240, %r532; 2026-02-21T08:52:40.6131485Z .loc 1 79 58 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:79:58 
2026-02-21T08:52:40.6131835Z add.s32 %r535, %r534, %r25; 2026-02-21T08:52:40.6132024Z ld.shared.b8 %rs17, [%r535]; 2026-02-21T08:52:40.6132207Z ld.shared.b8 %rs18, [%r535+128]; 2026-02-21T08:52:40.6132563Z .loc 1 64 28 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:64:28 2026-02-21T08:52:40.6132916Z shl.b16 %rs19, %rs17, 4; 2026-02-21T08:52:40.6133100Z shl.b16 %rs20, %rs18, 4; 2026-02-21T08:52:40.6133412Z .loc 1 79 58 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:79:58 2026-02-21T08:52:40.6133931Z selp.b16 %rs21, %rs19, %rs17, %p243; 2026-02-21T08:52:40.6134141Z cvt.s16.s8 %rs22, %rs21; 2026-02-21T08:52:40.6134312Z shr.s16 %rs23, %rs22, 4; 2026-02-21T08:52:40.6134493Z selp.b16 %rs24, %rs20, %rs18, %p243; 2026-02-21T08:52:40.6134695Z cvt.s16.s8 %rs25, %rs24; 2026-02-21T08:52:40.6134871Z shr.s16 %rs26, %rs25, 4; 2026-02-21T08:52:40.6135177Z .loc 1 84 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:84:32 2026-02-21T08:52:40.6135533Z cvt.rn.f32.s16 %r536, %rs23; 2026-02-21T08:52:40.6135717Z cvt.rn.f32.s16 %r537, %rs26; 2026-02-21T08:52:40.6135896Z st.shared.b32 [%r26], %r536; 2026-02-21T08:52:40.6136078Z st.shared.b32 [%r27], %r537; 2026-02-21T08:52:40.6136580Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r434], {%r453}; 2026-02-21T08:52:40.6136894Z bar.sync 0; 2026-02-21T08:52:40.6137049Z // begin inline asm 2026-02-21T08:52:40.6137309Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r325, %r373}, [%r303]; 2026-02-21T08:52:40.6137600Z // end inline asm 2026-02-21T08:52:40.6137769Z bar.sync 0; 2026-02-21T08:52:40.6137984Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r434], {%r455}; 2026-02-21T08:52:40.6138247Z bar.sync 0; 2026-02-21T08:52:40.6138392Z // begin inline asm 2026-02-21T08:52:40.6138629Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r327, %r375}, [%r303]; 2026-02-21T08:52:40.6138914Z // end inline asm 2026-02-21T08:52:40.6139056Z bar.sync 0; 2026-02-21T08:52:40.6139264Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r434], {%r454}; 
2026-02-21T08:52:40.6139522Z bar.sync 0; 2026-02-21T08:52:40.6139669Z // begin inline asm 2026-02-21T08:52:40.6139910Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r326, %r374}, [%r303]; 2026-02-21T08:52:40.6140199Z // end inline asm 2026-02-21T08:52:40.6140350Z bar.sync 0; 2026-02-21T08:52:40.6140555Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r434], {%r456}; 2026-02-21T08:52:40.6140834Z bar.sync 0; 2026-02-21T08:52:40.6140973Z // begin inline asm 2026-02-21T08:52:40.6141210Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r328, %r376}, [%r303]; 2026-02-21T08:52:40.6141489Z // end inline asm 2026-02-21T08:52:40.6141640Z $L__tmp1: 2026-02-21T08:52:40.6141996Z .loc 2 291 36 // standard.py:291:36 @[ c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:91:40 ] 2026-02-21T08:52:40.6142420Z // begin inline asm 2026-02-21T08:52:40.6142607Z fence.proxy.async.shared::cta; 2026-02-21T08:52:40.6142800Z // end inline asm 2026-02-21T08:52:40.6142976Z shfl.sync.idx.b32 %r538, %r5, 0, 31, -1; 2026-02-21T08:52:40.6143215Z wgmma.fence.sync.aligned; 2026-02-21T08:52:40.6143410Z mov.pred %p55, -1; 2026-02-21T08:52:40.6143566Z // begin inline asm 2026-02-21T08:52:40.6143956Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r325,%r326,%r327,%r328}, {%r369,%r370,%r371,%r372}, %rd3, %p55, 1, 1; 2026-02-21T08:52:40.6144394Z // end inline asm 2026-02-21T08:52:40.6144547Z // begin inline asm 2026-02-21T08:52:40.6144925Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r325,%r326,%r327,%r328}, {%r381,%r382,%r383,%r384}, %rd4, %p55, 1, 1; 2026-02-21T08:52:40.6145351Z // end inline asm 2026-02-21T08:52:40.6145521Z // begin inline asm 2026-02-21T08:52:40.6145885Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r325,%r326,%r327,%r328}, {%r393,%r394,%r395,%r396}, %rd5, %p55, 1, 1; 2026-02-21T08:52:40.6146309Z // end inline asm 2026-02-21T08:52:40.6146574Z // begin inline asm 2026-02-21T08:52:40.6146975Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r325,%r326,%r327,%r328}, 
{%r405,%r406,%r407,%r408}, %rd6, %p55, 1, 1; 2026-02-21T08:52:40.6147399Z // end inline asm 2026-02-21T08:52:40.6147545Z // begin inline asm 2026-02-21T08:52:40.6147917Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r373,%r374,%r375,%r376}, {%r369,%r370,%r371,%r372}, %rd7, %p55, 1, 1; 2026-02-21T08:52:40.6148333Z // end inline asm 2026-02-21T08:52:40.6148490Z // begin inline asm 2026-02-21T08:52:40.6149101Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r373,%r374,%r375,%r376}, {%r381,%r382,%r383,%r384}, %rd8, %p55, 1, 1; 2026-02-21T08:52:40.6149537Z // end inline asm 2026-02-21T08:52:40.6149696Z // begin inline asm 2026-02-21T08:52:40.6150077Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r373,%r374,%r375,%r376}, {%r393,%r394,%r395,%r396}, %rd9, %p55, 1, 1; 2026-02-21T08:52:40.6150509Z // end inline asm 2026-02-21T08:52:40.6150656Z // begin inline asm 2026-02-21T08:52:40.6151031Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r373,%r374,%r375,%r376}, {%r405,%r406,%r407,%r408}, %rd10, %p55, 1, 1; 2026-02-21T08:52:40.6151451Z // end inline asm 2026-02-21T08:52:40.6151623Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:40.6151831Z mov.b32 %r418, %r242; 2026-02-21T08:52:40.6152135Z mov.b32 %r419, %r242; 2026-02-21T08:52:40.6152310Z mov.b32 %r417, %r187; 2026-02-21T08:52:40.6152467Z // begin inline asm 2026-02-21T08:52:40.6152721Z // wait for regs: %r325,%r326,%r327,%r328,%r373,%r374,%r375,%r376,%r417,%r418,%r419 2026-02-21T08:52:40.6153044Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:40.6153247Z // end inline asm 2026-02-21T08:52:40.6153389Z $L__tmp2: 2026-02-21T08:52:40.6153703Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6154067Z add.s32 %r539, %r160, 8192; 2026-02-21T08:52:40.6154246Z add.s32 %r540, %r539, %r523; 2026-02-21T08:52:40.6154573Z .loc 1 59 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:59:32 2026-02-21T08:52:40.6154919Z add.s32 %r541, %r540, %r21; 
2026-02-21T08:52:40.6155115Z ld.shared.b16 %rs27, [%r541]; 2026-02-21T08:52:40.6155312Z ld.shared.b16 %rs28, [%r541+512]; 2026-02-21T08:52:40.6155522Z ld.shared.b16 %rs29, [%r541+32]; 2026-02-21T08:52:40.6155714Z ld.shared.b16 %rs30, [%r541+544]; 2026-02-21T08:52:40.6155905Z add.s32 %r542, %r540, %r22; 2026-02-21T08:52:40.6156087Z ld.shared.b16 %rs31, [%r542]; 2026-02-21T08:52:40.6156273Z ld.shared.b16 %rs32, [%r542+512]; 2026-02-21T08:52:40.6156631Z ld.shared.b16 %rs33, [%r542+32]; 2026-02-21T08:52:40.6156828Z ld.shared.b16 %rs34, [%r542+544]; 2026-02-21T08:52:40.6157019Z add.s32 %r543, %r540, %r23; 2026-02-21T08:52:40.6157194Z ld.shared.b16 %rs35, [%r543]; 2026-02-21T08:52:40.6157381Z ld.shared.b16 %rs36, [%r543+512]; 2026-02-21T08:52:40.6157591Z ld.shared.b16 %rs37, [%r543+32]; 2026-02-21T08:52:40.6157781Z ld.shared.b16 %rs38, [%r543+544]; 2026-02-21T08:52:40.6157971Z add.s32 %r544, %r540, %r24; 2026-02-21T08:52:40.6158143Z ld.shared.b16 %rs39, [%r544]; 2026-02-21T08:52:40.6158329Z ld.shared.b16 %rs40, [%r544+512]; 2026-02-21T08:52:40.6158531Z ld.shared.b16 %rs41, [%r544+32]; 2026-02-21T08:52:40.6158720Z ld.shared.b16 %rs42, [%r544+544]; 2026-02-21T08:52:40.6158911Z cvt.f32.bf16 %r449, %rs27; 2026-02-21T08:52:40.6159086Z cvt.f32.bf16 %r450, %rs28; 2026-02-21T08:52:40.6159265Z cvt.f32.bf16 %r451, %rs31; 2026-02-21T08:52:40.6159440Z cvt.f32.bf16 %r452, %rs32; 2026-02-21T08:52:40.6159618Z cvt.f32.bf16 %r461, %rs35; 2026-02-21T08:52:40.6159786Z cvt.f32.bf16 %r462, %rs36; 2026-02-21T08:52:40.6159975Z cvt.f32.bf16 %r463, %rs39; 2026-02-21T08:52:40.6160143Z cvt.f32.bf16 %r464, %rs40; 2026-02-21T08:52:40.6160316Z cvt.f32.bf16 %r473, %rs29; 2026-02-21T08:52:40.6160490Z cvt.f32.bf16 %r474, %rs30; 2026-02-21T08:52:40.6160669Z cvt.f32.bf16 %r475, %rs33; 2026-02-21T08:52:40.6160843Z cvt.f32.bf16 %r476, %rs34; 2026-02-21T08:52:40.6161008Z cvt.f32.bf16 %r485, %rs37; 2026-02-21T08:52:40.6161184Z cvt.f32.bf16 %r486, %rs38; 2026-02-21T08:52:40.6161351Z cvt.f32.bf16 %r487, %rs41; 
2026-02-21T08:52:40.6161522Z cvt.f32.bf16 %r488, %rs42; 2026-02-21T08:52:40.6161845Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6162210Z add.s32 %r431, %r921, %r530; 2026-02-21T08:52:40.6162383Z // begin inline asm 2026-02-21T08:52:40.6162533Z 2026-02-21T08:52:40.6162794Z { 2026-02-21T08:52:40.6162937Z .reg .pred complete; 2026-02-21T08:52:40.6163102Z waitLoop: 2026-02-21T08:52:40.6163322Z mbarrier.try_wait.parity.shared.b64 complete, [%r431], %r1713; 2026-02-21T08:52:40.6163613Z @!complete bra.uni waitLoop; 2026-02-21T08:52:40.6163782Z } 2026-02-21T08:52:40.6163855Z 2026-02-21T08:52:40.6163914Z // end inline asm 2026-02-21T08:52:40.6164208Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6164569Z add.s32 %r547, %r249, %r532; 2026-02-21T08:52:40.6164895Z .loc 1 79 58 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:79:58 2026-02-21T08:52:40.6165243Z add.s32 %r548, %r547, %r25; 2026-02-21T08:52:40.6165421Z ld.shared.b8 %rs43, [%r548]; 2026-02-21T08:52:40.6165730Z ld.shared.b8 %rs44, [%r548+128]; 2026-02-21T08:52:40.6166060Z .loc 1 64 28 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:64:28 2026-02-21T08:52:40.6166410Z shl.b16 %rs45, %rs43, 4; 2026-02-21T08:52:40.6166723Z shl.b16 %rs46, %rs44, 4; 2026-02-21T08:52:40.6167031Z .loc 1 79 58 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:79:58 2026-02-21T08:52:40.6167388Z selp.b16 %rs47, %rs45, %rs43, %p243; 2026-02-21T08:52:40.6167593Z cvt.s16.s8 %rs48, %rs47; 2026-02-21T08:52:40.6167757Z shr.s16 %rs49, %rs48, 4; 2026-02-21T08:52:40.6167936Z selp.b16 %rs50, %rs46, %rs44, %p243; 2026-02-21T08:52:40.6168129Z cvt.s16.s8 %rs51, %rs50; 2026-02-21T08:52:40.6168300Z shr.s16 %rs52, %rs51, 4; 2026-02-21T08:52:40.6168602Z .loc 1 84 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:84:32 2026-02-21T08:52:40.6168956Z cvt.rn.f32.s16 %r549, %rs49; 2026-02-21T08:52:40.6169148Z 
cvt.rn.f32.s16 %r550, %rs52; 2026-02-21T08:52:40.6169326Z bar.sync 0; 2026-02-21T08:52:40.6169478Z st.shared.b32 [%r26], %r549; 2026-02-21T08:52:40.6169662Z st.shared.b32 [%r27], %r550; 2026-02-21T08:52:40.6169840Z $L__tmp3: 2026-02-21T08:52:40.6170188Z .loc 2 291 36 // standard.py:291:36 @[ c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:91:40 ] 2026-02-21T08:52:40.6170693Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r303], {%r325, %r373}; 2026-02-21T08:52:40.6170977Z bar.sync 0; 2026-02-21T08:52:40.6171124Z // begin inline asm 2026-02-21T08:52:40.6171366Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r453}, [%r434]; 2026-02-21T08:52:40.6171628Z // end inline asm 2026-02-21T08:52:40.6171776Z bar.sync 0; 2026-02-21T08:52:40.6171994Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r303], {%r327, %r375}; 2026-02-21T08:52:40.6172274Z bar.sync 0; 2026-02-21T08:52:40.6172410Z // begin inline asm 2026-02-21T08:52:40.6172637Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r455}, [%r434]; 2026-02-21T08:52:40.6172896Z // end inline asm 2026-02-21T08:52:40.6173043Z bar.sync 0; 2026-02-21T08:52:40.6173257Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r303], {%r326, %r374}; 2026-02-21T08:52:40.6173539Z bar.sync 0; 2026-02-21T08:52:40.6173680Z // begin inline asm 2026-02-21T08:52:40.6173897Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r454}, [%r434]; 2026-02-21T08:52:40.6174156Z // end inline asm 2026-02-21T08:52:40.6174294Z bar.sync 0; 2026-02-21T08:52:40.6174511Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r303], {%r328, %r376}; 2026-02-21T08:52:40.6174781Z bar.sync 0; 2026-02-21T08:52:40.6174939Z // begin inline asm 2026-02-21T08:52:40.6175178Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r456}, [%r434]; 2026-02-21T08:52:40.6175450Z // end inline asm 2026-02-21T08:52:40.6175610Z // begin inline asm 2026-02-21T08:52:40.6175788Z fence.proxy.async.shared::cta; 2026-02-21T08:52:40.6175980Z // end inline asm 2026-02-21T08:52:40.6176152Z wgmma.fence.sync.aligned; 2026-02-21T08:52:40.6176342Z 
shl.b32 %r551, %r538, 8; 2026-02-21T08:52:40.6176641Z and.b32 %r552, %r551, 1024; 2026-02-21T08:52:40.6176829Z add.s32 %r553, %r552, %r187; 2026-02-21T08:52:40.6177031Z bfe.u32 %r554, %r553, 4, 14; 2026-02-21T08:52:40.6177362Z cvt.u64.u32 %rd129, %r554; 2026-02-21T08:52:40.6177555Z or.b64 %rd119, %rd129, 4611686293313683456; 2026-02-21T08:52:40.6177771Z // begin inline asm 2026-02-21T08:52:40.6178164Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r453,%r454,%r455,%r456}, {%r449,%r450,%r451,%r452}, %rd119, %p55, 1, 1; 2026-02-21T08:52:40.6178612Z // end inline asm 2026-02-21T08:52:40.6178771Z add.s32 %r555, %r553, 32; 2026-02-21T08:52:40.6178945Z bfe.u32 %r556, %r555, 4, 14; 2026-02-21T08:52:40.6179127Z cvt.u64.u32 %rd130, %r556; 2026-02-21T08:52:40.6179314Z or.b64 %rd120, %rd130, 4611686293313683456; 2026-02-21T08:52:40.6179530Z // begin inline asm 2026-02-21T08:52:40.6180051Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r453,%r454,%r455,%r456}, {%r461,%r462,%r463,%r464}, %rd120, %p55, 1, 1; 2026-02-21T08:52:40.6180497Z // end inline asm 2026-02-21T08:52:40.6180657Z add.s32 %r557, %r553, 64; 2026-02-21T08:52:40.6180825Z bfe.u32 %r558, %r557, 4, 14; 2026-02-21T08:52:40.6181012Z cvt.u64.u32 %rd131, %r558; 2026-02-21T08:52:40.6181199Z or.b64 %rd121, %rd131, 4611686293313683456; 2026-02-21T08:52:40.6181413Z // begin inline asm 2026-02-21T08:52:40.6181788Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r453,%r454,%r455,%r456}, {%r473,%r474,%r475,%r476}, %rd121, %p55, 1, 1; 2026-02-21T08:52:40.6182220Z // end inline asm 2026-02-21T08:52:40.6182372Z add.s32 %r559, %r553, 96; 2026-02-21T08:52:40.6182546Z bfe.u32 %r560, %r559, 4, 14; 2026-02-21T08:52:40.6182722Z cvt.u64.u32 %rd132, %r560; 2026-02-21T08:52:40.6182909Z or.b64 %rd122, %rd132, 4611686293313683456; 2026-02-21T08:52:40.6183117Z // begin inline asm 2026-02-21T08:52:40.6183493Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r453,%r454,%r455,%r456}, {%r485,%r486,%r487,%r488}, %rd122, %p55, 
1, 1; 2026-02-21T08:52:40.6183924Z // end inline asm 2026-02-21T08:52:40.6184089Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:40.6184287Z mov.b32 %r493, %r187; 2026-02-21T08:52:40.6184455Z mov.b32 %r495, %r242; 2026-02-21T08:52:40.6184610Z mov.b32 %r494, %r242; 2026-02-21T08:52:40.6184780Z // begin inline asm 2026-02-21T08:52:40.6184979Z // wait for regs: %r453,%r454,%r455,%r456,%r493,%r494,%r495 2026-02-21T08:52:40.6185241Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:40.6185429Z // end inline asm 2026-02-21T08:52:40.6185579Z $L__tmp4: 2026-02-21T08:52:40.6185876Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6186251Z add.s32 %r561, %r1715, 1; 2026-02-21T08:52:40.6186435Z setp.gt.s32 %p76, %r561, 1; 2026-02-21T08:52:40.6186756Z selp.b32 %r1715, 0, %r561, %p76; 2026-02-21T08:52:40.6187103Z .loc 1 55 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:32 2026-02-21T08:52:40.6187472Z add.s32 %r562, %r1712, -32; 2026-02-21T08:52:40.6187660Z add.s64 %rd123, %rd277, -64; 2026-02-21T08:52:40.6187847Z mad.wide.s32 %rd124, %r562, 2, %rd42; 2026-02-21T08:52:40.6188202Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6188636Z shl.b32 %r563, %r1715, 12; 2026-02-21T08:52:40.6188818Z add.s32 %r564, %r160, %r563; 2026-02-21T08:52:40.6188997Z add.s32 %r503, %r564, %r12; 2026-02-21T08:52:40.6189180Z selp.b32 %r504, 8, 0, %p74; 2026-02-21T08:52:40.6189358Z // begin inline asm 2026-02-21T08:52:40.6189594Z cp.async.ca.shared.global [ %r503 + 0 ], [ %rd123 + 0 ], 0x8, %r504; 2026-02-21T08:52:40.6189876Z // end inline asm 2026-02-21T08:52:40.6190026Z add.s32 %r505, %r503, 2048; 2026-02-21T08:52:40.6190203Z // begin inline asm 2026-02-21T08:52:40.6190428Z cp.async.ca.shared.global [ %r505 + 0 ], [ %rd124 + 0 ], 0x8, %r504; 2026-02-21T08:52:40.6190702Z // end inline asm 2026-02-21T08:52:40.6190866Z cp.async.commit_group; 2026-02-21T08:52:40.6191183Z .loc 1 48 
112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6191563Z shl.b32 %r565, %r1715, 3; 2026-02-21T08:52:40.6191877Z add.s32 %r507, %r923, %r565; 2026-02-21T08:52:40.6192065Z and.pred %p67, %p191, %p74; 2026-02-21T08:52:40.6192245Z // begin inline asm 2026-02-21T08:52:40.6192468Z @%p67 mbarrier.arrive.expect_tx.shared.b64 _, [%r507], 256; 2026-02-21T08:52:40.6192733Z // end inline asm 2026-02-21T08:52:40.6193042Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6193403Z shl.b32 %r566, %r1715, 8; 2026-02-21T08:52:40.6193574Z add.s32 %r508, %r240, %r566; 2026-02-21T08:52:40.6193748Z bar.sync 0; 2026-02-21T08:52:40.6193899Z elect.sync %r567|%p77, -1; 2026-02-21T08:52:40.6194085Z and.pred %p78, %p74, %p77; 2026-02-21T08:52:40.6194264Z and.pred %p68, %p1, %p78; 2026-02-21T08:52:40.6194584Z cvt.u32.u64 %r568, %rd278; 2026-02-21T08:52:40.6194756Z add.s32 %r510, %r568, 64; 2026-02-21T08:52:40.6194944Z // begin inline asm 2026-02-21T08:52:40.6195358Z @%p68 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r508], [%rd180, {%r575, %r510}], [%r507]; 2026-02-21T08:52:40.6195816Z // end inline asm 2026-02-21T08:52:40.6196110Z .loc 1 55 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:32 2026-02-21T08:52:40.6196600Z mad.wide.s32 %rd127, %r1712, 2, %rd42; 2026-02-21T08:52:40.6196970Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6197320Z add.s32 %r569, %r539, %r563; 2026-02-21T08:52:40.6197498Z add.s32 %r512, %r569, %r12; 2026-02-21T08:52:40.6197670Z // begin inline asm 2026-02-21T08:52:40.6197897Z cp.async.ca.shared.global [ %r512 + 0 ], [ %rd277 + 0 ], 0x8, %r504; 2026-02-21T08:52:40.6198184Z // end inline asm 2026-02-21T08:52:40.6198336Z add.s32 %r514, %r512, 2048; 2026-02-21T08:52:40.6198513Z // begin inline asm 2026-02-21T08:52:40.6198739Z cp.async.ca.shared.global [ %r514 + 0 ], [ %rd127 + 0 ], 
0x8, %r504; 2026-02-21T08:52:40.6199022Z // end inline asm 2026-02-21T08:52:40.6199184Z cp.async.commit_group; 2026-02-21T08:52:40.6199516Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6199883Z add.s32 %r516, %r921, %r565; 2026-02-21T08:52:40.6200062Z // begin inline asm 2026-02-21T08:52:40.6200288Z @%p67 mbarrier.arrive.expect_tx.shared.b64 _, [%r516], 256; 2026-02-21T08:52:40.6200548Z // end inline asm 2026-02-21T08:52:40.6200851Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6201205Z add.s32 %r517, %r249, %r566; 2026-02-21T08:52:40.6201385Z bar.sync 0; 2026-02-21T08:52:40.6201539Z elect.sync %r570|%p79, -1; 2026-02-21T08:52:40.6201730Z and.pred %p80, %p74, %p79; 2026-02-21T08:52:40.6201915Z and.pred %p70, %p1, %p80; 2026-02-21T08:52:40.6202092Z add.s32 %r519, %r568, 80; 2026-02-21T08:52:40.6202277Z // begin inline asm 2026-02-21T08:52:40.6202690Z @%p70 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r517], [%rd180, {%r575, %r519}], [%r516]; 2026-02-21T08:52:40.6203154Z // end inline asm 2026-02-21T08:52:40.6203457Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6203822Z add.s64 %rd277, %rd277, 128; 2026-02-21T08:52:40.6204002Z add.s32 %r1712, %r1712, 64; 2026-02-21T08:52:40.6204192Z setp.lt.u64 %p81, %rd278, 4064; 2026-02-21T08:52:40.6204388Z add.s64 %rd278, %rd278, 32; 2026-02-21T08:52:40.6204563Z @%p81 bra $L__BB0_3; 2026-02-21T08:52:40.6204798Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:40.6205064Z cp.async.wait_group 0; 2026-02-21T08:52:40.6205249Z bar.sync 0; 2026-02-21T08:52:40.6205399Z // begin inline asm 2026-02-21T08:52:40.6205609Z @%p191 mbarrier.inval.shared::cta.b64 [%r921]; 2026-02-21T08:52:40.6205855Z // end inline asm 2026-02-21T08:52:40.6206016Z bar.sync 0; 2026-02-21T08:52:40.6206166Z // begin inline asm 2026-02-21T08:52:40.6206632Z @%p191 
mbarrier.inval.shared::cta.b64 [%r922]; 2026-02-21T08:52:40.6206867Z // end inline asm 2026-02-21T08:52:40.6207017Z // begin inline asm 2026-02-21T08:52:40.6207210Z @%p191 mbarrier.inval.shared::cta.b64 [%r923]; 2026-02-21T08:52:40.6207441Z // end inline asm 2026-02-21T08:52:40.6207597Z bar.sync 0; 2026-02-21T08:52:40.6207745Z // begin inline asm 2026-02-21T08:52:40.6207933Z @%p191 mbarrier.inval.shared::cta.b64 [%r924]; 2026-02-21T08:52:40.6208154Z // end inline asm 2026-02-21T08:52:40.6208452Z .loc 1 94 28 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:94:28 2026-02-21T08:52:40.6208826Z cvt.rn.bf16x2.f32 %r622, %r454, %r453; 2026-02-21T08:52:40.6209041Z cvt.rn.bf16x2.f32 %r623, %r456, %r455; 2026-02-21T08:52:40.6209733Z .loc 1 95 43 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:95:43 2026-02-21T08:52:40.6210178Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r30], {%r622, %r623}; 2026-02-21T08:52:40.6210475Z // begin inline asm 2026-02-21T08:52:40.6210665Z fence.proxy.async.shared::cta; 2026-02-21T08:52:40.6210866Z // end inline asm 2026-02-21T08:52:40.6211015Z bar.sync 0; 2026-02-21T08:52:40.6211166Z elect.sync %r624|%p100, -1; 2026-02-21T08:52:40.6211363Z and.pred %p86, %p1, %p100; 2026-02-21T08:52:40.6211541Z // begin inline asm 2026-02-21T08:52:40.6211867Z @%p86 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd177, {%r575, %r576}], [%r160]; 2026-02-21T08:52:40.6212234Z // end inline asm 2026-02-21T08:52:40.6212401Z cp.async.bulk.commit_group; 2026-02-21T08:52:40.6212600Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:52:40.6212819Z bar.sync 0; 2026-02-21T08:52:40.6213120Z .loc 1 28 97 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:28:97 2026-02-21T08:52:40.6213484Z add.s32 %r625, %r1711, 264; 2026-02-21T08:52:40.6213806Z .loc 1 34 35 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:34:35 2026-02-21T08:52:40.6214156Z shr.s32 %r626, %r625, 31; 2026-02-21T08:52:40.6214334Z shr.u32 %r627, %r626, 29; 
2026-02-21T08:52:40.6214505Z add.s32 %r628, %r625, %r627; 2026-02-21T08:52:40.6214846Z .loc 1 35 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:35:33 2026-02-21T08:52:40.6215206Z and.b32 %r629, %r628, -8; 2026-02-21T08:52:40.6215523Z .loc 1 36 39 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:36:39 2026-02-21T08:52:40.6215885Z sub.s32 %r630, 448, %r629; 2026-02-21T08:52:40.6216217Z .loc 1 36 52 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:36:52 2026-02-21T08:52:40.6216704Z min.s32 %r631, %r630, 8; 2026-02-21T08:52:40.6217024Z .loc 1 37 45 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:37:45 2026-02-21T08:52:40.6217399Z sub.s32 %r632, %r625, %r629; 2026-02-21T08:52:40.6217743Z .loc 1 38 51 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:38:51 2026-02-21T08:52:40.6218106Z div.s32 %r633, %r632, %r631; 2026-02-21T08:52:40.6218459Z .loc 1 37 64 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:37:64 2026-02-21T08:52:40.6218827Z mul.lo.s32 %r634, %r633, %r631; 2026-02-21T08:52:40.6219042Z sub.s32 %r635, %r632, %r634; 2026-02-21T08:52:40.6219382Z .loc 1 37 30 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:37:30 2026-02-21T08:52:40.6219749Z add.s32 %r636, %r635, %r629; 2026-02-21T08:52:40.6220075Z .loc 1 39 27 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:39:27 2026-02-21T08:52:40.6220423Z shl.b32 %r925, %r636, 4; 2026-02-21T08:52:40.6220745Z .loc 1 40 27 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:40:27 2026-02-21T08:52:40.6221094Z shl.b32 %r926, %r633, 6; 2026-02-21T08:52:40.6221406Z .loc 1 41 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:41:32 2026-02-21T08:52:40.6221912Z or.b32 %r637, %r926, %r7; 2026-02-21T08:52:40.6222097Z or.b32 %r638, %r926, %r8; 2026-02-21T08:52:40.6222415Z .loc 1 55 53 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:53 2026-02-21T08:52:40.6222766Z shl.b32 %r639, %r637, 13; 2026-02-21T08:52:40.6222942Z 
shl.b32 %r640, %r638, 13; 2026-02-21T08:52:40.6223258Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6223620Z // begin inline asm 2026-02-21T08:52:40.6223820Z @%p191 mbarrier.init.shared::cta.b64 [%r923], 1; 2026-02-21T08:52:40.6224058Z // end inline asm 2026-02-21T08:52:40.6224213Z bar.sync 0; 2026-02-21T08:52:40.6224359Z // begin inline asm 2026-02-21T08:52:40.6224689Z @%p191 mbarrier.init.shared::cta.b64 [%r924], 1; 2026-02-21T08:52:40.6224925Z // end inline asm 2026-02-21T08:52:40.6225080Z // begin inline asm 2026-02-21T08:52:40.6225265Z @%p191 mbarrier.init.shared::cta.b64 [%r921], 1; 2026-02-21T08:52:40.6225506Z // end inline asm 2026-02-21T08:52:40.6225657Z bar.sync 0; 2026-02-21T08:52:40.6225805Z // begin inline asm 2026-02-21T08:52:40.6225990Z @%p191 mbarrier.init.shared::cta.b64 [%r922], 1; 2026-02-21T08:52:40.6226219Z // end inline asm 2026-02-21T08:52:40.6226653Z .loc 1 55 60 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:60 2026-02-21T08:52:40.6227029Z or.b32 %r641, %r639, %r10; 2026-02-21T08:52:40.6227224Z or.b32 %r642, %r640, %r10; 2026-02-21T08:52:40.6227544Z .loc 1 55 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:32 2026-02-21T08:52:40.6227917Z mad.wide.s32 %rd134, %r641, 2, %rd42; 2026-02-21T08:52:40.6228134Z mad.wide.s32 %rd135, %r642, 2, %rd42; 2026-02-21T08:52:40.6228339Z mov.b32 %r583, 8; 2026-02-21T08:52:40.6228708Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6229068Z // begin inline asm 2026-02-21T08:52:40.6229312Z cp.async.ca.shared.global [ %r13 + 0 ], [ %rd134 + 0 ], 0x8, %r583; 2026-02-21T08:52:40.6229589Z // end inline asm 2026-02-21T08:52:40.6229746Z // begin inline asm 2026-02-21T08:52:40.6229972Z cp.async.ca.shared.global [ %r14 + 0 ], [ %rd135 + 0 ], 0x8, %r583; 2026-02-21T08:52:40.6230251Z // end inline asm 2026-02-21T08:52:40.6230409Z cp.async.commit_group; 2026-02-21T08:52:40.6230737Z .loc 
1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6231100Z // begin inline asm 2026-02-21T08:52:40.6231339Z @%p191 mbarrier.arrive.expect_tx.shared.b64 _, [%r923], 256; 2026-02-21T08:52:40.6231608Z // end inline asm 2026-02-21T08:52:40.6231901Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6232251Z bar.sync 0; 2026-02-21T08:52:40.6232403Z elect.sync %r643|%p101, -1; 2026-02-21T08:52:40.6232595Z and.pred %p92, %p1, %p101; 2026-02-21T08:52:40.6232771Z mov.b32 %r589, 0; 2026-02-21T08:52:40.6232923Z // begin inline asm 2026-02-21T08:52:40.6233359Z @%p92 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r240], [%rd180, {%r925, %r589}], [%r923]; 2026-02-21T08:52:40.6233826Z // end inline asm 2026-02-21T08:52:40.6234124Z .loc 1 55 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:32 2026-02-21T08:52:40.6234483Z cvt.s64.s32 %rd147, %r639; 2026-02-21T08:52:40.6234674Z or.b64 %rd148, %rd147, %rd276; 2026-02-21T08:52:40.6234858Z shl.b64 %rd149, %rd148, 1; 2026-02-21T08:52:40.6235041Z add.s64 %rd150, %rd42, %rd149; 2026-02-21T08:52:40.6235226Z add.s64 %rd137, %rd150, 64; 2026-02-21T08:52:40.6235400Z cvt.s64.s32 %rd151, %r640; 2026-02-21T08:52:40.6235581Z or.b64 %rd152, %rd151, %rd276; 2026-02-21T08:52:40.6235758Z shl.b64 %rd153, %rd152, 1; 2026-02-21T08:52:40.6235934Z add.s64 %rd154, %rd42, %rd153; 2026-02-21T08:52:40.6236284Z add.s64 %rd138, %rd154, 64; 2026-02-21T08:52:40.6236731Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6237093Z // begin inline asm 2026-02-21T08:52:40.6237336Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd137 + 0 ], 0x8, %r583; 2026-02-21T08:52:40.6237618Z // end inline asm 2026-02-21T08:52:40.6237768Z // begin inline asm 2026-02-21T08:52:40.6238002Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd138 + 0 ], 0x8, %r583; 2026-02-21T08:52:40.6238270Z // end inline asm 
2026-02-21T08:52:40.6238432Z cp.async.commit_group; 2026-02-21T08:52:40.6238755Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6239118Z // begin inline asm 2026-02-21T08:52:40.6239467Z @%p191 mbarrier.arrive.expect_tx.shared.b64 _, [%r921], 256; 2026-02-21T08:52:40.6239758Z // end inline asm 2026-02-21T08:52:40.6240056Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6240406Z bar.sync 0; 2026-02-21T08:52:40.6240566Z elect.sync %r644|%p102, -1; 2026-02-21T08:52:40.6240752Z and.pred %p94, %p1, %p102; 2026-02-21T08:52:40.6240934Z mov.b32 %r598, 16; 2026-02-21T08:52:40.6241088Z // begin inline asm 2026-02-21T08:52:40.6241519Z @%p94 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r249], [%rd180, {%r925, %r598}], [%r921]; 2026-02-21T08:52:40.6241985Z // end inline asm 2026-02-21T08:52:40.6242287Z .loc 1 55 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:32 2026-02-21T08:52:40.6242641Z add.s64 %rd140, %rd150, 128; 2026-02-21T08:52:40.6242824Z add.s64 %rd141, %rd154, 128; 2026-02-21T08:52:40.6243156Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6243517Z // begin inline asm 2026-02-21T08:52:40.6243757Z cp.async.ca.shared.global [ %r17 + 0 ], [ %rd140 + 0 ], 0x8, %r583; 2026-02-21T08:52:40.6244031Z // end inline asm 2026-02-21T08:52:40.6244194Z // begin inline asm 2026-02-21T08:52:40.6244432Z cp.async.ca.shared.global [ %r18 + 0 ], [ %rd141 + 0 ], 0x8, %r583; 2026-02-21T08:52:40.6244703Z // end inline asm 2026-02-21T08:52:40.6244862Z cp.async.commit_group; 2026-02-21T08:52:40.6245180Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6245550Z // begin inline asm 2026-02-21T08:52:40.6245773Z @%p191 mbarrier.arrive.expect_tx.shared.b64 _, [%r924], 256; 2026-02-21T08:52:40.6246044Z // end inline asm 2026-02-21T08:52:40.6246343Z 
.loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6246831Z bar.sync 0; 2026-02-21T08:52:40.6246992Z elect.sync %r645|%p103, -1; 2026-02-21T08:52:40.6247180Z and.pred %p96, %p1, %p103; 2026-02-21T08:52:40.6247359Z mov.b32 %r607, 32; 2026-02-21T08:52:40.6247513Z // begin inline asm 2026-02-21T08:52:40.6247930Z @%p96 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r258], [%rd180, {%r925, %r607}], [%r924]; 2026-02-21T08:52:40.6248388Z // end inline asm 2026-02-21T08:52:40.6248686Z .loc 1 55 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:32 2026-02-21T08:52:40.6249060Z add.s64 %rd143, %rd150, 192; 2026-02-21T08:52:40.6249241Z add.s64 %rd144, %rd154, 192; 2026-02-21T08:52:40.6249559Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6249902Z // begin inline asm 2026-02-21T08:52:40.6250137Z cp.async.ca.shared.global [ %r19 + 0 ], [ %rd143 + 0 ], 0x8, %r583; 2026-02-21T08:52:40.6250412Z // end inline asm 2026-02-21T08:52:40.6250565Z // begin inline asm 2026-02-21T08:52:40.6250790Z cp.async.ca.shared.global [ %r20 + 0 ], [ %rd144 + 0 ], 0x8, %r583; 2026-02-21T08:52:40.6251063Z // end inline asm 2026-02-21T08:52:40.6251224Z cp.async.commit_group; 2026-02-21T08:52:40.6251729Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6252088Z // begin inline asm 2026-02-21T08:52:40.6252305Z @%p191 mbarrier.arrive.expect_tx.shared.b64 _, [%r922], 256; 2026-02-21T08:52:40.6252572Z // end inline asm 2026-02-21T08:52:40.6252858Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6253215Z bar.sync 0; 2026-02-21T08:52:40.6253285Z elect.sync %r646|%p104, -1; 2026-02-21T08:52:40.6253357Z and.pred %p98, %p1, %p104; 2026-02-21T08:52:40.6253414Z mov.b32 %r616, 48; 2026-02-21T08:52:40.6253473Z // begin inline asm 2026-02-21T08:52:40.6253921Z @%p98 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r267], [%rd180, {%r925, %r616}], [%r922]; 2026-02-21T08:52:40.6253994Z // end inline asm 2026-02-21T08:52:40.6254213Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6254284Z shl.b32 %r647, %r633, 19; 2026-02-21T08:52:40.6254355Z or.b32 %r648, %r1710, %r647; 2026-02-21T08:52:40.6254432Z mad.wide.s32 %rd279, %r648, 2, %rd11; 2026-02-21T08:52:40.6254499Z or.b32 %r1720, %r33, %r647; 2026-02-21T08:52:40.6254566Z mov.b32 %r803, 0f00000000; 2026-02-21T08:52:40.6254625Z mov.b32 %r1723, 1; 2026-02-21T08:52:40.6254686Z mov.b32 %r1722, -1; 2026-02-21T08:52:40.6254746Z mov.b64 %rd280, 0; 2026-02-21T08:52:40.6254825Z mov.b32 %r1721, %r589; 2026-02-21T08:52:40.6254891Z mov.b32 %r804, %r803; 2026-02-21T08:52:40.6254951Z mov.b32 %r805, %r803; 2026-02-21T08:52:40.6255023Z mov.b32 %r806, %r803; 2026-02-21T08:52:40.6255144Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:40.6255258Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:40.6255332Z setp.lt.u64 %p124, %rd280, 4032; 2026-02-21T08:52:40.6255399Z add.s32 %r871, %r1722, 1; 2026-02-21T08:52:40.6255470Z setp.gt.s32 %p125, %r871, 1; 2026-02-21T08:52:40.6255542Z selp.b32 %r1722, 0, %r871, %p125; 2026-02-21T08:52:40.6255613Z selp.b32 %r872, 1, 0, %p125; 2026-02-21T08:52:40.6255680Z xor.b32 %r1721, %r1721, %r872; 2026-02-21T08:52:40.6255891Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6255969Z cp.async.wait_group 2; 2026-02-21T08:52:40.6256026Z bar.sync 0; 2026-02-21T08:52:40.6256090Z shl.b32 %r873, %r1722, 12; 2026-02-21T08:52:40.6256154Z add.s32 %r875, %r160, %r873; 2026-02-21T08:52:40.6256359Z .loc 1 59 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:59:32 2026-02-21T08:52:40.6256424Z add.s32 %r876, %r875, %r21; 2026-02-21T08:52:40.6256626Z ld.shared.b16 %rs53, [%r876]; 2026-02-21T08:52:40.6256717Z ld.shared.b16 %rs54, 
[%r876+512]; 2026-02-21T08:52:40.6256789Z ld.shared.b16 %rs55, [%r876+32]; 2026-02-21T08:52:40.6256856Z ld.shared.b16 %rs56, [%r876+544]; 2026-02-21T08:52:40.6256925Z add.s32 %r877, %r875, %r22; 2026-02-21T08:52:40.6256998Z ld.shared.b16 %rs57, [%r877]; 2026-02-21T08:52:40.6257068Z ld.shared.b16 %rs58, [%r877+512]; 2026-02-21T08:52:40.6257135Z ld.shared.b16 %rs59, [%r877+32]; 2026-02-21T08:52:40.6257208Z ld.shared.b16 %rs60, [%r877+544]; 2026-02-21T08:52:40.6257272Z add.s32 %r878, %r875, %r23; 2026-02-21T08:52:40.6257336Z ld.shared.b16 %rs61, [%r878]; 2026-02-21T08:52:40.6257409Z ld.shared.b16 %rs62, [%r878+512]; 2026-02-21T08:52:40.6257476Z ld.shared.b16 %rs63, [%r878+32]; 2026-02-21T08:52:40.6257541Z ld.shared.b16 %rs64, [%r878+544]; 2026-02-21T08:52:40.6257602Z add.s32 %r879, %r875, %r24; 2026-02-21T08:52:40.6257673Z ld.shared.b16 %rs65, [%r879]; 2026-02-21T08:52:40.6257741Z ld.shared.b16 %rs66, [%r879+512]; 2026-02-21T08:52:40.6257809Z ld.shared.b16 %rs67, [%r879+32]; 2026-02-21T08:52:40.6257877Z ld.shared.b16 %rs68, [%r879+544]; 2026-02-21T08:52:40.6257942Z cvt.f32.bf16 %r719, %rs53; 2026-02-21T08:52:40.6258143Z cvt.f32.bf16 %r720, %rs54; 2026-02-21T08:52:40.6258209Z cvt.f32.bf16 %r721, %rs57; 2026-02-21T08:52:40.6258275Z cvt.f32.bf16 %r722, %rs58; 2026-02-21T08:52:40.6258337Z cvt.f32.bf16 %r731, %rs61; 2026-02-21T08:52:40.6258399Z cvt.f32.bf16 %r732, %rs62; 2026-02-21T08:52:40.6258465Z cvt.f32.bf16 %r733, %rs65; 2026-02-21T08:52:40.6258528Z cvt.f32.bf16 %r734, %rs66; 2026-02-21T08:52:40.6258591Z cvt.f32.bf16 %r743, %rs55; 2026-02-21T08:52:40.6258652Z cvt.f32.bf16 %r744, %rs56; 2026-02-21T08:52:40.6258718Z cvt.f32.bf16 %r745, %rs59; 2026-02-21T08:52:40.6258781Z cvt.f32.bf16 %r746, %rs60; 2026-02-21T08:52:40.6258847Z cvt.f32.bf16 %r755, %rs63; 2026-02-21T08:52:40.6258919Z cvt.f32.bf16 %r756, %rs64; 2026-02-21T08:52:40.6258989Z cvt.f32.bf16 %r757, %rs67; 2026-02-21T08:52:40.6259170Z cvt.f32.bf16 %r758, %rs68; 2026-02-21T08:52:40.6259390Z .loc 1 48 112 // 
c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6259464Z shl.b32 %r880, %r1722, 3; 2026-02-21T08:52:40.6259526Z add.s32 %r649, %r923, %r880; 2026-02-21T08:52:40.6259590Z // begin inline asm 2026-02-21T08:52:40.6259648Z 2026-02-21T08:52:40.6259700Z { 2026-02-21T08:52:40.6259766Z .reg .pred complete; 2026-02-21T08:52:40.6259833Z waitLoop: 2026-02-21T08:52:40.6259980Z mbarrier.try_wait.parity.shared.b64 complete, [%r649], %r1721; 2026-02-21T08:52:40.6260050Z @!complete bra.uni waitLoop; 2026-02-21T08:52:40.6260102Z } 2026-02-21T08:52:40.6260107Z 2026-02-21T08:52:40.6260169Z // end inline asm 2026-02-21T08:52:40.6260369Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6260430Z shl.b32 %r882, %r1722, 8; 2026-02-21T08:52:40.6260508Z add.s32 %r884, %r240, %r882; 2026-02-21T08:52:40.6260712Z .loc 1 79 58 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:79:58 2026-02-21T08:52:40.6260776Z add.s32 %r885, %r884, %r25; 2026-02-21T08:52:40.6260850Z ld.shared.b8 %rs69, [%r885]; 2026-02-21T08:52:40.6260918Z ld.shared.b8 %rs70, [%r885+128]; 2026-02-21T08:52:40.6261115Z .loc 1 64 28 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:64:28 2026-02-21T08:52:40.6261178Z shl.b16 %rs71, %rs69, 4; 2026-02-21T08:52:40.6261245Z shl.b16 %rs72, %rs70, 4; 2026-02-21T08:52:40.6261440Z .loc 1 79 58 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:79:58 2026-02-21T08:52:40.6261513Z selp.b16 %rs73, %rs71, %rs69, %p243; 2026-02-21T08:52:40.6261582Z cvt.s16.s8 %rs74, %rs73; 2026-02-21T08:52:40.6261644Z shr.s16 %rs75, %rs74, 4; 2026-02-21T08:52:40.6261784Z selp.b16 %rs76, %rs72, %rs70, %p243; 2026-02-21T08:52:40.6261892Z cvt.s16.s8 %rs77, %rs76; 2026-02-21T08:52:40.6262031Z shr.s16 %rs78, %rs77, 4; 2026-02-21T08:52:40.6262284Z .loc 1 84 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:84:32 2026-02-21T08:52:40.6262383Z cvt.rn.f32.s16 %r886, %rs75; 2026-02-21T08:52:40.6262486Z 
cvt.rn.f32.s16 %r887, %rs78; 2026-02-21T08:52:40.6262820Z st.shared.b32 [%r26], %r886; 2026-02-21T08:52:40.6262922Z st.shared.b32 [%r27], %r887; 2026-02-21T08:52:40.6263094Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r434], {%r803}; 2026-02-21T08:52:40.6263240Z bar.sync 0; 2026-02-21T08:52:40.6263339Z // begin inline asm 2026-02-21T08:52:40.6263515Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r675, %r723}, [%r653]; 2026-02-21T08:52:40.6263717Z // end inline asm 2026-02-21T08:52:40.6263808Z bar.sync 0; 2026-02-21T08:52:40.6263971Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r434], {%r805}; 2026-02-21T08:52:40.6264090Z bar.sync 0; 2026-02-21T08:52:40.6264230Z // begin inline asm 2026-02-21T08:52:40.6264397Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r677, %r725}, [%r653]; 2026-02-21T08:52:40.6264552Z // end inline asm 2026-02-21T08:52:40.6264698Z bar.sync 0; 2026-02-21T08:52:40.6264879Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r434], {%r804}; 2026-02-21T08:52:40.6265072Z bar.sync 0; 2026-02-21T08:52:40.6265166Z // begin inline asm 2026-02-21T08:52:40.6265364Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r676, %r724}, [%r653]; 2026-02-21T08:52:40.6265511Z // end inline asm 2026-02-21T08:52:40.6265618Z bar.sync 0; 2026-02-21T08:52:40.6265833Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r434], {%r806}; 2026-02-21T08:52:40.6265924Z bar.sync 0; 2026-02-21T08:52:40.6266018Z // begin inline asm 2026-02-21T08:52:40.6266214Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r678, %r726}, [%r653]; 2026-02-21T08:52:40.6266363Z // end inline asm 2026-02-21T08:52:40.6266634Z $L__tmp5: 2026-02-21T08:52:40.6266972Z .loc 2 291 36 // standard.py:291:36 @[ c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:91:40 ] 2026-02-21T08:52:40.6267276Z // begin inline asm 2026-02-21T08:52:40.6267401Z fence.proxy.async.shared::cta; 2026-02-21T08:52:40.6267494Z // end inline asm 2026-02-21T08:52:40.6267720Z shfl.sync.idx.b32 %r888, %r5, 0, 31, -1; 2026-02-21T08:52:40.6279295Z wgmma.fence.sync.aligned; 
2026-02-21T08:52:40.6279419Z mov.pred %p105, -1; 2026-02-21T08:52:40.6279496Z // begin inline asm 2026-02-21T08:52:40.6279820Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r675,%r676,%r677,%r678}, {%r719,%r720,%r721,%r722}, %rd3, %p105, 1, 1; 2026-02-21T08:52:40.6279884Z // end inline asm 2026-02-21T08:52:40.6279948Z // begin inline asm 2026-02-21T08:52:40.6280247Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r675,%r676,%r677,%r678}, {%r731,%r732,%r733,%r734}, %rd4, %p105, 1, 1; 2026-02-21T08:52:40.6280307Z // end inline asm 2026-02-21T08:52:40.6280368Z // begin inline asm 2026-02-21T08:52:40.6280663Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r675,%r676,%r677,%r678}, {%r743,%r744,%r745,%r746}, %rd5, %p105, 1, 1; 2026-02-21T08:52:40.6280729Z // end inline asm 2026-02-21T08:52:40.6280790Z // begin inline asm 2026-02-21T08:52:40.6281084Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r675,%r676,%r677,%r678}, {%r755,%r756,%r757,%r758}, %rd6, %p105, 1, 1; 2026-02-21T08:52:40.6281151Z // end inline asm 2026-02-21T08:52:40.6281212Z // begin inline asm 2026-02-21T08:52:40.6281492Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r723,%r724,%r725,%r726}, {%r719,%r720,%r721,%r722}, %rd7, %p105, 1, 1; 2026-02-21T08:52:40.6281553Z // end inline asm 2026-02-21T08:52:40.6281613Z // begin inline asm 2026-02-21T08:52:40.6281886Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r723,%r724,%r725,%r726}, {%r731,%r732,%r733,%r734}, %rd8, %p105, 1, 1; 2026-02-21T08:52:40.6281950Z // end inline asm 2026-02-21T08:52:40.6282009Z // begin inline asm 2026-02-21T08:52:40.6282281Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r723,%r724,%r725,%r726}, {%r743,%r744,%r745,%r746}, %rd9, %p105, 1, 1; 2026-02-21T08:52:40.6282349Z // end inline asm 2026-02-21T08:52:40.6282409Z // begin inline asm 2026-02-21T08:52:40.6282700Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r723,%r724,%r725,%r726}, {%r755,%r756,%r757,%r758}, %rd10, %p105, 1, 1; 
2026-02-21T08:52:40.6282769Z // end inline asm 2026-02-21T08:52:40.6282852Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:40.6282917Z mov.b32 %r768, %r589; 2026-02-21T08:52:40.6282975Z mov.b32 %r769, %r589; 2026-02-21T08:52:40.6283043Z mov.b32 %r767, %r187; 2026-02-21T08:52:40.6283102Z // begin inline asm 2026-02-21T08:52:40.6283268Z // wait for regs: %r675,%r676,%r677,%r678,%r723,%r724,%r725,%r726,%r767,%r768,%r769 2026-02-21T08:52:40.6283351Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:40.6283409Z // end inline asm 2026-02-21T08:52:40.6283467Z $L__tmp6: 2026-02-21T08:52:40.6283690Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6283769Z add.s32 %r890, %r539, %r873; 2026-02-21T08:52:40.6283980Z .loc 1 59 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:59:32 2026-02-21T08:52:40.6284049Z add.s32 %r891, %r890, %r21; 2026-02-21T08:52:40.6284126Z ld.shared.b16 %rs79, [%r891]; 2026-02-21T08:52:40.6284421Z ld.shared.b16 %rs80, [%r891+512]; 2026-02-21T08:52:40.6284494Z ld.shared.b16 %rs81, [%r891+32]; 2026-02-21T08:52:40.6284570Z ld.shared.b16 %rs82, [%r891+544]; 2026-02-21T08:52:40.6284643Z add.s32 %r892, %r890, %r22; 2026-02-21T08:52:40.6284710Z ld.shared.b16 %rs83, [%r892]; 2026-02-21T08:52:40.6284775Z ld.shared.b16 %rs84, [%r892+512]; 2026-02-21T08:52:40.6284846Z ld.shared.b16 %rs85, [%r892+32]; 2026-02-21T08:52:40.6284911Z ld.shared.b16 %rs86, [%r892+544]; 2026-02-21T08:52:40.6284973Z add.s32 %r893, %r890, %r23; 2026-02-21T08:52:40.6285047Z ld.shared.b16 %rs87, [%r893]; 2026-02-21T08:52:40.6285113Z ld.shared.b16 %rs88, [%r893+512]; 2026-02-21T08:52:40.6285179Z ld.shared.b16 %rs89, [%r893+32]; 2026-02-21T08:52:40.6285371Z ld.shared.b16 %rs90, [%r893+544]; 2026-02-21T08:52:40.6285444Z add.s32 %r894, %r890, %r24; 2026-02-21T08:52:40.6285509Z ld.shared.b16 %rs91, [%r894]; 2026-02-21T08:52:40.6285575Z ld.shared.b16 %rs92, [%r894+512]; 2026-02-21T08:52:40.6285655Z ld.shared.b16 %rs93, [%r894+32]; 
2026-02-21T08:52:40.6285727Z ld.shared.b16 %rs94, [%r894+544]; 2026-02-21T08:52:40.6285794Z cvt.f32.bf16 %r799, %rs79; 2026-02-21T08:52:40.6285861Z cvt.f32.bf16 %r800, %rs80; 2026-02-21T08:52:40.6285922Z cvt.f32.bf16 %r801, %rs83; 2026-02-21T08:52:40.6285983Z cvt.f32.bf16 %r802, %rs84; 2026-02-21T08:52:40.6286042Z cvt.f32.bf16 %r811, %rs87; 2026-02-21T08:52:40.6286108Z cvt.f32.bf16 %r812, %rs88; 2026-02-21T08:52:40.6286170Z cvt.f32.bf16 %r813, %rs91; 2026-02-21T08:52:40.6286230Z cvt.f32.bf16 %r814, %rs92; 2026-02-21T08:52:40.6286295Z cvt.f32.bf16 %r823, %rs81; 2026-02-21T08:52:40.6286355Z cvt.f32.bf16 %r824, %rs82; 2026-02-21T08:52:40.6286414Z cvt.f32.bf16 %r825, %rs85; 2026-02-21T08:52:40.6286626Z cvt.f32.bf16 %r826, %rs86; 2026-02-21T08:52:40.6286702Z cvt.f32.bf16 %r835, %rs89; 2026-02-21T08:52:40.6286762Z cvt.f32.bf16 %r836, %rs90; 2026-02-21T08:52:40.6286821Z cvt.f32.bf16 %r837, %rs93; 2026-02-21T08:52:40.6286889Z cvt.f32.bf16 %r838, %rs94; 2026-02-21T08:52:40.6287108Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6287172Z add.s32 %r781, %r921, %r880; 2026-02-21T08:52:40.6287236Z // begin inline asm 2026-02-21T08:52:40.6287296Z 2026-02-21T08:52:40.6287348Z { 2026-02-21T08:52:40.6287414Z .reg .pred complete; 2026-02-21T08:52:40.6287479Z waitLoop: 2026-02-21T08:52:40.6287632Z mbarrier.try_wait.parity.shared.b64 complete, [%r781], %r1721; 2026-02-21T08:52:40.6287705Z @!complete bra.uni waitLoop; 2026-02-21T08:52:40.6287757Z } 2026-02-21T08:52:40.6287766Z 2026-02-21T08:52:40.6287835Z // end inline asm 2026-02-21T08:52:40.6288044Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6288113Z add.s32 %r897, %r249, %r882; 2026-02-21T08:52:40.6288319Z .loc 1 79 58 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:79:58 2026-02-21T08:52:40.6288385Z add.s32 %r898, %r897, %r25; 2026-02-21T08:52:40.6288452Z ld.shared.b8 %rs95, [%r898]; 2026-02-21T08:52:40.6288534Z 
ld.shared.b8 %rs96, [%r898+128]; 2026-02-21T08:52:40.6288736Z .loc 1 64 28 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:64:28 2026-02-21T08:52:40.6288799Z shl.b16 %rs97, %rs95, 4; 2026-02-21T08:52:40.6288862Z shl.b16 %rs98, %rs96, 4; 2026-02-21T08:52:40.6289061Z .loc 1 79 58 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:79:58 2026-02-21T08:52:40.6289135Z selp.b16 %rs99, %rs97, %rs95, %p243; 2026-02-21T08:52:40.6289200Z cvt.s16.s8 %rs100, %rs99; 2026-02-21T08:52:40.6289270Z shr.s16 %rs101, %rs100, 4; 2026-02-21T08:52:40.6289343Z selp.b16 %rs102, %rs98, %rs96, %p243; 2026-02-21T08:52:40.6289409Z cvt.s16.s8 %rs103, %rs102; 2026-02-21T08:52:40.6289476Z shr.s16 %rs104, %rs103, 4; 2026-02-21T08:52:40.6289678Z .loc 1 84 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:84:32 2026-02-21T08:52:40.6289898Z cvt.rn.f32.s16 %r899, %rs101; 2026-02-21T08:52:40.6289964Z cvt.rn.f32.s16 %r900, %rs104; 2026-02-21T08:52:40.6290028Z bar.sync 0; 2026-02-21T08:52:40.6290094Z st.shared.b32 [%r26], %r899; 2026-02-21T08:52:40.6290160Z st.shared.b32 [%r27], %r900; 2026-02-21T08:52:40.6290224Z $L__tmp7: 2026-02-21T08:52:40.6290501Z .loc 2 291 36 // standard.py:291:36 @[ c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:91:40 ] 2026-02-21T08:52:40.6290655Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r653], {%r675, %r723}; 2026-02-21T08:52:40.6290721Z bar.sync 0; 2026-02-21T08:52:40.6290787Z // begin inline asm 2026-02-21T08:52:40.6290921Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r803}, [%r434]; 2026-02-21T08:52:40.6291097Z // end inline asm 2026-02-21T08:52:40.6291163Z bar.sync 0; 2026-02-21T08:52:40.6291309Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r653], {%r677, %r725}; 2026-02-21T08:52:40.6291368Z bar.sync 0; 2026-02-21T08:52:40.6291440Z // begin inline asm 2026-02-21T08:52:40.6291567Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r805}, [%r434]; 2026-02-21T08:52:40.6291626Z // end inline asm 2026-02-21T08:52:40.6291683Z bar.sync 0; 
2026-02-21T08:52:40.6291826Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r653], {%r676, %r724}; 2026-02-21T08:52:40.6291881Z bar.sync 0; 2026-02-21T08:52:40.6291940Z // begin inline asm 2026-02-21T08:52:40.6292081Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r804}, [%r434]; 2026-02-21T08:52:40.6292140Z // end inline asm 2026-02-21T08:52:40.6292196Z bar.sync 0; 2026-02-21T08:52:40.6292337Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r653], {%r678, %r726}; 2026-02-21T08:52:40.6292398Z bar.sync 0; 2026-02-21T08:52:40.6292457Z // begin inline asm 2026-02-21T08:52:40.6292582Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r806}, [%r434]; 2026-02-21T08:52:40.6292644Z // end inline asm 2026-02-21T08:52:40.6292703Z // begin inline asm 2026-02-21T08:52:40.6292789Z fence.proxy.async.shared::cta; 2026-02-21T08:52:40.6292853Z // end inline asm 2026-02-21T08:52:40.6292938Z wgmma.fence.sync.aligned; 2026-02-21T08:52:40.6293000Z shl.b32 %r901, %r888, 8; 2026-02-21T08:52:40.6293063Z and.b32 %r902, %r901, 1024; 2026-02-21T08:52:40.6293141Z add.s32 %r903, %r902, %r187; 2026-02-21T08:52:40.6293203Z bfe.u32 %r904, %r903, 4, 14; 2026-02-21T08:52:40.6293268Z cvt.u64.u32 %rd173, %r904; 2026-02-21T08:52:40.6293355Z or.b64 %rd163, %rd173, 4611686293313683456; 2026-02-21T08:52:40.6293429Z // begin inline asm 2026-02-21T08:52:40.6293735Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r803,%r804,%r805,%r806}, {%r799,%r800,%r801,%r802}, %rd163, %p105, 1, 1; 2026-02-21T08:52:40.6293794Z // end inline asm 2026-02-21T08:52:40.6293861Z add.s32 %r905, %r903, 32; 2026-02-21T08:52:40.6293926Z bfe.u32 %r906, %r905, 4, 14; 2026-02-21T08:52:40.6293989Z cvt.u64.u32 %rd174, %r906; 2026-02-21T08:52:40.6294071Z or.b64 %rd164, %rd174, 4611686293313683456; 2026-02-21T08:52:40.6294130Z // begin inline asm 2026-02-21T08:52:40.6294418Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r803,%r804,%r805,%r806}, {%r811,%r812,%r813,%r814}, %rd164, %p105, 1, 1; 2026-02-21T08:52:40.6294480Z // end inline asm 
2026-02-21T08:52:40.6294545Z add.s32 %r907, %r903, 64; 2026-02-21T08:52:40.6294607Z bfe.u32 %r908, %r907, 4, 14; 2026-02-21T08:52:40.6294671Z cvt.u64.u32 %rd175, %r908; 2026-02-21T08:52:40.6294750Z or.b64 %rd165, %rd175, 4611686293313683456; 2026-02-21T08:52:40.6294810Z // begin inline asm 2026-02-21T08:52:40.6295092Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r803,%r804,%r805,%r806}, {%r823,%r824,%r825,%r826}, %rd165, %p105, 1, 1; 2026-02-21T08:52:40.6295158Z // end inline asm 2026-02-21T08:52:40.6295218Z add.s32 %r909, %r903, 96; 2026-02-21T08:52:40.6295285Z bfe.u32 %r910, %r909, 4, 14; 2026-02-21T08:52:40.6295347Z cvt.u64.u32 %rd176, %r910; 2026-02-21T08:52:40.6295427Z or.b64 %rd166, %rd176, 4611686293313683456; 2026-02-21T08:52:40.6295486Z // begin inline asm 2026-02-21T08:52:40.6296321Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r803,%r804,%r805,%r806}, {%r835,%r836,%r837,%r838}, %rd166, %p105, 1, 1; 2026-02-21T08:52:40.6296386Z // end inline asm 2026-02-21T08:52:40.6296582Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:40.6296646Z mov.b32 %r843, %r187; 2026-02-21T08:52:40.6296711Z mov.b32 %r844, %r589; 2026-02-21T08:52:40.6296773Z mov.b32 %r845, %r589; 2026-02-21T08:52:40.6296833Z // begin inline asm 2026-02-21T08:52:40.6297111Z // wait for regs: %r803,%r804,%r805,%r806,%r843,%r844,%r845 2026-02-21T08:52:40.6297193Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:40.6297263Z // end inline asm 2026-02-21T08:52:40.6297319Z $L__tmp8: 2026-02-21T08:52:40.6297685Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6297753Z add.s32 %r911, %r1723, 1; 2026-02-21T08:52:40.6297821Z setp.gt.s32 %p126, %r911, 1; 2026-02-21T08:52:40.6297893Z selp.b32 %r1723, 0, %r911, %p126; 2026-02-21T08:52:40.6298102Z .loc 1 55 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:32 2026-02-21T08:52:40.6298171Z add.s32 %r912, %r1720, -32; 2026-02-21T08:52:40.6298235Z add.s64 %rd167, %rd279, -64; 
2026-02-21T08:52:40.6298311Z mad.wide.s32 %rd168, %r912, 2, %rd42; 2026-02-21T08:52:40.6298510Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6298572Z shl.b32 %r913, %r1723, 12; 2026-02-21T08:52:40.6298637Z add.s32 %r914, %r160, %r913; 2026-02-21T08:52:40.6298699Z add.s32 %r853, %r914, %r12; 2026-02-21T08:52:40.6298762Z selp.b32 %r854, 8, 0, %p124; 2026-02-21T08:52:40.6298827Z // begin inline asm 2026-02-21T08:52:40.6298973Z cp.async.ca.shared.global [ %r853 + 0 ], [ %rd167 + 0 ], 0x8, %r854; 2026-02-21T08:52:40.6299032Z // end inline asm 2026-02-21T08:52:40.6299093Z add.s32 %r855, %r853, 2048; 2026-02-21T08:52:40.6299156Z // begin inline asm 2026-02-21T08:52:40.6299296Z cp.async.ca.shared.global [ %r855 + 0 ], [ %rd168 + 0 ], 0x8, %r854; 2026-02-21T08:52:40.6299354Z // end inline asm 2026-02-21T08:52:40.6299428Z cp.async.commit_group; 2026-02-21T08:52:40.6299639Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6299700Z shl.b32 %r915, %r1723, 3; 2026-02-21T08:52:40.6299762Z add.s32 %r857, %r923, %r915; 2026-02-21T08:52:40.6299838Z and.pred %p117, %p191, %p124; 2026-02-21T08:52:40.6299898Z // begin inline asm 2026-02-21T08:52:40.6300031Z @%p117 mbarrier.arrive.expect_tx.shared.b64 _, [%r857], 256; 2026-02-21T08:52:40.6300093Z // end inline asm 2026-02-21T08:52:40.6300302Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6300368Z shl.b32 %r916, %r1723, 8; 2026-02-21T08:52:40.6300438Z add.s32 %r858, %r240, %r916; 2026-02-21T08:52:40.6300509Z bar.sync 0; 2026-02-21T08:52:40.6300584Z elect.sync %r917|%p127, -1; 2026-02-21T08:52:40.6300655Z and.pred %p128, %p124, %p127; 2026-02-21T08:52:40.6300729Z and.pred %p118, %p1, %p128; 2026-02-21T08:52:40.6300794Z cvt.u32.u64 %r918, %rd280; 2026-02-21T08:52:40.6300857Z add.s32 %r860, %r918, 64; 2026-02-21T08:52:40.6300921Z // begin inline asm 2026-02-21T08:52:40.6301246Z @%p118 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r858], [%rd180, {%r925, %r860}], [%r857]; 2026-02-21T08:52:40.6301306Z // end inline asm 2026-02-21T08:52:40.6301517Z .loc 1 55 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:32 2026-02-21T08:52:40.6301592Z mad.wide.s32 %rd171, %r1720, 2, %rd42; 2026-02-21T08:52:40.6301789Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6301856Z add.s32 %r919, %r539, %r913; 2026-02-21T08:52:40.6301925Z add.s32 %r862, %r919, %r12; 2026-02-21T08:52:40.6301986Z // begin inline asm 2026-02-21T08:52:40.6302124Z cp.async.ca.shared.global [ %r862 + 0 ], [ %rd279 + 0 ], 0x8, %r854; 2026-02-21T08:52:40.6302338Z // end inline asm 2026-02-21T08:52:40.6302399Z add.s32 %r864, %r862, 2048; 2026-02-21T08:52:40.6302460Z // begin inline asm 2026-02-21T08:52:40.6302603Z cp.async.ca.shared.global [ %r864 + 0 ], [ %rd171 + 0 ], 0x8, %r854; 2026-02-21T08:52:40.6302661Z // end inline asm 2026-02-21T08:52:40.6302727Z cp.async.commit_group; 2026-02-21T08:52:40.6302943Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6303011Z add.s32 %r866, %r921, %r915; 2026-02-21T08:52:40.6303071Z // begin inline asm 2026-02-21T08:52:40.6303200Z @%p117 mbarrier.arrive.expect_tx.shared.b64 _, [%r866], 256; 2026-02-21T08:52:40.6303263Z // end inline asm 2026-02-21T08:52:40.6303582Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6303648Z add.s32 %r867, %r249, %r916; 2026-02-21T08:52:40.6303708Z bar.sync 0; 2026-02-21T08:52:40.6303783Z elect.sync %r920|%p129, -1; 2026-02-21T08:52:40.6303850Z and.pred %p130, %p124, %p129; 2026-02-21T08:52:40.6303915Z and.pred %p120, %p1, %p130; 2026-02-21T08:52:40.6303982Z add.s32 %r869, %r918, 80; 2026-02-21T08:52:40.6304042Z // begin inline asm 2026-02-21T08:52:40.6304435Z @%p120 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r867], [%rd180, {%r925, %r869}], [%r866]; 2026-02-21T08:52:40.6304504Z // end inline asm 2026-02-21T08:52:40.6304721Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6304791Z add.s64 %rd279, %rd279, 128; 2026-02-21T08:52:40.6304854Z add.s32 %r1720, %r1720, 64; 2026-02-21T08:52:40.6304930Z setp.lt.u64 %p131, %rd280, 4064; 2026-02-21T08:52:40.6304995Z add.s64 %rd280, %rd280, 32; 2026-02-21T08:52:40.6305059Z @%p131 bra $L__BB0_5; 2026-02-21T08:52:40.6305182Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:40.6305257Z cp.async.wait_group 0; 2026-02-21T08:52:40.6305314Z bar.sync 0; 2026-02-21T08:52:40.6305380Z // begin inline asm 2026-02-21T08:52:40.6305480Z @%p191 mbarrier.inval.shared::cta.b64 [%r921]; 2026-02-21T08:52:40.6305549Z // end inline asm 2026-02-21T08:52:40.6305609Z bar.sync 0; 2026-02-21T08:52:40.6305672Z // begin inline asm 2026-02-21T08:52:40.6305762Z @%p191 mbarrier.inval.shared::cta.b64 [%r922]; 2026-02-21T08:52:40.6305818Z // end inline asm 2026-02-21T08:52:40.6305881Z // begin inline asm 2026-02-21T08:52:40.6305968Z @%p191 mbarrier.inval.shared::cta.b64 [%r923]; 2026-02-21T08:52:40.6306025Z // end inline asm 2026-02-21T08:52:40.6306079Z bar.sync 0; 2026-02-21T08:52:40.6306143Z // begin inline asm 2026-02-21T08:52:40.6306230Z @%p191 mbarrier.inval.shared::cta.b64 [%r924]; 2026-02-21T08:52:40.6306288Z // end inline asm 2026-02-21T08:52:40.6306696Z .loc 1 94 28 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:94:28 2026-02-21T08:52:40.6306824Z cvt.rn.bf16x2.f32 %r972, %r804, %r803; 2026-02-21T08:52:40.6306911Z cvt.rn.bf16x2.f32 %r973, %r806, %r805; 2026-02-21T08:52:40.6307128Z .loc 1 95 43 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:95:43 2026-02-21T08:52:40.6307288Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r30], {%r972, %r973}; 2026-02-21T08:52:40.6307352Z // begin inline asm 
2026-02-21T08:52:40.6307435Z fence.proxy.async.shared::cta; 2026-02-21T08:52:40.6307497Z // end inline asm 2026-02-21T08:52:40.6307553Z bar.sync 0; 2026-02-21T08:52:40.6307619Z elect.sync %r974|%p150, -1; 2026-02-21T08:52:40.6307690Z and.pred %p136, %p1, %p150; 2026-02-21T08:52:40.6307749Z // begin inline asm 2026-02-21T08:52:40.6307977Z @%p136 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd177, {%r925, %r926}], [%r160]; 2026-02-21T08:52:40.6308040Z // end inline asm 2026-02-21T08:52:40.6308113Z cp.async.bulk.commit_group; 2026-02-21T08:52:40.6308192Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:52:40.6308412Z bar.sync 0; 2026-02-21T08:52:40.6308729Z .loc 1 28 97 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:28:97 2026-02-21T08:52:40.6308795Z add.s32 %r975, %r1711, 528; 2026-02-21T08:52:40.6308995Z .loc 1 34 35 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:34:35 2026-02-21T08:52:40.6309062Z shr.s32 %r976, %r975, 31; 2026-02-21T08:52:40.6309122Z shr.u32 %r977, %r976, 29; 2026-02-21T08:52:40.6309187Z add.s32 %r978, %r975, %r977; 2026-02-21T08:52:40.6309389Z .loc 1 35 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:35:33 2026-02-21T08:52:40.6309452Z and.b32 %r979, %r978, -8; 2026-02-21T08:52:40.6309778Z .loc 1 36 39 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:36:39 2026-02-21T08:52:40.6309845Z sub.s32 %r980, 448, %r979; 2026-02-21T08:52:40.6310046Z .loc 1 36 52 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:36:52 2026-02-21T08:52:40.6310113Z min.s32 %r981, %r980, 8; 2026-02-21T08:52:40.6310308Z .loc 1 37 45 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:37:45 2026-02-21T08:52:40.6310370Z sub.s32 %r982, %r975, %r979; 2026-02-21T08:52:40.6310571Z .loc 1 38 51 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:38:51 2026-02-21T08:52:40.6310632Z div.s32 %r983, %r982, %r981; 2026-02-21T08:52:40.6310827Z .loc 1 37 64 // 
c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:37:64 2026-02-21T08:52:40.6310896Z mul.lo.s32 %r984, %r983, %r981; 2026-02-21T08:52:40.6310956Z sub.s32 %r985, %r982, %r984; 2026-02-21T08:52:40.6311149Z .loc 1 37 30 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:37:30 2026-02-21T08:52:40.6311211Z add.s32 %r986, %r985, %r979; 2026-02-21T08:52:40.6311408Z .loc 1 39 27 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:39:27 2026-02-21T08:52:40.6311472Z shl.b32 %r95, %r986, 4; 2026-02-21T08:52:40.6311678Z .loc 1 40 27 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:40:27 2026-02-21T08:52:40.6311746Z shl.b32 %r1276, %r983, 6; 2026-02-21T08:52:40.6311940Z .loc 1 41 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:41:32 2026-02-21T08:52:40.6312001Z or.b32 %r987, %r1276, %r7; 2026-02-21T08:52:40.6312066Z or.b32 %r988, %r1276, %r8; 2026-02-21T08:52:40.6312263Z .loc 1 55 53 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:53 2026-02-21T08:52:40.6312330Z shl.b32 %r989, %r987, 13; 2026-02-21T08:52:40.6312396Z shl.b32 %r990, %r988, 13; 2026-02-21T08:52:40.6312612Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6312674Z // begin inline asm 2026-02-21T08:52:40.6312776Z @%p191 mbarrier.init.shared::cta.b64 [%r923], 1; 2026-02-21T08:52:40.6312841Z // end inline asm 2026-02-21T08:52:40.6312898Z bar.sync 0; 2026-02-21T08:52:40.6312955Z // begin inline asm 2026-02-21T08:52:40.6313053Z @%p191 mbarrier.init.shared::cta.b64 [%r924], 1; 2026-02-21T08:52:40.6313114Z // end inline asm 2026-02-21T08:52:40.6313173Z // begin inline asm 2026-02-21T08:52:40.6313261Z @%p191 mbarrier.init.shared::cta.b64 [%r921], 1; 2026-02-21T08:52:40.6313324Z // end inline asm 2026-02-21T08:52:40.6313380Z bar.sync 0; 2026-02-21T08:52:40.6313440Z // begin inline asm 2026-02-21T08:52:40.6313532Z @%p191 mbarrier.init.shared::cta.b64 [%r922], 1; 2026-02-21T08:52:40.6313589Z // end inline asm 
2026-02-21T08:52:40.6313794Z .loc 1 55 60 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:60 2026-02-21T08:52:40.6313866Z or.b32 %r991, %r989, %r10; 2026-02-21T08:52:40.6313929Z or.b32 %r992, %r990, %r10; 2026-02-21T08:52:40.6314128Z .loc 1 55 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:32 2026-02-21T08:52:40.6314318Z mad.wide.s32 %rd178, %r991, 2, %rd42; 2026-02-21T08:52:40.6314391Z mad.wide.s32 %rd179, %r992, 2, %rd42; 2026-02-21T08:52:40.6314450Z mov.b32 %r933, 8; 2026-02-21T08:52:40.6314648Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6314713Z // begin inline asm 2026-02-21T08:52:40.6314853Z cp.async.ca.shared.global [ %r13 + 0 ], [ %rd178 + 0 ], 0x8, %r933; 2026-02-21T08:52:40.6314911Z // end inline asm 2026-02-21T08:52:40.6314975Z // begin inline asm 2026-02-21T08:52:40.6315105Z cp.async.ca.shared.global [ %r14 + 0 ], [ %rd179 + 0 ], 0x8, %r933; 2026-02-21T08:52:40.6315161Z // end inline asm 2026-02-21T08:52:40.6315227Z cp.async.commit_group; 2026-02-21T08:52:40.6315549Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6315618Z // begin inline asm 2026-02-21T08:52:40.6315758Z @%p191 mbarrier.arrive.expect_tx.shared.b64 _, [%r923], 256; 2026-02-21T08:52:40.6315826Z // end inline asm 2026-02-21T08:52:40.6316028Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6316085Z bar.sync 0; 2026-02-21T08:52:40.6316160Z elect.sync %r993|%p151, -1; 2026-02-21T08:52:40.6316229Z and.pred %p142, %p1, %p151; 2026-02-21T08:52:40.6316286Z mov.b32 %r939, 0; 2026-02-21T08:52:40.6316346Z // begin inline asm 2026-02-21T08:52:40.6316803Z @%p142 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r240], [%rd180, {%r95, %r939}], [%r923]; 2026-02-21T08:52:40.6316876Z // end inline asm 2026-02-21T08:52:40.6317081Z .loc 1 55 32 // 
c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:32 2026-02-21T08:52:40.6317153Z cvt.s64.s32 %rd191, %r989; 2026-02-21T08:52:40.6317221Z or.b64 %rd192, %rd191, %rd276; 2026-02-21T08:52:40.6317285Z shl.b64 %rd193, %rd192, 1; 2026-02-21T08:52:40.6317357Z add.s64 %rd194, %rd42, %rd193; 2026-02-21T08:52:40.6317423Z add.s64 %rd181, %rd194, 64; 2026-02-21T08:52:40.6317485Z cvt.s64.s32 %rd195, %r990; 2026-02-21T08:52:40.6317549Z or.b64 %rd196, %rd195, %rd276; 2026-02-21T08:52:40.6317614Z shl.b64 %rd197, %rd196, 1; 2026-02-21T08:52:40.6317677Z add.s64 %rd198, %rd42, %rd197; 2026-02-21T08:52:40.6317738Z add.s64 %rd182, %rd198, 64; 2026-02-21T08:52:40.6317954Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6318015Z // begin inline asm 2026-02-21T08:52:40.6318155Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd181 + 0 ], 0x8, %r933; 2026-02-21T08:52:40.6318213Z // end inline asm 2026-02-21T08:52:40.6318277Z // begin inline asm 2026-02-21T08:52:40.6318409Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd182 + 0 ], 0x8, %r933; 2026-02-21T08:52:40.6318467Z // end inline asm 2026-02-21T08:52:40.6318541Z cp.async.commit_group; 2026-02-21T08:52:40.6318749Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6318814Z // begin inline asm 2026-02-21T08:52:40.6318947Z @%p191 mbarrier.arrive.expect_tx.shared.b64 _, [%r921], 256; 2026-02-21T08:52:40.6319003Z // end inline asm 2026-02-21T08:52:40.6319199Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6319267Z bar.sync 0; 2026-02-21T08:52:40.6319341Z elect.sync %r994|%p152, -1; 2026-02-21T08:52:40.6319408Z and.pred %p144, %p1, %p152; 2026-02-21T08:52:40.6319464Z mov.b32 %r948, 16; 2026-02-21T08:52:40.6319526Z // begin inline asm 2026-02-21T08:52:40.6319840Z @%p144 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r249], [%rd180, {%r95, %r948}], [%r921]; 
2026-02-21T08:52:40.6319900Z // end inline asm 2026-02-21T08:52:40.6320103Z .loc 1 55 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:32 2026-02-21T08:52:40.6320317Z add.s64 %rd184, %rd194, 128; 2026-02-21T08:52:40.6320381Z add.s64 %rd185, %rd198, 128; 2026-02-21T08:52:40.6320578Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6320643Z // begin inline asm 2026-02-21T08:52:40.6320772Z cp.async.ca.shared.global [ %r17 + 0 ], [ %rd184 + 0 ], 0x8, %r933; 2026-02-21T08:52:40.6320829Z // end inline asm 2026-02-21T08:52:40.6320893Z // begin inline asm 2026-02-21T08:52:40.6321019Z cp.async.ca.shared.global [ %r18 + 0 ], [ %rd185 + 0 ], 0x8, %r933; 2026-02-21T08:52:40.6321075Z // end inline asm 2026-02-21T08:52:40.6321145Z cp.async.commit_group; 2026-02-21T08:52:40.6321348Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6321535Z // begin inline asm 2026-02-21T08:52:40.6321670Z @%p191 mbarrier.arrive.expect_tx.shared.b64 _, [%r924], 256; 2026-02-21T08:52:40.6321731Z // end inline asm 2026-02-21T08:52:40.6321934Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6321991Z bar.sync 0; 2026-02-21T08:52:40.6322063Z elect.sync %r995|%p153, -1; 2026-02-21T08:52:40.6322131Z and.pred %p146, %p1, %p153; 2026-02-21T08:52:40.6322189Z mov.b32 %r957, 32; 2026-02-21T08:52:40.6322248Z // begin inline asm 2026-02-21T08:52:40.6322561Z @%p146 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r258], [%rd180, {%r95, %r957}], [%r924]; 2026-02-21T08:52:40.6322619Z // end inline asm 2026-02-21T08:52:40.6322814Z .loc 1 55 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:32 2026-02-21T08:52:40.6322893Z add.s64 %rd187, %rd194, 192; 2026-02-21T08:52:40.6322959Z add.s64 %rd188, %rd198, 192; 2026-02-21T08:52:40.6323157Z .loc 1 55 80 // 
c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6323222Z // begin inline asm 2026-02-21T08:52:40.6323353Z cp.async.ca.shared.global [ %r19 + 0 ], [ %rd187 + 0 ], 0x8, %r933; 2026-02-21T08:52:40.6323411Z // end inline asm 2026-02-21T08:52:40.6323474Z // begin inline asm 2026-02-21T08:52:40.6323602Z cp.async.ca.shared.global [ %r20 + 0 ], [ %rd188 + 0 ], 0x8, %r933; 2026-02-21T08:52:40.6323658Z // end inline asm 2026-02-21T08:52:40.6323724Z cp.async.commit_group; 2026-02-21T08:52:40.6323933Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6323995Z // begin inline asm 2026-02-21T08:52:40.6324119Z @%p191 mbarrier.arrive.expect_tx.shared.b64 _, [%r922], 256; 2026-02-21T08:52:40.6324181Z // end inline asm 2026-02-21T08:52:40.6324376Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6324434Z bar.sync 0; 2026-02-21T08:52:40.6324505Z elect.sync %r996|%p154, -1; 2026-02-21T08:52:40.6324570Z and.pred %p148, %p1, %p154; 2026-02-21T08:52:40.6324627Z mov.b32 %r966, 48; 2026-02-21T08:52:40.6324689Z // begin inline asm 2026-02-21T08:52:40.6325000Z @%p148 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r267], [%rd180, {%r95, %r966}], [%r922]; 2026-02-21T08:52:40.6325059Z // end inline asm 2026-02-21T08:52:40.6325264Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6325332Z shl.b32 %r997, %r983, 19; 2026-02-21T08:52:40.6325394Z or.b32 %r998, %r1710, %r997; 2026-02-21T08:52:40.6325465Z mad.wide.s32 %rd281, %r998, 2, %rd11; 2026-02-21T08:52:40.6325531Z or.b32 %r1728, %r33, %r997; 2026-02-21T08:52:40.6325592Z mov.b32 %r1153, 0f00000000; 2026-02-21T08:52:40.6325649Z mov.b32 %r1731, 1; 2026-02-21T08:52:40.6325723Z mov.b32 %r1730, -1; 2026-02-21T08:52:40.6325789Z mov.b64 %rd282, 0; 2026-02-21T08:52:40.6325850Z mov.b32 %r1729, %r939; 2026-02-21T08:52:40.6325911Z mov.b32 
%r1154, %r1153; 2026-02-21T08:52:40.6325975Z mov.b32 %r1155, %r1153; 2026-02-21T08:52:40.6326158Z mov.b32 %r1156, %r1153; 2026-02-21T08:52:40.6326274Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:40.6326395Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:40.6326594Z setp.lt.u64 %p174, %rd282, 4032; 2026-02-21T08:52:40.6326661Z add.s32 %r1221, %r1730, 1; 2026-02-21T08:52:40.6326731Z setp.gt.s32 %p175, %r1221, 1; 2026-02-21T08:52:40.6326802Z selp.b32 %r1730, 0, %r1221, %p175; 2026-02-21T08:52:40.6326865Z selp.b32 %r1222, 1, 0, %p175; 2026-02-21T08:52:40.6326928Z xor.b32 %r1729, %r1729, %r1222; 2026-02-21T08:52:40.6327142Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6327215Z cp.async.wait_group 2; 2026-02-21T08:52:40.6327401Z bar.sync 0; 2026-02-21T08:52:40.6327466Z shl.b32 %r1223, %r1730, 12; 2026-02-21T08:52:40.6327533Z add.s32 %r1225, %r160, %r1223; 2026-02-21T08:52:40.6327742Z .loc 1 59 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:59:32 2026-02-21T08:52:40.6327810Z add.s32 %r1226, %r1225, %r21; 2026-02-21T08:52:40.6327882Z ld.shared.b16 %rs105, [%r1226]; 2026-02-21T08:52:40.6327954Z ld.shared.b16 %rs106, [%r1226+512]; 2026-02-21T08:52:40.6328020Z ld.shared.b16 %rs107, [%r1226+32]; 2026-02-21T08:52:40.6328087Z ld.shared.b16 %rs108, [%r1226+544]; 2026-02-21T08:52:40.6328153Z add.s32 %r1227, %r1225, %r22; 2026-02-21T08:52:40.6328217Z ld.shared.b16 %rs109, [%r1227]; 2026-02-21T08:52:40.6328282Z ld.shared.b16 %rs110, [%r1227+512]; 2026-02-21T08:52:40.6328353Z ld.shared.b16 %rs111, [%r1227+32]; 2026-02-21T08:52:40.6328417Z ld.shared.b16 %rs112, [%r1227+544]; 2026-02-21T08:52:40.6328477Z add.s32 %r1228, %r1225, %r23; 2026-02-21T08:52:40.6328548Z ld.shared.b16 %rs113, [%r1228]; 2026-02-21T08:52:40.6328615Z ld.shared.b16 %rs114, [%r1228+512]; 2026-02-21T08:52:40.6328680Z ld.shared.b16 %rs115, [%r1228+32]; 2026-02-21T08:52:40.6328747Z ld.shared.b16 %rs116, [%r1228+544]; 
2026-02-21T08:52:40.6328811Z add.s32 %r1229, %r1225, %r24; 2026-02-21T08:52:40.6328875Z ld.shared.b16 %rs117, [%r1229]; 2026-02-21T08:52:40.6328941Z ld.shared.b16 %rs118, [%r1229+512]; 2026-02-21T08:52:40.6329010Z ld.shared.b16 %rs119, [%r1229+32]; 2026-02-21T08:52:40.6329075Z ld.shared.b16 %rs120, [%r1229+544]; 2026-02-21T08:52:40.6329139Z cvt.f32.bf16 %r1069, %rs105; 2026-02-21T08:52:40.6329203Z cvt.f32.bf16 %r1070, %rs106; 2026-02-21T08:52:40.6329269Z cvt.f32.bf16 %r1071, %rs109; 2026-02-21T08:52:40.6329330Z cvt.f32.bf16 %r1072, %rs110; 2026-02-21T08:52:40.6329402Z cvt.f32.bf16 %r1081, %rs113; 2026-02-21T08:52:40.6329469Z cvt.f32.bf16 %r1082, %rs114; 2026-02-21T08:52:40.6329530Z cvt.f32.bf16 %r1083, %rs117; 2026-02-21T08:52:40.6329594Z cvt.f32.bf16 %r1084, %rs118; 2026-02-21T08:52:40.6329654Z cvt.f32.bf16 %r1093, %rs107; 2026-02-21T08:52:40.6329719Z cvt.f32.bf16 %r1094, %rs108; 2026-02-21T08:52:40.6329778Z cvt.f32.bf16 %r1095, %rs111; 2026-02-21T08:52:40.6329841Z cvt.f32.bf16 %r1096, %rs112; 2026-02-21T08:52:40.6329905Z cvt.f32.bf16 %r1105, %rs115; 2026-02-21T08:52:40.6329965Z cvt.f32.bf16 %r1106, %rs116; 2026-02-21T08:52:40.6330025Z cvt.f32.bf16 %r1107, %rs119; 2026-02-21T08:52:40.6330091Z cvt.f32.bf16 %r1108, %rs120; 2026-02-21T08:52:40.6330308Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6330371Z shl.b32 %r1230, %r1730, 3; 2026-02-21T08:52:40.6330433Z add.s32 %r999, %r923, %r1230; 2026-02-21T08:52:40.6330499Z // begin inline asm 2026-02-21T08:52:40.6330552Z 2026-02-21T08:52:40.6330603Z { 2026-02-21T08:52:40.6330673Z .reg .pred complete; 2026-02-21T08:52:40.6330731Z waitLoop: 2026-02-21T08:52:40.6330880Z mbarrier.try_wait.parity.shared.b64 complete, [%r999], %r1729; 2026-02-21T08:52:40.6330949Z @!complete bra.uni waitLoop; 2026-02-21T08:52:40.6331004Z } 2026-02-21T08:52:40.6331009Z 2026-02-21T08:52:40.6331066Z // end inline asm 2026-02-21T08:52:40.6331400Z .loc 1 61 33 // 
c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6331478Z shl.b32 %r1232, %r1730, 8; 2026-02-21T08:52:40.6331542Z add.s32 %r1234, %r240, %r1232; 2026-02-21T08:52:40.6331740Z .loc 1 79 58 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:79:58 2026-02-21T08:52:40.6331806Z add.s32 %r1235, %r1234, %r25; 2026-02-21T08:52:40.6331872Z ld.shared.b8 %rs121, [%r1235]; 2026-02-21T08:52:40.6331939Z ld.shared.b8 %rs122, [%r1235+128]; 2026-02-21T08:52:40.6332134Z .loc 1 64 28 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:64:28 2026-02-21T08:52:40.6332201Z shl.b16 %rs123, %rs121, 4; 2026-02-21T08:52:40.6332263Z shl.b16 %rs124, %rs122, 4; 2026-02-21T08:52:40.6332549Z .loc 1 79 58 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:79:58 2026-02-21T08:52:40.6332632Z selp.b16 %rs125, %rs123, %rs121, %p243; 2026-02-21T08:52:40.6332697Z cvt.s16.s8 %rs126, %rs125; 2026-02-21T08:52:40.6332757Z shr.s16 %rs127, %rs126, 4; 2026-02-21T08:52:40.6332834Z selp.b16 %rs128, %rs124, %rs122, %p243; 2026-02-21T08:52:40.6332896Z cvt.s16.s8 %rs129, %rs128; 2026-02-21T08:52:40.6332956Z shr.s16 %rs130, %rs129, 4; 2026-02-21T08:52:40.6333150Z .loc 1 84 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:84:32 2026-02-21T08:52:40.6333219Z cvt.rn.f32.s16 %r1236, %rs127; 2026-02-21T08:52:40.6333285Z cvt.rn.f32.s16 %r1237, %rs130; 2026-02-21T08:52:40.6333347Z st.shared.b32 [%r26], %r1236; 2026-02-21T08:52:40.6333416Z st.shared.b32 [%r27], %r1237; 2026-02-21T08:52:40.6333551Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r434], {%r1153}; 2026-02-21T08:52:40.6333607Z bar.sync 0; 2026-02-21T08:52:40.6333669Z // begin inline asm 2026-02-21T08:52:40.6333825Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1025, %r1073}, [%r653]; 2026-02-21T08:52:40.6333885Z // end inline asm 2026-02-21T08:52:40.6333944Z bar.sync 0; 2026-02-21T08:52:40.6334078Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r434], {%r1155}; 2026-02-21T08:52:40.6334134Z bar.sync 0; 
2026-02-21T08:52:40.6334193Z // begin inline asm 2026-02-21T08:52:40.6334345Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1027, %r1075}, [%r653]; 2026-02-21T08:52:40.6334402Z // end inline asm 2026-02-21T08:52:40.6334457Z bar.sync 0; 2026-02-21T08:52:40.6334582Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r434], {%r1154}; 2026-02-21T08:52:40.6334643Z bar.sync 0; 2026-02-21T08:52:40.6334702Z // begin inline asm 2026-02-21T08:52:40.6334861Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1026, %r1074}, [%r653]; 2026-02-21T08:52:40.6334922Z // end inline asm 2026-02-21T08:52:40.6334978Z bar.sync 0; 2026-02-21T08:52:40.6335107Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r434], {%r1156}; 2026-02-21T08:52:40.6335161Z bar.sync 0; 2026-02-21T08:52:40.6335224Z // begin inline asm 2026-02-21T08:52:40.6335367Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1028, %r1076}, [%r653]; 2026-02-21T08:52:40.6335426Z // end inline asm 2026-02-21T08:52:40.6335486Z $L__tmp9: 2026-02-21T08:52:40.6335758Z .loc 2 291 36 // standard.py:291:36 @[ c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:91:40 ] 2026-02-21T08:52:40.6335817Z // begin inline asm 2026-02-21T08:52:40.6335900Z fence.proxy.async.shared::cta; 2026-02-21T08:52:40.6335958Z // end inline asm 2026-02-21T08:52:40.6336040Z shfl.sync.idx.b32 %r1238, %r5, 0, 31, -1; 2026-02-21T08:52:40.6336113Z wgmma.fence.sync.aligned; 2026-02-21T08:52:40.6336181Z mov.pred %p155, -1; 2026-02-21T08:52:40.6336239Z // begin inline asm 2026-02-21T08:52:40.6336676Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r1025,%r1026,%r1027,%r1028}, {%r1069,%r1070,%r1071,%r1072}, %rd3, %p155, 1, 1; 2026-02-21T08:52:40.6336754Z // end inline asm 2026-02-21T08:52:40.6336815Z // begin inline asm 2026-02-21T08:52:40.6337118Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r1025,%r1026,%r1027,%r1028}, {%r1081,%r1082,%r1083,%r1084}, %rd4, %p155, 1, 1; 2026-02-21T08:52:40.6337311Z // end inline asm 2026-02-21T08:52:40.6337388Z // begin inline asm 
2026-02-21T08:52:40.6337685Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r1025,%r1026,%r1027,%r1028}, {%r1093,%r1094,%r1095,%r1096}, %rd5, %p155, 1, 1; 2026-02-21T08:52:40.6337741Z // end inline asm 2026-02-21T08:52:40.6337803Z // begin inline asm 2026-02-21T08:52:40.6338097Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r1025,%r1026,%r1027,%r1028}, {%r1105,%r1106,%r1107,%r1108}, %rd6, %p155, 1, 1; 2026-02-21T08:52:40.6338153Z // end inline asm 2026-02-21T08:52:40.6338214Z // begin inline asm 2026-02-21T08:52:40.6338625Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r1073,%r1074,%r1075,%r1076}, {%r1069,%r1070,%r1071,%r1072}, %rd7, %p155, 1, 1; 2026-02-21T08:52:40.6338686Z // end inline asm 2026-02-21T08:52:40.6338744Z // begin inline asm 2026-02-21T08:52:40.6339043Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r1073,%r1074,%r1075,%r1076}, {%r1081,%r1082,%r1083,%r1084}, %rd8, %p155, 1, 1; 2026-02-21T08:52:40.6339104Z // end inline asm 2026-02-21T08:52:40.6339163Z // begin inline asm 2026-02-21T08:52:40.6339471Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r1073,%r1074,%r1075,%r1076}, {%r1093,%r1094,%r1095,%r1096}, %rd9, %p155, 1, 1; 2026-02-21T08:52:40.6339529Z // end inline asm 2026-02-21T08:52:40.6339588Z // begin inline asm 2026-02-21T08:52:40.6339894Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r1073,%r1074,%r1075,%r1076}, {%r1105,%r1106,%r1107,%r1108}, %rd10, %p155, 1, 1; 2026-02-21T08:52:40.6339951Z // end inline asm 2026-02-21T08:52:40.6340029Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:40.6340095Z mov.b32 %r1118, %r939; 2026-02-21T08:52:40.6340156Z mov.b32 %r1119, %r939; 2026-02-21T08:52:40.6340217Z mov.b32 %r1117, %r187; 2026-02-21T08:52:40.6340275Z // begin inline asm 2026-02-21T08:52:40.6340461Z // wait for regs: %r1025,%r1026,%r1027,%r1028,%r1073,%r1074,%r1075,%r1076,%r1117,%r1118,%r1119 2026-02-21T08:52:40.6340540Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:40.6340597Z // end inline asm 
2026-02-21T08:52:40.6340656Z $L__tmp10: 2026-02-21T08:52:40.6340862Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6340925Z add.s32 %r1240, %r539, %r1223; 2026-02-21T08:52:40.6341122Z .loc 1 59 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:59:32 2026-02-21T08:52:40.6341190Z add.s32 %r1241, %r1240, %r21; 2026-02-21T08:52:40.6341253Z ld.shared.b16 %rs131, [%r1241]; 2026-02-21T08:52:40.6341328Z ld.shared.b16 %rs132, [%r1241+512]; 2026-02-21T08:52:40.6341394Z ld.shared.b16 %rs133, [%r1241+32]; 2026-02-21T08:52:40.6341461Z ld.shared.b16 %rs134, [%r1241+544]; 2026-02-21T08:52:40.6341530Z add.s32 %r1242, %r1240, %r22; 2026-02-21T08:52:40.6341594Z ld.shared.b16 %rs135, [%r1242]; 2026-02-21T08:52:40.6341660Z ld.shared.b16 %rs136, [%r1242+512]; 2026-02-21T08:52:40.6341732Z ld.shared.b16 %rs137, [%r1242+32]; 2026-02-21T08:52:40.6341797Z ld.shared.b16 %rs138, [%r1242+544]; 2026-02-21T08:52:40.6341857Z add.s32 %r1243, %r1240, %r23; 2026-02-21T08:52:40.6341921Z ld.shared.b16 %rs139, [%r1243]; 2026-02-21T08:52:40.6341992Z ld.shared.b16 %rs140, [%r1243+512]; 2026-02-21T08:52:40.6342057Z ld.shared.b16 %rs141, [%r1243+32]; 2026-02-21T08:52:40.6342123Z ld.shared.b16 %rs142, [%r1243+544]; 2026-02-21T08:52:40.6342188Z add.s32 %r1244, %r1240, %r24; 2026-02-21T08:52:40.6342264Z ld.shared.b16 %rs143, [%r1244]; 2026-02-21T08:52:40.6342332Z ld.shared.b16 %rs144, [%r1244+512]; 2026-02-21T08:52:40.6342397Z ld.shared.b16 %rs145, [%r1244+32]; 2026-02-21T08:52:40.6342467Z ld.shared.b16 %rs146, [%r1244+544]; 2026-02-21T08:52:40.6342530Z cvt.f32.bf16 %r1149, %rs131; 2026-02-21T08:52:40.6342593Z cvt.f32.bf16 %r1150, %rs132; 2026-02-21T08:52:40.6342662Z cvt.f32.bf16 %r1151, %rs135; 2026-02-21T08:52:40.6342724Z cvt.f32.bf16 %r1152, %rs136; 2026-02-21T08:52:40.6342889Z cvt.f32.bf16 %r1161, %rs139; 2026-02-21T08:52:40.6342954Z cvt.f32.bf16 %r1162, %rs140; 2026-02-21T08:52:40.6343014Z cvt.f32.bf16 %r1163, %rs143; 
2026-02-21T08:52:40.6343075Z cvt.f32.bf16 %r1164, %rs144; 2026-02-21T08:52:40.6343134Z cvt.f32.bf16 %r1173, %rs133; 2026-02-21T08:52:40.6343200Z cvt.f32.bf16 %r1174, %rs134; 2026-02-21T08:52:40.6343260Z cvt.f32.bf16 %r1175, %rs137; 2026-02-21T08:52:40.6343320Z cvt.f32.bf16 %r1176, %rs138; 2026-02-21T08:52:40.6343383Z cvt.f32.bf16 %r1185, %rs141; 2026-02-21T08:52:40.6343443Z cvt.f32.bf16 %r1186, %rs142; 2026-02-21T08:52:40.6343504Z cvt.f32.bf16 %r1187, %rs145; 2026-02-21T08:52:40.6343575Z cvt.f32.bf16 %r1188, %rs146; 2026-02-21T08:52:40.6343881Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6343947Z add.s32 %r1131, %r921, %r1230; 2026-02-21T08:52:40.6344006Z // begin inline asm 2026-02-21T08:52:40.6344065Z 2026-02-21T08:52:40.6344115Z { 2026-02-21T08:52:40.6344184Z .reg .pred complete; 2026-02-21T08:52:40.6344240Z waitLoop: 2026-02-21T08:52:40.6344384Z mbarrier.try_wait.parity.shared.b64 complete, [%r1131], %r1729; 2026-02-21T08:52:40.6344454Z @!complete bra.uni waitLoop; 2026-02-21T08:52:40.6344516Z } 2026-02-21T08:52:40.6344522Z 2026-02-21T08:52:40.6344584Z // end inline asm 2026-02-21T08:52:40.6344783Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6344844Z add.s32 %r1247, %r249, %r1232; 2026-02-21T08:52:40.6345043Z .loc 1 79 58 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:79:58 2026-02-21T08:52:40.6345107Z add.s32 %r1248, %r1247, %r25; 2026-02-21T08:52:40.6345173Z ld.shared.b8 %rs147, [%r1248]; 2026-02-21T08:52:40.6345243Z ld.shared.b8 %rs148, [%r1248+128]; 2026-02-21T08:52:40.6345442Z .loc 1 64 28 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:64:28 2026-02-21T08:52:40.6345504Z shl.b16 %rs149, %rs147, 4; 2026-02-21T08:52:40.6345571Z shl.b16 %rs150, %rs148, 4; 2026-02-21T08:52:40.6345771Z .loc 1 79 58 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:79:58 2026-02-21T08:52:40.6345844Z selp.b16 %rs151, %rs149, %rs147, %p243; 
2026-02-21T08:52:40.6345906Z cvt.s16.s8 %rs152, %rs151; 2026-02-21T08:52:40.6345976Z shr.s16 %rs153, %rs152, 4; 2026-02-21T08:52:40.6346053Z selp.b16 %rs154, %rs150, %rs148, %p243; 2026-02-21T08:52:40.6346115Z cvt.s16.s8 %rs155, %rs154; 2026-02-21T08:52:40.6346174Z shr.s16 %rs156, %rs155, 4; 2026-02-21T08:52:40.6346373Z .loc 1 84 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:84:32 2026-02-21T08:52:40.6346437Z cvt.rn.f32.s16 %r1249, %rs153; 2026-02-21T08:52:40.6346638Z cvt.rn.f32.s16 %r1250, %rs156; 2026-02-21T08:52:40.6346701Z bar.sync 0; 2026-02-21T08:52:40.6346765Z st.shared.b32 [%r26], %r1249; 2026-02-21T08:52:40.6346827Z st.shared.b32 [%r27], %r1250; 2026-02-21T08:52:40.6346889Z $L__tmp11: 2026-02-21T08:52:40.6347169Z .loc 2 291 36 // standard.py:291:36 @[ c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:91:40 ] 2026-02-21T08:52:40.6347323Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r653], {%r1025, %r1073}; 2026-02-21T08:52:40.6347378Z bar.sync 0; 2026-02-21T08:52:40.6347445Z // begin inline asm 2026-02-21T08:52:40.6347576Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1153}, [%r434]; 2026-02-21T08:52:40.6347634Z // end inline asm 2026-02-21T08:52:40.6347691Z bar.sync 0; 2026-02-21T08:52:40.6347837Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r653], {%r1027, %r1075}; 2026-02-21T08:52:40.6347893Z bar.sync 0; 2026-02-21T08:52:40.6347952Z // begin inline asm 2026-02-21T08:52:40.6348085Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1155}, [%r434]; 2026-02-21T08:52:40.6348144Z // end inline asm 2026-02-21T08:52:40.6348199Z bar.sync 0; 2026-02-21T08:52:40.6348346Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r653], {%r1026, %r1074}; 2026-02-21T08:52:40.6348651Z bar.sync 0; 2026-02-21T08:52:40.6348714Z // begin inline asm 2026-02-21T08:52:40.6348846Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1154}, [%r434]; 2026-02-21T08:52:40.6348906Z // end inline asm 2026-02-21T08:52:40.6348962Z bar.sync 0; 2026-02-21T08:52:40.6349105Z 
stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r653], {%r1028, %r1076}; 2026-02-21T08:52:40.6349165Z bar.sync 0; 2026-02-21T08:52:40.6349223Z // begin inline asm 2026-02-21T08:52:40.6349351Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1156}, [%r434]; 2026-02-21T08:52:40.6349421Z // end inline asm 2026-02-21T08:52:40.6349486Z // begin inline asm 2026-02-21T08:52:40.6349568Z fence.proxy.async.shared::cta; 2026-02-21T08:52:40.6349627Z // end inline asm 2026-02-21T08:52:40.6349701Z wgmma.fence.sync.aligned; 2026-02-21T08:52:40.6349897Z shl.b32 %r1251, %r1238, 8; 2026-02-21T08:52:40.6349970Z and.b32 %r1252, %r1251, 1024; 2026-02-21T08:52:40.6350032Z add.s32 %r1253, %r1252, %r187; 2026-02-21T08:52:40.6350092Z bfe.u32 %r1254, %r1253, 4, 14; 2026-02-21T08:52:40.6350164Z cvt.u64.u32 %rd217, %r1254; 2026-02-21T08:52:40.6350241Z or.b64 %rd207, %rd217, 4611686293313683456; 2026-02-21T08:52:40.6350301Z // begin inline asm 2026-02-21T08:52:40.6350611Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r1153,%r1154,%r1155,%r1156}, {%r1149,%r1150,%r1151,%r1152}, %rd207, %p155, 1, 1; 2026-02-21T08:52:40.6350671Z // end inline asm 2026-02-21T08:52:40.6350732Z add.s32 %r1255, %r1253, 32; 2026-02-21T08:52:40.6350793Z bfe.u32 %r1256, %r1255, 4, 14; 2026-02-21T08:52:40.6350858Z cvt.u64.u32 %rd218, %r1256; 2026-02-21T08:52:40.6350931Z or.b64 %rd208, %rd218, 4611686293313683456; 2026-02-21T08:52:40.6350989Z // begin inline asm 2026-02-21T08:52:40.6351301Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r1153,%r1154,%r1155,%r1156}, {%r1161,%r1162,%r1163,%r1164}, %rd208, %p155, 1, 1; 2026-02-21T08:52:40.6351362Z // end inline asm 2026-02-21T08:52:40.6351423Z add.s32 %r1257, %r1253, 64; 2026-02-21T08:52:40.6351485Z bfe.u32 %r1258, %r1257, 4, 14; 2026-02-21T08:52:40.6351549Z cvt.u64.u32 %rd219, %r1258; 2026-02-21T08:52:40.6351619Z or.b64 %rd209, %rd219, 4611686293313683456; 2026-02-21T08:52:40.6351677Z // begin inline asm 2026-02-21T08:52:40.6351979Z 
wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r1153,%r1154,%r1155,%r1156}, {%r1173,%r1174,%r1175,%r1176}, %rd209, %p155, 1, 1; 2026-02-21T08:52:40.6352036Z // end inline asm 2026-02-21T08:52:40.6352095Z add.s32 %r1259, %r1253, 96; 2026-02-21T08:52:40.6352155Z bfe.u32 %r1260, %r1259, 4, 14; 2026-02-21T08:52:40.6352220Z cvt.u64.u32 %rd220, %r1260; 2026-02-21T08:52:40.6352291Z or.b64 %rd210, %rd220, 4611686293313683456; 2026-02-21T08:52:40.6352349Z // begin inline asm 2026-02-21T08:52:40.6352669Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r1153,%r1154,%r1155,%r1156}, {%r1185,%r1186,%r1187,%r1188}, %rd210, %p155, 1, 1; 2026-02-21T08:52:40.6352726Z // end inline asm 2026-02-21T08:52:40.6352801Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:40.6352869Z mov.b32 %r1194, %r939; 2026-02-21T08:52:40.6352927Z mov.b32 %r1193, %r187; 2026-02-21T08:52:40.6352984Z mov.b32 %r1195, %r939; 2026-02-21T08:52:40.6353041Z // begin inline asm 2026-02-21T08:52:40.6353162Z // wait for regs: %r1153,%r1154,%r1155,%r1156,%r1193,%r1194,%r1195 2026-02-21T08:52:40.6353239Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:40.6353297Z // end inline asm 2026-02-21T08:52:40.6353354Z $L__tmp12: 2026-02-21T08:52:40.6353572Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6353634Z add.s32 %r1261, %r1731, 1; 2026-02-21T08:52:40.6353704Z setp.gt.s32 %p176, %r1261, 1; 2026-02-21T08:52:40.6353773Z selp.b32 %r1731, 0, %r1261, %p176; 2026-02-21T08:52:40.6353977Z .loc 1 55 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:32 2026-02-21T08:52:40.6354043Z add.s32 %r1262, %r1728, -32; 2026-02-21T08:52:40.6354111Z add.s64 %rd211, %rd281, -64; 2026-02-21T08:52:40.6354289Z mad.wide.s32 %rd212, %r1262, 2, %rd42; 2026-02-21T08:52:40.6354486Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6354551Z shl.b32 %r1263, %r1731, 12; 2026-02-21T08:52:40.6354615Z add.s32 %r1264, %r160, 
%r1263; 2026-02-21T08:52:40.6354675Z add.s32 %r1203, %r1264, %r12; 2026-02-21T08:52:40.6354738Z selp.b32 %r1204, 8, 0, %p174; 2026-02-21T08:52:40.6354802Z // begin inline asm 2026-02-21T08:52:40.6354963Z cp.async.ca.shared.global [ %r1203 + 0 ], [ %rd211 + 0 ], 0x8, %r1204; 2026-02-21T08:52:40.6355024Z // end inline asm 2026-02-21T08:52:40.6355088Z add.s32 %r1205, %r1203, 2048; 2026-02-21T08:52:40.6355147Z // begin inline asm 2026-02-21T08:52:40.6355376Z cp.async.ca.shared.global [ %r1205 + 0 ], [ %rd212 + 0 ], 0x8, %r1204; 2026-02-21T08:52:40.6355440Z // end inline asm 2026-02-21T08:52:40.6355508Z cp.async.commit_group; 2026-02-21T08:52:40.6355717Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6355783Z shl.b32 %r1265, %r1731, 3; 2026-02-21T08:52:40.6355850Z add.s32 %r1207, %r923, %r1265; 2026-02-21T08:52:40.6355918Z and.pred %p167, %p191, %p174; 2026-02-21T08:52:40.6355977Z // begin inline asm 2026-02-21T08:52:40.6356113Z @%p167 mbarrier.arrive.expect_tx.shared.b64 _, [%r1207], 256; 2026-02-21T08:52:40.6356170Z // end inline asm 2026-02-21T08:52:40.6356366Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6356431Z shl.b32 %r1266, %r1731, 8; 2026-02-21T08:52:40.6356622Z add.s32 %r1208, %r240, %r1266; 2026-02-21T08:52:40.6356683Z bar.sync 0; 2026-02-21T08:52:40.6356751Z elect.sync %r1267|%p177, -1; 2026-02-21T08:52:40.6356825Z and.pred %p178, %p174, %p177; 2026-02-21T08:52:40.6356892Z and.pred %p168, %p1, %p178; 2026-02-21T08:52:40.6356953Z cvt.u32.u64 %r1268, %rd282; 2026-02-21T08:52:40.6357016Z add.s32 %r1210, %r1268, 64; 2026-02-21T08:52:40.6357079Z // begin inline asm 2026-02-21T08:52:40.6357403Z @%p168 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1208], [%rd180, {%r95, %r1210}], [%r1207]; 2026-02-21T08:52:40.6357461Z // end inline asm 2026-02-21T08:52:40.6357663Z .loc 1 55 32 // 
c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:32 2026-02-21T08:52:40.6357736Z mad.wide.s32 %rd215, %r1728, 2, %rd42; 2026-02-21T08:52:40.6357931Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6358011Z add.s32 %r1269, %r539, %r1263; 2026-02-21T08:52:40.6358074Z add.s32 %r1212, %r1269, %r12; 2026-02-21T08:52:40.6358135Z // begin inline asm 2026-02-21T08:52:40.6358279Z cp.async.ca.shared.global [ %r1212 + 0 ], [ %rd281 + 0 ], 0x8, %r1204; 2026-02-21T08:52:40.6358336Z // end inline asm 2026-02-21T08:52:40.6358396Z add.s32 %r1214, %r1212, 2048; 2026-02-21T08:52:40.6358455Z // begin inline asm 2026-02-21T08:52:40.6358597Z cp.async.ca.shared.global [ %r1214 + 0 ], [ %rd215 + 0 ], 0x8, %r1204; 2026-02-21T08:52:40.6358653Z // end inline asm 2026-02-21T08:52:40.6358719Z cp.async.commit_group; 2026-02-21T08:52:40.6358926Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6358991Z add.s32 %r1216, %r921, %r1265; 2026-02-21T08:52:40.6359048Z // begin inline asm 2026-02-21T08:52:40.6359177Z @%p167 mbarrier.arrive.expect_tx.shared.b64 _, [%r1216], 256; 2026-02-21T08:52:40.6359233Z // end inline asm 2026-02-21T08:52:40.6359431Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6359496Z add.s32 %r1217, %r249, %r1266; 2026-02-21T08:52:40.6359564Z bar.sync 0; 2026-02-21T08:52:40.6359637Z elect.sync %r1270|%p179, -1; 2026-02-21T08:52:40.6359703Z and.pred %p180, %p174, %p179; 2026-02-21T08:52:40.6359778Z and.pred %p170, %p1, %p180; 2026-02-21T08:52:40.6359976Z add.s32 %r1219, %r1268, 80; 2026-02-21T08:52:40.6360035Z // begin inline asm 2026-02-21T08:52:40.6360356Z @%p170 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1217], [%rd180, {%r95, %r1219}], [%r1216]; 2026-02-21T08:52:40.6360415Z // end inline asm 2026-02-21T08:52:40.6360618Z .loc 1 48 112 // 
c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6360681Z add.s64 %rd281, %rd281, 128; 2026-02-21T08:52:40.6360742Z add.s32 %r1728, %r1728, 64; 2026-02-21T08:52:40.6360810Z setp.lt.u64 %p181, %rd282, 4064; 2026-02-21T08:52:40.6360871Z add.s64 %rd282, %rd282, 32; 2026-02-21T08:52:40.6360935Z @%p181 bra $L__BB0_7; 2026-02-21T08:52:40.6361045Z // %bb.8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:40.6361232Z cp.async.wait_group 0; 2026-02-21T08:52:40.6361296Z bar.sync 0; 2026-02-21T08:52:40.6361355Z // begin inline asm 2026-02-21T08:52:40.6361451Z @%p191 mbarrier.inval.shared::cta.b64 [%r921]; 2026-02-21T08:52:40.6361512Z // end inline asm 2026-02-21T08:52:40.6361570Z bar.sync 0; 2026-02-21T08:52:40.6361627Z // begin inline asm 2026-02-21T08:52:40.6361716Z @%p191 mbarrier.inval.shared::cta.b64 [%r922]; 2026-02-21T08:52:40.6361779Z // end inline asm 2026-02-21T08:52:40.6361837Z // begin inline asm 2026-02-21T08:52:40.6361923Z @%p191 mbarrier.inval.shared::cta.b64 [%r923]; 2026-02-21T08:52:40.6361977Z // end inline asm 2026-02-21T08:52:40.6362035Z bar.sync 0; 2026-02-21T08:52:40.6362091Z // begin inline asm 2026-02-21T08:52:40.6362175Z @%p191 mbarrier.inval.shared::cta.b64 [%r924]; 2026-02-21T08:52:40.6362235Z // end inline asm 2026-02-21T08:52:40.6362434Z .loc 1 94 28 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:94:28 2026-02-21T08:52:40.6362518Z cvt.rn.bf16x2.f32 %r1278, %r1154, %r1153; 2026-02-21T08:52:40.6362595Z cvt.rn.bf16x2.f32 %r1279, %r1156, %r1155; 2026-02-21T08:52:40.6362798Z .loc 1 95 43 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:95:43 2026-02-21T08:52:40.6362956Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r30], {%r1278, %r1279}; 2026-02-21T08:52:40.6363015Z // begin inline asm 2026-02-21T08:52:40.6363098Z fence.proxy.async.shared::cta; 2026-02-21T08:52:40.6363154Z // end inline asm 2026-02-21T08:52:40.6363208Z bar.sync 0; 2026-02-21T08:52:40.6363278Z elect.sync %r1280|%p188, -1; 
2026-02-21T08:52:40.6363343Z and.pred %p186, %p1, %p188; 2026-02-21T08:52:40.6363413Z // begin inline asm 2026-02-21T08:52:40.6363640Z @%p186 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd177, {%r95, %r1276}], [%r160]; 2026-02-21T08:52:40.6363697Z // end inline asm 2026-02-21T08:52:40.6363768Z cp.async.bulk.commit_group; 2026-02-21T08:52:40.6363846Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:52:40.6363905Z bar.sync 0; 2026-02-21T08:52:40.6364106Z .loc 1 28 97 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:28:97 2026-02-21T08:52:40.6364170Z add.s32 %r1711, %r1711, 792; 2026-02-21T08:52:40.6364241Z setp.lt.s32 %p189, %r1711, %r1736; 2026-02-21T08:52:40.6364303Z @%p189 bra $L__BB0_2; 2026-02-21T08:52:40.6364394Z $L__BB0_9: // %.preheader 2026-02-21T08:52:40.6364460Z setp.gt.s32 %p190, %r1736, 447; 2026-02-21T08:52:40.6364523Z @%p190 bra $L__BB0_14; 2026-02-21T08:52:40.6364617Z // %bb.10: // %.lr.ph29 2026-02-21T08:52:40.6364816Z .loc 1 0 97 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:0:97 2026-02-21T08:52:40.6364883Z and.b32 %r1282, %r1695, 2040; 2026-02-21T08:52:40.6364944Z and.b32 %r1284, %r1696, 24; 2026-02-21T08:52:40.6365005Z xor.b32 %r34, %r1282, %r1284; 2026-02-21T08:52:40.6365070Z add.s32 %r35, %r160, %r34; 2026-02-21T08:52:40.6365136Z add.s32 %r1351, %r35, 2048; 2026-02-21T08:52:40.6365195Z add.s32 %r1358, %r35, 8192; 2026-02-21T08:52:40.6365254Z add.s32 %r1360, %r35, 10240; 2026-02-21T08:52:40.6365422Z add.s32 %r1367, %r35, 4096; 2026-02-21T08:52:40.6365482Z add.s32 %r1369, %r35, 6144; 2026-02-21T08:52:40.6365539Z add.s32 %r1376, %r35, 12288; 2026-02-21T08:52:40.6365604Z add.s32 %r1378, %r35, 14336; 2026-02-21T08:52:40.6365664Z and.b32 %r1287, %r1697, 3072; 2026-02-21T08:52:40.6365724Z and.b32 %r1289, %r1698, 448; 2026-02-21T08:52:40.6365784Z and.b32 %r1291, %r1699, 6; 2026-02-21T08:52:40.6365849Z or.b32 %r1293, %r1289, %r1700; 2026-02-21T08:52:40.6365908Z or.b32 %r1294, %r1293, %r1287; 2026-02-21T08:52:40.6365968Z 
or.b32 %r43, %r1294, %r1291; 2026-02-21T08:52:40.6366029Z xor.b32 %r44, %r43, 8; 2026-02-21T08:52:40.6366090Z xor.b32 %r45, %r43, 16; 2026-02-21T08:52:40.6366149Z xor.b32 %r46, %r43, 24; 2026-02-21T08:52:40.6366208Z and.b32 %r1296, %r1696, 112; 2026-02-21T08:52:40.6366383Z or.b32 %r47, %r1296, %r1701; 2026-02-21T08:52:40.6366446Z shl.b32 %r1297, %r1701, 7; 2026-02-21T08:52:40.6366623Z and.b32 %r1300, %r1703, 60; 2026-02-21T08:52:40.6366691Z xor.b32 %r1301, %r1702, %r1300; 2026-02-21T08:52:40.6366756Z or.b32 %r1302, %r1301, %r1297; 2026-02-21T08:52:40.6366816Z add.s32 %r1303, %r160, 16384; 2026-02-21T08:52:40.6366879Z add.s32 %r48, %r1303, %r1302; 2026-02-21T08:52:40.6366937Z xor.b32 %r1304, %r1302, 64; 2026-02-21T08:52:40.6367000Z add.s32 %r49, %r1303, %r1304; 2026-02-21T08:52:40.6367060Z and.b32 %r1306, %r1704, 56; 2026-02-21T08:52:40.6367123Z or.b32 %r1307, %r1306, %r4; 2026-02-21T08:52:40.6367182Z shl.b32 %r1308, %r1307, 4; 2026-02-21T08:52:40.6367241Z add.s32 %r1309, %r160, 18432; 2026-02-21T08:52:40.6367307Z add.s32 %r1548, %r1309, %r1308; 2026-02-21T08:52:40.6367365Z and.b32 %r1310, %r1698, 112; 2026-02-21T08:52:40.6367425Z or.b32 %r1313, %r1705, %r1706; 2026-02-21T08:52:40.6367482Z and.b32 %r1314, %r1313, 384; 2026-02-21T08:52:40.6367546Z and.b32 %r1316, %r1707, 512; 2026-02-21T08:52:40.6367605Z add.s32 %r1317, %r1309, %r1310; 2026-02-21T08:52:40.6367667Z add.s32 %r1318, %r1317, %r1316; 2026-02-21T08:52:40.6367741Z add.s32 %r1417, %r1318, %r1314; 2026-02-21T08:52:40.6367805Z bfe.u32 %r1319, %r1303, 4, 14; 2026-02-21T08:52:40.6367867Z cvt.u64.u32 %rd222, %r1319; 2026-02-21T08:52:40.6367944Z or.b64 %rd253, %rd222, 4611686293313683456; 2026-02-21T08:52:40.6368009Z add.s32 %r1320, %r160, 16416; 2026-02-21T08:52:40.6368069Z bfe.u32 %r1321, %r1320, 4, 14; 2026-02-21T08:52:40.6368129Z cvt.u64.u32 %rd223, %r1321; 2026-02-21T08:52:40.6368203Z or.b64 %rd254, %rd223, 4611686293313683456; 2026-02-21T08:52:40.6368262Z add.s32 %r1322, %r160, 16448; 
2026-02-21T08:52:40.6368320Z bfe.u32 %r1323, %r1322, 4, 14; 2026-02-21T08:52:40.6368382Z cvt.u64.u32 %rd224, %r1323; 2026-02-21T08:52:40.6368453Z or.b64 %rd255, %rd224, 4611686293313683456; 2026-02-21T08:52:40.6368513Z add.s32 %r1324, %r160, 16480; 2026-02-21T08:52:40.6368573Z bfe.u32 %r1325, %r1324, 4, 14; 2026-02-21T08:52:40.6368636Z cvt.u64.u32 %rd225, %r1325; 2026-02-21T08:52:40.6368706Z or.b64 %rd256, %rd225, 4611686293313683456; 2026-02-21T08:52:40.6368765Z add.s32 %r1326, %r160, 17408; 2026-02-21T08:52:40.6368828Z bfe.u32 %r1327, %r1326, 4, 14; 2026-02-21T08:52:40.6368891Z cvt.u64.u32 %rd226, %r1327; 2026-02-21T08:52:40.6368959Z or.b64 %rd257, %rd226, 4611686293313683456; 2026-02-21T08:52:40.6369019Z add.s32 %r1328, %r160, 17440; 2026-02-21T08:52:40.6369084Z bfe.u32 %r1329, %r1328, 4, 14; 2026-02-21T08:52:40.6369144Z cvt.u64.u32 %rd227, %r1329; 2026-02-21T08:52:40.6369213Z or.b64 %rd258, %rd227, 4611686293313683456; 2026-02-21T08:52:40.6369274Z add.s32 %r1330, %r160, 17472; 2026-02-21T08:52:40.6369331Z bfe.u32 %r1331, %r1330, 4, 14; 2026-02-21T08:52:40.6369392Z cvt.u64.u32 %rd228, %r1331; 2026-02-21T08:52:40.6369460Z or.b64 %rd259, %rd228, 4611686293313683456; 2026-02-21T08:52:40.6369523Z add.s32 %r1332, %r160, 17504; 2026-02-21T08:52:40.6369585Z bfe.u32 %r1333, %r1332, 4, 14; 2026-02-21T08:52:40.6369644Z cvt.u64.u32 %rd229, %r1333; 2026-02-21T08:52:40.6369717Z or.b64 %rd260, %rd229, 4611686293313683456; 2026-02-21T08:52:40.6369925Z shl.b32 %r1334, %r5, 9; 2026-02-21T08:52:40.6369985Z shl.b32 %r1335, %r4, 5; 2026-02-21T08:52:40.6370050Z or.b32 %r1336, %r1334, %r1335; 2026-02-21T08:52:40.6370110Z and.b32 %r1337, %r1336, 1888; 2026-02-21T08:52:40.6370170Z and.b32 %r1339, %r1708, 144; 2026-02-21T08:52:40.6370231Z xor.b32 %r1341, %r1339, %r1709; 2026-02-21T08:52:40.6370293Z add.s32 %r1342, %r160, %r1337; 2026-02-21T08:52:40.6370352Z add.s32 %r52, %r1342, %r1341; 2026-02-21T08:52:40.6370552Z .loc 1 28 97 // 
c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:28:97 2026-02-21T08:52:40.6370626Z mad.wide.u32 %rd230, %r9, 8, %rd42; 2026-02-21T08:52:40.6370686Z add.s64 %rd20, %rd230, 320; 2026-02-21T08:52:40.6371008Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6371076Z or.b32 %r1344, %r1710, %r10; 2026-02-21T08:52:40.6371135Z or.b32 %r54, %r1344, 262304; 2026-02-21T08:52:40.6371246Z $L__BB0_11: // =>This Loop Header: Depth=1 2026-02-21T08:52:40.6371347Z // Child Loop BB0_12 Depth 2 2026-02-21T08:52:40.6371571Z .loc 1 34 35 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:34:35 2026-02-21T08:52:40.6371633Z shr.s32 %r1389, %r1736, 31; 2026-02-21T08:52:40.6371692Z shr.u32 %r1390, %r1389, 29; 2026-02-21T08:52:40.6371759Z add.s32 %r1391, %r1736, %r1390; 2026-02-21T08:52:40.6371959Z .loc 1 35 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:35:33 2026-02-21T08:52:40.6372023Z and.b32 %r1392, %r1391, -8; 2026-02-21T08:52:40.6372222Z .loc 1 36 39 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:36:39 2026-02-21T08:52:40.6372286Z sub.s32 %r1393, 448, %r1392; 2026-02-21T08:52:40.6372480Z .loc 1 36 52 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:36:52 2026-02-21T08:52:40.6372540Z min.s32 %r1394, %r1393, 8; 2026-02-21T08:52:40.6372740Z .loc 1 37 45 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:37:45 2026-02-21T08:52:40.6372800Z sub.s32 %r1395, %r1736, %r1392; 2026-02-21T08:52:40.6372993Z .loc 1 38 51 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:38:51 2026-02-21T08:52:40.6373061Z div.s32 %r1396, %r1395, %r1394; 2026-02-21T08:52:40.6373252Z .loc 1 37 64 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:37:64 2026-02-21T08:52:40.6373317Z mul.lo.s32 %r1397, %r1396, %r1394; 2026-02-21T08:52:40.6373387Z sub.s32 %r1398, %r1395, %r1397; 2026-02-21T08:52:40.6373588Z .loc 1 37 30 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:37:30 
2026-02-21T08:52:40.6373651Z add.s32 %r1399, %r1398, %r1392; 2026-02-21T08:52:40.6373849Z .loc 1 39 27 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:39:27 2026-02-21T08:52:40.6373917Z shl.b32 %r1689, %r1399, 4; 2026-02-21T08:52:40.6374115Z .loc 1 40 27 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:40:27 2026-02-21T08:52:40.6374175Z shl.b32 %r1690, %r1396, 6; 2026-02-21T08:52:40.6374368Z .loc 1 41 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:41:32 2026-02-21T08:52:40.6374427Z or.b32 %r1400, %r1690, %r7; 2026-02-21T08:52:40.6374485Z or.b32 %r1401, %r1690, %r8; 2026-02-21T08:52:40.6374681Z .loc 1 55 53 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:53 2026-02-21T08:52:40.6374740Z shl.b32 %r1402, %r1400, 13; 2026-02-21T08:52:40.6374799Z shl.b32 %r1403, %r1401, 13; 2026-02-21T08:52:40.6375007Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6375067Z add.s32 %r1345, %r160, 20480; 2026-02-21T08:52:40.6375129Z // begin inline asm 2026-02-21T08:52:40.6375231Z @%p191 mbarrier.init.shared::cta.b64 [%r1345], 1; 2026-02-21T08:52:40.6375410Z // end inline asm 2026-02-21T08:52:40.6375467Z bar.sync 0; 2026-02-21T08:52:40.6375527Z add.s32 %r1346, %r160, 20488; 2026-02-21T08:52:40.6375590Z // begin inline asm 2026-02-21T08:52:40.6375682Z @%p191 mbarrier.init.shared::cta.b64 [%r1346], 1; 2026-02-21T08:52:40.6375738Z // end inline asm 2026-02-21T08:52:40.6375801Z add.s32 %r1347, %r160, 20496; 2026-02-21T08:52:40.6375859Z // begin inline asm 2026-02-21T08:52:40.6375946Z @%p191 mbarrier.init.shared::cta.b64 [%r1347], 1; 2026-02-21T08:52:40.6376001Z // end inline asm 2026-02-21T08:52:40.6376059Z bar.sync 0; 2026-02-21T08:52:40.6376120Z add.s32 %r1348, %r160, 20504; 2026-02-21T08:52:40.6376177Z // begin inline asm 2026-02-21T08:52:40.6376269Z @%p191 mbarrier.init.shared::cta.b64 [%r1348], 1; 2026-02-21T08:52:40.6376417Z // end inline asm 2026-02-21T08:52:40.6376740Z .loc 1 55 60 
// c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:60 2026-02-21T08:52:40.6376805Z or.b32 %r1405, %r1402, %r10; 2026-02-21T08:52:40.6376872Z or.b32 %r1406, %r1403, %r10; 2026-02-21T08:52:40.6377066Z .loc 1 55 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:32 2026-02-21T08:52:40.6377140Z mad.wide.s32 %rd231, %r1405, 2, %rd42; 2026-02-21T08:52:40.6377215Z mad.wide.s32 %rd232, %r1406, 2, %rd42; 2026-02-21T08:52:40.6377271Z mov.b32 %r1350, 8; 2026-02-21T08:52:40.6377464Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6377524Z // begin inline asm 2026-02-21T08:52:40.6377675Z cp.async.ca.shared.global [ %r35 + 0 ], [ %rd231 + 0 ], 0x8, %r1350; 2026-02-21T08:52:40.6377734Z // end inline asm 2026-02-21T08:52:40.6377790Z // begin inline asm 2026-02-21T08:52:40.6377935Z cp.async.ca.shared.global [ %r1351 + 0 ], [ %rd232 + 0 ], 0x8, %r1350; 2026-02-21T08:52:40.6377992Z // end inline asm 2026-02-21T08:52:40.6378057Z cp.async.commit_group; 2026-02-21T08:52:40.6378266Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6378327Z // begin inline asm 2026-02-21T08:52:40.6378455Z @%p191 mbarrier.arrive.expect_tx.shared.b64 _, [%r1345], 256; 2026-02-21T08:52:40.6378514Z // end inline asm 2026-02-21T08:52:40.6378707Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6378761Z bar.sync 0; 2026-02-21T08:52:40.6378829Z elect.sync %r1407|%p204, -1; 2026-02-21T08:52:40.6378903Z and.pred %p196, %p1, %p204; 2026-02-21T08:52:40.6378963Z add.s32 %r1354, %r160, 19456; 2026-02-21T08:52:40.6379018Z mov.b32 %r1356, 0; 2026-02-21T08:52:40.6379080Z // begin inline asm 2026-02-21T08:52:40.6379414Z @%p196 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1354], [%rd180, {%r1689, %r1356}], [%r1345]; 2026-02-21T08:52:40.6379471Z // end inline asm 2026-02-21T08:52:40.6379684Z .loc 1 55 32 // 
c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:32 2026-02-21T08:52:40.6379753Z cvt.s64.s32 %rd244, %r1402; 2026-02-21T08:52:40.6379816Z or.b64 %rd246, %rd244, %rd276; 2026-02-21T08:52:40.6379880Z shl.b64 %rd247, %rd246, 1; 2026-02-21T08:52:40.6379945Z add.s64 %rd248, %rd42, %rd247; 2026-02-21T08:52:40.6380006Z add.s64 %rd234, %rd248, 64; 2026-02-21T08:52:40.6380067Z cvt.s64.s32 %rd249, %r1403; 2026-02-21T08:52:40.6380132Z or.b64 %rd250, %rd249, %rd276; 2026-02-21T08:52:40.6380192Z shl.b64 %rd251, %rd250, 1; 2026-02-21T08:52:40.6380253Z add.s64 %rd252, %rd42, %rd251; 2026-02-21T08:52:40.6380314Z add.s64 %rd235, %rd252, 64; 2026-02-21T08:52:40.6380513Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6380576Z // begin inline asm 2026-02-21T08:52:40.6380716Z cp.async.ca.shared.global [ %r1358 + 0 ], [ %rd234 + 0 ], 0x8, %r1350; 2026-02-21T08:52:40.6380776Z // end inline asm 2026-02-21T08:52:40.6380835Z // begin inline asm 2026-02-21T08:52:40.6381120Z cp.async.ca.shared.global [ %r1360 + 0 ], [ %rd235 + 0 ], 0x8, %r1350; 2026-02-21T08:52:40.6381181Z // end inline asm 2026-02-21T08:52:40.6381247Z cp.async.commit_group; 2026-02-21T08:52:40.6381452Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6381510Z // begin inline asm 2026-02-21T08:52:40.6381641Z @%p191 mbarrier.arrive.expect_tx.shared.b64 _, [%r1347], 256; 2026-02-21T08:52:40.6381697Z // end inline asm 2026-02-21T08:52:40.6381892Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6381951Z bar.sync 0; 2026-02-21T08:52:40.6382016Z elect.sync %r1408|%p205, -1; 2026-02-21T08:52:40.6382203Z and.pred %p198, %p1, %p205; 2026-02-21T08:52:40.6382277Z add.s32 %r1363, %r160, 19968; 2026-02-21T08:52:40.6382335Z mov.b32 %r1365, 16; 2026-02-21T08:52:40.6382393Z // begin inline asm 2026-02-21T08:52:40.6382720Z @%p198 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1363], [%rd180, {%r1689, %r1365}], [%r1347]; 2026-02-21T08:52:40.6382780Z // end inline asm 2026-02-21T08:52:40.6382981Z .loc 1 55 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:32 2026-02-21T08:52:40.6383043Z add.s64 %rd237, %rd248, 128; 2026-02-21T08:52:40.6383119Z add.s64 %rd238, %rd252, 128; 2026-02-21T08:52:40.6383316Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6383376Z // begin inline asm 2026-02-21T08:52:40.6383512Z cp.async.ca.shared.global [ %r1367 + 0 ], [ %rd237 + 0 ], 0x8, %r1350; 2026-02-21T08:52:40.6383569Z // end inline asm 2026-02-21T08:52:40.6383626Z // begin inline asm 2026-02-21T08:52:40.6383760Z cp.async.ca.shared.global [ %r1369 + 0 ], [ %rd238 + 0 ], 0x8, %r1350; 2026-02-21T08:52:40.6383824Z // end inline asm 2026-02-21T08:52:40.6383890Z cp.async.commit_group; 2026-02-21T08:52:40.6384097Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6384159Z // begin inline asm 2026-02-21T08:52:40.6384282Z @%p191 mbarrier.arrive.expect_tx.shared.b64 _, [%r1346], 256; 2026-02-21T08:52:40.6384340Z // end inline asm 2026-02-21T08:52:40.6384537Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6384592Z bar.sync 0; 2026-02-21T08:52:40.6384658Z elect.sync %r1409|%p206, -1; 2026-02-21T08:52:40.6384723Z and.pred %p200, %p1, %p206; 2026-02-21T08:52:40.6384788Z add.s32 %r1372, %r160, 19712; 2026-02-21T08:52:40.6384847Z mov.b32 %r1374, 32; 2026-02-21T08:52:40.6384904Z // begin inline asm 2026-02-21T08:52:40.6385230Z @%p200 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1372], [%rd180, {%r1689, %r1374}], [%r1346]; 2026-02-21T08:52:40.6385286Z // end inline asm 2026-02-21T08:52:40.6385482Z .loc 1 55 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:32 
2026-02-21T08:52:40.6385549Z add.s64 %rd240, %rd248, 192; 2026-02-21T08:52:40.6385622Z add.s64 %rd241, %rd252, 192; 2026-02-21T08:52:40.6385817Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6385875Z // begin inline asm 2026-02-21T08:52:40.6386015Z cp.async.ca.shared.global [ %r1376 + 0 ], [ %rd240 + 0 ], 0x8, %r1350; 2026-02-21T08:52:40.6386071Z // end inline asm 2026-02-21T08:52:40.6386129Z // begin inline asm 2026-02-21T08:52:40.6386262Z cp.async.ca.shared.global [ %r1378 + 0 ], [ %rd241 + 0 ], 0x8, %r1350; 2026-02-21T08:52:40.6386318Z // end inline asm 2026-02-21T08:52:40.6386384Z cp.async.commit_group; 2026-02-21T08:52:40.6386702Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6386770Z // begin inline asm 2026-02-21T08:52:40.6386895Z @%p191 mbarrier.arrive.expect_tx.shared.b64 _, [%r1348], 256; 2026-02-21T08:52:40.6387099Z // end inline asm 2026-02-21T08:52:40.6387305Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6387361Z bar.sync 0; 2026-02-21T08:52:40.6387428Z elect.sync %r1410|%p207, -1; 2026-02-21T08:52:40.6387497Z and.pred %p202, %p1, %p207; 2026-02-21T08:52:40.6387557Z add.s32 %r1381, %r160, 20224; 2026-02-21T08:52:40.6387614Z mov.b32 %r1383, 48; 2026-02-21T08:52:40.6387677Z // begin inline asm 2026-02-21T08:52:40.6387994Z @%p202 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1381], [%rd180, {%r1689, %r1383}], [%r1348]; 2026-02-21T08:52:40.6388051Z // end inline asm 2026-02-21T08:52:40.6388388Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6388459Z shl.b32 %r1411, %r1396, 19; 2026-02-21T08:52:40.6388578Z or.b32 %r1412, %r1710, %r1411; 2026-02-21T08:52:40.6388665Z mad.wide.s32 %rd283, %r1412, 2, %rd20; 2026-02-21T08:52:40.6388733Z or.b32 %r1737, %r54, %r1411; 2026-02-21T08:52:40.6388795Z mov.b32 %r1567, 0f00000000; 
2026-02-21T08:52:40.6388852Z mov.b32 %r1740, 1; 2026-02-21T08:52:40.6388917Z mov.b32 %r1739, -1; 2026-02-21T08:52:40.6388976Z mov.b64 %rd284, 0; 2026-02-21T08:52:40.6389035Z mov.b32 %r1738, %r1356; 2026-02-21T08:52:40.6389095Z mov.b32 %r1568, %r1567; 2026-02-21T08:52:40.6389156Z mov.b32 %r1569, %r1567; 2026-02-21T08:52:40.6389213Z mov.b32 %r1570, %r1567; 2026-02-21T08:52:40.6389325Z $L__BB0_12: // Parent Loop BB0_11 Depth=1 2026-02-21T08:52:40.6389440Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:40.6389517Z setp.lt.u64 %p227, %rd284, 4032; 2026-02-21T08:52:40.6389582Z add.s32 %r1635, %r1739, 1; 2026-02-21T08:52:40.6389649Z setp.gt.s32 %p228, %r1635, 1; 2026-02-21T08:52:40.6389723Z selp.b32 %r1739, 0, %r1635, %p228; 2026-02-21T08:52:40.6389790Z selp.b32 %r1636, 1, 0, %p228; 2026-02-21T08:52:40.6389853Z xor.b32 %r1738, %r1738, %r1636; 2026-02-21T08:52:40.6390066Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6390135Z cp.async.wait_group 2; 2026-02-21T08:52:40.6390190Z bar.sync 0; 2026-02-21T08:52:40.6390254Z shl.b32 %r1637, %r1739, 12; 2026-02-21T08:52:40.6390315Z add.s32 %r1639, %r160, %r1637; 2026-02-21T08:52:40.6390510Z .loc 1 59 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:59:32 2026-02-21T08:52:40.6390572Z add.s32 %r1640, %r1639, %r43; 2026-02-21T08:52:40.6390641Z ld.shared.b16 %rs157, [%r1640]; 2026-02-21T08:52:40.6390711Z ld.shared.b16 %rs158, [%r1640+512]; 2026-02-21T08:52:40.6390782Z ld.shared.b16 %rs159, [%r1640+32]; 2026-02-21T08:52:40.6390851Z ld.shared.b16 %rs160, [%r1640+544]; 2026-02-21T08:52:40.6390911Z add.s32 %r1641, %r1639, %r44; 2026-02-21T08:52:40.6390975Z ld.shared.b16 %rs161, [%r1641]; 2026-02-21T08:52:40.6391044Z ld.shared.b16 %rs162, [%r1641+512]; 2026-02-21T08:52:40.6391113Z ld.shared.b16 %rs163, [%r1641+32]; 2026-02-21T08:52:40.6391178Z ld.shared.b16 %rs164, [%r1641+544]; 2026-02-21T08:52:40.6391238Z add.s32 %r1642, %r1639, %r45; 2026-02-21T08:52:40.6391308Z 
ld.shared.b16 %rs165, [%r1642]; 2026-02-21T08:52:40.6391371Z ld.shared.b16 %rs166, [%r1642+512]; 2026-02-21T08:52:40.6391435Z ld.shared.b16 %rs167, [%r1642+32]; 2026-02-21T08:52:40.6391503Z ld.shared.b16 %rs168, [%r1642+544]; 2026-02-21T08:52:40.6391563Z add.s32 %r1643, %r1639, %r46; 2026-02-21T08:52:40.6391625Z ld.shared.b16 %rs169, [%r1643]; 2026-02-21T08:52:40.6391689Z ld.shared.b16 %rs170, [%r1643+512]; 2026-02-21T08:52:40.6391759Z ld.shared.b16 %rs171, [%r1643+32]; 2026-02-21T08:52:40.6391828Z ld.shared.b16 %rs172, [%r1643+544]; 2026-02-21T08:52:40.6391893Z cvt.f32.bf16 %r1483, %rs157; 2026-02-21T08:52:40.6391967Z cvt.f32.bf16 %r1484, %rs158; 2026-02-21T08:52:40.6392031Z cvt.f32.bf16 %r1485, %rs161; 2026-02-21T08:52:40.6392196Z cvt.f32.bf16 %r1486, %rs162; 2026-02-21T08:52:40.6392256Z cvt.f32.bf16 %r1495, %rs165; 2026-02-21T08:52:40.6392324Z cvt.f32.bf16 %r1496, %rs166; 2026-02-21T08:52:40.6392385Z cvt.f32.bf16 %r1497, %rs169; 2026-02-21T08:52:40.6392445Z cvt.f32.bf16 %r1498, %rs170; 2026-02-21T08:52:40.6392507Z cvt.f32.bf16 %r1507, %rs159; 2026-02-21T08:52:40.6392567Z cvt.f32.bf16 %r1508, %rs160; 2026-02-21T08:52:40.6392627Z cvt.f32.bf16 %r1509, %rs163; 2026-02-21T08:52:40.6392686Z cvt.f32.bf16 %r1510, %rs164; 2026-02-21T08:52:40.6392762Z cvt.f32.bf16 %r1519, %rs167; 2026-02-21T08:52:40.6392823Z cvt.f32.bf16 %r1520, %rs168; 2026-02-21T08:52:40.6392884Z cvt.f32.bf16 %r1521, %rs171; 2026-02-21T08:52:40.6392947Z cvt.f32.bf16 %r1522, %rs172; 2026-02-21T08:52:40.6393243Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6393307Z shl.b32 %r1644, %r1739, 3; 2026-02-21T08:52:40.6393371Z add.s32 %r1413, %r1345, %r1644; 2026-02-21T08:52:40.6393437Z // begin inline asm 2026-02-21T08:52:40.6393489Z 2026-02-21T08:52:40.6393540Z { 2026-02-21T08:52:40.6393608Z .reg .pred complete; 2026-02-21T08:52:40.6393664Z waitLoop: 2026-02-21T08:52:40.6393808Z mbarrier.try_wait.parity.shared.b64 complete, [%r1413], %r1738; 
2026-02-21T08:52:40.6393880Z @!complete bra.uni waitLoop; 2026-02-21T08:52:40.6393931Z } 2026-02-21T08:52:40.6393936Z 2026-02-21T08:52:40.6393991Z // end inline asm 2026-02-21T08:52:40.6394189Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6394254Z shl.b32 %r1646, %r1739, 8; 2026-02-21T08:52:40.6394315Z add.s32 %r1648, %r1354, %r1646; 2026-02-21T08:52:40.6394516Z .loc 1 79 58 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:79:58 2026-02-21T08:52:40.6394595Z add.s32 %r1649, %r1648, %r47; 2026-02-21T08:52:40.6394663Z ld.shared.b8 %rs173, [%r1649]; 2026-02-21T08:52:40.6394733Z ld.shared.b8 %rs174, [%r1649+128]; 2026-02-21T08:52:40.6394933Z .loc 1 64 28 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:64:28 2026-02-21T08:52:40.6394996Z shl.b16 %rs175, %rs173, 4; 2026-02-21T08:52:40.6395056Z shl.b16 %rs176, %rs174, 4; 2026-02-21T08:52:40.6395248Z .loc 1 79 58 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:79:58 2026-02-21T08:52:40.6395326Z selp.b16 %rs177, %rs175, %rs173, %p243; 2026-02-21T08:52:40.6395387Z cvt.s16.s8 %rs178, %rs177; 2026-02-21T08:52:40.6395447Z shr.s16 %rs179, %rs178, 4; 2026-02-21T08:52:40.6395521Z selp.b16 %rs180, %rs176, %rs174, %p243; 2026-02-21T08:52:40.6395582Z cvt.s16.s8 %rs181, %rs180; 2026-02-21T08:52:40.6395642Z shr.s16 %rs182, %rs181, 4; 2026-02-21T08:52:40.6395842Z .loc 1 84 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:84:32 2026-02-21T08:52:40.6395906Z cvt.rn.f32.s16 %r1650, %rs179; 2026-02-21T08:52:40.6395970Z cvt.rn.f32.s16 %r1651, %rs182; 2026-02-21T08:52:40.6396036Z st.shared.b32 [%r48], %r1650; 2026-02-21T08:52:40.6396102Z st.shared.b32 [%r49], %r1651; 2026-02-21T08:52:40.6396241Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1548], {%r1567}; 2026-02-21T08:52:40.6396302Z bar.sync 0; 2026-02-21T08:52:40.6396365Z // begin inline asm 2026-02-21T08:52:40.6396643Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1439, %r1487}, [%r1417]; 
2026-02-21T08:52:40.6396706Z // end inline asm 2026-02-21T08:52:40.6396766Z bar.sync 0; 2026-02-21T08:52:40.6396902Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1548], {%r1569}; 2026-02-21T08:52:40.6396958Z bar.sync 0; 2026-02-21T08:52:40.6397018Z // begin inline asm 2026-02-21T08:52:40.6397173Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1441, %r1489}, [%r1417]; 2026-02-21T08:52:40.6397233Z // end inline asm 2026-02-21T08:52:40.6397290Z bar.sync 0; 2026-02-21T08:52:40.6397427Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1548], {%r1568}; 2026-02-21T08:52:40.6397482Z bar.sync 0; 2026-02-21T08:52:40.6397692Z // begin inline asm 2026-02-21T08:52:40.6397847Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1440, %r1488}, [%r1417]; 2026-02-21T08:52:40.6397907Z // end inline asm 2026-02-21T08:52:40.6397961Z bar.sync 0; 2026-02-21T08:52:40.6398090Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1548], {%r1570}; 2026-02-21T08:52:40.6398149Z bar.sync 0; 2026-02-21T08:52:40.6398206Z // begin inline asm 2026-02-21T08:52:40.6398352Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1442, %r1490}, [%r1417]; 2026-02-21T08:52:40.6398412Z // end inline asm 2026-02-21T08:52:40.6398470Z $L__tmp13: 2026-02-21T08:52:40.6398748Z .loc 2 291 36 // standard.py:291:36 @[ c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:91:40 ] 2026-02-21T08:52:40.6398808Z // begin inline asm 2026-02-21T08:52:40.6399015Z fence.proxy.async.shared::cta; 2026-02-21T08:52:40.6399077Z // end inline asm 2026-02-21T08:52:40.6399158Z shfl.sync.idx.b32 %r1652, %r5, 0, 31, -1; 2026-02-21T08:52:40.6399238Z wgmma.fence.sync.aligned; 2026-02-21T08:52:40.6399300Z mov.pred %p208, -1; 2026-02-21T08:52:40.6399359Z // begin inline asm 2026-02-21T08:52:40.6399678Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r1439,%r1440,%r1441,%r1442}, {%r1483,%r1484,%r1485,%r1486}, %rd253, %p208, 1, 1; 2026-02-21T08:52:40.6399745Z // end inline asm 2026-02-21T08:52:40.6399810Z // begin inline asm 2026-02-21T08:52:40.6400115Z 
wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r1439,%r1440,%r1441,%r1442}, {%r1495,%r1496,%r1497,%r1498}, %rd254, %p208, 1, 1; 2026-02-21T08:52:40.6400177Z // end inline asm 2026-02-21T08:52:40.6400236Z // begin inline asm 2026-02-21T08:52:40.6400533Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r1439,%r1440,%r1441,%r1442}, {%r1507,%r1508,%r1509,%r1510}, %rd255, %p208, 1, 1; 2026-02-21T08:52:40.6400597Z // end inline asm 2026-02-21T08:52:40.6400656Z // begin inline asm 2026-02-21T08:52:40.6400954Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r1439,%r1440,%r1441,%r1442}, {%r1519,%r1520,%r1521,%r1522}, %rd256, %p208, 1, 1; 2026-02-21T08:52:40.6401014Z // end inline asm 2026-02-21T08:52:40.6401080Z // begin inline asm 2026-02-21T08:52:40.6401380Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r1487,%r1488,%r1489,%r1490}, {%r1483,%r1484,%r1485,%r1486}, %rd257, %p208, 1, 1; 2026-02-21T08:52:40.6401436Z // end inline asm 2026-02-21T08:52:40.6401496Z // begin inline asm 2026-02-21T08:52:40.6401795Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r1487,%r1488,%r1489,%r1490}, {%r1495,%r1496,%r1497,%r1498}, %rd258, %p208, 1, 1; 2026-02-21T08:52:40.6401866Z // end inline asm 2026-02-21T08:52:40.6401930Z // begin inline asm 2026-02-21T08:52:40.6402233Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r1487,%r1488,%r1489,%r1490}, {%r1507,%r1508,%r1509,%r1510}, %rd259, %p208, 1, 1; 2026-02-21T08:52:40.6402290Z // end inline asm 2026-02-21T08:52:40.6402348Z // begin inline asm 2026-02-21T08:52:40.6402649Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r1487,%r1488,%r1489,%r1490}, {%r1519,%r1520,%r1521,%r1522}, %rd260, %p208, 1, 1; 2026-02-21T08:52:40.6402707Z // end inline asm 2026-02-21T08:52:40.6402783Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:40.6402849Z mov.b32 %r1532, %r1356; 2026-02-21T08:52:40.6402908Z mov.b32 %r1533, %r1356; 2026-02-21T08:52:40.6402966Z mov.b32 %r1531, %r1303; 2026-02-21T08:52:40.6403032Z // begin 
inline asm 2026-02-21T08:52:40.6403232Z // wait for regs: %r1439,%r1440,%r1441,%r1442,%r1487,%r1488,%r1489,%r1490,%r1531,%r1532,%r1533 2026-02-21T08:52:40.6403307Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:40.6403363Z // end inline asm 2026-02-21T08:52:40.6403419Z $L__tmp14: 2026-02-21T08:52:40.6403626Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6403691Z add.s32 %r1653, %r160, 8192; 2026-02-21T08:52:40.6403758Z add.s32 %r1654, %r1653, %r1637; 2026-02-21T08:52:40.6403955Z .loc 1 59 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:59:32 2026-02-21T08:52:40.6404122Z add.s32 %r1655, %r1654, %r43; 2026-02-21T08:52:40.6404191Z ld.shared.b16 %rs183, [%r1655]; 2026-02-21T08:52:40.6404261Z ld.shared.b16 %rs184, [%r1655+512]; 2026-02-21T08:52:40.6404326Z ld.shared.b16 %rs185, [%r1655+32]; 2026-02-21T08:52:40.6404392Z ld.shared.b16 %rs186, [%r1655+544]; 2026-02-21T08:52:40.6404456Z add.s32 %r1656, %r1654, %r44; 2026-02-21T08:52:40.6404518Z ld.shared.b16 %rs187, [%r1656]; 2026-02-21T08:52:40.6404583Z ld.shared.b16 %rs188, [%r1656+512]; 2026-02-21T08:52:40.6404653Z ld.shared.b16 %rs189, [%r1656+32]; 2026-02-21T08:52:40.6404717Z ld.shared.b16 %rs190, [%r1656+544]; 2026-02-21T08:52:40.6404778Z add.s32 %r1657, %r1654, %r45; 2026-02-21T08:52:40.6404844Z ld.shared.b16 %rs191, [%r1657]; 2026-02-21T08:52:40.6405007Z ld.shared.b16 %rs192, [%r1657+512]; 2026-02-21T08:52:40.6405074Z ld.shared.b16 %rs193, [%r1657+32]; 2026-02-21T08:52:40.6405140Z ld.shared.b16 %rs194, [%r1657+544]; 2026-02-21T08:52:40.6405210Z add.s32 %r1658, %r1654, %r46; 2026-02-21T08:52:40.6405284Z ld.shared.b16 %rs195, [%r1658]; 2026-02-21T08:52:40.6405353Z ld.shared.b16 %rs196, [%r1658+512]; 2026-02-21T08:52:40.6405420Z ld.shared.b16 %rs197, [%r1658+32]; 2026-02-21T08:52:40.6405484Z ld.shared.b16 %rs198, [%r1658+544]; 2026-02-21T08:52:40.6405547Z cvt.f32.bf16 %r1563, %rs183; 2026-02-21T08:52:40.6405609Z cvt.f32.bf16 %r1564, %rs184; 
2026-02-21T08:52:40.6405674Z cvt.f32.bf16 %r1565, %rs187; 2026-02-21T08:52:40.6405734Z cvt.f32.bf16 %r1566, %rs188; 2026-02-21T08:52:40.6405794Z cvt.f32.bf16 %r1575, %rs191; 2026-02-21T08:52:40.6405858Z cvt.f32.bf16 %r1576, %rs192; 2026-02-21T08:52:40.6405918Z cvt.f32.bf16 %r1577, %rs195; 2026-02-21T08:52:40.6405976Z cvt.f32.bf16 %r1578, %rs196; 2026-02-21T08:52:40.6406038Z cvt.f32.bf16 %r1587, %rs185; 2026-02-21T08:52:40.6406101Z cvt.f32.bf16 %r1588, %rs186; 2026-02-21T08:52:40.6406161Z cvt.f32.bf16 %r1589, %rs189; 2026-02-21T08:52:40.6406221Z cvt.f32.bf16 %r1590, %rs190; 2026-02-21T08:52:40.6406289Z cvt.f32.bf16 %r1599, %rs193; 2026-02-21T08:52:40.6406349Z cvt.f32.bf16 %r1600, %rs194; 2026-02-21T08:52:40.6406414Z cvt.f32.bf16 %r1601, %rs197; 2026-02-21T08:52:40.6406591Z cvt.f32.bf16 %r1602, %rs198; 2026-02-21T08:52:40.6406810Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6406872Z add.s32 %r1545, %r1347, %r1644; 2026-02-21T08:52:40.6406932Z // begin inline asm 2026-02-21T08:52:40.6406986Z 2026-02-21T08:52:40.6407036Z { 2026-02-21T08:52:40.6407099Z .reg .pred complete; 2026-02-21T08:52:40.6407157Z waitLoop: 2026-02-21T08:52:40.6407307Z mbarrier.try_wait.parity.shared.b64 complete, [%r1545], %r1738; 2026-02-21T08:52:40.6407379Z @!complete bra.uni waitLoop; 2026-02-21T08:52:40.6407432Z } 2026-02-21T08:52:40.6407436Z 2026-02-21T08:52:40.6407496Z // end inline asm 2026-02-21T08:52:40.6407693Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6407757Z add.s32 %r1661, %r1363, %r1646; 2026-02-21T08:52:40.6407959Z .loc 1 79 58 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:79:58 2026-02-21T08:52:40.6408021Z add.s32 %r1662, %r1661, %r47; 2026-02-21T08:52:40.6408089Z ld.shared.b8 %rs199, [%r1662]; 2026-02-21T08:52:40.6408157Z ld.shared.b8 %rs200, [%r1662+128]; 2026-02-21T08:52:40.6408349Z .loc 1 64 28 // 
c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:64:28 2026-02-21T08:52:40.6408411Z shl.b16 %rs201, %rs199, 4; 2026-02-21T08:52:40.6408470Z shl.b16 %rs202, %rs200, 4; 2026-02-21T08:52:40.6408665Z .loc 1 79 58 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:79:58 2026-02-21T08:52:40.6408741Z selp.b16 %rs203, %rs201, %rs199, %p243; 2026-02-21T08:52:40.6408803Z cvt.s16.s8 %rs204, %rs203; 2026-02-21T08:52:40.6408865Z shr.s16 %rs205, %rs204, 4; 2026-02-21T08:52:40.6408934Z selp.b16 %rs206, %rs202, %rs200, %p243; 2026-02-21T08:52:40.6409148Z cvt.s16.s8 %rs207, %rs206; 2026-02-21T08:52:40.6409212Z shr.s16 %rs208, %rs207, 4; 2026-02-21T08:52:40.6409408Z .loc 1 84 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:84:32 2026-02-21T08:52:40.6409473Z cvt.rn.f32.s16 %r1663, %rs205; 2026-02-21T08:52:40.6409538Z cvt.rn.f32.s16 %r1664, %rs208; 2026-02-21T08:52:40.6409599Z bar.sync 0; 2026-02-21T08:52:40.6409675Z st.shared.b32 [%r48], %r1663; 2026-02-21T08:52:40.6409741Z st.shared.b32 [%r49], %r1664; 2026-02-21T08:52:40.6409799Z $L__tmp15: 2026-02-21T08:52:40.6410069Z .loc 2 291 36 // standard.py:291:36 @[ c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:91:40 ] 2026-02-21T08:52:40.6410345Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r1417], {%r1439, %r1487}; 2026-02-21T08:52:40.6410419Z bar.sync 0; 2026-02-21T08:52:40.6410480Z // begin inline asm 2026-02-21T08:52:40.6410615Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1567}, [%r1548]; 2026-02-21T08:52:40.6410678Z // end inline asm 2026-02-21T08:52:40.6410734Z bar.sync 0; 2026-02-21T08:52:40.6410883Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r1417], {%r1441, %r1489}; 2026-02-21T08:52:40.6410938Z bar.sync 0; 2026-02-21T08:52:40.6411003Z // begin inline asm 2026-02-21T08:52:40.6411130Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1569}, [%r1548]; 2026-02-21T08:52:40.6411190Z // end inline asm 2026-02-21T08:52:40.6411243Z bar.sync 0; 2026-02-21T08:52:40.6411392Z 
stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r1417], {%r1440, %r1488}; 2026-02-21T08:52:40.6411446Z bar.sync 0; 2026-02-21T08:52:40.6411505Z // begin inline asm 2026-02-21T08:52:40.6411635Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1568}, [%r1548]; 2026-02-21T08:52:40.6411691Z // end inline asm 2026-02-21T08:52:40.6411749Z bar.sync 0; 2026-02-21T08:52:40.6411895Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r1417], {%r1442, %r1490}; 2026-02-21T08:52:40.6411953Z bar.sync 0; 2026-02-21T08:52:40.6412014Z // begin inline asm 2026-02-21T08:52:40.6412140Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1570}, [%r1548]; 2026-02-21T08:52:40.6412200Z // end inline asm 2026-02-21T08:52:40.6412257Z // begin inline asm 2026-02-21T08:52:40.6412333Z fence.proxy.async.shared::cta; 2026-02-21T08:52:40.6412393Z // end inline asm 2026-02-21T08:52:40.6412469Z wgmma.fence.sync.aligned; 2026-02-21T08:52:40.6412533Z shl.b32 %r1665, %r1652, 8; 2026-02-21T08:52:40.6412596Z and.b32 %r1666, %r1665, 1024; 2026-02-21T08:52:40.6412663Z add.s32 %r1667, %r1666, %r1303; 2026-02-21T08:52:40.6412723Z bfe.u32 %r1668, %r1667, 4, 14; 2026-02-21T08:52:40.6412786Z cvt.u64.u32 %rd271, %r1668; 2026-02-21T08:52:40.6412865Z or.b64 %rd261, %rd271, 4611686293313683456; 2026-02-21T08:52:40.6412923Z // begin inline asm 2026-02-21T08:52:40.6413261Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r1567,%r1568,%r1569,%r1570}, {%r1563,%r1564,%r1565,%r1566}, %rd261, %p208, 1, 1; 2026-02-21T08:52:40.6413324Z // end inline asm 2026-02-21T08:52:40.6413394Z add.s32 %r1669, %r1667, 32; 2026-02-21T08:52:40.6413454Z bfe.u32 %r1670, %r1669, 4, 14; 2026-02-21T08:52:40.6413516Z cvt.u64.u32 %rd272, %r1670; 2026-02-21T08:52:40.6413596Z or.b64 %rd262, %rd272, 4611686293313683456; 2026-02-21T08:52:40.6413654Z // begin inline asm 2026-02-21T08:52:40.6413959Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r1567,%r1568,%r1569,%r1570}, {%r1575,%r1576,%r1577,%r1578}, %rd262, %p208, 1, 1; 2026-02-21T08:52:40.6414019Z // end inline 
asm 2026-02-21T08:52:40.6414078Z add.s32 %r1671, %r1667, 64; 2026-02-21T08:52:40.6414138Z bfe.u32 %r1672, %r1671, 4, 14; 2026-02-21T08:52:40.6414202Z cvt.u64.u32 %rd273, %r1672; 2026-02-21T08:52:40.6414278Z or.b64 %rd263, %rd273, 4611686293313683456; 2026-02-21T08:52:40.6414337Z // begin inline asm 2026-02-21T08:52:40.6414640Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r1567,%r1568,%r1569,%r1570}, {%r1587,%r1588,%r1589,%r1590}, %rd263, %p208, 1, 1; 2026-02-21T08:52:40.6414702Z // end inline asm 2026-02-21T08:52:40.6414867Z add.s32 %r1673, %r1667, 96; 2026-02-21T08:52:40.6414927Z bfe.u32 %r1674, %r1673, 4, 14; 2026-02-21T08:52:40.6414987Z cvt.u64.u32 %rd274, %r1674; 2026-02-21T08:52:40.6415062Z or.b64 %rd264, %rd274, 4611686293313683456; 2026-02-21T08:52:40.6415121Z // begin inline asm 2026-02-21T08:52:40.6415420Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r1567,%r1568,%r1569,%r1570}, {%r1599,%r1600,%r1601,%r1602}, %rd264, %p208, 1, 1; 2026-02-21T08:52:40.6415480Z // end inline asm 2026-02-21T08:52:40.6415556Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:40.6415616Z mov.b32 %r1608, %r1356; 2026-02-21T08:52:40.6415679Z mov.b32 %r1607, %r1303; 2026-02-21T08:52:40.6415738Z mov.b32 %r1609, %r1356; 2026-02-21T08:52:40.6415798Z // begin inline asm 2026-02-21T08:52:40.6416006Z // wait for regs: %r1567,%r1568,%r1569,%r1570,%r1607,%r1608,%r1609 2026-02-21T08:52:40.6416086Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:40.6416142Z // end inline asm 2026-02-21T08:52:40.6416198Z $L__tmp16: 2026-02-21T08:52:40.6416410Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6416595Z add.s32 %r1675, %r1740, 1; 2026-02-21T08:52:40.6416667Z setp.gt.s32 %p229, %r1675, 1; 2026-02-21T08:52:40.6416738Z selp.b32 %r1740, 0, %r1675, %p229; 2026-02-21T08:52:40.6416940Z .loc 1 55 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:32 2026-02-21T08:52:40.6417003Z add.s32 %r1676, %r1737, -32; 
2026-02-21T08:52:40.6417067Z add.s64 %rd265, %rd283, -64; 2026-02-21T08:52:40.6417144Z mad.wide.s32 %rd266, %r1676, 2, %rd42; 2026-02-21T08:52:40.6417338Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6417402Z shl.b32 %r1677, %r1740, 12; 2026-02-21T08:52:40.6417469Z add.s32 %r1678, %r160, %r1677; 2026-02-21T08:52:40.6417529Z add.s32 %r1617, %r1678, %r34; 2026-02-21T08:52:40.6417602Z selp.b32 %r1618, 8, 0, %p227; 2026-02-21T08:52:40.6417666Z // begin inline asm 2026-02-21T08:52:40.6417814Z cp.async.ca.shared.global [ %r1617 + 0 ], [ %rd265 + 0 ], 0x8, %r1618; 2026-02-21T08:52:40.6417872Z // end inline asm 2026-02-21T08:52:40.6417931Z add.s32 %r1619, %r1617, 2048; 2026-02-21T08:52:40.6417994Z // begin inline asm 2026-02-21T08:52:40.6418130Z cp.async.ca.shared.global [ %r1619 + 0 ], [ %rd266 + 0 ], 0x8, %r1618; 2026-02-21T08:52:40.6418186Z // end inline asm 2026-02-21T08:52:40.6418256Z cp.async.commit_group; 2026-02-21T08:52:40.6418464Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6418525Z shl.b32 %r1679, %r1740, 3; 2026-02-21T08:52:40.6418586Z add.s32 %r1621, %r1345, %r1679; 2026-02-21T08:52:40.6418660Z and.pred %p220, %p191, %p227; 2026-02-21T08:52:40.6418720Z // begin inline asm 2026-02-21T08:52:40.6418850Z @%p220 mbarrier.arrive.expect_tx.shared.b64 _, [%r1621], 256; 2026-02-21T08:52:40.6418915Z // end inline asm 2026-02-21T08:52:40.6419127Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6419188Z shl.b32 %r1680, %r1740, 8; 2026-02-21T08:52:40.6419259Z add.s32 %r1622, %r1354, %r1680; 2026-02-21T08:52:40.6419315Z bar.sync 0; 2026-02-21T08:52:40.6419381Z elect.sync %r1681|%p230, -1; 2026-02-21T08:52:40.6419446Z and.pred %p231, %p227, %p230; 2026-02-21T08:52:40.6419513Z and.pred %p221, %p1, %p231; 2026-02-21T08:52:40.6419573Z cvt.u32.u64 %r1682, %rd284; 2026-02-21T08:52:40.6419633Z add.s32 %r1624, %r1682, 64; 
2026-02-21T08:52:40.6419693Z // begin inline asm 2026-02-21T08:52:40.6420020Z @%p221 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1622], [%rd180, {%r1689, %r1624}], [%r1621]; 2026-02-21T08:52:40.6420080Z // end inline asm 2026-02-21T08:52:40.6420279Z .loc 1 55 32 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:32 2026-02-21T08:52:40.6420351Z mad.wide.s32 %rd269, %r1737, 2, %rd42; 2026-02-21T08:52:40.6420702Z .loc 1 55 80 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:55:80 2026-02-21T08:52:40.6420767Z add.s32 %r1683, %r1653, %r1677; 2026-02-21T08:52:40.6420832Z add.s32 %r1626, %r1683, %r34; 2026-02-21T08:52:40.6420892Z // begin inline asm 2026-02-21T08:52:40.6421030Z cp.async.ca.shared.global [ %r1626 + 0 ], [ %rd283 + 0 ], 0x8, %r1618; 2026-02-21T08:52:40.6421089Z // end inline asm 2026-02-21T08:52:40.6421149Z add.s32 %r1628, %r1626, 2048; 2026-02-21T08:52:40.6421207Z // begin inline asm 2026-02-21T08:52:40.6421341Z cp.async.ca.shared.global [ %r1628 + 0 ], [ %rd269 + 0 ], 0x8, %r1618; 2026-02-21T08:52:40.6421402Z // end inline asm 2026-02-21T08:52:40.6421468Z cp.async.commit_group; 2026-02-21T08:52:40.6421788Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6421857Z add.s32 %r1630, %r1347, %r1679; 2026-02-21T08:52:40.6421919Z // begin inline asm 2026-02-21T08:52:40.6422046Z @%p220 mbarrier.arrive.expect_tx.shared.b64 _, [%r1630], 256; 2026-02-21T08:52:40.6422106Z // end inline asm 2026-02-21T08:52:40.6422301Z .loc 1 61 33 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:61:33 2026-02-21T08:52:40.6422361Z add.s32 %r1631, %r1363, %r1680; 2026-02-21T08:52:40.6422415Z bar.sync 0; 2026-02-21T08:52:40.6422488Z elect.sync %r1684|%p232, -1; 2026-02-21T08:52:40.6422566Z and.pred %p233, %p227, %p232; 2026-02-21T08:52:40.6422634Z and.pred %p223, %p1, %p233; 2026-02-21T08:52:40.6422699Z add.s32 %r1633, %r1682, 80; 2026-02-21T08:52:40.6422758Z // begin 
inline asm 2026-02-21T08:52:40.6423081Z @%p223 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1631], [%rd180, {%r1689, %r1633}], [%r1630]; 2026-02-21T08:52:40.6423141Z // end inline asm 2026-02-21T08:52:40.6423345Z .loc 1 48 112 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:48:112 2026-02-21T08:52:40.6423409Z add.s64 %rd283, %rd283, 128; 2026-02-21T08:52:40.6423470Z add.s32 %r1737, %r1737, 64; 2026-02-21T08:52:40.6423542Z setp.lt.u64 %p234, %rd284, 4064; 2026-02-21T08:52:40.6423604Z add.s64 %rd284, %rd284, 32; 2026-02-21T08:52:40.6423667Z @%p234 bra $L__BB0_12; 2026-02-21T08:52:40.6423780Z // %bb.13: // in Loop: Header=BB0_11 Depth=1 2026-02-21T08:52:40.6423848Z cp.async.wait_group 0; 2026-02-21T08:52:40.6423905Z bar.sync 0; 2026-02-21T08:52:40.6423967Z // begin inline asm 2026-02-21T08:52:40.6424064Z @%p191 mbarrier.inval.shared::cta.b64 [%r1347]; 2026-02-21T08:52:40.6424122Z // end inline asm 2026-02-21T08:52:40.6424182Z bar.sync 0; 2026-02-21T08:52:40.6424246Z // begin inline asm 2026-02-21T08:52:40.6424339Z @%p191 mbarrier.inval.shared::cta.b64 [%r1348]; 2026-02-21T08:52:40.6424395Z // end inline asm 2026-02-21T08:52:40.6424456Z // begin inline asm 2026-02-21T08:52:40.6424543Z @%p191 mbarrier.inval.shared::cta.b64 [%r1345]; 2026-02-21T08:52:40.6424601Z // end inline asm 2026-02-21T08:52:40.6424654Z bar.sync 0; 2026-02-21T08:52:40.6424717Z // begin inline asm 2026-02-21T08:52:40.6424803Z @%p191 mbarrier.inval.shared::cta.b64 [%r1346]; 2026-02-21T08:52:40.6424859Z // end inline asm 2026-02-21T08:52:40.6425060Z .loc 1 94 28 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:94:28 2026-02-21T08:52:40.6425140Z cvt.rn.bf16x2.f32 %r1692, %r1568, %r1567; 2026-02-21T08:52:40.6425228Z cvt.rn.bf16x2.f32 %r1693, %r1570, %r1569; 2026-02-21T08:52:40.6425437Z .loc 1 95 43 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:95:43 2026-02-21T08:52:40.6425589Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r52], {%r1692, 
%r1693}; 2026-02-21T08:52:40.6425650Z // begin inline asm 2026-02-21T08:52:40.6425727Z fence.proxy.async.shared::cta; 2026-02-21T08:52:40.6425788Z // end inline asm 2026-02-21T08:52:40.6425842Z bar.sync 0; 2026-02-21T08:52:40.6425911Z elect.sync %r1694|%p241, -1; 2026-02-21T08:52:40.6426085Z and.pred %p239, %p1, %p241; 2026-02-21T08:52:40.6426143Z // begin inline asm 2026-02-21T08:52:40.6426384Z @%p239 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd177, {%r1689, %r1690}], [%r160]; 2026-02-21T08:52:40.6426548Z // end inline asm 2026-02-21T08:52:40.6426635Z cp.async.bulk.commit_group; 2026-02-21T08:52:40.6426712Z cp.async.bulk.wait_group.read 0; 2026-02-21T08:52:40.6426767Z bar.sync 0; 2026-02-21T08:52:40.6426975Z .loc 1 28 97 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:28:97 2026-02-21T08:52:40.6427038Z add.s32 %r135, %r1736, 264; 2026-02-21T08:52:40.6427102Z setp.lt.s32 %p242, %r1736, 184; 2026-02-21T08:52:40.6427164Z mov.b32 %r1736, %r135; 2026-02-21T08:52:40.6427350Z @%p242 bra $L__BB0_11; 2026-02-21T08:52:40.6427443Z $L__BB0_14: // %._crit_edge 2026-02-21T08:52:40.6427646Z .loc 1 28 4 // c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py:28:4 2026-02-21T08:52:40.6427703Z ret; 2026-02-21T08:52:40.6427757Z $L__tmp17: 2026-02-21T08:52:40.6427811Z $L__func_end0: 2026-02-21T08:52:40.6427905Z // -- End function 2026-02-21T08:52:40.6427956Z } 2026-02-21T08:52:40.6428201Z .file 1 "/tmp/torchinductor_root/7d/c7d5d47ne4ntxmr2z7sahcvisbgnt3hmurobfzvokxw6nbze6uko.py" 2026-02-21T08:52:40.6428419Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T08:52:40.6428492Z .section .debug_abbrev 2026-02-21T08:52:40.6428626Z { 2026-02-21T08:52:40.6428726Z .b8 1 // Abbreviation Code 2026-02-21T08:52:40.6428823Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:52:40.6428910Z .b8 1 // DW_CHILDREN_yes 2026-02-21T08:52:40.6428995Z .b8 37 // DW_AT_producer 2026-02-21T08:52:40.6429079Z .b8 8 // DW_FORM_string 
2026-02-21T08:52:40.6429161Z .b8 19 // DW_AT_language 2026-02-21T08:52:40.6429242Z .b8 5 // DW_FORM_data2 2026-02-21T08:52:40.6429323Z .b8 3 // DW_AT_name 2026-02-21T08:52:40.6429402Z .b8 8 // DW_FORM_string 2026-02-21T08:52:40.6429482Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:52:40.6429559Z .b8 6 // DW_FORM_data4 2026-02-21T08:52:40.6429641Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:52:40.6429717Z .b8 8 // DW_FORM_string 2026-02-21T08:52:40.6429792Z .b8 0 // EOM(1) 2026-02-21T08:52:40.6429868Z .b8 0 // EOM(2) 2026-02-21T08:52:40.6429956Z .b8 2 // Abbreviation Code 2026-02-21T08:52:40.6430044Z .b8 46 // DW_TAG_subprogram 2026-02-21T08:52:40.6430127Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:52:40.6430201Z .b8 3 // DW_AT_name 2026-02-21T08:52:40.6430279Z .b8 8 // DW_FORM_string 2026-02-21T08:52:40.6430363Z .b8 32 // DW_AT_inline 2026-02-21T08:52:40.6430442Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:40.6430512Z .b8 0 // EOM(1) 2026-02-21T08:52:40.6430580Z .b8 0 // EOM(2) 2026-02-21T08:52:40.6430669Z .b8 3 // Abbreviation Code 2026-02-21T08:52:40.6430753Z .b8 46 // DW_TAG_subprogram 2026-02-21T08:52:40.6430837Z .b8 1 // DW_CHILDREN_yes 2026-02-21T08:52:40.6430917Z .b8 17 // DW_AT_low_pc 2026-02-21T08:52:40.6430995Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:40.6431247Z .b8 18 // DW_AT_high_pc 2026-02-21T08:52:40.6431335Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:40.6431431Z .b8 49 // DW_AT_abstract_origin 2026-02-21T08:52:40.6431509Z .b8 19 // DW_FORM_ref4 2026-02-21T08:52:40.6431579Z .b8 0 // EOM(1) 2026-02-21T08:52:40.6431652Z .b8 0 // EOM(2) 2026-02-21T08:52:40.6431739Z .b8 4 // Abbreviation Code 2026-02-21T08:52:40.6431850Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T08:52:40.6431940Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:52:40.6432127Z .b8 49 // DW_AT_abstract_origin 2026-02-21T08:52:40.6432207Z .b8 19 // DW_FORM_ref4 2026-02-21T08:52:40.6432288Z .b8 17 // DW_AT_low_pc 2026-02-21T08:52:40.6432367Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:40.6432450Z .b8 18 // DW_AT_high_pc 
2026-02-21T08:52:40.6432590Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:40.6432753Z .b8 88 // DW_AT_call_file 2026-02-21T08:52:40.6432870Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:40.6432991Z .b8 89 // DW_AT_call_line 2026-02-21T08:52:40.6433164Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:40.6443328Z .b8 87 // DW_AT_call_column 2026-02-21T08:52:40.6443479Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:40.6443578Z .b8 0 // EOM(1) 2026-02-21T08:52:40.6443659Z .b8 0 // EOM(2) 2026-02-21T08:52:40.6443731Z .b8 0 // EOM(3) 2026-02-21T08:52:40.6443791Z } 2026-02-21T08:52:40.6443859Z .section .debug_info 2026-02-21T08:52:40.6443917Z { 2026-02-21T08:52:40.6444024Z .b32 178 // Length of Unit 2026-02-21T08:52:40.6444129Z .b8 2 // DWARF version number 2026-02-21T08:52:40.6444185Z .b8 0 2026-02-21T08:52:40.6444325Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:52:40.6444435Z .b8 8 // Address Size (in bytes) 2026-02-21T08:52:40.6444562Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T08:52:40.6444666Z .b8 116 // DW_AT_producer 2026-02-21T08:52:40.6444721Z .b8 114 2026-02-21T08:52:40.6444776Z .b8 105 2026-02-21T08:52:40.6444832Z .b8 116 2026-02-21T08:52:40.6444884Z .b8 111 2026-02-21T08:52:40.6444933Z .b8 110 2026-02-21T08:52:40.6444984Z .b8 0 2026-02-21T08:52:40.6445079Z .b8 2 // DW_AT_language 2026-02-21T08:52:40.6445134Z .b8 0 2026-02-21T08:52:40.6445218Z .b8 99 // DW_AT_name 2026-02-21T08:52:40.6445273Z .b8 55 2026-02-21T08:52:40.6445324Z .b8 100 2026-02-21T08:52:40.6445374Z .b8 53 2026-02-21T08:52:40.6445425Z .b8 100 2026-02-21T08:52:40.6445478Z .b8 52 2026-02-21T08:52:40.6445529Z .b8 55 2026-02-21T08:52:40.6445580Z .b8 110 2026-02-21T08:52:40.6445634Z .b8 101 2026-02-21T08:52:40.6445684Z .b8 52 2026-02-21T08:52:40.6445734Z .b8 110 2026-02-21T08:52:40.6445786Z .b8 116 2026-02-21T08:52:40.6445842Z .b8 120 2026-02-21T08:52:40.6445892Z .b8 109 2026-02-21T08:52:40.6445943Z .b8 114 2026-02-21T08:52:40.6445998Z .b8 50 2026-02-21T08:52:40.6446048Z .b8 122 
2026-02-21T08:52:40.6446098Z .b8 55 2026-02-21T08:52:40.6446153Z .b8 115 2026-02-21T08:52:40.6446209Z .b8 97 2026-02-21T08:52:40.6446273Z .b8 104 2026-02-21T08:52:40.6446325Z .b8 99 2026-02-21T08:52:40.6446378Z .b8 118 2026-02-21T08:52:40.6446433Z .b8 105 2026-02-21T08:52:40.6446831Z .b8 115 2026-02-21T08:52:40.6446886Z .b8 98 2026-02-21T08:52:40.6446941Z .b8 103 2026-02-21T08:52:40.6446991Z .b8 110 2026-02-21T08:52:40.6447042Z .b8 116 2026-02-21T08:52:40.6447092Z .b8 51 2026-02-21T08:52:40.6447146Z .b8 104 2026-02-21T08:52:40.6447196Z .b8 109 2026-02-21T08:52:40.6447246Z .b8 117 2026-02-21T08:52:40.6447298Z .b8 114 2026-02-21T08:52:40.6447348Z .b8 111 2026-02-21T08:52:40.6447395Z .b8 98 2026-02-21T08:52:40.6447445Z .b8 102 2026-02-21T08:52:40.6447498Z .b8 122 2026-02-21T08:52:40.6447546Z .b8 118 2026-02-21T08:52:40.6447596Z .b8 111 2026-02-21T08:52:40.6447647Z .b8 107 2026-02-21T08:52:40.6447698Z .b8 120 2026-02-21T08:52:40.6447746Z .b8 119 2026-02-21T08:52:40.6447795Z .b8 54 2026-02-21T08:52:40.6447847Z .b8 110 2026-02-21T08:52:40.6447907Z .b8 98 2026-02-21T08:52:40.6447961Z .b8 122 2026-02-21T08:52:40.6448146Z .b8 101 2026-02-21T08:52:40.6448205Z .b8 54 2026-02-21T08:52:40.6448257Z .b8 117 2026-02-21T08:52:40.6448307Z .b8 107 2026-02-21T08:52:40.6448360Z .b8 111 2026-02-21T08:52:40.6448418Z .b8 46 2026-02-21T08:52:40.6448467Z .b8 112 2026-02-21T08:52:40.6448516Z .b8 121 2026-02-21T08:52:40.6448569Z .b8 0 2026-02-21T08:52:40.6448684Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:52:40.6448771Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:52:40.6448836Z .b8 116 2026-02-21T08:52:40.6448889Z .b8 109 2026-02-21T08:52:40.6448941Z .b8 112 2026-02-21T08:52:40.6448989Z .b8 47 2026-02-21T08:52:40.6449042Z .b8 116 2026-02-21T08:52:40.6449091Z .b8 111 2026-02-21T08:52:40.6449141Z .b8 114 2026-02-21T08:52:40.6449195Z .b8 99 2026-02-21T08:52:40.6449244Z .b8 104 2026-02-21T08:52:40.6449293Z .b8 105 2026-02-21T08:52:40.6449341Z .b8 110 2026-02-21T08:52:40.6449395Z .b8 100 
2026-02-21T08:52:40.6449446Z .b8 117 2026-02-21T08:52:40.6449496Z .b8 99 2026-02-21T08:52:40.6449550Z .b8 116 2026-02-21T08:52:40.6449603Z .b8 111 2026-02-21T08:52:40.6449652Z .b8 114 2026-02-21T08:52:40.6449700Z .b8 95 2026-02-21T08:52:40.6449752Z .b8 114 2026-02-21T08:52:40.6449803Z .b8 111 2026-02-21T08:52:40.6449853Z .b8 111 2026-02-21T08:52:40.6449902Z .b8 116 2026-02-21T08:52:40.6449955Z .b8 47 2026-02-21T08:52:40.6450004Z .b8 55 2026-02-21T08:52:40.6450053Z .b8 100 2026-02-21T08:52:40.6450106Z .b8 0 2026-02-21T08:52:40.6450241Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T08:52:40.6450327Z .b8 95 // DW_AT_name 2026-02-21T08:52:40.6450379Z .b8 104 2026-02-21T08:52:40.6450435Z .b8 101 2026-02-21T08:52:40.6450485Z .b8 108 2026-02-21T08:52:40.6450537Z .b8 105 2026-02-21T08:52:40.6450590Z .b8 111 2026-02-21T08:52:40.6450640Z .b8 110 2026-02-21T08:52:40.6450689Z .b8 95 2026-02-21T08:52:40.6450738Z .b8 109 2026-02-21T08:52:40.6450790Z .b8 97 2026-02-21T08:52:40.6450841Z .b8 116 2026-02-21T08:52:40.6450892Z .b8 109 2026-02-21T08:52:40.6450948Z .b8 117 2026-02-21T08:52:40.6450999Z .b8 108 2026-02-21T08:52:40.6451047Z .b8 95 2026-02-21T08:52:40.6451096Z .b8 98 2026-02-21T08:52:40.6451149Z .b8 102 2026-02-21T08:52:40.6451200Z .b8 49 2026-02-21T08:52:40.6451248Z .b8 54 2026-02-21T08:52:40.6451296Z .b8 95 2026-02-21T08:52:40.6451352Z .b8 105 2026-02-21T08:52:40.6451402Z .b8 110 2026-02-21T08:52:40.6451451Z .b8 116 2026-02-21T08:52:40.6451508Z .b8 52 2026-02-21T08:52:40.6451562Z .b8 0 2026-02-21T08:52:40.6451647Z .b8 1 // DW_AT_inline 2026-02-21T08:52:40.6451755Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T08:52:40.6451855Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T08:52:40.6451955Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T08:52:40.6452055Z .b32 108 // DW_AT_abstract_origin 2026-02-21T08:52:40.6452192Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T08:52:40.6452288Z .b32 108 // DW_AT_abstract_origin 
2026-02-21T08:52:40.6452376Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T08:52:40.6452573Z .b64 $L__tmp16 // DW_AT_high_pc 2026-02-21T08:52:40.6452657Z .b8 1 // DW_AT_call_file 2026-02-21T08:52:40.6452738Z .b8 91 // DW_AT_call_line 2026-02-21T08:52:40.6452836Z .b8 40 // DW_AT_call_column 2026-02-21T08:52:40.6452930Z .b8 0 // End Of Children Mark 2026-02-21T08:52:40.6453017Z .b8 0 // End Of Children Mark 2026-02-21T08:52:40.6453066Z } 2026-02-21T08:52:40.6453137Z .section .debug_macinfo { } 2026-02-21T08:52:40.6453143Z 2026-02-21T08:52:40.6453220Z ================================================================ 2026-02-21T08:52:40.6453433Z please share the reproducer above with Triton project. 2026-02-21T08:52:42.3725411Z [79s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 16], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[1], load_eviction_policies=['', ''], loop_orders=[[0, 1]], num_stages=1, num_warps=4, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[]) 2026-02-21T08:52:42.3727543Z Tensor-likes are not close! 2026-02-21T08:52:42.3727727Z 2026-02-21T08:52:42.3727842Z Mismatched elements: 455873 / 458752 (99.4%) 2026-02-21T08:52:42.3728302Z Greatest absolute difference: 1824.0 at index (33, 3732) (up to 0.01 allowed) 2026-02-21T08:52:42.3728900Z Greatest relative difference: 105472.0 at index (44, 6823) (up to 0.01 allowed) 2026-02-21T08:52:42.3729418Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T08:52:42.3729708Z 2026-02-21T08:52:43.3492461Z 2026-02-21T08:52:43.3492479Z 2026-02-21T08:52:43.3492514Z 2026-02-21T08:52:43.3492878Z ================================================================ 2026-02-21T08:52:43.3493269Z Internal Triton PTX codegen error 2026-02-21T08:52:43.3493552Z `ptxas` stderr: 2026-02-21T08:52:43.3494299Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 417 in function _helion_matmul_bf16_int4. Try to compile with register target of 38 or higher. 2026-02-21T08:52:43.3495103Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:52:43.3495336Z 2026-02-21T08:52:43.3495991Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpe8pqahg5.ptx -o /tmp/tmpe8pqahg5.ptx.o 2026-02-21T08:52:43.3496980Z 2026-02-21T08:52:43.3496985Z 2026-02-21T08:52:43.3497059Z // 2026-02-21T08:52:43.3497276Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:52:43.3497532Z // 2026-02-21T08:52:43.3497631Z 2026-02-21T08:52:43.3497702Z .version 8.7 2026-02-21T08:52:43.3497886Z .target sm_90a 2026-02-21T08:52:43.3498074Z .address_size 64 2026-02-21T08:52:43.3498188Z 2026-02-21T08:52:43.3498419Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T08:52:43.3498856Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:52:43.3499177Z // @_helion_matmul_bf16_int4 2026-02-21T08:52:43.3499501Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T08:52:43.3500100Z [80s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T08:52:43.3502029Z Config: @helion.kernel(config=helion.Config(block_sizes=[32, 64, 128], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[2], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=32, num_stages=7, num_warps=32, pid_type='persistent_interleaved', range_flattens=[False, True], range_multi_buffers=[False, False], range_num_stages=[3, 3], range_unroll_factors=[3, 0], range_warp_specializes=[]), static_shapes=True) 2026-02-21T08:52:43.3503863Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:52:43.3504223Z `ptxas` stderr: 2026-02-21T08:52:43.3505194Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 417 in function _helion_matmul_bf16_int4. Try to compile with register target of 38 or higher. 2026-02-21T08:52:43.3505836Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:52:43.3506033Z 2026-02-21T08:52:43.3506716Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpe8pqahg5.ptx -o /tmp/tmpe8pqahg5.ptx.o 2026-02-21T08:52:43.3507298Z 2026-02-21T08:52:43.3507461Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:52:43.3507834Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T08:52:43.3508443Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T08:52:43.3508909Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T08:52:43.3509272Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T08:52:43.3509617Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T08:52:43.3509897Z ) 2026-02-21T08:52:43.3510032Z .reqntid 1024 2026-02-21T08:52:43.3510172Z .maxnreg 32 2026-02-21T08:52:43.3510310Z { 2026-02-21T08:52:43.3510454Z .reg .pred %p<58>; 2026-02-21T08:52:43.3510627Z .reg .b16 %rs<297>; 2026-02-21T08:52:43.3510785Z .reg .b32 %r<1443>; 2026-02-21T08:52:43.3510940Z .reg .b64 %rd<152>; 2026-02-21T08:52:43.3511242Z .loc 1 14 0 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:14:0 2026-02-21T08:52:43.3511595Z $L__func_begin0: 2026-02-21T08:52:43.3511875Z .loc 1 14 0 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:14:0 2026-02-21T08:52:43.3512164Z 2026-02-21T08:52:43.3512219Z // %bb.0: 2026-02-21T08:52:43.3512419Z ld.param.b64 %rd17, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T08:52:43.3512718Z ld.param.b64 %rd16, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T08:52:43.3513014Z ld.param.b64 %rd15, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T08:52:43.3513248Z $L__tmp0: 2026-02-21T08:52:43.3513538Z .loc 1 19 46 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:19:46 2026-02-21T08:52:43.3513887Z mov.u32 %r1394, %ctaid.x; 2026-02-21T08:52:43.3514200Z .loc 1 0 0 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:0 2026-02-21T08:52:43.3514541Z sub.s32 %r182, 4279, %r1394; 2026-02-21T08:52:43.3514886Z .loc 1 19 144 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:19:144 2026-02-21T08:52:43.3515246Z mul.hi.u32 %r183, %r182, 1041204193; 2026-02-21T08:52:43.3515441Z shr.u32 %r184, %r183, 10; 
2026-02-21T08:52:43.3515625Z mul.hi.u32 %r185, %r184, 1431655766; 2026-02-21T08:52:43.3515835Z mad.lo.s32 %r1431, %r185, 12672, %r1394; 2026-02-21T08:52:43.3516188Z .loc 1 31 45 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:31:45 2026-02-21T08:52:43.3516752Z mov.u32 %r3, %tid.x; 2026-02-21T08:52:43.3516913Z shr.u32 %r4, %r3, 5; 2026-02-21T08:52:43.3517075Z and.b32 %r5, %r3, 1008; 2026-02-21T08:52:43.3517251Z shr.u32 %r6, %r3, 4; 2026-02-21T08:52:43.3517413Z and.b32 %r7, %r3, 15; 2026-02-21T08:52:43.3517570Z shl.b32 %r8, %r7, 2; 2026-02-21T08:52:43.3517863Z .loc 1 33 45 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:33:45 2026-02-21T08:52:43.3518202Z and.b32 %r9, %r3, 31; 2026-02-21T08:52:43.3518364Z shl.b32 %r10, %r9, 2; 2026-02-21T08:52:43.3518533Z shl.b32 %r11, %r7, 3; 2026-02-21T08:52:43.3518827Z .loc 1 65 38 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:65:38 2026-02-21T08:52:43.3519171Z and.b32 %r12, %r3, 128; 2026-02-21T08:52:43.3519480Z .loc 1 19 144 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:19:144 2026-02-21T08:52:43.3519845Z setp.ge.s32 %p1, %r1394, %r1431; 2026-02-21T08:52:43.3520035Z shl.b32 %r1377, %r3, 3; 2026-02-21T08:52:43.3520362Z shr.u32 %r1378, %r3, 1; 2026-02-21T08:52:43.3520531Z mov.b32 %r1379, global_smem; 2026-02-21T08:52:43.3520717Z mul.lo.s32 %r1380, %r4, 7168; 2026-02-21T08:52:43.3520900Z shl.b32 %r1381, %r3, 2; 2026-02-21T08:52:43.3521062Z shl.b32 %r1382, %r3, 6; 2026-02-21T08:52:43.3521231Z shl.b32 %r1383, %r3, 5; 2026-02-21T08:52:43.3521393Z shl.b32 %r1384, %r9, 1; 2026-02-21T08:52:43.3521575Z and.b32 %r1385, %r3, 127; 2026-02-21T08:52:43.3521745Z shl.b32 %r1386, %r3, 4; 2026-02-21T08:52:43.3521916Z and.b32 %r1387, %r4, 28; 2026-02-21T08:52:43.3522079Z and.b32 %r1388, %r3, 7; 2026-02-21T08:52:43.3522245Z shl.b32 %r1389, %r7, 4; 2026-02-21T08:52:43.3522403Z shr.u32 %r1390, %r3, 2; 2026-02-21T08:52:43.3522571Z and.b32 %r1391, %r3, 16; 2026-02-21T08:52:43.3522880Z shl.b32 %r1392, %r3, 
1; 2026-02-21T08:52:43.3523046Z shl.b32 %r1393, %r3, 7; 2026-02-21T08:52:43.3523224Z setp.eq.b32 %p57, %r12, 0; 2026-02-21T08:52:43.3523411Z @%p1 bra $L__BB0_9; 2026-02-21T08:52:43.3523601Z // %bb.1: // %.lr.ph 2026-02-21T08:52:43.3523958Z .loc 1 0 144 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:0:144 2026-02-21T08:52:43.3524318Z and.b32 %r188, %r1378, 56; 2026-02-21T08:52:43.3524510Z xor.b32 %r189, %r188, %r1377; 2026-02-21T08:52:43.3524703Z add.s32 %r191, %r1379, %r189; 2026-02-21T08:52:43.3524887Z add.s32 %r13, %r191, 32768; 2026-02-21T08:52:43.3525061Z add.s32 %r193, %r1379, 49152; 2026-02-21T08:52:43.3525244Z add.s32 %r15, %r193, %r1381; 2026-02-21T08:52:43.3525427Z add.s32 %r16, %r191, 40960; 2026-02-21T08:52:43.3525610Z add.s32 %r194, %r1379, %r1381; 2026-02-21T08:52:43.3525791Z add.s32 %r17, %r194, 53248; 2026-02-21T08:52:43.3525969Z and.b32 %r196, %r1382, 6144; 2026-02-21T08:52:43.3526142Z and.b32 %r198, %r1383, 896; 2026-02-21T08:52:43.3526319Z or.b32 %r200, %r196, %r198; 2026-02-21T08:52:43.3526615Z or.b32 %r18, %r200, %r1384; 2026-02-21T08:52:43.3526811Z xor.b32 %r19, %r18, 8; 2026-02-21T08:52:43.3526990Z xor.b32 %r20, %r18, 16; 2026-02-21T08:52:43.3527156Z xor.b32 %r21, %r18, 24; 2026-02-21T08:52:43.3527324Z xor.b32 %r22, %r18, 32; 2026-02-21T08:52:43.3527483Z xor.b32 %r23, %r18, 40; 2026-02-21T08:52:43.3527647Z xor.b32 %r24, %r18, 48; 2026-02-21T08:52:43.3527805Z xor.b32 %r25, %r18, 56; 2026-02-21T08:52:43.3527971Z and.b32 %r202, %r1378, 384; 2026-02-21T08:52:43.3528145Z add.s32 %r203, %r193, %r202; 2026-02-21T08:52:43.3528327Z add.s32 %r26, %r203, %r1385; 2026-02-21T08:52:43.3528504Z shl.b32 %r204, %r1385, 7; 2026-02-21T08:52:43.3528688Z and.b32 %r206, %r1386, 112; 2026-02-21T08:52:43.3528869Z or.b32 %r208, %r204, %r206; 2026-02-21T08:52:43.3529040Z xor.b32 %r209, %r208, %r1387; 2026-02-21T08:52:43.3529223Z add.s32 %r27, %r1379, %r209; 2026-02-21T08:52:43.3529399Z xor.b32 %r210, %r209, 32; 2026-02-21T08:52:43.3529572Z add.s32 %r28, 
%r1379, %r210; 2026-02-21T08:52:43.3529741Z xor.b32 %r211, %r209, 64; 2026-02-21T08:52:43.3529916Z add.s32 %r29, %r1379, %r211; 2026-02-21T08:52:43.3530088Z xor.b32 %r212, %r209, 96; 2026-02-21T08:52:43.3530258Z add.s32 %r30, %r1379, %r212; 2026-02-21T08:52:43.3530438Z shl.b32 %r214, %r1388, 11; 2026-02-21T08:52:43.3530609Z and.b32 %r216, %r3, 96; 2026-02-21T08:52:43.3530787Z shl.b32 %r217, %r216, 3; 2026-02-21T08:52:43.3530968Z and.b32 %r219, %r1390, 96; 2026-02-21T08:52:43.3531146Z and.b32 %r222, %r1392, 1024; 2026-02-21T08:52:43.3531319Z or.b32 %r223, %r1389, %r217; 2026-02-21T08:52:43.3531498Z or.b32 %r224, %r219, %r1391; 2026-02-21T08:52:43.3531668Z xor.b32 %r225, %r223, %r224; 2026-02-21T08:52:43.3531852Z add.s32 %r226, %r1379, %r214; 2026-02-21T08:52:43.3532026Z add.s32 %r227, %r226, %r222; 2026-02-21T08:52:43.3532214Z add.s32 %r31, %r227, %r225; 2026-02-21T08:52:43.3532403Z and.b32 %r229, %r1393, 15360; 2026-02-21T08:52:43.3532579Z shl.b32 %r230, %r1388, 4; 2026-02-21T08:52:43.3532756Z xor.b32 %r231, %r230, %r5; 2026-02-21T08:52:43.3532927Z add.s32 %r232, %r1379, %r229; 2026-02-21T08:52:43.3547680Z add.s32 %r32, %r232, %r231; 2026-02-21T08:52:43.3547877Z shl.b32 %r233, %r216, 6; 2026-02-21T08:52:43.3548065Z or.b32 %r234, %r233, %r198; 2026-02-21T08:52:43.3548239Z or.b32 %r33, %r234, %r1384; 2026-02-21T08:52:43.3548418Z xor.b32 %r34, %r33, 8; 2026-02-21T08:52:43.3548668Z xor.b32 %r35, %r33, 16; 2026-02-21T08:52:43.3548849Z xor.b32 %r36, %r33, 24; 2026-02-21T08:52:43.3549014Z xor.b32 %r37, %r33, 32; 2026-02-21T08:52:43.3549172Z xor.b32 %r38, %r33, 40; 2026-02-21T08:52:43.3549336Z xor.b32 %r39, %r33, 48; 2026-02-21T08:52:43.3549500Z xor.b32 %r40, %r33, 56; 2026-02-21T08:52:43.3549669Z or.b32 %r235, %r204, %r230; 2026-02-21T08:52:43.3549842Z xor.b32 %r236, %r235, %r1387; 2026-02-21T08:52:43.3550026Z add.s32 %r41, %r1379, %r236; 2026-02-21T08:52:43.3550355Z xor.b32 %r237, %r236, 32; 2026-02-21T08:52:43.3550540Z add.s32 %r42, %r1379, %r237; 
2026-02-21T08:52:43.3550712Z xor.b32 %r238, %r236, 64; 2026-02-21T08:52:43.3550885Z add.s32 %r43, %r1379, %r238; 2026-02-21T08:52:43.3551066Z xor.b32 %r239, %r236, 96; 2026-02-21T08:52:43.3551230Z add.s32 %r44, %r1379, %r239; 2026-02-21T08:52:43.3551572Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.3551939Z or.b32 %r240, %r1380, %r10; 2026-02-21T08:52:43.3552123Z add.s32 %r45, %r240, 458752; 2026-02-21T08:52:43.3552460Z .loc 1 19 144 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:19:144 2026-02-21T08:52:43.3552834Z shl.b32 %r241, %r6, 13; 2026-02-21T08:52:43.3553150Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.3553518Z or.b32 %r242, %r241, %r8; 2026-02-21T08:52:43.3553698Z or.b32 %r46, %r242, 128; 2026-02-21T08:52:43.3553872Z cvt.u64.u32 %rd2, %r8; 2026-02-21T08:52:43.3554050Z cvt.u64.u32 %rd3, %r1380; 2026-02-21T08:52:43.3554279Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T08:52:43.3554576Z // Child Loop BB0_3 Depth 2 2026-02-21T08:52:43.3554840Z // Child Loop BB0_5 Depth 2 2026-02-21T08:52:43.3555104Z // Child Loop BB0_7 Depth 2 2026-02-21T08:52:43.3555480Z .loc 1 25 35 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:25:35 2026-02-21T08:52:43.3555845Z mul.hi.s32 %r254, %r1394, -1840700269; 2026-02-21T08:52:43.3556064Z add.s32 %r255, %r254, %r1394; 2026-02-21T08:52:43.3556249Z shr.u32 %r256, %r255, 31; 2026-02-21T08:52:43.3556439Z shr.s32 %r257, %r255, 6; 2026-02-21T08:52:43.3556765Z add.s32 %r258, %r257, %r256; 2026-02-21T08:52:43.3557095Z .loc 1 26 33 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:26:33 2026-02-21T08:52:43.3557446Z shl.b32 %r259, %r258, 1; 2026-02-21T08:52:43.3557751Z .loc 1 27 39 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:27:39 2026-02-21T08:52:43.3558107Z sub.s32 %r260, 1, %r259; 2026-02-21T08:52:43.3558412Z .loc 1 27 52 // 
cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:27:52 2026-02-21T08:52:43.3558751Z min.s32 %r261, %r260, 2; 2026-02-21T08:52:43.3559054Z .loc 1 28 45 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:45 2026-02-21T08:52:43.3559408Z mul.lo.s32 %r262, %r258, 112; 2026-02-21T08:52:43.3559586Z sub.s32 %r263, %r1394, %r262; 2026-02-21T08:52:43.3559902Z .loc 1 29 51 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:29:51 2026-02-21T08:52:43.3560244Z div.s32 %r264, %r263, %r261; 2026-02-21T08:52:43.3560560Z .loc 1 28 64 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:64 2026-02-21T08:52:43.3560904Z mul.lo.s32 %r265, %r264, %r261; 2026-02-21T08:52:43.3561098Z sub.s32 %r266, %r263, %r265; 2026-02-21T08:52:43.3561413Z .loc 1 28 30 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:30 2026-02-21T08:52:43.3561926Z add.s32 %r267, %r266, %r259; 2026-02-21T08:52:43.3562241Z .loc 1 30 27 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:30:27 2026-02-21T08:52:43.3562585Z shl.b32 %r268, %r267, 6; 2026-02-21T08:52:43.3562898Z .loc 1 31 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:31:32 2026-02-21T08:52:43.3563237Z or.b32 %r70, %r268, %r6; 2026-02-21T08:52:43.3563560Z .loc 1 32 27 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:32:27 2026-02-21T08:52:43.3563908Z shl.b32 %r71, %r264, 7; 2026-02-21T08:52:43.3564333Z .loc 1 33 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:33:32 2026-02-21T08:52:43.3564689Z or.b32 %r269, %r71, %r10; 2026-02-21T08:52:43.3564994Z .loc 1 48 53 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:53 2026-02-21T08:52:43.3565357Z shl.b32 %r270, %r70, 13; 2026-02-21T08:52:43.3565661Z .loc 1 48 60 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:60 2026-02-21T08:52:43.3566010Z or.b32 %r271, %r270, %r8; 2026-02-21T08:52:43.3566318Z .loc 1 48 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:32 2026-02-21T08:52:43.3566811Z 
mad.wide.s32 %rd18, %r271, 2, %rd15; 2026-02-21T08:52:43.3567011Z mov.b32 %r244, 8; 2026-02-21T08:52:43.3567299Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.3567645Z // begin inline asm 2026-02-21T08:52:43.3567892Z cp.async.ca.shared.global [ %r13 + 0 ], [ %rd18 + 0 ], 0x8, %r244; 2026-02-21T08:52:43.3568172Z // end inline asm 2026-02-21T08:52:43.3568338Z cp.async.commit_group; 2026-02-21T08:52:43.3568640Z .loc 1 54 62 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:62 2026-02-21T08:52:43.3568991Z add.s32 %r272, %r269, %r1380; 2026-02-21T08:52:43.3569318Z .loc 1 54 34 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:34 2026-02-21T08:52:43.3569664Z cvt.s64.s32 %rd23, %r272; 2026-02-21T08:52:43.3569833Z add.s64 %rd19, %rd16, %rd23; 2026-02-21T08:52:43.3570007Z mov.b32 %r246, 4; 2026-02-21T08:52:43.3570293Z .loc 1 54 87 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:87 2026-02-21T08:52:43.3570631Z // begin inline asm 2026-02-21T08:52:43.3570864Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd19 + 0 ], 0x4, %r246; 2026-02-21T08:52:43.3571131Z // end inline asm 2026-02-21T08:52:43.3571303Z cp.async.commit_group; 2026-02-21T08:52:43.3571608Z .loc 1 48 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:32 2026-02-21T08:52:43.3571954Z cvt.s64.s32 %rd24, %r270; 2026-02-21T08:52:43.3572138Z or.b64 %rd25, %rd24, %rd2; 2026-02-21T08:52:43.3572340Z shl.b64 %rd26, %rd25, 1; 2026-02-21T08:52:43.3572521Z add.s64 %rd27, %rd15, %rd26; 2026-02-21T08:52:43.3572699Z add.s64 %rd20, %rd27, 128; 2026-02-21T08:52:43.3573014Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.3573356Z bar.sync 0; 2026-02-21T08:52:43.3573503Z // begin inline asm 2026-02-21T08:52:43.3573733Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd20 + 0 ], 0x8, %r244; 2026-02-21T08:52:43.3574010Z // end inline asm 2026-02-21T08:52:43.3574179Z cp.async.commit_group; 
2026-02-21T08:52:43.3574488Z .loc 1 54 34 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:34 2026-02-21T08:52:43.3574838Z cvt.s64.s32 %rd28, %r269; 2026-02-21T08:52:43.3575010Z add.s64 %rd29, %rd28, %rd3; 2026-02-21T08:52:43.3575200Z add.s64 %rd30, %rd16, %rd29; 2026-02-21T08:52:43.3575379Z add.s64 %rd21, %rd30, 229376; 2026-02-21T08:52:43.3575706Z .loc 1 54 87 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:87 2026-02-21T08:52:43.3576193Z // begin inline asm 2026-02-21T08:52:43.3576413Z cp.async.ca.shared.global [ %r17 + 0 ], [ %rd21 + 0 ], 0x4, %r246; 2026-02-21T08:52:43.3576823Z // end inline asm 2026-02-21T08:52:43.3576978Z cp.async.commit_group; 2026-02-21T08:52:43.3577290Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.3577652Z add.s32 %r1396, %r45, %r71; 2026-02-21T08:52:43.3577847Z shl.b32 %r273, %r267, 19; 2026-02-21T08:52:43.3578025Z or.b32 %r1395, %r46, %r273; 2026-02-21T08:52:43.3578201Z mov.b32 %r1399, 0f00000000; 2026-02-21T08:52:43.3578372Z mov.b32 %r1398, 1; 2026-02-21T08:52:43.3578534Z mov.b32 %r1397, -1; 2026-02-21T08:52:43.3578695Z mov.b64 %rd147, -32; 2026-02-21T08:52:43.3579012Z mov.b32 %r1400, %r1399; 2026-02-21T08:52:43.3579187Z mov.b32 %r1401, %r1399; 2026-02-21T08:52:43.3579350Z mov.b32 %r1402, %r1399; 2026-02-21T08:52:43.3579529Z mov.b32 %r1403, %r1399; 2026-02-21T08:52:43.3579703Z mov.b32 %r1404, %r1399; 2026-02-21T08:52:43.3579864Z mov.b32 %r1405, %r1399; 2026-02-21T08:52:43.3580030Z mov.b32 %r1406, %r1399; 2026-02-21T08:52:43.3580245Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:43.3580546Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:43.3580810Z add.s64 %rd147, %rd147, 32; 2026-02-21T08:52:43.3581005Z setp.lt.u64 %p11, %rd147, 4032; 2026-02-21T08:52:43.3581195Z add.s32 %r460, %r1397, 1; 2026-02-21T08:52:43.3581373Z setp.gt.s32 %p12, %r460, 1; 2026-02-21T08:52:43.3581562Z selp.b32 %r1397, 0, %r460, %p12; 2026-02-21T08:52:43.3581895Z 
.loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.3582265Z cp.async.wait_group 2; 2026-02-21T08:52:43.3582434Z bar.sync 0; 2026-02-21T08:52:43.3582592Z shl.b32 %r461, %r1397, 12; 2026-02-21T08:52:43.3582765Z shl.b32 %r462, %r1397, 13; 2026-02-21T08:52:43.3582946Z add.s32 %r463, %r1379, %r462; 2026-02-21T08:52:43.3583122Z add.s32 %r464, %r463, 32768; 2026-02-21T08:52:43.3583440Z .loc 1 52 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:52:32 2026-02-21T08:52:43.3583793Z add.s32 %r465, %r464, %r18; 2026-02-21T08:52:43.3583970Z ld.shared.b16 %rs1, [%r465]; 2026-02-21T08:52:43.3584170Z ld.shared.b16 %rs2, [%r465+1024]; 2026-02-21T08:52:43.3584382Z ld.shared.b16 %rs3, [%r465+64]; 2026-02-21T08:52:43.3584583Z ld.shared.b16 %rs4, [%r465+1088]; 2026-02-21T08:52:43.3584772Z add.s32 %r466, %r464, %r19; 2026-02-21T08:52:43.3584956Z ld.shared.b16 %rs5, [%r466]; 2026-02-21T08:52:43.3585136Z ld.shared.b16 %rs6, [%r466+1024]; 2026-02-21T08:52:43.3585333Z ld.shared.b16 %rs7, [%r466+64]; 2026-02-21T08:52:43.3585532Z ld.shared.b16 %rs8, [%r466+1088]; 2026-02-21T08:52:43.3585717Z add.s32 %r467, %r464, %r20; 2026-02-21T08:52:43.3585902Z ld.shared.b16 %rs9, [%r467]; 2026-02-21T08:52:43.3586086Z ld.shared.b16 %rs10, [%r467+1024]; 2026-02-21T08:52:43.3586295Z ld.shared.b16 %rs11, [%r467+64]; 2026-02-21T08:52:43.3586625Z ld.shared.b16 %rs12, [%r467+1088]; 2026-02-21T08:52:43.3586842Z add.s32 %r468, %r464, %r21; 2026-02-21T08:52:43.3587025Z ld.shared.b16 %rs13, [%r468]; 2026-02-21T08:52:43.3587218Z ld.shared.b16 %rs14, [%r468+1024]; 2026-02-21T08:52:43.3587413Z ld.shared.b16 %rs15, [%r468+64]; 2026-02-21T08:52:43.3587606Z ld.shared.b16 %rs16, [%r468+1088]; 2026-02-21T08:52:43.3587797Z add.s32 %r469, %r464, %r22; 2026-02-21T08:52:43.3587976Z ld.shared.b16 %rs17, [%r469]; 2026-02-21T08:52:43.3588166Z ld.shared.b16 %rs18, [%r469+1024]; 2026-02-21T08:52:43.3588372Z ld.shared.b16 %rs19, [%r469+64]; 2026-02-21T08:52:43.3588636Z 
ld.shared.b16 %rs20, [%r469+1088]; 2026-02-21T08:52:43.3588828Z add.s32 %r470, %r464, %r23; 2026-02-21T08:52:43.3589010Z ld.shared.b16 %rs21, [%r470]; 2026-02-21T08:52:43.3589188Z ld.shared.b16 %rs22, [%r470+1024]; 2026-02-21T08:52:43.3589383Z ld.shared.b16 %rs23, [%r470+64]; 2026-02-21T08:52:43.3589713Z ld.shared.b16 %rs24, [%r470+1088]; 2026-02-21T08:52:43.3589898Z add.s32 %r471, %r464, %r24; 2026-02-21T08:52:43.3590077Z ld.shared.b16 %rs25, [%r471]; 2026-02-21T08:52:43.3590256Z ld.shared.b16 %rs26, [%r471+1024]; 2026-02-21T08:52:43.3590468Z ld.shared.b16 %rs27, [%r471+64]; 2026-02-21T08:52:43.3590653Z ld.shared.b16 %rs28, [%r471+1088]; 2026-02-21T08:52:43.3590843Z add.s32 %r472, %r464, %r25; 2026-02-21T08:52:43.3591018Z ld.shared.b16 %rs29, [%r472]; 2026-02-21T08:52:43.3591202Z ld.shared.b16 %rs30, [%r472+1024]; 2026-02-21T08:52:43.3591391Z ld.shared.b16 %rs31, [%r472+64]; 2026-02-21T08:52:43.3591598Z ld.shared.b16 %rs32, [%r472+1088]; 2026-02-21T08:52:43.3591796Z cvt.f32.bf16 %r290, %rs1; 2026-02-21T08:52:43.3591972Z cvt.f32.bf16 %r291, %rs2; 2026-02-21T08:52:43.3592273Z cvt.f32.bf16 %r292, %rs5; 2026-02-21T08:52:43.3592447Z cvt.f32.bf16 %r293, %rs6; 2026-02-21T08:52:43.3592618Z cvt.f32.bf16 %r310, %rs9; 2026-02-21T08:52:43.3592788Z cvt.f32.bf16 %r311, %rs10; 2026-02-21T08:52:43.3592972Z cvt.f32.bf16 %r312, %rs13; 2026-02-21T08:52:43.3593143Z cvt.f32.bf16 %r313, %rs14; 2026-02-21T08:52:43.3593317Z cvt.f32.bf16 %r330, %rs17; 2026-02-21T08:52:43.3593504Z cvt.f32.bf16 %r331, %rs18; 2026-02-21T08:52:43.3593673Z cvt.f32.bf16 %r332, %rs21; 2026-02-21T08:52:43.3593851Z cvt.f32.bf16 %r333, %rs22; 2026-02-21T08:52:43.3594019Z cvt.f32.bf16 %r350, %rs25; 2026-02-21T08:52:43.3594195Z cvt.f32.bf16 %r351, %rs26; 2026-02-21T08:52:43.3594363Z cvt.f32.bf16 %r352, %rs29; 2026-02-21T08:52:43.3594538Z cvt.f32.bf16 %r353, %rs30; 2026-02-21T08:52:43.3594705Z cvt.f32.bf16 %r370, %rs3; 2026-02-21T08:52:43.3594884Z cvt.f32.bf16 %r371, %rs4; 2026-02-21T08:52:43.3595052Z cvt.f32.bf16 
%r372, %rs7; 2026-02-21T08:52:43.3595235Z cvt.f32.bf16 %r373, %rs8; 2026-02-21T08:52:43.3595411Z cvt.f32.bf16 %r390, %rs11; 2026-02-21T08:52:43.3595588Z cvt.f32.bf16 %r391, %rs12; 2026-02-21T08:52:43.3595767Z cvt.f32.bf16 %r392, %rs15; 2026-02-21T08:52:43.3595939Z cvt.f32.bf16 %r393, %rs16; 2026-02-21T08:52:43.3596113Z cvt.f32.bf16 %r410, %rs19; 2026-02-21T08:52:43.3596280Z cvt.f32.bf16 %r411, %rs20; 2026-02-21T08:52:43.3596577Z cvt.f32.bf16 %r412, %rs23; 2026-02-21T08:52:43.3596753Z cvt.f32.bf16 %r413, %rs24; 2026-02-21T08:52:43.3596925Z cvt.f32.bf16 %r430, %rs27; 2026-02-21T08:52:43.3597094Z cvt.f32.bf16 %r431, %rs28; 2026-02-21T08:52:43.3597267Z cvt.f32.bf16 %r432, %rs31; 2026-02-21T08:52:43.3597436Z cvt.f32.bf16 %r433, %rs32; 2026-02-21T08:52:43.3597752Z .loc 1 67 45 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:67:45 2026-02-21T08:52:43.3598113Z add.s32 %r473, %r26, %r461; 2026-02-21T08:52:43.3598292Z ld.shared.b8 %rs33, [%r473]; 2026-02-21T08:52:43.3598495Z ld.shared.b8 %rs34, [%r473+512]; 2026-02-21T08:52:43.3598692Z ld.shared.b8 %rs35, [%r473+1024]; 2026-02-21T08:52:43.3598895Z ld.shared.b8 %rs36, [%r473+1536]; 2026-02-21T08:52:43.3599088Z ld.shared.b8 %rs37, [%r473+2048]; 2026-02-21T08:52:43.3599289Z ld.shared.b8 %rs38, [%r473+2560]; 2026-02-21T08:52:43.3599484Z ld.shared.b8 %rs39, [%r473+3072]; 2026-02-21T08:52:43.3599670Z ld.shared.b8 %rs40, [%r473+3584]; 2026-02-21T08:52:43.3600009Z .loc 1 57 28 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:57:28 2026-02-21T08:52:43.3600364Z shl.b16 %rs41, %rs33, 4; 2026-02-21T08:52:43.3600538Z shl.b16 %rs42, %rs34, 4; 2026-02-21T08:52:43.3600702Z shl.b16 %rs43, %rs35, 4; 2026-02-21T08:52:43.3600870Z shl.b16 %rs44, %rs36, 4; 2026-02-21T08:52:43.3601036Z shl.b16 %rs45, %rs37, 4; 2026-02-21T08:52:43.3601208Z shl.b16 %rs46, %rs38, 4; 2026-02-21T08:52:43.3601377Z shl.b16 %rs47, %rs39, 4; 2026-02-21T08:52:43.3601546Z shl.b16 %rs48, %rs40, 4; 2026-02-21T08:52:43.3601859Z .loc 1 72 58 // 
cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:72:58 2026-02-21T08:52:43.3602215Z selp.b16 %rs49, %rs41, %rs33, %p57; 2026-02-21T08:52:43.3602417Z cvt.s16.s8 %rs50, %rs49; 2026-02-21T08:52:43.3602745Z shr.s16 %rs51, %rs50, 4; 2026-02-21T08:52:43.3602927Z selp.b16 %rs52, %rs42, %rs34, %p57; 2026-02-21T08:52:43.3603119Z cvt.s16.s8 %rs53, %rs52; 2026-02-21T08:52:43.3603290Z shr.s16 %rs54, %rs53, 4; 2026-02-21T08:52:43.3603467Z selp.b16 %rs55, %rs43, %rs35, %p57; 2026-02-21T08:52:43.3603659Z cvt.s16.s8 %rs56, %rs55; 2026-02-21T08:52:43.3603828Z shr.s16 %rs57, %rs56, 4; 2026-02-21T08:52:43.3603998Z selp.b16 %rs58, %rs44, %rs36, %p57; 2026-02-21T08:52:43.3604199Z cvt.s16.s8 %rs59, %rs58; 2026-02-21T08:52:43.3604364Z shr.s16 %rs60, %rs59, 4; 2026-02-21T08:52:43.3604544Z selp.b16 %rs61, %rs45, %rs37, %p57; 2026-02-21T08:52:43.3604739Z cvt.s16.s8 %rs62, %rs61; 2026-02-21T08:52:43.3604910Z shr.s16 %rs63, %rs62, 4; 2026-02-21T08:52:43.3605214Z selp.b16 %rs64, %rs46, %rs38, %p57; 2026-02-21T08:52:43.3605419Z cvt.s16.s8 %rs65, %rs64; 2026-02-21T08:52:43.3605594Z shr.s16 %rs66, %rs65, 4; 2026-02-21T08:52:43.3605770Z selp.b16 %rs67, %rs47, %rs39, %p57; 2026-02-21T08:52:43.3605973Z cvt.s16.s8 %rs68, %rs67; 2026-02-21T08:52:43.3606153Z shr.s16 %rs69, %rs68, 4; 2026-02-21T08:52:43.3606334Z selp.b16 %rs70, %rs48, %rs40, %p57; 2026-02-21T08:52:43.3606635Z cvt.s16.s8 %rs71, %rs70; 2026-02-21T08:52:43.3606803Z shr.s16 %rs72, %rs71, 4; 2026-02-21T08:52:43.3607115Z .loc 1 77 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:77:32 2026-02-21T08:52:43.3607471Z cvt.rn.f32.s16 %r474, %rs51; 2026-02-21T08:52:43.3607656Z cvt.rn.f32.s16 %r475, %rs54; 2026-02-21T08:52:43.3607830Z cvt.rn.f32.s16 %r476, %rs57; 2026-02-21T08:52:43.3608013Z cvt.rn.f32.s16 %r477, %rs60; 2026-02-21T08:52:43.3608187Z cvt.rn.f32.s16 %r478, %rs63; 2026-02-21T08:52:43.3608367Z cvt.rn.f32.s16 %r479, %rs66; 2026-02-21T08:52:43.3608542Z cvt.rn.f32.s16 %r480, %rs69; 2026-02-21T08:52:43.3608724Z 
cvt.rn.f32.s16 %r481, %rs72; 2026-02-21T08:52:43.3608898Z st.shared.b32 [%r27], %r474; 2026-02-21T08:52:43.3609083Z st.shared.b32 [%r27+16384], %r478; 2026-02-21T08:52:43.3609277Z st.shared.b32 [%r28], %r475; 2026-02-21T08:52:43.3609460Z st.shared.b32 [%r28+16384], %r479; 2026-02-21T08:52:43.3609655Z st.shared.b32 [%r29], %r476; 2026-02-21T08:52:43.3609832Z st.shared.b32 [%r29+16384], %r480; 2026-02-21T08:52:43.3610026Z st.shared.b32 [%r30], %r477; 2026-02-21T08:52:43.3610202Z st.shared.b32 [%r30+16384], %r481; 2026-02-21T08:52:43.3610388Z $L__tmp1: 2026-02-21T08:52:43.3610739Z .loc 2 291 36 // standard.py:291:36 @[ cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:84:40 ] 2026-02-21T08:52:43.3611159Z // begin inline asm 2026-02-21T08:52:43.3611366Z fence.proxy.async.shared::cta; 2026-02-21T08:52:43.3611553Z // end inline asm 2026-02-21T08:52:43.3611707Z bar.sync 0; 2026-02-21T08:52:43.3611872Z shfl.sync.idx.b32 %r482, %r4, 0, 31, -1; 2026-02-21T08:52:43.3612109Z wgmma.fence.sync.aligned; 2026-02-21T08:52:43.3612289Z shl.b32 %r483, %r482, 9; 2026-02-21T08:52:43.3612462Z and.b32 %r484, %r483, 14336; 2026-02-21T08:52:43.3612640Z add.s32 %r485, %r484, %r1379; 2026-02-21T08:52:43.3612830Z bfe.u32 %r486, %r485, 4, 14; 2026-02-21T08:52:43.3613007Z cvt.u64.u32 %rd41, %r486; 2026-02-21T08:52:43.3613200Z or.b64 %rd31, %rd41, 4611686293372403712; 2026-02-21T08:52:43.3613431Z mov.pred %p2, -1; 2026-02-21T08:52:43.3613589Z // begin inline asm 2026-02-21T08:52:43.3614069Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1399,%r1400,%r1401,%r1402,%r1403,%r1404,%r1405,%r1406}, {%r290,%r291,%r292,%r293}, %rd31, %p2, 1, 1; 2026-02-21T08:52:43.3614584Z // end inline asm 2026-02-21T08:52:43.3614750Z add.s32 %r487, %r485, 32; 2026-02-21T08:52:43.3614924Z bfe.u32 %r488, %r487, 4, 14; 2026-02-21T08:52:43.3615106Z cvt.u64.u32 %rd42, %r488; 2026-02-21T08:52:43.3615292Z or.b64 %rd32, %rd42, 4611686293372403712; 2026-02-21T08:52:43.3615506Z // begin inline asm 
2026-02-21T08:52:43.3615983Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1399,%r1400,%r1401,%r1402,%r1403,%r1404,%r1405,%r1406}, {%r310,%r311,%r312,%r313}, %rd32, %p2, 1, 1; 2026-02-21T08:52:43.3617353Z // end inline asm 2026-02-21T08:52:43.3617515Z add.s32 %r489, %r485, 64; 2026-02-21T08:52:43.3617687Z bfe.u32 %r490, %r489, 4, 14; 2026-02-21T08:52:43.3617865Z cvt.u64.u32 %rd43, %r490; 2026-02-21T08:52:43.3618055Z or.b64 %rd33, %rd43, 4611686293372403712; 2026-02-21T08:52:43.3618263Z // begin inline asm 2026-02-21T08:52:43.3618717Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1399,%r1400,%r1401,%r1402,%r1403,%r1404,%r1405,%r1406}, {%r330,%r331,%r332,%r333}, %rd33, %p2, 1, 1; 2026-02-21T08:52:43.3619221Z // end inline asm 2026-02-21T08:52:43.3619373Z add.s32 %r491, %r485, 96; 2026-02-21T08:52:43.3619539Z bfe.u32 %r492, %r491, 4, 14; 2026-02-21T08:52:43.3619716Z cvt.u64.u32 %rd44, %r492; 2026-02-21T08:52:43.3620030Z or.b64 %rd34, %rd44, 4611686293372403712; 2026-02-21T08:52:43.3620243Z // begin inline asm 2026-02-21T08:52:43.3620696Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1399,%r1400,%r1401,%r1402,%r1403,%r1404,%r1405,%r1406}, {%r350,%r351,%r352,%r353}, %rd34, %p2, 1, 1; 2026-02-21T08:52:43.3621203Z // end inline asm 2026-02-21T08:52:43.3621357Z add.s32 %r493, %r485, 16384; 2026-02-21T08:52:43.3621530Z bfe.u32 %r494, %r493, 4, 14; 2026-02-21T08:52:43.3621706Z cvt.u64.u32 %rd45, %r494; 2026-02-21T08:52:43.3621881Z or.b64 %rd35, %rd45, 4611686293372403712; 2026-02-21T08:52:43.3622101Z // begin inline asm 2026-02-21T08:52:43.3622552Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1399,%r1400,%r1401,%r1402,%r1403,%r1404,%r1405,%r1406}, {%r370,%r371,%r372,%r373}, %rd35, %p2, 1, 1; 2026-02-21T08:52:43.3623055Z // end inline asm 2026-02-21T08:52:43.3623204Z add.s32 %r495, %r485, 16416; 2026-02-21T08:52:43.3623382Z bfe.u32 %r496, %r495, 4, 14; 2026-02-21T08:52:43.3623558Z cvt.u64.u32 %rd46, %r496; 2026-02-21T08:52:43.3623735Z 
or.b64 %rd36, %rd46, 4611686293372403712; 2026-02-21T08:52:43.3623946Z // begin inline asm 2026-02-21T08:52:43.3624393Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1399,%r1400,%r1401,%r1402,%r1403,%r1404,%r1405,%r1406}, {%r390,%r391,%r392,%r393}, %rd36, %p2, 1, 1; 2026-02-21T08:52:43.3624898Z // end inline asm 2026-02-21T08:52:43.3625059Z add.s32 %r497, %r485, 16448; 2026-02-21T08:52:43.3625243Z bfe.u32 %r498, %r497, 4, 14; 2026-02-21T08:52:43.3625423Z cvt.u64.u32 %rd47, %r498; 2026-02-21T08:52:43.3625601Z or.b64 %rd37, %rd47, 4611686293372403712; 2026-02-21T08:52:43.3625805Z // begin inline asm 2026-02-21T08:52:43.3626254Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1399,%r1400,%r1401,%r1402,%r1403,%r1404,%r1405,%r1406}, {%r410,%r411,%r412,%r413}, %rd37, %p2, 1, 1; 2026-02-21T08:52:43.3626908Z // end inline asm 2026-02-21T08:52:43.3627056Z add.s32 %r499, %r485, 16480; 2026-02-21T08:52:43.3627250Z bfe.u32 %r500, %r499, 4, 14; 2026-02-21T08:52:43.3627427Z cvt.u64.u32 %rd48, %r500; 2026-02-21T08:52:43.3627611Z or.b64 %rd38, %rd48, 4611686293372403712; 2026-02-21T08:52:43.3627817Z // begin inline asm 2026-02-21T08:52:43.3628269Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1399,%r1400,%r1401,%r1402,%r1403,%r1404,%r1405,%r1406}, {%r430,%r431,%r432,%r433}, %rd38, %p2, 1, 1; 2026-02-21T08:52:43.3628834Z // end inline asm 2026-02-21T08:52:43.3629005Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:43.3629207Z mov.b32 %r443, 0; 2026-02-21T08:52:43.3629358Z mov.b32 %r442, %r1379; 2026-02-21T08:52:43.3629530Z mov.b32 %r444, %r443; 2026-02-21T08:52:43.3629688Z // begin inline asm 2026-02-21T08:52:43.3629965Z // wait for regs: %r1399,%r1400,%r1401,%r1402,%r1403,%r1404,%r1405,%r1406,%r442,%r443,%r444 2026-02-21T08:52:43.3630314Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:43.3630509Z // end inline asm 2026-02-21T08:52:43.3630659Z $L__tmp2: 2026-02-21T08:52:43.3630958Z .loc 1 40 103 // 
cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.3631326Z add.s32 %r501, %r1398, 1; 2026-02-21T08:52:43.3631508Z setp.gt.s32 %p13, %r501, 1; 2026-02-21T08:52:43.3631879Z selp.b32 %r1398, 0, %r501, %p13; 2026-02-21T08:52:43.3632219Z .loc 1 48 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:32 2026-02-21T08:52:43.3632585Z mad.wide.s32 %rd39, %r1395, 2, %rd15; 2026-02-21T08:52:43.3632924Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.3633269Z shl.b32 %r502, %r1398, 12; 2026-02-21T08:52:43.3633448Z shl.b32 %r503, %r1398, 13; 2026-02-21T08:52:43.3633619Z add.s32 %r456, %r13, %r503; 2026-02-21T08:52:43.3633800Z selp.b32 %r457, 8, 0, %p11; 2026-02-21T08:52:43.3633973Z // begin inline asm 2026-02-21T08:52:43.3634214Z cp.async.ca.shared.global [ %r456 + 0 ], [ %rd39 + 0 ], 0x8, %r457; 2026-02-21T08:52:43.3634493Z // end inline asm 2026-02-21T08:52:43.3634768Z cp.async.commit_group; 2026-02-21T08:52:43.3635080Z .loc 1 54 34 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:34 2026-02-21T08:52:43.3635425Z cvt.s64.s32 %rd49, %r1396; 2026-02-21T08:52:43.3635610Z add.s64 %rd40, %rd16, %rd49; 2026-02-21T08:52:43.3635918Z .loc 1 54 87 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:87 2026-02-21T08:52:43.3636264Z add.s32 %r458, %r15, %r502; 2026-02-21T08:52:43.3636437Z selp.b32 %r459, 4, 0, %p11; 2026-02-21T08:52:43.3636748Z // begin inline asm 2026-02-21T08:52:43.3636982Z cp.async.ca.shared.global [ %r458 + 0 ], [ %rd40 + 0 ], 0x4, %r459; 2026-02-21T08:52:43.3637250Z // end inline asm 2026-02-21T08:52:43.3637407Z cp.async.commit_group; 2026-02-21T08:52:43.3637719Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.3638082Z add.s32 %r1396, %r1396, 229376; 2026-02-21T08:52:43.3638268Z add.s32 %r1395, %r1395, 64; 2026-02-21T08:52:43.3638451Z setp.lt.u64 %p14, %rd147, 4064; 2026-02-21T08:52:43.3638637Z @%p14 
bra $L__BB0_3; 2026-02-21T08:52:43.3638855Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:43.3639257Z .loc 1 33 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:33:32 2026-02-21T08:52:43.3639604Z or.b32 %r519, %r71, %r11; 2026-02-21T08:52:43.3639925Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.3640289Z cp.async.wait_group 0; 2026-02-21T08:52:43.3640467Z bar.sync 0; 2026-02-21T08:52:43.3640746Z .loc 1 87 28 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:87:28 2026-02-21T08:52:43.3641113Z cvt.rn.bf16x2.f32 %r520, %r1400, %r1399; 2026-02-21T08:52:43.3641337Z cvt.rn.bf16x2.f32 %r521, %r1402, %r1401; 2026-02-21T08:52:43.3641550Z cvt.rn.bf16x2.f32 %r522, %r1404, %r1403; 2026-02-21T08:52:43.3641770Z cvt.rn.bf16x2.f32 %r523, %r1406, %r1405; 2026-02-21T08:52:43.3642111Z .loc 1 88 50 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:88:50 2026-02-21T08:52:43.3642473Z mad.lo.s32 %r524, %r70, 7168, %r519; 2026-02-21T08:52:43.3642801Z .loc 1 88 22 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:88:22 2026-02-21T08:52:43.3643159Z mad.wide.s32 %rd50, %r524, 2, %rd17; 2026-02-21T08:52:43.3643499Z .loc 1 88 81 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:88:81 2026-02-21T08:52:43.3643969Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r31], {%r520, %r521, %r522, %r523}; 2026-02-21T08:52:43.3644303Z bar.sync 0; 2026-02-21T08:52:43.3644494Z ld.shared.v4.b32 {%r504, %r505, %r506, %r507}, [%r32]; 2026-02-21T08:52:43.3644742Z // begin inline asm 2026-02-21T08:52:43.3644959Z st.global.v4.b32 [ %rd50 + 0 ], { %r504, %r505, %r506, %r507 }; 2026-02-21T08:52:43.3645217Z // end inline asm 2026-02-21T08:52:43.3645536Z .loc 1 19 144 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:19:144 2026-02-21T08:52:43.3645899Z add.s32 %r525, %r1394, 4224; 2026-02-21T08:52:43.3646371Z .loc 1 25 35 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:25:35 
2026-02-21T08:52:43.3646856Z mul.hi.s32 %r526, %r525, -1840700269; 2026-02-21T08:52:43.3647066Z add.s32 %r527, %r526, %r525; 2026-02-21T08:52:43.3647243Z shr.u32 %r528, %r527, 31; 2026-02-21T08:52:43.3647418Z shr.s32 %r529, %r527, 6; 2026-02-21T08:52:43.3647584Z add.s32 %r530, %r529, %r528; 2026-02-21T08:52:43.3647900Z .loc 1 26 33 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:26:33 2026-02-21T08:52:43.3648263Z shl.b32 %r531, %r530, 1; 2026-02-21T08:52:43.3648569Z .loc 1 27 39 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:27:39 2026-02-21T08:52:43.3648912Z sub.s32 %r532, 1, %r531; 2026-02-21T08:52:43.3649340Z .loc 1 27 52 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:27:52 2026-02-21T08:52:43.3649691Z min.s32 %r533, %r532, 2; 2026-02-21T08:52:43.3649990Z .loc 1 28 45 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:45 2026-02-21T08:52:43.3650342Z mul.lo.s32 %r534, %r530, 112; 2026-02-21T08:52:43.3650544Z sub.s32 %r535, %r525, %r534; 2026-02-21T08:52:43.3650854Z .loc 1 29 51 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:29:51 2026-02-21T08:52:43.3651201Z div.s32 %r536, %r535, %r533; 2026-02-21T08:52:43.3651507Z .loc 1 28 64 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:64 2026-02-21T08:52:43.3651859Z mul.lo.s32 %r537, %r536, %r533; 2026-02-21T08:52:43.3652046Z sub.s32 %r538, %r535, %r537; 2026-02-21T08:52:43.3652355Z .loc 1 28 30 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:30 2026-02-21T08:52:43.3652703Z add.s32 %r539, %r538, %r531; 2026-02-21T08:52:43.3653021Z .loc 1 30 27 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:30:27 2026-02-21T08:52:43.3653370Z shl.b32 %r540, %r539, 6; 2026-02-21T08:52:43.3653671Z .loc 1 31 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:31:32 2026-02-21T08:52:43.3654016Z or.b32 %r98, %r540, %r6; 2026-02-21T08:52:43.3654324Z .loc 1 32 27 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:32:27 
2026-02-21T08:52:43.3654681Z shl.b32 %r99, %r536, 7; 2026-02-21T08:52:43.3654987Z .loc 1 33 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:33:32 2026-02-21T08:52:43.3655327Z or.b32 %r541, %r99, %r10; 2026-02-21T08:52:43.3655646Z .loc 1 48 53 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:53 2026-02-21T08:52:43.3655987Z shl.b32 %r542, %r98, 13; 2026-02-21T08:52:43.3656300Z .loc 1 48 60 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:60 2026-02-21T08:52:43.3656764Z or.b32 %r543, %r542, %r8; 2026-02-21T08:52:43.3657073Z .loc 1 48 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:32 2026-02-21T08:52:43.3657437Z mad.wide.s32 %rd51, %r543, 2, %rd15; 2026-02-21T08:52:43.3657638Z mov.b32 %r509, 8; 2026-02-21T08:52:43.3657930Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.3658272Z // begin inline asm 2026-02-21T08:52:43.3658514Z cp.async.ca.shared.global [ %r13 + 0 ], [ %rd51 + 0 ], 0x8, %r509; 2026-02-21T08:52:43.3658799Z // end inline asm 2026-02-21T08:52:43.3658965Z cp.async.commit_group; 2026-02-21T08:52:43.3659276Z .loc 1 54 62 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:62 2026-02-21T08:52:43.3659620Z add.s32 %r544, %r541, %r1380; 2026-02-21T08:52:43.3659964Z .loc 1 54 34 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:34 2026-02-21T08:52:43.3660328Z cvt.s64.s32 %rd56, %r544; 2026-02-21T08:52:43.3660510Z add.s64 %rd52, %rd16, %rd56; 2026-02-21T08:52:43.3660683Z mov.b32 %r511, 4; 2026-02-21T08:52:43.3661146Z .loc 1 54 87 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:87 2026-02-21T08:52:43.3661501Z // begin inline asm 2026-02-21T08:52:43.3661730Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd52 + 0 ], 0x4, %r511; 2026-02-21T08:52:43.3662004Z // end inline asm 2026-02-21T08:52:43.3662162Z cp.async.commit_group; 2026-02-21T08:52:43.3662471Z .loc 1 48 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:32 
2026-02-21T08:52:43.3662821Z cvt.s64.s32 %rd57, %r542; 2026-02-21T08:52:43.3662999Z or.b64 %rd58, %rd57, %rd2; 2026-02-21T08:52:43.3663175Z shl.b64 %rd59, %rd58, 1; 2026-02-21T08:52:43.3663347Z add.s64 %rd60, %rd15, %rd59; 2026-02-21T08:52:43.3663525Z add.s64 %rd53, %rd60, 128; 2026-02-21T08:52:43.3663973Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.3664345Z bar.sync 0; 2026-02-21T08:52:43.3664492Z // begin inline asm 2026-02-21T08:52:43.3664727Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd53 + 0 ], 0x8, %r509; 2026-02-21T08:52:43.3664993Z // end inline asm 2026-02-21T08:52:43.3665152Z cp.async.commit_group; 2026-02-21T08:52:43.3665453Z .loc 1 54 34 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:34 2026-02-21T08:52:43.3665803Z cvt.s64.s32 %rd61, %r541; 2026-02-21T08:52:43.3665977Z add.s64 %rd62, %rd61, %rd3; 2026-02-21T08:52:43.3666155Z add.s64 %rd63, %rd16, %rd62; 2026-02-21T08:52:43.3666335Z add.s64 %rd54, %rd63, 229376; 2026-02-21T08:52:43.3666772Z .loc 1 54 87 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:87 2026-02-21T08:52:43.3667128Z // begin inline asm 2026-02-21T08:52:43.3667355Z cp.async.ca.shared.global [ %r17 + 0 ], [ %rd54 + 0 ], 0x4, %r511; 2026-02-21T08:52:43.3667626Z // end inline asm 2026-02-21T08:52:43.3667778Z cp.async.commit_group; 2026-02-21T08:52:43.3668098Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.3668460Z add.s32 %r1408, %r45, %r99; 2026-02-21T08:52:43.3668723Z shl.b32 %r545, %r539, 19; 2026-02-21T08:52:43.3668900Z or.b32 %r1407, %r46, %r545; 2026-02-21T08:52:43.3669070Z mov.b32 %r1411, 0f00000000; 2026-02-21T08:52:43.3669246Z mov.b32 %r1410, 1; 2026-02-21T08:52:43.3669400Z mov.b32 %r1409, -1; 2026-02-21T08:52:43.3669565Z mov.b64 %rd148, -32; 2026-02-21T08:52:43.3669725Z mov.b32 %r1412, %r1411; 2026-02-21T08:52:43.3669897Z mov.b32 %r1413, %r1411; 2026-02-21T08:52:43.3670061Z mov.b32 %r1414, %r1411; 
2026-02-21T08:52:43.3670239Z mov.b32 %r1415, %r1411; 2026-02-21T08:52:43.3670412Z mov.b32 %r1416, %r1411; 2026-02-21T08:52:43.3670570Z mov.b32 %r1417, %r1411; 2026-02-21T08:52:43.3670740Z mov.b32 %r1418, %r1411; 2026-02-21T08:52:43.3670955Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:43.3671261Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:43.3671515Z add.s64 %rd148, %rd148, 32; 2026-02-21T08:52:43.3671707Z setp.lt.u64 %p24, %rd148, 4032; 2026-02-21T08:52:43.3671898Z add.s32 %r732, %r1409, 1; 2026-02-21T08:52:43.3672076Z setp.gt.s32 %p25, %r732, 1; 2026-02-21T08:52:43.3672265Z selp.b32 %r1409, 0, %r732, %p25; 2026-02-21T08:52:43.3672595Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.3672967Z cp.async.wait_group 2; 2026-02-21T08:52:43.3673138Z bar.sync 0; 2026-02-21T08:52:43.3673285Z shl.b32 %r733, %r1409, 12; 2026-02-21T08:52:43.3673459Z shl.b32 %r734, %r1409, 13; 2026-02-21T08:52:43.3673634Z add.s32 %r735, %r1379, %r734; 2026-02-21T08:52:43.3673808Z add.s32 %r736, %r735, 32768; 2026-02-21T08:52:43.3674126Z .loc 1 52 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:52:32 2026-02-21T08:52:43.3674481Z add.s32 %r737, %r736, %r33; 2026-02-21T08:52:43.3674672Z ld.shared.b16 %rs73, [%r737]; 2026-02-21T08:52:43.3675048Z ld.shared.b16 %rs74, [%r737+1024]; 2026-02-21T08:52:43.3675253Z ld.shared.b16 %rs75, [%r737+64]; 2026-02-21T08:52:43.3675451Z ld.shared.b16 %rs76, [%r737+1088]; 2026-02-21T08:52:43.3675639Z add.s32 %r738, %r736, %r34; 2026-02-21T08:52:43.3675819Z ld.shared.b16 %rs77, [%r738]; 2026-02-21T08:52:43.3675998Z ld.shared.b16 %rs78, [%r738+1024]; 2026-02-21T08:52:43.3676195Z ld.shared.b16 %rs79, [%r738+64]; 2026-02-21T08:52:43.3676390Z ld.shared.b16 %rs80, [%r738+1088]; 2026-02-21T08:52:43.3676727Z add.s32 %r739, %r736, %r35; 2026-02-21T08:52:43.3676914Z ld.shared.b16 %rs81, [%r739]; 2026-02-21T08:52:43.3677094Z ld.shared.b16 %rs82, [%r739+1024]; 2026-02-21T08:52:43.3677294Z 
ld.shared.b16 %rs83, [%r739+64]; 2026-02-21T08:52:43.3677622Z ld.shared.b16 %rs84, [%r739+1088]; 2026-02-21T08:52:43.3677832Z add.s32 %r740, %r736, %r36; 2026-02-21T08:52:43.3678010Z ld.shared.b16 %rs85, [%r740]; 2026-02-21T08:52:43.3678197Z ld.shared.b16 %rs86, [%r740+1024]; 2026-02-21T08:52:43.3678397Z ld.shared.b16 %rs87, [%r740+64]; 2026-02-21T08:52:43.3678585Z ld.shared.b16 %rs88, [%r740+1088]; 2026-02-21T08:52:43.3678779Z add.s32 %r741, %r736, %r37; 2026-02-21T08:52:43.3678959Z ld.shared.b16 %rs89, [%r741]; 2026-02-21T08:52:43.3679154Z ld.shared.b16 %rs90, [%r741+1024]; 2026-02-21T08:52:43.3679350Z ld.shared.b16 %rs91, [%r741+64]; 2026-02-21T08:52:43.3679548Z ld.shared.b16 %rs92, [%r741+1088]; 2026-02-21T08:52:43.3679734Z add.s32 %r742, %r736, %r38; 2026-02-21T08:52:43.3679914Z ld.shared.b16 %rs93, [%r742]; 2026-02-21T08:52:43.3694994Z ld.shared.b16 %rs94, [%r742+1024]; 2026-02-21T08:52:43.3695239Z ld.shared.b16 %rs95, [%r742+64]; 2026-02-21T08:52:43.3695469Z ld.shared.b16 %rs96, [%r742+1088]; 2026-02-21T08:52:43.3695703Z add.s32 %r743, %r736, %r39; 2026-02-21T08:52:43.3695897Z ld.shared.b16 %rs97, [%r743]; 2026-02-21T08:52:43.3696109Z ld.shared.b16 %rs98, [%r743+1024]; 2026-02-21T08:52:43.3696331Z ld.shared.b16 %rs99, [%r743+64]; 2026-02-21T08:52:43.3696685Z ld.shared.b16 %rs100, [%r743+1088]; 2026-02-21T08:52:43.3696907Z add.s32 %r744, %r736, %r40; 2026-02-21T08:52:43.3697101Z ld.shared.b16 %rs101, [%r744]; 2026-02-21T08:52:43.3697296Z ld.shared.b16 %rs102, [%r744+1024]; 2026-02-21T08:52:43.3697505Z ld.shared.b16 %rs103, [%r744+64]; 2026-02-21T08:52:43.3697701Z ld.shared.b16 %rs104, [%r744+1088]; 2026-02-21T08:52:43.3697902Z cvt.f32.bf16 %r562, %rs73; 2026-02-21T08:52:43.3698088Z cvt.f32.bf16 %r563, %rs74; 2026-02-21T08:52:43.3698265Z cvt.f32.bf16 %r564, %rs77; 2026-02-21T08:52:43.3698440Z cvt.f32.bf16 %r565, %rs78; 2026-02-21T08:52:43.3698609Z cvt.f32.bf16 %r582, %rs81; 2026-02-21T08:52:43.3698783Z cvt.f32.bf16 %r583, %rs82; 2026-02-21T08:52:43.3698969Z 
cvt.f32.bf16 %r584, %rs85; 2026-02-21T08:52:43.3699160Z cvt.f32.bf16 %r585, %rs86; 2026-02-21T08:52:43.3699332Z cvt.f32.bf16 %r602, %rs89; 2026-02-21T08:52:43.3699522Z cvt.f32.bf16 %r603, %rs90; 2026-02-21T08:52:43.3699704Z cvt.f32.bf16 %r604, %rs93; 2026-02-21T08:52:43.3699876Z cvt.f32.bf16 %r605, %rs94; 2026-02-21T08:52:43.3700052Z cvt.f32.bf16 %r622, %rs97; 2026-02-21T08:52:43.3700229Z cvt.f32.bf16 %r623, %rs98; 2026-02-21T08:52:43.3700411Z cvt.f32.bf16 %r624, %rs101; 2026-02-21T08:52:43.3700589Z cvt.f32.bf16 %r625, %rs102; 2026-02-21T08:52:43.3700771Z cvt.f32.bf16 %r642, %rs75; 2026-02-21T08:52:43.3700952Z cvt.f32.bf16 %r643, %rs76; 2026-02-21T08:52:43.3701134Z cvt.f32.bf16 %r644, %rs79; 2026-02-21T08:52:43.3701312Z cvt.f32.bf16 %r645, %rs80; 2026-02-21T08:52:43.3701495Z cvt.f32.bf16 %r662, %rs83; 2026-02-21T08:52:43.3701674Z cvt.f32.bf16 %r663, %rs84; 2026-02-21T08:52:43.3701844Z cvt.f32.bf16 %r664, %rs87; 2026-02-21T08:52:43.3702020Z cvt.f32.bf16 %r665, %rs88; 2026-02-21T08:52:43.3702197Z cvt.f32.bf16 %r682, %rs91; 2026-02-21T08:52:43.3702377Z cvt.f32.bf16 %r683, %rs92; 2026-02-21T08:52:43.3702548Z cvt.f32.bf16 %r684, %rs95; 2026-02-21T08:52:43.3702736Z cvt.f32.bf16 %r685, %rs96; 2026-02-21T08:52:43.3702913Z cvt.f32.bf16 %r702, %rs99; 2026-02-21T08:52:43.3703330Z cvt.f32.bf16 %r703, %rs100; 2026-02-21T08:52:43.3703506Z cvt.f32.bf16 %r704, %rs103; 2026-02-21T08:52:43.3703694Z cvt.f32.bf16 %r705, %rs104; 2026-02-21T08:52:43.3704047Z .loc 1 67 45 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:67:45 2026-02-21T08:52:43.3704416Z add.s32 %r745, %r26, %r733; 2026-02-21T08:52:43.3704610Z ld.shared.b8 %rs105, [%r745]; 2026-02-21T08:52:43.3704804Z ld.shared.b8 %rs106, [%r745+512]; 2026-02-21T08:52:43.3705018Z ld.shared.b8 %rs107, [%r745+1024]; 2026-02-21T08:52:43.3705229Z ld.shared.b8 %rs108, [%r745+1536]; 2026-02-21T08:52:43.3705439Z ld.shared.b8 %rs109, [%r745+2048]; 2026-02-21T08:52:43.3705633Z ld.shared.b8 %rs110, [%r745+2560]; 
2026-02-21T08:52:43.3705973Z ld.shared.b8 %rs111, [%r745+3072]; 2026-02-21T08:52:43.3706184Z ld.shared.b8 %rs112, [%r745+3584]; 2026-02-21T08:52:43.3706635Z .loc 1 57 28 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:57:28 2026-02-21T08:52:43.3707013Z shl.b16 %rs113, %rs105, 4; 2026-02-21T08:52:43.3707194Z shl.b16 %rs114, %rs106, 4; 2026-02-21T08:52:43.3707372Z shl.b16 %rs115, %rs107, 4; 2026-02-21T08:52:43.3707560Z shl.b16 %rs116, %rs108, 4; 2026-02-21T08:52:43.3707737Z shl.b16 %rs117, %rs109, 4; 2026-02-21T08:52:43.3707908Z shl.b16 %rs118, %rs110, 4; 2026-02-21T08:52:43.3708085Z shl.b16 %rs119, %rs111, 4; 2026-02-21T08:52:43.3708264Z shl.b16 %rs120, %rs112, 4; 2026-02-21T08:52:43.3708646Z .loc 1 72 58 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:72:58 2026-02-21T08:52:43.3709016Z selp.b16 %rs121, %rs113, %rs105, %p57; 2026-02-21T08:52:43.3709223Z cvt.s16.s8 %rs122, %rs121; 2026-02-21T08:52:43.3709404Z shr.s16 %rs123, %rs122, 4; 2026-02-21T08:52:43.3709591Z selp.b16 %rs124, %rs114, %rs106, %p57; 2026-02-21T08:52:43.3709815Z cvt.s16.s8 %rs125, %rs124; 2026-02-21T08:52:43.3709989Z shr.s16 %rs126, %rs125, 4; 2026-02-21T08:52:43.3710176Z selp.b16 %rs127, %rs115, %rs107, %p57; 2026-02-21T08:52:43.3710387Z cvt.s16.s8 %rs128, %rs127; 2026-02-21T08:52:43.3710561Z shr.s16 %rs129, %rs128, 4; 2026-02-21T08:52:43.3710744Z selp.b16 %rs130, %rs116, %rs108, %p57; 2026-02-21T08:52:43.3710943Z cvt.s16.s8 %rs131, %rs130; 2026-02-21T08:52:43.3711119Z shr.s16 %rs132, %rs131, 4; 2026-02-21T08:52:43.3711295Z selp.b16 %rs133, %rs117, %rs109, %p57; 2026-02-21T08:52:43.3711495Z cvt.s16.s8 %rs134, %rs133; 2026-02-21T08:52:43.3711659Z shr.s16 %rs135, %rs134, 4; 2026-02-21T08:52:43.3711852Z selp.b16 %rs136, %rs118, %rs110, %p57; 2026-02-21T08:52:43.3712053Z cvt.s16.s8 %rs137, %rs136; 2026-02-21T08:52:43.3712235Z shr.s16 %rs138, %rs137, 4; 2026-02-21T08:52:43.3712418Z selp.b16 %rs139, %rs119, %rs111, %p57; 2026-02-21T08:52:43.3712619Z cvt.s16.s8 %rs140, %rs139; 
2026-02-21T08:52:43.3712797Z shr.s16 %rs141, %rs140, 4; 2026-02-21T08:52:43.3712973Z selp.b16 %rs142, %rs120, %rs112, %p57; 2026-02-21T08:52:43.3713177Z cvt.s16.s8 %rs143, %rs142; 2026-02-21T08:52:43.3713351Z shr.s16 %rs144, %rs143, 4; 2026-02-21T08:52:43.3713666Z .loc 1 77 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:77:32 2026-02-21T08:52:43.3714017Z cvt.rn.f32.s16 %r746, %rs123; 2026-02-21T08:52:43.3714211Z cvt.rn.f32.s16 %r747, %rs126; 2026-02-21T08:52:43.3714393Z cvt.rn.f32.s16 %r748, %rs129; 2026-02-21T08:52:43.3714580Z cvt.rn.f32.s16 %r749, %rs132; 2026-02-21T08:52:43.3714766Z cvt.rn.f32.s16 %r750, %rs135; 2026-02-21T08:52:43.3714941Z cvt.rn.f32.s16 %r751, %rs138; 2026-02-21T08:52:43.3715124Z cvt.rn.f32.s16 %r752, %rs141; 2026-02-21T08:52:43.3715298Z cvt.rn.f32.s16 %r753, %rs144; 2026-02-21T08:52:43.3715489Z st.shared.b32 [%r41], %r746; 2026-02-21T08:52:43.3715672Z st.shared.b32 [%r41+16384], %r750; 2026-02-21T08:52:43.3715876Z st.shared.b32 [%r42], %r747; 2026-02-21T08:52:43.3716212Z st.shared.b32 [%r42+16384], %r751; 2026-02-21T08:52:43.3716409Z st.shared.b32 [%r43], %r748; 2026-02-21T08:52:43.3716732Z st.shared.b32 [%r43+16384], %r752; 2026-02-21T08:52:43.3717090Z st.shared.b32 [%r44], %r749; 2026-02-21T08:52:43.3717279Z st.shared.b32 [%r44+16384], %r753; 2026-02-21T08:52:43.3717457Z $L__tmp3: 2026-02-21T08:52:43.3717815Z .loc 2 291 36 // standard.py:291:36 @[ cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:84:40 ] 2026-02-21T08:52:43.3718232Z // begin inline asm 2026-02-21T08:52:43.3718420Z fence.proxy.async.shared::cta; 2026-02-21T08:52:43.3718612Z // end inline asm 2026-02-21T08:52:43.3718758Z bar.sync 0; 2026-02-21T08:52:43.3718925Z shfl.sync.idx.b32 %r754, %r4, 0, 31, -1; 2026-02-21T08:52:43.3719146Z wgmma.fence.sync.aligned; 2026-02-21T08:52:43.3719334Z shl.b32 %r755, %r754, 9; 2026-02-21T08:52:43.3719499Z and.b32 %r756, %r755, 14336; 2026-02-21T08:52:43.3719806Z add.s32 %r757, %r756, %r1379; 2026-02-21T08:52:43.3719986Z 
bfe.u32 %r758, %r757, 4, 14; 2026-02-21T08:52:43.3720167Z cvt.u64.u32 %rd74, %r758; 2026-02-21T08:52:43.3720351Z or.b64 %rd64, %rd74, 4611686293372403712; 2026-02-21T08:52:43.3720570Z mov.pred %p15, -1; 2026-02-21T08:52:43.3720752Z // begin inline asm 2026-02-21T08:52:43.3721223Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418}, {%r562,%r563,%r564,%r565}, %rd64, %p15, 1, 1; 2026-02-21T08:52:43.3721752Z // end inline asm 2026-02-21T08:52:43.3721911Z add.s32 %r759, %r757, 32; 2026-02-21T08:52:43.3722091Z bfe.u32 %r760, %r759, 4, 14; 2026-02-21T08:52:43.3722270Z cvt.u64.u32 %rd75, %r760; 2026-02-21T08:52:43.3722458Z or.b64 %rd65, %rd75, 4611686293372403712; 2026-02-21T08:52:43.3722665Z // begin inline asm 2026-02-21T08:52:43.3723139Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418}, {%r582,%r583,%r584,%r585}, %rd65, %p15, 1, 1; 2026-02-21T08:52:43.3723658Z // end inline asm 2026-02-21T08:52:43.3723811Z add.s32 %r761, %r757, 64; 2026-02-21T08:52:43.3723977Z bfe.u32 %r762, %r761, 4, 14; 2026-02-21T08:52:43.3724154Z cvt.u64.u32 %rd76, %r762; 2026-02-21T08:52:43.3724353Z or.b64 %rd66, %rd76, 4611686293372403712; 2026-02-21T08:52:43.3724555Z // begin inline asm 2026-02-21T08:52:43.3725015Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418}, {%r602,%r603,%r604,%r605}, %rd66, %p15, 1, 1; 2026-02-21T08:52:43.3725522Z // end inline asm 2026-02-21T08:52:43.3725672Z add.s32 %r763, %r757, 96; 2026-02-21T08:52:43.3725856Z bfe.u32 %r764, %r763, 4, 14; 2026-02-21T08:52:43.3726030Z cvt.u64.u32 %rd77, %r764; 2026-02-21T08:52:43.3726212Z or.b64 %rd67, %rd77, 4611686293372403712; 2026-02-21T08:52:43.3726410Z // begin inline asm 2026-02-21T08:52:43.3726993Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418}, {%r622,%r623,%r624,%r625}, %rd67, %p15, 1, 
1; 2026-02-21T08:52:43.3727504Z // end inline asm 2026-02-21T08:52:43.3727657Z add.s32 %r765, %r757, 16384; 2026-02-21T08:52:43.3727853Z bfe.u32 %r766, %r765, 4, 14; 2026-02-21T08:52:43.3728026Z cvt.u64.u32 %rd78, %r766; 2026-02-21T08:52:43.3728205Z or.b64 %rd68, %rd78, 4611686293372403712; 2026-02-21T08:52:43.3728402Z // begin inline asm 2026-02-21T08:52:43.3728857Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418}, {%r642,%r643,%r644,%r645}, %rd68, %p15, 1, 1; 2026-02-21T08:52:43.3729356Z // end inline asm 2026-02-21T08:52:43.3729511Z add.s32 %r767, %r757, 16416; 2026-02-21T08:52:43.3729677Z bfe.u32 %r768, %r767, 4, 14; 2026-02-21T08:52:43.3729854Z cvt.u64.u32 %rd79, %r768; 2026-02-21T08:52:43.3730029Z or.b64 %rd69, %rd79, 4611686293372403712; 2026-02-21T08:52:43.3730230Z // begin inline asm 2026-02-21T08:52:43.3730680Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418}, {%r662,%r663,%r664,%r665}, %rd69, %p15, 1, 1; 2026-02-21T08:52:43.3731182Z // end inline asm 2026-02-21T08:52:43.3731329Z add.s32 %r769, %r757, 16448; 2026-02-21T08:52:43.3731644Z bfe.u32 %r770, %r769, 4, 14; 2026-02-21T08:52:43.3731819Z cvt.u64.u32 %rd80, %r770; 2026-02-21T08:52:43.3731991Z or.b64 %rd70, %rd80, 4611686293372403712; 2026-02-21T08:52:43.3732190Z // begin inline asm 2026-02-21T08:52:43.3732655Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418}, {%r682,%r683,%r684,%r685}, %rd70, %p15, 1, 1; 2026-02-21T08:52:43.3733159Z // end inline asm 2026-02-21T08:52:43.3733307Z add.s32 %r771, %r757, 16480; 2026-02-21T08:52:43.3733475Z bfe.u32 %r772, %r771, 4, 14; 2026-02-21T08:52:43.3733650Z cvt.u64.u32 %rd81, %r772; 2026-02-21T08:52:43.3733824Z or.b64 %rd71, %rd81, 4611686293372403712; 2026-02-21T08:52:43.3734022Z // begin inline asm 2026-02-21T08:52:43.3734619Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 
{%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418}, {%r702,%r703,%r704,%r705}, %rd71, %p15, 1, 1; 2026-02-21T08:52:43.3735127Z // end inline asm 2026-02-21T08:52:43.3735298Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:43.3735496Z mov.b32 %r716, 0; 2026-02-21T08:52:43.3735647Z mov.b32 %r715, %r716; 2026-02-21T08:52:43.3735809Z mov.b32 %r714, %r1379; 2026-02-21T08:52:43.3735973Z // begin inline asm 2026-02-21T08:52:43.3736239Z // wait for regs: %r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r714,%r715,%r716 2026-02-21T08:52:43.3736715Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:43.3736911Z // end inline asm 2026-02-21T08:52:43.3737064Z $L__tmp4: 2026-02-21T08:52:43.3737359Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.3737726Z add.s32 %r773, %r1410, 1; 2026-02-21T08:52:43.3737920Z setp.gt.s32 %p26, %r773, 1; 2026-02-21T08:52:43.3738112Z selp.b32 %r1410, 0, %r773, %p26; 2026-02-21T08:52:43.3738447Z .loc 1 48 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:32 2026-02-21T08:52:43.3738802Z mad.wide.s32 %rd72, %r1407, 2, %rd15; 2026-02-21T08:52:43.3739147Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.3739491Z shl.b32 %r774, %r1410, 12; 2026-02-21T08:52:43.3739669Z shl.b32 %r775, %r1410, 13; 2026-02-21T08:52:43.3739858Z add.s32 %r728, %r13, %r775; 2026-02-21T08:52:43.3740034Z selp.b32 %r729, 8, 0, %p24; 2026-02-21T08:52:43.3740216Z // begin inline asm 2026-02-21T08:52:43.3740449Z cp.async.ca.shared.global [ %r728 + 0 ], [ %rd72 + 0 ], 0x8, %r729; 2026-02-21T08:52:43.3740730Z // end inline asm 2026-02-21T08:52:43.3740886Z cp.async.commit_group; 2026-02-21T08:52:43.3741210Z .loc 1 54 34 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:34 2026-02-21T08:52:43.3741557Z cvt.s64.s32 %rd82, %r1408; 2026-02-21T08:52:43.3741740Z add.s64 %rd73, %rd16, %rd82; 2026-02-21T08:52:43.3742055Z .loc 1 54 87 // 
cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:87 2026-02-21T08:52:43.3742399Z add.s32 %r730, %r15, %r774; 2026-02-21T08:52:43.3742582Z selp.b32 %r731, 4, 0, %p24; 2026-02-21T08:52:43.3742765Z // begin inline asm 2026-02-21T08:52:43.3743001Z cp.async.ca.shared.global [ %r730 + 0 ], [ %rd73 + 0 ], 0x4, %r731; 2026-02-21T08:52:43.3743270Z // end inline asm 2026-02-21T08:52:43.3743432Z cp.async.commit_group; 2026-02-21T08:52:43.3743743Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.3744114Z add.s32 %r1408, %r1408, 229376; 2026-02-21T08:52:43.3744302Z add.s32 %r1407, %r1407, 64; 2026-02-21T08:52:43.3744482Z setp.lt.u64 %p27, %rd148, 4064; 2026-02-21T08:52:43.3744673Z @%p27 bra $L__BB0_5; 2026-02-21T08:52:43.3744885Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:43.3745298Z .loc 1 33 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:33:32 2026-02-21T08:52:43.3745650Z or.b32 %r791, %r99, %r11; 2026-02-21T08:52:43.3746170Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.3746748Z cp.async.wait_group 0; 2026-02-21T08:52:43.3747029Z bar.sync 0; 2026-02-21T08:52:43.3747322Z .loc 1 87 28 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:87:28 2026-02-21T08:52:43.3747681Z cvt.rn.bf16x2.f32 %r792, %r1412, %r1411; 2026-02-21T08:52:43.3747901Z cvt.rn.bf16x2.f32 %r793, %r1414, %r1413; 2026-02-21T08:52:43.3748111Z cvt.rn.bf16x2.f32 %r794, %r1416, %r1415; 2026-02-21T08:52:43.3748322Z cvt.rn.bf16x2.f32 %r795, %r1418, %r1417; 2026-02-21T08:52:43.3748746Z .loc 1 88 50 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:88:50 2026-02-21T08:52:43.3749268Z mad.lo.s32 %r796, %r98, 7168, %r791; 2026-02-21T08:52:43.3749611Z .loc 1 88 22 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:88:22 2026-02-21T08:52:43.3749964Z mad.wide.s32 %rd83, %r796, 2, %rd17; 2026-02-21T08:52:43.3750297Z .loc 1 88 81 // 
cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:88:81 2026-02-21T08:52:43.3750753Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r31], {%r792, %r793, %r794, %r795}; 2026-02-21T08:52:43.3751075Z bar.sync 0; 2026-02-21T08:52:43.3751262Z ld.shared.v4.b32 {%r776, %r777, %r778, %r779}, [%r32]; 2026-02-21T08:52:43.3751528Z // begin inline asm 2026-02-21T08:52:43.3751747Z st.global.v4.b32 [ %rd83 + 0 ], { %r776, %r777, %r778, %r779 }; 2026-02-21T08:52:43.3752000Z // end inline asm 2026-02-21T08:52:43.3752305Z .loc 1 19 144 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:19:144 2026-02-21T08:52:43.3752666Z add.s32 %r797, %r1394, 8448; 2026-02-21T08:52:43.3752993Z .loc 1 25 35 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:25:35 2026-02-21T08:52:43.3753352Z mul.hi.s32 %r798, %r797, -1840700269; 2026-02-21T08:52:43.3753564Z add.s32 %r799, %r798, %r797; 2026-02-21T08:52:43.3753743Z shr.u32 %r800, %r799, 31; 2026-02-21T08:52:43.3753911Z shr.s32 %r801, %r799, 6; 2026-02-21T08:52:43.3754084Z add.s32 %r802, %r801, %r800; 2026-02-21T08:52:43.3754392Z .loc 1 26 33 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:26:33 2026-02-21T08:52:43.3754749Z shl.b32 %r803, %r802, 1; 2026-02-21T08:52:43.3755055Z .loc 1 27 39 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:27:39 2026-02-21T08:52:43.3755422Z sub.s32 %r804, 1, %r803; 2026-02-21T08:52:43.3755723Z .loc 1 27 52 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:27:52 2026-02-21T08:52:43.3756068Z min.s32 %r805, %r804, 2; 2026-02-21T08:52:43.3756375Z .loc 1 28 45 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:45 2026-02-21T08:52:43.3756861Z mul.lo.s32 %r806, %r802, 112; 2026-02-21T08:52:43.3757052Z sub.s32 %r807, %r797, %r806; 2026-02-21T08:52:43.3757373Z .loc 1 29 51 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:29:51 2026-02-21T08:52:43.3757720Z div.s32 %r808, %r807, %r805; 2026-02-21T08:52:43.3758032Z .loc 1 28 64 // 
cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:64 2026-02-21T08:52:43.3758377Z mul.lo.s32 %r809, %r808, %r805; 2026-02-21T08:52:43.3758569Z sub.s32 %r810, %r807, %r809; 2026-02-21T08:52:43.3758873Z .loc 1 28 30 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:30 2026-02-21T08:52:43.3759220Z add.s32 %r811, %r810, %r803; 2026-02-21T08:52:43.3759539Z .loc 1 30 27 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:30:27 2026-02-21T08:52:43.3759883Z shl.b32 %r812, %r811, 6; 2026-02-21T08:52:43.3760189Z .loc 1 31 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:31:32 2026-02-21T08:52:43.3760529Z or.b32 %r126, %r812, %r6; 2026-02-21T08:52:43.3761007Z .loc 1 32 27 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:32:27 2026-02-21T08:52:43.3761343Z shl.b32 %r127, %r808, 7; 2026-02-21T08:52:43.3761644Z .loc 1 33 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:33:32 2026-02-21T08:52:43.3761983Z or.b32 %r813, %r127, %r10; 2026-02-21T08:52:43.3762287Z .loc 1 48 53 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:53 2026-02-21T08:52:43.3762632Z shl.b32 %r814, %r126, 13; 2026-02-21T08:52:43.3762933Z .loc 1 48 60 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:60 2026-02-21T08:52:43.3763275Z or.b32 %r815, %r814, %r8; 2026-02-21T08:52:43.3763700Z .loc 1 48 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:32 2026-02-21T08:52:43.3764058Z mad.wide.s32 %rd84, %r815, 2, %rd15; 2026-02-21T08:52:43.3764250Z mov.b32 %r781, 8; 2026-02-21T08:52:43.3764544Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.3764885Z // begin inline asm 2026-02-21T08:52:43.3765117Z cp.async.ca.shared.global [ %r13 + 0 ], [ %rd84 + 0 ], 0x8, %r781; 2026-02-21T08:52:43.3765393Z // end inline asm 2026-02-21T08:52:43.3765546Z cp.async.commit_group; 2026-02-21T08:52:43.3765856Z .loc 1 54 62 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:62 
2026-02-21T08:52:43.3766200Z add.s32 %r816, %r813, %r1380; 2026-02-21T08:52:43.3766642Z .loc 1 54 34 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:34 2026-02-21T08:52:43.3766987Z cvt.s64.s32 %rd89, %r816; 2026-02-21T08:52:43.3767163Z add.s64 %rd85, %rd16, %rd89; 2026-02-21T08:52:43.3767340Z mov.b32 %r783, 4; 2026-02-21T08:52:43.3767622Z .loc 1 54 87 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:87 2026-02-21T08:52:43.3767967Z // begin inline asm 2026-02-21T08:52:43.3768192Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd85 + 0 ], 0x4, %r783; 2026-02-21T08:52:43.3768459Z // end inline asm 2026-02-21T08:52:43.3768625Z cp.async.commit_group; 2026-02-21T08:52:43.3768934Z .loc 1 48 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:32 2026-02-21T08:52:43.3769278Z cvt.s64.s32 %rd90, %r814; 2026-02-21T08:52:43.3769448Z or.b64 %rd91, %rd90, %rd2; 2026-02-21T08:52:43.3769628Z shl.b64 %rd92, %rd91, 1; 2026-02-21T08:52:43.3769796Z add.s64 %rd93, %rd15, %rd92; 2026-02-21T08:52:43.3769975Z add.s64 %rd86, %rd93, 128; 2026-02-21T08:52:43.3770278Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.3770631Z bar.sync 0; 2026-02-21T08:52:43.3770779Z // begin inline asm 2026-02-21T08:52:43.3771005Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd86 + 0 ], 0x8, %r781; 2026-02-21T08:52:43.3771274Z // end inline asm 2026-02-21T08:52:43.3771429Z cp.async.commit_group; 2026-02-21T08:52:43.3771739Z .loc 1 54 34 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:34 2026-02-21T08:52:43.3772083Z cvt.s64.s32 %rd94, %r813; 2026-02-21T08:52:43.3772261Z add.s64 %rd95, %rd94, %rd3; 2026-02-21T08:52:43.3772437Z add.s64 %rd96, %rd16, %rd95; 2026-02-21T08:52:43.3772619Z add.s64 %rd87, %rd96, 229376; 2026-02-21T08:52:43.3772930Z .loc 1 54 87 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:87 2026-02-21T08:52:43.3773276Z // begin inline asm 2026-02-21T08:52:43.3773506Z 
cp.async.ca.shared.global [ %r17 + 0 ], [ %rd87 + 0 ], 0x4, %r783; 2026-02-21T08:52:43.3773769Z // end inline asm 2026-02-21T08:52:43.3773924Z cp.async.commit_group; 2026-02-21T08:52:43.3774236Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.3774599Z add.s32 %r1420, %r45, %r127; 2026-02-21T08:52:43.3774777Z shl.b32 %r817, %r811, 19; 2026-02-21T08:52:43.3775091Z or.b32 %r1419, %r46, %r817; 2026-02-21T08:52:43.3775268Z mov.b32 %r1423, 0f00000000; 2026-02-21T08:52:43.3775435Z mov.b32 %r1422, 1; 2026-02-21T08:52:43.3775605Z mov.b32 %r1421, -1; 2026-02-21T08:52:43.3775765Z mov.b64 %rd149, -32; 2026-02-21T08:52:43.3775930Z mov.b32 %r1424, %r1423; 2026-02-21T08:52:43.3776097Z mov.b32 %r1425, %r1423; 2026-02-21T08:52:43.3776262Z mov.b32 %r1426, %r1423; 2026-02-21T08:52:43.3776417Z mov.b32 %r1427, %r1423; 2026-02-21T08:52:43.3776711Z mov.b32 %r1428, %r1423; 2026-02-21T08:52:43.3776871Z mov.b32 %r1429, %r1423; 2026-02-21T08:52:43.3777032Z mov.b32 %r1430, %r1423; 2026-02-21T08:52:43.3777248Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:43.3777686Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:43.3777951Z add.s64 %rd149, %rd149, 32; 2026-02-21T08:52:43.3778132Z setp.lt.u64 %p37, %rd149, 4032; 2026-02-21T08:52:43.3778327Z add.s32 %r1004, %r1421, 1; 2026-02-21T08:52:43.3778507Z setp.gt.s32 %p38, %r1004, 1; 2026-02-21T08:52:43.3778695Z selp.b32 %r1421, 0, %r1004, %p38; 2026-02-21T08:52:43.3779040Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.3779394Z cp.async.wait_group 2; 2026-02-21T08:52:43.3779563Z bar.sync 0; 2026-02-21T08:52:43.3779703Z shl.b32 %r1005, %r1421, 12; 2026-02-21T08:52:43.3779880Z shl.b32 %r1006, %r1421, 13; 2026-02-21T08:52:43.3780054Z add.s32 %r1007, %r1379, %r1006; 2026-02-21T08:52:43.3780241Z add.s32 %r1008, %r1007, 32768; 2026-02-21T08:52:43.3780569Z .loc 1 52 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:52:32 
2026-02-21T08:52:43.3780918Z add.s32 %r1009, %r1008, %r33; 2026-02-21T08:52:43.3781106Z ld.shared.b16 %rs145, [%r1009]; 2026-02-21T08:52:43.3781298Z ld.shared.b16 %rs146, [%r1009+1024]; 2026-02-21T08:52:43.3781521Z ld.shared.b16 %rs147, [%r1009+64]; 2026-02-21T08:52:43.3781724Z ld.shared.b16 %rs148, [%r1009+1088]; 2026-02-21T08:52:43.3781920Z add.s32 %r1010, %r1008, %r34; 2026-02-21T08:52:43.3782097Z ld.shared.b16 %rs149, [%r1010]; 2026-02-21T08:52:43.3782287Z ld.shared.b16 %rs150, [%r1010+1024]; 2026-02-21T08:52:43.3782482Z ld.shared.b16 %rs151, [%r1010+64]; 2026-02-21T08:52:43.3782681Z ld.shared.b16 %rs152, [%r1010+1088]; 2026-02-21T08:52:43.3782870Z add.s32 %r1011, %r1008, %r35; 2026-02-21T08:52:43.3782940Z ld.shared.b16 %rs153, [%r1011]; 2026-02-21T08:52:43.3783005Z ld.shared.b16 %rs154, [%r1011+1024]; 2026-02-21T08:52:43.3783081Z ld.shared.b16 %rs155, [%r1011+64]; 2026-02-21T08:52:43.3783154Z ld.shared.b16 %rs156, [%r1011+1088]; 2026-02-21T08:52:43.3783215Z add.s32 %r1012, %r1008, %r36; 2026-02-21T08:52:43.3783279Z ld.shared.b16 %rs157, [%r1012]; 2026-02-21T08:52:43.3783352Z ld.shared.b16 %rs158, [%r1012+1024]; 2026-02-21T08:52:43.3783417Z ld.shared.b16 %rs159, [%r1012+64]; 2026-02-21T08:52:43.3783488Z ld.shared.b16 %rs160, [%r1012+1088]; 2026-02-21T08:52:43.3783552Z add.s32 %r1013, %r1008, %r37; 2026-02-21T08:52:43.3783615Z ld.shared.b16 %rs161, [%r1013]; 2026-02-21T08:52:43.3783691Z ld.shared.b16 %rs162, [%r1013+1024]; 2026-02-21T08:52:43.3783763Z ld.shared.b16 %rs163, [%r1013+64]; 2026-02-21T08:52:43.3783829Z ld.shared.b16 %rs164, [%r1013+1088]; 2026-02-21T08:52:43.3783887Z add.s32 %r1014, %r1008, %r38; 2026-02-21T08:52:43.3783954Z ld.shared.b16 %rs165, [%r1014]; 2026-02-21T08:52:43.3784017Z ld.shared.b16 %rs166, [%r1014+1024]; 2026-02-21T08:52:43.3784081Z ld.shared.b16 %rs167, [%r1014+64]; 2026-02-21T08:52:43.3784146Z ld.shared.b16 %rs168, [%r1014+1088]; 2026-02-21T08:52:43.3784214Z add.s32 %r1015, %r1008, %r39; 2026-02-21T08:52:43.3784276Z ld.shared.b16 
%rs169, [%r1015]; 2026-02-21T08:52:43.3784343Z ld.shared.b16 %rs170, [%r1015+1024]; 2026-02-21T08:52:43.3784411Z ld.shared.b16 %rs171, [%r1015+64]; 2026-02-21T08:52:43.3784478Z ld.shared.b16 %rs172, [%r1015+1088]; 2026-02-21T08:52:43.3784538Z add.s32 %r1016, %r1008, %r40; 2026-02-21T08:52:43.3784772Z ld.shared.b16 %rs173, [%r1016]; 2026-02-21T08:52:43.3784848Z ld.shared.b16 %rs174, [%r1016+1024]; 2026-02-21T08:52:43.3784915Z ld.shared.b16 %rs175, [%r1016+64]; 2026-02-21T08:52:43.3784980Z ld.shared.b16 %rs176, [%r1016+1088]; 2026-02-21T08:52:43.3785050Z cvt.f32.bf16 %r834, %rs145; 2026-02-21T08:52:43.3785113Z cvt.f32.bf16 %r835, %rs146; 2026-02-21T08:52:43.3785173Z cvt.f32.bf16 %r836, %rs149; 2026-02-21T08:52:43.3785236Z cvt.f32.bf16 %r837, %rs150; 2026-02-21T08:52:43.3785294Z cvt.f32.bf16 %r854, %rs153; 2026-02-21T08:52:43.3785353Z cvt.f32.bf16 %r855, %rs154; 2026-02-21T08:52:43.3785411Z cvt.f32.bf16 %r856, %rs157; 2026-02-21T08:52:43.3785475Z cvt.f32.bf16 %r857, %rs158; 2026-02-21T08:52:43.3785535Z cvt.f32.bf16 %r874, %rs161; 2026-02-21T08:52:43.3785687Z cvt.f32.bf16 %r875, %rs162; 2026-02-21T08:52:43.3785752Z cvt.f32.bf16 %r876, %rs165; 2026-02-21T08:52:43.3785810Z cvt.f32.bf16 %r877, %rs166; 2026-02-21T08:52:43.3785868Z cvt.f32.bf16 %r894, %rs169; 2026-02-21T08:52:43.3785931Z cvt.f32.bf16 %r895, %rs170; 2026-02-21T08:52:43.3785993Z cvt.f32.bf16 %r896, %rs173; 2026-02-21T08:52:43.3786052Z cvt.f32.bf16 %r897, %rs174; 2026-02-21T08:52:43.3786109Z cvt.f32.bf16 %r914, %rs147; 2026-02-21T08:52:43.3786172Z cvt.f32.bf16 %r915, %rs148; 2026-02-21T08:52:43.3786231Z cvt.f32.bf16 %r916, %rs151; 2026-02-21T08:52:43.3786289Z cvt.f32.bf16 %r917, %rs152; 2026-02-21T08:52:43.3786347Z cvt.f32.bf16 %r934, %rs155; 2026-02-21T08:52:43.3786410Z cvt.f32.bf16 %r935, %rs156; 2026-02-21T08:52:43.3786586Z cvt.f32.bf16 %r936, %rs159; 2026-02-21T08:52:43.3786650Z cvt.f32.bf16 %r937, %rs160; 2026-02-21T08:52:43.3786713Z cvt.f32.bf16 %r954, %rs163; 2026-02-21T08:52:43.3786770Z cvt.f32.bf16 
%r955, %rs164; 2026-02-21T08:52:43.3786832Z cvt.f32.bf16 %r956, %rs167; 2026-02-21T08:52:43.3786895Z cvt.f32.bf16 %r957, %rs168; 2026-02-21T08:52:43.3786954Z cvt.f32.bf16 %r974, %rs171; 2026-02-21T08:52:43.3787013Z cvt.f32.bf16 %r975, %rs172; 2026-02-21T08:52:43.3787076Z cvt.f32.bf16 %r976, %rs175; 2026-02-21T08:52:43.3787139Z cvt.f32.bf16 %r977, %rs176; 2026-02-21T08:52:43.3787362Z .loc 1 67 45 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:67:45 2026-02-21T08:52:43.3787428Z add.s32 %r1017, %r26, %r1005; 2026-02-21T08:52:43.3787499Z ld.shared.b8 %rs177, [%r1017]; 2026-02-21T08:52:43.3787565Z ld.shared.b8 %rs178, [%r1017+512]; 2026-02-21T08:52:43.3787635Z ld.shared.b8 %rs179, [%r1017+1024]; 2026-02-21T08:52:43.3787703Z ld.shared.b8 %rs180, [%r1017+1536]; 2026-02-21T08:52:43.3787773Z ld.shared.b8 %rs181, [%r1017+2048]; 2026-02-21T08:52:43.3787836Z ld.shared.b8 %rs182, [%r1017+2560]; 2026-02-21T08:52:43.3787899Z ld.shared.b8 %rs183, [%r1017+3072]; 2026-02-21T08:52:43.3787969Z ld.shared.b8 %rs184, [%r1017+3584]; 2026-02-21T08:52:43.3788168Z .loc 1 57 28 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:57:28 2026-02-21T08:52:43.3788232Z shl.b16 %rs185, %rs177, 4; 2026-02-21T08:52:43.3788300Z shl.b16 %rs186, %rs178, 4; 2026-02-21T08:52:43.3788361Z shl.b16 %rs187, %rs179, 4; 2026-02-21T08:52:43.3788430Z shl.b16 %rs188, %rs180, 4; 2026-02-21T08:52:43.3788490Z shl.b16 %rs189, %rs181, 4; 2026-02-21T08:52:43.3788619Z shl.b16 %rs190, %rs182, 4; 2026-02-21T08:52:43.3788682Z shl.b16 %rs191, %rs183, 4; 2026-02-21T08:52:43.3788743Z shl.b16 %rs192, %rs184, 4; 2026-02-21T08:52:43.3788943Z .loc 1 72 58 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:72:58 2026-02-21T08:52:43.3789016Z selp.b16 %rs193, %rs185, %rs177, %p57; 2026-02-21T08:52:43.3789077Z cvt.s16.s8 %rs194, %rs193; 2026-02-21T08:52:43.3789135Z shr.s16 %rs195, %rs194, 4; 2026-02-21T08:52:43.3789208Z selp.b16 %rs196, %rs186, %rs178, %p57; 2026-02-21T08:52:43.3789270Z cvt.s16.s8 %rs197, 
%rs196; 2026-02-21T08:52:43.3789330Z shr.s16 %rs198, %rs197, 4; 2026-02-21T08:52:43.3789400Z selp.b16 %rs199, %rs187, %rs179, %p57; 2026-02-21T08:52:43.3789460Z cvt.s16.s8 %rs200, %rs199; 2026-02-21T08:52:43.3789670Z shr.s16 %rs201, %rs200, 4; 2026-02-21T08:52:43.3789742Z selp.b16 %rs202, %rs188, %rs180, %p57; 2026-02-21T08:52:43.3789812Z cvt.s16.s8 %rs203, %rs202; 2026-02-21T08:52:43.3789874Z shr.s16 %rs204, %rs203, 4; 2026-02-21T08:52:43.3789942Z selp.b16 %rs205, %rs189, %rs181, %p57; 2026-02-21T08:52:43.3790008Z cvt.s16.s8 %rs206, %rs205; 2026-02-21T08:52:43.3790067Z shr.s16 %rs207, %rs206, 4; 2026-02-21T08:52:43.3790136Z selp.b16 %rs208, %rs190, %rs182, %p57; 2026-02-21T08:52:43.3790201Z cvt.s16.s8 %rs209, %rs208; 2026-02-21T08:52:43.3790259Z shr.s16 %rs210, %rs209, 4; 2026-02-21T08:52:43.3790326Z selp.b16 %rs211, %rs191, %rs183, %p57; 2026-02-21T08:52:43.3790385Z cvt.s16.s8 %rs212, %rs211; 2026-02-21T08:52:43.3790448Z shr.s16 %rs213, %rs212, 4; 2026-02-21T08:52:43.3790637Z selp.b16 %rs214, %rs192, %rs184, %p57; 2026-02-21T08:52:43.3790700Z cvt.s16.s8 %rs215, %rs214; 2026-02-21T08:52:43.3790763Z shr.s16 %rs216, %rs215, 4; 2026-02-21T08:52:43.3790964Z .loc 1 77 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:77:32 2026-02-21T08:52:43.3791030Z cvt.rn.f32.s16 %r1018, %rs195; 2026-02-21T08:52:43.3791099Z cvt.rn.f32.s16 %r1019, %rs198; 2026-02-21T08:52:43.3791161Z cvt.rn.f32.s16 %r1020, %rs201; 2026-02-21T08:52:43.3791220Z cvt.rn.f32.s16 %r1021, %rs204; 2026-02-21T08:52:43.3791281Z cvt.rn.f32.s16 %r1022, %rs207; 2026-02-21T08:52:43.3791346Z cvt.rn.f32.s16 %r1023, %rs210; 2026-02-21T08:52:43.3791419Z cvt.rn.f32.s16 %r1024, %rs213; 2026-02-21T08:52:43.3791483Z cvt.rn.f32.s16 %r1025, %rs216; 2026-02-21T08:52:43.3791552Z st.shared.b32 [%r41], %r1018; 2026-02-21T08:52:43.3791620Z st.shared.b32 [%r41+16384], %r1022; 2026-02-21T08:52:43.3791681Z st.shared.b32 [%r42], %r1019; 2026-02-21T08:52:43.3791747Z st.shared.b32 [%r42+16384], %r1023; 
2026-02-21T08:52:43.3791813Z st.shared.b32 [%r43], %r1020; 2026-02-21T08:52:43.3791875Z st.shared.b32 [%r43+16384], %r1024; 2026-02-21T08:52:43.3791938Z st.shared.b32 [%r44], %r1021; 2026-02-21T08:52:43.3792004Z st.shared.b32 [%r44+16384], %r1025; 2026-02-21T08:52:43.3792060Z $L__tmp5: 2026-02-21T08:52:43.3792332Z .loc 2 291 36 // standard.py:291:36 @[ cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:84:40 ] 2026-02-21T08:52:43.3792395Z // begin inline asm 2026-02-21T08:52:43.3792478Z fence.proxy.async.shared::cta; 2026-02-21T08:52:43.3792534Z // end inline asm 2026-02-21T08:52:43.3792588Z bar.sync 0; 2026-02-21T08:52:43.3792672Z shfl.sync.idx.b32 %r1026, %r4, 0, 31, -1; 2026-02-21T08:52:43.3792745Z wgmma.fence.sync.aligned; 2026-02-21T08:52:43.3792805Z shl.b32 %r1027, %r1026, 9; 2026-02-21T08:52:43.3792869Z and.b32 %r1028, %r1027, 14336; 2026-02-21T08:52:43.3792932Z add.s32 %r1029, %r1028, %r1379; 2026-02-21T08:52:43.3792994Z bfe.u32 %r1030, %r1029, 4, 14; 2026-02-21T08:52:43.3793058Z cvt.u64.u32 %rd107, %r1030; 2026-02-21T08:52:43.3793139Z or.b64 %rd97, %rd107, 4611686293372403712; 2026-02-21T08:52:43.3793205Z mov.pred %p28, -1; 2026-02-21T08:52:43.3793262Z // begin inline asm 2026-02-21T08:52:43.3793642Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1423,%r1424,%r1425,%r1426,%r1427,%r1428,%r1429,%r1430}, {%r834,%r835,%r836,%r837}, %rd97, %p28, 1, 1; 2026-02-21T08:52:43.3793698Z // end inline asm 2026-02-21T08:52:43.3793757Z add.s32 %r1031, %r1029, 32; 2026-02-21T08:52:43.3793817Z bfe.u32 %r1032, %r1031, 4, 14; 2026-02-21T08:52:43.3793882Z cvt.u64.u32 %rd108, %r1032; 2026-02-21T08:52:43.3793955Z or.b64 %rd98, %rd108, 4611686293372403712; 2026-02-21T08:52:43.3794012Z // begin inline asm 2026-02-21T08:52:43.3794386Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1423,%r1424,%r1425,%r1426,%r1427,%r1428,%r1429,%r1430}, {%r854,%r855,%r856,%r857}, %rd98, %p28, 1, 1; 2026-02-21T08:52:43.3794443Z // end inline asm 2026-02-21T08:52:43.3794502Z 
add.s32 %r1033, %r1029, 64; 2026-02-21T08:52:43.3794564Z bfe.u32 %r1034, %r1033, 4, 14; 2026-02-21T08:52:43.3794625Z cvt.u64.u32 %rd109, %r1034; 2026-02-21T08:52:43.3794821Z or.b64 %rd99, %rd109, 4611686293372403712; 2026-02-21T08:52:43.3794881Z // begin inline asm 2026-02-21T08:52:43.3795244Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1423,%r1424,%r1425,%r1426,%r1427,%r1428,%r1429,%r1430}, {%r874,%r875,%r876,%r877}, %rd99, %p28, 1, 1; 2026-02-21T08:52:43.3795300Z // end inline asm 2026-02-21T08:52:43.3795358Z add.s32 %r1035, %r1029, 96; 2026-02-21T08:52:43.3795420Z bfe.u32 %r1036, %r1035, 4, 14; 2026-02-21T08:52:43.3795480Z cvt.u64.u32 %rd110, %r1036; 2026-02-21T08:52:43.3795556Z or.b64 %rd100, %rd110, 4611686293372403712; 2026-02-21T08:52:43.3795617Z // begin inline asm 2026-02-21T08:52:43.3796068Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1423,%r1424,%r1425,%r1426,%r1427,%r1428,%r1429,%r1430}, {%r894,%r895,%r896,%r897}, %rd100, %p28, 1, 1; 2026-02-21T08:52:43.3796128Z // end inline asm 2026-02-21T08:52:43.3796189Z add.s32 %r1037, %r1029, 16384; 2026-02-21T08:52:43.3796251Z bfe.u32 %r1038, %r1037, 4, 14; 2026-02-21T08:52:43.3796316Z cvt.u64.u32 %rd111, %r1038; 2026-02-21T08:52:43.3796388Z or.b64 %rd101, %rd111, 4611686293372403712; 2026-02-21T08:52:43.3796582Z // begin inline asm 2026-02-21T08:52:43.3796948Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1423,%r1424,%r1425,%r1426,%r1427,%r1428,%r1429,%r1430}, {%r914,%r915,%r916,%r917}, %rd101, %p28, 1, 1; 2026-02-21T08:52:43.3797015Z // end inline asm 2026-02-21T08:52:43.3797084Z add.s32 %r1039, %r1029, 16416; 2026-02-21T08:52:43.3797144Z bfe.u32 %r1040, %r1039, 4, 14; 2026-02-21T08:52:43.3797204Z cvt.u64.u32 %rd112, %r1040; 2026-02-21T08:52:43.3797273Z or.b64 %rd102, %rd112, 4611686293372403712; 2026-02-21T08:52:43.3797338Z // begin inline asm 2026-02-21T08:52:43.3797702Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 
{%r1423,%r1424,%r1425,%r1426,%r1427,%r1428,%r1429,%r1430}, {%r934,%r935,%r936,%r937}, %rd102, %p28, 1, 1; 2026-02-21T08:52:43.3797759Z // end inline asm 2026-02-21T08:52:43.3797822Z add.s32 %r1041, %r1029, 16448; 2026-02-21T08:52:43.3797884Z bfe.u32 %r1042, %r1041, 4, 14; 2026-02-21T08:52:43.3797947Z cvt.u64.u32 %rd113, %r1042; 2026-02-21T08:52:43.3798023Z or.b64 %rd103, %rd113, 4611686293372403712; 2026-02-21T08:52:43.3798081Z // begin inline asm 2026-02-21T08:52:43.3798438Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1423,%r1424,%r1425,%r1426,%r1427,%r1428,%r1429,%r1430}, {%r954,%r955,%r956,%r957}, %rd103, %p28, 1, 1; 2026-02-21T08:52:43.3798493Z // end inline asm 2026-02-21T08:52:43.3798556Z add.s32 %r1043, %r1029, 16480; 2026-02-21T08:52:43.3798615Z bfe.u32 %r1044, %r1043, 4, 14; 2026-02-21T08:52:43.3798675Z cvt.u64.u32 %rd114, %r1044; 2026-02-21T08:52:43.3798747Z or.b64 %rd104, %rd114, 4611686293372403712; 2026-02-21T08:52:43.3798806Z // begin inline asm 2026-02-21T08:52:43.3799162Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1423,%r1424,%r1425,%r1426,%r1427,%r1428,%r1429,%r1430}, {%r974,%r975,%r976,%r977}, %rd104, %p28, 1, 1; 2026-02-21T08:52:43.3799223Z // end inline asm 2026-02-21T08:52:43.3799304Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:43.3799358Z mov.b32 %r987, 0; 2026-02-21T08:52:43.3799419Z mov.b32 %r986, %r1379; 2026-02-21T08:52:43.3799482Z mov.b32 %r988, %r987; 2026-02-21T08:52:43.3799541Z // begin inline asm 2026-02-21T08:52:43.3799712Z // wait for regs: %r1423,%r1424,%r1425,%r1426,%r1427,%r1428,%r1429,%r1430,%r986,%r987,%r988 2026-02-21T08:52:43.3799791Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:43.3799846Z // end inline asm 2026-02-21T08:52:43.3799900Z $L__tmp6: 2026-02-21T08:52:43.3800114Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.3800181Z add.s32 %r1045, %r1422, 1; 2026-02-21T08:52:43.3800247Z setp.gt.s32 %p39, %r1045, 1; 
2026-02-21T08:52:43.3800317Z selp.b32 %r1422, 0, %r1045, %p39; 2026-02-21T08:52:43.3800523Z .loc 1 48 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:32 2026-02-21T08:52:43.3800729Z mad.wide.s32 %rd105, %r1419, 2, %rd15; 2026-02-21T08:52:43.3800921Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.3800984Z shl.b32 %r1046, %r1422, 12; 2026-02-21T08:52:43.3801042Z shl.b32 %r1047, %r1422, 13; 2026-02-21T08:52:43.3801103Z add.s32 %r1000, %r13, %r1047; 2026-02-21T08:52:43.3801166Z selp.b32 %r1001, 8, 0, %p37; 2026-02-21T08:52:43.3801240Z // begin inline asm 2026-02-21T08:52:43.3801391Z cp.async.ca.shared.global [ %r1000 + 0 ], [ %rd105 + 0 ], 0x8, %r1001; 2026-02-21T08:52:43.3801449Z // end inline asm 2026-02-21T08:52:43.3801519Z cp.async.commit_group; 2026-02-21T08:52:43.3801713Z .loc 1 54 34 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:34 2026-02-21T08:52:43.3801892Z cvt.s64.s32 %rd115, %r1420; 2026-02-21T08:52:43.3801962Z add.s64 %rd106, %rd16, %rd115; 2026-02-21T08:52:43.3802155Z .loc 1 54 87 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:87 2026-02-21T08:52:43.3802219Z add.s32 %r1002, %r15, %r1046; 2026-02-21T08:52:43.3802280Z selp.b32 %r1003, 4, 0, %p37; 2026-02-21T08:52:43.3802356Z // begin inline asm 2026-02-21T08:52:43.3802504Z cp.async.ca.shared.global [ %r1002 + 0 ], [ %rd106 + 0 ], 0x4, %r1003; 2026-02-21T08:52:43.3802563Z // end inline asm 2026-02-21T08:52:43.3802634Z cp.async.commit_group; 2026-02-21T08:52:43.3802846Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.3802909Z add.s32 %r1420, %r1420, 229376; 2026-02-21T08:52:43.3802973Z add.s32 %r1419, %r1419, 64; 2026-02-21T08:52:43.3803040Z setp.lt.u64 %p40, %rd149, 4064; 2026-02-21T08:52:43.3803099Z @%p40 bra $L__BB0_7; 2026-02-21T08:52:43.3803213Z // %bb.8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:43.3803417Z .loc 1 33 32 // 
cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:33:32 2026-02-21T08:52:43.3803481Z or.b32 %r1052, %r127, %r11; 2026-02-21T08:52:43.3803684Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.3803755Z cp.async.wait_group 0; 2026-02-21T08:52:43.3803809Z bar.sync 0; 2026-02-21T08:52:43.3804004Z .loc 1 87 28 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:87:28 2026-02-21T08:52:43.3804091Z cvt.rn.bf16x2.f32 %r1053, %r1424, %r1423; 2026-02-21T08:52:43.3804164Z cvt.rn.bf16x2.f32 %r1054, %r1426, %r1425; 2026-02-21T08:52:43.3804233Z cvt.rn.bf16x2.f32 %r1055, %r1428, %r1427; 2026-02-21T08:52:43.3804301Z cvt.rn.bf16x2.f32 %r1056, %r1430, %r1429; 2026-02-21T08:52:43.3804512Z .loc 1 88 50 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:88:50 2026-02-21T08:52:43.3804583Z mad.lo.s32 %r1057, %r126, 7168, %r1052; 2026-02-21T08:52:43.3804779Z .loc 1 88 22 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:88:22 2026-02-21T08:52:43.3804856Z mad.wide.s32 %rd116, %r1057, 2, %rd17; 2026-02-21T08:52:43.3805049Z .loc 1 88 81 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:88:81 2026-02-21T08:52:43.3805237Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r31], {%r1053, %r1054, %r1055, %r1056}; 2026-02-21T08:52:43.3805305Z bar.sync 0; 2026-02-21T08:52:43.3805427Z ld.shared.v4.b32 {%r1048, %r1049, %r1050, %r1051}, [%r32]; 2026-02-21T08:52:43.3805487Z // begin inline asm 2026-02-21T08:52:43.3805618Z st.global.v4.b32 [ %rd116 + 0 ], { %r1048, %r1049, %r1050, %r1051 }; 2026-02-21T08:52:43.3805676Z // end inline asm 2026-02-21T08:52:43.3805882Z .loc 1 19 144 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:19:144 2026-02-21T08:52:43.3805948Z add.s32 %r1394, %r1394, 12672; 2026-02-21T08:52:43.3806019Z setp.lt.s32 %p41, %r1394, %r1431; 2026-02-21T08:52:43.3806079Z @%p41 bra $L__BB0_2; 2026-02-21T08:52:43.3806167Z $L__BB0_9: // %.preheader 2026-02-21T08:52:43.3806355Z setp.gt.s32 %p42, %r1431, 
55; 2026-02-21T08:52:43.3806414Z @%p42 bra $L__BB0_14; 2026-02-21T08:52:43.3806617Z // %bb.10: // %.lr.ph151 2026-02-21T08:52:43.3806824Z .loc 1 0 144 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:0:144 2026-02-21T08:52:43.3806889Z and.b32 %r1060, %r1378, 56; 2026-02-21T08:52:43.3806951Z xor.b32 %r1061, %r1060, %r1377; 2026-02-21T08:52:43.3807012Z add.s32 %r1063, %r1379, %r1061; 2026-02-21T08:52:43.3807078Z add.s32 %r47, %r1063, 32768; 2026-02-21T08:52:43.3807137Z add.s32 %r1065, %r1379, 49152; 2026-02-21T08:52:43.3807196Z add.s32 %r49, %r1065, %r1381; 2026-02-21T08:52:43.3807259Z add.s32 %r1108, %r1063, 40960; 2026-02-21T08:52:43.3807460Z add.s32 %r1066, %r1379, %r1381; 2026-02-21T08:52:43.3807525Z add.s32 %r1110, %r1066, 53248; 2026-02-21T08:52:43.3807586Z and.b32 %r1068, %r1382, 6144; 2026-02-21T08:52:43.3807649Z and.b32 %r1070, %r1383, 896; 2026-02-21T08:52:43.3807714Z or.b32 %r1072, %r1068, %r1070; 2026-02-21T08:52:43.3807774Z or.b32 %r52, %r1072, %r1384; 2026-02-21T08:52:43.3807836Z xor.b32 %r53, %r52, 8; 2026-02-21T08:52:43.3807896Z xor.b32 %r54, %r52, 16; 2026-02-21T08:52:43.3807965Z xor.b32 %r55, %r52, 24; 2026-02-21T08:52:43.3808024Z xor.b32 %r56, %r52, 32; 2026-02-21T08:52:43.3808088Z xor.b32 %r57, %r52, 40; 2026-02-21T08:52:43.3808146Z xor.b32 %r58, %r52, 48; 2026-02-21T08:52:43.3808202Z xor.b32 %r59, %r52, 56; 2026-02-21T08:52:43.3808264Z and.b32 %r1074, %r1378, 384; 2026-02-21T08:52:43.3808327Z add.s32 %r1075, %r1065, %r1074; 2026-02-21T08:52:43.3808388Z add.s32 %r60, %r1075, %r1385; 2026-02-21T08:52:43.3808450Z shl.b32 %r1076, %r1385, 7; 2026-02-21T08:52:43.3808513Z and.b32 %r1078, %r1386, 112; 2026-02-21T08:52:43.3808577Z or.b32 %r1080, %r1076, %r1078; 2026-02-21T08:52:43.3808639Z xor.b32 %r1081, %r1080, %r1387; 2026-02-21T08:52:43.3808714Z add.s32 %r61, %r1379, %r1081; 2026-02-21T08:52:43.3808780Z xor.b32 %r1082, %r1081, 32; 2026-02-21T08:52:43.3808841Z add.s32 %r62, %r1379, %r1082; 2026-02-21T08:52:43.3808905Z xor.b32 %r1083, %r1081, 
64; 2026-02-21T08:52:43.3808963Z add.s32 %r63, %r1379, %r1083; 2026-02-21T08:52:43.3809020Z xor.b32 %r1084, %r1081, 96; 2026-02-21T08:52:43.3809079Z add.s32 %r64, %r1379, %r1084; 2026-02-21T08:52:43.3809140Z shl.b32 %r1086, %r1388, 11; 2026-02-21T08:52:43.3809197Z and.b32 %r1088, %r1377, 768; 2026-02-21T08:52:43.3809255Z and.b32 %r1090, %r1390, 96; 2026-02-21T08:52:43.3809318Z and.b32 %r1093, %r1392, 1024; 2026-02-21T08:52:43.3809378Z or.b32 %r1094, %r1389, %r1088; 2026-02-21T08:52:43.3809438Z or.b32 %r1095, %r1090, %r1391; 2026-02-21T08:52:43.3809498Z xor.b32 %r1096, %r1094, %r1095; 2026-02-21T08:52:43.3809562Z add.s32 %r1097, %r1379, %r1086; 2026-02-21T08:52:43.3809621Z add.s32 %r1098, %r1097, %r1093; 2026-02-21T08:52:43.3809680Z add.s32 %r65, %r1098, %r1096; 2026-02-21T08:52:43.3809743Z and.b32 %r1100, %r1393, 15360; 2026-02-21T08:52:43.3809806Z shl.b32 %r1101, %r1388, 4; 2026-02-21T08:52:43.3809865Z xor.b32 %r1102, %r1101, %r5; 2026-02-21T08:52:43.3809924Z add.s32 %r1103, %r1379, %r1100; 2026-02-21T08:52:43.3810002Z add.s32 %r66, %r1103, %r1102; 2026-02-21T08:52:43.3810213Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.3810272Z or.b32 %r67, %r8, 128; 2026-02-21T08:52:43.3810477Z .loc 1 19 144 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:19:144 2026-02-21T08:52:43.3810547Z mad.wide.u32 %rd117, %r4, 7168, %rd16; 2026-02-21T08:52:43.3810610Z add.s64 %rd1, %rd117, 458752; 2026-02-21T08:52:43.3810676Z cvt.u64.u32 %rd125, %r1380; 2026-02-21T08:52:43.3810792Z $L__BB0_11: // =>This Loop Header: Depth=1 2026-02-21T08:52:43.3810890Z // Child Loop BB0_12 Depth 2 2026-02-21T08:52:43.3811086Z .loc 1 25 35 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:25:35 2026-02-21T08:52:43.3811303Z mul.hi.s32 %r1115, %r1431, -1840700269; 2026-02-21T08:52:43.3811366Z add.s32 %r1116, %r1115, %r1431; 2026-02-21T08:52:43.3811429Z shr.u32 %r1117, %r1116, 31; 2026-02-21T08:52:43.3811492Z shr.s32 %r1118, %r1116, 6; 
2026-02-21T08:52:43.3811553Z add.s32 %r1119, %r1118, %r1117; 2026-02-21T08:52:43.3811748Z .loc 1 26 33 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:26:33 2026-02-21T08:52:43.3811810Z shl.b32 %r1120, %r1119, 1; 2026-02-21T08:52:43.3812002Z .loc 1 27 39 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:27:39 2026-02-21T08:52:43.3812060Z sub.s32 %r1121, 1, %r1120; 2026-02-21T08:52:43.3812346Z .loc 1 27 52 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:27:52 2026-02-21T08:52:43.3812407Z min.u32 %r1122, %r1121, 2; 2026-02-21T08:52:43.3812599Z .loc 1 28 45 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:45 2026-02-21T08:52:43.3812664Z mul.lo.s32 %r1123, %r1119, 112; 2026-02-21T08:52:43.3812727Z sub.s32 %r1124, %r1431, %r1123; 2026-02-21T08:52:43.3812917Z .loc 1 28 64 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:64 2026-02-21T08:52:43.3812979Z cvt.u16.u32 %rs217, %r1124; 2026-02-21T08:52:43.3813043Z cvt.s8.s32 %rs218, %r1124; 2026-02-21T08:52:43.3813101Z cvt.u16.u32 %rs219, %r1122; 2026-02-21T08:52:43.3813291Z .loc 1 29 51 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:29:51 2026-02-21T08:52:43.3813354Z div.s16 %rs220, %rs218, %rs219; 2026-02-21T08:52:43.3813543Z .loc 1 28 64 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:64 2026-02-21T08:52:43.3813611Z mul.lo.s16 %rs221, %rs220, %rs219; 2026-02-21T08:52:43.3813672Z sub.s16 %rs222, %rs217, %rs221; 2026-02-21T08:52:43.3813734Z cvt.u32.u16 %r1125, %rs222; 2026-02-21T08:52:43.3813796Z cvt.s32.s8 %r1126, %r1125; 2026-02-21T08:52:43.3813987Z .loc 1 28 30 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:30 2026-02-21T08:52:43.3814059Z add.s32 %r1127, %r1120, %r1126; 2026-02-21T08:52:43.3814261Z .loc 1 30 27 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:30:27 2026-02-21T08:52:43.3814323Z shl.b32 %r1128, %r1127, 6; 2026-02-21T08:52:43.3814517Z .loc 1 31 32 // 
cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:31:32 2026-02-21T08:52:43.3814576Z or.b32 %r156, %r1128, %r6; 2026-02-21T08:52:43.3814767Z .loc 1 32 27 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:32:27 2026-02-21T08:52:43.3814827Z cvt.s16.s8 %rs223, %rs220; 2026-02-21T08:52:43.3814899Z mul.wide.s16 %r157, %rs223, 128; 2026-02-21T08:52:43.3815089Z .loc 1 33 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:33:32 2026-02-21T08:52:43.3815148Z or.b32 %r1129, %r157, %r10; 2026-02-21T08:52:43.3815345Z .loc 1 48 53 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:53 2026-02-21T08:52:43.3815403Z shl.b32 %r1130, %r156, 13; 2026-02-21T08:52:43.3815592Z .loc 1 48 60 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:60 2026-02-21T08:52:43.3815654Z or.b32 %r1131, %r1130, %r8; 2026-02-21T08:52:43.3815844Z .loc 1 48 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:32 2026-02-21T08:52:43.3815913Z mad.wide.s32 %rd118, %r1131, 2, %rd15; 2026-02-21T08:52:43.3815972Z mov.b32 %r1105, 8; 2026-02-21T08:52:43.3816161Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.3816222Z // begin inline asm 2026-02-21T08:52:43.3816359Z cp.async.ca.shared.global [ %r47 + 0 ], [ %rd118 + 0 ], 0x8, %r1105; 2026-02-21T08:52:43.3816418Z // end inline asm 2026-02-21T08:52:43.3816613Z cp.async.commit_group; 2026-02-21T08:52:43.3816954Z .loc 1 54 62 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:62 2026-02-21T08:52:43.3817022Z add.s32 %r1132, %r1129, %r1380; 2026-02-21T08:52:43.3817215Z .loc 1 54 34 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:34 2026-02-21T08:52:43.3817277Z cvt.s64.s32 %rd123, %r1132; 2026-02-21T08:52:43.3817348Z add.s64 %rd119, %rd16, %rd123; 2026-02-21T08:52:43.3817412Z mov.b32 %r1107, 4; 2026-02-21T08:52:43.3817608Z .loc 1 54 87 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:87 2026-02-21T08:52:43.3817667Z // begin 
inline asm 2026-02-21T08:52:43.3817806Z cp.async.ca.shared.global [ %r49 + 0 ], [ %rd119 + 0 ], 0x4, %r1107; 2026-02-21T08:52:43.3817984Z // end inline asm 2026-02-21T08:52:43.3818053Z cp.async.commit_group; 2026-02-21T08:52:43.3818248Z .loc 1 48 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:32 2026-02-21T08:52:43.3818315Z add.s64 %rd120, %rd118, 128; 2026-02-21T08:52:43.3818505Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.3818564Z bar.sync 0; 2026-02-21T08:52:43.3818621Z // begin inline asm 2026-02-21T08:52:43.3818757Z cp.async.ca.shared.global [ %r1108 + 0 ], [ %rd120 + 0 ], 0x8, %r1105; 2026-02-21T08:52:43.3818812Z // end inline asm 2026-02-21T08:52:43.3818881Z cp.async.commit_group; 2026-02-21T08:52:43.3819082Z .loc 1 54 34 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:34 2026-02-21T08:52:43.3819147Z cvt.s64.s32 %rd124, %r1129; 2026-02-21T08:52:43.3819214Z add.s64 %rd126, %rd124, %rd125; 2026-02-21T08:52:43.3819276Z add.s64 %rd127, %rd16, %rd126; 2026-02-21T08:52:43.3819340Z add.s64 %rd121, %rd127, 229376; 2026-02-21T08:52:43.3819538Z .loc 1 54 87 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:87 2026-02-21T08:52:43.3819601Z // begin inline asm 2026-02-21T08:52:43.3819736Z cp.async.ca.shared.global [ %r1110 + 0 ], [ %rd121 + 0 ], 0x4, %r1107; 2026-02-21T08:52:43.3819794Z // end inline asm 2026-02-21T08:52:43.3819862Z cp.async.commit_group; 2026-02-21T08:52:43.3820065Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.3820127Z shl.b32 %r1133, %r1119, 7; 2026-02-21T08:52:43.3820190Z or.b32 %r1134, %r6, %r1133; 2026-02-21T08:52:43.3820250Z cvt.s16.s8 %rs224, %rs222; 2026-02-21T08:52:43.3820322Z mad.wide.s16 %r1135, %rs224, 64, %r1134; 2026-02-21T08:52:43.3820381Z shl.b32 %r1136, %r1135, 13; 2026-02-21T08:52:43.3820446Z or.b32 %r1432, %r67, %r1136; 2026-02-21T08:52:43.3820507Z add.s64 %rd150, %rd1, %rd124; 
2026-02-21T08:52:43.3820576Z mov.b32 %r1435, 0f00000000; 2026-02-21T08:52:43.3820638Z mov.b32 %r1434, 1; 2026-02-21T08:52:43.3820699Z mov.b32 %r1433, -1; 2026-02-21T08:52:43.3820762Z mov.b64 %rd151, -32; 2026-02-21T08:52:43.3820831Z mov.b32 %r1436, %r1435; 2026-02-21T08:52:43.3820890Z mov.b32 %r1437, %r1435; 2026-02-21T08:52:43.3820949Z mov.b32 %r1438, %r1435; 2026-02-21T08:52:43.3821006Z mov.b32 %r1439, %r1435; 2026-02-21T08:52:43.3821069Z mov.b32 %r1440, %r1435; 2026-02-21T08:52:43.3821128Z mov.b32 %r1441, %r1435; 2026-02-21T08:52:43.3821185Z mov.b32 %r1442, %r1435; 2026-02-21T08:52:43.3821297Z $L__BB0_12: // Parent Loop BB0_11 Depth=1 2026-02-21T08:52:43.3821402Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:43.3821465Z add.s64 %rd151, %rd151, 32; 2026-02-21T08:52:43.3821532Z setp.lt.u64 %p52, %rd151, 4032; 2026-02-21T08:52:43.3821605Z add.s32 %r1323, %r1433, 1; 2026-02-21T08:52:43.3821672Z setp.gt.s32 %p53, %r1323, 1; 2026-02-21T08:52:43.3821746Z selp.b32 %r1433, 0, %r1323, %p53; 2026-02-21T08:52:43.3821945Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.3822114Z cp.async.wait_group 2; 2026-02-21T08:52:43.3822169Z bar.sync 0; 2026-02-21T08:52:43.3822235Z shl.b32 %r1324, %r1433, 12; 2026-02-21T08:52:43.3822295Z shl.b32 %r1325, %r1433, 13; 2026-02-21T08:52:43.3822358Z add.s32 %r1326, %r1379, %r1325; 2026-02-21T08:52:43.3822419Z add.s32 %r1327, %r1326, 32768; 2026-02-21T08:52:43.3822617Z .loc 1 52 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:52:32 2026-02-21T08:52:43.3822676Z add.s32 %r1328, %r1327, %r52; 2026-02-21T08:52:43.3822741Z ld.shared.b16 %rs225, [%r1328]; 2026-02-21T08:52:43.3822827Z ld.shared.b16 %rs226, [%r1328+1024]; 2026-02-21T08:52:43.3822896Z ld.shared.b16 %rs227, [%r1328+64]; 2026-02-21T08:52:43.3822963Z ld.shared.b16 %rs228, [%r1328+1088]; 2026-02-21T08:52:43.3823133Z add.s32 %r1329, %r1327, %r53; 2026-02-21T08:52:43.3823203Z ld.shared.b16 %rs229, [%r1329]; 
2026-02-21T08:52:43.3823269Z ld.shared.b16 %rs230, [%r1329+1024]; 2026-02-21T08:52:43.3823333Z ld.shared.b16 %rs231, [%r1329+64]; 2026-02-21T08:52:43.3823407Z ld.shared.b16 %rs232, [%r1329+1088]; 2026-02-21T08:52:43.3823468Z add.s32 %r1330, %r1327, %r54; 2026-02-21T08:52:43.3823532Z ld.shared.b16 %rs233, [%r1330]; 2026-02-21T08:52:43.3823604Z ld.shared.b16 %rs234, [%r1330+1024]; 2026-02-21T08:52:43.3823669Z ld.shared.b16 %rs235, [%r1330+64]; 2026-02-21T08:52:43.3823734Z ld.shared.b16 %rs236, [%r1330+1088]; 2026-02-21T08:52:43.3823793Z add.s32 %r1331, %r1327, %r55; 2026-02-21T08:52:43.3823862Z ld.shared.b16 %rs237, [%r1331]; 2026-02-21T08:52:43.3823927Z ld.shared.b16 %rs238, [%r1331+1024]; 2026-02-21T08:52:43.3823990Z ld.shared.b16 %rs239, [%r1331+64]; 2026-02-21T08:52:43.3824058Z ld.shared.b16 %rs240, [%r1331+1088]; 2026-02-21T08:52:43.3824118Z add.s32 %r1332, %r1327, %r56; 2026-02-21T08:52:43.3824183Z ld.shared.b16 %rs241, [%r1332]; 2026-02-21T08:52:43.3824249Z ld.shared.b16 %rs242, [%r1332+1024]; 2026-02-21T08:52:43.3824316Z ld.shared.b16 %rs243, [%r1332+64]; 2026-02-21T08:52:43.3824384Z ld.shared.b16 %rs244, [%r1332+1088]; 2026-02-21T08:52:43.3824443Z add.s32 %r1333, %r1327, %r57; 2026-02-21T08:52:43.3824511Z ld.shared.b16 %rs245, [%r1333]; 2026-02-21T08:52:43.3824576Z ld.shared.b16 %rs246, [%r1333+1024]; 2026-02-21T08:52:43.3824639Z ld.shared.b16 %rs247, [%r1333+64]; 2026-02-21T08:52:43.3824707Z ld.shared.b16 %rs248, [%r1333+1088]; 2026-02-21T08:52:43.3824766Z add.s32 %r1334, %r1327, %r58; 2026-02-21T08:52:43.3824833Z ld.shared.b16 %rs249, [%r1334]; 2026-02-21T08:52:43.3824899Z ld.shared.b16 %rs250, [%r1334+1024]; 2026-02-21T08:52:43.3824969Z ld.shared.b16 %rs251, [%r1334+64]; 2026-02-21T08:52:43.3825033Z ld.shared.b16 %rs252, [%r1334+1088]; 2026-02-21T08:52:43.3825107Z add.s32 %r1335, %r1327, %r59; 2026-02-21T08:52:43.3825175Z ld.shared.b16 %rs253, [%r1335]; 2026-02-21T08:52:43.3825243Z ld.shared.b16 %rs254, [%r1335+1024]; 2026-02-21T08:52:43.3825308Z 
ld.shared.b16 %rs255, [%r1335+64]; 2026-02-21T08:52:43.3825372Z ld.shared.b16 %rs256, [%r1335+1088]; 2026-02-21T08:52:43.3825447Z cvt.f32.bf16 %r1153, %rs225; 2026-02-21T08:52:43.3825512Z cvt.f32.bf16 %r1154, %rs226; 2026-02-21T08:52:43.3825572Z cvt.f32.bf16 %r1155, %rs229; 2026-02-21T08:52:43.3825637Z cvt.f32.bf16 %r1156, %rs230; 2026-02-21T08:52:43.3825699Z cvt.f32.bf16 %r1173, %rs233; 2026-02-21T08:52:43.3825760Z cvt.f32.bf16 %r1174, %rs234; 2026-02-21T08:52:43.3825821Z cvt.f32.bf16 %r1175, %rs237; 2026-02-21T08:52:43.3825888Z cvt.f32.bf16 %r1176, %rs238; 2026-02-21T08:52:43.3825947Z cvt.f32.bf16 %r1193, %rs241; 2026-02-21T08:52:43.3826007Z cvt.f32.bf16 %r1194, %rs242; 2026-02-21T08:52:43.3826071Z cvt.f32.bf16 %r1195, %rs245; 2026-02-21T08:52:43.3826129Z cvt.f32.bf16 %r1196, %rs246; 2026-02-21T08:52:43.3826188Z cvt.f32.bf16 %r1213, %rs249; 2026-02-21T08:52:43.3826258Z cvt.f32.bf16 %r1214, %rs250; 2026-02-21T08:52:43.3826318Z cvt.f32.bf16 %r1215, %rs253; 2026-02-21T08:52:43.3826378Z cvt.f32.bf16 %r1216, %rs254; 2026-02-21T08:52:43.3826437Z cvt.f32.bf16 %r1233, %rs227; 2026-02-21T08:52:43.3826778Z cvt.f32.bf16 %r1234, %rs228; 2026-02-21T08:52:43.3826842Z cvt.f32.bf16 %r1235, %rs231; 2026-02-21T08:52:43.3826905Z cvt.f32.bf16 %r1236, %rs232; 2026-02-21T08:52:43.3826971Z cvt.f32.bf16 %r1253, %rs235; 2026-02-21T08:52:43.3827031Z cvt.f32.bf16 %r1254, %rs236; 2026-02-21T08:52:43.3827091Z cvt.f32.bf16 %r1255, %rs239; 2026-02-21T08:52:43.3827149Z cvt.f32.bf16 %r1256, %rs240; 2026-02-21T08:52:43.3827215Z cvt.f32.bf16 %r1273, %rs243; 2026-02-21T08:52:43.3827274Z cvt.f32.bf16 %r1274, %rs244; 2026-02-21T08:52:43.3827334Z cvt.f32.bf16 %r1275, %rs247; 2026-02-21T08:52:43.3827396Z cvt.f32.bf16 %r1276, %rs248; 2026-02-21T08:52:43.3827459Z cvt.f32.bf16 %r1293, %rs251; 2026-02-21T08:52:43.3827518Z cvt.f32.bf16 %r1294, %rs252; 2026-02-21T08:52:43.3827577Z cvt.f32.bf16 %r1295, %rs255; 2026-02-21T08:52:43.3827762Z cvt.f32.bf16 %r1296, %rs256; 2026-02-21T08:52:43.3827980Z .loc 1 67 45 
// cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:67:45 2026-02-21T08:52:43.3828050Z add.s32 %r1336, %r60, %r1324; 2026-02-21T08:52:43.3828123Z ld.shared.b8 %rs257, [%r1336]; 2026-02-21T08:52:43.3828191Z ld.shared.b8 %rs258, [%r1336+512]; 2026-02-21T08:52:43.3828260Z ld.shared.b8 %rs259, [%r1336+1024]; 2026-02-21T08:52:43.3828331Z ld.shared.b8 %rs260, [%r1336+1536]; 2026-02-21T08:52:43.3828396Z ld.shared.b8 %rs261, [%r1336+2048]; 2026-02-21T08:52:43.3828460Z ld.shared.b8 %rs262, [%r1336+2560]; 2026-02-21T08:52:43.3828587Z ld.shared.b8 %rs263, [%r1336+3072]; 2026-02-21T08:52:43.3828662Z ld.shared.b8 %rs264, [%r1336+3584]; 2026-02-21T08:52:43.3828859Z .loc 1 57 28 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:57:28 2026-02-21T08:52:43.3828921Z shl.b16 %rs265, %rs257, 4; 2026-02-21T08:52:43.3828993Z shl.b16 %rs266, %rs258, 4; 2026-02-21T08:52:43.3829052Z shl.b16 %rs267, %rs259, 4; 2026-02-21T08:52:43.3829113Z shl.b16 %rs268, %rs260, 4; 2026-02-21T08:52:43.3829174Z shl.b16 %rs269, %rs261, 4; 2026-02-21T08:52:43.3829240Z shl.b16 %rs270, %rs262, 4; 2026-02-21T08:52:43.3829299Z shl.b16 %rs271, %rs263, 4; 2026-02-21T08:52:43.3829358Z shl.b16 %rs272, %rs264, 4; 2026-02-21T08:52:43.3829559Z .loc 1 72 58 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:72:58 2026-02-21T08:52:43.3829632Z selp.b16 %rs273, %rs265, %rs257, %p57; 2026-02-21T08:52:43.3829694Z cvt.s16.s8 %rs274, %rs273; 2026-02-21T08:52:43.3829760Z shr.s16 %rs275, %rs274, 4; 2026-02-21T08:52:43.3829828Z selp.b16 %rs276, %rs266, %rs258, %p57; 2026-02-21T08:52:43.3829887Z cvt.s16.s8 %rs277, %rs276; 2026-02-21T08:52:43.3829945Z shr.s16 %rs278, %rs277, 4; 2026-02-21T08:52:43.3830017Z selp.b16 %rs279, %rs267, %rs259, %p57; 2026-02-21T08:52:43.3830077Z cvt.s16.s8 %rs280, %rs279; 2026-02-21T08:52:43.3830138Z shr.s16 %rs281, %rs280, 4; 2026-02-21T08:52:43.3830210Z selp.b16 %rs282, %rs268, %rs260, %p57; 2026-02-21T08:52:43.3830269Z cvt.s16.s8 %rs283, %rs282; 2026-02-21T08:52:43.3830327Z 
shr.s16 %rs284, %rs283, 4; 2026-02-21T08:52:43.3830397Z selp.b16 %rs285, %rs269, %rs261, %p57; 2026-02-21T08:52:43.3830462Z cvt.s16.s8 %rs286, %rs285; 2026-02-21T08:52:43.3830524Z shr.s16 %rs287, %rs286, 4; 2026-02-21T08:52:43.3830590Z selp.b16 %rs288, %rs270, %rs262, %p57; 2026-02-21T08:52:43.3830653Z cvt.s16.s8 %rs289, %rs288; 2026-02-21T08:52:43.3830712Z shr.s16 %rs290, %rs289, 4; 2026-02-21T08:52:43.3830779Z selp.b16 %rs291, %rs271, %rs263, %p57; 2026-02-21T08:52:43.3830840Z cvt.s16.s8 %rs292, %rs291; 2026-02-21T08:52:43.3830905Z shr.s16 %rs293, %rs292, 4; 2026-02-21T08:52:43.3830971Z selp.b16 %rs294, %rs272, %rs264, %p57; 2026-02-21T08:52:43.3831030Z cvt.s16.s8 %rs295, %rs294; 2026-02-21T08:52:43.3831096Z shr.s16 %rs296, %rs295, 4; 2026-02-21T08:52:43.3831293Z .loc 1 77 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:77:32 2026-02-21T08:52:43.3831358Z cvt.rn.f32.s16 %r1337, %rs275; 2026-02-21T08:52:43.3831424Z cvt.rn.f32.s16 %r1338, %rs278; 2026-02-21T08:52:43.3831594Z cvt.rn.f32.s16 %r1339, %rs281; 2026-02-21T08:52:43.3831656Z cvt.rn.f32.s16 %r1340, %rs284; 2026-02-21T08:52:43.3831716Z cvt.rn.f32.s16 %r1341, %rs287; 2026-02-21T08:52:43.3831780Z cvt.rn.f32.s16 %r1342, %rs290; 2026-02-21T08:52:43.3831840Z cvt.rn.f32.s16 %r1343, %rs293; 2026-02-21T08:52:43.3831900Z cvt.rn.f32.s16 %r1344, %rs296; 2026-02-21T08:52:43.3831980Z st.shared.b32 [%r61], %r1337; 2026-02-21T08:52:43.3832048Z st.shared.b32 [%r61+16384], %r1341; 2026-02-21T08:52:43.3832111Z st.shared.b32 [%r62], %r1338; 2026-02-21T08:52:43.3832176Z st.shared.b32 [%r62+16384], %r1342; 2026-02-21T08:52:43.3832243Z st.shared.b32 [%r63], %r1339; 2026-02-21T08:52:43.3832306Z st.shared.b32 [%r63+16384], %r1343; 2026-02-21T08:52:43.3832367Z st.shared.b32 [%r64], %r1340; 2026-02-21T08:52:43.3832529Z st.shared.b32 [%r64+16384], %r1344; 2026-02-21T08:52:43.3832586Z $L__tmp7: 2026-02-21T08:52:43.3832860Z .loc 2 291 36 // standard.py:291:36 @[ cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:84:40 
] 2026-02-21T08:52:43.3832928Z // begin inline asm 2026-02-21T08:52:43.3833005Z fence.proxy.async.shared::cta; 2026-02-21T08:52:43.3833062Z // end inline asm 2026-02-21T08:52:43.3833116Z bar.sync 0; 2026-02-21T08:52:43.3833209Z shfl.sync.idx.b32 %r1345, %r4, 0, 31, -1; 2026-02-21T08:52:43.3833286Z wgmma.fence.sync.aligned; 2026-02-21T08:52:43.3833346Z shl.b32 %r1346, %r1345, 9; 2026-02-21T08:52:43.3833413Z and.b32 %r1347, %r1346, 14336; 2026-02-21T08:52:43.3833475Z add.s32 %r1348, %r1347, %r1379; 2026-02-21T08:52:43.3833534Z bfe.u32 %r1349, %r1348, 4, 14; 2026-02-21T08:52:43.3833598Z cvt.u64.u32 %rd138, %r1349; 2026-02-21T08:52:43.3833679Z or.b64 %rd128, %rd138, 4611686293372403712; 2026-02-21T08:52:43.3833742Z mov.pred %p43, -1; 2026-02-21T08:52:43.3833803Z // begin inline asm 2026-02-21T08:52:43.3834193Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1435,%r1436,%r1437,%r1438,%r1439,%r1440,%r1441,%r1442}, {%r1153,%r1154,%r1155,%r1156}, %rd128, %p43, 1, 1; 2026-02-21T08:52:43.3834253Z // end inline asm 2026-02-21T08:52:43.3834321Z add.s32 %r1350, %r1348, 32; 2026-02-21T08:52:43.3834389Z bfe.u32 %r1351, %r1350, 4, 14; 2026-02-21T08:52:43.3834452Z cvt.u64.u32 %rd139, %r1351; 2026-02-21T08:52:43.3834526Z or.b64 %rd129, %rd139, 4611686293372403712; 2026-02-21T08:52:43.3834584Z // begin inline asm 2026-02-21T08:52:43.3834963Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1435,%r1436,%r1437,%r1438,%r1439,%r1440,%r1441,%r1442}, {%r1173,%r1174,%r1175,%r1176}, %rd129, %p43, 1, 1; 2026-02-21T08:52:43.3835020Z // end inline asm 2026-02-21T08:52:43.3835079Z add.s32 %r1352, %r1348, 64; 2026-02-21T08:52:43.3835145Z bfe.u32 %r1353, %r1352, 4, 14; 2026-02-21T08:52:43.3835206Z cvt.u64.u32 %rd140, %r1353; 2026-02-21T08:52:43.3835281Z or.b64 %rd130, %rd140, 4611686293372403712; 2026-02-21T08:52:43.3835347Z // begin inline asm 2026-02-21T08:52:43.3835715Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1435,%r1436,%r1437,%r1438,%r1439,%r1440,%r1441,%r1442}, 
{%r1193,%r1194,%r1195,%r1196}, %rd130, %p43, 1, 1; 2026-02-21T08:52:43.3835775Z // end inline asm 2026-02-21T08:52:43.3835834Z add.s32 %r1354, %r1348, 96; 2026-02-21T08:52:43.3835900Z bfe.u32 %r1355, %r1354, 4, 14; 2026-02-21T08:52:43.3835972Z cvt.u64.u32 %rd141, %r1355; 2026-02-21T08:52:43.3836047Z or.b64 %rd131, %rd141, 4611686293372403712; 2026-02-21T08:52:43.3836109Z // begin inline asm 2026-02-21T08:52:43.3836596Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1435,%r1436,%r1437,%r1438,%r1439,%r1440,%r1441,%r1442}, {%r1213,%r1214,%r1215,%r1216}, %rd131, %p43, 1, 1; 2026-02-21T08:52:43.3836658Z // end inline asm 2026-02-21T08:52:43.3836727Z add.s32 %r1356, %r1348, 16384; 2026-02-21T08:52:43.3836784Z bfe.u32 %r1357, %r1356, 4, 14; 2026-02-21T08:52:43.3836858Z cvt.u64.u32 %rd142, %r1357; 2026-02-21T08:52:43.3836931Z or.b64 %rd132, %rd142, 4611686293372403712; 2026-02-21T08:52:43.3836997Z // begin inline asm 2026-02-21T08:52:43.3837363Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1435,%r1436,%r1437,%r1438,%r1439,%r1440,%r1441,%r1442}, {%r1233,%r1234,%r1235,%r1236}, %rd132, %p43, 1, 1; 2026-02-21T08:52:43.3837555Z // end inline asm 2026-02-21T08:52:43.3837621Z add.s32 %r1358, %r1348, 16416; 2026-02-21T08:52:43.3837680Z bfe.u32 %r1359, %r1358, 4, 14; 2026-02-21T08:52:43.3837742Z cvt.u64.u32 %rd143, %r1359; 2026-02-21T08:52:43.3837816Z or.b64 %rd133, %rd143, 4611686293372403712; 2026-02-21T08:52:43.3837876Z // begin inline asm 2026-02-21T08:52:43.3838243Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1435,%r1436,%r1437,%r1438,%r1439,%r1440,%r1441,%r1442}, {%r1253,%r1254,%r1255,%r1256}, %rd133, %p43, 1, 1; 2026-02-21T08:52:43.3838301Z // end inline asm 2026-02-21T08:52:43.3838364Z add.s32 %r1360, %r1348, 16448; 2026-02-21T08:52:43.3838543Z bfe.u32 %r1361, %r1360, 4, 14; 2026-02-21T08:52:43.3838607Z cvt.u64.u32 %rd144, %r1361; 2026-02-21T08:52:43.3838681Z or.b64 %rd134, %rd144, 4611686293372403712; 2026-02-21T08:52:43.3838743Z // begin 
inline asm 2026-02-21T08:52:43.3839109Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1435,%r1436,%r1437,%r1438,%r1439,%r1440,%r1441,%r1442}, {%r1273,%r1274,%r1275,%r1276}, %rd134, %p43, 1, 1; 2026-02-21T08:52:43.3839168Z // end inline asm 2026-02-21T08:52:43.3839225Z add.s32 %r1362, %r1348, 16480; 2026-02-21T08:52:43.3839285Z bfe.u32 %r1363, %r1362, 4, 14; 2026-02-21T08:52:43.3839345Z cvt.u64.u32 %rd145, %r1363; 2026-02-21T08:52:43.3839419Z or.b64 %rd135, %rd145, 4611686293372403712; 2026-02-21T08:52:43.3839476Z // begin inline asm 2026-02-21T08:52:43.3839845Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1435,%r1436,%r1437,%r1438,%r1439,%r1440,%r1441,%r1442}, {%r1293,%r1294,%r1295,%r1296}, %rd135, %p43, 1, 1; 2026-02-21T08:52:43.3839905Z // end inline asm 2026-02-21T08:52:43.3839982Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:43.3840039Z mov.b32 %r1306, 0; 2026-02-21T08:52:43.3840097Z mov.b32 %r1305, %r1379; 2026-02-21T08:52:43.3840157Z mov.b32 %r1307, %r1306; 2026-02-21T08:52:43.3840218Z // begin inline asm 2026-02-21T08:52:43.3840396Z // wait for regs: %r1435,%r1436,%r1437,%r1438,%r1439,%r1440,%r1441,%r1442,%r1305,%r1306,%r1307 2026-02-21T08:52:43.3840475Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:43.3840531Z // end inline asm 2026-02-21T08:52:43.3840585Z $L__tmp8: 2026-02-21T08:52:43.3840808Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.3840877Z add.s32 %r1364, %r1434, 1; 2026-02-21T08:52:43.3840946Z setp.gt.s32 %p54, %r1364, 1; 2026-02-21T08:52:43.3841013Z selp.b32 %r1434, 0, %r1364, %p54; 2026-02-21T08:52:43.3841217Z .loc 1 48 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:32 2026-02-21T08:52:43.3841290Z mad.wide.s32 %rd136, %r1432, 2, %rd15; 2026-02-21T08:52:43.3841484Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.3841551Z shl.b32 %r1365, %r1434, 12; 2026-02-21T08:52:43.3841609Z shl.b32 %r1366, 
%r1434, 13; 2026-02-21T08:52:43.3841668Z add.s32 %r1319, %r47, %r1366; 2026-02-21T08:52:43.3841735Z selp.b32 %r1320, 8, 0, %p52; 2026-02-21T08:52:43.3841795Z // begin inline asm 2026-02-21T08:52:43.3841938Z cp.async.ca.shared.global [ %r1319 + 0 ], [ %rd136 + 0 ], 0x8, %r1320; 2026-02-21T08:52:43.3841995Z // end inline asm 2026-02-21T08:52:43.3842066Z cp.async.commit_group; 2026-02-21T08:52:43.3842259Z .loc 1 54 87 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:87 2026-02-21T08:52:43.3842319Z add.s32 %r1321, %r49, %r1365; 2026-02-21T08:52:43.3842386Z selp.b32 %r1322, 4, 0, %p52; 2026-02-21T08:52:43.3842444Z // begin inline asm 2026-02-21T08:52:43.3842583Z cp.async.ca.shared.global [ %r1321 + 0 ], [ %rd150 + 0 ], 0x4, %r1322; 2026-02-21T08:52:43.3842655Z // end inline asm 2026-02-21T08:52:43.3842722Z cp.async.commit_group; 2026-02-21T08:52:43.3842926Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.3843095Z add.s32 %r1432, %r1432, 64; 2026-02-21T08:52:43.3843175Z add.s64 %rd150, %rd150, 229376; 2026-02-21T08:52:43.3843241Z setp.lt.u64 %p55, %rd151, 4064; 2026-02-21T08:52:43.3843300Z @%p55 bra $L__BB0_12; 2026-02-21T08:52:43.3843413Z // %bb.13: // in Loop: Header=BB0_11 Depth=1 2026-02-21T08:52:43.3843608Z .loc 1 33 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:33:32 2026-02-21T08:52:43.3843667Z or.b32 %r1371, %r157, %r11; 2026-02-21T08:52:43.3843872Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.3843938Z cp.async.wait_group 0; 2026-02-21T08:52:43.3844117Z bar.sync 0; 2026-02-21T08:52:43.3844315Z .loc 1 87 28 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:87:28 2026-02-21T08:52:43.3844400Z cvt.rn.bf16x2.f32 %r1372, %r1436, %r1435; 2026-02-21T08:52:43.3844479Z cvt.rn.bf16x2.f32 %r1373, %r1438, %r1437; 2026-02-21T08:52:43.3844550Z cvt.rn.bf16x2.f32 %r1374, %r1440, %r1439; 2026-02-21T08:52:43.3844625Z cvt.rn.bf16x2.f32 
%r1375, %r1442, %r1441; 2026-02-21T08:52:43.3844825Z .loc 1 88 50 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:88:50 2026-02-21T08:52:43.3844897Z mad.lo.s32 %r1376, %r156, 7168, %r1371; 2026-02-21T08:52:43.3845098Z .loc 1 88 22 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:88:22 2026-02-21T08:52:43.3845168Z mad.wide.s32 %rd146, %r1376, 2, %rd17; 2026-02-21T08:52:43.3845360Z .loc 1 88 81 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:88:81 2026-02-21T08:52:43.3845551Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r65], {%r1372, %r1373, %r1374, %r1375}; 2026-02-21T08:52:43.3845612Z bar.sync 0; 2026-02-21T08:52:43.3845725Z ld.shared.v4.b32 {%r1367, %r1368, %r1369, %r1370}, [%r66]; 2026-02-21T08:52:43.3845789Z // begin inline asm 2026-02-21T08:52:43.3845921Z st.global.v4.b32 [ %rd146 + 0 ], { %r1367, %r1368, %r1369, %r1370 }; 2026-02-21T08:52:43.3845983Z // end inline asm 2026-02-21T08:52:43.3846186Z .loc 1 19 144 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:19:144 2026-02-21T08:52:43.3846253Z add.s32 %r181, %r1431, 4224; 2026-02-21T08:52:43.3846328Z setp.lt.s32 %p56, %r1431, -4168; 2026-02-21T08:52:43.3846388Z mov.b32 %r1431, %r181; 2026-02-21T08:52:43.3846571Z @%p56 bra $L__BB0_11; 2026-02-21T08:52:43.3846674Z $L__BB0_14: // %._crit_edge 2026-02-21T08:52:43.3846874Z .loc 1 19 4 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:19:4 2026-02-21T08:52:43.3846931Z ret; 2026-02-21T08:52:43.3846996Z $L__tmp9: 2026-02-21T08:52:43.3847052Z $L__func_end0: 2026-02-21T08:52:43.3847138Z // -- End function 2026-02-21T08:52:43.3847198Z } 2026-02-21T08:52:43.3847448Z .file 1 "/tmp/torchinductor_root/f7/cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py" 2026-02-21T08:52:43.3847664Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T08:52:43.3847730Z .section .debug_abbrev 2026-02-21T08:52:43.3847788Z { 2026-02-21T08:52:43.3847884Z .b8 1 // Abbreviation Code 
2026-02-21T08:52:43.3847978Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:52:43.3848067Z .b8 1 // DW_CHILDREN_yes 2026-02-21T08:52:43.3848166Z .b8 37 // DW_AT_producer 2026-02-21T08:52:43.3848249Z .b8 8 // DW_FORM_string 2026-02-21T08:52:43.3848333Z .b8 19 // DW_AT_language 2026-02-21T08:52:43.3848418Z .b8 5 // DW_FORM_data2 2026-02-21T08:52:43.3848497Z .b8 3 // DW_AT_name 2026-02-21T08:52:43.3848715Z .b8 8 // DW_FORM_string 2026-02-21T08:52:43.3848802Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:52:43.3848880Z .b8 6 // DW_FORM_data4 2026-02-21T08:52:43.3848960Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:52:43.3849043Z .b8 8 // DW_FORM_string 2026-02-21T08:52:43.3849117Z .b8 0 // EOM(1) 2026-02-21T08:52:43.3849186Z .b8 0 // EOM(2) 2026-02-21T08:52:43.3849279Z .b8 2 // Abbreviation Code 2026-02-21T08:52:43.3849366Z .b8 46 // DW_TAG_subprogram 2026-02-21T08:52:43.3849564Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:52:43.3849643Z .b8 3 // DW_AT_name 2026-02-21T08:52:43.3849728Z .b8 8 // DW_FORM_string 2026-02-21T08:52:43.3849811Z .b8 32 // DW_AT_inline 2026-02-21T08:52:43.3849891Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:43.3849964Z .b8 0 // EOM(1) 2026-02-21T08:52:43.3850031Z .b8 0 // EOM(2) 2026-02-21T08:52:43.3850113Z .b8 3 // Abbreviation Code 2026-02-21T08:52:43.3850209Z .b8 46 // DW_TAG_subprogram 2026-02-21T08:52:43.3850300Z .b8 1 // DW_CHILDREN_yes 2026-02-21T08:52:43.3850380Z .b8 17 // DW_AT_low_pc 2026-02-21T08:52:43.3850459Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:43.3850545Z .b8 18 // DW_AT_high_pc 2026-02-21T08:52:43.3850624Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:43.3850715Z .b8 49 // DW_AT_abstract_origin 2026-02-21T08:52:43.3850799Z .b8 19 // DW_FORM_ref4 2026-02-21T08:52:43.3850870Z .b8 0 // EOM(1) 2026-02-21T08:52:43.3850937Z .b8 0 // EOM(2) 2026-02-21T08:52:43.3851024Z .b8 4 // Abbreviation Code 2026-02-21T08:52:43.3851123Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T08:52:43.3851202Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:52:43.3851293Z .b8 49 // 
DW_AT_abstract_origin 2026-02-21T08:52:43.3851370Z .b8 19 // DW_FORM_ref4 2026-02-21T08:52:43.3851445Z .b8 17 // DW_AT_low_pc 2026-02-21T08:52:43.3851525Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:43.3851613Z .b8 18 // DW_AT_high_pc 2026-02-21T08:52:43.3851688Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:43.3851772Z .b8 88 // DW_AT_call_file 2026-02-21T08:52:43.3851854Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:43.3851933Z .b8 89 // DW_AT_call_line 2026-02-21T08:52:43.3852010Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:43.3852095Z .b8 87 // DW_AT_call_column 2026-02-21T08:52:43.3852172Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:43.3852242Z .b8 0 // EOM(1) 2026-02-21T08:52:43.3852311Z .b8 0 // EOM(2) 2026-02-21T08:52:43.3852383Z .b8 0 // EOM(3) 2026-02-21T08:52:43.3852445Z } 2026-02-21T08:52:43.3852512Z .section .debug_info 2026-02-21T08:52:43.3852566Z { 2026-02-21T08:52:43.3852655Z .b32 178 // Length of Unit 2026-02-21T08:52:43.3852747Z .b8 2 // DWARF version number 2026-02-21T08:52:43.3853120Z .b8 0 2026-02-21T08:52:43.3853253Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:52:43.3853348Z .b8 8 // Address Size (in bytes) 2026-02-21T08:52:43.3853460Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T08:52:43.3853549Z .b8 116 // DW_AT_producer 2026-02-21T08:52:43.3853602Z .b8 114 2026-02-21T08:52:43.3853653Z .b8 105 2026-02-21T08:52:43.3853707Z .b8 116 2026-02-21T08:52:43.3853759Z .b8 111 2026-02-21T08:52:43.3853809Z .b8 110 2026-02-21T08:52:43.3853859Z .b8 0 2026-02-21T08:52:43.3853941Z .b8 2 // DW_AT_language 2026-02-21T08:52:43.3854085Z .b8 0 2026-02-21T08:52:43.3854166Z .b8 99 // DW_AT_name 2026-02-21T08:52:43.3854222Z .b8 102 2026-02-21T08:52:43.3854274Z .b8 55 2026-02-21T08:52:43.3854324Z .b8 54 2026-02-21T08:52:43.3854378Z .b8 103 2026-02-21T08:52:43.3854433Z .b8 50 2026-02-21T08:52:43.3854484Z .b8 108 2026-02-21T08:52:43.3854535Z .b8 105 2026-02-21T08:52:43.3854591Z .b8 98 2026-02-21T08:52:43.3854641Z .b8 55 2026-02-21T08:52:43.3854692Z .b8 103 2026-02-21T08:52:43.3854742Z .b8 52 2026-02-21T08:52:43.3854797Z .b8 112 2026-02-21T08:52:43.3854848Z .b8 100 2026-02-21T08:52:43.3854899Z .b8 121 2026-02-21T08:52:43.3854959Z .b8 117 2026-02-21T08:52:43.3855019Z .b8 106 2026-02-21T08:52:43.3855070Z .b8 99 2026-02-21T08:52:43.3855121Z .b8 52 2026-02-21T08:52:43.3855175Z .b8 55 2026-02-21T08:52:43.3855227Z .b8 122 2026-02-21T08:52:43.3855276Z .b8 116 2026-02-21T08:52:43.3855328Z .b8 112 2026-02-21T08:52:43.3855387Z .b8 120 2026-02-21T08:52:43.3855437Z .b8 54 2026-02-21T08:52:43.3855487Z .b8 52 2026-02-21T08:52:43.3855544Z .b8 117 2026-02-21T08:52:43.3855594Z .b8 97 2026-02-21T08:52:43.3855644Z .b8 122 2026-02-21T08:52:43.3855694Z .b8 51 2026-02-21T08:52:43.3855749Z .b8 99 2026-02-21T08:52:43.3855800Z .b8 117 2026-02-21T08:52:43.3855853Z .b8 115 2026-02-21T08:52:43.3855906Z .b8 103 2026-02-21T08:52:43.3855956Z .b8 53 2026-02-21T08:52:43.3856009Z .b8 105 2026-02-21T08:52:43.3856060Z .b8 117 2026-02-21T08:52:43.3856115Z .b8 102 2026-02-21T08:52:43.3856164Z .b8 97 2026-02-21T08:52:43.3856215Z .b8 99 
2026-02-21T08:52:43.3856266Z .b8 119 2026-02-21T08:52:43.3856320Z .b8 100 2026-02-21T08:52:43.3856371Z .b8 100 2026-02-21T08:52:43.3856421Z .b8 55 2026-02-21T08:52:43.3856600Z .b8 118 2026-02-21T08:52:43.3856656Z .b8 119 2026-02-21T08:52:43.3856722Z .b8 122 2026-02-21T08:52:43.3856777Z .b8 101 2026-02-21T08:52:43.3856844Z .b8 115 2026-02-21T08:52:43.3856897Z .b8 106 2026-02-21T08:52:43.3856950Z .b8 122 2026-02-21T08:52:43.3857006Z .b8 53 2026-02-21T08:52:43.3857055Z .b8 46 2026-02-21T08:52:43.3857105Z .b8 112 2026-02-21T08:52:43.3857159Z .b8 121 2026-02-21T08:52:43.3857212Z .b8 0 2026-02-21T08:52:43.3857313Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:52:43.3857393Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:52:43.3857450Z .b8 116 2026-02-21T08:52:43.3857501Z .b8 109 2026-02-21T08:52:43.3857552Z .b8 112 2026-02-21T08:52:43.3857601Z .b8 47 2026-02-21T08:52:43.3857655Z .b8 116 2026-02-21T08:52:43.3857704Z .b8 111 2026-02-21T08:52:43.3857755Z .b8 114 2026-02-21T08:52:43.3857808Z .b8 99 2026-02-21T08:52:43.3857858Z .b8 104 2026-02-21T08:52:43.3857909Z .b8 105 2026-02-21T08:52:43.3857958Z .b8 110 2026-02-21T08:52:43.3858012Z .b8 100 2026-02-21T08:52:43.3858060Z .b8 117 2026-02-21T08:52:43.3858109Z .b8 99 2026-02-21T08:52:43.3858161Z .b8 116 2026-02-21T08:52:43.3858217Z .b8 111 2026-02-21T08:52:43.3858266Z .b8 114 2026-02-21T08:52:43.3858316Z .b8 95 2026-02-21T08:52:43.3858371Z .b8 114 2026-02-21T08:52:43.3858421Z .b8 111 2026-02-21T08:52:43.3858471Z .b8 111 2026-02-21T08:52:43.3858523Z .b8 116 2026-02-21T08:52:43.3858589Z .b8 47 2026-02-21T08:52:43.3858643Z .b8 102 2026-02-21T08:52:43.3858693Z .b8 55 2026-02-21T08:52:43.3858747Z .b8 0 2026-02-21T08:52:43.3858859Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T08:52:43.3859080Z .b8 95 // DW_AT_name 2026-02-21T08:52:43.3859131Z .b8 104 2026-02-21T08:52:43.3859187Z .b8 101 2026-02-21T08:52:43.3859239Z .b8 108 2026-02-21T08:52:43.3859290Z .b8 105 2026-02-21T08:52:43.3859345Z .b8 111 
2026-02-21T08:52:43.3859395Z .b8 110 2026-02-21T08:52:43.3859445Z .b8 95 2026-02-21T08:52:43.3859495Z .b8 109 2026-02-21T08:52:43.3859554Z .b8 97 2026-02-21T08:52:43.3859612Z .b8 116 2026-02-21T08:52:43.3859664Z .b8 109 2026-02-21T08:52:43.3859719Z .b8 117 2026-02-21T08:52:43.3859769Z .b8 108 2026-02-21T08:52:43.3859818Z .b8 95 2026-02-21T08:52:43.3859870Z .b8 98 2026-02-21T08:52:43.3859924Z .b8 102 2026-02-21T08:52:43.3859978Z .b8 49 2026-02-21T08:52:43.3860029Z .b8 54 2026-02-21T08:52:43.3874944Z .b8 95 2026-02-21T08:52:43.3875145Z .b8 105 2026-02-21T08:52:43.3875209Z .b8 110 2026-02-21T08:52:43.3875292Z .b8 116 2026-02-21T08:52:43.3875373Z .b8 52 2026-02-21T08:52:43.3875486Z .b8 0 2026-02-21T08:52:43.3875608Z .b8 1 // DW_AT_inline 2026-02-21T08:52:43.3875754Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T08:52:43.3875871Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T08:52:43.3875977Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T08:52:43.3876087Z .b32 108 // DW_AT_abstract_origin 2026-02-21T08:52:43.3876238Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T08:52:43.3876362Z .b32 108 // DW_AT_abstract_origin 2026-02-21T08:52:43.3876614Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T08:52:43.3876733Z .b64 $L__tmp8 // DW_AT_high_pc 2026-02-21T08:52:43.3876835Z .b8 1 // DW_AT_call_file 2026-02-21T08:52:43.3876921Z .b8 84 // DW_AT_call_line 2026-02-21T08:52:43.3877016Z .b8 40 // DW_AT_call_column 2026-02-21T08:52:43.3877117Z .b8 0 // End Of Children Mark 2026-02-21T08:52:43.3877205Z .b8 0 // End Of Children Mark 2026-02-21T08:52:43.3877266Z } 2026-02-21T08:52:43.3877349Z .section .debug_macinfo { } 2026-02-21T08:52:43.3877355Z 2026-02-21T08:52:43.3877455Z ================================================================ 2026-02-21T08:52:43.3877584Z please share the reproducer above with Triton project. 
2026-02-21T08:52:43.9529844Z
2026-02-21T08:52:43.9529857Z
2026-02-21T08:52:43.9529862Z
2026-02-21T08:52:43.9530180Z ================================================================
2026-02-21T08:52:43.9530559Z Internal Triton PTX codegen error
2026-02-21T08:52:43.9530845Z `ptxas` stderr:
2026-02-21T08:52:43.9531569Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 417 in function _helion_matmul_bf16_int4. Try to compile with register target of 38 or higher.
2026-02-21T08:52:43.9532451Z ptxas fatal : Ptx assembly aborted due to errors
2026-02-21T08:52:43.9532694Z
2026-02-21T08:52:43.9533341Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp8a7fg57n.ptx -o /tmp/tmp8a7fg57n.ptx.o
2026-02-21T08:52:43.9534126Z
2026-02-21T08:52:43.9534133Z
2026-02-21T08:52:43.9534220Z //
2026-02-21T08:52:43.9534425Z // Generated by LLVM NVPTX Back-End
2026-02-21T08:52:43.9534660Z //
2026-02-21T08:52:43.9534740Z
2026-02-21T08:52:43.9534828Z .version 8.7
2026-02-21T08:52:43.9534986Z .target sm_90a
2026-02-21T08:52:43.9535151Z .address_size 64
2026-02-21T08:52:43.9535250Z
2026-02-21T08:52:43.9535457Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4
2026-02-21T08:52:43.9535845Z .extern .shared .align 16 .b8 global_smem[];
2026-02-21T08:52:43.9536134Z // @_helion_matmul_bf16_int4
2026-02-21T08:52:43.9537445Z [81s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config.
2026-02-21T08:52:43.9539165Z Config: @helion.kernel(config=helion.Config(block_sizes=[32, 64, 128], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[2], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=32, num_stages=7, num_warps=32, pid_type='persistent_interleaved', range_flattens=[False, True], range_multi_buffers=[False, False], range_num_stages=[2, 3], range_unroll_factors=[3, 0], range_warp_specializes=[]), static_shapes=True)
2026-02-21T08:52:43.9540778Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error
2026-02-21T08:52:43.9541095Z `ptxas` stderr:
2026-02-21T08:52:43.9541910Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 417 in function _helion_matmul_bf16_int4. Try to compile with register target of 38 or higher.
2026-02-21T08:52:43.9542626Z ptxas fatal : Ptx assembly aborted due to errors
2026-02-21T08:52:43.9542834Z
2026-02-21T08:52:43.9543386Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp8a7fg57n.ptx -o /tmp/tmp8a7fg57n.ptx.o
2026-02-21T08:52:43.9544026Z
2026-02-21T08:52:43.9544197Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code.
2026-02-21T08:52:43.9544536Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T08:52:43.9544876Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T08:52:43.9545250Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T08:52:43.9545604Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T08:52:43.9545949Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T08:52:43.9546295Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T08:52:43.9546746Z ) 2026-02-21T08:52:43.9546897Z .reqntid 1024 2026-02-21T08:52:43.9547050Z .maxnreg 32 2026-02-21T08:52:43.9547180Z { 2026-02-21T08:52:43.9547331Z .reg .pred %p<58>; 2026-02-21T08:52:43.9547500Z .reg .b16 %rs<297>; 2026-02-21T08:52:43.9547664Z .reg .b32 %r<1443>; 2026-02-21T08:52:43.9547815Z .reg .b64 %rd<152>; 2026-02-21T08:52:43.9548126Z .loc 1 14 0 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:14:0 2026-02-21T08:52:43.9548494Z $L__func_begin0: 2026-02-21T08:52:43.9548879Z .loc 1 14 0 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:14:0 2026-02-21T08:52:43.9549165Z 2026-02-21T08:52:43.9549232Z // %bb.0: 2026-02-21T08:52:43.9549428Z ld.param.b64 %rd17, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T08:52:43.9549755Z ld.param.b64 %rd16, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T08:52:43.9550052Z ld.param.b64 %rd15, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T08:52:43.9550293Z $L__tmp0: 2026-02-21T08:52:43.9550586Z .loc 1 19 46 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:19:46 2026-02-21T08:52:43.9550954Z mov.u32 %r1394, %ctaid.x; 2026-02-21T08:52:43.9551266Z .loc 1 0 0 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:0 2026-02-21T08:52:43.9551610Z sub.s32 %r182, 4279, %r1394; 2026-02-21T08:52:43.9551947Z .loc 1 19 144 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:19:144 2026-02-21T08:52:43.9552310Z mul.hi.u32 %r183, %r182, 
1041204193; 2026-02-21T08:52:43.9552514Z shr.u32 %r184, %r183, 10; 2026-02-21T08:52:43.9552695Z mul.hi.u32 %r185, %r184, 1431655766; 2026-02-21T08:52:43.9552907Z mad.lo.s32 %r1431, %r185, 12672, %r1394; 2026-02-21T08:52:43.9553268Z .loc 1 31 45 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:31:45 2026-02-21T08:52:43.9553622Z mov.u32 %r3, %tid.x; 2026-02-21T08:52:43.9553794Z shr.u32 %r4, %r3, 5; 2026-02-21T08:52:43.9553956Z and.b32 %r5, %r3, 1008; 2026-02-21T08:52:43.9554307Z shr.u32 %r6, %r3, 4; 2026-02-21T08:52:43.9554474Z and.b32 %r7, %r3, 15; 2026-02-21T08:52:43.9554650Z shl.b32 %r8, %r7, 2; 2026-02-21T08:52:43.9554974Z .loc 1 33 45 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:33:45 2026-02-21T08:52:43.9555340Z and.b32 %r9, %r3, 31; 2026-02-21T08:52:43.9555517Z shl.b32 %r10, %r9, 2; 2026-02-21T08:52:43.9555673Z shl.b32 %r11, %r7, 3; 2026-02-21T08:52:43.9556000Z .loc 1 65 38 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:65:38 2026-02-21T08:52:43.9556365Z and.b32 %r12, %r3, 128; 2026-02-21T08:52:43.9556839Z .loc 1 19 144 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:19:144 2026-02-21T08:52:43.9557235Z setp.ge.s32 %p1, %r1394, %r1431; 2026-02-21T08:52:43.9557585Z shl.b32 %r1377, %r3, 3; 2026-02-21T08:52:43.9557778Z shr.u32 %r1378, %r3, 1; 2026-02-21T08:52:43.9557946Z mov.b32 %r1379, global_smem; 2026-02-21T08:52:43.9558153Z mul.lo.s32 %r1380, %r4, 7168; 2026-02-21T08:52:43.9558340Z shl.b32 %r1381, %r3, 2; 2026-02-21T08:52:43.9558506Z shl.b32 %r1382, %r3, 6; 2026-02-21T08:52:43.9558662Z shl.b32 %r1383, %r3, 5; 2026-02-21T08:52:43.9558829Z shl.b32 %r1384, %r9, 1; 2026-02-21T08:52:43.9558989Z and.b32 %r1385, %r3, 127; 2026-02-21T08:52:43.9559165Z shl.b32 %r1386, %r3, 4; 2026-02-21T08:52:43.9559335Z and.b32 %r1387, %r4, 28; 2026-02-21T08:52:43.9559499Z and.b32 %r1388, %r3, 7; 2026-02-21T08:52:43.9559667Z shl.b32 %r1389, %r7, 4; 2026-02-21T08:52:43.9559828Z shr.u32 %r1390, %r3, 2; 2026-02-21T08:52:43.9559997Z and.b32 
%r1391, %r3, 16; 2026-02-21T08:52:43.9560160Z shl.b32 %r1392, %r3, 1; 2026-02-21T08:52:43.9560325Z shl.b32 %r1393, %r3, 7; 2026-02-21T08:52:43.9560493Z setp.eq.b32 %p57, %r12, 0; 2026-02-21T08:52:43.9560682Z @%p1 bra $L__BB0_9; 2026-02-21T08:52:43.9560865Z // %bb.1: // %.lr.ph 2026-02-21T08:52:43.9561254Z .loc 1 0 144 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:0:144 2026-02-21T08:52:43.9561625Z and.b32 %r188, %r1378, 56; 2026-02-21T08:52:43.9561801Z xor.b32 %r189, %r188, %r1377; 2026-02-21T08:52:43.9561991Z add.s32 %r191, %r1379, %r189; 2026-02-21T08:52:43.9562168Z add.s32 %r13, %r191, 32768; 2026-02-21T08:52:43.9562350Z add.s32 %r193, %r1379, 49152; 2026-02-21T08:52:43.9562526Z add.s32 %r15, %r193, %r1381; 2026-02-21T08:52:43.9562709Z add.s32 %r16, %r191, 40960; 2026-02-21T08:52:43.9562884Z add.s32 %r194, %r1379, %r1381; 2026-02-21T08:52:43.9563084Z add.s32 %r17, %r194, 53248; 2026-02-21T08:52:43.9563264Z and.b32 %r196, %r1382, 6144; 2026-02-21T08:52:43.9563436Z and.b32 %r198, %r1383, 896; 2026-02-21T08:52:43.9563613Z or.b32 %r200, %r196, %r198; 2026-02-21T08:52:43.9563780Z or.b32 %r18, %r200, %r1384; 2026-02-21T08:52:43.9563963Z xor.b32 %r19, %r18, 8; 2026-02-21T08:52:43.9564133Z xor.b32 %r20, %r18, 16; 2026-02-21T08:52:43.9564316Z xor.b32 %r21, %r18, 24; 2026-02-21T08:52:43.9564476Z xor.b32 %r22, %r18, 32; 2026-02-21T08:52:43.9564649Z xor.b32 %r23, %r18, 40; 2026-02-21T08:52:43.9564813Z xor.b32 %r24, %r18, 48; 2026-02-21T08:52:43.9564985Z xor.b32 %r25, %r18, 56; 2026-02-21T08:52:43.9565158Z and.b32 %r202, %r1378, 384; 2026-02-21T08:52:43.9565333Z add.s32 %r203, %r193, %r202; 2026-02-21T08:52:43.9565530Z add.s32 %r26, %r203, %r1385; 2026-02-21T08:52:43.9565706Z shl.b32 %r204, %r1385, 7; 2026-02-21T08:52:43.9565884Z and.b32 %r206, %r1386, 112; 2026-02-21T08:52:43.9566052Z or.b32 %r208, %r204, %r206; 2026-02-21T08:52:43.9566228Z xor.b32 %r209, %r208, %r1387; 2026-02-21T08:52:43.9566402Z add.s32 %r27, %r1379, %r209; 2026-02-21T08:52:43.9566718Z 
xor.b32 %r210, %r209, 32; 2026-02-21T08:52:43.9566891Z add.s32 %r28, %r1379, %r210; 2026-02-21T08:52:43.9567066Z xor.b32 %r211, %r209, 64; 2026-02-21T08:52:43.9567244Z add.s32 %r29, %r1379, %r211; 2026-02-21T08:52:43.9567412Z xor.b32 %r212, %r209, 96; 2026-02-21T08:52:43.9567585Z add.s32 %r30, %r1379, %r212; 2026-02-21T08:52:43.9567755Z shl.b32 %r214, %r1388, 11; 2026-02-21T08:52:43.9568074Z and.b32 %r216, %r3, 96; 2026-02-21T08:52:43.9568255Z shl.b32 %r217, %r216, 3; 2026-02-21T08:52:43.9568421Z and.b32 %r219, %r1390, 96; 2026-02-21T08:52:43.9568598Z and.b32 %r222, %r1392, 1024; 2026-02-21T08:52:43.9568769Z or.b32 %r223, %r1389, %r217; 2026-02-21T08:52:43.9568943Z or.b32 %r224, %r219, %r1391; 2026-02-21T08:52:43.9569111Z xor.b32 %r225, %r223, %r224; 2026-02-21T08:52:43.9569306Z add.s32 %r226, %r1379, %r214; 2026-02-21T08:52:43.9569481Z add.s32 %r227, %r226, %r222; 2026-02-21T08:52:43.9569657Z add.s32 %r31, %r227, %r225; 2026-02-21T08:52:43.9569832Z and.b32 %r229, %r1393, 15360; 2026-02-21T08:52:43.9570004Z shl.b32 %r230, %r1388, 4; 2026-02-21T08:52:43.9570184Z xor.b32 %r231, %r230, %r5; 2026-02-21T08:52:43.9570485Z add.s32 %r232, %r1379, %r229; 2026-02-21T08:52:43.9570672Z add.s32 %r32, %r232, %r231; 2026-02-21T08:52:43.9570844Z shl.b32 %r233, %r216, 6; 2026-02-21T08:52:43.9571014Z or.b32 %r234, %r233, %r198; 2026-02-21T08:52:43.9571186Z or.b32 %r33, %r234, %r1384; 2026-02-21T08:52:43.9571376Z xor.b32 %r34, %r33, 8; 2026-02-21T08:52:43.9571538Z xor.b32 %r35, %r33, 16; 2026-02-21T08:52:43.9571706Z xor.b32 %r36, %r33, 24; 2026-02-21T08:52:43.9571873Z xor.b32 %r37, %r33, 32; 2026-02-21T08:52:43.9572031Z xor.b32 %r38, %r33, 40; 2026-02-21T08:52:43.9572193Z xor.b32 %r39, %r33, 48; 2026-02-21T08:52:43.9572350Z xor.b32 %r40, %r33, 56; 2026-02-21T08:52:43.9572515Z or.b32 %r235, %r204, %r230; 2026-02-21T08:52:43.9572683Z xor.b32 %r236, %r235, %r1387; 2026-02-21T08:52:43.9572861Z add.s32 %r41, %r1379, %r236; 2026-02-21T08:52:43.9573033Z xor.b32 %r237, %r236, 32; 
2026-02-21T08:52:43.9573203Z add.s32 %r42, %r1379, %r237; 2026-02-21T08:52:43.9573377Z xor.b32 %r238, %r236, 64; 2026-02-21T08:52:43.9573542Z add.s32 %r43, %r1379, %r238; 2026-02-21T08:52:43.9573723Z xor.b32 %r239, %r236, 96; 2026-02-21T08:52:43.9573896Z add.s32 %r44, %r1379, %r239; 2026-02-21T08:52:43.9574234Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.9574597Z or.b32 %r240, %r1380, %r10; 2026-02-21T08:52:43.9574772Z add.s32 %r45, %r240, 458752; 2026-02-21T08:52:43.9575104Z .loc 1 19 144 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:19:144 2026-02-21T08:52:43.9575459Z shl.b32 %r241, %r6, 13; 2026-02-21T08:52:43.9575772Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.9576119Z or.b32 %r242, %r241, %r8; 2026-02-21T08:52:43.9576304Z or.b32 %r46, %r242, 128; 2026-02-21T08:52:43.9576593Z cvt.u64.u32 %rd2, %r8; 2026-02-21T08:52:43.9576782Z cvt.u64.u32 %rd3, %r1380; 2026-02-21T08:52:43.9577017Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T08:52:43.9577320Z // Child Loop BB0_3 Depth 2 2026-02-21T08:52:43.9577592Z // Child Loop BB0_5 Depth 2 2026-02-21T08:52:43.9577855Z // Child Loop BB0_7 Depth 2 2026-02-21T08:52:43.9578244Z .loc 1 25 35 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:25:35 2026-02-21T08:52:43.9578610Z mul.hi.s32 %r254, %r1394, -1840700269; 2026-02-21T08:52:43.9578818Z add.s32 %r255, %r254, %r1394; 2026-02-21T08:52:43.9578993Z shr.u32 %r256, %r255, 31; 2026-02-21T08:52:43.9579177Z shr.s32 %r257, %r255, 6; 2026-02-21T08:52:43.9579348Z add.s32 %r258, %r257, %r256; 2026-02-21T08:52:43.9579669Z .loc 1 26 33 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:26:33 2026-02-21T08:52:43.9580024Z shl.b32 %r259, %r258, 1; 2026-02-21T08:52:43.9580333Z .loc 1 27 39 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:27:39 2026-02-21T08:52:43.9580682Z sub.s32 %r260, 1, %r259; 2026-02-21T08:52:43.9580990Z .loc 1 27 
52 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:27:52 2026-02-21T08:52:43.9581520Z min.s32 %r261, %r260, 2; 2026-02-21T08:52:43.9581845Z .loc 1 28 45 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:45 2026-02-21T08:52:43.9582203Z mul.lo.s32 %r262, %r258, 112; 2026-02-21T08:52:43.9582404Z sub.s32 %r263, %r1394, %r262; 2026-02-21T08:52:43.9582718Z .loc 1 29 51 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:29:51 2026-02-21T08:52:43.9583072Z div.s32 %r264, %r263, %r261; 2026-02-21T08:52:43.9583387Z .loc 1 28 64 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:64 2026-02-21T08:52:43.9583758Z mul.lo.s32 %r265, %r264, %r261; 2026-02-21T08:52:43.9583947Z sub.s32 %r266, %r263, %r265; 2026-02-21T08:52:43.9584402Z .loc 1 28 30 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:30 2026-02-21T08:52:43.9584763Z add.s32 %r267, %r266, %r259; 2026-02-21T08:52:43.9585073Z .loc 1 30 27 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:30:27 2026-02-21T08:52:43.9585420Z shl.b32 %r268, %r267, 6; 2026-02-21T08:52:43.9585723Z .loc 1 31 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:31:32 2026-02-21T08:52:43.9586068Z or.b32 %r70, %r268, %r6; 2026-02-21T08:52:43.9586365Z .loc 1 32 27 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:32:27 2026-02-21T08:52:43.9586839Z shl.b32 %r71, %r264, 7; 2026-02-21T08:52:43.9587159Z .loc 1 33 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:33:32 2026-02-21T08:52:43.9587519Z or.b32 %r269, %r71, %r10; 2026-02-21T08:52:43.9587832Z .loc 1 48 53 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:53 2026-02-21T08:52:43.9588172Z shl.b32 %r270, %r70, 13; 2026-02-21T08:52:43.9588491Z .loc 1 48 60 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:60 2026-02-21T08:52:43.9588913Z or.b32 %r271, %r270, %r8; 2026-02-21T08:52:43.9589220Z .loc 1 48 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:32 
2026-02-21T08:52:43.9589577Z mad.wide.s32 %rd18, %r271, 2, %rd15; 2026-02-21T08:52:43.9589774Z mov.b32 %r244, 8; 2026-02-21T08:52:43.9590068Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.9590408Z // begin inline asm 2026-02-21T08:52:43.9590650Z cp.async.ca.shared.global [ %r13 + 0 ], [ %rd18 + 0 ], 0x8, %r244; 2026-02-21T08:52:43.9590928Z // end inline asm 2026-02-21T08:52:43.9591091Z cp.async.commit_group; 2026-02-21T08:52:43.9591404Z .loc 1 54 62 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:62 2026-02-21T08:52:43.9591750Z add.s32 %r272, %r269, %r1380; 2026-02-21T08:52:43.9592072Z .loc 1 54 34 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:34 2026-02-21T08:52:43.9592417Z cvt.s64.s32 %rd23, %r272; 2026-02-21T08:52:43.9592609Z add.s64 %rd19, %rd16, %rd23; 2026-02-21T08:52:43.9592783Z mov.b32 %r246, 4; 2026-02-21T08:52:43.9593075Z .loc 1 54 87 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:87 2026-02-21T08:52:43.9593421Z // begin inline asm 2026-02-21T08:52:43.9593649Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd19 + 0 ], 0x4, %r246; 2026-02-21T08:52:43.9593922Z // end inline asm 2026-02-21T08:52:43.9594078Z cp.async.commit_group; 2026-02-21T08:52:43.9594391Z .loc 1 48 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:32 2026-02-21T08:52:43.9594747Z cvt.s64.s32 %rd24, %r270; 2026-02-21T08:52:43.9594930Z or.b64 %rd25, %rd24, %rd2; 2026-02-21T08:52:43.9595108Z shl.b64 %rd26, %rd25, 1; 2026-02-21T08:52:43.9595285Z add.s64 %rd27, %rd15, %rd26; 2026-02-21T08:52:43.9595471Z add.s64 %rd20, %rd27, 128; 2026-02-21T08:52:43.9595789Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.9596313Z bar.sync 0; 2026-02-21T08:52:43.9596594Z // begin inline asm 2026-02-21T08:52:43.9596848Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd20 + 0 ], 0x8, %r244; 2026-02-21T08:52:43.9597124Z // end inline asm 
2026-02-21T08:52:43.9597302Z cp.async.commit_group; 2026-02-21T08:52:43.9597615Z .loc 1 54 34 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:34 2026-02-21T08:52:43.9597971Z cvt.s64.s32 %rd28, %r269; 2026-02-21T08:52:43.9598153Z add.s64 %rd29, %rd28, %rd3; 2026-02-21T08:52:43.9598336Z add.s64 %rd30, %rd16, %rd29; 2026-02-21T08:52:43.9598520Z add.s64 %rd21, %rd30, 229376; 2026-02-21T08:52:43.9598969Z .loc 1 54 87 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:87 2026-02-21T08:52:43.9599321Z // begin inline asm 2026-02-21T08:52:43.9599546Z cp.async.ca.shared.global [ %r17 + 0 ], [ %rd21 + 0 ], 0x4, %r246; 2026-02-21T08:52:43.9599821Z // end inline asm 2026-02-21T08:52:43.9599974Z cp.async.commit_group; 2026-02-21T08:52:43.9600291Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.9600651Z add.s32 %r1396, %r45, %r71; 2026-02-21T08:52:43.9600844Z shl.b32 %r273, %r267, 19; 2026-02-21T08:52:43.9601023Z or.b32 %r1395, %r46, %r273; 2026-02-21T08:52:43.9601193Z mov.b32 %r1399, 0f00000000; 2026-02-21T08:52:43.9601366Z mov.b32 %r1398, 1; 2026-02-21T08:52:43.9601517Z mov.b32 %r1397, -1; 2026-02-21T08:52:43.9601681Z mov.b64 %rd147, -32; 2026-02-21T08:52:43.9601845Z mov.b32 %r1400, %r1399; 2026-02-21T08:52:43.9602016Z mov.b32 %r1401, %r1399; 2026-02-21T08:52:43.9602184Z mov.b32 %r1402, %r1399; 2026-02-21T08:52:43.9602347Z mov.b32 %r1403, %r1399; 2026-02-21T08:52:43.9602511Z mov.b32 %r1404, %r1399; 2026-02-21T08:52:43.9602669Z mov.b32 %r1405, %r1399; 2026-02-21T08:52:43.9602835Z mov.b32 %r1406, %r1399; 2026-02-21T08:52:43.9603055Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:43.9603359Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:43.9603625Z add.s64 %rd147, %rd147, 32; 2026-02-21T08:52:43.9603817Z setp.lt.u64 %p11, %rd147, 4032; 2026-02-21T08:52:43.9604013Z add.s32 %r460, %r1397, 1; 2026-02-21T08:52:43.9604202Z setp.gt.s32 %p12, %r460, 1; 2026-02-21T08:52:43.9604391Z selp.b32 
%r1397, 0, %r460, %p12; 2026-02-21T08:52:43.9604727Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.9605105Z cp.async.wait_group 2; 2026-02-21T08:52:43.9605273Z bar.sync 0; 2026-02-21T08:52:43.9605426Z shl.b32 %r461, %r1397, 12; 2026-02-21T08:52:43.9605600Z shl.b32 %r462, %r1397, 13; 2026-02-21T08:52:43.9605776Z add.s32 %r463, %r1379, %r462; 2026-02-21T08:52:43.9605958Z add.s32 %r464, %r463, 32768; 2026-02-21T08:52:43.9606277Z .loc 1 52 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:52:32 2026-02-21T08:52:43.9606773Z add.s32 %r465, %r464, %r18; 2026-02-21T08:52:43.9606969Z ld.shared.b16 %rs1, [%r465]; 2026-02-21T08:52:43.9607167Z ld.shared.b16 %rs2, [%r465+1024]; 2026-02-21T08:52:43.9607367Z ld.shared.b16 %rs3, [%r465+64]; 2026-02-21T08:52:43.9607561Z ld.shared.b16 %rs4, [%r465+1088]; 2026-02-21T08:52:43.9607754Z add.s32 %r466, %r464, %r19; 2026-02-21T08:52:43.9607930Z ld.shared.b16 %rs5, [%r466]; 2026-02-21T08:52:43.9608113Z ld.shared.b16 %rs6, [%r466+1024]; 2026-02-21T08:52:43.9608312Z ld.shared.b16 %rs7, [%r466+64]; 2026-02-21T08:52:43.9608495Z ld.shared.b16 %rs8, [%r466+1088]; 2026-02-21T08:52:43.9608692Z add.s32 %r467, %r464, %r20; 2026-02-21T08:52:43.9608868Z ld.shared.b16 %rs9, [%r467]; 2026-02-21T08:52:43.9609057Z ld.shared.b16 %rs10, [%r467+1024]; 2026-02-21T08:52:43.9609267Z ld.shared.b16 %rs11, [%r467+64]; 2026-02-21T08:52:43.9609466Z ld.shared.b16 %rs12, [%r467+1088]; 2026-02-21T08:52:43.9609795Z add.s32 %r468, %r464, %r21; 2026-02-21T08:52:43.9609985Z ld.shared.b16 %rs13, [%r468]; 2026-02-21T08:52:43.9610180Z ld.shared.b16 %rs14, [%r468+1024]; 2026-02-21T08:52:43.9610374Z ld.shared.b16 %rs15, [%r468+64]; 2026-02-21T08:52:43.9610581Z ld.shared.b16 %rs16, [%r468+1088]; 2026-02-21T08:52:43.9610776Z add.s32 %r469, %r464, %r22; 2026-02-21T08:52:43.9610959Z ld.shared.b16 %rs17, [%r469]; 2026-02-21T08:52:43.9611142Z ld.shared.b16 %rs18, [%r469+1024]; 2026-02-21T08:52:43.9611339Z ld.shared.b16 
%rs19, [%r469+64]; 2026-02-21T08:52:43.9611528Z ld.shared.b16 %rs20, [%r469+1088]; 2026-02-21T08:52:43.9611722Z add.s32 %r470, %r464, %r23; 2026-02-21T08:52:43.9611907Z ld.shared.b16 %rs21, [%r470]; 2026-02-21T08:52:43.9612099Z ld.shared.b16 %rs22, [%r470+1024]; 2026-02-21T08:52:43.9612435Z ld.shared.b16 %rs23, [%r470+64]; 2026-02-21T08:52:43.9612628Z ld.shared.b16 %rs24, [%r470+1088]; 2026-02-21T08:52:43.9612823Z add.s32 %r471, %r464, %r24; 2026-02-21T08:52:43.9612995Z ld.shared.b16 %rs25, [%r471]; 2026-02-21T08:52:43.9613183Z ld.shared.b16 %rs26, [%r471+1024]; 2026-02-21T08:52:43.9613388Z ld.shared.b16 %rs27, [%r471+64]; 2026-02-21T08:52:43.9613584Z ld.shared.b16 %rs28, [%r471+1088]; 2026-02-21T08:52:43.9613772Z add.s32 %r472, %r464, %r25; 2026-02-21T08:52:43.9613951Z ld.shared.b16 %rs29, [%r472]; 2026-02-21T08:52:43.9614158Z ld.shared.b16 %rs30, [%r472+1024]; 2026-02-21T08:52:43.9614367Z ld.shared.b16 %rs31, [%r472+64]; 2026-02-21T08:52:43.9614573Z ld.shared.b16 %rs32, [%r472+1088]; 2026-02-21T08:52:43.9614767Z cvt.f32.bf16 %r290, %rs1; 2026-02-21T08:52:43.9614953Z cvt.f32.bf16 %r291, %rs2; 2026-02-21T08:52:43.9615123Z cvt.f32.bf16 %r292, %rs5; 2026-02-21T08:52:43.9615294Z cvt.f32.bf16 %r293, %rs6; 2026-02-21T08:52:43.9615461Z cvt.f32.bf16 %r310, %rs9; 2026-02-21T08:52:43.9615639Z cvt.f32.bf16 %r311, %rs10; 2026-02-21T08:52:43.9615817Z cvt.f32.bf16 %r312, %rs13; 2026-02-21T08:52:43.9615993Z cvt.f32.bf16 %r313, %rs14; 2026-02-21T08:52:43.9616173Z cvt.f32.bf16 %r330, %rs17; 2026-02-21T08:52:43.9616362Z cvt.f32.bf16 %r331, %rs18; 2026-02-21T08:52:43.9616661Z cvt.f32.bf16 %r332, %rs21; 2026-02-21T08:52:43.9616845Z cvt.f32.bf16 %r333, %rs22; 2026-02-21T08:52:43.9617039Z cvt.f32.bf16 %r350, %rs25; 2026-02-21T08:52:43.9617216Z cvt.f32.bf16 %r351, %rs26; 2026-02-21T08:52:43.9617386Z cvt.f32.bf16 %r352, %rs29; 2026-02-21T08:52:43.9617568Z cvt.f32.bf16 %r353, %rs30; 2026-02-21T08:52:43.9617738Z cvt.f32.bf16 %r370, %rs3; 2026-02-21T08:52:43.9617914Z cvt.f32.bf16 %r371, 
%rs4; 2026-02-21T08:52:43.9618081Z cvt.f32.bf16 %r372, %rs7; 2026-02-21T08:52:43.9618256Z cvt.f32.bf16 %r373, %rs8; 2026-02-21T08:52:43.9618448Z cvt.f32.bf16 %r390, %rs11; 2026-02-21T08:52:43.9618623Z cvt.f32.bf16 %r391, %rs12; 2026-02-21T08:52:43.9618805Z cvt.f32.bf16 %r392, %rs15; 2026-02-21T08:52:43.9618978Z cvt.f32.bf16 %r393, %rs16; 2026-02-21T08:52:43.9619155Z cvt.f32.bf16 %r410, %rs19; 2026-02-21T08:52:43.9619326Z cvt.f32.bf16 %r411, %rs20; 2026-02-21T08:52:43.9619515Z cvt.f32.bf16 %r412, %rs23; 2026-02-21T08:52:43.9619686Z cvt.f32.bf16 %r413, %rs24; 2026-02-21T08:52:43.9619861Z cvt.f32.bf16 %r430, %rs27; 2026-02-21T08:52:43.9620042Z cvt.f32.bf16 %r431, %rs28; 2026-02-21T08:52:43.9620218Z cvt.f32.bf16 %r432, %rs31; 2026-02-21T08:52:43.9620390Z cvt.f32.bf16 %r433, %rs32; 2026-02-21T08:52:43.9620706Z .loc 1 67 45 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:67:45 2026-02-21T08:52:43.9621069Z add.s32 %r473, %r26, %r461; 2026-02-21T08:52:43.9621257Z ld.shared.b8 %rs33, [%r473]; 2026-02-21T08:52:43.9621449Z ld.shared.b8 %rs34, [%r473+512]; 2026-02-21T08:52:43.9621643Z ld.shared.b8 %rs35, [%r473+1024]; 2026-02-21T08:52:43.9621846Z ld.shared.b8 %rs36, [%r473+1536]; 2026-02-21T08:52:43.9622039Z ld.shared.b8 %rs37, [%r473+2048]; 2026-02-21T08:52:43.9622237Z ld.shared.b8 %rs38, [%r473+2560]; 2026-02-21T08:52:43.9622429Z ld.shared.b8 %rs39, [%r473+3072]; 2026-02-21T08:52:43.9622620Z ld.shared.b8 %rs40, [%r473+3584]; 2026-02-21T08:52:43.9623121Z .loc 1 57 28 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:57:28 2026-02-21T08:52:43.9623477Z shl.b16 %rs41, %rs33, 4; 2026-02-21T08:52:43.9623654Z shl.b16 %rs42, %rs34, 4; 2026-02-21T08:52:43.9623821Z shl.b16 %rs43, %rs35, 4; 2026-02-21T08:52:43.9623995Z shl.b16 %rs44, %rs36, 4; 2026-02-21T08:52:43.9624172Z shl.b16 %rs45, %rs37, 4; 2026-02-21T08:52:43.9624346Z shl.b16 %rs46, %rs38, 4; 2026-02-21T08:52:43.9624517Z shl.b16 %rs47, %rs39, 4; 2026-02-21T08:52:43.9624680Z shl.b16 %rs48, %rs40, 4; 
2026-02-21T08:52:43.9624991Z .loc 1 72 58 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:72:58 2026-02-21T08:52:43.9625349Z selp.b16 %rs49, %rs41, %rs33, %p57; 2026-02-21T08:52:43.9625712Z cvt.s16.s8 %rs50, %rs49; 2026-02-21T08:52:43.9625891Z shr.s16 %rs51, %rs50, 4; 2026-02-21T08:52:43.9626071Z selp.b16 %rs52, %rs42, %rs34, %p57; 2026-02-21T08:52:43.9626265Z cvt.s16.s8 %rs53, %rs52; 2026-02-21T08:52:43.9626441Z shr.s16 %rs54, %rs53, 4; 2026-02-21T08:52:43.9626851Z selp.b16 %rs55, %rs43, %rs35, %p57; 2026-02-21T08:52:43.9627047Z cvt.s16.s8 %rs56, %rs55; 2026-02-21T08:52:43.9627223Z shr.s16 %rs57, %rs56, 4; 2026-02-21T08:52:43.9627406Z selp.b16 %rs58, %rs44, %rs36, %p57; 2026-02-21T08:52:43.9627612Z cvt.s16.s8 %rs59, %rs58; 2026-02-21T08:52:43.9627785Z shr.s16 %rs60, %rs59, 4; 2026-02-21T08:52:43.9627970Z selp.b16 %rs61, %rs45, %rs37, %p57; 2026-02-21T08:52:43.9628168Z cvt.s16.s8 %rs62, %rs61; 2026-02-21T08:52:43.9628342Z shr.s16 %rs63, %rs62, 4; 2026-02-21T08:52:43.9628583Z selp.b16 %rs64, %rs46, %rs38, %p57; 2026-02-21T08:52:43.9628803Z cvt.s16.s8 %rs65, %rs64; 2026-02-21T08:52:43.9628978Z shr.s16 %rs66, %rs65, 4; 2026-02-21T08:52:43.9629156Z selp.b16 %rs67, %rs47, %rs39, %p57; 2026-02-21T08:52:43.9629358Z cvt.s16.s8 %rs68, %rs67; 2026-02-21T08:52:43.9629527Z shr.s16 %rs69, %rs68, 4; 2026-02-21T08:52:43.9629709Z selp.b16 %rs70, %rs48, %rs40, %p57; 2026-02-21T08:52:43.9629904Z cvt.s16.s8 %rs71, %rs70; 2026-02-21T08:52:43.9630074Z shr.s16 %rs72, %rs71, 4; 2026-02-21T08:52:43.9630384Z .loc 1 77 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:77:32 2026-02-21T08:52:43.9630748Z cvt.rn.f32.s16 %r474, %rs51; 2026-02-21T08:52:43.9630935Z cvt.rn.f32.s16 %r475, %rs54; 2026-02-21T08:52:43.9631112Z cvt.rn.f32.s16 %r476, %rs57; 2026-02-21T08:52:43.9631289Z cvt.rn.f32.s16 %r477, %rs60; 2026-02-21T08:52:43.9631463Z cvt.rn.f32.s16 %r478, %rs63; 2026-02-21T08:52:43.9631645Z cvt.rn.f32.s16 %r479, %rs66; 2026-02-21T08:52:43.9631818Z cvt.rn.f32.s16 %r480, 
%rs69; 2026-02-21T08:52:43.9631995Z cvt.rn.f32.s16 %r481, %rs72; 2026-02-21T08:52:43.9632167Z st.shared.b32 [%r27], %r474; 2026-02-21T08:52:43.9632357Z st.shared.b32 [%r27+16384], %r478; 2026-02-21T08:52:43.9632556Z st.shared.b32 [%r28], %r475; 2026-02-21T08:52:43.9632755Z st.shared.b32 [%r28+16384], %r479; 2026-02-21T08:52:43.9632951Z st.shared.b32 [%r29], %r476; 2026-02-21T08:52:43.9633132Z st.shared.b32 [%r29+16384], %r480; 2026-02-21T08:52:43.9633328Z st.shared.b32 [%r30], %r477; 2026-02-21T08:52:43.9633507Z st.shared.b32 [%r30+16384], %r481; 2026-02-21T08:52:43.9633694Z $L__tmp1: 2026-02-21T08:52:43.9634057Z .loc 2 291 36 // standard.py:291:36 @[ cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:84:40 ] 2026-02-21T08:52:43.9634492Z // begin inline asm 2026-02-21T08:52:43.9634691Z fence.proxy.async.shared::cta; 2026-02-21T08:52:43.9634891Z // end inline asm 2026-02-21T08:52:43.9635044Z bar.sync 0; 2026-02-21T08:52:43.9635211Z shfl.sync.idx.b32 %r482, %r4, 0, 31, -1; 2026-02-21T08:52:43.9635452Z wgmma.fence.sync.aligned; 2026-02-21T08:52:43.9635635Z shl.b32 %r483, %r482, 9; 2026-02-21T08:52:43.9635813Z and.b32 %r484, %r483, 14336; 2026-02-21T08:52:43.9635992Z add.s32 %r485, %r484, %r1379; 2026-02-21T08:52:43.9636179Z bfe.u32 %r486, %r485, 4, 14; 2026-02-21T08:52:43.9636354Z cvt.u64.u32 %rd41, %r486; 2026-02-21T08:52:43.9636874Z or.b64 %rd31, %rd41, 4611686293372403712; 2026-02-21T08:52:43.9637115Z mov.pred %p2, -1; 2026-02-21T08:52:43.9637281Z // begin inline asm 2026-02-21T08:52:43.9637762Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1399,%r1400,%r1401,%r1402,%r1403,%r1404,%r1405,%r1406}, {%r290,%r291,%r292,%r293}, %rd31, %p2, 1, 1; 2026-02-21T08:52:43.9638280Z // end inline asm 2026-02-21T08:52:43.9638438Z add.s32 %r487, %r485, 32; 2026-02-21T08:52:43.9638615Z bfe.u32 %r488, %r487, 4, 14; 2026-02-21T08:52:43.9638805Z cvt.u64.u32 %rd42, %r488; 2026-02-21T08:52:43.9638988Z or.b64 %rd32, %rd42, 4611686293372403712; 2026-02-21T08:52:43.9639200Z // 
begin inline asm 2026-02-21T08:52:43.9639793Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1399,%r1400,%r1401,%r1402,%r1403,%r1404,%r1405,%r1406}, {%r310,%r311,%r312,%r313}, %rd32, %p2, 1, 1; 2026-02-21T08:52:43.9640307Z // end inline asm 2026-02-21T08:52:43.9640465Z add.s32 %r489, %r485, 64; 2026-02-21T08:52:43.9640635Z bfe.u32 %r490, %r489, 4, 14; 2026-02-21T08:52:43.9640828Z cvt.u64.u32 %rd43, %r490; 2026-02-21T08:52:43.9641013Z or.b64 %rd33, %rd43, 4611686293372403712; 2026-02-21T08:52:43.9641227Z // begin inline asm 2026-02-21T08:52:43.9641692Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1399,%r1400,%r1401,%r1402,%r1403,%r1404,%r1405,%r1406}, {%r330,%r331,%r332,%r333}, %rd33, %p2, 1, 1; 2026-02-21T08:52:43.9642221Z // end inline asm 2026-02-21T08:52:43.9642378Z add.s32 %r491, %r485, 96; 2026-02-21T08:52:43.9642549Z bfe.u32 %r492, %r491, 4, 14; 2026-02-21T08:52:43.9642732Z cvt.u64.u32 %rd44, %r492; 2026-02-21T08:52:43.9642906Z or.b64 %rd34, %rd44, 4611686293372403712; 2026-02-21T08:52:43.9643113Z // begin inline asm 2026-02-21T08:52:43.9643569Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1399,%r1400,%r1401,%r1402,%r1403,%r1404,%r1405,%r1406}, {%r350,%r351,%r352,%r353}, %rd34, %p2, 1, 1; 2026-02-21T08:52:43.9644077Z // end inline asm 2026-02-21T08:52:43.9644235Z add.s32 %r493, %r485, 16384; 2026-02-21T08:52:43.9644412Z bfe.u32 %r494, %r493, 4, 14; 2026-02-21T08:52:43.9644588Z cvt.u64.u32 %rd45, %r494; 2026-02-21T08:52:43.9644762Z or.b64 %rd35, %rd45, 4611686293372403712; 2026-02-21T08:52:43.9644979Z // begin inline asm 2026-02-21T08:52:43.9645431Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1399,%r1400,%r1401,%r1402,%r1403,%r1404,%r1405,%r1406}, {%r370,%r371,%r372,%r373}, %rd35, %p2, 1, 1; 2026-02-21T08:52:43.9645936Z // end inline asm 2026-02-21T08:52:43.9646085Z add.s32 %r495, %r485, 16416; 2026-02-21T08:52:43.9646266Z bfe.u32 %r496, %r495, 4, 14; 2026-02-21T08:52:43.9646445Z cvt.u64.u32 %rd46, %r496; 
2026-02-21T08:52:43.9646764Z or.b64 %rd36, %rd46, 4611686293372403712; 2026-02-21T08:52:43.9646984Z // begin inline asm 2026-02-21T08:52:43.9647438Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1399,%r1400,%r1401,%r1402,%r1403,%r1404,%r1405,%r1406}, {%r390,%r391,%r392,%r393}, %rd36, %p2, 1, 1; 2026-02-21T08:52:43.9647943Z // end inline asm 2026-02-21T08:52:43.9648095Z add.s32 %r497, %r485, 16448; 2026-02-21T08:52:43.9648273Z bfe.u32 %r498, %r497, 4, 14; 2026-02-21T08:52:43.9648447Z cvt.u64.u32 %rd47, %r498; 2026-02-21T08:52:43.9648631Z or.b64 %rd37, %rd47, 4611686293372403712; 2026-02-21T08:52:43.9648834Z // begin inline asm 2026-02-21T08:52:43.9649300Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1399,%r1400,%r1401,%r1402,%r1403,%r1404,%r1405,%r1406}, {%r410,%r411,%r412,%r413}, %rd37, %p2, 1, 1; 2026-02-21T08:52:43.9649806Z // end inline asm 2026-02-21T08:52:43.9649955Z add.s32 %r499, %r485, 16480; 2026-02-21T08:52:43.9650143Z bfe.u32 %r500, %r499, 4, 14; 2026-02-21T08:52:43.9650316Z cvt.u64.u32 %rd48, %r500; 2026-02-21T08:52:43.9650497Z or.b64 %rd38, %rd48, 4611686293372403712; 2026-02-21T08:52:43.9650701Z // begin inline asm 2026-02-21T08:52:43.9651153Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1399,%r1400,%r1401,%r1402,%r1403,%r1404,%r1405,%r1406}, {%r430,%r431,%r432,%r433}, %rd38, %p2, 1, 1; 2026-02-21T08:52:43.9651824Z // end inline asm 2026-02-21T08:52:43.9651996Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:43.9652199Z mov.b32 %r443, 0; 2026-02-21T08:52:43.9652352Z mov.b32 %r442, %r1379; 2026-02-21T08:52:43.9652525Z mov.b32 %r444, %r443; 2026-02-21T08:52:43.9652684Z // begin inline asm 2026-02-21T08:52:43.9652955Z // wait for regs: %r1399,%r1400,%r1401,%r1402,%r1403,%r1404,%r1405,%r1406,%r442,%r443,%r444 2026-02-21T08:52:43.9653300Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:43.9653496Z // end inline asm 2026-02-21T08:52:43.9653652Z $L__tmp2: 2026-02-21T08:52:43.9653951Z .loc 1 40 103 // 
cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.9654335Z add.s32 %r501, %r1398, 1; 2026-02-21T08:52:43.9654517Z setp.gt.s32 %p13, %r501, 1; 2026-02-21T08:52:43.9654843Z selp.b32 %r1398, 0, %r501, %p13; 2026-02-21T08:52:43.9655182Z .loc 1 48 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:32 2026-02-21T08:52:43.9655575Z mad.wide.s32 %rd39, %r1395, 2, %rd15; 2026-02-21T08:52:43.9655924Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.9656273Z shl.b32 %r502, %r1398, 12; 2026-02-21T08:52:43.9656565Z shl.b32 %r503, %r1398, 13; 2026-02-21T08:52:43.9656746Z add.s32 %r456, %r13, %r503; 2026-02-21T08:52:43.9656942Z selp.b32 %r457, 8, 0, %p11; 2026-02-21T08:52:43.9657116Z // begin inline asm 2026-02-21T08:52:43.9657360Z cp.async.ca.shared.global [ %r456 + 0 ], [ %rd39 + 0 ], 0x8, %r457; 2026-02-21T08:52:43.9657636Z // end inline asm 2026-02-21T08:52:43.9657798Z cp.async.commit_group; 2026-02-21T08:52:43.9657998Z .loc 1 54 34 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:34 2026-02-21T08:52:43.9658074Z cvt.s64.s32 %rd49, %r1396; 2026-02-21T08:52:43.9658139Z add.s64 %rd40, %rd16, %rd49; 2026-02-21T08:52:43.9658337Z .loc 1 54 87 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:87 2026-02-21T08:52:43.9658408Z add.s32 %r458, %r15, %r502; 2026-02-21T08:52:43.9658471Z selp.b32 %r459, 4, 0, %p11; 2026-02-21T08:52:43.9658530Z // begin inline asm 2026-02-21T08:52:43.9658665Z cp.async.ca.shared.global [ %r458 + 0 ], [ %rd40 + 0 ], 0x4, %r459; 2026-02-21T08:52:43.9658728Z // end inline asm 2026-02-21T08:52:43.9658794Z cp.async.commit_group; 2026-02-21T08:52:43.9658997Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.9659067Z add.s32 %r1396, %r1396, 229376; 2026-02-21T08:52:43.9659127Z add.s32 %r1395, %r1395, 64; 2026-02-21T08:52:43.9659197Z setp.lt.u64 %p14, %rd147, 4064; 2026-02-21T08:52:43.9659264Z @%p14 
bra $L__BB0_3; 2026-02-21T08:52:43.9659390Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:43.9659595Z .loc 1 33 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:33:32 2026-02-21T08:52:43.9659660Z or.b32 %r519, %r71, %r11; 2026-02-21T08:52:43.9659876Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.9659946Z cp.async.wait_group 0; 2026-02-21T08:52:43.9660004Z bar.sync 0; 2026-02-21T08:52:43.9660203Z .loc 1 87 28 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:87:28 2026-02-21T08:52:43.9660283Z cvt.rn.bf16x2.f32 %r520, %r1400, %r1399; 2026-02-21T08:52:43.9660358Z cvt.rn.bf16x2.f32 %r521, %r1402, %r1401; 2026-02-21T08:52:43.9660436Z cvt.rn.bf16x2.f32 %r522, %r1404, %r1403; 2026-02-21T08:52:43.9660507Z cvt.rn.bf16x2.f32 %r523, %r1406, %r1405; 2026-02-21T08:52:43.9660706Z .loc 1 88 50 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:88:50 2026-02-21T08:52:43.9660776Z mad.lo.s32 %r524, %r70, 7168, %r519; 2026-02-21T08:52:43.9660979Z .loc 1 88 22 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:88:22 2026-02-21T08:52:43.9661194Z mad.wide.s32 %rd50, %r524, 2, %rd17; 2026-02-21T08:52:43.9661401Z .loc 1 88 81 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:88:81 2026-02-21T08:52:43.9661592Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r31], {%r520, %r521, %r522, %r523}; 2026-02-21T08:52:43.9661652Z bar.sync 0; 2026-02-21T08:52:43.9661760Z ld.shared.v4.b32 {%r504, %r505, %r506, %r507}, [%r32]; 2026-02-21T08:52:43.9661827Z // begin inline asm 2026-02-21T08:52:43.9661947Z st.global.v4.b32 [ %rd50 + 0 ], { %r504, %r505, %r506, %r507 }; 2026-02-21T08:52:43.9662006Z // end inline asm 2026-02-21T08:52:43.9662222Z .loc 1 19 144 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:19:144 2026-02-21T08:52:43.9662296Z add.s32 %r525, %r1394, 4224; 2026-02-21T08:52:43.9662620Z .loc 1 25 35 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:25:35 
2026-02-21T08:52:43.9662701Z mul.hi.s32 %r526, %r525, -1840700269; 2026-02-21T08:52:43.9662777Z add.s32 %r527, %r526, %r525; 2026-02-21T08:52:43.9662840Z shr.u32 %r528, %r527, 31; 2026-02-21T08:52:43.9662904Z shr.s32 %r529, %r527, 6; 2026-02-21T08:52:43.9662971Z add.s32 %r530, %r529, %r528; 2026-02-21T08:52:43.9663185Z .loc 1 26 33 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:26:33 2026-02-21T08:52:43.9663252Z shl.b32 %r531, %r530, 1; 2026-02-21T08:52:43.9663454Z .loc 1 27 39 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:27:39 2026-02-21T08:52:43.9663536Z sub.s32 %r532, 1, %r531; 2026-02-21T08:52:43.9663737Z .loc 1 27 52 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:27:52 2026-02-21T08:52:43.9663799Z min.s32 %r533, %r532, 2; 2026-02-21T08:52:43.9664003Z .loc 1 28 45 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:45 2026-02-21T08:52:43.9664068Z mul.lo.s32 %r534, %r530, 112; 2026-02-21T08:52:43.9664137Z sub.s32 %r535, %r525, %r534; 2026-02-21T08:52:43.9664338Z .loc 1 29 51 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:29:51 2026-02-21T08:52:43.9664405Z div.s32 %r536, %r535, %r533; 2026-02-21T08:52:43.9664598Z .loc 1 28 64 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:64 2026-02-21T08:52:43.9664672Z mul.lo.s32 %r537, %r536, %r533; 2026-02-21T08:52:43.9664735Z sub.s32 %r538, %r535, %r537; 2026-02-21T08:52:43.9664933Z .loc 1 28 30 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:30 2026-02-21T08:52:43.9664994Z add.s32 %r539, %r538, %r531; 2026-02-21T08:52:43.9665196Z .loc 1 30 27 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:30:27 2026-02-21T08:52:43.9665261Z shl.b32 %r540, %r539, 6; 2026-02-21T08:52:43.9665456Z .loc 1 31 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:31:32 2026-02-21T08:52:43.9665524Z or.b32 %r98, %r540, %r6; 2026-02-21T08:52:43.9665724Z .loc 1 32 27 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:32:27 
2026-02-21T08:52:43.9665799Z shl.b32 %r99, %r536, 7; 2026-02-21T08:52:43.9666004Z .loc 1 33 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:33:32 2026-02-21T08:52:43.9666070Z or.b32 %r541, %r99, %r10; 2026-02-21T08:52:43.9666264Z .loc 1 48 53 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:53 2026-02-21T08:52:43.9666332Z shl.b32 %r542, %r98, 13; 2026-02-21T08:52:43.9666659Z .loc 1 48 60 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:60 2026-02-21T08:52:43.9666725Z or.b32 %r543, %r542, %r8; 2026-02-21T08:52:43.9666934Z .loc 1 48 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:32 2026-02-21T08:52:43.9667016Z mad.wide.s32 %rd51, %r543, 2, %rd15; 2026-02-21T08:52:43.9667075Z mov.b32 %r509, 8; 2026-02-21T08:52:43.9667409Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.9667476Z // begin inline asm 2026-02-21T08:52:43.9667617Z cp.async.ca.shared.global [ %r13 + 0 ], [ %rd51 + 0 ], 0x8, %r509; 2026-02-21T08:52:43.9667680Z // end inline asm 2026-02-21T08:52:43.9667754Z cp.async.commit_group; 2026-02-21T08:52:43.9667952Z .loc 1 54 62 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:62 2026-02-21T08:52:43.9668018Z add.s32 %r544, %r541, %r1380; 2026-02-21T08:52:43.9668220Z .loc 1 54 34 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:34 2026-02-21T08:52:43.9668292Z cvt.s64.s32 %rd56, %r544; 2026-02-21T08:52:43.9668357Z add.s64 %rd52, %rd16, %rd56; 2026-02-21T08:52:43.9668630Z mov.b32 %r511, 4; 2026-02-21T08:52:43.9668866Z .loc 1 54 87 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:87 2026-02-21T08:52:43.9668937Z // begin inline asm 2026-02-21T08:52:43.9669078Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd52 + 0 ], 0x4, %r511; 2026-02-21T08:52:43.9669144Z // end inline asm 2026-02-21T08:52:43.9669212Z cp.async.commit_group; 2026-02-21T08:52:43.9669414Z .loc 1 48 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:32 
2026-02-21T08:52:43.9669478Z cvt.s64.s32 %rd57, %r542; 2026-02-21T08:52:43.9669547Z or.b64 %rd58, %rd57, %rd2; 2026-02-21T08:52:43.9669610Z shl.b64 %rd59, %rd58, 1; 2026-02-21T08:52:43.9669672Z add.s64 %rd60, %rd15, %rd59; 2026-02-21T08:52:43.9669740Z add.s64 %rd53, %rd60, 128; 2026-02-21T08:52:43.9669935Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.9669992Z bar.sync 0; 2026-02-21T08:52:43.9670062Z // begin inline asm 2026-02-21T08:52:43.9670195Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd53 + 0 ], 0x8, %r509; 2026-02-21T08:52:43.9670252Z // end inline asm 2026-02-21T08:52:43.9670321Z cp.async.commit_group; 2026-02-21T08:52:43.9670521Z .loc 1 54 34 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:34 2026-02-21T08:52:43.9670586Z cvt.s64.s32 %rd61, %r541; 2026-02-21T08:52:43.9670650Z add.s64 %rd62, %rd61, %rd3; 2026-02-21T08:52:43.9670722Z add.s64 %rd63, %rd16, %rd62; 2026-02-21T08:52:43.9670786Z add.s64 %rd54, %rd63, 229376; 2026-02-21T08:52:43.9670980Z .loc 1 54 87 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:87 2026-02-21T08:52:43.9671048Z // begin inline asm 2026-02-21T08:52:43.9671176Z cp.async.ca.shared.global [ %r17 + 0 ], [ %rd54 + 0 ], 0x4, %r511; 2026-02-21T08:52:43.9671235Z // end inline asm 2026-02-21T08:52:43.9671300Z cp.async.commit_group; 2026-02-21T08:52:43.9671595Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.9671702Z add.s32 %r1408, %r45, %r99; 2026-02-21T08:52:43.9671796Z shl.b32 %r545, %r539, 19; 2026-02-21T08:52:43.9671890Z or.b32 %r1407, %r46, %r545; 2026-02-21T08:52:43.9671956Z mov.b32 %r1411, 0f00000000; 2026-02-21T08:52:43.9672014Z mov.b32 %r1410, 1; 2026-02-21T08:52:43.9672077Z mov.b32 %r1409, -1; 2026-02-21T08:52:43.9672145Z mov.b64 %rd148, -32; 2026-02-21T08:52:43.9672207Z mov.b32 %r1412, %r1411; 2026-02-21T08:52:43.9672267Z mov.b32 %r1413, %r1411; 2026-02-21T08:52:43.9672331Z mov.b32 %r1414, %r1411; 
2026-02-21T08:52:43.9672389Z mov.b32 %r1415, %r1411; 2026-02-21T08:52:43.9672448Z mov.b32 %r1416, %r1411; 2026-02-21T08:52:43.9672506Z mov.b32 %r1417, %r1411; 2026-02-21T08:52:43.9672573Z mov.b32 %r1418, %r1411; 2026-02-21T08:52:43.9672690Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:43.9672802Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:43.9672873Z add.s64 %rd148, %rd148, 32; 2026-02-21T08:52:43.9672942Z setp.lt.u64 %p24, %rd148, 4032; 2026-02-21T08:52:43.9673125Z add.s32 %r732, %r1409, 1; 2026-02-21T08:52:43.9673198Z setp.gt.s32 %p25, %r732, 1; 2026-02-21T08:52:43.9673266Z selp.b32 %r1409, 0, %r732, %p25; 2026-02-21T08:52:43.9673467Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.9673536Z cp.async.wait_group 2; 2026-02-21T08:52:43.9673600Z bar.sync 0; 2026-02-21T08:52:43.9673663Z shl.b32 %r733, %r1409, 12; 2026-02-21T08:52:43.9673724Z shl.b32 %r734, %r1409, 13; 2026-02-21T08:52:43.9673792Z add.s32 %r735, %r1379, %r734; 2026-02-21T08:52:43.9673853Z add.s32 %r736, %r735, 32768; 2026-02-21T08:52:43.9674056Z .loc 1 52 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:52:32 2026-02-21T08:52:43.9674121Z add.s32 %r737, %r736, %r33; 2026-02-21T08:52:43.9674290Z ld.shared.b16 %rs73, [%r737]; 2026-02-21T08:52:43.9674365Z ld.shared.b16 %rs74, [%r737+1024]; 2026-02-21T08:52:43.9674434Z ld.shared.b16 %rs75, [%r737+64]; 2026-02-21T08:52:43.9674510Z ld.shared.b16 %rs76, [%r737+1088]; 2026-02-21T08:52:43.9674574Z add.s32 %r738, %r736, %r34; 2026-02-21T08:52:43.9674643Z ld.shared.b16 %rs77, [%r738]; 2026-02-21T08:52:43.9674721Z ld.shared.b16 %rs78, [%r738+1024]; 2026-02-21T08:52:43.9674787Z ld.shared.b16 %rs79, [%r738+64]; 2026-02-21T08:52:43.9674855Z ld.shared.b16 %rs80, [%r738+1088]; 2026-02-21T08:52:43.9674919Z add.s32 %r739, %r736, %r35; 2026-02-21T08:52:43.9674991Z ld.shared.b16 %rs81, [%r739]; 2026-02-21T08:52:43.9675058Z ld.shared.b16 %rs82, [%r739+1024]; 2026-02-21T08:52:43.9675124Z 
ld.shared.b16 %rs83, [%r739+64]; 2026-02-21T08:52:43.9675196Z ld.shared.b16 %rs84, [%r739+1088]; 2026-02-21T08:52:43.9675257Z add.s32 %r740, %r736, %r36; 2026-02-21T08:52:43.9675320Z ld.shared.b16 %rs85, [%r740]; 2026-02-21T08:52:43.9675390Z ld.shared.b16 %rs86, [%r740+1024]; 2026-02-21T08:52:43.9675465Z ld.shared.b16 %rs87, [%r740+64]; 2026-02-21T08:52:43.9675531Z ld.shared.b16 %rs88, [%r740+1088]; 2026-02-21T08:52:43.9675600Z add.s32 %r741, %r736, %r37; 2026-02-21T08:52:43.9675673Z ld.shared.b16 %rs89, [%r741]; 2026-02-21T08:52:43.9675737Z ld.shared.b16 %rs90, [%r741+1024]; 2026-02-21T08:52:43.9675807Z ld.shared.b16 %rs91, [%r741+64]; 2026-02-21T08:52:43.9675878Z ld.shared.b16 %rs92, [%r741+1088]; 2026-02-21T08:52:43.9675940Z add.s32 %r742, %r736, %r38; 2026-02-21T08:52:43.9676005Z ld.shared.b16 %rs93, [%r742]; 2026-02-21T08:52:43.9676070Z ld.shared.b16 %rs94, [%r742+1024]; 2026-02-21T08:52:43.9676146Z ld.shared.b16 %rs95, [%r742+64]; 2026-02-21T08:52:43.9676212Z ld.shared.b16 %rs96, [%r742+1088]; 2026-02-21T08:52:43.9676272Z add.s32 %r743, %r736, %r39; 2026-02-21T08:52:43.9676342Z ld.shared.b16 %rs97, [%r743]; 2026-02-21T08:52:43.9676408Z ld.shared.b16 %rs98, [%r743+1024]; 2026-02-21T08:52:43.9676635Z ld.shared.b16 %rs99, [%r743+64]; 2026-02-21T08:52:43.9676715Z ld.shared.b16 %rs100, [%r743+1088]; 2026-02-21T08:52:43.9676784Z add.s32 %r744, %r736, %r40; 2026-02-21T08:52:43.9676851Z ld.shared.b16 %rs101, [%r744]; 2026-02-21T08:52:43.9676925Z ld.shared.b16 %rs102, [%r744+1024]; 2026-02-21T08:52:43.9676997Z ld.shared.b16 %rs103, [%r744+64]; 2026-02-21T08:52:43.9677063Z ld.shared.b16 %rs104, [%r744+1088]; 2026-02-21T08:52:43.9677126Z cvt.f32.bf16 %r562, %rs73; 2026-02-21T08:52:43.9677189Z cvt.f32.bf16 %r563, %rs74; 2026-02-21T08:52:43.9677254Z cvt.f32.bf16 %r564, %rs77; 2026-02-21T08:52:43.9677327Z cvt.f32.bf16 %r565, %rs78; 2026-02-21T08:52:43.9677389Z cvt.f32.bf16 %r582, %rs81; 2026-02-21T08:52:43.9677456Z cvt.f32.bf16 %r583, %rs82; 2026-02-21T08:52:43.9677518Z 
cvt.f32.bf16 %r584, %rs85; 2026-02-21T08:52:43.9677578Z cvt.f32.bf16 %r585, %rs86; 2026-02-21T08:52:43.9677637Z cvt.f32.bf16 %r602, %rs89; 2026-02-21T08:52:43.9677706Z cvt.f32.bf16 %r603, %rs90; 2026-02-21T08:52:43.9677769Z cvt.f32.bf16 %r604, %rs93; 2026-02-21T08:52:43.9677831Z cvt.f32.bf16 %r605, %rs94; 2026-02-21T08:52:43.9677897Z cvt.f32.bf16 %r622, %rs97; 2026-02-21T08:52:43.9677959Z cvt.f32.bf16 %r623, %rs98; 2026-02-21T08:52:43.9678167Z cvt.f32.bf16 %r624, %rs101; 2026-02-21T08:52:43.9678234Z cvt.f32.bf16 %r625, %rs102; 2026-02-21T08:52:43.9678295Z cvt.f32.bf16 %r642, %rs75; 2026-02-21T08:52:43.9678356Z cvt.f32.bf16 %r643, %rs76; 2026-02-21T08:52:43.9678428Z cvt.f32.bf16 %r644, %rs79; 2026-02-21T08:52:43.9678495Z cvt.f32.bf16 %r645, %rs80; 2026-02-21T08:52:43.9678556Z cvt.f32.bf16 %r662, %rs83; 2026-02-21T08:52:43.9678619Z cvt.f32.bf16 %r663, %rs84; 2026-02-21T08:52:43.9678685Z cvt.f32.bf16 %r664, %rs87; 2026-02-21T08:52:43.9678746Z cvt.f32.bf16 %r665, %rs88; 2026-02-21T08:52:43.9678808Z cvt.f32.bf16 %r682, %rs91; 2026-02-21T08:52:43.9678869Z cvt.f32.bf16 %r683, %rs92; 2026-02-21T08:52:43.9678937Z cvt.f32.bf16 %r684, %rs95; 2026-02-21T08:52:43.9679000Z cvt.f32.bf16 %r685, %rs96; 2026-02-21T08:52:43.9679199Z cvt.f32.bf16 %r702, %rs99; 2026-02-21T08:52:43.9679271Z cvt.f32.bf16 %r703, %rs100; 2026-02-21T08:52:43.9679333Z cvt.f32.bf16 %r704, %rs103; 2026-02-21T08:52:43.9679401Z cvt.f32.bf16 %r705, %rs104; 2026-02-21T08:52:43.9679601Z .loc 1 67 45 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:67:45 2026-02-21T08:52:43.9679669Z add.s32 %r745, %r26, %r733; 2026-02-21T08:52:43.9679734Z ld.shared.b8 %rs105, [%r745]; 2026-02-21T08:52:43.9679803Z ld.shared.b8 %rs106, [%r745+512]; 2026-02-21T08:52:43.9679875Z ld.shared.b8 %rs107, [%r745+1024]; 2026-02-21T08:52:43.9679940Z ld.shared.b8 %rs108, [%r745+1536]; 2026-02-21T08:52:43.9680006Z ld.shared.b8 %rs109, [%r745+2048]; 2026-02-21T08:52:43.9680078Z ld.shared.b8 %rs110, [%r745+2560]; 
2026-02-21T08:52:43.9680142Z ld.shared.b8 %rs111, [%r745+3072]; 2026-02-21T08:52:43.9680207Z ld.shared.b8 %rs112, [%r745+3584]; 2026-02-21T08:52:43.9680409Z .loc 1 57 28 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:57:28 2026-02-21T08:52:43.9680477Z shl.b16 %rs113, %rs105, 4; 2026-02-21T08:52:43.9680539Z shl.b16 %rs114, %rs106, 4; 2026-02-21T08:52:43.9680603Z shl.b16 %rs115, %rs107, 4; 2026-02-21T08:52:43.9680669Z shl.b16 %rs116, %rs108, 4; 2026-02-21T08:52:43.9680731Z shl.b16 %rs117, %rs109, 4; 2026-02-21T08:52:43.9680793Z shl.b16 %rs118, %rs110, 4; 2026-02-21T08:52:43.9680854Z shl.b16 %rs119, %rs111, 4; 2026-02-21T08:52:43.9680922Z shl.b16 %rs120, %rs112, 4; 2026-02-21T08:52:43.9681134Z .loc 1 72 58 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:72:58 2026-02-21T08:52:43.9681209Z selp.b16 %rs121, %rs113, %rs105, %p57; 2026-02-21T08:52:43.9681278Z cvt.s16.s8 %rs122, %rs121; 2026-02-21T08:52:43.9681339Z shr.s16 %rs123, %rs122, 4; 2026-02-21T08:52:43.9681411Z selp.b16 %rs124, %rs114, %rs106, %p57; 2026-02-21T08:52:43.9681481Z cvt.s16.s8 %rs125, %rs124; 2026-02-21T08:52:43.9681542Z shr.s16 %rs126, %rs125, 4; 2026-02-21T08:52:43.9681613Z selp.b16 %rs127, %rs115, %rs107, %p57; 2026-02-21T08:52:43.9681676Z cvt.s16.s8 %rs128, %rs127; 2026-02-21T08:52:43.9681743Z shr.s16 %rs129, %rs128, 4; 2026-02-21T08:52:43.9681813Z selp.b16 %rs130, %rs116, %rs108, %p57; 2026-02-21T08:52:43.9681874Z cvt.s16.s8 %rs131, %rs130; 2026-02-21T08:52:43.9681940Z shr.s16 %rs132, %rs131, 4; 2026-02-21T08:52:43.9682010Z selp.b16 %rs133, %rs117, %rs109, %p57; 2026-02-21T08:52:43.9682072Z cvt.s16.s8 %rs134, %rs133; 2026-02-21T08:52:43.9682132Z shr.s16 %rs135, %rs134, 4; 2026-02-21T08:52:43.9682210Z selp.b16 %rs136, %rs118, %rs110, %p57; 2026-02-21T08:52:43.9682278Z cvt.s16.s8 %rs137, %rs136; 2026-02-21T08:52:43.9682338Z shr.s16 %rs138, %rs137, 4; 2026-02-21T08:52:43.9682413Z selp.b16 %rs139, %rs119, %rs111, %p57; 2026-02-21T08:52:43.9682478Z cvt.s16.s8 %rs140, %rs139; 
2026-02-21T08:52:43.9682550Z shr.s16 %rs141, %rs140, 4; 2026-02-21T08:52:43.9682624Z selp.b16 %rs142, %rs120, %rs112, %p57; 2026-02-21T08:52:43.9682696Z cvt.s16.s8 %rs143, %rs142; 2026-02-21T08:52:43.9682762Z shr.s16 %rs144, %rs143, 4; 2026-02-21T08:52:43.9682961Z .loc 1 77 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:77:32 2026-02-21T08:52:43.9683145Z cvt.rn.f32.s16 %r746, %rs123; 2026-02-21T08:52:43.9683210Z cvt.rn.f32.s16 %r747, %rs126; 2026-02-21T08:52:43.9683273Z cvt.rn.f32.s16 %r748, %rs129; 2026-02-21T08:52:43.9683343Z cvt.rn.f32.s16 %r749, %rs132; 2026-02-21T08:52:43.9683405Z cvt.rn.f32.s16 %r750, %rs135; 2026-02-21T08:52:43.9683469Z cvt.rn.f32.s16 %r751, %rs138; 2026-02-21T08:52:43.9683532Z cvt.rn.f32.s16 %r752, %rs141; 2026-02-21T08:52:43.9683601Z cvt.rn.f32.s16 %r753, %rs144; 2026-02-21T08:52:43.9683667Z st.shared.b32 [%r41], %r746; 2026-02-21T08:52:43.9683734Z st.shared.b32 [%r41+16384], %r750; 2026-02-21T08:52:43.9683804Z st.shared.b32 [%r42], %r747; 2026-02-21T08:52:43.9683869Z st.shared.b32 [%r42+16384], %r751; 2026-02-21T08:52:43.9683932Z st.shared.b32 [%r43], %r748; 2026-02-21T08:52:43.9684087Z st.shared.b32 [%r43+16384], %r752; 2026-02-21T08:52:43.9684160Z st.shared.b32 [%r44], %r749; 2026-02-21T08:52:43.9684225Z st.shared.b32 [%r44+16384], %r753; 2026-02-21T08:52:43.9684285Z $L__tmp3: 2026-02-21T08:52:43.9684566Z .loc 2 291 36 // standard.py:291:36 @[ cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:84:40 ] 2026-02-21T08:52:43.9684627Z // begin inline asm 2026-02-21T08:52:43.9684709Z fence.proxy.async.shared::cta; 2026-02-21T08:52:43.9684777Z // end inline asm 2026-02-21T08:52:43.9684846Z bar.sync 0; 2026-02-21T08:52:43.9684929Z shfl.sync.idx.b32 %r754, %r4, 0, 31, -1; 2026-02-21T08:52:43.9685002Z wgmma.fence.sync.aligned; 2026-02-21T08:52:43.9685075Z shl.b32 %r755, %r754, 9; 2026-02-21T08:52:43.9685136Z and.b32 %r756, %r755, 14336; 2026-02-21T08:52:43.9685198Z add.s32 %r757, %r756, %r1379; 2026-02-21T08:52:43.9685265Z 
bfe.u32 %r758, %r757, 4, 14; 2026-02-21T08:52:43.9685328Z cvt.u64.u32 %rd74, %r758; 2026-02-21T08:52:43.9685404Z or.b64 %rd64, %rd74, 4611686293372403712; 2026-02-21T08:52:43.9685470Z mov.pred %p15, -1; 2026-02-21T08:52:43.9685535Z // begin inline asm 2026-02-21T08:52:43.9685916Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418}, {%r562,%r563,%r564,%r565}, %rd64, %p15, 1, 1; 2026-02-21T08:52:43.9685977Z // end inline asm 2026-02-21T08:52:43.9686043Z add.s32 %r759, %r757, 32; 2026-02-21T08:52:43.9686103Z bfe.u32 %r760, %r759, 4, 14; 2026-02-21T08:52:43.9686164Z cvt.u64.u32 %rd75, %r760; 2026-02-21T08:52:43.9686235Z or.b64 %rd65, %rd75, 4611686293372403712; 2026-02-21T08:52:43.9686301Z // begin inline asm 2026-02-21T08:52:43.9686795Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418}, {%r582,%r583,%r584,%r585}, %rd65, %p15, 1, 1; 2026-02-21T08:52:43.9686869Z // end inline asm 2026-02-21T08:52:43.9686936Z add.s32 %r761, %r757, 64; 2026-02-21T08:52:43.9687000Z bfe.u32 %r762, %r761, 4, 14; 2026-02-21T08:52:43.9687063Z cvt.u64.u32 %rd76, %r762; 2026-02-21T08:52:43.9687137Z or.b64 %rd66, %rd76, 4611686293372403712; 2026-02-21T08:52:43.9687197Z // begin inline asm 2026-02-21T08:52:43.9687559Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418}, {%r602,%r603,%r604,%r605}, %rd66, %p15, 1, 1; 2026-02-21T08:52:43.9687623Z // end inline asm 2026-02-21T08:52:43.9687683Z add.s32 %r763, %r757, 96; 2026-02-21T08:52:43.9687743Z bfe.u32 %r764, %r763, 4, 14; 2026-02-21T08:52:43.9687804Z cvt.u64.u32 %rd77, %r764; 2026-02-21T08:52:43.9687881Z or.b64 %rd67, %rd77, 4611686293372403712; 2026-02-21T08:52:43.9687941Z // begin inline asm 2026-02-21T08:52:43.9688312Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418}, {%r622,%r623,%r624,%r625}, %rd67, %p15, 1, 
1; 2026-02-21T08:52:43.9688376Z // end inline asm 2026-02-21T08:52:43.9688438Z add.s32 %r765, %r757, 16384; 2026-02-21T08:52:43.9688502Z bfe.u32 %r766, %r765, 4, 14; 2026-02-21T08:52:43.9688563Z cvt.u64.u32 %rd78, %r766; 2026-02-21T08:52:43.9688642Z or.b64 %rd68, %rd78, 4611686293372403712; 2026-02-21T08:52:43.9688861Z // begin inline asm 2026-02-21T08:52:43.9689234Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418}, {%r642,%r643,%r644,%r645}, %rd68, %p15, 1, 1; 2026-02-21T08:52:43.9689302Z // end inline asm 2026-02-21T08:52:43.9689368Z add.s32 %r767, %r757, 16416; 2026-02-21T08:52:43.9689430Z bfe.u32 %r768, %r767, 4, 14; 2026-02-21T08:52:43.9689499Z cvt.u64.u32 %rd79, %r768; 2026-02-21T08:52:43.9689570Z or.b64 %rd69, %rd79, 4611686293372403712; 2026-02-21T08:52:43.9689634Z // begin inline asm 2026-02-21T08:52:43.9689999Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418}, {%r662,%r663,%r664,%r665}, %rd69, %p15, 1, 1; 2026-02-21T08:52:43.9690064Z // end inline asm 2026-02-21T08:52:43.9690261Z add.s32 %r769, %r757, 16448; 2026-02-21T08:52:43.9690329Z bfe.u32 %r770, %r769, 4, 14; 2026-02-21T08:52:43.9690399Z cvt.u64.u32 %rd80, %r770; 2026-02-21T08:52:43.9690475Z or.b64 %rd70, %rd80, 4611686293372403712; 2026-02-21T08:52:43.9690540Z // begin inline asm 2026-02-21T08:52:43.9690916Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418}, {%r682,%r683,%r684,%r685}, %rd70, %p15, 1, 1; 2026-02-21T08:52:43.9690975Z // end inline asm 2026-02-21T08:52:43.9691037Z add.s32 %r771, %r757, 16480; 2026-02-21T08:52:43.9691109Z bfe.u32 %r772, %r771, 4, 14; 2026-02-21T08:52:43.9691179Z cvt.u64.u32 %rd81, %r772; 2026-02-21T08:52:43.9691252Z or.b64 %rd71, %rd81, 4611686293372403712; 2026-02-21T08:52:43.9691313Z // begin inline asm 2026-02-21T08:52:43.9691684Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 
{%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418}, {%r702,%r703,%r704,%r705}, %rd71, %p15, 1, 1; 2026-02-21T08:52:43.9691744Z // end inline asm 2026-02-21T08:52:43.9691824Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:43.9691892Z mov.b32 %r716, 0; 2026-02-21T08:52:43.9691953Z mov.b32 %r715, %r716; 2026-02-21T08:52:43.9692018Z mov.b32 %r714, %r1379; 2026-02-21T08:52:43.9692077Z // begin inline asm 2026-02-21T08:52:43.9692256Z // wait for regs: %r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r714,%r715,%r716 2026-02-21T08:52:43.9692334Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:43.9692392Z // end inline asm 2026-02-21T08:52:43.9692457Z $L__tmp4: 2026-02-21T08:52:43.9692678Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.9692741Z add.s32 %r773, %r1410, 1; 2026-02-21T08:52:43.9692809Z setp.gt.s32 %p26, %r773, 1; 2026-02-21T08:52:43.9692901Z selp.b32 %r1410, 0, %r773, %p26; 2026-02-21T08:52:43.9693107Z .loc 1 48 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:32 2026-02-21T08:52:43.9693182Z mad.wide.s32 %rd72, %r1407, 2, %rd15; 2026-02-21T08:52:43.9693388Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.9693455Z shl.b32 %r774, %r1410, 12; 2026-02-21T08:52:43.9693517Z shl.b32 %r775, %r1410, 13; 2026-02-21T08:52:43.9693587Z add.s32 %r728, %r13, %r775; 2026-02-21T08:52:43.9693652Z selp.b32 %r729, 8, 0, %p24; 2026-02-21T08:52:43.9693714Z // begin inline asm 2026-02-21T08:52:43.9693856Z cp.async.ca.shared.global [ %r728 + 0 ], [ %rd72 + 0 ], 0x8, %r729; 2026-02-21T08:52:43.9693922Z // end inline asm 2026-02-21T08:52:43.9693992Z cp.async.commit_group; 2026-02-21T08:52:43.9694189Z .loc 1 54 34 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:34 2026-02-21T08:52:43.9694260Z cvt.s64.s32 %rd82, %r1408; 2026-02-21T08:52:43.9694324Z add.s64 %rd73, %rd16, %rd82; 2026-02-21T08:52:43.9694521Z .loc 1 54 87 // 
cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:87 2026-02-21T08:52:43.9694590Z add.s32 %r730, %r15, %r774; 2026-02-21T08:52:43.9694654Z selp.b32 %r731, 4, 0, %p24; 2026-02-21T08:52:43.9694713Z // begin inline asm 2026-02-21T08:52:43.9694961Z cp.async.ca.shared.global [ %r730 + 0 ], [ %rd73 + 0 ], 0x4, %r731; 2026-02-21T08:52:43.9695027Z // end inline asm 2026-02-21T08:52:43.9695093Z cp.async.commit_group; 2026-02-21T08:52:43.9695300Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.9695369Z add.s32 %r1408, %r1408, 229376; 2026-02-21T08:52:43.9695431Z add.s32 %r1407, %r1407, 64; 2026-02-21T08:52:43.9695497Z setp.lt.u64 %p27, %rd148, 4064; 2026-02-21T08:52:43.9695563Z @%p27 bra $L__BB0_5; 2026-02-21T08:52:43.9695674Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:43.9695873Z .loc 1 33 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:33:32 2026-02-21T08:52:43.9696031Z or.b32 %r791, %r99, %r11; 2026-02-21T08:52:43.9696241Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.9696313Z cp.async.wait_group 0; 2026-02-21T08:52:43.9696369Z bar.sync 0; 2026-02-21T08:52:43.9696699Z .loc 1 87 28 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:87:28 2026-02-21T08:52:43.9696791Z cvt.rn.bf16x2.f32 %r792, %r1412, %r1411; 2026-02-21T08:52:43.9696867Z cvt.rn.bf16x2.f32 %r793, %r1414, %r1413; 2026-02-21T08:52:43.9696956Z cvt.rn.bf16x2.f32 %r794, %r1416, %r1415; 2026-02-21T08:52:43.9697029Z cvt.rn.bf16x2.f32 %r795, %r1418, %r1417; 2026-02-21T08:52:43.9697225Z .loc 1 88 50 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:88:50 2026-02-21T08:52:43.9697294Z mad.lo.s32 %r796, %r98, 7168, %r791; 2026-02-21T08:52:43.9697491Z .loc 1 88 22 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:88:22 2026-02-21T08:52:43.9697564Z mad.wide.s32 %rd83, %r796, 2, %rd17; 2026-02-21T08:52:43.9697758Z .loc 1 88 81 // 
cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:88:81 2026-02-21T08:52:43.9697947Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r31], {%r792, %r793, %r794, %r795}; 2026-02-21T08:52:43.9698003Z bar.sync 0; 2026-02-21T08:52:43.9698109Z ld.shared.v4.b32 {%r776, %r777, %r778, %r779}, [%r32]; 2026-02-21T08:52:43.9698173Z // begin inline asm 2026-02-21T08:52:43.9698292Z st.global.v4.b32 [ %rd83 + 0 ], { %r776, %r777, %r778, %r779 }; 2026-02-21T08:52:43.9698350Z // end inline asm 2026-02-21T08:52:43.9698559Z .loc 1 19 144 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:19:144 2026-02-21T08:52:43.9698627Z add.s32 %r797, %r1394, 8448; 2026-02-21T08:52:43.9698825Z .loc 1 25 35 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:25:35 2026-02-21T08:52:43.9698896Z mul.hi.s32 %r798, %r797, -1840700269; 2026-02-21T08:52:43.9698966Z add.s32 %r799, %r798, %r797; 2026-02-21T08:52:43.9699027Z shr.u32 %r800, %r799, 31; 2026-02-21T08:52:43.9699088Z shr.s32 %r801, %r799, 6; 2026-02-21T08:52:43.9699157Z add.s32 %r802, %r801, %r800; 2026-02-21T08:52:43.9699356Z .loc 1 26 33 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:26:33 2026-02-21T08:52:43.9699417Z shl.b32 %r803, %r802, 1; 2026-02-21T08:52:43.9699611Z .loc 1 27 39 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:27:39 2026-02-21T08:52:43.9699688Z sub.s32 %r804, 1, %r803; 2026-02-21T08:52:43.9699884Z .loc 1 27 52 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:27:52 2026-02-21T08:52:43.9699947Z min.s32 %r805, %r804, 2; 2026-02-21T08:52:43.9700147Z .loc 1 28 45 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:45 2026-02-21T08:52:43.9700210Z mul.lo.s32 %r806, %r802, 112; 2026-02-21T08:52:43.9700274Z sub.s32 %r807, %r797, %r806; 2026-02-21T08:52:43.9700474Z .loc 1 29 51 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:29:51 2026-02-21T08:52:43.9700536Z div.s32 %r808, %r807, %r805; 2026-02-21T08:52:43.9700872Z .loc 1 28 64 // 
cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:64 2026-02-21T08:52:43.9700943Z mul.lo.s32 %r809, %r808, %r805; 2026-02-21T08:52:43.9701007Z sub.s32 %r810, %r807, %r809; 2026-02-21T08:52:43.9701202Z .loc 1 28 30 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:30 2026-02-21T08:52:43.9701263Z add.s32 %r811, %r810, %r803; 2026-02-21T08:52:43.9701462Z .loc 1 30 27 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:30:27 2026-02-21T08:52:43.9701522Z shl.b32 %r812, %r811, 6; 2026-02-21T08:52:43.9701715Z .loc 1 31 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:31:32 2026-02-21T08:52:43.9701898Z or.b32 %r126, %r812, %r6; 2026-02-21T08:52:43.9702095Z .loc 1 32 27 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:32:27 2026-02-21T08:52:43.9702168Z shl.b32 %r127, %r808, 7; 2026-02-21T08:52:43.9702376Z .loc 1 33 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:33:32 2026-02-21T08:52:43.9702438Z or.b32 %r813, %r127, %r10; 2026-02-21T08:52:43.9702632Z .loc 1 48 53 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:53 2026-02-21T08:52:43.9702701Z shl.b32 %r814, %r126, 13; 2026-02-21T08:52:43.9702893Z .loc 1 48 60 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:60 2026-02-21T08:52:43.9702953Z or.b32 %r815, %r814, %r8; 2026-02-21T08:52:43.9703146Z .loc 1 48 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:32 2026-02-21T08:52:43.9703223Z mad.wide.s32 %rd84, %r815, 2, %rd15; 2026-02-21T08:52:43.9703280Z mov.b32 %r781, 8; 2026-02-21T08:52:43.9703475Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.9703543Z // begin inline asm 2026-02-21T08:52:43.9703682Z cp.async.ca.shared.global [ %r13 + 0 ], [ %rd84 + 0 ], 0x8, %r781; 2026-02-21T08:52:43.9703741Z // end inline asm 2026-02-21T08:52:43.9703822Z cp.async.commit_group; 2026-02-21T08:52:43.9704018Z .loc 1 54 62 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:62 
2026-02-21T08:52:43.9704081Z add.s32 %r816, %r813, %r1380; 2026-02-21T08:52:43.9704276Z .loc 1 54 34 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:34 2026-02-21T08:52:43.9704346Z cvt.s64.s32 %rd89, %r816; 2026-02-21T08:52:43.9704409Z add.s64 %rd85, %rd16, %rd89; 2026-02-21T08:52:43.9704468Z mov.b32 %r783, 4; 2026-02-21T08:52:43.9704670Z .loc 1 54 87 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:87 2026-02-21T08:52:43.9704733Z // begin inline asm 2026-02-21T08:52:43.9704865Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd85 + 0 ], 0x4, %r783; 2026-02-21T08:52:43.9704931Z // end inline asm 2026-02-21T08:52:43.9705002Z cp.async.commit_group; 2026-02-21T08:52:43.9705197Z .loc 1 48 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:32 2026-02-21T08:52:43.9705260Z cvt.s64.s32 %rd90, %r814; 2026-02-21T08:52:43.9705330Z or.b64 %rd91, %rd90, %rd2; 2026-02-21T08:52:43.9705404Z shl.b64 %rd92, %rd91, 1; 2026-02-21T08:52:43.9705471Z add.s64 %rd93, %rd15, %rd92; 2026-02-21T08:52:43.9705540Z add.s64 %rd86, %rd93, 128; 2026-02-21T08:52:43.9705736Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.9705792Z bar.sync 0; 2026-02-21T08:52:43.9705860Z // begin inline asm 2026-02-21T08:52:43.9705988Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd86 + 0 ], 0x8, %r781; 2026-02-21T08:52:43.9706046Z // end inline asm 2026-02-21T08:52:43.9706115Z cp.async.commit_group; 2026-02-21T08:52:43.9706321Z .loc 1 54 34 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:34 2026-02-21T08:52:43.9706610Z cvt.s64.s32 %rd94, %r813; 2026-02-21T08:52:43.9706678Z add.s64 %rd95, %rd94, %rd3; 2026-02-21T08:52:43.9706746Z add.s64 %rd96, %rd16, %rd95; 2026-02-21T08:52:43.9706807Z add.s64 %rd87, %rd96, 229376; 2026-02-21T08:52:43.9707018Z .loc 1 54 87 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:87 2026-02-21T08:52:43.9707079Z // begin inline asm 2026-02-21T08:52:43.9707210Z 
cp.async.ca.shared.global [ %r17 + 0 ], [ %rd87 + 0 ], 0x4, %r783; 2026-02-21T08:52:43.9707267Z // end inline asm 2026-02-21T08:52:43.9707332Z cp.async.commit_group; 2026-02-21T08:52:43.9707542Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.9707604Z add.s32 %r1420, %r45, %r127; 2026-02-21T08:52:43.9707794Z shl.b32 %r817, %r811, 19; 2026-02-21T08:52:43.9707865Z or.b32 %r1419, %r46, %r817; 2026-02-21T08:52:43.9707926Z mov.b32 %r1423, 0f00000000; 2026-02-21T08:52:43.9707984Z mov.b32 %r1422, 1; 2026-02-21T08:52:43.9708049Z mov.b32 %r1421, -1; 2026-02-21T08:52:43.9708119Z mov.b64 %rd149, -32; 2026-02-21T08:52:43.9708190Z mov.b32 %r1424, %r1423; 2026-02-21T08:52:43.9708254Z mov.b32 %r1425, %r1423; 2026-02-21T08:52:43.9708320Z mov.b32 %r1426, %r1423; 2026-02-21T08:52:43.9708379Z mov.b32 %r1427, %r1423; 2026-02-21T08:52:43.9708437Z mov.b32 %r1428, %r1423; 2026-02-21T08:52:43.9708494Z mov.b32 %r1429, %r1423; 2026-02-21T08:52:43.9708647Z mov.b32 %r1430, %r1423; 2026-02-21T08:52:43.9724095Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:43.9724268Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:43.9724353Z add.s64 %rd149, %rd149, 32; 2026-02-21T08:52:43.9724433Z setp.lt.u64 %p37, %rd149, 4032; 2026-02-21T08:52:43.9724510Z add.s32 %r1004, %r1421, 1; 2026-02-21T08:52:43.9724591Z setp.gt.s32 %p38, %r1004, 1; 2026-02-21T08:52:43.9724666Z selp.b32 %r1421, 0, %r1004, %p38; 2026-02-21T08:52:43.9724914Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.9725003Z cp.async.wait_group 2; 2026-02-21T08:52:43.9725065Z bar.sync 0; 2026-02-21T08:52:43.9725137Z shl.b32 %r1005, %r1421, 12; 2026-02-21T08:52:43.9725200Z shl.b32 %r1006, %r1421, 13; 2026-02-21T08:52:43.9725282Z add.s32 %r1007, %r1379, %r1006; 2026-02-21T08:52:43.9725351Z add.s32 %r1008, %r1007, 32768; 2026-02-21T08:52:43.9725578Z .loc 1 52 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:52:32 
2026-02-21T08:52:43.9725649Z add.s32 %r1009, %r1008, %r33; 2026-02-21T08:52:43.9725721Z ld.shared.b16 %rs145, [%r1009]; 2026-02-21T08:52:43.9725795Z ld.shared.b16 %rs146, [%r1009+1024]; 2026-02-21T08:52:43.9725866Z ld.shared.b16 %rs147, [%r1009+64]; 2026-02-21T08:52:43.9725938Z ld.shared.b16 %rs148, [%r1009+1088]; 2026-02-21T08:52:43.9726001Z add.s32 %r1010, %r1008, %r34; 2026-02-21T08:52:43.9726067Z ld.shared.b16 %rs149, [%r1010]; 2026-02-21T08:52:43.9726144Z ld.shared.b16 %rs150, [%r1010+1024]; 2026-02-21T08:52:43.9726210Z ld.shared.b16 %rs151, [%r1010+64]; 2026-02-21T08:52:43.9726277Z ld.shared.b16 %rs152, [%r1010+1088]; 2026-02-21T08:52:43.9726350Z add.s32 %r1011, %r1008, %r35; 2026-02-21T08:52:43.9726415Z ld.shared.b16 %rs153, [%r1011]; 2026-02-21T08:52:43.9726626Z ld.shared.b16 %rs154, [%r1011+1024]; 2026-02-21T08:52:43.9726700Z ld.shared.b16 %rs155, [%r1011+64]; 2026-02-21T08:52:43.9726773Z ld.shared.b16 %rs156, [%r1011+1088]; 2026-02-21T08:52:43.9726833Z add.s32 %r1012, %r1008, %r36; 2026-02-21T08:52:43.9726904Z ld.shared.b16 %rs157, [%r1012]; 2026-02-21T08:52:43.9726985Z ld.shared.b16 %rs158, [%r1012+1024]; 2026-02-21T08:52:43.9727052Z ld.shared.b16 %rs159, [%r1012+64]; 2026-02-21T08:52:43.9727121Z ld.shared.b16 %rs160, [%r1012+1088]; 2026-02-21T08:52:43.9727187Z add.s32 %r1013, %r1008, %r37; 2026-02-21T08:52:43.9727261Z ld.shared.b16 %rs161, [%r1013]; 2026-02-21T08:52:43.9727330Z ld.shared.b16 %rs162, [%r1013+1024]; 2026-02-21T08:52:43.9727654Z ld.shared.b16 %rs163, [%r1013+64]; 2026-02-21T08:52:43.9727730Z ld.shared.b16 %rs164, [%r1013+1088]; 2026-02-21T08:52:43.9727796Z add.s32 %r1014, %r1008, %r38; 2026-02-21T08:52:43.9727864Z ld.shared.b16 %rs165, [%r1014]; 2026-02-21T08:52:43.9727934Z ld.shared.b16 %rs166, [%r1014+1024]; 2026-02-21T08:52:43.9728006Z ld.shared.b16 %rs167, [%r1014+64]; 2026-02-21T08:52:43.9728074Z ld.shared.b16 %rs168, [%r1014+1088]; 2026-02-21T08:52:43.9728138Z add.s32 %r1015, %r1008, %r39; 2026-02-21T08:52:43.9728210Z ld.shared.b16 
%rs169, [%r1015]; 2026-02-21T08:52:43.9728277Z ld.shared.b16 %rs170, [%r1015+1024]; 2026-02-21T08:52:43.9728341Z ld.shared.b16 %rs171, [%r1015+64]; 2026-02-21T08:52:43.9728415Z ld.shared.b16 %rs172, [%r1015+1088]; 2026-02-21T08:52:43.9728609Z add.s32 %r1016, %r1008, %r40; 2026-02-21T08:52:43.9728688Z ld.shared.b16 %rs173, [%r1016]; 2026-02-21T08:52:43.9728761Z ld.shared.b16 %rs174, [%r1016+1024]; 2026-02-21T08:52:43.9728841Z ld.shared.b16 %rs175, [%r1016+64]; 2026-02-21T08:52:43.9728909Z ld.shared.b16 %rs176, [%r1016+1088]; 2026-02-21T08:52:43.9728981Z cvt.f32.bf16 %r834, %rs145; 2026-02-21T08:52:43.9729049Z cvt.f32.bf16 %r835, %rs146; 2026-02-21T08:52:43.9729112Z cvt.f32.bf16 %r836, %rs149; 2026-02-21T08:52:43.9729173Z cvt.f32.bf16 %r837, %rs150; 2026-02-21T08:52:43.9729243Z cvt.f32.bf16 %r854, %rs153; 2026-02-21T08:52:43.9729320Z cvt.f32.bf16 %r855, %rs154; 2026-02-21T08:52:43.9729382Z cvt.f32.bf16 %r856, %rs157; 2026-02-21T08:52:43.9729443Z cvt.f32.bf16 %r857, %rs158; 2026-02-21T08:52:43.9729510Z cvt.f32.bf16 %r874, %rs161; 2026-02-21T08:52:43.9729572Z cvt.f32.bf16 %r875, %rs162; 2026-02-21T08:52:43.9729632Z cvt.f32.bf16 %r876, %rs165; 2026-02-21T08:52:43.9729694Z cvt.f32.bf16 %r877, %rs166; 2026-02-21T08:52:43.9729765Z cvt.f32.bf16 %r894, %rs169; 2026-02-21T08:52:43.9729824Z cvt.f32.bf16 %r895, %rs170; 2026-02-21T08:52:43.9729887Z cvt.f32.bf16 %r896, %rs173; 2026-02-21T08:52:43.9729958Z cvt.f32.bf16 %r897, %rs174; 2026-02-21T08:52:43.9730020Z cvt.f32.bf16 %r914, %rs147; 2026-02-21T08:52:43.9730083Z cvt.f32.bf16 %r915, %rs148; 2026-02-21T08:52:43.9730149Z cvt.f32.bf16 %r916, %rs151; 2026-02-21T08:52:43.9730210Z cvt.f32.bf16 %r917, %rs152; 2026-02-21T08:52:43.9730272Z cvt.f32.bf16 %r934, %rs155; 2026-02-21T08:52:43.9730331Z cvt.f32.bf16 %r935, %rs156; 2026-02-21T08:52:43.9730407Z cvt.f32.bf16 %r936, %rs159; 2026-02-21T08:52:43.9730474Z cvt.f32.bf16 %r937, %rs160; 2026-02-21T08:52:43.9730534Z cvt.f32.bf16 %r954, %rs163; 2026-02-21T08:52:43.9730593Z cvt.f32.bf16 
%r955, %rs164; 2026-02-21T08:52:43.9730655Z cvt.f32.bf16 %r956, %rs167; 2026-02-21T08:52:43.9730716Z cvt.f32.bf16 %r957, %rs168; 2026-02-21T08:52:43.9730776Z cvt.f32.bf16 %r974, %rs171; 2026-02-21T08:52:43.9730845Z cvt.f32.bf16 %r975, %rs172; 2026-02-21T08:52:43.9730906Z cvt.f32.bf16 %r976, %rs175; 2026-02-21T08:52:43.9730969Z cvt.f32.bf16 %r977, %rs176; 2026-02-21T08:52:43.9731197Z .loc 1 67 45 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:67:45 2026-02-21T08:52:43.9731265Z add.s32 %r1017, %r26, %r1005; 2026-02-21T08:52:43.9731333Z ld.shared.b8 %rs177, [%r1017]; 2026-02-21T08:52:43.9731406Z ld.shared.b8 %rs178, [%r1017+512]; 2026-02-21T08:52:43.9731477Z ld.shared.b8 %rs179, [%r1017+1024]; 2026-02-21T08:52:43.9731555Z ld.shared.b8 %rs180, [%r1017+1536]; 2026-02-21T08:52:43.9731625Z ld.shared.b8 %rs181, [%r1017+2048]; 2026-02-21T08:52:43.9731697Z ld.shared.b8 %rs182, [%r1017+2560]; 2026-02-21T08:52:43.9731762Z ld.shared.b8 %rs183, [%r1017+3072]; 2026-02-21T08:52:43.9731828Z ld.shared.b8 %rs184, [%r1017+3584]; 2026-02-21T08:52:43.9732049Z .loc 1 57 28 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:57:28 2026-02-21T08:52:43.9732120Z shl.b16 %rs185, %rs177, 4; 2026-02-21T08:52:43.9732184Z shl.b16 %rs186, %rs178, 4; 2026-02-21T08:52:43.9732246Z shl.b16 %rs187, %rs179, 4; 2026-02-21T08:52:43.9732317Z shl.b16 %rs188, %rs180, 4; 2026-02-21T08:52:43.9732489Z shl.b16 %rs189, %rs181, 4; 2026-02-21T08:52:43.9732552Z shl.b16 %rs190, %rs182, 4; 2026-02-21T08:52:43.9732625Z shl.b16 %rs191, %rs183, 4; 2026-02-21T08:52:43.9732688Z shl.b16 %rs192, %rs184, 4; 2026-02-21T08:52:43.9732889Z .loc 1 72 58 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:72:58 2026-02-21T08:52:43.9732976Z selp.b16 %rs193, %rs185, %rs177, %p57; 2026-02-21T08:52:43.9733039Z cvt.s16.s8 %rs194, %rs193; 2026-02-21T08:52:43.9733101Z shr.s16 %rs195, %rs194, 4; 2026-02-21T08:52:43.9733176Z selp.b16 %rs196, %rs186, %rs178, %p57; 2026-02-21T08:52:43.9733246Z cvt.s16.s8 %rs197, 
%rs196; 2026-02-21T08:52:43.9733308Z shr.s16 %rs198, %rs197, 4; 2026-02-21T08:52:43.9733389Z selp.b16 %rs199, %rs187, %rs179, %p57; 2026-02-21T08:52:43.9733556Z cvt.s16.s8 %rs200, %rs199; 2026-02-21T08:52:43.9733625Z shr.s16 %rs201, %rs200, 4; 2026-02-21T08:52:43.9733698Z selp.b16 %rs202, %rs188, %rs180, %p57; 2026-02-21T08:52:43.9733765Z cvt.s16.s8 %rs203, %rs202; 2026-02-21T08:52:43.9733834Z shr.s16 %rs204, %rs203, 4; 2026-02-21T08:52:43.9733905Z selp.b16 %rs205, %rs189, %rs181, %p57; 2026-02-21T08:52:43.9733966Z cvt.s16.s8 %rs206, %rs205; 2026-02-21T08:52:43.9734033Z shr.s16 %rs207, %rs206, 4; 2026-02-21T08:52:43.9734101Z selp.b16 %rs208, %rs190, %rs182, %p57; 2026-02-21T08:52:43.9734170Z cvt.s16.s8 %rs209, %rs208; 2026-02-21T08:52:43.9734232Z shr.s16 %rs210, %rs209, 4; 2026-02-21T08:52:43.9734305Z selp.b16 %rs211, %rs191, %rs183, %p57; 2026-02-21T08:52:43.9734365Z cvt.s16.s8 %rs212, %rs211; 2026-02-21T08:52:43.9734426Z shr.s16 %rs213, %rs212, 4; 2026-02-21T08:52:43.9734501Z selp.b16 %rs214, %rs192, %rs184, %p57; 2026-02-21T08:52:43.9734563Z cvt.s16.s8 %rs215, %rs214; 2026-02-21T08:52:43.9734626Z shr.s16 %rs216, %rs215, 4; 2026-02-21T08:52:43.9734845Z .loc 1 77 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:77:32 2026-02-21T08:52:43.9734915Z cvt.rn.f32.s16 %r1018, %rs195; 2026-02-21T08:52:43.9734982Z cvt.rn.f32.s16 %r1019, %rs198; 2026-02-21T08:52:43.9735049Z cvt.rn.f32.s16 %r1020, %rs201; 2026-02-21T08:52:43.9735119Z cvt.rn.f32.s16 %r1021, %rs204; 2026-02-21T08:52:43.9735181Z cvt.rn.f32.s16 %r1022, %rs207; 2026-02-21T08:52:43.9735243Z cvt.rn.f32.s16 %r1023, %rs210; 2026-02-21T08:52:43.9735311Z cvt.rn.f32.s16 %r1024, %rs213; 2026-02-21T08:52:43.9735374Z cvt.rn.f32.s16 %r1025, %rs216; 2026-02-21T08:52:43.9735438Z st.shared.b32 [%r41], %r1018; 2026-02-21T08:52:43.9735506Z st.shared.b32 [%r41+16384], %r1022; 2026-02-21T08:52:43.9735575Z st.shared.b32 [%r42], %r1019; 2026-02-21T08:52:43.9735641Z st.shared.b32 [%r42+16384], %r1023; 
2026-02-21T08:52:43.9735705Z st.shared.b32 [%r43], %r1020; 2026-02-21T08:52:43.9735777Z st.shared.b32 [%r43+16384], %r1024; 2026-02-21T08:52:43.9735842Z st.shared.b32 [%r44], %r1021; 2026-02-21T08:52:43.9735908Z st.shared.b32 [%r44+16384], %r1025; 2026-02-21T08:52:43.9735969Z $L__tmp5: 2026-02-21T08:52:43.9736260Z .loc 2 291 36 // standard.py:291:36 @[ cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:84:40 ] 2026-02-21T08:52:43.9736327Z // begin inline asm 2026-02-21T08:52:43.9736413Z fence.proxy.async.shared::cta; 2026-02-21T08:52:43.9736597Z // end inline asm 2026-02-21T08:52:43.9736662Z bar.sync 0; 2026-02-21T08:52:43.9736746Z shfl.sync.idx.b32 %r1026, %r4, 0, 31, -1; 2026-02-21T08:52:43.9736827Z wgmma.fence.sync.aligned; 2026-02-21T08:52:43.9736891Z shl.b32 %r1027, %r1026, 9; 2026-02-21T08:52:43.9736955Z and.b32 %r1028, %r1027, 14336; 2026-02-21T08:52:43.9737030Z add.s32 %r1029, %r1028, %r1379; 2026-02-21T08:52:43.9737099Z bfe.u32 %r1030, %r1029, 4, 14; 2026-02-21T08:52:43.9737166Z cvt.u64.u32 %rd107, %r1030; 2026-02-21T08:52:43.9737248Z or.b64 %rd97, %rd107, 4611686293372403712; 2026-02-21T08:52:43.9737325Z mov.pred %p28, -1; 2026-02-21T08:52:43.9737386Z // begin inline asm 2026-02-21T08:52:43.9737766Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1423,%r1424,%r1425,%r1426,%r1427,%r1428,%r1429,%r1430}, {%r834,%r835,%r836,%r837}, %rd97, %p28, 1, 1; 2026-02-21T08:52:43.9737970Z // end inline asm 2026-02-21T08:52:43.9738036Z add.s32 %r1031, %r1029, 32; 2026-02-21T08:52:43.9738098Z bfe.u32 %r1032, %r1031, 4, 14; 2026-02-21T08:52:43.9738160Z cvt.u64.u32 %rd108, %r1032; 2026-02-21T08:52:43.9738241Z or.b64 %rd98, %rd108, 4611686293372403712; 2026-02-21T08:52:43.9738303Z // begin inline asm 2026-02-21T08:52:43.9738672Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1423,%r1424,%r1425,%r1426,%r1427,%r1428,%r1429,%r1430}, {%r854,%r855,%r856,%r857}, %rd98, %p28, 1, 1; 2026-02-21T08:52:43.9738735Z // end inline asm 2026-02-21T08:52:43.9738797Z 
add.s32 %r1033, %r1029, 64; 2026-02-21T08:52:43.9738858Z bfe.u32 %r1034, %r1033, 4, 14; 2026-02-21T08:52:43.9739039Z cvt.u64.u32 %rd109, %r1034; 2026-02-21T08:52:43.9739125Z or.b64 %rd99, %rd109, 4611686293372403712; 2026-02-21T08:52:43.9739187Z // begin inline asm 2026-02-21T08:52:43.9739546Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1423,%r1424,%r1425,%r1426,%r1427,%r1428,%r1429,%r1430}, {%r874,%r875,%r876,%r877}, %rd99, %p28, 1, 1; 2026-02-21T08:52:43.9739614Z // end inline asm 2026-02-21T08:52:43.9739674Z add.s32 %r1035, %r1029, 96; 2026-02-21T08:52:43.9739740Z bfe.u32 %r1036, %r1035, 4, 14; 2026-02-21T08:52:43.9739807Z cvt.u64.u32 %rd110, %r1036; 2026-02-21T08:52:43.9739888Z or.b64 %rd100, %rd110, 4611686293372403712; 2026-02-21T08:52:43.9739960Z // begin inline asm 2026-02-21T08:52:43.9740328Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1423,%r1424,%r1425,%r1426,%r1427,%r1428,%r1429,%r1430}, {%r894,%r895,%r896,%r897}, %rd100, %p28, 1, 1; 2026-02-21T08:52:43.9740394Z // end inline asm 2026-02-21T08:52:43.9740456Z add.s32 %r1037, %r1029, 16384; 2026-02-21T08:52:43.9740520Z bfe.u32 %r1038, %r1037, 4, 14; 2026-02-21T08:52:43.9740590Z cvt.u64.u32 %rd111, %r1038; 2026-02-21T08:52:43.9740668Z or.b64 %rd101, %rd111, 4611686293372403712; 2026-02-21T08:52:43.9740732Z // begin inline asm 2026-02-21T08:52:43.9741110Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1423,%r1424,%r1425,%r1426,%r1427,%r1428,%r1429,%r1430}, {%r914,%r915,%r916,%r917}, %rd101, %p28, 1, 1; 2026-02-21T08:52:43.9741169Z // end inline asm 2026-02-21T08:52:43.9741237Z add.s32 %r1039, %r1029, 16416; 2026-02-21T08:52:43.9741301Z bfe.u32 %r1040, %r1039, 4, 14; 2026-02-21T08:52:43.9741373Z cvt.u64.u32 %rd112, %r1040; 2026-02-21T08:52:43.9741451Z or.b64 %rd102, %rd112, 4611686293372403712; 2026-02-21T08:52:43.9741516Z // begin inline asm 2026-02-21T08:52:43.9741896Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 
{%r1423,%r1424,%r1425,%r1426,%r1427,%r1428,%r1429,%r1430}, {%r934,%r935,%r936,%r937}, %rd102, %p28, 1, 1; 2026-02-21T08:52:43.9741955Z // end inline asm 2026-02-21T08:52:43.9742021Z add.s32 %r1041, %r1029, 16448; 2026-02-21T08:52:43.9742091Z bfe.u32 %r1042, %r1041, 4, 14; 2026-02-21T08:52:43.9742156Z cvt.u64.u32 %rd113, %r1042; 2026-02-21T08:52:43.9742230Z or.b64 %rd103, %rd113, 4611686293372403712; 2026-02-21T08:52:43.9742295Z // begin inline asm 2026-02-21T08:52:43.9742664Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1423,%r1424,%r1425,%r1426,%r1427,%r1428,%r1429,%r1430}, {%r954,%r955,%r956,%r957}, %rd103, %p28, 1, 1; 2026-02-21T08:52:43.9742724Z // end inline asm 2026-02-21T08:52:43.9742789Z add.s32 %r1043, %r1029, 16480; 2026-02-21T08:52:43.9742864Z bfe.u32 %r1044, %r1043, 4, 14; 2026-02-21T08:52:43.9742934Z cvt.u64.u32 %rd114, %r1044; 2026-02-21T08:52:43.9743007Z or.b64 %rd104, %rd114, 4611686293372403712; 2026-02-21T08:52:43.9743073Z // begin inline asm 2026-02-21T08:52:43.9743434Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1423,%r1424,%r1425,%r1426,%r1427,%r1428,%r1429,%r1430}, {%r974,%r975,%r976,%r977}, %rd104, %p28, 1, 1; 2026-02-21T08:52:43.9743497Z // end inline asm 2026-02-21T08:52:43.9743577Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:43.9743646Z mov.b32 %r987, 0; 2026-02-21T08:52:43.9743710Z mov.b32 %r986, %r1379; 2026-02-21T08:52:43.9743879Z mov.b32 %r988, %r987; 2026-02-21T08:52:43.9743948Z // begin inline asm 2026-02-21T08:52:43.9744124Z // wait for regs: %r1423,%r1424,%r1425,%r1426,%r1427,%r1428,%r1429,%r1430,%r986,%r987,%r988 2026-02-21T08:52:43.9744205Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:43.9744268Z // end inline asm 2026-02-21T08:52:43.9744327Z $L__tmp6: 2026-02-21T08:52:43.9744550Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.9744616Z add.s32 %r1045, %r1422, 1; 2026-02-21T08:52:43.9744691Z setp.gt.s32 %p39, %r1045, 1; 
2026-02-21T08:52:43.9744762Z selp.b32 %r1422, 0, %r1045, %p39; 2026-02-21T08:52:43.9745076Z .loc 1 48 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:32 2026-02-21T08:52:43.9745162Z mad.wide.s32 %rd105, %r1419, 2, %rd15; 2026-02-21T08:52:43.9745362Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.9745428Z shl.b32 %r1046, %r1422, 12; 2026-02-21T08:52:43.9745493Z shl.b32 %r1047, %r1422, 13; 2026-02-21T08:52:43.9745560Z add.s32 %r1000, %r13, %r1047; 2026-02-21T08:52:43.9745626Z selp.b32 %r1001, 8, 0, %p37; 2026-02-21T08:52:43.9745690Z // begin inline asm 2026-02-21T08:52:43.9745848Z cp.async.ca.shared.global [ %r1000 + 0 ], [ %rd105 + 0 ], 0x8, %r1001; 2026-02-21T08:52:43.9745906Z // end inline asm 2026-02-21T08:52:43.9745975Z cp.async.commit_group; 2026-02-21T08:52:43.9746177Z .loc 1 54 34 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:34 2026-02-21T08:52:43.9746240Z cvt.s64.s32 %rd115, %r1420; 2026-02-21T08:52:43.9746316Z add.s64 %rd106, %rd16, %rd115; 2026-02-21T08:52:43.9746633Z .loc 1 54 87 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:87 2026-02-21T08:52:43.9746707Z add.s32 %r1002, %r15, %r1046; 2026-02-21T08:52:43.9746769Z selp.b32 %r1003, 4, 0, %p37; 2026-02-21T08:52:43.9746832Z // begin inline asm 2026-02-21T08:52:43.9746978Z cp.async.ca.shared.global [ %r1002 + 0 ], [ %rd106 + 0 ], 0x4, %r1003; 2026-02-21T08:52:43.9747038Z // end inline asm 2026-02-21T08:52:43.9747105Z cp.async.commit_group; 2026-02-21T08:52:43.9747333Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.9747401Z add.s32 %r1420, %r1420, 229376; 2026-02-21T08:52:43.9747463Z add.s32 %r1419, %r1419, 64; 2026-02-21T08:52:43.9747531Z setp.lt.u64 %p40, %rd149, 4064; 2026-02-21T08:52:43.9747601Z @%p40 bra $L__BB0_7; 2026-02-21T08:52:43.9747712Z // %bb.8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:43.9747916Z .loc 1 33 32 // 
cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:33:32 2026-02-21T08:52:43.9747991Z or.b32 %r1052, %r127, %r11; 2026-02-21T08:52:43.9748196Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.9748267Z cp.async.wait_group 0; 2026-02-21T08:52:43.9748330Z bar.sync 0; 2026-02-21T08:52:43.9748601Z .loc 1 87 28 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:87:28 2026-02-21T08:52:43.9748687Z cvt.rn.bf16x2.f32 %r1053, %r1424, %r1423; 2026-02-21T08:52:43.9748765Z cvt.rn.bf16x2.f32 %r1054, %r1426, %r1425; 2026-02-21T08:52:43.9748844Z cvt.rn.bf16x2.f32 %r1055, %r1428, %r1427; 2026-02-21T08:52:43.9748916Z cvt.rn.bf16x2.f32 %r1056, %r1430, %r1429; 2026-02-21T08:52:43.9749120Z .loc 1 88 50 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:88:50 2026-02-21T08:52:43.9749203Z mad.lo.s32 %r1057, %r126, 7168, %r1052; 2026-02-21T08:52:43.9749404Z .loc 1 88 22 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:88:22 2026-02-21T08:52:43.9749477Z mad.wide.s32 %rd116, %r1057, 2, %rd17; 2026-02-21T08:52:43.9749677Z .loc 1 88 81 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:88:81 2026-02-21T08:52:43.9750023Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r31], {%r1053, %r1054, %r1055, %r1056}; 2026-02-21T08:52:43.9750092Z bar.sync 0; 2026-02-21T08:52:43.9750217Z ld.shared.v4.b32 {%r1048, %r1049, %r1050, %r1051}, [%r32]; 2026-02-21T08:52:43.9750279Z // begin inline asm 2026-02-21T08:52:43.9750408Z st.global.v4.b32 [ %rd116 + 0 ], { %r1048, %r1049, %r1050, %r1051 }; 2026-02-21T08:52:43.9750470Z // end inline asm 2026-02-21T08:52:43.9750686Z .loc 1 19 144 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:19:144 2026-02-21T08:52:43.9750752Z add.s32 %r1394, %r1394, 12672; 2026-02-21T08:52:43.9750824Z setp.lt.s32 %p41, %r1394, %r1431; 2026-02-21T08:52:43.9750891Z @%p41 bra $L__BB0_2; 2026-02-21T08:52:43.9751136Z $L__BB0_9: // %.preheader 2026-02-21T08:52:43.9751212Z setp.gt.s32 %p42, %r1431, 
55; 2026-02-21T08:52:43.9751281Z @%p42 bra $L__BB0_14; 2026-02-21T08:52:43.9751373Z // %bb.10: // %.lr.ph151 2026-02-21T08:52:43.9751579Z .loc 1 0 144 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:0:144 2026-02-21T08:52:43.9751645Z and.b32 %r1060, %r1378, 56; 2026-02-21T08:52:43.9751718Z xor.b32 %r1061, %r1060, %r1377; 2026-02-21T08:52:43.9751782Z add.s32 %r1063, %r1379, %r1061; 2026-02-21T08:52:43.9751845Z add.s32 %r47, %r1063, 32768; 2026-02-21T08:52:43.9751912Z add.s32 %r1065, %r1379, 49152; 2026-02-21T08:52:43.9751976Z add.s32 %r49, %r1065, %r1381; 2026-02-21T08:52:43.9752037Z add.s32 %r1108, %r1063, 40960; 2026-02-21T08:52:43.9752100Z add.s32 %r1066, %r1379, %r1381; 2026-02-21T08:52:43.9752167Z add.s32 %r1110, %r1066, 53248; 2026-02-21T08:52:43.9752228Z and.b32 %r1068, %r1382, 6144; 2026-02-21T08:52:43.9752294Z and.b32 %r1070, %r1383, 896; 2026-02-21T08:52:43.9752363Z or.b32 %r1072, %r1068, %r1070; 2026-02-21T08:52:43.9752424Z or.b32 %r52, %r1072, %r1384; 2026-02-21T08:52:43.9752489Z xor.b32 %r53, %r52, 8; 2026-02-21T08:52:43.9752553Z xor.b32 %r54, %r52, 16; 2026-02-21T08:52:43.9752613Z xor.b32 %r55, %r52, 24; 2026-02-21T08:52:43.9752682Z xor.b32 %r56, %r52, 32; 2026-02-21T08:52:43.9752741Z xor.b32 %r57, %r52, 40; 2026-02-21T08:52:43.9752798Z xor.b32 %r58, %r52, 48; 2026-02-21T08:52:43.9752863Z xor.b32 %r59, %r52, 56; 2026-02-21T08:52:43.9752924Z and.b32 %r1074, %r1378, 384; 2026-02-21T08:52:43.9752987Z add.s32 %r1075, %r1065, %r1074; 2026-02-21T08:52:43.9753051Z add.s32 %r60, %r1075, %r1385; 2026-02-21T08:52:43.9753121Z shl.b32 %r1076, %r1385, 7; 2026-02-21T08:52:43.9753179Z and.b32 %r1078, %r1386, 112; 2026-02-21T08:52:43.9753242Z or.b32 %r1080, %r1076, %r1078; 2026-02-21T08:52:43.9753312Z xor.b32 %r1081, %r1080, %r1387; 2026-02-21T08:52:43.9753381Z add.s32 %r61, %r1379, %r1081; 2026-02-21T08:52:43.9753446Z xor.b32 %r1082, %r1081, 32; 2026-02-21T08:52:43.9753508Z add.s32 %r62, %r1379, %r1082; 2026-02-21T08:52:43.9753575Z xor.b32 %r1083, %r1081, 
64; 2026-02-21T08:52:43.9753640Z add.s32 %r63, %r1379, %r1083; 2026-02-21T08:52:43.9753701Z xor.b32 %r1084, %r1081, 96; 2026-02-21T08:52:43.9753770Z add.s32 %r64, %r1379, %r1084; 2026-02-21T08:52:43.9753830Z shl.b32 %r1086, %r1388, 11; 2026-02-21T08:52:43.9753893Z and.b32 %r1088, %r1377, 768; 2026-02-21T08:52:43.9753957Z and.b32 %r1090, %r1390, 96; 2026-02-21T08:52:43.9754020Z and.b32 %r1093, %r1392, 1024; 2026-02-21T08:52:43.9754087Z or.b32 %r1094, %r1389, %r1088; 2026-02-21T08:52:43.9754148Z or.b32 %r1095, %r1090, %r1391; 2026-02-21T08:52:43.9754217Z xor.b32 %r1096, %r1094, %r1095; 2026-02-21T08:52:43.9754280Z add.s32 %r1097, %r1379, %r1086; 2026-02-21T08:52:43.9754341Z add.s32 %r1098, %r1097, %r1093; 2026-02-21T08:52:43.9754407Z add.s32 %r65, %r1098, %r1096; 2026-02-21T08:52:43.9754471Z and.b32 %r1100, %r1393, 15360; 2026-02-21T08:52:43.9754534Z shl.b32 %r1101, %r1388, 4; 2026-02-21T08:52:43.9754595Z xor.b32 %r1102, %r1101, %r5; 2026-02-21T08:52:43.9754661Z add.s32 %r1103, %r1379, %r1100; 2026-02-21T08:52:43.9754832Z add.s32 %r66, %r1103, %r1102; 2026-02-21T08:52:43.9755060Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.9755129Z or.b32 %r67, %r8, 128; 2026-02-21T08:52:43.9755337Z .loc 1 19 144 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:19:144 2026-02-21T08:52:43.9755413Z mad.wide.u32 %rd117, %r4, 7168, %rd16; 2026-02-21T08:52:43.9755484Z add.s64 %rd1, %rd117, 458752; 2026-02-21T08:52:43.9755547Z cvt.u64.u32 %rd125, %r1380; 2026-02-21T08:52:43.9755664Z $L__BB0_11: // =>This Loop Header: Depth=1 2026-02-21T08:52:43.9755767Z // Child Loop BB0_12 Depth 2 2026-02-21T08:52:43.9756076Z .loc 1 25 35 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:25:35 2026-02-21T08:52:43.9756154Z mul.hi.s32 %r1115, %r1431, -1840700269; 2026-02-21T08:52:43.9756223Z add.s32 %r1116, %r1115, %r1431; 2026-02-21T08:52:43.9756293Z shr.u32 %r1117, %r1116, 31; 2026-02-21T08:52:43.9756355Z shr.s32 %r1118, %r1116, 6; 
2026-02-21T08:52:43.9756417Z add.s32 %r1119, %r1118, %r1117; 2026-02-21T08:52:43.9756757Z .loc 1 26 33 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:26:33 2026-02-21T08:52:43.9756835Z shl.b32 %r1120, %r1119, 1; 2026-02-21T08:52:43.9757045Z .loc 1 27 39 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:27:39 2026-02-21T08:52:43.9757109Z sub.s32 %r1121, 1, %r1120; 2026-02-21T08:52:43.9757308Z .loc 1 27 52 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:27:52 2026-02-21T08:52:43.9757370Z min.u32 %r1122, %r1121, 2; 2026-02-21T08:52:43.9757564Z .loc 1 28 45 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:45 2026-02-21T08:52:43.9757632Z mul.lo.s32 %r1123, %r1119, 112; 2026-02-21T08:52:43.9757694Z sub.s32 %r1124, %r1431, %r1123; 2026-02-21T08:52:43.9757889Z .loc 1 28 64 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:64 2026-02-21T08:52:43.9757958Z cvt.u16.u32 %rs217, %r1124; 2026-02-21T08:52:43.9758020Z cvt.s8.s32 %rs218, %r1124; 2026-02-21T08:52:43.9758081Z cvt.u16.u32 %rs219, %r1122; 2026-02-21T08:52:43.9758275Z .loc 1 29 51 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:29:51 2026-02-21T08:52:43.9758344Z div.s16 %rs220, %rs218, %rs219; 2026-02-21T08:52:43.9758536Z .loc 1 28 64 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:64 2026-02-21T08:52:43.9758604Z mul.lo.s16 %rs221, %rs220, %rs219; 2026-02-21T08:52:43.9758671Z sub.s16 %rs222, %rs217, %rs221; 2026-02-21T08:52:43.9758743Z cvt.u32.u16 %r1125, %rs222; 2026-02-21T08:52:43.9758810Z cvt.s32.s8 %r1126, %r1125; 2026-02-21T08:52:43.9759009Z .loc 1 28 30 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:28:30 2026-02-21T08:52:43.9759073Z add.s32 %r1127, %r1120, %r1126; 2026-02-21T08:52:43.9759264Z .loc 1 30 27 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:30:27 2026-02-21T08:52:43.9759333Z shl.b32 %r1128, %r1127, 6; 2026-02-21T08:52:43.9759523Z .loc 1 31 32 // 
cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:31:32 2026-02-21T08:52:43.9759587Z or.b32 %r156, %r1128, %r6; 2026-02-21T08:52:43.9759781Z .loc 1 32 27 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:32:27 2026-02-21T08:52:43.9759848Z cvt.s16.s8 %rs223, %rs220; 2026-02-21T08:52:43.9759918Z mul.wide.s16 %r157, %rs223, 128; 2026-02-21T08:52:43.9760119Z .loc 1 33 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:33:32 2026-02-21T08:52:43.9760192Z or.b32 %r1129, %r157, %r10; 2026-02-21T08:52:43.9760386Z .loc 1 48 53 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:53 2026-02-21T08:52:43.9760599Z shl.b32 %r1130, %r156, 13; 2026-02-21T08:52:43.9760805Z .loc 1 48 60 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:60 2026-02-21T08:52:43.9760868Z or.b32 %r1131, %r1130, %r8; 2026-02-21T08:52:43.9761070Z .loc 1 48 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:32 2026-02-21T08:52:43.9761150Z mad.wide.s32 %rd118, %r1131, 2, %rd15; 2026-02-21T08:52:43.9761211Z mov.b32 %r1105, 8; 2026-02-21T08:52:43.9761408Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.9761469Z // begin inline asm 2026-02-21T08:52:43.9761623Z cp.async.ca.shared.global [ %r47 + 0 ], [ %rd118 + 0 ], 0x8, %r1105; 2026-02-21T08:52:43.9761683Z // end inline asm 2026-02-21T08:52:43.9761871Z cp.async.commit_group; 2026-02-21T08:52:43.9762093Z .loc 1 54 62 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:62 2026-02-21T08:52:43.9762161Z add.s32 %r1132, %r1129, %r1380; 2026-02-21T08:52:43.9762355Z .loc 1 54 34 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:34 2026-02-21T08:52:43.9762422Z cvt.s64.s32 %rd123, %r1132; 2026-02-21T08:52:43.9762488Z add.s64 %rd119, %rd16, %rd123; 2026-02-21T08:52:43.9762546Z mov.b32 %r1107, 4; 2026-02-21T08:52:43.9762738Z .loc 1 54 87 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:87 2026-02-21T08:52:43.9762807Z // begin 
inline asm 2026-02-21T08:52:43.9762942Z cp.async.ca.shared.global [ %r49 + 0 ], [ %rd119 + 0 ], 0x4, %r1107; 2026-02-21T08:52:43.9763000Z // end inline asm 2026-02-21T08:52:43.9763072Z cp.async.commit_group; 2026-02-21T08:52:43.9763270Z .loc 1 48 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:32 2026-02-21T08:52:43.9763336Z add.s64 %rd120, %rd118, 128; 2026-02-21T08:52:43.9763533Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.9763596Z bar.sync 0; 2026-02-21T08:52:43.9763660Z // begin inline asm 2026-02-21T08:52:43.9763803Z cp.async.ca.shared.global [ %r1108 + 0 ], [ %rd120 + 0 ], 0x8, %r1105; 2026-02-21T08:52:43.9763866Z // end inline asm 2026-02-21T08:52:43.9763931Z cp.async.commit_group; 2026-02-21T08:52:43.9764125Z .loc 1 54 34 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:34 2026-02-21T08:52:43.9764208Z cvt.s64.s32 %rd124, %r1129; 2026-02-21T08:52:43.9764275Z add.s64 %rd126, %rd124, %rd125; 2026-02-21T08:52:43.9764339Z add.s64 %rd127, %rd16, %rd126; 2026-02-21T08:52:43.9764404Z add.s64 %rd121, %rd127, 229376; 2026-02-21T08:52:43.9764606Z .loc 1 54 87 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:87 2026-02-21T08:52:43.9764668Z // begin inline asm 2026-02-21T08:52:43.9764806Z cp.async.ca.shared.global [ %r1110 + 0 ], [ %rd121 + 0 ], 0x4, %r1107; 2026-02-21T08:52:43.9764871Z // end inline asm 2026-02-21T08:52:43.9764941Z cp.async.commit_group; 2026-02-21T08:52:43.9765145Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.9765214Z shl.b32 %r1133, %r1119, 7; 2026-02-21T08:52:43.9765276Z or.b32 %r1134, %r6, %r1133; 2026-02-21T08:52:43.9765337Z cvt.s16.s8 %rs224, %rs222; 2026-02-21T08:52:43.9765413Z mad.wide.s16 %r1135, %rs224, 64, %r1134; 2026-02-21T08:52:43.9765477Z shl.b32 %r1136, %r1135, 13; 2026-02-21T08:52:43.9765539Z or.b32 %r1432, %r67, %r1136; 2026-02-21T08:52:43.9765602Z add.s64 %rd150, %rd1, %rd124; 
2026-02-21T08:52:43.9765679Z mov.b32 %r1435, 0f00000000; 2026-02-21T08:52:43.9765740Z mov.b32 %r1434, 1; 2026-02-21T08:52:43.9765801Z mov.b32 %r1433, -1; 2026-02-21T08:52:43.9765863Z mov.b64 %rd151, -32; 2026-02-21T08:52:43.9765930Z mov.b32 %r1436, %r1435; 2026-02-21T08:52:43.9765989Z mov.b32 %r1437, %r1435; 2026-02-21T08:52:43.9766047Z mov.b32 %r1438, %r1435; 2026-02-21T08:52:43.9766219Z mov.b32 %r1439, %r1435; 2026-02-21T08:52:43.9766277Z mov.b32 %r1440, %r1435; 2026-02-21T08:52:43.9766335Z mov.b32 %r1441, %r1435; 2026-02-21T08:52:43.9766398Z mov.b32 %r1442, %r1435; 2026-02-21T08:52:43.9766625Z $L__BB0_12: // Parent Loop BB0_11 Depth=1 2026-02-21T08:52:43.9766738Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:43.9766806Z add.s64 %rd151, %rd151, 32; 2026-02-21T08:52:43.9766887Z setp.lt.u64 %p52, %rd151, 4032; 2026-02-21T08:52:43.9766951Z add.s32 %r1323, %r1433, 1; 2026-02-21T08:52:43.9767020Z setp.gt.s32 %p53, %r1323, 1; 2026-02-21T08:52:43.9767094Z selp.b32 %r1433, 0, %r1323, %p53; 2026-02-21T08:52:43.9767438Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.9767511Z cp.async.wait_group 2; 2026-02-21T08:52:43.9767570Z bar.sync 0; 2026-02-21T08:52:43.9767638Z shl.b32 %r1324, %r1433, 12; 2026-02-21T08:52:43.9767701Z shl.b32 %r1325, %r1433, 13; 2026-02-21T08:52:43.9767763Z add.s32 %r1326, %r1379, %r1325; 2026-02-21T08:52:43.9767830Z add.s32 %r1327, %r1326, 32768; 2026-02-21T08:52:43.9768024Z .loc 1 52 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:52:32 2026-02-21T08:52:43.9768086Z add.s32 %r1328, %r1327, %r52; 2026-02-21T08:52:43.9768158Z ld.shared.b16 %rs225, [%r1328]; 2026-02-21T08:52:43.9768240Z ld.shared.b16 %rs226, [%r1328+1024]; 2026-02-21T08:52:43.9768310Z ld.shared.b16 %rs227, [%r1328+64]; 2026-02-21T08:52:43.9768377Z ld.shared.b16 %rs228, [%r1328+1088]; 2026-02-21T08:52:43.9768444Z add.s32 %r1329, %r1327, %r53; 2026-02-21T08:52:43.9768508Z ld.shared.b16 %rs229, [%r1329]; 
2026-02-21T08:52:43.9768574Z ld.shared.b16 %rs230, [%r1329+1024]; 2026-02-21T08:52:43.9768648Z ld.shared.b16 %rs231, [%r1329+64]; 2026-02-21T08:52:43.9768714Z ld.shared.b16 %rs232, [%r1329+1088]; 2026-02-21T08:52:43.9768775Z add.s32 %r1330, %r1327, %r54; 2026-02-21T08:52:43.9768842Z ld.shared.b16 %rs233, [%r1330]; 2026-02-21T08:52:43.9768911Z ld.shared.b16 %rs234, [%r1330+1024]; 2026-02-21T08:52:43.9768976Z ld.shared.b16 %rs235, [%r1330+64]; 2026-02-21T08:52:43.9769041Z ld.shared.b16 %rs236, [%r1330+1088]; 2026-02-21T08:52:43.9769108Z add.s32 %r1331, %r1327, %r55; 2026-02-21T08:52:43.9769172Z ld.shared.b16 %rs237, [%r1331]; 2026-02-21T08:52:43.9769239Z ld.shared.b16 %rs238, [%r1331+1024]; 2026-02-21T08:52:43.9769310Z ld.shared.b16 %rs239, [%r1331+64]; 2026-02-21T08:52:43.9769375Z ld.shared.b16 %rs240, [%r1331+1088]; 2026-02-21T08:52:43.9769437Z add.s32 %r1332, %r1327, %r56; 2026-02-21T08:52:43.9769500Z ld.shared.b16 %rs241, [%r1332]; 2026-02-21T08:52:43.9769571Z ld.shared.b16 %rs242, [%r1332+1024]; 2026-02-21T08:52:43.9769637Z ld.shared.b16 %rs243, [%r1332+64]; 2026-02-21T08:52:43.9769704Z ld.shared.b16 %rs244, [%r1332+1088]; 2026-02-21T08:52:43.9769771Z add.s32 %r1333, %r1327, %r57; 2026-02-21T08:52:43.9769836Z ld.shared.b16 %rs245, [%r1333]; 2026-02-21T08:52:43.9769904Z ld.shared.b16 %rs246, [%r1333+1024]; 2026-02-21T08:52:43.9769971Z ld.shared.b16 %rs247, [%r1333+64]; 2026-02-21T08:52:43.9770047Z ld.shared.b16 %rs248, [%r1333+1088]; 2026-02-21T08:52:43.9770110Z add.s32 %r1334, %r1327, %r58; 2026-02-21T08:52:43.9770174Z ld.shared.b16 %rs249, [%r1334]; 2026-02-21T08:52:43.9770245Z ld.shared.b16 %rs250, [%r1334+1024]; 2026-02-21T08:52:43.9770313Z ld.shared.b16 %rs251, [%r1334+64]; 2026-02-21T08:52:43.9770380Z ld.shared.b16 %rs252, [%r1334+1088]; 2026-02-21T08:52:43.9770458Z add.s32 %r1335, %r1327, %r59; 2026-02-21T08:52:43.9770524Z ld.shared.b16 %rs253, [%r1335]; 2026-02-21T08:52:43.9770592Z ld.shared.b16 %rs254, [%r1335+1024]; 2026-02-21T08:52:43.9770657Z 
ld.shared.b16 %rs255, [%r1335+64]; 2026-02-21T08:52:43.9770730Z ld.shared.b16 %rs256, [%r1335+1088]; 2026-02-21T08:52:43.9770796Z cvt.f32.bf16 %r1153, %rs225; 2026-02-21T08:52:43.9770858Z cvt.f32.bf16 %r1154, %rs226; 2026-02-21T08:52:43.9770926Z cvt.f32.bf16 %r1155, %rs229; 2026-02-21T08:52:43.9771120Z cvt.f32.bf16 %r1156, %rs230; 2026-02-21T08:52:43.9771181Z cvt.f32.bf16 %r1173, %rs233; 2026-02-21T08:52:43.9771242Z cvt.f32.bf16 %r1174, %rs234; 2026-02-21T08:52:43.9771310Z cvt.f32.bf16 %r1175, %rs237; 2026-02-21T08:52:43.9771371Z cvt.f32.bf16 %r1176, %rs238; 2026-02-21T08:52:43.9771431Z cvt.f32.bf16 %r1193, %rs241; 2026-02-21T08:52:43.9771495Z cvt.f32.bf16 %r1194, %rs242; 2026-02-21T08:52:43.9771555Z cvt.f32.bf16 %r1195, %rs245; 2026-02-21T08:52:43.9771627Z cvt.f32.bf16 %r1196, %rs246; 2026-02-21T08:52:43.9771690Z cvt.f32.bf16 %r1213, %rs249; 2026-02-21T08:52:43.9771758Z cvt.f32.bf16 %r1214, %rs250; 2026-02-21T08:52:43.9771819Z cvt.f32.bf16 %r1215, %rs253; 2026-02-21T08:52:43.9771881Z cvt.f32.bf16 %r1216, %rs254; 2026-02-21T08:52:43.9772039Z cvt.f32.bf16 %r1233, %rs227; 2026-02-21T08:52:43.9772117Z cvt.f32.bf16 %r1234, %rs228; 2026-02-21T08:52:43.9772179Z cvt.f32.bf16 %r1235, %rs231; 2026-02-21T08:52:43.9772245Z cvt.f32.bf16 %r1236, %rs232; 2026-02-21T08:52:43.9772334Z cvt.f32.bf16 %r1253, %rs235; 2026-02-21T08:52:43.9772440Z cvt.f32.bf16 %r1254, %rs236; 2026-02-21T08:52:43.9772546Z cvt.f32.bf16 %r1255, %rs239; 2026-02-21T08:52:43.9772648Z cvt.f32.bf16 %r1256, %rs240; 2026-02-21T08:52:43.9772713Z cvt.f32.bf16 %r1273, %rs243; 2026-02-21T08:52:43.9772775Z cvt.f32.bf16 %r1274, %rs244; 2026-02-21T08:52:43.9772840Z cvt.f32.bf16 %r1275, %rs247; 2026-02-21T08:52:43.9772902Z cvt.f32.bf16 %r1276, %rs248; 2026-02-21T08:52:43.9772962Z cvt.f32.bf16 %r1293, %rs251; 2026-02-21T08:52:43.9773022Z cvt.f32.bf16 %r1294, %rs252; 2026-02-21T08:52:43.9773091Z cvt.f32.bf16 %r1295, %rs255; 2026-02-21T08:52:43.9773153Z cvt.f32.bf16 %r1296, %rs256; 2026-02-21T08:52:43.9773354Z .loc 1 67 45 
// cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:67:45 2026-02-21T08:52:43.9773421Z add.s32 %r1336, %r60, %r1324; 2026-02-21T08:52:43.9773489Z ld.shared.b8 %rs257, [%r1336]; 2026-02-21T08:52:43.9773554Z ld.shared.b8 %rs258, [%r1336+512]; 2026-02-21T08:52:43.9773626Z ld.shared.b8 %rs259, [%r1336+1024]; 2026-02-21T08:52:43.9773698Z ld.shared.b8 %rs260, [%r1336+1536]; 2026-02-21T08:52:43.9773762Z ld.shared.b8 %rs261, [%r1336+2048]; 2026-02-21T08:52:43.9773826Z ld.shared.b8 %rs262, [%r1336+2560]; 2026-02-21T08:52:43.9773895Z ld.shared.b8 %rs263, [%r1336+3072]; 2026-02-21T08:52:43.9773959Z ld.shared.b8 %rs264, [%r1336+3584]; 2026-02-21T08:52:43.9774153Z .loc 1 57 28 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:57:28 2026-02-21T08:52:43.9774219Z shl.b16 %rs265, %rs257, 4; 2026-02-21T08:52:43.9774281Z shl.b16 %rs266, %rs258, 4; 2026-02-21T08:52:43.9774340Z shl.b16 %rs267, %rs259, 4; 2026-02-21T08:52:43.9774400Z shl.b16 %rs268, %rs260, 4; 2026-02-21T08:52:43.9774468Z shl.b16 %rs269, %rs261, 4; 2026-02-21T08:52:43.9774529Z shl.b16 %rs270, %rs262, 4; 2026-02-21T08:52:43.9774589Z shl.b16 %rs271, %rs263, 4; 2026-02-21T08:52:43.9774655Z shl.b16 %rs272, %rs264, 4; 2026-02-21T08:52:43.9774850Z .loc 1 72 58 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:72:58 2026-02-21T08:52:43.9774923Z selp.b16 %rs273, %rs265, %rs257, %p57; 2026-02-21T08:52:43.9774991Z cvt.s16.s8 %rs274, %rs273; 2026-02-21T08:52:43.9775052Z shr.s16 %rs275, %rs274, 4; 2026-02-21T08:52:43.9775125Z selp.b16 %rs276, %rs266, %rs258, %p57; 2026-02-21T08:52:43.9775186Z cvt.s16.s8 %rs277, %rs276; 2026-02-21T08:52:43.9775252Z shr.s16 %rs278, %rs277, 4; 2026-02-21T08:52:43.9775323Z selp.b16 %rs279, %rs267, %rs259, %p57; 2026-02-21T08:52:43.9775383Z cvt.s16.s8 %rs280, %rs279; 2026-02-21T08:52:43.9775447Z shr.s16 %rs281, %rs280, 4; 2026-02-21T08:52:43.9775515Z selp.b16 %rs282, %rs268, %rs260, %p57; 2026-02-21T08:52:43.9775576Z cvt.s16.s8 %rs283, %rs282; 2026-02-21T08:52:43.9775654Z 
shr.s16 %rs284, %rs283, 4; 2026-02-21T08:52:43.9775730Z selp.b16 %rs285, %rs269, %rs261, %p57; 2026-02-21T08:52:43.9775792Z cvt.s16.s8 %rs286, %rs285; 2026-02-21T08:52:43.9775972Z shr.s16 %rs287, %rs286, 4; 2026-02-21T08:52:43.9776045Z selp.b16 %rs288, %rs270, %rs262, %p57; 2026-02-21T08:52:43.9776106Z cvt.s16.s8 %rs289, %rs288; 2026-02-21T08:52:43.9776166Z shr.s16 %rs290, %rs289, 4; 2026-02-21T08:52:43.9776236Z selp.b16 %rs291, %rs271, %rs263, %p57; 2026-02-21T08:52:43.9776300Z cvt.s16.s8 %rs292, %rs291; 2026-02-21T08:52:43.9776360Z shr.s16 %rs293, %rs292, 4; 2026-02-21T08:52:43.9776427Z selp.b16 %rs294, %rs272, %rs264, %p57; 2026-02-21T08:52:43.9776663Z cvt.s16.s8 %rs295, %rs294; 2026-02-21T08:52:43.9776751Z shr.s16 %rs296, %rs295, 4; 2026-02-21T08:52:43.9776951Z .loc 1 77 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:77:32 2026-02-21T08:52:43.9777022Z cvt.rn.f32.s16 %r1337, %rs275; 2026-02-21T08:52:43.9777225Z cvt.rn.f32.s16 %r1338, %rs278; 2026-02-21T08:52:43.9777293Z cvt.rn.f32.s16 %r1339, %rs281; 2026-02-21T08:52:43.9777357Z cvt.rn.f32.s16 %r1340, %rs284; 2026-02-21T08:52:43.9777436Z cvt.rn.f32.s16 %r1341, %rs287; 2026-02-21T08:52:43.9777505Z cvt.rn.f32.s16 %r1342, %rs290; 2026-02-21T08:52:43.9777567Z cvt.rn.f32.s16 %r1343, %rs293; 2026-02-21T08:52:43.9777635Z cvt.rn.f32.s16 %r1344, %rs296; 2026-02-21T08:52:43.9777700Z st.shared.b32 [%r61], %r1337; 2026-02-21T08:52:43.9777767Z st.shared.b32 [%r61+16384], %r1341; 2026-02-21T08:52:43.9777831Z st.shared.b32 [%r62], %r1338; 2026-02-21T08:52:43.9777902Z st.shared.b32 [%r62+16384], %r1342; 2026-02-21T08:52:43.9777965Z st.shared.b32 [%r63], %r1339; 2026-02-21T08:52:43.9778031Z st.shared.b32 [%r63+16384], %r1343; 2026-02-21T08:52:43.9778099Z st.shared.b32 [%r64], %r1340; 2026-02-21T08:52:43.9778165Z st.shared.b32 [%r64+16384], %r1344; 2026-02-21T08:52:43.9778220Z $L__tmp7: 2026-02-21T08:52:43.9778495Z .loc 2 291 36 // standard.py:291:36 @[ cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:84:40 
] 2026-02-21T08:52:43.9778564Z // begin inline asm 2026-02-21T08:52:43.9778644Z fence.proxy.async.shared::cta; 2026-02-21T08:52:43.9778706Z // end inline asm 2026-02-21T08:52:43.9778766Z bar.sync 0; 2026-02-21T08:52:43.9778847Z shfl.sync.idx.b32 %r1345, %r4, 0, 31, -1; 2026-02-21T08:52:43.9778920Z wgmma.fence.sync.aligned; 2026-02-21T08:52:43.9778987Z shl.b32 %r1346, %r1345, 9; 2026-02-21T08:52:43.9779051Z and.b32 %r1347, %r1346, 14336; 2026-02-21T08:52:43.9779113Z add.s32 %r1348, %r1347, %r1379; 2026-02-21T08:52:43.9779175Z bfe.u32 %r1349, %r1348, 4, 14; 2026-02-21T08:52:43.9779245Z cvt.u64.u32 %rd138, %r1349; 2026-02-21T08:52:43.9779323Z or.b64 %rd128, %rd138, 4611686293372403712; 2026-02-21T08:52:43.9779387Z mov.pred %p43, -1; 2026-02-21T08:52:43.9779454Z // begin inline asm 2026-02-21T08:52:43.9779844Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1435,%r1436,%r1437,%r1438,%r1439,%r1440,%r1441,%r1442}, {%r1153,%r1154,%r1155,%r1156}, %rd128, %p43, 1, 1; 2026-02-21T08:52:43.9779903Z // end inline asm 2026-02-21T08:52:43.9779971Z add.s32 %r1350, %r1348, 32; 2026-02-21T08:52:43.9780033Z bfe.u32 %r1351, %r1350, 4, 14; 2026-02-21T08:52:43.9780100Z cvt.u64.u32 %rd139, %r1351; 2026-02-21T08:52:43.9780175Z or.b64 %rd129, %rd139, 4611686293372403712; 2026-02-21T08:52:43.9780240Z // begin inline asm 2026-02-21T08:52:43.9780618Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1435,%r1436,%r1437,%r1438,%r1439,%r1440,%r1441,%r1442}, {%r1173,%r1174,%r1175,%r1176}, %rd129, %p43, 1, 1; 2026-02-21T08:52:43.9780679Z // end inline asm 2026-02-21T08:52:43.9780753Z add.s32 %r1352, %r1348, 64; 2026-02-21T08:52:43.9780821Z bfe.u32 %r1353, %r1352, 4, 14; 2026-02-21T08:52:43.9780884Z cvt.u64.u32 %rd140, %r1353; 2026-02-21T08:52:43.9780960Z or.b64 %rd130, %rd140, 4611686293372403712; 2026-02-21T08:52:43.9781019Z // begin inline asm 2026-02-21T08:52:43.9781395Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1435,%r1436,%r1437,%r1438,%r1439,%r1440,%r1441,%r1442}, 
{%r1193,%r1194,%r1195,%r1196}, %rd130, %p43, 1, 1; 2026-02-21T08:52:43.9781460Z // end inline asm 2026-02-21T08:52:43.9781521Z add.s32 %r1354, %r1348, 96; 2026-02-21T08:52:43.9781716Z bfe.u32 %r1355, %r1354, 4, 14; 2026-02-21T08:52:43.9781778Z cvt.u64.u32 %rd141, %r1355; 2026-02-21T08:52:43.9781856Z or.b64 %rd131, %rd141, 4611686293372403712; 2026-02-21T08:52:43.9781916Z // begin inline asm 2026-02-21T08:52:43.9782281Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1435,%r1436,%r1437,%r1438,%r1439,%r1440,%r1441,%r1442}, {%r1213,%r1214,%r1215,%r1216}, %rd131, %p43, 1, 1; 2026-02-21T08:52:43.9782350Z // end inline asm 2026-02-21T08:52:43.9782419Z add.s32 %r1356, %r1348, 16384; 2026-02-21T08:52:43.9782480Z bfe.u32 %r1357, %r1356, 4, 14; 2026-02-21T08:52:43.9782543Z cvt.u64.u32 %rd142, %r1357; 2026-02-21T08:52:43.9782619Z or.b64 %rd132, %rd142, 4611686293372403712; 2026-02-21T08:52:43.9782678Z // begin inline asm 2026-02-21T08:52:43.9783138Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1435,%r1436,%r1437,%r1438,%r1439,%r1440,%r1441,%r1442}, {%r1233,%r1234,%r1235,%r1236}, %rd132, %p43, 1, 1; 2026-02-21T08:52:43.9783208Z // end inline asm 2026-02-21T08:52:43.9783270Z add.s32 %r1358, %r1348, 16416; 2026-02-21T08:52:43.9783331Z bfe.u32 %r1359, %r1358, 4, 14; 2026-02-21T08:52:43.9783395Z cvt.u64.u32 %rd143, %r1359; 2026-02-21T08:52:43.9783467Z or.b64 %rd133, %rd143, 4611686293372403712; 2026-02-21T08:52:43.9783527Z // begin inline asm 2026-02-21T08:52:43.9783890Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1435,%r1436,%r1437,%r1438,%r1439,%r1440,%r1441,%r1442}, {%r1253,%r1254,%r1255,%r1256}, %rd133, %p43, 1, 1; 2026-02-21T08:52:43.9783956Z // end inline asm 2026-02-21T08:52:43.9784016Z add.s32 %r1360, %r1348, 16448; 2026-02-21T08:52:43.9784080Z bfe.u32 %r1361, %r1360, 4, 14; 2026-02-21T08:52:43.9784148Z cvt.u64.u32 %rd144, %r1361; 2026-02-21T08:52:43.9784220Z or.b64 %rd134, %rd144, 4611686293372403712; 2026-02-21T08:52:43.9784299Z // begin 
inline asm 2026-02-21T08:52:43.9784675Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1435,%r1436,%r1437,%r1438,%r1439,%r1440,%r1441,%r1442}, {%r1273,%r1274,%r1275,%r1276}, %rd134, %p43, 1, 1; 2026-02-21T08:52:43.9784737Z // end inline asm 2026-02-21T08:52:43.9784798Z add.s32 %r1362, %r1348, 16480; 2026-02-21T08:52:43.9784859Z bfe.u32 %r1363, %r1362, 4, 14; 2026-02-21T08:52:43.9784924Z cvt.u64.u32 %rd145, %r1363; 2026-02-21T08:52:43.9784995Z or.b64 %rd135, %rd145, 4611686293372403712; 2026-02-21T08:52:43.9785056Z // begin inline asm 2026-02-21T08:52:43.9785426Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1435,%r1436,%r1437,%r1438,%r1439,%r1440,%r1441,%r1442}, {%r1293,%r1294,%r1295,%r1296}, %rd135, %p43, 1, 1; 2026-02-21T08:52:43.9785484Z // end inline asm 2026-02-21T08:52:43.9785561Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:43.9785624Z mov.b32 %r1306, 0; 2026-02-21T08:52:43.9785684Z mov.b32 %r1305, %r1379; 2026-02-21T08:52:43.9785747Z mov.b32 %r1307, %r1306; 2026-02-21T08:52:43.9785805Z // begin inline asm 2026-02-21T08:52:43.9785989Z // wait for regs: %r1435,%r1436,%r1437,%r1438,%r1439,%r1440,%r1441,%r1442,%r1305,%r1306,%r1307 2026-02-21T08:52:43.9786072Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:43.9786129Z // end inline asm 2026-02-21T08:52:43.9786190Z $L__tmp8: 2026-02-21T08:52:43.9786400Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.9786579Z add.s32 %r1364, %r1434, 1; 2026-02-21T08:52:43.9786655Z setp.gt.s32 %p54, %r1364, 1; 2026-02-21T08:52:43.9786725Z selp.b32 %r1434, 0, %r1364, %p54; 2026-02-21T08:52:43.9786939Z .loc 1 48 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:32 2026-02-21T08:52:43.9787012Z mad.wide.s32 %rd136, %r1432, 2, %rd15; 2026-02-21T08:52:43.9787215Z .loc 1 48 80 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:48:80 2026-02-21T08:52:43.9787277Z shl.b32 %r1365, %r1434, 12; 2026-02-21T08:52:43.9787336Z shl.b32 %r1366, 
%r1434, 13; 2026-02-21T08:52:43.9787408Z add.s32 %r1319, %r47, %r1366; 2026-02-21T08:52:43.9788149Z selp.b32 %r1320, 8, 0, %p52; 2026-02-21T08:52:43.9788208Z // begin inline asm 2026-02-21T08:52:43.9788352Z cp.async.ca.shared.global [ %r1319 + 0 ], [ %rd136 + 0 ], 0x8, %r1320; 2026-02-21T08:52:43.9788413Z // end inline asm 2026-02-21T08:52:43.9788479Z cp.async.commit_group; 2026-02-21T08:52:43.9788742Z .loc 1 54 87 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:54:87 2026-02-21T08:52:43.9788809Z add.s32 %r1321, %r49, %r1365; 2026-02-21T08:52:43.9788871Z selp.b32 %r1322, 4, 0, %p52; 2026-02-21T08:52:43.9788931Z // begin inline asm 2026-02-21T08:52:43.9789072Z cp.async.ca.shared.global [ %r1321 + 0 ], [ %rd150 + 0 ], 0x4, %r1322; 2026-02-21T08:52:43.9789129Z // end inline asm 2026-02-21T08:52:43.9789194Z cp.async.commit_group; 2026-02-21T08:52:43.9789532Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.9789603Z add.s32 %r1432, %r1432, 64; 2026-02-21T08:52:43.9789670Z add.s64 %rd150, %rd150, 229376; 2026-02-21T08:52:43.9789737Z setp.lt.u64 %p55, %rd151, 4064; 2026-02-21T08:52:43.9789803Z @%p55 bra $L__BB0_12; 2026-02-21T08:52:43.9789915Z // %bb.13: // in Loop: Header=BB0_11 Depth=1 2026-02-21T08:52:43.9790109Z .loc 1 33 32 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:33:32 2026-02-21T08:52:43.9790175Z or.b32 %r1371, %r157, %r11; 2026-02-21T08:52:43.9790376Z .loc 1 40 103 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:40:103 2026-02-21T08:52:43.9790443Z cp.async.wait_group 0; 2026-02-21T08:52:43.9790498Z bar.sync 0; 2026-02-21T08:52:43.9790705Z .loc 1 87 28 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:87:28 2026-02-21T08:52:43.9790790Z cvt.rn.bf16x2.f32 %r1372, %r1436, %r1435; 2026-02-21T08:52:43.9790869Z cvt.rn.bf16x2.f32 %r1373, %r1438, %r1437; 2026-02-21T08:52:43.9790948Z cvt.rn.bf16x2.f32 %r1374, %r1440, %r1439; 2026-02-21T08:52:43.9791024Z cvt.rn.bf16x2.f32 
%r1375, %r1442, %r1441; 2026-02-21T08:52:43.9791227Z .loc 1 88 50 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:88:50 2026-02-21T08:52:43.9791301Z mad.lo.s32 %r1376, %r156, 7168, %r1371; 2026-02-21T08:52:43.9791502Z .loc 1 88 22 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:88:22 2026-02-21T08:52:43.9791583Z mad.wide.s32 %rd146, %r1376, 2, %rd17; 2026-02-21T08:52:43.9791783Z .loc 1 88 81 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:88:81 2026-02-21T08:52:43.9791971Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r65], {%r1372, %r1373, %r1374, %r1375}; 2026-02-21T08:52:43.9792033Z bar.sync 0; 2026-02-21T08:52:43.9792149Z ld.shared.v4.b32 {%r1367, %r1368, %r1369, %r1370}, [%r66]; 2026-02-21T08:52:43.9792211Z // begin inline asm 2026-02-21T08:52:43.9792337Z st.global.v4.b32 [ %rd146 + 0 ], { %r1367, %r1368, %r1369, %r1370 }; 2026-02-21T08:52:43.9792404Z // end inline asm 2026-02-21T08:52:43.9792609Z .loc 1 19 144 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:19:144 2026-02-21T08:52:43.9792673Z add.s32 %r181, %r1431, 4224; 2026-02-21T08:52:43.9792758Z setp.lt.s32 %p56, %r1431, -4168; 2026-02-21T08:52:43.9792819Z mov.b32 %r1431, %r181; 2026-02-21T08:52:43.9792881Z @%p56 bra $L__BB0_11; 2026-02-21T08:52:43.9792972Z $L__BB0_14: // %._crit_edge 2026-02-21T08:52:43.9793173Z .loc 1 19 4 // cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py:19:4 2026-02-21T08:52:43.9793228Z ret; 2026-02-21T08:52:43.9793283Z $L__tmp9: 2026-02-21T08:52:43.9793343Z $L__func_end0: 2026-02-21T08:52:43.9793430Z // -- End function 2026-02-21T08:52:43.9793489Z } 2026-02-21T08:52:43.9793738Z .file 1 "/tmp/torchinductor_root/f7/cf76g2lib7g4pdyujc47ztpx64uaz3cusg5iufacwdd7vwzesjz5.py" 2026-02-21T08:52:43.9793950Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T08:52:43.9794123Z .section .debug_abbrev 2026-02-21T08:52:43.9794176Z { 2026-02-21T08:52:43.9794279Z .b8 1 // Abbreviation Code 
2026-02-21T08:52:43.9794373Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:52:43.9794466Z .b8 1 // DW_CHILDREN_yes 2026-02-21T08:52:43.9794558Z .b8 37 // DW_AT_producer 2026-02-21T08:52:43.9794637Z .b8 8 // DW_FORM_string 2026-02-21T08:52:43.9794714Z .b8 19 // DW_AT_language 2026-02-21T08:52:43.9794798Z .b8 5 // DW_FORM_data2 2026-02-21T08:52:43.9794967Z .b8 3 // DW_AT_name 2026-02-21T08:52:43.9795060Z .b8 8 // DW_FORM_string 2026-02-21T08:52:43.9795146Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:52:43.9795236Z .b8 6 // DW_FORM_data4 2026-02-21T08:52:43.9795317Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:52:43.9795393Z .b8 8 // DW_FORM_string 2026-02-21T08:52:43.9795474Z .b8 0 // EOM(1) 2026-02-21T08:52:43.9795543Z .b8 0 // EOM(2) 2026-02-21T08:52:43.9795629Z .b8 2 // Abbreviation Code 2026-02-21T08:52:43.9795718Z .b8 46 // DW_TAG_subprogram 2026-02-21T08:52:43.9795797Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:52:43.9795874Z .b8 3 // DW_AT_name 2026-02-21T08:52:43.9795957Z .b8 8 // DW_FORM_string 2026-02-21T08:52:43.9796038Z .b8 32 // DW_AT_inline 2026-02-21T08:52:43.9796118Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:43.9796192Z .b8 0 // EOM(1) 2026-02-21T08:52:43.9796267Z .b8 0 // EOM(2) 2026-02-21T08:52:43.9796351Z .b8 3 // Abbreviation Code 2026-02-21T08:52:43.9796436Z .b8 46 // DW_TAG_subprogram 2026-02-21T08:52:43.9796652Z .b8 1 // DW_CHILDREN_yes 2026-02-21T08:52:43.9796736Z .b8 17 // DW_AT_low_pc 2026-02-21T08:52:43.9796811Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:43.9796897Z .b8 18 // DW_AT_high_pc 2026-02-21T08:52:43.9796974Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:43.9797085Z .b8 49 // DW_AT_abstract_origin 2026-02-21T08:52:43.9797165Z .b8 19 // DW_FORM_ref4 2026-02-21T08:52:43.9797243Z .b8 0 // EOM(1) 2026-02-21T08:52:43.9797315Z .b8 0 // EOM(2) 2026-02-21T08:52:43.9797401Z .b8 4 // Abbreviation Code 2026-02-21T08:52:43.9797505Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T08:52:43.9797583Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:52:43.9797673Z .b8 49 // 
DW_AT_abstract_origin 2026-02-21T08:52:43.9797757Z .b8 19 // DW_FORM_ref4 2026-02-21T08:52:43.9797837Z .b8 17 // DW_AT_low_pc 2026-02-21T08:52:43.9797913Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:43.9797993Z .b8 18 // DW_AT_high_pc 2026-02-21T08:52:43.9798076Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:43.9798159Z .b8 88 // DW_AT_call_file 2026-02-21T08:52:43.9798237Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:43.9798473Z .b8 89 // DW_AT_call_line 2026-02-21T08:52:43.9798556Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:43.9798645Z .b8 87 // DW_AT_call_column 2026-02-21T08:52:43.9798727Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:43.9798800Z .b8 0 // EOM(1) 2026-02-21T08:52:43.9798869Z .b8 0 // EOM(2) 2026-02-21T08:52:43.9798940Z .b8 0 // EOM(3) 2026-02-21T08:52:43.9798991Z } 2026-02-21T08:52:43.9799055Z .section .debug_info 2026-02-21T08:52:43.9799106Z { 2026-02-21T08:52:43.9799201Z .b32 178 // Length of Unit 2026-02-21T08:52:43.9799423Z .b8 2 // DWARF version number 2026-02-21T08:52:43.9799482Z .b8 0 2026-02-21T08:52:43.9799623Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:52:43.9799737Z .b8 8 // Address Size (in bytes) 2026-02-21T08:52:43.9799856Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T08:52:43.9799943Z .b8 116 // DW_AT_producer 2026-02-21T08:52:43.9800005Z .b8 114 2026-02-21T08:52:43.9800057Z .b8 105 2026-02-21T08:52:43.9800108Z .b8 116 2026-02-21T08:52:43.9800163Z .b8 111 2026-02-21T08:52:43.9800215Z .b8 110 2026-02-21T08:52:43.9800266Z .b8 0 2026-02-21T08:52:43.9800342Z .b8 2 // DW_AT_language 2026-02-21T08:52:43.9800399Z .b8 0 2026-02-21T08:52:43.9800479Z .b8 99 // DW_AT_name 2026-02-21T08:52:43.9800531Z .b8 102 2026-02-21T08:52:43.9800585Z .b8 55 2026-02-21T08:52:43.9800639Z .b8 54 2026-02-21T08:52:43.9800691Z .b8 103 2026-02-21T08:52:43.9800741Z .b8 50 2026-02-21T08:52:43.9800798Z .b8 108 2026-02-21T08:52:43.9800848Z .b8 105 2026-02-21T08:52:43.9800902Z .b8 98 2026-02-21T08:52:43.9800957Z .b8 55 2026-02-21T08:52:43.9801008Z .b8 103 2026-02-21T08:52:43.9801059Z .b8 52 2026-02-21T08:52:43.9801109Z .b8 112 2026-02-21T08:52:43.9801163Z .b8 100 2026-02-21T08:52:43.9801215Z .b8 121 2026-02-21T08:52:43.9801265Z .b8 117 2026-02-21T08:52:43.9801324Z .b8 106 2026-02-21T08:52:43.9801374Z .b8 99 2026-02-21T08:52:43.9801424Z .b8 52 2026-02-21T08:52:43.9801474Z .b8 55 2026-02-21T08:52:43.9801528Z .b8 122 2026-02-21T08:52:43.9801580Z .b8 116 2026-02-21T08:52:43.9801642Z .b8 112 2026-02-21T08:52:43.9801696Z .b8 120 2026-02-21T08:52:43.9801751Z .b8 54 2026-02-21T08:52:43.9801801Z .b8 52 2026-02-21T08:52:43.9801852Z .b8 117 2026-02-21T08:52:43.9801906Z .b8 97 2026-02-21T08:52:43.9801957Z .b8 122 2026-02-21T08:52:43.9802009Z .b8 51 2026-02-21T08:52:43.9802060Z .b8 99 2026-02-21T08:52:43.9802118Z .b8 117 2026-02-21T08:52:43.9802172Z .b8 115 2026-02-21T08:52:43.9802222Z .b8 103 2026-02-21T08:52:43.9802276Z .b8 53 2026-02-21T08:52:43.9802327Z .b8 105 2026-02-21T08:52:43.9802381Z .b8 117 2026-02-21T08:52:43.9802432Z .b8 102 2026-02-21T08:52:43.9802486Z .b8 97 2026-02-21T08:52:43.9802536Z .b8 99 
2026-02-21T08:52:43.9802587Z .b8 119 2026-02-21T08:52:43.9802637Z .b8 100 2026-02-21T08:52:43.9802692Z .b8 100 2026-02-21T08:52:43.9802742Z .b8 55 2026-02-21T08:52:43.9802793Z .b8 118 2026-02-21T08:52:43.9802848Z .b8 119 2026-02-21T08:52:43.9802900Z .b8 122 2026-02-21T08:52:43.9802951Z .b8 101 2026-02-21T08:52:43.9803002Z .b8 115 2026-02-21T08:52:43.9803057Z .b8 106 2026-02-21T08:52:43.9803108Z .b8 122 2026-02-21T08:52:43.9803158Z .b8 53 2026-02-21T08:52:43.9803214Z .b8 46 2026-02-21T08:52:43.9803264Z .b8 112 2026-02-21T08:52:43.9803316Z .b8 121 2026-02-21T08:52:43.9803366Z .b8 0 2026-02-21T08:52:43.9803471Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:52:43.9803554Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:52:43.9803606Z .b8 116 2026-02-21T08:52:43.9803664Z .b8 109 2026-02-21T08:52:43.9803714Z .b8 112 2026-02-21T08:52:43.9803882Z .b8 47 2026-02-21T08:52:43.9803937Z .b8 116 2026-02-21T08:52:43.9803993Z .b8 111 2026-02-21T08:52:43.9804047Z .b8 114 2026-02-21T08:52:43.9804096Z .b8 99 2026-02-21T08:52:43.9804152Z .b8 104 2026-02-21T08:52:43.9804210Z .b8 105 2026-02-21T08:52:43.9804262Z .b8 110 2026-02-21T08:52:43.9804316Z .b8 100 2026-02-21T08:52:43.9804371Z .b8 117 2026-02-21T08:52:43.9804421Z .b8 99 2026-02-21T08:52:43.9804472Z .b8 116 2026-02-21T08:52:43.9804523Z .b8 111 2026-02-21T08:52:43.9804577Z .b8 114 2026-02-21T08:52:43.9804628Z .b8 95 2026-02-21T08:52:43.9804679Z .b8 114 2026-02-21T08:52:43.9804733Z .b8 111 2026-02-21T08:52:43.9804783Z .b8 111 2026-02-21T08:52:43.9804835Z .b8 116 2026-02-21T08:52:43.9804885Z .b8 47 2026-02-21T08:52:43.9804941Z .b8 102 2026-02-21T08:52:43.9804990Z .b8 55 2026-02-21T08:52:43.9805041Z .b8 0 2026-02-21T08:52:43.9805255Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T08:52:43.9805338Z .b8 95 // DW_AT_name 2026-02-21T08:52:43.9805393Z .b8 104 2026-02-21T08:52:43.9805444Z .b8 101 2026-02-21T08:52:43.9805498Z .b8 108 2026-02-21T08:52:43.9805553Z .b8 105 2026-02-21T08:52:43.9805616Z .b8 111 
2026-02-21T08:52:43.9805677Z .b8 110 2026-02-21T08:52:43.9805729Z .b8 95 2026-02-21T08:52:43.9805779Z .b8 109 2026-02-21T08:52:43.9805829Z .b8 97 2026-02-21T08:52:43.9805884Z .b8 116 2026-02-21T08:52:43.9805935Z .b8 109 2026-02-21T08:52:43.9805984Z .b8 117 2026-02-21T08:52:43.9806040Z .b8 108 2026-02-21T08:52:43.9806092Z .b8 95 2026-02-21T08:52:43.9806140Z .b8 98 2026-02-21T08:52:43.9806191Z .b8 102 2026-02-21T08:52:43.9806246Z .b8 49 2026-02-21T08:52:43.9806298Z .b8 54 2026-02-21T08:52:43.9806348Z .b8 95 2026-02-21T08:52:43.9806399Z .b8 105 2026-02-21T08:52:43.9806562Z .b8 110 2026-02-21T08:52:43.9806617Z .b8 116 2026-02-21T08:52:43.9806667Z .b8 52 2026-02-21T08:52:43.9806728Z .b8 0 2026-02-21T08:52:43.9806821Z .b8 1 // DW_AT_inline 2026-02-21T08:52:43.9806930Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T08:52:43.9807026Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T08:52:43.9807129Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T08:52:43.9807229Z .b32 108 // DW_AT_abstract_origin 2026-02-21T08:52:43.9807359Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T08:52:43.9807461Z .b32 108 // DW_AT_abstract_origin 2026-02-21T08:52:43.9807550Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T08:52:43.9807639Z .b64 $L__tmp8 // DW_AT_high_pc 2026-02-21T08:52:43.9807727Z .b8 1 // DW_AT_call_file 2026-02-21T08:52:43.9807813Z .b8 84 // DW_AT_call_line 2026-02-21T08:52:43.9807910Z .b8 40 // DW_AT_call_column 2026-02-21T08:52:43.9808004Z .b8 0 // End Of Children Mark 2026-02-21T08:52:43.9808094Z .b8 0 // End Of Children Mark 2026-02-21T08:52:43.9808144Z } 2026-02-21T08:52:43.9808214Z .section .debug_macinfo { } 2026-02-21T08:52:43.9808219Z 2026-02-21T08:52:43.9808302Z ================================================================ 2026-02-21T08:52:43.9808418Z please share the reproducer above with Triton project. 
2026-02-21T08:52:44.6400017Z 2026-02-21T08:52:44.6400029Z 2026-02-21T08:52:44.6400034Z 2026-02-21T08:52:44.6400244Z ================================================================ 2026-02-21T08:52:44.6400599Z Internal Triton PTX codegen error 2026-02-21T08:52:44.6400861Z `ptxas` stderr: 2026-02-21T08:52:44.6401583Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 562 in function _helion_matmul_bf16_int4. Try to compile with register target of 46 or higher. 2026-02-21T08:52:44.6402404Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:52:44.6402630Z 2026-02-21T08:52:44.6403622Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp69mkrwds.ptx -o /tmp/tmp69mkrwds.ptx.o 2026-02-21T08:52:44.6404385Z 2026-02-21T08:52:44.6404389Z 2026-02-21T08:52:44.6404459Z // 2026-02-21T08:52:44.6404653Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:52:44.6404892Z // 2026-02-21T08:52:44.6404984Z 2026-02-21T08:52:44.6405054Z .version 8.7 2026-02-21T08:52:44.6405226Z .target sm_90a 2026-02-21T08:52:44.6405403Z .address_size 64 2026-02-21T08:52:44.6405545Z 2026-02-21T08:52:44.6405744Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T08:52:44.6406146Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:52:44.6406859Z // @_helion_matmul_bf16_int4 2026-02-21T08:52:44.6407200Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T08:52:44.6407550Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T08:52:44.6407961Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T08:52:44.6408362Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T08:52:44.6408753Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T08:52:44.6409178Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 
2026-02-21T08:52:44.6409492Z ) 2026-02-21T08:52:44.6409631Z .reqntid 256 2026-02-21T08:52:44.6409790Z .maxnreg 32 2026-02-21T08:52:44.6409933Z { 2026-02-21T08:52:44.6410083Z .reg .pred %p<65>; 2026-02-21T08:52:44.6410261Z .reg .b16 %rs<449>; 2026-02-21T08:52:44.6410439Z .reg .b32 %r<2579>; 2026-02-21T08:52:44.6410607Z .reg .b64 %rd<252>; 2026-02-21T08:52:44.6410984Z .loc 1 14 0 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:14:0 2026-02-21T08:52:44.6411411Z $L__func_begin0: 2026-02-21T08:52:44.6411763Z .loc 1 14 0 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:14:0 2026-02-21T08:52:44.6412116Z 2026-02-21T08:52:44.6412184Z // %bb.0: 2026-02-21T08:52:44.6412401Z ld.param.b64 %rd39, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T08:52:44.6412754Z ld.param.b64 %rd38, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T08:52:44.6413088Z ld.param.b64 %rd37, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T08:52:44.6413773Z [81s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:52:44.6415522Z Config: @helion.kernel(config=helion.Config(block_sizes=[32, 64, 64], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[2], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=32, num_stages=7, num_warps=8, pid_type='persistent_interleaved', range_flattens=[True, True], range_multi_buffers=[False, False], range_num_stages=[3, 3], range_unroll_factors=[3, 0], range_warp_specializes=[]), static_shapes=True) 2026-02-21T08:52:44.6417168Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:52:44.6417474Z `ptxas` stderr: 2026-02-21T08:52:44.6418038Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 562 in function _helion_matmul_bf16_int4. Try to compile with register target of 46 or higher. 
2026-02-21T08:52:44.6418681Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:52:44.6418867Z 2026-02-21T08:52:44.6419376Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp69mkrwds.ptx -o /tmp/tmp69mkrwds.ptx.o 2026-02-21T08:52:44.6419957Z 2026-02-21T08:52:44.6420116Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T08:52:44.6420413Z $L__tmp0: 2026-02-21T08:52:44.6420718Z .loc 1 19 46 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:19:46 2026-02-21T08:52:44.6421108Z mov.u32 %r2456, %ctaid.x; 2026-02-21T08:52:44.6421423Z .loc 1 0 0 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:0 2026-02-21T08:52:44.6421937Z sub.s32 %r340, 4335, %r2456; 2026-02-21T08:52:44.6422280Z .loc 1 19 144 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:19:144 2026-02-21T08:52:44.6422662Z mul.hi.u32 %r341, %r340, 1041204193; 2026-02-21T08:52:44.6422866Z shr.u32 %r342, %r341, 10; 2026-02-21T08:52:44.6423038Z mul.hi.u32 %r343, %r342, 1431655766; 2026-02-21T08:52:44.6423261Z mad.lo.s32 %r2558, %r343, 12672, %r2456; 2026-02-21T08:52:44.6423613Z .loc 1 31 45 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:31:45 2026-02-21T08:52:44.6423979Z mov.u32 %r3, %tid.x; 2026-02-21T08:52:44.6424138Z shr.u32 %r4, %r3, 5; 2026-02-21T08:52:44.6424295Z shr.u32 %r5, %r3, 4; 2026-02-21T08:52:44.6424576Z bfe.u32 %r6, %r3, 4, 4; 2026-02-21T08:52:44.6424755Z or.b32 %r7, %r6, 16; 2026-02-21T08:52:44.6424914Z or.b32 %r8, %r6, 32; 2026-02-21T08:52:44.6425063Z or.b32 %r9, %r5, 48; 2026-02-21T08:52:44.6425229Z bfe.u32 %r10, %r3, 3, 5; 2026-02-21T08:52:44.6425396Z or.b32 %r11, %r10, 32; 2026-02-21T08:52:44.6425578Z shl.b32 %r12, %r3, 2; 2026-02-21T08:52:44.6425740Z and.b32 %r13, %r12, 60; 2026-02-21T08:52:44.6425906Z and.b32 %r14, %r3, 7; 2026-02-21T08:52:44.6426060Z shl.b32 %r15, %r14, 3; 2026-02-21T08:52:44.6426374Z .loc 1 65 38 // 
cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:65:38 2026-02-21T08:52:44.6426880Z and.b32 %r16, %r3, 64; 2026-02-21T08:52:44.6427213Z .loc 1 19 144 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:19:144 2026-02-21T08:52:44.6427587Z setp.lt.s32 %p1, %r2456, %r2558; 2026-02-21T08:52:44.6427778Z shl.b32 %r344, %r3, 3; 2026-02-21T08:52:44.6427950Z and.b32 %r17, %r344, 2040; 2026-02-21T08:52:44.6428121Z shr.u32 %r18, %r3, 1; 2026-02-21T08:52:44.6428288Z and.b32 %r345, %r18, 56; 2026-02-21T08:52:44.6428467Z xor.b32 %r19, %r17, %r345; 2026-02-21T08:52:44.6428752Z mov.b32 %r2442, global_smem; 2026-02-21T08:52:44.6428928Z shl.b32 %r2443, %r3, 5; 2026-02-21T08:52:44.6429093Z shl.b32 %r2444, %r3, 1; 2026-02-21T08:52:44.6429259Z shl.b32 %r2445, %r14, 4; 2026-02-21T08:52:44.6429422Z and.b32 %r2446, %r3, 96; 2026-02-21T08:52:44.6429597Z and.b32 %r2447, %r3, 1; 2026-02-21T08:52:44.6429757Z and.b32 %r2448, %r3, 2; 2026-02-21T08:52:44.6429922Z bfe.s32 %r2449, %r3, 1, 1; 2026-02-21T08:52:44.6430089Z bfe.s32 %r2450, %r3, 2, 1; 2026-02-21T08:52:44.6430262Z and.b32 %r2451, %r3, 4; 2026-02-21T08:52:44.6430422Z bfe.s32 %r2452, %r3, 3, 1; 2026-02-21T08:52:44.6430594Z bfe.s32 %r2453, %r3, 4, 1; 2026-02-21T08:52:44.6430761Z and.b32 %r2454, %r3, 16; 2026-02-21T08:52:44.6430932Z and.b32 %r2455, %r12, 928; 2026-02-21T08:52:44.6431110Z setp.eq.b32 %p64, %r16, 0; 2026-02-21T08:52:44.6431293Z @%p1 bra $L__BB0_2; 2026-02-21T08:52:44.6431469Z bra.uni $L__BB0_1; 2026-02-21T08:52:44.6431649Z $L__BB0_2: // %.lr.ph 2026-02-21T08:52:44.6432032Z .loc 1 0 144 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:0:144 2026-02-21T08:52:44.6432392Z add.s32 %r1349, %r2442, %r19; 2026-02-21T08:52:44.6432580Z or.b32 %r2520, %r19, 2048; 2026-02-21T08:52:44.6432756Z add.s32 %r1351, %r1349, 2048; 2026-02-21T08:52:44.6432939Z or.b32 %r2519, %r19, 4096; 2026-02-21T08:52:44.6433113Z add.s32 %r1353, %r1349, 4096; 2026-02-21T08:52:44.6433286Z or.b32 %r2518, %r19, 6144; 
2026-02-21T08:52:44.6433456Z add.s32 %r1355, %r1349, 6144; 2026-02-21T08:52:44.6433629Z mul.lo.s32 %r31, %r10, 7168; 2026-02-21T08:52:44.6433808Z add.s32 %r347, %r2442, 32768; 2026-02-21T08:52:44.6433976Z add.s32 %r1357, %r347, %r17; 2026-02-21T08:52:44.6434151Z or.b32 %r2517, %r13, 64; 2026-02-21T08:52:44.6434327Z add.s32 %r1359, %r1349, 8192; 2026-02-21T08:52:44.6434534Z add.s32 %r1361, %r1349, 10240; 2026-02-21T08:52:44.6434714Z add.s32 %r1363, %r1349, 12288; 2026-02-21T08:52:44.6434895Z add.s32 %r1365, %r1349, 14336; 2026-02-21T08:52:44.6435237Z add.s32 %r348, %r2442, %r17; 2026-02-21T08:52:44.6435408Z add.s32 %r1367, %r348, 34816; 2026-02-21T08:52:44.6435586Z shl.b32 %r349, %r3, 6; 2026-02-21T08:52:44.6435752Z and.b32 %r350, %r349, 6144; 2026-02-21T08:52:44.6435927Z and.b32 %r352, %r2443, 896; 2026-02-21T08:52:44.6436096Z and.b32 %r354, %r2444, 62; 2026-02-21T08:52:44.6436283Z or.b32 %r355, %r350, %r352; 2026-02-21T08:52:44.6436621Z or.b32 %r39, %r355, %r354; 2026-02-21T08:52:44.6436804Z xor.b32 %r40, %r39, 8; 2026-02-21T08:52:44.6436986Z xor.b32 %r41, %r39, 16; 2026-02-21T08:52:44.6437150Z xor.b32 %r42, %r39, 24; 2026-02-21T08:52:44.6437316Z xor.b32 %r43, %r39, 32; 2026-02-21T08:52:44.6437475Z xor.b32 %r44, %r39, 40; 2026-02-21T08:52:44.6437639Z xor.b32 %r45, %r39, 48; 2026-02-21T08:52:44.6437948Z xor.b32 %r46, %r39, 56; 2026-02-21T08:52:44.6438126Z and.b32 %r356, %r3, 63; 2026-02-21T08:52:44.6438285Z and.b32 %r357, %r18, 64; 2026-02-21T08:52:44.6438457Z add.s32 %r358, %r347, %r357; 2026-02-21T08:52:44.6438637Z add.s32 %r47, %r358, %r356; 2026-02-21T08:52:44.6438812Z shl.b32 %r359, %r356, 7; 2026-02-21T08:52:44.6438981Z and.b32 %r361, %r5, 12; 2026-02-21T08:52:44.6439143Z or.b32 %r362, %r359, %r361; 2026-02-21T08:52:44.6439319Z or.b32 %r363, %r362, %r2445; 2026-02-21T08:52:44.6439494Z add.s32 %r364, %r2442, 16384; 2026-02-21T08:52:44.6439672Z add.s32 %r48, %r364, %r363; 2026-02-21T08:52:44.6439842Z xor.b32 %r365, %r363, 16; 2026-02-21T08:52:44.6440017Z 
add.s32 %r49, %r364, %r365; 2026-02-21T08:52:44.6440188Z xor.b32 %r366, %r363, 32; 2026-02-21T08:52:44.6440365Z add.s32 %r50, %r364, %r366; 2026-02-21T08:52:44.6440535Z xor.b32 %r367, %r363, 48; 2026-02-21T08:52:44.6440711Z add.s32 %r51, %r364, %r367; 2026-02-21T08:52:44.6440888Z xor.b32 %r368, %r363, 64; 2026-02-21T08:52:44.6441061Z add.s32 %r52, %r364, %r368; 2026-02-21T08:52:44.6441257Z xor.b32 %r369, %r363, 80; 2026-02-21T08:52:44.6441421Z add.s32 %r53, %r364, %r369; 2026-02-21T08:52:44.6441598Z xor.b32 %r370, %r363, 96; 2026-02-21T08:52:44.6441767Z add.s32 %r54, %r364, %r370; 2026-02-21T08:52:44.6441944Z xor.b32 %r371, %r363, 112; 2026-02-21T08:52:44.6442112Z add.s32 %r55, %r364, %r371; 2026-02-21T08:52:44.6442289Z shl.b32 %r373, %r2446, 4; 2026-02-21T08:52:44.6442453Z shl.b32 %r375, %r2447, 5; 2026-02-21T08:52:44.6442621Z and.b32 %r378, %r2449, 4160; 2026-02-21T08:52:44.6442794Z shl.b32 %r381, %r2451, 2; 2026-02-21T08:52:44.6442959Z and.b32 %r383, %r2452, 2080; 2026-02-21T08:52:44.6443132Z shl.b32 %r386, %r2454, 3; 2026-02-21T08:52:44.6443293Z or.b32 %r387, %r373, %r375; 2026-02-21T08:52:44.6443467Z or.b32 %r388, %r378, %r383; 2026-02-21T08:52:44.6443633Z or.b32 %r389, %r387, %r357; 2026-02-21T08:52:44.6443803Z xor.b32 %r390, %r389, %r388; 2026-02-21T08:52:44.6443976Z add.s32 %r391, %r2442, %r381; 2026-02-21T08:52:44.6444153Z add.s32 %r392, %r391, %r386; 2026-02-21T08:52:44.6444327Z add.s32 %r56, %r392, %r390; 2026-02-21T08:52:44.6444529Z shl.b32 %r394, %r2447, 6; 2026-02-21T08:52:44.6444815Z shl.b32 %r395, %r2448, 3; 2026-02-21T08:52:44.6444987Z and.b32 %r396, %r2450, 2080; 2026-02-21T08:52:44.6445174Z and.b32 %r397, %r2453, 4160; 2026-02-21T08:52:44.6445346Z or.b32 %r398, %r2455, %r394; 2026-02-21T08:52:44.6445523Z or.b32 %r399, %r396, %r397; 2026-02-21T08:52:44.6445695Z xor.b32 %r400, %r399, %r398; 2026-02-21T08:52:44.6445876Z add.s32 %r401, %r2442, %r395; 2026-02-21T08:52:44.6446055Z add.s32 %r858, %r401, %r400; 2026-02-21T08:52:44.6446231Z add.s32 %r863, 
%r858, 1024; 2026-02-21T08:52:44.6446409Z shl.b32 %r402, %r2446, 6; 2026-02-21T08:52:44.6446740Z or.b32 %r403, %r402, %r352; 2026-02-21T08:52:44.6446931Z or.b32 %r59, %r403, %r354; 2026-02-21T08:52:44.6447108Z xor.b32 %r60, %r59, 8; 2026-02-21T08:52:44.6447281Z xor.b32 %r61, %r59, 16; 2026-02-21T08:52:44.6447456Z xor.b32 %r62, %r59, 24; 2026-02-21T08:52:44.6447625Z xor.b32 %r63, %r59, 32; 2026-02-21T08:52:44.6447784Z xor.b32 %r64, %r59, 40; 2026-02-21T08:52:44.6447951Z xor.b32 %r65, %r59, 48; 2026-02-21T08:52:44.6448292Z xor.b32 %r66, %r59, 56; 2026-02-21T08:52:44.6448640Z .loc 1 40 103 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:40:103 2026-02-21T08:52:44.6449018Z or.b32 %r404, %r31, %r15; 2026-02-21T08:52:44.6449190Z add.s32 %r67, %r404, 458752; 2026-02-21T08:52:44.6449525Z .loc 1 19 144 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:19:144 2026-02-21T08:52:44.6449897Z and.b32 %r405, %r3, 15; 2026-02-21T08:52:44.6450080Z mad.wide.u32 %rd40, %r405, 8, %rd37; 2026-02-21T08:52:44.6450286Z add.s64 %rd1, %rd40, 256; 2026-02-21T08:52:44.6450465Z shl.b32 %r70, %r6, 13; 2026-02-21T08:52:44.6450627Z shl.b32 %r406, %r9, 13; 2026-02-21T08:52:44.6451105Z .loc 1 40 103 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:40:103 2026-02-21T08:52:44.6451493Z or.b32 %r407, %r406, %r13; 2026-02-21T08:52:44.6451669Z or.b32 %r71, %r407, 128; 2026-02-21T08:52:44.6451848Z cvt.u64.u32 %rd2, %r13; 2026-02-21T08:52:44.6452020Z cvt.u64.u32 %rd3, %r31; 2026-02-21T08:52:44.6452243Z $L__BB0_3: // =>This Loop Header: Depth=1 2026-02-21T08:52:44.6452528Z // Child Loop BB0_4 Depth 2 2026-02-21T08:52:44.6452801Z // Child Loop BB0_6 Depth 2 2026-02-21T08:52:44.6453068Z // Child Loop BB0_8 Depth 2 2026-02-21T08:52:44.6453444Z .loc 1 25 35 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:25:35 2026-02-21T08:52:44.6453820Z mul.hi.s32 %r431, %r2456, -1840700269; 2026-02-21T08:52:44.6454025Z add.s32 %r432, %r431, %r2456; 2026-02-21T08:52:44.6454218Z shr.u32 
%r433, %r432, 31; 2026-02-21T08:52:44.6454394Z shr.s32 %r434, %r432, 7; 2026-02-21T08:52:44.6454573Z add.s32 %r435, %r434, %r433; 2026-02-21T08:52:44.6454896Z .loc 1 26 33 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:26:33 2026-02-21T08:52:44.6455257Z shl.b32 %r436, %r435, 1; 2026-02-21T08:52:44.6455583Z .loc 1 27 39 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:27:39 2026-02-21T08:52:44.6455942Z sub.s32 %r437, 1, %r436; 2026-02-21T08:52:44.6456262Z .loc 1 27 52 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:27:52 2026-02-21T08:52:44.6456768Z min.s32 %r438, %r437, 2; 2026-02-21T08:52:44.6457088Z .loc 1 28 45 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:28:45 2026-02-21T08:52:44.6457448Z mul.lo.s32 %r439, %r435, 224; 2026-02-21T08:52:44.6457630Z sub.s32 %r440, %r2456, %r439; 2026-02-21T08:52:44.6457955Z .loc 1 29 51 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:29:51 2026-02-21T08:52:44.6458313Z div.s32 %r441, %r440, %r438; 2026-02-21T08:52:44.6458638Z .loc 1 28 64 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:28:64 2026-02-21T08:52:44.6458996Z mul.lo.s32 %r442, %r441, %r438; 2026-02-21T08:52:44.6459187Z sub.s32 %r443, %r440, %r442; 2026-02-21T08:52:44.6459503Z .loc 1 28 30 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:28:30 2026-02-21T08:52:44.6459863Z add.s32 %r444, %r443, %r436; 2026-02-21T08:52:44.6460181Z .loc 1 30 27 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:30:27 2026-02-21T08:52:44.6460531Z shl.b32 %r73, %r444, 6; 2026-02-21T08:52:44.6460841Z .loc 1 31 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:31:32 2026-02-21T08:52:44.6461196Z or.b32 %r445, %r73, %r6; 2026-02-21T08:52:44.6461373Z or.b32 %r446, %r73, %r7; 2026-02-21T08:52:44.6461536Z or.b32 %r447, %r73, %r8; 2026-02-21T08:52:44.6461723Z or.b32 %r448, %r73, %r9; 2026-02-21T08:52:44.6462043Z .loc 1 32 27 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:32:27 
2026-02-21T08:52:44.6462393Z shl.b32 %r449, %r441, 6; 2026-02-21T08:52:44.6462858Z .loc 1 33 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:33:32 2026-02-21T08:52:44.6463210Z or.b32 %r74, %r449, %r15; 2026-02-21T08:52:44.6463532Z .loc 1 48 53 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:53 2026-02-21T08:52:44.6463880Z shl.b32 %r450, %r445, 13; 2026-02-21T08:52:44.6464052Z shl.b32 %r451, %r446, 13; 2026-02-21T08:52:44.6464219Z shl.b32 %r452, %r447, 13; 2026-02-21T08:52:44.6464393Z shl.b32 %r453, %r448, 13; 2026-02-21T08:52:44.6464712Z .loc 1 48 60 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:60 2026-02-21T08:52:44.6465063Z or.b32 %r454, %r450, %r13; 2026-02-21T08:52:44.6465246Z or.b32 %r455, %r451, %r13; 2026-02-21T08:52:44.6465549Z or.b32 %r456, %r452, %r13; 2026-02-21T08:52:44.6465733Z or.b32 %r457, %r453, %r13; 2026-02-21T08:52:44.6466045Z .loc 1 48 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:32 2026-02-21T08:52:44.6466416Z mad.wide.s32 %rd41, %r454, 2, %rd37; 2026-02-21T08:52:44.6466820Z mad.wide.s32 %rd42, %r455, 2, %rd37; 2026-02-21T08:52:44.6467023Z mad.wide.s32 %rd43, %r456, 2, %rd37; 2026-02-21T08:52:44.6467229Z mad.wide.s32 %rd44, %r457, 2, %rd37; 2026-02-21T08:52:44.6467579Z .loc 1 48 80 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:80 2026-02-21T08:52:44.6467933Z bar.sync 0; 2026-02-21T08:52:44.6468078Z mov.b32 %r409, 8; 2026-02-21T08:52:44.6468238Z // begin inline asm 2026-02-21T08:52:44.6468474Z cp.async.ca.shared.global [ %r1349 + 0 ], [ %rd41 + 0 ], 0x8, %r409; 2026-02-21T08:52:44.6468821Z // end inline asm 2026-02-21T08:52:44.6468976Z // begin inline asm 2026-02-21T08:52:44.6469207Z cp.async.ca.shared.global [ %r1351 + 0 ], [ %rd42 + 0 ], 0x8, %r409; 2026-02-21T08:52:44.6469496Z // end inline asm 2026-02-21T08:52:44.6469644Z // begin inline asm 2026-02-21T08:52:44.6469877Z cp.async.ca.shared.global [ %r1353 + 0 ], [ %rd43 + 0 ], 0x8, %r409; 
2026-02-21T08:52:44.6470151Z // end inline asm 2026-02-21T08:52:44.6470301Z // begin inline asm 2026-02-21T08:52:44.6470523Z cp.async.ca.shared.global [ %r1355 + 0 ], [ %rd44 + 0 ], 0x8, %r409; 2026-02-21T08:52:44.6470805Z // end inline asm 2026-02-21T08:52:44.6470966Z cp.async.commit_group; 2026-02-21T08:52:44.6471287Z .loc 1 54 62 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:62 2026-02-21T08:52:44.6471658Z add.s32 %r458, %r74, %r31; 2026-02-21T08:52:44.6471972Z .loc 1 54 34 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:34 2026-02-21T08:52:44.6472334Z cvt.s64.s32 %rd52, %r458; 2026-02-21T08:52:44.6472513Z add.s64 %rd45, %rd38, %rd52; 2026-02-21T08:52:44.6472843Z .loc 1 54 87 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:87 2026-02-21T08:52:44.6473201Z // begin inline asm 2026-02-21T08:52:44.6473428Z cp.async.ca.shared.global [ %r1357 + 0 ], [ %rd45 + 0 ], 0x8, %r409; 2026-02-21T08:52:44.6473707Z // end inline asm 2026-02-21T08:52:44.6473861Z cp.async.commit_group; 2026-02-21T08:52:44.6474177Z .loc 1 48 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:32 2026-02-21T08:52:44.6474547Z cvt.s64.s32 %rd53, %r450; 2026-02-21T08:52:44.6474729Z or.b64 %rd54, %rd53, %rd2; 2026-02-21T08:52:44.6474905Z shl.b64 %rd55, %rd54, 1; 2026-02-21T08:52:44.6475083Z add.s64 %rd56, %rd37, %rd55; 2026-02-21T08:52:44.6475265Z add.s64 %rd46, %rd56, 128; 2026-02-21T08:52:44.6490950Z cvt.s64.s32 %rd57, %r451; 2026-02-21T08:52:44.6491190Z or.b64 %rd58, %rd57, %rd2; 2026-02-21T08:52:44.6491398Z shl.b64 %rd59, %rd58, 1; 2026-02-21T08:52:44.6491597Z add.s64 %rd60, %rd37, %rd59; 2026-02-21T08:52:44.6491794Z add.s64 %rd47, %rd60, 128; 2026-02-21T08:52:44.6491986Z cvt.s64.s32 %rd61, %r452; 2026-02-21T08:52:44.6492173Z or.b64 %rd62, %rd61, %rd2; 2026-02-21T08:52:44.6492367Z shl.b64 %rd63, %rd62, 1; 2026-02-21T08:52:44.6492789Z add.s64 %rd64, %rd37, %rd63; 2026-02-21T08:52:44.6492995Z add.s64 %rd48, %rd64, 128; 
2026-02-21T08:52:44.6493179Z cvt.s64.s32 %rd65, %r453; 2026-02-21T08:52:44.6493362Z or.b64 %rd66, %rd65, %rd2; 2026-02-21T08:52:44.6493539Z shl.b64 %rd67, %rd66, 1; 2026-02-21T08:52:44.6493735Z add.s64 %rd68, %rd37, %rd67; 2026-02-21T08:52:44.6493938Z add.s64 %rd49, %rd68, 128; 2026-02-21T08:52:44.6494291Z .loc 1 48 80 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:80 2026-02-21T08:52:44.6494694Z bar.sync 0; 2026-02-21T08:52:44.6494856Z // begin inline asm 2026-02-21T08:52:44.6495122Z cp.async.ca.shared.global [ %r1359 + 0 ], [ %rd46 + 0 ], 0x8, %r409; 2026-02-21T08:52:44.6495416Z // end inline asm 2026-02-21T08:52:44.6495718Z // begin inline asm 2026-02-21T08:52:44.6495975Z cp.async.ca.shared.global [ %r1361 + 0 ], [ %rd47 + 0 ], 0x8, %r409; 2026-02-21T08:52:44.6496258Z // end inline asm 2026-02-21T08:52:44.6496415Z // begin inline asm 2026-02-21T08:52:44.6496819Z cp.async.ca.shared.global [ %r1363 + 0 ], [ %rd48 + 0 ], 0x8, %r409; 2026-02-21T08:52:44.6497110Z // end inline asm 2026-02-21T08:52:44.6497264Z // begin inline asm 2026-02-21T08:52:44.6497522Z cp.async.ca.shared.global [ %r1365 + 0 ], [ %rd49 + 0 ], 0x8, %r409; 2026-02-21T08:52:44.6497798Z // end inline asm 2026-02-21T08:52:44.6497971Z cp.async.commit_group; 2026-02-21T08:52:44.6498323Z .loc 1 54 34 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:34 2026-02-21T08:52:44.6498718Z cvt.s64.s32 %rd69, %r74; 2026-02-21T08:52:44.6498910Z add.s64 %rd70, %rd69, %rd3; 2026-02-21T08:52:44.6499099Z add.s64 %rd71, %rd38, %rd70; 2026-02-21T08:52:44.6499292Z add.s64 %rd50, %rd71, 229376; 2026-02-21T08:52:44.6499632Z .loc 1 54 87 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:87 2026-02-21T08:52:44.6500002Z // begin inline asm 2026-02-21T08:52:44.6500248Z cp.async.ca.shared.global [ %r1367 + 0 ], [ %rd50 + 0 ], 0x8, %r409; 2026-02-21T08:52:44.6500534Z // end inline asm 2026-02-21T08:52:44.6500694Z cp.async.commit_group; 2026-02-21T08:52:44.6501034Z .loc 1 40 103 // 
cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:40:103 2026-02-21T08:52:44.6501411Z add.s32 %r2458, %r67, %r449; 2026-02-21T08:52:44.6501595Z or.b32 %r459, %r8, %r73; 2026-02-21T08:52:44.6501775Z shl.b32 %r460, %r459, 13; 2026-02-21T08:52:44.6501980Z mad.wide.s32 %rd242, %r460, 2, %rd1; 2026-02-21T08:52:44.6502191Z or.b32 %r461, %r7, %r73; 2026-02-21T08:52:44.6502360Z shl.b32 %r462, %r461, 13; 2026-02-21T08:52:44.6502548Z mad.wide.s32 %rd241, %r462, 2, %rd1; 2026-02-21T08:52:44.6502744Z shl.b32 %r463, %r444, 19; 2026-02-21T08:52:44.6502927Z or.b32 %r464, %r70, %r463; 2026-02-21T08:52:44.6503129Z mad.wide.s32 %rd240, %r464, 2, %rd1; 2026-02-21T08:52:44.6503340Z or.b32 %r2457, %r71, %r463; 2026-02-21T08:52:44.6503535Z mov.b32 %r2461, 0f00000000; 2026-02-21T08:52:44.6503713Z mov.b32 %r2460, 1; 2026-02-21T08:52:44.6503886Z mov.b32 %r2459, -1; 2026-02-21T08:52:44.6504053Z mov.b64 %rd243, -32; 2026-02-21T08:52:44.6504230Z mov.b32 %r2462, %r2461; 2026-02-21T08:52:44.6504418Z mov.b32 %r2463, %r2461; 2026-02-21T08:52:44.6504602Z mov.b32 %r2464, %r2461; 2026-02-21T08:52:44.6504779Z mov.b32 %r2465, %r2461; 2026-02-21T08:52:44.6504954Z mov.b32 %r2466, %r2461; 2026-02-21T08:52:44.6505127Z mov.b32 %r2467, %r2461; 2026-02-21T08:52:44.6505294Z mov.b32 %r2468, %r2461; 2026-02-21T08:52:44.6505462Z mov.b32 %r2469, %r2461; 2026-02-21T08:52:44.6505624Z mov.b32 %r2470, %r2461; 2026-02-21T08:52:44.6505794Z mov.b32 %r2471, %r2461; 2026-02-21T08:52:44.6505957Z mov.b32 %r2472, %r2461; 2026-02-21T08:52:44.6506138Z mov.b32 %r2473, %r2461; 2026-02-21T08:52:44.6506305Z mov.b32 %r2474, %r2461; 2026-02-21T08:52:44.6506590Z mov.b32 %r2475, %r2461; 2026-02-21T08:52:44.6506766Z mov.b32 %r2476, %r2461; 2026-02-21T08:52:44.6507004Z $L__BB0_4: // Parent Loop BB0_3 Depth=1 2026-02-21T08:52:44.6507473Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:44.6507734Z add.s64 %rd243, %rd243, 32; 2026-02-21T08:52:44.6507929Z setp.lt.u64 %p11, %rd243, 4032; 2026-02-21T08:52:44.6508130Z add.s32 
%r801, %r2459, 1; 2026-02-21T08:52:44.6508321Z setp.gt.s32 %p12, %r801, 1; 2026-02-21T08:52:44.6508596Z selp.b32 %r2459, 0, %r801, %p12; 2026-02-21T08:52:44.6508968Z .loc 1 48 80 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:80 2026-02-21T08:52:44.6509354Z cp.async.wait_group 2; 2026-02-21T08:52:44.6509530Z bar.sync 0; 2026-02-21T08:52:44.6509689Z shl.b32 %r802, %r2459, 13; 2026-02-21T08:52:44.6509874Z add.s32 %r804, %r2442, %r802; 2026-02-21T08:52:44.6510351Z .loc 1 52 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:52:32 2026-02-21T08:52:44.6510732Z add.s32 %r805, %r804, %r39; 2026-02-21T08:52:44.6510934Z ld.shared.b16 %rs1, [%r805]; 2026-02-21T08:52:44.6511137Z ld.shared.b16 %rs2, [%r805+1024]; 2026-02-21T08:52:44.6511348Z ld.shared.b16 %rs3, [%r805+64]; 2026-02-21T08:52:44.6511557Z ld.shared.b16 %rs4, [%r805+1088]; 2026-02-21T08:52:44.6511750Z add.s32 %r806, %r804, %r40; 2026-02-21T08:52:44.6511939Z ld.shared.b16 %rs5, [%r806]; 2026-02-21T08:52:44.6512126Z ld.shared.b16 %rs6, [%r806+1024]; 2026-02-21T08:52:44.6512337Z ld.shared.b16 %rs7, [%r806+64]; 2026-02-21T08:52:44.6512536Z ld.shared.b16 %rs8, [%r806+1088]; 2026-02-21T08:52:44.6512736Z add.s32 %r807, %r804, %r41; 2026-02-21T08:52:44.6512919Z ld.shared.b16 %rs9, [%r807]; 2026-02-21T08:52:44.6513121Z ld.shared.b16 %rs10, [%r807+1024]; 2026-02-21T08:52:44.6513338Z ld.shared.b16 %rs11, [%r807+64]; 2026-02-21T08:52:44.6513546Z ld.shared.b16 %rs12, [%r807+1088]; 2026-02-21T08:52:44.6513755Z add.s32 %r808, %r804, %r42; 2026-02-21T08:52:44.6513943Z ld.shared.b16 %rs13, [%r808]; 2026-02-21T08:52:44.6514142Z ld.shared.b16 %rs14, [%r808+1024]; 2026-02-21T08:52:44.6514349Z ld.shared.b16 %rs15, [%r808+64]; 2026-02-21T08:52:44.6514557Z ld.shared.b16 %rs16, [%r808+1088]; 2026-02-21T08:52:44.6514762Z add.s32 %r809, %r804, %r43; 2026-02-21T08:52:44.6514953Z ld.shared.b16 %rs17, [%r809]; 2026-02-21T08:52:44.6515140Z ld.shared.b16 %rs18, [%r809+1024]; 2026-02-21T08:52:44.6515350Z ld.shared.b16 
%rs19, [%r809+64]; 2026-02-21T08:52:44.6515555Z ld.shared.b16 %rs20, [%r809+1088]; 2026-02-21T08:52:44.6515752Z add.s32 %r810, %r804, %r44; 2026-02-21T08:52:44.6515938Z ld.shared.b16 %rs21, [%r810]; 2026-02-21T08:52:44.6516125Z ld.shared.b16 %rs22, [%r810+1024]; 2026-02-21T08:52:44.6516330Z ld.shared.b16 %rs23, [%r810+64]; 2026-02-21T08:52:44.6516657Z ld.shared.b16 %rs24, [%r810+1088]; 2026-02-21T08:52:44.6516873Z add.s32 %r811, %r804, %r45; 2026-02-21T08:52:44.6517071Z ld.shared.b16 %rs25, [%r811]; 2026-02-21T08:52:44.6517271Z ld.shared.b16 %rs26, [%r811+1024]; 2026-02-21T08:52:44.6517472Z ld.shared.b16 %rs27, [%r811+64]; 2026-02-21T08:52:44.6517678Z ld.shared.b16 %rs28, [%r811+1088]; 2026-02-21T08:52:44.6517877Z add.s32 %r812, %r804, %r46; 2026-02-21T08:52:44.6518057Z ld.shared.b16 %rs29, [%r812]; 2026-02-21T08:52:44.6518248Z ld.shared.b16 %rs30, [%r812+1024]; 2026-02-21T08:52:44.6518444Z ld.shared.b16 %rs31, [%r812+64]; 2026-02-21T08:52:44.6518645Z ld.shared.b16 %rs32, [%r812+1088]; 2026-02-21T08:52:44.6518839Z cvt.f32.bf16 %r497, %rs1; 2026-02-21T08:52:44.6519028Z cvt.f32.bf16 %r498, %rs2; 2026-02-21T08:52:44.6519201Z cvt.f32.bf16 %r499, %rs5; 2026-02-21T08:52:44.6519379Z cvt.f32.bf16 %r500, %rs6; 2026-02-21T08:52:44.6519560Z cvt.f32.bf16 %r533, %rs9; 2026-02-21T08:52:44.6519733Z cvt.f32.bf16 %r534, %rs10; 2026-02-21T08:52:44.6519922Z cvt.f32.bf16 %r535, %rs13; 2026-02-21T08:52:44.6520100Z cvt.f32.bf16 %r536, %rs14; 2026-02-21T08:52:44.6520286Z cvt.f32.bf16 %r569, %rs17; 2026-02-21T08:52:44.6520459Z cvt.f32.bf16 %r570, %rs18; 2026-02-21T08:52:44.6520637Z cvt.f32.bf16 %r571, %rs21; 2026-02-21T08:52:44.6520809Z cvt.f32.bf16 %r572, %rs22; 2026-02-21T08:52:44.6521151Z cvt.f32.bf16 %r605, %rs25; 2026-02-21T08:52:44.6521331Z cvt.f32.bf16 %r606, %rs26; 2026-02-21T08:52:44.6521509Z cvt.f32.bf16 %r607, %rs29; 2026-02-21T08:52:44.6521692Z cvt.f32.bf16 %r608, %rs30; 2026-02-21T08:52:44.6521866Z cvt.f32.bf16 %r641, %rs3; 2026-02-21T08:52:44.6522047Z cvt.f32.bf16 %r642, 
%rs4; 2026-02-21T08:52:44.6522216Z cvt.f32.bf16 %r643, %rs7; 2026-02-21T08:52:44.6522394Z cvt.f32.bf16 %r644, %rs8; 2026-02-21T08:52:44.6522581Z cvt.f32.bf16 %r677, %rs11; 2026-02-21T08:52:44.6522762Z cvt.f32.bf16 %r678, %rs12; 2026-02-21T08:52:44.6522934Z cvt.f32.bf16 %r679, %rs15; 2026-02-21T08:52:44.6523114Z cvt.f32.bf16 %r680, %rs16; 2026-02-21T08:52:44.6523285Z cvt.f32.bf16 %r713, %rs19; 2026-02-21T08:52:44.6523465Z cvt.f32.bf16 %r714, %rs20; 2026-02-21T08:52:44.6523802Z cvt.f32.bf16 %r715, %rs23; 2026-02-21T08:52:44.6523984Z cvt.f32.bf16 %r716, %rs24; 2026-02-21T08:52:44.6524160Z cvt.f32.bf16 %r749, %rs27; 2026-02-21T08:52:44.6524346Z cvt.f32.bf16 %r750, %rs28; 2026-02-21T08:52:44.6524527Z cvt.f32.bf16 %r751, %rs31; 2026-02-21T08:52:44.6524705Z cvt.f32.bf16 %r752, %rs32; 2026-02-21T08:52:44.6525052Z .loc 1 54 87 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:87 2026-02-21T08:52:44.6525439Z shl.b32 %r813, %r2459, 11; 2026-02-21T08:52:44.6525774Z .loc 1 67 45 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:67:45 2026-02-21T08:52:44.6526138Z add.s32 %r814, %r47, %r813; 2026-02-21T08:52:44.6526329Z ld.shared.b8 %rs33, [%r814]; 2026-02-21T08:52:44.6526652Z ld.shared.b8 %rs34, [%r814+128]; 2026-02-21T08:52:44.6526859Z ld.shared.b8 %rs35, [%r814+256]; 2026-02-21T08:52:44.6527066Z ld.shared.b8 %rs36, [%r814+384]; 2026-02-21T08:52:44.6527271Z ld.shared.b8 %rs37, [%r814+512]; 2026-02-21T08:52:44.6527473Z ld.shared.b8 %rs38, [%r814+640]; 2026-02-21T08:52:44.6527664Z ld.shared.b8 %rs39, [%r814+768]; 2026-02-21T08:52:44.6527858Z ld.shared.b8 %rs40, [%r814+896]; 2026-02-21T08:52:44.6528052Z ld.shared.b8 %rs41, [%r814+1024]; 2026-02-21T08:52:44.6528256Z ld.shared.b8 %rs42, [%r814+1152]; 2026-02-21T08:52:44.6528449Z ld.shared.b8 %rs43, [%r814+1280]; 2026-02-21T08:52:44.6528646Z ld.shared.b8 %rs44, [%r814+1408]; 2026-02-21T08:52:44.6528838Z ld.shared.b8 %rs45, [%r814+1536]; 2026-02-21T08:52:44.6529037Z ld.shared.b8 %rs46, [%r814+1664]; 
2026-02-21T08:52:44.6529233Z ld.shared.b8 %rs47, [%r814+1792]; 2026-02-21T08:52:44.6529429Z ld.shared.b8 %rs48, [%r814+1920]; 2026-02-21T08:52:44.6529773Z .loc 1 57 28 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:57:28 2026-02-21T08:52:44.6530137Z shl.b16 %rs49, %rs33, 4; 2026-02-21T08:52:44.6530317Z shl.b16 %rs50, %rs34, 4; 2026-02-21T08:52:44.6530494Z shl.b16 %rs51, %rs35, 4; 2026-02-21T08:52:44.6530671Z shl.b16 %rs52, %rs36, 4; 2026-02-21T08:52:44.6530836Z shl.b16 %rs53, %rs37, 4; 2026-02-21T08:52:44.6531007Z shl.b16 %rs54, %rs38, 4; 2026-02-21T08:52:44.6531179Z shl.b16 %rs55, %rs39, 4; 2026-02-21T08:52:44.6531351Z shl.b16 %rs56, %rs40, 4; 2026-02-21T08:52:44.6531541Z shl.b16 %rs57, %rs41, 4; 2026-02-21T08:52:44.6531707Z shl.b16 %rs58, %rs42, 4; 2026-02-21T08:52:44.6531879Z shl.b16 %rs59, %rs43, 4; 2026-02-21T08:52:44.6532043Z shl.b16 %rs60, %rs44, 4; 2026-02-21T08:52:44.6532220Z shl.b16 %rs61, %rs45, 4; 2026-02-21T08:52:44.6532385Z shl.b16 %rs62, %rs46, 4; 2026-02-21T08:52:44.6532558Z shl.b16 %rs63, %rs47, 4; 2026-02-21T08:52:44.6532726Z shl.b16 %rs64, %rs48, 4; 2026-02-21T08:52:44.6533062Z .loc 1 72 58 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:72:58 2026-02-21T08:52:44.6533440Z selp.b16 %rs65, %rs49, %rs33, %p64; 2026-02-21T08:52:44.6533644Z cvt.s16.s8 %rs66, %rs65; 2026-02-21T08:52:44.6533825Z shr.s16 %rs67, %rs66, 4; 2026-02-21T08:52:44.6534004Z selp.b16 %rs68, %rs50, %rs34, %p64; 2026-02-21T08:52:44.6534209Z cvt.s16.s8 %rs69, %rs68; 2026-02-21T08:52:44.6534376Z shr.s16 %rs70, %rs69, 4; 2026-02-21T08:52:44.6534720Z selp.b16 %rs71, %rs51, %rs35, %p64; 2026-02-21T08:52:44.6534915Z cvt.s16.s8 %rs72, %rs71; 2026-02-21T08:52:44.6535094Z shr.s16 %rs73, %rs72, 4; 2026-02-21T08:52:44.6535269Z selp.b16 %rs74, %rs52, %rs36, %p64; 2026-02-21T08:52:44.6535483Z cvt.s16.s8 %rs75, %rs74; 2026-02-21T08:52:44.6535660Z shr.s16 %rs76, %rs75, 4; 2026-02-21T08:52:44.6535836Z selp.b16 %rs77, %rs53, %rs37, %p64; 2026-02-21T08:52:44.6536037Z 
cvt.s16.s8 %rs78, %rs77; 2026-02-21T08:52:44.6536204Z shr.s16 %rs79, %rs78, 4; 2026-02-21T08:52:44.6536387Z selp.b16 %rs80, %rs54, %rs38, %p64; 2026-02-21T08:52:44.6536706Z cvt.s16.s8 %rs81, %rs80; 2026-02-21T08:52:44.6536896Z shr.s16 %rs82, %rs81, 4; 2026-02-21T08:52:44.6537075Z selp.b16 %rs83, %rs55, %rs39, %p64; 2026-02-21T08:52:44.6537410Z cvt.s16.s8 %rs84, %rs83; 2026-02-21T08:52:44.6537589Z shr.s16 %rs85, %rs84, 4; 2026-02-21T08:52:44.6537781Z selp.b16 %rs86, %rs56, %rs40, %p64; 2026-02-21T08:52:44.6537979Z cvt.s16.s8 %rs87, %rs86; 2026-02-21T08:52:44.6538150Z shr.s16 %rs88, %rs87, 4; 2026-02-21T08:52:44.6538339Z selp.b16 %rs89, %rs57, %rs41, %p64; 2026-02-21T08:52:44.6538534Z cvt.s16.s8 %rs90, %rs89; 2026-02-21T08:52:44.6538706Z shr.s16 %rs91, %rs90, 4; 2026-02-21T08:52:44.6538879Z selp.b16 %rs92, %rs58, %rs42, %p64; 2026-02-21T08:52:44.6539078Z cvt.s16.s8 %rs93, %rs92; 2026-02-21T08:52:44.6539243Z shr.s16 %rs94, %rs93, 4; 2026-02-21T08:52:44.6539419Z selp.b16 %rs95, %rs59, %rs43, %p64; 2026-02-21T08:52:44.6539615Z cvt.s16.s8 %rs96, %rs95; 2026-02-21T08:52:44.6539780Z shr.s16 %rs97, %rs96, 4; 2026-02-21T08:52:44.6539959Z selp.b16 %rs98, %rs60, %rs44, %p64; 2026-02-21T08:52:44.6540153Z cvt.s16.s8 %rs99, %rs98; 2026-02-21T08:52:44.6540328Z shr.s16 %rs100, %rs99, 4; 2026-02-21T08:52:44.6540513Z selp.b16 %rs101, %rs61, %rs45, %p64; 2026-02-21T08:52:44.6540721Z cvt.s16.s8 %rs102, %rs101; 2026-02-21T08:52:44.6540900Z shr.s16 %rs103, %rs102, 4; 2026-02-21T08:52:44.6541102Z selp.b16 %rs104, %rs62, %rs46, %p64; 2026-02-21T08:52:44.6541308Z cvt.s16.s8 %rs105, %rs104; 2026-02-21T08:52:44.6541482Z shr.s16 %rs106, %rs105, 4; 2026-02-21T08:52:44.6541668Z selp.b16 %rs107, %rs63, %rs47, %p64; 2026-02-21T08:52:44.6541864Z cvt.s16.s8 %rs108, %rs107; 2026-02-21T08:52:44.6542043Z shr.s16 %rs109, %rs108, 4; 2026-02-21T08:52:44.6542224Z selp.b16 %rs110, %rs64, %rs48, %p64; 2026-02-21T08:52:44.6542425Z cvt.s16.s8 %rs111, %rs110; 2026-02-21T08:52:44.6542598Z shr.s16 %rs112, %rs111, 
4; 2026-02-21T08:52:44.6542931Z .loc 1 77 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:77:32 2026-02-21T08:52:44.6543297Z cvt.rn.f32.s16 %r815, %rs67; 2026-02-21T08:52:44.6543489Z cvt.rn.f32.s16 %r816, %rs70; 2026-02-21T08:52:44.6543675Z cvt.rn.f32.s16 %r817, %rs73; 2026-02-21T08:52:44.6543860Z cvt.rn.f32.s16 %r818, %rs76; 2026-02-21T08:52:44.6544044Z cvt.rn.f32.s16 %r819, %rs79; 2026-02-21T08:52:44.6544220Z cvt.rn.f32.s16 %r820, %rs82; 2026-02-21T08:52:44.6544401Z cvt.rn.f32.s16 %r821, %rs85; 2026-02-21T08:52:44.6544584Z cvt.rn.f32.s16 %r822, %rs88; 2026-02-21T08:52:44.6544785Z cvt.rn.f32.s16 %r823, %rs91; 2026-02-21T08:52:44.6544963Z cvt.rn.f32.s16 %r824, %rs94; 2026-02-21T08:52:44.6545150Z cvt.rn.f32.s16 %r825, %rs97; 2026-02-21T08:52:44.6545336Z cvt.rn.f32.s16 %r826, %rs100; 2026-02-21T08:52:44.6545521Z cvt.rn.f32.s16 %r827, %rs103; 2026-02-21T08:52:44.6545709Z cvt.rn.f32.s16 %r828, %rs106; 2026-02-21T08:52:44.6545886Z cvt.rn.f32.s16 %r829, %rs109; 2026-02-21T08:52:44.6546071Z cvt.rn.f32.s16 %r830, %rs112; 2026-02-21T08:52:44.6546253Z st.shared.b32 [%r48], %r815; 2026-02-21T08:52:44.6546575Z st.shared.b32 [%r48+8192], %r823; 2026-02-21T08:52:44.6546779Z st.shared.b32 [%r49], %r816; 2026-02-21T08:52:44.6546979Z st.shared.b32 [%r49+8192], %r824; 2026-02-21T08:52:44.6547178Z st.shared.b32 [%r50], %r817; 2026-02-21T08:52:44.6547363Z st.shared.b32 [%r50+8192], %r825; 2026-02-21T08:52:44.6547558Z st.shared.b32 [%r51], %r818; 2026-02-21T08:52:44.6547880Z st.shared.b32 [%r51+8192], %r826; 2026-02-21T08:52:44.6548079Z st.shared.b32 [%r52], %r819; 2026-02-21T08:52:44.6548255Z st.shared.b32 [%r52+8192], %r827; 2026-02-21T08:52:44.6548448Z st.shared.b32 [%r53], %r820; 2026-02-21T08:52:44.6548699Z st.shared.b32 [%r53+8192], %r828; 2026-02-21T08:52:44.6548895Z st.shared.b32 [%r54], %r821; 2026-02-21T08:52:44.6549076Z st.shared.b32 [%r54+8192], %r829; 2026-02-21T08:52:44.6549276Z st.shared.b32 [%r55], %r822; 2026-02-21T08:52:44.6549457Z st.shared.b32 
[%r55+8192], %r830; 2026-02-21T08:52:44.6549641Z $L__tmp1: 2026-02-21T08:52:44.6550012Z .loc 2 291 36 // standard.py:291:36 @[ cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:84:40 ] 2026-02-21T08:52:44.6550448Z // begin inline asm 2026-02-21T08:52:44.6550785Z fence.proxy.async.shared::cta; 2026-02-21T08:52:44.6550981Z // end inline asm 2026-02-21T08:52:44.6551137Z bar.sync 0; 2026-02-21T08:52:44.6551306Z shfl.sync.idx.b32 %r831, %r4, 0, 31, -1; 2026-02-21T08:52:44.6551559Z wgmma.fence.sync.aligned; 2026-02-21T08:52:44.6551749Z shl.b32 %r832, %r831, 10; 2026-02-21T08:52:44.6551925Z and.b32 %r833, %r832, 4096; 2026-02-21T08:52:44.6552115Z add.s32 %r834, %r833, %r364; 2026-02-21T08:52:44.6552293Z bfe.u32 %r835, %r834, 4, 14; 2026-02-21T08:52:44.6552473Z cvt.u64.u32 %rd85, %r835; 2026-02-21T08:52:44.6552659Z or.b64 %rd72, %rd85, 4611686293338849280; 2026-02-21T08:52:44.6552877Z mov.pred %p2, -1; 2026-02-21T08:52:44.6553037Z // begin inline asm 2026-02-21T08:52:44.6553650Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476}, {%r497,%r498,%r499,%r500}, %rd72, %p2, 1, 1; 2026-02-21T08:52:44.6554302Z // end inline asm 2026-02-21T08:52:44.6554458Z add.s32 %r836, %r834, 32; 2026-02-21T08:52:44.6554636Z bfe.u32 %r837, %r836, 4, 14; 2026-02-21T08:52:44.6554830Z cvt.u64.u32 %rd86, %r837; 2026-02-21T08:52:44.6555020Z or.b64 %rd73, %rd86, 4611686293338849280; 2026-02-21T08:52:44.6555231Z // begin inline asm 2026-02-21T08:52:44.6555829Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476}, {%r533,%r534,%r535,%r536}, %rd73, %p2, 1, 1; 2026-02-21T08:52:44.6556620Z // end inline asm 2026-02-21T08:52:44.6556784Z add.s32 %r838, %r834, 64; 2026-02-21T08:52:44.6556967Z bfe.u32 %r839, %r838, 4, 14; 2026-02-21T08:52:44.6557154Z cvt.u64.u32 %rd87, %r839; 
2026-02-21T08:52:44.6557345Z or.b64 %rd74, %rd87, 4611686293338849280; 2026-02-21T08:52:44.6557552Z // begin inline asm 2026-02-21T08:52:44.6558153Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476}, {%r569,%r570,%r571,%r572}, %rd74, %p2, 1, 1; 2026-02-21T08:52:44.6558796Z // end inline asm 2026-02-21T08:52:44.6558950Z add.s32 %r840, %r834, 96; 2026-02-21T08:52:44.6559131Z bfe.u32 %r841, %r840, 4, 14; 2026-02-21T08:52:44.6559308Z cvt.u64.u32 %rd88, %r841; 2026-02-21T08:52:44.6559497Z or.b64 %rd75, %rd88, 4611686293338849280; 2026-02-21T08:52:44.6559702Z // begin inline asm 2026-02-21T08:52:44.6560286Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476}, {%r605,%r606,%r607,%r608}, %rd75, %p2, 1, 1; 2026-02-21T08:52:44.6560933Z // end inline asm 2026-02-21T08:52:44.6561093Z add.s32 %r842, %r834, 8192; 2026-02-21T08:52:44.6561272Z bfe.u32 %r843, %r842, 4, 14; 2026-02-21T08:52:44.6561456Z cvt.u64.u32 %rd89, %r843; 2026-02-21T08:52:44.6561635Z or.b64 %rd76, %rd89, 4611686293338849280; 2026-02-21T08:52:44.6561844Z // begin inline asm 2026-02-21T08:52:44.6562435Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476}, {%r641,%r642,%r643,%r644}, %rd76, %p2, 1, 1; 2026-02-21T08:52:44.6563239Z // end inline asm 2026-02-21T08:52:44.6563401Z add.s32 %r844, %r834, 8224; 2026-02-21T08:52:44.6563586Z bfe.u32 %r845, %r844, 4, 14; 2026-02-21T08:52:44.6563774Z cvt.u64.u32 %rd90, %r845; 2026-02-21T08:52:44.6563959Z or.b64 %rd77, %rd90, 4611686293338849280; 2026-02-21T08:52:44.6564180Z // begin inline asm 2026-02-21T08:52:44.6564777Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476}, {%r677,%r678,%r679,%r680}, %rd77, %p2, 1, 1; 2026-02-21T08:52:44.6565424Z // end inline asm 2026-02-21T08:52:44.6565584Z add.s32 %r846, %r834, 8256; 2026-02-21T08:52:44.6565761Z bfe.u32 %r847, %r846, 4, 14; 2026-02-21T08:52:44.6566072Z cvt.u64.u32 %rd91, %r847; 2026-02-21T08:52:44.6566262Z or.b64 %rd78, %rd91, 4611686293338849280; 2026-02-21T08:52:44.6566601Z // begin inline asm 2026-02-21T08:52:44.6567212Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476}, {%r713,%r714,%r715,%r716}, %rd78, %p2, 1, 1; 2026-02-21T08:52:44.6567861Z // end inline asm 2026-02-21T08:52:44.6568017Z add.s32 %r848, %r834, 8288; 2026-02-21T08:52:44.6568203Z bfe.u32 %r849, %r848, 4, 14; 2026-02-21T08:52:44.6568396Z cvt.u64.u32 %rd92, %r849; 2026-02-21T08:52:44.6568584Z or.b64 %rd79, %rd92, 4611686293338849280; 2026-02-21T08:52:44.6568804Z // begin inline asm 2026-02-21T08:52:44.6569403Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476}, {%r749,%r750,%r751,%r752}, %rd79, %p2, 1, 1; 2026-02-21T08:52:44.6570053Z // end inline asm 2026-02-21T08:52:44.6570239Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:44.6570453Z mov.b32 %r771, 0; 2026-02-21T08:52:44.6570613Z mov.b32 %r770, %r771; 2026-02-21T08:52:44.6570779Z mov.b32 %r769, %r364; 2026-02-21T08:52:44.6570950Z // begin inline asm 2026-02-21T08:52:44.6571351Z // wait for regs: %r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r769,%r770,%r771 2026-02-21T08:52:44.6571831Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:44.6572025Z // end inline asm 2026-02-21T08:52:44.6572177Z $L__tmp2: 2026-02-21T08:52:44.6572490Z .loc 1 40 103 // 
cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:40:103 2026-02-21T08:52:44.6572878Z add.s32 %r850, %r2460, 1; 2026-02-21T08:52:44.6573065Z setp.gt.s32 %p13, %r850, 1; 2026-02-21T08:52:44.6573265Z selp.b32 %r2460, 0, %r850, %p13; 2026-02-21T08:52:44.6573624Z .loc 1 48 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:32 2026-02-21T08:52:44.6574005Z mad.wide.s32 %rd83, %r2457, 2, %rd37; 2026-02-21T08:52:44.6574364Z .loc 1 48 80 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:80 2026-02-21T08:52:44.6574738Z shl.b32 %r851, %r2460, 13; 2026-02-21T08:52:44.6574929Z add.s32 %r852, %r2442, %r851; 2026-02-21T08:52:44.6575126Z add.s32 %r791, %r852, %r19; 2026-02-21T08:52:44.6575308Z selp.b32 %r792, 8, 0, %p11; 2026-02-21T08:52:44.6575487Z // begin inline asm 2026-02-21T08:52:44.6575736Z cp.async.ca.shared.global [ %r791 + 0 ], [ %rd240 + 0 ], 0x8, %r792; 2026-02-21T08:52:44.6576019Z // end inline asm 2026-02-21T08:52:44.6576173Z add.s32 %r793, %r791, 2048; 2026-02-21T08:52:44.6576345Z // begin inline asm 2026-02-21T08:52:44.6576716Z cp.async.ca.shared.global [ %r793 + 0 ], [ %rd241 + 0 ], 0x8, %r792; 2026-02-21T08:52:44.6577007Z // end inline asm 2026-02-21T08:52:44.6577167Z add.s32 %r795, %r791, 4096; 2026-02-21T08:52:44.6577340Z // begin inline asm 2026-02-21T08:52:44.6577581Z cp.async.ca.shared.global [ %r795 + 0 ], [ %rd242 + 0 ], 0x8, %r792; 2026-02-21T08:52:44.6577853Z // end inline asm 2026-02-21T08:52:44.6578012Z add.s32 %r797, %r791, 6144; 2026-02-21T08:52:44.6578355Z // begin inline asm 2026-02-21T08:52:44.6578583Z cp.async.ca.shared.global [ %r797 + 0 ], [ %rd83 + 0 ], 0x8, %r792; 2026-02-21T08:52:44.6578858Z // end inline asm 2026-02-21T08:52:44.6579014Z cp.async.commit_group; 2026-02-21T08:52:44.6579337Z .loc 1 54 34 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:34 2026-02-21T08:52:44.6579702Z cvt.s64.s32 %rd93, %r2458; 2026-02-21T08:52:44.6579891Z add.s64 %rd84, %rd38, %rd93; 2026-02-21T08:52:44.6580224Z .loc 
1 54 87 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:87 2026-02-21T08:52:44.6580580Z shl.b32 %r853, %r2460, 11; 2026-02-21T08:52:44.6580759Z add.s32 %r799, %r1357, %r853; 2026-02-21T08:52:44.6580936Z // begin inline asm 2026-02-21T08:52:44.6581299Z cp.async.ca.shared.global [ %r799 + 0 ], [ %rd84 + 0 ], 0x8, %r792; 2026-02-21T08:52:44.6581589Z // end inline asm 2026-02-21T08:52:44.6581755Z cp.async.commit_group; 2026-02-21T08:52:44.6582081Z .loc 1 40 103 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:40:103 2026-02-21T08:52:44.6582460Z add.s32 %r2458, %r2458, 229376; 2026-02-21T08:52:44.6582654Z add.s64 %rd242, %rd242, 128; 2026-02-21T08:52:44.6582830Z add.s64 %rd241, %rd241, 128; 2026-02-21T08:52:44.6583009Z add.s64 %rd240, %rd240, 128; 2026-02-21T08:52:44.6583182Z add.s32 %r2457, %r2457, 64; 2026-02-21T08:52:44.6583364Z setp.lt.u64 %p14, %rd243, 4064; 2026-02-21T08:52:44.6583563Z @%p14 bra $L__BB0_4; 2026-02-21T08:52:44.6583779Z // %bb.5: // in Loop: Header=BB0_3 Depth=1 2026-02-21T08:52:44.6584192Z .loc 1 31 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:31:32 2026-02-21T08:52:44.6584560Z or.b32 %r895, %r73, %r10; 2026-02-21T08:52:44.6584742Z or.b32 %r896, %r73, %r11; 2026-02-21T08:52:44.6585068Z .loc 1 40 103 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:40:103 2026-02-21T08:52:44.6585451Z cp.async.wait_group 0; 2026-02-21T08:52:44.6585620Z bar.sync 0; 2026-02-21T08:52:44.6585916Z .loc 1 87 28 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:87:28 2026-02-21T08:52:44.6586291Z cvt.rn.bf16x2.f32 %r897, %r2462, %r2461; 2026-02-21T08:52:44.6586657Z cvt.rn.bf16x2.f32 %r898, %r2464, %r2463; 2026-02-21T08:52:44.6586879Z cvt.rn.bf16x2.f32 %r899, %r2466, %r2465; 2026-02-21T08:52:44.6587107Z cvt.rn.bf16x2.f32 %r900, %r2468, %r2467; 2026-02-21T08:52:44.6587330Z cvt.rn.bf16x2.f32 %r901, %r2470, %r2469; 2026-02-21T08:52:44.6587540Z cvt.rn.bf16x2.f32 %r902, %r2472, %r2471; 2026-02-21T08:52:44.6587761Z 
cvt.rn.bf16x2.f32 %r903, %r2474, %r2473; 2026-02-21T08:52:44.6587974Z cvt.rn.bf16x2.f32 %r904, %r2476, %r2475; 2026-02-21T08:52:44.6588333Z .loc 1 88 50 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:88:50 2026-02-21T08:52:44.6588772Z mad.lo.s32 %r905, %r895, 7168, %r74; 2026-02-21T08:52:44.6588976Z mad.lo.s32 %r906, %r896, 7168, %r74; 2026-02-21T08:52:44.6589323Z .loc 1 88 22 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:88:22 2026-02-21T08:52:44.6589685Z mad.wide.s32 %rd94, %r905, 2, %rd39; 2026-02-21T08:52:44.6589894Z mad.wide.s32 %rd95, %r906, 2, %rd39; 2026-02-21T08:52:44.6590230Z .loc 1 88 81 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:88:81 2026-02-21T08:52:44.6590637Z st.shared.v4.b32 [%r56], {%r897, %r899, %r901, %r903}; 2026-02-21T08:52:44.6590938Z st.shared.v4.b32 [%r56+256], {%r898, %r900, %r902, %r904}; 2026-02-21T08:52:44.6591186Z bar.sync 0; 2026-02-21T08:52:44.6591337Z // begin inline asm 2026-02-21T08:52:44.6591619Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r864, %r865, %r866, %r867}, [%r858]; 2026-02-21T08:52:44.6591955Z // end inline asm 2026-02-21T08:52:44.6592106Z // begin inline asm 2026-02-21T08:52:44.6592382Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r868, %r869, %r870, %r871}, [%r863]; 2026-02-21T08:52:44.6592855Z // end inline asm 2026-02-21T08:52:44.6593007Z // begin inline asm 2026-02-21T08:52:44.6593225Z st.global.v4.b32 [ %rd94 + 0 ], { %r864, %r865, %r866, %r867 }; 2026-02-21T08:52:44.6593497Z // end inline asm 2026-02-21T08:52:44.6593649Z // begin inline asm 2026-02-21T08:52:44.6593855Z st.global.v4.b32 [ %rd95 + 0 ], { %r868, %r869, %r870, %r871 }; 2026-02-21T08:52:44.6594107Z // end inline asm 2026-02-21T08:52:44.6594419Z .loc 1 19 144 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:19:144 2026-02-21T08:52:44.6594797Z add.s32 %r907, %r2456, 4224; 2026-02-21T08:52:44.6595138Z .loc 1 25 35 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:25:35 2026-02-21T08:52:44.6595527Z 
mul.hi.s32 %r908, %r907, -1840700269; 2026-02-21T08:52:44.6595874Z add.s32 %r909, %r908, %r907; 2026-02-21T08:52:44.6596056Z shr.u32 %r910, %r909, 31; 2026-02-21T08:52:44.6596234Z shr.s32 %r911, %r909, 7; 2026-02-21T08:52:44.6596415Z add.s32 %r912, %r911, %r910; 2026-02-21T08:52:44.6597006Z .loc 1 26 33 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:26:33 2026-02-21T08:52:44.6597369Z shl.b32 %r913, %r912, 1; 2026-02-21T08:52:44.6597687Z .loc 1 27 39 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:27:39 2026-02-21T08:52:44.6598046Z sub.s32 %r914, 1, %r913; 2026-02-21T08:52:44.6598357Z .loc 1 27 52 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:27:52 2026-02-21T08:52:44.6598713Z min.s32 %r915, %r914, 2; 2026-02-21T08:52:44.6599026Z .loc 1 28 45 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:28:45 2026-02-21T08:52:44.6599390Z mul.lo.s32 %r916, %r912, 224; 2026-02-21T08:52:44.6599573Z sub.s32 %r917, %r907, %r916; 2026-02-21T08:52:44.6599899Z .loc 1 29 51 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:29:51 2026-02-21T08:52:44.6600262Z div.s32 %r918, %r917, %r915; 2026-02-21T08:52:44.6600586Z .loc 1 28 64 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:28:64 2026-02-21T08:52:44.6600953Z mul.lo.s32 %r919, %r918, %r915; 2026-02-21T08:52:44.6601153Z sub.s32 %r920, %r917, %r919; 2026-02-21T08:52:44.6601482Z .loc 1 28 30 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:28:30 2026-02-21T08:52:44.6601838Z add.s32 %r921, %r920, %r913; 2026-02-21T08:52:44.6602163Z .loc 1 30 27 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:30:27 2026-02-21T08:52:44.6602522Z shl.b32 %r117, %r921, 6; 2026-02-21T08:52:44.6602835Z .loc 1 31 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:31:32 2026-02-21T08:52:44.6603196Z or.b32 %r922, %r117, %r6; 2026-02-21T08:52:44.6603370Z or.b32 %r923, %r117, %r7; 2026-02-21T08:52:44.6603545Z or.b32 %r924, %r117, %r8; 2026-02-21T08:52:44.6603712Z or.b32 
%r925, %r117, %r9; 2026-02-21T08:52:44.6604032Z .loc 1 32 27 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:32:27 2026-02-21T08:52:44.6604394Z shl.b32 %r926, %r918, 6; 2026-02-21T08:52:44.6604711Z .loc 1 33 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:33:32 2026-02-21T08:52:44.6605070Z or.b32 %r118, %r926, %r15; 2026-02-21T08:52:44.6605387Z .loc 1 48 53 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:53 2026-02-21T08:52:44.6605745Z shl.b32 %r927, %r922, 13; 2026-02-21T08:52:44.6605913Z shl.b32 %r928, %r923, 13; 2026-02-21T08:52:44.6606086Z shl.b32 %r929, %r924, 13; 2026-02-21T08:52:44.6606251Z shl.b32 %r930, %r925, 13; 2026-02-21T08:52:44.6606686Z .loc 1 48 60 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:60 2026-02-21T08:52:44.6607066Z or.b32 %r931, %r927, %r13; 2026-02-21T08:52:44.6607240Z or.b32 %r932, %r928, %r13; 2026-02-21T08:52:44.6607414Z or.b32 %r933, %r929, %r13; 2026-02-21T08:52:44.6607778Z or.b32 %r934, %r930, %r13; 2026-02-21T08:52:44.6608110Z .loc 1 48 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:32 2026-02-21T08:52:44.6608482Z mad.wide.s32 %rd96, %r931, 2, %rd37; 2026-02-21T08:52:44.6608699Z mad.wide.s32 %rd97, %r932, 2, %rd37; 2026-02-21T08:52:44.6608911Z mad.wide.s32 %rd98, %r933, 2, %rd37; 2026-02-21T08:52:44.6609118Z mad.wide.s32 %rd99, %r934, 2, %rd37; 2026-02-21T08:52:44.6609461Z .loc 1 48 80 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:80 2026-02-21T08:52:44.6609813Z bar.sync 0; 2026-02-21T08:52:44.6609961Z mov.b32 %r873, 8; 2026-02-21T08:52:44.6610112Z // begin inline asm 2026-02-21T08:52:44.6610356Z cp.async.ca.shared.global [ %r1349 + 0 ], [ %rd96 + 0 ], 0x8, %r873; 2026-02-21T08:52:44.6610775Z // end inline asm 2026-02-21T08:52:44.6610950Z // begin inline asm 2026-02-21T08:52:44.6611187Z cp.async.ca.shared.global [ %r1351 + 0 ], [ %rd97 + 0 ], 0x8, %r873; 2026-02-21T08:52:44.6611479Z // end inline asm 2026-02-21T08:52:44.6611632Z // begin inline 
asm 2026-02-21T08:52:44.6611855Z cp.async.ca.shared.global [ %r1353 + 0 ], [ %rd98 + 0 ], 0x8, %r873; 2026-02-21T08:52:44.6612133Z // end inline asm 2026-02-21T08:52:44.6612279Z // begin inline asm 2026-02-21T08:52:44.6612504Z cp.async.ca.shared.global [ %r1355 + 0 ], [ %rd99 + 0 ], 0x8, %r873; 2026-02-21T08:52:44.6612775Z // end inline asm 2026-02-21T08:52:44.6612933Z cp.async.commit_group; 2026-02-21T08:52:44.6613259Z .loc 1 54 62 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:62 2026-02-21T08:52:44.6613629Z add.s32 %r935, %r118, %r31; 2026-02-21T08:52:44.6613962Z .loc 1 54 34 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:34 2026-02-21T08:52:44.6614326Z cvt.s64.s32 %rd107, %r935; 2026-02-21T08:52:44.6614513Z add.s64 %rd100, %rd38, %rd107; 2026-02-21T08:52:44.6614844Z .loc 1 54 87 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:87 2026-02-21T08:52:44.6615218Z // begin inline asm 2026-02-21T08:52:44.6615453Z cp.async.ca.shared.global [ %r1357 + 0 ], [ %rd100 + 0 ], 0x8, %r873; 2026-02-21T08:52:44.6615730Z // end inline asm 2026-02-21T08:52:44.6615889Z cp.async.commit_group; 2026-02-21T08:52:44.6616202Z .loc 1 48 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:32 2026-02-21T08:52:44.6616706Z cvt.s64.s32 %rd108, %r927; 2026-02-21T08:52:44.6616892Z or.b64 %rd109, %rd108, %rd2; 2026-02-21T08:52:44.6617076Z shl.b64 %rd110, %rd109, 1; 2026-02-21T08:52:44.6617252Z add.s64 %rd111, %rd37, %rd110; 2026-02-21T08:52:44.6617440Z add.s64 %rd101, %rd111, 128; 2026-02-21T08:52:44.6617618Z cvt.s64.s32 %rd112, %r928; 2026-02-21T08:52:44.6617797Z or.b64 %rd113, %rd112, %rd2; 2026-02-21T08:52:44.6617975Z shl.b64 %rd114, %rd113, 1; 2026-02-21T08:52:44.6618152Z add.s64 %rd115, %rd37, %rd114; 2026-02-21T08:52:44.6618341Z add.s64 %rd102, %rd115, 128; 2026-02-21T08:52:44.6618516Z cvt.s64.s32 %rd116, %r929; 2026-02-21T08:52:44.6618700Z or.b64 %rd117, %rd116, %rd2; 2026-02-21T08:52:44.6618878Z shl.b64 %rd118, %rd117, 1; 
2026-02-21T08:52:44.6619054Z add.s64 %rd119, %rd37, %rd118; 2026-02-21T08:52:44.6619231Z add.s64 %rd103, %rd119, 128; 2026-02-21T08:52:44.6619407Z cvt.s64.s32 %rd120, %r930; 2026-02-21T08:52:44.6619578Z or.b64 %rd121, %rd120, %rd2; 2026-02-21T08:52:44.6619755Z shl.b64 %rd122, %rd121, 1; 2026-02-21T08:52:44.6619931Z add.s64 %rd123, %rd37, %rd122; 2026-02-21T08:52:44.6620113Z add.s64 %rd104, %rd123, 128; 2026-02-21T08:52:44.6620445Z .loc 1 48 80 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:80 2026-02-21T08:52:44.6620804Z bar.sync 0; 2026-02-21T08:52:44.6620956Z // begin inline asm 2026-02-21T08:52:44.6621195Z cp.async.ca.shared.global [ %r1359 + 0 ], [ %rd101 + 0 ], 0x8, %r873; 2026-02-21T08:52:44.6621479Z // end inline asm 2026-02-21T08:52:44.6621629Z // begin inline asm 2026-02-21T08:52:44.6622023Z cp.async.ca.shared.global [ %r1361 + 0 ], [ %rd102 + 0 ], 0x8, %r873; 2026-02-21T08:52:44.6622298Z // end inline asm 2026-02-21T08:52:44.6622446Z // begin inline asm 2026-02-21T08:52:44.6622674Z cp.async.ca.shared.global [ %r1363 + 0 ], [ %rd103 + 0 ], 0x8, %r873; 2026-02-21T08:52:44.6622948Z // end inline asm 2026-02-21T08:52:44.6623099Z // begin inline asm 2026-02-21T08:52:44.6623323Z cp.async.ca.shared.global [ %r1365 + 0 ], [ %rd104 + 0 ], 0x8, %r873; 2026-02-21T08:52:44.6623598Z // end inline asm 2026-02-21T08:52:44.6623751Z cp.async.commit_group; 2026-02-21T08:52:44.6624069Z .loc 1 54 34 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:34 2026-02-21T08:52:44.6624436Z cvt.s64.s32 %rd124, %r118; 2026-02-21T08:52:44.6624755Z add.s64 %rd125, %rd124, %rd3; 2026-02-21T08:52:44.6624950Z add.s64 %rd126, %rd38, %rd125; 2026-02-21T08:52:44.6625136Z add.s64 %rd105, %rd126, 229376; 2026-02-21T08:52:44.6625472Z .loc 1 54 87 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:87 2026-02-21T08:52:44.6625842Z // begin inline asm 2026-02-21T08:52:44.6626090Z cp.async.ca.shared.global [ %r1367 + 0 ], [ %rd105 + 0 ], 0x8, %r873; 
2026-02-21T08:52:44.6626384Z // end inline asm 2026-02-21T08:52:44.6626696Z cp.async.commit_group; 2026-02-21T08:52:44.6627043Z .loc 1 40 103 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:40:103 2026-02-21T08:52:44.6627433Z add.s32 %r2478, %r67, %r926; 2026-02-21T08:52:44.6627629Z or.b32 %r936, %r8, %r117; 2026-02-21T08:52:44.6627823Z shl.b32 %r937, %r936, 13; 2026-02-21T08:52:44.6628016Z mad.wide.s32 %rd246, %r937, 2, %rd1; 2026-02-21T08:52:44.6628222Z or.b32 %r938, %r7, %r117; 2026-02-21T08:52:44.6628399Z shl.b32 %r939, %r938, 13; 2026-02-21T08:52:44.6628644Z mad.wide.s32 %rd245, %r939, 2, %rd1; 2026-02-21T08:52:44.6628855Z shl.b32 %r940, %r921, 19; 2026-02-21T08:52:44.6629036Z or.b32 %r941, %r70, %r940; 2026-02-21T08:52:44.6629230Z mad.wide.s32 %rd244, %r941, 2, %rd1; 2026-02-21T08:52:44.6629443Z or.b32 %r2477, %r71, %r940; 2026-02-21T08:52:44.6629627Z mov.b32 %r2481, 0f00000000; 2026-02-21T08:52:44.6629808Z mov.b32 %r2480, 1; 2026-02-21T08:52:44.6629965Z mov.b32 %r2479, -1; 2026-02-21T08:52:44.6630131Z mov.b64 %rd247, -32; 2026-02-21T08:52:44.6630297Z mov.b32 %r2482, %r2481; 2026-02-21T08:52:44.6630476Z mov.b32 %r2483, %r2481; 2026-02-21T08:52:44.6630645Z mov.b32 %r2484, %r2481; 2026-02-21T08:52:44.6630815Z mov.b32 %r2485, %r2481; 2026-02-21T08:52:44.6630983Z mov.b32 %r2486, %r2481; 2026-02-21T08:52:44.6631144Z mov.b32 %r2487, %r2481; 2026-02-21T08:52:44.6631313Z mov.b32 %r2488, %r2481; 2026-02-21T08:52:44.6631476Z mov.b32 %r2489, %r2481; 2026-02-21T08:52:44.6631646Z mov.b32 %r2490, %r2481; 2026-02-21T08:52:44.6631814Z mov.b32 %r2491, %r2481; 2026-02-21T08:52:44.6631990Z mov.b32 %r2492, %r2481; 2026-02-21T08:52:44.6632167Z mov.b32 %r2493, %r2481; 2026-02-21T08:52:44.6632339Z mov.b32 %r2494, %r2481; 2026-02-21T08:52:44.6632506Z mov.b32 %r2495, %r2481; 2026-02-21T08:52:44.6632677Z mov.b32 %r2496, %r2481; 2026-02-21T08:52:44.6632900Z $L__BB0_6: // Parent Loop BB0_3 Depth=1 2026-02-21T08:52:44.6633205Z // => This Inner Loop Header: Depth=2 
2026-02-21T08:52:44.6633481Z add.s64 %rd247, %rd247, 32; 2026-02-21T08:52:44.6633676Z setp.lt.u64 %p24, %rd247, 4032; 2026-02-21T08:52:44.6633877Z add.s32 %r1278, %r2479, 1; 2026-02-21T08:52:44.6634062Z setp.gt.s32 %p25, %r1278, 1; 2026-02-21T08:52:44.6634259Z selp.b32 %r2479, 0, %r1278, %p25; 2026-02-21T08:52:44.6634611Z .loc 1 48 80 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:80 2026-02-21T08:52:44.6635006Z cp.async.wait_group 2; 2026-02-21T08:52:44.6635191Z bar.sync 0; 2026-02-21T08:52:44.6635343Z shl.b32 %r1279, %r2479, 13; 2026-02-21T08:52:44.6635546Z add.s32 %r1281, %r2442, %r1279; 2026-02-21T08:52:44.6635890Z .loc 1 52 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:52:32 2026-02-21T08:52:44.6636567Z add.s32 %r1282, %r1281, %r59; 2026-02-21T08:52:44.6636773Z ld.shared.b16 %rs113, [%r1282]; 2026-02-21T08:52:44.6636986Z ld.shared.b16 %rs114, [%r1282+1024]; 2026-02-21T08:52:44.6637206Z ld.shared.b16 %rs115, [%r1282+64]; 2026-02-21T08:52:44.6637416Z ld.shared.b16 %rs116, [%r1282+1088]; 2026-02-21T08:52:44.6637624Z add.s32 %r1283, %r1281, %r60; 2026-02-21T08:52:44.6637812Z ld.shared.b16 %rs117, [%r1283]; 2026-02-21T08:52:44.6638017Z ld.shared.b16 %rs118, [%r1283+1024]; 2026-02-21T08:52:44.6638221Z ld.shared.b16 %rs119, [%r1283+64]; 2026-02-21T08:52:44.6638429Z ld.shared.b16 %rs120, [%r1283+1088]; 2026-02-21T08:52:44.6638632Z add.s32 %r1284, %r1281, %r61; 2026-02-21T08:52:44.6638957Z ld.shared.b16 %rs121, [%r1284]; 2026-02-21T08:52:44.6639176Z ld.shared.b16 %rs122, [%r1284+1024]; 2026-02-21T08:52:44.6639387Z ld.shared.b16 %rs123, [%r1284+64]; 2026-02-21T08:52:44.6639599Z ld.shared.b16 %rs124, [%r1284+1088]; 2026-02-21T08:52:44.6639796Z add.s32 %r1285, %r1281, %r62; 2026-02-21T08:52:44.6639994Z ld.shared.b16 %rs125, [%r1285]; 2026-02-21T08:52:44.6640189Z ld.shared.b16 %rs126, [%r1285+1024]; 2026-02-21T08:52:44.6640397Z ld.shared.b16 %rs127, [%r1285+64]; 2026-02-21T08:52:44.6640596Z ld.shared.b16 %rs128, [%r1285+1088]; 
2026-02-21T08:52:44.6640801Z add.s32 %r1286, %r1281, %r63; 2026-02-21T08:52:44.6640982Z ld.shared.b16 %rs129, [%r1286]; 2026-02-21T08:52:44.6641185Z ld.shared.b16 %rs130, [%r1286+1024]; 2026-02-21T08:52:44.6641389Z ld.shared.b16 %rs131, [%r1286+64]; 2026-02-21T08:52:44.6641588Z ld.shared.b16 %rs132, [%r1286+1088]; 2026-02-21T08:52:44.6641788Z add.s32 %r1287, %r1281, %r64; 2026-02-21T08:52:44.6641968Z ld.shared.b16 %rs133, [%r1287]; 2026-02-21T08:52:44.6642169Z ld.shared.b16 %rs134, [%r1287+1024]; 2026-02-21T08:52:44.6642369Z ld.shared.b16 %rs135, [%r1287+64]; 2026-02-21T08:52:44.6642572Z ld.shared.b16 %rs136, [%r1287+1088]; 2026-02-21T08:52:44.6642770Z add.s32 %r1288, %r1281, %r65; 2026-02-21T08:52:44.6642969Z ld.shared.b16 %rs137, [%r1288]; 2026-02-21T08:52:44.6643169Z ld.shared.b16 %rs138, [%r1288+1024]; 2026-02-21T08:52:44.6643368Z ld.shared.b16 %rs139, [%r1288+64]; 2026-02-21T08:52:44.6643575Z ld.shared.b16 %rs140, [%r1288+1088]; 2026-02-21T08:52:44.6643771Z add.s32 %r1289, %r1281, %r66; 2026-02-21T08:52:44.6643960Z ld.shared.b16 %rs141, [%r1289]; 2026-02-21T08:52:44.6644154Z ld.shared.b16 %rs142, [%r1289+1024]; 2026-02-21T08:52:44.6644361Z ld.shared.b16 %rs143, [%r1289+64]; 2026-02-21T08:52:44.6644559Z ld.shared.b16 %rs144, [%r1289+1088]; 2026-02-21T08:52:44.6644765Z cvt.f32.bf16 %r974, %rs113; 2026-02-21T08:52:44.6644949Z cvt.f32.bf16 %r975, %rs114; 2026-02-21T08:52:44.6645136Z cvt.f32.bf16 %r976, %rs117; 2026-02-21T08:52:44.6645323Z cvt.f32.bf16 %r977, %rs118; 2026-02-21T08:52:44.6645502Z cvt.f32.bf16 %r1010, %rs121; 2026-02-21T08:52:44.6645688Z cvt.f32.bf16 %r1011, %rs122; 2026-02-21T08:52:44.6645874Z cvt.f32.bf16 %r1012, %rs125; 2026-02-21T08:52:44.6646056Z cvt.f32.bf16 %r1013, %rs126; 2026-02-21T08:52:44.6646234Z cvt.f32.bf16 %r1046, %rs129; 2026-02-21T08:52:44.6646415Z cvt.f32.bf16 %r1047, %rs130; 2026-02-21T08:52:44.6646719Z cvt.f32.bf16 %r1048, %rs133; 2026-02-21T08:52:44.6646916Z cvt.f32.bf16 %r1049, %rs134; 2026-02-21T08:52:44.6647100Z cvt.f32.bf16 
%r1082, %rs137; 2026-02-21T08:52:44.6647274Z cvt.f32.bf16 %r1083, %rs138; 2026-02-21T08:52:44.6647455Z cvt.f32.bf16 %r1084, %rs141; 2026-02-21T08:52:44.6647631Z cvt.f32.bf16 %r1085, %rs142; 2026-02-21T08:52:44.6647812Z cvt.f32.bf16 %r1118, %rs115; 2026-02-21T08:52:44.6647992Z cvt.f32.bf16 %r1119, %rs116; 2026-02-21T08:52:44.6648176Z cvt.f32.bf16 %r1120, %rs119; 2026-02-21T08:52:44.6648354Z cvt.f32.bf16 %r1121, %rs120; 2026-02-21T08:52:44.6648542Z cvt.f32.bf16 %r1154, %rs123; 2026-02-21T08:52:44.6648722Z cvt.f32.bf16 %r1155, %rs124; 2026-02-21T08:52:44.6648909Z cvt.f32.bf16 %r1156, %rs127; 2026-02-21T08:52:44.6649252Z cvt.f32.bf16 %r1157, %rs128; 2026-02-21T08:52:44.6649433Z cvt.f32.bf16 %r1190, %rs131; 2026-02-21T08:52:44.6649618Z cvt.f32.bf16 %r1191, %rs132; 2026-02-21T08:52:44.6649797Z cvt.f32.bf16 %r1192, %rs135; 2026-02-21T08:52:44.6649985Z cvt.f32.bf16 %r1193, %rs136; 2026-02-21T08:52:44.6650163Z cvt.f32.bf16 %r1226, %rs139; 2026-02-21T08:52:44.6650347Z cvt.f32.bf16 %r1227, %rs140; 2026-02-21T08:52:44.6650524Z cvt.f32.bf16 %r1228, %rs143; 2026-02-21T08:52:44.6650704Z cvt.f32.bf16 %r1229, %rs144; 2026-02-21T08:52:44.6651058Z .loc 1 54 87 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:87 2026-02-21T08:52:44.6651428Z shl.b32 %r1290, %r2479, 11; 2026-02-21T08:52:44.6651892Z .loc 1 67 45 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:67:45 2026-02-21T08:52:44.6652260Z add.s32 %r1291, %r47, %r1290; 2026-02-21T08:52:44.6652452Z ld.shared.b8 %rs145, [%r1291]; 2026-02-21T08:52:44.6652649Z ld.shared.b8 %rs146, [%r1291+128]; 2026-02-21T08:52:44.6652859Z ld.shared.b8 %rs147, [%r1291+256]; 2026-02-21T08:52:44.6653056Z ld.shared.b8 %rs148, [%r1291+384]; 2026-02-21T08:52:44.6653257Z ld.shared.b8 %rs149, [%r1291+512]; 2026-02-21T08:52:44.6653453Z ld.shared.b8 %rs150, [%r1291+640]; 2026-02-21T08:52:44.6653641Z ld.shared.b8 %rs151, [%r1291+768]; 2026-02-21T08:52:44.6653837Z ld.shared.b8 %rs152, [%r1291+896]; 2026-02-21T08:52:44.6654049Z ld.shared.b8 
%rs153, [%r1291+1024]; 2026-02-21T08:52:44.6654259Z ld.shared.b8 %rs154, [%r1291+1152]; 2026-02-21T08:52:44.6654461Z ld.shared.b8 %rs155, [%r1291+1280]; 2026-02-21T08:52:44.6654662Z ld.shared.b8 %rs156, [%r1291+1408]; 2026-02-21T08:52:44.6654871Z ld.shared.b8 %rs157, [%r1291+1536]; 2026-02-21T08:52:44.6655070Z ld.shared.b8 %rs158, [%r1291+1664]; 2026-02-21T08:52:44.6655275Z ld.shared.b8 %rs159, [%r1291+1792]; 2026-02-21T08:52:44.6655473Z ld.shared.b8 %rs160, [%r1291+1920]; 2026-02-21T08:52:44.6655839Z .loc 1 57 28 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:57:28 2026-02-21T08:52:44.6656205Z shl.b16 %rs161, %rs145, 4; 2026-02-21T08:52:44.6656395Z shl.b16 %rs162, %rs146, 4; 2026-02-21T08:52:44.6656681Z shl.b16 %rs163, %rs147, 4; 2026-02-21T08:52:44.6656878Z shl.b16 %rs164, %rs148, 4; 2026-02-21T08:52:44.6657055Z shl.b16 %rs165, %rs149, 4; 2026-02-21T08:52:44.6657234Z shl.b16 %rs166, %rs150, 4; 2026-02-21T08:52:44.6657414Z shl.b16 %rs167, %rs151, 4; 2026-02-21T08:52:44.6657589Z shl.b16 %rs168, %rs152, 4; 2026-02-21T08:52:44.6657771Z shl.b16 %rs169, %rs153, 4; 2026-02-21T08:52:44.6657942Z shl.b16 %rs170, %rs154, 4; 2026-02-21T08:52:44.6658122Z shl.b16 %rs171, %rs155, 4; 2026-02-21T08:52:44.6658298Z shl.b16 %rs172, %rs156, 4; 2026-02-21T08:52:44.6658492Z shl.b16 %rs173, %rs157, 4; 2026-02-21T08:52:44.6658678Z shl.b16 %rs174, %rs158, 4; 2026-02-21T08:52:44.6658862Z shl.b16 %rs175, %rs159, 4; 2026-02-21T08:52:44.6659038Z shl.b16 %rs176, %rs160, 4; 2026-02-21T08:52:44.6659378Z .loc 1 72 58 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:72:58 2026-02-21T08:52:44.6659764Z selp.b16 %rs177, %rs161, %rs145, %p64; 2026-02-21T08:52:44.6659977Z cvt.s16.s8 %rs178, %rs177; 2026-02-21T08:52:44.6660158Z shr.s16 %rs179, %rs178, 4; 2026-02-21T08:52:44.6660346Z selp.b16 %rs180, %rs162, %rs146, %p64; 2026-02-21T08:52:44.6660561Z cvt.s16.s8 %rs181, %rs180; 2026-02-21T08:52:44.6660737Z shr.s16 %rs182, %rs181, 4; 2026-02-21T08:52:44.6660924Z selp.b16 %rs183, 
%rs163, %rs147, %p64; 2026-02-21T08:52:44.6661127Z cvt.s16.s8 %rs184, %rs183; 2026-02-21T08:52:44.6661307Z shr.s16 %rs185, %rs184, 4; 2026-02-21T08:52:44.6661491Z selp.b16 %rs186, %rs164, %rs148, %p64; 2026-02-21T08:52:44.6661690Z cvt.s16.s8 %rs187, %rs186; 2026-02-21T08:52:44.6661870Z shr.s16 %rs188, %rs187, 4; 2026-02-21T08:52:44.6661949Z selp.b16 %rs189, %rs165, %rs149, %p64; 2026-02-21T08:52:44.6662014Z cvt.s16.s8 %rs190, %rs189; 2026-02-21T08:52:44.6662081Z shr.s16 %rs191, %rs190, 4; 2026-02-21T08:52:44.6662308Z selp.b16 %rs192, %rs166, %rs150, %p64; 2026-02-21T08:52:44.6662376Z cvt.s16.s8 %rs193, %rs192; 2026-02-21T08:52:44.6662439Z shr.s16 %rs194, %rs193, 4; 2026-02-21T08:52:44.6662514Z selp.b16 %rs195, %rs167, %rs151, %p64; 2026-02-21T08:52:44.6662590Z cvt.s16.s8 %rs196, %rs195; 2026-02-21T08:52:44.6662653Z shr.s16 %rs197, %rs196, 4; 2026-02-21T08:52:44.6662729Z selp.b16 %rs198, %rs168, %rs152, %p64; 2026-02-21T08:52:44.6662790Z cvt.s16.s8 %rs199, %rs198; 2026-02-21T08:52:44.6662852Z shr.s16 %rs200, %rs199, 4; 2026-02-21T08:52:44.6662923Z selp.b16 %rs201, %rs169, %rs153, %p64; 2026-02-21T08:52:44.6662988Z cvt.s16.s8 %rs202, %rs201; 2026-02-21T08:52:44.6663051Z shr.s16 %rs203, %rs202, 4; 2026-02-21T08:52:44.6663119Z selp.b16 %rs204, %rs170, %rs154, %p64; 2026-02-21T08:52:44.6663305Z cvt.s16.s8 %rs205, %rs204; 2026-02-21T08:52:44.6663372Z shr.s16 %rs206, %rs205, 4; 2026-02-21T08:52:44.6663441Z selp.b16 %rs207, %rs171, %rs155, %p64; 2026-02-21T08:52:44.6663504Z cvt.s16.s8 %rs208, %rs207; 2026-02-21T08:52:44.6663577Z shr.s16 %rs209, %rs208, 4; 2026-02-21T08:52:44.6663649Z selp.b16 %rs210, %rs172, %rs156, %p64; 2026-02-21T08:52:44.6663711Z cvt.s16.s8 %rs211, %rs210; 2026-02-21T08:52:44.6663776Z shr.s16 %rs212, %rs211, 4; 2026-02-21T08:52:44.6663845Z selp.b16 %rs213, %rs173, %rs157, %p64; 2026-02-21T08:52:44.6663909Z cvt.s16.s8 %rs214, %rs213; 2026-02-21T08:52:44.6663975Z shr.s16 %rs215, %rs214, 4; 2026-02-21T08:52:44.6664043Z selp.b16 %rs216, %rs174, %rs158, 
%p64; 2026-02-21T08:52:44.6664104Z cvt.s16.s8 %rs217, %rs216; 2026-02-21T08:52:44.6664166Z shr.s16 %rs218, %rs217, 4; 2026-02-21T08:52:44.6664242Z selp.b16 %rs219, %rs175, %rs159, %p64; 2026-02-21T08:52:44.6664307Z cvt.s16.s8 %rs220, %rs219; 2026-02-21T08:52:44.6664368Z shr.s16 %rs221, %rs220, 4; 2026-02-21T08:52:44.6664450Z selp.b16 %rs222, %rs176, %rs160, %p64; 2026-02-21T08:52:44.6664521Z cvt.s16.s8 %rs223, %rs222; 2026-02-21T08:52:44.6664586Z shr.s16 %rs224, %rs223, 4; 2026-02-21T08:52:44.6664818Z .loc 1 77 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:77:32 2026-02-21T08:52:44.6664892Z cvt.rn.f32.s16 %r1292, %rs179; 2026-02-21T08:52:44.6664960Z cvt.rn.f32.s16 %r1293, %rs182; 2026-02-21T08:52:44.6665027Z cvt.rn.f32.s16 %r1294, %rs185; 2026-02-21T08:52:44.6665094Z cvt.rn.f32.s16 %r1295, %rs188; 2026-02-21T08:52:44.6665157Z cvt.rn.f32.s16 %r1296, %rs191; 2026-02-21T08:52:44.6665221Z cvt.rn.f32.s16 %r1297, %rs194; 2026-02-21T08:52:44.6665283Z cvt.rn.f32.s16 %r1298, %rs197; 2026-02-21T08:52:44.6665350Z cvt.rn.f32.s16 %r1299, %rs200; 2026-02-21T08:52:44.6665413Z cvt.rn.f32.s16 %r1300, %rs203; 2026-02-21T08:52:44.6665475Z cvt.rn.f32.s16 %r1301, %rs206; 2026-02-21T08:52:44.6665543Z cvt.rn.f32.s16 %r1302, %rs209; 2026-02-21T08:52:44.6665608Z cvt.rn.f32.s16 %r1303, %rs212; 2026-02-21T08:52:44.6665672Z cvt.rn.f32.s16 %r1304, %rs215; 2026-02-21T08:52:44.6665738Z cvt.rn.f32.s16 %r1305, %rs218; 2026-02-21T08:52:44.6665802Z cvt.rn.f32.s16 %r1306, %rs221; 2026-02-21T08:52:44.6665869Z cvt.rn.f32.s16 %r1307, %rs224; 2026-02-21T08:52:44.6665937Z st.shared.b32 [%r48], %r1292; 2026-02-21T08:52:44.6666015Z st.shared.b32 [%r48+8192], %r1300; 2026-02-21T08:52:44.6666081Z st.shared.b32 [%r49], %r1293; 2026-02-21T08:52:44.6666147Z st.shared.b32 [%r49+8192], %r1301; 2026-02-21T08:52:44.6666215Z st.shared.b32 [%r50], %r1294; 2026-02-21T08:52:44.6666280Z st.shared.b32 [%r50+8192], %r1302; 2026-02-21T08:52:44.6666346Z st.shared.b32 [%r51], %r1295; 
2026-02-21T08:52:44.6666409Z st.shared.b32 [%r51+8192], %r1303; 2026-02-21T08:52:44.6666599Z st.shared.b32 [%r52], %r1296; 2026-02-21T08:52:44.6666673Z st.shared.b32 [%r52+8192], %r1304; 2026-02-21T08:52:44.6666740Z st.shared.b32 [%r53], %r1297; 2026-02-21T08:52:44.6666814Z st.shared.b32 [%r53+8192], %r1305; 2026-02-21T08:52:44.6666893Z st.shared.b32 [%r54], %r1298; 2026-02-21T08:52:44.6666960Z st.shared.b32 [%r54+8192], %r1306; 2026-02-21T08:52:44.6667026Z st.shared.b32 [%r55], %r1299; 2026-02-21T08:52:44.6667236Z st.shared.b32 [%r55+8192], %r1307; 2026-02-21T08:52:44.6667298Z $L__tmp3: 2026-02-21T08:52:44.6667596Z .loc 2 291 36 // standard.py:291:36 @[ cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:84:40 ] 2026-02-21T08:52:44.6667662Z // begin inline asm 2026-02-21T08:52:44.6667746Z fence.proxy.async.shared::cta; 2026-02-21T08:52:44.6667803Z // end inline asm 2026-02-21T08:52:44.6667864Z bar.sync 0; 2026-02-21T08:52:44.6667952Z shfl.sync.idx.b32 %r1308, %r4, 0, 31, -1; 2026-02-21T08:52:44.6668028Z wgmma.fence.sync.aligned; 2026-02-21T08:52:44.6668093Z shl.b32 %r1309, %r1308, 10; 2026-02-21T08:52:44.6668162Z and.b32 %r1310, %r1309, 4096; 2026-02-21T08:52:44.6668226Z add.s32 %r1311, %r1310, %r364; 2026-02-21T08:52:44.6668431Z bfe.u32 %r1312, %r1311, 4, 14; 2026-02-21T08:52:44.6668589Z cvt.u64.u32 %rd140, %r1312; 2026-02-21T08:52:44.6668678Z or.b64 %rd127, %rd140, 4611686293338849280; 2026-02-21T08:52:44.6668748Z mov.pred %p15, -1; 2026-02-21T08:52:44.6668814Z // begin inline asm 2026-02-21T08:52:44.6669342Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496}, {%r974,%r975,%r976,%r977}, %rd127, %p15, 1, 1; 2026-02-21T08:52:44.6669406Z // end inline asm 2026-02-21T08:52:44.6669469Z add.s32 %r1313, %r1311, 32; 2026-02-21T08:52:44.6669538Z bfe.u32 %r1314, %r1313, 4, 14; 2026-02-21T08:52:44.6669602Z cvt.u64.u32 %rd141, %r1314; 
2026-02-21T08:52:44.6669679Z or.b64 %rd128, %rd141, 4611686293338849280; 2026-02-21T08:52:44.6669747Z // begin inline asm 2026-02-21T08:52:44.6670268Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496}, {%r1010,%r1011,%r1012,%r1013}, %rd128, %p15, 1, 1; 2026-02-21T08:52:44.6670329Z // end inline asm 2026-02-21T08:52:44.6670397Z add.s32 %r1315, %r1311, 64; 2026-02-21T08:52:44.6670463Z bfe.u32 %r1316, %r1315, 4, 14; 2026-02-21T08:52:44.6670527Z cvt.u64.u32 %rd142, %r1316; 2026-02-21T08:52:44.6670602Z or.b64 %rd129, %rd142, 4611686293338849280; 2026-02-21T08:52:44.6670668Z // begin inline asm 2026-02-21T08:52:44.6671181Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496}, {%r1046,%r1047,%r1048,%r1049}, %rd129, %p15, 1, 1; 2026-02-21T08:52:44.6671240Z // end inline asm 2026-02-21T08:52:44.6671306Z add.s32 %r1317, %r1311, 96; 2026-02-21T08:52:44.6671372Z bfe.u32 %r1318, %r1317, 4, 14; 2026-02-21T08:52:44.6671446Z cvt.u64.u32 %rd143, %r1318; 2026-02-21T08:52:44.6671528Z or.b64 %rd130, %rd143, 4611686293338849280; 2026-02-21T08:52:44.6671591Z // begin inline asm 2026-02-21T08:52:44.6672098Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496}, {%r1082,%r1083,%r1084,%r1085}, %rd130, %p15, 1, 1; 2026-02-21T08:52:44.6672161Z // end inline asm 2026-02-21T08:52:44.6672229Z add.s32 %r1319, %r1311, 8192; 2026-02-21T08:52:44.6672292Z bfe.u32 %r1320, %r1319, 4, 14; 2026-02-21T08:52:44.6672356Z cvt.u64.u32 %rd144, %r1320; 2026-02-21T08:52:44.6672434Z or.b64 %rd131, %rd144, 4611686293338849280; 2026-02-21T08:52:44.6672495Z // begin inline asm 2026-02-21T08:52:44.6672999Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496}, {%r1118,%r1119,%r1120,%r1121}, %rd131, %p15, 1, 1; 2026-02-21T08:52:44.6673061Z // end inline asm 2026-02-21T08:52:44.6673123Z add.s32 %r1321, %r1311, 8224; 2026-02-21T08:52:44.6673188Z bfe.u32 %r1322, %r1321, 4, 14; 2026-02-21T08:52:44.6673251Z cvt.u64.u32 %rd145, %r1322; 2026-02-21T08:52:44.6673328Z or.b64 %rd132, %rd145, 4611686293338849280; 2026-02-21T08:52:44.6673387Z // begin inline asm 2026-02-21T08:52:44.6674015Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496}, {%r1154,%r1155,%r1156,%r1157}, %rd132, %p15, 1, 1; 2026-02-21T08:52:44.6674078Z // end inline asm 2026-02-21T08:52:44.6674142Z add.s32 %r1323, %r1311, 8256; 2026-02-21T08:52:44.6674205Z bfe.u32 %r1324, %r1323, 4, 14; 2026-02-21T08:52:44.6674272Z cvt.u64.u32 %rd146, %r1324; 2026-02-21T08:52:44.6674345Z or.b64 %rd133, %rd146, 4611686293338849280; 2026-02-21T08:52:44.6674407Z // begin inline asm 2026-02-21T08:52:44.6675012Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496}, {%r1190,%r1191,%r1192,%r1193}, %rd133, %p15, 1, 1; 2026-02-21T08:52:44.6675075Z // end inline asm 2026-02-21T08:52:44.6675136Z add.s32 %r1325, %r1311, 8288; 2026-02-21T08:52:44.6675196Z bfe.u32 %r1326, %r1325, 4, 14; 2026-02-21T08:52:44.6675268Z cvt.u64.u32 %rd147, %r1326; 2026-02-21T08:52:44.6675350Z or.b64 %rd134, %rd147, 4611686293338849280; 2026-02-21T08:52:44.6675410Z // begin inline asm 2026-02-21T08:52:44.6675921Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496}, {%r1226,%r1227,%r1228,%r1229}, %rd134, %p15, 1, 1; 2026-02-21T08:52:44.6675980Z // end inline asm 
2026-02-21T08:52:44.6676061Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:44.6676123Z mov.b32 %r1247, 0; 2026-02-21T08:52:44.6676185Z mov.b32 %r1248, %r1247; 2026-02-21T08:52:44.6676246Z mov.b32 %r1246, %r364; 2026-02-21T08:52:44.6676307Z // begin inline asm 2026-02-21T08:52:44.6676852Z // wait for regs: %r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r1246,%r1247,%r1248 2026-02-21T08:52:44.6676941Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:44.6677000Z // end inline asm 2026-02-21T08:52:44.6677064Z $L__tmp4: 2026-02-21T08:52:44.6677298Z .loc 1 40 103 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:40:103 2026-02-21T08:52:44.6677365Z add.s32 %r1327, %r2480, 1; 2026-02-21T08:52:44.6677440Z setp.gt.s32 %p26, %r1327, 1; 2026-02-21T08:52:44.6677513Z selp.b32 %r2480, 0, %r1327, %p26; 2026-02-21T08:52:44.6677739Z .loc 1 48 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:32 2026-02-21T08:52:44.6677817Z mad.wide.s32 %rd138, %r2477, 2, %rd37; 2026-02-21T08:52:44.6689561Z .loc 1 48 80 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:80 2026-02-21T08:52:44.6689673Z shl.b32 %r1328, %r2480, 13; 2026-02-21T08:52:44.6689765Z add.s32 %r1329, %r2442, %r1328; 2026-02-21T08:52:44.6689843Z add.s32 %r1268, %r1329, %r19; 2026-02-21T08:52:44.6689916Z selp.b32 %r1269, 8, 0, %p24; 2026-02-21T08:52:44.6689987Z // begin inline asm 2026-02-21T08:52:44.6690157Z cp.async.ca.shared.global [ %r1268 + 0 ], [ %rd244 + 0 ], 0x8, %r1269; 2026-02-21T08:52:44.6690224Z // end inline asm 2026-02-21T08:52:44.6690293Z add.s32 %r1270, %r1268, 2048; 2026-02-21T08:52:44.6690360Z // begin inline asm 2026-02-21T08:52:44.6690514Z cp.async.ca.shared.global [ %r1270 + 0 ], [ %rd245 + 0 ], 0x8, %r1269; 2026-02-21T08:52:44.6690572Z // end inline asm 2026-02-21T08:52:44.6690640Z add.s32 %r1272, %r1268, 4096; 2026-02-21T08:52:44.6690700Z // begin inline asm 2026-02-21T08:52:44.6690847Z 
cp.async.ca.shared.global [ %r1272 + 0 ], [ %rd246 + 0 ], 0x8, %r1269; 2026-02-21T08:52:44.6690909Z // end inline asm 2026-02-21T08:52:44.6690971Z add.s32 %r1274, %r1268, 6144; 2026-02-21T08:52:44.6691030Z // begin inline asm 2026-02-21T08:52:44.6691171Z cp.async.ca.shared.global [ %r1274 + 0 ], [ %rd138 + 0 ], 0x8, %r1269; 2026-02-21T08:52:44.6691233Z // end inline asm 2026-02-21T08:52:44.6691302Z cp.async.commit_group; 2026-02-21T08:52:44.6691535Z .loc 1 54 34 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:34 2026-02-21T08:52:44.6691839Z cvt.s64.s32 %rd148, %r2478; 2026-02-21T08:52:44.6691908Z add.s64 %rd139, %rd38, %rd148; 2026-02-21T08:52:44.6692128Z .loc 1 54 87 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:87 2026-02-21T08:52:44.6692190Z shl.b32 %r1330, %r2480, 11; 2026-02-21T08:52:44.6692260Z add.s32 %r1276, %r1357, %r1330; 2026-02-21T08:52:44.6692321Z // begin inline asm 2026-02-21T08:52:44.6692464Z cp.async.ca.shared.global [ %r1276 + 0 ], [ %rd139 + 0 ], 0x8, %r1269; 2026-02-21T08:52:44.6692524Z // end inline asm 2026-02-21T08:52:44.6692589Z cp.async.commit_group; 2026-02-21T08:52:44.6692808Z .loc 1 40 103 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:40:103 2026-02-21T08:52:44.6693002Z add.s32 %r2478, %r2478, 229376; 2026-02-21T08:52:44.6693071Z add.s64 %rd246, %rd246, 128; 2026-02-21T08:52:44.6693134Z add.s64 %rd245, %rd245, 128; 2026-02-21T08:52:44.6693201Z add.s64 %rd244, %rd244, 128; 2026-02-21T08:52:44.6693265Z add.s32 %r2477, %r2477, 64; 2026-02-21T08:52:44.6693334Z setp.lt.u64 %p27, %rd247, 4064; 2026-02-21T08:52:44.6693395Z @%p27 bra $L__BB0_6; 2026-02-21T08:52:44.6693511Z // %bb.7: // in Loop: Header=BB0_3 Depth=1 2026-02-21T08:52:44.6693736Z .loc 1 31 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:31:32 2026-02-21T08:52:44.6693802Z or.b32 %r1372, %r117, %r10; 2026-02-21T08:52:44.6693865Z or.b32 %r1373, %r117, %r11; 2026-02-21T08:52:44.6694084Z .loc 1 40 103 // 
cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:40:103 2026-02-21T08:52:44.6694152Z cp.async.wait_group 0; 2026-02-21T08:52:44.6694219Z bar.sync 0; 2026-02-21T08:52:44.6694441Z .loc 1 87 28 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:87:28 2026-02-21T08:52:44.6694524Z cvt.rn.bf16x2.f32 %r1374, %r2482, %r2481; 2026-02-21T08:52:44.6694602Z cvt.rn.bf16x2.f32 %r1375, %r2484, %r2483; 2026-02-21T08:52:44.6694676Z cvt.rn.bf16x2.f32 %r1376, %r2486, %r2485; 2026-02-21T08:52:44.6694747Z cvt.rn.bf16x2.f32 %r1377, %r2488, %r2487; 2026-02-21T08:52:44.6694817Z cvt.rn.bf16x2.f32 %r1378, %r2490, %r2489; 2026-02-21T08:52:44.6694892Z cvt.rn.bf16x2.f32 %r1379, %r2492, %r2491; 2026-02-21T08:52:44.6694963Z cvt.rn.bf16x2.f32 %r1380, %r2494, %r2493; 2026-02-21T08:52:44.6695034Z cvt.rn.bf16x2.f32 %r1381, %r2496, %r2495; 2026-02-21T08:52:44.6695254Z .loc 1 88 50 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:88:50 2026-02-21T08:52:44.6695331Z mad.lo.s32 %r1382, %r1372, 7168, %r118; 2026-02-21T08:52:44.6695398Z mad.lo.s32 %r1383, %r1373, 7168, %r118; 2026-02-21T08:52:44.6695618Z .loc 1 88 22 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:88:22 2026-02-21T08:52:44.6695695Z mad.wide.s32 %rd149, %r1382, 2, %rd39; 2026-02-21T08:52:44.6695764Z mad.wide.s32 %rd150, %r1383, 2, %rd39; 2026-02-21T08:52:44.6695984Z .loc 1 88 81 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:88:81 2026-02-21T08:52:44.6696102Z st.shared.v4.b32 [%r56], {%r1374, %r1376, %r1378, %r1380}; 2026-02-21T08:52:44.6696228Z st.shared.v4.b32 [%r56+256], {%r1375, %r1377, %r1379, %r1381}; 2026-02-21T08:52:44.6696287Z bar.sync 0; 2026-02-21T08:52:44.6696349Z // begin inline asm 2026-02-21T08:52:44.6696687Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1341, %r1342, %r1343, %r1344}, [%r858]; 2026-02-21T08:52:44.6696764Z // end inline asm 2026-02-21T08:52:44.6696836Z // begin inline asm 2026-02-21T08:52:44.6697026Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1345, %r1346, %r1347, 
%r1348}, [%r863]; 2026-02-21T08:52:44.6697081Z // end inline asm 2026-02-21T08:52:44.6697142Z // begin inline asm 2026-02-21T08:52:44.6697274Z st.global.v4.b32 [ %rd149 + 0 ], { %r1341, %r1342, %r1343, %r1344 }; 2026-02-21T08:52:44.6697329Z // end inline asm 2026-02-21T08:52:44.6697385Z // begin inline asm 2026-02-21T08:52:44.6697646Z st.global.v4.b32 [ %rd150 + 0 ], { %r1345, %r1346, %r1347, %r1348 }; 2026-02-21T08:52:44.6697707Z // end inline asm 2026-02-21T08:52:44.6697932Z .loc 1 19 144 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:19:144 2026-02-21T08:52:44.6698007Z add.s32 %r1384, %r2456, 8448; 2026-02-21T08:52:44.6698230Z .loc 1 25 35 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:25:35 2026-02-21T08:52:44.6698304Z mul.hi.s32 %r1385, %r1384, -1840700269; 2026-02-21T08:52:44.6698371Z add.s32 %r1386, %r1385, %r1384; 2026-02-21T08:52:44.6698436Z shr.u32 %r1387, %r1386, 31; 2026-02-21T08:52:44.6698498Z shr.s32 %r1388, %r1386, 7; 2026-02-21T08:52:44.6698559Z add.s32 %r1389, %r1388, %r1387; 2026-02-21T08:52:44.6698904Z .loc 1 26 33 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:26:33 2026-02-21T08:52:44.6698976Z shl.b32 %r1390, %r1389, 1; 2026-02-21T08:52:44.6699191Z .loc 1 27 39 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:27:39 2026-02-21T08:52:44.6699258Z sub.s32 %r1391, 1, %r1390; 2026-02-21T08:52:44.6699475Z .loc 1 27 52 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:27:52 2026-02-21T08:52:44.6699539Z min.s32 %r1392, %r1391, 2; 2026-02-21T08:52:44.6699745Z .loc 1 28 45 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:28:45 2026-02-21T08:52:44.6699814Z mul.lo.s32 %r1393, %r1389, 224; 2026-02-21T08:52:44.6699876Z sub.s32 %r1394, %r1384, %r1393; 2026-02-21T08:52:44.6700078Z .loc 1 29 51 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:29:51 2026-02-21T08:52:44.6700141Z div.s32 %r1395, %r1394, %r1392; 2026-02-21T08:52:44.6700349Z .loc 1 28 64 // 
cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:28:64 2026-02-21T08:52:44.6700417Z mul.lo.s32 %r1396, %r1395, %r1392; 2026-02-21T08:52:44.6700485Z sub.s32 %r1397, %r1394, %r1396; 2026-02-21T08:52:44.6700700Z .loc 1 28 30 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:28:30 2026-02-21T08:52:44.6700768Z add.s32 %r1398, %r1397, %r1390; 2026-02-21T08:52:44.6700974Z .loc 1 30 27 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:30:27 2026-02-21T08:52:44.6701042Z shl.b32 %r161, %r1398, 6; 2026-02-21T08:52:44.6701244Z .loc 1 31 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:31:32 2026-02-21T08:52:44.6701307Z or.b32 %r1399, %r161, %r6; 2026-02-21T08:52:44.6701369Z or.b32 %r1400, %r161, %r7; 2026-02-21T08:52:44.6701428Z or.b32 %r1401, %r161, %r8; 2026-02-21T08:52:44.6701490Z or.b32 %r1402, %r161, %r9; 2026-02-21T08:52:44.6701696Z .loc 1 32 27 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:32:27 2026-02-21T08:52:44.6701761Z shl.b32 %r1403, %r1395, 6; 2026-02-21T08:52:44.6701965Z .loc 1 33 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:33:32 2026-02-21T08:52:44.6702033Z or.b32 %r162, %r1403, %r15; 2026-02-21T08:52:44.6702246Z .loc 1 48 53 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:53 2026-02-21T08:52:44.6702314Z shl.b32 %r1404, %r1399, 13; 2026-02-21T08:52:44.6702375Z shl.b32 %r1405, %r1400, 13; 2026-02-21T08:52:44.6702440Z shl.b32 %r1406, %r1401, 13; 2026-02-21T08:52:44.6702499Z shl.b32 %r1407, %r1402, 13; 2026-02-21T08:52:44.6702704Z .loc 1 48 60 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:60 2026-02-21T08:52:44.6702770Z or.b32 %r1408, %r1404, %r13; 2026-02-21T08:52:44.6702836Z or.b32 %r1409, %r1405, %r13; 2026-02-21T08:52:44.6702896Z or.b32 %r1410, %r1406, %r13; 2026-02-21T08:52:44.6702959Z or.b32 %r1411, %r1407, %r13; 2026-02-21T08:52:44.6703169Z .loc 1 48 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:32 2026-02-21T08:52:44.6703350Z mad.wide.s32 
%rd151, %r1408, 2, %rd37; 2026-02-21T08:52:44.6703421Z mad.wide.s32 %rd152, %r1409, 2, %rd37; 2026-02-21T08:52:44.6703495Z mad.wide.s32 %rd153, %r1410, 2, %rd37; 2026-02-21T08:52:44.6703562Z mad.wide.s32 %rd154, %r1411, 2, %rd37; 2026-02-21T08:52:44.6703767Z .loc 1 48 80 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:80 2026-02-21T08:52:44.6703828Z bar.sync 0; 2026-02-21T08:52:44.6703901Z mov.b32 %r1350, 8; 2026-02-21T08:52:44.6703967Z // begin inline asm 2026-02-21T08:52:44.6704122Z cp.async.ca.shared.global [ %r1349 + 0 ], [ %rd151 + 0 ], 0x8, %r1350; 2026-02-21T08:52:44.6704187Z // end inline asm 2026-02-21T08:52:44.6704247Z // begin inline asm 2026-02-21T08:52:44.6704499Z cp.async.ca.shared.global [ %r1351 + 0 ], [ %rd152 + 0 ], 0x8, %r1350; 2026-02-21T08:52:44.6704572Z // end inline asm 2026-02-21T08:52:44.6704639Z // begin inline asm 2026-02-21T08:52:44.6704777Z cp.async.ca.shared.global [ %r1353 + 0 ], [ %rd153 + 0 ], 0x8, %r1350; 2026-02-21T08:52:44.6704839Z // end inline asm 2026-02-21T08:52:44.6704903Z // begin inline asm 2026-02-21T08:52:44.6705037Z cp.async.ca.shared.global [ %r1355 + 0 ], [ %rd154 + 0 ], 0x8, %r1350; 2026-02-21T08:52:44.6705096Z // end inline asm 2026-02-21T08:52:44.6705170Z cp.async.commit_group; 2026-02-21T08:52:44.6705379Z .loc 1 54 62 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:62 2026-02-21T08:52:44.6705444Z add.s32 %r1412, %r162, %r31; 2026-02-21T08:52:44.6705647Z .loc 1 54 34 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:34 2026-02-21T08:52:44.6705717Z cvt.s64.s32 %rd162, %r1412; 2026-02-21T08:52:44.6705787Z add.s64 %rd155, %rd38, %rd162; 2026-02-21T08:52:44.6705995Z .loc 1 54 87 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:87 2026-02-21T08:52:44.6706064Z // begin inline asm 2026-02-21T08:52:44.6706201Z cp.async.ca.shared.global [ %r1357 + 0 ], [ %rd155 + 0 ], 0x8, %r1350; 2026-02-21T08:52:44.6706262Z // end inline asm 2026-02-21T08:52:44.6706336Z 
cp.async.commit_group; 2026-02-21T08:52:44.6706671Z .loc 1 48 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:32 2026-02-21T08:52:44.6706742Z cvt.s64.s32 %rd163, %r1404; 2026-02-21T08:52:44.6706806Z or.b64 %rd164, %rd163, %rd2; 2026-02-21T08:52:44.6706885Z shl.b64 %rd165, %rd164, 1; 2026-02-21T08:52:44.6706954Z add.s64 %rd166, %rd37, %rd165; 2026-02-21T08:52:44.6707017Z add.s64 %rd156, %rd166, 128; 2026-02-21T08:52:44.6707087Z cvt.s64.s32 %rd167, %r1405; 2026-02-21T08:52:44.6707150Z or.b64 %rd168, %rd167, %rd2; 2026-02-21T08:52:44.6707212Z shl.b64 %rd169, %rd168, 1; 2026-02-21T08:52:44.6707277Z add.s64 %rd170, %rd37, %rd169; 2026-02-21T08:52:44.6707350Z add.s64 %rd157, %rd170, 128; 2026-02-21T08:52:44.6707412Z cvt.s64.s32 %rd171, %r1406; 2026-02-21T08:52:44.6707474Z or.b64 %rd172, %rd171, %rd2; 2026-02-21T08:52:44.6707542Z shl.b64 %rd173, %rd172, 1; 2026-02-21T08:52:44.6707610Z add.s64 %rd174, %rd37, %rd173; 2026-02-21T08:52:44.6707673Z add.s64 %rd158, %rd174, 128; 2026-02-21T08:52:44.6707739Z cvt.s64.s32 %rd175, %r1407; 2026-02-21T08:52:44.6707802Z or.b64 %rd176, %rd175, %rd2; 2026-02-21T08:52:44.6707864Z shl.b64 %rd177, %rd176, 1; 2026-02-21T08:52:44.6707925Z add.s64 %rd178, %rd37, %rd177; 2026-02-21T08:52:44.6707993Z add.s64 %rd159, %rd178, 128; 2026-02-21T08:52:44.6708200Z .loc 1 48 80 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:80 2026-02-21T08:52:44.6708261Z bar.sync 0; 2026-02-21T08:52:44.6708327Z // begin inline asm 2026-02-21T08:52:44.6708464Z cp.async.ca.shared.global [ %r1359 + 0 ], [ %rd156 + 0 ], 0x8, %r1350; 2026-02-21T08:52:44.6708606Z // end inline asm 2026-02-21T08:52:44.6708672Z // begin inline asm 2026-02-21T08:52:44.6708818Z cp.async.ca.shared.global [ %r1361 + 0 ], [ %rd157 + 0 ], 0x8, %r1350; 2026-02-21T08:52:44.6708877Z // end inline asm 2026-02-21T08:52:44.6709094Z // begin inline asm 2026-02-21T08:52:44.6709247Z cp.async.ca.shared.global [ %r1363 + 0 ], [ %rd158 + 0 ], 0x8, %r1350; 
2026-02-21T08:52:44.6709305Z // end inline asm 2026-02-21T08:52:44.6709366Z // begin inline asm 2026-02-21T08:52:44.6709520Z cp.async.ca.shared.global [ %r1365 + 0 ], [ %rd159 + 0 ], 0x8, %r1350; 2026-02-21T08:52:44.6709581Z // end inline asm 2026-02-21T08:52:44.6709649Z cp.async.commit_group; 2026-02-21T08:52:44.6709864Z .loc 1 54 34 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:34 2026-02-21T08:52:44.6709935Z cvt.s64.s32 %rd179, %r162; 2026-02-21T08:52:44.6710001Z add.s64 %rd180, %rd179, %rd3; 2026-02-21T08:52:44.6710066Z add.s64 %rd181, %rd38, %rd180; 2026-02-21T08:52:44.6710136Z add.s64 %rd160, %rd181, 229376; 2026-02-21T08:52:44.6710474Z .loc 1 54 87 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:87 2026-02-21T08:52:44.6710538Z // begin inline asm 2026-02-21T08:52:44.6710678Z cp.async.ca.shared.global [ %r1367 + 0 ], [ %rd160 + 0 ], 0x8, %r1350; 2026-02-21T08:52:44.6710751Z // end inline asm 2026-02-21T08:52:44.6710819Z cp.async.commit_group; 2026-02-21T08:52:44.6711038Z .loc 1 40 103 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:40:103 2026-02-21T08:52:44.6711112Z add.s32 %r2498, %r67, %r1403; 2026-02-21T08:52:44.6711188Z or.b32 %r1413, %r8, %r161; 2026-02-21T08:52:44.6711253Z shl.b32 %r1414, %r1413, 13; 2026-02-21T08:52:44.6711335Z mad.wide.s32 %rd250, %r1414, 2, %rd1; 2026-02-21T08:52:44.6711397Z or.b32 %r1415, %r7, %r161; 2026-02-21T08:52:44.6711460Z shl.b32 %r1416, %r1415, 13; 2026-02-21T08:52:44.6711529Z mad.wide.s32 %rd249, %r1416, 2, %rd1; 2026-02-21T08:52:44.6711598Z shl.b32 %r1417, %r1398, 19; 2026-02-21T08:52:44.6711662Z or.b32 %r1418, %r70, %r1417; 2026-02-21T08:52:44.6711732Z mad.wide.s32 %rd248, %r1418, 2, %rd1; 2026-02-21T08:52:44.6711802Z or.b32 %r2497, %r71, %r1417; 2026-02-21T08:52:44.6711863Z mov.b32 %r2501, 0f00000000; 2026-02-21T08:52:44.6711928Z mov.b32 %r2500, 1; 2026-02-21T08:52:44.6711992Z mov.b32 %r2499, -1; 2026-02-21T08:52:44.6712063Z mov.b64 %rd251, -32; 2026-02-21T08:52:44.6712125Z 
mov.b32 %r2502, %r2501; 2026-02-21T08:52:44.6712185Z mov.b32 %r2503, %r2501; 2026-02-21T08:52:44.6712251Z mov.b32 %r2504, %r2501; 2026-02-21T08:52:44.6712311Z mov.b32 %r2505, %r2501; 2026-02-21T08:52:44.6712372Z mov.b32 %r2506, %r2501; 2026-02-21T08:52:44.6712430Z mov.b32 %r2507, %r2501; 2026-02-21T08:52:44.6712496Z mov.b32 %r2508, %r2501; 2026-02-21T08:52:44.6712557Z mov.b32 %r2509, %r2501; 2026-02-21T08:52:44.6712616Z mov.b32 %r2510, %r2501; 2026-02-21T08:52:44.6712682Z mov.b32 %r2511, %r2501; 2026-02-21T08:52:44.6712741Z mov.b32 %r2512, %r2501; 2026-02-21T08:52:44.6712804Z mov.b32 %r2513, %r2501; 2026-02-21T08:52:44.6712872Z mov.b32 %r2514, %r2501; 2026-02-21T08:52:44.6712931Z mov.b32 %r2515, %r2501; 2026-02-21T08:52:44.6712990Z mov.b32 %r2516, %r2501; 2026-02-21T08:52:44.6713108Z $L__BB0_8: // Parent Loop BB0_3 Depth=1 2026-02-21T08:52:44.6713229Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:44.6713292Z add.s64 %rd251, %rd251, 32; 2026-02-21T08:52:44.6713364Z setp.lt.u64 %p37, %rd251, 4032; 2026-02-21T08:52:44.6713432Z add.s32 %r1755, %r2499, 1; 2026-02-21T08:52:44.6713499Z setp.gt.s32 %p38, %r1755, 1; 2026-02-21T08:52:44.6713569Z selp.b32 %r2499, 0, %r1755, %p38; 2026-02-21T08:52:44.6713784Z .loc 1 48 80 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:80 2026-02-21T08:52:44.6713862Z cp.async.wait_group 2; 2026-02-21T08:52:44.6713919Z bar.sync 0; 2026-02-21T08:52:44.6713980Z shl.b32 %r1756, %r2499, 13; 2026-02-21T08:52:44.6714051Z add.s32 %r1758, %r2442, %r1756; 2026-02-21T08:52:44.6714261Z .loc 1 52 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:52:32 2026-02-21T08:52:44.6714323Z add.s32 %r1759, %r1758, %r59; 2026-02-21T08:52:44.6714517Z ld.shared.b16 %rs225, [%r1759]; 2026-02-21T08:52:44.6714590Z ld.shared.b16 %rs226, [%r1759+1024]; 2026-02-21T08:52:44.6714666Z ld.shared.b16 %rs227, [%r1759+64]; 2026-02-21T08:52:44.6714737Z ld.shared.b16 %rs228, [%r1759+1088]; 2026-02-21T08:52:44.6714806Z add.s32 %r1760, %r1758, %r60; 
2026-02-21T08:52:44.6714872Z ld.shared.b16 %rs229, [%r1760]; 2026-02-21T08:52:44.6714939Z ld.shared.b16 %rs230, [%r1760+1024]; 2026-02-21T08:52:44.6715013Z ld.shared.b16 %rs231, [%r1760+64]; 2026-02-21T08:52:44.6715082Z ld.shared.b16 %rs232, [%r1760+1088]; 2026-02-21T08:52:44.6715143Z add.s32 %r1761, %r1758, %r61; 2026-02-21T08:52:44.6715209Z ld.shared.b16 %rs233, [%r1761]; 2026-02-21T08:52:44.6715285Z ld.shared.b16 %rs234, [%r1761+1024]; 2026-02-21T08:52:44.6715447Z ld.shared.b16 %rs235, [%r1761+64]; 2026-02-21T08:52:44.6715519Z ld.shared.b16 %rs236, [%r1761+1088]; 2026-02-21T08:52:44.6715586Z add.s32 %r1762, %r1758, %r62; 2026-02-21T08:52:44.6715652Z ld.shared.b16 %rs237, [%r1762]; 2026-02-21T08:52:44.6715722Z ld.shared.b16 %rs238, [%r1762+1024]; 2026-02-21T08:52:44.6715795Z ld.shared.b16 %rs239, [%r1762+64]; 2026-02-21T08:52:44.6715861Z ld.shared.b16 %rs240, [%r1762+1088]; 2026-02-21T08:52:44.6715921Z add.s32 %r1763, %r1758, %r63; 2026-02-21T08:52:44.6715985Z ld.shared.b16 %rs241, [%r1763]; 2026-02-21T08:52:44.6716058Z ld.shared.b16 %rs242, [%r1763+1024]; 2026-02-21T08:52:44.6716122Z ld.shared.b16 %rs243, [%r1763+64]; 2026-02-21T08:52:44.6716190Z ld.shared.b16 %rs244, [%r1763+1088]; 2026-02-21T08:52:44.6716257Z add.s32 %r1764, %r1758, %r64; 2026-02-21T08:52:44.6716322Z ld.shared.b16 %rs245, [%r1764]; 2026-02-21T08:52:44.6716388Z ld.shared.b16 %rs246, [%r1764+1024]; 2026-02-21T08:52:44.6716579Z ld.shared.b16 %rs247, [%r1764+64]; 2026-02-21T08:52:44.6716663Z ld.shared.b16 %rs248, [%r1764+1088]; 2026-02-21T08:52:44.6716725Z add.s32 %r1765, %r1758, %r65; 2026-02-21T08:52:44.6716791Z ld.shared.b16 %rs249, [%r1765]; 2026-02-21T08:52:44.6716867Z ld.shared.b16 %rs250, [%r1765+1024]; 2026-02-21T08:52:44.6716938Z ld.shared.b16 %rs251, [%r1765+64]; 2026-02-21T08:52:44.6717009Z ld.shared.b16 %rs252, [%r1765+1088]; 2026-02-21T08:52:44.6717085Z add.s32 %r1766, %r1758, %r66; 2026-02-21T08:52:44.6717163Z ld.shared.b16 %rs253, [%r1766]; 2026-02-21T08:52:44.6717234Z ld.shared.b16 
%rs254, [%r1766+1024]; 2026-02-21T08:52:44.6717313Z ld.shared.b16 %rs255, [%r1766+64]; 2026-02-21T08:52:44.6717391Z ld.shared.b16 %rs256, [%r1766+1088]; 2026-02-21T08:52:44.6717461Z cvt.f32.bf16 %r1451, %rs225; 2026-02-21T08:52:44.6717526Z cvt.f32.bf16 %r1452, %rs226; 2026-02-21T08:52:44.6717593Z cvt.f32.bf16 %r1453, %rs229; 2026-02-21T08:52:44.6717656Z cvt.f32.bf16 %r1454, %rs230; 2026-02-21T08:52:44.6717718Z cvt.f32.bf16 %r1487, %rs233; 2026-02-21T08:52:44.6717785Z cvt.f32.bf16 %r1488, %rs234; 2026-02-21T08:52:44.6717852Z cvt.f32.bf16 %r1489, %rs237; 2026-02-21T08:52:44.6717916Z cvt.f32.bf16 %r1490, %rs238; 2026-02-21T08:52:44.6717979Z cvt.f32.bf16 %r1523, %rs241; 2026-02-21T08:52:44.6718051Z cvt.f32.bf16 %r1524, %rs242; 2026-02-21T08:52:44.6718113Z cvt.f32.bf16 %r1525, %rs245; 2026-02-21T08:52:44.6718176Z cvt.f32.bf16 %r1526, %rs246; 2026-02-21T08:52:44.6718240Z cvt.f32.bf16 %r1559, %rs249; 2026-02-21T08:52:44.6718312Z cvt.f32.bf16 %r1560, %rs250; 2026-02-21T08:52:44.6718377Z cvt.f32.bf16 %r1561, %rs253; 2026-02-21T08:52:44.6718439Z cvt.f32.bf16 %r1562, %rs254; 2026-02-21T08:52:44.6718501Z cvt.f32.bf16 %r1595, %rs227; 2026-02-21T08:52:44.6718572Z cvt.f32.bf16 %r1596, %rs228; 2026-02-21T08:52:44.6718634Z cvt.f32.bf16 %r1597, %rs231; 2026-02-21T08:52:44.6718696Z cvt.f32.bf16 %r1598, %rs232; 2026-02-21T08:52:44.6718758Z cvt.f32.bf16 %r1631, %rs235; 2026-02-21T08:52:44.6718825Z cvt.f32.bf16 %r1632, %rs236; 2026-02-21T08:52:44.6718888Z cvt.f32.bf16 %r1633, %rs239; 2026-02-21T08:52:44.6718950Z cvt.f32.bf16 %r1634, %rs240; 2026-02-21T08:52:44.6719019Z cvt.f32.bf16 %r1667, %rs243; 2026-02-21T08:52:44.6719080Z cvt.f32.bf16 %r1668, %rs244; 2026-02-21T08:52:44.6719280Z cvt.f32.bf16 %r1669, %rs247; 2026-02-21T08:52:44.6719347Z cvt.f32.bf16 %r1670, %rs248; 2026-02-21T08:52:44.6719414Z cvt.f32.bf16 %r1703, %rs251; 2026-02-21T08:52:44.6719474Z cvt.f32.bf16 %r1704, %rs252; 2026-02-21T08:52:44.6719533Z cvt.f32.bf16 %r1705, %rs255; 2026-02-21T08:52:44.6719601Z cvt.f32.bf16 
%r1706, %rs256; 2026-02-21T08:52:44.6719822Z .loc 1 54 87 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:87 2026-02-21T08:52:44.6719887Z shl.b32 %r1767, %r2499, 11; 2026-02-21T08:52:44.6720113Z .loc 1 67 45 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:67:45 2026-02-21T08:52:44.6720180Z add.s32 %r1768, %r47, %r1767; 2026-02-21T08:52:44.6720253Z ld.shared.b8 %rs257, [%r1768]; 2026-02-21T08:52:44.6720442Z ld.shared.b8 %rs258, [%r1768+128]; 2026-02-21T08:52:44.6720519Z ld.shared.b8 %rs259, [%r1768+256]; 2026-02-21T08:52:44.6720587Z ld.shared.b8 %rs260, [%r1768+384]; 2026-02-21T08:52:44.6720659Z ld.shared.b8 %rs261, [%r1768+512]; 2026-02-21T08:52:44.6720730Z ld.shared.b8 %rs262, [%r1768+640]; 2026-02-21T08:52:44.6720793Z ld.shared.b8 %rs263, [%r1768+768]; 2026-02-21T08:52:44.6720858Z ld.shared.b8 %rs264, [%r1768+896]; 2026-02-21T08:52:44.6720928Z ld.shared.b8 %rs265, [%r1768+1024]; 2026-02-21T08:52:44.6721000Z ld.shared.b8 %rs266, [%r1768+1152]; 2026-02-21T08:52:44.6721067Z ld.shared.b8 %rs267, [%r1768+1280]; 2026-02-21T08:52:44.6721136Z ld.shared.b8 %rs268, [%r1768+1408]; 2026-02-21T08:52:44.6721209Z ld.shared.b8 %rs269, [%r1768+1536]; 2026-02-21T08:52:44.6721275Z ld.shared.b8 %rs270, [%r1768+1664]; 2026-02-21T08:52:44.6721340Z ld.shared.b8 %rs271, [%r1768+1792]; 2026-02-21T08:52:44.6721410Z ld.shared.b8 %rs272, [%r1768+1920]; 2026-02-21T08:52:44.6721623Z .loc 1 57 28 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:57:28 2026-02-21T08:52:44.6721689Z shl.b16 %rs273, %rs257, 4; 2026-02-21T08:52:44.6721751Z shl.b16 %rs274, %rs258, 4; 2026-02-21T08:52:44.6721821Z shl.b16 %rs275, %rs259, 4; 2026-02-21T08:52:44.6721888Z shl.b16 %rs276, %rs260, 4; 2026-02-21T08:52:44.6721951Z shl.b16 %rs277, %rs261, 4; 2026-02-21T08:52:44.6722029Z shl.b16 %rs278, %rs262, 4; 2026-02-21T08:52:44.6722095Z shl.b16 %rs279, %rs263, 4; 2026-02-21T08:52:44.6722155Z shl.b16 %rs280, %rs264, 4; 2026-02-21T08:52:44.6722215Z shl.b16 %rs281, %rs265, 4; 
2026-02-21T08:52:44.6722282Z shl.b16 %rs282, %rs266, 4; 2026-02-21T08:52:44.6722343Z shl.b16 %rs283, %rs267, 4; 2026-02-21T08:52:44.6722403Z shl.b16 %rs284, %rs268, 4; 2026-02-21T08:52:44.6722467Z shl.b16 %rs285, %rs269, 4; 2026-02-21T08:52:44.6722528Z shl.b16 %rs286, %rs270, 4; 2026-02-21T08:52:44.6722590Z shl.b16 %rs287, %rs271, 4; 2026-02-21T08:52:44.6722654Z shl.b16 %rs288, %rs272, 4; 2026-02-21T08:52:44.6722864Z .loc 1 72 58 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:72:58 2026-02-21T08:52:44.6722941Z selp.b16 %rs289, %rs273, %rs257, %p64; 2026-02-21T08:52:44.6723007Z cvt.s16.s8 %rs290, %rs289; 2026-02-21T08:52:44.6723075Z shr.s16 %rs291, %rs290, 4; 2026-02-21T08:52:44.6723146Z selp.b16 %rs292, %rs274, %rs258, %p64; 2026-02-21T08:52:44.6723209Z cvt.s16.s8 %rs293, %rs292; 2026-02-21T08:52:44.6723274Z shr.s16 %rs294, %rs293, 4; 2026-02-21T08:52:44.6723343Z selp.b16 %rs295, %rs275, %rs259, %p64; 2026-02-21T08:52:44.6723408Z cvt.s16.s8 %rs296, %rs295; 2026-02-21T08:52:44.6723469Z shr.s16 %rs297, %rs296, 4; 2026-02-21T08:52:44.6723542Z selp.b16 %rs298, %rs276, %rs260, %p64; 2026-02-21T08:52:44.6723604Z cvt.s16.s8 %rs299, %rs298; 2026-02-21T08:52:44.6723665Z shr.s16 %rs300, %rs299, 4; 2026-02-21T08:52:44.6723740Z selp.b16 %rs301, %rs277, %rs261, %p64; 2026-02-21T08:52:44.6723801Z cvt.s16.s8 %rs302, %rs301; 2026-02-21T08:52:44.6723864Z shr.s16 %rs303, %rs302, 4; 2026-02-21T08:52:44.6723934Z selp.b16 %rs304, %rs278, %rs262, %p64; 2026-02-21T08:52:44.6724001Z cvt.s16.s8 %rs305, %rs304; 2026-02-21T08:52:44.6724208Z shr.s16 %rs306, %rs305, 4; 2026-02-21T08:52:44.6724278Z selp.b16 %rs307, %rs279, %rs263, %p64; 2026-02-21T08:52:44.6724348Z cvt.s16.s8 %rs308, %rs307; 2026-02-21T08:52:44.6724409Z shr.s16 %rs309, %rs308, 4; 2026-02-21T08:52:44.6724487Z selp.b16 %rs310, %rs280, %rs264, %p64; 2026-02-21T08:52:44.6724557Z cvt.s16.s8 %rs311, %rs310; 2026-02-21T08:52:44.6724619Z shr.s16 %rs312, %rs311, 4; 2026-02-21T08:52:44.6724687Z selp.b16 %rs313, %rs281, 
%rs265, %p64; 2026-02-21T08:52:44.6724749Z cvt.s16.s8 %rs314, %rs313; 2026-02-21T08:52:44.6724816Z shr.s16 %rs315, %rs314, 4; 2026-02-21T08:52:44.6724886Z selp.b16 %rs316, %rs282, %rs266, %p64; 2026-02-21T08:52:44.6724947Z cvt.s16.s8 %rs317, %rs316; 2026-02-21T08:52:44.6725014Z shr.s16 %rs318, %rs317, 4; 2026-02-21T08:52:44.6725185Z selp.b16 %rs319, %rs283, %rs267, %p64; 2026-02-21T08:52:44.6725253Z cvt.s16.s8 %rs320, %rs319; 2026-02-21T08:52:44.6725316Z shr.s16 %rs321, %rs320, 4; 2026-02-21T08:52:44.6725392Z selp.b16 %rs322, %rs284, %rs268, %p64; 2026-02-21T08:52:44.6725459Z cvt.s16.s8 %rs323, %rs322; 2026-02-21T08:52:44.6725521Z shr.s16 %rs324, %rs323, 4; 2026-02-21T08:52:44.6725596Z selp.b16 %rs325, %rs285, %rs269, %p64; 2026-02-21T08:52:44.6725661Z cvt.s16.s8 %rs326, %rs325; 2026-02-21T08:52:44.6725723Z shr.s16 %rs327, %rs326, 4; 2026-02-21T08:52:44.6725791Z selp.b16 %rs328, %rs286, %rs270, %p64; 2026-02-21T08:52:44.6725858Z cvt.s16.s8 %rs329, %rs328; 2026-02-21T08:52:44.6725919Z shr.s16 %rs330, %rs329, 4; 2026-02-21T08:52:44.6725990Z selp.b16 %rs331, %rs287, %rs271, %p64; 2026-02-21T08:52:44.6726062Z cvt.s16.s8 %rs332, %rs331; 2026-02-21T08:52:44.6726124Z shr.s16 %rs333, %rs332, 4; 2026-02-21T08:52:44.6726192Z selp.b16 %rs334, %rs288, %rs272, %p64; 2026-02-21T08:52:44.6726253Z cvt.s16.s8 %rs335, %rs334; 2026-02-21T08:52:44.6726325Z shr.s16 %rs336, %rs335, 4; 2026-02-21T08:52:44.6726665Z .loc 1 77 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:77:32 2026-02-21T08:52:44.6726742Z cvt.rn.f32.s16 %r1769, %rs291; 2026-02-21T08:52:44.6726815Z cvt.rn.f32.s16 %r1770, %rs294; 2026-02-21T08:52:44.6726889Z cvt.rn.f32.s16 %r1771, %rs297; 2026-02-21T08:52:44.6726955Z cvt.rn.f32.s16 %r1772, %rs300; 2026-02-21T08:52:44.6727025Z cvt.rn.f32.s16 %r1773, %rs303; 2026-02-21T08:52:44.6727086Z cvt.rn.f32.s16 %r1774, %rs306; 2026-02-21T08:52:44.6727151Z cvt.rn.f32.s16 %r1775, %rs309; 2026-02-21T08:52:44.6727213Z cvt.rn.f32.s16 %r1776, %rs312; 
2026-02-21T08:52:44.6727281Z cvt.rn.f32.s16 %r1777, %rs315; 2026-02-21T08:52:44.6727342Z cvt.rn.f32.s16 %r1778, %rs318; 2026-02-21T08:52:44.6727405Z cvt.rn.f32.s16 %r1779, %rs321; 2026-02-21T08:52:44.6727472Z cvt.rn.f32.s16 %r1780, %rs324; 2026-02-21T08:52:44.6727534Z cvt.rn.f32.s16 %r1781, %rs327; 2026-02-21T08:52:44.6727600Z cvt.rn.f32.s16 %r1782, %rs330; 2026-02-21T08:52:44.6727663Z cvt.rn.f32.s16 %r1783, %rs333; 2026-02-21T08:52:44.6727729Z cvt.rn.f32.s16 %r1784, %rs336; 2026-02-21T08:52:44.6727793Z st.shared.b32 [%r48], %r1769; 2026-02-21T08:52:44.6727864Z st.shared.b32 [%r48+8192], %r1777; 2026-02-21T08:52:44.6727937Z st.shared.b32 [%r49], %r1770; 2026-02-21T08:52:44.6728002Z st.shared.b32 [%r49+8192], %r1778; 2026-02-21T08:52:44.6728064Z st.shared.b32 [%r50], %r1771; 2026-02-21T08:52:44.6728132Z st.shared.b32 [%r50+8192], %r1779; 2026-02-21T08:52:44.6728196Z st.shared.b32 [%r51], %r1772; 2026-02-21T08:52:44.6728261Z st.shared.b32 [%r51+8192], %r1780; 2026-02-21T08:52:44.6728336Z st.shared.b32 [%r52], %r1773; 2026-02-21T08:52:44.6728407Z st.shared.b32 [%r52+8192], %r1781; 2026-02-21T08:52:44.6728470Z st.shared.b32 [%r53], %r1774; 2026-02-21T08:52:44.6728534Z st.shared.b32 [%r53+8192], %r1782; 2026-02-21T08:52:44.6728602Z st.shared.b32 [%r54], %r1775; 2026-02-21T08:52:44.6728667Z st.shared.b32 [%r54+8192], %r1783; 2026-02-21T08:52:44.6728734Z st.shared.b32 [%r55], %r1776; 2026-02-21T08:52:44.6728799Z st.shared.b32 [%r55+8192], %r1784; 2026-02-21T08:52:44.6728860Z $L__tmp5: 2026-02-21T08:52:44.6729292Z .loc 2 291 36 // standard.py:291:36 @[ cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:84:40 ] 2026-02-21T08:52:44.6729357Z // begin inline asm 2026-02-21T08:52:44.6729450Z fence.proxy.async.shared::cta; 2026-02-21T08:52:44.6729508Z // end inline asm 2026-02-21T08:52:44.6729566Z bar.sync 0; 2026-02-21T08:52:44.6729648Z shfl.sync.idx.b32 %r1785, %r4, 0, 31, -1; 2026-02-21T08:52:44.6729727Z wgmma.fence.sync.aligned; 2026-02-21T08:52:44.6729791Z shl.b32 
%r1786, %r1785, 10; 2026-02-21T08:52:44.6729853Z and.b32 %r1787, %r1786, 4096; 2026-02-21T08:52:44.6729922Z add.s32 %r1788, %r1787, %r364; 2026-02-21T08:52:44.6729984Z bfe.u32 %r1789, %r1788, 4, 14; 2026-02-21T08:52:44.6730048Z cvt.u64.u32 %rd195, %r1789; 2026-02-21T08:52:44.6730257Z or.b64 %rd182, %rd195, 4611686293338849280; 2026-02-21T08:52:44.6730329Z mov.pred %p28, -1; 2026-02-21T08:52:44.6730391Z // begin inline asm 2026-02-21T08:52:44.6730915Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r2512,%r2513,%r2514,%r2515,%r2516}, {%r1451,%r1452,%r1453,%r1454}, %rd182, %p28, 1, 1; 2026-02-21T08:52:44.6730984Z // end inline asm 2026-02-21T08:52:44.6731046Z add.s32 %r1790, %r1788, 32; 2026-02-21T08:52:44.6731121Z bfe.u32 %r1791, %r1790, 4, 14; 2026-02-21T08:52:44.6731193Z cvt.u64.u32 %rd196, %r1791; 2026-02-21T08:52:44.6731271Z or.b64 %rd183, %rd196, 4611686293338849280; 2026-02-21T08:52:44.6731332Z // begin inline asm 2026-02-21T08:52:44.6731845Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r2512,%r2513,%r2514,%r2515,%r2516}, {%r1487,%r1488,%r1489,%r1490}, %rd183, %p28, 1, 1; 2026-02-21T08:52:44.6731904Z // end inline asm 2026-02-21T08:52:44.6731979Z add.s32 %r1792, %r1788, 64; 2026-02-21T08:52:44.6732044Z bfe.u32 %r1793, %r1792, 4, 14; 2026-02-21T08:52:44.6732112Z cvt.u64.u32 %rd197, %r1793; 2026-02-21T08:52:44.6732186Z or.b64 %rd184, %rd197, 4611686293338849280; 2026-02-21T08:52:44.6732250Z // begin inline asm 2026-02-21T08:52:44.6732760Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r2512,%r2513,%r2514,%r2515,%r2516}, {%r1523,%r1524,%r1525,%r1526}, %rd184, %p28, 1, 1; 2026-02-21T08:52:44.6732819Z // end inline asm 2026-02-21T08:52:44.6732880Z add.s32 %r1794, %r1788, 96; 2026-02-21T08:52:44.6732947Z bfe.u32 
%r1795, %r1794, 4, 14; 2026-02-21T08:52:44.6733009Z cvt.u64.u32 %rd198, %r1795; 2026-02-21T08:52:44.6733082Z or.b64 %rd185, %rd198, 4611686293338849280; 2026-02-21T08:52:44.6733143Z // begin inline asm 2026-02-21T08:52:44.6733654Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r2512,%r2513,%r2514,%r2515,%r2516}, {%r1559,%r1560,%r1561,%r1562}, %rd185, %p28, 1, 1; 2026-02-21T08:52:44.6733713Z // end inline asm 2026-02-21T08:52:44.6733776Z add.s32 %r1796, %r1788, 8192; 2026-02-21T08:52:44.6733848Z bfe.u32 %r1797, %r1796, 4, 14; 2026-02-21T08:52:44.6733911Z cvt.u64.u32 %rd199, %r1797; 2026-02-21T08:52:44.6733986Z or.b64 %rd186, %rd199, 4611686293338849280; 2026-02-21T08:52:44.6734051Z // begin inline asm 2026-02-21T08:52:44.6734552Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r2512,%r2513,%r2514,%r2515,%r2516}, {%r1595,%r1596,%r1597,%r1598}, %rd186, %p28, 1, 1; 2026-02-21T08:52:44.6734613Z // end inline asm 2026-02-21T08:52:44.6734679Z add.s32 %r1798, %r1788, 8224; 2026-02-21T08:52:44.6734740Z bfe.u32 %r1799, %r1798, 4, 14; 2026-02-21T08:52:44.6734802Z cvt.u64.u32 %rd200, %r1799; 2026-02-21T08:52:44.6734873Z or.b64 %rd187, %rd200, 4611686293338849280; 2026-02-21T08:52:44.6734942Z // begin inline asm 2026-02-21T08:52:44.6735443Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r2512,%r2513,%r2514,%r2515,%r2516}, {%r1631,%r1632,%r1633,%r1634}, %rd187, %p28, 1, 1; 2026-02-21T08:52:44.6735606Z // end inline asm 2026-02-21T08:52:44.6735675Z add.s32 %r1800, %r1788, 8256; 2026-02-21T08:52:44.6735737Z bfe.u32 %r1801, %r1800, 4, 14; 2026-02-21T08:52:44.6735814Z cvt.u64.u32 %rd201, %r1801; 2026-02-21T08:52:44.6735894Z or.b64 %rd188, %rd201, 4611686293338849280; 2026-02-21T08:52:44.6735955Z // begin inline asm 
2026-02-21T08:52:44.6736565Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r2512,%r2513,%r2514,%r2515,%r2516}, {%r1667,%r1668,%r1669,%r1670}, %rd188, %p28, 1, 1; 2026-02-21T08:52:44.6736635Z // end inline asm 2026-02-21T08:52:44.6736698Z add.s32 %r1802, %r1788, 8288; 2026-02-21T08:52:44.6736900Z bfe.u32 %r1803, %r1802, 4, 14; 2026-02-21T08:52:44.6736969Z cvt.u64.u32 %rd202, %r1803; 2026-02-21T08:52:44.6737051Z or.b64 %rd189, %rd202, 4611686293338849280; 2026-02-21T08:52:44.6737117Z // begin inline asm 2026-02-21T08:52:44.6737621Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r2512,%r2513,%r2514,%r2515,%r2516}, {%r1703,%r1704,%r1705,%r1706}, %rd189, %p28, 1, 1; 2026-02-21T08:52:44.6737684Z // end inline asm 2026-02-21T08:52:44.6737764Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:44.6737824Z mov.b32 %r1724, 0; 2026-02-21T08:52:44.6737892Z mov.b32 %r1723, %r364; 2026-02-21T08:52:44.6737954Z mov.b32 %r1725, %r1724; 2026-02-21T08:52:44.6738015Z // begin inline asm 2026-02-21T08:52:44.6738329Z // wait for regs: %r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r2512,%r2513,%r2514,%r2515,%r2516,%r1723,%r1724,%r1725 2026-02-21T08:52:44.6738415Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:44.6738475Z // end inline asm 2026-02-21T08:52:44.6738531Z $L__tmp6: 2026-02-21T08:52:44.6738766Z .loc 1 40 103 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:40:103 2026-02-21T08:52:44.6738833Z add.s32 %r1804, %r2500, 1; 2026-02-21T08:52:44.6738901Z setp.gt.s32 %p39, %r1804, 1; 2026-02-21T08:52:44.6738970Z selp.b32 %r2500, 0, %r1804, %p39; 2026-02-21T08:52:44.6739192Z .loc 1 48 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:32 2026-02-21T08:52:44.6739280Z mad.wide.s32 %rd193, %r2497, 2, %rd37; 2026-02-21T08:52:44.6739489Z .loc 1 48 80 // 
cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:80 2026-02-21T08:52:44.6739556Z shl.b32 %r1805, %r2500, 13; 2026-02-21T08:52:44.6739620Z add.s32 %r1806, %r2442, %r1805; 2026-02-21T08:52:44.6739681Z add.s32 %r1745, %r1806, %r19; 2026-02-21T08:52:44.6739750Z selp.b32 %r1746, 8, 0, %p37; 2026-02-21T08:52:44.6739812Z // begin inline asm 2026-02-21T08:52:44.6739962Z cp.async.ca.shared.global [ %r1745 + 0 ], [ %rd248 + 0 ], 0x8, %r1746; 2026-02-21T08:52:44.6740020Z // end inline asm 2026-02-21T08:52:44.6740089Z add.s32 %r1747, %r1745, 2048; 2026-02-21T08:52:44.6740148Z // begin inline asm 2026-02-21T08:52:44.6740283Z cp.async.ca.shared.global [ %r1747 + 0 ], [ %rd249 + 0 ], 0x8, %r1746; 2026-02-21T08:52:44.6740345Z // end inline asm 2026-02-21T08:52:44.6740405Z add.s32 %r1749, %r1745, 4096; 2026-02-21T08:52:44.6740465Z // begin inline asm 2026-02-21T08:52:44.6740604Z cp.async.ca.shared.global [ %r1749 + 0 ], [ %rd250 + 0 ], 0x8, %r1746; 2026-02-21T08:52:44.6740662Z // end inline asm 2026-02-21T08:52:44.6740723Z add.s32 %r1751, %r1745, 6144; 2026-02-21T08:52:44.6740783Z // begin inline asm 2026-02-21T08:52:44.6740920Z cp.async.ca.shared.global [ %r1751 + 0 ], [ %rd193 + 0 ], 0x8, %r1746; 2026-02-21T08:52:44.6740978Z // end inline asm 2026-02-21T08:52:44.6741044Z cp.async.commit_group; 2026-02-21T08:52:44.6741261Z .loc 1 54 34 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:34 2026-02-21T08:52:44.6741325Z cvt.s64.s32 %rd203, %r2498; 2026-02-21T08:52:44.6741525Z add.s64 %rd194, %rd38, %rd203; 2026-02-21T08:52:44.6741732Z .loc 1 54 87 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:87 2026-02-21T08:52:44.6741798Z shl.b32 %r1807, %r2500, 11; 2026-02-21T08:52:44.6741861Z add.s32 %r1753, %r1357, %r1807; 2026-02-21T08:52:44.6741922Z // begin inline asm 2026-02-21T08:52:44.6742077Z cp.async.ca.shared.global [ %r1753 + 0 ], [ %rd194 + 0 ], 0x8, %r1746; 2026-02-21T08:52:44.6742137Z // end inline asm 2026-02-21T08:52:44.6742202Z 
cp.async.commit_group; 2026-02-21T08:52:44.6742425Z .loc 1 40 103 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:40:103 2026-02-21T08:52:44.6742489Z add.s32 %r2498, %r2498, 229376; 2026-02-21T08:52:44.6742553Z add.s64 %rd250, %rd250, 128; 2026-02-21T08:52:44.6742729Z add.s64 %rd249, %rd249, 128; 2026-02-21T08:52:44.6742802Z add.s64 %rd248, %rd248, 128; 2026-02-21T08:52:44.6742863Z add.s32 %r2497, %r2497, 64; 2026-02-21T08:52:44.6742931Z setp.lt.u64 %p40, %rd251, 4064; 2026-02-21T08:52:44.6743001Z @%p40 bra $L__BB0_8; 2026-02-21T08:52:44.6743112Z // %bb.9: // in Loop: Header=BB0_3 Depth=1 2026-02-21T08:52:44.6743323Z .loc 1 31 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:31:32 2026-02-21T08:52:44.6743395Z or.b32 %r1826, %r161, %r10; 2026-02-21T08:52:44.6743456Z or.b32 %r1827, %r161, %r11; 2026-02-21T08:52:44.6743669Z .loc 1 40 103 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:40:103 2026-02-21T08:52:44.6743736Z cp.async.wait_group 0; 2026-02-21T08:52:44.6743799Z bar.sync 0; 2026-02-21T08:52:44.6744002Z .loc 1 87 28 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:87:28 2026-02-21T08:52:44.6744083Z cvt.rn.bf16x2.f32 %r1828, %r2502, %r2501; 2026-02-21T08:52:44.6744167Z cvt.rn.bf16x2.f32 %r1829, %r2504, %r2503; 2026-02-21T08:52:44.6744241Z cvt.rn.bf16x2.f32 %r1830, %r2506, %r2505; 2026-02-21T08:52:44.6744316Z cvt.rn.bf16x2.f32 %r1831, %r2508, %r2507; 2026-02-21T08:52:44.6744392Z cvt.rn.bf16x2.f32 %r1832, %r2510, %r2509; 2026-02-21T08:52:44.6744463Z cvt.rn.bf16x2.f32 %r1833, %r2512, %r2511; 2026-02-21T08:52:44.6744534Z cvt.rn.bf16x2.f32 %r1834, %r2514, %r2513; 2026-02-21T08:52:44.6744606Z cvt.rn.bf16x2.f32 %r1835, %r2516, %r2515; 2026-02-21T08:52:44.6744827Z .loc 1 88 50 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:88:50 2026-02-21T08:52:44.6744913Z mad.lo.s32 %r1836, %r1826, 7168, %r162; 2026-02-21T08:52:44.6744987Z mad.lo.s32 %r1837, %r1827, 7168, %r162; 2026-02-21T08:52:44.6745203Z .loc 1 88 22 // 
cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:88:22 2026-02-21T08:52:44.6745273Z mad.wide.s32 %rd204, %r1836, 2, %rd39; 2026-02-21T08:52:44.6745345Z mad.wide.s32 %rd205, %r1837, 2, %rd39; 2026-02-21T08:52:44.6745555Z .loc 1 88 81 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:88:81 2026-02-21T08:52:44.6745671Z st.shared.v4.b32 [%r56], {%r1828, %r1830, %r1832, %r1834}; 2026-02-21T08:52:44.6745791Z st.shared.v4.b32 [%r56+256], {%r1829, %r1831, %r1833, %r1835}; 2026-02-21T08:52:44.6745850Z bar.sync 0; 2026-02-21T08:52:44.6745918Z // begin inline asm 2026-02-21T08:52:44.6746111Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1808, %r1809, %r1810, %r1811}, [%r858]; 2026-02-21T08:52:44.6746172Z // end inline asm 2026-02-21T08:52:44.6746239Z // begin inline asm 2026-02-21T08:52:44.6746421Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1813, %r1814, %r1815, %r1816}, [%r863]; 2026-02-21T08:52:44.6746794Z // end inline asm 2026-02-21T08:52:44.6746869Z // begin inline asm 2026-02-21T08:52:44.6747005Z st.global.v4.b32 [ %rd204 + 0 ], { %r1808, %r1809, %r1810, %r1811 }; 2026-02-21T08:52:44.6747080Z // end inline asm 2026-02-21T08:52:44.6747143Z // begin inline asm 2026-02-21T08:52:44.6747273Z st.global.v4.b32 [ %rd205 + 0 ], { %r1813, %r1814, %r1815, %r1816 }; 2026-02-21T08:52:44.6747507Z // end inline asm 2026-02-21T08:52:44.6747728Z .loc 1 19 144 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:19:144 2026-02-21T08:52:44.6747802Z add.s32 %r2456, %r2456, 12672; 2026-02-21T08:52:44.6747873Z setp.lt.s32 %p41, %r2456, %r2558; 2026-02-21T08:52:44.6747940Z @%p41 bra $L__BB0_3; 2026-02-21T08:52:44.6748010Z bra.uni $L__BB0_10; 2026-02-21T08:52:44.6748116Z $L__BB0_1: // %.._crit_edge_crit_edge 2026-02-21T08:52:44.6748327Z .loc 1 48 80 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:80 2026-02-21T08:52:44.6748395Z or.b32 %r2520, %r19, 2048; 2026-02-21T08:52:44.6748461Z or.b32 %r2519, %r19, 4096; 2026-02-21T08:52:44.6748598Z or.b32 %r2518, %r19, 
6144; 2026-02-21T08:52:44.6748940Z .loc 1 47 25 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:47:25 2026-02-21T08:52:44.6749011Z or.b32 %r2517, %r13, 64; 2026-02-21T08:52:44.6749102Z $L__BB0_10: // %._crit_edge 2026-02-21T08:52:44.6749323Z .loc 1 19 144 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:19:144 2026-02-21T08:52:44.6749393Z sub.s32 %r1854, 112, %r2558; 2026-02-21T08:52:44.6749459Z mul.hi.s32 %r1855, %r1854, 1041204193; 2026-02-21T08:52:44.6749529Z shr.u32 %r1856, %r1855, 31; 2026-02-21T08:52:44.6749592Z shr.s32 %r1857, %r1855, 10; 2026-02-21T08:52:44.6749666Z add.s32 %r210, %r1857, %r1856; 2026-02-21T08:52:44.6749739Z mul.lo.s32 %r1858, %r210, 4224; 2026-02-21T08:52:44.6749809Z setp.ne.b32 %p42, %r1854, %r1858; 2026-02-21T08:52:44.6749878Z setp.gt.s32 %p43, %r1854, -1; 2026-02-21T08:52:44.6749948Z and.pred %p44, %p43, %p42; 2026-02-21T08:52:44.6750022Z selp.b32 %r211, 1, 0, %p44; 2026-02-21T08:52:44.6750086Z add.s32 %r212, %r210, %r211; 2026-02-21T08:52:44.6750153Z setp.lt.s32 %p45, %r212, 1; 2026-02-21T08:52:44.6750225Z setp.gt.s32 %p46, %r212, 0; 2026-02-21T08:52:44.6750446Z .loc 1 25 35 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:25:35 2026-02-21T08:52:44.6750527Z mul.hi.s32 %r1859, %r2558, -1840700269; 2026-02-21T08:52:44.6750596Z add.s32 %r1860, %r1859, %r2558; 2026-02-21T08:52:44.6750656Z shr.u32 %r1861, %r1860, 31; 2026-02-21T08:52:44.6750718Z shr.s32 %r1862, %r1860, 7; 2026-02-21T08:52:44.6750780Z add.s32 %r1863, %r1862, %r1861; 2026-02-21T08:52:44.6750996Z .loc 1 26 33 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:26:33 2026-02-21T08:52:44.6751059Z shl.b32 %r1864, %r1863, 1; 2026-02-21T08:52:44.6751264Z .loc 1 27 39 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:27:39 2026-02-21T08:52:44.6751335Z sub.s32 %r1865, 1, %r1864; 2026-02-21T08:52:44.6751544Z .loc 1 27 52 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:27:52 2026-02-21T08:52:44.6751609Z min.s32 %r1866, %r1865, 
2; 2026-02-21T08:52:44.6751819Z .loc 1 28 45 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:28:45 2026-02-21T08:52:44.6751887Z mul.lo.s32 %r1867, %r1863, 224; 2026-02-21T08:52:44.6751950Z sub.s32 %r1868, %r2558, %r1867; 2026-02-21T08:52:44.6752160Z .loc 1 29 51 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:29:51 2026-02-21T08:52:44.6752225Z div.s32 %r213, %r1868, %r1866; 2026-02-21T08:52:44.6752432Z .loc 1 28 64 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:28:64 2026-02-21T08:52:44.6752499Z mul.lo.s32 %r1869, %r213, %r1866; 2026-02-21T08:52:44.6752567Z sub.s32 %r1870, %r1868, %r1869; 2026-02-21T08:52:44.6752769Z .loc 1 28 30 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:28:30 2026-02-21T08:52:44.6752833Z add.s32 %r1871, %r1870, %r1864; 2026-02-21T08:52:44.6753043Z .loc 1 30 27 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:30:27 2026-02-21T08:52:44.6753105Z shl.b32 %r214, %r1871, 6; 2026-02-21T08:52:44.6753432Z .loc 1 31 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:31:32 2026-02-21T08:52:44.6753500Z or.b32 %r2559, %r214, %r6; 2026-02-21T08:52:44.6753564Z or.b32 %r2560, %r214, %r7; 2026-02-21T08:52:44.6753624Z or.b32 %r2561, %r214, %r8; 2026-02-21T08:52:44.6753685Z or.b32 %r2562, %r214, %r9; 2026-02-21T08:52:44.6753898Z .loc 1 48 53 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:53 2026-02-21T08:52:44.6753960Z shl.b32 %r1872, %r2559, 13; 2026-02-21T08:52:44.6754021Z shl.b32 %r1873, %r2560, 13; 2026-02-21T08:52:44.6754087Z shl.b32 %r1874, %r2561, 13; 2026-02-21T08:52:44.6754147Z shl.b32 %r1875, %r2562, 13; 2026-02-21T08:52:44.6754444Z .loc 1 48 60 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:60 2026-02-21T08:52:44.6754514Z or.b32 %r1876, %r1872, %r13; 2026-02-21T08:52:44.6754575Z or.b32 %r1877, %r1873, %r13; 2026-02-21T08:52:44.6754635Z or.b32 %r1878, %r1874, %r13; 2026-02-21T08:52:44.6754701Z or.b32 %r1879, %r1875, %r13; 2026-02-21T08:52:44.6754913Z .loc 
1 48 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:32 2026-02-21T08:52:44.6754987Z mad.wide.s32 %rd206, %r1876, 2, %rd37; 2026-02-21T08:52:44.6755057Z mad.wide.s32 %rd207, %r1877, 2, %rd37; 2026-02-21T08:52:44.6755130Z mad.wide.s32 %rd208, %r1878, 2, %rd37; 2026-02-21T08:52:44.6755211Z mad.wide.s32 %rd209, %r1879, 2, %rd37; 2026-02-21T08:52:44.6755417Z .loc 1 48 80 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:80 2026-02-21T08:52:44.6755480Z bar.sync 0; 2026-02-21T08:52:44.6755545Z add.s32 %r1838, %r2442, %r19; 2026-02-21T08:52:44.6755609Z selp.b32 %r1839, 8, 0, %p46; 2026-02-21T08:52:44.6755671Z // begin inline asm 2026-02-21T08:52:44.6755822Z cp.async.ca.shared.global [ %r1838 + 0 ], [ %rd206 + 0 ], 0x8, %r1839; 2026-02-21T08:52:44.6755884Z // end inline asm 2026-02-21T08:52:44.6755947Z add.s32 %r1840, %r2442, %r2520; 2026-02-21T08:52:44.6756013Z // begin inline asm 2026-02-21T08:52:44.6756150Z cp.async.ca.shared.global [ %r1840 + 0 ], [ %rd207 + 0 ], 0x8, %r1839; 2026-02-21T08:52:44.6756216Z // end inline asm 2026-02-21T08:52:44.6756281Z add.s32 %r1842, %r2442, %r2519; 2026-02-21T08:52:44.6756348Z // begin inline asm 2026-02-21T08:52:44.6756608Z cp.async.ca.shared.global [ %r1842 + 0 ], [ %rd208 + 0 ], 0x8, %r1839; 2026-02-21T08:52:44.6756671Z // end inline asm 2026-02-21T08:52:44.6756742Z add.s32 %r1844, %r2442, %r2518; 2026-02-21T08:52:44.6756801Z // begin inline asm 2026-02-21T08:52:44.6756944Z cp.async.ca.shared.global [ %r1844 + 0 ], [ %rd209 + 0 ], 0x8, %r1839; 2026-02-21T08:52:44.6757008Z // end inline asm 2026-02-21T08:52:44.6757077Z cp.async.commit_group; 2026-02-21T08:52:44.6757289Z .loc 1 48 60 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:60 2026-02-21T08:52:44.6757353Z or.b32 %r1881, %r1872, %r2517; 2026-02-21T08:52:44.6757424Z or.b32 %r1882, %r1873, %r2517; 2026-02-21T08:52:44.6757485Z or.b32 %r1883, %r1874, %r2517; 2026-02-21T08:52:44.6757547Z or.b32 %r1884, %r1875, %r2517; 
2026-02-21T08:52:44.6757757Z .loc 1 48 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:32 2026-02-21T08:52:44.6757827Z mad.wide.s32 %rd210, %r1881, 2, %rd37; 2026-02-21T08:52:44.6757895Z mad.wide.s32 %rd211, %r1882, 2, %rd37; 2026-02-21T08:52:44.6757964Z mad.wide.s32 %rd212, %r1883, 2, %rd37; 2026-02-21T08:52:44.6758036Z mad.wide.s32 %rd213, %r1884, 2, %rd37; 2026-02-21T08:52:44.6758240Z .loc 1 48 80 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:80 2026-02-21T08:52:44.6758299Z bar.sync 0; 2026-02-21T08:52:44.6758367Z add.s32 %r1885, %r2442, 8192; 2026-02-21T08:52:44.6758432Z add.s32 %r1846, %r1885, %r19; 2026-02-21T08:52:44.6758493Z // begin inline asm 2026-02-21T08:52:44.6758640Z cp.async.ca.shared.global [ %r1846 + 0 ], [ %rd210 + 0 ], 0x8, %r1839; 2026-02-21T08:52:44.6758849Z // end inline asm 2026-02-21T08:52:44.6758926Z add.s32 %r1848, %r1885, %r2520; 2026-02-21T08:52:44.6758987Z // begin inline asm 2026-02-21T08:52:44.6759129Z cp.async.ca.shared.global [ %r1848 + 0 ], [ %rd211 + 0 ], 0x8, %r1839; 2026-02-21T08:52:44.6759188Z // end inline asm 2026-02-21T08:52:44.6759257Z add.s32 %r1850, %r1885, %r2519; 2026-02-21T08:52:44.6759317Z // begin inline asm 2026-02-21T08:52:44.6759468Z cp.async.ca.shared.global [ %r1850 + 0 ], [ %rd212 + 0 ], 0x8, %r1839; 2026-02-21T08:52:44.6759536Z // end inline asm 2026-02-21T08:52:44.6759601Z add.s32 %r1852, %r1885, %r2518; 2026-02-21T08:52:44.6759660Z // begin inline asm 2026-02-21T08:52:44.6759796Z cp.async.ca.shared.global [ %r1852 + 0 ], [ %rd213 + 0 ], 0x8, %r1839; 2026-02-21T08:52:44.6759860Z // end inline asm 2026-02-21T08:52:44.6760074Z cp.async.commit_group; 2026-02-21T08:52:44.6760302Z .loc 1 19 144 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:19:144 2026-02-21T08:52:44.6760378Z @%p45 bra $L__BB0_17; 2026-02-21T08:52:44.6760470Z // %bb.11: // %.lr.ph97 2026-02-21T08:52:44.6760538Z shl.b32 %r1891, %r212, 7; 2026-02-21T08:52:44.6760759Z .loc 1 32 27 // 
cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:32:27 2026-02-21T08:52:44.6760833Z shl.b32 %r1892, %r213, 6; 2026-02-21T08:52:44.6761042Z .loc 1 33 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:33:32 2026-02-21T08:52:44.6761109Z or.b32 %r2526, %r1892, %r15; 2026-02-21T08:52:44.6761319Z .loc 1 31 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:31:32 2026-02-21T08:52:44.6761384Z or.b32 %r2551, %r214, %r10; 2026-02-21T08:52:44.6761445Z or.b32 %r2552, %r214, %r11; 2026-02-21T08:52:44.6761524Z add.s32 %r222, %r1891, -2; 2026-02-21T08:52:44.6761587Z shl.b32 %r1894, %r2446, 6; 2026-02-21T08:52:44.6761649Z and.b32 %r1896, %r2443, 896; 2026-02-21T08:52:44.6761722Z and.b32 %r1898, %r2444, 62; 2026-02-21T08:52:44.6761787Z or.b32 %r1899, %r1894, %r1896; 2026-02-21T08:52:44.6761847Z or.b32 %r223, %r1899, %r1898; 2026-02-21T08:52:44.6761910Z xor.b32 %r224, %r223, 8; 2026-02-21T08:52:44.6761975Z xor.b32 %r225, %r223, 16; 2026-02-21T08:52:44.6762034Z xor.b32 %r226, %r223, 24; 2026-02-21T08:52:44.6762092Z xor.b32 %r227, %r223, 32; 2026-02-21T08:52:44.6762156Z xor.b32 %r228, %r223, 40; 2026-02-21T08:52:44.6762216Z xor.b32 %r229, %r223, 48; 2026-02-21T08:52:44.6762274Z xor.b32 %r230, %r223, 56; 2026-02-21T08:52:44.6762333Z and.b32 %r1900, %r3, 24; 2026-02-21T08:52:44.6762400Z shl.b32 %r1901, %r1900, 5; 2026-02-21T08:52:44.6762458Z shr.u32 %r1903, %r1900, 1; 2026-02-21T08:52:44.6762519Z bfe.u32 %r1904, %r3, 5, 2; 2026-02-21T08:52:44.6762586Z bfe.s32 %r1905, %r3, 7, 1; 2026-02-21T08:52:44.6762645Z and.b32 %r1906, %r3, 128; 2026-02-21T08:52:44.6762718Z or.b32 %r1907, %r1903, %r1904; 2026-02-21T08:52:44.6762782Z or.b32 %r1908, %r1901, %r2445; 2026-02-21T08:52:44.6762853Z or.b32 %r1909, %r1907, %r1908; 2026-02-21T08:52:44.6762914Z or.b32 %r1910, %r1909, %r1906; 2026-02-21T08:52:44.6762974Z add.s32 %r1912, %r2442, 16384; 2026-02-21T08:52:44.6763041Z add.s32 %r231, %r1912, %r1910; 2026-02-21T08:52:44.6763101Z xor.b32 %r1913, %r1910, 4; 
2026-02-21T08:52:44.6763162Z add.s32 %r232, %r1912, %r1913; 2026-02-21T08:52:44.6763222Z xor.b32 %r1914, %r1910, 8; 2026-02-21T08:52:44.6763289Z add.s32 %r233, %r1912, %r1914; 2026-02-21T08:52:44.6763349Z xor.b32 %r1915, %r1910, 12; 2026-02-21T08:52:44.6763410Z add.s32 %r234, %r1912, %r1915; 2026-02-21T08:52:44.6763479Z xor.b32 %r1916, %r1910, 64; 2026-02-21T08:52:44.6763540Z add.s32 %r235, %r1912, %r1916; 2026-02-21T08:52:44.6763601Z xor.b32 %r1917, %r1910, 68; 2026-02-21T08:52:44.6763673Z add.s32 %r236, %r1912, %r1917; 2026-02-21T08:52:44.6763733Z xor.b32 %r1918, %r1910, 72; 2026-02-21T08:52:44.6763794Z add.s32 %r237, %r1912, %r1918; 2026-02-21T08:52:44.6763854Z xor.b32 %r1919, %r1910, 76; 2026-02-21T08:52:44.6764026Z add.s32 %r238, %r1912, %r1919; 2026-02-21T08:52:44.6764086Z and.b32 %r1920, %r12, 12; 2026-02-21T08:52:44.6764161Z and.b32 %r1921, %r2444, 112; 2026-02-21T08:52:44.6764230Z and.b32 %r1924, %r2450, 1088; 2026-02-21T08:52:44.6764292Z and.b32 %r1925, %r1905, 260; 2026-02-21T08:52:44.6764351Z or.b32 %r1926, %r1920, %r1921; 2026-02-21T08:52:44.6764412Z or.b32 %r1927, %r1924, %r1925; 2026-02-21T08:52:44.6764484Z xor.b32 %r1928, %r1927, %r1926; 2026-02-21T08:52:44.6764545Z add.s32 %r239, %r1912, %r1928; 2026-02-21T08:52:44.6764605Z xor.b32 %r1929, %r1928, 8; 2026-02-21T08:52:44.6764671Z add.s32 %r240, %r1912, %r1929; 2026-02-21T08:52:44.6764735Z shl.b32 %r1930, %r3, 7; 2026-02-21T08:52:44.6764796Z or.b32 %r1931, %r1930, %r5; 2026-02-21T08:52:44.6764977Z and.b32 %r1932, %r1931, 8076; 2026-02-21T08:52:44.6765051Z or.b32 %r1933, %r1932, %r2445; 2026-02-21T08:52:44.6765112Z add.s32 %r241, %r1912, %r1933; 2026-02-21T08:52:44.6765173Z xor.b32 %r1934, %r1933, 16; 2026-02-21T08:52:44.6765243Z add.s32 %r242, %r1912, %r1934; 2026-02-21T08:52:44.6765303Z xor.b32 %r1935, %r1933, 32; 2026-02-21T08:52:44.6765364Z add.s32 %r243, %r1912, %r1935; 2026-02-21T08:52:44.6765429Z xor.b32 %r1936, %r1933, 48; 2026-02-21T08:52:44.6765490Z add.s32 %r244, %r1912, %r1936; 
2026-02-21T08:52:44.6765550Z xor.b32 %r1937, %r1933, 64; 2026-02-21T08:52:44.6765612Z add.s32 %r245, %r1912, %r1937; 2026-02-21T08:52:44.6765677Z xor.b32 %r1938, %r1933, 80; 2026-02-21T08:52:44.6765738Z add.s32 %r246, %r1912, %r1938; 2026-02-21T08:52:44.6765798Z xor.b32 %r1939, %r1933, 96; 2026-02-21T08:52:44.6765864Z add.s32 %r247, %r1912, %r1939; 2026-02-21T08:52:44.6765925Z xor.b32 %r1940, %r1933, 112; 2026-02-21T08:52:44.6766077Z add.s32 %r248, %r1912, %r1940; 2026-02-21T08:52:44.6766142Z shl.b32 %r1941, %r2446, 4; 2026-02-21T08:52:44.6766212Z shl.b32 %r1943, %r2447, 5; 2026-02-21T08:52:44.6766273Z and.b32 %r1946, %r2449, 4160; 2026-02-21T08:52:44.6766335Z shl.b32 %r1947, %r2451, 2; 2026-02-21T08:52:44.6766405Z and.b32 %r1949, %r2452, 2080; 2026-02-21T08:52:44.6766602Z shl.b32 %r1952, %r2454, 3; 2026-02-21T08:52:44.6766668Z shr.u32 %r1953, %r1906, 1; 2026-02-21T08:52:44.6766730Z or.b32 %r1954, %r1941, %r1943; 2026-02-21T08:52:44.6766797Z or.b32 %r1955, %r1946, %r1949; 2026-02-21T08:52:44.6766869Z or.b32 %r1956, %r1954, %r1953; 2026-02-21T08:52:44.6766934Z xor.b32 %r1957, %r1956, %r1955; 2026-02-21T08:52:44.6767002Z add.s32 %r1958, %r1912, %r1947; 2026-02-21T08:52:44.6767065Z add.s32 %r1959, %r1958, %r1952; 2026-02-21T08:52:44.6767127Z add.s32 %r249, %r1959, %r1957; 2026-02-21T08:52:44.6767192Z shl.b32 %r1961, %r2447, 6; 2026-02-21T08:52:44.6767251Z shl.b32 %r1962, %r2448, 3; 2026-02-21T08:52:44.6767313Z and.b32 %r1963, %r2450, 2080; 2026-02-21T08:52:44.6767377Z and.b32 %r1964, %r2453, 4160; 2026-02-21T08:52:44.6767446Z or.b32 %r1965, %r2455, %r1961; 2026-02-21T08:52:44.6767507Z or.b32 %r1966, %r1963, %r1964; 2026-02-21T08:52:44.6767572Z xor.b32 %r1967, %r1966, %r1965; 2026-02-21T08:52:44.6767643Z add.s32 %r1968, %r1912, %r1962; 2026-02-21T08:52:44.6767705Z add.s32 %r2417, %r1968, %r1967; 2026-02-21T08:52:44.6767766Z add.s32 %r2422, %r2417, 1024; 2026-02-21T08:52:44.6767992Z .loc 1 19 144 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:19:144 
2026-02-21T08:52:44.6768062Z shl.b32 %r1969, %r210, 7; 2026-02-21T08:52:44.6768124Z shl.b32 %r1970, %r211, 7; 2026-02-21T08:52:44.6768186Z add.s32 %r252, %r1969, %r1970; 2026-02-21T08:52:44.6768256Z mov.b32 %r2532, 0f00000000; 2026-02-21T08:52:44.6768314Z mov.b32 %r2530, 1; 2026-02-21T08:52:44.6768375Z mov.b32 %r2529, -1; 2026-02-21T08:52:44.6768436Z mov.b32 %r2527, 32; 2026-02-21T08:52:44.6768503Z mov.b32 %r2525, 0; 2026-02-21T08:52:44.6768563Z mov.b32 %r2528, %r2525; 2026-02-21T08:52:44.6768625Z mov.b32 %r2531, %r2526; 2026-02-21T08:52:44.6768690Z mov.b32 %r2533, %r2532; 2026-02-21T08:52:44.6768749Z mov.b32 %r2534, %r2532; 2026-02-21T08:52:44.6768810Z mov.b32 %r2535, %r2532; 2026-02-21T08:52:44.6769026Z mov.b32 %r2536, %r2532; 2026-02-21T08:52:44.6769098Z mov.b32 %r2537, %r2532; 2026-02-21T08:52:44.6769161Z mov.b32 %r2538, %r2532; 2026-02-21T08:52:44.6769218Z mov.b32 %r2539, %r2532; 2026-02-21T08:52:44.6769281Z mov.b32 %r2540, %r2532; 2026-02-21T08:52:44.6769339Z mov.b32 %r2541, %r2532; 2026-02-21T08:52:44.6769397Z mov.b32 %r2542, %r2532; 2026-02-21T08:52:44.6769463Z mov.b32 %r2543, %r2532; 2026-02-21T08:52:44.6769521Z mov.b32 %r2544, %r2532; 2026-02-21T08:52:44.6769580Z mov.b32 %r2545, %r2532; 2026-02-21T08:52:44.6769636Z mov.b32 %r2546, %r2532; 2026-02-21T08:52:44.6769700Z mov.b32 %r2547, %r2532; 2026-02-21T08:52:44.6769758Z mov.b32 %r2549, %r2530; 2026-02-21T08:52:44.6769816Z mov.b32 %r2550, %r2525; 2026-02-21T08:52:44.6769999Z mov.b32 %r2553, %r2551; 2026-02-21T08:52:44.6770062Z mov.b32 %r2554, %r2552; 2026-02-21T08:52:44.6770120Z mov.b32 %r2557, %r2531; 2026-02-21T08:52:44.6770180Z bra.uni $L__BB0_12; 2026-02-21T08:52:44.6770321Z $L__BB0_16: // in Loop: Header=BB0_12 Depth=1 2026-02-21T08:52:44.6770543Z .loc 1 19 144 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:19:144 2026-02-21T08:52:44.6770605Z add.s32 %r2550, %r2550, 1; 2026-02-21T08:52:44.6770679Z setp.ne.b32 %p63, %r252, %r2550; 2026-02-21T08:52:44.6770741Z mov.b32 %r2525, %r2549; 
2026-02-21T08:52:44.6770800Z mov.b32 %r2526, %r2531; 2026-02-21T08:52:44.6770860Z mov.b32 %r2528, %r259; 2026-02-21T08:52:44.6770923Z mov.b32 %r2531, %r2557; 2026-02-21T08:52:44.6770984Z mov.b32 %r2549, %r287; 2026-02-21T08:52:44.6771041Z mov.b32 %r2553, %r283; 2026-02-21T08:52:44.6771106Z mov.b32 %r2554, %r284; 2026-02-21T08:52:44.6771170Z @%p63 bra $L__BB0_12; 2026-02-21T08:52:44.6771235Z bra.uni $L__BB0_17; 2026-02-21T08:52:44.6771364Z $L__BB0_12: // =>This Inner Loop Header: Depth=1 2026-02-21T08:52:44.6771581Z .loc 1 0 144 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:0:144 2026-02-21T08:52:44.6771645Z mov.b32 %r284, %r2552; 2026-02-21T08:52:44.6771706Z mov.b32 %r283, %r2551; 2026-02-21T08:52:44.6771772Z mov.b32 %r259, %r2527; 2026-02-21T08:52:44.6771986Z .loc 1 19 144 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:19:144 2026-02-21T08:52:44.6772052Z add.s32 %r1971, %r2549, 1; 2026-02-21T08:52:44.6772124Z setp.eq.b32 %p47, %r2549, 127; 2026-02-21T08:52:44.6772193Z selp.b32 %r287, 0, %r1971, %p47; 2026-02-21T08:52:44.6772260Z setp.ne.b32 %p48, %r287, 0; 2026-02-21T08:52:44.6772321Z @%p48 bra $L__BB0_14; 2026-02-21T08:52:44.6772435Z // %bb.13: // in Loop: Header=BB0_12 Depth=1 2026-02-21T08:52:44.6772498Z add.s32 %r2558, %r2558, 4224; 2026-02-21T08:52:44.6772709Z .loc 1 25 35 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:25:35 2026-02-21T08:52:44.6772792Z mul.hi.s32 %r1972, %r2558, -1840700269; 2026-02-21T08:52:44.6772859Z add.s32 %r1973, %r1972, %r2558; 2026-02-21T08:52:44.6772921Z shr.u32 %r1974, %r1973, 31; 2026-02-21T08:52:44.6772988Z shr.s32 %r1975, %r1973, 7; 2026-02-21T08:52:44.6773050Z add.s32 %r1976, %r1975, %r1974; 2026-02-21T08:52:44.6773255Z .loc 1 26 33 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:26:33 2026-02-21T08:52:44.6773317Z shl.b32 %r1977, %r1976, 1; 2026-02-21T08:52:44.6773525Z .loc 1 27 39 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:27:39 2026-02-21T08:52:44.6773587Z 
sub.s32 %r1978, 1, %r1977; 2026-02-21T08:52:44.6773789Z .loc 1 27 52 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:27:52 2026-02-21T08:52:44.6773855Z min.s32 %r1979, %r1978, 2; 2026-02-21T08:52:44.6774058Z .loc 1 28 45 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:28:45 2026-02-21T08:52:44.6774122Z mul.lo.s32 %r1980, %r1976, 224; 2026-02-21T08:52:44.6774189Z sub.s32 %r1981, %r2558, %r1980; 2026-02-21T08:52:44.6774494Z .loc 1 29 51 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:29:51 2026-02-21T08:52:44.6774556Z div.s32 %r1982, %r1981, %r1979; 2026-02-21T08:52:44.6774768Z .loc 1 28 64 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:28:64 2026-02-21T08:52:44.6774835Z mul.lo.s32 %r1983, %r1982, %r1979; 2026-02-21T08:52:44.6774897Z sub.s32 %r1984, %r1981, %r1983; 2026-02-21T08:52:44.6775097Z .loc 1 28 30 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:28:30 2026-02-21T08:52:44.6775165Z add.s32 %r1985, %r1984, %r1977; 2026-02-21T08:52:44.6775369Z .loc 1 30 27 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:30:27 2026-02-21T08:52:44.6775530Z shl.b32 %r1986, %r1985, 6; 2026-02-21T08:52:44.6775751Z .loc 1 31 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:31:32 2026-02-21T08:52:44.6775816Z or.b32 %r2559, %r1986, %r6; 2026-02-21T08:52:44.6775877Z or.b32 %r2560, %r1986, %r7; 2026-02-21T08:52:44.6775943Z or.b32 %r2561, %r1986, %r8; 2026-02-21T08:52:44.6776002Z or.b32 %r2562, %r1986, %r9; 2026-02-21T08:52:44.6776065Z or.b32 %r2551, %r1986, %r10; 2026-02-21T08:52:44.6776126Z or.b32 %r2552, %r1986, %r11; 2026-02-21T08:52:44.6776335Z .loc 1 32 27 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:32:27 2026-02-21T08:52:44.6776398Z shl.b32 %r1987, %r1982, 6; 2026-02-21T08:52:44.6776706Z .loc 1 33 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:33:32 2026-02-21T08:52:44.6776797Z or.b32 %r2557, %r1987, %r15; 2026-02-21T08:52:44.6776911Z $L__BB0_14: // in Loop: Header=BB0_12 
Depth=1 2026-02-21T08:52:44.6777129Z .loc 1 19 144 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:19:144 2026-02-21T08:52:44.6777203Z setp.eq.b32 %p57, %r287, 0; 2026-02-21T08:52:44.6777275Z setp.lt.s32 %p58, %r2550, %r222; 2026-02-21T08:52:44.6777336Z add.s32 %r2324, %r2529, 1; 2026-02-21T08:52:44.6777407Z setp.gt.s32 %p60, %r2324, 1; 2026-02-21T08:52:44.6777475Z selp.b32 %r2529, 0, %r2324, %p60; 2026-02-21T08:52:44.6777680Z .loc 1 41 35 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:41:35 2026-02-21T08:52:44.6777743Z add.s32 %r2325, %r2528, %r10; 2026-02-21T08:52:44.6777953Z .loc 1 48 80 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:80 2026-02-21T08:52:44.6778021Z cp.async.wait_group 1; 2026-02-21T08:52:44.6778079Z bar.sync 0; 2026-02-21T08:52:44.6778146Z shl.b32 %r2326, %r2529, 13; 2026-02-21T08:52:44.6778209Z add.s32 %r2328, %r2442, %r2326; 2026-02-21T08:52:44.6778413Z .loc 1 52 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:52:32 2026-02-21T08:52:44.6778479Z add.s32 %r2329, %r2328, %r223; 2026-02-21T08:52:44.6778544Z ld.shared.b16 %rs337, [%r2329]; 2026-02-21T08:52:44.6778620Z ld.shared.b16 %rs338, [%r2329+1024]; 2026-02-21T08:52:44.6778687Z ld.shared.b16 %rs339, [%r2329+64]; 2026-02-21T08:52:44.6778763Z ld.shared.b16 %rs340, [%r2329+1088]; 2026-02-21T08:52:44.6778826Z add.s32 %r2330, %r2328, %r224; 2026-02-21T08:52:44.6778891Z ld.shared.b16 %rs341, [%r2330]; 2026-02-21T08:52:44.6778962Z ld.shared.b16 %rs342, [%r2330+1024]; 2026-02-21T08:52:44.6779027Z ld.shared.b16 %rs343, [%r2330+64]; 2026-02-21T08:52:44.6779093Z ld.shared.b16 %rs344, [%r2330+1088]; 2026-02-21T08:52:44.6779156Z add.s32 %r2331, %r2328, %r225; 2026-02-21T08:52:44.6779226Z ld.shared.b16 %rs345, [%r2331]; 2026-02-21T08:52:44.6779291Z ld.shared.b16 %rs346, [%r2331+1024]; 2026-02-21T08:52:44.6779356Z ld.shared.b16 %rs347, [%r2331+64]; 2026-02-21T08:52:44.6779429Z ld.shared.b16 %rs348, [%r2331+1088]; 2026-02-21T08:52:44.6779493Z add.s32 %r2332, 
%r2328, %r226; 2026-02-21T08:52:44.6779561Z ld.shared.b16 %rs349, [%r2332]; 2026-02-21T08:52:44.6779628Z ld.shared.b16 %rs350, [%r2332+1024]; 2026-02-21T08:52:44.6779848Z ld.shared.b16 %rs351, [%r2332+64]; 2026-02-21T08:52:44.6779916Z ld.shared.b16 %rs352, [%r2332+1088]; 2026-02-21T08:52:44.6779976Z add.s32 %r2333, %r2328, %r227; 2026-02-21T08:52:44.6780040Z ld.shared.b16 %rs353, [%r2333]; 2026-02-21T08:52:44.6780104Z ld.shared.b16 %rs354, [%r2333+1024]; 2026-02-21T08:52:44.6780169Z ld.shared.b16 %rs355, [%r2333+64]; 2026-02-21T08:52:44.6780245Z ld.shared.b16 %rs356, [%r2333+1088]; 2026-02-21T08:52:44.6780307Z add.s32 %r2334, %r2328, %r228; 2026-02-21T08:52:44.6780372Z ld.shared.b16 %rs357, [%r2334]; 2026-02-21T08:52:44.6780437Z ld.shared.b16 %rs358, [%r2334+1024]; 2026-02-21T08:52:44.6780510Z ld.shared.b16 %rs359, [%r2334+64]; 2026-02-21T08:52:44.6780576Z ld.shared.b16 %rs360, [%r2334+1088]; 2026-02-21T08:52:44.6780757Z add.s32 %r2335, %r2328, %r229; 2026-02-21T08:52:44.6780829Z ld.shared.b16 %rs361, [%r2335]; 2026-02-21T08:52:44.6780896Z ld.shared.b16 %rs362, [%r2335+1024]; 2026-02-21T08:52:44.6780966Z ld.shared.b16 %rs363, [%r2335+64]; 2026-02-21T08:52:44.6781046Z ld.shared.b16 %rs364, [%r2335+1088]; 2026-02-21T08:52:44.6781117Z add.s32 %r2336, %r2328, %r230; 2026-02-21T08:52:44.6781183Z ld.shared.b16 %rs365, [%r2336]; 2026-02-21T08:52:44.6781248Z ld.shared.b16 %rs366, [%r2336+1024]; 2026-02-21T08:52:44.6781320Z ld.shared.b16 %rs367, [%r2336+64]; 2026-02-21T08:52:44.6781385Z ld.shared.b16 %rs368, [%r2336+1088]; 2026-02-21T08:52:44.6781450Z cvt.f32.bf16 %r2022, %rs337; 2026-02-21T08:52:44.6781516Z cvt.f32.bf16 %r2023, %rs338; 2026-02-21T08:52:44.6781580Z cvt.f32.bf16 %r2024, %rs341; 2026-02-21T08:52:44.6781639Z cvt.f32.bf16 %r2025, %rs342; 2026-02-21T08:52:44.6781698Z cvt.f32.bf16 %r2058, %rs345; 2026-02-21T08:52:44.6781763Z cvt.f32.bf16 %r2059, %rs346; 2026-02-21T08:52:44.6781825Z cvt.f32.bf16 %r2060, %rs349; 2026-02-21T08:52:44.6781886Z cvt.f32.bf16 %r2061, 
%rs350; 2026-02-21T08:52:44.6781949Z cvt.f32.bf16 %r2094, %rs353; 2026-02-21T08:52:44.6782011Z cvt.f32.bf16 %r2095, %rs354; 2026-02-21T08:52:44.6782075Z cvt.f32.bf16 %r2096, %rs357; 2026-02-21T08:52:44.6782135Z cvt.f32.bf16 %r2097, %rs358; 2026-02-21T08:52:44.6782199Z cvt.f32.bf16 %r2130, %rs361; 2026-02-21T08:52:44.6782270Z cvt.f32.bf16 %r2131, %rs362; 2026-02-21T08:52:44.6782333Z cvt.f32.bf16 %r2132, %rs365; 2026-02-21T08:52:44.6782398Z cvt.f32.bf16 %r2133, %rs366; 2026-02-21T08:52:44.6782459Z cvt.f32.bf16 %r2166, %rs339; 2026-02-21T08:52:44.6782519Z cvt.f32.bf16 %r2167, %rs340; 2026-02-21T08:52:44.6782580Z cvt.f32.bf16 %r2168, %rs343; 2026-02-21T08:52:44.6782643Z cvt.f32.bf16 %r2169, %rs344; 2026-02-21T08:52:44.6782703Z cvt.f32.bf16 %r2202, %rs347; 2026-02-21T08:52:44.6782763Z cvt.f32.bf16 %r2203, %rs348; 2026-02-21T08:52:44.6782825Z cvt.f32.bf16 %r2204, %rs351; 2026-02-21T08:52:44.6782886Z cvt.f32.bf16 %r2205, %rs352; 2026-02-21T08:52:44.6782948Z cvt.f32.bf16 %r2238, %rs355; 2026-02-21T08:52:44.6783009Z cvt.f32.bf16 %r2239, %rs356; 2026-02-21T08:52:44.6783073Z cvt.f32.bf16 %r2240, %rs359; 2026-02-21T08:52:44.6783135Z cvt.f32.bf16 %r2241, %rs360; 2026-02-21T08:52:44.6783195Z cvt.f32.bf16 %r2274, %rs363; 2026-02-21T08:52:44.6783259Z cvt.f32.bf16 %r2275, %rs364; 2026-02-21T08:52:44.6783319Z cvt.f32.bf16 %r2276, %rs367; 2026-02-21T08:52:44.6783378Z cvt.f32.bf16 %r2277, %rs368; 2026-02-21T08:52:44.6783589Z .loc 1 54 62 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:62 2026-02-21T08:52:44.6783666Z mad.lo.s32 %r2337, %r2325, 7168, %r2526; 2026-02-21T08:52:44.6783883Z .loc 1 54 34 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:34 2026-02-21T08:52:44.6783947Z cvt.s64.s32 %rd229, %r2337; 2026-02-21T08:52:44.6784016Z add.s64 %rd215, %rd38, %rd229; 2026-02-21T08:52:44.6784222Z .loc 1 54 87 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:54:87 2026-02-21T08:52:44.6784281Z // begin inline asm 2026-02-21T08:52:44.6784345Z mov.u64 
%rd214, 0x0; 2026-02-21T08:52:44.6784579Z createpolicy.fractional.L2::evict_last.b64 %rd214, 1.0; 2026-02-21T08:52:44.6784638Z // end inline asm 2026-02-21T08:52:44.6784702Z // begin inline asm 2026-02-21T08:52:44.6784760Z mov.u32 %r1988, 0x0; 2026-02-21T08:52:44.6784817Z mov.u32 %r1989, 0x0; 2026-02-21T08:52:44.6785024Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r1988, %r1989 }, [ %rd215 + 0 ], %rd214; 2026-02-21T08:52:44.6785089Z // end inline asm 2026-02-21T08:52:44.6785297Z .loc 1 62 28 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:62:28 2026-02-21T08:52:44.6785365Z st.shared.b8 [%r231], %r1988; 2026-02-21T08:52:44.6785437Z prmt.b32 %r2338, %r1988, 0, 0x7771U; 2026-02-21T08:52:44.6785504Z st.shared.b8 [%r232], %r2338; 2026-02-21T08:52:44.6785682Z prmt.b32 %r2339, %r1988, 0, 0x7772U; 2026-02-21T08:52:44.6785752Z st.shared.b8 [%r233], %r2339; 2026-02-21T08:52:44.6785815Z prmt.b32 %r2340, %r1988, 0, 0x7773U; 2026-02-21T08:52:44.6785878Z st.shared.b8 [%r234], %r2340; 2026-02-21T08:52:44.6785947Z st.shared.b8 [%r235+1024], %r1989; 2026-02-21T08:52:44.6786016Z prmt.b32 %r2341, %r1989, 0, 0x7771U; 2026-02-21T08:52:44.6786079Z st.shared.b8 [%r236+1024], %r2341; 2026-02-21T08:52:44.6786142Z prmt.b32 %r2342, %r1989, 0, 0x7772U; 2026-02-21T08:52:44.6786212Z st.shared.b8 [%r237+1024], %r2342; 2026-02-21T08:52:44.6786275Z prmt.b32 %r2343, %r1989, 0, 0x7773U; 2026-02-21T08:52:44.6786339Z st.shared.b8 [%r238+1024], %r2343; 2026-02-21T08:52:44.6786394Z bar.sync 0; 2026-02-21T08:52:44.6786590Z ld.shared.b32 %r2344, [%r239]; 2026-02-21T08:52:44.6786661Z prmt.b32 %r2345, %r2344, 0, 0x7770U; 2026-02-21T08:52:44.6786723Z cvt.u16.u32 %rs369, %r2345; 2026-02-21T08:52:44.6786791Z prmt.b32 %r2346, %r2344, 0, 0x7771U; 2026-02-21T08:52:44.6786856Z cvt.u16.u32 %rs370, %r2346; 2026-02-21T08:52:44.6786919Z prmt.b32 %r2347, %r2344, 0, 0x7772U; 2026-02-21T08:52:44.6786985Z cvt.u16.u32 %rs371, %r2347; 2026-02-21T08:52:44.6787049Z prmt.b32 %r2348, %r2344, 0, 0x7773U; 
2026-02-21T08:52:44.6787114Z cvt.u16.u32 %rs372, %r2348; 2026-02-21T08:52:44.6787188Z ld.shared.b32 %r2349, [%r239+128]; 2026-02-21T08:52:44.6787261Z prmt.b32 %r2350, %r2349, 0, 0x7770U; 2026-02-21T08:52:44.6787321Z cvt.u16.u32 %rs373, %r2350; 2026-02-21T08:52:44.6787386Z prmt.b32 %r2351, %r2349, 0, 0x7771U; 2026-02-21T08:52:44.6787451Z cvt.u16.u32 %rs374, %r2351; 2026-02-21T08:52:44.6787514Z prmt.b32 %r2352, %r2349, 0, 0x7772U; 2026-02-21T08:52:44.6787575Z cvt.u16.u32 %rs375, %r2352; 2026-02-21T08:52:44.6787641Z prmt.b32 %r2353, %r2349, 0, 0x7773U; 2026-02-21T08:52:44.6787706Z cvt.u16.u32 %rs376, %r2353; 2026-02-21T08:52:44.6787770Z ld.shared.b32 %r2354, [%r240+512]; 2026-02-21T08:52:44.6787833Z prmt.b32 %r2355, %r2354, 0, 0x7770U; 2026-02-21T08:52:44.6787898Z cvt.u16.u32 %rs377, %r2355; 2026-02-21T08:52:44.6787963Z prmt.b32 %r2356, %r2354, 0, 0x7771U; 2026-02-21T08:52:44.6788024Z cvt.u16.u32 %rs378, %r2356; 2026-02-21T08:52:44.6788086Z prmt.b32 %r2357, %r2354, 0, 0x7772U; 2026-02-21T08:52:44.6788152Z cvt.u16.u32 %rs379, %r2357; 2026-02-21T08:52:44.6788215Z prmt.b32 %r2358, %r2354, 0, 0x7773U; 2026-02-21T08:52:44.6788275Z cvt.u16.u32 %rs380, %r2358; 2026-02-21T08:52:44.6788344Z ld.shared.b32 %r2359, [%r240+640]; 2026-02-21T08:52:44.6788408Z prmt.b32 %r2360, %r2359, 0, 0x7770U; 2026-02-21T08:52:44.6788472Z cvt.u16.u32 %rs381, %r2360; 2026-02-21T08:52:44.6788613Z prmt.b32 %r2361, %r2359, 0, 0x7771U; 2026-02-21T08:52:44.6788677Z cvt.u16.u32 %rs382, %r2361; 2026-02-21T08:52:44.6788743Z prmt.b32 %r2362, %r2359, 0, 0x7772U; 2026-02-21T08:52:44.6788804Z cvt.u16.u32 %rs383, %r2362; 2026-02-21T08:52:44.6788874Z prmt.b32 %r2363, %r2359, 0, 0x7773U; 2026-02-21T08:52:44.6788936Z cvt.u16.u32 %rs384, %r2363; 2026-02-21T08:52:44.6789147Z .loc 1 57 28 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:57:28 2026-02-21T08:52:44.6789215Z shl.b16 %rs385, %rs369, 4; 2026-02-21T08:52:44.6789277Z shl.b16 %rs386, %rs377, 4; 2026-02-21T08:52:44.6789491Z shl.b16 %rs387, %rs370, 4; 
2026-02-21T08:52:44.6789552Z shl.b16 %rs388, %rs378, 4; 2026-02-21T08:52:44.6789616Z shl.b16 %rs389, %rs371, 4; 2026-02-21T08:52:44.6789677Z shl.b16 %rs390, %rs379, 4; 2026-02-21T08:52:44.6789738Z shl.b16 %rs391, %rs372, 4; 2026-02-21T08:52:44.6789803Z shl.b16 %rs392, %rs380, 4; 2026-02-21T08:52:44.6789864Z shl.b16 %rs393, %rs373, 4; 2026-02-21T08:52:44.6789925Z shl.b16 %rs394, %rs381, 4; 2026-02-21T08:52:44.6789988Z shl.b16 %rs395, %rs374, 4; 2026-02-21T08:52:44.6790049Z shl.b16 %rs396, %rs382, 4; 2026-02-21T08:52:44.6790108Z shl.b16 %rs397, %rs375, 4; 2026-02-21T08:52:44.6790167Z shl.b16 %rs398, %rs383, 4; 2026-02-21T08:52:44.6790232Z shl.b16 %rs399, %rs376, 4; 2026-02-21T08:52:44.6790292Z shl.b16 %rs400, %rs384, 4; 2026-02-21T08:52:44.6790630Z .loc 1 72 58 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:72:58 2026-02-21T08:52:44.6790717Z selp.b16 %rs401, %rs385, %rs369, %p64; 2026-02-21T08:52:44.6790785Z cvt.s16.s8 %rs402, %rs401; 2026-02-21T08:52:44.6790844Z shr.s16 %rs403, %rs402, 4; 2026-02-21T08:52:44.6790915Z selp.b16 %rs404, %rs386, %rs377, %p64; 2026-02-21T08:52:44.6790982Z cvt.s16.s8 %rs405, %rs404; 2026-02-21T08:52:44.6791042Z shr.s16 %rs406, %rs405, 4; 2026-02-21T08:52:44.6791110Z selp.b16 %rs407, %rs387, %rs370, %p64; 2026-02-21T08:52:44.6791174Z cvt.s16.s8 %rs408, %rs407; 2026-02-21T08:52:44.6791234Z shr.s16 %rs409, %rs408, 4; 2026-02-21T08:52:44.6791301Z selp.b16 %rs410, %rs388, %rs378, %p64; 2026-02-21T08:52:44.6791361Z cvt.s16.s8 %rs411, %rs410; 2026-02-21T08:52:44.6791428Z shr.s16 %rs412, %rs411, 4; 2026-02-21T08:52:44.6791496Z selp.b16 %rs413, %rs389, %rs371, %p64; 2026-02-21T08:52:44.6791556Z cvt.s16.s8 %rs414, %rs413; 2026-02-21T08:52:44.6791633Z shr.s16 %rs415, %rs414, 4; 2026-02-21T08:52:44.6791705Z selp.b16 %rs416, %rs390, %rs379, %p64; 2026-02-21T08:52:44.6791766Z cvt.s16.s8 %rs417, %rs416; 2026-02-21T08:52:44.6791830Z shr.s16 %rs418, %rs417, 4; 2026-02-21T08:52:44.6791900Z selp.b16 %rs419, %rs391, %rs372, %p64; 
2026-02-21T08:52:44.6791960Z cvt.s16.s8 %rs420, %rs419; 2026-02-21T08:52:44.6792020Z shr.s16 %rs421, %rs420, 4; 2026-02-21T08:52:44.6792091Z selp.b16 %rs422, %rs392, %rs380, %p64; 2026-02-21T08:52:44.6792151Z cvt.s16.s8 %rs423, %rs422; 2026-02-21T08:52:44.6792210Z shr.s16 %rs424, %rs423, 4; 2026-02-21T08:52:44.6792278Z selp.b16 %rs425, %rs393, %rs373, %p64; 2026-02-21T08:52:44.6792341Z cvt.s16.s8 %rs426, %rs425; 2026-02-21T08:52:44.6792401Z shr.s16 %rs427, %rs426, 4; 2026-02-21T08:52:44.6792467Z selp.b16 %rs428, %rs394, %rs381, %p64; 2026-02-21T08:52:44.6792532Z cvt.s16.s8 %rs429, %rs428; 2026-02-21T08:52:44.6792591Z shr.s16 %rs430, %rs429, 4; 2026-02-21T08:52:44.6792657Z selp.b16 %rs431, %rs395, %rs374, %p64; 2026-02-21T08:52:44.6792723Z cvt.s16.s8 %rs432, %rs431; 2026-02-21T08:52:44.6792785Z shr.s16 %rs433, %rs432, 4; 2026-02-21T08:52:44.6792852Z selp.b16 %rs434, %rs396, %rs382, %p64; 2026-02-21T08:52:44.6792917Z cvt.s16.s8 %rs435, %rs434; 2026-02-21T08:52:44.6792983Z shr.s16 %rs436, %rs435, 4; 2026-02-21T08:52:44.6793050Z selp.b16 %rs437, %rs397, %rs375, %p64; 2026-02-21T08:52:44.6793109Z cvt.s16.s8 %rs438, %rs437; 2026-02-21T08:52:44.6793172Z shr.s16 %rs439, %rs438, 4; 2026-02-21T08:52:44.6793240Z selp.b16 %rs440, %rs398, %rs383, %p64; 2026-02-21T08:52:44.6793310Z cvt.s16.s8 %rs441, %rs440; 2026-02-21T08:52:44.6793375Z shr.s16 %rs442, %rs441, 4; 2026-02-21T08:52:44.6793448Z selp.b16 %rs443, %rs399, %rs376, %p64; 2026-02-21T08:52:44.6793509Z cvt.s16.s8 %rs444, %rs443; 2026-02-21T08:52:44.6793569Z shr.s16 %rs445, %rs444, 4; 2026-02-21T08:52:44.6793641Z selp.b16 %rs446, %rs400, %rs384, %p64; 2026-02-21T08:52:44.6793702Z cvt.s16.s8 %rs447, %rs446; 2026-02-21T08:52:44.6793763Z shr.s16 %rs448, %rs447, 4; 2026-02-21T08:52:44.6793974Z .loc 1 77 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:77:32 2026-02-21T08:52:44.6794039Z cvt.rn.f32.s16 %r2364, %rs403; 2026-02-21T08:52:44.6794213Z cvt.rn.f32.s16 %r2365, %rs406; 2026-02-21T08:52:44.6794276Z 
cvt.rn.f32.s16 %r2366, %rs409; 2026-02-21T08:52:44.6794344Z cvt.rn.f32.s16 %r2367, %rs412; 2026-02-21T08:52:44.6794405Z cvt.rn.f32.s16 %r2368, %rs415; 2026-02-21T08:52:44.6794468Z cvt.rn.f32.s16 %r2369, %rs418; 2026-02-21T08:52:44.6794536Z cvt.rn.f32.s16 %r2370, %rs421; 2026-02-21T08:52:44.6794597Z cvt.rn.f32.s16 %r2371, %rs424; 2026-02-21T08:52:44.6794659Z cvt.rn.f32.s16 %r2372, %rs427; 2026-02-21T08:52:44.6794721Z cvt.rn.f32.s16 %r2373, %rs430; 2026-02-21T08:52:44.6794788Z cvt.rn.f32.s16 %r2374, %rs433; 2026-02-21T08:52:44.6794849Z cvt.rn.f32.s16 %r2375, %rs436; 2026-02-21T08:52:44.6794910Z cvt.rn.f32.s16 %r2376, %rs439; 2026-02-21T08:52:44.6794974Z cvt.rn.f32.s16 %r2377, %rs442; 2026-02-21T08:52:44.6795131Z cvt.rn.f32.s16 %r2378, %rs445; 2026-02-21T08:52:44.6795196Z cvt.rn.f32.s16 %r2379, %rs448; 2026-02-21T08:52:44.6795255Z bar.sync 0; 2026-02-21T08:52:44.6795320Z st.shared.b32 [%r241], %r2364; 2026-02-21T08:52:44.6795393Z st.shared.b32 [%r241+8192], %r2372; 2026-02-21T08:52:44.6795456Z st.shared.b32 [%r242], %r2365; 2026-02-21T08:52:44.6795527Z st.shared.b32 [%r242+8192], %r2373; 2026-02-21T08:52:44.6795591Z st.shared.b32 [%r243], %r2366; 2026-02-21T08:52:44.6795656Z st.shared.b32 [%r243+8192], %r2374; 2026-02-21T08:52:44.6795726Z st.shared.b32 [%r244], %r2367; 2026-02-21T08:52:44.6795800Z st.shared.b32 [%r244+8192], %r2375; 2026-02-21T08:52:44.6795866Z st.shared.b32 [%r245], %r2368; 2026-02-21T08:52:44.6795930Z st.shared.b32 [%r245+8192], %r2376; 2026-02-21T08:52:44.6795997Z st.shared.b32 [%r246], %r2369; 2026-02-21T08:52:44.6796062Z st.shared.b32 [%r246+8192], %r2377; 2026-02-21T08:52:44.6796125Z st.shared.b32 [%r247], %r2370; 2026-02-21T08:52:44.6796194Z st.shared.b32 [%r247+8192], %r2378; 2026-02-21T08:52:44.6796258Z st.shared.b32 [%r248], %r2371; 2026-02-21T08:52:44.6796322Z st.shared.b32 [%r248+8192], %r2379; 2026-02-21T08:52:44.6796379Z $L__tmp7: 2026-02-21T08:52:44.6796796Z .loc 2 291 36 // standard.py:291:36 @[ 
cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:84:40 ] 2026-02-21T08:52:44.6796861Z // begin inline asm 2026-02-21T08:52:44.6796940Z fence.proxy.async.shared::cta; 2026-02-21T08:52:44.6797011Z // end inline asm 2026-02-21T08:52:44.6797067Z bar.sync 0; 2026-02-21T08:52:44.6797150Z shfl.sync.idx.b32 %r2380, %r4, 0, 31, -1; 2026-02-21T08:52:44.6797228Z wgmma.fence.sync.aligned; 2026-02-21T08:52:44.6797289Z shl.b32 %r2381, %r2380, 10; 2026-02-21T08:52:44.6797350Z and.b32 %r2382, %r2381, 4096; 2026-02-21T08:52:44.6797413Z add.s32 %r2383, %r2382, %r1912; 2026-02-21T08:52:44.6797478Z bfe.u32 %r2384, %r2383, 4, 14; 2026-02-21T08:52:44.6797541Z cvt.u64.u32 %rd230, %r2384; 2026-02-21T08:52:44.6797620Z or.b64 %rd217, %rd230, 4611686293338849280; 2026-02-21T08:52:44.6797689Z mov.pred %p49, -1; 2026-02-21T08:52:44.6797751Z // begin inline asm 2026-02-21T08:52:44.6798274Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2532,%r2533,%r2534,%r2535,%r2536,%r2537,%r2538,%r2539,%r2540,%r2541,%r2542,%r2543,%r2544,%r2545,%r2546,%r2547}, {%r2022,%r2023,%r2024,%r2025}, %rd217, %p49, 1, 1; 2026-02-21T08:52:44.6798337Z // end inline asm 2026-02-21T08:52:44.6798397Z add.s32 %r2385, %r2383, 32; 2026-02-21T08:52:44.6798457Z bfe.u32 %r2386, %r2385, 4, 14; 2026-02-21T08:52:44.6798519Z cvt.u64.u32 %rd231, %r2386; 2026-02-21T08:52:44.6798599Z or.b64 %rd218, %rd231, 4611686293338849280; 2026-02-21T08:52:44.6798659Z // begin inline asm 2026-02-21T08:52:44.6799168Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2532,%r2533,%r2534,%r2535,%r2536,%r2537,%r2538,%r2539,%r2540,%r2541,%r2542,%r2543,%r2544,%r2545,%r2546,%r2547}, {%r2058,%r2059,%r2060,%r2061}, %rd218, %p49, 1, 1; 2026-02-21T08:52:44.6799232Z // end inline asm 2026-02-21T08:52:44.6799292Z add.s32 %r2387, %r2383, 64; 2026-02-21T08:52:44.6799352Z bfe.u32 %r2388, %r2387, 4, 14; 2026-02-21T08:52:44.6799418Z cvt.u64.u32 %rd232, %r2388; 2026-02-21T08:52:44.6799629Z or.b64 %rd219, %rd232, 4611686293338849280; 
2026-02-21T08:52:44.6799691Z // begin inline asm 2026-02-21T08:52:44.6800196Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2532,%r2533,%r2534,%r2535,%r2536,%r2537,%r2538,%r2539,%r2540,%r2541,%r2542,%r2543,%r2544,%r2545,%r2546,%r2547}, {%r2094,%r2095,%r2096,%r2097}, %rd219, %p49, 1, 1; 2026-02-21T08:52:44.6800258Z // end inline asm 2026-02-21T08:52:44.6800318Z add.s32 %r2389, %r2383, 96; 2026-02-21T08:52:44.6800377Z bfe.u32 %r2390, %r2389, 4, 14; 2026-02-21T08:52:44.6800446Z cvt.u64.u32 %rd233, %r2390; 2026-02-21T08:52:44.6800517Z or.b64 %rd220, %rd233, 4611686293338849280; 2026-02-21T08:52:44.6800575Z // begin inline asm 2026-02-21T08:52:44.6801212Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2532,%r2533,%r2534,%r2535,%r2536,%r2537,%r2538,%r2539,%r2540,%r2541,%r2542,%r2543,%r2544,%r2545,%r2546,%r2547}, {%r2130,%r2131,%r2132,%r2133}, %rd220, %p49, 1, 1; 2026-02-21T08:52:44.6801277Z // end inline asm 2026-02-21T08:52:44.6801344Z add.s32 %r2391, %r2383, 8192; 2026-02-21T08:52:44.6801407Z bfe.u32 %r2392, %r2391, 4, 14; 2026-02-21T08:52:44.6801474Z cvt.u64.u32 %rd234, %r2392; 2026-02-21T08:52:44.6801546Z or.b64 %rd221, %rd234, 4611686293338849280; 2026-02-21T08:52:44.6801604Z // begin inline asm 2026-02-21T08:52:44.6802110Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2532,%r2533,%r2534,%r2535,%r2536,%r2537,%r2538,%r2539,%r2540,%r2541,%r2542,%r2543,%r2544,%r2545,%r2546,%r2547}, {%r2166,%r2167,%r2168,%r2169}, %rd221, %p49, 1, 1; 2026-02-21T08:52:44.6802167Z // end inline asm 2026-02-21T08:52:44.6802228Z add.s32 %r2393, %r2383, 8224; 2026-02-21T08:52:44.6802294Z bfe.u32 %r2394, %r2393, 4, 14; 2026-02-21T08:52:44.6802355Z cvt.u64.u32 %rd235, %r2394; 2026-02-21T08:52:44.6802428Z or.b64 %rd222, %rd235, 4611686293338849280; 2026-02-21T08:52:44.6802487Z // begin inline asm 2026-02-21T08:52:44.6802988Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r2532,%r2533,%r2534,%r2535,%r2536,%r2537,%r2538,%r2539,%r2540,%r2541,%r2542,%r2543,%r2544,%r2545,%r2546,%r2547}, {%r2202,%r2203,%r2204,%r2205}, %rd222, %p49, 1, 1; 2026-02-21T08:52:44.6803049Z // end inline asm 2026-02-21T08:52:44.6803108Z add.s32 %r2395, %r2383, 8256; 2026-02-21T08:52:44.6803173Z bfe.u32 %r2396, %r2395, 4, 14; 2026-02-21T08:52:44.6803234Z cvt.u64.u32 %rd236, %r2396; 2026-02-21T08:52:44.6803305Z or.b64 %rd223, %rd236, 4611686293338849280; 2026-02-21T08:52:44.6803369Z // begin inline asm 2026-02-21T08:52:44.6803873Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2532,%r2533,%r2534,%r2535,%r2536,%r2537,%r2538,%r2539,%r2540,%r2541,%r2542,%r2543,%r2544,%r2545,%r2546,%r2547}, {%r2238,%r2239,%r2240,%r2241}, %rd223, %p49, 1, 1; 2026-02-21T08:52:44.6803929Z // end inline asm 2026-02-21T08:52:44.6803995Z add.s32 %r2397, %r2383, 8288; 2026-02-21T08:52:44.6804058Z bfe.u32 %r2398, %r2397, 4, 14; 2026-02-21T08:52:44.6804117Z cvt.u64.u32 %rd237, %r2398; 2026-02-21T08:52:44.6804188Z or.b64 %rd224, %rd237, 4611686293338849280; 2026-02-21T08:52:44.6804253Z // begin inline asm 2026-02-21T08:52:44.6804754Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2532,%r2533,%r2534,%r2535,%r2536,%r2537,%r2538,%r2539,%r2540,%r2541,%r2542,%r2543,%r2544,%r2545,%r2546,%r2547}, {%r2274,%r2275,%r2276,%r2277}, %rd224, %p49, 1, 1; 2026-02-21T08:52:44.6804815Z // end inline asm 2026-02-21T08:52:44.6804899Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:44.6804956Z mov.b32 %r2295, 0; 2026-02-21T08:52:44.6805017Z mov.b32 %r2294, %r1912; 2026-02-21T08:52:44.6805082Z mov.b32 %r2296, %r2295; 2026-02-21T08:52:44.6805153Z // begin inline asm 2026-02-21T08:52:44.6805467Z // wait for regs: %r2532,%r2533,%r2534,%r2535,%r2536,%r2537,%r2538,%r2539,%r2540,%r2541,%r2542,%r2543,%r2544,%r2545,%r2546,%r2547,%r2294,%r2295,%r2296 2026-02-21T08:52:44.6805545Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:44.6805607Z // end inline asm 2026-02-21T08:52:44.6805660Z $L__tmp8: 
2026-02-21T08:52:44.6805882Z .loc 1 19 144 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:19:144 2026-02-21T08:52:44.6806074Z add.s32 %r2399, %r259, 32; 2026-02-21T08:52:44.6806133Z add.s32 %r2400, %r2530, 1; 2026-02-21T08:52:44.6806199Z setp.gt.s32 %p61, %r2400, 1; 2026-02-21T08:52:44.6806272Z selp.b32 %r2530, 0, %r2400, %p61; 2026-02-21T08:52:44.6806336Z selp.b32 %r2527, 0, %r2399, %p57; 2026-02-21T08:52:44.6806757Z .loc 1 45 22 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:45:22 2026-02-21T08:52:44.6806842Z shl.b32 %r2401, %r2527, 1; 2026-02-21T08:52:44.6807059Z .loc 1 47 25 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:47:25 2026-02-21T08:52:44.6807125Z add.s32 %r2402, %r2401, %r13; 2026-02-21T08:52:44.6807489Z .loc 1 48 53 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:53 2026-02-21T08:52:44.6807562Z shl.b32 %r2403, %r2559, 13; 2026-02-21T08:52:44.6807623Z shl.b32 %r2404, %r2560, 13; 2026-02-21T08:52:44.6807689Z shl.b32 %r2405, %r2561, 13; 2026-02-21T08:52:44.6807752Z shl.b32 %r2406, %r2562, 13; 2026-02-21T08:52:44.6807959Z .loc 1 48 60 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:60 2026-02-21T08:52:44.6808022Z add.s32 %r2407, %r2403, %r2402; 2026-02-21T08:52:44.6808083Z add.s32 %r2408, %r2404, %r2402; 2026-02-21T08:52:44.6808148Z add.s32 %r2409, %r2405, %r2402; 2026-02-21T08:52:44.6808208Z add.s32 %r2410, %r2406, %r2402; 2026-02-21T08:52:44.6808411Z .loc 1 48 32 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:32 2026-02-21T08:52:44.6808495Z mad.wide.s32 %rd225, %r2407, 2, %rd37; 2026-02-21T08:52:44.6808570Z mad.wide.s32 %rd226, %r2408, 2, %rd37; 2026-02-21T08:52:44.6808638Z mad.wide.s32 %rd227, %r2409, 2, %rd37; 2026-02-21T08:52:44.6808711Z mad.wide.s32 %rd228, %r2410, 2, %rd37; 2026-02-21T08:52:44.6808915Z .loc 1 48 80 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:48:80 2026-02-21T08:52:44.6808980Z shl.b32 %r2411, %r2530, 13; 2026-02-21T08:52:44.6809041Z add.s32 
%r2412, %r2442, %r2411; 2026-02-21T08:52:44.6809107Z add.s32 %r2316, %r2412, %r19; 2026-02-21T08:52:44.6809172Z selp.b32 %r2317, 8, 0, %p58; 2026-02-21T08:52:44.6809232Z // begin inline asm 2026-02-21T08:52:44.6809381Z cp.async.ca.shared.global [ %r2316 + 0 ], [ %rd225 + 0 ], 0x8, %r2317; 2026-02-21T08:52:44.6809441Z // end inline asm 2026-02-21T08:52:44.6809502Z add.s32 %r2318, %r2412, %r2520; 2026-02-21T08:52:44.6809561Z // begin inline asm 2026-02-21T08:52:44.6809700Z cp.async.ca.shared.global [ %r2318 + 0 ], [ %rd226 + 0 ], 0x8, %r2317; 2026-02-21T08:52:44.6809756Z // end inline asm 2026-02-21T08:52:44.6809817Z add.s32 %r2320, %r2412, %r2519; 2026-02-21T08:52:44.6809881Z // begin inline asm 2026-02-21T08:52:44.6810026Z cp.async.ca.shared.global [ %r2320 + 0 ], [ %rd227 + 0 ], 0x8, %r2317; 2026-02-21T08:52:44.6810086Z // end inline asm 2026-02-21T08:52:44.6810153Z add.s32 %r2322, %r2412, %r2518; 2026-02-21T08:52:44.6810219Z // begin inline asm 2026-02-21T08:52:44.6810359Z cp.async.ca.shared.global [ %r2322 + 0 ], [ %rd228 + 0 ], 0x8, %r2317; 2026-02-21T08:52:44.6810418Z // end inline asm 2026-02-21T08:52:44.6810493Z cp.async.commit_group; 2026-02-21T08:52:44.6810735Z .loc 1 19 144 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:19:144 2026-02-21T08:52:44.6810807Z setp.ne.b32 %p62, %r2525, 127; 2026-02-21T08:52:44.6810873Z @%p62 bra $L__BB0_16; 2026-02-21T08:52:44.6810989Z // %bb.15: // in Loop: Header=BB0_12 Depth=1 2026-02-21T08:52:44.6811200Z .loc 1 87 28 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:87:28 2026-02-21T08:52:44.6811284Z cvt.rn.bf16x2.f32 %r2432, %r2533, %r2532; 2026-02-21T08:52:44.6811362Z cvt.rn.bf16x2.f32 %r2433, %r2535, %r2534; 2026-02-21T08:52:44.6811433Z cvt.rn.bf16x2.f32 %r2434, %r2537, %r2536; 2026-02-21T08:52:44.6811506Z cvt.rn.bf16x2.f32 %r2435, %r2539, %r2538; 2026-02-21T08:52:44.6811716Z cvt.rn.bf16x2.f32 %r2436, %r2541, %r2540; 2026-02-21T08:52:44.6811788Z cvt.rn.bf16x2.f32 %r2437, %r2543, %r2542; 
2026-02-21T08:52:44.6811859Z cvt.rn.bf16x2.f32 %r2438, %r2545, %r2544; 2026-02-21T08:52:44.6811933Z cvt.rn.bf16x2.f32 %r2439, %r2547, %r2546; 2026-02-21T08:52:44.6812149Z .loc 1 88 50 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:88:50 2026-02-21T08:52:44.6812222Z mad.lo.s32 %r2440, %r2553, 7168, %r2526; 2026-02-21T08:52:44.6812298Z mad.lo.s32 %r2441, %r2554, 7168, %r2526; 2026-02-21T08:52:44.6812507Z .loc 1 88 22 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:88:22 2026-02-21T08:52:44.6812579Z mad.wide.s32 %rd238, %r2440, 2, %rd39; 2026-02-21T08:52:44.6812738Z mad.wide.s32 %rd239, %r2441, 2, %rd39; 2026-02-21T08:52:44.6812953Z .loc 1 88 81 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:88:81 2026-02-21T08:52:44.6813013Z bar.sync 0; 2026-02-21T08:52:44.6813133Z st.shared.v4.b32 [%r249], {%r2432, %r2434, %r2436, %r2438}; 2026-02-21T08:52:44.6813261Z st.shared.v4.b32 [%r249+256], {%r2433, %r2435, %r2437, %r2439}; 2026-02-21T08:52:44.6813318Z bar.sync 0; 2026-02-21T08:52:44.6813380Z // begin inline asm 2026-02-21T08:52:44.6813594Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2423, %r2424, %r2425, %r2426}, [%r2417]; 2026-02-21T08:52:44.6813653Z // end inline asm 2026-02-21T08:52:44.6813710Z // begin inline asm 2026-02-21T08:52:44.6813898Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2427, %r2428, %r2429, %r2430}, [%r2422]; 2026-02-21T08:52:44.6813961Z // end inline asm 2026-02-21T08:52:44.6814020Z // begin inline asm 2026-02-21T08:52:44.6814149Z st.global.v4.b32 [ %rd238 + 0 ], { %r2423, %r2424, %r2425, %r2426 }; 2026-02-21T08:52:44.6814214Z // end inline asm 2026-02-21T08:52:44.6814273Z // begin inline asm 2026-02-21T08:52:44.6814393Z st.global.v4.b32 [ %rd239 + 0 ], { %r2427, %r2428, %r2429, %r2430 }; 2026-02-21T08:52:44.6814457Z // end inline asm 2026-02-21T08:52:44.6814520Z mov.b32 %r2532, 0f00000000; 2026-02-21T08:52:44.6814580Z mov.b32 %r2533, %r2532; 2026-02-21T08:52:44.6814638Z mov.b32 %r2534, %r2532; 
2026-02-21T08:52:44.6814700Z mov.b32 %r2535, %r2532; 2026-02-21T08:52:44.6814758Z mov.b32 %r2536, %r2532; 2026-02-21T08:52:44.6814815Z mov.b32 %r2537, %r2532; 2026-02-21T08:52:44.6814876Z mov.b32 %r2538, %r2532; 2026-02-21T08:52:44.6814932Z mov.b32 %r2539, %r2532; 2026-02-21T08:52:44.6814988Z mov.b32 %r2540, %r2532; 2026-02-21T08:52:44.6815045Z mov.b32 %r2541, %r2532; 2026-02-21T08:52:44.6815108Z mov.b32 %r2542, %r2532; 2026-02-21T08:52:44.6815165Z mov.b32 %r2543, %r2532; 2026-02-21T08:52:44.6815223Z mov.b32 %r2544, %r2532; 2026-02-21T08:52:44.6815284Z mov.b32 %r2545, %r2532; 2026-02-21T08:52:44.6815344Z mov.b32 %r2546, %r2532; 2026-02-21T08:52:44.6815413Z mov.b32 %r2547, %r2532; 2026-02-21T08:52:44.6815479Z bra.uni $L__BB0_16; 2026-02-21T08:52:44.6815583Z $L__BB0_17: // %._crit_edge98 2026-02-21T08:52:44.6815810Z .loc 1 19 144 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:19:144 2026-02-21T08:52:44.6815878Z cp.async.wait_group 0; 2026-02-21T08:52:44.6815943Z bar.sync 0; 2026-02-21T08:52:44.6816151Z .loc 1 19 4 // cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py:19:4 2026-02-21T08:52:44.6816203Z ret; 2026-02-21T08:52:44.6816261Z $L__tmp9: 2026-02-21T08:52:44.6816317Z $L__func_end0: 2026-02-21T08:52:44.6816404Z // -- End function 2026-02-21T08:52:44.6816578Z } 2026-02-21T08:52:44.6816857Z .file 1 "/tmp/torchinductor_root/st/cstrmtovnixsakzr3okc2wbyrps2dybinakvapv7nd5bhfgkozgd.py" 2026-02-21T08:52:44.6817077Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T08:52:44.6817143Z .section .debug_abbrev 2026-02-21T08:52:44.6817200Z { 2026-02-21T08:52:44.6817296Z .b8 1 // Abbreviation Code 2026-02-21T08:52:44.6817527Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:52:44.6817612Z .b8 1 // DW_CHILDREN_yes 2026-02-21T08:52:44.6817701Z .b8 37 // DW_AT_producer 2026-02-21T08:52:44.6817779Z .b8 8 // DW_FORM_string 2026-02-21T08:52:44.6817858Z .b8 19 // DW_AT_language 2026-02-21T08:52:44.6817943Z .b8 5 // 
DW_FORM_data2 2026-02-21T08:52:44.6818021Z .b8 3 // DW_AT_name 2026-02-21T08:52:44.6818100Z .b8 8 // DW_FORM_string 2026-02-21T08:52:44.6818186Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:52:44.6818391Z .b8 6 // DW_FORM_data4 2026-02-21T08:52:44.6818491Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:52:44.6818581Z .b8 8 // DW_FORM_string 2026-02-21T08:52:44.6818661Z .b8 0 // EOM(1) 2026-02-21T08:52:44.6818736Z .b8 0 // EOM(2) 2026-02-21T08:52:44.6818828Z .b8 2 // Abbreviation Code 2026-02-21T08:52:44.6818922Z .b8 46 // DW_TAG_subprogram 2026-02-21T08:52:44.6819004Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:52:44.6819082Z .b8 3 // DW_AT_name 2026-02-21T08:52:44.6819165Z .b8 8 // DW_FORM_string 2026-02-21T08:52:44.6819245Z .b8 32 // DW_AT_inline 2026-02-21T08:52:44.6819325Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:44.6819401Z .b8 0 // EOM(1) 2026-02-21T08:52:44.6819472Z .b8 0 // EOM(2) 2026-02-21T08:52:44.6819558Z .b8 3 // Abbreviation Code 2026-02-21T08:52:44.6819646Z .b8 46 // DW_TAG_subprogram 2026-02-21T08:52:44.6819733Z .b8 1 // DW_CHILDREN_yes 2026-02-21T08:52:44.6819813Z .b8 17 // DW_AT_low_pc 2026-02-21T08:52:44.6819889Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:44.6819974Z .b8 18 // DW_AT_high_pc 2026-02-21T08:52:44.6820050Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:44.6820143Z .b8 49 // DW_AT_abstract_origin 2026-02-21T08:52:44.6820224Z .b8 19 // DW_FORM_ref4 2026-02-21T08:52:44.6820296Z .b8 0 // EOM(1) 2026-02-21T08:52:44.6820365Z .b8 0 // EOM(2) 2026-02-21T08:52:44.6820450Z .b8 4 // Abbreviation Code 2026-02-21T08:52:44.6820556Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T08:52:44.6820642Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:52:44.6820731Z .b8 49 // DW_AT_abstract_origin 2026-02-21T08:52:44.6820812Z .b8 19 // DW_FORM_ref4 2026-02-21T08:52:44.6820890Z .b8 17 // DW_AT_low_pc 2026-02-21T08:52:44.6820964Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:44.6821052Z .b8 18 // DW_AT_high_pc 2026-02-21T08:52:44.6821127Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:44.6821209Z .b8 88 // 
DW_AT_call_file 2026-02-21T08:52:44.6821292Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:44.6821372Z .b8 89 // DW_AT_call_line 2026-02-21T08:52:44.6821449Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:44.6821648Z .b8 87 // DW_AT_call_column 2026-02-21T08:52:44.6821732Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:44.6821803Z .b8 0 // EOM(1) 2026-02-21T08:52:44.6821871Z .b8 0 // EOM(2) 2026-02-21T08:52:44.6821945Z .b8 0 // EOM(3) 2026-02-21T08:52:44.6821996Z } 2026-02-21T08:52:44.6822059Z .section .debug_info 2026-02-21T08:52:44.6822110Z { 2026-02-21T08:52:44.6822207Z .b32 178 // Length of Unit 2026-02-21T08:52:44.6822299Z .b8 2 // DWARF version number 2026-02-21T08:52:44.6822351Z .b8 0 2026-02-21T08:52:44.6822487Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:52:44.6822673Z .b8 8 // Address Size (in bytes) 2026-02-21T08:52:44.6822791Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T08:52:44.6822885Z .b8 116 // DW_AT_producer 2026-02-21T08:52:44.6822939Z .b8 114 2026-02-21T08:52:44.6822991Z .b8 105 2026-02-21T08:52:44.6823042Z .b8 116 2026-02-21T08:52:44.6823109Z .b8 111 2026-02-21T08:52:44.6823163Z .b8 110 2026-02-21T08:52:44.6823217Z .b8 0 2026-02-21T08:52:44.6823299Z .b8 2 // DW_AT_language 2026-02-21T08:52:44.6823351Z .b8 0 2026-02-21T08:52:44.6823431Z .b8 99 // DW_AT_name 2026-02-21T08:52:44.6823483Z .b8 115 2026-02-21T08:52:44.6823538Z .b8 116 2026-02-21T08:52:44.6823589Z .b8 114 2026-02-21T08:52:44.6823640Z .b8 109 2026-02-21T08:52:44.6823693Z .b8 116 2026-02-21T08:52:44.6823745Z .b8 111 2026-02-21T08:52:44.6823796Z .b8 118 2026-02-21T08:52:44.6823846Z .b8 110 2026-02-21T08:52:44.6823903Z .b8 105 2026-02-21T08:52:44.6823958Z .b8 120 2026-02-21T08:52:44.6824014Z .b8 115 2026-02-21T08:52:44.6824073Z .b8 97 2026-02-21T08:52:44.6824124Z .b8 107 2026-02-21T08:52:44.6824178Z .b8 122 2026-02-21T08:52:44.6824229Z .b8 114 2026-02-21T08:52:44.6824284Z .b8 51 2026-02-21T08:52:44.6824336Z .b8 111 2026-02-21T08:52:44.6824386Z .b8 107 
2026-02-21T08:52:44.6824437Z .b8 99 2026-02-21T08:52:44.6824490Z .b8 50 2026-02-21T08:52:44.6824541Z .b8 119 2026-02-21T08:52:44.6824591Z .b8 98 2026-02-21T08:52:44.6824658Z .b8 121 2026-02-21T08:52:44.6824711Z .b8 114 2026-02-21T08:52:44.6824763Z .b8 112 2026-02-21T08:52:44.6824814Z .b8 115 2026-02-21T08:52:44.6824870Z .b8 50 2026-02-21T08:52:44.6824921Z .b8 100 2026-02-21T08:52:44.6824972Z .b8 121 2026-02-21T08:52:44.6825026Z .b8 98 2026-02-21T08:52:44.6825077Z .b8 105 2026-02-21T08:52:44.6825128Z .b8 110 2026-02-21T08:52:44.6825179Z .b8 97 2026-02-21T08:52:44.6825234Z .b8 107 2026-02-21T08:52:44.6825285Z .b8 118 2026-02-21T08:52:44.6825335Z .b8 97 2026-02-21T08:52:44.6825390Z .b8 112 2026-02-21T08:52:44.6825443Z .b8 118 2026-02-21T08:52:44.6825493Z .b8 55 2026-02-21T08:52:44.6825544Z .b8 110 2026-02-21T08:52:44.6825598Z .b8 100 2026-02-21T08:52:44.6825652Z .b8 53 2026-02-21T08:52:44.6825702Z .b8 98 2026-02-21T08:52:44.6825753Z .b8 104 2026-02-21T08:52:44.6825813Z .b8 102 2026-02-21T08:52:44.6825864Z .b8 103 2026-02-21T08:52:44.6825917Z .b8 107 2026-02-21T08:52:44.6825971Z .b8 111 2026-02-21T08:52:44.6826022Z .b8 122 2026-02-21T08:52:44.6826074Z .b8 103 2026-02-21T08:52:44.6826126Z .b8 100 2026-02-21T08:52:44.6826180Z .b8 46 2026-02-21T08:52:44.6826232Z .b8 112 2026-02-21T08:52:44.6826283Z .b8 121 2026-02-21T08:52:44.6826337Z .b8 0 2026-02-21T08:52:44.6826441Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:52:44.6826652Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:52:44.6826706Z .b8 116 2026-02-21T08:52:44.6826764Z .b8 109 2026-02-21T08:52:44.6826816Z .b8 112 2026-02-21T08:52:44.6826877Z .b8 47 2026-02-21T08:52:44.6826937Z .b8 116 2026-02-21T08:52:44.6826993Z .b8 111 2026-02-21T08:52:44.6827046Z .b8 114 2026-02-21T08:52:44.6827098Z .b8 99 2026-02-21T08:52:44.6827156Z .b8 104 2026-02-21T08:52:44.6827361Z .b8 105 2026-02-21T08:52:44.6827414Z .b8 110 2026-02-21T08:52:44.6827465Z .b8 100 2026-02-21T08:52:44.6827520Z .b8 117 2026-02-21T08:52:44.6827571Z .b8 99 
2026-02-21T08:52:44.6827624Z .b8 116 2026-02-21T08:52:44.6827681Z .b8 111 2026-02-21T08:52:44.6827731Z .b8 114 2026-02-21T08:52:44.6827782Z .b8 95 2026-02-21T08:52:44.6827834Z .b8 114 2026-02-21T08:52:44.6827891Z .b8 111 2026-02-21T08:52:44.6827947Z .b8 111 2026-02-21T08:52:44.6827999Z .b8 116 2026-02-21T08:52:44.6828054Z .b8 47 2026-02-21T08:52:44.6828106Z .b8 115 2026-02-21T08:52:44.6828157Z .b8 116 2026-02-21T08:52:44.6828207Z .b8 0 2026-02-21T08:52:44.6828337Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T08:52:44.6828418Z .b8 95 // DW_AT_name 2026-02-21T08:52:44.6828669Z .b8 104 2026-02-21T08:52:44.6828739Z .b8 101 2026-02-21T08:52:44.6828793Z .b8 108 2026-02-21T08:52:44.6828847Z .b8 105 2026-02-21T08:52:44.6828900Z .b8 111 2026-02-21T08:52:44.6828955Z .b8 110 2026-02-21T08:52:44.6829010Z .b8 95 2026-02-21T08:52:44.6829063Z .b8 109 2026-02-21T08:52:44.6829120Z .b8 97 2026-02-21T08:52:44.6829171Z .b8 116 2026-02-21T08:52:44.6829223Z .b8 109 2026-02-21T08:52:44.6829274Z .b8 117 2026-02-21T08:52:44.6829333Z .b8 108 2026-02-21T08:52:44.6829383Z .b8 95 2026-02-21T08:52:44.6829433Z .b8 98 2026-02-21T08:52:44.6829484Z .b8 102 2026-02-21T08:52:44.6829538Z .b8 49 2026-02-21T08:52:44.6829589Z .b8 54 2026-02-21T08:52:44.6829639Z .b8 95 2026-02-21T08:52:44.6829695Z .b8 105 2026-02-21T08:52:44.6829746Z .b8 110 2026-02-21T08:52:44.6829798Z .b8 116 2026-02-21T08:52:44.6829850Z .b8 52 2026-02-21T08:52:44.6829906Z .b8 0 2026-02-21T08:52:44.6829988Z .b8 1 // DW_AT_inline 2026-02-21T08:52:44.6830096Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T08:52:44.6830195Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T08:52:44.6830289Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T08:52:44.6830390Z .b32 108 // DW_AT_abstract_origin 2026-02-21T08:52:44.6830526Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T08:52:44.6830622Z .b32 108 // DW_AT_abstract_origin 2026-02-21T08:52:44.6830709Z .b64 $L__tmp1 // DW_AT_low_pc 
2026-02-21T08:52:44.6830796Z .b64 $L__tmp8 // DW_AT_high_pc 2026-02-21T08:52:44.6830884Z .b8 1 // DW_AT_call_file 2026-02-21T08:52:44.6830978Z .b8 84 // DW_AT_call_line 2026-02-21T08:52:44.6831068Z .b8 40 // DW_AT_call_column 2026-02-21T08:52:44.6831163Z .b8 0 // End Of Children Mark 2026-02-21T08:52:44.6831252Z .b8 0 // End Of Children Mark 2026-02-21T08:52:44.6831308Z } 2026-02-21T08:52:44.6831384Z .section .debug_macinfo { } 2026-02-21T08:52:44.6831393Z 2026-02-21T08:52:44.6831475Z ================================================================ 2026-02-21T08:52:44.6831600Z please share the reproducer above with Triton project. 2026-02-21T08:52:45.2578938Z 2026-02-21T08:52:45.2578953Z 2026-02-21T08:52:45.2578960Z 2026-02-21T08:52:45.2592470Z ================================================================ 2026-02-21T08:52:45.2592809Z Internal Triton PTX codegen error 2026-02-21T08:52:45.2593042Z `ptxas` stderr: 2026-02-21T08:52:45.2593721Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 508 in function _helion_matmul_bf16_int4. Try to compile with register target of 46 or higher. 
2026-02-21T08:52:45.2594462Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:52:45.2594702Z 2026-02-21T08:52:45.2595376Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp8t89bgcl.ptx -o /tmp/tmp8t89bgcl.ptx.o 2026-02-21T08:52:45.2596670Z 2026-02-21T08:52:45.2596675Z 2026-02-21T08:52:45.2596767Z // 2026-02-21T08:52:45.2596951Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:52:45.2597201Z // 2026-02-21T08:52:45.2597298Z 2026-02-21T08:52:45.2597369Z .version 8.7 2026-02-21T08:52:45.2597558Z .target sm_90a 2026-02-21T08:52:45.2597736Z .address_size 64 2026-02-21T08:52:45.2597857Z 2026-02-21T08:52:45.2598075Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T08:52:45.2598514Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:52:45.2598867Z // @_helion_matmul_bf16_int4 2026-02-21T08:52:45.2599198Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T08:52:45.2599560Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T08:52:45.2600197Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T08:52:45.2600644Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T08:52:45.2601081Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T08:52:45.2601507Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T08:52:45.2601845Z ) 2026-02-21T08:52:45.2602012Z .reqntid 512 2026-02-21T08:52:45.2602179Z .maxnreg 32 2026-02-21T08:52:45.2602344Z { 2026-02-21T08:52:45.2602501Z .reg .pred %p<58>; 2026-02-21T08:52:45.2602701Z .reg .b16 %rs<454>; 2026-02-21T08:52:45.2602889Z .reg .b32 %r<2290>; 2026-02-21T08:52:45.2603080Z .reg .b64 %rd<201>; 2026-02-21T08:52:45.2603459Z .loc 1 14 0 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:14:0 2026-02-21T08:52:45.2603913Z $L__func_begin0: 
2026-02-21T08:52:45.2604286Z .loc 1 14 0 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:14:0 2026-02-21T08:52:45.2604665Z 2026-02-21T08:52:45.2604733Z // %bb.0: 2026-02-21T08:52:45.2604984Z ld.param.b64 %rd30, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T08:52:45.2605357Z ld.param.b64 %rd29, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T08:52:45.2605727Z ld.param.b64 %rd28, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T08:52:45.2605974Z $L__tmp0: 2026-02-21T08:52:45.2606266Z .loc 1 19 46 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:19:46 2026-02-21T08:52:45.2606796Z mov.u32 %r2209, %ctaid.x; 2026-02-21T08:52:45.2607112Z .loc 1 0 0 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:0 2026-02-21T08:52:45.2607491Z sub.s32 %r262, 4279, %r2209; 2026-02-21T08:52:45.2607830Z .loc 1 19 144 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:19:144 2026-02-21T08:52:45.2608207Z mul.hi.u32 %r263, %r262, 1041204193; 2026-02-21T08:52:45.2608417Z shr.u32 %r264, %r263, 10; 2026-02-21T08:52:45.2608613Z mul.hi.u32 %r265, %r264, 1431655766; 2026-02-21T08:52:45.2608818Z mad.lo.s32 %r2270, %r265, 12672, %r2209; 2026-02-21T08:52:45.2609173Z .loc 1 31 45 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:31:45 2026-02-21T08:52:45.2609531Z mov.u32 %r3, %tid.x; 2026-02-21T08:52:45.2609691Z shr.u32 %r4, %r3, 5; 2026-02-21T08:52:45.2609856Z shr.u32 %r266, %r3, 4; 2026-02-21T08:52:45.2610020Z bfe.u32 %r5, %r3, 4, 5; 2026-02-21T08:52:45.2610190Z or.b32 %r6, %r266, 32; 2026-02-21T08:52:45.2610358Z and.b32 %r267, %r3, 15; 2026-02-21T08:52:45.2610527Z shl.b32 %r7, %r267, 2; 2026-02-21T08:52:45.2610829Z .loc 1 33 45 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:33:45 2026-02-21T08:52:45.2611186Z shl.b32 %r8, %r267, 3; 2026-02-21T08:52:45.2611492Z .loc 1 65 38 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:65:38 2026-02-21T08:52:45.2611836Z and.b32 %r9, %r3, 128; 2026-02-21T08:52:45.2612155Z .loc 1 19 144 // 
cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:19:144 2026-02-21T08:52:45.2612518Z setp.ge.s32 %p1, %r2209, %r2270; 2026-02-21T08:52:45.2612888Z shl.b32 %r2195, %r3, 3; 2026-02-21T08:52:45.2613050Z shr.u32 %r2196, %r3, 1; 2026-02-21T08:52:45.2613223Z mov.b32 %r2197, global_smem; 2026-02-21T08:52:45.2613412Z mul.lo.s32 %r2198, %r5, 7168; 2026-02-21T08:52:45.2613593Z mul.lo.s32 %r2199, %r6, 7168; 2026-02-21T08:52:45.2613773Z shl.b32 %r2200, %r3, 6; 2026-02-21T08:52:45.2613933Z shl.b32 %r2201, %r3, 5; 2026-02-21T08:52:45.2614098Z shl.b32 %r2202, %r3, 1; 2026-02-21T08:52:45.2614259Z and.b32 %r2203, %r3, 127; 2026-02-21T08:52:45.2614432Z shl.b32 %r2204, %r3, 4; 2026-02-21T08:52:45.2614593Z and.b32 %r2205, %r4, 12; 2026-02-21T08:52:45.2614765Z and.b32 %r2206, %r3, 3; 2026-02-21T08:52:45.2614932Z and.b32 %r2207, %r3, 24; 2026-02-21T08:52:45.2615103Z shl.b32 %r2208, %r3, 2; 2026-02-21T08:52:45.2615421Z cvt.u64.u32 %rd191, %r7; 2026-02-21T08:52:45.2615600Z setp.eq.b32 %p57, %r9, 0; 2026-02-21T08:52:45.2615778Z @%p1 bra $L__BB0_9; 2026-02-21T08:52:45.2615958Z // %bb.1: // %.lr.ph 2026-02-21T08:52:45.2616340Z .loc 1 0 144 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:0:144 2026-02-21T08:52:45.2616830Z and.b32 %r269, %r2195, 4088; 2026-02-21T08:52:45.2617030Z and.b32 %r271, %r2196, 56; 2026-02-21T08:52:45.2617210Z xor.b32 %r10, %r269, %r271; 2026-02-21T08:52:45.2617397Z add.s32 %r273, %r2197, %r10; 2026-02-21T08:52:45.2617584Z add.s32 %r11, %r273, 32768; 2026-02-21T08:52:45.2617758Z add.s32 %r12, %r273, 36864; 2026-02-21T08:52:45.2617942Z add.s32 %r274, %r2197, 49152; 2026-02-21T08:52:45.2618121Z add.s32 %r14, %r274, %r269; 2026-02-21T08:52:45.2618305Z add.s32 %r15, %r273, 40960; 2026-02-21T08:52:45.2618475Z add.s32 %r16, %r273, 45056; 2026-02-21T08:52:45.2618655Z add.s32 %r275, %r2197, %r269; 2026-02-21T08:52:45.2618832Z add.s32 %r18, %r275, 53248; 2026-02-21T08:52:45.2619019Z and.b32 %r277, %r2200, 6144; 2026-02-21T08:52:45.2619198Z and.b32 %r279, 
%r2201, 896; 2026-02-21T08:52:45.2619384Z and.b32 %r281, %r2202, 62; 2026-02-21T08:52:45.2619569Z or.b32 %r282, %r277, %r279; 2026-02-21T08:52:45.2619750Z or.b32 %r19, %r282, %r281; 2026-02-21T08:52:45.2619931Z xor.b32 %r20, %r19, 8; 2026-02-21T08:52:45.2620103Z xor.b32 %r21, %r19, 16; 2026-02-21T08:52:45.2620274Z xor.b32 %r22, %r19, 24; 2026-02-21T08:52:45.2620437Z xor.b32 %r23, %r19, 32; 2026-02-21T08:52:45.2620602Z xor.b32 %r24, %r19, 40; 2026-02-21T08:52:45.2620763Z xor.b32 %r25, %r19, 48; 2026-02-21T08:52:45.2620929Z xor.b32 %r26, %r19, 56; 2026-02-21T08:52:45.2621090Z and.b32 %r284, %r2196, 128; 2026-02-21T08:52:45.2621273Z add.s32 %r285, %r274, %r284; 2026-02-21T08:52:45.2621461Z add.s32 %r27, %r285, %r2203; 2026-02-21T08:52:45.2621643Z shl.b32 %r286, %r2203, 7; 2026-02-21T08:52:45.2621838Z and.b32 %r288, %r2204, 112; 2026-02-21T08:52:45.2622013Z or.b32 %r290, %r286, %r2205; 2026-02-21T08:52:45.2622194Z or.b32 %r291, %r290, %r288; 2026-02-21T08:52:45.2622367Z add.s32 %r28, %r2197, %r291; 2026-02-21T08:52:45.2622544Z xor.b32 %r292, %r291, 16; 2026-02-21T08:52:45.2622717Z add.s32 %r29, %r2197, %r292; 2026-02-21T08:52:45.2622898Z xor.b32 %r293, %r291, 32; 2026-02-21T08:52:45.2623065Z add.s32 %r30, %r2197, %r293; 2026-02-21T08:52:45.2623242Z xor.b32 %r294, %r291, 48; 2026-02-21T08:52:45.2623413Z add.s32 %r31, %r2197, %r294; 2026-02-21T08:52:45.2623600Z xor.b32 %r295, %r291, 64; 2026-02-21T08:52:45.2623776Z add.s32 %r32, %r2197, %r295; 2026-02-21T08:52:45.2623945Z xor.b32 %r296, %r291, 80; 2026-02-21T08:52:45.2624121Z add.s32 %r33, %r2197, %r296; 2026-02-21T08:52:45.2624290Z xor.b32 %r297, %r291, 96; 2026-02-21T08:52:45.2624462Z add.s32 %r34, %r2197, %r297; 2026-02-21T08:52:45.2624635Z xor.b32 %r298, %r291, 112; 2026-02-21T08:52:45.2624820Z add.s32 %r35, %r2197, %r298; 2026-02-21T08:52:45.2624999Z shl.b32 %r300, %r2206, 12; 2026-02-21T08:52:45.2625172Z and.b32 %r301, %r2201, 3168; 2026-02-21T08:52:45.2625350Z shl.b32 %r303, %r2207, 4; 2026-02-21T08:52:45.2625515Z 
and.b32 %r304, %r3, 384; 2026-02-21T08:52:45.2625843Z shr.u32 %r305, %r304, 2; 2026-02-21T08:52:45.2626012Z and.b32 %r307, %r2208, 16; 2026-02-21T08:52:45.2626190Z or.b32 %r308, %r301, %r303; 2026-02-21T08:52:45.2626359Z xor.b32 %r309, %r308, %r305; 2026-02-21T08:52:45.2626673Z add.s32 %r310, %r2197, %r300; 2026-02-21T08:52:45.2626860Z add.s32 %r311, %r310, %r307; 2026-02-21T08:52:45.2627047Z add.s32 %r36, %r311, %r309; 2026-02-21T08:52:45.2627235Z shl.b32 %r312, %r2207, 9; 2026-02-21T08:52:45.2627410Z shl.b32 %r313, %r2206, 5; 2026-02-21T08:52:45.2627585Z and.b32 %r314, %r2208, 2032; 2026-02-21T08:52:45.2627759Z or.b32 %r315, %r312, %r313; 2026-02-21T08:52:45.2627935Z xor.b32 %r316, %r315, %r314; 2026-02-21T08:52:45.2628101Z add.s32 %r759, %r2197, %r316; 2026-02-21T08:52:45.2628281Z add.s32 %r764, %r759, 2048; 2026-02-21T08:52:45.2628717Z shr.u32 %r317, %r304, 5; 2026-02-21T08:52:45.2628921Z or.b32 %r318, %r286, %r317; 2026-02-21T08:52:45.2629109Z or.b32 %r319, %r318, %r288; 2026-02-21T08:52:45.2629288Z add.s32 %r39, %r2197, %r319; 2026-02-21T08:52:45.2629474Z xor.b32 %r320, %r319, 16; 2026-02-21T08:52:45.2629644Z add.s32 %r40, %r2197, %r320; 2026-02-21T08:52:45.2629834Z xor.b32 %r321, %r319, 32; 2026-02-21T08:52:45.2630000Z add.s32 %r41, %r2197, %r321; 2026-02-21T08:52:45.2630177Z xor.b32 %r322, %r319, 48; 2026-02-21T08:52:45.2630342Z add.s32 %r42, %r2197, %r322; 2026-02-21T08:52:45.2630523Z xor.b32 %r323, %r319, 64; 2026-02-21T08:52:45.2630691Z add.s32 %r43, %r2197, %r323; 2026-02-21T08:52:45.2630857Z xor.b32 %r324, %r319, 80; 2026-02-21T08:52:45.2631025Z add.s32 %r44, %r2197, %r324; 2026-02-21T08:52:45.2631194Z xor.b32 %r325, %r319, 96; 2026-02-21T08:52:45.2631364Z add.s32 %r45, %r2197, %r325; 2026-02-21T08:52:45.2631535Z xor.b32 %r326, %r319, 112; 2026-02-21T08:52:45.2631717Z add.s32 %r46, %r2197, %r326; 2026-02-21T08:52:45.2632066Z .loc 1 40 125 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:40:125 2026-02-21T08:52:45.2632442Z or.b32 %r327, 
%r2198, %r8; 2026-02-21T08:52:45.2632624Z add.s32 %r47, %r327, 458752; 2026-02-21T08:52:45.2632948Z .loc 1 19 144 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:19:144 2026-02-21T08:52:45.2633325Z mad.wide.u32 %rd31, %r267, 8, %rd28; 2026-02-21T08:52:45.2633528Z add.s64 %rd1, %rd31, 256; 2026-02-21T08:52:45.2633703Z shl.b32 %r48, %r5, 13; 2026-02-21T08:52:45.2633868Z shl.b32 %r329, %r6, 13; 2026-02-21T08:52:45.2634205Z .loc 1 40 125 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:40:125 2026-02-21T08:52:45.2634560Z or.b32 %r330, %r329, %r7; 2026-02-21T08:52:45.2634734Z or.b32 %r49, %r330, 128; 2026-02-21T08:52:45.2634960Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T08:52:45.2635261Z // Child Loop BB0_3 Depth 2 2026-02-21T08:52:45.2635532Z // Child Loop BB0_5 Depth 2 2026-02-21T08:52:45.2635790Z // Child Loop BB0_7 Depth 2 2026-02-21T08:52:45.2636176Z .loc 1 25 35 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:25:35 2026-02-21T08:52:45.2636686Z mul.hi.s32 %r346, %r2209, -1840700269; 2026-02-21T08:52:45.2636903Z add.s32 %r347, %r346, %r2209; 2026-02-21T08:52:45.2637101Z shr.u32 %r348, %r347, 31; 2026-02-21T08:52:45.2637279Z shr.s32 %r349, %r347, 7; 2026-02-21T08:52:45.2637458Z add.s32 %r350, %r349, %r348; 2026-02-21T08:52:45.2637774Z .loc 1 26 33 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:26:33 2026-02-21T08:52:45.2638131Z shl.b32 %r351, %r350, 2; 2026-02-21T08:52:45.2638452Z .loc 1 27 39 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:27:39 2026-02-21T08:52:45.2638814Z sub.s32 %r352, 1, %r351; 2026-02-21T08:52:45.2639125Z .loc 1 27 52 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:27:52 2026-02-21T08:52:45.2639639Z min.s32 %r353, %r352, 4; 2026-02-21T08:52:45.2639956Z .loc 1 28 45 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:28:45 2026-02-21T08:52:45.2640304Z mul.lo.s32 %r354, %r350, 224; 2026-02-21T08:52:45.2640488Z sub.s32 %r355, %r2209, %r354; 
2026-02-21T08:52:45.2640804Z .loc 1 29 51 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:29:51
2026-02-21T08:52:45.2641159Z div.s32 %r356, %r355, %r353;
2026-02-21T08:52:45.2641740Z [82s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config.
2026-02-21T08:52:45.2643431Z Config: @helion.kernel(config=helion.Config(block_sizes=[32, 64, 128], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[4], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=32, num_stages=7, num_warps=16, pid_type='persistent_interleaved', range_flattens=[False, True], range_multi_buffers=[False, False], range_num_stages=[3, 3], range_unroll_factors=[3, 1], range_warp_specializes=[]), static_shapes=True)
2026-02-21T08:52:45.2645015Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error
2026-02-21T08:52:45.2645312Z `ptxas` stderr:
2026-02-21T08:52:45.2645864Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 508 in function _helion_matmul_bf16_int4. Try to compile with register target of 46 or higher.
2026-02-21T08:52:45.2646640Z ptxas fatal : Ptx assembly aborted due to errors
2026-02-21T08:52:45.2646832Z
2026-02-21T08:52:45.2647363Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp8t89bgcl.ptx -o /tmp/tmp8t89bgcl.ptx.o
2026-02-21T08:52:45.2647955Z
2026-02-21T08:52:45.2648115Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code.
2026-02-21T08:52:45.2648570Z .loc 1 28 64 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:28:64 2026-02-21T08:52:45.2648949Z mul.lo.s32 %r357, %r356, %r353; 2026-02-21T08:52:45.2649151Z sub.s32 %r358, %r355, %r357; 2026-02-21T08:52:45.2649479Z .loc 1 28 30 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:28:30 2026-02-21T08:52:45.2649835Z add.s32 %r359, %r358, %r351; 2026-02-21T08:52:45.2650155Z .loc 1 30 27 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:30:27 2026-02-21T08:52:45.2650505Z shl.b32 %r360, %r359, 6; 2026-02-21T08:52:45.2650818Z .loc 1 31 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:31:32 2026-02-21T08:52:45.2651165Z or.b32 %r82, %r360, %r5; 2026-02-21T08:52:45.2651333Z or.b32 %r83, %r360, %r6; 2026-02-21T08:52:45.2651641Z .loc 1 32 27 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:32:27 2026-02-21T08:52:45.2651984Z shl.b32 %r361, %r356, 7; 2026-02-21T08:52:45.2652292Z .loc 1 33 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:33:32 2026-02-21T08:52:45.2652640Z or.b32 %r84, %r361, %r8; 2026-02-21T08:52:45.2652963Z .loc 1 48 53 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:53 2026-02-21T08:52:45.2653326Z shl.b32 %r362, %r82, 13; 2026-02-21T08:52:45.2653497Z shl.b32 %r363, %r83, 13; 2026-02-21T08:52:45.2653804Z .loc 1 48 60 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:60 2026-02-21T08:52:45.2654157Z or.b32 %r364, %r362, %r7; 2026-02-21T08:52:45.2654332Z or.b32 %r365, %r363, %r7; 2026-02-21T08:52:45.2654638Z .loc 1 48 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:32 2026-02-21T08:52:45.2655004Z mad.wide.s32 %rd32, %r364, 2, %rd28; 2026-02-21T08:52:45.2655209Z mad.wide.s32 %rd33, %r365, 2, %rd28; 2026-02-21T08:52:45.2655410Z mov.b32 %r332, 8; 2026-02-21T08:52:45.2655708Z .loc 1 48 80 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:80 2026-02-21T08:52:45.2656218Z // begin inline asm 2026-02-21T08:52:45.2656583Z 
cp.async.ca.shared.global [ %r11 + 0 ], [ %rd32 + 0 ], 0x8, %r332; 2026-02-21T08:52:45.2656875Z // end inline asm 2026-02-21T08:52:45.2657036Z // begin inline asm 2026-02-21T08:52:45.2657259Z cp.async.ca.shared.global [ %r12 + 0 ], [ %rd33 + 0 ], 0x8, %r332; 2026-02-21T08:52:45.2657529Z // end inline asm 2026-02-21T08:52:45.2657682Z cp.async.commit_group; 2026-02-21T08:52:45.2658003Z .loc 1 54 62 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:62 2026-02-21T08:52:45.2658362Z add.s32 %r366, %r84, %r2198; 2026-02-21T08:52:45.2658676Z .loc 1 54 34 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:34 2026-02-21T08:52:45.2659037Z cvt.s64.s32 %rd39, %r366; 2026-02-21T08:52:45.2659366Z add.s64 %rd34, %rd29, %rd39; 2026-02-21T08:52:45.2659705Z .loc 1 54 87 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:87 2026-02-21T08:52:45.2660057Z // begin inline asm 2026-02-21T08:52:45.2660287Z cp.async.ca.shared.global [ %r14 + 0 ], [ %rd34 + 0 ], 0x8, %r332; 2026-02-21T08:52:45.2660559Z // end inline asm 2026-02-21T08:52:45.2660713Z cp.async.commit_group; 2026-02-21T08:52:45.2661043Z .loc 1 48 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:32 2026-02-21T08:52:45.2661398Z cvt.s64.s32 %rd40, %r362; 2026-02-21T08:52:45.2661577Z or.b64 %rd41, %rd40, %rd191; 2026-02-21T08:52:45.2661752Z shl.b64 %rd42, %rd41, 1; 2026-02-21T08:52:45.2661924Z add.s64 %rd43, %rd28, %rd42; 2026-02-21T08:52:45.2662098Z add.s64 %rd35, %rd43, 128; 2026-02-21T08:52:45.2662283Z cvt.s64.s32 %rd44, %r363; 2026-02-21T08:52:45.2662468Z or.b64 %rd45, %rd44, %rd191; 2026-02-21T08:52:45.2662641Z shl.b64 %rd46, %rd45, 1; 2026-02-21T08:52:45.2662816Z add.s64 %rd47, %rd28, %rd46; 2026-02-21T08:52:45.2662989Z add.s64 %rd36, %rd47, 128; 2026-02-21T08:52:45.2663308Z .loc 1 48 80 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:80 2026-02-21T08:52:45.2663661Z bar.sync 0; 2026-02-21T08:52:45.2663807Z // begin inline asm 2026-02-21T08:52:45.2664028Z 
cp.async.ca.shared.global [ %r15 + 0 ], [ %rd35 + 0 ], 0x8, %r332; 2026-02-21T08:52:45.2664298Z // end inline asm 2026-02-21T08:52:45.2664447Z // begin inline asm 2026-02-21T08:52:45.2664662Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd36 + 0 ], 0x8, %r332; 2026-02-21T08:52:45.2664948Z // end inline asm 2026-02-21T08:52:45.2665108Z cp.async.commit_group; 2026-02-21T08:52:45.2665419Z .loc 1 54 62 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:62 2026-02-21T08:52:45.2665768Z add.s32 %r367, %r84, %r2199; 2026-02-21T08:52:45.2666089Z .loc 1 54 34 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:34 2026-02-21T08:52:45.2666439Z cvt.s64.s32 %rd48, %r367; 2026-02-21T08:52:45.2666750Z add.s64 %rd37, %rd29, %rd48; 2026-02-21T08:52:45.2667068Z .loc 1 54 87 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:87 2026-02-21T08:52:45.2667416Z // begin inline asm 2026-02-21T08:52:45.2667641Z cp.async.ca.shared.global [ %r18 + 0 ], [ %rd37 + 0 ], 0x8, %r332; 2026-02-21T08:52:45.2667909Z // end inline asm 2026-02-21T08:52:45.2668077Z cp.async.commit_group; 2026-02-21T08:52:45.2668396Z .loc 1 40 125 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:40:125 2026-02-21T08:52:45.2668833Z add.s32 %r2211, %r47, %r361; 2026-02-21T08:52:45.2669013Z shl.b32 %r368, %r359, 19; 2026-02-21T08:52:45.2669179Z or.b32 %r369, %r48, %r368; 2026-02-21T08:52:45.2669364Z mad.wide.s32 %rd192, %r369, 2, %rd1; 2026-02-21T08:52:45.2669574Z or.b32 %r2210, %r49, %r368; 2026-02-21T08:52:45.2669761Z mov.b32 %r2214, 0f00000000; 2026-02-21T08:52:45.2669933Z mov.b32 %r2213, 1; 2026-02-21T08:52:45.2670094Z mov.b32 %r2212, -1; 2026-02-21T08:52:45.2670252Z mov.b64 %rd193, -32; 2026-02-21T08:52:45.2670416Z mov.b32 %r2215, %r2214; 2026-02-21T08:52:45.2670748Z mov.b32 %r2216, %r2214; 2026-02-21T08:52:45.2670916Z mov.b32 %r2217, %r2214; 2026-02-21T08:52:45.2671089Z mov.b32 %r2218, %r2214; 2026-02-21T08:52:45.2671250Z mov.b32 %r2219, %r2214; 2026-02-21T08:52:45.2671418Z 
mov.b32 %r2220, %r2214; 2026-02-21T08:52:45.2671577Z mov.b32 %r2221, %r2214; 2026-02-21T08:52:45.2671744Z mov.b32 %r2222, %r2214; 2026-02-21T08:52:45.2671901Z mov.b32 %r2223, %r2214; 2026-02-21T08:52:45.2672065Z mov.b32 %r2224, %r2214; 2026-02-21T08:52:45.2672223Z mov.b32 %r2225, %r2214; 2026-02-21T08:52:45.2672384Z mov.b32 %r2226, %r2214; 2026-02-21T08:52:45.2672551Z mov.b32 %r2227, %r2214; 2026-02-21T08:52:45.2672714Z mov.b32 %r2228, %r2214; 2026-02-21T08:52:45.2672872Z mov.b32 %r2229, %r2214; 2026-02-21T08:52:45.2673228Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:45.2673534Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:45.2673793Z add.s64 %rd193, %rd193, 32; 2026-02-21T08:52:45.2673980Z setp.lt.u64 %p11, %rd193, 4032; 2026-02-21T08:52:45.2674182Z add.s32 %r702, %r2212, 1; 2026-02-21T08:52:45.2674362Z setp.gt.s32 %p12, %r702, 1; 2026-02-21T08:52:45.2674558Z selp.b32 %r2212, 0, %r702, %p12; 2026-02-21T08:52:45.2674922Z .loc 1 48 80 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:80 2026-02-21T08:52:45.2675289Z cp.async.wait_group 2; 2026-02-21T08:52:45.2675464Z bar.sync 0; 2026-02-21T08:52:45.2675610Z shl.b32 %r703, %r2212, 12; 2026-02-21T08:52:45.2675806Z shl.b32 %r704, %r2212, 13; 2026-02-21T08:52:45.2675980Z add.s32 %r705, %r2197, 32768; 2026-02-21T08:52:45.2676163Z add.s32 %r706, %r705, %r704; 2026-02-21T08:52:45.2676634Z .loc 1 52 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:52:32 2026-02-21T08:52:45.2677002Z add.s32 %r707, %r706, %r19; 2026-02-21T08:52:45.2677190Z ld.shared.b16 %rs1, [%r707]; 2026-02-21T08:52:45.2677376Z ld.shared.b16 %rs2, [%r707+1024]; 2026-02-21T08:52:45.2677593Z ld.shared.b16 %rs3, [%r707+64]; 2026-02-21T08:52:45.2677791Z ld.shared.b16 %rs4, [%r707+1088]; 2026-02-21T08:52:45.2677982Z add.s32 %r708, %r706, %r20; 2026-02-21T08:52:45.2678158Z ld.shared.b16 %rs5, [%r708]; 2026-02-21T08:52:45.2678342Z ld.shared.b16 %rs6, [%r708+1024]; 2026-02-21T08:52:45.2678530Z ld.shared.b16 %rs7, 
[%r708+64]; 2026-02-21T08:52:45.2678725Z ld.shared.b16 %rs8, [%r708+1088]; 2026-02-21T08:52:45.2678912Z add.s32 %r709, %r706, %r21; 2026-02-21T08:52:45.2679084Z ld.shared.b16 %rs9, [%r709]; 2026-02-21T08:52:45.2679272Z ld.shared.b16 %rs10, [%r709+1024]; 2026-02-21T08:52:45.2679485Z ld.shared.b16 %rs11, [%r709+64]; 2026-02-21T08:52:45.2679684Z ld.shared.b16 %rs12, [%r709+1088]; 2026-02-21T08:52:45.2679874Z add.s32 %r710, %r706, %r22; 2026-02-21T08:52:45.2680063Z ld.shared.b16 %rs13, [%r710]; 2026-02-21T08:52:45.2680250Z ld.shared.b16 %rs14, [%r710+1024]; 2026-02-21T08:52:45.2680452Z ld.shared.b16 %rs15, [%r710+64]; 2026-02-21T08:52:45.2680653Z ld.shared.b16 %rs16, [%r710+1088]; 2026-02-21T08:52:45.2680843Z add.s32 %r711, %r706, %r23; 2026-02-21T08:52:45.2681023Z ld.shared.b16 %rs17, [%r711]; 2026-02-21T08:52:45.2681206Z ld.shared.b16 %rs18, [%r711+1024]; 2026-02-21T08:52:45.2681405Z ld.shared.b16 %rs19, [%r711+64]; 2026-02-21T08:52:45.2681594Z ld.shared.b16 %rs20, [%r711+1088]; 2026-02-21T08:52:45.2681799Z add.s32 %r712, %r706, %r24; 2026-02-21T08:52:45.2681976Z ld.shared.b16 %rs21, [%r712]; 2026-02-21T08:52:45.2682159Z ld.shared.b16 %rs22, [%r712+1024]; 2026-02-21T08:52:45.2682364Z ld.shared.b16 %rs23, [%r712+64]; 2026-02-21T08:52:45.2682554Z ld.shared.b16 %rs24, [%r712+1088]; 2026-02-21T08:52:45.2682747Z add.s32 %r713, %r706, %r25; 2026-02-21T08:52:45.2682930Z ld.shared.b16 %rs25, [%r713]; 2026-02-21T08:52:45.2683112Z ld.shared.b16 %rs26, [%r713+1024]; 2026-02-21T08:52:45.2683314Z ld.shared.b16 %rs27, [%r713+64]; 2026-02-21T08:52:45.2683501Z ld.shared.b16 %rs28, [%r713+1088]; 2026-02-21T08:52:45.2683866Z add.s32 %r714, %r706, %r26; 2026-02-21T08:52:45.2684041Z ld.shared.b16 %rs29, [%r714]; 2026-02-21T08:52:45.2684226Z ld.shared.b16 %rs30, [%r714+1024]; 2026-02-21T08:52:45.2684416Z ld.shared.b16 %rs31, [%r714+64]; 2026-02-21T08:52:45.2684613Z ld.shared.b16 %rs32, [%r714+1088]; 2026-02-21T08:52:45.2684813Z cvt.f32.bf16 %r402, %rs1; 2026-02-21T08:52:45.2684992Z 
cvt.f32.bf16 %r403, %rs2; 2026-02-21T08:52:45.2685180Z cvt.f32.bf16 %r404, %rs5; 2026-02-21T08:52:45.2685345Z cvt.f32.bf16 %r405, %rs6; 2026-02-21T08:52:45.2685519Z cvt.f32.bf16 %r438, %rs9; 2026-02-21T08:52:45.2685685Z cvt.f32.bf16 %r439, %rs10; 2026-02-21T08:52:45.2685862Z cvt.f32.bf16 %r440, %rs13; 2026-02-21T08:52:45.2686045Z cvt.f32.bf16 %r441, %rs14; 2026-02-21T08:52:45.2686220Z cvt.f32.bf16 %r474, %rs17; 2026-02-21T08:52:45.2686669Z cvt.f32.bf16 %r475, %rs18; 2026-02-21T08:52:45.2686887Z cvt.f32.bf16 %r476, %rs21; 2026-02-21T08:52:45.2687064Z cvt.f32.bf16 %r477, %rs22; 2026-02-21T08:52:45.2687246Z cvt.f32.bf16 %r510, %rs25; 2026-02-21T08:52:45.2687418Z cvt.f32.bf16 %r511, %rs26; 2026-02-21T08:52:45.2687584Z cvt.f32.bf16 %r512, %rs29; 2026-02-21T08:52:45.2687758Z cvt.f32.bf16 %r513, %rs30; 2026-02-21T08:52:45.2687928Z cvt.f32.bf16 %r546, %rs3; 2026-02-21T08:52:45.2688102Z cvt.f32.bf16 %r547, %rs4; 2026-02-21T08:52:45.2688269Z cvt.f32.bf16 %r548, %rs7; 2026-02-21T08:52:45.2688440Z cvt.f32.bf16 %r549, %rs8; 2026-02-21T08:52:45.2688606Z cvt.f32.bf16 %r582, %rs11; 2026-02-21T08:52:45.2688780Z cvt.f32.bf16 %r583, %rs12; 2026-02-21T08:52:45.2688948Z cvt.f32.bf16 %r584, %rs15; 2026-02-21T08:52:45.2689122Z cvt.f32.bf16 %r585, %rs16; 2026-02-21T08:52:45.2689296Z cvt.f32.bf16 %r618, %rs19; 2026-02-21T08:52:45.2689465Z cvt.f32.bf16 %r619, %rs20; 2026-02-21T08:52:45.2689642Z cvt.f32.bf16 %r620, %rs23; 2026-02-21T08:52:45.2689825Z cvt.f32.bf16 %r621, %rs24; 2026-02-21T08:52:45.2689999Z cvt.f32.bf16 %r654, %rs27; 2026-02-21T08:52:45.2690166Z cvt.f32.bf16 %r655, %rs28; 2026-02-21T08:52:45.2690343Z cvt.f32.bf16 %r656, %rs31; 2026-02-21T08:52:45.2690515Z cvt.f32.bf16 %r657, %rs32; 2026-02-21T08:52:45.2690845Z .loc 1 67 45 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:67:45 2026-02-21T08:52:45.2691213Z add.s32 %r715, %r27, %r703; 2026-02-21T08:52:45.2691417Z ld.shared.b8 %rs33, [%r715]; 2026-02-21T08:52:45.2691613Z ld.shared.b8 %rs34, [%r715+256]; 
2026-02-21T08:52:45.2691810Z ld.shared.b8 %rs35, [%r715+512]; 2026-02-21T08:52:45.2692004Z ld.shared.b8 %rs36, [%r715+768]; 2026-02-21T08:52:45.2692197Z ld.shared.b8 %rs37, [%r715+1024]; 2026-02-21T08:52:45.2692401Z ld.shared.b8 %rs38, [%r715+1280]; 2026-02-21T08:52:45.2692592Z ld.shared.b8 %rs39, [%r715+1536]; 2026-02-21T08:52:45.2692788Z ld.shared.b8 %rs40, [%r715+1792]; 2026-02-21T08:52:45.2692989Z ld.shared.b8 %rs41, [%r715+2048]; 2026-02-21T08:52:45.2693178Z ld.shared.b8 %rs42, [%r715+2304]; 2026-02-21T08:52:45.2693375Z ld.shared.b8 %rs43, [%r715+2560]; 2026-02-21T08:52:45.2693566Z ld.shared.b8 %rs44, [%r715+2816]; 2026-02-21T08:52:45.2693761Z ld.shared.b8 %rs45, [%r715+3072]; 2026-02-21T08:52:45.2693949Z ld.shared.b8 %rs46, [%r715+3328]; 2026-02-21T08:52:45.2694145Z ld.shared.b8 %rs47, [%r715+3584]; 2026-02-21T08:52:45.2694333Z ld.shared.b8 %rs48, [%r715+3840]; 2026-02-21T08:52:45.2694677Z .loc 1 57 28 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:57:28 2026-02-21T08:52:45.2695041Z shl.b16 %rs49, %rs33, 4; 2026-02-21T08:52:45.2695215Z shl.b16 %rs50, %rs34, 4; 2026-02-21T08:52:45.2695386Z shl.b16 %rs51, %rs35, 4; 2026-02-21T08:52:45.2695548Z shl.b16 %rs52, %rs36, 4; 2026-02-21T08:52:45.2695730Z shl.b16 %rs53, %rs37, 4; 2026-02-21T08:52:45.2695893Z shl.b16 %rs54, %rs38, 4; 2026-02-21T08:52:45.2696061Z shl.b16 %rs55, %rs39, 4; 2026-02-21T08:52:45.2696226Z shl.b16 %rs56, %rs40, 4; 2026-02-21T08:52:45.2696392Z shl.b16 %rs57, %rs41, 4; 2026-02-21T08:52:45.2696693Z shl.b16 %rs58, %rs42, 4; 2026-02-21T08:52:45.2697020Z shl.b16 %rs59, %rs43, 4; 2026-02-21T08:52:45.2697192Z shl.b16 %rs60, %rs44, 4; 2026-02-21T08:52:45.2697354Z shl.b16 %rs61, %rs45, 4; 2026-02-21T08:52:45.2697523Z shl.b16 %rs62, %rs46, 4; 2026-02-21T08:52:45.2697684Z shl.b16 %rs63, %rs47, 4; 2026-02-21T08:52:45.2697854Z shl.b16 %rs64, %rs48, 4; 2026-02-21T08:52:45.2698172Z .loc 1 72 58 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:72:58 2026-02-21T08:52:45.2698546Z selp.b16 
%rs65, %rs49, %rs33, %p57; 2026-02-21T08:52:45.2698749Z cvt.s16.s8 %rs66, %rs65; 2026-02-21T08:52:45.2698925Z shr.s16 %rs67, %rs66, 4; 2026-02-21T08:52:45.2699105Z selp.b16 %rs68, %rs50, %rs34, %p57; 2026-02-21T08:52:45.2699309Z cvt.s16.s8 %rs69, %rs68; 2026-02-21T08:52:45.2699490Z shr.s16 %rs70, %rs69, 4; 2026-02-21T08:52:45.2699811Z selp.b16 %rs71, %rs51, %rs35, %p57; 2026-02-21T08:52:45.2700030Z cvt.s16.s8 %rs72, %rs71; 2026-02-21T08:52:45.2700198Z shr.s16 %rs73, %rs72, 4; 2026-02-21T08:52:45.2700381Z selp.b16 %rs74, %rs52, %rs36, %p57; 2026-02-21T08:52:45.2700576Z cvt.s16.s8 %rs75, %rs74; 2026-02-21T08:52:45.2700747Z shr.s16 %rs76, %rs75, 4; 2026-02-21T08:52:45.2700919Z selp.b16 %rs77, %rs53, %rs37, %p57; 2026-02-21T08:52:45.2701117Z cvt.s16.s8 %rs78, %rs77; 2026-02-21T08:52:45.2701303Z shr.s16 %rs79, %rs78, 4; 2026-02-21T08:52:45.2701474Z selp.b16 %rs80, %rs54, %rs38, %p57; 2026-02-21T08:52:45.2701670Z cvt.s16.s8 %rs81, %rs80; 2026-02-21T08:52:45.2701837Z shr.s16 %rs82, %rs81, 4; 2026-02-21T08:52:45.2702017Z selp.b16 %rs83, %rs55, %rs39, %p57; 2026-02-21T08:52:45.2702208Z cvt.s16.s8 %rs84, %rs83; 2026-02-21T08:52:45.2702382Z shr.s16 %rs85, %rs84, 4; 2026-02-21T08:52:45.2702553Z selp.b16 %rs86, %rs56, %rs40, %p57; 2026-02-21T08:52:45.2702748Z cvt.s16.s8 %rs87, %rs86; 2026-02-21T08:52:45.2702931Z shr.s16 %rs88, %rs87, 4; 2026-02-21T08:52:45.2703104Z selp.b16 %rs89, %rs57, %rs41, %p57; 2026-02-21T08:52:45.2703303Z cvt.s16.s8 %rs90, %rs89; 2026-02-21T08:52:45.2703475Z shr.s16 %rs91, %rs90, 4; 2026-02-21T08:52:45.2703656Z selp.b16 %rs92, %rs58, %rs42, %p57; 2026-02-21T08:52:45.2703853Z cvt.s16.s8 %rs93, %rs92; 2026-02-21T08:52:45.2704030Z shr.s16 %rs94, %rs93, 4; 2026-02-21T08:52:45.2704202Z selp.b16 %rs95, %rs59, %rs43, %p57; 2026-02-21T08:52:45.2704401Z cvt.s16.s8 %rs96, %rs95; 2026-02-21T08:52:45.2704565Z shr.s16 %rs97, %rs96, 4; 2026-02-21T08:52:45.2704741Z selp.b16 %rs98, %rs60, %rs44, %p57; 2026-02-21T08:52:45.2704939Z cvt.s16.s8 %rs99, %rs98; 
2026-02-21T08:52:45.2705107Z shr.s16 %rs100, %rs99, 4; 2026-02-21T08:52:45.2705293Z selp.b16 %rs101, %rs61, %rs45, %p57; 2026-02-21T08:52:45.2705495Z cvt.s16.s8 %rs102, %rs101; 2026-02-21T08:52:45.2705675Z shr.s16 %rs103, %rs102, 4; 2026-02-21T08:52:45.2705858Z selp.b16 %rs104, %rs62, %rs46, %p57; 2026-02-21T08:52:45.2706064Z cvt.s16.s8 %rs105, %rs104; 2026-02-21T08:52:45.2706234Z shr.s16 %rs106, %rs105, 4; 2026-02-21T08:52:45.2706417Z selp.b16 %rs107, %rs63, %rs47, %p57; 2026-02-21T08:52:45.2706764Z cvt.s16.s8 %rs108, %rs107; 2026-02-21T08:52:45.2706943Z shr.s16 %rs109, %rs108, 4; 2026-02-21T08:52:45.2707137Z selp.b16 %rs110, %rs64, %rs48, %p57; 2026-02-21T08:52:45.2707332Z cvt.s16.s8 %rs111, %rs110; 2026-02-21T08:52:45.2707511Z shr.s16 %rs112, %rs111, 4; 2026-02-21T08:52:45.2707827Z .loc 1 77 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:77:32 2026-02-21T08:52:45.2708192Z cvt.rn.f32.s16 %r716, %rs67; 2026-02-21T08:52:45.2708384Z cvt.rn.f32.s16 %r717, %rs70; 2026-02-21T08:52:45.2708640Z cvt.rn.f32.s16 %r718, %rs73; 2026-02-21T08:52:45.2708819Z cvt.rn.f32.s16 %r719, %rs76; 2026-02-21T08:52:45.2709001Z cvt.rn.f32.s16 %r720, %rs79; 2026-02-21T08:52:45.2709178Z cvt.rn.f32.s16 %r721, %rs82; 2026-02-21T08:52:45.2709351Z cvt.rn.f32.s16 %r722, %rs85; 2026-02-21T08:52:45.2709534Z cvt.rn.f32.s16 %r723, %rs88; 2026-02-21T08:52:45.2709707Z cvt.rn.f32.s16 %r724, %rs91; 2026-02-21T08:52:45.2709888Z cvt.rn.f32.s16 %r725, %rs94; 2026-02-21T08:52:45.2710063Z cvt.rn.f32.s16 %r726, %rs97; 2026-02-21T08:52:45.2710407Z cvt.rn.f32.s16 %r727, %rs100; 2026-02-21T08:52:45.2710586Z cvt.rn.f32.s16 %r728, %rs103; 2026-02-21T08:52:45.2710767Z cvt.rn.f32.s16 %r729, %rs106; 2026-02-21T08:52:45.2710947Z cvt.rn.f32.s16 %r730, %rs109; 2026-02-21T08:52:45.2711131Z cvt.rn.f32.s16 %r731, %rs112; 2026-02-21T08:52:45.2711314Z st.shared.b32 [%r28], %r716; 2026-02-21T08:52:45.2711498Z st.shared.b32 [%r28+16384], %r724; 2026-02-21T08:52:45.2711699Z st.shared.b32 [%r29], %r717; 
2026-02-21T08:52:45.2711878Z st.shared.b32 [%r29+16384], %r725; 2026-02-21T08:52:45.2712079Z st.shared.b32 [%r30], %r718; 2026-02-21T08:52:45.2712256Z st.shared.b32 [%r30+16384], %r726; 2026-02-21T08:52:45.2712452Z st.shared.b32 [%r31], %r719; 2026-02-21T08:52:45.2712761Z st.shared.b32 [%r31+16384], %r727; 2026-02-21T08:52:45.2712974Z st.shared.b32 [%r32], %r720; 2026-02-21T08:52:45.2713166Z st.shared.b32 [%r32+16384], %r728; 2026-02-21T08:52:45.2713356Z st.shared.b32 [%r33], %r721; 2026-02-21T08:52:45.2713547Z st.shared.b32 [%r33+16384], %r729; 2026-02-21T08:52:45.2713736Z st.shared.b32 [%r34], %r722; 2026-02-21T08:52:45.2713918Z st.shared.b32 [%r34+16384], %r730; 2026-02-21T08:52:45.2714106Z st.shared.b32 [%r35], %r723; 2026-02-21T08:52:45.2714290Z st.shared.b32 [%r35+16384], %r731; 2026-02-21T08:52:45.2714348Z $L__tmp1: 2026-02-21T08:52:45.2714630Z .loc 2 291 36 // standard.py:291:36 @[ cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:84:40 ] 2026-02-21T08:52:45.2714699Z // begin inline asm 2026-02-21T08:52:45.2714796Z fence.proxy.async.shared::cta; 2026-02-21T08:52:45.2714864Z // end inline asm 2026-02-21T08:52:45.2714926Z bar.sync 0; 2026-02-21T08:52:45.2715008Z shfl.sync.idx.b32 %r732, %r4, 0, 31, -1; 2026-02-21T08:52:45.2715084Z wgmma.fence.sync.aligned; 2026-02-21T08:52:45.2715150Z shl.b32 %r733, %r732, 10; 2026-02-21T08:52:45.2715211Z and.b32 %r734, %r733, 12288; 2026-02-21T08:52:45.2715272Z add.s32 %r735, %r734, %r2197; 2026-02-21T08:52:45.2715339Z bfe.u32 %r736, %r735, 4, 14; 2026-02-21T08:52:45.2715420Z cvt.u64.u32 %rd60, %r736; 2026-02-21T08:52:45.2715494Z or.b64 %rd49, %rd60, 4611686293372403712; 2026-02-21T08:52:45.2715558Z mov.pred %p2, -1; 2026-02-21T08:52:45.2715622Z // begin inline asm 2026-02-21T08:52:45.2716131Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2214,%r2215,%r2216,%r2217,%r2218,%r2219,%r2220,%r2221,%r2222,%r2223,%r2224,%r2225,%r2226,%r2227,%r2228,%r2229}, {%r402,%r403,%r404,%r405}, %rd49, %p2, 1, 1; 
2026-02-21T08:52:45.2716188Z // end inline asm 2026-02-21T08:52:45.2716252Z add.s32 %r737, %r735, 32; 2026-02-21T08:52:45.2716314Z bfe.u32 %r738, %r737, 4, 14; 2026-02-21T08:52:45.2716374Z cvt.u64.u32 %rd61, %r738; 2026-02-21T08:52:45.2716576Z or.b64 %rd50, %rd61, 4611686293372403712; 2026-02-21T08:52:45.2716663Z // begin inline asm 2026-02-21T08:52:45.2717163Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2214,%r2215,%r2216,%r2217,%r2218,%r2219,%r2220,%r2221,%r2222,%r2223,%r2224,%r2225,%r2226,%r2227,%r2228,%r2229}, {%r438,%r439,%r440,%r441}, %rd50, %p2, 1, 1; 2026-02-21T08:52:45.2717225Z // end inline asm 2026-02-21T08:52:45.2717291Z add.s32 %r739, %r735, 64; 2026-02-21T08:52:45.2717352Z bfe.u32 %r740, %r739, 4, 14; 2026-02-21T08:52:45.2717411Z cvt.u64.u32 %rd62, %r740; 2026-02-21T08:52:45.2717487Z or.b64 %rd51, %rd62, 4611686293372403712; 2026-02-21T08:52:45.2717546Z // begin inline asm 2026-02-21T08:52:45.2718038Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2214,%r2215,%r2216,%r2217,%r2218,%r2219,%r2220,%r2221,%r2222,%r2223,%r2224,%r2225,%r2226,%r2227,%r2228,%r2229}, {%r474,%r475,%r476,%r477}, %rd51, %p2, 1, 1; 2026-02-21T08:52:45.2718095Z // end inline asm 2026-02-21T08:52:45.2718160Z add.s32 %r741, %r735, 96; 2026-02-21T08:52:45.2718221Z bfe.u32 %r742, %r741, 4, 14; 2026-02-21T08:52:45.2718282Z cvt.u64.u32 %rd63, %r742; 2026-02-21T08:52:45.2718358Z or.b64 %rd52, %rd63, 4611686293372403712; 2026-02-21T08:52:45.2718417Z // begin inline asm 2026-02-21T08:52:45.2719052Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2214,%r2215,%r2216,%r2217,%r2218,%r2219,%r2220,%r2221,%r2222,%r2223,%r2224,%r2225,%r2226,%r2227,%r2228,%r2229}, {%r510,%r511,%r512,%r513}, %rd52, %p2, 1, 1; 2026-02-21T08:52:45.2719119Z // end inline asm 2026-02-21T08:52:45.2719184Z add.s32 %r743, %r735, 16384; 2026-02-21T08:52:45.2719246Z bfe.u32 %r744, %r743, 4, 14; 2026-02-21T08:52:45.2719309Z cvt.u64.u32 %rd64, %r744; 2026-02-21T08:52:45.2719388Z or.b64 %rd53, %rd64, 
4611686293372403712; 2026-02-21T08:52:45.2719448Z // begin inline asm 2026-02-21T08:52:45.2719942Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2214,%r2215,%r2216,%r2217,%r2218,%r2219,%r2220,%r2221,%r2222,%r2223,%r2224,%r2225,%r2226,%r2227,%r2228,%r2229}, {%r546,%r547,%r548,%r549}, %rd53, %p2, 1, 1; 2026-02-21T08:52:45.2720135Z // end inline asm 2026-02-21T08:52:45.2720205Z add.s32 %r745, %r735, 16416; 2026-02-21T08:52:45.2720267Z bfe.u32 %r746, %r745, 4, 14; 2026-02-21T08:52:45.2720337Z cvt.u64.u32 %rd65, %r746; 2026-02-21T08:52:45.2720414Z or.b64 %rd54, %rd65, 4611686293372403712; 2026-02-21T08:52:45.2720473Z // begin inline asm 2026-02-21T08:52:45.2720965Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2214,%r2215,%r2216,%r2217,%r2218,%r2219,%r2220,%r2221,%r2222,%r2223,%r2224,%r2225,%r2226,%r2227,%r2228,%r2229}, {%r582,%r583,%r584,%r585}, %rd54, %p2, 1, 1; 2026-02-21T08:52:45.2721028Z // end inline asm 2026-02-21T08:52:45.2721091Z add.s32 %r747, %r735, 16448; 2026-02-21T08:52:45.2721151Z bfe.u32 %r748, %r747, 4, 14; 2026-02-21T08:52:45.2721219Z cvt.u64.u32 %rd66, %r748; 2026-02-21T08:52:45.2721290Z or.b64 %rd55, %rd66, 4611686293372403712; 2026-02-21T08:52:45.2721350Z // begin inline asm 2026-02-21T08:52:45.2721846Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2214,%r2215,%r2216,%r2217,%r2218,%r2219,%r2220,%r2221,%r2222,%r2223,%r2224,%r2225,%r2226,%r2227,%r2228,%r2229}, {%r618,%r619,%r620,%r621}, %rd55, %p2, 1, 1; 2026-02-21T08:52:45.2721906Z // end inline asm 2026-02-21T08:52:45.2721971Z add.s32 %r749, %r735, 16480; 2026-02-21T08:52:45.2722032Z bfe.u32 %r750, %r749, 4, 14; 2026-02-21T08:52:45.2722099Z cvt.u64.u32 %rd67, %r750; 2026-02-21T08:52:45.2722167Z or.b64 %rd56, %rd67, 4611686293372403712; 2026-02-21T08:52:45.2722227Z // begin inline asm 2026-02-21T08:52:45.2722724Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r2214,%r2215,%r2216,%r2217,%r2218,%r2219,%r2220,%r2221,%r2222,%r2223,%r2224,%r2225,%r2226,%r2227,%r2228,%r2229}, {%r654,%r655,%r656,%r657}, %rd56, %p2, 1, 1; 2026-02-21T08:52:45.2722782Z // end inline asm 2026-02-21T08:52:45.2722861Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:45.2722925Z mov.b32 %r675, 0; 2026-02-21T08:52:45.2722986Z mov.b32 %r674, %r2197; 2026-02-21T08:52:45.2723047Z mov.b32 %r676, %r675; 2026-02-21T08:52:45.2723109Z // begin inline asm 2026-02-21T08:52:45.2723415Z // wait for regs: %r2214,%r2215,%r2216,%r2217,%r2218,%r2219,%r2220,%r2221,%r2222,%r2223,%r2224,%r2225,%r2226,%r2227,%r2228,%r2229,%r674,%r675,%r676 2026-02-21T08:52:45.2723495Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:45.2723552Z // end inline asm 2026-02-21T08:52:45.2723615Z $L__tmp2: 2026-02-21T08:52:45.2723840Z .loc 1 40 125 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:40:125 2026-02-21T08:52:45.2723905Z add.s32 %r751, %r2213, 1; 2026-02-21T08:52:45.2723982Z setp.gt.s32 %p13, %r751, 1; 2026-02-21T08:52:45.2724051Z selp.b32 %r2213, 0, %r751, %p13; 2026-02-21T08:52:45.2724272Z .loc 1 48 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:32 2026-02-21T08:52:45.2724348Z mad.wide.s32 %rd58, %r2210, 2, %rd28; 2026-02-21T08:52:45.2724563Z .loc 1 48 80 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:80 2026-02-21T08:52:45.2724638Z shl.b32 %r752, %r2213, 12; 2026-02-21T08:52:45.2724700Z shl.b32 %r753, %r2213, 13; 2026-02-21T08:52:45.2724771Z add.s32 %r754, %r705, %r753; 2026-02-21T08:52:45.2724834Z add.s32 %r696, %r754, %r10; 2026-02-21T08:52:45.2725022Z selp.b32 %r697, 8, 0, %p11; 2026-02-21T08:52:45.2725092Z // begin inline asm 2026-02-21T08:52:45.2725237Z cp.async.ca.shared.global [ %r696 + 0 ], [ %rd192 + 0 ], 0x8, %r697; 2026-02-21T08:52:45.2725298Z // end inline asm 2026-02-21T08:52:45.2725360Z add.s32 %r698, %r696, 4096; 2026-02-21T08:52:45.2725430Z // begin inline asm 2026-02-21T08:52:45.2725567Z cp.async.ca.shared.global [ 
%r698 + 0 ], [ %rd58 + 0 ], 0x8, %r697; 2026-02-21T08:52:45.2725626Z // end inline asm 2026-02-21T08:52:45.2725700Z cp.async.commit_group; 2026-02-21T08:52:45.2725906Z .loc 1 54 34 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:34 2026-02-21T08:52:45.2725970Z cvt.s64.s32 %rd68, %r2211; 2026-02-21T08:52:45.2726140Z add.s64 %rd59, %rd29, %rd68; 2026-02-21T08:52:45.2726347Z .loc 1 54 87 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:87 2026-02-21T08:52:45.2726413Z add.s32 %r700, %r14, %r752; 2026-02-21T08:52:45.2726595Z // begin inline asm 2026-02-21T08:52:45.2726737Z cp.async.ca.shared.global [ %r700 + 0 ], [ %rd59 + 0 ], 0x8, %r697; 2026-02-21T08:52:45.2726794Z // end inline asm 2026-02-21T08:52:45.2726860Z cp.async.commit_group; 2026-02-21T08:52:45.2727073Z .loc 1 40 125 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:40:125 2026-02-21T08:52:45.2727137Z add.s32 %r2211, %r2211, 229376; 2026-02-21T08:52:45.2727199Z add.s64 %rd192, %rd192, 128; 2026-02-21T08:52:45.2727260Z add.s32 %r2210, %r2210, 64; 2026-02-21T08:52:45.2727331Z setp.lt.u64 %p14, %rd193, 4064; 2026-02-21T08:52:45.2727391Z @%p14 bra $L__BB0_3; 2026-02-21T08:52:45.2727504Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:45.2727580Z cp.async.wait_group 0; 2026-02-21T08:52:45.2727637Z bar.sync 0; 2026-02-21T08:52:45.2727842Z .loc 1 87 28 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:87:28 2026-02-21T08:52:45.2727929Z cvt.rn.bf16x2.f32 %r788, %r2215, %r2214; 2026-02-21T08:52:45.2728003Z cvt.rn.bf16x2.f32 %r789, %r2217, %r2216; 2026-02-21T08:52:45.2728075Z cvt.rn.bf16x2.f32 %r790, %r2219, %r2218; 2026-02-21T08:52:45.2728145Z cvt.rn.bf16x2.f32 %r791, %r2221, %r2220; 2026-02-21T08:52:45.2728230Z cvt.rn.bf16x2.f32 %r792, %r2223, %r2222; 2026-02-21T08:52:45.2728309Z cvt.rn.bf16x2.f32 %r793, %r2225, %r2224; 2026-02-21T08:52:45.2728380Z cvt.rn.bf16x2.f32 %r794, %r2227, %r2226; 2026-02-21T08:52:45.2728453Z cvt.rn.bf16x2.f32 %r795, %r2229, %r2228; 
2026-02-21T08:52:45.2728653Z .loc 1 88 50 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:88:50 2026-02-21T08:52:45.2728725Z mad.lo.s32 %r796, %r82, 7168, %r84; 2026-02-21T08:52:45.2728799Z mad.lo.s32 %r797, %r83, 7168, %r84; 2026-02-21T08:52:45.2728995Z .loc 1 88 22 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:88:22 2026-02-21T08:52:45.2729064Z mad.wide.s32 %rd69, %r796, 2, %rd30; 2026-02-21T08:52:45.2729135Z mad.wide.s32 %rd70, %r797, 2, %rd30; 2026-02-21T08:52:45.2729339Z .loc 1 88 81 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:88:81 2026-02-21T08:52:45.2729446Z st.shared.v4.b32 [%r36], {%r788, %r790, %r792, %r794}; 2026-02-21T08:52:45.2729557Z st.shared.v4.b32 [%r36+512], {%r789, %r791, %r793, %r795}; 2026-02-21T08:52:45.2729620Z bar.sync 0; 2026-02-21T08:52:45.2729681Z // begin inline asm 2026-02-21T08:52:45.2729866Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r765, %r766, %r767, %r768}, [%r759]; 2026-02-21T08:52:45.2729930Z // end inline asm 2026-02-21T08:52:45.2729991Z // begin inline asm 2026-02-21T08:52:45.2730167Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r769, %r770, %r771, %r772}, [%r764]; 2026-02-21T08:52:45.2730228Z // end inline asm 2026-02-21T08:52:45.2730307Z // begin inline asm 2026-02-21T08:52:45.2730428Z st.global.v4.b32 [ %rd69 + 0 ], { %r765, %r766, %r767, %r768 }; 2026-02-21T08:52:45.2730486Z // end inline asm 2026-02-21T08:52:45.2730691Z // begin inline asm 2026-02-21T08:52:45.2730803Z st.global.v4.b32 [ %rd70 + 0 ], { %r769, %r770, %r771, %r772 }; 2026-02-21T08:52:45.2730860Z // end inline asm 2026-02-21T08:52:45.2731082Z .loc 1 19 144 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:19:144 2026-02-21T08:52:45.2731147Z add.s32 %r798, %r2209, 4224; 2026-02-21T08:52:45.2731348Z .loc 1 25 35 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:25:35 2026-02-21T08:52:45.2731421Z mul.hi.s32 %r799, %r798, -1840700269; 2026-02-21T08:52:45.2731490Z add.s32 %r800, %r799, %r798; 
2026-02-21T08:52:45.2731552Z shr.u32 %r801, %r800, 31; 2026-02-21T08:52:45.2731614Z shr.s32 %r802, %r800, 7; 2026-02-21T08:52:45.2731809Z add.s32 %r803, %r802, %r801; 2026-02-21T08:52:45.2732026Z .loc 1 26 33 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:26:33 2026-02-21T08:52:45.2732087Z shl.b32 %r804, %r803, 2; 2026-02-21T08:52:45.2732298Z .loc 1 27 39 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:27:39 2026-02-21T08:52:45.2732358Z sub.s32 %r805, 1, %r804; 2026-02-21T08:52:45.2732557Z .loc 1 27 52 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:27:52 2026-02-21T08:52:45.2732617Z min.s32 %r806, %r805, 4; 2026-02-21T08:52:45.2732821Z .loc 1 28 45 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:28:45 2026-02-21T08:52:45.2732887Z mul.lo.s32 %r807, %r803, 224; 2026-02-21T08:52:45.2732948Z sub.s32 %r808, %r798, %r807; 2026-02-21T08:52:45.2733154Z .loc 1 29 51 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:29:51 2026-02-21T08:52:45.2733216Z div.s32 %r809, %r808, %r806; 2026-02-21T08:52:45.2733414Z .loc 1 28 64 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:28:64 2026-02-21T08:52:45.2733486Z mul.lo.s32 %r810, %r809, %r806; 2026-02-21T08:52:45.2733550Z sub.s32 %r811, %r808, %r810; 2026-02-21T08:52:45.2733748Z .loc 1 28 30 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:28:30 2026-02-21T08:52:45.2733815Z add.s32 %r812, %r811, %r804; 2026-02-21T08:52:45.2734014Z .loc 1 30 27 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:30:27 2026-02-21T08:52:45.2734075Z shl.b32 %r813, %r812, 6; 2026-02-21T08:52:45.2734274Z .loc 1 31 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:31:32 2026-02-21T08:52:45.2734345Z or.b32 %r127, %r813, %r5; 2026-02-21T08:52:45.2734404Z or.b32 %r128, %r813, %r6; 2026-02-21T08:52:45.2734614Z .loc 1 32 27 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:32:27 2026-02-21T08:52:45.2734684Z shl.b32 %r814, %r809, 7; 2026-02-21T08:52:45.2734890Z 
.loc 1 33 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:33:32 2026-02-21T08:52:45.2734952Z or.b32 %r129, %r814, %r8; 2026-02-21T08:52:45.2735154Z .loc 1 48 53 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:53 2026-02-21T08:52:45.2735216Z shl.b32 %r815, %r127, 13; 2026-02-21T08:52:45.2735275Z shl.b32 %r816, %r128, 13; 2026-02-21T08:52:45.2735473Z .loc 1 48 60 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:60 2026-02-21T08:52:45.2735538Z or.b32 %r817, %r815, %r7; 2026-02-21T08:52:45.2735599Z or.b32 %r818, %r816, %r7; 2026-02-21T08:52:45.2735796Z .loc 1 48 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:32 2026-02-21T08:52:45.2735875Z mad.wide.s32 %rd71, %r817, 2, %rd28; 2026-02-21T08:52:45.2735946Z mad.wide.s32 %rd72, %r818, 2, %rd28; 2026-02-21T08:52:45.2736005Z mov.b32 %r774, 8; 2026-02-21T08:52:45.2736213Z .loc 1 48 80 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:80 2026-02-21T08:52:45.2736273Z // begin inline asm 2026-02-21T08:52:45.2736643Z cp.async.ca.shared.global [ %r11 + 0 ], [ %rd71 + 0 ], 0x8, %r774; 2026-02-21T08:52:45.2736708Z // end inline asm 2026-02-21T08:52:45.2736778Z // begin inline asm 2026-02-21T08:52:45.2736924Z cp.async.ca.shared.global [ %r12 + 0 ], [ %rd72 + 0 ], 0x8, %r774; 2026-02-21T08:52:45.2736984Z // end inline asm 2026-02-21T08:52:45.2737056Z cp.async.commit_group; 2026-02-21T08:52:45.2737259Z .loc 1 54 62 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:62 2026-02-21T08:52:45.2737324Z add.s32 %r819, %r129, %r2198; 2026-02-21T08:52:45.2737528Z .loc 1 54 34 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:34 2026-02-21T08:52:45.2737591Z cvt.s64.s32 %rd78, %r819; 2026-02-21T08:52:45.2737784Z add.s64 %rd73, %rd29, %rd78; 2026-02-21T08:52:45.2737988Z .loc 1 54 87 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:87 2026-02-21T08:52:45.2738056Z // begin inline asm 2026-02-21T08:52:45.2738189Z cp.async.ca.shared.global [ %r14 
+ 0 ], [ %rd73 + 0 ], 0x8, %r774; 2026-02-21T08:52:45.2738245Z // end inline asm 2026-02-21T08:52:45.2738316Z cp.async.commit_group; 2026-02-21T08:52:45.2738515Z .loc 1 48 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:32 2026-02-21T08:52:45.2738575Z cvt.s64.s32 %rd79, %r815; 2026-02-21T08:52:45.2738640Z or.b64 %rd80, %rd79, %rd191; 2026-02-21T08:52:45.2738700Z shl.b64 %rd81, %rd80, 1; 2026-02-21T08:52:45.2738762Z add.s64 %rd82, %rd28, %rd81; 2026-02-21T08:52:45.2738824Z add.s64 %rd74, %rd82, 128; 2026-02-21T08:52:45.2738889Z cvt.s64.s32 %rd83, %r816; 2026-02-21T08:52:45.2738951Z or.b64 %rd84, %rd83, %rd191; 2026-02-21T08:52:45.2739010Z shl.b64 %rd85, %rd84, 1; 2026-02-21T08:52:45.2739079Z add.s64 %rd86, %rd28, %rd85; 2026-02-21T08:52:45.2739142Z add.s64 %rd75, %rd86, 128; 2026-02-21T08:52:45.2739339Z .loc 1 48 80 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:80 2026-02-21T08:52:45.2739400Z bar.sync 0; 2026-02-21T08:52:45.2739468Z // begin inline asm 2026-02-21T08:52:45.2739594Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd74 + 0 ], 0x8, %r774; 2026-02-21T08:52:45.2739652Z // end inline asm 2026-02-21T08:52:45.2739722Z // begin inline asm 2026-02-21T08:52:45.2739855Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd75 + 0 ], 0x8, %r774; 2026-02-21T08:52:45.2739913Z // end inline asm 2026-02-21T08:52:45.2739986Z cp.async.commit_group; 2026-02-21T08:52:45.2740183Z .loc 1 54 62 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:62 2026-02-21T08:52:45.2740245Z add.s32 %r820, %r129, %r2199; 2026-02-21T08:52:45.2740442Z .loc 1 54 34 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:34 2026-02-21T08:52:45.2740512Z cvt.s64.s32 %rd87, %r820; 2026-02-21T08:52:45.2740574Z add.s64 %rd76, %rd29, %rd87; 2026-02-21T08:52:45.2740772Z .loc 1 54 87 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:87 2026-02-21T08:52:45.2740840Z // begin inline asm 2026-02-21T08:52:45.2740966Z cp.async.ca.shared.global [ %r18 + 0 ], [ 
%rd76 + 0 ], 0x8, %r774; 2026-02-21T08:52:45.2741022Z // end inline asm 2026-02-21T08:52:45.2741090Z cp.async.commit_group; 2026-02-21T08:52:45.2741299Z .loc 1 40 125 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:40:125 2026-02-21T08:52:45.2741361Z add.s32 %r2231, %r47, %r814; 2026-02-21T08:52:45.2741421Z shl.b32 %r821, %r812, 19; 2026-02-21T08:52:45.2741487Z or.b32 %r822, %r48, %r821; 2026-02-21T08:52:45.2741555Z mad.wide.s32 %rd194, %r822, 2, %rd1; 2026-02-21T08:52:45.2741617Z or.b32 %r2230, %r49, %r821; 2026-02-21T08:52:45.2741685Z mov.b32 %r2234, 0f00000000; 2026-02-21T08:52:45.2741746Z mov.b32 %r2233, 1; 2026-02-21T08:52:45.2741808Z mov.b32 %r2232, -1; 2026-02-21T08:52:45.2741870Z mov.b64 %rd195, -32; 2026-02-21T08:52:45.2741938Z mov.b32 %r2235, %r2234; 2026-02-21T08:52:45.2742932Z mov.b32 %r2236, %r2234; 2026-02-21T08:52:45.2742993Z mov.b32 %r2237, %r2234; 2026-02-21T08:52:45.2743058Z mov.b32 %r2238, %r2234; 2026-02-21T08:52:45.2743118Z mov.b32 %r2239, %r2234; 2026-02-21T08:52:45.2743177Z mov.b32 %r2240, %r2234; 2026-02-21T08:52:45.2743234Z mov.b32 %r2241, %r2234; 2026-02-21T08:52:45.2743299Z mov.b32 %r2242, %r2234; 2026-02-21T08:52:45.2743359Z mov.b32 %r2243, %r2234; 2026-02-21T08:52:45.2743423Z mov.b32 %r2244, %r2234; 2026-02-21T08:52:45.2743487Z mov.b32 %r2245, %r2234; 2026-02-21T08:52:45.2743546Z mov.b32 %r2246, %r2234; 2026-02-21T08:52:45.2743604Z mov.b32 %r2247, %r2234; 2026-02-21T08:52:45.2743668Z mov.b32 %r2248, %r2234; 2026-02-21T08:52:45.2743726Z mov.b32 %r2249, %r2234; 2026-02-21T08:52:45.2743960Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:45.2744073Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:45.2744143Z add.s64 %rd195, %rd195, 32; 2026-02-21T08:52:45.2744218Z setp.lt.u64 %p24, %rd195, 4032; 2026-02-21T08:52:45.2744279Z add.s32 %r1155, %r2232, 1; 2026-02-21T08:52:45.2744352Z setp.gt.s32 %p25, %r1155, 1; 2026-02-21T08:52:45.2744428Z selp.b32 %r2232, 0, %r1155, %p25; 2026-02-21T08:52:45.2744631Z .loc 1 48 80 // 
cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:80 2026-02-21T08:52:45.2744698Z cp.async.wait_group 2; 2026-02-21T08:52:45.2744761Z bar.sync 0; 2026-02-21T08:52:45.2744822Z shl.b32 %r1156, %r2232, 12; 2026-02-21T08:52:45.2744882Z shl.b32 %r1157, %r2232, 13; 2026-02-21T08:52:45.2744952Z add.s32 %r1159, %r705, %r1157; 2026-02-21T08:52:45.2745152Z .loc 1 52 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:52:32 2026-02-21T08:52:45.2745221Z add.s32 %r1160, %r1159, %r19; 2026-02-21T08:52:45.2745295Z ld.shared.b16 %rs113, [%r1160]; 2026-02-21T08:52:45.2745363Z ld.shared.b16 %rs114, [%r1160+1024]; 2026-02-21T08:52:45.2745432Z ld.shared.b16 %rs115, [%r1160+64]; 2026-02-21T08:52:45.2745502Z ld.shared.b16 %rs116, [%r1160+1088]; 2026-02-21T08:52:45.2745570Z add.s32 %r1161, %r1159, %r20; 2026-02-21T08:52:45.2745637Z ld.shared.b16 %rs117, [%r1161]; 2026-02-21T08:52:45.2745705Z ld.shared.b16 %rs118, [%r1161+1024]; 2026-02-21T08:52:45.2745780Z ld.shared.b16 %rs119, [%r1161+64]; 2026-02-21T08:52:45.2745846Z ld.shared.b16 %rs120, [%r1161+1088]; 2026-02-21T08:52:45.2745907Z add.s32 %r1162, %r1159, %r21; 2026-02-21T08:52:45.2745973Z ld.shared.b16 %rs121, [%r1162]; 2026-02-21T08:52:45.2746045Z ld.shared.b16 %rs122, [%r1162+1024]; 2026-02-21T08:52:45.2746112Z ld.shared.b16 %rs123, [%r1162+64]; 2026-02-21T08:52:45.2746178Z ld.shared.b16 %rs124, [%r1162+1088]; 2026-02-21T08:52:45.2746248Z add.s32 %r1163, %r1159, %r22; 2026-02-21T08:52:45.2746316Z ld.shared.b16 %rs125, [%r1163]; 2026-02-21T08:52:45.2746384Z ld.shared.b16 %rs126, [%r1163+1024]; 2026-02-21T08:52:45.2746586Z ld.shared.b16 %rs127, [%r1163+64]; 2026-02-21T08:52:45.2746657Z ld.shared.b16 %rs128, [%r1163+1088]; 2026-02-21T08:52:45.2746724Z add.s32 %r1164, %r1159, %r23; 2026-02-21T08:52:45.2746791Z ld.shared.b16 %rs129, [%r1164]; 2026-02-21T08:52:45.2746864Z ld.shared.b16 %rs130, [%r1164+1024]; 2026-02-21T08:52:45.2746931Z ld.shared.b16 %rs131, [%r1164+64]; 2026-02-21T08:52:45.2746998Z ld.shared.b16 
%rs132, [%r1164+1088]; 2026-02-21T08:52:45.2747064Z add.s32 %r1165, %r1159, %r24; 2026-02-21T08:52:45.2747128Z ld.shared.b16 %rs133, [%r1165]; 2026-02-21T08:52:45.2747194Z ld.shared.b16 %rs134, [%r1165+1024]; 2026-02-21T08:52:45.2747258Z ld.shared.b16 %rs135, [%r1165+64]; 2026-02-21T08:52:45.2747346Z ld.shared.b16 %rs136, [%r1165+1088]; 2026-02-21T08:52:45.2747409Z add.s32 %r1166, %r1159, %r25; 2026-02-21T08:52:45.2747474Z ld.shared.b16 %rs137, [%r1166]; 2026-02-21T08:52:45.2747549Z ld.shared.b16 %rs138, [%r1166+1024]; 2026-02-21T08:52:45.2747614Z ld.shared.b16 %rs139, [%r1166+64]; 2026-02-21T08:52:45.2747679Z ld.shared.b16 %rs140, [%r1166+1088]; 2026-02-21T08:52:45.2747908Z add.s32 %r1167, %r1159, %r26; 2026-02-21T08:52:45.2747976Z ld.shared.b16 %rs141, [%r1167]; 2026-02-21T08:52:45.2748041Z ld.shared.b16 %rs142, [%r1167+1024]; 2026-02-21T08:52:45.2748105Z ld.shared.b16 %rs143, [%r1167+64]; 2026-02-21T08:52:45.2748177Z ld.shared.b16 %rs144, [%r1167+1088]; 2026-02-21T08:52:45.2748239Z cvt.f32.bf16 %r855, %rs113; 2026-02-21T08:52:45.2748301Z cvt.f32.bf16 %r856, %rs114; 2026-02-21T08:52:45.2748372Z cvt.f32.bf16 %r857, %rs117; 2026-02-21T08:52:45.2748432Z cvt.f32.bf16 %r858, %rs118; 2026-02-21T08:52:45.2748492Z cvt.f32.bf16 %r891, %rs121; 2026-02-21T08:52:45.2748639Z cvt.f32.bf16 %r892, %rs122; 2026-02-21T08:52:45.2748709Z cvt.f32.bf16 %r893, %rs125; 2026-02-21T08:52:45.2748775Z cvt.f32.bf16 %r894, %rs126; 2026-02-21T08:52:45.2748966Z cvt.f32.bf16 %r927, %rs129; 2026-02-21T08:52:45.2749036Z cvt.f32.bf16 %r928, %rs130; 2026-02-21T08:52:45.2749098Z cvt.f32.bf16 %r929, %rs133; 2026-02-21T08:52:45.2749158Z cvt.f32.bf16 %r930, %rs134; 2026-02-21T08:52:45.2749223Z cvt.f32.bf16 %r963, %rs137; 2026-02-21T08:52:45.2749290Z cvt.f32.bf16 %r964, %rs138; 2026-02-21T08:52:45.2749350Z cvt.f32.bf16 %r965, %rs141; 2026-02-21T08:52:45.2749411Z cvt.f32.bf16 %r966, %rs142; 2026-02-21T08:52:45.2749478Z cvt.f32.bf16 %r999, %rs115; 2026-02-21T08:52:45.2749544Z cvt.f32.bf16 %r1000, %rs116; 
2026-02-21T08:52:45.2749606Z cvt.f32.bf16 %r1001, %rs119; 2026-02-21T08:52:45.2749674Z cvt.f32.bf16 %r1002, %rs120; 2026-02-21T08:52:45.2749736Z cvt.f32.bf16 %r1035, %rs123; 2026-02-21T08:52:45.2749797Z cvt.f32.bf16 %r1036, %rs124; 2026-02-21T08:52:45.2749858Z cvt.f32.bf16 %r1037, %rs127; 2026-02-21T08:52:45.2749923Z cvt.f32.bf16 %r1038, %rs128; 2026-02-21T08:52:45.2749984Z cvt.f32.bf16 %r1071, %rs131; 2026-02-21T08:52:45.2750045Z cvt.f32.bf16 %r1072, %rs132; 2026-02-21T08:52:45.2750114Z cvt.f32.bf16 %r1073, %rs135; 2026-02-21T08:52:45.2750177Z cvt.f32.bf16 %r1074, %rs136; 2026-02-21T08:52:45.2750253Z cvt.f32.bf16 %r1107, %rs139; 2026-02-21T08:52:45.2750318Z cvt.f32.bf16 %r1108, %rs140; 2026-02-21T08:52:45.2750387Z cvt.f32.bf16 %r1109, %rs143; 2026-02-21T08:52:45.2750448Z cvt.f32.bf16 %r1110, %rs144; 2026-02-21T08:52:45.2750665Z .loc 1 67 45 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:67:45 2026-02-21T08:52:45.2750737Z add.s32 %r1168, %r27, %r1156; 2026-02-21T08:52:45.2750806Z ld.shared.b8 %rs145, [%r1168]; 2026-02-21T08:52:45.2750877Z ld.shared.b8 %rs146, [%r1168+256]; 2026-02-21T08:52:45.2750942Z ld.shared.b8 %rs147, [%r1168+512]; 2026-02-21T08:52:45.2751016Z ld.shared.b8 %rs148, [%r1168+768]; 2026-02-21T08:52:45.2751085Z ld.shared.b8 %rs149, [%r1168+1024]; 2026-02-21T08:52:45.2751153Z ld.shared.b8 %rs150, [%r1168+1280]; 2026-02-21T08:52:45.2751226Z ld.shared.b8 %rs151, [%r1168+1536]; 2026-02-21T08:52:45.2751295Z ld.shared.b8 %rs152, [%r1168+1792]; 2026-02-21T08:52:45.2751364Z ld.shared.b8 %rs153, [%r1168+2048]; 2026-02-21T08:52:45.2751435Z ld.shared.b8 %rs154, [%r1168+2304]; 2026-02-21T08:52:45.2751504Z ld.shared.b8 %rs155, [%r1168+2560]; 2026-02-21T08:52:45.2751581Z ld.shared.b8 %rs156, [%r1168+2816]; 2026-02-21T08:52:45.2751653Z ld.shared.b8 %rs157, [%r1168+3072]; 2026-02-21T08:52:45.2751725Z ld.shared.b8 %rs158, [%r1168+3328]; 2026-02-21T08:52:45.2751792Z ld.shared.b8 %rs159, [%r1168+3584]; 2026-02-21T08:52:45.2751857Z ld.shared.b8 %rs160, 
[%r1168+3840]; 2026-02-21T08:52:45.2752073Z .loc 1 57 28 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:57:28 2026-02-21T08:52:45.2752138Z shl.b16 %rs161, %rs145, 4; 2026-02-21T08:52:45.2752201Z shl.b16 %rs162, %rs146, 4; 2026-02-21T08:52:45.2752274Z shl.b16 %rs163, %rs147, 4; 2026-02-21T08:52:45.2752336Z shl.b16 %rs164, %rs148, 4; 2026-02-21T08:52:45.2752396Z shl.b16 %rs165, %rs149, 4; 2026-02-21T08:52:45.2752458Z shl.b16 %rs166, %rs150, 4; 2026-02-21T08:52:45.2752526Z shl.b16 %rs167, %rs151, 4; 2026-02-21T08:52:45.2752587Z shl.b16 %rs168, %rs152, 4; 2026-02-21T08:52:45.2752648Z shl.b16 %rs169, %rs153, 4; 2026-02-21T08:52:45.2752843Z shl.b16 %rs170, %rs154, 4; 2026-02-21T08:52:45.2752907Z shl.b16 %rs171, %rs155, 4; 2026-02-21T08:52:45.2752969Z shl.b16 %rs172, %rs156, 4; 2026-02-21T08:52:45.2753032Z shl.b16 %rs173, %rs157, 4; 2026-02-21T08:52:45.2753099Z shl.b16 %rs174, %rs158, 4; 2026-02-21T08:52:45.2753165Z shl.b16 %rs175, %rs159, 4; 2026-02-21T08:52:45.2753227Z shl.b16 %rs176, %rs160, 4; 2026-02-21T08:52:45.2753437Z .loc 1 72 58 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:72:58 2026-02-21T08:52:45.2753515Z selp.b16 %rs177, %rs161, %rs145, %p57; 2026-02-21T08:52:45.2753580Z cvt.s16.s8 %rs178, %rs177; 2026-02-21T08:52:45.2753643Z shr.s16 %rs179, %rs178, 4; 2026-02-21T08:52:45.2753725Z selp.b16 %rs180, %rs162, %rs146, %p57; 2026-02-21T08:52:45.2753892Z cvt.s16.s8 %rs181, %rs180; 2026-02-21T08:52:45.2753959Z shr.s16 %rs182, %rs181, 4; 2026-02-21T08:52:45.2754040Z selp.b16 %rs183, %rs163, %rs147, %p57; 2026-02-21T08:52:45.2754106Z cvt.s16.s8 %rs184, %rs183; 2026-02-21T08:52:45.2754168Z shr.s16 %rs185, %rs184, 4; 2026-02-21T08:52:45.2754238Z selp.b16 %rs186, %rs164, %rs148, %p57; 2026-02-21T08:52:45.2754306Z cvt.s16.s8 %rs187, %rs186; 2026-02-21T08:52:45.2754368Z shr.s16 %rs188, %rs187, 4; 2026-02-21T08:52:45.2754437Z selp.b16 %rs189, %rs165, %rs149, %p57; 2026-02-21T08:52:45.2754506Z cvt.s16.s8 %rs190, %rs189; 2026-02-21T08:52:45.2754569Z 
shr.s16 %rs191, %rs190, 4; 2026-02-21T08:52:45.2754636Z selp.b16 %rs192, %rs166, %rs150, %p57; 2026-02-21T08:52:45.2754700Z cvt.s16.s8 %rs193, %rs192; 2026-02-21T08:52:45.2754761Z shr.s16 %rs194, %rs193, 4; 2026-02-21T08:52:45.2754828Z selp.b16 %rs195, %rs167, %rs151, %p57; 2026-02-21T08:52:45.2754890Z cvt.s16.s8 %rs196, %rs195; 2026-02-21T08:52:45.2754958Z shr.s16 %rs197, %rs196, 4; 2026-02-21T08:52:45.2755026Z selp.b16 %rs198, %rs168, %rs152, %p57; 2026-02-21T08:52:45.2755089Z cvt.s16.s8 %rs199, %rs198; 2026-02-21T08:52:45.2755157Z shr.s16 %rs200, %rs199, 4; 2026-02-21T08:52:45.2755226Z selp.b16 %rs201, %rs169, %rs153, %p57; 2026-02-21T08:52:45.2755286Z cvt.s16.s8 %rs202, %rs201; 2026-02-21T08:52:45.2755347Z shr.s16 %rs203, %rs202, 4; 2026-02-21T08:52:45.2755418Z selp.b16 %rs204, %rs170, %rs154, %p57; 2026-02-21T08:52:45.2755478Z cvt.s16.s8 %rs205, %rs204; 2026-02-21T08:52:45.2755538Z shr.s16 %rs206, %rs205, 4; 2026-02-21T08:52:45.2755610Z selp.b16 %rs207, %rs171, %rs155, %p57; 2026-02-21T08:52:45.2755671Z cvt.s16.s8 %rs208, %rs207; 2026-02-21T08:52:45.2755732Z shr.s16 %rs209, %rs208, 4; 2026-02-21T08:52:45.2755800Z selp.b16 %rs210, %rs172, %rs156, %p57; 2026-02-21T08:52:45.2755868Z cvt.s16.s8 %rs211, %rs210; 2026-02-21T08:52:45.2755928Z shr.s16 %rs212, %rs211, 4; 2026-02-21T08:52:45.2755996Z selp.b16 %rs213, %rs173, %rs157, %p57; 2026-02-21T08:52:45.2756075Z cvt.s16.s8 %rs214, %rs213; 2026-02-21T08:52:45.2756138Z shr.s16 %rs215, %rs214, 4; 2026-02-21T08:52:45.2756207Z selp.b16 %rs216, %rs174, %rs158, %p57; 2026-02-21T08:52:45.2756278Z cvt.s16.s8 %rs217, %rs216; 2026-02-21T08:52:45.2756338Z shr.s16 %rs218, %rs217, 4; 2026-02-21T08:52:45.2756405Z selp.b16 %rs219, %rs175, %rs159, %p57; 2026-02-21T08:52:45.2756585Z cvt.s16.s8 %rs220, %rs219; 2026-02-21T08:52:45.2756655Z shr.s16 %rs221, %rs220, 4; 2026-02-21T08:52:45.2756723Z selp.b16 %rs222, %rs176, %rs160, %p57; 2026-02-21T08:52:45.2756784Z cvt.s16.s8 %rs223, %rs222; 2026-02-21T08:52:45.2756850Z shr.s16 %rs224, 
%rs223, 4; 2026-02-21T08:52:45.2757052Z .loc 1 77 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:77:32 2026-02-21T08:52:45.2757118Z cvt.rn.f32.s16 %r1169, %rs179; 2026-02-21T08:52:45.2757182Z cvt.rn.f32.s16 %r1170, %rs182; 2026-02-21T08:52:45.2757250Z cvt.rn.f32.s16 %r1171, %rs185; 2026-02-21T08:52:45.2757316Z cvt.rn.f32.s16 %r1172, %rs188; 2026-02-21T08:52:45.2757381Z cvt.rn.f32.s16 %r1173, %rs191; 2026-02-21T08:52:45.2757449Z cvt.rn.f32.s16 %r1174, %rs194; 2026-02-21T08:52:45.2757510Z cvt.rn.f32.s16 %r1175, %rs197; 2026-02-21T08:52:45.2757722Z cvt.rn.f32.s16 %r1176, %rs200; 2026-02-21T08:52:45.2757787Z cvt.rn.f32.s16 %r1177, %rs203; 2026-02-21T08:52:45.2757854Z cvt.rn.f32.s16 %r1178, %rs206; 2026-02-21T08:52:45.2757917Z cvt.rn.f32.s16 %r1179, %rs209; 2026-02-21T08:52:45.2757979Z cvt.rn.f32.s16 %r1180, %rs212; 2026-02-21T08:52:45.2758048Z cvt.rn.f32.s16 %r1181, %rs215; 2026-02-21T08:52:45.2758111Z cvt.rn.f32.s16 %r1182, %rs218; 2026-02-21T08:52:45.2758174Z cvt.rn.f32.s16 %r1183, %rs221; 2026-02-21T08:52:45.2758242Z cvt.rn.f32.s16 %r1184, %rs224; 2026-02-21T08:52:45.2758309Z st.shared.b32 [%r39], %r1169; 2026-02-21T08:52:45.2758378Z st.shared.b32 [%r39+16384], %r1177; 2026-02-21T08:52:45.2758442Z st.shared.b32 [%r40], %r1170; 2026-02-21T08:52:45.2758518Z st.shared.b32 [%r40+16384], %r1178; 2026-02-21T08:52:45.2758699Z st.shared.b32 [%r41], %r1171; 2026-02-21T08:52:45.2758770Z st.shared.b32 [%r41+16384], %r1179; 2026-02-21T08:52:45.2758856Z st.shared.b32 [%r42], %r1172; 2026-02-21T08:52:45.2758930Z st.shared.b32 [%r42+16384], %r1180; 2026-02-21T08:52:45.2758994Z st.shared.b32 [%r43], %r1173; 2026-02-21T08:52:45.2759060Z st.shared.b32 [%r43+16384], %r1181; 2026-02-21T08:52:45.2759131Z st.shared.b32 [%r44], %r1174; 2026-02-21T08:52:45.2759198Z st.shared.b32 [%r44+16384], %r1182; 2026-02-21T08:52:45.2759262Z st.shared.b32 [%r45], %r1175; 2026-02-21T08:52:45.2759336Z st.shared.b32 [%r45+16384], %r1183; 2026-02-21T08:52:45.2759398Z st.shared.b32 [%r46], 
%r1176; 2026-02-21T08:52:45.2759463Z st.shared.b32 [%r46+16384], %r1184; 2026-02-21T08:52:45.2759519Z $L__tmp3: 2026-02-21T08:52:45.2759810Z .loc 2 291 36 // standard.py:291:36 @[ cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:84:40 ] 2026-02-21T08:52:45.2759873Z // begin inline asm 2026-02-21T08:52:45.2759956Z fence.proxy.async.shared::cta; 2026-02-21T08:52:45.2760021Z // end inline asm 2026-02-21T08:52:45.2760078Z bar.sync 0; 2026-02-21T08:52:45.2760160Z shfl.sync.idx.b32 %r1185, %r4, 0, 31, -1; 2026-02-21T08:52:45.2760251Z wgmma.fence.sync.aligned; 2026-02-21T08:52:45.2760316Z shl.b32 %r1186, %r1185, 10; 2026-02-21T08:52:45.2760378Z and.b32 %r1187, %r1186, 12288; 2026-02-21T08:52:45.2760443Z add.s32 %r1188, %r1187, %r2197; 2026-02-21T08:52:45.2760513Z bfe.u32 %r1189, %r1188, 4, 14; 2026-02-21T08:52:45.2760576Z cvt.u64.u32 %rd99, %r1189; 2026-02-21T08:52:45.2760653Z or.b64 %rd88, %rd99, 4611686293372403712; 2026-02-21T08:52:45.2760725Z mov.pred %p15, -1; 2026-02-21T08:52:45.2760788Z // begin inline asm 2026-02-21T08:52:45.2761302Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2234,%r2235,%r2236,%r2237,%r2238,%r2239,%r2240,%r2241,%r2242,%r2243,%r2244,%r2245,%r2246,%r2247,%r2248,%r2249}, {%r855,%r856,%r857,%r858}, %rd88, %p15, 1, 1; 2026-02-21T08:52:45.2761369Z // end inline asm 2026-02-21T08:52:45.2761431Z add.s32 %r1190, %r1188, 32; 2026-02-21T08:52:45.2761494Z bfe.u32 %r1191, %r1190, 4, 14; 2026-02-21T08:52:45.2761558Z cvt.u64.u32 %rd100, %r1191; 2026-02-21T08:52:45.2761660Z or.b64 %rd89, %rd100, 4611686293372403712; 2026-02-21T08:52:45.2761724Z // begin inline asm 2026-02-21T08:52:45.2762226Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2234,%r2235,%r2236,%r2237,%r2238,%r2239,%r2240,%r2241,%r2242,%r2243,%r2244,%r2245,%r2246,%r2247,%r2248,%r2249}, {%r891,%r892,%r893,%r894}, %rd89, %p15, 1, 1; 2026-02-21T08:52:45.2762291Z // end inline asm 2026-02-21T08:52:45.2762352Z add.s32 %r1192, %r1188, 64; 2026-02-21T08:52:45.2762414Z bfe.u32 
%r1193, %r1192, 4, 14; 2026-02-21T08:52:45.2762483Z cvt.u64.u32 %rd101, %r1193; 2026-02-21T08:52:45.2762557Z or.b64 %rd90, %rd101, 4611686293372403712; 2026-02-21T08:52:45.2762618Z // begin inline asm 2026-02-21T08:52:45.2763111Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2234,%r2235,%r2236,%r2237,%r2238,%r2239,%r2240,%r2241,%r2242,%r2243,%r2244,%r2245,%r2246,%r2247,%r2248,%r2249}, {%r927,%r928,%r929,%r930}, %rd90, %p15, 1, 1; 2026-02-21T08:52:45.2763177Z // end inline asm 2026-02-21T08:52:45.2763344Z add.s32 %r1194, %r1188, 96; 2026-02-21T08:52:45.2763406Z bfe.u32 %r1195, %r1194, 4, 14; 2026-02-21T08:52:45.2763475Z cvt.u64.u32 %rd102, %r1195; 2026-02-21T08:52:45.2763546Z or.b64 %rd91, %rd102, 4611686293372403712; 2026-02-21T08:52:45.2763617Z // begin inline asm 2026-02-21T08:52:45.2764120Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2234,%r2235,%r2236,%r2237,%r2238,%r2239,%r2240,%r2241,%r2242,%r2243,%r2244,%r2245,%r2246,%r2247,%r2248,%r2249}, {%r963,%r964,%r965,%r966}, %rd91, %p15, 1, 1; 2026-02-21T08:52:45.2764177Z // end inline asm 2026-02-21T08:52:45.2764239Z add.s32 %r1196, %r1188, 16384; 2026-02-21T08:52:45.2764300Z bfe.u32 %r1197, %r1196, 4, 14; 2026-02-21T08:52:45.2764369Z cvt.u64.u32 %rd103, %r1197; 2026-02-21T08:52:45.2764549Z or.b64 %rd92, %rd103, 4611686293372403712; 2026-02-21T08:52:45.2764611Z // begin inline asm 2026-02-21T08:52:45.2765119Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2234,%r2235,%r2236,%r2237,%r2238,%r2239,%r2240,%r2241,%r2242,%r2243,%r2244,%r2245,%r2246,%r2247,%r2248,%r2249}, {%r999,%r1000,%r1001,%r1002}, %rd92, %p15, 1, 1; 2026-02-21T08:52:45.2765181Z // end inline asm 2026-02-21T08:52:45.2765242Z add.s32 %r1198, %r1188, 16416; 2026-02-21T08:52:45.2765309Z bfe.u32 %r1199, %r1198, 4, 14; 2026-02-21T08:52:45.2765371Z cvt.u64.u32 %rd104, %r1199; 2026-02-21T08:52:45.2765443Z or.b64 %rd93, %rd104, 4611686293372403712; 2026-02-21T08:52:45.2765504Z // begin inline asm 2026-02-21T08:52:45.2766012Z 
wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2234,%r2235,%r2236,%r2237,%r2238,%r2239,%r2240,%r2241,%r2242,%r2243,%r2244,%r2245,%r2246,%r2247,%r2248,%r2249}, {%r1035,%r1036,%r1037,%r1038}, %rd93, %p15, 1, 1; 2026-02-21T08:52:45.2766069Z // end inline asm 2026-02-21T08:52:45.2766129Z add.s32 %r1200, %r1188, 16448; 2026-02-21T08:52:45.2766197Z bfe.u32 %r1201, %r1200, 4, 14; 2026-02-21T08:52:45.2766258Z cvt.u64.u32 %rd105, %r1201; 2026-02-21T08:52:45.2766329Z or.b64 %rd94, %rd105, 4611686293372403712; 2026-02-21T08:52:45.2766396Z // begin inline asm 2026-02-21T08:52:45.2767030Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2234,%r2235,%r2236,%r2237,%r2238,%r2239,%r2240,%r2241,%r2242,%r2243,%r2244,%r2245,%r2246,%r2247,%r2248,%r2249}, {%r1071,%r1072,%r1073,%r1074}, %rd94, %p15, 1, 1; 2026-02-21T08:52:45.2767092Z // end inline asm 2026-02-21T08:52:45.2767160Z add.s32 %r1202, %r1188, 16480; 2026-02-21T08:52:45.2767221Z bfe.u32 %r1203, %r1202, 4, 14; 2026-02-21T08:52:45.2767284Z cvt.u64.u32 %rd106, %r1203; 2026-02-21T08:52:45.2767355Z or.b64 %rd95, %rd106, 4611686293372403712; 2026-02-21T08:52:45.2767420Z // begin inline asm 2026-02-21T08:52:45.2767924Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2234,%r2235,%r2236,%r2237,%r2238,%r2239,%r2240,%r2241,%r2242,%r2243,%r2244,%r2245,%r2246,%r2247,%r2248,%r2249}, {%r1107,%r1108,%r1109,%r1110}, %rd95, %p15, 1, 1; 2026-02-21T08:52:45.2767982Z // end inline asm 2026-02-21T08:52:45.2781735Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:45.2781855Z mov.b32 %r1129, 0; 2026-02-21T08:52:45.2781925Z mov.b32 %r1127, %r2197; 2026-02-21T08:52:45.2781985Z mov.b32 %r1128, %r1129; 2026-02-21T08:52:45.2782053Z // begin inline asm 2026-02-21T08:52:45.2782397Z // wait for regs: %r2234,%r2235,%r2236,%r2237,%r2238,%r2239,%r2240,%r2241,%r2242,%r2243,%r2244,%r2245,%r2246,%r2247,%r2248,%r2249,%r1127,%r1128,%r1129 2026-02-21T08:52:45.2782482Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:45.2782552Z // end inline asm 
2026-02-21T08:52:45.2782612Z $L__tmp4: 2026-02-21T08:52:45.2782857Z .loc 1 40 125 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:40:125 2026-02-21T08:52:45.2782934Z add.s32 %r1204, %r2233, 1; 2026-02-21T08:52:45.2783005Z setp.gt.s32 %p26, %r1204, 1; 2026-02-21T08:52:45.2783078Z selp.b32 %r2233, 0, %r1204, %p26; 2026-02-21T08:52:45.2783302Z .loc 1 48 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:32 2026-02-21T08:52:45.2783390Z mad.wide.s32 %rd97, %r2230, 2, %rd28; 2026-02-21T08:52:45.2783822Z .loc 1 48 80 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:80 2026-02-21T08:52:45.2783890Z shl.b32 %r1205, %r2233, 12; 2026-02-21T08:52:45.2783959Z shl.b32 %r1206, %r2233, 13; 2026-02-21T08:52:45.2784026Z add.s32 %r1207, %r705, %r1206; 2026-02-21T08:52:45.2784091Z add.s32 %r1149, %r1207, %r10; 2026-02-21T08:52:45.2784163Z selp.b32 %r1150, 8, 0, %p24; 2026-02-21T08:52:45.2784225Z // begin inline asm 2026-02-21T08:52:45.2784377Z cp.async.ca.shared.global [ %r1149 + 0 ], [ %rd194 + 0 ], 0x8, %r1150; 2026-02-21T08:52:45.2784439Z // end inline asm 2026-02-21T08:52:45.2784511Z add.s32 %r1151, %r1149, 4096; 2026-02-21T08:52:45.2784570Z // begin inline asm 2026-02-21T08:52:45.2784841Z cp.async.ca.shared.global [ %r1151 + 0 ], [ %rd97 + 0 ], 0x8, %r1150; 2026-02-21T08:52:45.2784913Z // end inline asm 2026-02-21T08:52:45.2784982Z cp.async.commit_group; 2026-02-21T08:52:45.2785193Z .loc 1 54 34 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:34 2026-02-21T08:52:45.2785268Z cvt.s64.s32 %rd107, %r2231; 2026-02-21T08:52:45.2785336Z add.s64 %rd98, %rd29, %rd107; 2026-02-21T08:52:45.2785538Z .loc 1 54 87 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:87 2026-02-21T08:52:45.2785599Z add.s32 %r1153, %r14, %r1205; 2026-02-21T08:52:45.2785666Z // begin inline asm 2026-02-21T08:52:45.2785803Z cp.async.ca.shared.global [ %r1153 + 0 ], [ %rd98 + 0 ], 0x8, %r1150; 2026-02-21T08:52:45.2785860Z // end inline asm 
2026-02-21T08:52:45.2785935Z cp.async.commit_group; 2026-02-21T08:52:45.2786150Z .loc 1 40 125 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:40:125 2026-02-21T08:52:45.2786219Z add.s32 %r2231, %r2231, 229376; 2026-02-21T08:52:45.2786286Z add.s64 %rd194, %rd194, 128; 2026-02-21T08:52:45.2786354Z add.s32 %r2230, %r2230, 64; 2026-02-21T08:52:45.2786424Z setp.lt.u64 %p27, %rd195, 4064; 2026-02-21T08:52:45.2786699Z @%p27 bra $L__BB0_5; 2026-02-21T08:52:45.2786865Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:45.2786938Z cp.async.wait_group 0; 2026-02-21T08:52:45.2786997Z bar.sync 0; 2026-02-21T08:52:45.2787213Z .loc 1 87 28 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:87:28 2026-02-21T08:52:45.2787294Z cvt.rn.bf16x2.f32 %r1241, %r2235, %r2234; 2026-02-21T08:52:45.2787370Z cvt.rn.bf16x2.f32 %r1242, %r2237, %r2236; 2026-02-21T08:52:45.2787439Z cvt.rn.bf16x2.f32 %r1243, %r2239, %r2238; 2026-02-21T08:52:45.2787523Z cvt.rn.bf16x2.f32 %r1244, %r2241, %r2240; 2026-02-21T08:52:45.2787596Z cvt.rn.bf16x2.f32 %r1245, %r2243, %r2242; 2026-02-21T08:52:45.2787666Z cvt.rn.bf16x2.f32 %r1246, %r2245, %r2244; 2026-02-21T08:52:45.2787754Z cvt.rn.bf16x2.f32 %r1247, %r2247, %r2246; 2026-02-21T08:52:45.2787828Z cvt.rn.bf16x2.f32 %r1248, %r2249, %r2248; 2026-02-21T08:52:45.2788063Z .loc 1 88 50 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:88:50 2026-02-21T08:52:45.2788152Z mad.lo.s32 %r1249, %r127, 7168, %r129; 2026-02-21T08:52:45.2788221Z mad.lo.s32 %r1250, %r128, 7168, %r129; 2026-02-21T08:52:45.2788430Z .loc 1 88 22 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:88:22 2026-02-21T08:52:45.2788567Z mad.wide.s32 %rd108, %r1249, 2, %rd30; 2026-02-21T08:52:45.2788647Z mad.wide.s32 %rd109, %r1250, 2, %rd30; 2026-02-21T08:52:45.2788850Z .loc 1 88 81 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:88:81 2026-02-21T08:52:45.2788962Z st.shared.v4.b32 [%r36], {%r1241, %r1243, %r1245, %r1247}; 2026-02-21T08:52:45.2789087Z 
st.shared.v4.b32 [%r36+512], {%r1242, %r1244, %r1246, %r1248}; 2026-02-21T08:52:45.2789145Z bar.sync 0; 2026-02-21T08:52:45.2789212Z // begin inline asm 2026-02-21T08:52:45.2789408Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1218, %r1219, %r1220, %r1221}, [%r759]; 2026-02-21T08:52:45.2789464Z // end inline asm 2026-02-21T08:52:45.2789689Z // begin inline asm 2026-02-21T08:52:45.2789874Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1222, %r1223, %r1224, %r1225}, [%r764]; 2026-02-21T08:52:45.2789932Z // end inline asm 2026-02-21T08:52:45.2789989Z // begin inline asm 2026-02-21T08:52:45.2790115Z st.global.v4.b32 [ %rd108 + 0 ], { %r1218, %r1219, %r1220, %r1221 }; 2026-02-21T08:52:45.2790178Z // end inline asm 2026-02-21T08:52:45.2790239Z // begin inline asm 2026-02-21T08:52:45.2790359Z st.global.v4.b32 [ %rd109 + 0 ], { %r1222, %r1223, %r1224, %r1225 }; 2026-02-21T08:52:45.2790417Z // end inline asm 2026-02-21T08:52:45.2790640Z .loc 1 19 144 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:19:144 2026-02-21T08:52:45.2790707Z add.s32 %r1251, %r2209, 8448; 2026-02-21T08:52:45.2791057Z .loc 1 25 35 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:25:35 2026-02-21T08:52:45.2791147Z mul.hi.s32 %r1252, %r1251, -1840700269; 2026-02-21T08:52:45.2791218Z add.s32 %r1253, %r1252, %r1251; 2026-02-21T08:52:45.2791282Z shr.u32 %r1254, %r1253, 31; 2026-02-21T08:52:45.2791350Z shr.s32 %r1255, %r1253, 7; 2026-02-21T08:52:45.2791413Z add.s32 %r1256, %r1255, %r1254; 2026-02-21T08:52:45.2791626Z .loc 1 26 33 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:26:33 2026-02-21T08:52:45.2791696Z shl.b32 %r1257, %r1256, 2; 2026-02-21T08:52:45.2791902Z .loc 1 27 39 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:27:39 2026-02-21T08:52:45.2791963Z sub.s32 %r1258, 1, %r1257; 2026-02-21T08:52:45.2792161Z .loc 1 27 52 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:27:52 2026-02-21T08:52:45.2792229Z min.s32 %r1259, %r1258, 4; 
2026-02-21T08:52:45.2792426Z .loc 1 28 45 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:28:45 2026-02-21T08:52:45.2792492Z mul.lo.s32 %r1260, %r1256, 224; 2026-02-21T08:52:45.2792564Z sub.s32 %r1261, %r1251, %r1260; 2026-02-21T08:52:45.2792761Z .loc 1 29 51 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:29:51 2026-02-21T08:52:45.2792824Z div.s32 %r1262, %r1261, %r1259; 2026-02-21T08:52:45.2793024Z .loc 1 28 64 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:28:64 2026-02-21T08:52:45.2793094Z mul.lo.s32 %r1263, %r1262, %r1259; 2026-02-21T08:52:45.2793161Z sub.s32 %r1264, %r1261, %r1263; 2026-02-21T08:52:45.2793354Z .loc 1 28 30 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:28:30 2026-02-21T08:52:45.2793425Z add.s32 %r1265, %r1264, %r1257; 2026-02-21T08:52:45.2793623Z .loc 1 30 27 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:30:27 2026-02-21T08:52:45.2793690Z shl.b32 %r1266, %r1265, 6; 2026-02-21T08:52:45.2793893Z .loc 1 31 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:31:32 2026-02-21T08:52:45.2793959Z or.b32 %r172, %r1266, %r5; 2026-02-21T08:52:45.2794021Z or.b32 %r173, %r1266, %r6; 2026-02-21T08:52:45.2794222Z .loc 1 32 27 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:32:27 2026-02-21T08:52:45.2794284Z shl.b32 %r1267, %r1262, 7; 2026-02-21T08:52:45.2794480Z .loc 1 33 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:33:32 2026-02-21T08:52:45.2794547Z or.b32 %r174, %r1267, %r8; 2026-02-21T08:52:45.2794746Z .loc 1 48 53 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:53 2026-02-21T08:52:45.2794808Z shl.b32 %r1268, %r172, 13; 2026-02-21T08:52:45.2794882Z shl.b32 %r1269, %r173, 13; 2026-02-21T08:52:45.2795095Z .loc 1 48 60 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:60 2026-02-21T08:52:45.2795164Z or.b32 %r1270, %r1268, %r7; 2026-02-21T08:52:45.2795227Z or.b32 %r1271, %r1269, %r7; 2026-02-21T08:52:45.2795430Z .loc 1 48 32 // 
cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:32 2026-02-21T08:52:45.2795610Z mad.wide.s32 %rd110, %r1270, 2, %rd28; 2026-02-21T08:52:45.2795683Z mad.wide.s32 %rd111, %r1271, 2, %rd28; 2026-02-21T08:52:45.2795749Z mov.b32 %r1227, 8; 2026-02-21T08:52:45.2795954Z .loc 1 48 80 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:80 2026-02-21T08:52:45.2796016Z // begin inline asm 2026-02-21T08:52:45.2796161Z cp.async.ca.shared.global [ %r11 + 0 ], [ %rd110 + 0 ], 0x8, %r1227; 2026-02-21T08:52:45.2796237Z // end inline asm 2026-02-21T08:52:45.2796299Z // begin inline asm 2026-02-21T08:52:45.2796434Z cp.async.ca.shared.global [ %r12 + 0 ], [ %rd111 + 0 ], 0x8, %r1227; 2026-02-21T08:52:45.2796625Z // end inline asm 2026-02-21T08:52:45.2796834Z cp.async.commit_group; 2026-02-21T08:52:45.2797049Z .loc 1 54 62 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:62 2026-02-21T08:52:45.2797124Z add.s32 %r1272, %r174, %r2198; 2026-02-21T08:52:45.2797343Z .loc 1 54 34 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:34 2026-02-21T08:52:45.2797410Z cvt.s64.s32 %rd117, %r1272; 2026-02-21T08:52:45.2797479Z add.s64 %rd112, %rd29, %rd117; 2026-02-21T08:52:45.2797683Z .loc 1 54 87 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:87 2026-02-21T08:52:45.2797744Z // begin inline asm 2026-02-21T08:52:45.2797877Z cp.async.ca.shared.global [ %r14 + 0 ], [ %rd112 + 0 ], 0x8, %r1227; 2026-02-21T08:52:45.2797942Z // end inline asm 2026-02-21T08:52:45.2798008Z cp.async.commit_group; 2026-02-21T08:52:45.2798205Z .loc 1 48 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:32 2026-02-21T08:52:45.2798276Z cvt.s64.s32 %rd118, %r1268; 2026-02-21T08:52:45.2798342Z or.b64 %rd119, %rd118, %rd191; 2026-02-21T08:52:45.2798405Z shl.b64 %rd120, %rd119, 1; 2026-02-21T08:52:45.2798471Z add.s64 %rd121, %rd28, %rd120; 2026-02-21T08:52:45.2798545Z add.s64 %rd113, %rd121, 128; 2026-02-21T08:52:45.2798609Z cvt.s64.s32 %rd122, %r1269; 
2026-02-21T08:52:45.2798673Z or.b64 %rd123, %rd122, %rd191; 2026-02-21T08:52:45.2798739Z shl.b64 %rd124, %rd123, 1; 2026-02-21T08:52:45.2798803Z add.s64 %rd125, %rd28, %rd124; 2026-02-21T08:52:45.2798865Z add.s64 %rd114, %rd125, 128; 2026-02-21T08:52:45.2799064Z .loc 1 48 80 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:80 2026-02-21T08:52:45.2799141Z bar.sync 0; 2026-02-21T08:52:45.2799206Z // begin inline asm 2026-02-21T08:52:45.2799343Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd113 + 0 ], 0x8, %r1227; 2026-02-21T08:52:45.2799407Z // end inline asm 2026-02-21T08:52:45.2799469Z // begin inline asm 2026-02-21T08:52:45.2799602Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd114 + 0 ], 0x8, %r1227; 2026-02-21T08:52:45.2799669Z // end inline asm 2026-02-21T08:52:45.2799737Z cp.async.commit_group; 2026-02-21T08:52:45.2799935Z .loc 1 54 62 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:62 2026-02-21T08:52:45.2800001Z add.s32 %r1273, %r174, %r2199; 2026-02-21T08:52:45.2800205Z .loc 1 54 34 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:34 2026-02-21T08:52:45.2800268Z cvt.s64.s32 %rd126, %r1273; 2026-02-21T08:52:45.2800331Z add.s64 %rd115, %rd29, %rd126; 2026-02-21T08:52:45.2800531Z .loc 1 54 87 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:87 2026-02-21T08:52:45.2800592Z // begin inline asm 2026-02-21T08:52:45.2800723Z cp.async.ca.shared.global [ %r18 + 0 ], [ %rd115 + 0 ], 0x8, %r1227; 2026-02-21T08:52:45.2800786Z // end inline asm 2026-02-21T08:52:45.2800851Z cp.async.commit_group; 2026-02-21T08:52:45.2801063Z .loc 1 40 125 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:40:125 2026-02-21T08:52:45.2801130Z add.s32 %r2251, %r47, %r1267; 2026-02-21T08:52:45.2801198Z shl.b32 %r1274, %r1265, 19; 2026-02-21T08:52:45.2801392Z or.b32 %r1275, %r48, %r1274; 2026-02-21T08:52:45.2801466Z mad.wide.s32 %rd196, %r1275, 2, %rd1; 2026-02-21T08:52:45.2801535Z or.b32 %r2250, %r49, %r1274; 
2026-02-21T08:52:45.2801596Z mov.b32 %r2254, 0f00000000; 2026-02-21T08:52:45.2801658Z mov.b32 %r2253, 1; 2026-02-21T08:52:45.2801720Z mov.b32 %r2252, -1; 2026-02-21T08:52:45.2801788Z mov.b64 %rd197, -32; 2026-02-21T08:52:45.2801854Z mov.b32 %r2255, %r2254; 2026-02-21T08:52:45.2801913Z mov.b32 %r2256, %r2254; 2026-02-21T08:52:45.2801976Z mov.b32 %r2257, %r2254; 2026-02-21T08:52:45.2802034Z mov.b32 %r2258, %r2254; 2026-02-21T08:52:45.2802092Z mov.b32 %r2259, %r2254; 2026-02-21T08:52:45.2802157Z mov.b32 %r2260, %r2254; 2026-02-21T08:52:45.2802216Z mov.b32 %r2261, %r2254; 2026-02-21T08:52:45.2802383Z mov.b32 %r2262, %r2254; 2026-02-21T08:52:45.2802449Z mov.b32 %r2263, %r2254; 2026-02-21T08:52:45.2802515Z mov.b32 %r2264, %r2254; 2026-02-21T08:52:45.2802574Z mov.b32 %r2265, %r2254; 2026-02-21T08:52:45.2802638Z mov.b32 %r2266, %r2254; 2026-02-21T08:52:45.2802704Z mov.b32 %r2267, %r2254; 2026-02-21T08:52:45.2802763Z mov.b32 %r2268, %r2254; 2026-02-21T08:52:45.2802826Z mov.b32 %r2269, %r2254; 2026-02-21T08:52:45.2802944Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:45.2803057Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:45.2803121Z add.s64 %rd197, %rd197, 32; 2026-02-21T08:52:45.2803193Z setp.lt.u64 %p37, %rd197, 4032; 2026-02-21T08:52:45.2803262Z add.s32 %r1608, %r2252, 1; 2026-02-21T08:52:45.2803329Z setp.gt.s32 %p38, %r1608, 1; 2026-02-21T08:52:45.2803399Z selp.b32 %r2252, 0, %r1608, %p38; 2026-02-21T08:52:45.2803604Z .loc 1 48 80 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:80 2026-02-21T08:52:45.2803683Z cp.async.wait_group 2; 2026-02-21T08:52:45.2803745Z bar.sync 0; 2026-02-21T08:52:45.2803807Z shl.b32 %r1609, %r2252, 12; 2026-02-21T08:52:45.2803878Z shl.b32 %r1610, %r2252, 13; 2026-02-21T08:52:45.2803943Z add.s32 %r1612, %r705, %r1610; 2026-02-21T08:52:45.2804145Z .loc 1 52 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:52:32 2026-02-21T08:52:45.2804216Z add.s32 %r1613, %r1612, %r19; 2026-02-21T08:52:45.2804286Z 
ld.shared.b16 %rs225, [%r1613]; 2026-02-21T08:52:45.2804359Z ld.shared.b16 %rs226, [%r1613+1024]; 2026-02-21T08:52:45.2804428Z ld.shared.b16 %rs227, [%r1613+64]; 2026-02-21T08:52:45.2804506Z ld.shared.b16 %rs228, [%r1613+1088]; 2026-02-21T08:52:45.2804570Z add.s32 %r1614, %r1612, %r20; 2026-02-21T08:52:45.2804636Z ld.shared.b16 %rs229, [%r1614]; 2026-02-21T08:52:45.2804709Z ld.shared.b16 %rs230, [%r1614+1024]; 2026-02-21T08:52:45.2804779Z ld.shared.b16 %rs231, [%r1614+64]; 2026-02-21T08:52:45.2804846Z ld.shared.b16 %rs232, [%r1614+1088]; 2026-02-21T08:52:45.2804914Z add.s32 %r1615, %r1612, %r21; 2026-02-21T08:52:45.2804979Z ld.shared.b16 %rs233, [%r1615]; 2026-02-21T08:52:45.2805049Z ld.shared.b16 %rs234, [%r1615+1024]; 2026-02-21T08:52:45.2805116Z ld.shared.b16 %rs235, [%r1615+64]; 2026-02-21T08:52:45.2805200Z ld.shared.b16 %rs236, [%r1615+1088]; 2026-02-21T08:52:45.2805269Z add.s32 %r1616, %r1612, %r22; 2026-02-21T08:52:45.2805335Z ld.shared.b16 %rs237, [%r1616]; 2026-02-21T08:52:45.2805407Z ld.shared.b16 %rs238, [%r1616+1024]; 2026-02-21T08:52:45.2805472Z ld.shared.b16 %rs239, [%r1616+64]; 2026-02-21T08:52:45.2805539Z ld.shared.b16 %rs240, [%r1616+1088]; 2026-02-21T08:52:45.2805601Z add.s32 %r1617, %r1612, %r23; 2026-02-21T08:52:45.2805674Z ld.shared.b16 %rs241, [%r1617]; 2026-02-21T08:52:45.2805739Z ld.shared.b16 %rs242, [%r1617+1024]; 2026-02-21T08:52:45.2805804Z ld.shared.b16 %rs243, [%r1617+64]; 2026-02-21T08:52:45.2805878Z ld.shared.b16 %rs244, [%r1617+1088]; 2026-02-21T08:52:45.2805941Z add.s32 %r1618, %r1612, %r24; 2026-02-21T08:52:45.2806009Z ld.shared.b16 %rs245, [%r1618]; 2026-02-21T08:52:45.2806076Z ld.shared.b16 %rs246, [%r1618+1024]; 2026-02-21T08:52:45.2806254Z ld.shared.b16 %rs247, [%r1618+64]; 2026-02-21T08:52:45.2806321Z ld.shared.b16 %rs248, [%r1618+1088]; 2026-02-21T08:52:45.2806385Z add.s32 %r1619, %r1612, %r25; 2026-02-21T08:52:45.2806565Z ld.shared.b16 %rs249, [%r1619]; 2026-02-21T08:52:45.2806633Z ld.shared.b16 %rs250, [%r1619+1024]; 
2026-02-21T08:52:45.2806699Z ld.shared.b16 %rs251, [%r1619+64]; 2026-02-21T08:52:45.2806771Z ld.shared.b16 %rs252, [%r1619+1088]; 2026-02-21T08:52:45.2806844Z add.s32 %r1620, %r1612, %r26; 2026-02-21T08:52:45.2806912Z ld.shared.b16 %rs253, [%r1620]; 2026-02-21T08:52:45.2806979Z ld.shared.b16 %rs254, [%r1620+1024]; 2026-02-21T08:52:45.2807051Z ld.shared.b16 %rs255, [%r1620+64]; 2026-02-21T08:52:45.2807116Z ld.shared.b16 %rs256, [%r1620+1088]; 2026-02-21T08:52:45.2807311Z cvt.f32.bf16 %r1308, %rs225; 2026-02-21T08:52:45.2807384Z cvt.f32.bf16 %r1309, %rs226; 2026-02-21T08:52:45.2807447Z cvt.f32.bf16 %r1310, %rs229; 2026-02-21T08:52:45.2807514Z cvt.f32.bf16 %r1311, %rs230; 2026-02-21T08:52:45.2807577Z cvt.f32.bf16 %r1344, %rs233; 2026-02-21T08:52:45.2807643Z cvt.f32.bf16 %r1345, %rs234; 2026-02-21T08:52:45.2807705Z cvt.f32.bf16 %r1346, %rs237; 2026-02-21T08:52:45.2807769Z cvt.f32.bf16 %r1347, %rs238; 2026-02-21T08:52:45.2807836Z cvt.f32.bf16 %r1380, %rs241; 2026-02-21T08:52:45.2807900Z cvt.f32.bf16 %r1381, %rs242; 2026-02-21T08:52:45.2807959Z cvt.f32.bf16 %r1382, %rs245; 2026-02-21T08:52:45.2808025Z cvt.f32.bf16 %r1383, %rs246; 2026-02-21T08:52:45.2808085Z cvt.f32.bf16 %r1416, %rs249; 2026-02-21T08:52:45.2808147Z cvt.f32.bf16 %r1417, %rs250; 2026-02-21T08:52:45.2808207Z cvt.f32.bf16 %r1418, %rs253; 2026-02-21T08:52:45.2808287Z cvt.f32.bf16 %r1419, %rs254; 2026-02-21T08:52:45.2808349Z cvt.f32.bf16 %r1452, %rs227; 2026-02-21T08:52:45.2808416Z cvt.f32.bf16 %r1453, %rs228; 2026-02-21T08:52:45.2808484Z cvt.f32.bf16 %r1454, %rs231; 2026-02-21T08:52:45.2808546Z cvt.f32.bf16 %r1455, %rs232; 2026-02-21T08:52:45.2808610Z cvt.f32.bf16 %r1488, %rs235; 2026-02-21T08:52:45.2808672Z cvt.f32.bf16 %r1489, %rs236; 2026-02-21T08:52:45.2808740Z cvt.f32.bf16 %r1490, %rs239; 2026-02-21T08:52:45.2808802Z cvt.f32.bf16 %r1491, %rs240; 2026-02-21T08:52:45.2808864Z cvt.f32.bf16 %r1524, %rs243; 2026-02-21T08:52:45.2808931Z cvt.f32.bf16 %r1525, %rs244; 2026-02-21T08:52:45.2808993Z cvt.f32.bf16 
%r1526, %rs247; 2026-02-21T08:52:45.2809053Z cvt.f32.bf16 %r1527, %rs248; 2026-02-21T08:52:45.2809114Z cvt.f32.bf16 %r1560, %rs251; 2026-02-21T08:52:45.2809182Z cvt.f32.bf16 %r1561, %rs252; 2026-02-21T08:52:45.2809245Z cvt.f32.bf16 %r1562, %rs255; 2026-02-21T08:52:45.2809307Z cvt.f32.bf16 %r1563, %rs256; 2026-02-21T08:52:45.2809527Z .loc 1 67 45 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:67:45 2026-02-21T08:52:45.2809595Z add.s32 %r1621, %r27, %r1609; 2026-02-21T08:52:45.2809665Z ld.shared.b8 %rs257, [%r1621]; 2026-02-21T08:52:45.2809737Z ld.shared.b8 %rs258, [%r1621+256]; 2026-02-21T08:52:45.2809807Z ld.shared.b8 %rs259, [%r1621+512]; 2026-02-21T08:52:45.2809873Z ld.shared.b8 %rs260, [%r1621+768]; 2026-02-21T08:52:45.2809942Z ld.shared.b8 %rs261, [%r1621+1024]; 2026-02-21T08:52:45.2810014Z ld.shared.b8 %rs262, [%r1621+1280]; 2026-02-21T08:52:45.2810079Z ld.shared.b8 %rs263, [%r1621+1536]; 2026-02-21T08:52:45.2810146Z ld.shared.b8 %rs264, [%r1621+1792]; 2026-02-21T08:52:45.2810217Z ld.shared.b8 %rs265, [%r1621+2048]; 2026-02-21T08:52:45.2810283Z ld.shared.b8 %rs266, [%r1621+2304]; 2026-02-21T08:52:45.2810347Z ld.shared.b8 %rs267, [%r1621+2560]; 2026-02-21T08:52:45.2810412Z ld.shared.b8 %rs268, [%r1621+2816]; 2026-02-21T08:52:45.2810483Z ld.shared.b8 %rs269, [%r1621+3072]; 2026-02-21T08:52:45.2810549Z ld.shared.b8 %rs270, [%r1621+3328]; 2026-02-21T08:52:45.2810619Z ld.shared.b8 %rs271, [%r1621+3584]; 2026-02-21T08:52:45.2810692Z ld.shared.b8 %rs272, [%r1621+3840]; 2026-02-21T08:52:45.2810894Z .loc 1 57 28 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:57:28 2026-02-21T08:52:45.2811090Z shl.b16 %rs273, %rs257, 4; 2026-02-21T08:52:45.2811157Z shl.b16 %rs274, %rs258, 4; 2026-02-21T08:52:45.2811220Z shl.b16 %rs275, %rs259, 4; 2026-02-21T08:52:45.2811281Z shl.b16 %rs276, %rs260, 4; 2026-02-21T08:52:45.2811348Z shl.b16 %rs277, %rs261, 4; 2026-02-21T08:52:45.2811410Z shl.b16 %rs278, %rs262, 4; 2026-02-21T08:52:45.2811469Z shl.b16 %rs279, %rs263, 
4; 2026-02-21T08:52:45.2811533Z shl.b16 %rs280, %rs264, 4; 2026-02-21T08:52:45.2811607Z shl.b16 %rs281, %rs265, 4; 2026-02-21T08:52:45.2811673Z shl.b16 %rs282, %rs266, 4; 2026-02-21T08:52:45.2811735Z shl.b16 %rs283, %rs267, 4; 2026-02-21T08:52:45.2811801Z shl.b16 %rs284, %rs268, 4; 2026-02-21T08:52:45.2811862Z shl.b16 %rs285, %rs269, 4; 2026-02-21T08:52:45.2812035Z shl.b16 %rs286, %rs270, 4; 2026-02-21T08:52:45.2812099Z shl.b16 %rs287, %rs271, 4; 2026-02-21T08:52:45.2812168Z shl.b16 %rs288, %rs272, 4; 2026-02-21T08:52:45.2812391Z .loc 1 72 58 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:72:58 2026-02-21T08:52:45.2812475Z selp.b16 %rs289, %rs273, %rs257, %p57; 2026-02-21T08:52:45.2812545Z cvt.s16.s8 %rs290, %rs289; 2026-02-21T08:52:45.2812606Z shr.s16 %rs291, %rs290, 4; 2026-02-21T08:52:45.2812675Z selp.b16 %rs292, %rs274, %rs258, %p57; 2026-02-21T08:52:45.2812736Z cvt.s16.s8 %rs293, %rs292; 2026-02-21T08:52:45.2812800Z shr.s16 %rs294, %rs293, 4; 2026-02-21T08:52:45.2812868Z selp.b16 %rs295, %rs275, %rs259, %p57; 2026-02-21T08:52:45.2812933Z cvt.s16.s8 %rs296, %rs295; 2026-02-21T08:52:45.2813000Z shr.s16 %rs297, %rs296, 4; 2026-02-21T08:52:45.2813068Z selp.b16 %rs298, %rs276, %rs260, %p57; 2026-02-21T08:52:45.2813130Z cvt.s16.s8 %rs299, %rs298; 2026-02-21T08:52:45.2813198Z shr.s16 %rs300, %rs299, 4; 2026-02-21T08:52:45.2813268Z selp.b16 %rs301, %rs277, %rs261, %p57; 2026-02-21T08:52:45.2813330Z cvt.s16.s8 %rs302, %rs301; 2026-02-21T08:52:45.2813390Z shr.s16 %rs303, %rs302, 4; 2026-02-21T08:52:45.2813466Z selp.b16 %rs304, %rs278, %rs262, %p57; 2026-02-21T08:52:45.2813527Z cvt.s16.s8 %rs305, %rs304; 2026-02-21T08:52:45.2813587Z shr.s16 %rs306, %rs305, 4; 2026-02-21T08:52:45.2813659Z selp.b16 %rs307, %rs279, %rs263, %p57; 2026-02-21T08:52:45.2813737Z cvt.s16.s8 %rs308, %rs307; 2026-02-21T08:52:45.2813800Z shr.s16 %rs309, %rs308, 4; 2026-02-21T08:52:45.2813868Z selp.b16 %rs310, %rs280, %rs264, %p57; 2026-02-21T08:52:45.2813934Z cvt.s16.s8 %rs311, %rs310; 
2026-02-21T08:52:45.2813994Z shr.s16 %rs312, %rs311, 4; 2026-02-21T08:52:45.2814063Z selp.b16 %rs313, %rs281, %rs265, %p57; 2026-02-21T08:52:45.2814130Z cvt.s16.s8 %rs314, %rs313; 2026-02-21T08:52:45.2814190Z shr.s16 %rs315, %rs314, 4; 2026-02-21T08:52:45.2814258Z selp.b16 %rs316, %rs282, %rs266, %p57; 2026-02-21T08:52:45.2814321Z cvt.s16.s8 %rs317, %rs316; 2026-02-21T08:52:45.2814387Z shr.s16 %rs318, %rs317, 4; 2026-02-21T08:52:45.2814455Z selp.b16 %rs319, %rs283, %rs267, %p57; 2026-02-21T08:52:45.2814514Z cvt.s16.s8 %rs320, %rs319; 2026-02-21T08:52:45.2814586Z shr.s16 %rs321, %rs320, 4; 2026-02-21T08:52:45.2814654Z selp.b16 %rs322, %rs284, %rs268, %p57; 2026-02-21T08:52:45.2814716Z cvt.s16.s8 %rs323, %rs322; 2026-02-21T08:52:45.2814782Z shr.s16 %rs324, %rs323, 4; 2026-02-21T08:52:45.2814849Z selp.b16 %rs325, %rs285, %rs269, %p57; 2026-02-21T08:52:45.2814914Z cvt.s16.s8 %rs326, %rs325; 2026-02-21T08:52:45.2814974Z shr.s16 %rs327, %rs326, 4; 2026-02-21T08:52:45.2815048Z selp.b16 %rs328, %rs286, %rs270, %p57; 2026-02-21T08:52:45.2815107Z cvt.s16.s8 %rs329, %rs328; 2026-02-21T08:52:45.2815167Z shr.s16 %rs330, %rs329, 4; 2026-02-21T08:52:45.2815258Z selp.b16 %rs331, %rs287, %rs271, %p57; 2026-02-21T08:52:45.2815323Z cvt.s16.s8 %rs332, %rs331; 2026-02-21T08:52:45.2815384Z shr.s16 %rs333, %rs332, 4; 2026-02-21T08:52:45.2815455Z selp.b16 %rs334, %rs288, %rs272, %p57; 2026-02-21T08:52:45.2815525Z cvt.s16.s8 %rs335, %rs334; 2026-02-21T08:52:45.2815588Z shr.s16 %rs336, %rs335, 4; 2026-02-21T08:52:45.2815917Z .loc 1 77 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:77:32 2026-02-21T08:52:45.2815996Z cvt.rn.f32.s16 %r1622, %rs291; 2026-02-21T08:52:45.2816066Z cvt.rn.f32.s16 %r1623, %rs294; 2026-02-21T08:52:45.2816130Z cvt.rn.f32.s16 %r1624, %rs297; 2026-02-21T08:52:45.2816206Z cvt.rn.f32.s16 %r1625, %rs300; 2026-02-21T08:52:45.2816276Z cvt.rn.f32.s16 %r1626, %rs303; 2026-02-21T08:52:45.2816339Z cvt.rn.f32.s16 %r1627, %rs306; 2026-02-21T08:52:45.2816404Z 
cvt.rn.f32.s16 %r1628, %rs309; 2026-02-21T08:52:45.2816601Z cvt.rn.f32.s16 %r1629, %rs312; 2026-02-21T08:52:45.2816668Z cvt.rn.f32.s16 %r1630, %rs315; 2026-02-21T08:52:45.2816733Z cvt.rn.f32.s16 %r1631, %rs318; 2026-02-21T08:52:45.2816799Z cvt.rn.f32.s16 %r1632, %rs321; 2026-02-21T08:52:45.2816995Z cvt.rn.f32.s16 %r1633, %rs324; 2026-02-21T08:52:45.2817064Z cvt.rn.f32.s16 %r1634, %rs327; 2026-02-21T08:52:45.2817127Z cvt.rn.f32.s16 %r1635, %rs330; 2026-02-21T08:52:45.2817193Z cvt.rn.f32.s16 %r1636, %rs333; 2026-02-21T08:52:45.2817270Z cvt.rn.f32.s16 %r1637, %rs336; 2026-02-21T08:52:45.2817340Z st.shared.b32 [%r39], %r1622; 2026-02-21T08:52:45.2817417Z st.shared.b32 [%r39+16384], %r1630; 2026-02-21T08:52:45.2817482Z st.shared.b32 [%r40], %r1623; 2026-02-21T08:52:45.2817549Z st.shared.b32 [%r40+16384], %r1631; 2026-02-21T08:52:45.2817616Z st.shared.b32 [%r41], %r1624; 2026-02-21T08:52:45.2817686Z st.shared.b32 [%r41+16384], %r1632; 2026-02-21T08:52:45.2817749Z st.shared.b32 [%r42], %r1625; 2026-02-21T08:52:45.2817813Z st.shared.b32 [%r42+16384], %r1633; 2026-02-21T08:52:45.2817882Z st.shared.b32 [%r43], %r1626; 2026-02-21T08:52:45.2817949Z st.shared.b32 [%r43+16384], %r1634; 2026-02-21T08:52:45.2818011Z st.shared.b32 [%r44], %r1627; 2026-02-21T08:52:45.2818078Z st.shared.b32 [%r44+16384], %r1635; 2026-02-21T08:52:45.2818146Z st.shared.b32 [%r45], %r1628; 2026-02-21T08:52:45.2818211Z st.shared.b32 [%r45+16384], %r1636; 2026-02-21T08:52:45.2818277Z st.shared.b32 [%r46], %r1629; 2026-02-21T08:52:45.2818348Z st.shared.b32 [%r46+16384], %r1637; 2026-02-21T08:52:45.2818404Z $L__tmp5: 2026-02-21T08:52:45.2818691Z .loc 2 291 36 // standard.py:291:36 @[ cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:84:40 ] 2026-02-21T08:52:45.2818754Z // begin inline asm 2026-02-21T08:52:45.2818836Z fence.proxy.async.shared::cta; 2026-02-21T08:52:45.2818895Z // end inline asm 2026-02-21T08:52:45.2818952Z bar.sync 0; 2026-02-21T08:52:45.2819040Z shfl.sync.idx.b32 %r1638, %r4, 0, 31, 
-1; 2026-02-21T08:52:45.2819114Z wgmma.fence.sync.aligned; 2026-02-21T08:52:45.2819178Z shl.b32 %r1639, %r1638, 10; 2026-02-21T08:52:45.2819249Z and.b32 %r1640, %r1639, 12288; 2026-02-21T08:52:45.2819315Z add.s32 %r1641, %r1640, %r2197; 2026-02-21T08:52:45.2819380Z bfe.u32 %r1642, %r1641, 4, 14; 2026-02-21T08:52:45.2819446Z cvt.u64.u32 %rd138, %r1642; 2026-02-21T08:52:45.2819548Z or.b64 %rd127, %rd138, 4611686293372403712; 2026-02-21T08:52:45.2819619Z mov.pred %p28, -1; 2026-02-21T08:52:45.2819684Z // begin inline asm 2026-02-21T08:52:45.2820209Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2254,%r2255,%r2256,%r2257,%r2258,%r2259,%r2260,%r2261,%r2262,%r2263,%r2264,%r2265,%r2266,%r2267,%r2268,%r2269}, {%r1308,%r1309,%r1310,%r1311}, %rd127, %p28, 1, 1; 2026-02-21T08:52:45.2820268Z // end inline asm 2026-02-21T08:52:45.2820330Z add.s32 %r1643, %r1641, 32; 2026-02-21T08:52:45.2820396Z bfe.u32 %r1644, %r1643, 4, 14; 2026-02-21T08:52:45.2820459Z cvt.u64.u32 %rd139, %r1644; 2026-02-21T08:52:45.2820533Z or.b64 %rd128, %rd139, 4611686293372403712; 2026-02-21T08:52:45.2820602Z // begin inline asm 2026-02-21T08:52:45.2821116Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2254,%r2255,%r2256,%r2257,%r2258,%r2259,%r2260,%r2261,%r2262,%r2263,%r2264,%r2265,%r2266,%r2267,%r2268,%r2269}, {%r1344,%r1345,%r1346,%r1347}, %rd128, %p28, 1, 1; 2026-02-21T08:52:45.2821176Z // end inline asm 2026-02-21T08:52:45.2821236Z add.s32 %r1645, %r1641, 64; 2026-02-21T08:52:45.2821444Z bfe.u32 %r1646, %r1645, 4, 14; 2026-02-21T08:52:45.2821508Z cvt.u64.u32 %rd140, %r1646; 2026-02-21T08:52:45.2821581Z or.b64 %rd129, %rd140, 4611686293372403712; 2026-02-21T08:52:45.2821647Z // begin inline asm 2026-02-21T08:52:45.2822153Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2254,%r2255,%r2256,%r2257,%r2258,%r2259,%r2260,%r2261,%r2262,%r2263,%r2264,%r2265,%r2266,%r2267,%r2268,%r2269}, {%r1380,%r1381,%r1382,%r1383}, %rd129, %p28, 1, 1; 2026-02-21T08:52:45.2822210Z // end inline asm 
2026-02-21T08:52:45.2822277Z add.s32 %r1647, %r1641, 96; 2026-02-21T08:52:45.2822339Z bfe.u32 %r1648, %r1647, 4, 14; 2026-02-21T08:52:45.2822401Z cvt.u64.u32 %rd141, %r1648; 2026-02-21T08:52:45.2822477Z or.b64 %rd130, %rd141, 4611686293372403712; 2026-02-21T08:52:45.2822649Z // begin inline asm 2026-02-21T08:52:45.2823159Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2254,%r2255,%r2256,%r2257,%r2258,%r2259,%r2260,%r2261,%r2262,%r2263,%r2264,%r2265,%r2266,%r2267,%r2268,%r2269}, {%r1416,%r1417,%r1418,%r1419}, %rd130, %p28, 1, 1; 2026-02-21T08:52:45.2823223Z // end inline asm 2026-02-21T08:52:45.2823292Z add.s32 %r1649, %r1641, 16384; 2026-02-21T08:52:45.2823353Z bfe.u32 %r1650, %r1649, 4, 14; 2026-02-21T08:52:45.2823415Z cvt.u64.u32 %rd142, %r1650; 2026-02-21T08:52:45.2823495Z or.b64 %rd131, %rd142, 4611686293372403712; 2026-02-21T08:52:45.2823554Z // begin inline asm 2026-02-21T08:52:45.2824053Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2254,%r2255,%r2256,%r2257,%r2258,%r2259,%r2260,%r2261,%r2262,%r2263,%r2264,%r2265,%r2266,%r2267,%r2268,%r2269}, {%r1452,%r1453,%r1454,%r1455}, %rd131, %p28, 1, 1; 2026-02-21T08:52:45.2824118Z // end inline asm 2026-02-21T08:52:45.2824180Z add.s32 %r1651, %r1641, 16416; 2026-02-21T08:52:45.2824245Z bfe.u32 %r1652, %r1651, 4, 14; 2026-02-21T08:52:45.2824309Z cvt.u64.u32 %rd143, %r1652; 2026-02-21T08:52:45.2824387Z or.b64 %rd132, %rd143, 4611686293372403712; 2026-02-21T08:52:45.2824448Z // begin inline asm 2026-02-21T08:52:45.2824968Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2254,%r2255,%r2256,%r2257,%r2258,%r2259,%r2260,%r2261,%r2262,%r2263,%r2264,%r2265,%r2266,%r2267,%r2268,%r2269}, {%r1488,%r1489,%r1490,%r1491}, %rd132, %p28, 1, 1; 2026-02-21T08:52:45.2825032Z // end inline asm 2026-02-21T08:52:45.2825096Z add.s32 %r1653, %r1641, 16448; 2026-02-21T08:52:45.2825156Z bfe.u32 %r1654, %r1653, 4, 14; 2026-02-21T08:52:45.2825221Z cvt.u64.u32 %rd144, %r1654; 2026-02-21T08:52:45.2825300Z or.b64 %rd133, 
%rd144, 4611686293372403712; 2026-02-21T08:52:45.2825361Z // begin inline asm 2026-02-21T08:52:45.2825860Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2254,%r2255,%r2256,%r2257,%r2258,%r2259,%r2260,%r2261,%r2262,%r2263,%r2264,%r2265,%r2266,%r2267,%r2268,%r2269}, {%r1524,%r1525,%r1526,%r1527}, %rd133, %p28, 1, 1; 2026-02-21T08:52:45.2825923Z // end inline asm 2026-02-21T08:52:45.2825986Z add.s32 %r1655, %r1641, 16480; 2026-02-21T08:52:45.2826047Z bfe.u32 %r1656, %r1655, 4, 14; 2026-02-21T08:52:45.2826117Z cvt.u64.u32 %rd145, %r1656; 2026-02-21T08:52:45.2826189Z or.b64 %rd134, %rd145, 4611686293372403712; 2026-02-21T08:52:45.2826248Z // begin inline asm 2026-02-21T08:52:45.2826860Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2254,%r2255,%r2256,%r2257,%r2258,%r2259,%r2260,%r2261,%r2262,%r2263,%r2264,%r2265,%r2266,%r2267,%r2268,%r2269}, {%r1560,%r1561,%r1562,%r1563}, %rd134, %p28, 1, 1; 2026-02-21T08:52:45.2826922Z // end inline asm 2026-02-21T08:52:45.2826999Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:45.2827069Z mov.b32 %r1582, 0; 2026-02-21T08:52:45.2827137Z mov.b32 %r1581, %r1582; 2026-02-21T08:52:45.2827198Z mov.b32 %r1580, %r2197; 2026-02-21T08:52:45.2827258Z // begin inline asm 2026-02-21T08:52:45.2827581Z // wait for regs: %r2254,%r2255,%r2256,%r2257,%r2258,%r2259,%r2260,%r2261,%r2262,%r2263,%r2264,%r2265,%r2266,%r2267,%r2268,%r2269,%r1580,%r1581,%r1582 2026-02-21T08:52:45.2827659Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:45.2827716Z // end inline asm 2026-02-21T08:52:45.2827914Z $L__tmp6: 2026-02-21T08:52:45.2828138Z .loc 1 40 125 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:40:125 2026-02-21T08:52:45.2828201Z add.s32 %r1657, %r2253, 1; 2026-02-21T08:52:45.2828270Z setp.gt.s32 %p39, %r1657, 1; 2026-02-21T08:52:45.2828346Z selp.b32 %r2253, 0, %r1657, %p39; 2026-02-21T08:52:45.2828642Z .loc 1 48 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:32 2026-02-21T08:52:45.2828721Z mad.wide.s32 
%rd136, %r2250, 2, %rd28; 2026-02-21T08:52:45.2828933Z .loc 1 48 80 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:80 2026-02-21T08:52:45.2828996Z shl.b32 %r1658, %r2253, 12; 2026-02-21T08:52:45.2829057Z shl.b32 %r1659, %r2253, 13; 2026-02-21T08:52:45.2829247Z add.s32 %r1660, %r705, %r1659; 2026-02-21T08:52:45.2829314Z add.s32 %r1602, %r1660, %r10; 2026-02-21T08:52:45.2829380Z selp.b32 %r1603, 8, 0, %p37; 2026-02-21T08:52:45.2829445Z // begin inline asm 2026-02-21T08:52:45.2829601Z cp.async.ca.shared.global [ %r1602 + 0 ], [ %rd196 + 0 ], 0x8, %r1603; 2026-02-21T08:52:45.2829659Z // end inline asm 2026-02-21T08:52:45.2829721Z add.s32 %r1604, %r1602, 4096; 2026-02-21T08:52:45.2829783Z // begin inline asm 2026-02-21T08:52:45.2829921Z cp.async.ca.shared.global [ %r1604 + 0 ], [ %rd136 + 0 ], 0x8, %r1603; 2026-02-21T08:52:45.2829979Z // end inline asm 2026-02-21T08:52:45.2830047Z cp.async.commit_group; 2026-02-21T08:52:45.2830259Z .loc 1 54 34 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:34 2026-02-21T08:52:45.2830321Z cvt.s64.s32 %rd146, %r2251; 2026-02-21T08:52:45.2830387Z add.s64 %rd137, %rd29, %rd146; 2026-02-21T08:52:45.2830598Z .loc 1 54 87 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:87 2026-02-21T08:52:45.2830662Z add.s32 %r1606, %r14, %r1658; 2026-02-21T08:52:45.2830721Z // begin inline asm 2026-02-21T08:52:45.2830872Z cp.async.ca.shared.global [ %r1606 + 0 ], [ %rd137 + 0 ], 0x8, %r1603; 2026-02-21T08:52:45.2830941Z // end inline asm 2026-02-21T08:52:45.2831010Z cp.async.commit_group; 2026-02-21T08:52:45.2831224Z .loc 1 40 125 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:40:125 2026-02-21T08:52:45.2831294Z add.s32 %r2251, %r2251, 229376; 2026-02-21T08:52:45.2831359Z add.s64 %rd196, %rd196, 128; 2026-02-21T08:52:45.2831419Z add.s32 %r2250, %r2250, 64; 2026-02-21T08:52:45.2831493Z setp.lt.u64 %p40, %rd197, 4064; 2026-02-21T08:52:45.2831556Z @%p40 bra $L__BB0_7; 2026-02-21T08:52:45.2831668Z // %bb.8: 
// in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:45.2831740Z cp.async.wait_group 0; 2026-02-21T08:52:45.2831797Z bar.sync 0; 2026-02-21T08:52:45.2832012Z .loc 1 87 28 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:87:28 2026-02-21T08:52:45.2832094Z cvt.rn.bf16x2.f32 %r1679, %r2255, %r2254; 2026-02-21T08:52:45.2832178Z cvt.rn.bf16x2.f32 %r1680, %r2257, %r2256; 2026-02-21T08:52:45.2832251Z cvt.rn.bf16x2.f32 %r1681, %r2259, %r2258; 2026-02-21T08:52:45.2832322Z cvt.rn.bf16x2.f32 %r1682, %r2261, %r2260; 2026-02-21T08:52:45.2832396Z cvt.rn.bf16x2.f32 %r1683, %r2263, %r2262; 2026-02-21T08:52:45.2832467Z cvt.rn.bf16x2.f32 %r1684, %r2265, %r2264; 2026-02-21T08:52:45.2832538Z cvt.rn.bf16x2.f32 %r1685, %r2267, %r2266; 2026-02-21T08:52:45.2832609Z cvt.rn.bf16x2.f32 %r1686, %r2269, %r2268; 2026-02-21T08:52:45.2832821Z .loc 1 88 50 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:88:50 2026-02-21T08:52:45.2832891Z mad.lo.s32 %r1687, %r172, 7168, %r174; 2026-02-21T08:52:45.2832959Z mad.lo.s32 %r1688, %r173, 7168, %r174; 2026-02-21T08:52:45.2833168Z .loc 1 88 22 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:88:22 2026-02-21T08:52:45.2833238Z mad.wide.s32 %rd147, %r1687, 2, %rd30; 2026-02-21T08:52:45.2833308Z mad.wide.s32 %rd148, %r1688, 2, %rd30; 2026-02-21T08:52:45.2833637Z .loc 1 88 81 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:88:81 2026-02-21T08:52:45.2833749Z st.shared.v4.b32 [%r36], {%r1679, %r1681, %r1683, %r1685}; 2026-02-21T08:52:45.2833866Z st.shared.v4.b32 [%r36+512], {%r1680, %r1682, %r1684, %r1686}; 2026-02-21T08:52:45.2833928Z bar.sync 0; 2026-02-21T08:52:45.2833988Z // begin inline asm 2026-02-21T08:52:45.2834180Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1661, %r1662, %r1663, %r1664}, [%r759]; 2026-02-21T08:52:45.2834240Z // end inline asm 2026-02-21T08:52:45.2834307Z // begin inline asm 2026-02-21T08:52:45.2834486Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1666, %r1667, %r1668, %r1669}, [%r764]; 
2026-02-21T08:52:45.2834543Z // end inline asm 2026-02-21T08:52:45.2834702Z // begin inline asm 2026-02-21T08:52:45.2834839Z st.global.v4.b32 [ %rd147 + 0 ], { %r1661, %r1662, %r1663, %r1664 }; 2026-02-21T08:52:45.2834900Z // end inline asm 2026-02-21T08:52:45.2834961Z // begin inline asm 2026-02-21T08:52:45.2835100Z st.global.v4.b32 [ %rd148 + 0 ], { %r1666, %r1667, %r1668, %r1669 }; 2026-02-21T08:52:45.2835157Z // end inline asm 2026-02-21T08:52:45.2835371Z .loc 1 19 144 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:19:144 2026-02-21T08:52:45.2835441Z add.s32 %r2209, %r2209, 12672; 2026-02-21T08:52:45.2835512Z setp.lt.s32 %p41, %r2209, %r2270; 2026-02-21T08:52:45.2835573Z @%p41 bra $L__BB0_2; 2026-02-21T08:52:45.2835672Z $L__BB0_9: // %.preheader 2026-02-21T08:52:45.2835738Z setp.gt.s32 %p42, %r2270, 55; 2026-02-21T08:52:45.2835799Z @%p42 bra $L__BB0_14; 2026-02-21T08:52:45.2835885Z // %bb.10: // %.lr.ph99 2026-02-21T08:52:45.2836098Z .loc 1 0 144 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:0:144 2026-02-21T08:52:45.2836161Z and.b32 %r1690, %r2195, 4088; 2026-02-21T08:52:45.2836222Z and.b32 %r1692, %r2196, 56; 2026-02-21T08:52:45.2836293Z xor.b32 %r50, %r1690, %r1692; 2026-02-21T08:52:45.2836354Z add.s32 %r1694, %r2197, %r50; 2026-02-21T08:52:45.2836417Z add.s32 %r1739, %r1694, 32768; 2026-02-21T08:52:45.2836613Z add.s32 %r1741, %r1694, 36864; 2026-02-21T08:52:45.2836678Z add.s32 %r1695, %r2197, 49152; 2026-02-21T08:52:45.2836738Z add.s32 %r54, %r1695, %r1690; 2026-02-21T08:52:45.2836811Z add.s32 %r1745, %r1694, 40960; 2026-02-21T08:52:45.2836881Z add.s32 %r1747, %r1694, 45056; 2026-02-21T08:52:45.2836944Z add.s32 %r1696, %r2197, %r1690; 2026-02-21T08:52:45.2837004Z add.s32 %r1749, %r1696, 53248; 2026-02-21T08:52:45.2837071Z and.b32 %r1698, %r2200, 6144; 2026-02-21T08:52:45.2837135Z and.b32 %r1700, %r2201, 896; 2026-02-21T08:52:45.2837194Z and.b32 %r1702, %r2202, 62; 2026-02-21T08:52:45.2837257Z or.b32 %r1703, %r1698, %r1700; 
2026-02-21T08:52:45.2837326Z or.b32 %r59, %r1703, %r1702; 2026-02-21T08:52:45.2837386Z xor.b32 %r60, %r59, 8; 2026-02-21T08:52:45.2837447Z xor.b32 %r61, %r59, 16; 2026-02-21T08:52:45.2837514Z xor.b32 %r62, %r59, 24; 2026-02-21T08:52:45.2837571Z xor.b32 %r63, %r59, 32; 2026-02-21T08:52:45.2837629Z xor.b32 %r64, %r59, 40; 2026-02-21T08:52:45.2837687Z xor.b32 %r65, %r59, 48; 2026-02-21T08:52:45.2837752Z xor.b32 %r66, %r59, 56; 2026-02-21T08:52:45.2837813Z and.b32 %r1705, %r2196, 128; 2026-02-21T08:52:45.2837876Z add.s32 %r1706, %r1695, %r1705; 2026-02-21T08:52:45.2837942Z add.s32 %r67, %r1706, %r2203; 2026-02-21T08:52:45.2838002Z shl.b32 %r1707, %r2203, 7; 2026-02-21T08:52:45.2838061Z and.b32 %r1709, %r2204, 112; 2026-02-21T08:52:45.2838122Z or.b32 %r1711, %r1707, %r2205; 2026-02-21T08:52:45.2838187Z or.b32 %r1712, %r1711, %r1709; 2026-02-21T08:52:45.2838247Z add.s32 %r68, %r2197, %r1712; 2026-02-21T08:52:45.2838307Z xor.b32 %r1713, %r1712, 16; 2026-02-21T08:52:45.2838375Z add.s32 %r69, %r2197, %r1713; 2026-02-21T08:52:45.2838436Z xor.b32 %r1714, %r1712, 32; 2026-02-21T08:52:45.2838497Z add.s32 %r70, %r2197, %r1714; 2026-02-21T08:52:45.2838566Z xor.b32 %r1715, %r1712, 48; 2026-02-21T08:52:45.2838781Z add.s32 %r71, %r2197, %r1715; 2026-02-21T08:52:45.2838841Z xor.b32 %r1716, %r1712, 64; 2026-02-21T08:52:45.2838901Z add.s32 %r72, %r2197, %r1716; 2026-02-21T08:52:45.2838965Z xor.b32 %r1717, %r1712, 80; 2026-02-21T08:52:45.2839025Z add.s32 %r73, %r2197, %r1717; 2026-02-21T08:52:45.2839085Z xor.b32 %r1718, %r1712, 96; 2026-02-21T08:52:45.2839148Z add.s32 %r74, %r2197, %r1718; 2026-02-21T08:52:45.2839208Z xor.b32 %r1719, %r1712, 112; 2026-02-21T08:52:45.2839268Z add.s32 %r75, %r2197, %r1719; 2026-02-21T08:52:45.2839327Z shl.b32 %r1721, %r2206, 12; 2026-02-21T08:52:45.2839392Z and.b32 %r1722, %r2201, 3168; 2026-02-21T08:52:45.2839453Z shl.b32 %r1724, %r2207, 4; 2026-02-21T08:52:45.2839511Z shr.u32 %r1725, %r3, 2; 2026-02-21T08:52:45.2839691Z and.b32 %r1726, %r1725, 96; 
2026-02-21T08:52:45.2839762Z and.b32 %r1728, %r2208, 16; 2026-02-21T08:52:45.2839823Z or.b32 %r1729, %r1722, %r1724; 2026-02-21T08:52:45.2839886Z xor.b32 %r1730, %r1729, %r1726; 2026-02-21T08:52:45.2839958Z add.s32 %r1731, %r2197, %r1721; 2026-02-21T08:52:45.2840019Z add.s32 %r1732, %r1731, %r1728; 2026-02-21T08:52:45.2840082Z add.s32 %r76, %r1732, %r1730; 2026-02-21T08:52:45.2840142Z shl.b32 %r1733, %r2207, 9; 2026-02-21T08:52:45.2840208Z shl.b32 %r1734, %r2206, 5; 2026-02-21T08:52:45.2840268Z and.b32 %r1735, %r2208, 2032; 2026-02-21T08:52:45.2840329Z or.b32 %r1736, %r1733, %r1734; 2026-02-21T08:52:45.2840395Z xor.b32 %r1737, %r1736, %r1735; 2026-02-21T08:52:45.2840455Z add.s32 %r2171, %r2197, %r1737; 2026-02-21T08:52:45.2840515Z add.s32 %r2176, %r2171, 2048; 2026-02-21T08:52:45.2840749Z .loc 1 19 144 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:19:144 2026-02-21T08:52:45.2840828Z mad.wide.u32 %rd149, %r267, 8, %rd28; 2026-02-21T08:52:45.2840892Z add.s64 %rd2, %rd149, 256; 2026-02-21T08:52:45.2841099Z .loc 1 40 125 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:40:125 2026-02-21T08:52:45.2841169Z or.b32 %r79, %r7, 128; 2026-02-21T08:52:45.2841374Z .loc 1 19 144 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:19:144 2026-02-21T08:52:45.2841447Z mad.wide.u32 %rd150, %r5, 7168, %rd29; 2026-02-21T08:52:45.2841514Z add.s64 %rd3, %rd150, 458752; 2026-02-21T08:52:45.2841627Z $L__BB0_11: // =>This Loop Header: Depth=1 2026-02-21T08:52:45.2841725Z // Child Loop BB0_12 Depth 2 2026-02-21T08:52:45.2841945Z .loc 1 25 35 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:25:35 2026-02-21T08:52:45.2842023Z mul.hi.s32 %r1754, %r2270, -1840700269; 2026-02-21T08:52:45.2842088Z add.s32 %r1755, %r1754, %r2270; 2026-02-21T08:52:45.2842152Z shr.u32 %r1756, %r1755, 31; 2026-02-21T08:52:45.2842221Z shr.s32 %r1757, %r1755, 7; 2026-02-21T08:52:45.2842281Z add.s32 %r1758, %r1757, %r1756; 2026-02-21T08:52:45.2842491Z .loc 1 26 33 // 
cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:26:33 2026-02-21T08:52:45.2842562Z shl.b32 %r1759, %r1758, 2; 2026-02-21T08:52:45.2842777Z .loc 1 27 39 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:27:39 2026-02-21T08:52:45.2842840Z sub.s32 %r1760, 1, %r1759; 2026-02-21T08:52:45.2843042Z .loc 1 27 52 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:27:52 2026-02-21T08:52:45.2843103Z min.u32 %r1761, %r1760, 4; 2026-02-21T08:52:45.2843299Z .loc 1 28 45 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:28:45 2026-02-21T08:52:45.2843367Z mul.lo.s32 %r1762, %r1758, 224; 2026-02-21T08:52:45.2843428Z sub.s32 %r1763, %r2270, %r1762; 2026-02-21T08:52:45.2843627Z .loc 1 28 64 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:28:64 2026-02-21T08:52:45.2843691Z cvt.u16.u32 %rs337, %r1763; 2026-02-21T08:52:45.2843760Z cvt.u16.u32 %rs338, %r1761; 2026-02-21T08:52:45.2844075Z .loc 1 29 51 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:29:51 2026-02-21T08:52:45.2844137Z div.s16 %rs339, %rs337, %rs338; 2026-02-21T08:52:45.2844339Z .loc 1 28 64 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:28:64 2026-02-21T08:52:45.2844420Z mul.lo.s16 %rs340, %rs339, %rs338; 2026-02-21T08:52:45.2844484Z sub.s16 %rs341, %rs337, %rs340; 2026-02-21T08:52:45.2844553Z cvt.s32.s16 %r1764, %rs341; 2026-02-21T08:52:45.2844752Z .loc 1 28 30 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:28:30 2026-02-21T08:52:45.2844814Z add.s32 %r1765, %r1759, %r1764; 2026-02-21T08:52:45.2845100Z .loc 1 30 27 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:30:27 2026-02-21T08:52:45.2845169Z shl.b32 %r1766, %r1765, 6; 2026-02-21T08:52:45.2845366Z .loc 1 31 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:31:32 2026-02-21T08:52:45.2845430Z or.b32 %r219, %r1766, %r5; 2026-02-21T08:52:45.2845494Z or.b32 %r220, %r1766, %r6; 2026-02-21T08:52:45.2845692Z .loc 1 32 27 // 
cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:32:27 2026-02-21T08:52:45.2845762Z mul.wide.s16 %r1767, %rs339, 128; 2026-02-21T08:52:45.2845964Z .loc 1 33 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:33:32 2026-02-21T08:52:45.2846036Z or.b32 %r221, %r1767, %r8; 2026-02-21T08:52:45.2846236Z .loc 1 48 53 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:53 2026-02-21T08:52:45.2846303Z shl.b32 %r1768, %r219, 13; 2026-02-21T08:52:45.2846365Z shl.b32 %r1769, %r220, 13; 2026-02-21T08:52:45.2846684Z .loc 1 48 60 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:60 2026-02-21T08:52:45.2846750Z or.b32 %r1770, %r1768, %r7; 2026-02-21T08:52:45.2846818Z or.b32 %r1771, %r1769, %r7; 2026-02-21T08:52:45.2847019Z .loc 1 48 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:32 2026-02-21T08:52:45.2847094Z mad.wide.s32 %rd151, %r1770, 2, %rd28; 2026-02-21T08:52:45.2847168Z mad.wide.s32 %rd152, %r1771, 2, %rd28; 2026-02-21T08:52:45.2847226Z mov.b32 %r1740, 8; 2026-02-21T08:52:45.2847424Z .loc 1 48 80 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:80 2026-02-21T08:52:45.2847489Z // begin inline asm 2026-02-21T08:52:45.2847635Z cp.async.ca.shared.global [ %r1739 + 0 ], [ %rd151 + 0 ], 0x8, %r1740; 2026-02-21T08:52:45.2847694Z // end inline asm 2026-02-21T08:52:45.2847752Z // begin inline asm 2026-02-21T08:52:45.2847904Z cp.async.ca.shared.global [ %r1741 + 0 ], [ %rd152 + 0 ], 0x8, %r1740; 2026-02-21T08:52:45.2847965Z // end inline asm 2026-02-21T08:52:45.2848035Z cp.async.commit_group; 2026-02-21T08:52:45.2848242Z .loc 1 54 62 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:62 2026-02-21T08:52:45.2848309Z add.s32 %r1772, %r221, %r2198; 2026-02-21T08:52:45.2848508Z .loc 1 54 34 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:34 2026-02-21T08:52:45.2848574Z cvt.s64.s32 %rd158, %r1772; 2026-02-21T08:52:45.2848638Z add.s64 %rd153, %rd29, %rd158; 2026-02-21T08:52:45.2848834Z .loc 1 54 
87 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:87 2026-02-21T08:52:45.2848895Z // begin inline asm 2026-02-21T08:52:45.2849038Z cp.async.ca.shared.global [ %r54 + 0 ], [ %rd153 + 0 ], 0x8, %r1740; 2026-02-21T08:52:45.2849096Z // end inline asm 2026-02-21T08:52:45.2849161Z cp.async.commit_group; 2026-02-21T08:52:45.2849364Z .loc 1 48 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:32 2026-02-21T08:52:45.2849429Z cvt.s64.s32 %rd159, %r1768; 2026-02-21T08:52:45.2849492Z or.b64 %rd161, %rd159, %rd191; 2026-02-21T08:52:45.2849557Z shl.b64 %rd162, %rd161, 1; 2026-02-21T08:52:45.2849777Z add.s64 %rd163, %rd28, %rd162; 2026-02-21T08:52:45.2849843Z add.s64 %rd154, %rd163, 128; 2026-02-21T08:52:45.2849911Z cvt.s64.s32 %rd164, %r1769; 2026-02-21T08:52:45.2849973Z or.b64 %rd165, %rd164, %rd191; 2026-02-21T08:52:45.2850037Z shl.b64 %rd166, %rd165, 1; 2026-02-21T08:52:45.2850103Z add.s64 %rd167, %rd28, %rd166; 2026-02-21T08:52:45.2850167Z add.s64 %rd155, %rd167, 128; 2026-02-21T08:52:45.2850366Z .loc 1 48 80 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:80 2026-02-21T08:52:45.2850426Z bar.sync 0; 2026-02-21T08:52:45.2850486Z // begin inline asm 2026-02-21T08:52:45.2850621Z cp.async.ca.shared.global [ %r1745 + 0 ], [ %rd154 + 0 ], 0x8, %r1740; 2026-02-21T08:52:45.2850678Z // end inline asm 2026-02-21T08:52:45.2850741Z // begin inline asm 2026-02-21T08:52:45.2851006Z cp.async.ca.shared.global [ %r1747 + 0 ], [ %rd155 + 0 ], 0x8, %r1740; 2026-02-21T08:52:45.2851071Z // end inline asm 2026-02-21T08:52:45.2851141Z cp.async.commit_group; 2026-02-21T08:52:45.2851345Z .loc 1 54 62 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:62 2026-02-21T08:52:45.2851407Z add.s32 %r1773, %r221, %r2199; 2026-02-21T08:52:45.2851606Z .loc 1 54 34 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:34 2026-02-21T08:52:45.2851674Z cvt.s64.s32 %rd168, %r1773; 2026-02-21T08:52:45.2851738Z add.s64 %rd156, %rd29, %rd168; 
2026-02-21T08:52:45.2851934Z .loc 1 54 87 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:87 2026-02-21T08:52:45.2851999Z // begin inline asm 2026-02-21T08:52:45.2852133Z cp.async.ca.shared.global [ %r1749 + 0 ], [ %rd156 + 0 ], 0x8, %r1740; 2026-02-21T08:52:45.2852190Z // end inline asm 2026-02-21T08:52:45.2852261Z cp.async.commit_group; 2026-02-21T08:52:45.2852469Z .loc 1 40 125 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:40:125 2026-02-21T08:52:45.2852530Z shl.b32 %r1774, %r1758, 8; 2026-02-21T08:52:45.2852593Z or.b32 %r1775, %r5, %r1774; 2026-02-21T08:52:45.2852665Z mul.wide.s16 %r1776, %rs341, 64; 2026-02-21T08:52:45.2852728Z add.s32 %r1777, %r1775, %r1776; 2026-02-21T08:52:45.2852788Z shl.b32 %r1778, %r1777, 13; 2026-02-21T08:52:45.2852861Z mad.wide.s32 %rd199, %r1778, 2, %rd2; 2026-02-21T08:52:45.2852921Z or.b32 %r1779, %r6, %r1774; 2026-02-21T08:52:45.2852982Z add.s32 %r1780, %r1779, %r1776; 2026-02-21T08:52:45.2853044Z shl.b32 %r1781, %r1780, 13; 2026-02-21T08:52:45.2853104Z or.b32 %r2271, %r79, %r1781; 2026-02-21T08:52:45.2853167Z cvt.s64.s32 %rd169, %r221; 2026-02-21T08:52:45.2853227Z add.s64 %rd198, %rd3, %rd169; 2026-02-21T08:52:45.2853292Z mov.b32 %r2274, 0f00000000; 2026-02-21T08:52:45.2853349Z mov.b32 %r2273, 1; 2026-02-21T08:52:45.2853410Z mov.b32 %r2272, -1; 2026-02-21T08:52:45.2853473Z mov.b64 %rd200, -32; 2026-02-21T08:52:45.2853544Z mov.b32 %r2275, %r2274; 2026-02-21T08:52:45.2853606Z mov.b32 %r2276, %r2274; 2026-02-21T08:52:45.2853669Z mov.b32 %r2277, %r2274; 2026-02-21T08:52:45.2853731Z mov.b32 %r2278, %r2274; 2026-02-21T08:52:45.2853788Z mov.b32 %r2279, %r2274; 2026-02-21T08:52:45.2853846Z mov.b32 %r2280, %r2274; 2026-02-21T08:52:45.2853909Z mov.b32 %r2281, %r2274; 2026-02-21T08:52:45.2853968Z mov.b32 %r2282, %r2274; 2026-02-21T08:52:45.2854026Z mov.b32 %r2283, %r2274; 2026-02-21T08:52:45.2854084Z mov.b32 %r2284, %r2274; 2026-02-21T08:52:45.2854144Z mov.b32 %r2285, %r2274; 2026-02-21T08:52:45.2854202Z mov.b32 
%r2286, %r2274; 2026-02-21T08:52:45.2854257Z mov.b32 %r2287, %r2274; 2026-02-21T08:52:45.2854317Z mov.b32 %r2288, %r2274; 2026-02-21T08:52:45.2854376Z mov.b32 %r2289, %r2274; 2026-02-21T08:52:45.2854487Z $L__BB0_12: // Parent Loop BB0_11 Depth=1 2026-02-21T08:52:45.2854595Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:45.2854662Z add.s64 %rd200, %rd200, 32; 2026-02-21T08:52:45.2854730Z setp.lt.u64 %p52, %rd200, 4032; 2026-02-21T08:52:45.2854958Z add.s32 %r2114, %r2272, 1; 2026-02-21T08:52:45.2855027Z setp.gt.s32 %p53, %r2114, 1; 2026-02-21T08:52:45.2855094Z selp.b32 %r2272, 0, %r2114, %p53; 2026-02-21T08:52:45.2855295Z .loc 1 48 80 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:80 2026-02-21T08:52:45.2855366Z cp.async.wait_group 2; 2026-02-21T08:52:45.2855423Z bar.sync 0; 2026-02-21T08:52:45.2855493Z shl.b32 %r2115, %r2272, 12; 2026-02-21T08:52:45.2855554Z shl.b32 %r2116, %r2272, 13; 2026-02-21T08:52:45.2855619Z add.s32 %r2117, %r2197, 32768; 2026-02-21T08:52:45.2855681Z add.s32 %r2118, %r2117, %r2116; 2026-02-21T08:52:45.2855878Z .loc 1 52 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:52:32 2026-02-21T08:52:45.2856037Z add.s32 %r2119, %r2118, %r59; 2026-02-21T08:52:45.2856107Z ld.shared.b16 %rs342, [%r2119]; 2026-02-21T08:52:45.2856178Z ld.shared.b16 %rs343, [%r2119+1024]; 2026-02-21T08:52:45.2856249Z ld.shared.b16 %rs344, [%r2119+64]; 2026-02-21T08:52:45.2856322Z ld.shared.b16 %rs345, [%r2119+1088]; 2026-02-21T08:52:45.2856386Z add.s32 %r2120, %r2118, %r60; 2026-02-21T08:52:45.2856579Z ld.shared.b16 %rs346, [%r2120]; 2026-02-21T08:52:45.2856658Z ld.shared.b16 %rs347, [%r2120+1024]; 2026-02-21T08:52:45.2856724Z ld.shared.b16 %rs348, [%r2120+64]; 2026-02-21T08:52:45.2856790Z ld.shared.b16 %rs349, [%r2120+1088]; 2026-02-21T08:52:45.2856854Z add.s32 %r2121, %r2118, %r61; 2026-02-21T08:52:45.2856917Z ld.shared.b16 %rs350, [%r2121]; 2026-02-21T08:52:45.2856981Z ld.shared.b16 %rs351, [%r2121+1024]; 2026-02-21T08:52:45.2857046Z 
ld.shared.b16 %rs352, [%r2121+64]; 2026-02-21T08:52:45.2857117Z ld.shared.b16 %rs353, [%r2121+1088]; 2026-02-21T08:52:45.2857178Z add.s32 %r2122, %r2118, %r62; 2026-02-21T08:52:45.2857247Z ld.shared.b16 %rs354, [%r2122]; 2026-02-21T08:52:45.2857325Z ld.shared.b16 %rs355, [%r2122+1024]; 2026-02-21T08:52:45.2857392Z ld.shared.b16 %rs356, [%r2122+64]; 2026-02-21T08:52:45.2857461Z ld.shared.b16 %rs357, [%r2122+1088]; 2026-02-21T08:52:45.2857526Z add.s32 %r2123, %r2118, %r63; 2026-02-21T08:52:45.2857591Z ld.shared.b16 %rs358, [%r2123]; 2026-02-21T08:52:45.2857657Z ld.shared.b16 %rs359, [%r2123+1024]; 2026-02-21T08:52:45.2857722Z ld.shared.b16 %rs360, [%r2123+64]; 2026-02-21T08:52:45.2857794Z ld.shared.b16 %rs361, [%r2123+1088]; 2026-02-21T08:52:45.2857855Z add.s32 %r2124, %r2118, %r64; 2026-02-21T08:52:45.2857920Z ld.shared.b16 %rs362, [%r2124]; 2026-02-21T08:52:45.2858001Z ld.shared.b16 %rs363, [%r2124+1024]; 2026-02-21T08:52:45.2858066Z ld.shared.b16 %rs364, [%r2124+64]; 2026-02-21T08:52:45.2858131Z ld.shared.b16 %rs365, [%r2124+1088]; 2026-02-21T08:52:45.2858192Z add.s32 %r2125, %r2118, %r65; 2026-02-21T08:52:45.2858260Z ld.shared.b16 %rs366, [%r2125]; 2026-02-21T08:52:45.2858327Z ld.shared.b16 %rs367, [%r2125+1024]; 2026-02-21T08:52:45.2858392Z ld.shared.b16 %rs368, [%r2125+64]; 2026-02-21T08:52:45.2858460Z ld.shared.b16 %rs369, [%r2125+1088]; 2026-02-21T08:52:45.2858523Z add.s32 %r2126, %r2118, %r66; 2026-02-21T08:52:45.2858587Z ld.shared.b16 %rs370, [%r2126]; 2026-02-21T08:52:45.2858653Z ld.shared.b16 %rs371, [%r2126+1024]; 2026-02-21T08:52:45.2858722Z ld.shared.b16 %rs372, [%r2126+64]; 2026-02-21T08:52:45.2858786Z ld.shared.b16 %rs373, [%r2126+1088]; 2026-02-21T08:52:45.2858851Z cvt.f32.bf16 %r1814, %rs342; 2026-02-21T08:52:45.2858917Z cvt.f32.bf16 %r1815, %rs343; 2026-02-21T08:52:45.2858979Z cvt.f32.bf16 %r1816, %rs346; 2026-02-21T08:52:45.2859040Z cvt.f32.bf16 %r1817, %rs347; 2026-02-21T08:52:45.2859104Z cvt.f32.bf16 %r1850, %rs350; 2026-02-21T08:52:45.2859163Z 
cvt.f32.bf16 %r1851, %rs351; 2026-02-21T08:52:45.2859222Z cvt.f32.bf16 %r1852, %rs354; 2026-02-21T08:52:45.2859283Z cvt.f32.bf16 %r1853, %rs355; 2026-02-21T08:52:45.2859350Z cvt.f32.bf16 %r1886, %rs358; 2026-02-21T08:52:45.2859410Z cvt.f32.bf16 %r1887, %rs359; 2026-02-21T08:52:45.2859471Z cvt.f32.bf16 %r1888, %rs362; 2026-02-21T08:52:45.2859535Z cvt.f32.bf16 %r1889, %rs363; 2026-02-21T08:52:45.2859733Z cvt.f32.bf16 %r1922, %rs366; 2026-02-21T08:52:45.2859793Z cvt.f32.bf16 %r1923, %rs367; 2026-02-21T08:52:45.2859852Z cvt.f32.bf16 %r1924, %rs370; 2026-02-21T08:52:45.2859916Z cvt.f32.bf16 %r1925, %rs371; 2026-02-21T08:52:45.2859976Z cvt.f32.bf16 %r1958, %rs344; 2026-02-21T08:52:45.2860037Z cvt.f32.bf16 %r1959, %rs345; 2026-02-21T08:52:45.2860100Z cvt.f32.bf16 %r1960, %rs348; 2026-02-21T08:52:45.2860159Z cvt.f32.bf16 %r1961, %rs349; 2026-02-21T08:52:45.2860219Z cvt.f32.bf16 %r1994, %rs352; 2026-02-21T08:52:45.2860279Z cvt.f32.bf16 %r1995, %rs353; 2026-02-21T08:52:45.2860340Z cvt.f32.bf16 %r1996, %rs356; 2026-02-21T08:52:45.2860399Z cvt.f32.bf16 %r1997, %rs357; 2026-02-21T08:52:45.2860459Z cvt.f32.bf16 %r2030, %rs360; 2026-02-21T08:52:45.2860650Z cvt.f32.bf16 %r2031, %rs361; 2026-02-21T08:52:45.2860715Z cvt.f32.bf16 %r2032, %rs364; 2026-02-21T08:52:45.2860781Z cvt.f32.bf16 %r2033, %rs365; 2026-02-21T08:52:45.2860847Z cvt.f32.bf16 %r2066, %rs368; 2026-02-21T08:52:45.2860912Z cvt.f32.bf16 %r2067, %rs369; 2026-02-21T08:52:45.2860972Z cvt.f32.bf16 %r2068, %rs372; 2026-02-21T08:52:45.2861033Z cvt.f32.bf16 %r2069, %rs373; 2026-02-21T08:52:45.2861247Z .loc 1 67 45 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:67:45 2026-02-21T08:52:45.2861314Z add.s32 %r2127, %r67, %r2115; 2026-02-21T08:52:45.2861380Z ld.shared.b8 %rs374, [%r2127]; 2026-02-21T08:52:45.2861453Z ld.shared.b8 %rs375, [%r2127+256]; 2026-02-21T08:52:45.2861518Z ld.shared.b8 %rs376, [%r2127+512]; 2026-02-21T08:52:45.2861581Z ld.shared.b8 %rs377, [%r2127+768]; 2026-02-21T08:52:45.2861648Z ld.shared.b8 
%rs378, [%r2127+1024]; 2026-02-21T08:52:45.2861720Z ld.shared.b8 %rs379, [%r2127+1280]; 2026-02-21T08:52:45.2861789Z ld.shared.b8 %rs380, [%r2127+1536]; 2026-02-21T08:52:45.2861857Z ld.shared.b8 %rs381, [%r2127+1792]; 2026-02-21T08:52:45.2861925Z ld.shared.b8 %rs382, [%r2127+2048]; 2026-02-21T08:52:45.2861990Z ld.shared.b8 %rs383, [%r2127+2304]; 2026-02-21T08:52:45.2862058Z ld.shared.b8 %rs384, [%r2127+2560]; 2026-02-21T08:52:45.2862126Z ld.shared.b8 %rs385, [%r2127+2816]; 2026-02-21T08:52:45.2862190Z ld.shared.b8 %rs386, [%r2127+3072]; 2026-02-21T08:52:45.2862266Z ld.shared.b8 %rs387, [%r2127+3328]; 2026-02-21T08:52:45.2862333Z ld.shared.b8 %rs388, [%r2127+3584]; 2026-02-21T08:52:45.2862403Z ld.shared.b8 %rs389, [%r2127+3840]; 2026-02-21T08:52:45.2862608Z .loc 1 57 28 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:57:28 2026-02-21T08:52:45.2862670Z shl.b16 %rs390, %rs374, 4; 2026-02-21T08:52:45.2862738Z shl.b16 %rs391, %rs375, 4; 2026-02-21T08:52:45.2862798Z shl.b16 %rs392, %rs376, 4; 2026-02-21T08:52:45.2862859Z shl.b16 %rs393, %rs377, 4; 2026-02-21T08:52:45.2862923Z shl.b16 %rs394, %rs378, 4; 2026-02-21T08:52:45.2862988Z shl.b16 %rs395, %rs379, 4; 2026-02-21T08:52:45.2863048Z shl.b16 %rs396, %rs380, 4; 2026-02-21T08:52:45.2863107Z shl.b16 %rs397, %rs381, 4; 2026-02-21T08:52:45.2863173Z shl.b16 %rs398, %rs382, 4; 2026-02-21T08:52:45.2863233Z shl.b16 %rs399, %rs383, 4; 2026-02-21T08:52:45.2863292Z shl.b16 %rs400, %rs384, 4; 2026-02-21T08:52:45.2863351Z shl.b16 %rs401, %rs385, 4; 2026-02-21T08:52:45.2863415Z shl.b16 %rs402, %rs386, 4; 2026-02-21T08:52:45.2863474Z shl.b16 %rs403, %rs387, 4; 2026-02-21T08:52:45.2863534Z shl.b16 %rs404, %rs388, 4; 2026-02-21T08:52:45.2863595Z shl.b16 %rs405, %rs389, 4; 2026-02-21T08:52:45.2863798Z .loc 1 72 58 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:72:58 2026-02-21T08:52:45.2863872Z selp.b16 %rs406, %rs390, %rs374, %p57; 2026-02-21T08:52:45.2863937Z cvt.s16.s8 %rs407, %rs406; 
2026-02-21T08:52:45.2863997Z shr.s16 %rs408, %rs407, 4; 2026-02-21T08:52:45.2864071Z selp.b16 %rs409, %rs391, %rs375, %p57; 2026-02-21T08:52:45.2864131Z cvt.s16.s8 %rs410, %rs409; 2026-02-21T08:52:45.2864195Z shr.s16 %rs411, %rs410, 4; 2026-02-21T08:52:45.2864262Z selp.b16 %rs412, %rs392, %rs376, %p57; 2026-02-21T08:52:45.2864426Z cvt.s16.s8 %rs413, %rs412; 2026-02-21T08:52:45.2864489Z shr.s16 %rs414, %rs413, 4; 2026-02-21T08:52:45.2864555Z selp.b16 %rs415, %rs393, %rs377, %p57; 2026-02-21T08:52:45.2864634Z cvt.s16.s8 %rs416, %rs415; 2026-02-21T08:52:45.2864696Z shr.s16 %rs417, %rs416, 4; 2026-02-21T08:52:45.2864769Z selp.b16 %rs418, %rs394, %rs378, %p57; 2026-02-21T08:52:45.2864828Z cvt.s16.s8 %rs419, %rs418; 2026-02-21T08:52:45.2864888Z shr.s16 %rs420, %rs419, 4; 2026-02-21T08:52:45.2864961Z selp.b16 %rs421, %rs395, %rs379, %p57; 2026-02-21T08:52:45.2865019Z cvt.s16.s8 %rs422, %rs421; 2026-02-21T08:52:45.2865082Z shr.s16 %rs423, %rs422, 4; 2026-02-21T08:52:45.2865153Z selp.b16 %rs424, %rs396, %rs380, %p57; 2026-02-21T08:52:45.2865216Z cvt.s16.s8 %rs425, %rs424; 2026-02-21T08:52:45.2865367Z shr.s16 %rs426, %rs425, 4; 2026-02-21T08:52:45.2865437Z selp.b16 %rs427, %rs397, %rs381, %p57; 2026-02-21T08:52:45.2865501Z cvt.s16.s8 %rs428, %rs427; 2026-02-21T08:52:45.2865564Z shr.s16 %rs429, %rs428, 4; 2026-02-21T08:52:45.2865630Z selp.b16 %rs430, %rs398, %rs382, %p57; 2026-02-21T08:52:45.2865692Z cvt.s16.s8 %rs431, %rs430; 2026-02-21T08:52:45.2865754Z shr.s16 %rs432, %rs431, 4; 2026-02-21T08:52:45.2865820Z selp.b16 %rs433, %rs399, %rs383, %p57; 2026-02-21T08:52:45.2865879Z cvt.s16.s8 %rs434, %rs433; 2026-02-21T08:52:45.2865942Z shr.s16 %rs435, %rs434, 4; 2026-02-21T08:52:45.2866008Z selp.b16 %rs436, %rs400, %rs384, %p57; 2026-02-21T08:52:45.2866068Z cvt.s16.s8 %rs437, %rs436; 2026-02-21T08:52:45.2866131Z shr.s16 %rs438, %rs437, 4; 2026-02-21T08:52:45.2866198Z selp.b16 %rs439, %rs401, %rs385, %p57; 2026-02-21T08:52:45.2866257Z cvt.s16.s8 %rs440, %rs439; 
2026-02-21T08:52:45.2866318Z shr.s16 %rs441, %rs440, 4; 2026-02-21T08:52:45.2866396Z selp.b16 %rs442, %rs402, %rs386, %p57; 2026-02-21T08:52:45.2866576Z cvt.s16.s8 %rs443, %rs442; 2026-02-21T08:52:45.2866645Z shr.s16 %rs444, %rs443, 4; 2026-02-21T08:52:45.2866722Z selp.b16 %rs445, %rs403, %rs387, %p57; 2026-02-21T08:52:45.2866787Z cvt.s16.s8 %rs446, %rs445; 2026-02-21T08:52:45.2866847Z shr.s16 %rs447, %rs446, 4; 2026-02-21T08:52:45.2866914Z selp.b16 %rs448, %rs404, %rs388, %p57; 2026-02-21T08:52:45.2866979Z cvt.s16.s8 %rs449, %rs448; 2026-02-21T08:52:45.2867038Z shr.s16 %rs450, %rs449, 4; 2026-02-21T08:52:45.2867104Z selp.b16 %rs451, %rs405, %rs389, %p57; 2026-02-21T08:52:45.2867166Z cvt.s16.s8 %rs452, %rs451; 2026-02-21T08:52:45.2867226Z shr.s16 %rs453, %rs452, 4; 2026-02-21T08:52:45.2867429Z .loc 1 77 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:77:32 2026-02-21T08:52:45.2867497Z cvt.rn.f32.s16 %r2128, %rs408; 2026-02-21T08:52:45.2867559Z cvt.rn.f32.s16 %r2129, %rs411; 2026-02-21T08:52:45.2867622Z cvt.rn.f32.s16 %r2130, %rs414; 2026-02-21T08:52:45.2867687Z cvt.rn.f32.s16 %r2131, %rs417; 2026-02-21T08:52:45.2867754Z cvt.rn.f32.s16 %r2132, %rs420; 2026-02-21T08:52:45.2867814Z cvt.rn.f32.s16 %r2133, %rs423; 2026-02-21T08:52:45.2867878Z cvt.rn.f32.s16 %r2134, %rs426; 2026-02-21T08:52:45.2867944Z cvt.rn.f32.s16 %r2135, %rs429; 2026-02-21T08:52:45.2868007Z cvt.rn.f32.s16 %r2136, %rs432; 2026-02-21T08:52:45.2868067Z cvt.rn.f32.s16 %r2137, %rs435; 2026-02-21T08:52:45.2868143Z cvt.rn.f32.s16 %r2138, %rs438; 2026-02-21T08:52:45.2868210Z cvt.rn.f32.s16 %r2139, %rs441; 2026-02-21T08:52:45.2868271Z cvt.rn.f32.s16 %r2140, %rs444; 2026-02-21T08:52:45.2868333Z cvt.rn.f32.s16 %r2141, %rs447; 2026-02-21T08:52:45.2868397Z cvt.rn.f32.s16 %r2142, %rs450; 2026-02-21T08:52:45.2868458Z cvt.rn.f32.s16 %r2143, %rs453; 2026-02-21T08:52:45.2868586Z st.shared.b32 [%r68], %r2128; 2026-02-21T08:52:45.2868663Z st.shared.b32 [%r68+16384], %r2136; 2026-02-21T08:52:45.2868730Z 
st.shared.b32 [%r69], %r2129; 2026-02-21T08:52:45.2868797Z st.shared.b32 [%r69+16384], %r2137; 2026-02-21T08:52:45.2868860Z st.shared.b32 [%r70], %r2130; 2026-02-21T08:52:45.2868928Z st.shared.b32 [%r70+16384], %r2138; 2026-02-21T08:52:45.2868989Z st.shared.b32 [%r71], %r2131; 2026-02-21T08:52:45.2869191Z st.shared.b32 [%r71+16384], %r2139; 2026-02-21T08:52:45.2869263Z st.shared.b32 [%r72], %r2132; 2026-02-21T08:52:45.2869336Z st.shared.b32 [%r72+16384], %r2140; 2026-02-21T08:52:45.2869400Z st.shared.b32 [%r73], %r2133; 2026-02-21T08:52:45.2869465Z st.shared.b32 [%r73+16384], %r2141; 2026-02-21T08:52:45.2869530Z st.shared.b32 [%r74], %r2134; 2026-02-21T08:52:45.2869593Z st.shared.b32 [%r74+16384], %r2142; 2026-02-21T08:52:45.2869653Z st.shared.b32 [%r75], %r2135; 2026-02-21T08:52:45.2869720Z st.shared.b32 [%r75+16384], %r2143; 2026-02-21T08:52:45.2869774Z $L__tmp7: 2026-02-21T08:52:45.2870055Z .loc 2 291 36 // standard.py:291:36 @[ cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:84:40 ] 2026-02-21T08:52:45.2870254Z // begin inline asm 2026-02-21T08:52:45.2870349Z fence.proxy.async.shared::cta; 2026-02-21T08:52:45.2870408Z // end inline asm 2026-02-21T08:52:45.2870466Z bar.sync 0; 2026-02-21T08:52:45.2870551Z shfl.sync.idx.b32 %r2144, %r4, 0, 31, -1; 2026-02-21T08:52:45.2870628Z wgmma.fence.sync.aligned; 2026-02-21T08:52:45.2870689Z shl.b32 %r2145, %r2144, 10; 2026-02-21T08:52:45.2870755Z and.b32 %r2146, %r2145, 12288; 2026-02-21T08:52:45.2870817Z add.s32 %r2147, %r2146, %r2197; 2026-02-21T08:52:45.2870878Z bfe.u32 %r2148, %r2147, 4, 14; 2026-02-21T08:52:45.2870940Z cvt.u64.u32 %rd181, %r2148; 2026-02-21T08:52:45.2871022Z or.b64 %rd170, %rd181, 4611686293372403712; 2026-02-21T08:52:45.2871087Z mov.pred %p43, -1; 2026-02-21T08:52:45.2871147Z // begin inline asm 2026-02-21T08:52:45.2871670Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2274,%r2275,%r2276,%r2277,%r2278,%r2279,%r2280,%r2281,%r2282,%r2283,%r2284,%r2285,%r2286,%r2287,%r2288,%r2289}, 
{%r1814,%r1815,%r1816,%r1817}, %rd170, %p43, 1, 1; 2026-02-21T08:52:45.2871743Z // end inline asm 2026-02-21T08:52:45.2871806Z add.s32 %r2149, %r2147, 32; 2026-02-21T08:52:45.2871871Z bfe.u32 %r2150, %r2149, 4, 14; 2026-02-21T08:52:45.2871934Z cvt.u64.u32 %rd182, %r2150; 2026-02-21T08:52:45.2872009Z or.b64 %rd171, %rd182, 4611686293372403712; 2026-02-21T08:52:45.2872068Z // begin inline asm 2026-02-21T08:52:45.2872580Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2274,%r2275,%r2276,%r2277,%r2278,%r2279,%r2280,%r2281,%r2282,%r2283,%r2284,%r2285,%r2286,%r2287,%r2288,%r2289}, {%r1850,%r1851,%r1852,%r1853}, %rd171, %p43, 1, 1; 2026-02-21T08:52:45.2872636Z // end inline asm 2026-02-21T08:52:45.2872696Z add.s32 %r2151, %r2147, 64; 2026-02-21T08:52:45.2872760Z bfe.u32 %r2152, %r2151, 4, 14; 2026-02-21T08:52:45.2872821Z cvt.u64.u32 %rd183, %r2152; 2026-02-21T08:52:45.2872893Z or.b64 %rd172, %rd183, 4611686293372403712; 2026-02-21T08:52:45.2872956Z // begin inline asm 2026-02-21T08:52:45.2873459Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2274,%r2275,%r2276,%r2277,%r2278,%r2279,%r2280,%r2281,%r2282,%r2283,%r2284,%r2285,%r2286,%r2287,%r2288,%r2289}, {%r1886,%r1887,%r1888,%r1889}, %rd172, %p43, 1, 1; 2026-02-21T08:52:45.2873520Z // end inline asm 2026-02-21T08:52:45.2873583Z add.s32 %r2153, %r2147, 96; 2026-02-21T08:52:45.2873647Z bfe.u32 %r2154, %r2153, 4, 14; 2026-02-21T08:52:45.2873709Z cvt.u64.u32 %rd184, %r2154; 2026-02-21T08:52:45.2873791Z or.b64 %rd173, %rd184, 4611686293372403712; 2026-02-21T08:52:45.2873856Z // begin inline asm 2026-02-21T08:52:45.2874358Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2274,%r2275,%r2276,%r2277,%r2278,%r2279,%r2280,%r2281,%r2282,%r2283,%r2284,%r2285,%r2286,%r2287,%r2288,%r2289}, {%r1922,%r1923,%r1924,%r1925}, %rd173, %p43, 1, 1; 2026-02-21T08:52:45.2874415Z // end inline asm 2026-02-21T08:52:45.2874480Z add.s32 %r2155, %r2147, 16384; 2026-02-21T08:52:45.2874539Z bfe.u32 %r2156, %r2155, 4, 14; 
2026-02-21T08:52:45.2874601Z cvt.u64.u32 %rd185, %r2156; 2026-02-21T08:52:45.2874687Z or.b64 %rd174, %rd185, 4611686293372403712; 2026-02-21T08:52:45.2874751Z // begin inline asm 2026-02-21T08:52:45.2875257Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2274,%r2275,%r2276,%r2277,%r2278,%r2279,%r2280,%r2281,%r2282,%r2283,%r2284,%r2285,%r2286,%r2287,%r2288,%r2289}, {%r1958,%r1959,%r1960,%r1961}, %rd174, %p43, 1, 1; 2026-02-21T08:52:45.2875416Z // end inline asm 2026-02-21T08:52:45.2875480Z add.s32 %r2157, %r2147, 16416; 2026-02-21T08:52:45.2875540Z bfe.u32 %r2158, %r2157, 4, 14; 2026-02-21T08:52:45.2875602Z cvt.u64.u32 %rd186, %r2158; 2026-02-21T08:52:45.2875673Z or.b64 %rd175, %rd186, 4611686293372403712; 2026-02-21T08:52:45.2875737Z // begin inline asm 2026-02-21T08:52:45.2876235Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2274,%r2275,%r2276,%r2277,%r2278,%r2279,%r2280,%r2281,%r2282,%r2283,%r2284,%r2285,%r2286,%r2287,%r2288,%r2289}, {%r1994,%r1995,%r1996,%r1997}, %rd175, %p43, 1, 1; 2026-02-21T08:52:45.2876295Z // end inline asm 2026-02-21T08:52:45.2876441Z add.s32 %r2159, %r2147, 16448; 2026-02-21T08:52:45.2876626Z bfe.u32 %r2160, %r2159, 4, 14; 2026-02-21T08:52:45.2876689Z cvt.u64.u32 %rd187, %r2160; 2026-02-21T08:52:45.2876770Z or.b64 %rd176, %rd187, 4611686293372403712; 2026-02-21T08:52:45.2876834Z // begin inline asm 2026-02-21T08:52:45.2877345Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2274,%r2275,%r2276,%r2277,%r2278,%r2279,%r2280,%r2281,%r2282,%r2283,%r2284,%r2285,%r2286,%r2287,%r2288,%r2289}, {%r2030,%r2031,%r2032,%r2033}, %rd176, %p43, 1, 1; 2026-02-21T08:52:45.2877406Z // end inline asm 2026-02-21T08:52:45.2877466Z add.s32 %r2161, %r2147, 16480; 2026-02-21T08:52:45.2877525Z bfe.u32 %r2162, %r2161, 4, 14; 2026-02-21T08:52:45.2877590Z cvt.u64.u32 %rd188, %r2162; 2026-02-21T08:52:45.2877662Z or.b64 %rd177, %rd188, 4611686293372403712; 2026-02-21T08:52:45.2877721Z // begin inline asm 2026-02-21T08:52:45.2878235Z 
wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2274,%r2275,%r2276,%r2277,%r2278,%r2279,%r2280,%r2281,%r2282,%r2283,%r2284,%r2285,%r2286,%r2287,%r2288,%r2289}, {%r2066,%r2067,%r2068,%r2069}, %rd177, %p43, 1, 1; 2026-02-21T08:52:45.2878300Z // end inline asm 2026-02-21T08:52:45.2878379Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:45.2878437Z mov.b32 %r2087, 0; 2026-02-21T08:52:45.2878500Z mov.b32 %r2086, %r2197; 2026-02-21T08:52:45.2878559Z mov.b32 %r2088, %r2087; 2026-02-21T08:52:45.2878617Z // begin inline asm 2026-02-21T08:52:45.2878932Z // wait for regs: %r2274,%r2275,%r2276,%r2277,%r2278,%r2279,%r2280,%r2281,%r2282,%r2283,%r2284,%r2285,%r2286,%r2287,%r2288,%r2289,%r2086,%r2087,%r2088 2026-02-21T08:52:45.2879007Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:45.2879065Z // end inline asm 2026-02-21T08:52:45.2879120Z $L__tmp8: 2026-02-21T08:52:45.2879347Z .loc 1 40 125 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:40:125 2026-02-21T08:52:45.2879409Z add.s32 %r2163, %r2273, 1; 2026-02-21T08:52:45.2879476Z setp.gt.s32 %p54, %r2163, 1; 2026-02-21T08:52:45.2879549Z selp.b32 %r2273, 0, %r2163, %p54; 2026-02-21T08:52:45.2879761Z .loc 1 48 32 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:32 2026-02-21T08:52:45.2879835Z mad.wide.s32 %rd179, %r2271, 2, %rd28; 2026-02-21T08:52:45.2880038Z .loc 1 48 80 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:48:80 2026-02-21T08:52:45.2880098Z shl.b32 %r2164, %r2273, 12; 2026-02-21T08:52:45.2880158Z shl.b32 %r2165, %r2273, 13; 2026-02-21T08:52:45.2880219Z add.s32 %r2166, %r2117, %r2165; 2026-02-21T08:52:45.2880284Z add.s32 %r2108, %r2166, %r50; 2026-02-21T08:52:45.2880346Z selp.b32 %r2109, 8, 0, %p52; 2026-02-21T08:52:45.2880405Z // begin inline asm 2026-02-21T08:52:45.2880554Z cp.async.ca.shared.global [ %r2108 + 0 ], [ %rd199 + 0 ], 0x8, %r2109; 2026-02-21T08:52:45.2880610Z // end inline asm 2026-02-21T08:52:45.2880669Z add.s32 %r2110, %r2108, 4096; 
2026-02-21T08:52:45.2880731Z // begin inline asm 2026-02-21T08:52:45.2880868Z cp.async.ca.shared.global [ %r2110 + 0 ], [ %rd179 + 0 ], 0x8, %r2109; 2026-02-21T08:52:45.2880925Z // end inline asm 2026-02-21T08:52:45.2880989Z cp.async.commit_group; 2026-02-21T08:52:45.2881349Z .loc 1 54 87 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:54:87 2026-02-21T08:52:45.2881412Z add.s32 %r2112, %r54, %r2164; 2026-02-21T08:52:45.2881469Z // begin inline asm 2026-02-21T08:52:45.2881607Z cp.async.ca.shared.global [ %r2112 + 0 ], [ %rd198 + 0 ], 0x8, %r2109; 2026-02-21T08:52:45.2881663Z // end inline asm 2026-02-21T08:52:45.2881727Z cp.async.commit_group; 2026-02-21T08:52:45.2881936Z .loc 1 40 125 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:40:125 2026-02-21T08:52:45.2882003Z add.s64 %rd199, %rd199, 128; 2026-02-21T08:52:45.2882063Z add.s32 %r2271, %r2271, 64; 2026-02-21T08:52:45.2882127Z add.s64 %rd198, %rd198, 229376; 2026-02-21T08:52:45.2882198Z setp.lt.u64 %p55, %rd200, 4064; 2026-02-21T08:52:45.2882387Z @%p55 bra $L__BB0_12; 2026-02-21T08:52:45.2882506Z // %bb.13: // in Loop: Header=BB0_11 Depth=1 2026-02-21T08:52:45.2882578Z cp.async.wait_group 0; 2026-02-21T08:52:45.2882639Z bar.sync 0; 2026-02-21T08:52:45.2882841Z .loc 1 87 28 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:87:28 2026-02-21T08:52:45.2882918Z cvt.rn.bf16x2.f32 %r2185, %r2275, %r2274; 2026-02-21T08:52:45.2882998Z cvt.rn.bf16x2.f32 %r2186, %r2277, %r2276; 2026-02-21T08:52:45.2883069Z cvt.rn.bf16x2.f32 %r2187, %r2279, %r2278; 2026-02-21T08:52:45.2883140Z cvt.rn.bf16x2.f32 %r2188, %r2281, %r2280; 2026-02-21T08:52:45.2883213Z cvt.rn.bf16x2.f32 %r2189, %r2283, %r2282; 2026-02-21T08:52:45.2883287Z cvt.rn.bf16x2.f32 %r2190, %r2285, %r2284; 2026-02-21T08:52:45.2883356Z cvt.rn.bf16x2.f32 %r2191, %r2287, %r2286; 2026-02-21T08:52:45.2883431Z cvt.rn.bf16x2.f32 %r2192, %r2289, %r2288; 2026-02-21T08:52:45.2883640Z .loc 1 88 50 // 
cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:88:50 2026-02-21T08:52:45.2883710Z mad.lo.s32 %r2193, %r219, 7168, %r221; 2026-02-21T08:52:45.2883777Z mad.lo.s32 %r2194, %r220, 7168, %r221; 2026-02-21T08:52:45.2883987Z .loc 1 88 22 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:88:22 2026-02-21T08:52:45.2884056Z mad.wide.s32 %rd189, %r2193, 2, %rd30; 2026-02-21T08:52:45.2884123Z mad.wide.s32 %rd190, %r2194, 2, %rd30; 2026-02-21T08:52:45.2884324Z .loc 1 88 81 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:88:81 2026-02-21T08:52:45.2884437Z st.shared.v4.b32 [%r76], {%r2185, %r2187, %r2189, %r2191}; 2026-02-21T08:52:45.2884553Z st.shared.v4.b32 [%r76+512], {%r2186, %r2188, %r2190, %r2192}; 2026-02-21T08:52:45.2884613Z bar.sync 0; 2026-02-21T08:52:45.2884681Z // begin inline asm 2026-02-21T08:52:45.2884886Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2167, %r2168, %r2169, %r2170}, [%r2171]; 2026-02-21T08:52:45.2884949Z // end inline asm 2026-02-21T08:52:45.2885019Z // begin inline asm 2026-02-21T08:52:45.2885206Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2172, %r2173, %r2174, %r2175}, [%r2176]; 2026-02-21T08:52:45.2885268Z // end inline asm 2026-02-21T08:52:45.2885335Z // begin inline asm 2026-02-21T08:52:45.2885462Z st.global.v4.b32 [ %rd189 + 0 ], { %r2167, %r2168, %r2169, %r2170 }; 2026-02-21T08:52:45.2885522Z // end inline asm 2026-02-21T08:52:45.2885585Z // begin inline asm 2026-02-21T08:52:45.2885703Z st.global.v4.b32 [ %rd190 + 0 ], { %r2172, %r2173, %r2174, %r2175 }; 2026-02-21T08:52:45.2885760Z // end inline asm 2026-02-21T08:52:45.2885974Z .loc 1 19 144 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:19:144 2026-02-21T08:52:45.2886047Z add.s32 %r261, %r2270, 4224; 2026-02-21T08:52:45.2886120Z setp.lt.s32 %p56, %r2270, -4168; 2026-02-21T08:52:45.2886181Z mov.b32 %r2270, %r261; 2026-02-21T08:52:45.2886246Z @%p56 bra $L__BB0_11; 2026-02-21T08:52:45.2886340Z $L__BB0_14: // %._crit_edge 2026-02-21T08:52:45.2886652Z .loc 
1 19 4 // cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py:19:4 2026-02-21T08:52:45.2886859Z ret; 2026-02-21T08:52:45.2886918Z $L__tmp9: 2026-02-21T08:52:45.2886977Z $L__func_end0: 2026-02-21T08:52:45.2887067Z // -- End function 2026-02-21T08:52:45.2887127Z } 2026-02-21T08:52:45.2887388Z .file 1 "/tmp/torchinductor_root/dd/cddkixfthec7qwoonpww25mcs5jg5w45czeoabf2s5vnufkqy4g5.py" 2026-02-21T08:52:45.2887607Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T08:52:45.2887682Z .section .debug_abbrev 2026-02-21T08:52:45.2887736Z { 2026-02-21T08:52:45.2887831Z .b8 1 // Abbreviation Code 2026-02-21T08:52:45.2887941Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:52:45.2888153Z .b8 1 // DW_CHILDREN_yes 2026-02-21T08:52:45.2888246Z .b8 37 // DW_AT_producer 2026-02-21T08:52:45.2888328Z .b8 8 // DW_FORM_string 2026-02-21T08:52:45.2888415Z .b8 19 // DW_AT_language 2026-02-21T08:52:45.2888497Z .b8 5 // DW_FORM_data2 2026-02-21T08:52:45.2888578Z .b8 3 // DW_AT_name 2026-02-21T08:52:45.2888670Z .b8 8 // DW_FORM_string 2026-02-21T08:52:45.2888757Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:52:45.2888835Z .b8 6 // DW_FORM_data4 2026-02-21T08:52:45.2888919Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:52:45.2888997Z .b8 8 // DW_FORM_string 2026-02-21T08:52:45.2889074Z .b8 0 // EOM(1) 2026-02-21T08:52:45.2889148Z .b8 0 // EOM(2) 2026-02-21T08:52:45.2889245Z .b8 2 // Abbreviation Code 2026-02-21T08:52:45.2889334Z .b8 46 // DW_TAG_subprogram 2026-02-21T08:52:45.2889415Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:52:45.2889493Z .b8 3 // DW_AT_name 2026-02-21T08:52:45.2889572Z .b8 8 // DW_FORM_string 2026-02-21T08:52:45.2889653Z .b8 32 // DW_AT_inline 2026-02-21T08:52:45.2889737Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:45.2889809Z .b8 0 // EOM(1) 2026-02-21T08:52:45.2889876Z .b8 0 // EOM(2) 2026-02-21T08:52:45.2889961Z .b8 3 // Abbreviation Code 2026-02-21T08:52:45.2890054Z .b8 46 // DW_TAG_subprogram 2026-02-21T08:52:45.2890138Z .b8 1 // 
DW_CHILDREN_yes 2026-02-21T08:52:45.2890219Z .b8 17 // DW_AT_low_pc 2026-02-21T08:52:45.2890300Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:45.2890385Z .b8 18 // DW_AT_high_pc 2026-02-21T08:52:45.2890462Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:45.2890558Z .b8 49 // DW_AT_abstract_origin 2026-02-21T08:52:45.2890637Z .b8 19 // DW_FORM_ref4 2026-02-21T08:52:45.2890708Z .b8 0 // EOM(1) 2026-02-21T08:52:45.2890778Z .b8 0 // EOM(2) 2026-02-21T08:52:45.2890866Z .b8 4 // Abbreviation Code 2026-02-21T08:52:45.2890969Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T08:52:45.2891050Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:52:45.2891148Z .b8 49 // DW_AT_abstract_origin 2026-02-21T08:52:45.2891227Z .b8 19 // DW_FORM_ref4 2026-02-21T08:52:45.2891303Z .b8 17 // DW_AT_low_pc 2026-02-21T08:52:45.2891484Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:45.2891579Z .b8 18 // DW_AT_high_pc 2026-02-21T08:52:45.2891659Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:45.2891748Z .b8 88 // DW_AT_call_file 2026-02-21T08:52:45.2891827Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:45.2891909Z .b8 89 // DW_AT_call_line 2026-02-21T08:52:45.2891987Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:45.2892074Z .b8 87 // DW_AT_call_column 2026-02-21T08:52:45.2892150Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:45.2892328Z .b8 0 // EOM(1) 2026-02-21T08:52:45.2892405Z .b8 0 // EOM(2) 2026-02-21T08:52:45.2892474Z .b8 0 // EOM(3) 2026-02-21T08:52:45.2892529Z } 2026-02-21T08:52:45.2892593Z .section .debug_info 2026-02-21T08:52:45.2892649Z { 2026-02-21T08:52:45.2892737Z .b32 178 // Length of Unit 2026-02-21T08:52:45.2892828Z .b8 2 // DWARF version number 2026-02-21T08:52:45.2892886Z .b8 0 2026-02-21T08:52:45.2893020Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:52:45.2893116Z .b8 8 // Address Size (in bytes) 2026-02-21T08:52:45.2893238Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T08:52:45.2893326Z .b8 116 // DW_AT_producer 2026-02-21T08:52:45.2893379Z .b8 114 2026-02-21T08:52:45.2893433Z .b8 105 2026-02-21T08:52:45.2893502Z .b8 116 2026-02-21T08:52:45.2893558Z .b8 111 2026-02-21T08:52:45.2893611Z .b8 110 2026-02-21T08:52:45.2893666Z .b8 0 2026-02-21T08:52:45.2893746Z .b8 2 // DW_AT_language 2026-02-21T08:52:45.2893802Z .b8 0 2026-02-21T08:52:45.2893880Z .b8 99 // DW_AT_name 2026-02-21T08:52:45.2893937Z .b8 100 2026-02-21T08:52:45.2893990Z .b8 100 2026-02-21T08:52:45.2894042Z .b8 107 2026-02-21T08:52:45.2894100Z .b8 105 2026-02-21T08:52:45.2894152Z .b8 120 2026-02-21T08:52:45.2894204Z .b8 102 2026-02-21T08:52:45.2894258Z .b8 116 2026-02-21T08:52:45.2894316Z .b8 104 2026-02-21T08:52:45.2894368Z .b8 101 2026-02-21T08:52:45.2894421Z .b8 99 2026-02-21T08:52:45.2894477Z .b8 55 2026-02-21T08:52:45.2894530Z .b8 113 2026-02-21T08:52:45.2894583Z .b8 119 2026-02-21T08:52:45.2894636Z .b8 111 2026-02-21T08:52:45.2894694Z .b8 111 2026-02-21T08:52:45.2894748Z .b8 110 2026-02-21T08:52:45.2894800Z .b8 112 2026-02-21T08:52:45.2894853Z .b8 119 2026-02-21T08:52:45.2894909Z .b8 119 2026-02-21T08:52:45.2894962Z .b8 50 2026-02-21T08:52:45.2895016Z .b8 53 2026-02-21T08:52:45.2895074Z .b8 109 2026-02-21T08:52:45.2895127Z .b8 99 2026-02-21T08:52:45.2895179Z .b8 115 2026-02-21T08:52:45.2895234Z .b8 53 2026-02-21T08:52:45.2895291Z .b8 106 2026-02-21T08:52:45.2895343Z .b8 103 2026-02-21T08:52:45.2895394Z .b8 53 2026-02-21T08:52:45.2895452Z .b8 119 2026-02-21T08:52:45.2895507Z .b8 52 2026-02-21T08:52:45.2895557Z .b8 53 2026-02-21T08:52:45.2895608Z .b8 99 2026-02-21T08:52:45.2895663Z .b8 122 2026-02-21T08:52:45.2895715Z .b8 101 2026-02-21T08:52:45.2895766Z .b8 111 2026-02-21T08:52:45.2895822Z .b8 97 2026-02-21T08:52:45.2895875Z .b8 98 2026-02-21T08:52:45.2895927Z .b8 102 2026-02-21T08:52:45.2895978Z .b8 50 
2026-02-21T08:52:45.2896034Z .b8 115 2026-02-21T08:52:45.2896087Z .b8 53 2026-02-21T08:52:45.2896139Z .b8 118 2026-02-21T08:52:45.2896189Z .b8 110 2026-02-21T08:52:45.2896247Z .b8 117 2026-02-21T08:52:45.2896298Z .b8 102 2026-02-21T08:52:45.2896351Z .b8 107 2026-02-21T08:52:45.2896405Z .b8 113 2026-02-21T08:52:45.2896578Z .b8 121 2026-02-21T08:52:45.2896634Z .b8 52 2026-02-21T08:52:45.2896688Z .b8 103 2026-02-21T08:52:45.2896748Z .b8 53 2026-02-21T08:52:45.2896806Z .b8 46 2026-02-21T08:52:45.2896996Z .b8 112 2026-02-21T08:52:45.2897053Z .b8 121 2026-02-21T08:52:45.2897104Z .b8 0 2026-02-21T08:52:45.2897206Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:52:45.2897289Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:52:45.2897347Z .b8 116 2026-02-21T08:52:45.2897399Z .b8 109 2026-02-21T08:52:45.2897451Z .b8 112 2026-02-21T08:52:45.2897506Z .b8 47 2026-02-21T08:52:45.2897558Z .b8 116 2026-02-21T08:52:45.2897609Z .b8 111 2026-02-21T08:52:45.2897662Z .b8 114 2026-02-21T08:52:45.2897720Z .b8 99 2026-02-21T08:52:45.2897772Z .b8 104 2026-02-21T08:52:45.2897824Z .b8 105 2026-02-21T08:52:45.2897875Z .b8 110 2026-02-21T08:52:45.2897942Z .b8 100 2026-02-21T08:52:45.2897998Z .b8 117 2026-02-21T08:52:45.2898050Z .b8 99 2026-02-21T08:52:45.2898110Z .b8 116 2026-02-21T08:52:45.2898308Z .b8 111 2026-02-21T08:52:45.2898368Z .b8 114 2026-02-21T08:52:45.2898419Z .b8 95 2026-02-21T08:52:45.2898477Z .b8 114 2026-02-21T08:52:45.2898529Z .b8 111 2026-02-21T08:52:45.2898586Z .b8 111 2026-02-21T08:52:45.2898642Z .b8 116 2026-02-21T08:52:45.2898693Z .b8 47 2026-02-21T08:52:45.2898745Z .b8 100 2026-02-21T08:52:45.2898799Z .b8 100 2026-02-21T08:52:45.2898855Z .b8 0 2026-02-21T08:52:45.2898968Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T08:52:45.2899046Z .b8 95 // DW_AT_name 2026-02-21T08:52:45.2899105Z .b8 104 2026-02-21T08:52:45.2899158Z .b8 101 2026-02-21T08:52:45.2899210Z .b8 108 2026-02-21T08:52:45.2899262Z .b8 105 2026-02-21T08:52:45.2899319Z .b8 111 
2026-02-21T08:52:45.2899373Z .b8 110 2026-02-21T08:52:45.2899424Z .b8 95 2026-02-21T08:52:45.2899483Z .b8 109 2026-02-21T08:52:45.2899534Z .b8 97 2026-02-21T08:52:45.2899586Z .b8 116 2026-02-21T08:52:45.2899639Z .b8 109 2026-02-21T08:52:45.2899700Z .b8 117 2026-02-21T08:52:45.2899752Z .b8 108 2026-02-21T08:52:45.2899806Z .b8 95 2026-02-21T08:52:45.2899857Z .b8 98 2026-02-21T08:52:45.2899913Z .b8 102 2026-02-21T08:52:45.2899965Z .b8 49 2026-02-21T08:52:45.2900020Z .b8 54 2026-02-21T08:52:45.2900079Z .b8 95 2026-02-21T08:52:45.2900132Z .b8 105 2026-02-21T08:52:45.2900183Z .b8 110 2026-02-21T08:52:45.2900235Z .b8 116 2026-02-21T08:52:45.2900292Z .b8 52 2026-02-21T08:52:45.2900344Z .b8 0 2026-02-21T08:52:45.2900423Z .b8 1 // DW_AT_inline 2026-02-21T08:52:45.2900534Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T08:52:45.2900630Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T08:52:45.2900726Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T08:52:45.2900831Z .b32 108 // DW_AT_abstract_origin 2026-02-21T08:52:45.2900972Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T08:52:45.2901071Z .b32 108 // DW_AT_abstract_origin 2026-02-21T08:52:45.2901159Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T08:52:45.2901256Z .b64 $L__tmp8 // DW_AT_high_pc 2026-02-21T08:52:45.2901338Z .b8 1 // DW_AT_call_file 2026-02-21T08:52:45.2901420Z .b8 84 // DW_AT_call_line 2026-02-21T08:52:45.2901511Z .b8 40 // DW_AT_call_column 2026-02-21T08:52:45.2901600Z .b8 0 // End Of Children Mark 2026-02-21T08:52:45.2901685Z .b8 0 // End Of Children Mark 2026-02-21T08:52:45.2901746Z } 2026-02-21T08:52:45.2901827Z .section .debug_macinfo { } 2026-02-21T08:52:45.2901833Z 2026-02-21T08:52:45.2901915Z ================================================================ 2026-02-21T08:52:45.2902032Z please share the reproducer above with Triton project. 
2026-02-21T08:52:45.7177299Z 2026-02-21T08:52:45.7177314Z 2026-02-21T08:52:45.7177319Z 2026-02-21T08:52:45.7177585Z ================================================================ 2026-02-21T08:52:45.7178323Z Internal Triton PTX codegen error 2026-02-21T08:52:45.7178620Z `ptxas` stderr: 2026-02-21T08:52:45.7179366Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 278 in function _helion_matmul_bf16_int4. Try to compile with register target of 38 or higher. 2026-02-21T08:52:45.7180191Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:52:45.7180428Z 2026-02-21T08:52:45.7181094Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpjds5ezvl.ptx -o /tmp/tmpjds5ezvl.ptx.o 2026-02-21T08:52:45.7181846Z 2026-02-21T08:52:45.7181851Z 2026-02-21T08:52:45.7181922Z // 2026-02-21T08:52:45.7182115Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:52:45.7182357Z // 2026-02-21T08:52:45.7182794Z 2026-02-21T08:52:45.7182884Z .version 8.7 2026-02-21T08:52:45.7183071Z .target sm_90a 2026-02-21T08:52:45.7183258Z .address_size 64 2026-02-21T08:52:45.7183375Z 2026-02-21T08:52:45.7183621Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T08:52:45.7184076Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:52:45.7184411Z // @_helion_matmul_bf16_int4 2026-02-21T08:52:45.7184755Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T08:52:45.7185158Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T08:52:45.7185552Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T08:52:45.7185931Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T08:52:45.7186320Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T08:52:45.7186882Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 
2026-02-21T08:52:45.7187191Z ) 2026-02-21T08:52:45.7187330Z .reqntid 1024 2026-02-21T08:52:45.7187489Z .maxnreg 32 2026-02-21T08:52:45.7187631Z { 2026-02-21T08:52:45.7187788Z .reg .pred %p<29>; 2026-02-21T08:52:45.7187966Z .reg .b16 %rs<109>; 2026-02-21T08:52:45.7188142Z .reg .b32 %r<836>; 2026-02-21T08:52:45.7188304Z .reg .b64 %rd<109>; 2026-02-21T08:52:45.7188750Z .loc 1 14 0 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:14:0 2026-02-21T08:52:45.7189152Z $L__func_begin0: 2026-02-21T08:52:45.7189467Z .loc 1 14 0 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:14:0 2026-02-21T08:52:45.7189787Z 2026-02-21T08:52:45.7189853Z // %bb.0: 2026-02-21T08:52:45.7190076Z ld.param.b64 %rd32, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T08:52:45.7190409Z ld.param.b64 %rd31, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T08:52:45.7190726Z ld.param.b64 %rd30, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T08:52:45.7190990Z $L__tmp0: 2026-02-21T08:52:45.7191314Z .loc 1 19 46 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:19:46 2026-02-21T08:52:45.7191709Z mov.u32 %r790, %ctaid.x; 2026-02-21T08:52:45.7192053Z .loc 1 0 0 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:0 2026-02-21T08:52:45.7192428Z sub.s32 %r143, 4279, %r790; 2026-02-21T08:52:45.7192792Z .loc 1 19 144 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:19:144 2026-02-21T08:52:45.7193192Z mul.hi.u32 %r144, %r143, 1041204193; 2026-02-21T08:52:45.7193419Z shr.u32 %r145, %r144, 10; 2026-02-21T08:52:45.7193613Z and.b32 %r146, %r145, 1048572; 2026-02-21T08:52:45.7193815Z mad.lo.s32 %r827, %r146, 4224, %r790; 2026-02-21T08:52:45.7194193Z .loc 1 31 45 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:31:45 2026-02-21T08:52:45.7194579Z mov.u32 %r3, %tid.x; 2026-02-21T08:52:45.7194760Z shr.u32 %r4, %r3, 5; 2026-02-21T08:52:45.7194929Z and.b32 %r5, %r3, 1008; 2026-02-21T08:52:45.7195108Z shr.u32 %r6, %r3, 4; 2026-02-21T08:52:45.7195435Z .loc 1 33 45 // 
c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:33:45 2026-02-21T08:52:45.7195976Z and.b32 %r7, %r3, 15; 2026-02-21T08:52:45.7196148Z shl.b32 %r8, %r7, 3; 2026-02-21T08:52:45.7196302Z and.b32 %r9, %r3, 127; 2026-02-21T08:52:45.7196761Z .loc 1 41 48 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:41:48 2026-02-21T08:52:45.7197132Z and.b32 %r10, %r3, 896; 2026-02-21T08:52:45.7197304Z shr.u32 %r11, %r3, 7; 2026-02-21T08:52:45.7197604Z .loc 1 65 38 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:65:38 2026-02-21T08:52:45.7197954Z and.b32 %r12, %r3, 128; 2026-02-21T08:52:45.7198274Z .loc 1 19 144 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:19:144 2026-02-21T08:52:45.7198643Z setp.ge.s32 %p1, %r790, %r827; 2026-02-21T08:52:45.7198996Z shl.b32 %r773, %r3, 1; 2026-02-21T08:52:45.7199171Z and.b32 %r774, %r3, 64; 2026-02-21T08:52:45.7199360Z bfe.s32 %r775, %r3, 6, 1; 2026-02-21T08:52:45.7199544Z mov.b32 %r776, global_smem; 2026-02-21T08:52:45.7199728Z shl.b32 %r777, %r3, 4; 2026-02-21T08:52:45.7199897Z shl.b32 %r778, %r3, 3; 2026-02-21T08:52:45.7200064Z and.b32 %r779, %r3, 16; 2026-02-21T08:52:45.7200236Z bfe.s32 %r780, %r3, 4, 1; 2026-02-21T08:52:45.7200402Z shl.b32 %r781, %r3, 2; 2026-02-21T08:52:45.7200570Z and.b32 %r782, %r3, 384; 2026-02-21T08:52:45.7200750Z and.b32 %r783, %r6, 2; 2026-02-21T08:52:45.7200916Z setp.gt.u32 %p27, %r3, 511; 2026-02-21T08:52:45.7201099Z shr.u32 %r784, %r3, 1; 2026-02-21T08:52:45.7201275Z shl.b32 %r785, %r9, 6; 2026-02-21T08:52:45.7201445Z shr.u32 %r786, %r10, 5; 2026-02-21T08:52:45.7201605Z and.b32 %r787, %r3, 7; 2026-02-21T08:52:45.7201768Z shl.b32 %r788, %r7, 4; 2026-02-21T08:52:45.7201924Z shl.b32 %r789, %r3, 7; 2026-02-21T08:52:45.7202094Z setp.eq.b32 %p28, %r12, 0; 2026-02-21T08:52:45.7202280Z @%p1 bra $L__BB0_11; 2026-02-21T08:52:45.7202468Z // %bb.1: // %.lr.ph 2026-02-21T08:52:45.7202867Z .loc 1 0 144 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:0:144 
2026-02-21T08:52:45.7203235Z and.b32 %r148, %r773, 1918; 2026-02-21T08:52:45.7203742Z [82s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:52:45.7205304Z Config: @helion.kernel(config=helion.Config(block_sizes=[8, 64, 128], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[2], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=32, num_stages=7, num_warps=32, pid_type='persistent_interleaved', range_flattens=[False, True], range_multi_buffers=[False, False], range_num_stages=[3, 3], range_unroll_factors=[4, 0], range_warp_specializes=[]), static_shapes=True) 2026-02-21T08:52:45.7206940Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:52:45.7207240Z `ptxas` stderr: 2026-02-21T08:52:45.7207794Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 278 in function _helion_matmul_bf16_int4. Try to compile with register target of 38 or higher. 2026-02-21T08:52:45.7208447Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:52:45.7208644Z 2026-02-21T08:52:45.7209174Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpjds5ezvl.ptx -o /tmp/tmpjds5ezvl.ptx.o 2026-02-21T08:52:45.7209763Z 2026-02-21T08:52:45.7209916Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:52:45.7210222Z and.b32 %r151, %r775, 136; 2026-02-21T08:52:45.7210406Z xor.b32 %r152, %r151, %r148; 2026-02-21T08:52:45.7210592Z add.s32 %r13, %r776, %r152; 2026-02-21T08:52:45.7210771Z and.b32 %r155, %r777, 1536; 2026-02-21T08:52:45.7210943Z and.b32 %r157, %r778, 96; 2026-02-21T08:52:45.7211135Z and.b32 %r158, %r773, 6; 2026-02-21T08:52:45.7211313Z and.b32 %r161, %r780, 136; 2026-02-21T08:52:45.7211494Z or.b32 %r162, %r155, %r157; 2026-02-21T08:52:45.7211663Z or.b32 %r163, %r162, %r158; 2026-02-21T08:52:45.7211999Z or.b32 %r164, %r163, %r161; 2026-02-21T08:52:45.7212188Z add.s32 %r14, %r776, %r164; 2026-02-21T08:52:45.7212369Z xor.b32 %r165, %r164, 8; 2026-02-21T08:52:45.7212536Z add.s32 %r15, %r776, %r165; 2026-02-21T08:52:45.7212714Z and.b32 %r167, %r781, 124; 2026-02-21T08:52:45.7212887Z shl.b32 %r170, %r774, 3; 2026-02-21T08:52:45.7213055Z selp.b32 %r171, 1, 0, %p27; 2026-02-21T08:52:45.7226797Z add.s32 %r172, %r776, %r782; 2026-02-21T08:52:45.7227092Z add.s32 %r173, %r172, %r171; 2026-02-21T08:52:45.7227301Z add.s32 %r174, %r173, %r170; 2026-02-21T08:52:45.7227495Z add.s32 %r175, %r174, %r783; 2026-02-21T08:52:45.7227683Z add.s32 %r16, %r175, %r167; 2026-02-21T08:52:45.7227877Z and.b32 %r177, %r784, 384; 2026-02-21T08:52:45.7228060Z add.s32 %r178, %r776, %r783; 2026-02-21T08:52:45.7228477Z add.s32 %r179, %r178, %r177; 2026-02-21T08:52:45.7228775Z add.s32 %r180, %r179, %r167; 2026-02-21T08:52:45.7228972Z add.s32 %r17, %r180, %r170; 2026-02-21T08:52:45.7229160Z and.b32 %r182, %r778, 48; 2026-02-21T08:52:45.7229346Z xor.b32 %r184, %r182, %r786; 2026-02-21T08:52:45.7229519Z or.b32 %r185, %r184, %r785; 2026-02-21T08:52:45.7229703Z add.s32 %r18, %r776, %r185; 2026-02-21T08:52:45.7229874Z xor.b32 %r186, %r185, 32; 2026-02-21T08:52:45.7230049Z add.s32 %r19, %r776, %r186; 2026-02-21T08:52:45.7230229Z shl.b32 %r188, %r787, 11; 2026-02-21T08:52:45.7230397Z and.b32 %r190, %r3, 96; 2026-02-21T08:52:45.7230572Z shl.b32 %r191, %r190, 3; 
2026-02-21T08:52:45.7230743Z shr.u32 %r192, %r782, 2; 2026-02-21T08:52:45.7230924Z selp.b32 %r193, 1024, 0, %p27; 2026-02-21T08:52:45.7231112Z or.b32 %r194, %r788, %r191; 2026-02-21T08:52:45.7231293Z or.b32 %r195, %r192, %r779; 2026-02-21T08:52:45.7231469Z xor.b32 %r196, %r194, %r195; 2026-02-21T08:52:45.7231655Z add.s32 %r197, %r776, %r188; 2026-02-21T08:52:45.7231849Z add.s32 %r198, %r197, %r193; 2026-02-21T08:52:45.7232026Z add.s32 %r20, %r198, %r196; 2026-02-21T08:52:45.7232204Z and.b32 %r200, %r789, 15360; 2026-02-21T08:52:45.7232385Z shl.b32 %r201, %r787, 4; 2026-02-21T08:52:45.7232561Z xor.b32 %r202, %r201, %r5; 2026-02-21T08:52:45.7232734Z add.s32 %r203, %r776, %r200; 2026-02-21T08:52:45.7232914Z add.s32 %r21, %r203, %r202; 2026-02-21T08:52:45.7233086Z shl.b32 %r204, %r190, 4; 2026-02-21T08:52:45.7233258Z or.b32 %r205, %r204, %r157; 2026-02-21T08:52:45.7233425Z or.b32 %r206, %r205, %r158; 2026-02-21T08:52:45.7233606Z or.b32 %r207, %r206, %r161; 2026-02-21T08:52:45.7233780Z add.s32 %r22, %r776, %r207; 2026-02-21T08:52:45.7233947Z xor.b32 %r208, %r207, 8; 2026-02-21T08:52:45.7234116Z add.s32 %r23, %r776, %r208; 2026-02-21T08:52:45.7234301Z add.s32 %r209, %r776, %r170; 2026-02-21T08:52:45.7234484Z add.s32 %r210, %r209, %r171; 2026-02-21T08:52:45.7234658Z add.s32 %r211, %r210, %r783; 2026-02-21T08:52:45.7234841Z add.s32 %r212, %r211, %r782; 2026-02-21T08:52:45.7235020Z add.s32 %r24, %r212, %r167; 2026-02-21T08:52:45.7235380Z .loc 1 19 144 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:19:144 2026-02-21T08:52:45.7235786Z mul.lo.s32 %r213, %r11, 7168; 2026-02-21T08:52:45.7236142Z .loc 1 40 103 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:40:103 2026-02-21T08:52:45.7236661Z or.b32 %r25, %r213, %r9; 2026-02-21T08:52:45.7236991Z .loc 1 19 144 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:19:144 2026-02-21T08:52:45.7237354Z shl.b32 %r214, %r6, 13; 2026-02-21T08:52:45.7237672Z .loc 1 40 103 // 
c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:40:103 2026-02-21T08:52:45.7238031Z or.b32 %r26, %r214, %r7; 2026-02-21T08:52:45.7238262Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T08:52:45.7238573Z // Child Loop BB0_3 Depth 2 2026-02-21T08:52:45.7238861Z // Child Loop BB0_5 Depth 2 2026-02-21T08:52:45.7239136Z // Child Loop BB0_7 Depth 2 2026-02-21T08:52:45.7239575Z // Child Loop BB0_9 Depth 2 2026-02-21T08:52:45.7239975Z .loc 1 25 35 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:25:35 2026-02-21T08:52:45.7240358Z mul.hi.s32 %r216, %r790, -1840700269; 2026-02-21T08:52:45.7240573Z add.s32 %r217, %r216, %r790; 2026-02-21T08:52:45.7240760Z shr.u32 %r218, %r217, 31; 2026-02-21T08:52:45.7240941Z shr.s32 %r219, %r217, 6; 2026-02-21T08:52:45.7241115Z add.s32 %r220, %r219, %r218; 2026-02-21T08:52:45.7241452Z .loc 1 26 33 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:26:33 2026-02-21T08:52:45.7241809Z shl.b32 %r221, %r220, 1; 2026-02-21T08:52:45.7242268Z .loc 1 27 39 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:27:39 2026-02-21T08:52:45.7242647Z sub.s32 %r222, 1, %r221; 2026-02-21T08:52:45.7242957Z .loc 1 27 52 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:27:52 2026-02-21T08:52:45.7243313Z min.s32 %r223, %r222, 2; 2026-02-21T08:52:45.7243619Z .loc 1 28 45 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:28:45 2026-02-21T08:52:45.7243975Z mul.lo.s32 %r224, %r220, 112; 2026-02-21T08:52:45.7244169Z sub.s32 %r225, %r790, %r224; 2026-02-21T08:52:45.7244491Z .loc 1 29 51 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:29:51 2026-02-21T08:52:45.7244855Z div.s32 %r226, %r225, %r223; 2026-02-21T08:52:45.7245167Z .loc 1 28 64 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:28:64 2026-02-21T08:52:45.7245524Z mul.lo.s32 %r227, %r226, %r223; 2026-02-21T08:52:45.7245715Z sub.s32 %r228, %r225, %r227; 2026-02-21T08:52:45.7246032Z .loc 1 28 30 // 
c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:28:30 2026-02-21T08:52:45.7246384Z add.s32 %r229, %r228, %r221; 2026-02-21T08:52:45.7246881Z .loc 1 30 27 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:30:27 2026-02-21T08:52:45.7247247Z shl.b32 %r230, %r229, 6; 2026-02-21T08:52:45.7247555Z .loc 1 31 32 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:31:32 2026-02-21T08:52:45.7247909Z or.b32 %r38, %r230, %r6; 2026-02-21T08:52:45.7248219Z .loc 1 32 27 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:32:27 2026-02-21T08:52:45.7248594Z shl.b32 %r39, %r226, 7; 2026-02-21T08:52:45.7248926Z .loc 1 40 103 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:40:103 2026-02-21T08:52:45.7249305Z add.s32 %r791, %r25, %r39; 2026-02-21T08:52:45.7249491Z shl.b32 %r231, %r229, 19; 2026-02-21T08:52:45.7249668Z or.b32 %r232, %r26, %r231; 2026-02-21T08:52:45.7249856Z mad.wide.s32 %rd98, %r232, 2, %rd30; 2026-02-21T08:52:45.7250061Z mov.b32 %r792, 0f00000000; 2026-02-21T08:52:45.7250246Z mov.b64 %rd99, -8; 2026-02-21T08:52:45.7250410Z mov.b32 %r793, %r792; 2026-02-21T08:52:45.7250590Z mov.b32 %r794, %r792; 2026-02-21T08:52:45.7250753Z mov.b32 %r795, %r792; 2026-02-21T08:52:45.7250916Z mov.b32 %r796, %r792; 2026-02-21T08:52:45.7251071Z mov.b32 %r797, %r792; 2026-02-21T08:52:45.7251234Z mov.b32 %r798, %r792; 2026-02-21T08:52:45.7251401Z mov.b32 %r799, %r792; 2026-02-21T08:52:45.7251621Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:45.7251927Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:45.7252328Z .loc 1 48 80 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:48:80 2026-02-21T08:52:45.7252695Z // begin inline asm 2026-02-21T08:52:45.7252858Z mov.u64 %rd34, 0x0; 2026-02-21T08:52:45.7253113Z createpolicy.fractional.L2::evict_first.b64 %rd34, 1.0; 2026-02-21T08:52:45.7253378Z // end inline asm 2026-02-21T08:52:45.7253544Z // begin inline asm 2026-02-21T08:52:45.7253891Z mov.u16 %rs1, 0x0; 
2026-02-21T08:52:45.7254145Z ld.global.L1::evict_first.L2::cache_hint.b16 { %rs1 }, [ %rd98 + 0 ], %rd34; 2026-02-21T08:52:45.7254456Z // end inline asm 2026-02-21T08:52:45.7254760Z .loc 1 52 32 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:52:32 2026-02-21T08:52:45.7255124Z bar.sync 0; 2026-02-21T08:52:45.7255278Z st.shared.b16 [%r13], %rs1; 2026-02-21T08:52:45.7255462Z bar.sync 0; 2026-02-21T08:52:45.7255611Z ld.shared.b16 %rs3, [%r14]; 2026-02-21T08:52:45.7255809Z ld.shared.b16 %rs4, [%r14+256]; 2026-02-21T08:52:45.7256017Z ld.shared.b16 %rs5, [%r14+16]; 2026-02-21T08:52:45.7256228Z ld.shared.b16 %rs6, [%r14+272]; 2026-02-21T08:52:45.7256429Z ld.shared.b16 %rs7, [%r15]; 2026-02-21T08:52:45.7256881Z ld.shared.b16 %rs8, [%r15+256]; 2026-02-21T08:52:45.7257094Z ld.shared.b16 %rs9, [%r15+16]; 2026-02-21T08:52:45.7257286Z ld.shared.b16 %rs10, [%r15+272]; 2026-02-21T08:52:45.7257488Z cvt.f32.bf16 %r249, %rs3; 2026-02-21T08:52:45.7257672Z cvt.f32.bf16 %r250, %rs4; 2026-02-21T08:52:45.7257853Z cvt.f32.bf16 %r251, %rs7; 2026-02-21T08:52:45.7258031Z cvt.f32.bf16 %r252, %rs8; 2026-02-21T08:52:45.7258202Z cvt.f32.bf16 %r269, %rs5; 2026-02-21T08:52:45.7258382Z cvt.f32.bf16 %r270, %rs6; 2026-02-21T08:52:45.7258552Z cvt.f32.bf16 %r271, %rs9; 2026-02-21T08:52:45.7258731Z cvt.f32.bf16 %r272, %rs10; 2026-02-21T08:52:45.7259063Z .loc 1 54 34 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:54:34 2026-02-21T08:52:45.7259429Z cvt.s64.s32 %rd42, %r791; 2026-02-21T08:52:45.7259603Z add.s64 %rd38, %rd31, %rd42; 2026-02-21T08:52:45.7259948Z .loc 1 54 87 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:54:87 2026-02-21T08:52:45.7260324Z // begin inline asm 2026-02-21T08:52:45.7260487Z mov.u64 %rd37, 0x0; 2026-02-21T08:52:45.7260720Z createpolicy.fractional.L2::evict_last.b64 %rd37, 1.0; 2026-02-21T08:52:45.7260979Z // end inline asm 2026-02-21T08:52:45.7261147Z // begin inline asm 2026-02-21T08:52:45.7261314Z mov.u16 %rs2, 0x0; 
2026-02-21T08:52:45.7261575Z ld.global.L1::evict_last.L2::cache_hint.b8 { %rs2 }, [ %rd38 + 0 ], %rd37; 2026-02-21T08:52:45.7261872Z // end inline asm 2026-02-21T08:52:45.7262183Z .loc 1 62 28 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:62:28 2026-02-21T08:52:45.7262547Z bar.sync 0; 2026-02-21T08:52:45.7262711Z st.shared.b8 [%r16], %rs2; 2026-02-21T08:52:45.7262898Z bar.sync 0; 2026-02-21T08:52:45.7263062Z ld.shared.v2.b8 {%rs11, %rs12}, [%r17]; 2026-02-21T08:52:45.7263424Z .loc 1 57 28 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:57:28 2026-02-21T08:52:45.7263778Z shl.b16 %rs13, %rs11, 4; 2026-02-21T08:52:45.7263968Z shl.b16 %rs14, %rs12, 4; 2026-02-21T08:52:45.7264279Z .loc 1 72 58 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:72:58 2026-02-21T08:52:45.7264653Z selp.b16 %rs15, %rs13, %rs11, %p28; 2026-02-21T08:52:45.7264869Z cvt.s16.s8 %rs16, %rs15; 2026-02-21T08:52:45.7265043Z shr.s16 %rs17, %rs16, 4; 2026-02-21T08:52:45.7265225Z selp.b16 %rs18, %rs14, %rs12, %p28; 2026-02-21T08:52:45.7265424Z cvt.s16.s8 %rs19, %rs18; 2026-02-21T08:52:45.7265588Z shr.s16 %rs20, %rs19, 4; 2026-02-21T08:52:45.7265894Z .loc 1 77 32 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:77:32 2026-02-21T08:52:45.7266251Z cvt.rn.f32.s16 %r295, %rs17; 2026-02-21T08:52:45.7266431Z cvt.rn.f32.s16 %r296, %rs20; 2026-02-21T08:52:45.7266741Z bar.sync 0; 2026-02-21T08:52:45.7266895Z st.shared.b32 [%r18], %r295; 2026-02-21T08:52:45.7267082Z st.shared.b32 [%r19], %r296; 2026-02-21T08:52:45.7267269Z $L__tmp1: 2026-02-21T08:52:45.7267634Z .loc 2 291 36 // standard.py:291:36 @[ c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:84:40 ] 2026-02-21T08:52:45.7268064Z // begin inline asm 2026-02-21T08:52:45.7268255Z fence.proxy.async.shared::cta; 2026-02-21T08:52:45.7268689Z // end inline asm 2026-02-21T08:52:45.7268853Z bar.sync 0; 2026-02-21T08:52:45.7269022Z shfl.sync.idx.b32 %r297, %r4, 0, 31, -1; 2026-02-21T08:52:45.7269259Z 
wgmma.fence.sync.aligned; 2026-02-21T08:52:45.7269446Z shl.b32 %r298, %r297, 8; 2026-02-21T08:52:45.7269624Z and.b32 %r299, %r298, 7168; 2026-02-21T08:52:45.7269805Z add.s32 %r300, %r299, %r776; 2026-02-21T08:52:45.7269992Z bfe.u32 %r301, %r300, 4, 14; 2026-02-21T08:52:45.7270169Z cvt.u64.u32 %rd43, %r301; 2026-02-21T08:52:45.7270370Z or.b64 %rd40, %rd43, -9223371899382267904; 2026-02-21T08:52:45.7270593Z mov.pred %p3, -1; 2026-02-21T08:52:45.7270752Z // begin inline asm 2026-02-21T08:52:45.7271351Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r792,%r793,%r794,%r795,%r796,%r797,%r798,%r799}, {%r249,%r250,%r251,%r252}, %rd40, %p3, 1, 1; 2026-02-21T08:52:45.7271862Z // end inline asm 2026-02-21T08:52:45.7272018Z add.s32 %r302, %r300, 32; 2026-02-21T08:52:45.7272189Z bfe.u32 %r303, %r302, 4, 14; 2026-02-21T08:52:45.7272373Z cvt.u64.u32 %rd44, %r303; 2026-02-21T08:52:45.7272557Z or.b64 %rd41, %rd44, -9223371899382267904; 2026-02-21T08:52:45.7272770Z // begin inline asm 2026-02-21T08:52:45.7273209Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r792,%r793,%r794,%r795,%r796,%r797,%r798,%r799}, {%r269,%r270,%r271,%r272}, %rd41, %p3, 1, 1; 2026-02-21T08:52:45.7273687Z // end inline asm 2026-02-21T08:52:45.7273859Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:45.7274055Z mov.b32 %r283, 0; 2026-02-21T08:52:45.7274210Z mov.b32 %r281, %r776; 2026-02-21T08:52:45.7274371Z mov.b32 %r282, %r283; 2026-02-21T08:52:45.7274538Z // begin inline asm 2026-02-21T08:52:45.7274789Z // wait for regs: %r792,%r793,%r794,%r795,%r796,%r797,%r798,%r799,%r281,%r282,%r283 2026-02-21T08:52:45.7275141Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:45.7275344Z // end inline asm 2026-02-21T08:52:45.7275489Z $L__tmp2: 2026-02-21T08:52:45.7275796Z .loc 1 40 103 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:40:103 2026-02-21T08:52:45.7276172Z add.s64 %rd99, %rd99, 8; 2026-02-21T08:52:45.7276350Z add.s32 %r791, %r791, 57344; 2026-02-21T08:52:45.7276657Z add.s64 %rd98, 
%rd98, 32; 2026-02-21T08:52:45.7276853Z setp.lt.u64 %p6, %rd99, 4088; 2026-02-21T08:52:45.7277043Z @%p6 bra $L__BB0_3; 2026-02-21T08:52:45.7277261Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:45.7277686Z .loc 1 33 32 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:33:32 2026-02-21T08:52:45.7278051Z or.b32 %r309, %r39, %r8; 2026-02-21T08:52:45.7278374Z .loc 1 87 28 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:87:28 2026-02-21T08:52:45.7278740Z cvt.rn.bf16x2.f32 %r310, %r793, %r792; 2026-02-21T08:52:45.7278966Z cvt.rn.bf16x2.f32 %r311, %r795, %r794; 2026-02-21T08:52:45.7279177Z cvt.rn.bf16x2.f32 %r312, %r797, %r796; 2026-02-21T08:52:45.7279390Z cvt.rn.bf16x2.f32 %r313, %r799, %r798; 2026-02-21T08:52:45.7279735Z .loc 1 88 50 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:88:50 2026-02-21T08:52:45.7280093Z mad.lo.s32 %r314, %r38, 7168, %r309; 2026-02-21T08:52:45.7280441Z .loc 1 88 22 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:88:22 2026-02-21T08:52:45.7280801Z mad.wide.s32 %rd45, %r314, 2, %rd32; 2026-02-21T08:52:45.7281147Z .loc 1 88 81 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:88:81 2026-02-21T08:52:45.7281495Z bar.sync 0; 2026-02-21T08:52:45.7281786Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r20], {%r310, %r311, %r312, %r313}; 2026-02-21T08:52:45.7282116Z bar.sync 0; 2026-02-21T08:52:45.7282308Z ld.shared.v4.b32 {%r304, %r305, %r306, %r307}, [%r21]; 2026-02-21T08:52:45.7282574Z // begin inline asm 2026-02-21T08:52:45.7282794Z st.global.v4.b32 [ %rd45 + 0 ], { %r304, %r305, %r306, %r307 }; 2026-02-21T08:52:45.7283066Z // end inline asm 2026-02-21T08:52:45.7283518Z .loc 1 19 144 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:19:144 2026-02-21T08:52:45.7283896Z add.s32 %r315, %r790, 4224; 2026-02-21T08:52:45.7284222Z .loc 1 25 35 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:25:35 2026-02-21T08:52:45.7284595Z mul.hi.s32 %r316, %r315, -1840700269; 
2026-02-21T08:52:45.7284816Z add.s32 %r317, %r316, %r315; 2026-02-21T08:52:45.7284997Z shr.u32 %r318, %r317, 31; 2026-02-21T08:52:45.7285176Z shr.s32 %r319, %r317, 6; 2026-02-21T08:52:45.7285349Z add.s32 %r320, %r319, %r318; 2026-02-21T08:52:45.7285673Z .loc 1 26 33 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:26:33 2026-02-21T08:52:45.7286025Z shl.b32 %r321, %r320, 1; 2026-02-21T08:52:45.7286606Z .loc 1 27 39 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:27:39 2026-02-21T08:52:45.7287016Z sub.s32 %r322, 1, %r321; 2026-02-21T08:52:45.7287351Z .loc 1 27 52 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:27:52 2026-02-21T08:52:45.7287715Z min.s32 %r323, %r322, 2; 2026-02-21T08:52:45.7288030Z .loc 1 28 45 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:28:45 2026-02-21T08:52:45.7288388Z mul.lo.s32 %r324, %r320, 112; 2026-02-21T08:52:45.7288572Z sub.s32 %r325, %r315, %r324; 2026-02-21T08:52:45.7288897Z .loc 1 29 51 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:29:51 2026-02-21T08:52:45.7289254Z div.s32 %r326, %r325, %r323; 2026-02-21T08:52:45.7289577Z .loc 1 28 64 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:28:64 2026-02-21T08:52:45.7289941Z mul.lo.s32 %r327, %r326, %r323; 2026-02-21T08:52:45.7290132Z sub.s32 %r328, %r325, %r327; 2026-02-21T08:52:45.7290452Z .loc 1 28 30 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:28:30 2026-02-21T08:52:45.7290805Z add.s32 %r329, %r328, %r321; 2026-02-21T08:52:45.7291124Z .loc 1 30 27 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:30:27 2026-02-21T08:52:45.7291476Z shl.b32 %r330, %r329, 6; 2026-02-21T08:52:45.7291782Z .loc 1 31 32 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:31:32 2026-02-21T08:52:45.7292144Z or.b32 %r59, %r330, %r6; 2026-02-21T08:52:45.7292453Z .loc 1 32 27 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:32:27 2026-02-21T08:52:45.7292807Z shl.b32 %r60, %r326, 7; 2026-02-21T08:52:45.7293125Z 
.loc 1 40 103 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:40:103 2026-02-21T08:52:45.7293495Z add.s32 %r800, %r25, %r60; 2026-02-21T08:52:45.7293682Z shl.b32 %r331, %r329, 19; 2026-02-21T08:52:45.7293855Z or.b32 %r332, %r26, %r331; 2026-02-21T08:52:45.7294044Z mad.wide.s32 %rd100, %r332, 2, %rd30; 2026-02-21T08:52:45.7294247Z mov.b32 %r801, 0f00000000; 2026-02-21T08:52:45.7294430Z mov.b64 %rd101, -8; 2026-02-21T08:52:45.7294592Z mov.b32 %r802, %r801; 2026-02-21T08:52:45.7294779Z mov.b32 %r803, %r801; 2026-02-21T08:52:45.7294940Z mov.b32 %r804, %r801; 2026-02-21T08:52:45.7295106Z mov.b32 %r805, %r801; 2026-02-21T08:52:45.7295261Z mov.b32 %r806, %r801; 2026-02-21T08:52:45.7295427Z mov.b32 %r807, %r801; 2026-02-21T08:52:45.7295593Z mov.b32 %r808, %r801; 2026-02-21T08:52:45.7295802Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:45.7296110Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:45.7296640Z .loc 1 48 80 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:48:80 2026-02-21T08:52:45.7297007Z // begin inline asm 2026-02-21T08:52:45.7297168Z mov.u64 %rd47, 0x0; 2026-02-21T08:52:45.7297397Z createpolicy.fractional.L2::evict_first.b64 %rd47, 1.0; 2026-02-21T08:52:45.7297668Z // end inline asm 2026-02-21T08:52:45.7297969Z // begin inline asm 2026-02-21T08:52:45.7298128Z mov.u16 %rs21, 0x0; 2026-02-21T08:52:45.7298385Z ld.global.L1::evict_first.L2::cache_hint.b16 { %rs21 }, [ %rd100 + 0 ], %rd47; 2026-02-21T08:52:45.7298695Z // end inline asm 2026-02-21T08:52:45.7298989Z .loc 1 52 32 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:52:32 2026-02-21T08:52:45.7299356Z bar.sync 0; 2026-02-21T08:52:45.7299513Z st.shared.b16 [%r13], %rs21; 2026-02-21T08:52:45.7299697Z bar.sync 0; 2026-02-21T08:52:45.7299849Z ld.shared.b16 %rs23, [%r22]; 2026-02-21T08:52:45.7300035Z ld.shared.b16 %rs24, [%r22+256]; 2026-02-21T08:52:45.7300239Z ld.shared.b16 %rs25, [%r22+16]; 2026-02-21T08:52:45.7300433Z ld.shared.b16 %rs26, [%r22+272]; 
2026-02-21T08:52:45.7300634Z ld.shared.b16 %rs27, [%r23]; 2026-02-21T08:52:45.7300948Z ld.shared.b16 %rs28, [%r23+256]; 2026-02-21T08:52:45.7301150Z ld.shared.b16 %rs29, [%r23+16]; 2026-02-21T08:52:45.7301340Z ld.shared.b16 %rs30, [%r23+272]; 2026-02-21T08:52:45.7301539Z cvt.f32.bf16 %r349, %rs23; 2026-02-21T08:52:45.7301736Z cvt.f32.bf16 %r350, %rs24; 2026-02-21T08:52:45.7301916Z cvt.f32.bf16 %r351, %rs27; 2026-02-21T08:52:45.7302093Z cvt.f32.bf16 %r352, %rs28; 2026-02-21T08:52:45.7302263Z cvt.f32.bf16 %r369, %rs25; 2026-02-21T08:52:45.7302439Z cvt.f32.bf16 %r370, %rs26; 2026-02-21T08:52:45.7302608Z cvt.f32.bf16 %r371, %rs29; 2026-02-21T08:52:45.7302782Z cvt.f32.bf16 %r372, %rs30; 2026-02-21T08:52:45.7303096Z .loc 1 54 34 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:54:34 2026-02-21T08:52:45.7303459Z cvt.s64.s32 %rd55, %r800; 2026-02-21T08:52:45.7303634Z add.s64 %rd51, %rd31, %rd55; 2026-02-21T08:52:45.7303957Z .loc 1 54 87 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:54:87 2026-02-21T08:52:45.7304311Z // begin inline asm 2026-02-21T08:52:45.7304465Z mov.u64 %rd50, 0x0; 2026-02-21T08:52:45.7304686Z createpolicy.fractional.L2::evict_last.b64 %rd50, 1.0; 2026-02-21T08:52:45.7304948Z // end inline asm 2026-02-21T08:52:45.7305106Z // begin inline asm 2026-02-21T08:52:45.7305270Z mov.u16 %rs22, 0x0; 2026-02-21T08:52:45.7305527Z ld.global.L1::evict_last.L2::cache_hint.b8 { %rs22 }, [ %rd51 + 0 ], %rd50; 2026-02-21T08:52:45.7305820Z // end inline asm 2026-02-21T08:52:45.7306119Z .loc 1 62 28 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:62:28 2026-02-21T08:52:45.7306604Z bar.sync 0; 2026-02-21T08:52:45.7306781Z st.shared.b8 [%r24], %rs22; 2026-02-21T08:52:45.7306965Z bar.sync 0; 2026-02-21T08:52:45.7307126Z ld.shared.v2.b8 {%rs31, %rs32}, [%r17]; 2026-02-21T08:52:45.7307485Z .loc 1 57 28 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:57:28 2026-02-21T08:52:45.7307841Z shl.b16 %rs33, %rs31, 4; 
2026-02-21T08:52:45.7308020Z shl.b16 %rs34, %rs32, 4; 2026-02-21T08:52:45.7308333Z .loc 1 72 58 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:72:58 2026-02-21T08:52:45.7308774Z selp.b16 %rs35, %rs33, %rs31, %p28; 2026-02-21T08:52:45.7308984Z cvt.s16.s8 %rs36, %rs35; 2026-02-21T08:52:45.7309151Z shr.s16 %rs37, %rs36, 4; 2026-02-21T08:52:45.7309333Z selp.b16 %rs38, %rs34, %rs32, %p28; 2026-02-21T08:52:45.7309530Z cvt.s16.s8 %rs39, %rs38; 2026-02-21T08:52:45.7309709Z shr.s16 %rs40, %rs39, 4; 2026-02-21T08:52:45.7310017Z .loc 1 77 32 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:77:32 2026-02-21T08:52:45.7310380Z cvt.rn.f32.s16 %r395, %rs37; 2026-02-21T08:52:45.7310569Z cvt.rn.f32.s16 %r396, %rs40; 2026-02-21T08:52:45.7310756Z bar.sync 0; 2026-02-21T08:52:45.7310913Z st.shared.b32 [%r18], %r395; 2026-02-21T08:52:45.7311093Z st.shared.b32 [%r19], %r396; 2026-02-21T08:52:45.7311271Z $L__tmp3: 2026-02-21T08:52:45.7311631Z .loc 2 291 36 // standard.py:291:36 @[ c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:84:40 ] 2026-02-21T08:52:45.7312062Z // begin inline asm 2026-02-21T08:52:45.7312425Z fence.proxy.async.shared::cta; 2026-02-21T08:52:45.7312623Z // end inline asm 2026-02-21T08:52:45.7312777Z bar.sync 0; 2026-02-21T08:52:45.7312936Z shfl.sync.idx.b32 %r397, %r4, 0, 31, -1; 2026-02-21T08:52:45.7313173Z wgmma.fence.sync.aligned; 2026-02-21T08:52:45.7313353Z shl.b32 %r398, %r397, 8; 2026-02-21T08:52:45.7313528Z and.b32 %r399, %r398, 7168; 2026-02-21T08:52:45.7313706Z add.s32 %r400, %r399, %r776; 2026-02-21T08:52:45.7313885Z bfe.u32 %r401, %r400, 4, 14; 2026-02-21T08:52:45.7314060Z cvt.u64.u32 %rd56, %r401; 2026-02-21T08:52:45.7314268Z or.b64 %rd53, %rd56, -9223371899382267904; 2026-02-21T08:52:45.7314485Z mov.pred %p7, -1; 2026-02-21T08:52:45.7314647Z // begin inline asm 2026-02-21T08:52:45.7315248Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r801,%r802,%r803,%r804,%r805,%r806,%r807,%r808}, {%r349,%r350,%r351,%r352}, %rd53, 
%p7, 1, 1; 2026-02-21T08:52:45.7315747Z // end inline asm 2026-02-21T08:52:45.7315905Z add.s32 %r402, %r400, 32; 2026-02-21T08:52:45.7316083Z bfe.u32 %r403, %r402, 4, 14; 2026-02-21T08:52:45.7316265Z cvt.u64.u32 %rd57, %r403; 2026-02-21T08:52:45.7316611Z or.b64 %rd54, %rd57, -9223371899382267904; 2026-02-21T08:52:45.7316926Z // begin inline asm 2026-02-21T08:52:45.7317396Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r801,%r802,%r803,%r804,%r805,%r806,%r807,%r808}, {%r369,%r370,%r371,%r372}, %rd54, %p7, 1, 1; 2026-02-21T08:52:45.7317885Z // end inline asm 2026-02-21T08:52:45.7318061Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:45.7318260Z mov.b32 %r382, 0; 2026-02-21T08:52:45.7318416Z mov.b32 %r381, %r776; 2026-02-21T08:52:45.7318590Z mov.b32 %r383, %r382; 2026-02-21T08:52:45.7318751Z // begin inline asm 2026-02-21T08:52:45.7319022Z // wait for regs: %r801,%r802,%r803,%r804,%r805,%r806,%r807,%r808,%r381,%r382,%r383 2026-02-21T08:52:45.7319348Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:45.7319552Z // end inline asm 2026-02-21T08:52:45.7319700Z $L__tmp4: 2026-02-21T08:52:45.7320005Z .loc 1 40 103 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:40:103 2026-02-21T08:52:45.7320385Z add.s64 %rd101, %rd101, 8; 2026-02-21T08:52:45.7320566Z add.s32 %r800, %r800, 57344; 2026-02-21T08:52:45.7320759Z add.s64 %rd100, %rd100, 32; 2026-02-21T08:52:45.7320946Z setp.lt.u64 %p10, %rd101, 4088; 2026-02-21T08:52:45.7321150Z @%p10 bra $L__BB0_5; 2026-02-21T08:52:45.7321364Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:45.7321788Z .loc 1 33 32 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:33:32 2026-02-21T08:52:45.7322153Z or.b32 %r409, %r60, %r8; 2026-02-21T08:52:45.7322474Z .loc 1 87 28 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:87:28 2026-02-21T08:52:45.7322850Z cvt.rn.bf16x2.f32 %r410, %r802, %r801; 2026-02-21T08:52:45.7323073Z cvt.rn.bf16x2.f32 %r411, %r804, %r803; 2026-02-21T08:52:45.7323286Z 
cvt.rn.bf16x2.f32 %r412, %r806, %r805; 2026-02-21T08:52:45.7323497Z cvt.rn.bf16x2.f32 %r413, %r808, %r807; 2026-02-21T08:52:45.7323841Z .loc 1 88 50 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:88:50 2026-02-21T08:52:45.7324211Z mad.lo.s32 %r414, %r59, 7168, %r409; 2026-02-21T08:52:45.7324555Z .loc 1 88 22 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:88:22 2026-02-21T08:52:45.7324915Z mad.wide.s32 %rd58, %r414, 2, %rd32; 2026-02-21T08:52:45.7325247Z .loc 1 88 81 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:88:81 2026-02-21T08:52:45.7325597Z bar.sync 0; 2026-02-21T08:52:45.7325862Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r20], {%r410, %r411, %r412, %r413}; 2026-02-21T08:52:45.7326190Z bar.sync 0; 2026-02-21T08:52:45.7326379Z ld.shared.v4.b32 {%r404, %r405, %r406, %r407}, [%r21]; 2026-02-21T08:52:45.7326783Z // begin inline asm 2026-02-21T08:52:45.7327010Z st.global.v4.b32 [ %rd58 + 0 ], { %r404, %r405, %r406, %r407 }; 2026-02-21T08:52:45.7327433Z // end inline asm 2026-02-21T08:52:45.7327757Z .loc 1 19 144 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:19:144 2026-02-21T08:52:45.7328123Z add.s32 %r415, %r790, 8448; 2026-02-21T08:52:45.7328338Z .loc 1 25 35 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:25:35 2026-02-21T08:52:45.7328411Z mul.hi.s32 %r416, %r415, -1840700269; 2026-02-21T08:52:45.7328476Z add.s32 %r417, %r416, %r415; 2026-02-21T08:52:45.7328539Z shr.u32 %r418, %r417, 31; 2026-02-21T08:52:45.7328607Z shr.s32 %r419, %r417, 6; 2026-02-21T08:52:45.7328669Z add.s32 %r420, %r419, %r418; 2026-02-21T08:52:45.7328872Z .loc 1 26 33 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:26:33 2026-02-21T08:52:45.7329072Z shl.b32 %r421, %r420, 1; 2026-02-21T08:52:45.7329290Z .loc 1 27 39 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:27:39 2026-02-21T08:52:45.7329360Z sub.s32 %r422, 1, %r421; 2026-02-21T08:52:45.7329570Z .loc 1 27 52 // 
c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:27:52 2026-02-21T08:52:45.7329634Z min.s32 %r423, %r422, 2; 2026-02-21T08:52:45.7329836Z .loc 1 28 45 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:28:45 2026-02-21T08:52:45.7329904Z mul.lo.s32 %r424, %r420, 112; 2026-02-21T08:52:45.7329974Z sub.s32 %r425, %r415, %r424; 2026-02-21T08:52:45.7330173Z .loc 1 29 51 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:29:51 2026-02-21T08:52:45.7330239Z div.s32 %r426, %r425, %r423; 2026-02-21T08:52:45.7330440Z .loc 1 28 64 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:28:64 2026-02-21T08:52:45.7330512Z mul.lo.s32 %r427, %r426, %r423; 2026-02-21T08:52:45.7330576Z sub.s32 %r428, %r425, %r427; 2026-02-21T08:52:45.7330778Z .loc 1 28 30 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:28:30 2026-02-21T08:52:45.7330845Z add.s32 %r429, %r428, %r421; 2026-02-21T08:52:45.7331045Z .loc 1 30 27 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:30:27 2026-02-21T08:52:45.7331111Z shl.b32 %r430, %r429, 6; 2026-02-21T08:52:45.7331318Z .loc 1 31 32 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:31:32 2026-02-21T08:52:45.7331383Z or.b32 %r80, %r430, %r6; 2026-02-21T08:52:45.7331585Z .loc 1 32 27 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:32:27 2026-02-21T08:52:45.7331653Z shl.b32 %r81, %r426, 7; 2026-02-21T08:52:45.7331865Z .loc 1 40 103 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:40:103 2026-02-21T08:52:45.7331938Z add.s32 %r809, %r25, %r81; 2026-02-21T08:52:45.7332009Z shl.b32 %r431, %r429, 19; 2026-02-21T08:52:45.7332073Z or.b32 %r432, %r26, %r431; 2026-02-21T08:52:45.7332149Z mad.wide.s32 %rd102, %r432, 2, %rd30; 2026-02-21T08:52:45.7332221Z mov.b32 %r810, 0f00000000; 2026-02-21T08:52:45.7332286Z mov.b64 %rd103, -8; 2026-02-21T08:52:45.7332347Z mov.b32 %r811, %r810; 2026-02-21T08:52:45.7332407Z mov.b32 %r812, %r810; 2026-02-21T08:52:45.7332479Z mov.b32 %r813, %r810; 
2026-02-21T08:52:45.7332539Z mov.b32 %r814, %r810; 2026-02-21T08:52:45.7332598Z mov.b32 %r815, %r810; 2026-02-21T08:52:45.7332665Z mov.b32 %r816, %r810; 2026-02-21T08:52:45.7332724Z mov.b32 %r817, %r810; 2026-02-21T08:52:45.7332843Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:45.7332955Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:45.7333165Z .loc 1 48 80 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:48:80 2026-02-21T08:52:45.7333232Z // begin inline asm 2026-02-21T08:52:45.7333294Z mov.u64 %rd60, 0x0; 2026-02-21T08:52:45.7333432Z createpolicy.fractional.L2::evict_first.b64 %rd60, 1.0; 2026-02-21T08:52:45.7333602Z // end inline asm 2026-02-21T08:52:45.7333663Z // begin inline asm 2026-02-21T08:52:45.7333729Z mov.u16 %rs41, 0x0; 2026-02-21T08:52:45.7333896Z ld.global.L1::evict_first.L2::cache_hint.b16 { %rs41 }, [ %rd102 + 0 ], %rd60; 2026-02-21T08:52:45.7333956Z // end inline asm 2026-02-21T08:52:45.7334156Z .loc 1 52 32 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:52:32 2026-02-21T08:52:45.7334222Z bar.sync 0; 2026-02-21T08:52:45.7334290Z st.shared.b16 [%r13], %rs41; 2026-02-21T08:52:45.7334347Z bar.sync 0; 2026-02-21T08:52:45.7334421Z ld.shared.b16 %rs43, [%r22]; 2026-02-21T08:52:45.7334493Z ld.shared.b16 %rs44, [%r22+256]; 2026-02-21T08:52:45.7334563Z ld.shared.b16 %rs45, [%r22+16]; 2026-02-21T08:52:45.7334635Z ld.shared.b16 %rs46, [%r22+272]; 2026-02-21T08:52:45.7334821Z ld.shared.b16 %rs47, [%r23]; 2026-02-21T08:52:45.7334890Z ld.shared.b16 %rs48, [%r23+256]; 2026-02-21T08:52:45.7334956Z ld.shared.b16 %rs49, [%r23+16]; 2026-02-21T08:52:45.7335031Z ld.shared.b16 %rs50, [%r23+272]; 2026-02-21T08:52:45.7335096Z cvt.f32.bf16 %r449, %rs43; 2026-02-21T08:52:45.7335158Z cvt.f32.bf16 %r450, %rs44; 2026-02-21T08:52:45.7335225Z cvt.f32.bf16 %r451, %rs47; 2026-02-21T08:52:45.7335286Z cvt.f32.bf16 %r452, %rs48; 2026-02-21T08:52:45.7335345Z cvt.f32.bf16 %r469, %rs45; 2026-02-21T08:52:45.7335405Z cvt.f32.bf16 %r470, %rs46; 
2026-02-21T08:52:45.7335470Z cvt.f32.bf16 %r471, %rs49; 2026-02-21T08:52:45.7335531Z cvt.f32.bf16 %r472, %rs50; 2026-02-21T08:52:45.7335737Z .loc 1 54 34 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:54:34 2026-02-21T08:52:45.7335807Z cvt.s64.s32 %rd68, %r809; 2026-02-21T08:52:45.7335870Z add.s64 %rd64, %rd31, %rd68; 2026-02-21T08:52:45.7336069Z .loc 1 54 87 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:54:87 2026-02-21T08:52:45.7336135Z // begin inline asm 2026-02-21T08:52:45.7336194Z mov.u64 %rd63, 0x0; 2026-02-21T08:52:45.7336316Z createpolicy.fractional.L2::evict_last.b64 %rd63, 1.0; 2026-02-21T08:52:45.7336377Z // end inline asm 2026-02-21T08:52:45.7336441Z // begin inline asm 2026-02-21T08:52:45.7336657Z mov.u16 %rs42, 0x0; 2026-02-21T08:52:45.7336819Z ld.global.L1::evict_last.L2::cache_hint.b8 { %rs42 }, [ %rd64 + 0 ], %rd63; 2026-02-21T08:52:45.7336883Z // end inline asm 2026-02-21T08:52:45.7337082Z .loc 1 62 28 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:62:28 2026-02-21T08:52:45.7337140Z bar.sync 0; 2026-02-21T08:52:45.7337206Z st.shared.b8 [%r24], %rs42; 2026-02-21T08:52:45.7337267Z bar.sync 0; 2026-02-21T08:52:45.7337345Z ld.shared.v2.b8 {%rs51, %rs52}, [%r17]; 2026-02-21T08:52:45.7337548Z .loc 1 57 28 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:57:28 2026-02-21T08:52:45.7337618Z shl.b16 %rs53, %rs51, 4; 2026-02-21T08:52:45.7337679Z shl.b16 %rs54, %rs52, 4; 2026-02-21T08:52:45.7337880Z .loc 1 72 58 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:72:58 2026-02-21T08:52:45.7337961Z selp.b16 %rs55, %rs53, %rs51, %p28; 2026-02-21T08:52:45.7338024Z cvt.s16.s8 %rs56, %rs55; 2026-02-21T08:52:45.7338085Z shr.s16 %rs57, %rs56, 4; 2026-02-21T08:52:45.7338156Z selp.b16 %rs58, %rs54, %rs52, %p28; 2026-02-21T08:52:45.7338235Z cvt.s16.s8 %rs59, %rs58; 2026-02-21T08:52:45.7338297Z shr.s16 %rs60, %rs59, 4; 2026-02-21T08:52:45.7338498Z .loc 1 77 32 // 
c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:77:32 2026-02-21T08:52:45.7338570Z cvt.rn.f32.s16 %r495, %rs57; 2026-02-21T08:52:45.7338635Z cvt.rn.f32.s16 %r496, %rs60; 2026-02-21T08:52:45.7338690Z bar.sync 0; 2026-02-21T08:52:45.7338760Z st.shared.b32 [%r18], %r495; 2026-02-21T08:52:45.7338825Z st.shared.b32 [%r19], %r496; 2026-02-21T08:52:45.7338884Z $L__tmp5: 2026-02-21T08:52:45.7339173Z .loc 2 291 36 // standard.py:291:36 @[ c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:84:40 ] 2026-02-21T08:52:45.7339407Z // begin inline asm 2026-02-21T08:52:45.7339491Z fence.proxy.async.shared::cta; 2026-02-21T08:52:45.7339550Z // end inline asm 2026-02-21T08:52:45.7339613Z bar.sync 0; 2026-02-21T08:52:45.7339698Z shfl.sync.idx.b32 %r497, %r4, 0, 31, -1; 2026-02-21T08:52:45.7339770Z wgmma.fence.sync.aligned; 2026-02-21T08:52:45.7339830Z shl.b32 %r498, %r497, 8; 2026-02-21T08:52:45.7339899Z and.b32 %r499, %r498, 7168; 2026-02-21T08:52:45.7339961Z add.s32 %r500, %r499, %r776; 2026-02-21T08:52:45.7340025Z bfe.u32 %r501, %r500, 4, 14; 2026-02-21T08:52:45.7340097Z cvt.u64.u32 %rd69, %r501; 2026-02-21T08:52:45.7340179Z or.b64 %rd66, %rd69, -9223371899382267904; 2026-02-21T08:52:45.7340249Z mov.pred %p11, -1; 2026-02-21T08:52:45.7340310Z // begin inline asm 2026-02-21T08:52:45.7340822Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r810,%r811,%r812,%r813,%r814,%r815,%r816,%r817}, {%r449,%r450,%r451,%r452}, %rd66, %p11, 1, 1; 2026-02-21T08:52:45.7340891Z // end inline asm 2026-02-21T08:52:45.7340960Z add.s32 %r502, %r500, 32; 2026-02-21T08:52:45.7341030Z bfe.u32 %r503, %r502, 4, 14; 2026-02-21T08:52:45.7341094Z cvt.u64.u32 %rd70, %r503; 2026-02-21T08:52:45.7341174Z or.b64 %rd67, %rd70, -9223371899382267904; 2026-02-21T08:52:45.7341241Z // begin inline asm 2026-02-21T08:52:45.7341590Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r810,%r811,%r812,%r813,%r814,%r815,%r816,%r817}, {%r469,%r470,%r471,%r472}, %rd67, %p11, 1, 1; 2026-02-21T08:52:45.7341651Z 
// end inline asm 2026-02-21T08:52:45.7341733Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:45.7341801Z mov.b32 %r482, 0; 2026-02-21T08:52:45.7341862Z mov.b32 %r481, %r776; 2026-02-21T08:52:45.7341920Z mov.b32 %r483, %r482; 2026-02-21T08:52:45.7341989Z // begin inline asm 2026-02-21T08:52:45.7342149Z // wait for regs: %r810,%r811,%r812,%r813,%r814,%r815,%r816,%r817,%r481,%r482,%r483 2026-02-21T08:52:45.7342228Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:45.7342293Z // end inline asm 2026-02-21T08:52:45.7342356Z $L__tmp6: 2026-02-21T08:52:45.7342586Z .loc 1 40 103 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:40:103 2026-02-21T08:52:45.7342654Z add.s64 %rd103, %rd103, 8; 2026-02-21T08:52:45.7342724Z add.s32 %r809, %r809, 57344; 2026-02-21T08:52:45.7342788Z add.s64 %rd102, %rd102, 32; 2026-02-21T08:52:45.7342855Z setp.lt.u64 %p14, %rd103, 4088; 2026-02-21T08:52:45.7342940Z @%p14 bra $L__BB0_7; 2026-02-21T08:52:45.7343059Z // %bb.8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:45.7343273Z .loc 1 33 32 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:33:32 2026-02-21T08:52:45.7343344Z or.b32 %r509, %r81, %r8; 2026-02-21T08:52:45.7343553Z .loc 1 87 28 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:87:28 2026-02-21T08:52:45.7343631Z cvt.rn.bf16x2.f32 %r510, %r811, %r810; 2026-02-21T08:52:45.7343705Z cvt.rn.bf16x2.f32 %r511, %r813, %r812; 2026-02-21T08:52:45.7343785Z cvt.rn.bf16x2.f32 %r512, %r815, %r814; 2026-02-21T08:52:45.7343856Z cvt.rn.bf16x2.f32 %r513, %r817, %r816; 2026-02-21T08:52:45.7344061Z .loc 1 88 50 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:88:50 2026-02-21T08:52:45.7344132Z mad.lo.s32 %r514, %r80, 7168, %r509; 2026-02-21T08:52:45.7344327Z .loc 1 88 22 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:88:22 2026-02-21T08:52:45.7344395Z mad.wide.s32 %rd71, %r514, 2, %rd32; 2026-02-21T08:52:45.7344599Z .loc 1 88 81 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:88:81 
2026-02-21T08:52:45.7344659Z bar.sync 0; 2026-02-21T08:52:45.7344842Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r20], {%r510, %r511, %r512, %r513}; 2026-02-21T08:52:45.7344903Z bar.sync 0; 2026-02-21T08:52:45.7345015Z ld.shared.v4.b32 {%r504, %r505, %r506, %r507}, [%r21]; 2026-02-21T08:52:45.7345077Z // begin inline asm 2026-02-21T08:52:45.7345312Z st.global.v4.b32 [ %rd71 + 0 ], { %r504, %r505, %r506, %r507 }; 2026-02-21T08:52:45.7345379Z // end inline asm 2026-02-21T08:52:45.7345596Z .loc 1 19 144 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:19:144 2026-02-21T08:52:45.7345660Z add.s32 %r515, %r790, 12672; 2026-02-21T08:52:45.7345871Z .loc 1 25 35 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:25:35 2026-02-21T08:52:45.7345943Z mul.hi.s32 %r516, %r515, -1840700269; 2026-02-21T08:52:45.7346004Z add.s32 %r517, %r516, %r515; 2026-02-21T08:52:45.7346064Z shr.u32 %r518, %r517, 31; 2026-02-21T08:52:45.7346129Z shr.s32 %r519, %r517, 6; 2026-02-21T08:52:45.7346188Z add.s32 %r520, %r519, %r518; 2026-02-21T08:52:45.7346793Z .loc 1 26 33 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:26:33 2026-02-21T08:52:45.7346885Z shl.b32 %r521, %r520, 1; 2026-02-21T08:52:45.7347098Z .loc 1 27 39 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:27:39 2026-02-21T08:52:45.7347164Z sub.s32 %r522, 1, %r521; 2026-02-21T08:52:45.7347369Z .loc 1 27 52 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:27:52 2026-02-21T08:52:45.7347429Z min.s32 %r523, %r522, 2; 2026-02-21T08:52:45.7347637Z .loc 1 28 45 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:28:45 2026-02-21T08:52:45.7347708Z mul.lo.s32 %r524, %r520, 112; 2026-02-21T08:52:45.7347769Z sub.s32 %r525, %r515, %r524; 2026-02-21T08:52:45.7347965Z .loc 1 29 51 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:29:51 2026-02-21T08:52:45.7348029Z div.s32 %r526, %r525, %r523; 2026-02-21T08:52:45.7348237Z .loc 1 28 64 // 
c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:28:64 2026-02-21T08:52:45.7348303Z mul.lo.s32 %r527, %r526, %r523; 2026-02-21T08:52:45.7348364Z sub.s32 %r528, %r525, %r527; 2026-02-21T08:52:45.7348662Z .loc 1 28 30 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:28:30 2026-02-21T08:52:45.7348733Z add.s32 %r529, %r528, %r521; 2026-02-21T08:52:45.7348933Z .loc 1 30 27 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:30:27 2026-02-21T08:52:45.7349002Z shl.b32 %r530, %r529, 6; 2026-02-21T08:52:45.7349202Z .loc 1 31 32 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:31:32 2026-02-21T08:52:45.7349265Z or.b32 %r101, %r530, %r6; 2026-02-21T08:52:45.7349465Z .loc 1 32 27 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:32:27 2026-02-21T08:52:45.7349535Z shl.b32 %r102, %r526, 7; 2026-02-21T08:52:45.7349747Z .loc 1 40 103 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:40:103 2026-02-21T08:52:45.7349813Z add.s32 %r818, %r25, %r102; 2026-02-21T08:52:45.7349889Z shl.b32 %r531, %r529, 19; 2026-02-21T08:52:45.7349950Z or.b32 %r532, %r26, %r531; 2026-02-21T08:52:45.7350024Z mad.wide.s32 %rd104, %r532, 2, %rd30; 2026-02-21T08:52:45.7350094Z mov.b32 %r819, 0f00000000; 2026-02-21T08:52:45.7350159Z mov.b64 %rd105, -8; 2026-02-21T08:52:45.7350223Z mov.b32 %r820, %r819; 2026-02-21T08:52:45.7350287Z mov.b32 %r821, %r819; 2026-02-21T08:52:45.7350351Z mov.b32 %r822, %r819; 2026-02-21T08:52:45.7350411Z mov.b32 %r823, %r819; 2026-02-21T08:52:45.7350477Z mov.b32 %r824, %r819; 2026-02-21T08:52:45.7350546Z mov.b32 %r825, %r819; 2026-02-21T08:52:45.7350607Z mov.b32 %r826, %r819; 2026-02-21T08:52:45.7350727Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:45.7350837Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:45.7351056Z .loc 1 48 80 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:48:80 2026-02-21T08:52:45.7351120Z // begin inline asm 2026-02-21T08:52:45.7351179Z mov.u64 %rd73, 0x0; 
2026-02-21T08:52:45.7351314Z createpolicy.fractional.L2::evict_first.b64 %rd73, 1.0; 2026-02-21T08:52:45.7351535Z // end inline asm 2026-02-21T08:52:45.7351595Z // begin inline asm 2026-02-21T08:52:45.7351660Z mov.u16 %rs61, 0x0; 2026-02-21T08:52:45.7351825Z ld.global.L1::evict_first.L2::cache_hint.b16 { %rs61 }, [ %rd104 + 0 ], %rd73; 2026-02-21T08:52:45.7351885Z // end inline asm 2026-02-21T08:52:45.7352092Z .loc 1 52 32 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:52:32 2026-02-21T08:52:45.7352154Z bar.sync 0; 2026-02-21T08:52:45.7352223Z st.shared.b16 [%r13], %rs61; 2026-02-21T08:52:45.7352279Z bar.sync 0; 2026-02-21T08:52:45.7352353Z ld.shared.b16 %rs63, [%r22]; 2026-02-21T08:52:45.7352424Z ld.shared.b16 %rs64, [%r22+256]; 2026-02-21T08:52:45.7352494Z ld.shared.b16 %rs65, [%r22+16]; 2026-02-21T08:52:45.7352686Z ld.shared.b16 %rs66, [%r22+272]; 2026-02-21T08:52:45.7352757Z ld.shared.b16 %rs67, [%r23]; 2026-02-21T08:52:45.7352822Z ld.shared.b16 %rs68, [%r23+256]; 2026-02-21T08:52:45.7352893Z ld.shared.b16 %rs69, [%r23+16]; 2026-02-21T08:52:45.7352968Z ld.shared.b16 %rs70, [%r23+272]; 2026-02-21T08:52:45.7353033Z cvt.f32.bf16 %r549, %rs63; 2026-02-21T08:52:45.7353098Z cvt.f32.bf16 %r550, %rs64; 2026-02-21T08:52:45.7353165Z cvt.f32.bf16 %r551, %rs67; 2026-02-21T08:52:45.7353229Z cvt.f32.bf16 %r552, %rs68; 2026-02-21T08:52:45.7353290Z cvt.f32.bf16 %r569, %rs65; 2026-02-21T08:52:45.7353356Z cvt.f32.bf16 %r570, %rs66; 2026-02-21T08:52:45.7353424Z cvt.f32.bf16 %r571, %rs69; 2026-02-21T08:52:45.7353486Z cvt.f32.bf16 %r572, %rs70; 2026-02-21T08:52:45.7353703Z .loc 1 54 34 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:54:34 2026-02-21T08:52:45.7353776Z cvt.s64.s32 %rd81, %r818; 2026-02-21T08:52:45.7353839Z add.s64 %rd77, %rd31, %rd81; 2026-02-21T08:52:45.7354048Z .loc 1 54 87 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:54:87 2026-02-21T08:52:45.7354118Z // begin inline asm 2026-02-21T08:52:45.7354184Z mov.u64 %rd76, 0x0; 
2026-02-21T08:52:45.7354308Z createpolicy.fractional.L2::evict_last.b64 %rd76, 1.0; 2026-02-21T08:52:45.7354371Z // end inline asm 2026-02-21T08:52:45.7354439Z // begin inline asm 2026-02-21T08:52:45.7354497Z mov.u16 %rs62, 0x0; 2026-02-21T08:52:45.7354654Z ld.global.L1::evict_last.L2::cache_hint.b8 { %rs62 }, [ %rd77 + 0 ], %rd76; 2026-02-21T08:52:45.7354721Z // end inline asm 2026-02-21T08:52:45.7354927Z .loc 1 62 28 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:62:28 2026-02-21T08:52:45.7354984Z bar.sync 0; 2026-02-21T08:52:45.7355050Z st.shared.b8 [%r24], %rs62; 2026-02-21T08:52:45.7355114Z bar.sync 0; 2026-02-21T08:52:45.7355193Z ld.shared.v2.b8 {%rs71, %rs72}, [%r17]; 2026-02-21T08:52:45.7355409Z .loc 1 57 28 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:57:28 2026-02-21T08:52:45.7355482Z shl.b16 %rs73, %rs71, 4; 2026-02-21T08:52:45.7355547Z shl.b16 %rs74, %rs72, 4; 2026-02-21T08:52:45.7355748Z .loc 1 72 58 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:72:58 2026-02-21T08:52:45.7355825Z selp.b16 %rs75, %rs73, %rs71, %p28; 2026-02-21T08:52:45.7355888Z cvt.s16.s8 %rs76, %rs75; 2026-02-21T08:52:45.7355948Z shr.s16 %rs77, %rs76, 4; 2026-02-21T08:52:45.7356016Z selp.b16 %rs78, %rs74, %rs72, %p28; 2026-02-21T08:52:45.7356081Z cvt.s16.s8 %rs79, %rs78; 2026-02-21T08:52:45.7356141Z shr.s16 %rs80, %rs79, 4; 2026-02-21T08:52:45.7356345Z .loc 1 77 32 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:77:32 2026-02-21T08:52:45.7356415Z cvt.rn.f32.s16 %r595, %rs77; 2026-02-21T08:52:45.7356623Z cvt.rn.f32.s16 %r596, %rs80; 2026-02-21T08:52:45.7356686Z bar.sync 0; 2026-02-21T08:52:45.7356759Z st.shared.b32 [%r18], %r595; 2026-02-21T08:52:45.7356825Z st.shared.b32 [%r19], %r596; 2026-02-21T08:52:45.7356881Z $L__tmp7: 2026-02-21T08:52:45.7357161Z .loc 2 291 36 // standard.py:291:36 @[ c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:84:40 ] 2026-02-21T08:52:45.7357373Z // begin inline asm 2026-02-21T08:52:45.7357452Z 
fence.proxy.async.shared::cta; 2026-02-21T08:52:45.7357510Z // end inline asm 2026-02-21T08:52:45.7357569Z bar.sync 0; 2026-02-21T08:52:45.7357649Z shfl.sync.idx.b32 %r597, %r4, 0, 31, -1; 2026-02-21T08:52:45.7357723Z wgmma.fence.sync.aligned; 2026-02-21T08:52:45.7357782Z shl.b32 %r598, %r597, 8; 2026-02-21T08:52:45.7357848Z and.b32 %r599, %r598, 7168; 2026-02-21T08:52:45.7357910Z add.s32 %r600, %r599, %r776; 2026-02-21T08:52:45.7357972Z bfe.u32 %r601, %r600, 4, 14; 2026-02-21T08:52:45.7358039Z cvt.u64.u32 %rd82, %r601; 2026-02-21T08:52:45.7358116Z or.b64 %rd79, %rd82, -9223371899382267904; 2026-02-21T08:52:45.7358180Z mov.pred %p15, -1; 2026-02-21T08:52:45.7358366Z // begin inline asm 2026-02-21T08:52:45.7358732Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r819,%r820,%r821,%r822,%r823,%r824,%r825,%r826}, {%r549,%r550,%r551,%r552}, %rd79, %p15, 1, 1; 2026-02-21T08:52:45.7358795Z // end inline asm 2026-02-21T08:52:45.7358856Z add.s32 %r602, %r600, 32; 2026-02-21T08:52:45.7358934Z bfe.u32 %r603, %r602, 4, 14; 2026-02-21T08:52:45.7358998Z cvt.u64.u32 %rd83, %r603; 2026-02-21T08:52:45.7359077Z or.b64 %rd80, %rd83, -9223371899382267904; 2026-02-21T08:52:45.7359144Z // begin inline asm 2026-02-21T08:52:45.7359487Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r819,%r820,%r821,%r822,%r823,%r824,%r825,%r826}, {%r569,%r570,%r571,%r572}, %rd80, %p15, 1, 1; 2026-02-21T08:52:45.7359546Z // end inline asm 2026-02-21T08:52:45.7359623Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:45.7359687Z mov.b32 %r583, 0; 2026-02-21T08:52:45.7359747Z mov.b32 %r581, %r776; 2026-02-21T08:52:45.7359807Z mov.b32 %r582, %r583; 2026-02-21T08:52:45.7359876Z // begin inline asm 2026-02-21T08:52:45.7360032Z // wait for regs: %r819,%r820,%r821,%r822,%r823,%r824,%r825,%r826,%r581,%r582,%r583 2026-02-21T08:52:45.7360109Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:45.7360176Z // end inline asm 2026-02-21T08:52:45.7360231Z $L__tmp8: 2026-02-21T08:52:45.7360449Z .loc 1 40 103 
// c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:40:103 2026-02-21T08:52:45.7360515Z add.s64 %rd105, %rd105, 8; 2026-02-21T08:52:45.7360584Z add.s32 %r818, %r818, 57344; 2026-02-21T08:52:45.7360648Z add.s64 %rd104, %rd104, 32; 2026-02-21T08:52:45.7360718Z setp.lt.u64 %p18, %rd105, 4088; 2026-02-21T08:52:45.7360786Z @%p18 bra $L__BB0_9; 2026-02-21T08:52:45.7360897Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:45.7361104Z .loc 1 33 32 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:33:32 2026-02-21T08:52:45.7361172Z or.b32 %r608, %r102, %r8; 2026-02-21T08:52:45.7361379Z .loc 1 87 28 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:87:28 2026-02-21T08:52:45.7361455Z cvt.rn.bf16x2.f32 %r609, %r820, %r819; 2026-02-21T08:52:45.7361530Z cvt.rn.bf16x2.f32 %r610, %r822, %r821; 2026-02-21T08:52:45.7361607Z cvt.rn.bf16x2.f32 %r611, %r824, %r823; 2026-02-21T08:52:45.7361676Z cvt.rn.bf16x2.f32 %r612, %r826, %r825; 2026-02-21T08:52:45.7361876Z .loc 1 88 50 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:88:50 2026-02-21T08:52:45.7361952Z mad.lo.s32 %r613, %r101, 7168, %r608; 2026-02-21T08:52:45.7362151Z .loc 1 88 22 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:88:22 2026-02-21T08:52:45.7362222Z mad.wide.s32 %rd84, %r613, 2, %rd32; 2026-02-21T08:52:45.7362435Z .loc 1 88 81 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:88:81 2026-02-21T08:52:45.7362493Z bar.sync 0; 2026-02-21T08:52:45.7362677Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r20], {%r609, %r610, %r611, %r612}; 2026-02-21T08:52:45.7362736Z bar.sync 0; 2026-02-21T08:52:45.7362866Z ld.shared.v4.b32 {%r604, %r605, %r606, %r607}, [%r21]; 2026-02-21T08:52:45.7363045Z // begin inline asm 2026-02-21T08:52:45.7363171Z st.global.v4.b32 [ %rd84 + 0 ], { %r604, %r605, %r606, %r607 }; 2026-02-21T08:52:45.7363237Z // end inline asm 2026-02-21T08:52:45.7363454Z .loc 1 19 144 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:19:144 
2026-02-21T08:52:45.7363519Z add.s32 %r790, %r790, 16896; 2026-02-21T08:52:45.7363593Z setp.lt.s32 %p19, %r790, %r827; 2026-02-21T08:52:45.7363654Z @%p19 bra $L__BB0_2; 2026-02-21T08:52:45.7363746Z $L__BB0_11: // %.preheader 2026-02-21T08:52:45.7363816Z setp.gt.s32 %p20, %r827, 55; 2026-02-21T08:52:45.7363887Z @%p20 bra $L__BB0_16; 2026-02-21T08:52:45.7363972Z // %bb.12: // %.lr.ph129 2026-02-21T08:52:45.7364302Z .loc 1 0 144 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:0:144 2026-02-21T08:52:45.7364376Z and.b32 %r615, %r773, 1918; 2026-02-21T08:52:45.7364441Z and.b32 %r618, %r775, 136; 2026-02-21T08:52:45.7364506Z xor.b32 %r619, %r618, %r615; 2026-02-21T08:52:45.7364574Z add.s32 %r27, %r776, %r619; 2026-02-21T08:52:45.7364637Z and.b32 %r622, %r777, 1536; 2026-02-21T08:52:45.7364698Z and.b32 %r624, %r778, 96; 2026-02-21T08:52:45.7364762Z and.b32 %r625, %r773, 6; 2026-02-21T08:52:45.7364830Z and.b32 %r628, %r780, 136; 2026-02-21T08:52:45.7364890Z or.b32 %r629, %r622, %r624; 2026-02-21T08:52:45.7364951Z or.b32 %r630, %r629, %r625; 2026-02-21T08:52:45.7365016Z or.b32 %r631, %r630, %r628; 2026-02-21T08:52:45.7365077Z add.s32 %r28, %r776, %r631; 2026-02-21T08:52:45.7365137Z xor.b32 %r632, %r631, 8; 2026-02-21T08:52:45.7365197Z add.s32 %r29, %r776, %r632; 2026-02-21T08:52:45.7365263Z and.b32 %r634, %r781, 124; 2026-02-21T08:52:45.7365321Z shl.b32 %r637, %r774, 3; 2026-02-21T08:52:45.7365388Z selp.b32 %r638, 1, 0, %p27; 2026-02-21T08:52:45.7365455Z add.s32 %r639, %r776, %r782; 2026-02-21T08:52:45.7365516Z add.s32 %r640, %r639, %r638; 2026-02-21T08:52:45.7365577Z add.s32 %r641, %r640, %r637; 2026-02-21T08:52:45.7365641Z add.s32 %r642, %r641, %r783; 2026-02-21T08:52:45.7365706Z add.s32 %r30, %r642, %r634; 2026-02-21T08:52:45.7365766Z and.b32 %r644, %r784, 384; 2026-02-21T08:52:45.7365826Z add.s32 %r645, %r776, %r783; 2026-02-21T08:52:45.7365892Z add.s32 %r646, %r645, %r644; 2026-02-21T08:52:45.7365964Z add.s32 %r647, %r646, %r634; 2026-02-21T08:52:45.7366028Z 
add.s32 %r31, %r647, %r637; 2026-02-21T08:52:45.7366089Z and.b32 %r649, %r778, 48; 2026-02-21T08:52:45.7366155Z xor.b32 %r651, %r649, %r786; 2026-02-21T08:52:45.7366215Z or.b32 %r652, %r651, %r785; 2026-02-21T08:52:45.7366275Z add.s32 %r32, %r776, %r652; 2026-02-21T08:52:45.7366340Z xor.b32 %r653, %r652, 32; 2026-02-21T08:52:45.7366405Z add.s32 %r33, %r776, %r653; 2026-02-21T08:52:45.7366595Z shl.b32 %r655, %r787, 11; 2026-02-21T08:52:45.7366677Z and.b32 %r657, %r778, 768; 2026-02-21T08:52:45.7366741Z shr.u32 %r658, %r3, 2; 2026-02-21T08:52:45.7366803Z and.b32 %r659, %r658, 96; 2026-02-21T08:52:45.7366866Z and.b32 %r660, %r773, 1024; 2026-02-21T08:52:45.7366931Z or.b32 %r661, %r788, %r657; 2026-02-21T08:52:45.7366991Z or.b32 %r662, %r659, %r779; 2026-02-21T08:52:45.7367051Z xor.b32 %r663, %r661, %r662; 2026-02-21T08:52:45.7367116Z add.s32 %r664, %r776, %r655; 2026-02-21T08:52:45.7367178Z add.s32 %r665, %r664, %r660; 2026-02-21T08:52:45.7367238Z add.s32 %r34, %r665, %r663; 2026-02-21T08:52:45.7367298Z and.b32 %r667, %r789, 15360; 2026-02-21T08:52:45.7367365Z shl.b32 %r668, %r787, 4; 2026-02-21T08:52:45.7367426Z xor.b32 %r669, %r668, %r5; 2026-02-21T08:52:45.7367489Z add.s32 %r670, %r776, %r667; 2026-02-21T08:52:45.7367556Z add.s32 %r35, %r670, %r669; 2026-02-21T08:52:45.7367772Z .loc 1 19 144 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:19:144 2026-02-21T08:52:45.7367846Z mad.wide.u32 %rd1, %r11, 7168, %rd31; 2026-02-21T08:52:45.7367961Z $L__BB0_13: // =>This Loop Header: Depth=1 2026-02-21T08:52:45.7368238Z // Child Loop BB0_14 Depth 2 2026-02-21T08:52:45.7368446Z .loc 1 25 35 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:25:35 2026-02-21T08:52:45.7368518Z mul.hi.s32 %r672, %r827, -1840700269; 2026-02-21T08:52:45.7368586Z add.s32 %r673, %r672, %r827; 2026-02-21T08:52:45.7368647Z shr.u32 %r674, %r673, 31; 2026-02-21T08:52:45.7368709Z shr.s32 %r675, %r673, 6; 2026-02-21T08:52:45.7368776Z add.s32 %r676, %r675, %r674; 
2026-02-21T08:52:45.7368982Z .loc 1 26 33 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:26:33 2026-02-21T08:52:45.7369042Z shl.b32 %r677, %r676, 1; 2026-02-21T08:52:45.7369368Z .loc 1 27 39 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:27:39 2026-02-21T08:52:45.7369434Z sub.s32 %r678, 1, %r677; 2026-02-21T08:52:45.7369635Z .loc 1 27 52 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:27:52 2026-02-21T08:52:45.7369699Z min.u32 %r679, %r678, 2; 2026-02-21T08:52:45.7369904Z .loc 1 28 45 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:28:45 2026-02-21T08:52:45.7369981Z mul.lo.s32 %r680, %r676, 112; 2026-02-21T08:52:45.7370048Z sub.s32 %r681, %r827, %r680; 2026-02-21T08:52:45.7370251Z .loc 1 28 64 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:28:64 2026-02-21T08:52:45.7370316Z cvt.u16.u32 %rs81, %r681; 2026-02-21T08:52:45.7370379Z cvt.s8.s32 %rs82, %r681; 2026-02-21T08:52:45.7370449Z cvt.u16.u32 %rs83, %r679; 2026-02-21T08:52:45.7370648Z .loc 1 29 51 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:29:51 2026-02-21T08:52:45.7370712Z div.s16 %rs84, %rs82, %rs83; 2026-02-21T08:52:45.7370914Z .loc 1 28 64 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:28:64 2026-02-21T08:52:45.7370988Z mul.lo.s16 %rs85, %rs84, %rs83; 2026-02-21T08:52:45.7371054Z sub.s16 %rs86, %rs81, %rs85; 2026-02-21T08:52:45.7371116Z cvt.u32.u16 %r682, %rs86; 2026-02-21T08:52:45.7371183Z cvt.s32.s8 %r683, %r682; 2026-02-21T08:52:45.7371384Z .loc 1 28 30 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:28:30 2026-02-21T08:52:45.7371446Z add.s32 %r684, %r677, %r683; 2026-02-21T08:52:45.7371652Z .loc 1 30 27 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:30:27 2026-02-21T08:52:45.7371715Z shl.b32 %r685, %r684, 6; 2026-02-21T08:52:45.7371914Z .loc 1 31 32 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:31:32 2026-02-21T08:52:45.7371976Z or.b32 %r124, %r685, %r6; 2026-02-21T08:52:45.7372186Z .loc 
1 32 27 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:32:27 2026-02-21T08:52:45.7372250Z cvt.s16.s8 %rs87, %rs84; 2026-02-21T08:52:45.7372327Z mul.wide.s16 %r125, %rs87, 128; 2026-02-21T08:52:45.7372547Z .loc 1 40 103 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:40:103 2026-02-21T08:52:45.7372612Z shl.b32 %r686, %r676, 7; 2026-02-21T08:52:45.7372674Z or.b32 %r687, %r6, %r686; 2026-02-21T08:52:45.7372742Z cvt.s16.s8 %rs88, %rs86; 2026-02-21T08:52:45.7372813Z mad.wide.s16 %r688, %rs88, 64, %r687; 2026-02-21T08:52:45.7372874Z shl.b32 %r689, %r688, 13; 2026-02-21T08:52:45.7372936Z or.b32 %r690, %r7, %r689; 2026-02-21T08:52:45.7373013Z mad.wide.s32 %rd107, %r690, 2, %rd30; 2026-02-21T08:52:45.7373073Z or.b32 %r691, %r9, %r125; 2026-02-21T08:52:45.7373136Z cvt.s64.s32 %rd86, %r691; 2026-02-21T08:52:45.7373206Z add.s64 %rd106, %rd1, %rd86; 2026-02-21T08:52:45.7373267Z mov.b32 %r828, 0f00000000; 2026-02-21T08:52:45.7373329Z mov.b64 %rd108, -8; 2026-02-21T08:52:45.7373392Z mov.b32 %r829, %r828; 2026-02-21T08:52:45.7373460Z mov.b32 %r830, %r828; 2026-02-21T08:52:45.7373520Z mov.b32 %r831, %r828; 2026-02-21T08:52:45.7373579Z mov.b32 %r832, %r828; 2026-02-21T08:52:45.7373643Z mov.b32 %r833, %r828; 2026-02-21T08:52:45.7373807Z mov.b32 %r834, %r828; 2026-02-21T08:52:45.7373866Z mov.b32 %r835, %r828; 2026-02-21T08:52:45.7373986Z $L__BB0_14: // Parent Loop BB0_13 Depth=1 2026-02-21T08:52:45.7374094Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:45.7374298Z .loc 1 48 80 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:48:80 2026-02-21T08:52:45.7374360Z // begin inline asm 2026-02-21T08:52:45.7374425Z mov.u64 %rd87, 0x0; 2026-02-21T08:52:45.7374553Z createpolicy.fractional.L2::evict_first.b64 %rd87, 1.0; 2026-02-21T08:52:45.7374610Z // end inline asm 2026-02-21T08:52:45.7374677Z // begin inline asm 2026-02-21T08:52:45.7374734Z mov.u16 %rs89, 0x0; 2026-02-21T08:52:45.7375004Z ld.global.L1::evict_first.L2::cache_hint.b16 { %rs89 }, [ 
%rd107 + 0 ], %rd87; 2026-02-21T08:52:45.7375076Z // end inline asm 2026-02-21T08:52:45.7375280Z .loc 1 52 32 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:52:32 2026-02-21T08:52:45.7375342Z bar.sync 0; 2026-02-21T08:52:45.7375408Z st.shared.b16 [%r27], %rs89; 2026-02-21T08:52:45.7375468Z bar.sync 0; 2026-02-21T08:52:45.7375534Z ld.shared.b16 %rs91, [%r28]; 2026-02-21T08:52:45.7375604Z ld.shared.b16 %rs92, [%r28+256]; 2026-02-21T08:52:45.7375676Z ld.shared.b16 %rs93, [%r28+16]; 2026-02-21T08:52:45.7375742Z ld.shared.b16 %rs94, [%r28+272]; 2026-02-21T08:52:45.7375805Z ld.shared.b16 %rs95, [%r29]; 2026-02-21T08:52:45.7375870Z ld.shared.b16 %rs96, [%r29+256]; 2026-02-21T08:52:45.7375941Z ld.shared.b16 %rs97, [%r29+16]; 2026-02-21T08:52:45.7376005Z ld.shared.b16 %rs98, [%r29+272]; 2026-02-21T08:52:45.7376072Z cvt.f32.bf16 %r708, %rs91; 2026-02-21T08:52:45.7376142Z cvt.f32.bf16 %r709, %rs92; 2026-02-21T08:52:45.7376204Z cvt.f32.bf16 %r710, %rs95; 2026-02-21T08:52:45.7376267Z cvt.f32.bf16 %r711, %rs96; 2026-02-21T08:52:45.7376329Z cvt.f32.bf16 %r728, %rs93; 2026-02-21T08:52:45.7376399Z cvt.f32.bf16 %r729, %rs94; 2026-02-21T08:52:45.7376592Z cvt.f32.bf16 %r730, %rs97; 2026-02-21T08:52:45.7376658Z cvt.f32.bf16 %r731, %rs98; 2026-02-21T08:52:45.7376870Z .loc 1 54 87 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:54:87 2026-02-21T08:52:45.7376943Z // begin inline asm 2026-02-21T08:52:45.7377006Z mov.u64 %rd90, 0x0; 2026-02-21T08:52:45.7377134Z createpolicy.fractional.L2::evict_last.b64 %rd90, 1.0; 2026-02-21T08:52:45.7377193Z // end inline asm 2026-02-21T08:52:45.7377252Z // begin inline asm 2026-02-21T08:52:45.7377311Z mov.u16 %rs90, 0x0; 2026-02-21T08:52:45.7377472Z ld.global.L1::evict_last.L2::cache_hint.b8 { %rs90 }, [ %rd106 + 0 ], %rd90; 2026-02-21T08:52:45.7377532Z // end inline asm 2026-02-21T08:52:45.7377734Z .loc 1 62 28 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:62:28 2026-02-21T08:52:45.7377797Z bar.sync 0; 
2026-02-21T08:52:45.7377862Z st.shared.b8 [%r30], %rs90; 2026-02-21T08:52:45.7377923Z bar.sync 0; 2026-02-21T08:52:45.7378009Z ld.shared.v2.b8 {%rs99, %rs100}, [%r31]; 2026-02-21T08:52:45.7378210Z .loc 1 57 28 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:57:28 2026-02-21T08:52:45.7378276Z shl.b16 %rs101, %rs99, 4; 2026-02-21T08:52:45.7378340Z shl.b16 %rs102, %rs100, 4; 2026-02-21T08:52:45.7378544Z .loc 1 72 58 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:72:58 2026-02-21T08:52:45.7378619Z selp.b16 %rs103, %rs101, %rs99, %p28; 2026-02-21T08:52:45.7378683Z cvt.s16.s8 %rs104, %rs103; 2026-02-21T08:52:45.7378750Z shr.s16 %rs105, %rs104, 4; 2026-02-21T08:52:45.7378823Z selp.b16 %rs106, %rs102, %rs100, %p28; 2026-02-21T08:52:45.7378884Z cvt.s16.s8 %rs107, %rs106; 2026-02-21T08:52:45.7378950Z shr.s16 %rs108, %rs107, 4; 2026-02-21T08:52:45.7379157Z .loc 1 77 32 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:77:32 2026-02-21T08:52:45.7379223Z cvt.rn.f32.s16 %r754, %rs105; 2026-02-21T08:52:45.7379436Z cvt.rn.f32.s16 %r755, %rs108; 2026-02-21T08:52:45.7379500Z bar.sync 0; 2026-02-21T08:52:45.7379567Z st.shared.b32 [%r32], %r754; 2026-02-21T08:52:45.7379632Z st.shared.b32 [%r33], %r755; 2026-02-21T08:52:45.7379694Z $L__tmp9: 2026-02-21T08:52:45.7379971Z .loc 2 291 36 // standard.py:291:36 @[ c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:84:40 ] 2026-02-21T08:52:45.7380034Z // begin inline asm 2026-02-21T08:52:45.7380112Z fence.proxy.async.shared::cta; 2026-02-21T08:52:45.7380190Z // end inline asm 2026-02-21T08:52:45.7380252Z bar.sync 0; 2026-02-21T08:52:45.7380333Z shfl.sync.idx.b32 %r756, %r4, 0, 31, -1; 2026-02-21T08:52:45.7380412Z wgmma.fence.sync.aligned; 2026-02-21T08:52:45.7380473Z shl.b32 %r757, %r756, 8; 2026-02-21T08:52:45.7380659Z and.b32 %r758, %r757, 7168; 2026-02-21T08:52:45.7380727Z add.s32 %r759, %r758, %r776; 2026-02-21T08:52:45.7380797Z bfe.u32 %r760, %r759, 4, 14; 2026-02-21T08:52:45.7380864Z cvt.u64.u32 
%rd95, %r760; 2026-02-21T08:52:45.7380943Z or.b64 %rd93, %rd95, -9223371899382267904; 2026-02-21T08:52:45.7381018Z mov.pred %p22, -1; 2026-02-21T08:52:45.7381079Z // begin inline asm 2026-02-21T08:52:45.7381434Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r828,%r829,%r830,%r831,%r832,%r833,%r834,%r835}, {%r708,%r709,%r710,%r711}, %rd93, %p22, 1, 1; 2026-02-21T08:52:45.7381500Z // end inline asm 2026-02-21T08:52:45.7381562Z add.s32 %r761, %r759, 32; 2026-02-21T08:52:45.7381624Z bfe.u32 %r762, %r761, 4, 14; 2026-02-21T08:52:45.7381700Z cvt.u64.u32 %rd96, %r762; 2026-02-21T08:52:45.7381785Z or.b64 %rd94, %rd96, -9223371899382267904; 2026-02-21T08:52:45.7381846Z // begin inline asm 2026-02-21T08:52:45.7382193Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r828,%r829,%r830,%r831,%r832,%r833,%r834,%r835}, {%r728,%r729,%r730,%r731}, %rd94, %p22, 1, 1; 2026-02-21T08:52:45.7382260Z // end inline asm 2026-02-21T08:52:45.7382339Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:45.7382405Z mov.b32 %r742, 0; 2026-02-21T08:52:45.7382471Z mov.b32 %r740, %r776; 2026-02-21T08:52:45.7382531Z mov.b32 %r741, %r742; 2026-02-21T08:52:45.7382592Z // begin inline asm 2026-02-21T08:52:45.7382751Z // wait for regs: %r828,%r829,%r830,%r831,%r832,%r833,%r834,%r835,%r740,%r741,%r742 2026-02-21T08:52:45.7382836Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:45.7382893Z // end inline asm 2026-02-21T08:52:45.7382952Z $L__tmp10: 2026-02-21T08:52:45.7383176Z .loc 1 40 103 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:40:103 2026-02-21T08:52:45.7383243Z add.s64 %rd108, %rd108, 8; 2026-02-21T08:52:45.7383309Z add.s64 %rd107, %rd107, 32; 2026-02-21T08:52:45.7383375Z add.s64 %rd106, %rd106, 57344; 2026-02-21T08:52:45.7383453Z setp.lt.u64 %p25, %rd108, 4088; 2026-02-21T08:52:45.7383515Z @%p25 bra $L__BB0_14; 2026-02-21T08:52:45.7383629Z // %bb.15: // in Loop: Header=BB0_13 Depth=1 2026-02-21T08:52:45.7383844Z .loc 1 33 32 // 
c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:33:32 2026-02-21T08:52:45.7383907Z or.b32 %r767, %r125, %r8; 2026-02-21T08:52:45.7384107Z .loc 1 87 28 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:87:28 2026-02-21T08:52:45.7384186Z cvt.rn.bf16x2.f32 %r768, %r829, %r828; 2026-02-21T08:52:45.7384258Z cvt.rn.bf16x2.f32 %r769, %r831, %r830; 2026-02-21T08:52:45.7384326Z cvt.rn.bf16x2.f32 %r770, %r833, %r832; 2026-02-21T08:52:45.7384394Z cvt.rn.bf16x2.f32 %r771, %r835, %r834; 2026-02-21T08:52:45.7384601Z .loc 1 88 50 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:88:50 2026-02-21T08:52:45.7384671Z mad.lo.s32 %r772, %r124, 7168, %r767; 2026-02-21T08:52:45.7384875Z .loc 1 88 22 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:88:22 2026-02-21T08:52:45.7384952Z mad.wide.s32 %rd97, %r772, 2, %rd32; 2026-02-21T08:52:45.7385152Z .loc 1 88 81 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:88:81 2026-02-21T08:52:45.7385343Z bar.sync 0; 2026-02-21T08:52:45.7385529Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r34], {%r768, %r769, %r770, %r771}; 2026-02-21T08:52:45.7385585Z bar.sync 0; 2026-02-21T08:52:45.7385689Z ld.shared.v4.b32 {%r763, %r764, %r765, %r766}, [%r35]; 2026-02-21T08:52:45.7385753Z // begin inline asm 2026-02-21T08:52:45.7385870Z st.global.v4.b32 [ %rd97 + 0 ], { %r763, %r764, %r765, %r766 }; 2026-02-21T08:52:45.7385928Z // end inline asm 2026-02-21T08:52:45.7386137Z .loc 1 19 144 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:19:144 2026-02-21T08:52:45.7386209Z add.s32 %r142, %r827, 4224; 2026-02-21T08:52:45.7386277Z setp.lt.s32 %p26, %r827, -4168; 2026-02-21T08:52:45.7386432Z mov.b32 %r827, %r142; 2026-02-21T08:52:45.7386638Z @%p26 bra $L__BB0_13; 2026-02-21T08:52:45.7386731Z $L__BB0_16: // %._crit_edge 2026-02-21T08:52:45.7386940Z .loc 1 19 4 // c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py:19:4 2026-02-21T08:52:45.7386996Z ret; 2026-02-21T08:52:45.7387059Z $L__tmp11: 
2026-02-21T08:52:45.7387117Z $L__func_end0: 2026-02-21T08:52:45.7387206Z // -- End function 2026-02-21T08:52:45.7387268Z } 2026-02-21T08:52:45.7387525Z .file 1 "/tmp/torchinductor_root/3x/c3x6t6wvki3z33smkdohpl6gws4pcmecadohru4j5vqeipzxqnlr.py" 2026-02-21T08:52:45.7387741Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T08:52:45.7387822Z .section .debug_abbrev 2026-02-21T08:52:45.7387876Z { 2026-02-21T08:52:45.7387973Z .b8 1 // Abbreviation Code 2026-02-21T08:52:45.7388072Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:52:45.7388164Z .b8 1 // DW_CHILDREN_yes 2026-02-21T08:52:45.7388251Z .b8 37 // DW_AT_producer 2026-02-21T08:52:45.7388334Z .b8 8 // DW_FORM_string 2026-02-21T08:52:45.7388419Z .b8 19 // DW_AT_language 2026-02-21T08:52:45.7388559Z .b8 5 // DW_FORM_data2 2026-02-21T08:52:45.7388654Z .b8 3 // DW_AT_name 2026-02-21T08:52:45.7388741Z .b8 8 // DW_FORM_string 2026-02-21T08:52:45.7388826Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:52:45.7388907Z .b8 6 // DW_FORM_data4 2026-02-21T08:52:45.7388987Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:52:45.7389073Z .b8 8 // DW_FORM_string 2026-02-21T08:52:45.7389153Z .b8 0 // EOM(1) 2026-02-21T08:52:45.7389225Z .b8 0 // EOM(2) 2026-02-21T08:52:45.7389321Z .b8 2 // Abbreviation Code 2026-02-21T08:52:45.7389412Z .b8 46 // DW_TAG_subprogram 2026-02-21T08:52:45.7389493Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:52:45.7389578Z .b8 3 // DW_AT_name 2026-02-21T08:52:45.7389658Z .b8 8 // DW_FORM_string 2026-02-21T08:52:45.7389742Z .b8 32 // DW_AT_inline 2026-02-21T08:52:45.7389839Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:45.7389913Z .b8 0 // EOM(1) 2026-02-21T08:52:45.7389985Z .b8 0 // EOM(2) 2026-02-21T08:52:45.7390074Z .b8 3 // Abbreviation Code 2026-02-21T08:52:45.7390172Z .b8 46 // DW_TAG_subprogram 2026-02-21T08:52:45.7390258Z .b8 1 // DW_CHILDREN_yes 2026-02-21T08:52:45.7390341Z .b8 17 // DW_AT_low_pc 2026-02-21T08:52:45.7390574Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:45.7390659Z 
.b8 18 // DW_AT_high_pc 2026-02-21T08:52:45.7390738Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:45.7390838Z .b8 49 // DW_AT_abstract_origin 2026-02-21T08:52:45.7390918Z .b8 19 // DW_FORM_ref4 2026-02-21T08:52:45.7391001Z .b8 0 // EOM(1) 2026-02-21T08:52:45.7391075Z .b8 0 // EOM(2) 2026-02-21T08:52:45.7391170Z .b8 4 // Abbreviation Code 2026-02-21T08:52:45.7391277Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T08:52:45.7391477Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:52:45.7391581Z .b8 49 // DW_AT_abstract_origin 2026-02-21T08:52:45.7391661Z .b8 19 // DW_FORM_ref4 2026-02-21T08:52:45.7391744Z .b8 17 // DW_AT_low_pc 2026-02-21T08:52:45.7391825Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:45.7391923Z .b8 18 // DW_AT_high_pc 2026-02-21T08:52:45.7392004Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:45.7392088Z .b8 88 // DW_AT_call_file 2026-02-21T08:52:45.7392175Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:45.7392260Z .b8 89 // DW_AT_call_line 2026-02-21T08:52:45.7392339Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:45.7392434Z .b8 87 // DW_AT_call_column 2026-02-21T08:52:45.7392517Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:45.7392590Z .b8 0 // EOM(1) 2026-02-21T08:52:45.7392668Z .b8 0 // EOM(2) 2026-02-21T08:52:45.7392742Z .b8 0 // EOM(3) 2026-02-21T08:52:45.7392796Z } 2026-02-21T08:52:45.7392862Z .section .debug_info 2026-02-21T08:52:45.7392925Z { 2026-02-21T08:52:45.7393015Z .b32 178 // Length of Unit 2026-02-21T08:52:45.7393110Z .b8 2 // DWARF version number 2026-02-21T08:52:45.7393169Z .b8 0 2026-02-21T08:52:45.7393301Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:52:45.7393399Z .b8 8 // Address Size (in bytes) 2026-02-21T08:52:45.7393520Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T08:52:45.7393609Z .b8 116 // DW_AT_producer 2026-02-21T08:52:45.7393665Z .b8 114 2026-02-21T08:52:45.7393718Z .b8 105 2026-02-21T08:52:45.7393775Z .b8 116 2026-02-21T08:52:45.7393829Z .b8 111 2026-02-21T08:52:45.7393884Z .b8 110 2026-02-21T08:52:45.7393939Z .b8 0 2026-02-21T08:52:45.7394019Z .b8 2 // DW_AT_language 2026-02-21T08:52:45.7394071Z .b8 0 2026-02-21T08:52:45.7394152Z .b8 99 // DW_AT_name 2026-02-21T08:52:45.7394208Z .b8 51 2026-02-21T08:52:45.7394261Z .b8 120 2026-02-21T08:52:45.7394313Z .b8 54 2026-02-21T08:52:45.7394371Z .b8 116 2026-02-21T08:52:45.7394422Z .b8 54 2026-02-21T08:52:45.7394473Z .b8 119 2026-02-21T08:52:45.7394526Z .b8 118 2026-02-21T08:52:45.7394596Z .b8 107 2026-02-21T08:52:45.7394650Z .b8 105 2026-02-21T08:52:45.7394703Z .b8 51 2026-02-21T08:52:45.7394762Z .b8 122 2026-02-21T08:52:45.7394813Z .b8 51 2026-02-21T08:52:45.7394865Z .b8 51 2026-02-21T08:52:45.7394916Z .b8 115 2026-02-21T08:52:45.7394974Z .b8 109 2026-02-21T08:52:45.7395029Z .b8 107 2026-02-21T08:52:45.7395082Z .b8 100 2026-02-21T08:52:45.7395134Z .b8 111 2026-02-21T08:52:45.7395192Z .b8 104 2026-02-21T08:52:45.7395244Z .b8 112 2026-02-21T08:52:45.7395295Z .b8 108 2026-02-21T08:52:45.7395460Z .b8 54 2026-02-21T08:52:45.7395512Z .b8 103 2026-02-21T08:52:45.7395564Z .b8 119 2026-02-21T08:52:45.7395617Z .b8 115 2026-02-21T08:52:45.7395674Z .b8 52 2026-02-21T08:52:45.7395726Z .b8 112 2026-02-21T08:52:45.7395779Z .b8 99 2026-02-21T08:52:45.7395836Z .b8 109 2026-02-21T08:52:45.7395890Z .b8 101 2026-02-21T08:52:45.7395941Z .b8 99 2026-02-21T08:52:45.7395994Z .b8 97 2026-02-21T08:52:45.7396064Z .b8 100 2026-02-21T08:52:45.7396121Z .b8 111 2026-02-21T08:52:45.7396174Z .b8 104 2026-02-21T08:52:45.7396226Z .b8 114 2026-02-21T08:52:45.7396285Z .b8 117 2026-02-21T08:52:45.7396337Z .b8 52 2026-02-21T08:52:45.7396391Z .b8 106 
2026-02-21T08:52:45.7396562Z .b8 53 2026-02-21T08:52:45.7396619Z .b8 118 2026-02-21T08:52:45.7396674Z .b8 113 2026-02-21T08:52:45.7396726Z .b8 101 2026-02-21T08:52:45.7396932Z .b8 105 2026-02-21T08:52:45.7396999Z .b8 112 2026-02-21T08:52:45.7397053Z .b8 122 2026-02-21T08:52:45.7397109Z .b8 120 2026-02-21T08:52:45.7397164Z .b8 113 2026-02-21T08:52:45.7397217Z .b8 110 2026-02-21T08:52:45.7397276Z .b8 108 2026-02-21T08:52:45.7397338Z .b8 114 2026-02-21T08:52:45.7397390Z .b8 46 2026-02-21T08:52:45.7397443Z .b8 112 2026-02-21T08:52:45.7397502Z .b8 121 2026-02-21T08:52:45.7397554Z .b8 0 2026-02-21T08:52:45.7397663Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:52:45.7397749Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:52:45.7397810Z .b8 116 2026-02-21T08:52:45.7397864Z .b8 109 2026-02-21T08:52:45.7397916Z .b8 112 2026-02-21T08:52:45.7397977Z .b8 47 2026-02-21T08:52:45.7398031Z .b8 116 2026-02-21T08:52:45.7398084Z .b8 111 2026-02-21T08:52:45.7398139Z .b8 114 2026-02-21T08:52:45.7398197Z .b8 99 2026-02-21T08:52:45.7398251Z .b8 104 2026-02-21T08:52:45.7398303Z .b8 105 2026-02-21T08:52:45.7398357Z .b8 110 2026-02-21T08:52:45.7398419Z .b8 100 2026-02-21T08:52:45.7398475Z .b8 117 2026-02-21T08:52:45.7398529Z .b8 99 2026-02-21T08:52:45.7398588Z .b8 116 2026-02-21T08:52:45.7398640Z .b8 111 2026-02-21T08:52:45.7398694Z .b8 114 2026-02-21T08:52:45.7398752Z .b8 95 2026-02-21T08:52:45.7398812Z .b8 114 2026-02-21T08:52:45.7398864Z .b8 111 2026-02-21T08:52:45.7398918Z .b8 111 2026-02-21T08:52:45.7398977Z .b8 116 2026-02-21T08:52:45.7399029Z .b8 47 2026-02-21T08:52:45.7399081Z .b8 51 2026-02-21T08:52:45.7399135Z .b8 120 2026-02-21T08:52:45.7399193Z .b8 0 2026-02-21T08:52:45.7399320Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T08:52:45.7399405Z .b8 95 // DW_AT_name 2026-02-21T08:52:45.7399465Z .b8 104 2026-02-21T08:52:45.7399518Z .b8 101 2026-02-21T08:52:45.7399570Z .b8 108 2026-02-21T08:52:45.7399623Z .b8 105 2026-02-21T08:52:45.7399681Z .b8 111 
2026-02-21T08:52:45.7399733Z .b8 110 2026-02-21T08:52:45.7399785Z .b8 95 2026-02-21T08:52:45.7399842Z .b8 109 2026-02-21T08:52:45.7399895Z .b8 97 2026-02-21T08:52:45.7399947Z .b8 116 2026-02-21T08:52:45.7399998Z .b8 109 2026-02-21T08:52:45.7400054Z .b8 117 2026-02-21T08:52:45.7400107Z .b8 108 2026-02-21T08:52:45.7400160Z .b8 95 2026-02-21T08:52:45.7400210Z .b8 98 2026-02-21T08:52:45.7400267Z .b8 102 2026-02-21T08:52:45.7400318Z .b8 49 2026-02-21T08:52:45.7400370Z .b8 54 2026-02-21T08:52:45.7400426Z .b8 95 2026-02-21T08:52:45.7400475Z .b8 105 2026-02-21T08:52:45.7400526Z .b8 110 2026-02-21T08:52:45.7400579Z .b8 116 2026-02-21T08:52:45.7400635Z .b8 52 2026-02-21T08:52:45.7400685Z .b8 0 2026-02-21T08:52:45.7400768Z .b8 1 // DW_AT_inline 2026-02-21T08:52:45.7400881Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T08:52:45.7400977Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T08:52:45.7401077Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T08:52:45.7401182Z .b32 108 // DW_AT_abstract_origin 2026-02-21T08:52:45.7401319Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T08:52:45.7401418Z .b32 108 // DW_AT_abstract_origin 2026-02-21T08:52:45.7401663Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T08:52:45.7401767Z .b64 $L__tmp10 // DW_AT_high_pc 2026-02-21T08:52:45.7401853Z .b8 1 // DW_AT_call_file 2026-02-21T08:52:45.7401936Z .b8 84 // DW_AT_call_line 2026-02-21T08:52:45.7402031Z .b8 40 // DW_AT_call_column 2026-02-21T08:52:45.7402122Z .b8 0 // End Of Children Mark 2026-02-21T08:52:45.7402210Z .b8 0 // End Of Children Mark 2026-02-21T08:52:45.7402271Z } 2026-02-21T08:52:45.7402343Z .section .debug_macinfo { } 2026-02-21T08:52:45.7402349Z 2026-02-21T08:52:45.7402526Z ================================================================ 2026-02-21T08:52:45.7402655Z please share the reproducer above with Triton project. 
2026-02-21T08:52:46.4117630Z 2026-02-21T08:52:46.4117646Z 2026-02-21T08:52:46.4117676Z 2026-02-21T08:52:46.4118008Z ================================================================ 2026-02-21T08:52:46.4118391Z Internal Triton PTX codegen error 2026-02-21T08:52:46.4118654Z `ptxas` stderr: 2026-02-21T08:52:46.4119382Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 674 in function _helion_matmul_bf16_int4. Try to compile with register target of 62 or higher. 2026-02-21T08:52:46.4120243Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:52:46.4120484Z 2026-02-21T08:52:46.4121154Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpquavh5gk.ptx -o /tmp/tmpquavh5gk.ptx.o 2026-02-21T08:52:46.4121941Z 2026-02-21T08:52:46.4121946Z 2026-02-21T08:52:46.4122025Z // 2026-02-21T08:52:46.4122217Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:52:46.4122467Z // 2026-02-21T08:52:46.4122555Z 2026-02-21T08:52:46.4122634Z .version 8.7 2026-02-21T08:52:46.4122822Z .target sm_90a 2026-02-21T08:52:46.4123006Z .address_size 64 2026-02-21T08:52:46.4123117Z 2026-02-21T08:52:46.4123341Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T08:52:46.4123780Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:52:46.4124106Z // @_helion_matmul_bf16_int4 2026-02-21T08:52:46.4124438Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T08:52:46.4124831Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T08:52:46.4125298Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T08:52:46.4125744Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T08:52:46.4126197Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T08:52:46.4126880Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 
2026-02-21T08:52:46.4127226Z ) 2026-02-21T08:52:46.4127385Z .reqntid 256 2026-02-21T08:52:46.4127554Z .maxnreg 32 2026-02-21T08:52:46.4127712Z { 2026-02-21T08:52:46.4127869Z .reg .pred %p<58>; 2026-02-21T08:52:46.4128051Z .reg .b16 %rs<777>; 2026-02-21T08:52:46.4128236Z .reg .b32 %r<3929>; 2026-02-21T08:52:46.4128406Z .reg .b64 %rd<306>; 2026-02-21T08:52:46.4128770Z .loc 1 14 0 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:14:0 2026-02-21T08:52:46.4129183Z $L__func_begin0: 2026-02-21T08:52:46.4129537Z .loc 1 14 0 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:14:0 2026-02-21T08:52:46.4129880Z 2026-02-21T08:52:46.4129949Z // %bb.0: 2026-02-21T08:52:46.4130170Z ld.param.b64 %rd55, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T08:52:46.4130529Z ld.param.b64 %rd54, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T08:52:46.4130869Z ld.param.b64 %rd53, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T08:52:46.4131165Z $L__tmp0: 2026-02-21T08:52:46.4131498Z .loc 1 19 46 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:19:46 2026-02-21T08:52:46.4132329Z mov.u32 %r3784, %ctaid.x; 2026-02-21T08:52:46.4132695Z .loc 1 0 0 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:0 2026-02-21T08:52:46.4133097Z sub.s32 %r418, 4279, %r3784; 2026-02-21T08:52:46.4133492Z .loc 1 19 112 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:19:112 2026-02-21T08:52:46.4133927Z mul.hi.u32 %r419, %r418, 1041204193; 2026-02-21T08:52:46.4134165Z shr.u32 %r420, %r419, 10; 2026-02-21T08:52:46.4134365Z mul.hi.u32 %r421, %r420, 1431655766; 2026-02-21T08:52:46.4134608Z mad.lo.s32 %r3893, %r421, 12672, %r3784; 2026-02-21T08:52:46.4135035Z .loc 1 31 45 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:31:45 2026-02-21T08:52:46.4135660Z mov.u32 %r3, %tid.x; 2026-02-21T08:52:46.4135844Z shr.u32 %r4, %r3, 5; 2026-02-21T08:52:46.4136003Z shr.u32 %r422, %r3, 4; 2026-02-21T08:52:46.4136180Z bfe.u32 %r5, %r3, 4, 4; 2026-02-21T08:52:46.4136352Z or.b32 %r6, 
%r5, 16; 2026-02-21T08:52:46.4136700Z or.b32 %r7, %r5, 32; 2026-02-21T08:52:46.4136856Z or.b32 %r8, %r422, 48; 2026-02-21T08:52:46.4137024Z and.b32 %r423, %r3, 15; 2026-02-21T08:52:46.4137188Z shl.b32 %r9, %r423, 2; 2026-02-21T08:52:46.4137504Z .loc 1 33 45 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:33:45 2026-02-21T08:52:46.4137862Z shl.b32 %r424, %r3, 4; 2026-02-21T08:52:46.4138024Z and.b32 %r10, %r424, 112; 2026-02-21T08:52:46.4138217Z shl.b32 %r11, %r423, 3; 2026-02-21T08:52:46.4138530Z .loc 1 41 48 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:41:48 2026-02-21T08:52:46.4138895Z bfe.u32 %r12, %r3, 3, 5; 2026-02-21T08:52:46.4139213Z .loc 1 65 38 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:65:38 2026-02-21T08:52:46.4139584Z and.b32 %r13, %r3, 128; 2026-02-21T08:52:46.4139907Z .loc 1 19 112 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:19:112 2026-02-21T08:52:46.4140280Z setp.ge.s32 %p1, %r3784, %r3893; 2026-02-21T08:52:46.4140484Z and.b32 %r3767, %r3, 255; 2026-02-21T08:52:46.4140650Z shr.u32 %r3768, %r3, 1; 2026-02-21T08:52:46.4140821Z mov.b32 %r3769, global_smem; 2026-02-21T08:52:46.4141001Z mul.lo.s32 %r3770, %r12, 7168; 2026-02-21T08:52:46.4141191Z shl.b32 %r3771, %r3, 6; 2026-02-21T08:52:46.4141355Z shl.b32 %r3772, %r3, 5; 2026-02-21T08:52:46.4141519Z shl.b32 %r3773, %r3, 1; 2026-02-21T08:52:46.4141684Z and.b32 %r3774, %r3, 127; 2026-02-21T08:52:46.4141852Z or.b32 %r3775, %r3, 896; 2026-02-21T08:52:46.4142025Z or.b32 %r3776, %r3, 1920; 2026-02-21T08:52:46.4142189Z or.b32 %r3777, %r3, 2944; 2026-02-21T08:52:46.4142360Z or.b32 %r3778, %r3, 3968; 2026-02-21T08:52:46.4142527Z shr.u32 %r3779, %r13, 5; 2026-02-21T08:52:46.4142697Z and.b32 %r3780, %r3, 3; 2026-02-21T08:52:46.4142855Z and.b32 %r3781, %r3, 24; 2026-02-21T08:52:46.4143039Z shl.b32 %r3782, %r3, 2; 2026-02-21T08:52:46.4143208Z shr.u32 %r3783, %r13, 1; 2026-02-21T08:52:46.4143382Z cvt.u64.u32 %rd288, %r9; 2026-02-21T08:52:46.4143562Z setp.eq.b32 
%p57, %r13, 0; 2026-02-21T08:52:46.4144209Z [83s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:52:46.4145780Z Config: @helion.kernel(config=helion.Config(block_sizes=[32, 64, 128], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[2], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=32, num_stages=7, num_warps=8, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[False, False], range_num_stages=[3, 3], range_unroll_factors=[3, 0], range_warp_specializes=[]), static_shapes=True) 2026-02-21T08:52:46.4147398Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:52:46.4147698Z `ptxas` stderr: 2026-02-21T08:52:46.4148262Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 674 in function _helion_matmul_bf16_int4. Try to compile with register target of 62 or higher. 2026-02-21T08:52:46.4149192Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:52:46.4149383Z 2026-02-21T08:52:46.4149899Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpquavh5gk.ptx -o /tmp/tmpquavh5gk.ptx.o 2026-02-21T08:52:46.4150489Z 2026-02-21T08:52:46.4150651Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:52:46.4150951Z @%p1 bra $L__BB0_9; 2026-02-21T08:52:46.4151145Z // %bb.1: // %.lr.ph 2026-02-21T08:52:46.4151523Z .loc 1 0 112 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:0:112 2026-02-21T08:52:46.4152058Z shl.b32 %r426, %r3767, 3; 2026-02-21T08:52:46.4152254Z and.b32 %r428, %r3768, 56; 2026-02-21T08:52:46.4152434Z xor.b32 %r14, %r426, %r428; 2026-02-21T08:52:46.4152627Z add.s32 %r430, %r3769, %r14; 2026-02-21T08:52:46.4152807Z add.s32 %r15, %r430, 32768; 2026-02-21T08:52:46.4152989Z add.s32 %r16, %r430, 34816; 2026-02-21T08:52:46.4153158Z add.s32 %r17, %r430, 36864; 2026-02-21T08:52:46.4153332Z add.s32 %r18, %r430, 38912; 2026-02-21T08:52:46.4153511Z shl.b32 %r431, %r3767, 4; 2026-02-21T08:52:46.4153691Z add.s32 %r432, %r3769, %r431; 2026-02-21T08:52:46.4153870Z add.s32 %r20, %r432, 49152; 2026-02-21T08:52:46.4154055Z add.s32 %r21, %r430, 40960; 2026-02-21T08:52:46.4154232Z add.s32 %r22, %r430, 43008; 2026-02-21T08:52:46.4154399Z add.s32 %r23, %r430, 45056; 2026-02-21T08:52:46.4154575Z add.s32 %r24, %r430, 47104; 2026-02-21T08:52:46.4154747Z add.s32 %r25, %r432, 53248; 2026-02-21T08:52:46.4154924Z and.b32 %r434, %r3771, 6144; 2026-02-21T08:52:46.4155103Z and.b32 %r436, %r3772, 896; 2026-02-21T08:52:46.4155287Z and.b32 %r438, %r3773, 62; 2026-02-21T08:52:46.4155458Z or.b32 %r439, %r434, %r436; 2026-02-21T08:52:46.4155644Z or.b32 %r26, %r439, %r438; 2026-02-21T08:52:46.4155821Z xor.b32 %r27, %r26, 8; 2026-02-21T08:52:46.4155985Z xor.b32 %r28, %r26, 16; 2026-02-21T08:52:46.4156154Z xor.b32 %r29, %r26, 24; 2026-02-21T08:52:46.4156319Z xor.b32 %r30, %r26, 32; 2026-02-21T08:52:46.4156657Z xor.b32 %r31, %r26, 40; 2026-02-21T08:52:46.4156909Z xor.b32 %r32, %r26, 48; 2026-02-21T08:52:46.4157076Z xor.b32 %r33, %r26, 56; 2026-02-21T08:52:46.4157253Z shl.b32 %r440, %r3774, 7; 2026-02-21T08:52:46.4157429Z or.b32 %r442, %r3779, %r440; 2026-02-21T08:52:46.4157607Z or.b32 %r443, %r442, %r10; 2026-02-21T08:52:46.4157775Z add.s32 %r39, %r3769, %r443; 
2026-02-21T08:52:46.4157952Z xor.b32 %r444, %r443, 16; 2026-02-21T08:52:46.4158119Z add.s32 %r40, %r3769, %r444; 2026-02-21T08:52:46.4158294Z xor.b32 %r445, %r443, 32; 2026-02-21T08:52:46.4158473Z add.s32 %r41, %r3769, %r445; 2026-02-21T08:52:46.4158650Z xor.b32 %r446, %r443, 48; 2026-02-21T08:52:46.4158824Z add.s32 %r42, %r3769, %r446; 2026-02-21T08:52:46.4158995Z xor.b32 %r447, %r443, 64; 2026-02-21T08:52:46.4159169Z add.s32 %r43, %r3769, %r447; 2026-02-21T08:52:46.4159341Z xor.b32 %r448, %r443, 80; 2026-02-21T08:52:46.4159512Z add.s32 %r44, %r3769, %r448; 2026-02-21T08:52:46.4159682Z xor.b32 %r449, %r443, 96; 2026-02-21T08:52:46.4159854Z add.s32 %r45, %r3769, %r449; 2026-02-21T08:52:46.4160022Z xor.b32 %r450, %r443, 112; 2026-02-21T08:52:46.4160197Z add.s32 %r46, %r3769, %r450; 2026-02-21T08:52:46.4160385Z shl.b32 %r452, %r3780, 12; 2026-02-21T08:52:46.4160556Z and.b32 %r453, %r3772, 3168; 2026-02-21T08:52:46.4160736Z shl.b32 %r455, %r3781, 4; 2026-02-21T08:52:46.4160898Z and.b32 %r457, %r3782, 16; 2026-02-21T08:52:46.4161071Z or.b32 %r459, %r452, %r457; 2026-02-21T08:52:46.4161242Z or.b32 %r460, %r453, %r455; 2026-02-21T08:52:46.4161426Z xor.b32 %r461, %r460, %r3783; 2026-02-21T08:52:46.4161607Z or.b32 %r462, %r461, %r459; 2026-02-21T08:52:46.4161789Z add.s32 %r47, %r3769, %r462; 2026-02-21T08:52:46.4161960Z xor.b32 %r463, %r462, 32; 2026-02-21T08:52:46.4162307Z add.s32 %r48, %r3769, %r463; 2026-02-21T08:52:46.4162490Z shl.b32 %r464, %r3781, 9; 2026-02-21T08:52:46.4162655Z shl.b32 %r465, %r3780, 5; 2026-02-21T08:52:46.4162825Z and.b32 %r466, %r3782, 1008; 2026-02-21T08:52:46.4162998Z or.b32 %r467, %r464, %r465; 2026-02-21T08:52:46.4163171Z xor.b32 %r468, %r467, %r466; 2026-02-21T08:52:46.4163356Z add.s32 %r1230, %r3769, %r468; 2026-02-21T08:52:46.4163546Z add.s32 %r1235, %r1230, 1024; 2026-02-21T08:52:46.4163719Z add.s32 %r1240, %r1230, 2048; 2026-02-21T08:52:46.4163898Z add.s32 %r1245, %r1230, 3072; 2026-02-21T08:52:46.4164242Z .loc 1 40 103 // 
cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:40:103 2026-02-21T08:52:46.4164616Z or.b32 %r469, %r3770, %r10; 2026-02-21T08:52:46.4164951Z add.s32 %r53, %r469, 458752; 2026-02-21T08:52:46.4165297Z .loc 1 19 112 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:19:112 2026-02-21T08:52:46.4165673Z mad.wide.u32 %rd56, %r423, 8, %rd53; 2026-02-21T08:52:46.4165879Z add.s64 %rd1, %rd56, 256; 2026-02-21T08:52:46.4166054Z shl.b32 %r56, %r5, 13; 2026-02-21T08:52:46.4166218Z shl.b32 %r471, %r8, 13; 2026-02-21T08:52:46.4166683Z .loc 1 40 103 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:40:103 2026-02-21T08:52:46.4167050Z or.b32 %r472, %r471, %r9; 2026-02-21T08:52:46.4167217Z or.b32 %r57, %r472, 128; 2026-02-21T08:52:46.4167391Z cvt.u64.u32 %rd5, %r3770; 2026-02-21T08:52:46.4167610Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T08:52:46.4167913Z // Child Loop BB0_3 Depth 2 2026-02-21T08:52:46.4168181Z // Child Loop BB0_5 Depth 2 2026-02-21T08:52:46.4168445Z // Child Loop BB0_7 Depth 2 2026-02-21T08:52:46.4168824Z .loc 1 25 35 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:25:35 2026-02-21T08:52:46.4169206Z mul.hi.s32 %r496, %r3784, -1840700269; 2026-02-21T08:52:46.4169422Z add.s32 %r497, %r496, %r3784; 2026-02-21T08:52:46.4169598Z shr.u32 %r498, %r497, 31; 2026-02-21T08:52:46.4169772Z shr.s32 %r499, %r497, 6; 2026-02-21T08:52:46.4169937Z add.s32 %r500, %r499, %r498; 2026-02-21T08:52:46.4170265Z .loc 1 26 33 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:26:33 2026-02-21T08:52:46.4170625Z shl.b32 %r501, %r500, 1; 2026-02-21T08:52:46.4170933Z .loc 1 27 39 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:27:39 2026-02-21T08:52:46.4171294Z sub.s32 %r502, 1, %r501; 2026-02-21T08:52:46.4171607Z .loc 1 27 52 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:27:52 2026-02-21T08:52:46.4171965Z min.s32 %r503, %r502, 2; 2026-02-21T08:52:46.4172269Z .loc 1 28 45 // 
cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:28:45 2026-02-21T08:52:46.4172636Z mul.lo.s32 %r504, %r500, 112; 2026-02-21T08:52:46.4172821Z sub.s32 %r505, %r3784, %r504; 2026-02-21T08:52:46.4173135Z .loc 1 29 51 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:29:51 2026-02-21T08:52:46.4173490Z div.s32 %r506, %r505, %r503; 2026-02-21T08:52:46.4173807Z .loc 1 28 64 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:28:64 2026-02-21T08:52:46.4174170Z mul.lo.s32 %r507, %r506, %r503; 2026-02-21T08:52:46.4174359Z sub.s32 %r508, %r505, %r507; 2026-02-21T08:52:46.4174678Z .loc 1 28 30 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:28:30 2026-02-21T08:52:46.4175034Z add.s32 %r509, %r508, %r501; 2026-02-21T08:52:46.4175356Z .loc 1 30 27 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:30:27 2026-02-21T08:52:46.4175711Z shl.b32 %r510, %r509, 6; 2026-02-21T08:52:46.4176030Z .loc 1 31 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:31:32 2026-02-21T08:52:46.4176696Z or.b32 %r102, %r510, %r5; 2026-02-21T08:52:46.4176876Z or.b32 %r103, %r510, %r6; 2026-02-21T08:52:46.4177052Z or.b32 %r104, %r510, %r7; 2026-02-21T08:52:46.4177220Z or.b32 %r105, %r510, %r8; 2026-02-21T08:52:46.4177544Z .loc 1 32 27 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:32:27 2026-02-21T08:52:46.4177900Z shl.b32 %r106, %r506, 7; 2026-02-21T08:52:46.4178205Z .loc 1 33 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:33:32 2026-02-21T08:52:46.4178563Z or.b32 %r511, %r106, %r10; 2026-02-21T08:52:46.4178871Z .loc 1 48 53 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:53 2026-02-21T08:52:46.4179222Z shl.b32 %r512, %r102, 13; 2026-02-21T08:52:46.4179560Z shl.b32 %r513, %r103, 13; 2026-02-21T08:52:46.4179751Z shl.b32 %r514, %r104, 13; 2026-02-21T08:52:46.4179918Z shl.b32 %r515, %r105, 13; 2026-02-21T08:52:46.4180226Z .loc 1 48 60 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:60 
2026-02-21T08:52:46.4180579Z or.b32 %r516, %r512, %r9; 2026-02-21T08:52:46.4180743Z or.b32 %r517, %r513, %r9; 2026-02-21T08:52:46.4180910Z or.b32 %r518, %r514, %r9; 2026-02-21T08:52:46.4181071Z or.b32 %r519, %r515, %r9; 2026-02-21T08:52:46.4181382Z .loc 1 48 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:32 2026-02-21T08:52:46.4181736Z mad.wide.s32 %rd57, %r516, 2, %rd53; 2026-02-21T08:52:46.4181953Z mad.wide.s32 %rd58, %r517, 2, %rd53; 2026-02-21T08:52:46.4182157Z mad.wide.s32 %rd59, %r518, 2, %rd53; 2026-02-21T08:52:46.4182363Z mad.wide.s32 %rd60, %r519, 2, %rd53; 2026-02-21T08:52:46.4182564Z mov.b32 %r474, 8; 2026-02-21T08:52:46.4182869Z .loc 1 48 80 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:80 2026-02-21T08:52:46.4183223Z // begin inline asm 2026-02-21T08:52:46.4183458Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd57 + 0 ], 0x8, %r474; 2026-02-21T08:52:46.4183751Z // end inline asm 2026-02-21T08:52:46.4183910Z // begin inline asm 2026-02-21T08:52:46.4184134Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd58 + 0 ], 0x8, %r474; 2026-02-21T08:52:46.4184403Z // end inline asm 2026-02-21T08:52:46.4184548Z // begin inline asm 2026-02-21T08:52:46.4184773Z cp.async.ca.shared.global [ %r17 + 0 ], [ %rd59 + 0 ], 0x8, %r474; 2026-02-21T08:52:46.4185047Z // end inline asm 2026-02-21T08:52:46.4185198Z // begin inline asm 2026-02-21T08:52:46.4185415Z cp.async.ca.shared.global [ %r18 + 0 ], [ %rd60 + 0 ], 0x8, %r474; 2026-02-21T08:52:46.4185683Z // end inline asm 2026-02-21T08:52:46.4185834Z cp.async.commit_group; 2026-02-21T08:52:46.4186157Z .loc 1 54 62 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:62 2026-02-21T08:52:46.4186651Z add.s32 %r520, %r511, %r3770; 2026-02-21T08:52:46.4186974Z .loc 1 54 34 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:34 2026-02-21T08:52:46.4187331Z cvt.s64.s32 %rd68, %r520; 2026-02-21T08:52:46.4187505Z add.s64 %rd61, %rd54, %rd68; 2026-02-21T08:52:46.4187687Z mov.b32 %r482, 16; 
2026-02-21T08:52:46.4187985Z .loc 1 54 87 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:87 2026-02-21T08:52:46.4188342Z // begin inline asm 2026-02-21T08:52:46.4188653Z cp.async.cg.shared.global [ %r20 + 0 ], [ %rd61 + 0 ], 0x10, %r482; 2026-02-21T08:52:46.4188937Z // end inline asm 2026-02-21T08:52:46.4189101Z cp.async.commit_group; 2026-02-21T08:52:46.4189410Z .loc 1 48 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:32 2026-02-21T08:52:46.4189767Z cvt.s64.s32 %rd69, %r512; 2026-02-21T08:52:46.4189936Z or.b64 %rd70, %rd69, %rd288; 2026-02-21T08:52:46.4190128Z shl.b64 %rd71, %rd70, 1; 2026-02-21T08:52:46.4190298Z add.s64 %rd72, %rd53, %rd71; 2026-02-21T08:52:46.4190480Z add.s64 %rd62, %rd72, 128; 2026-02-21T08:52:46.4190834Z cvt.s64.s32 %rd73, %r513; 2026-02-21T08:52:46.4191001Z or.b64 %rd74, %rd73, %rd288; 2026-02-21T08:52:46.4191181Z shl.b64 %rd75, %rd74, 1; 2026-02-21T08:52:46.4191346Z add.s64 %rd76, %rd53, %rd75; 2026-02-21T08:52:46.4191523Z add.s64 %rd63, %rd76, 128; 2026-02-21T08:52:46.4191694Z cvt.s64.s32 %rd77, %r514; 2026-02-21T08:52:46.4191865Z or.b64 %rd78, %rd77, %rd288; 2026-02-21T08:52:46.4192035Z shl.b64 %rd79, %rd78, 1; 2026-02-21T08:52:46.4192205Z add.s64 %rd80, %rd53, %rd79; 2026-02-21T08:52:46.4192374Z add.s64 %rd64, %rd80, 128; 2026-02-21T08:52:46.4192547Z cvt.s64.s32 %rd81, %r515; 2026-02-21T08:52:46.4192717Z or.b64 %rd82, %rd81, %rd288; 2026-02-21T08:52:46.4192885Z shl.b64 %rd83, %rd82, 1; 2026-02-21T08:52:46.4193065Z add.s64 %rd84, %rd53, %rd83; 2026-02-21T08:52:46.4193365Z add.s64 %rd65, %rd84, 128; 2026-02-21T08:52:46.4193684Z .loc 1 48 80 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:80 2026-02-21T08:52:46.4194031Z bar.sync 0; 2026-02-21T08:52:46.4194190Z // begin inline asm 2026-02-21T08:52:46.4194416Z cp.async.ca.shared.global [ %r21 + 0 ], [ %rd62 + 0 ], 0x8, %r474; 2026-02-21T08:52:46.4194691Z // end inline asm 2026-02-21T08:52:46.4194851Z // begin inline asm 
2026-02-21T08:52:46.4195077Z cp.async.ca.shared.global [ %r22 + 0 ], [ %rd63 + 0 ], 0x8, %r474; 2026-02-21T08:52:46.4195359Z // end inline asm 2026-02-21T08:52:46.4195503Z // begin inline asm 2026-02-21T08:52:46.4195731Z cp.async.ca.shared.global [ %r23 + 0 ], [ %rd64 + 0 ], 0x8, %r474; 2026-02-21T08:52:46.4196016Z // end inline asm 2026-02-21T08:52:46.4196169Z // begin inline asm 2026-02-21T08:52:46.4196394Z cp.async.ca.shared.global [ %r24 + 0 ], [ %rd65 + 0 ], 0x8, %r474; 2026-02-21T08:52:46.4196824Z // end inline asm 2026-02-21T08:52:46.4196986Z cp.async.commit_group; 2026-02-21T08:52:46.4197304Z .loc 1 54 34 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:34 2026-02-21T08:52:46.4197663Z cvt.s64.s32 %rd85, %r511; 2026-02-21T08:52:46.4197847Z add.s64 %rd86, %rd85, %rd5; 2026-02-21T08:52:46.4198025Z add.s64 %rd87, %rd54, %rd86; 2026-02-21T08:52:46.4198206Z add.s64 %rd66, %rd87, 229376; 2026-02-21T08:52:46.4198526Z .loc 1 54 87 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:87 2026-02-21T08:52:46.4198881Z // begin inline asm 2026-02-21T08:52:46.4199106Z cp.async.cg.shared.global [ %r25 + 0 ], [ %rd66 + 0 ], 0x10, %r482; 2026-02-21T08:52:46.4199380Z // end inline asm 2026-02-21T08:52:46.4199546Z cp.async.commit_group; 2026-02-21T08:52:46.4199888Z .loc 1 40 103 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:40:103 2026-02-21T08:52:46.4200273Z add.s32 %r3786, %r53, %r106; 2026-02-21T08:52:46.4200452Z or.b32 %r521, %r7, %r510; 2026-02-21T08:52:46.4200631Z shl.b32 %r522, %r521, 13; 2026-02-21T08:52:46.4200811Z mad.wide.s32 %rd291, %r522, 2, %rd1; 2026-02-21T08:52:46.4201032Z or.b32 %r523, %r6, %r510; 2026-02-21T08:52:46.4201197Z shl.b32 %r524, %r523, 13; 2026-02-21T08:52:46.4201377Z mad.wide.s32 %rd290, %r524, 2, %rd1; 2026-02-21T08:52:46.4201571Z shl.b32 %r525, %r509, 19; 2026-02-21T08:52:46.4201742Z or.b32 %r526, %r56, %r525; 2026-02-21T08:52:46.4201922Z mad.wide.s32 %rd289, %r526, 2, %rd1; 2026-02-21T08:52:46.4202126Z or.b32 
%r3785, %r57, %r525; 2026-02-21T08:52:46.4202307Z mov.b32 %r3789, 0f00000000; 2026-02-21T08:52:46.4202474Z mov.b32 %r3788, 1; 2026-02-21T08:52:46.4202631Z mov.b32 %r3787, -1; 2026-02-21T08:52:46.4202788Z mov.b64 %rd292, -32; 2026-02-21T08:52:46.4202964Z mov.b32 %r3790, %r3789; 2026-02-21T08:52:46.4203130Z mov.b32 %r3791, %r3789; 2026-02-21T08:52:46.4203295Z mov.b32 %r3792, %r3789; 2026-02-21T08:52:46.4203452Z mov.b32 %r3793, %r3789; 2026-02-21T08:52:46.4203616Z mov.b32 %r3794, %r3789; 2026-02-21T08:52:46.4203782Z mov.b32 %r3795, %r3789; 2026-02-21T08:52:46.4203939Z mov.b32 %r3796, %r3789; 2026-02-21T08:52:46.4204114Z mov.b32 %r3797, %r3789; 2026-02-21T08:52:46.4204278Z mov.b32 %r3798, %r3789; 2026-02-21T08:52:46.4204587Z mov.b32 %r3799, %r3789; 2026-02-21T08:52:46.4204748Z mov.b32 %r3800, %r3789; 2026-02-21T08:52:46.4204917Z mov.b32 %r3801, %r3789; 2026-02-21T08:52:46.4205074Z mov.b32 %r3802, %r3789; 2026-02-21T08:52:46.4205251Z mov.b32 %r3803, %r3789; 2026-02-21T08:52:46.4205410Z mov.b32 %r3804, %r3789; 2026-02-21T08:52:46.4205585Z mov.b32 %r3805, %r3789; 2026-02-21T08:52:46.4205750Z mov.b32 %r3806, %r3789; 2026-02-21T08:52:46.4205910Z mov.b32 %r3807, %r3789; 2026-02-21T08:52:46.4206094Z mov.b32 %r3808, %r3789; 2026-02-21T08:52:46.4206258Z mov.b32 %r3809, %r3789; 2026-02-21T08:52:46.4206416Z mov.b32 %r3810, %r3789; 2026-02-21T08:52:46.4206735Z mov.b32 %r3811, %r3789; 2026-02-21T08:52:46.4206894Z mov.b32 %r3812, %r3789; 2026-02-21T08:52:46.4207422Z mov.b32 %r3813, %r3789; 2026-02-21T08:52:46.4207595Z mov.b32 %r3814, %r3789; 2026-02-21T08:52:46.4207776Z mov.b32 %r3815, %r3789; 2026-02-21T08:52:46.4207934Z mov.b32 %r3816, %r3789; 2026-02-21T08:52:46.4208106Z mov.b32 %r3817, %r3789; 2026-02-21T08:52:46.4208263Z mov.b32 %r3818, %r3789; 2026-02-21T08:52:46.4208426Z mov.b32 %r3819, %r3789; 2026-02-21T08:52:46.4208590Z mov.b32 %r3820, %r3789; 2026-02-21T08:52:46.4208801Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:46.4209101Z // => This Inner Loop Header: 
Depth=2 2026-02-21T08:52:46.4209355Z add.s64 %rd292, %rd292, 32; 2026-02-21T08:52:46.4209544Z setp.lt.u64 %p11, %rd292, 4032; 2026-02-21T08:52:46.4209732Z add.s32 %r1151, %r3787, 1; 2026-02-21T08:52:46.4209914Z setp.gt.s32 %p12, %r1151, 1; 2026-02-21T08:52:46.4210100Z selp.b32 %r3787, 0, %r1151, %p12; 2026-02-21T08:52:46.4210449Z .loc 1 48 80 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:80 2026-02-21T08:52:46.4210822Z cp.async.wait_group 2; 2026-02-21T08:52:46.4210992Z bar.sync 0; 2026-02-21T08:52:46.4211152Z shl.b32 %r1152, %r3787, 12; 2026-02-21T08:52:46.4211338Z shl.b32 %r1153, %r3787, 13; 2026-02-21T08:52:46.4211519Z add.s32 %r1154, %r3769, 32768; 2026-02-21T08:52:46.4211706Z add.s32 %r1155, %r1154, %r1153; 2026-02-21T08:52:46.4212047Z .loc 1 52 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:52:32 2026-02-21T08:52:46.4212415Z add.s32 %r1156, %r1155, %r26; 2026-02-21T08:52:46.4212600Z ld.shared.b16 %rs1, [%r1156]; 2026-02-21T08:52:46.4212791Z ld.shared.b16 %rs2, [%r1156+1024]; 2026-02-21T08:52:46.4212994Z ld.shared.b16 %rs3, [%r1156+64]; 2026-02-21T08:52:46.4213193Z ld.shared.b16 %rs4, [%r1156+1088]; 2026-02-21T08:52:46.4213384Z add.s32 %r1157, %r1155, %r27; 2026-02-21T08:52:46.4213567Z ld.shared.b16 %rs5, [%r1157]; 2026-02-21T08:52:46.4213746Z ld.shared.b16 %rs6, [%r1157+1024]; 2026-02-21T08:52:46.4213962Z ld.shared.b16 %rs7, [%r1157+64]; 2026-02-21T08:52:46.4214150Z ld.shared.b16 %rs8, [%r1157+1088]; 2026-02-21T08:52:46.4214345Z add.s32 %r1158, %r1155, %r28; 2026-02-21T08:52:46.4214525Z ld.shared.b16 %rs9, [%r1158]; 2026-02-21T08:52:46.4214716Z ld.shared.b16 %rs10, [%r1158+1024]; 2026-02-21T08:52:46.4214926Z ld.shared.b16 %rs11, [%r1158+64]; 2026-02-21T08:52:46.4215123Z ld.shared.b16 %rs12, [%r1158+1088]; 2026-02-21T08:52:46.4215323Z add.s32 %r1159, %r1155, %r29; 2026-02-21T08:52:46.4215503Z ld.shared.b16 %rs13, [%r1159]; 2026-02-21T08:52:46.4215708Z ld.shared.b16 %rs14, [%r1159+1024]; 2026-02-21T08:52:46.4215904Z ld.shared.b16 
%rs15, [%r1159+64]; 2026-02-21T08:52:46.4216105Z ld.shared.b16 %rs16, [%r1159+1088]; 2026-02-21T08:52:46.4216304Z add.s32 %r1160, %r1155, %r30; 2026-02-21T08:52:46.4216638Z ld.shared.b16 %rs17, [%r1160]; 2026-02-21T08:52:46.4216850Z ld.shared.b16 %rs18, [%r1160+1024]; 2026-02-21T08:52:46.4217058Z ld.shared.b16 %rs19, [%r1160+64]; 2026-02-21T08:52:46.4217268Z ld.shared.b16 %rs20, [%r1160+1088]; 2026-02-21T08:52:46.4217472Z add.s32 %r1161, %r1155, %r31; 2026-02-21T08:52:46.4217676Z ld.shared.b16 %rs21, [%r1161]; 2026-02-21T08:52:46.4217876Z ld.shared.b16 %rs22, [%r1161+1024]; 2026-02-21T08:52:46.4218248Z ld.shared.b16 %rs23, [%r1161+64]; 2026-02-21T08:52:46.4218453Z ld.shared.b16 %rs24, [%r1161+1088]; 2026-02-21T08:52:46.4218658Z add.s32 %r1162, %r1155, %r32; 2026-02-21T08:52:46.4218857Z ld.shared.b16 %rs25, [%r1162]; 2026-02-21T08:52:46.4219050Z ld.shared.b16 %rs26, [%r1162+1024]; 2026-02-21T08:52:46.4219250Z ld.shared.b16 %rs27, [%r1162+64]; 2026-02-21T08:52:46.4219444Z ld.shared.b16 %rs28, [%r1162+1088]; 2026-02-21T08:52:46.4219641Z add.s32 %r1163, %r1155, %r33; 2026-02-21T08:52:46.4219821Z ld.shared.b16 %rs29, [%r1163]; 2026-02-21T08:52:46.4220012Z ld.shared.b16 %rs30, [%r1163+1024]; 2026-02-21T08:52:46.4220206Z ld.shared.b16 %rs31, [%r1163+64]; 2026-02-21T08:52:46.4220404Z ld.shared.b16 %rs32, [%r1163+1088]; 2026-02-21T08:52:46.4220735Z cvt.f32.bf16 %r591, %rs1; 2026-02-21T08:52:46.4220931Z cvt.f32.bf16 %r592, %rs2; 2026-02-21T08:52:46.4221118Z cvt.f32.bf16 %r593, %rs5; 2026-02-21T08:52:46.4221286Z cvt.f32.bf16 %r594, %rs6; 2026-02-21T08:52:46.4221464Z cvt.f32.bf16 %r659, %rs9; 2026-02-21T08:52:46.4221634Z cvt.f32.bf16 %r660, %rs10; 2026-02-21T08:52:46.4221818Z cvt.f32.bf16 %r661, %rs13; 2026-02-21T08:52:46.4221990Z cvt.f32.bf16 %r662, %rs14; 2026-02-21T08:52:46.4222165Z cvt.f32.bf16 %r727, %rs17; 2026-02-21T08:52:46.4222333Z cvt.f32.bf16 %r728, %rs18; 2026-02-21T08:52:46.4222510Z cvt.f32.bf16 %r729, %rs21; 2026-02-21T08:52:46.4222684Z cvt.f32.bf16 %r730, %rs22; 
2026-02-21T08:52:46.4222856Z cvt.f32.bf16 %r795, %rs25; 2026-02-21T08:52:46.4223029Z cvt.f32.bf16 %r796, %rs26; 2026-02-21T08:52:46.4223214Z cvt.f32.bf16 %r797, %rs29; 2026-02-21T08:52:46.4223392Z cvt.f32.bf16 %r798, %rs30; 2026-02-21T08:52:46.4223563Z cvt.f32.bf16 %r863, %rs3; 2026-02-21T08:52:46.4223737Z cvt.f32.bf16 %r864, %rs4; 2026-02-21T08:52:46.4223907Z cvt.f32.bf16 %r865, %rs7; 2026-02-21T08:52:46.4224081Z cvt.f32.bf16 %r866, %rs8; 2026-02-21T08:52:46.4224248Z cvt.f32.bf16 %r931, %rs11; 2026-02-21T08:52:46.4224430Z cvt.f32.bf16 %r932, %rs12; 2026-02-21T08:52:46.4224605Z cvt.f32.bf16 %r933, %rs15; 2026-02-21T08:52:46.4224774Z cvt.f32.bf16 %r934, %rs16; 2026-02-21T08:52:46.4224947Z cvt.f32.bf16 %r999, %rs19; 2026-02-21T08:52:46.4225123Z cvt.f32.bf16 %r1000, %rs20; 2026-02-21T08:52:46.4225307Z cvt.f32.bf16 %r1001, %rs23; 2026-02-21T08:52:46.4225480Z cvt.f32.bf16 %r1002, %rs24; 2026-02-21T08:52:46.4225660Z cvt.f32.bf16 %r1067, %rs27; 2026-02-21T08:52:46.4225836Z cvt.f32.bf16 %r1068, %rs28; 2026-02-21T08:52:46.4226020Z cvt.f32.bf16 %r1069, %rs31; 2026-02-21T08:52:46.4226210Z cvt.f32.bf16 %r1070, %rs32; 2026-02-21T08:52:46.4226696Z .loc 1 54 87 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:87 2026-02-21T08:52:46.4227082Z add.s32 %r1164, %r3769, %r1152; 2026-02-21T08:52:46.4227287Z add.s32 %r1165, %r1164, 49152; 2026-02-21T08:52:46.4227628Z .loc 1 67 45 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:67:45 2026-02-21T08:52:46.4227994Z add.s32 %r1166, %r1165, %r3774; 2026-02-21T08:52:46.4228192Z ld.shared.b8 %rs33, [%r1166]; 2026-02-21T08:52:46.4228387Z ld.shared.b8 %rs34, [%r1166+128]; 2026-02-21T08:52:46.4228700Z ld.shared.b8 %rs35, [%r1166+256]; 2026-02-21T08:52:46.4228903Z ld.shared.b8 %rs36, [%r1166+384]; 2026-02-21T08:52:46.4229097Z ld.shared.b8 %rs37, [%r1166+512]; 2026-02-21T08:52:46.4229295Z ld.shared.b8 %rs38, [%r1166+640]; 2026-02-21T08:52:46.4229484Z ld.shared.b8 %rs39, [%r1166+768]; 2026-02-21T08:52:46.4229678Z add.s32 
%r1167, %r1165, %r3775; 2026-02-21T08:52:46.4229863Z ld.shared.b8 %rs40, [%r1167]; 2026-02-21T08:52:46.4230057Z ld.shared.b8 %rs41, [%r1166+1024]; 2026-02-21T08:52:46.4230255Z ld.shared.b8 %rs42, [%r1166+1152]; 2026-02-21T08:52:46.4230461Z ld.shared.b8 %rs43, [%r1166+1280]; 2026-02-21T08:52:46.4230667Z ld.shared.b8 %rs44, [%r1166+1408]; 2026-02-21T08:52:46.4230859Z ld.shared.b8 %rs45, [%r1166+1536]; 2026-02-21T08:52:46.4231058Z ld.shared.b8 %rs46, [%r1166+1664]; 2026-02-21T08:52:46.4231417Z ld.shared.b8 %rs47, [%r1166+1792]; 2026-02-21T08:52:46.4231617Z add.s32 %r1168, %r1165, %r3776; 2026-02-21T08:52:46.4231805Z ld.shared.b8 %rs48, [%r1168]; 2026-02-21T08:52:46.4231991Z ld.shared.b8 %rs49, [%r1166+2048]; 2026-02-21T08:52:46.4232182Z ld.shared.b8 %rs50, [%r1166+2176]; 2026-02-21T08:52:46.4232379Z ld.shared.b8 %rs51, [%r1166+2304]; 2026-02-21T08:52:46.4232576Z ld.shared.b8 %rs52, [%r1166+2432]; 2026-02-21T08:52:46.4232770Z ld.shared.b8 %rs53, [%r1166+2560]; 2026-02-21T08:52:46.4232969Z ld.shared.b8 %rs54, [%r1166+2688]; 2026-02-21T08:52:46.4233161Z ld.shared.b8 %rs55, [%r1166+2816]; 2026-02-21T08:52:46.4233357Z add.s32 %r1169, %r1165, %r3777; 2026-02-21T08:52:46.4233543Z ld.shared.b8 %rs56, [%r1169]; 2026-02-21T08:52:46.4233894Z ld.shared.b8 %rs57, [%r1166+3072]; 2026-02-21T08:52:46.4234108Z ld.shared.b8 %rs58, [%r1166+3200]; 2026-02-21T08:52:46.4234308Z ld.shared.b8 %rs59, [%r1166+3328]; 2026-02-21T08:52:46.4234499Z ld.shared.b8 %rs60, [%r1166+3456]; 2026-02-21T08:52:46.4234700Z ld.shared.b8 %rs61, [%r1166+3584]; 2026-02-21T08:52:46.4234898Z ld.shared.b8 %rs62, [%r1166+3712]; 2026-02-21T08:52:46.4235090Z ld.shared.b8 %rs63, [%r1166+3840]; 2026-02-21T08:52:46.4235311Z add.s32 %r1170, %r1165, %r3778; 2026-02-21T08:52:46.4235497Z ld.shared.b8 %rs64, [%r1170]; 2026-02-21T08:52:46.4235831Z .loc 1 57 28 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:57:28 2026-02-21T08:52:46.4236194Z shl.b16 %rs65, %rs33, 4; 2026-02-21T08:52:46.4236389Z shl.b16 %rs66, %rs34, 
4; 2026-02-21T08:52:46.4236701Z shl.b16 %rs67, %rs35, 4; 2026-02-21T08:52:46.4236874Z shl.b16 %rs68, %rs36, 4; 2026-02-21T08:52:46.4237054Z shl.b16 %rs69, %rs37, 4; 2026-02-21T08:52:46.4237225Z shl.b16 %rs70, %rs38, 4; 2026-02-21T08:52:46.4237403Z shl.b16 %rs71, %rs39, 4; 2026-02-21T08:52:46.4237571Z shl.b16 %rs72, %rs40, 4; 2026-02-21T08:52:46.4237758Z shl.b16 %rs73, %rs41, 4; 2026-02-21T08:52:46.4237925Z shl.b16 %rs74, %rs42, 4; 2026-02-21T08:52:46.4238105Z shl.b16 %rs75, %rs43, 4; 2026-02-21T08:52:46.4238273Z shl.b16 %rs76, %rs44, 4; 2026-02-21T08:52:46.4238450Z shl.b16 %rs77, %rs45, 4; 2026-02-21T08:52:46.4238620Z shl.b16 %rs78, %rs46, 4; 2026-02-21T08:52:46.4238793Z shl.b16 %rs79, %rs47, 4; 2026-02-21T08:52:46.4238977Z shl.b16 %rs80, %rs48, 4; 2026-02-21T08:52:46.4239147Z shl.b16 %rs81, %rs49, 4; 2026-02-21T08:52:46.4239321Z shl.b16 %rs82, %rs50, 4; 2026-02-21T08:52:46.4239486Z shl.b16 %rs83, %rs51, 4; 2026-02-21T08:52:46.4239653Z shl.b16 %rs84, %rs52, 4; 2026-02-21T08:52:46.4239817Z shl.b16 %rs85, %rs53, 4; 2026-02-21T08:52:46.4239984Z shl.b16 %rs86, %rs54, 4; 2026-02-21T08:52:46.4240148Z shl.b16 %rs87, %rs55, 4; 2026-02-21T08:52:46.4240316Z shl.b16 %rs88, %rs56, 4; 2026-02-21T08:52:46.4240483Z shl.b16 %rs89, %rs57, 4; 2026-02-21T08:52:46.4240655Z shl.b16 %rs90, %rs58, 4; 2026-02-21T08:52:46.4240823Z shl.b16 %rs91, %rs59, 4; 2026-02-21T08:52:46.4240986Z shl.b16 %rs92, %rs60, 4; 2026-02-21T08:52:46.4241051Z shl.b16 %rs93, %rs61, 4; 2026-02-21T08:52:46.4241117Z shl.b16 %rs94, %rs62, 4; 2026-02-21T08:52:46.4241176Z shl.b16 %rs95, %rs63, 4; 2026-02-21T08:52:46.4241236Z shl.b16 %rs96, %rs64, 4; 2026-02-21T08:52:46.4241453Z .loc 1 72 58 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:72:58 2026-02-21T08:52:46.4241526Z selp.b16 %rs97, %rs65, %rs33, %p57; 2026-02-21T08:52:46.4241588Z cvt.s16.s8 %rs98, %rs97; 2026-02-21T08:52:46.4241647Z shr.s16 %rs99, %rs98, 4; 2026-02-21T08:52:46.4241724Z selp.b16 %rs100, %rs66, %rs34, %p57; 2026-02-21T08:52:46.4241787Z 
cvt.s16.s8 %rs101, %rs100; 2026-02-21T08:52:46.4241850Z shr.s16 %rs102, %rs101, 4; 2026-02-21T08:52:46.4241923Z selp.b16 %rs103, %rs67, %rs35, %p57; 2026-02-21T08:52:46.4241985Z cvt.s16.s8 %rs104, %rs103; 2026-02-21T08:52:46.4242063Z shr.s16 %rs105, %rs104, 4; 2026-02-21T08:52:46.4242135Z selp.b16 %rs106, %rs68, %rs36, %p57; 2026-02-21T08:52:46.4242198Z cvt.s16.s8 %rs107, %rs106; 2026-02-21T08:52:46.4242407Z shr.s16 %rs108, %rs107, 4; 2026-02-21T08:52:46.4242480Z selp.b16 %rs109, %rs69, %rs37, %p57; 2026-02-21T08:52:46.4242549Z cvt.s16.s8 %rs110, %rs109; 2026-02-21T08:52:46.4242610Z shr.s16 %rs111, %rs110, 4; 2026-02-21T08:52:46.4242678Z selp.b16 %rs112, %rs70, %rs38, %p57; 2026-02-21T08:52:46.4242744Z cvt.s16.s8 %rs113, %rs112; 2026-02-21T08:52:46.4242804Z shr.s16 %rs114, %rs113, 4; 2026-02-21T08:52:46.4242871Z selp.b16 %rs115, %rs71, %rs39, %p57; 2026-02-21T08:52:46.4242932Z cvt.s16.s8 %rs116, %rs115; 2026-02-21T08:52:46.4243001Z shr.s16 %rs117, %rs116, 4; 2026-02-21T08:52:46.4243068Z selp.b16 %rs118, %rs72, %rs40, %p57; 2026-02-21T08:52:46.4243129Z cvt.s16.s8 %rs119, %rs118; 2026-02-21T08:52:46.4243195Z shr.s16 %rs120, %rs119, 4; 2026-02-21T08:52:46.4243385Z selp.b16 %rs121, %rs73, %rs41, %p57; 2026-02-21T08:52:46.4243457Z cvt.s16.s8 %rs122, %rs121; 2026-02-21T08:52:46.4243519Z shr.s16 %rs123, %rs122, 4; 2026-02-21T08:52:46.4243592Z selp.b16 %rs124, %rs74, %rs42, %p57; 2026-02-21T08:52:46.4243658Z cvt.s16.s8 %rs125, %rs124; 2026-02-21T08:52:46.4243719Z shr.s16 %rs126, %rs125, 4; 2026-02-21T08:52:46.4243794Z selp.b16 %rs127, %rs75, %rs43, %p57; 2026-02-21T08:52:46.4243856Z cvt.s16.s8 %rs128, %rs127; 2026-02-21T08:52:46.4243917Z shr.s16 %rs129, %rs128, 4; 2026-02-21T08:52:46.4243983Z selp.b16 %rs130, %rs76, %rs44, %p57; 2026-02-21T08:52:46.4244050Z cvt.s16.s8 %rs131, %rs130; 2026-02-21T08:52:46.4244111Z shr.s16 %rs132, %rs131, 4; 2026-02-21T08:52:46.4244177Z selp.b16 %rs133, %rs77, %rs45, %p57; 2026-02-21T08:52:46.4244245Z cvt.s16.s8 %rs134, %rs133; 
2026-02-21T08:52:46.4244307Z shr.s16 %rs135, %rs134, 4; 2026-02-21T08:52:46.4244373Z selp.b16 %rs136, %rs78, %rs46, %p57; 2026-02-21T08:52:46.4244440Z cvt.s16.s8 %rs137, %rs136; 2026-02-21T08:52:46.4244503Z shr.s16 %rs138, %rs137, 4; 2026-02-21T08:52:46.4244571Z selp.b16 %rs139, %rs79, %rs47, %p57; 2026-02-21T08:52:46.4244633Z cvt.s16.s8 %rs140, %rs139; 2026-02-21T08:52:46.4244700Z shr.s16 %rs141, %rs140, 4; 2026-02-21T08:52:46.4244769Z selp.b16 %rs142, %rs80, %rs48, %p57; 2026-02-21T08:52:46.4244831Z cvt.s16.s8 %rs143, %rs142; 2026-02-21T08:52:46.4244897Z shr.s16 %rs144, %rs143, 4; 2026-02-21T08:52:46.4244971Z selp.b16 %rs145, %rs81, %rs49, %p57; 2026-02-21T08:52:46.4245043Z cvt.s16.s8 %rs146, %rs145; 2026-02-21T08:52:46.4245104Z shr.s16 %rs147, %rs146, 4; 2026-02-21T08:52:46.4245180Z selp.b16 %rs148, %rs82, %rs50, %p57; 2026-02-21T08:52:46.4245243Z cvt.s16.s8 %rs149, %rs148; 2026-02-21T08:52:46.4245304Z shr.s16 %rs150, %rs149, 4; 2026-02-21T08:52:46.4245388Z selp.b16 %rs151, %rs83, %rs51, %p57; 2026-02-21T08:52:46.4245453Z cvt.s16.s8 %rs152, %rs151; 2026-02-21T08:52:46.4245516Z shr.s16 %rs153, %rs152, 4; 2026-02-21T08:52:46.4245583Z selp.b16 %rs154, %rs84, %rs52, %p57; 2026-02-21T08:52:46.4245654Z cvt.s16.s8 %rs155, %rs154; 2026-02-21T08:52:46.4245715Z shr.s16 %rs156, %rs155, 4; 2026-02-21T08:52:46.4245783Z selp.b16 %rs157, %rs85, %rs53, %p57; 2026-02-21T08:52:46.4245857Z cvt.s16.s8 %rs158, %rs157; 2026-02-21T08:52:46.4245919Z shr.s16 %rs159, %rs158, 4; 2026-02-21T08:52:46.4245987Z selp.b16 %rs160, %rs86, %rs54, %p57; 2026-02-21T08:52:46.4246056Z cvt.s16.s8 %rs161, %rs160; 2026-02-21T08:52:46.4246117Z shr.s16 %rs162, %rs161, 4; 2026-02-21T08:52:46.4246186Z selp.b16 %rs163, %rs87, %rs55, %p57; 2026-02-21T08:52:46.4246249Z cvt.s16.s8 %rs164, %rs163; 2026-02-21T08:52:46.4246317Z shr.s16 %rs165, %rs164, 4; 2026-02-21T08:52:46.4246383Z selp.b16 %rs166, %rs88, %rs56, %p57; 2026-02-21T08:52:46.4246446Z cvt.s16.s8 %rs167, %rs166; 2026-02-21T08:52:46.4246652Z shr.s16 
%rs168, %rs167, 4; 2026-02-21T08:52:46.4246722Z selp.b16 %rs169, %rs89, %rs57, %p57; 2026-02-21T08:52:46.4246786Z cvt.s16.s8 %rs170, %rs169; 2026-02-21T08:52:46.4246851Z shr.s16 %rs171, %rs170, 4; 2026-02-21T08:52:46.4246923Z selp.b16 %rs172, %rs90, %rs58, %p57; 2026-02-21T08:52:46.4246985Z cvt.s16.s8 %rs173, %rs172; 2026-02-21T08:52:46.4247048Z shr.s16 %rs174, %rs173, 4; 2026-02-21T08:52:46.4247267Z selp.b16 %rs175, %rs91, %rs59, %p57; 2026-02-21T08:52:46.4247329Z cvt.s16.s8 %rs176, %rs175; 2026-02-21T08:52:46.4247391Z shr.s16 %rs177, %rs176, 4; 2026-02-21T08:52:46.4247459Z selp.b16 %rs178, %rs92, %rs60, %p57; 2026-02-21T08:52:46.4247526Z cvt.s16.s8 %rs179, %rs178; 2026-02-21T08:52:46.4247588Z shr.s16 %rs180, %rs179, 4; 2026-02-21T08:52:46.4247654Z selp.b16 %rs181, %rs93, %rs61, %p57; 2026-02-21T08:52:46.4247721Z cvt.s16.s8 %rs182, %rs181; 2026-02-21T08:52:46.4247781Z shr.s16 %rs183, %rs182, 4; 2026-02-21T08:52:46.4247847Z selp.b16 %rs184, %rs94, %rs62, %p57; 2026-02-21T08:52:46.4247911Z cvt.s16.s8 %rs185, %rs184; 2026-02-21T08:52:46.4247978Z shr.s16 %rs186, %rs185, 4; 2026-02-21T08:52:46.4248043Z selp.b16 %rs187, %rs95, %rs63, %p57; 2026-02-21T08:52:46.4248232Z cvt.s16.s8 %rs188, %rs187; 2026-02-21T08:52:46.4248306Z shr.s16 %rs189, %rs188, 4; 2026-02-21T08:52:46.4248373Z selp.b16 %rs190, %rs96, %rs64, %p57; 2026-02-21T08:52:46.4248433Z cvt.s16.s8 %rs191, %rs190; 2026-02-21T08:52:46.4248506Z shr.s16 %rs192, %rs191, 4; 2026-02-21T08:52:46.4248719Z .loc 1 77 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:77:32 2026-02-21T08:52:46.4248785Z cvt.rn.f32.s16 %r1171, %rs99; 2026-02-21T08:52:46.4248851Z cvt.rn.f32.s16 %r1172, %rs102; 2026-02-21T08:52:46.4248920Z cvt.rn.f32.s16 %r1173, %rs105; 2026-02-21T08:52:46.4248995Z cvt.rn.f32.s16 %r1174, %rs108; 2026-02-21T08:52:46.4249060Z cvt.rn.f32.s16 %r1175, %rs111; 2026-02-21T08:52:46.4249131Z cvt.rn.f32.s16 %r1176, %rs114; 2026-02-21T08:52:46.4249197Z cvt.rn.f32.s16 %r1177, %rs117; 2026-02-21T08:52:46.4249259Z 
cvt.rn.f32.s16 %r1178, %rs120; 2026-02-21T08:52:46.4249321Z cvt.rn.f32.s16 %r1179, %rs123; 2026-02-21T08:52:46.4249389Z cvt.rn.f32.s16 %r1180, %rs126; 2026-02-21T08:52:46.4249456Z cvt.rn.f32.s16 %r1181, %rs129; 2026-02-21T08:52:46.4249518Z cvt.rn.f32.s16 %r1182, %rs132; 2026-02-21T08:52:46.4249587Z cvt.rn.f32.s16 %r1183, %rs135; 2026-02-21T08:52:46.4249653Z cvt.rn.f32.s16 %r1184, %rs138; 2026-02-21T08:52:46.4249716Z cvt.rn.f32.s16 %r1185, %rs141; 2026-02-21T08:52:46.4249784Z cvt.rn.f32.s16 %r1186, %rs144; 2026-02-21T08:52:46.4249846Z cvt.rn.f32.s16 %r1187, %rs147; 2026-02-21T08:52:46.4249907Z cvt.rn.f32.s16 %r1188, %rs150; 2026-02-21T08:52:46.4249970Z cvt.rn.f32.s16 %r1189, %rs153; 2026-02-21T08:52:46.4250037Z cvt.rn.f32.s16 %r1190, %rs156; 2026-02-21T08:52:46.4250099Z cvt.rn.f32.s16 %r1191, %rs159; 2026-02-21T08:52:46.4250163Z cvt.rn.f32.s16 %r1192, %rs162; 2026-02-21T08:52:46.4250231Z cvt.rn.f32.s16 %r1193, %rs165; 2026-02-21T08:52:46.4250293Z cvt.rn.f32.s16 %r1194, %rs168; 2026-02-21T08:52:46.4250355Z cvt.rn.f32.s16 %r1195, %rs171; 2026-02-21T08:52:46.4250417Z cvt.rn.f32.s16 %r1196, %rs174; 2026-02-21T08:52:46.4250490Z cvt.rn.f32.s16 %r1197, %rs177; 2026-02-21T08:52:46.4250553Z cvt.rn.f32.s16 %r1198, %rs180; 2026-02-21T08:52:46.4250615Z cvt.rn.f32.s16 %r1199, %rs183; 2026-02-21T08:52:46.4250682Z cvt.rn.f32.s16 %r1200, %rs186; 2026-02-21T08:52:46.4250748Z cvt.rn.f32.s16 %r1201, %rs189; 2026-02-21T08:52:46.4250809Z cvt.rn.f32.s16 %r1202, %rs192; 2026-02-21T08:52:46.4250875Z st.shared.b32 [%r39], %r1171; 2026-02-21T08:52:46.4250948Z st.shared.b32 [%r39+8], %r1172; 2026-02-21T08:52:46.4251016Z st.shared.b32 [%r39+16384], %r1187; 2026-02-21T08:52:46.4251085Z st.shared.b32 [%r39+16392], %r1188; 2026-02-21T08:52:46.4251158Z st.shared.b32 [%r40], %r1173; 2026-02-21T08:52:46.4251223Z st.shared.b32 [%r40+8], %r1174; 2026-02-21T08:52:46.4251290Z st.shared.b32 [%r40+16384], %r1189; 2026-02-21T08:52:46.4251360Z st.shared.b32 [%r40+16392], %r1190; 2026-02-21T08:52:46.4251423Z 
st.shared.b32 [%r41], %r1175; 2026-02-21T08:52:46.4251488Z st.shared.b32 [%r41+8], %r1176; 2026-02-21T08:52:46.4251557Z st.shared.b32 [%r41+16384], %r1191; 2026-02-21T08:52:46.4251628Z st.shared.b32 [%r41+16392], %r1192; 2026-02-21T08:52:46.4251695Z st.shared.b32 [%r42], %r1177; 2026-02-21T08:52:46.4251760Z st.shared.b32 [%r42+8], %r1178; 2026-02-21T08:52:46.4251948Z st.shared.b32 [%r42+16384], %r1193; 2026-02-21T08:52:46.4252014Z st.shared.b32 [%r42+16392], %r1194; 2026-02-21T08:52:46.4252077Z st.shared.b32 [%r43], %r1179; 2026-02-21T08:52:46.4252140Z st.shared.b32 [%r43+8], %r1180; 2026-02-21T08:52:46.4252213Z st.shared.b32 [%r43+16384], %r1195; 2026-02-21T08:52:46.4252279Z st.shared.b32 [%r43+16392], %r1196; 2026-02-21T08:52:46.4252344Z st.shared.b32 [%r44], %r1181; 2026-02-21T08:52:46.4252415Z st.shared.b32 [%r44+8], %r1182; 2026-02-21T08:52:46.4252481Z st.shared.b32 [%r44+16384], %r1197; 2026-02-21T08:52:46.4252545Z st.shared.b32 [%r44+16392], %r1198; 2026-02-21T08:52:46.4252608Z st.shared.b32 [%r45], %r1183; 2026-02-21T08:52:46.4252687Z st.shared.b32 [%r45+8], %r1184; 2026-02-21T08:52:46.4252840Z st.shared.b32 [%r45+16384], %r1199; 2026-02-21T08:52:46.4252911Z st.shared.b32 [%r45+16392], %r1200; 2026-02-21T08:52:46.4252981Z st.shared.b32 [%r46], %r1185; 2026-02-21T08:52:46.4253045Z st.shared.b32 [%r46+8], %r1186; 2026-02-21T08:52:46.4253113Z st.shared.b32 [%r46+16384], %r1201; 2026-02-21T08:52:46.4253186Z st.shared.b32 [%r46+16392], %r1202; 2026-02-21T08:52:46.4253245Z $L__tmp1: 2026-02-21T08:52:46.4253533Z .loc 2 291 36 // standard.py:291:36 @[ cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:84:40 ] 2026-02-21T08:52:46.4253599Z // begin inline asm 2026-02-21T08:52:46.4253703Z fence.proxy.async.shared::cta; 2026-02-21T08:52:46.4253762Z // end inline asm 2026-02-21T08:52:46.4253821Z bar.sync 0; 2026-02-21T08:52:46.4253911Z shfl.sync.idx.b32 %r1203, %r4, 0, 31, -1; 2026-02-21T08:52:46.4253986Z wgmma.fence.sync.aligned; 2026-02-21T08:52:46.4254051Z 
shl.b32 %r1204, %r1203, 11; 2026-02-21T08:52:46.4254114Z and.b32 %r1205, %r1204, 8192; 2026-02-21T08:52:46.4254190Z add.s32 %r1206, %r1205, %r3769; 2026-02-21T08:52:46.4254253Z bfe.u32 %r1207, %r1206, 4, 14; 2026-02-21T08:52:46.4254314Z cvt.u64.u32 %rd101, %r1207; 2026-02-21T08:52:46.4254397Z or.b64 %rd88, %rd101, 4611686293372403712; 2026-02-21T08:52:46.4254466Z mov.pred %p2, -1; 2026-02-21T08:52:46.4254527Z // begin inline asm 2026-02-21T08:52:46.4255295Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3789,%r3790,%r3791,%r3792,%r3793,%r3794,%r3795,%r3796,%r3797,%r3798,%r3799,%r3800,%r3801,%r3802,%r3803,%r3804,%r3805,%r3806,%r3807,%r3808,%r3809,%r3810,%r3811,%r3812,%r3813,%r3814,%r3815,%r3816,%r3817,%r3818,%r3819,%r3820}, {%r591,%r592,%r593,%r594}, %rd88, %p2, 1, 1; 2026-02-21T08:52:46.4255357Z // end inline asm 2026-02-21T08:52:46.4255417Z add.s32 %r1208, %r1206, 32; 2026-02-21T08:52:46.4255487Z bfe.u32 %r1209, %r1208, 4, 14; 2026-02-21T08:52:46.4255561Z cvt.u64.u32 %rd102, %r1209; 2026-02-21T08:52:46.4255639Z or.b64 %rd89, %rd102, 4611686293372403712; 2026-02-21T08:52:46.4255704Z // begin inline asm 2026-02-21T08:52:46.4256601Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3789,%r3790,%r3791,%r3792,%r3793,%r3794,%r3795,%r3796,%r3797,%r3798,%r3799,%r3800,%r3801,%r3802,%r3803,%r3804,%r3805,%r3806,%r3807,%r3808,%r3809,%r3810,%r3811,%r3812,%r3813,%r3814,%r3815,%r3816,%r3817,%r3818,%r3819,%r3820}, {%r659,%r660,%r661,%r662}, %rd89, %p2, 1, 1; 2026-02-21T08:52:46.4256666Z // end inline asm 2026-02-21T08:52:46.4256726Z add.s32 %r1210, %r1206, 64; 2026-02-21T08:52:46.4256795Z bfe.u32 %r1211, %r1210, 4, 14; 2026-02-21T08:52:46.4256857Z cvt.u64.u32 %rd103, %r1211; 2026-02-21T08:52:46.4256927Z or.b64 %rd90, %rd103, 4611686293372403712; 2026-02-21T08:52:46.4256991Z // begin inline asm 2026-02-21T08:52:46.4257745Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r3789,%r3790,%r3791,%r3792,%r3793,%r3794,%r3795,%r3796,%r3797,%r3798,%r3799,%r3800,%r3801,%r3802,%r3803,%r3804,%r3805,%r3806,%r3807,%r3808,%r3809,%r3810,%r3811,%r3812,%r3813,%r3814,%r3815,%r3816,%r3817,%r3818,%r3819,%r3820}, {%r727,%r728,%r729,%r730}, %rd90, %p2, 1, 1; 2026-02-21T08:52:46.4257803Z // end inline asm 2026-02-21T08:52:46.4257868Z add.s32 %r1212, %r1206, 96; 2026-02-21T08:52:46.4257929Z bfe.u32 %r1213, %r1212, 4, 14; 2026-02-21T08:52:46.4258169Z cvt.u64.u32 %rd104, %r1213; 2026-02-21T08:52:46.4258246Z or.b64 %rd91, %rd104, 4611686293372403712; 2026-02-21T08:52:46.4258314Z // begin inline asm 2026-02-21T08:52:46.4259067Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3789,%r3790,%r3791,%r3792,%r3793,%r3794,%r3795,%r3796,%r3797,%r3798,%r3799,%r3800,%r3801,%r3802,%r3803,%r3804,%r3805,%r3806,%r3807,%r3808,%r3809,%r3810,%r3811,%r3812,%r3813,%r3814,%r3815,%r3816,%r3817,%r3818,%r3819,%r3820}, {%r795,%r796,%r797,%r798}, %rd91, %p2, 1, 1; 2026-02-21T08:52:46.4259126Z // end inline asm 2026-02-21T08:52:46.4259194Z add.s32 %r1214, %r1206, 16384; 2026-02-21T08:52:46.4259257Z bfe.u32 %r1215, %r1214, 4, 14; 2026-02-21T08:52:46.4259320Z cvt.u64.u32 %rd105, %r1215; 2026-02-21T08:52:46.4259526Z or.b64 %rd92, %rd105, 4611686293372403712; 2026-02-21T08:52:46.4259593Z // begin inline asm 2026-02-21T08:52:46.4260349Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3789,%r3790,%r3791,%r3792,%r3793,%r3794,%r3795,%r3796,%r3797,%r3798,%r3799,%r3800,%r3801,%r3802,%r3803,%r3804,%r3805,%r3806,%r3807,%r3808,%r3809,%r3810,%r3811,%r3812,%r3813,%r3814,%r3815,%r3816,%r3817,%r3818,%r3819,%r3820}, {%r863,%r864,%r865,%r866}, %rd92, %p2, 1, 1; 2026-02-21T08:52:46.4260416Z // end inline asm 2026-02-21T08:52:46.4260479Z add.s32 %r1216, %r1206, 16416; 2026-02-21T08:52:46.4260540Z bfe.u32 %r1217, %r1216, 4, 14; 2026-02-21T08:52:46.4260602Z cvt.u64.u32 %rd106, %r1217; 2026-02-21T08:52:46.4260680Z or.b64 %rd93, %rd106, 4611686293372403712; 2026-02-21T08:52:46.4260741Z // begin 
inline asm 2026-02-21T08:52:46.4261494Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3789,%r3790,%r3791,%r3792,%r3793,%r3794,%r3795,%r3796,%r3797,%r3798,%r3799,%r3800,%r3801,%r3802,%r3803,%r3804,%r3805,%r3806,%r3807,%r3808,%r3809,%r3810,%r3811,%r3812,%r3813,%r3814,%r3815,%r3816,%r3817,%r3818,%r3819,%r3820}, {%r931,%r932,%r933,%r934}, %rd93, %p2, 1, 1; 2026-02-21T08:52:46.4261568Z // end inline asm 2026-02-21T08:52:46.4261634Z add.s32 %r1218, %r1206, 16448; 2026-02-21T08:52:46.4261696Z bfe.u32 %r1219, %r1218, 4, 14; 2026-02-21T08:52:46.4261767Z cvt.u64.u32 %rd107, %r1219; 2026-02-21T08:52:46.4261848Z or.b64 %rd94, %rd107, 4611686293372403712; 2026-02-21T08:52:46.4261909Z // begin inline asm 2026-02-21T08:52:46.4262676Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3789,%r3790,%r3791,%r3792,%r3793,%r3794,%r3795,%r3796,%r3797,%r3798,%r3799,%r3800,%r3801,%r3802,%r3803,%r3804,%r3805,%r3806,%r3807,%r3808,%r3809,%r3810,%r3811,%r3812,%r3813,%r3814,%r3815,%r3816,%r3817,%r3818,%r3819,%r3820}, {%r999,%r1000,%r1001,%r1002}, %rd94, %p2, 1, 1; 2026-02-21T08:52:46.4262734Z // end inline asm 2026-02-21T08:52:46.4262796Z add.s32 %r1220, %r1206, 16480; 2026-02-21T08:52:46.4262861Z bfe.u32 %r1221, %r1220, 4, 14; 2026-02-21T08:52:46.4262927Z cvt.u64.u32 %rd108, %r1221; 2026-02-21T08:52:46.4262998Z or.b64 %rd95, %rd108, 4611686293372403712; 2026-02-21T08:52:46.4263058Z // begin inline asm 2026-02-21T08:52:46.4263826Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3789,%r3790,%r3791,%r3792,%r3793,%r3794,%r3795,%r3796,%r3797,%r3798,%r3799,%r3800,%r3801,%r3802,%r3803,%r3804,%r3805,%r3806,%r3807,%r3808,%r3809,%r3810,%r3811,%r3812,%r3813,%r3814,%r3815,%r3816,%r3817,%r3818,%r3819,%r3820}, {%r1067,%r1068,%r1069,%r1070}, %rd95, %p2, 1, 1; 2026-02-21T08:52:46.4263886Z // end inline asm 2026-02-21T08:52:46.4263965Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:46.4264027Z mov.b32 %r1104, 0; 2026-02-21T08:52:46.4264089Z mov.b32 %r1103, %r3769; 
2026-02-21T08:52:46.4264149Z mov.b32 %r1105, %r1104; 2026-02-21T08:52:46.4264215Z // begin inline asm 2026-02-21T08:52:46.4264782Z // wait for regs: %r3789,%r3790,%r3791,%r3792,%r3793,%r3794,%r3795,%r3796,%r3797,%r3798,%r3799,%r3800,%r3801,%r3802,%r3803,%r3804,%r3805,%r3806,%r3807,%r3808,%r3809,%r3810,%r3811,%r3812,%r3813,%r3814,%r3815,%r3816,%r3817,%r3818,%r3819,%r3820,%r1103,%r1104,%r1105 2026-02-21T08:52:46.4264858Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:46.4265032Z // end inline asm 2026-02-21T08:52:46.4265089Z $L__tmp2: 2026-02-21T08:52:46.4265317Z .loc 1 40 103 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:40:103 2026-02-21T08:52:46.4265381Z add.s32 %r1222, %r3788, 1; 2026-02-21T08:52:46.4265457Z setp.gt.s32 %p13, %r1222, 1; 2026-02-21T08:52:46.4265528Z selp.b32 %r3788, 0, %r1222, %p13; 2026-02-21T08:52:46.4265740Z .loc 1 48 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:32 2026-02-21T08:52:46.4265818Z mad.wide.s32 %rd99, %r3785, 2, %rd53; 2026-02-21T08:52:46.4266023Z .loc 1 48 80 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:80 2026-02-21T08:52:46.4266088Z shl.b32 %r1223, %r3788, 12; 2026-02-21T08:52:46.4266244Z shl.b32 %r1224, %r3788, 13; 2026-02-21T08:52:46.4266311Z add.s32 %r1225, %r1154, %r1224; 2026-02-21T08:52:46.4266375Z add.s32 %r1141, %r1225, %r14; 2026-02-21T08:52:46.4266442Z selp.b32 %r1142, 8, 0, %p11; 2026-02-21T08:52:46.4266637Z // begin inline asm 2026-02-21T08:52:46.4266791Z cp.async.ca.shared.global [ %r1141 + 0 ], [ %rd289 + 0 ], 0x8, %r1142; 2026-02-21T08:52:46.4266852Z // end inline asm 2026-02-21T08:52:46.4266921Z add.s32 %r1143, %r1141, 2048; 2026-02-21T08:52:46.4266980Z // begin inline asm 2026-02-21T08:52:46.4267119Z cp.async.ca.shared.global [ %r1143 + 0 ], [ %rd290 + 0 ], 0x8, %r1142; 2026-02-21T08:52:46.4267177Z // end inline asm 2026-02-21T08:52:46.4267245Z add.s32 %r1145, %r1141, 4096; 2026-02-21T08:52:46.4267304Z // begin inline asm 2026-02-21T08:52:46.4267439Z 
cp.async.ca.shared.global [ %r1145 + 0 ], [ %rd291 + 0 ], 0x8, %r1142; 2026-02-21T08:52:46.4267505Z // end inline asm 2026-02-21T08:52:46.4267566Z add.s32 %r1147, %r1141, 6144; 2026-02-21T08:52:46.4267629Z // begin inline asm 2026-02-21T08:52:46.4267776Z cp.async.ca.shared.global [ %r1147 + 0 ], [ %rd99 + 0 ], 0x8, %r1142; 2026-02-21T08:52:46.4267836Z // end inline asm 2026-02-21T08:52:46.4267905Z cp.async.commit_group; 2026-02-21T08:52:46.4268115Z .loc 1 54 34 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:34 2026-02-21T08:52:46.4268186Z cvt.s64.s32 %rd109, %r3786; 2026-02-21T08:52:46.4268251Z add.s64 %rd100, %rd54, %rd109; 2026-02-21T08:52:46.4268454Z .loc 1 54 87 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:87 2026-02-21T08:52:46.4268594Z add.s32 %r1149, %r20, %r1223; 2026-02-21T08:52:46.4268662Z selp.b32 %r1150, 16, 0, %p11; 2026-02-21T08:52:46.4268722Z // begin inline asm 2026-02-21T08:52:46.4268878Z cp.async.cg.shared.global [ %r1149 + 0 ], [ %rd100 + 0 ], 0x10, %r1150; 2026-02-21T08:52:46.4268935Z // end inline asm 2026-02-21T08:52:46.4269002Z cp.async.commit_group; 2026-02-21T08:52:46.4269216Z .loc 1 40 103 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:40:103 2026-02-21T08:52:46.4269285Z add.s32 %r3786, %r3786, 229376; 2026-02-21T08:52:46.4269349Z add.s64 %rd291, %rd291, 128; 2026-02-21T08:52:46.4269415Z add.s64 %rd290, %rd290, 128; 2026-02-21T08:52:46.4269480Z add.s64 %rd289, %rd289, 128; 2026-02-21T08:52:46.4269540Z add.s32 %r3785, %r3785, 64; 2026-02-21T08:52:46.4269605Z setp.lt.u64 %p14, %rd292, 4064; 2026-02-21T08:52:46.4269666Z @%p14 bra $L__BB0_3; 2026-02-21T08:52:46.4269783Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:46.4269999Z .loc 1 33 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:33:32 2026-02-21T08:52:46.4270065Z or.b32 %r1285, %r106, %r11; 2026-02-21T08:52:46.4270281Z .loc 1 40 103 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:40:103 
2026-02-21T08:52:46.4270349Z cp.async.wait_group 0; 2026-02-21T08:52:46.4270407Z bar.sync 0; 2026-02-21T08:52:46.4270615Z .loc 1 87 28 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:87:28 2026-02-21T08:52:46.4270694Z cvt.rn.bf16x2.f32 %r1286, %r3790, %r3789; 2026-02-21T08:52:46.4270912Z cvt.rn.bf16x2.f32 %r1287, %r3792, %r3791; 2026-02-21T08:52:46.4270991Z cvt.rn.bf16x2.f32 %r1288, %r3794, %r3793; 2026-02-21T08:52:46.4271061Z cvt.rn.bf16x2.f32 %r1289, %r3796, %r3795; 2026-02-21T08:52:46.4271133Z cvt.rn.bf16x2.f32 %r1290, %r3798, %r3797; 2026-02-21T08:52:46.4271215Z cvt.rn.bf16x2.f32 %r1291, %r3800, %r3799; 2026-02-21T08:52:46.4271295Z cvt.rn.bf16x2.f32 %r1292, %r3802, %r3801; 2026-02-21T08:52:46.4271369Z cvt.rn.bf16x2.f32 %r1293, %r3804, %r3803; 2026-02-21T08:52:46.4271439Z cvt.rn.bf16x2.f32 %r1294, %r3806, %r3805; 2026-02-21T08:52:46.4271514Z cvt.rn.bf16x2.f32 %r1295, %r3808, %r3807; 2026-02-21T08:52:46.4271585Z cvt.rn.bf16x2.f32 %r1296, %r3810, %r3809; 2026-02-21T08:52:46.4271656Z cvt.rn.bf16x2.f32 %r1297, %r3812, %r3811; 2026-02-21T08:52:46.4271845Z cvt.rn.bf16x2.f32 %r1298, %r3814, %r3813; 2026-02-21T08:52:46.4271930Z cvt.rn.bf16x2.f32 %r1299, %r3816, %r3815; 2026-02-21T08:52:46.4272000Z cvt.rn.bf16x2.f32 %r1300, %r3818, %r3817; 2026-02-21T08:52:46.4272086Z cvt.rn.bf16x2.f32 %r1301, %r3820, %r3819; 2026-02-21T08:52:46.4272304Z .loc 1 88 50 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:88:50 2026-02-21T08:52:46.4272376Z mad.lo.s32 %r1302, %r102, 7168, %r1285; 2026-02-21T08:52:46.4272446Z mad.lo.s32 %r1303, %r103, 7168, %r1285; 2026-02-21T08:52:46.4272517Z mad.lo.s32 %r1304, %r104, 7168, %r1285; 2026-02-21T08:52:46.4272582Z mad.lo.s32 %r1305, %r105, 7168, %r1285; 2026-02-21T08:52:46.4272787Z .loc 1 88 22 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:88:22 2026-02-21T08:52:46.4272858Z mad.wide.s32 %rd110, %r1302, 2, %rd55; 2026-02-21T08:52:46.4272938Z mad.wide.s32 %rd111, %r1303, 2, %rd55; 2026-02-21T08:52:46.4273004Z 
mad.wide.s32 %rd112, %r1304, 2, %rd55; 2026-02-21T08:52:46.4273074Z mad.wide.s32 %rd113, %r1305, 2, %rd55; 2026-02-21T08:52:46.4273284Z .loc 1 88 81 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:88:81 2026-02-21T08:52:46.4273401Z st.shared.v4.b32 [%r47], {%r1286, %r1288, %r1290, %r1292}; 2026-02-21T08:52:46.4273520Z st.shared.v4.b32 [%r47+512], {%r1287, %r1289, %r1291, %r1293}; 2026-02-21T08:52:46.4273633Z st.shared.v4.b32 [%r48], {%r1294, %r1296, %r1298, %r1300}; 2026-02-21T08:52:46.4273746Z st.shared.v4.b32 [%r48+512], {%r1295, %r1297, %r1299, %r1301}; 2026-02-21T08:52:46.4273803Z bar.sync 0; 2026-02-21T08:52:46.4273867Z // begin inline asm 2026-02-21T08:52:46.4274068Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1246, %r1247, %r1248, %r1249}, [%r1230]; 2026-02-21T08:52:46.4274129Z // end inline asm 2026-02-21T08:52:46.4274189Z // begin inline asm 2026-02-21T08:52:46.4274383Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1250, %r1251, %r1252, %r1253}, [%r1235]; 2026-02-21T08:52:46.4274443Z // end inline asm 2026-02-21T08:52:46.4274504Z // begin inline asm 2026-02-21T08:52:46.4274694Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1254, %r1255, %r1256, %r1257}, [%r1240]; 2026-02-21T08:52:46.4274755Z // end inline asm 2026-02-21T08:52:46.4274815Z // begin inline asm 2026-02-21T08:52:46.4275001Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1258, %r1259, %r1260, %r1261}, [%r1245]; 2026-02-21T08:52:46.4275068Z // end inline asm 2026-02-21T08:52:46.4275128Z // begin inline asm 2026-02-21T08:52:46.4275267Z st.global.v4.b32 [ %rd110 + 0 ], { %r1246, %r1247, %r1248, %r1249 }; 2026-02-21T08:52:46.4275330Z // end inline asm 2026-02-21T08:52:46.4275388Z // begin inline asm 2026-02-21T08:52:46.4275508Z st.global.v4.b32 [ %rd111 + 0 ], { %r1250, %r1251, %r1252, %r1253 }; 2026-02-21T08:52:46.4275572Z // end inline asm 2026-02-21T08:52:46.4275638Z // begin inline asm 2026-02-21T08:52:46.4275769Z st.global.v4.b32 [ %rd112 + 0 ], { %r1254, %r1255, %r1256, %r1257 }; 
2026-02-21T08:52:46.4275831Z // end inline asm 2026-02-21T08:52:46.4275901Z // begin inline asm 2026-02-21T08:52:46.4276018Z st.global.v4.b32 [ %rd113 + 0 ], { %r1258, %r1259, %r1260, %r1261 }; 2026-02-21T08:52:46.4276079Z // end inline asm 2026-02-21T08:52:46.4276411Z .loc 1 19 112 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:19:112 2026-02-21T08:52:46.4276597Z add.s32 %r1306, %r3784, 4224; 2026-02-21T08:52:46.4276808Z .loc 1 25 35 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:25:35 2026-02-21T08:52:46.4276898Z mul.hi.s32 %r1307, %r1306, -1840700269; 2026-02-21T08:52:46.4276968Z add.s32 %r1308, %r1307, %r1306; 2026-02-21T08:52:46.4277031Z shr.u32 %r1309, %r1308, 31; 2026-02-21T08:52:46.4277098Z shr.s32 %r1310, %r1308, 6; 2026-02-21T08:52:46.4277168Z add.s32 %r1311, %r1310, %r1309; 2026-02-21T08:52:46.4277372Z .loc 1 26 33 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:26:33 2026-02-21T08:52:46.4277564Z shl.b32 %r1312, %r1311, 1; 2026-02-21T08:52:46.4277781Z .loc 1 27 39 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:27:39 2026-02-21T08:52:46.4277844Z sub.s32 %r1313, 1, %r1312; 2026-02-21T08:52:46.4278051Z .loc 1 27 52 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:27:52 2026-02-21T08:52:46.4278120Z min.s32 %r1314, %r1313, 2; 2026-02-21T08:52:46.4278320Z .loc 1 28 45 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:28:45 2026-02-21T08:52:46.4278386Z mul.lo.s32 %r1315, %r1311, 112; 2026-02-21T08:52:46.4278451Z sub.s32 %r1316, %r1306, %r1315; 2026-02-21T08:52:46.4278659Z .loc 1 29 51 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:29:51 2026-02-21T08:52:46.4278723Z div.s32 %r1317, %r1316, %r1314; 2026-02-21T08:52:46.4278924Z .loc 1 28 64 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:28:64 2026-02-21T08:52:46.4279011Z mul.lo.s32 %r1318, %r1317, %r1314; 2026-02-21T08:52:46.4279076Z sub.s32 %r1319, %r1316, %r1318; 2026-02-21T08:52:46.4279282Z .loc 1 28 30 // 
cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:28:30 2026-02-21T08:52:46.4279354Z add.s32 %r1320, %r1319, %r1312; 2026-02-21T08:52:46.4279555Z .loc 1 30 27 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:30:27 2026-02-21T08:52:46.4279617Z shl.b32 %r1321, %r1320, 6; 2026-02-21T08:52:46.4279823Z .loc 1 31 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:31:32 2026-02-21T08:52:46.4279887Z or.b32 %r181, %r1321, %r5; 2026-02-21T08:52:46.4279947Z or.b32 %r182, %r1321, %r6; 2026-02-21T08:52:46.4280008Z or.b32 %r183, %r1321, %r7; 2026-02-21T08:52:46.4280073Z or.b32 %r184, %r1321, %r8; 2026-02-21T08:52:46.4280274Z .loc 1 32 27 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:32:27 2026-02-21T08:52:46.4280336Z shl.b32 %r185, %r1317, 7; 2026-02-21T08:52:46.4280544Z .loc 1 33 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:33:32 2026-02-21T08:52:46.4280608Z or.b32 %r1322, %r185, %r10; 2026-02-21T08:52:46.4280810Z .loc 1 48 53 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:53 2026-02-21T08:52:46.4280878Z shl.b32 %r1323, %r181, 13; 2026-02-21T08:52:46.4280950Z shl.b32 %r1324, %r182, 13; 2026-02-21T08:52:46.4281011Z shl.b32 %r1325, %r183, 13; 2026-02-21T08:52:46.4281071Z shl.b32 %r1326, %r184, 13; 2026-02-21T08:52:46.4281279Z .loc 1 48 60 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:60 2026-02-21T08:52:46.4281341Z or.b32 %r1327, %r1323, %r9; 2026-02-21T08:52:46.4281402Z or.b32 %r1328, %r1324, %r9; 2026-02-21T08:52:46.4281466Z or.b32 %r1329, %r1325, %r9; 2026-02-21T08:52:46.4281525Z or.b32 %r1330, %r1326, %r9; 2026-02-21T08:52:46.4281728Z .loc 1 48 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:32 2026-02-21T08:52:46.4281807Z mad.wide.s32 %rd114, %r1327, 2, %rd53; 2026-02-21T08:52:46.4281876Z mad.wide.s32 %rd115, %r1328, 2, %rd53; 2026-02-21T08:52:46.4282093Z mad.wide.s32 %rd116, %r1329, 2, %rd53; 2026-02-21T08:52:46.4282159Z mad.wide.s32 %rd117, %r1330, 2, %rd53; 
2026-02-21T08:52:46.4282223Z mov.b32 %r1263, 8; 2026-02-21T08:52:46.4282428Z .loc 1 48 80 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:80 2026-02-21T08:52:46.4282488Z // begin inline asm 2026-02-21T08:52:46.4282633Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd114 + 0 ], 0x8, %r1263; 2026-02-21T08:52:46.4282692Z // end inline asm 2026-02-21T08:52:46.4282751Z // begin inline asm 2026-02-21T08:52:46.4282900Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd115 + 0 ], 0x8, %r1263; 2026-02-21T08:52:46.4282965Z // end inline asm 2026-02-21T08:52:46.4283025Z // begin inline asm 2026-02-21T08:52:46.4283250Z cp.async.ca.shared.global [ %r17 + 0 ], [ %rd116 + 0 ], 0x8, %r1263; 2026-02-21T08:52:46.4283317Z // end inline asm 2026-02-21T08:52:46.4283378Z // begin inline asm 2026-02-21T08:52:46.4283507Z cp.async.ca.shared.global [ %r18 + 0 ], [ %rd117 + 0 ], 0x8, %r1263; 2026-02-21T08:52:46.4283575Z // end inline asm 2026-02-21T08:52:46.4283642Z cp.async.commit_group; 2026-02-21T08:52:46.4283848Z .loc 1 54 62 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:62 2026-02-21T08:52:46.4283912Z add.s32 %r1331, %r1322, %r3770; 2026-02-21T08:52:46.4284123Z .loc 1 54 34 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:34 2026-02-21T08:52:46.4284187Z cvt.s64.s32 %rd125, %r1331; 2026-02-21T08:52:46.4284255Z add.s64 %rd118, %rd54, %rd125; 2026-02-21T08:52:46.4284323Z mov.b32 %r1271, 16; 2026-02-21T08:52:46.4284526Z .loc 1 54 87 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:87 2026-02-21T08:52:46.4284586Z // begin inline asm 2026-02-21T08:52:46.4284743Z cp.async.cg.shared.global [ %r20 + 0 ], [ %rd118 + 0 ], 0x10, %r1271; 2026-02-21T08:52:46.4284802Z // end inline asm 2026-02-21T08:52:46.4284868Z cp.async.commit_group; 2026-02-21T08:52:46.4285078Z .loc 1 48 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:32 2026-02-21T08:52:46.4285153Z cvt.s64.s32 %rd126, %r1323; 2026-02-21T08:52:46.4285219Z or.b64 %rd127, %rd126, 
%rd288; 2026-02-21T08:52:46.4285284Z shl.b64 %rd128, %rd127, 1; 2026-02-21T08:52:46.4285354Z add.s64 %rd129, %rd53, %rd128; 2026-02-21T08:52:46.4285420Z add.s64 %rd119, %rd129, 128; 2026-02-21T08:52:46.4285484Z cvt.s64.s32 %rd130, %r1324; 2026-02-21T08:52:46.4285546Z or.b64 %rd131, %rd130, %rd288; 2026-02-21T08:52:46.4285616Z shl.b64 %rd132, %rd131, 1; 2026-02-21T08:52:46.4285678Z add.s64 %rd133, %rd53, %rd132; 2026-02-21T08:52:46.4285740Z add.s64 %rd120, %rd133, 128; 2026-02-21T08:52:46.4285809Z cvt.s64.s32 %rd134, %r1325; 2026-02-21T08:52:46.4285870Z or.b64 %rd135, %rd134, %rd288; 2026-02-21T08:52:46.4285934Z shl.b64 %rd136, %rd135, 1; 2026-02-21T08:52:46.4286002Z add.s64 %rd137, %rd53, %rd136; 2026-02-21T08:52:46.4286064Z add.s64 %rd121, %rd137, 128; 2026-02-21T08:52:46.4286130Z cvt.s64.s32 %rd138, %r1326; 2026-02-21T08:52:46.4286191Z or.b64 %rd139, %rd138, %rd288; 2026-02-21T08:52:46.4286259Z shl.b64 %rd140, %rd139, 1; 2026-02-21T08:52:46.4286322Z add.s64 %rd141, %rd53, %rd140; 2026-02-21T08:52:46.4286391Z add.s64 %rd122, %rd141, 128; 2026-02-21T08:52:46.4286742Z .loc 1 48 80 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:80 2026-02-21T08:52:46.4286808Z bar.sync 0; 2026-02-21T08:52:46.4286870Z // begin inline asm 2026-02-21T08:52:46.4287008Z cp.async.ca.shared.global [ %r21 + 0 ], [ %rd119 + 0 ], 0x8, %r1263; 2026-02-21T08:52:46.4287076Z // end inline asm 2026-02-21T08:52:46.4287136Z // begin inline asm 2026-02-21T08:52:46.4287271Z cp.async.ca.shared.global [ %r22 + 0 ], [ %rd120 + 0 ], 0x8, %r1263; 2026-02-21T08:52:46.4287339Z // end inline asm 2026-02-21T08:52:46.4287398Z // begin inline asm 2026-02-21T08:52:46.4287526Z cp.async.ca.shared.global [ %r23 + 0 ], [ %rd121 + 0 ], 0x8, %r1263; 2026-02-21T08:52:46.4287724Z // end inline asm 2026-02-21T08:52:46.4287796Z // begin inline asm 2026-02-21T08:52:46.4287927Z cp.async.ca.shared.global [ %r24 + 0 ], [ %rd122 + 0 ], 0x8, %r1263; 2026-02-21T08:52:46.4287985Z // end inline asm 
2026-02-21T08:52:46.4288057Z cp.async.commit_group; 2026-02-21T08:52:46.4288262Z .loc 1 54 34 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:34 2026-02-21T08:52:46.4288327Z cvt.s64.s32 %rd142, %r1322; 2026-02-21T08:52:46.4288399Z add.s64 %rd143, %rd142, %rd5; 2026-02-21T08:52:46.4288462Z add.s64 %rd144, %rd54, %rd143; 2026-02-21T08:52:46.4288526Z add.s64 %rd123, %rd144, 229376; 2026-02-21T08:52:46.4288731Z .loc 1 54 87 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:87 2026-02-21T08:52:46.4288923Z // begin inline asm 2026-02-21T08:52:46.4289075Z cp.async.cg.shared.global [ %r25 + 0 ], [ %rd123 + 0 ], 0x10, %r1271; 2026-02-21T08:52:46.4289140Z // end inline asm 2026-02-21T08:52:46.4289212Z cp.async.commit_group; 2026-02-21T08:52:46.4289429Z .loc 1 40 103 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:40:103 2026-02-21T08:52:46.4289493Z add.s32 %r3822, %r53, %r185; 2026-02-21T08:52:46.4289566Z or.b32 %r1332, %r7, %r1321; 2026-02-21T08:52:46.4289625Z shl.b32 %r1333, %r1332, 13; 2026-02-21T08:52:46.4289697Z mad.wide.s32 %rd295, %r1333, 2, %rd1; 2026-02-21T08:52:46.4289757Z or.b32 %r1334, %r6, %r1321; 2026-02-21T08:52:46.4289824Z shl.b32 %r1335, %r1334, 13; 2026-02-21T08:52:46.4289893Z mad.wide.s32 %rd294, %r1335, 2, %rd1; 2026-02-21T08:52:46.4289953Z shl.b32 %r1336, %r1320, 19; 2026-02-21T08:52:46.4290018Z or.b32 %r1337, %r56, %r1336; 2026-02-21T08:52:46.4290084Z mad.wide.s32 %rd293, %r1337, 2, %rd1; 2026-02-21T08:52:46.4290144Z or.b32 %r3821, %r57, %r1336; 2026-02-21T08:52:46.4290207Z mov.b32 %r3825, 0f00000000; 2026-02-21T08:52:46.4290272Z mov.b32 %r3824, 1; 2026-02-21T08:52:46.4290332Z mov.b32 %r3823, -1; 2026-02-21T08:52:46.4290393Z mov.b64 %rd296, -32; 2026-02-21T08:52:46.4290463Z mov.b32 %r3826, %r3825; 2026-02-21T08:52:46.4290524Z mov.b32 %r3827, %r3825; 2026-02-21T08:52:46.4290582Z mov.b32 %r3828, %r3825; 2026-02-21T08:52:46.4290640Z mov.b32 %r3829, %r3825; 2026-02-21T08:52:46.4290704Z mov.b32 %r3830, %r3825; 
2026-02-21T08:52:46.4290762Z mov.b32 %r3831, %r3825; 2026-02-21T08:52:46.4290820Z mov.b32 %r3832, %r3825; 2026-02-21T08:52:46.4290884Z mov.b32 %r3833, %r3825; 2026-02-21T08:52:46.4290943Z mov.b32 %r3834, %r3825; 2026-02-21T08:52:46.4291011Z mov.b32 %r3835, %r3825; 2026-02-21T08:52:46.4291077Z mov.b32 %r3836, %r3825; 2026-02-21T08:52:46.4291138Z mov.b32 %r3837, %r3825; 2026-02-21T08:52:46.4291200Z mov.b32 %r3838, %r3825; 2026-02-21T08:52:46.4291257Z mov.b32 %r3839, %r3825; 2026-02-21T08:52:46.4291324Z mov.b32 %r3840, %r3825; 2026-02-21T08:52:46.4291383Z mov.b32 %r3841, %r3825; 2026-02-21T08:52:46.4291442Z mov.b32 %r3842, %r3825; 2026-02-21T08:52:46.4291506Z mov.b32 %r3843, %r3825; 2026-02-21T08:52:46.4291567Z mov.b32 %r3844, %r3825; 2026-02-21T08:52:46.4291626Z mov.b32 %r3845, %r3825; 2026-02-21T08:52:46.4291684Z mov.b32 %r3846, %r3825; 2026-02-21T08:52:46.4291749Z mov.b32 %r3847, %r3825; 2026-02-21T08:52:46.4291808Z mov.b32 %r3848, %r3825; 2026-02-21T08:52:46.4291867Z mov.b32 %r3849, %r3825; 2026-02-21T08:52:46.4310406Z mov.b32 %r3850, %r3825; 2026-02-21T08:52:46.4310549Z mov.b32 %r3851, %r3825; 2026-02-21T08:52:46.4310627Z mov.b32 %r3852, %r3825; 2026-02-21T08:52:46.4310701Z mov.b32 %r3853, %r3825; 2026-02-21T08:52:46.4310763Z mov.b32 %r3854, %r3825; 2026-02-21T08:52:46.4310822Z mov.b32 %r3855, %r3825; 2026-02-21T08:52:46.4310878Z mov.b32 %r3856, %r3825; 2026-02-21T08:52:46.4311035Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:46.4311169Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:46.4311242Z add.s64 %rd296, %rd296, 32; 2026-02-21T08:52:46.4311317Z setp.lt.u64 %p24, %rd296, 4032; 2026-02-21T08:52:46.4311618Z add.s32 %r1962, %r3823, 1; 2026-02-21T08:52:46.4311686Z setp.gt.s32 %p25, %r1962, 1; 2026-02-21T08:52:46.4311758Z selp.b32 %r3823, 0, %r1962, %p25; 2026-02-21T08:52:46.4311992Z .loc 1 48 80 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:80 2026-02-21T08:52:46.4312067Z cp.async.wait_group 2; 2026-02-21T08:52:46.4312125Z 
bar.sync 0; 2026-02-21T08:52:46.4312193Z shl.b32 %r1963, %r3823, 12; 2026-02-21T08:52:46.4312252Z shl.b32 %r1964, %r3823, 13; 2026-02-21T08:52:46.4312316Z add.s32 %r1966, %r1154, %r1964; 2026-02-21T08:52:46.4312539Z .loc 1 52 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:52:32 2026-02-21T08:52:46.4312607Z add.s32 %r1967, %r1966, %r26; 2026-02-21T08:52:46.4312826Z ld.shared.b16 %rs193, [%r1967]; 2026-02-21T08:52:46.4312905Z ld.shared.b16 %rs194, [%r1967+1024]; 2026-02-21T08:52:46.4312976Z ld.shared.b16 %rs195, [%r1967+64]; 2026-02-21T08:52:46.4313043Z ld.shared.b16 %rs196, [%r1967+1088]; 2026-02-21T08:52:46.4313109Z add.s32 %r1968, %r1966, %r27; 2026-02-21T08:52:46.4313176Z ld.shared.b16 %rs197, [%r1968]; 2026-02-21T08:52:46.4313240Z ld.shared.b16 %rs198, [%r1968+1024]; 2026-02-21T08:52:46.4313303Z ld.shared.b16 %rs199, [%r1968+64]; 2026-02-21T08:52:46.4313371Z ld.shared.b16 %rs200, [%r1968+1088]; 2026-02-21T08:52:46.4313437Z add.s32 %r1969, %r1966, %r28; 2026-02-21T08:52:46.4313508Z ld.shared.b16 %rs201, [%r1969]; 2026-02-21T08:52:46.4313578Z ld.shared.b16 %rs202, [%r1969+1024]; 2026-02-21T08:52:46.4313650Z ld.shared.b16 %rs203, [%r1969+64]; 2026-02-21T08:52:46.4313732Z ld.shared.b16 %rs204, [%r1969+1088]; 2026-02-21T08:52:46.4313800Z add.s32 %r1970, %r1966, %r29; 2026-02-21T08:52:46.4313870Z ld.shared.b16 %rs205, [%r1970]; 2026-02-21T08:52:46.4313938Z ld.shared.b16 %rs206, [%r1970+1024]; 2026-02-21T08:52:46.4314010Z ld.shared.b16 %rs207, [%r1970+64]; 2026-02-21T08:52:46.4314100Z ld.shared.b16 %rs208, [%r1970+1088]; 2026-02-21T08:52:46.4314165Z add.s32 %r1971, %r1966, %r30; 2026-02-21T08:52:46.4314232Z ld.shared.b16 %rs209, [%r1971]; 2026-02-21T08:52:46.4314297Z ld.shared.b16 %rs210, [%r1971+1024]; 2026-02-21T08:52:46.4314364Z ld.shared.b16 %rs211, [%r1971+64]; 2026-02-21T08:52:46.4314427Z ld.shared.b16 %rs212, [%r1971+1088]; 2026-02-21T08:52:46.4314489Z add.s32 %r1972, %r1966, %r31; 2026-02-21T08:52:46.4314557Z ld.shared.b16 %rs213, [%r1972]; 
2026-02-21T08:52:46.4314621Z ld.shared.b16 %rs214, [%r1972+1024]; 2026-02-21T08:52:46.4314684Z ld.shared.b16 %rs215, [%r1972+64]; 2026-02-21T08:52:46.4314748Z ld.shared.b16 %rs216, [%r1972+1088]; 2026-02-21T08:52:46.4314812Z add.s32 %r1973, %r1966, %r32; 2026-02-21T08:52:46.4314874Z ld.shared.b16 %rs217, [%r1973]; 2026-02-21T08:52:46.4314939Z ld.shared.b16 %rs218, [%r1973+1024]; 2026-02-21T08:52:46.4315008Z ld.shared.b16 %rs219, [%r1973+64]; 2026-02-21T08:52:46.4315071Z ld.shared.b16 %rs220, [%r1973+1088]; 2026-02-21T08:52:46.4315130Z add.s32 %r1974, %r1966, %r33; 2026-02-21T08:52:46.4315223Z ld.shared.b16 %rs221, [%r1974]; 2026-02-21T08:52:46.4315304Z ld.shared.b16 %rs222, [%r1974+1024]; 2026-02-21T08:52:46.4315372Z ld.shared.b16 %rs223, [%r1974+64]; 2026-02-21T08:52:46.4315440Z ld.shared.b16 %rs224, [%r1974+1088]; 2026-02-21T08:52:46.4315512Z cvt.f32.bf16 %r1402, %rs193; 2026-02-21T08:52:46.4315588Z cvt.f32.bf16 %r1403, %rs194; 2026-02-21T08:52:46.4315653Z cvt.f32.bf16 %r1404, %rs197; 2026-02-21T08:52:46.4315716Z cvt.f32.bf16 %r1405, %rs198; 2026-02-21T08:52:46.4315788Z cvt.f32.bf16 %r1470, %rs201; 2026-02-21T08:52:46.4315851Z cvt.f32.bf16 %r1471, %rs202; 2026-02-21T08:52:46.4315913Z cvt.f32.bf16 %r1472, %rs205; 2026-02-21T08:52:46.4315978Z cvt.f32.bf16 %r1473, %rs206; 2026-02-21T08:52:46.4316046Z cvt.f32.bf16 %r1538, %rs209; 2026-02-21T08:52:46.4316112Z cvt.f32.bf16 %r1539, %rs210; 2026-02-21T08:52:46.4316180Z cvt.f32.bf16 %r1540, %rs213; 2026-02-21T08:52:46.4316250Z cvt.f32.bf16 %r1541, %rs214; 2026-02-21T08:52:46.4316313Z cvt.f32.bf16 %r1606, %rs217; 2026-02-21T08:52:46.4316627Z cvt.f32.bf16 %r1607, %rs218; 2026-02-21T08:52:46.4316696Z cvt.f32.bf16 %r1608, %rs221; 2026-02-21T08:52:46.4316768Z cvt.f32.bf16 %r1609, %rs222; 2026-02-21T08:52:46.4316837Z cvt.f32.bf16 %r1674, %rs195; 2026-02-21T08:52:46.4316903Z cvt.f32.bf16 %r1675, %rs196; 2026-02-21T08:52:46.4316971Z cvt.f32.bf16 %r1676, %rs199; 2026-02-21T08:52:46.4317033Z cvt.f32.bf16 %r1677, %rs200; 
2026-02-21T08:52:46.4317096Z cvt.f32.bf16 %r1742, %rs203; 2026-02-21T08:52:46.4317171Z cvt.f32.bf16 %r1743, %rs204; 2026-02-21T08:52:46.4317240Z cvt.f32.bf16 %r1744, %rs207; 2026-02-21T08:52:46.4317302Z cvt.f32.bf16 %r1745, %rs208; 2026-02-21T08:52:46.4317365Z cvt.f32.bf16 %r1810, %rs211; 2026-02-21T08:52:46.4317433Z cvt.f32.bf16 %r1811, %rs212; 2026-02-21T08:52:46.4317636Z cvt.f32.bf16 %r1812, %rs215; 2026-02-21T08:52:46.4317703Z cvt.f32.bf16 %r1813, %rs216; 2026-02-21T08:52:46.4317770Z cvt.f32.bf16 %r1878, %rs219; 2026-02-21T08:52:46.4317834Z cvt.f32.bf16 %r1879, %rs220; 2026-02-21T08:52:46.4317901Z cvt.f32.bf16 %r1880, %rs223; 2026-02-21T08:52:46.4317965Z cvt.f32.bf16 %r1881, %rs224; 2026-02-21T08:52:46.4318195Z .loc 1 54 87 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:87 2026-02-21T08:52:46.4318265Z add.s32 %r1975, %r3769, %r1963; 2026-02-21T08:52:46.4318334Z add.s32 %r1976, %r1975, 49152; 2026-02-21T08:52:46.4318558Z .loc 1 67 45 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:67:45 2026-02-21T08:52:46.4318624Z add.s32 %r1977, %r1976, %r3774; 2026-02-21T08:52:46.4318695Z ld.shared.b8 %rs225, [%r1977]; 2026-02-21T08:52:46.4318771Z ld.shared.b8 %rs226, [%r1977+128]; 2026-02-21T08:52:46.4318836Z ld.shared.b8 %rs227, [%r1977+256]; 2026-02-21T08:52:46.4318905Z ld.shared.b8 %rs228, [%r1977+384]; 2026-02-21T08:52:46.4318982Z ld.shared.b8 %rs229, [%r1977+512]; 2026-02-21T08:52:46.4319057Z ld.shared.b8 %rs230, [%r1977+640]; 2026-02-21T08:52:46.4319124Z ld.shared.b8 %rs231, [%r1977+768]; 2026-02-21T08:52:46.4319190Z add.s32 %r1978, %r1976, %r3775; 2026-02-21T08:52:46.4319262Z ld.shared.b8 %rs232, [%r1978]; 2026-02-21T08:52:46.4319333Z ld.shared.b8 %rs233, [%r1977+1024]; 2026-02-21T08:52:46.4319399Z ld.shared.b8 %rs234, [%r1977+1152]; 2026-02-21T08:52:46.4319466Z ld.shared.b8 %rs235, [%r1977+1280]; 2026-02-21T08:52:46.4319540Z ld.shared.b8 %rs236, [%r1977+1408]; 2026-02-21T08:52:46.4319608Z ld.shared.b8 %rs237, [%r1977+1536]; 
2026-02-21T08:52:46.4319674Z ld.shared.b8 %rs238, [%r1977+1664]; 2026-02-21T08:52:46.4319751Z ld.shared.b8 %rs239, [%r1977+1792]; 2026-02-21T08:52:46.4319818Z add.s32 %r1979, %r1976, %r3776; 2026-02-21T08:52:46.4319884Z ld.shared.b8 %rs240, [%r1979]; 2026-02-21T08:52:46.4319959Z ld.shared.b8 %rs241, [%r1977+2048]; 2026-02-21T08:52:46.4320032Z ld.shared.b8 %rs242, [%r1977+2176]; 2026-02-21T08:52:46.4320098Z ld.shared.b8 %rs243, [%r1977+2304]; 2026-02-21T08:52:46.4320165Z ld.shared.b8 %rs244, [%r1977+2432]; 2026-02-21T08:52:46.4320244Z ld.shared.b8 %rs245, [%r1977+2560]; 2026-02-21T08:52:46.4320308Z ld.shared.b8 %rs246, [%r1977+2688]; 2026-02-21T08:52:46.4320374Z ld.shared.b8 %rs247, [%r1977+2816]; 2026-02-21T08:52:46.4320448Z add.s32 %r1980, %r1976, %r3777; 2026-02-21T08:52:46.4320528Z ld.shared.b8 %rs248, [%r1980]; 2026-02-21T08:52:46.4320598Z ld.shared.b8 %rs249, [%r1977+3072]; 2026-02-21T08:52:46.4320665Z ld.shared.b8 %rs250, [%r1977+3200]; 2026-02-21T08:52:46.4320745Z ld.shared.b8 %rs251, [%r1977+3328]; 2026-02-21T08:52:46.4320810Z ld.shared.b8 %rs252, [%r1977+3456]; 2026-02-21T08:52:46.4320877Z ld.shared.b8 %rs253, [%r1977+3584]; 2026-02-21T08:52:46.4320954Z ld.shared.b8 %rs254, [%r1977+3712]; 2026-02-21T08:52:46.4321020Z ld.shared.b8 %rs255, [%r1977+3840]; 2026-02-21T08:52:46.4321085Z add.s32 %r1981, %r1976, %r3778; 2026-02-21T08:52:46.4321159Z ld.shared.b8 %rs256, [%r1981]; 2026-02-21T08:52:46.4321368Z .loc 1 57 28 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:57:28 2026-02-21T08:52:46.4321569Z shl.b16 %rs257, %rs225, 4; 2026-02-21T08:52:46.4321632Z shl.b16 %rs258, %rs226, 4; 2026-02-21T08:52:46.4321704Z shl.b16 %rs259, %rs227, 4; 2026-02-21T08:52:46.4321769Z shl.b16 %rs260, %rs228, 4; 2026-02-21T08:52:46.4321832Z shl.b16 %rs261, %rs229, 4; 2026-02-21T08:52:46.4321901Z shl.b16 %rs262, %rs230, 4; 2026-02-21T08:52:46.4321963Z shl.b16 %rs263, %rs231, 4; 2026-02-21T08:52:46.4322025Z shl.b16 %rs264, %rs232, 4; 2026-02-21T08:52:46.4322086Z shl.b16 
%rs265, %rs233, 4; 2026-02-21T08:52:46.4322156Z shl.b16 %rs266, %rs234, 4; 2026-02-21T08:52:46.4322218Z shl.b16 %rs267, %rs235, 4; 2026-02-21T08:52:46.4322279Z shl.b16 %rs268, %rs236, 4; 2026-02-21T08:52:46.4322363Z shl.b16 %rs269, %rs237, 4; 2026-02-21T08:52:46.4322433Z shl.b16 %rs270, %rs238, 4; 2026-02-21T08:52:46.4322587Z shl.b16 %rs271, %rs239, 4; 2026-02-21T08:52:46.4322652Z shl.b16 %rs272, %rs240, 4; 2026-02-21T08:52:46.4322721Z shl.b16 %rs273, %rs241, 4; 2026-02-21T08:52:46.4322789Z shl.b16 %rs274, %rs242, 4; 2026-02-21T08:52:46.4322854Z shl.b16 %rs275, %rs243, 4; 2026-02-21T08:52:46.4322927Z shl.b16 %rs276, %rs244, 4; 2026-02-21T08:52:46.4322990Z shl.b16 %rs277, %rs245, 4; 2026-02-21T08:52:46.4323052Z shl.b16 %rs278, %rs246, 4; 2026-02-21T08:52:46.4323113Z shl.b16 %rs279, %rs247, 4; 2026-02-21T08:52:46.4323187Z shl.b16 %rs280, %rs248, 4; 2026-02-21T08:52:46.4323253Z shl.b16 %rs281, %rs249, 4; 2026-02-21T08:52:46.4323314Z shl.b16 %rs282, %rs250, 4; 2026-02-21T08:52:46.4323389Z shl.b16 %rs283, %rs251, 4; 2026-02-21T08:52:46.4323451Z shl.b16 %rs284, %rs252, 4; 2026-02-21T08:52:46.4323511Z shl.b16 %rs285, %rs253, 4; 2026-02-21T08:52:46.4323576Z shl.b16 %rs286, %rs254, 4; 2026-02-21T08:52:46.4323637Z shl.b16 %rs287, %rs255, 4; 2026-02-21T08:52:46.4323706Z shl.b16 %rs288, %rs256, 4; 2026-02-21T08:52:46.4323912Z .loc 1 72 58 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:72:58 2026-02-21T08:52:46.4323997Z selp.b16 %rs289, %rs257, %rs225, %p57; 2026-02-21T08:52:46.4324065Z cvt.s16.s8 %rs290, %rs289; 2026-02-21T08:52:46.4324143Z shr.s16 %rs291, %rs290, 4; 2026-02-21T08:52:46.4324226Z selp.b16 %rs292, %rs258, %rs226, %p57; 2026-02-21T08:52:46.4324290Z cvt.s16.s8 %rs293, %rs292; 2026-02-21T08:52:46.4324352Z shr.s16 %rs294, %rs293, 4; 2026-02-21T08:52:46.4324422Z selp.b16 %rs295, %rs259, %rs227, %p57; 2026-02-21T08:52:46.4324491Z cvt.s16.s8 %rs296, %rs295; 2026-02-21T08:52:46.4324553Z shr.s16 %rs297, %rs296, 4; 2026-02-21T08:52:46.4324621Z selp.b16 %rs298, 
%rs260, %rs228, %p57; 2026-02-21T08:52:46.4324691Z cvt.s16.s8 %rs299, %rs298; 2026-02-21T08:52:46.4324753Z shr.s16 %rs300, %rs299, 4; 2026-02-21T08:52:46.4324823Z selp.b16 %rs301, %rs261, %rs229, %p57; 2026-02-21T08:52:46.4324886Z cvt.s16.s8 %rs302, %rs301; 2026-02-21T08:52:46.4324965Z shr.s16 %rs303, %rs302, 4; 2026-02-21T08:52:46.4325034Z selp.b16 %rs304, %rs262, %rs230, %p57; 2026-02-21T08:52:46.4325096Z cvt.s16.s8 %rs305, %rs304; 2026-02-21T08:52:46.4325167Z shr.s16 %rs306, %rs305, 4; 2026-02-21T08:52:46.4325235Z selp.b16 %rs307, %rs263, %rs231, %p57; 2026-02-21T08:52:46.4325296Z cvt.s16.s8 %rs308, %rs307; 2026-02-21T08:52:46.4325364Z shr.s16 %rs309, %rs308, 4; 2026-02-21T08:52:46.4325432Z selp.b16 %rs310, %rs264, %rs232, %p57; 2026-02-21T08:52:46.4325495Z cvt.s16.s8 %rs311, %rs310; 2026-02-21T08:52:46.4325556Z shr.s16 %rs312, %rs311, 4; 2026-02-21T08:52:46.4325631Z selp.b16 %rs313, %rs265, %rs233, %p57; 2026-02-21T08:52:46.4325692Z cvt.s16.s8 %rs314, %rs313; 2026-02-21T08:52:46.4325767Z shr.s16 %rs315, %rs314, 4; 2026-02-21T08:52:46.4325844Z selp.b16 %rs316, %rs266, %rs234, %p57; 2026-02-21T08:52:46.4325906Z cvt.s16.s8 %rs317, %rs316; 2026-02-21T08:52:46.4325967Z shr.s16 %rs318, %rs317, 4; 2026-02-21T08:52:46.4326038Z selp.b16 %rs319, %rs267, %rs235, %p57; 2026-02-21T08:52:46.4326109Z cvt.s16.s8 %rs320, %rs319; 2026-02-21T08:52:46.4326177Z shr.s16 %rs321, %rs320, 4; 2026-02-21T08:52:46.4326250Z selp.b16 %rs322, %rs268, %rs236, %p57; 2026-02-21T08:52:46.4326423Z cvt.s16.s8 %rs323, %rs322; 2026-02-21T08:52:46.4326612Z shr.s16 %rs324, %rs323, 4; 2026-02-21T08:52:46.4326687Z selp.b16 %rs325, %rs269, %rs237, %p57; 2026-02-21T08:52:46.4326750Z cvt.s16.s8 %rs326, %rs325; 2026-02-21T08:52:46.4326818Z shr.s16 %rs327, %rs326, 4; 2026-02-21T08:52:46.4326887Z selp.b16 %rs328, %rs270, %rs238, %p57; 2026-02-21T08:52:46.4326950Z cvt.s16.s8 %rs329, %rs328; 2026-02-21T08:52:46.4327020Z shr.s16 %rs330, %rs329, 4; 2026-02-21T08:52:46.4327090Z selp.b16 %rs331, %rs271, %rs239, 
%p57; 2026-02-21T08:52:46.4327152Z cvt.s16.s8 %rs332, %rs331; 2026-02-21T08:52:46.4327215Z shr.s16 %rs333, %rs332, 4; 2026-02-21T08:52:46.4327290Z selp.b16 %rs334, %rs272, %rs240, %p57; 2026-02-21T08:52:46.4327352Z cvt.s16.s8 %rs335, %rs334; 2026-02-21T08:52:46.4327579Z shr.s16 %rs336, %rs335, 4; 2026-02-21T08:52:46.4327660Z selp.b16 %rs337, %rs273, %rs241, %p57; 2026-02-21T08:52:46.4327723Z cvt.s16.s8 %rs338, %rs337; 2026-02-21T08:52:46.4327789Z shr.s16 %rs339, %rs338, 4; 2026-02-21T08:52:46.4327863Z selp.b16 %rs340, %rs274, %rs242, %p57; 2026-02-21T08:52:46.4327926Z cvt.s16.s8 %rs341, %rs340; 2026-02-21T08:52:46.4327987Z shr.s16 %rs342, %rs341, 4; 2026-02-21T08:52:46.4328055Z selp.b16 %rs343, %rs275, %rs243, %p57; 2026-02-21T08:52:46.4328125Z cvt.s16.s8 %rs344, %rs343; 2026-02-21T08:52:46.4328187Z shr.s16 %rs345, %rs344, 4; 2026-02-21T08:52:46.4328254Z selp.b16 %rs346, %rs276, %rs244, %p57; 2026-02-21T08:52:46.4328320Z cvt.s16.s8 %rs347, %rs346; 2026-02-21T08:52:46.4328382Z shr.s16 %rs348, %rs347, 4; 2026-02-21T08:52:46.4328450Z selp.b16 %rs349, %rs277, %rs245, %p57; 2026-02-21T08:52:46.4328510Z cvt.s16.s8 %rs350, %rs349; 2026-02-21T08:52:46.4328578Z shr.s16 %rs351, %rs350, 4; 2026-02-21T08:52:46.4328649Z selp.b16 %rs352, %rs278, %rs246, %p57; 2026-02-21T08:52:46.4328711Z cvt.s16.s8 %rs353, %rs352; 2026-02-21T08:52:46.4328781Z shr.s16 %rs354, %rs353, 4; 2026-02-21T08:52:46.4328849Z selp.b16 %rs355, %rs279, %rs247, %p57; 2026-02-21T08:52:46.4328914Z cvt.s16.s8 %rs356, %rs355; 2026-02-21T08:52:46.4328979Z shr.s16 %rs357, %rs356, 4; 2026-02-21T08:52:46.4329065Z selp.b16 %rs358, %rs280, %rs248, %p57; 2026-02-21T08:52:46.4329133Z cvt.s16.s8 %rs359, %rs358; 2026-02-21T08:52:46.4329196Z shr.s16 %rs360, %rs359, 4; 2026-02-21T08:52:46.4329271Z selp.b16 %rs361, %rs281, %rs249, %p57; 2026-02-21T08:52:46.4329332Z cvt.s16.s8 %rs362, %rs361; 2026-02-21T08:52:46.4329392Z shr.s16 %rs363, %rs362, 4; 2026-02-21T08:52:46.4329466Z selp.b16 %rs364, %rs282, %rs250, %p57; 
2026-02-21T08:52:46.4329529Z cvt.s16.s8 %rs365, %rs364; 2026-02-21T08:52:46.4329593Z shr.s16 %rs366, %rs365, 4; 2026-02-21T08:52:46.4329662Z selp.b16 %rs367, %rs283, %rs251, %p57; 2026-02-21T08:52:46.4329733Z cvt.s16.s8 %rs368, %rs367; 2026-02-21T08:52:46.4329798Z shr.s16 %rs369, %rs368, 4; 2026-02-21T08:52:46.4329868Z selp.b16 %rs370, %rs284, %rs252, %p57; 2026-02-21T08:52:46.4329934Z cvt.s16.s8 %rs371, %rs370; 2026-02-21T08:52:46.4330000Z shr.s16 %rs372, %rs371, 4; 2026-02-21T08:52:46.4330069Z selp.b16 %rs373, %rs285, %rs253, %p57; 2026-02-21T08:52:46.4330132Z cvt.s16.s8 %rs374, %rs373; 2026-02-21T08:52:46.4330204Z shr.s16 %rs375, %rs374, 4; 2026-02-21T08:52:46.4330273Z selp.b16 %rs376, %rs286, %rs254, %p57; 2026-02-21T08:52:46.4330335Z cvt.s16.s8 %rs377, %rs376; 2026-02-21T08:52:46.4330403Z shr.s16 %rs378, %rs377, 4; 2026-02-21T08:52:46.4330473Z selp.b16 %rs379, %rs287, %rs255, %p57; 2026-02-21T08:52:46.4330536Z cvt.s16.s8 %rs380, %rs379; 2026-02-21T08:52:46.4330599Z shr.s16 %rs381, %rs380, 4; 2026-02-21T08:52:46.4330678Z selp.b16 %rs382, %rs288, %rs256, %p57; 2026-02-21T08:52:46.4330742Z cvt.s16.s8 %rs383, %rs382; 2026-02-21T08:52:46.4330806Z shr.s16 %rs384, %rs383, 4; 2026-02-21T08:52:46.4331024Z .loc 1 77 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:77:32 2026-02-21T08:52:46.4331094Z cvt.rn.f32.s16 %r1982, %rs291; 2026-02-21T08:52:46.4331160Z cvt.rn.f32.s16 %r1983, %rs294; 2026-02-21T08:52:46.4331389Z cvt.rn.f32.s16 %r1984, %rs297; 2026-02-21T08:52:46.4331457Z cvt.rn.f32.s16 %r1985, %rs300; 2026-02-21T08:52:46.4331522Z cvt.rn.f32.s16 %r1986, %rs303; 2026-02-21T08:52:46.4331587Z cvt.rn.f32.s16 %r1987, %rs306; 2026-02-21T08:52:46.4331659Z cvt.rn.f32.s16 %r1988, %rs309; 2026-02-21T08:52:46.4331724Z cvt.rn.f32.s16 %r1989, %rs312; 2026-02-21T08:52:46.4331786Z cvt.rn.f32.s16 %r1990, %rs315; 2026-02-21T08:52:46.4331856Z cvt.rn.f32.s16 %r1991, %rs318; 2026-02-21T08:52:46.4331920Z cvt.rn.f32.s16 %r1992, %rs321; 2026-02-21T08:52:46.4331983Z 
cvt.rn.f32.s16 %r1993, %rs324; 2026-02-21T08:52:46.4332045Z cvt.rn.f32.s16 %r1994, %rs327; 2026-02-21T08:52:46.4332116Z cvt.rn.f32.s16 %r1995, %rs330; 2026-02-21T08:52:46.4332177Z cvt.rn.f32.s16 %r1996, %rs333; 2026-02-21T08:52:46.4332337Z cvt.rn.f32.s16 %r1997, %rs336; 2026-02-21T08:52:46.4332409Z cvt.rn.f32.s16 %r1998, %rs339; 2026-02-21T08:52:46.4332471Z cvt.rn.f32.s16 %r1999, %rs342; 2026-02-21T08:52:46.4332538Z cvt.rn.f32.s16 %r2000, %rs345; 2026-02-21T08:52:46.4332599Z cvt.rn.f32.s16 %r2001, %rs348; 2026-02-21T08:52:46.4332669Z cvt.rn.f32.s16 %r2002, %rs351; 2026-02-21T08:52:46.4332731Z cvt.rn.f32.s16 %r2003, %rs354; 2026-02-21T08:52:46.4332794Z cvt.rn.f32.s16 %r2004, %rs357; 2026-02-21T08:52:46.4332862Z cvt.rn.f32.s16 %r2005, %rs360; 2026-02-21T08:52:46.4332925Z cvt.rn.f32.s16 %r2006, %rs363; 2026-02-21T08:52:46.4332987Z cvt.rn.f32.s16 %r2007, %rs366; 2026-02-21T08:52:46.4333053Z cvt.rn.f32.s16 %r2008, %rs369; 2026-02-21T08:52:46.4333121Z cvt.rn.f32.s16 %r2009, %rs372; 2026-02-21T08:52:46.4333184Z cvt.rn.f32.s16 %r2010, %rs375; 2026-02-21T08:52:46.4333243Z cvt.rn.f32.s16 %r2011, %rs378; 2026-02-21T08:52:46.4333306Z cvt.rn.f32.s16 %r2012, %rs381; 2026-02-21T08:52:46.4333369Z cvt.rn.f32.s16 %r2013, %rs384; 2026-02-21T08:52:46.4333435Z st.shared.b32 [%r39], %r1982; 2026-02-21T08:52:46.4333510Z st.shared.b32 [%r39+8], %r1983; 2026-02-21T08:52:46.4333582Z st.shared.b32 [%r39+16384], %r1998; 2026-02-21T08:52:46.4333651Z st.shared.b32 [%r39+16392], %r1999; 2026-02-21T08:52:46.4333716Z st.shared.b32 [%r40], %r1984; 2026-02-21T08:52:46.4333788Z st.shared.b32 [%r40+8], %r1985; 2026-02-21T08:52:46.4333853Z st.shared.b32 [%r40+16384], %r2000; 2026-02-21T08:52:46.4333933Z st.shared.b32 [%r40+16392], %r2001; 2026-02-21T08:52:46.4334007Z st.shared.b32 [%r41], %r1986; 2026-02-21T08:52:46.4334071Z st.shared.b32 [%r41+8], %r1987; 2026-02-21T08:52:46.4334138Z st.shared.b32 [%r41+16384], %r2002; 2026-02-21T08:52:46.4334205Z st.shared.b32 [%r41+16392], %r2003; 
2026-02-21T08:52:46.4334275Z st.shared.b32 [%r42], %r1988; 2026-02-21T08:52:46.4334339Z st.shared.b32 [%r42+8], %r1989; 2026-02-21T08:52:46.4334404Z st.shared.b32 [%r42+16384], %r2004; 2026-02-21T08:52:46.4334478Z st.shared.b32 [%r42+16392], %r2005; 2026-02-21T08:52:46.4334542Z st.shared.b32 [%r43], %r1990; 2026-02-21T08:52:46.4334607Z st.shared.b32 [%r43+8], %r1991; 2026-02-21T08:52:46.4334682Z st.shared.b32 [%r43+16384], %r2006; 2026-02-21T08:52:46.4334750Z st.shared.b32 [%r43+16392], %r2007; 2026-02-21T08:52:46.4334815Z st.shared.b32 [%r44], %r1992; 2026-02-21T08:52:46.4334880Z st.shared.b32 [%r44+8], %r1993; 2026-02-21T08:52:46.4334951Z st.shared.b32 [%r44+16384], %r2008; 2026-02-21T08:52:46.4335017Z st.shared.b32 [%r44+16392], %r2009; 2026-02-21T08:52:46.4335081Z st.shared.b32 [%r45], %r1994; 2026-02-21T08:52:46.4335150Z st.shared.b32 [%r45+8], %r1995; 2026-02-21T08:52:46.4335217Z st.shared.b32 [%r45+16384], %r2010; 2026-02-21T08:52:46.4335281Z st.shared.b32 [%r45+16392], %r2011; 2026-02-21T08:52:46.4335346Z st.shared.b32 [%r46], %r1996; 2026-02-21T08:52:46.4335416Z st.shared.b32 [%r46+8], %r1997; 2026-02-21T08:52:46.4335482Z st.shared.b32 [%r46+16384], %r2012; 2026-02-21T08:52:46.4335551Z st.shared.b32 [%r46+16392], %r2013; 2026-02-21T08:52:46.4335621Z $L__tmp3: 2026-02-21T08:52:46.4335916Z .loc 2 291 36 // standard.py:291:36 @[ cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:84:40 ] 2026-02-21T08:52:46.4336087Z // begin inline asm 2026-02-21T08:52:46.4336188Z fence.proxy.async.shared::cta; 2026-02-21T08:52:46.4336249Z // end inline asm 2026-02-21T08:52:46.4336308Z bar.sync 0; 2026-02-21T08:52:46.4336394Z shfl.sync.idx.b32 %r2014, %r4, 0, 31, -1; 2026-02-21T08:52:46.4336599Z wgmma.fence.sync.aligned; 2026-02-21T08:52:46.4336679Z shl.b32 %r2015, %r2014, 11; 2026-02-21T08:52:46.4336744Z and.b32 %r2016, %r2015, 8192; 2026-02-21T08:52:46.4336815Z add.s32 %r2017, %r2016, %r3769; 2026-02-21T08:52:46.4336880Z bfe.u32 %r2018, %r2017, 4, 14; 
2026-02-21T08:52:46.4336946Z cvt.u64.u32 %rd158, %r2018; 2026-02-21T08:52:46.4337027Z or.b64 %rd145, %rd158, 4611686293372403712; 2026-02-21T08:52:46.4337107Z mov.pred %p15, -1; 2026-02-21T08:52:46.4337311Z // begin inline asm 2026-02-21T08:52:46.4338086Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3825,%r3826,%r3827,%r3828,%r3829,%r3830,%r3831,%r3832,%r3833,%r3834,%r3835,%r3836,%r3837,%r3838,%r3839,%r3840,%r3841,%r3842,%r3843,%r3844,%r3845,%r3846,%r3847,%r3848,%r3849,%r3850,%r3851,%r3852,%r3853,%r3854,%r3855,%r3856}, {%r1402,%r1403,%r1404,%r1405}, %rd145, %p15, 1, 1; 2026-02-21T08:52:46.4338157Z // end inline asm 2026-02-21T08:52:46.4338221Z add.s32 %r2019, %r2017, 32; 2026-02-21T08:52:46.4338286Z bfe.u32 %r2020, %r2019, 4, 14; 2026-02-21T08:52:46.4338358Z cvt.u64.u32 %rd159, %r2020; 2026-02-21T08:52:46.4338437Z or.b64 %rd146, %rd159, 4611686293372403712; 2026-02-21T08:52:46.4338498Z // begin inline asm 2026-02-21T08:52:46.4339266Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3825,%r3826,%r3827,%r3828,%r3829,%r3830,%r3831,%r3832,%r3833,%r3834,%r3835,%r3836,%r3837,%r3838,%r3839,%r3840,%r3841,%r3842,%r3843,%r3844,%r3845,%r3846,%r3847,%r3848,%r3849,%r3850,%r3851,%r3852,%r3853,%r3854,%r3855,%r3856}, {%r1470,%r1471,%r1472,%r1473}, %rd146, %p15, 1, 1; 2026-02-21T08:52:46.4339328Z // end inline asm 2026-02-21T08:52:46.4339391Z add.s32 %r2021, %r2017, 64; 2026-02-21T08:52:46.4339456Z bfe.u32 %r2022, %r2021, 4, 14; 2026-02-21T08:52:46.4339529Z cvt.u64.u32 %rd160, %r2022; 2026-02-21T08:52:46.4339602Z or.b64 %rd147, %rd160, 4611686293372403712; 2026-02-21T08:52:46.4339663Z // begin inline asm 2026-02-21T08:52:46.4340416Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3825,%r3826,%r3827,%r3828,%r3829,%r3830,%r3831,%r3832,%r3833,%r3834,%r3835,%r3836,%r3837,%r3838,%r3839,%r3840,%r3841,%r3842,%r3843,%r3844,%r3845,%r3846,%r3847,%r3848,%r3849,%r3850,%r3851,%r3852,%r3853,%r3854,%r3855,%r3856}, {%r1538,%r1539,%r1540,%r1541}, %rd147, %p15, 1, 1; 
2026-02-21T08:52:46.4340473Z // end inline asm 2026-02-21T08:52:46.4340536Z add.s32 %r2023, %r2017, 96; 2026-02-21T08:52:46.4340594Z bfe.u32 %r2024, %r2023, 4, 14; 2026-02-21T08:52:46.4340655Z cvt.u64.u32 %rd161, %r2024; 2026-02-21T08:52:46.4340725Z or.b64 %rd148, %rd161, 4611686293372403712; 2026-02-21T08:52:46.4340790Z // begin inline asm 2026-02-21T08:52:46.4341550Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3825,%r3826,%r3827,%r3828,%r3829,%r3830,%r3831,%r3832,%r3833,%r3834,%r3835,%r3836,%r3837,%r3838,%r3839,%r3840,%r3841,%r3842,%r3843,%r3844,%r3845,%r3846,%r3847,%r3848,%r3849,%r3850,%r3851,%r3852,%r3853,%r3854,%r3855,%r3856}, {%r1606,%r1607,%r1608,%r1609}, %rd148, %p15, 1, 1; 2026-02-21T08:52:46.4341615Z // end inline asm 2026-02-21T08:52:46.4341686Z add.s32 %r2025, %r2017, 16384; 2026-02-21T08:52:46.4341749Z bfe.u32 %r2026, %r2025, 4, 14; 2026-02-21T08:52:46.4341817Z cvt.u64.u32 %rd162, %r2026; 2026-02-21T08:52:46.4341898Z or.b64 %rd149, %rd162, 4611686293372403712; 2026-02-21T08:52:46.4341958Z // begin inline asm 2026-02-21T08:52:46.4342714Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3825,%r3826,%r3827,%r3828,%r3829,%r3830,%r3831,%r3832,%r3833,%r3834,%r3835,%r3836,%r3837,%r3838,%r3839,%r3840,%r3841,%r3842,%r3843,%r3844,%r3845,%r3846,%r3847,%r3848,%r3849,%r3850,%r3851,%r3852,%r3853,%r3854,%r3855,%r3856}, {%r1674,%r1675,%r1676,%r1677}, %rd149, %p15, 1, 1; 2026-02-21T08:52:46.4342779Z // end inline asm 2026-02-21T08:52:46.4342984Z add.s32 %r2027, %r2017, 16416; 2026-02-21T08:52:46.4343047Z bfe.u32 %r2028, %r2027, 4, 14; 2026-02-21T08:52:46.4343112Z cvt.u64.u32 %rd163, %r2028; 2026-02-21T08:52:46.4343193Z or.b64 %rd150, %rd163, 4611686293372403712; 2026-02-21T08:52:46.4343254Z // begin inline asm 2026-02-21T08:52:46.4344014Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r3825,%r3826,%r3827,%r3828,%r3829,%r3830,%r3831,%r3832,%r3833,%r3834,%r3835,%r3836,%r3837,%r3838,%r3839,%r3840,%r3841,%r3842,%r3843,%r3844,%r3845,%r3846,%r3847,%r3848,%r3849,%r3850,%r3851,%r3852,%r3853,%r3854,%r3855,%r3856}, {%r1742,%r1743,%r1744,%r1745}, %rd150, %p15, 1, 1; 2026-02-21T08:52:46.4344078Z // end inline asm 2026-02-21T08:52:46.4344140Z add.s32 %r2029, %r2017, 16448; 2026-02-21T08:52:46.4344298Z bfe.u32 %r2030, %r2029, 4, 14; 2026-02-21T08:52:46.4344369Z cvt.u64.u32 %rd164, %r2030; 2026-02-21T08:52:46.4344444Z or.b64 %rd151, %rd164, 4611686293372403712; 2026-02-21T08:52:46.4344505Z // begin inline asm 2026-02-21T08:52:46.4345272Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3825,%r3826,%r3827,%r3828,%r3829,%r3830,%r3831,%r3832,%r3833,%r3834,%r3835,%r3836,%r3837,%r3838,%r3839,%r3840,%r3841,%r3842,%r3843,%r3844,%r3845,%r3846,%r3847,%r3848,%r3849,%r3850,%r3851,%r3852,%r3853,%r3854,%r3855,%r3856}, {%r1810,%r1811,%r1812,%r1813}, %rd151, %p15, 1, 1; 2026-02-21T08:52:46.4345332Z // end inline asm 2026-02-21T08:52:46.4345393Z add.s32 %r2031, %r2017, 16480; 2026-02-21T08:52:46.4345458Z bfe.u32 %r2032, %r2031, 4, 14; 2026-02-21T08:52:46.4345521Z cvt.u64.u32 %rd165, %r2032; 2026-02-21T08:52:46.4345593Z or.b64 %rd152, %rd165, 4611686293372403712; 2026-02-21T08:52:46.4345654Z // begin inline asm 2026-02-21T08:52:46.4346411Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3825,%r3826,%r3827,%r3828,%r3829,%r3830,%r3831,%r3832,%r3833,%r3834,%r3835,%r3836,%r3837,%r3838,%r3839,%r3840,%r3841,%r3842,%r3843,%r3844,%r3845,%r3846,%r3847,%r3848,%r3849,%r3850,%r3851,%r3852,%r3853,%r3854,%r3855,%r3856}, {%r1878,%r1879,%r1880,%r1881}, %rd152, %p15, 1, 1; 2026-02-21T08:52:46.4346604Z // end inline asm 2026-02-21T08:52:46.4346688Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:46.4346755Z mov.b32 %r1915, 0; 2026-02-21T08:52:46.4346817Z mov.b32 %r1914, %r3769; 2026-02-21T08:52:46.4346879Z mov.b32 %r1916, %r1915; 2026-02-21T08:52:46.4346944Z // begin inline asm 
2026-02-21T08:52:46.4347511Z // wait for regs: %r3825,%r3826,%r3827,%r3828,%r3829,%r3830,%r3831,%r3832,%r3833,%r3834,%r3835,%r3836,%r3837,%r3838,%r3839,%r3840,%r3841,%r3842,%r3843,%r3844,%r3845,%r3846,%r3847,%r3848,%r3849,%r3850,%r3851,%r3852,%r3853,%r3854,%r3855,%r3856,%r1914,%r1915,%r1916 2026-02-21T08:52:46.4347589Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:46.4347652Z // end inline asm 2026-02-21T08:52:46.4347710Z $L__tmp4: 2026-02-21T08:52:46.4347939Z .loc 1 40 103 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:40:103 2026-02-21T08:52:46.4348007Z add.s32 %r2033, %r3824, 1; 2026-02-21T08:52:46.4348086Z setp.gt.s32 %p26, %r2033, 1; 2026-02-21T08:52:46.4348156Z selp.b32 %r3824, 0, %r2033, %p26; 2026-02-21T08:52:46.4348367Z .loc 1 48 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:32 2026-02-21T08:52:46.4348453Z mad.wide.s32 %rd156, %r3821, 2, %rd53; 2026-02-21T08:52:46.4348764Z .loc 1 48 80 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:80 2026-02-21T08:52:46.4348830Z shl.b32 %r2034, %r3824, 12; 2026-02-21T08:52:46.4348896Z shl.b32 %r2035, %r3824, 13; 2026-02-21T08:52:46.4348960Z add.s32 %r2036, %r1154, %r2035; 2026-02-21T08:52:46.4349025Z add.s32 %r1952, %r2036, %r14; 2026-02-21T08:52:46.4349090Z selp.b32 %r1953, 8, 0, %p24; 2026-02-21T08:52:46.4349158Z // begin inline asm 2026-02-21T08:52:46.4349309Z cp.async.ca.shared.global [ %r1952 + 0 ], [ %rd293 + 0 ], 0x8, %r1953; 2026-02-21T08:52:46.4349370Z // end inline asm 2026-02-21T08:52:46.4349437Z add.s32 %r1954, %r1952, 2048; 2026-02-21T08:52:46.4349675Z // begin inline asm 2026-02-21T08:52:46.4349816Z cp.async.ca.shared.global [ %r1954 + 0 ], [ %rd294 + 0 ], 0x8, %r1953; 2026-02-21T08:52:46.4349875Z // end inline asm 2026-02-21T08:52:46.4349943Z add.s32 %r1956, %r1952, 4096; 2026-02-21T08:52:46.4350004Z // begin inline asm 2026-02-21T08:52:46.4350139Z cp.async.ca.shared.global [ %r1956 + 0 ], [ %rd295 + 0 ], 0x8, %r1953; 2026-02-21T08:52:46.4350204Z // end 
inline asm 2026-02-21T08:52:46.4350266Z add.s32 %r1958, %r1952, 6144; 2026-02-21T08:52:46.4350325Z // begin inline asm 2026-02-21T08:52:46.4350463Z cp.async.ca.shared.global [ %r1958 + 0 ], [ %rd156 + 0 ], 0x8, %r1953; 2026-02-21T08:52:46.4350523Z // end inline asm 2026-02-21T08:52:46.4350592Z cp.async.commit_group; 2026-02-21T08:52:46.4350921Z .loc 1 54 34 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:34 2026-02-21T08:52:46.4350997Z cvt.s64.s32 %rd166, %r3822; 2026-02-21T08:52:46.4351066Z add.s64 %rd157, %rd54, %rd166; 2026-02-21T08:52:46.4351272Z .loc 1 54 87 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:87 2026-02-21T08:52:46.4351340Z add.s32 %r1960, %r20, %r2034; 2026-02-21T08:52:46.4351404Z selp.b32 %r1961, 16, 0, %p24; 2026-02-21T08:52:46.4351465Z // begin inline asm 2026-02-21T08:52:46.4351610Z cp.async.cg.shared.global [ %r1960 + 0 ], [ %rd157 + 0 ], 0x10, %r1961; 2026-02-21T08:52:46.4351675Z // end inline asm 2026-02-21T08:52:46.4351743Z cp.async.commit_group; 2026-02-21T08:52:46.4351958Z .loc 1 40 103 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:40:103 2026-02-21T08:52:46.4352030Z add.s32 %r3822, %r3822, 229376; 2026-02-21T08:52:46.4352096Z add.s64 %rd295, %rd295, 128; 2026-02-21T08:52:46.4352161Z add.s64 %rd294, %rd294, 128; 2026-02-21T08:52:46.4352233Z add.s64 %rd293, %rd293, 128; 2026-02-21T08:52:46.4352298Z add.s32 %r3821, %r3821, 64; 2026-02-21T08:52:46.4352368Z setp.lt.u64 %p27, %rd296, 4064; 2026-02-21T08:52:46.4352450Z @%p27 bra $L__BB0_5; 2026-02-21T08:52:46.4352575Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:46.4352795Z .loc 1 33 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:33:32 2026-02-21T08:52:46.4352862Z or.b32 %r2096, %r185, %r11; 2026-02-21T08:52:46.4353085Z .loc 1 40 103 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:40:103 2026-02-21T08:52:46.4353161Z cp.async.wait_group 0; 2026-02-21T08:52:46.4353220Z bar.sync 0; 2026-02-21T08:52:46.4353427Z .loc 
1 87 28 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:87:28 2026-02-21T08:52:46.4353510Z cvt.rn.bf16x2.f32 %r2097, %r3826, %r3825; 2026-02-21T08:52:46.4353587Z cvt.rn.bf16x2.f32 %r2098, %r3828, %r3827; 2026-02-21T08:52:46.4353664Z cvt.rn.bf16x2.f32 %r2099, %r3830, %r3829; 2026-02-21T08:52:46.4353742Z cvt.rn.bf16x2.f32 %r2100, %r3832, %r3831; 2026-02-21T08:52:46.4353813Z cvt.rn.bf16x2.f32 %r2101, %r3834, %r3833; 2026-02-21T08:52:46.4353888Z cvt.rn.bf16x2.f32 %r2102, %r3836, %r3835; 2026-02-21T08:52:46.4353964Z cvt.rn.bf16x2.f32 %r2103, %r3838, %r3837; 2026-02-21T08:52:46.4354035Z cvt.rn.bf16x2.f32 %r2104, %r3840, %r3839; 2026-02-21T08:52:46.4354105Z cvt.rn.bf16x2.f32 %r2105, %r3842, %r3841; 2026-02-21T08:52:46.4354180Z cvt.rn.bf16x2.f32 %r2106, %r3844, %r3843; 2026-02-21T08:52:46.4354265Z cvt.rn.bf16x2.f32 %r2107, %r3846, %r3845; 2026-02-21T08:52:46.4354338Z cvt.rn.bf16x2.f32 %r2108, %r3848, %r3847; 2026-02-21T08:52:46.4354412Z cvt.rn.bf16x2.f32 %r2109, %r3850, %r3849; 2026-02-21T08:52:46.4354489Z cvt.rn.bf16x2.f32 %r2110, %r3852, %r3851; 2026-02-21T08:52:46.4354562Z cvt.rn.bf16x2.f32 %r2111, %r3854, %r3853; 2026-02-21T08:52:46.4354633Z cvt.rn.bf16x2.f32 %r2112, %r3856, %r3855; 2026-02-21T08:52:46.4354851Z .loc 1 88 50 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:88:50 2026-02-21T08:52:46.4354925Z mad.lo.s32 %r2113, %r181, 7168, %r2096; 2026-02-21T08:52:46.4355106Z mad.lo.s32 %r2114, %r182, 7168, %r2096; 2026-02-21T08:52:46.4355193Z mad.lo.s32 %r2115, %r183, 7168, %r2096; 2026-02-21T08:52:46.4355260Z mad.lo.s32 %r2116, %r184, 7168, %r2096; 2026-02-21T08:52:46.4355469Z .loc 1 88 22 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:88:22 2026-02-21T08:52:46.4355543Z mad.wide.s32 %rd167, %r2113, 2, %rd55; 2026-02-21T08:52:46.4355616Z mad.wide.s32 %rd168, %r2114, 2, %rd55; 2026-02-21T08:52:46.4355683Z mad.wide.s32 %rd169, %r2115, 2, %rd55; 2026-02-21T08:52:46.4355752Z mad.wide.s32 %rd170, %r2116, 2, %rd55; 2026-02-21T08:52:46.4355957Z 
.loc 1 88 81 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:88:81 2026-02-21T08:52:46.4356162Z st.shared.v4.b32 [%r47], {%r2097, %r2099, %r2101, %r2103}; 2026-02-21T08:52:46.4356283Z st.shared.v4.b32 [%r47+512], {%r2098, %r2100, %r2102, %r2104}; 2026-02-21T08:52:46.4356394Z st.shared.v4.b32 [%r48], {%r2105, %r2107, %r2109, %r2111}; 2026-02-21T08:52:46.4356642Z st.shared.v4.b32 [%r48+512], {%r2106, %r2108, %r2110, %r2112}; 2026-02-21T08:52:46.4356706Z bar.sync 0; 2026-02-21T08:52:46.4356770Z // begin inline asm 2026-02-21T08:52:46.4356971Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2057, %r2058, %r2059, %r2060}, [%r1230]; 2026-02-21T08:52:46.4357031Z // end inline asm 2026-02-21T08:52:46.4357090Z // begin inline asm 2026-02-21T08:52:46.4357282Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2061, %r2062, %r2063, %r2064}, [%r1235]; 2026-02-21T08:52:46.4357340Z // end inline asm 2026-02-21T08:52:46.4357399Z // begin inline asm 2026-02-21T08:52:46.4357583Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2065, %r2066, %r2067, %r2068}, [%r1240]; 2026-02-21T08:52:46.4357641Z // end inline asm 2026-02-21T08:52:46.4357703Z // begin inline asm 2026-02-21T08:52:46.4357882Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2069, %r2070, %r2071, %r2072}, [%r1245]; 2026-02-21T08:52:46.4357947Z // end inline asm 2026-02-21T08:52:46.4358024Z // begin inline asm 2026-02-21T08:52:46.4358156Z st.global.v4.b32 [ %rd167 + 0 ], { %r2057, %r2058, %r2059, %r2060 }; 2026-02-21T08:52:46.4358219Z // end inline asm 2026-02-21T08:52:46.4358279Z // begin inline asm 2026-02-21T08:52:46.4358399Z st.global.v4.b32 [ %rd168 + 0 ], { %r2061, %r2062, %r2063, %r2064 }; 2026-02-21T08:52:46.4358457Z // end inline asm 2026-02-21T08:52:46.4358522Z // begin inline asm 2026-02-21T08:52:46.4358637Z st.global.v4.b32 [ %rd169 + 0 ], { %r2065, %r2066, %r2067, %r2068 }; 2026-02-21T08:52:46.4358694Z // end inline asm 2026-02-21T08:52:46.4358759Z // begin inline asm 2026-02-21T08:52:46.4358874Z st.global.v4.b32 [ 
%rd170 + 0 ], { %r2069, %r2070, %r2071, %r2072 }; 2026-02-21T08:52:46.4358932Z // end inline asm 2026-02-21T08:52:46.4359162Z .loc 1 19 112 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:19:112 2026-02-21T08:52:46.4359236Z add.s32 %r2117, %r3784, 8448; 2026-02-21T08:52:46.4359442Z .loc 1 25 35 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:25:35 2026-02-21T08:52:46.4359521Z mul.hi.s32 %r2118, %r2117, -1840700269; 2026-02-21T08:52:46.4359594Z add.s32 %r2119, %r2118, %r2117; 2026-02-21T08:52:46.4359658Z shr.u32 %r2120, %r2119, 31; 2026-02-21T08:52:46.4359722Z shr.s32 %r2121, %r2119, 6; 2026-02-21T08:52:46.4359792Z add.s32 %r2122, %r2121, %r2120; 2026-02-21T08:52:46.4360000Z .loc 1 26 33 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:26:33 2026-02-21T08:52:46.4360063Z shl.b32 %r2123, %r2122, 1; 2026-02-21T08:52:46.4360271Z .loc 1 27 39 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:27:39 2026-02-21T08:52:46.4360335Z sub.s32 %r2124, 1, %r2123; 2026-02-21T08:52:46.4360536Z .loc 1 27 52 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:27:52 2026-02-21T08:52:46.4360597Z min.s32 %r2125, %r2124, 2; 2026-02-21T08:52:46.4360801Z .loc 1 28 45 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:28:45 2026-02-21T08:52:46.4361011Z mul.lo.s32 %r2126, %r2122, 112; 2026-02-21T08:52:46.4361076Z sub.s32 %r2127, %r2117, %r2126; 2026-02-21T08:52:46.4361283Z .loc 1 29 51 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:29:51 2026-02-21T08:52:46.4361348Z div.s32 %r2128, %r2127, %r2125; 2026-02-21T08:52:46.4361547Z .loc 1 28 64 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:28:64 2026-02-21T08:52:46.4361624Z mul.lo.s32 %r2129, %r2128, %r2125; 2026-02-21T08:52:46.4361688Z sub.s32 %r2130, %r2127, %r2129; 2026-02-21T08:52:46.4361899Z .loc 1 28 30 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:28:30 2026-02-21T08:52:46.4362102Z add.s32 %r2131, %r2130, %r2123; 2026-02-21T08:52:46.4362309Z .loc 1 
30 27 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:30:27 2026-02-21T08:52:46.4362372Z shl.b32 %r2132, %r2131, 6; 2026-02-21T08:52:46.4362575Z .loc 1 31 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:31:32 2026-02-21T08:52:46.4362644Z or.b32 %r260, %r2132, %r5; 2026-02-21T08:52:46.4362706Z or.b32 %r261, %r2132, %r6; 2026-02-21T08:52:46.4362767Z or.b32 %r262, %r2132, %r7; 2026-02-21T08:52:46.4362834Z or.b32 %r263, %r2132, %r8; 2026-02-21T08:52:46.4363040Z .loc 1 32 27 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:32:27 2026-02-21T08:52:46.4363107Z shl.b32 %r264, %r2128, 7; 2026-02-21T08:52:46.4363323Z .loc 1 33 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:33:32 2026-02-21T08:52:46.4363389Z or.b32 %r2133, %r264, %r10; 2026-02-21T08:52:46.4363597Z .loc 1 48 53 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:53 2026-02-21T08:52:46.4363662Z shl.b32 %r2134, %r260, 13; 2026-02-21T08:52:46.4363730Z shl.b32 %r2135, %r261, 13; 2026-02-21T08:52:46.4363794Z shl.b32 %r2136, %r262, 13; 2026-02-21T08:52:46.4363853Z shl.b32 %r2137, %r263, 13; 2026-02-21T08:52:46.4364059Z .loc 1 48 60 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:60 2026-02-21T08:52:46.4364135Z or.b32 %r2138, %r2134, %r9; 2026-02-21T08:52:46.4364199Z or.b32 %r2139, %r2135, %r9; 2026-02-21T08:52:46.4364265Z or.b32 %r2140, %r2136, %r9; 2026-02-21T08:52:46.4364327Z or.b32 %r2141, %r2137, %r9; 2026-02-21T08:52:46.4364527Z .loc 1 48 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:32 2026-02-21T08:52:46.4364604Z mad.wide.s32 %rd171, %r2138, 2, %rd53; 2026-02-21T08:52:46.4364681Z mad.wide.s32 %rd172, %r2139, 2, %rd53; 2026-02-21T08:52:46.4364748Z mad.wide.s32 %rd173, %r2140, 2, %rd53; 2026-02-21T08:52:46.4364820Z mad.wide.s32 %rd174, %r2141, 2, %rd53; 2026-02-21T08:52:46.4364884Z mov.b32 %r2074, 8; 2026-02-21T08:52:46.4365094Z .loc 1 48 80 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:80 
2026-02-21T08:52:46.4365160Z // begin inline asm 2026-02-21T08:52:46.4365307Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd171 + 0 ], 0x8, %r2074; 2026-02-21T08:52:46.4365365Z // end inline asm 2026-02-21T08:52:46.4365425Z // begin inline asm 2026-02-21T08:52:46.4365560Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd172 + 0 ], 0x8, %r2074; 2026-02-21T08:52:46.4365623Z // end inline asm 2026-02-21T08:52:46.4365684Z // begin inline asm 2026-02-21T08:52:46.4365817Z cp.async.ca.shared.global [ %r17 + 0 ], [ %rd173 + 0 ], 0x8, %r2074; 2026-02-21T08:52:46.4365879Z // end inline asm 2026-02-21T08:52:46.4365938Z // begin inline asm 2026-02-21T08:52:46.4366066Z cp.async.ca.shared.global [ %r18 + 0 ], [ %rd174 + 0 ], 0x8, %r2074; 2026-02-21T08:52:46.4366123Z // end inline asm 2026-02-21T08:52:46.4366200Z cp.async.commit_group; 2026-02-21T08:52:46.4366406Z .loc 1 54 62 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:62 2026-02-21T08:52:46.4366859Z add.s32 %r2142, %r2133, %r3770; 2026-02-21T08:52:46.4367089Z .loc 1 54 34 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:34 2026-02-21T08:52:46.4367158Z cvt.s64.s32 %rd182, %r2142; 2026-02-21T08:52:46.4367227Z add.s64 %rd175, %rd54, %rd182; 2026-02-21T08:52:46.4367291Z mov.b32 %r2082, 16; 2026-02-21T08:52:46.4367499Z .loc 1 54 87 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:87 2026-02-21T08:52:46.4367560Z // begin inline asm 2026-02-21T08:52:46.4367715Z cp.async.cg.shared.global [ %r20 + 0 ], [ %rd175 + 0 ], 0x10, %r2082; 2026-02-21T08:52:46.4367780Z // end inline asm 2026-02-21T08:52:46.4367847Z cp.async.commit_group; 2026-02-21T08:52:46.4368179Z .loc 1 48 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:32 2026-02-21T08:52:46.4368253Z cvt.s64.s32 %rd183, %r2134; 2026-02-21T08:52:46.4368318Z or.b64 %rd184, %rd183, %rd288; 2026-02-21T08:52:46.4368383Z shl.b64 %rd185, %rd184, 1; 2026-02-21T08:52:46.4368456Z add.s64 %rd186, %rd53, %rd185; 2026-02-21T08:52:46.4368522Z add.s64 
%rd176, %rd186, 128; 2026-02-21T08:52:46.4368585Z cvt.s64.s32 %rd187, %r2135; 2026-02-21T08:52:46.4368659Z or.b64 %rd188, %rd187, %rd288; 2026-02-21T08:52:46.4368728Z shl.b64 %rd189, %rd188, 1; 2026-02-21T08:52:46.4368791Z add.s64 %rd190, %rd53, %rd189; 2026-02-21T08:52:46.4368855Z add.s64 %rd177, %rd190, 128; 2026-02-21T08:52:46.4368922Z cvt.s64.s32 %rd191, %r2136; 2026-02-21T08:52:46.4368984Z or.b64 %rd192, %rd191, %rd288; 2026-02-21T08:52:46.4369046Z shl.b64 %rd193, %rd192, 1; 2026-02-21T08:52:46.4369109Z add.s64 %rd194, %rd53, %rd193; 2026-02-21T08:52:46.4369178Z add.s64 %rd178, %rd194, 128; 2026-02-21T08:52:46.4369239Z cvt.s64.s32 %rd195, %r2137; 2026-02-21T08:52:46.4369304Z or.b64 %rd196, %rd195, %rd288; 2026-02-21T08:52:46.4369373Z shl.b64 %rd197, %rd196, 1; 2026-02-21T08:52:46.4369438Z add.s64 %rd198, %rd53, %rd197; 2026-02-21T08:52:46.4369503Z add.s64 %rd179, %rd198, 128; 2026-02-21T08:52:46.4369718Z .loc 1 48 80 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:80 2026-02-21T08:52:46.4369776Z bar.sync 0; 2026-02-21T08:52:46.4369836Z // begin inline asm 2026-02-21T08:52:46.4369975Z cp.async.ca.shared.global [ %r21 + 0 ], [ %rd176 + 0 ], 0x8, %r2074; 2026-02-21T08:52:46.4370040Z // end inline asm 2026-02-21T08:52:46.4370101Z // begin inline asm 2026-02-21T08:52:46.4370231Z cp.async.ca.shared.global [ %r22 + 0 ], [ %rd177 + 0 ], 0x8, %r2074; 2026-02-21T08:52:46.4370293Z // end inline asm 2026-02-21T08:52:46.4370352Z // begin inline asm 2026-02-21T08:52:46.4370480Z cp.async.ca.shared.global [ %r23 + 0 ], [ %rd178 + 0 ], 0x8, %r2074; 2026-02-21T08:52:46.4370537Z // end inline asm 2026-02-21T08:52:46.4370602Z // begin inline asm 2026-02-21T08:52:46.4370731Z cp.async.ca.shared.global [ %r24 + 0 ], [ %rd179 + 0 ], 0x8, %r2074; 2026-02-21T08:52:46.4370788Z // end inline asm 2026-02-21T08:52:46.4370861Z cp.async.commit_group; 2026-02-21T08:52:46.4371064Z .loc 1 54 34 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:34 
2026-02-21T08:52:46.4371127Z cvt.s64.s32 %rd199, %r2133; 2026-02-21T08:52:46.4371199Z add.s64 %rd200, %rd199, %rd5; 2026-02-21T08:52:46.4371264Z add.s64 %rd201, %rd54, %rd200; 2026-02-21T08:52:46.4371330Z add.s64 %rd180, %rd201, 229376; 2026-02-21T08:52:46.4371534Z .loc 1 54 87 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:87 2026-02-21T08:52:46.4371601Z // begin inline asm 2026-02-21T08:52:46.4371740Z cp.async.cg.shared.global [ %r25 + 0 ], [ %rd180 + 0 ], 0x10, %r2082; 2026-02-21T08:52:46.4371798Z // end inline asm 2026-02-21T08:52:46.4371870Z cp.async.commit_group; 2026-02-21T08:52:46.4372084Z .loc 1 40 103 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:40:103 2026-02-21T08:52:46.4372149Z add.s32 %r3858, %r53, %r264; 2026-02-21T08:52:46.4372218Z or.b32 %r2143, %r7, %r2132; 2026-02-21T08:52:46.4372438Z shl.b32 %r2144, %r2143, 13; 2026-02-21T08:52:46.4372511Z mad.wide.s32 %rd299, %r2144, 2, %rd1; 2026-02-21T08:52:46.4372573Z or.b32 %r2145, %r6, %r2132; 2026-02-21T08:52:46.4372639Z shl.b32 %r2146, %r2145, 13; 2026-02-21T08:52:46.4372708Z mad.wide.s32 %rd298, %r2146, 2, %rd1; 2026-02-21T08:52:46.4372769Z shl.b32 %r2147, %r2131, 19; 2026-02-21T08:52:46.4372839Z or.b32 %r2148, %r56, %r2147; 2026-02-21T08:52:46.4372907Z mad.wide.s32 %rd297, %r2148, 2, %rd1; 2026-02-21T08:52:46.4372969Z or.b32 %r3857, %r57, %r2147; 2026-02-21T08:52:46.4373031Z mov.b32 %r3861, 0f00000000; 2026-02-21T08:52:46.4373095Z mov.b32 %r3860, 1; 2026-02-21T08:52:46.4373156Z mov.b32 %r3859, -1; 2026-02-21T08:52:46.4373216Z mov.b64 %rd300, -32; 2026-02-21T08:52:46.4373287Z mov.b32 %r3862, %r3861; 2026-02-21T08:52:46.4373440Z mov.b32 %r3863, %r3861; 2026-02-21T08:52:46.4373501Z mov.b32 %r3864, %r3861; 2026-02-21T08:52:46.4373559Z mov.b32 %r3865, %r3861; 2026-02-21T08:52:46.4373624Z mov.b32 %r3866, %r3861; 2026-02-21T08:52:46.4373686Z mov.b32 %r3867, %r3861; 2026-02-21T08:52:46.4373757Z mov.b32 %r3868, %r3861; 2026-02-21T08:52:46.4373821Z mov.b32 %r3869, %r3861; 
2026-02-21T08:52:46.4373879Z mov.b32 %r3870, %r3861; 2026-02-21T08:52:46.4373937Z mov.b32 %r3871, %r3861; 2026-02-21T08:52:46.4373994Z mov.b32 %r3872, %r3861; 2026-02-21T08:52:46.4374058Z mov.b32 %r3873, %r3861; 2026-02-21T08:52:46.4374117Z mov.b32 %r3874, %r3861; 2026-02-21T08:52:46.4374175Z mov.b32 %r3875, %r3861; 2026-02-21T08:52:46.4374242Z mov.b32 %r3876, %r3861; 2026-02-21T08:52:46.4374300Z mov.b32 %r3877, %r3861; 2026-02-21T08:52:46.4374359Z mov.b32 %r3878, %r3861; 2026-02-21T08:52:46.4374422Z mov.b32 %r3879, %r3861; 2026-02-21T08:52:46.4374488Z mov.b32 %r3880, %r3861; 2026-02-21T08:52:46.4374553Z mov.b32 %r3881, %r3861; 2026-02-21T08:52:46.4374615Z mov.b32 %r3882, %r3861; 2026-02-21T08:52:46.4374679Z mov.b32 %r3883, %r3861; 2026-02-21T08:52:46.4374737Z mov.b32 %r3884, %r3861; 2026-02-21T08:52:46.4374798Z mov.b32 %r3885, %r3861; 2026-02-21T08:52:46.4374861Z mov.b32 %r3886, %r3861; 2026-02-21T08:52:46.4374921Z mov.b32 %r3887, %r3861; 2026-02-21T08:52:46.4374980Z mov.b32 %r3888, %r3861; 2026-02-21T08:52:46.4375038Z mov.b32 %r3889, %r3861; 2026-02-21T08:52:46.4375103Z mov.b32 %r3890, %r3861; 2026-02-21T08:52:46.4375163Z mov.b32 %r3891, %r3861; 2026-02-21T08:52:46.4375220Z mov.b32 %r3892, %r3861; 2026-02-21T08:52:46.4375344Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:46.4375452Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:46.4375520Z add.s64 %rd300, %rd300, 32; 2026-02-21T08:52:46.4375592Z setp.lt.u64 %p37, %rd300, 4032; 2026-02-21T08:52:46.4375660Z add.s32 %r2773, %r3859, 1; 2026-02-21T08:52:46.4375728Z setp.gt.s32 %p38, %r2773, 1; 2026-02-21T08:52:46.4375798Z selp.b32 %r3859, 0, %r2773, %p38; 2026-02-21T08:52:46.4376021Z .loc 1 48 80 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:80 2026-02-21T08:52:46.4376093Z cp.async.wait_group 2; 2026-02-21T08:52:46.4376152Z bar.sync 0; 2026-02-21T08:52:46.4376219Z shl.b32 %r2774, %r3859, 12; 2026-02-21T08:52:46.4376284Z shl.b32 %r2775, %r3859, 13; 2026-02-21T08:52:46.4376349Z 
add.s32 %r2777, %r1154, %r2775; 2026-02-21T08:52:46.4376697Z .loc 1 52 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:52:32 2026-02-21T08:52:46.4376775Z add.s32 %r2778, %r2777, %r26; 2026-02-21T08:52:46.4376846Z ld.shared.b16 %rs385, [%r2778]; 2026-02-21T08:52:46.4376919Z ld.shared.b16 %rs386, [%r2778+1024]; 2026-02-21T08:52:46.4376989Z ld.shared.b16 %rs387, [%r2778+64]; 2026-02-21T08:52:46.4377064Z ld.shared.b16 %rs388, [%r2778+1088]; 2026-02-21T08:52:46.4377134Z add.s32 %r2779, %r2777, %r27; 2026-02-21T08:52:46.4377210Z ld.shared.b16 %rs389, [%r2779]; 2026-02-21T08:52:46.4377285Z ld.shared.b16 %rs390, [%r2779+1024]; 2026-02-21T08:52:46.4377354Z ld.shared.b16 %rs391, [%r2779+64]; 2026-02-21T08:52:46.4414776Z ld.shared.b16 %rs392, [%r2779+1088]; 2026-02-21T08:52:46.4414848Z add.s32 %r2780, %r2777, %r28; 2026-02-21T08:52:46.4414922Z ld.shared.b16 %rs393, [%r2780]; 2026-02-21T08:52:46.4414996Z ld.shared.b16 %rs394, [%r2780+1024]; 2026-02-21T08:52:46.4415064Z ld.shared.b16 %rs395, [%r2780+64]; 2026-02-21T08:52:46.4415135Z ld.shared.b16 %rs396, [%r2780+1088]; 2026-02-21T08:52:46.4415197Z add.s32 %r2781, %r2777, %r29; 2026-02-21T08:52:46.4415263Z ld.shared.b16 %rs397, [%r2781]; 2026-02-21T08:52:46.4415340Z ld.shared.b16 %rs398, [%r2781+1024]; 2026-02-21T08:52:46.4415406Z ld.shared.b16 %rs399, [%r2781+64]; 2026-02-21T08:52:46.4415475Z ld.shared.b16 %rs400, [%r2781+1088]; 2026-02-21T08:52:46.4415538Z add.s32 %r2782, %r2777, %r30; 2026-02-21T08:52:46.4415754Z ld.shared.b16 %rs401, [%r2782]; 2026-02-21T08:52:46.4415825Z ld.shared.b16 %rs402, [%r2782+1024]; 2026-02-21T08:52:46.4415908Z ld.shared.b16 %rs403, [%r2782+64]; 2026-02-21T08:52:46.4415981Z ld.shared.b16 %rs404, [%r2782+1088]; 2026-02-21T08:52:46.4416048Z add.s32 %r2783, %r2777, %r31; 2026-02-21T08:52:46.4416112Z ld.shared.b16 %rs405, [%r2783]; 2026-02-21T08:52:46.4416179Z ld.shared.b16 %rs406, [%r2783+1024]; 2026-02-21T08:52:46.4416250Z ld.shared.b16 %rs407, [%r2783+64]; 2026-02-21T08:52:46.4416315Z 
ld.shared.b16 %rs408, [%r2783+1088]; 2026-02-21T08:52:46.4416379Z add.s32 %r2784, %r2777, %r32; 2026-02-21T08:52:46.4416568Z ld.shared.b16 %rs409, [%r2784]; 2026-02-21T08:52:46.4416640Z ld.shared.b16 %rs410, [%r2784+1024]; 2026-02-21T08:52:46.4416708Z ld.shared.b16 %rs411, [%r2784+64]; 2026-02-21T08:52:46.4416782Z ld.shared.b16 %rs412, [%r2784+1088]; 2026-02-21T08:52:46.4416845Z add.s32 %r2785, %r2777, %r33; 2026-02-21T08:52:46.4416911Z ld.shared.b16 %rs413, [%r2785]; 2026-02-21T08:52:46.4416982Z ld.shared.b16 %rs414, [%r2785+1024]; 2026-02-21T08:52:46.4417053Z ld.shared.b16 %rs415, [%r2785+64]; 2026-02-21T08:52:46.4417120Z ld.shared.b16 %rs416, [%r2785+1088]; 2026-02-21T08:52:46.4417192Z cvt.f32.bf16 %r2213, %rs385; 2026-02-21T08:52:46.4417274Z cvt.f32.bf16 %r2214, %rs386; 2026-02-21T08:52:46.4417339Z cvt.f32.bf16 %r2215, %rs389; 2026-02-21T08:52:46.4417401Z cvt.f32.bf16 %r2216, %rs390; 2026-02-21T08:52:46.4417462Z cvt.f32.bf16 %r2281, %rs393; 2026-02-21T08:52:46.4417530Z cvt.f32.bf16 %r2282, %rs394; 2026-02-21T08:52:46.4417592Z cvt.f32.bf16 %r2283, %rs397; 2026-02-21T08:52:46.4417652Z cvt.f32.bf16 %r2284, %rs398; 2026-02-21T08:52:46.4417720Z cvt.f32.bf16 %r2349, %rs401; 2026-02-21T08:52:46.4417781Z cvt.f32.bf16 %r2350, %rs402; 2026-02-21T08:52:46.4417841Z cvt.f32.bf16 %r2351, %rs405; 2026-02-21T08:52:46.4417903Z cvt.f32.bf16 %r2352, %rs406; 2026-02-21T08:52:46.4417973Z cvt.f32.bf16 %r2417, %rs409; 2026-02-21T08:52:46.4418037Z cvt.f32.bf16 %r2418, %rs410; 2026-02-21T08:52:46.4418098Z cvt.f32.bf16 %r2419, %rs413; 2026-02-21T08:52:46.4418164Z cvt.f32.bf16 %r2420, %rs414; 2026-02-21T08:52:46.4418224Z cvt.f32.bf16 %r2485, %rs387; 2026-02-21T08:52:46.4418287Z cvt.f32.bf16 %r2486, %rs388; 2026-02-21T08:52:46.4418355Z cvt.f32.bf16 %r2487, %rs391; 2026-02-21T08:52:46.4418416Z cvt.f32.bf16 %r2488, %rs392; 2026-02-21T08:52:46.4418478Z cvt.f32.bf16 %r2553, %rs395; 2026-02-21T08:52:46.4418553Z cvt.f32.bf16 %r2554, %rs396; 2026-02-21T08:52:46.4418623Z cvt.f32.bf16 %r2555, 
%rs399; 2026-02-21T08:52:46.4418684Z cvt.f32.bf16 %r2556, %rs400; 2026-02-21T08:52:46.4418746Z cvt.f32.bf16 %r2621, %rs403; 2026-02-21T08:52:46.4418813Z cvt.f32.bf16 %r2622, %rs404; 2026-02-21T08:52:46.4418872Z cvt.f32.bf16 %r2623, %rs407; 2026-02-21T08:52:46.4418932Z cvt.f32.bf16 %r2624, %rs408; 2026-02-21T08:52:46.4418994Z cvt.f32.bf16 %r2689, %rs411; 2026-02-21T08:52:46.4419060Z cvt.f32.bf16 %r2690, %rs412; 2026-02-21T08:52:46.4419120Z cvt.f32.bf16 %r2691, %rs415; 2026-02-21T08:52:46.4419184Z cvt.f32.bf16 %r2692, %rs416; 2026-02-21T08:52:46.4419416Z .loc 1 54 87 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:87 2026-02-21T08:52:46.4419644Z add.s32 %r2786, %r3769, %r2774; 2026-02-21T08:52:46.4419713Z add.s32 %r2787, %r2786, 49152; 2026-02-21T08:52:46.4419931Z .loc 1 67 45 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:67:45 2026-02-21T08:52:46.4419995Z add.s32 %r2788, %r2787, %r3774; 2026-02-21T08:52:46.4420065Z ld.shared.b8 %rs417, [%r2788]; 2026-02-21T08:52:46.4420132Z ld.shared.b8 %rs418, [%r2788+128]; 2026-02-21T08:52:46.4420203Z ld.shared.b8 %rs419, [%r2788+256]; 2026-02-21T08:52:46.4420268Z ld.shared.b8 %rs420, [%r2788+384]; 2026-02-21T08:52:46.4420333Z ld.shared.b8 %rs421, [%r2788+512]; 2026-02-21T08:52:46.4420402Z ld.shared.b8 %rs422, [%r2788+640]; 2026-02-21T08:52:46.4420465Z ld.shared.b8 %rs423, [%r2788+768]; 2026-02-21T08:52:46.4420654Z add.s32 %r2789, %r2787, %r3775; 2026-02-21T08:52:46.4420723Z ld.shared.b8 %rs424, [%r2789]; 2026-02-21T08:52:46.4420799Z ld.shared.b8 %rs425, [%r2788+1024]; 2026-02-21T08:52:46.4420866Z ld.shared.b8 %rs426, [%r2788+1152]; 2026-02-21T08:52:46.4420942Z ld.shared.b8 %rs427, [%r2788+1280]; 2026-02-21T08:52:46.4421028Z ld.shared.b8 %rs428, [%r2788+1408]; 2026-02-21T08:52:46.4421102Z ld.shared.b8 %rs429, [%r2788+1536]; 2026-02-21T08:52:46.4421175Z ld.shared.b8 %rs430, [%r2788+1664]; 2026-02-21T08:52:46.4421242Z ld.shared.b8 %rs431, [%r2788+1792]; 2026-02-21T08:52:46.4421307Z add.s32 %r2790, %r2787, 
%r3776; 2026-02-21T08:52:46.4421373Z ld.shared.b8 %rs432, [%r2790]; 2026-02-21T08:52:46.4421449Z ld.shared.b8 %rs433, [%r2788+2048]; 2026-02-21T08:52:46.4421518Z ld.shared.b8 %rs434, [%r2788+2176]; 2026-02-21T08:52:46.4421583Z ld.shared.b8 %rs435, [%r2788+2304]; 2026-02-21T08:52:46.4421655Z ld.shared.b8 %rs436, [%r2788+2432]; 2026-02-21T08:52:46.4421720Z ld.shared.b8 %rs437, [%r2788+2560]; 2026-02-21T08:52:46.4421790Z ld.shared.b8 %rs438, [%r2788+2688]; 2026-02-21T08:52:46.4421863Z ld.shared.b8 %rs439, [%r2788+2816]; 2026-02-21T08:52:46.4421928Z add.s32 %r2791, %r2787, %r3777; 2026-02-21T08:52:46.4421996Z ld.shared.b8 %rs440, [%r2791]; 2026-02-21T08:52:46.4422062Z ld.shared.b8 %rs441, [%r2788+3072]; 2026-02-21T08:52:46.4422136Z ld.shared.b8 %rs442, [%r2788+3200]; 2026-02-21T08:52:46.4422202Z ld.shared.b8 %rs443, [%r2788+3328]; 2026-02-21T08:52:46.4422278Z ld.shared.b8 %rs444, [%r2788+3456]; 2026-02-21T08:52:46.4422354Z ld.shared.b8 %rs445, [%r2788+3584]; 2026-02-21T08:52:46.4422421Z ld.shared.b8 %rs446, [%r2788+3712]; 2026-02-21T08:52:46.4422486Z ld.shared.b8 %rs447, [%r2788+3840]; 2026-02-21T08:52:46.4422548Z add.s32 %r2792, %r2787, %r3778; 2026-02-21T08:52:46.4422623Z ld.shared.b8 %rs448, [%r2792]; 2026-02-21T08:52:46.4422836Z .loc 1 57 28 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:57:28 2026-02-21T08:52:46.4422903Z shl.b16 %rs449, %rs417, 4; 2026-02-21T08:52:46.4422975Z shl.b16 %rs450, %rs418, 4; 2026-02-21T08:52:46.4423036Z shl.b16 %rs451, %rs419, 4; 2026-02-21T08:52:46.4423097Z shl.b16 %rs452, %rs420, 4; 2026-02-21T08:52:46.4423165Z shl.b16 %rs453, %rs421, 4; 2026-02-21T08:52:46.4423240Z shl.b16 %rs454, %rs422, 4; 2026-02-21T08:52:46.4423304Z shl.b16 %rs455, %rs423, 4; 2026-02-21T08:52:46.4423365Z shl.b16 %rs456, %rs424, 4; 2026-02-21T08:52:46.4423432Z shl.b16 %rs457, %rs425, 4; 2026-02-21T08:52:46.4423493Z shl.b16 %rs458, %rs426, 4; 2026-02-21T08:52:46.4423554Z shl.b16 %rs459, %rs427, 4; 2026-02-21T08:52:46.4423620Z shl.b16 %rs460, %rs428, 4; 
2026-02-21T08:52:46.4423679Z shl.b16 %rs461, %rs429, 4; 2026-02-21T08:52:46.4423739Z shl.b16 %rs462, %rs430, 4; 2026-02-21T08:52:46.4423799Z shl.b16 %rs463, %rs431, 4; 2026-02-21T08:52:46.4423865Z shl.b16 %rs464, %rs432, 4; 2026-02-21T08:52:46.4423926Z shl.b16 %rs465, %rs433, 4; 2026-02-21T08:52:46.4423987Z shl.b16 %rs466, %rs434, 4; 2026-02-21T08:52:46.4424059Z shl.b16 %rs467, %rs435, 4; 2026-02-21T08:52:46.4424120Z shl.b16 %rs468, %rs436, 4; 2026-02-21T08:52:46.4424180Z shl.b16 %rs469, %rs437, 4; 2026-02-21T08:52:46.4424239Z shl.b16 %rs470, %rs438, 4; 2026-02-21T08:52:46.4424417Z shl.b16 %rs471, %rs439, 4; 2026-02-21T08:52:46.4424480Z shl.b16 %rs472, %rs440, 4; 2026-02-21T08:52:46.4424541Z shl.b16 %rs473, %rs441, 4; 2026-02-21T08:52:46.4424605Z shl.b16 %rs474, %rs442, 4; 2026-02-21T08:52:46.4424665Z shl.b16 %rs475, %rs443, 4; 2026-02-21T08:52:46.4424723Z shl.b16 %rs476, %rs444, 4; 2026-02-21T08:52:46.4424784Z shl.b16 %rs477, %rs445, 4; 2026-02-21T08:52:46.4424850Z shl.b16 %rs478, %rs446, 4; 2026-02-21T08:52:46.4424910Z shl.b16 %rs479, %rs447, 4; 2026-02-21T08:52:46.4424971Z shl.b16 %rs480, %rs448, 4; 2026-02-21T08:52:46.4425194Z .loc 1 72 58 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:72:58 2026-02-21T08:52:46.4425272Z selp.b16 %rs481, %rs449, %rs417, %p57; 2026-02-21T08:52:46.4425448Z cvt.s16.s8 %rs482, %rs481; 2026-02-21T08:52:46.4425518Z shr.s16 %rs483, %rs482, 4; 2026-02-21T08:52:46.4425591Z selp.b16 %rs484, %rs450, %rs418, %p57; 2026-02-21T08:52:46.4425667Z cvt.s16.s8 %rs485, %rs484; 2026-02-21T08:52:46.4425734Z shr.s16 %rs486, %rs485, 4; 2026-02-21T08:52:46.4425811Z selp.b16 %rs487, %rs451, %rs419, %p57; 2026-02-21T08:52:46.4425877Z cvt.s16.s8 %rs488, %rs487; 2026-02-21T08:52:46.4425938Z shr.s16 %rs489, %rs488, 4; 2026-02-21T08:52:46.4426014Z selp.b16 %rs490, %rs452, %rs420, %p57; 2026-02-21T08:52:46.4426077Z cvt.s16.s8 %rs491, %rs490; 2026-02-21T08:52:46.4426139Z shr.s16 %rs492, %rs491, 4; 2026-02-21T08:52:46.4426205Z selp.b16 %rs493, 
%rs453, %rs421, %p57; 2026-02-21T08:52:46.4426274Z cvt.s16.s8 %rs494, %rs493; 2026-02-21T08:52:46.4426336Z shr.s16 %rs495, %rs494, 4; 2026-02-21T08:52:46.4426405Z selp.b16 %rs496, %rs454, %rs422, %p57; 2026-02-21T08:52:46.4426602Z cvt.s16.s8 %rs497, %rs496; 2026-02-21T08:52:46.4426673Z shr.s16 %rs498, %rs497, 4; 2026-02-21T08:52:46.4426746Z selp.b16 %rs499, %rs455, %rs423, %p57; 2026-02-21T08:52:46.4426808Z cvt.s16.s8 %rs500, %rs499; 2026-02-21T08:52:46.4426874Z shr.s16 %rs501, %rs500, 4; 2026-02-21T08:52:46.4426943Z selp.b16 %rs502, %rs456, %rs424, %p57; 2026-02-21T08:52:46.4427009Z cvt.s16.s8 %rs503, %rs502; 2026-02-21T08:52:46.4427075Z shr.s16 %rs504, %rs503, 4; 2026-02-21T08:52:46.4427142Z selp.b16 %rs505, %rs457, %rs425, %p57; 2026-02-21T08:52:46.4427204Z cvt.s16.s8 %rs506, %rs505; 2026-02-21T08:52:46.4427272Z shr.s16 %rs507, %rs506, 4; 2026-02-21T08:52:46.4427341Z selp.b16 %rs508, %rs458, %rs426, %p57; 2026-02-21T08:52:46.4427407Z cvt.s16.s8 %rs509, %rs508; 2026-02-21T08:52:46.4427469Z shr.s16 %rs510, %rs509, 4; 2026-02-21T08:52:46.4427544Z selp.b16 %rs511, %rs459, %rs427, %p57; 2026-02-21T08:52:46.4427603Z cvt.s16.s8 %rs512, %rs511; 2026-02-21T08:52:46.4427667Z shr.s16 %rs513, %rs512, 4; 2026-02-21T08:52:46.4427742Z selp.b16 %rs514, %rs460, %rs428, %p57; 2026-02-21T08:52:46.4427806Z cvt.s16.s8 %rs515, %rs514; 2026-02-21T08:52:46.4427868Z shr.s16 %rs516, %rs515, 4; 2026-02-21T08:52:46.4427937Z selp.b16 %rs517, %rs461, %rs429, %p57; 2026-02-21T08:52:46.4428004Z cvt.s16.s8 %rs518, %rs517; 2026-02-21T08:52:46.4428066Z shr.s16 %rs519, %rs518, 4; 2026-02-21T08:52:46.4428136Z selp.b16 %rs520, %rs462, %rs430, %p57; 2026-02-21T08:52:46.4428202Z cvt.s16.s8 %rs521, %rs520; 2026-02-21T08:52:46.4428265Z shr.s16 %rs522, %rs521, 4; 2026-02-21T08:52:46.4428333Z selp.b16 %rs523, %rs463, %rs431, %p57; 2026-02-21T08:52:46.4428396Z cvt.s16.s8 %rs524, %rs523; 2026-02-21T08:52:46.4428464Z shr.s16 %rs525, %rs524, 4; 2026-02-21T08:52:46.4428603Z selp.b16 %rs526, %rs464, %rs432, 
%p57; 2026-02-21T08:52:46.4428672Z cvt.s16.s8 %rs527, %rs526; 2026-02-21T08:52:46.4428742Z shr.s16 %rs528, %rs527, 4; 2026-02-21T08:52:46.4428811Z selp.b16 %rs529, %rs465, %rs433, %p57; 2026-02-21T08:52:46.4428873Z cvt.s16.s8 %rs530, %rs529; 2026-02-21T08:52:46.4428934Z shr.s16 %rs531, %rs530, 4; 2026-02-21T08:52:46.4429013Z selp.b16 %rs532, %rs466, %rs434, %p57; 2026-02-21T08:52:46.4429076Z cvt.s16.s8 %rs533, %rs532; 2026-02-21T08:52:46.4429137Z shr.s16 %rs534, %rs533, 4; 2026-02-21T08:52:46.4429213Z selp.b16 %rs535, %rs467, %rs435, %p57; 2026-02-21T08:52:46.4429464Z cvt.s16.s8 %rs536, %rs535; 2026-02-21T08:52:46.4429527Z shr.s16 %rs537, %rs536, 4; 2026-02-21T08:52:46.4429606Z selp.b16 %rs538, %rs468, %rs436, %p57; 2026-02-21T08:52:46.4429667Z cvt.s16.s8 %rs539, %rs538; 2026-02-21T08:52:46.4429728Z shr.s16 %rs540, %rs539, 4; 2026-02-21T08:52:46.4429797Z selp.b16 %rs541, %rs469, %rs437, %p57; 2026-02-21T08:52:46.4429869Z cvt.s16.s8 %rs542, %rs541; 2026-02-21T08:52:46.4429932Z shr.s16 %rs543, %rs542, 4; 2026-02-21T08:52:46.4430001Z selp.b16 %rs544, %rs470, %rs438, %p57; 2026-02-21T08:52:46.4430069Z cvt.s16.s8 %rs545, %rs544; 2026-02-21T08:52:46.4430130Z shr.s16 %rs546, %rs545, 4; 2026-02-21T08:52:46.4430198Z selp.b16 %rs547, %rs471, %rs439, %p57; 2026-02-21T08:52:46.4430259Z cvt.s16.s8 %rs548, %rs547; 2026-02-21T08:52:46.4430454Z shr.s16 %rs549, %rs548, 4; 2026-02-21T08:52:46.4430526Z selp.b16 %rs550, %rs472, %rs440, %p57; 2026-02-21T08:52:46.4430590Z cvt.s16.s8 %rs551, %rs550; 2026-02-21T08:52:46.4430661Z shr.s16 %rs552, %rs551, 4; 2026-02-21T08:52:46.4430730Z selp.b16 %rs553, %rs473, %rs441, %p57; 2026-02-21T08:52:46.4430791Z cvt.s16.s8 %rs554, %rs553; 2026-02-21T08:52:46.4430852Z shr.s16 %rs555, %rs554, 4; 2026-02-21T08:52:46.4430925Z selp.b16 %rs556, %rs474, %rs442, %p57; 2026-02-21T08:52:46.4430985Z cvt.s16.s8 %rs557, %rs556; 2026-02-21T08:52:46.4431046Z shr.s16 %rs558, %rs557, 4; 2026-02-21T08:52:46.4431118Z selp.b16 %rs559, %rs475, %rs443, %p57; 
2026-02-21T08:52:46.4431178Z cvt.s16.s8 %rs560, %rs559; 2026-02-21T08:52:46.4431238Z shr.s16 %rs561, %rs560, 4; 2026-02-21T08:52:46.4431310Z selp.b16 %rs562, %rs476, %rs444, %p57; 2026-02-21T08:52:46.4431371Z cvt.s16.s8 %rs563, %rs562; 2026-02-21T08:52:46.4431430Z shr.s16 %rs564, %rs563, 4; 2026-02-21T08:52:46.4431500Z selp.b16 %rs565, %rs477, %rs445, %p57; 2026-02-21T08:52:46.4431567Z cvt.s16.s8 %rs566, %rs565; 2026-02-21T08:52:46.4431626Z shr.s16 %rs567, %rs566, 4; 2026-02-21T08:52:46.4431693Z selp.b16 %rs568, %rs478, %rs446, %p57; 2026-02-21T08:52:46.4431762Z cvt.s16.s8 %rs569, %rs568; 2026-02-21T08:52:46.4431825Z shr.s16 %rs570, %rs569, 4; 2026-02-21T08:52:46.4431892Z selp.b16 %rs571, %rs479, %rs447, %p57; 2026-02-21T08:52:46.4431953Z cvt.s16.s8 %rs572, %rs571; 2026-02-21T08:52:46.4432020Z shr.s16 %rs573, %rs572, 4; 2026-02-21T08:52:46.4432087Z selp.b16 %rs574, %rs480, %rs448, %p57; 2026-02-21T08:52:46.4432148Z cvt.s16.s8 %rs575, %rs574; 2026-02-21T08:52:46.4432214Z shr.s16 %rs576, %rs575, 4; 2026-02-21T08:52:46.4432440Z .loc 1 77 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:77:32 2026-02-21T08:52:46.4432511Z cvt.rn.f32.s16 %r2793, %rs483; 2026-02-21T08:52:46.4432576Z cvt.rn.f32.s16 %r2794, %rs486; 2026-02-21T08:52:46.4432645Z cvt.rn.f32.s16 %r2795, %rs489; 2026-02-21T08:52:46.4432707Z cvt.rn.f32.s16 %r2796, %rs492; 2026-02-21T08:52:46.4432770Z cvt.rn.f32.s16 %r2797, %rs495; 2026-02-21T08:52:46.4432837Z cvt.rn.f32.s16 %r2798, %rs498; 2026-02-21T08:52:46.4432901Z cvt.rn.f32.s16 %r2799, %rs501; 2026-02-21T08:52:46.4432967Z cvt.rn.f32.s16 %r2800, %rs504; 2026-02-21T08:52:46.4433035Z cvt.rn.f32.s16 %r2801, %rs507; 2026-02-21T08:52:46.4433099Z cvt.rn.f32.s16 %r2802, %rs510; 2026-02-21T08:52:46.4433161Z cvt.rn.f32.s16 %r2803, %rs513; 2026-02-21T08:52:46.4433223Z cvt.rn.f32.s16 %r2804, %rs516; 2026-02-21T08:52:46.4433293Z cvt.rn.f32.s16 %r2805, %rs519; 2026-02-21T08:52:46.4433355Z cvt.rn.f32.s16 %r2806, %rs522; 2026-02-21T08:52:46.4433419Z 
cvt.rn.f32.s16 %r2807, %rs525; 2026-02-21T08:52:46.4433499Z cvt.rn.f32.s16 %r2808, %rs528; 2026-02-21T08:52:46.4433564Z cvt.rn.f32.s16 %r2809, %rs531; 2026-02-21T08:52:46.4433626Z cvt.rn.f32.s16 %r2810, %rs534; 2026-02-21T08:52:46.4433688Z cvt.rn.f32.s16 %r2811, %rs537; 2026-02-21T08:52:46.4433761Z cvt.rn.f32.s16 %r2812, %rs540; 2026-02-21T08:52:46.4433823Z cvt.rn.f32.s16 %r2813, %rs543; 2026-02-21T08:52:46.4433886Z cvt.rn.f32.s16 %r2814, %rs546; 2026-02-21T08:52:46.4433953Z cvt.rn.f32.s16 %r2815, %rs549; 2026-02-21T08:52:46.4434123Z cvt.rn.f32.s16 %r2816, %rs552; 2026-02-21T08:52:46.4434186Z cvt.rn.f32.s16 %r2817, %rs555; 2026-02-21T08:52:46.4434249Z cvt.rn.f32.s16 %r2818, %rs558; 2026-02-21T08:52:46.4434319Z cvt.rn.f32.s16 %r2819, %rs561; 2026-02-21T08:52:46.4434379Z cvt.rn.f32.s16 %r2820, %rs564; 2026-02-21T08:52:46.4434445Z cvt.rn.f32.s16 %r2821, %rs567; 2026-02-21T08:52:46.4434514Z cvt.rn.f32.s16 %r2822, %rs570; 2026-02-21T08:52:46.4434578Z cvt.rn.f32.s16 %r2823, %rs573; 2026-02-21T08:52:46.4434639Z cvt.rn.f32.s16 %r2824, %rs576; 2026-02-21T08:52:46.4434706Z st.shared.b32 [%r39], %r2793; 2026-02-21T08:52:46.4434779Z st.shared.b32 [%r39+8], %r2794; 2026-02-21T08:52:46.4434848Z st.shared.b32 [%r39+16384], %r2809; 2026-02-21T08:52:46.4435008Z st.shared.b32 [%r39+16392], %r2810; 2026-02-21T08:52:46.4435084Z st.shared.b32 [%r40], %r2795; 2026-02-21T08:52:46.4435150Z st.shared.b32 [%r40+8], %r2796; 2026-02-21T08:52:46.4435217Z st.shared.b32 [%r40+16384], %r2811; 2026-02-21T08:52:46.4435291Z st.shared.b32 [%r40+16392], %r2812; 2026-02-21T08:52:46.4435355Z st.shared.b32 [%r41], %r2797; 2026-02-21T08:52:46.4435428Z st.shared.b32 [%r41+8], %r2798; 2026-02-21T08:52:46.4435494Z st.shared.b32 [%r41+16384], %r2813; 2026-02-21T08:52:46.4435567Z st.shared.b32 [%r41+16392], %r2814; 2026-02-21T08:52:46.4435636Z st.shared.b32 [%r42], %r2799; 2026-02-21T08:52:46.4435701Z st.shared.b32 [%r42+8], %r2800; 2026-02-21T08:52:46.4435776Z st.shared.b32 [%r42+16384], %r2815; 
2026-02-21T08:52:46.4435839Z st.shared.b32 [%r42+16392], %r2816; 2026-02-21T08:52:46.4435902Z st.shared.b32 [%r43], %r2801; 2026-02-21T08:52:46.4435968Z st.shared.b32 [%r43+8], %r2802; 2026-02-21T08:52:46.4436041Z st.shared.b32 [%r43+16384], %r2817; 2026-02-21T08:52:46.4436108Z st.shared.b32 [%r43+16392], %r2818; 2026-02-21T08:52:46.4436172Z st.shared.b32 [%r44], %r2803; 2026-02-21T08:52:46.4436243Z st.shared.b32 [%r44+8], %r2804; 2026-02-21T08:52:46.4436309Z st.shared.b32 [%r44+16384], %r2819; 2026-02-21T08:52:46.4436378Z st.shared.b32 [%r44+16392], %r2820; 2026-02-21T08:52:46.4436583Z st.shared.b32 [%r45], %r2805; 2026-02-21T08:52:46.4436661Z st.shared.b32 [%r45+8], %r2806; 2026-02-21T08:52:46.4436728Z st.shared.b32 [%r45+16384], %r2821; 2026-02-21T08:52:46.4436795Z st.shared.b32 [%r45+16392], %r2822; 2026-02-21T08:52:46.4436866Z st.shared.b32 [%r46], %r2807; 2026-02-21T08:52:46.4436932Z st.shared.b32 [%r46+8], %r2808; 2026-02-21T08:52:46.4437000Z st.shared.b32 [%r46+16384], %r2823; 2026-02-21T08:52:46.4437071Z st.shared.b32 [%r46+16392], %r2824; 2026-02-21T08:52:46.4437129Z $L__tmp5: 2026-02-21T08:52:46.4437423Z .loc 2 291 36 // standard.py:291:36 @[ cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:84:40 ] 2026-02-21T08:52:46.4437499Z // begin inline asm 2026-02-21T08:52:46.4437593Z fence.proxy.async.shared::cta; 2026-02-21T08:52:46.4437654Z // end inline asm 2026-02-21T08:52:46.4437711Z bar.sync 0; 2026-02-21T08:52:46.4437800Z shfl.sync.idx.b32 %r2825, %r4, 0, 31, -1; 2026-02-21T08:52:46.4437878Z wgmma.fence.sync.aligned; 2026-02-21T08:52:46.4437942Z shl.b32 %r2826, %r2825, 11; 2026-02-21T08:52:46.4438013Z and.b32 %r2827, %r2826, 8192; 2026-02-21T08:52:46.4438076Z add.s32 %r2828, %r2827, %r3769; 2026-02-21T08:52:46.4438139Z bfe.u32 %r2829, %r2828, 4, 14; 2026-02-21T08:52:46.4438202Z cvt.u64.u32 %rd215, %r2829; 2026-02-21T08:52:46.4438287Z or.b64 %rd202, %rd215, 4611686293372403712; 2026-02-21T08:52:46.4438354Z mov.pred %p28, -1; 
2026-02-21T08:52:46.4438413Z // begin inline asm 2026-02-21T08:52:46.4439188Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3861,%r3862,%r3863,%r3864,%r3865,%r3866,%r3867,%r3868,%r3869,%r3870,%r3871,%r3872,%r3873,%r3874,%r3875,%r3876,%r3877,%r3878,%r3879,%r3880,%r3881,%r3882,%r3883,%r3884,%r3885,%r3886,%r3887,%r3888,%r3889,%r3890,%r3891,%r3892}, {%r2213,%r2214,%r2215,%r2216}, %rd202, %p28, 1, 1; 2026-02-21T08:52:46.4439250Z // end inline asm 2026-02-21T08:52:46.4439311Z add.s32 %r2830, %r2828, 32; 2026-02-21T08:52:46.4439524Z bfe.u32 %r2831, %r2830, 4, 14; 2026-02-21T08:52:46.4439589Z cvt.u64.u32 %rd216, %r2831; 2026-02-21T08:52:46.4439663Z or.b64 %rd203, %rd216, 4611686293372403712; 2026-02-21T08:52:46.4439722Z // begin inline asm 2026-02-21T08:52:46.4440487Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3861,%r3862,%r3863,%r3864,%r3865,%r3866,%r3867,%r3868,%r3869,%r3870,%r3871,%r3872,%r3873,%r3874,%r3875,%r3876,%r3877,%r3878,%r3879,%r3880,%r3881,%r3882,%r3883,%r3884,%r3885,%r3886,%r3887,%r3888,%r3889,%r3890,%r3891,%r3892}, {%r2281,%r2282,%r2283,%r2284}, %rd203, %p28, 1, 1; 2026-02-21T08:52:46.4440545Z // end inline asm 2026-02-21T08:52:46.4440606Z add.s32 %r2832, %r2828, 64; 2026-02-21T08:52:46.4440674Z bfe.u32 %r2833, %r2832, 4, 14; 2026-02-21T08:52:46.4440865Z cvt.u64.u32 %rd217, %r2833; 2026-02-21T08:52:46.4440947Z or.b64 %rd204, %rd217, 4611686293372403712; 2026-02-21T08:52:46.4441020Z // begin inline asm 2026-02-21T08:52:46.4441775Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3861,%r3862,%r3863,%r3864,%r3865,%r3866,%r3867,%r3868,%r3869,%r3870,%r3871,%r3872,%r3873,%r3874,%r3875,%r3876,%r3877,%r3878,%r3879,%r3880,%r3881,%r3882,%r3883,%r3884,%r3885,%r3886,%r3887,%r3888,%r3889,%r3890,%r3891,%r3892}, {%r2349,%r2350,%r2351,%r2352}, %rd204, %p28, 1, 1; 2026-02-21T08:52:46.4441840Z // end inline asm 2026-02-21T08:52:46.4441905Z add.s32 %r2834, %r2828, 96; 2026-02-21T08:52:46.4441969Z bfe.u32 %r2835, %r2834, 4, 14; 
2026-02-21T08:52:46.4442034Z cvt.u64.u32 %rd218, %r2835; 2026-02-21T08:52:46.4442110Z or.b64 %rd205, %rd218, 4611686293372403712; 2026-02-21T08:52:46.4442170Z // begin inline asm 2026-02-21T08:52:46.4442924Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3861,%r3862,%r3863,%r3864,%r3865,%r3866,%r3867,%r3868,%r3869,%r3870,%r3871,%r3872,%r3873,%r3874,%r3875,%r3876,%r3877,%r3878,%r3879,%r3880,%r3881,%r3882,%r3883,%r3884,%r3885,%r3886,%r3887,%r3888,%r3889,%r3890,%r3891,%r3892}, {%r2417,%r2418,%r2419,%r2420}, %rd205, %p28, 1, 1; 2026-02-21T08:52:46.4442991Z // end inline asm 2026-02-21T08:52:46.4443056Z add.s32 %r2836, %r2828, 16384; 2026-02-21T08:52:46.4443116Z bfe.u32 %r2837, %r2836, 4, 14; 2026-02-21T08:52:46.4443179Z cvt.u64.u32 %rd219, %r2837; 2026-02-21T08:52:46.4443257Z or.b64 %rd206, %rd219, 4611686293372403712; 2026-02-21T08:52:46.4443317Z // begin inline asm 2026-02-21T08:52:46.4444066Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3861,%r3862,%r3863,%r3864,%r3865,%r3866,%r3867,%r3868,%r3869,%r3870,%r3871,%r3872,%r3873,%r3874,%r3875,%r3876,%r3877,%r3878,%r3879,%r3880,%r3881,%r3882,%r3883,%r3884,%r3885,%r3886,%r3887,%r3888,%r3889,%r3890,%r3891,%r3892}, {%r2485,%r2486,%r2487,%r2488}, %rd206, %p28, 1, 1; 2026-02-21T08:52:46.4444130Z // end inline asm 2026-02-21T08:52:46.4444195Z add.s32 %r2838, %r2828, 16416; 2026-02-21T08:52:46.4444258Z bfe.u32 %r2839, %r2838, 4, 14; 2026-02-21T08:52:46.4444326Z cvt.u64.u32 %rd220, %r2839; 2026-02-21T08:52:46.4444398Z or.b64 %rd207, %rd220, 4611686293372403712; 2026-02-21T08:52:46.4444460Z // begin inline asm 2026-02-21T08:52:46.4445223Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3861,%r3862,%r3863,%r3864,%r3865,%r3866,%r3867,%r3868,%r3869,%r3870,%r3871,%r3872,%r3873,%r3874,%r3875,%r3876,%r3877,%r3878,%r3879,%r3880,%r3881,%r3882,%r3883,%r3884,%r3885,%r3886,%r3887,%r3888,%r3889,%r3890,%r3891,%r3892}, {%r2553,%r2554,%r2555,%r2556}, %rd207, %p28, 1, 1; 2026-02-21T08:52:46.4445283Z // end inline asm 
2026-02-21T08:52:46.4445345Z add.s32 %r2840, %r2828, 16448; 2026-02-21T08:52:46.4445405Z bfe.u32 %r2841, %r2840, 4, 14; 2026-02-21T08:52:46.4445476Z cvt.u64.u32 %rd221, %r2841; 2026-02-21T08:52:46.4445548Z or.b64 %rd208, %rd221, 4611686293372403712; 2026-02-21T08:52:46.4445608Z // begin inline asm 2026-02-21T08:52:46.4446368Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3861,%r3862,%r3863,%r3864,%r3865,%r3866,%r3867,%r3868,%r3869,%r3870,%r3871,%r3872,%r3873,%r3874,%r3875,%r3876,%r3877,%r3878,%r3879,%r3880,%r3881,%r3882,%r3883,%r3884,%r3885,%r3886,%r3887,%r3888,%r3889,%r3890,%r3891,%r3892}, {%r2621,%r2622,%r2623,%r2624}, %rd208, %p28, 1, 1; 2026-02-21T08:52:46.4446677Z // end inline asm 2026-02-21T08:52:46.4446740Z add.s32 %r2842, %r2828, 16480; 2026-02-21T08:52:46.4446810Z bfe.u32 %r2843, %r2842, 4, 14; 2026-02-21T08:52:46.4446874Z cvt.u64.u32 %rd222, %r2843; 2026-02-21T08:52:46.4446947Z or.b64 %rd209, %rd222, 4611686293372403712; 2026-02-21T08:52:46.4447013Z // begin inline asm 2026-02-21T08:52:46.4447765Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3861,%r3862,%r3863,%r3864,%r3865,%r3866,%r3867,%r3868,%r3869,%r3870,%r3871,%r3872,%r3873,%r3874,%r3875,%r3876,%r3877,%r3878,%r3879,%r3880,%r3881,%r3882,%r3883,%r3884,%r3885,%r3886,%r3887,%r3888,%r3889,%r3890,%r3891,%r3892}, {%r2689,%r2690,%r2691,%r2692}, %rd209, %p28, 1, 1; 2026-02-21T08:52:46.4447975Z // end inline asm 2026-02-21T08:52:46.4448063Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:46.4448122Z mov.b32 %r2727, 0; 2026-02-21T08:52:46.4448183Z mov.b32 %r2725, %r3769; 2026-02-21T08:52:46.4448247Z mov.b32 %r2726, %r2727; 2026-02-21T08:52:46.4448314Z // begin inline asm 2026-02-21T08:52:46.4448876Z // wait for regs: %r3861,%r3862,%r3863,%r3864,%r3865,%r3866,%r3867,%r3868,%r3869,%r3870,%r3871,%r3872,%r3873,%r3874,%r3875,%r3876,%r3877,%r3878,%r3879,%r3880,%r3881,%r3882,%r3883,%r3884,%r3885,%r3886,%r3887,%r3888,%r3889,%r3890,%r3891,%r3892,%r2725,%r2726,%r2727 2026-02-21T08:52:46.4448955Z 
wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:46.4449021Z // end inline asm 2026-02-21T08:52:46.4449078Z $L__tmp6: 2026-02-21T08:52:46.4449302Z .loc 1 40 103 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:40:103 2026-02-21T08:52:46.4449372Z add.s32 %r2844, %r3860, 1; 2026-02-21T08:52:46.4449440Z setp.gt.s32 %p39, %r2844, 1; 2026-02-21T08:52:46.4449512Z selp.b32 %r3860, 0, %r2844, %p39; 2026-02-21T08:52:46.4449719Z .loc 1 48 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:32 2026-02-21T08:52:46.4449804Z mad.wide.s32 %rd213, %r3857, 2, %rd53; 2026-02-21T08:52:46.4450017Z .loc 1 48 80 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:80 2026-02-21T08:52:46.4450082Z shl.b32 %r2845, %r3860, 12; 2026-02-21T08:52:46.4450148Z shl.b32 %r2846, %r3860, 13; 2026-02-21T08:52:46.4450213Z add.s32 %r2847, %r1154, %r2846; 2026-02-21T08:52:46.4450278Z add.s32 %r2763, %r2847, %r14; 2026-02-21T08:52:46.4450347Z selp.b32 %r2764, 8, 0, %p37; 2026-02-21T08:52:46.4450408Z // begin inline asm 2026-02-21T08:52:46.4450557Z cp.async.ca.shared.global [ %r2763 + 0 ], [ %rd297 + 0 ], 0x8, %r2764; 2026-02-21T08:52:46.4450617Z // end inline asm 2026-02-21T08:52:46.4450686Z add.s32 %r2765, %r2763, 2048; 2026-02-21T08:52:46.4450746Z // begin inline asm 2026-02-21T08:52:46.4450887Z cp.async.ca.shared.global [ %r2765 + 0 ], [ %rd298 + 0 ], 0x8, %r2764; 2026-02-21T08:52:46.4450952Z // end inline asm 2026-02-21T08:52:46.4451015Z add.s32 %r2767, %r2763, 4096; 2026-02-21T08:52:46.4451078Z // begin inline asm 2026-02-21T08:52:46.4451211Z cp.async.ca.shared.global [ %r2767 + 0 ], [ %rd299 + 0 ], 0x8, %r2764; 2026-02-21T08:52:46.4451274Z // end inline asm 2026-02-21T08:52:46.4451335Z add.s32 %r2769, %r2763, 6144; 2026-02-21T08:52:46.4451394Z // begin inline asm 2026-02-21T08:52:46.4451533Z cp.async.ca.shared.global [ %r2769 + 0 ], [ %rd213 + 0 ], 0x8, %r2764; 2026-02-21T08:52:46.4451592Z // end inline asm 2026-02-21T08:52:46.4451658Z cp.async.commit_group; 
2026-02-21T08:52:46.4451869Z .loc 1 54 34 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:34 2026-02-21T08:52:46.4451934Z cvt.s64.s32 %rd223, %r3858; 2026-02-21T08:52:46.4451999Z add.s64 %rd214, %rd54, %rd223; 2026-02-21T08:52:46.4452203Z .loc 1 54 87 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:87 2026-02-21T08:52:46.4452272Z add.s32 %r2771, %r20, %r2845; 2026-02-21T08:52:46.4452335Z selp.b32 %r2772, 16, 0, %p37; 2026-02-21T08:52:46.4452536Z // begin inline asm 2026-02-21T08:52:46.4452688Z cp.async.cg.shared.global [ %r2771 + 0 ], [ %rd214 + 0 ], 0x10, %r2772; 2026-02-21T08:52:46.4452745Z // end inline asm 2026-02-21T08:52:46.4452809Z cp.async.commit_group; 2026-02-21T08:52:46.4453020Z .loc 1 40 103 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:40:103 2026-02-21T08:52:46.4453089Z add.s32 %r3858, %r3858, 229376; 2026-02-21T08:52:46.4453153Z add.s64 %rd299, %rd299, 128; 2026-02-21T08:52:46.4453214Z add.s64 %rd298, %rd298, 128; 2026-02-21T08:52:46.4453279Z add.s64 %rd297, %rd297, 128; 2026-02-21T08:52:46.4453340Z add.s32 %r3857, %r3857, 64; 2026-02-21T08:52:46.4453405Z setp.lt.u64 %p40, %rd300, 4064; 2026-02-21T08:52:46.4453472Z @%p40 bra $L__BB0_7; 2026-02-21T08:52:46.4453670Z // %bb.8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:46.4453881Z .loc 1 33 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:33:32 2026-02-21T08:52:46.4453951Z or.b32 %r2884, %r264, %r11; 2026-02-21T08:52:46.4454168Z .loc 1 40 103 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:40:103 2026-02-21T08:52:46.4454237Z cp.async.wait_group 0; 2026-02-21T08:52:46.4454294Z bar.sync 0; 2026-02-21T08:52:46.4454501Z .loc 1 87 28 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:87:28 2026-02-21T08:52:46.4454584Z cvt.rn.bf16x2.f32 %r2885, %r3862, %r3861; 2026-02-21T08:52:46.4454661Z cvt.rn.bf16x2.f32 %r2886, %r3864, %r3863; 2026-02-21T08:52:46.4454736Z cvt.rn.bf16x2.f32 %r2887, %r3866, %r3865; 2026-02-21T08:52:46.4454809Z 
cvt.rn.bf16x2.f32 %r2888, %r3868, %r3867; 2026-02-21T08:52:46.4454880Z cvt.rn.bf16x2.f32 %r2889, %r3870, %r3869; 2026-02-21T08:52:46.4454954Z cvt.rn.bf16x2.f32 %r2890, %r3872, %r3871; 2026-02-21T08:52:46.4455043Z cvt.rn.bf16x2.f32 %r2891, %r3874, %r3873; 2026-02-21T08:52:46.4455124Z cvt.rn.bf16x2.f32 %r2892, %r3876, %r3875; 2026-02-21T08:52:46.4455198Z cvt.rn.bf16x2.f32 %r2893, %r3878, %r3877; 2026-02-21T08:52:46.4455275Z cvt.rn.bf16x2.f32 %r2894, %r3880, %r3879; 2026-02-21T08:52:46.4455347Z cvt.rn.bf16x2.f32 %r2895, %r3882, %r3881; 2026-02-21T08:52:46.4455418Z cvt.rn.bf16x2.f32 %r2896, %r3884, %r3883; 2026-02-21T08:52:46.4455493Z cvt.rn.bf16x2.f32 %r2897, %r3886, %r3885; 2026-02-21T08:52:46.4455563Z cvt.rn.bf16x2.f32 %r2898, %r3888, %r3887; 2026-02-21T08:52:46.4455635Z cvt.rn.bf16x2.f32 %r2899, %r3890, %r3889; 2026-02-21T08:52:46.4455705Z cvt.rn.bf16x2.f32 %r2900, %r3892, %r3891; 2026-02-21T08:52:46.4455921Z .loc 1 88 50 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:88:50 2026-02-21T08:52:46.4455994Z mad.lo.s32 %r2901, %r260, 7168, %r2884; 2026-02-21T08:52:46.4456066Z mad.lo.s32 %r2902, %r261, 7168, %r2884; 2026-02-21T08:52:46.4456141Z mad.lo.s32 %r2903, %r262, 7168, %r2884; 2026-02-21T08:52:46.4456207Z mad.lo.s32 %r2904, %r263, 7168, %r2884; 2026-02-21T08:52:46.4456413Z .loc 1 88 22 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:88:22 2026-02-21T08:52:46.4456615Z mad.wide.s32 %rd224, %r2901, 2, %rd55; 2026-02-21T08:52:46.4456689Z mad.wide.s32 %rd225, %r2902, 2, %rd55; 2026-02-21T08:52:46.4456757Z mad.wide.s32 %rd226, %r2903, 2, %rd55; 2026-02-21T08:52:46.4456824Z mad.wide.s32 %rd227, %r2904, 2, %rd55; 2026-02-21T08:52:46.4457030Z .loc 1 88 81 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:88:81 2026-02-21T08:52:46.4457145Z st.shared.v4.b32 [%r47], {%r2885, %r2887, %r2889, %r2891}; 2026-02-21T08:52:46.4457262Z st.shared.v4.b32 [%r47+512], {%r2886, %r2888, %r2890, %r2892}; 2026-02-21T08:52:46.4457392Z st.shared.v4.b32 [%r48], 
{%r2893, %r2895, %r2897, %r2899}; 2026-02-21T08:52:46.4457508Z st.shared.v4.b32 [%r48+512], {%r2894, %r2896, %r2898, %r2900}; 2026-02-21T08:52:46.4457566Z bar.sync 0; 2026-02-21T08:52:46.4457634Z // begin inline asm 2026-02-21T08:52:46.4457831Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2848, %r2849, %r2850, %r2851}, [%r1230]; 2026-02-21T08:52:46.4458459Z // end inline asm 2026-02-21T08:52:46.4458520Z // begin inline asm 2026-02-21T08:52:46.4458711Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2853, %r2854, %r2855, %r2856}, [%r1235]; 2026-02-21T08:52:46.4458770Z // end inline asm 2026-02-21T08:52:46.4458829Z // begin inline asm 2026-02-21T08:52:46.4459016Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2858, %r2859, %r2860, %r2861}, [%r1240]; 2026-02-21T08:52:46.4459074Z // end inline asm 2026-02-21T08:52:46.4459134Z // begin inline asm 2026-02-21T08:52:46.4459317Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2863, %r2864, %r2865, %r2866}, [%r1245]; 2026-02-21T08:52:46.4459374Z // end inline asm 2026-02-21T08:52:46.4459433Z // begin inline asm 2026-02-21T08:52:46.4459686Z st.global.v4.b32 [ %rd224 + 0 ], { %r2848, %r2849, %r2850, %r2851 }; 2026-02-21T08:52:46.4459762Z // end inline asm 2026-02-21T08:52:46.4459825Z // begin inline asm 2026-02-21T08:52:46.4459947Z st.global.v4.b32 [ %rd225 + 0 ], { %r2853, %r2854, %r2855, %r2856 }; 2026-02-21T08:52:46.4460013Z // end inline asm 2026-02-21T08:52:46.4460073Z // begin inline asm 2026-02-21T08:52:46.4460193Z st.global.v4.b32 [ %rd226 + 0 ], { %r2858, %r2859, %r2860, %r2861 }; 2026-02-21T08:52:46.4460252Z // end inline asm 2026-02-21T08:52:46.4460317Z // begin inline asm 2026-02-21T08:52:46.4460437Z st.global.v4.b32 [ %rd227 + 0 ], { %r2863, %r2864, %r2865, %r2866 }; 2026-02-21T08:52:46.4460495Z // end inline asm 2026-02-21T08:52:46.4460718Z .loc 1 19 112 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:19:112 2026-02-21T08:52:46.4460797Z add.s32 %r3784, %r3784, 12672; 2026-02-21T08:52:46.4460872Z setp.lt.s32 %p41, 
%r3784, %r3893; 2026-02-21T08:52:46.4460944Z @%p41 bra $L__BB0_2; 2026-02-21T08:52:46.4461040Z $L__BB0_9: // %.preheader 2026-02-21T08:52:46.4461112Z setp.gt.s32 %p42, %r3893, 55; 2026-02-21T08:52:46.4461175Z @%p42 bra $L__BB0_14; 2026-02-21T08:52:46.4461272Z // %bb.10: // %.lr.ph147 2026-02-21T08:52:46.4461484Z .loc 1 0 112 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:0:112 2026-02-21T08:52:46.4461549Z shl.b32 %r2906, %r3767, 3; 2026-02-21T08:52:46.4461618Z and.b32 %r2908, %r3768, 56; 2026-02-21T08:52:46.4461681Z xor.b32 %r58, %r2906, %r2908; 2026-02-21T08:52:46.4461745Z add.s32 %r2910, %r3769, %r58; 2026-02-21T08:52:46.4461815Z add.s32 %r2950, %r2910, 32768; 2026-02-21T08:52:46.4461878Z add.s32 %r2952, %r2910, 34816; 2026-02-21T08:52:46.4461937Z add.s32 %r2954, %r2910, 36864; 2026-02-21T08:52:46.4461996Z add.s32 %r2956, %r2910, 38912; 2026-02-21T08:52:46.4462064Z shl.b32 %r2911, %r3767, 4; 2026-02-21T08:52:46.4462130Z add.s32 %r2912, %r3769, %r2911; 2026-02-21T08:52:46.4462195Z add.s32 %r64, %r2912, 49152; 2026-02-21T08:52:46.4462262Z add.s32 %r2960, %r2910, 40960; 2026-02-21T08:52:46.4462321Z add.s32 %r2962, %r2910, 43008; 2026-02-21T08:52:46.4462381Z add.s32 %r2964, %r2910, 45056; 2026-02-21T08:52:46.4462444Z add.s32 %r2966, %r2910, 47104; 2026-02-21T08:52:46.4462509Z add.s32 %r2968, %r2912, 53248; 2026-02-21T08:52:46.4462569Z and.b32 %r2914, %r3771, 6144; 2026-02-21T08:52:46.4462630Z and.b32 %r2916, %r3772, 896; 2026-02-21T08:52:46.4462699Z and.b32 %r2918, %r3773, 62; 2026-02-21T08:52:46.4462760Z or.b32 %r2919, %r2914, %r2916; 2026-02-21T08:52:46.4462818Z or.b32 %r70, %r2919, %r2918; 2026-02-21T08:52:46.4462879Z xor.b32 %r71, %r70, 8; 2026-02-21T08:52:46.4462958Z xor.b32 %r72, %r70, 16; 2026-02-21T08:52:46.4463022Z xor.b32 %r73, %r70, 24; 2026-02-21T08:52:46.4463084Z xor.b32 %r74, %r70, 32; 2026-02-21T08:52:46.4463147Z xor.b32 %r75, %r70, 40; 2026-02-21T08:52:46.4463206Z xor.b32 %r76, %r70, 48; 2026-02-21T08:52:46.4463266Z xor.b32 %r77, %r70, 56; 
2026-02-21T08:52:46.4463328Z shl.b32 %r2920, %r3774, 7; 2026-02-21T08:52:46.4463394Z or.b32 %r2922, %r3779, %r2920; 2026-02-21T08:52:46.4463454Z or.b32 %r2923, %r2922, %r10; 2026-02-21T08:52:46.4463622Z add.s32 %r83, %r3769, %r2923; 2026-02-21T08:52:46.4463689Z xor.b32 %r2924, %r2923, 16; 2026-02-21T08:52:46.4463750Z add.s32 %r84, %r3769, %r2924; 2026-02-21T08:52:46.4463810Z xor.b32 %r2925, %r2923, 32; 2026-02-21T08:52:46.4463875Z add.s32 %r85, %r3769, %r2925; 2026-02-21T08:52:46.4463935Z xor.b32 %r2926, %r2923, 48; 2026-02-21T08:52:46.4463996Z add.s32 %r86, %r3769, %r2926; 2026-02-21T08:52:46.4464054Z xor.b32 %r2927, %r2923, 64; 2026-02-21T08:52:46.4464120Z add.s32 %r87, %r3769, %r2927; 2026-02-21T08:52:46.4464180Z xor.b32 %r2928, %r2923, 80; 2026-02-21T08:52:46.4464243Z add.s32 %r88, %r3769, %r2928; 2026-02-21T08:52:46.4464307Z xor.b32 %r2929, %r2923, 96; 2026-02-21T08:52:46.4464368Z add.s32 %r89, %r3769, %r2929; 2026-02-21T08:52:46.4464518Z xor.b32 %r2930, %r2923, 112; 2026-02-21T08:52:46.4464583Z add.s32 %r90, %r3769, %r2930; 2026-02-21T08:52:46.4464649Z shl.b32 %r2932, %r3780, 12; 2026-02-21T08:52:46.4464708Z and.b32 %r2933, %r3772, 3168; 2026-02-21T08:52:46.4464774Z shl.b32 %r2935, %r3781, 4; 2026-02-21T08:52:46.4464843Z and.b32 %r2937, %r3782, 16; 2026-02-21T08:52:46.4464905Z or.b32 %r2939, %r2932, %r2937; 2026-02-21T08:52:46.4464967Z or.b32 %r2940, %r2933, %r2935; 2026-02-21T08:52:46.4465030Z xor.b32 %r2941, %r2940, %r3783; 2026-02-21T08:52:46.4465098Z or.b32 %r2942, %r2941, %r2939; 2026-02-21T08:52:46.4465160Z add.s32 %r91, %r3769, %r2942; 2026-02-21T08:52:46.4465221Z xor.b32 %r2943, %r2942, 32; 2026-02-21T08:52:46.4465300Z add.s32 %r92, %r3769, %r2943; 2026-02-21T08:52:46.4465363Z shl.b32 %r2944, %r3781, 9; 2026-02-21T08:52:46.4465423Z shl.b32 %r2945, %r3780, 5; 2026-02-21T08:52:46.4465485Z and.b32 %r2946, %r3782, 1008; 2026-02-21T08:52:46.4465551Z or.b32 %r2947, %r2944, %r2945; 2026-02-21T08:52:46.4465617Z xor.b32 %r2948, %r2947, %r2946; 
2026-02-21T08:52:46.4465679Z add.s32 %r3714, %r3769, %r2948; 2026-02-21T08:52:46.4465746Z add.s32 %r3719, %r3714, 1024; 2026-02-21T08:52:46.4465806Z add.s32 %r3724, %r3714, 2048; 2026-02-21T08:52:46.4465869Z add.s32 %r3729, %r3714, 3072; 2026-02-21T08:52:46.4466088Z .loc 1 19 112 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:19:112 2026-02-21T08:52:46.4466163Z mad.wide.u32 %rd228, %r423, 8, %rd53; 2026-02-21T08:52:46.4466226Z add.s64 %rd2, %rd228, 256; 2026-02-21T08:52:46.4466432Z .loc 1 40 103 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:40:103 2026-02-21T08:52:46.4466633Z or.b32 %r99, %r9, 128; 2026-02-21T08:52:46.4466843Z .loc 1 19 112 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:19:112 2026-02-21T08:52:46.4466918Z mad.wide.u32 %rd229, %r12, 7168, %rd54; 2026-02-21T08:52:46.4466987Z add.s64 %rd3, %rd229, 458752; 2026-02-21T08:52:46.4467054Z cvt.u64.u32 %rd260, %r3770; 2026-02-21T08:52:46.4467168Z $L__BB0_11: // =>This Loop Header: Depth=1 2026-02-21T08:52:46.4467273Z // Child Loop BB0_12 Depth 2 2026-02-21T08:52:46.4467476Z .loc 1 25 35 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:25:35 2026-02-21T08:52:46.4467550Z mul.hi.s32 %r2973, %r3893, -1840700269; 2026-02-21T08:52:46.4467619Z add.s32 %r2974, %r2973, %r3893; 2026-02-21T08:52:46.4467680Z shr.u32 %r2975, %r2974, 31; 2026-02-21T08:52:46.4467740Z shr.s32 %r2976, %r2974, 6; 2026-02-21T08:52:46.4467804Z add.s32 %r2977, %r2976, %r2975; 2026-02-21T08:52:46.4468008Z .loc 1 26 33 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:26:33 2026-02-21T08:52:46.4468072Z shl.b32 %r2978, %r2977, 1; 2026-02-21T08:52:46.4468283Z .loc 1 27 39 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:27:39 2026-02-21T08:52:46.4468353Z sub.s32 %r2979, 1, %r2978; 2026-02-21T08:52:46.4468624Z .loc 1 27 52 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:27:52 2026-02-21T08:52:46.4468688Z min.u32 %r2980, %r2979, 2; 2026-02-21T08:52:46.4469055Z .loc 1 28 45 // 
cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:28:45 2026-02-21T08:52:46.4469121Z mul.lo.s32 %r2981, %r2977, 112; 2026-02-21T08:52:46.4469183Z sub.s32 %r2982, %r3893, %r2981; 2026-02-21T08:52:46.4469382Z .loc 1 28 64 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:28:64 2026-02-21T08:52:46.4469453Z cvt.u16.u32 %rs577, %r2982; 2026-02-21T08:52:46.4469514Z cvt.s8.s32 %rs578, %r2982; 2026-02-21T08:52:46.4469578Z cvt.u16.u32 %rs579, %r2980; 2026-02-21T08:52:46.4469781Z .loc 1 29 51 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:29:51 2026-02-21T08:52:46.4469849Z div.s16 %rs580, %rs578, %rs579; 2026-02-21T08:52:46.4470165Z .loc 1 28 64 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:28:64 2026-02-21T08:52:46.4470243Z mul.lo.s16 %rs581, %rs580, %rs579; 2026-02-21T08:52:46.4470307Z sub.s16 %rs582, %rs577, %rs581; 2026-02-21T08:52:46.4470371Z cvt.u32.u16 %r2983, %rs582; 2026-02-21T08:52:46.4470434Z cvt.s32.s8 %r2984, %r2983; 2026-02-21T08:52:46.4470639Z .loc 1 28 30 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:28:30 2026-02-21T08:52:46.4470702Z add.s32 %r2985, %r2978, %r2984; 2026-02-21T08:52:46.4470901Z .loc 1 30 27 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:30:27 2026-02-21T08:52:46.4470966Z shl.b32 %r2986, %r2985, 6; 2026-02-21T08:52:46.4471161Z .loc 1 31 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:31:32 2026-02-21T08:52:46.4471224Z or.b32 %r341, %r2986, %r5; 2026-02-21T08:52:46.4471289Z or.b32 %r342, %r2986, %r6; 2026-02-21T08:52:46.4471348Z or.b32 %r343, %r2986, %r7; 2026-02-21T08:52:46.4471410Z or.b32 %r344, %r2986, %r8; 2026-02-21T08:52:46.4471608Z .loc 1 32 27 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:32:27 2026-02-21T08:52:46.4471679Z cvt.s16.s8 %rs583, %rs580; 2026-02-21T08:52:46.4471748Z mul.wide.s16 %r345, %rs583, 128; 2026-02-21T08:52:46.4471946Z .loc 1 33 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:33:32 2026-02-21T08:52:46.4472012Z 
or.b32 %r2987, %r345, %r10; 2026-02-21T08:52:46.4472211Z .loc 1 48 53 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:53 2026-02-21T08:52:46.4472272Z shl.b32 %r2988, %r341, 13; 2026-02-21T08:52:46.4472338Z shl.b32 %r2989, %r342, 13; 2026-02-21T08:52:46.4472396Z shl.b32 %r2990, %r343, 13; 2026-02-21T08:52:46.4472454Z shl.b32 %r2991, %r344, 13; 2026-02-21T08:52:46.4472653Z .loc 1 48 60 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:60 2026-02-21T08:52:46.4472721Z or.b32 %r2992, %r2988, %r9; 2026-02-21T08:52:46.4472782Z or.b32 %r2993, %r2989, %r9; 2026-02-21T08:52:46.4472842Z or.b32 %r2994, %r2990, %r9; 2026-02-21T08:52:46.4472913Z or.b32 %r2995, %r2991, %r9; 2026-02-21T08:52:46.4473121Z .loc 1 48 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:32 2026-02-21T08:52:46.4473195Z mad.wide.s32 %rd230, %r2992, 2, %rd53; 2026-02-21T08:52:46.4473269Z mad.wide.s32 %rd231, %r2993, 2, %rd53; 2026-02-21T08:52:46.4473336Z mad.wide.s32 %rd232, %r2994, 2, %rd53; 2026-02-21T08:52:46.4473401Z mad.wide.s32 %rd233, %r2995, 2, %rd53; 2026-02-21T08:52:46.4473458Z mov.b32 %r2951, 8; 2026-02-21T08:52:46.4473662Z .loc 1 48 80 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:80 2026-02-21T08:52:46.4473722Z // begin inline asm 2026-02-21T08:52:46.4473867Z cp.async.ca.shared.global [ %r2950 + 0 ], [ %rd230 + 0 ], 0x8, %r2951; 2026-02-21T08:52:46.4473930Z // end inline asm 2026-02-21T08:52:46.4473992Z // begin inline asm 2026-02-21T08:52:46.4474128Z cp.async.ca.shared.global [ %r2952 + 0 ], [ %rd231 + 0 ], 0x8, %r2951; 2026-02-21T08:52:46.4474191Z // end inline asm 2026-02-21T08:52:46.4474250Z // begin inline asm 2026-02-21T08:52:46.4474487Z cp.async.ca.shared.global [ %r2954 + 0 ], [ %rd232 + 0 ], 0x8, %r2951; 2026-02-21T08:52:46.4474544Z // end inline asm 2026-02-21T08:52:46.4474609Z // begin inline asm 2026-02-21T08:52:46.4474744Z cp.async.ca.shared.global [ %r2956 + 0 ], [ %rd233 + 0 ], 0x8, %r2951; 2026-02-21T08:52:46.4474801Z // 
end inline asm 2026-02-21T08:52:46.4474875Z cp.async.commit_group; 2026-02-21T08:52:46.4475077Z .loc 1 54 62 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:62 2026-02-21T08:52:46.4475140Z add.s32 %r2996, %r2987, %r3770; 2026-02-21T08:52:46.4475350Z .loc 1 54 34 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:34 2026-02-21T08:52:46.4475421Z cvt.s64.s32 %rd241, %r2996; 2026-02-21T08:52:46.4475573Z add.s64 %rd234, %rd54, %rd241; 2026-02-21T08:52:46.4475636Z mov.b32 %r2959, 16; 2026-02-21T08:52:46.4475840Z .loc 1 54 87 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:87 2026-02-21T08:52:46.4475905Z // begin inline asm 2026-02-21T08:52:46.4476045Z cp.async.cg.shared.global [ %r64 + 0 ], [ %rd234 + 0 ], 0x10, %r2959; 2026-02-21T08:52:46.4476108Z // end inline asm 2026-02-21T08:52:46.4476173Z cp.async.commit_group; 2026-02-21T08:52:46.4476373Z .loc 1 48 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:32 2026-02-21T08:52:46.4476436Z cvt.s64.s32 %rd242, %r2988; 2026-02-21T08:52:46.4476639Z or.b64 %rd244, %rd242, %rd288; 2026-02-21T08:52:46.4476702Z shl.b64 %rd245, %rd244, 1; 2026-02-21T08:52:46.4476764Z add.s64 %rd246, %rd53, %rd245; 2026-02-21T08:52:46.4476834Z add.s64 %rd235, %rd246, 128; 2026-02-21T08:52:46.4476897Z cvt.s64.s32 %rd247, %r2989; 2026-02-21T08:52:46.4476959Z or.b64 %rd248, %rd247, %rd288; 2026-02-21T08:52:46.4477029Z shl.b64 %rd249, %rd248, 1; 2026-02-21T08:52:46.4477093Z add.s64 %rd250, %rd53, %rd249; 2026-02-21T08:52:46.4477155Z add.s64 %rd236, %rd250, 128; 2026-02-21T08:52:46.4477221Z cvt.s64.s32 %rd251, %r2990; 2026-02-21T08:52:46.4477291Z or.b64 %rd252, %rd251, %rd288; 2026-02-21T08:52:46.4477353Z shl.b64 %rd253, %rd252, 1; 2026-02-21T08:52:46.4477416Z add.s64 %rd254, %rd53, %rd253; 2026-02-21T08:52:46.4477483Z add.s64 %rd237, %rd254, 128; 2026-02-21T08:52:46.4477544Z cvt.s64.s32 %rd255, %r2991; 2026-02-21T08:52:46.4477605Z or.b64 %rd256, %rd255, %rd288; 2026-02-21T08:52:46.4477666Z shl.b64 
%rd257, %rd256, 1; 2026-02-21T08:52:46.4477735Z add.s64 %rd258, %rd53, %rd257; 2026-02-21T08:52:46.4477798Z add.s64 %rd238, %rd258, 128; 2026-02-21T08:52:46.4478002Z .loc 1 48 80 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:80 2026-02-21T08:52:46.4478068Z bar.sync 0; 2026-02-21T08:52:46.4478129Z // begin inline asm 2026-02-21T08:52:46.4478282Z cp.async.ca.shared.global [ %r2960 + 0 ], [ %rd235 + 0 ], 0x8, %r2951; 2026-02-21T08:52:46.4478343Z // end inline asm 2026-02-21T08:52:46.4478412Z // begin inline asm 2026-02-21T08:52:46.4478549Z cp.async.ca.shared.global [ %r2962 + 0 ], [ %rd236 + 0 ], 0x8, %r2951; 2026-02-21T08:52:46.4478606Z // end inline asm 2026-02-21T08:52:46.4478672Z // begin inline asm 2026-02-21T08:52:46.4478803Z cp.async.ca.shared.global [ %r2964 + 0 ], [ %rd237 + 0 ], 0x8, %r2951; 2026-02-21T08:52:46.4478866Z // end inline asm 2026-02-21T08:52:46.4478932Z // begin inline asm 2026-02-21T08:52:46.4479067Z cp.async.ca.shared.global [ %r2966 + 0 ], [ %rd238 + 0 ], 0x8, %r2951; 2026-02-21T08:52:46.4479125Z // end inline asm 2026-02-21T08:52:46.4479193Z cp.async.commit_group; 2026-02-21T08:52:46.4479404Z .loc 1 54 34 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:34 2026-02-21T08:52:46.4479467Z cvt.s64.s32 %rd259, %r2987; 2026-02-21T08:52:46.4479535Z add.s64 %rd261, %rd259, %rd260; 2026-02-21T08:52:46.4479608Z add.s64 %rd262, %rd54, %rd261; 2026-02-21T08:52:46.4479672Z add.s64 %rd239, %rd262, 229376; 2026-02-21T08:52:46.4479871Z .loc 1 54 87 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:87 2026-02-21T08:52:46.4480076Z // begin inline asm 2026-02-21T08:52:46.4480218Z cp.async.cg.shared.global [ %r2968 + 0 ], [ %rd239 + 0 ], 0x10, %r2959; 2026-02-21T08:52:46.4480280Z // end inline asm 2026-02-21T08:52:46.4480346Z cp.async.commit_group; 2026-02-21T08:52:46.4480563Z .loc 1 40 103 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:40:103 2026-02-21T08:52:46.4480625Z shl.b32 %r2997, %r2977, 7; 
2026-02-21T08:52:46.4480685Z or.b32 %r2998, %r7, %r2997; 2026-02-21T08:52:46.4480759Z cvt.s16.s8 %rs584, %rs582; 2026-02-21T08:52:46.4480836Z mul.wide.s16 %r2999, %rs584, 64; 2026-02-21T08:52:46.4480900Z add.s32 %r3000, %r2998, %r2999; 2026-02-21T08:52:46.4480961Z shl.b32 %r3001, %r3000, 13; 2026-02-21T08:52:46.4481153Z mad.wide.s32 %rd304, %r3001, 2, %rd2; 2026-02-21T08:52:46.4481218Z or.b32 %r3002, %r6, %r2997; 2026-02-21T08:52:46.4481279Z add.s32 %r3003, %r3002, %r2999; 2026-02-21T08:52:46.4481348Z shl.b32 %r3004, %r3003, 13; 2026-02-21T08:52:46.4481416Z mad.wide.s32 %rd303, %r3004, 2, %rd2; 2026-02-21T08:52:46.4481477Z or.b32 %r3005, %r5, %r2997; 2026-02-21T08:52:46.4481545Z add.s32 %r3006, %r3005, %r2999; 2026-02-21T08:52:46.4481607Z shl.b32 %r3007, %r3006, 13; 2026-02-21T08:52:46.4481674Z mad.wide.s32 %rd302, %r3007, 2, %rd2; 2026-02-21T08:52:46.4481733Z or.b32 %r3008, %r8, %r2997; 2026-02-21T08:52:46.4481800Z add.s32 %r3009, %r3008, %r2999; 2026-02-21T08:52:46.4481859Z shl.b32 %r3010, %r3009, 13; 2026-02-21T08:52:46.4481919Z or.b32 %r3894, %r99, %r3010; 2026-02-21T08:52:46.4481985Z add.s64 %rd301, %rd3, %rd259; 2026-02-21T08:52:46.4482045Z mov.b32 %r3897, 0f00000000; 2026-02-21T08:52:46.4482102Z mov.b32 %r3896, 1; 2026-02-21T08:52:46.4482164Z mov.b32 %r3895, -1; 2026-02-21T08:52:46.4482232Z mov.b64 %rd305, -32; 2026-02-21T08:52:46.4482292Z mov.b32 %r3898, %r3897; 2026-02-21T08:52:46.4482354Z mov.b32 %r3899, %r3897; 2026-02-21T08:52:46.4482417Z mov.b32 %r3900, %r3897; 2026-02-21T08:52:46.4482479Z mov.b32 %r3901, %r3897; 2026-02-21T08:52:46.4482540Z mov.b32 %r3902, %r3897; 2026-02-21T08:52:46.4482601Z mov.b32 %r3903, %r3897; 2026-02-21T08:52:46.4482667Z mov.b32 %r3904, %r3897; 2026-02-21T08:52:46.4482726Z mov.b32 %r3905, %r3897; 2026-02-21T08:52:46.4482786Z mov.b32 %r3906, %r3897; 2026-02-21T08:52:46.4482849Z mov.b32 %r3907, %r3897; 2026-02-21T08:52:46.4482909Z mov.b32 %r3908, %r3897; 2026-02-21T08:52:46.4482966Z mov.b32 %r3909, %r3897; 
2026-02-21T08:52:46.4483036Z mov.b32 %r3910, %r3897; 2026-02-21T08:52:46.4483103Z mov.b32 %r3911, %r3897; 2026-02-21T08:52:46.4483163Z mov.b32 %r3912, %r3897; 2026-02-21T08:52:46.4483222Z mov.b32 %r3913, %r3897; 2026-02-21T08:52:46.4483288Z mov.b32 %r3914, %r3897; 2026-02-21T08:52:46.4483349Z mov.b32 %r3915, %r3897; 2026-02-21T08:52:46.4483407Z mov.b32 %r3916, %r3897; 2026-02-21T08:52:46.4483464Z mov.b32 %r3917, %r3897; 2026-02-21T08:52:46.4483529Z mov.b32 %r3918, %r3897; 2026-02-21T08:52:46.4483591Z mov.b32 %r3919, %r3897; 2026-02-21T08:52:46.4483650Z mov.b32 %r3920, %r3897; 2026-02-21T08:52:46.4483713Z mov.b32 %r3921, %r3897; 2026-02-21T08:52:46.4483772Z mov.b32 %r3922, %r3897; 2026-02-21T08:52:46.4483841Z mov.b32 %r3923, %r3897; 2026-02-21T08:52:46.4483908Z mov.b32 %r3924, %r3897; 2026-02-21T08:52:46.4483967Z mov.b32 %r3925, %r3897; 2026-02-21T08:52:46.4484028Z mov.b32 %r3926, %r3897; 2026-02-21T08:52:46.4484085Z mov.b32 %r3927, %r3897; 2026-02-21T08:52:46.4484150Z mov.b32 %r3928, %r3897; 2026-02-21T08:52:46.4484263Z $L__BB0_12: // Parent Loop BB0_11 Depth=1 2026-02-21T08:52:46.4484373Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:46.4484440Z add.s64 %rd305, %rd305, 32; 2026-02-21T08:52:46.4484511Z setp.lt.u64 %p52, %rd305, 4032; 2026-02-21T08:52:46.4484573Z add.s32 %r3635, %r3895, 1; 2026-02-21T08:52:46.4484639Z setp.gt.s32 %p53, %r3635, 1; 2026-02-21T08:52:46.4484715Z selp.b32 %r3895, 0, %r3635, %p53; 2026-02-21T08:52:46.4485038Z .loc 1 48 80 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:80 2026-02-21T08:52:46.4485109Z cp.async.wait_group 2; 2026-02-21T08:52:46.4485171Z bar.sync 0; 2026-02-21T08:52:46.4485231Z shl.b32 %r3636, %r3895, 12; 2026-02-21T08:52:46.4485291Z shl.b32 %r3637, %r3895, 13; 2026-02-21T08:52:46.4485358Z add.s32 %r3638, %r3769, 32768; 2026-02-21T08:52:46.4485422Z add.s32 %r3639, %r3638, %r3637; 2026-02-21T08:52:46.4485625Z .loc 1 52 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:52:32 
2026-02-21T08:52:46.4485690Z add.s32 %r3640, %r3639, %r70; 2026-02-21T08:52:46.4485761Z ld.shared.b16 %rs585, [%r3640]; 2026-02-21T08:52:46.4485833Z ld.shared.b16 %rs586, [%r3640+1024]; 2026-02-21T08:52:46.4485991Z ld.shared.b16 %rs587, [%r3640+64]; 2026-02-21T08:52:46.4486069Z ld.shared.b16 %rs588, [%r3640+1088]; 2026-02-21T08:52:46.4486133Z add.s32 %r3641, %r3639, %r71; 2026-02-21T08:52:46.4486202Z ld.shared.b16 %rs589, [%r3641]; 2026-02-21T08:52:46.4486284Z ld.shared.b16 %rs590, [%r3641+1024]; 2026-02-21T08:52:46.4486360Z ld.shared.b16 %rs591, [%r3641+64]; 2026-02-21T08:52:46.4486429Z ld.shared.b16 %rs592, [%r3641+1088]; 2026-02-21T08:52:46.4486628Z add.s32 %r3642, %r3639, %r72; 2026-02-21T08:52:46.4486704Z ld.shared.b16 %rs593, [%r3642]; 2026-02-21T08:52:46.4486772Z ld.shared.b16 %rs594, [%r3642+1024]; 2026-02-21T08:52:46.4486840Z ld.shared.b16 %rs595, [%r3642+64]; 2026-02-21T08:52:46.4486913Z ld.shared.b16 %rs596, [%r3642+1088]; 2026-02-21T08:52:46.4486976Z add.s32 %r3643, %r3639, %r73; 2026-02-21T08:52:46.4487041Z ld.shared.b16 %rs597, [%r3643]; 2026-02-21T08:52:46.4487107Z ld.shared.b16 %rs598, [%r3643+1024]; 2026-02-21T08:52:46.4487184Z ld.shared.b16 %rs599, [%r3643+64]; 2026-02-21T08:52:46.4487251Z ld.shared.b16 %rs600, [%r3643+1088]; 2026-02-21T08:52:46.4487314Z add.s32 %r3644, %r3639, %r74; 2026-02-21T08:52:46.4487398Z ld.shared.b16 %rs601, [%r3644]; 2026-02-21T08:52:46.4487470Z ld.shared.b16 %rs602, [%r3644+1024]; 2026-02-21T08:52:46.4487542Z ld.shared.b16 %rs603, [%r3644+64]; 2026-02-21T08:52:46.4487609Z ld.shared.b16 %rs604, [%r3644+1088]; 2026-02-21T08:52:46.4487679Z add.s32 %r3645, %r3639, %r75; 2026-02-21T08:52:46.4487745Z ld.shared.b16 %rs605, [%r3645]; 2026-02-21T08:52:46.4487813Z ld.shared.b16 %rs606, [%r3645+1024]; 2026-02-21T08:52:46.4487883Z ld.shared.b16 %rs607, [%r3645+64]; 2026-02-21T08:52:46.4487949Z ld.shared.b16 %rs608, [%r3645+1088]; 2026-02-21T08:52:46.4488010Z add.s32 %r3646, %r3639, %r76; 2026-02-21T08:52:46.4488084Z ld.shared.b16 
%rs609, [%r3646]; 2026-02-21T08:52:46.4488151Z ld.shared.b16 %rs610, [%r3646+1024]; 2026-02-21T08:52:46.4488215Z ld.shared.b16 %rs611, [%r3646+64]; 2026-02-21T08:52:46.4488282Z ld.shared.b16 %rs612, [%r3646+1088]; 2026-02-21T08:52:46.4488349Z add.s32 %r3647, %r3639, %r77; 2026-02-21T08:52:46.4488412Z ld.shared.b16 %rs613, [%r3647]; 2026-02-21T08:52:46.4488479Z ld.shared.b16 %rs614, [%r3647+1024]; 2026-02-21T08:52:46.4488552Z ld.shared.b16 %rs615, [%r3647+64]; 2026-02-21T08:52:46.4488619Z ld.shared.b16 %rs616, [%r3647+1088]; 2026-02-21T08:52:46.4488682Z cvt.f32.bf16 %r3075, %rs585; 2026-02-21T08:52:46.4488745Z cvt.f32.bf16 %r3076, %rs586; 2026-02-21T08:52:46.4488817Z cvt.f32.bf16 %r3077, %rs589; 2026-02-21T08:52:46.4488879Z cvt.f32.bf16 %r3078, %rs590; 2026-02-21T08:52:46.4488940Z cvt.f32.bf16 %r3143, %rs593; 2026-02-21T08:52:46.4489005Z cvt.f32.bf16 %r3144, %rs594; 2026-02-21T08:52:46.4489067Z cvt.f32.bf16 %r3145, %rs597; 2026-02-21T08:52:46.4489127Z cvt.f32.bf16 %r3146, %rs598; 2026-02-21T08:52:46.4489188Z cvt.f32.bf16 %r3211, %rs601; 2026-02-21T08:52:46.4489256Z cvt.f32.bf16 %r3212, %rs602; 2026-02-21T08:52:46.4489315Z cvt.f32.bf16 %r3213, %rs605; 2026-02-21T08:52:46.4489379Z cvt.f32.bf16 %r3214, %rs606; 2026-02-21T08:52:46.4489446Z cvt.f32.bf16 %r3279, %rs609; 2026-02-21T08:52:46.4489506Z cvt.f32.bf16 %r3280, %rs610; 2026-02-21T08:52:46.4489741Z cvt.f32.bf16 %r3281, %rs613; 2026-02-21T08:52:46.4489811Z cvt.f32.bf16 %r3282, %rs614; 2026-02-21T08:52:46.4489872Z cvt.f32.bf16 %r3347, %rs587; 2026-02-21T08:52:46.4489935Z cvt.f32.bf16 %r3348, %rs588; 2026-02-21T08:52:46.4489996Z cvt.f32.bf16 %r3349, %rs591; 2026-02-21T08:52:46.4490063Z cvt.f32.bf16 %r3350, %rs592; 2026-02-21T08:52:46.4490123Z cvt.f32.bf16 %r3415, %rs595; 2026-02-21T08:52:46.4490183Z cvt.f32.bf16 %r3416, %rs596; 2026-02-21T08:52:46.4490249Z cvt.f32.bf16 %r3417, %rs599; 2026-02-21T08:52:46.4490309Z cvt.f32.bf16 %r3418, %rs600; 2026-02-21T08:52:46.4490371Z cvt.f32.bf16 %r3483, %rs603; 
2026-02-21T08:52:46.4490434Z cvt.f32.bf16 %r3484, %rs604; 2026-02-21T08:52:46.4490500Z cvt.f32.bf16 %r3485, %rs607; 2026-02-21T08:52:46.4490561Z cvt.f32.bf16 %r3486, %rs608; 2026-02-21T08:52:46.4490741Z cvt.f32.bf16 %r3551, %rs611; 2026-02-21T08:52:46.4490816Z cvt.f32.bf16 %r3552, %rs612; 2026-02-21T08:52:46.4490876Z cvt.f32.bf16 %r3553, %rs615; 2026-02-21T08:52:46.4490941Z cvt.f32.bf16 %r3554, %rs616; 2026-02-21T08:52:46.4491150Z .loc 1 54 87 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:87 2026-02-21T08:52:46.4491221Z add.s32 %r3648, %r3769, %r3636; 2026-02-21T08:52:46.4491284Z add.s32 %r3649, %r3648, 49152; 2026-02-21T08:52:46.4491505Z .loc 1 67 45 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:67:45 2026-02-21T08:52:46.4491575Z add.s32 %r3650, %r3649, %r3774; 2026-02-21T08:52:46.4491642Z ld.shared.b8 %rs617, [%r3650]; 2026-02-21T08:52:46.4491709Z ld.shared.b8 %rs618, [%r3650+128]; 2026-02-21T08:52:46.4491779Z ld.shared.b8 %rs619, [%r3650+256]; 2026-02-21T08:52:46.4491844Z ld.shared.b8 %rs620, [%r3650+384]; 2026-02-21T08:52:46.4491909Z ld.shared.b8 %rs621, [%r3650+512]; 2026-02-21T08:52:46.4491977Z ld.shared.b8 %rs622, [%r3650+640]; 2026-02-21T08:52:46.4492049Z ld.shared.b8 %rs623, [%r3650+768]; 2026-02-21T08:52:46.4492112Z add.s32 %r3651, %r3649, %r3775; 2026-02-21T08:52:46.4492178Z ld.shared.b8 %rs624, [%r3651]; 2026-02-21T08:52:46.4492254Z ld.shared.b8 %rs625, [%r3650+1024]; 2026-02-21T08:52:46.4492322Z ld.shared.b8 %rs626, [%r3650+1152]; 2026-02-21T08:52:46.4492389Z ld.shared.b8 %rs627, [%r3650+1280]; 2026-02-21T08:52:46.4492456Z ld.shared.b8 %rs628, [%r3650+1408]; 2026-02-21T08:52:46.4492527Z ld.shared.b8 %rs629, [%r3650+1536]; 2026-02-21T08:52:46.4492591Z ld.shared.b8 %rs630, [%r3650+1664]; 2026-02-21T08:52:46.4492657Z ld.shared.b8 %rs631, [%r3650+1792]; 2026-02-21T08:52:46.4492728Z add.s32 %r3652, %r3649, %r3776; 2026-02-21T08:52:46.4492798Z ld.shared.b8 %rs632, [%r3652]; 2026-02-21T08:52:46.4492864Z ld.shared.b8 %rs633, 
[%r3650+2048]; 2026-02-21T08:52:46.4492937Z ld.shared.b8 %rs634, [%r3650+2176]; 2026-02-21T08:52:46.4493009Z ld.shared.b8 %rs635, [%r3650+2304]; 2026-02-21T08:52:46.4493073Z ld.shared.b8 %rs636, [%r3650+2432]; 2026-02-21T08:52:46.4493141Z ld.shared.b8 %rs637, [%r3650+2560]; 2026-02-21T08:52:46.4493216Z ld.shared.b8 %rs638, [%r3650+2688]; 2026-02-21T08:52:46.4500008Z ld.shared.b8 %rs639, [%r3650+2816]; 2026-02-21T08:52:46.4500121Z add.s32 %r3653, %r3649, %r3777; 2026-02-21T08:52:46.4500207Z ld.shared.b8 %rs640, [%r3653]; 2026-02-21T08:52:46.4500284Z ld.shared.b8 %rs641, [%r3650+3072]; 2026-02-21T08:52:46.4500352Z ld.shared.b8 %rs642, [%r3650+3200]; 2026-02-21T08:52:46.4500418Z ld.shared.b8 %rs643, [%r3650+3328]; 2026-02-21T08:52:46.4500486Z ld.shared.b8 %rs644, [%r3650+3456]; 2026-02-21T08:52:46.4500550Z ld.shared.b8 %rs645, [%r3650+3584]; 2026-02-21T08:52:46.4500614Z ld.shared.b8 %rs646, [%r3650+3712]; 2026-02-21T08:52:46.4500680Z ld.shared.b8 %rs647, [%r3650+3840]; 2026-02-21T08:52:46.4500744Z add.s32 %r3654, %r3649, %r3778; 2026-02-21T08:52:46.4500807Z ld.shared.b8 %rs648, [%r3654]; 2026-02-21T08:52:46.4501057Z .loc 1 57 28 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:57:28 2026-02-21T08:52:46.4501138Z shl.b16 %rs649, %rs617, 4; 2026-02-21T08:52:46.4501409Z shl.b16 %rs650, %rs618, 4; 2026-02-21T08:52:46.4501470Z shl.b16 %rs651, %rs619, 4; 2026-02-21T08:52:46.4501532Z shl.b16 %rs652, %rs620, 4; 2026-02-21T08:52:46.4501594Z shl.b16 %rs653, %rs621, 4; 2026-02-21T08:52:46.4501654Z shl.b16 %rs654, %rs622, 4; 2026-02-21T08:52:46.4501716Z shl.b16 %rs655, %rs623, 4; 2026-02-21T08:52:46.4501776Z shl.b16 %rs656, %rs624, 4; 2026-02-21T08:52:46.4501836Z shl.b16 %rs657, %rs625, 4; 2026-02-21T08:52:46.4501895Z shl.b16 %rs658, %rs626, 4; 2026-02-21T08:52:46.4501958Z shl.b16 %rs659, %rs627, 4; 2026-02-21T08:52:46.4502018Z shl.b16 %rs660, %rs628, 4; 2026-02-21T08:52:46.4502088Z shl.b16 %rs661, %rs629, 4; 2026-02-21T08:52:46.4502153Z shl.b16 %rs662, %rs630, 4; 
2026-02-21T08:52:46.4502212Z shl.b16 %rs663, %rs631, 4; 2026-02-21T08:52:46.4502395Z shl.b16 %rs664, %rs632, 4; 2026-02-21T08:52:46.4502459Z shl.b16 %rs665, %rs633, 4; 2026-02-21T08:52:46.4502524Z shl.b16 %rs666, %rs634, 4; 2026-02-21T08:52:46.4502585Z shl.b16 %rs667, %rs635, 4; 2026-02-21T08:52:46.4502653Z shl.b16 %rs668, %rs636, 4; 2026-02-21T08:52:46.4502716Z shl.b16 %rs669, %rs637, 4; 2026-02-21T08:52:46.4502776Z shl.b16 %rs670, %rs638, 4; 2026-02-21T08:52:46.4502845Z shl.b16 %rs671, %rs639, 4; 2026-02-21T08:52:46.4502906Z shl.b16 %rs672, %rs640, 4; 2026-02-21T08:52:46.4502970Z shl.b16 %rs673, %rs641, 4; 2026-02-21T08:52:46.4503031Z shl.b16 %rs674, %rs642, 4; 2026-02-21T08:52:46.4503090Z shl.b16 %rs675, %rs643, 4; 2026-02-21T08:52:46.4503155Z shl.b16 %rs676, %rs644, 4; 2026-02-21T08:52:46.4503214Z shl.b16 %rs677, %rs645, 4; 2026-02-21T08:52:46.4503275Z shl.b16 %rs678, %rs646, 4; 2026-02-21T08:52:46.4503333Z shl.b16 %rs679, %rs647, 4; 2026-02-21T08:52:46.4503396Z shl.b16 %rs680, %rs648, 4; 2026-02-21T08:52:46.4503625Z .loc 1 72 58 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:72:58 2026-02-21T08:52:46.4503711Z selp.b16 %rs681, %rs649, %rs617, %p57; 2026-02-21T08:52:46.4503781Z cvt.s16.s8 %rs682, %rs681; 2026-02-21T08:52:46.4503845Z shr.s16 %rs683, %rs682, 4; 2026-02-21T08:52:46.4503915Z selp.b16 %rs684, %rs650, %rs618, %p57; 2026-02-21T08:52:46.4503982Z cvt.s16.s8 %rs685, %rs684; 2026-02-21T08:52:46.4504043Z shr.s16 %rs686, %rs685, 4; 2026-02-21T08:52:46.4504112Z selp.b16 %rs687, %rs651, %rs619, %p57; 2026-02-21T08:52:46.4504175Z cvt.s16.s8 %rs688, %rs687; 2026-02-21T08:52:46.4504238Z shr.s16 %rs689, %rs688, 4; 2026-02-21T08:52:46.4504303Z selp.b16 %rs690, %rs652, %rs620, %p57; 2026-02-21T08:52:46.4504363Z cvt.s16.s8 %rs691, %rs690; 2026-02-21T08:52:46.4504440Z shr.s16 %rs692, %rs691, 4; 2026-02-21T08:52:46.4504521Z selp.b16 %rs693, %rs653, %rs621, %p57; 2026-02-21T08:52:46.4504589Z cvt.s16.s8 %rs694, %rs693; 2026-02-21T08:52:46.4504651Z shr.s16 
%rs695, %rs694, 4; 2026-02-21T08:52:46.4504730Z selp.b16 %rs696, %rs654, %rs622, %p57; 2026-02-21T08:52:46.4504805Z cvt.s16.s8 %rs697, %rs696; 2026-02-21T08:52:46.4504869Z shr.s16 %rs698, %rs697, 4; 2026-02-21T08:52:46.4504941Z selp.b16 %rs699, %rs655, %rs623, %p57; 2026-02-21T08:52:46.4505015Z cvt.s16.s8 %rs700, %rs699; 2026-02-21T08:52:46.4505079Z shr.s16 %rs701, %rs700, 4; 2026-02-21T08:52:46.4505149Z selp.b16 %rs702, %rs656, %rs624, %p57; 2026-02-21T08:52:46.4505216Z cvt.s16.s8 %rs703, %rs702; 2026-02-21T08:52:46.4505285Z shr.s16 %rs704, %rs703, 4; 2026-02-21T08:52:46.4505352Z selp.b16 %rs705, %rs657, %rs625, %p57; 2026-02-21T08:52:46.4505413Z cvt.s16.s8 %rs706, %rs705; 2026-02-21T08:52:46.4505479Z shr.s16 %rs707, %rs706, 4; 2026-02-21T08:52:46.4505548Z selp.b16 %rs708, %rs658, %rs626, %p57; 2026-02-21T08:52:46.4505608Z cvt.s16.s8 %rs709, %rs708; 2026-02-21T08:52:46.4505678Z shr.s16 %rs710, %rs709, 4; 2026-02-21T08:52:46.4505747Z selp.b16 %rs711, %rs659, %rs627, %p57; 2026-02-21T08:52:46.4505811Z cvt.s16.s8 %rs712, %rs711; 2026-02-21T08:52:46.4505874Z shr.s16 %rs713, %rs712, 4; 2026-02-21T08:52:46.4505949Z selp.b16 %rs714, %rs660, %rs628, %p57; 2026-02-21T08:52:46.4506013Z cvt.s16.s8 %rs715, %rs714; 2026-02-21T08:52:46.4506185Z shr.s16 %rs716, %rs715, 4; 2026-02-21T08:52:46.4506261Z selp.b16 %rs717, %rs661, %rs629, %p57; 2026-02-21T08:52:46.4506333Z cvt.s16.s8 %rs718, %rs717; 2026-02-21T08:52:46.4506394Z shr.s16 %rs719, %rs718, 4; 2026-02-21T08:52:46.4506608Z selp.b16 %rs720, %rs662, %rs630, %p57; 2026-02-21T08:52:46.4506678Z cvt.s16.s8 %rs721, %rs720; 2026-02-21T08:52:46.4506740Z shr.s16 %rs722, %rs721, 4; 2026-02-21T08:52:46.4506809Z selp.b16 %rs723, %rs663, %rs631, %p57; 2026-02-21T08:52:46.4506876Z cvt.s16.s8 %rs724, %rs723; 2026-02-21T08:52:46.4506938Z shr.s16 %rs725, %rs724, 4; 2026-02-21T08:52:46.4507005Z selp.b16 %rs726, %rs664, %rs632, %p57; 2026-02-21T08:52:46.4507068Z cvt.s16.s8 %rs727, %rs726; 2026-02-21T08:52:46.4507132Z shr.s16 %rs728, %rs727, 4; 
2026-02-21T08:52:46.4507343Z selp.b16 %rs729, %rs665, %rs633, %p57; 2026-02-21T08:52:46.4507415Z cvt.s16.s8 %rs730, %rs729; 2026-02-21T08:52:46.4507481Z shr.s16 %rs731, %rs730, 4; 2026-02-21T08:52:46.4507554Z selp.b16 %rs732, %rs666, %rs634, %p57; 2026-02-21T08:52:46.4507620Z cvt.s16.s8 %rs733, %rs732; 2026-02-21T08:52:46.4507680Z shr.s16 %rs734, %rs733, 4; 2026-02-21T08:52:46.4507753Z selp.b16 %rs735, %rs667, %rs635, %p57; 2026-02-21T08:52:46.4507815Z cvt.s16.s8 %rs736, %rs735; 2026-02-21T08:52:46.4507876Z shr.s16 %rs737, %rs736, 4; 2026-02-21T08:52:46.4507950Z selp.b16 %rs738, %rs668, %rs636, %p57; 2026-02-21T08:52:46.4508013Z cvt.s16.s8 %rs739, %rs738; 2026-02-21T08:52:46.4508075Z shr.s16 %rs740, %rs739, 4; 2026-02-21T08:52:46.4508149Z selp.b16 %rs741, %rs669, %rs637, %p57; 2026-02-21T08:52:46.4508213Z cvt.s16.s8 %rs742, %rs741; 2026-02-21T08:52:46.4508274Z shr.s16 %rs743, %rs742, 4; 2026-02-21T08:52:46.4508341Z selp.b16 %rs744, %rs670, %rs638, %p57; 2026-02-21T08:52:46.4508410Z cvt.s16.s8 %rs745, %rs744; 2026-02-21T08:52:46.4508473Z shr.s16 %rs746, %rs745, 4; 2026-02-21T08:52:46.4508619Z selp.b16 %rs747, %rs671, %rs639, %p57; 2026-02-21T08:52:46.4508689Z cvt.s16.s8 %rs748, %rs747; 2026-02-21T08:52:46.4508753Z shr.s16 %rs749, %rs748, 4; 2026-02-21T08:52:46.4508821Z selp.b16 %rs750, %rs672, %rs640, %p57; 2026-02-21T08:52:46.4508883Z cvt.s16.s8 %rs751, %rs750; 2026-02-21T08:52:46.4508951Z shr.s16 %rs752, %rs751, 4; 2026-02-21T08:52:46.4509021Z selp.b16 %rs753, %rs673, %rs641, %p57; 2026-02-21T08:52:46.4509087Z cvt.s16.s8 %rs754, %rs753; 2026-02-21T08:52:46.4509156Z shr.s16 %rs755, %rs754, 4; 2026-02-21T08:52:46.4509223Z selp.b16 %rs756, %rs674, %rs642, %p57; 2026-02-21T08:52:46.4509286Z cvt.s16.s8 %rs757, %rs756; 2026-02-21T08:52:46.4509345Z shr.s16 %rs758, %rs757, 4; 2026-02-21T08:52:46.4509418Z selp.b16 %rs759, %rs675, %rs643, %p57; 2026-02-21T08:52:46.4509478Z cvt.s16.s8 %rs760, %rs759; 2026-02-21T08:52:46.4509539Z shr.s16 %rs761, %rs760, 4; 
2026-02-21T08:52:46.4509613Z selp.b16 %rs762, %rs676, %rs644, %p57; 2026-02-21T08:52:46.4509675Z cvt.s16.s8 %rs763, %rs762; 2026-02-21T08:52:46.4509746Z shr.s16 %rs764, %rs763, 4; 2026-02-21T08:52:46.4509823Z selp.b16 %rs765, %rs677, %rs645, %p57; 2026-02-21T08:52:46.4509890Z cvt.s16.s8 %rs766, %rs765; 2026-02-21T08:52:46.4509951Z shr.s16 %rs767, %rs766, 4; 2026-02-21T08:52:46.4510019Z selp.b16 %rs768, %rs678, %rs646, %p57; 2026-02-21T08:52:46.4510086Z cvt.s16.s8 %rs769, %rs768; 2026-02-21T08:52:46.4510150Z shr.s16 %rs770, %rs769, 4; 2026-02-21T08:52:46.4510218Z selp.b16 %rs771, %rs679, %rs647, %p57; 2026-02-21T08:52:46.4510282Z cvt.s16.s8 %rs772, %rs771; 2026-02-21T08:52:46.4510342Z shr.s16 %rs773, %rs772, 4; 2026-02-21T08:52:46.4510410Z selp.b16 %rs774, %rs680, %rs648, %p57; 2026-02-21T08:52:46.4510469Z cvt.s16.s8 %rs775, %rs774; 2026-02-21T08:52:46.4510535Z shr.s16 %rs776, %rs775, 4; 2026-02-21T08:52:46.4510758Z .loc 1 77 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:77:32 2026-02-21T08:52:46.4510831Z cvt.rn.f32.s16 %r3655, %rs683; 2026-02-21T08:52:46.4510901Z cvt.rn.f32.s16 %r3656, %rs686; 2026-02-21T08:52:46.4510964Z cvt.rn.f32.s16 %r3657, %rs689; 2026-02-21T08:52:46.4511171Z cvt.rn.f32.s16 %r3658, %rs692; 2026-02-21T08:52:46.4511234Z cvt.rn.f32.s16 %r3659, %rs695; 2026-02-21T08:52:46.4511302Z cvt.rn.f32.s16 %r3660, %rs698; 2026-02-21T08:52:46.4511364Z cvt.rn.f32.s16 %r3661, %rs701; 2026-02-21T08:52:46.4511428Z cvt.rn.f32.s16 %r3662, %rs704; 2026-02-21T08:52:46.4511493Z cvt.rn.f32.s16 %r3663, %rs707; 2026-02-21T08:52:46.4511556Z cvt.rn.f32.s16 %r3664, %rs710; 2026-02-21T08:52:46.4511618Z cvt.rn.f32.s16 %r3665, %rs713; 2026-02-21T08:52:46.4511686Z cvt.rn.f32.s16 %r3666, %rs716; 2026-02-21T08:52:46.4511749Z cvt.rn.f32.s16 %r3667, %rs719; 2026-02-21T08:52:46.4511812Z cvt.rn.f32.s16 %r3668, %rs722; 2026-02-21T08:52:46.4511876Z cvt.rn.f32.s16 %r3669, %rs725; 2026-02-21T08:52:46.4511943Z cvt.rn.f32.s16 %r3670, %rs728; 2026-02-21T08:52:46.4512153Z 
cvt.rn.f32.s16 %r3671, %rs731; 2026-02-21T08:52:46.4512221Z cvt.rn.f32.s16 %r3672, %rs734; 2026-02-21T08:52:46.4512290Z cvt.rn.f32.s16 %r3673, %rs737; 2026-02-21T08:52:46.4512353Z cvt.rn.f32.s16 %r3674, %rs740; 2026-02-21T08:52:46.4512421Z cvt.rn.f32.s16 %r3675, %rs743; 2026-02-21T08:52:46.4512482Z cvt.rn.f32.s16 %r3676, %rs746; 2026-02-21T08:52:46.4512550Z cvt.rn.f32.s16 %r3677, %rs749; 2026-02-21T08:52:46.4512614Z cvt.rn.f32.s16 %r3678, %rs752; 2026-02-21T08:52:46.4512678Z cvt.rn.f32.s16 %r3679, %rs755; 2026-02-21T08:52:46.4512747Z cvt.rn.f32.s16 %r3680, %rs758; 2026-02-21T08:52:46.4512810Z cvt.rn.f32.s16 %r3681, %rs761; 2026-02-21T08:52:46.4512873Z cvt.rn.f32.s16 %r3682, %rs764; 2026-02-21T08:52:46.4512936Z cvt.rn.f32.s16 %r3683, %rs767; 2026-02-21T08:52:46.4513008Z cvt.rn.f32.s16 %r3684, %rs770; 2026-02-21T08:52:46.4513070Z cvt.rn.f32.s16 %r3685, %rs773; 2026-02-21T08:52:46.4513134Z cvt.rn.f32.s16 %r3686, %rs776; 2026-02-21T08:52:46.4513208Z st.shared.b32 [%r83], %r3655; 2026-02-21T08:52:46.4513279Z st.shared.b32 [%r83+8], %r3656; 2026-02-21T08:52:46.4513353Z st.shared.b32 [%r83+16384], %r3671; 2026-02-21T08:52:46.4513430Z st.shared.b32 [%r83+16392], %r3672; 2026-02-21T08:52:46.4513499Z st.shared.b32 [%r84], %r3657; 2026-02-21T08:52:46.4513567Z st.shared.b32 [%r84+8], %r3658; 2026-02-21T08:52:46.4513632Z st.shared.b32 [%r84+16384], %r3673; 2026-02-21T08:52:46.4513707Z st.shared.b32 [%r84+16392], %r3674; 2026-02-21T08:52:46.4513770Z st.shared.b32 [%r85], %r3659; 2026-02-21T08:52:46.4513836Z st.shared.b32 [%r85+8], %r3660; 2026-02-21T08:52:46.4513908Z st.shared.b32 [%r85+16384], %r3675; 2026-02-21T08:52:46.4513974Z st.shared.b32 [%r85+16392], %r3676; 2026-02-21T08:52:46.4514049Z st.shared.b32 [%r86], %r3661; 2026-02-21T08:52:46.4514116Z st.shared.b32 [%r86+8], %r3662; 2026-02-21T08:52:46.4514191Z st.shared.b32 [%r86+16384], %r3677; 2026-02-21T08:52:46.4514258Z st.shared.b32 [%r86+16392], %r3678; 2026-02-21T08:52:46.4514324Z st.shared.b32 [%r87], %r3663; 
2026-02-21T08:52:46.4514399Z st.shared.b32 [%r87+8], %r3664; 2026-02-21T08:52:46.4514466Z st.shared.b32 [%r87+16384], %r3679; 2026-02-21T08:52:46.4514534Z st.shared.b32 [%r87+16392], %r3680; 2026-02-21T08:52:46.4514604Z st.shared.b32 [%r88], %r3665; 2026-02-21T08:52:46.4514677Z st.shared.b32 [%r88+8], %r3666; 2026-02-21T08:52:46.4514742Z st.shared.b32 [%r88+16384], %r3681; 2026-02-21T08:52:46.4514808Z st.shared.b32 [%r88+16392], %r3682; 2026-02-21T08:52:46.4514878Z st.shared.b32 [%r89], %r3667; 2026-02-21T08:52:46.4514945Z st.shared.b32 [%r89+8], %r3668; 2026-02-21T08:52:46.4515012Z st.shared.b32 [%r89+16384], %r3683; 2026-02-21T08:52:46.4515081Z st.shared.b32 [%r89+16392], %r3684; 2026-02-21T08:52:46.4515146Z st.shared.b32 [%r90], %r3669; 2026-02-21T08:52:46.4515217Z st.shared.b32 [%r90+8], %r3670; 2026-02-21T08:52:46.4515281Z st.shared.b32 [%r90+16384], %r3685; 2026-02-21T08:52:46.4515354Z st.shared.b32 [%r90+16392], %r3686; 2026-02-21T08:52:46.4515413Z $L__tmp7: 2026-02-21T08:52:46.4515715Z .loc 2 291 36 // standard.py:291:36 @[ cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:84:40 ] 2026-02-21T08:52:46.4515793Z // begin inline asm 2026-02-21T08:52:46.4515987Z fence.proxy.async.shared::cta; 2026-02-21T08:52:46.4516063Z // end inline asm 2026-02-21T08:52:46.4516124Z bar.sync 0; 2026-02-21T08:52:46.4516220Z shfl.sync.idx.b32 %r3687, %r4, 0, 31, -1; 2026-02-21T08:52:46.4516295Z wgmma.fence.sync.aligned; 2026-02-21T08:52:46.4516359Z shl.b32 %r3688, %r3687, 11; 2026-02-21T08:52:46.4516428Z and.b32 %r3689, %r3688, 8192; 2026-02-21T08:52:46.4516628Z add.s32 %r3690, %r3689, %r3769; 2026-02-21T08:52:46.4516697Z bfe.u32 %r3691, %r3690, 4, 14; 2026-02-21T08:52:46.4516770Z cvt.u64.u32 %rd276, %r3691; 2026-02-21T08:52:46.4516852Z or.b64 %rd263, %rd276, 4611686293372403712; 2026-02-21T08:52:46.4516922Z mov.pred %p43, -1; 2026-02-21T08:52:46.4516987Z // begin inline asm 2026-02-21T08:52:46.4517917Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r3897,%r3898,%r3899,%r3900,%r3901,%r3902,%r3903,%r3904,%r3905,%r3906,%r3907,%r3908,%r3909,%r3910,%r3911,%r3912,%r3913,%r3914,%r3915,%r3916,%r3917,%r3918,%r3919,%r3920,%r3921,%r3922,%r3923,%r3924,%r3925,%r3926,%r3927,%r3928}, {%r3075,%r3076,%r3077,%r3078}, %rd263, %p43, 1, 1; 2026-02-21T08:52:46.4517988Z // end inline asm 2026-02-21T08:52:46.4518052Z add.s32 %r3692, %r3690, 32; 2026-02-21T08:52:46.4518123Z bfe.u32 %r3693, %r3692, 4, 14; 2026-02-21T08:52:46.4518188Z cvt.u64.u32 %rd277, %r3693; 2026-02-21T08:52:46.4518269Z or.b64 %rd264, %rd277, 4611686293372403712; 2026-02-21T08:52:46.4518340Z // begin inline asm 2026-02-21T08:52:46.4519105Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3897,%r3898,%r3899,%r3900,%r3901,%r3902,%r3903,%r3904,%r3905,%r3906,%r3907,%r3908,%r3909,%r3910,%r3911,%r3912,%r3913,%r3914,%r3915,%r3916,%r3917,%r3918,%r3919,%r3920,%r3921,%r3922,%r3923,%r3924,%r3925,%r3926,%r3927,%r3928}, {%r3143,%r3144,%r3145,%r3146}, %rd264, %p43, 1, 1; 2026-02-21T08:52:46.4519168Z // end inline asm 2026-02-21T08:52:46.4519236Z add.s32 %r3694, %r3690, 64; 2026-02-21T08:52:46.4519299Z bfe.u32 %r3695, %r3694, 4, 14; 2026-02-21T08:52:46.4519364Z cvt.u64.u32 %rd278, %r3695; 2026-02-21T08:52:46.4519447Z or.b64 %rd265, %rd278, 4611686293372403712; 2026-02-21T08:52:46.4519516Z // begin inline asm 2026-02-21T08:52:46.4520266Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3897,%r3898,%r3899,%r3900,%r3901,%r3902,%r3903,%r3904,%r3905,%r3906,%r3907,%r3908,%r3909,%r3910,%r3911,%r3912,%r3913,%r3914,%r3915,%r3916,%r3917,%r3918,%r3919,%r3920,%r3921,%r3922,%r3923,%r3924,%r3925,%r3926,%r3927,%r3928}, {%r3211,%r3212,%r3213,%r3214}, %rd265, %p43, 1, 1; 2026-02-21T08:52:46.4520325Z // end inline asm 2026-02-21T08:52:46.4520393Z add.s32 %r3696, %r3690, 96; 2026-02-21T08:52:46.4520457Z bfe.u32 %r3697, %r3696, 4, 14; 2026-02-21T08:52:46.4520520Z cvt.u64.u32 %rd279, %r3697; 2026-02-21T08:52:46.4520599Z or.b64 %rd266, %rd279, 4611686293372403712; 
2026-02-21T08:52:46.4520663Z // begin inline asm 2026-02-21T08:52:46.4521413Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3897,%r3898,%r3899,%r3900,%r3901,%r3902,%r3903,%r3904,%r3905,%r3906,%r3907,%r3908,%r3909,%r3910,%r3911,%r3912,%r3913,%r3914,%r3915,%r3916,%r3917,%r3918,%r3919,%r3920,%r3921,%r3922,%r3923,%r3924,%r3925,%r3926,%r3927,%r3928}, {%r3279,%r3280,%r3281,%r3282}, %rd266, %p43, 1, 1; 2026-02-21T08:52:46.4521483Z // end inline asm 2026-02-21T08:52:46.4521548Z add.s32 %r3698, %r3690, 16384; 2026-02-21T08:52:46.4521622Z bfe.u32 %r3699, %r3698, 4, 14; 2026-02-21T08:52:46.4521697Z cvt.u64.u32 %rd280, %r3699; 2026-02-21T08:52:46.4521771Z or.b64 %rd267, %rd280, 4611686293372403712; 2026-02-21T08:52:46.4521833Z // begin inline asm 2026-02-21T08:52:46.4522584Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3897,%r3898,%r3899,%r3900,%r3901,%r3902,%r3903,%r3904,%r3905,%r3906,%r3907,%r3908,%r3909,%r3910,%r3911,%r3912,%r3913,%r3914,%r3915,%r3916,%r3917,%r3918,%r3919,%r3920,%r3921,%r3922,%r3923,%r3924,%r3925,%r3926,%r3927,%r3928}, {%r3347,%r3348,%r3349,%r3350}, %rd267, %p43, 1, 1; 2026-02-21T08:52:46.4522650Z // end inline asm 2026-02-21T08:52:46.4522715Z add.s32 %r3700, %r3690, 16416; 2026-02-21T08:52:46.4522905Z bfe.u32 %r3701, %r3700, 4, 14; 2026-02-21T08:52:46.4522976Z cvt.u64.u32 %rd281, %r3701; 2026-02-21T08:52:46.4523050Z or.b64 %rd268, %rd281, 4611686293372403712; 2026-02-21T08:52:46.4523111Z // begin inline asm 2026-02-21T08:52:46.4523890Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3897,%r3898,%r3899,%r3900,%r3901,%r3902,%r3903,%r3904,%r3905,%r3906,%r3907,%r3908,%r3909,%r3910,%r3911,%r3912,%r3913,%r3914,%r3915,%r3916,%r3917,%r3918,%r3919,%r3920,%r3921,%r3922,%r3923,%r3924,%r3925,%r3926,%r3927,%r3928}, {%r3415,%r3416,%r3417,%r3418}, %rd268, %p43, 1, 1; 2026-02-21T08:52:46.4523951Z // end inline asm 2026-02-21T08:52:46.4524016Z add.s32 %r3702, %r3690, 16448; 2026-02-21T08:52:46.4524084Z bfe.u32 %r3703, %r3702, 4, 14; 
2026-02-21T08:52:46.4524148Z cvt.u64.u32 %rd282, %r3703; 2026-02-21T08:52:46.4524311Z or.b64 %rd269, %rd282, 4611686293372403712; 2026-02-21T08:52:46.4524375Z // begin inline asm 2026-02-21T08:52:46.4525141Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3897,%r3898,%r3899,%r3900,%r3901,%r3902,%r3903,%r3904,%r3905,%r3906,%r3907,%r3908,%r3909,%r3910,%r3911,%r3912,%r3913,%r3914,%r3915,%r3916,%r3917,%r3918,%r3919,%r3920,%r3921,%r3922,%r3923,%r3924,%r3925,%r3926,%r3927,%r3928}, {%r3483,%r3484,%r3485,%r3486}, %rd269, %p43, 1, 1; 2026-02-21T08:52:46.4525220Z // end inline asm 2026-02-21T08:52:46.4525286Z add.s32 %r3704, %r3690, 16480; 2026-02-21T08:52:46.4525356Z bfe.u32 %r3705, %r3704, 4, 14; 2026-02-21T08:52:46.4525418Z cvt.u64.u32 %rd283, %r3705; 2026-02-21T08:52:46.4525491Z or.b64 %rd270, %rd283, 4611686293372403712; 2026-02-21T08:52:46.4525562Z // begin inline asm 2026-02-21T08:52:46.4526320Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3897,%r3898,%r3899,%r3900,%r3901,%r3902,%r3903,%r3904,%r3905,%r3906,%r3907,%r3908,%r3909,%r3910,%r3911,%r3912,%r3913,%r3914,%r3915,%r3916,%r3917,%r3918,%r3919,%r3920,%r3921,%r3922,%r3923,%r3924,%r3925,%r3926,%r3927,%r3928}, {%r3551,%r3552,%r3553,%r3554}, %rd270, %p43, 1, 1; 2026-02-21T08:52:46.4526384Z // end inline asm 2026-02-21T08:52:46.4526603Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:46.4526667Z mov.b32 %r3589, 0; 2026-02-21T08:52:46.4526731Z mov.b32 %r3587, %r3769; 2026-02-21T08:52:46.4526797Z mov.b32 %r3588, %r3589; 2026-02-21T08:52:46.4526857Z // begin inline asm 2026-02-21T08:52:46.4527436Z // wait for regs: %r3897,%r3898,%r3899,%r3900,%r3901,%r3902,%r3903,%r3904,%r3905,%r3906,%r3907,%r3908,%r3909,%r3910,%r3911,%r3912,%r3913,%r3914,%r3915,%r3916,%r3917,%r3918,%r3919,%r3920,%r3921,%r3922,%r3923,%r3924,%r3925,%r3926,%r3927,%r3928,%r3587,%r3588,%r3589 2026-02-21T08:52:46.4527516Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:46.4527582Z // end inline asm 2026-02-21T08:52:46.4527640Z $L__tmp8: 
2026-02-21T08:52:46.4527873Z .loc 1 40 103 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:40:103 2026-02-21T08:52:46.4527945Z add.s32 %r3706, %r3896, 1; 2026-02-21T08:52:46.4528015Z setp.gt.s32 %p54, %r3706, 1; 2026-02-21T08:52:46.4528088Z selp.b32 %r3896, 0, %r3706, %p54; 2026-02-21T08:52:46.4528312Z .loc 1 48 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:32 2026-02-21T08:52:46.4528390Z mad.wide.s32 %rd274, %r3894, 2, %rd53; 2026-02-21T08:52:46.4528595Z .loc 1 48 80 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:48:80 2026-02-21T08:52:46.4528659Z shl.b32 %r3707, %r3896, 12; 2026-02-21T08:52:46.4528726Z shl.b32 %r3708, %r3896, 13; 2026-02-21T08:52:46.4528792Z add.s32 %r3709, %r3638, %r3708; 2026-02-21T08:52:46.4528854Z add.s32 %r3625, %r3709, %r58; 2026-02-21T08:52:46.4528920Z selp.b32 %r3626, 8, 0, %p52; 2026-02-21T08:52:46.4528983Z // begin inline asm 2026-02-21T08:52:46.4529132Z cp.async.ca.shared.global [ %r3625 + 0 ], [ %rd302 + 0 ], 0x8, %r3626; 2026-02-21T08:52:46.4529198Z // end inline asm 2026-02-21T08:52:46.4529262Z add.s32 %r3627, %r3625, 2048; 2026-02-21T08:52:46.4529322Z // begin inline asm 2026-02-21T08:52:46.4529460Z cp.async.ca.shared.global [ %r3627 + 0 ], [ %rd303 + 0 ], 0x8, %r3626; 2026-02-21T08:52:46.4529663Z // end inline asm 2026-02-21T08:52:46.4529726Z add.s32 %r3629, %r3625, 4096; 2026-02-21T08:52:46.4529798Z // begin inline asm 2026-02-21T08:52:46.4529942Z cp.async.ca.shared.global [ %r3629 + 0 ], [ %rd304 + 0 ], 0x8, %r3626; 2026-02-21T08:52:46.4530001Z // end inline asm 2026-02-21T08:52:46.4530063Z add.s32 %r3631, %r3625, 6144; 2026-02-21T08:52:46.4530123Z // begin inline asm 2026-02-21T08:52:46.4530265Z cp.async.ca.shared.global [ %r3631 + 0 ], [ %rd274 + 0 ], 0x8, %r3626; 2026-02-21T08:52:46.4530324Z // end inline asm 2026-02-21T08:52:46.4530392Z cp.async.commit_group; 2026-02-21T08:52:46.4530604Z .loc 1 54 87 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:54:87 
2026-02-21T08:52:46.4530789Z add.s32 %r3633, %r64, %r3707; 2026-02-21T08:52:46.4530856Z selp.b32 %r3634, 16, 0, %p52; 2026-02-21T08:52:46.4530922Z // begin inline asm 2026-02-21T08:52:46.4531067Z cp.async.cg.shared.global [ %r3633 + 0 ], [ %rd301 + 0 ], 0x10, %r3634; 2026-02-21T08:52:46.4531130Z // end inline asm 2026-02-21T08:52:46.4531199Z cp.async.commit_group; 2026-02-21T08:52:46.4531415Z .loc 1 40 103 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:40:103 2026-02-21T08:52:46.4531482Z add.s64 %rd304, %rd304, 128; 2026-02-21T08:52:46.4531549Z add.s64 %rd303, %rd303, 128; 2026-02-21T08:52:46.4531616Z add.s64 %rd302, %rd302, 128; 2026-02-21T08:52:46.4531680Z add.s32 %r3894, %r3894, 64; 2026-02-21T08:52:46.4531746Z add.s64 %rd301, %rd301, 229376; 2026-02-21T08:52:46.4531815Z setp.lt.u64 %p55, %rd305, 4064; 2026-02-21T08:52:46.4531884Z @%p55 bra $L__BB0_12; 2026-02-21T08:52:46.4531997Z // %bb.13: // in Loop: Header=BB0_11 Depth=1 2026-02-21T08:52:46.4532207Z .loc 1 33 32 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:33:32 2026-02-21T08:52:46.4532291Z or.b32 %r3746, %r345, %r11; 2026-02-21T08:52:46.4532506Z .loc 1 40 103 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:40:103 2026-02-21T08:52:46.4532576Z cp.async.wait_group 0; 2026-02-21T08:52:46.4532639Z bar.sync 0; 2026-02-21T08:52:46.4532843Z .loc 1 87 28 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:87:28 2026-02-21T08:52:46.4532926Z cvt.rn.bf16x2.f32 %r3747, %r3898, %r3897; 2026-02-21T08:52:46.4533008Z cvt.rn.bf16x2.f32 %r3748, %r3900, %r3899; 2026-02-21T08:52:46.4533081Z cvt.rn.bf16x2.f32 %r3749, %r3902, %r3901; 2026-02-21T08:52:46.4533153Z cvt.rn.bf16x2.f32 %r3750, %r3904, %r3903; 2026-02-21T08:52:46.4533224Z cvt.rn.bf16x2.f32 %r3751, %r3906, %r3905; 2026-02-21T08:52:46.4533303Z cvt.rn.bf16x2.f32 %r3752, %r3908, %r3907; 2026-02-21T08:52:46.4533379Z cvt.rn.bf16x2.f32 %r3753, %r3910, %r3909; 2026-02-21T08:52:46.4533451Z cvt.rn.bf16x2.f32 %r3754, %r3912, %r3911; 
2026-02-21T08:52:46.4533526Z cvt.rn.bf16x2.f32 %r3755, %r3914, %r3913; 2026-02-21T08:52:46.4533601Z cvt.rn.bf16x2.f32 %r3756, %r3916, %r3915; 2026-02-21T08:52:46.4533673Z cvt.rn.bf16x2.f32 %r3757, %r3918, %r3917; 2026-02-21T08:52:46.4533745Z cvt.rn.bf16x2.f32 %r3758, %r3920, %r3919; 2026-02-21T08:52:46.4533822Z cvt.rn.bf16x2.f32 %r3759, %r3922, %r3921; 2026-02-21T08:52:46.4533894Z cvt.rn.bf16x2.f32 %r3760, %r3924, %r3923; 2026-02-21T08:52:46.4533965Z cvt.rn.bf16x2.f32 %r3761, %r3926, %r3925; 2026-02-21T08:52:46.4534043Z cvt.rn.bf16x2.f32 %r3762, %r3928, %r3927; 2026-02-21T08:52:46.4534253Z .loc 1 88 50 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:88:50 2026-02-21T08:52:46.4534325Z mad.lo.s32 %r3763, %r341, 7168, %r3746; 2026-02-21T08:52:46.4534401Z mad.lo.s32 %r3764, %r342, 7168, %r3746; 2026-02-21T08:52:46.4534470Z mad.lo.s32 %r3765, %r343, 7168, %r3746; 2026-02-21T08:52:46.4534540Z mad.lo.s32 %r3766, %r344, 7168, %r3746; 2026-02-21T08:52:46.4534748Z .loc 1 88 22 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:88:22 2026-02-21T08:52:46.4534954Z mad.wide.s32 %rd284, %r3763, 2, %rd55; 2026-02-21T08:52:46.4535024Z mad.wide.s32 %rd285, %r3764, 2, %rd55; 2026-02-21T08:52:46.4535097Z mad.wide.s32 %rd286, %r3765, 2, %rd55; 2026-02-21T08:52:46.4535164Z mad.wide.s32 %rd287, %r3766, 2, %rd55; 2026-02-21T08:52:46.4535366Z .loc 1 88 81 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:88:81 2026-02-21T08:52:46.4535479Z st.shared.v4.b32 [%r91], {%r3747, %r3749, %r3751, %r3753}; 2026-02-21T08:52:46.4535599Z st.shared.v4.b32 [%r91+512], {%r3748, %r3750, %r3752, %r3754}; 2026-02-21T08:52:46.4535703Z st.shared.v4.b32 [%r92], {%r3755, %r3757, %r3759, %r3761}; 2026-02-21T08:52:46.4535813Z st.shared.v4.b32 [%r92+512], {%r3756, %r3758, %r3760, %r3762}; 2026-02-21T08:52:46.4535987Z bar.sync 0; 2026-02-21T08:52:46.4536053Z // begin inline asm 2026-02-21T08:52:46.4536249Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3710, %r3711, %r3712, %r3713}, 
[%r3714]; 2026-02-21T08:52:46.4536320Z // end inline asm 2026-02-21T08:52:46.4536381Z // begin inline asm 2026-02-21T08:52:46.4536696Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3715, %r3716, %r3717, %r3718}, [%r3719]; 2026-02-21T08:52:46.4536758Z // end inline asm 2026-02-21T08:52:46.4536822Z // begin inline asm 2026-02-21T08:52:46.4537004Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3720, %r3721, %r3722, %r3723}, [%r3724]; 2026-02-21T08:52:46.4537063Z // end inline asm 2026-02-21T08:52:46.4537127Z // begin inline asm 2026-02-21T08:52:46.4537310Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3725, %r3726, %r3727, %r3728}, [%r3729]; 2026-02-21T08:52:46.4537367Z // end inline asm 2026-02-21T08:52:46.4537427Z // begin inline asm 2026-02-21T08:52:46.4537563Z st.global.v4.b32 [ %rd284 + 0 ], { %r3710, %r3711, %r3712, %r3713 }; 2026-02-21T08:52:46.4537625Z // end inline asm 2026-02-21T08:52:46.4537684Z // begin inline asm 2026-02-21T08:52:46.4537808Z st.global.v4.b32 [ %rd285 + 0 ], { %r3715, %r3716, %r3717, %r3718 }; 2026-02-21T08:52:46.4537870Z // end inline asm 2026-02-21T08:52:46.4537929Z // begin inline asm 2026-02-21T08:52:46.4538050Z st.global.v4.b32 [ %rd286 + 0 ], { %r3720, %r3721, %r3722, %r3723 }; 2026-02-21T08:52:46.4538108Z // end inline asm 2026-02-21T08:52:46.4538170Z // begin inline asm 2026-02-21T08:52:46.4538285Z st.global.v4.b32 [ %rd287 + 0 ], { %r3725, %r3726, %r3727, %r3728 }; 2026-02-21T08:52:46.4538348Z // end inline asm 2026-02-21T08:52:46.4538565Z .loc 1 19 112 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:19:112 2026-02-21T08:52:46.4538631Z add.s32 %r417, %r3893, 4224; 2026-02-21T08:52:46.4538709Z setp.lt.s32 %p56, %r3893, -4168; 2026-02-21T08:52:46.4538771Z mov.b32 %r3893, %r417; 2026-02-21T08:52:46.4538833Z @%p56 bra $L__BB0_11; 2026-02-21T08:52:46.4538932Z $L__BB0_14: // %._crit_edge 2026-02-21T08:52:46.4539141Z .loc 1 19 4 // cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py:19:4 2026-02-21T08:52:46.4539199Z ret; 
2026-02-21T08:52:46.4539256Z $L__tmp9: 2026-02-21T08:52:46.4539318Z $L__func_end0: 2026-02-21T08:52:46.4539406Z // -- End function 2026-02-21T08:52:46.4539463Z } 2026-02-21T08:52:46.4539723Z .file 1 "/tmp/torchinductor_root/uq/cuq4eo2em6amnpevtaqeblt4sqvtkueeap5iwxc4qe3bb5gy5zdu.py" 2026-02-21T08:52:46.4539944Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T08:52:46.4540011Z .section .debug_abbrev 2026-02-21T08:52:46.4540065Z { 2026-02-21T08:52:46.4540166Z .b8 1 // Abbreviation Code 2026-02-21T08:52:46.4540262Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:52:46.4540354Z .b8 1 // DW_CHILDREN_yes 2026-02-21T08:52:46.4540450Z .b8 37 // DW_AT_producer 2026-02-21T08:52:46.4540534Z .b8 8 // DW_FORM_string 2026-02-21T08:52:46.4540614Z .b8 19 // DW_AT_language 2026-02-21T08:52:46.4540865Z .b8 5 // DW_FORM_data2 2026-02-21T08:52:46.4540949Z .b8 3 // DW_AT_name 2026-02-21T08:52:46.4541036Z .b8 8 // DW_FORM_string 2026-02-21T08:52:46.4541127Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:52:46.4541209Z .b8 6 // DW_FORM_data4 2026-02-21T08:52:46.4541292Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:52:46.4541374Z .b8 8 // DW_FORM_string 2026-02-21T08:52:46.4541455Z .b8 0 // EOM(1) 2026-02-21T08:52:46.4541527Z .b8 0 // EOM(2) 2026-02-21T08:52:46.4541760Z .b8 2 // Abbreviation Code 2026-02-21T08:52:46.4541864Z .b8 46 // DW_TAG_subprogram 2026-02-21T08:52:46.4541953Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:52:46.4542031Z .b8 3 // DW_AT_name 2026-02-21T08:52:46.4542115Z .b8 8 // DW_FORM_string 2026-02-21T08:52:46.4542200Z .b8 32 // DW_AT_inline 2026-02-21T08:52:46.4542282Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:46.4542358Z .b8 0 // EOM(1) 2026-02-21T08:52:46.4542433Z .b8 0 // EOM(2) 2026-02-21T08:52:46.4542518Z .b8 3 // Abbreviation Code 2026-02-21T08:52:46.4542605Z .b8 46 // DW_TAG_subprogram 2026-02-21T08:52:46.4542697Z .b8 1 // DW_CHILDREN_yes 2026-02-21T08:52:46.4542778Z .b8 17 // DW_AT_low_pc 2026-02-21T08:52:46.4542856Z .b8 1 // 
DW_FORM_addr 2026-02-21T08:52:46.4542947Z .b8 18 // DW_AT_high_pc 2026-02-21T08:52:46.4543023Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:46.4543117Z .b8 49 // DW_AT_abstract_origin 2026-02-21T08:52:46.4543197Z .b8 19 // DW_FORM_ref4 2026-02-21T08:52:46.4543274Z .b8 0 // EOM(1) 2026-02-21T08:52:46.4543357Z .b8 0 // EOM(2) 2026-02-21T08:52:46.4543447Z .b8 4 // Abbreviation Code 2026-02-21T08:52:46.4543557Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T08:52:46.4543637Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:52:46.4543730Z .b8 49 // DW_AT_abstract_origin 2026-02-21T08:52:46.4543814Z .b8 19 // DW_FORM_ref4 2026-02-21T08:52:46.4543894Z .b8 17 // DW_AT_low_pc 2026-02-21T08:52:46.4543973Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:46.4544060Z .b8 18 // DW_AT_high_pc 2026-02-21T08:52:46.4544137Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:46.4544219Z .b8 88 // DW_AT_call_file 2026-02-21T08:52:46.4544297Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:46.4544384Z .b8 89 // DW_AT_call_line 2026-02-21T08:52:46.4544462Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:46.4544548Z .b8 87 // DW_AT_call_column 2026-02-21T08:52:46.4544631Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:46.4544704Z .b8 0 // EOM(1) 2026-02-21T08:52:46.4544775Z .b8 0 // EOM(2) 2026-02-21T08:52:46.4544848Z .b8 0 // EOM(3) 2026-02-21T08:52:46.4545012Z } 2026-02-21T08:52:46.4545081Z .section .debug_info 2026-02-21T08:52:46.4545144Z { 2026-02-21T08:52:46.4545244Z .b32 178 // Length of Unit 2026-02-21T08:52:46.4545337Z .b8 2 // DWARF version number 2026-02-21T08:52:46.4545391Z .b8 0 2026-02-21T08:52:46.4545528Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:52:46.4545625Z .b8 8 // Address Size (in bytes) 2026-02-21T08:52:46.4545742Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T08:52:46.4545835Z .b8 116 // DW_AT_producer 2026-02-21T08:52:46.4545891Z .b8 114 2026-02-21T08:52:46.4545948Z .b8 105 2026-02-21T08:52:46.4546001Z .b8 116 2026-02-21T08:52:46.4546168Z .b8 111 2026-02-21T08:52:46.4546226Z .b8 110 2026-02-21T08:52:46.4546278Z .b8 0 2026-02-21T08:52:46.4546358Z .b8 2 // DW_AT_language 2026-02-21T08:52:46.4546417Z .b8 0 2026-02-21T08:52:46.4546615Z .b8 99 // DW_AT_name 2026-02-21T08:52:46.4546673Z .b8 117 2026-02-21T08:52:46.4546732Z .b8 113 2026-02-21T08:52:46.4546784Z .b8 52 2026-02-21T08:52:46.4546836Z .b8 101 2026-02-21T08:52:46.4546888Z .b8 111 2026-02-21T08:52:46.4546945Z .b8 50 2026-02-21T08:52:46.4546997Z .b8 101 2026-02-21T08:52:46.4547060Z .b8 109 2026-02-21T08:52:46.4547118Z .b8 54 2026-02-21T08:52:46.4547181Z .b8 97 2026-02-21T08:52:46.4547234Z .b8 109 2026-02-21T08:52:46.4547288Z .b8 110 2026-02-21T08:52:46.4547346Z .b8 112 2026-02-21T08:52:46.4547398Z .b8 101 2026-02-21T08:52:46.4547454Z .b8 118 2026-02-21T08:52:46.4547511Z .b8 116 2026-02-21T08:52:46.4547562Z .b8 97 2026-02-21T08:52:46.4547614Z .b8 113 2026-02-21T08:52:46.4547667Z .b8 101 2026-02-21T08:52:46.4547728Z .b8 98 2026-02-21T08:52:46.4547781Z .b8 108 2026-02-21T08:52:46.4547833Z .b8 116 2026-02-21T08:52:46.4547885Z .b8 52 2026-02-21T08:52:46.4547943Z .b8 115 2026-02-21T08:52:46.4548000Z .b8 113 2026-02-21T08:52:46.4548054Z .b8 118 2026-02-21T08:52:46.4548115Z .b8 116 2026-02-21T08:52:46.4548168Z .b8 107 2026-02-21T08:52:46.4548222Z .b8 117 2026-02-21T08:52:46.4548275Z .b8 101 2026-02-21T08:52:46.4548334Z .b8 101 2026-02-21T08:52:46.4548386Z .b8 97 2026-02-21T08:52:46.4548439Z .b8 112 2026-02-21T08:52:46.4548496Z .b8 53 2026-02-21T08:52:46.4548607Z .b8 105 2026-02-21T08:52:46.4548663Z .b8 119 2026-02-21T08:52:46.4548716Z .b8 120 2026-02-21T08:52:46.4548773Z .b8 99 2026-02-21T08:52:46.4548825Z .b8 52 
2026-02-21T08:52:46.4548878Z .b8 113 2026-02-21T08:52:46.4548938Z .b8 101 2026-02-21T08:52:46.4548990Z .b8 51 2026-02-21T08:52:46.4549041Z .b8 98 2026-02-21T08:52:46.4549095Z .b8 98 2026-02-21T08:52:46.4549151Z .b8 53 2026-02-21T08:52:46.4549205Z .b8 103 2026-02-21T08:52:46.4549258Z .b8 121 2026-02-21T08:52:46.4549312Z .b8 53 2026-02-21T08:52:46.4549372Z .b8 122 2026-02-21T08:52:46.4549425Z .b8 100 2026-02-21T08:52:46.4549479Z .b8 117 2026-02-21T08:52:46.4549535Z .b8 46 2026-02-21T08:52:46.4549591Z .b8 112 2026-02-21T08:52:46.4549644Z .b8 121 2026-02-21T08:52:46.4549697Z .b8 0 2026-02-21T08:52:46.4549805Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:52:46.4549889Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:52:46.4549944Z .b8 116 2026-02-21T08:52:46.4550006Z .b8 109 2026-02-21T08:52:46.4550058Z .b8 112 2026-02-21T08:52:46.4550110Z .b8 47 2026-02-21T08:52:46.4550162Z .b8 116 2026-02-21T08:52:46.4550229Z .b8 111 2026-02-21T08:52:46.4550287Z .b8 114 2026-02-21T08:52:46.4550341Z .b8 99 2026-02-21T08:52:46.4550400Z .b8 104 2026-02-21T08:52:46.4550453Z .b8 105 2026-02-21T08:52:46.4550505Z .b8 110 2026-02-21T08:52:46.4550558Z .b8 100 2026-02-21T08:52:46.4550618Z .b8 117 2026-02-21T08:52:46.4550670Z .b8 99 2026-02-21T08:52:46.4550722Z .b8 116 2026-02-21T08:52:46.4550779Z .b8 111 2026-02-21T08:52:46.4550839Z .b8 114 2026-02-21T08:52:46.4550892Z .b8 95 2026-02-21T08:52:46.4550948Z .b8 114 2026-02-21T08:52:46.4551006Z .b8 111 2026-02-21T08:52:46.4551058Z .b8 111 2026-02-21T08:52:46.4551255Z .b8 116 2026-02-21T08:52:46.4551308Z .b8 47 2026-02-21T08:52:46.4551365Z .b8 117 2026-02-21T08:52:46.4551417Z .b8 113 2026-02-21T08:52:46.4551471Z .b8 0 2026-02-21T08:52:46.4551589Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T08:52:46.4551666Z .b8 95 // DW_AT_name 2026-02-21T08:52:46.4551720Z .b8 104 2026-02-21T08:52:46.4551773Z .b8 101 2026-02-21T08:52:46.4551831Z .b8 108 2026-02-21T08:52:46.4551883Z .b8 105 2026-02-21T08:52:46.4551935Z .b8 111 2026-02-21T08:52:46.4551997Z 
.b8 110 2026-02-21T08:52:46.4552048Z .b8 95 2026-02-21T08:52:46.4552100Z .b8 109 2026-02-21T08:52:46.4552151Z .b8 97 2026-02-21T08:52:46.4552210Z .b8 116 2026-02-21T08:52:46.4552274Z .b8 109 2026-02-21T08:52:46.4552328Z .b8 117 2026-02-21T08:52:46.4552508Z .b8 108 2026-02-21T08:52:46.4552563Z .b8 95 2026-02-21T08:52:46.4552616Z .b8 98 2026-02-21T08:52:46.4552669Z .b8 102 2026-02-21T08:52:46.4552725Z .b8 49 2026-02-21T08:52:46.4552783Z .b8 54 2026-02-21T08:52:46.4552835Z .b8 95 2026-02-21T08:52:46.4552887Z .b8 105 2026-02-21T08:52:46.4552944Z .b8 110 2026-02-21T08:52:46.4552996Z .b8 116 2026-02-21T08:52:46.4553048Z .b8 52 2026-02-21T08:52:46.4553104Z .b8 0 2026-02-21T08:52:46.4553183Z .b8 1 // DW_AT_inline 2026-02-21T08:52:46.4553302Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T08:52:46.4553405Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T08:52:46.4553502Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T08:52:46.4553603Z .b32 108 // DW_AT_abstract_origin 2026-02-21T08:52:46.4553734Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T08:52:46.4553837Z .b32 108 // DW_AT_abstract_origin 2026-02-21T08:52:46.4553927Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T08:52:46.4554018Z .b64 $L__tmp8 // DW_AT_high_pc 2026-02-21T08:52:46.4554106Z .b8 1 // DW_AT_call_file 2026-02-21T08:52:46.4554190Z .b8 84 // DW_AT_call_line 2026-02-21T08:52:46.4554275Z .b8 40 // DW_AT_call_column 2026-02-21T08:52:46.4554374Z .b8 0 // End Of Children Mark 2026-02-21T08:52:46.4554463Z .b8 0 // End Of Children Mark 2026-02-21T08:52:46.4554516Z } 2026-02-21T08:52:46.4554589Z .section .debug_macinfo { } 2026-02-21T08:52:46.4554599Z 2026-02-21T08:52:46.4554697Z ================================================================ 2026-02-21T08:52:46.4554818Z please share the reproducer above with Triton project. 
2026-02-21T08:52:47.0685207Z 2026-02-21T08:52:47.0685221Z 2026-02-21T08:52:47.0685226Z 2026-02-21T08:52:47.0685566Z ================================================================ 2026-02-21T08:52:47.0686004Z Internal Triton PTX codegen error 2026-02-21T08:52:47.0686341Z `ptxas` stderr: 2026-02-21T08:52:47.0687264Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 509 in function _helion_matmul_bf16_int4. Try to compile with register target of 46 or higher. 2026-02-21T08:52:47.0688120Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:52:47.0688354Z 2026-02-21T08:52:47.0689009Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp0ofhpa7v.ptx -o /tmp/tmp0ofhpa7v.ptx.o 2026-02-21T08:52:47.0689767Z 2026-02-21T08:52:47.0689772Z 2026-02-21T08:52:47.0689845Z // 2026-02-21T08:52:47.0690037Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:52:47.0690295Z // 2026-02-21T08:52:47.0690397Z 2026-02-21T08:52:47.0690480Z .version 8.7 2026-02-21T08:52:47.0690669Z .target sm_90a 2026-02-21T08:52:47.0690854Z .address_size 64 2026-02-21T08:52:47.0690971Z 2026-02-21T08:52:47.0691205Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T08:52:47.0692063Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:52:47.0692384Z // @_helion_matmul_bf16_int4 2026-02-21T08:52:47.0692715Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T08:52:47.0693080Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T08:52:47.0693516Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T08:52:47.0693941Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T08:52:47.0694372Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T08:52:47.0694805Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 
2026-02-21T08:52:47.0695364Z ) 2026-02-21T08:52:47.0695546Z .reqntid 512 2026-02-21T08:52:47.0695707Z .maxnreg 32 2026-02-21T08:52:47.0695846Z { 2026-02-21T08:52:47.0695980Z .reg .pred %p<71>; 2026-02-21T08:52:47.0696176Z .reg .b16 %rs<569>; 2026-02-21T08:52:47.0696336Z .reg .b32 %r<2809>; 2026-02-21T08:52:47.0696666Z .reg .b64 %rd<247>; 2026-02-21T08:52:47.0696987Z .loc 1 14 0 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:14:0 2026-02-21T08:52:47.0697369Z $L__func_begin0: 2026-02-21T08:52:47.0697684Z .loc 1 14 0 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:14:0 2026-02-21T08:52:47.0697998Z 2026-02-21T08:52:47.0698054Z // %bb.0: 2026-02-21T08:52:47.0698257Z ld.param.b64 %rd35, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T08:52:47.0698563Z ld.param.b64 %rd34, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T08:52:47.0698868Z ld.param.b64 %rd33, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T08:52:47.0699111Z $L__tmp0: 2026-02-21T08:52:47.0699412Z .loc 1 19 46 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:19:46 2026-02-21T08:52:47.0699793Z mov.u32 %r2708, %ctaid.x; 2026-02-21T08:52:47.0700110Z .loc 1 0 0 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:0 2026-02-21T08:52:47.0700493Z sub.s32 %r307, 4279, %r2708; 2026-02-21T08:52:47.0700846Z .loc 1 19 144 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:19:144 2026-02-21T08:52:47.0701233Z mul.hi.u32 %r308, %r307, 1041204193; 2026-02-21T08:52:47.0701435Z shr.u32 %r309, %r308, 10; 2026-02-21T08:52:47.0701630Z and.b32 %r310, %r309, 1048572; 2026-02-21T08:52:47.0701839Z mad.lo.s32 %r2789, %r310, 4224, %r2708; 2026-02-21T08:52:47.0702199Z .loc 1 31 45 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:31:45 2026-02-21T08:52:47.0702579Z mov.u32 %r3, %tid.x; 2026-02-21T08:52:47.0702746Z shr.u32 %r4, %r3, 5; 2026-02-21T08:52:47.0702916Z shr.u32 %r311, %r3, 4; 2026-02-21T08:52:47.0703085Z bfe.u32 %r5, %r3, 4, 5; 2026-02-21T08:52:47.0703259Z or.b32 %r6, %r311, 
32; 2026-02-21T08:52:47.0703422Z and.b32 %r312, %r3, 15; 2026-02-21T08:52:47.0703597Z shl.b32 %r7, %r312, 2; 2026-02-21T08:52:47.0703925Z .loc 1 33 45 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:33:45 2026-02-21T08:52:47.0704289Z shl.b32 %r8, %r312, 3; 2026-02-21T08:52:47.0704621Z .loc 1 65 38 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:65:38 2026-02-21T08:52:47.0704997Z and.b32 %r9, %r3, 128; 2026-02-21T08:52:47.0705334Z .loc 1 19 144 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:19:144 2026-02-21T08:52:47.0705726Z setp.ge.s32 %p1, %r2708, %r2789; 2026-02-21T08:52:47.0705925Z shl.b32 %r2694, %r3, 3; 2026-02-21T08:52:47.0706103Z shr.u32 %r2695, %r3, 1; 2026-02-21T08:52:47.0706274Z mov.b32 %r2696, global_smem; 2026-02-21T08:52:47.0706594Z mul.lo.s32 %r2697, %r5, 7168; 2026-02-21T08:52:47.0706799Z mul.lo.s32 %r2698, %r6, 7168; 2026-02-21T08:52:47.0706980Z shl.b32 %r2699, %r3, 6; 2026-02-21T08:52:47.0707149Z shl.b32 %r2700, %r3, 5; 2026-02-21T08:52:47.0707510Z shl.b32 %r2701, %r3, 1; 2026-02-21T08:52:47.0707681Z and.b32 %r2702, %r3, 127; 2026-02-21T08:52:47.0707852Z shl.b32 %r2703, %r3, 4; 2026-02-21T08:52:47.0708018Z and.b32 %r2704, %r4, 12; 2026-02-21T08:52:47.0708182Z and.b32 %r2705, %r3, 3; 2026-02-21T08:52:47.0708351Z and.b32 %r2706, %r3, 24; 2026-02-21T08:52:47.0708623Z shl.b32 %r2707, %r3, 2; 2026-02-21T08:52:47.0708816Z cvt.u64.u32 %rd235, %r7; 2026-02-21T08:52:47.0708995Z setp.eq.b32 %p70, %r9, 0; 2026-02-21T08:52:47.0709168Z @%p1 bra $L__BB0_11; 2026-02-21T08:52:47.0709356Z // %bb.1: // %.lr.ph 2026-02-21T08:52:47.0710016Z [84s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T08:52:47.0711730Z Config: @helion.kernel(config=helion.Config(block_sizes=[32, 64, 128], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[2], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=32, num_stages=7, num_warps=16, pid_type='persistent_interleaved', range_flattens=[False, None], range_multi_buffers=[False, False], range_num_stages=[3, 3], range_unroll_factors=[4, 0], range_warp_specializes=[]), static_shapes=True) 2026-02-21T08:52:47.0713198Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:52:47.0713493Z `ptxas` stderr: 2026-02-21T08:52:47.0714050Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 509 in function _helion_matmul_bf16_int4. Try to compile with register target of 46 or higher. 2026-02-21T08:52:47.0714688Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:52:47.0714870Z 2026-02-21T08:52:47.0715385Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp0ofhpa7v.ptx -o /tmp/tmp0ofhpa7v.ptx.o 2026-02-21T08:52:47.0715976Z 2026-02-21T08:52:47.0716130Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:52:47.0716729Z .loc 1 0 144 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:0:144 2026-02-21T08:52:47.0717106Z and.b32 %r314, %r2694, 4088; 2026-02-21T08:52:47.0717298Z and.b32 %r316, %r2695, 56; 2026-02-21T08:52:47.0717476Z xor.b32 %r10, %r314, %r316; 2026-02-21T08:52:47.0717660Z add.s32 %r318, %r2696, %r10; 2026-02-21T08:52:47.0717834Z add.s32 %r11, %r318, 32768; 2026-02-21T08:52:47.0718009Z add.s32 %r12, %r318, 36864; 2026-02-21T08:52:47.0718184Z add.s32 %r319, %r2696, 49152; 2026-02-21T08:52:47.0718380Z add.s32 %r14, %r319, %r314; 2026-02-21T08:52:47.0718557Z add.s32 %r15, %r318, 40960; 2026-02-21T08:52:47.0718732Z add.s32 %r16, %r318, 45056; 2026-02-21T08:52:47.0718902Z add.s32 %r320, %r2696, %r314; 2026-02-21T08:52:47.0719087Z add.s32 %r18, %r320, 53248; 2026-02-21T08:52:47.0719268Z and.b32 %r322, %r2699, 6144; 2026-02-21T08:52:47.0719453Z and.b32 %r324, %r2700, 896; 2026-02-21T08:52:47.0719633Z and.b32 %r326, %r2701, 62; 2026-02-21T08:52:47.0719805Z or.b32 %r327, %r322, %r324; 2026-02-21T08:52:47.0719986Z or.b32 %r19, %r327, %r326; 2026-02-21T08:52:47.0720158Z xor.b32 %r20, %r19, 8; 2026-02-21T08:52:47.0720329Z xor.b32 %r21, %r19, 16; 2026-02-21T08:52:47.0720497Z xor.b32 %r22, %r19, 24; 2026-02-21T08:52:47.0720669Z xor.b32 %r23, %r19, 32; 2026-02-21T08:52:47.0720831Z xor.b32 %r24, %r19, 40; 2026-02-21T08:52:47.0720999Z xor.b32 %r25, %r19, 48; 2026-02-21T08:52:47.0721162Z xor.b32 %r26, %r19, 56; 2026-02-21T08:52:47.0721325Z and.b32 %r329, %r2695, 128; 2026-02-21T08:52:47.0721506Z add.s32 %r330, %r319, %r329; 2026-02-21T08:52:47.0721681Z add.s32 %r27, %r330, %r2702; 2026-02-21T08:52:47.0721863Z shl.b32 %r331, %r2702, 7; 2026-02-21T08:52:47.0722030Z and.b32 %r333, %r2703, 112; 2026-02-21T08:52:47.0722204Z or.b32 %r335, %r331, %r2704; 2026-02-21T08:52:47.0722377Z or.b32 %r336, %r335, %r333; 2026-02-21T08:52:47.0722554Z add.s32 %r28, %r2696, %r336; 2026-02-21T08:52:47.0722729Z xor.b32 %r337, %r336, 16; 2026-02-21T08:52:47.0723067Z add.s32 
%r29, %r2696, %r337; 2026-02-21T08:52:47.0723244Z xor.b32 %r338, %r336, 32; 2026-02-21T08:52:47.0723408Z add.s32 %r30, %r2696, %r338; 2026-02-21T08:52:47.0723581Z xor.b32 %r339, %r336, 48; 2026-02-21T08:52:47.0723743Z add.s32 %r31, %r2696, %r339; 2026-02-21T08:52:47.0723915Z xor.b32 %r340, %r336, 64; 2026-02-21T08:52:47.0724077Z add.s32 %r32, %r2696, %r340; 2026-02-21T08:52:47.0724251Z xor.b32 %r341, %r336, 80; 2026-02-21T08:52:47.0724416Z add.s32 %r33, %r2696, %r341; 2026-02-21T08:52:47.0724587Z xor.b32 %r342, %r336, 96; 2026-02-21T08:52:47.0724756Z add.s32 %r34, %r2696, %r342; 2026-02-21T08:52:47.0724924Z xor.b32 %r343, %r336, 112; 2026-02-21T08:52:47.0725105Z add.s32 %r35, %r2696, %r343; 2026-02-21T08:52:47.0725274Z shl.b32 %r345, %r2705, 12; 2026-02-21T08:52:47.0725599Z and.b32 %r346, %r2700, 3168; 2026-02-21T08:52:47.0725785Z shl.b32 %r348, %r2706, 4; 2026-02-21T08:52:47.0725963Z and.b32 %r349, %r3, 384; 2026-02-21T08:52:47.0726129Z shr.u32 %r350, %r349, 2; 2026-02-21T08:52:47.0726306Z and.b32 %r352, %r2707, 16; 2026-02-21T08:52:47.0726608Z or.b32 %r353, %r346, %r348; 2026-02-21T08:52:47.0726806Z xor.b32 %r354, %r353, %r350; 2026-02-21T08:52:47.0726987Z add.s32 %r355, %r2696, %r345; 2026-02-21T08:52:47.0727165Z add.s32 %r356, %r355, %r352; 2026-02-21T08:52:47.0727343Z add.s32 %r36, %r356, %r354; 2026-02-21T08:52:47.0727518Z shl.b32 %r357, %r2706, 9; 2026-02-21T08:52:47.0727690Z shl.b32 %r358, %r2705, 5; 2026-02-21T08:52:47.0727851Z and.b32 %r359, %r2707, 2032; 2026-02-21T08:52:47.0728026Z or.b32 %r360, %r357, %r358; 2026-02-21T08:52:47.0728195Z xor.b32 %r361, %r360, %r359; 2026-02-21T08:52:47.0728371Z add.s32 %r804, %r2696, %r361; 2026-02-21T08:52:47.0728546Z add.s32 %r809, %r804, 2048; 2026-02-21T08:52:47.0728721Z shr.u32 %r362, %r349, 5; 2026-02-21T08:52:47.0728892Z or.b32 %r363, %r331, %r362; 2026-02-21T08:52:47.0729062Z or.b32 %r364, %r363, %r333; 2026-02-21T08:52:47.0729240Z add.s32 %r39, %r2696, %r364; 2026-02-21T08:52:47.0729428Z xor.b32 %r365, %r364, 
16; 2026-02-21T08:52:47.0729602Z add.s32 %r40, %r2696, %r365; 2026-02-21T08:52:47.0729775Z xor.b32 %r366, %r364, 32; 2026-02-21T08:52:47.0729947Z add.s32 %r41, %r2696, %r366; 2026-02-21T08:52:47.0730117Z xor.b32 %r367, %r364, 48; 2026-02-21T08:52:47.0730289Z add.s32 %r42, %r2696, %r367; 2026-02-21T08:52:47.0730468Z xor.b32 %r368, %r364, 64; 2026-02-21T08:52:47.0730636Z add.s32 %r43, %r2696, %r368; 2026-02-21T08:52:47.0730813Z xor.b32 %r369, %r364, 80; 2026-02-21T08:52:47.0730979Z add.s32 %r44, %r2696, %r369; 2026-02-21T08:52:47.0731159Z xor.b32 %r370, %r364, 96; 2026-02-21T08:52:47.0731323Z add.s32 %r45, %r2696, %r370; 2026-02-21T08:52:47.0731498Z xor.b32 %r371, %r364, 112; 2026-02-21T08:52:47.0731670Z add.s32 %r46, %r2696, %r371; 2026-02-21T08:52:47.0732007Z .loc 1 40 71 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:40:71 2026-02-21T08:52:47.0732388Z or.b32 %r372, %r2697, %r8; 2026-02-21T08:52:47.0732562Z add.s32 %r47, %r372, 458752; 2026-02-21T08:52:47.0732899Z .loc 1 19 144 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:19:144 2026-02-21T08:52:47.0733267Z mad.wide.u32 %rd36, %r312, 8, %rd33; 2026-02-21T08:52:47.0733473Z add.s64 %rd1, %rd36, 256; 2026-02-21T08:52:47.0733641Z shl.b32 %r48, %r5, 13; 2026-02-21T08:52:47.0733808Z shl.b32 %r374, %r6, 13; 2026-02-21T08:52:47.0734118Z .loc 1 40 71 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:40:71 2026-02-21T08:52:47.0734472Z or.b32 %r375, %r374, %r7; 2026-02-21T08:52:47.0734642Z or.b32 %r49, %r375, 128; 2026-02-21T08:52:47.0734853Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T08:52:47.0735144Z // Child Loop BB0_3 Depth 2 2026-02-21T08:52:47.0735422Z // Child Loop BB0_5 Depth 2 2026-02-21T08:52:47.0735701Z // Child Loop BB0_7 Depth 2 2026-02-21T08:52:47.0736113Z // Child Loop BB0_9 Depth 2 2026-02-21T08:52:47.0736629Z .loc 1 25 35 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:25:35 2026-02-21T08:52:47.0737012Z mul.hi.s32 %r391, %r2708, -1840700269; 
2026-02-21T08:52:47.0737221Z add.s32 %r392, %r391, %r2708; 2026-02-21T08:52:47.0737414Z shr.u32 %r393, %r392, 31; 2026-02-21T08:52:47.0737599Z shr.s32 %r394, %r392, 6; 2026-02-21T08:52:47.0737780Z add.s32 %r395, %r394, %r393; 2026-02-21T08:52:47.0738102Z .loc 1 26 33 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:26:33 2026-02-21T08:52:47.0738462Z shl.b32 %r396, %r395, 1; 2026-02-21T08:52:47.0738918Z .loc 1 27 39 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:27:39 2026-02-21T08:52:47.0739281Z sub.s32 %r397, 1, %r396; 2026-02-21T08:52:47.0739594Z .loc 1 27 52 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:27:52 2026-02-21T08:52:47.0739945Z min.s32 %r398, %r397, 2; 2026-02-21T08:52:47.0740251Z .loc 1 28 45 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:28:45 2026-02-21T08:52:47.0740599Z mul.lo.s32 %r399, %r395, 112; 2026-02-21T08:52:47.0740786Z sub.s32 %r400, %r2708, %r399; 2026-02-21T08:52:47.0741097Z .loc 1 29 51 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:29:51 2026-02-21T08:52:47.0741465Z div.s32 %r401, %r400, %r398; 2026-02-21T08:52:47.0741787Z .loc 1 28 64 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:28:64 2026-02-21T08:52:47.0742140Z mul.lo.s32 %r402, %r401, %r398; 2026-02-21T08:52:47.0742336Z sub.s32 %r403, %r400, %r402; 2026-02-21T08:52:47.0742650Z .loc 1 28 30 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:28:30 2026-02-21T08:52:47.0743005Z add.s32 %r404, %r403, %r396; 2026-02-21T08:52:47.0743323Z .loc 1 30 27 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:30:27 2026-02-21T08:52:47.0743686Z shl.b32 %r405, %r404, 6; 2026-02-21T08:52:47.0743996Z .loc 1 31 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:31:32 2026-02-21T08:52:47.0744344Z or.b32 %r82, %r405, %r5; 2026-02-21T08:52:47.0744519Z or.b32 %r83, %r405, %r6; 2026-02-21T08:52:47.0744827Z .loc 1 32 27 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:32:27 
2026-02-21T08:52:47.0745182Z shl.b32 %r406, %r401, 7; 2026-02-21T08:52:47.0745494Z .loc 1 33 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:33:32 2026-02-21T08:52:47.0745843Z or.b32 %r84, %r406, %r8; 2026-02-21T08:52:47.0746166Z .loc 1 48 53 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:53 2026-02-21T08:52:47.0746657Z shl.b32 %r407, %r82, 13; 2026-02-21T08:52:47.0746839Z shl.b32 %r408, %r83, 13; 2026-02-21T08:52:47.0747157Z .loc 1 48 60 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:60 2026-02-21T08:52:47.0747518Z or.b32 %r409, %r407, %r7; 2026-02-21T08:52:47.0747692Z or.b32 %r410, %r408, %r7; 2026-02-21T08:52:47.0748010Z .loc 1 48 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:32 2026-02-21T08:52:47.0748376Z mad.wide.s32 %rd37, %r409, 2, %rd33; 2026-02-21T08:52:47.0748665Z mad.wide.s32 %rd38, %r410, 2, %rd33; 2026-02-21T08:52:47.0748864Z mov.b32 %r377, 8; 2026-02-21T08:52:47.0749158Z .loc 1 48 80 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:80 2026-02-21T08:52:47.0749507Z // begin inline asm 2026-02-21T08:52:47.0749751Z cp.async.ca.shared.global [ %r11 + 0 ], [ %rd37 + 0 ], 0x8, %r377; 2026-02-21T08:52:47.0750040Z // end inline asm 2026-02-21T08:52:47.0750195Z // begin inline asm 2026-02-21T08:52:47.0750417Z cp.async.ca.shared.global [ %r12 + 0 ], [ %rd38 + 0 ], 0x8, %r377; 2026-02-21T08:52:47.0750862Z // end inline asm 2026-02-21T08:52:47.0751017Z cp.async.commit_group; 2026-02-21T08:52:47.0751338Z .loc 1 54 62 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:62 2026-02-21T08:52:47.0751693Z add.s32 %r411, %r84, %r2697; 2026-02-21T08:52:47.0752015Z .loc 1 54 34 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:34 2026-02-21T08:52:47.0752378Z cvt.s64.s32 %rd44, %r411; 2026-02-21T08:52:47.0752569Z add.s64 %rd39, %rd34, %rd44; 2026-02-21T08:52:47.0752905Z .loc 1 54 87 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:87 2026-02-21T08:52:47.0753264Z // 
begin inline asm 2026-02-21T08:52:47.0753502Z cp.async.ca.shared.global [ %r14 + 0 ], [ %rd39 + 0 ], 0x8, %r377; 2026-02-21T08:52:47.0753902Z // end inline asm 2026-02-21T08:52:47.0754072Z cp.async.commit_group; 2026-02-21T08:52:47.0754383Z .loc 1 48 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:32 2026-02-21T08:52:47.0754746Z cvt.s64.s32 %rd45, %r407; 2026-02-21T08:52:47.0754930Z or.b64 %rd46, %rd45, %rd235; 2026-02-21T08:52:47.0755122Z shl.b64 %rd47, %rd46, 1; 2026-02-21T08:52:47.0755304Z add.s64 %rd48, %rd33, %rd47; 2026-02-21T08:52:47.0755494Z add.s64 %rd40, %rd48, 128; 2026-02-21T08:52:47.0755676Z cvt.s64.s32 %rd49, %r408; 2026-02-21T08:52:47.0755848Z or.b64 %rd50, %rd49, %rd235; 2026-02-21T08:52:47.0756038Z shl.b64 %rd51, %rd50, 1; 2026-02-21T08:52:47.0756209Z add.s64 %rd52, %rd33, %rd51; 2026-02-21T08:52:47.0756398Z add.s64 %rd41, %rd52, 128; 2026-02-21T08:52:47.0756848Z .loc 1 48 80 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:80 2026-02-21T08:52:47.0757205Z bar.sync 0; 2026-02-21T08:52:47.0757365Z // begin inline asm 2026-02-21T08:52:47.0757594Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd40 + 0 ], 0x8, %r377; 2026-02-21T08:52:47.0757870Z // end inline asm 2026-02-21T08:52:47.0758020Z // begin inline asm 2026-02-21T08:52:47.0758253Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd41 + 0 ], 0x8, %r377; 2026-02-21T08:52:47.0758519Z // end inline asm 2026-02-21T08:52:47.0758683Z cp.async.commit_group; 2026-02-21T08:52:47.0759007Z .loc 1 54 62 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:62 2026-02-21T08:52:47.0759367Z add.s32 %r412, %r84, %r2698; 2026-02-21T08:52:47.0759690Z .loc 1 54 34 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:34 2026-02-21T08:52:47.0760037Z cvt.s64.s32 %rd53, %r412; 2026-02-21T08:52:47.0760215Z add.s64 %rd42, %rd34, %rd53; 2026-02-21T08:52:47.0760545Z .loc 1 54 87 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:87 2026-02-21T08:52:47.0760900Z // begin 
inline asm 2026-02-21T08:52:47.0761122Z cp.async.ca.shared.global [ %r18 + 0 ], [ %rd42 + 0 ], 0x8, %r377; 2026-02-21T08:52:47.0761396Z // end inline asm 2026-02-21T08:52:47.0761556Z cp.async.commit_group; 2026-02-21T08:52:47.0761864Z .loc 1 40 71 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:40:71 2026-02-21T08:52:47.0762216Z add.s32 %r2710, %r47, %r406; 2026-02-21T08:52:47.0762394Z shl.b32 %r413, %r404, 19; 2026-02-21T08:52:47.0762572Z or.b32 %r414, %r48, %r413; 2026-02-21T08:52:47.0762755Z mad.wide.s32 %rd236, %r414, 2, %rd1; 2026-02-21T08:52:47.0762964Z or.b32 %r2709, %r49, %r413; 2026-02-21T08:52:47.0763141Z mov.b32 %r2713, 0f00000000; 2026-02-21T08:52:47.0763320Z mov.b32 %r2712, 1; 2026-02-21T08:52:47.0763486Z mov.b32 %r2711, -1; 2026-02-21T08:52:47.0763644Z mov.b64 %rd237, -32; 2026-02-21T08:52:47.0763817Z mov.b32 %r2714, %r2713; 2026-02-21T08:52:47.0763992Z mov.b32 %r2715, %r2713; 2026-02-21T08:52:47.0764160Z mov.b32 %r2716, %r2713; 2026-02-21T08:52:47.0764325Z mov.b32 %r2717, %r2713; 2026-02-21T08:52:47.0764491Z mov.b32 %r2718, %r2713; 2026-02-21T08:52:47.0764655Z mov.b32 %r2719, %r2713; 2026-02-21T08:52:47.0764821Z mov.b32 %r2720, %r2713; 2026-02-21T08:52:47.0765135Z mov.b32 %r2721, %r2713; 2026-02-21T08:52:47.0765302Z mov.b32 %r2722, %r2713; 2026-02-21T08:52:47.0765467Z mov.b32 %r2723, %r2713; 2026-02-21T08:52:47.0765627Z mov.b32 %r2724, %r2713; 2026-02-21T08:52:47.0765792Z mov.b32 %r2725, %r2713; 2026-02-21T08:52:47.0765957Z mov.b32 %r2726, %r2713; 2026-02-21T08:52:47.0766125Z mov.b32 %r2727, %r2713; 2026-02-21T08:52:47.0766283Z mov.b32 %r2728, %r2713; 2026-02-21T08:52:47.0766633Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:47.0766933Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:47.0767191Z add.s64 %rd237, %rd237, 32; 2026-02-21T08:52:47.0767387Z setp.lt.u64 %p11, %rd237, 4032; 2026-02-21T08:52:47.0767724Z add.s32 %r747, %r2711, 1; 2026-02-21T08:52:47.0767919Z setp.gt.s32 %p12, %r747, 1; 2026-02-21T08:52:47.0768105Z 
selp.b32 %r2711, 0, %r747, %p12; 2026-02-21T08:52:47.0768448Z .loc 1 48 80 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:80 2026-02-21T08:52:47.0768817Z cp.async.wait_group 2; 2026-02-21T08:52:47.0768996Z bar.sync 0; 2026-02-21T08:52:47.0769150Z shl.b32 %r748, %r2711, 12; 2026-02-21T08:52:47.0769332Z shl.b32 %r749, %r2711, 13; 2026-02-21T08:52:47.0769507Z add.s32 %r750, %r2696, 32768; 2026-02-21T08:52:47.0769697Z add.s32 %r751, %r750, %r749; 2026-02-21T08:52:47.0770018Z .loc 1 52 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:52:32 2026-02-21T08:52:47.0770374Z add.s32 %r752, %r751, %r19; 2026-02-21T08:52:47.0770558Z ld.shared.b16 %rs1, [%r752]; 2026-02-21T08:52:47.0770751Z ld.shared.b16 %rs2, [%r752+1024]; 2026-02-21T08:52:47.0770967Z ld.shared.b16 %rs3, [%r752+64]; 2026-02-21T08:52:47.0771161Z ld.shared.b16 %rs4, [%r752+1088]; 2026-02-21T08:52:47.0771356Z add.s32 %r753, %r751, %r20; 2026-02-21T08:52:47.0771532Z ld.shared.b16 %rs5, [%r753]; 2026-02-21T08:52:47.0771718Z ld.shared.b16 %rs6, [%r753+1024]; 2026-02-21T08:52:47.0771917Z ld.shared.b16 %rs7, [%r753+64]; 2026-02-21T08:52:47.0772104Z ld.shared.b16 %rs8, [%r753+1088]; 2026-02-21T08:52:47.0772298Z add.s32 %r754, %r751, %r21; 2026-02-21T08:52:47.0772474Z ld.shared.b16 %rs9, [%r754]; 2026-02-21T08:52:47.0772663Z ld.shared.b16 %rs10, [%r754+1024]; 2026-02-21T08:52:47.0772870Z ld.shared.b16 %rs11, [%r754+64]; 2026-02-21T08:52:47.0773078Z ld.shared.b16 %rs12, [%r754+1088]; 2026-02-21T08:52:47.0773272Z add.s32 %r755, %r751, %r22; 2026-02-21T08:52:47.0773471Z ld.shared.b16 %rs13, [%r755]; 2026-02-21T08:52:47.0773665Z ld.shared.b16 %rs14, [%r755+1024]; 2026-02-21T08:52:47.0773858Z ld.shared.b16 %rs15, [%r755+64]; 2026-02-21T08:52:47.0774054Z ld.shared.b16 %rs16, [%r755+1088]; 2026-02-21T08:52:47.0774242Z add.s32 %r756, %r751, %r23; 2026-02-21T08:52:47.0774425Z ld.shared.b16 %rs17, [%r756]; 2026-02-21T08:52:47.0774616Z ld.shared.b16 %rs18, [%r756+1024]; 2026-02-21T08:52:47.0774817Z 
ld.shared.b16 %rs19, [%r756+64]; 2026-02-21T08:52:47.0775010Z ld.shared.b16 %rs20, [%r756+1088]; 2026-02-21T08:52:47.0775210Z add.s32 %r757, %r751, %r24; 2026-02-21T08:52:47.0775399Z ld.shared.b16 %rs21, [%r757]; 2026-02-21T08:52:47.0775580Z ld.shared.b16 %rs22, [%r757+1024]; 2026-02-21T08:52:47.0775778Z ld.shared.b16 %rs23, [%r757+64]; 2026-02-21T08:52:47.0775966Z ld.shared.b16 %rs24, [%r757+1088]; 2026-02-21T08:52:47.0776159Z add.s32 %r758, %r751, %r25; 2026-02-21T08:52:47.0776333Z ld.shared.b16 %rs25, [%r758]; 2026-02-21T08:52:47.0776651Z ld.shared.b16 %rs26, [%r758+1024]; 2026-02-21T08:52:47.0776846Z ld.shared.b16 %rs27, [%r758+64]; 2026-02-21T08:52:47.0777042Z ld.shared.b16 %rs28, [%r758+1088]; 2026-02-21T08:52:47.0777232Z add.s32 %r759, %r751, %r26; 2026-02-21T08:52:47.0777416Z ld.shared.b16 %rs29, [%r759]; 2026-02-21T08:52:47.0777606Z ld.shared.b16 %rs30, [%r759+1024]; 2026-02-21T08:52:47.0777798Z ld.shared.b16 %rs31, [%r759+64]; 2026-02-21T08:52:47.0777997Z ld.shared.b16 %rs32, [%r759+1088]; 2026-02-21T08:52:47.0778338Z cvt.f32.bf16 %r447, %rs1; 2026-02-21T08:52:47.0778520Z cvt.f32.bf16 %r448, %rs2; 2026-02-21T08:52:47.0778690Z cvt.f32.bf16 %r449, %rs5; 2026-02-21T08:52:47.0778866Z cvt.f32.bf16 %r450, %rs6; 2026-02-21T08:52:47.0779035Z cvt.f32.bf16 %r483, %rs9; 2026-02-21T08:52:47.0779216Z cvt.f32.bf16 %r484, %rs10; 2026-02-21T08:52:47.0779394Z cvt.f32.bf16 %r485, %rs13; 2026-02-21T08:52:47.0779574Z cvt.f32.bf16 %r486, %rs14; 2026-02-21T08:52:47.0779752Z cvt.f32.bf16 %r519, %rs17; 2026-02-21T08:52:47.0779923Z cvt.f32.bf16 %r520, %rs18; 2026-02-21T08:52:47.0780098Z cvt.f32.bf16 %r521, %rs21; 2026-02-21T08:52:47.0780266Z cvt.f32.bf16 %r522, %rs22; 2026-02-21T08:52:47.0780442Z cvt.f32.bf16 %r555, %rs25; 2026-02-21T08:52:47.0780613Z cvt.f32.bf16 %r556, %rs26; 2026-02-21T08:52:47.0780913Z cvt.f32.bf16 %r557, %rs29; 2026-02-21T08:52:47.0781086Z cvt.f32.bf16 %r558, %rs30; 2026-02-21T08:52:47.0781259Z cvt.f32.bf16 %r591, %rs3; 2026-02-21T08:52:47.0781435Z 
cvt.f32.bf16 %r592, %rs4; 2026-02-21T08:52:47.0781608Z cvt.f32.bf16 %r593, %rs7; 2026-02-21T08:52:47.0781779Z cvt.f32.bf16 %r594, %rs8; 2026-02-21T08:52:47.0781948Z cvt.f32.bf16 %r627, %rs11; 2026-02-21T08:52:47.0782121Z cvt.f32.bf16 %r628, %rs12; 2026-02-21T08:52:47.0782288Z cvt.f32.bf16 %r629, %rs15; 2026-02-21T08:52:47.0782475Z cvt.f32.bf16 %r630, %rs16; 2026-02-21T08:52:47.0782645Z cvt.f32.bf16 %r663, %rs19; 2026-02-21T08:52:47.0782817Z cvt.f32.bf16 %r664, %rs20; 2026-02-21T08:52:47.0782984Z cvt.f32.bf16 %r665, %rs23; 2026-02-21T08:52:47.0783159Z cvt.f32.bf16 %r666, %rs24; 2026-02-21T08:52:47.0783332Z cvt.f32.bf16 %r699, %rs27; 2026-02-21T08:52:47.0783499Z cvt.f32.bf16 %r700, %rs28; 2026-02-21T08:52:47.0783687Z cvt.f32.bf16 %r701, %rs31; 2026-02-21T08:52:47.0783860Z cvt.f32.bf16 %r702, %rs32; 2026-02-21T08:52:47.0784193Z .loc 1 67 45 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:67:45 2026-02-21T08:52:47.0784557Z add.s32 %r760, %r27, %r748; 2026-02-21T08:52:47.0784755Z ld.shared.b8 %rs33, [%r760]; 2026-02-21T08:52:47.0784940Z ld.shared.b8 %rs34, [%r760+256]; 2026-02-21T08:52:47.0785141Z ld.shared.b8 %rs35, [%r760+512]; 2026-02-21T08:52:47.0785342Z ld.shared.b8 %rs36, [%r760+768]; 2026-02-21T08:52:47.0785531Z ld.shared.b8 %rs37, [%r760+1024]; 2026-02-21T08:52:47.0785731Z ld.shared.b8 %rs38, [%r760+1280]; 2026-02-21T08:52:47.0785923Z ld.shared.b8 %rs39, [%r760+1536]; 2026-02-21T08:52:47.0786121Z ld.shared.b8 %rs40, [%r760+1792]; 2026-02-21T08:52:47.0786311Z ld.shared.b8 %rs41, [%r760+2048]; 2026-02-21T08:52:47.0786644Z ld.shared.b8 %rs42, [%r760+2304]; 2026-02-21T08:52:47.0786834Z ld.shared.b8 %rs43, [%r760+2560]; 2026-02-21T08:52:47.0787029Z ld.shared.b8 %rs44, [%r760+2816]; 2026-02-21T08:52:47.0787219Z ld.shared.b8 %rs45, [%r760+3072]; 2026-02-21T08:52:47.0787426Z ld.shared.b8 %rs46, [%r760+3328]; 2026-02-21T08:52:47.0787630Z ld.shared.b8 %rs47, [%r760+3584]; 2026-02-21T08:52:47.0787820Z ld.shared.b8 %rs48, [%r760+3840]; 
2026-02-21T08:52:47.0788168Z .loc 1 57 28 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:57:28 2026-02-21T08:52:47.0788615Z shl.b16 %rs49, %rs33, 4; 2026-02-21T08:52:47.0788806Z shl.b16 %rs50, %rs34, 4; 2026-02-21T08:52:47.0788979Z shl.b16 %rs51, %rs35, 4; 2026-02-21T08:52:47.0789155Z shl.b16 %rs52, %rs36, 4; 2026-02-21T08:52:47.0789325Z shl.b16 %rs53, %rs37, 4; 2026-02-21T08:52:47.0789499Z shl.b16 %rs54, %rs38, 4; 2026-02-21T08:52:47.0789672Z shl.b16 %rs55, %rs39, 4; 2026-02-21T08:52:47.0789834Z shl.b16 %rs56, %rs40, 4; 2026-02-21T08:52:47.0790035Z shl.b16 %rs57, %rs41, 4; 2026-02-21T08:52:47.0790221Z shl.b16 %rs58, %rs42, 4; 2026-02-21T08:52:47.0790403Z shl.b16 %rs59, %rs43, 4; 2026-02-21T08:52:47.0790565Z shl.b16 %rs60, %rs44, 4; 2026-02-21T08:52:47.0790747Z shl.b16 %rs61, %rs45, 4; 2026-02-21T08:52:47.0790926Z shl.b16 %rs62, %rs46, 4; 2026-02-21T08:52:47.0791100Z shl.b16 %rs63, %rs47, 4; 2026-02-21T08:52:47.0791264Z shl.b16 %rs64, %rs48, 4; 2026-02-21T08:52:47.0791729Z .loc 1 72 58 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:72:58 2026-02-21T08:52:47.0792100Z selp.b16 %rs65, %rs49, %rs33, %p70; 2026-02-21T08:52:47.0792305Z cvt.s16.s8 %rs66, %rs65; 2026-02-21T08:52:47.0792479Z shr.s16 %rs67, %rs66, 4; 2026-02-21T08:52:47.0792654Z selp.b16 %rs68, %rs50, %rs34, %p70; 2026-02-21T08:52:47.0792864Z cvt.s16.s8 %rs69, %rs68; 2026-02-21T08:52:47.0793026Z shr.s16 %rs70, %rs69, 4; 2026-02-21T08:52:47.0793205Z selp.b16 %rs71, %rs51, %rs35, %p70; 2026-02-21T08:52:47.0793399Z cvt.s16.s8 %rs72, %rs71; 2026-02-21T08:52:47.0793573Z shr.s16 %rs73, %rs72, 4; 2026-02-21T08:52:47.0793761Z selp.b16 %rs74, %rs52, %rs36, %p70; 2026-02-21T08:52:47.0793959Z cvt.s16.s8 %rs75, %rs74; 2026-02-21T08:52:47.0794284Z shr.s16 %rs76, %rs75, 4; 2026-02-21T08:52:47.0794477Z selp.b16 %rs77, %rs53, %rs37, %p70; 2026-02-21T08:52:47.0794684Z cvt.s16.s8 %rs78, %rs77; 2026-02-21T08:52:47.0794887Z shr.s16 %rs79, %rs78, 4; 2026-02-21T08:52:47.0795081Z selp.b16 %rs80, %rs54, 
%rs38, %p70; 2026-02-21T08:52:47.0795275Z cvt.s16.s8 %rs81, %rs80; 2026-02-21T08:52:47.0795450Z shr.s16 %rs82, %rs81, 4; 2026-02-21T08:52:47.0795628Z selp.b16 %rs83, %rs55, %rs39, %p70; 2026-02-21T08:52:47.0795823Z cvt.s16.s8 %rs84, %rs83; 2026-02-21T08:52:47.0795998Z shr.s16 %rs85, %rs84, 4; 2026-02-21T08:52:47.0826016Z selp.b16 %rs86, %rs56, %rs40, %p70; 2026-02-21T08:52:47.0826335Z cvt.s16.s8 %rs87, %rs86; 2026-02-21T08:52:47.0826700Z shr.s16 %rs88, %rs87, 4; 2026-02-21T08:52:47.0826926Z selp.b16 %rs89, %rs57, %rs41, %p70; 2026-02-21T08:52:47.0827166Z cvt.s16.s8 %rs90, %rs89; 2026-02-21T08:52:47.0827404Z shr.s16 %rs91, %rs90, 4; 2026-02-21T08:52:47.0827621Z selp.b16 %rs92, %rs58, %rs42, %p70; 2026-02-21T08:52:47.0827849Z cvt.s16.s8 %rs93, %rs92; 2026-02-21T08:52:47.0828056Z shr.s16 %rs94, %rs93, 4; 2026-02-21T08:52:47.0828246Z selp.b16 %rs95, %rs59, %rs43, %p70; 2026-02-21T08:52:47.0828463Z cvt.s16.s8 %rs96, %rs95; 2026-02-21T08:52:47.0828720Z shr.s16 %rs97, %rs96, 4; 2026-02-21T08:52:47.0828966Z selp.b16 %rs98, %rs60, %rs44, %p70; 2026-02-21T08:52:47.0829203Z cvt.s16.s8 %rs99, %rs98; 2026-02-21T08:52:47.0829428Z shr.s16 %rs100, %rs99, 4; 2026-02-21T08:52:47.0829628Z selp.b16 %rs101, %rs61, %rs45, %p70; 2026-02-21T08:52:47.0829852Z cvt.s16.s8 %rs102, %rs101; 2026-02-21T08:52:47.0830054Z shr.s16 %rs103, %rs102, 4; 2026-02-21T08:52:47.0830256Z selp.b16 %rs104, %rs62, %rs46, %p70; 2026-02-21T08:52:47.0830468Z cvt.s16.s8 %rs105, %rs104; 2026-02-21T08:52:47.0830672Z shr.s16 %rs106, %rs105, 4; 2026-02-21T08:52:47.0830863Z selp.b16 %rs107, %rs63, %rs47, %p70; 2026-02-21T08:52:47.0831073Z cvt.s16.s8 %rs108, %rs107; 2026-02-21T08:52:47.0831258Z shr.s16 %rs109, %rs108, 4; 2026-02-21T08:52:47.0831448Z selp.b16 %rs110, %rs64, %rs48, %p70; 2026-02-21T08:52:47.0831658Z cvt.s16.s8 %rs111, %rs110; 2026-02-21T08:52:47.0831832Z shr.s16 %rs112, %rs111, 4; 2026-02-21T08:52:47.0832177Z .loc 1 77 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:77:32 
2026-02-21T08:52:47.0832571Z cvt.rn.f32.s16 %r761, %rs67; 2026-02-21T08:52:47.0832766Z cvt.rn.f32.s16 %r762, %rs70; 2026-02-21T08:52:47.0832962Z cvt.rn.f32.s16 %r763, %rs73; 2026-02-21T08:52:47.0833141Z cvt.rn.f32.s16 %r764, %rs76; 2026-02-21T08:52:47.0833331Z cvt.rn.f32.s16 %r765, %rs79; 2026-02-21T08:52:47.0833507Z cvt.rn.f32.s16 %r766, %rs82; 2026-02-21T08:52:47.0833690Z cvt.rn.f32.s16 %r767, %rs85; 2026-02-21T08:52:47.0833869Z cvt.rn.f32.s16 %r768, %rs88; 2026-02-21T08:52:47.0834058Z cvt.rn.f32.s16 %r769, %rs91; 2026-02-21T08:52:47.0834240Z cvt.rn.f32.s16 %r770, %rs94; 2026-02-21T08:52:47.0834423Z cvt.rn.f32.s16 %r771, %rs97; 2026-02-21T08:52:47.0834613Z cvt.rn.f32.s16 %r772, %rs100; 2026-02-21T08:52:47.0834805Z cvt.rn.f32.s16 %r773, %rs103; 2026-02-21T08:52:47.0834994Z cvt.rn.f32.s16 %r774, %rs106; 2026-02-21T08:52:47.0835176Z cvt.rn.f32.s16 %r775, %rs109; 2026-02-21T08:52:47.0835370Z cvt.rn.f32.s16 %r776, %rs112; 2026-02-21T08:52:47.0835814Z st.shared.b32 [%r28], %r761; 2026-02-21T08:52:47.0836032Z st.shared.b32 [%r28+16384], %r769; 2026-02-21T08:52:47.0836240Z st.shared.b32 [%r29], %r762; 2026-02-21T08:52:47.0836442Z st.shared.b32 [%r29+16384], %r770; 2026-02-21T08:52:47.0836806Z st.shared.b32 [%r30], %r763; 2026-02-21T08:52:47.0836997Z st.shared.b32 [%r30+16384], %r771; 2026-02-21T08:52:47.0837200Z st.shared.b32 [%r31], %r764; 2026-02-21T08:52:47.0837399Z st.shared.b32 [%r31+16384], %r772; 2026-02-21T08:52:47.0837602Z st.shared.b32 [%r32], %r765; 2026-02-21T08:52:47.0837781Z st.shared.b32 [%r32+16384], %r773; 2026-02-21T08:52:47.0837981Z st.shared.b32 [%r33], %r766; 2026-02-21T08:52:47.0838162Z st.shared.b32 [%r33+16384], %r774; 2026-02-21T08:52:47.0838370Z st.shared.b32 [%r34], %r767; 2026-02-21T08:52:47.0838714Z st.shared.b32 [%r34+16384], %r775; 2026-02-21T08:52:47.0838925Z st.shared.b32 [%r35], %r768; 2026-02-21T08:52:47.0839116Z st.shared.b32 [%r35+16384], %r776; 2026-02-21T08:52:47.0839307Z $L__tmp1: 2026-02-21T08:52:47.0839689Z .loc 2 291 36 // 
standard.py:291:36 @[ c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:84:40 ] 2026-02-21T08:52:47.0840121Z // begin inline asm 2026-02-21T08:52:47.0840338Z fence.proxy.async.shared::cta; 2026-02-21T08:52:47.0840538Z // end inline asm 2026-02-21T08:52:47.0840696Z bar.sync 0; 2026-02-21T08:52:47.0840865Z shfl.sync.idx.b32 %r777, %r4, 0, 31, -1; 2026-02-21T08:52:47.0841102Z wgmma.fence.sync.aligned; 2026-02-21T08:52:47.0841304Z shl.b32 %r778, %r777, 10; 2026-02-21T08:52:47.0841483Z and.b32 %r779, %r778, 12288; 2026-02-21T08:52:47.0841670Z add.s32 %r780, %r779, %r2696; 2026-02-21T08:52:47.0841858Z bfe.u32 %r781, %r780, 4, 14; 2026-02-21T08:52:47.0842053Z cvt.u64.u32 %rd65, %r781; 2026-02-21T08:52:47.0842262Z or.b64 %rd54, %rd65, 4611686293372403712; 2026-02-21T08:52:47.0842491Z mov.pred %p2, -1; 2026-02-21T08:52:47.0842662Z // begin inline asm 2026-02-21T08:52:47.0843282Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2713,%r2714,%r2715,%r2716,%r2717,%r2718,%r2719,%r2720,%r2721,%r2722,%r2723,%r2724,%r2725,%r2726,%r2727,%r2728}, {%r447,%r448,%r449,%r450}, %rd54, %p2, 1, 1; 2026-02-21T08:52:47.0843938Z // end inline asm 2026-02-21T08:52:47.0844097Z add.s32 %r782, %r780, 32; 2026-02-21T08:52:47.0844286Z bfe.u32 %r783, %r782, 4, 14; 2026-02-21T08:52:47.0844467Z cvt.u64.u32 %rd66, %r783; 2026-02-21T08:52:47.0844671Z or.b64 %rd55, %rd66, 4611686293372403712; 2026-02-21T08:52:47.0844887Z // begin inline asm 2026-02-21T08:52:47.0845496Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2713,%r2714,%r2715,%r2716,%r2717,%r2718,%r2719,%r2720,%r2721,%r2722,%r2723,%r2724,%r2725,%r2726,%r2727,%r2728}, {%r483,%r484,%r485,%r486}, %rd55, %p2, 1, 1; 2026-02-21T08:52:47.0846140Z // end inline asm 2026-02-21T08:52:47.0846302Z add.s32 %r784, %r780, 64; 2026-02-21T08:52:47.0846611Z bfe.u32 %r785, %r784, 4, 14; 2026-02-21T08:52:47.0846809Z cvt.u64.u32 %rd67, %r785; 2026-02-21T08:52:47.0847007Z or.b64 %rd56, %rd67, 4611686293372403712; 2026-02-21T08:52:47.0847223Z // 
begin inline asm 2026-02-21T08:52:47.0847823Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2713,%r2714,%r2715,%r2716,%r2717,%r2718,%r2719,%r2720,%r2721,%r2722,%r2723,%r2724,%r2725,%r2726,%r2727,%r2728}, {%r519,%r520,%r521,%r522}, %rd56, %p2, 1, 1; 2026-02-21T08:52:47.0848461Z // end inline asm 2026-02-21T08:52:47.0848615Z add.s32 %r786, %r780, 96; 2026-02-21T08:52:47.0848795Z bfe.u32 %r787, %r786, 4, 14; 2026-02-21T08:52:47.0848976Z cvt.u64.u32 %rd68, %r787; 2026-02-21T08:52:47.0849165Z or.b64 %rd57, %rd68, 4611686293372403712; 2026-02-21T08:52:47.0849376Z // begin inline asm 2026-02-21T08:52:47.0849965Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2713,%r2714,%r2715,%r2716,%r2717,%r2718,%r2719,%r2720,%r2721,%r2722,%r2723,%r2724,%r2725,%r2726,%r2727,%r2728}, {%r555,%r556,%r557,%r558}, %rd57, %p2, 1, 1; 2026-02-21T08:52:47.0850595Z // end inline asm 2026-02-21T08:52:47.0850755Z add.s32 %r788, %r780, 16384; 2026-02-21T08:52:47.0851129Z bfe.u32 %r789, %r788, 4, 14; 2026-02-21T08:52:47.0851308Z cvt.u64.u32 %rd69, %r789; 2026-02-21T08:52:47.0851495Z or.b64 %rd58, %rd69, 4611686293372403712; 2026-02-21T08:52:47.0851696Z // begin inline asm 2026-02-21T08:52:47.0852287Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2713,%r2714,%r2715,%r2716,%r2717,%r2718,%r2719,%r2720,%r2721,%r2722,%r2723,%r2724,%r2725,%r2726,%r2727,%r2728}, {%r591,%r592,%r593,%r594}, %rd58, %p2, 1, 1; 2026-02-21T08:52:47.0852920Z // end inline asm 2026-02-21T08:52:47.0853079Z add.s32 %r790, %r780, 16416; 2026-02-21T08:52:47.0853264Z bfe.u32 %r791, %r790, 4, 14; 2026-02-21T08:52:47.0853445Z cvt.u64.u32 %rd70, %r791; 2026-02-21T08:52:47.0853634Z or.b64 %rd59, %rd70, 4611686293372403712; 2026-02-21T08:52:47.0853963Z // begin inline asm 2026-02-21T08:52:47.0854559Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2713,%r2714,%r2715,%r2716,%r2717,%r2718,%r2719,%r2720,%r2721,%r2722,%r2723,%r2724,%r2725,%r2726,%r2727,%r2728}, {%r627,%r628,%r629,%r630}, %rd59, %p2, 1, 1; 
2026-02-21T08:52:47.0855193Z // end inline asm 2026-02-21T08:52:47.0855359Z add.s32 %r792, %r780, 16448; 2026-02-21T08:52:47.0855542Z bfe.u32 %r793, %r792, 4, 14; 2026-02-21T08:52:47.0855725Z cvt.u64.u32 %rd71, %r793; 2026-02-21T08:52:47.0855910Z or.b64 %rd60, %rd71, 4611686293372403712; 2026-02-21T08:52:47.0856110Z // begin inline asm 2026-02-21T08:52:47.0856833Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2713,%r2714,%r2715,%r2716,%r2717,%r2718,%r2719,%r2720,%r2721,%r2722,%r2723,%r2724,%r2725,%r2726,%r2727,%r2728}, {%r663,%r664,%r665,%r666}, %rd60, %p2, 1, 1; 2026-02-21T08:52:47.0857472Z // end inline asm 2026-02-21T08:52:47.0857633Z add.s32 %r794, %r780, 16480; 2026-02-21T08:52:47.0857811Z bfe.u32 %r795, %r794, 4, 14; 2026-02-21T08:52:47.0857998Z cvt.u64.u32 %rd72, %r795; 2026-02-21T08:52:47.0858193Z or.b64 %rd61, %rd72, 4611686293372403712; 2026-02-21T08:52:47.0858397Z // begin inline asm 2026-02-21T08:52:47.0858992Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2713,%r2714,%r2715,%r2716,%r2717,%r2718,%r2719,%r2720,%r2721,%r2722,%r2723,%r2724,%r2725,%r2726,%r2727,%r2728}, {%r699,%r700,%r701,%r702}, %rd61, %p2, 1, 1; 2026-02-21T08:52:47.0859643Z // end inline asm 2026-02-21T08:52:47.0859827Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:47.0860035Z mov.b32 %r720, 0; 2026-02-21T08:52:47.0860201Z mov.b32 %r719, %r2696; 2026-02-21T08:52:47.0860382Z mov.b32 %r721, %r720; 2026-02-21T08:52:47.0860546Z // begin inline asm 2026-02-21T08:52:47.0860959Z // wait for regs: %r2713,%r2714,%r2715,%r2716,%r2717,%r2718,%r2719,%r2720,%r2721,%r2722,%r2723,%r2724,%r2725,%r2726,%r2727,%r2728,%r719,%r720,%r721 2026-02-21T08:52:47.0861430Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:47.0861644Z // end inline asm 2026-02-21T08:52:47.0861796Z $L__tmp2: 2026-02-21T08:52:47.0862120Z .loc 1 40 71 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:40:71 2026-02-21T08:52:47.0862492Z add.s32 %r796, %r2712, 1; 2026-02-21T08:52:47.0862685Z setp.gt.s32 %p13, 
%r796, 1; 2026-02-21T08:52:47.0862889Z selp.b32 %r2712, 0, %r796, %p13; 2026-02-21T08:52:47.0863230Z .loc 1 48 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:32 2026-02-21T08:52:47.0863603Z mad.wide.s32 %rd63, %r2709, 2, %rd33; 2026-02-21T08:52:47.0863946Z .loc 1 48 80 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:80 2026-02-21T08:52:47.0864306Z shl.b32 %r797, %r2712, 12; 2026-02-21T08:52:47.0864486Z shl.b32 %r798, %r2712, 13; 2026-02-21T08:52:47.0864672Z add.s32 %r799, %r750, %r798; 2026-02-21T08:52:47.0864861Z add.s32 %r741, %r799, %r10; 2026-02-21T08:52:47.0865040Z selp.b32 %r742, 8, 0, %p11; 2026-02-21T08:52:47.0865226Z // begin inline asm 2026-02-21T08:52:47.0865467Z cp.async.ca.shared.global [ %r741 + 0 ], [ %rd236 + 0 ], 0x8, %r742; 2026-02-21T08:52:47.0865757Z // end inline asm 2026-02-21T08:52:47.0865911Z add.s32 %r743, %r741, 4096; 2026-02-21T08:52:47.0866239Z // begin inline asm 2026-02-21T08:52:47.0866590Z cp.async.ca.shared.global [ %r743 + 0 ], [ %rd63 + 0 ], 0x8, %r742; 2026-02-21T08:52:47.0866896Z // end inline asm 2026-02-21T08:52:47.0867060Z cp.async.commit_group; 2026-02-21T08:52:47.0867387Z .loc 1 54 34 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:34 2026-02-21T08:52:47.0867760Z cvt.s64.s32 %rd73, %r2710; 2026-02-21T08:52:47.0867944Z add.s64 %rd64, %rd34, %rd73; 2026-02-21T08:52:47.0868268Z .loc 1 54 87 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:87 2026-02-21T08:52:47.0868701Z add.s32 %r745, %r14, %r797; 2026-02-21T08:52:47.0868888Z // begin inline asm 2026-02-21T08:52:47.0869260Z cp.async.ca.shared.global [ %r745 + 0 ], [ %rd64 + 0 ], 0x8, %r742; 2026-02-21T08:52:47.0869560Z // end inline asm 2026-02-21T08:52:47.0869730Z cp.async.commit_group; 2026-02-21T08:52:47.0870043Z .loc 1 40 71 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:40:71 2026-02-21T08:52:47.0870415Z add.s32 %r2710, %r2710, 229376; 2026-02-21T08:52:47.0870609Z add.s64 %rd236, %rd236, 128; 
2026-02-21T08:52:47.0870795Z add.s32 %r2709, %r2709, 64; 2026-02-21T08:52:47.0870980Z setp.lt.u64 %p14, %rd237, 4064; 2026-02-21T08:52:47.0871182Z @%p14 bra $L__BB0_3; 2026-02-21T08:52:47.0871404Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:47.0871691Z cp.async.wait_group 0; 2026-02-21T08:52:47.0871879Z bar.sync 0; 2026-02-21T08:52:47.0872180Z .loc 1 87 28 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:87:28 2026-02-21T08:52:47.0872567Z cvt.rn.bf16x2.f32 %r833, %r2714, %r2713; 2026-02-21T08:52:47.0872792Z cvt.rn.bf16x2.f32 %r834, %r2716, %r2715; 2026-02-21T08:52:47.0873028Z cvt.rn.bf16x2.f32 %r835, %r2718, %r2717; 2026-02-21T08:52:47.0873246Z cvt.rn.bf16x2.f32 %r836, %r2720, %r2719; 2026-02-21T08:52:47.0873464Z cvt.rn.bf16x2.f32 %r837, %r2722, %r2721; 2026-02-21T08:52:47.0873682Z cvt.rn.bf16x2.f32 %r838, %r2724, %r2723; 2026-02-21T08:52:47.0873893Z cvt.rn.bf16x2.f32 %r839, %r2726, %r2725; 2026-02-21T08:52:47.0874115Z cvt.rn.bf16x2.f32 %r840, %r2728, %r2727; 2026-02-21T08:52:47.0874460Z .loc 1 88 50 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:88:50 2026-02-21T08:52:47.0874829Z mad.lo.s32 %r841, %r82, 7168, %r84; 2026-02-21T08:52:47.0875032Z mad.lo.s32 %r842, %r83, 7168, %r84; 2026-02-21T08:52:47.0875381Z .loc 1 88 22 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:88:22 2026-02-21T08:52:47.0875745Z mad.wide.s32 %rd74, %r841, 2, %rd35; 2026-02-21T08:52:47.0875951Z mad.wide.s32 %rd75, %r842, 2, %rd35; 2026-02-21T08:52:47.0876293Z .loc 1 88 81 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:88:81 2026-02-21T08:52:47.0876827Z st.shared.v4.b32 [%r36], {%r833, %r835, %r837, %r839}; 2026-02-21T08:52:47.0877132Z st.shared.v4.b32 [%r36+512], {%r834, %r836, %r838, %r840}; 2026-02-21T08:52:47.0877380Z bar.sync 0; 2026-02-21T08:52:47.0877534Z // begin inline asm 2026-02-21T08:52:47.0877822Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r810, %r811, %r812, %r813}, [%r804]; 2026-02-21T08:52:47.0878146Z // end inline 
asm 2026-02-21T08:52:47.0878306Z // begin inline asm 2026-02-21T08:52:47.0878594Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r814, %r815, %r816, %r817}, [%r809]; 2026-02-21T08:52:47.0878917Z // end inline asm 2026-02-21T08:52:47.0879071Z // begin inline asm 2026-02-21T08:52:47.0879294Z st.global.v4.b32 [ %rd74 + 0 ], { %r810, %r811, %r812, %r813 }; 2026-02-21T08:52:47.0879557Z // end inline asm 2026-02-21T08:52:47.0879713Z // begin inline asm 2026-02-21T08:52:47.0879931Z st.global.v4.b32 [ %rd75 + 0 ], { %r814, %r815, %r816, %r817 }; 2026-02-21T08:52:47.0880186Z // end inline asm 2026-02-21T08:52:47.0880504Z .loc 1 19 144 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:19:144 2026-02-21T08:52:47.0881019Z add.s32 %r843, %r2708, 4224; 2026-02-21T08:52:47.0881355Z .loc 1 25 35 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:25:35 2026-02-21T08:52:47.0881717Z mul.hi.s32 %r844, %r843, -1840700269; 2026-02-21T08:52:47.0881933Z add.s32 %r845, %r844, %r843; 2026-02-21T08:52:47.0882115Z shr.u32 %r846, %r845, 31; 2026-02-21T08:52:47.0882310Z shr.s32 %r847, %r845, 6; 2026-02-21T08:52:47.0882495Z add.s32 %r848, %r847, %r846; 2026-02-21T08:52:47.0882818Z .loc 1 26 33 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:26:33 2026-02-21T08:52:47.0883180Z shl.b32 %r849, %r848, 1; 2026-02-21T08:52:47.0883618Z .loc 1 27 39 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:27:39 2026-02-21T08:52:47.0883980Z sub.s32 %r850, 1, %r849; 2026-02-21T08:52:47.0884289Z .loc 1 27 52 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:27:52 2026-02-21T08:52:47.0884648Z min.s32 %r851, %r850, 2; 2026-02-21T08:52:47.0884963Z .loc 1 28 45 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:28:45 2026-02-21T08:52:47.0885345Z mul.lo.s32 %r852, %r848, 112; 2026-02-21T08:52:47.0885542Z sub.s32 %r853, %r843, %r852; 2026-02-21T08:52:47.0885860Z .loc 1 29 51 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:29:51 2026-02-21T08:52:47.0886218Z 
div.s32 %r854, %r853, %r851; 2026-02-21T08:52:47.0886653Z .loc 1 28 64 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:28:64 2026-02-21T08:52:47.0887019Z mul.lo.s32 %r855, %r854, %r851; 2026-02-21T08:52:47.0887216Z sub.s32 %r856, %r853, %r855; 2026-02-21T08:52:47.0887551Z .loc 1 28 30 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:28:30 2026-02-21T08:52:47.0887911Z add.s32 %r857, %r856, %r849; 2026-02-21T08:52:47.0888222Z .loc 1 30 27 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:30:27 2026-02-21T08:52:47.0888593Z shl.b32 %r858, %r857, 6; 2026-02-21T08:52:47.0888900Z .loc 1 31 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:31:32 2026-02-21T08:52:47.0889258Z or.b32 %r127, %r858, %r5; 2026-02-21T08:52:47.0889437Z or.b32 %r128, %r858, %r6; 2026-02-21T08:52:47.0889744Z .loc 1 32 27 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:32:27 2026-02-21T08:52:47.0890097Z shl.b32 %r859, %r854, 7; 2026-02-21T08:52:47.0890404Z .loc 1 33 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:33:32 2026-02-21T08:52:47.0890755Z or.b32 %r129, %r859, %r8; 2026-02-21T08:52:47.0891064Z .loc 1 48 53 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:53 2026-02-21T08:52:47.0891428Z shl.b32 %r860, %r127, 13; 2026-02-21T08:52:47.0891605Z shl.b32 %r861, %r128, 13; 2026-02-21T08:52:47.0891911Z .loc 1 48 60 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:60 2026-02-21T08:52:47.0892268Z or.b32 %r862, %r860, %r7; 2026-02-21T08:52:47.0892436Z or.b32 %r863, %r861, %r7; 2026-02-21T08:52:47.0892746Z .loc 1 48 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:32 2026-02-21T08:52:47.0893104Z mad.wide.s32 %rd76, %r862, 2, %rd33; 2026-02-21T08:52:47.0893320Z mad.wide.s32 %rd77, %r863, 2, %rd33; 2026-02-21T08:52:47.0893514Z mov.b32 %r819, 8; 2026-02-21T08:52:47.0893814Z .loc 1 48 80 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:80 2026-02-21T08:52:47.0894172Z // begin inline asm 
2026-02-21T08:52:47.0894410Z cp.async.ca.shared.global [ %r11 + 0 ], [ %rd76 + 0 ], 0x8, %r819; 2026-02-21T08:52:47.0894694Z // end inline asm 2026-02-21T08:52:47.0894846Z // begin inline asm 2026-02-21T08:52:47.0895090Z cp.async.ca.shared.global [ %r12 + 0 ], [ %rd77 + 0 ], 0x8, %r819; 2026-02-21T08:52:47.0895362Z // end inline asm 2026-02-21T08:52:47.0895682Z cp.async.commit_group; 2026-02-21T08:52:47.0896009Z .loc 1 54 62 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:62 2026-02-21T08:52:47.0896365Z add.s32 %r864, %r129, %r2697; 2026-02-21T08:52:47.0896809Z .loc 1 54 34 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:34 2026-02-21T08:52:47.0897165Z cvt.s64.s32 %rd83, %r864; 2026-02-21T08:52:47.0897347Z add.s64 %rd78, %rd34, %rd83; 2026-02-21T08:52:47.0897666Z .loc 1 54 87 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:87 2026-02-21T08:52:47.0898021Z // begin inline asm 2026-02-21T08:52:47.0898248Z cp.async.ca.shared.global [ %r14 + 0 ], [ %rd78 + 0 ], 0x8, %r819; 2026-02-21T08:52:47.0898678Z // end inline asm 2026-02-21T08:52:47.0898858Z cp.async.commit_group; 2026-02-21T08:52:47.0899178Z .loc 1 48 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:32 2026-02-21T08:52:47.0899544Z cvt.s64.s32 %rd84, %r860; 2026-02-21T08:52:47.0899719Z or.b64 %rd85, %rd84, %rd235; 2026-02-21T08:52:47.0899904Z shl.b64 %rd86, %rd85, 1; 2026-02-21T08:52:47.0900076Z add.s64 %rd87, %rd33, %rd86; 2026-02-21T08:52:47.0900266Z add.s64 %rd79, %rd87, 128; 2026-02-21T08:52:47.0900446Z cvt.s64.s32 %rd88, %r861; 2026-02-21T08:52:47.0900627Z or.b64 %rd89, %rd88, %rd235; 2026-02-21T08:52:47.0900811Z shl.b64 %rd90, %rd89, 1; 2026-02-21T08:52:47.0900979Z add.s64 %rd91, %rd33, %rd90; 2026-02-21T08:52:47.0901163Z add.s64 %rd80, %rd91, 128; 2026-02-21T08:52:47.0901476Z .loc 1 48 80 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:80 2026-02-21T08:52:47.0901830Z bar.sync 0; 2026-02-21T08:52:47.0901976Z // begin inline asm 
2026-02-21T08:52:47.0902210Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd79 + 0 ], 0x8, %r819; 2026-02-21T08:52:47.0902479Z // end inline asm 2026-02-21T08:52:47.0902635Z // begin inline asm 2026-02-21T08:52:47.0902866Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd80 + 0 ], 0x8, %r819; 2026-02-21T08:52:47.0903146Z // end inline asm 2026-02-21T08:52:47.0903309Z cp.async.commit_group; 2026-02-21T08:52:47.0903620Z .loc 1 54 62 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:62 2026-02-21T08:52:47.0903983Z add.s32 %r865, %r129, %r2698; 2026-02-21T08:52:47.0904302Z .loc 1 54 34 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:34 2026-02-21T08:52:47.0904659Z cvt.s64.s32 %rd92, %r865; 2026-02-21T08:52:47.0904840Z add.s64 %rd81, %rd34, %rd92; 2026-02-21T08:52:47.0905155Z .loc 1 54 87 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:87 2026-02-21T08:52:47.0905518Z // begin inline asm 2026-02-21T08:52:47.0905747Z cp.async.ca.shared.global [ %r18 + 0 ], [ %rd81 + 0 ], 0x8, %r819; 2026-02-21T08:52:47.0906021Z // end inline asm 2026-02-21T08:52:47.0906178Z cp.async.commit_group; 2026-02-21T08:52:47.0906640Z .loc 1 40 71 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:40:71 2026-02-21T08:52:47.0907022Z add.s32 %r2730, %r47, %r859; 2026-02-21T08:52:47.0907209Z shl.b32 %r866, %r857, 19; 2026-02-21T08:52:47.0907388Z or.b32 %r867, %r48, %r866; 2026-02-21T08:52:47.0907574Z mad.wide.s32 %rd238, %r867, 2, %rd1; 2026-02-21T08:52:47.0907781Z or.b32 %r2729, %r49, %r866; 2026-02-21T08:52:47.0907962Z mov.b32 %r2733, 0f00000000; 2026-02-21T08:52:47.0908139Z mov.b32 %r2732, 1; 2026-02-21T08:52:47.0908293Z mov.b32 %r2731, -1; 2026-02-21T08:52:47.0908462Z mov.b64 %rd239, -32; 2026-02-21T08:52:47.0908720Z mov.b32 %r2734, %r2733; 2026-02-21T08:52:47.0908896Z mov.b32 %r2735, %r2733; 2026-02-21T08:52:47.0909062Z mov.b32 %r2736, %r2733; 2026-02-21T08:52:47.0909236Z mov.b32 %r2737, %r2733; 2026-02-21T08:52:47.0909405Z mov.b32 %r2738, %r2733; 
2026-02-21T08:52:47.0909575Z mov.b32 %r2739, %r2733; 2026-02-21T08:52:47.0909746Z mov.b32 %r2740, %r2733; 2026-02-21T08:52:47.0910061Z mov.b32 %r2741, %r2733; 2026-02-21T08:52:47.0910231Z mov.b32 %r2742, %r2733; 2026-02-21T08:52:47.0910391Z mov.b32 %r2743, %r2733; 2026-02-21T08:52:47.0910562Z mov.b32 %r2744, %r2733; 2026-02-21T08:52:47.0910725Z mov.b32 %r2745, %r2733; 2026-02-21T08:52:47.0910890Z mov.b32 %r2746, %r2733; 2026-02-21T08:52:47.0911069Z mov.b32 %r2747, %r2733; 2026-02-21T08:52:47.0911245Z mov.b32 %r2748, %r2733; 2026-02-21T08:52:47.0911467Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:47.0911769Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:47.0912034Z add.s64 %rd239, %rd239, 32; 2026-02-21T08:52:47.0912224Z setp.lt.u64 %p24, %rd239, 4032; 2026-02-21T08:52:47.0912427Z add.s32 %r1200, %r2731, 1; 2026-02-21T08:52:47.0912741Z setp.gt.s32 %p25, %r1200, 1; 2026-02-21T08:52:47.0912947Z selp.b32 %r2731, 0, %r1200, %p25; 2026-02-21T08:52:47.0913291Z .loc 1 48 80 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:80 2026-02-21T08:52:47.0913663Z cp.async.wait_group 2; 2026-02-21T08:52:47.0913845Z bar.sync 0; 2026-02-21T08:52:47.0913996Z shl.b32 %r1201, %r2731, 12; 2026-02-21T08:52:47.0914180Z shl.b32 %r1202, %r2731, 13; 2026-02-21T08:52:47.0914362Z add.s32 %r1204, %r750, %r1202; 2026-02-21T08:52:47.0914691Z .loc 1 52 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:52:32 2026-02-21T08:52:47.0915047Z add.s32 %r1205, %r1204, %r19; 2026-02-21T08:52:47.0915240Z ld.shared.b16 %rs113, [%r1205]; 2026-02-21T08:52:47.0915453Z ld.shared.b16 %rs114, [%r1205+1024]; 2026-02-21T08:52:47.0915662Z ld.shared.b16 %rs115, [%r1205+64]; 2026-02-21T08:52:47.0915875Z ld.shared.b16 %rs116, [%r1205+1088]; 2026-02-21T08:52:47.0916089Z add.s32 %r1206, %r1204, %r20; 2026-02-21T08:52:47.0916282Z ld.shared.b16 %rs117, [%r1206]; 2026-02-21T08:52:47.0916587Z ld.shared.b16 %rs118, [%r1206+1024]; 2026-02-21T08:52:47.0916800Z ld.shared.b16 %rs119, 
[%r1206+64]; 2026-02-21T08:52:47.0917001Z ld.shared.b16 %rs120, [%r1206+1088]; 2026-02-21T08:52:47.0917201Z add.s32 %r1207, %r1204, %r21; 2026-02-21T08:52:47.0917386Z ld.shared.b16 %rs121, [%r1207]; 2026-02-21T08:52:47.0917588Z ld.shared.b16 %rs122, [%r1207+1024]; 2026-02-21T08:52:47.0917788Z ld.shared.b16 %rs123, [%r1207+64]; 2026-02-21T08:52:47.0917997Z ld.shared.b16 %rs124, [%r1207+1088]; 2026-02-21T08:52:47.0918198Z add.s32 %r1208, %r1204, %r22; 2026-02-21T08:52:47.0918395Z ld.shared.b16 %rs125, [%r1208]; 2026-02-21T08:52:47.0918592Z ld.shared.b16 %rs126, [%r1208+1024]; 2026-02-21T08:52:47.0918790Z ld.shared.b16 %rs127, [%r1208+64]; 2026-02-21T08:52:47.0918993Z ld.shared.b16 %rs128, [%r1208+1088]; 2026-02-21T08:52:47.0919188Z add.s32 %r1209, %r1204, %r23; 2026-02-21T08:52:47.0919382Z ld.shared.b16 %rs129, [%r1209]; 2026-02-21T08:52:47.0919576Z ld.shared.b16 %rs130, [%r1209+1024]; 2026-02-21T08:52:47.0919778Z ld.shared.b16 %rs131, [%r1209+64]; 2026-02-21T08:52:47.0919980Z ld.shared.b16 %rs132, [%r1209+1088]; 2026-02-21T08:52:47.0920183Z add.s32 %r1210, %r1204, %r24; 2026-02-21T08:52:47.0920372Z ld.shared.b16 %rs133, [%r1210]; 2026-02-21T08:52:47.0920567Z ld.shared.b16 %rs134, [%r1210+1024]; 2026-02-21T08:52:47.0920776Z ld.shared.b16 %rs135, [%r1210+64]; 2026-02-21T08:52:47.0920971Z ld.shared.b16 %rs136, [%r1210+1088]; 2026-02-21T08:52:47.0921171Z add.s32 %r1211, %r1204, %r25; 2026-02-21T08:52:47.0921353Z ld.shared.b16 %rs137, [%r1211]; 2026-02-21T08:52:47.0921551Z ld.shared.b16 %rs138, [%r1211+1024]; 2026-02-21T08:52:47.0921765Z ld.shared.b16 %rs139, [%r1211+64]; 2026-02-21T08:52:47.0921967Z ld.shared.b16 %rs140, [%r1211+1088]; 2026-02-21T08:52:47.0922166Z add.s32 %r1212, %r1204, %r26; 2026-02-21T08:52:47.0922345Z ld.shared.b16 %rs141, [%r1212]; 2026-02-21T08:52:47.0922542Z ld.shared.b16 %rs142, [%r1212+1024]; 2026-02-21T08:52:47.0922741Z ld.shared.b16 %rs143, [%r1212+64]; 2026-02-21T08:52:47.0922941Z ld.shared.b16 %rs144, [%r1212+1088]; 
2026-02-21T08:52:47.0923284Z cvt.f32.bf16 %r900, %rs113; 2026-02-21T08:52:47.0923470Z cvt.f32.bf16 %r901, %rs114; 2026-02-21T08:52:47.0923648Z cvt.f32.bf16 %r902, %rs117; 2026-02-21T08:52:47.0923828Z cvt.f32.bf16 %r903, %rs118; 2026-02-21T08:52:47.0924007Z cvt.f32.bf16 %r936, %rs121; 2026-02-21T08:52:47.0924181Z cvt.f32.bf16 %r937, %rs122; 2026-02-21T08:52:47.0924380Z cvt.f32.bf16 %r938, %rs125; 2026-02-21T08:52:47.0924556Z cvt.f32.bf16 %r939, %rs126; 2026-02-21T08:52:47.0924735Z cvt.f32.bf16 %r972, %rs129; 2026-02-21T08:52:47.0924910Z cvt.f32.bf16 %r973, %rs130; 2026-02-21T08:52:47.0925090Z cvt.f32.bf16 %r974, %rs133; 2026-02-21T08:52:47.0925272Z cvt.f32.bf16 %r975, %rs134; 2026-02-21T08:52:47.0925453Z cvt.f32.bf16 %r1008, %rs137; 2026-02-21T08:52:47.0925633Z cvt.f32.bf16 %r1009, %rs138; 2026-02-21T08:52:47.0925943Z cvt.f32.bf16 %r1010, %rs141; 2026-02-21T08:52:47.0926131Z cvt.f32.bf16 %r1011, %rs142; 2026-02-21T08:52:47.0926306Z cvt.f32.bf16 %r1044, %rs115; 2026-02-21T08:52:47.0926606Z cvt.f32.bf16 %r1045, %rs116; 2026-02-21T08:52:47.0926805Z cvt.f32.bf16 %r1046, %rs119; 2026-02-21T08:52:47.0926982Z cvt.f32.bf16 %r1047, %rs120; 2026-02-21T08:52:47.0927167Z cvt.f32.bf16 %r1080, %rs123; 2026-02-21T08:52:47.0927345Z cvt.f32.bf16 %r1081, %rs124; 2026-02-21T08:52:47.0927528Z cvt.f32.bf16 %r1082, %rs127; 2026-02-21T08:52:47.0927703Z cvt.f32.bf16 %r1083, %rs128; 2026-02-21T08:52:47.0927883Z cvt.f32.bf16 %r1116, %rs131; 2026-02-21T08:52:47.0928059Z cvt.f32.bf16 %r1117, %rs132; 2026-02-21T08:52:47.0928250Z cvt.f32.bf16 %r1118, %rs135; 2026-02-21T08:52:47.0928438Z cvt.f32.bf16 %r1119, %rs136; 2026-02-21T08:52:47.0928613Z cvt.f32.bf16 %r1152, %rs139; 2026-02-21T08:52:47.0928794Z cvt.f32.bf16 %r1153, %rs140; 2026-02-21T08:52:47.0928969Z cvt.f32.bf16 %r1154, %rs143; 2026-02-21T08:52:47.0929149Z cvt.f32.bf16 %r1155, %rs144; 2026-02-21T08:52:47.0929470Z .loc 1 67 45 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:67:45 2026-02-21T08:52:47.0929834Z add.s32 %r1213, 
%r27, %r1201; 2026-02-21T08:52:47.0930035Z ld.shared.b8 %rs145, [%r1213]; 2026-02-21T08:52:47.0930236Z ld.shared.b8 %rs146, [%r1213+256]; 2026-02-21T08:52:47.0930441Z ld.shared.b8 %rs147, [%r1213+512]; 2026-02-21T08:52:47.0930638Z ld.shared.b8 %rs148, [%r1213+768]; 2026-02-21T08:52:47.0930840Z ld.shared.b8 %rs149, [%r1213+1024]; 2026-02-21T08:52:47.0931039Z ld.shared.b8 %rs150, [%r1213+1280]; 2026-02-21T08:52:47.0931240Z ld.shared.b8 %rs151, [%r1213+1536]; 2026-02-21T08:52:47.0931437Z ld.shared.b8 %rs152, [%r1213+1792]; 2026-02-21T08:52:47.0931637Z ld.shared.b8 %rs153, [%r1213+2048]; 2026-02-21T08:52:47.0931835Z ld.shared.b8 %rs154, [%r1213+2304]; 2026-02-21T08:52:47.0932036Z ld.shared.b8 %rs155, [%r1213+2560]; 2026-02-21T08:52:47.0932242Z ld.shared.b8 %rs156, [%r1213+2816]; 2026-02-21T08:52:47.0932438Z ld.shared.b8 %rs157, [%r1213+3072]; 2026-02-21T08:52:47.0932640Z ld.shared.b8 %rs158, [%r1213+3328]; 2026-02-21T08:52:47.0932839Z ld.shared.b8 %rs159, [%r1213+3584]; 2026-02-21T08:52:47.0933043Z ld.shared.b8 %rs160, [%r1213+3840]; 2026-02-21T08:52:47.0933376Z .loc 1 57 28 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:57:28 2026-02-21T08:52:47.0933739Z shl.b16 %rs161, %rs145, 4; 2026-02-21T08:52:47.0933920Z shl.b16 %rs162, %rs146, 4; 2026-02-21T08:52:47.0934114Z shl.b16 %rs163, %rs147, 4; 2026-02-21T08:52:47.0934298Z shl.b16 %rs164, %rs148, 4; 2026-02-21T08:52:47.0934477Z shl.b16 %rs165, %rs149, 4; 2026-02-21T08:52:47.0934658Z shl.b16 %rs166, %rs150, 4; 2026-02-21T08:52:47.0934830Z shl.b16 %rs167, %rs151, 4; 2026-02-21T08:52:47.0935007Z shl.b16 %rs168, %rs152, 4; 2026-02-21T08:52:47.0935181Z shl.b16 %rs169, %rs153, 4; 2026-02-21T08:52:47.0935366Z shl.b16 %rs170, %rs154, 4; 2026-02-21T08:52:47.0935540Z shl.b16 %rs171, %rs155, 4; 2026-02-21T08:52:47.0935716Z shl.b16 %rs172, %rs156, 4; 2026-02-21T08:52:47.0935891Z shl.b16 %rs173, %rs157, 4; 2026-02-21T08:52:47.0936067Z shl.b16 %rs174, %rs158, 4; 2026-02-21T08:52:47.0936388Z shl.b16 %rs175, %rs159, 4; 
2026-02-21T08:52:47.0936705Z shl.b16 %rs176, %rs160, 4; 2026-02-21T08:52:47.0937029Z .loc 1 72 58 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:72:58 2026-02-21T08:52:47.0937398Z selp.b16 %rs177, %rs161, %rs145, %p70; 2026-02-21T08:52:47.0937612Z cvt.s16.s8 %rs178, %rs177; 2026-02-21T08:52:47.0937785Z shr.s16 %rs179, %rs178, 4; 2026-02-21T08:52:47.0937976Z selp.b16 %rs180, %rs162, %rs146, %p70; 2026-02-21T08:52:47.0938181Z cvt.s16.s8 %rs181, %rs180; 2026-02-21T08:52:47.0938363Z shr.s16 %rs182, %rs181, 4; 2026-02-21T08:52:47.0938554Z selp.b16 %rs183, %rs163, %rs147, %p70; 2026-02-21T08:52:47.0938758Z cvt.s16.s8 %rs184, %rs183; 2026-02-21T08:52:47.0938938Z shr.s16 %rs185, %rs184, 4; 2026-02-21T08:52:47.0939257Z selp.b16 %rs186, %rs164, %rs148, %p70; 2026-02-21T08:52:47.0939473Z cvt.s16.s8 %rs187, %rs186; 2026-02-21T08:52:47.0939646Z shr.s16 %rs188, %rs187, 4; 2026-02-21T08:52:47.0939835Z selp.b16 %rs189, %rs165, %rs149, %p70; 2026-02-21T08:52:47.0940040Z cvt.s16.s8 %rs190, %rs189; 2026-02-21T08:52:47.0940217Z shr.s16 %rs191, %rs190, 4; 2026-02-21T08:52:47.0940403Z selp.b16 %rs192, %rs166, %rs150, %p70; 2026-02-21T08:52:47.0940616Z cvt.s16.s8 %rs193, %rs192; 2026-02-21T08:52:47.0940796Z shr.s16 %rs194, %rs193, 4; 2026-02-21T08:52:47.0940977Z selp.b16 %rs195, %rs167, %rs151, %p70; 2026-02-21T08:52:47.0941184Z cvt.s16.s8 %rs196, %rs195; 2026-02-21T08:52:47.0941356Z shr.s16 %rs197, %rs196, 4; 2026-02-21T08:52:47.0941546Z selp.b16 %rs198, %rs168, %rs152, %p70; 2026-02-21T08:52:47.0941749Z cvt.s16.s8 %rs199, %rs198; 2026-02-21T08:52:47.0941928Z shr.s16 %rs200, %rs199, 4; 2026-02-21T08:52:47.0942108Z selp.b16 %rs201, %rs169, %rs153, %p70; 2026-02-21T08:52:47.0942314Z cvt.s16.s8 %rs202, %rs201; 2026-02-21T08:52:47.0942490Z shr.s16 %rs203, %rs202, 4; 2026-02-21T08:52:47.0942671Z selp.b16 %rs204, %rs170, %rs154, %p70; 2026-02-21T08:52:47.0942875Z cvt.s16.s8 %rs205, %rs204; 2026-02-21T08:52:47.0943052Z shr.s16 %rs206, %rs205, 4; 2026-02-21T08:52:47.0943237Z 
selp.b16 %rs207, %rs171, %rs155, %p70; 2026-02-21T08:52:47.0943438Z cvt.s16.s8 %rs208, %rs207; 2026-02-21T08:52:47.0943615Z shr.s16 %rs209, %rs208, 4; 2026-02-21T08:52:47.0943795Z selp.b16 %rs210, %rs172, %rs156, %p70; 2026-02-21T08:52:47.0944022Z cvt.s16.s8 %rs211, %rs210; 2026-02-21T08:52:47.0944195Z shr.s16 %rs212, %rs211, 4; 2026-02-21T08:52:47.0944381Z selp.b16 %rs213, %rs173, %rs157, %p70; 2026-02-21T08:52:47.0944587Z cvt.s16.s8 %rs214, %rs213; 2026-02-21T08:52:47.0944762Z shr.s16 %rs215, %rs214, 4; 2026-02-21T08:52:47.0944950Z selp.b16 %rs216, %rs174, %rs158, %p70; 2026-02-21T08:52:47.0945150Z cvt.s16.s8 %rs217, %rs216; 2026-02-21T08:52:47.0945327Z shr.s16 %rs218, %rs217, 4; 2026-02-21T08:52:47.0945510Z selp.b16 %rs219, %rs175, %rs159, %p70; 2026-02-21T08:52:47.0945715Z cvt.s16.s8 %rs220, %rs219; 2026-02-21T08:52:47.0945890Z shr.s16 %rs221, %rs220, 4; 2026-02-21T08:52:47.0946076Z selp.b16 %rs222, %rs176, %rs160, %p70; 2026-02-21T08:52:47.0946286Z cvt.s16.s8 %rs223, %rs222; 2026-02-21T08:52:47.0946598Z shr.s16 %rs224, %rs223, 4; 2026-02-21T08:52:47.0946939Z .loc 1 77 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:77:32 2026-02-21T08:52:47.0947314Z cvt.rn.f32.s16 %r1214, %rs179; 2026-02-21T08:52:47.0947516Z cvt.rn.f32.s16 %r1215, %rs182; 2026-02-21T08:52:47.0947702Z cvt.rn.f32.s16 %r1216, %rs185; 2026-02-21T08:52:47.0947890Z cvt.rn.f32.s16 %r1217, %rs188; 2026-02-21T08:52:47.0948072Z cvt.rn.f32.s16 %r1218, %rs191; 2026-02-21T08:52:47.0948262Z cvt.rn.f32.s16 %r1219, %rs194; 2026-02-21T08:52:47.0948454Z cvt.rn.f32.s16 %r1220, %rs197; 2026-02-21T08:52:47.0948727Z cvt.rn.f32.s16 %r1221, %rs200; 2026-02-21T08:52:47.0948922Z cvt.rn.f32.s16 %r1222, %rs203; 2026-02-21T08:52:47.0949108Z cvt.rn.f32.s16 %r1223, %rs206; 2026-02-21T08:52:47.0949296Z cvt.rn.f32.s16 %r1224, %rs209; 2026-02-21T08:52:47.0949480Z cvt.rn.f32.s16 %r1225, %rs212; 2026-02-21T08:52:47.0949812Z cvt.rn.f32.s16 %r1226, %rs215; 2026-02-21T08:52:47.0949991Z cvt.rn.f32.s16 %r1227, %rs218; 
2026-02-21T08:52:47.0950178Z cvt.rn.f32.s16 %r1228, %rs221; 2026-02-21T08:52:47.0950357Z cvt.rn.f32.s16 %r1229, %rs224; 2026-02-21T08:52:47.0950546Z st.shared.b32 [%r39], %r1214; 2026-02-21T08:52:47.0950742Z st.shared.b32 [%r39+16384], %r1222; 2026-02-21T08:52:47.0950939Z st.shared.b32 [%r40], %r1215; 2026-02-21T08:52:47.0951130Z st.shared.b32 [%r40+16384], %r1223; 2026-02-21T08:52:47.0951326Z st.shared.b32 [%r41], %r1216; 2026-02-21T08:52:47.0951513Z st.shared.b32 [%r41+16384], %r1224; 2026-02-21T08:52:47.0951707Z st.shared.b32 [%r42], %r1217; 2026-02-21T08:52:47.0951896Z st.shared.b32 [%r42+16384], %r1225; 2026-02-21T08:52:47.0952107Z st.shared.b32 [%r43], %r1218; 2026-02-21T08:52:47.0952445Z st.shared.b32 [%r43+16384], %r1226; 2026-02-21T08:52:47.0952666Z st.shared.b32 [%r44], %r1219; 2026-02-21T08:52:47.0952852Z st.shared.b32 [%r44+16384], %r1227; 2026-02-21T08:52:47.0953055Z st.shared.b32 [%r45], %r1220; 2026-02-21T08:52:47.0953236Z st.shared.b32 [%r45+16384], %r1228; 2026-02-21T08:52:47.0953435Z st.shared.b32 [%r46], %r1221; 2026-02-21T08:52:47.0953614Z st.shared.b32 [%r46+16384], %r1229; 2026-02-21T08:52:47.0953808Z $L__tmp3: 2026-02-21T08:52:47.0954169Z .loc 2 291 36 // standard.py:291:36 @[ c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:84:40 ] 2026-02-21T08:52:47.0954609Z // begin inline asm 2026-02-21T08:52:47.0954798Z fence.proxy.async.shared::cta; 2026-02-21T08:52:47.0954989Z // end inline asm 2026-02-21T08:52:47.0955143Z bar.sync 0; 2026-02-21T08:52:47.0955312Z shfl.sync.idx.b32 %r1230, %r4, 0, 31, -1; 2026-02-21T08:52:47.0955565Z wgmma.fence.sync.aligned; 2026-02-21T08:52:47.0955757Z shl.b32 %r1231, %r1230, 10; 2026-02-21T08:52:47.0955945Z and.b32 %r1232, %r1231, 12288; 2026-02-21T08:52:47.0956131Z add.s32 %r1233, %r1232, %r2696; 2026-02-21T08:52:47.0956327Z bfe.u32 %r1234, %r1233, 4, 14; 2026-02-21T08:52:47.0956637Z cvt.u64.u32 %rd104, %r1234; 2026-02-21T08:52:47.0956837Z or.b64 %rd93, %rd104, 4611686293372403712; 
2026-02-21T08:52:47.0957057Z mov.pred %p15, -1; 2026-02-21T08:52:47.0957219Z // begin inline asm 2026-02-21T08:52:47.0957825Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2733,%r2734,%r2735,%r2736,%r2737,%r2738,%r2739,%r2740,%r2741,%r2742,%r2743,%r2744,%r2745,%r2746,%r2747,%r2748}, {%r900,%r901,%r902,%r903}, %rd93, %p15, 1, 1; 2026-02-21T08:52:47.0958468Z // end inline asm 2026-02-21T08:52:47.0958631Z add.s32 %r1235, %r1233, 32; 2026-02-21T08:52:47.0958810Z bfe.u32 %r1236, %r1235, 4, 14; 2026-02-21T08:52:47.0958999Z cvt.u64.u32 %rd105, %r1236; 2026-02-21T08:52:47.0959211Z or.b64 %rd94, %rd105, 4611686293372403712; 2026-02-21T08:52:47.0959425Z // begin inline asm 2026-02-21T08:52:47.0960020Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2733,%r2734,%r2735,%r2736,%r2737,%r2738,%r2739,%r2740,%r2741,%r2742,%r2743,%r2744,%r2745,%r2746,%r2747,%r2748}, {%r936,%r937,%r938,%r939}, %rd94, %p15, 1, 1; 2026-02-21T08:52:47.0960664Z // end inline asm 2026-02-21T08:52:47.0960826Z add.s32 %r1237, %r1233, 64; 2026-02-21T08:52:47.0961004Z bfe.u32 %r1238, %r1237, 4, 14; 2026-02-21T08:52:47.0961191Z cvt.u64.u32 %rd106, %r1238; 2026-02-21T08:52:47.0961384Z or.b64 %rd95, %rd106, 4611686293372403712; 2026-02-21T08:52:47.0961594Z // begin inline asm 2026-02-21T08:52:47.0962188Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2733,%r2734,%r2735,%r2736,%r2737,%r2738,%r2739,%r2740,%r2741,%r2742,%r2743,%r2744,%r2745,%r2746,%r2747,%r2748}, {%r972,%r973,%r974,%r975}, %rd95, %p15, 1, 1; 2026-02-21T08:52:47.0962824Z // end inline asm 2026-02-21T08:52:47.0962986Z add.s32 %r1239, %r1233, 96; 2026-02-21T08:52:47.0963165Z bfe.u32 %r1240, %r1239, 4, 14; 2026-02-21T08:52:47.0963355Z cvt.u64.u32 %rd107, %r1240; 2026-02-21T08:52:47.0963549Z or.b64 %rd96, %rd107, 4611686293372403712; 2026-02-21T08:52:47.0963617Z // begin inline asm 2026-02-21T08:52:47.0964122Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r2733,%r2734,%r2735,%r2736,%r2737,%r2738,%r2739,%r2740,%r2741,%r2742,%r2743,%r2744,%r2745,%r2746,%r2747,%r2748}, {%r1008,%r1009,%r1010,%r1011}, %rd96, %p15, 1, 1; 2026-02-21T08:52:47.0964343Z // end inline asm 2026-02-21T08:52:47.0964416Z add.s32 %r1241, %r1233, 16384; 2026-02-21T08:52:47.0964481Z bfe.u32 %r1242, %r1241, 4, 14; 2026-02-21T08:52:47.0964551Z cvt.u64.u32 %rd108, %r1242; 2026-02-21T08:52:47.0964649Z or.b64 %rd97, %rd108, 4611686293372403712; 2026-02-21T08:52:47.0964711Z // begin inline asm 2026-02-21T08:52:47.0965219Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2733,%r2734,%r2735,%r2736,%r2737,%r2738,%r2739,%r2740,%r2741,%r2742,%r2743,%r2744,%r2745,%r2746,%r2747,%r2748}, {%r1044,%r1045,%r1046,%r1047}, %rd97, %p15, 1, 1; 2026-02-21T08:52:47.0965400Z // end inline asm 2026-02-21T08:52:47.0965467Z add.s32 %r1243, %r1233, 16416; 2026-02-21T08:52:47.0965532Z bfe.u32 %r1244, %r1243, 4, 14; 2026-02-21T08:52:47.0965601Z cvt.u64.u32 %rd109, %r1244; 2026-02-21T08:52:47.0965685Z or.b64 %rd98, %rd109, 4611686293372403712; 2026-02-21T08:52:47.0965747Z // begin inline asm 2026-02-21T08:52:47.0966243Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2733,%r2734,%r2735,%r2736,%r2737,%r2738,%r2739,%r2740,%r2741,%r2742,%r2743,%r2744,%r2745,%r2746,%r2747,%r2748}, {%r1080,%r1081,%r1082,%r1083}, %rd98, %p15, 1, 1; 2026-02-21T08:52:47.0966307Z // end inline asm 2026-02-21T08:52:47.0966369Z add.s32 %r1245, %r1233, 16448; 2026-02-21T08:52:47.0966434Z bfe.u32 %r1246, %r1245, 4, 14; 2026-02-21T08:52:47.0966619Z cvt.u64.u32 %rd110, %r1246; 2026-02-21T08:52:47.0966700Z or.b64 %rd99, %rd110, 4611686293372403712; 2026-02-21T08:52:47.0966762Z // begin inline asm 2026-02-21T08:52:47.0967263Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2733,%r2734,%r2735,%r2736,%r2737,%r2738,%r2739,%r2740,%r2741,%r2742,%r2743,%r2744,%r2745,%r2746,%r2747,%r2748}, {%r1116,%r1117,%r1118,%r1119}, %rd99, %p15, 1, 1; 2026-02-21T08:52:47.0967331Z // end inline asm 
2026-02-21T08:52:47.0967393Z add.s32 %r1247, %r1233, 16480; 2026-02-21T08:52:47.0967456Z bfe.u32 %r1248, %r1247, 4, 14; 2026-02-21T08:52:47.0967527Z cvt.u64.u32 %rd111, %r1248; 2026-02-21T08:52:47.0967603Z or.b64 %rd100, %rd111, 4611686293372403712; 2026-02-21T08:52:47.0967664Z // begin inline asm 2026-02-21T08:52:47.0968165Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2733,%r2734,%r2735,%r2736,%r2737,%r2738,%r2739,%r2740,%r2741,%r2742,%r2743,%r2744,%r2745,%r2746,%r2747,%r2748}, {%r1152,%r1153,%r1154,%r1155}, %rd100, %p15, 1, 1; 2026-02-21T08:52:47.0968224Z // end inline asm 2026-02-21T08:52:47.0968305Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:47.0968365Z mov.b32 %r1174, 0; 2026-02-21T08:52:47.0968433Z mov.b32 %r1172, %r2696; 2026-02-21T08:52:47.0968501Z mov.b32 %r1173, %r1174; 2026-02-21T08:52:47.0968563Z // begin inline asm 2026-02-21T08:52:47.0968886Z // wait for regs: %r2733,%r2734,%r2735,%r2736,%r2737,%r2738,%r2739,%r2740,%r2741,%r2742,%r2743,%r2744,%r2745,%r2746,%r2747,%r2748,%r1172,%r1173,%r1174 2026-02-21T08:52:47.0968966Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:47.0969026Z // end inline asm 2026-02-21T08:52:47.0969089Z $L__tmp4: 2026-02-21T08:52:47.0969307Z .loc 1 40 71 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:40:71 2026-02-21T08:52:47.0969373Z add.s32 %r1249, %r2732, 1; 2026-02-21T08:52:47.0969444Z setp.gt.s32 %p26, %r1249, 1; 2026-02-21T08:52:47.0969520Z selp.b32 %r2732, 0, %r1249, %p26; 2026-02-21T08:52:47.0969730Z .loc 1 48 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:32 2026-02-21T08:52:47.0969808Z mad.wide.s32 %rd102, %r2729, 2, %rd33; 2026-02-21T08:52:47.0970023Z .loc 1 48 80 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:80 2026-02-21T08:52:47.0970090Z shl.b32 %r1250, %r2732, 12; 2026-02-21T08:52:47.0970156Z shl.b32 %r1251, %r2732, 13; 2026-02-21T08:52:47.0970227Z add.s32 %r1252, %r750, %r1251; 2026-02-21T08:52:47.0970430Z add.s32 %r1194, %r1252, %r10; 
2026-02-21T08:52:47.0970499Z selp.b32 %r1195, 8, 0, %p24; 2026-02-21T08:52:47.0970564Z // begin inline asm 2026-02-21T08:52:47.0970721Z cp.async.ca.shared.global [ %r1194 + 0 ], [ %rd238 + 0 ], 0x8, %r1195; 2026-02-21T08:52:47.0970781Z // end inline asm 2026-02-21T08:52:47.0970858Z add.s32 %r1196, %r1194, 4096; 2026-02-21T08:52:47.0970928Z // begin inline asm 2026-02-21T08:52:47.0971070Z cp.async.ca.shared.global [ %r1196 + 0 ], [ %rd102 + 0 ], 0x8, %r1195; 2026-02-21T08:52:47.0971129Z // end inline asm 2026-02-21T08:52:47.0971198Z cp.async.commit_group; 2026-02-21T08:52:47.0971408Z .loc 1 54 34 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:34 2026-02-21T08:52:47.0971596Z cvt.s64.s32 %rd112, %r2730; 2026-02-21T08:52:47.0971666Z add.s64 %rd103, %rd34, %rd112; 2026-02-21T08:52:47.0971872Z .loc 1 54 87 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:87 2026-02-21T08:52:47.0971949Z add.s32 %r1198, %r14, %r1250; 2026-02-21T08:52:47.0972011Z // begin inline asm 2026-02-21T08:52:47.0972154Z cp.async.ca.shared.global [ %r1198 + 0 ], [ %rd103 + 0 ], 0x8, %r1195; 2026-02-21T08:52:47.0972213Z // end inline asm 2026-02-21T08:52:47.0972281Z cp.async.commit_group; 2026-02-21T08:52:47.0972488Z .loc 1 40 71 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:40:71 2026-02-21T08:52:47.0972555Z add.s32 %r2730, %r2730, 229376; 2026-02-21T08:52:47.0972621Z add.s64 %rd238, %rd238, 128; 2026-02-21T08:52:47.0972684Z add.s32 %r2729, %r2729, 64; 2026-02-21T08:52:47.0972758Z setp.lt.u64 %p27, %rd239, 4064; 2026-02-21T08:52:47.0972823Z @%p27 bra $L__BB0_5; 2026-02-21T08:52:47.0972939Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:47.0973017Z cp.async.wait_group 0; 2026-02-21T08:52:47.0973078Z bar.sync 0; 2026-02-21T08:52:47.0973280Z .loc 1 87 28 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:87:28 2026-02-21T08:52:47.0973378Z cvt.rn.bf16x2.f32 %r1286, %r2734, %r2733; 2026-02-21T08:52:47.0973466Z cvt.rn.bf16x2.f32 %r1287, %r2736, 
%r2735; 2026-02-21T08:52:47.0973545Z cvt.rn.bf16x2.f32 %r1288, %r2738, %r2737; 2026-02-21T08:52:47.0973619Z cvt.rn.bf16x2.f32 %r1289, %r2740, %r2739; 2026-02-21T08:52:47.0973698Z cvt.rn.bf16x2.f32 %r1290, %r2742, %r2741; 2026-02-21T08:52:47.0973770Z cvt.rn.bf16x2.f32 %r1291, %r2744, %r2743; 2026-02-21T08:52:47.0973842Z cvt.rn.bf16x2.f32 %r1292, %r2746, %r2745; 2026-02-21T08:52:47.0973920Z cvt.rn.bf16x2.f32 %r1293, %r2748, %r2747; 2026-02-21T08:52:47.0974133Z .loc 1 88 50 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:88:50 2026-02-21T08:52:47.0974210Z mad.lo.s32 %r1294, %r127, 7168, %r129; 2026-02-21T08:52:47.0974281Z mad.lo.s32 %r1295, %r128, 7168, %r129; 2026-02-21T08:52:47.0974491Z .loc 1 88 22 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:88:22 2026-02-21T08:52:47.0974567Z mad.wide.s32 %rd113, %r1294, 2, %rd35; 2026-02-21T08:52:47.0974636Z mad.wide.s32 %rd114, %r1295, 2, %rd35; 2026-02-21T08:52:47.0974843Z .loc 1 88 81 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:88:81 2026-02-21T08:52:47.0974957Z st.shared.v4.b32 [%r36], {%r1286, %r1288, %r1290, %r1292}; 2026-02-21T08:52:47.0975076Z st.shared.v4.b32 [%r36+512], {%r1287, %r1289, %r1291, %r1293}; 2026-02-21T08:52:47.0975139Z bar.sync 0; 2026-02-21T08:52:47.0975201Z // begin inline asm 2026-02-21T08:52:47.0975392Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1263, %r1264, %r1265, %r1266}, [%r804]; 2026-02-21T08:52:47.0975451Z // end inline asm 2026-02-21T08:52:47.0975518Z // begin inline asm 2026-02-21T08:52:47.0975702Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1267, %r1268, %r1269, %r1270}, [%r809]; 2026-02-21T08:52:47.0975760Z // end inline asm 2026-02-21T08:52:47.0975825Z // begin inline asm 2026-02-21T08:52:47.0975954Z st.global.v4.b32 [ %rd113 + 0 ], { %r1263, %r1264, %r1265, %r1266 }; 2026-02-21T08:52:47.0976140Z // end inline asm 2026-02-21T08:52:47.0976206Z // begin inline asm 2026-02-21T08:52:47.0976329Z st.global.v4.b32 [ %rd114 + 0 ], { %r1267, %r1268, %r1269, 
%r1270 }; 2026-02-21T08:52:47.0976386Z // end inline asm 2026-02-21T08:52:47.0976723Z .loc 1 19 144 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:19:144 2026-02-21T08:52:47.0976794Z add.s32 %r1296, %r2708, 8448; 2026-02-21T08:52:47.0976997Z .loc 1 25 35 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:25:35 2026-02-21T08:52:47.0977071Z mul.hi.s32 %r1297, %r1296, -1840700269; 2026-02-21T08:52:47.0977141Z add.s32 %r1298, %r1297, %r1296; 2026-02-21T08:52:47.0977208Z shr.u32 %r1299, %r1298, 31; 2026-02-21T08:52:47.0977406Z shr.s32 %r1300, %r1298, 6; 2026-02-21T08:52:47.0977478Z add.s32 %r1301, %r1300, %r1299; 2026-02-21T08:52:47.0977681Z .loc 1 26 33 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:26:33 2026-02-21T08:52:47.0977749Z shl.b32 %r1302, %r1301, 1; 2026-02-21T08:52:47.0977947Z .loc 1 27 39 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:27:39 2026-02-21T08:52:47.0978015Z sub.s32 %r1303, 1, %r1302; 2026-02-21T08:52:47.0978213Z .loc 1 27 52 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:27:52 2026-02-21T08:52:47.0978276Z min.s32 %r1304, %r1303, 2; 2026-02-21T08:52:47.0978479Z .loc 1 28 45 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:28:45 2026-02-21T08:52:47.0978546Z mul.lo.s32 %r1305, %r1301, 112; 2026-02-21T08:52:47.0978610Z sub.s32 %r1306, %r1296, %r1305; 2026-02-21T08:52:47.0978817Z .loc 1 29 51 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:29:51 2026-02-21T08:52:47.0978894Z div.s32 %r1307, %r1306, %r1304; 2026-02-21T08:52:47.0979096Z .loc 1 28 64 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:28:64 2026-02-21T08:52:47.0979174Z mul.lo.s32 %r1308, %r1307, %r1304; 2026-02-21T08:52:47.0979237Z sub.s32 %r1309, %r1306, %r1308; 2026-02-21T08:52:47.0979440Z .loc 1 28 30 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:28:30 2026-02-21T08:52:47.0979503Z add.s32 %r1310, %r1309, %r1302; 2026-02-21T08:52:47.0979708Z .loc 1 30 27 // 
c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:30:27 2026-02-21T08:52:47.0979772Z shl.b32 %r1311, %r1310, 6; 2026-02-21T08:52:47.0979970Z .loc 1 31 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:31:32 2026-02-21T08:52:47.0980046Z or.b32 %r172, %r1311, %r5; 2026-02-21T08:52:47.0980110Z or.b32 %r173, %r1311, %r6; 2026-02-21T08:52:47.0980310Z .loc 1 32 27 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:32:27 2026-02-21T08:52:47.0980379Z shl.b32 %r1312, %r1307, 7; 2026-02-21T08:52:47.0980579Z .loc 1 33 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:33:32 2026-02-21T08:52:47.0980641Z or.b32 %r174, %r1312, %r8; 2026-02-21T08:52:47.0980838Z .loc 1 48 53 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:53 2026-02-21T08:52:47.0980906Z shl.b32 %r1313, %r172, 13; 2026-02-21T08:52:47.0980968Z shl.b32 %r1314, %r173, 13; 2026-02-21T08:52:47.0981167Z .loc 1 48 60 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:60 2026-02-21T08:52:47.0981237Z or.b32 %r1315, %r1313, %r7; 2026-02-21T08:52:47.0981298Z or.b32 %r1316, %r1314, %r7; 2026-02-21T08:52:47.0981497Z .loc 1 48 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:32 2026-02-21T08:52:47.0981581Z mad.wide.s32 %rd115, %r1315, 2, %rd33; 2026-02-21T08:52:47.0981652Z mad.wide.s32 %rd116, %r1316, 2, %rd33; 2026-02-21T08:52:47.0981712Z mov.b32 %r1272, 8; 2026-02-21T08:52:47.0982086Z .loc 1 48 80 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:80 2026-02-21T08:52:47.0982150Z // begin inline asm 2026-02-21T08:52:47.0982295Z cp.async.ca.shared.global [ %r11 + 0 ], [ %rd115 + 0 ], 0x8, %r1272; 2026-02-21T08:52:47.0982354Z // end inline asm 2026-02-21T08:52:47.0982420Z // begin inline asm 2026-02-21T08:52:47.0982557Z cp.async.ca.shared.global [ %r12 + 0 ], [ %rd116 + 0 ], 0x8, %r1272; 2026-02-21T08:52:47.0982616Z // end inline asm 2026-02-21T08:52:47.0982691Z cp.async.commit_group; 2026-02-21T08:52:47.0982893Z .loc 1 54 62 // 
c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:62 2026-02-21T08:52:47.0982961Z add.s32 %r1317, %r174, %r2697; 2026-02-21T08:52:47.0983252Z .loc 1 54 34 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:34 2026-02-21T08:52:47.0983327Z cvt.s64.s32 %rd122, %r1317; 2026-02-21T08:52:47.0983394Z add.s64 %rd117, %rd34, %rd122; 2026-02-21T08:52:47.0983596Z .loc 1 54 87 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:87 2026-02-21T08:52:47.0983665Z // begin inline asm 2026-02-21T08:52:47.0983800Z cp.async.ca.shared.global [ %r14 + 0 ], [ %rd117 + 0 ], 0x8, %r1272; 2026-02-21T08:52:47.0983860Z // end inline asm 2026-02-21T08:52:47.0983934Z cp.async.commit_group; 2026-02-21T08:52:47.0984132Z .loc 1 48 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:32 2026-02-21T08:52:47.0984197Z cvt.s64.s32 %rd123, %r1313; 2026-02-21T08:52:47.0984265Z or.b64 %rd124, %rd123, %rd235; 2026-02-21T08:52:47.0984340Z shl.b64 %rd125, %rd124, 1; 2026-02-21T08:52:47.0984407Z add.s64 %rd126, %rd33, %rd125; 2026-02-21T08:52:47.0984477Z add.s64 %rd118, %rd126, 128; 2026-02-21T08:52:47.0984552Z cvt.s64.s32 %rd127, %r1314; 2026-02-21T08:52:47.0984617Z or.b64 %rd128, %rd127, %rd235; 2026-02-21T08:52:47.0984679Z shl.b64 %rd129, %rd128, 1; 2026-02-21T08:52:47.0984750Z add.s64 %rd130, %rd33, %rd129; 2026-02-21T08:52:47.0984816Z add.s64 %rd119, %rd130, 128; 2026-02-21T08:52:47.0985015Z .loc 1 48 80 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:80 2026-02-21T08:52:47.0985074Z bar.sync 0; 2026-02-21T08:52:47.0985144Z // begin inline asm 2026-02-21T08:52:47.0985279Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd118 + 0 ], 0x8, %r1272; 2026-02-21T08:52:47.0985338Z // end inline asm 2026-02-21T08:52:47.0985404Z // begin inline asm 2026-02-21T08:52:47.0985533Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd119 + 0 ], 0x8, %r1272; 2026-02-21T08:52:47.0985591Z // end inline asm 2026-02-21T08:52:47.0985658Z cp.async.commit_group; 
2026-02-21T08:52:47.0985862Z .loc 1 54 62 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:62 2026-02-21T08:52:47.0985929Z add.s32 %r1318, %r174, %r2698; 2026-02-21T08:52:47.0986127Z .loc 1 54 34 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:34 2026-02-21T08:52:47.0986200Z cvt.s64.s32 %rd131, %r1318; 2026-02-21T08:52:47.0986264Z add.s64 %rd120, %rd34, %rd131; 2026-02-21T08:52:47.0986587Z .loc 1 54 87 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:87 2026-02-21T08:52:47.0986661Z // begin inline asm 2026-02-21T08:52:47.0986794Z cp.async.ca.shared.global [ %r18 + 0 ], [ %rd120 + 0 ], 0x8, %r1272; 2026-02-21T08:52:47.0986853Z // end inline asm 2026-02-21T08:52:47.0986921Z cp.async.commit_group; 2026-02-21T08:52:47.0987122Z .loc 1 40 71 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:40:71 2026-02-21T08:52:47.0987188Z add.s32 %r2750, %r47, %r1312; 2026-02-21T08:52:47.0987251Z shl.b32 %r1319, %r1310, 19; 2026-02-21T08:52:47.0987318Z or.b32 %r1320, %r48, %r1319; 2026-02-21T08:52:47.0987395Z mad.wide.s32 %rd240, %r1320, 2, %rd1; 2026-02-21T08:52:47.0987457Z or.b32 %r2749, %r49, %r1319; 2026-02-21T08:52:47.0987524Z mov.b32 %r2753, 0f00000000; 2026-02-21T08:52:47.0987584Z mov.b32 %r2752, 1; 2026-02-21T08:52:47.0987784Z mov.b32 %r2751, -1; 2026-02-21T08:52:47.0987846Z mov.b64 %rd241, -32; 2026-02-21T08:52:47.0987912Z mov.b32 %r2754, %r2753; 2026-02-21T08:52:47.0987974Z mov.b32 %r2755, %r2753; 2026-02-21T08:52:47.0988033Z mov.b32 %r2756, %r2753; 2026-02-21T08:52:47.0988096Z mov.b32 %r2757, %r2753; 2026-02-21T08:52:47.0988156Z mov.b32 %r2758, %r2753; 2026-02-21T08:52:47.0988217Z mov.b32 %r2759, %r2753; 2026-02-21T08:52:47.0988277Z mov.b32 %r2760, %r2753; 2026-02-21T08:52:47.0988341Z mov.b32 %r2761, %r2753; 2026-02-21T08:52:47.0988402Z mov.b32 %r2762, %r2753; 2026-02-21T08:52:47.0988462Z mov.b32 %r2763, %r2753; 2026-02-21T08:52:47.0988602Z mov.b32 %r2764, %r2753; 2026-02-21T08:52:47.0988667Z mov.b32 %r2765, %r2753; 
2026-02-21T08:52:47.0989073Z mov.b32 %r2766, %r2753; 2026-02-21T08:52:47.0989137Z mov.b32 %r2767, %r2753; 2026-02-21T08:52:47.0989203Z mov.b32 %r2768, %r2753; 2026-02-21T08:52:47.0989316Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:47.0989431Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:47.0989504Z add.s64 %rd241, %rd241, 32; 2026-02-21T08:52:47.0989576Z setp.lt.u64 %p37, %rd241, 4032; 2026-02-21T08:52:47.0989639Z add.s32 %r1653, %r2751, 1; 2026-02-21T08:52:47.0989711Z setp.gt.s32 %p38, %r1653, 1; 2026-02-21T08:52:47.0989783Z selp.b32 %r2751, 0, %r1653, %p38; 2026-02-21T08:52:47.0989988Z .loc 1 48 80 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:80 2026-02-21T08:52:47.0990057Z cp.async.wait_group 2; 2026-02-21T08:52:47.0990121Z bar.sync 0; 2026-02-21T08:52:47.0990185Z shl.b32 %r1654, %r2751, 12; 2026-02-21T08:52:47.0990247Z shl.b32 %r1655, %r2751, 13; 2026-02-21T08:52:47.0990318Z add.s32 %r1657, %r750, %r1655; 2026-02-21T08:52:47.0990519Z .loc 1 52 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:52:32 2026-02-21T08:52:47.0990583Z add.s32 %r1658, %r1657, %r19; 2026-02-21T08:52:47.0990662Z ld.shared.b16 %rs225, [%r1658]; 2026-02-21T08:52:47.0990735Z ld.shared.b16 %rs226, [%r1658+1024]; 2026-02-21T08:52:47.0990806Z ld.shared.b16 %rs227, [%r1658+64]; 2026-02-21T08:52:47.0990875Z ld.shared.b16 %rs228, [%r1658+1088]; 2026-02-21T08:52:47.0990946Z add.s32 %r1659, %r1657, %r20; 2026-02-21T08:52:47.0991013Z ld.shared.b16 %rs229, [%r1659]; 2026-02-21T08:52:47.0991082Z ld.shared.b16 %rs230, [%r1659+1024]; 2026-02-21T08:52:47.0991153Z ld.shared.b16 %rs231, [%r1659+64]; 2026-02-21T08:52:47.0991220Z ld.shared.b16 %rs232, [%r1659+1088]; 2026-02-21T08:52:47.0991296Z add.s32 %r1660, %r1657, %r21; 2026-02-21T08:52:47.0991366Z ld.shared.b16 %rs233, [%r1660]; 2026-02-21T08:52:47.0991439Z ld.shared.b16 %rs234, [%r1660+1024]; 2026-02-21T08:52:47.0991509Z ld.shared.b16 %rs235, [%r1660+64]; 2026-02-21T08:52:47.0991577Z ld.shared.b16 
%rs236, [%r1660+1088]; 2026-02-21T08:52:47.0991646Z add.s32 %r1661, %r1657, %r22; 2026-02-21T08:52:47.0991713Z ld.shared.b16 %rs237, [%r1661]; 2026-02-21T08:52:47.0991785Z ld.shared.b16 %rs238, [%r1661+1024]; 2026-02-21T08:52:47.0991853Z ld.shared.b16 %rs239, [%r1661+64]; 2026-02-21T08:52:47.0991926Z ld.shared.b16 %rs240, [%r1661+1088]; 2026-02-21T08:52:47.0991990Z add.s32 %r1662, %r1657, %r23; 2026-02-21T08:52:47.0992057Z ld.shared.b16 %rs241, [%r1662]; 2026-02-21T08:52:47.0992133Z ld.shared.b16 %rs242, [%r1662+1024]; 2026-02-21T08:52:47.0992200Z ld.shared.b16 %rs243, [%r1662+64]; 2026-02-21T08:52:47.0992269Z ld.shared.b16 %rs244, [%r1662+1088]; 2026-02-21T08:52:47.0992336Z add.s32 %r1663, %r1657, %r24; 2026-02-21T08:52:47.0992403Z ld.shared.b16 %rs245, [%r1663]; 2026-02-21T08:52:47.0992472Z ld.shared.b16 %rs246, [%r1663+1024]; 2026-02-21T08:52:47.0992540Z ld.shared.b16 %rs247, [%r1663+64]; 2026-02-21T08:52:47.0992618Z ld.shared.b16 %rs248, [%r1663+1088]; 2026-02-21T08:52:47.0992682Z add.s32 %r1664, %r1657, %r25; 2026-02-21T08:52:47.0992748Z ld.shared.b16 %rs249, [%r1664]; 2026-02-21T08:52:47.0992929Z ld.shared.b16 %rs250, [%r1664+1024]; 2026-02-21T08:52:47.0992996Z ld.shared.b16 %rs251, [%r1664+64]; 2026-02-21T08:52:47.0993065Z ld.shared.b16 %rs252, [%r1664+1088]; 2026-02-21T08:52:47.0993129Z add.s32 %r1665, %r1657, %r26; 2026-02-21T08:52:47.0993202Z ld.shared.b16 %rs253, [%r1665]; 2026-02-21T08:52:47.0993271Z ld.shared.b16 %rs254, [%r1665+1024]; 2026-02-21T08:52:47.0993338Z ld.shared.b16 %rs255, [%r1665+64]; 2026-02-21T08:52:47.0993412Z ld.shared.b16 %rs256, [%r1665+1088]; 2026-02-21T08:52:47.0993480Z cvt.f32.bf16 %r1353, %rs225; 2026-02-21T08:52:47.0993543Z cvt.f32.bf16 %r1354, %rs226; 2026-02-21T08:52:47.0993611Z cvt.f32.bf16 %r1355, %rs229; 2026-02-21T08:52:47.0993675Z cvt.f32.bf16 %r1356, %rs230; 2026-02-21T08:52:47.0993739Z cvt.f32.bf16 %r1389, %rs233; 2026-02-21T08:52:47.0993893Z cvt.f32.bf16 %r1390, %rs234; 2026-02-21T08:52:47.0993963Z cvt.f32.bf16 
%r1391, %rs237; 2026-02-21T08:52:47.0994026Z cvt.f32.bf16 %r1392, %rs238; 2026-02-21T08:52:47.0994090Z cvt.f32.bf16 %r1425, %rs241; 2026-02-21T08:52:47.0994162Z cvt.f32.bf16 %r1426, %rs242; 2026-02-21T08:52:47.0994225Z cvt.f32.bf16 %r1427, %rs245; 2026-02-21T08:52:47.0994288Z cvt.f32.bf16 %r1428, %rs246; 2026-02-21T08:52:47.0994351Z cvt.f32.bf16 %r1461, %rs249; 2026-02-21T08:52:47.0994419Z cvt.f32.bf16 %r1462, %rs250; 2026-02-21T08:52:47.0994480Z cvt.f32.bf16 %r1463, %rs253; 2026-02-21T08:52:47.0994542Z cvt.f32.bf16 %r1464, %rs254; 2026-02-21T08:52:47.0994609Z cvt.f32.bf16 %r1497, %rs227; 2026-02-21T08:52:47.0994672Z cvt.f32.bf16 %r1498, %rs228; 2026-02-21T08:52:47.0994734Z cvt.f32.bf16 %r1499, %rs231; 2026-02-21T08:52:47.0994797Z cvt.f32.bf16 %r1500, %rs232; 2026-02-21T08:52:47.0994864Z cvt.f32.bf16 %r1533, %rs235; 2026-02-21T08:52:47.0994927Z cvt.f32.bf16 %r1534, %rs236; 2026-02-21T08:52:47.0995005Z cvt.f32.bf16 %r1535, %rs239; 2026-02-21T08:52:47.0995074Z cvt.f32.bf16 %r1536, %rs240; 2026-02-21T08:52:47.0995136Z cvt.f32.bf16 %r1569, %rs243; 2026-02-21T08:52:47.0995198Z cvt.f32.bf16 %r1570, %rs244; 2026-02-21T08:52:47.0995269Z cvt.f32.bf16 %r1571, %rs247; 2026-02-21T08:52:47.0995333Z cvt.f32.bf16 %r1572, %rs248; 2026-02-21T08:52:47.0995404Z cvt.f32.bf16 %r1605, %rs251; 2026-02-21T08:52:47.0995465Z cvt.f32.bf16 %r1606, %rs252; 2026-02-21T08:52:47.0995532Z cvt.f32.bf16 %r1607, %rs255; 2026-02-21T08:52:47.0995595Z cvt.f32.bf16 %r1608, %rs256; 2026-02-21T08:52:47.0995799Z .loc 1 67 45 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:67:45 2026-02-21T08:52:47.0995867Z add.s32 %r1666, %r27, %r1654; 2026-02-21T08:52:47.0995934Z ld.shared.b8 %rs257, [%r1666]; 2026-02-21T08:52:47.0996001Z ld.shared.b8 %rs258, [%r1666+256]; 2026-02-21T08:52:47.0996067Z ld.shared.b8 %rs259, [%r1666+512]; 2026-02-21T08:52:47.0996142Z ld.shared.b8 %rs260, [%r1666+768]; 2026-02-21T08:52:47.0996211Z ld.shared.b8 %rs261, [%r1666+1024]; 2026-02-21T08:52:47.0996279Z ld.shared.b8 %rs262, 
[%r1666+1280]; 2026-02-21T08:52:47.0996350Z ld.shared.b8 %rs263, [%r1666+1536]; 2026-02-21T08:52:47.0996420Z ld.shared.b8 %rs264, [%r1666+1792]; 2026-02-21T08:52:47.0996608Z ld.shared.b8 %rs265, [%r1666+2048]; 2026-02-21T08:52:47.0996687Z ld.shared.b8 %rs266, [%r1666+2304]; 2026-02-21T08:52:47.0996756Z ld.shared.b8 %rs267, [%r1666+2560]; 2026-02-21T08:52:47.0996823Z ld.shared.b8 %rs268, [%r1666+2816]; 2026-02-21T08:52:47.0996888Z ld.shared.b8 %rs269, [%r1666+3072]; 2026-02-21T08:52:47.0996960Z ld.shared.b8 %rs270, [%r1666+3328]; 2026-02-21T08:52:47.0997026Z ld.shared.b8 %rs271, [%r1666+3584]; 2026-02-21T08:52:47.0997092Z ld.shared.b8 %rs272, [%r1666+3840]; 2026-02-21T08:52:47.0997299Z .loc 1 57 28 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:57:28 2026-02-21T08:52:47.0997374Z shl.b16 %rs273, %rs257, 4; 2026-02-21T08:52:47.0997443Z shl.b16 %rs274, %rs258, 4; 2026-02-21T08:52:47.0997506Z shl.b16 %rs275, %rs259, 4; 2026-02-21T08:52:47.0997575Z shl.b16 %rs276, %rs260, 4; 2026-02-21T08:52:47.0997638Z shl.b16 %rs277, %rs261, 4; 2026-02-21T08:52:47.0997840Z shl.b16 %rs278, %rs262, 4; 2026-02-21T08:52:47.0997908Z shl.b16 %rs279, %rs263, 4; 2026-02-21T08:52:47.0997970Z shl.b16 %rs280, %rs264, 4; 2026-02-21T08:52:47.0998032Z shl.b16 %rs281, %rs265, 4; 2026-02-21T08:52:47.0998094Z shl.b16 %rs282, %rs266, 4; 2026-02-21T08:52:47.0998162Z shl.b16 %rs283, %rs267, 4; 2026-02-21T08:52:47.0998225Z shl.b16 %rs284, %rs268, 4; 2026-02-21T08:52:47.0998286Z shl.b16 %rs285, %rs269, 4; 2026-02-21T08:52:47.0998355Z shl.b16 %rs286, %rs270, 4; 2026-02-21T08:52:47.0998417Z shl.b16 %rs287, %rs271, 4; 2026-02-21T08:52:47.0998479Z shl.b16 %rs288, %rs272, 4; 2026-02-21T08:52:47.0998698Z .loc 1 72 58 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:72:58 2026-02-21T08:52:47.0998901Z selp.b16 %rs289, %rs273, %rs257, %p70; 2026-02-21T08:52:47.0998971Z cvt.s16.s8 %rs290, %rs289; 2026-02-21T08:52:47.0999035Z shr.s16 %rs291, %rs290, 4; 2026-02-21T08:52:47.0999116Z selp.b16 
%rs292, %rs274, %rs258, %p70; 2026-02-21T08:52:47.0999186Z cvt.s16.s8 %rs293, %rs292; 2026-02-21T08:52:47.0999250Z shr.s16 %rs294, %rs293, 4; 2026-02-21T08:52:47.0999325Z selp.b16 %rs295, %rs275, %rs259, %p70; 2026-02-21T08:52:47.0999387Z cvt.s16.s8 %rs296, %rs295; 2026-02-21T08:52:47.0999450Z shr.s16 %rs297, %rs296, 4; 2026-02-21T08:52:47.0999520Z selp.b16 %rs298, %rs276, %rs260, %p70; 2026-02-21T08:52:47.0999586Z cvt.s16.s8 %rs299, %rs298; 2026-02-21T08:52:47.0999649Z shr.s16 %rs300, %rs299, 4; 2026-02-21T08:52:47.0999720Z selp.b16 %rs301, %rs277, %rs261, %p70; 2026-02-21T08:52:47.0999791Z cvt.s16.s8 %rs302, %rs301; 2026-02-21T08:52:47.0999853Z shr.s16 %rs303, %rs302, 4; 2026-02-21T08:52:47.0999923Z selp.b16 %rs304, %rs278, %rs262, %p70; 2026-02-21T08:52:47.0999987Z cvt.s16.s8 %rs305, %rs304; 2026-02-21T08:52:47.1000070Z shr.s16 %rs306, %rs305, 4; 2026-02-21T08:52:47.1000144Z selp.b16 %rs307, %rs279, %rs263, %p70; 2026-02-21T08:52:47.1000207Z cvt.s16.s8 %rs308, %rs307; 2026-02-21T08:52:47.1000281Z shr.s16 %rs309, %rs308, 4; 2026-02-21T08:52:47.1000353Z selp.b16 %rs310, %rs280, %rs264, %p70; 2026-02-21T08:52:47.1000416Z cvt.s16.s8 %rs311, %rs310; 2026-02-21T08:52:47.1000485Z shr.s16 %rs312, %rs311, 4; 2026-02-21T08:52:47.1000555Z selp.b16 %rs313, %rs281, %rs265, %p70; 2026-02-21T08:52:47.1000623Z cvt.s16.s8 %rs314, %rs313; 2026-02-21T08:52:47.1000688Z shr.s16 %rs315, %rs314, 4; 2026-02-21T08:52:47.1000771Z selp.b16 %rs316, %rs282, %rs266, %p70; 2026-02-21T08:52:47.1000838Z cvt.s16.s8 %rs317, %rs316; 2026-02-21T08:52:47.1000901Z shr.s16 %rs318, %rs317, 4; 2026-02-21T08:52:47.1000981Z selp.b16 %rs319, %rs283, %rs267, %p70; 2026-02-21T08:52:47.1001046Z cvt.s16.s8 %rs320, %rs319; 2026-02-21T08:52:47.1001108Z shr.s16 %rs321, %rs320, 4; 2026-02-21T08:52:47.1001182Z selp.b16 %rs322, %rs284, %rs268, %p70; 2026-02-21T08:52:47.1001253Z cvt.s16.s8 %rs323, %rs322; 2026-02-21T08:52:47.1001316Z shr.s16 %rs324, %rs323, 4; 2026-02-21T08:52:47.1001387Z selp.b16 %rs325, %rs285, 
%rs269, %p70; 2026-02-21T08:52:47.1001458Z cvt.s16.s8 %rs326, %rs325; 2026-02-21T08:52:47.1001521Z shr.s16 %rs327, %rs326, 4; 2026-02-21T08:52:47.1001590Z selp.b16 %rs328, %rs286, %rs270, %p70; 2026-02-21T08:52:47.1001651Z cvt.s16.s8 %rs329, %rs328; 2026-02-21T08:52:47.1001719Z shr.s16 %rs330, %rs329, 4; 2026-02-21T08:52:47.1001788Z selp.b16 %rs331, %rs287, %rs271, %p70; 2026-02-21T08:52:47.1001849Z cvt.s16.s8 %rs332, %rs331; 2026-02-21T08:52:47.1001916Z shr.s16 %rs333, %rs332, 4; 2026-02-21T08:52:47.1001985Z selp.b16 %rs334, %rs288, %rs272, %p70; 2026-02-21T08:52:47.1002047Z cvt.s16.s8 %rs335, %rs334; 2026-02-21T08:52:47.1002114Z shr.s16 %rs336, %rs335, 4; 2026-02-21T08:52:47.1002318Z .loc 1 77 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:77:32 2026-02-21T08:52:47.1002387Z cvt.rn.f32.s16 %r1667, %rs291; 2026-02-21T08:52:47.1002451Z cvt.rn.f32.s16 %r1668, %rs294; 2026-02-21T08:52:47.1002520Z cvt.rn.f32.s16 %r1669, %rs297; 2026-02-21T08:52:47.1002707Z cvt.rn.f32.s16 %r1670, %rs300; 2026-02-21T08:52:47.1002771Z cvt.rn.f32.s16 %r1671, %rs303; 2026-02-21T08:52:47.1002839Z cvt.rn.f32.s16 %r1672, %rs306; 2026-02-21T08:52:47.1002907Z cvt.rn.f32.s16 %r1673, %rs309; 2026-02-21T08:52:47.1002971Z cvt.rn.f32.s16 %r1674, %rs312; 2026-02-21T08:52:47.1003035Z cvt.rn.f32.s16 %r1675, %rs315; 2026-02-21T08:52:47.1003102Z cvt.rn.f32.s16 %r1676, %rs318; 2026-02-21T08:52:47.1003165Z cvt.rn.f32.s16 %r1677, %rs321; 2026-02-21T08:52:47.1003228Z cvt.rn.f32.s16 %r1678, %rs324; 2026-02-21T08:52:47.1003296Z cvt.rn.f32.s16 %r1679, %rs327; 2026-02-21T08:52:47.1003359Z cvt.rn.f32.s16 %r1680, %rs330; 2026-02-21T08:52:47.1003422Z cvt.rn.f32.s16 %r1681, %rs333; 2026-02-21T08:52:47.1003485Z cvt.rn.f32.s16 %r1682, %rs336; 2026-02-21T08:52:47.1003652Z st.shared.b32 [%r39], %r1667; 2026-02-21T08:52:47.1003726Z st.shared.b32 [%r39+16384], %r1675; 2026-02-21T08:52:47.1003795Z st.shared.b32 [%r40], %r1668; 2026-02-21T08:52:47.1003867Z st.shared.b32 [%r40+16384], %r1676; 
2026-02-21T08:52:47.1003936Z st.shared.b32 [%r41], %r1669; 2026-02-21T08:52:47.1004003Z st.shared.b32 [%r41+16384], %r1677; 2026-02-21T08:52:47.1004071Z st.shared.b32 [%r42], %r1670; 2026-02-21T08:52:47.1004136Z st.shared.b32 [%r42+16384], %r1678; 2026-02-21T08:52:47.1004199Z st.shared.b32 [%r43], %r1671; 2026-02-21T08:52:47.1004265Z st.shared.b32 [%r43+16384], %r1679; 2026-02-21T08:52:47.1004343Z st.shared.b32 [%r44], %r1672; 2026-02-21T08:52:47.1004417Z st.shared.b32 [%r44+16384], %r1680; 2026-02-21T08:52:47.1004483Z st.shared.b32 [%r45], %r1673; 2026-02-21T08:52:47.1004553Z st.shared.b32 [%r45+16384], %r1681; 2026-02-21T08:52:47.1004618Z st.shared.b32 [%r46], %r1674; 2026-02-21T08:52:47.1004683Z st.shared.b32 [%r46+16384], %r1682; 2026-02-21T08:52:47.1004740Z $L__tmp5: 2026-02-21T08:52:47.1005031Z .loc 2 291 36 // standard.py:291:36 @[ c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:84:40 ] 2026-02-21T08:52:47.1005104Z // begin inline asm 2026-02-21T08:52:47.1005190Z fence.proxy.async.shared::cta; 2026-02-21T08:52:47.1005255Z // end inline asm 2026-02-21T08:52:47.1005313Z bar.sync 0; 2026-02-21T08:52:47.1005396Z shfl.sync.idx.b32 %r1683, %r4, 0, 31, -1; 2026-02-21T08:52:47.1005479Z wgmma.fence.sync.aligned; 2026-02-21T08:52:47.1005544Z shl.b32 %r1684, %r1683, 10; 2026-02-21T08:52:47.1005607Z and.b32 %r1685, %r1684, 12288; 2026-02-21T08:52:47.1005671Z add.s32 %r1686, %r1685, %r2696; 2026-02-21T08:52:47.1005739Z bfe.u32 %r1687, %r1686, 4, 14; 2026-02-21T08:52:47.1005804Z cvt.u64.u32 %rd143, %r1687; 2026-02-21T08:52:47.1005883Z or.b64 %rd132, %rd143, 4611686293372403712; 2026-02-21T08:52:47.1005955Z mov.pred %p28, -1; 2026-02-21T08:52:47.1006017Z // begin inline asm 2026-02-21T08:52:47.1006654Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2753,%r2754,%r2755,%r2756,%r2757,%r2758,%r2759,%r2760,%r2761,%r2762,%r2763,%r2764,%r2765,%r2766,%r2767,%r2768}, {%r1353,%r1354,%r1355,%r1356}, %rd132, %p28, 1, 1; 2026-02-21T08:52:47.1006729Z // end inline asm 
2026-02-21T08:52:47.1006795Z add.s32 %r1688, %r1686, 32; 2026-02-21T08:52:47.1006859Z bfe.u32 %r1689, %r1688, 4, 14; 2026-02-21T08:52:47.1006923Z cvt.u64.u32 %rd144, %r1689; 2026-02-21T08:52:47.1007006Z or.b64 %rd133, %rd144, 4611686293372403712; 2026-02-21T08:52:47.1007068Z // begin inline asm 2026-02-21T08:52:47.1007571Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2753,%r2754,%r2755,%r2756,%r2757,%r2758,%r2759,%r2760,%r2761,%r2762,%r2763,%r2764,%r2765,%r2766,%r2767,%r2768}, {%r1389,%r1390,%r1391,%r1392}, %rd133, %p28, 1, 1; 2026-02-21T08:52:47.1007636Z // end inline asm 2026-02-21T08:52:47.1007699Z add.s32 %r1690, %r1686, 64; 2026-02-21T08:52:47.1007762Z bfe.u32 %r1691, %r1690, 4, 14; 2026-02-21T08:52:47.1007831Z cvt.u64.u32 %rd145, %r1691; 2026-02-21T08:52:47.1007909Z or.b64 %rd134, %rd145, 4611686293372403712; 2026-02-21T08:52:47.1007972Z // begin inline asm 2026-02-21T08:52:47.1008473Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2753,%r2754,%r2755,%r2756,%r2757,%r2758,%r2759,%r2760,%r2761,%r2762,%r2763,%r2764,%r2765,%r2766,%r2767,%r2768}, {%r1425,%r1426,%r1427,%r1428}, %rd134, %p28, 1, 1; 2026-02-21T08:52:47.1013802Z // end inline asm 2026-02-21T08:52:47.1013897Z add.s32 %r1692, %r1686, 96; 2026-02-21T08:52:47.1013970Z bfe.u32 %r1693, %r1692, 4, 14; 2026-02-21T08:52:47.1014051Z cvt.u64.u32 %rd146, %r1693; 2026-02-21T08:52:47.1014141Z or.b64 %rd135, %rd146, 4611686293372403712; 2026-02-21T08:52:47.1014206Z // begin inline asm 2026-02-21T08:52:47.1014734Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2753,%r2754,%r2755,%r2756,%r2757,%r2758,%r2759,%r2760,%r2761,%r2762,%r2763,%r2764,%r2765,%r2766,%r2767,%r2768}, {%r1461,%r1462,%r1463,%r1464}, %rd135, %p28, 1, 1; 2026-02-21T08:52:47.1014804Z // end inline asm 2026-02-21T08:52:47.1015079Z add.s32 %r1694, %r1686, 16384; 2026-02-21T08:52:47.1015159Z bfe.u32 %r1695, %r1694, 4, 14; 2026-02-21T08:52:47.1015236Z cvt.u64.u32 %rd147, %r1695; 2026-02-21T08:52:47.1015326Z or.b64 %rd136, %rd147, 
4611686293372403712; 2026-02-21T08:52:47.1015392Z // begin inline asm 2026-02-21T08:52:47.1015910Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2753,%r2754,%r2755,%r2756,%r2757,%r2758,%r2759,%r2760,%r2761,%r2762,%r2763,%r2764,%r2765,%r2766,%r2767,%r2768}, {%r1497,%r1498,%r1499,%r1500}, %rd136, %p28, 1, 1; 2026-02-21T08:52:47.1015978Z // end inline asm 2026-02-21T08:52:47.1016044Z add.s32 %r1696, %r1686, 16416; 2026-02-21T08:52:47.1016105Z bfe.u32 %r1697, %r1696, 4, 14; 2026-02-21T08:52:47.1016173Z cvt.u64.u32 %rd148, %r1697; 2026-02-21T08:52:47.1016251Z or.b64 %rd137, %rd148, 4611686293372403712; 2026-02-21T08:52:47.1016311Z // begin inline asm 2026-02-21T08:52:47.1016999Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2753,%r2754,%r2755,%r2756,%r2757,%r2758,%r2759,%r2760,%r2761,%r2762,%r2763,%r2764,%r2765,%r2766,%r2767,%r2768}, {%r1533,%r1534,%r1535,%r1536}, %rd137, %p28, 1, 1; 2026-02-21T08:52:47.1017067Z // end inline asm 2026-02-21T08:52:47.1017133Z add.s32 %r1698, %r1686, 16448; 2026-02-21T08:52:47.1017215Z bfe.u32 %r1699, %r1698, 4, 14; 2026-02-21T08:52:47.1017279Z cvt.u64.u32 %rd149, %r1699; 2026-02-21T08:52:47.1017356Z or.b64 %rd138, %rd149, 4611686293372403712; 2026-02-21T08:52:47.1017418Z // begin inline asm 2026-02-21T08:52:47.1017911Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2753,%r2754,%r2755,%r2756,%r2757,%r2758,%r2759,%r2760,%r2761,%r2762,%r2763,%r2764,%r2765,%r2766,%r2767,%r2768}, {%r1569,%r1570,%r1571,%r1572}, %rd138, %p28, 1, 1; 2026-02-21T08:52:47.1017969Z // end inline asm 2026-02-21T08:52:47.1018031Z add.s32 %r1700, %r1686, 16480; 2026-02-21T08:52:47.1018098Z bfe.u32 %r1701, %r1700, 4, 14; 2026-02-21T08:52:47.1018163Z cvt.u64.u32 %rd150, %r1701; 2026-02-21T08:52:47.1018253Z or.b64 %rd139, %rd150, 4611686293372403712; 2026-02-21T08:52:47.1018320Z // begin inline asm 2026-02-21T08:52:47.1018819Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r2753,%r2754,%r2755,%r2756,%r2757,%r2758,%r2759,%r2760,%r2761,%r2762,%r2763,%r2764,%r2765,%r2766,%r2767,%r2768}, {%r1605,%r1606,%r1607,%r1608}, %rd139, %p28, 1, 1; 2026-02-21T08:52:47.1018880Z // end inline asm 2026-02-21T08:52:47.1018964Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:47.1019023Z mov.b32 %r1627, 0; 2026-02-21T08:52:47.1019083Z mov.b32 %r1626, %r1627; 2026-02-21T08:52:47.1019144Z mov.b32 %r1625, %r2696; 2026-02-21T08:52:47.1019213Z // begin inline asm 2026-02-21T08:52:47.1019525Z // wait for regs: %r2753,%r2754,%r2755,%r2756,%r2757,%r2758,%r2759,%r2760,%r2761,%r2762,%r2763,%r2764,%r2765,%r2766,%r2767,%r2768,%r1625,%r1626,%r1627 2026-02-21T08:52:47.1019607Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:47.1019671Z // end inline asm 2026-02-21T08:52:47.1019728Z $L__tmp6: 2026-02-21T08:52:47.1019952Z .loc 1 40 71 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:40:71 2026-02-21T08:52:47.1020027Z add.s32 %r1702, %r2752, 1; 2026-02-21T08:52:47.1020097Z setp.gt.s32 %p39, %r1702, 1; 2026-02-21T08:52:47.1020323Z selp.b32 %r2752, 0, %r1702, %p39; 2026-02-21T08:52:47.1020545Z .loc 1 48 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:32 2026-02-21T08:52:47.1020631Z mad.wide.s32 %rd141, %r2749, 2, %rd33; 2026-02-21T08:52:47.1020840Z .loc 1 48 80 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:80 2026-02-21T08:52:47.1020905Z shl.b32 %r1703, %r2752, 12; 2026-02-21T08:52:47.1020973Z shl.b32 %r1704, %r2752, 13; 2026-02-21T08:52:47.1021037Z add.s32 %r1705, %r750, %r1704; 2026-02-21T08:52:47.1021100Z add.s32 %r1647, %r1705, %r10; 2026-02-21T08:52:47.1021170Z selp.b32 %r1648, 8, 0, %p37; 2026-02-21T08:52:47.1021238Z // begin inline asm 2026-02-21T08:52:47.1021502Z cp.async.ca.shared.global [ %r1647 + 0 ], [ %rd240 + 0 ], 0x8, %r1648; 2026-02-21T08:52:47.1021569Z // end inline asm 2026-02-21T08:52:47.1021633Z add.s32 %r1649, %r1647, 4096; 2026-02-21T08:52:47.1021699Z // begin inline asm 2026-02-21T08:52:47.1021845Z 
cp.async.ca.shared.global [ %r1649 + 0 ], [ %rd141 + 0 ], 0x8, %r1648; 2026-02-21T08:52:47.1021902Z // end inline asm 2026-02-21T08:52:47.1021980Z cp.async.commit_group; 2026-02-21T08:52:47.1022185Z .loc 1 54 34 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:34 2026-02-21T08:52:47.1022253Z cvt.s64.s32 %rd151, %r2750; 2026-02-21T08:52:47.1022319Z add.s64 %rd142, %rd34, %rd151; 2026-02-21T08:52:47.1022526Z .loc 1 54 87 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:87 2026-02-21T08:52:47.1022591Z add.s32 %r1651, %r14, %r1703; 2026-02-21T08:52:47.1022654Z // begin inline asm 2026-02-21T08:52:47.1022795Z cp.async.ca.shared.global [ %r1651 + 0 ], [ %rd142 + 0 ], 0x8, %r1648; 2026-02-21T08:52:47.1022853Z // end inline asm 2026-02-21T08:52:47.1022923Z cp.async.commit_group; 2026-02-21T08:52:47.1023132Z .loc 1 40 71 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:40:71 2026-02-21T08:52:47.1023201Z add.s32 %r2750, %r2750, 229376; 2026-02-21T08:52:47.1023267Z add.s64 %rd240, %rd240, 128; 2026-02-21T08:52:47.1023329Z add.s32 %r2749, %r2749, 64; 2026-02-21T08:52:47.1023422Z setp.lt.u64 %p40, %rd241, 4064; 2026-02-21T08:52:47.1023486Z @%p40 bra $L__BB0_7; 2026-02-21T08:52:47.1023601Z // %bb.8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:47.1023679Z cp.async.wait_group 0; 2026-02-21T08:52:47.1023738Z bar.sync 0; 2026-02-21T08:52:47.1023940Z .loc 1 87 28 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:87:28 2026-02-21T08:52:47.1024027Z cvt.rn.bf16x2.f32 %r1739, %r2754, %r2753; 2026-02-21T08:52:47.1024103Z cvt.rn.bf16x2.f32 %r1740, %r2756, %r2755; 2026-02-21T08:52:47.1024177Z cvt.rn.bf16x2.f32 %r1741, %r2758, %r2757; 2026-02-21T08:52:47.1024251Z cvt.rn.bf16x2.f32 %r1742, %r2760, %r2759; 2026-02-21T08:52:47.1024328Z cvt.rn.bf16x2.f32 %r1743, %r2762, %r2761; 2026-02-21T08:52:47.1024398Z cvt.rn.bf16x2.f32 %r1744, %r2764, %r2763; 2026-02-21T08:52:47.1024471Z cvt.rn.bf16x2.f32 %r1745, %r2766, %r2765; 2026-02-21T08:52:47.1024548Z 
cvt.rn.bf16x2.f32 %r1746, %r2768, %r2767; 2026-02-21T08:52:47.1024761Z .loc 1 88 50 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:88:50 2026-02-21T08:52:47.1024835Z mad.lo.s32 %r1747, %r172, 7168, %r174; 2026-02-21T08:52:47.1024909Z mad.lo.s32 %r1748, %r173, 7168, %r174; 2026-02-21T08:52:47.1025120Z .loc 1 88 22 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:88:22 2026-02-21T08:52:47.1025191Z mad.wide.s32 %rd152, %r1747, 2, %rd35; 2026-02-21T08:52:47.1025263Z mad.wide.s32 %rd153, %r1748, 2, %rd35; 2026-02-21T08:52:47.1025474Z .loc 1 88 81 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:88:81 2026-02-21T08:52:47.1025597Z st.shared.v4.b32 [%r36], {%r1739, %r1741, %r1743, %r1745}; 2026-02-21T08:52:47.1025715Z st.shared.v4.b32 [%r36+512], {%r1740, %r1742, %r1744, %r1746}; 2026-02-21T08:52:47.1025893Z bar.sync 0; 2026-02-21T08:52:47.1025958Z // begin inline asm 2026-02-21T08:52:47.1026153Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1716, %r1717, %r1718, %r1719}, [%r804]; 2026-02-21T08:52:47.1026219Z // end inline asm 2026-02-21T08:52:47.1026279Z // begin inline asm 2026-02-21T08:52:47.1026590Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1720, %r1721, %r1722, %r1723}, [%r809]; 2026-02-21T08:52:47.1026654Z // end inline asm 2026-02-21T08:52:47.1026732Z // begin inline asm 2026-02-21T08:52:47.1026865Z st.global.v4.b32 [ %rd152 + 0 ], { %r1716, %r1717, %r1718, %r1719 }; 2026-02-21T08:52:47.1026924Z // end inline asm 2026-02-21T08:52:47.1026991Z // begin inline asm 2026-02-21T08:52:47.1027113Z st.global.v4.b32 [ %rd153 + 0 ], { %r1720, %r1721, %r1722, %r1723 }; 2026-02-21T08:52:47.1027327Z // end inline asm 2026-02-21T08:52:47.1027555Z .loc 1 19 144 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:19:144 2026-02-21T08:52:47.1027626Z add.s32 %r1749, %r2708, 12672; 2026-02-21T08:52:47.1027832Z .loc 1 25 35 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:25:35 2026-02-21T08:52:47.1027908Z mul.hi.s32 %r1750, %r1749, 
-1840700269; 2026-02-21T08:52:47.1027980Z add.s32 %r1751, %r1750, %r1749; 2026-02-21T08:52:47.1028044Z shr.u32 %r1752, %r1751, 31; 2026-02-21T08:52:47.1028109Z shr.s32 %r1753, %r1751, 6; 2026-02-21T08:52:47.1028179Z add.s32 %r1754, %r1753, %r1752; 2026-02-21T08:52:47.1028383Z .loc 1 26 33 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:26:33 2026-02-21T08:52:47.1028444Z shl.b32 %r1755, %r1754, 1; 2026-02-21T08:52:47.1028738Z .loc 1 27 39 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:27:39 2026-02-21T08:52:47.1028806Z sub.s32 %r1756, 1, %r1755; 2026-02-21T08:52:47.1029007Z .loc 1 27 52 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:27:52 2026-02-21T08:52:47.1029070Z min.s32 %r1757, %r1756, 2; 2026-02-21T08:52:47.1029273Z .loc 1 28 45 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:28:45 2026-02-21T08:52:47.1029338Z mul.lo.s32 %r1758, %r1754, 112; 2026-02-21T08:52:47.1029400Z sub.s32 %r1759, %r1749, %r1758; 2026-02-21T08:52:47.1029602Z .loc 1 29 51 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:29:51 2026-02-21T08:52:47.1029665Z div.s32 %r1760, %r1759, %r1757; 2026-02-21T08:52:47.1029862Z .loc 1 28 64 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:28:64 2026-02-21T08:52:47.1029935Z mul.lo.s32 %r1761, %r1760, %r1757; 2026-02-21T08:52:47.1030001Z sub.s32 %r1762, %r1759, %r1761; 2026-02-21T08:52:47.1030200Z .loc 1 28 30 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:28:30 2026-02-21T08:52:47.1030270Z add.s32 %r1763, %r1762, %r1755; 2026-02-21T08:52:47.1030466Z .loc 1 30 27 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:30:27 2026-02-21T08:52:47.1030532Z shl.b32 %r1764, %r1763, 6; 2026-02-21T08:52:47.1030729Z .loc 1 31 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:31:32 2026-02-21T08:52:47.1030799Z or.b32 %r217, %r1764, %r5; 2026-02-21T08:52:47.1030861Z or.b32 %r218, %r1764, %r6; 2026-02-21T08:52:47.1031056Z .loc 1 32 27 // 
c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:32:27 2026-02-21T08:52:47.1031126Z shl.b32 %r1765, %r1760, 7; 2026-02-21T08:52:47.1031321Z .loc 1 33 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:33:32 2026-02-21T08:52:47.1031396Z or.b32 %r219, %r1765, %r8; 2026-02-21T08:52:47.1031603Z .loc 1 48 53 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:53 2026-02-21T08:52:47.1031666Z shl.b32 %r1766, %r217, 13; 2026-02-21T08:52:47.1031726Z shl.b32 %r1767, %r218, 13; 2026-02-21T08:52:47.1031922Z .loc 1 48 60 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:60 2026-02-21T08:52:47.1032129Z or.b32 %r1768, %r1766, %r7; 2026-02-21T08:52:47.1032194Z or.b32 %r1769, %r1767, %r7; 2026-02-21T08:52:47.1032391Z .loc 1 48 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:32 2026-02-21T08:52:47.1032472Z mad.wide.s32 %rd154, %r1768, 2, %rd33; 2026-02-21T08:52:47.1032543Z mad.wide.s32 %rd155, %r1769, 2, %rd33; 2026-02-21T08:52:47.1032601Z mov.b32 %r1725, 8; 2026-02-21T08:52:47.1032809Z .loc 1 48 80 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:80 2026-02-21T08:52:47.1032871Z // begin inline asm 2026-02-21T08:52:47.1033016Z cp.async.ca.shared.global [ %r11 + 0 ], [ %rd154 + 0 ], 0x8, %r1725; 2026-02-21T08:52:47.1033170Z // end inline asm 2026-02-21T08:52:47.1033240Z // begin inline asm 2026-02-21T08:52:47.1033373Z cp.async.ca.shared.global [ %r12 + 0 ], [ %rd155 + 0 ], 0x8, %r1725; 2026-02-21T08:52:47.1033437Z // end inline asm 2026-02-21T08:52:47.1033513Z cp.async.commit_group; 2026-02-21T08:52:47.1033715Z .loc 1 54 62 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:62 2026-02-21T08:52:47.1033781Z add.s32 %r1770, %r219, %r2697; 2026-02-21T08:52:47.1033984Z .loc 1 54 34 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:34 2026-02-21T08:52:47.1034049Z cvt.s64.s32 %rd161, %r1770; 2026-02-21T08:52:47.1034114Z add.s64 %rd156, %rd34, %rd161; 2026-02-21T08:52:47.1034311Z .loc 1 54 87 // 
c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:87 2026-02-21T08:52:47.1034377Z // begin inline asm 2026-02-21T08:52:47.1034507Z cp.async.ca.shared.global [ %r14 + 0 ], [ %rd156 + 0 ], 0x8, %r1725; 2026-02-21T08:52:47.1034569Z // end inline asm 2026-02-21T08:52:47.1034644Z cp.async.commit_group; 2026-02-21T08:52:47.1034842Z .loc 1 48 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:32 2026-02-21T08:52:47.1034911Z cvt.s64.s32 %rd162, %r1766; 2026-02-21T08:52:47.1034985Z or.b64 %rd163, %rd162, %rd235; 2026-02-21T08:52:47.1035052Z shl.b64 %rd164, %rd163, 1; 2026-02-21T08:52:47.1035131Z add.s64 %rd165, %rd33, %rd164; 2026-02-21T08:52:47.1035197Z add.s64 %rd157, %rd165, 128; 2026-02-21T08:52:47.1035270Z cvt.s64.s32 %rd166, %r1767; 2026-02-21T08:52:47.1035340Z or.b64 %rd167, %rd166, %rd235; 2026-02-21T08:52:47.1035402Z shl.b64 %rd168, %rd167, 1; 2026-02-21T08:52:47.1035475Z add.s64 %rd169, %rd33, %rd168; 2026-02-21T08:52:47.1035537Z add.s64 %rd158, %rd169, 128; 2026-02-21T08:52:47.1035738Z .loc 1 48 80 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:80 2026-02-21T08:52:47.1035801Z bar.sync 0; 2026-02-21T08:52:47.1035864Z // begin inline asm 2026-02-21T08:52:47.1035997Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd157 + 0 ], 0x8, %r1725; 2026-02-21T08:52:47.1036058Z // end inline asm 2026-02-21T08:52:47.1036126Z // begin inline asm 2026-02-21T08:52:47.1036268Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd158 + 0 ], 0x8, %r1725; 2026-02-21T08:52:47.1036327Z // end inline asm 2026-02-21T08:52:47.1036401Z cp.async.commit_group; 2026-02-21T08:52:47.1036718Z .loc 1 54 62 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:62 2026-02-21T08:52:47.1036784Z add.s32 %r1771, %r219, %r2698; 2026-02-21T08:52:47.1036983Z .loc 1 54 34 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:34 2026-02-21T08:52:47.1037052Z cvt.s64.s32 %rd170, %r1771; 2026-02-21T08:52:47.1037115Z add.s64 %rd159, %rd34, %rd170; 
2026-02-21T08:52:47.1037309Z .loc 1 54 87 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:87 2026-02-21T08:52:47.1037379Z // begin inline asm 2026-02-21T08:52:47.1037510Z cp.async.ca.shared.global [ %r18 + 0 ], [ %rd159 + 0 ], 0x8, %r1725; 2026-02-21T08:52:47.1037568Z // end inline asm 2026-02-21T08:52:47.1037776Z cp.async.commit_group; 2026-02-21T08:52:47.1037972Z .loc 1 40 71 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:40:71 2026-02-21T08:52:47.1038037Z add.s32 %r2770, %r47, %r1765; 2026-02-21T08:52:47.1038097Z shl.b32 %r1772, %r1763, 19; 2026-02-21T08:52:47.1038165Z or.b32 %r1773, %r48, %r1772; 2026-02-21T08:52:47.1038237Z mad.wide.s32 %rd242, %r1773, 2, %rd1; 2026-02-21T08:52:47.1038298Z or.b32 %r2769, %r49, %r1772; 2026-02-21T08:52:47.1038366Z mov.b32 %r2773, 0f00000000; 2026-02-21T08:52:47.1038424Z mov.b32 %r2772, 1; 2026-02-21T08:52:47.1038486Z mov.b32 %r2771, -1; 2026-02-21T08:52:47.1038549Z mov.b64 %rd243, -32; 2026-02-21T08:52:47.1038619Z mov.b32 %r2774, %r2773; 2026-02-21T08:52:47.1038679Z mov.b32 %r2775, %r2773; 2026-02-21T08:52:47.1038862Z mov.b32 %r2776, %r2773; 2026-02-21T08:52:47.1038933Z mov.b32 %r2777, %r2773; 2026-02-21T08:52:47.1038992Z mov.b32 %r2778, %r2773; 2026-02-21T08:52:47.1039050Z mov.b32 %r2779, %r2773; 2026-02-21T08:52:47.1039119Z mov.b32 %r2780, %r2773; 2026-02-21T08:52:47.1039179Z mov.b32 %r2781, %r2773; 2026-02-21T08:52:47.1039239Z mov.b32 %r2782, %r2773; 2026-02-21T08:52:47.1039297Z mov.b32 %r2783, %r2773; 2026-02-21T08:52:47.1039360Z mov.b32 %r2784, %r2773; 2026-02-21T08:52:47.1039420Z mov.b32 %r2785, %r2773; 2026-02-21T08:52:47.1039476Z mov.b32 %r2786, %r2773; 2026-02-21T08:52:47.1039543Z mov.b32 %r2787, %r2773; 2026-02-21T08:52:47.1039602Z mov.b32 %r2788, %r2773; 2026-02-21T08:52:47.1039717Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:47.1039825Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:47.1039906Z add.s64 %rd243, %rd243, 32; 2026-02-21T08:52:47.1039981Z setp.lt.u64 %p50, 
%rd243, 4032; 2026-02-21T08:52:47.1040045Z add.s32 %r2106, %r2771, 1; 2026-02-21T08:52:47.1040120Z setp.gt.s32 %p51, %r2106, 1; 2026-02-21T08:52:47.1040191Z selp.b32 %r2771, 0, %r2106, %p51; 2026-02-21T08:52:47.1040399Z .loc 1 48 80 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:80 2026-02-21T08:52:47.1040476Z cp.async.wait_group 2; 2026-02-21T08:52:47.1040533Z bar.sync 0; 2026-02-21T08:52:47.1040594Z shl.b32 %r2107, %r2771, 12; 2026-02-21T08:52:47.1040655Z shl.b32 %r2108, %r2771, 13; 2026-02-21T08:52:47.1040726Z add.s32 %r2110, %r750, %r2108; 2026-02-21T08:52:47.1040929Z .loc 1 52 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:52:32 2026-02-21T08:52:47.1040993Z add.s32 %r2111, %r2110, %r19; 2026-02-21T08:52:47.1041068Z ld.shared.b16 %rs337, [%r2111]; 2026-02-21T08:52:47.1041140Z ld.shared.b16 %rs338, [%r2111+1024]; 2026-02-21T08:52:47.1041209Z ld.shared.b16 %rs339, [%r2111+64]; 2026-02-21T08:52:47.1041280Z ld.shared.b16 %rs340, [%r2111+1088]; 2026-02-21T08:52:47.1041349Z add.s32 %r2112, %r2110, %r20; 2026-02-21T08:52:47.1041416Z ld.shared.b16 %rs341, [%r2112]; 2026-02-21T08:52:47.1041485Z ld.shared.b16 %rs342, [%r2112+1024]; 2026-02-21T08:52:47.1041562Z ld.shared.b16 %rs343, [%r2112+64]; 2026-02-21T08:52:47.1041629Z ld.shared.b16 %rs344, [%r2112+1088]; 2026-02-21T08:52:47.1041693Z add.s32 %r2113, %r2110, %r21; 2026-02-21T08:52:47.1041767Z ld.shared.b16 %rs345, [%r2113]; 2026-02-21T08:52:47.1041834Z ld.shared.b16 %rs346, [%r2113+1024]; 2026-02-21T08:52:47.1041900Z ld.shared.b16 %rs347, [%r2113+64]; 2026-02-21T08:52:47.1041968Z ld.shared.b16 %rs348, [%r2113+1088]; 2026-02-21T08:52:47.1042040Z add.s32 %r2114, %r2110, %r22; 2026-02-21T08:52:47.1042107Z ld.shared.b16 %rs349, [%r2114]; 2026-02-21T08:52:47.1042174Z ld.shared.b16 %rs350, [%r2114+1024]; 2026-02-21T08:52:47.1042247Z ld.shared.b16 %rs351, [%r2114+64]; 2026-02-21T08:52:47.1042313Z ld.shared.b16 %rs352, [%r2114+1088]; 2026-02-21T08:52:47.1042380Z add.s32 %r2115, %r2110, %r23; 
2026-02-21T08:52:47.1042458Z ld.shared.b16 %rs353, [%r2115]; 2026-02-21T08:52:47.1042535Z ld.shared.b16 %rs354, [%r2115+1024]; 2026-02-21T08:52:47.1042603Z ld.shared.b16 %rs355, [%r2115+64]; 2026-02-21T08:52:47.1042775Z ld.shared.b16 %rs356, [%r2115+1088]; 2026-02-21T08:52:47.1042845Z add.s32 %r2116, %r2110, %r24; 2026-02-21T08:52:47.1042913Z ld.shared.b16 %rs357, [%r2116]; 2026-02-21T08:52:47.1042979Z ld.shared.b16 %rs358, [%r2116+1024]; 2026-02-21T08:52:47.1043044Z ld.shared.b16 %rs359, [%r2116+64]; 2026-02-21T08:52:47.1043120Z ld.shared.b16 %rs360, [%r2116+1088]; 2026-02-21T08:52:47.1043181Z add.s32 %r2117, %r2110, %r25; 2026-02-21T08:52:47.1043248Z ld.shared.b16 %rs361, [%r2117]; 2026-02-21T08:52:47.1043320Z ld.shared.b16 %rs362, [%r2117+1024]; 2026-02-21T08:52:47.1043385Z ld.shared.b16 %rs363, [%r2117+64]; 2026-02-21T08:52:47.1043453Z ld.shared.b16 %rs364, [%r2117+1088]; 2026-02-21T08:52:47.1043521Z add.s32 %r2118, %r2110, %r26; 2026-02-21T08:52:47.1043680Z ld.shared.b16 %rs365, [%r2118]; 2026-02-21T08:52:47.1043751Z ld.shared.b16 %rs366, [%r2118+1024]; 2026-02-21T08:52:47.1043816Z ld.shared.b16 %rs367, [%r2118+64]; 2026-02-21T08:52:47.1043891Z ld.shared.b16 %rs368, [%r2118+1088]; 2026-02-21T08:52:47.1043959Z cvt.f32.bf16 %r1806, %rs337; 2026-02-21T08:52:47.1044021Z cvt.f32.bf16 %r1807, %rs338; 2026-02-21T08:52:47.1044091Z cvt.f32.bf16 %r1808, %rs341; 2026-02-21T08:52:47.1044153Z cvt.f32.bf16 %r1809, %rs342; 2026-02-21T08:52:47.1044215Z cvt.f32.bf16 %r1842, %rs345; 2026-02-21T08:52:47.1044276Z cvt.f32.bf16 %r1843, %rs346; 2026-02-21T08:52:47.1044343Z cvt.f32.bf16 %r1844, %rs349; 2026-02-21T08:52:47.1044405Z cvt.f32.bf16 %r1845, %rs350; 2026-02-21T08:52:47.1044465Z cvt.f32.bf16 %r1878, %rs353; 2026-02-21T08:52:47.1044533Z cvt.f32.bf16 %r1879, %rs354; 2026-02-21T08:52:47.1044594Z cvt.f32.bf16 %r1880, %rs357; 2026-02-21T08:52:47.1044655Z cvt.f32.bf16 %r1881, %rs358; 2026-02-21T08:52:47.1044724Z cvt.f32.bf16 %r1914, %rs361; 2026-02-21T08:52:47.1044787Z 
cvt.f32.bf16 %r1915, %rs362; 2026-02-21T08:52:47.1044850Z cvt.f32.bf16 %r1916, %rs365; 2026-02-21T08:52:47.1044910Z cvt.f32.bf16 %r1917, %rs366; 2026-02-21T08:52:47.1044979Z cvt.f32.bf16 %r1950, %rs339; 2026-02-21T08:52:47.1045040Z cvt.f32.bf16 %r1951, %rs340; 2026-02-21T08:52:47.1045113Z cvt.f32.bf16 %r1952, %rs343; 2026-02-21T08:52:47.1045179Z cvt.f32.bf16 %r1953, %rs344; 2026-02-21T08:52:47.1045240Z cvt.f32.bf16 %r1986, %rs347; 2026-02-21T08:52:47.1045407Z cvt.f32.bf16 %r1987, %rs348; 2026-02-21T08:52:47.1045474Z cvt.f32.bf16 %r1988, %rs351; 2026-02-21T08:52:47.1045546Z cvt.f32.bf16 %r1989, %rs352; 2026-02-21T08:52:47.1045611Z cvt.f32.bf16 %r2022, %rs355; 2026-02-21T08:52:47.1045672Z cvt.f32.bf16 %r2023, %rs356; 2026-02-21T08:52:47.1045741Z cvt.f32.bf16 %r2024, %rs359; 2026-02-21T08:52:47.1045802Z cvt.f32.bf16 %r2025, %rs360; 2026-02-21T08:52:47.1045862Z cvt.f32.bf16 %r2058, %rs363; 2026-02-21T08:52:47.1045927Z cvt.f32.bf16 %r2059, %rs364; 2026-02-21T08:52:47.1045996Z cvt.f32.bf16 %r2060, %rs367; 2026-02-21T08:52:47.1046058Z cvt.f32.bf16 %r2061, %rs368; 2026-02-21T08:52:47.1046267Z .loc 1 67 45 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:67:45 2026-02-21T08:52:47.1046339Z add.s32 %r2119, %r27, %r2107; 2026-02-21T08:52:47.1046408Z ld.shared.b8 %rs369, [%r2119]; 2026-02-21T08:52:47.1046587Z ld.shared.b8 %rs370, [%r2119+256]; 2026-02-21T08:52:47.1046674Z ld.shared.b8 %rs371, [%r2119+512]; 2026-02-21T08:52:47.1046742Z ld.shared.b8 %rs372, [%r2119+768]; 2026-02-21T08:52:47.1046815Z ld.shared.b8 %rs373, [%r2119+1024]; 2026-02-21T08:52:47.1046886Z ld.shared.b8 %rs374, [%r2119+1280]; 2026-02-21T08:52:47.1046958Z ld.shared.b8 %rs375, [%r2119+1536]; 2026-02-21T08:52:47.1047025Z ld.shared.b8 %rs376, [%r2119+1792]; 2026-02-21T08:52:47.1047091Z ld.shared.b8 %rs377, [%r2119+2048]; 2026-02-21T08:52:47.1047163Z ld.shared.b8 %rs378, [%r2119+2304]; 2026-02-21T08:52:47.1047232Z ld.shared.b8 %rs379, [%r2119+2560]; 2026-02-21T08:52:47.1047298Z ld.shared.b8 %rs380, 
[%r2119+2816]; 2026-02-21T08:52:47.1047363Z ld.shared.b8 %rs381, [%r2119+3072]; 2026-02-21T08:52:47.1047436Z ld.shared.b8 %rs382, [%r2119+3328]; 2026-02-21T08:52:47.1047639Z ld.shared.b8 %rs383, [%r2119+3584]; 2026-02-21T08:52:47.1047706Z ld.shared.b8 %rs384, [%r2119+3840]; 2026-02-21T08:52:47.1047927Z .loc 1 57 28 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:57:28 2026-02-21T08:52:47.1047994Z shl.b16 %rs385, %rs369, 4; 2026-02-21T08:52:47.1048057Z shl.b16 %rs386, %rs370, 4; 2026-02-21T08:52:47.1048125Z shl.b16 %rs387, %rs371, 4; 2026-02-21T08:52:47.1048187Z shl.b16 %rs388, %rs372, 4; 2026-02-21T08:52:47.1048249Z shl.b16 %rs389, %rs373, 4; 2026-02-21T08:52:47.1048310Z shl.b16 %rs390, %rs374, 4; 2026-02-21T08:52:47.1048378Z shl.b16 %rs391, %rs375, 4; 2026-02-21T08:52:47.1048439Z shl.b16 %rs392, %rs376, 4; 2026-02-21T08:52:47.1048500Z shl.b16 %rs393, %rs377, 4; 2026-02-21T08:52:47.1048714Z shl.b16 %rs394, %rs378, 4; 2026-02-21T08:52:47.1048780Z shl.b16 %rs395, %rs379, 4; 2026-02-21T08:52:47.1048843Z shl.b16 %rs396, %rs380, 4; 2026-02-21T08:52:47.1048903Z shl.b16 %rs397, %rs381, 4; 2026-02-21T08:52:47.1048976Z shl.b16 %rs398, %rs382, 4; 2026-02-21T08:52:47.1049036Z shl.b16 %rs399, %rs383, 4; 2026-02-21T08:52:47.1049097Z shl.b16 %rs400, %rs384, 4; 2026-02-21T08:52:47.1049308Z .loc 1 72 58 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:72:58 2026-02-21T08:52:47.1049387Z selp.b16 %rs401, %rs385, %rs369, %p70; 2026-02-21T08:52:47.1049450Z cvt.s16.s8 %rs402, %rs401; 2026-02-21T08:52:47.1049512Z shr.s16 %rs403, %rs402, 4; 2026-02-21T08:52:47.1049594Z selp.b16 %rs404, %rs386, %rs370, %p70; 2026-02-21T08:52:47.1049656Z cvt.s16.s8 %rs405, %rs404; 2026-02-21T08:52:47.1049724Z shr.s16 %rs406, %rs405, 4; 2026-02-21T08:52:47.1049793Z selp.b16 %rs407, %rs387, %rs371, %p70; 2026-02-21T08:52:47.1049855Z cvt.s16.s8 %rs408, %rs407; 2026-02-21T08:52:47.1049919Z shr.s16 %rs409, %rs408, 4; 2026-02-21T08:52:47.1049991Z selp.b16 %rs410, %rs388, %rs372, %p70; 
2026-02-21T08:52:47.1050051Z cvt.s16.s8 %rs411, %rs410; 2026-02-21T08:52:47.1050115Z shr.s16 %rs412, %rs411, 4; 2026-02-21T08:52:47.1050191Z selp.b16 %rs413, %rs389, %rs373, %p70; 2026-02-21T08:52:47.1050261Z cvt.s16.s8 %rs414, %rs413; 2026-02-21T08:52:47.1050322Z shr.s16 %rs415, %rs414, 4; 2026-02-21T08:52:47.1050390Z selp.b16 %rs416, %rs390, %rs374, %p70; 2026-02-21T08:52:47.1050456Z cvt.s16.s8 %rs417, %rs416; 2026-02-21T08:52:47.1050517Z shr.s16 %rs418, %rs417, 4; 2026-02-21T08:52:47.1050585Z selp.b16 %rs419, %rs391, %rs375, %p70; 2026-02-21T08:52:47.1050654Z cvt.s16.s8 %rs420, %rs419; 2026-02-21T08:52:47.1050715Z shr.s16 %rs421, %rs420, 4; 2026-02-21T08:52:47.1050781Z selp.b16 %rs422, %rs392, %rs376, %p70; 2026-02-21T08:52:47.1050843Z cvt.s16.s8 %rs423, %rs422; 2026-02-21T08:52:47.1050909Z shr.s16 %rs424, %rs423, 4; 2026-02-21T08:52:47.1050982Z selp.b16 %rs425, %rs393, %rs377, %p70; 2026-02-21T08:52:47.1051044Z cvt.s16.s8 %rs426, %rs425; 2026-02-21T08:52:47.1051109Z shr.s16 %rs427, %rs426, 4; 2026-02-21T08:52:47.1051176Z selp.b16 %rs428, %rs394, %rs378, %p70; 2026-02-21T08:52:47.1051240Z cvt.s16.s8 %rs429, %rs428; 2026-02-21T08:52:47.1051301Z shr.s16 %rs430, %rs429, 4; 2026-02-21T08:52:47.1051373Z selp.b16 %rs431, %rs395, %rs379, %p70; 2026-02-21T08:52:47.1051432Z cvt.s16.s8 %rs432, %rs431; 2026-02-21T08:52:47.1051492Z shr.s16 %rs433, %rs432, 4; 2026-02-21T08:52:47.1051564Z selp.b16 %rs434, %rs396, %rs380, %p70; 2026-02-21T08:52:47.1051626Z cvt.s16.s8 %rs435, %rs434; 2026-02-21T08:52:47.1051687Z shr.s16 %rs436, %rs435, 4; 2026-02-21T08:52:47.1051760Z selp.b16 %rs437, %rs397, %rs381, %p70; 2026-02-21T08:52:47.1051823Z cvt.s16.s8 %rs438, %rs437; 2026-02-21T08:52:47.1051884Z shr.s16 %rs439, %rs438, 4; 2026-02-21T08:52:47.1051952Z selp.b16 %rs440, %rs398, %rs382, %p70; 2026-02-21T08:52:47.1052018Z cvt.s16.s8 %rs441, %rs440; 2026-02-21T08:52:47.1052080Z shr.s16 %rs442, %rs441, 4; 2026-02-21T08:52:47.1052149Z selp.b16 %rs443, %rs399, %rs383, %p70; 
2026-02-21T08:52:47.1052215Z cvt.s16.s8 %rs444, %rs443; 2026-02-21T08:52:47.1052382Z shr.s16 %rs445, %rs444, 4; 2026-02-21T08:52:47.1052451Z selp.b16 %rs446, %rs400, %rs384, %p70; 2026-02-21T08:52:47.1052512Z cvt.s16.s8 %rs447, %rs446; 2026-02-21T08:52:47.1052578Z shr.s16 %rs448, %rs447, 4; 2026-02-21T08:52:47.1052784Z .loc 1 77 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:77:32 2026-02-21T08:52:47.1052850Z cvt.rn.f32.s16 %r2120, %rs403; 2026-02-21T08:52:47.1052922Z cvt.rn.f32.s16 %r2121, %rs406; 2026-02-21T08:52:47.1052986Z cvt.rn.f32.s16 %r2122, %rs409; 2026-02-21T08:52:47.1053050Z cvt.rn.f32.s16 %r2123, %rs412; 2026-02-21T08:52:47.1053113Z cvt.rn.f32.s16 %r2124, %rs415; 2026-02-21T08:52:47.1053181Z cvt.rn.f32.s16 %r2125, %rs418; 2026-02-21T08:52:47.1053246Z cvt.rn.f32.s16 %r2126, %rs421; 2026-02-21T08:52:47.1053400Z cvt.rn.f32.s16 %r2127, %rs424; 2026-02-21T08:52:47.1053470Z cvt.rn.f32.s16 %r2128, %rs427; 2026-02-21T08:52:47.1053534Z cvt.rn.f32.s16 %r2129, %rs430; 2026-02-21T08:52:47.1053595Z cvt.rn.f32.s16 %r2130, %rs433; 2026-02-21T08:52:47.1053667Z cvt.rn.f32.s16 %r2131, %rs436; 2026-02-21T08:52:47.1053729Z cvt.rn.f32.s16 %r2132, %rs439; 2026-02-21T08:52:47.1053790Z cvt.rn.f32.s16 %r2133, %rs442; 2026-02-21T08:52:47.1053851Z cvt.rn.f32.s16 %r2134, %rs445; 2026-02-21T08:52:47.1053919Z cvt.rn.f32.s16 %r2135, %rs448; 2026-02-21T08:52:47.1053984Z st.shared.b32 [%r39], %r2120; 2026-02-21T08:52:47.1054053Z st.shared.b32 [%r39+16384], %r2128; 2026-02-21T08:52:47.1054123Z st.shared.b32 [%r40], %r2121; 2026-02-21T08:52:47.1054189Z st.shared.b32 [%r40+16384], %r2129; 2026-02-21T08:52:47.1054253Z st.shared.b32 [%r41], %r2122; 2026-02-21T08:52:47.1054319Z st.shared.b32 [%r41+16384], %r2130; 2026-02-21T08:52:47.1054388Z st.shared.b32 [%r42], %r2123; 2026-02-21T08:52:47.1054456Z st.shared.b32 [%r42+16384], %r2131; 2026-02-21T08:52:47.1054521Z st.shared.b32 [%r43], %r2124; 2026-02-21T08:52:47.1054592Z st.shared.b32 [%r43+16384], %r2132; 
2026-02-21T08:52:47.1054655Z st.shared.b32 [%r44], %r2125; 2026-02-21T08:52:47.1054724Z st.shared.b32 [%r44+16384], %r2133; 2026-02-21T08:52:47.1054788Z st.shared.b32 [%r45], %r2126; 2026-02-21T08:52:47.1054859Z st.shared.b32 [%r45+16384], %r2134; 2026-02-21T08:52:47.1054924Z st.shared.b32 [%r46], %r2127; 2026-02-21T08:52:47.1054990Z st.shared.b32 [%r46+16384], %r2135; 2026-02-21T08:52:47.1055052Z $L__tmp7: 2026-02-21T08:52:47.1055362Z .loc 2 291 36 // standard.py:291:36 @[ c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:84:40 ] 2026-02-21T08:52:47.1055428Z // begin inline asm 2026-02-21T08:52:47.1055522Z fence.proxy.async.shared::cta; 2026-02-21T08:52:47.1055581Z // end inline asm 2026-02-21T08:52:47.1055639Z bar.sync 0; 2026-02-21T08:52:47.1055723Z shfl.sync.idx.b32 %r2136, %r4, 0, 31, -1; 2026-02-21T08:52:47.1055807Z wgmma.fence.sync.aligned; 2026-02-21T08:52:47.1055871Z shl.b32 %r2137, %r2136, 10; 2026-02-21T08:52:47.1055933Z and.b32 %r2138, %r2137, 12288; 2026-02-21T08:52:47.1056001Z add.s32 %r2139, %r2138, %r2696; 2026-02-21T08:52:47.1056068Z bfe.u32 %r2140, %r2139, 4, 14; 2026-02-21T08:52:47.1056133Z cvt.u64.u32 %rd182, %r2140; 2026-02-21T08:52:47.1056212Z or.b64 %rd171, %rd182, 4611686293372403712; 2026-02-21T08:52:47.1056284Z mov.pred %p41, -1; 2026-02-21T08:52:47.1056347Z // begin inline asm 2026-02-21T08:52:47.1056990Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2773,%r2774,%r2775,%r2776,%r2777,%r2778,%r2779,%r2780,%r2781,%r2782,%r2783,%r2784,%r2785,%r2786,%r2787,%r2788}, {%r1806,%r1807,%r1808,%r1809}, %rd171, %p41, 1, 1; 2026-02-21T08:52:47.1057060Z // end inline asm 2026-02-21T08:52:47.1057124Z add.s32 %r2141, %r2139, 32; 2026-02-21T08:52:47.1057185Z bfe.u32 %r2142, %r2141, 4, 14; 2026-02-21T08:52:47.1057256Z cvt.u64.u32 %rd183, %r2142; 2026-02-21T08:52:47.1057335Z or.b64 %rd172, %rd183, 4611686293372403712; 2026-02-21T08:52:47.1057396Z // begin inline asm 2026-02-21T08:52:47.1057901Z 
wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2773,%r2774,%r2775,%r2776,%r2777,%r2778,%r2779,%r2780,%r2781,%r2782,%r2783,%r2784,%r2785,%r2786,%r2787,%r2788}, {%r1842,%r1843,%r1844,%r1845}, %rd172, %p41, 1, 1; 2026-02-21T08:52:47.1058101Z // end inline asm 2026-02-21T08:52:47.1058164Z add.s32 %r2143, %r2139, 64; 2026-02-21T08:52:47.1058229Z bfe.u32 %r2144, %r2143, 4, 14; 2026-02-21T08:52:47.1058308Z cvt.u64.u32 %rd184, %r2144; 2026-02-21T08:52:47.1058384Z or.b64 %rd173, %rd184, 4611686293372403712; 2026-02-21T08:52:47.1058445Z // begin inline asm 2026-02-21T08:52:47.1058945Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2773,%r2774,%r2775,%r2776,%r2777,%r2778,%r2779,%r2780,%r2781,%r2782,%r2783,%r2784,%r2785,%r2786,%r2787,%r2788}, {%r1878,%r1879,%r1880,%r1881}, %rd173, %p41, 1, 1; 2026-02-21T08:52:47.1059003Z // end inline asm 2026-02-21T08:52:47.1059178Z add.s32 %r2145, %r2139, 96; 2026-02-21T08:52:47.1059249Z bfe.u32 %r2146, %r2145, 4, 14; 2026-02-21T08:52:47.1059313Z cvt.u64.u32 %rd185, %r2146; 2026-02-21T08:52:47.1059386Z or.b64 %rd174, %rd185, 4611686293372403712; 2026-02-21T08:52:47.1059452Z // begin inline asm 2026-02-21T08:52:47.1059954Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2773,%r2774,%r2775,%r2776,%r2777,%r2778,%r2779,%r2780,%r2781,%r2782,%r2783,%r2784,%r2785,%r2786,%r2787,%r2788}, {%r1914,%r1915,%r1916,%r1917}, %rd174, %p41, 1, 1; 2026-02-21T08:52:47.1060015Z // end inline asm 2026-02-21T08:52:47.1060076Z add.s32 %r2147, %r2139, 16384; 2026-02-21T08:52:47.1060140Z bfe.u32 %r2148, %r2147, 4, 14; 2026-02-21T08:52:47.1060215Z cvt.u64.u32 %rd186, %r2148; 2026-02-21T08:52:47.1060293Z or.b64 %rd175, %rd186, 4611686293372403712; 2026-02-21T08:52:47.1060360Z // begin inline asm 2026-02-21T08:52:47.1060867Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2773,%r2774,%r2775,%r2776,%r2777,%r2778,%r2779,%r2780,%r2781,%r2782,%r2783,%r2784,%r2785,%r2786,%r2787,%r2788}, {%r1950,%r1951,%r1952,%r1953}, %rd175, %p41, 1, 1; 
2026-02-21T08:52:47.1060926Z // end inline asm 2026-02-21T08:52:47.1060995Z add.s32 %r2149, %r2139, 16416; 2026-02-21T08:52:47.1061060Z bfe.u32 %r2150, %r2149, 4, 14; 2026-02-21T08:52:47.1061126Z cvt.u64.u32 %rd187, %r2150; 2026-02-21T08:52:47.1061201Z or.b64 %rd176, %rd187, 4611686293372403712; 2026-02-21T08:52:47.1061266Z // begin inline asm 2026-02-21T08:52:47.1061771Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2773,%r2774,%r2775,%r2776,%r2777,%r2778,%r2779,%r2780,%r2781,%r2782,%r2783,%r2784,%r2785,%r2786,%r2787,%r2788}, {%r1986,%r1987,%r1988,%r1989}, %rd176, %p41, 1, 1; 2026-02-21T08:52:47.1061833Z // end inline asm 2026-02-21T08:52:47.1061901Z add.s32 %r2151, %r2139, 16448; 2026-02-21T08:52:47.1061963Z bfe.u32 %r2152, %r2151, 4, 14; 2026-02-21T08:52:47.1062027Z cvt.u64.u32 %rd188, %r2152; 2026-02-21T08:52:47.1062109Z or.b64 %rd177, %rd188, 4611686293372403712; 2026-02-21T08:52:47.1062174Z // begin inline asm 2026-02-21T08:52:47.1062673Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2773,%r2774,%r2775,%r2776,%r2777,%r2778,%r2779,%r2780,%r2781,%r2782,%r2783,%r2784,%r2785,%r2786,%r2787,%r2788}, {%r2022,%r2023,%r2024,%r2025}, %rd177, %p41, 1, 1; 2026-02-21T08:52:47.1062733Z // end inline asm 2026-02-21T08:52:47.1062802Z add.s32 %r2153, %r2139, 16480; 2026-02-21T08:52:47.1062863Z bfe.u32 %r2154, %r2153, 4, 14; 2026-02-21T08:52:47.1062929Z cvt.u64.u32 %rd189, %r2154; 2026-02-21T08:52:47.1063010Z or.b64 %rd178, %rd189, 4611686293372403712; 2026-02-21T08:52:47.1063072Z // begin inline asm 2026-02-21T08:52:47.1063567Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2773,%r2774,%r2775,%r2776,%r2777,%r2778,%r2779,%r2780,%r2781,%r2782,%r2783,%r2784,%r2785,%r2786,%r2787,%r2788}, {%r2058,%r2059,%r2060,%r2061}, %rd178, %p41, 1, 1; 2026-02-21T08:52:47.1063631Z // end inline asm 2026-02-21T08:52:47.1063709Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:47.1063769Z mov.b32 %r2079, 0; 2026-02-21T08:52:47.1063832Z mov.b32 %r2078, %r2696; 
2026-02-21T08:52:47.1063897Z mov.b32 %r2080, %r2079; 2026-02-21T08:52:47.1063969Z // begin inline asm 2026-02-21T08:52:47.1064385Z // wait for regs: %r2773,%r2774,%r2775,%r2776,%r2777,%r2778,%r2779,%r2780,%r2781,%r2782,%r2783,%r2784,%r2785,%r2786,%r2787,%r2788,%r2078,%r2079,%r2080 2026-02-21T08:52:47.1064468Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:47.1064524Z // end inline asm 2026-02-21T08:52:47.1064579Z $L__tmp8: 2026-02-21T08:52:47.1064805Z .loc 1 40 71 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:40:71 2026-02-21T08:52:47.1064869Z add.s32 %r2155, %r2772, 1; 2026-02-21T08:52:47.1064938Z setp.gt.s32 %p52, %r2155, 1; 2026-02-21T08:52:47.1065008Z selp.b32 %r2772, 0, %r2155, %p52; 2026-02-21T08:52:47.1065220Z .loc 1 48 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:32 2026-02-21T08:52:47.1065404Z mad.wide.s32 %rd180, %r2769, 2, %rd33; 2026-02-21T08:52:47.1065611Z .loc 1 48 80 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:80 2026-02-21T08:52:47.1065680Z shl.b32 %r2156, %r2772, 12; 2026-02-21T08:52:47.1065745Z shl.b32 %r2157, %r2772, 13; 2026-02-21T08:52:47.1065810Z add.s32 %r2158, %r750, %r2157; 2026-02-21T08:52:47.1065880Z add.s32 %r2100, %r2158, %r10; 2026-02-21T08:52:47.1065946Z selp.b32 %r2101, 8, 0, %p50; 2026-02-21T08:52:47.1066006Z // begin inline asm 2026-02-21T08:52:47.1066154Z cp.async.ca.shared.global [ %r2100 + 0 ], [ %rd242 + 0 ], 0x8, %r2101; 2026-02-21T08:52:47.1066219Z // end inline asm 2026-02-21T08:52:47.1066280Z add.s32 %r2102, %r2100, 4096; 2026-02-21T08:52:47.1066339Z // begin inline asm 2026-02-21T08:52:47.1066607Z cp.async.ca.shared.global [ %r2102 + 0 ], [ %rd180 + 0 ], 0x8, %r2101; 2026-02-21T08:52:47.1066673Z // end inline asm 2026-02-21T08:52:47.1066741Z cp.async.commit_group; 2026-02-21T08:52:47.1066947Z .loc 1 54 34 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:34 2026-02-21T08:52:47.1067018Z cvt.s64.s32 %rd190, %r2770; 2026-02-21T08:52:47.1067085Z add.s64 %rd181, 
%rd34, %rd190; 2026-02-21T08:52:47.1067289Z .loc 1 54 87 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:87 2026-02-21T08:52:47.1067359Z add.s32 %r2104, %r14, %r2156; 2026-02-21T08:52:47.1067421Z // begin inline asm 2026-02-21T08:52:47.1067557Z cp.async.ca.shared.global [ %r2104 + 0 ], [ %rd181 + 0 ], 0x8, %r2101; 2026-02-21T08:52:47.1067623Z // end inline asm 2026-02-21T08:52:47.1067701Z cp.async.commit_group; 2026-02-21T08:52:47.1067905Z .loc 1 40 71 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:40:71 2026-02-21T08:52:47.1067972Z add.s32 %r2770, %r2770, 229376; 2026-02-21T08:52:47.1068043Z add.s64 %rd242, %rd242, 128; 2026-02-21T08:52:47.1068106Z add.s32 %r2769, %r2769, 64; 2026-02-21T08:52:47.1068175Z setp.lt.u64 %p53, %rd243, 4064; 2026-02-21T08:52:47.1068248Z @%p53 bra $L__BB0_9; 2026-02-21T08:52:47.1068362Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:47.1068431Z cp.async.wait_group 0; 2026-02-21T08:52:47.1068496Z bar.sync 0; 2026-02-21T08:52:47.1068771Z .loc 1 87 28 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:87:28 2026-02-21T08:52:47.1068852Z cvt.rn.bf16x2.f32 %r2177, %r2774, %r2773; 2026-02-21T08:52:47.1068928Z cvt.rn.bf16x2.f32 %r2178, %r2776, %r2775; 2026-02-21T08:52:47.1069012Z cvt.rn.bf16x2.f32 %r2179, %r2778, %r2777; 2026-02-21T08:52:47.1069084Z cvt.rn.bf16x2.f32 %r2180, %r2780, %r2779; 2026-02-21T08:52:47.1069155Z cvt.rn.bf16x2.f32 %r2181, %r2782, %r2781; 2026-02-21T08:52:47.1069232Z cvt.rn.bf16x2.f32 %r2182, %r2784, %r2783; 2026-02-21T08:52:47.1069303Z cvt.rn.bf16x2.f32 %r2183, %r2786, %r2785; 2026-02-21T08:52:47.1069374Z cvt.rn.bf16x2.f32 %r2184, %r2788, %r2787; 2026-02-21T08:52:47.1069581Z .loc 1 88 50 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:88:50 2026-02-21T08:52:47.1069651Z mad.lo.s32 %r2185, %r217, 7168, %r219; 2026-02-21T08:52:47.1069718Z mad.lo.s32 %r2186, %r218, 7168, %r219; 2026-02-21T08:52:47.1070075Z .loc 1 88 22 // 
c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:88:22 2026-02-21T08:52:47.1070150Z mad.wide.s32 %rd191, %r2185, 2, %rd35; 2026-02-21T08:52:47.1070217Z mad.wide.s32 %rd192, %r2186, 2, %rd35; 2026-02-21T08:52:47.1070413Z .loc 1 88 81 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:88:81 2026-02-21T08:52:47.1070533Z st.shared.v4.b32 [%r36], {%r2177, %r2179, %r2181, %r2183}; 2026-02-21T08:52:47.1070649Z st.shared.v4.b32 [%r36+512], {%r2178, %r2180, %r2182, %r2184}; 2026-02-21T08:52:47.1070719Z bar.sync 0; 2026-02-21T08:52:47.1070787Z // begin inline asm 2026-02-21T08:52:47.1070980Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2159, %r2160, %r2161, %r2162}, [%r804]; 2026-02-21T08:52:47.1071154Z // end inline asm 2026-02-21T08:52:47.1071217Z // begin inline asm 2026-02-21T08:52:47.1071405Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2164, %r2165, %r2166, %r2167}, [%r809]; 2026-02-21T08:52:47.1071468Z // end inline asm 2026-02-21T08:52:47.1071526Z // begin inline asm 2026-02-21T08:52:47.1071658Z st.global.v4.b32 [ %rd191 + 0 ], { %r2159, %r2160, %r2161, %r2162 }; 2026-02-21T08:52:47.1071715Z // end inline asm 2026-02-21T08:52:47.1071774Z // begin inline asm 2026-02-21T08:52:47.1071894Z st.global.v4.b32 [ %rd192 + 0 ], { %r2164, %r2165, %r2166, %r2167 }; 2026-02-21T08:52:47.1071951Z // end inline asm 2026-02-21T08:52:47.1072164Z .loc 1 19 144 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:19:144 2026-02-21T08:52:47.1072229Z add.s32 %r2708, %r2708, 16896; 2026-02-21T08:52:47.1072304Z setp.lt.s32 %p54, %r2708, %r2789; 2026-02-21T08:52:47.1072364Z @%p54 bra $L__BB0_2; 2026-02-21T08:52:47.1072455Z $L__BB0_11: // %.preheader 2026-02-21T08:52:47.1072529Z setp.gt.s32 %p55, %r2789, 55; 2026-02-21T08:52:47.1072588Z @%p55 bra $L__BB0_16; 2026-02-21T08:52:47.1072672Z // %bb.12: // %.lr.ph123 2026-02-21T08:52:47.1072881Z .loc 1 0 144 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:0:144 2026-02-21T08:52:47.1072950Z and.b32 %r2188, %r2694, 4088; 
2026-02-21T08:52:47.1073012Z and.b32 %r2190, %r2695, 56; 2026-02-21T08:52:47.1073072Z xor.b32 %r50, %r2188, %r2190; 2026-02-21T08:52:47.1073139Z add.s32 %r2192, %r2696, %r50; 2026-02-21T08:52:47.1073202Z add.s32 %r2237, %r2192, 32768; 2026-02-21T08:52:47.1073263Z add.s32 %r2239, %r2192, 36864; 2026-02-21T08:52:47.1073328Z add.s32 %r2193, %r2696, 49152; 2026-02-21T08:52:47.1073390Z add.s32 %r54, %r2193, %r2188; 2026-02-21T08:52:47.1073450Z add.s32 %r2243, %r2192, 40960; 2026-02-21T08:52:47.1073511Z add.s32 %r2245, %r2192, 45056; 2026-02-21T08:52:47.1073581Z add.s32 %r2194, %r2696, %r2188; 2026-02-21T08:52:47.1073659Z add.s32 %r2247, %r2194, 53248; 2026-02-21T08:52:47.1073726Z and.b32 %r2196, %r2699, 6144; 2026-02-21T08:52:47.1073793Z and.b32 %r2198, %r2700, 896; 2026-02-21T08:52:47.1073854Z and.b32 %r2200, %r2701, 62; 2026-02-21T08:52:47.1073920Z or.b32 %r2201, %r2196, %r2198; 2026-02-21T08:52:47.1073981Z or.b32 %r59, %r2201, %r2200; 2026-02-21T08:52:47.1074046Z xor.b32 %r60, %r59, 8; 2026-02-21T08:52:47.1074107Z xor.b32 %r61, %r59, 16; 2026-02-21T08:52:47.1074168Z xor.b32 %r62, %r59, 24; 2026-02-21T08:52:47.1074233Z xor.b32 %r63, %r59, 32; 2026-02-21T08:52:47.1074292Z xor.b32 %r64, %r59, 40; 2026-02-21T08:52:47.1074351Z xor.b32 %r65, %r59, 48; 2026-02-21T08:52:47.1074410Z xor.b32 %r66, %r59, 56; 2026-02-21T08:52:47.1074476Z and.b32 %r2203, %r2695, 128; 2026-02-21T08:52:47.1074539Z add.s32 %r2204, %r2193, %r2203; 2026-02-21T08:52:47.1074601Z add.s32 %r67, %r2204, %r2702; 2026-02-21T08:52:47.1074669Z shl.b32 %r2205, %r2702, 7; 2026-02-21T08:52:47.1074731Z and.b32 %r2207, %r2703, 112; 2026-02-21T08:52:47.1074797Z or.b32 %r2209, %r2205, %r2704; 2026-02-21T08:52:47.1074863Z or.b32 %r2210, %r2209, %r2207; 2026-02-21T08:52:47.1074924Z add.s32 %r68, %r2696, %r2210; 2026-02-21T08:52:47.1075089Z xor.b32 %r2211, %r2210, 16; 2026-02-21T08:52:47.1075151Z add.s32 %r69, %r2696, %r2211; 2026-02-21T08:52:47.1075217Z xor.b32 %r2212, %r2210, 32; 2026-02-21T08:52:47.1075278Z add.s32 
%r70, %r2696, %r2212; 2026-02-21T08:52:47.1075338Z xor.b32 %r2213, %r2210, 48; 2026-02-21T08:52:47.1075410Z add.s32 %r71, %r2696, %r2213; 2026-02-21T08:52:47.1075469Z xor.b32 %r2214, %r2210, 64; 2026-02-21T08:52:47.1075529Z add.s32 %r72, %r2696, %r2214; 2026-02-21T08:52:47.1075589Z xor.b32 %r2215, %r2210, 80; 2026-02-21T08:52:47.1075653Z add.s32 %r73, %r2696, %r2215; 2026-02-21T08:52:47.1075713Z xor.b32 %r2216, %r2210, 96; 2026-02-21T08:52:47.1075773Z add.s32 %r74, %r2696, %r2216; 2026-02-21T08:52:47.1075835Z xor.b32 %r2217, %r2210, 112; 2026-02-21T08:52:47.1075897Z add.s32 %r75, %r2696, %r2217; 2026-02-21T08:52:47.1076047Z shl.b32 %r2219, %r2705, 12; 2026-02-21T08:52:47.1076109Z and.b32 %r2220, %r2700, 3168; 2026-02-21T08:52:47.1076174Z shl.b32 %r2222, %r2706, 4; 2026-02-21T08:52:47.1076238Z shr.u32 %r2223, %r3, 2; 2026-02-21T08:52:47.1076296Z and.b32 %r2224, %r2223, 96; 2026-02-21T08:52:47.1076359Z and.b32 %r2226, %r2707, 16; 2026-02-21T08:52:47.1076418Z or.b32 %r2227, %r2220, %r2222; 2026-02-21T08:52:47.1076602Z xor.b32 %r2228, %r2227, %r2224; 2026-02-21T08:52:47.1076670Z add.s32 %r2229, %r2696, %r2219; 2026-02-21T08:52:47.1076735Z add.s32 %r2230, %r2229, %r2226; 2026-02-21T08:52:47.1076795Z add.s32 %r76, %r2230, %r2228; 2026-02-21T08:52:47.1076853Z shl.b32 %r2231, %r2706, 9; 2026-02-21T08:52:47.1076915Z shl.b32 %r2232, %r2705, 5; 2026-02-21T08:52:47.1076975Z and.b32 %r2233, %r2707, 2032; 2026-02-21T08:52:47.1077034Z or.b32 %r2234, %r2231, %r2232; 2026-02-21T08:52:47.1077098Z xor.b32 %r2235, %r2234, %r2233; 2026-02-21T08:52:47.1077157Z add.s32 %r2670, %r2696, %r2235; 2026-02-21T08:52:47.1077219Z add.s32 %r2675, %r2670, 2048; 2026-02-21T08:52:47.1077431Z .loc 1 19 144 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:19:144 2026-02-21T08:52:47.1077509Z mad.wide.u32 %rd193, %r312, 8, %rd33; 2026-02-21T08:52:47.1077570Z add.s64 %rd2, %rd193, 256; 2026-02-21T08:52:47.1077768Z .loc 1 40 71 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:40:71 
2026-02-21T08:52:47.1077831Z or.b32 %r79, %r7, 128; 2026-02-21T08:52:47.1078034Z .loc 1 19 144 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:19:144 2026-02-21T08:52:47.1078104Z mad.wide.u32 %rd194, %r5, 7168, %rd34; 2026-02-21T08:52:47.1078169Z add.s64 %rd3, %rd194, 458752; 2026-02-21T08:52:47.1078279Z $L__BB0_13: // =>This Loop Header: Depth=1 2026-02-21T08:52:47.1078374Z // Child Loop BB0_14 Depth 2 2026-02-21T08:52:47.1078573Z .loc 1 25 35 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:25:35 2026-02-21T08:52:47.1078661Z mul.hi.s32 %r2252, %r2789, -1840700269; 2026-02-21T08:52:47.1078723Z add.s32 %r2253, %r2252, %r2789; 2026-02-21T08:52:47.1078785Z shr.u32 %r2254, %r2253, 31; 2026-02-21T08:52:47.1078850Z shr.s32 %r2255, %r2253, 6; 2026-02-21T08:52:47.1078910Z add.s32 %r2256, %r2255, %r2254; 2026-02-21T08:52:47.1079106Z .loc 1 26 33 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:26:33 2026-02-21T08:52:47.1079168Z shl.b32 %r2257, %r2256, 1; 2026-02-21T08:52:47.1079362Z .loc 1 27 39 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:27:39 2026-02-21T08:52:47.1079422Z sub.s32 %r2258, 1, %r2257; 2026-02-21T08:52:47.1079617Z .loc 1 27 52 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:27:52 2026-02-21T08:52:47.1079681Z min.u32 %r2259, %r2258, 2; 2026-02-21T08:52:47.1079879Z .loc 1 28 45 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:28:45 2026-02-21T08:52:47.1079940Z mul.lo.s32 %r2260, %r2256, 112; 2026-02-21T08:52:47.1080004Z sub.s32 %r2261, %r2789, %r2260; 2026-02-21T08:52:47.1080336Z .loc 1 28 64 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:28:64 2026-02-21T08:52:47.1080399Z cvt.u16.u32 %rs449, %r2261; 2026-02-21T08:52:47.1080464Z cvt.s8.s32 %rs450, %r2261; 2026-02-21T08:52:47.1080526Z cvt.u16.u32 %rs451, %r2259; 2026-02-21T08:52:47.1080721Z .loc 1 29 51 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:29:51 2026-02-21T08:52:47.1080788Z div.s16 %rs452, %rs450, %rs451; 
2026-02-21T08:52:47.1080985Z .loc 1 28 64 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:28:64 2026-02-21T08:52:47.1081054Z mul.lo.s16 %rs453, %rs452, %rs451; 2026-02-21T08:52:47.1081116Z sub.s16 %rs454, %rs449, %rs453; 2026-02-21T08:52:47.1081180Z cvt.u32.u16 %r2262, %rs454; 2026-02-21T08:52:47.1081357Z cvt.s32.s8 %r2263, %r2262; 2026-02-21T08:52:47.1081557Z .loc 1 28 30 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:28:30 2026-02-21T08:52:47.1081626Z add.s32 %r2264, %r2257, %r2263; 2026-02-21T08:52:47.1081820Z .loc 1 30 27 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:30:27 2026-02-21T08:52:47.1081879Z shl.b32 %r2265, %r2264, 6; 2026-02-21T08:52:47.1082075Z .loc 1 31 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:31:32 2026-02-21T08:52:47.1082135Z or.b32 %r264, %r2265, %r5; 2026-02-21T08:52:47.1082198Z or.b32 %r265, %r2265, %r6; 2026-02-21T08:52:47.1082391Z .loc 1 32 27 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:32:27 2026-02-21T08:52:47.1082459Z cvt.s16.s8 %rs455, %rs452; 2026-02-21T08:52:47.1082527Z mul.wide.s16 %r2266, %rs455, 128; 2026-02-21T08:52:47.1082726Z .loc 1 33 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:33:32 2026-02-21T08:52:47.1082793Z or.b32 %r266, %r2266, %r8; 2026-02-21T08:52:47.1083009Z .loc 1 48 53 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:53 2026-02-21T08:52:47.1083080Z shl.b32 %r2267, %r264, 13; 2026-02-21T08:52:47.1083141Z shl.b32 %r2268, %r265, 13; 2026-02-21T08:52:47.1083344Z .loc 1 48 60 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:60 2026-02-21T08:52:47.1083404Z or.b32 %r2269, %r2267, %r7; 2026-02-21T08:52:47.1083463Z or.b32 %r2270, %r2268, %r7; 2026-02-21T08:52:47.1083667Z .loc 1 48 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:32 2026-02-21T08:52:47.1083739Z mad.wide.s32 %rd195, %r2269, 2, %rd33; 2026-02-21T08:52:47.1083807Z mad.wide.s32 %rd196, %r2270, 2, %rd33; 2026-02-21T08:52:47.1083870Z 
mov.b32 %r2238, 8; 2026-02-21T08:52:47.1084071Z .loc 1 48 80 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:80 2026-02-21T08:52:47.1084134Z // begin inline asm 2026-02-21T08:52:47.1084275Z cp.async.ca.shared.global [ %r2237 + 0 ], [ %rd195 + 0 ], 0x8, %r2238; 2026-02-21T08:52:47.1084341Z // end inline asm 2026-02-21T08:52:47.1084402Z // begin inline asm 2026-02-21T08:52:47.1084546Z cp.async.ca.shared.global [ %r2239 + 0 ], [ %rd196 + 0 ], 0x8, %r2238; 2026-02-21T08:52:47.1084613Z // end inline asm 2026-02-21T08:52:47.1084682Z cp.async.commit_group; 2026-02-21T08:52:47.1084893Z .loc 1 54 62 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:62 2026-02-21T08:52:47.1084966Z add.s32 %r2271, %r266, %r2697; 2026-02-21T08:52:47.1085179Z .loc 1 54 34 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:34 2026-02-21T08:52:47.1085246Z cvt.s64.s32 %rd202, %r2271; 2026-02-21T08:52:47.1085311Z add.s64 %rd197, %rd34, %rd202; 2026-02-21T08:52:47.1085520Z .loc 1 54 87 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:87 2026-02-21T08:52:47.1085584Z // begin inline asm 2026-02-21T08:52:47.1085725Z cp.async.ca.shared.global [ %r54 + 0 ], [ %rd197 + 0 ], 0x8, %r2238; 2026-02-21T08:52:47.1085893Z // end inline asm 2026-02-21T08:52:47.1085972Z cp.async.commit_group; 2026-02-21T08:52:47.1086177Z .loc 1 48 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:32 2026-02-21T08:52:47.1086247Z cvt.s64.s32 %rd203, %r2267; 2026-02-21T08:52:47.1086311Z or.b64 %rd205, %rd203, %rd235; 2026-02-21T08:52:47.1086378Z shl.b64 %rd206, %rd205, 1; 2026-02-21T08:52:47.1086443Z add.s64 %rd207, %rd33, %rd206; 2026-02-21T08:52:47.1086633Z add.s64 %rd198, %rd207, 128; 2026-02-21T08:52:47.1086697Z cvt.s64.s32 %rd208, %r2268; 2026-02-21T08:52:47.1086760Z or.b64 %rd209, %rd208, %rd235; 2026-02-21T08:52:47.1086827Z shl.b64 %rd210, %rd209, 1; 2026-02-21T08:52:47.1086902Z add.s64 %rd211, %rd33, %rd210; 2026-02-21T08:52:47.1087106Z add.s64 %rd199, %rd211, 
128; 2026-02-21T08:52:47.1087310Z .loc 1 48 80 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:80 2026-02-21T08:52:47.1087377Z bar.sync 0; 2026-02-21T08:52:47.1087442Z // begin inline asm 2026-02-21T08:52:47.1087584Z cp.async.ca.shared.global [ %r2243 + 0 ], [ %rd198 + 0 ], 0x8, %r2238; 2026-02-21T08:52:47.1087648Z // end inline asm 2026-02-21T08:52:47.1087707Z // begin inline asm 2026-02-21T08:52:47.1087844Z cp.async.ca.shared.global [ %r2245 + 0 ], [ %rd199 + 0 ], 0x8, %r2238; 2026-02-21T08:52:47.1087906Z // end inline asm 2026-02-21T08:52:47.1087973Z cp.async.commit_group; 2026-02-21T08:52:47.1088174Z .loc 1 54 62 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:62 2026-02-21T08:52:47.1088236Z add.s32 %r2272, %r266, %r2698; 2026-02-21T08:52:47.1088439Z .loc 1 54 34 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:34 2026-02-21T08:52:47.1088502Z cvt.s64.s32 %rd212, %r2272; 2026-02-21T08:52:47.1088568Z add.s64 %rd200, %rd34, %rd212; 2026-02-21T08:52:47.1088769Z .loc 1 54 87 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:87 2026-02-21T08:52:47.1088832Z // begin inline asm 2026-02-21T08:52:47.1088978Z cp.async.ca.shared.global [ %r2247 + 0 ], [ %rd200 + 0 ], 0x8, %r2238; 2026-02-21T08:52:47.1089042Z // end inline asm 2026-02-21T08:52:47.1089108Z cp.async.commit_group; 2026-02-21T08:52:47.1089307Z .loc 1 40 71 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:40:71 2026-02-21T08:52:47.1089369Z shl.b32 %r2273, %r2256, 7; 2026-02-21T08:52:47.1089434Z or.b32 %r2274, %r5, %r2273; 2026-02-21T08:52:47.1089496Z cvt.s16.s8 %rs456, %rs454; 2026-02-21T08:52:47.1089566Z mul.wide.s16 %r2275, %rs456, 64; 2026-02-21T08:52:47.1089635Z add.s32 %r2276, %r2274, %r2275; 2026-02-21T08:52:47.1089696Z shl.b32 %r2277, %r2276, 13; 2026-02-21T08:52:47.1089767Z mad.wide.s32 %rd245, %r2277, 2, %rd2; 2026-02-21T08:52:47.1089830Z or.b32 %r2278, %r6, %r2273; 2026-02-21T08:52:47.1089899Z add.s32 %r2279, %r2278, %r2275; 
2026-02-21T08:52:47.1089958Z shl.b32 %r2280, %r2279, 13; 2026-02-21T08:52:47.1090019Z or.b32 %r2790, %r79, %r2280; 2026-02-21T08:52:47.1090088Z cvt.s64.s32 %rd213, %r266; 2026-02-21T08:52:47.1090153Z add.s64 %rd244, %rd3, %rd213; 2026-02-21T08:52:47.1090213Z mov.b32 %r2793, 0f00000000; 2026-02-21T08:52:47.1090270Z mov.b32 %r2792, 1; 2026-02-21T08:52:47.1090337Z mov.b32 %r2791, -1; 2026-02-21T08:52:47.1090397Z mov.b64 %rd246, -32; 2026-02-21T08:52:47.1090458Z mov.b32 %r2794, %r2793; 2026-02-21T08:52:47.1090523Z mov.b32 %r2795, %r2793; 2026-02-21T08:52:47.1090583Z mov.b32 %r2796, %r2793; 2026-02-21T08:52:47.1090641Z mov.b32 %r2797, %r2793; 2026-02-21T08:52:47.1090715Z mov.b32 %r2798, %r2793; 2026-02-21T08:52:47.1090779Z mov.b32 %r2799, %r2793; 2026-02-21T08:52:47.1090839Z mov.b32 %r2800, %r2793; 2026-02-21T08:52:47.1090897Z mov.b32 %r2801, %r2793; 2026-02-21T08:52:47.1090965Z mov.b32 %r2802, %r2793; 2026-02-21T08:52:47.1091024Z mov.b32 %r2803, %r2793; 2026-02-21T08:52:47.1091092Z mov.b32 %r2804, %r2793; 2026-02-21T08:52:47.1091151Z mov.b32 %r2805, %r2793; 2026-02-21T08:52:47.1091362Z mov.b32 %r2806, %r2793; 2026-02-21T08:52:47.1091422Z mov.b32 %r2807, %r2793; 2026-02-21T08:52:47.1091488Z mov.b32 %r2808, %r2793; 2026-02-21T08:52:47.1091602Z $L__BB0_14: // Parent Loop BB0_13 Depth=1 2026-02-21T08:52:47.1091721Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:47.1091794Z add.s64 %rd246, %rd246, 32; 2026-02-21T08:52:47.1091863Z setp.lt.u64 %p65, %rd246, 4032; 2026-02-21T08:52:47.1091927Z add.s32 %r2613, %r2791, 1; 2026-02-21T08:52:47.1091999Z setp.gt.s32 %p66, %r2613, 1; 2026-02-21T08:52:47.1092068Z selp.b32 %r2791, 0, %r2613, %p66; 2026-02-21T08:52:47.1092273Z .loc 1 48 80 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:80 2026-02-21T08:52:47.1092435Z cp.async.wait_group 2; 2026-02-21T08:52:47.1092500Z bar.sync 0; 2026-02-21T08:52:47.1092562Z shl.b32 %r2614, %r2791, 12; 2026-02-21T08:52:47.1092621Z shl.b32 %r2615, %r2791, 13; 
2026-02-21T08:52:47.1092697Z add.s32 %r2616, %r2696, 32768; 2026-02-21T08:52:47.1092760Z add.s32 %r2617, %r2616, %r2615; 2026-02-21T08:52:47.1092960Z .loc 1 52 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:52:32 2026-02-21T08:52:47.1093030Z add.s32 %r2618, %r2617, %r59; 2026-02-21T08:52:47.1093097Z ld.shared.b16 %rs457, [%r2618]; 2026-02-21T08:52:47.1093172Z ld.shared.b16 %rs458, [%r2618+1024]; 2026-02-21T08:52:47.1093241Z ld.shared.b16 %rs459, [%r2618+64]; 2026-02-21T08:52:47.1093316Z ld.shared.b16 %rs460, [%r2618+1088]; 2026-02-21T08:52:47.1093378Z add.s32 %r2619, %r2617, %r60; 2026-02-21T08:52:47.1093444Z ld.shared.b16 %rs461, [%r2619]; 2026-02-21T08:52:47.1093514Z ld.shared.b16 %rs462, [%r2619+1024]; 2026-02-21T08:52:47.1093582Z ld.shared.b16 %rs463, [%r2619+64]; 2026-02-21T08:52:47.1093649Z ld.shared.b16 %rs464, [%r2619+1088]; 2026-02-21T08:52:47.1093710Z add.s32 %r2620, %r2617, %r61; 2026-02-21T08:52:47.1093781Z ld.shared.b16 %rs465, [%r2620]; 2026-02-21T08:52:47.1093849Z ld.shared.b16 %rs466, [%r2620+1024]; 2026-02-21T08:52:47.1093915Z ld.shared.b16 %rs467, [%r2620+64]; 2026-02-21T08:52:47.1093985Z ld.shared.b16 %rs468, [%r2620+1088]; 2026-02-21T08:52:47.1094047Z add.s32 %r2621, %r2617, %r62; 2026-02-21T08:52:47.1094111Z ld.shared.b16 %rs469, [%r2621]; 2026-02-21T08:52:47.1094182Z ld.shared.b16 %rs470, [%r2621+1024]; 2026-02-21T08:52:47.1094245Z ld.shared.b16 %rs471, [%r2621+64]; 2026-02-21T08:52:47.1094312Z ld.shared.b16 %rs472, [%r2621+1088]; 2026-02-21T08:52:47.1094373Z add.s32 %r2622, %r2617, %r63; 2026-02-21T08:52:47.1094442Z ld.shared.b16 %rs473, [%r2622]; 2026-02-21T08:52:47.1094520Z ld.shared.b16 %rs474, [%r2622+1024]; 2026-02-21T08:52:47.1094588Z ld.shared.b16 %rs475, [%r2622+64]; 2026-02-21T08:52:47.1094659Z ld.shared.b16 %rs476, [%r2622+1088]; 2026-02-21T08:52:47.1094721Z add.s32 %r2623, %r2617, %r64; 2026-02-21T08:52:47.1094785Z ld.shared.b16 %rs477, [%r2623]; 2026-02-21T08:52:47.1094850Z ld.shared.b16 %rs478, [%r2623+1024]; 
2026-02-21T08:52:47.1094922Z ld.shared.b16 %rs479, [%r2623+64]; 2026-02-21T08:52:47.1094988Z ld.shared.b16 %rs480, [%r2623+1088]; 2026-02-21T08:52:47.1095049Z add.s32 %r2624, %r2617, %r65; 2026-02-21T08:52:47.1095119Z ld.shared.b16 %rs481, [%r2624]; 2026-02-21T08:52:47.1095185Z ld.shared.b16 %rs482, [%r2624+1024]; 2026-02-21T08:52:47.1095250Z ld.shared.b16 %rs483, [%r2624+64]; 2026-02-21T08:52:47.1095316Z ld.shared.b16 %rs484, [%r2624+1088]; 2026-02-21T08:52:47.1095382Z add.s32 %r2625, %r2617, %r66; 2026-02-21T08:52:47.1095448Z ld.shared.b16 %rs485, [%r2625]; 2026-02-21T08:52:47.1095513Z ld.shared.b16 %rs486, [%r2625+1024]; 2026-02-21T08:52:47.1095583Z ld.shared.b16 %rs487, [%r2625+64]; 2026-02-21T08:52:47.1095649Z ld.shared.b16 %rs488, [%r2625+1088]; 2026-02-21T08:52:47.1095716Z cvt.f32.bf16 %r2313, %rs457; 2026-02-21T08:52:47.1095785Z cvt.f32.bf16 %r2314, %rs458; 2026-02-21T08:52:47.1095847Z cvt.f32.bf16 %r2315, %rs461; 2026-02-21T08:52:47.1095908Z cvt.f32.bf16 %r2316, %rs462; 2026-02-21T08:52:47.1096078Z cvt.f32.bf16 %r2349, %rs465; 2026-02-21T08:52:47.1096143Z cvt.f32.bf16 %r2350, %rs466; 2026-02-21T08:52:47.1096204Z cvt.f32.bf16 %r2351, %rs469; 2026-02-21T08:52:47.1096265Z cvt.f32.bf16 %r2352, %rs470; 2026-02-21T08:52:47.1096332Z cvt.f32.bf16 %r2385, %rs473; 2026-02-21T08:52:47.1096392Z cvt.f32.bf16 %r2386, %rs474; 2026-02-21T08:52:47.1096583Z cvt.f32.bf16 %r2387, %rs477; 2026-02-21T08:52:47.1096650Z cvt.f32.bf16 %r2388, %rs478; 2026-02-21T08:52:47.1096717Z cvt.f32.bf16 %r2421, %rs481; 2026-02-21T08:52:47.1096779Z cvt.f32.bf16 %r2422, %rs482; 2026-02-21T08:52:47.1096840Z cvt.f32.bf16 %r2423, %rs485; 2026-02-21T08:52:47.1096906Z cvt.f32.bf16 %r2424, %rs486; 2026-02-21T08:52:47.1096967Z cvt.f32.bf16 %r2457, %rs459; 2026-02-21T08:52:47.1097161Z cvt.f32.bf16 %r2458, %rs460; 2026-02-21T08:52:47.1097228Z cvt.f32.bf16 %r2459, %rs463; 2026-02-21T08:52:47.1097294Z cvt.f32.bf16 %r2460, %rs464; 2026-02-21T08:52:47.1097355Z cvt.f32.bf16 %r2493, %rs467; 
2026-02-21T08:52:47.1097418Z cvt.f32.bf16 %r2494, %rs468; 2026-02-21T08:52:47.1097493Z cvt.f32.bf16 %r2495, %rs471; 2026-02-21T08:52:47.1097561Z cvt.f32.bf16 %r2496, %rs472; 2026-02-21T08:52:47.1097628Z cvt.f32.bf16 %r2529, %rs475; 2026-02-21T08:52:47.1097693Z cvt.f32.bf16 %r2530, %rs476; 2026-02-21T08:52:47.1097756Z cvt.f32.bf16 %r2531, %rs479; 2026-02-21T08:52:47.1097821Z cvt.f32.bf16 %r2532, %rs480; 2026-02-21T08:52:47.1097881Z cvt.f32.bf16 %r2565, %rs483; 2026-02-21T08:52:47.1097948Z cvt.f32.bf16 %r2566, %rs484; 2026-02-21T08:52:47.1098010Z cvt.f32.bf16 %r2567, %rs487; 2026-02-21T08:52:47.1098070Z cvt.f32.bf16 %r2568, %rs488; 2026-02-21T08:52:47.1098279Z .loc 1 67 45 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:67:45 2026-02-21T08:52:47.1098343Z add.s32 %r2626, %r67, %r2614; 2026-02-21T08:52:47.1098410Z ld.shared.b8 %rs489, [%r2626]; 2026-02-21T08:52:47.1098476Z ld.shared.b8 %rs490, [%r2626+256]; 2026-02-21T08:52:47.1098549Z ld.shared.b8 %rs491, [%r2626+512]; 2026-02-21T08:52:47.1098614Z ld.shared.b8 %rs492, [%r2626+768]; 2026-02-21T08:52:47.1098683Z ld.shared.b8 %rs493, [%r2626+1024]; 2026-02-21T08:52:47.1098757Z ld.shared.b8 %rs494, [%r2626+1280]; 2026-02-21T08:52:47.1098824Z ld.shared.b8 %rs495, [%r2626+1536]; 2026-02-21T08:52:47.1098890Z ld.shared.b8 %rs496, [%r2626+1792]; 2026-02-21T08:52:47.1098961Z ld.shared.b8 %rs497, [%r2626+2048]; 2026-02-21T08:52:47.1099026Z ld.shared.b8 %rs498, [%r2626+2304]; 2026-02-21T08:52:47.1099091Z ld.shared.b8 %rs499, [%r2626+2560]; 2026-02-21T08:52:47.1099156Z ld.shared.b8 %rs500, [%r2626+2816]; 2026-02-21T08:52:47.1099230Z ld.shared.b8 %rs501, [%r2626+3072]; 2026-02-21T08:52:47.1099295Z ld.shared.b8 %rs502, [%r2626+3328]; 2026-02-21T08:52:47.1099363Z ld.shared.b8 %rs503, [%r2626+3584]; 2026-02-21T08:52:47.1099435Z ld.shared.b8 %rs504, [%r2626+3840]; 2026-02-21T08:52:47.1099635Z .loc 1 57 28 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:57:28 2026-02-21T08:52:47.1099702Z shl.b16 %rs505, %rs489, 4; 
2026-02-21T08:52:47.1099765Z shl.b16 %rs506, %rs490, 4; 2026-02-21T08:52:47.1099845Z shl.b16 %rs507, %rs491, 4; 2026-02-21T08:52:47.1099909Z shl.b16 %rs508, %rs492, 4; 2026-02-21T08:52:47.1099970Z shl.b16 %rs509, %rs493, 4; 2026-02-21T08:52:47.1100038Z shl.b16 %rs510, %rs494, 4; 2026-02-21T08:52:47.1100099Z shl.b16 %rs511, %rs495, 4; 2026-02-21T08:52:47.1100160Z shl.b16 %rs512, %rs496, 4; 2026-02-21T08:52:47.1100228Z shl.b16 %rs513, %rs497, 4; 2026-02-21T08:52:47.1100289Z shl.b16 %rs514, %rs498, 4; 2026-02-21T08:52:47.1100349Z shl.b16 %rs515, %rs499, 4; 2026-02-21T08:52:47.1100410Z shl.b16 %rs516, %rs500, 4; 2026-02-21T08:52:47.1100475Z shl.b16 %rs517, %rs501, 4; 2026-02-21T08:52:47.1100538Z shl.b16 %rs518, %rs502, 4; 2026-02-21T08:52:47.1100599Z shl.b16 %rs519, %rs503, 4; 2026-02-21T08:52:47.1100665Z shl.b16 %rs520, %rs504, 4; 2026-02-21T08:52:47.1100863Z .loc 1 72 58 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:72:58 2026-02-21T08:52:47.1101073Z selp.b16 %rs521, %rs505, %rs489, %p70; 2026-02-21T08:52:47.1101136Z cvt.s16.s8 %rs522, %rs521; 2026-02-21T08:52:47.1101202Z shr.s16 %rs523, %rs522, 4; 2026-02-21T08:52:47.1101273Z selp.b16 %rs524, %rs506, %rs490, %p70; 2026-02-21T08:52:47.1101336Z cvt.s16.s8 %rs525, %rs524; 2026-02-21T08:52:47.1101402Z shr.s16 %rs526, %rs525, 4; 2026-02-21T08:52:47.1101470Z selp.b16 %rs527, %rs507, %rs491, %p70; 2026-02-21T08:52:47.1101532Z cvt.s16.s8 %rs528, %rs527; 2026-02-21T08:52:47.1101593Z shr.s16 %rs529, %rs528, 4; 2026-02-21T08:52:47.1101668Z selp.b16 %rs530, %rs508, %rs492, %p70; 2026-02-21T08:52:47.1101728Z cvt.s16.s8 %rs531, %rs530; 2026-02-21T08:52:47.1101788Z shr.s16 %rs532, %rs531, 4; 2026-02-21T08:52:47.1101957Z selp.b16 %rs533, %rs509, %rs493, %p70; 2026-02-21T08:52:47.1102020Z cvt.s16.s8 %rs534, %rs533; 2026-02-21T08:52:47.1102081Z shr.s16 %rs535, %rs534, 4; 2026-02-21T08:52:47.1102156Z selp.b16 %rs536, %rs510, %rs494, %p70; 2026-02-21T08:52:47.1102220Z cvt.s16.s8 %rs537, %rs536; 2026-02-21T08:52:47.1102282Z 
shr.s16 %rs538, %rs537, 4; 2026-02-21T08:52:47.1102353Z selp.b16 %rs539, %rs511, %rs495, %p70; 2026-02-21T08:52:47.1102426Z cvt.s16.s8 %rs540, %rs539; 2026-02-21T08:52:47.1102495Z shr.s16 %rs541, %rs540, 4; 2026-02-21T08:52:47.1102563Z selp.b16 %rs542, %rs512, %rs496, %p70; 2026-02-21T08:52:47.1102633Z cvt.s16.s8 %rs543, %rs542; 2026-02-21T08:52:47.1102694Z shr.s16 %rs544, %rs543, 4; 2026-02-21T08:52:47.1102762Z selp.b16 %rs545, %rs513, %rs497, %p70; 2026-02-21T08:52:47.1102823Z cvt.s16.s8 %rs546, %rs545; 2026-02-21T08:52:47.1102889Z shr.s16 %rs547, %rs546, 4; 2026-02-21T08:52:47.1102957Z selp.b16 %rs548, %rs514, %rs498, %p70; 2026-02-21T08:52:47.1103020Z cvt.s16.s8 %rs549, %rs548; 2026-02-21T08:52:47.1103087Z shr.s16 %rs550, %rs549, 4; 2026-02-21T08:52:47.1103155Z selp.b16 %rs551, %rs515, %rs499, %p70; 2026-02-21T08:52:47.1103216Z cvt.s16.s8 %rs552, %rs551; 2026-02-21T08:52:47.1103280Z shr.s16 %rs553, %rs552, 4; 2026-02-21T08:52:47.1103353Z selp.b16 %rs554, %rs516, %rs500, %p70; 2026-02-21T08:52:47.1103414Z cvt.s16.s8 %rs555, %rs554; 2026-02-21T08:52:47.1103474Z shr.s16 %rs556, %rs555, 4; 2026-02-21T08:52:47.1103549Z selp.b16 %rs557, %rs517, %rs501, %p70; 2026-02-21T08:52:47.1103611Z cvt.s16.s8 %rs558, %rs557; 2026-02-21T08:52:47.1103671Z shr.s16 %rs559, %rs558, 4; 2026-02-21T08:52:47.1103742Z selp.b16 %rs560, %rs518, %rs502, %p70; 2026-02-21T08:52:47.1103810Z cvt.s16.s8 %rs561, %rs560; 2026-02-21T08:52:47.1103871Z shr.s16 %rs562, %rs561, 4; 2026-02-21T08:52:47.1103939Z selp.b16 %rs563, %rs519, %rs503, %p70; 2026-02-21T08:52:47.1104006Z cvt.s16.s8 %rs564, %rs563; 2026-02-21T08:52:47.1104068Z shr.s16 %rs565, %rs564, 4; 2026-02-21T08:52:47.1104140Z selp.b16 %rs566, %rs520, %rs504, %p70; 2026-02-21T08:52:47.1104206Z cvt.s16.s8 %rs567, %rs566; 2026-02-21T08:52:47.1104267Z shr.s16 %rs568, %rs567, 4; 2026-02-21T08:52:47.1104466Z .loc 1 77 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:77:32 2026-02-21T08:52:47.1104534Z cvt.rn.f32.s16 %r2627, %rs523; 
2026-02-21T08:52:47.1104604Z cvt.rn.f32.s16 %r2628, %rs526; 2026-02-21T08:52:47.1104667Z cvt.rn.f32.s16 %r2629, %rs529; 2026-02-21T08:52:47.1104729Z cvt.rn.f32.s16 %r2630, %rs532; 2026-02-21T08:52:47.1104797Z cvt.rn.f32.s16 %r2631, %rs535; 2026-02-21T08:52:47.1104860Z cvt.rn.f32.s16 %r2632, %rs538; 2026-02-21T08:52:47.1104923Z cvt.rn.f32.s16 %r2633, %rs541; 2026-02-21T08:52:47.1104988Z cvt.rn.f32.s16 %r2634, %rs544; 2026-02-21T08:52:47.1105055Z cvt.rn.f32.s16 %r2635, %rs547; 2026-02-21T08:52:47.1105117Z cvt.rn.f32.s16 %r2636, %rs550; 2026-02-21T08:52:47.1105180Z cvt.rn.f32.s16 %r2637, %rs553; 2026-02-21T08:52:47.1105251Z cvt.rn.f32.s16 %r2638, %rs556; 2026-02-21T08:52:47.1105313Z cvt.rn.f32.s16 %r2639, %rs559; 2026-02-21T08:52:47.1105388Z cvt.rn.f32.s16 %r2640, %rs562; 2026-02-21T08:52:47.1105458Z cvt.rn.f32.s16 %r2641, %rs565; 2026-02-21T08:52:47.1105649Z cvt.rn.f32.s16 %r2642, %rs568; 2026-02-21T08:52:47.1105715Z st.shared.b32 [%r68], %r2627; 2026-02-21T08:52:47.1105786Z st.shared.b32 [%r68+16384], %r2635; 2026-02-21T08:52:47.1105859Z st.shared.b32 [%r69], %r2628; 2026-02-21T08:52:47.1105927Z st.shared.b32 [%r69+16384], %r2636; 2026-02-21T08:52:47.1105991Z st.shared.b32 [%r70], %r2629; 2026-02-21T08:52:47.1106067Z st.shared.b32 [%r70+16384], %r2637; 2026-02-21T08:52:47.1106131Z st.shared.b32 [%r71], %r2630; 2026-02-21T08:52:47.1106198Z st.shared.b32 [%r71+16384], %r2638; 2026-02-21T08:52:47.1106261Z st.shared.b32 [%r72], %r2631; 2026-02-21T08:52:47.1106332Z st.shared.b32 [%r72+16384], %r2639; 2026-02-21T08:52:47.1106395Z st.shared.b32 [%r73], %r2632; 2026-02-21T08:52:47.1106578Z st.shared.b32 [%r73+16384], %r2640; 2026-02-21T08:52:47.1106801Z st.shared.b32 [%r74], %r2633; 2026-02-21T08:52:47.1106884Z st.shared.b32 [%r74+16384], %r2641; 2026-02-21T08:52:47.1106950Z st.shared.b32 [%r75], %r2634; 2026-02-21T08:52:47.1107021Z st.shared.b32 [%r75+16384], %r2642; 2026-02-21T08:52:47.1107085Z $L__tmp9: 2026-02-21T08:52:47.1107367Z .loc 2 291 36 // standard.py:291:36 @[ 
c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:84:40 ] 2026-02-21T08:52:47.1107429Z // begin inline asm 2026-02-21T08:52:47.1107515Z fence.proxy.async.shared::cta; 2026-02-21T08:52:47.1107573Z // end inline asm 2026-02-21T08:52:47.1107631Z bar.sync 0; 2026-02-21T08:52:47.1107717Z shfl.sync.idx.b32 %r2643, %r4, 0, 31, -1; 2026-02-21T08:52:47.1107793Z wgmma.fence.sync.aligned; 2026-02-21T08:52:47.1107855Z shl.b32 %r2644, %r2643, 10; 2026-02-21T08:52:47.1107919Z and.b32 %r2645, %r2644, 12288; 2026-02-21T08:52:47.1107988Z add.s32 %r2646, %r2645, %r2696; 2026-02-21T08:52:47.1108053Z bfe.u32 %r2647, %r2646, 4, 14; 2026-02-21T08:52:47.1108119Z cvt.u64.u32 %rd225, %r2647; 2026-02-21T08:52:47.1108203Z or.b64 %rd214, %rd225, 4611686293372403712; 2026-02-21T08:52:47.1108270Z mov.pred %p56, -1; 2026-02-21T08:52:47.1108333Z // begin inline asm 2026-02-21T08:52:47.1108923Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2793,%r2794,%r2795,%r2796,%r2797,%r2798,%r2799,%r2800,%r2801,%r2802,%r2803,%r2804,%r2805,%r2806,%r2807,%r2808}, {%r2313,%r2314,%r2315,%r2316}, %rd214, %p56, 1, 1; 2026-02-21T08:52:47.1108992Z // end inline asm 2026-02-21T08:52:47.1109053Z add.s32 %r2648, %r2646, 32; 2026-02-21T08:52:47.1109114Z bfe.u32 %r2649, %r2648, 4, 14; 2026-02-21T08:52:47.1109181Z cvt.u64.u32 %rd226, %r2649; 2026-02-21T08:52:47.1109255Z or.b64 %rd215, %rd226, 4611686293372403712; 2026-02-21T08:52:47.1109314Z // begin inline asm 2026-02-21T08:52:47.1109820Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2793,%r2794,%r2795,%r2796,%r2797,%r2798,%r2799,%r2800,%r2801,%r2802,%r2803,%r2804,%r2805,%r2806,%r2807,%r2808}, {%r2349,%r2350,%r2351,%r2352}, %rd215, %p56, 1, 1; 2026-02-21T08:52:47.1109881Z // end inline asm 2026-02-21T08:52:47.1109943Z add.s32 %r2650, %r2646, 64; 2026-02-21T08:52:47.1110012Z bfe.u32 %r2651, %r2650, 4, 14; 2026-02-21T08:52:47.1110075Z cvt.u64.u32 %rd227, %r2651; 2026-02-21T08:52:47.1110147Z or.b64 %rd216, %rd227, 4611686293372403712; 
2026-02-21T08:52:47.1110207Z // begin inline asm 2026-02-21T08:52:47.1110704Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2793,%r2794,%r2795,%r2796,%r2797,%r2798,%r2799,%r2800,%r2801,%r2802,%r2803,%r2804,%r2805,%r2806,%r2807,%r2808}, {%r2385,%r2386,%r2387,%r2388}, %rd216, %p56, 1, 1; 2026-02-21T08:52:47.1110764Z // end inline asm 2026-02-21T08:52:47.1110824Z add.s32 %r2652, %r2646, 96; 2026-02-21T08:52:47.1110890Z bfe.u32 %r2653, %r2652, 4, 14; 2026-02-21T08:52:47.1110951Z cvt.u64.u32 %rd228, %r2653; 2026-02-21T08:52:47.1111024Z or.b64 %rd217, %rd228, 4611686293372403712; 2026-02-21T08:52:47.1111087Z // begin inline asm 2026-02-21T08:52:47.1111582Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2793,%r2794,%r2795,%r2796,%r2797,%r2798,%r2799,%r2800,%r2801,%r2802,%r2803,%r2804,%r2805,%r2806,%r2807,%r2808}, {%r2421,%r2422,%r2423,%r2424}, %rd217, %p56, 1, 1; 2026-02-21T08:52:47.1111783Z // end inline asm 2026-02-21T08:52:47.1111844Z add.s32 %r2654, %r2646, 16384; 2026-02-21T08:52:47.1111910Z bfe.u32 %r2655, %r2654, 4, 14; 2026-02-21T08:52:47.1111970Z cvt.u64.u32 %rd229, %r2655; 2026-02-21T08:52:47.1112054Z or.b64 %rd218, %rd229, 4611686293372403712; 2026-02-21T08:52:47.1112121Z // begin inline asm 2026-02-21T08:52:47.1112615Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2793,%r2794,%r2795,%r2796,%r2797,%r2798,%r2799,%r2800,%r2801,%r2802,%r2803,%r2804,%r2805,%r2806,%r2807,%r2808}, {%r2457,%r2458,%r2459,%r2460}, %rd218, %p56, 1, 1; 2026-02-21T08:52:47.1112672Z // end inline asm 2026-02-21T08:52:47.1112738Z add.s32 %r2656, %r2646, 16416; 2026-02-21T08:52:47.1112800Z bfe.u32 %r2657, %r2656, 4, 14; 2026-02-21T08:52:47.1112953Z cvt.u64.u32 %rd230, %r2657; 2026-02-21T08:52:47.1113028Z or.b64 %rd219, %rd230, 4611686293372403712; 2026-02-21T08:52:47.1113092Z // begin inline asm 2026-02-21T08:52:47.1113590Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r2793,%r2794,%r2795,%r2796,%r2797,%r2798,%r2799,%r2800,%r2801,%r2802,%r2803,%r2804,%r2805,%r2806,%r2807,%r2808}, {%r2493,%r2494,%r2495,%r2496}, %rd219, %p56, 1, 1; 2026-02-21T08:52:47.1113648Z // end inline asm 2026-02-21T08:52:47.1113718Z add.s32 %r2658, %r2646, 16448; 2026-02-21T08:52:47.1113779Z bfe.u32 %r2659, %r2658, 4, 14; 2026-02-21T08:52:47.1113840Z cvt.u64.u32 %rd231, %r2659; 2026-02-21T08:52:47.1113917Z or.b64 %rd220, %rd231, 4611686293372403712; 2026-02-21T08:52:47.1113977Z // begin inline asm 2026-02-21T08:52:47.1114473Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2793,%r2794,%r2795,%r2796,%r2797,%r2798,%r2799,%r2800,%r2801,%r2802,%r2803,%r2804,%r2805,%r2806,%r2807,%r2808}, {%r2529,%r2530,%r2531,%r2532}, %rd220, %p56, 1, 1; 2026-02-21T08:52:47.1114535Z // end inline asm 2026-02-21T08:52:47.1114598Z add.s32 %r2660, %r2646, 16480; 2026-02-21T08:52:47.1114658Z bfe.u32 %r2661, %r2660, 4, 14; 2026-02-21T08:52:47.1114723Z cvt.u64.u32 %rd232, %r2661; 2026-02-21T08:52:47.1114802Z or.b64 %rd221, %rd232, 4611686293372403712; 2026-02-21T08:52:47.1114863Z // begin inline asm 2026-02-21T08:52:47.1115363Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2793,%r2794,%r2795,%r2796,%r2797,%r2798,%r2799,%r2800,%r2801,%r2802,%r2803,%r2804,%r2805,%r2806,%r2807,%r2808}, {%r2565,%r2566,%r2567,%r2568}, %rd221, %p56, 1, 1; 2026-02-21T08:52:47.1115426Z // end inline asm 2026-02-21T08:52:47.1115502Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:47.1115560Z mov.b32 %r2586, 0; 2026-02-21T08:52:47.1115637Z mov.b32 %r2585, %r2696; 2026-02-21T08:52:47.1115700Z mov.b32 %r2587, %r2586; 2026-02-21T08:52:47.1115760Z // begin inline asm 2026-02-21T08:52:47.1116070Z // wait for regs: %r2793,%r2794,%r2795,%r2796,%r2797,%r2798,%r2799,%r2800,%r2801,%r2802,%r2803,%r2804,%r2805,%r2806,%r2807,%r2808,%r2585,%r2586,%r2587 2026-02-21T08:52:47.1116153Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:47.1116209Z // end inline asm 2026-02-21T08:52:47.1116271Z $L__tmp10: 
2026-02-21T08:52:47.1116595Z .loc 1 40 71 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:40:71 2026-02-21T08:52:47.1116662Z add.s32 %r2662, %r2792, 1; 2026-02-21T08:52:47.1116728Z setp.gt.s32 %p67, %r2662, 1; 2026-02-21T08:52:47.1116802Z selp.b32 %r2792, 0, %r2662, %p67; 2026-02-21T08:52:47.1117004Z .loc 1 48 32 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:32 2026-02-21T08:52:47.1117078Z mad.wide.s32 %rd223, %r2790, 2, %rd33; 2026-02-21T08:52:47.1117276Z .loc 1 48 80 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:48:80 2026-02-21T08:52:47.1117347Z shl.b32 %r2663, %r2792, 12; 2026-02-21T08:52:47.1117418Z shl.b32 %r2664, %r2792, 13; 2026-02-21T08:52:47.1117486Z add.s32 %r2665, %r2616, %r2664; 2026-02-21T08:52:47.1117557Z add.s32 %r2607, %r2665, %r50; 2026-02-21T08:52:47.1117625Z selp.b32 %r2608, 8, 0, %p65; 2026-02-21T08:52:47.1117686Z // begin inline asm 2026-02-21T08:52:47.1117971Z cp.async.ca.shared.global [ %r2607 + 0 ], [ %rd245 + 0 ], 0x8, %r2608; 2026-02-21T08:52:47.1118030Z // end inline asm 2026-02-21T08:52:47.1118094Z add.s32 %r2609, %r2607, 4096; 2026-02-21T08:52:47.1118154Z // begin inline asm 2026-02-21T08:52:47.1118302Z cp.async.ca.shared.global [ %r2609 + 0 ], [ %rd223 + 0 ], 0x8, %r2608; 2026-02-21T08:52:47.1118359Z // end inline asm 2026-02-21T08:52:47.1118426Z cp.async.commit_group; 2026-02-21T08:52:47.1118633Z .loc 1 54 87 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:54:87 2026-02-21T08:52:47.1118695Z add.s32 %r2611, %r54, %r2663; 2026-02-21T08:52:47.1118753Z // begin inline asm 2026-02-21T08:52:47.1118888Z cp.async.ca.shared.global [ %r2611 + 0 ], [ %rd244 + 0 ], 0x8, %r2608; 2026-02-21T08:52:47.1119076Z // end inline asm 2026-02-21T08:52:47.1119148Z cp.async.commit_group; 2026-02-21T08:52:47.1119349Z .loc 1 40 71 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:40:71 2026-02-21T08:52:47.1119429Z add.s64 %rd245, %rd245, 128; 2026-02-21T08:52:47.1119490Z add.s32 %r2790, %r2790, 64; 
2026-02-21T08:52:47.1119554Z add.s64 %rd244, %rd244, 229376; 2026-02-21T08:52:47.1119626Z setp.lt.u64 %p68, %rd246, 4064; 2026-02-21T08:52:47.1119688Z @%p68 bra $L__BB0_14; 2026-02-21T08:52:47.1119801Z // %bb.15: // in Loop: Header=BB0_13 Depth=1 2026-02-21T08:52:47.1119870Z cp.async.wait_group 0; 2026-02-21T08:52:47.1119931Z bar.sync 0; 2026-02-21T08:52:47.1120128Z .loc 1 87 28 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:87:28 2026-02-21T08:52:47.1120207Z cvt.rn.bf16x2.f32 %r2684, %r2794, %r2793; 2026-02-21T08:52:47.1120287Z cvt.rn.bf16x2.f32 %r2685, %r2796, %r2795; 2026-02-21T08:52:47.1120360Z cvt.rn.bf16x2.f32 %r2686, %r2798, %r2797; 2026-02-21T08:52:47.1120431Z cvt.rn.bf16x2.f32 %r2687, %r2800, %r2799; 2026-02-21T08:52:47.1120507Z cvt.rn.bf16x2.f32 %r2688, %r2802, %r2801; 2026-02-21T08:52:47.1120584Z cvt.rn.bf16x2.f32 %r2689, %r2804, %r2803; 2026-02-21T08:52:47.1120654Z cvt.rn.bf16x2.f32 %r2690, %r2806, %r2805; 2026-02-21T08:52:47.1120724Z cvt.rn.bf16x2.f32 %r2691, %r2808, %r2807; 2026-02-21T08:52:47.1120933Z .loc 1 88 50 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:88:50 2026-02-21T08:52:47.1121004Z mad.lo.s32 %r2692, %r264, 7168, %r266; 2026-02-21T08:52:47.1121071Z mad.lo.s32 %r2693, %r265, 7168, %r266; 2026-02-21T08:52:47.1121272Z .loc 1 88 22 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:88:22 2026-02-21T08:52:47.1121342Z mad.wide.s32 %rd233, %r2692, 2, %rd35; 2026-02-21T08:52:47.1121409Z mad.wide.s32 %rd234, %r2693, 2, %rd35; 2026-02-21T08:52:47.1121613Z .loc 1 88 81 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:88:81 2026-02-21T08:52:47.1121737Z st.shared.v4.b32 [%r76], {%r2684, %r2686, %r2688, %r2690}; 2026-02-21T08:52:47.1121859Z st.shared.v4.b32 [%r76+512], {%r2685, %r2687, %r2689, %r2691}; 2026-02-21T08:52:47.1121918Z bar.sync 0; 2026-02-21T08:52:47.1121984Z // begin inline asm 2026-02-21T08:52:47.1122179Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2666, %r2667, %r2668, %r2669}, [%r2670]; 
2026-02-21T08:52:47.1122238Z // end inline asm 2026-02-21T08:52:47.1122301Z // begin inline asm 2026-02-21T08:52:47.1122485Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2671, %r2672, %r2673, %r2674}, [%r2675]; 2026-02-21T08:52:47.1122543Z // end inline asm 2026-02-21T08:52:47.1122607Z // begin inline asm 2026-02-21T08:52:47.1122734Z st.global.v4.b32 [ %rd233 + 0 ], { %r2666, %r2667, %r2668, %r2669 }; 2026-02-21T08:52:47.1122793Z // end inline asm 2026-02-21T08:52:47.1122852Z // begin inline asm 2026-02-21T08:52:47.1122978Z st.global.v4.b32 [ %rd234 + 0 ], { %r2671, %r2672, %r2673, %r2674 }; 2026-02-21T08:52:47.1123035Z // end inline asm 2026-02-21T08:52:47.1123247Z .loc 1 19 144 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:19:144 2026-02-21T08:52:47.1123422Z add.s32 %r306, %r2789, 4224; 2026-02-21T08:52:47.1123493Z setp.lt.s32 %p69, %r2789, -4168; 2026-02-21T08:52:47.1123555Z mov.b32 %r2789, %r306; 2026-02-21T08:52:47.1123615Z @%p69 bra $L__BB0_13; 2026-02-21T08:52:47.1123711Z $L__BB0_16: // %._crit_edge 2026-02-21T08:52:47.1123911Z .loc 1 19 4 // c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py:19:4 2026-02-21T08:52:47.1123965Z ret; 2026-02-21T08:52:47.1124027Z $L__tmp11: 2026-02-21T08:52:47.1124084Z $L__func_end0: 2026-02-21T08:52:47.1124173Z // -- End function 2026-02-21T08:52:47.1124232Z } 2026-02-21T08:52:47.1124574Z .file 1 "/tmp/torchinductor_root/7h/c7hd3qfayfosybzqfaz2rroutsv4ti4brk4tb5fogobfox2tm5wt.py" 2026-02-21T08:52:47.1124800Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T08:52:47.1124868Z .section .debug_abbrev 2026-02-21T08:52:47.1124932Z { 2026-02-21T08:52:47.1125027Z .b8 1 // Abbreviation Code 2026-02-21T08:52:47.1125127Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:52:47.1125218Z .b8 1 // DW_CHILDREN_yes 2026-02-21T08:52:47.1125304Z .b8 37 // DW_AT_producer 2026-02-21T08:52:47.1125385Z .b8 8 // DW_FORM_string 2026-02-21T08:52:47.1125470Z .b8 19 // DW_AT_language 
2026-02-21T08:52:47.1125553Z .b8 5 // DW_FORM_data2 2026-02-21T08:52:47.1125633Z .b8 3 // DW_AT_name 2026-02-21T08:52:47.1125713Z .b8 8 // DW_FORM_string 2026-02-21T08:52:47.1125805Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:52:47.1125886Z .b8 6 // DW_FORM_data4 2026-02-21T08:52:47.1125967Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:52:47.1126055Z .b8 8 // DW_FORM_string 2026-02-21T08:52:47.1126130Z .b8 0 // EOM(1) 2026-02-21T08:52:47.1126202Z .b8 0 // EOM(2) 2026-02-21T08:52:47.1126296Z .b8 2 // Abbreviation Code 2026-02-21T08:52:47.1126382Z .b8 46 // DW_TAG_subprogram 2026-02-21T08:52:47.1126585Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:52:47.1126673Z .b8 3 // DW_AT_name 2026-02-21T08:52:47.1126755Z .b8 8 // DW_FORM_string 2026-02-21T08:52:47.1126838Z .b8 32 // DW_AT_inline 2026-02-21T08:52:47.1126925Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:47.1127004Z .b8 0 // EOM(1) 2026-02-21T08:52:47.1127073Z .b8 0 // EOM(2) 2026-02-21T08:52:47.1127161Z .b8 3 // Abbreviation Code 2026-02-21T08:52:47.1127252Z .b8 46 // DW_TAG_subprogram 2026-02-21T08:52:47.1127334Z .b8 1 // DW_CHILDREN_yes 2026-02-21T08:52:47.1127418Z .b8 17 // DW_AT_low_pc 2026-02-21T08:52:47.1127501Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:47.1127583Z .b8 18 // DW_AT_high_pc 2026-02-21T08:52:47.1127672Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:47.1127768Z .b8 49 // DW_AT_abstract_origin 2026-02-21T08:52:47.1127856Z .b8 19 // DW_FORM_ref4 2026-02-21T08:52:47.1127929Z .b8 0 // EOM(1) 2026-02-21T08:52:47.1128001Z .b8 0 // EOM(2) 2026-02-21T08:52:47.1128093Z .b8 4 // Abbreviation Code 2026-02-21T08:52:47.1128331Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T08:52:47.1128412Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:52:47.1128509Z .b8 49 // DW_AT_abstract_origin 2026-02-21T08:52:47.1128588Z .b8 19 // DW_FORM_ref4 2026-02-21T08:52:47.1128666Z .b8 17 // DW_AT_low_pc 2026-02-21T08:52:47.1128741Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:47.1128829Z .b8 18 // DW_AT_high_pc 2026-02-21T08:52:47.1128905Z .b8 1 // DW_FORM_addr 
2026-02-21T08:52:47.1129120Z .b8 88 // DW_AT_call_file 2026-02-21T08:52:47.1129207Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:47.1129288Z .b8 89 // DW_AT_call_line 2026-02-21T08:52:47.1129370Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:47.1129460Z .b8 87 // DW_AT_call_column 2026-02-21T08:52:47.1129538Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:47.1129608Z .b8 0 // EOM(1) 2026-02-21T08:52:47.1129684Z .b8 0 // EOM(2) 2026-02-21T08:52:47.1129755Z .b8 0 // EOM(3) 2026-02-21T08:52:47.1129808Z } 2026-02-21T08:52:47.1129884Z .section .debug_info 2026-02-21T08:52:47.1129942Z { 2026-02-21T08:52:47.1130033Z .b32 178 // Length of Unit 2026-02-21T08:52:47.1130126Z .b8 2 // DWARF version number 2026-02-21T08:52:47.1130181Z .b8 0 2026-02-21T08:52:47.1130318Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:52:47.1130414Z .b8 8 // Address Size (in bytes) 2026-02-21T08:52:47.1130532Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T08:52:47.1130625Z .b8 116 // DW_AT_producer 2026-02-21T08:52:47.1130680Z .b8 114 2026-02-21T08:52:47.1130737Z .b8 105 2026-02-21T08:52:47.1130794Z .b8 116 2026-02-21T08:52:47.1130846Z .b8 111 2026-02-21T08:52:47.1130898Z .b8 110 2026-02-21T08:52:47.1130948Z .b8 0 2026-02-21T08:52:47.1131032Z .b8 2 // DW_AT_language 2026-02-21T08:52:47.1131085Z .b8 0 2026-02-21T08:52:47.1131163Z .b8 99 // DW_AT_name 2026-02-21T08:52:47.1131219Z .b8 55 2026-02-21T08:52:47.1131271Z .b8 104 2026-02-21T08:52:47.1131324Z .b8 100 2026-02-21T08:52:47.1131377Z .b8 51 2026-02-21T08:52:47.1131434Z .b8 113 2026-02-21T08:52:47.1131490Z .b8 102 2026-02-21T08:52:47.1131543Z .b8 97 2026-02-21T08:52:47.1131600Z .b8 121 2026-02-21T08:52:47.1131652Z .b8 102 2026-02-21T08:52:47.1131704Z .b8 111 2026-02-21T08:52:47.1131756Z .b8 115 2026-02-21T08:52:47.1131815Z .b8 121 2026-02-21T08:52:47.1131866Z .b8 98 2026-02-21T08:52:47.1131918Z .b8 122 2026-02-21T08:52:47.1131984Z .b8 113 2026-02-21T08:52:47.1132045Z .b8 102 2026-02-21T08:52:47.1132099Z .b8 97 
2026-02-21T08:52:47.1132152Z .b8 122 2026-02-21T08:52:47.1132207Z .b8 50 2026-02-21T08:52:47.1132259Z .b8 114 2026-02-21T08:52:47.1132311Z .b8 114 2026-02-21T08:52:47.1132363Z .b8 111 2026-02-21T08:52:47.1132421Z .b8 117 2026-02-21T08:52:47.1132474Z .b8 116 2026-02-21T08:52:47.1132528Z .b8 115 2026-02-21T08:52:47.1132586Z .b8 118 2026-02-21T08:52:47.1132637Z .b8 52 2026-02-21T08:52:47.1132690Z .b8 116 2026-02-21T08:52:47.1132743Z .b8 105 2026-02-21T08:52:47.1132811Z .b8 52 2026-02-21T08:52:47.1132864Z .b8 98 2026-02-21T08:52:47.1132916Z .b8 114 2026-02-21T08:52:47.1132974Z .b8 107 2026-02-21T08:52:47.1133029Z .b8 52 2026-02-21T08:52:47.1133081Z .b8 116 2026-02-21T08:52:47.1133135Z .b8 98 2026-02-21T08:52:47.1133195Z .b8 53 2026-02-21T08:52:47.1133247Z .b8 102 2026-02-21T08:52:47.1133299Z .b8 111 2026-02-21T08:52:47.1133461Z .b8 103 2026-02-21T08:52:47.1133515Z .b8 111 2026-02-21T08:52:47.1133565Z .b8 98 2026-02-21T08:52:47.1133618Z .b8 102 2026-02-21T08:52:47.1133675Z .b8 111 2026-02-21T08:52:47.1133728Z .b8 120 2026-02-21T08:52:47.1133779Z .b8 50 2026-02-21T08:52:47.1133831Z .b8 116 2026-02-21T08:52:47.1133887Z .b8 109 2026-02-21T08:52:47.1133939Z .b8 53 2026-02-21T08:52:47.1133992Z .b8 119 2026-02-21T08:52:47.1134048Z .b8 116 2026-02-21T08:52:47.1134100Z .b8 46 2026-02-21T08:52:47.1134151Z .b8 112 2026-02-21T08:52:47.1134203Z .b8 121 2026-02-21T08:52:47.1134260Z .b8 0 2026-02-21T08:52:47.1134365Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:52:47.1134446Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:52:47.1134506Z .b8 116 2026-02-21T08:52:47.1134654Z .b8 109 2026-02-21T08:52:47.1134710Z .b8 112 2026-02-21T08:52:47.1134762Z .b8 47 2026-02-21T08:52:47.1134820Z .b8 116 2026-02-21T08:52:47.1134873Z .b8 111 2026-02-21T08:52:47.1134925Z .b8 114 2026-02-21T08:52:47.1134987Z .b8 99 2026-02-21T08:52:47.1135039Z .b8 104 2026-02-21T08:52:47.1135091Z .b8 105 2026-02-21T08:52:47.1135143Z .b8 110 2026-02-21T08:52:47.1135202Z .b8 100 2026-02-21T08:52:47.1135254Z .b8 117 
2026-02-21T08:52:47.1135306Z .b8 99 2026-02-21T08:52:47.1135362Z .b8 116 2026-02-21T08:52:47.1135420Z .b8 111 2026-02-21T08:52:47.1135473Z .b8 114 2026-02-21T08:52:47.1135525Z .b8 95 2026-02-21T08:52:47.1135583Z .b8 114 2026-02-21T08:52:47.1135637Z .b8 111 2026-02-21T08:52:47.1135690Z .b8 111 2026-02-21T08:52:47.1135742Z .b8 116 2026-02-21T08:52:47.1135798Z .b8 47 2026-02-21T08:52:47.1135849Z .b8 55 2026-02-21T08:52:47.1135902Z .b8 104 2026-02-21T08:52:47.1135959Z .b8 0 2026-02-21T08:52:47.1136074Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T08:52:47.1136157Z .b8 95 // DW_AT_name 2026-02-21T08:52:47.1136211Z .b8 104 2026-02-21T08:52:47.1136268Z .b8 101 2026-02-21T08:52:47.1136322Z .b8 108 2026-02-21T08:52:47.1136377Z .b8 105 2026-02-21T08:52:47.1136432Z .b8 111 2026-02-21T08:52:47.1136599Z .b8 110 2026-02-21T08:52:47.1136657Z .b8 95 2026-02-21T08:52:47.1136709Z .b8 109 2026-02-21T08:52:47.1136766Z .b8 97 2026-02-21T08:52:47.1136818Z .b8 116 2026-02-21T08:52:47.1136869Z .b8 109 2026-02-21T08:52:47.1136925Z .b8 117 2026-02-21T08:52:47.1136980Z .b8 108 2026-02-21T08:52:47.1137030Z .b8 95 2026-02-21T08:52:47.1137081Z .b8 98 2026-02-21T08:52:47.1137137Z .b8 102 2026-02-21T08:52:47.1137188Z .b8 49 2026-02-21T08:52:47.1137239Z .b8 54 2026-02-21T08:52:47.1137289Z .b8 95 2026-02-21T08:52:47.1137345Z .b8 105 2026-02-21T08:52:47.1137397Z .b8 110 2026-02-21T08:52:47.1137448Z .b8 116 2026-02-21T08:52:47.1137503Z .b8 52 2026-02-21T08:52:47.1137554Z .b8 0 2026-02-21T08:52:47.1137635Z .b8 1 // DW_AT_inline 2026-02-21T08:52:47.1137745Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T08:52:47.1137844Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T08:52:47.1137943Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T08:52:47.1138053Z .b32 108 // DW_AT_abstract_origin 2026-02-21T08:52:47.1138193Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T08:52:47.1138294Z .b32 108 // DW_AT_abstract_origin 2026-02-21T08:52:47.1138382Z .b64 $L__tmp1 // 
DW_AT_low_pc 2026-02-21T08:52:47.1138476Z .b64 $L__tmp10 // DW_AT_high_pc 2026-02-21T08:52:47.1138559Z .b8 1 // DW_AT_call_file 2026-02-21T08:52:47.1138640Z .b8 84 // DW_AT_call_line 2026-02-21T08:52:47.1138731Z .b8 40 // DW_AT_call_column 2026-02-21T08:52:47.1138822Z .b8 0 // End Of Children Mark 2026-02-21T08:52:47.1138911Z .b8 0 // End Of Children Mark 2026-02-21T08:52:47.1139098Z } 2026-02-21T08:52:47.1139177Z .section .debug_macinfo { } 2026-02-21T08:52:47.1139184Z 2026-02-21T08:52:47.1139278Z ================================================================ 2026-02-21T08:52:47.1139399Z please share the reproducer above with Triton project. 2026-02-21T08:52:47.6535399Z 2026-02-21T08:52:47.6535412Z 2026-02-21T08:52:47.6535418Z 2026-02-21T08:52:47.6535790Z ================================================================ 2026-02-21T08:52:47.6536167Z Internal Triton PTX codegen error 2026-02-21T08:52:47.6536433Z `ptxas` stderr: 2026-02-21T08:52:47.6537338Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 471 in function _helion_matmul_bf16_int4. Try to compile with register target of 46 or higher. 
2026-02-21T08:52:47.6538504Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:52:47.6538763Z 2026-02-21T08:52:47.6539415Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp6sjtovhj.ptx -o /tmp/tmp6sjtovhj.ptx.o 2026-02-21T08:52:47.6540182Z 2026-02-21T08:52:47.6540187Z 2026-02-21T08:52:47.6540261Z // 2026-02-21T08:52:47.6540459Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:52:47.6540712Z // 2026-02-21T08:52:47.6540803Z 2026-02-21T08:52:47.6540888Z .version 8.7 2026-02-21T08:52:47.6541073Z .target sm_90a 2026-02-21T08:52:47.6541266Z .address_size 64 2026-02-21T08:52:47.6541383Z 2026-02-21T08:52:47.6541610Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T08:52:47.6542057Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:52:47.6542420Z // @_helion_matmul_bf16_int4 2026-02-21T08:52:47.6542753Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T08:52:47.6543136Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T08:52:47.6543581Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T08:52:47.6544017Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T08:52:47.6544439Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T08:52:47.6544872Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T08:52:47.6545215Z ) 2026-02-21T08:52:47.6545382Z .reqntid 1024 2026-02-21T08:52:47.6545577Z .maxnreg 32 2026-02-21T08:52:47.6545735Z { 2026-02-21T08:52:47.6545880Z .reg .pred %p<58>; 2026-02-21T08:52:47.6546051Z .reg .b16 %rs<457>; 2026-02-21T08:52:47.6546221Z .reg .b32 %r<2223>; 2026-02-21T08:52:47.6546381Z .reg .b64 %rd<156>; 2026-02-21T08:52:47.6546869Z .loc 1 14 0 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:14:0 2026-02-21T08:52:47.6547259Z $L__func_begin0: 
2026-02-21T08:52:47.6547593Z .loc 1 14 0 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:14:0 2026-02-21T08:52:47.6547914Z 2026-02-21T08:52:47.6547979Z // %bb.0: 2026-02-21T08:52:47.6548191Z ld.param.b64 %rd17, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T08:52:47.6548607Z ld.param.b64 %rd16, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T08:52:47.6548938Z ld.param.b64 %rd15, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T08:52:47.6549203Z $L__tmp0: 2026-02-21T08:52:47.6549513Z .loc 1 19 46 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:19:46 2026-02-21T08:52:47.6549913Z mov.u32 %r2142, %ctaid.x; 2026-02-21T08:52:47.6550248Z .loc 1 0 0 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:0 2026-02-21T08:52:47.6550619Z sub.s32 %r242, 4251, %r2142; 2026-02-21T08:52:47.6550986Z .loc 1 19 144 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:19:144 2026-02-21T08:52:47.6551836Z [84s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T08:52:47.6553511Z Config: @helion.kernel(config=helion.Config(block_sizes=[32, 64, 256], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[2], load_eviction_policies=['', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=32, num_stages=7, num_warps=32, pid_type='persistent_interleaved', range_flattens=[False, True], range_multi_buffers=[False, False], range_num_stages=[3, 3], range_unroll_factors=[3, 0], range_warp_specializes=[]), static_shapes=True) 2026-02-21T08:52:47.6555307Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:52:47.6555621Z `ptxas` stderr: 2026-02-21T08:52:47.6556221Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 471 in function _helion_matmul_bf16_int4. Try to compile with register target of 46 or higher. 
2026-02-21T08:52:47.6557012Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:52:47.6557364Z 2026-02-21T08:52:47.6557872Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp6sjtovhj.ptx -o /tmp/tmp6sjtovhj.ptx.o 2026-02-21T08:52:47.6558462Z 2026-02-21T08:52:47.6558615Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T08:52:47.6558929Z mul.hi.u32 %r243, %r242, 1041204193; 2026-02-21T08:52:47.6559134Z shr.u32 %r244, %r243, 10; 2026-02-21T08:52:47.6559325Z mul.hi.u32 %r245, %r244, 1431655766; 2026-02-21T08:52:47.6559539Z mad.lo.s32 %r2203, %r245, 12672, %r2142; 2026-02-21T08:52:47.6559902Z .loc 1 31 45 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:31:45 2026-02-21T08:52:47.6560261Z mov.u32 %r3, %tid.x; 2026-02-21T08:52:47.6560435Z shr.u32 %r4, %r3, 5; 2026-02-21T08:52:47.6560592Z shr.u32 %r5, %r3, 4; 2026-02-21T08:52:47.6560756Z or.b32 %r6, %r4, 32; 2026-02-21T08:52:47.6560913Z shl.b32 %r246, %r3, 2; 2026-02-21T08:52:47.6561098Z and.b32 %r7, %r246, 60; 2026-02-21T08:52:47.6561418Z .loc 1 33 45 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:33:45 2026-02-21T08:52:47.6561769Z and.b32 %r8, %r3, 31; 2026-02-21T08:52:47.6561940Z shl.b32 %r9, %r8, 3; 2026-02-21T08:52:47.6562240Z .loc 1 65 38 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:65:38 2026-02-21T08:52:47.6562594Z and.b32 %r10, %r3, 256; 2026-02-21T08:52:47.6562913Z .loc 1 19 144 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:19:144 2026-02-21T08:52:47.6563289Z setp.ge.s32 %p1, %r2142, %r2203; 2026-02-21T08:52:47.6563487Z shl.b32 %r2126, %r3, 3; 2026-02-21T08:52:47.6563653Z shr.u32 %r2127, %r3, 1; 2026-02-21T08:52:47.6563842Z mov.b32 %r2128, global_smem; 2026-02-21T08:52:47.6564028Z mul.lo.s32 %r2129, %r4, 7168; 2026-02-21T08:52:47.6564217Z shl.b32 %r2130, %r3, 6; 2026-02-21T08:52:47.6564378Z shl.b32 %r2131, %r3, 5; 
2026-02-21T08:52:47.6564546Z shl.b32 %r2132, %r8, 1; 2026-02-21T08:52:47.6564709Z and.b32 %r2133, %r3, 255; 2026-02-21T08:52:47.6564893Z shl.b32 %r2134, %r3, 4; 2026-02-21T08:52:47.6565060Z shr.u32 %r2135, %r3, 6; 2026-02-21T08:52:47.6565235Z shl.b32 %r2136, %r3, 13; 2026-02-21T08:52:47.6573398Z and.b32 %r2137, %r3, 24; 2026-02-21T08:52:47.6573621Z shr.u32 %r2138, %r3, 3; 2026-02-21T08:52:47.6573833Z bfe.s32 %r2139, %r3, 2, 1; 2026-02-21T08:52:47.6574023Z shl.b32 %r2140, %r3, 1; 2026-02-21T08:52:47.6574209Z bfe.s32 %r2141, %r3, 5, 1; 2026-02-21T08:52:47.6574394Z setp.eq.b32 %p57, %r10, 0; 2026-02-21T08:52:47.6574586Z @%p1 bra $L__BB0_9; 2026-02-21T08:52:47.6574789Z // %bb.1: // %.lr.ph 2026-02-21T08:52:47.6575209Z .loc 1 0 144 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:0:144 2026-02-21T08:52:47.6575600Z and.b32 %r249, %r2127, 56; 2026-02-21T08:52:47.6575789Z xor.b32 %r250, %r249, %r2126; 2026-02-21T08:52:47.6575993Z add.s32 %r252, %r2128, %r250; 2026-02-21T08:52:47.6576179Z add.s32 %r11, %r252, 65536; 2026-02-21T08:52:47.6576367Z add.s32 %r253, %r2128, 81920; 2026-02-21T08:52:47.6576715Z add.s32 %r13, %r253, %r2126; 2026-02-21T08:52:47.6577163Z add.s32 %r14, %r252, 73728; 2026-02-21T08:52:47.6577352Z add.s32 %r254, %r2128, %r2126; 2026-02-21T08:52:47.6577536Z add.s32 %r15, %r254, 90112; 2026-02-21T08:52:47.6577735Z and.b32 %r256, %r2130, 6144; 2026-02-21T08:52:47.6577913Z and.b32 %r258, %r2131, 896; 2026-02-21T08:52:47.6578094Z or.b32 %r260, %r256, %r258; 2026-02-21T08:52:47.6578268Z or.b32 %r16, %r260, %r2132; 2026-02-21T08:52:47.6578450Z xor.b32 %r17, %r16, 8; 2026-02-21T08:52:47.6578620Z xor.b32 %r18, %r16, 16; 2026-02-21T08:52:47.6578799Z xor.b32 %r19, %r16, 24; 2026-02-21T08:52:47.6578968Z xor.b32 %r20, %r16, 32; 2026-02-21T08:52:47.6579143Z xor.b32 %r21, %r16, 40; 2026-02-21T08:52:47.6579312Z xor.b32 %r22, %r16, 48; 2026-02-21T08:52:47.6579474Z xor.b32 %r23, %r16, 56; 2026-02-21T08:52:47.6579812Z and.b32 %r262, %r2127, 256; 
2026-02-21T08:52:47.6580007Z add.s32 %r263, %r253, %r262; 2026-02-21T08:52:47.6580185Z add.s32 %r24, %r263, %r2133; 2026-02-21T08:52:47.6580371Z shl.b32 %r264, %r2133, 7; 2026-02-21T08:52:47.6580558Z and.b32 %r266, %r2134, 112; 2026-02-21T08:52:47.6580737Z and.b32 %r268, %r2135, 12; 2026-02-21T08:52:47.6580925Z or.b32 %r269, %r264, %r268; 2026-02-21T08:52:47.6581099Z or.b32 %r270, %r269, %r266; 2026-02-21T08:52:47.6581285Z add.s32 %r25, %r2128, %r270; 2026-02-21T08:52:47.6581463Z xor.b32 %r271, %r270, 16; 2026-02-21T08:52:47.6581644Z add.s32 %r26, %r2128, %r271; 2026-02-21T08:52:47.6581837Z xor.b32 %r272, %r270, 32; 2026-02-21T08:52:47.6582032Z add.s32 %r27, %r2128, %r272; 2026-02-21T08:52:47.6582236Z xor.b32 %r273, %r270, 48; 2026-02-21T08:52:47.6582412Z add.s32 %r28, %r2128, %r273; 2026-02-21T08:52:47.6582597Z xor.b32 %r274, %r270, 64; 2026-02-21T08:52:47.6582770Z add.s32 %r29, %r2128, %r274; 2026-02-21T08:52:47.6582950Z xor.b32 %r275, %r270, 80; 2026-02-21T08:52:47.6583123Z add.s32 %r30, %r2128, %r275; 2026-02-21T08:52:47.6583307Z xor.b32 %r276, %r270, 96; 2026-02-21T08:52:47.6583479Z add.s32 %r31, %r2128, %r276; 2026-02-21T08:52:47.6583667Z xor.b32 %r277, %r270, 112; 2026-02-21T08:52:47.6583844Z add.s32 %r32, %r2128, %r277; 2026-02-21T08:52:47.6584030Z and.b32 %r279, %r2136, 24576; 2026-02-21T08:52:47.6584222Z and.b32 %r280, %r2131, 3168; 2026-02-21T08:52:47.6584399Z shl.b32 %r282, %r2137, 4; 2026-02-21T08:52:47.6584579Z and.b32 %r284, %r2138, 112; 2026-02-21T08:52:47.6584756Z and.b32 %r286, %r2139, 4112; 2026-02-21T08:52:47.6584941Z or.b32 %r287, %r280, %r282; 2026-02-21T08:52:47.6585128Z xor.b32 %r288, %r286, %r284; 2026-02-21T08:52:47.6585316Z xor.b32 %r289, %r288, %r287; 2026-02-21T08:52:47.6585497Z add.s32 %r290, %r2128, %r279; 2026-02-21T08:52:47.6585685Z add.s32 %r33, %r290, %r289; 2026-02-21T08:52:47.6585862Z shl.b32 %r291, %r2137, 10; 2026-02-21T08:52:47.6586041Z shl.b32 %r292, %r2137, 2; 2026-02-21T08:52:47.6586222Z and.b32 %r294, %r2140, 1920; 
2026-02-21T08:52:47.6586395Z and.b32 %r296, %r2141, 4112; 2026-02-21T08:52:47.6586725Z or.b32 %r297, %r291, %r266; 2026-02-21T08:52:47.6586902Z or.b32 %r298, %r292, %r294; 2026-02-21T08:52:47.6587086Z xor.b32 %r299, %r297, %r298; 2026-02-21T08:52:47.6587259Z xor.b32 %r300, %r299, %r296; 2026-02-21T08:52:47.6587439Z add.s32 %r719, %r2128, %r300; 2026-02-21T08:52:47.6587622Z add.s32 %r724, %r719, 2048; 2026-02-21T08:52:47.6587970Z .loc 1 40 103 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:40:103 2026-02-21T08:52:47.6588360Z or.b32 %r301, %r2129, %r9; 2026-02-21T08:52:47.6588684Z add.s32 %r36, %r301, 458752; 2026-02-21T08:52:47.6589034Z .loc 1 19 144 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:19:144 2026-02-21T08:52:47.6589403Z shl.b32 %r302, %r5, 13; 2026-02-21T08:52:47.6589731Z .loc 1 40 103 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:40:103 2026-02-21T08:52:47.6590090Z or.b32 %r303, %r302, %r7; 2026-02-21T08:52:47.6590285Z or.b32 %r37, %r303, 128; 2026-02-21T08:52:47.6590475Z cvt.u64.u32 %rd2, %r7; 2026-02-21T08:52:47.6590817Z cvt.u64.u32 %rd3, %r2129; 2026-02-21T08:52:47.6591056Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T08:52:47.6591360Z // Child Loop BB0_3 Depth 2 2026-02-21T08:52:47.6591643Z // Child Loop BB0_5 Depth 2 2026-02-21T08:52:47.6591909Z // Child Loop BB0_7 Depth 2 2026-02-21T08:52:47.6592306Z .loc 1 25 35 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:25:35 2026-02-21T08:52:47.6592685Z mul.hi.s32 %r315, %r2142, -1840700269; 2026-02-21T08:52:47.6592899Z add.s32 %r316, %r315, %r2142; 2026-02-21T08:52:47.6593096Z shr.u32 %r317, %r316, 31; 2026-02-21T08:52:47.6593270Z shr.s32 %r318, %r316, 5; 2026-02-21T08:52:47.6593610Z add.s32 %r319, %r318, %r317; 2026-02-21T08:52:47.6593942Z .loc 1 26 33 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:26:33 2026-02-21T08:52:47.6594304Z shl.b32 %r320, %r319, 1; 2026-02-21T08:52:47.6594611Z .loc 1 27 39 // 
cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:27:39 2026-02-21T08:52:47.6594965Z sub.s32 %r321, 1, %r320; 2026-02-21T08:52:47.6595277Z .loc 1 27 52 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:27:52 2026-02-21T08:52:47.6595631Z min.s32 %r322, %r321, 2; 2026-02-21T08:52:47.6595956Z .loc 1 28 45 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:28:45 2026-02-21T08:52:47.6596306Z mul.lo.s32 %r323, %r319, 56; 2026-02-21T08:52:47.6596608Z sub.s32 %r324, %r2142, %r323; 2026-02-21T08:52:47.6596927Z .loc 1 29 51 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:29:51 2026-02-21T08:52:47.6597282Z div.s32 %r325, %r324, %r322; 2026-02-21T08:52:47.6597599Z .loc 1 28 64 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:28:64 2026-02-21T08:52:47.6597949Z mul.lo.s32 %r326, %r325, %r322; 2026-02-21T08:52:47.6598156Z sub.s32 %r327, %r324, %r326; 2026-02-21T08:52:47.6598473Z .loc 1 28 30 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:28:30 2026-02-21T08:52:47.6598826Z add.s32 %r328, %r327, %r320; 2026-02-21T08:52:47.6599134Z .loc 1 30 27 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:30:27 2026-02-21T08:52:47.6599487Z shl.b32 %r66, %r328, 6; 2026-02-21T08:52:47.6599798Z .loc 1 31 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:31:32 2026-02-21T08:52:47.6600147Z or.b32 %r329, %r66, %r5; 2026-02-21T08:52:47.6600456Z .loc 1 32 27 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:32:27 2026-02-21T08:52:47.6600805Z shl.b32 %r330, %r325, 8; 2026-02-21T08:52:47.6601116Z .loc 1 33 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:33:32 2026-02-21T08:52:47.6601458Z or.b32 %r67, %r330, %r9; 2026-02-21T08:52:47.6601771Z .loc 1 48 53 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:53 2026-02-21T08:52:47.6602126Z shl.b32 %r331, %r329, 13; 2026-02-21T08:52:47.6602435Z .loc 1 48 60 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:60 2026-02-21T08:52:47.6602791Z 
or.b32 %r332, %r331, %r7; 2026-02-21T08:52:47.6603107Z .loc 1 48 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:32 2026-02-21T08:52:47.6603477Z mad.wide.s32 %rd18, %r332, 2, %rd15; 2026-02-21T08:52:47.6603675Z mov.b32 %r305, 8; 2026-02-21T08:52:47.6603973Z .loc 1 48 80 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:80 2026-02-21T08:52:47.6604330Z // begin inline asm 2026-02-21T08:52:47.6604587Z cp.async.ca.shared.global [ %r11 + 0 ], [ %rd18 + 0 ], 0x8, %r305; 2026-02-21T08:52:47.6604872Z // end inline asm 2026-02-21T08:52:47.6605036Z cp.async.commit_group; 2026-02-21T08:52:47.6605544Z .loc 1 54 62 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:62 2026-02-21T08:52:47.6605895Z add.s32 %r333, %r67, %r2129; 2026-02-21T08:52:47.6606217Z .loc 1 54 34 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:34 2026-02-21T08:52:47.6606707Z cvt.s64.s32 %rd23, %r333; 2026-02-21T08:52:47.6606892Z add.s64 %rd19, %rd16, %rd23; 2026-02-21T08:52:47.6607218Z .loc 1 54 87 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:87 2026-02-21T08:52:47.6607566Z // begin inline asm 2026-02-21T08:52:47.6607804Z cp.async.ca.shared.global [ %r13 + 0 ], [ %rd19 + 0 ], 0x8, %r305; 2026-02-21T08:52:47.6608078Z // end inline asm 2026-02-21T08:52:47.6608246Z cp.async.commit_group; 2026-02-21T08:52:47.6608694Z .loc 1 48 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:32 2026-02-21T08:52:47.6609062Z cvt.s64.s32 %rd24, %r331; 2026-02-21T08:52:47.6609249Z or.b64 %rd25, %rd24, %rd2; 2026-02-21T08:52:47.6609430Z shl.b64 %rd26, %rd25, 1; 2026-02-21T08:52:47.6609628Z add.s64 %rd27, %rd15, %rd26; 2026-02-21T08:52:47.6609811Z add.s64 %rd20, %rd27, 128; 2026-02-21T08:52:47.6610128Z .loc 1 48 80 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:80 2026-02-21T08:52:47.6610473Z bar.sync 0; 2026-02-21T08:52:47.6610630Z // begin inline asm 2026-02-21T08:52:47.6610857Z cp.async.ca.shared.global [ %r14 + 0 ], [ %rd20 + 0 ], 
0x8, %r305; 2026-02-21T08:52:47.6611141Z // end inline asm 2026-02-21T08:52:47.6611305Z cp.async.commit_group; 2026-02-21T08:52:47.6611615Z .loc 1 54 34 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:34 2026-02-21T08:52:47.6611969Z cvt.s64.s32 %rd28, %r67; 2026-02-21T08:52:47.6612148Z add.s64 %rd29, %rd28, %rd3; 2026-02-21T08:52:47.6612335Z add.s64 %rd30, %rd16, %rd29; 2026-02-21T08:52:47.6612518Z add.s64 %rd21, %rd30, 229376; 2026-02-21T08:52:47.6612845Z .loc 1 54 87 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:87 2026-02-21T08:52:47.6613197Z // begin inline asm 2026-02-21T08:52:47.6613426Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd21 + 0 ], 0x8, %r305; 2026-02-21T08:52:47.6613704Z // end inline asm 2026-02-21T08:52:47.6613863Z cp.async.commit_group; 2026-02-21T08:52:47.6614184Z .loc 1 40 103 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:40:103 2026-02-21T08:52:47.6614546Z add.s32 %r2144, %r36, %r330; 2026-02-21T08:52:47.6614742Z shl.b32 %r334, %r328, 19; 2026-02-21T08:52:47.6614920Z or.b32 %r2143, %r37, %r334; 2026-02-21T08:52:47.6615106Z mov.b32 %r2147, 0f00000000; 2026-02-21T08:52:47.6615282Z mov.b32 %r2146, 1; 2026-02-21T08:52:47.6615439Z mov.b32 %r2145, -1; 2026-02-21T08:52:47.6615613Z mov.b64 %rd151, -32; 2026-02-21T08:52:47.6615776Z mov.b32 %r2148, %r2147; 2026-02-21T08:52:47.6615953Z mov.b32 %r2149, %r2147; 2026-02-21T08:52:47.6616120Z mov.b32 %r2150, %r2147; 2026-02-21T08:52:47.6616294Z mov.b32 %r2151, %r2147; 2026-02-21T08:52:47.6616586Z mov.b32 %r2152, %r2147; 2026-02-21T08:52:47.6616770Z mov.b32 %r2153, %r2147; 2026-02-21T08:52:47.6616934Z mov.b32 %r2154, %r2147; 2026-02-21T08:52:47.6617107Z mov.b32 %r2155, %r2147; 2026-02-21T08:52:47.6617282Z mov.b32 %r2156, %r2147; 2026-02-21T08:52:47.6617444Z mov.b32 %r2157, %r2147; 2026-02-21T08:52:47.6617616Z mov.b32 %r2158, %r2147; 2026-02-21T08:52:47.6617781Z mov.b32 %r2159, %r2147; 2026-02-21T08:52:47.6617953Z mov.b32 %r2160, %r2147; 2026-02-21T08:52:47.6618117Z 
mov.b32 %r2161, %r2147; 2026-02-21T08:52:47.6618283Z mov.b32 %r2162, %r2147; 2026-02-21T08:52:47.6618496Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:47.6618799Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:47.6619058Z add.s64 %rd151, %rd151, 32; 2026-02-21T08:52:47.6619245Z setp.lt.u64 %p11, %rd151, 4032; 2026-02-21T08:52:47.6619756Z add.s32 %r665, %r2145, 1; 2026-02-21T08:52:47.6619934Z setp.gt.s32 %p12, %r665, 1; 2026-02-21T08:52:47.6620127Z selp.b32 %r2145, 0, %r665, %p12; 2026-02-21T08:52:47.6620459Z .loc 1 48 80 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:80 2026-02-21T08:52:47.6620834Z cp.async.wait_group 2; 2026-02-21T08:52:47.6621010Z bar.sync 0; 2026-02-21T08:52:47.6621158Z shl.b32 %r666, %r2145, 13; 2026-02-21T08:52:47.6621338Z add.s32 %r667, %r2128, %r666; 2026-02-21T08:52:47.6621516Z add.s32 %r668, %r667, 65536; 2026-02-21T08:52:47.6621833Z .loc 1 52 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:52:32 2026-02-21T08:52:47.6622184Z add.s32 %r669, %r668, %r16; 2026-02-21T08:52:47.6622368Z ld.shared.b16 %rs1, [%r669]; 2026-02-21T08:52:47.6622685Z ld.shared.b16 %rs2, [%r669+1024]; 2026-02-21T08:52:47.6622898Z ld.shared.b16 %rs3, [%r669+64]; 2026-02-21T08:52:47.6623098Z ld.shared.b16 %rs4, [%r669+1088]; 2026-02-21T08:52:47.6623293Z add.s32 %r670, %r668, %r17; 2026-02-21T08:52:47.6623476Z ld.shared.b16 %rs5, [%r670]; 2026-02-21T08:52:47.6623659Z ld.shared.b16 %rs6, [%r670+1024]; 2026-02-21T08:52:47.6623860Z ld.shared.b16 %rs7, [%r670+64]; 2026-02-21T08:52:47.6624052Z ld.shared.b16 %rs8, [%r670+1088]; 2026-02-21T08:52:47.6624250Z add.s32 %r671, %r668, %r18; 2026-02-21T08:52:47.6624428Z ld.shared.b16 %rs9, [%r671]; 2026-02-21T08:52:47.6624618Z ld.shared.b16 %rs10, [%r671+1024]; 2026-02-21T08:52:47.6624822Z ld.shared.b16 %rs11, [%r671+64]; 2026-02-21T08:52:47.6625023Z ld.shared.b16 %rs12, [%r671+1088]; 2026-02-21T08:52:47.6625220Z add.s32 %r672, %r668, %r19; 2026-02-21T08:52:47.6625403Z ld.shared.b16 
%rs13, [%r672]; 2026-02-21T08:52:47.6625593Z ld.shared.b16 %rs14, [%r672+1024]; 2026-02-21T08:52:47.6625795Z ld.shared.b16 %rs15, [%r672+64]; 2026-02-21T08:52:47.6625994Z ld.shared.b16 %rs16, [%r672+1088]; 2026-02-21T08:52:47.6626198Z add.s32 %r673, %r668, %r20; 2026-02-21T08:52:47.6626388Z ld.shared.b16 %rs17, [%r673]; 2026-02-21T08:52:47.6626705Z ld.shared.b16 %rs18, [%r673+1024]; 2026-02-21T08:52:47.6626907Z ld.shared.b16 %rs19, [%r673+64]; 2026-02-21T08:52:47.6627107Z ld.shared.b16 %rs20, [%r673+1088]; 2026-02-21T08:52:47.6627310Z add.s32 %r674, %r668, %r21; 2026-02-21T08:52:47.6627499Z ld.shared.b16 %rs21, [%r674]; 2026-02-21T08:52:47.6627685Z ld.shared.b16 %rs22, [%r674+1024]; 2026-02-21T08:52:47.6627886Z ld.shared.b16 %rs23, [%r674+64]; 2026-02-21T08:52:47.6628077Z ld.shared.b16 %rs24, [%r674+1088]; 2026-02-21T08:52:47.6628273Z add.s32 %r675, %r668, %r22; 2026-02-21T08:52:47.6628451Z ld.shared.b16 %rs25, [%r675]; 2026-02-21T08:52:47.6628713Z ld.shared.b16 %rs26, [%r675+1024]; 2026-02-21T08:52:47.6628910Z ld.shared.b16 %rs27, [%r675+64]; 2026-02-21T08:52:47.6629108Z ld.shared.b16 %rs28, [%r675+1088]; 2026-02-21T08:52:47.6629302Z add.s32 %r676, %r668, %r23; 2026-02-21T08:52:47.6629480Z ld.shared.b16 %rs29, [%r676]; 2026-02-21T08:52:47.6629663Z ld.shared.b16 %rs30, [%r676+1024]; 2026-02-21T08:52:47.6629859Z ld.shared.b16 %rs31, [%r676+64]; 2026-02-21T08:52:47.6630054Z ld.shared.b16 %rs32, [%r676+1088]; 2026-02-21T08:52:47.6630245Z cvt.f32.bf16 %r367, %rs1; 2026-02-21T08:52:47.6630425Z cvt.f32.bf16 %r368, %rs2; 2026-02-21T08:52:47.6630594Z cvt.f32.bf16 %r369, %rs5; 2026-02-21T08:52:47.6630781Z cvt.f32.bf16 %r370, %rs6; 2026-02-21T08:52:47.6630954Z cvt.f32.bf16 %r403, %rs9; 2026-02-21T08:52:47.6631125Z cvt.f32.bf16 %r404, %rs10; 2026-02-21T08:52:47.6631306Z cvt.f32.bf16 %r405, %rs13; 2026-02-21T08:52:47.6631478Z cvt.f32.bf16 %r406, %rs14; 2026-02-21T08:52:47.6631654Z cvt.f32.bf16 %r439, %rs17; 2026-02-21T08:52:47.6631827Z cvt.f32.bf16 %r440, %rs18; 
2026-02-21T08:52:47.6632009Z cvt.f32.bf16 %r441, %rs21; 2026-02-21T08:52:47.6632188Z cvt.f32.bf16 %r442, %rs22; 2026-02-21T08:52:47.6632365Z cvt.f32.bf16 %r475, %rs25; 2026-02-21T08:52:47.6632536Z cvt.f32.bf16 %r476, %rs26; 2026-02-21T08:52:47.6632712Z cvt.f32.bf16 %r477, %rs29; 2026-02-21T08:52:47.6633039Z cvt.f32.bf16 %r478, %rs30; 2026-02-21T08:52:47.6633210Z cvt.f32.bf16 %r511, %rs3; 2026-02-21T08:52:47.6633387Z cvt.f32.bf16 %r512, %rs4; 2026-02-21T08:52:47.6633559Z cvt.f32.bf16 %r513, %rs7; 2026-02-21T08:52:47.6633734Z cvt.f32.bf16 %r514, %rs8; 2026-02-21T08:52:47.6633921Z cvt.f32.bf16 %r547, %rs11; 2026-02-21T08:52:47.6634099Z cvt.f32.bf16 %r548, %rs12; 2026-02-21T08:52:47.6634273Z cvt.f32.bf16 %r549, %rs15; 2026-02-21T08:52:47.6634448Z cvt.f32.bf16 %r550, %rs16; 2026-02-21T08:52:47.6634621Z cvt.f32.bf16 %r583, %rs19; 2026-02-21T08:52:47.6634800Z cvt.f32.bf16 %r584, %rs20; 2026-02-21T08:52:47.6634976Z cvt.f32.bf16 %r585, %rs23; 2026-02-21T08:52:47.6635146Z cvt.f32.bf16 %r586, %rs24; 2026-02-21T08:52:47.6635320Z cvt.f32.bf16 %r619, %rs27; 2026-02-21T08:52:47.6635632Z cvt.f32.bf16 %r620, %rs28; 2026-02-21T08:52:47.6635817Z cvt.f32.bf16 %r621, %rs31; 2026-02-21T08:52:47.6635990Z cvt.f32.bf16 %r622, %rs32; 2026-02-21T08:52:47.6636319Z .loc 1 67 45 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:67:45 2026-02-21T08:52:47.6636808Z add.s32 %r677, %r24, %r666; 2026-02-21T08:52:47.6637004Z ld.shared.b8 %rs33, [%r677]; 2026-02-21T08:52:47.6637199Z ld.shared.b8 %rs34, [%r677+512]; 2026-02-21T08:52:47.6637395Z ld.shared.b8 %rs35, [%r677+1024]; 2026-02-21T08:52:47.6637597Z ld.shared.b8 %rs36, [%r677+1536]; 2026-02-21T08:52:47.6637791Z ld.shared.b8 %rs37, [%r677+2048]; 2026-02-21T08:52:47.6637988Z ld.shared.b8 %rs38, [%r677+2560]; 2026-02-21T08:52:47.6638180Z ld.shared.b8 %rs39, [%r677+3072]; 2026-02-21T08:52:47.6638379Z ld.shared.b8 %rs40, [%r677+3584]; 2026-02-21T08:52:47.6638566Z ld.shared.b8 %rs41, [%r677+4096]; 2026-02-21T08:52:47.6638764Z ld.shared.b8 
%rs42, [%r677+4608]; 2026-02-21T08:52:47.6638955Z ld.shared.b8 %rs43, [%r677+5120]; 2026-02-21T08:52:47.6639152Z ld.shared.b8 %rs44, [%r677+5632]; 2026-02-21T08:52:47.6639349Z ld.shared.b8 %rs45, [%r677+6144]; 2026-02-21T08:52:47.6639537Z ld.shared.b8 %rs46, [%r677+6656]; 2026-02-21T08:52:47.6639736Z ld.shared.b8 %rs47, [%r677+7168]; 2026-02-21T08:52:47.6639923Z ld.shared.b8 %rs48, [%r677+7680]; 2026-02-21T08:52:47.6640268Z .loc 1 57 28 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:57:28 2026-02-21T08:52:47.6640631Z shl.b16 %rs49, %rs33, 4; 2026-02-21T08:52:47.6640810Z shl.b16 %rs50, %rs34, 4; 2026-02-21T08:52:47.6640976Z shl.b16 %rs51, %rs35, 4; 2026-02-21T08:52:47.6641149Z shl.b16 %rs52, %rs36, 4; 2026-02-21T08:52:47.6641318Z shl.b16 %rs53, %rs37, 4; 2026-02-21T08:52:47.6641486Z shl.b16 %rs54, %rs38, 4; 2026-02-21T08:52:47.6641667Z shl.b16 %rs55, %rs39, 4; 2026-02-21T08:52:47.6641838Z shl.b16 %rs56, %rs40, 4; 2026-02-21T08:52:47.6642011Z shl.b16 %rs57, %rs41, 4; 2026-02-21T08:52:47.6642182Z shl.b16 %rs58, %rs42, 4; 2026-02-21T08:52:47.6642355Z shl.b16 %rs59, %rs43, 4; 2026-02-21T08:52:47.6642520Z shl.b16 %rs60, %rs44, 4; 2026-02-21T08:52:47.6642702Z shl.b16 %rs61, %rs45, 4; 2026-02-21T08:52:47.6642875Z shl.b16 %rs62, %rs46, 4; 2026-02-21T08:52:47.6643049Z shl.b16 %rs63, %rs47, 4; 2026-02-21T08:52:47.6643222Z shl.b16 %rs64, %rs48, 4; 2026-02-21T08:52:47.6643538Z .loc 1 72 58 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:72:58 2026-02-21T08:52:47.6643919Z selp.b16 %rs65, %rs49, %rs33, %p57; 2026-02-21T08:52:47.6644130Z cvt.s16.s8 %rs66, %rs65; 2026-02-21T08:52:47.6644315Z shr.s16 %rs67, %rs66, 4; 2026-02-21T08:52:47.6644498Z selp.b16 %rs68, %rs50, %rs34, %p57; 2026-02-21T08:52:47.6644705Z cvt.s16.s8 %rs69, %rs68; 2026-02-21T08:52:47.6644879Z shr.s16 %rs70, %rs69, 4; 2026-02-21T08:52:47.6645067Z selp.b16 %rs71, %rs51, %rs35, %p57; 2026-02-21T08:52:47.6645271Z cvt.s16.s8 %rs72, %rs71; 2026-02-21T08:52:47.6645451Z shr.s16 %rs73, %rs72, 4; 
2026-02-21T08:52:47.6645639Z selp.b16 %rs74, %rs52, %rs36, %p57; 2026-02-21T08:52:47.6645836Z cvt.s16.s8 %rs75, %rs74; 2026-02-21T08:52:47.6646011Z shr.s16 %rs76, %rs75, 4; 2026-02-21T08:52:47.6646337Z selp.b16 %rs77, %rs53, %rs37, %p57; 2026-02-21T08:52:47.6646661Z cvt.s16.s8 %rs78, %rs77; 2026-02-21T08:52:47.6646840Z shr.s16 %rs79, %rs78, 4; 2026-02-21T08:52:47.6647022Z selp.b16 %rs80, %rs54, %rs38, %p57; 2026-02-21T08:52:47.6647224Z cvt.s16.s8 %rs81, %rs80; 2026-02-21T08:52:47.6647396Z shr.s16 %rs82, %rs81, 4; 2026-02-21T08:52:47.6647578Z selp.b16 %rs83, %rs55, %rs39, %p57; 2026-02-21T08:52:47.6647773Z cvt.s16.s8 %rs84, %rs83; 2026-02-21T08:52:47.6647952Z shr.s16 %rs85, %rs84, 4; 2026-02-21T08:52:47.6648128Z selp.b16 %rs86, %rs56, %rs40, %p57; 2026-02-21T08:52:47.6648335Z cvt.s16.s8 %rs87, %rs86; 2026-02-21T08:52:47.6648501Z shr.s16 %rs88, %rs87, 4; 2026-02-21T08:52:47.6648686Z selp.b16 %rs89, %rs57, %rs41, %p57; 2026-02-21T08:52:47.6648885Z cvt.s16.s8 %rs90, %rs89; 2026-02-21T08:52:47.6649225Z shr.s16 %rs91, %rs90, 4; 2026-02-21T08:52:47.6649417Z selp.b16 %rs92, %rs58, %rs42, %p57; 2026-02-21T08:52:47.6649613Z cvt.s16.s8 %rs93, %rs92; 2026-02-21T08:52:47.6649803Z shr.s16 %rs94, %rs93, 4; 2026-02-21T08:52:47.6649983Z selp.b16 %rs95, %rs59, %rs43, %p57; 2026-02-21T08:52:47.6650181Z cvt.s16.s8 %rs96, %rs95; 2026-02-21T08:52:47.6650352Z shr.s16 %rs97, %rs96, 4; 2026-02-21T08:52:47.6650531Z selp.b16 %rs98, %rs60, %rs44, %p57; 2026-02-21T08:52:47.6650724Z cvt.s16.s8 %rs99, %rs98; 2026-02-21T08:52:47.6650899Z shr.s16 %rs100, %rs99, 4; 2026-02-21T08:52:47.6651095Z selp.b16 %rs101, %rs61, %rs45, %p57; 2026-02-21T08:52:47.6651300Z cvt.s16.s8 %rs102, %rs101; 2026-02-21T08:52:47.6651488Z shr.s16 %rs103, %rs102, 4; 2026-02-21T08:52:47.6651674Z selp.b16 %rs104, %rs62, %rs46, %p57; 2026-02-21T08:52:47.6651888Z cvt.s16.s8 %rs105, %rs104; 2026-02-21T08:52:47.6652069Z shr.s16 %rs106, %rs105, 4; 2026-02-21T08:52:47.6652255Z selp.b16 %rs107, %rs63, %rs47, %p57; 
2026-02-21T08:52:47.6652456Z cvt.s16.s8 %rs108, %rs107; 2026-02-21T08:52:47.6652636Z shr.s16 %rs109, %rs108, 4; 2026-02-21T08:52:47.6652816Z selp.b16 %rs110, %rs64, %rs48, %p57; 2026-02-21T08:52:47.6653025Z cvt.s16.s8 %rs111, %rs110; 2026-02-21T08:52:47.6653205Z shr.s16 %rs112, %rs111, 4; 2026-02-21T08:52:47.6653553Z .loc 1 77 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:77:32 2026-02-21T08:52:47.6653928Z cvt.rn.f32.s16 %r678, %rs67; 2026-02-21T08:52:47.6654116Z cvt.rn.f32.s16 %r679, %rs70; 2026-02-21T08:52:47.6654304Z cvt.rn.f32.s16 %r680, %rs73; 2026-02-21T08:52:47.6654481Z cvt.rn.f32.s16 %r681, %rs76; 2026-02-21T08:52:47.6654666Z cvt.rn.f32.s16 %r682, %rs79; 2026-02-21T08:52:47.6654844Z cvt.rn.f32.s16 %r683, %rs82; 2026-02-21T08:52:47.6655030Z cvt.rn.f32.s16 %r684, %rs85; 2026-02-21T08:52:47.6655205Z cvt.rn.f32.s16 %r685, %rs88; 2026-02-21T08:52:47.6655388Z cvt.rn.f32.s16 %r686, %rs91; 2026-02-21T08:52:47.6655572Z cvt.rn.f32.s16 %r687, %rs94; 2026-02-21T08:52:47.6655760Z cvt.rn.f32.s16 %r688, %rs97; 2026-02-21T08:52:47.6655955Z cvt.rn.f32.s16 %r689, %rs100; 2026-02-21T08:52:47.6656146Z cvt.rn.f32.s16 %r690, %rs103; 2026-02-21T08:52:47.6656339Z cvt.rn.f32.s16 %r691, %rs106; 2026-02-21T08:52:47.6656661Z cvt.rn.f32.s16 %r692, %rs109; 2026-02-21T08:52:47.6656864Z cvt.rn.f32.s16 %r693, %rs112; 2026-02-21T08:52:47.6657050Z st.shared.b32 [%r25], %r678; 2026-02-21T08:52:47.6657250Z st.shared.b32 [%r25+32768], %r686; 2026-02-21T08:52:47.6657455Z st.shared.b32 [%r26], %r679; 2026-02-21T08:52:47.6657654Z st.shared.b32 [%r26+32768], %r687; 2026-02-21T08:52:47.6657863Z st.shared.b32 [%r27], %r680; 2026-02-21T08:52:47.6658047Z st.shared.b32 [%r27+32768], %r688; 2026-02-21T08:52:47.6658248Z st.shared.b32 [%r28], %r681; 2026-02-21T08:52:47.6658430Z st.shared.b32 [%r28+32768], %r689; 2026-02-21T08:52:47.6658628Z st.shared.b32 [%r29], %r682; 2026-02-21T08:52:47.6658809Z st.shared.b32 [%r29+32768], %r690; 2026-02-21T08:52:47.6659013Z st.shared.b32 [%r30], %r683; 
2026-02-21T08:52:47.6659196Z st.shared.b32 [%r30+32768], %r691; 2026-02-21T08:52:47.6659394Z st.shared.b32 [%r31], %r684; 2026-02-21T08:52:47.6659586Z st.shared.b32 [%r31+32768], %r692; 2026-02-21T08:52:47.6659934Z st.shared.b32 [%r32], %r685; 2026-02-21T08:52:47.6660123Z st.shared.b32 [%r32+32768], %r693; 2026-02-21T08:52:47.6660311Z $L__tmp1: 2026-02-21T08:52:47.6660679Z .loc 2 291 36 // standard.py:291:36 @[ cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:84:40 ] 2026-02-21T08:52:47.6661108Z // begin inline asm 2026-02-21T08:52:47.6661319Z fence.proxy.async.shared::cta; 2026-02-21T08:52:47.6661532Z // end inline asm 2026-02-21T08:52:47.6661691Z bar.sync 0; 2026-02-21T08:52:47.6661858Z shfl.sync.idx.b32 %r694, %r4, 0, 31, -1; 2026-02-21T08:52:47.6662092Z wgmma.fence.sync.aligned; 2026-02-21T08:52:47.6662288Z shl.b32 %r695, %r694, 10; 2026-02-21T08:52:47.6662466Z and.b32 %r696, %r695, 28672; 2026-02-21T08:52:47.6662784Z add.s32 %r697, %r696, %r2128; 2026-02-21T08:52:47.6662976Z bfe.u32 %r698, %r697, 4, 14; 2026-02-21T08:52:47.6663171Z cvt.u64.u32 %rd41, %r698; 2026-02-21T08:52:47.6663380Z or.b64 %rd31, %rd41, 4611686293439512576; 2026-02-21T08:52:47.6663606Z mov.pred %p2, -1; 2026-02-21T08:52:47.6663778Z // begin inline asm 2026-02-21T08:52:47.6664384Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2147,%r2148,%r2149,%r2150,%r2151,%r2152,%r2153,%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161,%r2162}, {%r367,%r368,%r369,%r370}, %rd31, %p2, 1, 1; 2026-02-21T08:52:47.6665032Z // end inline asm 2026-02-21T08:52:47.6665196Z add.s32 %r699, %r697, 32; 2026-02-21T08:52:47.6665375Z bfe.u32 %r700, %r699, 4, 14; 2026-02-21T08:52:47.6665569Z cvt.u64.u32 %rd42, %r700; 2026-02-21T08:52:47.6665758Z or.b64 %rd32, %rd42, 4611686293439512576; 2026-02-21T08:52:47.6665976Z // begin inline asm 2026-02-21T08:52:47.6666694Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r2147,%r2148,%r2149,%r2150,%r2151,%r2152,%r2153,%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161,%r2162}, {%r403,%r404,%r405,%r406}, %rd32, %p2, 1, 1; 2026-02-21T08:52:47.6667343Z // end inline asm 2026-02-21T08:52:47.6667512Z add.s32 %r701, %r697, 64; 2026-02-21T08:52:47.6667686Z bfe.u32 %r702, %r701, 4, 14; 2026-02-21T08:52:47.6667883Z cvt.u64.u32 %rd43, %r702; 2026-02-21T08:52:47.6668069Z or.b64 %rd33, %rd43, 4611686293439512576; 2026-02-21T08:52:47.6668285Z // begin inline asm 2026-02-21T08:52:47.6668958Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2147,%r2148,%r2149,%r2150,%r2151,%r2152,%r2153,%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161,%r2162}, {%r439,%r440,%r441,%r442}, %rd33, %p2, 1, 1; 2026-02-21T08:52:47.6669595Z // end inline asm 2026-02-21T08:52:47.6669750Z add.s32 %r703, %r697, 96; 2026-02-21T08:52:47.6669928Z bfe.u32 %r704, %r703, 4, 14; 2026-02-21T08:52:47.6670109Z cvt.u64.u32 %rd44, %r704; 2026-02-21T08:52:47.6670289Z or.b64 %rd34, %rd44, 4611686293439512576; 2026-02-21T08:52:47.6670503Z // begin inline asm 2026-02-21T08:52:47.6670990Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2147,%r2148,%r2149,%r2150,%r2151,%r2152,%r2153,%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161,%r2162}, {%r475,%r476,%r477,%r478}, %rd34, %p2, 1, 1; 2026-02-21T08:52:47.6671055Z // end inline asm 2026-02-21T08:52:47.6671125Z add.s32 %r705, %r697, 32768; 2026-02-21T08:52:47.6671193Z bfe.u32 %r706, %r705, 4, 14; 2026-02-21T08:52:47.6671257Z cvt.u64.u32 %rd45, %r706; 2026-02-21T08:52:47.6671335Z or.b64 %rd35, %rd45, 4611686293439512576; 2026-02-21T08:52:47.6671398Z // begin inline asm 2026-02-21T08:52:47.6671882Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2147,%r2148,%r2149,%r2150,%r2151,%r2152,%r2153,%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161,%r2162}, {%r511,%r512,%r513,%r514}, %rd35, %p2, 1, 1; 2026-02-21T08:52:47.6671942Z // end inline asm 2026-02-21T08:52:47.6672010Z add.s32 %r707, %r697, 32800; 
2026-02-21T08:52:47.6672077Z bfe.u32 %r708, %r707, 4, 14; 2026-02-21T08:52:47.6672141Z cvt.u64.u32 %rd46, %r708; 2026-02-21T08:52:47.6672218Z or.b64 %rd36, %rd46, 4611686293439512576; 2026-02-21T08:52:47.6672281Z // begin inline asm 2026-02-21T08:52:47.6672915Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2147,%r2148,%r2149,%r2150,%r2151,%r2152,%r2153,%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161,%r2162}, {%r547,%r548,%r549,%r550}, %rd36, %p2, 1, 1; 2026-02-21T08:52:47.6672979Z // end inline asm 2026-02-21T08:52:47.6673043Z add.s32 %r709, %r697, 32832; 2026-02-21T08:52:47.6673107Z bfe.u32 %r710, %r709, 4, 14; 2026-02-21T08:52:47.6673171Z cvt.u64.u32 %rd47, %r710; 2026-02-21T08:52:47.6673250Z or.b64 %rd37, %rd47, 4611686293439512576; 2026-02-21T08:52:47.6673312Z // begin inline asm 2026-02-21T08:52:47.6673796Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2147,%r2148,%r2149,%r2150,%r2151,%r2152,%r2153,%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161,%r2162}, {%r583,%r584,%r585,%r586}, %rd37, %p2, 1, 1; 2026-02-21T08:52:47.6673983Z // end inline asm 2026-02-21T08:52:47.6674063Z add.s32 %r711, %r697, 32864; 2026-02-21T08:52:47.6674128Z bfe.u32 %r712, %r711, 4, 14; 2026-02-21T08:52:47.6674198Z cvt.u64.u32 %rd48, %r712; 2026-02-21T08:52:47.6674276Z or.b64 %rd38, %rd48, 4611686293439512576; 2026-02-21T08:52:47.6674338Z // begin inline asm 2026-02-21T08:52:47.6674822Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2147,%r2148,%r2149,%r2150,%r2151,%r2152,%r2153,%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161,%r2162}, {%r619,%r620,%r621,%r622}, %rd38, %p2, 1, 1; 2026-02-21T08:52:47.6674887Z // end inline asm 2026-02-21T08:52:47.6674970Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:47.6675031Z mov.b32 %r641, 0; 2026-02-21T08:52:47.6675099Z mov.b32 %r639, %r2128; 2026-02-21T08:52:47.6675161Z mov.b32 %r640, %r641; 2026-02-21T08:52:47.6675223Z // begin inline asm 2026-02-21T08:52:47.6675540Z // wait for regs: 
%r2147,%r2148,%r2149,%r2150,%r2151,%r2152,%r2153,%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161,%r2162,%r639,%r640,%r641 2026-02-21T08:52:47.6675622Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:47.6675683Z // end inline asm 2026-02-21T08:52:47.6675747Z $L__tmp2: 2026-02-21T08:52:47.6675984Z .loc 1 40 103 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:40:103 2026-02-21T08:52:47.6676050Z add.s32 %r713, %r2146, 1; 2026-02-21T08:52:47.6676121Z setp.gt.s32 %p13, %r713, 1; 2026-02-21T08:52:47.6676202Z selp.b32 %r2146, 0, %r713, %p13; 2026-02-21T08:52:47.6676425Z .loc 1 48 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:32 2026-02-21T08:52:47.6676625Z mad.wide.s32 %rd39, %r2143, 2, %rd15; 2026-02-21T08:52:47.6676854Z .loc 1 48 80 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:80 2026-02-21T08:52:47.6676926Z shl.b32 %r714, %r2146, 13; 2026-02-21T08:52:47.6676993Z add.s32 %r661, %r11, %r714; 2026-02-21T08:52:47.6677064Z selp.b32 %r662, 8, 0, %p11; 2026-02-21T08:52:47.6677135Z // begin inline asm 2026-02-21T08:52:47.6677281Z cp.async.ca.shared.global [ %r661 + 0 ], [ %rd39 + 0 ], 0x8, %r662; 2026-02-21T08:52:47.6677346Z // end inline asm 2026-02-21T08:52:47.6677426Z cp.async.commit_group; 2026-02-21T08:52:47.6677633Z .loc 1 54 34 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:34 2026-02-21T08:52:47.6677705Z cvt.s64.s32 %rd49, %r2144; 2026-02-21T08:52:47.6677780Z add.s64 %rd40, %rd16, %rd49; 2026-02-21T08:52:47.6677982Z .loc 1 54 87 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:87 2026-02-21T08:52:47.6678050Z add.s32 %r663, %r13, %r714; 2026-02-21T08:52:47.6678113Z // begin inline asm 2026-02-21T08:52:47.6678257Z cp.async.ca.shared.global [ %r663 + 0 ], [ %rd40 + 0 ], 0x8, %r662; 2026-02-21T08:52:47.6678318Z // end inline asm 2026-02-21T08:52:47.6678389Z cp.async.commit_group; 2026-02-21T08:52:47.6678612Z .loc 1 40 103 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:40:103 
2026-02-21T08:52:47.6678682Z add.s32 %r2144, %r2144, 229376; 2026-02-21T08:52:47.6678746Z add.s32 %r2143, %r2143, 64; 2026-02-21T08:52:47.6678978Z setp.lt.u64 %p14, %rd151, 4064; 2026-02-21T08:52:47.6679046Z @%p14 bra $L__BB0_3; 2026-02-21T08:52:47.6679164Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:47.6679374Z .loc 1 31 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:31:32 2026-02-21T08:52:47.6679448Z or.b32 %r744, %r66, %r4; 2026-02-21T08:52:47.6679514Z or.b32 %r745, %r66, %r6; 2026-02-21T08:52:47.6679726Z .loc 1 40 103 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:40:103 2026-02-21T08:52:47.6679807Z cp.async.wait_group 0; 2026-02-21T08:52:47.6679867Z bar.sync 0; 2026-02-21T08:52:47.6680072Z .loc 1 87 28 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:87:28 2026-02-21T08:52:47.6680284Z cvt.rn.bf16x2.f32 %r746, %r2148, %r2147; 2026-02-21T08:52:47.6680365Z cvt.rn.bf16x2.f32 %r747, %r2150, %r2149; 2026-02-21T08:52:47.6680439Z cvt.rn.bf16x2.f32 %r748, %r2152, %r2151; 2026-02-21T08:52:47.6680516Z cvt.rn.bf16x2.f32 %r749, %r2154, %r2153; 2026-02-21T08:52:47.6680597Z cvt.rn.bf16x2.f32 %r750, %r2156, %r2155; 2026-02-21T08:52:47.6680670Z cvt.rn.bf16x2.f32 %r751, %r2158, %r2157; 2026-02-21T08:52:47.6680745Z cvt.rn.bf16x2.f32 %r752, %r2160, %r2159; 2026-02-21T08:52:47.6680822Z cvt.rn.bf16x2.f32 %r753, %r2162, %r2161; 2026-02-21T08:52:47.6681025Z .loc 1 88 50 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:88:50 2026-02-21T08:52:47.6681097Z mad.lo.s32 %r754, %r744, 7168, %r67; 2026-02-21T08:52:47.6681171Z mad.lo.s32 %r755, %r745, 7168, %r67; 2026-02-21T08:52:47.6681371Z .loc 1 88 22 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:88:22 2026-02-21T08:52:47.6681441Z mad.wide.s32 %rd50, %r754, 2, %rd17; 2026-02-21T08:52:47.6681511Z mad.wide.s32 %rd51, %r755, 2, %rd17; 2026-02-21T08:52:47.6681718Z .loc 1 88 81 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:88:81 2026-02-21T08:52:47.6681830Z 
st.shared.v4.b32 [%r33], {%r746, %r748, %r750, %r752}; 2026-02-21T08:52:47.6681945Z st.shared.v4.b32 [%r33+512], {%r747, %r749, %r751, %r753}; 2026-02-21T08:52:47.6682010Z bar.sync 0; 2026-02-21T08:52:47.6682073Z // begin inline asm 2026-02-21T08:52:47.6682258Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r725, %r726, %r727, %r728}, [%r719]; 2026-02-21T08:52:47.6682326Z // end inline asm 2026-02-21T08:52:47.6682388Z // begin inline asm 2026-02-21T08:52:47.6682564Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r729, %r730, %r731, %r732}, [%r724]; 2026-02-21T08:52:47.6682623Z // end inline asm 2026-02-21T08:52:47.6682693Z // begin inline asm 2026-02-21T08:52:47.6682814Z st.global.v4.b32 [ %rd50 + 0 ], { %r725, %r726, %r727, %r728 }; 2026-02-21T08:52:47.6682873Z // end inline asm 2026-02-21T08:52:47.6682941Z // begin inline asm 2026-02-21T08:52:47.6683057Z st.global.v4.b32 [ %rd51 + 0 ], { %r729, %r730, %r731, %r732 }; 2026-02-21T08:52:47.6683116Z // end inline asm 2026-02-21T08:52:47.6683338Z .loc 1 19 144 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:19:144 2026-02-21T08:52:47.6683412Z add.s32 %r756, %r2142, 4224; 2026-02-21T08:52:47.6683620Z .loc 1 25 35 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:25:35 2026-02-21T08:52:47.6683694Z mul.hi.s32 %r757, %r756, -1840700269; 2026-02-21T08:52:47.6683766Z add.s32 %r758, %r757, %r756; 2026-02-21T08:52:47.6683831Z shr.u32 %r759, %r758, 31; 2026-02-21T08:52:47.6683895Z shr.s32 %r760, %r758, 5; 2026-02-21T08:52:47.6683967Z add.s32 %r761, %r760, %r759; 2026-02-21T08:52:47.6684175Z .loc 1 26 33 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:26:33 2026-02-21T08:52:47.6684243Z shl.b32 %r762, %r761, 1; 2026-02-21T08:52:47.6684446Z .loc 1 27 39 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:27:39 2026-02-21T08:52:47.6684516Z sub.s32 %r763, 1, %r762; 2026-02-21T08:52:47.6684835Z .loc 1 27 52 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:27:52 2026-02-21T08:52:47.6684900Z 
min.s32 %r764, %r763, 2; 2026-02-21T08:52:47.6685108Z .loc 1 28 45 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:28:45 2026-02-21T08:52:47.6685173Z mul.lo.s32 %r765, %r761, 56; 2026-02-21T08:52:47.6685236Z sub.s32 %r766, %r756, %r765; 2026-02-21T08:52:47.6685446Z .loc 1 29 51 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:29:51 2026-02-21T08:52:47.6685507Z div.s32 %r767, %r766, %r764; 2026-02-21T08:52:47.6685706Z .loc 1 28 64 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:28:64 2026-02-21T08:52:47.6685779Z mul.lo.s32 %r768, %r767, %r764; 2026-02-21T08:52:47.6685956Z sub.s32 %r769, %r766, %r768; 2026-02-21T08:52:47.6686158Z .loc 1 28 30 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:28:30 2026-02-21T08:52:47.6686223Z add.s32 %r770, %r769, %r762; 2026-02-21T08:52:47.6686430Z .loc 1 30 27 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:30:27 2026-02-21T08:52:47.6686609Z shl.b32 %r110, %r770, 6; 2026-02-21T08:52:47.6686814Z .loc 1 31 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:31:32 2026-02-21T08:52:47.6686885Z or.b32 %r771, %r110, %r5; 2026-02-21T08:52:47.6687084Z .loc 1 32 27 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:32:27 2026-02-21T08:52:47.6687147Z shl.b32 %r772, %r767, 8; 2026-02-21T08:52:47.6687350Z .loc 1 33 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:33:32 2026-02-21T08:52:47.6687414Z or.b32 %r111, %r772, %r9; 2026-02-21T08:52:47.6687615Z .loc 1 48 53 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:53 2026-02-21T08:52:47.6687685Z shl.b32 %r773, %r771, 13; 2026-02-21T08:52:47.6687885Z .loc 1 48 60 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:60 2026-02-21T08:52:47.6687952Z or.b32 %r774, %r773, %r7; 2026-02-21T08:52:47.6688154Z .loc 1 48 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:32 2026-02-21T08:52:47.6688237Z mad.wide.s32 %rd52, %r774, 2, %rd15; 2026-02-21T08:52:47.6688298Z mov.b32 %r734, 8; 
2026-02-21T08:52:47.6688495Z .loc 1 48 80 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:80 2026-02-21T08:52:47.6688568Z // begin inline asm 2026-02-21T08:52:47.6688709Z cp.async.ca.shared.global [ %r11 + 0 ], [ %rd52 + 0 ], 0x8, %r734; 2026-02-21T08:52:47.6688773Z // end inline asm 2026-02-21T08:52:47.6688848Z cp.async.commit_group; 2026-02-21T08:52:47.6689054Z .loc 1 54 62 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:62 2026-02-21T08:52:47.6689123Z add.s32 %r775, %r111, %r2129; 2026-02-21T08:52:47.6689322Z .loc 1 54 34 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:34 2026-02-21T08:52:47.6689396Z cvt.s64.s32 %rd57, %r775; 2026-02-21T08:52:47.6689461Z add.s64 %rd53, %rd16, %rd57; 2026-02-21T08:52:47.6689660Z .loc 1 54 87 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:87 2026-02-21T08:52:47.6689733Z // begin inline asm 2026-02-21T08:52:47.6689867Z cp.async.ca.shared.global [ %r13 + 0 ], [ %rd53 + 0 ], 0x8, %r734; 2026-02-21T08:52:47.6689927Z // end inline asm 2026-02-21T08:52:47.6690003Z cp.async.commit_group; 2026-02-21T08:52:47.6690207Z .loc 1 48 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:32 2026-02-21T08:52:47.6690272Z cvt.s64.s32 %rd58, %r773; 2026-02-21T08:52:47.6690342Z or.b64 %rd59, %rd58, %rd2; 2026-02-21T08:52:47.6690419Z shl.b64 %rd60, %rd59, 1; 2026-02-21T08:52:47.6690486Z add.s64 %rd61, %rd15, %rd60; 2026-02-21T08:52:47.6690554Z add.s64 %rd54, %rd61, 128; 2026-02-21T08:52:47.6690764Z .loc 1 48 80 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:80 2026-02-21T08:52:47.6690978Z bar.sync 0; 2026-02-21T08:52:47.6691045Z // begin inline asm 2026-02-21T08:52:47.6691192Z cp.async.ca.shared.global [ %r14 + 0 ], [ %rd54 + 0 ], 0x8, %r734; 2026-02-21T08:52:47.6691255Z // end inline asm 2026-02-21T08:52:47.6691325Z cp.async.commit_group; 2026-02-21T08:52:47.6691533Z .loc 1 54 34 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:34 
2026-02-21T08:52:47.6691607Z cvt.s64.s32 %rd62, %r111; 2026-02-21T08:52:47.6691679Z add.s64 %rd63, %rd62, %rd3; 2026-02-21T08:52:47.6691747Z add.s64 %rd64, %rd16, %rd63; 2026-02-21T08:52:47.6691825Z add.s64 %rd55, %rd64, 229376; 2026-02-21T08:52:47.6692143Z .loc 1 54 87 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:87 2026-02-21T08:52:47.6692212Z // begin inline asm 2026-02-21T08:52:47.6692353Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd55 + 0 ], 0x8, %r734; 2026-02-21T08:52:47.6692419Z // end inline asm 2026-02-21T08:52:47.6692489Z cp.async.commit_group; 2026-02-21T08:52:47.6692699Z .loc 1 40 103 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:40:103 2026-02-21T08:52:47.6692770Z add.s32 %r2164, %r36, %r772; 2026-02-21T08:52:47.6692836Z shl.b32 %r776, %r770, 19; 2026-02-21T08:52:47.6692899Z or.b32 %r2163, %r37, %r776; 2026-02-21T08:52:47.6692966Z mov.b32 %r2167, 0f00000000; 2026-02-21T08:52:47.6693028Z mov.b32 %r2166, 1; 2026-02-21T08:52:47.6693094Z mov.b32 %r2165, -1; 2026-02-21T08:52:47.6693156Z mov.b64 %rd152, -32; 2026-02-21T08:52:47.6693226Z mov.b32 %r2168, %r2167; 2026-02-21T08:52:47.6693287Z mov.b32 %r2169, %r2167; 2026-02-21T08:52:47.6693349Z mov.b32 %r2170, %r2167; 2026-02-21T08:52:47.6693417Z mov.b32 %r2171, %r2167; 2026-02-21T08:52:47.6693477Z mov.b32 %r2172, %r2167; 2026-02-21T08:52:47.6693539Z mov.b32 %r2173, %r2167; 2026-02-21T08:52:47.6693611Z mov.b32 %r2174, %r2167; 2026-02-21T08:52:47.6693683Z mov.b32 %r2175, %r2167; 2026-02-21T08:52:47.6693745Z mov.b32 %r2176, %r2167; 2026-02-21T08:52:47.6693809Z mov.b32 %r2177, %r2167; 2026-02-21T08:52:47.6693877Z mov.b32 %r2178, %r2167; 2026-02-21T08:52:47.6693941Z mov.b32 %r2179, %r2167; 2026-02-21T08:52:47.6694001Z mov.b32 %r2180, %r2167; 2026-02-21T08:52:47.6694062Z mov.b32 %r2181, %r2167; 2026-02-21T08:52:47.6694126Z mov.b32 %r2182, %r2167; 2026-02-21T08:52:47.6694248Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:47.6694359Z // => This Inner Loop Header: Depth=2 
2026-02-21T08:52:47.6694430Z add.s64 %rd152, %rd152, 32; 2026-02-21T08:52:47.6694499Z setp.lt.u64 %p24, %rd152, 4032; 2026-02-21T08:52:47.6694565Z add.s32 %r1107, %r2165, 1; 2026-02-21T08:52:47.6694652Z setp.gt.s32 %p25, %r1107, 1; 2026-02-21T08:52:47.6694727Z selp.b32 %r2165, 0, %r1107, %p25; 2026-02-21T08:52:47.6694935Z .loc 1 48 80 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:80 2026-02-21T08:52:47.6695012Z cp.async.wait_group 2; 2026-02-21T08:52:47.6695075Z bar.sync 0; 2026-02-21T08:52:47.6695140Z shl.b32 %r1108, %r2165, 13; 2026-02-21T08:52:47.6695206Z add.s32 %r1109, %r2128, %r1108; 2026-02-21T08:52:47.6695278Z add.s32 %r1110, %r1109, 65536; 2026-02-21T08:52:47.6695480Z .loc 1 52 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:52:32 2026-02-21T08:52:47.6695546Z add.s32 %r1111, %r1110, %r16; 2026-02-21T08:52:47.6695619Z ld.shared.b16 %rs113, [%r1111]; 2026-02-21T08:52:47.6695694Z ld.shared.b16 %rs114, [%r1111+1024]; 2026-02-21T08:52:47.6695764Z ld.shared.b16 %rs115, [%r1111+64]; 2026-02-21T08:52:47.6695834Z ld.shared.b16 %rs116, [%r1111+1088]; 2026-02-21T08:52:47.6695908Z add.s32 %r1112, %r1110, %r17; 2026-02-21T08:52:47.6695976Z ld.shared.b16 %rs117, [%r1112]; 2026-02-21T08:52:47.6696045Z ld.shared.b16 %rs118, [%r1112+1024]; 2026-02-21T08:52:47.6696119Z ld.shared.b16 %rs119, [%r1112+64]; 2026-02-21T08:52:47.6696291Z ld.shared.b16 %rs120, [%r1112+1088]; 2026-02-21T08:52:47.6696355Z add.s32 %r1113, %r1110, %r18; 2026-02-21T08:52:47.6696423Z ld.shared.b16 %rs121, [%r1113]; 2026-02-21T08:52:47.6696621Z ld.shared.b16 %rs122, [%r1113+1024]; 2026-02-21T08:52:47.6696695Z ld.shared.b16 %rs123, [%r1113+64]; 2026-02-21T08:52:47.6696765Z ld.shared.b16 %rs124, [%r1113+1088]; 2026-02-21T08:52:47.6696836Z add.s32 %r1114, %r1110, %r19; 2026-02-21T08:52:47.6696904Z ld.shared.b16 %rs125, [%r1114]; 2026-02-21T08:52:47.6696971Z ld.shared.b16 %rs126, [%r1114+1024]; 2026-02-21T08:52:47.6697044Z ld.shared.b16 %rs127, [%r1114+64]; 
2026-02-21T08:52:47.6697112Z ld.shared.b16 %rs128, [%r1114+1088]; 2026-02-21T08:52:47.6697175Z add.s32 %r1115, %r1110, %r20; 2026-02-21T08:52:47.6697388Z ld.shared.b16 %rs129, [%r1115]; 2026-02-21T08:52:47.6697468Z ld.shared.b16 %rs130, [%r1115+1024]; 2026-02-21T08:52:47.6697533Z ld.shared.b16 %rs131, [%r1115+64]; 2026-02-21T08:52:47.6697606Z ld.shared.b16 %rs132, [%r1115+1088]; 2026-02-21T08:52:47.6697674Z add.s32 %r1116, %r1110, %r21; 2026-02-21T08:52:47.6697744Z ld.shared.b16 %rs133, [%r1116]; 2026-02-21T08:52:47.6697811Z ld.shared.b16 %rs134, [%r1116+1024]; 2026-02-21T08:52:47.6697877Z ld.shared.b16 %rs135, [%r1116+64]; 2026-02-21T08:52:47.6697950Z ld.shared.b16 %rs136, [%r1116+1088]; 2026-02-21T08:52:47.6698013Z add.s32 %r1117, %r1110, %r22; 2026-02-21T08:52:47.6698081Z ld.shared.b16 %rs137, [%r1117]; 2026-02-21T08:52:47.6698155Z ld.shared.b16 %rs138, [%r1117+1024]; 2026-02-21T08:52:47.6698223Z ld.shared.b16 %rs139, [%r1117+64]; 2026-02-21T08:52:47.6698291Z ld.shared.b16 %rs140, [%r1117+1088]; 2026-02-21T08:52:47.6698355Z add.s32 %r1118, %r1110, %r23; 2026-02-21T08:52:47.6698429Z ld.shared.b16 %rs141, [%r1118]; 2026-02-21T08:52:47.6698500Z ld.shared.b16 %rs142, [%r1118+1024]; 2026-02-21T08:52:47.6698568Z ld.shared.b16 %rs143, [%r1118+64]; 2026-02-21T08:52:47.6698643Z ld.shared.b16 %rs144, [%r1118+1088]; 2026-02-21T08:52:47.6698718Z cvt.f32.bf16 %r809, %rs113; 2026-02-21T08:52:47.6698785Z cvt.f32.bf16 %r810, %rs114; 2026-02-21T08:52:47.6698860Z cvt.f32.bf16 %r811, %rs117; 2026-02-21T08:52:47.6698924Z cvt.f32.bf16 %r812, %rs118; 2026-02-21T08:52:47.6698990Z cvt.f32.bf16 %r845, %rs121; 2026-02-21T08:52:47.6699052Z cvt.f32.bf16 %r846, %rs122; 2026-02-21T08:52:47.6699122Z cvt.f32.bf16 %r847, %rs125; 2026-02-21T08:52:47.6699188Z cvt.f32.bf16 %r848, %rs126; 2026-02-21T08:52:47.6699257Z cvt.f32.bf16 %r881, %rs129; 2026-02-21T08:52:47.6699326Z cvt.f32.bf16 %r882, %rs130; 2026-02-21T08:52:47.6699389Z cvt.f32.bf16 %r883, %rs133; 2026-02-21T08:52:47.6699451Z cvt.f32.bf16 
%r884, %rs134; 2026-02-21T08:52:47.6699514Z cvt.f32.bf16 %r917, %rs137; 2026-02-21T08:52:47.6699579Z cvt.f32.bf16 %r918, %rs138; 2026-02-21T08:52:47.6699642Z cvt.f32.bf16 %r919, %rs141; 2026-02-21T08:52:47.6699706Z cvt.f32.bf16 %r920, %rs142; 2026-02-21T08:52:47.6699775Z cvt.f32.bf16 %r953, %rs115; 2026-02-21T08:52:47.6699842Z cvt.f32.bf16 %r954, %rs116; 2026-02-21T08:52:47.6699906Z cvt.f32.bf16 %r955, %rs119; 2026-02-21T08:52:47.6699969Z cvt.f32.bf16 %r956, %rs120; 2026-02-21T08:52:47.6700037Z cvt.f32.bf16 %r989, %rs123; 2026-02-21T08:52:47.6700114Z cvt.f32.bf16 %r990, %rs124; 2026-02-21T08:52:47.6700179Z cvt.f32.bf16 %r991, %rs127; 2026-02-21T08:52:47.6700249Z cvt.f32.bf16 %r992, %rs128; 2026-02-21T08:52:47.6700313Z cvt.f32.bf16 %r1025, %rs131; 2026-02-21T08:52:47.6700377Z cvt.f32.bf16 %r1026, %rs132; 2026-02-21T08:52:47.6700445Z cvt.f32.bf16 %r1027, %rs135; 2026-02-21T08:52:47.6700509Z cvt.f32.bf16 %r1028, %rs136; 2026-02-21T08:52:47.6700573Z cvt.f32.bf16 %r1061, %rs139; 2026-02-21T08:52:47.6700636Z cvt.f32.bf16 %r1062, %rs140; 2026-02-21T08:52:47.6700704Z cvt.f32.bf16 %r1063, %rs143; 2026-02-21T08:52:47.6700770Z cvt.f32.bf16 %r1064, %rs144; 2026-02-21T08:52:47.6700980Z .loc 1 67 45 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:67:45 2026-02-21T08:52:47.6701186Z add.s32 %r1119, %r24, %r1108; 2026-02-21T08:52:47.6701254Z ld.shared.b8 %rs145, [%r1119]; 2026-02-21T08:52:47.6701324Z ld.shared.b8 %rs146, [%r1119+512]; 2026-02-21T08:52:47.6701395Z ld.shared.b8 %rs147, [%r1119+1024]; 2026-02-21T08:52:47.6701469Z ld.shared.b8 %rs148, [%r1119+1536]; 2026-02-21T08:52:47.6701537Z ld.shared.b8 %rs149, [%r1119+2048]; 2026-02-21T08:52:47.6701604Z ld.shared.b8 %rs150, [%r1119+2560]; 2026-02-21T08:52:47.6701676Z ld.shared.b8 %rs151, [%r1119+3072]; 2026-02-21T08:52:47.6701741Z ld.shared.b8 %rs152, [%r1119+3584]; 2026-02-21T08:52:47.6701809Z ld.shared.b8 %rs153, [%r1119+4096]; 2026-02-21T08:52:47.6701883Z ld.shared.b8 %rs154, [%r1119+4608]; 
2026-02-21T08:52:47.6701951Z ld.shared.b8 %rs155, [%r1119+5120]; 2026-02-21T08:52:47.6702120Z ld.shared.b8 %rs156, [%r1119+5632]; 2026-02-21T08:52:47.6702192Z ld.shared.b8 %rs157, [%r1119+6144]; 2026-02-21T08:52:47.6702264Z ld.shared.b8 %rs158, [%r1119+6656]; 2026-02-21T08:52:47.6702345Z ld.shared.b8 %rs159, [%r1119+7168]; 2026-02-21T08:52:47.6702418Z ld.shared.b8 %rs160, [%r1119+7680]; 2026-02-21T08:52:47.6702628Z .loc 1 57 28 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:57:28 2026-02-21T08:52:47.6702696Z shl.b16 %rs161, %rs145, 4; 2026-02-21T08:52:47.6702764Z shl.b16 %rs162, %rs146, 4; 2026-02-21T08:52:47.6702831Z shl.b16 %rs163, %rs147, 4; 2026-02-21T08:52:47.6702900Z shl.b16 %rs164, %rs148, 4; 2026-02-21T08:52:47.6702963Z shl.b16 %rs165, %rs149, 4; 2026-02-21T08:52:47.6703026Z shl.b16 %rs166, %rs150, 4; 2026-02-21T08:52:47.6703096Z shl.b16 %rs167, %rs151, 4; 2026-02-21T08:52:47.6703160Z shl.b16 %rs168, %rs152, 4; 2026-02-21T08:52:47.6703223Z shl.b16 %rs169, %rs153, 4; 2026-02-21T08:52:47.6703292Z shl.b16 %rs170, %rs154, 4; 2026-02-21T08:52:47.6703359Z shl.b16 %rs171, %rs155, 4; 2026-02-21T08:52:47.6703422Z shl.b16 %rs172, %rs156, 4; 2026-02-21T08:52:47.6703489Z shl.b16 %rs173, %rs157, 4; 2026-02-21T08:52:47.6703558Z shl.b16 %rs174, %rs158, 4; 2026-02-21T08:52:47.6703624Z shl.b16 %rs175, %rs159, 4; 2026-02-21T08:52:47.6703687Z shl.b16 %rs176, %rs160, 4; 2026-02-21T08:52:47.6703894Z .loc 1 72 58 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:72:58 2026-02-21T08:52:47.6703973Z selp.b16 %rs177, %rs161, %rs145, %p57; 2026-02-21T08:52:47.6704038Z cvt.s16.s8 %rs178, %rs177; 2026-02-21T08:52:47.6704099Z shr.s16 %rs179, %rs178, 4; 2026-02-21T08:52:47.6704179Z selp.b16 %rs180, %rs162, %rs146, %p57; 2026-02-21T08:52:47.6704243Z cvt.s16.s8 %rs181, %rs180; 2026-02-21T08:52:47.6704308Z shr.s16 %rs182, %rs181, 4; 2026-02-21T08:52:47.6704386Z selp.b16 %rs183, %rs163, %rs147, %p57; 2026-02-21T08:52:47.6704450Z cvt.s16.s8 %rs184, %rs183; 
2026-02-21T08:52:47.6704514Z shr.s16 %rs185, %rs184, 4; 2026-02-21T08:52:47.6704588Z selp.b16 %rs186, %rs164, %rs148, %p57; 2026-02-21T08:52:47.6704658Z cvt.s16.s8 %rs187, %rs186; 2026-02-21T08:52:47.6704721Z shr.s16 %rs188, %rs187, 4; 2026-02-21T08:52:47.6704796Z selp.b16 %rs189, %rs165, %rs149, %p57; 2026-02-21T08:52:47.6704865Z cvt.s16.s8 %rs190, %rs189; 2026-02-21T08:52:47.6704927Z shr.s16 %rs191, %rs190, 4; 2026-02-21T08:52:47.6704997Z selp.b16 %rs192, %rs166, %rs150, %p57; 2026-02-21T08:52:47.6705065Z cvt.s16.s8 %rs193, %rs192; 2026-02-21T08:52:47.6705130Z shr.s16 %rs194, %rs193, 4; 2026-02-21T08:52:47.6705203Z selp.b16 %rs195, %rs167, %rs151, %p57; 2026-02-21T08:52:47.6705266Z cvt.s16.s8 %rs196, %rs195; 2026-02-21T08:52:47.6705345Z shr.s16 %rs197, %rs196, 4; 2026-02-21T08:52:47.6705418Z selp.b16 %rs198, %rs168, %rs152, %p57; 2026-02-21T08:52:47.6705482Z cvt.s16.s8 %rs199, %rs198; 2026-02-21T08:52:47.6705553Z shr.s16 %rs200, %rs199, 4; 2026-02-21T08:52:47.6705629Z selp.b16 %rs201, %rs169, %rs153, %p57; 2026-02-21T08:52:47.6705697Z cvt.s16.s8 %rs202, %rs201; 2026-02-21T08:52:47.6705764Z shr.s16 %rs203, %rs202, 4; 2026-02-21T08:52:47.6705842Z selp.b16 %rs204, %rs170, %rs154, %p57; 2026-02-21T08:52:47.6705907Z cvt.s16.s8 %rs205, %rs204; 2026-02-21T08:52:47.6706105Z shr.s16 %rs206, %rs205, 4; 2026-02-21T08:52:47.6706182Z selp.b16 %rs207, %rs171, %rs155, %p57; 2026-02-21T08:52:47.6706258Z cvt.s16.s8 %rs208, %rs207; 2026-02-21T08:52:47.6706328Z shr.s16 %rs209, %rs208, 4; 2026-02-21T08:52:47.6706402Z selp.b16 %rs210, %rs172, %rs156, %p57; 2026-02-21T08:52:47.6706586Z cvt.s16.s8 %rs211, %rs210; 2026-02-21T08:52:47.6706658Z shr.s16 %rs212, %rs211, 4; 2026-02-21T08:52:47.6706730Z selp.b16 %rs213, %rs173, %rs157, %p57; 2026-02-21T08:52:47.6706802Z cvt.s16.s8 %rs214, %rs213; 2026-02-21T08:52:47.6706869Z shr.s16 %rs215, %rs214, 4; 2026-02-21T08:52:47.6706941Z selp.b16 %rs216, %rs174, %rs158, %p57; 2026-02-21T08:52:47.6707007Z cvt.s16.s8 %rs217, %rs216; 
2026-02-21T08:52:47.6707080Z shr.s16 %rs218, %rs217, 4; 2026-02-21T08:52:47.6707317Z selp.b16 %rs219, %rs175, %rs159, %p57; 2026-02-21T08:52:47.6707396Z cvt.s16.s8 %rs220, %rs219; 2026-02-21T08:52:47.6707469Z shr.s16 %rs221, %rs220, 4; 2026-02-21T08:52:47.6707547Z selp.b16 %rs222, %rs176, %rs160, %p57; 2026-02-21T08:52:47.6707616Z cvt.s16.s8 %rs223, %rs222; 2026-02-21T08:52:47.6707686Z shr.s16 %rs224, %rs223, 4; 2026-02-21T08:52:47.6707900Z .loc 1 77 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:77:32 2026-02-21T08:52:47.6707970Z cvt.rn.f32.s16 %r1120, %rs179; 2026-02-21T08:52:47.6708037Z cvt.rn.f32.s16 %r1121, %rs182; 2026-02-21T08:52:47.6708111Z cvt.rn.f32.s16 %r1122, %rs185; 2026-02-21T08:52:47.6708177Z cvt.rn.f32.s16 %r1123, %rs188; 2026-02-21T08:52:47.6708244Z cvt.rn.f32.s16 %r1124, %rs191; 2026-02-21T08:52:47.6708314Z cvt.rn.f32.s16 %r1125, %rs194; 2026-02-21T08:52:47.6708380Z cvt.rn.f32.s16 %r1126, %rs197; 2026-02-21T08:52:47.6708443Z cvt.rn.f32.s16 %r1127, %rs200; 2026-02-21T08:52:47.6708598Z cvt.rn.f32.s16 %r1128, %rs203; 2026-02-21T08:52:47.6708674Z cvt.rn.f32.s16 %r1129, %rs206; 2026-02-21T08:52:47.6708737Z cvt.rn.f32.s16 %r1130, %rs209; 2026-02-21T08:52:47.6708802Z cvt.rn.f32.s16 %r1131, %rs212; 2026-02-21T08:52:47.6708875Z cvt.rn.f32.s16 %r1132, %rs215; 2026-02-21T08:52:47.6708940Z cvt.rn.f32.s16 %r1133, %rs218; 2026-02-21T08:52:47.6709008Z cvt.rn.f32.s16 %r1134, %rs221; 2026-02-21T08:52:47.6709071Z cvt.rn.f32.s16 %r1135, %rs224; 2026-02-21T08:52:47.6709144Z st.shared.b32 [%r25], %r1120; 2026-02-21T08:52:47.6709219Z st.shared.b32 [%r25+32768], %r1128; 2026-02-21T08:52:47.6709286Z st.shared.b32 [%r26], %r1121; 2026-02-21T08:52:47.6709361Z st.shared.b32 [%r26+32768], %r1129; 2026-02-21T08:52:47.6709427Z st.shared.b32 [%r27], %r1122; 2026-02-21T08:52:47.6709494Z st.shared.b32 [%r27+32768], %r1130; 2026-02-21T08:52:47.6709567Z st.shared.b32 [%r28], %r1123; 2026-02-21T08:52:47.6709634Z st.shared.b32 [%r28+32768], %r1131; 
2026-02-21T08:52:47.6709701Z st.shared.b32 [%r29], %r1124; 2026-02-21T08:52:47.6709770Z st.shared.b32 [%r29+32768], %r1132; 2026-02-21T08:52:47.6709842Z st.shared.b32 [%r30], %r1125; 2026-02-21T08:52:47.6709911Z st.shared.b32 [%r30+32768], %r1133; 2026-02-21T08:52:47.6709979Z st.shared.b32 [%r31], %r1126; 2026-02-21T08:52:47.6710057Z st.shared.b32 [%r31+32768], %r1134; 2026-02-21T08:52:47.6710124Z st.shared.b32 [%r32], %r1127; 2026-02-21T08:52:47.6710191Z st.shared.b32 [%r32+32768], %r1135; 2026-02-21T08:52:47.6710251Z $L__tmp3: 2026-02-21T08:52:47.6710541Z .loc 2 291 36 // standard.py:291:36 @[ cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:84:40 ] 2026-02-21T08:52:47.6710607Z // begin inline asm 2026-02-21T08:52:47.6710702Z fence.proxy.async.shared::cta; 2026-02-21T08:52:47.6710771Z // end inline asm 2026-02-21T08:52:47.6710832Z bar.sync 0; 2026-02-21T08:52:47.6710918Z shfl.sync.idx.b32 %r1136, %r4, 0, 31, -1; 2026-02-21T08:52:47.6711002Z wgmma.fence.sync.aligned; 2026-02-21T08:52:47.6711071Z shl.b32 %r1137, %r1136, 10; 2026-02-21T08:52:47.6711136Z and.b32 %r1138, %r1137, 28672; 2026-02-21T08:52:47.6711203Z add.s32 %r1139, %r1138, %r2128; 2026-02-21T08:52:47.6711274Z bfe.u32 %r1140, %r1139, 4, 14; 2026-02-21T08:52:47.6711483Z cvt.u64.u32 %rd75, %r1140; 2026-02-21T08:52:47.6711561Z or.b64 %rd65, %rd75, 4611686293439512576; 2026-02-21T08:52:47.6711636Z mov.pred %p15, -1; 2026-02-21T08:52:47.6711699Z // begin inline asm 2026-02-21T08:52:47.6712207Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2167,%r2168,%r2169,%r2170,%r2171,%r2172,%r2173,%r2174,%r2175,%r2176,%r2177,%r2178,%r2179,%r2180,%r2181,%r2182}, {%r809,%r810,%r811,%r812}, %rd65, %p15, 1, 1; 2026-02-21T08:52:47.6712276Z // end inline asm 2026-02-21T08:52:47.6712343Z add.s32 %r1141, %r1139, 32; 2026-02-21T08:52:47.6712409Z bfe.u32 %r1142, %r1141, 4, 14; 2026-02-21T08:52:47.6712474Z cvt.u64.u32 %rd76, %r1142; 2026-02-21T08:52:47.6712556Z or.b64 %rd66, %rd76, 4611686293439512576; 
2026-02-21T08:52:47.6712620Z // begin inline asm 2026-02-21T08:52:47.6713217Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2167,%r2168,%r2169,%r2170,%r2171,%r2172,%r2173,%r2174,%r2175,%r2176,%r2177,%r2178,%r2179,%r2180,%r2181,%r2182}, {%r845,%r846,%r847,%r848}, %rd66, %p15, 1, 1; 2026-02-21T08:52:47.6713296Z // end inline asm 2026-02-21T08:52:47.6713363Z add.s32 %r1143, %r1139, 64; 2026-02-21T08:52:47.6713426Z bfe.u32 %r1144, %r1143, 4, 14; 2026-02-21T08:52:47.6713497Z cvt.u64.u32 %rd77, %r1144; 2026-02-21T08:52:47.6713570Z or.b64 %rd67, %rd77, 4611686293439512576; 2026-02-21T08:52:47.6713633Z // begin inline asm 2026-02-21T08:52:47.6714120Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2167,%r2168,%r2169,%r2170,%r2171,%r2172,%r2173,%r2174,%r2175,%r2176,%r2177,%r2178,%r2179,%r2180,%r2181,%r2182}, {%r881,%r882,%r883,%r884}, %rd67, %p15, 1, 1; 2026-02-21T08:52:47.6714187Z // end inline asm 2026-02-21T08:52:47.6714248Z add.s32 %r1145, %r1139, 96; 2026-02-21T08:52:47.6714312Z bfe.u32 %r1146, %r1145, 4, 14; 2026-02-21T08:52:47.6714388Z cvt.u64.u32 %rd78, %r1146; 2026-02-21T08:52:47.6714458Z or.b64 %rd68, %rd78, 4611686293439512576; 2026-02-21T08:52:47.6714521Z // begin inline asm 2026-02-21T08:52:47.6715010Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2167,%r2168,%r2169,%r2170,%r2171,%r2172,%r2173,%r2174,%r2175,%r2176,%r2177,%r2178,%r2179,%r2180,%r2181,%r2182}, {%r917,%r918,%r919,%r920}, %rd68, %p15, 1, 1; 2026-02-21T08:52:47.6715074Z // end inline asm 2026-02-21T08:52:47.6715139Z add.s32 %r1147, %r1139, 32768; 2026-02-21T08:52:47.6715204Z bfe.u32 %r1148, %r1147, 4, 14; 2026-02-21T08:52:47.6715276Z cvt.u64.u32 %rd79, %r1148; 2026-02-21T08:52:47.6715347Z or.b64 %rd69, %rd79, 4611686293439512576; 2026-02-21T08:52:47.6715409Z // begin inline asm 2026-02-21T08:52:47.6715911Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2167,%r2168,%r2169,%r2170,%r2171,%r2172,%r2173,%r2174,%r2175,%r2176,%r2177,%r2178,%r2179,%r2180,%r2181,%r2182}, 
{%r953,%r954,%r955,%r956}, %rd69, %p15, 1, 1; 2026-02-21T08:52:47.6715976Z // end inline asm 2026-02-21T08:52:47.6716042Z add.s32 %r1149, %r1139, 32800; 2026-02-21T08:52:47.6716117Z bfe.u32 %r1150, %r1149, 4, 14; 2026-02-21T08:52:47.6716182Z cvt.u64.u32 %rd80, %r1150; 2026-02-21T08:52:47.6716256Z or.b64 %rd70, %rd80, 4611686293439512576; 2026-02-21T08:52:47.6716320Z // begin inline asm 2026-02-21T08:52:47.6716953Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2167,%r2168,%r2169,%r2170,%r2171,%r2172,%r2173,%r2174,%r2175,%r2176,%r2177,%r2178,%r2179,%r2180,%r2181,%r2182}, {%r989,%r990,%r991,%r992}, %rd70, %p15, 1, 1; 2026-02-21T08:52:47.6717018Z // end inline asm 2026-02-21T08:52:47.6717084Z add.s32 %r1151, %r1139, 32832; 2026-02-21T08:52:47.6717165Z bfe.u32 %r1152, %r1151, 4, 14; 2026-02-21T08:52:47.6717231Z cvt.u64.u32 %rd81, %r1152; 2026-02-21T08:52:47.6717305Z or.b64 %rd71, %rd81, 4611686293439512576; 2026-02-21T08:52:47.6717374Z // begin inline asm 2026-02-21T08:52:47.6717880Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2167,%r2168,%r2169,%r2170,%r2171,%r2172,%r2173,%r2174,%r2175,%r2176,%r2177,%r2178,%r2179,%r2180,%r2181,%r2182}, {%r1025,%r1026,%r1027,%r1028}, %rd71, %p15, 1, 1; 2026-02-21T08:52:47.6717939Z // end inline asm 2026-02-21T08:52:47.6718177Z add.s32 %r1153, %r1139, 32864; 2026-02-21T08:52:47.6718240Z bfe.u32 %r1154, %r1153, 4, 14; 2026-02-21T08:52:47.6718307Z cvt.u64.u32 %rd82, %r1154; 2026-02-21T08:52:47.6718379Z or.b64 %rd72, %rd82, 4611686293439512576; 2026-02-21T08:52:47.6718448Z // begin inline asm 2026-02-21T08:52:47.6718945Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2167,%r2168,%r2169,%r2170,%r2171,%r2172,%r2173,%r2174,%r2175,%r2176,%r2177,%r2178,%r2179,%r2180,%r2181,%r2182}, {%r1061,%r1062,%r1063,%r1064}, %rd72, %p15, 1, 1; 2026-02-21T08:52:47.6719007Z // end inline asm 2026-02-21T08:52:47.6719095Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:47.6719155Z mov.b32 %r1082, 0; 2026-02-21T08:52:47.6719220Z 
mov.b32 %r1081, %r2128; 2026-02-21T08:52:47.6719286Z mov.b32 %r1083, %r1082; 2026-02-21T08:52:47.6719475Z // begin inline asm 2026-02-21T08:52:47.6719787Z // wait for regs: %r2167,%r2168,%r2169,%r2170,%r2171,%r2172,%r2173,%r2174,%r2175,%r2176,%r2177,%r2178,%r2179,%r2180,%r2181,%r2182,%r1081,%r1082,%r1083 2026-02-21T08:52:47.6719871Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:47.6719935Z // end inline asm 2026-02-21T08:52:47.6719992Z $L__tmp4: 2026-02-21T08:52:47.6720217Z .loc 1 40 103 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:40:103 2026-02-21T08:52:47.6720292Z add.s32 %r1155, %r2166, 1; 2026-02-21T08:52:47.6720364Z setp.gt.s32 %p26, %r1155, 1; 2026-02-21T08:52:47.6720435Z selp.b32 %r2166, 0, %r1155, %p26; 2026-02-21T08:52:47.6720646Z .loc 1 48 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:32 2026-02-21T08:52:47.6720726Z mad.wide.s32 %rd73, %r2163, 2, %rd15; 2026-02-21T08:52:47.6720928Z .loc 1 48 80 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:80 2026-02-21T08:52:47.6720996Z shl.b32 %r1156, %r2166, 13; 2026-02-21T08:52:47.6721067Z add.s32 %r1103, %r11, %r1156; 2026-02-21T08:52:47.6721135Z selp.b32 %r1104, 8, 0, %p24; 2026-02-21T08:52:47.6721202Z // begin inline asm 2026-02-21T08:52:47.6721355Z cp.async.ca.shared.global [ %r1103 + 0 ], [ %rd73 + 0 ], 0x8, %r1104; 2026-02-21T08:52:47.6721416Z // end inline asm 2026-02-21T08:52:47.6721484Z cp.async.commit_group; 2026-02-21T08:52:47.6721684Z .loc 1 54 34 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:34 2026-02-21T08:52:47.6721756Z cvt.s64.s32 %rd83, %r2164; 2026-02-21T08:52:47.6721822Z add.s64 %rd74, %rd16, %rd83; 2026-02-21T08:52:47.6722020Z .loc 1 54 87 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:87 2026-02-21T08:52:47.6722092Z add.s32 %r1105, %r13, %r1156; 2026-02-21T08:52:47.6722153Z // begin inline asm 2026-02-21T08:52:47.6722292Z cp.async.ca.shared.global [ %r1105 + 0 ], [ %rd74 + 0 ], 0x8, %r1104; 
2026-02-21T08:52:47.6722360Z // end inline asm 2026-02-21T08:52:47.6722428Z cp.async.commit_group; 2026-02-21T08:52:47.6722637Z .loc 1 40 103 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:40:103 2026-02-21T08:52:47.6722707Z add.s32 %r2164, %r2164, 229376; 2026-02-21T08:52:47.6722776Z add.s32 %r2163, %r2163, 64; 2026-02-21T08:52:47.6722845Z setp.lt.u64 %p27, %rd152, 4064; 2026-02-21T08:52:47.6722910Z @%p27 bra $L__BB0_5; 2026-02-21T08:52:47.6723032Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:47.6723233Z .loc 1 31 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:31:32 2026-02-21T08:52:47.6723300Z or.b32 %r1186, %r110, %r4; 2026-02-21T08:52:47.6723368Z or.b32 %r1187, %r110, %r6; 2026-02-21T08:52:47.6723578Z .loc 1 40 103 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:40:103 2026-02-21T08:52:47.6723648Z cp.async.wait_group 0; 2026-02-21T08:52:47.6723711Z bar.sync 0; 2026-02-21T08:52:47.6723917Z .loc 1 87 28 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:87:28 2026-02-21T08:52:47.6723998Z cvt.rn.bf16x2.f32 %r1188, %r2168, %r2167; 2026-02-21T08:52:47.6724182Z cvt.rn.bf16x2.f32 %r1189, %r2170, %r2169; 2026-02-21T08:52:47.6724265Z cvt.rn.bf16x2.f32 %r1190, %r2172, %r2171; 2026-02-21T08:52:47.6724339Z cvt.rn.bf16x2.f32 %r1191, %r2174, %r2173; 2026-02-21T08:52:47.6724415Z cvt.rn.bf16x2.f32 %r1192, %r2176, %r2175; 2026-02-21T08:52:47.6724494Z cvt.rn.bf16x2.f32 %r1193, %r2178, %r2177; 2026-02-21T08:52:47.6724568Z cvt.rn.bf16x2.f32 %r1194, %r2180, %r2179; 2026-02-21T08:52:47.6724642Z cvt.rn.bf16x2.f32 %r1195, %r2182, %r2181; 2026-02-21T08:52:47.6724852Z .loc 1 88 50 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:88:50 2026-02-21T08:52:47.6724933Z mad.lo.s32 %r1196, %r1186, 7168, %r111; 2026-02-21T08:52:47.6725005Z mad.lo.s32 %r1197, %r1187, 7168, %r111; 2026-02-21T08:52:47.6725309Z .loc 1 88 22 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:88:22 2026-02-21T08:52:47.6725399Z mad.wide.s32 
%rd84, %r1196, 2, %rd17; 2026-02-21T08:52:47.6725478Z mad.wide.s32 %rd85, %r1197, 2, %rd17; 2026-02-21T08:52:47.6725681Z .loc 1 88 81 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:88:81 2026-02-21T08:52:47.6725803Z st.shared.v4.b32 [%r33], {%r1188, %r1190, %r1192, %r1194}; 2026-02-21T08:52:47.6725925Z st.shared.v4.b32 [%r33+512], {%r1189, %r1191, %r1193, %r1195}; 2026-02-21T08:52:47.6725983Z bar.sync 0; 2026-02-21T08:52:47.6726052Z // begin inline asm 2026-02-21T08:52:47.6726248Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1167, %r1168, %r1169, %r1170}, [%r719]; 2026-02-21T08:52:47.6726312Z // end inline asm 2026-02-21T08:52:47.6726373Z // begin inline asm 2026-02-21T08:52:47.6726677Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1171, %r1172, %r1173, %r1174}, [%r724]; 2026-02-21T08:52:47.6726743Z // end inline asm 2026-02-21T08:52:47.6726808Z // begin inline asm 2026-02-21T08:52:47.6726946Z st.global.v4.b32 [ %rd84 + 0 ], { %r1167, %r1168, %r1169, %r1170 }; 2026-02-21T08:52:47.6727009Z // end inline asm 2026-02-21T08:52:47.6727070Z // begin inline asm 2026-02-21T08:52:47.6727202Z st.global.v4.b32 [ %rd85 + 0 ], { %r1171, %r1172, %r1173, %r1174 }; 2026-02-21T08:52:47.6727273Z // end inline asm 2026-02-21T08:52:47.6727489Z .loc 1 19 144 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:19:144 2026-02-21T08:52:47.6727555Z add.s32 %r1198, %r2142, 8448; 2026-02-21T08:52:47.6727764Z .loc 1 25 35 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:25:35 2026-02-21T08:52:47.6727840Z mul.hi.s32 %r1199, %r1198, -1840700269; 2026-02-21T08:52:47.6727908Z add.s32 %r1200, %r1199, %r1198; 2026-02-21T08:52:47.6727978Z shr.u32 %r1201, %r1200, 31; 2026-02-21T08:52:47.6728043Z shr.s32 %r1202, %r1200, 5; 2026-02-21T08:52:47.6728111Z add.s32 %r1203, %r1202, %r1201; 2026-02-21T08:52:47.6728313Z .loc 1 26 33 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:26:33 2026-02-21T08:52:47.6728384Z shl.b32 %r1204, %r1203, 1; 2026-02-21T08:52:47.6728586Z .loc 
1 27 39 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:27:39 2026-02-21T08:52:47.6728650Z sub.s32 %r1205, 1, %r1204; 2026-02-21T08:52:47.6728856Z .loc 1 27 52 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:27:52 2026-02-21T08:52:47.6728919Z min.s32 %r1206, %r1205, 2; 2026-02-21T08:52:47.6729116Z .loc 1 28 45 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:28:45 2026-02-21T08:52:47.6729191Z mul.lo.s32 %r1207, %r1203, 56; 2026-02-21T08:52:47.6729255Z sub.s32 %r1208, %r1198, %r1207; 2026-02-21T08:52:47.6729458Z .loc 1 29 51 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:29:51 2026-02-21T08:52:47.6729534Z div.s32 %r1209, %r1208, %r1206; 2026-02-21T08:52:47.6729731Z .loc 1 28 64 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:28:64 2026-02-21T08:52:47.6729801Z mul.lo.s32 %r1210, %r1209, %r1206; 2026-02-21T08:52:47.6730031Z sub.s32 %r1211, %r1208, %r1210; 2026-02-21T08:52:47.6730235Z .loc 1 28 30 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:28:30 2026-02-21T08:52:47.6730312Z add.s32 %r1212, %r1211, %r1204; 2026-02-21T08:52:47.6730514Z .loc 1 30 27 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:30:27 2026-02-21T08:52:47.6730584Z shl.b32 %r154, %r1212, 6; 2026-02-21T08:52:47.6730782Z .loc 1 31 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:31:32 2026-02-21T08:52:47.6730849Z or.b32 %r1213, %r154, %r5; 2026-02-21T08:52:47.6731051Z .loc 1 32 27 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:32:27 2026-02-21T08:52:47.6731239Z shl.b32 %r1214, %r1209, 8; 2026-02-21T08:52:47.6731441Z .loc 1 33 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:33:32 2026-02-21T08:52:47.6731511Z or.b32 %r155, %r1214, %r9; 2026-02-21T08:52:47.6731713Z .loc 1 48 53 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:53 2026-02-21T08:52:47.6731779Z shl.b32 %r1215, %r1213, 13; 2026-02-21T08:52:47.6731976Z .loc 1 48 60 // 
cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:60 2026-02-21T08:52:47.6732042Z or.b32 %r1216, %r1215, %r7; 2026-02-21T08:52:47.6732241Z .loc 1 48 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:32 2026-02-21T08:52:47.6732312Z mad.wide.s32 %rd86, %r1216, 2, %rd15; 2026-02-21T08:52:47.6732377Z mov.b32 %r1176, 8; 2026-02-21T08:52:47.6732574Z .loc 1 48 80 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:80 2026-02-21T08:52:47.6732638Z // begin inline asm 2026-02-21T08:52:47.6732780Z cp.async.ca.shared.global [ %r11 + 0 ], [ %rd86 + 0 ], 0x8, %r1176; 2026-02-21T08:52:47.6732840Z // end inline asm 2026-02-21T08:52:47.6732909Z cp.async.commit_group; 2026-02-21T08:52:47.6733109Z .loc 1 54 62 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:62 2026-02-21T08:52:47.6733181Z add.s32 %r1217, %r155, %r2129; 2026-02-21T08:52:47.6733379Z .loc 1 54 34 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:34 2026-02-21T08:52:47.6733444Z cvt.s64.s32 %rd91, %r1217; 2026-02-21T08:52:47.6733515Z add.s64 %rd87, %rd16, %rd91; 2026-02-21T08:52:47.6733728Z .loc 1 54 87 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:87 2026-02-21T08:52:47.6733794Z // begin inline asm 2026-02-21T08:52:47.6733939Z cp.async.ca.shared.global [ %r13 + 0 ], [ %rd87 + 0 ], 0x8, %r1176; 2026-02-21T08:52:47.6733998Z // end inline asm 2026-02-21T08:52:47.6734070Z cp.async.commit_group; 2026-02-21T08:52:47.6734270Z .loc 1 48 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:32 2026-02-21T08:52:47.6734341Z cvt.s64.s32 %rd92, %r1215; 2026-02-21T08:52:47.6734408Z or.b64 %rd93, %rd92, %rd2; 2026-02-21T08:52:47.6734473Z shl.b64 %rd94, %rd93, 1; 2026-02-21T08:52:47.6734548Z add.s64 %rd95, %rd15, %rd94; 2026-02-21T08:52:47.6734611Z add.s64 %rd88, %rd95, 128; 2026-02-21T08:52:47.6734808Z .loc 1 48 80 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:80 2026-02-21T08:52:47.6734874Z bar.sync 0; 2026-02-21T08:52:47.6734935Z // 
begin inline asm 2026-02-21T08:52:47.6735065Z cp.async.ca.shared.global [ %r14 + 0 ], [ %rd88 + 0 ], 0x8, %r1176; 2026-02-21T08:52:47.6735125Z // end inline asm 2026-02-21T08:52:47.6735200Z cp.async.commit_group; 2026-02-21T08:52:47.6735400Z .loc 1 54 34 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:34 2026-02-21T08:52:47.6735469Z cvt.s64.s32 %rd96, %r155; 2026-02-21T08:52:47.6735540Z add.s64 %rd97, %rd96, %rd3; 2026-02-21T08:52:47.6735619Z add.s64 %rd98, %rd16, %rd97; 2026-02-21T08:52:47.6735688Z add.s64 %rd89, %rd98, 229376; 2026-02-21T08:52:47.6735990Z .loc 1 54 87 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:87 2026-02-21T08:52:47.6736057Z // begin inline asm 2026-02-21T08:52:47.6736188Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd89 + 0 ], 0x8, %r1176; 2026-02-21T08:52:47.6736249Z // end inline asm 2026-02-21T08:52:47.6736323Z cp.async.commit_group; 2026-02-21T08:52:47.6736642Z .loc 1 40 103 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:40:103 2026-02-21T08:52:47.6736711Z add.s32 %r2184, %r36, %r1214; 2026-02-21T08:52:47.6736780Z shl.b32 %r1218, %r1212, 19; 2026-02-21T08:52:47.6736856Z or.b32 %r2183, %r37, %r1218; 2026-02-21T08:52:47.6736922Z mov.b32 %r2187, 0f00000000; 2026-02-21T08:52:47.6736982Z mov.b32 %r2186, 1; 2026-02-21T08:52:47.6737186Z mov.b32 %r2185, -1; 2026-02-21T08:52:47.6737253Z mov.b64 %rd153, -32; 2026-02-21T08:52:47.6737317Z mov.b32 %r2188, %r2187; 2026-02-21T08:52:47.6737386Z mov.b32 %r2189, %r2187; 2026-02-21T08:52:47.6737451Z mov.b32 %r2190, %r2187; 2026-02-21T08:52:47.6737511Z mov.b32 %r2191, %r2187; 2026-02-21T08:52:47.6737574Z mov.b32 %r2192, %r2187; 2026-02-21T08:52:47.6737640Z mov.b32 %r2193, %r2187; 2026-02-21T08:52:47.6737702Z mov.b32 %r2194, %r2187; 2026-02-21T08:52:47.6737766Z mov.b32 %r2195, %r2187; 2026-02-21T08:52:47.6737833Z mov.b32 %r2196, %r2187; 2026-02-21T08:52:47.6737895Z mov.b32 %r2197, %r2187; 2026-02-21T08:52:47.6737956Z mov.b32 %r2198, %r2187; 2026-02-21T08:52:47.6738017Z 
mov.b32 %r2199, %r2187; 2026-02-21T08:52:47.6738097Z mov.b32 %r2200, %r2187; 2026-02-21T08:52:47.6738161Z mov.b32 %r2201, %r2187; 2026-02-21T08:52:47.6738224Z mov.b32 %r2202, %r2187; 2026-02-21T08:52:47.6738349Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:47.6738462Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:47.6738530Z add.s64 %rd153, %rd153, 32; 2026-02-21T08:52:47.6738607Z setp.lt.u64 %p37, %rd153, 4032; 2026-02-21T08:52:47.6738674Z add.s32 %r1549, %r2185, 1; 2026-02-21T08:52:47.6738745Z setp.gt.s32 %p38, %r1549, 1; 2026-02-21T08:52:47.6738816Z selp.b32 %r2185, 0, %r1549, %p38; 2026-02-21T08:52:47.6739027Z .loc 1 48 80 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:80 2026-02-21T08:52:47.6739100Z cp.async.wait_group 2; 2026-02-21T08:52:47.6739159Z bar.sync 0; 2026-02-21T08:52:47.6739229Z shl.b32 %r1550, %r2185, 13; 2026-02-21T08:52:47.6739294Z add.s32 %r1551, %r2128, %r1550; 2026-02-21T08:52:47.6739361Z add.s32 %r1552, %r1551, 65536; 2026-02-21T08:52:47.6739573Z .loc 1 52 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:52:32 2026-02-21T08:52:47.6739638Z add.s32 %r1553, %r1552, %r16; 2026-02-21T08:52:47.6739712Z ld.shared.b16 %rs225, [%r1553]; 2026-02-21T08:52:47.6739786Z ld.shared.b16 %rs226, [%r1553+1024]; 2026-02-21T08:52:47.6739865Z ld.shared.b16 %rs227, [%r1553+64]; 2026-02-21T08:52:47.6739939Z ld.shared.b16 %rs228, [%r1553+1088]; 2026-02-21T08:52:47.6740002Z add.s32 %r1554, %r1552, %r17; 2026-02-21T08:52:47.6740074Z ld.shared.b16 %rs229, [%r1554]; 2026-02-21T08:52:47.6740142Z ld.shared.b16 %rs230, [%r1554+1024]; 2026-02-21T08:52:47.6740211Z ld.shared.b16 %rs231, [%r1554+64]; 2026-02-21T08:52:47.6740280Z ld.shared.b16 %rs232, [%r1554+1088]; 2026-02-21T08:52:47.6740350Z add.s32 %r1555, %r1552, %r18; 2026-02-21T08:52:47.6740416Z ld.shared.b16 %rs233, [%r1555]; 2026-02-21T08:52:47.6740483Z ld.shared.b16 %rs234, [%r1555+1024]; 2026-02-21T08:52:47.6740556Z ld.shared.b16 %rs235, [%r1555+64]; 
2026-02-21T08:52:47.6740623Z ld.shared.b16 %rs236, [%r1555+1088]; 2026-02-21T08:52:47.6740686Z add.s32 %r1556, %r1552, %r19; 2026-02-21T08:52:47.6740759Z ld.shared.b16 %rs237, [%r1556]; 2026-02-21T08:52:47.6740831Z ld.shared.b16 %rs238, [%r1556+1024]; 2026-02-21T08:52:47.6740898Z ld.shared.b16 %rs239, [%r1556+64]; 2026-02-21T08:52:47.6740965Z ld.shared.b16 %rs240, [%r1556+1088]; 2026-02-21T08:52:47.6741458Z add.s32 %r1557, %r1552, %r20; 2026-02-21T08:52:47.6741525Z ld.shared.b16 %rs241, [%r1557]; 2026-02-21T08:52:47.6741596Z ld.shared.b16 %rs242, [%r1557+1024]; 2026-02-21T08:52:47.6741668Z ld.shared.b16 %rs243, [%r1557+64]; 2026-02-21T08:52:47.6741737Z ld.shared.b16 %rs244, [%r1557+1088]; 2026-02-21T08:52:47.6741801Z add.s32 %r1558, %r1552, %r21; 2026-02-21T08:52:47.6741867Z ld.shared.b16 %rs245, [%r1558]; 2026-02-21T08:52:47.6741939Z ld.shared.b16 %rs246, [%r1558+1024]; 2026-02-21T08:52:47.6742005Z ld.shared.b16 %rs247, [%r1558+64]; 2026-02-21T08:52:47.6742073Z ld.shared.b16 %rs248, [%r1558+1088]; 2026-02-21T08:52:47.6742140Z add.s32 %r1559, %r1552, %r22; 2026-02-21T08:52:47.6742206Z ld.shared.b16 %rs249, [%r1559]; 2026-02-21T08:52:47.6742373Z ld.shared.b16 %rs250, [%r1559+1024]; 2026-02-21T08:52:47.6742445Z ld.shared.b16 %rs251, [%r1559+64]; 2026-02-21T08:52:47.6742521Z ld.shared.b16 %rs252, [%r1559+1088]; 2026-02-21T08:52:47.6742585Z add.s32 %r1560, %r1552, %r23; 2026-02-21T08:52:47.6742655Z ld.shared.b16 %rs253, [%r1560]; 2026-02-21T08:52:47.6742729Z ld.shared.b16 %rs254, [%r1560+1024]; 2026-02-21T08:52:47.6742795Z ld.shared.b16 %rs255, [%r1560+64]; 2026-02-21T08:52:47.6742863Z ld.shared.b16 %rs256, [%r1560+1088]; 2026-02-21T08:52:47.6742935Z cvt.f32.bf16 %r1251, %rs225; 2026-02-21T08:52:47.6742999Z cvt.f32.bf16 %r1252, %rs226; 2026-02-21T08:52:47.6743062Z cvt.f32.bf16 %r1253, %rs229; 2026-02-21T08:52:47.6743126Z cvt.f32.bf16 %r1254, %rs230; 2026-02-21T08:52:47.6743194Z cvt.f32.bf16 %r1287, %rs233; 2026-02-21T08:52:47.6743256Z cvt.f32.bf16 %r1288, %rs234; 
2026-02-21T08:52:47.6743319Z cvt.f32.bf16 %r1289, %rs237; 2026-02-21T08:52:47.6743386Z cvt.f32.bf16 %r1290, %rs238; 2026-02-21T08:52:47.6743449Z cvt.f32.bf16 %r1323, %rs241; 2026-02-21T08:52:47.6743514Z cvt.f32.bf16 %r1324, %rs242; 2026-02-21T08:52:47.6743578Z cvt.f32.bf16 %r1325, %rs245; 2026-02-21T08:52:47.6743647Z cvt.f32.bf16 %r1326, %rs246; 2026-02-21T08:52:47.6743714Z cvt.f32.bf16 %r1359, %rs249; 2026-02-21T08:52:47.6743777Z cvt.f32.bf16 %r1360, %rs250; 2026-02-21T08:52:47.6743846Z cvt.f32.bf16 %r1361, %rs253; 2026-02-21T08:52:47.6743911Z cvt.f32.bf16 %r1362, %rs254; 2026-02-21T08:52:47.6743975Z cvt.f32.bf16 %r1395, %rs227; 2026-02-21T08:52:47.6744039Z cvt.f32.bf16 %r1396, %rs228; 2026-02-21T08:52:47.6744113Z cvt.f32.bf16 %r1397, %rs231; 2026-02-21T08:52:47.6744176Z cvt.f32.bf16 %r1398, %rs232; 2026-02-21T08:52:47.6744238Z cvt.f32.bf16 %r1431, %rs235; 2026-02-21T08:52:47.6744309Z cvt.f32.bf16 %r1432, %rs236; 2026-02-21T08:52:47.6744372Z cvt.f32.bf16 %r1433, %rs239; 2026-02-21T08:52:47.6744434Z cvt.f32.bf16 %r1434, %rs240; 2026-02-21T08:52:47.6744502Z cvt.f32.bf16 %r1467, %rs243; 2026-02-21T08:52:47.6744565Z cvt.f32.bf16 %r1468, %rs244; 2026-02-21T08:52:47.6744631Z cvt.f32.bf16 %r1469, %rs247; 2026-02-21T08:52:47.6744697Z cvt.f32.bf16 %r1470, %rs248; 2026-02-21T08:52:47.6744766Z cvt.f32.bf16 %r1503, %rs251; 2026-02-21T08:52:47.6744832Z cvt.f32.bf16 %r1504, %rs252; 2026-02-21T08:52:47.6744899Z cvt.f32.bf16 %r1505, %rs255; 2026-02-21T08:52:47.6744969Z cvt.f32.bf16 %r1506, %rs256; 2026-02-21T08:52:47.6745177Z .loc 1 67 45 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:67:45 2026-02-21T08:52:47.6745242Z add.s32 %r1561, %r24, %r1550; 2026-02-21T08:52:47.6745311Z ld.shared.b8 %rs257, [%r1561]; 2026-02-21T08:52:47.6745399Z ld.shared.b8 %rs258, [%r1561+512]; 2026-02-21T08:52:47.6745473Z ld.shared.b8 %rs259, [%r1561+1024]; 2026-02-21T08:52:47.6745542Z ld.shared.b8 %rs260, [%r1561+1536]; 2026-02-21T08:52:47.6754897Z ld.shared.b8 %rs261, [%r1561+2048]; 
2026-02-21T08:52:47.6755036Z ld.shared.b8 %rs262, [%r1561+2560]; 2026-02-21T08:52:47.6755120Z ld.shared.b8 %rs263, [%r1561+3072]; 2026-02-21T08:52:47.6755203Z ld.shared.b8 %rs264, [%r1561+3584]; 2026-02-21T08:52:47.6755281Z ld.shared.b8 %rs265, [%r1561+4096]; 2026-02-21T08:52:47.6755348Z ld.shared.b8 %rs266, [%r1561+4608]; 2026-02-21T08:52:47.6755664Z ld.shared.b8 %rs267, [%r1561+5120]; 2026-02-21T08:52:47.6755738Z ld.shared.b8 %rs268, [%r1561+5632]; 2026-02-21T08:52:47.6755806Z ld.shared.b8 %rs269, [%r1561+6144]; 2026-02-21T08:52:47.6755872Z ld.shared.b8 %rs270, [%r1561+6656]; 2026-02-21T08:52:47.6755940Z ld.shared.b8 %rs271, [%r1561+7168]; 2026-02-21T08:52:47.6756016Z ld.shared.b8 %rs272, [%r1561+7680]; 2026-02-21T08:52:47.6756256Z .loc 1 57 28 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:57:28 2026-02-21T08:52:47.6756331Z shl.b16 %rs273, %rs257, 4; 2026-02-21T08:52:47.6756405Z shl.b16 %rs274, %rs258, 4; 2026-02-21T08:52:47.6756630Z shl.b16 %rs275, %rs259, 4; 2026-02-21T08:52:47.6756700Z shl.b16 %rs276, %rs260, 4; 2026-02-21T08:52:47.6756769Z shl.b16 %rs277, %rs261, 4; 2026-02-21T08:52:47.6756969Z shl.b16 %rs278, %rs262, 4; 2026-02-21T08:52:47.6757036Z shl.b16 %rs279, %rs263, 4; 2026-02-21T08:52:47.6757100Z shl.b16 %rs280, %rs264, 4; 2026-02-21T08:52:47.6757167Z shl.b16 %rs281, %rs265, 4; 2026-02-21T08:52:47.6757234Z shl.b16 %rs282, %rs266, 4; 2026-02-21T08:52:47.6757298Z shl.b16 %rs283, %rs267, 4; 2026-02-21T08:52:47.6757364Z shl.b16 %rs284, %rs268, 4; 2026-02-21T08:52:47.6757439Z shl.b16 %rs285, %rs269, 4; 2026-02-21T08:52:47.6757501Z shl.b16 %rs286, %rs270, 4; 2026-02-21T08:52:47.6757562Z shl.b16 %rs287, %rs271, 4; 2026-02-21T08:52:47.6757631Z shl.b16 %rs288, %rs272, 4; 2026-02-21T08:52:47.6757861Z .loc 1 72 58 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:72:58 2026-02-21T08:52:47.6757947Z selp.b16 %rs289, %rs273, %rs257, %p57; 2026-02-21T08:52:47.6758021Z cvt.s16.s8 %rs290, %rs289; 2026-02-21T08:52:47.6758085Z shr.s16 %rs291, 
%rs290, 4; 2026-02-21T08:52:47.6758161Z selp.b16 %rs292, %rs274, %rs258, %p57; 2026-02-21T08:52:47.6758233Z cvt.s16.s8 %rs293, %rs292; 2026-02-21T08:52:47.6758297Z shr.s16 %rs294, %rs293, 4; 2026-02-21T08:52:47.6758380Z selp.b16 %rs295, %rs275, %rs259, %p57; 2026-02-21T08:52:47.6758447Z cvt.s16.s8 %rs296, %rs295; 2026-02-21T08:52:47.6758515Z shr.s16 %rs297, %rs296, 4; 2026-02-21T08:52:47.6758583Z selp.b16 %rs298, %rs276, %rs260, %p57; 2026-02-21T08:52:47.6758645Z cvt.s16.s8 %rs299, %rs298; 2026-02-21T08:52:47.6758713Z shr.s16 %rs300, %rs299, 4; 2026-02-21T08:52:47.6758782Z selp.b16 %rs301, %rs277, %rs261, %p57; 2026-02-21T08:52:47.6758846Z cvt.s16.s8 %rs302, %rs301; 2026-02-21T08:52:47.6758908Z shr.s16 %rs303, %rs302, 4; 2026-02-21T08:52:47.6758984Z selp.b16 %rs304, %rs278, %rs262, %p57; 2026-02-21T08:52:47.6759046Z cvt.s16.s8 %rs305, %rs304; 2026-02-21T08:52:47.6759108Z shr.s16 %rs306, %rs305, 4; 2026-02-21T08:52:47.6759183Z selp.b16 %rs307, %rs279, %rs263, %p57; 2026-02-21T08:52:47.6759245Z cvt.s16.s8 %rs308, %rs307; 2026-02-21T08:52:47.6759310Z shr.s16 %rs309, %rs308, 4; 2026-02-21T08:52:47.6759380Z selp.b16 %rs310, %rs280, %rs264, %p57; 2026-02-21T08:52:47.6759448Z cvt.s16.s8 %rs311, %rs310; 2026-02-21T08:52:47.6759510Z shr.s16 %rs312, %rs311, 4; 2026-02-21T08:52:47.6759581Z selp.b16 %rs313, %rs281, %rs265, %p57; 2026-02-21T08:52:47.6759650Z cvt.s16.s8 %rs314, %rs313; 2026-02-21T08:52:47.6759710Z shr.s16 %rs315, %rs314, 4; 2026-02-21T08:52:47.6759793Z selp.b16 %rs316, %rs282, %rs266, %p57; 2026-02-21T08:52:47.6759857Z cvt.s16.s8 %rs317, %rs316; 2026-02-21T08:52:47.6759929Z shr.s16 %rs318, %rs317, 4; 2026-02-21T08:52:47.6760000Z selp.b16 %rs319, %rs283, %rs267, %p57; 2026-02-21T08:52:47.6760061Z cvt.s16.s8 %rs320, %rs319; 2026-02-21T08:52:47.6760128Z shr.s16 %rs321, %rs320, 4; 2026-02-21T08:52:47.6760205Z selp.b16 %rs322, %rs284, %rs268, %p57; 2026-02-21T08:52:47.6760268Z cvt.s16.s8 %rs323, %rs322; 2026-02-21T08:52:47.6760330Z shr.s16 %rs324, %rs323, 4; 
2026-02-21T08:52:47.6760404Z selp.b16 %rs325, %rs285, %rs269, %p57; 2026-02-21T08:52:47.6760469Z cvt.s16.s8 %rs326, %rs325; 2026-02-21T08:52:47.6760531Z shr.s16 %rs327, %rs326, 4; 2026-02-21T08:52:47.6760600Z selp.b16 %rs328, %rs286, %rs270, %p57; 2026-02-21T08:52:47.6760806Z cvt.s16.s8 %rs329, %rs328; 2026-02-21T08:52:47.6760870Z shr.s16 %rs330, %rs329, 4; 2026-02-21T08:52:47.6760939Z selp.b16 %rs331, %rs287, %rs271, %p57; 2026-02-21T08:52:47.6761007Z cvt.s16.s8 %rs332, %rs331; 2026-02-21T08:52:47.6761069Z shr.s16 %rs333, %rs332, 4; 2026-02-21T08:52:47.6761138Z selp.b16 %rs334, %rs288, %rs272, %p57; 2026-02-21T08:52:47.6761201Z cvt.s16.s8 %rs335, %rs334; 2026-02-21T08:52:47.6761269Z shr.s16 %rs336, %rs335, 4; 2026-02-21T08:52:47.6761490Z .loc 1 77 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:77:32 2026-02-21T08:52:47.6761562Z cvt.rn.f32.s16 %r1562, %rs291; 2026-02-21T08:52:47.6761634Z cvt.rn.f32.s16 %r1563, %rs294; 2026-02-21T08:52:47.6761698Z cvt.rn.f32.s16 %r1564, %rs297; 2026-02-21T08:52:47.6761855Z cvt.rn.f32.s16 %r1565, %rs300; 2026-02-21T08:52:47.6761926Z cvt.rn.f32.s16 %r1566, %rs303; 2026-02-21T08:52:47.6761991Z cvt.rn.f32.s16 %r1567, %rs306; 2026-02-21T08:52:47.6762054Z cvt.rn.f32.s16 %r1568, %rs309; 2026-02-21T08:52:47.6762121Z cvt.rn.f32.s16 %r1569, %rs312; 2026-02-21T08:52:47.6762192Z cvt.rn.f32.s16 %r1570, %rs315; 2026-02-21T08:52:47.6762254Z cvt.rn.f32.s16 %r1571, %rs318; 2026-02-21T08:52:47.6762317Z cvt.rn.f32.s16 %r1572, %rs321; 2026-02-21T08:52:47.6762386Z cvt.rn.f32.s16 %r1573, %rs324; 2026-02-21T08:52:47.6762450Z cvt.rn.f32.s16 %r1574, %rs327; 2026-02-21T08:52:47.6762513Z cvt.rn.f32.s16 %r1575, %rs330; 2026-02-21T08:52:47.6762575Z cvt.rn.f32.s16 %r1576, %rs333; 2026-02-21T08:52:47.6762643Z cvt.rn.f32.s16 %r1577, %rs336; 2026-02-21T08:52:47.6762712Z st.shared.b32 [%r25], %r1562; 2026-02-21T08:52:47.6762783Z st.shared.b32 [%r25+32768], %r1570; 2026-02-21T08:52:47.6762854Z st.shared.b32 [%r26], %r1563; 2026-02-21T08:52:47.6762923Z 
st.shared.b32 [%r26+32768], %r1571; 2026-02-21T08:52:47.6762990Z st.shared.b32 [%r27], %r1564; 2026-02-21T08:52:47.6763058Z st.shared.b32 [%r27+32768], %r1572; 2026-02-21T08:52:47.6763139Z st.shared.b32 [%r28], %r1565; 2026-02-21T08:52:47.6763210Z st.shared.b32 [%r28+32768], %r1573; 2026-02-21T08:52:47.6763275Z st.shared.b32 [%r29], %r1566; 2026-02-21T08:52:47.6763348Z st.shared.b32 [%r29+32768], %r1574; 2026-02-21T08:52:47.6763411Z st.shared.b32 [%r30], %r1567; 2026-02-21T08:52:47.6763477Z st.shared.b32 [%r30+32768], %r1575; 2026-02-21T08:52:47.6763546Z st.shared.b32 [%r31], %r1568; 2026-02-21T08:52:47.6763611Z st.shared.b32 [%r31+32768], %r1576; 2026-02-21T08:52:47.6763675Z st.shared.b32 [%r32], %r1569; 2026-02-21T08:52:47.6763739Z st.shared.b32 [%r32+32768], %r1577; 2026-02-21T08:52:47.6763802Z $L__tmp5: 2026-02-21T08:52:47.6764094Z .loc 2 291 36 // standard.py:291:36 @[ cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:84:40 ] 2026-02-21T08:52:47.6764159Z // begin inline asm 2026-02-21T08:52:47.6764269Z fence.proxy.async.shared::cta; 2026-02-21T08:52:47.6764332Z // end inline asm 2026-02-21T08:52:47.6764390Z bar.sync 0; 2026-02-21T08:52:47.6764477Z shfl.sync.idx.b32 %r1578, %r4, 0, 31, -1; 2026-02-21T08:52:47.6764563Z wgmma.fence.sync.aligned; 2026-02-21T08:52:47.6764629Z shl.b32 %r1579, %r1578, 10; 2026-02-21T08:52:47.6764695Z and.b32 %r1580, %r1579, 28672; 2026-02-21T08:52:47.6764767Z add.s32 %r1581, %r1580, %r2128; 2026-02-21T08:52:47.6764831Z bfe.u32 %r1582, %r1581, 4, 14; 2026-02-21T08:52:47.6764899Z cvt.u64.u32 %rd109, %r1582; 2026-02-21T08:52:47.6764984Z or.b64 %rd99, %rd109, 4611686293439512576; 2026-02-21T08:52:47.6765052Z mov.pred %p28, -1; 2026-02-21T08:52:47.6765115Z // begin inline asm 2026-02-21T08:52:47.6765634Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2187,%r2188,%r2189,%r2190,%r2191,%r2192,%r2193,%r2194,%r2195,%r2196,%r2197,%r2198,%r2199,%r2200,%r2201,%r2202}, {%r1251,%r1252,%r1253,%r1254}, %rd99, %p28, 1, 1; 
2026-02-21T08:52:47.6765704Z // end inline asm 2026-02-21T08:52:47.6765771Z add.s32 %r1583, %r1581, 32; 2026-02-21T08:52:47.6765835Z bfe.u32 %r1584, %r1583, 4, 14; 2026-02-21T08:52:47.6765906Z cvt.u64.u32 %rd110, %r1584; 2026-02-21T08:52:47.6766088Z or.b64 %rd100, %rd110, 4611686293439512576; 2026-02-21T08:52:47.6766151Z // begin inline asm 2026-02-21T08:52:47.6766780Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2187,%r2188,%r2189,%r2190,%r2191,%r2192,%r2193,%r2194,%r2195,%r2196,%r2197,%r2198,%r2199,%r2200,%r2201,%r2202}, {%r1287,%r1288,%r1289,%r1290}, %rd100, %p28, 1, 1; 2026-02-21T08:52:47.6766859Z // end inline asm 2026-02-21T08:52:47.6766926Z add.s32 %r1585, %r1581, 64; 2026-02-21T08:52:47.6766989Z bfe.u32 %r1586, %r1585, 4, 14; 2026-02-21T08:52:47.6767063Z cvt.u64.u32 %rd111, %r1586; 2026-02-21T08:52:47.6767141Z or.b64 %rd101, %rd111, 4611686293439512576; 2026-02-21T08:52:47.6767202Z // begin inline asm 2026-02-21T08:52:47.6767849Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2187,%r2188,%r2189,%r2190,%r2191,%r2192,%r2193,%r2194,%r2195,%r2196,%r2197,%r2198,%r2199,%r2200,%r2201,%r2202}, {%r1323,%r1324,%r1325,%r1326}, %rd101, %p28, 1, 1; 2026-02-21T08:52:47.6767915Z // end inline asm 2026-02-21T08:52:47.6767986Z add.s32 %r1587, %r1581, 96; 2026-02-21T08:52:47.6768060Z bfe.u32 %r1588, %r1587, 4, 14; 2026-02-21T08:52:47.6768128Z cvt.u64.u32 %rd112, %r1588; 2026-02-21T08:52:47.6768200Z or.b64 %rd102, %rd112, 4611686293439512576; 2026-02-21T08:52:47.6768263Z // begin inline asm 2026-02-21T08:52:47.6768778Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2187,%r2188,%r2189,%r2190,%r2191,%r2192,%r2193,%r2194,%r2195,%r2196,%r2197,%r2198,%r2199,%r2200,%r2201,%r2202}, {%r1359,%r1360,%r1361,%r1362}, %rd102, %p28, 1, 1; 2026-02-21T08:52:47.6768841Z // end inline asm 2026-02-21T08:52:47.6768906Z add.s32 %r1589, %r1581, 32768; 2026-02-21T08:52:47.6768979Z bfe.u32 %r1590, %r1589, 4, 14; 2026-02-21T08:52:47.6769050Z cvt.u64.u32 %rd113, %r1590; 
2026-02-21T08:52:47.6769129Z or.b64 %rd103, %rd113, 4611686293439512576; 2026-02-21T08:52:47.6769201Z // begin inline asm 2026-02-21T08:52:47.6769702Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2187,%r2188,%r2189,%r2190,%r2191,%r2192,%r2193,%r2194,%r2195,%r2196,%r2197,%r2198,%r2199,%r2200,%r2201,%r2202}, {%r1395,%r1396,%r1397,%r1398}, %rd103, %p28, 1, 1; 2026-02-21T08:52:47.6769768Z // end inline asm 2026-02-21T08:52:47.6769852Z add.s32 %r1591, %r1581, 32800; 2026-02-21T08:52:47.6769919Z bfe.u32 %r1592, %r1591, 4, 14; 2026-02-21T08:52:47.6769985Z cvt.u64.u32 %rd114, %r1592; 2026-02-21T08:52:47.6770058Z or.b64 %rd104, %rd114, 4611686293439512576; 2026-02-21T08:52:47.6770131Z // begin inline asm 2026-02-21T08:52:47.6770627Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2187,%r2188,%r2189,%r2190,%r2191,%r2192,%r2193,%r2194,%r2195,%r2196,%r2197,%r2198,%r2199,%r2200,%r2201,%r2202}, {%r1431,%r1432,%r1433,%r1434}, %rd104, %p28, 1, 1; 2026-02-21T08:52:47.6770687Z // end inline asm 2026-02-21T08:52:47.6770759Z add.s32 %r1593, %r1581, 32832; 2026-02-21T08:52:47.6770821Z bfe.u32 %r1594, %r1593, 4, 14; 2026-02-21T08:52:47.6770885Z cvt.u64.u32 %rd115, %r1594; 2026-02-21T08:52:47.6770965Z or.b64 %rd105, %rd115, 4611686293439512576; 2026-02-21T08:52:47.6771031Z // begin inline asm 2026-02-21T08:52:47.6771525Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2187,%r2188,%r2189,%r2190,%r2191,%r2192,%r2193,%r2194,%r2195,%r2196,%r2197,%r2198,%r2199,%r2200,%r2201,%r2202}, {%r1467,%r1468,%r1469,%r1470}, %rd105, %p28, 1, 1; 2026-02-21T08:52:47.6771583Z // end inline asm 2026-02-21T08:52:47.6771655Z add.s32 %r1595, %r1581, 32864; 2026-02-21T08:52:47.6771715Z bfe.u32 %r1596, %r1595, 4, 14; 2026-02-21T08:52:47.6771777Z cvt.u64.u32 %rd116, %r1596; 2026-02-21T08:52:47.6771856Z or.b64 %rd106, %rd116, 4611686293439512576; 2026-02-21T08:52:47.6771917Z // begin inline asm 2026-02-21T08:52:47.6772409Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r2187,%r2188,%r2189,%r2190,%r2191,%r2192,%r2193,%r2194,%r2195,%r2196,%r2197,%r2198,%r2199,%r2200,%r2201,%r2202}, {%r1503,%r1504,%r1505,%r1506}, %rd106, %p28, 1, 1; 2026-02-21T08:52:47.6772474Z // end inline asm 2026-02-21T08:52:47.6772558Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:47.6772750Z mov.b32 %r1525, 0; 2026-02-21T08:52:47.6772816Z mov.b32 %r1523, %r2128; 2026-02-21T08:52:47.6772882Z mov.b32 %r1524, %r1525; 2026-02-21T08:52:47.6772941Z // begin inline asm 2026-02-21T08:52:47.6773252Z // wait for regs: %r2187,%r2188,%r2189,%r2190,%r2191,%r2192,%r2193,%r2194,%r2195,%r2196,%r2197,%r2198,%r2199,%r2200,%r2201,%r2202,%r1523,%r1524,%r1525 2026-02-21T08:52:47.6773334Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:47.6773392Z // end inline asm 2026-02-21T08:52:47.6773448Z $L__tmp6: 2026-02-21T08:52:47.6773678Z .loc 1 40 103 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:40:103 2026-02-21T08:52:47.6773744Z add.s32 %r1597, %r2186, 1; 2026-02-21T08:52:47.6773815Z setp.gt.s32 %p39, %r1597, 1; 2026-02-21T08:52:47.6774492Z selp.b32 %r2186, 0, %r1597, %p39; 2026-02-21T08:52:47.6774717Z .loc 1 48 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:32 2026-02-21T08:52:47.6774795Z mad.wide.s32 %rd107, %r2183, 2, %rd15; 2026-02-21T08:52:47.6775013Z .loc 1 48 80 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:80 2026-02-21T08:52:47.6775088Z shl.b32 %r1598, %r2186, 13; 2026-02-21T08:52:47.6775154Z add.s32 %r1545, %r11, %r1598; 2026-02-21T08:52:47.6775221Z selp.b32 %r1546, 8, 0, %p37; 2026-02-21T08:52:47.6775291Z // begin inline asm 2026-02-21T08:52:47.6775442Z cp.async.ca.shared.global [ %r1545 + 0 ], [ %rd107 + 0 ], 0x8, %r1546; 2026-02-21T08:52:47.6775501Z // end inline asm 2026-02-21T08:52:47.6775573Z cp.async.commit_group; 2026-02-21T08:52:47.6775780Z .loc 1 54 34 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:34 2026-02-21T08:52:47.6775845Z cvt.s64.s32 %rd117, %r2184; 2026-02-21T08:52:47.6775914Z add.s64 
%rd108, %rd16, %rd117; 2026-02-21T08:52:47.6776116Z .loc 1 54 87 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:87 2026-02-21T08:52:47.6776184Z add.s32 %r1547, %r13, %r1598; 2026-02-21T08:52:47.6776247Z // begin inline asm 2026-02-21T08:52:47.6776391Z cp.async.ca.shared.global [ %r1547 + 0 ], [ %rd108 + 0 ], 0x8, %r1546; 2026-02-21T08:52:47.6776569Z // end inline asm 2026-02-21T08:52:47.6776641Z cp.async.commit_group; 2026-02-21T08:52:47.6776851Z .loc 1 40 103 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:40:103 2026-02-21T08:52:47.6776921Z add.s32 %r2184, %r2184, 229376; 2026-02-21T08:52:47.6776984Z add.s32 %r2183, %r2183, 64; 2026-02-21T08:52:47.6777050Z setp.lt.u64 %p40, %rd153, 4064; 2026-02-21T08:52:47.6777118Z @%p40 bra $L__BB0_7; 2026-02-21T08:52:47.6777230Z // %bb.8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:47.6777436Z .loc 1 31 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:31:32 2026-02-21T08:52:47.6777507Z or.b32 %r1617, %r154, %r4; 2026-02-21T08:52:47.6777569Z or.b32 %r1618, %r154, %r6; 2026-02-21T08:52:47.6777779Z .loc 1 40 103 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:40:103 2026-02-21T08:52:47.6777853Z cp.async.wait_group 0; 2026-02-21T08:52:47.6777918Z bar.sync 0; 2026-02-21T08:52:47.6778118Z .loc 1 87 28 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:87:28 2026-02-21T08:52:47.6778204Z cvt.rn.bf16x2.f32 %r1619, %r2188, %r2187; 2026-02-21T08:52:47.6778288Z cvt.rn.bf16x2.f32 %r1620, %r2190, %r2189; 2026-02-21T08:52:47.6778362Z cvt.rn.bf16x2.f32 %r1621, %r2192, %r2191; 2026-02-21T08:52:47.6778432Z cvt.rn.bf16x2.f32 %r1622, %r2194, %r2193; 2026-02-21T08:52:47.6778509Z cvt.rn.bf16x2.f32 %r1623, %r2196, %r2195; 2026-02-21T08:52:47.6778580Z cvt.rn.bf16x2.f32 %r1624, %r2198, %r2197; 2026-02-21T08:52:47.6778653Z cvt.rn.bf16x2.f32 %r1625, %r2200, %r2199; 2026-02-21T08:52:47.6778724Z cvt.rn.bf16x2.f32 %r1626, %r2202, %r2201; 2026-02-21T08:52:47.6778934Z .loc 1 88 50 // 
cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:88:50 2026-02-21T08:52:47.6779166Z mad.lo.s32 %r1627, %r1617, 7168, %r155; 2026-02-21T08:52:47.6779237Z mad.lo.s32 %r1628, %r1618, 7168, %r155; 2026-02-21T08:52:47.6779448Z .loc 1 88 22 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:88:22 2026-02-21T08:52:47.6779519Z mad.wide.s32 %rd118, %r1627, 2, %rd17; 2026-02-21T08:52:47.6779589Z mad.wide.s32 %rd119, %r1628, 2, %rd17; 2026-02-21T08:52:47.6779807Z .loc 1 88 81 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:88:81 2026-02-21T08:52:47.6779928Z st.shared.v4.b32 [%r33], {%r1619, %r1621, %r1623, %r1625}; 2026-02-21T08:52:47.6780046Z st.shared.v4.b32 [%r33+512], {%r1620, %r1622, %r1624, %r1626}; 2026-02-21T08:52:47.6780105Z bar.sync 0; 2026-02-21T08:52:47.6780296Z // begin inline asm 2026-02-21T08:52:47.6780491Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1599, %r1600, %r1601, %r1602}, [%r719]; 2026-02-21T08:52:47.6780550Z // end inline asm 2026-02-21T08:52:47.6780623Z // begin inline asm 2026-02-21T08:52:47.6780807Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1604, %r1605, %r1606, %r1607}, [%r724]; 2026-02-21T08:52:47.6780871Z // end inline asm 2026-02-21T08:52:47.6780936Z // begin inline asm 2026-02-21T08:52:47.6781063Z st.global.v4.b32 [ %rd118 + 0 ], { %r1599, %r1600, %r1601, %r1602 }; 2026-02-21T08:52:47.6781121Z // end inline asm 2026-02-21T08:52:47.6781184Z // begin inline asm 2026-02-21T08:52:47.6781308Z st.global.v4.b32 [ %rd119 + 0 ], { %r1604, %r1605, %r1606, %r1607 }; 2026-02-21T08:52:47.6781370Z // end inline asm 2026-02-21T08:52:47.6781582Z .loc 1 19 144 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:19:144 2026-02-21T08:52:47.6781655Z add.s32 %r2142, %r2142, 12672; 2026-02-21T08:52:47.6781730Z setp.lt.s32 %p41, %r2142, %r2203; 2026-02-21T08:52:47.6781794Z @%p41 bra $L__BB0_2; 2026-02-21T08:52:47.6781892Z $L__BB0_9: // %.preheader 2026-02-21T08:52:47.6781962Z setp.gt.s32 %p42, %r2203, 27; 
2026-02-21T08:52:47.6782029Z @%p42 bra $L__BB0_14; 2026-02-21T08:52:47.6782119Z // %bb.10: // %.lr.ph114 2026-02-21T08:52:47.6782336Z .loc 1 0 144 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:0:144 2026-02-21T08:52:47.6782401Z and.b32 %r1631, %r2127, 56; 2026-02-21T08:52:47.6782469Z xor.b32 %r1632, %r1631, %r2126; 2026-02-21T08:52:47.6782545Z add.s32 %r1634, %r2128, %r1632; 2026-02-21T08:52:47.6782610Z add.s32 %r38, %r1634, 65536; 2026-02-21T08:52:47.6782687Z add.s32 %r1635, %r2128, 81920; 2026-02-21T08:52:47.6782754Z add.s32 %r40, %r1635, %r2126; 2026-02-21T08:52:47.6782825Z add.s32 %r1687, %r1634, 73728; 2026-02-21T08:52:47.6782894Z add.s32 %r1636, %r2128, %r2126; 2026-02-21T08:52:47.6782963Z add.s32 %r1689, %r1636, 90112; 2026-02-21T08:52:47.6783038Z and.b32 %r1638, %r2130, 6144; 2026-02-21T08:52:47.6783103Z and.b32 %r1640, %r2131, 896; 2026-02-21T08:52:47.6783167Z or.b32 %r1642, %r1638, %r1640; 2026-02-21T08:52:47.6783241Z or.b32 %r43, %r1642, %r2132; 2026-02-21T08:52:47.6783303Z xor.b32 %r44, %r43, 8; 2026-02-21T08:52:47.6783366Z xor.b32 %r45, %r43, 16; 2026-02-21T08:52:47.6783427Z xor.b32 %r46, %r43, 24; 2026-02-21T08:52:47.6783494Z xor.b32 %r47, %r43, 32; 2026-02-21T08:52:47.6783555Z xor.b32 %r48, %r43, 40; 2026-02-21T08:52:47.6783615Z xor.b32 %r49, %r43, 48; 2026-02-21T08:52:47.6783680Z xor.b32 %r50, %r43, 56; 2026-02-21T08:52:47.6783742Z and.b32 %r1644, %r2127, 256; 2026-02-21T08:52:47.6783809Z add.s32 %r1645, %r1635, %r1644; 2026-02-21T08:52:47.6783873Z add.s32 %r51, %r1645, %r2133; 2026-02-21T08:52:47.6783942Z shl.b32 %r1646, %r2133, 7; 2026-02-21T08:52:47.6784004Z and.b32 %r1648, %r2134, 112; 2026-02-21T08:52:47.6784070Z and.b32 %r1650, %r2135, 12; 2026-02-21T08:52:47.6784143Z or.b32 %r1651, %r1646, %r1650; 2026-02-21T08:52:47.6784204Z or.b32 %r1652, %r1651, %r1648; 2026-02-21T08:52:47.6784266Z add.s32 %r52, %r2128, %r1652; 2026-02-21T08:52:47.6784446Z xor.b32 %r1653, %r1652, 16; 2026-02-21T08:52:47.6784514Z add.s32 %r53, %r2128, %r1653; 
2026-02-21T08:52:47.6784576Z xor.b32 %r1654, %r1652, 32; 2026-02-21T08:52:47.6784637Z add.s32 %r54, %r2128, %r1654; 2026-02-21T08:52:47.6784704Z xor.b32 %r1655, %r1652, 48; 2026-02-21T08:52:47.6784765Z add.s32 %r55, %r2128, %r1655; 2026-02-21T08:52:47.6784826Z xor.b32 %r1656, %r1652, 64; 2026-02-21T08:52:47.6784887Z add.s32 %r56, %r2128, %r1656; 2026-02-21T08:52:47.6784954Z xor.b32 %r1657, %r1652, 80; 2026-02-21T08:52:47.6785017Z add.s32 %r57, %r2128, %r1657; 2026-02-21T08:52:47.6785076Z xor.b32 %r1658, %r1652, 96; 2026-02-21T08:52:47.6785143Z add.s32 %r58, %r2128, %r1658; 2026-02-21T08:52:47.6785207Z xor.b32 %r1659, %r1652, 112; 2026-02-21T08:52:47.6785269Z add.s32 %r59, %r2128, %r1659; 2026-02-21T08:52:47.6785426Z and.b32 %r1661, %r2136, 24576; 2026-02-21T08:52:47.6785491Z and.b32 %r1662, %r2131, 3168; 2026-02-21T08:52:47.6785554Z shl.b32 %r1664, %r2137, 4; 2026-02-21T08:52:47.6785619Z and.b32 %r1666, %r2138, 112; 2026-02-21T08:52:47.6785689Z and.b32 %r1668, %r2139, 4112; 2026-02-21T08:52:47.6785750Z or.b32 %r1669, %r1662, %r1664; 2026-02-21T08:52:47.6785813Z xor.b32 %r1670, %r1668, %r1666; 2026-02-21T08:52:47.6785882Z xor.b32 %r1671, %r1670, %r1669; 2026-02-21T08:52:47.6785944Z add.s32 %r1672, %r2128, %r1661; 2026-02-21T08:52:47.6786018Z add.s32 %r60, %r1672, %r1671; 2026-02-21T08:52:47.6786081Z shl.b32 %r1673, %r2137, 10; 2026-02-21T08:52:47.6786151Z shl.b32 %r1674, %r2137, 2; 2026-02-21T08:52:47.6786212Z and.b32 %r1676, %r2140, 1920; 2026-02-21T08:52:47.6786273Z and.b32 %r1678, %r2141, 4112; 2026-02-21T08:52:47.6786343Z or.b32 %r1679, %r1673, %r1648; 2026-02-21T08:52:47.6786405Z or.b32 %r1680, %r1674, %r1676; 2026-02-21T08:52:47.6786586Z xor.b32 %r1681, %r1679, %r1680; 2026-02-21T08:52:47.6786653Z xor.b32 %r1682, %r1681, %r1678; 2026-02-21T08:52:47.6786725Z add.s32 %r2100, %r2128, %r1682; 2026-02-21T08:52:47.6786788Z add.s32 %r2105, %r2100, 2048; 2026-02-21T08:52:47.6787005Z .loc 1 40 103 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:40:103 
2026-02-21T08:52:47.6787073Z or.b32 %r63, %r7, 128; 2026-02-21T08:52:47.6787277Z .loc 1 19 144 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:19:144 2026-02-21T08:52:47.6787350Z mad.wide.u32 %rd120, %r4, 7168, %rd16; 2026-02-21T08:52:47.6787420Z add.s64 %rd1, %rd120, 458752; 2026-02-21T08:52:47.6787483Z cvt.u64.u32 %rd128, %r2129; 2026-02-21T08:52:47.6787599Z $L__BB0_11: // =>This Loop Header: Depth=1 2026-02-21T08:52:47.6787699Z // Child Loop BB0_12 Depth 2 2026-02-21T08:52:47.6787910Z .loc 1 25 35 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:25:35 2026-02-21T08:52:47.6787984Z mul.hi.s32 %r1694, %r2203, -1840700269; 2026-02-21T08:52:47.6788049Z add.s32 %r1695, %r1694, %r2203; 2026-02-21T08:52:47.6788128Z shr.u32 %r1696, %r1695, 31; 2026-02-21T08:52:47.6788196Z shr.s32 %r1697, %r1695, 5; 2026-02-21T08:52:47.6788258Z add.s32 %r1698, %r1697, %r1696; 2026-02-21T08:52:47.6788463Z .loc 1 26 33 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:26:33 2026-02-21T08:52:47.6788590Z shl.b32 %r1699, %r1698, 1; 2026-02-21T08:52:47.6788796Z .loc 1 27 39 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:27:39 2026-02-21T08:52:47.6788860Z sub.s32 %r1700, 1, %r1699; 2026-02-21T08:52:47.6789061Z .loc 1 27 52 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:27:52 2026-02-21T08:52:47.6789123Z min.u32 %r1701, %r1700, 2; 2026-02-21T08:52:47.6789320Z .loc 1 28 45 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:28:45 2026-02-21T08:52:47.6789395Z mul.lo.s32 %r1702, %r1698, 56; 2026-02-21T08:52:47.6789457Z sub.s32 %r1703, %r2203, %r1702; 2026-02-21T08:52:47.6789655Z .loc 1 28 64 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:28:64 2026-02-21T08:52:47.6789869Z cvt.u16.u32 %rs337, %r1703; 2026-02-21T08:52:47.6789931Z cvt.s8.s32 %rs338, %r1703; 2026-02-21T08:52:47.6789995Z cvt.u16.u32 %rs339, %r1701; 2026-02-21T08:52:47.6790199Z .loc 1 29 51 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:29:51 
2026-02-21T08:52:47.6790266Z div.s16 %rs340, %rs338, %rs339; 2026-02-21T08:52:47.6790462Z .loc 1 28 64 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:28:64 2026-02-21T08:52:47.6790531Z mul.lo.s16 %rs341, %rs340, %rs339; 2026-02-21T08:52:47.6790603Z sub.s16 %rs342, %rs337, %rs341; 2026-02-21T08:52:47.6790664Z cvt.u32.u16 %r1704, %rs342; 2026-02-21T08:52:47.6790728Z cvt.s32.s8 %r1705, %r1704; 2026-02-21T08:52:47.6791046Z .loc 1 28 30 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:28:30 2026-02-21T08:52:47.6791113Z add.s32 %r1706, %r1699, %r1705; 2026-02-21T08:52:47.6791312Z .loc 1 30 27 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:30:27 2026-02-21T08:52:47.6791382Z shl.b32 %r200, %r1706, 6; 2026-02-21T08:52:47.6791578Z .loc 1 31 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:31:32 2026-02-21T08:52:47.6791640Z or.b32 %r1707, %r200, %r5; 2026-02-21T08:52:47.6791840Z .loc 1 32 27 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:32:27 2026-02-21T08:52:47.6791903Z cvt.s16.s8 %rs343, %rs340; 2026-02-21T08:52:47.6791970Z mul.wide.s16 %r1708, %rs343, 256; 2026-02-21T08:52:47.6792168Z .loc 1 33 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:33:32 2026-02-21T08:52:47.6792230Z or.b32 %r201, %r1708, %r9; 2026-02-21T08:52:47.6792427Z .loc 1 48 53 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:53 2026-02-21T08:52:47.6792489Z shl.b32 %r1709, %r1707, 13; 2026-02-21T08:52:47.6792693Z .loc 1 48 60 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:60 2026-02-21T08:52:47.6792755Z or.b32 %r1710, %r1709, %r7; 2026-02-21T08:52:47.6792949Z .loc 1 48 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:32 2026-02-21T08:52:47.6793026Z mad.wide.s32 %rd121, %r1710, 2, %rd15; 2026-02-21T08:52:47.6793085Z mov.b32 %r1684, 8; 2026-02-21T08:52:47.6793291Z .loc 1 48 80 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:80 2026-02-21T08:52:47.6793358Z // begin inline asm 
2026-02-21T08:52:47.6793498Z cp.async.ca.shared.global [ %r38 + 0 ], [ %rd121 + 0 ], 0x8, %r1684; 2026-02-21T08:52:47.6793556Z // end inline asm 2026-02-21T08:52:47.6793627Z cp.async.commit_group; 2026-02-21T08:52:47.6793834Z .loc 1 54 62 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:62 2026-02-21T08:52:47.6793897Z add.s32 %r1711, %r201, %r2129; 2026-02-21T08:52:47.6794095Z .loc 1 54 34 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:34 2026-02-21T08:52:47.6794162Z cvt.s64.s32 %rd126, %r1711; 2026-02-21T08:52:47.6794225Z add.s64 %rd122, %rd16, %rd126; 2026-02-21T08:52:47.6794418Z .loc 1 54 87 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:87 2026-02-21T08:52:47.6794485Z // begin inline asm 2026-02-21T08:52:47.6794617Z cp.async.ca.shared.global [ %r40 + 0 ], [ %rd122 + 0 ], 0x8, %r1684; 2026-02-21T08:52:47.6794675Z // end inline asm 2026-02-21T08:52:47.6794744Z cp.async.commit_group; 2026-02-21T08:52:47.6794945Z .loc 1 48 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:32 2026-02-21T08:52:47.6795010Z add.s64 %rd123, %rd121, 128; 2026-02-21T08:52:47.6795207Z .loc 1 48 80 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:80 2026-02-21T08:52:47.6795270Z bar.sync 0; 2026-02-21T08:52:47.6795330Z // begin inline asm 2026-02-21T08:52:47.6795582Z cp.async.ca.shared.global [ %r1687 + 0 ], [ %rd123 + 0 ], 0x8, %r1684; 2026-02-21T08:52:47.6795644Z // end inline asm 2026-02-21T08:52:47.6795711Z cp.async.commit_group; 2026-02-21T08:52:47.6795906Z .loc 1 54 34 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:34 2026-02-21T08:52:47.6795969Z cvt.s64.s32 %rd127, %r201; 2026-02-21T08:52:47.6796038Z add.s64 %rd129, %rd128, %rd127; 2026-02-21T08:52:47.6796103Z add.s64 %rd130, %rd16, %rd129; 2026-02-21T08:52:47.6796169Z add.s64 %rd124, %rd130, 229376; 2026-02-21T08:52:47.6796374Z .loc 1 54 87 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:87 2026-02-21T08:52:47.6796436Z // begin inline 
asm 2026-02-21T08:52:47.6796838Z cp.async.ca.shared.global [ %r1689 + 0 ], [ %rd124 + 0 ], 0x8, %r1684; 2026-02-21T08:52:47.6796924Z // end inline asm 2026-02-21T08:52:47.6796994Z cp.async.commit_group; 2026-02-21T08:52:47.6797201Z .loc 1 40 103 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:40:103 2026-02-21T08:52:47.6797268Z shl.b32 %r1712, %r1698, 7; 2026-02-21T08:52:47.6797335Z or.b32 %r1713, %r5, %r1712; 2026-02-21T08:52:47.6797397Z cvt.s16.s8 %rs344, %rs342; 2026-02-21T08:52:47.6797473Z mad.wide.s16 %r1714, %rs344, 64, %r1713; 2026-02-21T08:52:47.6797540Z shl.b32 %r1715, %r1714, 13; 2026-02-21T08:52:47.6797601Z or.b32 %r2204, %r63, %r1715; 2026-02-21T08:52:47.6797665Z add.s64 %rd154, %rd1, %rd127; 2026-02-21T08:52:47.6797726Z mov.b32 %r2207, 0f00000000; 2026-02-21T08:52:47.6797791Z mov.b32 %r2206, 1; 2026-02-21T08:52:47.6797852Z mov.b32 %r2205, -1; 2026-02-21T08:52:47.6797916Z mov.b64 %rd155, -32; 2026-02-21T08:52:47.6797984Z mov.b32 %r2208, %r2207; 2026-02-21T08:52:47.6798044Z mov.b32 %r2209, %r2207; 2026-02-21T08:52:47.6798107Z mov.b32 %r2210, %r2207; 2026-02-21T08:52:47.6798172Z mov.b32 %r2211, %r2207; 2026-02-21T08:52:47.6798232Z mov.b32 %r2212, %r2207; 2026-02-21T08:52:47.6798291Z mov.b32 %r2213, %r2207; 2026-02-21T08:52:47.6798352Z mov.b32 %r2214, %r2207; 2026-02-21T08:52:47.6798418Z mov.b32 %r2215, %r2207; 2026-02-21T08:52:47.6798479Z mov.b32 %r2216, %r2207; 2026-02-21T08:52:47.6798540Z mov.b32 %r2217, %r2207; 2026-02-21T08:52:47.6798608Z mov.b32 %r2218, %r2207; 2026-02-21T08:52:47.6798668Z mov.b32 %r2219, %r2207; 2026-02-21T08:52:47.6798726Z mov.b32 %r2220, %r2207; 2026-02-21T08:52:47.6798787Z mov.b32 %r2221, %r2207; 2026-02-21T08:52:47.6798853Z mov.b32 %r2222, %r2207; 2026-02-21T08:52:47.6798964Z $L__BB0_12: // Parent Loop BB0_11 Depth=1 2026-02-21T08:52:47.6799072Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:47.6799141Z add.s64 %rd155, %rd155, 32; 2026-02-21T08:52:47.6799212Z setp.lt.u64 %p52, %rd155, 4032; 
2026-02-21T08:52:47.6799275Z add.s32 %r2046, %r2205, 1; 2026-02-21T08:52:47.6799343Z setp.gt.s32 %p53, %r2046, 1; 2026-02-21T08:52:47.6799418Z selp.b32 %r2205, 0, %r2046, %p53; 2026-02-21T08:52:47.6799622Z .loc 1 48 80 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:80 2026-02-21T08:52:47.6799691Z cp.async.wait_group 2; 2026-02-21T08:52:47.6799767Z bar.sync 0; 2026-02-21T08:52:47.6799831Z shl.b32 %r2047, %r2205, 13; 2026-02-21T08:52:47.6799895Z add.s32 %r2048, %r2128, %r2047; 2026-02-21T08:52:47.6799962Z add.s32 %r2049, %r2048, 65536; 2026-02-21T08:52:47.6800160Z .loc 1 52 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:52:32 2026-02-21T08:52:47.6800226Z add.s32 %r2050, %r2049, %r43; 2026-02-21T08:52:47.6800297Z ld.shared.b16 %rs345, [%r2050]; 2026-02-21T08:52:47.6800375Z ld.shared.b16 %rs346, [%r2050+1024]; 2026-02-21T08:52:47.6800444Z ld.shared.b16 %rs347, [%r2050+64]; 2026-02-21T08:52:47.6800516Z ld.shared.b16 %rs348, [%r2050+1088]; 2026-02-21T08:52:47.6800583Z add.s32 %r2051, %r2049, %r44; 2026-02-21T08:52:47.6800650Z ld.shared.b16 %rs349, [%r2051]; 2026-02-21T08:52:47.6800855Z ld.shared.b16 %rs350, [%r2051+1024]; 2026-02-21T08:52:47.6800923Z ld.shared.b16 %rs351, [%r2051+64]; 2026-02-21T08:52:47.6801008Z ld.shared.b16 %rs352, [%r2051+1088]; 2026-02-21T08:52:47.6801071Z add.s32 %r2052, %r2049, %r45; 2026-02-21T08:52:47.6801137Z ld.shared.b16 %rs353, [%r2052]; 2026-02-21T08:52:47.6801208Z ld.shared.b16 %rs354, [%r2052+1024]; 2026-02-21T08:52:47.6801275Z ld.shared.b16 %rs355, [%r2052+64]; 2026-02-21T08:52:47.6801342Z ld.shared.b16 %rs356, [%r2052+1088]; 2026-02-21T08:52:47.6801407Z add.s32 %r2053, %r2049, %r46; 2026-02-21T08:52:47.6801472Z ld.shared.b16 %rs357, [%r2053]; 2026-02-21T08:52:47.6801538Z ld.shared.b16 %rs358, [%r2053+1024]; 2026-02-21T08:52:47.6801603Z ld.shared.b16 %rs359, [%r2053+64]; 2026-02-21T08:52:47.6801673Z ld.shared.b16 %rs360, [%r2053+1088]; 2026-02-21T08:52:47.6801828Z add.s32 %r2054, %r2049, %r47; 
2026-02-21T08:52:47.6801897Z ld.shared.b16 %rs361, [%r2054]; 2026-02-21T08:52:47.6801969Z ld.shared.b16 %rs362, [%r2054+1024]; 2026-02-21T08:52:47.6802042Z ld.shared.b16 %rs363, [%r2054+64]; 2026-02-21T08:52:47.6802109Z ld.shared.b16 %rs364, [%r2054+1088]; 2026-02-21T08:52:47.6802170Z add.s32 %r2055, %r2049, %r48; 2026-02-21T08:52:47.6802243Z ld.shared.b16 %rs365, [%r2055]; 2026-02-21T08:52:47.6802309Z ld.shared.b16 %rs366, [%r2055+1024]; 2026-02-21T08:52:47.6802374Z ld.shared.b16 %rs367, [%r2055+64]; 2026-02-21T08:52:47.6802445Z ld.shared.b16 %rs368, [%r2055+1088]; 2026-02-21T08:52:47.6802506Z add.s32 %r2056, %r2049, %r49; 2026-02-21T08:52:47.6802571Z ld.shared.b16 %rs369, [%r2056]; 2026-02-21T08:52:47.6802657Z ld.shared.b16 %rs370, [%r2056+1024]; 2026-02-21T08:52:47.6802726Z ld.shared.b16 %rs371, [%r2056+64]; 2026-02-21T08:52:47.6802791Z ld.shared.b16 %rs372, [%r2056+1088]; 2026-02-21T08:52:47.6802855Z add.s32 %r2057, %r2049, %r50; 2026-02-21T08:52:47.6802925Z ld.shared.b16 %rs373, [%r2057]; 2026-02-21T08:52:47.6802994Z ld.shared.b16 %rs374, [%r2057+1024]; 2026-02-21T08:52:47.6803059Z ld.shared.b16 %rs375, [%r2057+64]; 2026-02-21T08:52:47.6803135Z ld.shared.b16 %rs376, [%r2057+1088]; 2026-02-21T08:52:47.6803202Z cvt.f32.bf16 %r1748, %rs345; 2026-02-21T08:52:47.6803267Z cvt.f32.bf16 %r1749, %rs346; 2026-02-21T08:52:47.6803329Z cvt.f32.bf16 %r1750, %rs349; 2026-02-21T08:52:47.6803397Z cvt.f32.bf16 %r1751, %rs350; 2026-02-21T08:52:47.6803459Z cvt.f32.bf16 %r1784, %rs353; 2026-02-21T08:52:47.6803521Z cvt.f32.bf16 %r1785, %rs354; 2026-02-21T08:52:47.6803588Z cvt.f32.bf16 %r1786, %rs357; 2026-02-21T08:52:47.6803651Z cvt.f32.bf16 %r1787, %rs358; 2026-02-21T08:52:47.6803712Z cvt.f32.bf16 %r1820, %rs361; 2026-02-21T08:52:47.6803778Z cvt.f32.bf16 %r1821, %rs362; 2026-02-21T08:52:47.6803852Z cvt.f32.bf16 %r1822, %rs365; 2026-02-21T08:52:47.6803915Z cvt.f32.bf16 %r1823, %rs366; 2026-02-21T08:52:47.6803980Z cvt.f32.bf16 %r1856, %rs369; 2026-02-21T08:52:47.6804052Z 
cvt.f32.bf16 %r1857, %rs370; 2026-02-21T08:52:47.6804114Z cvt.f32.bf16 %r1858, %rs373; 2026-02-21T08:52:47.6804176Z cvt.f32.bf16 %r1859, %rs374; 2026-02-21T08:52:47.6804245Z cvt.f32.bf16 %r1892, %rs347; 2026-02-21T08:52:47.6804308Z cvt.f32.bf16 %r1893, %rs348; 2026-02-21T08:52:47.6804369Z cvt.f32.bf16 %r1894, %rs351; 2026-02-21T08:52:47.6804432Z cvt.f32.bf16 %r1895, %rs352; 2026-02-21T08:52:47.6804509Z cvt.f32.bf16 %r1928, %rs355; 2026-02-21T08:52:47.6804575Z cvt.f32.bf16 %r1929, %rs356; 2026-02-21T08:52:47.6804637Z cvt.f32.bf16 %r1930, %rs359; 2026-02-21T08:52:47.6804708Z cvt.f32.bf16 %r1931, %rs360; 2026-02-21T08:52:47.6804770Z cvt.f32.bf16 %r1964, %rs363; 2026-02-21T08:52:47.6804832Z cvt.f32.bf16 %r1965, %rs364; 2026-02-21T08:52:47.6804895Z cvt.f32.bf16 %r1966, %rs367; 2026-02-21T08:52:47.6804962Z cvt.f32.bf16 %r1967, %rs368; 2026-02-21T08:52:47.6805024Z cvt.f32.bf16 %r2000, %rs371; 2026-02-21T08:52:47.6805088Z cvt.f32.bf16 %r2001, %rs372; 2026-02-21T08:52:47.6805156Z cvt.f32.bf16 %r2002, %rs375; 2026-02-21T08:52:47.6805217Z cvt.f32.bf16 %r2003, %rs376; 2026-02-21T08:52:47.6805432Z .loc 1 67 45 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:67:45 2026-02-21T08:52:47.6805606Z add.s32 %r2058, %r51, %r2047; 2026-02-21T08:52:47.6805676Z ld.shared.b8 %rs377, [%r2058]; 2026-02-21T08:52:47.6805745Z ld.shared.b8 %rs378, [%r2058+512]; 2026-02-21T08:52:47.6805815Z ld.shared.b8 %rs379, [%r2058+1024]; 2026-02-21T08:52:47.6805889Z ld.shared.b8 %rs380, [%r2058+1536]; 2026-02-21T08:52:47.6805956Z ld.shared.b8 %rs381, [%r2058+2048]; 2026-02-21T08:52:47.6806023Z ld.shared.b8 %rs382, [%r2058+2560]; 2026-02-21T08:52:47.6806099Z ld.shared.b8 %rs383, [%r2058+3072]; 2026-02-21T08:52:47.6806165Z ld.shared.b8 %rs384, [%r2058+3584]; 2026-02-21T08:52:47.6806230Z ld.shared.b8 %rs385, [%r2058+4096]; 2026-02-21T08:52:47.6806296Z ld.shared.b8 %rs386, [%r2058+4608]; 2026-02-21T08:52:47.6806580Z ld.shared.b8 %rs387, [%r2058+5120]; 2026-02-21T08:52:47.6806669Z ld.shared.b8 %rs388, 
[%r2058+5632]; 2026-02-21T08:52:47.6806738Z ld.shared.b8 %rs389, [%r2058+6144]; 2026-02-21T08:52:47.6806819Z ld.shared.b8 %rs390, [%r2058+6656]; 2026-02-21T08:52:47.6806886Z ld.shared.b8 %rs391, [%r2058+7168]; 2026-02-21T08:52:47.6806954Z ld.shared.b8 %rs392, [%r2058+7680]; 2026-02-21T08:52:47.6807165Z .loc 1 57 28 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:57:28 2026-02-21T08:52:47.6807231Z shl.b16 %rs393, %rs377, 4; 2026-02-21T08:52:47.6807295Z shl.b16 %rs394, %rs378, 4; 2026-02-21T08:52:47.6807356Z shl.b16 %rs395, %rs379, 4; 2026-02-21T08:52:47.6807424Z shl.b16 %rs396, %rs380, 4; 2026-02-21T08:52:47.6807487Z shl.b16 %rs397, %rs381, 4; 2026-02-21T08:52:47.6807549Z shl.b16 %rs398, %rs382, 4; 2026-02-21T08:52:47.6807619Z shl.b16 %rs399, %rs383, 4; 2026-02-21T08:52:47.6807682Z shl.b16 %rs400, %rs384, 4; 2026-02-21T08:52:47.6807747Z shl.b16 %rs401, %rs385, 4; 2026-02-21T08:52:47.6807807Z shl.b16 %rs402, %rs386, 4; 2026-02-21T08:52:47.6807878Z shl.b16 %rs403, %rs387, 4; 2026-02-21T08:52:47.6807942Z shl.b16 %rs404, %rs388, 4; 2026-02-21T08:52:47.6808008Z shl.b16 %rs405, %rs389, 4; 2026-02-21T08:52:47.6808080Z shl.b16 %rs406, %rs390, 4; 2026-02-21T08:52:47.6808152Z shl.b16 %rs407, %rs391, 4; 2026-02-21T08:52:47.6808217Z shl.b16 %rs408, %rs392, 4; 2026-02-21T08:52:47.6808418Z .loc 1 72 58 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:72:58 2026-02-21T08:52:47.6808502Z selp.b16 %rs409, %rs393, %rs377, %p57; 2026-02-21T08:52:47.6808567Z cvt.s16.s8 %rs410, %rs409; 2026-02-21T08:52:47.6808627Z shr.s16 %rs411, %rs410, 4; 2026-02-21T08:52:47.6808703Z selp.b16 %rs412, %rs394, %rs378, %p57; 2026-02-21T08:52:47.6808765Z cvt.s16.s8 %rs413, %rs412; 2026-02-21T08:52:47.6808826Z shr.s16 %rs414, %rs413, 4; 2026-02-21T08:52:47.6808900Z selp.b16 %rs415, %rs395, %rs379, %p57; 2026-02-21T08:52:47.6808965Z cvt.s16.s8 %rs416, %rs415; 2026-02-21T08:52:47.6809027Z shr.s16 %rs417, %rs416, 4; 2026-02-21T08:52:47.6809094Z selp.b16 %rs418, %rs396, %rs380, %p57; 
2026-02-21T08:52:47.6809161Z cvt.s16.s8 %rs419, %rs418; 2026-02-21T08:52:47.6809225Z shr.s16 %rs420, %rs419, 4; 2026-02-21T08:52:47.6809293Z selp.b16 %rs421, %rs397, %rs381, %p57; 2026-02-21T08:52:47.6809359Z cvt.s16.s8 %rs422, %rs421; 2026-02-21T08:52:47.6809420Z shr.s16 %rs423, %rs422, 4; 2026-02-21T08:52:47.6809487Z selp.b16 %rs424, %rs398, %rs382, %p57; 2026-02-21T08:52:47.6809548Z cvt.s16.s8 %rs425, %rs424; 2026-02-21T08:52:47.6809612Z shr.s16 %rs426, %rs425, 4; 2026-02-21T08:52:47.6809679Z selp.b16 %rs427, %rs399, %rs383, %p57; 2026-02-21T08:52:47.6809739Z cvt.s16.s8 %rs428, %rs427; 2026-02-21T08:52:47.6809806Z shr.s16 %rs429, %rs428, 4; 2026-02-21T08:52:47.6809876Z selp.b16 %rs430, %rs400, %rs384, %p57; 2026-02-21T08:52:47.6809938Z cvt.s16.s8 %rs431, %rs430; 2026-02-21T08:52:47.6809999Z shr.s16 %rs432, %rs431, 4; 2026-02-21T08:52:47.6810075Z selp.b16 %rs433, %rs401, %rs385, %p57; 2026-02-21T08:52:47.6810136Z cvt.s16.s8 %rs434, %rs433; 2026-02-21T08:52:47.6810197Z shr.s16 %rs435, %rs434, 4; 2026-02-21T08:52:47.6810421Z selp.b16 %rs436, %rs402, %rs386, %p57; 2026-02-21T08:52:47.6810484Z cvt.s16.s8 %rs437, %rs436; 2026-02-21T08:52:47.6810546Z shr.s16 %rs438, %rs437, 4; 2026-02-21T08:52:47.6810616Z selp.b16 %rs439, %rs403, %rs387, %p57; 2026-02-21T08:52:47.6810682Z cvt.s16.s8 %rs440, %rs439; 2026-02-21T08:52:47.6810743Z shr.s16 %rs441, %rs440, 4; 2026-02-21T08:52:47.6810812Z selp.b16 %rs442, %rs404, %rs388, %p57; 2026-02-21T08:52:47.6810881Z cvt.s16.s8 %rs443, %rs442; 2026-02-21T08:52:47.6810944Z shr.s16 %rs444, %rs443, 4; 2026-02-21T08:52:47.6811010Z selp.b16 %rs445, %rs405, %rs389, %p57; 2026-02-21T08:52:47.6811077Z cvt.s16.s8 %rs446, %rs445; 2026-02-21T08:52:47.6811140Z shr.s16 %rs447, %rs446, 4; 2026-02-21T08:52:47.6811208Z selp.b16 %rs448, %rs406, %rs390, %p57; 2026-02-21T08:52:47.6811386Z cvt.s16.s8 %rs449, %rs448; 2026-02-21T08:52:47.6811458Z shr.s16 %rs450, %rs449, 4; 2026-02-21T08:52:47.6811528Z selp.b16 %rs451, %rs407, %rs391, %p57; 
2026-02-21T08:52:47.6811591Z cvt.s16.s8 %rs452, %rs451; 2026-02-21T08:52:47.6811662Z shr.s16 %rs453, %rs452, 4; 2026-02-21T08:52:47.6811730Z selp.b16 %rs454, %rs408, %rs392, %p57; 2026-02-21T08:52:47.6811791Z cvt.s16.s8 %rs455, %rs454; 2026-02-21T08:52:47.6811854Z shr.s16 %rs456, %rs455, 4; 2026-02-21T08:52:47.6812079Z .loc 1 77 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:77:32 2026-02-21T08:52:47.6812149Z cvt.rn.f32.s16 %r2059, %rs411; 2026-02-21T08:52:47.6812214Z cvt.rn.f32.s16 %r2060, %rs414; 2026-02-21T08:52:47.6812285Z cvt.rn.f32.s16 %r2061, %rs417; 2026-02-21T08:52:47.6812349Z cvt.rn.f32.s16 %r2062, %rs420; 2026-02-21T08:52:47.6812413Z cvt.rn.f32.s16 %r2063, %rs423; 2026-02-21T08:52:47.6812482Z cvt.rn.f32.s16 %r2064, %rs426; 2026-02-21T08:52:47.6812545Z cvt.rn.f32.s16 %r2065, %rs429; 2026-02-21T08:52:47.6812611Z cvt.rn.f32.s16 %r2066, %rs432; 2026-02-21T08:52:47.6812677Z cvt.rn.f32.s16 %r2067, %rs435; 2026-02-21T08:52:47.6812751Z cvt.rn.f32.s16 %r2068, %rs438; 2026-02-21T08:52:47.6812819Z cvt.rn.f32.s16 %r2069, %rs441; 2026-02-21T08:52:47.6812882Z cvt.rn.f32.s16 %r2070, %rs444; 2026-02-21T08:52:47.6812951Z cvt.rn.f32.s16 %r2071, %rs447; 2026-02-21T08:52:47.6813014Z cvt.rn.f32.s16 %r2072, %rs450; 2026-02-21T08:52:47.6813076Z cvt.rn.f32.s16 %r2073, %rs453; 2026-02-21T08:52:47.6813138Z cvt.rn.f32.s16 %r2074, %rs456; 2026-02-21T08:52:47.6813210Z st.shared.b32 [%r52], %r2059; 2026-02-21T08:52:47.6813280Z st.shared.b32 [%r52+32768], %r2067; 2026-02-21T08:52:47.6813344Z st.shared.b32 [%r53], %r2060; 2026-02-21T08:52:47.6813414Z st.shared.b32 [%r53+32768], %r2068; 2026-02-21T08:52:47.6813477Z st.shared.b32 [%r54], %r2061; 2026-02-21T08:52:47.6813542Z st.shared.b32 [%r54+32768], %r2069; 2026-02-21T08:52:47.6813608Z st.shared.b32 [%r55], %r2062; 2026-02-21T08:52:47.6813681Z st.shared.b32 [%r55+32768], %r2070; 2026-02-21T08:52:47.6813746Z st.shared.b32 [%r56], %r2063; 2026-02-21T08:52:47.6813812Z st.shared.b32 [%r56+32768], %r2071; 
2026-02-21T08:52:47.6813884Z st.shared.b32 [%r57], %r2064; 2026-02-21T08:52:47.6813949Z st.shared.b32 [%r57+32768], %r2072; 2026-02-21T08:52:47.6814016Z st.shared.b32 [%r58], %r2065; 2026-02-21T08:52:47.6814089Z st.shared.b32 [%r58+32768], %r2073; 2026-02-21T08:52:47.6814153Z st.shared.b32 [%r59], %r2066; 2026-02-21T08:52:47.6814218Z st.shared.b32 [%r59+32768], %r2074; 2026-02-21T08:52:47.6814275Z $L__tmp7: 2026-02-21T08:52:47.6814561Z .loc 2 291 36 // standard.py:291:36 @[ cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:84:40 ] 2026-02-21T08:52:47.6814625Z // begin inline asm 2026-02-21T08:52:47.6814705Z fence.proxy.async.shared::cta; 2026-02-21T08:52:47.6814772Z // end inline asm 2026-02-21T08:52:47.6814829Z bar.sync 0; 2026-02-21T08:52:47.6814915Z shfl.sync.idx.b32 %r2075, %r4, 0, 31, -1; 2026-02-21T08:52:47.6814990Z wgmma.fence.sync.aligned; 2026-02-21T08:52:47.6815060Z shl.b32 %r2076, %r2075, 10; 2026-02-21T08:52:47.6815123Z and.b32 %r2077, %r2076, 28672; 2026-02-21T08:52:47.6815290Z add.s32 %r2078, %r2077, %r2128; 2026-02-21T08:52:47.6815368Z bfe.u32 %r2079, %r2078, 4, 14; 2026-02-21T08:52:47.6815432Z cvt.u64.u32 %rd141, %r2079; 2026-02-21T08:52:47.6815510Z or.b64 %rd131, %rd141, 4611686293439512576; 2026-02-21T08:52:47.6815575Z mov.pred %p43, -1; 2026-02-21T08:52:47.6815641Z // begin inline asm 2026-02-21T08:52:47.6816156Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2207,%r2208,%r2209,%r2210,%r2211,%r2212,%r2213,%r2214,%r2215,%r2216,%r2217,%r2218,%r2219,%r2220,%r2221,%r2222}, {%r1748,%r1749,%r1750,%r1751}, %rd131, %p43, 1, 1; 2026-02-21T08:52:47.6816214Z // end inline asm 2026-02-21T08:52:47.6816280Z add.s32 %r2080, %r2078, 32; 2026-02-21T08:52:47.6816340Z bfe.u32 %r2081, %r2080, 4, 14; 2026-02-21T08:52:47.6816402Z cvt.u64.u32 %rd142, %r2081; 2026-02-21T08:52:47.6816748Z or.b64 %rd132, %rd142, 4611686293439512576; 2026-02-21T08:52:47.6816824Z // begin inline asm 2026-02-21T08:52:47.6817336Z 
wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2207,%r2208,%r2209,%r2210,%r2211,%r2212,%r2213,%r2214,%r2215,%r2216,%r2217,%r2218,%r2219,%r2220,%r2221,%r2222}, {%r1784,%r1785,%r1786,%r1787}, %rd132, %p43, 1, 1; 2026-02-21T08:52:47.6817407Z // end inline asm 2026-02-21T08:52:47.6817469Z add.s32 %r2082, %r2078, 64; 2026-02-21T08:52:47.6817530Z bfe.u32 %r2083, %r2082, 4, 14; 2026-02-21T08:52:47.6817594Z cvt.u64.u32 %rd143, %r2083; 2026-02-21T08:52:47.6817686Z or.b64 %rd133, %rd143, 4611686293439512576; 2026-02-21T08:52:47.6817752Z // begin inline asm 2026-02-21T08:52:47.6818250Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2207,%r2208,%r2209,%r2210,%r2211,%r2212,%r2213,%r2214,%r2215,%r2216,%r2217,%r2218,%r2219,%r2220,%r2221,%r2222}, {%r1820,%r1821,%r1822,%r1823}, %rd133, %p43, 1, 1; 2026-02-21T08:52:47.6818314Z // end inline asm 2026-02-21T08:52:47.6818379Z add.s32 %r2084, %r2078, 96; 2026-02-21T08:52:47.6818441Z bfe.u32 %r2085, %r2084, 4, 14; 2026-02-21T08:52:47.6818509Z cvt.u64.u32 %rd144, %r2085; 2026-02-21T08:52:47.6818585Z or.b64 %rd134, %rd144, 4611686293439512576; 2026-02-21T08:52:47.6818645Z // begin inline asm 2026-02-21T08:52:47.6819135Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2207,%r2208,%r2209,%r2210,%r2211,%r2212,%r2213,%r2214,%r2215,%r2216,%r2217,%r2218,%r2219,%r2220,%r2221,%r2222}, {%r1856,%r1857,%r1858,%r1859}, %rd134, %p43, 1, 1; 2026-02-21T08:52:47.6819198Z // end inline asm 2026-02-21T08:52:47.6819260Z add.s32 %r2086, %r2078, 32768; 2026-02-21T08:52:47.6819322Z bfe.u32 %r2087, %r2086, 4, 14; 2026-02-21T08:52:47.6819392Z cvt.u64.u32 %rd145, %r2087; 2026-02-21T08:52:47.6819464Z or.b64 %rd135, %rd145, 4611686293439512576; 2026-02-21T08:52:47.6819525Z // begin inline asm 2026-02-21T08:52:47.6820023Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2207,%r2208,%r2209,%r2210,%r2211,%r2212,%r2213,%r2214,%r2215,%r2216,%r2217,%r2218,%r2219,%r2220,%r2221,%r2222}, {%r1892,%r1893,%r1894,%r1895}, %rd135, %p43, 1, 1; 
2026-02-21T08:52:47.6820081Z // end inline asm 2026-02-21T08:52:47.6820147Z add.s32 %r2088, %r2078, 32800; 2026-02-21T08:52:47.6820212Z bfe.u32 %r2089, %r2088, 4, 14; 2026-02-21T08:52:47.6820275Z cvt.u64.u32 %rd146, %r2089; 2026-02-21T08:52:47.6820347Z or.b64 %rd136, %rd146, 4611686293439512576; 2026-02-21T08:52:47.6820408Z // begin inline asm 2026-02-21T08:52:47.6820904Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2207,%r2208,%r2209,%r2210,%r2211,%r2212,%r2213,%r2214,%r2215,%r2216,%r2217,%r2218,%r2219,%r2220,%r2221,%r2222}, {%r1928,%r1929,%r1930,%r1931}, %rd136, %p43, 1, 1; 2026-02-21T08:52:47.6820964Z // end inline asm 2026-02-21T08:52:47.6821026Z add.s32 %r2090, %r2078, 32832; 2026-02-21T08:52:47.6821090Z bfe.u32 %r2091, %r2090, 4, 14; 2026-02-21T08:52:47.6821162Z cvt.u64.u32 %rd147, %r2091; 2026-02-21T08:52:47.6821237Z or.b64 %rd137, %rd147, 4611686293439512576; 2026-02-21T08:52:47.6821298Z // begin inline asm 2026-02-21T08:52:47.6821793Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2207,%r2208,%r2209,%r2210,%r2211,%r2212,%r2213,%r2214,%r2215,%r2216,%r2217,%r2218,%r2219,%r2220,%r2221,%r2222}, {%r1964,%r1965,%r1966,%r1967}, %rd137, %p43, 1, 1; 2026-02-21T08:52:47.6821991Z // end inline asm 2026-02-21T08:52:47.6822052Z add.s32 %r2092, %r2078, 32864; 2026-02-21T08:52:47.6822119Z bfe.u32 %r2093, %r2092, 4, 14; 2026-02-21T08:52:47.6822181Z cvt.u64.u32 %rd148, %r2093; 2026-02-21T08:52:47.6822252Z or.b64 %rd138, %rd148, 4611686293439512576; 2026-02-21T08:52:47.6822312Z // begin inline asm 2026-02-21T08:52:47.6822805Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r2207,%r2208,%r2209,%r2210,%r2211,%r2212,%r2213,%r2214,%r2215,%r2216,%r2217,%r2218,%r2219,%r2220,%r2221,%r2222}, {%r2000,%r2001,%r2002,%r2003}, %rd138, %p43, 1, 1; 2026-02-21T08:52:47.6822863Z // end inline asm 2026-02-21T08:52:47.6822941Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:47.6823098Z mov.b32 %r2021, 0; 2026-02-21T08:52:47.6823163Z mov.b32 %r2020, %r2128; 
2026-02-21T08:52:47.6823236Z mov.b32 %r2022, %r2021; 2026-02-21T08:52:47.6823304Z // begin inline asm 2026-02-21T08:52:47.6823623Z // wait for regs: %r2207,%r2208,%r2209,%r2210,%r2211,%r2212,%r2213,%r2214,%r2215,%r2216,%r2217,%r2218,%r2219,%r2220,%r2221,%r2222,%r2020,%r2021,%r2022 2026-02-21T08:52:47.6823701Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:47.6823764Z // end inline asm 2026-02-21T08:52:47.6823824Z $L__tmp8: 2026-02-21T08:52:47.6824048Z .loc 1 40 103 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:40:103 2026-02-21T08:52:47.6824112Z add.s32 %r2094, %r2206, 1; 2026-02-21T08:52:47.6824188Z setp.gt.s32 %p54, %r2094, 1; 2026-02-21T08:52:47.6824258Z selp.b32 %r2206, 0, %r2094, %p54; 2026-02-21T08:52:47.6824466Z .loc 1 48 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:32 2026-02-21T08:52:47.6824548Z mad.wide.s32 %rd139, %r2204, 2, %rd15; 2026-02-21T08:52:47.6824747Z .loc 1 48 80 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:48:80 2026-02-21T08:52:47.6824815Z shl.b32 %r2095, %r2206, 13; 2026-02-21T08:52:47.6824886Z add.s32 %r2042, %r38, %r2095; 2026-02-21T08:52:47.6824952Z selp.b32 %r2043, 8, 0, %p52; 2026-02-21T08:52:47.6825013Z // begin inline asm 2026-02-21T08:52:47.6825162Z cp.async.ca.shared.global [ %r2042 + 0 ], [ %rd139 + 0 ], 0x8, %r2043; 2026-02-21T08:52:47.6825228Z // end inline asm 2026-02-21T08:52:47.6825297Z cp.async.commit_group; 2026-02-21T08:52:47.6825496Z .loc 1 54 87 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:54:87 2026-02-21T08:52:47.6825564Z add.s32 %r2044, %r40, %r2095; 2026-02-21T08:52:47.6825625Z // begin inline asm 2026-02-21T08:52:47.6825762Z cp.async.ca.shared.global [ %r2044 + 0 ], [ %rd154 + 0 ], 0x8, %r2043; 2026-02-21T08:52:47.6825821Z // end inline asm 2026-02-21T08:52:47.6825896Z cp.async.commit_group; 2026-02-21T08:52:47.6826106Z .loc 1 40 103 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:40:103 2026-02-21T08:52:47.6826172Z add.s32 %r2204, %r2204, 64; 
2026-02-21T08:52:47.6826258Z add.s64 %rd154, %rd154, 229376; 2026-02-21T08:52:47.6826328Z setp.lt.u64 %p55, %rd155, 4064; 2026-02-21T08:52:47.6826390Z @%p55 bra $L__BB0_12; 2026-02-21T08:52:47.6826637Z // %bb.13: // in Loop: Header=BB0_11 Depth=1 2026-02-21T08:52:47.6826855Z .loc 1 31 32 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:31:32 2026-02-21T08:52:47.6826920Z or.b32 %r2114, %r200, %r4; 2026-02-21T08:52:47.6826985Z or.b32 %r2115, %r200, %r6; 2026-02-21T08:52:47.6827197Z .loc 1 40 103 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:40:103 2026-02-21T08:52:47.6827267Z cp.async.wait_group 0; 2026-02-21T08:52:47.6827323Z bar.sync 0; 2026-02-21T08:52:47.6827530Z .loc 1 87 28 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:87:28 2026-02-21T08:52:47.6827611Z cvt.rn.bf16x2.f32 %r2116, %r2208, %r2207; 2026-02-21T08:52:47.6827687Z cvt.rn.bf16x2.f32 %r2117, %r2210, %r2209; 2026-02-21T08:52:47.6827905Z cvt.rn.bf16x2.f32 %r2118, %r2212, %r2211; 2026-02-21T08:52:47.6827978Z cvt.rn.bf16x2.f32 %r2119, %r2214, %r2213; 2026-02-21T08:52:47.6828048Z cvt.rn.bf16x2.f32 %r2120, %r2216, %r2215; 2026-02-21T08:52:47.6828119Z cvt.rn.bf16x2.f32 %r2121, %r2218, %r2217; 2026-02-21T08:52:47.6828195Z cvt.rn.bf16x2.f32 %r2122, %r2220, %r2219; 2026-02-21T08:52:47.6828277Z cvt.rn.bf16x2.f32 %r2123, %r2222, %r2221; 2026-02-21T08:52:47.6828485Z .loc 1 88 50 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:88:50 2026-02-21T08:52:47.6828637Z mad.lo.s32 %r2124, %r2114, 7168, %r201; 2026-02-21T08:52:47.6828707Z mad.lo.s32 %r2125, %r2115, 7168, %r201; 2026-02-21T08:52:47.6829033Z .loc 1 88 22 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:88:22 2026-02-21T08:52:47.6829110Z mad.wide.s32 %rd149, %r2124, 2, %rd17; 2026-02-21T08:52:47.6829179Z mad.wide.s32 %rd150, %r2125, 2, %rd17; 2026-02-21T08:52:47.6829383Z .loc 1 88 81 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:88:81 2026-02-21T08:52:47.6829500Z st.shared.v4.b32 [%r60], 
{%r2116, %r2118, %r2120, %r2122}; 2026-02-21T08:52:47.6829619Z st.shared.v4.b32 [%r60+512], {%r2117, %r2119, %r2121, %r2123}; 2026-02-21T08:52:47.6829676Z bar.sync 0; 2026-02-21T08:52:47.6829736Z // begin inline asm 2026-02-21T08:52:47.6829934Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2096, %r2097, %r2098, %r2099}, [%r2100]; 2026-02-21T08:52:47.6829994Z // end inline asm 2026-02-21T08:52:47.6830054Z // begin inline asm 2026-02-21T08:52:47.6830241Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2101, %r2102, %r2103, %r2104}, [%r2105]; 2026-02-21T08:52:47.6830298Z // end inline asm 2026-02-21T08:52:47.6830357Z // begin inline asm 2026-02-21T08:52:47.6830489Z st.global.v4.b32 [ %rd149 + 0 ], { %r2096, %r2097, %r2098, %r2099 }; 2026-02-21T08:52:47.6830551Z // end inline asm 2026-02-21T08:52:47.6830611Z // begin inline asm 2026-02-21T08:52:47.6830731Z st.global.v4.b32 [ %rd150 + 0 ], { %r2101, %r2102, %r2103, %r2104 }; 2026-02-21T08:52:47.6830795Z // end inline asm 2026-02-21T08:52:47.6831014Z .loc 1 19 144 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:19:144 2026-02-21T08:52:47.6831079Z add.s32 %r241, %r2203, 4224; 2026-02-21T08:52:47.6831154Z setp.lt.s32 %p56, %r2203, -4196; 2026-02-21T08:52:47.6831217Z mov.b32 %r2203, %r241; 2026-02-21T08:52:47.6831278Z @%p56 bra $L__BB0_11; 2026-02-21T08:52:47.6831369Z $L__BB0_14: // %._crit_edge 2026-02-21T08:52:47.6831586Z .loc 1 19 4 // cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py:19:4 2026-02-21T08:52:47.6831641Z ret; 2026-02-21T08:52:47.6831710Z $L__tmp9: 2026-02-21T08:52:47.6831772Z $L__func_end0: 2026-02-21T08:52:47.6831862Z // -- End function 2026-02-21T08:52:47.6831924Z } 2026-02-21T08:52:47.6832176Z .file 1 "/tmp/torchinductor_root/dg/cdgd3xyskxtai5q7mjdmm6ztgmdqkii34qu2icji4ytidmu3o2sd.py" 2026-02-21T08:52:47.6832394Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T08:52:47.6832461Z .section .debug_abbrev 2026-02-21T08:52:47.6832523Z { 
2026-02-21T08:52:47.6832622Z .b8 1 // Abbreviation Code 2026-02-21T08:52:47.6832722Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:52:47.6832824Z .b8 1 // DW_CHILDREN_yes 2026-02-21T08:52:47.6832916Z .b8 37 // DW_AT_producer 2026-02-21T08:52:47.6833001Z .b8 8 // DW_FORM_string 2026-02-21T08:52:47.6833090Z .b8 19 // DW_AT_language 2026-02-21T08:52:47.6833179Z .b8 5 // DW_FORM_data2 2026-02-21T08:52:47.6833261Z .b8 3 // DW_AT_name 2026-02-21T08:52:47.6833348Z .b8 8 // DW_FORM_string 2026-02-21T08:52:47.6833536Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:52:47.6833618Z .b8 6 // DW_FORM_data4 2026-02-21T08:52:47.6833699Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:52:47.6833787Z .b8 8 // DW_FORM_string 2026-02-21T08:52:47.6833863Z .b8 0 // EOM(1) 2026-02-21T08:52:47.6833936Z .b8 0 // EOM(2) 2026-02-21T08:52:47.6834031Z .b8 2 // Abbreviation Code 2026-02-21T08:52:47.6834119Z .b8 46 // DW_TAG_subprogram 2026-02-21T08:52:47.6834199Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:52:47.6834374Z .b8 3 // DW_AT_name 2026-02-21T08:52:47.6834457Z .b8 8 // DW_FORM_string 2026-02-21T08:52:47.6834540Z .b8 32 // DW_AT_inline 2026-02-21T08:52:47.6834624Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:47.6834703Z .b8 0 // EOM(1) 2026-02-21T08:52:47.6834773Z .b8 0 // EOM(2) 2026-02-21T08:52:47.6834860Z .b8 3 // Abbreviation Code 2026-02-21T08:52:47.6834950Z .b8 46 // DW_TAG_subprogram 2026-02-21T08:52:47.6835034Z .b8 1 // DW_CHILDREN_yes 2026-02-21T08:52:47.6835114Z .b8 17 // DW_AT_low_pc 2026-02-21T08:52:47.6835196Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:47.6835278Z .b8 18 // DW_AT_high_pc 2026-02-21T08:52:47.6835356Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:47.6835450Z .b8 49 // DW_AT_abstract_origin 2026-02-21T08:52:47.6835540Z .b8 19 // DW_FORM_ref4 2026-02-21T08:52:47.6835613Z .b8 0 // EOM(1) 2026-02-21T08:52:47.6835685Z .b8 0 // EOM(2) 2026-02-21T08:52:47.6835775Z .b8 4 // Abbreviation Code 2026-02-21T08:52:47.6835876Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T08:52:47.6835957Z .b8 0 // 
DW_CHILDREN_no 2026-02-21T08:52:47.6836052Z .b8 49 // DW_AT_abstract_origin 2026-02-21T08:52:47.6836130Z .b8 19 // DW_FORM_ref4 2026-02-21T08:52:47.6836208Z .b8 17 // DW_AT_low_pc 2026-02-21T08:52:47.6836296Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:47.6836388Z .b8 18 // DW_AT_high_pc 2026-02-21T08:52:47.6836777Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:47.6836871Z .b8 88 // DW_AT_call_file 2026-02-21T08:52:47.6836957Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:47.6837040Z .b8 89 // DW_AT_call_line 2026-02-21T08:52:47.6837120Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:47.6837210Z .b8 87 // DW_AT_call_column 2026-02-21T08:52:47.6837288Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:47.6837359Z .b8 0 // EOM(1) 2026-02-21T08:52:47.6837433Z .b8 0 // EOM(2) 2026-02-21T08:52:47.6837502Z .b8 0 // EOM(3) 2026-02-21T08:52:47.6837555Z } 2026-02-21T08:52:47.6837620Z .section .debug_info 2026-02-21T08:52:47.6837681Z { 2026-02-21T08:52:47.6837771Z .b32 178 // Length of Unit 2026-02-21T08:52:47.6837865Z .b8 2 // DWARF version number 2026-02-21T08:52:47.6838083Z .b8 0 2026-02-21T08:52:47.6838221Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T08:52:47.6838319Z .b8 8 // Address Size (in bytes) 2026-02-21T08:52:47.6838434Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T08:52:47.6838526Z .b8 116 // DW_AT_producer 2026-02-21T08:52:47.6838581Z .b8 114 2026-02-21T08:52:47.6838634Z .b8 105 2026-02-21T08:52:47.6838705Z .b8 116 2026-02-21T08:52:47.6838766Z .b8 111 2026-02-21T08:52:47.6838820Z .b8 110 2026-02-21T08:52:47.6838874Z .b8 0 2026-02-21T08:52:47.6838959Z .b8 2 // DW_AT_language 2026-02-21T08:52:47.6839013Z .b8 0 2026-02-21T08:52:47.6839226Z .b8 99 // DW_AT_name 2026-02-21T08:52:47.6839289Z .b8 100 2026-02-21T08:52:47.6839343Z .b8 103 2026-02-21T08:52:47.6839395Z .b8 100 2026-02-21T08:52:47.6839448Z .b8 51 2026-02-21T08:52:47.6839510Z .b8 120 2026-02-21T08:52:47.6839567Z .b8 121 2026-02-21T08:52:47.6839619Z .b8 115 2026-02-21T08:52:47.6839676Z .b8 107 2026-02-21T08:52:47.6839730Z .b8 120 2026-02-21T08:52:47.6839782Z .b8 116 2026-02-21T08:52:47.6839834Z .b8 97 2026-02-21T08:52:47.6839893Z .b8 105 2026-02-21T08:52:47.6839945Z .b8 53 2026-02-21T08:52:47.6839997Z .b8 113 2026-02-21T08:52:47.6840056Z .b8 55 2026-02-21T08:52:47.6840109Z .b8 109 2026-02-21T08:52:47.6840162Z .b8 106 2026-02-21T08:52:47.6840214Z .b8 100 2026-02-21T08:52:47.6840276Z .b8 109 2026-02-21T08:52:47.6840329Z .b8 109 2026-02-21T08:52:47.6840381Z .b8 54 2026-02-21T08:52:47.6840444Z .b8 122 2026-02-21T08:52:47.6840508Z .b8 116 2026-02-21T08:52:47.6840564Z .b8 103 2026-02-21T08:52:47.6840619Z .b8 109 2026-02-21T08:52:47.6840676Z .b8 100 2026-02-21T08:52:47.6840728Z .b8 113 2026-02-21T08:52:47.6840785Z .b8 107 2026-02-21T08:52:47.6840839Z .b8 105 2026-02-21T08:52:47.6840898Z .b8 105 2026-02-21T08:52:47.6840954Z .b8 51 2026-02-21T08:52:47.6841007Z .b8 52 2026-02-21T08:52:47.6841071Z .b8 113 2026-02-21T08:52:47.6841124Z .b8 117 2026-02-21T08:52:47.6841179Z .b8 50 2026-02-21T08:52:47.6841233Z .b8 105 2026-02-21T08:52:47.6841294Z .b8 99 2026-02-21T08:52:47.6841348Z .b8 106 2026-02-21T08:52:47.6841406Z .b8 105 
2026-02-21T08:52:47.6841466Z .b8 52 2026-02-21T08:52:47.6841518Z .b8 121 2026-02-21T08:52:47.6841573Z .b8 116 2026-02-21T08:52:47.6841626Z .b8 105 2026-02-21T08:52:47.6841683Z .b8 100 2026-02-21T08:52:47.6841736Z .b8 109 2026-02-21T08:52:47.6841788Z .b8 117 2026-02-21T08:52:47.6841841Z .b8 51 2026-02-21T08:52:47.6841899Z .b8 111 2026-02-21T08:52:47.6841951Z .b8 50 2026-02-21T08:52:47.6842005Z .b8 115 2026-02-21T08:52:47.6842067Z .b8 100 2026-02-21T08:52:47.6842119Z .b8 46 2026-02-21T08:52:47.6842171Z .b8 112 2026-02-21T08:52:47.6842224Z .b8 121 2026-02-21T08:52:47.6842285Z .b8 0 2026-02-21T08:52:47.6842399Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:52:47.6842487Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:52:47.6842552Z .b8 116 2026-02-21T08:52:47.6842605Z .b8 109 2026-02-21T08:52:47.6842657Z .b8 112 2026-02-21T08:52:47.6842709Z .b8 47 2026-02-21T08:52:47.6842767Z .b8 116 2026-02-21T08:52:47.6842821Z .b8 111 2026-02-21T08:52:47.6842874Z .b8 114 2026-02-21T08:52:47.6842932Z .b8 99 2026-02-21T08:52:47.6842986Z .b8 104 2026-02-21T08:52:47.6843039Z .b8 105 2026-02-21T08:52:47.6843090Z .b8 110 2026-02-21T08:52:47.6843149Z .b8 100 2026-02-21T08:52:47.6843202Z .b8 117 2026-02-21T08:52:47.6843257Z .b8 99 2026-02-21T08:52:47.6843313Z .b8 116 2026-02-21T08:52:47.6843366Z .b8 111 2026-02-21T08:52:47.6843418Z .b8 114 2026-02-21T08:52:47.6843470Z .b8 95 2026-02-21T08:52:47.6843529Z .b8 114 2026-02-21T08:52:47.6843581Z .b8 111 2026-02-21T08:52:47.6843634Z .b8 111 2026-02-21T08:52:47.6843686Z .b8 116 2026-02-21T08:52:47.6843745Z .b8 47 2026-02-21T08:52:47.6843801Z .b8 100 2026-02-21T08:52:47.6843854Z .b8 103 2026-02-21T08:52:47.6843913Z .b8 0 2026-02-21T08:52:47.6844044Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T08:52:47.6844236Z .b8 95 // DW_AT_name 2026-02-21T08:52:47.6844291Z .b8 104 2026-02-21T08:52:47.6844348Z .b8 101 2026-02-21T08:52:47.6844402Z .b8 108 2026-02-21T08:52:47.6844456Z .b8 105 2026-02-21T08:52:47.6844514Z .b8 111 
2026-02-21T08:52:47.6844566Z .b8 110 2026-02-21T08:52:47.6844619Z .b8 95 2026-02-21T08:52:47.6844672Z .b8 109 2026-02-21T08:52:47.6844731Z .b8 97 2026-02-21T08:52:47.6844784Z .b8 116 2026-02-21T08:52:47.6844837Z .b8 109 2026-02-21T08:52:47.6844895Z .b8 117 2026-02-21T08:52:47.6844949Z .b8 108 2026-02-21T08:52:47.6845001Z .b8 95 2026-02-21T08:52:47.6845056Z .b8 98 2026-02-21T08:52:47.6845115Z .b8 102 2026-02-21T08:52:47.6845167Z .b8 49 2026-02-21T08:52:47.6845219Z .b8 54 2026-02-21T08:52:47.6845276Z .b8 95 2026-02-21T08:52:47.6845435Z .b8 105 2026-02-21T08:52:47.6845494Z .b8 110 2026-02-21T08:52:47.6845548Z .b8 116 2026-02-21T08:52:47.6845605Z .b8 52 2026-02-21T08:52:47.6845656Z .b8 0 2026-02-21T08:52:47.6845744Z .b8 1 // DW_AT_inline 2026-02-21T08:52:47.6845863Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T08:52:47.6845962Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T08:52:47.6846062Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T08:52:47.6846164Z .b32 108 // DW_AT_abstract_origin 2026-02-21T08:52:47.6846302Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T08:52:47.6846402Z .b32 108 // DW_AT_abstract_origin 2026-02-21T08:52:47.6846619Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T08:52:47.6846726Z .b64 $L__tmp8 // DW_AT_high_pc 2026-02-21T08:52:47.6846819Z .b8 1 // DW_AT_call_file 2026-02-21T08:52:47.6846908Z .b8 84 // DW_AT_call_line 2026-02-21T08:52:47.6847010Z .b8 40 // DW_AT_call_column 2026-02-21T08:52:47.6847108Z .b8 0 // End Of Children Mark 2026-02-21T08:52:47.6847198Z .b8 0 // End Of Children Mark 2026-02-21T08:52:47.6847252Z } 2026-02-21T08:52:47.6847330Z .section .debug_macinfo { } 2026-02-21T08:52:47.6847337Z 2026-02-21T08:52:47.6847420Z ================================================================ 2026-02-21T08:52:47.6847540Z please share the reproducer above with Triton project. 
2026-02-21T08:52:48.1850389Z 2026-02-21T08:52:48.1850404Z 2026-02-21T08:52:48.1850410Z 2026-02-21T08:52:48.1850650Z ================================================================ 2026-02-21T08:52:48.1851030Z Internal Triton PTX codegen error 2026-02-21T08:52:48.1851306Z `ptxas` stderr: 2026-02-21T08:52:48.1852061Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 413 in function _helion_matmul_bf16_int4. Try to compile with register target of 38 or higher. 2026-02-21T08:52:48.1852895Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:52:48.1853123Z 2026-02-21T08:52:48.1853771Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp311q_6sc.ptx -o /tmp/tmp311q_6sc.ptx.o 2026-02-21T08:52:48.1854521Z 2026-02-21T08:52:48.1854526Z 2026-02-21T08:52:48.1854595Z // 2026-02-21T08:52:48.1854786Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:52:48.1855026Z // 2026-02-21T08:52:48.1855131Z 2026-02-21T08:52:48.1855211Z .version 8.7 2026-02-21T08:52:48.1855407Z .target sm_90a 2026-02-21T08:52:48.1855611Z .address_size 64 2026-02-21T08:52:48.1855750Z 2026-02-21T08:52:48.1855993Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T08:52:48.1856403Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:52:48.1856888Z // @_helion_matmul_bf16_int4 2026-02-21T08:52:48.1857186Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T08:52:48.1857782Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T08:52:48.1858192Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T08:52:48.1858594Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T08:52:48.1858990Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T08:52:48.1859394Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 
2026-02-21T08:52:48.1859705Z ) 2026-02-21T08:52:48.1859853Z .reqntid 1024 2026-02-21T08:52:48.1860017Z .maxnreg 32 2026-02-21T08:52:48.1860164Z { 2026-02-21T08:52:48.1860327Z .reg .pred %p<58>; 2026-02-21T08:52:48.1860512Z .reg .b16 %rs<289>; 2026-02-21T08:52:48.1860696Z .reg .b32 %r<1452>; 2026-02-21T08:52:48.1861044Z .reg .b64 %rd<137>; 2026-02-21T08:52:48.1861408Z .loc 1 14 0 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:14:0 2026-02-21T08:52:48.1861827Z $L__func_begin0: 2026-02-21T08:52:48.1862171Z .loc 1 14 0 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:14:0 2026-02-21T08:52:48.1862515Z 2026-02-21T08:52:48.1862588Z // %bb.0: 2026-02-21T08:52:48.1862809Z ld.param.b64 %rd12, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T08:52:48.1863161Z ld.param.b64 %rd11, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T08:52:48.1863497Z ld.param.b64 %rd10, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T08:52:48.1863794Z $L__tmp0: 2026-02-21T08:52:48.1864122Z .loc 1 19 46 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:19:46 2026-02-21T08:52:48.1864541Z mov.u32 %r1402, %ctaid.x; 2026-02-21T08:52:48.1865148Z [85s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T08:52:48.1867003Z Config: @helion.kernel(config=helion.Config(block_sizes=[32, 64, 128], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[2], load_eviction_policies=['first', 'last'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=32, num_stages=7, num_warps=32, pid_type='persistent_interleaved', range_flattens=[False, True], range_multi_buffers=[False, False], range_num_stages=[3, 3], range_unroll_factors=[3, 0], range_warp_specializes=[]), static_shapes=True) 2026-02-21T08:52:48.1868466Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:52:48.1868850Z `ptxas` stderr: 2026-02-21T08:52:48.1869408Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 413 in function _helion_matmul_bf16_int4. Try to compile with register target of 38 or higher. 2026-02-21T08:52:48.1870051Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:52:48.1870230Z 2026-02-21T08:52:48.1870736Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp311q_6sc.ptx -o /tmp/tmp311q_6sc.ptx.o 2026-02-21T08:52:48.1871307Z 2026-02-21T08:52:48.1871464Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:52:48.1871920Z .loc 1 19 144 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:19:144 2026-02-21T08:52:48.1872303Z sub.s32 %r188, 4279, %r1402; 2026-02-21T08:52:48.1872505Z mul.hi.u32 %r189, %r188, 1041204193; 2026-02-21T08:52:48.1872706Z shr.u32 %r190, %r189, 10; 2026-02-21T08:52:48.1872881Z mul.hi.u32 %r191, %r190, 1431655766; 2026-02-21T08:52:48.1873091Z mad.lo.s32 %r1439, %r191, 12672, %r1402; 2026-02-21T08:52:48.1873438Z .loc 1 31 45 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:31:45 2026-02-21T08:52:48.1873794Z mov.u32 %r3, %tid.x; 2026-02-21T08:52:48.1873955Z shr.u32 %r4, %r3, 5; 2026-02-21T08:52:48.1874117Z and.b32 %r5, %r3, 31; 2026-02-21T08:52:48.1874279Z shl.b32 %r6, %r5, 2; 2026-02-21T08:52:48.1874440Z and.b32 %r7, %r3, 15; 2026-02-21T08:52:48.1874597Z shl.b32 %r8, %r7, 3; 2026-02-21T08:52:48.1874916Z .loc 1 33 45 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:33:45 2026-02-21T08:52:48.1875439Z and.b32 %r9, %r3, 1008; 2026-02-21T08:52:48.1875607Z shr.u32 %r10, %r3, 4; 2026-02-21T08:52:48.1875767Z shl.b32 %r11, %r7, 2; 2026-02-21T08:52:48.1876071Z .loc 1 65 38 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:65:38 2026-02-21T08:52:48.1876445Z and.b32 %r12, %r3, 128; 2026-02-21T08:52:48.1876902Z .loc 1 19 144 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:19:144 2026-02-21T08:52:48.1877271Z setp.ge.s32 %p1, %r1402, %r1439; 2026-02-21T08:52:48.1877471Z shl.b32 %r1384, %r3, 3; 2026-02-21T08:52:48.1877642Z shr.u32 %r1385, %r3, 1; 2026-02-21T08:52:48.1877810Z mov.b32 %r1386, global_smem; 2026-02-21T08:52:48.1878157Z mul.lo.s32 %r1387, %r4, 7168; 2026-02-21T08:52:48.1878351Z shl.b32 %r1388, %r3, 2; 2026-02-21T08:52:48.1878528Z shl.b32 %r1389, %r3, 6; 2026-02-21T08:52:48.1878692Z shl.b32 %r1390, %r3, 5; 2026-02-21T08:52:48.1878868Z shl.b32 %r1391, %r5, 1; 2026-02-21T08:52:48.1879030Z and.b32 %r1392, %r3, 127; 2026-02-21T08:52:48.1879205Z shl.b32 %r1393, %r3, 4; 2026-02-21T08:52:48.1879378Z 
and.b32 %r1394, %r4, 28; 2026-02-21T08:52:48.1879556Z and.b32 %r1395, %r3, 7; 2026-02-21T08:52:48.1879719Z shl.b32 %r1396, %r7, 4; 2026-02-21T08:52:48.1879878Z shr.u32 %r1397, %r3, 2; 2026-02-21T08:52:48.1880049Z and.b32 %r1398, %r3, 16; 2026-02-21T08:52:48.1880213Z shl.b32 %r1399, %r3, 1; 2026-02-21T08:52:48.1880377Z shl.b32 %r1400, %r3, 7; 2026-02-21T08:52:48.1880538Z shl.b32 %r1401, %r10, 13; 2026-02-21T08:52:48.1880719Z setp.eq.b32 %p57, %r12, 0; 2026-02-21T08:52:48.1880911Z @%p1 bra $L__BB0_9; 2026-02-21T08:52:48.1881098Z // %bb.1: // %.lr.ph 2026-02-21T08:52:48.1881478Z .loc 1 0 144 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:0:144 2026-02-21T08:52:48.1881838Z and.b32 %r194, %r1385, 56; 2026-02-21T08:52:48.1882026Z xor.b32 %r195, %r194, %r1384; 2026-02-21T08:52:48.1882206Z add.s32 %r197, %r1386, %r195; 2026-02-21T08:52:48.1882391Z add.s32 %r13, %r197, 32768; 2026-02-21T08:52:48.1882567Z add.s32 %r199, %r1386, 49152; 2026-02-21T08:52:48.1882751Z add.s32 %r15, %r199, %r1388; 2026-02-21T08:52:48.1882926Z add.s32 %r16, %r197, 40960; 2026-02-21T08:52:48.1883105Z add.s32 %r17, %r1387, 229376; 2026-02-21T08:52:48.1883281Z add.s32 %r200, %r1386, %r1388; 2026-02-21T08:52:48.1883470Z add.s32 %r18, %r200, 53248; 2026-02-21T08:52:48.1883646Z and.b32 %r202, %r1389, 6144; 2026-02-21T08:52:48.1883835Z and.b32 %r204, %r1390, 896; 2026-02-21T08:52:48.1884017Z or.b32 %r206, %r202, %r204; 2026-02-21T08:52:48.1884188Z or.b32 %r19, %r206, %r1391; 2026-02-21T08:52:48.1884368Z xor.b32 %r20, %r19, 8; 2026-02-21T08:52:48.1884538Z xor.b32 %r21, %r19, 16; 2026-02-21T08:52:48.1884706Z xor.b32 %r22, %r19, 24; 2026-02-21T08:52:48.1884869Z xor.b32 %r23, %r19, 32; 2026-02-21T08:52:48.1885038Z xor.b32 %r24, %r19, 40; 2026-02-21T08:52:48.1885201Z xor.b32 %r25, %r19, 48; 2026-02-21T08:52:48.1885369Z xor.b32 %r26, %r19, 56; 2026-02-21T08:52:48.1885544Z and.b32 %r208, %r1385, 384; 2026-02-21T08:52:48.1885715Z add.s32 %r209, %r199, %r208; 2026-02-21T08:52:48.1885893Z add.s32 %r27, 
%r209, %r1392; 2026-02-21T08:52:48.1886065Z shl.b32 %r210, %r1392, 7; 2026-02-21T08:52:48.1886244Z and.b32 %r212, %r1393, 112; 2026-02-21T08:52:48.1886413Z or.b32 %r214, %r210, %r212; 2026-02-21T08:52:48.1886734Z xor.b32 %r215, %r214, %r1394; 2026-02-21T08:52:48.1886914Z add.s32 %r28, %r1386, %r215; 2026-02-21T08:52:48.1887094Z xor.b32 %r216, %r215, 32; 2026-02-21T08:52:48.1887270Z add.s32 %r29, %r1386, %r216; 2026-02-21T08:52:48.1887443Z xor.b32 %r217, %r215, 64; 2026-02-21T08:52:48.1887618Z add.s32 %r30, %r1386, %r217; 2026-02-21T08:52:48.1887796Z xor.b32 %r218, %r215, 96; 2026-02-21T08:52:48.1887972Z add.s32 %r31, %r1386, %r218; 2026-02-21T08:52:48.1888151Z shl.b32 %r220, %r1395, 11; 2026-02-21T08:52:48.1888511Z and.b32 %r222, %r3, 96; 2026-02-21T08:52:48.1888677Z shl.b32 %r223, %r222, 3; 2026-02-21T08:52:48.1888850Z and.b32 %r225, %r1397, 96; 2026-02-21T08:52:48.1889025Z and.b32 %r228, %r1399, 1024; 2026-02-21T08:52:48.1889211Z or.b32 %r229, %r1396, %r223; 2026-02-21T08:52:48.1889387Z or.b32 %r230, %r225, %r1398; 2026-02-21T08:52:48.1889560Z xor.b32 %r231, %r229, %r230; 2026-02-21T08:52:48.1889740Z add.s32 %r232, %r1386, %r220; 2026-02-21T08:52:48.1889917Z add.s32 %r233, %r232, %r228; 2026-02-21T08:52:48.1890093Z add.s32 %r32, %r233, %r231; 2026-02-21T08:52:48.1890269Z and.b32 %r235, %r1400, 15360; 2026-02-21T08:52:48.1890466Z shl.b32 %r236, %r1395, 4; 2026-02-21T08:52:48.1890640Z xor.b32 %r237, %r236, %r9; 2026-02-21T08:52:48.1890827Z add.s32 %r238, %r1386, %r235; 2026-02-21T08:52:48.1891142Z add.s32 %r33, %r238, %r237; 2026-02-21T08:52:48.1891327Z shl.b32 %r239, %r222, 6; 2026-02-21T08:52:48.1891502Z or.b32 %r240, %r239, %r204; 2026-02-21T08:52:48.1891672Z or.b32 %r34, %r240, %r1391; 2026-02-21T08:52:48.1891855Z xor.b32 %r35, %r34, 8; 2026-02-21T08:52:48.1892022Z xor.b32 %r36, %r34, 16; 2026-02-21T08:52:48.1892191Z xor.b32 %r37, %r34, 24; 2026-02-21T08:52:48.1892354Z xor.b32 %r38, %r34, 32; 2026-02-21T08:52:48.1892532Z xor.b32 %r39, %r34, 40; 
2026-02-21T08:52:48.1892696Z xor.b32 %r40, %r34, 48; 2026-02-21T08:52:48.1892860Z xor.b32 %r41, %r34, 56; 2026-02-21T08:52:48.1893023Z or.b32 %r241, %r210, %r236; 2026-02-21T08:52:48.1893202Z xor.b32 %r242, %r241, %r1394; 2026-02-21T08:52:48.1893392Z add.s32 %r42, %r1386, %r242; 2026-02-21T08:52:48.1893567Z xor.b32 %r243, %r242, 32; 2026-02-21T08:52:48.1893742Z add.s32 %r43, %r1386, %r243; 2026-02-21T08:52:48.1893914Z xor.b32 %r244, %r242, 64; 2026-02-21T08:52:48.1894087Z add.s32 %r44, %r1386, %r244; 2026-02-21T08:52:48.1894261Z xor.b32 %r245, %r242, 96; 2026-02-21T08:52:48.1894435Z add.s32 %r45, %r1386, %r245; 2026-02-21T08:52:48.1894765Z .loc 1 40 103 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:40:103 2026-02-21T08:52:48.1895144Z or.b32 %r246, %r1387, %r6; 2026-02-21T08:52:48.1895332Z add.s32 %r46, %r246, 458752; 2026-02-21T08:52:48.1895516Z or.b32 %r248, %r1401, %r11; 2026-02-21T08:52:48.1895694Z or.b32 %r47, %r248, 128; 2026-02-21T08:52:48.1895866Z cvt.u64.u32 %rd1, %r11; 2026-02-21T08:52:48.1896098Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T08:52:48.1896386Z // Child Loop BB0_3 Depth 2 2026-02-21T08:52:48.1896812Z // Child Loop BB0_5 Depth 2 2026-02-21T08:52:48.1897085Z // Child Loop BB0_7 Depth 2 2026-02-21T08:52:48.1897469Z .loc 1 25 35 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:25:35 2026-02-21T08:52:48.1897842Z shr.u32 %r260, %r1402, 31; 2026-02-21T08:52:48.1898027Z add.s32 %r261, %r1402, %r260; 2026-02-21T08:52:48.1898358Z .loc 1 26 33 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:26:33 2026-02-21T08:52:48.1898719Z and.b32 %r262, %r261, -2; 2026-02-21T08:52:48.1899042Z .loc 1 27 39 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:27:39 2026-02-21T08:52:48.1899397Z sub.s32 %r263, 56, %r262; 2026-02-21T08:52:48.1899712Z .loc 1 27 52 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:27:52 2026-02-21T08:52:48.1900088Z min.s32 %r264, %r263, 2; 2026-02-21T08:52:48.1900403Z .loc 1 28 45 // 
cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:28:45 2026-02-21T08:52:48.1900759Z sub.s32 %r265, %r1402, %r262; 2026-02-21T08:52:48.1901083Z .loc 1 29 51 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:29:51 2026-02-21T08:52:48.1901440Z div.s32 %r266, %r265, %r264; 2026-02-21T08:52:48.1901770Z .loc 1 28 64 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:28:64 2026-02-21T08:52:48.1902274Z mul.lo.s32 %r267, %r266, %r264; 2026-02-21T08:52:48.1902468Z sub.s32 %r268, %r265, %r267; 2026-02-21T08:52:48.1902782Z .loc 1 28 30 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:28:30 2026-02-21T08:52:48.1903136Z add.s32 %r269, %r268, %r262; 2026-02-21T08:52:48.1903448Z .loc 1 30 27 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:30:27 2026-02-21T08:52:48.1903815Z shl.b32 %r73, %r269, 7; 2026-02-21T08:52:48.1904131Z .loc 1 31 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:31:32 2026-02-21T08:52:48.1904483Z or.b32 %r270, %r73, %r6; 2026-02-21T08:52:48.1904795Z .loc 1 32 27 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:32:27 2026-02-21T08:52:48.1905270Z shl.b32 %r271, %r266, 6; 2026-02-21T08:52:48.1905582Z .loc 1 33 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:33:32 2026-02-21T08:52:48.1905935Z or.b32 %r74, %r271, %r10; 2026-02-21T08:52:48.1906266Z .loc 1 48 53 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:53 2026-02-21T08:52:48.1906737Z shl.b32 %r272, %r74, 13; 2026-02-21T08:52:48.1907044Z .loc 1 48 60 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:60 2026-02-21T08:52:48.1907398Z or.b32 %r273, %r272, %r11; 2026-02-21T08:52:48.1907724Z .loc 1 48 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:32 2026-02-21T08:52:48.1908089Z mad.wide.s32 %rd13, %r273, 2, %rd10; 2026-02-21T08:52:48.1908287Z mov.b32 %r250, 8; 2026-02-21T08:52:48.1908680Z .loc 1 48 80 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:80 
2026-02-21T08:52:48.1909042Z // begin inline asm 2026-02-21T08:52:48.1909277Z cp.async.ca.shared.global [ %r13 + 0 ], [ %rd13 + 0 ], 0x8, %r250; 2026-02-21T08:52:48.1909557Z // end inline asm 2026-02-21T08:52:48.1909722Z cp.async.commit_group; 2026-02-21T08:52:48.1910043Z .loc 1 54 62 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:62 2026-02-21T08:52:48.1910398Z add.s32 %r274, %r270, %r1387; 2026-02-21T08:52:48.1910722Z .loc 1 54 34 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:34 2026-02-21T08:52:48.1911076Z cvt.s64.s32 %rd18, %r274; 2026-02-21T08:52:48.1911255Z add.s64 %rd14, %rd11, %rd18; 2026-02-21T08:52:48.1911436Z mov.b32 %r252, 4; 2026-02-21T08:52:48.1911728Z .loc 1 54 87 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:87 2026-02-21T08:52:48.1912080Z // begin inline asm 2026-02-21T08:52:48.1912308Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd14 + 0 ], 0x4, %r252; 2026-02-21T08:52:48.1912598Z // end inline asm 2026-02-21T08:52:48.1912760Z cp.async.commit_group; 2026-02-21T08:52:48.1913078Z .loc 1 48 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:32 2026-02-21T08:52:48.1913438Z cvt.s64.s32 %rd19, %r272; 2026-02-21T08:52:48.1913615Z or.b64 %rd20, %rd19, %rd1; 2026-02-21T08:52:48.1913797Z shl.b64 %rd21, %rd20, 1; 2026-02-21T08:52:48.1913969Z add.s64 %rd22, %rd10, %rd21; 2026-02-21T08:52:48.1914153Z add.s64 %rd15, %rd22, 128; 2026-02-21T08:52:48.1914466Z .loc 1 48 80 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:80 2026-02-21T08:52:48.1914819Z bar.sync 0; 2026-02-21T08:52:48.1914967Z // begin inline asm 2026-02-21T08:52:48.1915197Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd15 + 0 ], 0x8, %r250; 2026-02-21T08:52:48.1915469Z // end inline asm 2026-02-21T08:52:48.1915624Z cp.async.commit_group; 2026-02-21T08:52:48.1915945Z .loc 1 54 62 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:62 2026-02-21T08:52:48.1916296Z add.s32 %r275, %r270, %r17; 2026-02-21T08:52:48.1916745Z 
.loc 1 54 34 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:34 2026-02-21T08:52:48.1917251Z cvt.s64.s32 %rd23, %r275; 2026-02-21T08:52:48.1917427Z add.s64 %rd16, %rd11, %rd23; 2026-02-21T08:52:48.1917738Z .loc 1 54 87 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:87 2026-02-21T08:52:48.1918089Z // begin inline asm 2026-02-21T08:52:48.1918316Z cp.async.ca.shared.global [ %r18 + 0 ], [ %rd16 + 0 ], 0x4, %r252; 2026-02-21T08:52:48.1918593Z // end inline asm 2026-02-21T08:52:48.1918757Z cp.async.commit_group; 2026-02-21T08:52:48.1919074Z .loc 1 40 103 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:40:103 2026-02-21T08:52:48.1919438Z shl.b32 %r276, %r1402, 7; 2026-02-21T08:52:48.1919611Z add.s32 %r277, %r46, %r276; 2026-02-21T08:52:48.1919792Z shl.b32 %r278, %r267, 7; 2026-02-21T08:52:48.1920096Z sub.s32 %r1404, %r277, %r278; 2026-02-21T08:52:48.1920278Z shl.b32 %r279, %r266, 19; 2026-02-21T08:52:48.1920450Z or.b32 %r1403, %r47, %r279; 2026-02-21T08:52:48.1920629Z mov.b32 %r1407, 0f00000000; 2026-02-21T08:52:48.1920807Z mov.b32 %r1406, 1; 2026-02-21T08:52:48.1920960Z mov.b32 %r1405, -1; 2026-02-21T08:52:48.1921122Z mov.b64 %rd133, -32; 2026-02-21T08:52:48.1921284Z mov.b32 %r1408, %r1407; 2026-02-21T08:52:48.1921457Z mov.b32 %r1409, %r1407; 2026-02-21T08:52:48.1921623Z mov.b32 %r1410, %r1407; 2026-02-21T08:52:48.1921791Z mov.b32 %r1411, %r1407; 2026-02-21T08:52:48.1921958Z mov.b32 %r1412, %r1407; 2026-02-21T08:52:48.1922120Z mov.b32 %r1413, %r1407; 2026-02-21T08:52:48.1922288Z mov.b32 %r1414, %r1407; 2026-02-21T08:52:48.1922503Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:48.1922802Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:48.1923074Z add.s64 %rd133, %rd133, 32; 2026-02-21T08:52:48.1923267Z setp.lt.u64 %p11, %rd133, 4032; 2026-02-21T08:52:48.1923458Z add.s32 %r466, %r1405, 1; 2026-02-21T08:52:48.1923648Z setp.gt.s32 %p12, %r466, 1; 2026-02-21T08:52:48.1931917Z selp.b32 %r1405, 0, %r466, %p12; 
2026-02-21T08:52:48.1932331Z .loc 1 48 80 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:80 2026-02-21T08:52:48.1932720Z cp.async.wait_group 2; 2026-02-21T08:52:48.1932901Z bar.sync 0; 2026-02-21T08:52:48.1933065Z shl.b32 %r467, %r1405, 12; 2026-02-21T08:52:48.1933256Z shl.b32 %r468, %r1405, 13; 2026-02-21T08:52:48.1933448Z add.s32 %r469, %r1386, %r468; 2026-02-21T08:52:48.1933636Z add.s32 %r470, %r469, 32768; 2026-02-21T08:52:48.1933978Z .loc 1 52 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:52:32 2026-02-21T08:52:48.1934349Z add.s32 %r471, %r470, %r19; 2026-02-21T08:52:48.1934549Z ld.shared.b16 %rs1, [%r471]; 2026-02-21T08:52:48.1934755Z ld.shared.b16 %rs2, [%r471+1024]; 2026-02-21T08:52:48.1934965Z ld.shared.b16 %rs3, [%r471+64]; 2026-02-21T08:52:48.1935182Z ld.shared.b16 %rs4, [%r471+1088]; 2026-02-21T08:52:48.1935387Z add.s32 %r472, %r470, %r20; 2026-02-21T08:52:48.1935590Z ld.shared.b16 %rs5, [%r472]; 2026-02-21T08:52:48.1935785Z ld.shared.b16 %rs6, [%r472+1024]; 2026-02-21T08:52:48.1935988Z ld.shared.b16 %rs7, [%r472+64]; 2026-02-21T08:52:48.1936186Z ld.shared.b16 %rs8, [%r472+1088]; 2026-02-21T08:52:48.1936383Z add.s32 %r473, %r470, %r21; 2026-02-21T08:52:48.1936745Z ld.shared.b16 %rs9, [%r473]; 2026-02-21T08:52:48.1936946Z ld.shared.b16 %rs10, [%r473+1024]; 2026-02-21T08:52:48.1937160Z ld.shared.b16 %rs11, [%r473+64]; 2026-02-21T08:52:48.1937359Z ld.shared.b16 %rs12, [%r473+1088]; 2026-02-21T08:52:48.1937564Z add.s32 %r474, %r470, %r22; 2026-02-21T08:52:48.1937760Z ld.shared.b16 %rs13, [%r474]; 2026-02-21T08:52:48.1937959Z ld.shared.b16 %rs14, [%r474+1024]; 2026-02-21T08:52:48.1938157Z ld.shared.b16 %rs15, [%r474+64]; 2026-02-21T08:52:48.1938367Z ld.shared.b16 %rs16, [%r474+1088]; 2026-02-21T08:52:48.1938565Z add.s32 %r475, %r470, %r23; 2026-02-21T08:52:48.1938754Z ld.shared.b16 %rs17, [%r475]; 2026-02-21T08:52:48.1939168Z ld.shared.b16 %rs18, [%r475+1024]; 2026-02-21T08:52:48.1939378Z ld.shared.b16 %rs19, [%r475+64]; 
2026-02-21T08:52:48.1939587Z ld.shared.b16 %rs20, [%r475+1088]; 2026-02-21T08:52:48.1939803Z add.s32 %r476, %r470, %r24; 2026-02-21T08:52:48.1939997Z ld.shared.b16 %rs21, [%r476]; 2026-02-21T08:52:48.1940217Z ld.shared.b16 %rs22, [%r476+1024]; 2026-02-21T08:52:48.1940436Z ld.shared.b16 %rs23, [%r476+64]; 2026-02-21T08:52:48.1940634Z ld.shared.b16 %rs24, [%r476+1088]; 2026-02-21T08:52:48.1940839Z add.s32 %r477, %r470, %r25; 2026-02-21T08:52:48.1941027Z ld.shared.b16 %rs25, [%r477]; 2026-02-21T08:52:48.1941223Z ld.shared.b16 %rs26, [%r477+1024]; 2026-02-21T08:52:48.1941419Z ld.shared.b16 %rs27, [%r477+64]; 2026-02-21T08:52:48.1941619Z ld.shared.b16 %rs28, [%r477+1088]; 2026-02-21T08:52:48.1941985Z add.s32 %r478, %r470, %r26; 2026-02-21T08:52:48.1942180Z ld.shared.b16 %rs29, [%r478]; 2026-02-21T08:52:48.1942371Z ld.shared.b16 %rs30, [%r478+1024]; 2026-02-21T08:52:48.1942573Z ld.shared.b16 %rs31, [%r478+64]; 2026-02-21T08:52:48.1942776Z ld.shared.b16 %rs32, [%r478+1088]; 2026-02-21T08:52:48.1942975Z cvt.f32.bf16 %r296, %rs1; 2026-02-21T08:52:48.1943167Z cvt.f32.bf16 %r297, %rs2; 2026-02-21T08:52:48.1943346Z cvt.f32.bf16 %r298, %rs5; 2026-02-21T08:52:48.1943528Z cvt.f32.bf16 %r299, %rs6; 2026-02-21T08:52:48.1943713Z cvt.f32.bf16 %r316, %rs9; 2026-02-21T08:52:48.1943896Z cvt.f32.bf16 %r317, %rs10; 2026-02-21T08:52:48.1944075Z cvt.f32.bf16 %r318, %rs13; 2026-02-21T08:52:48.1944256Z cvt.f32.bf16 %r319, %rs14; 2026-02-21T08:52:48.1944436Z cvt.f32.bf16 %r336, %rs17; 2026-02-21T08:52:48.1944610Z cvt.f32.bf16 %r337, %rs18; 2026-02-21T08:52:48.1944790Z cvt.f32.bf16 %r338, %rs21; 2026-02-21T08:52:48.1944962Z cvt.f32.bf16 %r339, %rs22; 2026-02-21T08:52:48.1945146Z cvt.f32.bf16 %r356, %rs25; 2026-02-21T08:52:48.1945321Z cvt.f32.bf16 %r357, %rs26; 2026-02-21T08:52:48.1945501Z cvt.f32.bf16 %r358, %rs29; 2026-02-21T08:52:48.1945672Z cvt.f32.bf16 %r359, %rs30; 2026-02-21T08:52:48.1945854Z cvt.f32.bf16 %r376, %rs3; 2026-02-21T08:52:48.1946026Z cvt.f32.bf16 %r377, %rs4; 
2026-02-21T08:52:48.1946204Z cvt.f32.bf16 %r378, %rs7; 2026-02-21T08:52:48.1946380Z cvt.f32.bf16 %r379, %rs8; 2026-02-21T08:52:48.1946697Z cvt.f32.bf16 %r396, %rs11; 2026-02-21T08:52:48.1946882Z cvt.f32.bf16 %r397, %rs12; 2026-02-21T08:52:48.1947056Z cvt.f32.bf16 %r398, %rs15; 2026-02-21T08:52:48.1947239Z cvt.f32.bf16 %r399, %rs16; 2026-02-21T08:52:48.1947413Z cvt.f32.bf16 %r416, %rs19; 2026-02-21T08:52:48.1947593Z cvt.f32.bf16 %r417, %rs20; 2026-02-21T08:52:48.1947766Z cvt.f32.bf16 %r418, %rs23; 2026-02-21T08:52:48.1947944Z cvt.f32.bf16 %r419, %rs24; 2026-02-21T08:52:48.1948118Z cvt.f32.bf16 %r436, %rs27; 2026-02-21T08:52:48.1948300Z cvt.f32.bf16 %r437, %rs28; 2026-02-21T08:52:48.1948484Z cvt.f32.bf16 %r438, %rs31; 2026-02-21T08:52:48.1948749Z cvt.f32.bf16 %r439, %rs32; 2026-02-21T08:52:48.1949096Z .loc 1 67 45 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:67:45 2026-02-21T08:52:48.1949471Z add.s32 %r479, %r27, %r467; 2026-02-21T08:52:48.1949670Z ld.shared.b8 %rs33, [%r479]; 2026-02-21T08:52:48.1949865Z ld.shared.b8 %rs34, [%r479+512]; 2026-02-21T08:52:48.1950080Z ld.shared.b8 %rs35, [%r479+1024]; 2026-02-21T08:52:48.1950284Z ld.shared.b8 %rs36, [%r479+1536]; 2026-02-21T08:52:48.1950487Z ld.shared.b8 %rs37, [%r479+2048]; 2026-02-21T08:52:48.1950685Z ld.shared.b8 %rs38, [%r479+2560]; 2026-02-21T08:52:48.1950880Z ld.shared.b8 %rs39, [%r479+3072]; 2026-02-21T08:52:48.1951080Z ld.shared.b8 %rs40, [%r479+3584]; 2026-02-21T08:52:48.1951414Z .loc 1 57 28 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:57:28 2026-02-21T08:52:48.1951780Z shl.b16 %rs41, %rs33, 4; 2026-02-21T08:52:48.1951957Z shl.b16 %rs42, %rs34, 4; 2026-02-21T08:52:48.1952140Z shl.b16 %rs43, %rs35, 4; 2026-02-21T08:52:48.1952327Z shl.b16 %rs44, %rs36, 4; 2026-02-21T08:52:48.1952505Z shl.b16 %rs45, %rs37, 4; 2026-02-21T08:52:48.1952825Z shl.b16 %rs46, %rs38, 4; 2026-02-21T08:52:48.1952995Z shl.b16 %rs47, %rs39, 4; 2026-02-21T08:52:48.1953168Z shl.b16 %rs48, %rs40, 4; 
2026-02-21T08:52:48.1953480Z .loc 1 72 58 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:72:58 2026-02-21T08:52:48.1953850Z selp.b16 %rs49, %rs41, %rs33, %p57; 2026-02-21T08:52:48.1954054Z cvt.s16.s8 %rs50, %rs49; 2026-02-21T08:52:48.1954230Z shr.s16 %rs51, %rs50, 4; 2026-02-21T08:52:48.1954409Z selp.b16 %rs52, %rs42, %rs34, %p57; 2026-02-21T08:52:48.1954613Z cvt.s16.s8 %rs53, %rs52; 2026-02-21T08:52:48.1954788Z shr.s16 %rs54, %rs53, 4; 2026-02-21T08:52:48.1954964Z selp.b16 %rs55, %rs43, %rs35, %p57; 2026-02-21T08:52:48.1955184Z cvt.s16.s8 %rs56, %rs55; 2026-02-21T08:52:48.1955486Z shr.s16 %rs57, %rs56, 4; 2026-02-21T08:52:48.1955688Z selp.b16 %rs58, %rs44, %rs36, %p57; 2026-02-21T08:52:48.1955889Z cvt.s16.s8 %rs59, %rs58; 2026-02-21T08:52:48.1956065Z shr.s16 %rs60, %rs59, 4; 2026-02-21T08:52:48.1956247Z selp.b16 %rs61, %rs45, %rs37, %p57; 2026-02-21T08:52:48.1956566Z cvt.s16.s8 %rs62, %rs61; 2026-02-21T08:52:48.1956749Z shr.s16 %rs63, %rs62, 4; 2026-02-21T08:52:48.1956931Z selp.b16 %rs64, %rs46, %rs38, %p57; 2026-02-21T08:52:48.1957132Z cvt.s16.s8 %rs65, %rs64; 2026-02-21T08:52:48.1957301Z shr.s16 %rs66, %rs65, 4; 2026-02-21T08:52:48.1957488Z selp.b16 %rs67, %rs47, %rs39, %p57; 2026-02-21T08:52:48.1957685Z cvt.s16.s8 %rs68, %rs67; 2026-02-21T08:52:48.1957857Z shr.s16 %rs69, %rs68, 4; 2026-02-21T08:52:48.1958031Z selp.b16 %rs70, %rs48, %rs40, %p57; 2026-02-21T08:52:48.1958233Z cvt.s16.s8 %rs71, %rs70; 2026-02-21T08:52:48.1958402Z shr.s16 %rs72, %rs71, 4; 2026-02-21T08:52:48.1958752Z .loc 1 77 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:77:32 2026-02-21T08:52:48.1959126Z cvt.rn.f32.s16 %r480, %rs51; 2026-02-21T08:52:48.1959311Z cvt.rn.f32.s16 %r481, %rs54; 2026-02-21T08:52:48.1959500Z cvt.rn.f32.s16 %r482, %rs57; 2026-02-21T08:52:48.1959684Z cvt.rn.f32.s16 %r483, %rs60; 2026-02-21T08:52:48.1959872Z cvt.rn.f32.s16 %r484, %rs63; 2026-02-21T08:52:48.1960051Z cvt.rn.f32.s16 %r485, %rs66; 2026-02-21T08:52:48.1960238Z cvt.rn.f32.s16 %r486, 
%rs69; 2026-02-21T08:52:48.1960417Z cvt.rn.f32.s16 %r487, %rs72; 2026-02-21T08:52:48.1960611Z st.shared.b32 [%r28], %r480; 2026-02-21T08:52:48.1960804Z st.shared.b32 [%r28+16384], %r484; 2026-02-21T08:52:48.1961007Z st.shared.b32 [%r29], %r481; 2026-02-21T08:52:48.1961207Z st.shared.b32 [%r29+16384], %r485; 2026-02-21T08:52:48.1961414Z st.shared.b32 [%r30], %r482; 2026-02-21T08:52:48.1961611Z st.shared.b32 [%r30+16384], %r486; 2026-02-21T08:52:48.1961806Z st.shared.b32 [%r31], %r483; 2026-02-21T08:52:48.1961998Z st.shared.b32 [%r31+16384], %r487; 2026-02-21T08:52:48.1962189Z $L__tmp1: 2026-02-21T08:52:48.1962564Z .loc 2 291 36 // standard.py:291:36 @[ cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:84:40 ] 2026-02-21T08:52:48.1962998Z // begin inline asm 2026-02-21T08:52:48.1963206Z fence.proxy.async.shared::cta; 2026-02-21T08:52:48.1963409Z // end inline asm 2026-02-21T08:52:48.1963561Z bar.sync 0; 2026-02-21T08:52:48.1963740Z shfl.sync.idx.b32 %r488, %r4, 0, 31, -1; 2026-02-21T08:52:48.1963966Z wgmma.fence.sync.aligned; 2026-02-21T08:52:48.1964155Z shl.b32 %r489, %r488, 9; 2026-02-21T08:52:48.1964329Z and.b32 %r490, %r489, 14336; 2026-02-21T08:52:48.1964516Z add.s32 %r491, %r490, %r1386; 2026-02-21T08:52:48.1964702Z bfe.u32 %r492, %r491, 4, 14; 2026-02-21T08:52:48.1964889Z cvt.u64.u32 %rd34, %r492; 2026-02-21T08:52:48.1965085Z or.b64 %rd24, %rd34, 4611686293372403712; 2026-02-21T08:52:48.1965300Z mov.pred %p2, -1; 2026-02-21T08:52:48.1965474Z // begin inline asm 2026-02-21T08:52:48.1965950Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1407,%r1408,%r1409,%r1410,%r1411,%r1412,%r1413,%r1414}, {%r296,%r297,%r298,%r299}, %rd24, %p2, 1, 1; 2026-02-21T08:52:48.1966582Z // end inline asm 2026-02-21T08:52:48.1966918Z add.s32 %r493, %r491, 32; 2026-02-21T08:52:48.1967102Z bfe.u32 %r494, %r493, 4, 14; 2026-02-21T08:52:48.1967280Z cvt.u64.u32 %rd35, %r494; 2026-02-21T08:52:48.1967471Z or.b64 %rd25, %rd35, 4611686293372403712; 2026-02-21T08:52:48.1967684Z // 
begin inline asm 2026-02-21T08:52:48.1968151Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1407,%r1408,%r1409,%r1410,%r1411,%r1412,%r1413,%r1414}, {%r316,%r317,%r318,%r319}, %rd25, %p2, 1, 1; 2026-02-21T08:52:48.1968674Z // end inline asm 2026-02-21T08:52:48.1968832Z add.s32 %r495, %r491, 64; 2026-02-21T08:52:48.1969012Z bfe.u32 %r496, %r495, 4, 14; 2026-02-21T08:52:48.1969192Z cvt.u64.u32 %rd36, %r496; 2026-02-21T08:52:48.1969380Z or.b64 %rd26, %rd36, 4611686293372403712; 2026-02-21T08:52:48.1969584Z // begin inline asm 2026-02-21T08:52:48.1970176Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1407,%r1408,%r1409,%r1410,%r1411,%r1412,%r1413,%r1414}, {%r336,%r337,%r338,%r339}, %rd26, %p2, 1, 1; 2026-02-21T08:52:48.1970695Z // end inline asm 2026-02-21T08:52:48.1970855Z add.s32 %r497, %r491, 96; 2026-02-21T08:52:48.1971035Z bfe.u32 %r498, %r497, 4, 14; 2026-02-21T08:52:48.1971214Z cvt.u64.u32 %rd37, %r498; 2026-02-21T08:52:48.1971399Z or.b64 %rd27, %rd37, 4611686293372403712; 2026-02-21T08:52:48.1971605Z // begin inline asm 2026-02-21T08:52:48.1972067Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1407,%r1408,%r1409,%r1410,%r1411,%r1412,%r1413,%r1414}, {%r356,%r357,%r358,%r359}, %rd27, %p2, 1, 1; 2026-02-21T08:52:48.1972581Z // end inline asm 2026-02-21T08:52:48.1972735Z add.s32 %r499, %r491, 16384; 2026-02-21T08:52:48.1972926Z bfe.u32 %r500, %r499, 4, 14; 2026-02-21T08:52:48.1973105Z cvt.u64.u32 %rd38, %r500; 2026-02-21T08:52:48.1973288Z or.b64 %rd28, %rd38, 4611686293372403712; 2026-02-21T08:52:48.1973493Z // begin inline asm 2026-02-21T08:52:48.1973954Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1407,%r1408,%r1409,%r1410,%r1411,%r1412,%r1413,%r1414}, {%r376,%r377,%r378,%r379}, %rd28, %p2, 1, 1; 2026-02-21T08:52:48.1974466Z // end inline asm 2026-02-21T08:52:48.1974616Z add.s32 %r501, %r491, 16416; 2026-02-21T08:52:48.1974794Z bfe.u32 %r502, %r501, 4, 14; 2026-02-21T08:52:48.1974970Z cvt.u64.u32 %rd39, %r502; 
2026-02-21T08:52:48.1975157Z or.b64 %rd29, %rd39, 4611686293372403712; 2026-02-21T08:52:48.1975359Z // begin inline asm 2026-02-21T08:52:48.1975819Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1407,%r1408,%r1409,%r1410,%r1411,%r1412,%r1413,%r1414}, {%r396,%r397,%r398,%r399}, %rd29, %p2, 1, 1; 2026-02-21T08:52:48.1976330Z // end inline asm 2026-02-21T08:52:48.1976607Z add.s32 %r503, %r491, 16448; 2026-02-21T08:52:48.1976796Z bfe.u32 %r504, %r503, 4, 14; 2026-02-21T08:52:48.1976970Z cvt.u64.u32 %rd40, %r504; 2026-02-21T08:52:48.1977162Z or.b64 %rd30, %rd40, 4611686293372403712; 2026-02-21T08:52:48.1977366Z // begin inline asm 2026-02-21T08:52:48.1977824Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1407,%r1408,%r1409,%r1410,%r1411,%r1412,%r1413,%r1414}, {%r416,%r417,%r418,%r419}, %rd30, %p2, 1, 1; 2026-02-21T08:52:48.1978330Z // end inline asm 2026-02-21T08:52:48.1978487Z add.s32 %r505, %r491, 16480; 2026-02-21T08:52:48.1978668Z bfe.u32 %r506, %r505, 4, 14; 2026-02-21T08:52:48.1978844Z cvt.u64.u32 %rd41, %r506; 2026-02-21T08:52:48.1979032Z or.b64 %rd31, %rd41, 4611686293372403712; 2026-02-21T08:52:48.1979236Z // begin inline asm 2026-02-21T08:52:48.1979697Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1407,%r1408,%r1409,%r1410,%r1411,%r1412,%r1413,%r1414}, {%r436,%r437,%r438,%r439}, %rd31, %p2, 1, 1; 2026-02-21T08:52:48.1980202Z // end inline asm 2026-02-21T08:52:48.1980377Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:48.1980582Z mov.b32 %r450, 0; 2026-02-21T08:52:48.1980737Z mov.b32 %r449, %r450; 2026-02-21T08:52:48.1980911Z mov.b32 %r448, %r1386; 2026-02-21T08:52:48.1981085Z // begin inline asm 2026-02-21T08:52:48.1981361Z // wait for regs: %r1407,%r1408,%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r448,%r449,%r450 2026-02-21T08:52:48.1981848Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:48.1982064Z // end inline asm 2026-02-21T08:52:48.1982211Z $L__tmp2: 2026-02-21T08:52:48.1982523Z .loc 1 40 103 // 
cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:40:103 2026-02-21T08:52:48.1982901Z add.s32 %r507, %r1406, 1; 2026-02-21T08:52:48.1983078Z setp.gt.s32 %p13, %r507, 1; 2026-02-21T08:52:48.1983276Z selp.b32 %r1406, 0, %r507, %p13; 2026-02-21T08:52:48.1983616Z .loc 1 48 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:32 2026-02-21T08:52:48.1983993Z mad.wide.s32 %rd32, %r1403, 2, %rd10; 2026-02-21T08:52:48.1984343Z .loc 1 48 80 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:80 2026-02-21T08:52:48.1984829Z shl.b32 %r508, %r1406, 12; 2026-02-21T08:52:48.1985015Z shl.b32 %r509, %r1406, 13; 2026-02-21T08:52:48.1985198Z add.s32 %r462, %r13, %r509; 2026-02-21T08:52:48.1985386Z selp.b32 %r463, 8, 0, %p11; 2026-02-21T08:52:48.1985570Z // begin inline asm 2026-02-21T08:52:48.1985815Z cp.async.ca.shared.global [ %r462 + 0 ], [ %rd32 + 0 ], 0x8, %r463; 2026-02-21T08:52:48.1986091Z // end inline asm 2026-02-21T08:52:48.1986255Z cp.async.commit_group; 2026-02-21T08:52:48.1986719Z .loc 1 54 34 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:34 2026-02-21T08:52:48.1987089Z cvt.s64.s32 %rd42, %r1404; 2026-02-21T08:52:48.1987268Z add.s64 %rd33, %rd11, %rd42; 2026-02-21T08:52:48.1987591Z .loc 1 54 87 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:87 2026-02-21T08:52:48.1987656Z add.s32 %r464, %r15, %r508; 2026-02-21T08:52:48.1987727Z selp.b32 %r465, 4, 0, %p11; 2026-02-21T08:52:48.1987787Z // begin inline asm 2026-02-21T08:52:48.1987927Z cp.async.ca.shared.global [ %r464 + 0 ], [ %rd33 + 0 ], 0x4, %r465; 2026-02-21T08:52:48.1987986Z // end inline asm 2026-02-21T08:52:48.1988059Z cp.async.commit_group; 2026-02-21T08:52:48.1988274Z .loc 1 40 103 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:40:103 2026-02-21T08:52:48.1988342Z add.s32 %r1404, %r1404, 229376; 2026-02-21T08:52:48.1988420Z add.s32 %r1403, %r1403, 64; 2026-02-21T08:52:48.1988494Z setp.lt.u64 %p14, %rd133, 4064; 2026-02-21T08:52:48.1988618Z @%p14 
bra $L__BB0_3; 2026-02-21T08:52:48.1988737Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:48.1988943Z .loc 1 31 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:31:32 2026-02-21T08:52:48.1989010Z or.b32 %r525, %r73, %r8; 2026-02-21T08:52:48.1989225Z .loc 1 40 103 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:40:103 2026-02-21T08:52:48.1989294Z cp.async.wait_group 0; 2026-02-21T08:52:48.1989356Z bar.sync 0; 2026-02-21T08:52:48.1989557Z .loc 1 87 28 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:87:28 2026-02-21T08:52:48.1989647Z cvt.rn.bf16x2.f32 %r526, %r1408, %r1407; 2026-02-21T08:52:48.1989721Z cvt.rn.bf16x2.f32 %r527, %r1410, %r1409; 2026-02-21T08:52:48.1989793Z cvt.rn.bf16x2.f32 %r528, %r1412, %r1411; 2026-02-21T08:52:48.1989870Z cvt.rn.bf16x2.f32 %r529, %r1414, %r1413; 2026-02-21T08:52:48.1990069Z .loc 1 88 50 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:88:50 2026-02-21T08:52:48.1990140Z mad.lo.s32 %r530, %r74, 7168, %r525; 2026-02-21T08:52:48.1990349Z .loc 1 88 22 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:88:22 2026-02-21T08:52:48.1990419Z mad.wide.s32 %rd43, %r530, 2, %rd12; 2026-02-21T08:52:48.1990617Z .loc 1 88 81 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:88:81 2026-02-21T08:52:48.1990799Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r32], {%r526, %r527, %r528, %r529}; 2026-02-21T08:52:48.1990865Z bar.sync 0; 2026-02-21T08:52:48.1990974Z ld.shared.v4.b32 {%r510, %r511, %r512, %r513}, [%r33]; 2026-02-21T08:52:48.1991194Z // begin inline asm 2026-02-21T08:52:48.1991327Z st.global.v4.b32 [ %rd43 + 0 ], { %r510, %r511, %r512, %r513 }; 2026-02-21T08:52:48.1991388Z // end inline asm 2026-02-21T08:52:48.1991606Z .loc 1 19 144 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:19:144 2026-02-21T08:52:48.1991678Z add.s32 %r531, %r1402, 4224; 2026-02-21T08:52:48.1991884Z .loc 1 25 35 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:25:35 
2026-02-21T08:52:48.1991951Z shr.u32 %r532, %r531, 31; 2026-02-21T08:52:48.1992017Z add.s32 %r533, %r531, %r532; 2026-02-21T08:52:48.1992225Z .loc 1 26 33 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:26:33 2026-02-21T08:52:48.1992432Z and.b32 %r534, %r533, -2; 2026-02-21T08:52:48.1992638Z .loc 1 27 39 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:27:39 2026-02-21T08:52:48.1992707Z sub.s32 %r535, 56, %r534; 2026-02-21T08:52:48.1992911Z .loc 1 27 52 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:27:52 2026-02-21T08:52:48.1992975Z min.s32 %r536, %r535, 2; 2026-02-21T08:52:48.1993183Z .loc 1 28 45 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:28:45 2026-02-21T08:52:48.1993254Z sub.s32 %r537, %r531, %r534; 2026-02-21T08:52:48.1993455Z .loc 1 29 51 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:29:51 2026-02-21T08:52:48.1993530Z div.s32 %r538, %r537, %r536; 2026-02-21T08:52:48.1993730Z .loc 1 28 64 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:28:64 2026-02-21T08:52:48.1993804Z mul.lo.s32 %r539, %r538, %r536; 2026-02-21T08:52:48.1993868Z sub.s32 %r540, %r537, %r539; 2026-02-21T08:52:48.1994074Z .loc 1 28 30 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:28:30 2026-02-21T08:52:48.1994138Z add.s32 %r541, %r540, %r534; 2026-02-21T08:52:48.1994340Z .loc 1 30 27 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:30:27 2026-02-21T08:52:48.1994413Z shl.b32 %r101, %r541, 7; 2026-02-21T08:52:48.1994617Z .loc 1 31 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:31:32 2026-02-21T08:52:48.1994679Z or.b32 %r542, %r101, %r6; 2026-02-21T08:52:48.1994882Z .loc 1 32 27 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:32:27 2026-02-21T08:52:48.1994946Z shl.b32 %r543, %r538, 6; 2026-02-21T08:52:48.1995145Z .loc 1 33 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:33:32 2026-02-21T08:52:48.1995216Z or.b32 %r102, %r543, %r10; 2026-02-21T08:52:48.1995416Z .loc 
1 48 53 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:53 2026-02-21T08:52:48.1995479Z shl.b32 %r544, %r102, 13; 2026-02-21T08:52:48.1995679Z .loc 1 48 60 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:60 2026-02-21T08:52:48.1995761Z or.b32 %r545, %r544, %r11; 2026-02-21T08:52:48.1995965Z .loc 1 48 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:32 2026-02-21T08:52:48.1996041Z mad.wide.s32 %rd44, %r545, 2, %rd10; 2026-02-21T08:52:48.1996108Z mov.b32 %r515, 8; 2026-02-21T08:52:48.1996309Z .loc 1 48 80 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:80 2026-02-21T08:52:48.1996373Z // begin inline asm 2026-02-21T08:52:48.1996646Z cp.async.ca.shared.global [ %r13 + 0 ], [ %rd44 + 0 ], 0x8, %r515; 2026-02-21T08:52:48.1996711Z // end inline asm 2026-02-21T08:52:48.1996781Z cp.async.commit_group; 2026-02-21T08:52:48.1996985Z .loc 1 54 62 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:62 2026-02-21T08:52:48.1997063Z add.s32 %r546, %r542, %r1387; 2026-02-21T08:52:48.1997263Z .loc 1 54 34 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:34 2026-02-21T08:52:48.1997476Z cvt.s64.s32 %rd49, %r546; 2026-02-21T08:52:48.1997551Z add.s64 %rd45, %rd11, %rd49; 2026-02-21T08:52:48.1997612Z mov.b32 %r517, 4; 2026-02-21T08:52:48.1997812Z .loc 1 54 87 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:87 2026-02-21T08:52:48.1997882Z // begin inline asm 2026-02-21T08:52:48.1998017Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd45 + 0 ], 0x4, %r517; 2026-02-21T08:52:48.1998077Z // end inline asm 2026-02-21T08:52:48.1998144Z cp.async.commit_group; 2026-02-21T08:52:48.1998352Z .loc 1 48 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:32 2026-02-21T08:52:48.1998416Z cvt.s64.s32 %rd50, %r544; 2026-02-21T08:52:48.1998479Z or.b64 %rd51, %rd50, %rd1; 2026-02-21T08:52:48.1998678Z shl.b64 %rd52, %rd51, 1; 2026-02-21T08:52:48.1998750Z add.s64 %rd53, %rd10, %rd52; 
2026-02-21T08:52:48.1998814Z add.s64 %rd46, %rd53, 128; 2026-02-21T08:52:48.1999032Z .loc 1 48 80 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:80 2026-02-21T08:52:48.1999093Z bar.sync 0; 2026-02-21T08:52:48.1999155Z // begin inline asm 2026-02-21T08:52:48.1999289Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd46 + 0 ], 0x8, %r515; 2026-02-21T08:52:48.1999352Z // end inline asm 2026-02-21T08:52:48.1999420Z cp.async.commit_group; 2026-02-21T08:52:48.1999625Z .loc 1 54 62 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:62 2026-02-21T08:52:48.1999696Z add.s32 %r547, %r542, %r17; 2026-02-21T08:52:48.1999903Z .loc 1 54 34 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:34 2026-02-21T08:52:48.1999967Z cvt.s64.s32 %rd54, %r547; 2026-02-21T08:52:48.2000040Z add.s64 %rd47, %rd11, %rd54; 2026-02-21T08:52:48.2000242Z .loc 1 54 87 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:87 2026-02-21T08:52:48.2000306Z // begin inline asm 2026-02-21T08:52:48.2000445Z cp.async.ca.shared.global [ %r18 + 0 ], [ %rd47 + 0 ], 0x4, %r517; 2026-02-21T08:52:48.2000514Z // end inline asm 2026-02-21T08:52:48.2000580Z cp.async.commit_group; 2026-02-21T08:52:48.2000791Z .loc 1 40 103 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:40:103 2026-02-21T08:52:48.2000861Z shl.b32 %r548, %r531, 7; 2026-02-21T08:52:48.2000923Z add.s32 %r549, %r46, %r548; 2026-02-21T08:52:48.2000985Z shl.b32 %r550, %r539, 7; 2026-02-21T08:52:48.2001050Z sub.s32 %r1416, %r549, %r550; 2026-02-21T08:52:48.2001121Z shl.b32 %r551, %r538, 19; 2026-02-21T08:52:48.2001188Z or.b32 %r1415, %r47, %r551; 2026-02-21T08:52:48.2001254Z mov.b32 %r1419, 0f00000000; 2026-02-21T08:52:48.2001326Z mov.b32 %r1418, 1; 2026-02-21T08:52:48.2001401Z mov.b32 %r1417, -1; 2026-02-21T08:52:48.2001464Z mov.b64 %rd134, -32; 2026-02-21T08:52:48.2001532Z mov.b32 %r1420, %r1419; 2026-02-21T08:52:48.2001593Z mov.b32 %r1421, %r1419; 2026-02-21T08:52:48.2001656Z mov.b32 %r1422, %r1419; 
2026-02-21T08:52:48.2001717Z mov.b32 %r1423, %r1419; 2026-02-21T08:52:48.2001783Z mov.b32 %r1424, %r1419; 2026-02-21T08:52:48.2001845Z mov.b32 %r1425, %r1419; 2026-02-21T08:52:48.2001905Z mov.b32 %r1426, %r1419; 2026-02-21T08:52:48.2002028Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:48.2002139Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:48.2002205Z add.s64 %rd134, %rd134, 32; 2026-02-21T08:52:48.2002275Z setp.lt.u64 %p24, %rd134, 4032; 2026-02-21T08:52:48.2002343Z add.s32 %r738, %r1417, 1; 2026-02-21T08:52:48.2002409Z setp.gt.s32 %p25, %r738, 1; 2026-02-21T08:52:48.2002479Z selp.b32 %r1417, 0, %r738, %p25; 2026-02-21T08:52:48.2002697Z .loc 1 48 80 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:80 2026-02-21T08:52:48.2002768Z cp.async.wait_group 2; 2026-02-21T08:52:48.2002827Z bar.sync 0; 2026-02-21T08:52:48.2002897Z shl.b32 %r739, %r1417, 12; 2026-02-21T08:52:48.2003072Z shl.b32 %r740, %r1417, 13; 2026-02-21T08:52:48.2003137Z add.s32 %r741, %r1386, %r740; 2026-02-21T08:52:48.2003201Z add.s32 %r742, %r741, 32768; 2026-02-21T08:52:48.2003409Z .loc 1 52 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:52:32 2026-02-21T08:52:48.2003473Z add.s32 %r743, %r742, %r34; 2026-02-21T08:52:48.2003542Z ld.shared.b16 %rs73, [%r743]; 2026-02-21T08:52:48.2003620Z ld.shared.b16 %rs74, [%r743+1024]; 2026-02-21T08:52:48.2003691Z ld.shared.b16 %rs75, [%r743+64]; 2026-02-21T08:52:48.2003758Z ld.shared.b16 %rs76, [%r743+1088]; 2026-02-21T08:52:48.2003821Z add.s32 %r744, %r742, %r35; 2026-02-21T08:52:48.2003892Z ld.shared.b16 %rs77, [%r744]; 2026-02-21T08:52:48.2004047Z ld.shared.b16 %rs78, [%r744+1024]; 2026-02-21T08:52:48.2004118Z ld.shared.b16 %rs79, [%r744+64]; 2026-02-21T08:52:48.2004192Z ld.shared.b16 %rs80, [%r744+1088]; 2026-02-21T08:52:48.2004258Z add.s32 %r745, %r742, %r36; 2026-02-21T08:52:48.2004329Z ld.shared.b16 %rs81, [%r745]; 2026-02-21T08:52:48.2004396Z ld.shared.b16 %rs82, [%r745+1024]; 2026-02-21T08:52:48.2004473Z 
ld.shared.b16 %rs83, [%r745+64]; 2026-02-21T08:52:48.2004539Z ld.shared.b16 %rs84, [%r745+1088]; 2026-02-21T08:52:48.2004601Z add.s32 %r746, %r742, %r37; 2026-02-21T08:52:48.2004673Z ld.shared.b16 %rs85, [%r746]; 2026-02-21T08:52:48.2004737Z ld.shared.b16 %rs86, [%r746+1024]; 2026-02-21T08:52:48.2004803Z ld.shared.b16 %rs87, [%r746+64]; 2026-02-21T08:52:48.2004876Z ld.shared.b16 %rs88, [%r746+1088]; 2026-02-21T08:52:48.2004939Z add.s32 %r747, %r742, %r38; 2026-02-21T08:52:48.2005005Z ld.shared.b16 %rs89, [%r747]; 2026-02-21T08:52:48.2005076Z ld.shared.b16 %rs90, [%r747+1024]; 2026-02-21T08:52:48.2005165Z ld.shared.b16 %rs91, [%r747+64]; 2026-02-21T08:52:48.2005236Z ld.shared.b16 %rs92, [%r747+1088]; 2026-02-21T08:52:48.2005300Z add.s32 %r748, %r742, %r39; 2026-02-21T08:52:48.2005372Z ld.shared.b16 %rs93, [%r748]; 2026-02-21T08:52:48.2005443Z ld.shared.b16 %rs94, [%r748+1024]; 2026-02-21T08:52:48.2005511Z ld.shared.b16 %rs95, [%r748+64]; 2026-02-21T08:52:48.2005579Z ld.shared.b16 %rs96, [%r748+1088]; 2026-02-21T08:52:48.2005652Z add.s32 %r749, %r742, %r40; 2026-02-21T08:52:48.2005718Z ld.shared.b16 %rs97, [%r749]; 2026-02-21T08:52:48.2005785Z ld.shared.b16 %rs98, [%r749+1024]; 2026-02-21T08:52:48.2005860Z ld.shared.b16 %rs99, [%r749+64]; 2026-02-21T08:52:48.2005932Z ld.shared.b16 %rs100, [%r749+1088]; 2026-02-21T08:52:48.2005997Z add.s32 %r750, %r742, %r41; 2026-02-21T08:52:48.2006075Z ld.shared.b16 %rs101, [%r750]; 2026-02-21T08:52:48.2006145Z ld.shared.b16 %rs102, [%r750+1024]; 2026-02-21T08:52:48.2006215Z ld.shared.b16 %rs103, [%r750+64]; 2026-02-21T08:52:48.2006287Z ld.shared.b16 %rs104, [%r750+1088]; 2026-02-21T08:52:48.2006362Z cvt.f32.bf16 %r568, %rs73; 2026-02-21T08:52:48.2006429Z cvt.f32.bf16 %r569, %rs74; 2026-02-21T08:52:48.2006629Z cvt.f32.bf16 %r570, %rs77; 2026-02-21T08:52:48.2006706Z cvt.f32.bf16 %r571, %rs78; 2026-02-21T08:52:48.2006770Z cvt.f32.bf16 %r588, %rs81; 2026-02-21T08:52:48.2006835Z cvt.f32.bf16 %r589, %rs82; 2026-02-21T08:52:48.2006897Z 
cvt.f32.bf16 %r590, %rs85; 2026-02-21T08:52:48.2006967Z cvt.f32.bf16 %r591, %rs86; 2026-02-21T08:52:48.2007027Z cvt.f32.bf16 %r608, %rs89; 2026-02-21T08:52:48.2007090Z cvt.f32.bf16 %r609, %rs90; 2026-02-21T08:52:48.2007159Z cvt.f32.bf16 %r610, %rs93; 2026-02-21T08:52:48.2007232Z cvt.f32.bf16 %r611, %rs94; 2026-02-21T08:52:48.2007297Z cvt.f32.bf16 %r628, %rs97; 2026-02-21T08:52:48.2007359Z cvt.f32.bf16 %r629, %rs98; 2026-02-21T08:52:48.2007429Z cvt.f32.bf16 %r630, %rs101; 2026-02-21T08:52:48.2007495Z cvt.f32.bf16 %r631, %rs102; 2026-02-21T08:52:48.2007558Z cvt.f32.bf16 %r648, %rs75; 2026-02-21T08:52:48.2007627Z cvt.f32.bf16 %r649, %rs76; 2026-02-21T08:52:48.2007693Z cvt.f32.bf16 %r650, %rs79; 2026-02-21T08:52:48.2007753Z cvt.f32.bf16 %r651, %rs80; 2026-02-21T08:52:48.2007817Z cvt.f32.bf16 %r668, %rs83; 2026-02-21T08:52:48.2008023Z cvt.f32.bf16 %r669, %rs84; 2026-02-21T08:52:48.2008087Z cvt.f32.bf16 %r670, %rs87; 2026-02-21T08:52:48.2008148Z cvt.f32.bf16 %r671, %rs88; 2026-02-21T08:52:48.2008214Z cvt.f32.bf16 %r688, %rs91; 2026-02-21T08:52:48.2008277Z cvt.f32.bf16 %r689, %rs92; 2026-02-21T08:52:48.2008337Z cvt.f32.bf16 %r690, %rs95; 2026-02-21T08:52:48.2008399Z cvt.f32.bf16 %r691, %rs96; 2026-02-21T08:52:48.2008468Z cvt.f32.bf16 %r708, %rs99; 2026-02-21T08:52:48.2008530Z cvt.f32.bf16 %r709, %rs100; 2026-02-21T08:52:48.2008594Z cvt.f32.bf16 %r710, %rs103; 2026-02-21T08:52:48.2008663Z cvt.f32.bf16 %r711, %rs104; 2026-02-21T08:52:48.2008870Z .loc 1 67 45 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:67:45 2026-02-21T08:52:48.2008935Z add.s32 %r751, %r27, %r739; 2026-02-21T08:52:48.2009131Z ld.shared.b8 %rs105, [%r751]; 2026-02-21T08:52:48.2009205Z ld.shared.b8 %rs106, [%r751+512]; 2026-02-21T08:52:48.2009273Z ld.shared.b8 %rs107, [%r751+1024]; 2026-02-21T08:52:48.2009345Z ld.shared.b8 %rs108, [%r751+1536]; 2026-02-21T08:52:48.2009416Z ld.shared.b8 %rs109, [%r751+2048]; 2026-02-21T08:52:48.2009482Z ld.shared.b8 %rs110, [%r751+2560]; 
2026-02-21T08:52:48.2009547Z ld.shared.b8 %rs111, [%r751+3072]; 2026-02-21T08:52:48.2009617Z ld.shared.b8 %rs112, [%r751+3584]; 2026-02-21T08:52:48.2009816Z .loc 1 57 28 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:57:28 2026-02-21T08:52:48.2009881Z shl.b16 %rs113, %rs105, 4; 2026-02-21T08:52:48.2009944Z shl.b16 %rs114, %rs106, 4; 2026-02-21T08:52:48.2010014Z shl.b16 %rs115, %rs107, 4; 2026-02-21T08:52:48.2010078Z shl.b16 %rs116, %rs108, 4; 2026-02-21T08:52:48.2010141Z shl.b16 %rs117, %rs109, 4; 2026-02-21T08:52:48.2010215Z shl.b16 %rs118, %rs110, 4; 2026-02-21T08:52:48.2010285Z shl.b16 %rs119, %rs111, 4; 2026-02-21T08:52:48.2010349Z shl.b16 %rs120, %rs112, 4; 2026-02-21T08:52:48.2010552Z .loc 1 72 58 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:72:58 2026-02-21T08:52:48.2010642Z selp.b16 %rs121, %rs113, %rs105, %p57; 2026-02-21T08:52:48.2010707Z cvt.s16.s8 %rs122, %rs121; 2026-02-21T08:52:48.2010782Z shr.s16 %rs123, %rs122, 4; 2026-02-21T08:52:48.2010862Z selp.b16 %rs124, %rs114, %rs106, %p57; 2026-02-21T08:52:48.2010925Z cvt.s16.s8 %rs125, %rs124; 2026-02-21T08:52:48.2010988Z shr.s16 %rs126, %rs125, 4; 2026-02-21T08:52:48.2011058Z selp.b16 %rs127, %rs115, %rs107, %p57; 2026-02-21T08:52:48.2011128Z cvt.s16.s8 %rs128, %rs127; 2026-02-21T08:52:48.2011190Z shr.s16 %rs129, %rs128, 4; 2026-02-21T08:52:48.2011261Z selp.b16 %rs130, %rs116, %rs108, %p57; 2026-02-21T08:52:48.2011332Z cvt.s16.s8 %rs131, %rs130; 2026-02-21T08:52:48.2011394Z shr.s16 %rs132, %rs131, 4; 2026-02-21T08:52:48.2011467Z selp.b16 %rs133, %rs117, %rs109, %p57; 2026-02-21T08:52:48.2011534Z cvt.s16.s8 %rs134, %rs133; 2026-02-21T08:52:48.2011603Z shr.s16 %rs135, %rs134, 4; 2026-02-21T08:52:48.2011677Z selp.b16 %rs136, %rs118, %rs110, %p57; 2026-02-21T08:52:48.2011745Z cvt.s16.s8 %rs137, %rs136; 2026-02-21T08:52:48.2011816Z shr.s16 %rs138, %rs137, 4; 2026-02-21T08:52:48.2011886Z selp.b16 %rs139, %rs119, %rs111, %p57; 2026-02-21T08:52:48.2011950Z cvt.s16.s8 %rs140, %rs139; 
2026-02-21T08:52:48.2012019Z shr.s16 %rs141, %rs140, 4; 2026-02-21T08:52:48.2012090Z selp.b16 %rs142, %rs120, %rs112, %p57; 2026-02-21T08:52:48.2012154Z cvt.s16.s8 %rs143, %rs142; 2026-02-21T08:52:48.2012218Z shr.s16 %rs144, %rs143, 4; 2026-02-21T08:52:48.2012427Z .loc 1 77 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:77:32 2026-02-21T08:52:48.2012494Z cvt.rn.f32.s16 %r752, %rs123; 2026-02-21T08:52:48.2012561Z cvt.rn.f32.s16 %r753, %rs126; 2026-02-21T08:52:48.2012632Z cvt.rn.f32.s16 %r754, %rs129; 2026-02-21T08:52:48.2012699Z cvt.rn.f32.s16 %r755, %rs132; 2026-02-21T08:52:48.2012764Z cvt.rn.f32.s16 %r756, %rs135; 2026-02-21T08:52:48.2012829Z cvt.rn.f32.s16 %r757, %rs138; 2026-02-21T08:52:48.2012902Z cvt.rn.f32.s16 %r758, %rs141; 2026-02-21T08:52:48.2013099Z cvt.rn.f32.s16 %r759, %rs144; 2026-02-21T08:52:48.2013168Z st.shared.b32 [%r42], %r752; 2026-02-21T08:52:48.2013242Z st.shared.b32 [%r42+16384], %r756; 2026-02-21T08:52:48.2013311Z st.shared.b32 [%r43], %r753; 2026-02-21T08:52:48.2013394Z st.shared.b32 [%r43+16384], %r757; 2026-02-21T08:52:48.2013463Z st.shared.b32 [%r44], %r754; 2026-02-21T08:52:48.2013535Z st.shared.b32 [%r44+16384], %r758; 2026-02-21T08:52:48.2013602Z st.shared.b32 [%r45], %r755; 2026-02-21T08:52:48.2013673Z st.shared.b32 [%r45+16384], %r759; 2026-02-21T08:52:48.2013737Z $L__tmp3: 2026-02-21T08:52:48.2014016Z .loc 2 291 36 // standard.py:291:36 @[ cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:84:40 ] 2026-02-21T08:52:48.2014172Z // begin inline asm 2026-02-21T08:52:48.2014262Z fence.proxy.async.shared::cta; 2026-02-21T08:52:48.2014321Z // end inline asm 2026-02-21T08:52:48.2014379Z bar.sync 0; 2026-02-21T08:52:48.2014459Z shfl.sync.idx.b32 %r760, %r4, 0, 31, -1; 2026-02-21T08:52:48.2014545Z wgmma.fence.sync.aligned; 2026-02-21T08:52:48.2014607Z shl.b32 %r761, %r760, 9; 2026-02-21T08:52:48.2014669Z and.b32 %r762, %r761, 14336; 2026-02-21T08:52:48.2014740Z add.s32 %r763, %r762, %r1386; 2026-02-21T08:52:48.2014802Z 
bfe.u32 %r764, %r763, 4, 14; 2026-02-21T08:52:48.2014866Z cvt.u64.u32 %rd65, %r764; 2026-02-21T08:52:48.2014942Z or.b64 %rd55, %rd65, 4611686293372403712; 2026-02-21T08:52:48.2015012Z mov.pred %p15, -1; 2026-02-21T08:52:48.2015072Z // begin inline asm 2026-02-21T08:52:48.2015472Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1419,%r1420,%r1421,%r1422,%r1423,%r1424,%r1425,%r1426}, {%r568,%r569,%r570,%r571}, %rd55, %p15, 1, 1; 2026-02-21T08:52:48.2015542Z // end inline asm 2026-02-21T08:52:48.2015607Z add.s32 %r765, %r763, 32; 2026-02-21T08:52:48.2015670Z bfe.u32 %r766, %r765, 4, 14; 2026-02-21T08:52:48.2015737Z cvt.u64.u32 %rd66, %r766; 2026-02-21T08:52:48.2015809Z or.b64 %rd56, %rd66, 4611686293372403712; 2026-02-21T08:52:48.2015874Z // begin inline asm 2026-02-21T08:52:48.2016239Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1419,%r1420,%r1421,%r1422,%r1423,%r1424,%r1425,%r1426}, {%r588,%r589,%r590,%r591}, %rd56, %p15, 1, 1; 2026-02-21T08:52:48.2016304Z // end inline asm 2026-02-21T08:52:48.2016367Z add.s32 %r767, %r763, 64; 2026-02-21T08:52:48.2016428Z bfe.u32 %r768, %r767, 4, 14; 2026-02-21T08:52:48.2016622Z cvt.u64.u32 %rd67, %r768; 2026-02-21T08:52:48.2016699Z or.b64 %rd57, %rd67, 4611686293372403712; 2026-02-21T08:52:48.2016762Z // begin inline asm 2026-02-21T08:52:48.2017128Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1419,%r1420,%r1421,%r1422,%r1423,%r1424,%r1425,%r1426}, {%r608,%r609,%r610,%r611}, %rd57, %p15, 1, 1; 2026-02-21T08:52:48.2017191Z // end inline asm 2026-02-21T08:52:48.2017254Z add.s32 %r769, %r763, 96; 2026-02-21T08:52:48.2017316Z bfe.u32 %r770, %r769, 4, 14; 2026-02-21T08:52:48.2017387Z cvt.u64.u32 %rd68, %r770; 2026-02-21T08:52:48.2017461Z or.b64 %rd58, %rd68, 4611686293372403712; 2026-02-21T08:52:48.2017523Z // begin inline asm 2026-02-21T08:52:48.2017887Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1419,%r1420,%r1421,%r1422,%r1423,%r1424,%r1425,%r1426}, {%r628,%r629,%r630,%r631}, %rd58, %p15, 1, 
1; 2026-02-21T08:52:48.2017948Z // end inline asm 2026-02-21T08:52:48.2018011Z add.s32 %r771, %r763, 16384; 2026-02-21T08:52:48.2018078Z bfe.u32 %r772, %r771, 4, 14; 2026-02-21T08:52:48.2018141Z cvt.u64.u32 %rd69, %r772; 2026-02-21T08:52:48.2018211Z or.b64 %rd59, %rd69, 4611686293372403712; 2026-02-21T08:52:48.2018275Z // begin inline asm 2026-02-21T08:52:48.2018655Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1419,%r1420,%r1421,%r1422,%r1423,%r1424,%r1425,%r1426}, {%r648,%r649,%r650,%r651}, %rd59, %p15, 1, 1; 2026-02-21T08:52:48.2018718Z // end inline asm 2026-02-21T08:52:48.2018782Z add.s32 %r773, %r763, 16416; 2026-02-21T08:52:48.2018852Z bfe.u32 %r774, %r773, 4, 14; 2026-02-21T08:52:48.2019060Z cvt.u64.u32 %rd70, %r774; 2026-02-21T08:52:48.2019138Z or.b64 %rd60, %rd70, 4611686293372403712; 2026-02-21T08:52:48.2019201Z // begin inline asm 2026-02-21T08:52:48.2019564Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1419,%r1420,%r1421,%r1422,%r1423,%r1424,%r1425,%r1426}, {%r668,%r669,%r670,%r671}, %rd60, %p15, 1, 1; 2026-02-21T08:52:48.2019624Z // end inline asm 2026-02-21T08:52:48.2019694Z add.s32 %r775, %r763, 16448; 2026-02-21T08:52:48.2019757Z bfe.u32 %r776, %r775, 4, 14; 2026-02-21T08:52:48.2019820Z cvt.u64.u32 %rd71, %r776; 2026-02-21T08:52:48.2019897Z or.b64 %rd61, %rd71, 4611686293372403712; 2026-02-21T08:52:48.2019959Z // begin inline asm 2026-02-21T08:52:48.2020438Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1419,%r1420,%r1421,%r1422,%r1423,%r1424,%r1425,%r1426}, {%r688,%r689,%r690,%r691}, %rd61, %p15, 1, 1; 2026-02-21T08:52:48.2020506Z // end inline asm 2026-02-21T08:52:48.2020568Z add.s32 %r777, %r763, 16480; 2026-02-21T08:52:48.2020629Z bfe.u32 %r778, %r777, 4, 14; 2026-02-21T08:52:48.2020696Z cvt.u64.u32 %rd72, %r778; 2026-02-21T08:52:48.2020775Z or.b64 %rd62, %rd72, 4611686293372403712; 2026-02-21T08:52:48.2020836Z // begin inline asm 2026-02-21T08:52:48.2021192Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 
{%r1419,%r1420,%r1421,%r1422,%r1423,%r1424,%r1425,%r1426}, {%r708,%r709,%r710,%r711}, %rd62, %p15, 1, 1; 2026-02-21T08:52:48.2021258Z // end inline asm 2026-02-21T08:52:48.2021337Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:48.2021396Z mov.b32 %r722, 0; 2026-02-21T08:52:48.2021460Z mov.b32 %r721, %r722; 2026-02-21T08:52:48.2021521Z mov.b32 %r720, %r1386; 2026-02-21T08:52:48.2021594Z // begin inline asm 2026-02-21T08:52:48.2021771Z // wait for regs: %r1419,%r1420,%r1421,%r1422,%r1423,%r1424,%r1425,%r1426,%r720,%r721,%r722 2026-02-21T08:52:48.2021856Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:48.2021915Z // end inline asm 2026-02-21T08:52:48.2021971Z $L__tmp4: 2026-02-21T08:52:48.2022191Z .loc 1 40 103 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:40:103 2026-02-21T08:52:48.2022256Z add.s32 %r779, %r1418, 1; 2026-02-21T08:52:48.2022327Z setp.gt.s32 %p26, %r779, 1; 2026-02-21T08:52:48.2022396Z selp.b32 %r1418, 0, %r779, %p26; 2026-02-21T08:52:48.2022608Z .loc 1 48 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:32 2026-02-21T08:52:48.2022681Z mad.wide.s32 %rd63, %r1415, 2, %rd10; 2026-02-21T08:52:48.2022885Z .loc 1 48 80 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:80 2026-02-21T08:52:48.2022952Z shl.b32 %r780, %r1418, 12; 2026-02-21T08:52:48.2023014Z shl.b32 %r781, %r1418, 13; 2026-02-21T08:52:48.2023076Z add.s32 %r734, %r13, %r781; 2026-02-21T08:52:48.2023148Z selp.b32 %r735, 8, 0, %p24; 2026-02-21T08:52:48.2023209Z // begin inline asm 2026-02-21T08:52:48.2023349Z cp.async.ca.shared.global [ %r734 + 0 ], [ %rd63 + 0 ], 0x8, %r735; 2026-02-21T08:52:48.2023409Z // end inline asm 2026-02-21T08:52:48.2023495Z cp.async.commit_group; 2026-02-21T08:52:48.2023701Z .loc 1 54 34 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:34 2026-02-21T08:52:48.2023765Z cvt.s64.s32 %rd73, %r1416; 2026-02-21T08:52:48.2023836Z add.s64 %rd64, %rd11, %rd73; 2026-02-21T08:52:48.2024037Z .loc 1 54 87 // 
cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:87 2026-02-21T08:52:48.2024100Z add.s32 %r736, %r15, %r780; 2026-02-21T08:52:48.2024169Z selp.b32 %r737, 4, 0, %p24; 2026-02-21T08:52:48.2024228Z // begin inline asm 2026-02-21T08:52:48.2024365Z cp.async.ca.shared.global [ %r736 + 0 ], [ %rd64 + 0 ], 0x4, %r737; 2026-02-21T08:52:48.2024424Z // end inline asm 2026-02-21T08:52:48.2024497Z cp.async.commit_group; 2026-02-21T08:52:48.2024709Z .loc 1 40 103 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:40:103 2026-02-21T08:52:48.2024776Z add.s32 %r1416, %r1416, 229376; 2026-02-21T08:52:48.2024958Z add.s32 %r1415, %r1415, 64; 2026-02-21T08:52:48.2025026Z setp.lt.u64 %p27, %rd134, 4064; 2026-02-21T08:52:48.2025090Z @%p27 bra $L__BB0_5; 2026-02-21T08:52:48.2025206Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:48.2025410Z .loc 1 31 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:31:32 2026-02-21T08:52:48.2025472Z or.b32 %r797, %r101, %r8; 2026-02-21T08:52:48.2025688Z .loc 1 40 103 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:40:103 2026-02-21T08:52:48.2025761Z cp.async.wait_group 0; 2026-02-21T08:52:48.2025820Z bar.sync 0; 2026-02-21T08:52:48.2026017Z .loc 1 87 28 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:87:28 2026-02-21T08:52:48.2026194Z cvt.rn.bf16x2.f32 %r798, %r1420, %r1419; 2026-02-21T08:52:48.2026273Z cvt.rn.bf16x2.f32 %r799, %r1422, %r1421; 2026-02-21T08:52:48.2026346Z cvt.rn.bf16x2.f32 %r800, %r1424, %r1423; 2026-02-21T08:52:48.2026428Z cvt.rn.bf16x2.f32 %r801, %r1426, %r1425; 2026-02-21T08:52:48.2026754Z .loc 1 88 50 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:88:50 2026-02-21T08:52:48.2026828Z mad.lo.s32 %r802, %r102, 7168, %r797; 2026-02-21T08:52:48.2027028Z .loc 1 88 22 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:88:22 2026-02-21T08:52:48.2027106Z mad.wide.s32 %rd74, %r802, 2, %rd12; 2026-02-21T08:52:48.2027305Z .loc 1 88 81 // 
cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:88:81 2026-02-21T08:52:48.2027487Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r32], {%r798, %r799, %r800, %r801}; 2026-02-21T08:52:48.2027550Z bar.sync 0; 2026-02-21T08:52:48.2027659Z ld.shared.v4.b32 {%r782, %r783, %r784, %r785}, [%r33]; 2026-02-21T08:52:48.2027722Z // begin inline asm 2026-02-21T08:52:48.2027845Z st.global.v4.b32 [ %rd74 + 0 ], { %r782, %r783, %r784, %r785 }; 2026-02-21T08:52:48.2027905Z // end inline asm 2026-02-21T08:52:48.2028123Z .loc 1 19 144 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:19:144 2026-02-21T08:52:48.2028193Z add.s32 %r803, %r1402, 8448; 2026-02-21T08:52:48.2028398Z .loc 1 25 35 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:25:35 2026-02-21T08:52:48.2028463Z shr.u32 %r804, %r803, 31; 2026-02-21T08:52:48.2028606Z add.s32 %r805, %r803, %r804; 2026-02-21T08:52:48.2028823Z .loc 1 26 33 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:26:33 2026-02-21T08:52:48.2028888Z and.b32 %r806, %r805, -2; 2026-02-21T08:52:48.2029196Z .loc 1 27 39 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:27:39 2026-02-21T08:52:48.2029267Z sub.s32 %r807, 56, %r806; 2026-02-21T08:52:48.2029469Z .loc 1 27 52 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:27:52 2026-02-21T08:52:48.2029532Z min.s32 %r808, %r807, 2; 2026-02-21T08:52:48.2029737Z .loc 1 28 45 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:28:45 2026-02-21T08:52:48.2029802Z sub.s32 %r809, %r803, %r806; 2026-02-21T08:52:48.2030002Z .loc 1 29 51 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:29:51 2026-02-21T08:52:48.2030071Z div.s32 %r810, %r809, %r808; 2026-02-21T08:52:48.2030269Z .loc 1 28 64 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:28:64 2026-02-21T08:52:48.2030334Z mul.lo.s32 %r811, %r810, %r808; 2026-02-21T08:52:48.2030398Z sub.s32 %r812, %r809, %r811; 2026-02-21T08:52:48.2030603Z .loc 1 28 30 // 
cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:28:30 2026-02-21T08:52:48.2030667Z add.s32 %r813, %r812, %r806; 2026-02-21T08:52:48.2030868Z .loc 1 30 27 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:30:27 2026-02-21T08:52:48.2030937Z shl.b32 %r129, %r813, 7; 2026-02-21T08:52:48.2031276Z .loc 1 31 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:31:32 2026-02-21T08:52:48.2031339Z or.b32 %r814, %r129, %r6; 2026-02-21T08:52:48.2031558Z .loc 1 32 27 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:32:27 2026-02-21T08:52:48.2031621Z shl.b32 %r815, %r810, 6; 2026-02-21T08:52:48.2031821Z .loc 1 33 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:33:32 2026-02-21T08:52:48.2031888Z or.b32 %r130, %r815, %r10; 2026-02-21T08:52:48.2032092Z .loc 1 48 53 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:53 2026-02-21T08:52:48.2032155Z shl.b32 %r816, %r130, 13; 2026-02-21T08:52:48.2032469Z .loc 1 48 60 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:60 2026-02-21T08:52:48.2032542Z or.b32 %r817, %r816, %r11; 2026-02-21T08:52:48.2032741Z .loc 1 48 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:32 2026-02-21T08:52:48.2032820Z mad.wide.s32 %rd75, %r817, 2, %rd10; 2026-02-21T08:52:48.2032883Z mov.b32 %r787, 8; 2026-02-21T08:52:48.2033084Z .loc 1 48 80 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:80 2026-02-21T08:52:48.2033147Z // begin inline asm 2026-02-21T08:52:48.2033291Z cp.async.ca.shared.global [ %r13 + 0 ], [ %rd75 + 0 ], 0x8, %r787; 2026-02-21T08:52:48.2033354Z // end inline asm 2026-02-21T08:52:48.2033423Z cp.async.commit_group; 2026-02-21T08:52:48.2033624Z .loc 1 54 62 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:62 2026-02-21T08:52:48.2033699Z add.s32 %r818, %r814, %r1387; 2026-02-21T08:52:48.2033906Z .loc 1 54 34 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:34 2026-02-21T08:52:48.2033970Z cvt.s64.s32 %rd80, %r818; 
2026-02-21T08:52:48.2034042Z add.s64 %rd76, %rd11, %rd80; 2026-02-21T08:52:48.2034100Z mov.b32 %r789, 4; 2026-02-21T08:52:48.2034304Z .loc 1 54 87 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:87 2026-02-21T08:52:48.2034369Z // begin inline asm 2026-02-21T08:52:48.2034501Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd76 + 0 ], 0x4, %r789; 2026-02-21T08:52:48.2034559Z // end inline asm 2026-02-21T08:52:48.2034627Z cp.async.commit_group; 2026-02-21T08:52:48.2034833Z .loc 1 48 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:32 2026-02-21T08:52:48.2034895Z cvt.s64.s32 %rd81, %r816; 2026-02-21T08:52:48.2034958Z or.b64 %rd82, %rd81, %rd1; 2026-02-21T08:52:48.2035025Z shl.b64 %rd83, %rd82, 1; 2026-02-21T08:52:48.2035089Z add.s64 %rd84, %rd10, %rd83; 2026-02-21T08:52:48.2035151Z add.s64 %rd77, %rd84, 128; 2026-02-21T08:52:48.2035351Z .loc 1 48 80 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:80 2026-02-21T08:52:48.2035416Z bar.sync 0; 2026-02-21T08:52:48.2035479Z // begin inline asm 2026-02-21T08:52:48.2035611Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd77 + 0 ], 0x8, %r787; 2026-02-21T08:52:48.2035679Z // end inline asm 2026-02-21T08:52:48.2035750Z cp.async.commit_group; 2026-02-21T08:52:48.2035950Z .loc 1 54 62 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:62 2026-02-21T08:52:48.2036020Z add.s32 %r819, %r814, %r17; 2026-02-21T08:52:48.2036218Z .loc 1 54 34 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:34 2026-02-21T08:52:48.2036282Z cvt.s64.s32 %rd85, %r819; 2026-02-21T08:52:48.2036346Z add.s64 %rd78, %rd11, %rd85; 2026-02-21T08:52:48.2036698Z .loc 1 54 87 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:87 2026-02-21T08:52:48.2036762Z // begin inline asm 2026-02-21T08:52:48.2036894Z cp.async.ca.shared.global [ %r18 + 0 ], [ %rd78 + 0 ], 0x4, %r789; 2026-02-21T08:52:48.2036959Z // end inline asm 2026-02-21T08:52:48.2037026Z cp.async.commit_group; 2026-02-21T08:52:48.2037399Z 
.loc 1 40 103 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:40:103 2026-02-21T08:52:48.2037465Z shl.b32 %r820, %r803, 7; 2026-02-21T08:52:48.2037527Z add.s32 %r821, %r46, %r820; 2026-02-21T08:52:48.2037588Z shl.b32 %r822, %r811, 7; 2026-02-21T08:52:48.2037651Z sub.s32 %r1428, %r821, %r822; 2026-02-21T08:52:48.2037717Z shl.b32 %r823, %r810, 19; 2026-02-21T08:52:48.2037782Z or.b32 %r1427, %r47, %r823; 2026-02-21T08:52:48.2037845Z mov.b32 %r1431, 0f00000000; 2026-02-21T08:52:48.2037908Z mov.b32 %r1430, 1; 2026-02-21T08:52:48.2037969Z mov.b32 %r1429, -1; 2026-02-21T08:52:48.2038030Z mov.b64 %rd135, -32; 2026-02-21T08:52:48.2038105Z mov.b32 %r1432, %r1431; 2026-02-21T08:52:48.2038174Z mov.b32 %r1433, %r1431; 2026-02-21T08:52:48.2038360Z mov.b32 %r1434, %r1431; 2026-02-21T08:52:48.2038424Z mov.b32 %r1435, %r1431; 2026-02-21T08:52:48.2038491Z mov.b32 %r1436, %r1431; 2026-02-21T08:52:48.2038551Z mov.b32 %r1437, %r1431; 2026-02-21T08:52:48.2038618Z mov.b32 %r1438, %r1431; 2026-02-21T08:52:48.2038739Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:48.2038849Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:48.2038916Z add.s64 %rd135, %rd135, 32; 2026-02-21T08:52:48.2038985Z setp.lt.u64 %p37, %rd135, 4032; 2026-02-21T08:52:48.2039054Z add.s32 %r1010, %r1429, 1; 2026-02-21T08:52:48.2039120Z setp.gt.s32 %p38, %r1010, 1; 2026-02-21T08:52:48.2039190Z selp.b32 %r1429, 0, %r1010, %p38; 2026-02-21T08:52:48.2039398Z .loc 1 48 80 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:80 2026-02-21T08:52:48.2039466Z cp.async.wait_group 2; 2026-02-21T08:52:48.2039525Z bar.sync 0; 2026-02-21T08:52:48.2039591Z shl.b32 %r1011, %r1429, 12; 2026-02-21T08:52:48.2039660Z shl.b32 %r1012, %r1429, 13; 2026-02-21T08:52:48.2039723Z add.s32 %r1013, %r1386, %r1012; 2026-02-21T08:52:48.2039789Z add.s32 %r1014, %r1013, 32768; 2026-02-21T08:52:48.2040000Z .loc 1 52 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:52:32 2026-02-21T08:52:48.2040063Z 
add.s32 %r1015, %r1014, %r34; 2026-02-21T08:52:48.2040131Z ld.shared.b16 %rs145, [%r1015]; 2026-02-21T08:52:48.2040209Z ld.shared.b16 %rs146, [%r1015+1024]; 2026-02-21T08:52:48.2040280Z ld.shared.b16 %rs147, [%r1015+64]; 2026-02-21T08:52:48.2040351Z ld.shared.b16 %rs148, [%r1015+1088]; 2026-02-21T08:52:48.2040416Z add.s32 %r1016, %r1014, %r35; 2026-02-21T08:52:48.2040489Z ld.shared.b16 %rs149, [%r1016]; 2026-02-21T08:52:48.2040557Z ld.shared.b16 %rs150, [%r1016+1024]; 2026-02-21T08:52:48.2040625Z ld.shared.b16 %rs151, [%r1016+64]; 2026-02-21T08:52:48.2040699Z ld.shared.b16 %rs152, [%r1016+1088]; 2026-02-21T08:52:48.2040765Z add.s32 %r1017, %r1014, %r36; 2026-02-21T08:52:48.2040832Z ld.shared.b16 %rs153, [%r1017]; 2026-02-21T08:52:48.2040901Z ld.shared.b16 %rs154, [%r1017+1024]; 2026-02-21T08:52:48.2040973Z ld.shared.b16 %rs155, [%r1017+64]; 2026-02-21T08:52:48.2041043Z ld.shared.b16 %rs156, [%r1017+1088]; 2026-02-21T08:52:48.2041105Z add.s32 %r1018, %r1014, %r37; 2026-02-21T08:52:48.2041183Z ld.shared.b16 %rs157, [%r1018]; 2026-02-21T08:52:48.2041251Z ld.shared.b16 %rs158, [%r1018+1024]; 2026-02-21T08:52:48.2041319Z ld.shared.b16 %rs159, [%r1018+64]; 2026-02-21T08:52:48.2041393Z ld.shared.b16 %rs160, [%r1018+1088]; 2026-02-21T08:52:48.2041459Z add.s32 %r1019, %r1014, %r38; 2026-02-21T08:52:48.2041527Z ld.shared.b16 %rs161, [%r1019]; 2026-02-21T08:52:48.2041597Z ld.shared.b16 %rs162, [%r1019+1024]; 2026-02-21T08:52:48.2041670Z ld.shared.b16 %rs163, [%r1019+64]; 2026-02-21T08:52:48.2041739Z ld.shared.b16 %rs164, [%r1019+1088]; 2026-02-21T08:52:48.2041801Z add.s32 %r1020, %r1014, %r39; 2026-02-21T08:52:48.2041876Z ld.shared.b16 %rs165, [%r1020]; 2026-02-21T08:52:48.2041943Z ld.shared.b16 %rs166, [%r1020+1024]; 2026-02-21T08:52:48.2042011Z ld.shared.b16 %rs167, [%r1020+64]; 2026-02-21T08:52:48.2042187Z ld.shared.b16 %rs168, [%r1020+1088]; 2026-02-21T08:52:48.2042259Z add.s32 %r1021, %r1014, %r40; 2026-02-21T08:52:48.2042324Z ld.shared.b16 %rs169, [%r1021]; 
2026-02-21T08:52:48.2042404Z ld.shared.b16 %rs170, [%r1021+1024]; 2026-02-21T08:52:48.2042478Z ld.shared.b16 %rs171, [%r1021+64]; 2026-02-21T08:52:48.2042545Z ld.shared.b16 %rs172, [%r1021+1088]; 2026-02-21T08:52:48.2042608Z add.s32 %r1022, %r1014, %r41; 2026-02-21T08:52:48.2042678Z ld.shared.b16 %rs173, [%r1022]; 2026-02-21T08:52:48.2042747Z ld.shared.b16 %rs174, [%r1022+1024]; 2026-02-21T08:52:48.2042815Z ld.shared.b16 %rs175, [%r1022+64]; 2026-02-21T08:52:48.2042882Z ld.shared.b16 %rs176, [%r1022+1088]; 2026-02-21T08:52:48.2042959Z cvt.f32.bf16 %r840, %rs145; 2026-02-21T08:52:48.2043022Z cvt.f32.bf16 %r841, %rs146; 2026-02-21T08:52:48.2043177Z cvt.f32.bf16 %r842, %rs149; 2026-02-21T08:52:48.2043247Z cvt.f32.bf16 %r843, %rs150; 2026-02-21T08:52:48.2043309Z cvt.f32.bf16 %r860, %rs153; 2026-02-21T08:52:48.2043380Z cvt.f32.bf16 %r861, %rs154; 2026-02-21T08:52:48.2043441Z cvt.f32.bf16 %r862, %rs157; 2026-02-21T08:52:48.2043508Z cvt.f32.bf16 %r863, %rs158; 2026-02-21T08:52:48.2043569Z cvt.f32.bf16 %r880, %rs161; 2026-02-21T08:52:48.2043632Z cvt.f32.bf16 %r881, %rs162; 2026-02-21T08:52:48.2043698Z cvt.f32.bf16 %r882, %rs165; 2026-02-21T08:52:48.2043759Z cvt.f32.bf16 %r883, %rs166; 2026-02-21T08:52:48.2043821Z cvt.f32.bf16 %r900, %rs169; 2026-02-21T08:52:48.2043884Z cvt.f32.bf16 %r901, %rs170; 2026-02-21T08:52:48.2043951Z cvt.f32.bf16 %r902, %rs173; 2026-02-21T08:52:48.2044024Z cvt.f32.bf16 %r903, %rs174; 2026-02-21T08:52:48.2044089Z cvt.f32.bf16 %r920, %rs147; 2026-02-21T08:52:48.2044157Z cvt.f32.bf16 %r921, %rs148; 2026-02-21T08:52:48.2044219Z cvt.f32.bf16 %r922, %rs151; 2026-02-21T08:52:48.2044283Z cvt.f32.bf16 %r923, %rs152; 2026-02-21T08:52:48.2044348Z cvt.f32.bf16 %r940, %rs155; 2026-02-21T08:52:48.2044420Z cvt.f32.bf16 %r941, %rs156; 2026-02-21T08:52:48.2044481Z cvt.f32.bf16 %r942, %rs159; 2026-02-21T08:52:48.2044544Z cvt.f32.bf16 %r943, %rs160; 2026-02-21T08:52:48.2044614Z cvt.f32.bf16 %r960, %rs163; 2026-02-21T08:52:48.2044679Z cvt.f32.bf16 %r961, %rs164; 
2026-02-21T08:52:48.2044741Z cvt.f32.bf16 %r962, %rs167; 2026-02-21T08:52:48.2044809Z cvt.f32.bf16 %r963, %rs168; 2026-02-21T08:52:48.2044873Z cvt.f32.bf16 %r980, %rs171; 2026-02-21T08:52:48.2044938Z cvt.f32.bf16 %r981, %rs172; 2026-02-21T08:52:48.2045000Z cvt.f32.bf16 %r982, %rs175; 2026-02-21T08:52:48.2045078Z cvt.f32.bf16 %r983, %rs176; 2026-02-21T08:52:48.2045290Z .loc 1 67 45 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:67:45 2026-02-21T08:52:48.2045356Z add.s32 %r1023, %r27, %r1011; 2026-02-21T08:52:48.2045430Z ld.shared.b8 %rs177, [%r1023]; 2026-02-21T08:52:48.2045507Z ld.shared.b8 %rs178, [%r1023+512]; 2026-02-21T08:52:48.2045577Z ld.shared.b8 %rs179, [%r1023+1024]; 2026-02-21T08:52:48.2045645Z ld.shared.b8 %rs180, [%r1023+1536]; 2026-02-21T08:52:48.2045720Z ld.shared.b8 %rs181, [%r1023+2048]; 2026-02-21T08:52:48.2045788Z ld.shared.b8 %rs182, [%r1023+2560]; 2026-02-21T08:52:48.2045855Z ld.shared.b8 %rs183, [%r1023+3072]; 2026-02-21T08:52:48.2045928Z ld.shared.b8 %rs184, [%r1023+3584]; 2026-02-21T08:52:48.2046129Z .loc 1 57 28 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:57:28 2026-02-21T08:52:48.2046194Z shl.b16 %rs185, %rs177, 4; 2026-02-21T08:52:48.2046266Z shl.b16 %rs186, %rs178, 4; 2026-02-21T08:52:48.2046328Z shl.b16 %rs187, %rs179, 4; 2026-02-21T08:52:48.2046391Z shl.b16 %rs188, %rs180, 4; 2026-02-21T08:52:48.2046571Z shl.b16 %rs189, %rs181, 4; 2026-02-21T08:52:48.2046643Z shl.b16 %rs190, %rs182, 4; 2026-02-21T08:52:48.2046707Z shl.b16 %rs191, %rs183, 4; 2026-02-21T08:52:48.2046774Z shl.b16 %rs192, %rs184, 4; 2026-02-21T08:52:48.2046981Z .loc 1 72 58 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:72:58 2026-02-21T08:52:48.2047057Z selp.b16 %rs193, %rs185, %rs177, %p57; 2026-02-21T08:52:48.2047281Z cvt.s16.s8 %rs194, %rs193; 2026-02-21T08:52:48.2047344Z shr.s16 %rs195, %rs194, 4; 2026-02-21T08:52:48.2047423Z selp.b16 %rs196, %rs186, %rs178, %p57; 2026-02-21T08:52:48.2047486Z cvt.s16.s8 %rs197, %rs196; 
2026-02-21T08:52:48.2047548Z shr.s16 %rs198, %rs197, 4; 2026-02-21T08:52:48.2047624Z selp.b16 %rs199, %rs187, %rs179, %p57; 2026-02-21T08:52:48.2047689Z cvt.s16.s8 %rs200, %rs199; 2026-02-21T08:52:48.2047751Z shr.s16 %rs201, %rs200, 4; 2026-02-21T08:52:48.2047827Z selp.b16 %rs202, %rs188, %rs180, %p57; 2026-02-21T08:52:48.2047893Z cvt.s16.s8 %rs203, %rs202; 2026-02-21T08:52:48.2047955Z shr.s16 %rs204, %rs203, 4; 2026-02-21T08:52:48.2048025Z selp.b16 %rs205, %rs189, %rs181, %p57; 2026-02-21T08:52:48.2048104Z cvt.s16.s8 %rs206, %rs205; 2026-02-21T08:52:48.2048296Z shr.s16 %rs207, %rs206, 4; 2026-02-21T08:52:48.2048370Z selp.b16 %rs208, %rs190, %rs182, %p57; 2026-02-21T08:52:48.2048441Z cvt.s16.s8 %rs209, %rs208; 2026-02-21T08:52:48.2048509Z shr.s16 %rs210, %rs209, 4; 2026-02-21T08:52:48.2048580Z selp.b16 %rs211, %rs191, %rs183, %p57; 2026-02-21T08:52:48.2048644Z cvt.s16.s8 %rs212, %rs211; 2026-02-21T08:52:48.2048712Z shr.s16 %rs213, %rs212, 4; 2026-02-21T08:52:48.2048782Z selp.b16 %rs214, %rs192, %rs184, %p57; 2026-02-21T08:52:48.2048844Z cvt.s16.s8 %rs215, %rs214; 2026-02-21T08:52:48.2048911Z shr.s16 %rs216, %rs215, 4; 2026-02-21T08:52:48.2049117Z .loc 1 77 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:77:32 2026-02-21T08:52:48.2049186Z cvt.rn.f32.s16 %r1024, %rs195; 2026-02-21T08:52:48.2049258Z cvt.rn.f32.s16 %r1025, %rs198; 2026-02-21T08:52:48.2049323Z cvt.rn.f32.s16 %r1026, %rs201; 2026-02-21T08:52:48.2049390Z cvt.rn.f32.s16 %r1027, %rs204; 2026-02-21T08:52:48.2049456Z cvt.rn.f32.s16 %r1028, %rs207; 2026-02-21T08:52:48.2049528Z cvt.rn.f32.s16 %r1029, %rs210; 2026-02-21T08:52:48.2049592Z cvt.rn.f32.s16 %r1030, %rs213; 2026-02-21T08:52:48.2049656Z cvt.rn.f32.s16 %r1031, %rs216; 2026-02-21T08:52:48.2049735Z st.shared.b32 [%r42], %r1024; 2026-02-21T08:52:48.2049805Z st.shared.b32 [%r42+16384], %r1028; 2026-02-21T08:52:48.2049873Z st.shared.b32 [%r43], %r1025; 2026-02-21T08:52:48.2049943Z st.shared.b32 [%r43+16384], %r1029; 2026-02-21T08:52:48.2050018Z 
st.shared.b32 [%r44], %r1026; 2026-02-21T08:52:48.2050086Z st.shared.b32 [%r44+16384], %r1030; 2026-02-21T08:52:48.2050151Z st.shared.b32 [%r45], %r1027; 2026-02-21T08:52:48.2050226Z st.shared.b32 [%r45+16384], %r1031; 2026-02-21T08:52:48.2050284Z $L__tmp5: 2026-02-21T08:52:48.2050562Z .loc 2 291 36 // standard.py:291:36 @[ cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:84:40 ] 2026-02-21T08:52:48.2050633Z // begin inline asm 2026-02-21T08:52:48.2050716Z fence.proxy.async.shared::cta; 2026-02-21T08:52:48.2050774Z // end inline asm 2026-02-21T08:52:48.2050834Z bar.sync 0; 2026-02-21T08:52:48.2050921Z shfl.sync.idx.b32 %r1032, %r4, 0, 31, -1; 2026-02-21T08:52:48.2050999Z wgmma.fence.sync.aligned; 2026-02-21T08:52:48.2051060Z shl.b32 %r1033, %r1032, 9; 2026-02-21T08:52:48.2051127Z and.b32 %r1034, %r1033, 14336; 2026-02-21T08:52:48.2051191Z add.s32 %r1035, %r1034, %r1386; 2026-02-21T08:52:48.2051253Z bfe.u32 %r1036, %r1035, 4, 14; 2026-02-21T08:52:48.2051315Z cvt.u64.u32 %rd96, %r1036; 2026-02-21T08:52:48.2051392Z or.b64 %rd86, %rd96, 4611686293372403712; 2026-02-21T08:52:48.2051458Z mov.pred %p28, -1; 2026-02-21T08:52:48.2051530Z // begin inline asm 2026-02-21T08:52:48.2051915Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1431,%r1432,%r1433,%r1434,%r1435,%r1436,%r1437,%r1438}, {%r840,%r841,%r842,%r843}, %rd86, %p28, 1, 1; 2026-02-21T08:52:48.2051974Z // end inline asm 2026-02-21T08:52:48.2052039Z add.s32 %r1037, %r1035, 32; 2026-02-21T08:52:48.2052105Z bfe.u32 %r1038, %r1037, 4, 14; 2026-02-21T08:52:48.2052178Z cvt.u64.u32 %rd97, %r1038; 2026-02-21T08:52:48.2052250Z or.b64 %rd87, %rd97, 4611686293372403712; 2026-02-21T08:52:48.2052415Z // begin inline asm 2026-02-21T08:52:48.2052787Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1431,%r1432,%r1433,%r1434,%r1435,%r1436,%r1437,%r1438}, {%r860,%r861,%r862,%r863}, %rd87, %p28, 1, 1; 2026-02-21T08:52:48.2052848Z // end inline asm 2026-02-21T08:52:48.2052909Z add.s32 %r1039, %r1035, 64; 
2026-02-21T08:52:48.2052975Z bfe.u32 %r1040, %r1039, 4, 14; 2026-02-21T08:52:48.2053043Z cvt.u64.u32 %rd98, %r1040; 2026-02-21T08:52:48.2053114Z or.b64 %rd88, %rd98, 4611686293372403712; 2026-02-21T08:52:48.2053174Z // begin inline asm 2026-02-21T08:52:48.2053543Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1431,%r1432,%r1433,%r1434,%r1435,%r1436,%r1437,%r1438}, {%r880,%r881,%r882,%r883}, %rd88, %p28, 1, 1; 2026-02-21T08:52:48.2053602Z // end inline asm 2026-02-21T08:52:48.2053774Z add.s32 %r1041, %r1035, 96; 2026-02-21T08:52:48.2053845Z bfe.u32 %r1042, %r1041, 4, 14; 2026-02-21T08:52:48.2053908Z cvt.u64.u32 %rd99, %r1042; 2026-02-21T08:52:48.2053978Z or.b64 %rd89, %rd99, 4611686293372403712; 2026-02-21T08:52:48.2054048Z // begin inline asm 2026-02-21T08:52:48.2054408Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1431,%r1432,%r1433,%r1434,%r1435,%r1436,%r1437,%r1438}, {%r900,%r901,%r902,%r903}, %rd89, %p28, 1, 1; 2026-02-21T08:52:48.2054466Z // end inline asm 2026-02-21T08:52:48.2054530Z add.s32 %r1043, %r1035, 16384; 2026-02-21T08:52:48.2054596Z bfe.u32 %r1044, %r1043, 4, 14; 2026-02-21T08:52:48.2054665Z cvt.u64.u32 %rd100, %r1044; 2026-02-21T08:52:48.2054742Z or.b64 %rd90, %rd100, 4611686293372403712; 2026-02-21T08:52:48.2054808Z // begin inline asm 2026-02-21T08:52:48.2055171Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1431,%r1432,%r1433,%r1434,%r1435,%r1436,%r1437,%r1438}, {%r920,%r921,%r922,%r923}, %rd90, %p28, 1, 1; 2026-02-21T08:52:48.2055233Z // end inline asm 2026-02-21T08:52:48.2055301Z add.s32 %r1045, %r1035, 16416; 2026-02-21T08:52:48.2055363Z bfe.u32 %r1046, %r1045, 4, 14; 2026-02-21T08:52:48.2055429Z cvt.u64.u32 %rd101, %r1046; 2026-02-21T08:52:48.2055515Z or.b64 %rd91, %rd101, 4611686293372403712; 2026-02-21T08:52:48.2055582Z // begin inline asm 2026-02-21T08:52:48.2055941Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1431,%r1432,%r1433,%r1434,%r1435,%r1436,%r1437,%r1438}, {%r940,%r941,%r942,%r943}, %rd91, 
%p28, 1, 1; 2026-02-21T08:52:48.2055999Z // end inline asm 2026-02-21T08:52:48.2056066Z add.s32 %r1047, %r1035, 16448; 2026-02-21T08:52:48.2056128Z bfe.u32 %r1048, %r1047, 4, 14; 2026-02-21T08:52:48.2056192Z cvt.u64.u32 %rd102, %r1048; 2026-02-21T08:52:48.2056272Z or.b64 %rd92, %rd102, 4611686293372403712; 2026-02-21T08:52:48.2056336Z // begin inline asm 2026-02-21T08:52:48.2056837Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1431,%r1432,%r1433,%r1434,%r1435,%r1436,%r1437,%r1438}, {%r960,%r961,%r962,%r963}, %rd92, %p28, 1, 1; 2026-02-21T08:52:48.2056898Z // end inline asm 2026-02-21T08:52:48.2056968Z add.s32 %r1049, %r1035, 16480; 2026-02-21T08:52:48.2057032Z bfe.u32 %r1050, %r1049, 4, 14; 2026-02-21T08:52:48.2057101Z cvt.u64.u32 %rd103, %r1050; 2026-02-21T08:52:48.2057180Z or.b64 %rd93, %rd103, 4611686293372403712; 2026-02-21T08:52:48.2057242Z // begin inline asm 2026-02-21T08:52:48.2057605Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1431,%r1432,%r1433,%r1434,%r1435,%r1436,%r1437,%r1438}, {%r980,%r981,%r982,%r983}, %rd93, %p28, 1, 1; 2026-02-21T08:52:48.2057671Z // end inline asm 2026-02-21T08:52:48.2057750Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:48.2057809Z mov.b32 %r993, 0; 2026-02-21T08:52:48.2057871Z mov.b32 %r994, %r993; 2026-02-21T08:52:48.2057939Z mov.b32 %r992, %r1386; 2026-02-21T08:52:48.2057999Z // begin inline asm 2026-02-21T08:52:48.2058174Z // wait for regs: %r1431,%r1432,%r1433,%r1434,%r1435,%r1436,%r1437,%r1438,%r992,%r993,%r994 2026-02-21T08:52:48.2058266Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:48.2058327Z // end inline asm 2026-02-21T08:52:48.2058383Z $L__tmp6: 2026-02-21T08:52:48.2058603Z .loc 1 40 103 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:40:103 2026-02-21T08:52:48.2058815Z add.s32 %r1051, %r1430, 1; 2026-02-21T08:52:48.2058886Z setp.gt.s32 %p39, %r1051, 1; 2026-02-21T08:52:48.2058969Z selp.b32 %r1430, 0, %r1051, %p39; 2026-02-21T08:52:48.2059187Z .loc 1 48 32 // 
cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:32 2026-02-21T08:52:48.2059261Z mad.wide.s32 %rd94, %r1427, 2, %rd10; 2026-02-21T08:52:48.2059466Z .loc 1 48 80 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:80 2026-02-21T08:52:48.2059537Z shl.b32 %r1052, %r1430, 12; 2026-02-21T08:52:48.2059599Z shl.b32 %r1053, %r1430, 13; 2026-02-21T08:52:48.2059665Z add.s32 %r1006, %r13, %r1053; 2026-02-21T08:52:48.2059854Z selp.b32 %r1007, 8, 0, %p37; 2026-02-21T08:52:48.2059927Z // begin inline asm 2026-02-21T08:52:48.2060082Z cp.async.ca.shared.global [ %r1006 + 0 ], [ %rd94 + 0 ], 0x8, %r1007; 2026-02-21T08:52:48.2060143Z // end inline asm 2026-02-21T08:52:48.2060228Z cp.async.commit_group; 2026-02-21T08:52:48.2060438Z .loc 1 54 34 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:34 2026-02-21T08:52:48.2060507Z cvt.s64.s32 %rd104, %r1428; 2026-02-21T08:52:48.2060580Z add.s64 %rd95, %rd11, %rd104; 2026-02-21T08:52:48.2060784Z .loc 1 54 87 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:87 2026-02-21T08:52:48.2060849Z add.s32 %r1008, %r15, %r1052; 2026-02-21T08:52:48.2060918Z selp.b32 %r1009, 4, 0, %p37; 2026-02-21T08:52:48.2060988Z // begin inline asm 2026-02-21T08:52:48.2061129Z cp.async.ca.shared.global [ %r1008 + 0 ], [ %rd95 + 0 ], 0x4, %r1009; 2026-02-21T08:52:48.2061187Z // end inline asm 2026-02-21T08:52:48.2061263Z cp.async.commit_group; 2026-02-21T08:52:48.2061479Z .loc 1 40 103 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:40:103 2026-02-21T08:52:48.2061546Z add.s32 %r1428, %r1428, 229376; 2026-02-21T08:52:48.2061617Z add.s32 %r1427, %r1427, 64; 2026-02-21T08:52:48.2061687Z setp.lt.u64 %p40, %rd135, 4064; 2026-02-21T08:52:48.2061750Z @%p40 bra $L__BB0_7; 2026-02-21T08:52:48.2061861Z // %bb.8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:48.2062072Z .loc 1 31 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:31:32 2026-02-21T08:52:48.2062137Z or.b32 %r1058, %r129, %r8; 
2026-02-21T08:52:48.2062345Z .loc 1 40 103 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:40:103 2026-02-21T08:52:48.2062419Z cp.async.wait_group 0; 2026-02-21T08:52:48.2062477Z bar.sync 0; 2026-02-21T08:52:48.2062677Z .loc 1 87 28 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:87:28 2026-02-21T08:52:48.2062765Z cvt.rn.bf16x2.f32 %r1059, %r1432, %r1431; 2026-02-21T08:52:48.2062841Z cvt.rn.bf16x2.f32 %r1060, %r1434, %r1433; 2026-02-21T08:52:48.2062916Z cvt.rn.bf16x2.f32 %r1061, %r1436, %r1435; 2026-02-21T08:52:48.2062995Z cvt.rn.bf16x2.f32 %r1062, %r1438, %r1437; 2026-02-21T08:52:48.2063203Z .loc 1 88 50 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:88:50 2026-02-21T08:52:48.2063275Z mad.lo.s32 %r1063, %r130, 7168, %r1058; 2026-02-21T08:52:48.2063475Z .loc 1 88 22 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:88:22 2026-02-21T08:52:48.2063554Z mad.wide.s32 %rd105, %r1063, 2, %rd12; 2026-02-21T08:52:48.2063758Z .loc 1 88 81 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:88:81 2026-02-21T08:52:48.2063950Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r32], {%r1059, %r1060, %r1061, %r1062}; 2026-02-21T08:52:48.2064021Z bar.sync 0; 2026-02-21T08:52:48.2064147Z ld.shared.v4.b32 {%r1054, %r1055, %r1056, %r1057}, [%r33]; 2026-02-21T08:52:48.2064211Z // begin inline asm 2026-02-21T08:52:48.2064349Z st.global.v4.b32 [ %rd105 + 0 ], { %r1054, %r1055, %r1056, %r1057 }; 2026-02-21T08:52:48.2064511Z // end inline asm 2026-02-21T08:52:48.2064724Z .loc 1 19 144 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:19:144 2026-02-21T08:52:48.2064790Z add.s32 %r1402, %r1402, 12672; 2026-02-21T08:52:48.2064868Z setp.lt.s32 %p41, %r1402, %r1439; 2026-02-21T08:52:48.2064930Z @%p41 bra $L__BB0_2; 2026-02-21T08:52:48.2065020Z $L__BB0_9: // %.preheader 2026-02-21T08:52:48.2065094Z setp.gt.s32 %p42, %r1439, 55; 2026-02-21T08:52:48.2065158Z @%p42 bra $L__BB0_14; 2026-02-21T08:52:48.2065244Z // %bb.10: // %.lr.ph148 
2026-02-21T08:52:48.2065451Z .loc 1 0 144 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:0:144 2026-02-21T08:52:48.2065521Z and.b32 %r1066, %r1385, 56; 2026-02-21T08:52:48.2065686Z xor.b32 %r1067, %r1066, %r1384; 2026-02-21T08:52:48.2065754Z add.s32 %r1069, %r1386, %r1067; 2026-02-21T08:52:48.2065823Z add.s32 %r48, %r1069, 32768; 2026-02-21T08:52:48.2065889Z add.s32 %r1071, %r1386, 49152; 2026-02-21T08:52:48.2065953Z add.s32 %r50, %r1071, %r1388; 2026-02-21T08:52:48.2066021Z add.s32 %r1117, %r1069, 40960; 2026-02-21T08:52:48.2066231Z .loc 1 19 144 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:19:144 2026-02-21T08:52:48.2066294Z add.s32 %r52, %r1387, 229376; 2026-02-21T08:52:48.2066358Z add.s32 %r1072, %r1386, %r1388; 2026-02-21T08:52:48.2066428Z add.s32 %r1119, %r1072, 53248; 2026-02-21T08:52:48.2066623Z and.b32 %r1074, %r1389, 6144; 2026-02-21T08:52:48.2066691Z and.b32 %r1076, %r1390, 896; 2026-02-21T08:52:48.2066761Z or.b32 %r1078, %r1074, %r1076; 2026-02-21T08:52:48.2066825Z or.b32 %r54, %r1078, %r1391; 2026-02-21T08:52:48.2066888Z xor.b32 %r55, %r54, 8; 2026-02-21T08:52:48.2066955Z xor.b32 %r56, %r54, 16; 2026-02-21T08:52:48.2067024Z xor.b32 %r57, %r54, 24; 2026-02-21T08:52:48.2067096Z xor.b32 %r58, %r54, 32; 2026-02-21T08:52:48.2067159Z xor.b32 %r59, %r54, 40; 2026-02-21T08:52:48.2067230Z xor.b32 %r60, %r54, 48; 2026-02-21T08:52:48.2067291Z xor.b32 %r61, %r54, 56; 2026-02-21T08:52:48.2067355Z and.b32 %r1080, %r1385, 384; 2026-02-21T08:52:48.2067427Z add.s32 %r1081, %r1071, %r1080; 2026-02-21T08:52:48.2067492Z add.s32 %r62, %r1081, %r1392; 2026-02-21T08:52:48.2067557Z shl.b32 %r1082, %r1392, 7; 2026-02-21T08:52:48.2067620Z and.b32 %r1084, %r1393, 112; 2026-02-21T08:52:48.2067691Z or.b32 %r1086, %r1082, %r1084; 2026-02-21T08:52:48.2067757Z xor.b32 %r1087, %r1086, %r1394; 2026-02-21T08:52:48.2067821Z add.s32 %r63, %r1386, %r1087; 2026-02-21T08:52:48.2067891Z xor.b32 %r1088, %r1087, 32; 2026-02-21T08:52:48.2067955Z add.s32 %r64, %r1386, 
%r1088; 2026-02-21T08:52:48.2068017Z xor.b32 %r1089, %r1087, 64; 2026-02-21T08:52:48.2068079Z add.s32 %r65, %r1386, %r1089; 2026-02-21T08:52:48.2068155Z xor.b32 %r1090, %r1087, 96; 2026-02-21T08:52:48.2068219Z add.s32 %r66, %r1386, %r1090; 2026-02-21T08:52:48.2068283Z shl.b32 %r1092, %r1395, 11; 2026-02-21T08:52:48.2068358Z and.b32 %r1094, %r1384, 768; 2026-02-21T08:52:48.2068423Z and.b32 %r1096, %r1397, 96; 2026-02-21T08:52:48.2068491Z and.b32 %r1099, %r1399, 1024; 2026-02-21T08:52:48.2068641Z or.b32 %r1100, %r1396, %r1094; 2026-02-21T08:52:48.2068716Z or.b32 %r1101, %r1096, %r1398; 2026-02-21T08:52:48.2068782Z xor.b32 %r1102, %r1100, %r1101; 2026-02-21T08:52:48.2068850Z add.s32 %r1103, %r1386, %r1092; 2026-02-21T08:52:48.2068919Z add.s32 %r1104, %r1103, %r1099; 2026-02-21T08:52:48.2068983Z add.s32 %r67, %r1104, %r1102; 2026-02-21T08:52:48.2069047Z and.b32 %r1106, %r1400, 15360; 2026-02-21T08:52:48.2069111Z shl.b32 %r1107, %r1395, 4; 2026-02-21T08:52:48.2069181Z xor.b32 %r1108, %r1107, %r9; 2026-02-21T08:52:48.2069245Z add.s32 %r1109, %r1386, %r1106; 2026-02-21T08:52:48.2069311Z add.s32 %r68, %r1109, %r1108; 2026-02-21T08:52:48.2069532Z .loc 1 40 103 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:40:103 2026-02-21T08:52:48.2069596Z or.b32 %r1110, %r1387, %r6; 2026-02-21T08:52:48.2069810Z add.s32 %r69, %r1110, 458752; 2026-02-21T08:52:48.2069879Z or.b32 %r1112, %r1401, %r11; 2026-02-21T08:52:48.2069944Z or.b32 %r70, %r1112, 128; 2026-02-21T08:52:48.2070059Z $L__BB0_11: // =>This Loop Header: Depth=1 2026-02-21T08:52:48.2070162Z // Child Loop BB0_12 Depth 2 2026-02-21T08:52:48.2070371Z .loc 1 25 35 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:25:35 2026-02-21T08:52:48.2070437Z shr.u32 %r1124, %r1439, 31; 2026-02-21T08:52:48.2070501Z add.s32 %r1125, %r1439, %r1124; 2026-02-21T08:52:48.2070709Z .loc 1 26 33 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:26:33 2026-02-21T08:52:48.2070773Z and.b32 %r1126, %r1125, -2; 
2026-02-21T08:52:48.2071093Z .loc 1 27 39 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:27:39 2026-02-21T08:52:48.2071167Z sub.s32 %r1127, 56, %r1126; 2026-02-21T08:52:48.2071371Z .loc 1 27 52 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:27:52 2026-02-21T08:52:48.2071432Z min.s32 %r1128, %r1127, 2; 2026-02-21T08:52:48.2071638Z .loc 1 28 45 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:28:45 2026-02-21T08:52:48.2071702Z sub.s32 %r1129, %r1439, %r1126; 2026-02-21T08:52:48.2071900Z .loc 1 29 51 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:29:51 2026-02-21T08:52:48.2071964Z div.s32 %r1130, %r1129, %r1128; 2026-02-21T08:52:48.2072173Z .loc 1 28 64 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:28:64 2026-02-21T08:52:48.2072239Z mul.lo.s32 %r1131, %r1130, %r1128; 2026-02-21T08:52:48.2072304Z sub.s32 %r1132, %r1129, %r1131; 2026-02-21T08:52:48.2072508Z .loc 1 28 30 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:28:30 2026-02-21T08:52:48.2072571Z add.s32 %r1133, %r1132, %r1126; 2026-02-21T08:52:48.2072771Z .loc 1 30 27 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:30:27 2026-02-21T08:52:48.2072837Z shl.b32 %r159, %r1133, 7; 2026-02-21T08:52:48.2073036Z .loc 1 31 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:31:32 2026-02-21T08:52:48.2073099Z or.b32 %r1134, %r159, %r6; 2026-02-21T08:52:48.2073305Z .loc 1 32 27 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:32:27 2026-02-21T08:52:48.2073379Z shl.b32 %r1135, %r1130, 6; 2026-02-21T08:52:48.2073581Z .loc 1 33 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:33:32 2026-02-21T08:52:48.2073643Z or.b32 %r160, %r1135, %r10; 2026-02-21T08:52:48.2073848Z .loc 1 48 53 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:53 2026-02-21T08:52:48.2073912Z shl.b32 %r1136, %r160, 13; 2026-02-21T08:52:48.2074111Z .loc 1 48 60 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:60 
2026-02-21T08:52:48.2074182Z or.b32 %r1137, %r1136, %r11; 2026-02-21T08:52:48.2074383Z .loc 1 48 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:32 2026-02-21T08:52:48.2074457Z mad.wide.s32 %rd106, %r1137, 2, %rd10; 2026-02-21T08:52:48.2074522Z mov.b32 %r1114, 8; 2026-02-21T08:52:48.2074721Z .loc 1 48 80 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:80 2026-02-21T08:52:48.2074782Z // begin inline asm 2026-02-21T08:52:48.2074927Z cp.async.ca.shared.global [ %r48 + 0 ], [ %rd106 + 0 ], 0x8, %r1114; 2026-02-21T08:52:48.2074988Z // end inline asm 2026-02-21T08:52:48.2075060Z cp.async.commit_group; 2026-02-21T08:52:48.2075264Z .loc 1 54 62 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:62 2026-02-21T08:52:48.2075334Z add.s32 %r1138, %r1134, %r1387; 2026-02-21T08:52:48.2075535Z .loc 1 54 34 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:34 2026-02-21T08:52:48.2075706Z cvt.s64.s32 %rd111, %r1138; 2026-02-21T08:52:48.2075789Z add.s64 %rd107, %rd11, %rd111; 2026-02-21T08:52:48.2075849Z mov.b32 %r1116, 4; 2026-02-21T08:52:48.2076049Z .loc 1 54 87 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:87 2026-02-21T08:52:48.2076118Z // begin inline asm 2026-02-21T08:52:48.2076254Z cp.async.ca.shared.global [ %r50 + 0 ], [ %rd107 + 0 ], 0x4, %r1116; 2026-02-21T08:52:48.2076313Z // end inline asm 2026-02-21T08:52:48.2076381Z cp.async.commit_group; 2026-02-21T08:52:48.2076709Z .loc 1 48 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:32 2026-02-21T08:52:48.2076779Z add.s64 %rd108, %rd106, 128; 2026-02-21T08:52:48.2077125Z .loc 1 48 80 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:80 2026-02-21T08:52:48.2077193Z bar.sync 0; 2026-02-21T08:52:48.2077257Z // begin inline asm 2026-02-21T08:52:48.2077404Z cp.async.ca.shared.global [ %r1117 + 0 ], [ %rd108 + 0 ], 0x8, %r1114; 2026-02-21T08:52:48.2077464Z // end inline asm 2026-02-21T08:52:48.2077538Z cp.async.commit_group; 
2026-02-21T08:52:48.2077602Z add.s32 %r1139, %r1134, %r52; 2026-02-21T08:52:48.2077802Z .loc 1 54 34 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:34 2026-02-21T08:52:48.2077876Z cvt.s64.s32 %rd112, %r1139; 2026-02-21T08:52:48.2077941Z add.s64 %rd109, %rd11, %rd112; 2026-02-21T08:52:48.2078141Z .loc 1 54 87 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:87 2026-02-21T08:52:48.2078207Z // begin inline asm 2026-02-21T08:52:48.2078358Z cp.async.ca.shared.global [ %r1119 + 0 ], [ %rd109 + 0 ], 0x4, %r1116; 2026-02-21T08:52:48.2078423Z // end inline asm 2026-02-21T08:52:48.2078492Z cp.async.commit_group; 2026-02-21T08:52:48.2078710Z .loc 1 40 103 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:40:103 2026-02-21T08:52:48.2078778Z shl.b32 %r1140, %r1439, 7; 2026-02-21T08:52:48.2078843Z add.s32 %r1141, %r69, %r1140; 2026-02-21T08:52:48.2078911Z shl.b32 %r1142, %r1131, 7; 2026-02-21T08:52:48.2078975Z sub.s32 %r1441, %r1141, %r1142; 2026-02-21T08:52:48.2079038Z shl.b32 %r1143, %r1130, 19; 2026-02-21T08:52:48.2079106Z or.b32 %r1440, %r70, %r1143; 2026-02-21T08:52:48.2079167Z mov.b32 %r1444, 0f00000000; 2026-02-21T08:52:48.2079226Z mov.b32 %r1443, 1; 2026-02-21T08:52:48.2079289Z mov.b32 %r1442, -1; 2026-02-21T08:52:48.2079358Z mov.b64 %rd136, -32; 2026-02-21T08:52:48.2079426Z mov.b32 %r1445, %r1444; 2026-02-21T08:52:48.2079488Z mov.b32 %r1446, %r1444; 2026-02-21T08:52:48.2079555Z mov.b32 %r1447, %r1444; 2026-02-21T08:52:48.2079616Z mov.b32 %r1448, %r1444; 2026-02-21T08:52:48.2079684Z mov.b32 %r1449, %r1444; 2026-02-21T08:52:48.2079745Z mov.b32 %r1450, %r1444; 2026-02-21T08:52:48.2079814Z mov.b32 %r1451, %r1444; 2026-02-21T08:52:48.2079925Z $L__BB0_12: // Parent Loop BB0_11 Depth=1 2026-02-21T08:52:48.2080034Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:48.2080101Z add.s64 %rd136, %rd136, 32; 2026-02-21T08:52:48.2080169Z setp.lt.u64 %p52, %rd136, 4032; 2026-02-21T08:52:48.2080235Z add.s32 %r1330, %r1442, 1; 
2026-02-21T08:52:48.2080302Z setp.gt.s32 %p53, %r1330, 1; 2026-02-21T08:52:48.2080379Z selp.b32 %r1442, 0, %r1330, %p53; 2026-02-21T08:52:48.2080582Z .loc 1 48 80 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:80 2026-02-21T08:52:48.2080653Z cp.async.wait_group 2; 2026-02-21T08:52:48.2080725Z bar.sync 0; 2026-02-21T08:52:48.2080788Z shl.b32 %r1331, %r1442, 12; 2026-02-21T08:52:48.2080851Z shl.b32 %r1332, %r1442, 13; 2026-02-21T08:52:48.2080923Z add.s32 %r1333, %r1386, %r1332; 2026-02-21T08:52:48.2080987Z add.s32 %r1334, %r1333, 32768; 2026-02-21T08:52:48.2081186Z .loc 1 52 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:52:32 2026-02-21T08:52:48.2081384Z add.s32 %r1335, %r1334, %r54; 2026-02-21T08:52:48.2081462Z ld.shared.b16 %rs217, [%r1335]; 2026-02-21T08:52:48.2081536Z ld.shared.b16 %rs218, [%r1335+1024]; 2026-02-21T08:52:48.2081606Z ld.shared.b16 %rs219, [%r1335+64]; 2026-02-21T08:52:48.2081696Z ld.shared.b16 %rs220, [%r1335+1088]; 2026-02-21T08:52:48.2081761Z add.s32 %r1336, %r1334, %r55; 2026-02-21T08:52:48.2081826Z ld.shared.b16 %rs221, [%r1336]; 2026-02-21T08:52:48.2081903Z ld.shared.b16 %rs222, [%r1336+1024]; 2026-02-21T08:52:48.2081971Z ld.shared.b16 %rs223, [%r1336+64]; 2026-02-21T08:52:48.2082039Z ld.shared.b16 %rs224, [%r1336+1088]; 2026-02-21T08:52:48.2082104Z add.s32 %r1337, %r1334, %r56; 2026-02-21T08:52:48.2082175Z ld.shared.b16 %rs225, [%r1337]; 2026-02-21T08:52:48.2082343Z ld.shared.b16 %rs226, [%r1337+1024]; 2026-02-21T08:52:48.2082418Z ld.shared.b16 %rs227, [%r1337+64]; 2026-02-21T08:52:48.2082491Z ld.shared.b16 %rs228, [%r1337+1088]; 2026-02-21T08:52:48.2082557Z add.s32 %r1338, %r1334, %r57; 2026-02-21T08:52:48.2082623Z ld.shared.b16 %rs229, [%r1338]; 2026-02-21T08:52:48.2082698Z ld.shared.b16 %rs230, [%r1338+1024]; 2026-02-21T08:52:48.2082771Z ld.shared.b16 %rs231, [%r1338+64]; 2026-02-21T08:52:48.2082838Z ld.shared.b16 %rs232, [%r1338+1088]; 2026-02-21T08:52:48.2082900Z add.s32 %r1339, %r1334, %r58; 
2026-02-21T08:52:48.2082970Z ld.shared.b16 %rs233, [%r1339]; 2026-02-21T08:52:48.2083036Z ld.shared.b16 %rs234, [%r1339+1024]; 2026-02-21T08:52:48.2083103Z ld.shared.b16 %rs235, [%r1339+64]; 2026-02-21T08:52:48.2083170Z ld.shared.b16 %rs236, [%r1339+1088]; 2026-02-21T08:52:48.2083235Z add.s32 %r1340, %r1334, %r59; 2026-02-21T08:52:48.2083301Z ld.shared.b16 %rs237, [%r1340]; 2026-02-21T08:52:48.2083372Z ld.shared.b16 %rs238, [%r1340+1024]; 2026-02-21T08:52:48.2083443Z ld.shared.b16 %rs239, [%r1340+64]; 2026-02-21T08:52:48.2083510Z ld.shared.b16 %rs240, [%r1340+1088]; 2026-02-21T08:52:48.2083574Z add.s32 %r1341, %r1334, %r60; 2026-02-21T08:52:48.2083648Z ld.shared.b16 %rs241, [%r1341]; 2026-02-21T08:52:48.2083716Z ld.shared.b16 %rs242, [%r1341+1024]; 2026-02-21T08:52:48.2083784Z ld.shared.b16 %rs243, [%r1341+64]; 2026-02-21T08:52:48.2083850Z ld.shared.b16 %rs244, [%r1341+1088]; 2026-02-21T08:52:48.2083918Z add.s32 %r1342, %r1334, %r61; 2026-02-21T08:52:48.2083983Z ld.shared.b16 %rs245, [%r1342]; 2026-02-21T08:52:48.2084049Z ld.shared.b16 %rs246, [%r1342+1024]; 2026-02-21T08:52:48.2084123Z ld.shared.b16 %rs247, [%r1342+64]; 2026-02-21T08:52:48.2084199Z ld.shared.b16 %rs248, [%r1342+1088]; 2026-02-21T08:52:48.2084267Z cvt.f32.bf16 %r1160, %rs217; 2026-02-21T08:52:48.2084334Z cvt.f32.bf16 %r1161, %rs218; 2026-02-21T08:52:48.2084404Z cvt.f32.bf16 %r1162, %rs221; 2026-02-21T08:52:48.2084470Z cvt.f32.bf16 %r1163, %rs222; 2026-02-21T08:52:48.2084533Z cvt.f32.bf16 %r1180, %rs225; 2026-02-21T08:52:48.2084601Z cvt.f32.bf16 %r1181, %rs226; 2026-02-21T08:52:48.2084666Z cvt.f32.bf16 %r1182, %rs229; 2026-02-21T08:52:48.2084732Z cvt.f32.bf16 %r1183, %rs230; 2026-02-21T08:52:48.2084799Z cvt.f32.bf16 %r1200, %rs233; 2026-02-21T08:52:48.2084860Z cvt.f32.bf16 %r1201, %rs234; 2026-02-21T08:52:48.2084922Z cvt.f32.bf16 %r1202, %rs237; 2026-02-21T08:52:48.2084984Z cvt.f32.bf16 %r1203, %rs238; 2026-02-21T08:52:48.2085054Z cvt.f32.bf16 %r1220, %rs241; 2026-02-21T08:52:48.2085118Z 
cvt.f32.bf16 %r1221, %rs242; 2026-02-21T08:52:48.2085180Z cvt.f32.bf16 %r1222, %rs245; 2026-02-21T08:52:48.2085248Z cvt.f32.bf16 %r1223, %rs246; 2026-02-21T08:52:48.2085322Z cvt.f32.bf16 %r1240, %rs219; 2026-02-21T08:52:48.2085389Z cvt.f32.bf16 %r1241, %rs220; 2026-02-21T08:52:48.2085452Z cvt.f32.bf16 %r1242, %rs223; 2026-02-21T08:52:48.2085526Z cvt.f32.bf16 %r1243, %rs224; 2026-02-21T08:52:48.2085592Z cvt.f32.bf16 %r1260, %rs227; 2026-02-21T08:52:48.2085658Z cvt.f32.bf16 %r1261, %rs228; 2026-02-21T08:52:48.2085733Z cvt.f32.bf16 %r1262, %rs231; 2026-02-21T08:52:48.2085795Z cvt.f32.bf16 %r1263, %rs232; 2026-02-21T08:52:48.2085963Z cvt.f32.bf16 %r1280, %rs235; 2026-02-21T08:52:48.2086025Z cvt.f32.bf16 %r1281, %rs236; 2026-02-21T08:52:48.2086097Z cvt.f32.bf16 %r1282, %rs239; 2026-02-21T08:52:48.2086159Z cvt.f32.bf16 %r1283, %rs240; 2026-02-21T08:52:48.2086223Z cvt.f32.bf16 %r1300, %rs243; 2026-02-21T08:52:48.2086289Z cvt.f32.bf16 %r1301, %rs244; 2026-02-21T08:52:48.2086351Z cvt.f32.bf16 %r1302, %rs247; 2026-02-21T08:52:48.2086412Z cvt.f32.bf16 %r1303, %rs248; 2026-02-21T08:52:48.2086733Z .loc 1 67 45 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:67:45 2026-02-21T08:52:48.2086813Z add.s32 %r1343, %r62, %r1331; 2026-02-21T08:52:48.2086884Z ld.shared.b8 %rs249, [%r1343]; 2026-02-21T08:52:48.2086952Z ld.shared.b8 %rs250, [%r1343+512]; 2026-02-21T08:52:48.2087161Z ld.shared.b8 %rs251, [%r1343+1024]; 2026-02-21T08:52:48.2087234Z ld.shared.b8 %rs252, [%r1343+1536]; 2026-02-21T08:52:48.2087302Z ld.shared.b8 %rs253, [%r1343+2048]; 2026-02-21T08:52:48.2087379Z ld.shared.b8 %rs254, [%r1343+2560]; 2026-02-21T08:52:48.2087446Z ld.shared.b8 %rs255, [%r1343+3072]; 2026-02-21T08:52:48.2087513Z ld.shared.b8 %rs256, [%r1343+3584]; 2026-02-21T08:52:48.2087725Z .loc 1 57 28 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:57:28 2026-02-21T08:52:48.2087801Z shl.b16 %rs257, %rs249, 4; 2026-02-21T08:52:48.2087864Z shl.b16 %rs258, %rs250, 4; 
2026-02-21T08:52:48.2087929Z shl.b16 %rs259, %rs251, 4; 2026-02-21T08:52:48.2087997Z shl.b16 %rs260, %rs252, 4; 2026-02-21T08:52:48.2088060Z shl.b16 %rs261, %rs253, 4; 2026-02-21T08:52:48.2088122Z shl.b16 %rs262, %rs254, 4; 2026-02-21T08:52:48.2088191Z shl.b16 %rs263, %rs255, 4; 2026-02-21T08:52:48.2088254Z shl.b16 %rs264, %rs256, 4; 2026-02-21T08:52:48.2088461Z .loc 1 72 58 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:72:58 2026-02-21T08:52:48.2088537Z selp.b16 %rs265, %rs257, %rs249, %p57; 2026-02-21T08:52:48.2088608Z cvt.s16.s8 %rs266, %rs265; 2026-02-21T08:52:48.2088675Z shr.s16 %rs267, %rs266, 4; 2026-02-21T08:52:48.2088749Z selp.b16 %rs268, %rs258, %rs250, %p57; 2026-02-21T08:52:48.2088818Z cvt.s16.s8 %rs269, %rs268; 2026-02-21T08:52:48.2088883Z shr.s16 %rs270, %rs269, 4; 2026-02-21T08:52:48.2088952Z selp.b16 %rs271, %rs259, %rs251, %p57; 2026-02-21T08:52:48.2089016Z cvt.s16.s8 %rs272, %rs271; 2026-02-21T08:52:48.2089083Z shr.s16 %rs273, %rs272, 4; 2026-02-21T08:52:48.2089152Z selp.b16 %rs274, %rs260, %rs252, %p57; 2026-02-21T08:52:48.2089216Z cvt.s16.s8 %rs275, %rs274; 2026-02-21T08:52:48.2089281Z shr.s16 %rs276, %rs275, 4; 2026-02-21T08:52:48.2089350Z selp.b16 %rs277, %rs261, %rs253, %p57; 2026-02-21T08:52:48.2089411Z cvt.s16.s8 %rs278, %rs277; 2026-02-21T08:52:48.2089474Z shr.s16 %rs279, %rs278, 4; 2026-02-21T08:52:48.2089551Z selp.b16 %rs280, %rs262, %rs254, %p57; 2026-02-21T08:52:48.2089614Z cvt.s16.s8 %rs281, %rs280; 2026-02-21T08:52:48.2089675Z shr.s16 %rs282, %rs281, 4; 2026-02-21T08:52:48.2089751Z selp.b16 %rs283, %rs263, %rs255, %p57; 2026-02-21T08:52:48.2089813Z cvt.s16.s8 %rs284, %rs283; 2026-02-21T08:52:48.2089875Z shr.s16 %rs285, %rs284, 4; 2026-02-21T08:52:48.2089948Z selp.b16 %rs286, %rs264, %rs256, %p57; 2026-02-21T08:52:48.2090010Z cvt.s16.s8 %rs287, %rs286; 2026-02-21T08:52:48.2090072Z shr.s16 %rs288, %rs287, 4; 2026-02-21T08:52:48.2090275Z .loc 1 77 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:77:32 
2026-02-21T08:52:48.2090346Z cvt.rn.f32.s16 %r1344, %rs267; 2026-02-21T08:52:48.2090411Z cvt.rn.f32.s16 %r1345, %rs270; 2026-02-21T08:52:48.2090476Z cvt.rn.f32.s16 %r1346, %rs273; 2026-02-21T08:52:48.2090555Z cvt.rn.f32.s16 %r1347, %rs276; 2026-02-21T08:52:48.2090621Z cvt.rn.f32.s16 %r1348, %rs279; 2026-02-21T08:52:48.2090688Z cvt.rn.f32.s16 %r1349, %rs282; 2026-02-21T08:52:48.2090752Z cvt.rn.f32.s16 %r1350, %rs285; 2026-02-21T08:52:48.2090820Z cvt.rn.f32.s16 %r1351, %rs288; 2026-02-21T08:52:48.2090887Z st.shared.b32 [%r63], %r1344; 2026-02-21T08:52:48.2091090Z st.shared.b32 [%r63+16384], %r1348; 2026-02-21T08:52:48.2091160Z st.shared.b32 [%r64], %r1345; 2026-02-21T08:52:48.2091229Z st.shared.b32 [%r64+16384], %r1349; 2026-02-21T08:52:48.2091292Z st.shared.b32 [%r65], %r1346; 2026-02-21T08:52:48.2091364Z st.shared.b32 [%r65+16384], %r1350; 2026-02-21T08:52:48.2091428Z st.shared.b32 [%r66], %r1347; 2026-02-21T08:52:48.2091495Z st.shared.b32 [%r66+16384], %r1351; 2026-02-21T08:52:48.2091551Z $L__tmp7: 2026-02-21T08:52:48.2091843Z .loc 2 291 36 // standard.py:291:36 @[ cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:84:40 ] 2026-02-21T08:52:48.2091905Z // begin inline asm 2026-02-21T08:52:48.2091985Z fence.proxy.async.shared::cta; 2026-02-21T08:52:48.2092049Z // end inline asm 2026-02-21T08:52:48.2092205Z bar.sync 0; 2026-02-21T08:52:48.2092291Z shfl.sync.idx.b32 %r1352, %r4, 0, 31, -1; 2026-02-21T08:52:48.2092368Z wgmma.fence.sync.aligned; 2026-02-21T08:52:48.2092437Z shl.b32 %r1353, %r1352, 9; 2026-02-21T08:52:48.2092503Z and.b32 %r1354, %r1353, 14336; 2026-02-21T08:52:48.2092568Z add.s32 %r1355, %r1354, %r1386; 2026-02-21T08:52:48.2092636Z bfe.u32 %r1356, %r1355, 4, 14; 2026-02-21T08:52:48.2092702Z cvt.u64.u32 %rd123, %r1356; 2026-02-21T08:52:48.2092781Z or.b64 %rd113, %rd123, 4611686293372403712; 2026-02-21T08:52:48.2092848Z mov.pred %p43, -1; 2026-02-21T08:52:48.2092913Z // begin inline asm 2026-02-21T08:52:48.2093303Z 
wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1444,%r1445,%r1446,%r1447,%r1448,%r1449,%r1450,%r1451}, {%r1160,%r1161,%r1162,%r1163}, %rd113, %p43, 1, 1; 2026-02-21T08:52:48.2093363Z // end inline asm 2026-02-21T08:52:48.2093432Z add.s32 %r1357, %r1355, 32; 2026-02-21T08:52:48.2093495Z bfe.u32 %r1358, %r1357, 4, 14; 2026-02-21T08:52:48.2093562Z cvt.u64.u32 %rd124, %r1358; 2026-02-21T08:52:48.2093643Z or.b64 %rd114, %rd124, 4611686293372403712; 2026-02-21T08:52:48.2093704Z // begin inline asm 2026-02-21T08:52:48.2094078Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1444,%r1445,%r1446,%r1447,%r1448,%r1449,%r1450,%r1451}, {%r1180,%r1181,%r1182,%r1183}, %rd114, %p43, 1, 1; 2026-02-21T08:52:48.2094148Z // end inline asm 2026-02-21T08:52:48.2094212Z add.s32 %r1359, %r1355, 64; 2026-02-21T08:52:48.2094273Z bfe.u32 %r1360, %r1359, 4, 14; 2026-02-21T08:52:48.2094338Z cvt.u64.u32 %rd125, %r1360; 2026-02-21T08:52:48.2094419Z or.b64 %rd115, %rd125, 4611686293372403712; 2026-02-21T08:52:48.2094482Z // begin inline asm 2026-02-21T08:52:48.2094866Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1444,%r1445,%r1446,%r1447,%r1448,%r1449,%r1450,%r1451}, {%r1200,%r1201,%r1202,%r1203}, %rd115, %p43, 1, 1; 2026-02-21T08:52:48.2094932Z // end inline asm 2026-02-21T08:52:48.2094996Z add.s32 %r1361, %r1355, 96; 2026-02-21T08:52:48.2095063Z bfe.u32 %r1362, %r1361, 4, 14; 2026-02-21T08:52:48.2095128Z cvt.u64.u32 %rd126, %r1362; 2026-02-21T08:52:48.2095207Z or.b64 %rd116, %rd126, 4611686293372403712; 2026-02-21T08:52:48.2095271Z // begin inline asm 2026-02-21T08:52:48.2095640Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1444,%r1445,%r1446,%r1447,%r1448,%r1449,%r1450,%r1451}, {%r1220,%r1221,%r1222,%r1223}, %rd116, %p43, 1, 1; 2026-02-21T08:52:48.2095704Z // end inline asm 2026-02-21T08:52:48.2095767Z add.s32 %r1363, %r1355, 16384; 2026-02-21T08:52:48.2095831Z bfe.u32 %r1364, %r1363, 4, 14; 2026-02-21T08:52:48.2099933Z cvt.u64.u32 %rd127, %r1364; 
2026-02-21T08:52:48.2100067Z or.b64 %rd117, %rd127, 4611686293372403712; 2026-02-21T08:52:48.2100137Z // begin inline asm 2026-02-21T08:52:48.2100539Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1444,%r1445,%r1446,%r1447,%r1448,%r1449,%r1450,%r1451}, {%r1240,%r1241,%r1242,%r1243}, %rd117, %p43, 1, 1; 2026-02-21T08:52:48.2100609Z // end inline asm 2026-02-21T08:52:48.2100687Z add.s32 %r1365, %r1355, 16416; 2026-02-21T08:52:48.2100752Z bfe.u32 %r1366, %r1365, 4, 14; 2026-02-21T08:52:48.2100825Z cvt.u64.u32 %rd128, %r1366; 2026-02-21T08:52:48.2101130Z or.b64 %rd118, %rd128, 4611686293372403712; 2026-02-21T08:52:48.2101195Z // begin inline asm 2026-02-21T08:52:48.2101578Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1444,%r1445,%r1446,%r1447,%r1448,%r1449,%r1450,%r1451}, {%r1260,%r1261,%r1262,%r1263}, %rd118, %p43, 1, 1; 2026-02-21T08:52:48.2101648Z // end inline asm 2026-02-21T08:52:48.2101715Z add.s32 %r1367, %r1355, 16448; 2026-02-21T08:52:48.2101781Z bfe.u32 %r1368, %r1367, 4, 14; 2026-02-21T08:52:48.2101854Z cvt.u64.u32 %rd129, %r1368; 2026-02-21T08:52:48.2101934Z or.b64 %rd119, %rd129, 4611686293372403712; 2026-02-21T08:52:48.2101996Z // begin inline asm 2026-02-21T08:52:48.2102511Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1444,%r1445,%r1446,%r1447,%r1448,%r1449,%r1450,%r1451}, {%r1280,%r1281,%r1282,%r1283}, %rd119, %p43, 1, 1; 2026-02-21T08:52:48.2102577Z // end inline asm 2026-02-21T08:52:48.2102643Z add.s32 %r1369, %r1355, 16480; 2026-02-21T08:52:48.2102708Z bfe.u32 %r1370, %r1369, 4, 14; 2026-02-21T08:52:48.2102785Z cvt.u64.u32 %rd130, %r1370; 2026-02-21T08:52:48.2102866Z or.b64 %rd120, %rd130, 4611686293372403712; 2026-02-21T08:52:48.2102942Z // begin inline asm 2026-02-21T08:52:48.2103333Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1444,%r1445,%r1446,%r1447,%r1448,%r1449,%r1450,%r1451}, {%r1300,%r1301,%r1302,%r1303}, %rd120, %p43, 1, 1; 2026-02-21T08:52:48.2103394Z // end inline asm 2026-02-21T08:52:48.2103474Z 
wgmma.commit_group.sync.aligned; 2026-02-21T08:52:48.2103540Z mov.b32 %r1313, 0; 2026-02-21T08:52:48.2103603Z mov.b32 %r1314, %r1313; 2026-02-21T08:52:48.2103663Z mov.b32 %r1312, %r1386; 2026-02-21T08:52:48.2103724Z // begin inline asm 2026-02-21T08:52:48.2103912Z // wait for regs: %r1444,%r1445,%r1446,%r1447,%r1448,%r1449,%r1450,%r1451,%r1312,%r1313,%r1314 2026-02-21T08:52:48.2104002Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:48.2104063Z // end inline asm 2026-02-21T08:52:48.2104127Z $L__tmp8: 2026-02-21T08:52:48.2104359Z .loc 1 40 103 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:40:103 2026-02-21T08:52:48.2104428Z add.s32 %r1371, %r1443, 1; 2026-02-21T08:52:48.2104503Z setp.gt.s32 %p54, %r1371, 1; 2026-02-21T08:52:48.2104575Z selp.b32 %r1443, 0, %r1371, %p54; 2026-02-21T08:52:48.2104787Z .loc 1 48 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:32 2026-02-21T08:52:48.2104864Z mad.wide.s32 %rd121, %r1440, 2, %rd10; 2026-02-21T08:52:48.2105073Z .loc 1 48 80 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:48:80 2026-02-21T08:52:48.2105140Z shl.b32 %r1372, %r1443, 12; 2026-02-21T08:52:48.2105201Z shl.b32 %r1373, %r1443, 13; 2026-02-21T08:52:48.2105273Z add.s32 %r1326, %r48, %r1373; 2026-02-21T08:52:48.2105339Z selp.b32 %r1327, 8, 0, %p52; 2026-02-21T08:52:48.2105406Z // begin inline asm 2026-02-21T08:52:48.2105561Z cp.async.ca.shared.global [ %r1326 + 0 ], [ %rd121 + 0 ], 0x8, %r1327; 2026-02-21T08:52:48.2105636Z // end inline asm 2026-02-21T08:52:48.2105710Z cp.async.commit_group; 2026-02-21T08:52:48.2105924Z .loc 1 54 34 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:34 2026-02-21T08:52:48.2105995Z cvt.s64.s32 %rd131, %r1441; 2026-02-21T08:52:48.2106064Z add.s64 %rd122, %rd11, %rd131; 2026-02-21T08:52:48.2106262Z .loc 1 54 87 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:54:87 2026-02-21T08:52:48.2106332Z add.s32 %r1328, %r50, %r1372; 2026-02-21T08:52:48.2106396Z selp.b32 %r1329, 4, 0, 
%p52; 2026-02-21T08:52:48.2106623Z // begin inline asm 2026-02-21T08:52:48.2106778Z cp.async.ca.shared.global [ %r1328 + 0 ], [ %rd122 + 0 ], 0x4, %r1329; 2026-02-21T08:52:48.2106845Z // end inline asm 2026-02-21T08:52:48.2106915Z cp.async.commit_group; 2026-02-21T08:52:48.2107130Z .loc 1 40 103 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:40:103 2026-02-21T08:52:48.2107198Z add.s32 %r1441, %r1441, 229376; 2026-02-21T08:52:48.2107417Z add.s32 %r1440, %r1440, 64; 2026-02-21T08:52:48.2107489Z setp.lt.u64 %p55, %rd136, 4064; 2026-02-21T08:52:48.2107554Z @%p55 bra $L__BB0_12; 2026-02-21T08:52:48.2107679Z // %bb.13: // in Loop: Header=BB0_11 Depth=1 2026-02-21T08:52:48.2107884Z .loc 1 31 32 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:31:32 2026-02-21T08:52:48.2107950Z or.b32 %r1378, %r159, %r8; 2026-02-21T08:52:48.2108165Z .loc 1 40 103 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:40:103 2026-02-21T08:52:48.2108235Z cp.async.wait_group 0; 2026-02-21T08:52:48.2108295Z bar.sync 0; 2026-02-21T08:52:48.2108497Z .loc 1 87 28 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:87:28 2026-02-21T08:52:48.2108776Z cvt.rn.bf16x2.f32 %r1379, %r1445, %r1444; 2026-02-21T08:52:48.2108861Z cvt.rn.bf16x2.f32 %r1380, %r1447, %r1446; 2026-02-21T08:52:48.2108938Z cvt.rn.bf16x2.f32 %r1381, %r1449, %r1448; 2026-02-21T08:52:48.2109023Z cvt.rn.bf16x2.f32 %r1382, %r1451, %r1450; 2026-02-21T08:52:48.2109231Z .loc 1 88 50 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:88:50 2026-02-21T08:52:48.2109304Z mad.lo.s32 %r1383, %r160, 7168, %r1378; 2026-02-21T08:52:48.2109516Z .loc 1 88 22 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:88:22 2026-02-21T08:52:48.2109590Z mad.wide.s32 %rd132, %r1383, 2, %rd12; 2026-02-21T08:52:48.2109789Z .loc 1 88 81 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:88:81 2026-02-21T08:52:48.2109978Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r67], {%r1379, %r1380, %r1381, %r1382}; 
2026-02-21T08:52:48.2110042Z bar.sync 0; 2026-02-21T08:52:48.2110159Z ld.shared.v4.b32 {%r1374, %r1375, %r1376, %r1377}, [%r68]; 2026-02-21T08:52:48.2110224Z // begin inline asm 2026-02-21T08:52:48.2110361Z st.global.v4.b32 [ %rd132 + 0 ], { %r1374, %r1375, %r1376, %r1377 }; 2026-02-21T08:52:48.2110424Z // end inline asm 2026-02-21T08:52:48.2110640Z .loc 1 19 144 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:19:144 2026-02-21T08:52:48.2110711Z add.s32 %r187, %r1439, 4224; 2026-02-21T08:52:48.2110785Z setp.lt.s32 %p56, %r1439, -4168; 2026-02-21T08:52:48.2110848Z mov.b32 %r1439, %r187; 2026-02-21T08:52:48.2110909Z @%p56 bra $L__BB0_11; 2026-02-21T08:52:48.2111006Z $L__BB0_14: // %._crit_edge 2026-02-21T08:52:48.2111208Z .loc 1 19 4 // cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py:19:4 2026-02-21T08:52:48.2111262Z ret; 2026-02-21T08:52:48.2111327Z $L__tmp9: 2026-02-21T08:52:48.2111384Z $L__func_end0: 2026-02-21T08:52:48.2111473Z // -- End function 2026-02-21T08:52:48.2111540Z } 2026-02-21T08:52:48.2111790Z .file 1 "/tmp/torchinductor_root/xg/cxgm6uqqxxr7t2r2jxzfgzzihcsnemo7uu5qqkgmcf6dvqp7fgkd.py" 2026-02-21T08:52:48.2112001Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T08:52:48.2112071Z .section .debug_abbrev 2026-02-21T08:52:48.2112133Z { 2026-02-21T08:52:48.2112229Z .b8 1 // Abbreviation Code 2026-02-21T08:52:48.2112337Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:52:48.2112435Z .b8 1 // DW_CHILDREN_yes 2026-02-21T08:52:48.2112521Z .b8 37 // DW_AT_producer 2026-02-21T08:52:48.2112601Z .b8 8 // DW_FORM_string 2026-02-21T08:52:48.2112689Z .b8 19 // DW_AT_language 2026-02-21T08:52:48.2112772Z .b8 5 // DW_FORM_data2 2026-02-21T08:52:48.2112855Z .b8 3 // DW_AT_name 2026-02-21T08:52:48.2112938Z .b8 8 // DW_FORM_string 2026-02-21T08:52:48.2113030Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:52:48.2113216Z .b8 6 // DW_FORM_data4 2026-02-21T08:52:48.2113298Z .b8 27 // DW_AT_comp_dir 
2026-02-21T08:52:48.2113383Z .b8 8 // DW_FORM_string 2026-02-21T08:52:48.2113460Z .b8 0 // EOM(1) 2026-02-21T08:52:48.2113536Z .b8 0 // EOM(2) 2026-02-21T08:52:48.2113629Z .b8 2 // Abbreviation Code 2026-02-21T08:52:48.2113720Z .b8 46 // DW_TAG_subprogram 2026-02-21T08:52:48.2113801Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:52:48.2113879Z .b8 3 // DW_AT_name 2026-02-21T08:52:48.2114053Z .b8 8 // DW_FORM_string 2026-02-21T08:52:48.2114139Z .b8 32 // DW_AT_inline 2026-02-21T08:52:48.2114224Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:48.2114306Z .b8 0 // EOM(1) 2026-02-21T08:52:48.2114377Z .b8 0 // EOM(2) 2026-02-21T08:52:48.2114462Z .b8 3 // Abbreviation Code 2026-02-21T08:52:48.2114552Z .b8 46 // DW_TAG_subprogram 2026-02-21T08:52:48.2114636Z .b8 1 // DW_CHILDREN_yes 2026-02-21T08:52:48.2114716Z .b8 17 // DW_AT_low_pc 2026-02-21T08:52:48.2114801Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:48.2114898Z .b8 18 // DW_AT_high_pc 2026-02-21T08:52:48.2114977Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:48.2115075Z .b8 49 // DW_AT_abstract_origin 2026-02-21T08:52:48.2115159Z .b8 19 // DW_FORM_ref4 2026-02-21T08:52:48.2115231Z .b8 0 // EOM(1) 2026-02-21T08:52:48.2115302Z .b8 0 // EOM(2) 2026-02-21T08:52:48.2115398Z .b8 4 // Abbreviation Code 2026-02-21T08:52:48.2115500Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T08:52:48.2115583Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:52:48.2115683Z .b8 49 // DW_AT_abstract_origin 2026-02-21T08:52:48.2115770Z .b8 19 // DW_FORM_ref4 2026-02-21T08:52:48.2115849Z .b8 17 // DW_AT_low_pc 2026-02-21T08:52:48.2115926Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:48.2116018Z .b8 18 // DW_AT_high_pc 2026-02-21T08:52:48.2116098Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:48.2116182Z .b8 88 // DW_AT_call_file 2026-02-21T08:52:48.2116268Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:48.2116352Z .b8 89 // DW_AT_call_line 2026-02-21T08:52:48.2116432Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:48.2116662Z .b8 87 // DW_AT_call_column 2026-02-21T08:52:48.2116747Z .b8 11 // 
DW_FORM_data1 2026-02-21T08:52:48.2116819Z .b8 0 // EOM(1) 2026-02-21T08:52:48.2116890Z .b8 0 // EOM(2) 2026-02-21T08:52:48.2116981Z .b8 0 // EOM(3) 2026-02-21T08:52:48.2117038Z } 2026-02-21T08:52:48.2117109Z .section .debug_info 2026-02-21T08:52:48.2117171Z { 2026-02-21T08:52:48.2117266Z .b32 178 // Length of Unit 2026-02-21T08:52:48.2117368Z .b8 2 // DWARF version number 2026-02-21T08:52:48.2117432Z .b8 0 2026-02-21T08:52:48.2117564Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:52:48.2117808Z .b8 8 // Address Size (in bytes) 2026-02-21T08:52:48.2117924Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T08:52:48.2118018Z .b8 116 // DW_AT_producer 2026-02-21T08:52:48.2118074Z .b8 114 2026-02-21T08:52:48.2118130Z .b8 105 2026-02-21T08:52:48.2118190Z .b8 116 2026-02-21T08:52:48.2118244Z .b8 111 2026-02-21T08:52:48.2118297Z .b8 110 2026-02-21T08:52:48.2118353Z .b8 0 2026-02-21T08:52:48.2118439Z .b8 2 // DW_AT_language 2026-02-21T08:52:48.2118492Z .b8 0 2026-02-21T08:52:48.2118571Z .b8 99 // DW_AT_name 2026-02-21T08:52:48.2118633Z .b8 120 2026-02-21T08:52:48.2118689Z .b8 103 2026-02-21T08:52:48.2118870Z .b8 109 2026-02-21T08:52:48.2118927Z .b8 54 2026-02-21T08:52:48.2118988Z .b8 117 2026-02-21T08:52:48.2119044Z .b8 113 2026-02-21T08:52:48.2119096Z .b8 113 2026-02-21T08:52:48.2119158Z .b8 120 2026-02-21T08:52:48.2119214Z .b8 120 2026-02-21T08:52:48.2119267Z .b8 114 2026-02-21T08:52:48.2119318Z .b8 55 2026-02-21T08:52:48.2119389Z .b8 116 2026-02-21T08:52:48.2119443Z .b8 50 2026-02-21T08:52:48.2119498Z .b8 114 2026-02-21T08:52:48.2119556Z .b8 50 2026-02-21T08:52:48.2119609Z .b8 106 2026-02-21T08:52:48.2119662Z .b8 120 2026-02-21T08:52:48.2119716Z .b8 122 2026-02-21T08:52:48.2119775Z .b8 102 2026-02-21T08:52:48.2119828Z .b8 103 2026-02-21T08:52:48.2119880Z .b8 122 2026-02-21T08:52:48.2119933Z .b8 122 2026-02-21T08:52:48.2119994Z .b8 105 2026-02-21T08:52:48.2120047Z .b8 104 2026-02-21T08:52:48.2120099Z .b8 99 2026-02-21T08:52:48.2120157Z .b8 115 
2026-02-21T08:52:48.2120210Z .b8 110 2026-02-21T08:52:48.2120263Z .b8 101 2026-02-21T08:52:48.2120315Z .b8 109 2026-02-21T08:52:48.2120381Z .b8 111 2026-02-21T08:52:48.2120434Z .b8 55 2026-02-21T08:52:48.2120486Z .b8 117 2026-02-21T08:52:48.2120545Z .b8 117 2026-02-21T08:52:48.2120598Z .b8 53 2026-02-21T08:52:48.2120652Z .b8 113 2026-02-21T08:52:48.2120708Z .b8 113 2026-02-21T08:52:48.2120769Z .b8 107 2026-02-21T08:52:48.2120822Z .b8 103 2026-02-21T08:52:48.2120876Z .b8 109 2026-02-21T08:52:48.2120931Z .b8 99 2026-02-21T08:52:48.2120990Z .b8 102 2026-02-21T08:52:48.2121043Z .b8 54 2026-02-21T08:52:48.2121098Z .b8 100 2026-02-21T08:52:48.2121157Z .b8 118 2026-02-21T08:52:48.2121210Z .b8 113 2026-02-21T08:52:48.2121262Z .b8 112 2026-02-21T08:52:48.2121315Z .b8 55 2026-02-21T08:52:48.2121376Z .b8 102 2026-02-21T08:52:48.2121429Z .b8 103 2026-02-21T08:52:48.2121486Z .b8 107 2026-02-21T08:52:48.2121544Z .b8 100 2026-02-21T08:52:48.2121597Z .b8 46 2026-02-21T08:52:48.2121651Z .b8 112 2026-02-21T08:52:48.2121707Z .b8 121 2026-02-21T08:52:48.2121763Z .b8 0 2026-02-21T08:52:48.2121868Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:52:48.2121954Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:52:48.2122013Z .b8 116 2026-02-21T08:52:48.2122066Z .b8 109 2026-02-21T08:52:48.2122118Z .b8 112 2026-02-21T08:52:48.2122174Z .b8 47 2026-02-21T08:52:48.2122233Z .b8 116 2026-02-21T08:52:48.2122285Z .b8 111 2026-02-21T08:52:48.2122339Z .b8 114 2026-02-21T08:52:48.2122408Z .b8 99 2026-02-21T08:52:48.2122463Z .b8 104 2026-02-21T08:52:48.2122518Z .b8 105 2026-02-21T08:52:48.2122572Z .b8 110 2026-02-21T08:52:48.2122631Z .b8 100 2026-02-21T08:52:48.2122688Z .b8 117 2026-02-21T08:52:48.2122740Z .b8 99 2026-02-21T08:52:48.2122795Z .b8 116 2026-02-21T08:52:48.2122860Z .b8 111 2026-02-21T08:52:48.2122912Z .b8 114 2026-02-21T08:52:48.2122964Z .b8 95 2026-02-21T08:52:48.2123023Z .b8 114 2026-02-21T08:52:48.2123076Z .b8 111 2026-02-21T08:52:48.2123130Z .b8 111 2026-02-21T08:52:48.2123184Z .b8 116 
2026-02-21T08:52:48.2123244Z .b8 47 2026-02-21T08:52:48.2123299Z .b8 120 2026-02-21T08:52:48.2123353Z .b8 103 2026-02-21T08:52:48.2123421Z .b8 0 2026-02-21T08:52:48.2123536Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T08:52:48.2123616Z .b8 95 // DW_AT_name 2026-02-21T08:52:48.2123802Z .b8 104 2026-02-21T08:52:48.2123863Z .b8 101 2026-02-21T08:52:48.2123919Z .b8 108 2026-02-21T08:52:48.2123972Z .b8 105 2026-02-21T08:52:48.2124033Z .b8 111 2026-02-21T08:52:48.2124087Z .b8 110 2026-02-21T08:52:48.2124139Z .b8 95 2026-02-21T08:52:48.2124193Z .b8 109 2026-02-21T08:52:48.2124253Z .b8 97 2026-02-21T08:52:48.2124307Z .b8 116 2026-02-21T08:52:48.2124362Z .b8 109 2026-02-21T08:52:48.2124422Z .b8 117 2026-02-21T08:52:48.2124476Z .b8 108 2026-02-21T08:52:48.2124528Z .b8 95 2026-02-21T08:52:48.2124581Z .b8 98 2026-02-21T08:52:48.2124641Z .b8 102 2026-02-21T08:52:48.2124696Z .b8 49 2026-02-21T08:52:48.2124748Z .b8 54 2026-02-21T08:52:48.2124801Z .b8 95 2026-02-21T08:52:48.2124872Z .b8 105 2026-02-21T08:52:48.2124931Z .b8 110 2026-02-21T08:52:48.2124985Z .b8 116 2026-02-21T08:52:48.2125141Z .b8 52 2026-02-21T08:52:48.2125199Z .b8 0 2026-02-21T08:52:48.2125281Z .b8 1 // DW_AT_inline 2026-02-21T08:52:48.2125389Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T08:52:48.2125504Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T08:52:48.2125601Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T08:52:48.2125703Z .b32 108 // DW_AT_abstract_origin 2026-02-21T08:52:48.2125839Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T08:52:48.2125935Z .b32 108 // DW_AT_abstract_origin 2026-02-21T08:52:48.2126024Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T08:52:48.2126122Z .b64 $L__tmp8 // DW_AT_high_pc 2026-02-21T08:52:48.2126204Z .b8 1 // DW_AT_call_file 2026-02-21T08:52:48.2126292Z .b8 84 // DW_AT_call_line 2026-02-21T08:52:48.2126387Z .b8 40 // DW_AT_call_column 2026-02-21T08:52:48.2126591Z .b8 0 // End Of Children Mark 2026-02-21T08:52:48.2126688Z .b8 0 
// End Of Children Mark 2026-02-21T08:52:48.2126741Z } 2026-02-21T08:52:48.2126820Z .section .debug_macinfo { } 2026-02-21T08:52:48.2126826Z 2026-02-21T08:52:48.2126921Z ================================================================ 2026-02-21T08:52:48.2127046Z please share the reproducer above with Triton project. 2026-02-21T08:52:48.7350277Z 2026-02-21T08:52:48.7350294Z 2026-02-21T08:52:48.7350302Z 2026-02-21T08:52:48.7350709Z ================================================================ 2026-02-21T08:52:48.7351280Z Internal Triton PTX codegen error 2026-02-21T08:52:48.7351668Z `ptxas` stderr: 2026-02-21T08:52:48.7352782Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 337 in function _helion_matmul_bf16_int4. Try to compile with register target of 34 or higher. 2026-02-21T08:52:48.7354047Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:52:48.7354415Z 2026-02-21T08:52:48.7355392Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpyu_1xu3j.ptx -o /tmp/tmpyu_1xu3j.ptx.o 2026-02-21T08:52:48.7356896Z 2026-02-21T08:52:48.7356900Z 2026-02-21T08:52:48.7356970Z // 2026-02-21T08:52:48.7357171Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:52:48.7357411Z // 2026-02-21T08:52:48.7357505Z 2026-02-21T08:52:48.7357575Z .version 8.7 2026-02-21T08:52:48.7357757Z .target sm_90a 2026-02-21T08:52:48.7357928Z .address_size 64 2026-02-21T08:52:48.7358039Z 2026-02-21T08:52:48.7358258Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T08:52:48.7358679Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:52:48.7359001Z // @_helion_matmul_bf16_int4 2026-02-21T08:52:48.7359317Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T08:52:48.7359683Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T08:52:48.7360479Z .param .u64 .ptr .global .align 1 
_helion_matmul_bf16_int4_param_1, 2026-02-21T08:52:48.7360911Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T08:52:48.7361353Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T08:52:48.7361776Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T08:52:48.7362110Z ) 2026-02-21T08:52:48.7362261Z .reqntid 1024 2026-02-21T08:52:48.7362437Z .maxnreg 32 2026-02-21T08:52:48.7362592Z { 2026-02-21T08:52:48.7362755Z .reg .pred %p<61>; 2026-02-21T08:52:48.7362953Z .reg .b16 %rs<178>; 2026-02-21T08:52:48.7363143Z .reg .b32 %r<1001>; 2026-02-21T08:52:48.7363336Z .reg .b64 %rd<134>; 2026-02-21T08:52:48.7363914Z .loc 1 14 0 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:14:0 2026-02-21T08:52:48.7364374Z $L__func_begin0: 2026-02-21T08:52:48.7364741Z .loc 1 14 0 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:14:0 2026-02-21T08:52:48.7365122Z 2026-02-21T08:52:48.7365188Z // %bb.0: 2026-02-21T08:52:48.7365425Z ld.param.b64 %rd16, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T08:52:48.7365797Z ld.param.b64 %rd15, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T08:52:48.7366676Z [85s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T08:52:48.7368225Z Config: @helion.kernel(config=helion.Config(block_sizes=[32, 64, 32], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[2], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=32, num_stages=7, num_warps=32, pid_type='persistent_interleaved', range_flattens=[False, None], range_multi_buffers=[False, False], range_num_stages=[3, 3], range_unroll_factors=[3, 0], range_warp_specializes=[]), static_shapes=True) 2026-02-21T08:52:48.7369697Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:52:48.7370001Z `ptxas` stderr: 2026-02-21T08:52:48.7370551Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 337 in function _helion_matmul_bf16_int4. Try to compile with register target of 34 or higher. 2026-02-21T08:52:48.7371188Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:52:48.7371368Z 2026-02-21T08:52:48.7371877Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpyu_1xu3j.ptx -o /tmp/tmpyu_1xu3j.ptx.o 2026-02-21T08:52:48.7372446Z 2026-02-21T08:52:48.7372599Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:52:48.7372967Z ld.param.b64 %rd14, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T08:52:48.7373213Z $L__tmp0: 2026-02-21T08:52:48.7373517Z .loc 1 19 46 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:19:46 2026-02-21T08:52:48.7373892Z mov.u32 %r968, %ctaid.x; 2026-02-21T08:52:48.7374203Z .loc 1 0 0 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:0 2026-02-21T08:52:48.7374558Z sub.s32 %r126, 4447, %r968; 2026-02-21T08:52:48.7374898Z .loc 1 19 144 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:19:144 2026-02-21T08:52:48.7375277Z mul.hi.u32 %r127, %r126, 1041204193; 2026-02-21T08:52:48.7375476Z shr.u32 %r128, %r127, 10; 2026-02-21T08:52:48.7375658Z mul.hi.u32 %r129, %r128, 1431655766; 2026-02-21T08:52:48.7375879Z mad.lo.s32 %r993, %r129, 12672, %r968; 2026-02-21T08:52:48.7376231Z .loc 1 31 45 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:31:45 2026-02-21T08:52:48.7376728Z mov.u32 %r3, %tid.x; 2026-02-21T08:52:48.7376886Z and.b32 %r4, %r3, 31; 2026-02-21T08:52:48.7377056Z shr.u32 %r5, %r3, 5; 2026-02-21T08:52:48.7377210Z shr.u32 %r6, %r3, 4; 2026-02-21T08:52:48.7377379Z and.b32 %r130, %r3, 15; 2026-02-21T08:52:48.7377550Z shl.b32 %r7, %r130, 2; 2026-02-21T08:52:48.7378014Z .loc 1 33 45 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:33:45 2026-02-21T08:52:48.7378370Z and.b32 %r8, %r3, 992; 2026-02-21T08:52:48.7378529Z shl.b32 %r9, %r130, 1; 2026-02-21T08:52:48.7378833Z .loc 1 65 38 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:65:38 2026-02-21T08:52:48.7379188Z and.b32 %r10, %r3, 32; 2026-02-21T08:52:48.7379507Z .loc 1 19 144 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:19:144 2026-02-21T08:52:48.7379871Z setp.ge.s32 %p1, %r968, %r993; 2026-02-21T08:52:48.7380063Z shl.b32 %r950, %r3, 3; 2026-02-21T08:52:48.7380222Z shr.u32 %r951, %r3, 1; 2026-02-21T08:52:48.7380390Z mov.b32 %r952, global_smem; 2026-02-21T08:52:48.7380570Z shl.b32 %r953, %r3, 6; 
2026-02-21T08:52:48.7380863Z shl.b32 %r954, %r3, 5; 2026-02-21T08:52:48.7381031Z shl.b32 %r955, %r4, 1; 2026-02-21T08:52:48.7381191Z shl.b32 %r956, %r4, 2; 2026-02-21T08:52:48.7381364Z shl.b32 %r957, %r3, 1; 2026-02-21T08:52:48.7381531Z shr.u32 %r958, %r10, 4; 2026-02-21T08:52:48.7381707Z setp.gt.u32 %p59, %r3, 511; 2026-02-21T08:52:48.7381886Z and.b32 %r959, %r3, 896; 2026-02-21T08:52:48.7382071Z and.b32 %r960, %r5, 2; 2026-02-21T08:52:48.7382236Z shl.b32 %r961, %r4, 7; 2026-02-21T08:52:48.7382399Z shl.b32 %r962, %r3, 4; 2026-02-21T08:52:48.7382560Z shr.u32 %r963, %r8, 3; 2026-02-21T08:52:48.7382721Z shl.b32 %r964, %r3, 9; 2026-02-21T08:52:48.7382889Z shl.b32 %r965, %r5, 8; 2026-02-21T08:52:48.7383044Z shl.b32 %r966, %r4, 4; 2026-02-21T08:52:48.7383206Z shl.b32 %r967, %r5, 3; 2026-02-21T08:52:48.7383368Z setp.eq.b32 %p60, %r10, 0; 2026-02-21T08:52:48.7383551Z @%p1 bra $L__BB0_9; 2026-02-21T08:52:48.7383733Z // %bb.1: // %.lr.ph 2026-02-21T08:52:48.7384118Z .loc 1 0 144 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:0:144 2026-02-21T08:52:48.7384481Z and.b32 %r133, %r951, 56; 2026-02-21T08:52:48.7384665Z xor.b32 %r134, %r133, %r950; 2026-02-21T08:52:48.7384864Z add.s32 %r11, %r952, %r134; 2026-02-21T08:52:48.7385044Z add.s32 %r12, %r11, 8192; 2026-02-21T08:52:48.7385221Z and.b32 %r137, %r953, 6144; 2026-02-21T08:52:48.7385394Z and.b32 %r139, %r954, 896; 2026-02-21T08:52:48.7385571Z or.b32 %r141, %r137, %r139; 2026-02-21T08:52:48.7385743Z or.b32 %r13, %r141, %r955; 2026-02-21T08:52:48.7385917Z xor.b32 %r14, %r13, 8; 2026-02-21T08:52:48.7386087Z xor.b32 %r15, %r13, 16; 2026-02-21T08:52:48.7386258Z xor.b32 %r16, %r13, 24; 2026-02-21T08:52:48.7386422Z xor.b32 %r17, %r13, 32; 2026-02-21T08:52:48.7386824Z xor.b32 %r18, %r13, 40; 2026-02-21T08:52:48.7386993Z xor.b32 %r19, %r13, 48; 2026-02-21T08:52:48.7387167Z xor.b32 %r20, %r13, 56; 2026-02-21T08:52:48.7387335Z and.b32 %r144, %r957, 896; 2026-02-21T08:52:48.7387514Z selp.b32 %r146, 1, 0, %p59; 
2026-02-21T08:52:48.7387694Z add.s32 %r147, %r952, 16384; 2026-02-21T08:52:48.7387867Z add.s32 %r148, %r147, %r144; 2026-02-21T08:52:48.7388047Z add.s32 %r149, %r148, %r958; 2026-02-21T08:52:48.7388219Z add.s32 %r150, %r149, %r146; 2026-02-21T08:52:48.7388394Z add.s32 %r21, %r150, %r956; 2026-02-21T08:52:48.7388664Z add.s32 %r153, %r147, %r960; 2026-02-21T08:52:48.7388838Z add.s32 %r154, %r153, %r959; 2026-02-21T08:52:48.7389014Z add.s32 %r22, %r154, %r956; 2026-02-21T08:52:48.7389184Z and.b32 %r157, %r962, 112; 2026-02-21T08:52:48.7389358Z or.b32 %r159, %r961, %r157; 2026-02-21T08:52:48.7389527Z xor.b32 %r160, %r159, %r963; 2026-02-21T08:52:48.7389703Z add.s32 %r23, %r147, %r160; 2026-02-21T08:52:48.7389892Z and.b32 %r162, %r964, 3072; 2026-02-21T08:52:48.7390070Z and.b32 %r165, %r965, 768; 2026-02-21T08:52:48.7390238Z or.b32 %r166, %r165, %r966; 2026-02-21T08:52:48.7390413Z and.b32 %r168, %r967, 96; 2026-02-21T08:52:48.7390593Z xor.b32 %r169, %r166, %r168; 2026-02-21T08:52:48.7390763Z add.s32 %r170, %r952, %r162; 2026-02-21T08:52:48.7390943Z add.s32 %r24, %r170, %r169; 2026-02-21T08:52:48.7391116Z or.b32 %r171, %r967, %r4; 2026-02-21T08:52:48.7391457Z shl.b32 %r172, %r171, 7; 2026-02-21T08:52:48.7391626Z and.b32 %r173, %r172, 3072; 2026-02-21T08:52:48.7391801Z and.b32 %r174, %r954, 96; 2026-02-21T08:52:48.7391967Z shl.b32 %r175, %r171, 2; 2026-02-21T08:52:48.7392141Z and.b32 %r176, %r175, 1008; 2026-02-21T08:52:48.7392312Z xor.b32 %r177, %r176, %r174; 2026-02-21T08:52:48.7392492Z add.s32 %r178, %r952, %r173; 2026-02-21T08:52:48.7392671Z add.s32 %r355, %r178, %r177; 2026-02-21T08:52:48.7393008Z .loc 1 19 144 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:19:144 2026-02-21T08:52:48.7393382Z shl.b32 %r179, %r6, 13; 2026-02-21T08:52:48.7393710Z .loc 1 40 71 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:40:71 2026-02-21T08:52:48.7394239Z or.b32 %r180, %r179, %r7; 2026-02-21T08:52:48.7394418Z or.b32 %r26, %r180, 128; 
2026-02-21T08:52:48.7394750Z .loc 1 19 144 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:19:144 2026-02-21T08:52:48.7395124Z mul.lo.s32 %r181, %r5, 7168; 2026-02-21T08:52:48.7395449Z .loc 1 40 71 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:40:71 2026-02-21T08:52:48.7395817Z or.b32 %r27, %r181, %r4; 2026-02-21T08:52:48.7395986Z cvt.u64.u32 %rd2, %r7; 2026-02-21T08:52:48.7396211Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T08:52:48.7396626Z // Child Loop BB0_3 Depth 2 2026-02-21T08:52:48.7396904Z // Child Loop BB0_5 Depth 2 2026-02-21T08:52:48.7397170Z // Child Loop BB0_7 Depth 2 2026-02-21T08:52:48.7397551Z .loc 1 25 35 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:25:35 2026-02-21T08:52:48.7397918Z mul.hi.s32 %r189, %r968, -1840700269; 2026-02-21T08:52:48.7398118Z add.s32 %r190, %r189, %r968; 2026-02-21T08:52:48.7398298Z shr.u32 %r191, %r190, 31; 2026-02-21T08:52:48.7398466Z shr.s32 %r192, %r190, 8; 2026-02-21T08:52:48.7398651Z add.s32 %r193, %r192, %r191; 2026-02-21T08:52:48.7398969Z .loc 1 26 33 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:26:33 2026-02-21T08:52:48.7399324Z shl.b32 %r194, %r193, 1; 2026-02-21T08:52:48.7399635Z .loc 1 27 39 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:27:39 2026-02-21T08:52:48.7399985Z sub.s32 %r195, 1, %r194; 2026-02-21T08:52:48.7400291Z .loc 1 27 52 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:27:52 2026-02-21T08:52:48.7400637Z min.s32 %r196, %r195, 2; 2026-02-21T08:52:48.7400949Z .loc 1 28 45 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:28:45 2026-02-21T08:52:48.7401305Z mul.lo.s32 %r197, %r193, 448; 2026-02-21T08:52:48.7401485Z sub.s32 %r198, %r968, %r197; 2026-02-21T08:52:48.7401818Z .loc 1 29 51 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:29:51 2026-02-21T08:52:48.7402181Z div.s32 %r199, %r198, %r196; 2026-02-21T08:52:48.7402514Z .loc 1 28 64 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:28:64 
2026-02-21T08:52:48.7402874Z mul.lo.s32 %r200, %r199, %r196; 2026-02-21T08:52:48.7403068Z sub.s32 %r201, %r198, %r200; 2026-02-21T08:52:48.7403387Z .loc 1 28 30 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:28:30 2026-02-21T08:52:48.7403746Z add.s32 %r202, %r201, %r194; 2026-02-21T08:52:48.7404067Z .loc 1 30 27 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:30:27 2026-02-21T08:52:48.7404433Z shl.b32 %r203, %r202, 6; 2026-02-21T08:52:48.7404755Z .loc 1 31 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:31:32 2026-02-21T08:52:48.7405122Z or.b32 %r46, %r203, %r6; 2026-02-21T08:52:48.7405440Z .loc 1 32 27 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:32:27 2026-02-21T08:52:48.7405945Z shl.b32 %r47, %r199, 5; 2026-02-21T08:52:48.7406264Z .loc 1 48 53 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:53 2026-02-21T08:52:48.7406738Z shl.b32 %r204, %r46, 13; 2026-02-21T08:52:48.7407045Z .loc 1 48 60 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:60 2026-02-21T08:52:48.7407401Z or.b32 %r205, %r204, %r7; 2026-02-21T08:52:48.7407717Z .loc 1 48 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:32 2026-02-21T08:52:48.7408086Z mad.wide.s32 %rd17, %r205, 2, %rd14; 2026-02-21T08:52:48.7408427Z .loc 1 48 80 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:80 2026-02-21T08:52:48.7408954Z bar.sync 0; 2026-02-21T08:52:48.7409116Z mov.b32 %r183, 8; 2026-02-21T08:52:48.7409271Z // begin inline asm 2026-02-21T08:52:48.7409516Z cp.async.ca.shared.global [ %r11 + 0 ], [ %rd17 + 0 ], 0x8, %r183; 2026-02-21T08:52:48.7409796Z // end inline asm 2026-02-21T08:52:48.7409960Z cp.async.commit_group; 2026-02-21T08:52:48.7410276Z .loc 1 48 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:32 2026-02-21T08:52:48.7410638Z cvt.s64.s32 %rd20, %r204; 2026-02-21T08:52:48.7410818Z or.b64 %rd21, %rd20, %rd2; 2026-02-21T08:52:48.7411021Z shl.b64 %rd22, %rd21, 1; 
2026-02-21T08:52:48.7411192Z add.s64 %rd23, %rd14, %rd22; 2026-02-21T08:52:48.7411376Z add.s64 %rd18, %rd23, 128; 2026-02-21T08:52:48.7411695Z .loc 1 48 80 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:80 2026-02-21T08:52:48.7412043Z bar.sync 0; 2026-02-21T08:52:48.7412192Z // begin inline asm 2026-02-21T08:52:48.7412421Z cp.async.ca.shared.global [ %r12 + 0 ], [ %rd18 + 0 ], 0x8, %r183; 2026-02-21T08:52:48.7412695Z // end inline asm 2026-02-21T08:52:48.7412851Z cp.async.commit_group; 2026-02-21T08:52:48.7413183Z .loc 1 40 71 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:40:71 2026-02-21T08:52:48.7413543Z shl.b32 %r206, %r202, 19; 2026-02-21T08:52:48.7413710Z or.b32 %r970, %r26, %r206; 2026-02-21T08:52:48.7413892Z add.s32 %r969, %r27, %r47; 2026-02-21T08:52:48.7414060Z mov.b32 %r973, 0f00000000; 2026-02-21T08:52:48.7414233Z mov.b32 %r972, 1; 2026-02-21T08:52:48.7414389Z mov.b32 %r971, -1; 2026-02-21T08:52:48.7414564Z mov.b64 %rd129, -32; 2026-02-21T08:52:48.7414730Z mov.b32 %r974, %r973; 2026-02-21T08:52:48.7414898Z mov.b32 %r975, %r973; 2026-02-21T08:52:48.7415059Z mov.b32 %r976, %r973; 2026-02-21T08:52:48.7415277Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:48.7415589Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:48.7415846Z add.s64 %rd129, %rd129, 32; 2026-02-21T08:52:48.7416053Z setp.lt.u64 %p12, %rd129, 4032; 2026-02-21T08:52:48.7416262Z add.s32 %r319, %r971, 1; 2026-02-21T08:52:48.7416588Z setp.gt.s32 %p13, %r319, 1; 2026-02-21T08:52:48.7416789Z selp.b32 %r971, 0, %r319, %p13; 2026-02-21T08:52:48.7417127Z .loc 1 48 80 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:80 2026-02-21T08:52:48.7417504Z cp.async.wait_group 1; 2026-02-21T08:52:48.7417678Z bar.sync 0; 2026-02-21T08:52:48.7417837Z shl.b32 %r320, %r971, 13; 2026-02-21T08:52:48.7418018Z add.s32 %r322, %r952, %r320; 2026-02-21T08:52:48.7418373Z .loc 1 52 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:52:32 
2026-02-21T08:52:48.7418740Z add.s32 %r323, %r322, %r13; 2026-02-21T08:52:48.7418923Z ld.shared.b16 %rs2, [%r323]; 2026-02-21T08:52:48.7419117Z ld.shared.b16 %rs3, [%r323+1024]; 2026-02-21T08:52:48.7419323Z ld.shared.b16 %rs4, [%r323+64]; 2026-02-21T08:52:48.7419521Z ld.shared.b16 %rs5, [%r323+1088]; 2026-02-21T08:52:48.7419713Z add.s32 %r324, %r322, %r14; 2026-02-21T08:52:48.7419897Z ld.shared.b16 %rs6, [%r324]; 2026-02-21T08:52:48.7420229Z ld.shared.b16 %rs7, [%r324+1024]; 2026-02-21T08:52:48.7420425Z ld.shared.b16 %rs8, [%r324+64]; 2026-02-21T08:52:48.7420617Z ld.shared.b16 %rs9, [%r324+1088]; 2026-02-21T08:52:48.7420804Z add.s32 %r325, %r322, %r15; 2026-02-21T08:52:48.7420992Z ld.shared.b16 %rs10, [%r325]; 2026-02-21T08:52:48.7421177Z ld.shared.b16 %rs11, [%r325+1024]; 2026-02-21T08:52:48.7421385Z ld.shared.b16 %rs12, [%r325+64]; 2026-02-21T08:52:48.7421579Z ld.shared.b16 %rs13, [%r325+1088]; 2026-02-21T08:52:48.7421775Z add.s32 %r326, %r322, %r16; 2026-02-21T08:52:48.7421953Z ld.shared.b16 %rs14, [%r326]; 2026-02-21T08:52:48.7422155Z ld.shared.b16 %rs15, [%r326+1024]; 2026-02-21T08:52:48.7422358Z ld.shared.b16 %rs16, [%r326+64]; 2026-02-21T08:52:48.7422549Z ld.shared.b16 %rs17, [%r326+1088]; 2026-02-21T08:52:48.7422872Z add.s32 %r327, %r322, %r17; 2026-02-21T08:52:48.7423056Z ld.shared.b16 %rs18, [%r327]; 2026-02-21T08:52:48.7423245Z ld.shared.b16 %rs19, [%r327+1024]; 2026-02-21T08:52:48.7423441Z ld.shared.b16 %rs20, [%r327+64]; 2026-02-21T08:52:48.7423636Z ld.shared.b16 %rs21, [%r327+1088]; 2026-02-21T08:52:48.7423822Z add.s32 %r328, %r322, %r18; 2026-02-21T08:52:48.7424002Z ld.shared.b16 %rs22, [%r328]; 2026-02-21T08:52:48.7424184Z ld.shared.b16 %rs23, [%r328+1024]; 2026-02-21T08:52:48.7424384Z ld.shared.b16 %rs24, [%r328+64]; 2026-02-21T08:52:48.7424580Z ld.shared.b16 %rs25, [%r328+1088]; 2026-02-21T08:52:48.7424771Z add.s32 %r329, %r322, %r19; 2026-02-21T08:52:48.7424955Z ld.shared.b16 %rs26, [%r329]; 2026-02-21T08:52:48.7425138Z ld.shared.b16 %rs27, 
[%r329+1024]; 2026-02-21T08:52:48.7425355Z ld.shared.b16 %rs28, [%r329+64]; 2026-02-21T08:52:48.7425546Z ld.shared.b16 %rs29, [%r329+1088]; 2026-02-21T08:52:48.7425741Z add.s32 %r330, %r322, %r20; 2026-02-21T08:52:48.7425925Z ld.shared.b16 %rs30, [%r330]; 2026-02-21T08:52:48.7426113Z ld.shared.b16 %rs31, [%r330+1024]; 2026-02-21T08:52:48.7426312Z ld.shared.b16 %rs32, [%r330+64]; 2026-02-21T08:52:48.7426643Z ld.shared.b16 %rs33, [%r330+1088]; 2026-02-21T08:52:48.7426847Z cvt.f32.bf16 %r215, %rs2; 2026-02-21T08:52:48.7427028Z cvt.f32.bf16 %r216, %rs3; 2026-02-21T08:52:48.7427205Z cvt.f32.bf16 %r217, %rs6; 2026-02-21T08:52:48.7427377Z cvt.f32.bf16 %r218, %rs7; 2026-02-21T08:52:48.7427556Z cvt.f32.bf16 %r227, %rs10; 2026-02-21T08:52:48.7427732Z cvt.f32.bf16 %r228, %rs11; 2026-02-21T08:52:48.7427910Z cvt.f32.bf16 %r229, %rs14; 2026-02-21T08:52:48.7428084Z cvt.f32.bf16 %r230, %rs15; 2026-02-21T08:52:48.7428261Z cvt.f32.bf16 %r239, %rs18; 2026-02-21T08:52:48.7428440Z cvt.f32.bf16 %r240, %rs19; 2026-02-21T08:52:48.7428690Z cvt.f32.bf16 %r241, %rs22; 2026-02-21T08:52:48.7428871Z cvt.f32.bf16 %r242, %rs23; 2026-02-21T08:52:48.7429044Z cvt.f32.bf16 %r251, %rs26; 2026-02-21T08:52:48.7429223Z cvt.f32.bf16 %r252, %rs27; 2026-02-21T08:52:48.7429393Z cvt.f32.bf16 %r253, %rs30; 2026-02-21T08:52:48.7429566Z cvt.f32.bf16 %r254, %rs31; 2026-02-21T08:52:48.7429737Z cvt.f32.bf16 %r263, %rs4; 2026-02-21T08:52:48.7429912Z cvt.f32.bf16 %r264, %rs5; 2026-02-21T08:52:48.7430076Z cvt.f32.bf16 %r265, %rs8; 2026-02-21T08:52:48.7430250Z cvt.f32.bf16 %r266, %rs9; 2026-02-21T08:52:48.7430422Z cvt.f32.bf16 %r275, %rs12; 2026-02-21T08:52:48.7430591Z cvt.f32.bf16 %r276, %rs13; 2026-02-21T08:52:48.7430764Z cvt.f32.bf16 %r277, %rs16; 2026-02-21T08:52:48.7430931Z cvt.f32.bf16 %r278, %rs17; 2026-02-21T08:52:48.7431106Z cvt.f32.bf16 %r287, %rs20; 2026-02-21T08:52:48.7431272Z cvt.f32.bf16 %r288, %rs21; 2026-02-21T08:52:48.7431446Z cvt.f32.bf16 %r289, %rs24; 2026-02-21T08:52:48.7431617Z cvt.f32.bf16 %r290, 
%rs25; 2026-02-21T08:52:48.7431791Z cvt.f32.bf16 %r299, %rs28; 2026-02-21T08:52:48.7431967Z cvt.f32.bf16 %r300, %rs29; 2026-02-21T08:52:48.7432137Z cvt.f32.bf16 %r301, %rs32; 2026-02-21T08:52:48.7432317Z cvt.f32.bf16 %r302, %rs33; 2026-02-21T08:52:48.7432641Z .loc 1 54 34 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:54:34 2026-02-21T08:52:48.7433008Z cvt.s64.s32 %rd36, %r969; 2026-02-21T08:52:48.7433324Z add.s64 %rd25, %rd15, %rd36; 2026-02-21T08:52:48.7433650Z .loc 1 54 87 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:54:87 2026-02-21T08:52:48.7434003Z // begin inline asm 2026-02-21T08:52:48.7434166Z mov.u64 %rd24, 0x0; 2026-02-21T08:52:48.7434406Z createpolicy.fractional.L2::evict_last.b64 %rd24, 1.0; 2026-02-21T08:52:48.7434665Z // end inline asm 2026-02-21T08:52:48.7434823Z // begin inline asm 2026-02-21T08:52:48.7434976Z mov.u16 %rs1, 0x0; 2026-02-21T08:52:48.7435227Z ld.global.L1::evict_last.L2::cache_hint.b8 { %rs1 }, [ %rd25 + 0 ], %rd24; 2026-02-21T08:52:48.7435516Z // end inline asm 2026-02-21T08:52:48.7435986Z .loc 1 62 28 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:62:28 2026-02-21T08:52:48.7436353Z st.shared.b8 [%r21], %rs1; 2026-02-21T08:52:48.7436661Z bar.sync 0; 2026-02-21T08:52:48.7436829Z ld.shared.v2.b8 {%rs34, %rs35}, [%r22]; 2026-02-21T08:52:48.7437201Z .loc 1 57 28 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:57:28 2026-02-21T08:52:48.7437567Z shl.b16 %rs36, %rs34, 4; 2026-02-21T08:52:48.7437739Z shl.b16 %rs37, %rs35, 4; 2026-02-21T08:52:48.7438059Z .loc 1 72 58 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:72:58 2026-02-21T08:52:48.7438425Z selp.b16 %rs38, %rs36, %rs34, %p60; 2026-02-21T08:52:48.7438634Z cvt.s16.s8 %rs39, %rs38; 2026-02-21T08:52:48.7438815Z shr.s16 %rs40, %rs39, 4; 2026-02-21T08:52:48.7438993Z selp.b16 %rs41, %rs37, %rs35, %p60; 2026-02-21T08:52:48.7439202Z cvt.s16.s8 %rs42, %rs41; 2026-02-21T08:52:48.7439370Z shr.s16 %rs43, %rs42, 4; 
2026-02-21T08:52:48.7439697Z .loc 1 77 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:77:32 2026-02-21T08:52:48.7440059Z cvt.rn.f32.s16 %r331, %rs40; 2026-02-21T08:52:48.7440249Z cvt.rn.f32.s16 %r332, %rs43; 2026-02-21T08:52:48.7440425Z bar.sync 0; 2026-02-21T08:52:48.7440584Z st.shared.b32 [%r23], %r331; 2026-02-21T08:52:48.7440771Z st.shared.b32 [%r23+4096], %r332; 2026-02-21T08:52:48.7440968Z $L__tmp1: 2026-02-21T08:52:48.7441329Z .loc 2 291 36 // standard.py:291:36 @[ c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:84:40 ] 2026-02-21T08:52:48.7441765Z // begin inline asm 2026-02-21T08:52:48.7441954Z fence.proxy.async.shared::cta; 2026-02-21T08:52:48.7442144Z // end inline asm 2026-02-21T08:52:48.7442295Z bar.sync 0; 2026-02-21T08:52:48.7442455Z shfl.sync.idx.b32 %r333, %r5, 0, 31, -1; 2026-02-21T08:52:48.7442685Z wgmma.fence.sync.aligned; 2026-02-21T08:52:48.7442867Z shl.b32 %r334, %r333, 8; 2026-02-21T08:52:48.7443043Z and.b32 %r335, %r334, 3072; 2026-02-21T08:52:48.7443223Z add.s32 %r336, %r335, %r147; 2026-02-21T08:52:48.7443409Z bfe.u32 %r337, %r336, 4, 14; 2026-02-21T08:52:48.7443591Z cvt.u64.u32 %rd37, %r337; 2026-02-21T08:52:48.7443774Z or.b64 %rd27, %rd37, 4611686293322072064; 2026-02-21T08:52:48.7443989Z mov.pred %p3, -1; 2026-02-21T08:52:48.7444150Z // begin inline asm 2026-02-21T08:52:48.7444557Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r973,%r974,%r975,%r976}, {%r215,%r216,%r217,%r218}, %rd27, %p3, 1, 1; 2026-02-21T08:52:48.7444987Z // end inline asm 2026-02-21T08:52:48.7445142Z add.s32 %r338, %r336, 32; 2026-02-21T08:52:48.7445321Z bfe.u32 %r339, %r338, 4, 14; 2026-02-21T08:52:48.7445497Z cvt.u64.u32 %rd38, %r339; 2026-02-21T08:52:48.7445685Z or.b64 %rd28, %rd38, 4611686293322072064; 2026-02-21T08:52:48.7445895Z // begin inline asm 2026-02-21T08:52:48.7446276Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r973,%r974,%r975,%r976}, {%r227,%r228,%r229,%r230}, %rd28, %p3, 1, 1; 2026-02-21T08:52:48.7446826Z // 
end inline asm 2026-02-21T08:52:48.7447004Z add.s32 %r340, %r336, 64; 2026-02-21T08:52:48.7447180Z bfe.u32 %r341, %r340, 4, 14; 2026-02-21T08:52:48.7447363Z cvt.u64.u32 %rd39, %r341; 2026-02-21T08:52:48.7447548Z or.b64 %rd29, %rd39, 4611686293322072064; 2026-02-21T08:52:48.7447899Z // begin inline asm 2026-02-21T08:52:48.7448277Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r973,%r974,%r975,%r976}, {%r239,%r240,%r241,%r242}, %rd29, %p3, 1, 1; 2026-02-21T08:52:48.7448701Z // end inline asm 2026-02-21T08:52:48.7448872Z add.s32 %r342, %r336, 96; 2026-02-21T08:52:48.7449042Z bfe.u32 %r343, %r342, 4, 14; 2026-02-21T08:52:48.7449224Z cvt.u64.u32 %rd40, %r343; 2026-02-21T08:52:48.7449404Z or.b64 %rd30, %rd40, 4611686293322072064; 2026-02-21T08:52:48.7449611Z // begin inline asm 2026-02-21T08:52:48.7449991Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r973,%r974,%r975,%r976}, {%r251,%r252,%r253,%r254}, %rd30, %p3, 1, 1; 2026-02-21T08:52:48.7450412Z // end inline asm 2026-02-21T08:52:48.7450573Z add.s32 %r344, %r336, 4096; 2026-02-21T08:52:48.7450898Z bfe.u32 %r345, %r344, 4, 14; 2026-02-21T08:52:48.7451086Z cvt.u64.u32 %rd41, %r345; 2026-02-21T08:52:48.7451266Z or.b64 %rd31, %rd41, 4611686293322072064; 2026-02-21T08:52:48.7451480Z // begin inline asm 2026-02-21T08:52:48.7451849Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r973,%r974,%r975,%r976}, {%r263,%r264,%r265,%r266}, %rd31, %p3, 1, 1; 2026-02-21T08:52:48.7452274Z // end inline asm 2026-02-21T08:52:48.7452435Z add.s32 %r346, %r336, 4128; 2026-02-21T08:52:48.7452610Z bfe.u32 %r347, %r346, 4, 14; 2026-02-21T08:52:48.7452793Z cvt.u64.u32 %rd42, %r347; 2026-02-21T08:52:48.7452972Z or.b64 %rd32, %rd42, 4611686293322072064; 2026-02-21T08:52:48.7453182Z // begin inline asm 2026-02-21T08:52:48.7453548Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r973,%r974,%r975,%r976}, {%r275,%r276,%r277,%r278}, %rd32, %p3, 1, 1; 2026-02-21T08:52:48.7453974Z // end inline asm 2026-02-21T08:52:48.7454127Z 
add.s32 %r348, %r336, 4160; 2026-02-21T08:52:48.7454303Z bfe.u32 %r349, %r348, 4, 14; 2026-02-21T08:52:48.7454482Z cvt.u64.u32 %rd43, %r349; 2026-02-21T08:52:48.7454661Z or.b64 %rd33, %rd43, 4611686293322072064; 2026-02-21T08:52:48.7454863Z // begin inline asm 2026-02-21T08:52:48.7455252Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r973,%r974,%r975,%r976}, {%r287,%r288,%r289,%r290}, %rd33, %p3, 1, 1; 2026-02-21T08:52:48.7455675Z // end inline asm 2026-02-21T08:52:48.7455822Z add.s32 %r350, %r336, 4192; 2026-02-21T08:52:48.7455999Z bfe.u32 %r351, %r350, 4, 14; 2026-02-21T08:52:48.7456177Z cvt.u64.u32 %rd44, %r351; 2026-02-21T08:52:48.7456355Z or.b64 %rd34, %rd44, 4611686293322072064; 2026-02-21T08:52:48.7456672Z // begin inline asm 2026-02-21T08:52:48.7457052Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r973,%r974,%r975,%r976}, {%r299,%r300,%r301,%r302}, %rd34, %p3, 1, 1; 2026-02-21T08:52:48.7457477Z // end inline asm 2026-02-21T08:52:48.7457646Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:48.7457845Z mov.b32 %r309, 0; 2026-02-21T08:52:48.7458000Z mov.b32 %r308, %r309; 2026-02-21T08:52:48.7458169Z mov.b32 %r307, %r147; 2026-02-21T08:52:48.7458330Z // begin inline asm 2026-02-21T08:52:48.7458538Z // wait for regs: %r973,%r974,%r975,%r976,%r307,%r308,%r309 2026-02-21T08:52:48.7458811Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:48.7459004Z // end inline asm 2026-02-21T08:52:48.7459157Z $L__tmp2: 2026-02-21T08:52:48.7459449Z .loc 1 40 71 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:40:71 2026-02-21T08:52:48.7459815Z add.s32 %r352, %r972, 1; 2026-02-21T08:52:48.7459989Z setp.gt.s32 %p14, %r352, 1; 2026-02-21T08:52:48.7460180Z selp.b32 %r972, 0, %r352, %p14; 2026-02-21T08:52:48.7460508Z .loc 1 48 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:32 2026-02-21T08:52:48.7460884Z mad.wide.s32 %rd35, %r970, 2, %rd14; 2026-02-21T08:52:48.7461231Z .loc 1 48 80 // 
c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:80 2026-02-21T08:52:48.7461589Z shl.b32 %r353, %r972, 13; 2026-02-21T08:52:48.7461766Z add.s32 %r317, %r11, %r353; 2026-02-21T08:52:48.7461944Z selp.b32 %r318, 8, 0, %p12; 2026-02-21T08:52:48.7462310Z // begin inline asm 2026-02-21T08:52:48.7462555Z cp.async.ca.shared.global [ %r317 + 0 ], [ %rd35 + 0 ], 0x8, %r318; 2026-02-21T08:52:48.7462840Z // end inline asm 2026-02-21T08:52:48.7463006Z cp.async.commit_group; 2026-02-21T08:52:48.7463323Z .loc 1 40 71 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:40:71 2026-02-21T08:52:48.7463693Z add.s32 %r970, %r970, 64; 2026-02-21T08:52:48.7463868Z add.s32 %r969, %r969, 229376; 2026-02-21T08:52:48.7464064Z setp.lt.u64 %p15, %rd129, 4064; 2026-02-21T08:52:48.7464255Z @%p15 bra $L__BB0_3; 2026-02-21T08:52:48.7464471Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:48.7464891Z .loc 1 33 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:33:32 2026-02-21T08:52:48.7465395Z or.b32 %r364, %r47, %r9; 2026-02-21T08:52:48.7465721Z .loc 1 40 71 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:40:71 2026-02-21T08:52:48.7466086Z cp.async.wait_group 0; 2026-02-21T08:52:48.7466259Z bar.sync 0; 2026-02-21T08:52:48.7466679Z .loc 1 87 28 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:87:28 2026-02-21T08:52:48.7467057Z cvt.rn.bf16x2.f32 %r365, %r974, %r973; 2026-02-21T08:52:48.7467271Z cvt.rn.bf16x2.f32 %r366, %r976, %r975; 2026-02-21T08:52:48.7467634Z .loc 1 88 50 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:88:50 2026-02-21T08:52:48.7467997Z mad.lo.s32 %r367, %r46, 7168, %r364; 2026-02-21T08:52:48.7468333Z .loc 1 88 22 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:88:22 2026-02-21T08:52:48.7468783Z mad.wide.s32 %rd45, %r367, 2, %rd16; 2026-02-21T08:52:48.7469126Z .loc 1 88 81 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:88:81 2026-02-21T08:52:48.7469564Z 
stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r24], {%r365, %r366}; 2026-02-21T08:52:48.7469848Z bar.sync 0; 2026-02-21T08:52:48.7470000Z // begin inline asm 2026-02-21T08:52:48.7470235Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r356}, [%r355]; 2026-02-21T08:52:48.7470500Z // end inline asm 2026-02-21T08:52:48.7470655Z // begin inline asm 2026-02-21T08:52:48.7470822Z st.global.b32 [ %rd45 + 0 ], { %r356 }; 2026-02-21T08:52:48.7471025Z // end inline asm 2026-02-21T08:52:48.7471331Z .loc 1 19 144 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:19:144 2026-02-21T08:52:48.7471706Z add.s32 %r368, %r968, 4224; 2026-02-21T08:52:48.7472040Z .loc 1 25 35 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:25:35 2026-02-21T08:52:48.7472410Z mul.hi.s32 %r369, %r368, -1840700269; 2026-02-21T08:52:48.7472619Z add.s32 %r370, %r369, %r368; 2026-02-21T08:52:48.7472802Z shr.u32 %r371, %r370, 31; 2026-02-21T08:52:48.7472980Z shr.s32 %r372, %r370, 8; 2026-02-21T08:52:48.7473151Z add.s32 %r373, %r372, %r371; 2026-02-21T08:52:48.7473476Z .loc 1 26 33 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:26:33 2026-02-21T08:52:48.7473833Z shl.b32 %r374, %r373, 1; 2026-02-21T08:52:48.7474148Z .loc 1 27 39 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:27:39 2026-02-21T08:52:48.7474503Z sub.s32 %r375, 1, %r374; 2026-02-21T08:52:48.7474813Z .loc 1 27 52 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:27:52 2026-02-21T08:52:48.7475170Z min.s32 %r376, %r375, 2; 2026-02-21T08:52:48.7475474Z .loc 1 28 45 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:28:45 2026-02-21T08:52:48.7475840Z mul.lo.s32 %r377, %r373, 448; 2026-02-21T08:52:48.7476022Z sub.s32 %r378, %r368, %r377; 2026-02-21T08:52:48.7476348Z .loc 1 29 51 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:29:51 2026-02-21T08:52:48.7476834Z div.s32 %r379, %r378, %r376; 2026-02-21T08:52:48.7477148Z .loc 1 28 64 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:28:64 
2026-02-21T08:52:48.7477660Z mul.lo.s32 %r380, %r379, %r376; 2026-02-21T08:52:48.7477849Z sub.s32 %r381, %r378, %r380; 2026-02-21T08:52:48.7478168Z .loc 1 28 30 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:28:30 2026-02-21T08:52:48.7478520Z add.s32 %r382, %r381, %r374; 2026-02-21T08:52:48.7478845Z .loc 1 30 27 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:30:27 2026-02-21T08:52:48.7479200Z shl.b32 %r383, %r382, 6; 2026-02-21T08:52:48.7479508Z .loc 1 31 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:31:32 2026-02-21T08:52:48.7479865Z or.b32 %r66, %r383, %r6; 2026-02-21T08:52:48.7480291Z .loc 1 32 27 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:32:27 2026-02-21T08:52:48.7480653Z shl.b32 %r67, %r379, 5; 2026-02-21T08:52:48.7480962Z .loc 1 48 53 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:53 2026-02-21T08:52:48.7481323Z shl.b32 %r384, %r66, 13; 2026-02-21T08:52:48.7481636Z .loc 1 48 60 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:60 2026-02-21T08:52:48.7481989Z or.b32 %r385, %r384, %r7; 2026-02-21T08:52:48.7482308Z .loc 1 48 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:32 2026-02-21T08:52:48.7482666Z mad.wide.s32 %rd46, %r385, 2, %rd14; 2026-02-21T08:52:48.7483023Z .loc 1 48 80 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:80 2026-02-21T08:52:48.7483373Z bar.sync 0; 2026-02-21T08:52:48.7483518Z mov.b32 %r358, 8; 2026-02-21T08:52:48.7483672Z // begin inline asm 2026-02-21T08:52:48.7483903Z cp.async.ca.shared.global [ %r11 + 0 ], [ %rd46 + 0 ], 0x8, %r358; 2026-02-21T08:52:48.7484182Z // end inline asm 2026-02-21T08:52:48.7484336Z cp.async.commit_group; 2026-02-21T08:52:48.7484654Z .loc 1 48 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:32 2026-02-21T08:52:48.7485010Z cvt.s64.s32 %rd49, %r384; 2026-02-21T08:52:48.7485189Z or.b64 %rd50, %rd49, %rd2; 2026-02-21T08:52:48.7485365Z shl.b64 %rd51, %rd50, 1; 
2026-02-21T08:52:48.7485537Z add.s64 %rd52, %rd14, %rd51; 2026-02-21T08:52:48.7485721Z add.s64 %rd47, %rd52, 128; 2026-02-21T08:52:48.7486049Z .loc 1 48 80 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:80 2026-02-21T08:52:48.7486412Z bar.sync 0; 2026-02-21T08:52:48.7486690Z // begin inline asm 2026-02-21T08:52:48.7486935Z cp.async.ca.shared.global [ %r12 + 0 ], [ %rd47 + 0 ], 0x8, %r358; 2026-02-21T08:52:48.7487210Z // end inline asm 2026-02-21T08:52:48.7487296Z cp.async.commit_group; 2026-02-21T08:52:48.7487518Z .loc 1 40 71 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:40:71 2026-02-21T08:52:48.7487581Z shl.b32 %r386, %r382, 19; 2026-02-21T08:52:48.7487651Z or.b32 %r978, %r26, %r386; 2026-02-21T08:52:48.7487717Z add.s32 %r977, %r27, %r67; 2026-02-21T08:52:48.7487779Z mov.b32 %r981, 0f00000000; 2026-02-21T08:52:48.7487838Z mov.b32 %r980, 1; 2026-02-21T08:52:48.7487906Z mov.b32 %r979, -1; 2026-02-21T08:52:48.7487969Z mov.b64 %rd130, -32; 2026-02-21T08:52:48.7488029Z mov.b32 %r982, %r981; 2026-02-21T08:52:48.7488095Z mov.b32 %r983, %r981; 2026-02-21T08:52:48.7488154Z mov.b32 %r984, %r981; 2026-02-21T08:52:48.7488272Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:48.7488379Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:48.7488450Z add.s64 %rd130, %rd130, 32; 2026-02-21T08:52:48.7488520Z setp.lt.u64 %p25, %rd130, 4032; 2026-02-21T08:52:48.7488589Z add.s32 %r499, %r979, 1; 2026-02-21T08:52:48.7488662Z setp.gt.s32 %p26, %r499, 1; 2026-02-21T08:52:48.7488729Z selp.b32 %r979, 0, %r499, %p26; 2026-02-21T08:52:48.7488938Z .loc 1 48 80 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:80 2026-02-21T08:52:48.7489149Z cp.async.wait_group 1; 2026-02-21T08:52:48.7489214Z bar.sync 0; 2026-02-21T08:52:48.7489277Z shl.b32 %r500, %r979, 13; 2026-02-21T08:52:48.7489341Z add.s32 %r502, %r952, %r500; 2026-02-21T08:52:48.7489563Z .loc 1 52 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:52:32 
2026-02-21T08:52:48.7489631Z add.s32 %r503, %r502, %r13; 2026-02-21T08:52:48.7489700Z ld.shared.b16 %rs45, [%r503]; 2026-02-21T08:52:48.7489777Z ld.shared.b16 %rs46, [%r503+1024]; 2026-02-21T08:52:48.7489847Z ld.shared.b16 %rs47, [%r503+64]; 2026-02-21T08:52:48.7489914Z ld.shared.b16 %rs48, [%r503+1088]; 2026-02-21T08:52:48.7489975Z add.s32 %r504, %r502, %r14; 2026-02-21T08:52:48.7490174Z ld.shared.b16 %rs49, [%r504]; 2026-02-21T08:52:48.7490245Z ld.shared.b16 %rs50, [%r504+1024]; 2026-02-21T08:52:48.7490314Z ld.shared.b16 %rs51, [%r504+64]; 2026-02-21T08:52:48.7490387Z ld.shared.b16 %rs52, [%r504+1088]; 2026-02-21T08:52:48.7490453Z add.s32 %r505, %r502, %r15; 2026-02-21T08:52:48.7490517Z ld.shared.b16 %rs53, [%r505]; 2026-02-21T08:52:48.7490585Z ld.shared.b16 %rs54, [%r505+1024]; 2026-02-21T08:52:48.7490659Z ld.shared.b16 %rs55, [%r505+64]; 2026-02-21T08:52:48.7490724Z ld.shared.b16 %rs56, [%r505+1088]; 2026-02-21T08:52:48.7490787Z add.s32 %r506, %r502, %r16; 2026-02-21T08:52:48.7490858Z ld.shared.b16 %rs57, [%r506]; 2026-02-21T08:52:48.7490924Z ld.shared.b16 %rs58, [%r506+1024]; 2026-02-21T08:52:48.7490992Z ld.shared.b16 %rs59, [%r506+64]; 2026-02-21T08:52:48.7491063Z ld.shared.b16 %rs60, [%r506+1088]; 2026-02-21T08:52:48.7491127Z add.s32 %r507, %r502, %r17; 2026-02-21T08:52:48.7491192Z ld.shared.b16 %rs61, [%r507]; 2026-02-21T08:52:48.7491261Z ld.shared.b16 %rs62, [%r507+1024]; 2026-02-21T08:52:48.7491339Z ld.shared.b16 %rs63, [%r507+64]; 2026-02-21T08:52:48.7491405Z ld.shared.b16 %rs64, [%r507+1088]; 2026-02-21T08:52:48.7491468Z add.s32 %r508, %r502, %r18; 2026-02-21T08:52:48.7491542Z ld.shared.b16 %rs65, [%r508]; 2026-02-21T08:52:48.7491608Z ld.shared.b16 %rs66, [%r508+1024]; 2026-02-21T08:52:48.7491675Z ld.shared.b16 %rs67, [%r508+64]; 2026-02-21T08:52:48.7491755Z ld.shared.b16 %rs68, [%r508+1088]; 2026-02-21T08:52:48.7491828Z add.s32 %r509, %r502, %r19; 2026-02-21T08:52:48.7491895Z ld.shared.b16 %rs69, [%r509]; 2026-02-21T08:52:48.7491960Z ld.shared.b16 
%rs70, [%r509+1024]; 2026-02-21T08:52:48.7492029Z ld.shared.b16 %rs71, [%r509+64]; 2026-02-21T08:52:48.7492093Z ld.shared.b16 %rs72, [%r509+1088]; 2026-02-21T08:52:48.7492154Z add.s32 %r510, %r502, %r20; 2026-02-21T08:52:48.7492219Z ld.shared.b16 %rs73, [%r510]; 2026-02-21T08:52:48.7492292Z ld.shared.b16 %rs74, [%r510+1024]; 2026-02-21T08:52:48.7492357Z ld.shared.b16 %rs75, [%r510+64]; 2026-02-21T08:52:48.7492424Z ld.shared.b16 %rs76, [%r510+1088]; 2026-02-21T08:52:48.7492496Z cvt.f32.bf16 %r395, %rs45; 2026-02-21T08:52:48.7492559Z cvt.f32.bf16 %r396, %rs46; 2026-02-21T08:52:48.7492623Z cvt.f32.bf16 %r397, %rs49; 2026-02-21T08:52:48.7492694Z cvt.f32.bf16 %r398, %rs50; 2026-02-21T08:52:48.7492756Z cvt.f32.bf16 %r407, %rs53; 2026-02-21T08:52:48.7492817Z cvt.f32.bf16 %r408, %rs54; 2026-02-21T08:52:48.7492880Z cvt.f32.bf16 %r409, %rs57; 2026-02-21T08:52:48.7492945Z cvt.f32.bf16 %r410, %rs58; 2026-02-21T08:52:48.7493006Z cvt.f32.bf16 %r419, %rs61; 2026-02-21T08:52:48.7493069Z cvt.f32.bf16 %r420, %rs62; 2026-02-21T08:52:48.7493135Z cvt.f32.bf16 %r421, %rs65; 2026-02-21T08:52:48.7493195Z cvt.f32.bf16 %r422, %rs66; 2026-02-21T08:52:48.7493259Z cvt.f32.bf16 %r431, %rs69; 2026-02-21T08:52:48.7493319Z cvt.f32.bf16 %r432, %rs70; 2026-02-21T08:52:48.7493386Z cvt.f32.bf16 %r433, %rs73; 2026-02-21T08:52:48.7493448Z cvt.f32.bf16 %r434, %rs74; 2026-02-21T08:52:48.7493522Z cvt.f32.bf16 %r443, %rs47; 2026-02-21T08:52:48.7493592Z cvt.f32.bf16 %r444, %rs48; 2026-02-21T08:52:48.7493654Z cvt.f32.bf16 %r445, %rs51; 2026-02-21T08:52:48.7493715Z cvt.f32.bf16 %r446, %rs52; 2026-02-21T08:52:48.7493883Z cvt.f32.bf16 %r455, %rs55; 2026-02-21T08:52:48.7493951Z cvt.f32.bf16 %r456, %rs56; 2026-02-21T08:52:48.7494015Z cvt.f32.bf16 %r457, %rs59; 2026-02-21T08:52:48.7494076Z cvt.f32.bf16 %r458, %rs60; 2026-02-21T08:52:48.7494143Z cvt.f32.bf16 %r467, %rs63; 2026-02-21T08:52:48.7494208Z cvt.f32.bf16 %r468, %rs64; 2026-02-21T08:52:48.7494268Z cvt.f32.bf16 %r469, %rs67; 2026-02-21T08:52:48.7494329Z 
cvt.f32.bf16 %r470, %rs68; 2026-02-21T08:52:48.7494396Z cvt.f32.bf16 %r479, %rs71; 2026-02-21T08:52:48.7494458Z cvt.f32.bf16 %r480, %rs72; 2026-02-21T08:52:48.7494518Z cvt.f32.bf16 %r481, %rs75; 2026-02-21T08:52:48.7494588Z cvt.f32.bf16 %r482, %rs76; 2026-02-21T08:52:48.7494909Z .loc 1 54 34 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:54:34 2026-02-21T08:52:48.7494979Z cvt.s64.s32 %rd65, %r977; 2026-02-21T08:52:48.7495051Z add.s64 %rd54, %rd15, %rd65; 2026-02-21T08:52:48.7495260Z .loc 1 54 87 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:54:87 2026-02-21T08:52:48.7495327Z // begin inline asm 2026-02-21T08:52:48.7495388Z mov.u64 %rd53, 0x0; 2026-02-21T08:52:48.7495520Z createpolicy.fractional.L2::evict_last.b64 %rd53, 1.0; 2026-02-21T08:52:48.7495580Z // end inline asm 2026-02-21T08:52:48.7495643Z // begin inline asm 2026-02-21T08:52:48.7495707Z mov.u16 %rs44, 0x0; 2026-02-21T08:52:48.7495866Z ld.global.L1::evict_last.L2::cache_hint.b8 { %rs44 }, [ %rd54 + 0 ], %rd53; 2026-02-21T08:52:48.7495927Z // end inline asm 2026-02-21T08:52:48.7496139Z .loc 1 62 28 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:62:28 2026-02-21T08:52:48.7496206Z st.shared.b8 [%r21], %rs44; 2026-02-21T08:52:48.7496265Z bar.sync 0; 2026-02-21T08:52:48.7496346Z ld.shared.v2.b8 {%rs77, %rs78}, [%r22]; 2026-02-21T08:52:48.7496687Z .loc 1 57 28 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:57:28 2026-02-21T08:52:48.7496758Z shl.b16 %rs79, %rs77, 4; 2026-02-21T08:52:48.7496825Z shl.b16 %rs80, %rs78, 4; 2026-02-21T08:52:48.7497034Z .loc 1 72 58 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:72:58 2026-02-21T08:52:48.7497109Z selp.b16 %rs81, %rs79, %rs77, %p60; 2026-02-21T08:52:48.7497173Z cvt.s16.s8 %rs82, %rs81; 2026-02-21T08:52:48.7497236Z shr.s16 %rs83, %rs82, 4; 2026-02-21T08:52:48.7497312Z selp.b16 %rs84, %rs80, %rs78, %p60; 2026-02-21T08:52:48.7497374Z cvt.s16.s8 %rs85, %rs84; 2026-02-21T08:52:48.7497438Z shr.s16 %rs86, %rs85, 4; 
2026-02-21T08:52:48.7497649Z .loc 1 77 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:77:32 2026-02-21T08:52:48.7497715Z cvt.rn.f32.s16 %r511, %rs83; 2026-02-21T08:52:48.7497780Z cvt.rn.f32.s16 %r512, %rs86; 2026-02-21T08:52:48.7497847Z bar.sync 0; 2026-02-21T08:52:48.7497926Z st.shared.b32 [%r23], %r511; 2026-02-21T08:52:48.7498000Z st.shared.b32 [%r23+4096], %r512; 2026-02-21T08:52:48.7498056Z $L__tmp3: 2026-02-21T08:52:48.7498349Z .loc 2 291 36 // standard.py:291:36 @[ c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:84:40 ] 2026-02-21T08:52:48.7498412Z // begin inline asm 2026-02-21T08:52:48.7498494Z fence.proxy.async.shared::cta; 2026-02-21T08:52:48.7498559Z // end inline asm 2026-02-21T08:52:48.7498616Z bar.sync 0; 2026-02-21T08:52:48.7498698Z shfl.sync.idx.b32 %r513, %r5, 0, 31, -1; 2026-02-21T08:52:48.7498772Z wgmma.fence.sync.aligned; 2026-02-21T08:52:48.7498841Z shl.b32 %r514, %r513, 8; 2026-02-21T08:52:48.7498906Z and.b32 %r515, %r514, 3072; 2026-02-21T08:52:48.7498968Z add.s32 %r516, %r515, %r147; 2026-02-21T08:52:48.7499036Z bfe.u32 %r517, %r516, 4, 14; 2026-02-21T08:52:48.7499103Z cvt.u64.u32 %rd66, %r517; 2026-02-21T08:52:48.7499182Z or.b64 %rd56, %rd66, 4611686293322072064; 2026-02-21T08:52:48.7499254Z mov.pred %p16, -1; 2026-02-21T08:52:48.7499316Z // begin inline asm 2026-02-21T08:52:48.7499606Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r981,%r982,%r983,%r984}, {%r395,%r396,%r397,%r398}, %rd56, %p16, 1, 1; 2026-02-21T08:52:48.7499824Z // end inline asm 2026-02-21T08:52:48.7499897Z add.s32 %r518, %r516, 32; 2026-02-21T08:52:48.7499964Z bfe.u32 %r519, %r518, 4, 14; 2026-02-21T08:52:48.7500027Z cvt.u64.u32 %rd67, %r519; 2026-02-21T08:52:48.7500108Z or.b64 %rd57, %rd67, 4611686293322072064; 2026-02-21T08:52:48.7500169Z // begin inline asm 2026-02-21T08:52:48.7500451Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r981,%r982,%r983,%r984}, {%r407,%r408,%r409,%r410}, %rd57, %p16, 1, 1; 2026-02-21T08:52:48.7500515Z 
// end inline asm 2026-02-21T08:52:48.7500581Z add.s32 %r520, %r516, 64; 2026-02-21T08:52:48.7500643Z bfe.u32 %r521, %r520, 4, 14; 2026-02-21T08:52:48.7500705Z cvt.u64.u32 %rd68, %r521; 2026-02-21T08:52:48.7500898Z or.b64 %rd58, %rd68, 4611686293322072064; 2026-02-21T08:52:48.7500963Z // begin inline asm 2026-02-21T08:52:48.7501237Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r981,%r982,%r983,%r984}, {%r419,%r420,%r421,%r422}, %rd58, %p16, 1, 1; 2026-02-21T08:52:48.7501308Z // end inline asm 2026-02-21T08:52:48.7501377Z add.s32 %r522, %r516, 96; 2026-02-21T08:52:48.7501440Z bfe.u32 %r523, %r522, 4, 14; 2026-02-21T08:52:48.7501503Z cvt.u64.u32 %rd69, %r523; 2026-02-21T08:52:48.7501578Z or.b64 %rd59, %rd69, 4611686293322072064; 2026-02-21T08:52:48.7501638Z // begin inline asm 2026-02-21T08:52:48.7501913Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r981,%r982,%r983,%r984}, {%r431,%r432,%r433,%r434}, %rd59, %p16, 1, 1; 2026-02-21T08:52:48.7501979Z // end inline asm 2026-02-21T08:52:48.7502039Z add.s32 %r524, %r516, 4096; 2026-02-21T08:52:48.7502100Z bfe.u32 %r525, %r524, 4, 14; 2026-02-21T08:52:48.7502168Z cvt.u64.u32 %rd70, %r525; 2026-02-21T08:52:48.7502239Z or.b64 %rd60, %rd70, 4611686293322072064; 2026-02-21T08:52:48.7502300Z // begin inline asm 2026-02-21T08:52:48.7502573Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r981,%r982,%r983,%r984}, {%r443,%r444,%r445,%r446}, %rd60, %p16, 1, 1; 2026-02-21T08:52:48.7502639Z // end inline asm 2026-02-21T08:52:48.7502699Z add.s32 %r526, %r516, 4128; 2026-02-21T08:52:48.7502761Z bfe.u32 %r527, %r526, 4, 14; 2026-02-21T08:52:48.7502828Z cvt.u64.u32 %rd71, %r527; 2026-02-21T08:52:48.7502897Z or.b64 %rd61, %rd71, 4611686293322072064; 2026-02-21T08:52:48.7502956Z // begin inline asm 2026-02-21T08:52:48.7503229Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r981,%r982,%r983,%r984}, {%r455,%r456,%r457,%r458}, %rd61, %p16, 1, 1; 2026-02-21T08:52:48.7503292Z // end inline asm 
2026-02-21T08:52:48.7503352Z add.s32 %r528, %r516, 4160; 2026-02-21T08:52:48.7503413Z bfe.u32 %r529, %r528, 4, 14; 2026-02-21T08:52:48.7503481Z cvt.u64.u32 %rd72, %r529; 2026-02-21T08:52:48.7503551Z or.b64 %rd62, %rd72, 4611686293322072064; 2026-02-21T08:52:48.7503616Z // begin inline asm 2026-02-21T08:52:48.7503895Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r981,%r982,%r983,%r984}, {%r467,%r468,%r469,%r470}, %rd62, %p16, 1, 1; 2026-02-21T08:52:48.7503957Z // end inline asm 2026-02-21T08:52:48.7504017Z add.s32 %r530, %r516, 4192; 2026-02-21T08:52:48.7504078Z bfe.u32 %r531, %r530, 4, 14; 2026-02-21T08:52:48.7504147Z cvt.u64.u32 %rd73, %r531; 2026-02-21T08:52:48.7504217Z or.b64 %rd63, %rd73, 4611686293322072064; 2026-02-21T08:52:48.7504277Z // begin inline asm 2026-02-21T08:52:48.7504554Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r981,%r982,%r983,%r984}, {%r479,%r480,%r481,%r482}, %rd63, %p16, 1, 1; 2026-02-21T08:52:48.7504611Z // end inline asm 2026-02-21T08:52:48.7504690Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:48.7504754Z mov.b32 %r489, 0; 2026-02-21T08:52:48.7504815Z mov.b32 %r488, %r489; 2026-02-21T08:52:48.7504875Z mov.b32 %r487, %r147; 2026-02-21T08:52:48.7504936Z // begin inline asm 2026-02-21T08:52:48.7505055Z // wait for regs: %r981,%r982,%r983,%r984,%r487,%r488,%r489 2026-02-21T08:52:48.7505131Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:48.7505189Z // end inline asm 2026-02-21T08:52:48.7505360Z $L__tmp4: 2026-02-21T08:52:48.7505574Z .loc 1 40 71 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:40:71 2026-02-21T08:52:48.7505638Z add.s32 %r532, %r980, 1; 2026-02-21T08:52:48.7505707Z setp.gt.s32 %p27, %r532, 1; 2026-02-21T08:52:48.7505780Z selp.b32 %r980, 0, %r532, %p27; 2026-02-21T08:52:48.7505987Z .loc 1 48 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:32 2026-02-21T08:52:48.7506062Z mad.wide.s32 %rd64, %r978, 2, %rd14; 2026-02-21T08:52:48.7506275Z .loc 1 48 80 // 
c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:80 2026-02-21T08:52:48.7506337Z shl.b32 %r533, %r980, 13; 2026-02-21T08:52:48.7506399Z add.s32 %r497, %r11, %r533; 2026-02-21T08:52:48.7506736Z selp.b32 %r498, 8, 0, %p25; 2026-02-21T08:52:48.7506810Z // begin inline asm 2026-02-21T08:52:48.7506964Z cp.async.ca.shared.global [ %r497 + 0 ], [ %rd64 + 0 ], 0x8, %r498; 2026-02-21T08:52:48.7507028Z // end inline asm 2026-02-21T08:52:48.7507105Z cp.async.commit_group; 2026-02-21T08:52:48.7507323Z .loc 1 40 71 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:40:71 2026-02-21T08:52:48.7507386Z add.s32 %r978, %r978, 64; 2026-02-21T08:52:48.7507459Z add.s32 %r977, %r977, 229376; 2026-02-21T08:52:48.7507528Z setp.lt.u64 %p28, %rd130, 4064; 2026-02-21T08:52:48.7507590Z @%p28 bra $L__BB0_5; 2026-02-21T08:52:48.7507705Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:48.7507924Z .loc 1 33 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:33:32 2026-02-21T08:52:48.7507987Z or.b32 %r544, %r67, %r9; 2026-02-21T08:52:48.7508195Z .loc 1 40 71 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:40:71 2026-02-21T08:52:48.7508282Z cp.async.wait_group 0; 2026-02-21T08:52:48.7517319Z bar.sync 0; 2026-02-21T08:52:48.7517614Z .loc 1 87 28 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:87:28 2026-02-21T08:52:48.7517718Z cvt.rn.bf16x2.f32 %r545, %r982, %r981; 2026-02-21T08:52:48.7517801Z cvt.rn.bf16x2.f32 %r546, %r984, %r983; 2026-02-21T08:52:48.7518032Z .loc 1 88 50 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:88:50 2026-02-21T08:52:48.7518111Z mad.lo.s32 %r547, %r66, 7168, %r544; 2026-02-21T08:52:48.7518353Z .loc 1 88 22 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:88:22 2026-02-21T08:52:48.7518431Z mad.wide.s32 %rd74, %r547, 2, %rd16; 2026-02-21T08:52:48.7518645Z .loc 1 88 81 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:88:81 2026-02-21T08:52:48.7518812Z 
stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r24], {%r545, %r546}; 2026-02-21T08:52:48.7518876Z bar.sync 0; 2026-02-21T08:52:48.7518941Z // begin inline asm 2026-02-21T08:52:48.7519087Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r536}, [%r355]; 2026-02-21T08:52:48.7519153Z // end inline asm 2026-02-21T08:52:48.7519213Z // begin inline asm 2026-02-21T08:52:48.7519289Z st.global.b32 [ %rd74 + 0 ], { %r536 }; 2026-02-21T08:52:48.7519356Z // end inline asm 2026-02-21T08:52:48.7519586Z .loc 1 19 144 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:19:144 2026-02-21T08:52:48.7519655Z add.s32 %r548, %r968, 8448; 2026-02-21T08:52:48.7519879Z .loc 1 25 35 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:25:35 2026-02-21T08:52:48.7519954Z mul.hi.s32 %r549, %r548, -1840700269; 2026-02-21T08:52:48.7520024Z add.s32 %r550, %r549, %r548; 2026-02-21T08:52:48.7520094Z shr.u32 %r551, %r550, 31; 2026-02-21T08:52:48.7520169Z shr.s32 %r552, %r550, 8; 2026-02-21T08:52:48.7520239Z add.s32 %r553, %r552, %r551; 2026-02-21T08:52:48.7520453Z .loc 1 26 33 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:26:33 2026-02-21T08:52:48.7520523Z shl.b32 %r554, %r553, 1; 2026-02-21T08:52:48.7520936Z .loc 1 27 39 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:27:39 2026-02-21T08:52:48.7521011Z sub.s32 %r555, 1, %r554; 2026-02-21T08:52:48.7521223Z .loc 1 27 52 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:27:52 2026-02-21T08:52:48.7521285Z min.s32 %r556, %r555, 2; 2026-02-21T08:52:48.7521486Z .loc 1 28 45 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:28:45 2026-02-21T08:52:48.7521561Z mul.lo.s32 %r557, %r553, 448; 2026-02-21T08:52:48.7521624Z sub.s32 %r558, %r548, %r557; 2026-02-21T08:52:48.7521826Z .loc 1 29 51 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:29:51 2026-02-21T08:52:48.7521893Z div.s32 %r559, %r558, %r556; 2026-02-21T08:52:48.7522249Z .loc 1 28 64 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:28:64 
2026-02-21T08:52:48.7522321Z mul.lo.s32 %r560, %r559, %r556; 2026-02-21T08:52:48.7522389Z sub.s32 %r561, %r558, %r560; 2026-02-21T08:52:48.7522596Z .loc 1 28 30 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:28:30 2026-02-21T08:52:48.7522658Z add.s32 %r562, %r561, %r554; 2026-02-21T08:52:48.7522858Z .loc 1 30 27 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:30:27 2026-02-21T08:52:48.7522927Z shl.b32 %r563, %r562, 6; 2026-02-21T08:52:48.7523128Z .loc 1 31 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:31:32 2026-02-21T08:52:48.7523192Z or.b32 %r86, %r563, %r6; 2026-02-21T08:52:48.7523400Z .loc 1 32 27 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:32:27 2026-02-21T08:52:48.7523475Z shl.b32 %r87, %r559, 5; 2026-02-21T08:52:48.7523681Z .loc 1 48 53 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:53 2026-02-21T08:52:48.7523744Z shl.b32 %r564, %r86, 13; 2026-02-21T08:52:48.7523952Z .loc 1 48 60 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:60 2026-02-21T08:52:48.7524016Z or.b32 %r565, %r564, %r7; 2026-02-21T08:52:48.7524217Z .loc 1 48 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:32 2026-02-21T08:52:48.7524298Z mad.wide.s32 %rd75, %r565, 2, %rd14; 2026-02-21T08:52:48.7524499Z .loc 1 48 80 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:80 2026-02-21T08:52:48.7524558Z bar.sync 0; 2026-02-21T08:52:48.7524619Z mov.b32 %r538, 8; 2026-02-21T08:52:48.7524686Z // begin inline asm 2026-02-21T08:52:48.7524831Z cp.async.ca.shared.global [ %r11 + 0 ], [ %rd75 + 0 ], 0x8, %r538; 2026-02-21T08:52:48.7524892Z // end inline asm 2026-02-21T08:52:48.7524985Z cp.async.commit_group; 2026-02-21T08:52:48.7525205Z .loc 1 48 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:32 2026-02-21T08:52:48.7525275Z cvt.s64.s32 %rd78, %r564; 2026-02-21T08:52:48.7525345Z or.b64 %rd79, %rd78, %rd2; 2026-02-21T08:52:48.7525420Z shl.b64 %rd80, %rd79, 1; 
2026-02-21T08:52:48.7525486Z add.s64 %rd81, %rd14, %rd80; 2026-02-21T08:52:48.7525552Z add.s64 %rd76, %rd81, 128; 2026-02-21T08:52:48.7525766Z .loc 1 48 80 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:80 2026-02-21T08:52:48.7525825Z bar.sync 0; 2026-02-21T08:52:48.7525889Z // begin inline asm 2026-02-21T08:52:48.7526021Z cp.async.ca.shared.global [ %r12 + 0 ], [ %rd76 + 0 ], 0x8, %r538; 2026-02-21T08:52:48.7526088Z // end inline asm 2026-02-21T08:52:48.7526156Z cp.async.commit_group; 2026-02-21T08:52:48.7526361Z .loc 1 40 71 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:40:71 2026-02-21T08:52:48.7526435Z shl.b32 %r566, %r562, 19; 2026-02-21T08:52:48.7526654Z or.b32 %r986, %r26, %r566; 2026-02-21T08:52:48.7526721Z add.s32 %r985, %r27, %r87; 2026-02-21T08:52:48.7526791Z mov.b32 %r989, 0f00000000; 2026-02-21T08:52:48.7526986Z mov.b32 %r988, 1; 2026-02-21T08:52:48.7527051Z mov.b32 %r987, -1; 2026-02-21T08:52:48.7527114Z mov.b64 %rd131, -32; 2026-02-21T08:52:48.7527184Z mov.b32 %r990, %r989; 2026-02-21T08:52:48.7527243Z mov.b32 %r991, %r989; 2026-02-21T08:52:48.7527302Z mov.b32 %r992, %r989; 2026-02-21T08:52:48.7527431Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T08:52:48.7527541Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:48.7527608Z add.s64 %rd131, %rd131, 32; 2026-02-21T08:52:48.7527680Z setp.lt.u64 %p38, %rd131, 4032; 2026-02-21T08:52:48.7527749Z add.s32 %r679, %r987, 1; 2026-02-21T08:52:48.7527818Z setp.gt.s32 %p39, %r679, 1; 2026-02-21T08:52:48.7527885Z selp.b32 %r987, 0, %r679, %p39; 2026-02-21T08:52:48.7528222Z .loc 1 48 80 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:80 2026-02-21T08:52:48.7528296Z cp.async.wait_group 1; 2026-02-21T08:52:48.7528359Z bar.sync 0; 2026-02-21T08:52:48.7528427Z shl.b32 %r680, %r987, 13; 2026-02-21T08:52:48.7528492Z add.s32 %r682, %r952, %r680; 2026-02-21T08:52:48.7528694Z .loc 1 52 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:52:32 
2026-02-21T08:52:48.7528758Z add.s32 %r683, %r682, %r13; 2026-02-21T08:52:48.7528833Z ld.shared.b16 %rs88, [%r683]; 2026-02-21T08:52:48.7528903Z ld.shared.b16 %rs89, [%r683+1024]; 2026-02-21T08:52:48.7528988Z ld.shared.b16 %rs90, [%r683+64]; 2026-02-21T08:52:48.7529063Z ld.shared.b16 %rs91, [%r683+1088]; 2026-02-21T08:52:48.7529127Z add.s32 %r684, %r682, %r14; 2026-02-21T08:52:48.7529197Z ld.shared.b16 %rs92, [%r684]; 2026-02-21T08:52:48.7529263Z ld.shared.b16 %rs93, [%r684+1024]; 2026-02-21T08:52:48.7529342Z ld.shared.b16 %rs94, [%r684+64]; 2026-02-21T08:52:48.7529409Z ld.shared.b16 %rs95, [%r684+1088]; 2026-02-21T08:52:48.7529474Z add.s32 %r685, %r682, %r15; 2026-02-21T08:52:48.7529546Z ld.shared.b16 %rs96, [%r685]; 2026-02-21T08:52:48.7529614Z ld.shared.b16 %rs97, [%r685+1024]; 2026-02-21T08:52:48.7529684Z ld.shared.b16 %rs98, [%r685+64]; 2026-02-21T08:52:48.7529759Z ld.shared.b16 %rs99, [%r685+1088]; 2026-02-21T08:52:48.7529821Z add.s32 %r686, %r682, %r16; 2026-02-21T08:52:48.7529890Z ld.shared.b16 %rs100, [%r686]; 2026-02-21T08:52:48.7529966Z ld.shared.b16 %rs101, [%r686+1024]; 2026-02-21T08:52:48.7530055Z ld.shared.b16 %rs102, [%r686+64]; 2026-02-21T08:52:48.7530131Z ld.shared.b16 %rs103, [%r686+1088]; 2026-02-21T08:52:48.7530199Z add.s32 %r687, %r682, %r17; 2026-02-21T08:52:48.7530275Z ld.shared.b16 %rs104, [%r687]; 2026-02-21T08:52:48.7530345Z ld.shared.b16 %rs105, [%r687+1024]; 2026-02-21T08:52:48.7530413Z ld.shared.b16 %rs106, [%r687+64]; 2026-02-21T08:52:48.7530484Z ld.shared.b16 %rs107, [%r687+1088]; 2026-02-21T08:52:48.7530559Z add.s32 %r688, %r682, %r18; 2026-02-21T08:52:48.7530626Z ld.shared.b16 %rs108, [%r688]; 2026-02-21T08:52:48.7530693Z ld.shared.b16 %rs109, [%r688+1024]; 2026-02-21T08:52:48.7530768Z ld.shared.b16 %rs110, [%r688+64]; 2026-02-21T08:52:48.7530837Z ld.shared.b16 %rs111, [%r688+1088]; 2026-02-21T08:52:48.7530899Z add.s32 %r689, %r682, %r19; 2026-02-21T08:52:48.7530967Z ld.shared.b16 %rs112, [%r689]; 2026-02-21T08:52:48.7531040Z 
ld.shared.b16 %rs113, [%r689+1024]; 2026-02-21T08:52:48.7531107Z ld.shared.b16 %rs114, [%r689+64]; 2026-02-21T08:52:48.7531174Z ld.shared.b16 %rs115, [%r689+1088]; 2026-02-21T08:52:48.7531245Z add.s32 %r690, %r682, %r20; 2026-02-21T08:52:48.7531315Z ld.shared.b16 %rs116, [%r690]; 2026-02-21T08:52:48.7531383Z ld.shared.b16 %rs117, [%r690+1024]; 2026-02-21T08:52:48.7531456Z ld.shared.b16 %rs118, [%r690+64]; 2026-02-21T08:52:48.7531524Z ld.shared.b16 %rs119, [%r690+1088]; 2026-02-21T08:52:48.7531591Z cvt.f32.bf16 %r575, %rs88; 2026-02-21T08:52:48.7531658Z cvt.f32.bf16 %r576, %rs89; 2026-02-21T08:52:48.7531732Z cvt.f32.bf16 %r577, %rs92; 2026-02-21T08:52:48.7531797Z cvt.f32.bf16 %r578, %rs93; 2026-02-21T08:52:48.7531968Z cvt.f32.bf16 %r587, %rs96; 2026-02-21T08:52:48.7532042Z cvt.f32.bf16 %r588, %rs97; 2026-02-21T08:52:48.7532108Z cvt.f32.bf16 %r589, %rs100; 2026-02-21T08:52:48.7532172Z cvt.f32.bf16 %r590, %rs101; 2026-02-21T08:52:48.7532234Z cvt.f32.bf16 %r599, %rs104; 2026-02-21T08:52:48.7532307Z cvt.f32.bf16 %r600, %rs105; 2026-02-21T08:52:48.7532369Z cvt.f32.bf16 %r601, %rs108; 2026-02-21T08:52:48.7532445Z cvt.f32.bf16 %r602, %rs109; 2026-02-21T08:52:48.7532518Z cvt.f32.bf16 %r611, %rs112; 2026-02-21T08:52:48.7532581Z cvt.f32.bf16 %r612, %rs113; 2026-02-21T08:52:48.7532646Z cvt.f32.bf16 %r613, %rs116; 2026-02-21T08:52:48.7532709Z cvt.f32.bf16 %r614, %rs117; 2026-02-21T08:52:48.7532780Z cvt.f32.bf16 %r623, %rs90; 2026-02-21T08:52:48.7532844Z cvt.f32.bf16 %r624, %rs91; 2026-02-21T08:52:48.7532999Z cvt.f32.bf16 %r625, %rs94; 2026-02-21T08:52:48.7533077Z cvt.f32.bf16 %r626, %rs95; 2026-02-21T08:52:48.7533144Z cvt.f32.bf16 %r635, %rs98; 2026-02-21T08:52:48.7533207Z cvt.f32.bf16 %r636, %rs99; 2026-02-21T08:52:48.7533278Z cvt.f32.bf16 %r637, %rs102; 2026-02-21T08:52:48.7533353Z cvt.f32.bf16 %r638, %rs103; 2026-02-21T08:52:48.7533419Z cvt.f32.bf16 %r647, %rs106; 2026-02-21T08:52:48.7533482Z cvt.f32.bf16 %r648, %rs107; 2026-02-21T08:52:48.7533555Z cvt.f32.bf16 %r649, 
%rs110; 2026-02-21T08:52:48.7533619Z cvt.f32.bf16 %r650, %rs111; 2026-02-21T08:52:48.7533684Z cvt.f32.bf16 %r659, %rs114; 2026-02-21T08:52:48.7533754Z cvt.f32.bf16 %r660, %rs115; 2026-02-21T08:52:48.7533818Z cvt.f32.bf16 %r661, %rs118; 2026-02-21T08:52:48.7533881Z cvt.f32.bf16 %r662, %rs119; 2026-02-21T08:52:48.7534102Z .loc 1 54 34 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:54:34 2026-02-21T08:52:48.7534176Z cvt.s64.s32 %rd94, %r985; 2026-02-21T08:52:48.7534248Z add.s64 %rd83, %rd15, %rd94; 2026-02-21T08:52:48.7534457Z .loc 1 54 87 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:54:87 2026-02-21T08:52:48.7534527Z // begin inline asm 2026-02-21T08:52:48.7534593Z mov.u64 %rd82, 0x0; 2026-02-21T08:52:48.7534728Z createpolicy.fractional.L2::evict_last.b64 %rd82, 1.0; 2026-02-21T08:52:48.7534797Z // end inline asm 2026-02-21T08:52:48.7534861Z // begin inline asm 2026-02-21T08:52:48.7534922Z mov.u16 %rs87, 0x0; 2026-02-21T08:52:48.7535084Z ld.global.L1::evict_last.L2::cache_hint.b8 { %rs87 }, [ %rd83 + 0 ], %rd82; 2026-02-21T08:52:48.7535154Z // end inline asm 2026-02-21T08:52:48.7535373Z .loc 1 62 28 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:62:28 2026-02-21T08:52:48.7535443Z st.shared.b8 [%r21], %rs87; 2026-02-21T08:52:48.7535512Z bar.sync 0; 2026-02-21T08:52:48.7535595Z ld.shared.v2.b8 {%rs120, %rs121}, [%r22]; 2026-02-21T08:52:48.7535813Z .loc 1 57 28 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:57:28 2026-02-21T08:52:48.7535886Z shl.b16 %rs122, %rs120, 4; 2026-02-21T08:52:48.7535950Z shl.b16 %rs123, %rs121, 4; 2026-02-21T08:52:48.7536159Z .loc 1 72 58 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:72:58 2026-02-21T08:52:48.7536237Z selp.b16 %rs124, %rs122, %rs120, %p60; 2026-02-21T08:52:48.7536310Z cvt.s16.s8 %rs125, %rs124; 2026-02-21T08:52:48.7536373Z shr.s16 %rs126, %rs125, 4; 2026-02-21T08:52:48.7536445Z selp.b16 %rs127, %rs123, %rs121, %p60; 2026-02-21T08:52:48.7536657Z cvt.s16.s8 %rs128, 
%rs127; 2026-02-21T08:52:48.7536721Z shr.s16 %rs129, %rs128, 4; 2026-02-21T08:52:48.7536930Z .loc 1 77 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:77:32 2026-02-21T08:52:48.7536997Z cvt.rn.f32.s16 %r691, %rs126; 2026-02-21T08:52:48.7537067Z cvt.rn.f32.s16 %r692, %rs129; 2026-02-21T08:52:48.7537124Z bar.sync 0; 2026-02-21T08:52:48.7537195Z st.shared.b32 [%r23], %r691; 2026-02-21T08:52:48.7537274Z st.shared.b32 [%r23+4096], %r692; 2026-02-21T08:52:48.7537334Z $L__tmp5: 2026-02-21T08:52:48.7537618Z .loc 2 291 36 // standard.py:291:36 @[ c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:84:40 ] 2026-02-21T08:52:48.7537836Z // begin inline asm 2026-02-21T08:52:48.7537918Z fence.proxy.async.shared::cta; 2026-02-21T08:52:48.7537977Z // end inline asm 2026-02-21T08:52:48.7538034Z bar.sync 0; 2026-02-21T08:52:48.7538122Z shfl.sync.idx.b32 %r693, %r5, 0, 31, -1; 2026-02-21T08:52:48.7538197Z wgmma.fence.sync.aligned; 2026-02-21T08:52:48.7538262Z shl.b32 %r694, %r693, 8; 2026-02-21T08:52:48.7538333Z and.b32 %r695, %r694, 3072; 2026-02-21T08:52:48.7538395Z add.s32 %r696, %r695, %r147; 2026-02-21T08:52:48.7538459Z bfe.u32 %r697, %r696, 4, 14; 2026-02-21T08:52:48.7538523Z cvt.u64.u32 %rd95, %r697; 2026-02-21T08:52:48.7538603Z or.b64 %rd85, %rd95, 4611686293322072064; 2026-02-21T08:52:48.7538672Z mov.pred %p29, -1; 2026-02-21T08:52:48.7538854Z // begin inline asm 2026-02-21T08:52:48.7539164Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r989,%r990,%r991,%r992}, {%r575,%r576,%r577,%r578}, %rd85, %p29, 1, 1; 2026-02-21T08:52:48.7539229Z // end inline asm 2026-02-21T08:52:48.7539293Z add.s32 %r698, %r696, 32; 2026-02-21T08:52:48.7539365Z bfe.u32 %r699, %r698, 4, 14; 2026-02-21T08:52:48.7539429Z cvt.u64.u32 %rd96, %r699; 2026-02-21T08:52:48.7539503Z or.b64 %rd86, %rd96, 4611686293322072064; 2026-02-21T08:52:48.7539565Z // begin inline asm 2026-02-21T08:52:48.7539852Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r989,%r990,%r991,%r992}, 
{%r587,%r588,%r589,%r590}, %rd86, %p29, 1, 1; 2026-02-21T08:52:48.7539912Z // end inline asm 2026-02-21T08:52:48.7539974Z add.s32 %r700, %r696, 64; 2026-02-21T08:52:48.7540043Z bfe.u32 %r701, %r700, 4, 14; 2026-02-21T08:52:48.7540106Z cvt.u64.u32 %rd97, %r701; 2026-02-21T08:52:48.7540174Z or.b64 %rd87, %rd97, 4611686293322072064; 2026-02-21T08:52:48.7540240Z // begin inline asm 2026-02-21T08:52:48.7540520Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r989,%r990,%r991,%r992}, {%r599,%r600,%r601,%r602}, %rd87, %p29, 1, 1; 2026-02-21T08:52:48.7540579Z // end inline asm 2026-02-21T08:52:48.7540646Z add.s32 %r702, %r696, 96; 2026-02-21T08:52:48.7540714Z bfe.u32 %r703, %r702, 4, 14; 2026-02-21T08:52:48.7540777Z cvt.u64.u32 %rd98, %r703; 2026-02-21T08:52:48.7540846Z or.b64 %rd88, %rd98, 4611686293322072064; 2026-02-21T08:52:48.7540913Z // begin inline asm 2026-02-21T08:52:48.7541202Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r989,%r990,%r991,%r992}, {%r611,%r612,%r613,%r614}, %rd88, %p29, 1, 1; 2026-02-21T08:52:48.7541264Z // end inline asm 2026-02-21T08:52:48.7541341Z add.s32 %r704, %r696, 4096; 2026-02-21T08:52:48.7541414Z bfe.u32 %r705, %r704, 4, 14; 2026-02-21T08:52:48.7541479Z cvt.u64.u32 %rd99, %r705; 2026-02-21T08:52:48.7541552Z or.b64 %rd89, %rd99, 4611686293322072064; 2026-02-21T08:52:48.7541622Z // begin inline asm 2026-02-21T08:52:48.7541904Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r989,%r990,%r991,%r992}, {%r623,%r624,%r625,%r626}, %rd89, %p29, 1, 1; 2026-02-21T08:52:48.7541968Z // end inline asm 2026-02-21T08:52:48.7542042Z add.s32 %r706, %r696, 4128; 2026-02-21T08:52:48.7542106Z bfe.u32 %r707, %r706, 4, 14; 2026-02-21T08:52:48.7542172Z cvt.u64.u32 %rd100, %r707; 2026-02-21T08:52:48.7542249Z or.b64 %rd90, %rd100, 4611686293322072064; 2026-02-21T08:52:48.7542319Z // begin inline asm 2026-02-21T08:52:48.7542598Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r989,%r990,%r991,%r992}, {%r635,%r636,%r637,%r638}, %rd90, %p29, 
1, 1; 2026-02-21T08:52:48.7542658Z // end inline asm 2026-02-21T08:52:48.7542727Z add.s32 %r708, %r696, 4160; 2026-02-21T08:52:48.7542790Z bfe.u32 %r709, %r708, 4, 14; 2026-02-21T08:52:48.7542854Z cvt.u64.u32 %rd101, %r709; 2026-02-21T08:52:48.7542932Z or.b64 %rd91, %rd101, 4611686293322072064; 2026-02-21T08:52:48.7543003Z // begin inline asm 2026-02-21T08:52:48.7543280Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r989,%r990,%r991,%r992}, {%r647,%r648,%r649,%r650}, %rd91, %p29, 1, 1; 2026-02-21T08:52:48.7543342Z // end inline asm 2026-02-21T08:52:48.7543537Z add.s32 %r710, %r696, 4192; 2026-02-21T08:52:48.7543601Z bfe.u32 %r711, %r710, 4, 14; 2026-02-21T08:52:48.7543665Z cvt.u64.u32 %rd102, %r711; 2026-02-21T08:52:48.7543746Z or.b64 %rd92, %rd102, 4611686293322072064; 2026-02-21T08:52:48.7543808Z // begin inline asm 2026-02-21T08:52:48.7544081Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r989,%r990,%r991,%r992}, {%r659,%r660,%r661,%r662}, %rd92, %p29, 1, 1; 2026-02-21T08:52:48.7544139Z // end inline asm 2026-02-21T08:52:48.7544228Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:48.7544289Z mov.b32 %r669, 0; 2026-02-21T08:52:48.7544351Z mov.b32 %r667, %r147; 2026-02-21T08:52:48.7544419Z mov.b32 %r668, %r669; 2026-02-21T08:52:48.7544481Z // begin inline asm 2026-02-21T08:52:48.7544681Z // wait for regs: %r989,%r990,%r991,%r992,%r667,%r668,%r669 2026-02-21T08:52:48.7544762Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:48.7544830Z // end inline asm 2026-02-21T08:52:48.7544886Z $L__tmp6: 2026-02-21T08:52:48.7545114Z .loc 1 40 71 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:40:71 2026-02-21T08:52:48.7545192Z add.s32 %r712, %r988, 1; 2026-02-21T08:52:48.7545262Z setp.gt.s32 %p40, %r712, 1; 2026-02-21T08:52:48.7545330Z selp.b32 %r988, 0, %r712, %p40; 2026-02-21T08:52:48.7545548Z .loc 1 48 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:32 2026-02-21T08:52:48.7545624Z mad.wide.s32 %rd93, %r986, 2, %rd14; 
2026-02-21T08:52:48.7545833Z .loc 1 48 80 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:80 2026-02-21T08:52:48.7545898Z shl.b32 %r713, %r988, 13; 2026-02-21T08:52:48.7545967Z add.s32 %r677, %r11, %r713; 2026-02-21T08:52:48.7546034Z selp.b32 %r678, 8, 0, %p38; 2026-02-21T08:52:48.7546100Z // begin inline asm 2026-02-21T08:52:48.7546255Z cp.async.ca.shared.global [ %r677 + 0 ], [ %rd93 + 0 ], 0x8, %r678; 2026-02-21T08:52:48.7546319Z // end inline asm 2026-02-21T08:52:48.7546389Z cp.async.commit_group; 2026-02-21T08:52:48.7546753Z .loc 1 40 71 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:40:71 2026-02-21T08:52:48.7546820Z add.s32 %r986, %r986, 64; 2026-02-21T08:52:48.7546887Z add.s32 %r985, %r985, 229376; 2026-02-21T08:52:48.7546964Z setp.lt.u64 %p41, %rd131, 4064; 2026-02-21T08:52:48.7547034Z @%p41 bra $L__BB0_7; 2026-02-21T08:52:48.7547149Z // %bb.8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:52:48.7547357Z .loc 1 33 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:33:32 2026-02-21T08:52:48.7547430Z or.b32 %r717, %r87, %r9; 2026-02-21T08:52:48.7547634Z .loc 1 40 71 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:40:71 2026-02-21T08:52:48.7547708Z cp.async.wait_group 0; 2026-02-21T08:52:48.7547774Z bar.sync 0; 2026-02-21T08:52:48.7547971Z .loc 1 87 28 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:87:28 2026-02-21T08:52:48.7548049Z cvt.rn.bf16x2.f32 %r718, %r990, %r989; 2026-02-21T08:52:48.7548116Z cvt.rn.bf16x2.f32 %r719, %r992, %r991; 2026-02-21T08:52:48.7548321Z .loc 1 88 50 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:88:50 2026-02-21T08:52:48.7548391Z mad.lo.s32 %r720, %r86, 7168, %r717; 2026-02-21T08:52:48.7548686Z .loc 1 88 22 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:88:22 2026-02-21T08:52:48.7548769Z mad.wide.s32 %rd103, %r720, 2, %rd16; 2026-02-21T08:52:48.7548972Z .loc 1 88 81 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:88:81 
2026-02-21T08:52:48.7549125Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r24], {%r718, %r719}; 2026-02-21T08:52:48.7549190Z bar.sync 0; 2026-02-21T08:52:48.7549259Z // begin inline asm 2026-02-21T08:52:48.7549398Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r714}, [%r355]; 2026-02-21T08:52:48.7549462Z // end inline asm 2026-02-21T08:52:48.7549677Z // begin inline asm 2026-02-21T08:52:48.7549756Z st.global.b32 [ %rd103 + 0 ], { %r714 }; 2026-02-21T08:52:48.7549814Z // end inline asm 2026-02-21T08:52:48.7550045Z .loc 1 19 144 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:19:144 2026-02-21T08:52:48.7550114Z add.s32 %r968, %r968, 12672; 2026-02-21T08:52:48.7550185Z setp.lt.s32 %p42, %r968, %r993; 2026-02-21T08:52:48.7550258Z @%p42 bra $L__BB0_2; 2026-02-21T08:52:48.7550359Z $L__BB0_9: // %.preheader 2026-02-21T08:52:48.7550430Z setp.gt.s32 %p43, %r993, 223; 2026-02-21T08:52:48.7550496Z @%p43 bra $L__BB0_14; 2026-02-21T08:52:48.7550582Z // %bb.10: // %.lr.ph35 2026-02-21T08:52:48.7550911Z .loc 1 0 144 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:0:144 2026-02-21T08:52:48.7550981Z and.b32 %r723, %r951, 56; 2026-02-21T08:52:48.7551052Z xor.b32 %r724, %r723, %r950; 2026-02-21T08:52:48.7551117Z add.s32 %r28, %r952, %r724; 2026-02-21T08:52:48.7551181Z add.s32 %r771, %r28, 8192; 2026-02-21T08:52:48.7551245Z and.b32 %r727, %r953, 6144; 2026-02-21T08:52:48.7551314Z and.b32 %r729, %r954, 896; 2026-02-21T08:52:48.7551375Z or.b32 %r731, %r727, %r729; 2026-02-21T08:52:48.7551436Z or.b32 %r30, %r731, %r955; 2026-02-21T08:52:48.7551515Z xor.b32 %r31, %r30, 8; 2026-02-21T08:52:48.7551578Z xor.b32 %r32, %r30, 16; 2026-02-21T08:52:48.7551640Z xor.b32 %r33, %r30, 24; 2026-02-21T08:52:48.7551706Z xor.b32 %r34, %r30, 32; 2026-02-21T08:52:48.7551766Z xor.b32 %r35, %r30, 40; 2026-02-21T08:52:48.7551829Z xor.b32 %r36, %r30, 48; 2026-02-21T08:52:48.7551890Z xor.b32 %r37, %r30, 56; 2026-02-21T08:52:48.7551958Z and.b32 %r734, %r957, 896; 2026-02-21T08:52:48.7552023Z 
selp.b32 %r736, 1, 0, %p59; 2026-02-21T08:52:48.7552089Z add.s32 %r737, %r952, 16384; 2026-02-21T08:52:48.7552155Z add.s32 %r738, %r737, %r734; 2026-02-21T08:52:48.7552218Z add.s32 %r739, %r738, %r958; 2026-02-21T08:52:48.7552280Z add.s32 %r740, %r739, %r736; 2026-02-21T08:52:48.7552345Z add.s32 %r38, %r740, %r956; 2026-02-21T08:52:48.7552415Z add.s32 %r743, %r737, %r960; 2026-02-21T08:52:48.7552479Z add.s32 %r744, %r743, %r959; 2026-02-21T08:52:48.7552540Z add.s32 %r39, %r744, %r956; 2026-02-21T08:52:48.7552612Z and.b32 %r747, %r962, 112; 2026-02-21T08:52:48.7552672Z or.b32 %r749, %r961, %r747; 2026-02-21T08:52:48.7552734Z xor.b32 %r750, %r749, %r963; 2026-02-21T08:52:48.7552795Z add.s32 %r40, %r737, %r750; 2026-02-21T08:52:48.7552868Z and.b32 %r752, %r964, 3072; 2026-02-21T08:52:48.7552928Z and.b32 %r755, %r965, 768; 2026-02-21T08:52:48.7552985Z or.b32 %r756, %r755, %r966; 2026-02-21T08:52:48.7553054Z and.b32 %r758, %r967, 96; 2026-02-21T08:52:48.7553118Z xor.b32 %r759, %r756, %r758; 2026-02-21T08:52:48.7553184Z add.s32 %r760, %r952, %r752; 2026-02-21T08:52:48.7553256Z add.s32 %r41, %r760, %r759; 2026-02-21T08:52:48.7553326Z or.b32 %r761, %r967, %r4; 2026-02-21T08:52:48.7553389Z shl.b32 %r762, %r761, 7; 2026-02-21T08:52:48.7553453Z and.b32 %r763, %r762, 3072; 2026-02-21T08:52:48.7553518Z and.b32 %r764, %r954, 96; 2026-02-21T08:52:48.7553579Z shl.b32 %r765, %r761, 2; 2026-02-21T08:52:48.7553638Z and.b32 %r766, %r765, 1008; 2026-02-21T08:52:48.7553708Z xor.b32 %r767, %r766, %r764; 2026-02-21T08:52:48.7553772Z add.s32 %r768, %r952, %r763; 2026-02-21T08:52:48.7553835Z add.s32 %r944, %r768, %r767; 2026-02-21T08:52:48.7554049Z .loc 1 40 71 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:40:71 2026-02-21T08:52:48.7554111Z or.b32 %r43, %r7, 128; 2026-02-21T08:52:48.7554323Z .loc 1 19 144 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:19:144 2026-02-21T08:52:48.7554396Z mad.wide.u32 %rd1, %r5, 7168, %rd15; 2026-02-21T08:52:48.7554524Z $L__BB0_11: // 
=>This Loop Header: Depth=1 2026-02-21T08:52:48.7554625Z // Child Loop BB0_12 Depth 2 2026-02-21T08:52:48.7554932Z .loc 1 25 35 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:25:35 2026-02-21T08:52:48.7555015Z mul.hi.s32 %r776, %r993, -1840700269; 2026-02-21T08:52:48.7555077Z add.s32 %r777, %r776, %r993; 2026-02-21T08:52:48.7555139Z shr.u32 %r778, %r777, 31; 2026-02-21T08:52:48.7555205Z shr.s32 %r779, %r777, 8; 2026-02-21T08:52:48.7555264Z add.s32 %r780, %r779, %r778; 2026-02-21T08:52:48.7555464Z .loc 1 26 33 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:26:33 2026-02-21T08:52:48.7555526Z shl.b32 %r781, %r780, 1; 2026-02-21T08:52:48.7555729Z .loc 1 27 39 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:27:39 2026-02-21T08:52:48.7555789Z sub.s32 %r782, 1, %r781; 2026-02-21T08:52:48.7556092Z .loc 1 27 52 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:27:52 2026-02-21T08:52:48.7556164Z min.u32 %r783, %r782, 2; 2026-02-21T08:52:48.7556366Z .loc 1 28 45 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:28:45 2026-02-21T08:52:48.7556434Z mul.lo.s32 %r784, %r780, 448; 2026-02-21T08:52:48.7556631Z sub.s32 %r785, %r993, %r784; 2026-02-21T08:52:48.7556837Z .loc 1 28 64 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:28:64 2026-02-21T08:52:48.7556902Z cvt.u16.u32 %rs130, %r785; 2026-02-21T08:52:48.7556964Z cvt.u16.u32 %rs131, %r783; 2026-02-21T08:52:48.7557171Z .loc 1 29 51 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:29:51 2026-02-21T08:52:48.7557238Z div.s16 %rs132, %rs130, %rs131; 2026-02-21T08:52:48.7557437Z .loc 1 28 64 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:28:64 2026-02-21T08:52:48.7557515Z mul.lo.s16 %rs133, %rs132, %rs131; 2026-02-21T08:52:48.7557581Z sub.s16 %rs134, %rs130, %rs133; 2026-02-21T08:52:48.7557646Z cvt.s32.s16 %r786, %rs134; 2026-02-21T08:52:48.7557852Z .loc 1 28 30 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:28:30 
2026-02-21T08:52:48.7557919Z add.s32 %r787, %r781, %r786; 2026-02-21T08:52:48.7558119Z .loc 1 30 27 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:30:27 2026-02-21T08:52:48.7558186Z shl.b32 %r788, %r787, 6; 2026-02-21T08:52:48.7558387Z .loc 1 31 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:31:32 2026-02-21T08:52:48.7558449Z or.b32 %r108, %r788, %r6; 2026-02-21T08:52:48.7558651Z .loc 1 32 27 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:32:27 2026-02-21T08:52:48.7558725Z mul.wide.s16 %r109, %rs132, 32; 2026-02-21T08:52:48.7558928Z .loc 1 48 53 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:53 2026-02-21T08:52:48.7558990Z shl.b32 %r789, %r108, 13; 2026-02-21T08:52:48.7559195Z .loc 1 48 60 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:60 2026-02-21T08:52:48.7559258Z or.b32 %r790, %r789, %r7; 2026-02-21T08:52:48.7559458Z .loc 1 48 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:32 2026-02-21T08:52:48.7559537Z mad.wide.s32 %rd104, %r790, 2, %rd14; 2026-02-21T08:52:48.7559735Z .loc 1 48 80 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:80 2026-02-21T08:52:48.7559792Z bar.sync 0; 2026-02-21T08:52:48.7559857Z mov.b32 %r770, 8; 2026-02-21T08:52:48.7559920Z // begin inline asm 2026-02-21T08:52:48.7560061Z cp.async.ca.shared.global [ %r28 + 0 ], [ %rd104 + 0 ], 0x8, %r770; 2026-02-21T08:52:48.7560120Z // end inline asm 2026-02-21T08:52:48.7560194Z cp.async.commit_group; 2026-02-21T08:52:48.7560396Z .loc 1 48 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:32 2026-02-21T08:52:48.7560459Z add.s64 %rd105, %rd104, 128; 2026-02-21T08:52:48.7560665Z .loc 1 48 80 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:80 2026-02-21T08:52:48.7560872Z bar.sync 0; 2026-02-21T08:52:48.7560936Z // begin inline asm 2026-02-21T08:52:48.7561079Z cp.async.ca.shared.global [ %r771 + 0 ], [ %rd105 + 0 ], 0x8, %r770; 2026-02-21T08:52:48.7561145Z // end inline asm 
2026-02-21T08:52:48.7561211Z cp.async.commit_group; 2026-02-21T08:52:48.7561415Z .loc 1 40 71 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:40:71 2026-02-21T08:52:48.7561483Z shl.b32 %r791, %r780, 7; 2026-02-21T08:52:48.7561545Z or.b32 %r792, %r6, %r791; 2026-02-21T08:52:48.7561619Z mad.wide.s16 %r793, %rs134, 64, %r792; 2026-02-21T08:52:48.7561685Z shl.b32 %r794, %r793, 13; 2026-02-21T08:52:48.7561747Z or.b32 %r994, %r43, %r794; 2026-02-21T08:52:48.7561810Z or.b32 %r795, %r4, %r109; 2026-02-21T08:52:48.7562001Z cvt.s64.s32 %rd107, %r795; 2026-02-21T08:52:48.7562076Z add.s64 %rd132, %rd1, %rd107; 2026-02-21T08:52:48.7562140Z mov.b32 %r997, 0f00000000; 2026-02-21T08:52:48.7562198Z mov.b32 %r996, 1; 2026-02-21T08:52:48.7562268Z mov.b32 %r995, -1; 2026-02-21T08:52:48.7562330Z mov.b64 %rd133, -32; 2026-02-21T08:52:48.7562391Z mov.b32 %r998, %r997; 2026-02-21T08:52:48.7562451Z mov.b32 %r999, %r997; 2026-02-21T08:52:48.7562522Z mov.b32 %r1000, %r997; 2026-02-21T08:52:48.7562638Z $L__BB0_12: // Parent Loop BB0_11 Depth=1 2026-02-21T08:52:48.7562745Z // => This Inner Loop Header: Depth=2 2026-02-21T08:52:48.7562816Z add.s64 %rd133, %rd133, 32; 2026-02-21T08:52:48.7562895Z setp.lt.u64 %p54, %rd133, 4032; 2026-02-21T08:52:48.7562960Z add.s32 %r908, %r995, 1; 2026-02-21T08:52:48.7563032Z setp.gt.s32 %p55, %r908, 1; 2026-02-21T08:52:48.7563097Z selp.b32 %r995, 0, %r908, %p55; 2026-02-21T08:52:48.7563306Z .loc 1 48 80 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:80 2026-02-21T08:52:48.7563374Z cp.async.wait_group 1; 2026-02-21T08:52:48.7563436Z bar.sync 0; 2026-02-21T08:52:48.7563501Z shl.b32 %r909, %r995, 13; 2026-02-21T08:52:48.7563564Z add.s32 %r911, %r952, %r909; 2026-02-21T08:52:48.7563769Z .loc 1 52 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:52:32 2026-02-21T08:52:48.7563832Z add.s32 %r912, %r911, %r30; 2026-02-21T08:52:48.7563899Z ld.shared.b16 %rs136, [%r912]; 2026-02-21T08:52:48.7563971Z ld.shared.b16 %rs137, 
[%r912+1024]; 2026-02-21T08:52:48.7564045Z ld.shared.b16 %rs138, [%r912+64]; 2026-02-21T08:52:48.7564112Z ld.shared.b16 %rs139, [%r912+1088]; 2026-02-21T08:52:48.7564177Z add.s32 %r913, %r911, %r31; 2026-02-21T08:52:48.7564261Z ld.shared.b16 %rs140, [%r913]; 2026-02-21T08:52:48.7564329Z ld.shared.b16 %rs141, [%r913+1024]; 2026-02-21T08:52:48.7564395Z ld.shared.b16 %rs142, [%r913+64]; 2026-02-21T08:52:48.7564467Z ld.shared.b16 %rs143, [%r913+1088]; 2026-02-21T08:52:48.7564529Z add.s32 %r914, %r911, %r32; 2026-02-21T08:52:48.7564598Z ld.shared.b16 %rs144, [%r914]; 2026-02-21T08:52:48.7564667Z ld.shared.b16 %rs145, [%r914+1024]; 2026-02-21T08:52:48.7564737Z ld.shared.b16 %rs146, [%r914+64]; 2026-02-21T08:52:48.7564804Z ld.shared.b16 %rs147, [%r914+1088]; 2026-02-21T08:52:48.7564865Z add.s32 %r915, %r911, %r33; 2026-02-21T08:52:48.7564933Z ld.shared.b16 %rs148, [%r915]; 2026-02-21T08:52:48.7564999Z ld.shared.b16 %rs149, [%r915+1024]; 2026-02-21T08:52:48.7565066Z ld.shared.b16 %rs150, [%r915+64]; 2026-02-21T08:52:48.7565135Z ld.shared.b16 %rs151, [%r915+1088]; 2026-02-21T08:52:48.7565201Z add.s32 %r916, %r911, %r34; 2026-02-21T08:52:48.7565265Z ld.shared.b16 %rs152, [%r916]; 2026-02-21T08:52:48.7565331Z ld.shared.b16 %rs153, [%r916+1024]; 2026-02-21T08:52:48.7565402Z ld.shared.b16 %rs154, [%r916+64]; 2026-02-21T08:52:48.7565469Z ld.shared.b16 %rs155, [%r916+1088]; 2026-02-21T08:52:48.7565531Z add.s32 %r917, %r911, %r35; 2026-02-21T08:52:48.7565600Z ld.shared.b16 %rs156, [%r917]; 2026-02-21T08:52:48.7565666Z ld.shared.b16 %rs157, [%r917+1024]; 2026-02-21T08:52:48.7565857Z ld.shared.b16 %rs158, [%r917+64]; 2026-02-21T08:52:48.7565923Z ld.shared.b16 %rs159, [%r917+1088]; 2026-02-21T08:52:48.7565990Z add.s32 %r918, %r911, %r36; 2026-02-21T08:52:48.7566055Z ld.shared.b16 %rs160, [%r918]; 2026-02-21T08:52:48.7566121Z ld.shared.b16 %rs161, [%r918+1024]; 2026-02-21T08:52:48.7566189Z ld.shared.b16 %rs162, [%r918+64]; 2026-02-21T08:52:48.7566255Z ld.shared.b16 %rs163, 
[%r918+1088]; 2026-02-21T08:52:48.7566316Z add.s32 %r919, %r911, %r37; 2026-02-21T08:52:48.7566381Z ld.shared.b16 %rs164, [%r919]; 2026-02-21T08:52:48.7566576Z ld.shared.b16 %rs165, [%r919+1024]; 2026-02-21T08:52:48.7566649Z ld.shared.b16 %rs166, [%r919+64]; 2026-02-21T08:52:48.7566719Z ld.shared.b16 %rs167, [%r919+1088]; 2026-02-21T08:52:48.7566790Z cvt.f32.bf16 %r804, %rs136; 2026-02-21T08:52:48.7566987Z cvt.f32.bf16 %r805, %rs137; 2026-02-21T08:52:48.7567056Z cvt.f32.bf16 %r806, %rs140; 2026-02-21T08:52:48.7567117Z cvt.f32.bf16 %r807, %rs141; 2026-02-21T08:52:48.7567184Z cvt.f32.bf16 %r816, %rs144; 2026-02-21T08:52:48.7567250Z cvt.f32.bf16 %r817, %rs145; 2026-02-21T08:52:48.7567310Z cvt.f32.bf16 %r818, %rs148; 2026-02-21T08:52:48.7567376Z cvt.f32.bf16 %r819, %rs149; 2026-02-21T08:52:48.7567438Z cvt.f32.bf16 %r828, %rs152; 2026-02-21T08:52:48.7567499Z cvt.f32.bf16 %r829, %rs153; 2026-02-21T08:52:48.7567566Z cvt.f32.bf16 %r830, %rs156; 2026-02-21T08:52:48.7567627Z cvt.f32.bf16 %r831, %rs157; 2026-02-21T08:52:48.7567701Z cvt.f32.bf16 %r840, %rs160; 2026-02-21T08:52:48.7567765Z cvt.f32.bf16 %r841, %rs161; 2026-02-21T08:52:48.7567834Z cvt.f32.bf16 %r842, %rs164; 2026-02-21T08:52:48.7567896Z cvt.f32.bf16 %r843, %rs165; 2026-02-21T08:52:48.7567958Z cvt.f32.bf16 %r852, %rs138; 2026-02-21T08:52:48.7568025Z cvt.f32.bf16 %r853, %rs139; 2026-02-21T08:52:48.7568090Z cvt.f32.bf16 %r854, %rs142; 2026-02-21T08:52:48.7568150Z cvt.f32.bf16 %r855, %rs143; 2026-02-21T08:52:48.7568215Z cvt.f32.bf16 %r864, %rs146; 2026-02-21T08:52:48.7568281Z cvt.f32.bf16 %r865, %rs147; 2026-02-21T08:52:48.7568346Z cvt.f32.bf16 %r866, %rs150; 2026-02-21T08:52:48.7568408Z cvt.f32.bf16 %r867, %rs151; 2026-02-21T08:52:48.7568477Z cvt.f32.bf16 %r876, %rs154; 2026-02-21T08:52:48.7568540Z cvt.f32.bf16 %r877, %rs155; 2026-02-21T08:52:48.7568600Z cvt.f32.bf16 %r878, %rs158; 2026-02-21T08:52:48.7568662Z cvt.f32.bf16 %r879, %rs159; 2026-02-21T08:52:48.7568731Z cvt.f32.bf16 %r888, %rs162; 
2026-02-21T08:52:48.7568793Z cvt.f32.bf16 %r889, %rs163; 2026-02-21T08:52:48.7568854Z cvt.f32.bf16 %r890, %rs166; 2026-02-21T08:52:48.7568920Z cvt.f32.bf16 %r891, %rs167; 2026-02-21T08:52:48.7569133Z .loc 1 54 87 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:54:87 2026-02-21T08:52:48.7569196Z // begin inline asm 2026-02-21T08:52:48.7569262Z mov.u64 %rd108, 0x0; 2026-02-21T08:52:48.7569395Z createpolicy.fractional.L2::evict_last.b64 %rd108, 1.0; 2026-02-21T08:52:48.7569455Z // end inline asm 2026-02-21T08:52:48.7569519Z // begin inline asm 2026-02-21T08:52:48.7569588Z mov.u16 %rs135, 0x0; 2026-02-21T08:52:48.7569751Z ld.global.L1::evict_last.L2::cache_hint.b8 { %rs135 }, [ %rd132 + 0 ], %rd108; 2026-02-21T08:52:48.7569812Z // end inline asm 2026-02-21T08:52:48.7570027Z .loc 1 62 28 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:62:28 2026-02-21T08:52:48.7570096Z st.shared.b8 [%r38], %rs135; 2026-02-21T08:52:48.7570165Z bar.sync 0; 2026-02-21T08:52:48.7570257Z ld.shared.v2.b8 {%rs168, %rs169}, [%r39]; 2026-02-21T08:52:48.7570469Z .loc 1 57 28 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:57:28 2026-02-21T08:52:48.7570535Z shl.b16 %rs170, %rs168, 4; 2026-02-21T08:52:48.7570598Z shl.b16 %rs171, %rs169, 4; 2026-02-21T08:52:48.7570811Z .loc 1 72 58 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:72:58 2026-02-21T08:52:48.7570885Z selp.b16 %rs172, %rs170, %rs168, %p60; 2026-02-21T08:52:48.7570948Z cvt.s16.s8 %rs173, %rs172; 2026-02-21T08:52:48.7571153Z shr.s16 %rs174, %rs173, 4; 2026-02-21T08:52:48.7571222Z selp.b16 %rs175, %rs171, %rs169, %p60; 2026-02-21T08:52:48.7571284Z cvt.s16.s8 %rs176, %rs175; 2026-02-21T08:52:48.7571346Z shr.s16 %rs177, %rs176, 4; 2026-02-21T08:52:48.7571551Z .loc 1 77 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:77:32 2026-02-21T08:52:48.7571617Z cvt.rn.f32.s16 %r920, %rs174; 2026-02-21T08:52:48.7571683Z cvt.rn.f32.s16 %r921, %rs177; 2026-02-21T08:52:48.7571745Z bar.sync 0; 
2026-02-21T08:52:48.7571810Z st.shared.b32 [%r40], %r920; 2026-02-21T08:52:48.7571879Z st.shared.b32 [%r40+4096], %r921; 2026-02-21T08:52:48.7571945Z $L__tmp7: 2026-02-21T08:52:48.7572314Z .loc 2 291 36 // standard.py:291:36 @[ c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:84:40 ] 2026-02-21T08:52:48.7572378Z // begin inline asm 2026-02-21T08:52:48.7572456Z fence.proxy.async.shared::cta; 2026-02-21T08:52:48.7572519Z // end inline asm 2026-02-21T08:52:48.7572580Z bar.sync 0; 2026-02-21T08:52:48.7572659Z shfl.sync.idx.b32 %r922, %r5, 0, 31, -1; 2026-02-21T08:52:48.7572739Z wgmma.fence.sync.aligned; 2026-02-21T08:52:48.7572801Z shl.b32 %r923, %r922, 8; 2026-02-21T08:52:48.7572864Z and.b32 %r924, %r923, 3072; 2026-02-21T08:52:48.7572926Z add.s32 %r925, %r924, %r737; 2026-02-21T08:52:48.7572992Z bfe.u32 %r926, %r925, 4, 14; 2026-02-21T08:52:48.7573055Z cvt.u64.u32 %rd120, %r926; 2026-02-21T08:52:48.7573134Z or.b64 %rd111, %rd120, 4611686293322072064; 2026-02-21T08:52:48.7573205Z mov.pred %p45, -1; 2026-02-21T08:52:48.7573263Z // begin inline asm 2026-02-21T08:52:48.7573560Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r997,%r998,%r999,%r1000}, {%r804,%r805,%r806,%r807}, %rd111, %p45, 1, 1; 2026-02-21T08:52:48.7573624Z // end inline asm 2026-02-21T08:52:48.7573690Z add.s32 %r927, %r925, 32; 2026-02-21T08:52:48.7573752Z bfe.u32 %r928, %r927, 4, 14; 2026-02-21T08:52:48.7573814Z cvt.u64.u32 %rd121, %r928; 2026-02-21T08:52:48.7573904Z or.b64 %rd112, %rd121, 4611686293322072064; 2026-02-21T08:52:48.7573971Z // begin inline asm 2026-02-21T08:52:48.7574258Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r997,%r998,%r999,%r1000}, {%r816,%r817,%r818,%r819}, %rd112, %p45, 1, 1; 2026-02-21T08:52:48.7574321Z // end inline asm 2026-02-21T08:52:48.7574380Z add.s32 %r929, %r925, 64; 2026-02-21T08:52:48.7574441Z bfe.u32 %r930, %r929, 4, 14; 2026-02-21T08:52:48.7574507Z cvt.u64.u32 %rd122, %r930; 2026-02-21T08:52:48.7574581Z or.b64 %rd113, %rd122, 
4611686293322072064; 2026-02-21T08:52:48.7574639Z // begin inline asm 2026-02-21T08:52:48.7574919Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r997,%r998,%r999,%r1000}, {%r828,%r829,%r830,%r831}, %rd113, %p45, 1, 1; 2026-02-21T08:52:48.7574982Z // end inline asm 2026-02-21T08:52:48.7575045Z add.s32 %r931, %r925, 96; 2026-02-21T08:52:48.7575106Z bfe.u32 %r932, %r931, 4, 14; 2026-02-21T08:52:48.7575171Z cvt.u64.u32 %rd123, %r932; 2026-02-21T08:52:48.7575244Z or.b64 %rd114, %rd123, 4611686293322072064; 2026-02-21T08:52:48.7575307Z // begin inline asm 2026-02-21T08:52:48.7575584Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r997,%r998,%r999,%r1000}, {%r840,%r841,%r842,%r843}, %rd114, %p45, 1, 1; 2026-02-21T08:52:48.7575647Z // end inline asm 2026-02-21T08:52:48.7575708Z add.s32 %r933, %r925, 4096; 2026-02-21T08:52:48.7575770Z bfe.u32 %r934, %r933, 4, 14; 2026-02-21T08:52:48.7575837Z cvt.u64.u32 %rd124, %r934; 2026-02-21T08:52:48.7575913Z or.b64 %rd115, %rd124, 4611686293322072064; 2026-02-21T08:52:48.7575975Z // begin inline asm 2026-02-21T08:52:48.7576257Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r997,%r998,%r999,%r1000}, {%r852,%r853,%r854,%r855}, %rd115, %p45, 1, 1; 2026-02-21T08:52:48.7576316Z // end inline asm 2026-02-21T08:52:48.7576381Z add.s32 %r935, %r925, 4128; 2026-02-21T08:52:48.7576445Z bfe.u32 %r936, %r935, 4, 14; 2026-02-21T08:52:48.7576650Z cvt.u64.u32 %rd125, %r936; 2026-02-21T08:52:48.7576725Z or.b64 %rd116, %rd125, 4611686293322072064; 2026-02-21T08:52:48.7576927Z // begin inline asm 2026-02-21T08:52:48.7577210Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r997,%r998,%r999,%r1000}, {%r864,%r865,%r866,%r867}, %rd116, %p45, 1, 1; 2026-02-21T08:52:48.7577269Z // end inline asm 2026-02-21T08:52:48.7577332Z add.s32 %r937, %r925, 4160; 2026-02-21T08:52:48.7577399Z bfe.u32 %r938, %r937, 4, 14; 2026-02-21T08:52:48.7577461Z cvt.u64.u32 %rd126, %r938; 2026-02-21T08:52:48.7577534Z or.b64 %rd117, %rd126, 4611686293322072064; 
2026-02-21T08:52:48.7577594Z // begin inline asm 2026-02-21T08:52:48.7577877Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r997,%r998,%r999,%r1000}, {%r876,%r877,%r878,%r879}, %rd117, %p45, 1, 1; 2026-02-21T08:52:48.7577935Z // end inline asm 2026-02-21T08:52:48.7578120Z add.s32 %r939, %r925, 4192; 2026-02-21T08:52:48.7578192Z bfe.u32 %r940, %r939, 4, 14; 2026-02-21T08:52:48.7578253Z cvt.u64.u32 %rd127, %r940; 2026-02-21T08:52:48.7578324Z or.b64 %rd118, %rd127, 4611686293322072064; 2026-02-21T08:52:48.7578387Z // begin inline asm 2026-02-21T08:52:48.7578670Z wgmma.mma_async.sync.aligned.m64n8k8.f32.tf32.tf32 {%r997,%r998,%r999,%r1000}, {%r888,%r889,%r890,%r891}, %rd118, %p45, 1, 1; 2026-02-21T08:52:48.7578728Z // end inline asm 2026-02-21T08:52:48.7578807Z wgmma.commit_group.sync.aligned; 2026-02-21T08:52:48.7578872Z mov.b32 %r897, 0; 2026-02-21T08:52:48.7578933Z mov.b32 %r896, %r737; 2026-02-21T08:52:48.7578991Z mov.b32 %r898, %r897; 2026-02-21T08:52:48.7579057Z // begin inline asm 2026-02-21T08:52:48.7579170Z // wait for regs: %r997,%r998,%r999,%r1000,%r896,%r897,%r898 2026-02-21T08:52:48.7579245Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:52:48.7579302Z // end inline asm 2026-02-21T08:52:48.7579362Z $L__tmp8: 2026-02-21T08:52:48.7579575Z .loc 1 40 71 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:40:71 2026-02-21T08:52:48.7579637Z add.s32 %r941, %r996, 1; 2026-02-21T08:52:48.7579710Z setp.gt.s32 %p56, %r941, 1; 2026-02-21T08:52:48.7579779Z selp.b32 %r996, 0, %r941, %p56; 2026-02-21T08:52:48.7579982Z .loc 1 48 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:32 2026-02-21T08:52:48.7580058Z mad.wide.s32 %rd119, %r994, 2, %rd14; 2026-02-21T08:52:48.7580258Z .loc 1 48 80 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:48:80 2026-02-21T08:52:48.7580320Z shl.b32 %r942, %r996, 13; 2026-02-21T08:52:48.7580382Z add.s32 %r906, %r28, %r942; 2026-02-21T08:52:48.7580450Z selp.b32 %r907, 8, 0, %p54; 
2026-02-21T08:52:48.7580512Z // begin inline asm 2026-02-21T08:52:48.7580655Z cp.async.ca.shared.global [ %r906 + 0 ], [ %rd119 + 0 ], 0x8, %r907; 2026-02-21T08:52:48.7580718Z // end inline asm 2026-02-21T08:52:48.7580784Z cp.async.commit_group; 2026-02-21T08:52:48.7580989Z .loc 1 40 71 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:40:71 2026-02-21T08:52:48.7581055Z add.s32 %r994, %r994, 64; 2026-02-21T08:52:48.7581144Z add.s64 %rd132, %rd132, 229376; 2026-02-21T08:52:48.7581211Z setp.lt.u64 %p57, %rd133, 4064; 2026-02-21T08:52:48.7581272Z @%p57 bra $L__BB0_12; 2026-02-21T08:52:48.7581390Z // %bb.13: // in Loop: Header=BB0_11 Depth=1 2026-02-21T08:52:48.7581595Z .loc 1 33 32 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:33:32 2026-02-21T08:52:48.7581659Z or.b32 %r946, %r109, %r9; 2026-02-21T08:52:48.7581865Z .loc 1 40 71 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:40:71 2026-02-21T08:52:48.7581935Z cp.async.wait_group 0; 2026-02-21T08:52:48.7581991Z bar.sync 0; 2026-02-21T08:52:48.7582189Z .loc 1 87 28 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:87:28 2026-02-21T08:52:48.7582275Z cvt.rn.bf16x2.f32 %r947, %r998, %r997; 2026-02-21T08:52:48.7582350Z cvt.rn.bf16x2.f32 %r948, %r1000, %r999; 2026-02-21T08:52:48.7582549Z .loc 1 88 50 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:88:50 2026-02-21T08:52:48.7582722Z mad.lo.s32 %r949, %r108, 7168, %r946; 2026-02-21T08:52:48.7582926Z .loc 1 88 22 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:88:22 2026-02-21T08:52:48.7582993Z mad.wide.s32 %rd128, %r949, 2, %rd16; 2026-02-21T08:52:48.7583198Z .loc 1 88 81 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:88:81 2026-02-21T08:52:48.7583346Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r41], {%r947, %r948}; 2026-02-21T08:52:48.7583401Z bar.sync 0; 2026-02-21T08:52:48.7583465Z // begin inline asm 2026-02-21T08:52:48.7583596Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r943}, [%r944]; 
2026-02-21T08:52:48.7583655Z // end inline asm 2026-02-21T08:52:48.7583821Z // begin inline asm 2026-02-21T08:52:48.7583903Z st.global.b32 [ %rd128 + 0 ], { %r943 }; 2026-02-21T08:52:48.7583960Z // end inline asm 2026-02-21T08:52:48.7584177Z .loc 1 19 144 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:19:144 2026-02-21T08:52:48.7584248Z add.s32 %r125, %r993, 4224; 2026-02-21T08:52:48.7584315Z setp.lt.s32 %p58, %r993, -4000; 2026-02-21T08:52:48.7584376Z mov.b32 %r993, %r125; 2026-02-21T08:52:48.7584438Z @%p58 bra $L__BB0_11; 2026-02-21T08:52:48.7584533Z $L__BB0_14: // %._crit_edge 2026-02-21T08:52:48.7584736Z .loc 1 19 4 // c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py:19:4 2026-02-21T08:52:48.7584790Z ret; 2026-02-21T08:52:48.7584853Z $L__tmp9: 2026-02-21T08:52:48.7584911Z $L__func_end0: 2026-02-21T08:52:48.7584999Z // -- End function 2026-02-21T08:52:48.7585059Z } 2026-02-21T08:52:48.7585313Z .file 1 "/tmp/torchinductor_root/5v/c5vufscthqmvbpeuligi4yxhlgplc5eztvn3kizxdubhwnk3txtx.py" 2026-02-21T08:52:48.7585523Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T08:52:48.7585589Z .section .debug_abbrev 2026-02-21T08:52:48.7585648Z { 2026-02-21T08:52:48.7585748Z .b8 1 // Abbreviation Code 2026-02-21T08:52:48.7585851Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:52:48.7585940Z .b8 1 // DW_CHILDREN_yes 2026-02-21T08:52:48.7586046Z .b8 37 // DW_AT_producer 2026-02-21T08:52:48.7586131Z .b8 8 // DW_FORM_string 2026-02-21T08:52:48.7586219Z .b8 19 // DW_AT_language 2026-02-21T08:52:48.7586311Z .b8 5 // DW_FORM_data2 2026-02-21T08:52:48.7586396Z .b8 3 // DW_AT_name 2026-02-21T08:52:48.7586605Z .b8 8 // DW_FORM_string 2026-02-21T08:52:48.7586700Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:52:48.7586781Z .b8 6 // DW_FORM_data4 2026-02-21T08:52:48.7586865Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:52:48.7586952Z .b8 8 // DW_FORM_string 2026-02-21T08:52:48.7587029Z .b8 0 // EOM(1) 2026-02-21T08:52:48.7587100Z 
.b8 0 // EOM(2) 2026-02-21T08:52:48.7587191Z .b8 2 // Abbreviation Code 2026-02-21T08:52:48.7587283Z .b8 46 // DW_TAG_subprogram 2026-02-21T08:52:48.7587363Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:52:48.7587441Z .b8 3 // DW_AT_name 2026-02-21T08:52:48.7587525Z .b8 8 // DW_FORM_string 2026-02-21T08:52:48.7587608Z .b8 32 // DW_AT_inline 2026-02-21T08:52:48.7587691Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:48.7587768Z .b8 0 // EOM(1) 2026-02-21T08:52:48.7587976Z .b8 0 // EOM(2) 2026-02-21T08:52:48.7588065Z .b8 3 // Abbreviation Code 2026-02-21T08:52:48.7588150Z .b8 46 // DW_TAG_subprogram 2026-02-21T08:52:48.7588238Z .b8 1 // DW_CHILDREN_yes 2026-02-21T08:52:48.7588318Z .b8 17 // DW_AT_low_pc 2026-02-21T08:52:48.7588395Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:48.7588572Z .b8 18 // DW_AT_high_pc 2026-02-21T08:52:48.7588662Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:48.7588761Z .b8 49 // DW_AT_abstract_origin 2026-02-21T08:52:48.7588973Z .b8 19 // DW_FORM_ref4 2026-02-21T08:52:48.7589049Z .b8 0 // EOM(1) 2026-02-21T08:52:48.7589121Z .b8 0 // EOM(2) 2026-02-21T08:52:48.7589214Z .b8 4 // Abbreviation Code 2026-02-21T08:52:48.7589325Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T08:52:48.7589409Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:52:48.7589501Z .b8 49 // DW_AT_abstract_origin 2026-02-21T08:52:48.7589585Z .b8 19 // DW_FORM_ref4 2026-02-21T08:52:48.7589663Z .b8 17 // DW_AT_low_pc 2026-02-21T08:52:48.7589740Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:48.7589827Z .b8 18 // DW_AT_high_pc 2026-02-21T08:52:48.7589903Z .b8 1 // DW_FORM_addr 2026-02-21T08:52:48.7590003Z .b8 88 // DW_AT_call_file 2026-02-21T08:52:48.7590089Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:48.7590172Z .b8 89 // DW_AT_call_line 2026-02-21T08:52:48.7590253Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:48.7590339Z .b8 87 // DW_AT_call_column 2026-02-21T08:52:48.7590421Z .b8 11 // DW_FORM_data1 2026-02-21T08:52:48.7590491Z .b8 0 // EOM(1) 2026-02-21T08:52:48.7590561Z .b8 0 // EOM(2) 2026-02-21T08:52:48.7590635Z .b8 
0 // EOM(3) 2026-02-21T08:52:48.7590690Z } 2026-02-21T08:52:48.7590753Z .section .debug_info 2026-02-21T08:52:48.7590804Z { 2026-02-21T08:52:48.7590899Z .b32 178 // Length of Unit 2026-02-21T08:52:48.7590993Z .b8 2 // DWARF version number 2026-02-21T08:52:48.7591047Z .b8 0 2026-02-21T08:52:48.7591186Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:52:48.7591282Z .b8 8 // Address Size (in bytes) 2026-02-21T08:52:48.7591401Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T08:52:48.7591494Z .b8 116 // DW_AT_producer 2026-02-21T08:52:48.7591548Z .b8 114 2026-02-21T08:52:48.7591603Z .b8 105 2026-02-21T08:52:48.7591666Z .b8 116 2026-02-21T08:52:48.7591728Z .b8 111 2026-02-21T08:52:48.7591780Z .b8 110 2026-02-21T08:52:48.7591832Z .b8 0 2026-02-21T08:52:48.7591919Z .b8 2 // DW_AT_language 2026-02-21T08:52:48.7591970Z .b8 0 2026-02-21T08:52:48.7592049Z .b8 99 // DW_AT_name 2026-02-21T08:52:48.7592101Z .b8 53 2026-02-21T08:52:48.7592158Z .b8 118 2026-02-21T08:52:48.7592211Z .b8 117 2026-02-21T08:52:48.7592262Z .b8 102 2026-02-21T08:52:48.7592319Z .b8 115 2026-02-21T08:52:48.7592374Z .b8 99 2026-02-21T08:52:48.7592426Z .b8 116 2026-02-21T08:52:48.7592478Z .b8 104 2026-02-21T08:52:48.7592535Z .b8 113 2026-02-21T08:52:48.7592587Z .b8 109 2026-02-21T08:52:48.7592747Z .b8 118 2026-02-21T08:52:48.7592803Z .b8 98 2026-02-21T08:52:48.7592854Z .b8 112 2026-02-21T08:52:48.7592907Z .b8 101 2026-02-21T08:52:48.7592959Z .b8 117 2026-02-21T08:52:48.7593028Z .b8 108 2026-02-21T08:52:48.7593081Z .b8 105 2026-02-21T08:52:48.7593135Z .b8 103 2026-02-21T08:52:48.7593187Z .b8 105 2026-02-21T08:52:48.7593244Z .b8 52 2026-02-21T08:52:48.7593295Z .b8 121 2026-02-21T08:52:48.7593355Z .b8 120 2026-02-21T08:52:48.7593405Z .b8 104 2026-02-21T08:52:48.7593457Z .b8 108 2026-02-21T08:52:48.7593512Z .b8 103 2026-02-21T08:52:48.7593563Z .b8 112 2026-02-21T08:52:48.7593616Z .b8 108 2026-02-21T08:52:48.7593667Z .b8 99 2026-02-21T08:52:48.7593721Z .b8 53 
2026-02-21T08:52:48.7593772Z .b8 101 2026-02-21T08:52:48.7593822Z .b8 122 2026-02-21T08:52:48.7593873Z .b8 116 2026-02-21T08:52:48.7594035Z .b8 118 2026-02-21T08:52:48.7594093Z .b8 110 2026-02-21T08:52:48.7594144Z .b8 51 2026-02-21T08:52:48.7594201Z .b8 107 2026-02-21T08:52:48.7594251Z .b8 105 2026-02-21T08:52:48.7594302Z .b8 122 2026-02-21T08:52:48.7594359Z .b8 120 2026-02-21T08:52:48.7594416Z .b8 100 2026-02-21T08:52:48.7594467Z .b8 117 2026-02-21T08:52:48.7594518Z .b8 98 2026-02-21T08:52:48.7594575Z .b8 104 2026-02-21T08:52:48.7594628Z .b8 119 2026-02-21T08:52:48.7594680Z .b8 110 2026-02-21T08:52:48.7594731Z .b8 107 2026-02-21T08:52:48.7594785Z .b8 51 2026-02-21T08:52:48.7594836Z .b8 116 2026-02-21T08:52:48.7594887Z .b8 120 2026-02-21T08:52:48.7594942Z .b8 116 2026-02-21T08:52:48.7594994Z .b8 120 2026-02-21T08:52:48.7595044Z .b8 46 2026-02-21T08:52:48.7595111Z .b8 112 2026-02-21T08:52:48.7595169Z .b8 121 2026-02-21T08:52:48.7595221Z .b8 0 2026-02-21T08:52:48.7595325Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:52:48.7595411Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:52:48.7595466Z .b8 116 2026-02-21T08:52:48.7595518Z .b8 109 2026-02-21T08:52:48.7595568Z .b8 112 2026-02-21T08:52:48.7595624Z .b8 47 2026-02-21T08:52:48.7595676Z .b8 116 2026-02-21T08:52:48.7595727Z .b8 111 2026-02-21T08:52:48.7595781Z .b8 114 2026-02-21T08:52:48.7595836Z .b8 99 2026-02-21T08:52:48.7595887Z .b8 104 2026-02-21T08:52:48.7595938Z .b8 105 2026-02-21T08:52:48.7595992Z .b8 110 2026-02-21T08:52:48.7596045Z .b8 100 2026-02-21T08:52:48.7596098Z .b8 117 2026-02-21T08:52:48.7596151Z .b8 99 2026-02-21T08:52:48.7596208Z .b8 116 2026-02-21T08:52:48.7596259Z .b8 111 2026-02-21T08:52:48.7596310Z .b8 114 2026-02-21T08:52:48.7596364Z .b8 95 2026-02-21T08:52:48.7596415Z .b8 114 2026-02-21T08:52:48.7596635Z .b8 111 2026-02-21T08:52:48.7596737Z .b8 111 2026-02-21T08:52:48.7596815Z .b8 116 2026-02-21T08:52:48.7596868Z .b8 47 2026-02-21T08:52:48.7596920Z .b8 53 2026-02-21T08:52:48.7596977Z .b8 118 
2026-02-21T08:52:48.7597038Z .b8 0 2026-02-21T08:52:48.7597159Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T08:52:48.7597241Z .b8 95 // DW_AT_name 2026-02-21T08:52:48.7597297Z .b8 104 2026-02-21T08:52:48.7597348Z .b8 101 2026-02-21T08:52:48.7597401Z .b8 108 2026-02-21T08:52:48.7597459Z .b8 105 2026-02-21T08:52:48.7597510Z .b8 111 2026-02-21T08:52:48.7597564Z .b8 110 2026-02-21T08:52:48.7597616Z .b8 95 2026-02-21T08:52:48.7597677Z .b8 109 2026-02-21T08:52:48.7597734Z .b8 97 2026-02-21T08:52:48.7597786Z .b8 116 2026-02-21T08:52:48.7597838Z .b8 109 2026-02-21T08:52:48.7597894Z .b8 117 2026-02-21T08:52:48.7597945Z .b8 108 2026-02-21T08:52:48.7597995Z .b8 95 2026-02-21T08:52:48.7598049Z .b8 98 2026-02-21T08:52:48.7598101Z .b8 102 2026-02-21T08:52:48.7598151Z .b8 49 2026-02-21T08:52:48.7598201Z .b8 54 2026-02-21T08:52:48.7598256Z .b8 95 2026-02-21T08:52:48.7598307Z .b8 105 2026-02-21T08:52:48.7598357Z .b8 110 2026-02-21T08:52:48.7598413Z .b8 116 2026-02-21T08:52:48.7598466Z .b8 52 2026-02-21T08:52:48.7598520Z .b8 0 2026-02-21T08:52:48.7598604Z .b8 1 // DW_AT_inline 2026-02-21T08:52:48.7598718Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T08:52:48.7598817Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T08:52:48.7599079Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T08:52:48.7599185Z .b32 108 // DW_AT_abstract_origin 2026-02-21T08:52:48.7599317Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T08:52:48.7599415Z .b32 108 // DW_AT_abstract_origin 2026-02-21T08:52:48.7599507Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T08:52:48.7599597Z .b64 $L__tmp8 // DW_AT_high_pc 2026-02-21T08:52:48.7599682Z .b8 1 // DW_AT_call_file 2026-02-21T08:52:48.7599764Z .b8 84 // DW_AT_call_line 2026-02-21T08:52:48.7599979Z .b8 40 // DW_AT_call_column 2026-02-21T08:52:48.7600074Z .b8 0 // End Of Children Mark 2026-02-21T08:52:48.7600161Z .b8 0 // End Of Children Mark 2026-02-21T08:52:48.7600221Z } 2026-02-21T08:52:48.7600292Z .section .debug_macinfo { 
} 2026-02-21T08:52:48.7600298Z 2026-02-21T08:52:48.7600379Z ================================================================ 2026-02-21T08:52:48.7600505Z please share the reproducer above with Triton project. 2026-02-21T08:52:49.6226745Z [86s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 16], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[1], load_eviction_policies=['', 'last'], loop_orders=[[0, 1]], num_stages=6, num_warps=16, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, False], range_num_stages=[0, 2], range_unroll_factors=[0, 0], range_warp_specializes=[]) 2026-02-21T08:52:49.6228326Z Tensor-likes are not close! 2026-02-21T08:52:49.6228580Z 2026-02-21T08:52:49.6228707Z Mismatched elements: 455873 / 458752 (99.4%) 2026-02-21T08:52:49.6229125Z Greatest absolute difference: 1824.0 at index (33, 3732) (up to 0.01 allowed) 2026-02-21T08:52:49.6229674Z Greatest relative difference: 105472.0 at index (44, 6823) (up to 0.01 allowed) 2026-02-21T08:52:49.6230142Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:52:49.6230418Z 2026-02-21T08:52:49.7556862Z [86s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 64, 16], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[1], load_eviction_policies=['', 'last'], loop_orders=[[0, 1]], num_stages=6, num_warps=2, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, None], range_num_stages=[0, 2], range_unroll_factors=[0, 0], range_warp_specializes=[]) 2026-02-21T08:52:49.7558419Z Tensor-likes are not close! 
2026-02-21T08:52:49.7558577Z 2026-02-21T08:52:49.7558680Z Mismatched elements: 456958 / 458752 (99.6%) 2026-02-21T08:52:49.7559087Z Greatest absolute difference: 3136.0 at index (18, 3644) (up to 0.01 allowed) 2026-02-21T08:52:49.7559625Z Greatest relative difference: 142336.0 at index (61, 2065) (up to 0.01 allowed) 2026-02-21T08:52:49.7560096Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:52:49.7560351Z 2026-02-21T08:52:49.8212938Z [87s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 16, 16], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[1], load_eviction_policies=['', 'first'], loop_orders=[[0, 1]], num_stages=6, num_warps=2, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, False], range_num_stages=[0, 2], range_unroll_factors=[0, 0], range_warp_specializes=[]) 2026-02-21T08:52:49.8214610Z Tensor-likes are not close! 2026-02-21T08:52:49.8214783Z 2026-02-21T08:52:49.8214897Z Mismatched elements: 457551 / 458752 (99.7%) 2026-02-21T08:52:49.8215345Z Greatest absolute difference: 4416.0 at index (62, 2622) (up to 0.01 allowed) 2026-02-21T08:52:49.8215921Z Greatest relative difference: 210944.0 at index (44, 6823) (up to 0.01 allowed) 2026-02-21T08:52:49.8216635Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:52:49.8217212Z 2026-02-21T08:52:49.9578378Z [87s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 64], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=16, num_stages=7, num_warps=1, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[True, None], range_num_stages=[2, 2], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T08:52:49.9580167Z Tensor-likes are not close! 
2026-02-21T08:52:49.9580329Z 2026-02-21T08:52:49.9580435Z Mismatched elements: 457263 / 458752 (99.7%) 2026-02-21T08:52:49.9580843Z Greatest absolute difference: 3200.0 at index (41, 6557) (up to 0.01 allowed) 2026-02-21T08:52:49.9581657Z Greatest relative difference: 97280.0 at index (61, 2065) (up to 0.01 allowed) 2026-02-21T08:52:49.9582185Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:52:49.9582466Z 2026-02-21T08:52:50.0934042Z [87s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=16, num_stages=7, num_warps=4, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[True, None], range_num_stages=[2, 2], range_unroll_factors=[4, 3], range_warp_specializes=[]) 2026-02-21T08:52:50.0935870Z Tensor-likes are not close! 2026-02-21T08:52:50.0936053Z 2026-02-21T08:52:50.0936166Z Mismatched elements: 457020 / 458752 (99.6%) 2026-02-21T08:52:50.0936912Z Greatest absolute difference: 3360.0 at index (27, 1168) (up to 0.01 allowed) 2026-02-21T08:52:50.0937405Z Greatest relative difference: 290816.0 at index (58, 2774) (up to 0.01 allowed) 2026-02-21T08:52:50.0937829Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:52:50.0938058Z 2026-02-21T08:52:50.1002277Z [87s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 64], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=16, num_stages=7, num_warps=8, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[None, None], range_num_stages=[2, 2], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T08:52:50.1004073Z Tensor-likes are not close! 
2026-02-21T08:52:50.1004228Z 2026-02-21T08:52:50.1004329Z Mismatched elements: 456631 / 458752 (99.5%) 2026-02-21T08:52:50.1004733Z Greatest absolute difference: 3360.0 at index (27, 1168) (up to 0.01 allowed) 2026-02-21T08:52:50.1005269Z Greatest relative difference: 290816.0 at index (58, 2774) (up to 0.01 allowed) 2026-02-21T08:52:50.1005748Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:52:50.1006000Z 2026-02-21T08:52:50.2383116Z [87s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 64], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=16, num_stages=7, num_warps=4, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[True, None], range_num_stages=[2, 3], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T08:52:50.2384901Z Tensor-likes are not close! 2026-02-21T08:52:50.2385073Z 2026-02-21T08:52:50.2385182Z Mismatched elements: 457020 / 458752 (99.6%) 2026-02-21T08:52:50.2385614Z Greatest absolute difference: 3360.0 at index (27, 1168) (up to 0.01 allowed) 2026-02-21T08:52:50.2386155Z Greatest relative difference: 290816.0 at index (58, 2774) (up to 0.01 allowed) 2026-02-21T08:52:50.2386982Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:52:50.2387208Z 2026-02-21T08:52:50.3048954Z [87s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 64], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=16, num_stages=7, num_warps=4, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[None, None], range_num_stages=[2, 2], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T08:52:50.3051032Z Tensor-likes are not close! 
2026-02-21T08:52:50.3051221Z 2026-02-21T08:52:50.3051329Z Mismatched elements: 456634 / 458752 (99.5%) 2026-02-21T08:52:50.3051741Z Greatest absolute difference: 3072.0 at index (33, 554) (up to 0.01 allowed) 2026-02-21T08:52:50.3052281Z Greatest relative difference: 185344.0 at index (44, 6823) (up to 0.01 allowed) 2026-02-21T08:52:50.3052936Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:52:50.3053195Z 2026-02-21T08:52:50.3813980Z [87s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 64], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=16, num_stages=7, num_warps=1, pid_type='persistent_interleaved', range_flattens=[True, True], range_multi_buffers=[True, None], range_num_stages=[2, 2], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T08:52:50.3815769Z Tensor-likes are not close! 2026-02-21T08:52:50.3815933Z 2026-02-21T08:52:50.3816042Z Mismatched elements: 457263 / 458752 (99.7%) 2026-02-21T08:52:50.3816812Z Greatest absolute difference: 3200.0 at index (41, 6557) (up to 0.01 allowed) 2026-02-21T08:52:50.3817351Z Greatest relative difference: 97280.0 at index (61, 2065) (up to 0.01 allowed) 2026-02-21T08:52:50.3817805Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:52:50.3818044Z 2026-02-21T08:52:50.4457070Z [87s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 64], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=16, num_stages=7, num_warps=4, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[None, None], range_num_stages=[3, 2], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T08:52:50.4458865Z Tensor-likes are not close! 
2026-02-21T08:52:50.4459036Z 2026-02-21T08:52:50.4459144Z Mismatched elements: 456634 / 458752 (99.5%) 2026-02-21T08:52:50.4459558Z Greatest absolute difference: 3072.0 at index (33, 554) (up to 0.01 allowed) 2026-02-21T08:52:50.4460093Z Greatest relative difference: 185344.0 at index (44, 6823) (up to 0.01 allowed) 2026-02-21T08:52:50.4460570Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:52:50.4460825Z 2026-02-21T08:52:50.4521602Z [87s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 64], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=16, num_stages=7, num_warps=8, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[True, None], range_num_stages=[2, 2], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T08:52:50.4523399Z Tensor-likes are not close! 2026-02-21T08:52:50.4523553Z 2026-02-21T08:52:50.4523661Z Mismatched elements: 456631 / 458752 (99.5%) 2026-02-21T08:52:50.4524058Z Greatest absolute difference: 3360.0 at index (27, 1168) (up to 0.01 allowed) 2026-02-21T08:52:50.4524589Z Greatest relative difference: 290816.0 at index (58, 2774) (up to 0.01 allowed) 2026-02-21T08:52:50.4525056Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:52:50.4525314Z 2026-02-21T08:52:50.5239739Z [87s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 64], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=16, num_stages=7, num_warps=2, pid_type='persistent_interleaved', range_flattens=[None, False], range_multi_buffers=[True, None], range_num_stages=[2, 2], range_unroll_factors=[4, 3], range_warp_specializes=[]) 2026-02-21T08:52:50.5241778Z Tensor-likes are not close! 
2026-02-21T08:52:50.5241942Z 2026-02-21T08:52:50.5242043Z Mismatched elements: 457164 / 458752 (99.7%) 2026-02-21T08:52:50.5242448Z Greatest absolute difference: 3200.0 at index (38, 1878) (up to 0.01 allowed) 2026-02-21T08:52:50.5242980Z Greatest relative difference: 198656.0 at index (58, 2774) (up to 0.01 allowed) 2026-02-21T08:52:50.5243453Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:52:50.5243706Z 2026-02-21T08:52:50.5308451Z [87s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 64], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=16, num_stages=7, num_warps=2, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[False, None], range_num_stages=[2, 2], range_unroll_factors=[4, 3], range_warp_specializes=[]) 2026-02-21T08:52:50.5310133Z Tensor-likes are not close! 2026-02-21T08:52:50.5310272Z 2026-02-21T08:52:50.5310366Z Mismatched elements: 456977 / 458752 (99.6%) 2026-02-21T08:52:50.5310717Z Greatest absolute difference: 3200.0 at index (38, 1878) (up to 0.01 allowed) 2026-02-21T08:52:50.5311186Z Greatest relative difference: 70144.0 at index (61, 2065) (up to 0.01 allowed) 2026-02-21T08:52:50.5311591Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:52:50.5311828Z 2026-02-21T08:52:50.6744833Z [87s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 64], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=16, num_stages=7, num_warps=8, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[True, None], range_num_stages=[1, 2], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T08:52:50.6747045Z Tensor-likes are not close! 
2026-02-21T08:52:50.6747191Z 2026-02-21T08:52:50.6747287Z Mismatched elements: 456631 / 458752 (99.5%) 2026-02-21T08:52:50.6747654Z Greatest absolute difference: 3360.0 at index (27, 1168) (up to 0.01 allowed) 2026-02-21T08:52:50.6748128Z Greatest relative difference: 290816.0 at index (58, 2774) (up to 0.01 allowed) 2026-02-21T08:52:50.6748632Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:52:50.6748872Z 2026-02-21T08:52:50.6822915Z [87s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 64], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=7, num_warps=2, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[True, None], range_num_stages=[2, 1], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T08:52:50.6824712Z Tensor-likes are not close! 2026-02-21T08:52:50.6824877Z 2026-02-21T08:52:50.6824988Z Mismatched elements: 457164 / 458752 (99.7%) 2026-02-21T08:52:50.6825418Z Greatest absolute difference: 3200.0 at index (38, 1878) (up to 0.01 allowed) 2026-02-21T08:52:50.6825958Z Greatest relative difference: 198656.0 at index (58, 2774) (up to 0.01 allowed) 2026-02-21T08:52:50.6826627Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T08:52:50.6826905Z 2026-02-21T08:52:50.7485810Z 2026-02-21T08:52:50.7487396Z Generation 1: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 103/103 9.4 configs/s 2026-02-21T08:52:56.4234646Z Generation 1: verifying top configs 100% ━━━━━━━━━━━━━━━ 931/931 185.0 configs/s 2026-02-21T08:52:56.6216227Z [93s] Generation 1 complete: 2026-02-21T08:52:56.6216705Z error=28 2026-02-21T08:52:56.6217357Z ok=77 2026-02-21T08:52:56.6217523Z min=0.2138 2026-02-21T08:52:56.6217701Z mid=0.6071 2026-02-21T08:52:56.6217861Z max=20.0965 2026-02-21T08:52:56.6218072Z best={'block_sizes': [16, 64, 16], 2026-02-21T08:52:56.6218469Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T08:52:56.6218885Z 'l2_groupings': [8], 2026-02-21T08:52:56.6219133Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T08:52:56.6219415Z 'loop_orders': [[1, 0]], 2026-02-21T08:52:56.6219645Z 'maxnreg': 64, 2026-02-21T08:52:56.6219848Z 'num_sm_multiplier': 2, 2026-02-21T08:52:56.6220068Z 'num_stages': 3, 2026-02-21T08:52:56.6220255Z 'num_warps': 4, 2026-02-21T08:52:56.6220473Z 'pid_type': 'persistent_interleaved', 2026-02-21T08:52:56.6220752Z 'range_flattens': [None, True], 2026-02-21T08:52:56.6221343Z 'range_multi_buffers': [None, True], 2026-02-21T08:52:56.6221638Z 'range_num_stages': [0, 0], 2026-02-21T08:52:56.6221883Z 'range_unroll_factors': [4, 2], 2026-02-21T08:52:56.6222134Z 'range_warp_specializes': []} 2026-02-21T08:52:56.6240026Z [93s] Fitting surrogate: 205 points, 205 targets 2026-02-21T08:52:58.0709890Z [95s] Generation 2 starting: 91 neighbors, 5 active search path(s) 2026-02-21T08:53:16.5559611Z Generation 2: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 94/94 3.8 configs/s 2026-02-21T08:53:21.6768027Z [118s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 64], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=32, 
num_sm_multiplier=16, num_stages=7, num_warps=4, pid_type='persistent_interleaved', range_flattens=[None, False], range_multi_buffers=[None, None], range_num_stages=[1, 2], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T08:53:21.6770963Z Tensor-likes are not close! 2026-02-21T08:53:21.6771238Z 2026-02-21T08:53:21.6771414Z Mismatched elements: 456634 / 458752 (99.5%) 2026-02-21T08:53:21.6772068Z Greatest absolute difference: 3072.0 at index (33, 554) (up to 0.01 allowed) 2026-02-21T08:53:21.6772924Z Greatest relative difference: 185344.0 at index (44, 6823) (up to 0.01 allowed) 2026-02-21T08:53:21.6773679Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:53:21.6774086Z 2026-02-21T08:53:22.5867577Z Generation 2: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 94/94 15.7 configs/s 2026-02-21T08:53:25.5939916Z Generation 2: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 330.5 2026-02-21T08:53:25.5940408Z configs/s 2026-02-21T08:53:25.7488423Z [122s] Generation 2 complete: 2026-02-21T08:53:25.7488694Z error=1 2026-02-21T08:53:25.7488863Z ok=95 2026-02-21T08:53:25.7489031Z min=0.1398 2026-02-21T08:53:25.7489220Z mid=0.3685 2026-02-21T08:53:25.7489395Z max=10.8721 2026-02-21T08:53:25.7489588Z best={'block_sizes': [64, 64, 16], 2026-02-21T08:53:25.7489928Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T08:53:25.7490270Z 'l2_groupings': [1], 2026-02-21T08:53:25.7490503Z 'load_eviction_policies': ['', ''], 2026-02-21T08:53:25.7490770Z 'loop_orders': [[0, 1]], 2026-02-21T08:53:25.7491022Z 'num_stages': 2, 2026-02-21T08:53:25.7491231Z 'num_warps': 4, 2026-02-21T08:53:25.7491429Z 'pid_type': 'flat', 2026-02-21T08:53:25.7491656Z 'range_flattens': [None, None], 2026-02-21T08:53:25.7491916Z 'range_multi_buffers': [None, None], 2026-02-21T08:53:25.7492180Z 'range_num_stages': [0, 0], 2026-02-21T08:53:25.7492411Z 'range_unroll_factors': [0, 0], 2026-02-21T08:53:25.7492662Z 'range_warp_specializes': []} 
2026-02-21T08:53:25.7524342Z [122s] Fitting surrogate: 301 points, 301 targets 2026-02-21T08:53:27.2238386Z [124s] Generation 3 starting: 90 neighbors, 5 active search path(s) 2026-02-21T08:53:46.3653490Z Generation 3: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 92/92 3.5 configs/s 2026-02-21T08:53:50.3922439Z 2026-02-21T08:53:50.3922450Z 2026-02-21T08:53:50.3922721Z ================================================================ 2026-02-21T08:53:50.3923781Z Internal Triton PTX codegen error 2026-02-21T08:53:50.3924082Z `ptxas` stderr: 2026-02-21T08:53:50.3924978Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 724 in function _helion_matmul_bf16_int4. Try to compile with register target of 62 or higher. 2026-02-21T08:53:50.3926143Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:53:50.3926800Z 2026-02-21T08:53:50.3927685Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpsszhqie7.ptx -o /tmp/tmpsszhqie7.ptx.o 2026-02-21T08:53:50.3928703Z 2026-02-21T08:53:50.3928708Z 2026-02-21T08:53:50.3928819Z // 2026-02-21T08:53:50.3929396Z // Generated by LLVM NVPTX Back-End 2026-02-21T08:53:50.3929754Z // 2026-02-21T08:53:50.3929884Z 2026-02-21T08:53:50.3929984Z .version 8.7 2026-02-21T08:53:50.3930225Z .target sm_90a 2026-02-21T08:53:50.3930459Z .address_size 64 2026-02-21T08:53:50.3930626Z 2026-02-21T08:53:50.3930918Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T08:53:50.3931524Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T08:53:50.3931961Z // @_helion_matmul_bf16_int4 2026-02-21T08:53:50.3932413Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T08:53:50.3932905Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T08:53:50.3933498Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T08:53:50.3934075Z .param .u64 .ptr 
.global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T08:53:50.3934654Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T08:53:50.3935235Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T08:53:50.3935595Z ) 2026-02-21T08:53:50.3935768Z .reqntid 128 2026-02-21T08:53:50.3935951Z .maxnreg 32 2026-02-21T08:53:50.3936127Z { 2026-02-21T08:53:50.3936302Z .reg .pred %p<58>; 2026-02-21T08:53:50.3936765Z .reg .b16 %rs<769>; 2026-02-21T08:53:50.3936971Z .reg .b32 %r<3896>; 2026-02-21T08:53:50.3937183Z .reg .b64 %rd<326>; 2026-02-21T08:53:50.3937588Z .loc 1 14 0 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:14:0 2026-02-21T08:53:50.3938077Z $L__func_begin0: 2026-02-21T08:53:50.3938472Z .loc 1 14 0 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:14:0 2026-02-21T08:53:50.3938871Z 2026-02-21T08:53:50.3938944Z // %bb.0: 2026-02-21T08:53:50.3939200Z ld.param.b64 %rd82, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T08:53:50.3939605Z ld.param.b64 %rd81, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T08:53:50.3940052Z ld.param.b64 %rd80, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T08:53:50.3940374Z $L__tmp0: 2026-02-21T08:53:50.3940771Z .loc 1 20 30 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:20:30 2026-02-21T08:53:50.3941265Z mov.u32 %r1, %ctaid.x; 2026-02-21T08:53:50.3941680Z .loc 1 21 49 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:21:49 2026-02-21T08:53:50.3942160Z min.u32 %r2, %r1, 111; 2026-02-21T08:53:50.3942570Z .loc 1 22 88 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:22:88 2026-02-21T08:53:50.3943059Z sub.s32 %r413, %r2, %r1; 2026-02-21T08:53:50.3943284Z add.s32 %r414, %r413, 1; 2026-02-21T08:53:50.3944115Z [147s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T08:53:50.3946040Z Config: @helion.kernel(config=helion.Config(block_sizes=[32, 64, 64], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[1], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=32, num_stages=7, num_warps=4, pid_type='persistent_blocked', range_flattens=[False, True], range_multi_buffers=[None, False], range_num_stages=[2, 3], range_unroll_factors=[3, 0], range_warp_specializes=[]), static_shapes=True) 2026-02-21T08:53:50.3947871Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T08:53:50.3948175Z `ptxas` stderr: 2026-02-21T08:53:50.3948847Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 724 in function _helion_matmul_bf16_int4. Try to compile with register target of 62 or higher. 2026-02-21T08:53:50.3949499Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T08:53:50.3949697Z 2026-02-21T08:53:50.3950226Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpsszhqie7.ptx -o /tmp/tmpsszhqie7.ptx.o 2026-02-21T08:53:50.3950845Z 2026-02-21T08:53:50.3951156Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T08:53:50.3951476Z mul.hi.s32 %r415, %r414, 1431655766; 2026-02-21T08:53:50.3951700Z shr.u32 %r416, %r415, 31; 2026-02-21T08:53:50.3951887Z add.s32 %r417, %r415, %r416; 2026-02-21T08:53:50.3952085Z mul.lo.s32 %r418, %r417, 3; 2026-02-21T08:53:50.3952272Z add.s32 %r4, %r418, %r1; 2026-02-21T08:53:50.3952612Z .loc 1 28 45 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:28:45 2026-02-21T08:53:50.3952988Z mov.u32 %r5, %tid.x; 2026-02-21T08:53:50.3953150Z shr.u32 %r6, %r5, 5; 2026-02-21T08:53:50.3953314Z shl.b32 %r7, %r5, 2; 2026-02-21T08:53:50.3953484Z and.b32 %r8, %r7, 60; 2026-02-21T08:53:50.3953661Z and.b32 %r9, %r5, 7; 2026-02-21T08:53:50.3953817Z shl.b32 %r10, %r9, 3; 2026-02-21T08:53:50.3953985Z shl.b32 %r11, %r5, 4; 2026-02-21T08:53:50.3954147Z and.b32 %r12, %r11, 48; 2026-02-21T08:53:50.3954475Z .loc 1 38 48 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:38:48 2026-02-21T08:53:50.3954853Z bfe.u32 %r13, %r5, 2, 5; 2026-02-21T08:53:50.3955178Z .loc 1 45 42 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:45:42 2026-02-21T08:53:50.3955550Z and.b32 %r14, %r5, 112; 2026-02-21T08:53:50.3955715Z bfe.u32 %r419, %r5, 3, 4; 2026-02-21T08:53:50.3956026Z .loc 1 45 53 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:45:53 2026-02-21T08:53:50.3956370Z shl.b32 %r15, %r14, 9; 2026-02-21T08:53:50.3956663Z shl.b32 %r420, %r5, 9; 2026-02-21T08:53:50.3956829Z and.b32 %r16, %r420, 57344; 2026-02-21T08:53:50.3957145Z .loc 1 62 38 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:62:38 2026-02-21T08:53:50.3957490Z and.b32 %r17, %r5, 64; 2026-02-21T08:53:50.3957785Z .loc 1 85 43 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:85:43 2026-02-21T08:53:50.3958135Z mul.lo.s32 %r18, %r419, 7168; 2026-02-21T08:53:50.3958314Z add.s32 %r19, %r18, 114688; 2026-02-21T08:53:50.3958501Z add.s32 %r20, %r18, 229376; 2026-02-21T08:53:50.3958686Z add.s32 %r21, %r18, 344064; 2026-02-21T08:53:50.3959027Z .loc 1 22 88 // 
cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:22:88 2026-02-21T08:53:50.3959410Z setp.ge.s32 %p1, %r1, %r4; 2026-02-21T08:53:50.3959603Z or.b32 %r3729, %r15, %r8; 2026-02-21T08:53:50.3959788Z cvt.u64.u32 %rd311, %r8; 2026-02-21T08:53:50.3959961Z cvt.u64.u32 %rd312, %r15; 2026-02-21T08:53:50.3960140Z cvt.u64.u32 %rd313, %r16; 2026-02-21T08:53:50.3960307Z and.b32 %r3730, %r5, 127; 2026-02-21T08:53:50.3960481Z shr.u32 %r3731, %r14, 1; 2026-02-21T08:53:50.3960655Z mov.b32 %r3732, global_smem; 2026-02-21T08:53:50.3960848Z mul.lo.s32 %r3733, %r13, 7168; 2026-02-21T08:53:50.3961031Z or.b32 %r3734, %r8, 64; 2026-02-21T08:53:50.3961205Z shl.b32 %r3735, %r5, 6; 2026-02-21T08:53:50.3961377Z shl.b32 %r3736, %r5, 5; 2026-02-21T08:53:50.3961544Z shl.b32 %r3737, %r5, 1; 2026-02-21T08:53:50.3961725Z and.b32 %r3738, %r5, 63; 2026-02-21T08:53:50.3961902Z or.b32 %r3739, %r5, 960; 2026-02-21T08:53:50.3962074Z or.b32 %r3740, %r5, 1984; 2026-02-21T08:53:50.3962245Z shl.b32 %r3741, %r9, 4; 2026-02-21T08:53:50.3962555Z shr.u32 %r3742, %r17, 4; 2026-02-21T08:53:50.3962722Z and.b32 %r3743, %r5, 1; 2026-02-21T08:53:50.3962888Z and.b32 %r3744, %r5, 2; 2026-02-21T08:53:50.3963069Z bfe.s32 %r3745, %r5, 1, 1; 2026-02-21T08:53:50.3972180Z bfe.s32 %r3746, %r5, 2, 1; 2026-02-21T08:53:50.3972390Z and.b32 %r3747, %r5, 4; 2026-02-21T08:53:50.3972585Z bfe.s32 %r3748, %r5, 3, 1; 2026-02-21T08:53:50.3972768Z bfe.s32 %r3749, %r5, 4, 1; 2026-02-21T08:53:50.3972956Z and.b32 %r3750, %r5, 16; 2026-02-21T08:53:50.3973148Z and.b32 %r3751, %r7, 416; 2026-02-21T08:53:50.3973331Z bfe.u32 %r3752, %r5, 4, 3; 2026-02-21T08:53:50.3973519Z and.b32 %r3753, %r5, 15; 2026-02-21T08:53:50.3973708Z setp.eq.b32 %p57, %r17, 0; 2026-02-21T08:53:50.3973909Z @%p1 bra $L__BB0_9; 2026-02-21T08:53:50.3974313Z // %bb.1: // %.lr.ph 2026-02-21T08:53:50.3974727Z .loc 1 0 88 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:0:88 2026-02-21T08:53:50.3975110Z mad.wide.u32 %rd1, %r3729, 2, %rd80; 
2026-02-21T08:53:50.3975351Z add.s64 %rd85, %rd312, %rd311; 2026-02-21T08:53:50.3975554Z shl.b64 %rd86, %rd85, 1; 2026-02-21T08:53:50.3975733Z add.s64 %rd87, %rd80, %rd86; 2026-02-21T08:53:50.3975922Z add.s64 %rd2, %rd87, 131072; 2026-02-21T08:53:50.3976098Z add.s64 %rd3, %rd87, 262144; 2026-02-21T08:53:50.3976278Z add.s64 %rd4, %rd87, 393216; 2026-02-21T08:53:50.3976608Z add.s64 %rd5, %rd87, 524288; 2026-02-21T08:53:50.3976797Z add.s64 %rd6, %rd87, 655360; 2026-02-21T08:53:50.3976975Z add.s64 %rd7, %rd87, 786432; 2026-02-21T08:53:50.3977162Z add.s64 %rd89, %rd313, %rd311; 2026-02-21T08:53:50.3977349Z shl.b64 %rd90, %rd89, 1; 2026-02-21T08:53:50.3977535Z add.s64 %rd91, %rd80, %rd90; 2026-02-21T08:53:50.3977716Z add.s64 %rd8, %rd91, 917504; 2026-02-21T08:53:50.3977907Z shl.b32 %r423, %r3730, 3; 2026-02-21T08:53:50.3978087Z xor.b32 %r22, %r423, %r3731; 2026-02-21T08:53:50.3978272Z add.s32 %r23, %r3732, %r22; 2026-02-21T08:53:50.3978456Z add.s32 %r24, %r23, 1024; 2026-02-21T08:53:50.3978628Z add.s32 %r25, %r23, 2048; 2026-02-21T08:53:50.3978801Z add.s32 %r26, %r23, 3072; 2026-02-21T08:53:50.3978969Z add.s32 %r27, %r23, 4096; 2026-02-21T08:53:50.3979141Z add.s32 %r28, %r23, 5120; 2026-02-21T08:53:50.3979302Z add.s32 %r29, %r23, 6144; 2026-02-21T08:53:50.3979474Z add.s32 %r30, %r23, 7168; 2026-02-21T08:53:50.3979641Z shl.b32 %r426, %r3730, 4; 2026-02-21T08:53:50.3979820Z add.s32 %r427, %r3732, %r426; 2026-02-21T08:53:50.3980005Z add.s32 %r32, %r427, 32768; 2026-02-21T08:53:50.3980190Z add.s64 %rd9, %rd87, 128; 2026-02-21T08:53:50.3980372Z cvt.u64.u32 %rd92, %r3734; 2026-02-21T08:53:50.3980559Z add.s64 %rd93, %rd312, %rd92; 2026-02-21T08:53:50.3980745Z shl.b64 %rd94, %rd93, 1; 2026-02-21T08:53:50.3980919Z add.s64 %rd95, %rd80, %rd94; 2026-02-21T08:53:50.3981109Z add.s64 %rd10, %rd95, 131072; 2026-02-21T08:53:50.3981284Z add.s64 %rd11, %rd95, 262144; 2026-02-21T08:53:50.3981467Z add.s64 %rd12, %rd95, 393216; 2026-02-21T08:53:50.3981642Z add.s64 %rd13, %rd95, 524288; 
2026-02-21T08:53:50.3981824Z add.s64 %rd14, %rd95, 655360; 2026-02-21T08:53:50.3981997Z add.s64 %rd15, %rd95, 786432; 2026-02-21T08:53:50.3982177Z add.s64 %rd96, %rd313, %rd92; 2026-02-21T08:53:50.3982359Z shl.b64 %rd97, %rd96, 1; 2026-02-21T08:53:50.3982530Z add.s64 %rd98, %rd80, %rd97; 2026-02-21T08:53:50.3982712Z add.s64 %rd16, %rd98, 917504; 2026-02-21T08:53:50.3982886Z add.s32 %r33, %r23, 8192; 2026-02-21T08:53:50.3983061Z add.s32 %r34, %r23, 9216; 2026-02-21T08:53:50.3983227Z add.s32 %r35, %r23, 10240; 2026-02-21T08:53:50.3983403Z add.s32 %r36, %r23, 11264; 2026-02-21T08:53:50.3983571Z add.s32 %r37, %r23, 12288; 2026-02-21T08:53:50.3983762Z add.s32 %r38, %r23, 13312; 2026-02-21T08:53:50.3983931Z add.s32 %r39, %r23, 14336; 2026-02-21T08:53:50.3984110Z add.s32 %r40, %r23, 15360; 2026-02-21T08:53:50.3984287Z add.s32 %r41, %r3733, 229376; 2026-02-21T08:53:50.3984462Z add.s32 %r42, %r427, 34816; 2026-02-21T08:53:50.3984645Z and.b32 %r430, %r3735, 6144; 2026-02-21T08:53:50.3985006Z and.b32 %r432, %r3736, 896; 2026-02-21T08:53:50.3985188Z and.b32 %r434, %r3737, 62; 2026-02-21T08:53:50.3985357Z or.b32 %r435, %r430, %r432; 2026-02-21T08:53:50.3985535Z or.b32 %r43, %r435, %r434; 2026-02-21T08:53:50.3985704Z xor.b32 %r44, %r43, 8; 2026-02-21T08:53:50.3985879Z xor.b32 %r45, %r43, 16; 2026-02-21T08:53:50.3986057Z xor.b32 %r46, %r43, 24; 2026-02-21T08:53:50.3986233Z xor.b32 %r47, %r43, 32; 2026-02-21T08:53:50.3986404Z xor.b32 %r48, %r43, 40; 2026-02-21T08:53:50.3986685Z xor.b32 %r49, %r43, 48; 2026-02-21T08:53:50.3986856Z xor.b32 %r50, %r43, 56; 2026-02-21T08:53:50.3987019Z shl.b32 %r436, %r3738, 7; 2026-02-21T08:53:50.3987200Z or.b32 %r439, %r3741, %r3742; 2026-02-21T08:53:50.3987378Z or.b32 %r440, %r439, %r436; 2026-02-21T08:53:50.3987748Z add.s32 %r441, %r3732, 16384; 2026-02-21T08:53:50.3987940Z add.s32 %r54, %r441, %r440; 2026-02-21T08:53:50.3988128Z xor.b32 %r442, %r440, 16; 2026-02-21T08:53:50.3988315Z add.s32 %r55, %r441, %r442; 2026-02-21T08:53:50.3988488Z xor.b32 
%r443, %r440, 32; 2026-02-21T08:53:50.3988760Z add.s32 %r56, %r441, %r443; 2026-02-21T08:53:50.3988934Z xor.b32 %r444, %r440, 48; 2026-02-21T08:53:50.3989110Z add.s32 %r57, %r441, %r444; 2026-02-21T08:53:50.3989280Z xor.b32 %r445, %r440, 64; 2026-02-21T08:53:50.3989452Z add.s32 %r58, %r441, %r445; 2026-02-21T08:53:50.3989622Z xor.b32 %r446, %r440, 80; 2026-02-21T08:53:50.3989796Z add.s32 %r59, %r441, %r446; 2026-02-21T08:53:50.3989965Z xor.b32 %r447, %r440, 96; 2026-02-21T08:53:50.3990138Z add.s32 %r60, %r441, %r447; 2026-02-21T08:53:50.3990317Z xor.b32 %r448, %r440, 112; 2026-02-21T08:53:50.3990491Z add.s32 %r61, %r441, %r448; 2026-02-21T08:53:50.3990687Z bfe.u32 %r449, %r441, 4, 14; 2026-02-21T08:53:50.3990871Z cvt.u64.u32 %rd99, %r449; 2026-02-21T08:53:50.3991068Z or.b64 %rd17, %rd99, 4611686293338849280; 2026-02-21T08:53:50.3991279Z add.s32 %r450, %r3732, 16416; 2026-02-21T08:53:50.3991466Z bfe.u32 %r451, %r450, 4, 14; 2026-02-21T08:53:50.3991645Z cvt.u64.u32 %rd100, %r451; 2026-02-21T08:53:50.3991843Z or.b64 %rd18, %rd100, 4611686293338849280; 2026-02-21T08:53:50.3992055Z add.s32 %r452, %r3732, 16448; 2026-02-21T08:53:50.3992237Z bfe.u32 %r453, %r452, 4, 14; 2026-02-21T08:53:50.3992418Z cvt.u64.u32 %rd101, %r453; 2026-02-21T08:53:50.3992600Z or.b64 %rd19, %rd101, 4611686293338849280; 2026-02-21T08:53:50.3992814Z add.s32 %r454, %r3732, 16480; 2026-02-21T08:53:50.3992987Z bfe.u32 %r455, %r454, 4, 14; 2026-02-21T08:53:50.3993166Z cvt.u64.u32 %rd102, %r455; 2026-02-21T08:53:50.3993348Z or.b64 %rd20, %rd102, 4611686293338849280; 2026-02-21T08:53:50.3993560Z add.s32 %r456, %r3732, 24576; 2026-02-21T08:53:50.3993733Z bfe.u32 %r457, %r456, 4, 14; 2026-02-21T08:53:50.3993918Z cvt.u64.u32 %rd103, %r457; 2026-02-21T08:53:50.3994114Z or.b64 %rd21, %rd103, 4611686293338849280; 2026-02-21T08:53:50.3994325Z add.s32 %r458, %r3732, 24608; 2026-02-21T08:53:50.3994511Z bfe.u32 %r459, %r458, 4, 14; 2026-02-21T08:53:50.3994697Z cvt.u64.u32 %rd104, %r459; 
2026-02-21T08:53:50.3994889Z or.b64 %rd22, %rd104, 4611686293338849280; 2026-02-21T08:53:50.3995096Z add.s32 %r460, %r3732, 24640; 2026-02-21T08:53:50.3995286Z bfe.u32 %r461, %r460, 4, 14; 2026-02-21T08:53:50.3995463Z cvt.u64.u32 %rd105, %r461; 2026-02-21T08:53:50.3995650Z or.b64 %rd23, %rd105, 4611686293338849280; 2026-02-21T08:53:50.3995877Z add.s32 %r462, %r3732, 24672; 2026-02-21T08:53:50.3996057Z bfe.u32 %r463, %r462, 4, 14; 2026-02-21T08:53:50.3996239Z cvt.u64.u32 %rd106, %r463; 2026-02-21T08:53:50.3996420Z or.b64 %rd24, %rd106, 4611686293338849280; 2026-02-21T08:53:50.3996749Z and.b32 %r464, %r5, 96; 2026-02-21T08:53:50.3996915Z shl.b32 %r465, %r464, 4; 2026-02-21T08:53:50.3997096Z shl.b32 %r467, %r3743, 5; 2026-02-21T08:53:50.3997268Z and.b32 %r470, %r3745, 4160; 2026-02-21T08:53:50.3997452Z shl.b32 %r473, %r3747, 2; 2026-02-21T08:53:50.3997624Z and.b32 %r475, %r3748, 2080; 2026-02-21T08:53:50.3997809Z shl.b32 %r478, %r3750, 3; 2026-02-21T08:53:50.3998145Z or.b32 %r479, %r470, %r473; 2026-02-21T08:53:50.3998320Z or.b32 %r480, %r479, %r478; 2026-02-21T08:53:50.3998504Z or.b32 %r481, %r465, %r467; 2026-02-21T08:53:50.3998676Z xor.b32 %r482, %r481, %r475; 2026-02-21T08:53:50.3998857Z or.b32 %r483, %r480, %r482; 2026-02-21T08:53:50.3999030Z add.s32 %r62, %r3732, %r483; 2026-02-21T08:53:50.3999211Z xor.b32 %r484, %r483, 64; 2026-02-21T08:53:50.3999378Z add.s32 %r63, %r3732, %r484; 2026-02-21T08:53:50.3999560Z shl.b32 %r486, %r3743, 6; 2026-02-21T08:53:50.3999726Z shl.b32 %r487, %r3744, 3; 2026-02-21T08:53:50.3999898Z and.b32 %r488, %r3746, 2080; 2026-02-21T08:53:50.4000077Z and.b32 %r489, %r3749, 4160; 2026-02-21T08:53:50.4000251Z or.b32 %r490, %r3751, %r486; 2026-02-21T08:53:50.4000564Z or.b32 %r491, %r488, %r489; 2026-02-21T08:53:50.4000741Z xor.b32 %r492, %r491, %r490; 2026-02-21T08:53:50.4000922Z add.s32 %r493, %r3732, %r487; 2026-02-21T08:53:50.4001117Z add.s32 %r1235, %r493, %r492; 2026-02-21T08:53:50.4001310Z add.s32 %r1240, %r1235, 512; 
2026-02-21T08:53:50.4001482Z add.s32 %r1245, %r1235, 1024; 2026-02-21T08:53:50.4001660Z add.s32 %r1250, %r1235, 1536; 2026-02-21T08:53:50.4001844Z shl.b32 %r494, %r464, 6; 2026-02-21T08:53:50.4002014Z or.b32 %r495, %r494, %r432; 2026-02-21T08:53:50.4002191Z or.b32 %r68, %r495, %r434; 2026-02-21T08:53:50.4002363Z xor.b32 %r69, %r68, 8; 2026-02-21T08:53:50.4002538Z xor.b32 %r70, %r68, 16; 2026-02-21T08:53:50.4002709Z xor.b32 %r71, %r68, 24; 2026-02-21T08:53:50.4002876Z xor.b32 %r72, %r68, 32; 2026-02-21T08:53:50.4003038Z xor.b32 %r73, %r68, 40; 2026-02-21T08:53:50.4003206Z xor.b32 %r74, %r68, 48; 2026-02-21T08:53:50.4003367Z xor.b32 %r75, %r68, 56; 2026-02-21T08:53:50.4003702Z .loc 1 22 88 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:22:88 2026-02-21T08:53:50.4004081Z cvt.u64.u32 %rd314, %r1; 2026-02-21T08:53:50.4004257Z cvt.u64.u32 %rd26, %r4; 2026-02-21T08:53:50.4004435Z shl.b32 %r496, %r1, 6; 2026-02-21T08:53:50.4004614Z add.s32 %r497, %r3733, %r496; 2026-02-21T08:53:50.4004796Z or.b32 %r498, %r497, %r12; 2026-02-21T08:53:50.4004971Z add.s32 %r3756, %r498, 458752; 2026-02-21T08:53:50.4005168Z mul.wide.u32 %rd107, %r3752, 16384; 2026-02-21T08:53:50.4005372Z mul.wide.u32 %rd108, %r3753, 8; 2026-02-21T08:53:50.4005569Z or.b64 %rd109, %rd107, %rd108; 2026-02-21T08:53:50.4005756Z add.s64 %rd110, %rd109, %rd80; 2026-02-21T08:53:50.4005941Z add.s64 %rd27, %rd110, 917760; 2026-02-21T08:53:50.4006127Z add.s32 %r3755, %r498, 458816; 2026-02-21T08:53:50.4006305Z add.s32 %r3754, %r498, 458880; 2026-02-21T08:53:50.4006660Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T08:53:50.4006960Z // Child Loop BB0_3 Depth 2 2026-02-21T08:53:50.4007241Z // Child Loop BB0_5 Depth 2 2026-02-21T08:53:50.4007505Z // Child Loop BB0_7 Depth 2 2026-02-21T08:53:50.4007910Z .loc 1 29 27 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:29:27 2026-02-21T08:53:50.4008293Z cvt.u32.u64 %r540, %rd314; 2026-02-21T08:53:50.4008470Z shl.b32 %r128, %r540, 6; 
2026-02-21T08:53:50.4008798Z .loc 1 30 32 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:30:32 2026-02-21T08:53:50.4009157Z or.b32 %r541, %r128, %r12; 2026-02-21T08:53:50.4009480Z .loc 1 45 80 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:45:80 2026-02-21T08:53:50.4009829Z bar.sync 0; 2026-02-21T08:53:50.4009971Z mov.b32 %r502, 8; 2026-02-21T08:53:50.4010133Z // begin inline asm 2026-02-21T08:53:50.4010367Z cp.async.ca.shared.global [ %r23 + 0 ], [ %rd1 + 0 ], 0x8, %r502; 2026-02-21T08:53:50.4010659Z // end inline asm 2026-02-21T08:53:50.4010812Z // begin inline asm 2026-02-21T08:53:50.4011041Z cp.async.ca.shared.global [ %r24 + 0 ], [ %rd2 + 0 ], 0x8, %r502; 2026-02-21T08:53:50.4011447Z // end inline asm 2026-02-21T08:53:50.4011603Z // begin inline asm 2026-02-21T08:53:50.4011828Z cp.async.ca.shared.global [ %r25 + 0 ], [ %rd3 + 0 ], 0x8, %r502; 2026-02-21T08:53:50.4012089Z // end inline asm 2026-02-21T08:53:50.4012245Z // begin inline asm 2026-02-21T08:53:50.4012460Z cp.async.ca.shared.global [ %r26 + 0 ], [ %rd4 + 0 ], 0x8, %r502; 2026-02-21T08:53:50.4012729Z // end inline asm 2026-02-21T08:53:50.4012873Z // begin inline asm 2026-02-21T08:53:50.4013095Z cp.async.ca.shared.global [ %r27 + 0 ], [ %rd5 + 0 ], 0x8, %r502; 2026-02-21T08:53:50.4013356Z // end inline asm 2026-02-21T08:53:50.4013507Z // begin inline asm 2026-02-21T08:53:50.4013720Z cp.async.ca.shared.global [ %r28 + 0 ], [ %rd6 + 0 ], 0x8, %r502; 2026-02-21T08:53:50.4013987Z // end inline asm 2026-02-21T08:53:50.4014268Z // begin inline asm 2026-02-21T08:53:50.4014501Z cp.async.ca.shared.global [ %r29 + 0 ], [ %rd7 + 0 ], 0x8, %r502; 2026-02-21T08:53:50.4014770Z // end inline asm 2026-02-21T08:53:50.4014926Z // begin inline asm 2026-02-21T08:53:50.4015148Z cp.async.ca.shared.global [ %r30 + 0 ], [ %rd8 + 0 ], 0x8, %r502; 2026-02-21T08:53:50.4015408Z // end inline asm 2026-02-21T08:53:50.4015571Z cp.async.commit_group; 2026-02-21T08:53:50.4015885Z .loc 1 51 62 // 
cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:62 2026-02-21T08:53:50.4016247Z add.s32 %r542, %r541, %r3733; 2026-02-21T08:53:50.4016692Z .loc 1 51 34 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:34 2026-02-21T08:53:50.4017052Z cvt.s64.s32 %rd130, %r542; 2026-02-21T08:53:50.4017245Z add.s64 %rd119, %rd81, %rd130; 2026-02-21T08:53:50.4017424Z mov.b32 %r518, 16; 2026-02-21T08:53:50.4017726Z .loc 1 51 87 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:87 2026-02-21T08:53:50.4018067Z // begin inline asm 2026-02-21T08:53:50.4018309Z cp.async.cg.shared.global [ %r32 + 0 ], [ %rd119 + 0 ], 0x10, %r518; 2026-02-21T08:53:50.4018601Z // end inline asm 2026-02-21T08:53:50.4018763Z cp.async.commit_group; 2026-02-21T08:53:50.4019102Z .loc 1 45 80 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:45:80 2026-02-21T08:53:50.4019455Z bar.sync 0; 2026-02-21T08:53:50.4019605Z // begin inline asm 2026-02-21T08:53:50.4019832Z cp.async.ca.shared.global [ %r33 + 0 ], [ %rd9 + 0 ], 0x8, %r502; 2026-02-21T08:53:50.4020103Z // end inline asm 2026-02-21T08:53:50.4020247Z // begin inline asm 2026-02-21T08:53:50.4020475Z cp.async.ca.shared.global [ %r34 + 0 ], [ %rd10 + 0 ], 0x8, %r502; 2026-02-21T08:53:50.4020746Z // end inline asm 2026-02-21T08:53:50.4020889Z // begin inline asm 2026-02-21T08:53:50.4021110Z cp.async.ca.shared.global [ %r35 + 0 ], [ %rd11 + 0 ], 0x8, %r502; 2026-02-21T08:53:50.4021373Z // end inline asm 2026-02-21T08:53:50.4021522Z // begin inline asm 2026-02-21T08:53:50.4021739Z cp.async.ca.shared.global [ %r36 + 0 ], [ %rd12 + 0 ], 0x8, %r502; 2026-02-21T08:53:50.4022003Z // end inline asm 2026-02-21T08:53:50.4022150Z // begin inline asm 2026-02-21T08:53:50.4022369Z cp.async.ca.shared.global [ %r37 + 0 ], [ %rd13 + 0 ], 0x8, %r502; 2026-02-21T08:53:50.4022636Z // end inline asm 2026-02-21T08:53:50.4022779Z // begin inline asm 2026-02-21T08:53:50.4022996Z cp.async.ca.shared.global [ %r38 + 0 ], [ %rd14 + 0 ], 0x8, %r502; 
2026-02-21T08:53:50.4023254Z // end inline asm 2026-02-21T08:53:50.4023405Z // begin inline asm 2026-02-21T08:53:50.4023617Z cp.async.ca.shared.global [ %r39 + 0 ], [ %rd15 + 0 ], 0x8, %r502; 2026-02-21T08:53:50.4023884Z // end inline asm 2026-02-21T08:53:50.4024027Z // begin inline asm 2026-02-21T08:53:50.4024246Z cp.async.ca.shared.global [ %r40 + 0 ], [ %rd16 + 0 ], 0x8, %r502; 2026-02-21T08:53:50.4024512Z // end inline asm 2026-02-21T08:53:50.4024666Z cp.async.commit_group; 2026-02-21T08:53:50.4024983Z .loc 1 51 62 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:62 2026-02-21T08:53:50.4025335Z add.s32 %r543, %r541, %r41; 2026-02-21T08:53:50.4025815Z .loc 1 51 34 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:34 2026-02-21T08:53:50.4026165Z cvt.s64.s32 %rd131, %r543; 2026-02-21T08:53:50.4026352Z add.s64 %rd128, %rd81, %rd131; 2026-02-21T08:53:50.4026822Z .loc 1 51 87 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:87 2026-02-21T08:53:50.4027172Z // begin inline asm 2026-02-21T08:53:50.4027405Z cp.async.cg.shared.global [ %r42 + 0 ], [ %rd128 + 0 ], 0x10, %r518; 2026-02-21T08:53:50.4027677Z // end inline asm 2026-02-21T08:53:50.4027850Z cp.async.commit_group; 2026-02-21T08:53:50.4028027Z mov.b32 %r3760, 0f00000000; 2026-02-21T08:53:50.4028206Z mov.b32 %r3759, 1; 2026-02-21T08:53:50.4028360Z mov.b32 %r3758, -1; 2026-02-21T08:53:50.4028719Z mov.b64 %rd316, -32; 2026-02-21T08:53:50.4028902Z mov.b64 %rd315, %rd27; 2026-02-21T08:53:50.4029077Z mov.b32 %r3757, %r3756; 2026-02-21T08:53:50.4029251Z mov.b32 %r3761, %r3760; 2026-02-21T08:53:50.4029421Z mov.b32 %r3762, %r3760; 2026-02-21T08:53:50.4029593Z mov.b32 %r3763, %r3760; 2026-02-21T08:53:50.4029756Z mov.b32 %r3764, %r3760; 2026-02-21T08:53:50.4029922Z mov.b32 %r3765, %r3760; 2026-02-21T08:53:50.4030084Z mov.b32 %r3766, %r3760; 2026-02-21T08:53:50.4030250Z mov.b32 %r3767, %r3760; 2026-02-21T08:53:50.4030423Z mov.b32 %r3768, %r3760; 2026-02-21T08:53:50.4030592Z mov.b32 
%r3769, %r3760; 2026-02-21T08:53:50.4030753Z mov.b32 %r3770, %r3760; 2026-02-21T08:53:50.4030920Z mov.b32 %r3771, %r3760; 2026-02-21T08:53:50.4031089Z mov.b32 %r3772, %r3760; 2026-02-21T08:53:50.4031247Z mov.b32 %r3773, %r3760; 2026-02-21T08:53:50.4031411Z mov.b32 %r3774, %r3760; 2026-02-21T08:53:50.4031569Z mov.b32 %r3775, %r3760; 2026-02-21T08:53:50.4031736Z mov.b32 %r3776, %r3760; 2026-02-21T08:53:50.4031899Z mov.b32 %r3777, %r3760; 2026-02-21T08:53:50.4032069Z mov.b32 %r3778, %r3760; 2026-02-21T08:53:50.4032236Z mov.b32 %r3779, %r3760; 2026-02-21T08:53:50.4032406Z mov.b32 %r3780, %r3760; 2026-02-21T08:53:50.4032569Z mov.b32 %r3781, %r3760; 2026-02-21T08:53:50.4032734Z mov.b32 %r3782, %r3760; 2026-02-21T08:53:50.4032900Z mov.b32 %r3783, %r3760; 2026-02-21T08:53:50.4033059Z mov.b32 %r3784, %r3760; 2026-02-21T08:53:50.4033240Z mov.b32 %r3785, %r3760; 2026-02-21T08:53:50.4033403Z mov.b32 %r3786, %r3760; 2026-02-21T08:53:50.4033567Z mov.b32 %r3787, %r3760; 2026-02-21T08:53:50.4033727Z mov.b32 %r3788, %r3760; 2026-02-21T08:53:50.4033890Z mov.b32 %r3789, %r3760; 2026-02-21T08:53:50.4034049Z mov.b32 %r3790, %r3760; 2026-02-21T08:53:50.4034216Z mov.b32 %r3791, %r3760; 2026-02-21T08:53:50.4034432Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T08:53:50.4034743Z // => This Inner Loop Header: Depth=2 2026-02-21T08:53:50.4035154Z .loc 1 37 103 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:37:103 2026-02-21T08:53:50.4035519Z add.s64 %rd316, %rd316, 32; 2026-02-21T08:53:50.4035719Z setp.lt.u64 %p11, %rd316, 4032; 2026-02-21T08:53:50.4035926Z add.s32 %r1176, %r3758, 1; 2026-02-21T08:53:50.4036118Z setp.gt.s32 %p12, %r1176, 1; 2026-02-21T08:53:50.4036313Z selp.b32 %r3758, 0, %r1176, %p12; 2026-02-21T08:53:50.4036787Z .loc 1 45 80 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:45:80 2026-02-21T08:53:50.4037156Z cp.async.wait_group 2; 2026-02-21T08:53:50.4037327Z bar.sync 0; 2026-02-21T08:53:50.4037495Z shl.b32 %r1177, %r3758, 13; 
2026-02-21T08:53:50.4037675Z add.s32 %r1179, %r3732, %r1177; 2026-02-21T08:53:50.4038005Z .loc 1 49 32 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:49:32 2026-02-21T08:53:50.4038358Z add.s32 %r1180, %r1179, %r43; 2026-02-21T08:53:50.4038551Z ld.shared.b16 %rs1, [%r1180]; 2026-02-21T08:53:50.4038740Z ld.shared.b16 %rs2, [%r1180+1024]; 2026-02-21T08:53:50.4038950Z ld.shared.b16 %rs3, [%r1180+64]; 2026-02-21T08:53:50.4039151Z ld.shared.b16 %rs4, [%r1180+1088]; 2026-02-21T08:53:50.4039511Z add.s32 %r1181, %r1179, %r44; 2026-02-21T08:53:50.4039704Z ld.shared.b16 %rs5, [%r1181]; 2026-02-21T08:53:50.4039887Z ld.shared.b16 %rs6, [%r1181+1024]; 2026-02-21T08:53:50.4040094Z ld.shared.b16 %rs7, [%r1181+64]; 2026-02-21T08:53:50.4040294Z ld.shared.b16 %rs8, [%r1181+1088]; 2026-02-21T08:53:50.4040494Z add.s32 %r1182, %r1179, %r45; 2026-02-21T08:53:50.4040675Z ld.shared.b16 %rs9, [%r1182]; 2026-02-21T08:53:50.4040871Z ld.shared.b16 %rs10, [%r1182+1024]; 2026-02-21T08:53:50.4041075Z ld.shared.b16 %rs11, [%r1182+64]; 2026-02-21T08:53:50.4041279Z ld.shared.b16 %rs12, [%r1182+1088]; 2026-02-21T08:53:50.4041479Z add.s32 %r1183, %r1179, %r46; 2026-02-21T08:53:50.4041662Z ld.shared.b16 %rs13, [%r1183]; 2026-02-21T08:53:50.4041995Z ld.shared.b16 %rs14, [%r1183+1024]; 2026-02-21T08:53:50.4042208Z ld.shared.b16 %rs15, [%r1183+64]; 2026-02-21T08:53:50.4042409Z ld.shared.b16 %rs16, [%r1183+1088]; 2026-02-21T08:53:50.4042597Z add.s32 %r1184, %r1179, %r47; 2026-02-21T08:53:50.4042786Z ld.shared.b16 %rs17, [%r1184]; 2026-02-21T08:53:50.4042971Z ld.shared.b16 %rs18, [%r1184+1024]; 2026-02-21T08:53:50.4043169Z ld.shared.b16 %rs19, [%r1184+64]; 2026-02-21T08:53:50.4043363Z ld.shared.b16 %rs20, [%r1184+1088]; 2026-02-21T08:53:50.4043553Z add.s32 %r1185, %r1179, %r48; 2026-02-21T08:53:50.4043736Z ld.shared.b16 %rs21, [%r1185]; 2026-02-21T08:53:50.4043924Z ld.shared.b16 %rs22, [%r1185+1024]; 2026-02-21T08:53:50.4044119Z ld.shared.b16 %rs23, [%r1185+64]; 2026-02-21T08:53:50.4044309Z 
ld.shared.b16 %rs24, [%r1185+1088]; 2026-02-21T08:53:50.4044504Z add.s32 %r1186, %r1179, %r49; 2026-02-21T08:53:50.4044682Z ld.shared.b16 %rs25, [%r1186]; 2026-02-21T08:53:50.4044874Z ld.shared.b16 %rs26, [%r1186+1024]; 2026-02-21T08:53:50.4045076Z ld.shared.b16 %rs27, [%r1186+64]; 2026-02-21T08:53:50.4045268Z ld.shared.b16 %rs28, [%r1186+1088]; 2026-02-21T08:53:50.4045463Z add.s32 %r1187, %r1179, %r50; 2026-02-21T08:53:50.4045641Z ld.shared.b16 %rs29, [%r1187]; 2026-02-21T08:53:50.4045836Z ld.shared.b16 %rs30, [%r1187+1024]; 2026-02-21T08:53:50.4046028Z ld.shared.b16 %rs31, [%r1187+64]; 2026-02-21T08:53:50.4046230Z ld.shared.b16 %rs32, [%r1187+1088]; 2026-02-21T08:53:50.4046426Z cvt.f32.bf16 %r608, %rs1; 2026-02-21T08:53:50.4046757Z cvt.f32.bf16 %r609, %rs2; 2026-02-21T08:53:50.4046932Z cvt.f32.bf16 %r610, %rs5; 2026-02-21T08:53:50.4047107Z cvt.f32.bf16 %r611, %rs6; 2026-02-21T08:53:50.4047274Z cvt.f32.bf16 %r676, %rs9; 2026-02-21T08:53:50.4047449Z cvt.f32.bf16 %r677, %rs10; 2026-02-21T08:53:50.4047625Z cvt.f32.bf16 %r678, %rs13; 2026-02-21T08:53:50.4047802Z cvt.f32.bf16 %r679, %rs14; 2026-02-21T08:53:50.4047980Z cvt.f32.bf16 %r744, %rs17; 2026-02-21T08:53:50.4048156Z cvt.f32.bf16 %r745, %rs18; 2026-02-21T08:53:50.4048339Z cvt.f32.bf16 %r746, %rs21; 2026-02-21T08:53:50.4048512Z cvt.f32.bf16 %r747, %rs22; 2026-02-21T08:53:50.4048692Z cvt.f32.bf16 %r812, %rs25; 2026-02-21T08:53:50.4048864Z cvt.f32.bf16 %r813, %rs26; 2026-02-21T08:53:50.4049046Z cvt.f32.bf16 %r814, %rs29; 2026-02-21T08:53:50.4049215Z cvt.f32.bf16 %r815, %rs30; 2026-02-21T08:53:50.4049393Z cvt.f32.bf16 %r880, %rs3; 2026-02-21T08:53:50.4049569Z cvt.f32.bf16 %r881, %rs4; 2026-02-21T08:53:50.4049736Z cvt.f32.bf16 %r882, %rs7; 2026-02-21T08:53:50.4049925Z cvt.f32.bf16 %r883, %rs8; 2026-02-21T08:53:50.4050097Z cvt.f32.bf16 %r948, %rs11; 2026-02-21T08:53:50.4050275Z cvt.f32.bf16 %r949, %rs12; 2026-02-21T08:53:50.4050454Z cvt.f32.bf16 %r950, %rs15; 2026-02-21T08:53:50.4050634Z cvt.f32.bf16 %r951, %rs16; 
2026-02-21T08:53:50.4050811Z cvt.f32.bf16 %r1016, %rs19; 2026-02-21T08:53:50.4050995Z cvt.f32.bf16 %r1017, %rs20; 2026-02-21T08:53:50.4051168Z cvt.f32.bf16 %r1018, %rs23; 2026-02-21T08:53:50.4051347Z cvt.f32.bf16 %r1019, %rs24; 2026-02-21T08:53:50.4051528Z cvt.f32.bf16 %r1084, %rs27; 2026-02-21T08:53:50.4051697Z cvt.f32.bf16 %r1085, %rs28; 2026-02-21T08:53:50.4051872Z cvt.f32.bf16 %r1086, %rs31; 2026-02-21T08:53:50.4052187Z cvt.f32.bf16 %r1087, %rs32; 2026-02-21T08:53:50.4052515Z .loc 1 51 87 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:87 2026-02-21T08:53:50.4052883Z shl.b32 %r1188, %r3758, 11; 2026-02-21T08:53:50.4053069Z add.s32 %r1189, %r3732, %r1188; 2026-02-21T08:53:50.4053256Z add.s32 %r1190, %r1189, 32768; 2026-02-21T08:53:50.4053580Z .loc 1 64 45 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:64:45 2026-02-21T08:53:50.4053935Z add.s32 %r1191, %r1190, %r3738; 2026-02-21T08:53:50.4054123Z ld.shared.b8 %rs33, [%r1191]; 2026-02-21T08:53:50.4054316Z ld.shared.b8 %rs34, [%r1191+64]; 2026-02-21T08:53:50.4054509Z ld.shared.b8 %rs35, [%r1191+128]; 2026-02-21T08:53:50.4054710Z ld.shared.b8 %rs36, [%r1191+192]; 2026-02-21T08:53:50.4055026Z ld.shared.b8 %rs37, [%r1191+256]; 2026-02-21T08:53:50.4055228Z ld.shared.b8 %rs38, [%r1191+320]; 2026-02-21T08:53:50.4055416Z ld.shared.b8 %rs39, [%r1191+384]; 2026-02-21T08:53:50.4055617Z ld.shared.b8 %rs40, [%r1191+448]; 2026-02-21T08:53:50.4055821Z ld.shared.b8 %rs41, [%r1191+512]; 2026-02-21T08:53:50.4056011Z ld.shared.b8 %rs42, [%r1191+576]; 2026-02-21T08:53:50.4056207Z ld.shared.b8 %rs43, [%r1191+640]; 2026-02-21T08:53:50.4056417Z ld.shared.b8 %rs44, [%r1191+704]; 2026-02-21T08:53:50.4056727Z ld.shared.b8 %rs45, [%r1191+768]; 2026-02-21T08:53:50.4056923Z ld.shared.b8 %rs46, [%r1191+832]; 2026-02-21T08:53:50.4057125Z ld.shared.b8 %rs47, [%r1191+896]; 2026-02-21T08:53:50.4057329Z add.s32 %r1192, %r1190, %r3739; 2026-02-21T08:53:50.4057521Z ld.shared.b8 %rs48, [%r1192]; 2026-02-21T08:53:50.4057717Z 
ld.shared.b8 %rs49, [%r1191+1024]; 2026-02-21T08:53:50.4057916Z ld.shared.b8 %rs50, [%r1191+1088]; 2026-02-21T08:53:50.4058117Z ld.shared.b8 %rs51, [%r1191+1152]; 2026-02-21T08:53:50.4058315Z ld.shared.b8 %rs52, [%r1191+1216]; 2026-02-21T08:53:50.4058505Z ld.shared.b8 %rs53, [%r1191+1280]; 2026-02-21T08:53:50.4058700Z ld.shared.b8 %rs54, [%r1191+1344]; 2026-02-21T08:53:50.4058897Z ld.shared.b8 %rs55, [%r1191+1408]; 2026-02-21T08:53:50.4059106Z ld.shared.b8 %rs56, [%r1191+1472]; 2026-02-21T08:53:50.4059297Z ld.shared.b8 %rs57, [%r1191+1536]; 2026-02-21T08:53:50.4059491Z ld.shared.b8 %rs58, [%r1191+1600]; 2026-02-21T08:53:50.4059686Z ld.shared.b8 %rs59, [%r1191+1664]; 2026-02-21T08:53:50.4059883Z ld.shared.b8 %rs60, [%r1191+1728]; 2026-02-21T08:53:50.4060078Z ld.shared.b8 %rs61, [%r1191+1792]; 2026-02-21T08:53:50.4060268Z ld.shared.b8 %rs62, [%r1191+1856]; 2026-02-21T08:53:50.4060463Z ld.shared.b8 %rs63, [%r1191+1920]; 2026-02-21T08:53:50.4060650Z add.s32 %r1193, %r1190, %r3740; 2026-02-21T08:53:50.4060838Z ld.shared.b8 %rs64, [%r1193]; 2026-02-21T08:53:50.4061160Z .loc 1 54 28 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:54:28 2026-02-21T08:53:50.4061519Z shl.b16 %rs65, %rs33, 4; 2026-02-21T08:53:50.4061690Z shl.b16 %rs66, %rs34, 4; 2026-02-21T08:53:50.4061864Z shl.b16 %rs67, %rs35, 4; 2026-02-21T08:53:50.4062038Z shl.b16 %rs68, %rs36, 4; 2026-02-21T08:53:50.4062204Z shl.b16 %rs69, %rs37, 4; 2026-02-21T08:53:50.4062377Z shl.b16 %rs70, %rs38, 4; 2026-02-21T08:53:50.4062542Z shl.b16 %rs71, %rs39, 4; 2026-02-21T08:53:50.4062710Z shl.b16 %rs72, %rs40, 4; 2026-02-21T08:53:50.4062876Z shl.b16 %rs73, %rs41, 4; 2026-02-21T08:53:50.4063045Z shl.b16 %rs74, %rs42, 4; 2026-02-21T08:53:50.4063227Z shl.b16 %rs75, %rs43, 4; 2026-02-21T08:53:50.4063401Z shl.b16 %rs76, %rs44, 4; 2026-02-21T08:53:50.4063565Z shl.b16 %rs77, %rs45, 4; 2026-02-21T08:53:50.4063740Z shl.b16 %rs78, %rs46, 4; 2026-02-21T08:53:50.4063912Z shl.b16 %rs79, %rs47, 4; 2026-02-21T08:53:50.4064074Z 
shl.b16 %rs80, %rs48, 4; 2026-02-21T08:53:50.4064247Z shl.b16 %rs81, %rs49, 4; 2026-02-21T08:53:50.4064412Z shl.b16 %rs82, %rs50, 4; 2026-02-21T08:53:50.4064589Z shl.b16 %rs83, %rs51, 4; 2026-02-21T08:53:50.4064759Z shl.b16 %rs84, %rs52, 4; 2026-02-21T08:53:50.4064933Z shl.b16 %rs85, %rs53, 4; 2026-02-21T08:53:50.4065098Z shl.b16 %rs86, %rs54, 4; 2026-02-21T08:53:50.4065419Z shl.b16 %rs87, %rs55, 4; 2026-02-21T08:53:50.4065589Z shl.b16 %rs88, %rs56, 4; 2026-02-21T08:53:50.4065762Z shl.b16 %rs89, %rs57, 4; 2026-02-21T08:53:50.4065938Z shl.b16 %rs90, %rs58, 4; 2026-02-21T08:53:50.4066107Z shl.b16 %rs91, %rs59, 4; 2026-02-21T08:53:50.4066280Z shl.b16 %rs92, %rs60, 4; 2026-02-21T08:53:50.4066446Z shl.b16 %rs93, %rs61, 4; 2026-02-21T08:53:50.4066751Z shl.b16 %rs94, %rs62, 4; 2026-02-21T08:53:50.4066918Z shl.b16 %rs95, %rs63, 4; 2026-02-21T08:53:50.4067088Z shl.b16 %rs96, %rs64, 4; 2026-02-21T08:53:50.4067396Z .loc 1 69 58 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:69:58 2026-02-21T08:53:50.4067769Z selp.b16 %rs97, %rs65, %rs33, %p57; 2026-02-21T08:53:50.4068125Z cvt.s16.s8 %rs98, %rs97; 2026-02-21T08:53:50.4068301Z shr.s16 %rs99, %rs98, 4; 2026-02-21T08:53:50.4068488Z selp.b16 %rs100, %rs66, %rs34, %p57; 2026-02-21T08:53:50.4068802Z cvt.s16.s8 %rs101, %rs100; 2026-02-21T08:53:50.4068990Z shr.s16 %rs102, %rs101, 4; 2026-02-21T08:53:50.4069168Z selp.b16 %rs103, %rs67, %rs35, %p57; 2026-02-21T08:53:50.4069364Z cvt.s16.s8 %rs104, %rs103; 2026-02-21T08:53:50.4069527Z shr.s16 %rs105, %rs104, 4; 2026-02-21T08:53:50.4069700Z selp.b16 %rs106, %rs68, %rs36, %p57; 2026-02-21T08:53:50.4069887Z cvt.s16.s8 %rs107, %rs106; 2026-02-21T08:53:50.4070056Z shr.s16 %rs108, %rs107, 4; 2026-02-21T08:53:50.4070230Z selp.b16 %rs109, %rs69, %rs37, %p57; 2026-02-21T08:53:50.4070419Z cvt.s16.s8 %rs110, %rs109; 2026-02-21T08:53:50.4070587Z shr.s16 %rs111, %rs110, 4; 2026-02-21T08:53:50.4070758Z selp.b16 %rs112, %rs70, %rs38, %p57; 2026-02-21T08:53:50.4070948Z cvt.s16.s8 %rs113, 
%rs112; 2026-02-21T08:53:50.4071117Z shr.s16 %rs114, %rs113, 4; 2026-02-21T08:53:50.4071295Z selp.b16 %rs115, %rs71, %rs39, %p57; 2026-02-21T08:53:50.4071482Z cvt.s16.s8 %rs116, %rs115; 2026-02-21T08:53:50.4071653Z shr.s16 %rs117, %rs116, 4; 2026-02-21T08:53:50.4071831Z selp.b16 %rs118, %rs72, %rs40, %p57; 2026-02-21T08:53:50.4072027Z cvt.s16.s8 %rs119, %rs118; 2026-02-21T08:53:50.4072203Z shr.s16 %rs120, %rs119, 4; 2026-02-21T08:53:50.4072374Z selp.b16 %rs121, %rs73, %rs41, %p57; 2026-02-21T08:53:50.4072571Z cvt.s16.s8 %rs122, %rs121; 2026-02-21T08:53:50.4072740Z shr.s16 %rs123, %rs122, 4; 2026-02-21T08:53:50.4072918Z selp.b16 %rs124, %rs74, %rs42, %p57; 2026-02-21T08:53:50.4073110Z cvt.s16.s8 %rs125, %rs124; 2026-02-21T08:53:50.4073281Z shr.s16 %rs126, %rs125, 4; 2026-02-21T08:53:50.4073452Z selp.b16 %rs127, %rs75, %rs43, %p57; 2026-02-21T08:53:50.4073647Z cvt.s16.s8 %rs128, %rs127; 2026-02-21T08:53:50.4073815Z shr.s16 %rs129, %rs128, 4; 2026-02-21T08:53:50.4073986Z selp.b16 %rs130, %rs76, %rs44, %p57; 2026-02-21T08:53:50.4074179Z cvt.s16.s8 %rs131, %rs130; 2026-02-21T08:53:50.4074362Z shr.s16 %rs132, %rs131, 4; 2026-02-21T08:53:50.4074543Z selp.b16 %rs133, %rs77, %rs45, %p57; 2026-02-21T08:53:50.4074735Z cvt.s16.s8 %rs134, %rs133; 2026-02-21T08:53:50.4074902Z shr.s16 %rs135, %rs134, 4; 2026-02-21T08:53:50.4075088Z selp.b16 %rs136, %rs78, %rs46, %p57; 2026-02-21T08:53:50.4075288Z cvt.s16.s8 %rs137, %rs136; 2026-02-21T08:53:50.4075452Z shr.s16 %rs138, %rs137, 4; 2026-02-21T08:53:50.4075630Z selp.b16 %rs139, %rs79, %rs47, %p57; 2026-02-21T08:53:50.4075829Z cvt.s16.s8 %rs140, %rs139; 2026-02-21T08:53:50.4075997Z shr.s16 %rs141, %rs140, 4; 2026-02-21T08:53:50.4076180Z selp.b16 %rs142, %rs80, %rs48, %p57; 2026-02-21T08:53:50.4076373Z cvt.s16.s8 %rs143, %rs142; 2026-02-21T08:53:50.4076672Z shr.s16 %rs144, %rs143, 4; 2026-02-21T08:53:50.4076847Z selp.b16 %rs145, %rs81, %rs49, %p57; 2026-02-21T08:53:50.4077045Z cvt.s16.s8 %rs146, %rs145; 2026-02-21T08:53:50.4077215Z 
shr.s16 %rs147, %rs146, 4; 2026-02-21T08:53:50.4077400Z selp.b16 %rs148, %rs82, %rs50, %p57; 2026-02-21T08:53:50.4077612Z cvt.s16.s8 %rs149, %rs148; 2026-02-21T08:53:50.4077781Z shr.s16 %rs150, %rs149, 4; 2026-02-21T08:53:50.4077960Z selp.b16 %rs151, %rs83, %rs51, %p57; 2026-02-21T08:53:50.4078296Z cvt.s16.s8 %rs152, %rs151; 2026-02-21T08:53:50.4078466Z shr.s16 %rs153, %rs152, 4; 2026-02-21T08:53:50.4078639Z selp.b16 %rs154, %rs84, %rs52, %p57; 2026-02-21T08:53:50.4078833Z cvt.s16.s8 %rs155, %rs154; 2026-02-21T08:53:50.4079003Z shr.s16 %rs156, %rs155, 4; 2026-02-21T08:53:50.4079194Z selp.b16 %rs157, %rs85, %rs53, %p57; 2026-02-21T08:53:50.4079387Z cvt.s16.s8 %rs158, %rs157; 2026-02-21T08:53:50.4079557Z shr.s16 %rs159, %rs158, 4; 2026-02-21T08:53:50.4079741Z selp.b16 %rs160, %rs86, %rs54, %p57; 2026-02-21T08:53:50.4079930Z cvt.s16.s8 %rs161, %rs160; 2026-02-21T08:53:50.4080102Z shr.s16 %rs162, %rs161, 4; 2026-02-21T08:53:50.4080278Z selp.b16 %rs163, %rs87, %rs55, %p57; 2026-02-21T08:53:50.4080476Z cvt.s16.s8 %rs164, %rs163; 2026-02-21T08:53:50.4080768Z shr.s16 %rs165, %rs164, 4; 2026-02-21T08:53:50.4080956Z selp.b16 %rs166, %rs88, %rs56, %p57; 2026-02-21T08:53:50.4081146Z cvt.s16.s8 %rs167, %rs166; 2026-02-21T08:53:50.4081317Z shr.s16 %rs168, %rs167, 4; 2026-02-21T08:53:50.4081498Z selp.b16 %rs169, %rs89, %rs57, %p57; 2026-02-21T08:53:50.4081687Z cvt.s16.s8 %rs170, %rs169; 2026-02-21T08:53:50.4081858Z shr.s16 %rs171, %rs170, 4; 2026-02-21T08:53:50.4082033Z selp.b16 %rs172, %rs90, %rs58, %p57; 2026-02-21T08:53:50.4082226Z cvt.s16.s8 %rs173, %rs172; 2026-02-21T08:53:50.4082394Z shr.s16 %rs174, %rs173, 4; 2026-02-21T08:53:50.4082570Z selp.b16 %rs175, %rs91, %rs59, %p57; 2026-02-21T08:53:50.4082754Z cvt.s16.s8 %rs176, %rs175; 2026-02-21T08:53:50.4082922Z shr.s16 %rs177, %rs176, 4; 2026-02-21T08:53:50.4083096Z selp.b16 %rs178, %rs92, %rs60, %p57; 2026-02-21T08:53:50.4083400Z cvt.s16.s8 %rs179, %rs178; 2026-02-21T08:53:50.4083694Z shr.s16 %rs180, %rs179, 4; 
2026-02-21T08:53:50.4083944Z selp.b16 %rs181, %rs93, %rs61, %p57; 2026-02-21T08:53:50.4084147Z cvt.s16.s8 %rs182, %rs181; 2026-02-21T08:53:50.4084316Z shr.s16 %rs183, %rs182, 4; 2026-02-21T08:53:50.4084493Z selp.b16 %rs184, %rs94, %rs62, %p57; 2026-02-21T08:53:50.4084682Z cvt.s16.s8 %rs185, %rs184; 2026-02-21T08:53:50.4084859Z shr.s16 %rs186, %rs185, 4; 2026-02-21T08:53:50.4085033Z selp.b16 %rs187, %rs95, %rs63, %p57; 2026-02-21T08:53:50.4085226Z cvt.s16.s8 %rs188, %rs187; 2026-02-21T08:53:50.4085393Z shr.s16 %rs189, %rs188, 4; 2026-02-21T08:53:50.4085571Z selp.b16 %rs190, %rs96, %rs64, %p57; 2026-02-21T08:53:50.4085782Z cvt.s16.s8 %rs191, %rs190; 2026-02-21T08:53:50.4085957Z shr.s16 %rs192, %rs191, 4; 2026-02-21T08:53:50.4086279Z .loc 1 74 32 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:74:32 2026-02-21T08:53:50.4086805Z cvt.rn.f32.s16 %r1194, %rs99; 2026-02-21T08:53:50.4087009Z cvt.rn.f32.s16 %r1195, %rs102; 2026-02-21T08:53:50.4087196Z cvt.rn.f32.s16 %r1196, %rs105; 2026-02-21T08:53:50.4087383Z cvt.rn.f32.s16 %r1197, %rs108; 2026-02-21T08:53:50.4087561Z cvt.rn.f32.s16 %r1198, %rs111; 2026-02-21T08:53:50.4087747Z cvt.rn.f32.s16 %r1199, %rs114; 2026-02-21T08:53:50.4087931Z cvt.rn.f32.s16 %r1200, %rs117; 2026-02-21T08:53:50.4088110Z cvt.rn.f32.s16 %r1201, %rs120; 2026-02-21T08:53:50.4088290Z cvt.rn.f32.s16 %r1202, %rs123; 2026-02-21T08:53:50.4088466Z cvt.rn.f32.s16 %r1203, %rs126; 2026-02-21T08:53:50.4088644Z cvt.rn.f32.s16 %r1204, %rs129; 2026-02-21T08:53:50.4088818Z cvt.rn.f32.s16 %r1205, %rs132; 2026-02-21T08:53:50.4088996Z cvt.rn.f32.s16 %r1206, %rs135; 2026-02-21T08:53:50.4089171Z cvt.rn.f32.s16 %r1207, %rs138; 2026-02-21T08:53:50.4089350Z cvt.rn.f32.s16 %r1208, %rs141; 2026-02-21T08:53:50.4089544Z cvt.rn.f32.s16 %r1209, %rs144; 2026-02-21T08:53:50.4089722Z cvt.rn.f32.s16 %r1210, %rs147; 2026-02-21T08:53:50.4089902Z cvt.rn.f32.s16 %r1211, %rs150; 2026-02-21T08:53:50.4090077Z cvt.rn.f32.s16 %r1212, %rs153; 2026-02-21T08:53:50.4090260Z 
cvt.rn.f32.s16 %r1213, %rs156; 2026-02-21T08:53:50.4090438Z cvt.rn.f32.s16 %r1214, %rs159; 2026-02-21T08:53:50.4090617Z cvt.rn.f32.s16 %r1215, %rs162; 2026-02-21T08:53:50.4090793Z cvt.rn.f32.s16 %r1216, %rs165; 2026-02-21T08:53:50.4090974Z cvt.rn.f32.s16 %r1217, %rs168; 2026-02-21T08:53:50.4091332Z cvt.rn.f32.s16 %r1218, %rs171; 2026-02-21T08:53:50.4091513Z cvt.rn.f32.s16 %r1219, %rs174; 2026-02-21T08:53:50.4091693Z cvt.rn.f32.s16 %r1220, %rs177; 2026-02-21T08:53:50.4091869Z cvt.rn.f32.s16 %r1221, %rs180; 2026-02-21T08:53:50.4092051Z cvt.rn.f32.s16 %r1222, %rs183; 2026-02-21T08:53:50.4092240Z cvt.rn.f32.s16 %r1223, %rs186; 2026-02-21T08:53:50.4092424Z cvt.rn.f32.s16 %r1224, %rs189; 2026-02-21T08:53:50.4092602Z cvt.rn.f32.s16 %r1225, %rs192; 2026-02-21T08:53:50.4092785Z st.shared.b32 [%r54], %r1194; 2026-02-21T08:53:50.4092968Z st.shared.b32 [%r54+8], %r1195; 2026-02-21T08:53:50.4093164Z st.shared.b32 [%r54+8192], %r1210; 2026-02-21T08:53:50.4093365Z st.shared.b32 [%r54+8200], %r1211; 2026-02-21T08:53:50.4093692Z st.shared.b32 [%r55], %r1196; 2026-02-21T08:53:50.4093887Z st.shared.b32 [%r55+8], %r1197; 2026-02-21T08:53:50.4094072Z st.shared.b32 [%r55+8192], %r1212; 2026-02-21T08:53:50.4094267Z st.shared.b32 [%r55+8200], %r1213; 2026-02-21T08:53:50.4094459Z st.shared.b32 [%r56], %r1198; 2026-02-21T08:53:50.4094641Z st.shared.b32 [%r56+8], %r1199; 2026-02-21T08:53:50.4094824Z st.shared.b32 [%r56+8192], %r1214; 2026-02-21T08:53:50.4095039Z st.shared.b32 [%r56+8200], %r1215; 2026-02-21T08:53:50.4095231Z st.shared.b32 [%r57], %r1200; 2026-02-21T08:53:50.4095420Z st.shared.b32 [%r57+8], %r1201; 2026-02-21T08:53:50.4095613Z st.shared.b32 [%r57+8192], %r1216; 2026-02-21T08:53:50.4095808Z st.shared.b32 [%r57+8200], %r1217; 2026-02-21T08:53:50.4096005Z st.shared.b32 [%r58], %r1202; 2026-02-21T08:53:50.4096185Z st.shared.b32 [%r58+8], %r1203; 2026-02-21T08:53:50.4096378Z st.shared.b32 [%r58+8192], %r1218; 2026-02-21T08:53:50.4096696Z st.shared.b32 [%r58+8200], %r1219; 
2026-02-21T08:53:50.4096899Z st.shared.b32 [%r59], %r1204; 2026-02-21T08:53:50.4097079Z st.shared.b32 [%r59+8], %r1205; 2026-02-21T08:53:50.4097271Z st.shared.b32 [%r59+8192], %r1220; 2026-02-21T08:53:50.4097464Z st.shared.b32 [%r59+8200], %r1221; 2026-02-21T08:53:50.4097659Z st.shared.b32 [%r60], %r1206; 2026-02-21T08:53:50.4097846Z st.shared.b32 [%r60+8], %r1207; 2026-02-21T08:53:50.4098044Z st.shared.b32 [%r60+8192], %r1222; 2026-02-21T08:53:50.4098239Z st.shared.b32 [%r60+8200], %r1223; 2026-02-21T08:53:50.4098428Z st.shared.b32 [%r61], %r1208; 2026-02-21T08:53:50.4098611Z st.shared.b32 [%r61+8], %r1209; 2026-02-21T08:53:50.4098795Z st.shared.b32 [%r61+8192], %r1224; 2026-02-21T08:53:50.4098987Z st.shared.b32 [%r61+8200], %r1225; 2026-02-21T08:53:50.4099170Z $L__tmp1: 2026-02-21T08:53:50.4099534Z .loc 2 291 36 // standard.py:291:36 @[ cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:81:40 ] 2026-02-21T08:53:50.4099961Z // begin inline asm 2026-02-21T08:53:50.4100159Z fence.proxy.async.shared::cta; 2026-02-21T08:53:50.4100354Z // end inline asm 2026-02-21T08:53:50.4100502Z bar.sync 0; 2026-02-21T08:53:50.4100671Z shfl.sync.idx.b32 %r1226, %r6, 0, 31, -1; 2026-02-21T08:53:50.4100899Z wgmma.fence.sync.aligned; 2026-02-21T08:53:50.4101092Z mov.pred %p2, -1; 2026-02-21T08:53:50.4101253Z // begin inline asm 2026-02-21T08:53:50.4102115Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3760,%r3761,%r3762,%r3763,%r3764,%r3765,%r3766,%r3767,%r3768,%r3769,%r3770,%r3771,%r3772,%r3773,%r3774,%r3775,%r3776,%r3777,%r3778,%r3779,%r3780,%r3781,%r3782,%r3783,%r3784,%r3785,%r3786,%r3787,%r3788,%r3789,%r3790,%r3791}, {%r608,%r609,%r610,%r611}, %rd17, %p2, 1, 1; 2026-02-21T08:53:50.4103012Z // end inline asm 2026-02-21T08:53:50.4103161Z // begin inline asm 2026-02-21T08:53:50.4104000Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r3760,%r3761,%r3762,%r3763,%r3764,%r3765,%r3766,%r3767,%r3768,%r3769,%r3770,%r3771,%r3772,%r3773,%r3774,%r3775,%r3776,%r3777,%r3778,%r3779,%r3780,%r3781,%r3782,%r3783,%r3784,%r3785,%r3786,%r3787,%r3788,%r3789,%r3790,%r3791}, {%r676,%r677,%r678,%r679}, %rd18, %p2, 1, 1; 2026-02-21T08:53:50.4104897Z // end inline asm 2026-02-21T08:53:50.4105046Z // begin inline asm 2026-02-21T08:53:50.4106027Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3760,%r3761,%r3762,%r3763,%r3764,%r3765,%r3766,%r3767,%r3768,%r3769,%r3770,%r3771,%r3772,%r3773,%r3774,%r3775,%r3776,%r3777,%r3778,%r3779,%r3780,%r3781,%r3782,%r3783,%r3784,%r3785,%r3786,%r3787,%r3788,%r3789,%r3790,%r3791}, {%r744,%r745,%r746,%r747}, %rd19, %p2, 1, 1; 2026-02-21T08:53:50.4107028Z // end inline asm 2026-02-21T08:53:50.4107192Z // begin inline asm 2026-02-21T08:53:50.4108034Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3760,%r3761,%r3762,%r3763,%r3764,%r3765,%r3766,%r3767,%r3768,%r3769,%r3770,%r3771,%r3772,%r3773,%r3774,%r3775,%r3776,%r3777,%r3778,%r3779,%r3780,%r3781,%r3782,%r3783,%r3784,%r3785,%r3786,%r3787,%r3788,%r3789,%r3790,%r3791}, {%r812,%r813,%r814,%r815}, %rd20, %p2, 1, 1; 2026-02-21T08:53:50.4109129Z // end inline asm 2026-02-21T08:53:50.4109289Z // begin inline asm 2026-02-21T08:53:50.4110118Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3760,%r3761,%r3762,%r3763,%r3764,%r3765,%r3766,%r3767,%r3768,%r3769,%r3770,%r3771,%r3772,%r3773,%r3774,%r3775,%r3776,%r3777,%r3778,%r3779,%r3780,%r3781,%r3782,%r3783,%r3784,%r3785,%r3786,%r3787,%r3788,%r3789,%r3790,%r3791}, {%r880,%r881,%r882,%r883}, %rd21, %p2, 1, 1; 2026-02-21T08:53:50.4111007Z // end inline asm 2026-02-21T08:53:50.4111160Z // begin inline asm 2026-02-21T08:53:50.4111984Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r3760,%r3761,%r3762,%r3763,%r3764,%r3765,%r3766,%r3767,%r3768,%r3769,%r3770,%r3771,%r3772,%r3773,%r3774,%r3775,%r3776,%r3777,%r3778,%r3779,%r3780,%r3781,%r3782,%r3783,%r3784,%r3785,%r3786,%r3787,%r3788,%r3789,%r3790,%r3791}, {%r948,%r949,%r950,%r951}, %rd22, %p2, 1, 1; 2026-02-21T08:53:50.4112876Z // end inline asm 2026-02-21T08:53:50.4113027Z // begin inline asm 2026-02-21T08:53:50.4113866Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3760,%r3761,%r3762,%r3763,%r3764,%r3765,%r3766,%r3767,%r3768,%r3769,%r3770,%r3771,%r3772,%r3773,%r3774,%r3775,%r3776,%r3777,%r3778,%r3779,%r3780,%r3781,%r3782,%r3783,%r3784,%r3785,%r3786,%r3787,%r3788,%r3789,%r3790,%r3791}, {%r1016,%r1017,%r1018,%r1019}, %rd23, %p2, 1, 1; 2026-02-21T08:53:50.4114761Z // end inline asm 2026-02-21T08:53:50.4114906Z // begin inline asm 2026-02-21T08:53:50.4115749Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3760,%r3761,%r3762,%r3763,%r3764,%r3765,%r3766,%r3767,%r3768,%r3769,%r3770,%r3771,%r3772,%r3773,%r3774,%r3775,%r3776,%r3777,%r3778,%r3779,%r3780,%r3781,%r3782,%r3783,%r3784,%r3785,%r3786,%r3787,%r3788,%r3789,%r3790,%r3791}, {%r1084,%r1085,%r1086,%r1087}, %rd24, %p2, 1, 1; 2026-02-21T08:53:50.4116766Z // end inline asm 2026-02-21T08:53:50.4116930Z wgmma.commit_group.sync.aligned; 2026-02-21T08:53:50.4117134Z mov.b32 %r1121, 0; 2026-02-21T08:53:50.4117288Z mov.b32 %r1120, %r441; 2026-02-21T08:53:50.4117468Z mov.b32 %r1122, %r1121; 2026-02-21T08:53:50.4117636Z // begin inline asm 2026-02-21T08:53:50.4118292Z // wait for regs: %r3760,%r3761,%r3762,%r3763,%r3764,%r3765,%r3766,%r3767,%r3768,%r3769,%r3770,%r3771,%r3772,%r3773,%r3774,%r3775,%r3776,%r3777,%r3778,%r3779,%r3780,%r3781,%r3782,%r3783,%r3784,%r3785,%r3786,%r3787,%r3788,%r3789,%r3790,%r3791,%r1120,%r1121,%r1122 2026-02-21T08:53:50.4119021Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:53:50.4119214Z // end inline asm 2026-02-21T08:53:50.4119374Z $L__tmp2: 2026-02-21T08:53:50.4119675Z .loc 1 37 103 // 
cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:37:103 2026-02-21T08:53:50.4120045Z add.s32 %r1227, %r3759, 1; 2026-02-21T08:53:50.4120227Z setp.gt.s32 %p13, %r1227, 1; 2026-02-21T08:53:50.4120420Z selp.b32 %r3759, 0, %r1227, %p13; 2026-02-21T08:53:50.4120764Z .loc 1 45 32 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:45:32 2026-02-21T08:53:50.4121120Z add.s64 %rd140, %rd315, -917504; 2026-02-21T08:53:50.4121328Z add.s64 %rd141, %rd315, -786432; 2026-02-21T08:53:50.4121523Z add.s64 %rd142, %rd315, -655360; 2026-02-21T08:53:50.4121716Z add.s64 %rd143, %rd315, -524288; 2026-02-21T08:53:50.4122046Z add.s64 %rd144, %rd315, -393216; 2026-02-21T08:53:50.4122233Z add.s64 %rd145, %rd315, -262144; 2026-02-21T08:53:50.4122415Z add.s64 %rd146, %rd315, -131072; 2026-02-21T08:53:50.4122749Z .loc 1 45 80 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:45:80 2026-02-21T08:53:50.4123124Z shl.b32 %r1228, %r3759, 13; 2026-02-21T08:53:50.4123306Z add.s32 %r1229, %r3732, %r1228; 2026-02-21T08:53:50.4123498Z add.s32 %r1158, %r1229, %r22; 2026-02-21T08:53:50.4123683Z selp.b32 %r1159, 8, 0, %p11; 2026-02-21T08:53:50.4123866Z // begin inline asm 2026-02-21T08:53:50.4124106Z cp.async.ca.shared.global [ %r1158 + 0 ], [ %rd140 + 0 ], 0x8, %r1159; 2026-02-21T08:53:50.4124390Z // end inline asm 2026-02-21T08:53:50.4124544Z add.s32 %r1160, %r1158, 1024; 2026-02-21T08:53:50.4124853Z // begin inline asm 2026-02-21T08:53:50.4125094Z cp.async.ca.shared.global [ %r1160 + 0 ], [ %rd141 + 0 ], 0x8, %r1159; 2026-02-21T08:53:50.4125384Z // end inline asm 2026-02-21T08:53:50.4125548Z add.s32 %r1162, %r1158, 2048; 2026-02-21T08:53:50.4125719Z // begin inline asm 2026-02-21T08:53:50.4125951Z cp.async.ca.shared.global [ %r1162 + 0 ], [ %rd142 + 0 ], 0x8, %r1159; 2026-02-21T08:53:50.4126223Z // end inline asm 2026-02-21T08:53:50.4126377Z add.s32 %r1164, %r1158, 3072; 2026-02-21T08:53:50.4126675Z // begin inline asm 2026-02-21T08:53:50.4126910Z cp.async.ca.shared.global [ 
%r1164 + 0 ], [ %rd143 + 0 ], 0x8, %r1159; 2026-02-21T08:53:50.4127185Z // end inline asm 2026-02-21T08:53:50.4127333Z add.s32 %r1166, %r1158, 4096; 2026-02-21T08:53:50.4127513Z // begin inline asm 2026-02-21T08:53:50.4127739Z cp.async.ca.shared.global [ %r1166 + 0 ], [ %rd144 + 0 ], 0x8, %r1159; 2026-02-21T08:53:50.4128016Z // end inline asm 2026-02-21T08:53:50.4128170Z add.s32 %r1168, %r1158, 5120; 2026-02-21T08:53:50.4128347Z // begin inline asm 2026-02-21T08:53:50.4128574Z cp.async.ca.shared.global [ %r1168 + 0 ], [ %rd145 + 0 ], 0x8, %r1159; 2026-02-21T08:53:50.4128850Z // end inline asm 2026-02-21T08:53:50.4129010Z add.s32 %r1170, %r1158, 6144; 2026-02-21T08:53:50.4129200Z // begin inline asm 2026-02-21T08:53:50.4129436Z cp.async.ca.shared.global [ %r1170 + 0 ], [ %rd146 + 0 ], 0x8, %r1159; 2026-02-21T08:53:50.4129707Z // end inline asm 2026-02-21T08:53:50.4129867Z add.s32 %r1172, %r1158, 7168; 2026-02-21T08:53:50.4130043Z // begin inline asm 2026-02-21T08:53:50.4130277Z cp.async.ca.shared.global [ %r1172 + 0 ], [ %rd315 + 0 ], 0x8, %r1159; 2026-02-21T08:53:50.4130553Z // end inline asm 2026-02-21T08:53:50.4130718Z cp.async.commit_group; 2026-02-21T08:53:50.4131035Z .loc 1 51 34 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:34 2026-02-21T08:53:50.4131395Z cvt.s64.s32 %rd149, %r3757; 2026-02-21T08:53:50.4131586Z add.s64 %rd148, %rd81, %rd149; 2026-02-21T08:53:50.4131902Z .loc 1 51 87 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:87 2026-02-21T08:53:50.4132251Z shl.b32 %r1230, %r3759, 11; 2026-02-21T08:53:50.4132428Z add.s32 %r1174, %r32, %r1230; 2026-02-21T08:53:50.4132613Z selp.b32 %r1175, 16, 0, %p11; 2026-02-21T08:53:50.4132787Z // begin inline asm 2026-02-21T08:53:50.4133027Z cp.async.cg.shared.global [ %r1174 + 0 ], [ %rd148 + 0 ], 0x10, %r1175; 2026-02-21T08:53:50.4133313Z // end inline asm 2026-02-21T08:53:50.4133467Z cp.async.commit_group; 2026-02-21T08:53:50.4133783Z .loc 1 37 103 // 
cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:37:103 2026-02-21T08:53:50.4134142Z add.s32 %r3757, %r3757, 229376; 2026-02-21T08:53:50.4134338Z add.s64 %rd315, %rd315, 128; 2026-02-21T08:53:50.4134521Z setp.lt.u64 %p14, %rd316, 4064; 2026-02-21T08:53:50.4134733Z @%p14 bra $L__BB0_3; 2026-02-21T08:53:50.4134949Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:53:50.4135348Z .loc 1 30 32 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:30:32 2026-02-21T08:53:50.4135707Z or.b32 %r1306, %r128, %r10; 2026-02-21T08:53:50.4136168Z .loc 1 37 103 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:37:103 2026-02-21T08:53:50.4136644Z cp.async.wait_group 0; 2026-02-21T08:53:50.4136817Z bar.sync 0; 2026-02-21T08:53:50.4137126Z .loc 1 84 28 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:84:28 2026-02-21T08:53:50.4137489Z cvt.rn.bf16x2.f32 %r1307, %r3761, %r3760; 2026-02-21T08:53:50.4137726Z cvt.rn.bf16x2.f32 %r1308, %r3763, %r3762; 2026-02-21T08:53:50.4137950Z cvt.rn.bf16x2.f32 %r1309, %r3765, %r3764; 2026-02-21T08:53:50.4138169Z cvt.rn.bf16x2.f32 %r1310, %r3767, %r3766; 2026-02-21T08:53:50.4138402Z cvt.rn.bf16x2.f32 %r1311, %r3769, %r3768; 2026-02-21T08:53:50.4138619Z cvt.rn.bf16x2.f32 %r1312, %r3771, %r3770; 2026-02-21T08:53:50.4138991Z cvt.rn.bf16x2.f32 %r1313, %r3773, %r3772; 2026-02-21T08:53:50.4139214Z cvt.rn.bf16x2.f32 %r1314, %r3775, %r3774; 2026-02-21T08:53:50.4139431Z cvt.rn.bf16x2.f32 %r1315, %r3777, %r3776; 2026-02-21T08:53:50.4139650Z cvt.rn.bf16x2.f32 %r1316, %r3779, %r3778; 2026-02-21T08:53:50.4139868Z cvt.rn.bf16x2.f32 %r1317, %r3781, %r3780; 2026-02-21T08:53:50.4140083Z cvt.rn.bf16x2.f32 %r1318, %r3783, %r3782; 2026-02-21T08:53:50.4140298Z cvt.rn.bf16x2.f32 %r1319, %r3785, %r3784; 2026-02-21T08:53:50.4140516Z cvt.rn.bf16x2.f32 %r1320, %r3787, %r3786; 2026-02-21T08:53:50.4140729Z cvt.rn.bf16x2.f32 %r1321, %r3789, %r3788; 2026-02-21T08:53:50.4140946Z cvt.rn.bf16x2.f32 %r1322, %r3791, %r3790; 
2026-02-21T08:53:50.4141297Z .loc 1 85 50 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:85:50 2026-02-21T08:53:50.4141663Z add.s32 %r1323, %r1306, %r18; 2026-02-21T08:53:50.4141857Z add.s32 %r1324, %r1306, %r19; 2026-02-21T08:53:50.4142037Z add.s32 %r1325, %r1306, %r20; 2026-02-21T08:53:50.4142222Z add.s32 %r1326, %r1306, %r21; 2026-02-21T08:53:50.4142538Z .loc 1 85 22 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:85:22 2026-02-21T08:53:50.4142910Z mad.wide.s32 %rd150, %r1323, 2, %rd82; 2026-02-21T08:53:50.4143126Z mad.wide.s32 %rd151, %r1324, 2, %rd82; 2026-02-21T08:53:50.4143353Z mad.wide.s32 %rd152, %r1325, 2, %rd82; 2026-02-21T08:53:50.4143560Z mad.wide.s32 %rd153, %r1326, 2, %rd82; 2026-02-21T08:53:50.4143906Z .loc 1 85 81 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:85:81 2026-02-21T08:53:50.4144307Z st.shared.v4.b32 [%r62], {%r1307, %r1309, %r1311, %r1313}; 2026-02-21T08:53:50.4144611Z st.shared.v4.b32 [%r62+256], {%r1308, %r1310, %r1312, %r1314}; 2026-02-21T08:53:50.4144914Z st.shared.v4.b32 [%r63], {%r1315, %r1317, %r1319, %r1321}; 2026-02-21T08:53:50.4145205Z st.shared.v4.b32 [%r63+256], {%r1316, %r1318, %r1320, %r1322}; 2026-02-21T08:53:50.4145457Z bar.sync 0; 2026-02-21T08:53:50.4145604Z // begin inline asm 2026-02-21T08:53:50.4145907Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1251, %r1252, %r1253, %r1254}, [%r1235]; 2026-02-21T08:53:50.4146255Z // end inline asm 2026-02-21T08:53:50.4146405Z // begin inline asm 2026-02-21T08:53:50.4146801Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1255, %r1256, %r1257, %r1258}, [%r1240]; 2026-02-21T08:53:50.4147127Z // end inline asm 2026-02-21T08:53:50.4147280Z // begin inline asm 2026-02-21T08:53:50.4147550Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1259, %r1260, %r1261, %r1262}, [%r1245]; 2026-02-21T08:53:50.4147877Z // end inline asm 2026-02-21T08:53:50.4148021Z // begin inline asm 2026-02-21T08:53:50.4148300Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1263, %r1264, 
%r1265, %r1266}, [%r1250]; 2026-02-21T08:53:50.4148700Z // end inline asm 2026-02-21T08:53:50.4148847Z // begin inline asm 2026-02-21T08:53:50.4149075Z st.global.v4.b32 [ %rd150 + 0 ], { %r1251, %r1252, %r1253, %r1254 }; 2026-02-21T08:53:50.4149340Z // end inline asm 2026-02-21T08:53:50.4149491Z // begin inline asm 2026-02-21T08:53:50.4149705Z st.global.v4.b32 [ %rd151 + 0 ], { %r1255, %r1256, %r1257, %r1258 }; 2026-02-21T08:53:50.4150117Z // end inline asm 2026-02-21T08:53:50.4150267Z // begin inline asm 2026-02-21T08:53:50.4150483Z st.global.v4.b32 [ %rd152 + 0 ], { %r1259, %r1260, %r1261, %r1262 }; 2026-02-21T08:53:50.4150744Z // end inline asm 2026-02-21T08:53:50.4150901Z // begin inline asm 2026-02-21T08:53:50.4151120Z st.global.v4.b32 [ %rd153 + 0 ], { %r1263, %r1264, %r1265, %r1266 }; 2026-02-21T08:53:50.4151372Z // end inline asm 2026-02-21T08:53:50.4151671Z .loc 1 29 27 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:29:27 2026-02-21T08:53:50.4152025Z add.s32 %r199, %r128, 64; 2026-02-21T08:53:50.4152346Z .loc 1 30 32 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:30:32 2026-02-21T08:53:50.4152695Z or.b32 %r1327, %r199, %r12; 2026-02-21T08:53:50.4153154Z .loc 1 45 80 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:45:80 2026-02-21T08:53:50.4153505Z bar.sync 0; 2026-02-21T08:53:50.4153648Z mov.b32 %r1268, 8; 2026-02-21T08:53:50.4153813Z // begin inline asm 2026-02-21T08:53:50.4158757Z cp.async.ca.shared.global [ %r23 + 0 ], [ %rd1 + 0 ], 0x8, %r1268; 2026-02-21T08:53:50.4159069Z // end inline asm 2026-02-21T08:53:50.4159244Z // begin inline asm 2026-02-21T08:53:50.4159499Z cp.async.ca.shared.global [ %r24 + 0 ], [ %rd2 + 0 ], 0x8, %r1268; 2026-02-21T08:53:50.4159783Z // end inline asm 2026-02-21T08:53:50.4159948Z // begin inline asm 2026-02-21T08:53:50.4160185Z cp.async.ca.shared.global [ %r25 + 0 ], [ %rd3 + 0 ], 0x8, %r1268; 2026-02-21T08:53:50.4160465Z // end inline asm 2026-02-21T08:53:50.4160618Z // begin inline asm 
2026-02-21T08:53:50.4160844Z cp.async.ca.shared.global [ %r26 + 0 ], [ %rd4 + 0 ], 0x8, %r1268; 2026-02-21T08:53:50.4161118Z // end inline asm 2026-02-21T08:53:50.4161275Z // begin inline asm 2026-02-21T08:53:50.4161507Z cp.async.ca.shared.global [ %r27 + 0 ], [ %rd5 + 0 ], 0x8, %r1268; 2026-02-21T08:53:50.4161770Z // end inline asm 2026-02-21T08:53:50.4161937Z // begin inline asm 2026-02-21T08:53:50.4162161Z cp.async.ca.shared.global [ %r28 + 0 ], [ %rd6 + 0 ], 0x8, %r1268; 2026-02-21T08:53:50.4162426Z // end inline asm 2026-02-21T08:53:50.4162577Z // begin inline asm 2026-02-21T08:53:50.4162793Z cp.async.ca.shared.global [ %r29 + 0 ], [ %rd7 + 0 ], 0x8, %r1268; 2026-02-21T08:53:50.4163062Z // end inline asm 2026-02-21T08:53:50.4163214Z // begin inline asm 2026-02-21T08:53:50.4163434Z cp.async.ca.shared.global [ %r30 + 0 ], [ %rd8 + 0 ], 0x8, %r1268; 2026-02-21T08:53:50.4163693Z // end inline asm 2026-02-21T08:53:50.4163852Z cp.async.commit_group; 2026-02-21T08:53:50.4164184Z .loc 1 51 62 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:62 2026-02-21T08:53:50.4164560Z add.s32 %r1328, %r1327, %r3733; 2026-02-21T08:53:50.4164903Z .loc 1 51 34 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:34 2026-02-21T08:53:50.4165266Z cvt.s64.s32 %rd173, %r1328; 2026-02-21T08:53:50.4165455Z add.s64 %rd162, %rd81, %rd173; 2026-02-21T08:53:50.4165654Z mov.b32 %r1284, 16; 2026-02-21T08:53:50.4165965Z .loc 1 51 87 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:87 2026-02-21T08:53:50.4166316Z // begin inline asm 2026-02-21T08:53:50.4166706Z cp.async.cg.shared.global [ %r32 + 0 ], [ %rd162 + 0 ], 0x10, %r1284; 2026-02-21T08:53:50.4166988Z // end inline asm 2026-02-21T08:53:50.4167171Z cp.async.commit_group; 2026-02-21T08:53:50.4167486Z .loc 1 45 80 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:45:80 2026-02-21T08:53:50.4167850Z bar.sync 0; 2026-02-21T08:53:50.4168003Z // begin inline asm 2026-02-21T08:53:50.4168249Z 
cp.async.ca.shared.global [ %r33 + 0 ], [ %rd9 + 0 ], 0x8, %r1268; 2026-02-21T08:53:50.4168533Z // end inline asm 2026-02-21T08:53:50.4168692Z // begin inline asm 2026-02-21T08:53:50.4168928Z cp.async.ca.shared.global [ %r34 + 0 ], [ %rd10 + 0 ], 0x8, %r1268; 2026-02-21T08:53:50.4169202Z // end inline asm 2026-02-21T08:53:50.4169571Z // begin inline asm 2026-02-21T08:53:50.4169800Z cp.async.ca.shared.global [ %r35 + 0 ], [ %rd11 + 0 ], 0x8, %r1268; 2026-02-21T08:53:50.4170072Z // end inline asm 2026-02-21T08:53:50.4170220Z // begin inline asm 2026-02-21T08:53:50.4170465Z cp.async.ca.shared.global [ %r36 + 0 ], [ %rd12 + 0 ], 0x8, %r1268; 2026-02-21T08:53:50.4170734Z // end inline asm 2026-02-21T08:53:50.4170882Z // begin inline asm 2026-02-21T08:53:50.4171108Z cp.async.ca.shared.global [ %r37 + 0 ], [ %rd13 + 0 ], 0x8, %r1268; 2026-02-21T08:53:50.4171372Z // end inline asm 2026-02-21T08:53:50.4171525Z // begin inline asm 2026-02-21T08:53:50.4171745Z cp.async.ca.shared.global [ %r38 + 0 ], [ %rd14 + 0 ], 0x8, %r1268; 2026-02-21T08:53:50.4172013Z // end inline asm 2026-02-21T08:53:50.4172160Z // begin inline asm 2026-02-21T08:53:50.4172539Z cp.async.ca.shared.global [ %r39 + 0 ], [ %rd15 + 0 ], 0x8, %r1268; 2026-02-21T08:53:50.4172820Z // end inline asm 2026-02-21T08:53:50.4172967Z // begin inline asm 2026-02-21T08:53:50.4173194Z cp.async.ca.shared.global [ %r40 + 0 ], [ %rd16 + 0 ], 0x8, %r1268; 2026-02-21T08:53:50.4173456Z // end inline asm 2026-02-21T08:53:50.4173628Z cp.async.commit_group; 2026-02-21T08:53:50.4173957Z .loc 1 51 62 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:62 2026-02-21T08:53:50.4174333Z add.s32 %r1329, %r1327, %r41; 2026-02-21T08:53:50.4174657Z .loc 1 51 34 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:34 2026-02-21T08:53:50.4175021Z cvt.s64.s32 %rd174, %r1329; 2026-02-21T08:53:50.4175213Z add.s64 %rd171, %rd81, %rd174; 2026-02-21T08:53:50.4175553Z .loc 1 51 87 // 
cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:87 2026-02-21T08:53:50.4175907Z // begin inline asm 2026-02-21T08:53:50.4176149Z cp.async.cg.shared.global [ %r42 + 0 ], [ %rd171 + 0 ], 0x10, %r1284; 2026-02-21T08:53:50.4176428Z // end inline asm 2026-02-21T08:53:50.4176724Z cp.async.commit_group; 2026-02-21T08:53:50.4176916Z mov.b32 %r3795, 0f00000000; 2026-02-21T08:53:50.4177100Z mov.b32 %r3794, 1; 2026-02-21T08:53:50.4177260Z mov.b32 %r3793, -1; 2026-02-21T08:53:50.4177426Z mov.b64 %rd318, -32; 2026-02-21T08:53:50.4177589Z mov.b64 %rd317, %rd27; 2026-02-21T08:53:50.4177764Z mov.b32 %r3792, %r3755; 2026-02-21T08:53:50.4177932Z mov.b32 %r3796, %r3795; 2026-02-21T08:53:50.4178102Z mov.b32 %r3797, %r3795; 2026-02-21T08:53:50.4178269Z mov.b32 %r3798, %r3795; 2026-02-21T08:53:50.4178438Z mov.b32 %r3799, %r3795; 2026-02-21T08:53:50.4178598Z mov.b32 %r3800, %r3795; 2026-02-21T08:53:50.4178764Z mov.b32 %r3801, %r3795; 2026-02-21T08:53:50.4178930Z mov.b32 %r3802, %r3795; 2026-02-21T08:53:50.4179100Z mov.b32 %r3803, %r3795; 2026-02-21T08:53:50.4179266Z mov.b32 %r3804, %r3795; 2026-02-21T08:53:50.4179430Z mov.b32 %r3805, %r3795; 2026-02-21T08:53:50.4179602Z mov.b32 %r3806, %r3795; 2026-02-21T08:53:50.4179765Z mov.b32 %r3807, %r3795; 2026-02-21T08:53:50.4179948Z mov.b32 %r3808, %r3795; 2026-02-21T08:53:50.4180119Z mov.b32 %r3809, %r3795; 2026-02-21T08:53:50.4180286Z mov.b32 %r3810, %r3795; 2026-02-21T08:53:50.4180448Z mov.b32 %r3811, %r3795; 2026-02-21T08:53:50.4180611Z mov.b32 %r3812, %r3795; 2026-02-21T08:53:50.4180769Z mov.b32 %r3813, %r3795; 2026-02-21T08:53:50.4180935Z mov.b32 %r3814, %r3795; 2026-02-21T08:53:50.4181100Z mov.b32 %r3815, %r3795; 2026-02-21T08:53:50.4181260Z mov.b32 %r3816, %r3795; 2026-02-21T08:53:50.4181423Z mov.b32 %r3817, %r3795; 2026-02-21T08:53:50.4181584Z mov.b32 %r3818, %r3795; 2026-02-21T08:53:50.4181752Z mov.b32 %r3819, %r3795; 2026-02-21T08:53:50.4181920Z mov.b32 %r3820, %r3795; 2026-02-21T08:53:50.4182089Z mov.b32 %r3821, %r3795; 
2026-02-21T08:53:50.4182249Z mov.b32 %r3822, %r3795; 2026-02-21T08:53:50.4182418Z mov.b32 %r3823, %r3795; 2026-02-21T08:53:50.4182580Z mov.b32 %r3824, %r3795; 2026-02-21T08:53:50.4182745Z mov.b32 %r3825, %r3795; 2026-02-21T08:53:50.4182915Z mov.b32 %r3826, %r3795; 2026-02-21T08:53:50.4183290Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T08:53:50.4183596Z // => This Inner Loop Header: Depth=2 2026-02-21T08:53:50.4184005Z .loc 1 37 103 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:37:103 2026-02-21T08:53:50.4184378Z add.s64 %rd318, %rd318, 32; 2026-02-21T08:53:50.4184568Z setp.lt.u64 %p24, %rd318, 4032; 2026-02-21T08:53:50.4184767Z add.s32 %r1962, %r3793, 1; 2026-02-21T08:53:50.4184954Z setp.gt.s32 %p25, %r1962, 1; 2026-02-21T08:53:50.4185143Z selp.b32 %r3793, 0, %r1962, %p25; 2026-02-21T08:53:50.4185485Z .loc 1 45 80 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:45:80 2026-02-21T08:53:50.4185839Z cp.async.wait_group 2; 2026-02-21T08:53:50.4186147Z bar.sync 0; 2026-02-21T08:53:50.4186304Z shl.b32 %r1963, %r3793, 13; 2026-02-21T08:53:50.4186604Z add.s32 %r1965, %r3732, %r1963; 2026-02-21T08:53:50.4186944Z .loc 1 49 32 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:49:32 2026-02-21T08:53:50.4187024Z add.s32 %r1966, %r1965, %r68; 2026-02-21T08:53:50.4187093Z ld.shared.b16 %rs193, [%r1966]; 2026-02-21T08:53:50.4187165Z ld.shared.b16 %rs194, [%r1966+1024]; 2026-02-21T08:53:50.4187240Z ld.shared.b16 %rs195, [%r1966+64]; 2026-02-21T08:53:50.4187309Z ld.shared.b16 %rs196, [%r1966+1088]; 2026-02-21T08:53:50.4187373Z add.s32 %r1967, %r1965, %r69; 2026-02-21T08:53:50.4187439Z ld.shared.b16 %rs197, [%r1967]; 2026-02-21T08:53:50.4187510Z ld.shared.b16 %rs198, [%r1967+1024]; 2026-02-21T08:53:50.4187578Z ld.shared.b16 %rs199, [%r1967+64]; 2026-02-21T08:53:50.4187645Z ld.shared.b16 %rs200, [%r1967+1088]; 2026-02-21T08:53:50.4187716Z add.s32 %r1968, %r1965, %r70; 2026-02-21T08:53:50.4187787Z ld.shared.b16 %rs201, [%r1968]; 
2026-02-21T08:53:50.4187857Z ld.shared.b16 %rs202, [%r1968+1024]; 2026-02-21T08:53:50.4187930Z ld.shared.b16 %rs203, [%r1968+64]; 2026-02-21T08:53:50.4188001Z ld.shared.b16 %rs204, [%r1968+1088]; 2026-02-21T08:53:50.4188064Z add.s32 %r1969, %r1965, %r71; 2026-02-21T08:53:50.4188132Z ld.shared.b16 %rs205, [%r1969]; 2026-02-21T08:53:50.4188212Z ld.shared.b16 %rs206, [%r1969+1024]; 2026-02-21T08:53:50.4188279Z ld.shared.b16 %rs207, [%r1969+64]; 2026-02-21T08:53:50.4188347Z ld.shared.b16 %rs208, [%r1969+1088]; 2026-02-21T08:53:50.4188415Z add.s32 %r1970, %r1965, %r72; 2026-02-21T08:53:50.4188482Z ld.shared.b16 %rs209, [%r1970]; 2026-02-21T08:53:50.4188646Z ld.shared.b16 %rs210, [%r1970+1024]; 2026-02-21T08:53:50.4188718Z ld.shared.b16 %rs211, [%r1970+64]; 2026-02-21T08:53:50.4188793Z ld.shared.b16 %rs212, [%r1970+1088]; 2026-02-21T08:53:50.4188855Z add.s32 %r1971, %r1965, %r73; 2026-02-21T08:53:50.4188922Z ld.shared.b16 %rs213, [%r1971]; 2026-02-21T08:53:50.4189001Z ld.shared.b16 %rs214, [%r1971+1024]; 2026-02-21T08:53:50.4189068Z ld.shared.b16 %rs215, [%r1971+64]; 2026-02-21T08:53:50.4189136Z ld.shared.b16 %rs216, [%r1971+1088]; 2026-02-21T08:53:50.4189206Z add.s32 %r1972, %r1965, %r74; 2026-02-21T08:53:50.4189274Z ld.shared.b16 %rs217, [%r1972]; 2026-02-21T08:53:50.4189342Z ld.shared.b16 %rs218, [%r1972+1024]; 2026-02-21T08:53:50.4189409Z ld.shared.b16 %rs219, [%r1972+64]; 2026-02-21T08:53:50.4189479Z ld.shared.b16 %rs220, [%r1972+1088]; 2026-02-21T08:53:50.4189542Z add.s32 %r1973, %r1965, %r75; 2026-02-21T08:53:50.4189608Z ld.shared.b16 %rs221, [%r1973]; 2026-02-21T08:53:50.4189687Z ld.shared.b16 %rs222, [%r1973+1024]; 2026-02-21T08:53:50.4189753Z ld.shared.b16 %rs223, [%r1973+64]; 2026-02-21T08:53:50.4189822Z ld.shared.b16 %rs224, [%r1973+1088]; 2026-02-21T08:53:50.4189889Z cvt.f32.bf16 %r1394, %rs193; 2026-02-21T08:53:50.4189958Z cvt.f32.bf16 %r1395, %rs194; 2026-02-21T08:53:50.4190020Z cvt.f32.bf16 %r1396, %rs197; 2026-02-21T08:53:50.4190084Z cvt.f32.bf16 %r1397, 
%rs198; 2026-02-21T08:53:50.4190155Z cvt.f32.bf16 %r1462, %rs201; 2026-02-21T08:53:50.4190224Z cvt.f32.bf16 %r1463, %rs202; 2026-02-21T08:53:50.4190459Z cvt.f32.bf16 %r1464, %rs205; 2026-02-21T08:53:50.4190523Z cvt.f32.bf16 %r1465, %rs206; 2026-02-21T08:53:50.4190591Z cvt.f32.bf16 %r1530, %rs209; 2026-02-21T08:53:50.4190654Z cvt.f32.bf16 %r1531, %rs210; 2026-02-21T08:53:50.4190728Z cvt.f32.bf16 %r1532, %rs213; 2026-02-21T08:53:50.4190798Z cvt.f32.bf16 %r1533, %rs214; 2026-02-21T08:53:50.4190863Z cvt.f32.bf16 %r1598, %rs217; 2026-02-21T08:53:50.4190925Z cvt.f32.bf16 %r1599, %rs218; 2026-02-21T08:53:50.4190993Z cvt.f32.bf16 %r1600, %rs221; 2026-02-21T08:53:50.4191055Z cvt.f32.bf16 %r1601, %rs222; 2026-02-21T08:53:50.4191117Z cvt.f32.bf16 %r1666, %rs195; 2026-02-21T08:53:50.4191179Z cvt.f32.bf16 %r1667, %rs196; 2026-02-21T08:53:50.4191248Z cvt.f32.bf16 %r1668, %rs199; 2026-02-21T08:53:50.4191309Z cvt.f32.bf16 %r1669, %rs200; 2026-02-21T08:53:50.4191513Z cvt.f32.bf16 %r1734, %rs203; 2026-02-21T08:53:50.4191586Z cvt.f32.bf16 %r1735, %rs204; 2026-02-21T08:53:50.4191648Z cvt.f32.bf16 %r1736, %rs207; 2026-02-21T08:53:50.4191718Z cvt.f32.bf16 %r1737, %rs208; 2026-02-21T08:53:50.4191790Z cvt.f32.bf16 %r1802, %rs211; 2026-02-21T08:53:50.4191859Z cvt.f32.bf16 %r1803, %rs212; 2026-02-21T08:53:50.4191923Z cvt.f32.bf16 %r1804, %rs215; 2026-02-21T08:53:50.4191990Z cvt.f32.bf16 %r1805, %rs216; 2026-02-21T08:53:50.4192055Z cvt.f32.bf16 %r1870, %rs219; 2026-02-21T08:53:50.4192117Z cvt.f32.bf16 %r1871, %rs220; 2026-02-21T08:53:50.4192180Z cvt.f32.bf16 %r1872, %rs223; 2026-02-21T08:53:50.4192242Z cvt.f32.bf16 %r1873, %rs224; 2026-02-21T08:53:50.4192464Z .loc 1 51 87 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:87 2026-02-21T08:53:50.4192530Z shl.b32 %r1974, %r3793, 11; 2026-02-21T08:53:50.4192595Z add.s32 %r1975, %r3732, %r1974; 2026-02-21T08:53:50.4192668Z add.s32 %r1976, %r1975, 32768; 2026-02-21T08:53:50.4192870Z .loc 1 64 45 // 
cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:64:45 2026-02-21T08:53:50.4192933Z add.s32 %r1977, %r1976, %r3738; 2026-02-21T08:53:50.4193009Z ld.shared.b8 %rs225, [%r1977]; 2026-02-21T08:53:50.4193078Z ld.shared.b8 %rs226, [%r1977+64]; 2026-02-21T08:53:50.4193147Z ld.shared.b8 %rs227, [%r1977+128]; 2026-02-21T08:53:50.4193214Z ld.shared.b8 %rs228, [%r1977+192]; 2026-02-21T08:53:50.4193289Z ld.shared.b8 %rs229, [%r1977+256]; 2026-02-21T08:53:50.4193354Z ld.shared.b8 %rs230, [%r1977+320]; 2026-02-21T08:53:50.4193418Z ld.shared.b8 %rs231, [%r1977+384]; 2026-02-21T08:53:50.4193489Z ld.shared.b8 %rs232, [%r1977+448]; 2026-02-21T08:53:50.4193557Z ld.shared.b8 %rs233, [%r1977+512]; 2026-02-21T08:53:50.4193624Z ld.shared.b8 %rs234, [%r1977+576]; 2026-02-21T08:53:50.4193692Z ld.shared.b8 %rs235, [%r1977+640]; 2026-02-21T08:53:50.4193763Z ld.shared.b8 %rs236, [%r1977+704]; 2026-02-21T08:53:50.4193830Z ld.shared.b8 %rs237, [%r1977+768]; 2026-02-21T08:53:50.4193897Z ld.shared.b8 %rs238, [%r1977+832]; 2026-02-21T08:53:50.4193969Z ld.shared.b8 %rs239, [%r1977+896]; 2026-02-21T08:53:50.4194033Z add.s32 %r1978, %r1976, %r3739; 2026-02-21T08:53:50.4194104Z ld.shared.b8 %rs240, [%r1978]; 2026-02-21T08:53:50.4194178Z ld.shared.b8 %rs241, [%r1977+1024]; 2026-02-21T08:53:50.4194247Z ld.shared.b8 %rs242, [%r1977+1088]; 2026-02-21T08:53:50.4194314Z ld.shared.b8 %rs243, [%r1977+1152]; 2026-02-21T08:53:50.4194380Z ld.shared.b8 %rs244, [%r1977+1216]; 2026-02-21T08:53:50.4194451Z ld.shared.b8 %rs245, [%r1977+1280]; 2026-02-21T08:53:50.4194517Z ld.shared.b8 %rs246, [%r1977+1344]; 2026-02-21T08:53:50.4194583Z ld.shared.b8 %rs247, [%r1977+1408]; 2026-02-21T08:53:50.4194650Z ld.shared.b8 %rs248, [%r1977+1472]; 2026-02-21T08:53:50.4194716Z ld.shared.b8 %rs249, [%r1977+1536]; 2026-02-21T08:53:50.4194781Z ld.shared.b8 %rs250, [%r1977+1600]; 2026-02-21T08:53:50.4194850Z ld.shared.b8 %rs251, [%r1977+1664]; 2026-02-21T08:53:50.4194922Z ld.shared.b8 %rs252, [%r1977+1728]; 
2026-02-21T08:53:50.4194987Z ld.shared.b8 %rs253, [%r1977+1792]; 2026-02-21T08:53:50.4195054Z ld.shared.b8 %rs254, [%r1977+1856]; 2026-02-21T08:53:50.4195228Z ld.shared.b8 %rs255, [%r1977+1920]; 2026-02-21T08:53:50.4195293Z add.s32 %r1979, %r1976, %r3740; 2026-02-21T08:53:50.4195361Z ld.shared.b8 %rs256, [%r1979]; 2026-02-21T08:53:50.4195571Z .loc 1 54 28 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:54:28 2026-02-21T08:53:50.4195640Z shl.b16 %rs257, %rs225, 4; 2026-02-21T08:53:50.4195703Z shl.b16 %rs258, %rs226, 4; 2026-02-21T08:53:50.4195771Z shl.b16 %rs259, %rs227, 4; 2026-02-21T08:53:50.4195836Z shl.b16 %rs260, %rs228, 4; 2026-02-21T08:53:50.4195911Z shl.b16 %rs261, %rs229, 4; 2026-02-21T08:53:50.4195973Z shl.b16 %rs262, %rs230, 4; 2026-02-21T08:53:50.4196039Z shl.b16 %rs263, %rs231, 4; 2026-02-21T08:53:50.4196102Z shl.b16 %rs264, %rs232, 4; 2026-02-21T08:53:50.4196258Z shl.b16 %rs265, %rs233, 4; 2026-02-21T08:53:50.4196334Z shl.b16 %rs266, %rs234, 4; 2026-02-21T08:53:50.4196395Z shl.b16 %rs267, %rs235, 4; 2026-02-21T08:53:50.4196580Z shl.b16 %rs268, %rs236, 4; 2026-02-21T08:53:50.4196656Z shl.b16 %rs269, %rs237, 4; 2026-02-21T08:53:50.4196727Z shl.b16 %rs270, %rs238, 4; 2026-02-21T08:53:50.4196789Z shl.b16 %rs271, %rs239, 4; 2026-02-21T08:53:50.4196852Z shl.b16 %rs272, %rs240, 4; 2026-02-21T08:53:50.4196916Z shl.b16 %rs273, %rs241, 4; 2026-02-21T08:53:50.4196977Z shl.b16 %rs274, %rs242, 4; 2026-02-21T08:53:50.4197039Z shl.b16 %rs275, %rs243, 4; 2026-02-21T08:53:50.4197105Z shl.b16 %rs276, %rs244, 4; 2026-02-21T08:53:50.4197171Z shl.b16 %rs277, %rs245, 4; 2026-02-21T08:53:50.4197233Z shl.b16 %rs278, %rs246, 4; 2026-02-21T08:53:50.4197295Z shl.b16 %rs279, %rs247, 4; 2026-02-21T08:53:50.4197360Z shl.b16 %rs280, %rs248, 4; 2026-02-21T08:53:50.4197421Z shl.b16 %rs281, %rs249, 4; 2026-02-21T08:53:50.4197484Z shl.b16 %rs282, %rs250, 4; 2026-02-21T08:53:50.4197554Z shl.b16 %rs283, %rs251, 4; 2026-02-21T08:53:50.4197615Z shl.b16 %rs284, %rs252, 4; 
2026-02-21T08:53:50.4197677Z shl.b16 %rs285, %rs253, 4; 2026-02-21T08:53:50.4197738Z shl.b16 %rs286, %rs254, 4; 2026-02-21T08:53:50.4197806Z shl.b16 %rs287, %rs255, 4; 2026-02-21T08:53:50.4197868Z shl.b16 %rs288, %rs256, 4; 2026-02-21T08:53:50.4198067Z .loc 1 69 58 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:69:58 2026-02-21T08:53:50.4198148Z selp.b16 %rs289, %rs257, %rs225, %p57; 2026-02-21T08:53:50.4198213Z cvt.s16.s8 %rs290, %rs289; 2026-02-21T08:53:50.4198274Z shr.s16 %rs291, %rs290, 4; 2026-02-21T08:53:50.4198344Z selp.b16 %rs292, %rs258, %rs226, %p57; 2026-02-21T08:53:50.4198410Z cvt.s16.s8 %rs293, %rs292; 2026-02-21T08:53:50.4198472Z shr.s16 %rs294, %rs293, 4; 2026-02-21T08:53:50.4198541Z selp.b16 %rs295, %rs259, %rs227, %p57; 2026-02-21T08:53:50.4198607Z cvt.s16.s8 %rs296, %rs295; 2026-02-21T08:53:50.4198668Z shr.s16 %rs297, %rs296, 4; 2026-02-21T08:53:50.4198738Z selp.b16 %rs298, %rs260, %rs228, %p57; 2026-02-21T08:53:50.4198800Z cvt.s16.s8 %rs299, %rs298; 2026-02-21T08:53:50.4198866Z shr.s16 %rs300, %rs299, 4; 2026-02-21T08:53:50.4198942Z selp.b16 %rs301, %rs261, %rs229, %p57; 2026-02-21T08:53:50.4199004Z cvt.s16.s8 %rs302, %rs301; 2026-02-21T08:53:50.4199070Z shr.s16 %rs303, %rs302, 4; 2026-02-21T08:53:50.4199138Z selp.b16 %rs304, %rs262, %rs230, %p57; 2026-02-21T08:53:50.4199201Z cvt.s16.s8 %rs305, %rs304; 2026-02-21T08:53:50.4199277Z shr.s16 %rs306, %rs305, 4; 2026-02-21T08:53:50.4199349Z selp.b16 %rs307, %rs263, %rs231, %p57; 2026-02-21T08:53:50.4199412Z cvt.s16.s8 %rs308, %rs307; 2026-02-21T08:53:50.4199474Z shr.s16 %rs309, %rs308, 4; 2026-02-21T08:53:50.4199553Z selp.b16 %rs310, %rs264, %rs232, %p57; 2026-02-21T08:53:50.4199616Z cvt.s16.s8 %rs311, %rs310; 2026-02-21T08:53:50.4199678Z shr.s16 %rs312, %rs311, 4; 2026-02-21T08:53:50.4199751Z selp.b16 %rs313, %rs265, %rs233, %p57; 2026-02-21T08:53:50.4199815Z cvt.s16.s8 %rs314, %rs313; 2026-02-21T08:53:50.4199877Z shr.s16 %rs315, %rs314, 4; 2026-02-21T08:53:50.4199945Z selp.b16 %rs316, 
%rs266, %rs234, %p57; 2026-02-21T08:53:50.4200013Z cvt.s16.s8 %rs317, %rs316; 2026-02-21T08:53:50.4200218Z shr.s16 %rs318, %rs317, 4; 2026-02-21T08:53:50.4200291Z selp.b16 %rs319, %rs267, %rs235, %p57; 2026-02-21T08:53:50.4200358Z cvt.s16.s8 %rs320, %rs319; 2026-02-21T08:53:50.4200419Z shr.s16 %rs321, %rs320, 4; 2026-02-21T08:53:50.4200488Z selp.b16 %rs322, %rs268, %rs236, %p57; 2026-02-21T08:53:50.4200553Z cvt.s16.s8 %rs323, %rs322; 2026-02-21T08:53:50.4200617Z shr.s16 %rs324, %rs323, 4; 2026-02-21T08:53:50.4200685Z selp.b16 %rs325, %rs269, %rs237, %p57; 2026-02-21T08:53:50.4200748Z cvt.s16.s8 %rs326, %rs325; 2026-02-21T08:53:50.4200812Z shr.s16 %rs327, %rs326, 4; 2026-02-21T08:53:50.4200880Z selp.b16 %rs328, %rs270, %rs238, %p57; 2026-02-21T08:53:50.4200942Z cvt.s16.s8 %rs329, %rs328; 2026-02-21T08:53:50.4201004Z shr.s16 %rs330, %rs329, 4; 2026-02-21T08:53:50.4201199Z selp.b16 %rs331, %rs271, %rs239, %p57; 2026-02-21T08:53:50.4201264Z cvt.s16.s8 %rs332, %rs331; 2026-02-21T08:53:50.4201326Z shr.s16 %rs333, %rs332, 4; 2026-02-21T08:53:50.4201403Z selp.b16 %rs334, %rs272, %rs240, %p57; 2026-02-21T08:53:50.4201466Z cvt.s16.s8 %rs335, %rs334; 2026-02-21T08:53:50.4201529Z shr.s16 %rs336, %rs335, 4; 2026-02-21T08:53:50.4201602Z selp.b16 %rs337, %rs273, %rs241, %p57; 2026-02-21T08:53:50.4201665Z cvt.s16.s8 %rs338, %rs337; 2026-02-21T08:53:50.4201726Z shr.s16 %rs339, %rs338, 4; 2026-02-21T08:53:50.4201794Z selp.b16 %rs340, %rs274, %rs242, %p57; 2026-02-21T08:53:50.4201871Z cvt.s16.s8 %rs341, %rs340; 2026-02-21T08:53:50.4201943Z shr.s16 %rs342, %rs341, 4; 2026-02-21T08:53:50.4202012Z selp.b16 %rs343, %rs275, %rs243, %p57; 2026-02-21T08:53:50.4202083Z cvt.s16.s8 %rs344, %rs343; 2026-02-21T08:53:50.4202145Z shr.s16 %rs345, %rs344, 4; 2026-02-21T08:53:50.4202214Z selp.b16 %rs346, %rs276, %rs244, %p57; 2026-02-21T08:53:50.4202278Z cvt.s16.s8 %rs347, %rs346; 2026-02-21T08:53:50.4202347Z shr.s16 %rs348, %rs347, 4; 2026-02-21T08:53:50.4202414Z selp.b16 %rs349, %rs277, %rs245, 
%p57; 2026-02-21T08:53:50.4202478Z cvt.s16.s8 %rs350, %rs349; 2026-02-21T08:53:50.4202549Z shr.s16 %rs351, %rs350, 4; 2026-02-21T08:53:50.4202618Z selp.b16 %rs352, %rs278, %rs246, %p57; 2026-02-21T08:53:50.4202678Z cvt.s16.s8 %rs353, %rs352; 2026-02-21T08:53:50.4202739Z shr.s16 %rs354, %rs353, 4; 2026-02-21T08:53:50.4202811Z selp.b16 %rs355, %rs279, %rs247, %p57; 2026-02-21T08:53:50.4202873Z cvt.s16.s8 %rs356, %rs355; 2026-02-21T08:53:50.4202934Z shr.s16 %rs357, %rs356, 4; 2026-02-21T08:53:50.4203007Z selp.b16 %rs358, %rs280, %rs248, %p57; 2026-02-21T08:53:50.4203071Z cvt.s16.s8 %rs359, %rs358; 2026-02-21T08:53:50.4203131Z shr.s16 %rs360, %rs359, 4; 2026-02-21T08:53:50.4203199Z selp.b16 %rs361, %rs281, %rs249, %p57; 2026-02-21T08:53:50.4203266Z cvt.s16.s8 %rs362, %rs361; 2026-02-21T08:53:50.4203331Z shr.s16 %rs363, %rs362, 4; 2026-02-21T08:53:50.4203402Z selp.b16 %rs364, %rs282, %rs250, %p57; 2026-02-21T08:53:50.4203472Z cvt.s16.s8 %rs365, %rs364; 2026-02-21T08:53:50.4203533Z shr.s16 %rs366, %rs365, 4; 2026-02-21T08:53:50.4203606Z selp.b16 %rs367, %rs283, %rs251, %p57; 2026-02-21T08:53:50.4203671Z cvt.s16.s8 %rs368, %rs367; 2026-02-21T08:53:50.4203733Z shr.s16 %rs369, %rs368, 4; 2026-02-21T08:53:50.4203801Z selp.b16 %rs370, %rs284, %rs252, %p57; 2026-02-21T08:53:50.4203864Z cvt.s16.s8 %rs371, %rs370; 2026-02-21T08:53:50.4203931Z shr.s16 %rs372, %rs371, 4; 2026-02-21T08:53:50.4203998Z selp.b16 %rs373, %rs285, %rs253, %p57; 2026-02-21T08:53:50.4204060Z cvt.s16.s8 %rs374, %rs373; 2026-02-21T08:53:50.4204127Z shr.s16 %rs375, %rs374, 4; 2026-02-21T08:53:50.4204195Z selp.b16 %rs376, %rs286, %rs254, %p57; 2026-02-21T08:53:50.4204257Z cvt.s16.s8 %rs377, %rs376; 2026-02-21T08:53:50.4204320Z shr.s16 %rs378, %rs377, 4; 2026-02-21T08:53:50.4204392Z selp.b16 %rs379, %rs287, %rs255, %p57; 2026-02-21T08:53:50.4204458Z cvt.s16.s8 %rs380, %rs379; 2026-02-21T08:53:50.4204519Z shr.s16 %rs381, %rs380, 4; 2026-02-21T08:53:50.4204591Z selp.b16 %rs382, %rs288, %rs256, %p57; 
2026-02-21T08:53:50.4204653Z cvt.s16.s8 %rs383, %rs382; 2026-02-21T08:53:50.4204828Z shr.s16 %rs384, %rs383, 4; 2026-02-21T08:53:50.4205040Z .loc 1 74 32 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:74:32 2026-02-21T08:53:50.4205112Z cvt.rn.f32.s16 %r1980, %rs291; 2026-02-21T08:53:50.4205176Z cvt.rn.f32.s16 %r1981, %rs294; 2026-02-21T08:53:50.4205240Z cvt.rn.f32.s16 %r1982, %rs297; 2026-02-21T08:53:50.4205307Z cvt.rn.f32.s16 %r1983, %rs300; 2026-02-21T08:53:50.4205371Z cvt.rn.f32.s16 %r1984, %rs303; 2026-02-21T08:53:50.4205434Z cvt.rn.f32.s16 %r1985, %rs306; 2026-02-21T08:53:50.4205500Z cvt.rn.f32.s16 %r1986, %rs309; 2026-02-21T08:53:50.4205563Z cvt.rn.f32.s16 %r1987, %rs312; 2026-02-21T08:53:50.4205625Z cvt.rn.f32.s16 %r1988, %rs315; 2026-02-21T08:53:50.4205779Z cvt.rn.f32.s16 %r1989, %rs318; 2026-02-21T08:53:50.4205849Z cvt.rn.f32.s16 %r1990, %rs321; 2026-02-21T08:53:50.4205911Z cvt.rn.f32.s16 %r1991, %rs324; 2026-02-21T08:53:50.4205973Z cvt.rn.f32.s16 %r1992, %rs327; 2026-02-21T08:53:50.4206043Z cvt.rn.f32.s16 %r1993, %rs330; 2026-02-21T08:53:50.4206105Z cvt.rn.f32.s16 %r1994, %rs333; 2026-02-21T08:53:50.4206169Z cvt.rn.f32.s16 %r1995, %rs336; 2026-02-21T08:53:50.4206232Z cvt.rn.f32.s16 %r1996, %rs339; 2026-02-21T08:53:50.4206302Z cvt.rn.f32.s16 %r1997, %rs342; 2026-02-21T08:53:50.4206368Z cvt.rn.f32.s16 %r1998, %rs345; 2026-02-21T08:53:50.4206429Z cvt.rn.f32.s16 %r1999, %rs348; 2026-02-21T08:53:50.4206616Z cvt.rn.f32.s16 %r2000, %rs351; 2026-02-21T08:53:50.4206681Z cvt.rn.f32.s16 %r2001, %rs354; 2026-02-21T08:53:50.4206745Z cvt.rn.f32.s16 %r2002, %rs357; 2026-02-21T08:53:50.4206807Z cvt.rn.f32.s16 %r2003, %rs360; 2026-02-21T08:53:50.4206875Z cvt.rn.f32.s16 %r2004, %rs363; 2026-02-21T08:53:50.4206938Z cvt.rn.f32.s16 %r2005, %rs366; 2026-02-21T08:53:50.4207003Z cvt.rn.f32.s16 %r2006, %rs369; 2026-02-21T08:53:50.4207071Z cvt.rn.f32.s16 %r2007, %rs372; 2026-02-21T08:53:50.4207145Z cvt.rn.f32.s16 %r2008, %rs375; 2026-02-21T08:53:50.4207211Z 
cvt.rn.f32.s16 %r2009, %rs378; 2026-02-21T08:53:50.4207281Z cvt.rn.f32.s16 %r2010, %rs381; 2026-02-21T08:53:50.4207345Z cvt.rn.f32.s16 %r2011, %rs384; 2026-02-21T08:53:50.4207424Z st.shared.b32 [%r54], %r1980; 2026-02-21T08:53:50.4207496Z st.shared.b32 [%r54+8], %r1981; 2026-02-21T08:53:50.4207569Z st.shared.b32 [%r54+8192], %r1996; 2026-02-21T08:53:50.4207637Z st.shared.b32 [%r54+8200], %r1997; 2026-02-21T08:53:50.4207701Z st.shared.b32 [%r55], %r1982; 2026-02-21T08:53:50.4207770Z st.shared.b32 [%r55+8], %r1983; 2026-02-21T08:53:50.4207834Z st.shared.b32 [%r55+8192], %r1998; 2026-02-21T08:53:50.4207898Z st.shared.b32 [%r55+8200], %r1999; 2026-02-21T08:53:50.4207962Z st.shared.b32 [%r56], %r1984; 2026-02-21T08:53:50.4208031Z st.shared.b32 [%r56+8], %r1985; 2026-02-21T08:53:50.4208098Z st.shared.b32 [%r56+8192], %r2000; 2026-02-21T08:53:50.4208164Z st.shared.b32 [%r56+8200], %r2001; 2026-02-21T08:53:50.4208233Z st.shared.b32 [%r57], %r1986; 2026-02-21T08:53:50.4208296Z st.shared.b32 [%r57+8], %r1987; 2026-02-21T08:53:50.4208365Z st.shared.b32 [%r57+8192], %r2002; 2026-02-21T08:53:50.4208430Z st.shared.b32 [%r57+8200], %r2003; 2026-02-21T08:53:50.4208498Z st.shared.b32 [%r58], %r1988; 2026-02-21T08:53:50.4208561Z st.shared.b32 [%r58+8], %r1989; 2026-02-21T08:53:50.4208624Z st.shared.b32 [%r58+8192], %r2004; 2026-02-21T08:53:50.4208701Z st.shared.b32 [%r58+8200], %r2005; 2026-02-21T08:53:50.4208769Z st.shared.b32 [%r59], %r1990; 2026-02-21T08:53:50.4208833Z st.shared.b32 [%r59+8], %r1991; 2026-02-21T08:53:50.4208904Z st.shared.b32 [%r59+8192], %r2006; 2026-02-21T08:53:50.4208968Z st.shared.b32 [%r59+8200], %r2007; 2026-02-21T08:53:50.4209031Z st.shared.b32 [%r60], %r1992; 2026-02-21T08:53:50.4209094Z st.shared.b32 [%r60+8], %r1993; 2026-02-21T08:53:50.4209165Z st.shared.b32 [%r60+8192], %r2008; 2026-02-21T08:53:50.4209230Z st.shared.b32 [%r60+8200], %r2009; 2026-02-21T08:53:50.4209293Z st.shared.b32 [%r61], %r1994; 2026-02-21T08:53:50.4209363Z st.shared.b32 [%r61+8], 
%r1995; 2026-02-21T08:53:50.4209585Z st.shared.b32 [%r61+8192], %r2010; 2026-02-21T08:53:50.4209649Z st.shared.b32 [%r61+8200], %r2011; 2026-02-21T08:53:50.4209707Z $L__tmp3: 2026-02-21T08:53:50.4209993Z .loc 2 291 36 // standard.py:291:36 @[ cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:81:40 ] 2026-02-21T08:53:50.4210057Z // begin inline asm 2026-02-21T08:53:50.4210144Z fence.proxy.async.shared::cta; 2026-02-21T08:53:50.4210208Z // end inline asm 2026-02-21T08:53:50.4210266Z bar.sync 0; 2026-02-21T08:53:50.4210350Z shfl.sync.idx.b32 %r2012, %r6, 0, 31, -1; 2026-02-21T08:53:50.4210430Z wgmma.fence.sync.aligned; 2026-02-21T08:53:50.4210496Z mov.pred %p15, -1; 2026-02-21T08:53:50.4210557Z // begin inline asm 2026-02-21T08:53:50.4211447Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3795,%r3796,%r3797,%r3798,%r3799,%r3800,%r3801,%r3802,%r3803,%r3804,%r3805,%r3806,%r3807,%r3808,%r3809,%r3810,%r3811,%r3812,%r3813,%r3814,%r3815,%r3816,%r3817,%r3818,%r3819,%r3820,%r3821,%r3822,%r3823,%r3824,%r3825,%r3826}, {%r1394,%r1395,%r1396,%r1397}, %rd17, %p15, 1, 1; 2026-02-21T08:53:50.4211520Z // end inline asm 2026-02-21T08:53:50.4211580Z // begin inline asm 2026-02-21T08:53:50.4212334Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3795,%r3796,%r3797,%r3798,%r3799,%r3800,%r3801,%r3802,%r3803,%r3804,%r3805,%r3806,%r3807,%r3808,%r3809,%r3810,%r3811,%r3812,%r3813,%r3814,%r3815,%r3816,%r3817,%r3818,%r3819,%r3820,%r3821,%r3822,%r3823,%r3824,%r3825,%r3826}, {%r1462,%r1463,%r1464,%r1465}, %rd18, %p15, 1, 1; 2026-02-21T08:53:50.4212396Z // end inline asm 2026-02-21T08:53:50.4212455Z // begin inline asm 2026-02-21T08:53:50.4213205Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3795,%r3796,%r3797,%r3798,%r3799,%r3800,%r3801,%r3802,%r3803,%r3804,%r3805,%r3806,%r3807,%r3808,%r3809,%r3810,%r3811,%r3812,%r3813,%r3814,%r3815,%r3816,%r3817,%r3818,%r3819,%r3820,%r3821,%r3822,%r3823,%r3824,%r3825,%r3826}, {%r1530,%r1531,%r1532,%r1533}, %rd19, %p15, 1, 1; 
2026-02-21T08:53:50.4213270Z // end inline asm 2026-02-21T08:53:50.4213329Z // begin inline asm 2026-02-21T08:53:50.4214072Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3795,%r3796,%r3797,%r3798,%r3799,%r3800,%r3801,%r3802,%r3803,%r3804,%r3805,%r3806,%r3807,%r3808,%r3809,%r3810,%r3811,%r3812,%r3813,%r3814,%r3815,%r3816,%r3817,%r3818,%r3819,%r3820,%r3821,%r3822,%r3823,%r3824,%r3825,%r3826}, {%r1598,%r1599,%r1600,%r1601}, %rd20, %p15, 1, 1; 2026-02-21T08:53:50.4214135Z // end inline asm 2026-02-21T08:53:50.4214193Z // begin inline asm 2026-02-21T08:53:50.4214941Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3795,%r3796,%r3797,%r3798,%r3799,%r3800,%r3801,%r3802,%r3803,%r3804,%r3805,%r3806,%r3807,%r3808,%r3809,%r3810,%r3811,%r3812,%r3813,%r3814,%r3815,%r3816,%r3817,%r3818,%r3819,%r3820,%r3821,%r3822,%r3823,%r3824,%r3825,%r3826}, {%r1666,%r1667,%r1668,%r1669}, %rd21, %p15, 1, 1; 2026-02-21T08:53:50.4215018Z // end inline asm 2026-02-21T08:53:50.4215078Z // begin inline asm 2026-02-21T08:53:50.4215824Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3795,%r3796,%r3797,%r3798,%r3799,%r3800,%r3801,%r3802,%r3803,%r3804,%r3805,%r3806,%r3807,%r3808,%r3809,%r3810,%r3811,%r3812,%r3813,%r3814,%r3815,%r3816,%r3817,%r3818,%r3819,%r3820,%r3821,%r3822,%r3823,%r3824,%r3825,%r3826}, {%r1734,%r1735,%r1736,%r1737}, %rd22, %p15, 1, 1; 2026-02-21T08:53:50.4215885Z // end inline asm 2026-02-21T08:53:50.4215945Z // begin inline asm 2026-02-21T08:53:50.4216819Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3795,%r3796,%r3797,%r3798,%r3799,%r3800,%r3801,%r3802,%r3803,%r3804,%r3805,%r3806,%r3807,%r3808,%r3809,%r3810,%r3811,%r3812,%r3813,%r3814,%r3815,%r3816,%r3817,%r3818,%r3819,%r3820,%r3821,%r3822,%r3823,%r3824,%r3825,%r3826}, {%r1802,%r1803,%r1804,%r1805}, %rd23, %p15, 1, 1; 2026-02-21T08:53:50.4216890Z // end inline asm 2026-02-21T08:53:50.4216951Z // begin inline asm 2026-02-21T08:53:50.4217699Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r3795,%r3796,%r3797,%r3798,%r3799,%r3800,%r3801,%r3802,%r3803,%r3804,%r3805,%r3806,%r3807,%r3808,%r3809,%r3810,%r3811,%r3812,%r3813,%r3814,%r3815,%r3816,%r3817,%r3818,%r3819,%r3820,%r3821,%r3822,%r3823,%r3824,%r3825,%r3826}, {%r1870,%r1871,%r1872,%r1873}, %rd24, %p15, 1, 1; 2026-02-21T08:53:50.4217894Z // end inline asm 2026-02-21T08:53:50.4217971Z wgmma.commit_group.sync.aligned; 2026-02-21T08:53:50.4218031Z mov.b32 %r1907, 0; 2026-02-21T08:53:50.4218098Z mov.b32 %r1906, %r441; 2026-02-21T08:53:50.4218160Z mov.b32 %r1908, %r1907; 2026-02-21T08:53:50.4218219Z // begin inline asm 2026-02-21T08:53:50.4218897Z // wait for regs: %r3795,%r3796,%r3797,%r3798,%r3799,%r3800,%r3801,%r3802,%r3803,%r3804,%r3805,%r3806,%r3807,%r3808,%r3809,%r3810,%r3811,%r3812,%r3813,%r3814,%r3815,%r3816,%r3817,%r3818,%r3819,%r3820,%r3821,%r3822,%r3823,%r3824,%r3825,%r3826,%r1906,%r1907,%r1908 2026-02-21T08:53:50.4218978Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:53:50.4219038Z // end inline asm 2026-02-21T08:53:50.4219100Z $L__tmp4: 2026-02-21T08:53:50.4219324Z .loc 1 37 103 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:37:103 2026-02-21T08:53:50.4219388Z add.s32 %r2013, %r3794, 1; 2026-02-21T08:53:50.4219456Z setp.gt.s32 %p26, %r2013, 1; 2026-02-21T08:53:50.4219527Z selp.b32 %r3794, 0, %r2013, %p26; 2026-02-21T08:53:50.4219730Z .loc 1 45 32 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:45:32 2026-02-21T08:53:50.4219800Z add.s64 %rd183, %rd317, -917504; 2026-02-21T08:53:50.4219885Z add.s64 %rd184, %rd317, -786432; 2026-02-21T08:53:50.4219951Z add.s64 %rd185, %rd317, -655360; 2026-02-21T08:53:50.4220015Z add.s64 %rd186, %rd317, -524288; 2026-02-21T08:53:50.4220082Z add.s64 %rd187, %rd317, -393216; 2026-02-21T08:53:50.4220148Z add.s64 %rd188, %rd317, -262144; 2026-02-21T08:53:50.4220216Z add.s64 %rd189, %rd317, -131072; 2026-02-21T08:53:50.4220416Z .loc 1 45 80 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:45:80 2026-02-21T08:53:50.4220488Z shl.b32 
%r2014, %r3794, 13; 2026-02-21T08:53:50.4220554Z add.s32 %r2015, %r3732, %r2014; 2026-02-21T08:53:50.4220618Z add.s32 %r1944, %r2015, %r22; 2026-02-21T08:53:50.4220689Z selp.b32 %r1945, 8, 0, %p24; 2026-02-21T08:53:50.4220752Z // begin inline asm 2026-02-21T08:53:50.4220900Z cp.async.ca.shared.global [ %r1944 + 0 ], [ %rd183 + 0 ], 0x8, %r1945; 2026-02-21T08:53:50.4220961Z // end inline asm 2026-02-21T08:53:50.4221034Z add.s32 %r1946, %r1944, 1024; 2026-02-21T08:53:50.4221096Z // begin inline asm 2026-02-21T08:53:50.4221236Z cp.async.ca.shared.global [ %r1946 + 0 ], [ %rd184 + 0 ], 0x8, %r1945; 2026-02-21T08:53:50.4221300Z // end inline asm 2026-02-21T08:53:50.4221362Z add.s32 %r1948, %r1944, 2048; 2026-02-21T08:53:50.4221422Z // begin inline asm 2026-02-21T08:53:50.4221563Z cp.async.ca.shared.global [ %r1948 + 0 ], [ %rd185 + 0 ], 0x8, %r1945; 2026-02-21T08:53:50.4221623Z // end inline asm 2026-02-21T08:53:50.4221685Z add.s32 %r1950, %r1944, 3072; 2026-02-21T08:53:50.4221747Z // begin inline asm 2026-02-21T08:53:50.4221886Z cp.async.ca.shared.global [ %r1950 + 0 ], [ %rd186 + 0 ], 0x8, %r1945; 2026-02-21T08:53:50.4221943Z // end inline asm 2026-02-21T08:53:50.4222004Z add.s32 %r1952, %r1944, 4096; 2026-02-21T08:53:50.4222067Z // begin inline asm 2026-02-21T08:53:50.4222198Z cp.async.ca.shared.global [ %r1952 + 0 ], [ %rd187 + 0 ], 0x8, %r1945; 2026-02-21T08:53:50.4222255Z // end inline asm 2026-02-21T08:53:50.4222315Z add.s32 %r1954, %r1944, 5120; 2026-02-21T08:53:50.4222378Z // begin inline asm 2026-02-21T08:53:50.4222511Z cp.async.ca.shared.global [ %r1954 + 0 ], [ %rd188 + 0 ], 0x8, %r1945; 2026-02-21T08:53:50.4222569Z // end inline asm 2026-02-21T08:53:50.4222639Z add.s32 %r1956, %r1944, 6144; 2026-02-21T08:53:50.4222700Z // begin inline asm 2026-02-21T08:53:50.4222836Z cp.async.ca.shared.global [ %r1956 + 0 ], [ %rd189 + 0 ], 0x8, %r1945; 2026-02-21T08:53:50.4222896Z // end inline asm 2026-02-21T08:53:50.4222958Z add.s32 %r1958, %r1944, 7168; 
2026-02-21T08:53:50.4223129Z // begin inline asm 2026-02-21T08:53:50.4223262Z cp.async.ca.shared.global [ %r1958 + 0 ], [ %rd317 + 0 ], 0x8, %r1945; 2026-02-21T08:53:50.4223326Z // end inline asm 2026-02-21T08:53:50.4223394Z cp.async.commit_group; 2026-02-21T08:53:50.4223598Z .loc 1 51 34 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:34 2026-02-21T08:53:50.4223669Z cvt.s64.s32 %rd192, %r3792; 2026-02-21T08:53:50.4223737Z add.s64 %rd191, %rd81, %rd192; 2026-02-21T08:53:50.4223937Z .loc 1 51 87 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:87 2026-02-21T08:53:50.4224004Z shl.b32 %r2016, %r3794, 11; 2026-02-21T08:53:50.4224066Z add.s32 %r1960, %r32, %r2016; 2026-02-21T08:53:50.4226333Z selp.b32 %r1961, 16, 0, %p24; 2026-02-21T08:53:50.4226691Z // begin inline asm 2026-02-21T08:53:50.4226880Z cp.async.cg.shared.global [ %r1960 + 0 ], [ %rd191 + 0 ], 0x10, %r1961; 2026-02-21T08:53:50.4226950Z // end inline asm 2026-02-21T08:53:50.4227027Z cp.async.commit_group; 2026-02-21T08:53:50.4227271Z .loc 1 37 103 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:37:103 2026-02-21T08:53:50.4227355Z add.s32 %r3792, %r3792, 229376; 2026-02-21T08:53:50.4227433Z add.s64 %rd317, %rd317, 128; 2026-02-21T08:53:50.4227509Z setp.lt.u64 %p27, %rd318, 4064; 2026-02-21T08:53:50.4227574Z @%p27 bra $L__BB0_5; 2026-02-21T08:53:50.4227699Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:53:50.4227920Z .loc 1 30 32 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:30:32 2026-02-21T08:53:50.4227988Z or.b32 %r2092, %r199, %r10; 2026-02-21T08:53:50.4228211Z .loc 1 37 103 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:37:103 2026-02-21T08:53:50.4228308Z cp.async.wait_group 0; 2026-02-21T08:53:50.4228366Z bar.sync 0; 2026-02-21T08:53:50.4228635Z .loc 1 84 28 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:84:28 2026-02-21T08:53:50.4228729Z cvt.rn.bf16x2.f32 %r2093, %r3796, %r3795; 2026-02-21T08:53:50.4228806Z 
cvt.rn.bf16x2.f32 %r2094, %r3798, %r3797; 2026-02-21T08:53:50.4228881Z cvt.rn.bf16x2.f32 %r2095, %r3800, %r3799; 2026-02-21T08:53:50.4228957Z cvt.rn.bf16x2.f32 %r2096, %r3802, %r3801; 2026-02-21T08:53:50.4229028Z cvt.rn.bf16x2.f32 %r2097, %r3804, %r3803; 2026-02-21T08:53:50.4229097Z cvt.rn.bf16x2.f32 %r2098, %r3806, %r3805; 2026-02-21T08:53:50.4229171Z cvt.rn.bf16x2.f32 %r2099, %r3808, %r3807; 2026-02-21T08:53:50.4229241Z cvt.rn.bf16x2.f32 %r2100, %r3810, %r3809; 2026-02-21T08:53:50.4229312Z cvt.rn.bf16x2.f32 %r2101, %r3812, %r3811; 2026-02-21T08:53:50.4229385Z cvt.rn.bf16x2.f32 %r2102, %r3814, %r3813; 2026-02-21T08:53:50.4229464Z cvt.rn.bf16x2.f32 %r2103, %r3816, %r3815; 2026-02-21T08:53:50.4229535Z cvt.rn.bf16x2.f32 %r2104, %r3818, %r3817; 2026-02-21T08:53:50.4229606Z cvt.rn.bf16x2.f32 %r2105, %r3820, %r3819; 2026-02-21T08:53:50.4229683Z cvt.rn.bf16x2.f32 %r2106, %r3822, %r3821; 2026-02-21T08:53:50.4229757Z cvt.rn.bf16x2.f32 %r2107, %r3824, %r3823; 2026-02-21T08:53:50.4229829Z cvt.rn.bf16x2.f32 %r2108, %r3826, %r3825; 2026-02-21T08:53:50.4230045Z .loc 1 85 50 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:85:50 2026-02-21T08:53:50.4230114Z add.s32 %r2109, %r2092, %r18; 2026-02-21T08:53:50.4230177Z add.s32 %r2110, %r2092, %r19; 2026-02-21T08:53:50.4230238Z add.s32 %r2111, %r2092, %r20; 2026-02-21T08:53:50.4230306Z add.s32 %r2112, %r2092, %r21; 2026-02-21T08:53:50.4230514Z .loc 1 85 22 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:85:22 2026-02-21T08:53:50.4230591Z mad.wide.s32 %rd193, %r2109, 2, %rd82; 2026-02-21T08:53:50.4230669Z mad.wide.s32 %rd194, %r2110, 2, %rd82; 2026-02-21T08:53:50.4230735Z mad.wide.s32 %rd195, %r2111, 2, %rd82; 2026-02-21T08:53:50.4230802Z mad.wide.s32 %rd196, %r2112, 2, %rd82; 2026-02-21T08:53:50.4231009Z .loc 1 85 81 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:85:81 2026-02-21T08:53:50.4231215Z st.shared.v4.b32 [%r62], {%r2093, %r2095, %r2097, %r2099}; 2026-02-21T08:53:50.4231336Z st.shared.v4.b32 
[%r62+256], {%r2094, %r2096, %r2098, %r2100}; 2026-02-21T08:53:50.4231447Z st.shared.v4.b32 [%r63], {%r2101, %r2103, %r2105, %r2107}; 2026-02-21T08:53:50.4231562Z st.shared.v4.b32 [%r63+256], {%r2102, %r2104, %r2106, %r2108}; 2026-02-21T08:53:50.4231621Z bar.sync 0; 2026-02-21T08:53:50.4231687Z // begin inline asm 2026-02-21T08:53:50.4231895Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2037, %r2038, %r2039, %r2040}, [%r1235]; 2026-02-21T08:53:50.4231965Z // end inline asm 2026-02-21T08:53:50.4232025Z // begin inline asm 2026-02-21T08:53:50.4232490Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2041, %r2042, %r2043, %r2044}, [%r1240]; 2026-02-21T08:53:50.4232558Z // end inline asm 2026-02-21T08:53:50.4232619Z // begin inline asm 2026-02-21T08:53:50.4232801Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2045, %r2046, %r2047, %r2048}, [%r1245]; 2026-02-21T08:53:50.4232867Z // end inline asm 2026-02-21T08:53:50.4232927Z // begin inline asm 2026-02-21T08:53:50.4233106Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2049, %r2050, %r2051, %r2052}, [%r1250]; 2026-02-21T08:53:50.4233165Z // end inline asm 2026-02-21T08:53:50.4233229Z // begin inline asm 2026-02-21T08:53:50.4233361Z st.global.v4.b32 [ %rd193 + 0 ], { %r2037, %r2038, %r2039, %r2040 }; 2026-02-21T08:53:50.4233419Z // end inline asm 2026-02-21T08:53:50.4233482Z // begin inline asm 2026-02-21T08:53:50.4233605Z st.global.v4.b32 [ %rd194 + 0 ], { %r2041, %r2042, %r2043, %r2044 }; 2026-02-21T08:53:50.4233663Z // end inline asm 2026-02-21T08:53:50.4233727Z // begin inline asm 2026-02-21T08:53:50.4233851Z st.global.v4.b32 [ %rd195 + 0 ], { %r2045, %r2046, %r2047, %r2048 }; 2026-02-21T08:53:50.4233913Z // end inline asm 2026-02-21T08:53:50.4233974Z // begin inline asm 2026-02-21T08:53:50.4234105Z st.global.v4.b32 [ %rd196 + 0 ], { %r2049, %r2050, %r2051, %r2052 }; 2026-02-21T08:53:50.4234164Z // end inline asm 2026-02-21T08:53:50.4234377Z .loc 1 29 27 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:29:27 
2026-02-21T08:53:50.4234450Z add.s32 %r270, %r128, 128; 2026-02-21T08:53:50.4234654Z .loc 1 30 32 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:30:32 2026-02-21T08:53:50.4234723Z or.b32 %r2113, %r270, %r12; 2026-02-21T08:53:50.4234928Z .loc 1 45 80 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:45:80 2026-02-21T08:53:50.4234988Z bar.sync 0; 2026-02-21T08:53:50.4235048Z mov.b32 %r2054, 8; 2026-02-21T08:53:50.4235109Z // begin inline asm 2026-02-21T08:53:50.4235259Z cp.async.ca.shared.global [ %r23 + 0 ], [ %rd1 + 0 ], 0x8, %r2054; 2026-02-21T08:53:50.4235321Z // end inline asm 2026-02-21T08:53:50.4235382Z // begin inline asm 2026-02-21T08:53:50.4235518Z cp.async.ca.shared.global [ %r24 + 0 ], [ %rd2 + 0 ], 0x8, %r2054; 2026-02-21T08:53:50.4235589Z // end inline asm 2026-02-21T08:53:50.4235653Z // begin inline asm 2026-02-21T08:53:50.4235781Z cp.async.ca.shared.global [ %r25 + 0 ], [ %rd3 + 0 ], 0x8, %r2054; 2026-02-21T08:53:50.4235846Z // end inline asm 2026-02-21T08:53:50.4235904Z // begin inline asm 2026-02-21T08:53:50.4236031Z cp.async.ca.shared.global [ %r26 + 0 ], [ %rd4 + 0 ], 0x8, %r2054; 2026-02-21T08:53:50.4236093Z // end inline asm 2026-02-21T08:53:50.4236155Z // begin inline asm 2026-02-21T08:53:50.4236279Z cp.async.ca.shared.global [ %r27 + 0 ], [ %rd5 + 0 ], 0x8, %r2054; 2026-02-21T08:53:50.4236342Z // end inline asm 2026-02-21T08:53:50.4236401Z // begin inline asm 2026-02-21T08:53:50.4236674Z cp.async.ca.shared.global [ %r28 + 0 ], [ %rd6 + 0 ], 0x8, %r2054; 2026-02-21T08:53:50.4236742Z // end inline asm 2026-02-21T08:53:50.4236810Z // begin inline asm 2026-02-21T08:53:50.4236936Z cp.async.ca.shared.global [ %r29 + 0 ], [ %rd7 + 0 ], 0x8, %r2054; 2026-02-21T08:53:50.4236992Z // end inline asm 2026-02-21T08:53:50.4237160Z // begin inline asm 2026-02-21T08:53:50.4237291Z cp.async.ca.shared.global [ %r30 + 0 ], [ %rd8 + 0 ], 0x8, %r2054; 2026-02-21T08:53:50.4237349Z // end inline asm 2026-02-21T08:53:50.4237421Z 
cp.async.commit_group; 2026-02-21T08:53:50.4237627Z .loc 1 51 62 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:62 2026-02-21T08:53:50.4237698Z add.s32 %r2114, %r2113, %r3733; 2026-02-21T08:53:50.4237898Z .loc 1 51 34 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:34 2026-02-21T08:53:50.4237968Z cvt.s64.s32 %rd216, %r2114; 2026-02-21T08:53:50.4238039Z add.s64 %rd205, %rd81, %rd216; 2026-02-21T08:53:50.4238099Z mov.b32 %r2070, 16; 2026-02-21T08:53:50.4238498Z .loc 1 51 87 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:87 2026-02-21T08:53:50.4238562Z // begin inline asm 2026-02-21T08:53:50.4238706Z cp.async.cg.shared.global [ %r32 + 0 ], [ %rd205 + 0 ], 0x10, %r2070; 2026-02-21T08:53:50.4238767Z // end inline asm 2026-02-21T08:53:50.4238842Z cp.async.commit_group; 2026-02-21T08:53:50.4239040Z .loc 1 45 80 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:45:80 2026-02-21T08:53:50.4239097Z bar.sync 0; 2026-02-21T08:53:50.4239166Z // begin inline asm 2026-02-21T08:53:50.4239307Z cp.async.ca.shared.global [ %r33 + 0 ], [ %rd9 + 0 ], 0x8, %r2054; 2026-02-21T08:53:50.4239365Z // end inline asm 2026-02-21T08:53:50.4239428Z // begin inline asm 2026-02-21T08:53:50.4239568Z cp.async.ca.shared.global [ %r34 + 0 ], [ %rd10 + 0 ], 0x8, %r2054; 2026-02-21T08:53:50.4239625Z // end inline asm 2026-02-21T08:53:50.4239684Z // begin inline asm 2026-02-21T08:53:50.4239822Z cp.async.ca.shared.global [ %r35 + 0 ], [ %rd11 + 0 ], 0x8, %r2054; 2026-02-21T08:53:50.4239880Z // end inline asm 2026-02-21T08:53:50.4239941Z // begin inline asm 2026-02-21T08:53:50.4240071Z cp.async.ca.shared.global [ %r36 + 0 ], [ %rd12 + 0 ], 0x8, %r2054; 2026-02-21T08:53:50.4240132Z // end inline asm 2026-02-21T08:53:50.4240193Z // begin inline asm 2026-02-21T08:53:50.4240322Z cp.async.ca.shared.global [ %r37 + 0 ], [ %rd13 + 0 ], 0x8, %r2054; 2026-02-21T08:53:50.4240384Z // end inline asm 2026-02-21T08:53:50.4240444Z // begin inline asm 
2026-02-21T08:53:50.4240572Z cp.async.ca.shared.global [ %r38 + 0 ], [ %rd14 + 0 ], 0x8, %r2054; 2026-02-21T08:53:50.4240633Z // end inline asm 2026-02-21T08:53:50.4240692Z // begin inline asm 2026-02-21T08:53:50.4240820Z cp.async.ca.shared.global [ %r39 + 0 ], [ %rd15 + 0 ], 0x8, %r2054; 2026-02-21T08:53:50.4240884Z // end inline asm 2026-02-21T08:53:50.4240943Z // begin inline asm 2026-02-21T08:53:50.4241069Z cp.async.ca.shared.global [ %r40 + 0 ], [ %rd16 + 0 ], 0x8, %r2054; 2026-02-21T08:53:50.4241137Z // end inline asm 2026-02-21T08:53:50.4241207Z cp.async.commit_group; 2026-02-21T08:53:50.4241408Z .loc 1 51 62 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:62 2026-02-21T08:53:50.4241481Z add.s32 %r2115, %r2113, %r41; 2026-02-21T08:53:50.4241677Z .loc 1 51 34 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:34 2026-02-21T08:53:50.4241746Z cvt.s64.s32 %rd217, %r2115; 2026-02-21T08:53:50.4241815Z add.s64 %rd214, %rd81, %rd217; 2026-02-21T08:53:50.4242013Z .loc 1 51 87 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:87 2026-02-21T08:53:50.4242073Z // begin inline asm 2026-02-21T08:53:50.4242212Z cp.async.cg.shared.global [ %r42 + 0 ], [ %rd214 + 0 ], 0x10, %r2070; 2026-02-21T08:53:50.4242274Z // end inline asm 2026-02-21T08:53:50.4242341Z cp.async.commit_group; 2026-02-21T08:53:50.4242404Z mov.b32 %r3830, 0f00000000; 2026-02-21T08:53:50.4242473Z mov.b32 %r3829, 1; 2026-02-21T08:53:50.4242546Z mov.b32 %r3828, -1; 2026-02-21T08:53:50.4242609Z mov.b64 %rd320, -32; 2026-02-21T08:53:50.4242673Z mov.b64 %rd319, %rd27; 2026-02-21T08:53:50.4242739Z mov.b32 %r3827, %r3754; 2026-02-21T08:53:50.4242873Z mov.b32 %r3831, %r3830; 2026-02-21T08:53:50.4242932Z mov.b32 %r3832, %r3830; 2026-02-21T08:53:50.4242994Z mov.b32 %r3833, %r3830; 2026-02-21T08:53:50.4243053Z mov.b32 %r3834, %r3830; 2026-02-21T08:53:50.4243112Z mov.b32 %r3835, %r3830; 2026-02-21T08:53:50.4243180Z mov.b32 %r3836, %r3830; 2026-02-21T08:53:50.4243237Z mov.b32 %r3837, 
%r3830; 2026-02-21T08:53:50.4243295Z mov.b32 %r3838, %r3830; 2026-02-21T08:53:50.4243353Z mov.b32 %r3839, %r3830; 2026-02-21T08:53:50.4243418Z mov.b32 %r3840, %r3830; 2026-02-21T08:53:50.4243476Z mov.b32 %r3841, %r3830; 2026-02-21T08:53:50.4243536Z mov.b32 %r3842, %r3830; 2026-02-21T08:53:50.4243598Z mov.b32 %r3843, %r3830; 2026-02-21T08:53:50.4243658Z mov.b32 %r3844, %r3830; 2026-02-21T08:53:50.4243769Z mov.b32 %r3845, %r3830; 2026-02-21T08:53:50.4243912Z mov.b32 %r3846, %r3830; 2026-02-21T08:53:50.4243976Z mov.b32 %r3847, %r3830; 2026-02-21T08:53:50.4244036Z mov.b32 %r3848, %r3830; 2026-02-21T08:53:50.4244095Z mov.b32 %r3849, %r3830; 2026-02-21T08:53:50.4244160Z mov.b32 %r3850, %r3830; 2026-02-21T08:53:50.4244219Z mov.b32 %r3851, %r3830; 2026-02-21T08:53:50.4244276Z mov.b32 %r3852, %r3830; 2026-02-21T08:53:50.4244334Z mov.b32 %r3853, %r3830; 2026-02-21T08:53:50.4244407Z mov.b32 %r3854, %r3830; 2026-02-21T08:53:50.4244471Z mov.b32 %r3855, %r3830; 2026-02-21T08:53:50.4244529Z mov.b32 %r3856, %r3830; 2026-02-21T08:53:50.4244591Z mov.b32 %r3857, %r3830; 2026-02-21T08:53:50.4244648Z mov.b32 %r3858, %r3830; 2026-02-21T08:53:50.4244705Z mov.b32 %r3859, %r3830; 2026-02-21T08:53:50.4244763Z mov.b32 %r3860, %r3830; 2026-02-21T08:53:50.4244825Z mov.b32 %r3861, %r3830; 2026-02-21T08:53:50.4244938Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T08:53:50.4245056Z // => This Inner Loop Header: Depth=2 2026-02-21T08:53:50.4245281Z .loc 1 37 103 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:37:103 2026-02-21T08:53:50.4245350Z add.s64 %rd320, %rd320, 32; 2026-02-21T08:53:50.4245423Z setp.lt.u64 %p37, %rd320, 4032; 2026-02-21T08:53:50.4245490Z add.s32 %r2748, %r3828, 1; 2026-02-21T08:53:50.4245558Z setp.gt.s32 %p38, %r2748, 1; 2026-02-21T08:53:50.4245630Z selp.b32 %r3828, 0, %r2748, %p38; 2026-02-21T08:53:50.4245840Z .loc 1 45 80 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:45:80 2026-02-21T08:53:50.4245918Z cp.async.wait_group 2; 
2026-02-21T08:53:50.4245976Z bar.sync 0; 2026-02-21T08:53:50.4246038Z shl.b32 %r2749, %r3828, 13; 2026-02-21T08:53:50.4246110Z add.s32 %r2751, %r3732, %r2749; 2026-02-21T08:53:50.4246312Z .loc 1 49 32 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:49:32 2026-02-21T08:53:50.4246381Z add.s32 %r2752, %r2751, %r68; 2026-02-21T08:53:50.4246584Z ld.shared.b16 %rs385, [%r2752]; 2026-02-21T08:53:50.4246666Z ld.shared.b16 %rs386, [%r2752+1024]; 2026-02-21T08:53:50.4246737Z ld.shared.b16 %rs387, [%r2752+64]; 2026-02-21T08:53:50.4246809Z ld.shared.b16 %rs388, [%r2752+1088]; 2026-02-21T08:53:50.4246877Z add.s32 %r2753, %r2751, %r69; 2026-02-21T08:53:50.4246943Z ld.shared.b16 %rs389, [%r2753]; 2026-02-21T08:53:50.4247013Z ld.shared.b16 %rs390, [%r2753+1024]; 2026-02-21T08:53:50.4247084Z ld.shared.b16 %rs391, [%r2753+64]; 2026-02-21T08:53:50.4247151Z ld.shared.b16 %rs392, [%r2753+1088]; 2026-02-21T08:53:50.4247212Z add.s32 %r2754, %r2751, %r70; 2026-02-21T08:53:50.4247279Z ld.shared.b16 %rs393, [%r2754]; 2026-02-21T08:53:50.4247350Z ld.shared.b16 %rs394, [%r2754+1024]; 2026-02-21T08:53:50.4247416Z ld.shared.b16 %rs395, [%r2754+64]; 2026-02-21T08:53:50.4247482Z ld.shared.b16 %rs396, [%r2754+1088]; 2026-02-21T08:53:50.4247553Z add.s32 %r2755, %r2751, %r71; 2026-02-21T08:53:50.4247623Z ld.shared.b16 %rs397, [%r2755]; 2026-02-21T08:53:50.4247691Z ld.shared.b16 %rs398, [%r2755+1024]; 2026-02-21T08:53:50.4247767Z ld.shared.b16 %rs399, [%r2755+64]; 2026-02-21T08:53:50.4247840Z ld.shared.b16 %rs400, [%r2755+1088]; 2026-02-21T08:53:50.4247986Z add.s32 %r2756, %r2751, %r72; 2026-02-21T08:53:50.4248052Z ld.shared.b16 %rs401, [%r2756]; 2026-02-21T08:53:50.4248123Z ld.shared.b16 %rs402, [%r2756+1024]; 2026-02-21T08:53:50.4248190Z ld.shared.b16 %rs403, [%r2756+64]; 2026-02-21T08:53:50.4248258Z ld.shared.b16 %rs404, [%r2756+1088]; 2026-02-21T08:53:50.4248324Z add.s32 %r2757, %r2751, %r73; 2026-02-21T08:53:50.4248390Z ld.shared.b16 %rs405, [%r2757]; 2026-02-21T08:53:50.4248455Z 
ld.shared.b16 %rs406, [%r2757+1024]; 2026-02-21T08:53:50.4248521Z ld.shared.b16 %rs407, [%r2757+64]; 2026-02-21T08:53:50.4248590Z ld.shared.b16 %rs408, [%r2757+1088]; 2026-02-21T08:53:50.4248652Z add.s32 %r2758, %r2751, %r74; 2026-02-21T08:53:50.4248718Z ld.shared.b16 %rs409, [%r2758]; 2026-02-21T08:53:50.4248976Z ld.shared.b16 %rs410, [%r2758+1024]; 2026-02-21T08:53:50.4249046Z ld.shared.b16 %rs411, [%r2758+64]; 2026-02-21T08:53:50.4249115Z ld.shared.b16 %rs412, [%r2758+1088]; 2026-02-21T08:53:50.4249178Z add.s32 %r2759, %r2751, %r75; 2026-02-21T08:53:50.4249248Z ld.shared.b16 %rs413, [%r2759]; 2026-02-21T08:53:50.4249315Z ld.shared.b16 %rs414, [%r2759+1024]; 2026-02-21T08:53:50.4249379Z ld.shared.b16 %rs415, [%r2759+64]; 2026-02-21T08:53:50.4249451Z ld.shared.b16 %rs416, [%r2759+1088]; 2026-02-21T08:53:50.4249519Z cvt.f32.bf16 %r2180, %rs385; 2026-02-21T08:53:50.4249581Z cvt.f32.bf16 %r2181, %rs386; 2026-02-21T08:53:50.4249648Z cvt.f32.bf16 %r2182, %rs389; 2026-02-21T08:53:50.4249711Z cvt.f32.bf16 %r2183, %rs390; 2026-02-21T08:53:50.4249773Z cvt.f32.bf16 %r2248, %rs393; 2026-02-21T08:53:50.4249834Z cvt.f32.bf16 %r2249, %rs394; 2026-02-21T08:53:50.4249902Z cvt.f32.bf16 %r2250, %rs397; 2026-02-21T08:53:50.4249963Z cvt.f32.bf16 %r2251, %rs398; 2026-02-21T08:53:50.4250032Z cvt.f32.bf16 %r2316, %rs401; 2026-02-21T08:53:50.4250099Z cvt.f32.bf16 %r2317, %rs402; 2026-02-21T08:53:50.4250160Z cvt.f32.bf16 %r2318, %rs405; 2026-02-21T08:53:50.4250222Z cvt.f32.bf16 %r2319, %rs406; 2026-02-21T08:53:50.4250285Z cvt.f32.bf16 %r2384, %rs409; 2026-02-21T08:53:50.4250351Z cvt.f32.bf16 %r2385, %rs410; 2026-02-21T08:53:50.4250427Z cvt.f32.bf16 %r2386, %rs413; 2026-02-21T08:53:50.4250493Z cvt.f32.bf16 %r2387, %rs414; 2026-02-21T08:53:50.4250558Z cvt.f32.bf16 %r2452, %rs387; 2026-02-21T08:53:50.4250623Z cvt.f32.bf16 %r2453, %rs388; 2026-02-21T08:53:50.4250684Z cvt.f32.bf16 %r2454, %rs391; 2026-02-21T08:53:50.4250746Z cvt.f32.bf16 %r2455, %rs392; 2026-02-21T08:53:50.4250815Z 
cvt.f32.bf16 %r2520, %rs395; 2026-02-21T08:53:50.4250876Z cvt.f32.bf16 %r2521, %rs396; 2026-02-21T08:53:50.4250938Z cvt.f32.bf16 %r2522, %rs399; 2026-02-21T08:53:50.4251002Z cvt.f32.bf16 %r2523, %rs400; 2026-02-21T08:53:50.4251064Z cvt.f32.bf16 %r2588, %rs403; 2026-02-21T08:53:50.4251129Z cvt.f32.bf16 %r2589, %rs404; 2026-02-21T08:53:50.4251195Z cvt.f32.bf16 %r2590, %rs407; 2026-02-21T08:53:50.4251257Z cvt.f32.bf16 %r2591, %rs408; 2026-02-21T08:53:50.4251316Z cvt.f32.bf16 %r2656, %rs411; 2026-02-21T08:53:50.4251379Z cvt.f32.bf16 %r2657, %rs412; 2026-02-21T08:53:50.4251443Z cvt.f32.bf16 %r2658, %rs415; 2026-02-21T08:53:50.4251505Z cvt.f32.bf16 %r2659, %rs416; 2026-02-21T08:53:50.4251710Z .loc 1 51 87 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:87 2026-02-21T08:53:50.4251773Z shl.b32 %r2760, %r3828, 11; 2026-02-21T08:53:50.4251835Z add.s32 %r2761, %r3732, %r2760; 2026-02-21T08:53:50.4251900Z add.s32 %r2762, %r2761, 32768; 2026-02-21T08:53:50.4252100Z .loc 1 64 45 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:64:45 2026-02-21T08:53:50.4252167Z add.s32 %r2763, %r2762, %r3738; 2026-02-21T08:53:50.4252235Z ld.shared.b8 %rs417, [%r2763]; 2026-02-21T08:53:50.4252305Z ld.shared.b8 %rs418, [%r2763+64]; 2026-02-21T08:53:50.4252378Z ld.shared.b8 %rs419, [%r2763+128]; 2026-02-21T08:53:50.4252444Z ld.shared.b8 %rs420, [%r2763+192]; 2026-02-21T08:53:50.4252508Z ld.shared.b8 %rs421, [%r2763+256]; 2026-02-21T08:53:50.4252650Z ld.shared.b8 %rs422, [%r2763+320]; 2026-02-21T08:53:50.4252715Z ld.shared.b8 %rs423, [%r2763+384]; 2026-02-21T08:53:50.4252779Z ld.shared.b8 %rs424, [%r2763+448]; 2026-02-21T08:53:50.4252842Z ld.shared.b8 %rs425, [%r2763+512]; 2026-02-21T08:53:50.4252911Z ld.shared.b8 %rs426, [%r2763+576]; 2026-02-21T08:53:50.4252974Z ld.shared.b8 %rs427, [%r2763+640]; 2026-02-21T08:53:50.4253039Z ld.shared.b8 %rs428, [%r2763+704]; 2026-02-21T08:53:50.4253106Z ld.shared.b8 %rs429, [%r2763+768]; 2026-02-21T08:53:50.4253171Z ld.shared.b8 %rs430, 
[%r2763+832]; 2026-02-21T08:53:50.4253235Z ld.shared.b8 %rs431, [%r2763+896]; 2026-02-21T08:53:50.4253298Z add.s32 %r2764, %r2762, %r3739; 2026-02-21T08:53:50.4253368Z ld.shared.b8 %rs432, [%r2764]; 2026-02-21T08:53:50.4253612Z ld.shared.b8 %rs433, [%r2763+1024]; 2026-02-21T08:53:50.4253685Z ld.shared.b8 %rs434, [%r2763+1088]; 2026-02-21T08:53:50.4253754Z ld.shared.b8 %rs435, [%r2763+1152]; 2026-02-21T08:53:50.4253822Z ld.shared.b8 %rs436, [%r2763+1216]; 2026-02-21T08:53:50.4253891Z ld.shared.b8 %rs437, [%r2763+1280]; 2026-02-21T08:53:50.4253960Z ld.shared.b8 %rs438, [%r2763+1344]; 2026-02-21T08:53:50.4254025Z ld.shared.b8 %rs439, [%r2763+1408]; 2026-02-21T08:53:50.4254090Z ld.shared.b8 %rs440, [%r2763+1472]; 2026-02-21T08:53:50.4254156Z ld.shared.b8 %rs441, [%r2763+1536]; 2026-02-21T08:53:50.4254225Z ld.shared.b8 %rs442, [%r2763+1600]; 2026-02-21T08:53:50.4254292Z ld.shared.b8 %rs443, [%r2763+1664]; 2026-02-21T08:53:50.4254359Z ld.shared.b8 %rs444, [%r2763+1728]; 2026-02-21T08:53:50.4254429Z ld.shared.b8 %rs445, [%r2763+1792]; 2026-02-21T08:53:50.4254495Z ld.shared.b8 %rs446, [%r2763+1856]; 2026-02-21T08:53:50.4254563Z ld.shared.b8 %rs447, [%r2763+1920]; 2026-02-21T08:53:50.4254627Z add.s32 %r2765, %r2762, %r3740; 2026-02-21T08:53:50.4254698Z ld.shared.b8 %rs448, [%r2765]; 2026-02-21T08:53:50.4254902Z .loc 1 54 28 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:54:28 2026-02-21T08:53:50.4254969Z shl.b16 %rs449, %rs417, 4; 2026-02-21T08:53:50.4255034Z shl.b16 %rs450, %rs418, 4; 2026-02-21T08:53:50.4255096Z shl.b16 %rs451, %rs419, 4; 2026-02-21T08:53:50.4255156Z shl.b16 %rs452, %rs420, 4; 2026-02-21T08:53:50.4255222Z shl.b16 %rs453, %rs421, 4; 2026-02-21T08:53:50.4255296Z shl.b16 %rs454, %rs422, 4; 2026-02-21T08:53:50.4255359Z shl.b16 %rs455, %rs423, 4; 2026-02-21T08:53:50.4255421Z shl.b16 %rs456, %rs424, 4; 2026-02-21T08:53:50.4255487Z shl.b16 %rs457, %rs425, 4; 2026-02-21T08:53:50.4255547Z shl.b16 %rs458, %rs426, 4; 2026-02-21T08:53:50.4255609Z shl.b16 
%rs459, %rs427, 4; 2026-02-21T08:53:50.4255674Z shl.b16 %rs460, %rs428, 4; 2026-02-21T08:53:50.4255737Z shl.b16 %rs461, %rs429, 4; 2026-02-21T08:53:50.4255799Z shl.b16 %rs462, %rs430, 4; 2026-02-21T08:53:50.4255864Z shl.b16 %rs463, %rs431, 4; 2026-02-21T08:53:50.4255930Z shl.b16 %rs464, %rs432, 4; 2026-02-21T08:53:50.4255993Z shl.b16 %rs465, %rs433, 4; 2026-02-21T08:53:50.4256055Z shl.b16 %rs466, %rs434, 4; 2026-02-21T08:53:50.4256121Z shl.b16 %rs467, %rs435, 4; 2026-02-21T08:53:50.4256181Z shl.b16 %rs468, %rs436, 4; 2026-02-21T08:53:50.4256243Z shl.b16 %rs469, %rs437, 4; 2026-02-21T08:53:50.4256303Z shl.b16 %rs470, %rs438, 4; 2026-02-21T08:53:50.4256368Z shl.b16 %rs471, %rs439, 4; 2026-02-21T08:53:50.4256429Z shl.b16 %rs472, %rs440, 4; 2026-02-21T08:53:50.4256609Z shl.b16 %rs473, %rs441, 4; 2026-02-21T08:53:50.4256677Z shl.b16 %rs474, %rs442, 4; 2026-02-21T08:53:50.4256738Z shl.b16 %rs475, %rs443, 4; 2026-02-21T08:53:50.4256799Z shl.b16 %rs476, %rs444, 4; 2026-02-21T08:53:50.4256859Z shl.b16 %rs477, %rs445, 4; 2026-02-21T08:53:50.4256923Z shl.b16 %rs478, %rs446, 4; 2026-02-21T08:53:50.4256985Z shl.b16 %rs479, %rs447, 4; 2026-02-21T08:53:50.4257052Z shl.b16 %rs480, %rs448, 4; 2026-02-21T08:53:50.4257260Z .loc 1 69 58 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:69:58 2026-02-21T08:53:50.4257340Z selp.b16 %rs481, %rs449, %rs417, %p57; 2026-02-21T08:53:50.4257497Z cvt.s16.s8 %rs482, %rs481; 2026-02-21T08:53:50.4257565Z shr.s16 %rs483, %rs482, 4; 2026-02-21T08:53:50.4257639Z selp.b16 %rs484, %rs450, %rs418, %p57; 2026-02-21T08:53:50.4257701Z cvt.s16.s8 %rs485, %rs484; 2026-02-21T08:53:50.4257761Z shr.s16 %rs486, %rs485, 4; 2026-02-21T08:53:50.4257836Z selp.b16 %rs487, %rs451, %rs419, %p57; 2026-02-21T08:53:50.4257898Z cvt.s16.s8 %rs488, %rs487; 2026-02-21T08:53:50.4257961Z shr.s16 %rs489, %rs488, 4; 2026-02-21T08:53:50.4258033Z selp.b16 %rs490, %rs452, %rs420, %p57; 2026-02-21T08:53:50.4258094Z cvt.s16.s8 %rs491, %rs490; 2026-02-21T08:53:50.4258155Z 
shr.s16 %rs492, %rs491, 4; 2026-02-21T08:53:50.4258225Z selp.b16 %rs493, %rs453, %rs421, %p57; 2026-02-21T08:53:50.4258290Z cvt.s16.s8 %rs494, %rs493; 2026-02-21T08:53:50.4258539Z shr.s16 %rs495, %rs494, 4; 2026-02-21T08:53:50.4258613Z selp.b16 %rs496, %rs454, %rs422, %p57; 2026-02-21T08:53:50.4258677Z cvt.s16.s8 %rs497, %rs496; 2026-02-21T08:53:50.4258737Z shr.s16 %rs498, %rs497, 4; 2026-02-21T08:53:50.4258808Z selp.b16 %rs499, %rs455, %rs423, %p57; 2026-02-21T08:53:50.4258868Z cvt.s16.s8 %rs500, %rs499; 2026-02-21T08:53:50.4258931Z shr.s16 %rs501, %rs500, 4; 2026-02-21T08:53:50.4258999Z selp.b16 %rs502, %rs456, %rs424, %p57; 2026-02-21T08:53:50.4259062Z cvt.s16.s8 %rs503, %rs502; 2026-02-21T08:53:50.4259125Z shr.s16 %rs504, %rs503, 4; 2026-02-21T08:53:50.4259194Z selp.b16 %rs505, %rs457, %rs425, %p57; 2026-02-21T08:53:50.4259254Z cvt.s16.s8 %rs506, %rs505; 2026-02-21T08:53:50.4259313Z shr.s16 %rs507, %rs506, 4; 2026-02-21T08:53:50.4259386Z selp.b16 %rs508, %rs458, %rs426, %p57; 2026-02-21T08:53:50.4259445Z cvt.s16.s8 %rs509, %rs508; 2026-02-21T08:53:50.4259505Z shr.s16 %rs510, %rs509, 4; 2026-02-21T08:53:50.4259582Z selp.b16 %rs511, %rs459, %rs427, %p57; 2026-02-21T08:53:50.4259644Z cvt.s16.s8 %rs512, %rs511; 2026-02-21T08:53:50.4259705Z shr.s16 %rs513, %rs512, 4; 2026-02-21T08:53:50.4259777Z selp.b16 %rs514, %rs460, %rs428, %p57; 2026-02-21T08:53:50.4259840Z cvt.s16.s8 %rs515, %rs514; 2026-02-21T08:53:50.4259901Z shr.s16 %rs516, %rs515, 4; 2026-02-21T08:53:50.4259970Z selp.b16 %rs517, %rs461, %rs429, %p57; 2026-02-21T08:53:50.4260035Z cvt.s16.s8 %rs518, %rs517; 2026-02-21T08:53:50.4260096Z shr.s16 %rs519, %rs518, 4; 2026-02-21T08:53:50.4260163Z selp.b16 %rs520, %rs462, %rs430, %p57; 2026-02-21T08:53:50.4260232Z cvt.s16.s8 %rs521, %rs520; 2026-02-21T08:53:50.4260304Z shr.s16 %rs522, %rs521, 4; 2026-02-21T08:53:50.4260374Z selp.b16 %rs523, %rs463, %rs431, %p57; 2026-02-21T08:53:50.4260437Z cvt.s16.s8 %rs524, %rs523; 2026-02-21T08:53:50.4260504Z shr.s16 %rs525, 
%rs524, 4; 2026-02-21T08:53:50.4260572Z selp.b16 %rs526, %rs464, %rs432, %p57; 2026-02-21T08:53:50.4260633Z cvt.s16.s8 %rs527, %rs526; 2026-02-21T08:53:50.4260704Z shr.s16 %rs528, %rs527, 4; 2026-02-21T08:53:50.4260773Z selp.b16 %rs529, %rs465, %rs433, %p57; 2026-02-21T08:53:50.4260835Z cvt.s16.s8 %rs530, %rs529; 2026-02-21T08:53:50.4260896Z shr.s16 %rs531, %rs530, 4; 2026-02-21T08:53:50.4260974Z selp.b16 %rs532, %rs466, %rs434, %p57; 2026-02-21T08:53:50.4261034Z cvt.s16.s8 %rs533, %rs532; 2026-02-21T08:53:50.4261095Z shr.s16 %rs534, %rs533, 4; 2026-02-21T08:53:50.4261168Z selp.b16 %rs535, %rs467, %rs435, %p57; 2026-02-21T08:53:50.4261229Z cvt.s16.s8 %rs536, %rs535; 2026-02-21T08:53:50.4261290Z shr.s16 %rs537, %rs536, 4; 2026-02-21T08:53:50.4261362Z selp.b16 %rs538, %rs468, %rs436, %p57; 2026-02-21T08:53:50.4261425Z cvt.s16.s8 %rs539, %rs538; 2026-02-21T08:53:50.4261488Z shr.s16 %rs540, %rs539, 4; 2026-02-21T08:53:50.4261557Z selp.b16 %rs541, %rs469, %rs437, %p57; 2026-02-21T08:53:50.4261624Z cvt.s16.s8 %rs542, %rs541; 2026-02-21T08:53:50.4261685Z shr.s16 %rs543, %rs542, 4; 2026-02-21T08:53:50.4261760Z selp.b16 %rs544, %rs470, %rs438, %p57; 2026-02-21T08:53:50.4261832Z cvt.s16.s8 %rs545, %rs544; 2026-02-21T08:53:50.4261903Z shr.s16 %rs546, %rs545, 4; 2026-02-21T08:53:50.4261973Z selp.b16 %rs547, %rs471, %rs439, %p57; 2026-02-21T08:53:50.4262104Z cvt.s16.s8 %rs548, %rs547; 2026-02-21T08:53:50.4262169Z shr.s16 %rs549, %rs548, 4; 2026-02-21T08:53:50.4262237Z selp.b16 %rs550, %rs472, %rs440, %p57; 2026-02-21T08:53:50.4262298Z cvt.s16.s8 %rs551, %rs550; 2026-02-21T08:53:50.4262362Z shr.s16 %rs552, %rs551, 4; 2026-02-21T08:53:50.4262433Z selp.b16 %rs553, %rs473, %rs441, %p57; 2026-02-21T08:53:50.4262494Z cvt.s16.s8 %rs554, %rs553; 2026-02-21T08:53:50.4262555Z shr.s16 %rs555, %rs554, 4; 2026-02-21T08:53:50.4262628Z selp.b16 %rs556, %rs474, %rs442, %p57; 2026-02-21T08:53:50.4262688Z cvt.s16.s8 %rs557, %rs556; 2026-02-21T08:53:50.4262749Z shr.s16 %rs558, %rs557, 4; 
2026-02-21T08:53:50.4262822Z selp.b16 %rs559, %rs475, %rs443, %p57; 2026-02-21T08:53:50.4262884Z cvt.s16.s8 %rs560, %rs559; 2026-02-21T08:53:50.4263082Z shr.s16 %rs561, %rs560, 4; 2026-02-21T08:53:50.4263157Z selp.b16 %rs562, %rs476, %rs444, %p57; 2026-02-21T08:53:50.4263224Z cvt.s16.s8 %rs563, %rs562; 2026-02-21T08:53:50.4263284Z shr.s16 %rs564, %rs563, 4; 2026-02-21T08:53:50.4263355Z selp.b16 %rs565, %rs477, %rs445, %p57; 2026-02-21T08:53:50.4263421Z cvt.s16.s8 %rs566, %rs565; 2026-02-21T08:53:50.4263482Z shr.s16 %rs567, %rs566, 4; 2026-02-21T08:53:50.4263549Z selp.b16 %rs568, %rs478, %rs446, %p57; 2026-02-21T08:53:50.4263613Z cvt.s16.s8 %rs569, %rs568; 2026-02-21T08:53:50.4263675Z shr.s16 %rs570, %rs569, 4; 2026-02-21T08:53:50.4263744Z selp.b16 %rs571, %rs479, %rs447, %p57; 2026-02-21T08:53:50.4263804Z cvt.s16.s8 %rs572, %rs571; 2026-02-21T08:53:50.4263870Z shr.s16 %rs573, %rs572, 4; 2026-02-21T08:53:50.4263943Z selp.b16 %rs574, %rs480, %rs448, %p57; 2026-02-21T08:53:50.4264005Z cvt.s16.s8 %rs575, %rs574; 2026-02-21T08:53:50.4264071Z shr.s16 %rs576, %rs575, 4; 2026-02-21T08:53:50.4264287Z .loc 1 74 32 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:74:32 2026-02-21T08:53:50.4264358Z cvt.rn.f32.s16 %r2766, %rs483; 2026-02-21T08:53:50.4264423Z cvt.rn.f32.s16 %r2767, %rs486; 2026-02-21T08:53:50.4264490Z cvt.rn.f32.s16 %r2768, %rs489; 2026-02-21T08:53:50.4264551Z cvt.rn.f32.s16 %r2769, %rs492; 2026-02-21T08:53:50.4264616Z cvt.rn.f32.s16 %r2770, %rs495; 2026-02-21T08:53:50.4264682Z cvt.rn.f32.s16 %r2771, %rs498; 2026-02-21T08:53:50.4264745Z cvt.rn.f32.s16 %r2772, %rs501; 2026-02-21T08:53:50.4264807Z cvt.rn.f32.s16 %r2773, %rs504; 2026-02-21T08:53:50.4264870Z cvt.rn.f32.s16 %r2774, %rs507; 2026-02-21T08:53:50.4264935Z cvt.rn.f32.s16 %r2775, %rs510; 2026-02-21T08:53:50.4264997Z cvt.rn.f32.s16 %r2776, %rs513; 2026-02-21T08:53:50.4265058Z cvt.rn.f32.s16 %r2777, %rs516; 2026-02-21T08:53:50.4265125Z cvt.rn.f32.s16 %r2778, %rs519; 2026-02-21T08:53:50.4265186Z 
cvt.rn.f32.s16 %r2779, %rs522; 2026-02-21T08:53:50.4265248Z cvt.rn.f32.s16 %r2780, %rs525; 2026-02-21T08:53:50.4265317Z cvt.rn.f32.s16 %r2781, %rs528; 2026-02-21T08:53:50.4265379Z cvt.rn.f32.s16 %r2782, %rs531; 2026-02-21T08:53:50.4265442Z cvt.rn.f32.s16 %r2783, %rs534; 2026-02-21T08:53:50.4265504Z cvt.rn.f32.s16 %r2784, %rs537; 2026-02-21T08:53:50.4265572Z cvt.rn.f32.s16 %r2785, %rs540; 2026-02-21T08:53:50.4265634Z cvt.rn.f32.s16 %r2786, %rs543; 2026-02-21T08:53:50.4265695Z cvt.rn.f32.s16 %r2787, %rs546; 2026-02-21T08:53:50.4265761Z cvt.rn.f32.s16 %r2788, %rs549; 2026-02-21T08:53:50.4265825Z cvt.rn.f32.s16 %r2789, %rs552; 2026-02-21T08:53:50.4265898Z cvt.rn.f32.s16 %r2790, %rs555; 2026-02-21T08:53:50.4265962Z cvt.rn.f32.s16 %r2791, %rs558; 2026-02-21T08:53:50.4266028Z cvt.rn.f32.s16 %r2792, %rs561; 2026-02-21T08:53:50.4266089Z cvt.rn.f32.s16 %r2793, %rs564; 2026-02-21T08:53:50.4266151Z cvt.rn.f32.s16 %r2794, %rs567; 2026-02-21T08:53:50.4266216Z cvt.rn.f32.s16 %r2795, %rs570; 2026-02-21T08:53:50.4266278Z cvt.rn.f32.s16 %r2796, %rs573; 2026-02-21T08:53:50.4266345Z cvt.rn.f32.s16 %r2797, %rs576; 2026-02-21T08:53:50.4266416Z st.shared.b32 [%r54], %r2766; 2026-02-21T08:53:50.4266601Z st.shared.b32 [%r54+8], %r2767; 2026-02-21T08:53:50.4266676Z st.shared.b32 [%r54+8192], %r2782; 2026-02-21T08:53:50.4266827Z st.shared.b32 [%r54+8200], %r2783; 2026-02-21T08:53:50.4266897Z st.shared.b32 [%r55], %r2768; 2026-02-21T08:53:50.4266963Z st.shared.b32 [%r55+8], %r2769; 2026-02-21T08:53:50.4267030Z st.shared.b32 [%r55+8192], %r2784; 2026-02-21T08:53:50.4267096Z st.shared.b32 [%r55+8200], %r2785; 2026-02-21T08:53:50.4267172Z st.shared.b32 [%r56], %r2770; 2026-02-21T08:53:50.4267237Z st.shared.b32 [%r56+8], %r2771; 2026-02-21T08:53:50.4267300Z st.shared.b32 [%r56+8192], %r2786; 2026-02-21T08:53:50.4267370Z st.shared.b32 [%r56+8200], %r2787; 2026-02-21T08:53:50.4267435Z st.shared.b32 [%r57], %r2772; 2026-02-21T08:53:50.4267499Z st.shared.b32 [%r57+8], %r2773; 
2026-02-21T08:53:50.4267567Z st.shared.b32 [%r57+8192], %r2788; 2026-02-21T08:53:50.4267811Z st.shared.b32 [%r57+8200], %r2789; 2026-02-21T08:53:50.4267878Z st.shared.b32 [%r58], %r2774; 2026-02-21T08:53:50.4267943Z st.shared.b32 [%r58+8], %r2775; 2026-02-21T08:53:50.4268011Z st.shared.b32 [%r58+8192], %r2790; 2026-02-21T08:53:50.4268076Z st.shared.b32 [%r58+8200], %r2791; 2026-02-21T08:53:50.4268141Z st.shared.b32 [%r59], %r2776; 2026-02-21T08:53:50.4268207Z st.shared.b32 [%r59+8], %r2777; 2026-02-21T08:53:50.4268273Z st.shared.b32 [%r59+8192], %r2792; 2026-02-21T08:53:50.4268336Z st.shared.b32 [%r59+8200], %r2793; 2026-02-21T08:53:50.4268401Z st.shared.b32 [%r60], %r2778; 2026-02-21T08:53:50.4268468Z st.shared.b32 [%r60+8], %r2779; 2026-02-21T08:53:50.4268620Z st.shared.b32 [%r60+8192], %r2794; 2026-02-21T08:53:50.4268687Z st.shared.b32 [%r60+8200], %r2795; 2026-02-21T08:53:50.4268755Z st.shared.b32 [%r61], %r2780; 2026-02-21T08:53:50.4268821Z st.shared.b32 [%r61+8], %r2781; 2026-02-21T08:53:50.4268886Z st.shared.b32 [%r61+8192], %r2796; 2026-02-21T08:53:50.4268958Z st.shared.b32 [%r61+8200], %r2797; 2026-02-21T08:53:50.4269018Z $L__tmp5: 2026-02-21T08:53:50.4269301Z .loc 2 291 36 // standard.py:291:36 @[ cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:81:40 ] 2026-02-21T08:53:50.4269366Z // begin inline asm 2026-02-21T08:53:50.4269456Z fence.proxy.async.shared::cta; 2026-02-21T08:53:50.4269514Z // end inline asm 2026-02-21T08:53:50.4269568Z bar.sync 0; 2026-02-21T08:53:50.4269655Z shfl.sync.idx.b32 %r2798, %r6, 0, 31, -1; 2026-02-21T08:53:50.4269730Z wgmma.fence.sync.aligned; 2026-02-21T08:53:50.4269797Z mov.pred %p28, -1; 2026-02-21T08:53:50.4269856Z // begin inline asm 2026-02-21T08:53:50.4270623Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r3830,%r3831,%r3832,%r3833,%r3834,%r3835,%r3836,%r3837,%r3838,%r3839,%r3840,%r3841,%r3842,%r3843,%r3844,%r3845,%r3846,%r3847,%r3848,%r3849,%r3850,%r3851,%r3852,%r3853,%r3854,%r3855,%r3856,%r3857,%r3858,%r3859,%r3860,%r3861}, {%r2180,%r2181,%r2182,%r2183}, %rd17, %p28, 1, 1; 2026-02-21T08:53:50.4270687Z // end inline asm 2026-02-21T08:53:50.4270746Z // begin inline asm 2026-02-21T08:53:50.4271496Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3830,%r3831,%r3832,%r3833,%r3834,%r3835,%r3836,%r3837,%r3838,%r3839,%r3840,%r3841,%r3842,%r3843,%r3844,%r3845,%r3846,%r3847,%r3848,%r3849,%r3850,%r3851,%r3852,%r3853,%r3854,%r3855,%r3856,%r3857,%r3858,%r3859,%r3860,%r3861}, {%r2248,%r2249,%r2250,%r2251}, %rd18, %p28, 1, 1; 2026-02-21T08:53:50.4271556Z // end inline asm 2026-02-21T08:53:50.4271614Z // begin inline asm 2026-02-21T08:53:50.4272364Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3830,%r3831,%r3832,%r3833,%r3834,%r3835,%r3836,%r3837,%r3838,%r3839,%r3840,%r3841,%r3842,%r3843,%r3844,%r3845,%r3846,%r3847,%r3848,%r3849,%r3850,%r3851,%r3852,%r3853,%r3854,%r3855,%r3856,%r3857,%r3858,%r3859,%r3860,%r3861}, {%r2316,%r2317,%r2318,%r2319}, %rd19, %p28, 1, 1; 2026-02-21T08:53:50.4272424Z // end inline asm 2026-02-21T08:53:50.4272482Z // begin inline asm 2026-02-21T08:53:50.4273234Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3830,%r3831,%r3832,%r3833,%r3834,%r3835,%r3836,%r3837,%r3838,%r3839,%r3840,%r3841,%r3842,%r3843,%r3844,%r3845,%r3846,%r3847,%r3848,%r3849,%r3850,%r3851,%r3852,%r3853,%r3854,%r3855,%r3856,%r3857,%r3858,%r3859,%r3860,%r3861}, {%r2384,%r2385,%r2386,%r2387}, %rd20, %p28, 1, 1; 2026-02-21T08:53:50.4273356Z // end inline asm 2026-02-21T08:53:50.4273416Z // begin inline asm 2026-02-21T08:53:50.4274169Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r3830,%r3831,%r3832,%r3833,%r3834,%r3835,%r3836,%r3837,%r3838,%r3839,%r3840,%r3841,%r3842,%r3843,%r3844,%r3845,%r3846,%r3847,%r3848,%r3849,%r3850,%r3851,%r3852,%r3853,%r3854,%r3855,%r3856,%r3857,%r3858,%r3859,%r3860,%r3861}, {%r2452,%r2453,%r2454,%r2455}, %rd21, %p28, 1, 1; 2026-02-21T08:53:50.4274227Z // end inline asm 2026-02-21T08:53:50.4274291Z // begin inline asm 2026-02-21T08:53:50.4275156Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3830,%r3831,%r3832,%r3833,%r3834,%r3835,%r3836,%r3837,%r3838,%r3839,%r3840,%r3841,%r3842,%r3843,%r3844,%r3845,%r3846,%r3847,%r3848,%r3849,%r3850,%r3851,%r3852,%r3853,%r3854,%r3855,%r3856,%r3857,%r3858,%r3859,%r3860,%r3861}, {%r2520,%r2521,%r2522,%r2523}, %rd22, %p28, 1, 1; 2026-02-21T08:53:50.4275260Z // end inline asm 2026-02-21T08:53:50.4275324Z // begin inline asm 2026-02-21T08:53:50.4276075Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3830,%r3831,%r3832,%r3833,%r3834,%r3835,%r3836,%r3837,%r3838,%r3839,%r3840,%r3841,%r3842,%r3843,%r3844,%r3845,%r3846,%r3847,%r3848,%r3849,%r3850,%r3851,%r3852,%r3853,%r3854,%r3855,%r3856,%r3857,%r3858,%r3859,%r3860,%r3861}, {%r2588,%r2589,%r2590,%r2591}, %rd23, %p28, 1, 1; 2026-02-21T08:53:50.4276134Z // end inline asm 2026-02-21T08:53:50.4276199Z // begin inline asm 2026-02-21T08:53:50.4277098Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3830,%r3831,%r3832,%r3833,%r3834,%r3835,%r3836,%r3837,%r3838,%r3839,%r3840,%r3841,%r3842,%r3843,%r3844,%r3845,%r3846,%r3847,%r3848,%r3849,%r3850,%r3851,%r3852,%r3853,%r3854,%r3855,%r3856,%r3857,%r3858,%r3859,%r3860,%r3861}, {%r2656,%r2657,%r2658,%r2659}, %rd24, %p28, 1, 1; 2026-02-21T08:53:50.4277167Z // end inline asm 2026-02-21T08:53:50.4277253Z wgmma.commit_group.sync.aligned; 2026-02-21T08:53:50.4277316Z mov.b32 %r2694, 0; 2026-02-21T08:53:50.4277381Z mov.b32 %r2692, %r441; 2026-02-21T08:53:50.4277441Z mov.b32 %r2693, %r2694; 2026-02-21T08:53:50.4277508Z // begin inline asm 2026-02-21T08:53:50.4278069Z // wait for regs: 
%r3830,%r3831,%r3832,%r3833,%r3834,%r3835,%r3836,%r3837,%r3838,%r3839,%r3840,%r3841,%r3842,%r3843,%r3844,%r3845,%r3846,%r3847,%r3848,%r3849,%r3850,%r3851,%r3852,%r3853,%r3854,%r3855,%r3856,%r3857,%r3858,%r3859,%r3860,%r3861,%r2692,%r2693,%r2694 2026-02-21T08:53:50.4278150Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:53:50.4278212Z // end inline asm 2026-02-21T08:53:50.4278270Z $L__tmp6: 2026-02-21T08:53:50.4278487Z .loc 1 37 103 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:37:103 2026-02-21T08:53:50.4278572Z add.s32 %r2799, %r3829, 1; 2026-02-21T08:53:50.4278644Z setp.gt.s32 %p39, %r2799, 1; 2026-02-21T08:53:50.4278713Z selp.b32 %r3829, 0, %r2799, %p39; 2026-02-21T08:53:50.4278919Z .loc 1 45 32 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:45:32 2026-02-21T08:53:50.4278995Z add.s64 %rd226, %rd319, -917504; 2026-02-21T08:53:50.4279059Z add.s64 %rd227, %rd319, -786432; 2026-02-21T08:53:50.4279123Z add.s64 %rd228, %rd319, -655360; 2026-02-21T08:53:50.4279191Z add.s64 %rd229, %rd319, -524288; 2026-02-21T08:53:50.4279254Z add.s64 %rd230, %rd319, -393216; 2026-02-21T08:53:50.4279317Z add.s64 %rd231, %rd319, -262144; 2026-02-21T08:53:50.4279388Z add.s64 %rd232, %rd319, -131072; 2026-02-21T08:53:50.4279588Z .loc 1 45 80 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:45:80 2026-02-21T08:53:50.4279653Z shl.b32 %r2800, %r3829, 13; 2026-02-21T08:53:50.4279716Z add.s32 %r2801, %r3732, %r2800; 2026-02-21T08:53:50.4279788Z add.s32 %r2730, %r2801, %r22; 2026-02-21T08:53:50.4279855Z selp.b32 %r2731, 8, 0, %p37; 2026-02-21T08:53:50.4279916Z // begin inline asm 2026-02-21T08:53:50.4280070Z cp.async.ca.shared.global [ %r2730 + 0 ], [ %rd226 + 0 ], 0x8, %r2731; 2026-02-21T08:53:50.4280224Z // end inline asm 2026-02-21T08:53:50.4280289Z add.s32 %r2732, %r2730, 1024; 2026-02-21T08:53:50.4280352Z // begin inline asm 2026-02-21T08:53:50.4280497Z cp.async.ca.shared.global [ %r2732 + 0 ], [ %rd227 + 0 ], 0x8, %r2731; 2026-02-21T08:53:50.4280555Z // 
end inline asm 2026-02-21T08:53:50.4280618Z add.s32 %r2734, %r2730, 2048; 2026-02-21T08:53:50.4280681Z // begin inline asm 2026-02-21T08:53:50.4280815Z cp.async.ca.shared.global [ %r2734 + 0 ], [ %rd228 + 0 ], 0x8, %r2731; 2026-02-21T08:53:50.4280873Z // end inline asm 2026-02-21T08:53:50.4280937Z add.s32 %r2736, %r2730, 3072; 2026-02-21T08:53:50.4280997Z // begin inline asm 2026-02-21T08:53:50.4281131Z cp.async.ca.shared.global [ %r2736 + 0 ], [ %rd229 + 0 ], 0x8, %r2731; 2026-02-21T08:53:50.4281368Z // end inline asm 2026-02-21T08:53:50.4281435Z add.s32 %r2738, %r2730, 4096; 2026-02-21T08:53:50.4281494Z // begin inline asm 2026-02-21T08:53:50.4281626Z cp.async.ca.shared.global [ %r2738 + 0 ], [ %rd230 + 0 ], 0x8, %r2731; 2026-02-21T08:53:50.4281690Z // end inline asm 2026-02-21T08:53:50.4281751Z add.s32 %r2740, %r2730, 5120; 2026-02-21T08:53:50.4281809Z // begin inline asm 2026-02-21T08:53:50.4281940Z cp.async.ca.shared.global [ %r2740 + 0 ], [ %rd231 + 0 ], 0x8, %r2731; 2026-02-21T08:53:50.4282000Z // end inline asm 2026-02-21T08:53:50.4282062Z add.s32 %r2742, %r2730, 6144; 2026-02-21T08:53:50.4282120Z // begin inline asm 2026-02-21T08:53:50.4282255Z cp.async.ca.shared.global [ %r2742 + 0 ], [ %rd232 + 0 ], 0x8, %r2731; 2026-02-21T08:53:50.4282312Z // end inline asm 2026-02-21T08:53:50.4282374Z add.s32 %r2744, %r2730, 7168; 2026-02-21T08:53:50.4282445Z // begin inline asm 2026-02-21T08:53:50.4282589Z cp.async.ca.shared.global [ %r2744 + 0 ], [ %rd319 + 0 ], 0x8, %r2731; 2026-02-21T08:53:50.4282650Z // end inline asm 2026-02-21T08:53:50.4282719Z cp.async.commit_group; 2026-02-21T08:53:50.4282929Z .loc 1 51 34 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:34 2026-02-21T08:53:50.4282999Z cvt.s64.s32 %rd235, %r3827; 2026-02-21T08:53:50.4283066Z add.s64 %rd234, %rd81, %rd235; 2026-02-21T08:53:50.4283272Z .loc 1 51 87 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:87 2026-02-21T08:53:50.4283336Z shl.b32 %r2802, %r3829, 11; 
2026-02-21T08:53:50.4283400Z add.s32 %r2746, %r32, %r2802; 2026-02-21T08:53:50.4283466Z selp.b32 %r2747, 16, 0, %p37; 2026-02-21T08:53:50.4283532Z // begin inline asm 2026-02-21T08:53:50.4283680Z cp.async.cg.shared.global [ %r2746 + 0 ], [ %rd234 + 0 ], 0x10, %r2747; 2026-02-21T08:53:50.4283742Z // end inline asm 2026-02-21T08:53:50.4283814Z cp.async.commit_group; 2026-02-21T08:53:50.4284029Z .loc 1 37 103 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:37:103 2026-02-21T08:53:50.4284099Z add.s32 %r3827, %r3827, 229376; 2026-02-21T08:53:50.4284172Z add.s64 %rd319, %rd319, 128; 2026-02-21T08:53:50.4284239Z setp.lt.u64 %p40, %rd320, 4064; 2026-02-21T08:53:50.4284304Z @%p40 bra $L__BB0_7; 2026-02-21T08:53:50.4284419Z // %bb.8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T08:53:50.4284626Z .loc 1 30 32 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:30:32 2026-02-21T08:53:50.4284692Z or.b32 %r2839, %r270, %r10; 2026-02-21T08:53:50.4284900Z .loc 1 37 103 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:37:103 2026-02-21T08:53:50.4284976Z cp.async.wait_group 0; 2026-02-21T08:53:50.4285037Z bar.sync 0; 2026-02-21T08:53:50.4285237Z .loc 1 84 28 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:84:28 2026-02-21T08:53:50.4285324Z cvt.rn.bf16x2.f32 %r2840, %r3831, %r3830; 2026-02-21T08:53:50.4285406Z cvt.rn.bf16x2.f32 %r2841, %r3833, %r3832; 2026-02-21T08:53:50.4285481Z cvt.rn.bf16x2.f32 %r2842, %r3835, %r3834; 2026-02-21T08:53:50.4285554Z cvt.rn.bf16x2.f32 %r2843, %r3837, %r3836; 2026-02-21T08:53:50.4285718Z cvt.rn.bf16x2.f32 %r2844, %r3839, %r3838; 2026-02-21T08:53:50.4285792Z cvt.rn.bf16x2.f32 %r2845, %r3841, %r3840; 2026-02-21T08:53:50.4285866Z cvt.rn.bf16x2.f32 %r2846, %r3843, %r3842; 2026-02-21T08:53:50.4285945Z cvt.rn.bf16x2.f32 %r2847, %r3845, %r3844; 2026-02-21T08:53:50.4286019Z cvt.rn.bf16x2.f32 %r2848, %r3847, %r3846; 2026-02-21T08:53:50.4286092Z cvt.rn.bf16x2.f32 %r2849, %r3849, %r3848; 2026-02-21T08:53:50.4286173Z 
cvt.rn.bf16x2.f32 %r2850, %r3851, %r3850; 2026-02-21T08:53:50.4286247Z cvt.rn.bf16x2.f32 %r2851, %r3853, %r3852; 2026-02-21T08:53:50.4286321Z cvt.rn.bf16x2.f32 %r2852, %r3855, %r3854; 2026-02-21T08:53:50.4286398Z cvt.rn.bf16x2.f32 %r2853, %r3857, %r3856; 2026-02-21T08:53:50.4286594Z cvt.rn.bf16x2.f32 %r2854, %r3859, %r3858; 2026-02-21T08:53:50.4286865Z cvt.rn.bf16x2.f32 %r2855, %r3861, %r3860; 2026-02-21T08:53:50.4287075Z .loc 1 85 50 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:85:50 2026-02-21T08:53:50.4287161Z add.s32 %r2856, %r2839, %r18; 2026-02-21T08:53:50.4287226Z add.s32 %r2857, %r2839, %r19; 2026-02-21T08:53:50.4287289Z add.s32 %r2858, %r2839, %r20; 2026-02-21T08:53:50.4287355Z add.s32 %r2859, %r2839, %r21; 2026-02-21T08:53:50.4287557Z .loc 1 85 22 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:85:22 2026-02-21T08:53:50.4287636Z mad.wide.s32 %rd236, %r2856, 2, %rd82; 2026-02-21T08:53:50.4287706Z mad.wide.s32 %rd237, %r2857, 2, %rd82; 2026-02-21T08:53:50.4287780Z mad.wide.s32 %rd238, %r2858, 2, %rd82; 2026-02-21T08:53:50.4287848Z mad.wide.s32 %rd239, %r2859, 2, %rd82; 2026-02-21T08:53:50.4288045Z .loc 1 85 81 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:85:81 2026-02-21T08:53:50.4288171Z st.shared.v4.b32 [%r62], {%r2840, %r2842, %r2844, %r2846}; 2026-02-21T08:53:50.4288289Z st.shared.v4.b32 [%r62+256], {%r2841, %r2843, %r2845, %r2847}; 2026-02-21T08:53:50.4288396Z st.shared.v4.b32 [%r63], {%r2848, %r2850, %r2852, %r2854}; 2026-02-21T08:53:50.4288511Z st.shared.v4.b32 [%r63+256], {%r2849, %r2851, %r2853, %r2855}; 2026-02-21T08:53:50.4288568Z bar.sync 0; 2026-02-21T08:53:50.4288631Z // begin inline asm 2026-02-21T08:53:50.4288830Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2803, %r2804, %r2805, %r2806}, [%r1235]; 2026-02-21T08:53:50.4288893Z // end inline asm 2026-02-21T08:53:50.4288953Z // begin inline asm 2026-02-21T08:53:50.4289138Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2808, %r2809, %r2810, %r2811}, [%r1240]; 
2026-02-21T08:53:50.4289198Z // end inline asm 2026-02-21T08:53:50.4289256Z // begin inline asm 2026-02-21T08:53:50.4289436Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2813, %r2814, %r2815, %r2816}, [%r1245]; 2026-02-21T08:53:50.4289496Z // end inline asm 2026-02-21T08:53:50.4289560Z // begin inline asm 2026-02-21T08:53:50.4289738Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2818, %r2819, %r2820, %r2821}, [%r1250]; 2026-02-21T08:53:50.4289794Z // end inline asm 2026-02-21T08:53:50.4289856Z // begin inline asm 2026-02-21T08:53:50.4289984Z st.global.v4.b32 [ %rd236 + 0 ], { %r2803, %r2804, %r2805, %r2806 }; 2026-02-21T08:53:50.4290041Z // end inline asm 2026-02-21T08:53:50.4290100Z // begin inline asm 2026-02-21T08:53:50.4290221Z st.global.v4.b32 [ %rd237 + 0 ], { %r2808, %r2809, %r2810, %r2811 }; 2026-02-21T08:53:50.4290278Z // end inline asm 2026-02-21T08:53:50.4290336Z // begin inline asm 2026-02-21T08:53:50.4290457Z st.global.v4.b32 [ %rd238 + 0 ], { %r2813, %r2814, %r2815, %r2816 }; 2026-02-21T08:53:50.4290514Z // end inline asm 2026-02-21T08:53:50.4290571Z // begin inline asm 2026-02-21T08:53:50.4290690Z st.global.v4.b32 [ %rd239 + 0 ], { %r2818, %r2819, %r2820, %r2821 }; 2026-02-21T08:53:50.4290746Z // end inline asm 2026-02-21T08:53:50.4290956Z .loc 1 22 88 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:22:88 2026-02-21T08:53:50.4291025Z add.s64 %rd314, %rd314, 3; 2026-02-21T08:53:50.4291090Z add.s32 %r3756, %r3756, 192; 2026-02-21T08:53:50.4291243Z add.s32 %r3755, %r3755, 192; 2026-02-21T08:53:50.4291302Z add.s32 %r3754, %r3754, 192; 2026-02-21T08:53:50.4291377Z setp.lt.u64 %p41, %rd314, %rd26; 2026-02-21T08:53:50.4291442Z @%p41 bra $L__BB0_2; 2026-02-21T08:53:50.4291536Z $L__BB0_9: // %.preheader 2026-02-21T08:53:50.4291611Z setp.gt.s32 %p42, %r4, %r2; 2026-02-21T08:53:50.4291672Z @%p42 bra $L__BB0_14; 2026-02-21T08:53:50.4291759Z // %bb.10: // %.lr.ph153 2026-02-21T08:53:50.4291970Z .loc 1 0 88 // 
cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:0:88 2026-02-21T08:53:50.4292040Z sub.s32 %r3, %r414, %r418; 2026-02-21T08:53:50.4292114Z mad.wide.u32 %rd269, %r3729, 2, %rd80; 2026-02-21T08:53:50.4292324Z add.s64 %rd242, %rd312, %rd311; 2026-02-21T08:53:50.4292394Z shl.b64 %rd243, %rd242, 1; 2026-02-21T08:53:50.4292461Z add.s64 %rd244, %rd80, %rd243; 2026-02-21T08:53:50.4292522Z add.s64 %rd270, %rd244, 131072; 2026-02-21T08:53:50.4292588Z add.s64 %rd271, %rd244, 262144; 2026-02-21T08:53:50.4292649Z add.s64 %rd272, %rd244, 393216; 2026-02-21T08:53:50.4292709Z add.s64 %rd273, %rd244, 524288; 2026-02-21T08:53:50.4292772Z add.s64 %rd274, %rd244, 655360; 2026-02-21T08:53:50.4292838Z add.s64 %rd275, %rd244, 786432; 2026-02-21T08:53:50.4292898Z add.s64 %rd246, %rd313, %rd311; 2026-02-21T08:53:50.4292959Z shl.b64 %rd247, %rd246, 1; 2026-02-21T08:53:50.4293024Z add.s64 %rd248, %rd80, %rd247; 2026-02-21T08:53:50.4293090Z add.s64 %rd276, %rd248, 917504; 2026-02-21T08:53:50.4293149Z shl.b32 %r2862, %r3730, 3; 2026-02-21T08:53:50.4293211Z xor.b32 %r79, %r2862, %r3731; 2026-02-21T08:53:50.4293278Z add.s32 %r80, %r3732, %r79; 2026-02-21T08:53:50.4293338Z add.s32 %r2944, %r80, 1024; 2026-02-21T08:53:50.4293403Z add.s32 %r2946, %r80, 2048; 2026-02-21T08:53:50.4293466Z add.s32 %r2948, %r80, 3072; 2026-02-21T08:53:50.4293525Z add.s32 %r2950, %r80, 4096; 2026-02-21T08:53:50.4293583Z add.s32 %r2952, %r80, 5120; 2026-02-21T08:53:50.4293644Z add.s32 %r2954, %r80, 6144; 2026-02-21T08:53:50.4293707Z add.s32 %r2956, %r80, 7168; 2026-02-21T08:53:50.4293780Z shl.b32 %r2865, %r3730, 4; 2026-02-21T08:53:50.4293842Z add.s32 %r2866, %r3732, %r2865; 2026-02-21T08:53:50.4293907Z add.s32 %r89, %r2866, 32768; 2026-02-21T08:53:50.4293971Z add.s64 %rd278, %rd244, 128; 2026-02-21T08:53:50.4294034Z cvt.u64.u32 %rd249, %r3734; 2026-02-21T08:53:50.4294099Z add.s64 %rd250, %rd312, %rd249; 2026-02-21T08:53:50.4294160Z shl.b64 %rd251, %rd250, 1; 2026-02-21T08:53:50.4294222Z add.s64 %rd252, %rd80, 
%rd251; 2026-02-21T08:53:50.4294282Z add.s64 %rd279, %rd252, 131072; 2026-02-21T08:53:50.4294347Z add.s64 %rd280, %rd252, 262144; 2026-02-21T08:53:50.4294409Z add.s64 %rd281, %rd252, 393216; 2026-02-21T08:53:50.4294476Z add.s64 %rd282, %rd252, 524288; 2026-02-21T08:53:50.4294539Z add.s64 %rd283, %rd252, 655360; 2026-02-21T08:53:50.4294599Z add.s64 %rd284, %rd252, 786432; 2026-02-21T08:53:50.4294660Z add.s64 %rd253, %rd313, %rd249; 2026-02-21T08:53:50.4294725Z shl.b64 %rd254, %rd253, 1; 2026-02-21T08:53:50.4294790Z add.s64 %rd255, %rd80, %rd254; 2026-02-21T08:53:50.4294851Z add.s64 %rd285, %rd255, 917504; 2026-02-21T08:53:50.4294912Z add.s32 %r2960, %r80, 8192; 2026-02-21T08:53:50.4294977Z add.s32 %r2962, %r80, 9216; 2026-02-21T08:53:50.4295040Z add.s32 %r2964, %r80, 10240; 2026-02-21T08:53:50.4295100Z add.s32 %r2966, %r80, 11264; 2026-02-21T08:53:50.4295160Z add.s32 %r2968, %r80, 12288; 2026-02-21T08:53:50.4295224Z add.s32 %r2970, %r80, 13312; 2026-02-21T08:53:50.4295284Z add.s32 %r2972, %r80, 14336; 2026-02-21T08:53:50.4295344Z add.s32 %r2974, %r80, 15360; 2026-02-21T08:53:50.4295558Z .loc 1 22 88 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:22:88 2026-02-21T08:53:50.4295627Z add.s32 %r98, %r3733, 229376; 2026-02-21T08:53:50.4295690Z add.s32 %r2976, %r2866, 34816; 2026-02-21T08:53:50.4295757Z and.b32 %r2869, %r3735, 6144; 2026-02-21T08:53:50.4295907Z and.b32 %r2871, %r3736, 896; 2026-02-21T08:53:50.4295970Z and.b32 %r2873, %r3737, 62; 2026-02-21T08:53:50.4296033Z or.b32 %r2874, %r2869, %r2871; 2026-02-21T08:53:50.4296102Z or.b32 %r100, %r2874, %r2873; 2026-02-21T08:53:50.4296167Z xor.b32 %r101, %r100, 8; 2026-02-21T08:53:50.4296230Z xor.b32 %r102, %r100, 16; 2026-02-21T08:53:50.4296295Z xor.b32 %r103, %r100, 24; 2026-02-21T08:53:50.4296354Z xor.b32 %r104, %r100, 32; 2026-02-21T08:53:50.4296412Z xor.b32 %r105, %r100, 40; 2026-02-21T08:53:50.4296631Z xor.b32 %r106, %r100, 48; 2026-02-21T08:53:50.4296700Z xor.b32 %r107, %r100, 56; 
2026-02-21T08:53:50.4296763Z shl.b32 %r2875, %r3738, 7; 2026-02-21T08:53:50.4296826Z or.b32 %r2878, %r3741, %r3742; 2026-02-21T08:53:50.4296890Z or.b32 %r2879, %r2878, %r2875; 2026-02-21T08:53:50.4297152Z add.s32 %r2880, %r3732, 16384; 2026-02-21T08:53:50.4297216Z add.s32 %r111, %r2880, %r2879; 2026-02-21T08:53:50.4297280Z xor.b32 %r2881, %r2879, 16; 2026-02-21T08:53:50.4297344Z add.s32 %r112, %r2880, %r2881; 2026-02-21T08:53:50.4297410Z xor.b32 %r2882, %r2879, 32; 2026-02-21T08:53:50.4297471Z add.s32 %r113, %r2880, %r2882; 2026-02-21T08:53:50.4297535Z xor.b32 %r2883, %r2879, 48; 2026-02-21T08:53:50.4297596Z add.s32 %r114, %r2880, %r2883; 2026-02-21T08:53:50.4297669Z xor.b32 %r2884, %r2879, 64; 2026-02-21T08:53:50.4297735Z add.s32 %r115, %r2880, %r2884; 2026-02-21T08:53:50.4297795Z xor.b32 %r2885, %r2879, 80; 2026-02-21T08:53:50.4297856Z add.s32 %r116, %r2880, %r2885; 2026-02-21T08:53:50.4297915Z xor.b32 %r2886, %r2879, 96; 2026-02-21T08:53:50.4297979Z add.s32 %r117, %r2880, %r2886; 2026-02-21T08:53:50.4298043Z xor.b32 %r2887, %r2879, 112; 2026-02-21T08:53:50.4298104Z add.s32 %r118, %r2880, %r2887; 2026-02-21T08:53:50.4298167Z bfe.u32 %r2888, %r2880, 4, 14; 2026-02-21T08:53:50.4298237Z cvt.u64.u32 %rd256, %r2888; 2026-02-21T08:53:50.4298318Z or.b64 %rd290, %rd256, 4611686293338849280; 2026-02-21T08:53:50.4298380Z add.s32 %r2889, %r3732, 16416; 2026-02-21T08:53:50.4298444Z bfe.u32 %r2890, %r2889, 4, 14; 2026-02-21T08:53:50.4298505Z cvt.u64.u32 %rd257, %r2890; 2026-02-21T08:53:50.4298579Z or.b64 %rd291, %rd257, 4611686293338849280; 2026-02-21T08:53:50.4298642Z add.s32 %r2891, %r3732, 16448; 2026-02-21T08:53:50.4298703Z bfe.u32 %r2892, %r2891, 4, 14; 2026-02-21T08:53:50.4298764Z cvt.u64.u32 %rd258, %r2892; 2026-02-21T08:53:50.4298835Z or.b64 %rd292, %rd258, 4611686293338849280; 2026-02-21T08:53:50.4298899Z add.s32 %r2893, %r3732, 16480; 2026-02-21T08:53:50.4298957Z bfe.u32 %r2894, %r2893, 4, 14; 2026-02-21T08:53:50.4299017Z cvt.u64.u32 %rd259, %r2894; 
2026-02-21T08:53:50.4299092Z or.b64 %rd293, %rd259, 4611686293338849280; 2026-02-21T08:53:50.4299152Z add.s32 %r2895, %r3732, 24576; 2026-02-21T08:53:50.4299213Z bfe.u32 %r2896, %r2895, 4, 14; 2026-02-21T08:53:50.4299281Z cvt.u64.u32 %rd260, %r2896; 2026-02-21T08:53:50.4299352Z or.b64 %rd294, %rd260, 4611686293338849280; 2026-02-21T08:53:50.4299413Z add.s32 %r2897, %r3732, 24608; 2026-02-21T08:53:50.4299474Z bfe.u32 %r2898, %r2897, 4, 14; 2026-02-21T08:53:50.4299539Z cvt.u64.u32 %rd261, %r2898; 2026-02-21T08:53:50.4299610Z or.b64 %rd295, %rd261, 4611686293338849280; 2026-02-21T08:53:50.4299671Z add.s32 %r2899, %r3732, 24640; 2026-02-21T08:53:50.4299733Z bfe.u32 %r2900, %r2899, 4, 14; 2026-02-21T08:53:50.4299796Z cvt.u64.u32 %rd262, %r2900; 2026-02-21T08:53:50.4299867Z or.b64 %rd296, %rd262, 4611686293338849280; 2026-02-21T08:53:50.4299928Z add.s32 %r2901, %r3732, 24672; 2026-02-21T08:53:50.4299989Z bfe.u32 %r2902, %r2901, 4, 14; 2026-02-21T08:53:50.4300050Z cvt.u64.u32 %rd263, %r2902; 2026-02-21T08:53:50.4300120Z or.b64 %rd297, %rd263, 4611686293338849280; 2026-02-21T08:53:50.4300184Z and.b32 %r2903, %r11, 1536; 2026-02-21T08:53:50.4300249Z shl.b32 %r2905, %r3743, 5; 2026-02-21T08:53:50.4300314Z and.b32 %r2908, %r3745, 4160; 2026-02-21T08:53:50.4300374Z shl.b32 %r2911, %r3747, 2; 2026-02-21T08:53:50.4300440Z and.b32 %r2913, %r3748, 2080; 2026-02-21T08:53:50.4300593Z shl.b32 %r2916, %r3750, 3; 2026-02-21T08:53:50.4300655Z or.b32 %r2917, %r2908, %r2911; 2026-02-21T08:53:50.4300721Z or.b32 %r2918, %r2917, %r2916; 2026-02-21T08:53:50.4300782Z or.b32 %r2919, %r2903, %r2905; 2026-02-21T08:53:50.4300846Z xor.b32 %r2920, %r2919, %r2913; 2026-02-21T08:53:50.4300909Z or.b32 %r2921, %r2918, %r2920; 2026-02-21T08:53:50.4300971Z add.s32 %r119, %r3732, %r2921; 2026-02-21T08:53:50.4301032Z xor.b32 %r2922, %r2921, 64; 2026-02-21T08:53:50.4301093Z add.s32 %r120, %r3732, %r2922; 2026-02-21T08:53:50.4301157Z shl.b32 %r2924, %r3743, 6; 2026-02-21T08:53:50.4301220Z shl.b32 %r2925, 
%r3744, 3; 2026-02-21T08:53:50.4301281Z and.b32 %r2926, %r3746, 2080; 2026-02-21T08:53:50.4301345Z and.b32 %r2927, %r3749, 4160; 2026-02-21T08:53:50.4301461Z or.b32 %r2928, %r3751, %r2924; 2026-02-21T08:53:50.4301610Z or.b32 %r2929, %r2926, %r2927; 2026-02-21T08:53:50.4301675Z xor.b32 %r2930, %r2929, %r2928; 2026-02-21T08:53:50.4301741Z add.s32 %r2931, %r3732, %r2925; 2026-02-21T08:53:50.4301803Z add.s32 %r3676, %r2931, %r2930; 2026-02-21T08:53:50.4301879Z add.s32 %r3681, %r3676, 512; 2026-02-21T08:53:50.4301950Z add.s32 %r3686, %r3676, 1024; 2026-02-21T08:53:50.4302012Z add.s32 %r3691, %r3676, 1536; 2026-02-21T08:53:50.4302073Z sub.s32 %r2932, %r2, %r3; 2026-02-21T08:53:50.4302133Z add.s32 %r2933, %r2932, 1; 2026-02-21T08:53:50.4302200Z cvt.s64.s32 %rd322, %r2933; 2026-02-21T08:53:50.4302264Z cvt.u64.u32 %rd53, %r2; 2026-02-21T08:53:50.4302327Z shl.b32 %r2934, %r2, 6; 2026-02-21T08:53:50.4302392Z add.s32 %r2935, %r3733, %r2934; 2026-02-21T08:53:50.4302453Z or.b32 %r2936, %r2935, %r12; 2026-02-21T08:53:50.4302513Z shl.b32 %r2937, %r3, 6; 2026-02-21T08:53:50.4302576Z sub.s32 %r2938, %r2936, %r2937; 2026-02-21T08:53:50.4302642Z add.s32 %r2939, %r2938, 458816; 2026-02-21T08:53:50.4302710Z cvt.u64.u32 %rd264, %r2939; 2026-02-21T08:53:50.4302777Z add.s64 %rd321, %rd81, %rd264; 2026-02-21T08:53:50.4302853Z mul.wide.u32 %rd265, %r3752, 16384; 2026-02-21T08:53:50.4302920Z mul.wide.u32 %rd266, %r3753, 8; 2026-02-21T08:53:50.4302984Z or.b64 %rd267, %rd265, %rd266; 2026-02-21T08:53:50.4303050Z add.s64 %rd268, %rd267, %rd80; 2026-02-21T08:53:50.4303112Z add.s64 %rd55, %rd268, 917760; 2026-02-21T08:53:50.4303233Z $L__BB0_11: // =>This Loop Header: Depth=1 2026-02-21T08:53:50.4303334Z // Child Loop BB0_12 Depth 2 2026-02-21T08:53:50.4303546Z .loc 1 29 27 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:29:27 2026-02-21T08:53:50.4303610Z cvt.u32.u64 %r2981, %rd322; 2026-02-21T08:53:50.4303670Z shl.b32 %r344, %r2981, 6; 2026-02-21T08:53:50.4303875Z .loc 1 30 32 // 
cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:30:32 2026-02-21T08:53:50.4303942Z or.b32 %r2982, %r344, %r12; 2026-02-21T08:53:50.4304140Z .loc 1 45 80 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:45:80 2026-02-21T08:53:50.4304203Z bar.sync 0; 2026-02-21T08:53:50.4304262Z mov.b32 %r2943, 8; 2026-02-21T08:53:50.4304321Z // begin inline asm 2026-02-21T08:53:50.4304464Z cp.async.ca.shared.global [ %r80 + 0 ], [ %rd269 + 0 ], 0x8, %r2943; 2026-02-21T08:53:50.4304525Z // end inline asm 2026-02-21T08:53:50.4304583Z // begin inline asm 2026-02-21T08:53:50.4304725Z cp.async.ca.shared.global [ %r2944 + 0 ], [ %rd270 + 0 ], 0x8, %r2943; 2026-02-21T08:53:50.4304787Z // end inline asm 2026-02-21T08:53:50.4304845Z // begin inline asm 2026-02-21T08:53:50.4304979Z cp.async.ca.shared.global [ %r2946 + 0 ], [ %rd271 + 0 ], 0x8, %r2943; 2026-02-21T08:53:50.4305041Z // end inline asm 2026-02-21T08:53:50.4305100Z // begin inline asm 2026-02-21T08:53:50.4305234Z cp.async.ca.shared.global [ %r2948 + 0 ], [ %rd272 + 0 ], 0x8, %r2943; 2026-02-21T08:53:50.4305295Z // end inline asm 2026-02-21T08:53:50.4305360Z // begin inline asm 2026-02-21T08:53:50.4305492Z cp.async.ca.shared.global [ %r2950 + 0 ], [ %rd273 + 0 ], 0x8, %r2943; 2026-02-21T08:53:50.4305637Z // end inline asm 2026-02-21T08:53:50.4305698Z // begin inline asm 2026-02-21T08:53:50.4305830Z cp.async.ca.shared.global [ %r2952 + 0 ], [ %rd274 + 0 ], 0x8, %r2943; 2026-02-21T08:53:50.4305886Z // end inline asm 2026-02-21T08:53:50.4305948Z // begin inline asm 2026-02-21T08:53:50.4306092Z cp.async.ca.shared.global [ %r2954 + 0 ], [ %rd275 + 0 ], 0x8, %r2943; 2026-02-21T08:53:50.4306151Z // end inline asm 2026-02-21T08:53:50.4306209Z // begin inline asm 2026-02-21T08:53:50.4306345Z cp.async.ca.shared.global [ %r2956 + 0 ], [ %rd276 + 0 ], 0x8, %r2943; 2026-02-21T08:53:50.4306402Z // end inline asm 2026-02-21T08:53:50.4306600Z cp.async.commit_group; 2026-02-21T08:53:50.4306819Z .loc 1 51 62 // 
cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:62 2026-02-21T08:53:50.4307085Z add.s32 %r2983, %r2982, %r3733; 2026-02-21T08:53:50.4307287Z .loc 1 51 34 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:34 2026-02-21T08:53:50.4307355Z cvt.s64.s32 %rd288, %r2983; 2026-02-21T08:53:50.4307428Z add.s64 %rd277, %rd81, %rd288; 2026-02-21T08:53:50.4307487Z mov.b32 %r2959, 16; 2026-02-21T08:53:50.4307686Z .loc 1 51 87 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:87 2026-02-21T08:53:50.4307749Z // begin inline asm 2026-02-21T08:53:50.4307890Z cp.async.cg.shared.global [ %r89 + 0 ], [ %rd277 + 0 ], 0x10, %r2959; 2026-02-21T08:53:50.4307948Z // end inline asm 2026-02-21T08:53:50.4308016Z cp.async.commit_group; 2026-02-21T08:53:50.4308217Z .loc 1 45 80 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:45:80 2026-02-21T08:53:50.4308275Z bar.sync 0; 2026-02-21T08:53:50.4308335Z // begin inline asm 2026-02-21T08:53:50.4308481Z cp.async.ca.shared.global [ %r2960 + 0 ], [ %rd278 + 0 ], 0x8, %r2943; 2026-02-21T08:53:50.4308604Z // end inline asm 2026-02-21T08:53:50.4308665Z // begin inline asm 2026-02-21T08:53:50.4308804Z cp.async.ca.shared.global [ %r2962 + 0 ], [ %rd279 + 0 ], 0x8, %r2943; 2026-02-21T08:53:50.4308863Z // end inline asm 2026-02-21T08:53:50.4308921Z // begin inline asm 2026-02-21T08:53:50.4309053Z cp.async.ca.shared.global [ %r2964 + 0 ], [ %rd280 + 0 ], 0x8, %r2943; 2026-02-21T08:53:50.4309113Z // end inline asm 2026-02-21T08:53:50.4309171Z // begin inline asm 2026-02-21T08:53:50.4309301Z cp.async.ca.shared.global [ %r2966 + 0 ], [ %rd281 + 0 ], 0x8, %r2943; 2026-02-21T08:53:50.4309362Z // end inline asm 2026-02-21T08:53:50.4309421Z // begin inline asm 2026-02-21T08:53:50.4309550Z cp.async.ca.shared.global [ %r2968 + 0 ], [ %rd282 + 0 ], 0x8, %r2943; 2026-02-21T08:53:50.4309609Z // end inline asm 2026-02-21T08:53:50.4309667Z // begin inline asm 2026-02-21T08:53:50.4309802Z cp.async.ca.shared.global [ %r2970 + 0 ], [ 
%rd283 + 0 ], 0x8, %r2943; 2026-02-21T08:53:50.4309861Z // end inline asm 2026-02-21T08:53:50.4309925Z // begin inline asm 2026-02-21T08:53:50.4310057Z cp.async.ca.shared.global [ %r2972 + 0 ], [ %rd284 + 0 ], 0x8, %r2943; 2026-02-21T08:53:50.4310119Z // end inline asm 2026-02-21T08:53:50.4310180Z // begin inline asm 2026-02-21T08:53:50.4310309Z cp.async.ca.shared.global [ %r2974 + 0 ], [ %rd285 + 0 ], 0x8, %r2943; 2026-02-21T08:53:50.4310364Z // end inline asm 2026-02-21T08:53:50.4310431Z cp.async.commit_group; 2026-02-21T08:53:50.4310502Z add.s32 %r2984, %r2982, %r98; 2026-02-21T08:53:50.4310701Z .loc 1 51 34 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:34 2026-02-21T08:53:50.4313839Z cvt.s64.s32 %rd289, %r2984; 2026-02-21T08:53:50.4313947Z add.s64 %rd286, %rd81, %rd289; 2026-02-21T08:53:50.4314191Z .loc 1 51 87 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:87 2026-02-21T08:53:50.4314269Z // begin inline asm 2026-02-21T08:53:50.4314428Z cp.async.cg.shared.global [ %r2976 + 0 ], [ %rd286 + 0 ], 0x10, %r2959; 2026-02-21T08:53:50.4314494Z // end inline asm 2026-02-21T08:53:50.4314565Z cp.async.commit_group; 2026-02-21T08:53:50.4314765Z mov.b32 %r3864, 0f00000000; 2026-02-21T08:53:50.4314832Z mov.b32 %r3863, 1; 2026-02-21T08:53:50.4314894Z mov.b32 %r3862, -1; 2026-02-21T08:53:50.4314955Z mov.b64 %rd325, -32; 2026-02-21T08:53:50.4315017Z mov.b64 %rd323, %rd55; 2026-02-21T08:53:50.4315086Z mov.b64 %rd324, %rd321; 2026-02-21T08:53:50.4315145Z mov.b32 %r3865, %r3864; 2026-02-21T08:53:50.4315203Z mov.b32 %r3866, %r3864; 2026-02-21T08:53:50.4315266Z mov.b32 %r3867, %r3864; 2026-02-21T08:53:50.4315325Z mov.b32 %r3868, %r3864; 2026-02-21T08:53:50.4315383Z mov.b32 %r3869, %r3864; 2026-02-21T08:53:50.4315442Z mov.b32 %r3870, %r3864; 2026-02-21T08:53:50.4315507Z mov.b32 %r3871, %r3864; 2026-02-21T08:53:50.4315566Z mov.b32 %r3872, %r3864; 2026-02-21T08:53:50.4315694Z mov.b32 %r3873, %r3864; 2026-02-21T08:53:50.4315871Z mov.b32 %r3874, %r3864; 
2026-02-21T08:53:50.4315933Z mov.b32 %r3875, %r3864; 2026-02-21T08:53:50.4315991Z mov.b32 %r3876, %r3864; 2026-02-21T08:53:50.4316053Z mov.b32 %r3877, %r3864; 2026-02-21T08:53:50.4316113Z mov.b32 %r3878, %r3864; 2026-02-21T08:53:50.4316182Z mov.b32 %r3879, %r3864; 2026-02-21T08:53:50.4316244Z mov.b32 %r3880, %r3864; 2026-02-21T08:53:50.4316308Z mov.b32 %r3881, %r3864; 2026-02-21T08:53:50.4316365Z mov.b32 %r3882, %r3864; 2026-02-21T08:53:50.4316423Z mov.b32 %r3883, %r3864; 2026-02-21T08:53:50.4316654Z mov.b32 %r3884, %r3864; 2026-02-21T08:53:50.4316718Z mov.b32 %r3885, %r3864; 2026-02-21T08:53:50.4316776Z mov.b32 %r3886, %r3864; 2026-02-21T08:53:50.4316835Z mov.b32 %r3887, %r3864; 2026-02-21T08:53:50.4316899Z mov.b32 %r3888, %r3864; 2026-02-21T08:53:50.4316957Z mov.b32 %r3889, %r3864; 2026-02-21T08:53:50.4317014Z mov.b32 %r3890, %r3864; 2026-02-21T08:53:50.4317086Z mov.b32 %r3891, %r3864; 2026-02-21T08:53:50.4317151Z mov.b32 %r3892, %r3864; 2026-02-21T08:53:50.4317212Z mov.b32 %r3893, %r3864; 2026-02-21T08:53:50.4317272Z mov.b32 %r3894, %r3864; 2026-02-21T08:53:50.4317333Z mov.b32 %r3895, %r3864; 2026-02-21T08:53:50.4317459Z $L__BB0_12: // Parent Loop BB0_11 Depth=1 2026-02-21T08:53:50.4317568Z // => This Inner Loop Header: Depth=2 2026-02-21T08:53:50.4317799Z .loc 1 37 103 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:37:103 2026-02-21T08:53:50.4317865Z add.s64 %rd325, %rd325, 32; 2026-02-21T08:53:50.4317934Z setp.lt.u64 %p52, %rd325, 4032; 2026-02-21T08:53:50.4318001Z add.s32 %r3617, %r3862, 1; 2026-02-21T08:53:50.4318069Z setp.gt.s32 %p53, %r3617, 1; 2026-02-21T08:53:50.4318139Z selp.b32 %r3862, 0, %r3617, %p53; 2026-02-21T08:53:50.4318347Z .loc 1 45 80 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:45:80 2026-02-21T08:53:50.4318420Z cp.async.wait_group 2; 2026-02-21T08:53:50.4318480Z bar.sync 0; 2026-02-21T08:53:50.4318542Z shl.b32 %r3618, %r3862, 13; 2026-02-21T08:53:50.4318606Z add.s32 %r3620, %r3732, %r3618; 
2026-02-21T08:53:50.4318808Z .loc 1 49 32 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:49:32 2026-02-21T08:53:50.4318877Z add.s32 %r3621, %r3620, %r100; 2026-02-21T08:53:50.4318944Z ld.shared.b16 %rs577, [%r3621]; 2026-02-21T08:53:50.4319018Z ld.shared.b16 %rs578, [%r3621+1024]; 2026-02-21T08:53:50.4319088Z ld.shared.b16 %rs579, [%r3621+64]; 2026-02-21T08:53:50.4319157Z ld.shared.b16 %rs580, [%r3621+1088]; 2026-02-21T08:53:50.4319218Z add.s32 %r3622, %r3620, %r101; 2026-02-21T08:53:50.4319298Z ld.shared.b16 %rs581, [%r3622]; 2026-02-21T08:53:50.4319368Z ld.shared.b16 %rs582, [%r3622+1024]; 2026-02-21T08:53:50.4319435Z ld.shared.b16 %rs583, [%r3622+64]; 2026-02-21T08:53:50.4319506Z ld.shared.b16 %rs584, [%r3622+1088]; 2026-02-21T08:53:50.4319570Z add.s32 %r3623, %r3620, %r102; 2026-02-21T08:53:50.4319641Z ld.shared.b16 %rs585, [%r3623]; 2026-02-21T08:53:50.4319709Z ld.shared.b16 %rs586, [%r3623+1024]; 2026-02-21T08:53:50.4319778Z ld.shared.b16 %rs587, [%r3623+64]; 2026-02-21T08:53:50.4319934Z ld.shared.b16 %rs588, [%r3623+1088]; 2026-02-21T08:53:50.4319998Z add.s32 %r3624, %r3620, %r103; 2026-02-21T08:53:50.4320066Z ld.shared.b16 %rs589, [%r3624]; 2026-02-21T08:53:50.4320131Z ld.shared.b16 %rs590, [%r3624+1024]; 2026-02-21T08:53:50.4320208Z ld.shared.b16 %rs591, [%r3624+64]; 2026-02-21T08:53:50.4320280Z ld.shared.b16 %rs592, [%r3624+1088]; 2026-02-21T08:53:50.4320343Z add.s32 %r3625, %r3620, %r104; 2026-02-21T08:53:50.4320410Z ld.shared.b16 %rs593, [%r3625]; 2026-02-21T08:53:50.4320475Z ld.shared.b16 %rs594, [%r3625+1024]; 2026-02-21T08:53:50.4320544Z ld.shared.b16 %rs595, [%r3625+64]; 2026-02-21T08:53:50.4320609Z ld.shared.b16 %rs596, [%r3625+1088]; 2026-02-21T08:53:50.4320673Z add.s32 %r3626, %r3620, %r105; 2026-02-21T08:53:50.4320810Z ld.shared.b16 %rs597, [%r3626]; 2026-02-21T08:53:50.4320991Z ld.shared.b16 %rs598, [%r3626+1024]; 2026-02-21T08:53:50.4321058Z ld.shared.b16 %rs599, [%r3626+64]; 2026-02-21T08:53:50.4321123Z ld.shared.b16 %rs600, 
[%r3626+1088]; 2026-02-21T08:53:50.4321191Z add.s32 %r3627, %r3620, %r106; 2026-02-21T08:53:50.4321256Z ld.shared.b16 %rs601, [%r3627]; 2026-02-21T08:53:50.4321323Z ld.shared.b16 %rs602, [%r3627+1024]; 2026-02-21T08:53:50.4321392Z ld.shared.b16 %rs603, [%r3627+64]; 2026-02-21T08:53:50.4321458Z ld.shared.b16 %rs604, [%r3627+1088]; 2026-02-21T08:53:50.4321518Z add.s32 %r3628, %r3620, %r107; 2026-02-21T08:53:50.4321585Z ld.shared.b16 %rs605, [%r3628]; 2026-02-21T08:53:50.4321671Z ld.shared.b16 %rs606, [%r3628+1024]; 2026-02-21T08:53:50.4321735Z ld.shared.b16 %rs607, [%r3628+64]; 2026-02-21T08:53:50.4321802Z ld.shared.b16 %rs608, [%r3628+1088]; 2026-02-21T08:53:50.4321871Z cvt.f32.bf16 %r3049, %rs577; 2026-02-21T08:53:50.4321934Z cvt.f32.bf16 %r3050, %rs578; 2026-02-21T08:53:50.4322000Z cvt.f32.bf16 %r3051, %rs581; 2026-02-21T08:53:50.4322065Z cvt.f32.bf16 %r3052, %rs582; 2026-02-21T08:53:50.4322125Z cvt.f32.bf16 %r3117, %rs585; 2026-02-21T08:53:50.4322186Z cvt.f32.bf16 %r3118, %rs586; 2026-02-21T08:53:50.4322251Z cvt.f32.bf16 %r3119, %rs589; 2026-02-21T08:53:50.4322316Z cvt.f32.bf16 %r3120, %rs590; 2026-02-21T08:53:50.4322376Z cvt.f32.bf16 %r3185, %rs593; 2026-02-21T08:53:50.4322437Z cvt.f32.bf16 %r3186, %rs594; 2026-02-21T08:53:50.4322501Z cvt.f32.bf16 %r3187, %rs597; 2026-02-21T08:53:50.4322561Z cvt.f32.bf16 %r3188, %rs598; 2026-02-21T08:53:50.4322621Z cvt.f32.bf16 %r3253, %rs601; 2026-02-21T08:53:50.4322683Z cvt.f32.bf16 %r3254, %rs602; 2026-02-21T08:53:50.4322748Z cvt.f32.bf16 %r3255, %rs605; 2026-02-21T08:53:50.4322809Z cvt.f32.bf16 %r3256, %rs606; 2026-02-21T08:53:50.4322869Z cvt.f32.bf16 %r3321, %rs579; 2026-02-21T08:53:50.4322932Z cvt.f32.bf16 %r3322, %rs580; 2026-02-21T08:53:50.4322992Z cvt.f32.bf16 %r3323, %rs583; 2026-02-21T08:53:50.4323057Z cvt.f32.bf16 %r3324, %rs584; 2026-02-21T08:53:50.4323118Z cvt.f32.bf16 %r3389, %rs587; 2026-02-21T08:53:50.4323180Z cvt.f32.bf16 %r3390, %rs588; 2026-02-21T08:53:50.4323241Z cvt.f32.bf16 %r3391, %rs591; 
2026-02-21T08:53:50.4323305Z cvt.f32.bf16 %r3392, %rs592; 2026-02-21T08:53:50.4323369Z cvt.f32.bf16 %r3457, %rs595; 2026-02-21T08:53:50.4323429Z cvt.f32.bf16 %r3458, %rs596; 2026-02-21T08:53:50.4323491Z cvt.f32.bf16 %r3459, %rs599; 2026-02-21T08:53:50.4323554Z cvt.f32.bf16 %r3460, %rs600; 2026-02-21T08:53:50.4323613Z cvt.f32.bf16 %r3525, %rs603; 2026-02-21T08:53:50.4323673Z cvt.f32.bf16 %r3526, %rs604; 2026-02-21T08:53:50.4323733Z cvt.f32.bf16 %r3527, %rs607; 2026-02-21T08:53:50.4323797Z cvt.f32.bf16 %r3528, %rs608; 2026-02-21T08:53:50.4324010Z .loc 1 51 87 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:87 2026-02-21T08:53:50.4324074Z shl.b32 %r3629, %r3862, 11; 2026-02-21T08:53:50.4324140Z add.s32 %r3630, %r3732, %r3629; 2026-02-21T08:53:50.4324210Z add.s32 %r3631, %r3630, 32768; 2026-02-21T08:53:50.4324422Z .loc 1 64 45 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:64:45 2026-02-21T08:53:50.4324487Z add.s32 %r3632, %r3631, %r3738; 2026-02-21T08:53:50.4324619Z ld.shared.b8 %rs609, [%r3632]; 2026-02-21T08:53:50.4324688Z ld.shared.b8 %rs610, [%r3632+64]; 2026-02-21T08:53:50.4324756Z ld.shared.b8 %rs611, [%r3632+128]; 2026-02-21T08:53:50.4324827Z ld.shared.b8 %rs612, [%r3632+192]; 2026-02-21T08:53:50.4324890Z ld.shared.b8 %rs613, [%r3632+256]; 2026-02-21T08:53:50.4324952Z ld.shared.b8 %rs614, [%r3632+320]; 2026-02-21T08:53:50.4325019Z ld.shared.b8 %rs615, [%r3632+384]; 2026-02-21T08:53:50.4325092Z ld.shared.b8 %rs616, [%r3632+448]; 2026-02-21T08:53:50.4325158Z ld.shared.b8 %rs617, [%r3632+512]; 2026-02-21T08:53:50.4325221Z ld.shared.b8 %rs618, [%r3632+576]; 2026-02-21T08:53:50.4325286Z ld.shared.b8 %rs619, [%r3632+640]; 2026-02-21T08:53:50.4325350Z ld.shared.b8 %rs620, [%r3632+704]; 2026-02-21T08:53:50.4325552Z ld.shared.b8 %rs621, [%r3632+768]; 2026-02-21T08:53:50.4325620Z ld.shared.b8 %rs622, [%r3632+832]; 2026-02-21T08:53:50.4325684Z ld.shared.b8 %rs623, [%r3632+896]; 2026-02-21T08:53:50.4325747Z add.s32 %r3633, %r3631, %r3739; 
2026-02-21T08:53:50.4325813Z ld.shared.b8 %rs624, [%r3633]; 2026-02-21T08:53:50.4325885Z ld.shared.b8 %rs625, [%r3632+1024]; 2026-02-21T08:53:50.4325950Z ld.shared.b8 %rs626, [%r3632+1088]; 2026-02-21T08:53:50.4326017Z ld.shared.b8 %rs627, [%r3632+1152]; 2026-02-21T08:53:50.4326090Z ld.shared.b8 %rs628, [%r3632+1216]; 2026-02-21T08:53:50.4326155Z ld.shared.b8 %rs629, [%r3632+1280]; 2026-02-21T08:53:50.4326222Z ld.shared.b8 %rs630, [%r3632+1344]; 2026-02-21T08:53:50.4326290Z ld.shared.b8 %rs631, [%r3632+1408]; 2026-02-21T08:53:50.4326354Z ld.shared.b8 %rs632, [%r3632+1472]; 2026-02-21T08:53:50.4326419Z ld.shared.b8 %rs633, [%r3632+1536]; 2026-02-21T08:53:50.4326609Z ld.shared.b8 %rs634, [%r3632+1600]; 2026-02-21T08:53:50.4326690Z ld.shared.b8 %rs635, [%r3632+1664]; 2026-02-21T08:53:50.4326754Z ld.shared.b8 %rs636, [%r3632+1728]; 2026-02-21T08:53:50.4326820Z ld.shared.b8 %rs637, [%r3632+1792]; 2026-02-21T08:53:50.4326889Z ld.shared.b8 %rs638, [%r3632+1856]; 2026-02-21T08:53:50.4326956Z ld.shared.b8 %rs639, [%r3632+1920]; 2026-02-21T08:53:50.4327030Z add.s32 %r3634, %r3631, %r3740; 2026-02-21T08:53:50.4327099Z ld.shared.b8 %rs640, [%r3634]; 2026-02-21T08:53:50.4327307Z .loc 1 54 28 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:54:28 2026-02-21T08:53:50.4327372Z shl.b16 %rs641, %rs609, 4; 2026-02-21T08:53:50.4327433Z shl.b16 %rs642, %rs610, 4; 2026-02-21T08:53:50.4327499Z shl.b16 %rs643, %rs611, 4; 2026-02-21T08:53:50.4327561Z shl.b16 %rs644, %rs612, 4; 2026-02-21T08:53:50.4327624Z shl.b16 %rs645, %rs613, 4; 2026-02-21T08:53:50.4327690Z shl.b16 %rs646, %rs614, 4; 2026-02-21T08:53:50.4327751Z shl.b16 %rs647, %rs615, 4; 2026-02-21T08:53:50.4327814Z shl.b16 %rs648, %rs616, 4; 2026-02-21T08:53:50.4327879Z shl.b16 %rs649, %rs617, 4; 2026-02-21T08:53:50.4327945Z shl.b16 %rs650, %rs618, 4; 2026-02-21T08:53:50.4328006Z shl.b16 %rs651, %rs619, 4; 2026-02-21T08:53:50.4328066Z shl.b16 %rs652, %rs620, 4; 2026-02-21T08:53:50.4328133Z shl.b16 %rs653, %rs621, 4; 
2026-02-21T08:53:50.4328194Z shl.b16 %rs654, %rs622, 4; 2026-02-21T08:53:50.4328254Z shl.b16 %rs655, %rs623, 4; 2026-02-21T08:53:50.4328312Z shl.b16 %rs656, %rs624, 4; 2026-02-21T08:53:50.4328375Z shl.b16 %rs657, %rs625, 4; 2026-02-21T08:53:50.4328435Z shl.b16 %rs658, %rs626, 4; 2026-02-21T08:53:50.4328495Z shl.b16 %rs659, %rs627, 4; 2026-02-21T08:53:50.4328559Z shl.b16 %rs660, %rs628, 4; 2026-02-21T08:53:50.4328620Z shl.b16 %rs661, %rs629, 4; 2026-02-21T08:53:50.4328681Z shl.b16 %rs662, %rs630, 4; 2026-02-21T08:53:50.4328741Z shl.b16 %rs663, %rs631, 4; 2026-02-21T08:53:50.4328809Z shl.b16 %rs664, %rs632, 4; 2026-02-21T08:53:50.4328871Z shl.b16 %rs665, %rs633, 4; 2026-02-21T08:53:50.4328932Z shl.b16 %rs666, %rs634, 4; 2026-02-21T08:53:50.4328999Z shl.b16 %rs667, %rs635, 4; 2026-02-21T08:53:50.4329060Z shl.b16 %rs668, %rs636, 4; 2026-02-21T08:53:50.4329120Z shl.b16 %rs669, %rs637, 4; 2026-02-21T08:53:50.4329278Z shl.b16 %rs670, %rs638, 4; 2026-02-21T08:53:50.4329346Z shl.b16 %rs671, %rs639, 4; 2026-02-21T08:53:50.4329407Z shl.b16 %rs672, %rs640, 4; 2026-02-21T08:53:50.4329607Z .loc 1 69 58 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:69:58 2026-02-21T08:53:50.4329687Z selp.b16 %rs673, %rs641, %rs609, %p57; 2026-02-21T08:53:50.4329749Z cvt.s16.s8 %rs674, %rs673; 2026-02-21T08:53:50.4329810Z shr.s16 %rs675, %rs674, 4; 2026-02-21T08:53:50.4329883Z selp.b16 %rs676, %rs642, %rs610, %p57; 2026-02-21T08:53:50.4329943Z cvt.s16.s8 %rs677, %rs676; 2026-02-21T08:53:50.4330004Z shr.s16 %rs678, %rs677, 4; 2026-02-21T08:53:50.4330072Z selp.b16 %rs679, %rs643, %rs611, %p57; 2026-02-21T08:53:50.4330137Z cvt.s16.s8 %rs680, %rs679; 2026-02-21T08:53:50.4330378Z shr.s16 %rs681, %rs680, 4; 2026-02-21T08:53:50.4330450Z selp.b16 %rs682, %rs644, %rs612, %p57; 2026-02-21T08:53:50.4330515Z cvt.s16.s8 %rs683, %rs682; 2026-02-21T08:53:50.4330575Z shr.s16 %rs684, %rs683, 4; 2026-02-21T08:53:50.4330645Z selp.b16 %rs685, %rs645, %rs613, %p57; 2026-02-21T08:53:50.4330707Z cvt.s16.s8 
%rs686, %rs685; 2026-02-21T08:53:50.4330771Z shr.s16 %rs687, %rs686, 4; 2026-02-21T08:53:50.4330839Z selp.b16 %rs688, %rs646, %rs614, %p57; 2026-02-21T08:53:50.4330901Z cvt.s16.s8 %rs689, %rs688; 2026-02-21T08:53:50.4330965Z shr.s16 %rs690, %rs689, 4; 2026-02-21T08:53:50.4331031Z selp.b16 %rs691, %rs647, %rs615, %p57; 2026-02-21T08:53:50.4331092Z cvt.s16.s8 %rs692, %rs691; 2026-02-21T08:53:50.4331154Z shr.s16 %rs693, %rs692, 4; 2026-02-21T08:53:50.4331225Z selp.b16 %rs694, %rs648, %rs616, %p57; 2026-02-21T08:53:50.4331285Z cvt.s16.s8 %rs695, %rs694; 2026-02-21T08:53:50.4331345Z shr.s16 %rs696, %rs695, 4; 2026-02-21T08:53:50.4331429Z selp.b16 %rs697, %rs649, %rs617, %p57; 2026-02-21T08:53:50.4332901Z cvt.s16.s8 %rs698, %rs697; 2026-02-21T08:53:50.4333007Z shr.s16 %rs699, %rs698, 4; 2026-02-21T08:53:50.4333090Z selp.b16 %rs700, %rs650, %rs618, %p57; 2026-02-21T08:53:50.4333167Z cvt.s16.s8 %rs701, %rs700; 2026-02-21T08:53:50.4333229Z shr.s16 %rs702, %rs701, 4; 2026-02-21T08:53:50.4333302Z selp.b16 %rs703, %rs651, %rs619, %p57; 2026-02-21T08:53:50.4333365Z cvt.s16.s8 %rs704, %rs703; 2026-02-21T08:53:50.4333431Z shr.s16 %rs705, %rs704, 4; 2026-02-21T08:53:50.4333501Z selp.b16 %rs706, %rs652, %rs620, %p57; 2026-02-21T08:53:50.4333561Z cvt.s16.s8 %rs707, %rs706; 2026-02-21T08:53:50.4333627Z shr.s16 %rs708, %rs707, 4; 2026-02-21T08:53:50.4333697Z selp.b16 %rs709, %rs653, %rs621, %p57; 2026-02-21T08:53:50.4333756Z cvt.s16.s8 %rs710, %rs709; 2026-02-21T08:53:50.4333816Z shr.s16 %rs711, %rs710, 4; 2026-02-21T08:53:50.4333889Z selp.b16 %rs712, %rs654, %rs622, %p57; 2026-02-21T08:53:50.4333949Z cvt.s16.s8 %rs713, %rs712; 2026-02-21T08:53:50.4334015Z shr.s16 %rs714, %rs713, 4; 2026-02-21T08:53:50.4334093Z selp.b16 %rs715, %rs655, %rs623, %p57; 2026-02-21T08:53:50.4334156Z cvt.s16.s8 %rs716, %rs715; 2026-02-21T08:53:50.4334217Z shr.s16 %rs717, %rs716, 4; 2026-02-21T08:53:50.4334289Z selp.b16 %rs718, %rs656, %rs624, %p57; 2026-02-21T08:53:50.4334354Z cvt.s16.s8 %rs719, %rs718; 
2026-02-21T08:53:50.4334416Z shr.s16 %rs720, %rs719, 4; 2026-02-21T08:53:50.4334484Z selp.b16 %rs721, %rs657, %rs625, %p57; 2026-02-21T08:53:50.4334547Z cvt.s16.s8 %rs722, %rs721; 2026-02-21T08:53:50.4334611Z shr.s16 %rs723, %rs722, 4; 2026-02-21T08:53:50.4334678Z selp.b16 %rs724, %rs658, %rs626, %p57; 2026-02-21T08:53:50.4334739Z cvt.s16.s8 %rs725, %rs724; 2026-02-21T08:53:50.4334803Z shr.s16 %rs726, %rs725, 4; 2026-02-21T08:53:50.4334873Z selp.b16 %rs727, %rs659, %rs627, %p57; 2026-02-21T08:53:50.4334935Z cvt.s16.s8 %rs728, %rs727; 2026-02-21T08:53:50.4335000Z shr.s16 %rs729, %rs728, 4; 2026-02-21T08:53:50.4335068Z selp.b16 %rs730, %rs660, %rs628, %p57; 2026-02-21T08:53:50.4335132Z cvt.s16.s8 %rs731, %rs730; 2026-02-21T08:53:50.4335194Z shr.s16 %rs732, %rs731, 4; 2026-02-21T08:53:50.4335270Z selp.b16 %rs733, %rs661, %rs629, %p57; 2026-02-21T08:53:50.4335419Z cvt.s16.s8 %rs734, %rs733; 2026-02-21T08:53:50.4335479Z shr.s16 %rs735, %rs734, 4; 2026-02-21T08:53:50.4335552Z selp.b16 %rs736, %rs662, %rs630, %p57; 2026-02-21T08:53:50.4335612Z cvt.s16.s8 %rs737, %rs736; 2026-02-21T08:53:50.4335673Z shr.s16 %rs738, %rs737, 4; 2026-02-21T08:53:50.4335739Z selp.b16 %rs739, %rs663, %rs631, %p57; 2026-02-21T08:53:50.4335802Z cvt.s16.s8 %rs740, %rs739; 2026-02-21T08:53:50.4335862Z shr.s16 %rs741, %rs740, 4; 2026-02-21T08:53:50.4335930Z selp.b16 %rs742, %rs664, %rs632, %p57; 2026-02-21T08:53:50.4335995Z cvt.s16.s8 %rs743, %rs742; 2026-02-21T08:53:50.4336054Z shr.s16 %rs744, %rs743, 4; 2026-02-21T08:53:50.4336121Z selp.b16 %rs745, %rs665, %rs633, %p57; 2026-02-21T08:53:50.4336180Z cvt.s16.s8 %rs746, %rs745; 2026-02-21T08:53:50.4336365Z shr.s16 %rs747, %rs746, 4; 2026-02-21T08:53:50.4336436Z selp.b16 %rs748, %rs666, %rs634, %p57; 2026-02-21T08:53:50.4336669Z cvt.s16.s8 %rs749, %rs748; 2026-02-21T08:53:50.4336738Z shr.s16 %rs750, %rs749, 4; 2026-02-21T08:53:50.4336810Z selp.b16 %rs751, %rs667, %rs635, %p57; 2026-02-21T08:53:50.4336870Z cvt.s16.s8 %rs752, %rs751; 
2026-02-21T08:53:50.4336932Z shr.s16 %rs753, %rs752, 4; 2026-02-21T08:53:50.4336999Z selp.b16 %rs754, %rs668, %rs636, %p57; 2026-02-21T08:53:50.4337059Z cvt.s16.s8 %rs755, %rs754; 2026-02-21T08:53:50.4337119Z shr.s16 %rs756, %rs755, 4; 2026-02-21T08:53:50.4337189Z selp.b16 %rs757, %rs669, %rs637, %p57; 2026-02-21T08:53:50.4337251Z cvt.s16.s8 %rs758, %rs757; 2026-02-21T08:53:50.4337310Z shr.s16 %rs759, %rs758, 4; 2026-02-21T08:53:50.4337379Z selp.b16 %rs760, %rs670, %rs638, %p57; 2026-02-21T08:53:50.4337440Z cvt.s16.s8 %rs761, %rs760; 2026-02-21T08:53:50.4337500Z shr.s16 %rs762, %rs761, 4; 2026-02-21T08:53:50.4337569Z selp.b16 %rs763, %rs671, %rs639, %p57; 2026-02-21T08:53:50.4337645Z cvt.s16.s8 %rs764, %rs763; 2026-02-21T08:53:50.4337826Z shr.s16 %rs765, %rs764, 4; 2026-02-21T08:53:50.4337899Z selp.b16 %rs766, %rs672, %rs640, %p57; 2026-02-21T08:53:50.4337964Z cvt.s16.s8 %rs767, %rs766; 2026-02-21T08:53:50.4338026Z shr.s16 %rs768, %rs767, 4; 2026-02-21T08:53:50.4338252Z .loc 1 74 32 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:74:32 2026-02-21T08:53:50.4338325Z cvt.rn.f32.s16 %r3635, %rs675; 2026-02-21T08:53:50.4338394Z cvt.rn.f32.s16 %r3636, %rs678; 2026-02-21T08:53:50.4338456Z cvt.rn.f32.s16 %r3637, %rs681; 2026-02-21T08:53:50.4338518Z cvt.rn.f32.s16 %r3638, %rs684; 2026-02-21T08:53:50.4338595Z cvt.rn.f32.s16 %r3639, %rs687; 2026-02-21T08:53:50.4338659Z cvt.rn.f32.s16 %r3640, %rs690; 2026-02-21T08:53:50.4338721Z cvt.rn.f32.s16 %r3641, %rs693; 2026-02-21T08:53:50.4338785Z cvt.rn.f32.s16 %r3642, %rs696; 2026-02-21T08:53:50.4338846Z cvt.rn.f32.s16 %r3643, %rs699; 2026-02-21T08:53:50.4338910Z cvt.rn.f32.s16 %r3644, %rs702; 2026-02-21T08:53:50.4338972Z cvt.rn.f32.s16 %r3645, %rs705; 2026-02-21T08:53:50.4339039Z cvt.rn.f32.s16 %r3646, %rs708; 2026-02-21T08:53:50.4339100Z cvt.rn.f32.s16 %r3647, %rs711; 2026-02-21T08:53:50.4339163Z cvt.rn.f32.s16 %r3648, %rs714; 2026-02-21T08:53:50.4339227Z cvt.rn.f32.s16 %r3649, %rs717; 2026-02-21T08:53:50.4339289Z 
cvt.rn.f32.s16 %r3650, %rs720; 2026-02-21T08:53:50.4339348Z cvt.rn.f32.s16 %r3651, %rs723; 2026-02-21T08:53:50.4339409Z cvt.rn.f32.s16 %r3652, %rs726; 2026-02-21T08:53:50.4339474Z cvt.rn.f32.s16 %r3653, %rs729; 2026-02-21T08:53:50.4339535Z cvt.rn.f32.s16 %r3654, %rs732; 2026-02-21T08:53:50.4339600Z cvt.rn.f32.s16 %r3655, %rs735; 2026-02-21T08:53:50.4339665Z cvt.rn.f32.s16 %r3656, %rs738; 2026-02-21T08:53:50.4339727Z cvt.rn.f32.s16 %r3657, %rs741; 2026-02-21T08:53:50.4339786Z cvt.rn.f32.s16 %r3658, %rs744; 2026-02-21T08:53:50.4339847Z cvt.rn.f32.s16 %r3659, %rs747; 2026-02-21T08:53:50.4339912Z cvt.rn.f32.s16 %r3660, %rs750; 2026-02-21T08:53:50.4339977Z cvt.rn.f32.s16 %r3661, %rs753; 2026-02-21T08:53:50.4340041Z cvt.rn.f32.s16 %r3662, %rs756; 2026-02-21T08:53:50.4340106Z cvt.rn.f32.s16 %r3663, %rs759; 2026-02-21T08:53:50.4340168Z cvt.rn.f32.s16 %r3664, %rs762; 2026-02-21T08:53:50.4340309Z cvt.rn.f32.s16 %r3665, %rs765; 2026-02-21T08:53:50.4340370Z cvt.rn.f32.s16 %r3666, %rs768; 2026-02-21T08:53:50.4340437Z st.shared.b32 [%r111], %r3635; 2026-02-21T08:53:50.4340505Z st.shared.b32 [%r111+8], %r3636; 2026-02-21T08:53:50.4340579Z st.shared.b32 [%r111+8192], %r3651; 2026-02-21T08:53:50.4340650Z st.shared.b32 [%r111+8200], %r3652; 2026-02-21T08:53:50.4340713Z st.shared.b32 [%r112], %r3637; 2026-02-21T08:53:50.4340778Z st.shared.b32 [%r112+8], %r3638; 2026-02-21T08:53:50.4340845Z st.shared.b32 [%r112+8192], %r3653; 2026-02-21T08:53:50.4340909Z st.shared.b32 [%r112+8200], %r3654; 2026-02-21T08:53:50.4340971Z st.shared.b32 [%r113], %r3639; 2026-02-21T08:53:50.4341034Z st.shared.b32 [%r113+8], %r3640; 2026-02-21T08:53:50.4341258Z st.shared.b32 [%r113+8192], %r3655; 2026-02-21T08:53:50.4341327Z st.shared.b32 [%r113+8200], %r3656; 2026-02-21T08:53:50.4341392Z st.shared.b32 [%r114], %r3641; 2026-02-21T08:53:50.4341461Z st.shared.b32 [%r114+8], %r3642; 2026-02-21T08:53:50.4341528Z st.shared.b32 [%r114+8192], %r3657; 2026-02-21T08:53:50.4341591Z st.shared.b32 [%r114+8200], %r3658; 
2026-02-21T08:53:50.4341654Z st.shared.b32 [%r115], %r3643; 2026-02-21T08:53:50.4341721Z st.shared.b32 [%r115+8], %r3644; 2026-02-21T08:53:50.4341788Z st.shared.b32 [%r115+8192], %r3659; 2026-02-21T08:53:50.4341853Z st.shared.b32 [%r115+8200], %r3660; 2026-02-21T08:53:50.4341920Z st.shared.b32 [%r116], %r3645; 2026-02-21T08:53:50.4341982Z st.shared.b32 [%r116+8], %r3646; 2026-02-21T08:53:50.4342046Z st.shared.b32 [%r116+8192], %r3661; 2026-02-21T08:53:50.4342116Z st.shared.b32 [%r116+8200], %r3662; 2026-02-21T08:53:50.4342181Z st.shared.b32 [%r117], %r3647; 2026-02-21T08:53:50.4342258Z st.shared.b32 [%r117+8], %r3648; 2026-02-21T08:53:50.4342389Z st.shared.b32 [%r117+8192], %r3663; 2026-02-21T08:53:50.4342456Z st.shared.b32 [%r117+8200], %r3664; 2026-02-21T08:53:50.4342519Z st.shared.b32 [%r118], %r3649; 2026-02-21T08:53:50.4342582Z st.shared.b32 [%r118+8], %r3650; 2026-02-21T08:53:50.4342649Z st.shared.b32 [%r118+8192], %r3665; 2026-02-21T08:53:50.4342712Z st.shared.b32 [%r118+8200], %r3666; 2026-02-21T08:53:50.4342768Z $L__tmp7: 2026-02-21T08:53:50.4343063Z .loc 2 291 36 // standard.py:291:36 @[ cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:81:40 ] 2026-02-21T08:53:50.4343125Z // begin inline asm 2026-02-21T08:53:50.4343213Z fence.proxy.async.shared::cta; 2026-02-21T08:53:50.4343274Z // end inline asm 2026-02-21T08:53:50.4343331Z bar.sync 0; 2026-02-21T08:53:50.4343415Z shfl.sync.idx.b32 %r3667, %r6, 0, 31, -1; 2026-02-21T08:53:50.4343488Z wgmma.fence.sync.aligned; 2026-02-21T08:53:50.4343558Z mov.pred %p43, -1; 2026-02-21T08:53:50.4343621Z // begin inline asm 2026-02-21T08:53:50.4344388Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3864,%r3865,%r3866,%r3867,%r3868,%r3869,%r3870,%r3871,%r3872,%r3873,%r3874,%r3875,%r3876,%r3877,%r3878,%r3879,%r3880,%r3881,%r3882,%r3883,%r3884,%r3885,%r3886,%r3887,%r3888,%r3889,%r3890,%r3891,%r3892,%r3893,%r3894,%r3895}, {%r3049,%r3050,%r3051,%r3052}, %rd290, %p43, 1, 1; 2026-02-21T08:53:50.4344460Z // end 
inline asm 2026-02-21T08:53:50.4344525Z // begin inline asm 2026-02-21T08:53:50.4345279Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3864,%r3865,%r3866,%r3867,%r3868,%r3869,%r3870,%r3871,%r3872,%r3873,%r3874,%r3875,%r3876,%r3877,%r3878,%r3879,%r3880,%r3881,%r3882,%r3883,%r3884,%r3885,%r3886,%r3887,%r3888,%r3889,%r3890,%r3891,%r3892,%r3893,%r3894,%r3895}, {%r3117,%r3118,%r3119,%r3120}, %rd291, %p43, 1, 1; 2026-02-21T08:53:50.4345341Z // end inline asm 2026-02-21T08:53:50.4345402Z // begin inline asm 2026-02-21T08:53:50.4346147Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3864,%r3865,%r3866,%r3867,%r3868,%r3869,%r3870,%r3871,%r3872,%r3873,%r3874,%r3875,%r3876,%r3877,%r3878,%r3879,%r3880,%r3881,%r3882,%r3883,%r3884,%r3885,%r3886,%r3887,%r3888,%r3889,%r3890,%r3891,%r3892,%r3893,%r3894,%r3895}, {%r3185,%r3186,%r3187,%r3188}, %rd292, %p43, 1, 1; 2026-02-21T08:53:50.4346270Z // end inline asm 2026-02-21T08:53:50.4346328Z // begin inline asm 2026-02-21T08:53:50.4347208Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3864,%r3865,%r3866,%r3867,%r3868,%r3869,%r3870,%r3871,%r3872,%r3873,%r3874,%r3875,%r3876,%r3877,%r3878,%r3879,%r3880,%r3881,%r3882,%r3883,%r3884,%r3885,%r3886,%r3887,%r3888,%r3889,%r3890,%r3891,%r3892,%r3893,%r3894,%r3895}, {%r3253,%r3254,%r3255,%r3256}, %rd293, %p43, 1, 1; 2026-02-21T08:53:50.4347272Z // end inline asm 2026-02-21T08:53:50.4347332Z // begin inline asm 2026-02-21T08:53:50.4348150Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3864,%r3865,%r3866,%r3867,%r3868,%r3869,%r3870,%r3871,%r3872,%r3873,%r3874,%r3875,%r3876,%r3877,%r3878,%r3879,%r3880,%r3881,%r3882,%r3883,%r3884,%r3885,%r3886,%r3887,%r3888,%r3889,%r3890,%r3891,%r3892,%r3893,%r3894,%r3895}, {%r3321,%r3322,%r3323,%r3324}, %rd294, %p43, 1, 1; 2026-02-21T08:53:50.4348272Z // end inline asm 2026-02-21T08:53:50.4348332Z // begin inline asm 2026-02-21T08:53:50.4349157Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r3864,%r3865,%r3866,%r3867,%r3868,%r3869,%r3870,%r3871,%r3872,%r3873,%r3874,%r3875,%r3876,%r3877,%r3878,%r3879,%r3880,%r3881,%r3882,%r3883,%r3884,%r3885,%r3886,%r3887,%r3888,%r3889,%r3890,%r3891,%r3892,%r3893,%r3894,%r3895}, {%r3389,%r3390,%r3391,%r3392}, %rd295, %p43, 1, 1; 2026-02-21T08:53:50.4349220Z // end inline asm 2026-02-21T08:53:50.4349280Z // begin inline asm 2026-02-21T08:53:50.4350020Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3864,%r3865,%r3866,%r3867,%r3868,%r3869,%r3870,%r3871,%r3872,%r3873,%r3874,%r3875,%r3876,%r3877,%r3878,%r3879,%r3880,%r3881,%r3882,%r3883,%r3884,%r3885,%r3886,%r3887,%r3888,%r3889,%r3890,%r3891,%r3892,%r3893,%r3894,%r3895}, {%r3457,%r3458,%r3459,%r3460}, %rd296, %p43, 1, 1; 2026-02-21T08:53:50.4350149Z // end inline asm 2026-02-21T08:53:50.4350209Z // begin inline asm 2026-02-21T08:53:50.4350953Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3864,%r3865,%r3866,%r3867,%r3868,%r3869,%r3870,%r3871,%r3872,%r3873,%r3874,%r3875,%r3876,%r3877,%r3878,%r3879,%r3880,%r3881,%r3882,%r3883,%r3884,%r3885,%r3886,%r3887,%r3888,%r3889,%r3890,%r3891,%r3892,%r3893,%r3894,%r3895}, {%r3525,%r3526,%r3527,%r3528}, %rd297, %p43, 1, 1; 2026-02-21T08:53:50.4351016Z // end inline asm 2026-02-21T08:53:50.4351095Z wgmma.commit_group.sync.aligned; 2026-02-21T08:53:50.4351153Z mov.b32 %r3563, 0; 2026-02-21T08:53:50.4351218Z mov.b32 %r3561, %r2880; 2026-02-21T08:53:50.4351277Z mov.b32 %r3562, %r3563; 2026-02-21T08:53:50.4351334Z // begin inline asm 2026-02-21T08:53:50.4351893Z // wait for regs: %r3864,%r3865,%r3866,%r3867,%r3868,%r3869,%r3870,%r3871,%r3872,%r3873,%r3874,%r3875,%r3876,%r3877,%r3878,%r3879,%r3880,%r3881,%r3882,%r3883,%r3884,%r3885,%r3886,%r3887,%r3888,%r3889,%r3890,%r3891,%r3892,%r3893,%r3894,%r3895,%r3561,%r3562,%r3563 2026-02-21T08:53:50.4351977Z wgmma.wait_group.sync.aligned 0; 2026-02-21T08:53:50.4352035Z // end inline asm 2026-02-21T08:53:50.4352092Z $L__tmp8: 2026-02-21T08:53:50.4352318Z .loc 1 37 103 // 
cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:37:103 2026-02-21T08:53:50.4352383Z add.s32 %r3668, %r3863, 1; 2026-02-21T08:53:50.4352450Z setp.gt.s32 %p54, %r3668, 1; 2026-02-21T08:53:50.4352524Z selp.b32 %r3863, 0, %r3668, %p54; 2026-02-21T08:53:50.4352733Z .loc 1 45 32 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:45:32 2026-02-21T08:53:50.4352802Z add.s64 %rd298, %rd323, -917504; 2026-02-21T08:53:50.4352871Z add.s64 %rd299, %rd323, -786432; 2026-02-21T08:53:50.4352934Z add.s64 %rd300, %rd323, -655360; 2026-02-21T08:53:50.4352998Z add.s64 %rd301, %rd323, -524288; 2026-02-21T08:53:50.4353059Z add.s64 %rd302, %rd323, -393216; 2026-02-21T08:53:50.4353128Z add.s64 %rd303, %rd323, -262144; 2026-02-21T08:53:50.4353192Z add.s64 %rd304, %rd323, -131072; 2026-02-21T08:53:50.4353391Z .loc 1 45 80 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:45:80 2026-02-21T08:53:50.4353538Z shl.b32 %r3669, %r3863, 13; 2026-02-21T08:53:50.4353603Z add.s32 %r3670, %r3732, %r3669; 2026-02-21T08:53:50.4353666Z add.s32 %r3599, %r3670, %r79; 2026-02-21T08:53:50.4353730Z selp.b32 %r3600, 8, 0, %p52; 2026-02-21T08:53:50.4353792Z // begin inline asm 2026-02-21T08:53:50.4353939Z cp.async.ca.shared.global [ %r3599 + 0 ], [ %rd298 + 0 ], 0x8, %r3600; 2026-02-21T08:53:50.4353995Z // end inline asm 2026-02-21T08:53:50.4354066Z add.s32 %r3601, %r3599, 1024; 2026-02-21T08:53:50.4354132Z // begin inline asm 2026-02-21T08:53:50.4354268Z cp.async.ca.shared.global [ %r3601 + 0 ], [ %rd299 + 0 ], 0x8, %r3600; 2026-02-21T08:53:50.4354331Z // end inline asm 2026-02-21T08:53:50.4354392Z add.s32 %r3603, %r3599, 2048; 2026-02-21T08:53:50.4354552Z // begin inline asm 2026-02-21T08:53:50.4354693Z cp.async.ca.shared.global [ %r3603 + 0 ], [ %rd300 + 0 ], 0x8, %r3600; 2026-02-21T08:53:50.4354752Z // end inline asm 2026-02-21T08:53:50.4354812Z add.s32 %r3605, %r3599, 3072; 2026-02-21T08:53:50.4354872Z // begin inline asm 2026-02-21T08:53:50.4355005Z cp.async.ca.shared.global [ 
%r3605 + 0 ], [ %rd301 + 0 ], 0x8, %r3600; 2026-02-21T08:53:50.4355065Z // end inline asm 2026-02-21T08:53:50.4355124Z add.s32 %r3607, %r3599, 4096; 2026-02-21T08:53:50.4355181Z // begin inline asm 2026-02-21T08:53:50.4355312Z cp.async.ca.shared.global [ %r3607 + 0 ], [ %rd302 + 0 ], 0x8, %r3600; 2026-02-21T08:53:50.4355368Z // end inline asm 2026-02-21T08:53:50.4355438Z add.s32 %r3609, %r3599, 5120; 2026-02-21T08:53:50.4355502Z // begin inline asm 2026-02-21T08:53:50.4355633Z cp.async.ca.shared.global [ %r3609 + 0 ], [ %rd303 + 0 ], 0x8, %r3600; 2026-02-21T08:53:50.4355688Z // end inline asm 2026-02-21T08:53:50.4355748Z add.s32 %r3611, %r3599, 6144; 2026-02-21T08:53:50.4355812Z // begin inline asm 2026-02-21T08:53:50.4355994Z cp.async.ca.shared.global [ %r3611 + 0 ], [ %rd304 + 0 ], 0x8, %r3600; 2026-02-21T08:53:50.4356056Z // end inline asm 2026-02-21T08:53:50.4356121Z add.s32 %r3613, %r3599, 7168; 2026-02-21T08:53:50.4356179Z // begin inline asm 2026-02-21T08:53:50.4356309Z cp.async.ca.shared.global [ %r3613 + 0 ], [ %rd323 + 0 ], 0x8, %r3600; 2026-02-21T08:53:50.4356367Z // end inline asm 2026-02-21T08:53:50.4356432Z cp.async.commit_group; 2026-02-21T08:53:50.4356783Z .loc 1 51 87 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:51:87 2026-02-21T08:53:50.4356850Z shl.b32 %r3671, %r3863, 11; 2026-02-21T08:53:50.4356913Z add.s32 %r3615, %r89, %r3671; 2026-02-21T08:53:50.4356976Z selp.b32 %r3616, 16, 0, %p52; 2026-02-21T08:53:50.4357035Z // begin inline asm 2026-02-21T08:53:50.4357178Z cp.async.cg.shared.global [ %r3615 + 0 ], [ %rd324 + 0 ], 0x10, %r3616; 2026-02-21T08:53:50.4357237Z // end inline asm 2026-02-21T08:53:50.4357302Z cp.async.commit_group; 2026-02-21T08:53:50.4357514Z .loc 1 37 103 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:37:103 2026-02-21T08:53:50.4357580Z add.s64 %rd324, %rd324, 229376; 2026-02-21T08:53:50.4357646Z add.s64 %rd323, %rd323, 128; 2026-02-21T08:53:50.4357713Z setp.lt.u64 %p55, %rd325, 4064; 
2026-02-21T08:53:50.4357776Z @%p55 bra $L__BB0_12; 2026-02-21T08:53:50.4357887Z // %bb.13: // in Loop: Header=BB0_11 Depth=1 2026-02-21T08:53:50.4358086Z .loc 1 30 32 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:30:32 2026-02-21T08:53:50.4358159Z or.b32 %r3708, %r344, %r10; 2026-02-21T08:53:50.4358372Z .loc 1 37 103 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:37:103 2026-02-21T08:53:50.4358439Z cp.async.wait_group 0; 2026-02-21T08:53:50.4358498Z bar.sync 0; 2026-02-21T08:53:50.4358697Z .loc 1 84 28 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:84:28 2026-02-21T08:53:50.4358779Z cvt.rn.bf16x2.f32 %r3709, %r3865, %r3864; 2026-02-21T08:53:50.4358855Z cvt.rn.bf16x2.f32 %r3710, %r3867, %r3866; 2026-02-21T08:53:50.4359045Z cvt.rn.bf16x2.f32 %r3711, %r3869, %r3868; 2026-02-21T08:53:50.4359115Z cvt.rn.bf16x2.f32 %r3712, %r3871, %r3870; 2026-02-21T08:53:50.4359185Z cvt.rn.bf16x2.f32 %r3713, %r3873, %r3872; 2026-02-21T08:53:50.4359258Z cvt.rn.bf16x2.f32 %r3714, %r3875, %r3874; 2026-02-21T08:53:50.4359326Z cvt.rn.bf16x2.f32 %r3715, %r3877, %r3876; 2026-02-21T08:53:50.4359396Z cvt.rn.bf16x2.f32 %r3716, %r3879, %r3878; 2026-02-21T08:53:50.4359468Z cvt.rn.bf16x2.f32 %r3717, %r3881, %r3880; 2026-02-21T08:53:50.4359538Z cvt.rn.bf16x2.f32 %r3718, %r3883, %r3882; 2026-02-21T08:53:50.4359608Z cvt.rn.bf16x2.f32 %r3719, %r3885, %r3884; 2026-02-21T08:53:50.4359678Z cvt.rn.bf16x2.f32 %r3720, %r3887, %r3886; 2026-02-21T08:53:50.4359751Z cvt.rn.bf16x2.f32 %r3721, %r3889, %r3888; 2026-02-21T08:53:50.4359946Z cvt.rn.bf16x2.f32 %r3722, %r3891, %r3890; 2026-02-21T08:53:50.4360021Z cvt.rn.bf16x2.f32 %r3723, %r3893, %r3892; 2026-02-21T08:53:50.4360093Z cvt.rn.bf16x2.f32 %r3724, %r3895, %r3894; 2026-02-21T08:53:50.4360299Z .loc 1 85 50 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:85:50 2026-02-21T08:53:50.4360363Z add.s32 %r3725, %r3708, %r18; 2026-02-21T08:53:50.4360424Z add.s32 %r3726, %r3708, %r19; 2026-02-21T08:53:50.4360488Z add.s32 
%r3727, %r3708, %r20; 2026-02-21T08:53:50.4360548Z add.s32 %r3728, %r3708, %r21; 2026-02-21T08:53:50.4360746Z .loc 1 85 22 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:85:22 2026-02-21T08:53:50.4360824Z mad.wide.s32 %rd307, %r3725, 2, %rd82; 2026-02-21T08:53:50.4360895Z mad.wide.s32 %rd308, %r3726, 2, %rd82; 2026-02-21T08:53:50.4360960Z mad.wide.s32 %rd309, %r3727, 2, %rd82; 2026-02-21T08:53:50.4361030Z mad.wide.s32 %rd310, %r3728, 2, %rd82; 2026-02-21T08:53:50.4361295Z .loc 1 85 81 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:85:81 2026-02-21T08:53:50.4361414Z st.shared.v4.b32 [%r119], {%r3709, %r3711, %r3713, %r3715}; 2026-02-21T08:53:50.4361538Z st.shared.v4.b32 [%r119+256], {%r3710, %r3712, %r3714, %r3716}; 2026-02-21T08:53:50.4361646Z st.shared.v4.b32 [%r120], {%r3717, %r3719, %r3721, %r3723}; 2026-02-21T08:53:50.4361755Z st.shared.v4.b32 [%r120+256], {%r3718, %r3720, %r3722, %r3724}; 2026-02-21T08:53:50.4361811Z bar.sync 0; 2026-02-21T08:53:50.4361873Z // begin inline asm 2026-02-21T08:53:50.4362066Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3672, %r3673, %r3674, %r3675}, [%r3676]; 2026-02-21T08:53:50.4362123Z // end inline asm 2026-02-21T08:53:50.4362188Z // begin inline asm 2026-02-21T08:53:50.4362383Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3677, %r3678, %r3679, %r3680}, [%r3681]; 2026-02-21T08:53:50.4362439Z // end inline asm 2026-02-21T08:53:50.4362498Z // begin inline asm 2026-02-21T08:53:50.4362685Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3682, %r3683, %r3684, %r3685}, [%r3686]; 2026-02-21T08:53:50.4362744Z // end inline asm 2026-02-21T08:53:50.4362801Z // begin inline asm 2026-02-21T08:53:50.4362981Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3687, %r3688, %r3689, %r3690}, [%r3691]; 2026-02-21T08:53:50.4363039Z // end inline asm 2026-02-21T08:53:50.4363097Z // begin inline asm 2026-02-21T08:53:50.4363225Z st.global.v4.b32 [ %rd307 + 0 ], { %r3672, %r3673, %r3674, %r3675 }; 2026-02-21T08:53:50.4363281Z // end 
inline asm 2026-02-21T08:53:50.4363339Z // begin inline asm 2026-02-21T08:53:50.4363456Z st.global.v4.b32 [ %rd308 + 0 ], { %r3677, %r3678, %r3679, %r3680 }; 2026-02-21T08:53:50.4363516Z // end inline asm 2026-02-21T08:53:50.4363572Z // begin inline asm 2026-02-21T08:53:50.4363687Z st.global.v4.b32 [ %rd309 + 0 ], { %r3682, %r3683, %r3684, %r3685 }; 2026-02-21T08:53:50.4363748Z // end inline asm 2026-02-21T08:53:50.4363805Z // begin inline asm 2026-02-21T08:53:50.4363931Z st.global.v4.b32 [ %rd310 + 0 ], { %r3687, %r3688, %r3689, %r3690 }; 2026-02-21T08:53:50.4363992Z // end inline asm 2026-02-21T08:53:50.4364200Z .loc 1 22 88 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:22:88 2026-02-21T08:53:50.4364322Z add.s64 %rd78, %rd322, 1; 2026-02-21T08:53:50.4364386Z add.s64 %rd321, %rd321, 64; 2026-02-21T08:53:50.4364458Z setp.lt.s64 %p56, %rd322, %rd53; 2026-02-21T08:53:50.4364519Z mov.b64 %rd322, %rd78; 2026-02-21T08:53:50.4364580Z @%p56 bra $L__BB0_11; 2026-02-21T08:53:50.4364670Z $L__BB0_14: // %._crit_edge 2026-02-21T08:53:50.4364869Z .loc 1 22 4 // cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py:22:4 2026-02-21T08:53:50.4364923Z ret; 2026-02-21T08:53:50.4364978Z $L__tmp9: 2026-02-21T08:53:50.4365040Z $L__func_end0: 2026-02-21T08:53:50.4365125Z // -- End function 2026-02-21T08:53:50.4365182Z } 2026-02-21T08:53:50.4365546Z .file 1 "/tmp/torchinductor_root/na/cnagdxnhs7zntkbupo466dm4tsm64ytyo3nrrhzlx4oet5yebi3y.py" 2026-02-21T08:53:50.4365761Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T08:53:50.4365828Z .section .debug_abbrev 2026-02-21T08:53:50.4365884Z { 2026-02-21T08:53:50.4365977Z .b8 1 // Abbreviation Code 2026-02-21T08:53:50.4366070Z .b8 17 // DW_TAG_compile_unit 2026-02-21T08:53:50.4366153Z .b8 1 // DW_CHILDREN_yes 2026-02-21T08:53:50.4366242Z .b8 37 // DW_AT_producer 2026-02-21T08:53:50.4366322Z .b8 8 // DW_FORM_string 2026-02-21T08:53:50.4366400Z .b8 19 // DW_AT_language 
2026-02-21T08:53:50.4366604Z .b8 5 // DW_FORM_data2 2026-02-21T08:53:50.4366687Z .b8 3 // DW_AT_name 2026-02-21T08:53:50.4366771Z .b8 8 // DW_FORM_string 2026-02-21T08:53:50.4366929Z .b8 16 // DW_AT_stmt_list 2026-02-21T08:53:50.4367013Z .b8 6 // DW_FORM_data4 2026-02-21T08:53:50.4367094Z .b8 27 // DW_AT_comp_dir 2026-02-21T08:53:50.4367173Z .b8 8 // DW_FORM_string 2026-02-21T08:53:50.4367248Z .b8 0 // EOM(1) 2026-02-21T08:53:50.4367318Z .b8 0 // EOM(2) 2026-02-21T08:53:50.4367407Z .b8 2 // Abbreviation Code 2026-02-21T08:53:50.4367496Z .b8 46 // DW_TAG_subprogram 2026-02-21T08:53:50.4367575Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:53:50.4367650Z .b8 3 // DW_AT_name 2026-02-21T08:53:50.4367729Z .b8 8 // DW_FORM_string 2026-02-21T08:53:50.4367811Z .b8 32 // DW_AT_inline 2026-02-21T08:53:50.4367891Z .b8 11 // DW_FORM_data1 2026-02-21T08:53:50.4367962Z .b8 0 // EOM(1) 2026-02-21T08:53:50.4368053Z .b8 0 // EOM(2) 2026-02-21T08:53:50.4368143Z .b8 3 // Abbreviation Code 2026-02-21T08:53:50.4368232Z .b8 46 // DW_TAG_subprogram 2026-02-21T08:53:50.4368316Z .b8 1 // DW_CHILDREN_yes 2026-02-21T08:53:50.4368398Z .b8 17 // DW_AT_low_pc 2026-02-21T08:53:50.4368477Z .b8 1 // DW_FORM_addr 2026-02-21T08:53:50.4368560Z .b8 18 // DW_AT_high_pc 2026-02-21T08:53:50.4368636Z .b8 1 // DW_FORM_addr 2026-02-21T08:53:50.4368728Z .b8 49 // DW_AT_abstract_origin 2026-02-21T08:53:50.4368812Z .b8 19 // DW_FORM_ref4 2026-02-21T08:53:50.4368886Z .b8 0 // EOM(1) 2026-02-21T08:53:50.4368954Z .b8 0 // EOM(2) 2026-02-21T08:53:50.4369113Z .b8 4 // Abbreviation Code 2026-02-21T08:53:50.4369219Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T08:53:50.4369311Z .b8 0 // DW_CHILDREN_no 2026-02-21T08:53:50.4369406Z .b8 49 // DW_AT_abstract_origin 2026-02-21T08:53:50.4369487Z .b8 19 // DW_FORM_ref4 2026-02-21T08:53:50.4369564Z .b8 17 // DW_AT_low_pc 2026-02-21T08:53:50.4369639Z .b8 1 // DW_FORM_addr 2026-02-21T08:53:50.4369722Z .b8 18 // DW_AT_high_pc 2026-02-21T08:53:50.4369798Z .b8 1 // DW_FORM_addr 
2026-02-21T08:53:50.4370002Z .b8 88 // DW_AT_call_file 2026-02-21T08:53:50.4370081Z .b8 11 // DW_FORM_data1 2026-02-21T08:53:50.4370164Z .b8 89 // DW_AT_call_line 2026-02-21T08:53:50.4370244Z .b8 11 // DW_FORM_data1 2026-02-21T08:53:50.4370328Z .b8 87 // DW_AT_call_column 2026-02-21T08:53:50.4370410Z .b8 11 // DW_FORM_data1 2026-02-21T08:53:50.4370479Z .b8 0 // EOM(1) 2026-02-21T08:53:50.4370548Z .b8 0 // EOM(2) 2026-02-21T08:53:50.4370629Z .b8 0 // EOM(3) 2026-02-21T08:53:50.4370681Z } 2026-02-21T08:53:50.4370744Z .section .debug_info 2026-02-21T08:53:50.4370794Z { 2026-02-21T08:53:50.4370889Z .b32 178 // Length of Unit 2026-02-21T08:53:50.4370981Z .b8 2 // DWARF version number 2026-02-21T08:53:50.4371036Z .b8 0 2026-02-21T08:53:50.4371228Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T08:53:50.4371326Z .b8 8 // Address Size (in bytes) 2026-02-21T08:53:50.4371443Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T08:53:50.4371532Z .b8 116 // DW_AT_producer 2026-02-21T08:53:50.4371586Z .b8 114 2026-02-21T08:53:50.4371637Z .b8 105 2026-02-21T08:53:50.4371689Z .b8 116 2026-02-21T08:53:50.4371743Z .b8 111 2026-02-21T08:53:50.4371793Z .b8 110 2026-02-21T08:53:50.4371844Z .b8 0 2026-02-21T08:53:50.4371927Z .b8 2 // DW_AT_language 2026-02-21T08:53:50.4371978Z .b8 0 2026-02-21T08:53:50.4372058Z .b8 99 // DW_AT_name 2026-02-21T08:53:50.4372111Z .b8 110 2026-02-21T08:53:50.4372165Z .b8 97 2026-02-21T08:53:50.4372228Z .b8 103 2026-02-21T08:53:50.4372283Z .b8 100 2026-02-21T08:53:50.4372339Z .b8 120 2026-02-21T08:53:50.4372393Z .b8 110 2026-02-21T08:53:50.4372444Z .b8 104 2026-02-21T08:53:50.4372495Z .b8 115 2026-02-21T08:53:50.4372547Z .b8 55 2026-02-21T08:53:50.4372598Z .b8 122 2026-02-21T08:53:50.4372652Z .b8 110 2026-02-21T08:53:50.4372703Z .b8 116 2026-02-21T08:53:50.4372757Z .b8 107 2026-02-21T08:53:50.4372807Z .b8 98 2026-02-21T08:53:50.4372858Z .b8 117 2026-02-21T08:53:50.4372911Z .b8 112 2026-02-21T08:53:50.4372962Z .b8 111 
2026-02-21T08:53:50.4373012Z .b8 52 2026-02-21T08:53:50.4373061Z .b8 54 2026-02-21T08:53:50.4373116Z .b8 54 2026-02-21T08:53:50.4373168Z .b8 100 2026-02-21T08:53:50.4373219Z .b8 109 2026-02-21T08:53:50.4373271Z .b8 52 2026-02-21T08:53:50.4373322Z .b8 116 2026-02-21T08:53:50.4373373Z .b8 115 2026-02-21T08:53:50.4373423Z .b8 109 2026-02-21T08:53:50.4373477Z .b8 54 2026-02-21T08:53:50.4373532Z .b8 52 2026-02-21T08:53:50.4373583Z .b8 121 2026-02-21T08:53:50.4373634Z .b8 116 2026-02-21T08:53:50.4373687Z .b8 121 2026-02-21T08:53:50.4373739Z .b8 111 2026-02-21T08:53:50.4373790Z .b8 51 2026-02-21T08:53:50.4373847Z .b8 110 2026-02-21T08:53:50.4373897Z .b8 114 2026-02-21T08:53:50.4373948Z .b8 114 2026-02-21T08:53:50.4373998Z .b8 104 2026-02-21T08:53:50.4374053Z .b8 122 2026-02-21T08:53:50.4374171Z .b8 108 2026-02-21T08:53:50.4374232Z .b8 120 2026-02-21T08:53:50.4374287Z .b8 52 2026-02-21T08:53:50.4374337Z .b8 111 2026-02-21T08:53:50.4374389Z .b8 101 2026-02-21T08:53:50.4374440Z .b8 116 2026-02-21T08:53:50.4374493Z .b8 53 2026-02-21T08:53:50.4374547Z .b8 121 2026-02-21T08:53:50.4374597Z .b8 101 2026-02-21T08:53:50.4374651Z .b8 98 2026-02-21T08:53:50.4374702Z .b8 105 2026-02-21T08:53:50.4374752Z .b8 51 2026-02-21T08:53:50.4374802Z .b8 121 2026-02-21T08:53:50.4374857Z .b8 46 2026-02-21T08:53:50.4374906Z .b8 112 2026-02-21T08:53:50.4374956Z .b8 121 2026-02-21T08:53:50.4375006Z .b8 0 2026-02-21T08:53:50.4375120Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T08:53:50.4375207Z .b8 47 // DW_AT_comp_dir 2026-02-21T08:53:50.4375377Z .b8 116 2026-02-21T08:53:50.4375437Z .b8 109 2026-02-21T08:53:50.4375489Z .b8 112 2026-02-21T08:53:50.4375540Z .b8 47 2026-02-21T08:53:50.4375590Z .b8 116 2026-02-21T08:53:50.4375643Z .b8 111 2026-02-21T08:53:50.4375696Z .b8 114 2026-02-21T08:53:50.4375746Z .b8 99 2026-02-21T08:53:50.4375799Z .b8 104 2026-02-21T08:53:50.4375849Z .b8 105 2026-02-21T08:53:50.4375900Z .b8 110 2026-02-21T08:53:50.4375951Z .b8 100 2026-02-21T08:53:50.4376006Z .b8 117 
2026-02-21T08:53:50.4376055Z .b8 99 2026-02-21T08:53:50.4376107Z .b8 116 2026-02-21T08:53:50.4376162Z .b8 111 2026-02-21T08:53:50.4376216Z .b8 114 2026-02-21T08:53:50.4376266Z .b8 95 2026-02-21T08:53:50.4376319Z .b8 114 2026-02-21T08:53:50.4376379Z .b8 111 2026-02-21T08:53:50.4376431Z .b8 111 2026-02-21T08:53:50.4376612Z .b8 116 2026-02-21T08:53:50.4376668Z .b8 47 2026-02-21T08:53:50.4376723Z .b8 110 2026-02-21T08:53:50.4376773Z .b8 97 2026-02-21T08:53:50.4376823Z .b8 0 2026-02-21T08:53:50.4376945Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T08:53:50.4377129Z .b8 95 // DW_AT_name 2026-02-21T08:53:50.4377184Z .b8 104 2026-02-21T08:53:50.4377236Z .b8 101 2026-02-21T08:53:50.4377304Z .b8 108 2026-02-21T08:53:50.4377356Z .b8 105 2026-02-21T08:53:50.4377409Z .b8 111 2026-02-21T08:53:50.4377460Z .b8 110 2026-02-21T08:53:50.4377515Z .b8 95 2026-02-21T08:53:50.4377566Z .b8 109 2026-02-21T08:53:50.4377617Z .b8 97 2026-02-21T08:53:50.4377674Z .b8 116 2026-02-21T08:53:50.4377724Z .b8 109 2026-02-21T08:53:50.4377776Z .b8 117 2026-02-21T08:53:50.4377826Z .b8 108 2026-02-21T08:53:50.4377879Z .b8 95 2026-02-21T08:53:50.4377930Z .b8 98 2026-02-21T08:53:50.4377982Z .b8 102 2026-02-21T08:53:50.4378034Z .b8 49 2026-02-21T08:53:50.4378086Z .b8 54 2026-02-21T08:53:50.4378137Z .b8 95 2026-02-21T08:53:50.4378189Z .b8 105 2026-02-21T08:53:50.4378242Z .b8 110 2026-02-21T08:53:50.4378292Z .b8 116 2026-02-21T08:53:50.4378343Z .b8 52 2026-02-21T08:53:50.4378393Z .b8 0 2026-02-21T08:53:50.4378484Z .b8 1 // DW_AT_inline 2026-02-21T08:53:50.4378592Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T08:53:50.4378688Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T08:53:50.4378791Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T08:53:50.4378893Z .b32 108 // DW_AT_abstract_origin 2026-02-21T08:53:50.4379025Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T08:53:50.4379126Z .b32 108 // DW_AT_abstract_origin 2026-02-21T08:53:50.4379214Z .b64 $L__tmp1 // 
DW_AT_low_pc 2026-02-21T08:53:50.4379304Z .b64 $L__tmp8 // DW_AT_high_pc 2026-02-21T08:53:50.4379392Z .b8 1 // DW_AT_call_file 2026-02-21T08:53:50.4379474Z .b8 81 // DW_AT_call_line 2026-02-21T08:53:50.4379564Z .b8 40 // DW_AT_call_column 2026-02-21T08:53:50.4379657Z .b8 0 // End Of Children Mark 2026-02-21T08:53:50.4379755Z .b8 0 // End Of Children Mark 2026-02-21T08:53:50.4379889Z } 2026-02-21T08:53:50.4379962Z .section .debug_macinfo { } 2026-02-21T08:53:50.4379968Z 2026-02-21T08:53:50.4380053Z ================================================================ 2026-02-21T08:53:50.4380172Z please share the reproducer above with Triton project. 2026-02-21T08:53:51.6775372Z [148s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 64], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=64, num_sm_multiplier=16, num_stages=6, num_warps=2, pid_type='persistent_interleaved', range_flattens=[None, False], range_multi_buffers=[True, False], range_num_stages=[2, 2], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T08:53:51.6778094Z Tensor-likes are not close! 2026-02-21T08:53:51.6778287Z 2026-02-21T08:53:51.6778399Z Mismatched elements: 456977 / 458752 (99.6%) 2026-02-21T08:53:51.6778826Z Greatest absolute difference: 3200.0 at index (38, 1878) (up to 0.01 allowed) 2026-02-21T08:53:51.6779372Z Greatest relative difference: 70144.0 at index (61, 2065) (up to 0.01 allowed) 2026-02-21T08:53:51.6779860Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T08:53:51.6780126Z 2026-02-21T08:53:53.0205586Z 2026-02-21T08:53:53.0206836Z Generation 3: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 92/92 13.9 configs/s 2026-02-21T08:53:59.0731804Z Generation 3: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 165.5 2026-02-21T08:53:59.0732371Z configs/s 2026-02-21T08:53:59.3498208Z [156s] Generation 3 complete: 2026-02-21T08:53:59.3498534Z error=5 2026-02-21T08:53:59.3498757Z ok=90 2026-02-21T08:53:59.3498925Z min=0.1195 2026-02-21T08:53:59.3499115Z mid=0.2387 2026-02-21T08:53:59.3499874Z max=11.3668 2026-02-21T08:53:59.3500086Z best={'block_sizes': [32, 64, 32], 2026-02-21T08:53:59.3500500Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T08:53:59.3500951Z 'l2_groupings': [8], 2026-02-21T08:53:59.3501213Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T08:53:59.3501510Z 'loop_orders': [[1, 0]], 2026-02-21T08:53:59.3501742Z 'maxnreg': 256, 2026-02-21T08:53:59.3501943Z 'num_sm_multiplier': 2, 2026-02-21T08:53:59.3502172Z 'num_stages': 3, 2026-02-21T08:53:59.3502358Z 'num_warps': 4, 2026-02-21T08:53:59.3502580Z 'pid_type': 'persistent_interleaved', 2026-02-21T08:53:59.3502858Z 'range_flattens': [None, True], 2026-02-21T08:53:59.3503117Z 'range_multi_buffers': [None, False], 2026-02-21T08:53:59.3503384Z 'range_num_stages': [0, 0], 2026-02-21T08:53:59.3503623Z 'range_unroll_factors': [3, 2], 2026-02-21T08:53:59.3503874Z 'range_warp_specializes': []} 2026-02-21T08:53:59.3533811Z [156s] Fitting surrogate: 396 points, 396 targets 2026-02-21T08:54:00.7154327Z [157s] Generation 4 starting: 84 neighbors, 5 active search path(s) 2026-02-21T08:54:18.7829588Z Generation 4: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 85/85 4.1 configs/s 2026-02-21T08:54:24.2944001Z [181s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 64], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[1], load_eviction_policies=['first', 'last'], 
loop_orders=[[0, 1]], maxnreg=64, num_sm_multiplier=16, num_stages=7, num_warps=1, pid_type='persistent_interleaved', range_flattens=[None, False], range_multi_buffers=[True, None], range_num_stages=[2, 2], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T08:54:24.2945847Z Tensor-likes are not close! 2026-02-21T08:54:24.2946021Z 2026-02-21T08:54:24.2946138Z Mismatched elements: 457098 / 458752 (99.6%) 2026-02-21T08:54:24.2946727Z Greatest absolute difference: 3200.0 at index (41, 6557) (up to 0.01 allowed) 2026-02-21T08:54:24.2947326Z Greatest relative difference: 186368.0 at index (58, 2774) (up to 0.01 allowed) 2026-02-21T08:54:24.2947803Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:54:24.2948398Z 2026-02-21T08:54:24.3741390Z Generation 4: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 85/85 15.3 configs/s 2026-02-21T08:54:34.1982113Z Generation 4: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 102.3 2026-02-21T08:54:34.1982568Z configs/s 2026-02-21T08:54:34.6362079Z [191s] Generation 4 complete: 2026-02-21T08:54:34.6362327Z error=1 2026-02-21T08:54:34.6362507Z ok=88 2026-02-21T08:54:34.6362666Z min=0.1089 2026-02-21T08:54:34.6362839Z mid=0.1659 2026-02-21T08:54:34.6363005Z max=11.4140 2026-02-21T08:54:34.6363194Z best={'block_sizes': [128, 64, 32], 2026-02-21T08:54:34.6363519Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T08:54:34.6364123Z 'l2_groupings': [1], 2026-02-21T08:54:34.6364497Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T08:54:34.6364778Z 'loop_orders': [[1, 0]], 2026-02-21T08:54:34.6364988Z 'num_stages': 6, 2026-02-21T08:54:34.6365179Z 'num_warps': 4, 2026-02-21T08:54:34.6365365Z 'pid_type': 'flat', 2026-02-21T08:54:34.6365574Z 'range_flattens': [None, None], 2026-02-21T08:54:34.6365824Z 'range_multi_buffers': [None, False], 2026-02-21T08:54:34.6366093Z 'range_num_stages': [0, 2], 2026-02-21T08:54:34.6366323Z 'range_unroll_factors': [0, 0], 
2026-02-21T08:54:34.6366743Z 'range_warp_specializes': []} 2026-02-21T08:54:34.6402329Z [191s] Fitting surrogate: 485 points, 485 targets 2026-02-21T08:54:36.0931861Z [193s] Generation 5 starting: 90 neighbors, 5 active search path(s) 2026-02-21T08:54:53.6759717Z Generation 5: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 91/91 3.9 configs/s 2026-02-21T08:55:00.1799772Z Generation 5: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 91/91 14.1 configs/s 2026-02-21T08:55:11.3965754Z Generation 5: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 89.7 configs/s 2026-02-21T08:55:11.8977919Z [229s] Generation 5 complete: 2026-02-21T08:55:11.8978341Z error=2 2026-02-21T08:55:11.8978606Z ok=93 2026-02-21T08:55:11.8978894Z min=0.1087 2026-02-21T08:55:11.8979173Z mid=0.1531 2026-02-21T08:55:11.8979450Z max=33.2367 2026-02-21T08:55:11.8979756Z best={'block_sizes': [128, 64, 32], 2026-02-21T08:55:11.8980297Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T08:55:11.8980832Z 'l2_groupings': [2], 2026-02-21T08:55:11.8981149Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T08:55:11.8981437Z 'loop_orders': [[1, 0]], 2026-02-21T08:55:11.8981650Z 'num_stages': 6, 2026-02-21T08:55:11.8981859Z 'num_warps': 4, 2026-02-21T08:55:11.8982055Z 'pid_type': 'flat', 2026-02-21T08:55:11.8982291Z 'range_flattens': [None, None], 2026-02-21T08:55:11.8982560Z 'range_multi_buffers': [None, False], 2026-02-21T08:55:11.8982860Z 'range_num_stages': [0, 2], 2026-02-21T08:55:11.8983112Z 'range_unroll_factors': [0, 0], 2026-02-21T08:55:11.8983374Z 'range_warp_specializes': []} 2026-02-21T08:55:11.9017983Z [229s] Fitting surrogate: 580 points, 580 targets 2026-02-21T08:55:13.3404918Z [230s] Generation 6 starting: 88 neighbors, 5 active search path(s) 2026-02-21T08:55:37.8419295Z Generation 6: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89/89 1.8 configs/s 2026-02-21T08:55:42.1982780Z [259s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 256], 
indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', 'last'], loop_orders=[[0, 1]], num_stages=6, num_warps=16, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, None], range_num_stages=[0, 2], range_unroll_factors=[0, 4], range_warp_specializes=[]) 2026-02-21T08:55:42.1985442Z Tensor-likes are not close! 2026-02-21T08:55:42.1985722Z 2026-02-21T08:55:42.1985897Z Mismatched elements: 456538 / 458752 (99.5%) 2026-02-21T08:55:42.1987122Z Greatest absolute difference: 3168.0 at index (12, 1164) (up to 0.01 allowed) 2026-02-21T08:55:42.1988011Z Greatest relative difference: 107008.0 at index (44, 6823) (up to 0.01 allowed) 2026-02-21T08:55:42.1989471Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:55:42.1989917Z 2026-02-21T08:55:42.4202837Z [259s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 256], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', 'last'], loop_orders=[[0, 1]], num_stages=6, num_warps=4, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, None], range_num_stages=[0, 2], range_unroll_factors=[0, 4], range_warp_specializes=[]) 2026-02-21T08:55:42.4204471Z Tensor-likes are not close! 2026-02-21T08:55:42.4204627Z 2026-02-21T08:55:42.4204728Z Mismatched elements: 457125 / 458752 (99.6%) 2026-02-21T08:55:42.4205131Z Greatest absolute difference: 3072.0 at index (33, 554) (up to 0.01 allowed) 2026-02-21T08:55:42.4206029Z Greatest relative difference: 99328.0 at index (8, 2887) (up to 0.01 allowed) 2026-02-21T08:55:42.4206644Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T08:55:42.4206919Z 2026-02-21T08:55:42.8578203Z [260s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 256], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', 'last'], loop_orders=[[0, 1]], num_stages=6, num_warps=16, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, True], range_num_stages=[0, 2], range_unroll_factors=[0, 4], range_warp_specializes=[]) 2026-02-21T08:55:42.8579842Z Tensor-likes are not close! 2026-02-21T08:55:42.8580013Z 2026-02-21T08:55:42.8580120Z Mismatched elements: 456538 / 458752 (99.5%) 2026-02-21T08:55:42.8580528Z Greatest absolute difference: 3168.0 at index (12, 1164) (up to 0.01 allowed) 2026-02-21T08:55:42.8581053Z Greatest relative difference: 107008.0 at index (44, 6823) (up to 0.01 allowed) 2026-02-21T08:55:42.8581694Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:55:42.8581964Z 2026-02-21T08:55:43.0613232Z [260s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 256], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', 'last'], loop_orders=[[0, 1]], maxnreg=256, num_sm_multiplier=16, num_stages=6, num_warps=4, pid_type='persistent_interleaved', range_flattens=[None, None], range_multi_buffers=[None, None], range_num_stages=[2, 2], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T08:55:43.0616154Z Tensor-likes are not close! 2026-02-21T08:55:43.0616426Z 2026-02-21T08:55:43.0617138Z Mismatched elements: 457125 / 458752 (99.6%) 2026-02-21T08:55:43.0617818Z Greatest absolute difference: 3072.0 at index (33, 554) (up to 0.01 allowed) 2026-02-21T08:55:43.0618651Z Greatest relative difference: 99328.0 at index (8, 2887) (up to 0.01 allowed) 2026-02-21T08:55:43.0619421Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T08:55:43.0619874Z 2026-02-21T08:55:43.6650713Z Generation 6: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 89/89 15.4 configs/s 2026-02-21T08:55:51.2229933Z Generation 6: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 133.0 2026-02-21T08:55:51.2230646Z configs/s 2026-02-21T08:55:51.5843068Z [268s] Generation 6 complete: 2026-02-21T08:55:51.5843559Z error=10 2026-02-21T08:55:51.5843881Z ok=83 2026-02-21T08:55:51.5844149Z min=0.1063 2026-02-21T08:55:51.5844418Z mid=0.1869 2026-02-21T08:55:51.5844666Z max=46.2729 2026-02-21T08:55:51.5844956Z best={'block_sizes': [64, 64, 32], 2026-02-21T08:55:51.5845460Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:55:51.5845942Z 'l2_groupings': [2], 2026-02-21T08:55:51.5846324Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:55:51.5850126Z 'loop_orders': [[1, 0]], 2026-02-21T08:55:51.5850512Z 'num_stages': 7, 2026-02-21T08:55:51.5850822Z 'num_warps': 4, 2026-02-21T08:55:51.5851130Z 'pid_type': 'flat', 2026-02-21T08:55:51.5851469Z 'range_flattens': [None, True], 2026-02-21T08:55:51.5852534Z 'range_multi_buffers': [None, False], 2026-02-21T08:55:51.5853007Z 'range_num_stages': [0, 3], 2026-02-21T08:55:51.5853432Z 'range_unroll_factors': [0, 1], 2026-02-21T08:55:51.5853669Z 'range_warp_specializes': []} 2026-02-21T08:55:51.5894938Z [268s] Fitting surrogate: 673 points, 673 targets 2026-02-21T08:55:53.0069748Z [270s] Generation 7 starting: 88 neighbors, 5 active search path(s) 2026-02-21T08:56:17.4206059Z Generation 7: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 89/89 1.7 configs/s 2026-02-21T08:56:23.7904053Z Generation 7: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 89/89 14.0 configs/s 2026-02-21T08:56:31.8144330Z Generation 7: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 125.3 2026-02-21T08:56:31.8147093Z configs/s 2026-02-21T08:56:32.2044532Z [309s] Generation 7 complete: 2026-02-21T08:56:32.2044816Z error=5 2026-02-21T08:56:32.2044993Z ok=88 2026-02-21T08:56:32.2045170Z 
min=0.1063 2026-02-21T08:56:32.2045394Z mid=0.1776 2026-02-21T08:56:32.2045600Z max=46.4783 2026-02-21T08:56:32.2045829Z best={'block_sizes': [64, 64, 32], 2026-02-21T08:56:32.2046187Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:56:32.2046929Z 'l2_groupings': [2], 2026-02-21T08:56:32.2047177Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:56:32.2047457Z 'loop_orders': [[1, 0]], 2026-02-21T08:56:32.2047670Z 'num_stages': 8, 2026-02-21T08:56:32.2047864Z 'num_warps': 4, 2026-02-21T08:56:32.2048071Z 'pid_type': 'flat', 2026-02-21T08:56:32.2048296Z 'range_flattens': [None, True], 2026-02-21T08:56:32.2048565Z 'range_multi_buffers': [None, False], 2026-02-21T08:56:32.2048835Z 'range_num_stages': [0, 3], 2026-02-21T08:56:32.2049078Z 'range_unroll_factors': [0, 1], 2026-02-21T08:56:32.2049340Z 'range_warp_specializes': []} 2026-02-21T08:56:32.2092344Z [309s] Fitting surrogate: 766 points, 766 targets 2026-02-21T08:56:33.0448121Z [310s] Generation 8 starting: 50 neighbors, 3 active search path(s) 2026-02-21T08:56:44.4928687Z Generation 8: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 51/51 2.0 configs/s 2026-02-21T08:56:47.4517606Z Generation 8: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 51/51 17.2 configs/s 2026-02-21T08:56:52.6325152Z Generation 8: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 227.4 2026-02-21T08:56:52.6327577Z configs/s 2026-02-21T08:56:52.8613943Z [330s] Generation 8 complete: 2026-02-21T08:56:52.8614203Z error=4 2026-02-21T08:56:52.8614378Z ok=50 2026-02-21T08:56:52.8614528Z min=0.1062 2026-02-21T08:56:52.8614690Z mid=0.2262 2026-02-21T08:56:52.8614846Z max=1.7858 2026-02-21T08:56:52.8615016Z best={'block_sizes': [64, 64, 32], 2026-02-21T08:56:52.8615314Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T08:56:52.8615600Z 'l2_groupings': [2], 2026-02-21T08:56:52.8615812Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T08:56:52.8616072Z 'loop_orders': [[1, 0]], 2026-02-21T08:56:52.8625524Z 
'num_stages': 8, 2026-02-21T08:56:52.8625791Z 'num_warps': 4, 2026-02-21T08:56:52.8625974Z 'pid_type': 'flat', 2026-02-21T08:56:52.8626182Z 'range_flattens': [None, True], 2026-02-21T08:56:52.8626408Z 'range_multi_buffers': [None, False], 2026-02-21T08:56:52.8626859Z 'range_num_stages': [0, 3], 2026-02-21T08:56:52.8627072Z 'range_unroll_factors': [0, 1], 2026-02-21T08:56:52.8627304Z 'range_warp_specializes': []} 2026-02-21T08:56:52.8658584Z [330s] Fitting surrogate: 820 points, 820 targets 2026-02-21T08:56:53.4673604Z [330s] Generation 9 starting: 32 neighbors, 2 active search path(s) 2026-02-21T08:57:25.0221458Z Generation 9: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 33/33 0.2 configs/s 2026-02-21T08:57:27.8744575Z [365s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 256], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['first', 'last'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=16, num_stages=6, num_warps=4, pid_type='persistent_interleaved', range_flattens=[None, None], range_multi_buffers=[False, False], range_num_stages=[2, 2], range_unroll_factors=[4, 3], range_warp_specializes=[]) 2026-02-21T08:57:27.8746963Z Tensor-likes are not close! 2026-02-21T08:57:27.8747141Z 2026-02-21T08:57:27.8747254Z Mismatched elements: 457125 / 458752 (99.6%) 2026-02-21T08:57:27.8747680Z Greatest absolute difference: 3072.0 at index (33, 554) (up to 0.01 allowed) 2026-02-21T08:57:27.8748213Z Greatest relative difference: 99328.0 at index (8, 2887) (up to 0.01 allowed) 2026-02-21T08:57:27.8748811Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T08:57:27.8749077Z 2026-02-21T08:57:27.9626946Z Generation 9: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 33/33 11.3 configs/s 2026-02-21T08:57:30.5914056Z Generation 9: verifying top configs 100% ━━━━━━━━━━━━━━ 1000/1000 378.9 2026-02-21T08:57:30.5914583Z configs/s 2026-02-21T08:57:30.7462928Z [367s] Generation 9 complete: 2026-02-21T08:57:30.7463194Z error=2 2026-02-21T08:57:30.7463368Z ok=33 2026-02-21T08:57:30.7463532Z min=0.1061 2026-02-21T08:57:30.7463730Z mid=0.7428 2026-02-21T08:57:30.7463898Z max=34.2320 2026-02-21T08:57:30.7464089Z best={'block_sizes': [32, 64, 64], 2026-02-21T08:57:30.7464436Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T08:57:30.7464761Z 'l2_groupings': [8], 2026-02-21T08:57:30.7465009Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T08:57:30.7465290Z 'loop_orders': [[1, 0]], 2026-02-21T08:57:30.7465519Z 'num_stages': 4, 2026-02-21T08:57:30.7465721Z 'num_warps': 4, 2026-02-21T08:57:30.7465941Z 'pid_type': 'flat', 2026-02-21T08:57:30.7466172Z 'range_flattens': [None, True], 2026-02-21T08:57:30.7467097Z 'range_multi_buffers': [None, False], 2026-02-21T08:57:30.7467403Z 'range_num_stages': [0, 0], 2026-02-21T08:57:30.7467652Z 'range_unroll_factors': [0, 1], 2026-02-21T08:57:30.7467917Z 'range_warp_specializes': []} 2026-02-21T08:57:30.7517374Z [367s] Fitting surrogate: 855 points, 855 targets 2026-02-21T08:57:31.4038265Z [368s] Generation 10 starting: 35 neighbors, 2 active search path(s) 2026-02-21T08:57:41.2423583Z Generation 10: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 36/36 1.8 configs/s 2026-02-21T08:57:43.4657662Z [380s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 128], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], num_stages=6, num_warps=8, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, False], range_num_stages=[0, 2], 
range_unroll_factors=[0, 4], range_warp_specializes=[]) 2026-02-21T08:57:43.4659432Z Tensor-likes are not close! 2026-02-21T08:57:43.4659616Z 2026-02-21T08:57:43.4659754Z Mismatched elements: 456584 / 458752 (99.5%) 2026-02-21T08:57:43.4660180Z Greatest absolute difference: 3296.0 at index (46, 6382) (up to 0.01 allowed) 2026-02-21T08:57:43.4661086Z Greatest relative difference: 115712.0 at index (58, 2774) (up to 0.01 allowed) 2026-02-21T08:57:43.4661557Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:57:43.4661818Z 2026-02-21T08:57:44.1243157Z Generation 10: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 36/36 12.6 configs/s 2026-02-21T08:57:46.5057647Z Generation 10: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 417.8 2026-02-21T08:57:46.5058187Z configs/s 2026-02-21T08:57:46.6498438Z [383s] Generation 10 complete: 2026-02-21T08:57:46.6498712Z error=1 2026-02-21T08:57:46.6498921Z ok=36 2026-02-21T08:57:46.6499094Z min=0.1061 2026-02-21T08:57:46.6499278Z mid=0.6968 2026-02-21T08:57:46.6499469Z max=34.8731 2026-02-21T08:57:46.6500095Z best={'block_sizes': [32, 64, 64], 2026-02-21T08:57:46.6500441Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T08:57:46.6500782Z 'l2_groupings': [8], 2026-02-21T08:57:46.6501068Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T08:57:46.6501353Z 'loop_orders': [[1, 0]], 2026-02-21T08:57:46.6501579Z 'num_stages': 4, 2026-02-21T08:57:46.6501771Z 'num_warps': 4, 2026-02-21T08:57:46.6501984Z 'pid_type': 'flat', 2026-02-21T08:57:46.6502202Z 'range_flattens': [None, True], 2026-02-21T08:57:46.6502467Z 'range_multi_buffers': [None, False], 2026-02-21T08:57:46.6502753Z 'range_num_stages': [0, 0], 2026-02-21T08:57:46.6502998Z 'range_unroll_factors': [0, 1], 2026-02-21T08:57:46.6503245Z 'range_warp_specializes': []} 2026-02-21T08:57:46.6549940Z [383s] Fitting surrogate: 892 points, 892 targets 2026-02-21T08:57:47.3032255Z [384s] Generation 11 starting: 34 neighbors, 2 active search 
path(s) 2026-02-21T08:57:58.1291331Z Generation 11: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 35/35 2.9 configs/s 2026-02-21T08:57:58.8592781Z [396s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 256], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], num_stages=7, num_warps=16, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, None], range_num_stages=[0, 2], range_unroll_factors=[0, 4], range_warp_specializes=[]) 2026-02-21T08:57:58.8594583Z Tensor-likes are not close! 2026-02-21T08:57:58.8594749Z 2026-02-21T08:57:58.8594859Z Mismatched elements: 456538 / 458752 (99.5%) 2026-02-21T08:57:58.8595271Z Greatest absolute difference: 3168.0 at index (12, 1164) (up to 0.01 allowed) 2026-02-21T08:57:58.8595810Z Greatest relative difference: 107008.0 at index (44, 6823) (up to 0.01 allowed) 2026-02-21T08:57:58.8596281Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:57:58.8596684Z 2026-02-21T08:57:58.9383647Z [396s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 256], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], maxnreg=256, num_sm_multiplier=16, num_stages=7, num_warps=16, pid_type='persistent_interleaved', range_flattens=[None, None], range_multi_buffers=[False, False], range_num_stages=[2, 2], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T08:57:58.9387053Z Tensor-likes are not close! 
2026-02-21T08:57:58.9387332Z 2026-02-21T08:57:58.9387507Z Mismatched elements: 456538 / 458752 (99.5%) 2026-02-21T08:57:58.9388181Z Greatest absolute difference: 3168.0 at index (12, 1164) (up to 0.01 allowed) 2026-02-21T08:57:58.9389179Z Greatest relative difference: 107008.0 at index (44, 6823) (up to 0.01 allowed) 2026-02-21T08:57:58.9389931Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:57:58.9390360Z 2026-02-21T08:58:00.3147390Z Generation 11: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 35/35 16.3 configs/s 2026-02-21T08:58:02.1631796Z Generation 11: verifying top configs 100% ━━━━━━━━━━━━━ 1000/1000 535.3 2026-02-21T08:58:02.1632679Z configs/s 2026-02-21T08:58:02.2833354Z [399s] Generation 11 complete: 2026-02-21T08:58:02.2833904Z error=2 2026-02-21T08:58:02.2834221Z ok=34 2026-02-21T08:58:02.2834416Z min=0.1060 2026-02-21T08:58:02.2834591Z mid=0.8528 2026-02-21T08:58:02.2834751Z max=4.0906 2026-02-21T08:58:02.2834935Z best={'block_sizes': [32, 64, 64], 2026-02-21T08:58:02.2835254Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T08:58:02.2835578Z 'l2_groupings': [8], 2026-02-21T08:58:02.2835812Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T08:58:02.2836100Z 'loop_orders': [[1, 0]], 2026-02-21T08:58:02.2836318Z 'num_stages': 4, 2026-02-21T08:58:02.2836786Z 'num_warps': 4, 2026-02-21T08:58:02.2837187Z 'pid_type': 'flat', 2026-02-21T08:58:02.2837534Z 'range_flattens': [None, True], 2026-02-21T08:58:02.2837794Z 'range_multi_buffers': [None, False], 2026-02-21T08:58:02.2838055Z 'range_num_stages': [0, 0], 2026-02-21T08:58:02.2838292Z 'range_unroll_factors': [0, 1], 2026-02-21T08:58:02.2838533Z 'range_warp_specializes': []} 2026-02-21T08:58:02.2880542Z [399s] Fitting surrogate: 928 points, 928 targets 2026-02-21T08:58:02.7247912Z [399s] Generation 12 starting: 19 neighbors, 1 active search path(s) 2026-02-21T08:58:13.5840573Z Generation 12: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20/20 1.2 configs/s 
2026-02-21T08:58:13.6970704Z [410s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 256], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['first', 'last'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=16, num_stages=7, num_warps=1, pid_type='persistent_interleaved', range_flattens=[None, None], range_multi_buffers=[False, None], range_num_stages=[2, 2], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T08:58:13.6973844Z Tensor-likes are not close! 2026-02-21T08:58:13.6974138Z 2026-02-21T08:58:13.6974319Z Mismatched elements: 457359 / 458752 (99.7%) 2026-02-21T08:58:13.6975009Z Greatest absolute difference: 3168.0 at index (7, 6991) (up to 0.01 allowed) 2026-02-21T08:58:13.6975876Z Greatest relative difference: 258048.0 at index (58, 2774) (up to 0.01 allowed) 2026-02-21T08:58:13.6977062Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:58:13.6977539Z 2026-02-21T08:58:13.8751877Z [411s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 256], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], maxnreg=256, num_sm_multiplier=16, num_stages=7, num_warps=1, pid_type='persistent_blocked', range_flattens=[None, None], range_multi_buffers=[False, False], range_num_stages=[2, 2], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T08:58:13.8753497Z Tensor-likes are not close! 2026-02-21T08:58:13.8753647Z 2026-02-21T08:58:13.8753739Z Mismatched elements: 457359 / 458752 (99.7%) 2026-02-21T08:58:13.8754105Z Greatest absolute difference: 3168.0 at index (7, 6991) (up to 0.01 allowed) 2026-02-21T08:58:13.8754582Z Greatest relative difference: 258048.0 at index (58, 2774) (up to 0.01 allowed) 2026-02-21T08:58:13.8755000Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T08:58:13.8755221Z 2026-02-21T08:58:14.3792036Z [411s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 256], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], num_stages=7, num_warps=1, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, True], range_num_stages=[0, 2], range_unroll_factors=[0, 4], range_warp_specializes=[]) 2026-02-21T08:58:14.3794674Z Tensor-likes are not close! 2026-02-21T08:58:14.3794956Z 2026-02-21T08:58:14.3795149Z Mismatched elements: 457359 / 458752 (99.7%) 2026-02-21T08:58:14.3795844Z Greatest absolute difference: 3168.0 at index (7, 6991) (up to 0.01 allowed) 2026-02-21T08:58:14.3797640Z Greatest relative difference: 258048.0 at index (58, 2774) (up to 0.01 allowed) 2026-02-21T08:58:14.3798239Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:58:14.3798495Z 2026-02-21T08:58:14.5571644Z [411s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 256], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], maxnreg=256, num_sm_multiplier=32, num_stages=7, num_warps=16, pid_type='persistent_interleaved', range_flattens=[None, None], range_multi_buffers=[False, None], range_num_stages=[2, 2], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T08:58:14.5574897Z Tensor-likes are not close! 2026-02-21T08:58:14.5575383Z 2026-02-21T08:58:14.5575569Z Mismatched elements: 456955 / 458752 (99.6%) 2026-02-21T08:58:14.5576219Z Greatest absolute difference: 3168.0 at index (12, 1164) (up to 0.01 allowed) 2026-02-21T08:58:14.5577620Z Greatest relative difference: 129536.0 at index (58, 2774) (up to 0.01 allowed) 2026-02-21T08:58:14.5578193Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T08:58:14.5578428Z 2026-02-21T08:58:14.9631408Z Generation 12: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 20/20 14.5 configs/s 2026-02-21T08:58:14.9637492Z [412s] Generation 12 complete: 2026-02-21T08:58:14.9637747Z error=4 2026-02-21T08:58:14.9637918Z ok=17 2026-02-21T08:58:14.9638090Z min=0.1060 2026-02-21T08:58:14.9638258Z mid=1.6879 2026-02-21T08:58:14.9638425Z max=22.5408 2026-02-21T08:58:14.9638603Z best={'block_sizes': [32, 64, 64], 2026-02-21T08:58:14.9638924Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T08:58:14.9639250Z 'l2_groupings': [8], 2026-02-21T08:58:14.9639495Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T08:58:14.9639957Z 'loop_orders': [[1, 0]], 2026-02-21T08:58:14.9640172Z 'num_stages': 4, 2026-02-21T08:58:14.9640361Z 'num_warps': 4, 2026-02-21T08:58:14.9640550Z 'pid_type': 'flat', 2026-02-21T08:58:14.9640760Z 'range_flattens': [None, True], 2026-02-21T08:58:14.9641006Z 'range_multi_buffers': [None, False], 2026-02-21T08:58:14.9641268Z 'range_num_stages': [0, 0], 2026-02-21T08:58:14.9641502Z 'range_unroll_factors': [0, 1], 2026-02-21T08:58:14.9641741Z 'range_warp_specializes': []} 2026-02-21T08:58:14.9687004Z [412s] Fitting surrogate: 949 points, 949 targets 2026-02-21T08:58:15.3394185Z [412s] Generation 13 starting: 16 neighbors, 1 active search path(s) 2026-02-21T08:58:24.2138003Z Generation 13: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 1.2 configs/s 2026-02-21T08:58:24.4217360Z [421s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 256], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['first', 'last'], loop_orders=[[1, 0]], num_stages=7, num_warps=4, pid_type='flat', range_flattens=[None, True], range_multi_buffers=[None, None], range_num_stages=[0, 2], range_unroll_factors=[0, 4], range_warp_specializes=[]) 2026-02-21T08:58:24.4218940Z Tensor-likes are not close! 
2026-02-21T08:58:24.4219093Z 2026-02-21T08:58:24.4219202Z Mismatched elements: 457125 / 458752 (99.6%) 2026-02-21T08:58:24.4219577Z Greatest absolute difference: 3072.0 at index (33, 554) (up to 0.01 allowed) 2026-02-21T08:58:24.4220067Z Greatest relative difference: 99328.0 at index (8, 2887) (up to 0.01 allowed) 2026-02-21T08:58:24.4220508Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:58:24.4220746Z 2026-02-21T08:58:25.3219977Z Generation 13: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 17/17 15.9 configs/s 2026-02-21T08:58:25.3225244Z [422s] Generation 13 complete: 2026-02-21T08:58:25.3225453Z error=1 2026-02-21T08:58:25.3225622Z ok=17 2026-02-21T08:58:25.3225774Z min=0.1060 2026-02-21T08:58:25.3225923Z mid=1.2224 2026-02-21T08:58:25.3226071Z max=3.7200 2026-02-21T08:58:25.3226234Z best={'block_sizes': [32, 64, 64], 2026-02-21T08:58:25.3226809Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T08:58:25.3227400Z 'l2_groupings': [8], 2026-02-21T08:58:25.3227606Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T08:58:25.3227856Z 'loop_orders': [[1, 0]], 2026-02-21T08:58:25.3228038Z 'num_stages': 4, 2026-02-21T08:58:25.3228211Z 'num_warps': 4, 2026-02-21T08:58:25.3228384Z 'pid_type': 'flat', 2026-02-21T08:58:25.3228670Z 'range_flattens': [None, True], 2026-02-21T08:58:25.3228890Z 'range_multi_buffers': [None, False], 2026-02-21T08:58:25.3229121Z 'range_num_stages': [0, 0], 2026-02-21T08:58:25.3229324Z 'range_unroll_factors': [0, 1], 2026-02-21T08:58:25.3229537Z 'range_warp_specializes': []} 2026-02-21T08:58:25.3274956Z [422s] Fitting surrogate: 967 points, 967 targets 2026-02-21T08:58:25.7500645Z [422s] Generation 14 starting: 18 neighbors, 1 active search path(s) 2026-02-21T08:58:32.9954944Z Generation 14: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19/19 2.0 configs/s 2026-02-21T08:58:33.5664410Z [430s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 512], 
indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], num_stages=7, num_warps=16, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, False], range_num_stages=[0, 2], range_unroll_factors=[0, 4], range_warp_specializes=[]) 2026-02-21T08:58:33.5666060Z Tensor-likes are not close! 2026-02-21T08:58:33.5666235Z 2026-02-21T08:58:33.5666350Z Mismatched elements: 456955 / 458752 (99.6%) 2026-02-21T08:58:33.5667114Z Greatest absolute difference: 3168.0 at index (12, 1164) (up to 0.01 allowed) 2026-02-21T08:58:33.5667652Z Greatest relative difference: 107008.0 at index (44, 6823) (up to 0.01 allowed) 2026-02-21T08:58:33.5668145Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T08:58:33.5668825Z 2026-02-21T08:58:33.9647693Z [431s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 512], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[2], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], num_stages=7, num_warps=32, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, None], range_num_stages=[0, 1], range_unroll_factors=[0, 4], range_warp_specializes=[]) 2026-02-21T08:58:33.9649259Z Tensor-likes are not close! 2026-02-21T08:58:33.9649442Z 2026-02-21T08:58:33.9649550Z Mismatched elements: 456656 / 458752 (99.5%) 2026-02-21T08:58:33.9649961Z Greatest absolute difference: 3056.0 at index (28, 5904) (up to 0.01 allowed) 2026-02-21T08:58:33.9650484Z Greatest relative difference: 187392.0 at index (8, 2887) (up to 0.01 allowed) 2026-02-21T08:58:33.9650957Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T08:58:33.9651217Z 2026-02-21T08:58:34.2515813Z Generation 14: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 19/19 15.6 configs/s 2026-02-21T08:58:34.2521631Z [431s] Generation 14 complete: 2026-02-21T08:58:34.2521870Z error=2 2026-02-21T08:58:34.2522042Z ok=18 2026-02-21T08:58:34.2522206Z min=0.1060 2026-02-21T08:58:34.2522386Z mid=1.1325 2026-02-21T08:58:34.2522559Z max=10.7797 2026-02-21T08:58:34.2522745Z best={'block_sizes': [32, 64, 64], 2026-02-21T08:58:34.2523056Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T08:58:34.2523378Z 'l2_groupings': [8], 2026-02-21T08:58:34.2523616Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T08:58:34.2523905Z 'loop_orders': [[1, 0]], 2026-02-21T08:58:34.2524112Z 'num_stages': 4, 2026-02-21T08:58:34.2524303Z 'num_warps': 4, 2026-02-21T08:58:34.2524488Z 'pid_type': 'flat', 2026-02-21T08:58:34.2524700Z 'range_flattens': [None, True], 2026-02-21T08:58:34.2524957Z 'range_multi_buffers': [None, False], 2026-02-21T08:58:34.2525206Z 'range_num_stages': [0, 0], 2026-02-21T08:58:34.2525432Z 'range_unroll_factors': [0, 1], 2026-02-21T08:58:34.2525659Z 'range_warp_specializes': []} 2026-02-21T08:58:34.2574963Z [431s] Fitting surrogate: 987 points, 987 targets 2026-02-21T08:58:34.6882480Z [431s] Generation 15 starting: 19 neighbors, 1 active search path(s) 2026-02-21T08:58:39.9274284Z Generation 15: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20/20 3.9 configs/s 2026-02-21T08:58:41.6834805Z Generation 15: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 20/20 13.3 configs/s 2026-02-21T08:58:41.6841307Z [438s] Generation 15 complete: 2026-02-21T08:58:41.6841554Z ok=21 2026-02-21T08:58:41.6841737Z min=0.1060 2026-02-21T08:58:41.6841904Z mid=1.2041 2026-02-21T08:58:41.6842063Z max=34.0867 2026-02-21T08:58:41.6842245Z best={'block_sizes': [32, 64, 64], 2026-02-21T08:58:41.6842556Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T08:58:41.6842883Z 'l2_groupings': [8], 
2026-02-21T08:58:41.6843113Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T08:58:41.6843865Z 'loop_orders': [[1, 0]], 2026-02-21T08:58:41.6844082Z 'num_stages': 4, 2026-02-21T08:58:41.6844271Z 'num_warps': 4, 2026-02-21T08:58:41.6844457Z 'pid_type': 'flat', 2026-02-21T08:58:41.6844661Z 'range_flattens': [None, True], 2026-02-21T08:58:41.6844924Z 'range_multi_buffers': [None, False], 2026-02-21T08:58:41.6845180Z 'range_num_stages': [0, 0], 2026-02-21T08:58:41.6845411Z 'range_unroll_factors': [0, 1], 2026-02-21T08:58:41.6845660Z 'range_warp_specializes': []} 2026-02-21T08:58:41.6893215Z [438s] Fitting surrogate: 1008 points, 1008 targets 2026-02-21T08:58:42.1237968Z [439s] Generation 16 starting: 20 neighbors, 1 active search path(s) 2026-02-21T08:58:58.1642400Z Generation 16: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21/21 0.9 configs/s 2026-02-21T08:58:59.6336807Z Generation 16: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 21/21 14.6 configs/s 2026-02-21T08:58:59.6343236Z [456s] Generation 16 complete: 2026-02-21T08:58:59.6343464Z error=1 2026-02-21T08:58:59.6343631Z ok=21 2026-02-21T08:58:59.6343794Z min=0.1060 2026-02-21T08:58:59.6344526Z mid=1.5623 2026-02-21T08:58:59.6344706Z max=7.7805 2026-02-21T08:58:59.6344873Z best={'block_sizes': [32, 64, 64], 2026-02-21T08:58:59.6345175Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T08:58:59.6345507Z 'l2_groupings': [8], 2026-02-21T08:58:59.6345725Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T08:58:59.6345988Z 'loop_orders': [[1, 0]], 2026-02-21T08:58:59.6346182Z 'num_stages': 4, 2026-02-21T08:58:59.6346362Z 'num_warps': 4, 2026-02-21T08:58:59.6346703Z 'pid_type': 'flat', 2026-02-21T08:58:59.6346903Z 'range_flattens': [None, True], 2026-02-21T08:58:59.6347144Z 'range_multi_buffers': [None, False], 2026-02-21T08:58:59.6347390Z 'range_num_stages': [0, 0], 2026-02-21T08:58:59.6347600Z 'range_unroll_factors': [0, 1], 2026-02-21T08:58:59.6347842Z 'range_warp_specializes': []} 
2026-02-21T08:58:59.6396075Z [456s] Fitting surrogate: 1030 points, 1030 targets 2026-02-21T08:59:00.1637462Z [457s] Generation 17 starting: 23 neighbors, 1 active search path(s) 2026-02-21T08:59:31.7669409Z [488s] Timeout after 30s compiling Config(block_sizes=[32, 4, 256], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=16, num_stages=7, num_warps=1, pid_type='persistent_blocked', range_flattens=[None, True], range_multi_buffers=[None, False], range_num_stages=[2, 2], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T08:59:32.0453435Z [489s] Timeout after 30s compiling Config(block_sizes=[32, 8, 256], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=16, num_stages=7, num_warps=1, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[None, None], range_num_stages=[2, 2], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T08:59:32.3325085Z [489s] Timeout after 30s compiling Config(block_sizes=[16, 8, 512], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[4], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=16, num_stages=7, num_warps=1, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[None, False], range_num_stages=[2, 2], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T08:59:32.6228246Z [489s] Timeout after 30s compiling Config(block_sizes=[32, 8, 256], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=16, num_stages=7, num_warps=1, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[None, 
True], range_num_stages=[2, 2], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T08:59:33.1744969Z [490s] Timeout after 30s compiling Config(block_sizes=[8, 8, 512], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[4], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=16, num_stages=7, num_warps=1, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[None, False], range_num_stages=[2, 2], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T08:59:33.4933607Z [490s] Timeout after 30s compiling Config(block_sizes=[32, 8, 256], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=16, num_stages=7, num_warps=1, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[None, False], range_num_stages=[2, 2], range_unroll_factors=[2, 4], range_warp_specializes=[]) 2026-02-21T08:59:34.2424688Z [491s] Timeout after 30s compiling Config(block_sizes=[32, 8, 256], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], num_sm_multiplier=16, num_stages=7, num_warps=1, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[True, False], range_num_stages=[2, 2], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T08:59:34.2445105Z Generation 17: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 24/24 0.7 configs/s 2026-02-21T08:59:35.6311958Z Generation 17: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 24/24 17.9 configs/s 2026-02-21T08:59:35.6318730Z [492s] Generation 17 complete: 2026-02-21T08:59:35.6318974Z error=1 2026-02-21T08:59:35.6319145Z timeout=7 2026-02-21T08:59:35.6319305Z ok=17 2026-02-21T08:59:35.6319481Z min=0.1060 2026-02-21T08:59:35.6319643Z mid=1.0622 
2026-02-21T08:59:35.6319807Z max=17.4984 2026-02-21T08:59:35.6319983Z best={'block_sizes': [32, 64, 64], 2026-02-21T08:59:35.6320305Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T08:59:35.6320648Z 'l2_groupings': [8], 2026-02-21T08:59:35.6320901Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T08:59:35.6321185Z 'loop_orders': [[1, 0]], 2026-02-21T08:59:35.6321394Z 'num_stages': 4, 2026-02-21T08:59:35.6321585Z 'num_warps': 4, 2026-02-21T08:59:35.6321776Z 'pid_type': 'flat', 2026-02-21T08:59:35.6321997Z 'range_flattens': [None, True], 2026-02-21T08:59:35.6322252Z 'range_multi_buffers': [None, False], 2026-02-21T08:59:35.6322509Z 'range_num_stages': [0, 0], 2026-02-21T08:59:35.6322745Z 'range_unroll_factors': [0, 1], 2026-02-21T08:59:35.6322985Z 'range_warp_specializes': []} 2026-02-21T08:59:35.6372558Z [492s] Fitting surrogate: 1055 points, 1055 targets 2026-02-21T08:59:36.0560199Z [493s] Generation 18 starting: 19 neighbors, 1 active search path(s) 2026-02-21T09:00:09.3022316Z [526s] Timeout after 30s compiling Config(block_sizes=[8, 8, 1024], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=16, num_stages=7, num_warps=1, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[None, False], range_num_stages=[2, 2], range_unroll_factors=[2, 4], range_warp_specializes=[]) 2026-02-21T09:00:09.3044802Z Generation 18: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 20/20 0.5 configs/s 2026-02-21T09:00:11.2708734Z Generation 18: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 20/20 10.8 configs/s 2026-02-21T09:00:11.2714644Z [528s] Generation 18 complete: 2026-02-21T09:00:11.2714903Z error=1 2026-02-21T09:00:11.2715069Z timeout=1 2026-02-21T09:00:11.2715231Z ok=19 2026-02-21T09:00:11.2715396Z min=0.1060 2026-02-21T09:00:11.2715555Z mid=3.5956 2026-02-21T09:00:11.2720085Z max=20.0289 
2026-02-21T09:00:11.2720363Z best={'block_sizes': [32, 64, 64], 2026-02-21T09:00:11.2720733Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T09:00:11.2721073Z 'l2_groupings': [8], 2026-02-21T09:00:11.2721316Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:00:11.2722335Z 'loop_orders': [[1, 0]], 2026-02-21T09:00:11.2722570Z 'num_stages': 4, 2026-02-21T09:00:11.2722764Z 'num_warps': 4, 2026-02-21T09:00:11.2722952Z 'pid_type': 'flat', 2026-02-21T09:00:11.2723180Z 'range_flattens': [None, True], 2026-02-21T09:00:11.2723440Z 'range_multi_buffers': [None, False], 2026-02-21T09:00:11.2723709Z 'range_num_stages': [0, 0], 2026-02-21T09:00:11.2723943Z 'range_unroll_factors': [0, 1], 2026-02-21T09:00:11.2724181Z 'range_warp_specializes': []} 2026-02-21T09:00:11.2769283Z [528s] Fitting surrogate: 1076 points, 1076 targets 2026-02-21T09:00:11.6909312Z [528s] Generation 19 starting: 18 neighbors, 1 active search path(s) 2026-02-21T09:00:25.0885682Z Generation 19: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19/19 1.5 configs/s 2026-02-21T09:00:26.7021887Z Generation 19: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 19/19 11.9 configs/s 2026-02-21T09:00:26.7028624Z [543s] Generation 19 complete: 2026-02-21T09:00:26.7028829Z ok=20 2026-02-21T09:00:26.7029002Z min=0.1060 2026-02-21T09:00:26.7029177Z mid=0.6860 2026-02-21T09:00:26.7029692Z max=15.6893 2026-02-21T09:00:26.7029863Z best={'block_sizes': [32, 64, 64], 2026-02-21T09:00:26.7030144Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T09:00:26.7030450Z 'l2_groupings': [8], 2026-02-21T09:00:26.7030654Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:00:26.7030904Z 'loop_orders': [[1, 0]], 2026-02-21T09:00:26.7031087Z 'num_stages': 4, 2026-02-21T09:00:26.7031251Z 'num_warps': 4, 2026-02-21T09:00:26.7031412Z 'pid_type': 'flat', 2026-02-21T09:00:26.7031595Z 'range_flattens': [None, True], 2026-02-21T09:00:26.7031812Z 'range_multi_buffers': [None, False], 
2026-02-21T09:00:26.7032038Z 'range_num_stages': [0, 0], 2026-02-21T09:00:26.7032239Z 'range_unroll_factors': [0, 1], 2026-02-21T09:00:26.7032451Z 'range_warp_specializes': []} 2026-02-21T09:00:26.7080835Z [543s] Fitting surrogate: 1096 points, 1096 targets 2026-02-21T09:00:27.1246743Z [544s] Generation 20 starting: 18 neighbors, 1 active search path(s) 2026-02-21T09:00:41.8813654Z Generation 20: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19/19 1.3 configs/s 2026-02-21T09:00:43.3735216Z Generation 20: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 19/19 13.0 configs/s 2026-02-21T09:00:43.3753274Z [560s] Generation 20 complete: 2026-02-21T09:00:43.3753515Z error=1 2026-02-21T09:00:43.3753671Z ok=19 2026-02-21T09:00:43.3753845Z min=0.1060 2026-02-21T09:00:43.3754012Z mid=0.3644 2026-02-21T09:00:43.3754178Z max=16.7483 2026-02-21T09:00:43.3754366Z best={'block_sizes': [32, 64, 64], 2026-02-21T09:00:43.3754677Z 'indexing': ['tensor_descriptor', 'pointer', 'pointer'], 2026-02-21T09:00:43.3754998Z 'l2_groupings': [8], 2026-02-21T09:00:43.3755227Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:00:43.3755509Z 'loop_orders': [[1, 0]], 2026-02-21T09:00:43.3755714Z 'num_stages': 4, 2026-02-21T09:00:43.3755902Z 'num_warps': 4, 2026-02-21T09:00:43.3756092Z 'pid_type': 'flat', 2026-02-21T09:00:43.3756312Z 'range_flattens': [None, True], 2026-02-21T09:00:43.3756913Z 'range_multi_buffers': [None, False], 2026-02-21T09:00:43.3757186Z 'range_num_stages': [0, 0], 2026-02-21T09:00:43.3757422Z 'range_unroll_factors': [0, 1], 2026-02-21T09:00:43.3757667Z 'range_warp_specializes': []} 2026-02-21T09:00:43.3796010Z [560s] Fitting surrogate: 1116 points, 1116 targets 2026-02-21T09:00:44.3499090Z [561s] Autotuning complete in 561.6s after searching 1067 configs. 
2026-02-21T09:00:44.3499557Z One can hardcode the best config and skip autotuning with: 2026-02-21T09:00:44.3501086Z @helion.kernel(config=helion.Config(block_sizes=[32, 64, 64], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[8], load_eviction_policies=['first', 'last'], loop_orders=[[1, 0]], num_stages=4, num_warps=4, pid_type='flat', range_flattens=[None, True], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 1], range_warp_specializes=[]), static_shapes=True) 2026-02-21T09:00:44.3502903Z 2026-02-21T09:00:44.3503289Z [561s] Code of selected kernel: /tmp/torchinductor_root/y3/cy3kmi6beg2sgxn2n4d5czhbr3h7ke7tqlbrgiu6klw7c3v6c7ei.py 2026-02-21T09:00:45.5394697Z WARNING:tritonbench.utils.triton_op:Completed input ID 14: 2026-02-21T09:00:45.5395107Z x_val 2026-02-21T09:00:45.5395282Z ------------------- 2026-02-21T09:00:45.5395480Z (64, 1, 7168, 8192) 2026-02-21T09:00:45.5395615Z 2026-02-21T09:00:45.5409257Z 50%|█████ | 5/10 [42:25<42:58, 515.74s/it]WARNING:tritonbench.utils.triton_op:Running input ID 17: 2026-02-21T09:00:45.5409672Z x_val 2026-02-21T09:00:45.5409828Z --------------------- 2026-02-21T09:00:45.5410017Z (1, 4096, 8192, 1024) 2026-02-21T09:00:45.5413850Z INFO:tritonbench.utils.triton_op:Took 0.26ms to get benchmark function for preprocessed_eager_int4_gemm 2026-02-21T09:00:46.7728898Z INFO:tritonbench.utils.triton_op:Took 3.58ms to get benchmark function for preprocessed_torch_compile_int4_gemm 2026-02-21T09:00:49.5313580Z Autotune Choices Stats: 2026-02-21T09:00:49.5316856Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.10675200074911118, "best_triton_pos": 1, "best_triton_time": 0.12918399274349213, "best_triton_kernel": "triton_mm_68", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 2026-02-21T09:00:49.5319865Z AUTOTUNE 
mm(4096x1024, 1024x8192) 2026-02-21T09:00:49.5320114Z strides: [1024, 1], [8192, 1] 2026-02-21T09:00:49.5320369Z dtypes: torch.bfloat16, torch.bfloat16 2026-02-21T09:00:49.5320631Z mm 0.1068 ms 100.0% 2026-02-21T09:00:49.5321303Z triton_mm_68 0.1292 ms 82.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T09:00:49.5322482Z triton_mm_67 0.1373 ms 77.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T09:00:49.5323644Z triton_mm_69 0.1435 ms 74.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2026-02-21T09:00:49.5324999Z triton_mm_62 0.1667 ms 64.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T09:00:49.5326096Z triton_mm_60 0.1845 ms 57.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T09:00:49.5327227Z triton_mm_63 0.1914 ms 55.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2026-02-21T09:00:49.5328376Z triton_mm_61 0.1918 ms 55.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2026-02-21T09:00:49.5329405Z triton_mm_64 0.1927 ms 55.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T09:00:49.5330400Z triton_mm_65 0.1928 ms 55.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, 
BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2026-02-21T09:00:49.5331226Z SingleProcess AUTOTUNE benchmarking takes 0.6329 seconds and 1.8577 seconds precompiling for 20 choices 2026-02-21T09:00:52.7947975Z INFO:tritonbench.utils.triton_op:Took 0.18ms to get benchmark function for preprocessed_triton_int4_gemm 2026-02-21T09:00:53.9628343Z WARNING:__main__:Input tensor metadata: 2026-02-21T09:00:53.9628866Z { 'args': ( { 'device': 'cuda:0', 2026-02-21T09:00:53.9629315Z 'dtype': 'torch.bfloat16', 2026-02-21T09:00:53.9629755Z 'shape': (1, 4096, 1024), 2026-02-21T09:00:53.9630617Z 'stride': (4194304, 1024, 1)}, 2026-02-21T09:00:53.9631054Z { 'device': 'cuda:0', 2026-02-21T09:00:53.9631468Z 'dtype': 'torch.int32', 2026-02-21T09:00:53.9631889Z 'shape': (1024, 8192), 2026-02-21T09:00:53.9632310Z 'stride': (8192, 1)}), 2026-02-21T09:00:53.9632693Z 'kwargs': {}} 2026-02-21T09:00:53.9666178Z INFO:tritonbench.utils.triton_op:Took 4.11ms to get benchmark function for helion_int4_gemm_tritonbench 2026-02-21T09:00:54.4328884Z [0s] Autotune random seed: 2135373392 2026-02-21T09:00:54.6039710Z [0s] Starting LFBOPatternSearch with initial_population=FROM_RANDOM, copies=5, max_generations=20, similarity_penalty=1.0 2026-02-21T09:01:39.5278955Z Initial population precompiling 100% ━━━━━━━━━━━━━━━━━━━━━ 100/100 0.5 configs/s 2026-02-21T09:01:40.1609811Z [45s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 32, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[1], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], num_sm_multiplier=32, num_stages=1, num_warps=1, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[False, False], range_num_stages=[0, 3], range_unroll_factors=[0, 1], range_warp_specializes=[]) 2026-02-21T09:01:40.1611896Z Tensor-likes are not close! 
2026-02-21T09:01:40.1612074Z 2026-02-21T09:01:40.1612198Z Mismatched elements: 33509231 / 33554432 (99.9%) 2026-02-21T09:01:40.1612634Z Greatest absolute difference: 3376.0 at index (1150, 2257) (up to 0.01 allowed) 2026-02-21T09:01:40.1613182Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:01:40.1613664Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:01:40.1613926Z 2026-02-21T09:01:46.6532354Z [52s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 16, 32], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[32], load_eviction_policies=['last', 'last'], loop_orders=[[1, 0]], num_sm_multiplier=4, num_stages=1, num_warps=2, pid_type='persistent_interleaved', range_flattens=[False, False], range_multi_buffers=[False, True], range_num_stages=[4, 1], range_unroll_factors=[0, 4], range_warp_specializes=[]) 2026-02-21T09:01:46.6534554Z Tensor-likes are not close! 2026-02-21T09:01:46.6534709Z 2026-02-21T09:01:46.6534820Z Mismatched elements: 33476315 / 33554432 (99.8%) 2026-02-21T09:01:46.6535215Z Greatest absolute difference: 2336.0 at index (1818, 918) (up to 0.01 allowed) 2026-02-21T09:01:46.6535705Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:01:46.6536134Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:01:46.6536380Z 2026-02-21T09:01:49.4938414Z [54s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 128, 512], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['first', 'first'], loop_orders=[[1, 0]], num_sm_multiplier=64, num_stages=2, num_warps=32, pid_type='persistent_blocked', range_flattens=[True, None], range_multi_buffers=[False, True], range_num_stages=[4, 2], range_unroll_factors=[4, 1], range_warp_specializes=[]) 2026-02-21T09:01:49.4940298Z Tensor-likes are not close! 
2026-02-21T09:01:49.4940471Z 2026-02-21T09:01:49.4940594Z Mismatched elements: 33506687 / 33554432 (99.9%) 2026-02-21T09:01:49.4941030Z Greatest absolute difference: 3632.0 at index (1142, 7868) (up to 0.01 allowed) 2026-02-21T09:01:49.4941598Z Greatest relative difference: inf at index (2170, 978) (up to 0.01 allowed) 2026-02-21T09:01:49.4942062Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:01:49.4942338Z 2026-02-21T09:01:58.7140089Z [64s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:01:58.7142112Z Config: @helion.kernel(config=helion.Config(block_sizes=[8, 512, 32], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[2], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=32, num_stages=7, num_warps=32, pid_type='persistent_interleaved', range_flattens=[False, True], range_multi_buffers=[False, False], range_num_stages=[3, 3], range_unroll_factors=[3, 0], range_warp_specializes=[]), static_shapes=True) 2026-02-21T09:01:58.7144441Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:01:58.7144812Z `ptxas` stderr: 2026-02-21T09:01:58.7145493Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 300 in function _helion_matmul_bf16_int4. Try to compile with register target of 46 or higher. 2026-02-21T09:01:58.7146298Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:01:58.7146803Z 2026-02-21T09:01:58.7147446Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmprlh95mtg.ptx -o /tmp/tmprlh95mtg.ptx.o 2026-02-21T09:01:58.7148096Z 2026-02-21T09:01:58.7148344Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:01:58.7148725Z 2026-02-21T09:01:58.7148731Z 2026-02-21T09:01:58.7148839Z ================================================================ 2026-02-21T09:01:58.7149104Z Internal Triton PTX codegen error 2026-02-21T09:01:58.7149344Z `ptxas` stderr: 2026-02-21T09:01:58.7149940Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 300 in function _helion_matmul_bf16_int4. Try to compile with register target of 46 or higher. 2026-02-21T09:01:58.7150640Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:01:58.7150833Z 2026-02-21T09:01:58.7151380Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmprlh95mtg.ptx -o /tmp/tmprlh95mtg.ptx.o 2026-02-21T09:01:58.7152005Z 2026-02-21T09:01:58.7152009Z 2026-02-21T09:01:58.7152069Z // 2026-02-21T09:01:58.7152230Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:01:58.7152435Z // 2026-02-21T09:01:58.7152519Z 2026-02-21T09:01:58.7152582Z .version 8.7 2026-02-21T09:01:58.7152737Z .target sm_90a 2026-02-21T09:01:58.7152889Z .address_size 64 2026-02-21T09:01:58.7152987Z 2026-02-21T09:01:58.7153347Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T09:01:58.7153725Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:01:58.7154002Z // @_helion_matmul_bf16_int4 2026-02-21T09:01:58.7154277Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T09:01:58.7154599Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T09:01:58.7154983Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T09:01:58.7155350Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T09:01:58.7155722Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T09:01:58.7156086Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T09:01:58.7156683Z ) 
2026-02-21T09:01:58.7156849Z .reqntid 1024 2026-02-21T09:01:58.7157009Z .maxnreg 32 2026-02-21T09:01:58.7157151Z { 2026-02-21T09:01:58.7157296Z .reg .pred %p<34>; 2026-02-21T09:01:58.7157472Z .reg .b16 %rs<59>; 2026-02-21T09:01:58.7157635Z .reg .b32 %r<1170>; 2026-02-21T09:01:58.7157804Z .reg .b64 %rd<144>; 2026-02-21T09:01:58.7158115Z .loc 1 14 0 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:14:0 2026-02-21T09:01:58.7158481Z $L__func_begin0: 2026-02-21T09:01:58.7158774Z .loc 1 14 0 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:14:0 2026-02-21T09:01:58.7159080Z 2026-02-21T09:01:58.7159136Z // %bb.0: 2026-02-21T09:01:58.7159343Z ld.param.b64 %rd34, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T09:01:58.7159653Z ld.param.b64 %rd33, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T09:01:58.7159952Z ld.param.b64 %rd32, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T09:01:58.7160286Z $L__tmp0: 2026-02-21T09:01:58.7160583Z .loc 1 19 46 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:19:46 2026-02-21T09:01:58.7160939Z mov.u32 %r1089, %ctaid.x; 2026-02-21T09:01:58.7161393Z .loc 1 0 0 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:0 2026-02-21T09:01:58.7161738Z sub.s32 %r224, 6271, %r1089; 2026-02-21T09:01:58.7162063Z .loc 1 19 144 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:19:144 2026-02-21T09:01:58.7162431Z mul.hi.u32 %r225, %r224, 1041204193; 2026-02-21T09:01:58.7162632Z shr.u32 %r226, %r225, 10; 2026-02-21T09:01:58.7162809Z mul.hi.u32 %r227, %r226, 1431655766; 2026-02-21T09:01:58.7163012Z mad.lo.s32 %r1150, %r227, 12672, %r1089; 2026-02-21T09:01:58.7163358Z .loc 1 31 45 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:31:45 2026-02-21T09:01:58.7163707Z mov.u32 %r3, %tid.x; 2026-02-21T09:01:58.7163869Z shr.u32 %r4, %r3, 5; 2026-02-21T09:01:58.7164031Z shr.u32 %r5, %r3, 2; 2026-02-21T09:01:58.7164267Z or.b32 %r6, %r5, 256; 2026-02-21T09:01:58.7164582Z .loc 1 33 45 // 
ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:33:45 2026-02-21T09:01:58.7164926Z and.b32 %r7, %r3, 31; 2026-02-21T09:01:58.7165088Z and.b32 %r8, %r3, 3; 2026-02-21T09:01:58.7165239Z shl.b32 %r9, %r8, 3; 2026-02-21T09:01:58.7165536Z .loc 1 41 48 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:41:48 2026-02-21T09:01:58.7165888Z bfe.u32 %r10, %r3, 6, 3; 2026-02-21T09:01:58.7166190Z .loc 1 47 38 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:47:38 2026-02-21T09:01:58.7166668Z shl.b32 %r11, %r8, 2; 2026-02-21T09:01:58.7166972Z .loc 1 65 38 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:65:38 2026-02-21T09:01:58.7167324Z and.b32 %r12, %r3, 32; 2026-02-21T09:01:58.7167640Z .loc 1 19 144 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:19:144 2026-02-21T09:01:58.7168008Z setp.ge.s32 %p1, %r1089, %r1150; 2026-02-21T09:01:58.7168208Z shl.b32 %r1078, %r3, 3; 2026-02-21T09:01:58.7168477Z bfe.s32 %r1079, %r3, 4, 1; 2026-02-21T09:01:58.7168664Z mov.b32 %r1080, global_smem; 2026-02-21T09:01:58.7168839Z shl.b32 %r1081, %r3, 4; 2026-02-21T09:01:58.7169015Z shl.b32 %r1082, %r8, 1; 2026-02-21T09:01:58.7169182Z shl.b32 %r1083, %r7, 6; 2026-02-21T09:01:58.7169345Z shr.u32 %r1084, %r3, 3; 2026-02-21T09:01:58.7169507Z shl.b32 %r1085, %r3, 10; 2026-02-21T09:01:58.7169680Z shl.b32 %r1086, %r8, 5; 2026-02-21T09:01:58.7169840Z shl.b32 %r1087, %r3, 2; 2026-02-21T09:01:58.7170012Z shl.b32 %r1088, %r5, 10; 2026-02-21T09:01:58.7170188Z setp.eq.b32 %p33, %r12, 0; 2026-02-21T09:01:58.7170372Z cvt.u64.u32 %rd134, %r11; 2026-02-21T09:01:58.7170551Z @%p1 bra $L__BB0_9; 2026-02-21T09:01:58.7170729Z // %bb.1: // %.lr.ph 2026-02-21T09:01:58.7171184Z .loc 1 0 144 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:0:144 2026-02-21T09:01:58.7171551Z and.b32 %r229, %r1078, 8056; 2026-02-21T09:01:58.7171737Z and.b32 %r231, %r1079, 136; 2026-02-21T09:01:58.7171921Z xor.b32 %r13, %r231, %r229; 2026-02-21T09:01:58.7172099Z add.s32 %r14, %r1080, 
%r13; 2026-02-21T09:01:58.7172281Z add.s32 %r15, %r14, 8192; 2026-02-21T09:01:58.7172454Z add.s32 %r16, %r14, 16384; 2026-02-21T09:01:58.7172627Z add.s32 %r17, %r14, 24576; 2026-02-21T09:01:58.7172801Z and.b32 %r234, %r1081, 15872; 2026-02-21T09:01:58.7172984Z and.b32 %r235, %r1078, 96; 2026-02-21T09:01:58.7173154Z or.b32 %r237, %r234, %r235; 2026-02-21T09:01:58.7173331Z or.b32 %r238, %r237, %r1082; 2026-02-21T09:01:58.7173503Z or.b32 %r18, %r238, %r231; 2026-02-21T09:01:58.7173677Z xor.b32 %r19, %r18, 8; 2026-02-21T09:01:58.7173837Z and.b32 %r240, %r1078, 48; 2026-02-21T09:01:58.7174006Z and.b32 %r242, %r1084, 60; 2026-02-21T09:01:58.7174269Z xor.b32 %r243, %r240, %r242; 2026-02-21T09:01:58.7174447Z add.s32 %r244, %r1080, 32768; 2026-02-21T09:01:58.7174629Z add.s32 %r245, %r244, %r1083; 2026-02-21T09:01:58.7174807Z add.s32 %r20, %r245, %r243; 2026-02-21T09:01:58.7174991Z bfe.u32 %r246, %r244, 4, 14; 2026-02-21T09:01:58.7175163Z cvt.u64.u32 %rd35, %r246; 2026-02-21T09:01:58.7175356Z or.b64 %rd1, %rd35, -9223371899407433728; 2026-02-21T09:01:58.7175567Z add.s32 %r247, %r1080, 32800; 2026-02-21T09:01:58.7175750Z bfe.u32 %r248, %r247, 4, 14; 2026-02-21T09:01:58.7175922Z cvt.u64.u32 %rd36, %r248; 2026-02-21T09:01:58.7176110Z or.b64 %rd2, %rd36, -9223371899407433728; 2026-02-21T09:01:58.7176318Z and.b32 %r250, %r1085, 24576; 2026-02-21T09:01:58.7176629Z and.b32 %r252, %r3, 992; 2026-02-21T09:01:58.7176809Z shl.b32 %r253, %r252, 3; 2026-02-21T09:01:58.7176982Z and.b32 %r255, %r1087, 112; 2026-02-21T09:01:58.7177164Z or.b32 %r256, %r1086, %r253; 2026-02-21T09:01:58.7177337Z xor.b32 %r257, %r256, %r255; 2026-02-21T09:01:58.7177520Z add.s32 %r258, %r1080, %r250; 2026-02-21T09:01:58.7177769Z add.s32 %r21, %r258, %r257; 2026-02-21T09:01:58.7177952Z and.b32 %r259, %r3, 6; 2026-02-21T09:01:58.7178128Z shl.b32 %r260, %r259, 12; 2026-02-21T09:01:58.7178308Z and.b32 %r261, %r1081, 112; 2026-02-21T09:01:58.7178486Z and.b32 %r262, %r1087, 4064; 2026-02-21T09:01:58.7178657Z or.b32 
%r263, %r260, %r261; 2026-02-21T09:01:58.7178833Z xor.b32 %r264, %r263, %r262; 2026-02-21T09:01:58.7179003Z add.s32 %r438, %r1080, %r264; 2026-02-21T09:01:58.7179179Z add.s32 %r443, %r438, 4096; 2026-02-21T09:01:58.7179348Z shl.b32 %r265, %r252, 4; 2026-02-21T09:01:58.7179522Z or.b32 %r266, %r265, %r235; 2026-02-21T09:01:58.7179695Z or.b32 %r267, %r266, %r1082; 2026-02-21T09:01:58.7179875Z or.b32 %r24, %r267, %r231; 2026-02-21T09:01:58.7180053Z xor.b32 %r25, %r24, 8; 2026-02-21T09:01:58.7180229Z shl.b32 %r268, %r259, 3; 2026-02-21T09:01:58.7180405Z xor.b32 %r269, %r268, %r242; 2026-02-21T09:01:58.7180579Z add.s32 %r26, %r245, %r269; 2026-02-21T09:01:58.7180913Z .loc 1 19 144 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:19:144 2026-02-21T09:01:58.7181280Z mad.wide.u32 %rd37, %r8, 8, %rd32; 2026-02-21T09:01:58.7181582Z add.s64 %rd3, %rd37, 64; 2026-02-21T09:01:58.7181893Z .loc 1 40 102 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:40:102 2026-02-21T09:01:58.7182247Z or.b32 %r271, %r1088, %r11; 2026-02-21T09:01:58.7182428Z or.b32 %r28, %r271, 262176; 2026-02-21T09:01:58.7182739Z .loc 1 19 144 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:19:144 2026-02-21T09:01:58.7183090Z shl.b32 %r272, %r10, 13; 2026-02-21T09:01:58.7183390Z .loc 1 40 102 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:40:102 2026-02-21T09:01:58.7183743Z or.b32 %r29, %r272, %r7; 2026-02-21T09:01:58.7183958Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:01:58.7184326Z // Child Loop BB0_3 Depth 2 2026-02-21T09:01:58.7184602Z // Child Loop BB0_5 Depth 2 2026-02-21T09:01:58.7184859Z // Child Loop BB0_7 Depth 2 2026-02-21T09:01:58.7185249Z .loc 1 25 35 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:25:35 2026-02-21T09:01:58.7185606Z shr.s32 %r284, %r1089, 31; 2026-02-21T09:01:58.7185792Z shr.u32 %r285, %r284, 23; 2026-02-21T09:01:58.7185969Z add.s32 %r286, %r1089, %r285; 2026-02-21T09:01:58.7186153Z shr.s32 %r287, %r286, 9; 
2026-02-21T09:01:58.7186596Z .loc 1 26 33 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:26:33 2026-02-21T09:01:58.7186977Z shl.b32 %r288, %r287, 1; 2026-02-21T09:01:58.7187291Z .loc 1 27 39 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:27:39 2026-02-21T09:01:58.7187643Z sub.s32 %r289, 8, %r288; 2026-02-21T09:01:58.7188064Z .loc 1 27 52 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:27:52 2026-02-21T09:01:58.7188506Z min.s32 %r290, %r289, 2; 2026-02-21T09:01:58.7188822Z .loc 1 28 45 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:28:45 2026-02-21T09:01:58.7189184Z and.b32 %r291, %r286, -512; 2026-02-21T09:01:58.7189368Z sub.s32 %r292, %r1089, %r291; 2026-02-21T09:01:58.7189690Z .loc 1 29 51 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:29:51 2026-02-21T09:01:58.7190038Z div.s32 %r293, %r292, %r290; 2026-02-21T09:01:58.7190354Z .loc 1 28 64 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:28:64 2026-02-21T09:01:58.7190704Z mul.lo.s32 %r294, %r293, %r290; 2026-02-21T09:01:58.7190903Z sub.s32 %r295, %r292, %r294; 2026-02-21T09:01:58.7191225Z .loc 1 28 30 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:28:30 2026-02-21T09:01:58.7191582Z add.s32 %r296, %r295, %r288; 2026-02-21T09:01:58.7191986Z .loc 1 30 27 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:30:27 2026-02-21T09:01:58.7192338Z shl.b32 %r297, %r296, 9; 2026-02-21T09:01:58.7192673Z .loc 1 31 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:31:32 2026-02-21T09:01:58.7193018Z or.b32 %r44, %r297, %r5; 2026-02-21T09:01:58.7193194Z or.b32 %r45, %r297, %r6; 2026-02-21T09:01:58.7193502Z .loc 1 32 27 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:32:27 2026-02-21T09:01:58.7193846Z shl.b32 %r46, %r293, 5; 2026-02-21T09:01:58.7194154Z .loc 1 48 53 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:53 2026-02-21T09:01:58.7203066Z shl.b32 %r298, %r44, 10; 2026-02-21T09:01:58.7203335Z shl.b32 
%r299, %r45, 10; 2026-02-21T09:01:58.7203688Z .loc 1 48 60 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:60 2026-02-21T09:01:58.7204084Z or.b32 %r300, %r298, %r11; 2026-02-21T09:01:58.7204274Z or.b32 %r301, %r299, %r11; 2026-02-21T09:01:58.7204633Z .loc 1 48 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:32 2026-02-21T09:01:58.7205173Z mad.wide.s32 %rd38, %r300, 2, %rd32; 2026-02-21T09:01:58.7205389Z mad.wide.s32 %rd39, %r301, 2, %rd32; 2026-02-21T09:01:58.7205736Z .loc 1 48 80 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:80 2026-02-21T09:01:58.7206081Z bar.sync 0; 2026-02-21T09:01:58.7206237Z mov.b32 %r274, 8; 2026-02-21T09:01:58.7206394Z // begin inline asm 2026-02-21T09:01:58.7206811Z cp.async.ca.shared.global [ %r14 + 0 ], [ %rd38 + 0 ], 0x8, %r274; 2026-02-21T09:01:58.7207088Z // end inline asm 2026-02-21T09:01:58.7207251Z // begin inline asm 2026-02-21T09:01:58.7207486Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd39 + 0 ], 0x8, %r274; 2026-02-21T09:01:58.7207751Z // end inline asm 2026-02-21T09:01:58.7208019Z cp.async.commit_group; 2026-02-21T09:01:58.7208349Z .loc 1 48 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:32 2026-02-21T09:01:58.7208713Z cvt.s64.s32 %rd43, %r298; 2026-02-21T09:01:58.7208908Z or.b64 %rd44, %rd43, %rd134; 2026-02-21T09:01:58.7209102Z shl.b64 %rd45, %rd44, 1; 2026-02-21T09:01:58.7209281Z add.s64 %rd46, %rd32, %rd45; 2026-02-21T09:01:58.7209469Z add.s64 %rd40, %rd46, 32; 2026-02-21T09:01:58.7209663Z cvt.s64.s32 %rd47, %r299; 2026-02-21T09:01:58.7209846Z or.b64 %rd48, %rd47, %rd134; 2026-02-21T09:01:58.7210039Z shl.b64 %rd49, %rd48, 1; 2026-02-21T09:01:58.7210214Z add.s64 %rd50, %rd32, %rd49; 2026-02-21T09:01:58.7210392Z add.s64 %rd41, %rd50, 32; 2026-02-21T09:01:58.7210726Z .loc 1 48 80 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:80 2026-02-21T09:01:58.7211096Z bar.sync 0; 2026-02-21T09:01:58.7211250Z // begin inline asm 2026-02-21T09:01:58.7211505Z 
cp.async.ca.shared.global [ %r16 + 0 ], [ %rd40 + 0 ], 0x8, %r274; 2026-02-21T09:01:58.7211883Z // end inline asm 2026-02-21T09:01:58.7212048Z // begin inline asm 2026-02-21T09:01:58.7212277Z cp.async.ca.shared.global [ %r17 + 0 ], [ %rd41 + 0 ], 0x8, %r274; 2026-02-21T09:01:58.7212556Z // end inline asm 2026-02-21T09:01:58.7212720Z cp.async.commit_group; 2026-02-21T09:01:58.7213055Z .loc 1 40 102 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:40:102 2026-02-21T09:01:58.7213429Z shl.b32 %r302, %r296, 19; 2026-02-21T09:01:58.7213607Z or.b32 %r303, %r1088, %r302; 2026-02-21T09:01:58.7213807Z mad.wide.s32 %rd135, %r303, 2, %rd3; 2026-02-21T09:01:58.7214016Z shl.b32 %r304, %r1089, 19; 2026-02-21T09:01:58.7214200Z or.b32 %r305, %r28, %r304; 2026-02-21T09:01:58.7214375Z shl.b32 %r306, %r294, 19; 2026-02-21T09:01:58.7214559Z sub.s32 %r307, %r305, %r306; 2026-02-21T09:01:58.7214747Z mul.lo.s32 %r308, %r287, 267386880; 2026-02-21T09:01:58.7214968Z sub.s32 %r1091, %r307, %r308; 2026-02-21T09:01:58.7215167Z add.s32 %r1090, %r29, %r46; 2026-02-21T09:01:58.7215434Z mov.b32 %r1094, 0f00000000; 2026-02-21T09:01:58.7215624Z mov.b32 %r1093, 1; 2026-02-21T09:01:58.7215785Z mov.b32 %r1092, -1; 2026-02-21T09:01:58.7215956Z mov.b64 %rd136, -8; 2026-02-21T09:01:58.7216118Z mov.b32 %r1095, %r1094; 2026-02-21T09:01:58.7216294Z mov.b32 %r1096, %r1094; 2026-02-21T09:01:58.7216580Z mov.b32 %r1097, %r1094; 2026-02-21T09:01:58.7216760Z mov.b32 %r1098, %r1094; 2026-02-21T09:01:58.7216921Z mov.b32 %r1099, %r1094; 2026-02-21T09:01:58.7217092Z mov.b32 %r1100, %r1094; 2026-02-21T09:01:58.7217264Z mov.b32 %r1101, %r1094; 2026-02-21T09:01:58.7217425Z mov.b32 %r1102, %r1094; 2026-02-21T09:01:58.7217599Z mov.b32 %r1103, %r1094; 2026-02-21T09:01:58.7217763Z mov.b32 %r1104, %r1094; 2026-02-21T09:01:58.7217936Z mov.b32 %r1105, %r1094; 2026-02-21T09:01:58.7218097Z mov.b32 %r1106, %r1094; 2026-02-21T09:01:58.7218265Z mov.b32 %r1107, %r1094; 2026-02-21T09:01:58.7218429Z mov.b32 %r1108, %r1094; 
2026-02-21T09:01:58.7218600Z mov.b32 %r1109, %r1094; 2026-02-21T09:01:58.7218821Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T09:01:58.7219131Z // => This Inner Loop Header: Depth=2 2026-02-21T09:01:58.7219503Z add.s64 %rd136, %rd136, 8; 2026-02-21T09:01:58.7219697Z setp.lt.u64 %p5, %rd136, 496; 2026-02-21T09:01:58.7219897Z add.s32 %r423, %r1092, 1; 2026-02-21T09:01:58.7220080Z setp.gt.s32 %p6, %r423, 1; 2026-02-21T09:01:58.7220275Z selp.b32 %r1092, 0, %r423, %p6; 2026-02-21T09:01:58.7220618Z .loc 1 48 80 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:80 2026-02-21T09:01:58.7221004Z cp.async.wait_group 1; 2026-02-21T09:01:58.7221179Z bar.sync 0; 2026-02-21T09:01:58.7221339Z shl.b32 %r424, %r1092, 14; 2026-02-21T09:01:58.7221527Z add.s32 %r426, %r1080, %r424; 2026-02-21T09:01:58.7221856Z .loc 1 52 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:52:32 2026-02-21T09:01:58.7222296Z add.s32 %r427, %r426, %r18; 2026-02-21T09:01:58.7222503Z ld.shared.b16 %rs2, [%r427]; 2026-02-21T09:01:58.7222708Z ld.shared.b16 %rs3, [%r427+256]; 2026-02-21T09:01:58.7222914Z ld.shared.b16 %rs4, [%r427+16]; 2026-02-21T09:01:58.7223122Z ld.shared.b16 %rs5, [%r427+272]; 2026-02-21T09:01:58.7223320Z add.s32 %r428, %r426, %r19; 2026-02-21T09:01:58.7223523Z ld.shared.b16 %rs6, [%r428]; 2026-02-21T09:01:58.7223726Z ld.shared.b16 %rs7, [%r428+256]; 2026-02-21T09:01:58.7223926Z ld.shared.b16 %rs8, [%r428+16]; 2026-02-21T09:01:58.7224125Z ld.shared.b16 %rs9, [%r428+272]; 2026-02-21T09:01:58.7224319Z cvt.f32.bf16 %r341, %rs2; 2026-02-21T09:01:58.7224506Z cvt.f32.bf16 %r342, %rs3; 2026-02-21T09:01:58.7224681Z cvt.f32.bf16 %r343, %rs6; 2026-02-21T09:01:58.7224861Z cvt.f32.bf16 %r344, %rs7; 2026-02-21T09:01:58.7225034Z cvt.f32.bf16 %r377, %rs4; 2026-02-21T09:01:58.7225216Z cvt.f32.bf16 %r378, %rs5; 2026-02-21T09:01:58.7225490Z cvt.f32.bf16 %r379, %rs8; 2026-02-21T09:01:58.7225664Z cvt.f32.bf16 %r380, %rs9; 2026-02-21T09:01:58.7225996Z .loc 1 54 34 // 
ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:54:34 2026-02-21T09:01:58.7226360Z cvt.s64.s32 %rd58, %r1090; 2026-02-21T09:01:58.7226684Z add.s64 %rd52, %rd33, %rd58; 2026-02-21T09:01:58.7227007Z .loc 1 54 87 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:54:87 2026-02-21T09:01:58.7227362Z // begin inline asm 2026-02-21T09:01:58.7227523Z mov.u64 %rd51, 0x0; 2026-02-21T09:01:58.7227773Z createpolicy.fractional.L2::evict_last.b64 %rd51, 1.0; 2026-02-21T09:01:58.7228034Z // end inline asm 2026-02-21T09:01:58.7228187Z // begin inline asm 2026-02-21T09:01:58.7228427Z mov.u16 %rs1, 0x0; 2026-02-21T09:01:58.7228680Z ld.global.L1::evict_last.L2::cache_hint.b8 { %rs1 }, [ %rd52 + 0 ], %rd51; 2026-02-21T09:01:58.7228981Z // end inline asm 2026-02-21T09:01:58.7229287Z .loc 1 57 28 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:57:28 2026-02-21T09:01:58.7229752Z shl.b16 %rs10, %rs1, 4; 2026-02-21T09:01:58.7230079Z .loc 1 72 58 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:72:58 2026-02-21T09:01:58.7230453Z selp.b16 %rs11, %rs10, %rs1, %p33; 2026-02-21T09:01:58.7230668Z cvt.s16.s8 %rs12, %rs11; 2026-02-21T09:01:58.7230843Z shr.s16 %rs13, %rs12, 4; 2026-02-21T09:01:58.7231160Z .loc 1 77 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:77:32 2026-02-21T09:01:58.7231513Z cvt.rn.f32.s16 %r429, %rs13; 2026-02-21T09:01:58.7231708Z st.shared.b32 [%r20], %r429; 2026-02-21T09:01:58.7231881Z $L__tmp1: 2026-02-21T09:01:58.7232244Z .loc 2 291 36 // standard.py:291:36 @[ ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:84:40 ] 2026-02-21T09:01:58.7232669Z // begin inline asm 2026-02-21T09:01:58.7232849Z fence.proxy.async.shared::cta; 2026-02-21T09:01:58.7233052Z // end inline asm 2026-02-21T09:01:58.7233206Z bar.sync 0; 2026-02-21T09:01:58.7233380Z shfl.sync.idx.b32 %r430, %r4, 0, 31, -1; 2026-02-21T09:01:58.7233605Z wgmma.fence.sync.aligned; 2026-02-21T09:01:58.7233798Z mov.pred %p2, -1; 2026-02-21T09:01:58.7234053Z // begin 
inline asm 2026-02-21T09:01:58.7234668Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1094,%r1095,%r1096,%r1097,%r1098,%r1099,%r1100,%r1101,%r1102,%r1103,%r1104,%r1105,%r1106,%r1107,%r1108,%r1109}, {%r341,%r342,%r343,%r344}, %rd1, %p2, 1, 1; 2026-02-21T09:01:58.7235312Z // end inline asm 2026-02-21T09:01:58.7235468Z // begin inline asm 2026-02-21T09:01:58.7236057Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1094,%r1095,%r1096,%r1097,%r1098,%r1099,%r1100,%r1101,%r1102,%r1103,%r1104,%r1105,%r1106,%r1107,%r1108,%r1109}, {%r377,%r378,%r379,%r380}, %rd2, %p2, 1, 1; 2026-02-21T09:01:58.7236832Z // end inline asm 2026-02-21T09:01:58.7237021Z wgmma.commit_group.sync.aligned; 2026-02-21T09:01:58.7237223Z mov.b32 %r399, 0; 2026-02-21T09:01:58.7237473Z mov.b32 %r398, %r399; 2026-02-21T09:01:58.7237662Z mov.b32 %r397, %r244; 2026-02-21T09:01:58.7237832Z // begin inline asm 2026-02-21T09:01:58.7238243Z // wait for regs: %r1094,%r1095,%r1096,%r1097,%r1098,%r1099,%r1100,%r1101,%r1102,%r1103,%r1104,%r1105,%r1106,%r1107,%r1108,%r1109,%r397,%r398,%r399 2026-02-21T09:01:58.7238711Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:01:58.7238922Z // end inline asm 2026-02-21T09:01:58.7239072Z $L__tmp2: 2026-02-21T09:01:58.7239378Z .loc 1 40 102 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:40:102 2026-02-21T09:01:58.7239744Z add.s32 %r431, %r1093, 1; 2026-02-21T09:01:58.7239936Z setp.gt.s32 %p7, %r431, 1; 2026-02-21T09:01:58.7240131Z selp.b32 %r1093, 0, %r431, %p7; 2026-02-21T09:01:58.7240465Z .loc 1 48 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:32 2026-02-21T09:01:58.7240837Z mad.wide.s32 %rd57, %r1091, 2, %rd32; 2026-02-21T09:01:58.7241285Z .loc 1 48 80 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:80 2026-02-21T09:01:58.7241648Z shl.b32 %r432, %r1093, 14; 2026-02-21T09:01:58.7241828Z add.s32 %r433, %r1080, %r432; 2026-02-21T09:01:58.7242025Z add.s32 %r419, %r433, %r13; 2026-02-21T09:01:58.7242213Z selp.b32 %r420, 
8, 0, %p5; 2026-02-21T09:01:58.7242389Z // begin inline asm 2026-02-21T09:01:58.7242638Z cp.async.ca.shared.global [ %r419 + 0 ], [ %rd135 + 0 ], 0x8, %r420; 2026-02-21T09:01:58.7242917Z // end inline asm 2026-02-21T09:01:58.7243083Z add.s32 %r421, %r419, 8192; 2026-02-21T09:01:58.7243261Z // begin inline asm 2026-02-21T09:01:58.7243499Z cp.async.ca.shared.global [ %r421 + 0 ], [ %rd57 + 0 ], 0x8, %r420; 2026-02-21T09:01:58.7243803Z // end inline asm 2026-02-21T09:01:58.7243976Z cp.async.commit_group; 2026-02-21T09:01:58.7244311Z .loc 1 40 102 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:40:102 2026-02-21T09:01:58.7244677Z add.s64 %rd135, %rd135, 32; 2026-02-21T09:01:58.7244872Z add.s32 %r1091, %r1091, 16; 2026-02-21T09:01:58.7245127Z add.s32 %r1090, %r1090, 65536; 2026-02-21T09:01:58.7245341Z setp.lt.u64 %p8, %rd136, 504; 2026-02-21T09:01:58.7245528Z @%p8 bra $L__BB0_3; 2026-02-21T09:01:58.7245754Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:01:58.7246167Z .loc 1 33 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:33:32 2026-02-21T09:01:58.7246667Z or.b32 %r463, %r46, %r9; 2026-02-21T09:01:58.7247003Z .loc 1 40 102 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:40:102 2026-02-21T09:01:58.7247364Z cp.async.wait_group 0; 2026-02-21T09:01:58.7247550Z bar.sync 0; 2026-02-21T09:01:58.7247843Z .loc 1 87 28 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:87:28 2026-02-21T09:01:58.7248219Z cvt.rn.bf16x2.f32 %r464, %r1095, %r1094; 2026-02-21T09:01:58.7248443Z cvt.rn.bf16x2.f32 %r465, %r1097, %r1096; 2026-02-21T09:01:58.7248696Z cvt.rn.bf16x2.f32 %r466, %r1099, %r1098; 2026-02-21T09:01:58.7248925Z cvt.rn.bf16x2.f32 %r467, %r1101, %r1100; 2026-02-21T09:01:58.7249144Z cvt.rn.bf16x2.f32 %r468, %r1103, %r1102; 2026-02-21T09:01:58.7249458Z cvt.rn.bf16x2.f32 %r469, %r1105, %r1104; 2026-02-21T09:01:58.7249670Z cvt.rn.bf16x2.f32 %r470, %r1107, %r1106; 2026-02-21T09:01:58.7249886Z cvt.rn.bf16x2.f32 %r471, %r1109, %r1108; 
2026-02-21T09:01:58.7250227Z .loc 1 88 43 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:88:43 2026-02-21T09:01:58.7250584Z shl.b32 %r472, %r44, 13; 2026-02-21T09:01:58.7250761Z shl.b32 %r473, %r45, 13; 2026-02-21T09:01:58.7251067Z .loc 1 88 50 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:88:50 2026-02-21T09:01:58.7251426Z add.s32 %r474, %r472, %r463; 2026-02-21T09:01:58.7251610Z add.s32 %r475, %r473, %r463; 2026-02-21T09:01:58.7252024Z .loc 1 88 22 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:88:22 2026-02-21T09:01:58.7252395Z mad.wide.s32 %rd59, %r474, 2, %rd34; 2026-02-21T09:01:58.7252609Z mad.wide.s32 %rd60, %r475, 2, %rd34; 2026-02-21T09:01:58.7252948Z .loc 1 88 81 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:88:81 2026-02-21T09:01:58.7253343Z st.shared.v4.b32 [%r21], {%r464, %r466, %r468, %r470}; 2026-02-21T09:01:58.7253640Z st.shared.v4.b32 [%r21+128], {%r465, %r467, %r469, %r471}; 2026-02-21T09:01:58.7253887Z bar.sync 0; 2026-02-21T09:01:58.7254039Z // begin inline asm 2026-02-21T09:01:58.7254320Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r444, %r445, %r446, %r447}, [%r438]; 2026-02-21T09:01:58.7254649Z // end inline asm 2026-02-21T09:01:58.7254804Z // begin inline asm 2026-02-21T09:01:58.7255081Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r448, %r449, %r450, %r451}, [%r443]; 2026-02-21T09:01:58.7255406Z // end inline asm 2026-02-21T09:01:58.7255557Z // begin inline asm 2026-02-21T09:01:58.7255888Z st.global.v4.b32 [ %rd59 + 0 ], { %r444, %r445, %r446, %r447 }; 2026-02-21T09:01:58.7256148Z // end inline asm 2026-02-21T09:01:58.7256306Z // begin inline asm 2026-02-21T09:01:58.7256638Z st.global.v4.b32 [ %rd60 + 0 ], { %r448, %r449, %r450, %r451 }; 2026-02-21T09:01:58.7256908Z // end inline asm 2026-02-21T09:01:58.7257216Z .loc 1 19 144 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:19:144 2026-02-21T09:01:58.7257589Z add.s32 %r476, %r1089, 4224; 2026-02-21T09:01:58.7257914Z .loc 1 25 35 // 
ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:25:35 2026-02-21T09:01:58.7258266Z shr.s32 %r477, %r476, 31; 2026-02-21T09:01:58.7258454Z shr.u32 %r478, %r477, 23; 2026-02-21T09:01:58.7258628Z add.s32 %r479, %r476, %r478; 2026-02-21T09:01:58.7258813Z shr.s32 %r480, %r479, 9; 2026-02-21T09:01:58.7259123Z .loc 1 26 33 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:26:33 2026-02-21T09:01:58.7259483Z shl.b32 %r481, %r480, 1; 2026-02-21T09:01:58.7259891Z .loc 1 27 39 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:27:39 2026-02-21T09:01:58.7260249Z sub.s32 %r482, 8, %r481; 2026-02-21T09:01:58.7260568Z .loc 1 27 52 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:27:52 2026-02-21T09:01:58.7260915Z min.s32 %r483, %r482, 2; 2026-02-21T09:01:58.7261225Z .loc 1 28 45 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:28:45 2026-02-21T09:01:58.7261577Z and.b32 %r484, %r479, -512; 2026-02-21T09:01:58.7261771Z sub.s32 %r485, %r476, %r484; 2026-02-21T09:01:58.7262094Z .loc 1 29 51 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:29:51 2026-02-21T09:01:58.7262442Z div.s32 %r486, %r485, %r483; 2026-02-21T09:01:58.7262773Z .loc 1 28 64 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:28:64 2026-02-21T09:01:58.7263137Z mul.lo.s32 %r487, %r486, %r483; 2026-02-21T09:01:58.7263346Z sub.s32 %r488, %r485, %r487; 2026-02-21T09:01:58.7263669Z .loc 1 28 30 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:28:30 2026-02-21T09:01:58.7264120Z add.s32 %r489, %r488, %r481; 2026-02-21T09:01:58.7264449Z .loc 1 30 27 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:30:27 2026-02-21T09:01:58.7264805Z shl.b32 %r490, %r489, 9; 2026-02-21T09:01:58.7265140Z .loc 1 31 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:31:32 2026-02-21T09:01:58.7265489Z or.b32 %r89, %r490, %r5; 2026-02-21T09:01:58.7265666Z or.b32 %r90, %r490, %r6; 2026-02-21T09:01:58.7265973Z .loc 1 32 27 // 
ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:32:27 2026-02-21T09:01:58.7266323Z shl.b32 %r91, %r486, 5; 2026-02-21T09:01:58.7266760Z .loc 1 48 53 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:53 2026-02-21T09:01:58.7267192Z shl.b32 %r491, %r89, 10; 2026-02-21T09:01:58.7267382Z shl.b32 %r492, %r90, 10; 2026-02-21T09:01:58.7267688Z .loc 1 48 60 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:60 2026-02-21T09:01:58.7268046Z or.b32 %r493, %r491, %r11; 2026-02-21T09:01:58.7268226Z or.b32 %r494, %r492, %r11; 2026-02-21T09:01:58.7268614Z .loc 1 48 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:32 2026-02-21T09:01:58.7268976Z mad.wide.s32 %rd61, %r493, 2, %rd32; 2026-02-21T09:01:58.7269187Z mad.wide.s32 %rd62, %r494, 2, %rd32; 2026-02-21T09:01:58.7269526Z .loc 1 48 80 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:80 2026-02-21T09:01:58.7269872Z bar.sync 0; 2026-02-21T09:01:58.7270022Z mov.b32 %r453, 8; 2026-02-21T09:01:58.7270178Z // begin inline asm 2026-02-21T09:01:58.7270426Z cp.async.ca.shared.global [ %r14 + 0 ], [ %rd61 + 0 ], 0x8, %r453; 2026-02-21T09:01:58.7270796Z // end inline asm 2026-02-21T09:01:58.7270961Z // begin inline asm 2026-02-21T09:01:58.7271200Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd62 + 0 ], 0x8, %r453; 2026-02-21T09:01:58.7271471Z // end inline asm 2026-02-21T09:01:58.7271641Z cp.async.commit_group; 2026-02-21T09:01:58.7271955Z .loc 1 48 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:32 2026-02-21T09:01:58.7272320Z cvt.s64.s32 %rd66, %r491; 2026-02-21T09:01:58.7272504Z or.b64 %rd67, %rd66, %rd134; 2026-02-21T09:01:58.7272696Z shl.b64 %rd68, %rd67, 1; 2026-02-21T09:01:58.7272874Z add.s64 %rd69, %rd32, %rd68; 2026-02-21T09:01:58.7273060Z add.s64 %rd63, %rd69, 32; 2026-02-21T09:01:58.7273245Z cvt.s64.s32 %rd70, %r492; 2026-02-21T09:01:58.7273422Z or.b64 %rd71, %rd70, %rd134; 2026-02-21T09:01:58.7273615Z shl.b64 %rd72, %rd71, 1; 
2026-02-21T09:01:58.7273788Z add.s64 %rd73, %rd32, %rd72; 2026-02-21T09:01:58.7273982Z add.s64 %rd64, %rd73, 32; 2026-02-21T09:01:58.7274376Z .loc 1 48 80 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:80 2026-02-21T09:01:58.7274742Z bar.sync 0; 2026-02-21T09:01:58.7274890Z // begin inline asm 2026-02-21T09:01:58.7275131Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd63 + 0 ], 0x8, %r453; 2026-02-21T09:01:58.7275412Z // end inline asm 2026-02-21T09:01:58.7275564Z // begin inline asm 2026-02-21T09:01:58.7275795Z cp.async.ca.shared.global [ %r17 + 0 ], [ %rd64 + 0 ], 0x8, %r453; 2026-02-21T09:01:58.7276075Z // end inline asm 2026-02-21T09:01:58.7276246Z cp.async.commit_group; 2026-02-21T09:01:58.7276696Z .loc 1 40 102 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:40:102 2026-02-21T09:01:58.7277076Z shl.b32 %r495, %r489, 19; 2026-02-21T09:01:58.7277249Z or.b32 %r496, %r1088, %r495; 2026-02-21T09:01:58.7277440Z mad.wide.s32 %rd137, %r496, 2, %rd3; 2026-02-21T09:01:58.7277649Z shl.b32 %r497, %r476, 19; 2026-02-21T09:01:58.7277822Z or.b32 %r498, %r28, %r497; 2026-02-21T09:01:58.7278007Z shl.b32 %r499, %r487, 19; 2026-02-21T09:01:58.7278180Z sub.s32 %r500, %r498, %r499; 2026-02-21T09:01:58.7278372Z mul.lo.s32 %r501, %r480, 267386880; 2026-02-21T09:01:58.7278572Z sub.s32 %r1111, %r500, %r501; 2026-02-21T09:01:58.7278867Z add.s32 %r1110, %r29, %r91; 2026-02-21T09:01:58.7279046Z mov.b32 %r1114, 0f00000000; 2026-02-21T09:01:58.7279227Z mov.b32 %r1113, 1; 2026-02-21T09:01:58.7279387Z mov.b32 %r1112, -1; 2026-02-21T09:01:58.7279552Z mov.b64 %rd138, -8; 2026-02-21T09:01:58.7279716Z mov.b32 %r1115, %r1114; 2026-02-21T09:01:58.7279889Z mov.b32 %r1116, %r1114; 2026-02-21T09:01:58.7280060Z mov.b32 %r1117, %r1114; 2026-02-21T09:01:58.7280226Z mov.b32 %r1118, %r1114; 2026-02-21T09:01:58.7280398Z mov.b32 %r1119, %r1114; 2026-02-21T09:01:58.7280564Z mov.b32 %r1120, %r1114; 2026-02-21T09:01:58.7280738Z mov.b32 %r1121, %r1114; 2026-02-21T09:01:58.7280903Z 
mov.b32 %r1122, %r1114; 2026-02-21T09:01:58.7281073Z mov.b32 %r1123, %r1114; 2026-02-21T09:01:58.7281240Z mov.b32 %r1124, %r1114; 2026-02-21T09:01:58.7281499Z mov.b32 %r1125, %r1114; 2026-02-21T09:01:58.7281678Z mov.b32 %r1126, %r1114; 2026-02-21T09:01:58.7281844Z mov.b32 %r1127, %r1114; 2026-02-21T09:01:58.7282017Z mov.b32 %r1128, %r1114; 2026-02-21T09:01:58.7282184Z mov.b32 %r1129, %r1114; 2026-02-21T09:01:58.7282408Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T09:01:58.7282712Z // => This Inner Loop Header: Depth=2 2026-02-21T09:01:58.7282976Z add.s64 %rd138, %rd138, 8; 2026-02-21T09:01:58.7283167Z setp.lt.u64 %p12, %rd138, 496; 2026-02-21T09:01:58.7283365Z add.s32 %r616, %r1112, 1; 2026-02-21T09:01:58.7283548Z setp.gt.s32 %p13, %r616, 1; 2026-02-21T09:01:58.7283739Z selp.b32 %r1112, 0, %r616, %p13; 2026-02-21T09:01:58.7284087Z .loc 1 48 80 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:80 2026-02-21T09:01:58.7284454Z cp.async.wait_group 1; 2026-02-21T09:01:58.7284728Z bar.sync 0; 2026-02-21T09:01:58.7284886Z shl.b32 %r617, %r1112, 14; 2026-02-21T09:01:58.7285079Z add.s32 %r619, %r1080, %r617; 2026-02-21T09:01:58.7285405Z .loc 1 52 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:52:32 2026-02-21T09:01:58.7285775Z add.s32 %r620, %r619, %r24; 2026-02-21T09:01:58.7285973Z ld.shared.b16 %rs15, [%r620]; 2026-02-21T09:01:58.7286164Z ld.shared.b16 %rs16, [%r620+256]; 2026-02-21T09:01:58.7286379Z ld.shared.b16 %rs17, [%r620+16]; 2026-02-21T09:01:58.7286708Z ld.shared.b16 %rs18, [%r620+272]; 2026-02-21T09:01:58.7286917Z add.s32 %r621, %r619, %r25; 2026-02-21T09:01:58.7287099Z ld.shared.b16 %rs19, [%r621]; 2026-02-21T09:01:58.7287291Z ld.shared.b16 %rs20, [%r621+256]; 2026-02-21T09:01:58.7287488Z ld.shared.b16 %rs21, [%r621+16]; 2026-02-21T09:01:58.7287687Z ld.shared.b16 %rs22, [%r621+272]; 2026-02-21T09:01:58.7287884Z cvt.f32.bf16 %r534, %rs15; 2026-02-21T09:01:58.7288061Z cvt.f32.bf16 %r535, %rs16; 2026-02-21T09:01:58.7288260Z 
cvt.f32.bf16 %r536, %rs19; 2026-02-21T09:01:58.7288515Z cvt.f32.bf16 %r537, %rs20; 2026-02-21T09:01:58.7288702Z cvt.f32.bf16 %r570, %rs17; 2026-02-21T09:01:58.7288873Z cvt.f32.bf16 %r571, %rs18; 2026-02-21T09:01:58.7289051Z cvt.f32.bf16 %r572, %rs21; 2026-02-21T09:01:58.7289224Z cvt.f32.bf16 %r573, %rs22; 2026-02-21T09:01:58.7289541Z .loc 1 54 34 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:54:34 2026-02-21T09:01:58.7289891Z cvt.s64.s32 %rd81, %r1110; 2026-02-21T09:01:58.7290078Z add.s64 %rd75, %rd33, %rd81; 2026-02-21T09:01:58.7290401Z .loc 1 54 87 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:54:87 2026-02-21T09:01:58.7290749Z // begin inline asm 2026-02-21T09:01:58.7290917Z mov.u64 %rd74, 0x0; 2026-02-21T09:01:58.7291138Z createpolicy.fractional.L2::evict_last.b64 %rd74, 1.0; 2026-02-21T09:01:58.7291415Z // end inline asm 2026-02-21T09:01:58.7291570Z // begin inline asm 2026-02-21T09:01:58.7291735Z mov.u16 %rs14, 0x0; 2026-02-21T09:01:58.7291989Z ld.global.L1::evict_last.L2::cache_hint.b8 { %rs14 }, [ %rd75 + 0 ], %rd74; 2026-02-21T09:01:58.7292302Z // end inline asm 2026-02-21T09:01:58.7292609Z .loc 1 57 28 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:57:28 2026-02-21T09:01:58.7293073Z shl.b16 %rs23, %rs14, 4; 2026-02-21T09:01:58.7293393Z .loc 1 72 58 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:72:58 2026-02-21T09:01:58.7293753Z selp.b16 %rs24, %rs23, %rs14, %p33; 2026-02-21T09:01:58.7293965Z cvt.s16.s8 %rs25, %rs24; 2026-02-21T09:01:58.7294139Z shr.s16 %rs26, %rs25, 4; 2026-02-21T09:01:58.7294451Z .loc 1 77 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:77:32 2026-02-21T09:01:58.7294805Z cvt.rn.f32.s16 %r622, %rs26; 2026-02-21T09:01:58.7295001Z st.shared.b32 [%r26], %r622; 2026-02-21T09:01:58.7295180Z $L__tmp3: 2026-02-21T09:01:58.7295620Z .loc 2 291 36 // standard.py:291:36 @[ ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:84:40 ] 2026-02-21T09:01:58.7296060Z // begin inline asm 
2026-02-21T09:01:58.7296240Z fence.proxy.async.shared::cta; 2026-02-21T09:01:58.7296568Z // end inline asm 2026-02-21T09:01:58.7296727Z bar.sync 0; 2026-02-21T09:01:58.7296902Z shfl.sync.idx.b32 %r623, %r4, 0, 31, -1; 2026-02-21T09:01:58.7297149Z wgmma.fence.sync.aligned; 2026-02-21T09:01:58.7297344Z mov.pred %p9, -1; 2026-02-21T09:01:58.7297514Z // begin inline asm 2026-02-21T09:01:58.7298109Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1114,%r1115,%r1116,%r1117,%r1118,%r1119,%r1120,%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129}, {%r534,%r535,%r536,%r537}, %rd1, %p9, 1, 1; 2026-02-21T09:01:58.7298755Z // end inline asm 2026-02-21T09:01:58.7298907Z // begin inline asm 2026-02-21T09:01:58.7299500Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1114,%r1115,%r1116,%r1117,%r1118,%r1119,%r1120,%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129}, {%r570,%r571,%r572,%r573}, %rd2, %p9, 1, 1; 2026-02-21T09:01:58.7300242Z // end inline asm 2026-02-21T09:01:58.7300413Z wgmma.commit_group.sync.aligned; 2026-02-21T09:01:58.7300618Z mov.b32 %r591, 0; 2026-02-21T09:01:58.7300772Z mov.b32 %r592, %r591; 2026-02-21T09:01:58.7300942Z mov.b32 %r590, %r244; 2026-02-21T09:01:58.7301102Z // begin inline asm 2026-02-21T09:01:58.7301504Z // wait for regs: %r1114,%r1115,%r1116,%r1117,%r1118,%r1119,%r1120,%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r590,%r591,%r592 2026-02-21T09:01:58.7301964Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:01:58.7302167Z // end inline asm 2026-02-21T09:01:58.7302318Z $L__tmp4: 2026-02-21T09:01:58.7302614Z .loc 1 40 102 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:40:102 2026-02-21T09:01:58.7302985Z add.s32 %r624, %r1113, 1; 2026-02-21T09:01:58.7303168Z setp.gt.s32 %p14, %r624, 1; 2026-02-21T09:01:58.7303366Z selp.b32 %r1113, 0, %r624, %p14; 2026-02-21T09:01:58.7303798Z .loc 1 48 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:32 2026-02-21T09:01:58.7304172Z 
mad.wide.s32 %rd80, %r1111, 2, %rd32; 2026-02-21T09:01:58.7304516Z .loc 1 48 80 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:80 2026-02-21T09:01:58.7304875Z shl.b32 %r625, %r1113, 14; 2026-02-21T09:01:58.7305064Z add.s32 %r626, %r1080, %r625; 2026-02-21T09:01:58.7305246Z add.s32 %r612, %r626, %r13; 2026-02-21T09:01:58.7305434Z selp.b32 %r613, 8, 0, %p12; 2026-02-21T09:01:58.7305612Z // begin inline asm 2026-02-21T09:01:58.7305860Z cp.async.ca.shared.global [ %r612 + 0 ], [ %rd137 + 0 ], 0x8, %r613; 2026-02-21T09:01:58.7306143Z // end inline asm 2026-02-21T09:01:58.7306309Z add.s32 %r614, %r612, 8192; 2026-02-21T09:01:58.7306614Z // begin inline asm 2026-02-21T09:01:58.7306855Z cp.async.ca.shared.global [ %r614 + 0 ], [ %rd80 + 0 ], 0x8, %r613; 2026-02-21T09:01:58.7307138Z // end inline asm 2026-02-21T09:01:58.7307295Z cp.async.commit_group; 2026-02-21T09:01:58.7307624Z .loc 1 40 102 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:40:102 2026-02-21T09:01:58.7308085Z add.s64 %rd137, %rd137, 32; 2026-02-21T09:01:58.7308347Z add.s32 %r1111, %r1111, 16; 2026-02-21T09:01:58.7308546Z add.s32 %r1110, %r1110, 65536; 2026-02-21T09:01:58.7308738Z setp.lt.u64 %p15, %rd138, 504; 2026-02-21T09:01:58.7308933Z @%p15 bra $L__BB0_5; 2026-02-21T09:01:58.7309149Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:01:58.7309550Z .loc 1 33 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:33:32 2026-02-21T09:01:58.7309909Z or.b32 %r656, %r91, %r9; 2026-02-21T09:01:58.7310225Z .loc 1 40 102 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:40:102 2026-02-21T09:01:58.7310590Z cp.async.wait_group 0; 2026-02-21T09:01:58.7310763Z bar.sync 0; 2026-02-21T09:01:58.7311161Z .loc 1 87 28 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:87:28 2026-02-21T09:01:58.7311539Z cvt.rn.bf16x2.f32 %r657, %r1115, %r1114; 2026-02-21T09:01:58.7311771Z cvt.rn.bf16x2.f32 %r658, %r1117, %r1116; 2026-02-21T09:01:58.7311996Z cvt.rn.bf16x2.f32 %r659, 
%r1119, %r1118; 2026-02-21T09:01:58.7312211Z cvt.rn.bf16x2.f32 %r660, %r1121, %r1120; 2026-02-21T09:01:58.7312425Z cvt.rn.bf16x2.f32 %r661, %r1123, %r1122; 2026-02-21T09:01:58.7312634Z cvt.rn.bf16x2.f32 %r662, %r1125, %r1124; 2026-02-21T09:01:58.7312846Z cvt.rn.bf16x2.f32 %r663, %r1127, %r1126; 2026-02-21T09:01:58.7313059Z cvt.rn.bf16x2.f32 %r664, %r1129, %r1128; 2026-02-21T09:01:58.7313404Z .loc 1 88 43 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:88:43 2026-02-21T09:01:58.7313757Z shl.b32 %r665, %r89, 13; 2026-02-21T09:01:58.7313932Z shl.b32 %r666, %r90, 13; 2026-02-21T09:01:58.7314247Z .loc 1 88 50 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:88:50 2026-02-21T09:01:58.7314688Z add.s32 %r667, %r665, %r656; 2026-02-21T09:01:58.7314873Z add.s32 %r668, %r666, %r656; 2026-02-21T09:01:58.7315190Z .loc 1 88 22 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:88:22 2026-02-21T09:01:58.7315562Z mad.wide.s32 %rd82, %r667, 2, %rd34; 2026-02-21T09:01:58.7315771Z mad.wide.s32 %rd83, %r668, 2, %rd34; 2026-02-21T09:01:58.7316112Z .loc 1 88 81 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:88:81 2026-02-21T09:01:58.7316638Z st.shared.v4.b32 [%r21], {%r657, %r659, %r661, %r663}; 2026-02-21T09:01:58.7316938Z st.shared.v4.b32 [%r21+128], {%r658, %r660, %r662, %r664}; 2026-02-21T09:01:58.7317193Z bar.sync 0; 2026-02-21T09:01:58.7317341Z // begin inline asm 2026-02-21T09:01:58.7317632Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r637, %r638, %r639, %r640}, [%r438]; 2026-02-21T09:01:58.7317961Z // end inline asm 2026-02-21T09:01:58.7318123Z // begin inline asm 2026-02-21T09:01:58.7318486Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r641, %r642, %r643, %r644}, [%r443]; 2026-02-21T09:01:58.7318830Z // end inline asm 2026-02-21T09:01:58.7318991Z // begin inline asm 2026-02-21T09:01:58.7319209Z st.global.v4.b32 [ %rd82 + 0 ], { %r637, %r638, %r639, %r640 }; 2026-02-21T09:01:58.7319474Z // end inline asm 2026-02-21T09:01:58.7319625Z // begin 
inline asm 2026-02-21T09:01:58.7319845Z st.global.v4.b32 [ %rd83 + 0 ], { %r641, %r642, %r643, %r644 }; 2026-02-21T09:01:58.7320099Z // end inline asm 2026-02-21T09:01:58.7320412Z .loc 1 19 144 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:19:144 2026-02-21T09:01:58.7320779Z add.s32 %r669, %r1089, 8448; 2026-02-21T09:01:58.7321127Z .loc 1 25 35 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:25:35 2026-02-21T09:01:58.7321496Z shr.s32 %r670, %r669, 31; 2026-02-21T09:01:58.7321675Z shr.u32 %r671, %r670, 23; 2026-02-21T09:01:58.7321860Z add.s32 %r672, %r669, %r671; 2026-02-21T09:01:58.7322041Z shr.s32 %r673, %r672, 9; 2026-02-21T09:01:58.7322364Z .loc 1 26 33 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:26:33 2026-02-21T09:01:58.7322807Z shl.b32 %r674, %r673, 1; 2026-02-21T09:01:58.7323123Z .loc 1 27 39 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:27:39 2026-02-21T09:01:58.7323477Z sub.s32 %r675, 8, %r674; 2026-02-21T09:01:58.7323790Z .loc 1 27 52 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:27:52 2026-02-21T09:01:58.7324144Z min.s32 %r676, %r675, 2; 2026-02-21T09:01:58.7324446Z .loc 1 28 45 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:28:45 2026-02-21T09:01:58.7324802Z and.b32 %r677, %r672, -512; 2026-02-21T09:01:58.7324980Z sub.s32 %r678, %r669, %r677; 2026-02-21T09:01:58.7325298Z .loc 1 29 51 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:29:51 2026-02-21T09:01:58.7325738Z div.s32 %r679, %r678, %r676; 2026-02-21T09:01:58.7326064Z .loc 1 28 64 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:28:64 2026-02-21T09:01:58.7326424Z mul.lo.s32 %r680, %r679, %r676; 2026-02-21T09:01:58.7326747Z sub.s32 %r681, %r678, %r680; 2026-02-21T09:01:58.7327065Z .loc 1 28 30 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:28:30 2026-02-21T09:01:58.7327415Z add.s32 %r682, %r681, %r674; 2026-02-21T09:01:58.7327732Z .loc 1 30 27 // 
ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:30:27 2026-02-21T09:01:58.7328088Z shl.b32 %r683, %r682, 9; 2026-02-21T09:01:58.7328404Z .loc 1 31 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:31:32 2026-02-21T09:01:58.7328759Z or.b32 %r134, %r683, %r5; 2026-02-21T09:01:58.7328932Z or.b32 %r135, %r683, %r6; 2026-02-21T09:01:58.7329246Z .loc 1 32 27 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:32:27 2026-02-21T09:01:58.7329688Z shl.b32 %r136, %r679, 5; 2026-02-21T09:01:58.7330001Z .loc 1 48 53 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:53 2026-02-21T09:01:58.7330354Z shl.b32 %r684, %r134, 10; 2026-02-21T09:01:58.7330525Z shl.b32 %r685, %r135, 10; 2026-02-21T09:01:58.7330842Z .loc 1 48 60 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:60 2026-02-21T09:01:58.7331196Z or.b32 %r686, %r684, %r11; 2026-02-21T09:01:58.7331380Z or.b32 %r687, %r685, %r11; 2026-02-21T09:01:58.7331694Z .loc 1 48 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:32 2026-02-21T09:01:58.7332055Z mad.wide.s32 %rd84, %r686, 2, %rd32; 2026-02-21T09:01:58.7332271Z mad.wide.s32 %rd85, %r687, 2, %rd32; 2026-02-21T09:01:58.7332605Z .loc 1 48 80 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:80 2026-02-21T09:01:58.7332957Z bar.sync 0; 2026-02-21T09:01:58.7333184Z mov.b32 %r646, 8; 2026-02-21T09:01:58.7333352Z // begin inline asm 2026-02-21T09:01:58.7333589Z cp.async.ca.shared.global [ %r14 + 0 ], [ %rd84 + 0 ], 0x8, %r646; 2026-02-21T09:01:58.7333883Z // end inline asm 2026-02-21T09:01:58.7334037Z // begin inline asm 2026-02-21T09:01:58.7334272Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd85 + 0 ], 0x8, %r646; 2026-02-21T09:01:58.7334550Z // end inline asm 2026-02-21T09:01:58.7334709Z cp.async.commit_group; 2026-02-21T09:01:58.7335029Z .loc 1 48 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:32 2026-02-21T09:01:58.7335385Z cvt.s64.s32 %rd89, %r684; 2026-02-21T09:01:58.7335571Z or.b64 
%rd90, %rd89, %rd134; 2026-02-21T09:01:58.7335754Z shl.b64 %rd91, %rd90, 1; 2026-02-21T09:01:58.7335936Z add.s64 %rd92, %rd32, %rd91; 2026-02-21T09:01:58.7336116Z add.s64 %rd86, %rd92, 32; 2026-02-21T09:01:58.7336297Z cvt.s64.s32 %rd93, %r685; 2026-02-21T09:01:58.7336623Z or.b64 %rd94, %rd93, %rd134; 2026-02-21T09:01:58.7336820Z shl.b64 %rd95, %rd94, 1; 2026-02-21T09:01:58.7337003Z add.s64 %rd96, %rd32, %rd95; 2026-02-21T09:01:58.7337186Z add.s64 %rd87, %rd96, 32; 2026-02-21T09:01:58.7337601Z .loc 1 48 80 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:80 2026-02-21T09:01:58.7337945Z bar.sync 0; 2026-02-21T09:01:58.7338098Z // begin inline asm 2026-02-21T09:01:58.7338327Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd86 + 0 ], 0x8, %r646; 2026-02-21T09:01:58.7338600Z // end inline asm 2026-02-21T09:01:58.7338749Z // begin inline asm 2026-02-21T09:01:58.7338979Z cp.async.ca.shared.global [ %r17 + 0 ], [ %rd87 + 0 ], 0x8, %r646; 2026-02-21T09:01:58.7339248Z // end inline asm 2026-02-21T09:01:58.7339405Z cp.async.commit_group; 2026-02-21T09:01:58.7339732Z .loc 1 40 102 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:40:102 2026-02-21T09:01:58.7340091Z shl.b32 %r688, %r682, 19; 2026-02-21T09:01:58.7340351Z or.b32 %r689, %r1088, %r688; 2026-02-21T09:01:58.7340552Z mad.wide.s32 %rd139, %r689, 2, %rd3; 2026-02-21T09:01:58.7340758Z shl.b32 %r690, %r669, 19; 2026-02-21T09:01:58.7340931Z or.b32 %r691, %r28, %r690; 2026-02-21T09:01:58.7341113Z shl.b32 %r692, %r680, 19; 2026-02-21T09:01:58.7341288Z sub.s32 %r693, %r691, %r692; 2026-02-21T09:01:58.7341471Z mul.lo.s32 %r694, %r673, 267386880; 2026-02-21T09:01:58.7341676Z sub.s32 %r1131, %r693, %r694; 2026-02-21T09:01:58.7341858Z add.s32 %r1130, %r29, %r136; 2026-02-21T09:01:58.7342042Z mov.b32 %r1134, 0f00000000; 2026-02-21T09:01:58.7342219Z mov.b32 %r1133, 1; 2026-02-21T09:01:58.7342379Z mov.b32 %r1132, -1; 2026-02-21T09:01:58.7342541Z mov.b64 %rd140, -8; 2026-02-21T09:01:58.7342709Z mov.b32 %r1135, 
%r1134; 2026-02-21T09:01:58.7342879Z mov.b32 %r1136, %r1134; 2026-02-21T09:01:58.7343053Z mov.b32 %r1137, %r1134; 2026-02-21T09:01:58.7343224Z mov.b32 %r1138, %r1134; 2026-02-21T09:01:58.7343487Z mov.b32 %r1139, %r1134; 2026-02-21T09:01:58.7343658Z mov.b32 %r1140, %r1134; 2026-02-21T09:01:58.7343824Z mov.b32 %r1141, %r1134; 2026-02-21T09:01:58.7343995Z mov.b32 %r1142, %r1134; 2026-02-21T09:01:58.7344158Z mov.b32 %r1143, %r1134; 2026-02-21T09:01:58.7344343Z mov.b32 %r1144, %r1134; 2026-02-21T09:01:58.7344509Z mov.b32 %r1145, %r1134; 2026-02-21T09:01:58.7344677Z mov.b32 %r1146, %r1134; 2026-02-21T09:01:58.7344841Z mov.b32 %r1147, %r1134; 2026-02-21T09:01:58.7345010Z mov.b32 %r1148, %r1134; 2026-02-21T09:01:58.7345186Z mov.b32 %r1149, %r1134; 2026-02-21T09:01:58.7345402Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:01:58.7345712Z // => This Inner Loop Header: Depth=2 2026-02-21T09:01:58.7345973Z add.s64 %rd140, %rd140, 8; 2026-02-21T09:01:58.7346182Z setp.lt.u64 %p19, %rd140, 496; 2026-02-21T09:01:58.7346377Z add.s32 %r809, %r1132, 1; 2026-02-21T09:01:58.7346678Z setp.gt.s32 %p20, %r809, 1; 2026-02-21T09:01:58.7346874Z selp.b32 %r1132, 0, %r809, %p20; 2026-02-21T09:01:58.7347308Z .loc 1 48 80 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:80 2026-02-21T09:01:58.7347685Z cp.async.wait_group 1; 2026-02-21T09:01:58.7347864Z bar.sync 0; 2026-02-21T09:01:58.7348024Z shl.b32 %r810, %r1132, 14; 2026-02-21T09:01:58.7348204Z add.s32 %r812, %r1080, %r810; 2026-02-21T09:01:58.7348604Z .loc 1 52 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:52:32 2026-02-21T09:01:58.7348960Z add.s32 %r813, %r812, %r24; 2026-02-21T09:01:58.7349151Z ld.shared.b16 %rs28, [%r813]; 2026-02-21T09:01:58.7349344Z ld.shared.b16 %rs29, [%r813+256]; 2026-02-21T09:01:58.7349555Z ld.shared.b16 %rs30, [%r813+16]; 2026-02-21T09:01:58.7349756Z ld.shared.b16 %rs31, [%r813+272]; 2026-02-21T09:01:58.7349947Z add.s32 %r814, %r812, %r25; 2026-02-21T09:01:58.7350134Z 
ld.shared.b16 %rs32, [%r814]; 2026-02-21T09:01:58.7350318Z ld.shared.b16 %rs33, [%r814+256]; 2026-02-21T09:01:58.7350521Z ld.shared.b16 %rs34, [%r814+16]; 2026-02-21T09:01:58.7350713Z ld.shared.b16 %rs35, [%r814+272]; 2026-02-21T09:01:58.7350908Z cvt.f32.bf16 %r727, %rs28; 2026-02-21T09:01:58.7351084Z cvt.f32.bf16 %r728, %rs29; 2026-02-21T09:01:58.7351364Z cvt.f32.bf16 %r729, %rs32; 2026-02-21T09:01:58.7351546Z cvt.f32.bf16 %r730, %rs33; 2026-02-21T09:01:58.7351734Z cvt.f32.bf16 %r763, %rs30; 2026-02-21T09:01:58.7351922Z cvt.f32.bf16 %r764, %rs31; 2026-02-21T09:01:58.7352100Z cvt.f32.bf16 %r765, %rs34; 2026-02-21T09:01:58.7352282Z cvt.f32.bf16 %r766, %rs35; 2026-02-21T09:01:58.7352610Z .loc 1 54 34 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:54:34 2026-02-21T09:01:58.7352980Z cvt.s64.s32 %rd104, %r1130; 2026-02-21T09:01:58.7353168Z add.s64 %rd98, %rd33, %rd104; 2026-02-21T09:01:58.7353499Z .loc 1 54 87 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:54:87 2026-02-21T09:01:58.7353857Z // begin inline asm 2026-02-21T09:01:58.7354104Z mov.u64 %rd97, 0x0; 2026-02-21T09:01:58.7354344Z createpolicy.fractional.L2::evict_last.b64 %rd97, 1.0; 2026-02-21T09:01:58.7354603Z // end inline asm 2026-02-21T09:01:58.7354764Z // begin inline asm 2026-02-21T09:01:58.7354924Z mov.u16 %rs27, 0x0; 2026-02-21T09:01:58.7355179Z ld.global.L1::evict_last.L2::cache_hint.b8 { %rs27 }, [ %rd98 + 0 ], %rd97; 2026-02-21T09:01:58.7355478Z // end inline asm 2026-02-21T09:01:58.7355779Z .loc 1 57 28 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:57:28 2026-02-21T09:01:58.7356139Z shl.b16 %rs36, %rs27, 4; 2026-02-21T09:01:58.7356583Z .loc 1 72 58 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:72:58 2026-02-21T09:01:58.7356959Z selp.b16 %rs37, %rs36, %rs27, %p33; 2026-02-21T09:01:58.7357166Z cvt.s16.s8 %rs38, %rs37; 2026-02-21T09:01:58.7357347Z shr.s16 %rs39, %rs38, 4; 2026-02-21T09:01:58.7357658Z .loc 1 77 32 // 
ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:77:32 2026-02-21T09:01:58.7358126Z cvt.rn.f32.s16 %r815, %rs39; 2026-02-21T09:01:58.7358325Z st.shared.b32 [%r26], %r815; 2026-02-21T09:01:58.7358505Z $L__tmp5: 2026-02-21T09:01:58.7358872Z .loc 2 291 36 // standard.py:291:36 @[ ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:84:40 ] 2026-02-21T09:01:58.7359292Z // begin inline asm 2026-02-21T09:01:58.7359496Z fence.proxy.async.shared::cta; 2026-02-21T09:01:58.7359691Z // end inline asm 2026-02-21T09:01:58.7359853Z bar.sync 0; 2026-02-21T09:01:58.7360023Z shfl.sync.idx.b32 %r816, %r4, 0, 31, -1; 2026-02-21T09:01:58.7360258Z wgmma.fence.sync.aligned; 2026-02-21T09:01:58.7360454Z mov.pred %p16, -1; 2026-02-21T09:01:58.7360626Z // begin inline asm 2026-02-21T09:01:58.7361225Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1134,%r1135,%r1136,%r1137,%r1138,%r1139,%r1140,%r1141,%r1142,%r1143,%r1144,%r1145,%r1146,%r1147,%r1148,%r1149}, {%r727,%r728,%r729,%r730}, %rd1, %p16, 1, 1; 2026-02-21T09:01:58.7361868Z // end inline asm 2026-02-21T09:01:58.7362100Z // begin inline asm 2026-02-21T09:01:58.7362683Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1134,%r1135,%r1136,%r1137,%r1138,%r1139,%r1140,%r1141,%r1142,%r1143,%r1144,%r1145,%r1146,%r1147,%r1148,%r1149}, {%r763,%r764,%r765,%r766}, %rd2, %p16, 1, 1; 2026-02-21T09:01:58.7363323Z // end inline asm 2026-02-21T09:01:58.7363498Z wgmma.commit_group.sync.aligned; 2026-02-21T09:01:58.7363694Z mov.b32 %r785, 0; 2026-02-21T09:01:58.7363852Z mov.b32 %r784, %r785; 2026-02-21T09:01:58.7364015Z mov.b32 %r783, %r244; 2026-02-21T09:01:58.7364177Z // begin inline asm 2026-02-21T09:01:58.7364575Z // wait for regs: %r1134,%r1135,%r1136,%r1137,%r1138,%r1139,%r1140,%r1141,%r1142,%r1143,%r1144,%r1145,%r1146,%r1147,%r1148,%r1149,%r783,%r784,%r785 2026-02-21T09:01:58.7365046Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:01:58.7365248Z // end inline asm 2026-02-21T09:01:58.7365392Z $L__tmp6: 
2026-02-21T09:01:58.7365699Z .loc 1 40 102 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:40:102 2026-02-21T09:01:58.7366068Z add.s32 %r817, %r1133, 1; 2026-02-21T09:01:58.7366271Z setp.gt.s32 %p21, %r817, 1; 2026-02-21T09:01:58.7366704Z selp.b32 %r1133, 0, %r817, %p21; 2026-02-21T09:01:58.7367048Z .loc 1 48 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:32 2026-02-21T09:01:58.7367410Z mad.wide.s32 %rd103, %r1131, 2, %rd32; 2026-02-21T09:01:58.7367763Z .loc 1 48 80 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:80 2026-02-21T09:01:58.7368124Z shl.b32 %r818, %r1133, 14; 2026-02-21T09:01:58.7368305Z add.s32 %r819, %r1080, %r818; 2026-02-21T09:01:58.7368513Z add.s32 %r805, %r819, %r13; 2026-02-21T09:01:58.7368696Z selp.b32 %r806, 8, 0, %p19; 2026-02-21T09:01:58.7368883Z // begin inline asm 2026-02-21T09:01:58.7369123Z cp.async.ca.shared.global [ %r805 + 0 ], [ %rd139 + 0 ], 0x8, %r806; 2026-02-21T09:01:58.7369414Z // end inline asm 2026-02-21T09:01:58.7369655Z add.s32 %r807, %r805, 8192; 2026-02-21T09:01:58.7369847Z // begin inline asm 2026-02-21T09:01:58.7370086Z cp.async.ca.shared.global [ %r807 + 0 ], [ %rd103 + 0 ], 0x8, %r806; 2026-02-21T09:01:58.7370375Z // end inline asm 2026-02-21T09:01:58.7370547Z cp.async.commit_group; 2026-02-21T09:01:58.7370873Z .loc 1 40 102 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:40:102 2026-02-21T09:01:58.7371242Z add.s64 %rd139, %rd139, 32; 2026-02-21T09:01:58.7371421Z add.s32 %r1131, %r1131, 16; 2026-02-21T09:01:58.7371606Z add.s32 %r1130, %r1130, 65536; 2026-02-21T09:01:58.7371797Z setp.lt.u64 %p22, %rd140, 504; 2026-02-21T09:01:58.7371993Z @%p22 bra $L__BB0_7; 2026-02-21T09:01:58.7372217Z // %bb.8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:01:58.7372616Z .loc 1 33 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:33:32 2026-02-21T09:01:58.7373107Z or.b32 %r838, %r136, %r9; 2026-02-21T09:01:58.7373432Z .loc 1 40 102 // 
ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:40:102 2026-02-21T09:01:58.7373803Z cp.async.wait_group 0; 2026-02-21T09:01:58.7373980Z bar.sync 0; 2026-02-21T09:01:58.7374276Z .loc 1 87 28 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:87:28 2026-02-21T09:01:58.7374652Z cvt.rn.bf16x2.f32 %r839, %r1135, %r1134; 2026-02-21T09:01:58.7374877Z cvt.rn.bf16x2.f32 %r840, %r1137, %r1136; 2026-02-21T09:01:58.7375100Z cvt.rn.bf16x2.f32 %r841, %r1139, %r1138; 2026-02-21T09:01:58.7375313Z cvt.rn.bf16x2.f32 %r842, %r1141, %r1140; 2026-02-21T09:01:58.7375537Z cvt.rn.bf16x2.f32 %r843, %r1143, %r1142; 2026-02-21T09:01:58.7375751Z cvt.rn.bf16x2.f32 %r844, %r1145, %r1144; 2026-02-21T09:01:58.7375970Z cvt.rn.bf16x2.f32 %r845, %r1147, %r1146; 2026-02-21T09:01:58.7376184Z cvt.rn.bf16x2.f32 %r846, %r1149, %r1148; 2026-02-21T09:01:58.7376758Z .loc 1 88 43 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:88:43 2026-02-21T09:01:58.7377145Z shl.b32 %r847, %r134, 13; 2026-02-21T09:01:58.7377319Z shl.b32 %r848, %r135, 13; 2026-02-21T09:01:58.7377635Z .loc 1 88 50 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:88:50 2026-02-21T09:01:58.7377991Z add.s32 %r849, %r847, %r838; 2026-02-21T09:01:58.7378178Z add.s32 %r850, %r848, %r838; 2026-02-21T09:01:58.7378492Z .loc 1 88 22 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:88:22 2026-02-21T09:01:58.7378862Z mad.wide.s32 %rd105, %r849, 2, %rd34; 2026-02-21T09:01:58.7379076Z mad.wide.s32 %rd106, %r850, 2, %rd34; 2026-02-21T09:01:58.7379412Z .loc 1 88 81 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:88:81 2026-02-21T09:01:58.7379808Z st.shared.v4.b32 [%r21], {%r839, %r841, %r843, %r845}; 2026-02-21T09:01:58.7380101Z st.shared.v4.b32 [%r21+128], {%r840, %r842, %r844, %r846}; 2026-02-21T09:01:58.7380356Z bar.sync 0; 2026-02-21T09:01:58.7380506Z // begin inline asm 2026-02-21T09:01:58.7380791Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r820, %r821, %r822, %r823}, [%r438]; 
2026-02-21T09:01:58.7381234Z // end inline asm 2026-02-21T09:01:58.7381389Z // begin inline asm 2026-02-21T09:01:58.7381667Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r825, %r826, %r827, %r828}, [%r443]; 2026-02-21T09:01:58.7381987Z // end inline asm 2026-02-21T09:01:58.7382158Z // begin inline asm 2026-02-21T09:01:58.7382378Z st.global.v4.b32 [ %rd105 + 0 ], { %r820, %r821, %r822, %r823 }; 2026-02-21T09:01:58.7382642Z // end inline asm 2026-02-21T09:01:58.7382793Z // begin inline asm 2026-02-21T09:01:58.7383014Z st.global.v4.b32 [ %rd106 + 0 ], { %r825, %r826, %r827, %r828 }; 2026-02-21T09:01:58.7383271Z // end inline asm 2026-02-21T09:01:58.7383579Z .loc 1 19 144 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:19:144 2026-02-21T09:01:58.7383955Z add.s32 %r1089, %r1089, 12672; 2026-02-21T09:01:58.7384239Z setp.lt.s32 %p23, %r1089, %r1150; 2026-02-21T09:01:58.7384452Z @%p23 bra $L__BB0_2; 2026-02-21T09:01:58.7384641Z $L__BB0_9: // %.preheader 2026-02-21T09:01:58.7384894Z setp.gt.s32 %p24, %r1150, 2047; 2026-02-21T09:01:58.7385090Z @%p24 bra $L__BB0_14; 2026-02-21T09:01:58.7385291Z // %bb.10: // %.lr.ph29 2026-02-21T09:01:58.7385674Z .loc 1 0 144 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:0:144 2026-02-21T09:01:58.7386038Z and.b32 %r852, %r1078, 8056; 2026-02-21T09:01:58.7386230Z and.b32 %r854, %r1079, 136; 2026-02-21T09:01:58.7386412Z xor.b32 %r30, %r854, %r852; 2026-02-21T09:01:58.7386724Z add.s32 %r31, %r1080, %r30; 2026-02-21T09:01:58.7386906Z add.s32 %r892, %r31, 8192; 2026-02-21T09:01:58.7387094Z add.s32 %r894, %r31, 16384; 2026-02-21T09:01:58.7387270Z add.s32 %r896, %r31, 24576; 2026-02-21T09:01:58.7387454Z and.b32 %r857, %r1081, 15872; 2026-02-21T09:01:58.7387741Z and.b32 %r858, %r1078, 96; 2026-02-21T09:01:58.7387920Z or.b32 %r860, %r857, %r858; 2026-02-21T09:01:58.7388105Z or.b32 %r861, %r860, %r1082; 2026-02-21T09:01:58.7388370Z or.b32 %r35, %r861, %r854; 2026-02-21T09:01:58.7388556Z xor.b32 %r36, %r35, 8; 
2026-02-21T09:01:58.7388727Z and.b32 %r863, %r1078, 48; 2026-02-21T09:01:58.7388904Z and.b32 %r865, %r1084, 60; 2026-02-21T09:01:58.7389077Z xor.b32 %r866, %r863, %r865; 2026-02-21T09:01:58.7389260Z add.s32 %r867, %r1080, 32768; 2026-02-21T09:01:58.7389439Z add.s32 %r868, %r867, %r1083; 2026-02-21T09:01:58.7389625Z add.s32 %r37, %r868, %r866; 2026-02-21T09:01:58.7389811Z bfe.u32 %r869, %r867, 4, 14; 2026-02-21T09:01:58.7389989Z cvt.u64.u32 %rd107, %r869; 2026-02-21T09:01:58.7390188Z or.b64 %rd128, %rd107, -9223371899407433728; 2026-02-21T09:01:58.7390407Z add.s32 %r870, %r1080, 32800; 2026-02-21T09:01:58.7390590Z bfe.u32 %r871, %r870, 4, 14; 2026-02-21T09:01:58.7390768Z cvt.u64.u32 %rd108, %r871; 2026-02-21T09:01:58.7390968Z or.b64 %rd129, %rd108, -9223371899407433728; 2026-02-21T09:01:58.7391270Z and.b32 %r873, %r1085, 24576; 2026-02-21T09:01:58.7391464Z and.b32 %r875, %r1078, 7936; 2026-02-21T09:01:58.7391645Z and.b32 %r877, %r1087, 112; 2026-02-21T09:01:58.7391823Z or.b32 %r878, %r1086, %r875; 2026-02-21T09:01:58.7392003Z xor.b32 %r879, %r878, %r877; 2026-02-21T09:01:58.7392180Z add.s32 %r880, %r1080, %r873; 2026-02-21T09:01:58.7392364Z add.s32 %r38, %r880, %r879; 2026-02-21T09:01:58.7392542Z shl.b32 %r881, %r3, 12; 2026-02-21T09:01:58.7392715Z and.b32 %r882, %r881, 24576; 2026-02-21T09:01:58.7392891Z and.b32 %r883, %r1081, 112; 2026-02-21T09:01:58.7393075Z and.b32 %r884, %r1087, 4064; 2026-02-21T09:01:58.7393250Z or.b32 %r885, %r882, %r883; 2026-02-21T09:01:58.7393432Z xor.b32 %r886, %r885, %r884; 2026-02-21T09:01:58.7393621Z add.s32 %r1051, %r1080, %r886; 2026-02-21T09:01:58.7393807Z add.s32 %r1056, %r1051, 4096; 2026-02-21T09:01:58.7394147Z .loc 1 19 144 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:19:144 2026-02-21T09:01:58.7394523Z mad.wide.u32 %rd109, %r8, 8, %rd32; 2026-02-21T09:01:58.7394738Z add.s64 %rd6, %rd109, 64; 2026-02-21T09:01:58.7395058Z .loc 1 40 102 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:40:102 
2026-02-21T09:01:58.7395513Z or.b32 %r889, %r1088, %r11; 2026-02-21T09:01:58.7395693Z or.b32 %r41, %r889, 262176; 2026-02-21T09:01:58.7396018Z .loc 1 19 144 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:19:144 2026-02-21T09:01:58.7396393Z mad.wide.u32 %rd7, %r10, 8192, %rd33; 2026-02-21T09:01:58.7396796Z $L__BB0_11: // =>This Loop Header: Depth=1 2026-02-21T09:01:58.7397106Z // Child Loop BB0_12 Depth 2 2026-02-21T09:01:58.7397490Z .loc 1 25 35 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:25:35 2026-02-21T09:01:58.7397854Z shr.s32 %r901, %r1150, 31; 2026-02-21T09:01:58.7398122Z shr.u32 %r902, %r901, 23; 2026-02-21T09:01:58.7398312Z add.s32 %r903, %r1150, %r902; 2026-02-21T09:01:58.7398505Z shr.s32 %r904, %r903, 9; 2026-02-21T09:01:58.7398820Z .loc 1 28 45 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:28:45 2026-02-21T09:01:58.7399182Z and.b32 %r905, %r903, 65024; 2026-02-21T09:01:58.7399363Z sub.s32 %r906, %r1150, %r905; 2026-02-21T09:01:58.7399688Z .loc 1 28 64 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:28:64 2026-02-21T09:01:58.7400037Z cvt.u16.u32 %rs40, %r906; 2026-02-21T09:01:58.7400359Z .loc 1 29 51 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:29:51 2026-02-21T09:01:58.7400716Z shr.u16 %rs41, %rs40, 15; 2026-02-21T09:01:58.7400891Z add.s16 %rs42, %rs40, %rs41; 2026-02-21T09:01:58.7401079Z shr.s16 %rs43, %rs42, 1; 2026-02-21T09:01:58.7401387Z .loc 1 28 64 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:28:64 2026-02-21T09:01:58.7401834Z and.b16 %rs44, %rs42, -2; 2026-02-21T09:01:58.7402008Z sub.s16 %rs45, %rs40, %rs44; 2026-02-21T09:01:58.7402190Z cvt.u32.u16 %r907, %rs45; 2026-02-21T09:01:58.7402496Z .loc 1 30 27 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:30:27 2026-02-21T09:01:58.7402847Z shl.b32 %r908, %r904, 10; 2026-02-21T09:01:58.7403027Z mul.wide.s16 %r909, %rs45, 512; 2026-02-21T09:01:58.7403220Z add.s32 %r910, %r909, %r908; 2026-02-21T09:01:58.7403537Z 
.loc 1 31 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:31:32 2026-02-21T09:01:58.7403886Z or.b32 %r181, %r910, %r5; 2026-02-21T09:01:58.7404060Z or.b32 %r182, %r910, %r6; 2026-02-21T09:01:58.7404366Z .loc 1 32 27 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:32:27 2026-02-21T09:01:58.7404721Z mul.wide.s16 %r183, %rs43, 32; 2026-02-21T09:01:58.7405131Z .loc 1 48 53 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:53 2026-02-21T09:01:58.7405490Z shl.b32 %r911, %r181, 10; 2026-02-21T09:01:58.7405667Z shl.b32 %r912, %r182, 10; 2026-02-21T09:01:58.7405976Z .loc 1 48 60 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:60 2026-02-21T09:01:58.7406331Z or.b32 %r913, %r911, %r11; 2026-02-21T09:01:58.7406634Z or.b32 %r914, %r912, %r11; 2026-02-21T09:01:58.7406957Z .loc 1 48 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:32 2026-02-21T09:01:58.7407318Z mad.wide.s32 %rd110, %r913, 2, %rd32; 2026-02-21T09:01:58.7407529Z mad.wide.s32 %rd111, %r914, 2, %rd32; 2026-02-21T09:01:58.7407872Z .loc 1 48 80 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:80 2026-02-21T09:01:58.7408217Z bar.sync 0; 2026-02-21T09:01:58.7408379Z mov.b32 %r891, 8; 2026-02-21T09:01:58.7408538Z // begin inline asm 2026-02-21T09:01:58.7408787Z cp.async.ca.shared.global [ %r31 + 0 ], [ %rd110 + 0 ], 0x8, %r891; 2026-02-21T09:01:58.7409071Z // end inline asm 2026-02-21T09:01:58.7409231Z // begin inline asm 2026-02-21T09:01:58.7409474Z cp.async.ca.shared.global [ %r892 + 0 ], [ %rd111 + 0 ], 0x8, %r891; 2026-02-21T09:01:58.7409850Z // end inline asm 2026-02-21T09:01:58.7410017Z cp.async.commit_group; 2026-02-21T09:01:58.7410333Z .loc 1 48 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:32 2026-02-21T09:01:58.7410695Z cvt.s64.s32 %rd115, %r911; 2026-02-21T09:01:58.7417699Z or.b64 %rd117, %rd115, %rd134; 2026-02-21T09:01:58.7417982Z shl.b64 %rd118, %rd117, 1; 2026-02-21T09:01:58.7418184Z add.s64 %rd119, %rd32, 
%rd118; 2026-02-21T09:01:58.7418394Z add.s64 %rd112, %rd119, 32; 2026-02-21T09:01:58.7418588Z cvt.s64.s32 %rd120, %r912; 2026-02-21T09:01:58.7418782Z or.b64 %rd121, %rd120, %rd134; 2026-02-21T09:01:58.7418978Z shl.b64 %rd122, %rd121, 1; 2026-02-21T09:01:58.7419170Z add.s64 %rd123, %rd32, %rd122; 2026-02-21T09:01:58.7419541Z add.s64 %rd113, %rd123, 32; 2026-02-21T09:01:58.7419913Z .loc 1 48 80 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:80 2026-02-21T09:01:58.7420281Z bar.sync 0; 2026-02-21T09:01:58.7420434Z // begin inline asm 2026-02-21T09:01:58.7420685Z cp.async.ca.shared.global [ %r894 + 0 ], [ %rd112 + 0 ], 0x8, %r891; 2026-02-21T09:01:58.7420967Z // end inline asm 2026-02-21T09:01:58.7421127Z // begin inline asm 2026-02-21T09:01:58.7421362Z cp.async.ca.shared.global [ %r896 + 0 ], [ %rd113 + 0 ], 0x8, %r891; 2026-02-21T09:01:58.7421630Z // end inline asm 2026-02-21T09:01:58.7421795Z cp.async.commit_group; 2026-02-21T09:01:58.7422117Z .loc 1 40 102 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:40:102 2026-02-21T09:01:58.7422493Z or.b32 %r915, %r5, %r908; 2026-02-21T09:01:58.7422678Z add.s32 %r916, %r915, %r909; 2026-02-21T09:01:58.7422873Z shl.b32 %r917, %r916, 10; 2026-02-21T09:01:58.7423157Z mad.wide.s32 %rd142, %r917, 2, %rd6; 2026-02-21T09:01:58.7423370Z shl.b32 %r918, %r904, 20; 2026-02-21T09:01:58.7423548Z or.b32 %r919, %r41, %r918; 2026-02-21T09:01:58.7423723Z shl.b32 %r920, %r907, 19; 2026-02-21T09:01:58.7423906Z add.s32 %r1151, %r919, %r920; 2026-02-21T09:01:58.7424089Z or.b32 %r921, %r7, %r183; 2026-02-21T09:01:58.7424263Z cvt.s64.s32 %rd124, %r921; 2026-02-21T09:01:58.7424446Z add.s64 %rd141, %rd7, %rd124; 2026-02-21T09:01:58.7424636Z mov.b32 %r1154, 0f00000000; 2026-02-21T09:01:58.7424816Z mov.b32 %r1153, 1; 2026-02-21T09:01:58.7424983Z mov.b32 %r1152, -1; 2026-02-21T09:01:58.7425147Z mov.b64 %rd143, -8; 2026-02-21T09:01:58.7425316Z mov.b32 %r1155, %r1154; 2026-02-21T09:01:58.7425484Z mov.b32 %r1156, %r1154; 
2026-02-21T09:01:58.7425661Z mov.b32 %r1157, %r1154; 2026-02-21T09:01:58.7425823Z mov.b32 %r1158, %r1154; 2026-02-21T09:01:58.7425993Z mov.b32 %r1159, %r1154; 2026-02-21T09:01:58.7426164Z mov.b32 %r1160, %r1154; 2026-02-21T09:01:58.7426345Z mov.b32 %r1161, %r1154; 2026-02-21T09:01:58.7426669Z mov.b32 %r1162, %r1154; 2026-02-21T09:01:58.7426953Z mov.b32 %r1163, %r1154; 2026-02-21T09:01:58.7427131Z mov.b32 %r1164, %r1154; 2026-02-21T09:01:58.7427294Z mov.b32 %r1165, %r1154; 2026-02-21T09:01:58.7427463Z mov.b32 %r1166, %r1154; 2026-02-21T09:01:58.7427626Z mov.b32 %r1167, %r1154; 2026-02-21T09:01:58.7427795Z mov.b32 %r1168, %r1154; 2026-02-21T09:01:58.7427855Z mov.b32 %r1169, %r1154; 2026-02-21T09:01:58.7427975Z $L__BB0_12: // Parent Loop BB0_11 Depth=1 2026-02-21T09:01:58.7428089Z // => This Inner Loop Header: Depth=2 2026-02-21T09:01:58.7428156Z add.s64 %rd143, %rd143, 8; 2026-02-21T09:01:58.7428301Z setp.lt.u64 %p28, %rd143, 496; 2026-02-21T09:01:58.7428381Z add.s32 %r1036, %r1152, 1; 2026-02-21T09:01:58.7428453Z setp.gt.s32 %p29, %r1036, 1; 2026-02-21T09:01:58.7428525Z selp.b32 %r1152, 0, %r1036, %p29; 2026-02-21T09:01:58.7428744Z .loc 1 48 80 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:80 2026-02-21T09:01:58.7428826Z cp.async.wait_group 1; 2026-02-21T09:01:58.7428886Z bar.sync 0; 2026-02-21T09:01:58.7428954Z shl.b32 %r1037, %r1152, 14; 2026-02-21T09:01:58.7429134Z add.s32 %r1039, %r1080, %r1037; 2026-02-21T09:01:58.7429345Z .loc 1 52 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:52:32 2026-02-21T09:01:58.7429415Z add.s32 %r1040, %r1039, %r35; 2026-02-21T09:01:58.7429496Z ld.shared.b16 %rs47, [%r1040]; 2026-02-21T09:01:58.7429568Z ld.shared.b16 %rs48, [%r1040+256]; 2026-02-21T09:01:58.7429636Z ld.shared.b16 %rs49, [%r1040+16]; 2026-02-21T09:01:58.7429704Z ld.shared.b16 %rs50, [%r1040+272]; 2026-02-21T09:01:58.7429773Z add.s32 %r1041, %r1039, %r36; 2026-02-21T09:01:58.7429841Z ld.shared.b16 %rs51, [%r1041]; 
2026-02-21T09:01:58.7429908Z ld.shared.b16 %rs52, [%r1041+256]; 2026-02-21T09:01:58.7429981Z ld.shared.b16 %rs53, [%r1041+16]; 2026-02-21T09:01:58.7430125Z ld.shared.b16 %rs54, [%r1041+272]; 2026-02-21T09:01:58.7430200Z cvt.f32.bf16 %r954, %rs47; 2026-02-21T09:01:58.7430266Z cvt.f32.bf16 %r955, %rs48; 2026-02-21T09:01:58.7430335Z cvt.f32.bf16 %r956, %rs51; 2026-02-21T09:01:58.7430404Z cvt.f32.bf16 %r957, %rs52; 2026-02-21T09:01:58.7430467Z cvt.f32.bf16 %r990, %rs49; 2026-02-21T09:01:58.7430537Z cvt.f32.bf16 %r991, %rs50; 2026-02-21T09:01:58.7430600Z cvt.f32.bf16 %r992, %rs53; 2026-02-21T09:01:58.7430665Z cvt.f32.bf16 %r993, %rs54; 2026-02-21T09:01:58.7430877Z .loc 1 54 87 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:54:87 2026-02-21T09:01:58.7430942Z // begin inline asm 2026-02-21T09:01:58.7431004Z mov.u64 %rd125, 0x0; 2026-02-21T09:01:58.7431145Z createpolicy.fractional.L2::evict_last.b64 %rd125, 1.0; 2026-02-21T09:01:58.7431213Z // end inline asm 2026-02-21T09:01:58.7431275Z // begin inline asm 2026-02-21T09:01:58.7431334Z mov.u16 %rs46, 0x0; 2026-02-21T09:01:58.7431508Z ld.global.L1::evict_last.L2::cache_hint.b8 { %rs46 }, [ %rd141 + 0 ], %rd125; 2026-02-21T09:01:58.7431652Z // end inline asm 2026-02-21T09:01:58.7431865Z .loc 1 57 28 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:57:28 2026-02-21T09:01:58.7431940Z shl.b16 %rs55, %rs46, 4; 2026-02-21T09:01:58.7432140Z .loc 1 72 58 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:72:58 2026-02-21T09:01:58.7432215Z selp.b16 %rs56, %rs55, %rs46, %p33; 2026-02-21T09:01:58.7432281Z cvt.s16.s8 %rs57, %rs56; 2026-02-21T09:01:58.7432360Z shr.s16 %rs58, %rs57, 4; 2026-02-21T09:01:58.7432563Z .loc 1 77 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:77:32 2026-02-21T09:01:58.7432632Z cvt.rn.f32.s16 %r1042, %rs58; 2026-02-21T09:01:58.7432708Z st.shared.b32 [%r37], %r1042; 2026-02-21T09:01:58.7432766Z $L__tmp7: 2026-02-21T09:01:58.7433046Z .loc 2 291 36 // standard.py:291:36 
@[ ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:84:40 ] 2026-02-21T09:01:58.7433177Z // begin inline asm 2026-02-21T09:01:58.7433259Z fence.proxy.async.shared::cta; 2026-02-21T09:01:58.7433319Z // end inline asm 2026-02-21T09:01:58.7433379Z bar.sync 0; 2026-02-21T09:01:58.7433477Z shfl.sync.idx.b32 %r1043, %r4, 0, 31, -1; 2026-02-21T09:01:58.7433550Z wgmma.fence.sync.aligned; 2026-02-21T09:01:58.7433616Z mov.pred %p25, -1; 2026-02-21T09:01:58.7433684Z // begin inline asm 2026-02-21T09:01:58.7434192Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1154,%r1155,%r1156,%r1157,%r1158,%r1159,%r1160,%r1161,%r1162,%r1163,%r1164,%r1165,%r1166,%r1167,%r1168,%r1169}, {%r954,%r955,%r956,%r957}, %rd128, %p25, 1, 1; 2026-02-21T09:01:58.7434255Z // end inline asm 2026-02-21T09:01:58.7434324Z // begin inline asm 2026-02-21T09:01:58.7434819Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1154,%r1155,%r1156,%r1157,%r1158,%r1159,%r1160,%r1161,%r1162,%r1163,%r1164,%r1165,%r1166,%r1167,%r1168,%r1169}, {%r990,%r991,%r992,%r993}, %rd129, %p25, 1, 1; 2026-02-21T09:01:58.7434884Z // end inline asm 2026-02-21T09:01:58.7434967Z wgmma.commit_group.sync.aligned; 2026-02-21T09:01:58.7435037Z mov.b32 %r1011, 0; 2026-02-21T09:01:58.7435104Z mov.b32 %r1010, %r867; 2026-02-21T09:01:58.7435236Z mov.b32 %r1012, %r1011; 2026-02-21T09:01:58.7435307Z // begin inline asm 2026-02-21T09:01:58.7435622Z // wait for regs: %r1154,%r1155,%r1156,%r1157,%r1158,%r1159,%r1160,%r1161,%r1162,%r1163,%r1164,%r1165,%r1166,%r1167,%r1168,%r1169,%r1010,%r1011,%r1012 2026-02-21T09:01:58.7435703Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:01:58.7435769Z // end inline asm 2026-02-21T09:01:58.7435834Z $L__tmp8: 2026-02-21T09:01:58.7436057Z .loc 1 40 102 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:40:102 2026-02-21T09:01:58.7436127Z add.s32 %r1044, %r1153, 1; 2026-02-21T09:01:58.7436205Z setp.gt.s32 %p30, %r1044, 1; 2026-02-21T09:01:58.7436276Z selp.b32 %r1153, 0, %r1044, %p30; 
2026-02-21T09:01:58.7436696Z .loc 1 48 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:32 2026-02-21T09:01:58.7436797Z mad.wide.s32 %rd131, %r1151, 2, %rd32; 2026-02-21T09:01:58.7437006Z .loc 1 48 80 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:48:80 2026-02-21T09:01:58.7437075Z shl.b32 %r1045, %r1153, 14; 2026-02-21T09:01:58.7437147Z add.s32 %r1046, %r1080, %r1045; 2026-02-21T09:01:58.7437212Z add.s32 %r1032, %r1046, %r30; 2026-02-21T09:01:58.7437278Z selp.b32 %r1033, 8, 0, %p28; 2026-02-21T09:01:58.7437339Z // begin inline asm 2026-02-21T09:01:58.7437493Z cp.async.ca.shared.global [ %r1032 + 0 ], [ %rd142 + 0 ], 0x8, %r1033; 2026-02-21T09:01:58.7437553Z // end inline asm 2026-02-21T09:01:58.7437616Z add.s32 %r1034, %r1032, 8192; 2026-02-21T09:01:58.7437683Z // begin inline asm 2026-02-21T09:01:58.7437820Z cp.async.ca.shared.global [ %r1034 + 0 ], [ %rd131 + 0 ], 0x8, %r1033; 2026-02-21T09:01:58.7437880Z // end inline asm 2026-02-21T09:01:58.7438031Z cp.async.commit_group; 2026-02-21T09:01:58.7438257Z .loc 1 40 102 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:40:102 2026-02-21T09:01:58.7438325Z add.s64 %rd142, %rd142, 32; 2026-02-21T09:01:58.7438390Z add.s32 %r1151, %r1151, 16; 2026-02-21T09:01:58.7438465Z add.s64 %rd141, %rd141, 65536; 2026-02-21T09:01:58.7438534Z setp.lt.u64 %p31, %rd143, 504; 2026-02-21T09:01:58.7438599Z @%p31 bra $L__BB0_12; 2026-02-21T09:01:58.7438721Z // %bb.13: // in Loop: Header=BB0_11 Depth=1 2026-02-21T09:01:58.7438921Z .loc 1 33 32 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:33:32 2026-02-21T09:01:58.7438986Z or.b32 %r1065, %r183, %r9; 2026-02-21T09:01:58.7439189Z .loc 1 40 102 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:40:102 2026-02-21T09:01:58.7439267Z cp.async.wait_group 0; 2026-02-21T09:01:58.7439326Z bar.sync 0; 2026-02-21T09:01:58.7439598Z .loc 1 87 28 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:87:28 2026-02-21T09:01:58.7439692Z 
cvt.rn.bf16x2.f32 %r1066, %r1155, %r1154; 2026-02-21T09:01:58.7439771Z cvt.rn.bf16x2.f32 %r1067, %r1157, %r1156; 2026-02-21T09:01:58.7439847Z cvt.rn.bf16x2.f32 %r1068, %r1159, %r1158; 2026-02-21T09:01:58.7439925Z cvt.rn.bf16x2.f32 %r1069, %r1161, %r1160; 2026-02-21T09:01:58.7439998Z cvt.rn.bf16x2.f32 %r1070, %r1163, %r1162; 2026-02-21T09:01:58.7440069Z cvt.rn.bf16x2.f32 %r1071, %r1165, %r1164; 2026-02-21T09:01:58.7440141Z cvt.rn.bf16x2.f32 %r1072, %r1167, %r1166; 2026-02-21T09:01:58.7440222Z cvt.rn.bf16x2.f32 %r1073, %r1169, %r1168; 2026-02-21T09:01:58.7440426Z .loc 1 88 43 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:88:43 2026-02-21T09:01:58.7440491Z shl.b32 %r1074, %r181, 13; 2026-02-21T09:01:58.7440559Z shl.b32 %r1075, %r182, 13; 2026-02-21T09:01:58.7440761Z .loc 1 88 50 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:88:50 2026-02-21T09:01:58.7440834Z add.s32 %r1076, %r1074, %r1065; 2026-02-21T09:01:58.7440904Z add.s32 %r1077, %r1075, %r1065; 2026-02-21T09:01:58.7441106Z .loc 1 88 22 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:88:22 2026-02-21T09:01:58.7441265Z mad.wide.s32 %rd132, %r1076, 2, %rd34; 2026-02-21T09:01:58.7441335Z mad.wide.s32 %rd133, %r1077, 2, %rd34; 2026-02-21T09:01:58.7441539Z .loc 1 88 81 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:88:81 2026-02-21T09:01:58.7441659Z st.shared.v4.b32 [%r38], {%r1066, %r1068, %r1070, %r1072}; 2026-02-21T09:01:58.7441778Z st.shared.v4.b32 [%r38+128], {%r1067, %r1069, %r1071, %r1073}; 2026-02-21T09:01:58.7441846Z bar.sync 0; 2026-02-21T09:01:58.7441908Z // begin inline asm 2026-02-21T09:01:58.7442105Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1047, %r1048, %r1049, %r1050}, [%r1051]; 2026-02-21T09:01:58.7442171Z // end inline asm 2026-02-21T09:01:58.7442234Z // begin inline asm 2026-02-21T09:01:58.7442476Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1052, %r1053, %r1054, %r1055}, [%r1056]; 2026-02-21T09:01:58.7442545Z // end inline asm 
2026-02-21T09:01:58.7442614Z // begin inline asm 2026-02-21T09:01:58.7442747Z st.global.v4.b32 [ %rd132 + 0 ], { %r1047, %r1048, %r1049, %r1050 }; 2026-02-21T09:01:58.7442805Z // end inline asm 2026-02-21T09:01:58.7442871Z // begin inline asm 2026-02-21T09:01:58.7442989Z st.global.v4.b32 [ %rd133 + 0 ], { %r1052, %r1053, %r1054, %r1055 }; 2026-02-21T09:01:58.7443048Z // end inline asm 2026-02-21T09:01:58.7443268Z .loc 1 19 144 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:19:144 2026-02-21T09:01:58.7443334Z add.s32 %r223, %r1150, 4224; 2026-02-21T09:01:58.7443408Z setp.lt.s32 %p32, %r1150, -2176; 2026-02-21T09:01:58.7443472Z mov.b32 %r1150, %r223; 2026-02-21T09:01:58.7443544Z @%p32 bra $L__BB0_11; 2026-02-21T09:01:58.7443636Z $L__BB0_14: // %._crit_edge 2026-02-21T09:01:58.7443905Z .loc 1 19 4 // ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py:19:4 2026-02-21T09:01:58.7443968Z ret; 2026-02-21T09:01:58.7444028Z $L__tmp9: 2026-02-21T09:01:58.7444088Z $L__func_end0: 2026-02-21T09:01:58.7444183Z // -- End function 2026-02-21T09:01:58.7444241Z } 2026-02-21T09:01:58.7444488Z .file 1 "/tmp/torchinductor_root/h6/ch6c3d3lqq2xay65huhjm2m5jgqqelakydrdvq5sbfsugh6m3mfa.py" 2026-02-21T09:01:58.7444701Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T09:01:58.7444775Z .section .debug_abbrev 2026-02-21T09:01:58.7444828Z { 2026-02-21T09:01:58.7444927Z .b8 1 // Abbreviation Code 2026-02-21T09:01:58.7445033Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:01:58.7445131Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:01:58.7445231Z .b8 37 // DW_AT_producer 2026-02-21T09:01:58.7445384Z .b8 8 // DW_FORM_string 2026-02-21T09:01:58.7445476Z .b8 19 // DW_AT_language 2026-02-21T09:01:58.7445565Z .b8 5 // DW_FORM_data2 2026-02-21T09:01:58.7445650Z .b8 3 // DW_AT_name 2026-02-21T09:01:58.7445742Z .b8 8 // DW_FORM_string 2026-02-21T09:01:58.7445828Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:01:58.7445910Z .b8 6 // DW_FORM_data4 
2026-02-21T09:01:58.7445999Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:01:58.7446079Z .b8 8 // DW_FORM_string 2026-02-21T09:01:58.7446156Z .b8 0 // EOM(1) 2026-02-21T09:01:58.7446240Z .b8 0 // EOM(2) 2026-02-21T09:01:58.7446332Z .b8 2 // Abbreviation Code 2026-02-21T09:01:58.7446427Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:01:58.7446621Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:01:58.7446813Z .b8 3 // DW_AT_name 2026-02-21T09:01:58.7446895Z .b8 8 // DW_FORM_string 2026-02-21T09:01:58.7446979Z .b8 32 // DW_AT_inline 2026-02-21T09:01:58.7447068Z .b8 11 // DW_FORM_data1 2026-02-21T09:01:58.7447143Z .b8 0 // EOM(1) 2026-02-21T09:01:58.7447215Z .b8 0 // EOM(2) 2026-02-21T09:01:58.7447311Z .b8 3 // Abbreviation Code 2026-02-21T09:01:58.7447400Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:01:58.7447486Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:01:58.7447643Z .b8 17 // DW_AT_low_pc 2026-02-21T09:01:58.7447735Z .b8 1 // DW_FORM_addr 2026-02-21T09:01:58.7447820Z .b8 18 // DW_AT_high_pc 2026-02-21T09:01:58.7447902Z .b8 1 // DW_FORM_addr 2026-02-21T09:01:58.7448009Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:01:58.7448087Z .b8 19 // DW_FORM_ref4 2026-02-21T09:01:58.7448159Z .b8 0 // EOM(1) 2026-02-21T09:01:58.7448235Z .b8 0 // EOM(2) 2026-02-21T09:01:58.7448323Z .b8 4 // Abbreviation Code 2026-02-21T09:01:58.7448426Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T09:01:58.7448515Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:01:58.7448620Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:01:58.7448789Z .b8 19 // DW_FORM_ref4 2026-02-21T09:01:58.7448882Z .b8 17 // DW_AT_low_pc 2026-02-21T09:01:58.7448970Z .b8 1 // DW_FORM_addr 2026-02-21T09:01:58.7449056Z .b8 18 // DW_AT_high_pc 2026-02-21T09:01:58.7449135Z .b8 1 // DW_FORM_addr 2026-02-21T09:01:58.7449228Z .b8 88 // DW_AT_call_file 2026-02-21T09:01:58.7449307Z .b8 11 // DW_FORM_data1 2026-02-21T09:01:58.7449390Z .b8 89 // DW_AT_call_line 2026-02-21T09:01:58.7449476Z .b8 11 // DW_FORM_data1 2026-02-21T09:01:58.7449565Z .b8 87 // 
DW_AT_call_column 2026-02-21T09:01:58.7449645Z .b8 11 // DW_FORM_data1 2026-02-21T09:01:58.7449720Z .b8 0 // EOM(1) 2026-02-21T09:01:58.7449800Z .b8 0 // EOM(2) 2026-02-21T09:01:58.7449940Z .b8 0 // EOM(3) 2026-02-21T09:01:58.7450009Z } 2026-02-21T09:01:58.7450084Z .section .debug_info 2026-02-21T09:01:58.7450139Z { 2026-02-21T09:01:58.7450233Z .b32 178 // Length of Unit 2026-02-21T09:01:58.7450334Z .b8 2 // DWARF version number 2026-02-21T09:01:58.7450389Z .b8 0 2026-02-21T09:01:58.7450526Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:01:58.7450632Z .b8 8 // Address Size (in bytes) 2026-02-21T09:01:58.7450761Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T09:01:58.7450849Z .b8 116 // DW_AT_producer 2026-02-21T09:01:58.7450905Z .b8 114 2026-02-21T09:01:58.7450965Z .b8 105 2026-02-21T09:01:58.7451019Z .b8 116 2026-02-21T09:01:58.7451074Z .b8 111 2026-02-21T09:01:58.7451130Z .b8 110 2026-02-21T09:01:58.7451189Z .b8 0 2026-02-21T09:01:58.7451273Z .b8 2 // DW_AT_language 2026-02-21T09:01:58.7451327Z .b8 0 2026-02-21T09:01:58.7451416Z .b8 99 // DW_AT_name 2026-02-21T09:01:58.7451539Z .b8 104 2026-02-21T09:01:58.7451593Z .b8 54 2026-02-21T09:01:58.7451647Z .b8 99 2026-02-21T09:01:58.7451706Z .b8 51 2026-02-21T09:01:58.7451760Z .b8 100 2026-02-21T09:01:58.7451813Z .b8 51 2026-02-21T09:01:58.7451872Z .b8 108 2026-02-21T09:01:58.7451925Z .b8 113 2026-02-21T09:01:58.7451978Z .b8 113 2026-02-21T09:01:58.7452031Z .b8 50 2026-02-21T09:01:58.7452091Z .b8 120 2026-02-21T09:01:58.7452144Z .b8 97 2026-02-21T09:01:58.7452197Z .b8 121 2026-02-21T09:01:58.7452260Z .b8 54 2026-02-21T09:01:58.7452313Z .b8 53 2026-02-21T09:01:58.7452367Z .b8 104 2026-02-21T09:01:58.7452424Z .b8 117 2026-02-21T09:01:58.7452484Z .b8 104 2026-02-21T09:01:58.7452537Z .b8 106 2026-02-21T09:01:58.7452591Z .b8 109 2026-02-21T09:01:58.7452647Z .b8 50 2026-02-21T09:01:58.7452767Z .b8 109 2026-02-21T09:01:58.7452826Z .b8 53 2026-02-21T09:01:58.7452883Z .b8 106 
2026-02-21T09:01:58.7452946Z .b8 103 2026-02-21T09:01:58.7453000Z .b8 113 2026-02-21T09:01:58.7453055Z .b8 113 2026-02-21T09:01:58.7453109Z .b8 101 2026-02-21T09:01:58.7453170Z .b8 108 2026-02-21T09:01:58.7453224Z .b8 97 2026-02-21T09:01:58.7453280Z .b8 107 2026-02-21T09:01:58.7453341Z .b8 121 2026-02-21T09:01:58.7453396Z .b8 100 2026-02-21T09:01:58.7453460Z .b8 114 2026-02-21T09:01:58.7453519Z .b8 100 2026-02-21T09:01:58.7453581Z .b8 118 2026-02-21T09:01:58.7453637Z .b8 113 2026-02-21T09:01:58.7453691Z .b8 53 2026-02-21T09:01:58.7453749Z .b8 115 2026-02-21T09:01:58.7453800Z .b8 98 2026-02-21T09:01:58.7453851Z .b8 102 2026-02-21T09:01:58.7453904Z .b8 115 2026-02-21T09:01:58.7453964Z .b8 117 2026-02-21T09:01:58.7454016Z .b8 103 2026-02-21T09:01:58.7454067Z .b8 104 2026-02-21T09:01:58.7454124Z .b8 54 2026-02-21T09:01:58.7454176Z .b8 109 2026-02-21T09:01:58.7454229Z .b8 51 2026-02-21T09:01:58.7454357Z .b8 109 2026-02-21T09:01:58.7454417Z .b8 102 2026-02-21T09:01:58.7454469Z .b8 97 2026-02-21T09:01:58.7454524Z .b8 46 2026-02-21T09:01:58.7454582Z .b8 112 2026-02-21T09:01:58.7454634Z .b8 121 2026-02-21T09:01:58.7454687Z .b8 0 2026-02-21T09:01:58.7454794Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:01:58.7454883Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:01:58.7454937Z .b8 116 2026-02-21T09:01:58.7454989Z .b8 109 2026-02-21T09:01:58.7455050Z .b8 112 2026-02-21T09:01:58.7455101Z .b8 47 2026-02-21T09:01:58.7455154Z .b8 116 2026-02-21T09:01:58.7455211Z .b8 111 2026-02-21T09:01:58.7455268Z .b8 114 2026-02-21T09:01:58.7455320Z .b8 99 2026-02-21T09:01:58.7455373Z .b8 104 2026-02-21T09:01:58.7455426Z .b8 105 2026-02-21T09:01:58.7455487Z .b8 110 2026-02-21T09:01:58.7455540Z .b8 100 2026-02-21T09:01:58.7455592Z .b8 117 2026-02-21T09:01:58.7455663Z .b8 99 2026-02-21T09:01:58.7455717Z .b8 116 2026-02-21T09:01:58.7455770Z .b8 111 2026-02-21T09:01:58.7455825Z .b8 114 2026-02-21T09:01:58.7455884Z .b8 95 2026-02-21T09:01:58.7455999Z .b8 114 2026-02-21T09:01:58.7456059Z .b8 111 
2026-02-21T09:01:58.7456118Z .b8 111 2026-02-21T09:01:58.7456171Z .b8 116 2026-02-21T09:01:58.7456225Z .b8 47 2026-02-21T09:01:58.7456279Z .b8 104 2026-02-21T09:01:58.7456336Z .b8 54 2026-02-21T09:01:58.7456388Z .b8 0 2026-02-21T09:01:58.7456616Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T09:01:58.7456708Z .b8 95 // DW_AT_name 2026-02-21T09:01:58.7456762Z .b8 104 2026-02-21T09:01:58.7456815Z .b8 101 2026-02-21T09:01:58.7456867Z .b8 108 2026-02-21T09:01:58.7456928Z .b8 105 2026-02-21T09:01:58.7456980Z .b8 111 2026-02-21T09:01:58.7457034Z .b8 110 2026-02-21T09:01:58.7457092Z .b8 95 2026-02-21T09:01:58.7457145Z .b8 109 2026-02-21T09:01:58.7457197Z .b8 97 2026-02-21T09:01:58.7457250Z .b8 116 2026-02-21T09:01:58.7457310Z .b8 109 2026-02-21T09:01:58.7457363Z .b8 117 2026-02-21T09:01:58.7457417Z .b8 108 2026-02-21T09:01:58.7457470Z .b8 95 2026-02-21T09:01:58.7457529Z .b8 98 2026-02-21T09:01:58.7457583Z .b8 102 2026-02-21T09:01:58.7457636Z .b8 49 2026-02-21T09:01:58.7457698Z .b8 54 2026-02-21T09:01:58.7457749Z .b8 95 2026-02-21T09:01:58.7457898Z .b8 105 2026-02-21T09:01:58.7457951Z .b8 110 2026-02-21T09:01:58.7458010Z .b8 116 2026-02-21T09:01:58.7458062Z .b8 52 2026-02-21T09:01:58.7458113Z .b8 0 2026-02-21T09:01:58.7458202Z .b8 1 // DW_AT_inline 2026-02-21T09:01:58.7458316Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T09:01:58.7458414Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T09:01:58.7458516Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T09:01:58.7458624Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:01:58.7458756Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T09:01:58.7458934Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:01:58.7459042Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T09:01:58.7459134Z .b64 $L__tmp8 // DW_AT_high_pc 2026-02-21T09:01:58.7459221Z .b8 1 // DW_AT_call_file 2026-02-21T09:01:58.7459314Z .b8 84 // DW_AT_call_line 2026-02-21T09:01:58.7459403Z .b8 40 // DW_AT_call_column 
2026-02-21T09:01:58.7459498Z .b8 0 // End Of Children Mark 2026-02-21T09:01:58.7459595Z .b8 0 // End Of Children Mark 2026-02-21T09:01:58.7459650Z } 2026-02-21T09:01:58.7459723Z .section .debug_macinfo { } 2026-02-21T09:01:58.7459730Z 2026-02-21T09:01:58.7459810Z ================================================================ 2026-02-21T09:01:58.7459941Z please share the reproducer above with Triton project. 2026-02-21T09:02:02.3905739Z [67s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 64, 16], indexing=['pointer', 'pointer', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['last', 'last'], loop_orders=[[1, 0]], num_sm_multiplier=4, num_stages=3, num_warps=16, pid_type='persistent_blocked', range_flattens=[True, True], range_multi_buffers=[None, False], range_num_stages=[3, 0], range_unroll_factors=[2, 2], range_warp_specializes=[]) 2026-02-21T09:02:02.3908181Z Tensor-likes are not close! 2026-02-21T09:02:02.3908435Z 2026-02-21T09:02:02.3908538Z Mismatched elements: 33528121 / 33554432 (99.9%) 2026-02-21T09:02:02.3908912Z Greatest absolute difference: 3280.0 at index (3294, 3994) (up to 0.01 allowed) 2026-02-21T09:02:02.3909373Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:02:02.3909785Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:02:02.3910011Z 2026-02-21T09:02:02.4172151Z [67s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 64, 64], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[2], load_eviction_policies=['first', 'first'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=2, num_stages=2, num_warps=4, pid_type='persistent_interleaved', range_flattens=[True, True], range_multi_buffers=[True, None], range_num_stages=[4, 1], range_unroll_factors=[1, 1], range_warp_specializes=[]) 2026-02-21T09:02:02.4173940Z Tensor-likes are not close! 
2026-02-21T09:02:02.4174098Z 2026-02-21T09:02:02.4174215Z Mismatched elements: 33509612 / 33554432 (99.9%) 2026-02-21T09:02:02.4174644Z Greatest absolute difference: 3232.0 at index (47, 1710) (up to 0.01 allowed) 2026-02-21T09:02:02.4175169Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:02:02.4175625Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:02:02.4175882Z 2026-02-21T09:02:09.3778014Z [74s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 128, 256], indexing=['pointer', 'pointer', 'tensor_descriptor'], l2_groupings=[8], load_eviction_policies=['', 'first'], loop_orders=[[1, 0]], num_sm_multiplier=128, num_stages=3, num_warps=32, pid_type='persistent_interleaved', range_flattens=[True, True], range_multi_buffers=[True, True], range_num_stages=[1, 0], range_unroll_factors=[1, 4], range_warp_specializes=[]) 2026-02-21T09:02:09.3780296Z Tensor-likes are not close! 2026-02-21T09:02:09.3780498Z 2026-02-21T09:02:09.3780602Z Mismatched elements: 33509460 / 33554432 (99.9%) 2026-02-21T09:02:09.3781008Z Greatest absolute difference: 3408.0 at index (1142, 7740) (up to 0.01 allowed) 2026-02-21T09:02:09.3781499Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:02:09.3781937Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:02:09.3782184Z 2026-02-21T09:02:20.8593625Z [86s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 64, 256], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[8], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=128, num_sm_multiplier=8, num_stages=3, num_warps=1, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[True, True], range_num_stages=[1, 4], range_unroll_factors=[1, 1], range_warp_specializes=[]) 2026-02-21T09:02:20.8595609Z Tensor-likes are not close! 
2026-02-21T09:02:20.8595773Z 2026-02-21T09:02:20.8595894Z Mismatched elements: 33503948 / 33554432 (99.8%) 2026-02-21T09:02:20.8596320Z Greatest absolute difference: 3200.0 at index (3811, 5777) (up to 0.01 allowed) 2026-02-21T09:02:20.8597021Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:02:20.8597483Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:02:20.8597748Z 2026-02-21T09:02:21.5523198Z 2026-02-21T09:02:21.5523214Z 2026-02-21T09:02:21.5523220Z 2026-02-21T09:02:21.5523552Z ================================================================ 2026-02-21T09:02:21.5524381Z Internal Triton PTX codegen error 2026-02-21T09:02:21.5524700Z `ptxas` stderr: 2026-02-21T09:02:21.5525459Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 400 in function _helion_matmul_bf16_int4. Try to compile with register target of 38 or higher. 2026-02-21T09:02:21.5526304Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:02:21.5526772Z 2026-02-21T09:02:21.5527436Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpg__rzxlx.ptx -o /tmp/tmpg__rzxlx.ptx.o 2026-02-21T09:02:21.5528185Z 2026-02-21T09:02:21.5528190Z 2026-02-21T09:02:21.5528272Z // 2026-02-21T09:02:21.5528461Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:02:21.5528711Z // 2026-02-21T09:02:21.5528798Z 2026-02-21T09:02:21.5528870Z .version 8.7 2026-02-21T09:02:21.5529053Z .target sm_90a 2026-02-21T09:02:21.5529231Z .address_size 64 2026-02-21T09:02:21.5529353Z 2026-02-21T09:02:21.5529577Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T09:02:21.5530200Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:02:21.5530541Z // @_helion_matmul_bf16_int4 2026-02-21T09:02:21.5530880Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T09:02:21.5531253Z .param .u64 .ptr .global .align 
1 _helion_matmul_bf16_int4_param_0, 2026-02-21T09:02:21.5531701Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T09:02:21.5532139Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T09:02:21.5532568Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T09:02:21.5533002Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T09:02:21.5533336Z ) 2026-02-21T09:02:21.5533505Z .reqntid 512 2026-02-21T09:02:21.5533683Z .maxnreg 32 2026-02-21T09:02:21.5533856Z { 2026-02-21T09:02:21.5534024Z .reg .pred %p<69>; 2026-02-21T09:02:21.5534235Z .reg .b16 %rs<85>; 2026-02-21T09:02:21.5534399Z .reg .b32 %r<950>; 2026-02-21T09:02:21.5534563Z .reg .b64 %rd<264>; 2026-02-21T09:02:21.5534737Z $L__func_begin0: 2026-02-21T09:02:21.5534833Z 2026-02-21T09:02:21.5534891Z // %bb.0: 2026-02-21T09:02:21.5535383Z .loc 1 21 67 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:21:67 2026-02-21T09:02:21.5535781Z mov.u32 %r142, %ctaid.x; 2026-02-21T09:02:21.5536035Z ld.param.b64 %rd68, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T09:02:21.5536318Z mov.u32 %r143, %ctaid.y; 2026-02-21T09:02:21.5536733Z ld.param.b64 %rd85, [_helion_matmul_bf16_int4_param_3]; 2026-02-21T09:02:21.5537001Z mov.u32 %r144, %ctaid.z; 2026-02-21T09:02:21.5537205Z mov.u32 %r145, %nctaid.x; 2026-02-21T09:02:21.5537397Z mov.u32 %r146, %nctaid.y; 2026-02-21T09:02:21.5537592Z mad.lo.s32 %r147, %r144, %r146, %r143; 2026-02-21T09:02:21.5537823Z mad.lo.s32 %r148, %r147, %r145, %r142; 2026-02-21T09:02:21.5538038Z shl.b32 %r149, %r148, 7; 2026-02-21T09:02:21.5538470Z cvt.s64.s32 %rd86, %r149; 2026-02-21T09:02:21.5538669Z add.s64 %rd82, %rd85, %rd86; 2026-02-21T09:02:21.5538863Z mov.u32 %r1, %tid.x; 2026-02-21T09:02:21.5539035Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:02:21.5539227Z shl.b32 %r150, %r1, 2; 2026-02-21T09:02:21.5539411Z mov.b32 %r151, global_smem; 2026-02-21T09:02:21.5539608Z add.s32 %r134, %r151, %r150; 
2026-02-21T09:02:21.5539792Z mov.b32 %r135, 0; 2026-02-21T09:02:21.5539949Z // begin inline asm 2026-02-21T09:02:21.5540137Z @%p1 st.shared.b32 [ %r134 + 0 ], %r135; 2026-02-21T09:02:21.5540357Z // end inline asm 2026-02-21T09:02:21.5540526Z bar.warp.sync -1; 2026-02-21T09:02:21.5540697Z setp.eq.b32 %p2, %r1, 0; 2026-02-21T09:02:21.5540892Z cvt.u64.u32 %rd67, %r151; 2026-02-21T09:02:21.5541077Z // begin inline asm 2026-02-21T09:02:21.5541436Z @%p2 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd67 + 0 ], %rd68; 2026-02-21T09:02:21.5541816Z // end inline asm 2026-02-21T09:02:21.5550577Z // begin inline asm 2026-02-21T09:02:21.5550871Z @%p2 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd67 + 0 ], 0x1; 2026-02-21T09:02:21.5551189Z // end inline asm 2026-02-21T09:02:21.5551343Z mov.b32 %r136, 16; 2026-02-21T09:02:21.5551499Z // begin inline asm 2026-02-21T09:02:21.5551784Z @%p2 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd67 + 0 ], 0x0, %r136; 2026-02-21T09:02:21.5552123Z // end inline asm 2026-02-21T09:02:21.5552304Z mov.b32 %r137, 256; 2026-02-21T09:02:21.5552465Z // begin inline asm 2026-02-21T09:02:21.5552754Z @%p2 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd67 + 0 ], 0x1, %r137; 2026-02-21T09:02:21.5553093Z // end inline asm 2026-02-21T09:02:21.5553243Z mov.b32 %r138, 8192; 2026-02-21T09:02:21.5553405Z // begin inline asm 2026-02-21T09:02:21.5553709Z @%p2 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd67 + 0 ], 0x0, %r138; 2026-02-21T09:02:21.5554070Z // end inline asm 2026-02-21T09:02:21.5554219Z mov.b32 %r139, 4096; 2026-02-21T09:02:21.5554387Z // begin inline asm 2026-02-21T09:02:21.5554775Z @%p2 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd67 + 0 ], 0x1, %r139; 2026-02-21T09:02:21.5555128Z // end inline asm 2026-02-21T09:02:21.5555285Z mov.b64 %rd75, 16384; 2026-02-21T09:02:21.5555447Z // begin inline asm 2026-02-21T09:02:21.5555758Z @%p2 
tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd67 + 0 ], 0x0, %rd75; 2026-02-21T09:02:21.5556108Z // end inline asm 2026-02-21T09:02:21.5556270Z mov.b32 %r140, 1; 2026-02-21T09:02:21.5556424Z // begin inline asm 2026-02-21T09:02:21.5556874Z @%p2 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd67 + 0 ], 0x0, %r140; 2026-02-21T09:02:21.5557252Z // end inline asm 2026-02-21T09:02:21.5557405Z // begin inline asm 2026-02-21T09:02:21.5557722Z @%p2 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd67 + 0 ], 0x1, %r140; 2026-02-21T09:02:21.5558076Z // end inline asm 2026-02-21T09:02:21.5558232Z // begin inline asm 2026-02-21T09:02:21.5558509Z @%p2 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd67 + 0 ], 0xa; 2026-02-21T09:02:21.5558834Z // end inline asm 2026-02-21T09:02:21.5558982Z // begin inline asm 2026-02-21T09:02:21.5559400Z @%p2 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd67 + 0 ], 0x0; 2026-02-21T09:02:21.5559749Z // end inline asm 2026-02-21T09:02:21.5559892Z // begin inline asm 2026-02-21T09:02:21.5560170Z @%p2 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd67 + 0 ], 0x1; 2026-02-21T09:02:21.5560508Z // end inline asm 2026-02-21T09:02:21.5560659Z // begin inline asm 2026-02-21T09:02:21.5560926Z @%p2 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd67 + 0 ], 0x0; 2026-02-21T09:02:21.5561245Z // end inline asm 2026-02-21T09:02:21.5561394Z // begin inline asm 2026-02-21T09:02:21.5561863Z [86s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:02:21.5563520Z Config: @helion.kernel(config=helion.Config(block_sizes=[16, 256, 16], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[1], load_eviction_policies=['', 'first'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=4, num_stages=6, num_warps=16, pid_type='persistent_blocked', range_flattens=[None, None], range_multi_buffers=[True, True], range_num_stages=[2, 3], range_unroll_factors=[4, 1], range_warp_specializes=[]), static_shapes=True) 2026-02-21T09:02:21.5565044Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:02:21.5565327Z `ptxas` stderr: 2026-02-21T09:02:21.5565875Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 400 in function _helion_matmul_bf16_int4. Try to compile with register target of 38 or higher. 2026-02-21T09:02:21.5566623Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:02:21.5566810Z 2026-02-21T09:02:21.5567310Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpg__rzxlx.ptx -o /tmp/tmpg__rzxlx.ptx.o 2026-02-21T09:02:21.5567991Z 2026-02-21T09:02:21.5568149Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:02:21.5568739Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd82 + 0 ], [ %rd67 + 0 ], 0x80; 2026-02-21T09:02:21.5569234Z // end inline asm 2026-02-21T09:02:21.5569391Z // begin inline asm 2026-02-21T09:02:21.5569640Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd82 + 0 ], 0x80; 2026-02-21T09:02:21.5569957Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:02:21.5570185Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:02:21.5570391Z // end inline asm 2026-02-21T09:02:21.5570544Z bar.sync 0; 2026-02-21T09:02:21.5570698Z cvta.global.u64 %rd205, %rd82; 2026-02-21T09:02:21.5571042Z .loc 1 27 35 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:27:35 2026-02-21T09:02:21.5571401Z shl.b32 %r905, %r142, 4; 2026-02-21T09:02:21.5571800Z .loc 1 28 37 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:28:37 2026-02-21T09:02:21.5572159Z add.s32 %r152, %r905, 16; 2026-02-21T09:02:21.5572476Z .loc 1 28 49 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:28:49 2026-02-21T09:02:21.5572829Z min.s32 %r3, %r152, 8192; 2026-02-21T09:02:21.5573132Z .loc 1 29 88 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:29:88 2026-02-21T09:02:21.5573487Z setp.ge.s32 %p19, %r905, %r3; 2026-02-21T09:02:21.5573675Z @%p19 bra $L__BB0_11; 2026-02-21T09:02:21.5573861Z // %bb.1: // %.lr.ph 2026-02-21T09:02:21.5574211Z .loc 1 0 88 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:0:88 2026-02-21T09:02:21.5574610Z ld.param.b64 %rd66, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T09:02:21.5574911Z ld.param.b64 %rd65, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T09:02:21.5575156Z shr.u32 %r4, %r1, 5; 2026-02-21T09:02:21.5575322Z and.b32 %r5, %r1, 15; 2026-02-21T09:02:21.5575484Z shr.u32 %r153, %r1, 3; 2026-02-21T09:02:21.5575653Z bfe.u32 %r6, %r1, 3, 6; 2026-02-21T09:02:21.5575904Z or.b32 %r7, %r153, 64; 2026-02-21T09:02:21.5576070Z or.b32 %r8, %r6, 128; 2026-02-21T09:02:21.5576228Z 
or.b32 %r9, %r153, 192; 2026-02-21T09:02:21.5576394Z and.b32 %r10, %r1, 7; 2026-02-21T09:02:21.5576696Z shl.b32 %r11, %r10, 2; 2026-02-21T09:02:21.5576858Z and.b32 %r12, %r1, 16; 2026-02-21T09:02:21.5577166Z .loc 1 35 45 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:35:45 2026-02-21T09:02:21.5577514Z and.b32 %r154, %r1, 480; 2026-02-21T09:02:21.5577692Z bfe.u32 %r155, %r1, 5, 4; 2026-02-21T09:02:21.5577860Z shl.b32 %r156, %r1, 3; 2026-02-21T09:02:21.5578030Z and.b32 %r157, %r156, 4088; 2026-02-21T09:02:21.5578205Z shr.u32 %r158, %r1, 1; 2026-02-21T09:02:21.5578369Z and.b32 %r159, %r158, 24; 2026-02-21T09:02:21.5578541Z xor.b32 %r13, %r157, %r159; 2026-02-21T09:02:21.5578813Z add.s32 %r722, %r151, %r13; 2026-02-21T09:02:21.5578994Z add.s32 %r724, %r722, 4096; 2026-02-21T09:02:21.5579178Z add.s32 %r726, %r722, 8192; 2026-02-21T09:02:21.5579360Z add.s32 %r728, %r722, 12288; 2026-02-21T09:02:21.5579537Z add.s32 %r730, %r722, 16384; 2026-02-21T09:02:21.5579717Z add.s32 %r732, %r722, 20480; 2026-02-21T09:02:21.5579887Z add.s32 %r734, %r722, 24576; 2026-02-21T09:02:21.5580061Z add.s32 %r736, %r722, 28672; 2026-02-21T09:02:21.5580229Z shl.b32 %r161, %r154, 5; 2026-02-21T09:02:21.5580399Z shl.b32 %r162, %r1, 4; 2026-02-21T09:02:21.5580561Z and.b32 %r163, %r162, 448; 2026-02-21T09:02:21.5580740Z shl.b32 %r164, %r1, 1; 2026-02-21T09:02:21.5580908Z and.b32 %r165, %r164, 6; 2026-02-21T09:02:21.5581079Z and.b32 %r166, %r1, 24; 2026-02-21T09:02:21.5581250Z or.b32 %r167, %r161, %r163; 2026-02-21T09:02:21.5581421Z or.b32 %r168, %r166, %r165; 2026-02-21T09:02:21.5581601Z or.b32 %r22, %r167, %r168; 2026-02-21T09:02:21.5581859Z xor.b32 %r23, %r22, 8; 2026-02-21T09:02:21.5582042Z xor.b32 %r24, %r22, 16; 2026-02-21T09:02:21.5582208Z xor.b32 %r25, %r22, 24; 2026-02-21T09:02:21.5582376Z shl.b32 %r169, %r5, 7; 2026-02-21T09:02:21.5582540Z shl.b32 %r170, %r10, 4; 2026-02-21T09:02:21.5582709Z shr.u32 %r171, %r1, 2; 2026-02-21T09:02:21.5582877Z and.b32 %r172, %r171, 124; 
2026-02-21T09:02:21.5583044Z xor.b32 %r173, %r170, %r172; 2026-02-21T09:02:21.5583220Z add.s32 %r174, %r151, 32768; 2026-02-21T09:02:21.5583388Z add.s32 %r175, %r174, %r169; 2026-02-21T09:02:21.5583562Z add.s32 %r26, %r175, %r173; 2026-02-21T09:02:21.5583732Z bfe.u32 %r176, %r174, 4, 14; 2026-02-21T09:02:21.5583916Z cvt.u64.u32 %rd87, %r176; 2026-02-21T09:02:21.5584102Z or.b64 %rd235, %rd87, 4611686293313683456; 2026-02-21T09:02:21.5584315Z add.s32 %r177, %r151, 32800; 2026-02-21T09:02:21.5584492Z bfe.u32 %r178, %r177, 4, 14; 2026-02-21T09:02:21.5584666Z cvt.u64.u32 %rd88, %r178; 2026-02-21T09:02:21.5584857Z or.b64 %rd236, %rd88, 4611686293313683456; 2026-02-21T09:02:21.5585151Z add.s32 %r179, %r151, 32832; 2026-02-21T09:02:21.5585335Z bfe.u32 %r180, %r179, 4, 14; 2026-02-21T09:02:21.5585505Z cvt.u64.u32 %rd89, %r180; 2026-02-21T09:02:21.5585693Z or.b64 %rd237, %rd89, 4611686293313683456; 2026-02-21T09:02:21.5585892Z add.s32 %r181, %r151, 32864; 2026-02-21T09:02:21.5586067Z bfe.u32 %r182, %r181, 4, 14; 2026-02-21T09:02:21.5586236Z cvt.u64.u32 %rd90, %r182; 2026-02-21T09:02:21.5586423Z or.b64 %rd238, %rd90, 4611686293313683456; 2026-02-21T09:02:21.5586775Z shl.b32 %r183, %r1, 5; 2026-02-21T09:02:21.5586936Z and.b32 %r184, %r183, 352; 2026-02-21T09:02:21.5587111Z shl.b32 %r185, %r154, 4; 2026-02-21T09:02:21.5587275Z bfe.s32 %r186, %r1, 2, 1; 2026-02-21T09:02:21.5587444Z and.b32 %r187, %r186, 144; 2026-02-21T09:02:21.5587611Z or.b32 %r188, %r187, %r185; 2026-02-21T09:02:21.5587786Z xor.b32 %r189, %r188, %r12; 2026-02-21T09:02:21.5587956Z add.s32 %r190, %r151, %r184; 2026-02-21T09:02:21.5588137Z add.s32 %r27, %r190, %r189; 2026-02-21T09:02:21.5588547Z .loc 1 29 88 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:29:88 2026-02-21T09:02:21.5588910Z mad.wide.u32 %rd91, %r10, 8, %rd65; 2026-02-21T09:02:21.5589240Z add.s64 %rd6, %rd91, 128; 2026-02-21T09:02:21.5589408Z shl.b32 %r29, %r7, 10; 2026-02-21T09:02:21.5589573Z shl.b32 %r30, %r6, 10; 
2026-02-21T09:02:21.5589730Z shl.b32 %r192, %r9, 10; 2026-02-21T09:02:21.5590037Z .loc 1 44 92 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:44:92 2026-02-21T09:02:21.5590379Z or.b32 %r193, %r192, %r11; 2026-02-21T09:02:21.5590553Z or.b32 %r31, %r193, 64; 2026-02-21T09:02:21.5590859Z .loc 1 29 88 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:29:88 2026-02-21T09:02:21.5591218Z mad.wide.u32 %rd7, %r155, 8192, %rd66; 2026-02-21T09:02:21.5591478Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:02:21.5591841Z // Child Loop BB0_3 Depth 2 2026-02-21T09:02:21.5592119Z // Child Loop BB0_5 Depth 2 2026-02-21T09:02:21.5592377Z // Child Loop BB0_7 Depth 2 2026-02-21T09:02:21.5592643Z // Child Loop BB0_9 Depth 2 2026-02-21T09:02:21.5593022Z .loc 1 33 31 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:33:31 2026-02-21T09:02:21.5593377Z shr.s32 %r213, %r905, 31; 2026-02-21T09:02:21.5593552Z shr.u32 %r214, %r213, 23; 2026-02-21T09:02:21.5593720Z add.s32 %r215, %r905, %r214; 2026-02-21T09:02:21.5593902Z shr.s32 %r216, %r215, 9; 2026-02-21T09:02:21.5594212Z .loc 1 32 30 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:32:30 2026-02-21T09:02:21.5594568Z and.b32 %r217, %r215, 268434944; 2026-02-21T09:02:21.5594758Z sub.s32 %r218, %r905, %r217; 2026-02-21T09:02:21.5595154Z .loc 1 34 27 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:34:27 2026-02-21T09:02:21.5595503Z shl.b32 %r363, %r218, 4; 2026-02-21T09:02:21.5595802Z .loc 1 36 27 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:36:27 2026-02-21T09:02:21.5596154Z shl.b32 %r364, %r216, 8; 2026-02-21T09:02:21.5596579Z .loc 1 37 32 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:37:32 2026-02-21T09:02:21.5596939Z or.b32 %r219, %r364, %r6; 2026-02-21T09:02:21.5597111Z or.b32 %r220, %r364, %r7; 2026-02-21T09:02:21.5597273Z or.b32 %r221, %r364, %r8; 2026-02-21T09:02:21.5597445Z or.b32 %r222, %r364, %r9; 2026-02-21T09:02:21.5597742Z .loc 1 52 53 // 
cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:53 2026-02-21T09:02:21.5598089Z shl.b32 %r223, %r219, 10; 2026-02-21T09:02:21.5598254Z shl.b32 %r224, %r220, 10; 2026-02-21T09:02:21.5598421Z shl.b32 %r225, %r221, 10; 2026-02-21T09:02:21.5598590Z shl.b32 %r226, %r222, 10; 2026-02-21T09:02:21.5598978Z .loc 1 52 60 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:60 2026-02-21T09:02:21.5599336Z or.b32 %r227, %r223, %r11; 2026-02-21T09:02:21.5599510Z or.b32 %r228, %r224, %r11; 2026-02-21T09:02:21.5599685Z or.b32 %r229, %r225, %r11; 2026-02-21T09:02:21.5599868Z or.b32 %r230, %r226, %r11; 2026-02-21T09:02:21.5600185Z .loc 1 52 32 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:32 2026-02-21T09:02:21.5600536Z mad.wide.s32 %rd92, %r227, 2, %rd65; 2026-02-21T09:02:21.5600745Z mad.wide.s32 %rd93, %r228, 2, %rd65; 2026-02-21T09:02:21.5600940Z mad.wide.s32 %rd94, %r229, 2, %rd65; 2026-02-21T09:02:21.5601141Z mad.wide.s32 %rd95, %r230, 2, %rd65; 2026-02-21T09:02:21.5601471Z .loc 1 52 80 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:80 2026-02-21T09:02:21.5601812Z bar.sync 0; 2026-02-21T09:02:21.5601960Z mov.b32 %r195, 8; 2026-02-21T09:02:21.5602112Z // begin inline asm 2026-02-21T09:02:21.5602360Z cp.async.ca.shared.global [ %r722 + 0 ], [ %rd92 + 0 ], 0x8, %r195; 2026-02-21T09:02:21.5602632Z // end inline asm 2026-02-21T09:02:21.5602876Z // begin inline asm 2026-02-21T09:02:21.5603097Z cp.async.ca.shared.global [ %r724 + 0 ], [ %rd93 + 0 ], 0x8, %r195; 2026-02-21T09:02:21.5603370Z // end inline asm 2026-02-21T09:02:21.5603524Z // begin inline asm 2026-02-21T09:02:21.5603741Z cp.async.ca.shared.global [ %r726 + 0 ], [ %rd94 + 0 ], 0x8, %r195; 2026-02-21T09:02:21.5604020Z // end inline asm 2026-02-21T09:02:21.5604167Z // begin inline asm 2026-02-21T09:02:21.5604392Z cp.async.ca.shared.global [ %r728 + 0 ], [ %rd95 + 0 ], 0x8, %r195; 2026-02-21T09:02:21.5604657Z // end inline asm 2026-02-21T09:02:21.5604823Z 
cp.async.commit_group; 2026-02-21T09:02:21.5605130Z .loc 1 52 32 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:32 2026-02-21T09:02:21.5605496Z cvt.s64.s32 %rd101, %r223; 2026-02-21T09:02:21.5605768Z cvt.u64.u32 %rd8, %r11; 2026-02-21T09:02:21.5605952Z or.b64 %rd102, %rd101, %rd8; 2026-02-21T09:02:21.5606160Z shl.b64 %rd103, %rd102, 1; 2026-02-21T09:02:21.5606340Z add.s64 %rd104, %rd65, %rd103; 2026-02-21T09:02:21.5606659Z add.s64 %rd96, %rd104, 64; 2026-02-21T09:02:21.5606833Z cvt.s64.s32 %rd105, %r224; 2026-02-21T09:02:21.5607011Z or.b64 %rd106, %rd105, %rd8; 2026-02-21T09:02:21.5607188Z shl.b64 %rd107, %rd106, 1; 2026-02-21T09:02:21.5607364Z add.s64 %rd108, %rd65, %rd107; 2026-02-21T09:02:21.5607547Z add.s64 %rd97, %rd108, 64; 2026-02-21T09:02:21.5607723Z cvt.s64.s32 %rd109, %r225; 2026-02-21T09:02:21.5607911Z or.b64 %rd110, %rd109, %rd8; 2026-02-21T09:02:21.5608087Z shl.b64 %rd111, %rd110, 1; 2026-02-21T09:02:21.5608266Z add.s64 %rd112, %rd65, %rd111; 2026-02-21T09:02:21.5608442Z add.s64 %rd98, %rd112, 64; 2026-02-21T09:02:21.5608615Z cvt.s64.s32 %rd113, %r226; 2026-02-21T09:02:21.5608784Z or.b64 %rd114, %rd113, %rd8; 2026-02-21T09:02:21.5609067Z shl.b64 %rd115, %rd114, 1; 2026-02-21T09:02:21.5609242Z add.s64 %rd116, %rd65, %rd115; 2026-02-21T09:02:21.5609424Z add.s64 %rd99, %rd116, 64; 2026-02-21T09:02:21.5609753Z .loc 1 52 80 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:80 2026-02-21T09:02:21.5610105Z bar.sync 0; 2026-02-21T09:02:21.5610254Z // begin inline asm 2026-02-21T09:02:21.5610490Z cp.async.ca.shared.global [ %r730 + 0 ], [ %rd96 + 0 ], 0x8, %r195; 2026-02-21T09:02:21.5610768Z // end inline asm 2026-02-21T09:02:21.5610917Z // begin inline asm 2026-02-21T09:02:21.5611148Z cp.async.ca.shared.global [ %r732 + 0 ], [ %rd97 + 0 ], 0x8, %r195; 2026-02-21T09:02:21.5611412Z // end inline asm 2026-02-21T09:02:21.5611568Z // begin inline asm 2026-02-21T09:02:21.5611792Z cp.async.ca.shared.global [ %r734 + 0 ], [ %rd98 + 0 ], 0x8, 
%r195; 2026-02-21T09:02:21.5612055Z // end inline asm 2026-02-21T09:02:21.5612209Z // begin inline asm 2026-02-21T09:02:21.5612431Z cp.async.ca.shared.global [ %r736 + 0 ], [ %rd99 + 0 ], 0x8, %r195; 2026-02-21T09:02:21.5612781Z // end inline asm 2026-02-21T09:02:21.5612940Z cp.async.commit_group; 2026-02-21T09:02:21.5613254Z .loc 1 44 92 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:44:92 2026-02-21T09:02:21.5613604Z or.b32 %r231, %r8, %r364; 2026-02-21T09:02:21.5613789Z shl.b32 %r232, %r231, 10; 2026-02-21T09:02:21.5613977Z mad.wide.s32 %rd247, %r232, 2, %rd6; 2026-02-21T09:02:21.5614178Z shl.b32 %r233, %r216, 18; 2026-02-21T09:02:21.5614351Z or.b32 %r234, %r29, %r233; 2026-02-21T09:02:21.5614541Z mad.wide.s32 %rd246, %r234, 2, %rd6; 2026-02-21T09:02:21.5614743Z or.b32 %r235, %r30, %r233; 2026-02-21T09:02:21.5614925Z mad.wide.s32 %rd245, %r235, 2, %rd6; 2026-02-21T09:02:21.5615126Z or.b32 %r906, %r31, %r233; 2026-02-21T09:02:21.5615299Z shl.b32 %r236, %r905, 4; 2026-02-21T09:02:21.5615477Z or.b32 %r237, %r5, %r236; 2026-02-21T09:02:21.5615644Z shl.b32 %r238, %r216, 13; 2026-02-21T09:02:21.5615824Z sub.s32 %r239, %r237, %r238; 2026-02-21T09:02:21.5616007Z cvt.s64.s32 %rd117, %r239; 2026-02-21T09:02:21.5616186Z add.s64 %rd244, %rd7, %rd117; 2026-02-21T09:02:21.5616377Z mov.b32 %r909, 0f00000000; 2026-02-21T09:02:21.5616760Z mov.b32 %r908, 1; 2026-02-21T09:02:21.5616925Z mov.b32 %r907, -1; 2026-02-21T09:02:21.5617084Z mov.b64 %rd248, -16; 2026-02-21T09:02:21.5617253Z mov.b32 %r910, %r909; 2026-02-21T09:02:21.5617413Z mov.b32 %r911, %r909; 2026-02-21T09:02:21.5617576Z mov.b32 %r912, %r909; 2026-02-21T09:02:21.5617729Z mov.b32 %r913, %r909; 2026-02-21T09:02:21.5617900Z mov.b32 %r914, %r909; 2026-02-21T09:02:21.5618066Z mov.b32 %r915, %r909; 2026-02-21T09:02:21.5618221Z mov.b32 %r916, %r909; 2026-02-21T09:02:21.5618430Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T09:02:21.5618726Z // => This Inner Loop Header: Depth=2 2026-02-21T09:02:21.5619207Z 
.loc 1 75 38 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:75:38 2026-02-21T09:02:21.5619568Z setp.eq.b32 %p24, %r12, 0; 2026-02-21T09:02:21.5619886Z .loc 1 44 92 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:44:92 2026-02-21T09:02:21.5620239Z add.s64 %rd248, %rd248, 16; 2026-02-21T09:02:21.5620426Z setp.lt.u64 %p25, %rd248, 480; 2026-02-21T09:02:21.5620616Z add.s32 %r350, %r907, 1; 2026-02-21T09:02:21.5620789Z setp.gt.s32 %p26, %r350, 1; 2026-02-21T09:02:21.5620981Z selp.b32 %r907, 0, %r350, %p26; 2026-02-21T09:02:21.5621302Z .loc 1 52 80 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:80 2026-02-21T09:02:21.5621660Z cp.async.wait_group 1; 2026-02-21T09:02:21.5621831Z bar.sync 0; 2026-02-21T09:02:21.5621982Z shl.b32 %r351, %r907, 14; 2026-02-21T09:02:21.5622161Z add.s32 %r353, %r151, %r351; 2026-02-21T09:02:21.5622472Z .loc 1 56 32 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:56:32 2026-02-21T09:02:21.5622911Z add.s32 %r354, %r353, %r22; 2026-02-21T09:02:21.5623090Z ld.shared.b16 %rs2, [%r354]; 2026-02-21T09:02:21.5623284Z ld.shared.b16 %rs3, [%r354+512]; 2026-02-21T09:02:21.5623479Z ld.shared.b16 %rs4, [%r354+32]; 2026-02-21T09:02:21.5623678Z ld.shared.b16 %rs5, [%r354+544]; 2026-02-21T09:02:21.5623868Z add.s32 %r355, %r353, %r23; 2026-02-21T09:02:21.5624051Z ld.shared.b16 %rs6, [%r355]; 2026-02-21T09:02:21.5624236Z ld.shared.b16 %rs7, [%r355+512]; 2026-02-21T09:02:21.5624426Z ld.shared.b16 %rs8, [%r355+32]; 2026-02-21T09:02:21.5624619Z ld.shared.b16 %rs9, [%r355+544]; 2026-02-21T09:02:21.5624804Z add.s32 %r356, %r353, %r24; 2026-02-21T09:02:21.5624991Z ld.shared.b16 %rs10, [%r356]; 2026-02-21T09:02:21.5625178Z ld.shared.b16 %rs11, [%r356+512]; 2026-02-21T09:02:21.5625378Z ld.shared.b16 %rs12, [%r356+32]; 2026-02-21T09:02:21.5625566Z ld.shared.b16 %rs13, [%r356+544]; 2026-02-21T09:02:21.5625760Z add.s32 %r357, %r353, %r25; 2026-02-21T09:02:21.5625941Z ld.shared.b16 %rs14, [%r357]; 2026-02-21T09:02:21.5626214Z 
ld.shared.b16 %rs15, [%r357+512]; 2026-02-21T09:02:21.5626415Z ld.shared.b16 %rs16, [%r357+32]; 2026-02-21T09:02:21.5626723Z ld.shared.b16 %rs17, [%r357+544]; 2026-02-21T09:02:21.5626924Z cvt.f32.bf16 %r256, %rs2; 2026-02-21T09:02:21.5627098Z cvt.f32.bf16 %r257, %rs3; 2026-02-21T09:02:21.5627275Z cvt.f32.bf16 %r258, %rs6; 2026-02-21T09:02:21.5627447Z cvt.f32.bf16 %r259, %rs7; 2026-02-21T09:02:21.5627641Z cvt.f32.bf16 %r276, %rs10; 2026-02-21T09:02:21.5627821Z cvt.f32.bf16 %r277, %rs11; 2026-02-21T09:02:21.5628001Z cvt.f32.bf16 %r278, %rs14; 2026-02-21T09:02:21.5628184Z cvt.f32.bf16 %r279, %rs15; 2026-02-21T09:02:21.5628434Z cvt.f32.bf16 %r296, %rs4; 2026-02-21T09:02:21.5628609Z cvt.f32.bf16 %r297, %rs5; 2026-02-21T09:02:21.5628775Z cvt.f32.bf16 %r298, %rs8; 2026-02-21T09:02:21.5628945Z cvt.f32.bf16 %r299, %rs9; 2026-02-21T09:02:21.5629113Z cvt.f32.bf16 %r316, %rs12; 2026-02-21T09:02:21.5629288Z cvt.f32.bf16 %r317, %rs13; 2026-02-21T09:02:21.5629461Z cvt.f32.bf16 %r318, %rs16; 2026-02-21T09:02:21.5629637Z cvt.f32.bf16 %r319, %rs17; 2026-02-21T09:02:21.5629945Z .loc 1 58 87 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:58:87 2026-02-21T09:02:21.5630392Z // begin inline asm 2026-02-21T09:02:21.5630553Z mov.u64 %rd118, 0x0; 2026-02-21T09:02:21.5630780Z createpolicy.fractional.L2::evict_first.b64 %rd118, 1.0; 2026-02-21T09:02:21.5631043Z // end inline asm 2026-02-21T09:02:21.5631193Z // begin inline asm 2026-02-21T09:02:21.5631352Z mov.u16 %rs1, 0x0; 2026-02-21T09:02:21.5631600Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs1 }, [ %rd244 + 0 ], %rd118; 2026-02-21T09:02:21.5631903Z // end inline asm 2026-02-21T09:02:21.5632192Z .loc 1 61 28 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:61:28 2026-02-21T09:02:21.5632549Z shl.b16 %rs18, %rs1, 4; 2026-02-21T09:02:21.5632944Z .loc 1 76 58 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:76:58 2026-02-21T09:02:21.5633346Z selp.b16 %rs19, %rs18, %rs1, %p24; 2026-02-21T09:02:21.5633566Z 
cvt.s16.s8 %rs20, %rs19; 2026-02-21T09:02:21.5633738Z shr.s16 %rs21, %rs20, 4; 2026-02-21T09:02:21.5634047Z .loc 1 81 32 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:81:32 2026-02-21T09:02:21.5634396Z cvt.rn.f32.s16 %r358, %rs21; 2026-02-21T09:02:21.5634585Z st.shared.b32 [%r26], %r358; 2026-02-21T09:02:21.5634764Z $L__tmp0: 2026-02-21T09:02:21.5635113Z .loc 2 291 36 // standard.py:291:36 @[ cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:88:40 ] 2026-02-21T09:02:21.5635530Z // begin inline asm 2026-02-21T09:02:21.5635710Z fence.proxy.async.shared::cta; 2026-02-21T09:02:21.5635904Z // end inline asm 2026-02-21T09:02:21.5636048Z bar.sync 0; 2026-02-21T09:02:21.5636214Z shfl.sync.idx.b32 %r359, %r4, 0, 31, -1; 2026-02-21T09:02:21.5636435Z wgmma.fence.sync.aligned; 2026-02-21T09:02:21.5636849Z mov.pred %p20, -1; 2026-02-21T09:02:21.5637019Z // begin inline asm 2026-02-21T09:02:21.5637466Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r909,%r910,%r911,%r912,%r913,%r914,%r915,%r916}, {%r256,%r257,%r258,%r259}, %rd235, %p20, 1, 1; 2026-02-21T09:02:21.5637966Z // end inline asm 2026-02-21T09:02:21.5638116Z // begin inline asm 2026-02-21T09:02:21.5638553Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r909,%r910,%r911,%r912,%r913,%r914,%r915,%r916}, {%r276,%r277,%r278,%r279}, %rd236, %p20, 1, 1; 2026-02-21T09:02:21.5639033Z // end inline asm 2026-02-21T09:02:21.5639193Z // begin inline asm 2026-02-21T09:02:21.5639626Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r909,%r910,%r911,%r912,%r913,%r914,%r915,%r916}, {%r296,%r297,%r298,%r299}, %rd237, %p20, 1, 1; 2026-02-21T09:02:21.5640104Z // end inline asm 2026-02-21T09:02:21.5640261Z // begin inline asm 2026-02-21T09:02:21.5640689Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r909,%r910,%r911,%r912,%r913,%r914,%r915,%r916}, {%r316,%r317,%r318,%r319}, %rd238, %p20, 1, 1; 2026-02-21T09:02:21.5641269Z // end inline asm 2026-02-21T09:02:21.5641441Z 
wgmma.commit_group.sync.aligned; 2026-02-21T09:02:21.5641644Z mov.b32 %r330, 0; 2026-02-21T09:02:21.5641800Z mov.b32 %r329, %r330; 2026-02-21T09:02:21.5641966Z mov.b32 %r328, %r174; 2026-02-21T09:02:21.5642144Z // begin inline asm 2026-02-21T09:02:21.5642390Z // wait for regs: %r909,%r910,%r911,%r912,%r913,%r914,%r915,%r916,%r328,%r329,%r330 2026-02-21T09:02:21.5642715Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:02:21.5642906Z // end inline asm 2026-02-21T09:02:21.5643056Z $L__tmp1: 2026-02-21T09:02:21.5643338Z .loc 1 44 92 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:44:92 2026-02-21T09:02:21.5643700Z add.s32 %r360, %r908, 1; 2026-02-21T09:02:21.5643875Z setp.gt.s32 %p27, %r360, 1; 2026-02-21T09:02:21.5644063Z selp.b32 %r908, 0, %r360, %p27; 2026-02-21T09:02:21.5644392Z .loc 1 52 32 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:32 2026-02-21T09:02:21.5644753Z mad.wide.s32 %rd128, %r906, 2, %rd65; 2026-02-21T09:02:21.5645098Z .loc 1 52 80 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:80 2026-02-21T09:02:21.5645525Z shl.b32 %r361, %r908, 14; 2026-02-21T09:02:21.5645706Z add.s32 %r362, %r151, %r361; 2026-02-21T09:02:21.5645882Z add.s32 %r342, %r362, %r13; 2026-02-21T09:02:21.5646064Z selp.b32 %r343, 8, 0, %p25; 2026-02-21T09:02:21.5646241Z // begin inline asm 2026-02-21T09:02:21.5646604Z cp.async.ca.shared.global [ %r342 + 0 ], [ %rd245 + 0 ], 0x8, %r343; 2026-02-21T09:02:21.5646896Z // end inline asm 2026-02-21T09:02:21.5647046Z add.s32 %r344, %r342, 4096; 2026-02-21T09:02:21.5647232Z // begin inline asm 2026-02-21T09:02:21.5647456Z cp.async.ca.shared.global [ %r344 + 0 ], [ %rd246 + 0 ], 0x8, %r343; 2026-02-21T09:02:21.5647731Z // end inline asm 2026-02-21T09:02:21.5647880Z add.s32 %r346, %r342, 8192; 2026-02-21T09:02:21.5648146Z // begin inline asm 2026-02-21T09:02:21.5648375Z cp.async.ca.shared.global [ %r346 + 0 ], [ %rd247 + 0 ], 0x8, %r343; 2026-02-21T09:02:21.5648655Z // end inline asm 
2026-02-21T09:02:21.5648815Z add.s32 %r348, %r342, 12288; 2026-02-21T09:02:21.5648990Z // begin inline asm 2026-02-21T09:02:21.5649234Z cp.async.ca.shared.global [ %r348 + 0 ], [ %rd128 + 0 ], 0x8, %r343; 2026-02-21T09:02:21.5649503Z // end inline asm 2026-02-21T09:02:21.5649663Z cp.async.commit_group; 2026-02-21T09:02:21.5649968Z .loc 1 44 92 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:44:92 2026-02-21T09:02:21.5650320Z add.s64 %rd247, %rd247, 64; 2026-02-21T09:02:21.5650501Z add.s64 %rd246, %rd246, 64; 2026-02-21T09:02:21.5650677Z add.s64 %rd245, %rd245, 64; 2026-02-21T09:02:21.5650855Z add.s32 %r906, %r906, 32; 2026-02-21T09:02:21.5651042Z add.s64 %rd244, %rd244, 131072; 2026-02-21T09:02:21.5651244Z setp.lt.u64 %p28, %rd248, 496; 2026-02-21T09:02:21.5651509Z @%p28 bra $L__BB0_3; 2026-02-21T09:02:21.5651731Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:02:21.5651989Z cp.async.wait_group 0; 2026-02-21T09:02:21.5652169Z bar.sync 0; 2026-02-21T09:02:21.5652458Z .loc 1 91 28 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:91:28 2026-02-21T09:02:21.5652829Z cvt.rn.bf16x2.f32 %r385, %r910, %r909; 2026-02-21T09:02:21.5653051Z cvt.rn.bf16x2.f32 %r386, %r912, %r911; 2026-02-21T09:02:21.5653260Z cvt.rn.bf16x2.f32 %r387, %r914, %r913; 2026-02-21T09:02:21.5653470Z cvt.rn.bf16x2.f32 %r388, %r916, %r915; 2026-02-21T09:02:21.5653804Z .loc 1 92 43 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:92:43 2026-02-21T09:02:21.5654281Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r27], {%r385, %r386, %r387, %r388}; 2026-02-21T09:02:21.5654602Z // begin inline asm 2026-02-21T09:02:21.5654786Z fence.proxy.async.shared::cta; 2026-02-21T09:02:21.5654979Z // end inline asm 2026-02-21T09:02:21.5655129Z bar.sync 0; 2026-02-21T09:02:21.5655376Z elect.sync %r389|%p31, -1; 2026-02-21T09:02:21.5655571Z and.pred %p29, %p1, %p31; 2026-02-21T09:02:21.5655750Z // begin inline asm 2026-02-21T09:02:21.5656068Z @%p29 
cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd205, {%r363, %r364}], [%r151]; 2026-02-21T09:02:21.5656439Z // end inline asm 2026-02-21T09:02:21.5656716Z cp.async.bulk.commit_group; 2026-02-21T09:02:21.5656920Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:02:21.5657114Z bar.sync 0; 2026-02-21T09:02:21.5657395Z .loc 1 29 88 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:29:88 2026-02-21T09:02:21.5657755Z or.b32 %r390, %r905, 1; 2026-02-21T09:02:21.5658066Z .loc 1 33 31 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:33:31 2026-02-21T09:02:21.5658414Z add.s32 %r393, %r390, %r214; 2026-02-21T09:02:21.5658590Z shr.s32 %r394, %r393, 9; 2026-02-21T09:02:21.5658905Z .loc 1 32 30 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:32:30 2026-02-21T09:02:21.5659257Z and.b32 %r395, %r393, 268434944; 2026-02-21T09:02:21.5659442Z sub.s32 %r396, %r390, %r395; 2026-02-21T09:02:21.5659855Z .loc 1 34 27 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:34:27 2026-02-21T09:02:21.5660202Z shl.b32 %r541, %r396, 4; 2026-02-21T09:02:21.5660509Z .loc 1 36 27 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:36:27 2026-02-21T09:02:21.5660848Z shl.b32 %r542, %r394, 8; 2026-02-21T09:02:21.5661151Z .loc 1 37 32 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:37:32 2026-02-21T09:02:21.5661499Z or.b32 %r397, %r542, %r6; 2026-02-21T09:02:21.5661667Z or.b32 %r398, %r542, %r7; 2026-02-21T09:02:21.5661840Z or.b32 %r399, %r542, %r8; 2026-02-21T09:02:21.5662005Z or.b32 %r400, %r542, %r9; 2026-02-21T09:02:21.5662391Z .loc 1 52 53 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:53 2026-02-21T09:02:21.5662743Z shl.b32 %r401, %r397, 10; 2026-02-21T09:02:21.5662914Z shl.b32 %r402, %r398, 10; 2026-02-21T09:02:21.5663078Z shl.b32 %r403, %r399, 10; 2026-02-21T09:02:21.5663250Z shl.b32 %r404, %r400, 10; 2026-02-21T09:02:21.5663575Z .loc 1 52 60 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:60 
2026-02-21T09:02:21.5663933Z or.b32 %r405, %r401, %r11; 2026-02-21T09:02:21.5672629Z or.b32 %r406, %r402, %r11; 2026-02-21T09:02:21.5672853Z or.b32 %r407, %r403, %r11; 2026-02-21T09:02:21.5673056Z or.b32 %r408, %r404, %r11; 2026-02-21T09:02:21.5673407Z .loc 1 52 32 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:32 2026-02-21T09:02:21.5673801Z mad.wide.s32 %rd130, %r405, 2, %rd65; 2026-02-21T09:02:21.5674017Z mad.wide.s32 %rd131, %r406, 2, %rd65; 2026-02-21T09:02:21.5674229Z mad.wide.s32 %rd132, %r407, 2, %rd65; 2026-02-21T09:02:21.5674609Z mad.wide.s32 %rd133, %r408, 2, %rd65; 2026-02-21T09:02:21.5674812Z mov.b32 %r367, 8; 2026-02-21T09:02:21.5675128Z .loc 1 52 80 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:80 2026-02-21T09:02:21.5675489Z // begin inline asm 2026-02-21T09:02:21.5675740Z cp.async.ca.shared.global [ %r722 + 0 ], [ %rd130 + 0 ], 0x8, %r367; 2026-02-21T09:02:21.5676021Z // end inline asm 2026-02-21T09:02:21.5676180Z // begin inline asm 2026-02-21T09:02:21.5676408Z cp.async.ca.shared.global [ %r724 + 0 ], [ %rd131 + 0 ], 0x8, %r367; 2026-02-21T09:02:21.5676848Z // end inline asm 2026-02-21T09:02:21.5676999Z // begin inline asm 2026-02-21T09:02:21.5677226Z cp.async.ca.shared.global [ %r726 + 0 ], [ %rd132 + 0 ], 0x8, %r367; 2026-02-21T09:02:21.5677517Z // end inline asm 2026-02-21T09:02:21.5677670Z // begin inline asm 2026-02-21T09:02:21.5677897Z cp.async.ca.shared.global [ %r728 + 0 ], [ %rd133 + 0 ], 0x8, %r367; 2026-02-21T09:02:21.5678178Z // end inline asm 2026-02-21T09:02:21.5678350Z cp.async.commit_group; 2026-02-21T09:02:21.5678762Z .loc 1 52 32 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:32 2026-02-21T09:02:21.5679142Z cvt.s64.s32 %rd139, %r401; 2026-02-21T09:02:21.5679336Z or.b64 %rd140, %rd139, %rd8; 2026-02-21T09:02:21.5679528Z shl.b64 %rd141, %rd140, 1; 2026-02-21T09:02:21.5679729Z add.s64 %rd142, %rd65, %rd141; 2026-02-21T09:02:21.5679923Z add.s64 %rd134, %rd142, 64; 
2026-02-21T09:02:21.5680104Z cvt.s64.s32 %rd143, %r402; 2026-02-21T09:02:21.5680297Z or.b64 %rd144, %rd143, %rd8; 2026-02-21T09:02:21.5680478Z shl.b64 %rd145, %rd144, 1; 2026-02-21T09:02:21.5680662Z add.s64 %rd146, %rd65, %rd145; 2026-02-21T09:02:21.5680854Z add.s64 %rd135, %rd146, 64; 2026-02-21T09:02:21.5681032Z cvt.s64.s32 %rd147, %r403; 2026-02-21T09:02:21.5681214Z or.b64 %rd148, %rd147, %rd8; 2026-02-21T09:02:21.5681393Z shl.b64 %rd149, %rd148, 1; 2026-02-21T09:02:21.5681579Z add.s64 %rd150, %rd65, %rd149; 2026-02-21T09:02:21.5681763Z add.s64 %rd136, %rd150, 64; 2026-02-21T09:02:21.5681954Z cvt.s64.s32 %rd151, %r404; 2026-02-21T09:02:21.5682131Z or.b64 %rd152, %rd151, %rd8; 2026-02-21T09:02:21.5682317Z shl.b64 %rd153, %rd152, 1; 2026-02-21T09:02:21.5682603Z add.s64 %rd154, %rd65, %rd153; 2026-02-21T09:02:21.5682782Z add.s64 %rd137, %rd154, 64; 2026-02-21T09:02:21.5683117Z .loc 1 52 80 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:80 2026-02-21T09:02:21.5683476Z bar.sync 0; 2026-02-21T09:02:21.5683650Z // begin inline asm 2026-02-21T09:02:21.5683898Z cp.async.ca.shared.global [ %r730 + 0 ], [ %rd134 + 0 ], 0x8, %r367; 2026-02-21T09:02:21.5684182Z // end inline asm 2026-02-21T09:02:21.5684336Z // begin inline asm 2026-02-21T09:02:21.5684569Z cp.async.ca.shared.global [ %r732 + 0 ], [ %rd135 + 0 ], 0x8, %r367; 2026-02-21T09:02:21.5684844Z // end inline asm 2026-02-21T09:02:21.5684997Z // begin inline asm 2026-02-21T09:02:21.5685229Z cp.async.ca.shared.global [ %r734 + 0 ], [ %rd136 + 0 ], 0x8, %r367; 2026-02-21T09:02:21.5685598Z // end inline asm 2026-02-21T09:02:21.5685760Z // begin inline asm 2026-02-21T09:02:21.5685985Z cp.async.ca.shared.global [ %r736 + 0 ], [ %rd137 + 0 ], 0x8, %r367; 2026-02-21T09:02:21.5686264Z // end inline asm 2026-02-21T09:02:21.5686427Z cp.async.commit_group; 2026-02-21T09:02:21.5686889Z .loc 1 44 92 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:44:92 2026-02-21T09:02:21.5687254Z or.b32 %r409, %r8, %r542; 
2026-02-21T09:02:21.5687434Z shl.b32 %r410, %r409, 10; 2026-02-21T09:02:21.5687623Z mad.wide.s32 %rd252, %r410, 2, %rd6; 2026-02-21T09:02:21.5687823Z shl.b32 %r411, %r394, 18; 2026-02-21T09:02:21.5688000Z or.b32 %r412, %r29, %r411; 2026-02-21T09:02:21.5688179Z mad.wide.s32 %rd251, %r412, 2, %rd6; 2026-02-21T09:02:21.5688385Z or.b32 %r413, %r30, %r411; 2026-02-21T09:02:21.5688561Z mad.wide.s32 %rd250, %r413, 2, %rd6; 2026-02-21T09:02:21.5688760Z or.b32 %r917, %r31, %r411; 2026-02-21T09:02:21.5689024Z shl.b32 %r414, %r390, 4; 2026-02-21T09:02:21.5689207Z or.b32 %r415, %r5, %r414; 2026-02-21T09:02:21.5689385Z shl.b32 %r416, %r394, 13; 2026-02-21T09:02:21.5689562Z sub.s32 %r417, %r415, %r416; 2026-02-21T09:02:21.5689753Z cvt.s64.s32 %rd155, %r417; 2026-02-21T09:02:21.5689930Z add.s64 %rd249, %rd7, %rd155; 2026-02-21T09:02:21.5690137Z mov.b32 %r920, 0f00000000; 2026-02-21T09:02:21.5690309Z mov.b32 %r919, 1; 2026-02-21T09:02:21.5690472Z mov.b32 %r918, -1; 2026-02-21T09:02:21.5690635Z mov.b64 %rd253, -16; 2026-02-21T09:02:21.5690806Z mov.b32 %r921, %r920; 2026-02-21T09:02:21.5690970Z mov.b32 %r922, %r920; 2026-02-21T09:02:21.5691140Z mov.b32 %r923, %r920; 2026-02-21T09:02:21.5691306Z mov.b32 %r924, %r920; 2026-02-21T09:02:21.5691466Z mov.b32 %r925, %r920; 2026-02-21T09:02:21.5691635Z mov.b32 %r926, %r920; 2026-02-21T09:02:21.5691792Z mov.b32 %r927, %r920; 2026-02-21T09:02:21.5692008Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T09:02:21.5692390Z // => This Inner Loop Header: Depth=2 2026-02-21T09:02:21.5692660Z add.s64 %rd253, %rd253, 16; 2026-02-21T09:02:21.5692853Z setp.lt.u64 %p37, %rd253, 480; 2026-02-21T09:02:21.5693056Z add.s32 %r528, %r918, 1; 2026-02-21T09:02:21.5693240Z setp.gt.s32 %p38, %r528, 1; 2026-02-21T09:02:21.5693430Z selp.b32 %r918, 0, %r528, %p38; 2026-02-21T09:02:21.5693795Z .loc 1 52 80 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:80 2026-02-21T09:02:21.5694172Z cp.async.wait_group 1; 2026-02-21T09:02:21.5694357Z bar.sync 0; 
2026-02-21T09:02:21.5694522Z shl.b32 %r529, %r918, 14; 2026-02-21T09:02:21.5694713Z add.s32 %r531, %r151, %r529; 2026-02-21T09:02:21.5695033Z .loc 1 56 32 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:56:32 2026-02-21T09:02:21.5695393Z add.s32 %r532, %r531, %r22; 2026-02-21T09:02:21.5695584Z ld.shared.b16 %rs23, [%r532]; 2026-02-21T09:02:21.5695777Z ld.shared.b16 %rs24, [%r532+512]; 2026-02-21T09:02:21.5695991Z ld.shared.b16 %rs25, [%r532+32]; 2026-02-21T09:02:21.5696187Z ld.shared.b16 %rs26, [%r532+544]; 2026-02-21T09:02:21.5696390Z add.s32 %r533, %r531, %r23; 2026-02-21T09:02:21.5696804Z ld.shared.b16 %rs27, [%r533]; 2026-02-21T09:02:21.5697007Z ld.shared.b16 %rs28, [%r533+512]; 2026-02-21T09:02:21.5697211Z ld.shared.b16 %rs29, [%r533+32]; 2026-02-21T09:02:21.5697418Z ld.shared.b16 %rs30, [%r533+544]; 2026-02-21T09:02:21.5697617Z add.s32 %r534, %r531, %r24; 2026-02-21T09:02:21.5697798Z ld.shared.b16 %rs31, [%r534]; 2026-02-21T09:02:21.5697989Z ld.shared.b16 %rs32, [%r534+512]; 2026-02-21T09:02:21.5698184Z ld.shared.b16 %rs33, [%r534+32]; 2026-02-21T09:02:21.5698384Z ld.shared.b16 %rs34, [%r534+544]; 2026-02-21T09:02:21.5698569Z add.s32 %r535, %r531, %r25; 2026-02-21T09:02:21.5698755Z ld.shared.b16 %rs35, [%r535]; 2026-02-21T09:02:21.5698936Z ld.shared.b16 %rs36, [%r535+512]; 2026-02-21T09:02:21.5699133Z ld.shared.b16 %rs37, [%r535+32]; 2026-02-21T09:02:21.5699412Z ld.shared.b16 %rs38, [%r535+544]; 2026-02-21T09:02:21.5699621Z cvt.f32.bf16 %r434, %rs23; 2026-02-21T09:02:21.5699804Z cvt.f32.bf16 %r435, %rs24; 2026-02-21T09:02:21.5699977Z cvt.f32.bf16 %r436, %rs27; 2026-02-21T09:02:21.5700159Z cvt.f32.bf16 %r437, %rs28; 2026-02-21T09:02:21.5700328Z cvt.f32.bf16 %r454, %rs31; 2026-02-21T09:02:21.5700507Z cvt.f32.bf16 %r455, %rs32; 2026-02-21T09:02:21.5700681Z cvt.f32.bf16 %r456, %rs35; 2026-02-21T09:02:21.5700858Z cvt.f32.bf16 %r457, %rs36; 2026-02-21T09:02:21.5701036Z cvt.f32.bf16 %r474, %rs25; 2026-02-21T09:02:21.5701216Z cvt.f32.bf16 %r475, %rs26; 
2026-02-21T09:02:21.5701398Z cvt.f32.bf16 %r476, %rs29; 2026-02-21T09:02:21.5701571Z cvt.f32.bf16 %r477, %rs30; 2026-02-21T09:02:21.5701754Z cvt.f32.bf16 %r494, %rs33; 2026-02-21T09:02:21.5701926Z cvt.f32.bf16 %r495, %rs34; 2026-02-21T09:02:21.5702110Z cvt.f32.bf16 %r496, %rs37; 2026-02-21T09:02:21.5702283Z cvt.f32.bf16 %r497, %rs38; 2026-02-21T09:02:21.5702702Z .loc 1 58 87 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:58:87 2026-02-21T09:02:21.5703061Z // begin inline asm 2026-02-21T09:02:21.5703234Z mov.u64 %rd156, 0x0; 2026-02-21T09:02:21.5703490Z createpolicy.fractional.L2::evict_first.b64 %rd156, 1.0; 2026-02-21T09:02:21.5703758Z // end inline asm 2026-02-21T09:02:21.5703930Z // begin inline asm 2026-02-21T09:02:21.5704089Z mov.u16 %rs22, 0x0; 2026-02-21T09:02:21.5704371Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs22 }, [ %rd249 + 0 ], %rd156; 2026-02-21T09:02:21.5704678Z // end inline asm 2026-02-21T09:02:21.5704979Z .loc 1 61 28 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:61:28 2026-02-21T09:02:21.5705333Z shl.b16 %rs39, %rs22, 4; 2026-02-21T09:02:21.5705647Z .loc 1 76 58 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:76:58 2026-02-21T09:02:21.5706014Z selp.b16 %rs40, %rs39, %rs22, %p24; 2026-02-21T09:02:21.5706222Z cvt.s16.s8 %rs41, %rs40; 2026-02-21T09:02:21.5706597Z shr.s16 %rs42, %rs41, 4; 2026-02-21T09:02:21.5706924Z .loc 1 81 32 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:81:32 2026-02-21T09:02:21.5707289Z cvt.rn.f32.s16 %r536, %rs42; 2026-02-21T09:02:21.5707477Z st.shared.b32 [%r26], %r536; 2026-02-21T09:02:21.5707659Z $L__tmp2: 2026-02-21T09:02:21.5708010Z .loc 2 291 36 // standard.py:291:36 @[ cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:88:40 ] 2026-02-21T09:02:21.5708514Z // begin inline asm 2026-02-21T09:02:21.5708700Z fence.proxy.async.shared::cta; 2026-02-21T09:02:21.5708894Z // end inline asm 2026-02-21T09:02:21.5709051Z bar.sync 0; 2026-02-21T09:02:21.5709220Z 
shfl.sync.idx.b32 %r537, %r4, 0, 31, -1; 2026-02-21T09:02:21.5709456Z wgmma.fence.sync.aligned; 2026-02-21T09:02:21.5709637Z mov.pred %p32, -1; 2026-02-21T09:02:21.5709801Z // begin inline asm 2026-02-21T09:02:21.5710253Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r920,%r921,%r922,%r923,%r924,%r925,%r926,%r927}, {%r434,%r435,%r436,%r437}, %rd235, %p32, 1, 1; 2026-02-21T09:02:21.5710772Z // end inline asm 2026-02-21T09:02:21.5710935Z // begin inline asm 2026-02-21T09:02:21.5711466Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r920,%r921,%r922,%r923,%r924,%r925,%r926,%r927}, {%r454,%r455,%r456,%r457}, %rd236, %p32, 1, 1; 2026-02-21T09:02:21.5711961Z // end inline asm 2026-02-21T09:02:21.5712110Z // begin inline asm 2026-02-21T09:02:21.5712548Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r920,%r921,%r922,%r923,%r924,%r925,%r926,%r927}, {%r474,%r475,%r476,%r477}, %rd237, %p32, 1, 1; 2026-02-21T09:02:21.5713035Z // end inline asm 2026-02-21T09:02:21.5713185Z // begin inline asm 2026-02-21T09:02:21.5713626Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r920,%r921,%r922,%r923,%r924,%r925,%r926,%r927}, {%r494,%r495,%r496,%r497}, %rd238, %p32, 1, 1; 2026-02-21T09:02:21.5714118Z // end inline asm 2026-02-21T09:02:21.5714373Z wgmma.commit_group.sync.aligned; 2026-02-21T09:02:21.5714594Z mov.b32 %r508, 0; 2026-02-21T09:02:21.5714758Z mov.b32 %r507, %r508; 2026-02-21T09:02:21.5714921Z mov.b32 %r506, %r174; 2026-02-21T09:02:21.5715088Z // begin inline asm 2026-02-21T09:02:21.5715347Z // wait for regs: %r920,%r921,%r922,%r923,%r924,%r925,%r926,%r927,%r506,%r507,%r508 2026-02-21T09:02:21.5715670Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:02:21.5715870Z // end inline asm 2026-02-21T09:02:21.5716014Z $L__tmp3: 2026-02-21T09:02:21.5716311Z .loc 1 44 92 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:44:92 2026-02-21T09:02:21.5716794Z add.s32 %r538, %r919, 1; 2026-02-21T09:02:21.5716977Z setp.gt.s32 %p39, %r538, 1; 
2026-02-21T09:02:21.5717172Z selp.b32 %r919, 0, %r538, %p39; 2026-02-21T09:02:21.5717500Z .loc 1 52 32 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:32 2026-02-21T09:02:21.5717865Z mad.wide.s32 %rd166, %r917, 2, %rd65; 2026-02-21T09:02:21.5718295Z .loc 1 52 80 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:80 2026-02-21T09:02:21.5718649Z shl.b32 %r539, %r919, 14; 2026-02-21T09:02:21.5718826Z add.s32 %r540, %r151, %r539; 2026-02-21T09:02:21.5719011Z add.s32 %r520, %r540, %r13; 2026-02-21T09:02:21.5719189Z selp.b32 %r521, 8, 0, %p37; 2026-02-21T09:02:21.5719366Z // begin inline asm 2026-02-21T09:02:21.5719604Z cp.async.ca.shared.global [ %r520 + 0 ], [ %rd250 + 0 ], 0x8, %r521; 2026-02-21T09:02:21.5719880Z // end inline asm 2026-02-21T09:02:21.5720038Z add.s32 %r522, %r520, 4096; 2026-02-21T09:02:21.5720219Z // begin inline asm 2026-02-21T09:02:21.5720456Z cp.async.ca.shared.global [ %r522 + 0 ], [ %rd251 + 0 ], 0x8, %r521; 2026-02-21T09:02:21.5720727Z // end inline asm 2026-02-21T09:02:21.5720881Z add.s32 %r524, %r520, 8192; 2026-02-21T09:02:21.5721052Z // begin inline asm 2026-02-21T09:02:21.5721279Z cp.async.ca.shared.global [ %r524 + 0 ], [ %rd252 + 0 ], 0x8, %r521; 2026-02-21T09:02:21.5721562Z // end inline asm 2026-02-21T09:02:21.5721792Z add.s32 %r526, %r520, 12288; 2026-02-21T09:02:21.5721978Z // begin inline asm 2026-02-21T09:02:21.5722199Z cp.async.ca.shared.global [ %r526 + 0 ], [ %rd166 + 0 ], 0x8, %r521; 2026-02-21T09:02:21.5722476Z // end inline asm 2026-02-21T09:02:21.5722630Z cp.async.commit_group; 2026-02-21T09:02:21.5722946Z .loc 1 44 92 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:44:92 2026-02-21T09:02:21.5723298Z add.s64 %rd252, %rd252, 64; 2026-02-21T09:02:21.5723482Z add.s64 %rd251, %rd251, 64; 2026-02-21T09:02:21.5723665Z add.s64 %rd250, %rd250, 64; 2026-02-21T09:02:21.5723840Z add.s32 %r917, %r917, 32; 2026-02-21T09:02:21.5724020Z add.s64 %rd249, %rd249, 131072; 2026-02-21T09:02:21.5724211Z 
setp.lt.u64 %p40, %rd253, 496; 2026-02-21T09:02:21.5724407Z @%p40 bra $L__BB0_5; 2026-02-21T09:02:21.5724621Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:02:21.5724897Z cp.async.wait_group 0; 2026-02-21T09:02:21.5725074Z bar.sync 0; 2026-02-21T09:02:21.5725381Z .loc 1 91 28 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:91:28 2026-02-21T09:02:21.5725871Z cvt.rn.bf16x2.f32 %r563, %r921, %r920; 2026-02-21T09:02:21.5726091Z cvt.rn.bf16x2.f32 %r564, %r923, %r922; 2026-02-21T09:02:21.5726308Z cvt.rn.bf16x2.f32 %r565, %r925, %r924; 2026-02-21T09:02:21.5726631Z cvt.rn.bf16x2.f32 %r566, %r927, %r926; 2026-02-21T09:02:21.5726976Z .loc 1 92 43 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:92:43 2026-02-21T09:02:21.5727445Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r27], {%r563, %r564, %r565, %r566}; 2026-02-21T09:02:21.5727781Z // begin inline asm 2026-02-21T09:02:21.5727967Z fence.proxy.async.shared::cta; 2026-02-21T09:02:21.5728158Z // end inline asm 2026-02-21T09:02:21.5728310Z bar.sync 0; 2026-02-21T09:02:21.5728466Z elect.sync %r567|%p43, -1; 2026-02-21T09:02:21.5728664Z and.pred %p41, %p1, %p43; 2026-02-21T09:02:21.5728924Z // begin inline asm 2026-02-21T09:02:21.5729254Z @%p41 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd205, {%r541, %r542}], [%r151]; 2026-02-21T09:02:21.5729619Z // end inline asm 2026-02-21T09:02:21.5729788Z cp.async.bulk.commit_group; 2026-02-21T09:02:21.5729989Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:02:21.5730190Z bar.sync 0; 2026-02-21T09:02:21.5730491Z .loc 1 29 88 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:29:88 2026-02-21T09:02:21.5730843Z or.b32 %r568, %r905, 2; 2026-02-21T09:02:21.5731153Z .loc 1 33 31 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:33:31 2026-02-21T09:02:21.5731501Z add.s32 %r571, %r568, %r214; 2026-02-21T09:02:21.5731687Z shr.s32 %r572, %r571, 9; 2026-02-21T09:02:21.5731989Z .loc 1 32 30 // 
cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:32:30 2026-02-21T09:02:21.5732341Z and.b32 %r573, %r571, 268434944; 2026-02-21T09:02:21.5732618Z sub.s32 %r574, %r568, %r573; 2026-02-21T09:02:21.5732931Z .loc 1 34 27 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:34:27 2026-02-21T09:02:21.5733283Z shl.b32 %r719, %r574, 4; 2026-02-21T09:02:21.5733588Z .loc 1 36 27 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:36:27 2026-02-21T09:02:21.5733943Z shl.b32 %r720, %r572, 8; 2026-02-21T09:02:21.5734244Z .loc 1 37 32 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:37:32 2026-02-21T09:02:21.5734595Z or.b32 %r575, %r720, %r6; 2026-02-21T09:02:21.5734771Z or.b32 %r576, %r720, %r7; 2026-02-21T09:02:21.5734939Z or.b32 %r577, %r720, %r8; 2026-02-21T09:02:21.5735113Z or.b32 %r578, %r720, %r9; 2026-02-21T09:02:21.5735413Z .loc 1 52 53 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:53 2026-02-21T09:02:21.5735761Z shl.b32 %r579, %r575, 10; 2026-02-21T09:02:21.5735932Z shl.b32 %r580, %r576, 10; 2026-02-21T09:02:21.5736183Z shl.b32 %r581, %r577, 10; 2026-02-21T09:02:21.5736351Z shl.b32 %r582, %r578, 10; 2026-02-21T09:02:21.5736796Z .loc 1 52 60 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:60 2026-02-21T09:02:21.5737159Z or.b32 %r583, %r579, %r11; 2026-02-21T09:02:21.5737338Z or.b32 %r584, %r580, %r11; 2026-02-21T09:02:21.5737515Z or.b32 %r585, %r581, %r11; 2026-02-21T09:02:21.5737683Z or.b32 %r586, %r582, %r11; 2026-02-21T09:02:21.5737996Z .loc 1 52 32 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:32 2026-02-21T09:02:21.5738360Z mad.wide.s32 %rd168, %r583, 2, %rd65; 2026-02-21T09:02:21.5738579Z mad.wide.s32 %rd169, %r584, 2, %rd65; 2026-02-21T09:02:21.5738787Z mad.wide.s32 %rd170, %r585, 2, %rd65; 2026-02-21T09:02:21.5738984Z mad.wide.s32 %rd171, %r586, 2, %rd65; 2026-02-21T09:02:21.5739179Z mov.b32 %r545, 8; 2026-02-21T09:02:21.5739468Z .loc 1 52 80 // 
cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:80 2026-02-21T09:02:21.5739813Z // begin inline asm 2026-02-21T09:02:21.5740047Z cp.async.ca.shared.global [ %r722 + 0 ], [ %rd168 + 0 ], 0x8, %r545; 2026-02-21T09:02:21.5740411Z // end inline asm 2026-02-21T09:02:21.5740557Z // begin inline asm 2026-02-21T09:02:21.5740780Z cp.async.ca.shared.global [ %r724 + 0 ], [ %rd169 + 0 ], 0x8, %r545; 2026-02-21T09:02:21.5741050Z // end inline asm 2026-02-21T09:02:21.5741208Z // begin inline asm 2026-02-21T09:02:21.5741432Z cp.async.ca.shared.global [ %r726 + 0 ], [ %rd170 + 0 ], 0x8, %r545; 2026-02-21T09:02:21.5741698Z // end inline asm 2026-02-21T09:02:21.5741846Z // begin inline asm 2026-02-21T09:02:21.5742062Z cp.async.ca.shared.global [ %r728 + 0 ], [ %rd171 + 0 ], 0x8, %r545; 2026-02-21T09:02:21.5742332Z // end inline asm 2026-02-21T09:02:21.5742480Z cp.async.commit_group; 2026-02-21T09:02:21.5742788Z .loc 1 52 32 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:32 2026-02-21T09:02:21.5743225Z cvt.s64.s32 %rd177, %r579; 2026-02-21T09:02:21.5743409Z or.b64 %rd178, %rd177, %rd8; 2026-02-21T09:02:21.5743592Z shl.b64 %rd179, %rd178, 1; 2026-02-21T09:02:21.5743770Z add.s64 %rd180, %rd65, %rd179; 2026-02-21T09:02:21.5743959Z add.s64 %rd172, %rd180, 64; 2026-02-21T09:02:21.5744136Z cvt.s64.s32 %rd181, %r580; 2026-02-21T09:02:21.5744314Z or.b64 %rd182, %rd181, %rd8; 2026-02-21T09:02:21.5744487Z shl.b64 %rd183, %rd182, 1; 2026-02-21T09:02:21.5744662Z add.s64 %rd184, %rd65, %rd183; 2026-02-21T09:02:21.5744847Z add.s64 %rd173, %rd184, 64; 2026-02-21T09:02:21.5745028Z cvt.s64.s32 %rd185, %r581; 2026-02-21T09:02:21.5745215Z or.b64 %rd186, %rd185, %rd8; 2026-02-21T09:02:21.5745395Z shl.b64 %rd187, %rd186, 1; 2026-02-21T09:02:21.5745572Z add.s64 %rd188, %rd65, %rd187; 2026-02-21T09:02:21.5745754Z add.s64 %rd174, %rd188, 64; 2026-02-21T09:02:21.5745933Z cvt.s64.s32 %rd189, %r582; 2026-02-21T09:02:21.5746107Z or.b64 %rd190, %rd189, %rd8; 
2026-02-21T09:02:21.5746375Z shl.b64 %rd191, %rd190, 1; 2026-02-21T09:02:21.5746661Z add.s64 %rd192, %rd65, %rd191; 2026-02-21T09:02:21.5746849Z add.s64 %rd175, %rd192, 64; 2026-02-21T09:02:21.5747163Z .loc 1 52 80 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:80 2026-02-21T09:02:21.5747518Z bar.sync 0; 2026-02-21T09:02:21.5747670Z // begin inline asm 2026-02-21T09:02:21.5747900Z cp.async.ca.shared.global [ %r730 + 0 ], [ %rd172 + 0 ], 0x8, %r545; 2026-02-21T09:02:21.5748178Z // end inline asm 2026-02-21T09:02:21.5748400Z // begin inline asm 2026-02-21T09:02:21.5748630Z cp.async.ca.shared.global [ %r732 + 0 ], [ %rd173 + 0 ], 0x8, %r545; 2026-02-21T09:02:21.5748900Z // end inline asm 2026-02-21T09:02:21.5749053Z // begin inline asm 2026-02-21T09:02:21.5749278Z cp.async.ca.shared.global [ %r734 + 0 ], [ %rd174 + 0 ], 0x8, %r545; 2026-02-21T09:02:21.5749555Z // end inline asm 2026-02-21T09:02:21.5749711Z // begin inline asm 2026-02-21T09:02:21.5750020Z cp.async.ca.shared.global [ %r736 + 0 ], [ %rd175 + 0 ], 0x8, %r545; 2026-02-21T09:02:21.5750299Z // end inline asm 2026-02-21T09:02:21.5750466Z cp.async.commit_group; 2026-02-21T09:02:21.5750780Z .loc 1 44 92 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:44:92 2026-02-21T09:02:21.5751133Z or.b32 %r587, %r8, %r720; 2026-02-21T09:02:21.5751315Z shl.b32 %r588, %r587, 10; 2026-02-21T09:02:21.5751496Z mad.wide.s32 %rd257, %r588, 2, %rd6; 2026-02-21T09:02:21.5751700Z shl.b32 %r589, %r572, 18; 2026-02-21T09:02:21.5751875Z or.b32 %r590, %r29, %r589; 2026-02-21T09:02:21.5752055Z mad.wide.s32 %rd256, %r590, 2, %rd6; 2026-02-21T09:02:21.5752255Z or.b32 %r591, %r30, %r589; 2026-02-21T09:02:21.5752431Z mad.wide.s32 %rd255, %r591, 2, %rd6; 2026-02-21T09:02:21.5752626Z or.b32 %r928, %r31, %r589; 2026-02-21T09:02:21.5752795Z shl.b32 %r592, %r568, 4; 2026-02-21T09:02:21.5752971Z or.b32 %r593, %r5, %r592; 2026-02-21T09:02:21.5753134Z shl.b32 %r594, %r572, 13; 2026-02-21T09:02:21.5753312Z sub.s32 %r595, 
%r593, %r594; 2026-02-21T09:02:21.5753495Z cvt.s64.s32 %rd193, %r595; 2026-02-21T09:02:21.5753671Z add.s64 %rd254, %rd7, %rd193; 2026-02-21T09:02:21.5753856Z mov.b32 %r931, 0f00000000; 2026-02-21T09:02:21.5754106Z mov.b32 %r930, 1; 2026-02-21T09:02:21.5754263Z mov.b32 %r929, -1; 2026-02-21T09:02:21.5754415Z mov.b64 %rd258, -16; 2026-02-21T09:02:21.5754578Z mov.b32 %r932, %r931; 2026-02-21T09:02:21.5754735Z mov.b32 %r933, %r931; 2026-02-21T09:02:21.5754894Z mov.b32 %r934, %r931; 2026-02-21T09:02:21.5755050Z mov.b32 %r935, %r931; 2026-02-21T09:02:21.5755208Z mov.b32 %r936, %r931; 2026-02-21T09:02:21.5755367Z mov.b32 %r937, %r931; 2026-02-21T09:02:21.5755521Z mov.b32 %r938, %r931; 2026-02-21T09:02:21.5755745Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:02:21.5756042Z // => This Inner Loop Header: Depth=2 2026-02-21T09:02:21.5756306Z add.s64 %rd258, %rd258, 16; 2026-02-21T09:02:21.5756693Z setp.lt.u64 %p49, %rd258, 480; 2026-02-21T09:02:21.5756898Z add.s32 %r706, %r929, 1; 2026-02-21T09:02:21.5757072Z setp.gt.s32 %p50, %r706, 1; 2026-02-21T09:02:21.5757259Z selp.b32 %r929, 0, %r706, %p50; 2026-02-21T09:02:21.5757594Z .loc 1 52 80 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:80 2026-02-21T09:02:21.5757957Z cp.async.wait_group 1; 2026-02-21T09:02:21.5758126Z bar.sync 0; 2026-02-21T09:02:21.5758279Z shl.b32 %r707, %r929, 14; 2026-02-21T09:02:21.5758458Z add.s32 %r709, %r151, %r707; 2026-02-21T09:02:21.5758770Z .loc 1 56 32 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:56:32 2026-02-21T09:02:21.5759123Z add.s32 %r710, %r709, %r22; 2026-02-21T09:02:21.5759307Z ld.shared.b16 %rs44, [%r710]; 2026-02-21T09:02:21.5759502Z ld.shared.b16 %rs45, [%r710+512]; 2026-02-21T09:02:21.5759705Z ld.shared.b16 %rs46, [%r710+32]; 2026-02-21T09:02:21.5759916Z ld.shared.b16 %rs47, [%r710+544]; 2026-02-21T09:02:21.5760198Z add.s32 %r711, %r709, %r23; 2026-02-21T09:02:21.5760385Z ld.shared.b16 %rs48, [%r711]; 2026-02-21T09:02:21.5760573Z ld.shared.b16 
%rs49, [%r711+512]; 2026-02-21T09:02:21.5760765Z ld.shared.b16 %rs50, [%r711+32]; 2026-02-21T09:02:21.5760964Z ld.shared.b16 %rs51, [%r711+544]; 2026-02-21T09:02:21.5761153Z add.s32 %r712, %r709, %r24; 2026-02-21T09:02:21.5761334Z ld.shared.b16 %rs52, [%r712]; 2026-02-21T09:02:21.5761515Z ld.shared.b16 %rs53, [%r712+512]; 2026-02-21T09:02:21.5761708Z ld.shared.b16 %rs54, [%r712+32]; 2026-02-21T09:02:21.5761895Z ld.shared.b16 %rs55, [%r712+544]; 2026-02-21T09:02:21.5762088Z add.s32 %r713, %r709, %r25; 2026-02-21T09:02:21.5762267Z ld.shared.b16 %rs56, [%r713]; 2026-02-21T09:02:21.5762447Z ld.shared.b16 %rs57, [%r713+512]; 2026-02-21T09:02:21.5762645Z ld.shared.b16 %rs58, [%r713+32]; 2026-02-21T09:02:21.5762845Z ld.shared.b16 %rs59, [%r713+544]; 2026-02-21T09:02:21.5763045Z cvt.f32.bf16 %r612, %rs44; 2026-02-21T09:02:21.5763225Z cvt.f32.bf16 %r613, %rs45; 2026-02-21T09:02:21.5763486Z cvt.f32.bf16 %r614, %rs48; 2026-02-21T09:02:21.5763660Z cvt.f32.bf16 %r615, %rs49; 2026-02-21T09:02:21.5763834Z cvt.f32.bf16 %r632, %rs52; 2026-02-21T09:02:21.5764005Z cvt.f32.bf16 %r633, %rs53; 2026-02-21T09:02:21.5764177Z cvt.f32.bf16 %r634, %rs56; 2026-02-21T09:02:21.5764352Z cvt.f32.bf16 %r635, %rs57; 2026-02-21T09:02:21.5764523Z cvt.f32.bf16 %r652, %rs46; 2026-02-21T09:02:21.5764711Z cvt.f32.bf16 %r653, %rs47; 2026-02-21T09:02:21.5764887Z cvt.f32.bf16 %r654, %rs50; 2026-02-21T09:02:21.5765060Z cvt.f32.bf16 %r655, %rs51; 2026-02-21T09:02:21.5765228Z cvt.f32.bf16 %r672, %rs54; 2026-02-21T09:02:21.5765402Z cvt.f32.bf16 %r673, %rs55; 2026-02-21T09:02:21.5765571Z cvt.f32.bf16 %r674, %rs58; 2026-02-21T09:02:21.5765746Z cvt.f32.bf16 %r675, %rs59; 2026-02-21T09:02:21.5766067Z .loc 1 58 87 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:58:87 2026-02-21T09:02:21.5766419Z // begin inline asm 2026-02-21T09:02:21.5766712Z mov.u64 %rd194, 0x0; 2026-02-21T09:02:21.5766951Z createpolicy.fractional.L2::evict_first.b64 %rd194, 1.0; 2026-02-21T09:02:21.5767216Z // end inline asm 
2026-02-21T09:02:21.5767477Z // begin inline asm 2026-02-21T09:02:21.5767633Z mov.u16 %rs43, 0x0; 2026-02-21T09:02:21.5767895Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs43 }, [ %rd254 + 0 ], %rd194; 2026-02-21T09:02:21.5768202Z // end inline asm 2026-02-21T09:02:21.5768505Z .loc 1 61 28 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:61:28 2026-02-21T09:02:21.5768866Z shl.b16 %rs60, %rs43, 4; 2026-02-21T09:02:21.5769175Z .loc 1 76 58 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:76:58 2026-02-21T09:02:21.5769536Z selp.b16 %rs61, %rs60, %rs43, %p24; 2026-02-21T09:02:21.5769751Z cvt.s16.s8 %rs62, %rs61; 2026-02-21T09:02:21.5769931Z shr.s16 %rs63, %rs62, 4; 2026-02-21T09:02:21.5770320Z .loc 1 81 32 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:81:32 2026-02-21T09:02:21.5770681Z cvt.rn.f32.s16 %r714, %rs63; 2026-02-21T09:02:21.5770866Z st.shared.b32 [%r26], %r714; 2026-02-21T09:02:21.5771045Z $L__tmp4: 2026-02-21T09:02:21.5771400Z .loc 2 291 36 // standard.py:291:36 @[ cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:88:40 ] 2026-02-21T09:02:21.5771812Z // begin inline asm 2026-02-21T09:02:21.5771995Z fence.proxy.async.shared::cta; 2026-02-21T09:02:21.5772182Z // end inline asm 2026-02-21T09:02:21.5772330Z bar.sync 0; 2026-02-21T09:02:21.5772493Z shfl.sync.idx.b32 %r715, %r4, 0, 31, -1; 2026-02-21T09:02:21.5772722Z wgmma.fence.sync.aligned; 2026-02-21T09:02:21.5772900Z mov.pred %p44, -1; 2026-02-21T09:02:21.5773063Z // begin inline asm 2026-02-21T09:02:21.5773511Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r931,%r932,%r933,%r934,%r935,%r936,%r937,%r938}, {%r612,%r613,%r614,%r615}, %rd235, %p44, 1, 1; 2026-02-21T09:02:21.5774081Z // end inline asm 2026-02-21T09:02:21.5774233Z // begin inline asm 2026-02-21T09:02:21.5774667Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r931,%r932,%r933,%r934,%r935,%r936,%r937,%r938}, {%r632,%r633,%r634,%r635}, %rd236, %p44, 1, 1; 2026-02-21T09:02:21.5775157Z // end 
inline asm 2026-02-21T09:02:21.5775303Z // begin inline asm 2026-02-21T09:02:21.5775732Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r931,%r932,%r933,%r934,%r935,%r936,%r937,%r938}, {%r652,%r653,%r654,%r655}, %rd237, %p44, 1, 1; 2026-02-21T09:02:21.5776213Z // end inline asm 2026-02-21T09:02:21.5776362Z // begin inline asm 2026-02-21T09:02:21.5776936Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r931,%r932,%r933,%r934,%r935,%r936,%r937,%r938}, {%r672,%r673,%r674,%r675}, %rd238, %p44, 1, 1; 2026-02-21T09:02:21.5777414Z // end inline asm 2026-02-21T09:02:21.5777600Z wgmma.commit_group.sync.aligned; 2026-02-21T09:02:21.5777798Z mov.b32 %r686, 0; 2026-02-21T09:02:21.5777957Z mov.b32 %r685, %r686; 2026-02-21T09:02:21.5778123Z mov.b32 %r684, %r174; 2026-02-21T09:02:21.5778392Z // begin inline asm 2026-02-21T09:02:21.5778648Z // wait for regs: %r931,%r932,%r933,%r934,%r935,%r936,%r937,%r938,%r684,%r685,%r686 2026-02-21T09:02:21.5778971Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:02:21.5779173Z // end inline asm 2026-02-21T09:02:21.5779318Z $L__tmp5: 2026-02-21T09:02:21.5779606Z .loc 1 44 92 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:44:92 2026-02-21T09:02:21.5779962Z add.s32 %r716, %r930, 1; 2026-02-21T09:02:21.5780144Z setp.gt.s32 %p51, %r716, 1; 2026-02-21T09:02:21.5780329Z selp.b32 %r930, 0, %r716, %p51; 2026-02-21T09:02:21.5780658Z .loc 1 52 32 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:32 2026-02-21T09:02:21.5781023Z mad.wide.s32 %rd204, %r928, 2, %rd65; 2026-02-21T09:02:21.5781363Z .loc 1 52 80 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:80 2026-02-21T09:02:21.5781723Z shl.b32 %r717, %r930, 14; 2026-02-21T09:02:21.5781905Z add.s32 %r718, %r151, %r717; 2026-02-21T09:02:21.5782088Z add.s32 %r698, %r718, %r13; 2026-02-21T09:02:21.5782267Z selp.b32 %r699, 8, 0, %p49; 2026-02-21T09:02:21.5782532Z // begin inline asm 2026-02-21T09:02:21.5782775Z cp.async.ca.shared.global [ %r698 + 0 ], [ %rd255 + 
0 ], 0x8, %r699; 2026-02-21T09:02:21.5783052Z // end inline asm 2026-02-21T09:02:21.5783211Z add.s32 %r700, %r698, 4096; 2026-02-21T09:02:21.5783384Z // begin inline asm 2026-02-21T09:02:21.5783614Z cp.async.ca.shared.global [ %r700 + 0 ], [ %rd256 + 0 ], 0x8, %r699; 2026-02-21T09:02:21.5783883Z // end inline asm 2026-02-21T09:02:21.5784036Z add.s32 %r702, %r698, 8192; 2026-02-21T09:02:21.5784208Z // begin inline asm 2026-02-21T09:02:21.5784435Z cp.async.ca.shared.global [ %r702 + 0 ], [ %rd257 + 0 ], 0x8, %r699; 2026-02-21T09:02:21.5784706Z // end inline asm 2026-02-21T09:02:21.5784856Z add.s32 %r704, %r698, 12288; 2026-02-21T09:02:21.5785034Z // begin inline asm 2026-02-21T09:02:21.5785333Z cp.async.ca.shared.global [ %r704 + 0 ], [ %rd204 + 0 ], 0x8, %r699; 2026-02-21T09:02:21.5785607Z // end inline asm 2026-02-21T09:02:21.5785758Z cp.async.commit_group; 2026-02-21T09:02:21.5786069Z .loc 1 44 92 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:44:92 2026-02-21T09:02:21.5786418Z add.s64 %rd257, %rd257, 64; 2026-02-21T09:02:21.5786744Z add.s64 %rd256, %rd256, 64; 2026-02-21T09:02:21.5786924Z add.s64 %rd255, %rd255, 64; 2026-02-21T09:02:21.5787097Z add.s32 %r928, %r928, 32; 2026-02-21T09:02:21.5787283Z add.s64 %rd254, %rd254, 131072; 2026-02-21T09:02:21.5787474Z setp.lt.u64 %p52, %rd258, 496; 2026-02-21T09:02:21.5787666Z @%p52 bra $L__BB0_7; 2026-02-21T09:02:21.5787877Z // %bb.8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:02:21.5788147Z cp.async.wait_group 0; 2026-02-21T09:02:21.5788405Z bar.sync 0; 2026-02-21T09:02:21.5788709Z .loc 1 91 28 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:91:28 2026-02-21T09:02:21.5789173Z cvt.rn.bf16x2.f32 %r741, %r932, %r931; 2026-02-21T09:02:21.5789388Z cvt.rn.bf16x2.f32 %r742, %r934, %r933; 2026-02-21T09:02:21.5789606Z cvt.rn.bf16x2.f32 %r743, %r936, %r935; 2026-02-21T09:02:21.5789813Z cvt.rn.bf16x2.f32 %r744, %r938, %r937; 2026-02-21T09:02:21.5790154Z .loc 1 92 43 // 
cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:92:43 2026-02-21T09:02:21.5790616Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r27], {%r741, %r742, %r743, %r744}; 2026-02-21T09:02:21.5790946Z // begin inline asm 2026-02-21T09:02:21.5791123Z fence.proxy.async.shared::cta; 2026-02-21T09:02:21.5791319Z // end inline asm 2026-02-21T09:02:21.5791469Z bar.sync 0; 2026-02-21T09:02:21.5791623Z elect.sync %r745|%p55, -1; 2026-02-21T09:02:21.5791814Z and.pred %p53, %p1, %p55; 2026-02-21T09:02:21.5792000Z // begin inline asm 2026-02-21T09:02:21.5792330Z @%p53 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd205, {%r719, %r720}], [%r151]; 2026-02-21T09:02:21.5792793Z // end inline asm 2026-02-21T09:02:21.5792967Z cp.async.bulk.commit_group; 2026-02-21T09:02:21.5793166Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:02:21.5793366Z bar.sync 0; 2026-02-21T09:02:21.5793654Z .loc 1 29 88 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:29:88 2026-02-21T09:02:21.5794010Z or.b32 %r746, %r905, 3; 2026-02-21T09:02:21.5794323Z .loc 1 33 31 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:33:31 2026-02-21T09:02:21.5794670Z add.s32 %r749, %r746, %r214; 2026-02-21T09:02:21.5794854Z shr.s32 %r750, %r749, 9; 2026-02-21T09:02:21.5795158Z .loc 1 32 30 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:32:30 2026-02-21T09:02:21.5795513Z and.b32 %r751, %r749, 268434944; 2026-02-21T09:02:21.5795705Z sub.s32 %r752, %r746, %r751; 2026-02-21T09:02:21.5796014Z .loc 1 34 27 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:34:27 2026-02-21T09:02:21.5796365Z shl.b32 %r108, %r752, 4; 2026-02-21T09:02:21.5796705Z .loc 1 36 27 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:36:27 2026-02-21T09:02:21.5796864Z shl.b32 %r898, %r750, 8; 2026-02-21T09:02:21.5797074Z .loc 1 37 32 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:37:32 2026-02-21T09:02:21.5797140Z or.b32 %r753, %r898, %r6; 2026-02-21T09:02:21.5797200Z or.b32 %r754, %r898, 
%r7; 2026-02-21T09:02:21.5797261Z or.b32 %r755, %r898, %r8; 2026-02-21T09:02:21.5797326Z or.b32 %r756, %r898, %r9; 2026-02-21T09:02:21.5797524Z .loc 1 52 53 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:53 2026-02-21T09:02:21.5797585Z shl.b32 %r757, %r753, 10; 2026-02-21T09:02:21.5797651Z shl.b32 %r758, %r754, 10; 2026-02-21T09:02:21.5797711Z shl.b32 %r759, %r755, 10; 2026-02-21T09:02:21.5797774Z shl.b32 %r760, %r756, 10; 2026-02-21T09:02:21.5798049Z .loc 1 52 60 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:60 2026-02-21T09:02:21.5798116Z or.b32 %r761, %r757, %r11; 2026-02-21T09:02:21.5798177Z or.b32 %r762, %r758, %r11; 2026-02-21T09:02:21.5798240Z or.b32 %r763, %r759, %r11; 2026-02-21T09:02:21.5798319Z or.b32 %r764, %r760, %r11; 2026-02-21T09:02:21.5798518Z .loc 1 52 32 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:32 2026-02-21T09:02:21.5798594Z mad.wide.s32 %rd206, %r761, 2, %rd65; 2026-02-21T09:02:21.5798670Z mad.wide.s32 %rd207, %r762, 2, %rd65; 2026-02-21T09:02:21.5798739Z mad.wide.s32 %rd208, %r763, 2, %rd65; 2026-02-21T09:02:21.5798806Z mad.wide.s32 %rd209, %r764, 2, %rd65; 2026-02-21T09:02:21.5798869Z mov.b32 %r723, 8; 2026-02-21T09:02:21.5799065Z .loc 1 52 80 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:80 2026-02-21T09:02:21.5799127Z // begin inline asm 2026-02-21T09:02:21.5799352Z cp.async.ca.shared.global [ %r722 + 0 ], [ %rd206 + 0 ], 0x8, %r723; 2026-02-21T09:02:21.5799418Z // end inline asm 2026-02-21T09:02:21.5799477Z // begin inline asm 2026-02-21T09:02:21.5799611Z cp.async.ca.shared.global [ %r724 + 0 ], [ %rd207 + 0 ], 0x8, %r723; 2026-02-21T09:02:21.5799676Z // end inline asm 2026-02-21T09:02:21.5799734Z // begin inline asm 2026-02-21T09:02:21.5799865Z cp.async.ca.shared.global [ %r726 + 0 ], [ %rd208 + 0 ], 0x8, %r723; 2026-02-21T09:02:21.5799923Z // end inline asm 2026-02-21T09:02:21.5799989Z // begin inline asm 2026-02-21T09:02:21.5800117Z cp.async.ca.shared.global [ %r728 + 0 
], [ %rd209 + 0 ], 0x8, %r723; 2026-02-21T09:02:21.5800174Z // end inline asm 2026-02-21T09:02:21.5800250Z cp.async.commit_group; 2026-02-21T09:02:21.5800451Z .loc 1 52 32 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:32 2026-02-21T09:02:21.5800518Z cvt.s64.s32 %rd215, %r757; 2026-02-21T09:02:21.5800588Z or.b64 %rd216, %rd215, %rd8; 2026-02-21T09:02:21.5800655Z shl.b64 %rd217, %rd216, 1; 2026-02-21T09:02:21.5800794Z add.s64 %rd218, %rd65, %rd217; 2026-02-21T09:02:21.5800863Z add.s64 %rd210, %rd218, 64; 2026-02-21T09:02:21.5800931Z cvt.s64.s32 %rd219, %r758; 2026-02-21T09:02:21.5800995Z or.b64 %rd220, %rd219, %rd8; 2026-02-21T09:02:21.5801058Z shl.b64 %rd221, %rd220, 1; 2026-02-21T09:02:21.5801124Z add.s64 %rd222, %rd65, %rd221; 2026-02-21T09:02:21.5801186Z add.s64 %rd211, %rd222, 64; 2026-02-21T09:02:21.5801260Z cvt.s64.s32 %rd223, %r759; 2026-02-21T09:02:21.5801327Z or.b64 %rd224, %rd223, %rd8; 2026-02-21T09:02:21.5801397Z shl.b64 %rd225, %rd224, 1; 2026-02-21T09:02:21.5801461Z add.s64 %rd226, %rd65, %rd225; 2026-02-21T09:02:21.5801522Z add.s64 %rd212, %rd226, 64; 2026-02-21T09:02:21.5801590Z cvt.s64.s32 %rd227, %r760; 2026-02-21T09:02:21.5801652Z or.b64 %rd228, %rd227, %rd8; 2026-02-21T09:02:21.5801714Z shl.b64 %rd229, %rd228, 1; 2026-02-21T09:02:21.5801778Z add.s64 %rd230, %rd65, %rd229; 2026-02-21T09:02:21.5801850Z add.s64 %rd213, %rd230, 64; 2026-02-21T09:02:21.5802051Z .loc 1 52 80 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:80 2026-02-21T09:02:21.5802110Z bar.sync 0; 2026-02-21T09:02:21.5802235Z // begin inline asm 2026-02-21T09:02:21.5802370Z cp.async.ca.shared.global [ %r730 + 0 ], [ %rd210 + 0 ], 0x8, %r723; 2026-02-21T09:02:21.5802428Z // end inline asm 2026-02-21T09:02:21.5802493Z // begin inline asm 2026-02-21T09:02:21.5802625Z cp.async.ca.shared.global [ %r732 + 0 ], [ %rd211 + 0 ], 0x8, %r723; 2026-02-21T09:02:21.5802683Z // end inline asm 2026-02-21T09:02:21.5802742Z // begin inline asm 2026-02-21T09:02:21.5802878Z 
cp.async.ca.shared.global [ %r734 + 0 ], [ %rd212 + 0 ], 0x8, %r723; 2026-02-21T09:02:21.5802937Z // end inline asm 2026-02-21T09:02:21.5802997Z // begin inline asm 2026-02-21T09:02:21.5803136Z cp.async.ca.shared.global [ %r736 + 0 ], [ %rd213 + 0 ], 0x8, %r723; 2026-02-21T09:02:21.5803196Z // end inline asm 2026-02-21T09:02:21.5803315Z cp.async.commit_group; 2026-02-21T09:02:21.5803526Z .loc 1 44 92 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:44:92 2026-02-21T09:02:21.5803591Z or.b32 %r765, %r8, %r898; 2026-02-21T09:02:21.5803659Z shl.b32 %r766, %r765, 10; 2026-02-21T09:02:21.5803732Z mad.wide.s32 %rd262, %r766, 2, %rd6; 2026-02-21T09:02:21.5803800Z shl.b32 %r767, %r750, 18; 2026-02-21T09:02:21.5803862Z or.b32 %r768, %r29, %r767; 2026-02-21T09:02:21.5803932Z mad.wide.s32 %rd261, %r768, 2, %rd6; 2026-02-21T09:02:21.5803999Z or.b32 %r769, %r30, %r767; 2026-02-21T09:02:21.5804067Z mad.wide.s32 %rd260, %r769, 2, %rd6; 2026-02-21T09:02:21.5804129Z or.b32 %r939, %r31, %r767; 2026-02-21T09:02:21.5804194Z shl.b32 %r770, %r746, 4; 2026-02-21T09:02:21.5804262Z or.b32 %r771, %r5, %r770; 2026-02-21T09:02:21.5804325Z shl.b32 %r772, %r750, 13; 2026-02-21T09:02:21.5804387Z sub.s32 %r773, %r771, %r772; 2026-02-21T09:02:21.5804461Z cvt.s64.s32 %rd231, %r773; 2026-02-21T09:02:21.5804593Z add.s64 %rd259, %rd7, %rd231; 2026-02-21T09:02:21.5804661Z mov.b32 %r942, 0f00000000; 2026-02-21T09:02:21.5804720Z mov.b32 %r941, 1; 2026-02-21T09:02:21.5804787Z mov.b32 %r940, -1; 2026-02-21T09:02:21.5804851Z mov.b64 %rd263, -16; 2026-02-21T09:02:21.5804913Z mov.b32 %r943, %r942; 2026-02-21T09:02:21.5804976Z mov.b32 %r944, %r942; 2026-02-21T09:02:21.5805036Z mov.b32 %r945, %r942; 2026-02-21T09:02:21.5805095Z mov.b32 %r946, %r942; 2026-02-21T09:02:21.5805153Z mov.b32 %r947, %r942; 2026-02-21T09:02:21.5805220Z mov.b32 %r948, %r942; 2026-02-21T09:02:21.5805278Z mov.b32 %r949, %r942; 2026-02-21T09:02:21.5805393Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T09:02:21.5805506Z // => 
This Inner Loop Header: Depth=2 2026-02-21T09:02:21.5805571Z add.s64 %rd263, %rd263, 16; 2026-02-21T09:02:21.5805641Z setp.lt.u64 %p61, %rd263, 480; 2026-02-21T09:02:21.5805708Z add.s32 %r884, %r940, 1; 2026-02-21T09:02:21.5805776Z setp.gt.s32 %p62, %r884, 1; 2026-02-21T09:02:21.5805897Z selp.b32 %r940, 0, %r884, %p62; 2026-02-21T09:02:21.5806101Z .loc 1 52 80 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:80 2026-02-21T09:02:21.5806181Z cp.async.wait_group 1; 2026-02-21T09:02:21.5806251Z bar.sync 0; 2026-02-21T09:02:21.5806315Z shl.b32 %r885, %r940, 14; 2026-02-21T09:02:21.5806383Z add.s32 %r887, %r151, %r885; 2026-02-21T09:02:21.5806696Z .loc 1 56 32 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:56:32 2026-02-21T09:02:21.5806763Z add.s32 %r888, %r887, %r22; 2026-02-21T09:02:21.5806833Z ld.shared.b16 %rs65, [%r888]; 2026-02-21T09:02:21.5806909Z ld.shared.b16 %rs66, [%r888+512]; 2026-02-21T09:02:21.5806977Z ld.shared.b16 %rs67, [%r888+32]; 2026-02-21T09:02:21.5807043Z ld.shared.b16 %rs68, [%r888+544]; 2026-02-21T09:02:21.5807111Z add.s32 %r889, %r887, %r23; 2026-02-21T09:02:21.5807177Z ld.shared.b16 %rs69, [%r889]; 2026-02-21T09:02:21.5807246Z ld.shared.b16 %rs70, [%r889+512]; 2026-02-21T09:02:21.5807320Z ld.shared.b16 %rs71, [%r889+32]; 2026-02-21T09:02:21.5807385Z ld.shared.b16 %rs72, [%r889+544]; 2026-02-21T09:02:21.5807447Z add.s32 %r890, %r887, %r24; 2026-02-21T09:02:21.5807598Z ld.shared.b16 %rs73, [%r890]; 2026-02-21T09:02:21.5807677Z ld.shared.b16 %rs74, [%r890+512]; 2026-02-21T09:02:21.5807752Z ld.shared.b16 %rs75, [%r890+32]; 2026-02-21T09:02:21.5807820Z ld.shared.b16 %rs76, [%r890+544]; 2026-02-21T09:02:21.5807890Z add.s32 %r891, %r887, %r25; 2026-02-21T09:02:21.5807954Z ld.shared.b16 %rs77, [%r891]; 2026-02-21T09:02:21.5808020Z ld.shared.b16 %rs78, [%r891+512]; 2026-02-21T09:02:21.5808085Z ld.shared.b16 %rs79, [%r891+32]; 2026-02-21T09:02:21.5808156Z ld.shared.b16 %rs80, [%r891+544]; 2026-02-21T09:02:21.5808220Z cvt.f32.bf16 
%r790, %rs65; 2026-02-21T09:02:21.5808285Z cvt.f32.bf16 %r791, %rs66; 2026-02-21T09:02:21.5808353Z cvt.f32.bf16 %r792, %rs69; 2026-02-21T09:02:21.5808416Z cvt.f32.bf16 %r793, %rs70; 2026-02-21T09:02:21.5808552Z cvt.f32.bf16 %r810, %rs73; 2026-02-21T09:02:21.5808618Z cvt.f32.bf16 %r811, %rs74; 2026-02-21T09:02:21.5808690Z cvt.f32.bf16 %r812, %rs77; 2026-02-21T09:02:21.5808755Z cvt.f32.bf16 %r813, %rs78; 2026-02-21T09:02:21.5808818Z cvt.f32.bf16 %r830, %rs67; 2026-02-21T09:02:21.5808887Z cvt.f32.bf16 %r831, %rs68; 2026-02-21T09:02:21.5808950Z cvt.f32.bf16 %r832, %rs71; 2026-02-21T09:02:21.5809011Z cvt.f32.bf16 %r833, %rs72; 2026-02-21T09:02:21.5809072Z cvt.f32.bf16 %r850, %rs75; 2026-02-21T09:02:21.5809141Z cvt.f32.bf16 %r851, %rs76; 2026-02-21T09:02:21.5809202Z cvt.f32.bf16 %r852, %rs79; 2026-02-21T09:02:21.5809264Z cvt.f32.bf16 %r853, %rs80; 2026-02-21T09:02:21.5809473Z .loc 1 58 87 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:58:87 2026-02-21T09:02:21.5809539Z // begin inline asm 2026-02-21T09:02:21.5809601Z mov.u64 %rd232, 0x0; 2026-02-21T09:02:21.5809739Z createpolicy.fractional.L2::evict_first.b64 %rd232, 1.0; 2026-02-21T09:02:21.5809907Z // end inline asm 2026-02-21T09:02:21.5809971Z // begin inline asm 2026-02-21T09:02:21.5810031Z mov.u16 %rs64, 0x0; 2026-02-21T09:02:21.5810203Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs64 }, [ %rd259 + 0 ], %rd232; 2026-02-21T09:02:21.5810265Z // end inline asm 2026-02-21T09:02:21.5810464Z .loc 1 61 28 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:61:28 2026-02-21T09:02:21.5810535Z shl.b16 %rs81, %rs64, 4; 2026-02-21T09:02:21.5810733Z .loc 1 76 58 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:76:58 2026-02-21T09:02:21.5810806Z selp.b16 %rs82, %rs81, %rs64, %p24; 2026-02-21T09:02:21.5810876Z cvt.s16.s8 %rs83, %rs82; 2026-02-21T09:02:21.5810939Z shr.s16 %rs84, %rs83, 4; 2026-02-21T09:02:21.5811133Z .loc 1 81 32 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:81:32 
2026-02-21T09:02:21.5811199Z cvt.rn.f32.s16 %r892, %rs84; 2026-02-21T09:02:21.5811279Z st.shared.b32 [%r26], %r892; 2026-02-21T09:02:21.5811411Z $L__tmp6: 2026-02-21T09:02:21.5811686Z .loc 2 291 36 // standard.py:291:36 @[ cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:88:40 ] 2026-02-21T09:02:21.5811760Z // begin inline asm 2026-02-21T09:02:21.5811842Z fence.proxy.async.shared::cta; 2026-02-21T09:02:21.5811900Z // end inline asm 2026-02-21T09:02:21.5811963Z bar.sync 0; 2026-02-21T09:02:21.5812046Z shfl.sync.idx.b32 %r893, %r4, 0, 31, -1; 2026-02-21T09:02:21.5812119Z wgmma.fence.sync.aligned; 2026-02-21T09:02:21.5812187Z mov.pred %p56, -1; 2026-02-21T09:02:21.5812255Z // begin inline asm 2026-02-21T09:02:21.5812605Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r942,%r943,%r944,%r945,%r946,%r947,%r948,%r949}, {%r790,%r791,%r792,%r793}, %rd235, %p56, 1, 1; 2026-02-21T09:02:21.5812668Z // end inline asm 2026-02-21T09:02:21.5812734Z // begin inline asm 2026-02-21T09:02:21.5813075Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r942,%r943,%r944,%r945,%r946,%r947,%r948,%r949}, {%r810,%r811,%r812,%r813}, %rd236, %p56, 1, 1; 2026-02-21T09:02:21.5813137Z // end inline asm 2026-02-21T09:02:21.5813206Z // begin inline asm 2026-02-21T09:02:21.5813595Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r942,%r943,%r944,%r945,%r946,%r947,%r948,%r949}, {%r830,%r831,%r832,%r833}, %rd237, %p56, 1, 1; 2026-02-21T09:02:21.5813655Z // end inline asm 2026-02-21T09:02:21.5813717Z // begin inline asm 2026-02-21T09:02:21.5814055Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r942,%r943,%r944,%r945,%r946,%r947,%r948,%r949}, {%r850,%r851,%r852,%r853}, %rd238, %p56, 1, 1; 2026-02-21T09:02:21.5814114Z // end inline asm 2026-02-21T09:02:21.5814193Z wgmma.commit_group.sync.aligned; 2026-02-21T09:02:21.5814257Z mov.b32 %r863, 0; 2026-02-21T09:02:21.5814321Z mov.b32 %r862, %r174; 2026-02-21T09:02:21.5814379Z mov.b32 %r864, %r863; 2026-02-21T09:02:21.5814442Z // 
begin inline asm 2026-02-21T09:02:21.5814650Z // wait for regs: %r942,%r943,%r944,%r945,%r946,%r947,%r948,%r949,%r862,%r863,%r864 2026-02-21T09:02:21.5814731Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:02:21.5814790Z // end inline asm 2026-02-21T09:02:21.5814852Z $L__tmp7: 2026-02-21T09:02:21.5815057Z .loc 1 44 92 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:44:92 2026-02-21T09:02:21.5815118Z add.s32 %r894, %r941, 1; 2026-02-21T09:02:21.5815190Z setp.gt.s32 %p63, %r894, 1; 2026-02-21T09:02:21.5815255Z selp.b32 %r941, 0, %r894, %p63; 2026-02-21T09:02:21.5815462Z .loc 1 52 32 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:32 2026-02-21T09:02:21.5815541Z mad.wide.s32 %rd242, %r939, 2, %rd65; 2026-02-21T09:02:21.5815735Z .loc 1 52 80 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:52:80 2026-02-21T09:02:21.5815799Z shl.b32 %r895, %r941, 14; 2026-02-21T09:02:21.5815861Z add.s32 %r896, %r151, %r895; 2026-02-21T09:02:21.5815983Z add.s32 %r876, %r896, %r13; 2026-02-21T09:02:21.5816048Z selp.b32 %r877, 8, 0, %p61; 2026-02-21T09:02:21.5816111Z // begin inline asm 2026-02-21T09:02:21.5816255Z cp.async.ca.shared.global [ %r876 + 0 ], [ %rd260 + 0 ], 0x8, %r877; 2026-02-21T09:02:21.5816315Z // end inline asm 2026-02-21T09:02:21.5816376Z add.s32 %r878, %r876, 4096; 2026-02-21T09:02:21.5816437Z // begin inline asm 2026-02-21T09:02:21.5816699Z cp.async.ca.shared.global [ %r878 + 0 ], [ %rd261 + 0 ], 0x8, %r877; 2026-02-21T09:02:21.5816757Z // end inline asm 2026-02-21T09:02:21.5816818Z add.s32 %r880, %r876, 8192; 2026-02-21T09:02:21.5816884Z // begin inline asm 2026-02-21T09:02:21.5817015Z cp.async.ca.shared.global [ %r880 + 0 ], [ %rd262 + 0 ], 0x8, %r877; 2026-02-21T09:02:21.5817075Z // end inline asm 2026-02-21T09:02:21.5817142Z add.s32 %r882, %r876, 12288; 2026-02-21T09:02:21.5817201Z // begin inline asm 2026-02-21T09:02:21.5817330Z cp.async.ca.shared.global [ %r882 + 0 ], [ %rd242 + 0 ], 0x8, %r877; 2026-02-21T09:02:21.5817390Z // 
end inline asm 2026-02-21T09:02:21.5817463Z cp.async.commit_group; 2026-02-21T09:02:21.5817739Z .loc 1 44 92 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:44:92 2026-02-21T09:02:21.5817821Z add.s64 %rd262, %rd262, 64; 2026-02-21T09:02:21.5817894Z add.s64 %rd261, %rd261, 64; 2026-02-21T09:02:21.5817956Z add.s64 %rd260, %rd260, 64; 2026-02-21T09:02:21.5818017Z add.s32 %r939, %r939, 32; 2026-02-21T09:02:21.5818082Z add.s64 %rd259, %rd259, 131072; 2026-02-21T09:02:21.5818157Z setp.lt.u64 %p64, %rd263, 496; 2026-02-21T09:02:21.5818222Z @%p64 bra $L__BB0_9; 2026-02-21T09:02:21.5818336Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:02:21.5818410Z cp.async.wait_group 0; 2026-02-21T09:02:21.5818468Z bar.sync 0; 2026-02-21T09:02:21.5818671Z .loc 1 91 28 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:91:28 2026-02-21T09:02:21.5818752Z cvt.rn.bf16x2.f32 %r900, %r943, %r942; 2026-02-21T09:02:21.5818828Z cvt.rn.bf16x2.f32 %r901, %r945, %r944; 2026-02-21T09:02:21.5818900Z cvt.rn.bf16x2.f32 %r902, %r947, %r946; 2026-02-21T09:02:21.5818970Z cvt.rn.bf16x2.f32 %r903, %r949, %r948; 2026-02-21T09:02:21.5819253Z .loc 1 92 43 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:92:43 2026-02-21T09:02:21.5819436Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r27], {%r900, %r901, %r902, %r903}; 2026-02-21T09:02:21.5819498Z // begin inline asm 2026-02-21T09:02:21.5819582Z fence.proxy.async.shared::cta; 2026-02-21T09:02:21.5819640Z // end inline asm 2026-02-21T09:02:21.5819697Z bar.sync 0; 2026-02-21T09:02:21.5819775Z elect.sync %r904|%p67, -1; 2026-02-21T09:02:21.5819853Z and.pred %p65, %p1, %p67; 2026-02-21T09:02:21.5819916Z // begin inline asm 2026-02-21T09:02:21.5820141Z @%p65 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd205, {%r108, %r898}], [%r151]; 2026-02-21T09:02:21.5820205Z // end inline asm 2026-02-21T09:02:21.5820281Z cp.async.bulk.commit_group; 2026-02-21T09:02:21.5820436Z cp.async.bulk.wait_group.read 0; 
2026-02-21T09:02:21.5820502Z bar.sync 0; 2026-02-21T09:02:21.5820703Z .loc 1 29 88 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:29:88 2026-02-21T09:02:21.5820771Z add.s32 %r905, %r905, 4; 2026-02-21T09:02:21.5820844Z setp.lt.s32 %p68, %r905, %r3; 2026-02-21T09:02:21.5820906Z @%p68 bra $L__BB0_2; 2026-02-21T09:02:21.5820997Z $L__BB0_11: // %.preheader 2026-02-21T09:02:21.5821192Z .loc 1 29 4 // cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py:29:4 2026-02-21T09:02:21.5821253Z ret; 2026-02-21T09:02:21.5821309Z $L__tmp8: 2026-02-21T09:02:21.5821366Z $L__func_end0: 2026-02-21T09:02:21.5821460Z // -- End function 2026-02-21T09:02:21.5821517Z } 2026-02-21T09:02:21.5821764Z .file 1 "/tmp/torchinductor_root/lt/cltm5co4gfnxyrxvtnq2te335gs7jtekakrixlb62g65qfykqv7m.py" 2026-02-21T09:02:21.5822070Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T09:02:21.5822143Z .section .debug_abbrev 2026-02-21T09:02:21.5822198Z { 2026-02-21T09:02:21.5822298Z .b8 1 // Abbreviation Code 2026-02-21T09:02:21.5822400Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:02:21.5822489Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:02:21.5822577Z .b8 37 // DW_AT_producer 2026-02-21T09:02:21.5822665Z .b8 8 // DW_FORM_string 2026-02-21T09:02:21.5822746Z .b8 19 // DW_AT_language 2026-02-21T09:02:21.5822830Z .b8 5 // DW_FORM_data2 2026-02-21T09:02:21.5822911Z .b8 3 // DW_AT_name 2026-02-21T09:02:21.5822997Z .b8 8 // DW_FORM_string 2026-02-21T09:02:21.5823092Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:02:21.5823233Z .b8 6 // DW_FORM_data4 2026-02-21T09:02:21.5823327Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:02:21.5823409Z .b8 8 // DW_FORM_string 2026-02-21T09:02:21.5823488Z .b8 0 // EOM(1) 2026-02-21T09:02:21.5823565Z .b8 0 // EOM(2) 2026-02-21T09:02:21.5823655Z .b8 2 // Abbreviation Code 2026-02-21T09:02:21.5823746Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:02:21.5823833Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:02:21.5823910Z .b8 3 // DW_AT_name 
2026-02-21T09:02:21.5823993Z .b8 8 // DW_FORM_string 2026-02-21T09:02:21.5824077Z .b8 32 // DW_AT_inline 2026-02-21T09:02:21.5824166Z .b8 11 // DW_FORM_data1 2026-02-21T09:02:21.5824241Z .b8 0 // EOM(1) 2026-02-21T09:02:21.5824314Z .b8 0 // EOM(2) 2026-02-21T09:02:21.5824406Z .b8 3 // Abbreviation Code 2026-02-21T09:02:21.5824549Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:02:21.5824633Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:02:21.5824718Z .b8 17 // DW_AT_low_pc 2026-02-21T09:02:21.5824795Z .b8 1 // DW_FORM_addr 2026-02-21T09:02:21.5824879Z .b8 18 // DW_AT_high_pc 2026-02-21T09:02:21.5824958Z .b8 1 // DW_FORM_addr 2026-02-21T09:02:21.5825058Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:02:21.5825136Z .b8 19 // DW_FORM_ref4 2026-02-21T09:02:21.5825262Z .b8 0 // EOM(1) 2026-02-21T09:02:21.5825344Z .b8 0 // EOM(2) 2026-02-21T09:02:21.5825431Z .b8 4 // Abbreviation Code 2026-02-21T09:02:21.5825535Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T09:02:21.5825621Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:02:21.5825719Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:02:21.5825796Z .b8 19 // DW_FORM_ref4 2026-02-21T09:02:21.5825874Z .b8 17 // DW_AT_low_pc 2026-02-21T09:02:21.5825957Z .b8 1 // DW_FORM_addr 2026-02-21T09:02:21.5826038Z .b8 18 // DW_AT_high_pc 2026-02-21T09:02:21.5826114Z .b8 1 // DW_FORM_addr 2026-02-21T09:02:21.5826205Z .b8 88 // DW_AT_call_file 2026-02-21T09:02:21.5826337Z .b8 11 // DW_FORM_data1 2026-02-21T09:02:21.5826432Z .b8 89 // DW_AT_call_line 2026-02-21T09:02:21.5826634Z .b8 11 // DW_FORM_data1 2026-02-21T09:02:21.5826726Z .b8 87 // DW_AT_call_column 2026-02-21T09:02:21.5826805Z .b8 11 // DW_FORM_data1 2026-02-21T09:02:21.5826882Z .b8 0 // EOM(1) 2026-02-21T09:02:21.5826963Z .b8 0 // EOM(2) 2026-02-21T09:02:21.5827037Z .b8 0 // EOM(3) 2026-02-21T09:02:21.5827092Z } 2026-02-21T09:02:21.5827161Z .section .debug_info 2026-02-21T09:02:21.5827214Z { 2026-02-21T09:02:21.5827305Z .b32 178 // Length of Unit 2026-02-21T09:02:21.5827403Z .b8 2 // DWARF version 
number 2026-02-21T09:02:21.5827457Z .b8 0 2026-02-21T09:02:21.5827593Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:02:21.5827773Z .b8 8 // Address Size (in bytes) 2026-02-21T09:02:21.5827897Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T09:02:21.5827986Z .b8 116 // DW_AT_producer 2026-02-21T09:02:21.5828042Z .b8 114 2026-02-21T09:02:21.5828101Z .b8 105 2026-02-21T09:02:21.5828154Z .b8 116 2026-02-21T09:02:21.5828207Z .b8 111 2026-02-21T09:02:21.5828339Z .b8 110 2026-02-21T09:02:21.5828400Z .b8 0 2026-02-21T09:02:21.5828484Z .b8 2 // DW_AT_language 2026-02-21T09:02:21.5828538Z .b8 0 2026-02-21T09:02:21.5828625Z .b8 99 // DW_AT_name 2026-02-21T09:02:21.5828679Z .b8 108 2026-02-21T09:02:21.5828734Z .b8 116 2026-02-21T09:02:21.5828787Z .b8 109 2026-02-21T09:02:21.5828846Z .b8 53 2026-02-21T09:02:21.5828902Z .b8 99 2026-02-21T09:02:21.5828957Z .b8 111 2026-02-21T09:02:21.5829017Z .b8 52 2026-02-21T09:02:21.5829073Z .b8 103 2026-02-21T09:02:21.5829129Z .b8 102 2026-02-21T09:02:21.5829182Z .b8 110 2026-02-21T09:02:21.5829242Z .b8 120 2026-02-21T09:02:21.5829295Z .b8 121 2026-02-21T09:02:21.5829347Z .b8 114 2026-02-21T09:02:21.5829492Z .b8 120 2026-02-21T09:02:21.5829545Z .b8 118 2026-02-21T09:02:21.5829599Z .b8 116 2026-02-21T09:02:21.5829652Z .b8 110 2026-02-21T09:02:21.5829713Z .b8 113 2026-02-21T09:02:21.5829765Z .b8 50 2026-02-21T09:02:21.5829819Z .b8 116 2026-02-21T09:02:21.5829882Z .b8 101 2026-02-21T09:02:21.5829935Z .b8 51 2026-02-21T09:02:21.5829989Z .b8 51 2026-02-21T09:02:21.5830041Z .b8 53 2026-02-21T09:02:21.5830102Z .b8 103 2026-02-21T09:02:21.5830154Z .b8 115 2026-02-21T09:02:21.5830205Z .b8 55 2026-02-21T09:02:21.5830261Z .b8 106 2026-02-21T09:02:21.5830322Z .b8 116 2026-02-21T09:02:21.5830374Z .b8 101 2026-02-21T09:02:21.5830430Z .b8 107 2026-02-21T09:02:21.5830489Z .b8 97 2026-02-21T09:02:21.5830542Z .b8 107 2026-02-21T09:02:21.5830595Z .b8 114 2026-02-21T09:02:21.5830652Z .b8 105 2026-02-21T09:02:21.5830812Z .b8 120 
2026-02-21T09:02:21.5830871Z .b8 108 2026-02-21T09:02:21.5830926Z .b8 98 2026-02-21T09:02:21.5830983Z .b8 54 2026-02-21T09:02:21.5831036Z .b8 50 2026-02-21T09:02:21.5831089Z .b8 103 2026-02-21T09:02:21.5831144Z .b8 54 2026-02-21T09:02:21.5831202Z .b8 53 2026-02-21T09:02:21.5831268Z .b8 113 2026-02-21T09:02:21.5831324Z .b8 102 2026-02-21T09:02:21.5831379Z .b8 121 2026-02-21T09:02:21.5831440Z .b8 107 2026-02-21T09:02:21.5831497Z .b8 113 2026-02-21T09:02:21.5831549Z .b8 118 2026-02-21T09:02:21.5831606Z .b8 55 2026-02-21T09:02:21.5831659Z .b8 109 2026-02-21T09:02:21.5831711Z .b8 46 2026-02-21T09:02:21.5831763Z .b8 112 2026-02-21T09:02:21.5831820Z .b8 121 2026-02-21T09:02:21.5831872Z .b8 0 2026-02-21T09:02:21.5831975Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:02:21.5832066Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:02:21.5832119Z .b8 116 2026-02-21T09:02:21.5832171Z .b8 109 2026-02-21T09:02:21.5832309Z .b8 112 2026-02-21T09:02:21.5832368Z .b8 47 2026-02-21T09:02:21.5832424Z .b8 116 2026-02-21T09:02:21.5832476Z .b8 111 2026-02-21T09:02:21.5832534Z .b8 114 2026-02-21T09:02:21.5832587Z .b8 99 2026-02-21T09:02:21.5832639Z .b8 104 2026-02-21T09:02:21.5832697Z .b8 105 2026-02-21T09:02:21.5832753Z .b8 110 2026-02-21T09:02:21.5832809Z .b8 100 2026-02-21T09:02:21.5832862Z .b8 117 2026-02-21T09:02:21.5832919Z .b8 99 2026-02-21T09:02:21.5832971Z .b8 116 2026-02-21T09:02:21.5833023Z .b8 111 2026-02-21T09:02:21.5833075Z .b8 114 2026-02-21T09:02:21.5833131Z .b8 95 2026-02-21T09:02:21.5833184Z .b8 114 2026-02-21T09:02:21.5833237Z .b8 111 2026-02-21T09:02:21.5833291Z .b8 111 2026-02-21T09:02:21.5833350Z .b8 116 2026-02-21T09:02:21.5833403Z .b8 47 2026-02-21T09:02:21.5833455Z .b8 108 2026-02-21T09:02:21.5833511Z .b8 116 2026-02-21T09:02:21.5833563Z .b8 0 2026-02-21T09:02:21.5833678Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T09:02:21.5833758Z .b8 95 // DW_AT_name 2026-02-21T09:02:21.5833822Z .b8 104 2026-02-21T09:02:21.5833929Z .b8 101 2026-02-21T09:02:21.5833985Z 
.b8 108 2026-02-21T09:02:21.5834055Z .b8 105 2026-02-21T09:02:21.5834109Z .b8 111 2026-02-21T09:02:21.5834167Z .b8 110 2026-02-21T09:02:21.5834220Z .b8 95 2026-02-21T09:02:21.5834277Z .b8 109 2026-02-21T09:02:21.5834331Z .b8 97 2026-02-21T09:02:21.5834384Z .b8 116 2026-02-21T09:02:21.5834443Z .b8 109 2026-02-21T09:02:21.5834495Z .b8 117 2026-02-21T09:02:21.5834549Z .b8 108 2026-02-21T09:02:21.5834602Z .b8 95 2026-02-21T09:02:21.5834660Z .b8 98 2026-02-21T09:02:21.5834714Z .b8 102 2026-02-21T09:02:21.5834767Z .b8 49 2026-02-21T09:02:21.5834826Z .b8 54 2026-02-21T09:02:21.5834878Z .b8 95 2026-02-21T09:02:21.5834931Z .b8 105 2026-02-21T09:02:21.5834984Z .b8 110 2026-02-21T09:02:21.5835043Z .b8 116 2026-02-21T09:02:21.5835095Z .b8 52 2026-02-21T09:02:21.5835148Z .b8 0 2026-02-21T09:02:21.5835229Z .b8 1 // DW_AT_inline 2026-02-21T09:02:21.5835348Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T09:02:21.5835445Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T09:02:21.5835541Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T09:02:21.5835710Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:02:21.5835841Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T09:02:21.5835940Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:02:21.5836035Z .b64 $L__tmp0 // DW_AT_low_pc 2026-02-21T09:02:21.5836126Z .b64 $L__tmp7 // DW_AT_high_pc 2026-02-21T09:02:21.5836211Z .b8 1 // DW_AT_call_file 2026-02-21T09:02:21.5836300Z .b8 88 // DW_AT_call_line 2026-02-21T09:02:21.5836387Z .b8 40 // DW_AT_call_column 2026-02-21T09:02:21.5836686Z .b8 0 // End Of Children Mark 2026-02-21T09:02:21.5836799Z .b8 0 // End Of Children Mark 2026-02-21T09:02:21.5836860Z } 2026-02-21T09:02:21.5836933Z .section .debug_macinfo { } 2026-02-21T09:02:21.5836942Z 2026-02-21T09:02:21.5837026Z ================================================================ 2026-02-21T09:02:21.5837158Z please share the reproducer above with Triton project. 
2026-02-21T09:02:22.9526778Z [88s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 512, 16], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['', ''], loop_orders=[[0, 1]], num_sm_multiplier=8, num_stages=6, num_warps=1, pid_type='persistent_blocked', range_flattens=[False, False], range_multi_buffers=[True, True], range_num_stages=[1, 3], range_unroll_factors=[0, 4], range_warp_specializes=[]) 2026-02-21T09:02:22.9528581Z Tensor-likes are not close! 2026-02-21T09:02:22.9528989Z 2026-02-21T09:02:22.9529117Z Mismatched elements: 33504489 / 33554432 (99.9%) 2026-02-21T09:02:22.9529552Z Greatest absolute difference: 3264.0 at index (1244, 4726) (up to 0.01 allowed) 2026-02-21T09:02:22.9530082Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:02:22.9530547Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:02:22.9530813Z 2026-02-21T09:02:24.0972423Z [89s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 16, 128], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[32], load_eviction_policies=['first', 'last'], loop_orders=[[1, 0]], num_sm_multiplier=4, num_stages=5, num_warps=32, pid_type='persistent_blocked', range_flattens=[False, True], range_multi_buffers=[None, None], range_num_stages=[2, 4], range_unroll_factors=[1, 2], range_warp_specializes=[]) 2026-02-21T09:02:24.0974208Z Tensor-likes are not close! 2026-02-21T09:02:24.0974393Z 2026-02-21T09:02:24.0974521Z Mismatched elements: 33467813 / 33554432 (99.7%) 2026-02-21T09:02:24.0975249Z Greatest absolute difference: 1680.0 at index (31, 4832) (up to 0.01 allowed) 2026-02-21T09:02:24.0975740Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:02:24.0976191Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:02:24.0976428Z 2026-02-21T09:02:28.1349517Z [93s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 64, 1024], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[32], load_eviction_policies=['', ''], loop_orders=[[0, 1]], maxnreg=128, num_sm_multiplier=1, num_stages=6, num_warps=8, pid_type='persistent_interleaved', range_flattens=[False, False], range_multi_buffers=[None, None], range_num_stages=[4, 0], range_unroll_factors=[3, 0], range_warp_specializes=[]) 2026-02-21T09:02:28.1351452Z Tensor-likes are not close! 2026-02-21T09:02:28.1351649Z 2026-02-21T09:02:28.1351775Z Mismatched elements: 33504258 / 33554432 (99.9%) 2026-02-21T09:02:28.1352250Z Greatest absolute difference: 3200.0 at index (967, 3412) (up to 0.01 allowed) 2026-02-21T09:02:28.1352788Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:02:28.1353266Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:02:28.1353903Z 2026-02-21T09:02:28.2951163Z [93s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 32, 2048], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['', 'last'], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=128, num_stages=8, num_warps=2, pid_type='persistent_blocked', range_flattens=[True, True], range_multi_buffers=[False, True], range_num_stages=[0, 4], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T09:02:28.2953000Z Tensor-likes are not close! 2026-02-21T09:02:28.2953168Z 2026-02-21T09:02:28.2953285Z Mismatched elements: 33503116 / 33554432 (99.8%) 2026-02-21T09:02:28.3306116Z Greatest absolute difference: 3200.0 at index (1150, 2457) (up to 0.01 allowed) 2026-02-21T09:02:28.3307627Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:02:28.3308532Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:02:28.3308954Z 2026-02-21T09:02:28.9263664Z [94s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 128], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[1], load_eviction_policies=['', 'last'], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=16, num_stages=5, num_warps=8, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[False, True], range_num_stages=[0, 0], range_unroll_factors=[2, 2], range_warp_specializes=[]) 2026-02-21T09:02:28.9265438Z Tensor-likes are not close! 2026-02-21T09:02:28.9265601Z 2026-02-21T09:02:28.9265716Z Mismatched elements: 33393926 / 33554432 (99.5%) 2026-02-21T09:02:28.9266246Z Greatest absolute difference: 1296.0 at index (2736, 4050) (up to 0.01 allowed) 2026-02-21T09:02:28.9267221Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:02:28.9267701Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:02:28.9267963Z 2026-02-21T09:02:36.0990873Z [101s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 64, 64], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=16, num_stages=7, num_warps=2, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[True, None], range_num_stages=[2, 2], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T09:02:36.0992711Z Tensor-likes are not close! 2026-02-21T09:02:36.0992882Z 2026-02-21T09:02:36.0993013Z Mismatched elements: 33443532 / 33554432 (99.7%) 2026-02-21T09:02:36.0993462Z Greatest absolute difference: 1328.0 at index (2366, 4290) (up to 0.01 allowed) 2026-02-21T09:02:36.1001035Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:02:36.1001605Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:02:36.1001889Z 2026-02-21T09:02:44.3670900Z [109s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 32, 32], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[1], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=16, num_stages=7, num_warps=16, pid_type='persistent_blocked', range_flattens=[False, False], range_multi_buffers=[None, True], range_num_stages=[4, 2], range_unroll_factors=[4, 0], range_warp_specializes=[]) 2026-02-21T09:02:44.3672708Z Tensor-likes are not close! 2026-02-21T09:02:44.3672877Z 2026-02-21T09:02:44.3672998Z Mismatched elements: 33553817 / 33554432 (100.0%) 2026-02-21T09:02:44.3673440Z Greatest absolute difference: 2464.0 at index (1078, 7740) (up to 0.01 allowed) 2026-02-21T09:02:44.3674013Z Greatest relative difference: 3.046875 at index (1390, 6294) (up to 0.01 allowed) 2026-02-21T09:02:44.3674526Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:02:44.3674785Z 2026-02-21T09:02:45.3961726Z [110s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 32, 1024], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[4], load_eviction_policies=['first', 'first'], loop_orders=[[0, 1]], maxnreg=256, num_sm_multiplier=4, num_stages=8, num_warps=2, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[True, True], range_num_stages=[3, 1], range_unroll_factors=[3, 1], range_warp_specializes=[]) 2026-02-21T09:02:45.3963832Z Tensor-likes are not close! 2026-02-21T09:02:45.3963994Z 2026-02-21T09:02:45.3964110Z Mismatched elements: 33485699 / 33554432 (99.8%) 2026-02-21T09:02:45.3964533Z Greatest absolute difference: 2368.0 at index (1992, 3787) (up to 0.01 allowed) 2026-02-21T09:02:45.3965225Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:02:45.3965698Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:02:45.3965960Z 2026-02-21T09:02:48.6363858Z 2026-02-21T09:02:48.6365358Z Initial population exploring neighbors 100% ━━━━━━━━━━━━━━ 100/100 1.7 configs/s 2026-02-21T09:02:48.6386237Z [114s] Adaptive compile timeout: 30s (90% percentile=14.8s, bounds=[30.0s, 30s]) 2026-02-21T09:02:48.6391499Z [114s] Initial random population of 100, 5 starting points: 2026-02-21T09:02:48.6391885Z error=20 2026-02-21T09:02:48.6392098Z ok=80 2026-02-21T09:02:48.6392312Z min=1.0282 2026-02-21T09:02:48.6392550Z mid=57.6757 2026-02-21T09:02:48.6392771Z max=660.0285 2026-02-21T09:02:48.6392975Z best={'block_sizes': [8, 32, 256], 2026-02-21T09:02:48.6393392Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:02:48.6393832Z 'l2_groupings': [32], 2026-02-21T09:02:48.6394068Z 'load_eviction_policies': ['', ''], 2026-02-21T09:02:48.6394679Z 'loop_orders': [[0, 1]], 2026-02-21T09:02:48.6394926Z 'num_sm_multiplier': 4, 2026-02-21T09:02:48.6395173Z 'num_stages': 7, 2026-02-21T09:02:48.6395371Z 'num_warps': 4, 2026-02-21T09:02:48.6395599Z 'pid_type': 'persistent_interleaved', 2026-02-21T09:02:48.6395909Z 'range_flattens': [False, False], 2026-02-21T09:02:48.6396195Z 'range_multi_buffers': [None, False], 2026-02-21T09:02:48.6396644Z 'range_num_stages': [0, 0], 2026-02-21T09:02:48.6396899Z 'range_unroll_factors': [2, 2], 2026-02-21T09:02:48.6397172Z 'range_warp_specializes': []} 2026-02-21T09:02:48.6415342Z [114s] Fitting surrogate: 100 points, 100 targets 2026-02-21T09:02:50.4506173Z [115s] Generation 1 starting: 107 neighbors, 5 active search path(s) 2026-02-21T09:03:37.7776961Z [163s] Timeout after 30s compiling Config(block_sizes=[4, 2048, 64], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=256, num_sm_multiplier=8, num_stages=1, num_warps=4, pid_type='persistent_blocked', range_flattens=[None, None], range_multi_buffers=[True, True], 
range_num_stages=[3, 0], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T09:03:40.4039006Z [165s] Timeout after 30s compiling Config(block_sizes=[4, 4096, 32], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=256, num_sm_multiplier=8, num_stages=1, num_warps=2, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[True, True], range_num_stages=[3, 0], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T09:03:40.4059688Z Generation 1: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 109/109 0.7 configs/s 2026-02-21T09:03:41.0219418Z [166s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['', ''], loop_orders=[[0, 1]], num_sm_multiplier=4, num_stages=7, num_warps=4, pid_type='persistent_interleaved', range_flattens=[False, False], range_multi_buffers=[True, False], range_num_stages=[0, 0], range_unroll_factors=[2, 2], range_warp_specializes=[]) 2026-02-21T09:03:41.0221508Z Tensor-likes are not close! 2026-02-21T09:03:41.0221682Z 2026-02-21T09:03:41.0221806Z Mismatched elements: 33437143 / 33554432 (99.7%) 2026-02-21T09:03:41.0222240Z Greatest absolute difference: 1408.0 at index (2015, 6799) (up to 0.01 allowed) 2026-02-21T09:03:41.0222805Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:03:41.0223275Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:03:41.0223538Z 2026-02-21T09:03:42.7720422Z [168s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 16, 64], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[1], load_eviction_policies=['last', 'first'], loop_orders=[[0, 1]], num_stages=3, num_warps=8, pid_type='flat', range_flattens=[None, True], range_multi_buffers=[None, False], range_num_stages=[0, 4], range_unroll_factors=[0, 2], range_warp_specializes=[]) 2026-02-21T09:03:42.7722095Z Tensor-likes are not close! 2026-02-21T09:03:42.7722259Z 2026-02-21T09:03:42.7722394Z Mismatched elements: 33467813 / 33554432 (99.7%) 2026-02-21T09:03:42.7722815Z Greatest absolute difference: 1680.0 at index (31, 4832) (up to 0.01 allowed) 2026-02-21T09:03:42.7723341Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:03:42.7723803Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:03:42.7724075Z 2026-02-21T09:03:43.8753347Z [169s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 256], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[1], load_eviction_policies=['last', 'first'], loop_orders=[[0, 1]], num_stages=3, num_warps=8, pid_type='flat', range_flattens=[None, True], range_multi_buffers=[None, False], range_num_stages=[0, 4], range_unroll_factors=[0, 2], range_warp_specializes=[]) 2026-02-21T09:03:43.8755271Z Tensor-likes are not close! 2026-02-21T09:03:43.8755449Z 2026-02-21T09:03:43.8755571Z Mismatched elements: 33421383 / 33554432 (99.6%) 2026-02-21T09:03:43.8756027Z Greatest absolute difference: 1280.0 at index (2293, 6853) (up to 0.01 allowed) 2026-02-21T09:03:43.8756942Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:03:44.4409956Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:03:44.4410314Z 2026-02-21T09:03:44.4411805Z [169s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[1], load_eviction_policies=['last', 'first'], loop_orders=[[0, 1]], num_stages=3, num_warps=4, pid_type='flat', range_flattens=[None, True], range_multi_buffers=[None, False], range_num_stages=[0, 4], range_unroll_factors=[0, 2], range_warp_specializes=[]) 2026-02-21T09:03:44.4413432Z Tensor-likes are not close! 2026-02-21T09:03:44.4413602Z 2026-02-21T09:03:44.4413943Z Mismatched elements: 33393679 / 33554432 (99.5%) 2026-02-21T09:03:44.4414396Z Greatest absolute difference: 1368.0 at index (1818, 934) (up to 0.01 allowed) 2026-02-21T09:03:44.4414931Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:03:44.4415389Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:03:44.4415659Z 2026-02-21T09:03:44.4491892Z [169s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 64], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=128, num_stages=4, num_warps=1, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[True, True], range_num_stages=[3, 3], range_unroll_factors=[1, 0], range_warp_specializes=[]) 2026-02-21T09:03:44.4493698Z Tensor-likes are not close! 2026-02-21T09:03:44.4493867Z 2026-02-21T09:03:44.4493981Z Mismatched elements: 33443654 / 33554432 (99.7%) 2026-02-21T09:03:44.4494404Z Greatest absolute difference: 1376.0 at index (2643, 2912) (up to 0.01 allowed) 2026-02-21T09:03:44.4495136Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:03:44.4495595Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:03:44.4495851Z 2026-02-21T09:03:44.7875950Z [170s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 64], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=128, num_stages=4, num_warps=1, pid_type='persistent_blocked', range_flattens=[None, None], range_multi_buffers=[True, True], range_num_stages=[3, 3], range_unroll_factors=[1, 0], range_warp_specializes=[]) 2026-02-21T09:03:44.7878060Z Tensor-likes are not close! 2026-02-21T09:03:44.7878232Z 2026-02-21T09:03:44.7878563Z Mismatched elements: 33435568 / 33554432 (99.6%) 2026-02-21T09:03:44.7878997Z Greatest absolute difference: 1312.0 at index (902, 435) (up to 0.01 allowed) 2026-02-21T09:03:44.7879612Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:03:44.7880179Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:03:44.7880432Z 2026-02-21T09:03:44.7955575Z [170s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 64], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=128, num_stages=4, num_warps=1, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[True, False], range_num_stages=[3, 3], range_unroll_factors=[1, 0], range_warp_specializes=[]) 2026-02-21T09:03:44.7957511Z Tensor-likes are not close! 2026-02-21T09:03:44.7957875Z 2026-02-21T09:03:44.7958002Z Mismatched elements: 33443654 / 33554432 (99.7%) 2026-02-21T09:03:44.7958453Z Greatest absolute difference: 1376.0 at index (2643, 2912) (up to 0.01 allowed) 2026-02-21T09:03:44.7959008Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:03:44.7959506Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:03:44.7959837Z 2026-02-21T09:03:44.9926862Z [170s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 64], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=128, num_stages=4, num_warps=1, pid_type='persistent_blocked', range_flattens=[False, False], range_multi_buffers=[True, True], range_num_stages=[3, 3], range_unroll_factors=[2, 0], range_warp_specializes=[]) 2026-02-21T09:03:44.9928704Z Tensor-likes are not close! 2026-02-21T09:03:44.9928864Z 2026-02-21T09:03:44.9928989Z Mismatched elements: 33435568 / 33554432 (99.6%) 2026-02-21T09:03:44.9929577Z Greatest absolute difference: 1312.0 at index (902, 435) (up to 0.01 allowed) 2026-02-21T09:03:44.9930123Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:03:44.9930594Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:03:44.9930854Z 2026-02-21T09:03:45.0821997Z [170s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 64], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=128, num_stages=4, num_warps=4, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[False, True], range_num_stages=[3, 3], range_unroll_factors=[0, 0], range_warp_specializes=[]) 2026-02-21T09:03:45.0823801Z Tensor-likes are not close! 2026-02-21T09:03:45.0823957Z 2026-02-21T09:03:45.0824071Z Mismatched elements: 33422856 / 33554432 (99.6%) 2026-02-21T09:03:45.0824511Z Greatest absolute difference: 1408.0 at index (2015, 6799) (up to 0.01 allowed) 2026-02-21T09:03:45.0825039Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:03:45.0825721Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:03:45.0825981Z 2026-02-21T09:03:45.2485898Z [170s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 32], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=128, num_stages=4, num_warps=1, pid_type='persistent_blocked', range_flattens=[True, None], range_multi_buffers=[True, True], range_num_stages=[3, 3], range_unroll_factors=[1, 0], range_warp_specializes=[]) 2026-02-21T09:03:45.2487989Z Tensor-likes are not close! 2026-02-21T09:03:45.2488145Z 2026-02-21T09:03:45.2488252Z Mismatched elements: 33436714 / 33554432 (99.6%) 2026-02-21T09:03:45.2488865Z Greatest absolute difference: 1336.0 at index (938, 5439) (up to 0.01 allowed) 2026-02-21T09:03:45.2489403Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:03:45.2489874Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:03:45.2490128Z 2026-02-21T09:03:45.4081640Z [170s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 64], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=128, num_stages=4, num_warps=4, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[True, False], range_num_stages=[3, 3], range_unroll_factors=[1, 0], range_warp_specializes=[]) 2026-02-21T09:03:45.4083454Z Tensor-likes are not close! 2026-02-21T09:03:45.4083618Z 2026-02-21T09:03:45.4083727Z Mismatched elements: 33422856 / 33554432 (99.6%) 2026-02-21T09:03:45.4084326Z Greatest absolute difference: 1408.0 at index (2015, 6799) (up to 0.01 allowed) 2026-02-21T09:03:45.4084858Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:03:45.4085328Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:03:45.4085584Z 2026-02-21T09:03:45.6663131Z [171s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 64], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=128, num_stages=4, num_warps=4, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[True, True], range_num_stages=[3, 3], range_unroll_factors=[1, 0], range_warp_specializes=[]) 2026-02-21T09:03:45.6664939Z Tensor-likes are not close! 2026-02-21T09:03:45.6665102Z 2026-02-21T09:03:45.6665216Z Mismatched elements: 33422856 / 33554432 (99.6%) 2026-02-21T09:03:45.6665640Z Greatest absolute difference: 1408.0 at index (2015, 6799) (up to 0.01 allowed) 2026-02-21T09:03:45.6666340Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:03:45.6667012Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:03:45.6667277Z 2026-02-21T09:03:45.6771238Z [171s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 16, 64], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=128, num_stages=4, num_warps=1, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[True, True], range_num_stages=[3, 3], range_unroll_factors=[2, 0], range_warp_specializes=[]) 2026-02-21T09:03:45.6773016Z Tensor-likes are not close! 2026-02-21T09:03:45.6773172Z 2026-02-21T09:03:45.6773289Z Mismatched elements: 33508766 / 33554432 (99.9%) 2026-02-21T09:03:45.6773711Z Greatest absolute difference: 3216.0 at index (3294, 4010) (up to 0.01 allowed) 2026-02-21T09:03:45.6774248Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:03:45.6774702Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:03:45.6775159Z 2026-02-21T09:03:46.2448556Z [171s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 256, 256], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[8], load_eviction_policies=['first', 'last'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=16, num_stages=6, num_warps=4, pid_type='persistent_blocked', range_flattens=[None, None], range_multi_buffers=[True, False], range_num_stages=[4, 0], range_unroll_factors=[4, 2], range_warp_specializes=[]) 2026-02-21T09:03:46.2450374Z Tensor-likes are not close! 2026-02-21T09:03:46.2450529Z 2026-02-21T09:03:46.2450639Z Mismatched elements: 33486130 / 33554432 (99.8%) 2026-02-21T09:03:46.2451048Z Greatest absolute difference: 2464.0 at index (2896, 6752) (up to 0.01 allowed) 2026-02-21T09:03:46.2451937Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:03:46.2452390Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:03:46.2452625Z 2026-02-21T09:03:47.6298425Z [173s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 2048, 16], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['first', ''], loop_orders=[[0, 1]], num_sm_multiplier=8, num_stages=1, num_warps=2, pid_type='persistent_blocked', range_flattens=[True, None], range_multi_buffers=[True, True], range_num_stages=[3, 0], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T09:03:47.6300345Z Tensor-likes are not close! 2026-02-21T09:03:47.6300536Z 2026-02-21T09:03:47.6300683Z Mismatched elements: 33451473 / 33554432 (99.7%) 2026-02-21T09:03:47.6301106Z Greatest absolute difference: 1376.0 at index (2772, 5188) (up to 0.01 allowed) 2026-02-21T09:03:47.6301638Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:03:47.6302599Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:03:47.6302847Z 2026-02-21T09:03:48.1437728Z [173s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 2048, 32], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[1], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=256, num_sm_multiplier=8, num_stages=1, num_warps=2, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[True, True], range_num_stages=[3, 0], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T09:03:48.1439597Z Tensor-likes are not close! 2026-02-21T09:03:48.1439777Z 2026-02-21T09:03:48.1439909Z Mismatched elements: 33452180 / 33554432 (99.7%) 2026-02-21T09:03:48.1440347Z Greatest absolute difference: 1376.0 at index (2010, 1809) (up to 0.01 allowed) 2026-02-21T09:03:48.1441017Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:03:48.1441940Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:03:48.1442213Z 2026-02-21T09:03:48.2394356Z [173s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 512, 64], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], num_sm_multiplier=8, num_stages=1, num_warps=2, pid_type='persistent_blocked', range_flattens=[True, None], range_multi_buffers=[True, True], range_num_stages=[3, 0], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T09:03:48.2396081Z Tensor-likes are not close! 2026-02-21T09:03:48.2396249Z 2026-02-21T09:03:48.2396368Z Mismatched elements: 33451187 / 33554432 (99.7%) 2026-02-21T09:03:48.2396965Z Greatest absolute difference: 1336.0 at index (3898, 483) (up to 0.01 allowed) 2026-02-21T09:03:48.2397515Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:03:48.2397984Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:03:48.2398248Z 2026-02-21T09:03:48.2718805Z Generation 1: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━ 109/109 13.9 configs/s 2026-02-21T09:03:50.6072828Z Generation 1: verifying top configs 100% ━━━━━━━━━━━━━━━ 276/276 109.3 configs/s 2026-02-21T09:03:50.8340769Z [176s] Generation 1 complete: 2026-02-21T09:03:50.8341026Z error=43 2026-02-21T09:03:50.8341197Z timeout=2 2026-02-21T09:03:50.8341362Z ok=67 2026-02-21T09:03:50.8341517Z min=0.6420 2026-02-21T09:03:50.8341696Z mid=2.8415 2026-02-21T09:03:50.8341881Z max=38.7893 2026-02-21T09:03:50.8342073Z best={'block_sizes': [8, 256, 64], 2026-02-21T09:03:50.8342362Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T09:03:50.8342660Z 'l2_groupings': [8], 2026-02-21T09:03:50.8342895Z 'load_eviction_policies': ['first', 'first'], 2026-02-21T09:03:50.8343184Z 'loop_orders': [[1, 0]], 2026-02-21T09:03:50.8343394Z 'maxnreg': 256, 2026-02-21T09:03:50.8343597Z 'num_sm_multiplier': 16, 2026-02-21T09:03:50.8344168Z 'num_stages': 6, 2026-02-21T09:03:50.8344369Z 'num_warps': 4, 2026-02-21T09:03:50.8344586Z 'pid_type': 'persistent_blocked', 2026-02-21T09:03:50.8344860Z 'range_flattens': [False, None], 2026-02-21T09:03:50.8345129Z 'range_multi_buffers': [True, False], 2026-02-21T09:03:50.8345390Z 'range_num_stages': [4, 1], 2026-02-21T09:03:50.8345647Z 'range_unroll_factors': [4, 2], 2026-02-21T09:03:50.8345893Z 'range_warp_specializes': []} 2026-02-21T09:03:50.8366031Z [176s] Fitting surrogate: 212 points, 212 targets 2026-02-21T09:03:52.5544191Z [177s] Generation 2 starting: 104 neighbors, 5 active search path(s) 2026-02-21T09:04:39.7724764Z [225s] Timeout after 30s compiling Config(block_sizes=[16, 1024, 32], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=256, num_sm_multiplier=8, num_stages=1, num_warps=2, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[False, False], 
range_num_stages=[3, 0], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T09:04:40.1605171Z [225s] Timeout after 30s compiling Config(block_sizes=[8, 1024, 8], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], num_sm_multiplier=8, num_stages=1, num_warps=4, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[True, False], range_num_stages=[3, 0], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T09:04:41.7069470Z [227s] Timeout after 30s compiling Config(block_sizes=[8, 1024, 64], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=256, num_sm_multiplier=8, num_stages=1, num_warps=2, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[True, True], range_num_stages=[3, 0], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T09:04:42.9348906Z [228s] Timeout after 30s compiling Config(block_sizes=[8, 1024, 64], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], num_sm_multiplier=8, num_stages=1, num_warps=2, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[True, True], range_num_stages=[3, 0], range_unroll_factors=[3, 3], range_warp_specializes=[]) 2026-02-21T09:04:42.9370072Z Generation 2: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 106/106 0.7 configs/s 2026-02-21T09:04:43.2254337Z [228s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 32, 512], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['', ''], loop_orders=[[0, 1]], num_stages=7, num_warps=4, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, False], range_num_stages=[0, 0], 
range_unroll_factors=[0, 2], range_warp_specializes=[]) 2026-02-21T09:04:43.2256057Z Tensor-likes are not close! 2026-02-21T09:04:43.2256234Z 2026-02-21T09:04:43.2256357Z Mismatched elements: 33485526 / 33554432 (99.8%) 2026-02-21T09:04:43.2257148Z Greatest absolute difference: 2304.0 at index (536, 4357) (up to 0.01 allowed) 2026-02-21T09:04:43.2258177Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:04:43.2258619Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:04:43.2258854Z 2026-02-21T09:04:43.4292849Z [228s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 32, 256], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['', ''], loop_orders=[[0, 1]], maxnreg=128, num_sm_multiplier=4, num_stages=7, num_warps=4, pid_type='persistent_interleaved', range_flattens=[False, False], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[2, 2], range_warp_specializes=[]) 2026-02-21T09:04:43.4295150Z Tensor-likes are not close! 2026-02-21T09:04:43.4295335Z 2026-02-21T09:04:43.4295462Z Mismatched elements: 33484551 / 33554432 (99.8%) 2026-02-21T09:04:43.4295901Z Greatest absolute difference: 2384.0 at index (274, 6497) (up to 0.01 allowed) 2026-02-21T09:04:43.4296804Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:04:43.4297308Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:04:43.7789100Z 2026-02-21T09:04:43.7790625Z [229s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 256], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['', ''], loop_orders=[[0, 1]], num_stages=7, num_warps=4, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 2], range_warp_specializes=[]) 2026-02-21T09:04:43.7792255Z Tensor-likes are not close! 2026-02-21T09:04:43.7792426Z 2026-02-21T09:04:43.7792743Z Mismatched elements: 33444000 / 33554432 (99.7%) 2026-02-21T09:04:43.7793201Z Greatest absolute difference: 1344.0 at index (2556, 2881) (up to 0.01 allowed) 2026-02-21T09:04:43.7793740Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:04:43.7794221Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:04:43.7794472Z 2026-02-21T09:04:45.3865510Z [230s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=128, num_stages=4, num_warps=4, pid_type='persistent_interleaved', range_flattens=[False, None], range_multi_buffers=[True, True], range_num_stages=[3, 2], range_unroll_factors=[1, 0], range_warp_specializes=[]) 2026-02-21T09:04:45.3867793Z Tensor-likes are not close! 2026-02-21T09:04:45.3867989Z 2026-02-21T09:04:45.3868122Z Mismatched elements: 33393679 / 33554432 (99.5%) 2026-02-21T09:04:45.3868680Z Greatest absolute difference: 1368.0 at index (1818, 934) (up to 0.01 allowed) 2026-02-21T09:04:45.3869239Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:04:45.3869674Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:04:45.3869918Z 2026-02-21T09:04:45.6825336Z [231s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 16, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=128, num_stages=4, num_warps=2, pid_type='persistent_interleaved', range_flattens=[False, None], range_multi_buffers=[True, True], range_num_stages=[4, 3], range_unroll_factors=[1, 0], range_warp_specializes=[]) 2026-02-21T09:04:45.6827584Z Tensor-likes are not close! 2026-02-21T09:04:45.6827757Z 2026-02-21T09:04:45.6827892Z Mismatched elements: 33480905 / 33554432 (99.8%) 2026-02-21T09:04:45.6828543Z Greatest absolute difference: 2336.0 at index (1818, 918) (up to 0.01 allowed) 2026-02-21T09:04:45.6829133Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:04:45.6829959Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:04:45.6830197Z 2026-02-21T09:04:45.8354136Z [231s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 16, 64], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=128, num_stages=4, num_warps=2, pid_type='persistent_interleaved', range_flattens=[False, None], range_multi_buffers=[False, True], range_num_stages=[3, 4], range_unroll_factors=[1, 0], range_warp_specializes=[]) 2026-02-21T09:04:45.8355975Z Tensor-likes are not close! 
2026-02-21T09:04:45.8356149Z 2026-02-21T09:04:45.8356892Z Mismatched elements: 33480905 / 33554432 (99.8%) 2026-02-21T09:04:45.8357347Z Greatest absolute difference: 2336.0 at index (1818, 918) (up to 0.01 allowed) 2026-02-21T09:04:45.8357874Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:04:45.8358365Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:04:45.8358683Z 2026-02-21T09:04:45.9820247Z [231s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 64], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=256, num_sm_multiplier=128, num_stages=4, num_warps=2, pid_type='persistent_interleaved', range_flattens=[False, None], range_multi_buffers=[True, True], range_num_stages=[3, 3], range_unroll_factors=[1, 0], range_warp_specializes=[]) 2026-02-21T09:04:45.9822042Z Tensor-likes are not close! 2026-02-21T09:04:45.9822224Z 2026-02-21T09:04:45.9822352Z Mismatched elements: 33421873 / 33554432 (99.6%) 2026-02-21T09:04:45.9822978Z Greatest absolute difference: 1296.0 at index (3163, 2221) (up to 0.01 allowed) 2026-02-21T09:04:45.9823542Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:04:45.9824006Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:04:45.9824264Z 2026-02-21T09:04:48.5675751Z [233s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 1024, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=256, num_sm_multiplier=8, num_stages=1, num_warps=2, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[True, True], range_num_stages=[3, 0], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T09:04:48.5678005Z Tensor-likes are not close! 
2026-02-21T09:04:48.5678189Z 2026-02-21T09:04:48.5678308Z Mismatched elements: 33452391 / 33554432 (99.7%) 2026-02-21T09:04:48.5678779Z Greatest absolute difference: 1440.0 at index (1120, 2214) (up to 0.01 allowed) 2026-02-21T09:04:48.5679357Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:04:48.5679920Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:04:48.5680178Z 2026-02-21T09:04:48.7530996Z [234s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 1024, 32], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=256, num_sm_multiplier=8, num_stages=1, num_warps=2, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[True, False], range_num_stages=[3, 0], range_unroll_factors=[3, 3], range_warp_specializes=[]) 2026-02-21T09:04:48.7532855Z Tensor-likes are not close! 2026-02-21T09:04:48.7533018Z 2026-02-21T09:04:48.7533138Z Mismatched elements: 33452391 / 33554432 (99.7%) 2026-02-21T09:04:48.7533578Z Greatest absolute difference: 1440.0 at index (1120, 2214) (up to 0.01 allowed) 2026-02-21T09:04:48.7534125Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:04:48.7534603Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:04:48.7535519Z 2026-02-21T09:04:49.9961444Z Generation 2: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━ 106/106 15.1 configs/s 2026-02-21T09:04:54.2600423Z Generation 2: verifying top configs 100% ━━━━━━━━━━━━━━━ 500/500 114.4 configs/s 2026-02-21T09:04:54.4460624Z [239s] Generation 2 complete: 2026-02-21T09:04:54.4460979Z error=14 2026-02-21T09:04:54.4461199Z timeout=4 2026-02-21T09:04:54.4461423Z ok=91 2026-02-21T09:04:54.4461596Z min=0.4044 2026-02-21T09:04:54.4461777Z mid=1.1222 2026-02-21T09:04:54.4461946Z max=19.2452 2026-02-21T09:04:54.4462153Z best={'block_sizes': [8, 64, 256], 2026-02-21T09:04:54.4462560Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:04:54.4462995Z 'l2_groupings': [32], 2026-02-21T09:04:54.4464004Z 'load_eviction_policies': ['', ''], 2026-02-21T09:04:54.4464297Z 'loop_orders': [[0, 1]], 2026-02-21T09:04:54.4464531Z 'num_stages': 7, 2026-02-21T09:04:54.4464728Z 'num_warps': 4, 2026-02-21T09:04:54.4464949Z 'pid_type': 'flat', 2026-02-21T09:04:54.4465175Z 'range_flattens': [None, False], 2026-02-21T09:04:54.4465458Z 'range_multi_buffers': [None, False], 2026-02-21T09:04:54.4465740Z 'range_num_stages': [0, 0], 2026-02-21T09:04:54.4465996Z 'range_unroll_factors': [0, 2], 2026-02-21T09:04:54.4466262Z 'range_warp_specializes': []} 2026-02-21T09:04:54.4494331Z [239s] Fitting surrogate: 321 points, 321 targets 2026-02-21T09:04:56.0801729Z [241s] Generation 3 starting: 99 neighbors, 5 active search path(s) 2026-02-21T09:05:30.6361207Z Generation 3: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 101/101 1.4 configs/s 2026-02-21T09:05:30.9607133Z 2026-02-21T09:05:30.9607150Z 2026-02-21T09:05:30.9607508Z ================================================================ 2026-02-21T09:05:30.9607964Z Internal Triton PTX codegen error 2026-02-21T09:05:30.9608640Z `ptxas` stderr: 2026-02-21T09:05:30.9609491Z ptxas fatal : (C7602) Insufficient registers (128) to compile instruction at line 706 in function 
_helion_matmul_bf16_int4. Try to compile with register target of 158 or higher. 2026-02-21T09:05:30.9610463Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:05:30.9610727Z 2026-02-21T09:05:30.9611487Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp0whdxpx1.ptx -o /tmp/tmp0whdxpx1.ptx.o 2026-02-21T09:05:30.9612365Z 2026-02-21T09:05:30.9612369Z 2026-02-21T09:05:30.9612456Z // 2026-02-21T09:05:30.9612687Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:05:30.9612985Z // 2026-02-21T09:05:30.9613073Z 2026-02-21T09:05:30.9613150Z .version 8.7 2026-02-21T09:05:30.9613323Z .target sm_90a 2026-02-21T09:05:30.9613504Z .address_size 64 2026-02-21T09:05:30.9613636Z 2026-02-21T09:05:30.9613866Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T09:05:30.9614299Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:05:30.9614612Z // @_helion_matmul_bf16_int4 2026-02-21T09:05:30.9614937Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T09:05:30.9615302Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T09:05:30.9615738Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T09:05:30.9616180Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T09:05:30.9616787Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T09:05:30.9617240Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T09:05:30.9617570Z ) 2026-02-21T09:05:30.9617725Z .reqntid 512 2026-02-21T09:05:30.9617888Z { 2026-02-21T09:05:30.9618048Z .reg .pred %p<31>; 2026-02-21T09:05:30.9618245Z .reg .b16 %rs<97>; 2026-02-21T09:05:30.9618437Z .reg .b32 %r<2274>; 2026-02-21T09:05:30.9618645Z .reg .b64 %rd<102>; 2026-02-21T09:05:30.9619023Z .loc 1 19 0 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:19:0 
2026-02-21T09:05:30.9620030Z [276s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:05:30.9621906Z Config: @helion.kernel(config=helion.Config(block_sizes=[8, 256, 256], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['', ''], loop_orders=[[0, 1]], num_stages=6, num_warps=16, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 2], range_warp_specializes=[]), static_shapes=True) 2026-02-21T09:05:30.9623565Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:05:30.9623850Z `ptxas` stderr: 2026-02-21T09:05:30.9624533Z ptxas fatal : (C7602) Insufficient registers (128) to compile instruction at line 706 in function _helion_matmul_bf16_int4. Try to compile with register target of 158 or higher. 2026-02-21T09:05:30.9625256Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:05:30.9625448Z 2026-02-21T09:05:30.9625959Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp0whdxpx1.ptx -o /tmp/tmp0whdxpx1.ptx.o 2026-02-21T09:05:30.9626677Z 2026-02-21T09:05:30.9626834Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:05:30.9627130Z $L__func_begin0: 2026-02-21T09:05:30.9627446Z .loc 1 19 0 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:19:0 2026-02-21T09:05:30.9627742Z 2026-02-21T09:05:30.9627806Z // %bb.0: 2026-02-21T09:05:30.9628014Z ld.param.b64 %rd10, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T09:05:30.9628422Z ld.param.b64 %rd9, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T09:05:30.9628671Z $L__tmp0: 2026-02-21T09:05:30.9629052Z .loc 1 21 67 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:21:67 2026-02-21T09:05:30.9629417Z mov.u32 %r359, %ctaid.x; 2026-02-21T09:05:30.9629651Z ld.param.b64 %rd12, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T09:05:30.9629905Z mov.u32 %r360, %ctaid.y; 2026-02-21T09:05:30.9630125Z ld.param.b64 %rd60, [_helion_matmul_bf16_int4_param_3]; 2026-02-21T09:05:30.9630380Z mov.u32 %r361, %ctaid.z; 2026-02-21T09:05:30.9630549Z mov.u32 %r362, %nctaid.x; 2026-02-21T09:05:30.9630732Z mov.u32 %r363, %nctaid.y; 2026-02-21T09:05:30.9630919Z mad.lo.s32 %r364, %r361, %r363, %r360; 2026-02-21T09:05:30.9631139Z mad.lo.s32 %r365, %r364, %r362, %r359; 2026-02-21T09:05:30.9631334Z shl.b32 %r366, %r365, 7; 2026-02-21T09:05:30.9631519Z cvt.s64.s32 %rd61, %r366; 2026-02-21T09:05:30.9631692Z add.s64 %rd1, %rd60, %rd61; 2026-02-21T09:05:30.9631875Z mov.u32 %r1, %tid.x; 2026-02-21T09:05:30.9632045Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:05:30.9632226Z shl.b32 %r367, %r1, 2; 2026-02-21T09:05:30.9632404Z mov.b32 %r368, global_smem; 2026-02-21T09:05:30.9632586Z add.s32 %r288, %r368, %r367; 2026-02-21T09:05:30.9632765Z mov.b32 %r1210, 0; 2026-02-21T09:05:30.9632921Z // begin inline asm 2026-02-21T09:05:30.9633102Z @%p1 st.shared.b32 [ %r288 + 0 ], %r1210; 2026-02-21T09:05:30.9633306Z // end inline asm 2026-02-21T09:05:30.9633468Z bar.warp.sync -1; 2026-02-21T09:05:30.9633628Z setp.eq.b32 %p2, %r1, 0; 2026-02-21T09:05:30.9633808Z cvt.u64.u32 %rd11, %r368; 2026-02-21T09:05:30.9633984Z // begin inline asm 2026-02-21T09:05:30.9634317Z 
@%p2 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd11 + 0 ], %rd12; 2026-02-21T09:05:30.9634673Z // end inline asm 2026-02-21T09:05:30.9634821Z // begin inline asm 2026-02-21T09:05:30.9635088Z @%p2 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1; 2026-02-21T09:05:30.9635395Z // end inline asm 2026-02-21T09:05:30.9635550Z mov.b32 %r290, 64; 2026-02-21T09:05:30.9635702Z // begin inline asm 2026-02-21T09:05:30.9635991Z @%p2 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r290; 2026-02-21T09:05:30.9636334Z // end inline asm 2026-02-21T09:05:30.9636697Z mov.b32 %r291, 256; 2026-02-21T09:05:30.9636860Z // begin inline asm 2026-02-21T09:05:30.9637136Z @%p2 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r291; 2026-02-21T09:05:30.9637463Z // end inline asm 2026-02-21T09:05:30.9637608Z mov.b32 %r292, 8192; 2026-02-21T09:05:30.9637770Z // begin inline asm 2026-02-21T09:05:30.9638067Z @%p2 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r292; 2026-02-21T09:05:30.9638407Z // end inline asm 2026-02-21T09:05:30.9638558Z mov.b32 %r293, 4096; 2026-02-21T09:05:30.9638728Z // begin inline asm 2026-02-21T09:05:30.9639024Z @%p2 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r293; 2026-02-21T09:05:30.9639449Z // end inline asm 2026-02-21T09:05:30.9639681Z mov.b64 %rd19, 16384; 2026-02-21T09:05:30.9639851Z // begin inline asm 2026-02-21T09:05:30.9640174Z @%p2 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd11 + 0 ], 0x0, %rd19; 2026-02-21T09:05:30.9640534Z // end inline asm 2026-02-21T09:05:30.9640679Z mov.b32 %r294, 1; 2026-02-21T09:05:30.9640836Z // begin inline asm 2026-02-21T09:05:30.9641146Z @%p2 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0, %r294; 2026-02-21T09:05:30.9641504Z // end inline asm 2026-02-21T09:05:30.9641650Z // begin inline asm 2026-02-21T09:05:30.9641956Z @%p2 
tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x1, %r294; 2026-02-21T09:05:30.9642310Z // end inline asm 2026-02-21T09:05:30.9642453Z // begin inline asm 2026-02-21T09:05:30.9642739Z @%p2 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd11 + 0 ], 0xa; 2026-02-21T09:05:30.9643059Z // end inline asm 2026-02-21T09:05:30.9643208Z // begin inline asm 2026-02-21T09:05:30.9643576Z @%p2 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0; 2026-02-21T09:05:30.9643927Z // end inline asm 2026-02-21T09:05:30.9644070Z // begin inline asm 2026-02-21T09:05:30.9644350Z @%p2 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x3; 2026-02-21T09:05:30.9644680Z // end inline asm 2026-02-21T09:05:30.9644843Z // begin inline asm 2026-02-21T09:05:30.9645116Z @%p2 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd11 + 0 ], 0x0; 2026-02-21T09:05:30.9645429Z // end inline asm 2026-02-21T09:05:30.9645575Z // begin inline asm 2026-02-21T09:05:30.9645994Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd1 + 0 ], [ %rd11 + 0 ], 0x80; 2026-02-21T09:05:30.9646591Z // end inline asm 2026-02-21T09:05:30.9646750Z // begin inline asm 2026-02-21T09:05:30.9646988Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd1 + 0 ], 0x80; 2026-02-21T09:05:30.9647297Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:05:30.9647517Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:05:30.9647725Z // end inline asm 2026-02-21T09:05:30.9647879Z bar.sync 0; 2026-02-21T09:05:30.9648183Z .loc 1 28 29 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:28:29 2026-02-21T09:05:30.9648540Z shr.u32 %r369, %r359, 5; 2026-02-21T09:05:30.9648719Z and.b32 %r370, %r369, 67108832; 2026-02-21T09:05:30.9649049Z .loc 1 29 35 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:29:35 2026-02-21T09:05:30.9649400Z sub.s32 %r371, 16, %r370; 2026-02-21T09:05:30.9649717Z .loc 
1 30 41 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:30:41 2026-02-21T09:05:30.9650079Z and.b32 %r372, %r359, 1023; 2026-02-21T09:05:30.9650407Z .loc 1 31 47 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:31:47 2026-02-21T09:05:30.9650770Z div.s32 %r373, %r372, %r371; 2026-02-21T09:05:30.9651103Z .loc 1 30 60 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:30:60 2026-02-21T09:05:30.9651458Z mul.lo.s32 %r374, %r373, %r371; 2026-02-21T09:05:30.9651649Z sub.s32 %r375, %r372, %r374; 2026-02-21T09:05:30.9652055Z .loc 1 30 26 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:30:26 2026-02-21T09:05:30.9652400Z add.s32 %r376, %r375, %r370; 2026-02-21T09:05:30.9652723Z .loc 1 32 23 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:32:23 2026-02-21T09:05:30.9653068Z shl.b32 %r2055, %r376, 8; 2026-02-21T09:05:30.9653383Z .loc 1 33 41 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:33:41 2026-02-21T09:05:30.9653729Z shr.u32 %r3, %r1, 5; 2026-02-21T09:05:30.9653890Z shr.u32 %r377, %r1, 2; 2026-02-21T09:05:30.9654069Z bfe.u32 %r378, %r1, 2, 7; 2026-02-21T09:05:30.9654243Z and.b32 %r379, %r367, 252; 2026-02-21T09:05:30.9654645Z .loc 1 33 28 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:33:28 2026-02-21T09:05:30.9655066Z or.b32 %r380, %r2055, %r378; 2026-02-21T09:05:30.9655251Z or.b32 %r381, %r377, %r2055; 2026-02-21T09:05:30.9655570Z .loc 1 34 23 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:34:23 2026-02-21T09:05:30.9655916Z shl.b32 %r4, %r373, 8; 2026-02-21T09:05:30.9656227Z .loc 1 35 28 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:35:28 2026-02-21T09:05:30.9656729Z or.b32 %r5, %r4, %r379; 2026-02-21T09:05:30.9657042Z .loc 1 43 44 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:43:44 2026-02-21T09:05:30.9657395Z shr.u32 %r6, %r1, 6; 2026-02-21T09:05:30.9657691Z .loc 1 49 34 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:49:34 
2026-02-21T09:05:30.9658039Z and.b32 %r382, %r1, 3; 2026-02-21T09:05:30.9658202Z shl.b32 %r383, %r382, 2; 2026-02-21T09:05:30.9658594Z .loc 1 50 49 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:49 2026-02-21T09:05:30.9658941Z shl.b32 %r384, %r380, 10; 2026-02-21T09:05:30.9659115Z shl.b32 %r385, %r381, 10; 2026-02-21T09:05:30.9659282Z or.b32 %r386, %r385, 131072; 2026-02-21T09:05:30.9659591Z .loc 1 67 34 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:67:34 2026-02-21T09:05:30.9659930Z and.b32 %r7, %r1, 256; 2026-02-21T09:05:30.9660254Z .loc 1 50 56 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:56 2026-02-21T09:05:30.9660604Z or.b32 %r387, %r384, %r383; 2026-02-21T09:05:30.9660777Z or.b32 %r388, %r386, %r383; 2026-02-21T09:05:30.9661087Z .loc 1 50 28 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:28 2026-02-21T09:05:30.9661441Z mad.wide.s32 %rd29, %r387, 2, %rd9; 2026-02-21T09:05:30.9661649Z mad.wide.s32 %rd30, %r388, 2, %rd9; 2026-02-21T09:05:30.9661982Z .loc 1 50 76 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:76 2026-02-21T09:05:30.9662346Z bar.sync 0; 2026-02-21T09:05:30.9662501Z shl.b32 %r389, %r1, 3; 2026-02-21T09:05:30.9662665Z and.b32 %r390, %r389, 3960; 2026-02-21T09:05:30.9662844Z and.b32 %r8, %r1, 16; 2026-02-21T09:05:30.9663007Z bfe.s32 %r391, %r1, 4, 1; 2026-02-21T09:05:30.9663183Z and.b32 %r392, %r391, 136; 2026-02-21T09:05:30.9663352Z xor.b32 %r9, %r392, %r390; 2026-02-21T09:05:30.9663527Z add.s32 %r296, %r368, %r9; 2026-02-21T09:05:30.9663691Z mov.b32 %r297, 8; 2026-02-21T09:05:30.9663850Z // begin inline asm 2026-02-21T09:05:30.9664087Z cp.async.ca.shared.global [ %r296 + 0 ], [ %rd29 + 0 ], 0x8, %r297; 2026-02-21T09:05:30.9664366Z // end inline asm 2026-02-21T09:05:30.9664524Z add.s32 %r298, %r296, 4096; 2026-02-21T09:05:30.9664697Z // begin inline asm 2026-02-21T09:05:30.9664927Z cp.async.ca.shared.global [ %r298 + 0 ], [ %rd30 + 0 ], 0x8, %r297; 
2026-02-21T09:05:30.9665199Z // end inline asm 2026-02-21T09:05:30.9665365Z cp.async.commit_group; 2026-02-21T09:05:30.9665674Z .loc 1 56 51 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:51 2026-02-21T09:05:30.9666040Z shl.b32 %r393, %r6, 13; 2026-02-21T09:05:30.9666292Z and.b32 %r394, %r393, 57344; 2026-02-21T09:05:30.9666729Z .loc 1 56 58 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:58 2026-02-21T09:05:30.9667086Z add.s32 %r395, %r5, %r394; 2026-02-21T09:05:30.9667409Z .loc 1 56 30 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:30 2026-02-21T09:05:30.9667763Z cvt.s64.s32 %rd62, %r395; 2026-02-21T09:05:30.9667939Z add.s64 %rd31, %rd10, %rd62; 2026-02-21T09:05:30.9668330Z .loc 1 56 83 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:83 2026-02-21T09:05:30.9668683Z and.b32 %r396, %r367, 2044; 2026-02-21T09:05:30.9668864Z add.s32 %r397, %r368, %r396; 2026-02-21T09:05:30.9669215Z add.s32 %r300, %r397, 98304; 2026-02-21T09:05:30.9669388Z mov.b32 %r2145, 4; 2026-02-21T09:05:30.9669552Z // begin inline asm 2026-02-21T09:05:30.9669780Z cp.async.ca.shared.global [ %r300 + 0 ], [ %rd31 + 0 ], 0x4, %r2145; 2026-02-21T09:05:30.9670061Z // end inline asm 2026-02-21T09:05:30.9670213Z cp.async.commit_group; 2026-02-21T09:05:30.9670518Z .loc 1 50 28 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:28 2026-02-21T09:05:30.9670875Z cvt.s64.s32 %rd63, %r384; 2026-02-21T09:05:30.9671057Z cvt.u64.u32 %rd64, %r383; 2026-02-21T09:05:30.9671230Z or.b64 %rd65, %rd63, %rd64; 2026-02-21T09:05:30.9671404Z shl.b64 %rd66, %rd65, 1; 2026-02-21T09:05:30.9671577Z add.s64 %rd67, %rd9, %rd66; 2026-02-21T09:05:30.9671763Z add.s64 %rd32, %rd67, 32; 2026-02-21T09:05:30.9671935Z cvt.s64.s32 %rd68, %r386; 2026-02-21T09:05:30.9672104Z or.b64 %rd69, %rd68, %rd64; 2026-02-21T09:05:30.9672283Z shl.b64 %rd70, %rd69, 1; 2026-02-21T09:05:30.9672455Z add.s64 %rd71, %rd9, %rd70; 2026-02-21T09:05:30.9672707Z add.s64 %rd33, %rd71, 32; 
2026-02-21T09:05:30.9673018Z .loc 1 50 76 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:76 2026-02-21T09:05:30.9673366Z add.s32 %r302, %r296, 40960; 2026-02-21T09:05:30.9673542Z // begin inline asm 2026-02-21T09:05:30.9673766Z cp.async.ca.shared.global [ %r302 + 0 ], [ %rd32 + 0 ], 0x8, %r297; 2026-02-21T09:05:30.9674037Z // end inline asm 2026-02-21T09:05:30.9674184Z add.s32 %r304, %r296, 45056; 2026-02-21T09:05:30.9674356Z // begin inline asm 2026-02-21T09:05:30.9674579Z cp.async.ca.shared.global [ %r304 + 0 ], [ %rd33 + 0 ], 0x8, %r297; 2026-02-21T09:05:30.9674852Z // end inline asm 2026-02-21T09:05:30.9675010Z cp.async.commit_group; 2026-02-21T09:05:30.9675316Z .loc 1 56 51 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:51 2026-02-21T09:05:30.9675664Z or.b32 %r398, %r393, 65536; 2026-02-21T09:05:30.9675974Z .loc 1 56 58 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:58 2026-02-21T09:05:30.9676327Z add.s32 %r399, %r5, %r398; 2026-02-21T09:05:30.9676769Z .loc 1 56 30 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:30 2026-02-21T09:05:30.9677122Z cvt.s64.s32 %rd72, %r399; 2026-02-21T09:05:30.9677302Z add.s64 %rd34, %rd10, %rd72; 2026-02-21T09:05:30.9677609Z .loc 1 56 83 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:83 2026-02-21T09:05:30.9677965Z add.s32 %r306, %r397, 108544; 2026-02-21T09:05:30.9678148Z // begin inline asm 2026-02-21T09:05:30.9678385Z cp.async.ca.shared.global [ %r306 + 0 ], [ %rd34 + 0 ], 0x4, %r2145; 2026-02-21T09:05:30.9678656Z // end inline asm 2026-02-21T09:05:30.9678818Z cp.async.commit_group; 2026-02-21T09:05:30.9679128Z .loc 1 50 28 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:28 2026-02-21T09:05:30.9679487Z add.s64 %rd35, %rd67, 64; 2026-02-21T09:05:30.9679667Z add.s64 %rd36, %rd71, 64; 2026-02-21T09:05:30.9679977Z .loc 1 50 76 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:76 2026-02-21T09:05:30.9680336Z bar.sync 0; 
2026-02-21T09:05:30.9680570Z add.s32 %r308, %r296, 8192; 2026-02-21T09:05:30.9680752Z // begin inline asm 2026-02-21T09:05:30.9680984Z cp.async.ca.shared.global [ %r308 + 0 ], [ %rd35 + 0 ], 0x8, %r297; 2026-02-21T09:05:30.9681260Z // end inline asm 2026-02-21T09:05:30.9681415Z add.s32 %r310, %r296, 12288; 2026-02-21T09:05:30.9681594Z // begin inline asm 2026-02-21T09:05:30.9681825Z cp.async.ca.shared.global [ %r310 + 0 ], [ %rd36 + 0 ], 0x8, %r297; 2026-02-21T09:05:30.9682098Z // end inline asm 2026-02-21T09:05:30.9682254Z cp.async.commit_group; 2026-02-21T09:05:30.9682562Z .loc 1 56 30 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:30 2026-02-21T09:05:30.9682915Z cvt.s64.s32 %rd73, %r5; 2026-02-21T09:05:30.9683085Z cvt.u64.u32 %rd74, %r394; 2026-02-21T09:05:30.9683403Z add.s64 %rd75, %rd74, %rd73; 2026-02-21T09:05:30.9683588Z add.s64 %rd76, %rd10, %rd75; 2026-02-21T09:05:30.9683778Z add.s64 %rd37, %rd76, 131072; 2026-02-21T09:05:30.9684101Z .loc 1 56 83 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:83 2026-02-21T09:05:30.9684446Z add.s32 %r312, %r397, 100352; 2026-02-21T09:05:30.9684626Z // begin inline asm 2026-02-21T09:05:30.9684855Z cp.async.ca.shared.global [ %r312 + 0 ], [ %rd37 + 0 ], 0x4, %r2145; 2026-02-21T09:05:30.9685133Z // end inline asm 2026-02-21T09:05:30.9685288Z cp.async.commit_group; 2026-02-21T09:05:30.9685592Z .loc 1 50 28 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:28 2026-02-21T09:05:30.9685936Z add.s64 %rd38, %rd67, 96; 2026-02-21T09:05:30.9686113Z add.s64 %rd39, %rd71, 96; 2026-02-21T09:05:30.9686421Z .loc 1 50 76 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:76 2026-02-21T09:05:30.9686896Z add.s32 %r314, %r296, 49152; 2026-02-21T09:05:30.9687162Z // begin inline asm 2026-02-21T09:05:30.9687397Z cp.async.ca.shared.global [ %r314 + 0 ], [ %rd38 + 0 ], 0x8, %r297; 2026-02-21T09:05:30.9687669Z // end inline asm 2026-02-21T09:05:30.9687818Z add.s32 %r316, %r296, 53248; 
2026-02-21T09:05:30.9687994Z // begin inline asm 2026-02-21T09:05:30.9688217Z cp.async.ca.shared.global [ %r316 + 0 ], [ %rd39 + 0 ], 0x8, %r297; 2026-02-21T09:05:30.9688492Z // end inline asm 2026-02-21T09:05:30.9688664Z cp.async.commit_group; 2026-02-21T09:05:30.9688968Z .loc 1 56 51 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:51 2026-02-21T09:05:30.9689320Z or.b32 %r400, %r393, 196608; 2026-02-21T09:05:30.9689627Z .loc 1 56 58 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:58 2026-02-21T09:05:30.9689985Z add.s32 %r401, %r5, %r400; 2026-02-21T09:05:30.9690304Z .loc 1 56 30 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:30 2026-02-21T09:05:30.9690666Z cvt.s64.s32 %rd77, %r401; 2026-02-21T09:05:30.9690856Z add.s64 %rd40, %rd10, %rd77; 2026-02-21T09:05:30.9691176Z .loc 1 56 83 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:83 2026-02-21T09:05:30.9691533Z add.s32 %r318, %r397, 110592; 2026-02-21T09:05:30.9691714Z // begin inline asm 2026-02-21T09:05:30.9691959Z cp.async.ca.shared.global [ %r318 + 0 ], [ %rd40 + 0 ], 0x4, %r2145; 2026-02-21T09:05:30.9692233Z // end inline asm 2026-02-21T09:05:30.9692397Z cp.async.commit_group; 2026-02-21T09:05:30.9692719Z .loc 1 50 28 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:28 2026-02-21T09:05:30.9693070Z add.s64 %rd41, %rd67, 128; 2026-02-21T09:05:30.9693255Z add.s64 %rd42, %rd71, 128; 2026-02-21T09:05:30.9693565Z .loc 1 50 76 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:76 2026-02-21T09:05:30.9693914Z bar.sync 0; 2026-02-21T09:05:30.9694066Z add.s32 %r320, %r296, 16384; 2026-02-21T09:05:30.9694250Z // begin inline asm 2026-02-21T09:05:30.9694485Z cp.async.ca.shared.global [ %r320 + 0 ], [ %rd41 + 0 ], 0x8, %r297; 2026-02-21T09:05:30.9694856Z // end inline asm 2026-02-21T09:05:30.9695016Z add.s32 %r322, %r296, 20480; 2026-02-21T09:05:30.9695195Z // begin inline asm 2026-02-21T09:05:30.9695430Z cp.async.ca.shared.global [ %r322 + 0 ], 
[ %rd42 + 0 ], 0x8, %r297; 2026-02-21T09:05:30.9695702Z // end inline asm 2026-02-21T09:05:30.9695866Z cp.async.commit_group; 2026-02-21T09:05:30.9696176Z .loc 1 56 30 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:30 2026-02-21T09:05:30.9696647Z add.s64 %rd43, %rd76, 262144; 2026-02-21T09:05:30.9696966Z .loc 1 56 83 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:83 2026-02-21T09:05:30.9697320Z add.s32 %r324, %r397, 102400; 2026-02-21T09:05:30.9697496Z // begin inline asm 2026-02-21T09:05:30.9697894Z cp.async.ca.shared.global [ %r324 + 0 ], [ %rd43 + 0 ], 0x4, %r2145; 2026-02-21T09:05:30.9698176Z // end inline asm 2026-02-21T09:05:30.9698327Z cp.async.commit_group; 2026-02-21T09:05:30.9698633Z .loc 1 50 28 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:28 2026-02-21T09:05:30.9698983Z add.s64 %rd44, %rd67, 160; 2026-02-21T09:05:30.9699162Z add.s64 %rd45, %rd71, 160; 2026-02-21T09:05:30.9699478Z .loc 1 50 76 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:76 2026-02-21T09:05:30.9699843Z add.s32 %r326, %r296, 57344; 2026-02-21T09:05:30.9700026Z // begin inline asm 2026-02-21T09:05:30.9700261Z cp.async.ca.shared.global [ %r326 + 0 ], [ %rd44 + 0 ], 0x8, %r297; 2026-02-21T09:05:30.9700540Z // end inline asm 2026-02-21T09:05:30.9700689Z add.s32 %r328, %r296, 61440; 2026-02-21T09:05:30.9700870Z // begin inline asm 2026-02-21T09:05:30.9701093Z cp.async.ca.shared.global [ %r328 + 0 ], [ %rd45 + 0 ], 0x8, %r297; 2026-02-21T09:05:30.9701386Z // end inline asm 2026-02-21T09:05:30.9701616Z cp.async.commit_group; 2026-02-21T09:05:30.9701940Z .loc 1 56 51 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:51 2026-02-21T09:05:30.9702299Z or.b32 %r402, %r393, 327680; 2026-02-21T09:05:30.9702610Z .loc 1 56 58 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:58 2026-02-21T09:05:30.9702961Z add.s32 %r403, %r5, %r402; 2026-02-21T09:05:30.9703270Z .loc 1 56 30 // 
cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:30 2026-02-21T09:05:30.9703626Z cvt.s64.s32 %rd78, %r403; 2026-02-21T09:05:30.9703805Z add.s64 %rd46, %rd10, %rd78; 2026-02-21T09:05:30.9704127Z .loc 1 56 83 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:83 2026-02-21T09:05:30.9704478Z add.s32 %r330, %r397, 112640; 2026-02-21T09:05:30.9704654Z // begin inline asm 2026-02-21T09:05:30.9704896Z cp.async.ca.shared.global [ %r330 + 0 ], [ %rd46 + 0 ], 0x4, %r2145; 2026-02-21T09:05:30.9705173Z // end inline asm 2026-02-21T09:05:30.9705343Z cp.async.commit_group; 2026-02-21T09:05:30.9705661Z .loc 1 50 28 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:28 2026-02-21T09:05:30.9706046Z add.s64 %rd47, %rd67, 192; 2026-02-21T09:05:30.9706240Z add.s64 %rd48, %rd71, 192; 2026-02-21T09:05:30.9706686Z .loc 1 50 76 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:76 2026-02-21T09:05:30.9707049Z bar.sync 0; 2026-02-21T09:05:30.9707201Z add.s32 %r332, %r296, 24576; 2026-02-21T09:05:30.9707387Z // begin inline asm 2026-02-21T09:05:30.9707624Z cp.async.ca.shared.global [ %r332 + 0 ], [ %rd47 + 0 ], 0x8, %r297; 2026-02-21T09:05:30.9707908Z // end inline asm 2026-02-21T09:05:30.9708065Z add.s32 %r334, %r296, 28672; 2026-02-21T09:05:30.9708247Z // begin inline asm 2026-02-21T09:05:30.9708571Z cp.async.ca.shared.global [ %r334 + 0 ], [ %rd48 + 0 ], 0x8, %r297; 2026-02-21T09:05:30.9708846Z // end inline asm 2026-02-21T09:05:30.9709011Z cp.async.commit_group; 2026-02-21T09:05:30.9709321Z .loc 1 56 30 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:30 2026-02-21T09:05:30.9709768Z add.s64 %rd49, %rd76, 393216; 2026-02-21T09:05:30.9710084Z .loc 1 56 83 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:83 2026-02-21T09:05:30.9710432Z add.s32 %r336, %r397, 104448; 2026-02-21T09:05:30.9710606Z // begin inline asm 2026-02-21T09:05:30.9710845Z cp.async.ca.shared.global [ %r336 + 0 ], [ %rd49 + 0 ], 0x4, %r2145; 
2026-02-21T09:05:30.9711122Z // end inline asm 2026-02-21T09:05:30.9711277Z cp.async.commit_group; 2026-02-21T09:05:30.9711583Z .loc 1 50 28 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:28 2026-02-21T09:05:30.9711931Z add.s64 %rd50, %rd67, 224; 2026-02-21T09:05:30.9712111Z add.s64 %rd51, %rd71, 224; 2026-02-21T09:05:30.9712563Z .loc 1 50 76 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:76 2026-02-21T09:05:30.9712929Z add.s32 %r338, %r296, 65536; 2026-02-21T09:05:30.9713106Z // begin inline asm 2026-02-21T09:05:30.9713338Z cp.async.ca.shared.global [ %r338 + 0 ], [ %rd50 + 0 ], 0x8, %r297; 2026-02-21T09:05:30.9713613Z // end inline asm 2026-02-21T09:05:30.9713761Z add.s32 %r340, %r296, 69632; 2026-02-21T09:05:30.9713940Z // begin inline asm 2026-02-21T09:05:30.9714164Z cp.async.ca.shared.global [ %r340 + 0 ], [ %rd51 + 0 ], 0x8, %r297; 2026-02-21T09:05:30.9714434Z // end inline asm 2026-02-21T09:05:30.9714592Z cp.async.commit_group; 2026-02-21T09:05:30.9714903Z .loc 1 56 51 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:51 2026-02-21T09:05:30.9715257Z or.b32 %r404, %r393, 458752; 2026-02-21T09:05:30.9715581Z .loc 1 56 58 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:58 2026-02-21T09:05:30.9715938Z add.s32 %r405, %r5, %r404; 2026-02-21T09:05:30.9716318Z .loc 1 56 30 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:30 2026-02-21T09:05:30.9716819Z cvt.s64.s32 %rd79, %r405; 2026-02-21T09:05:30.9716998Z add.s64 %rd52, %rd10, %rd79; 2026-02-21T09:05:30.9717318Z .loc 1 56 83 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:83 2026-02-21T09:05:30.9717669Z add.s32 %r342, %r397, 114688; 2026-02-21T09:05:30.9717845Z // begin inline asm 2026-02-21T09:05:30.9718082Z cp.async.ca.shared.global [ %r342 + 0 ], [ %rd52 + 0 ], 0x4, %r2145; 2026-02-21T09:05:30.9718357Z // end inline asm 2026-02-21T09:05:30.9718519Z cp.async.commit_group; 2026-02-21T09:05:30.9718832Z .loc 1 50 28 // 
cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:28 2026-02-21T09:05:30.9719187Z add.s64 %rd53, %rd67, 256; 2026-02-21T09:05:30.9719367Z add.s64 %rd54, %rd71, 256; 2026-02-21T09:05:30.9719685Z .loc 1 50 76 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:76 2026-02-21T09:05:30.9720039Z bar.sync 0; 2026-02-21T09:05:30.9720199Z add.s32 %r344, %r296, 32768; 2026-02-21T09:05:30.9720381Z // begin inline asm 2026-02-21T09:05:30.9720611Z cp.async.ca.shared.global [ %r344 + 0 ], [ %rd53 + 0 ], 0x8, %r297; 2026-02-21T09:05:30.9720889Z // end inline asm 2026-02-21T09:05:30.9721044Z add.s32 %r346, %r296, 36864; 2026-02-21T09:05:30.9721224Z // begin inline asm 2026-02-21T09:05:30.9721448Z cp.async.ca.shared.global [ %r346 + 0 ], [ %rd54 + 0 ], 0x8, %r297; 2026-02-21T09:05:30.9721724Z // end inline asm 2026-02-21T09:05:30.9721885Z cp.async.commit_group; 2026-02-21T09:05:30.9722191Z .loc 1 56 30 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:30 2026-02-21T09:05:30.9722551Z add.s64 %rd55, %rd76, 524288; 2026-02-21T09:05:30.9722865Z .loc 1 56 83 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:83 2026-02-21T09:05:30.9723235Z add.s32 %r348, %r397, 106496; 2026-02-21T09:05:30.9723412Z // begin inline asm 2026-02-21T09:05:30.9723648Z cp.async.ca.shared.global [ %r348 + 0 ], [ %rd55 + 0 ], 0x4, %r2145; 2026-02-21T09:05:30.9723919Z // end inline asm 2026-02-21T09:05:30.9724191Z cp.async.commit_group; 2026-02-21T09:05:30.9724495Z .loc 1 50 28 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:28 2026-02-21T09:05:30.9724842Z add.s64 %rd56, %rd67, 288; 2026-02-21T09:05:30.9725034Z add.s64 %rd57, %rd71, 288; 2026-02-21T09:05:30.9725343Z .loc 1 50 76 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:76 2026-02-21T09:05:30.9725689Z add.s32 %r350, %r296, 73728; 2026-02-21T09:05:30.9725859Z // begin inline asm 2026-02-21T09:05:30.9726084Z cp.async.ca.shared.global [ %r350 + 0 ], [ %rd56 + 0 ], 0x8, %r297; 
2026-02-21T09:05:30.9726356Z // end inline asm 2026-02-21T09:05:30.9726627Z add.s32 %r352, %r296, 77824; 2026-02-21T09:05:30.9726891Z // begin inline asm 2026-02-21T09:05:30.9727195Z cp.async.ca.shared.global [ %r352 + 0 ], [ %rd57 + 0 ], 0x8, %r297; 2026-02-21T09:05:30.9727468Z // end inline asm 2026-02-21T09:05:30.9727621Z cp.async.commit_group; 2026-02-21T09:05:30.9727931Z .loc 1 56 51 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:51 2026-02-21T09:05:30.9728271Z or.b32 %r406, %r393, 589824; 2026-02-21T09:05:30.9728585Z .loc 1 56 58 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:58 2026-02-21T09:05:30.9728933Z add.s32 %r407, %r5, %r406; 2026-02-21T09:05:30.9729240Z .loc 1 56 30 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:30 2026-02-21T09:05:30.9729592Z cvt.s64.s32 %rd80, %r407; 2026-02-21T09:05:30.9729769Z add.s64 %rd58, %rd10, %rd80; 2026-02-21T09:05:30.9730095Z .loc 1 56 83 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:83 2026-02-21T09:05:30.9730445Z add.s32 %r354, %r397, 116736; 2026-02-21T09:05:30.9730629Z // begin inline asm 2026-02-21T09:05:30.9730942Z cp.async.ca.shared.global [ %r354 + 0 ], [ %rd58 + 0 ], 0x4, %r2145; 2026-02-21T09:05:30.9731219Z // end inline asm 2026-02-21T09:05:30.9731384Z cp.async.commit_group; 2026-02-21T09:05:30.9731549Z shl.b32 %r12, %r1, 4; 2026-02-21T09:05:30.9731718Z and.b32 %r408, %r12, 7680; 2026-02-21T09:05:30.9731888Z and.b32 %r409, %r389, 96; 2026-02-21T09:05:30.9732071Z shl.b32 %r410, %r382, 1; 2026-02-21T09:05:30.9741640Z or.b32 %r411, %r408, %r409; 2026-02-21T09:05:30.9741876Z or.b32 %r412, %r411, %r410; 2026-02-21T09:05:30.9742078Z or.b32 %r13, %r412, %r392; 2026-02-21T09:05:30.9742276Z xor.b32 %r14, %r13, 8; 2026-02-21T09:05:30.9742455Z and.b32 %r15, %r1, 255; 2026-02-21T09:05:30.9742641Z or.b32 %r16, %r1, 768; 2026-02-21T09:05:30.9742833Z or.b32 %r17, %r1, 1792; 2026-02-21T09:05:30.9743013Z shl.b32 %r413, %r15, 6; 2026-02-21T09:05:30.9743191Z and.b32 
%r414, %r389, 48; 2026-02-21T09:05:30.9743374Z shr.u32 %r415, %r7, 6; 2026-02-21T09:05:30.9743558Z or.b32 %r416, %r413, %r415; 2026-02-21T09:05:30.9743738Z or.b32 %r417, %r416, %r414; 2026-02-21T09:05:30.9743933Z add.s32 %r1209, %r368, 81920; 2026-02-21T09:05:30.9744129Z add.s32 %r18, %r1209, %r417; 2026-02-21T09:05:30.9744317Z xor.b32 %r419, %r417, 16; 2026-02-21T09:05:30.9744493Z add.s32 %r19, %r1209, %r419; 2026-02-21T09:05:30.9744671Z xor.b32 %r420, %r417, 32; 2026-02-21T09:05:30.9744851Z add.s32 %r20, %r1209, %r420; 2026-02-21T09:05:30.9745032Z xor.b32 %r421, %r417, 48; 2026-02-21T09:05:30.9745207Z add.s32 %r21, %r1209, %r421; 2026-02-21T09:05:30.9745382Z bfe.u32 %r422, %r1209, 4, 14; 2026-02-21T09:05:30.9745570Z cvt.u64.u32 %rd81, %r422; 2026-02-21T09:05:30.9745767Z or.b64 %rd87, %rd81, -9223371899348713472; 2026-02-21T09:05:30.9745993Z add.s32 %r423, %r368, 81952; 2026-02-21T09:05:30.9746175Z bfe.u32 %r424, %r423, 4, 14; 2026-02-21T09:05:30.9746358Z cvt.u64.u32 %rd82, %r424; 2026-02-21T09:05:30.9746736Z or.b64 %rd88, %rd82, -9223371899348713472; 2026-02-21T09:05:30.9747134Z .loc 1 42 106 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:42:106 2026-02-21T09:05:30.9747521Z mul.wide.u32 %rd83, %r382, 8; 2026-02-21T09:05:30.9747877Z shl.b32 %r425, %r376, 18; 2026-02-21T09:05:30.9748060Z shl.b32 %r426, %r378, 10; 2026-02-21T09:05:30.9748235Z or.b32 %r427, %r425, %r426; 2026-02-21T09:05:30.9748518Z mul.wide.s32 %rd84, %r427, 2; 2026-02-21T09:05:30.9748704Z or.b64 %rd85, %rd83, %rd84; 2026-02-21T09:05:30.9748892Z add.s64 %rd86, %rd85, %rd9; 2026-02-21T09:05:30.9749071Z add.s64 %rd100, %rd86, 352; 2026-02-21T09:05:30.9749260Z or.b32 %r2143, %r388, 176; 2026-02-21T09:05:30.9749437Z add.s32 %r428, %r394, %r4; 2026-02-21T09:05:30.9749620Z or.b32 %r429, %r428, %r379; 2026-02-21T09:05:30.9749796Z add.s32 %r2142, %r429, 655360; 2026-02-21T09:05:30.9749991Z mov.b32 %r2146, 0f00000000; 2026-02-21T09:05:30.9750182Z mov.b32 %r2144, -1; 2026-02-21T09:05:30.9750437Z 
mov.b64 %rd101, 0; 2026-02-21T09:05:30.9750679Z mov.b32 %r2147, %r2146; 2026-02-21T09:05:30.9750852Z mov.b32 %r2148, %r2146; 2026-02-21T09:05:30.9751027Z mov.b32 %r2149, %r2146; 2026-02-21T09:05:30.9751190Z mov.b32 %r2150, %r2146; 2026-02-21T09:05:30.9751361Z mov.b32 %r2151, %r2146; 2026-02-21T09:05:30.9751522Z mov.b32 %r2152, %r2146; 2026-02-21T09:05:30.9751690Z mov.b32 %r2153, %r2146; 2026-02-21T09:05:30.9751865Z mov.b32 %r2154, %r2146; 2026-02-21T09:05:30.9752038Z mov.b32 %r2155, %r2146; 2026-02-21T09:05:30.9752212Z mov.b32 %r2156, %r2146; 2026-02-21T09:05:30.9752379Z mov.b32 %r2157, %r2146; 2026-02-21T09:05:30.9752549Z mov.b32 %r2158, %r2146; 2026-02-21T09:05:30.9752713Z mov.b32 %r2159, %r2146; 2026-02-21T09:05:30.9752879Z mov.b32 %r2160, %r2146; 2026-02-21T09:05:30.9753041Z mov.b32 %r2161, %r2146; 2026-02-21T09:05:30.9753209Z mov.b32 %r2162, %r2146; 2026-02-21T09:05:30.9753373Z mov.b32 %r2163, %r2146; 2026-02-21T09:05:30.9753542Z mov.b32 %r2164, %r2146; 2026-02-21T09:05:30.9753707Z mov.b32 %r2165, %r2146; 2026-02-21T09:05:30.9753967Z mov.b32 %r2166, %r2146; 2026-02-21T09:05:30.9754145Z mov.b32 %r2167, %r2146; 2026-02-21T09:05:30.9754306Z mov.b32 %r2168, %r2146; 2026-02-21T09:05:30.9754472Z mov.b32 %r2169, %r2146; 2026-02-21T09:05:30.9754638Z mov.b32 %r2170, %r2146; 2026-02-21T09:05:30.9754806Z mov.b32 %r2171, %r2146; 2026-02-21T09:05:30.9754966Z mov.b32 %r2172, %r2146; 2026-02-21T09:05:30.9755134Z mov.b32 %r2173, %r2146; 2026-02-21T09:05:30.9755299Z mov.b32 %r2174, %r2146; 2026-02-21T09:05:30.9755467Z mov.b32 %r2175, %r2146; 2026-02-21T09:05:30.9755634Z mov.b32 %r2176, %r2146; 2026-02-21T09:05:30.9755804Z mov.b32 %r2177, %r2146; 2026-02-21T09:05:30.9755972Z mov.b32 %r2178, %r2146; 2026-02-21T09:05:30.9756134Z mov.b32 %r2179, %r2146; 2026-02-21T09:05:30.9756303Z mov.b32 %r2180, %r2146; 2026-02-21T09:05:30.9756594Z mov.b32 %r2181, %r2146; 2026-02-21T09:05:30.9756773Z mov.b32 %r2182, %r2146; 2026-02-21T09:05:30.9756934Z mov.b32 %r2183, %r2146; 
2026-02-21T09:05:30.9757110Z mov.b32 %r2184, %r2146; 2026-02-21T09:05:30.9757277Z mov.b32 %r2185, %r2146; 2026-02-21T09:05:30.9757451Z mov.b32 %r2186, %r2146; 2026-02-21T09:05:30.9757616Z mov.b32 %r2187, %r2146; 2026-02-21T09:05:30.9757789Z mov.b32 %r2188, %r2146; 2026-02-21T09:05:30.9757966Z mov.b32 %r2189, %r2146; 2026-02-21T09:05:30.9758132Z mov.b32 %r2190, %r2146; 2026-02-21T09:05:30.9758302Z mov.b32 %r2191, %r2146; 2026-02-21T09:05:30.9758464Z mov.b32 %r2192, %r2146; 2026-02-21T09:05:30.9758634Z mov.b32 %r2193, %r2146; 2026-02-21T09:05:30.9758798Z mov.b32 %r2194, %r2146; 2026-02-21T09:05:30.9758982Z mov.b32 %r2195, %r2146; 2026-02-21T09:05:30.9759148Z mov.b32 %r2196, %r2146; 2026-02-21T09:05:30.9759318Z mov.b32 %r2197, %r2146; 2026-02-21T09:05:30.9759481Z mov.b32 %r2198, %r2146; 2026-02-21T09:05:30.9759654Z mov.b32 %r2199, %r2146; 2026-02-21T09:05:30.9759823Z mov.b32 %r2200, %r2146; 2026-02-21T09:05:30.9759986Z mov.b32 %r2201, %r2146; 2026-02-21T09:05:30.9760171Z mov.b32 %r2202, %r2146; 2026-02-21T09:05:30.9760339Z mov.b32 %r2203, %r2146; 2026-02-21T09:05:30.9760509Z mov.b32 %r2204, %r2146; 2026-02-21T09:05:30.9760669Z mov.b32 %r2205, %r2146; 2026-02-21T09:05:30.9760839Z mov.b32 %r2206, %r2146; 2026-02-21T09:05:30.9761088Z mov.b32 %r2207, %r2146; 2026-02-21T09:05:30.9761256Z mov.b32 %r2208, %r2146; 2026-02-21T09:05:30.9761433Z mov.b32 %r2209, %r2146; 2026-02-21T09:05:30.9761604Z mov.b32 %r2210, %r2146; 2026-02-21T09:05:30.9761773Z mov.b32 %r2211, %r2146; 2026-02-21T09:05:30.9761934Z mov.b32 %r2212, %r2146; 2026-02-21T09:05:30.9762104Z mov.b32 %r2213, %r2146; 2026-02-21T09:05:30.9762267Z mov.b32 %r2214, %r2146; 2026-02-21T09:05:30.9762440Z mov.b32 %r2215, %r2146; 2026-02-21T09:05:30.9762602Z mov.b32 %r2216, %r2146; 2026-02-21T09:05:30.9762770Z mov.b32 %r2217, %r2146; 2026-02-21T09:05:30.9762932Z mov.b32 %r2218, %r2146; 2026-02-21T09:05:30.9763103Z mov.b32 %r2219, %r2146; 2026-02-21T09:05:30.9763261Z mov.b32 %r2220, %r2146; 2026-02-21T09:05:30.9763573Z mov.b32 
%r2221, %r2146; 2026-02-21T09:05:30.9763747Z mov.b32 %r2222, %r2146; 2026-02-21T09:05:30.9763909Z mov.b32 %r2223, %r2146; 2026-02-21T09:05:30.9764077Z mov.b32 %r2224, %r2146; 2026-02-21T09:05:30.9764241Z mov.b32 %r2225, %r2146; 2026-02-21T09:05:30.9764411Z mov.b32 %r2226, %r2146; 2026-02-21T09:05:30.9764571Z mov.b32 %r2227, %r2146; 2026-02-21T09:05:30.9764739Z mov.b32 %r2228, %r2146; 2026-02-21T09:05:30.9764900Z mov.b32 %r2229, %r2146; 2026-02-21T09:05:30.9765068Z mov.b32 %r2230, %r2146; 2026-02-21T09:05:30.9765235Z mov.b32 %r2231, %r2146; 2026-02-21T09:05:30.9765408Z mov.b32 %r2232, %r2146; 2026-02-21T09:05:30.9765577Z mov.b32 %r2233, %r2146; 2026-02-21T09:05:30.9765740Z mov.b32 %r2234, %r2146; 2026-02-21T09:05:30.9765911Z mov.b32 %r2235, %r2146; 2026-02-21T09:05:30.9766076Z mov.b32 %r2236, %r2146; 2026-02-21T09:05:30.9766246Z mov.b32 %r2237, %r2146; 2026-02-21T09:05:30.9766412Z mov.b32 %r2238, %r2146; 2026-02-21T09:05:30.9766723Z mov.b32 %r2239, %r2146; 2026-02-21T09:05:30.9766903Z mov.b32 %r2240, %r2146; 2026-02-21T09:05:30.9767151Z mov.b32 %r2241, %r2146; 2026-02-21T09:05:30.9767319Z mov.b32 %r2242, %r2146; 2026-02-21T09:05:30.9767496Z mov.b32 %r2243, %r2146; 2026-02-21T09:05:30.9767673Z mov.b32 %r2244, %r2146; 2026-02-21T09:05:30.9767836Z mov.b32 %r2245, %r2146; 2026-02-21T09:05:30.9768006Z mov.b32 %r2246, %r2146; 2026-02-21T09:05:30.9768169Z mov.b32 %r2247, %r2146; 2026-02-21T09:05:30.9768337Z mov.b32 %r2248, %r2146; 2026-02-21T09:05:30.9768500Z mov.b32 %r2249, %r2146; 2026-02-21T09:05:30.9768669Z mov.b32 %r2250, %r2146; 2026-02-21T09:05:30.9768833Z mov.b32 %r2251, %r2146; 2026-02-21T09:05:30.9769005Z mov.b32 %r2252, %r2146; 2026-02-21T09:05:30.9769167Z mov.b32 %r2253, %r2146; 2026-02-21T09:05:30.9769337Z mov.b32 %r2254, %r2146; 2026-02-21T09:05:30.9769508Z mov.b32 %r2255, %r2146; 2026-02-21T09:05:30.9769672Z mov.b32 %r2256, %r2146; 2026-02-21T09:05:30.9769841Z mov.b32 %r2257, %r2146; 2026-02-21T09:05:30.9770004Z mov.b32 %r2258, %r2146; 
2026-02-21T09:05:30.9770177Z mov.b32 %r2259, %r2146; 2026-02-21T09:05:30.9770344Z mov.b32 %r2260, %r2146; 2026-02-21T09:05:30.9770521Z mov.b32 %r2261, %r2146; 2026-02-21T09:05:30.9770685Z mov.b32 %r2262, %r2146; 2026-02-21T09:05:30.9770869Z mov.b32 %r2263, %r2146; 2026-02-21T09:05:30.9771033Z mov.b32 %r2264, %r2146; 2026-02-21T09:05:30.9771203Z mov.b32 %r2265, %r2146; 2026-02-21T09:05:30.9771378Z mov.b32 %r2266, %r2146; 2026-02-21T09:05:30.9771541Z mov.b32 %r2267, %r2146; 2026-02-21T09:05:30.9771709Z mov.b32 %r2268, %r2146; 2026-02-21T09:05:30.9771872Z mov.b32 %r2269, %r2146; 2026-02-21T09:05:30.9772052Z mov.b32 %r2270, %r2146; 2026-02-21T09:05:30.9772215Z mov.b32 %r2271, %r2146; 2026-02-21T09:05:30.9772383Z mov.b32 %r2272, %r2146; 2026-02-21T09:05:30.9772549Z mov.b32 %r2273, %r2146; 2026-02-21T09:05:30.9772780Z $L__BB0_1: // =>This Inner Loop Header: Depth=1 2026-02-21T09:05:30.9773205Z .loc 1 73 34 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:73:34 2026-02-21T09:05:30.9773586Z setp.eq.b32 %p23, %r7, 0; 2026-02-21T09:05:30.9773929Z .loc 1 42 106 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:42:106 2026-02-21T09:05:30.9774391Z setp.lt.u64 %p24, %rd101, 432; 2026-02-21T09:05:30.9774597Z add.s32 %r2006, %r2144, 1; 2026-02-21T09:05:30.9774781Z setp.gt.s32 %p25, %r2006, 4; 2026-02-21T09:05:30.9774980Z selp.b32 %r2144, 0, %r2006, %p25; 2026-02-21T09:05:30.9775335Z .loc 1 50 76 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:76 2026-02-21T09:05:30.9775715Z cp.async.wait_group 16; 2026-02-21T09:05:30.9775897Z bar.sync 0; 2026-02-21T09:05:30.9776054Z shl.b32 %r2007, %r2144, 13; 2026-02-21T09:05:30.9776243Z add.s32 %r2009, %r368, %r2007; 2026-02-21T09:05:30.9776697Z .loc 1 54 28 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:54:28 2026-02-21T09:05:30.9777148Z add.s32 %r2010, %r2009, %r13; 2026-02-21T09:05:30.9777410Z ld.shared.b16 %rs1, [%r2010]; 2026-02-21T09:05:30.9777618Z ld.shared.b16 %rs2, [%r2010+256]; 
2026-02-21T09:05:30.9777822Z ld.shared.b16 %rs3, [%r2010+16]; 2026-02-21T09:05:30.9778019Z ld.shared.b16 %rs4, [%r2010+272]; 2026-02-21T09:05:30.9778215Z add.s32 %r2011, %r2009, %r14; 2026-02-21T09:05:30.9778393Z ld.shared.b16 %rs5, [%r2011]; 2026-02-21T09:05:30.9778578Z ld.shared.b16 %rs6, [%r2011+256]; 2026-02-21T09:05:30.9778768Z ld.shared.b16 %rs7, [%r2011+16]; 2026-02-21T09:05:30.9778962Z ld.shared.b16 %rs8, [%r2011+272]; 2026-02-21T09:05:30.9779153Z cvt.f32.bf16 %r686, %rs1; 2026-02-21T09:05:30.9779332Z cvt.f32.bf16 %r687, %rs2; 2026-02-21T09:05:30.9779506Z cvt.f32.bf16 %r688, %rs5; 2026-02-21T09:05:30.9779674Z cvt.f32.bf16 %r689, %rs6; 2026-02-21T09:05:30.9779848Z cvt.f32.bf16 %r946, %rs3; 2026-02-21T09:05:30.9780014Z cvt.f32.bf16 %r947, %rs4; 2026-02-21T09:05:30.9780188Z cvt.f32.bf16 %r948, %rs7; 2026-02-21T09:05:30.9780357Z cvt.f32.bf16 %r949, %rs8; 2026-02-21T09:05:30.9780748Z .loc 1 56 83 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:83 2026-02-21T09:05:30.9781115Z shl.b32 %r2012, %r2144, 11; 2026-02-21T09:05:30.9781307Z add.s32 %r2013, %r368, %r2012; 2026-02-21T09:05:30.9781495Z add.s32 %r2014, %r2013, 98304; 2026-02-21T09:05:30.9781821Z .loc 1 69 41 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:69:41 2026-02-21T09:05:30.9782191Z add.s32 %r2015, %r2014, %r15; 2026-02-21T09:05:30.9782379Z ld.shared.b8 %rs9, [%r2015]; 2026-02-21T09:05:30.9782570Z ld.shared.b8 %rs10, [%r2015+256]; 2026-02-21T09:05:30.9782765Z ld.shared.b8 %rs11, [%r2015+512]; 2026-02-21T09:05:30.9782959Z add.s32 %r2016, %r2014, %r16; 2026-02-21T09:05:30.9783137Z ld.shared.b8 %rs12, [%r2016]; 2026-02-21T09:05:30.9783330Z ld.shared.b8 %rs13, [%r2015+1024]; 2026-02-21T09:05:30.9783534Z ld.shared.b8 %rs14, [%r2015+1280]; 2026-02-21T09:05:30.9783727Z ld.shared.b8 %rs15, [%r2015+1536]; 2026-02-21T09:05:30.9783928Z add.s32 %r2017, %r2014, %r17; 2026-02-21T09:05:30.9784112Z ld.shared.b8 %rs16, [%r2017]; 2026-02-21T09:05:30.9784439Z .loc 1 59 24 // 
cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:59:24 2026-02-21T09:05:30.9784795Z shl.b16 %rs17, %rs9, 4; 2026-02-21T09:05:30.9784972Z shl.b16 %rs18, %rs10, 4; 2026-02-21T09:05:30.9785147Z shl.b16 %rs19, %rs11, 4; 2026-02-21T09:05:30.9785323Z shl.b16 %rs20, %rs12, 4; 2026-02-21T09:05:30.9785489Z shl.b16 %rs21, %rs13, 4; 2026-02-21T09:05:30.9785663Z shl.b16 %rs22, %rs14, 4; 2026-02-21T09:05:30.9785835Z shl.b16 %rs23, %rs15, 4; 2026-02-21T09:05:30.9786013Z shl.b16 %rs24, %rs16, 4; 2026-02-21T09:05:30.9786327Z .loc 1 74 54 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:74:54 2026-02-21T09:05:30.9786810Z selp.b16 %rs25, %rs17, %rs9, %p23; 2026-02-21T09:05:30.9787014Z cvt.s16.s8 %rs26, %rs25; 2026-02-21T09:05:30.9787186Z shr.s16 %rs27, %rs26, 4; 2026-02-21T09:05:30.9787374Z selp.b16 %rs28, %rs18, %rs10, %p23; 2026-02-21T09:05:30.9787583Z cvt.s16.s8 %rs29, %rs28; 2026-02-21T09:05:30.9787765Z shr.s16 %rs30, %rs29, 4; 2026-02-21T09:05:30.9787961Z selp.b16 %rs31, %rs19, %rs11, %p23; 2026-02-21T09:05:30.9788323Z cvt.s16.s8 %rs32, %rs31; 2026-02-21T09:05:30.9788505Z shr.s16 %rs33, %rs32, 4; 2026-02-21T09:05:30.9788678Z selp.b16 %rs34, %rs20, %rs12, %p23; 2026-02-21T09:05:30.9788878Z cvt.s16.s8 %rs35, %rs34; 2026-02-21T09:05:30.9789044Z shr.s16 %rs36, %rs35, 4; 2026-02-21T09:05:30.9789220Z selp.b16 %rs37, %rs21, %rs13, %p23; 2026-02-21T09:05:30.9789411Z cvt.s16.s8 %rs38, %rs37; 2026-02-21T09:05:30.9789582Z shr.s16 %rs39, %rs38, 4; 2026-02-21T09:05:30.9789760Z selp.b16 %rs40, %rs22, %rs14, %p23; 2026-02-21T09:05:30.9789952Z cvt.s16.s8 %rs41, %rs40; 2026-02-21T09:05:30.9790123Z shr.s16 %rs42, %rs41, 4; 2026-02-21T09:05:30.9790294Z selp.b16 %rs43, %rs23, %rs15, %p23; 2026-02-21T09:05:30.9790504Z cvt.s16.s8 %rs44, %rs43; 2026-02-21T09:05:30.9790836Z shr.s16 %rs45, %rs44, 4; 2026-02-21T09:05:30.9791020Z selp.b16 %rs46, %rs24, %rs16, %p23; 2026-02-21T09:05:30.9791212Z cvt.s16.s8 %rs47, %rs46; 2026-02-21T09:05:30.9791388Z shr.s16 %rs48, %rs47, 4; 
2026-02-21T09:05:30.9791706Z .loc 1 79 28 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:79:28 2026-02-21T09:05:30.9792083Z cvt.rn.f32.s16 %r2018, %rs27; 2026-02-21T09:05:30.9792276Z cvt.rn.f32.s16 %r2019, %rs30; 2026-02-21T09:05:30.9792457Z cvt.rn.f32.s16 %r2020, %rs33; 2026-02-21T09:05:30.9792654Z cvt.rn.f32.s16 %r2021, %rs36; 2026-02-21T09:05:30.9792836Z cvt.rn.f32.s16 %r2022, %rs39; 2026-02-21T09:05:30.9793019Z cvt.rn.f32.s16 %r2023, %rs42; 2026-02-21T09:05:30.9793198Z cvt.rn.f32.s16 %r2024, %rs45; 2026-02-21T09:05:30.9793380Z cvt.rn.f32.s16 %r2025, %rs48; 2026-02-21T09:05:30.9793562Z st.shared.b32 [%r18], %r2018; 2026-02-21T09:05:30.9793757Z st.shared.b32 [%r18+8], %r2019; 2026-02-21T09:05:30.9793954Z st.shared.b32 [%r19], %r2020; 2026-02-21T09:05:30.9794215Z st.shared.b32 [%r19+8], %r2021; 2026-02-21T09:05:30.9794414Z st.shared.b32 [%r20], %r2022; 2026-02-21T09:05:30.9794593Z st.shared.b32 [%r20+8], %r2023; 2026-02-21T09:05:30.9794785Z st.shared.b32 [%r21], %r2024; 2026-02-21T09:05:30.9794966Z st.shared.b32 [%r21+8], %r2025; 2026-02-21T09:05:30.9795156Z $L__tmp1: 2026-02-21T09:05:30.9795514Z .loc 2 291 36 // standard.py:291:36 @[ cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:86:36 ] 2026-02-21T09:05:30.9795945Z // begin inline asm 2026-02-21T09:05:30.9796143Z fence.proxy.async.shared::cta; 2026-02-21T09:05:30.9796343Z // end inline asm 2026-02-21T09:05:30.9796641Z bar.sync 0; 2026-02-21T09:05:30.9796815Z shfl.sync.idx.b32 %r2026, %r3, 0, 31, -1; 2026-02-21T09:05:30.9797049Z wgmma.fence.sync.aligned; 2026-02-21T09:05:30.9797231Z mov.pred %p19, -1; 2026-02-21T09:05:30.9797399Z // begin inline asm 2026-02-21T09:05:30.9799801Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r2146,%r2147,%r2148,%r2149,%r2150,%r2151,%r2152,%r2153,%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161,%r2162,%r2163,%r2164,%r2165,%r2166,%r2167,%r2168,%r2169,%r2170,%r2171,%r2172,%r2173,%r2174,%r2175,%r2176,%r2177,%r2178,%r2179,%r2180,%r2181,%r2182,%r2183,%r2184,%r2185,%r2186,%r2187,%r2188,%r2189,%r2190,%r2191,%r2192,%r2193,%r2194,%r2195,%r2196,%r2197,%r2198,%r2199,%r2200,%r2201,%r2202,%r2203,%r2204,%r2205,%r2206,%r2207,%r2208,%r2209,%r2210,%r2211,%r2212,%r2213,%r2214,%r2215,%r2216,%r2217,%r2218,%r2219,%r2220,%r2221,%r2222,%r2223,%r2224,%r2225,%r2226,%r2227,%r2228,%r2229,%r2230,%r2231,%r2232,%r2233,%r2234,%r2235,%r2236,%r2237,%r2238,%r2239,%r2240,%r2241,%r2242,%r2243,%r2244,%r2245,%r2246,%r2247,%r2248,%r2249,%r2250,%r2251,%r2252,%r2253,%r2254,%r2255,%r2256,%r2257,%r2258,%r2259,%r2260,%r2261,%r2262,%r2263,%r2264,%r2265,%r2266,%r2267,%r2268,%r2269,%r2270,%r2271,%r2272,%r2273}, {%r686,%r687,%r688,%r689}, %rd87, %p19, 1, 1; 2026-02-21T09:05:30.9802230Z // end inline asm 2026-02-21T09:05:30.9802405Z // begin inline asm 2026-02-21T09:05:30.9804773Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r2146,%r2147,%r2148,%r2149,%r2150,%r2151,%r2152,%r2153,%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161,%r2162,%r2163,%r2164,%r2165,%r2166,%r2167,%r2168,%r2169,%r2170,%r2171,%r2172,%r2173,%r2174,%r2175,%r2176,%r2177,%r2178,%r2179,%r2180,%r2181,%r2182,%r2183,%r2184,%r2185,%r2186,%r2187,%r2188,%r2189,%r2190,%r2191,%r2192,%r2193,%r2194,%r2195,%r2196,%r2197,%r2198,%r2199,%r2200,%r2201,%r2202,%r2203,%r2204,%r2205,%r2206,%r2207,%r2208,%r2209,%r2210,%r2211,%r2212,%r2213,%r2214,%r2215,%r2216,%r2217,%r2218,%r2219,%r2220,%r2221,%r2222,%r2223,%r2224,%r2225,%r2226,%r2227,%r2228,%r2229,%r2230,%r2231,%r2232,%r2233,%r2234,%r2235,%r2236,%r2237,%r2238,%r2239,%r2240,%r2241,%r2242,%r2243,%r2244,%r2245,%r2246,%r2247,%r2248,%r2249,%r2250,%r2251,%r2252,%r2253,%r2254,%r2255,%r2256,%r2257,%r2258,%r2259,%r2260,%r2261,%r2262,%r2263,%r2264,%r2265,%r2266,%r2267,%r2268,%r2269,%r2270,%r2271,%r2272,%r2273}, {%r946,%r947,%r948,%r949}, %rd88, %p19, 1, 1; 2026-02-21T09:05:30.9807515Z // end inline asm 2026-02-21T09:05:30.9807716Z wgmma.commit_group.sync.aligned; 2026-02-21T09:05:30.9807936Z mov.b32 %r1078, %r1209; 2026-02-21T09:05:30.9808111Z mov.b32 %r1079, %r1210; 2026-02-21T09:05:30.9808295Z mov.b32 %r1080, %r1210; 2026-02-21T09:05:30.9808463Z // begin inline asm 2026-02-21T09:05:30.9810710Z // wait for regs: 
%r2146,%r2147,%r2148,%r2149,%r2150,%r2151,%r2152,%r2153,%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161,%r2162,%r2163,%r2164,%r2165,%r2166,%r2167,%r2168,%r2169,%r2170,%r2171,%r2172,%r2173,%r2174,%r2175,%r2176,%r2177,%r2178,%r2179,%r2180,%r2181,%r2182,%r2183,%r2184,%r2185,%r2186,%r2187,%r2188,%r2189,%r2190,%r2191,%r2192,%r2193,%r2194,%r2195,%r2196,%r2197,%r2198,%r2199,%r2200,%r2201,%r2202,%r2203,%r2204,%r2205,%r2206,%r2207,%r2208,%r2209,%r2210,%r2211,%r2212,%r2213,%r2214,%r2215,%r2216,%r2217,%r2218,%r2219,%r2220,%r2221,%r2222,%r2223,%r2224,%r2225,%r2226,%r2227,%r2228,%r2229,%r2230,%r2231,%r2232,%r2233,%r2234,%r2235,%r2236,%r2237,%r2238,%r2239,%r2240,%r2241,%r2242,%r2243,%r2244,%r2245,%r2246,%r2247,%r2248,%r2249,%r2250,%r2251,%r2252,%r2253,%r2254,%r2255,%r2256,%r2257,%r2258,%r2259,%r2260,%r2261,%r2262,%r2263,%r2264,%r2265,%r2266,%r2267,%r2268,%r2269,%r2270,%r2271,%r2272,%r2273,%r1078,%r1079,%r1080 2026-02-21T09:05:30.9812943Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:05:30.9813142Z // end inline asm 2026-02-21T09:05:30.9813292Z $L__tmp2: 2026-02-21T09:05:30.9813595Z .loc 1 50 76 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:76 2026-02-21T09:05:30.9813971Z add.s32 %r2027, %r368, 40960; 2026-02-21T09:05:30.9814169Z add.s32 %r2028, %r2027, %r2007; 2026-02-21T09:05:30.9814501Z .loc 1 54 28 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:54:28 2026-02-21T09:05:30.9814858Z add.s32 %r2029, %r2028, %r13; 2026-02-21T09:05:30.9815053Z ld.shared.b16 %rs49, [%r2029]; 2026-02-21T09:05:30.9815255Z ld.shared.b16 %rs50, [%r2029+256]; 2026-02-21T09:05:30.9815472Z ld.shared.b16 %rs51, [%r2029+16]; 2026-02-21T09:05:30.9815671Z ld.shared.b16 %rs52, [%r2029+272]; 2026-02-21T09:05:30.9815874Z add.s32 %r2030, %r2028, %r14; 2026-02-21T09:05:30.9816059Z ld.shared.b16 %rs53, [%r2030]; 2026-02-21T09:05:30.9816256Z ld.shared.b16 %rs54, [%r2030+256]; 2026-02-21T09:05:30.9816576Z ld.shared.b16 %rs55, [%r2030+16]; 2026-02-21T09:05:30.9816780Z ld.shared.b16 
%rs56, [%r2030+272]; 2026-02-21T09:05:30.9816983Z cvt.f32.bf16 %r1468, %rs49; 2026-02-21T09:05:30.9817168Z cvt.f32.bf16 %r1469, %rs50; 2026-02-21T09:05:30.9817351Z cvt.f32.bf16 %r1470, %rs53; 2026-02-21T09:05:30.9817532Z cvt.f32.bf16 %r1471, %rs54; 2026-02-21T09:05:30.9817713Z cvt.f32.bf16 %r1728, %rs51; 2026-02-21T09:05:30.9817888Z cvt.f32.bf16 %r1729, %rs52; 2026-02-21T09:05:30.9818068Z cvt.f32.bf16 %r1730, %rs55; 2026-02-21T09:05:30.9818243Z cvt.f32.bf16 %r1731, %rs56; 2026-02-21T09:05:30.9818571Z .loc 1 56 83 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:83 2026-02-21T09:05:30.9818946Z add.s32 %r2031, %r2013, 108544; 2026-02-21T09:05:30.9819272Z .loc 1 69 41 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:69:41 2026-02-21T09:05:30.9819714Z add.s32 %r2032, %r2031, %r15; 2026-02-21T09:05:30.9819901Z ld.shared.b8 %rs57, [%r2032]; 2026-02-21T09:05:30.9820095Z ld.shared.b8 %rs58, [%r2032+256]; 2026-02-21T09:05:30.9820293Z ld.shared.b8 %rs59, [%r2032+512]; 2026-02-21T09:05:30.9820490Z add.s32 %r2033, %r2031, %r16; 2026-02-21T09:05:30.9820677Z ld.shared.b8 %rs60, [%r2033]; 2026-02-21T09:05:30.9820863Z ld.shared.b8 %rs61, [%r2032+1024]; 2026-02-21T09:05:30.9821065Z ld.shared.b8 %rs62, [%r2032+1280]; 2026-02-21T09:05:30.9821259Z ld.shared.b8 %rs63, [%r2032+1536]; 2026-02-21T09:05:30.9821457Z add.s32 %r2034, %r2031, %r17; 2026-02-21T09:05:30.9821639Z ld.shared.b8 %rs64, [%r2034]; 2026-02-21T09:05:30.9821965Z .loc 1 59 24 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:59:24 2026-02-21T09:05:30.9822464Z shl.b16 %rs65, %rs57, 4; 2026-02-21T09:05:30.9822660Z shl.b16 %rs66, %rs58, 4; 2026-02-21T09:05:30.9822847Z shl.b16 %rs67, %rs59, 4; 2026-02-21T09:05:30.9823015Z shl.b16 %rs68, %rs60, 4; 2026-02-21T09:05:30.9823187Z shl.b16 %rs69, %rs61, 4; 2026-02-21T09:05:30.9823352Z shl.b16 %rs70, %rs62, 4; 2026-02-21T09:05:30.9823521Z shl.b16 %rs71, %rs63, 4; 2026-02-21T09:05:30.9823684Z shl.b16 %rs72, %rs64, 4; 2026-02-21T09:05:30.9823993Z .loc 1 
74 54 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:74:54 2026-02-21T09:05:30.9824348Z selp.b16 %rs73, %rs65, %rs57, %p23; 2026-02-21T09:05:30.9824554Z cvt.s16.s8 %rs74, %rs73; 2026-02-21T09:05:30.9824726Z shr.s16 %rs75, %rs74, 4; 2026-02-21T09:05:30.9824901Z selp.b16 %rs76, %rs66, %rs58, %p23; 2026-02-21T09:05:30.9825102Z cvt.s16.s8 %rs77, %rs76; 2026-02-21T09:05:30.9825273Z shr.s16 %rs78, %rs77, 4; 2026-02-21T09:05:30.9825453Z selp.b16 %rs79, %rs67, %rs59, %p23; 2026-02-21T09:05:30.9825649Z cvt.s16.s8 %rs80, %rs79; 2026-02-21T09:05:30.9825899Z shr.s16 %rs81, %rs80, 4; 2026-02-21T09:05:30.9826080Z selp.b16 %rs82, %rs68, %rs60, %p23; 2026-02-21T09:05:30.9826278Z cvt.s16.s8 %rs83, %rs82; 2026-02-21T09:05:30.9826561Z shr.s16 %rs84, %rs83, 4; 2026-02-21T09:05:30.9826757Z selp.b16 %rs85, %rs69, %rs61, %p23; 2026-02-21T09:05:30.9826951Z cvt.s16.s8 %rs86, %rs85; 2026-02-21T09:05:30.9827126Z shr.s16 %rs87, %rs86, 4; 2026-02-21T09:05:30.9827300Z selp.b16 %rs88, %rs70, %rs62, %p23; 2026-02-21T09:05:30.9827497Z cvt.s16.s8 %rs89, %rs88; 2026-02-21T09:05:30.9827669Z shr.s16 %rs90, %rs89, 4; 2026-02-21T09:05:30.9827842Z selp.b16 %rs91, %rs71, %rs63, %p23; 2026-02-21T09:05:30.9828052Z cvt.s16.s8 %rs92, %rs91; 2026-02-21T09:05:30.9828226Z shr.s16 %rs93, %rs92, 4; 2026-02-21T09:05:30.9828504Z selp.b16 %rs94, %rs72, %rs64, %p23; 2026-02-21T09:05:30.9828703Z cvt.s16.s8 %rs95, %rs94; 2026-02-21T09:05:30.9828879Z shr.s16 %rs96, %rs95, 4; 2026-02-21T09:05:30.9829192Z .loc 1 79 28 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:79:28 2026-02-21T09:05:30.9829552Z cvt.rn.f32.s16 %r2035, %rs75; 2026-02-21T09:05:30.9829741Z cvt.rn.f32.s16 %r2036, %rs78; 2026-02-21T09:05:30.9829925Z cvt.rn.f32.s16 %r2037, %rs81; 2026-02-21T09:05:30.9830110Z cvt.rn.f32.s16 %r2038, %rs84; 2026-02-21T09:05:30.9830291Z cvt.rn.f32.s16 %r2039, %rs87; 2026-02-21T09:05:30.9830494Z cvt.rn.f32.s16 %r2040, %rs90; 2026-02-21T09:05:30.9830682Z cvt.rn.f32.s16 %r2041, %rs93; 
2026-02-21T09:05:30.9830873Z cvt.rn.f32.s16 %r2042, %rs96; 2026-02-21T09:05:30.9831049Z bar.sync 0; 2026-02-21T09:05:30.9831204Z st.shared.b32 [%r18], %r2035; 2026-02-21T09:05:30.9831391Z st.shared.b32 [%r18+8], %r2036; 2026-02-21T09:05:30.9831582Z st.shared.b32 [%r19], %r2037; 2026-02-21T09:05:30.9831771Z st.shared.b32 [%r19+8], %r2038; 2026-02-21T09:05:30.9831957Z st.shared.b32 [%r20], %r2039; 2026-02-21T09:05:30.9832142Z st.shared.b32 [%r20+8], %r2040; 2026-02-21T09:05:30.9832330Z st.shared.b32 [%r21], %r2041; 2026-02-21T09:05:30.9832530Z st.shared.b32 [%r21+8], %r2042; 2026-02-21T09:05:30.9832711Z $L__tmp3: 2026-02-21T09:05:30.9833089Z .loc 2 291 36 // standard.py:291:36 @[ cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:86:36 ] 2026-02-21T09:05:30.9833624Z // begin inline asm 2026-02-21T09:05:30.9833816Z fence.proxy.async.shared::cta; 2026-02-21T09:05:30.9834008Z // end inline asm 2026-02-21T09:05:30.9834166Z bar.sync 0; 2026-02-21T09:05:30.9834329Z wgmma.fence.sync.aligned; 2026-02-21T09:05:30.9834511Z // begin inline asm 2026-02-21T09:05:30.9837108Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r2146,%r2147,%r2148,%r2149,%r2150,%r2151,%r2152,%r2153,%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161,%r2162,%r2163,%r2164,%r2165,%r2166,%r2167,%r2168,%r2169,%r2170,%r2171,%r2172,%r2173,%r2174,%r2175,%r2176,%r2177,%r2178,%r2179,%r2180,%r2181,%r2182,%r2183,%r2184,%r2185,%r2186,%r2187,%r2188,%r2189,%r2190,%r2191,%r2192,%r2193,%r2194,%r2195,%r2196,%r2197,%r2198,%r2199,%r2200,%r2201,%r2202,%r2203,%r2204,%r2205,%r2206,%r2207,%r2208,%r2209,%r2210,%r2211,%r2212,%r2213,%r2214,%r2215,%r2216,%r2217,%r2218,%r2219,%r2220,%r2221,%r2222,%r2223,%r2224,%r2225,%r2226,%r2227,%r2228,%r2229,%r2230,%r2231,%r2232,%r2233,%r2234,%r2235,%r2236,%r2237,%r2238,%r2239,%r2240,%r2241,%r2242,%r2243,%r2244,%r2245,%r2246,%r2247,%r2248,%r2249,%r2250,%r2251,%r2252,%r2253,%r2254,%r2255,%r2256,%r2257,%r2258,%r2259,%r2260,%r2261,%r2262,%r2263,%r2264,%r2265,%r2266,%r2267,%r2268,%r2269,%r2270,%r2271,%r2272,%r2273}, {%r1468,%r1469,%r1470,%r1471}, %rd87, %p19, 1, 1; 2026-02-21T09:05:30.9839617Z // end inline asm 2026-02-21T09:05:30.9839777Z // begin inline asm 2026-02-21T09:05:30.9842226Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r2146,%r2147,%r2148,%r2149,%r2150,%r2151,%r2152,%r2153,%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161,%r2162,%r2163,%r2164,%r2165,%r2166,%r2167,%r2168,%r2169,%r2170,%r2171,%r2172,%r2173,%r2174,%r2175,%r2176,%r2177,%r2178,%r2179,%r2180,%r2181,%r2182,%r2183,%r2184,%r2185,%r2186,%r2187,%r2188,%r2189,%r2190,%r2191,%r2192,%r2193,%r2194,%r2195,%r2196,%r2197,%r2198,%r2199,%r2200,%r2201,%r2202,%r2203,%r2204,%r2205,%r2206,%r2207,%r2208,%r2209,%r2210,%r2211,%r2212,%r2213,%r2214,%r2215,%r2216,%r2217,%r2218,%r2219,%r2220,%r2221,%r2222,%r2223,%r2224,%r2225,%r2226,%r2227,%r2228,%r2229,%r2230,%r2231,%r2232,%r2233,%r2234,%r2235,%r2236,%r2237,%r2238,%r2239,%r2240,%r2241,%r2242,%r2243,%r2244,%r2245,%r2246,%r2247,%r2248,%r2249,%r2250,%r2251,%r2252,%r2253,%r2254,%r2255,%r2256,%r2257,%r2258,%r2259,%r2260,%r2261,%r2262,%r2263,%r2264,%r2265,%r2266,%r2267,%r2268,%r2269,%r2270,%r2271,%r2272,%r2273}, {%r1728,%r1729,%r1730,%r1731}, %rd88, %p19, 1, 1; 2026-02-21T09:05:30.9844655Z // end inline asm 2026-02-21T09:05:30.9844831Z wgmma.commit_group.sync.aligned; 2026-02-21T09:05:30.9845032Z mov.b32 %r1862, %r1210; 2026-02-21T09:05:30.9845207Z mov.b32 %r1861, %r1210; 2026-02-21T09:05:30.9845377Z mov.b32 %r1860, %r1209; 2026-02-21T09:05:30.9845563Z // begin inline asm 2026-02-21T09:05:30.9847865Z // wait for regs: 
%r2146,%r2147,%r2148,%r2149,%r2150,%r2151,%r2152,%r2153,%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161,%r2162,%r2163,%r2164,%r2165,%r2166,%r2167,%r2168,%r2169,%r2170,%r2171,%r2172,%r2173,%r2174,%r2175,%r2176,%r2177,%r2178,%r2179,%r2180,%r2181,%r2182,%r2183,%r2184,%r2185,%r2186,%r2187,%r2188,%r2189,%r2190,%r2191,%r2192,%r2193,%r2194,%r2195,%r2196,%r2197,%r2198,%r2199,%r2200,%r2201,%r2202,%r2203,%r2204,%r2205,%r2206,%r2207,%r2208,%r2209,%r2210,%r2211,%r2212,%r2213,%r2214,%r2215,%r2216,%r2217,%r2218,%r2219,%r2220,%r2221,%r2222,%r2223,%r2224,%r2225,%r2226,%r2227,%r2228,%r2229,%r2230,%r2231,%r2232,%r2233,%r2234,%r2235,%r2236,%r2237,%r2238,%r2239,%r2240,%r2241,%r2242,%r2243,%r2244,%r2245,%r2246,%r2247,%r2248,%r2249,%r2250,%r2251,%r2252,%r2253,%r2254,%r2255,%r2256,%r2257,%r2258,%r2259,%r2260,%r2261,%r2262,%r2263,%r2264,%r2265,%r2266,%r2267,%r2268,%r2269,%r2270,%r2271,%r2272,%r2273,%r1860,%r1861,%r1862 2026-02-21T09:05:30.9850103Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:05:30.9850306Z // end inline asm 2026-02-21T09:05:30.9850463Z $L__tmp4: 2026-02-21T09:05:30.9850770Z .loc 1 42 106 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:42:106 2026-02-21T09:05:30.9851246Z add.s32 %r2043, %r2145, 1; 2026-02-21T09:05:30.9851433Z setp.gt.s32 %p26, %r2043, 4; 2026-02-21T09:05:30.9851635Z selp.b32 %r2145, 0, %r2043, %p26; 2026-02-21T09:05:30.9851975Z .loc 1 50 28 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:28 2026-02-21T09:05:30.9852358Z add.s32 %r2044, %r2143, -16; 2026-02-21T09:05:30.9852541Z add.s64 %rd91, %rd100, -32; 2026-02-21T09:05:30.9852740Z mad.wide.s32 %rd92, %r2044, 2, %rd9; 2026-02-21T09:05:30.9853077Z .loc 1 50 76 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:76 2026-02-21T09:05:30.9853437Z shl.b32 %r2045, %r2145, 13; 2026-02-21T09:05:30.9853619Z add.s32 %r2046, %r368, %r2045; 2026-02-21T09:05:30.9853973Z add.s32 %r1994, %r2046, %r9; 2026-02-21T09:05:30.9854160Z selp.b32 %r1995, 8, 0, %p24; 
2026-02-21T09:05:30.9854334Z // begin inline asm 2026-02-21T09:05:30.9854574Z cp.async.ca.shared.global [ %r1994 + 0 ], [ %rd91 + 0 ], 0x8, %r1995; 2026-02-21T09:05:30.9854856Z // end inline asm 2026-02-21T09:05:30.9855019Z add.s32 %r1996, %r1994, 4096; 2026-02-21T09:05:30.9855210Z // begin inline asm 2026-02-21T09:05:30.9855449Z cp.async.ca.shared.global [ %r1996 + 0 ], [ %rd92 + 0 ], 0x8, %r1995; 2026-02-21T09:05:30.9855727Z // end inline asm 2026-02-21T09:05:30.9855885Z cp.async.commit_group; 2026-02-21T09:05:30.9856200Z .loc 1 56 30 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:30 2026-02-21T09:05:30.9856683Z cvt.s64.s32 %rd97, %r2142; 2026-02-21T09:05:30.9856874Z add.s64 %rd93, %rd10, %rd97; 2026-02-21T09:05:30.9857197Z .loc 1 56 83 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:83 2026-02-21T09:05:30.9857559Z shl.b32 %r2047, %r2145, 11; 2026-02-21T09:05:30.9857825Z add.s32 %r1998, %r300, %r2047; 2026-02-21T09:05:30.9858030Z selp.b32 %r1999, 4, 0, %p24; 2026-02-21T09:05:30.9858216Z // begin inline asm 2026-02-21T09:05:30.9858474Z cp.async.ca.shared.global [ %r1998 + 0 ], [ %rd93 + 0 ], 0x4, %r1999; 2026-02-21T09:05:30.9858770Z // end inline asm 2026-02-21T09:05:30.9858933Z cp.async.commit_group; 2026-02-21T09:05:30.9859261Z .loc 1 50 28 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:28 2026-02-21T09:05:30.9859632Z mad.wide.s32 %rd95, %r2143, 2, %rd9; 2026-02-21T09:05:30.9859979Z .loc 1 50 76 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:50:76 2026-02-21T09:05:30.9860341Z add.s32 %r2048, %r2027, %r2045; 2026-02-21T09:05:30.9860547Z add.s32 %r2000, %r2048, %r9; 2026-02-21T09:05:30.9860734Z // begin inline asm 2026-02-21T09:05:30.9860978Z cp.async.ca.shared.global [ %r2000 + 0 ], [ %rd100 + 0 ], 0x8, %r1995; 2026-02-21T09:05:30.9861274Z // end inline asm 2026-02-21T09:05:30.9861433Z add.s32 %r2002, %r2000, 4096; 2026-02-21T09:05:30.9861621Z // begin inline asm 2026-02-21T09:05:30.9861854Z 
cp.async.ca.shared.global [ %r2002 + 0 ], [ %rd95 + 0 ], 0x8, %r1995; 2026-02-21T09:05:30.9862139Z // end inline asm 2026-02-21T09:05:30.9862297Z cp.async.commit_group; 2026-02-21T09:05:30.9862619Z .loc 1 56 58 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:58 2026-02-21T09:05:30.9862990Z cvt.u32.u64 %r2049, %rd101; 2026-02-21T09:05:30.9863178Z add.s32 %r2050, %r2049, 88; 2026-02-21T09:05:30.9863364Z or.b32 %r2051, %r6, %r2050; 2026-02-21T09:05:30.9863538Z shl.b32 %r2052, %r2051, 13; 2026-02-21T09:05:30.9863720Z add.s32 %r2053, %r2052, %r5; 2026-02-21T09:05:30.9864034Z .loc 1 56 30 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:30 2026-02-21T09:05:30.9864390Z cvt.s64.s32 %rd98, %r2053; 2026-02-21T09:05:30.9864578Z add.s64 %rd96, %rd10, %rd98; 2026-02-21T09:05:30.9864896Z .loc 1 56 83 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:56:83 2026-02-21T09:05:30.9865252Z add.s32 %r2004, %r306, %r2047; 2026-02-21T09:05:30.9865519Z // begin inline asm 2026-02-21T09:05:30.9865760Z cp.async.ca.shared.global [ %r2004 + 0 ], [ %rd96 + 0 ], 0x4, %r1999; 2026-02-21T09:05:30.9866037Z // end inline asm 2026-02-21T09:05:30.9866198Z cp.async.commit_group; 2026-02-21T09:05:30.9866670Z .loc 1 42 106 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:42:106 2026-02-21T09:05:30.9867042Z add.s64 %rd100, %rd100, 64; 2026-02-21T09:05:30.9867224Z add.s32 %r2143, %r2143, 32; 2026-02-21T09:05:30.9867403Z add.s32 %r2142, %r2142, 131072; 2026-02-21T09:05:30.9867604Z setp.lt.u64 %p27, %rd101, 496; 2026-02-21T09:05:30.9867789Z add.s64 %rd101, %rd101, 16; 2026-02-21T09:05:30.9867971Z @%p27 bra $L__BB0_1; 2026-02-21T09:05:30.9868127Z // %bb.2: 2026-02-21T09:05:30.9868675Z .loc 1 21 67 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:21:67 2026-02-21T09:05:30.9869043Z cvta.global.u64 %rd99, %rd1; 2026-02-21T09:05:30.9869376Z .loc 1 42 106 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:42:106 2026-02-21T09:05:30.9869743Z 
cp.async.wait_group 0; 2026-02-21T09:05:30.9869913Z bar.sync 0; 2026-02-21T09:05:30.9870202Z .loc 1 89 24 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:89:24 2026-02-21T09:05:30.9870569Z cvt.rn.bf16x2.f32 %r2057, %r2147, %r2146; 2026-02-21T09:05:30.9870801Z cvt.rn.bf16x2.f32 %r2058, %r2149, %r2148; 2026-02-21T09:05:30.9871021Z cvt.rn.bf16x2.f32 %r2059, %r2151, %r2150; 2026-02-21T09:05:30.9871247Z cvt.rn.bf16x2.f32 %r2060, %r2153, %r2152; 2026-02-21T09:05:30.9871469Z cvt.rn.bf16x2.f32 %r2061, %r2155, %r2154; 2026-02-21T09:05:30.9871690Z cvt.rn.bf16x2.f32 %r2062, %r2157, %r2156; 2026-02-21T09:05:30.9871913Z cvt.rn.bf16x2.f32 %r2063, %r2159, %r2158; 2026-02-21T09:05:30.9872213Z cvt.rn.bf16x2.f32 %r2064, %r2161, %r2160; 2026-02-21T09:05:30.9872441Z cvt.rn.bf16x2.f32 %r2065, %r2163, %r2162; 2026-02-21T09:05:30.9872655Z cvt.rn.bf16x2.f32 %r2066, %r2165, %r2164; 2026-02-21T09:05:30.9872878Z cvt.rn.bf16x2.f32 %r2067, %r2167, %r2166; 2026-02-21T09:05:30.9873092Z cvt.rn.bf16x2.f32 %r2068, %r2169, %r2168; 2026-02-21T09:05:30.9873310Z cvt.rn.bf16x2.f32 %r2069, %r2171, %r2170; 2026-02-21T09:05:30.9873531Z cvt.rn.bf16x2.f32 %r2070, %r2173, %r2172; 2026-02-21T09:05:30.9873746Z cvt.rn.bf16x2.f32 %r2071, %r2175, %r2174; 2026-02-21T09:05:30.9873965Z cvt.rn.bf16x2.f32 %r2072, %r2177, %r2176; 2026-02-21T09:05:30.9874178Z cvt.rn.bf16x2.f32 %r2073, %r2179, %r2178; 2026-02-21T09:05:30.9874399Z cvt.rn.bf16x2.f32 %r2074, %r2181, %r2180; 2026-02-21T09:05:30.9874614Z cvt.rn.bf16x2.f32 %r2075, %r2183, %r2182; 2026-02-21T09:05:30.9874832Z cvt.rn.bf16x2.f32 %r2076, %r2185, %r2184; 2026-02-21T09:05:30.9875047Z cvt.rn.bf16x2.f32 %r2077, %r2187, %r2186; 2026-02-21T09:05:30.9875277Z cvt.rn.bf16x2.f32 %r2078, %r2189, %r2188; 2026-02-21T09:05:30.9875497Z cvt.rn.bf16x2.f32 %r2079, %r2191, %r2190; 2026-02-21T09:05:30.9875713Z cvt.rn.bf16x2.f32 %r2080, %r2193, %r2192; 2026-02-21T09:05:30.9875946Z cvt.rn.bf16x2.f32 %r2081, %r2195, %r2194; 2026-02-21T09:05:30.9876166Z cvt.rn.bf16x2.f32 
%r2082, %r2197, %r2196; 2026-02-21T09:05:30.9876389Z cvt.rn.bf16x2.f32 %r2083, %r2199, %r2198; 2026-02-21T09:05:30.9876754Z cvt.rn.bf16x2.f32 %r2084, %r2201, %r2200; 2026-02-21T09:05:30.9876979Z cvt.rn.bf16x2.f32 %r2085, %r2203, %r2202; 2026-02-21T09:05:30.9877199Z cvt.rn.bf16x2.f32 %r2086, %r2205, %r2204; 2026-02-21T09:05:30.9877424Z cvt.rn.bf16x2.f32 %r2087, %r2207, %r2206; 2026-02-21T09:05:30.9877645Z cvt.rn.bf16x2.f32 %r2088, %r2209, %r2208; 2026-02-21T09:05:30.9877859Z cvt.rn.bf16x2.f32 %r2089, %r2211, %r2210; 2026-02-21T09:05:30.9878081Z cvt.rn.bf16x2.f32 %r2090, %r2213, %r2212; 2026-02-21T09:05:30.9878293Z cvt.rn.bf16x2.f32 %r2091, %r2215, %r2214; 2026-02-21T09:05:30.9878519Z cvt.rn.bf16x2.f32 %r2092, %r2217, %r2216; 2026-02-21T09:05:30.9878735Z cvt.rn.bf16x2.f32 %r2093, %r2219, %r2218; 2026-02-21T09:05:30.9878956Z cvt.rn.bf16x2.f32 %r2094, %r2221, %r2220; 2026-02-21T09:05:30.9879273Z cvt.rn.bf16x2.f32 %r2095, %r2223, %r2222; 2026-02-21T09:05:30.9879488Z cvt.rn.bf16x2.f32 %r2096, %r2225, %r2224; 2026-02-21T09:05:30.9879706Z cvt.rn.bf16x2.f32 %r2097, %r2227, %r2226; 2026-02-21T09:05:30.9879920Z cvt.rn.bf16x2.f32 %r2098, %r2229, %r2228; 2026-02-21T09:05:30.9880138Z cvt.rn.bf16x2.f32 %r2099, %r2231, %r2230; 2026-02-21T09:05:30.9880352Z cvt.rn.bf16x2.f32 %r2100, %r2233, %r2232; 2026-02-21T09:05:30.9880572Z cvt.rn.bf16x2.f32 %r2101, %r2235, %r2234; 2026-02-21T09:05:30.9880787Z cvt.rn.bf16x2.f32 %r2102, %r2237, %r2236; 2026-02-21T09:05:30.9881009Z cvt.rn.bf16x2.f32 %r2103, %r2239, %r2238; 2026-02-21T09:05:30.9881226Z cvt.rn.bf16x2.f32 %r2104, %r2241, %r2240; 2026-02-21T09:05:30.9881456Z cvt.rn.bf16x2.f32 %r2105, %r2243, %r2242; 2026-02-21T09:05:30.9881815Z cvt.rn.bf16x2.f32 %r2106, %r2245, %r2244; 2026-02-21T09:05:30.9882035Z cvt.rn.bf16x2.f32 %r2107, %r2247, %r2246; 2026-02-21T09:05:30.9882255Z cvt.rn.bf16x2.f32 %r2108, %r2249, %r2248; 2026-02-21T09:05:30.9882469Z cvt.rn.bf16x2.f32 %r2109, %r2251, %r2250; 2026-02-21T09:05:30.9882698Z cvt.rn.bf16x2.f32 %r2110, 
%r2253, %r2252; 2026-02-21T09:05:30.9882915Z cvt.rn.bf16x2.f32 %r2111, %r2255, %r2254; 2026-02-21T09:05:30.9883151Z cvt.rn.bf16x2.f32 %r2112, %r2257, %r2256; 2026-02-21T09:05:30.9883373Z cvt.rn.bf16x2.f32 %r2113, %r2259, %r2258; 2026-02-21T09:05:30.9883587Z cvt.rn.bf16x2.f32 %r2114, %r2261, %r2260; 2026-02-21T09:05:30.9883807Z cvt.rn.bf16x2.f32 %r2115, %r2263, %r2262; 2026-02-21T09:05:30.9884023Z cvt.rn.bf16x2.f32 %r2116, %r2265, %r2264; 2026-02-21T09:05:30.9884243Z cvt.rn.bf16x2.f32 %r2117, %r2267, %r2266; 2026-02-21T09:05:30.9884467Z cvt.rn.bf16x2.f32 %r2118, %r2269, %r2268; 2026-02-21T09:05:30.9884697Z cvt.rn.bf16x2.f32 %r2119, %r2271, %r2270; 2026-02-21T09:05:30.9884915Z cvt.rn.bf16x2.f32 %r2120, %r2273, %r2272; 2026-02-21T09:05:30.9885346Z .loc 1 90 39 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:90:39 2026-02-21T09:05:30.9885715Z shl.b32 %r2121, %r1, 7; 2026-02-21T09:05:30.9885893Z and.b32 %r2122, %r2121, 1920; 2026-02-21T09:05:30.9886081Z shl.b32 %r2123, %r1, 6; 2026-02-21T09:05:30.9886254Z and.b32 %r2124, %r2123, 30720; 2026-02-21T09:05:30.9886445Z and.b32 %r2125, %r12, 112; 2026-02-21T09:05:30.9886764Z or.b32 %r2126, %r2122, %r2125; 2026-02-21T09:05:30.9886956Z xor.b32 %r2127, %r2126, %r8; 2026-02-21T09:05:30.9887141Z or.b32 %r2128, %r2127, %r2124; 2026-02-21T09:05:30.9887331Z add.s32 %r2130, %r368, %r2128; 2026-02-21T09:05:30.9887664Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2130], {%r2057, %r2058, %r2059, %r2060}; 2026-02-21T09:05:30.9888164Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2130+32768], {%r2073, %r2074, %r2075, %r2076}; 2026-02-21T09:05:30.9888680Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2130+65536], {%r2089, %r2090, %r2091, %r2092}; 2026-02-21T09:05:30.9889172Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2130+98304], {%r2105, %r2106, %r2107, %r2108}; 2026-02-21T09:05:30.9889531Z xor.b32 %r2131, %r2128, 32; 2026-02-21T09:05:30.9889717Z add.s32 %r2132, %r368, %r2131; 2026-02-21T09:05:30.9890033Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2132], {%r2061, %r2062, %r2063, %r2064}; 2026-02-21T09:05:30.9890513Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2132+32768], {%r2077, %r2078, %r2079, %r2080}; 2026-02-21T09:05:30.9890991Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2132+65536], {%r2093, %r2094, %r2095, %r2096}; 2026-02-21T09:05:30.9891475Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2132+98304], {%r2109, %r2110, %r2111, %r2112}; 2026-02-21T09:05:30.9891821Z xor.b32 %r2133, %r2128, 64; 2026-02-21T09:05:30.9892021Z add.s32 %r2134, %r368, %r2133; 2026-02-21T09:05:30.9892327Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2134], {%r2065, %r2066, %r2067, %r2068}; 2026-02-21T09:05:30.9892812Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2134+32768], {%r2081, %r2082, %r2083, %r2084}; 2026-02-21T09:05:30.9893289Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2134+65536], {%r2097, %r2098, %r2099, %r2100}; 2026-02-21T09:05:30.9893849Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2134+98304], {%r2113, %r2114, %r2115, %r2116}; 2026-02-21T09:05:30.9894194Z xor.b32 %r2135, %r2128, 96; 2026-02-21T09:05:30.9894368Z add.s32 %r2136, %r368, %r2135; 2026-02-21T09:05:30.9894670Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2136], {%r2069, %r2070, %r2071, %r2072}; 2026-02-21T09:05:30.9895146Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2136+32768], {%r2085, %r2086, %r2087, %r2088}; 2026-02-21T09:05:30.9895619Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2136+65536], {%r2101, %r2102, %r2103, %r2104}; 2026-02-21T09:05:30.9896101Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2136+98304], {%r2117, %r2118, %r2119, %r2120}; 2026-02-21T09:05:30.9896442Z // begin inline asm 2026-02-21T09:05:30.9896906Z fence.proxy.async.shared::cta; 2026-02-21T09:05:30.9897107Z // end inline asm 2026-02-21T09:05:30.9897273Z bar.sync 0; 2026-02-21T09:05:30.9897426Z elect.sync %r2137|%p29, -1; 2026-02-21T09:05:30.9897637Z shfl.sync.idx.b32 %r2138, %r3, 0, 31, -1; 2026-02-21T09:05:30.9897863Z setp.lt.u32 %p30, 
%r1, 128; 2026-02-21T09:05:30.9898053Z and.pred %p28, %p30, %p29; 2026-02-21T09:05:30.9898237Z and.b32 %r2139, %r2138, 3; 2026-02-21T09:05:30.9898411Z shl.b32 %r2140, %r2139, 15; 2026-02-21T09:05:30.9898592Z add.s32 %r2056, %r368, %r2140; 2026-02-21T09:05:30.9898773Z shl.b32 %r2141, %r2139, 6; 2026-02-21T09:05:30.9898950Z or.b32 %r2054, %r2141, %r4; 2026-02-21T09:05:30.9899123Z // begin inline asm 2026-02-21T09:05:30.9899471Z @%p28 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd99, {%r2054, %r2055}], [%r2056]; 2026-02-21T09:05:30.9899534Z // end inline asm 2026-02-21T09:05:30.9899616Z cp.async.bulk.commit_group; 2026-02-21T09:05:30.9899701Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:05:30.9899763Z bar.sync 0; 2026-02-21T09:05:30.9900054Z .loc 1 90 4 // cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py:90:4 2026-02-21T09:05:30.9900121Z ret; 2026-02-21T09:05:30.9900182Z $L__tmp5: 2026-02-21T09:05:30.9900242Z $L__func_end0: 2026-02-21T09:05:30.9900339Z // -- End function 2026-02-21T09:05:30.9900395Z } 2026-02-21T09:05:30.9900646Z .file 1 "/tmp/torchinductor_root/qg/cqgnx3cd735q6yjbnnqq5x2k4czlkaanyb74cc7qtnraxlhbshbs.py" 2026-02-21T09:05:30.9900869Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T09:05:30.9900937Z .section .debug_abbrev 2026-02-21T09:05:30.9900994Z { 2026-02-21T09:05:30.9901092Z .b8 1 // Abbreviation Code 2026-02-21T09:05:30.9901198Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:05:30.9901286Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:05:30.9901382Z .b8 37 // DW_AT_producer 2026-02-21T09:05:30.9901524Z .b8 8 // DW_FORM_string 2026-02-21T09:05:30.9901611Z .b8 19 // DW_AT_language 2026-02-21T09:05:30.9901707Z .b8 5 // DW_FORM_data2 2026-02-21T09:05:30.9901788Z .b8 3 // DW_AT_name 2026-02-21T09:05:30.9901870Z .b8 8 // DW_FORM_string 2026-02-21T09:05:30.9901959Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:05:30.9902039Z .b8 6 // DW_FORM_data4 2026-02-21T09:05:30.9902121Z .b8 27 // 
DW_AT_comp_dir 2026-02-21T09:05:30.9902210Z .b8 8 // DW_FORM_string 2026-02-21T09:05:30.9902285Z .b8 0 // EOM(1) 2026-02-21T09:05:30.9902356Z .b8 0 // EOM(2) 2026-02-21T09:05:30.9902456Z .b8 2 // Abbreviation Code 2026-02-21T09:05:30.9902547Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:05:30.9902629Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:05:30.9902845Z .b8 3 // DW_AT_name 2026-02-21T09:05:30.9902938Z .b8 8 // DW_FORM_string 2026-02-21T09:05:30.9903023Z .b8 32 // DW_AT_inline 2026-02-21T09:05:30.9903105Z .b8 11 // DW_FORM_data1 2026-02-21T09:05:30.9903184Z .b8 0 // EOM(1) 2026-02-21T09:05:30.9903256Z .b8 0 // EOM(2) 2026-02-21T09:05:30.9903346Z .b8 3 // Abbreviation Code 2026-02-21T09:05:30.9903438Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:05:30.9903523Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:05:30.9903704Z .b8 17 // DW_AT_low_pc 2026-02-21T09:05:30.9903789Z .b8 1 // DW_FORM_addr 2026-02-21T09:05:30.9903885Z .b8 18 // DW_AT_high_pc 2026-02-21T09:05:30.9903968Z .b8 1 // DW_FORM_addr 2026-02-21T09:05:30.9904063Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:05:30.9904158Z .b8 19 // DW_FORM_ref4 2026-02-21T09:05:30.9904235Z .b8 0 // EOM(1) 2026-02-21T09:05:30.9904311Z .b8 0 // EOM(2) 2026-02-21T09:05:30.9904407Z .b8 4 // Abbreviation Code 2026-02-21T09:05:30.9904511Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T09:05:30.9904596Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:05:30.9904696Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:05:30.9904829Z .b8 19 // DW_FORM_ref4 2026-02-21T09:05:30.9904914Z .b8 17 // DW_AT_low_pc 2026-02-21T09:05:30.9904993Z .b8 1 // DW_FORM_addr 2026-02-21T09:05:30.9905085Z .b8 18 // DW_AT_high_pc 2026-02-21T09:05:30.9905162Z .b8 1 // DW_FORM_addr 2026-02-21T09:05:30.9905249Z .b8 88 // DW_AT_call_file 2026-02-21T09:05:30.9905334Z .b8 11 // DW_FORM_data1 2026-02-21T09:05:30.9905418Z .b8 89 // DW_AT_call_line 2026-02-21T09:05:30.9905497Z .b8 11 // DW_FORM_data1 2026-02-21T09:05:30.9905589Z .b8 87 // DW_AT_call_column 2026-02-21T09:05:30.9905668Z .b8 
11 // DW_FORM_data1 2026-02-21T09:05:30.9905747Z .b8 0 // EOM(1) 2026-02-21T09:05:30.9905825Z .b8 0 // EOM(2) 2026-02-21T09:05:30.9905901Z .b8 0 // EOM(3) 2026-02-21T09:05:30.9905955Z } 2026-02-21T09:05:30.9906021Z .section .debug_info 2026-02-21T09:05:30.9906078Z { 2026-02-21T09:05:30.9906174Z .b32 178 // Length of Unit 2026-02-21T09:05:30.9906268Z .b8 2 // DWARF version number 2026-02-21T09:05:30.9906323Z .b8 0 2026-02-21T09:05:30.9906608Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:05:30.9906715Z .b8 8 // Address Size (in bytes) 2026-02-21T09:05:30.9906834Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T09:05:30.9906927Z .b8 116 // DW_AT_producer 2026-02-21T09:05:30.9906995Z .b8 114 2026-02-21T09:05:30.9907052Z .b8 105 2026-02-21T09:05:30.9907115Z .b8 116 2026-02-21T09:05:30.9907171Z .b8 111 2026-02-21T09:05:30.9907227Z .b8 110 2026-02-21T09:05:30.9907280Z .b8 0 2026-02-21T09:05:30.9907368Z .b8 2 // DW_AT_language 2026-02-21T09:05:30.9907505Z .b8 0 2026-02-21T09:05:30.9907588Z .b8 99 // DW_AT_name 2026-02-21T09:05:30.9907646Z .b8 113 2026-02-21T09:05:30.9907700Z .b8 103 2026-02-21T09:05:30.9907753Z .b8 110 2026-02-21T09:05:30.9907806Z .b8 120 2026-02-21T09:05:30.9907864Z .b8 51 2026-02-21T09:05:30.9907916Z .b8 99 2026-02-21T09:05:30.9907969Z .b8 100 2026-02-21T09:05:30.9908030Z .b8 55 2026-02-21T09:05:30.9908082Z .b8 51 2026-02-21T09:05:30.9908134Z .b8 53 2026-02-21T09:05:30.9908188Z .b8 113 2026-02-21T09:05:30.9908340Z .b8 54 2026-02-21T09:05:30.9908398Z .b8 121 2026-02-21T09:05:30.9908453Z .b8 106 2026-02-21T09:05:30.9908505Z .b8 98 2026-02-21T09:05:30.9908567Z .b8 110 2026-02-21T09:05:30.9908621Z .b8 110 2026-02-21T09:05:30.9908673Z .b8 113 2026-02-21T09:05:30.9908812Z .b8 113 2026-02-21T09:05:30.9908864Z .b8 53 2026-02-21T09:05:30.9908978Z .b8 120 2026-02-21T09:05:30.9909038Z .b8 50 2026-02-21T09:05:30.9909098Z .b8 107 2026-02-21T09:05:30.9909151Z .b8 52 2026-02-21T09:05:30.9909203Z .b8 99 2026-02-21T09:05:30.9909263Z .b8 122 
2026-02-21T09:05:30.9909318Z .b8 108 2026-02-21T09:05:30.9909371Z .b8 107 2026-02-21T09:05:30.9909423Z .b8 97 2026-02-21T09:05:30.9909482Z .b8 97 2026-02-21T09:05:30.9909536Z .b8 110 2026-02-21T09:05:30.9909588Z .b8 121 2026-02-21T09:05:30.9909641Z .b8 98 2026-02-21T09:05:30.9909700Z .b8 55 2026-02-21T09:05:30.9909753Z .b8 52 2026-02-21T09:05:30.9909806Z .b8 99 2026-02-21T09:05:30.9909875Z .b8 99 2026-02-21T09:05:30.9909931Z .b8 55 2026-02-21T09:05:30.9909985Z .b8 113 2026-02-21T09:05:30.9910039Z .b8 116 2026-02-21T09:05:30.9910099Z .b8 110 2026-02-21T09:05:30.9910153Z .b8 114 2026-02-21T09:05:30.9910206Z .b8 97 2026-02-21T09:05:30.9910266Z .b8 120 2026-02-21T09:05:30.9910321Z .b8 108 2026-02-21T09:05:30.9910375Z .b8 104 2026-02-21T09:05:30.9910428Z .b8 98 2026-02-21T09:05:30.9910492Z .b8 115 2026-02-21T09:05:30.9910548Z .b8 104 2026-02-21T09:05:30.9910670Z .b8 98 2026-02-21T09:05:30.9910739Z .b8 115 2026-02-21T09:05:30.9910792Z .b8 46 2026-02-21T09:05:30.9910846Z .b8 112 2026-02-21T09:05:30.9910902Z .b8 121 2026-02-21T09:05:30.9910961Z .b8 0 2026-02-21T09:05:30.9911069Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:05:30.9911154Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:05:30.9911213Z .b8 116 2026-02-21T09:05:30.9911267Z .b8 109 2026-02-21T09:05:30.9911320Z .b8 112 2026-02-21T09:05:30.9911373Z .b8 47 2026-02-21T09:05:30.9911431Z .b8 116 2026-02-21T09:05:30.9911483Z .b8 111 2026-02-21T09:05:30.9911537Z .b8 114 2026-02-21T09:05:30.9911591Z .b8 99 2026-02-21T09:05:30.9911648Z .b8 104 2026-02-21T09:05:30.9911703Z .b8 105 2026-02-21T09:05:30.9911756Z .b8 110 2026-02-21T09:05:30.9911814Z .b8 100 2026-02-21T09:05:30.9911868Z .b8 117 2026-02-21T09:05:30.9911920Z .b8 99 2026-02-21T09:05:30.9911972Z .b8 116 2026-02-21T09:05:30.9912037Z .b8 111 2026-02-21T09:05:30.9912092Z .b8 114 2026-02-21T09:05:30.9912146Z .b8 95 2026-02-21T09:05:30.9912204Z .b8 114 2026-02-21T09:05:30.9912256Z .b8 111 2026-02-21T09:05:30.9912309Z .b8 111 2026-02-21T09:05:30.9912364Z .b8 116 
2026-02-21T09:05:30.9912421Z .b8 47 2026-02-21T09:05:30.9912474Z .b8 113 2026-02-21T09:05:30.9912527Z .b8 103 2026-02-21T09:05:30.9912578Z .b8 0 2026-02-21T09:05:30.9912707Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T09:05:30.9912787Z .b8 95 // DW_AT_name 2026-02-21T09:05:30.9912854Z .b8 104 2026-02-21T09:05:30.9912915Z .b8 101 2026-02-21T09:05:30.9912968Z .b8 108 2026-02-21T09:05:30.9913021Z .b8 105 2026-02-21T09:05:30.9913082Z .b8 111 2026-02-21T09:05:30.9913136Z .b8 110 2026-02-21T09:05:30.9913189Z .b8 95 2026-02-21T09:05:30.9913242Z .b8 109 2026-02-21T09:05:30.9913300Z .b8 97 2026-02-21T09:05:30.9913353Z .b8 116 2026-02-21T09:05:30.9913406Z .b8 109 2026-02-21T09:05:30.9913462Z .b8 117 2026-02-21T09:05:30.9913524Z .b8 108 2026-02-21T09:05:30.9913576Z .b8 95 2026-02-21T09:05:30.9913630Z .b8 98 2026-02-21T09:05:30.9913688Z .b8 102 2026-02-21T09:05:30.9913740Z .b8 49 2026-02-21T09:05:30.9913793Z .b8 54 2026-02-21T09:05:30.9913917Z .b8 95 2026-02-21T09:05:30.9913978Z .b8 105 2026-02-21T09:05:30.9914031Z .b8 110 2026-02-21T09:05:30.9914083Z .b8 116 2026-02-21T09:05:30.9914141Z .b8 52 2026-02-21T09:05:30.9914194Z .b8 0 2026-02-21T09:05:30.9914276Z .b8 1 // DW_AT_inline 2026-02-21T09:05:30.9914386Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T09:05:30.9914498Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T09:05:30.9914596Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T09:05:30.9914700Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:05:30.9914838Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T09:05:30.9915045Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:05:30.9915149Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T09:05:30.9915247Z .b64 $L__tmp4 // DW_AT_high_pc 2026-02-21T09:05:30.9915339Z .b8 1 // DW_AT_call_file 2026-02-21T09:05:30.9915423Z .b8 86 // DW_AT_call_line 2026-02-21T09:05:30.9915511Z .b8 36 // DW_AT_call_column 2026-02-21T09:05:30.9915611Z .b8 0 // End Of Children Mark 2026-02-21T09:05:30.9915698Z .b8 0 
// End Of Children Mark 2026-02-21T09:05:30.9915751Z } 2026-02-21T09:05:30.9915829Z .section .debug_macinfo { } 2026-02-21T09:05:30.9915836Z 2026-02-21T09:05:30.9915917Z ================================================================ 2026-02-21T09:05:30.9916035Z please share the reproducer above with Triton project. 2026-02-21T09:05:31.5362498Z [276s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 64, 256], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['', ''], loop_orders=[[0, 1]], num_stages=7, num_warps=4, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 3], range_warp_specializes=[]) 2026-02-21T09:05:31.5365342Z Tensor-likes are not close! 2026-02-21T09:05:31.5365624Z 2026-02-21T09:05:31.5365810Z Mismatched elements: 33485594 / 33554432 (99.8%) 2026-02-21T09:05:31.5367053Z Greatest absolute difference: 2384.0 at index (274, 6497) (up to 0.01 allowed) 2026-02-21T09:05:31.5367930Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:05:31.5368688Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:05:31.5369104Z 2026-02-21T09:05:36.0002346Z [281s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 64, 64], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=128, num_stages=4, num_warps=1, pid_type='persistent_interleaved', range_flattens=[True, None], range_multi_buffers=[False, True], range_num_stages=[3, 3], range_unroll_factors=[1, 0], range_warp_specializes=[]) 2026-02-21T09:05:36.0004260Z Tensor-likes are not close! 
2026-02-21T09:05:36.0004422Z 2026-02-21T09:05:36.0004532Z Mismatched elements: 33448156 / 33554432 (99.7%) 2026-02-21T09:05:36.0004959Z Greatest absolute difference: 1416.0 at index (1834, 910) (up to 0.01 allowed) 2026-02-21T09:05:36.0005458Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:05:36.0005889Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:05:36.0006127Z 2026-02-21T09:05:37.9483545Z [283s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 256, 32], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['', ''], loop_orders=[[1, 0]], num_stages=1, num_warps=2, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 4], range_warp_specializes=[]) 2026-02-21T09:05:37.9485765Z Tensor-likes are not close! 2026-02-21T09:05:37.9485928Z 2026-02-21T09:05:37.9486038Z Mismatched elements: 33485998 / 33554432 (99.8%) 2026-02-21T09:05:37.9486440Z Greatest absolute difference: 2384.0 at index (1818, 910) (up to 0.01 allowed) 2026-02-21T09:05:37.9487287Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:05:37.9487730Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:05:37.9487968Z 2026-02-21T09:05:41.0642339Z 2026-02-21T09:05:41.0642513Z 2026-02-21T09:05:41.0642524Z 2026-02-21T09:05:41.0642884Z ================================================================ 2026-02-21T09:05:41.0643699Z Internal Triton PTX codegen error 2026-02-21T09:05:41.0643970Z `ptxas` stderr: 2026-02-21T09:05:41.0644684Z ptxas fatal : (C7602) Insufficient registers (64) to compile instruction at line 801 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 
2026-02-21T09:05:41.0645526Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:05:41.0645762Z 2026-02-21T09:05:41.0646420Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpfrxler8o.ptx -o /tmp/tmpfrxler8o.ptx.o 2026-02-21T09:05:41.0647335Z 2026-02-21T09:05:41.0647340Z 2026-02-21T09:05:41.0647413Z // 2026-02-21T09:05:41.0647582Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:05:41.0647817Z // 2026-02-21T09:05:41.0647901Z 2026-02-21T09:05:41.0647966Z .version 8.7 2026-02-21T09:05:41.0648135Z .target sm_90a 2026-02-21T09:05:41.0648299Z .address_size 64 2026-02-21T09:05:41.0648408Z 2026-02-21T09:05:41.0648754Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T09:05:41.0649162Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:05:41.0649467Z // @_helion_matmul_bf16_int4 2026-02-21T09:05:41.0649781Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T09:05:41.0650116Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T09:05:41.0650830Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T09:05:41.0651220Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T09:05:41.0651635Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T09:05:41.0652031Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T09:05:41.0652338Z ) 2026-02-21T09:05:41.0652490Z .reqntid 128 2026-02-21T09:05:41.0652650Z .maxnreg 64 2026-02-21T09:05:41.0652810Z { 2026-02-21T09:05:41.0652959Z .reg .pred %p<168>; 2026-02-21T09:05:41.0653159Z .reg .b16 %rs<1290>; 2026-02-21T09:05:41.0653342Z .reg .b32 %r<30123>; 2026-02-21T09:05:41.0669743Z .reg .b64 %rd<308>; 2026-02-21T09:05:41.0670110Z .loc 1 19 0 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:19:0 2026-02-21T09:05:41.0670503Z $L__func_begin0: 
2026-02-21T09:05:41.0670837Z .loc 1 19 0 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:19:0 2026-02-21T09:05:41.0671137Z 2026-02-21T09:05:41.0671198Z // %bb.0: 2026-02-21T09:05:41.0671416Z ld.param.b64 %rd45, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T09:05:41.0671729Z ld.param.b64 %rd44, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T09:05:41.0671992Z $L__tmp0: 2026-02-21T09:05:41.0672292Z .loc 1 21 67 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:21:67 2026-02-21T09:05:41.0672667Z mov.u32 %r29094, %ctaid.x; 2026-02-21T09:05:41.0672922Z ld.param.b64 %rd47, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T09:05:41.0673192Z mov.u32 %r2198, %ctaid.y; 2026-02-21T09:05:41.0673437Z ld.param.b64 %rd64, [_helion_matmul_bf16_int4_param_3]; 2026-02-21T09:05:41.0673683Z mov.u32 %r2199, %ctaid.z; 2026-02-21T09:05:41.0673865Z mov.u32 %r2200, %nctaid.x; 2026-02-21T09:05:41.0674269Z mov.u32 %r2201, %nctaid.y; 2026-02-21T09:05:41.0674469Z mad.lo.s32 %r2202, %r2199, %r2201, %r2198; 2026-02-21T09:05:41.0674700Z mad.lo.s32 %r2203, %r2202, %r2200, %r29094; 2026-02-21T09:05:41.0674916Z shl.b32 %r2204, %r2203, 7; 2026-02-21T09:05:41.0675105Z cvt.s64.s32 %rd65, %r2204; 2026-02-21T09:05:41.0675288Z add.s64 %rd61, %rd64, %rd65; 2026-02-21T09:05:41.0675492Z mov.u32 %r2, %tid.x; 2026-02-21T09:05:41.0675667Z setp.lt.u32 %p1, %r2, 32; 2026-02-21T09:05:41.0675859Z shl.b32 %r2205, %r2, 2; 2026-02-21T09:05:41.0676039Z mov.b32 %r24001, global_smem; 2026-02-21T09:05:41.0676237Z add.s32 %r2190, %r24001, %r2205; 2026-02-21T09:05:41.0676423Z mov.b32 %r2191, 0; 2026-02-21T09:05:41.0676774Z // begin inline asm 2026-02-21T09:05:41.0677163Z @%p1 st.shared.b32 [ %r2190 + 0 ], %r2191; 2026-02-21T09:05:41.0677387Z // end inline asm 2026-02-21T09:05:41.0677557Z bar.warp.sync -1; 2026-02-21T09:05:41.0677725Z setp.eq.b32 %p2, %r2, 0; 2026-02-21T09:05:41.0677917Z cvt.u64.u32 %rd46, %r24001; 2026-02-21T09:05:41.0678098Z // begin inline asm 2026-02-21T09:05:41.0678446Z @%p2 
tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd46 + 0 ], %rd47; 2026-02-21T09:05:41.0678799Z // end inline asm 2026-02-21T09:05:41.0678958Z // begin inline asm 2026-02-21T09:05:41.0679231Z @%p2 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd46 + 0 ], 0x1; 2026-02-21T09:05:41.0679542Z // end inline asm 2026-02-21T09:05:41.0679708Z mov.b32 %r2192, 64; 2026-02-21T09:05:41.0679868Z // begin inline asm 2026-02-21T09:05:41.0680155Z @%p2 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd46 + 0 ], 0x0, %r2192; 2026-02-21T09:05:41.0680485Z // end inline asm 2026-02-21T09:05:41.0680638Z mov.b32 %r2193, 256; 2026-02-21T09:05:41.0680802Z // begin inline asm 2026-02-21T09:05:41.0681173Z @%p2 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd46 + 0 ], 0x1, %r2193; 2026-02-21T09:05:41.0681517Z // end inline asm 2026-02-21T09:05:41.0681670Z mov.b32 %r2194, 8192; 2026-02-21T09:05:41.0681839Z // begin inline asm 2026-02-21T09:05:41.0682151Z @%p2 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd46 + 0 ], 0x0, %r2194; 2026-02-21T09:05:41.0682511Z // end inline asm 2026-02-21T09:05:41.0682663Z mov.b32 %r2195, 4096; 2026-02-21T09:05:41.0682830Z // begin inline asm 2026-02-21T09:05:41.0683123Z @%p2 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd46 + 0 ], 0x1, %r2195; 2026-02-21T09:05:41.0683463Z // end inline asm 2026-02-21T09:05:41.0683618Z mov.b64 %rd54, 16384; 2026-02-21T09:05:41.0683775Z // begin inline asm 2026-02-21T09:05:41.0684081Z @%p2 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd46 + 0 ], 0x0, %rd54; 2026-02-21T09:05:41.0684430Z // end inline asm 2026-02-21T09:05:41.0684585Z mov.b32 %r2196, 1; 2026-02-21T09:05:41.0684734Z // begin inline asm 2026-02-21T09:05:41.0685048Z @%p2 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd46 + 0 ], 0x0, %r2196; 2026-02-21T09:05:41.0685430Z // end inline asm 2026-02-21T09:05:41.0685583Z // begin inline asm 2026-02-21T09:05:41.0685908Z @%p2 
tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd46 + 0 ], 0x1, %r2196; 2026-02-21T09:05:41.0686271Z // end inline asm 2026-02-21T09:05:41.0686445Z // begin inline asm 2026-02-21T09:05:41.0686896Z @%p2 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd46 + 0 ], 0xa; 2026-02-21T09:05:41.0687246Z // end inline asm 2026-02-21T09:05:41.0687400Z // begin inline asm 2026-02-21T09:05:41.0687711Z @%p2 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd46 + 0 ], 0x0; 2026-02-21T09:05:41.0688073Z // end inline asm 2026-02-21T09:05:41.0688223Z // begin inline asm 2026-02-21T09:05:41.0688515Z @%p2 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd46 + 0 ], 0x3; 2026-02-21T09:05:41.0688848Z // end inline asm 2026-02-21T09:05:41.0689010Z // begin inline asm 2026-02-21T09:05:41.0689279Z @%p2 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd46 + 0 ], 0x0; 2026-02-21T09:05:41.0689700Z // end inline asm 2026-02-21T09:05:41.0689858Z // begin inline asm 2026-02-21T09:05:41.0690285Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd61 + 0 ], [ %rd46 + 0 ], 0x80; 2026-02-21T09:05:41.0690766Z // end inline asm 2026-02-21T09:05:41.0690914Z // begin inline asm 2026-02-21T09:05:41.0691169Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd61 + 0 ], 0x80; 2026-02-21T09:05:41.0691488Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:05:41.0691714Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:05:41.0691923Z // end inline asm 2026-02-21T09:05:41.0692072Z bar.sync 0; 2026-02-21T09:05:41.0692236Z cvta.global.u64 %rd183, %rd61; 2026-02-21T09:05:41.0692709Z .loc 1 0 0 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:0 2026-02-21T09:05:41.0693065Z sub.s32 %r2207, 2079, %r29094; 2026-02-21T09:05:41.0693402Z .loc 1 26 144 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:26:144 2026-02-21T09:05:41.0693777Z mul.hi.u32 %r2208, %r2207, 1041204193; 
2026-02-21T09:05:41.0693980Z shr.u32 %r2209, %r2208, 8; 2026-02-21T09:05:41.0694167Z mul.hi.u32 %r2210, %r2209, 1431655766; 2026-02-21T09:05:41.0694386Z mad.lo.s32 %r29866, %r2210, 3168, %r29094; 2026-02-21T09:05:41.0694737Z .loc 1 38 45 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:38:45 2026-02-21T09:05:41.0695096Z shr.u32 %r4, %r2, 5; 2026-02-21T09:05:41.0695261Z bfe.u32 %r5, %r2, 1, 6; 2026-02-21T09:05:41.0695581Z .loc 1 40 45 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:40:45 2026-02-21T09:05:41.0695936Z and.b32 %r6, %r2, 15; 2026-02-21T09:05:41.0696105Z shl.b32 %r7, %r6, 3; 2026-02-21T09:05:41.0696620Z .loc 1 48 48 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:48:48 2026-02-21T09:05:41.0696994Z bfe.u32 %r8, %r2, 4, 3; 2026-02-21T09:05:41.0697308Z .loc 1 54 38 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:54:38 2026-02-21T09:05:41.0697663Z and.b32 %r9, %r2, 1; 2026-02-21T09:05:41.0697976Z .loc 1 26 144 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:26:144 2026-02-21T09:05:41.0698339Z setp.ge.s32 %p19, %r29094, %r29866; 2026-02-21T09:05:41.0698547Z shl.b32 %r29078, %r2, 4; 2026-02-21T09:05:41.0698728Z bfe.s32 %r29079, %r2, 3, 1; 2026-02-21T09:05:41.0698904Z and.b32 %r29080, %r2, 96; 2026-02-21T09:05:41.0699081Z shl.b32 %r29081, %r2, 3; 2026-02-21T09:05:41.0699246Z shl.b32 %r29082, %r2, 1; 2026-02-21T09:05:41.0699417Z and.b32 %r29083, %r2, 16; 2026-02-21T09:05:41.0699584Z bfe.s32 %r29084, %r2, 4, 1; 2026-02-21T09:05:41.0699775Z bfe.u32 %r29085, %r2, 5, 2; 2026-02-21T09:05:41.0699952Z and.b32 %r29086, %r2, 6; 2026-02-21T09:05:41.0700131Z and.b32 %r29087, %r2, 120; 2026-02-21T09:05:41.0700301Z shl.b32 %r29088, %r9, 2; 2026-02-21T09:05:41.0700469Z shl.b32 %r29089, %r2, 6; 2026-02-21T09:05:41.0700643Z shl.b32 %r29090, %r6, 7; 2026-02-21T09:05:41.0700808Z or.b32 %r29091, %r5, 192; 2026-02-21T09:05:41.0700979Z or.b32 %r29092, %r5, 128; 2026-02-21T09:05:41.0701147Z or.b32 %r29093, %r5, 64; 
2026-02-21T09:05:41.0701331Z setp.lt.u32 %p167, %r2, 64; 2026-02-21T09:05:41.0701506Z @%p19 bra $L__BB0_9; 2026-02-21T09:05:41.0701694Z // %bb.1: // %.lr.ph 2026-02-21T09:05:41.0702066Z .loc 1 0 144 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:0:144 2026-02-21T09:05:41.0702417Z and.b32 %r2212, %r29078, 1904; 2026-02-21T09:05:41.0702617Z and.b32 %r2214, %r29079, 136; 2026-02-21T09:05:41.0702805Z or.b32 %r2215, %r2214, %r2212; 2026-02-21T09:05:41.0702993Z add.s32 %r10, %r24001, %r2215; 2026-02-21T09:05:41.0703175Z xor.b32 %r2217, %r2215, 8; 2026-02-21T09:05:41.0703357Z add.s32 %r11, %r24001, %r2217; 2026-02-21T09:05:41.0703533Z shl.b32 %r2219, %r29080, 4; 2026-02-21T09:05:41.0703817Z and.b32 %r2221, %r29081, 96; 2026-02-21T09:05:41.0703993Z and.b32 %r2223, %r29082, 6; 2026-02-21T09:05:41.0704176Z and.b32 %r2226, %r29084, 136; 2026-02-21T09:05:41.0704360Z or.b32 %r2227, %r2219, %r2221; 2026-02-21T09:05:41.0704546Z or.b32 %r2228, %r2227, %r2223; 2026-02-21T09:05:41.0704731Z or.b32 %r2229, %r2228, %r2226; 2026-02-21T09:05:41.0704907Z add.s32 %r12, %r24001, %r2229; 2026-02-21T09:05:41.0705088Z xor.b32 %r2230, %r2229, 8; 2026-02-21T09:05:41.0705258Z add.s32 %r13, %r24001, %r2230; 2026-02-21T09:05:41.0705438Z and.b32 %r2232, %r29084, 132; 2026-02-21T09:05:41.0705613Z or.b32 %r2233, %r29085, %r2232; 2026-02-21T09:05:41.0705800Z or.b32 %r2234, %r2233, %r7; 2026-02-21T09:05:41.0705983Z add.s32 %r14, %r24001, %r2234; 2026-02-21T09:05:41.0706236Z xor.b32 %r2235, %r2234, 4; 2026-02-21T09:05:41.0706618Z add.s32 %r15, %r24001, %r2235; 2026-02-21T09:05:41.0706824Z xor.b32 %r2236, %r2234, 32; 2026-02-21T09:05:41.0707000Z add.s32 %r16, %r24001, %r2236; 2026-02-21T09:05:41.0707176Z xor.b32 %r2237, %r2234, 36; 2026-02-21T09:05:41.0707352Z add.s32 %r17, %r24001, %r2237; 2026-02-21T09:05:41.0707525Z xor.b32 %r2238, %r2234, 64; 2026-02-21T09:05:41.0707703Z add.s32 %r18, %r24001, %r2238; 2026-02-21T09:05:41.0707890Z xor.b32 %r2239, %r2234, 68; 2026-02-21T09:05:41.0708070Z 
add.s32 %r19, %r24001, %r2239; 2026-02-21T09:05:41.0708335Z xor.b32 %r2240, %r2234, 96; 2026-02-21T09:05:41.0708518Z add.s32 %r20, %r24001, %r2240; 2026-02-21T09:05:41.0708701Z xor.b32 %r2241, %r2234, 100; 2026-02-21T09:05:41.0708878Z add.s32 %r21, %r24001, %r2241; 2026-02-21T09:05:41.0709070Z mul.lo.s32 %r2245, %r29086, 144; 2026-02-21T09:05:41.0709258Z xor.b32 %r2246, %r2245, %r29087; 2026-02-21T09:05:41.0709446Z or.b32 %r2247, %r2246, %r29088; 2026-02-21T09:05:41.0709630Z add.s32 %r22, %r24001, %r2247; 2026-02-21T09:05:41.0709893Z xor.b32 %r2248, %r2247, 132; 2026-02-21T09:05:41.0710077Z add.s32 %r23, %r24001, %r2248; 2026-02-21T09:05:41.0710258Z and.b32 %r2250, %r29089, 8128; 2026-02-21T09:05:41.0710446Z shl.b32 %r2251, %r29086, 3; 2026-02-21T09:05:41.0710616Z or.b32 %r2252, %r2250, %r2251; 2026-02-21T09:05:41.0710799Z add.s32 %r24, %r24001, %r2252; 2026-02-21T09:05:41.0710974Z xor.b32 %r2253, %r2252, 16; 2026-02-21T09:05:41.0711153Z add.s32 %r25, %r24001, %r2253; 2026-02-21T09:05:41.0711327Z xor.b32 %r2254, %r2252, 32; 2026-02-21T09:05:41.0711511Z add.s32 %r26, %r24001, %r2254; 2026-02-21T09:05:41.0711690Z xor.b32 %r2255, %r2252, 48; 2026-02-21T09:05:41.0711869Z add.s32 %r27, %r24001, %r2255; 2026-02-21T09:05:41.0712056Z bfe.u32 %r2256, %r24001, 4, 14; 2026-02-21T09:05:41.0712243Z cvt.u64.u32 %rd66, %r2256; 2026-02-21T09:05:41.0712444Z or.b64 %rd2, %rd66, -9223371899382267904; 2026-02-21T09:05:41.0712653Z add.s32 %r2257, %r24001, 32; 2026-02-21T09:05:41.0712854Z bfe.u32 %r2258, %r2257, 4, 14; 2026-02-21T09:05:41.0713039Z cvt.u64.u32 %rd67, %r2258; 2026-02-21T09:05:41.0713226Z or.b64 %rd3, %rd67, -9223371899382267904; 2026-02-21T09:05:41.0713432Z shl.b32 %r2260, %r29080, 6; 2026-02-21T09:05:41.0713614Z and.b32 %r2261, %r29078, 112; 2026-02-21T09:05:41.0713797Z or.b32 %r2262, %r29090, %r2261; 2026-02-21T09:05:41.0713980Z xor.b32 %r2263, %r2262, %r29083; 2026-02-21T09:05:41.0714169Z or.b32 %r2264, %r2263, %r2260; 2026-02-21T09:05:41.0714347Z add.s32 %r28, 
%r24001, %r2264; 2026-02-21T09:05:41.0714531Z add.s32 %r29, %r28, 32768; 2026-02-21T09:05:41.0714705Z add.s32 %r30, %r28, 8192; 2026-02-21T09:05:41.0714886Z add.s32 %r31, %r28, 40960; 2026-02-21T09:05:41.0715056Z add.s32 %r32, %r28, 16384; 2026-02-21T09:05:41.0715244Z add.s32 %r33, %r28, 49152; 2026-02-21T09:05:41.0715415Z add.s32 %r34, %r28, 24576; 2026-02-21T09:05:41.0715590Z add.s32 %r35, %r28, 57344; 2026-02-21T09:05:41.0715763Z xor.b32 %r2265, %r2264, 32; 2026-02-21T09:05:41.0715940Z add.s32 %r36, %r24001, %r2265; 2026-02-21T09:05:41.0716122Z add.s32 %r37, %r36, 32768; 2026-02-21T09:05:41.0716292Z add.s32 %r38, %r36, 8192; 2026-02-21T09:05:41.0716588Z add.s32 %r39, %r36, 40960; 2026-02-21T09:05:41.0733966Z add.s32 %r40, %r36, 16384; 2026-02-21T09:05:41.0734163Z add.s32 %r41, %r36, 49152; 2026-02-21T09:05:41.0734333Z add.s32 %r42, %r36, 24576; 2026-02-21T09:05:41.0734506Z add.s32 %r43, %r36, 57344; 2026-02-21T09:05:41.0734682Z xor.b32 %r2266, %r2264, 64; 2026-02-21T09:05:41.0734865Z add.s32 %r44, %r24001, %r2266; 2026-02-21T09:05:41.0735049Z add.s32 %r45, %r44, 32768; 2026-02-21T09:05:41.0735218Z add.s32 %r46, %r44, 8192; 2026-02-21T09:05:41.0735390Z add.s32 %r47, %r44, 40960; 2026-02-21T09:05:41.0735557Z add.s32 %r48, %r44, 16384; 2026-02-21T09:05:41.0735730Z add.s32 %r49, %r44, 49152; 2026-02-21T09:05:41.0735902Z add.s32 %r50, %r44, 24576; 2026-02-21T09:05:41.0736071Z add.s32 %r51, %r44, 57344; 2026-02-21T09:05:41.0736591Z xor.b32 %r2267, %r2264, 96; 2026-02-21T09:05:41.0736801Z add.s32 %r52, %r24001, %r2267; 2026-02-21T09:05:41.0736981Z add.s32 %r53, %r52, 32768; 2026-02-21T09:05:41.0737155Z add.s32 %r54, %r52, 8192; 2026-02-21T09:05:41.0737329Z add.s32 %r55, %r52, 40960; 2026-02-21T09:05:41.0737509Z add.s32 %r56, %r52, 16384; 2026-02-21T09:05:41.0737686Z add.s32 %r57, %r52, 49152; 2026-02-21T09:05:41.0737855Z add.s32 %r58, %r52, 24576; 2026-02-21T09:05:41.0738030Z add.s32 %r59, %r52, 57344; 2026-02-21T09:05:41.0738362Z .loc 1 26 144 // 
c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:26:144 2026-02-21T09:05:41.0738733Z shl.b32 %r2268, %r8, 13; 2026-02-21T09:05:41.0739050Z .loc 1 47 111 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:47:111 2026-02-21T09:05:41.0739413Z or.b32 %r60, %r2268, %r7; 2026-02-21T09:05:41.0739725Z .loc 1 26 144 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:26:144 2026-02-21T09:05:41.0740088Z mad.wide.u32 %rd4, %r9, 16, %rd44; 2026-02-21T09:05:41.0740389Z shl.b32 %r64, %r5, 10; 2026-02-21T09:05:41.0740614Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:05:41.0740911Z // Child Loop BB0_3 Depth 2 2026-02-21T09:05:41.0741171Z // Child Loop BB0_5 Depth 2 2026-02-21T09:05:41.0741432Z // Child Loop BB0_7 Depth 2 2026-02-21T09:05:41.0741806Z .loc 1 32 35 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:32:35 2026-02-21T09:05:41.0742173Z shr.s32 %r2271, %r29094, 31; 2026-02-21T09:05:41.0742359Z shr.u32 %r2272, %r2271, 25; 2026-02-21T09:05:41.0742536Z add.s32 %r2273, %r29094, %r2272; 2026-02-21T09:05:41.0742728Z shr.s32 %r2274, %r2273, 7; 2026-02-21T09:05:41.0743047Z .loc 1 33 33 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:33:33 2026-02-21T09:05:41.0743415Z shl.b32 %r2275, %r2274, 1; 2026-02-21T09:05:41.0743737Z .loc 1 34 39 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:34:39 2026-02-21T09:05:41.0744095Z sub.s32 %r2276, 16, %r2275; 2026-02-21T09:05:41.0744412Z .loc 1 34 52 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:34:52 2026-02-21T09:05:41.0744755Z min.s32 %r2277, %r2276, 2; 2026-02-21T09:05:41.0745063Z .loc 1 35 45 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:35:45 2026-02-21T09:05:41.0745412Z and.b32 %r2278, %r2273, -128; 2026-02-21T09:05:41.0745602Z sub.s32 %r2279, %r29094, %r2278; 2026-02-21T09:05:41.0745926Z .loc 1 36 51 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:36:51 2026-02-21T09:05:41.0746278Z div.s32 %r2280, %r2279, %r2277; 
2026-02-21T09:05:41.0746733Z .loc 1 35 64 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:35:64 2026-02-21T09:05:41.0747095Z mul.lo.s32 %r2281, %r2280, %r2277; 2026-02-21T09:05:41.0747301Z sub.s32 %r2282, %r2279, %r2281; 2026-02-21T09:05:41.0747626Z .loc 1 35 30 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:35:30 2026-02-21T09:05:41.0748102Z add.s32 %r2283, %r2282, %r2275; 2026-02-21T09:05:41.0748513Z .loc 1 37 27 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:37:27 2026-02-21T09:05:41.0748864Z shl.b32 %r8825, %r2283, 8; 2026-02-21T09:05:41.0749176Z .loc 1 39 27 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:39:27 2026-02-21T09:05:41.0749519Z shl.b32 %r121, %r2280, 7; 2026-02-21T09:05:41.0749840Z .loc 1 47 111 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:47:111 2026-02-21T09:05:41.0750192Z add.s32 %r29095, %r60, %r121; 2026-02-21T09:05:41.0750379Z or.b32 %r2284, %r29091, %r8825; 2026-02-21T09:05:41.0750568Z shl.b32 %r2285, %r2284, 10; 2026-02-21T09:05:41.0750907Z mul.wide.s32 %rd9, %r2285, 2; 2026-02-21T09:05:41.0751106Z or.b32 %r2286, %r29092, %r8825; 2026-02-21T09:05:41.0751286Z shl.b32 %r2287, %r2286, 10; 2026-02-21T09:05:41.0751473Z mul.wide.s32 %rd10, %r2287, 2; 2026-02-21T09:05:41.0751664Z or.b32 %r2288, %r29093, %r8825; 2026-02-21T09:05:41.0751851Z shl.b32 %r2289, %r2288, 10; 2026-02-21T09:05:41.0752029Z mul.wide.s32 %rd11, %r2289, 2; 2026-02-21T09:05:41.0752217Z shl.b32 %r2290, %r2283, 18; 2026-02-21T09:05:41.0752400Z or.b32 %r2291, %r64, %r2290; 2026-02-21T09:05:41.0752578Z mul.wide.s32 %rd12, %r2291, 2; 2026-02-21T09:05:41.0752766Z mov.b32 %r29096, 0f00000000; 2026-02-21T09:05:41.0752943Z mov.b64 %rd300, -32; 2026-02-21T09:05:41.0753116Z mov.b64 %rd299, %rd4; 2026-02-21T09:05:41.0753280Z mov.b32 %r29097, %r29096; 2026-02-21T09:05:41.0753456Z mov.b32 %r29098, %r29096; 2026-02-21T09:05:41.0753618Z mov.b32 %r29099, %r29096; 2026-02-21T09:05:41.0753786Z mov.b32 %r29100, %r29096; 
2026-02-21T09:05:41.0753949Z mov.b32 %r29101, %r29096; 2026-02-21T09:05:41.0754209Z mov.b32 %r29102, %r29096; 2026-02-21T09:05:41.0754381Z mov.b32 %r29103, %r29096; 2026-02-21T09:05:41.0754542Z mov.b32 %r29104, %r29096; 2026-02-21T09:05:41.0754713Z mov.b32 %r29105, %r29096; 2026-02-21T09:05:41.0754875Z mov.b32 %r29106, %r29096; 2026-02-21T09:05:41.0755041Z mov.b32 %r29107, %r29096; 2026-02-21T09:05:41.0755206Z mov.b32 %r29108, %r29096; 2026-02-21T09:05:41.0755373Z mov.b32 %r29109, %r29096; 2026-02-21T09:05:41.0755535Z mov.b32 %r29110, %r29096; 2026-02-21T09:05:41.0755706Z mov.b32 %r29111, %r29096; 2026-02-21T09:05:41.0755869Z mov.b32 %r29112, %r29096; 2026-02-21T09:05:41.0756037Z mov.b32 %r29113, %r29096; 2026-02-21T09:05:41.0756205Z mov.b32 %r29114, %r29096; 2026-02-21T09:05:41.0756376Z mov.b32 %r29115, %r29096; 2026-02-21T09:05:41.0756673Z mov.b32 %r29116, %r29096; 2026-02-21T09:05:41.0756838Z mov.b32 %r29117, %r29096; 2026-02-21T09:05:41.0757009Z mov.b32 %r29118, %r29096; 2026-02-21T09:05:41.0757174Z mov.b32 %r29119, %r29096; 2026-02-21T09:05:41.0757345Z mov.b32 %r29120, %r29096; 2026-02-21T09:05:41.0757508Z mov.b32 %r29121, %r29096; 2026-02-21T09:05:41.0757677Z mov.b32 %r29122, %r29096; 2026-02-21T09:05:41.0757856Z mov.b32 %r29123, %r29096; 2026-02-21T09:05:41.0758028Z mov.b32 %r29124, %r29096; 2026-02-21T09:05:41.0758197Z mov.b32 %r29125, %r29096; 2026-02-21T09:05:41.0758360Z mov.b32 %r29126, %r29096; 2026-02-21T09:05:41.0758529Z mov.b32 %r29127, %r29096; 2026-02-21T09:05:41.0758692Z mov.b32 %r29128, %r29096; 2026-02-21T09:05:41.0758862Z mov.b32 %r29129, %r29096; 2026-02-21T09:05:41.0759024Z mov.b32 %r29130, %r29096; 2026-02-21T09:05:41.0759194Z mov.b32 %r29131, %r29096; 2026-02-21T09:05:41.0759359Z mov.b32 %r29132, %r29096; 2026-02-21T09:05:41.0759528Z mov.b32 %r29133, %r29096; 2026-02-21T09:05:41.0759691Z mov.b32 %r29134, %r29096; 2026-02-21T09:05:41.0759865Z mov.b32 %r29135, %r29096; 2026-02-21T09:05:41.0760036Z mov.b32 %r29136, %r29096; 
2026-02-21T09:05:41.0760202Z mov.b32 %r29137, %r29096; 2026-02-21T09:05:41.0760377Z mov.b32 %r29138, %r29096; 2026-02-21T09:05:41.0760545Z mov.b32 %r29139, %r29096; 2026-02-21T09:05:41.0760715Z mov.b32 %r29140, %r29096; 2026-02-21T09:05:41.0760982Z mov.b32 %r29141, %r29096; 2026-02-21T09:05:41.0761155Z mov.b32 %r29142, %r29096; 2026-02-21T09:05:41.0761320Z mov.b32 %r29143, %r29096; 2026-02-21T09:05:41.0761488Z mov.b32 %r29144, %r29096; 2026-02-21T09:05:41.0761656Z mov.b32 %r29145, %r29096; 2026-02-21T09:05:41.0761818Z mov.b32 %r29146, %r29096; 2026-02-21T09:05:41.0761987Z mov.b32 %r29147, %r29096; 2026-02-21T09:05:41.0762148Z mov.b32 %r29148, %r29096; 2026-02-21T09:05:41.0762320Z mov.b32 %r29149, %r29096; 2026-02-21T09:05:41.0762482Z mov.b32 %r29150, %r29096; 2026-02-21T09:05:41.0762656Z mov.b32 %r29151, %r29096; 2026-02-21T09:05:41.0762820Z mov.b32 %r29152, %r29096; 2026-02-21T09:05:41.0762997Z mov.b32 %r29153, %r29096; 2026-02-21T09:05:41.0763161Z mov.b32 %r29154, %r29096; 2026-02-21T09:05:41.0763474Z mov.b32 %r29155, %r29096; 2026-02-21T09:05:41.0763654Z mov.b32 %r29156, %r29096; 2026-02-21T09:05:41.0763830Z mov.b32 %r29157, %r29096; 2026-02-21T09:05:41.0764006Z mov.b32 %r29158, %r29096; 2026-02-21T09:05:41.0764173Z mov.b32 %r29159, %r29096; 2026-02-21T09:05:41.0764344Z mov.b32 %r29160, %r29096; 2026-02-21T09:05:41.0764508Z mov.b32 %r29161, %r29096; 2026-02-21T09:05:41.0764674Z mov.b32 %r29162, %r29096; 2026-02-21T09:05:41.0764840Z mov.b32 %r29163, %r29096; 2026-02-21T09:05:41.0765007Z mov.b32 %r29164, %r29096; 2026-02-21T09:05:41.0765171Z mov.b32 %r29165, %r29096; 2026-02-21T09:05:41.0765339Z mov.b32 %r29166, %r29096; 2026-02-21T09:05:41.0765511Z mov.b32 %r29167, %r29096; 2026-02-21T09:05:41.0765674Z mov.b32 %r29168, %r29096; 2026-02-21T09:05:41.0765843Z mov.b32 %r29169, %r29096; 2026-02-21T09:05:41.0766004Z mov.b32 %r29170, %r29096; 2026-02-21T09:05:41.0766186Z mov.b32 %r29171, %r29096; 2026-02-21T09:05:41.0766348Z mov.b32 %r29172, %r29096; 
2026-02-21T09:05:41.0766649Z mov.b32 %r29173, %r29096; 2026-02-21T09:05:41.0766901Z mov.b32 %r29174, %r29096; 2026-02-21T09:05:41.0767073Z mov.b32 %r29175, %r29096; 2026-02-21T09:05:41.0767236Z mov.b32 %r29176, %r29096; 2026-02-21T09:05:41.0767422Z mov.b32 %r29177, %r29096; 2026-02-21T09:05:41.0767590Z mov.b32 %r29178, %r29096; 2026-02-21T09:05:41.0767753Z mov.b32 %r29179, %r29096; 2026-02-21T09:05:41.0767919Z mov.b32 %r29180, %r29096; 2026-02-21T09:05:41.0768080Z mov.b32 %r29181, %r29096; 2026-02-21T09:05:41.0768253Z mov.b32 %r29182, %r29096; 2026-02-21T09:05:41.0768415Z mov.b32 %r29183, %r29096; 2026-02-21T09:05:41.0768584Z mov.b32 %r29184, %r29096; 2026-02-21T09:05:41.0768747Z mov.b32 %r29185, %r29096; 2026-02-21T09:05:41.0768914Z mov.b32 %r29186, %r29096; 2026-02-21T09:05:41.0769078Z mov.b32 %r29187, %r29096; 2026-02-21T09:05:41.0769258Z mov.b32 %r29188, %r29096; 2026-02-21T09:05:41.0769436Z mov.b32 %r29189, %r29096; 2026-02-21T09:05:41.0769614Z mov.b32 %r29190, %r29096; 2026-02-21T09:05:41.0769783Z mov.b32 %r29191, %r29096; 2026-02-21T09:05:41.0769954Z mov.b32 %r29192, %r29096; 2026-02-21T09:05:41.0770117Z mov.b32 %r29193, %r29096; 2026-02-21T09:05:41.0770290Z mov.b32 %r29194, %r29096; 2026-02-21T09:05:41.0770455Z mov.b32 %r29195, %r29096; 2026-02-21T09:05:41.0770629Z mov.b32 %r29196, %r29096; 2026-02-21T09:05:41.0770794Z mov.b32 %r29197, %r29096; 2026-02-21T09:05:41.0770966Z mov.b32 %r29198, %r29096; 2026-02-21T09:05:41.0771129Z mov.b32 %r29199, %r29096; 2026-02-21T09:05:41.0771313Z mov.b32 %r29200, %r29096; 2026-02-21T09:05:41.0771483Z mov.b32 %r29201, %r29096; 2026-02-21T09:05:41.0771644Z mov.b32 %r29202, %r29096; 2026-02-21T09:05:41.0771810Z mov.b32 %r29203, %r29096; 2026-02-21T09:05:41.0771971Z mov.b32 %r29204, %r29096; 2026-02-21T09:05:41.0772151Z mov.b32 %r29205, %r29096; 2026-02-21T09:05:41.0772316Z mov.b32 %r29206, %r29096; 2026-02-21T09:05:41.0772485Z mov.b32 %r29207, %r29096; 2026-02-21T09:05:41.0772645Z mov.b32 %r29208, %r29096; 
2026-02-21T09:05:41.0772818Z mov.b32 %r29209, %r29096; 2026-02-21T09:05:41.0772984Z mov.b32 %r29210, %r29096; 2026-02-21T09:05:41.0773152Z mov.b32 %r29211, %r29096; 2026-02-21T09:05:41.0773321Z mov.b32 %r29212, %r29096; 2026-02-21T09:05:41.0773564Z mov.b32 %r29213, %r29096; 2026-02-21T09:05:41.0773729Z mov.b32 %r29214, %r29096; 2026-02-21T09:05:41.0773891Z mov.b32 %r29215, %r29096; 2026-02-21T09:05:41.0774057Z mov.b32 %r29216, %r29096; 2026-02-21T09:05:41.0774216Z mov.b32 %r29217, %r29096; 2026-02-21T09:05:41.0774384Z mov.b32 %r29218, %r29096; 2026-02-21T09:05:41.0774548Z mov.b32 %r29219, %r29096; 2026-02-21T09:05:41.0774716Z mov.b32 %r29220, %r29096; 2026-02-21T09:05:41.0774893Z mov.b32 %r29221, %r29096; 2026-02-21T09:05:41.0775066Z mov.b32 %r29222, %r29096; 2026-02-21T09:05:41.0775236Z mov.b32 %r29223, %r29096; 2026-02-21T09:05:41.0775399Z mov.b32 %r29224, %r29096; 2026-02-21T09:05:41.0775568Z mov.b32 %r29225, %r29096; 2026-02-21T09:05:41.0775729Z mov.b32 %r29226, %r29096; 2026-02-21T09:05:41.0776036Z mov.b32 %r29227, %r29096; 2026-02-21T09:05:41.0776203Z mov.b32 %r29228, %r29096; 2026-02-21T09:05:41.0776371Z mov.b32 %r29229, %r29096; 2026-02-21T09:05:41.0776660Z mov.b32 %r29230, %r29096; 2026-02-21T09:05:41.0776838Z mov.b32 %r29231, %r29096; 2026-02-21T09:05:41.0777002Z mov.b32 %r29232, %r29096; 2026-02-21T09:05:41.0777176Z mov.b32 %r29233, %r29096; 2026-02-21T09:05:41.0777344Z mov.b32 %r29234, %r29096; 2026-02-21T09:05:41.0777511Z mov.b32 %r29235, %r29096; 2026-02-21T09:05:41.0777682Z mov.b32 %r29236, %r29096; 2026-02-21T09:05:41.0777846Z mov.b32 %r29237, %r29096; 2026-02-21T09:05:41.0778017Z mov.b32 %r29238, %r29096; 2026-02-21T09:05:41.0778180Z mov.b32 %r29239, %r29096; 2026-02-21T09:05:41.0778351Z mov.b32 %r29240, %r29096; 2026-02-21T09:05:41.0778515Z mov.b32 %r29241, %r29096; 2026-02-21T09:05:41.0778686Z mov.b32 %r29242, %r29096; 2026-02-21T09:05:41.0778853Z mov.b32 %r29243, %r29096; 2026-02-21T09:05:41.0779024Z mov.b32 %r29244, %r29096; 
2026-02-21T09:05:41.0779199Z mov.b32 %r29245, %r29096; 2026-02-21T09:05:41.0779443Z mov.b32 %r29246, %r29096; 2026-02-21T09:05:41.0779618Z mov.b32 %r29247, %r29096; 2026-02-21T09:05:41.0779780Z mov.b32 %r29248, %r29096; 2026-02-21T09:05:41.0779954Z mov.b32 %r29249, %r29096; 2026-02-21T09:05:41.0780120Z mov.b32 %r29250, %r29096; 2026-02-21T09:05:41.0780290Z mov.b32 %r29251, %r29096; 2026-02-21T09:05:41.0780452Z mov.b32 %r29252, %r29096; 2026-02-21T09:05:41.0780621Z mov.b32 %r29253, %r29096; 2026-02-21T09:05:41.0780787Z mov.b32 %r29254, %r29096; 2026-02-21T09:05:41.0780968Z mov.b32 %r29255, %r29096; 2026-02-21T09:05:41.0781140Z mov.b32 %r29256, %r29096; 2026-02-21T09:05:41.0781301Z mov.b32 %r29257, %r29096; 2026-02-21T09:05:41.0781469Z mov.b32 %r29258, %r29096; 2026-02-21T09:05:41.0781630Z mov.b32 %r29259, %r29096; 2026-02-21T09:05:41.0781797Z mov.b32 %r29260, %r29096; 2026-02-21T09:05:41.0781957Z mov.b32 %r29261, %r29096; 2026-02-21T09:05:41.0782125Z mov.b32 %r29262, %r29096; 2026-02-21T09:05:41.0782292Z mov.b32 %r29263, %r29096; 2026-02-21T09:05:41.0782464Z mov.b32 %r29264, %r29096; 2026-02-21T09:05:41.0782631Z mov.b32 %r29265, %r29096; 2026-02-21T09:05:41.0782791Z mov.b32 %r29266, %r29096; 2026-02-21T09:05:41.0782965Z mov.b32 %r29267, %r29096; 2026-02-21T09:05:41.0783128Z mov.b32 %r29268, %r29096; 2026-02-21T09:05:41.0783297Z mov.b32 %r29269, %r29096; 2026-02-21T09:05:41.0783460Z mov.b32 %r29270, %r29096; 2026-02-21T09:05:41.0783629Z mov.b32 %r29271, %r29096; 2026-02-21T09:05:41.0783790Z mov.b32 %r29272, %r29096; 2026-02-21T09:05:41.0783975Z mov.b32 %r29273, %r29096; 2026-02-21T09:05:41.0784138Z mov.b32 %r29274, %r29096; 2026-02-21T09:05:41.0784305Z mov.b32 %r29275, %r29096; 2026-02-21T09:05:41.0784474Z mov.b32 %r29276, %r29096; 2026-02-21T09:05:41.0784638Z mov.b32 %r29277, %r29096; 2026-02-21T09:05:41.0784808Z mov.b32 %r29278, %r29096; 2026-02-21T09:05:41.0784970Z mov.b32 %r29279, %r29096; 2026-02-21T09:05:41.0785140Z mov.b32 %r29280, %r29096; 
2026-02-21T09:05:41.0785306Z mov.b32 %r29281, %r29096; 2026-02-21T09:05:41.0785476Z mov.b32 %r29282, %r29096; 2026-02-21T09:05:41.0785639Z mov.b32 %r29283, %r29096; 2026-02-21T09:05:41.0785806Z mov.b32 %r29284, %r29096; 2026-02-21T09:05:41.0786050Z mov.b32 %r29285, %r29096; 2026-02-21T09:05:41.0786222Z mov.b32 %r29286, %r29096; 2026-02-21T09:05:41.0786390Z mov.b32 %r29287, %r29096; 2026-02-21T09:05:41.0786681Z mov.b32 %r29288, %r29096; 2026-02-21T09:05:41.0786856Z mov.b32 %r29289, %r29096; 2026-02-21T09:05:41.0787022Z mov.b32 %r29290, %r29096; 2026-02-21T09:05:41.0787192Z mov.b32 %r29291, %r29096; 2026-02-21T09:05:41.0787354Z mov.b32 %r29292, %r29096; 2026-02-21T09:05:41.0787525Z mov.b32 %r29293, %r29096; 2026-02-21T09:05:41.0787692Z mov.b32 %r29294, %r29096; 2026-02-21T09:05:41.0787867Z mov.b32 %r29295, %r29096; 2026-02-21T09:05:41.0788034Z mov.b32 %r29296, %r29096; 2026-02-21T09:05:41.0788209Z mov.b32 %r29297, %r29096; 2026-02-21T09:05:41.0788475Z mov.b32 %r29298, %r29096; 2026-02-21T09:05:41.0788812Z mov.b32 %r29299, %r29096; 2026-02-21T09:05:41.0788987Z mov.b32 %r29300, %r29096; 2026-02-21T09:05:41.0789153Z mov.b32 %r29301, %r29096; 2026-02-21T09:05:41.0789323Z mov.b32 %r29302, %r29096; 2026-02-21T09:05:41.0789488Z mov.b32 %r29303, %r29096; 2026-02-21T09:05:41.0789657Z mov.b32 %r29304, %r29096; 2026-02-21T09:05:41.0789821Z mov.b32 %r29305, %r29096; 2026-02-21T09:05:41.0789989Z mov.b32 %r29306, %r29096; 2026-02-21T09:05:41.0790148Z mov.b32 %r29307, %r29096; 2026-02-21T09:05:41.0790317Z mov.b32 %r29308, %r29096; 2026-02-21T09:05:41.0790480Z mov.b32 %r29309, %r29096; 2026-02-21T09:05:41.0790643Z mov.b32 %r29310, %r29096; 2026-02-21T09:05:41.0790809Z mov.b32 %r29311, %r29096; 2026-02-21T09:05:41.0790969Z mov.b32 %r29312, %r29096; 2026-02-21T09:05:41.0791150Z mov.b32 %r29313, %r29096; 2026-02-21T09:05:41.0791329Z mov.b32 %r29314, %r29096; 2026-02-21T09:05:41.0791502Z mov.b32 %r29315, %r29096; 2026-02-21T09:05:41.0791664Z mov.b32 %r29316, %r29096; 
2026-02-21T09:05:41.0791835Z mov.b32 %r29317, %r29096; 2026-02-21T09:05:41.0792070Z mov.b32 %r29318, %r29096; 2026-02-21T09:05:41.0792240Z mov.b32 %r29319, %r29096; 2026-02-21T09:05:41.0792430Z mov.b32 %r29320, %r29096; 2026-02-21T09:05:41.0792602Z mov.b32 %r29321, %r29096; 2026-02-21T09:05:41.0792779Z mov.b32 %r29322, %r29096; 2026-02-21T09:05:41.0792946Z mov.b32 %r29323, %r29096; 2026-02-21T09:05:41.0793119Z mov.b32 %r29324, %r29096; 2026-02-21T09:05:41.0793282Z mov.b32 %r29325, %r29096; 2026-02-21T09:05:41.0793454Z mov.b32 %r29326, %r29096; 2026-02-21T09:05:41.0793614Z mov.b32 %r29327, %r29096; 2026-02-21T09:05:41.0793786Z mov.b32 %r29328, %r29096; 2026-02-21T09:05:41.0793955Z mov.b32 %r29329, %r29096; 2026-02-21T09:05:41.0794121Z mov.b32 %r29330, %r29096; 2026-02-21T09:05:41.0794290Z mov.b32 %r29331, %r29096; 2026-02-21T09:05:41.0794454Z mov.b32 %r29332, %r29096; 2026-02-21T09:05:41.0794623Z mov.b32 %r29333, %r29096; 2026-02-21T09:05:41.0794783Z mov.b32 %r29334, %r29096; 2026-02-21T09:05:41.0794957Z mov.b32 %r29335, %r29096; 2026-02-21T09:05:41.0795120Z mov.b32 %r29336, %r29096; 2026-02-21T09:05:41.0795290Z mov.b32 %r29337, %r29096; 2026-02-21T09:05:41.0795452Z mov.b32 %r29338, %r29096; 2026-02-21T09:05:41.0795627Z mov.b32 %r29339, %r29096; 2026-02-21T09:05:41.0795793Z mov.b32 %r29340, %r29096; 2026-02-21T09:05:41.0795965Z mov.b32 %r29341, %r29096; 2026-02-21T09:05:41.0796142Z mov.b32 %r29342, %r29096; 2026-02-21T09:05:41.0796309Z mov.b32 %r29343, %r29096; 2026-02-21T09:05:41.0796601Z mov.b32 %r29344, %r29096; 2026-02-21T09:05:41.0796775Z mov.b32 %r29345, %r29096; 2026-02-21T09:05:41.0796944Z mov.b32 %r29346, %r29096; 2026-02-21T09:05:41.0797106Z mov.b32 %r29347, %r29096; 2026-02-21T09:05:41.0797277Z mov.b32 %r29348, %r29096; 2026-02-21T09:05:41.0797439Z mov.b32 %r29349, %r29096; 2026-02-21T09:05:41.0797609Z mov.b32 %r29350, %r29096; 2026-02-21T09:05:41.0797777Z mov.b32 %r29351, %r29096; 2026-02-21T09:05:41.0798001Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 
2026-02-21T09:05:41.0798310Z // => This Inner Loop Header: Depth=2 2026-02-21T09:05:41.0798710Z .loc 1 55 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:32 2026-02-21T09:05:41.0799185Z add.s64 %rd69, %rd299, %rd12; 2026-02-21T09:05:41.0799366Z add.s64 %rd70, %rd299, %rd11; 2026-02-21T09:05:41.0799550Z add.s64 %rd71, %rd299, %rd10; 2026-02-21T09:05:41.0799861Z .loc 1 55 80 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:80 2026-02-21T09:05:41.0800217Z add.s64 %rd72, %rd299, %rd9; 2026-02-21T09:05:41.0800400Z // begin inline asm 2026-02-21T09:05:41.0800556Z mov.u32 %r2292, 0x0; 2026-02-21T09:05:41.0800717Z mov.u32 %r2293, 0x0; 2026-02-21T09:05:41.0800866Z mov.u32 %r2294, 0x0; 2026-02-21T09:05:41.0801016Z mov.u32 %r2295, 0x0; 2026-02-21T09:05:41.0801234Z ld.global.v4.b32 { %r2292, %r2293, %r2294, %r2295 }, [ %rd69 + 0 ]; 2026-02-21T09:05:41.0801651Z // end inline asm 2026-02-21T09:05:41.0801809Z // begin inline asm 2026-02-21T09:05:41.0801974Z mov.u32 %r2296, 0x0; 2026-02-21T09:05:41.0802128Z mov.u32 %r2297, 0x0; 2026-02-21T09:05:41.0802277Z mov.u32 %r2298, 0x0; 2026-02-21T09:05:41.0802432Z mov.u32 %r2299, 0x0; 2026-02-21T09:05:41.0802648Z ld.global.v4.b32 { %r2296, %r2297, %r2298, %r2299 }, [ %rd70 + 0 ]; 2026-02-21T09:05:41.0802914Z // end inline asm 2026-02-21T09:05:41.0803064Z // begin inline asm 2026-02-21T09:05:41.0803223Z mov.u32 %r2300, 0x0; 2026-02-21T09:05:41.0803373Z mov.u32 %r2301, 0x0; 2026-02-21T09:05:41.0803529Z mov.u32 %r2302, 0x0; 2026-02-21T09:05:41.0803677Z mov.u32 %r2303, 0x0; 2026-02-21T09:05:41.0803895Z ld.global.v4.b32 { %r2300, %r2301, %r2302, %r2303 }, [ %rd71 + 0 ]; 2026-02-21T09:05:41.0804156Z // end inline asm 2026-02-21T09:05:41.0804303Z // begin inline asm 2026-02-21T09:05:41.0804464Z mov.u32 %r2304, 0x0; 2026-02-21T09:05:41.0804619Z mov.u32 %r2305, 0x0; 2026-02-21T09:05:41.0804775Z mov.u32 %r2306, 0x0; 2026-02-21T09:05:41.0804938Z mov.u32 %r2307, 0x0; 2026-02-21T09:05:41.0805235Z ld.global.v4.b32 { %r2304, 
%r2305, %r2306, %r2307 }, [ %rd72 + 0 ]; 2026-02-21T09:05:41.0805491Z // end inline asm 2026-02-21T09:05:41.0805789Z .loc 1 59 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:59:32 2026-02-21T09:05:41.0806153Z bar.sync 0; 2026-02-21T09:05:41.0806322Z st.shared.v2.b32 [%r10], {%r2292, %r2293}; 2026-02-21T09:05:41.0806686Z st.shared.v2.b32 [%r10+2048], {%r2296, %r2297}; 2026-02-21T09:05:41.0806938Z st.shared.v2.b32 [%r10+4096], {%r2300, %r2301}; 2026-02-21T09:05:41.0807184Z st.shared.v2.b32 [%r10+6144], {%r2304, %r2305}; 2026-02-21T09:05:41.0807414Z st.shared.v2.b32 [%r11], {%r2294, %r2295}; 2026-02-21T09:05:41.0807650Z st.shared.v2.b32 [%r11+2048], {%r2298, %r2299}; 2026-02-21T09:05:41.0807895Z st.shared.v2.b32 [%r11+4096], {%r2302, %r2303}; 2026-02-21T09:05:41.0808139Z st.shared.v2.b32 [%r11+6144], {%r2306, %r2307}; 2026-02-21T09:05:41.0808359Z bar.sync 0; 2026-02-21T09:05:41.0808513Z ld.shared.b16 %rs1, [%r12]; 2026-02-21T09:05:41.0808710Z ld.shared.b16 %rs2, [%r12+256]; 2026-02-21T09:05:41.0808907Z ld.shared.b16 %rs3, [%r12+16]; 2026-02-21T09:05:41.0809116Z ld.shared.b16 %rs4, [%r12+272]; 2026-02-21T09:05:41.0809308Z ld.shared.b16 %rs5, [%r12+2048]; 2026-02-21T09:05:41.0809504Z ld.shared.b16 %rs6, [%r12+2304]; 2026-02-21T09:05:41.0809692Z ld.shared.b16 %rs7, [%r12+2064]; 2026-02-21T09:05:41.0809883Z ld.shared.b16 %rs8, [%r12+2320]; 2026-02-21T09:05:41.0810078Z ld.shared.b16 %rs9, [%r12+4096]; 2026-02-21T09:05:41.0810268Z ld.shared.b16 %rs10, [%r12+4352]; 2026-02-21T09:05:41.0810468Z ld.shared.b16 %rs11, [%r12+4112]; 2026-02-21T09:05:41.0810658Z ld.shared.b16 %rs12, [%r12+4368]; 2026-02-21T09:05:41.0810849Z ld.shared.b16 %rs13, [%r12+6144]; 2026-02-21T09:05:41.0811035Z ld.shared.b16 %rs14, [%r12+6400]; 2026-02-21T09:05:41.0811231Z ld.shared.b16 %rs15, [%r12+6160]; 2026-02-21T09:05:41.0811424Z ld.shared.b16 %rs16, [%r12+6416]; 2026-02-21T09:05:41.0811622Z ld.shared.b16 %rs17, [%r13]; 2026-02-21T09:05:41.0811805Z ld.shared.b16 %rs18, [%r13+256]; 
2026-02-21T09:05:41.0812000Z ld.shared.b16 %rs19, [%r13+16]; 2026-02-21T09:05:41.0812278Z ld.shared.b16 %rs20, [%r13+272]; 2026-02-21T09:05:41.0812469Z ld.shared.b16 %rs21, [%r13+2048]; 2026-02-21T09:05:41.0812667Z ld.shared.b16 %rs22, [%r13+2304]; 2026-02-21T09:05:41.0812869Z ld.shared.b16 %rs23, [%r13+2064]; 2026-02-21T09:05:41.0813065Z ld.shared.b16 %rs24, [%r13+2320]; 2026-02-21T09:05:41.0813253Z ld.shared.b16 %rs25, [%r13+4096]; 2026-02-21T09:05:41.0813446Z ld.shared.b16 %rs26, [%r13+4352]; 2026-02-21T09:05:41.0813632Z ld.shared.b16 %rs27, [%r13+4112]; 2026-02-21T09:05:41.0813824Z ld.shared.b16 %rs28, [%r13+4368]; 2026-02-21T09:05:41.0814017Z ld.shared.b16 %rs29, [%r13+6144]; 2026-02-21T09:05:41.0814203Z ld.shared.b16 %rs30, [%r13+6400]; 2026-02-21T09:05:41.0814395Z ld.shared.b16 %rs31, [%r13+6160]; 2026-02-21T09:05:41.0814660Z ld.shared.b16 %rs32, [%r13+6416]; 2026-02-21T09:05:41.0814931Z cvt.f32.bf16 %r2438, %rs1; 2026-02-21T09:05:41.0815118Z cvt.f32.bf16 %r2439, %rs2; 2026-02-21T09:05:41.0815298Z cvt.f32.bf16 %r2440, %rs17; 2026-02-21T09:05:41.0815479Z cvt.f32.bf16 %r2441, %rs18; 2026-02-21T09:05:41.0815658Z cvt.f32.bf16 %r2570, %rs3; 2026-02-21T09:05:41.0815831Z cvt.f32.bf16 %r2571, %rs4; 2026-02-21T09:05:41.0816013Z cvt.f32.bf16 %r2572, %rs19; 2026-02-21T09:05:41.0816194Z cvt.f32.bf16 %r2573, %rs20; 2026-02-21T09:05:41.0816369Z cvt.f32.bf16 %r2702, %rs5; 2026-02-21T09:05:41.0816673Z cvt.f32.bf16 %r2703, %rs6; 2026-02-21T09:05:41.0816851Z cvt.f32.bf16 %r2704, %rs21; 2026-02-21T09:05:41.0817032Z cvt.f32.bf16 %r2705, %rs22; 2026-02-21T09:05:41.0817207Z cvt.f32.bf16 %r2834, %rs7; 2026-02-21T09:05:41.0817383Z cvt.f32.bf16 %r2835, %rs8; 2026-02-21T09:05:41.0817556Z cvt.f32.bf16 %r2836, %rs23; 2026-02-21T09:05:41.0817736Z cvt.f32.bf16 %r2837, %rs24; 2026-02-21T09:05:41.0817916Z cvt.f32.bf16 %r2966, %rs9; 2026-02-21T09:05:41.0818092Z cvt.f32.bf16 %r2967, %rs10; 2026-02-21T09:05:41.0818349Z cvt.f32.bf16 %r2968, %rs25; 2026-02-21T09:05:41.0818525Z cvt.f32.bf16 
%r2969, %rs26; 2026-02-21T09:05:41.0818704Z cvt.f32.bf16 %r3098, %rs11; 2026-02-21T09:05:41.0818875Z cvt.f32.bf16 %r3099, %rs12; 2026-02-21T09:05:41.0819051Z cvt.f32.bf16 %r3100, %rs27; 2026-02-21T09:05:41.0819224Z cvt.f32.bf16 %r3101, %rs28; 2026-02-21T09:05:41.0819399Z cvt.f32.bf16 %r3230, %rs13; 2026-02-21T09:05:41.0819570Z cvt.f32.bf16 %r3231, %rs14; 2026-02-21T09:05:41.0819745Z cvt.f32.bf16 %r3232, %rs29; 2026-02-21T09:05:41.0819917Z cvt.f32.bf16 %r3233, %rs30; 2026-02-21T09:05:41.0820087Z cvt.f32.bf16 %r3362, %rs15; 2026-02-21T09:05:41.0820267Z cvt.f32.bf16 %r3363, %rs16; 2026-02-21T09:05:41.0820447Z cvt.f32.bf16 %r3364, %rs31; 2026-02-21T09:05:41.0820622Z cvt.f32.bf16 %r3365, %rs32; 2026-02-21T09:05:41.0820939Z .loc 1 61 34 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:34 2026-02-21T09:05:41.0821301Z cvt.s64.s32 %rd121, %r29095; 2026-02-21T09:05:41.0821481Z add.s64 %rd73, %rd45, %rd121; 2026-02-21T09:05:41.0821805Z .loc 1 61 87 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:87 2026-02-21T09:05:41.0822157Z // begin inline asm 2026-02-21T09:05:41.0822314Z mov.u32 %r2308, 0x0; 2026-02-21T09:05:41.0822476Z mov.u32 %r2309, 0x0; 2026-02-21T09:05:41.0822665Z ld.global.v2.b32 { %r2308, %r2309 }, [ %rd73 + 0 ]; 2026-02-21T09:05:41.0822899Z // end inline asm 2026-02-21T09:05:41.0823186Z .loc 1 69 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:69:28 2026-02-21T09:05:41.0823533Z bar.sync 0; 2026-02-21T09:05:41.0823682Z st.shared.b8 [%r14], %r2308; 2026-02-21T09:05:41.0823876Z prmt.b32 %r8660, %r2308, 0, 0x7771U; 2026-02-21T09:05:41.0824083Z st.shared.b8 [%r15], %r8660; 2026-02-21T09:05:41.0824265Z prmt.b32 %r8661, %r2308, 0, 0x7772U; 2026-02-21T09:05:41.0824470Z st.shared.b8 [%r16+256], %r8661; 2026-02-21T09:05:41.0824662Z prmt.b32 %r8662, %r2308, 0, 0x7773U; 2026-02-21T09:05:41.0824864Z st.shared.b8 [%r17+256], %r8662; 2026-02-21T09:05:41.0825051Z st.shared.b8 [%r18+512], %r2309; 2026-02-21T09:05:41.0825259Z prmt.b32 
%r8663, %r2309, 0, 0x7771U; 2026-02-21T09:05:41.0825542Z st.shared.b8 [%r19+512], %r8663; 2026-02-21T09:05:41.0825734Z prmt.b32 %r8664, %r2309, 0, 0x7772U; 2026-02-21T09:05:41.0825931Z st.shared.b8 [%r20+768], %r8664; 2026-02-21T09:05:41.0826115Z prmt.b32 %r8665, %r2309, 0, 0x7773U; 2026-02-21T09:05:41.0826312Z st.shared.b8 [%r21+768], %r8665; 2026-02-21T09:05:41.0826610Z bar.sync 0; 2026-02-21T09:05:41.0826779Z ld.shared.b32 %r8666, [%r22]; 2026-02-21T09:05:41.0826964Z prmt.b32 %r8667, %r8666, 0, 0x7770U; 2026-02-21T09:05:41.0827185Z cvt.u16.u32 %rs33, %r8667; 2026-02-21T09:05:41.0827367Z prmt.b32 %r8668, %r8666, 0, 0x7771U; 2026-02-21T09:05:41.0827568Z cvt.u16.u32 %rs34, %r8668; 2026-02-21T09:05:41.0827742Z prmt.b32 %r8669, %r8666, 0, 0x7772U; 2026-02-21T09:05:41.0828086Z cvt.u16.u32 %rs35, %r8669; 2026-02-21T09:05:41.0828336Z prmt.b32 %r8670, %r8666, 0, 0x7773U; 2026-02-21T09:05:41.0828545Z cvt.u16.u32 %rs36, %r8670; 2026-02-21T09:05:41.0828726Z ld.shared.b32 %r8671, [%r23]; 2026-02-21T09:05:41.0828914Z prmt.b32 %r8672, %r8671, 0, 0x7770U; 2026-02-21T09:05:41.0829112Z cvt.u16.u32 %rs37, %r8672; 2026-02-21T09:05:41.0829286Z prmt.b32 %r8673, %r8671, 0, 0x7771U; 2026-02-21T09:05:41.0829480Z cvt.u16.u32 %rs38, %r8673; 2026-02-21T09:05:41.0829651Z prmt.b32 %r8674, %r8671, 0, 0x7772U; 2026-02-21T09:05:41.0829847Z cvt.u16.u32 %rs39, %r8674; 2026-02-21T09:05:41.0830021Z prmt.b32 %r8675, %r8671, 0, 0x7773U; 2026-02-21T09:05:41.0830211Z cvt.u16.u32 %rs40, %r8675; 2026-02-21T09:05:41.0830530Z .loc 1 64 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:64:28 2026-02-21T09:05:41.0830888Z shl.b16 %rs41, %rs33, 4; 2026-02-21T09:05:41.0831073Z shl.b16 %rs42, %rs37, 4; 2026-02-21T09:05:41.0831245Z shl.b16 %rs43, %rs34, 4; 2026-02-21T09:05:41.0831416Z shl.b16 %rs44, %rs38, 4; 2026-02-21T09:05:41.0831660Z shl.b16 %rs45, %rs35, 4; 2026-02-21T09:05:41.0831838Z shl.b16 %rs46, %rs39, 4; 2026-02-21T09:05:41.0832007Z shl.b16 %rs47, %rs36, 4; 2026-02-21T09:05:41.0832173Z 
shl.b16 %rs48, %rs40, 4; 2026-02-21T09:05:41.0832481Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.0832824Z cvt.s16.s8 %rs49, %rs41; 2026-02-21T09:05:41.0832997Z shr.s16 %rs50, %rs49, 4; 2026-02-21T09:05:41.0833163Z cvt.s16.s8 %rs51, %rs42; 2026-02-21T09:05:41.0833345Z shr.s16 %rs52, %rs51, 4; 2026-02-21T09:05:41.0833518Z prmt.b32 %r8676, %r8666, 0, 0x8880U; 2026-02-21T09:05:41.0833714Z cvt.u16.u32 %rs53, %r8676; 2026-02-21T09:05:41.0833889Z shr.s16 %rs54, %rs53, 4; 2026-02-21T09:05:41.0834075Z prmt.b32 %r8677, %r8671, 0, 0x8880U; 2026-02-21T09:05:41.0834282Z cvt.u16.u32 %rs55, %r8677; 2026-02-21T09:05:41.0834463Z shr.s16 %rs56, %rs55, 4; 2026-02-21T09:05:41.0834790Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.0835146Z cvt.rn.f32.s16 %r8678, %rs56; 2026-02-21T09:05:41.0835351Z cvt.rn.f32.s16 %r8679, %rs54; 2026-02-21T09:05:41.0835544Z cvt.rn.f32.s16 %r8680, %rs52; 2026-02-21T09:05:41.0835727Z cvt.rn.f32.s16 %r8681, %rs50; 2026-02-21T09:05:41.0836054Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.0836405Z cvt.s16.s8 %rs57, %rs43; 2026-02-21T09:05:41.0836731Z shr.s16 %rs58, %rs57, 4; 2026-02-21T09:05:41.0836903Z cvt.s16.s8 %rs59, %rs44; 2026-02-21T09:05:41.0837077Z shr.s16 %rs60, %rs59, 4; 2026-02-21T09:05:41.0837251Z prmt.b32 %r8682, %r8666, 0, 0x9991U; 2026-02-21T09:05:41.0837456Z cvt.u16.u32 %rs61, %r8682; 2026-02-21T09:05:41.0837632Z shr.s16 %rs62, %rs61, 4; 2026-02-21T09:05:41.0837814Z prmt.b32 %r8683, %r8671, 0, 0x9991U; 2026-02-21T09:05:41.0838021Z cvt.u16.u32 %rs63, %r8683; 2026-02-21T09:05:41.0838212Z shr.s16 %rs64, %rs63, 4; 2026-02-21T09:05:41.0838533Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.0838884Z cvt.rn.f32.s16 %r8684, %rs64; 2026-02-21T09:05:41.0839181Z cvt.rn.f32.s16 %r8685, %rs62; 2026-02-21T09:05:41.0839360Z cvt.rn.f32.s16 %r8686, 
%rs60; 2026-02-21T09:05:41.0839544Z cvt.rn.f32.s16 %r8687, %rs58; 2026-02-21T09:05:41.0839873Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.0840231Z cvt.s16.s8 %rs65, %rs45; 2026-02-21T09:05:41.0840410Z shr.s16 %rs66, %rs65, 4; 2026-02-21T09:05:41.0840578Z cvt.s16.s8 %rs67, %rs46; 2026-02-21T09:05:41.0840749Z shr.s16 %rs68, %rs67, 4; 2026-02-21T09:05:41.0840918Z prmt.b32 %r8688, %r8666, 0, 0xaaa2U; 2026-02-21T09:05:41.0841117Z cvt.u16.u32 %rs69, %r8688; 2026-02-21T09:05:41.0841291Z shr.s16 %rs70, %rs69, 4; 2026-02-21T09:05:41.0841542Z prmt.b32 %r8689, %r8671, 0, 0xaaa2U; 2026-02-21T09:05:41.0841797Z cvt.u16.u32 %rs71, %r8689; 2026-02-21T09:05:41.0841980Z shr.s16 %rs72, %rs71, 4; 2026-02-21T09:05:41.0842281Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.0842633Z cvt.rn.f32.s16 %r8690, %rs72; 2026-02-21T09:05:41.0842830Z cvt.rn.f32.s16 %r8691, %rs70; 2026-02-21T09:05:41.0843010Z cvt.rn.f32.s16 %r8692, %rs68; 2026-02-21T09:05:41.0843189Z cvt.rn.f32.s16 %r8693, %rs66; 2026-02-21T09:05:41.0843499Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.0843851Z cvt.s16.s8 %rs73, %rs47; 2026-02-21T09:05:41.0844018Z shr.s16 %rs74, %rs73, 4; 2026-02-21T09:05:41.0844191Z cvt.s16.s8 %rs75, %rs48; 2026-02-21T09:05:41.0844355Z shr.s16 %rs76, %rs75, 4; 2026-02-21T09:05:41.0844531Z prmt.b32 %r8694, %r8666, 0, 0xbbb3U; 2026-02-21T09:05:41.0844732Z cvt.u16.u32 %rs77, %r8694; 2026-02-21T09:05:41.0844905Z shr.s16 %rs78, %rs77, 4; 2026-02-21T09:05:41.0845157Z prmt.b32 %r8695, %r8671, 0, 0xbbb3U; 2026-02-21T09:05:41.0845369Z cvt.u16.u32 %rs79, %r8695; 2026-02-21T09:05:41.0845550Z shr.s16 %rs80, %rs79, 4; 2026-02-21T09:05:41.0845855Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.0846205Z cvt.rn.f32.s16 %r8696, %rs80; 2026-02-21T09:05:41.0846386Z cvt.rn.f32.s16 %r8697, %rs78; 
2026-02-21T09:05:41.0846697Z cvt.rn.f32.s16 %r8698, %rs76; 2026-02-21T09:05:41.0846880Z cvt.rn.f32.s16 %r8699, %rs74; 2026-02-21T09:05:41.0847051Z bar.sync 0; 2026-02-21T09:05:41.0847249Z st.shared.v4.b32 [%r24], {%r8681, %r8679, %r8680, %r8678}; 2026-02-21T09:05:41.0847541Z st.shared.v4.b32 [%r25], {%r8687, %r8685, %r8686, %r8684}; 2026-02-21T09:05:41.0847832Z st.shared.v4.b32 [%r26], {%r8693, %r8691, %r8692, %r8690}; 2026-02-21T09:05:41.0848115Z st.shared.v4.b32 [%r27], {%r8699, %r8697, %r8698, %r8696}; 2026-02-21T09:05:41.0848359Z $L__tmp1: 2026-02-21T09:05:41.0848724Z .loc 2 291 36 // standard.py:291:36 @[ c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:91:40 ] 2026-02-21T09:05:41.0849138Z // begin inline asm 2026-02-21T09:05:41.0849332Z fence.proxy.async.shared::cta; 2026-02-21T09:05:41.0864451Z // end inline asm 2026-02-21T09:05:41.0864637Z bar.sync 0; 2026-02-21T09:05:41.0864898Z shfl.sync.idx.b32 %r8700, %r4, 0, 31, -1; 2026-02-21T09:05:41.0865212Z wgmma.fence.sync.aligned; 2026-02-21T09:05:41.0865442Z mov.pred %p20, -1; 2026-02-21T09:05:41.0865612Z // begin inline asm 2026-02-21T09:05:41.0867331Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29096,%r29097,%r29098,%r29099,%r29100,%r29101,%r29102,%r29103,%r29104,%r29105,%r29106,%r29107,%r29108,%r29109,%r29110,%r29111,%r29112,%r29113,%r29114,%r29115,%r29116,%r29117,%r29118,%r29119,%r29120,%r29121,%r29122,%r29123,%r29124,%r29125,%r29126,%r29127,%r29128,%r29129,%r29130,%r29131,%r29132,%r29133,%r29134,%r29135,%r29136,%r29137,%r29138,%r29139,%r29140,%r29141,%r29142,%r29143,%r29144,%r29145,%r29146,%r29147,%r29148,%r29149,%r29150,%r29151,%r29152,%r29153,%r29154,%r29155,%r29156,%r29157,%r29158,%r29159}, {%r2438,%r2439,%r2440,%r2441}, %rd2, %p20, 1, 1; 2026-02-21T09:05:41.0869268Z // end inline asm 2026-02-21T09:05:41.0869437Z // begin inline asm 2026-02-21T09:05:41.0871004Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29096,%r29097,%r29098,%r29099,%r29100,%r29101,%r29102,%r29103,%r29104,%r29105,%r29106,%r29107,%r29108,%r29109,%r29110,%r29111,%r29112,%r29113,%r29114,%r29115,%r29116,%r29117,%r29118,%r29119,%r29120,%r29121,%r29122,%r29123,%r29124,%r29125,%r29126,%r29127,%r29128,%r29129,%r29130,%r29131,%r29132,%r29133,%r29134,%r29135,%r29136,%r29137,%r29138,%r29139,%r29140,%r29141,%r29142,%r29143,%r29144,%r29145,%r29146,%r29147,%r29148,%r29149,%r29150,%r29151,%r29152,%r29153,%r29154,%r29155,%r29156,%r29157,%r29158,%r29159}, {%r2570,%r2571,%r2572,%r2573}, %rd3, %p20, 1, 1; 2026-02-21T09:05:41.0872612Z // end inline asm 2026-02-21T09:05:41.0872887Z // begin inline asm 2026-02-21T09:05:41.0874536Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29160,%r29161,%r29162,%r29163,%r29164,%r29165,%r29166,%r29167,%r29168,%r29169,%r29170,%r29171,%r29172,%r29173,%r29174,%r29175,%r29176,%r29177,%r29178,%r29179,%r29180,%r29181,%r29182,%r29183,%r29184,%r29185,%r29186,%r29187,%r29188,%r29189,%r29190,%r29191,%r29192,%r29193,%r29194,%r29195,%r29196,%r29197,%r29198,%r29199,%r29200,%r29201,%r29202,%r29203,%r29204,%r29205,%r29206,%r29207,%r29208,%r29209,%r29210,%r29211,%r29212,%r29213,%r29214,%r29215,%r29216,%r29217,%r29218,%r29219,%r29220,%r29221,%r29222,%r29223}, {%r2702,%r2703,%r2704,%r2705}, %rd2, %p20, 1, 1; 2026-02-21T09:05:41.0876136Z // end inline asm 2026-02-21T09:05:41.0876295Z // begin inline asm 2026-02-21T09:05:41.0878071Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29160,%r29161,%r29162,%r29163,%r29164,%r29165,%r29166,%r29167,%r29168,%r29169,%r29170,%r29171,%r29172,%r29173,%r29174,%r29175,%r29176,%r29177,%r29178,%r29179,%r29180,%r29181,%r29182,%r29183,%r29184,%r29185,%r29186,%r29187,%r29188,%r29189,%r29190,%r29191,%r29192,%r29193,%r29194,%r29195,%r29196,%r29197,%r29198,%r29199,%r29200,%r29201,%r29202,%r29203,%r29204,%r29205,%r29206,%r29207,%r29208,%r29209,%r29210,%r29211,%r29212,%r29213,%r29214,%r29215,%r29216,%r29217,%r29218,%r29219,%r29220,%r29221,%r29222,%r29223}, 
{%r2834,%r2835,%r2836,%r2837}, %rd3, %p20, 1, 1; 2026-02-21T09:05:41.0879669Z // end inline asm 2026-02-21T09:05:41.0879838Z // begin inline asm 2026-02-21T09:05:41.0881442Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29224,%r29225,%r29226,%r29227,%r29228,%r29229,%r29230,%r29231,%r29232,%r29233,%r29234,%r29235,%r29236,%r29237,%r29238,%r29239,%r29240,%r29241,%r29242,%r29243,%r29244,%r29245,%r29246,%r29247,%r29248,%r29249,%r29250,%r29251,%r29252,%r29253,%r29254,%r29255,%r29256,%r29257,%r29258,%r29259,%r29260,%r29261,%r29262,%r29263,%r29264,%r29265,%r29266,%r29267,%r29268,%r29269,%r29270,%r29271,%r29272,%r29273,%r29274,%r29275,%r29276,%r29277,%r29278,%r29279,%r29280,%r29281,%r29282,%r29283,%r29284,%r29285,%r29286,%r29287}, {%r2966,%r2967,%r2968,%r2969}, %rd2, %p20, 1, 1; 2026-02-21T09:05:41.0883066Z // end inline asm 2026-02-21T09:05:41.0883225Z // begin inline asm 2026-02-21T09:05:41.0884822Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29224,%r29225,%r29226,%r29227,%r29228,%r29229,%r29230,%r29231,%r29232,%r29233,%r29234,%r29235,%r29236,%r29237,%r29238,%r29239,%r29240,%r29241,%r29242,%r29243,%r29244,%r29245,%r29246,%r29247,%r29248,%r29249,%r29250,%r29251,%r29252,%r29253,%r29254,%r29255,%r29256,%r29257,%r29258,%r29259,%r29260,%r29261,%r29262,%r29263,%r29264,%r29265,%r29266,%r29267,%r29268,%r29269,%r29270,%r29271,%r29272,%r29273,%r29274,%r29275,%r29276,%r29277,%r29278,%r29279,%r29280,%r29281,%r29282,%r29283,%r29284,%r29285,%r29286,%r29287}, {%r3098,%r3099,%r3100,%r3101}, %rd3, %p20, 1, 1; 2026-02-21T09:05:41.0886438Z // end inline asm 2026-02-21T09:05:41.0886749Z // begin inline asm 2026-02-21T09:05:41.0888328Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29288,%r29289,%r29290,%r29291,%r29292,%r29293,%r29294,%r29295,%r29296,%r29297,%r29298,%r29299,%r29300,%r29301,%r29302,%r29303,%r29304,%r29305,%r29306,%r29307,%r29308,%r29309,%r29310,%r29311,%r29312,%r29313,%r29314,%r29315,%r29316,%r29317,%r29318,%r29319,%r29320,%r29321,%r29322,%r29323,%r29324,%r29325,%r29326,%r29327,%r29328,%r29329,%r29330,%r29331,%r29332,%r29333,%r29334,%r29335,%r29336,%r29337,%r29338,%r29339,%r29340,%r29341,%r29342,%r29343,%r29344,%r29345,%r29346,%r29347,%r29348,%r29349,%r29350,%r29351}, {%r3230,%r3231,%r3232,%r3233}, %rd2, %p20, 1, 1; 2026-02-21T09:05:41.0890019Z // end inline asm 2026-02-21T09:05:41.0890177Z // begin inline asm 2026-02-21T09:05:41.0891821Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29288,%r29289,%r29290,%r29291,%r29292,%r29293,%r29294,%r29295,%r29296,%r29297,%r29298,%r29299,%r29300,%r29301,%r29302,%r29303,%r29304,%r29305,%r29306,%r29307,%r29308,%r29309,%r29310,%r29311,%r29312,%r29313,%r29314,%r29315,%r29316,%r29317,%r29318,%r29319,%r29320,%r29321,%r29322,%r29323,%r29324,%r29325,%r29326,%r29327,%r29328,%r29329,%r29330,%r29331,%r29332,%r29333,%r29334,%r29335,%r29336,%r29337,%r29338,%r29339,%r29340,%r29341,%r29342,%r29343,%r29344,%r29345,%r29346,%r29347,%r29348,%r29349,%r29350,%r29351}, {%r3362,%r3363,%r3364,%r3365}, %rd3, %p20, 1, 1; 2026-02-21T09:05:41.0893495Z // end inline asm 2026-02-21T09:05:41.0893666Z wgmma.commit_group.sync.aligned; 2026-02-21T09:05:41.0893881Z mov.b32 %r8400, 0; 2026-02-21T09:05:41.0894041Z mov.b32 %r3622, %r24001; 2026-02-21T09:05:41.0894218Z mov.b32 %r3623, %r8400; 2026-02-21T09:05:41.0894384Z mov.b32 %r3624, %r8400; 2026-02-21T09:05:41.0894571Z // begin inline asm 2026-02-21T09:05:41.0899851Z // wait for regs: 
%r29096,%r29097,%r29098,%r29099,%r29100,%r29101,%r29102,%r29103,%r29104,%r29105,%r29106,%r29107,%r29108,%r29109,%r29110,%r29111,%r29112,%r29113,%r29114,%r29115,%r29116,%r29117,%r29118,%r29119,%r29120,%r29121,%r29122,%r29123,%r29124,%r29125,%r29126,%r29127,%r29128,%r29129,%r29130,%r29131,%r29132,%r29133,%r29134,%r29135,%r29136,%r29137,%r29138,%r29139,%r29140,%r29141,%r29142,%r29143,%r29144,%r29145,%r29146,%r29147,%r29148,%r29149,%r29150,%r29151,%r29152,%r29153,%r29154,%r29155,%r29156,%r29157,%r29158,%r29159,%r29160,%r29161,%r29162,%r29163,%r29164,%r29165,%r29166,%r29167,%r29168,%r29169,%r29170,%r29171,%r29172,%r29173,%r29174,%r29175,%r29176,%r29177,%r29178,%r29179,%r29180,%r29181,%r29182,%r29183,%r29184,%r29185,%r29186,%r29187,%r29188,%r29189,%r29190,%r29191,%r29192,%r29193,%r29194,%r29195,%r29196,%r29197,%r29198,%r29199,%r29200,%r29201,%r29202,%r29203,%r29204,%r29205,%r29206,%r29207,%r29208,%r29209,%r29210,%r29211,%r29212,%r29213,%r29214,%r29215,%r29216,%r29217,%r29218,%r29219,%r29220,%r29221,%r29222,%r29223,%r29224,%r29225,%r29226,%r29227,%r29228,%r29229,%r29230,%r29231,%r29232,%r29233,%r29234,%r29235,%r29236,%r29237,%r29238,%r29239,%r29240,%r29241,%r29242,%r29243,%r29244,%r29245,%r29246,%r29247,%r29248,%r29249,%r29250,%r29251,%r29252,%r29253,%r29254,%r29255,%r29256,%r29257,%r29258,%r29259,%r29260,%r29261,%r29262,%r29263,%r29264,%r29265,%r29266,%r29267,%r29268,%r29269,%r29270,%r29271,%r29272,%r29273,%r29274,%r29275,%r29276,%r29277,%r29278,%r29279,%r29280,%r29281,%r29282,%r29283,%r29284,%r29285,%r29286,%r29287,%r29288,%r29289,%r29290,%r29291,%r29292,%r29293,%r29294,%r29295,%r29296,%r29297,%r29298,%r29299,%r29300,%r29301,%r29302,%r29303,%r29304,%r29305,%r29306,%r29307,%r29308,%r29309,%r29310,%r29311,%r29312,%r29313,%r29314,%r29315,%r29316,%r29317,%r29318,%r29319,%r29320,%r29321,%r29322,%r29323,%r29324,%r29325,%r29326,%r29327,%r29328,%r29329,%r29330,%r29331,%r29332,%r29333,%r29334,%r29335,%r29336,%r29337,%r29338,%r29339,%r29340,%r29341,%r29342,%r29343,%r29344,%r29345,
%r29346,%r29347,%r29348,%r29349,%r29350,%r29351,%r3622,%r3623,%r3624 2026-02-21T09:05:41.0905359Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:05:41.0905565Z // end inline asm 2026-02-21T09:05:41.0905716Z $L__tmp2: 2026-02-21T09:05:41.0906023Z .loc 1 55 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:32 2026-02-21T09:05:41.0906402Z add.s64 %rd82, %rd69, 32; 2026-02-21T09:05:41.0906716Z add.s64 %rd83, %rd70, 32; 2026-02-21T09:05:41.0906905Z add.s64 %rd84, %rd71, 32; 2026-02-21T09:05:41.0907229Z .loc 1 55 80 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:80 2026-02-21T09:05:41.0907680Z add.s64 %rd85, %rd72, 32; 2026-02-21T09:05:41.0907849Z // begin inline asm 2026-02-21T09:05:41.0908018Z mov.u32 %r3884, 0x0; 2026-02-21T09:05:41.0908190Z mov.u32 %r3885, 0x0; 2026-02-21T09:05:41.0908432Z mov.u32 %r3886, 0x0; 2026-02-21T09:05:41.0908588Z mov.u32 %r3887, 0x0; 2026-02-21T09:05:41.0908826Z ld.global.v4.b32 { %r3884, %r3885, %r3886, %r3887 }, [ %rd82 + 0 ]; 2026-02-21T09:05:41.0909093Z // end inline asm 2026-02-21T09:05:41.0909253Z // begin inline asm 2026-02-21T09:05:41.0909421Z mov.u32 %r3888, 0x0; 2026-02-21T09:05:41.0909575Z mov.u32 %r3889, 0x0; 2026-02-21T09:05:41.0909733Z mov.u32 %r3890, 0x0; 2026-02-21T09:05:41.0909884Z mov.u32 %r3891, 0x0; 2026-02-21T09:05:41.0910291Z ld.global.v4.b32 { %r3888, %r3889, %r3890, %r3891 }, [ %rd83 + 0 ]; 2026-02-21T09:05:41.0910556Z // end inline asm 2026-02-21T09:05:41.0910710Z // begin inline asm 2026-02-21T09:05:41.0910876Z mov.u32 %r3892, 0x0; 2026-02-21T09:05:41.0911035Z mov.u32 %r3893, 0x0; 2026-02-21T09:05:41.0911194Z mov.u32 %r3894, 0x0; 2026-02-21T09:05:41.0911352Z mov.u32 %r3895, 0x0; 2026-02-21T09:05:41.0911564Z ld.global.v4.b32 { %r3892, %r3893, %r3894, %r3895 }, [ %rd84 + 0 ]; 2026-02-21T09:05:41.0911821Z // end inline asm 2026-02-21T09:05:41.0911976Z // begin inline asm 2026-02-21T09:05:41.0912125Z mov.u32 %r3896, 0x0; 2026-02-21T09:05:41.0912281Z mov.u32 %r3897, 0x0; 
2026-02-21T09:05:41.0912432Z mov.u32 %r3898, 0x0; 2026-02-21T09:05:41.0912589Z mov.u32 %r3899, 0x0; 2026-02-21T09:05:41.0912803Z ld.global.v4.b32 { %r3896, %r3897, %r3898, %r3899 }, [ %rd85 + 0 ]; 2026-02-21T09:05:41.0913066Z // end inline asm 2026-02-21T09:05:41.0913364Z .loc 1 59 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:59:32 2026-02-21T09:05:41.0913810Z bar.sync 0; 2026-02-21T09:05:41.0913995Z st.shared.v2.b32 [%r10], {%r3884, %r3885}; 2026-02-21T09:05:41.0914235Z st.shared.v2.b32 [%r10+2048], {%r3888, %r3889}; 2026-02-21T09:05:41.0914493Z st.shared.v2.b32 [%r10+4096], {%r3892, %r3893}; 2026-02-21T09:05:41.0914735Z st.shared.v2.b32 [%r10+6144], {%r3896, %r3897}; 2026-02-21T09:05:41.0914972Z st.shared.v2.b32 [%r11], {%r3886, %r3887}; 2026-02-21T09:05:41.0915215Z st.shared.v2.b32 [%r11+2048], {%r3890, %r3891}; 2026-02-21T09:05:41.0915482Z st.shared.v2.b32 [%r11+4096], {%r3894, %r3895}; 2026-02-21T09:05:41.0915730Z st.shared.v2.b32 [%r11+6144], {%r3898, %r3899}; 2026-02-21T09:05:41.0915951Z bar.sync 0; 2026-02-21T09:05:41.0916114Z ld.shared.b16 %rs81, [%r12]; 2026-02-21T09:05:41.0916308Z ld.shared.b16 %rs82, [%r12+256]; 2026-02-21T09:05:41.0916641Z ld.shared.b16 %rs83, [%r12+16]; 2026-02-21T09:05:41.0916845Z ld.shared.b16 %rs84, [%r12+272]; 2026-02-21T09:05:41.0917066Z ld.shared.b16 %rs85, [%r12+2048]; 2026-02-21T09:05:41.0917271Z ld.shared.b16 %rs86, [%r12+2304]; 2026-02-21T09:05:41.0917468Z ld.shared.b16 %rs87, [%r12+2064]; 2026-02-21T09:05:41.0917666Z ld.shared.b16 %rs88, [%r12+2320]; 2026-02-21T09:05:41.0917860Z ld.shared.b16 %rs89, [%r12+4096]; 2026-02-21T09:05:41.0918057Z ld.shared.b16 %rs90, [%r12+4352]; 2026-02-21T09:05:41.0918245Z ld.shared.b16 %rs91, [%r12+4112]; 2026-02-21T09:05:41.0918442Z ld.shared.b16 %rs92, [%r12+4368]; 2026-02-21T09:05:41.0918630Z ld.shared.b16 %rs93, [%r12+6144]; 2026-02-21T09:05:41.0918826Z ld.shared.b16 %rs94, [%r12+6400]; 2026-02-21T09:05:41.0919019Z ld.shared.b16 %rs95, [%r12+6160]; 
2026-02-21T09:05:41.0919216Z ld.shared.b16 %rs96, [%r12+6416]; 2026-02-21T09:05:41.0919410Z ld.shared.b16 %rs97, [%r13]; 2026-02-21T09:05:41.0919608Z ld.shared.b16 %rs98, [%r13+256]; 2026-02-21T09:05:41.0919815Z ld.shared.b16 %rs99, [%r13+16]; 2026-02-21T09:05:41.0920019Z ld.shared.b16 %rs100, [%r13+272]; 2026-02-21T09:05:41.0920227Z ld.shared.b16 %rs101, [%r13+2048]; 2026-02-21T09:05:41.0920428Z ld.shared.b16 %rs102, [%r13+2304]; 2026-02-21T09:05:41.0920634Z ld.shared.b16 %rs103, [%r13+2064]; 2026-02-21T09:05:41.0920830Z ld.shared.b16 %rs104, [%r13+2320]; 2026-02-21T09:05:41.0921121Z ld.shared.b16 %rs105, [%r13+4096]; 2026-02-21T09:05:41.0921313Z ld.shared.b16 %rs106, [%r13+4352]; 2026-02-21T09:05:41.0921515Z ld.shared.b16 %rs107, [%r13+4112]; 2026-02-21T09:05:41.0921716Z ld.shared.b16 %rs108, [%r13+4368]; 2026-02-21T09:05:41.0921907Z ld.shared.b16 %rs109, [%r13+6144]; 2026-02-21T09:05:41.0922108Z ld.shared.b16 %rs110, [%r13+6400]; 2026-02-21T09:05:41.0922302Z ld.shared.b16 %rs111, [%r13+6160]; 2026-02-21T09:05:41.0922500Z ld.shared.b16 %rs112, [%r13+6416]; 2026-02-21T09:05:41.0922694Z cvt.f32.bf16 %r4030, %rs81; 2026-02-21T09:05:41.0922881Z cvt.f32.bf16 %r4031, %rs82; 2026-02-21T09:05:41.0923058Z cvt.f32.bf16 %r4032, %rs97; 2026-02-21T09:05:41.0923241Z cvt.f32.bf16 %r4033, %rs98; 2026-02-21T09:05:41.0923576Z cvt.f32.bf16 %r4162, %rs83; 2026-02-21T09:05:41.0923761Z cvt.f32.bf16 %r4163, %rs84; 2026-02-21T09:05:41.0923940Z cvt.f32.bf16 %r4164, %rs99; 2026-02-21T09:05:41.0924116Z cvt.f32.bf16 %r4165, %rs100; 2026-02-21T09:05:41.0924302Z cvt.f32.bf16 %r4294, %rs85; 2026-02-21T09:05:41.0924476Z cvt.f32.bf16 %r4295, %rs86; 2026-02-21T09:05:41.0924664Z cvt.f32.bf16 %r4296, %rs101; 2026-02-21T09:05:41.0924857Z cvt.f32.bf16 %r4297, %rs102; 2026-02-21T09:05:41.0925041Z cvt.f32.bf16 %r4426, %rs87; 2026-02-21T09:05:41.0925213Z cvt.f32.bf16 %r4427, %rs88; 2026-02-21T09:05:41.0925395Z cvt.f32.bf16 %r4428, %rs103; 2026-02-21T09:05:41.0925574Z cvt.f32.bf16 %r4429, %rs104; 
2026-02-21T09:05:41.0925749Z cvt.f32.bf16 %r4558, %rs89; 2026-02-21T09:05:41.0925931Z cvt.f32.bf16 %r4559, %rs90; 2026-02-21T09:05:41.0926104Z cvt.f32.bf16 %r4560, %rs105; 2026-02-21T09:05:41.0926285Z cvt.f32.bf16 %r4561, %rs106; 2026-02-21T09:05:41.0926598Z cvt.f32.bf16 %r4690, %rs91; 2026-02-21T09:05:41.0926795Z cvt.f32.bf16 %r4691, %rs92; 2026-02-21T09:05:41.0927048Z cvt.f32.bf16 %r4692, %rs107; 2026-02-21T09:05:41.0927234Z cvt.f32.bf16 %r4693, %rs108; 2026-02-21T09:05:41.0927410Z cvt.f32.bf16 %r4822, %rs93; 2026-02-21T09:05:41.0927594Z cvt.f32.bf16 %r4823, %rs94; 2026-02-21T09:05:41.0927776Z cvt.f32.bf16 %r4824, %rs109; 2026-02-21T09:05:41.0927951Z cvt.f32.bf16 %r4825, %rs110; 2026-02-21T09:05:41.0928145Z cvt.f32.bf16 %r4954, %rs95; 2026-02-21T09:05:41.0928325Z cvt.f32.bf16 %r4955, %rs96; 2026-02-21T09:05:41.0928508Z cvt.f32.bf16 %r4956, %rs111; 2026-02-21T09:05:41.0928683Z cvt.f32.bf16 %r4957, %rs112; 2026-02-21T09:05:41.0929022Z .loc 1 61 34 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:34 2026-02-21T09:05:41.0929394Z add.s32 %r8701, %r29095, 65536; 2026-02-21T09:05:41.0929582Z cvt.s64.s32 %rd122, %r8701; 2026-02-21T09:05:41.0929772Z add.s64 %rd86, %rd45, %rd122; 2026-02-21T09:05:41.0930094Z .loc 1 61 87 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:87 2026-02-21T09:05:41.0930458Z // begin inline asm 2026-02-21T09:05:41.0930618Z mov.u32 %r3900, 0x0; 2026-02-21T09:05:41.0930784Z mov.u32 %r3901, 0x0; 2026-02-21T09:05:41.0930978Z ld.global.v2.b32 { %r3900, %r3901 }, [ %rd86 + 0 ]; 2026-02-21T09:05:41.0931222Z // end inline asm 2026-02-21T09:05:41.0931522Z .loc 1 69 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:69:28 2026-02-21T09:05:41.0931865Z bar.sync 0; 2026-02-21T09:05:41.0932024Z st.shared.b8 [%r14], %r3900; 2026-02-21T09:05:41.0932216Z prmt.b32 %r8702, %r3900, 0, 0x7771U; 2026-02-21T09:05:41.0932424Z st.shared.b8 [%r15], %r8702; 2026-02-21T09:05:41.0932620Z prmt.b32 %r8703, %r3900, 0, 0x7772U; 
2026-02-21T09:05:41.0932831Z st.shared.b8 [%r16+256], %r8703; 2026-02-21T09:05:41.0933025Z prmt.b32 %r8704, %r3900, 0, 0x7773U; 2026-02-21T09:05:41.0933228Z st.shared.b8 [%r17+256], %r8704; 2026-02-21T09:05:41.0933425Z st.shared.b8 [%r18+512], %r3901; 2026-02-21T09:05:41.0933617Z prmt.b32 %r8705, %r3901, 0, 0x7771U; 2026-02-21T09:05:41.0933820Z st.shared.b8 [%r19+512], %r8705; 2026-02-21T09:05:41.0934011Z prmt.b32 %r8706, %r3901, 0, 0x7772U; 2026-02-21T09:05:41.0934289Z st.shared.b8 [%r20+768], %r8706; 2026-02-21T09:05:41.0934474Z prmt.b32 %r8707, %r3901, 0, 0x7773U; 2026-02-21T09:05:41.0934672Z st.shared.b8 [%r21+768], %r8707; 2026-02-21T09:05:41.0934849Z bar.sync 0; 2026-02-21T09:05:41.0935015Z ld.shared.b32 %r8708, [%r22]; 2026-02-21T09:05:41.0935209Z prmt.b32 %r8709, %r8708, 0, 0x7770U; 2026-02-21T09:05:41.0935403Z cvt.u16.u32 %rs113, %r8709; 2026-02-21T09:05:41.0935589Z prmt.b32 %r8710, %r8708, 0, 0x7771U; 2026-02-21T09:05:41.0935794Z cvt.u16.u32 %rs114, %r8710; 2026-02-21T09:05:41.0935993Z prmt.b32 %r8711, %r8708, 0, 0x7772U; 2026-02-21T09:05:41.0936192Z cvt.u16.u32 %rs115, %r8711; 2026-02-21T09:05:41.0936379Z prmt.b32 %r8712, %r8708, 0, 0x7773U; 2026-02-21T09:05:41.0936702Z cvt.u16.u32 %rs116, %r8712; 2026-02-21T09:05:41.0937038Z ld.shared.b32 %r8713, [%r23]; 2026-02-21T09:05:41.0937239Z prmt.b32 %r8714, %r8713, 0, 0x7770U; 2026-02-21T09:05:41.0937432Z cvt.u16.u32 %rs117, %r8714; 2026-02-21T09:05:41.0937615Z prmt.b32 %r8715, %r8713, 0, 0x7771U; 2026-02-21T09:05:41.0937825Z cvt.u16.u32 %rs118, %r8715; 2026-02-21T09:05:41.0938011Z prmt.b32 %r8716, %r8713, 0, 0x7772U; 2026-02-21T09:05:41.0938205Z cvt.u16.u32 %rs119, %r8716; 2026-02-21T09:05:41.0938388Z prmt.b32 %r8717, %r8713, 0, 0x7773U; 2026-02-21T09:05:41.0938579Z cvt.u16.u32 %rs120, %r8717; 2026-02-21T09:05:41.0938910Z .loc 1 64 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:64:28 2026-02-21T09:05:41.0939276Z shl.b16 %rs121, %rs113, 4; 2026-02-21T09:05:41.0939458Z shl.b16 %rs122, %rs117, 4; 
2026-02-21T09:05:41.0939641Z shl.b16 %rs123, %rs114, 4; 2026-02-21T09:05:41.0939833Z shl.b16 %rs124, %rs118, 4; 2026-02-21T09:05:41.0940010Z shl.b16 %rs125, %rs115, 4; 2026-02-21T09:05:41.0940191Z shl.b16 %rs126, %rs119, 4; 2026-02-21T09:05:41.0940436Z shl.b16 %rs127, %rs116, 4; 2026-02-21T09:05:41.0940620Z shl.b16 %rs128, %rs120, 4; 2026-02-21T09:05:41.0940936Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.0941315Z cvt.s16.s8 %rs129, %rs121; 2026-02-21T09:05:41.0941495Z shr.s16 %rs130, %rs129, 4; 2026-02-21T09:05:41.0941676Z cvt.s16.s8 %rs131, %rs122; 2026-02-21T09:05:41.0941854Z shr.s16 %rs132, %rs131, 4; 2026-02-21T09:05:41.0942032Z prmt.b32 %r8718, %r8708, 0, 0x8880U; 2026-02-21T09:05:41.0942233Z cvt.u16.u32 %rs133, %r8718; 2026-02-21T09:05:41.0942409Z shr.s16 %rs134, %rs133, 4; 2026-02-21T09:05:41.0942590Z prmt.b32 %r8719, %r8713, 0, 0x8880U; 2026-02-21T09:05:41.0942783Z cvt.u16.u32 %rs135, %r8719; 2026-02-21T09:05:41.0942959Z shr.s16 %rs136, %rs135, 4; 2026-02-21T09:05:41.0943270Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.0943634Z cvt.rn.f32.s16 %r8720, %rs136; 2026-02-21T09:05:41.0943828Z cvt.rn.f32.s16 %r8721, %rs134; 2026-02-21T09:05:41.0944013Z cvt.rn.f32.s16 %r8722, %rs132; 2026-02-21T09:05:41.0944209Z cvt.rn.f32.s16 %r8723, %rs130; 2026-02-21T09:05:41.0944528Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.0944883Z cvt.s16.s8 %rs137, %rs123; 2026-02-21T09:05:41.0945058Z shr.s16 %rs138, %rs137, 4; 2026-02-21T09:05:41.0945237Z cvt.s16.s8 %rs139, %rs124; 2026-02-21T09:05:41.0945406Z shr.s16 %rs140, %rs139, 4; 2026-02-21T09:05:41.0945587Z prmt.b32 %r8724, %r8708, 0, 0x9991U; 2026-02-21T09:05:41.0945782Z cvt.u16.u32 %rs141, %r8724; 2026-02-21T09:05:41.0945961Z shr.s16 %rs142, %rs141, 4; 2026-02-21T09:05:41.0946144Z prmt.b32 %r8725, %r8713, 0, 0x9991U; 2026-02-21T09:05:41.0946337Z cvt.u16.u32 %rs143, 
%r8725; 2026-02-21T09:05:41.0946400Z shr.s16 %rs144, %rs143, 4; 2026-02-21T09:05:41.0946732Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.0946811Z cvt.rn.f32.s16 %r8726, %rs144; 2026-02-21T09:05:41.0946878Z cvt.rn.f32.s16 %r8727, %rs142; 2026-02-21T09:05:41.0946949Z cvt.rn.f32.s16 %r8728, %rs140; 2026-02-21T09:05:41.0947099Z cvt.rn.f32.s16 %r8729, %rs138; 2026-02-21T09:05:41.0947303Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.0947376Z cvt.s16.s8 %rs145, %rs125; 2026-02-21T09:05:41.0947439Z shr.s16 %rs146, %rs145, 4; 2026-02-21T09:05:41.0947501Z cvt.s16.s8 %rs147, %rs126; 2026-02-21T09:05:41.0947564Z shr.s16 %rs148, %rs147, 4; 2026-02-21T09:05:41.0947639Z prmt.b32 %r8730, %r8708, 0, 0xaaa2U; 2026-02-21T09:05:41.0947702Z cvt.u16.u32 %rs149, %r8730; 2026-02-21T09:05:41.0947767Z shr.s16 %rs150, %rs149, 4; 2026-02-21T09:05:41.0947841Z prmt.b32 %r8731, %r8713, 0, 0xaaa2U; 2026-02-21T09:05:41.0947902Z cvt.u16.u32 %rs151, %r8731; 2026-02-21T09:05:41.0948048Z shr.s16 %rs152, %rs151, 4; 2026-02-21T09:05:41.0948377Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.0948459Z cvt.rn.f32.s16 %r8732, %rs152; 2026-02-21T09:05:41.0948527Z cvt.rn.f32.s16 %r8733, %rs150; 2026-02-21T09:05:41.0948592Z cvt.rn.f32.s16 %r8734, %rs148; 2026-02-21T09:05:41.0948662Z cvt.rn.f32.s16 %r8735, %rs146; 2026-02-21T09:05:41.0948867Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.0948931Z cvt.s16.s8 %rs153, %rs127; 2026-02-21T09:05:41.0948998Z shr.s16 %rs154, %rs153, 4; 2026-02-21T09:05:41.0949060Z cvt.s16.s8 %rs155, %rs128; 2026-02-21T09:05:41.0949123Z shr.s16 %rs156, %rs155, 4; 2026-02-21T09:05:41.0949192Z prmt.b32 %r8736, %r8708, 0, 0xbbb3U; 2026-02-21T09:05:41.0949262Z cvt.u16.u32 %rs157, %r8736; 2026-02-21T09:05:41.0949324Z shr.s16 %rs158, %rs157, 4; 2026-02-21T09:05:41.0949391Z prmt.b32 
%r8737, %r8713, 0, 0xbbb3U; 2026-02-21T09:05:41.0949474Z cvt.u16.u32 %rs159, %r8737; 2026-02-21T09:05:41.0949607Z shr.s16 %rs160, %rs159, 4; 2026-02-21T09:05:41.0949812Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.0949882Z cvt.rn.f32.s16 %r8738, %rs160; 2026-02-21T09:05:41.0949955Z cvt.rn.f32.s16 %r8739, %rs158; 2026-02-21T09:05:41.0950020Z cvt.rn.f32.s16 %r8740, %rs156; 2026-02-21T09:05:41.0950087Z cvt.rn.f32.s16 %r8741, %rs154; 2026-02-21T09:05:41.0950152Z bar.sync 0; 2026-02-21T09:05:41.0950263Z st.shared.v4.b32 [%r24], {%r8723, %r8721, %r8722, %r8720}; 2026-02-21T09:05:41.0950372Z st.shared.v4.b32 [%r25], {%r8729, %r8727, %r8728, %r8726}; 2026-02-21T09:05:41.0950479Z st.shared.v4.b32 [%r26], {%r8735, %r8733, %r8734, %r8732}; 2026-02-21T09:05:41.0950582Z st.shared.v4.b32 [%r27], {%r8741, %r8739, %r8740, %r8738}; 2026-02-21T09:05:41.0950642Z $L__tmp3: 2026-02-21T09:05:41.0950924Z .loc 2 291 36 // standard.py:291:36 @[ c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:91:40 ] 2026-02-21T09:05:41.0950997Z // begin inline asm 2026-02-21T09:05:41.0951092Z fence.proxy.async.shared::cta; 2026-02-21T09:05:41.0951153Z // end inline asm 2026-02-21T09:05:41.0951219Z bar.sync 0; 2026-02-21T09:05:41.0951295Z wgmma.fence.sync.aligned; 2026-02-21T09:05:41.0951357Z // begin inline asm 2026-02-21T09:05:41.0952853Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29096,%r29097,%r29098,%r29099,%r29100,%r29101,%r29102,%r29103,%r29104,%r29105,%r29106,%r29107,%r29108,%r29109,%r29110,%r29111,%r29112,%r29113,%r29114,%r29115,%r29116,%r29117,%r29118,%r29119,%r29120,%r29121,%r29122,%r29123,%r29124,%r29125,%r29126,%r29127,%r29128,%r29129,%r29130,%r29131,%r29132,%r29133,%r29134,%r29135,%r29136,%r29137,%r29138,%r29139,%r29140,%r29141,%r29142,%r29143,%r29144,%r29145,%r29146,%r29147,%r29148,%r29149,%r29150,%r29151,%r29152,%r29153,%r29154,%r29155,%r29156,%r29157,%r29158,%r29159}, {%r4030,%r4031,%r4032,%r4033}, %rd2, %p20, 1, 1; 
2026-02-21T09:05:41.0952917Z // end inline asm 2026-02-21T09:05:41.0952981Z // begin inline asm 2026-02-21T09:05:41.0954464Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29096,%r29097,%r29098,%r29099,%r29100,%r29101,%r29102,%r29103,%r29104,%r29105,%r29106,%r29107,%r29108,%r29109,%r29110,%r29111,%r29112,%r29113,%r29114,%r29115,%r29116,%r29117,%r29118,%r29119,%r29120,%r29121,%r29122,%r29123,%r29124,%r29125,%r29126,%r29127,%r29128,%r29129,%r29130,%r29131,%r29132,%r29133,%r29134,%r29135,%r29136,%r29137,%r29138,%r29139,%r29140,%r29141,%r29142,%r29143,%r29144,%r29145,%r29146,%r29147,%r29148,%r29149,%r29150,%r29151,%r29152,%r29153,%r29154,%r29155,%r29156,%r29157,%r29158,%r29159}, {%r4162,%r4163,%r4164,%r4165}, %rd3, %p20, 1, 1; 2026-02-21T09:05:41.0954602Z // end inline asm 2026-02-21T09:05:41.0954670Z // begin inline asm 2026-02-21T09:05:41.0956199Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29160,%r29161,%r29162,%r29163,%r29164,%r29165,%r29166,%r29167,%r29168,%r29169,%r29170,%r29171,%r29172,%r29173,%r29174,%r29175,%r29176,%r29177,%r29178,%r29179,%r29180,%r29181,%r29182,%r29183,%r29184,%r29185,%r29186,%r29187,%r29188,%r29189,%r29190,%r29191,%r29192,%r29193,%r29194,%r29195,%r29196,%r29197,%r29198,%r29199,%r29200,%r29201,%r29202,%r29203,%r29204,%r29205,%r29206,%r29207,%r29208,%r29209,%r29210,%r29211,%r29212,%r29213,%r29214,%r29215,%r29216,%r29217,%r29218,%r29219,%r29220,%r29221,%r29222,%r29223}, {%r4294,%r4295,%r4296,%r4297}, %rd2, %p20, 1, 1; 2026-02-21T09:05:41.0956309Z // end inline asm 2026-02-21T09:05:41.0956379Z // begin inline asm 2026-02-21T09:05:41.0958045Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29160,%r29161,%r29162,%r29163,%r29164,%r29165,%r29166,%r29167,%r29168,%r29169,%r29170,%r29171,%r29172,%r29173,%r29174,%r29175,%r29176,%r29177,%r29178,%r29179,%r29180,%r29181,%r29182,%r29183,%r29184,%r29185,%r29186,%r29187,%r29188,%r29189,%r29190,%r29191,%r29192,%r29193,%r29194,%r29195,%r29196,%r29197,%r29198,%r29199,%r29200,%r29201,%r29202,%r29203,%r29204,%r29205,%r29206,%r29207,%r29208,%r29209,%r29210,%r29211,%r29212,%r29213,%r29214,%r29215,%r29216,%r29217,%r29218,%r29219,%r29220,%r29221,%r29222,%r29223}, {%r4426,%r4427,%r4428,%r4429}, %rd3, %p20, 1, 1; 2026-02-21T09:05:41.0958124Z // end inline asm 2026-02-21T09:05:41.0958189Z // begin inline asm 2026-02-21T09:05:41.0959673Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29224,%r29225,%r29226,%r29227,%r29228,%r29229,%r29230,%r29231,%r29232,%r29233,%r29234,%r29235,%r29236,%r29237,%r29238,%r29239,%r29240,%r29241,%r29242,%r29243,%r29244,%r29245,%r29246,%r29247,%r29248,%r29249,%r29250,%r29251,%r29252,%r29253,%r29254,%r29255,%r29256,%r29257,%r29258,%r29259,%r29260,%r29261,%r29262,%r29263,%r29264,%r29265,%r29266,%r29267,%r29268,%r29269,%r29270,%r29271,%r29272,%r29273,%r29274,%r29275,%r29276,%r29277,%r29278,%r29279,%r29280,%r29281,%r29282,%r29283,%r29284,%r29285,%r29286,%r29287}, {%r4558,%r4559,%r4560,%r4561}, %rd2, %p20, 1, 1; 2026-02-21T09:05:41.0959739Z // end inline asm 2026-02-21T09:05:41.0959802Z // begin inline asm 2026-02-21T09:05:41.0961282Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29224,%r29225,%r29226,%r29227,%r29228,%r29229,%r29230,%r29231,%r29232,%r29233,%r29234,%r29235,%r29236,%r29237,%r29238,%r29239,%r29240,%r29241,%r29242,%r29243,%r29244,%r29245,%r29246,%r29247,%r29248,%r29249,%r29250,%r29251,%r29252,%r29253,%r29254,%r29255,%r29256,%r29257,%r29258,%r29259,%r29260,%r29261,%r29262,%r29263,%r29264,%r29265,%r29266,%r29267,%r29268,%r29269,%r29270,%r29271,%r29272,%r29273,%r29274,%r29275,%r29276,%r29277,%r29278,%r29279,%r29280,%r29281,%r29282,%r29283,%r29284,%r29285,%r29286,%r29287}, 
{%r4690,%r4691,%r4692,%r4693}, %rd3, %p20, 1, 1; 2026-02-21T09:05:41.0961349Z // end inline asm 2026-02-21T09:05:41.0961409Z // begin inline asm 2026-02-21T09:05:41.0962886Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29288,%r29289,%r29290,%r29291,%r29292,%r29293,%r29294,%r29295,%r29296,%r29297,%r29298,%r29299,%r29300,%r29301,%r29302,%r29303,%r29304,%r29305,%r29306,%r29307,%r29308,%r29309,%r29310,%r29311,%r29312,%r29313,%r29314,%r29315,%r29316,%r29317,%r29318,%r29319,%r29320,%r29321,%r29322,%r29323,%r29324,%r29325,%r29326,%r29327,%r29328,%r29329,%r29330,%r29331,%r29332,%r29333,%r29334,%r29335,%r29336,%r29337,%r29338,%r29339,%r29340,%r29341,%r29342,%r29343,%r29344,%r29345,%r29346,%r29347,%r29348,%r29349,%r29350,%r29351}, {%r4822,%r4823,%r4824,%r4825}, %rd2, %p20, 1, 1; 2026-02-21T09:05:41.0963032Z // end inline asm 2026-02-21T09:05:41.0963094Z // begin inline asm 2026-02-21T09:05:41.0964635Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29288,%r29289,%r29290,%r29291,%r29292,%r29293,%r29294,%r29295,%r29296,%r29297,%r29298,%r29299,%r29300,%r29301,%r29302,%r29303,%r29304,%r29305,%r29306,%r29307,%r29308,%r29309,%r29310,%r29311,%r29312,%r29313,%r29314,%r29315,%r29316,%r29317,%r29318,%r29319,%r29320,%r29321,%r29322,%r29323,%r29324,%r29325,%r29326,%r29327,%r29328,%r29329,%r29330,%r29331,%r29332,%r29333,%r29334,%r29335,%r29336,%r29337,%r29338,%r29339,%r29340,%r29341,%r29342,%r29343,%r29344,%r29345,%r29346,%r29347,%r29348,%r29349,%r29350,%r29351}, {%r4954,%r4955,%r4956,%r4957}, %rd3, %p20, 1, 1; 2026-02-21T09:05:41.0964753Z // end inline asm 2026-02-21T09:05:41.0964835Z wgmma.commit_group.sync.aligned; 2026-02-21T09:05:41.0964906Z mov.b32 %r5214, %r24001; 2026-02-21T09:05:41.0964971Z mov.b32 %r5215, %r8400; 2026-02-21T09:05:41.0965032Z mov.b32 %r5216, %r8400; 2026-02-21T09:05:41.0965096Z // begin inline asm 2026-02-21T09:05:41.0970260Z // wait for regs: 
%r29096,%r29097,%r29098,%r29099,%r29100,%r29101,%r29102,%r29103,%r29104,%r29105,%r29106,%r29107,%r29108,%r29109,%r29110,%r29111,%r29112,%r29113,%r29114,%r29115,%r29116,%r29117,%r29118,%r29119,%r29120,%r29121,%r29122,%r29123,%r29124,%r29125,%r29126,%r29127,%r29128,%r29129,%r29130,%r29131,%r29132,%r29133,%r29134,%r29135,%r29136,%r29137,%r29138,%r29139,%r29140,%r29141,%r29142,%r29143,%r29144,%r29145,%r29146,%r29147,%r29148,%r29149,%r29150,%r29151,%r29152,%r29153,%r29154,%r29155,%r29156,%r29157,%r29158,%r29159,%r29160,%r29161,%r29162,%r29163,%r29164,%r29165,%r29166,%r29167,%r29168,%r29169,%r29170,%r29171,%r29172,%r29173,%r29174,%r29175,%r29176,%r29177,%r29178,%r29179,%r29180,%r29181,%r29182,%r29183,%r29184,%r29185,%r29186,%r29187,%r29188,%r29189,%r29190,%r29191,%r29192,%r29193,%r29194,%r29195,%r29196,%r29197,%r29198,%r29199,%r29200,%r29201,%r29202,%r29203,%r29204,%r29205,%r29206,%r29207,%r29208,%r29209,%r29210,%r29211,%r29212,%r29213,%r29214,%r29215,%r29216,%r29217,%r29218,%r29219,%r29220,%r29221,%r29222,%r29223,%r29224,%r29225,%r29226,%r29227,%r29228,%r29229,%r29230,%r29231,%r29232,%r29233,%r29234,%r29235,%r29236,%r29237,%r29238,%r29239,%r29240,%r29241,%r29242,%r29243,%r29244,%r29245,%r29246,%r29247,%r29248,%r29249,%r29250,%r29251,%r29252,%r29253,%r29254,%r29255,%r29256,%r29257,%r29258,%r29259,%r29260,%r29261,%r29262,%r29263,%r29264,%r29265,%r29266,%r29267,%r29268,%r29269,%r29270,%r29271,%r29272,%r29273,%r29274,%r29275,%r29276,%r29277,%r29278,%r29279,%r29280,%r29281,%r29282,%r29283,%r29284,%r29285,%r29286,%r29287,%r29288,%r29289,%r29290,%r29291,%r29292,%r29293,%r29294,%r29295,%r29296,%r29297,%r29298,%r29299,%r29300,%r29301,%r29302,%r29303,%r29304,%r29305,%r29306,%r29307,%r29308,%r29309,%r29310,%r29311,%r29312,%r29313,%r29314,%r29315,%r29316,%r29317,%r29318,%r29319,%r29320,%r29321,%r29322,%r29323,%r29324,%r29325,%r29326,%r29327,%r29328,%r29329,%r29330,%r29331,%r29332,%r29333,%r29334,%r29335,%r29336,%r29337,%r29338,%r29339,%r29340,%r29341,%r29342,%r29343,%r29344,%r29345,
%r29346,%r29347,%r29348,%r29349,%r29350,%r29351,%r5214,%r5215,%r5216 2026-02-21T09:05:41.0970373Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:05:41.0970434Z // end inline asm 2026-02-21T09:05:41.0970495Z $L__tmp4: 2026-02-21T09:05:41.0970712Z .loc 1 55 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:32 2026-02-21T09:05:41.0970778Z add.s64 %rd95, %rd69, 64; 2026-02-21T09:05:41.0970847Z add.s64 %rd96, %rd70, 64; 2026-02-21T09:05:41.0970910Z add.s64 %rd97, %rd71, 64; 2026-02-21T09:05:41.0971115Z .loc 1 55 80 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:80 2026-02-21T09:05:41.0971185Z add.s64 %rd98, %rd72, 64; 2026-02-21T09:05:41.0971247Z // begin inline asm 2026-02-21T09:05:41.0971309Z mov.u32 %r5476, 0x0; 2026-02-21T09:05:41.0971437Z mov.u32 %r5477, 0x0; 2026-02-21T09:05:41.0971500Z mov.u32 %r5478, 0x0; 2026-02-21T09:05:41.0971560Z mov.u32 %r5479, 0x0; 2026-02-21T09:05:41.0971691Z ld.global.v4.b32 { %r5476, %r5477, %r5478, %r5479 }, [ %rd95 + 0 ]; 2026-02-21T09:05:41.0971760Z // end inline asm 2026-02-21T09:05:41.0971821Z // begin inline asm 2026-02-21T09:05:41.0971880Z mov.u32 %r5480, 0x0; 2026-02-21T09:05:41.0971942Z mov.u32 %r5481, 0x0; 2026-02-21T09:05:41.0972006Z mov.u32 %r5482, 0x0; 2026-02-21T09:05:41.0972067Z mov.u32 %r5483, 0x0; 2026-02-21T09:05:41.0972188Z ld.global.v4.b32 { %r5480, %r5481, %r5482, %r5483 }, [ %rd96 + 0 ]; 2026-02-21T09:05:41.0972253Z // end inline asm 2026-02-21T09:05:41.0972316Z // begin inline asm 2026-02-21T09:05:41.0972375Z mov.u32 %r5484, 0x0; 2026-02-21T09:05:41.0972563Z mov.u32 %r5485, 0x0; 2026-02-21T09:05:41.0972630Z mov.u32 %r5486, 0x0; 2026-02-21T09:05:41.0972688Z mov.u32 %r5487, 0x0; 2026-02-21T09:05:41.0972806Z ld.global.v4.b32 { %r5484, %r5485, %r5486, %r5487 }, [ %rd97 + 0 ]; 2026-02-21T09:05:41.0972871Z // end inline asm 2026-02-21T09:05:41.0972933Z // begin inline asm 2026-02-21T09:05:41.0972992Z mov.u32 %r5488, 0x0; 2026-02-21T09:05:41.0973051Z mov.u32 %r5489, 0x0; 
2026-02-21T09:05:41.0973115Z mov.u32 %r5490, 0x0; 2026-02-21T09:05:41.0973174Z mov.u32 %r5491, 0x0; 2026-02-21T09:05:41.0973290Z ld.global.v4.b32 { %r5488, %r5489, %r5490, %r5491 }, [ %rd98 + 0 ]; 2026-02-21T09:05:41.0973354Z // end inline asm 2026-02-21T09:05:41.0973557Z .loc 1 59 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:59:32 2026-02-21T09:05:41.0973617Z bar.sync 0; 2026-02-21T09:05:41.0973706Z st.shared.v2.b32 [%r10], {%r5476, %r5477}; 2026-02-21T09:05:41.0973798Z st.shared.v2.b32 [%r10+2048], {%r5480, %r5481}; 2026-02-21T09:05:41.0973941Z st.shared.v2.b32 [%r10+4096], {%r5484, %r5485}; 2026-02-21T09:05:41.0974028Z st.shared.v2.b32 [%r10+6144], {%r5488, %r5489}; 2026-02-21T09:05:41.0974109Z st.shared.v2.b32 [%r11], {%r5478, %r5479}; 2026-02-21T09:05:41.0974206Z st.shared.v2.b32 [%r11+2048], {%r5482, %r5483}; 2026-02-21T09:05:41.0974293Z st.shared.v2.b32 [%r11+4096], {%r5486, %r5487}; 2026-02-21T09:05:41.0974380Z st.shared.v2.b32 [%r11+6144], {%r5490, %r5491}; 2026-02-21T09:05:41.0974437Z bar.sync 0; 2026-02-21T09:05:41.0974510Z ld.shared.b16 %rs161, [%r12]; 2026-02-21T09:05:41.0974582Z ld.shared.b16 %rs162, [%r12+256]; 2026-02-21T09:05:41.0974659Z ld.shared.b16 %rs163, [%r12+16]; 2026-02-21T09:05:41.0974728Z ld.shared.b16 %rs164, [%r12+272]; 2026-02-21T09:05:41.0974800Z ld.shared.b16 %rs165, [%r12+2048]; 2026-02-21T09:05:41.0974874Z ld.shared.b16 %rs166, [%r12+2304]; 2026-02-21T09:05:41.0974939Z ld.shared.b16 %rs167, [%r12+2064]; 2026-02-21T09:05:41.0975005Z ld.shared.b16 %rs168, [%r12+2320]; 2026-02-21T09:05:41.0975081Z ld.shared.b16 %rs169, [%r12+4096]; 2026-02-21T09:05:41.0975152Z ld.shared.b16 %rs170, [%r12+4352]; 2026-02-21T09:05:41.0975218Z ld.shared.b16 %rs171, [%r12+4112]; 2026-02-21T09:05:41.0975285Z ld.shared.b16 %rs172, [%r12+4368]; 2026-02-21T09:05:41.0975357Z ld.shared.b16 %rs173, [%r12+6144]; 2026-02-21T09:05:41.0975423Z ld.shared.b16 %rs174, [%r12+6400]; 2026-02-21T09:05:41.0975492Z ld.shared.b16 %rs175, [%r12+6160]; 
2026-02-21T09:05:41.0975564Z ld.shared.b16 %rs176, [%r12+6416]; 2026-02-21T09:05:41.0975633Z ld.shared.b16 %rs177, [%r13]; 2026-02-21T09:05:41.0975702Z ld.shared.b16 %rs178, [%r13+256]; 2026-02-21T09:05:41.0975770Z ld.shared.b16 %rs179, [%r13+16]; 2026-02-21T09:05:41.0975844Z ld.shared.b16 %rs180, [%r13+272]; 2026-02-21T09:05:41.0975912Z ld.shared.b16 %rs181, [%r13+2048]; 2026-02-21T09:05:41.0975978Z ld.shared.b16 %rs182, [%r13+2304]; 2026-02-21T09:05:41.0976051Z ld.shared.b16 %rs183, [%r13+2064]; 2026-02-21T09:05:41.0976119Z ld.shared.b16 %rs184, [%r13+2320]; 2026-02-21T09:05:41.0976188Z ld.shared.b16 %rs185, [%r13+4096]; 2026-02-21T09:05:41.0976255Z ld.shared.b16 %rs186, [%r13+4352]; 2026-02-21T09:05:41.0976329Z ld.shared.b16 %rs187, [%r13+4112]; 2026-02-21T09:05:41.0976576Z ld.shared.b16 %rs188, [%r13+4368]; 2026-02-21T09:05:41.0976649Z ld.shared.b16 %rs189, [%r13+6144]; 2026-02-21T09:05:41.0976721Z ld.shared.b16 %rs190, [%r13+6400]; 2026-02-21T09:05:41.0976788Z ld.shared.b16 %rs191, [%r13+6160]; 2026-02-21T09:05:41.0976853Z ld.shared.b16 %rs192, [%r13+6416]; 2026-02-21T09:05:41.0976925Z cvt.f32.bf16 %r5622, %rs161; 2026-02-21T09:05:41.0976991Z cvt.f32.bf16 %r5623, %rs162; 2026-02-21T09:05:41.0977052Z cvt.f32.bf16 %r5624, %rs177; 2026-02-21T09:05:41.0977114Z cvt.f32.bf16 %r5625, %rs178; 2026-02-21T09:05:41.0977181Z cvt.f32.bf16 %r5754, %rs163; 2026-02-21T09:05:41.0977243Z cvt.f32.bf16 %r5755, %rs164; 2026-02-21T09:05:41.0977305Z cvt.f32.bf16 %r5756, %rs179; 2026-02-21T09:05:41.0977371Z cvt.f32.bf16 %r5757, %rs180; 2026-02-21T09:05:41.0977576Z cvt.f32.bf16 %r5886, %rs165; 2026-02-21T09:05:41.0977643Z cvt.f32.bf16 %r5887, %rs166; 2026-02-21T09:05:41.0977707Z cvt.f32.bf16 %r5888, %rs181; 2026-02-21T09:05:41.0977788Z cvt.f32.bf16 %r5889, %rs182; 2026-02-21T09:05:41.0977854Z cvt.f32.bf16 %r6018, %rs167; 2026-02-21T09:05:41.0977917Z cvt.f32.bf16 %r6019, %rs168; 2026-02-21T09:05:41.0977985Z cvt.f32.bf16 %r6020, %rs183; 2026-02-21T09:05:41.0978048Z cvt.f32.bf16 %r6021, 
%rs184; 2026-02-21T09:05:41.0978110Z cvt.f32.bf16 %r6150, %rs169; 2026-02-21T09:05:41.0978171Z cvt.f32.bf16 %r6151, %rs170; 2026-02-21T09:05:41.0978239Z cvt.f32.bf16 %r6152, %rs185; 2026-02-21T09:05:41.0978301Z cvt.f32.bf16 %r6153, %rs186; 2026-02-21T09:05:41.0978362Z cvt.f32.bf16 %r6282, %rs171; 2026-02-21T09:05:41.0978428Z cvt.f32.bf16 %r6283, %rs172; 2026-02-21T09:05:41.0978490Z cvt.f32.bf16 %r6284, %rs187; 2026-02-21T09:05:41.0978551Z cvt.f32.bf16 %r6285, %rs188; 2026-02-21T09:05:41.0978624Z cvt.f32.bf16 %r6414, %rs173; 2026-02-21T09:05:41.0978689Z cvt.f32.bf16 %r6415, %rs174; 2026-02-21T09:05:41.0978839Z cvt.f32.bf16 %r6416, %rs189; 2026-02-21T09:05:41.0978905Z cvt.f32.bf16 %r6417, %rs190; 2026-02-21T09:05:41.0978970Z cvt.f32.bf16 %r6546, %rs175; 2026-02-21T09:05:41.0979035Z cvt.f32.bf16 %r6547, %rs176; 2026-02-21T09:05:41.0979098Z cvt.f32.bf16 %r6548, %rs191; 2026-02-21T09:05:41.0979167Z cvt.f32.bf16 %r6549, %rs192; 2026-02-21T09:05:41.0979371Z .loc 1 61 34 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:34 2026-02-21T09:05:41.0979438Z add.s32 %r8742, %r29095, 131072; 2026-02-21T09:05:41.0979506Z cvt.s64.s32 %rd123, %r8742; 2026-02-21T09:05:41.0979578Z add.s64 %rd99, %rd45, %rd123; 2026-02-21T09:05:41.0979777Z .loc 1 61 87 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:87 2026-02-21T09:05:41.0979841Z // begin inline asm 2026-02-21T09:05:41.0979905Z mov.u32 %r5492, 0x0; 2026-02-21T09:05:41.0979964Z mov.u32 %r5493, 0x0; 2026-02-21T09:05:41.0980066Z ld.global.v2.b32 { %r5492, %r5493 }, [ %rd99 + 0 ]; 2026-02-21T09:05:41.0980131Z // end inline asm 2026-02-21T09:05:41.0980333Z .loc 1 69 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:69:28 2026-02-21T09:05:41.0980406Z bar.sync 0; 2026-02-21T09:05:41.0980474Z st.shared.b8 [%r14], %r5492; 2026-02-21T09:05:41.0980550Z prmt.b32 %r8743, %r5492, 0, 0x7771U; 2026-02-21T09:05:41.0980615Z st.shared.b8 [%r15], %r8743; 2026-02-21T09:05:41.0980683Z prmt.b32 %r8744, %r5492, 0, 
0x7772U; 2026-02-21T09:05:41.0980755Z st.shared.b8 [%r16+256], %r8744; 2026-02-21T09:05:41.0980820Z prmt.b32 %r8745, %r5492, 0, 0x7773U; 2026-02-21T09:05:41.0980889Z st.shared.b8 [%r17+256], %r8745; 2026-02-21T09:05:41.0980954Z st.shared.b8 [%r18+512], %r5493; 2026-02-21T09:05:41.0981028Z prmt.b32 %r8746, %r5493, 0, 0x7771U; 2026-02-21T09:05:41.0981093Z st.shared.b8 [%r19+512], %r8746; 2026-02-21T09:05:41.0981159Z prmt.b32 %r8747, %r5493, 0, 0x7772U; 2026-02-21T09:05:41.0981233Z st.shared.b8 [%r20+768], %r8747; 2026-02-21T09:05:41.0981300Z prmt.b32 %r8748, %r5493, 0, 0x7773U; 2026-02-21T09:05:41.0981365Z st.shared.b8 [%r21+768], %r8748; 2026-02-21T09:05:41.0981426Z bar.sync 0; 2026-02-21T09:05:41.0981571Z ld.shared.b32 %r8749, [%r22]; 2026-02-21T09:05:41.0981637Z prmt.b32 %r8750, %r8749, 0, 0x7770U; 2026-02-21T09:05:41.0981702Z cvt.u16.u32 %rs193, %r8750; 2026-02-21T09:05:41.0981774Z prmt.b32 %r8751, %r8749, 0, 0x7771U; 2026-02-21T09:05:41.0981837Z cvt.u16.u32 %rs194, %r8751; 2026-02-21T09:05:41.0981906Z prmt.b32 %r8752, %r8749, 0, 0x7772U; 2026-02-21T09:05:41.0981977Z cvt.u16.u32 %rs195, %r8752; 2026-02-21T09:05:41.0982042Z prmt.b32 %r8753, %r8749, 0, 0x7773U; 2026-02-21T09:05:41.0982106Z cvt.u16.u32 %rs196, %r8753; 2026-02-21T09:05:41.0982170Z ld.shared.b32 %r8754, [%r23]; 2026-02-21T09:05:41.0982242Z prmt.b32 %r8755, %r8754, 0, 0x7770U; 2026-02-21T09:05:41.0982305Z cvt.u16.u32 %rs197, %r8755; 2026-02-21T09:05:41.0982469Z prmt.b32 %r8756, %r8754, 0, 0x7771U; 2026-02-21T09:05:41.0982543Z cvt.u16.u32 %rs198, %r8756; 2026-02-21T09:05:41.0982609Z prmt.b32 %r8757, %r8754, 0, 0x7772U; 2026-02-21T09:05:41.0982671Z cvt.u16.u32 %rs199, %r8757; 2026-02-21T09:05:41.0982751Z prmt.b32 %r8758, %r8754, 0, 0x7773U; 2026-02-21T09:05:41.0982823Z cvt.u16.u32 %rs200, %r8758; 2026-02-21T09:05:41.0983026Z .loc 1 64 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:64:28 2026-02-21T09:05:41.0983092Z shl.b16 %rs201, %rs193, 4; 2026-02-21T09:05:41.0983164Z shl.b16 %rs202, 
%rs197, 4; 2026-02-21T09:05:41.0983228Z shl.b16 %rs203, %rs194, 4; 2026-02-21T09:05:41.0983292Z shl.b16 %rs204, %rs198, 4; 2026-02-21T09:05:41.0983363Z shl.b16 %rs205, %rs195, 4; 2026-02-21T09:05:41.0983425Z shl.b16 %rs206, %rs199, 4; 2026-02-21T09:05:41.0983488Z shl.b16 %rs207, %rs196, 4; 2026-02-21T09:05:41.0983549Z shl.b16 %rs208, %rs200, 4; 2026-02-21T09:05:41.0983760Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.0983882Z cvt.s16.s8 %rs209, %rs201; 2026-02-21T09:05:41.0983948Z shr.s16 %rs210, %rs209, 4; 2026-02-21T09:05:41.0984018Z cvt.s16.s8 %rs211, %rs202; 2026-02-21T09:05:41.0984083Z shr.s16 %rs212, %rs211, 4; 2026-02-21T09:05:41.0984148Z prmt.b32 %r8759, %r8749, 0, 0x8880U; 2026-02-21T09:05:41.0984212Z cvt.u16.u32 %rs213, %r8759; 2026-02-21T09:05:41.0984283Z shr.s16 %rs214, %rs213, 4; 2026-02-21T09:05:41.0984352Z prmt.b32 %r8760, %r8754, 0, 0x8880U; 2026-02-21T09:05:41.0984416Z cvt.u16.u32 %rs215, %r8760; 2026-02-21T09:05:41.0984484Z shr.s16 %rs216, %rs215, 4; 2026-02-21T09:05:41.0984681Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.0984748Z cvt.rn.f32.s16 %r8761, %rs216; 2026-02-21T09:05:41.0984818Z cvt.rn.f32.s16 %r8762, %rs214; 2026-02-21T09:05:41.0984882Z cvt.rn.f32.s16 %r8763, %rs212; 2026-02-21T09:05:41.0984946Z cvt.rn.f32.s16 %r8764, %rs210; 2026-02-21T09:05:41.0985145Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.0985213Z cvt.s16.s8 %rs217, %rs203; 2026-02-21T09:05:41.0985279Z shr.s16 %rs218, %rs217, 4; 2026-02-21T09:05:41.0985340Z cvt.s16.s8 %rs219, %rs204; 2026-02-21T09:05:41.0985406Z shr.s16 %rs220, %rs219, 4; 2026-02-21T09:05:41.0985472Z prmt.b32 %r8765, %r8749, 0, 0x9991U; 2026-02-21T09:05:41.0985534Z cvt.u16.u32 %rs221, %r8765; 2026-02-21T09:05:41.0985597Z shr.s16 %rs222, %rs221, 4; 2026-02-21T09:05:41.0985669Z prmt.b32 %r8766, %r8754, 0, 0x9991U; 2026-02-21T09:05:41.0985730Z cvt.u16.u32 
%rs223, %r8766; 2026-02-21T09:05:41.0985793Z shr.s16 %rs224, %rs223, 4; 2026-02-21T09:05:41.0986000Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.0986065Z cvt.rn.f32.s16 %r8767, %rs224; 2026-02-21T09:05:41.0986140Z cvt.rn.f32.s16 %r8768, %rs222; 2026-02-21T09:05:41.0986213Z cvt.rn.f32.s16 %r8769, %rs220; 2026-02-21T09:05:41.0986278Z cvt.rn.f32.s16 %r8770, %rs218; 2026-02-21T09:05:41.0986617Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.0986786Z cvt.s16.s8 %rs225, %rs205; 2026-02-21T09:05:41.0986858Z shr.s16 %rs226, %rs225, 4; 2026-02-21T09:05:41.0986921Z cvt.s16.s8 %rs227, %rs206; 2026-02-21T09:05:41.0986982Z shr.s16 %rs228, %rs227, 4; 2026-02-21T09:05:41.0987055Z prmt.b32 %r8771, %r8749, 0, 0xaaa2U; 2026-02-21T09:05:41.0987118Z cvt.u16.u32 %rs229, %r8771; 2026-02-21T09:05:41.0987180Z shr.s16 %rs230, %rs229, 4; 2026-02-21T09:05:41.0987253Z prmt.b32 %r8772, %r8754, 0, 0xaaa2U; 2026-02-21T09:05:41.0987316Z cvt.u16.u32 %rs231, %r8772; 2026-02-21T09:05:41.0987378Z shr.s16 %rs232, %rs231, 4; 2026-02-21T09:05:41.0987576Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.0987646Z cvt.rn.f32.s16 %r8773, %rs232; 2026-02-21T09:05:41.0987841Z cvt.rn.f32.s16 %r8774, %rs230; 2026-02-21T09:05:41.0987909Z cvt.rn.f32.s16 %r8775, %rs228; 2026-02-21T09:05:41.0987979Z cvt.rn.f32.s16 %r8776, %rs226; 2026-02-21T09:05:41.0988176Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.0988242Z cvt.s16.s8 %rs233, %rs207; 2026-02-21T09:05:41.0988398Z shr.s16 %rs234, %rs233, 4; 2026-02-21T09:05:41.0988466Z cvt.s16.s8 %rs235, %rs208; 2026-02-21T09:05:41.0988532Z shr.s16 %rs236, %rs235, 4; 2026-02-21T09:05:41.0988601Z prmt.b32 %r8777, %r8749, 0, 0xbbb3U; 2026-02-21T09:05:41.0988670Z cvt.u16.u32 %rs237, %r8777; 2026-02-21T09:05:41.0988732Z shr.s16 %rs238, %rs237, 4; 2026-02-21T09:05:41.0988798Z 
prmt.b32 %r8778, %r8754, 0, 0xbbb3U; 2026-02-21T09:05:41.0988867Z cvt.u16.u32 %rs239, %r8778; 2026-02-21T09:05:41.0988930Z shr.s16 %rs240, %rs239, 4; 2026-02-21T09:05:41.0989139Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.0989277Z cvt.rn.f32.s16 %r8779, %rs240; 2026-02-21T09:05:41.0989349Z cvt.rn.f32.s16 %r8780, %rs238; 2026-02-21T09:05:41.0989413Z cvt.rn.f32.s16 %r8781, %rs236; 2026-02-21T09:05:41.0989478Z cvt.rn.f32.s16 %r8782, %rs234; 2026-02-21T09:05:41.0989544Z bar.sync 0; 2026-02-21T09:05:41.0989658Z st.shared.v4.b32 [%r24], {%r8764, %r8762, %r8763, %r8761}; 2026-02-21T09:05:41.0989766Z st.shared.v4.b32 [%r25], {%r8770, %r8768, %r8769, %r8767}; 2026-02-21T09:05:41.0989872Z st.shared.v4.b32 [%r26], {%r8776, %r8774, %r8775, %r8773}; 2026-02-21T09:05:41.0989981Z st.shared.v4.b32 [%r27], {%r8782, %r8780, %r8781, %r8779}; 2026-02-21T09:05:41.0990038Z $L__tmp5: 2026-02-21T09:05:41.0990314Z .loc 2 291 36 // standard.py:291:36 @[ c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:91:40 ] 2026-02-21T09:05:41.0990385Z // begin inline asm 2026-02-21T09:05:41.0990465Z fence.proxy.async.shared::cta; 2026-02-21T09:05:41.0990527Z // end inline asm 2026-02-21T09:05:41.0990595Z bar.sync 0; 2026-02-21T09:05:41.0990673Z wgmma.fence.sync.aligned; 2026-02-21T09:05:41.0990735Z // begin inline asm 2026-02-21T09:05:41.0992224Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29096,%r29097,%r29098,%r29099,%r29100,%r29101,%r29102,%r29103,%r29104,%r29105,%r29106,%r29107,%r29108,%r29109,%r29110,%r29111,%r29112,%r29113,%r29114,%r29115,%r29116,%r29117,%r29118,%r29119,%r29120,%r29121,%r29122,%r29123,%r29124,%r29125,%r29126,%r29127,%r29128,%r29129,%r29130,%r29131,%r29132,%r29133,%r29134,%r29135,%r29136,%r29137,%r29138,%r29139,%r29140,%r29141,%r29142,%r29143,%r29144,%r29145,%r29146,%r29147,%r29148,%r29149,%r29150,%r29151,%r29152,%r29153,%r29154,%r29155,%r29156,%r29157,%r29158,%r29159}, {%r5622,%r5623,%r5624,%r5625}, %rd2, 
%p20, 1, 1; 2026-02-21T09:05:41.0992286Z // end inline asm 2026-02-21T09:05:41.0992347Z // begin inline asm 2026-02-21T09:05:41.0993828Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29096,%r29097,%r29098,%r29099,%r29100,%r29101,%r29102,%r29103,%r29104,%r29105,%r29106,%r29107,%r29108,%r29109,%r29110,%r29111,%r29112,%r29113,%r29114,%r29115,%r29116,%r29117,%r29118,%r29119,%r29120,%r29121,%r29122,%r29123,%r29124,%r29125,%r29126,%r29127,%r29128,%r29129,%r29130,%r29131,%r29132,%r29133,%r29134,%r29135,%r29136,%r29137,%r29138,%r29139,%r29140,%r29141,%r29142,%r29143,%r29144,%r29145,%r29146,%r29147,%r29148,%r29149,%r29150,%r29151,%r29152,%r29153,%r29154,%r29155,%r29156,%r29157,%r29158,%r29159}, {%r5754,%r5755,%r5756,%r5757}, %rd3, %p20, 1, 1; 2026-02-21T09:05:41.0993957Z // end inline asm 2026-02-21T09:05:41.0994017Z // begin inline asm 2026-02-21T09:05:41.0995550Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29160,%r29161,%r29162,%r29163,%r29164,%r29165,%r29166,%r29167,%r29168,%r29169,%r29170,%r29171,%r29172,%r29173,%r29174,%r29175,%r29176,%r29177,%r29178,%r29179,%r29180,%r29181,%r29182,%r29183,%r29184,%r29185,%r29186,%r29187,%r29188,%r29189,%r29190,%r29191,%r29192,%r29193,%r29194,%r29195,%r29196,%r29197,%r29198,%r29199,%r29200,%r29201,%r29202,%r29203,%r29204,%r29205,%r29206,%r29207,%r29208,%r29209,%r29210,%r29211,%r29212,%r29213,%r29214,%r29215,%r29216,%r29217,%r29218,%r29219,%r29220,%r29221,%r29222,%r29223}, {%r5886,%r5887,%r5888,%r5889}, %rd2, %p20, 1, 1; 2026-02-21T09:05:41.0995667Z // end inline asm 2026-02-21T09:05:41.0995738Z // begin inline asm 2026-02-21T09:05:41.0997349Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29160,%r29161,%r29162,%r29163,%r29164,%r29165,%r29166,%r29167,%r29168,%r29169,%r29170,%r29171,%r29172,%r29173,%r29174,%r29175,%r29176,%r29177,%r29178,%r29179,%r29180,%r29181,%r29182,%r29183,%r29184,%r29185,%r29186,%r29187,%r29188,%r29189,%r29190,%r29191,%r29192,%r29193,%r29194,%r29195,%r29196,%r29197,%r29198,%r29199,%r29200,%r29201,%r29202,%r29203,%r29204,%r29205,%r29206,%r29207,%r29208,%r29209,%r29210,%r29211,%r29212,%r29213,%r29214,%r29215,%r29216,%r29217,%r29218,%r29219,%r29220,%r29221,%r29222,%r29223}, {%r6018,%r6019,%r6020,%r6021}, %rd3, %p20, 1, 1; 2026-02-21T09:05:41.0997427Z // end inline asm 2026-02-21T09:05:41.0997498Z // begin inline asm 2026-02-21T09:05:41.0999044Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29224,%r29225,%r29226,%r29227,%r29228,%r29229,%r29230,%r29231,%r29232,%r29233,%r29234,%r29235,%r29236,%r29237,%r29238,%r29239,%r29240,%r29241,%r29242,%r29243,%r29244,%r29245,%r29246,%r29247,%r29248,%r29249,%r29250,%r29251,%r29252,%r29253,%r29254,%r29255,%r29256,%r29257,%r29258,%r29259,%r29260,%r29261,%r29262,%r29263,%r29264,%r29265,%r29266,%r29267,%r29268,%r29269,%r29270,%r29271,%r29272,%r29273,%r29274,%r29275,%r29276,%r29277,%r29278,%r29279,%r29280,%r29281,%r29282,%r29283,%r29284,%r29285,%r29286,%r29287}, {%r6150,%r6151,%r6152,%r6153}, %rd2, %p20, 1, 1; 2026-02-21T09:05:41.0999118Z // end inline asm 2026-02-21T09:05:41.0999182Z // begin inline asm 2026-02-21T09:05:41.1000657Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29224,%r29225,%r29226,%r29227,%r29228,%r29229,%r29230,%r29231,%r29232,%r29233,%r29234,%r29235,%r29236,%r29237,%r29238,%r29239,%r29240,%r29241,%r29242,%r29243,%r29244,%r29245,%r29246,%r29247,%r29248,%r29249,%r29250,%r29251,%r29252,%r29253,%r29254,%r29255,%r29256,%r29257,%r29258,%r29259,%r29260,%r29261,%r29262,%r29263,%r29264,%r29265,%r29266,%r29267,%r29268,%r29269,%r29270,%r29271,%r29272,%r29273,%r29274,%r29275,%r29276,%r29277,%r29278,%r29279,%r29280,%r29281,%r29282,%r29283,%r29284,%r29285,%r29286,%r29287}, 
{%r6282,%r6283,%r6284,%r6285}, %rd3, %p20, 1, 1; 2026-02-21T09:05:41.1000730Z // end inline asm 2026-02-21T09:05:41.1000791Z // begin inline asm 2026-02-21T09:05:41.1002258Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29288,%r29289,%r29290,%r29291,%r29292,%r29293,%r29294,%r29295,%r29296,%r29297,%r29298,%r29299,%r29300,%r29301,%r29302,%r29303,%r29304,%r29305,%r29306,%r29307,%r29308,%r29309,%r29310,%r29311,%r29312,%r29313,%r29314,%r29315,%r29316,%r29317,%r29318,%r29319,%r29320,%r29321,%r29322,%r29323,%r29324,%r29325,%r29326,%r29327,%r29328,%r29329,%r29330,%r29331,%r29332,%r29333,%r29334,%r29335,%r29336,%r29337,%r29338,%r29339,%r29340,%r29341,%r29342,%r29343,%r29344,%r29345,%r29346,%r29347,%r29348,%r29349,%r29350,%r29351}, {%r6414,%r6415,%r6416,%r6417}, %rd2, %p20, 1, 1; 2026-02-21T09:05:41.1002328Z // end inline asm 2026-02-21T09:05:41.1002455Z // begin inline asm 2026-02-21T09:05:41.1003930Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29288,%r29289,%r29290,%r29291,%r29292,%r29293,%r29294,%r29295,%r29296,%r29297,%r29298,%r29299,%r29300,%r29301,%r29302,%r29303,%r29304,%r29305,%r29306,%r29307,%r29308,%r29309,%r29310,%r29311,%r29312,%r29313,%r29314,%r29315,%r29316,%r29317,%r29318,%r29319,%r29320,%r29321,%r29322,%r29323,%r29324,%r29325,%r29326,%r29327,%r29328,%r29329,%r29330,%r29331,%r29332,%r29333,%r29334,%r29335,%r29336,%r29337,%r29338,%r29339,%r29340,%r29341,%r29342,%r29343,%r29344,%r29345,%r29346,%r29347,%r29348,%r29349,%r29350,%r29351}, {%r6546,%r6547,%r6548,%r6549}, %rd3, %p20, 1, 1; 2026-02-21T09:05:41.1003993Z // end inline asm 2026-02-21T09:05:41.1004072Z wgmma.commit_group.sync.aligned; 2026-02-21T09:05:41.1004280Z mov.b32 %r6806, %r24001; 2026-02-21T09:05:41.1004349Z mov.b32 %r6807, %r8400; 2026-02-21T09:05:41.1004412Z mov.b32 %r6808, %r8400; 2026-02-21T09:05:41.1004473Z // begin inline asm 2026-02-21T09:05:41.1009654Z // wait for regs: 
%r29096,%r29097,%r29098,%r29099,%r29100,%r29101,%r29102,%r29103,%r29104,%r29105,%r29106,%r29107,%r29108,%r29109,%r29110,%r29111,%r29112,%r29113,%r29114,%r29115,%r29116,%r29117,%r29118,%r29119,%r29120,%r29121,%r29122,%r29123,%r29124,%r29125,%r29126,%r29127,%r29128,%r29129,%r29130,%r29131,%r29132,%r29133,%r29134,%r29135,%r29136,%r29137,%r29138,%r29139,%r29140,%r29141,%r29142,%r29143,%r29144,%r29145,%r29146,%r29147,%r29148,%r29149,%r29150,%r29151,%r29152,%r29153,%r29154,%r29155,%r29156,%r29157,%r29158,%r29159,%r29160,%r29161,%r29162,%r29163,%r29164,%r29165,%r29166,%r29167,%r29168,%r29169,%r29170,%r29171,%r29172,%r29173,%r29174,%r29175,%r29176,%r29177,%r29178,%r29179,%r29180,%r29181,%r29182,%r29183,%r29184,%r29185,%r29186,%r29187,%r29188,%r29189,%r29190,%r29191,%r29192,%r29193,%r29194,%r29195,%r29196,%r29197,%r29198,%r29199,%r29200,%r29201,%r29202,%r29203,%r29204,%r29205,%r29206,%r29207,%r29208,%r29209,%r29210,%r29211,%r29212,%r29213,%r29214,%r29215,%r29216,%r29217,%r29218,%r29219,%r29220,%r29221,%r29222,%r29223,%r29224,%r29225,%r29226,%r29227,%r29228,%r29229,%r29230,%r29231,%r29232,%r29233,%r29234,%r29235,%r29236,%r29237,%r29238,%r29239,%r29240,%r29241,%r29242,%r29243,%r29244,%r29245,%r29246,%r29247,%r29248,%r29249,%r29250,%r29251,%r29252,%r29253,%r29254,%r29255,%r29256,%r29257,%r29258,%r29259,%r29260,%r29261,%r29262,%r29263,%r29264,%r29265,%r29266,%r29267,%r29268,%r29269,%r29270,%r29271,%r29272,%r29273,%r29274,%r29275,%r29276,%r29277,%r29278,%r29279,%r29280,%r29281,%r29282,%r29283,%r29284,%r29285,%r29286,%r29287,%r29288,%r29289,%r29290,%r29291,%r29292,%r29293,%r29294,%r29295,%r29296,%r29297,%r29298,%r29299,%r29300,%r29301,%r29302,%r29303,%r29304,%r29305,%r29306,%r29307,%r29308,%r29309,%r29310,%r29311,%r29312,%r29313,%r29314,%r29315,%r29316,%r29317,%r29318,%r29319,%r29320,%r29321,%r29322,%r29323,%r29324,%r29325,%r29326,%r29327,%r29328,%r29329,%r29330,%r29331,%r29332,%r29333,%r29334,%r29335,%r29336,%r29337,%r29338,%r29339,%r29340,%r29341,%r29342,%r29343,%r29344,%r29345,
%r29346,%r29347,%r29348,%r29349,%r29350,%r29351,%r6806,%r6807,%r6808 2026-02-21T09:05:41.1009754Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:05:41.1009812Z // end inline asm 2026-02-21T09:05:41.1009886Z $L__tmp6: 2026-02-21T09:05:41.1010108Z .loc 1 55 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:32 2026-02-21T09:05:41.1010175Z add.s64 %rd108, %rd69, 96; 2026-02-21T09:05:41.1010247Z add.s64 %rd109, %rd70, 96; 2026-02-21T09:05:41.1010310Z add.s64 %rd110, %rd71, 96; 2026-02-21T09:05:41.1010514Z .loc 1 55 80 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:80 2026-02-21T09:05:41.1010579Z add.s64 %rd111, %rd72, 96; 2026-02-21T09:05:41.1010645Z // begin inline asm 2026-02-21T09:05:41.1010710Z mov.u32 %r7068, 0x0; 2026-02-21T09:05:41.1010774Z mov.u32 %r7069, 0x0; 2026-02-21T09:05:41.1010838Z mov.u32 %r7070, 0x0; 2026-02-21T09:05:41.1010897Z mov.u32 %r7071, 0x0; 2026-02-21T09:05:41.1011028Z ld.global.v4.b32 { %r7068, %r7069, %r7070, %r7071 }, [ %rd108 + 0 ]; 2026-02-21T09:05:41.1011163Z // end inline asm 2026-02-21T09:05:41.1011230Z // begin inline asm 2026-02-21T09:05:41.1011291Z mov.u32 %r7072, 0x0; 2026-02-21T09:05:41.1011361Z mov.u32 %r7073, 0x0; 2026-02-21T09:05:41.1011427Z mov.u32 %r7074, 0x0; 2026-02-21T09:05:41.1011488Z mov.u32 %r7075, 0x0; 2026-02-21T09:05:41.1011612Z ld.global.v4.b32 { %r7072, %r7073, %r7074, %r7075 }, [ %rd109 + 0 ]; 2026-02-21T09:05:41.1011671Z // end inline asm 2026-02-21T09:05:41.1011737Z // begin inline asm 2026-02-21T09:05:41.1011795Z mov.u32 %r7076, 0x0; 2026-02-21T09:05:41.1011854Z mov.u32 %r7077, 0x0; 2026-02-21T09:05:41.1011915Z mov.u32 %r7078, 0x0; 2026-02-21T09:05:41.1011977Z mov.u32 %r7079, 0x0; 2026-02-21T09:05:41.1012097Z ld.global.v4.b32 { %r7076, %r7077, %r7078, %r7079 }, [ %rd110 + 0 ]; 2026-02-21T09:05:41.1012296Z // end inline asm 2026-02-21T09:05:41.1012359Z // begin inline asm 2026-02-21T09:05:41.1012416Z mov.u32 %r7080, 0x0; 2026-02-21T09:05:41.1012474Z mov.u32 %r7081, 0x0; 
2026-02-21T09:05:41.1012541Z mov.u32 %r7082, 0x0; 2026-02-21T09:05:41.1012600Z mov.u32 %r7083, 0x0; 2026-02-21T09:05:41.1012719Z ld.global.v4.b32 { %r7080, %r7081, %r7082, %r7083 }, [ %rd111 + 0 ]; 2026-02-21T09:05:41.1012783Z // end inline asm 2026-02-21T09:05:41.1012991Z .loc 1 59 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:59:32 2026-02-21T09:05:41.1013050Z bar.sync 0; 2026-02-21T09:05:41.1013134Z st.shared.v2.b32 [%r10], {%r7068, %r7069}; 2026-02-21T09:05:41.1013229Z st.shared.v2.b32 [%r10+2048], {%r7072, %r7073}; 2026-02-21T09:05:41.1013314Z st.shared.v2.b32 [%r10+4096], {%r7076, %r7077}; 2026-02-21T09:05:41.1013395Z st.shared.v2.b32 [%r10+6144], {%r7080, %r7081}; 2026-02-21T09:05:41.1013477Z st.shared.v2.b32 [%r11], {%r7070, %r7071}; 2026-02-21T09:05:41.1013613Z st.shared.v2.b32 [%r11+2048], {%r7074, %r7075}; 2026-02-21T09:05:41.1013700Z st.shared.v2.b32 [%r11+4096], {%r7078, %r7079}; 2026-02-21T09:05:41.1013788Z st.shared.v2.b32 [%r11+6144], {%r7082, %r7083}; 2026-02-21T09:05:41.1013849Z bar.sync 0; 2026-02-21T09:05:41.1013920Z ld.shared.b16 %rs241, [%r12]; 2026-02-21T09:05:41.1013991Z ld.shared.b16 %rs242, [%r12+256]; 2026-02-21T09:05:41.1014066Z ld.shared.b16 %rs243, [%r12+16]; 2026-02-21T09:05:41.1014133Z ld.shared.b16 %rs244, [%r12+272]; 2026-02-21T09:05:41.1014202Z ld.shared.b16 %rs245, [%r12+2048]; 2026-02-21T09:05:41.1014274Z ld.shared.b16 %rs246, [%r12+2304]; 2026-02-21T09:05:41.1014342Z ld.shared.b16 %rs247, [%r12+2064]; 2026-02-21T09:05:41.1014407Z ld.shared.b16 %rs248, [%r12+2320]; 2026-02-21T09:05:41.1014474Z ld.shared.b16 %rs249, [%r12+4096]; 2026-02-21T09:05:41.1014548Z ld.shared.b16 %rs250, [%r12+4352]; 2026-02-21T09:05:41.1014614Z ld.shared.b16 %rs251, [%r12+4112]; 2026-02-21T09:05:41.1014683Z ld.shared.b16 %rs252, [%r12+4368]; 2026-02-21T09:05:41.1014756Z ld.shared.b16 %rs253, [%r12+6144]; 2026-02-21T09:05:41.1014824Z ld.shared.b16 %rs254, [%r12+6400]; 2026-02-21T09:05:41.1014891Z ld.shared.b16 %rs255, [%r12+6160]; 
2026-02-21T09:05:41.1014967Z ld.shared.b16 %rs256, [%r12+6416]; 2026-02-21T09:05:41.1015033Z ld.shared.b16 %rs257, [%r13]; 2026-02-21T09:05:41.1015101Z ld.shared.b16 %rs258, [%r13+256]; 2026-02-21T09:05:41.1015180Z ld.shared.b16 %rs259, [%r13+16]; 2026-02-21T09:05:41.1015254Z ld.shared.b16 %rs260, [%r13+272]; 2026-02-21T09:05:41.1015322Z ld.shared.b16 %rs261, [%r13+2048]; 2026-02-21T09:05:41.1015387Z ld.shared.b16 %rs262, [%r13+2304]; 2026-02-21T09:05:41.1015457Z ld.shared.b16 %rs263, [%r13+2064]; 2026-02-21T09:05:41.1015522Z ld.shared.b16 %rs264, [%r13+2320]; 2026-02-21T09:05:41.1015587Z ld.shared.b16 %rs265, [%r13+4096]; 2026-02-21T09:05:41.1015655Z ld.shared.b16 %rs266, [%r13+4352]; 2026-02-21T09:05:41.1015727Z ld.shared.b16 %rs267, [%r13+4112]; 2026-02-21T09:05:41.1015799Z ld.shared.b16 %rs268, [%r13+4368]; 2026-02-21T09:05:41.1015867Z ld.shared.b16 %rs269, [%r13+6144]; 2026-02-21T09:05:41.1015938Z ld.shared.b16 %rs270, [%r13+6400]; 2026-02-21T09:05:41.1016068Z ld.shared.b16 %rs271, [%r13+6160]; 2026-02-21T09:05:41.1016137Z ld.shared.b16 %rs272, [%r13+6416]; 2026-02-21T09:05:41.1016204Z cvt.f32.bf16 %r7214, %rs241; 2026-02-21T09:05:41.1016280Z cvt.f32.bf16 %r7215, %rs242; 2026-02-21T09:05:41.1016346Z cvt.f32.bf16 %r7216, %rs257; 2026-02-21T09:05:41.1016408Z cvt.f32.bf16 %r7217, %rs258; 2026-02-21T09:05:41.1016603Z cvt.f32.bf16 %r7346, %rs243; 2026-02-21T09:05:41.1016673Z cvt.f32.bf16 %r7347, %rs244; 2026-02-21T09:05:41.1016737Z cvt.f32.bf16 %r7348, %rs259; 2026-02-21T09:05:41.1016806Z cvt.f32.bf16 %r7349, %rs260; 2026-02-21T09:05:41.1016870Z cvt.f32.bf16 %r7478, %rs245; 2026-02-21T09:05:41.1016934Z cvt.f32.bf16 %r7479, %rs246; 2026-02-21T09:05:41.1016998Z cvt.f32.bf16 %r7480, %rs261; 2026-02-21T09:05:41.1017216Z cvt.f32.bf16 %r7481, %rs262; 2026-02-21T09:05:41.1017289Z cvt.f32.bf16 %r7610, %rs247; 2026-02-21T09:05:41.1017355Z cvt.f32.bf16 %r7611, %rs248; 2026-02-21T09:05:41.1017425Z cvt.f32.bf16 %r7612, %rs263; 2026-02-21T09:05:41.1017490Z cvt.f32.bf16 %r7613, 
%rs264; 2026-02-21T09:05:41.1017554Z cvt.f32.bf16 %r7742, %rs249; 2026-02-21T09:05:41.1017618Z cvt.f32.bf16 %r7743, %rs250; 2026-02-21T09:05:41.1017687Z cvt.f32.bf16 %r7744, %rs265; 2026-02-21T09:05:41.1017750Z cvt.f32.bf16 %r7745, %rs266; 2026-02-21T09:05:41.1017813Z cvt.f32.bf16 %r7874, %rs251; 2026-02-21T09:05:41.1017880Z cvt.f32.bf16 %r7875, %rs252; 2026-02-21T09:05:41.1017943Z cvt.f32.bf16 %r7876, %rs267; 2026-02-21T09:05:41.1018008Z cvt.f32.bf16 %r7877, %rs268; 2026-02-21T09:05:41.1018071Z cvt.f32.bf16 %r8006, %rs253; 2026-02-21T09:05:41.1018141Z cvt.f32.bf16 %r8007, %rs254; 2026-02-21T09:05:41.1018204Z cvt.f32.bf16 %r8008, %rs269; 2026-02-21T09:05:41.1018268Z cvt.f32.bf16 %r8009, %rs270; 2026-02-21T09:05:41.1018339Z cvt.f32.bf16 %r8138, %rs255; 2026-02-21T09:05:41.1018468Z cvt.f32.bf16 %r8139, %rs256; 2026-02-21T09:05:41.1018534Z cvt.f32.bf16 %r8140, %rs271; 2026-02-21T09:05:41.1018601Z cvt.f32.bf16 %r8141, %rs272; 2026-02-21T09:05:41.1018809Z .loc 1 61 34 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:34 2026-02-21T09:05:41.1018876Z add.s32 %r8783, %r29095, 196608; 2026-02-21T09:05:41.1018943Z cvt.s64.s32 %rd124, %r8783; 2026-02-21T09:05:41.1019015Z add.s64 %rd112, %rd45, %rd124; 2026-02-21T09:05:41.1019214Z .loc 1 61 87 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:87 2026-02-21T09:05:41.1019277Z // begin inline asm 2026-02-21T09:05:41.1019342Z mov.u32 %r7084, 0x0; 2026-02-21T09:05:41.1019402Z mov.u32 %r7085, 0x0; 2026-02-21T09:05:41.1019502Z ld.global.v2.b32 { %r7084, %r7085 }, [ %rd112 + 0 ]; 2026-02-21T09:05:41.1019561Z // end inline asm 2026-02-21T09:05:41.1019763Z .loc 1 69 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:69:28 2026-02-21T09:05:41.1019824Z bar.sync 0; 2026-02-21T09:05:41.1019890Z st.shared.b8 [%r14], %r7084; 2026-02-21T09:05:41.1019965Z prmt.b32 %r8784, %r7084, 0, 0x7771U; 2026-02-21T09:05:41.1020034Z st.shared.b8 [%r15], %r8784; 2026-02-21T09:05:41.1020104Z prmt.b32 %r8785, %r7084, 0, 
0x7772U; 2026-02-21T09:05:41.1020181Z st.shared.b8 [%r16+256], %r8785; 2026-02-21T09:05:41.1020258Z prmt.b32 %r8786, %r7084, 0, 0x7773U; 2026-02-21T09:05:41.1020329Z st.shared.b8 [%r17+256], %r8786; 2026-02-21T09:05:41.1020394Z st.shared.b8 [%r18+512], %r7085; 2026-02-21T09:05:41.1020466Z prmt.b32 %r8787, %r7085, 0, 0x7771U; 2026-02-21T09:05:41.1020531Z st.shared.b8 [%r19+512], %r8787; 2026-02-21T09:05:41.1020599Z prmt.b32 %r8788, %r7085, 0, 0x7772U; 2026-02-21T09:05:41.1020668Z st.shared.b8 [%r20+768], %r8788; 2026-02-21T09:05:41.1020735Z prmt.b32 %r8789, %r7085, 0, 0x7773U; 2026-02-21T09:05:41.1020799Z st.shared.b8 [%r21+768], %r8789; 2026-02-21T09:05:41.1020858Z bar.sync 0; 2026-02-21T09:05:41.1020932Z ld.shared.b32 %r8790, [%r22]; 2026-02-21T09:05:41.1020998Z prmt.b32 %r8791, %r8790, 0, 0x7770U; 2026-02-21T09:05:41.1021063Z cvt.u16.u32 %rs273, %r8791; 2026-02-21T09:05:41.1021213Z prmt.b32 %r8792, %r8790, 0, 0x7771U; 2026-02-21T09:05:41.1021277Z cvt.u16.u32 %rs274, %r8792; 2026-02-21T09:05:41.1021343Z prmt.b32 %r8793, %r8790, 0, 0x7772U; 2026-02-21T09:05:41.1021413Z cvt.u16.u32 %rs275, %r8793; 2026-02-21T09:05:41.1021478Z prmt.b32 %r8794, %r8790, 0, 0x7773U; 2026-02-21T09:05:41.1021540Z cvt.u16.u32 %rs276, %r8794; 2026-02-21T09:05:41.1021605Z ld.shared.b32 %r8795, [%r23]; 2026-02-21T09:05:41.1021678Z prmt.b32 %r8796, %r8795, 0, 0x7770U; 2026-02-21T09:05:41.1021742Z cvt.u16.u32 %rs277, %r8796; 2026-02-21T09:05:41.1021808Z prmt.b32 %r8797, %r8795, 0, 0x7771U; 2026-02-21T09:05:41.1021878Z cvt.u16.u32 %rs278, %r8797; 2026-02-21T09:05:41.1021944Z prmt.b32 %r8798, %r8795, 0, 0x7772U; 2026-02-21T09:05:41.1022061Z cvt.u16.u32 %rs279, %r8798; 2026-02-21T09:05:41.1022175Z prmt.b32 %r8799, %r8795, 0, 0x7773U; 2026-02-21T09:05:41.1022247Z cvt.u16.u32 %rs280, %r8799; 2026-02-21T09:05:41.1022460Z .loc 1 64 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:64:28 2026-02-21T09:05:41.1022528Z shl.b16 %rs281, %rs273, 4; 2026-02-21T09:05:41.1022598Z shl.b16 %rs282, 
%rs277, 4; 2026-02-21T09:05:41.1022661Z shl.b16 %rs283, %rs274, 4; 2026-02-21T09:05:41.1022727Z shl.b16 %rs284, %rs278, 4; 2026-02-21T09:05:41.1022794Z shl.b16 %rs285, %rs275, 4; 2026-02-21T09:05:41.1022857Z shl.b16 %rs286, %rs279, 4; 2026-02-21T09:05:41.1022919Z shl.b16 %rs287, %rs276, 4; 2026-02-21T09:05:41.1022982Z shl.b16 %rs288, %rs280, 4; 2026-02-21T09:05:41.1023188Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1023252Z cvt.s16.s8 %rs289, %rs281; 2026-02-21T09:05:41.1023317Z shr.s16 %rs290, %rs289, 4; 2026-02-21T09:05:41.1023389Z cvt.s16.s8 %rs291, %rs282; 2026-02-21T09:05:41.1023504Z shr.s16 %rs292, %rs291, 4; 2026-02-21T09:05:41.1023574Z prmt.b32 %r8800, %r8790, 0, 0x8880U; 2026-02-21T09:05:41.1023637Z cvt.u16.u32 %rs293, %r8800; 2026-02-21T09:05:41.1023708Z shr.s16 %rs294, %rs293, 4; 2026-02-21T09:05:41.1023774Z prmt.b32 %r8801, %r8795, 0, 0x8880U; 2026-02-21T09:05:41.1023837Z cvt.u16.u32 %rs295, %r8801; 2026-02-21T09:05:41.1023908Z shr.s16 %rs296, %rs295, 4; 2026-02-21T09:05:41.1024106Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1024173Z cvt.rn.f32.s16 %r8802, %rs296; 2026-02-21T09:05:41.1024247Z cvt.rn.f32.s16 %r8803, %rs294; 2026-02-21T09:05:41.1024315Z cvt.rn.f32.s16 %r8804, %rs292; 2026-02-21T09:05:41.1024380Z cvt.rn.f32.s16 %r8805, %rs290; 2026-02-21T09:05:41.1024578Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1024647Z cvt.s16.s8 %rs297, %rs283; 2026-02-21T09:05:41.1024714Z shr.s16 %rs298, %rs297, 4; 2026-02-21T09:05:41.1024783Z cvt.s16.s8 %rs299, %rs284; 2026-02-21T09:05:41.1024854Z shr.s16 %rs300, %rs299, 4; 2026-02-21T09:05:41.1024923Z prmt.b32 %r8806, %r8790, 0, 0x9991U; 2026-02-21T09:05:41.1024990Z cvt.u16.u32 %rs301, %r8806; 2026-02-21T09:05:41.1025056Z shr.s16 %rs302, %rs301, 4; 2026-02-21T09:05:41.1025133Z prmt.b32 %r8807, %r8795, 0, 0x9991U; 2026-02-21T09:05:41.1025197Z cvt.u16.u32 
%rs303, %r8807; 2026-02-21T09:05:41.1025261Z shr.s16 %rs304, %rs303, 4; 2026-02-21T09:05:41.1025466Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1025534Z cvt.rn.f32.s16 %r8808, %rs304; 2026-02-21T09:05:41.1025600Z cvt.rn.f32.s16 %r8809, %rs302; 2026-02-21T09:05:41.1025668Z cvt.rn.f32.s16 %r8810, %rs300; 2026-02-21T09:05:41.1025731Z cvt.rn.f32.s16 %r8811, %rs298; 2026-02-21T09:05:41.1025926Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1025994Z cvt.s16.s8 %rs305, %rs285; 2026-02-21T09:05:41.1026063Z shr.s16 %rs306, %rs305, 4; 2026-02-21T09:05:41.1026124Z cvt.s16.s8 %rs307, %rs286; 2026-02-21T09:05:41.1026249Z shr.s16 %rs308, %rs307, 4; 2026-02-21T09:05:41.1026321Z prmt.b32 %r8812, %r8790, 0, 0xaaa2U; 2026-02-21T09:05:41.1026384Z cvt.u16.u32 %rs309, %r8812; 2026-02-21T09:05:41.1026568Z shr.s16 %rs310, %rs309, 4; 2026-02-21T09:05:41.1026640Z prmt.b32 %r8813, %r8795, 0, 0xaaa2U; 2026-02-21T09:05:41.1026711Z cvt.u16.u32 %rs311, %r8813; 2026-02-21T09:05:41.1026785Z shr.s16 %rs312, %rs311, 4; 2026-02-21T09:05:41.1026987Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1027060Z cvt.rn.f32.s16 %r8814, %rs312; 2026-02-21T09:05:41.1027125Z cvt.rn.f32.s16 %r8815, %rs310; 2026-02-21T09:05:41.1027188Z cvt.rn.f32.s16 %r8816, %rs308; 2026-02-21T09:05:41.1027259Z cvt.rn.f32.s16 %r8817, %rs306; 2026-02-21T09:05:41.1027618Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1027685Z cvt.s16.s8 %rs313, %rs287; 2026-02-21T09:05:41.1027747Z shr.s16 %rs314, %rs313, 4; 2026-02-21T09:05:41.1027817Z cvt.s16.s8 %rs315, %rs288; 2026-02-21T09:05:41.1027880Z shr.s16 %rs316, %rs315, 4; 2026-02-21T09:05:41.1027949Z prmt.b32 %r8818, %r8790, 0, 0xbbb3U; 2026-02-21T09:05:41.1028030Z cvt.u16.u32 %rs317, %r8818; 2026-02-21T09:05:41.1028099Z shr.s16 %rs318, %rs317, 4; 2026-02-21T09:05:41.1028167Z 
prmt.b32 %r8819, %r8795, 0, 0xbbb3U; 2026-02-21T09:05:41.1028232Z cvt.u16.u32 %rs319, %r8819; 2026-02-21T09:05:41.1028382Z shr.s16 %rs320, %rs319, 4; 2026-02-21T09:05:41.1028587Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1028653Z cvt.rn.f32.s16 %r8820, %rs320; 2026-02-21T09:05:41.1028723Z cvt.rn.f32.s16 %r8821, %rs318; 2026-02-21T09:05:41.1028786Z cvt.rn.f32.s16 %r8822, %rs316; 2026-02-21T09:05:41.1028924Z cvt.rn.f32.s16 %r8823, %rs314; 2026-02-21T09:05:41.1028988Z bar.sync 0; 2026-02-21T09:05:41.1029103Z st.shared.v4.b32 [%r24], {%r8805, %r8803, %r8804, %r8802}; 2026-02-21T09:05:41.1029220Z st.shared.v4.b32 [%r25], {%r8811, %r8809, %r8810, %r8808}; 2026-02-21T09:05:41.1029325Z st.shared.v4.b32 [%r26], {%r8817, %r8815, %r8816, %r8814}; 2026-02-21T09:05:41.1029435Z st.shared.v4.b32 [%r27], {%r8823, %r8821, %r8822, %r8820}; 2026-02-21T09:05:41.1029492Z $L__tmp7: 2026-02-21T09:05:41.1029770Z .loc 2 291 36 // standard.py:291:36 @[ c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:91:40 ] 2026-02-21T09:05:41.1029838Z // begin inline asm 2026-02-21T09:05:41.1029919Z fence.proxy.async.shared::cta; 2026-02-21T09:05:41.1029979Z // end inline asm 2026-02-21T09:05:41.1030040Z bar.sync 0; 2026-02-21T09:05:41.1030115Z wgmma.fence.sync.aligned; 2026-02-21T09:05:41.1030176Z // begin inline asm 2026-02-21T09:05:41.1031678Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29096,%r29097,%r29098,%r29099,%r29100,%r29101,%r29102,%r29103,%r29104,%r29105,%r29106,%r29107,%r29108,%r29109,%r29110,%r29111,%r29112,%r29113,%r29114,%r29115,%r29116,%r29117,%r29118,%r29119,%r29120,%r29121,%r29122,%r29123,%r29124,%r29125,%r29126,%r29127,%r29128,%r29129,%r29130,%r29131,%r29132,%r29133,%r29134,%r29135,%r29136,%r29137,%r29138,%r29139,%r29140,%r29141,%r29142,%r29143,%r29144,%r29145,%r29146,%r29147,%r29148,%r29149,%r29150,%r29151,%r29152,%r29153,%r29154,%r29155,%r29156,%r29157,%r29158,%r29159}, {%r7214,%r7215,%r7216,%r7217}, %rd2, 
%p20, 1, 1; 2026-02-21T09:05:41.1031740Z // end inline asm 2026-02-21T09:05:41.1031803Z // begin inline asm 2026-02-21T09:05:41.1033291Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29096,%r29097,%r29098,%r29099,%r29100,%r29101,%r29102,%r29103,%r29104,%r29105,%r29106,%r29107,%r29108,%r29109,%r29110,%r29111,%r29112,%r29113,%r29114,%r29115,%r29116,%r29117,%r29118,%r29119,%r29120,%r29121,%r29122,%r29123,%r29124,%r29125,%r29126,%r29127,%r29128,%r29129,%r29130,%r29131,%r29132,%r29133,%r29134,%r29135,%r29136,%r29137,%r29138,%r29139,%r29140,%r29141,%r29142,%r29143,%r29144,%r29145,%r29146,%r29147,%r29148,%r29149,%r29150,%r29151,%r29152,%r29153,%r29154,%r29155,%r29156,%r29157,%r29158,%r29159}, {%r7346,%r7347,%r7348,%r7349}, %rd3, %p20, 1, 1; 2026-02-21T09:05:41.1033427Z // end inline asm 2026-02-21T09:05:41.1033502Z // begin inline asm 2026-02-21T09:05:41.1035037Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29160,%r29161,%r29162,%r29163,%r29164,%r29165,%r29166,%r29167,%r29168,%r29169,%r29170,%r29171,%r29172,%r29173,%r29174,%r29175,%r29176,%r29177,%r29178,%r29179,%r29180,%r29181,%r29182,%r29183,%r29184,%r29185,%r29186,%r29187,%r29188,%r29189,%r29190,%r29191,%r29192,%r29193,%r29194,%r29195,%r29196,%r29197,%r29198,%r29199,%r29200,%r29201,%r29202,%r29203,%r29204,%r29205,%r29206,%r29207,%r29208,%r29209,%r29210,%r29211,%r29212,%r29213,%r29214,%r29215,%r29216,%r29217,%r29218,%r29219,%r29220,%r29221,%r29222,%r29223}, {%r7478,%r7479,%r7480,%r7481}, %rd2, %p20, 1, 1; 2026-02-21T09:05:41.1035143Z // end inline asm 2026-02-21T09:05:41.1035211Z // begin inline asm 2026-02-21T09:05:41.1036810Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29160,%r29161,%r29162,%r29163,%r29164,%r29165,%r29166,%r29167,%r29168,%r29169,%r29170,%r29171,%r29172,%r29173,%r29174,%r29175,%r29176,%r29177,%r29178,%r29179,%r29180,%r29181,%r29182,%r29183,%r29184,%r29185,%r29186,%r29187,%r29188,%r29189,%r29190,%r29191,%r29192,%r29193,%r29194,%r29195,%r29196,%r29197,%r29198,%r29199,%r29200,%r29201,%r29202,%r29203,%r29204,%r29205,%r29206,%r29207,%r29208,%r29209,%r29210,%r29211,%r29212,%r29213,%r29214,%r29215,%r29216,%r29217,%r29218,%r29219,%r29220,%r29221,%r29222,%r29223}, {%r7610,%r7611,%r7612,%r7613}, %rd3, %p20, 1, 1; 2026-02-21T09:05:41.1036876Z // end inline asm 2026-02-21T09:05:41.1036944Z // begin inline asm 2026-02-21T09:05:41.1038488Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29224,%r29225,%r29226,%r29227,%r29228,%r29229,%r29230,%r29231,%r29232,%r29233,%r29234,%r29235,%r29236,%r29237,%r29238,%r29239,%r29240,%r29241,%r29242,%r29243,%r29244,%r29245,%r29246,%r29247,%r29248,%r29249,%r29250,%r29251,%r29252,%r29253,%r29254,%r29255,%r29256,%r29257,%r29258,%r29259,%r29260,%r29261,%r29262,%r29263,%r29264,%r29265,%r29266,%r29267,%r29268,%r29269,%r29270,%r29271,%r29272,%r29273,%r29274,%r29275,%r29276,%r29277,%r29278,%r29279,%r29280,%r29281,%r29282,%r29283,%r29284,%r29285,%r29286,%r29287}, {%r7742,%r7743,%r7744,%r7745}, %rd2, %p20, 1, 1; 2026-02-21T09:05:41.1038560Z // end inline asm 2026-02-21T09:05:41.1038625Z // begin inline asm 2026-02-21T09:05:41.1040104Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29224,%r29225,%r29226,%r29227,%r29228,%r29229,%r29230,%r29231,%r29232,%r29233,%r29234,%r29235,%r29236,%r29237,%r29238,%r29239,%r29240,%r29241,%r29242,%r29243,%r29244,%r29245,%r29246,%r29247,%r29248,%r29249,%r29250,%r29251,%r29252,%r29253,%r29254,%r29255,%r29256,%r29257,%r29258,%r29259,%r29260,%r29261,%r29262,%r29263,%r29264,%r29265,%r29266,%r29267,%r29268,%r29269,%r29270,%r29271,%r29272,%r29273,%r29274,%r29275,%r29276,%r29277,%r29278,%r29279,%r29280,%r29281,%r29282,%r29283,%r29284,%r29285,%r29286,%r29287}, 
{%r7874,%r7875,%r7876,%r7877}, %rd3, %p20, 1, 1; 2026-02-21T09:05:41.1040173Z // end inline asm 2026-02-21T09:05:41.1040234Z // begin inline asm 2026-02-21T09:05:41.1041704Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29288,%r29289,%r29290,%r29291,%r29292,%r29293,%r29294,%r29295,%r29296,%r29297,%r29298,%r29299,%r29300,%r29301,%r29302,%r29303,%r29304,%r29305,%r29306,%r29307,%r29308,%r29309,%r29310,%r29311,%r29312,%r29313,%r29314,%r29315,%r29316,%r29317,%r29318,%r29319,%r29320,%r29321,%r29322,%r29323,%r29324,%r29325,%r29326,%r29327,%r29328,%r29329,%r29330,%r29331,%r29332,%r29333,%r29334,%r29335,%r29336,%r29337,%r29338,%r29339,%r29340,%r29341,%r29342,%r29343,%r29344,%r29345,%r29346,%r29347,%r29348,%r29349,%r29350,%r29351}, {%r8006,%r8007,%r8008,%r8009}, %rd2, %p20, 1, 1; 2026-02-21T09:05:41.1041784Z // end inline asm 2026-02-21T09:05:41.1041852Z // begin inline asm 2026-02-21T09:05:41.1043329Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29288,%r29289,%r29290,%r29291,%r29292,%r29293,%r29294,%r29295,%r29296,%r29297,%r29298,%r29299,%r29300,%r29301,%r29302,%r29303,%r29304,%r29305,%r29306,%r29307,%r29308,%r29309,%r29310,%r29311,%r29312,%r29313,%r29314,%r29315,%r29316,%r29317,%r29318,%r29319,%r29320,%r29321,%r29322,%r29323,%r29324,%r29325,%r29326,%r29327,%r29328,%r29329,%r29330,%r29331,%r29332,%r29333,%r29334,%r29335,%r29336,%r29337,%r29338,%r29339,%r29340,%r29341,%r29342,%r29343,%r29344,%r29345,%r29346,%r29347,%r29348,%r29349,%r29350,%r29351}, {%r8138,%r8139,%r8140,%r8141}, %rd3, %p20, 1, 1; 2026-02-21T09:05:41.1043468Z // end inline asm 2026-02-21T09:05:41.1043552Z wgmma.commit_group.sync.aligned; 2026-02-21T09:05:41.1043621Z mov.b32 %r8399, %r8400; 2026-02-21T09:05:41.1043683Z mov.b32 %r8398, %r24001; 2026-02-21T09:05:41.1043745Z // begin inline asm 2026-02-21T09:05:41.1048963Z // wait for regs: 
%r29096,%r29097,%r29098,%r29099,%r29100,%r29101,%r29102,%r29103,%r29104,%r29105,%r29106,%r29107,%r29108,%r29109,%r29110,%r29111,%r29112,%r29113,%r29114,%r29115,%r29116,%r29117,%r29118,%r29119,%r29120,%r29121,%r29122,%r29123,%r29124,%r29125,%r29126,%r29127,%r29128,%r29129,%r29130,%r29131,%r29132,%r29133,%r29134,%r29135,%r29136,%r29137,%r29138,%r29139,%r29140,%r29141,%r29142,%r29143,%r29144,%r29145,%r29146,%r29147,%r29148,%r29149,%r29150,%r29151,%r29152,%r29153,%r29154,%r29155,%r29156,%r29157,%r29158,%r29159,%r29160,%r29161,%r29162,%r29163,%r29164,%r29165,%r29166,%r29167,%r29168,%r29169,%r29170,%r29171,%r29172,%r29173,%r29174,%r29175,%r29176,%r29177,%r29178,%r29179,%r29180,%r29181,%r29182,%r29183,%r29184,%r29185,%r29186,%r29187,%r29188,%r29189,%r29190,%r29191,%r29192,%r29193,%r29194,%r29195,%r29196,%r29197,%r29198,%r29199,%r29200,%r29201,%r29202,%r29203,%r29204,%r29205,%r29206,%r29207,%r29208,%r29209,%r29210,%r29211,%r29212,%r29213,%r29214,%r29215,%r29216,%r29217,%r29218,%r29219,%r29220,%r29221,%r29222,%r29223,%r29224,%r29225,%r29226,%r29227,%r29228,%r29229,%r29230,%r29231,%r29232,%r29233,%r29234,%r29235,%r29236,%r29237,%r29238,%r29239,%r29240,%r29241,%r29242,%r29243,%r29244,%r29245,%r29246,%r29247,%r29248,%r29249,%r29250,%r29251,%r29252,%r29253,%r29254,%r29255,%r29256,%r29257,%r29258,%r29259,%r29260,%r29261,%r29262,%r29263,%r29264,%r29265,%r29266,%r29267,%r29268,%r29269,%r29270,%r29271,%r29272,%r29273,%r29274,%r29275,%r29276,%r29277,%r29278,%r29279,%r29280,%r29281,%r29282,%r29283,%r29284,%r29285,%r29286,%r29287,%r29288,%r29289,%r29290,%r29291,%r29292,%r29293,%r29294,%r29295,%r29296,%r29297,%r29298,%r29299,%r29300,%r29301,%r29302,%r29303,%r29304,%r29305,%r29306,%r29307,%r29308,%r29309,%r29310,%r29311,%r29312,%r29313,%r29314,%r29315,%r29316,%r29317,%r29318,%r29319,%r29320,%r29321,%r29322,%r29323,%r29324,%r29325,%r29326,%r29327,%r29328,%r29329,%r29330,%r29331,%r29332,%r29333,%r29334,%r29335,%r29336,%r29337,%r29338,%r29339,%r29340,%r29341,%r29342,%r29343,%r29344,%r29345,
%r29346,%r29347,%r29348,%r29349,%r29350,%r29351,%r8398,%r8399,%r8400 2026-02-21T09:05:41.1049137Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:05:41.1049198Z // end inline asm 2026-02-21T09:05:41.1049255Z $L__tmp8: 2026-02-21T09:05:41.1049482Z .loc 1 47 111 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:47:111 2026-02-21T09:05:41.1049551Z add.s64 %rd300, %rd300, 32; 2026-02-21T09:05:41.1049619Z add.s32 %r29095, %r29095, 262144; 2026-02-21T09:05:41.1049687Z add.s64 %rd299, %rd299, 128; 2026-02-21T09:05:41.1049763Z setp.lt.u64 %p52, %rd300, 480; 2026-02-21T09:05:41.1049825Z @%p52 bra $L__BB0_3; 2026-02-21T09:05:41.1049941Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:41.1050158Z .loc 1 94 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:94:28 2026-02-21T09:05:41.1050247Z cvt.rn.bf16x2.f32 %r8828, %r29097, %r29096; 2026-02-21T09:05:41.1050329Z cvt.rn.bf16x2.f32 %r8829, %r29099, %r29098; 2026-02-21T09:05:41.1050416Z cvt.rn.bf16x2.f32 %r8830, %r29101, %r29100; 2026-02-21T09:05:41.1050494Z cvt.rn.bf16x2.f32 %r8831, %r29103, %r29102; 2026-02-21T09:05:41.1050573Z cvt.rn.bf16x2.f32 %r8832, %r29105, %r29104; 2026-02-21T09:05:41.1050650Z cvt.rn.bf16x2.f32 %r8833, %r29107, %r29106; 2026-02-21T09:05:41.1050805Z cvt.rn.bf16x2.f32 %r8834, %r29109, %r29108; 2026-02-21T09:05:41.1050883Z cvt.rn.bf16x2.f32 %r8835, %r29111, %r29110; 2026-02-21T09:05:41.1050958Z cvt.rn.bf16x2.f32 %r8836, %r29113, %r29112; 2026-02-21T09:05:41.1051044Z cvt.rn.bf16x2.f32 %r8837, %r29115, %r29114; 2026-02-21T09:05:41.1051121Z cvt.rn.bf16x2.f32 %r8838, %r29117, %r29116; 2026-02-21T09:05:41.1051198Z cvt.rn.bf16x2.f32 %r8839, %r29119, %r29118; 2026-02-21T09:05:41.1051279Z cvt.rn.bf16x2.f32 %r8840, %r29121, %r29120; 2026-02-21T09:05:41.1051353Z cvt.rn.bf16x2.f32 %r8841, %r29123, %r29122; 2026-02-21T09:05:41.1051429Z cvt.rn.bf16x2.f32 %r8842, %r29125, %r29124; 2026-02-21T09:05:41.1051504Z cvt.rn.bf16x2.f32 %r8843, %r29127, %r29126; 2026-02-21T09:05:41.1051694Z 
cvt.rn.bf16x2.f32 %r8844, %r29129, %r29128; 2026-02-21T09:05:41.1051776Z cvt.rn.bf16x2.f32 %r8845, %r29131, %r29130; 2026-02-21T09:05:41.1051853Z cvt.rn.bf16x2.f32 %r8846, %r29133, %r29132; 2026-02-21T09:05:41.1051936Z cvt.rn.bf16x2.f32 %r8847, %r29135, %r29134; 2026-02-21T09:05:41.1052012Z cvt.rn.bf16x2.f32 %r8848, %r29137, %r29136; 2026-02-21T09:05:41.1052086Z cvt.rn.bf16x2.f32 %r8849, %r29139, %r29138; 2026-02-21T09:05:41.1052170Z cvt.rn.bf16x2.f32 %r8850, %r29141, %r29140; 2026-02-21T09:05:41.1052246Z cvt.rn.bf16x2.f32 %r8851, %r29143, %r29142; 2026-02-21T09:05:41.1052321Z cvt.rn.bf16x2.f32 %r8852, %r29145, %r29144; 2026-02-21T09:05:41.1052397Z cvt.rn.bf16x2.f32 %r8853, %r29147, %r29146; 2026-02-21T09:05:41.1052478Z cvt.rn.bf16x2.f32 %r8854, %r29149, %r29148; 2026-02-21T09:05:41.1052552Z cvt.rn.bf16x2.f32 %r8855, %r29151, %r29150; 2026-02-21T09:05:41.1052627Z cvt.rn.bf16x2.f32 %r8856, %r29153, %r29152; 2026-02-21T09:05:41.1052707Z cvt.rn.bf16x2.f32 %r8857, %r29155, %r29154; 2026-02-21T09:05:41.1052836Z cvt.rn.bf16x2.f32 %r8858, %r29157, %r29156; 2026-02-21T09:05:41.1052915Z cvt.rn.bf16x2.f32 %r8859, %r29159, %r29158; 2026-02-21T09:05:41.1052999Z cvt.rn.bf16x2.f32 %r8860, %r29161, %r29160; 2026-02-21T09:05:41.1053076Z cvt.rn.bf16x2.f32 %r8861, %r29163, %r29162; 2026-02-21T09:05:41.1053152Z cvt.rn.bf16x2.f32 %r8862, %r29165, %r29164; 2026-02-21T09:05:41.1053232Z cvt.rn.bf16x2.f32 %r8863, %r29167, %r29166; 2026-02-21T09:05:41.1053314Z cvt.rn.bf16x2.f32 %r8864, %r29169, %r29168; 2026-02-21T09:05:41.1053388Z cvt.rn.bf16x2.f32 %r8865, %r29171, %r29170; 2026-02-21T09:05:41.1053465Z cvt.rn.bf16x2.f32 %r8866, %r29173, %r29172; 2026-02-21T09:05:41.1053546Z cvt.rn.bf16x2.f32 %r8867, %r29175, %r29174; 2026-02-21T09:05:41.1053621Z cvt.rn.bf16x2.f32 %r8868, %r29177, %r29176; 2026-02-21T09:05:41.1053697Z cvt.rn.bf16x2.f32 %r8869, %r29179, %r29178; 2026-02-21T09:05:41.1053777Z cvt.rn.bf16x2.f32 %r8870, %r29181, %r29180; 2026-02-21T09:05:41.1053857Z cvt.rn.bf16x2.f32 %r8871, 
%r29183, %r29182; 2026-02-21T09:05:41.1053935Z cvt.rn.bf16x2.f32 %r8872, %r29185, %r29184; 2026-02-21T09:05:41.1054010Z cvt.rn.bf16x2.f32 %r8873, %r29187, %r29186; 2026-02-21T09:05:41.1054090Z cvt.rn.bf16x2.f32 %r8874, %r29189, %r29188; 2026-02-21T09:05:41.1054170Z cvt.rn.bf16x2.f32 %r8875, %r29191, %r29190; 2026-02-21T09:05:41.1054246Z cvt.rn.bf16x2.f32 %r8876, %r29193, %r29192; 2026-02-21T09:05:41.1054326Z cvt.rn.bf16x2.f32 %r8877, %r29195, %r29194; 2026-02-21T09:05:41.1054403Z cvt.rn.bf16x2.f32 %r8878, %r29197, %r29196; 2026-02-21T09:05:41.1054479Z cvt.rn.bf16x2.f32 %r8879, %r29199, %r29198; 2026-02-21T09:05:41.1054568Z cvt.rn.bf16x2.f32 %r8880, %r29201, %r29200; 2026-02-21T09:05:41.1054656Z cvt.rn.bf16x2.f32 %r8881, %r29203, %r29202; 2026-02-21T09:05:41.1054732Z cvt.rn.bf16x2.f32 %r8882, %r29205, %r29204; 2026-02-21T09:05:41.1054810Z cvt.rn.bf16x2.f32 %r8883, %r29207, %r29206; 2026-02-21T09:05:41.1054895Z cvt.rn.bf16x2.f32 %r8884, %r29209, %r29208; 2026-02-21T09:05:41.1054972Z cvt.rn.bf16x2.f32 %r8885, %r29211, %r29210; 2026-02-21T09:05:41.1055050Z cvt.rn.bf16x2.f32 %r8886, %r29213, %r29212; 2026-02-21T09:05:41.1055134Z cvt.rn.bf16x2.f32 %r8887, %r29215, %r29214; 2026-02-21T09:05:41.1055285Z cvt.rn.bf16x2.f32 %r8888, %r29217, %r29216; 2026-02-21T09:05:41.1055361Z cvt.rn.bf16x2.f32 %r8889, %r29219, %r29218; 2026-02-21T09:05:41.1055444Z cvt.rn.bf16x2.f32 %r8890, %r29221, %r29220; 2026-02-21T09:05:41.1055521Z cvt.rn.bf16x2.f32 %r8891, %r29223, %r29222; 2026-02-21T09:05:41.1055598Z cvt.rn.bf16x2.f32 %r8892, %r29225, %r29224; 2026-02-21T09:05:41.1055676Z cvt.rn.bf16x2.f32 %r8893, %r29227, %r29226; 2026-02-21T09:05:41.1055758Z cvt.rn.bf16x2.f32 %r8894, %r29229, %r29228; 2026-02-21T09:05:41.1055832Z cvt.rn.bf16x2.f32 %r8895, %r29231, %r29230; 2026-02-21T09:05:41.1055909Z cvt.rn.bf16x2.f32 %r8896, %r29233, %r29232; 2026-02-21T09:05:41.1055999Z cvt.rn.bf16x2.f32 %r8897, %r29235, %r29234; 2026-02-21T09:05:41.1056073Z cvt.rn.bf16x2.f32 %r8898, %r29237, %r29236; 
2026-02-21T09:05:41.1056252Z cvt.rn.bf16x2.f32 %r8899, %r29239, %r29238; 2026-02-21T09:05:41.1056333Z cvt.rn.bf16x2.f32 %r8900, %r29241, %r29240; 2026-02-21T09:05:41.1056415Z cvt.rn.bf16x2.f32 %r8901, %r29243, %r29242; 2026-02-21T09:05:41.1056611Z cvt.rn.bf16x2.f32 %r8902, %r29245, %r29244; 2026-02-21T09:05:41.1056692Z cvt.rn.bf16x2.f32 %r8903, %r29247, %r29246; 2026-02-21T09:05:41.1056777Z cvt.rn.bf16x2.f32 %r8904, %r29249, %r29248; 2026-02-21T09:05:41.1056853Z cvt.rn.bf16x2.f32 %r8905, %r29251, %r29250; 2026-02-21T09:05:41.1056928Z cvt.rn.bf16x2.f32 %r8906, %r29253, %r29252; 2026-02-21T09:05:41.1057010Z cvt.rn.bf16x2.f32 %r8907, %r29255, %r29254; 2026-02-21T09:05:41.1057086Z cvt.rn.bf16x2.f32 %r8908, %r29257, %r29256; 2026-02-21T09:05:41.1057161Z cvt.rn.bf16x2.f32 %r8909, %r29259, %r29258; 2026-02-21T09:05:41.1057234Z cvt.rn.bf16x2.f32 %r8910, %r29261, %r29260; 2026-02-21T09:05:41.1057331Z cvt.rn.bf16x2.f32 %r8911, %r29263, %r29262; 2026-02-21T09:05:41.1057416Z cvt.rn.bf16x2.f32 %r8912, %r29265, %r29264; 2026-02-21T09:05:41.1057570Z cvt.rn.bf16x2.f32 %r8913, %r29267, %r29266; 2026-02-21T09:05:41.1057659Z cvt.rn.bf16x2.f32 %r8914, %r29269, %r29268; 2026-02-21T09:05:41.1057740Z cvt.rn.bf16x2.f32 %r8915, %r29271, %r29270; 2026-02-21T09:05:41.1057818Z cvt.rn.bf16x2.f32 %r8916, %r29273, %r29272; 2026-02-21T09:05:41.1057903Z cvt.rn.bf16x2.f32 %r8917, %r29275, %r29274; 2026-02-21T09:05:41.1057980Z cvt.rn.bf16x2.f32 %r8918, %r29277, %r29276; 2026-02-21T09:05:41.1058056Z cvt.rn.bf16x2.f32 %r8919, %r29279, %r29278; 2026-02-21T09:05:41.1058132Z cvt.rn.bf16x2.f32 %r8920, %r29281, %r29280; 2026-02-21T09:05:41.1058216Z cvt.rn.bf16x2.f32 %r8921, %r29283, %r29282; 2026-02-21T09:05:41.1058291Z cvt.rn.bf16x2.f32 %r8922, %r29285, %r29284; 2026-02-21T09:05:41.1058368Z cvt.rn.bf16x2.f32 %r8923, %r29287, %r29286; 2026-02-21T09:05:41.1058451Z cvt.rn.bf16x2.f32 %r8924, %r29289, %r29288; 2026-02-21T09:05:41.1058526Z cvt.rn.bf16x2.f32 %r8925, %r29291, %r29290; 2026-02-21T09:05:41.1058608Z 
cvt.rn.bf16x2.f32 %r8926, %r29293, %r29292; 2026-02-21T09:05:41.1058692Z cvt.rn.bf16x2.f32 %r8927, %r29295, %r29294; 2026-02-21T09:05:41.1058768Z cvt.rn.bf16x2.f32 %r8928, %r29297, %r29296; 2026-02-21T09:05:41.1058847Z cvt.rn.bf16x2.f32 %r8929, %r29299, %r29298; 2026-02-21T09:05:41.1058926Z cvt.rn.bf16x2.f32 %r8930, %r29301, %r29300; 2026-02-21T09:05:41.1059008Z cvt.rn.bf16x2.f32 %r8931, %r29303, %r29302; 2026-02-21T09:05:41.1059087Z cvt.rn.bf16x2.f32 %r8932, %r29305, %r29304; 2026-02-21T09:05:41.1059163Z cvt.rn.bf16x2.f32 %r8933, %r29307, %r29306; 2026-02-21T09:05:41.1059260Z cvt.rn.bf16x2.f32 %r8934, %r29309, %r29308; 2026-02-21T09:05:41.1059338Z cvt.rn.bf16x2.f32 %r8935, %r29311, %r29310; 2026-02-21T09:05:41.1059417Z cvt.rn.bf16x2.f32 %r8936, %r29313, %r29312; 2026-02-21T09:05:41.1059499Z cvt.rn.bf16x2.f32 %r8937, %r29315, %r29314; 2026-02-21T09:05:41.1059578Z cvt.rn.bf16x2.f32 %r8938, %r29317, %r29316; 2026-02-21T09:05:41.1059657Z cvt.rn.bf16x2.f32 %r8939, %r29319, %r29318; 2026-02-21T09:05:41.1059738Z cvt.rn.bf16x2.f32 %r8940, %r29321, %r29320; 2026-02-21T09:05:41.1059822Z cvt.rn.bf16x2.f32 %r8941, %r29323, %r29322; 2026-02-21T09:05:41.1059897Z cvt.rn.bf16x2.f32 %r8942, %r29325, %r29324; 2026-02-21T09:05:41.1060048Z cvt.rn.bf16x2.f32 %r8943, %r29327, %r29326; 2026-02-21T09:05:41.1060130Z cvt.rn.bf16x2.f32 %r8944, %r29329, %r29328; 2026-02-21T09:05:41.1060206Z cvt.rn.bf16x2.f32 %r8945, %r29331, %r29330; 2026-02-21T09:05:41.1060282Z cvt.rn.bf16x2.f32 %r8946, %r29333, %r29332; 2026-02-21T09:05:41.1060363Z cvt.rn.bf16x2.f32 %r8947, %r29335, %r29334; 2026-02-21T09:05:41.1060440Z cvt.rn.bf16x2.f32 %r8948, %r29337, %r29336; 2026-02-21T09:05:41.1060515Z cvt.rn.bf16x2.f32 %r8949, %r29339, %r29338; 2026-02-21T09:05:41.1060590Z cvt.rn.bf16x2.f32 %r8950, %r29341, %r29340; 2026-02-21T09:05:41.1060671Z cvt.rn.bf16x2.f32 %r8951, %r29343, %r29342; 2026-02-21T09:05:41.1060747Z cvt.rn.bf16x2.f32 %r8952, %r29345, %r29344; 2026-02-21T09:05:41.1060949Z cvt.rn.bf16x2.f32 %r8953, 
%r29347, %r29346; 2026-02-21T09:05:41.1061035Z cvt.rn.bf16x2.f32 %r8954, %r29349, %r29348; 2026-02-21T09:05:41.1061114Z cvt.rn.bf16x2.f32 %r8955, %r29351, %r29350; 2026-02-21T09:05:41.1061324Z .loc 1 95 43 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:95:43 2026-02-21T09:05:41.1061392Z bar.sync 0; 2026-02-21T09:05:41.1061584Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r28], {%r8828, %r8829, %r8830, %r8831}; 2026-02-21T09:05:41.1061764Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r29], {%r8844, %r8845, %r8846, %r8847}; 2026-02-21T09:05:41.1061940Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r30], {%r8860, %r8861, %r8862, %r8863}; 2026-02-21T09:05:41.1062121Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r31], {%r8876, %r8877, %r8878, %r8879}; 2026-02-21T09:05:41.1062296Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r32], {%r8892, %r8893, %r8894, %r8895}; 2026-02-21T09:05:41.1062470Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r33], {%r8908, %r8909, %r8910, %r8911}; 2026-02-21T09:05:41.1062703Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r34], {%r8924, %r8925, %r8926, %r8927}; 2026-02-21T09:05:41.1062878Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r35], {%r8940, %r8941, %r8942, %r8943}; 2026-02-21T09:05:41.1063070Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r36], {%r8832, %r8833, %r8834, %r8835}; 2026-02-21T09:05:41.1063251Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r37], {%r8848, %r8849, %r8850, %r8851}; 2026-02-21T09:05:41.1063427Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r38], {%r8864, %r8865, %r8866, %r8867}; 2026-02-21T09:05:41.1063601Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r39], {%r8880, %r8881, %r8882, %r8883}; 2026-02-21T09:05:41.1063781Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r40], {%r8896, %r8897, %r8898, %r8899}; 2026-02-21T09:05:41.1063960Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r41], {%r8912, %r8913, %r8914, %r8915}; 2026-02-21T09:05:41.1064135Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r42], {%r8928, %r8929, %r8930, %r8931}; 
2026-02-21T09:05:41.1064321Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r43], {%r8944, %r8945, %r8946, %r8947}; 2026-02-21T09:05:41.1064497Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r44], {%r8836, %r8837, %r8838, %r8839}; 2026-02-21T09:05:41.1064677Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r45], {%r8852, %r8853, %r8854, %r8855}; 2026-02-21T09:05:41.1064860Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r46], {%r8868, %r8869, %r8870, %r8871}; 2026-02-21T09:05:41.1065033Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r47], {%r8884, %r8885, %r8886, %r8887}; 2026-02-21T09:05:41.1065206Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r48], {%r8900, %r8901, %r8902, %r8903}; 2026-02-21T09:05:41.1065380Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r49], {%r8916, %r8917, %r8918, %r8919}; 2026-02-21T09:05:41.1065561Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r50], {%r8932, %r8933, %r8934, %r8935}; 2026-02-21T09:05:41.1065735Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r51], {%r8948, %r8949, %r8950, %r8951}; 2026-02-21T09:05:41.1065924Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r52], {%r8840, %r8841, %r8842, %r8843}; 2026-02-21T09:05:41.1066105Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r53], {%r8856, %r8857, %r8858, %r8859}; 2026-02-21T09:05:41.1066336Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r54], {%r8872, %r8873, %r8874, %r8875}; 2026-02-21T09:05:41.1066627Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r55], {%r8888, %r8889, %r8890, %r8891}; 2026-02-21T09:05:41.1066815Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r56], {%r8904, %r8905, %r8906, %r8907}; 2026-02-21T09:05:41.1066989Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r57], {%r8920, %r8921, %r8922, %r8923}; 2026-02-21T09:05:41.1067166Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r58], {%r8936, %r8937, %r8938, %r8939}; 2026-02-21T09:05:41.1067347Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r59], {%r8952, %r8953, %r8954, %r8955}; 2026-02-21T09:05:41.1067413Z // begin inline asm 2026-02-21T09:05:41.1067498Z 
fence.proxy.async.shared::cta; 2026-02-21T09:05:41.1067699Z // end inline asm 2026-02-21T09:05:41.1067761Z bar.sync 0; 2026-02-21T09:05:41.1067835Z elect.sync %r8956|%p55, -1; 2026-02-21T09:05:41.1067920Z shfl.sync.idx.b32 %r8957, %r4, 0, 31, -1; 2026-02-21T09:05:41.1068013Z and.pred %p53, %p167, %p55; 2026-02-21T09:05:41.1068082Z and.b32 %r8958, %r8957, 1; 2026-02-21T09:05:41.1068146Z shl.b32 %r8959, %r8958, 15; 2026-02-21T09:05:41.1068221Z add.s32 %r15516, %r24001, %r8959; 2026-02-21T09:05:41.1068375Z shl.b32 %r638, %r8958, 6; 2026-02-21T09:05:41.1068444Z or.b32 %r8824, %r638, %r121; 2026-02-21T09:05:41.1068506Z // begin inline asm 2026-02-21T09:05:41.1068752Z @%p53 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd183, {%r8824, %r8825}], [%r15516]; 2026-02-21T09:05:41.1068869Z // end inline asm 2026-02-21T09:05:41.1068979Z cp.async.bulk.commit_group; 2026-02-21T09:05:41.1069156Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:05:41.1069255Z bar.sync 0; 2026-02-21T09:05:41.1069586Z .loc 1 26 144 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:26:144 2026-02-21T09:05:41.1069727Z add.s32 %r8961, %r29094, 1056; 2026-02-21T09:05:41.1070524Z .loc 1 32 35 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:32:35 2026-02-21T09:05:41.1070675Z shr.s32 %r8962, %r8961, 31; 2026-02-21T09:05:41.1070771Z shr.u32 %r8963, %r8962, 25; 2026-02-21T09:05:41.1070907Z add.s32 %r8964, %r8961, %r8963; 2026-02-21T09:05:41.1071005Z shr.s32 %r8965, %r8964, 7; 2026-02-21T09:05:41.1071241Z .loc 1 33 33 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:33:33 2026-02-21T09:05:41.1071412Z shl.b32 %r8966, %r8965, 1; 2026-02-21T09:05:41.1071676Z .loc 1 34 39 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:34:39 2026-02-21T09:05:41.1071770Z sub.s32 %r8967, 16, %r8966; 2026-02-21T09:05:41.1072037Z .loc 1 34 52 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:34:52 2026-02-21T09:05:41.1072137Z min.s32 %r8968, %r8967, 2; 
2026-02-21T09:05:41.1072369Z .loc 1 35 45 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:35:45 2026-02-21T09:05:41.1072549Z and.b32 %r8969, %r8964, -128; 2026-02-21T09:05:41.1072663Z sub.s32 %r8970, %r8961, %r8969; 2026-02-21T09:05:41.1072892Z .loc 1 36 51 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:36:51 2026-02-21T09:05:41.1072990Z div.s32 %r8971, %r8970, %r8968; 2026-02-21T09:05:41.1073260Z .loc 1 35 64 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:35:64 2026-02-21T09:05:41.1073363Z mul.lo.s32 %r8972, %r8971, %r8968; 2026-02-21T09:05:41.1073445Z sub.s32 %r8973, %r8970, %r8972; 2026-02-21T09:05:41.1073783Z .loc 1 35 30 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:35:30 2026-02-21T09:05:41.1073891Z add.s32 %r8974, %r8973, %r8966; 2026-02-21T09:05:41.1074130Z .loc 1 37 27 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:37:27 2026-02-21T09:05:41.1074268Z shl.b32 %r15515, %r8974, 8; 2026-02-21T09:05:41.1074500Z .loc 1 39 27 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:39:27 2026-02-21T09:05:41.1074685Z shl.b32 %r640, %r8971, 7; 2026-02-21T09:05:41.1075026Z .loc 1 47 111 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:47:111 2026-02-21T09:05:41.1075122Z add.s32 %r29352, %r60, %r640; 2026-02-21T09:05:41.1075222Z or.b32 %r8975, %r29091, %r15515; 2026-02-21T09:05:41.1075317Z shl.b32 %r8976, %r8975, 10; 2026-02-21T09:05:41.1075470Z mul.wide.s32 %rd17, %r8976, 2; 2026-02-21T09:05:41.1075549Z or.b32 %r8977, %r29092, %r15515; 2026-02-21T09:05:41.1075689Z shl.b32 %r8978, %r8977, 10; 2026-02-21T09:05:41.1075842Z mul.wide.s32 %rd18, %r8978, 2; 2026-02-21T09:05:41.1075944Z or.b32 %r8979, %r29093, %r15515; 2026-02-21T09:05:41.1076043Z shl.b32 %r8980, %r8979, 10; 2026-02-21T09:05:41.1076311Z mul.wide.s32 %rd19, %r8980, 2; 2026-02-21T09:05:41.1076395Z shl.b32 %r8981, %r8974, 18; 2026-02-21T09:05:41.1076685Z or.b32 %r8982, %r64, %r8981; 2026-02-21T09:05:41.1076810Z mul.wide.s32 %rd20, %r8982, 
2; 2026-02-21T09:05:41.1076948Z mov.b32 %r29353, 0f00000000; 2026-02-21T09:05:41.1077069Z mov.b64 %rd302, -32; 2026-02-21T09:05:41.1077167Z mov.b64 %rd301, %rd4; 2026-02-21T09:05:41.1077289Z mov.b32 %r29354, %r29353; 2026-02-21T09:05:41.1077432Z mov.b32 %r29355, %r29353; 2026-02-21T09:05:41.1077544Z mov.b32 %r29356, %r29353; 2026-02-21T09:05:41.1077695Z mov.b32 %r29357, %r29353; 2026-02-21T09:05:41.1077786Z mov.b32 %r29358, %r29353; 2026-02-21T09:05:41.1077883Z mov.b32 %r29359, %r29353; 2026-02-21T09:05:41.1077977Z mov.b32 %r29360, %r29353; 2026-02-21T09:05:41.1078135Z mov.b32 %r29361, %r29353; 2026-02-21T09:05:41.1078247Z mov.b32 %r29362, %r29353; 2026-02-21T09:05:41.1078351Z mov.b32 %r29363, %r29353; 2026-02-21T09:05:41.1078499Z mov.b32 %r29364, %r29353; 2026-02-21T09:05:41.1078607Z mov.b32 %r29365, %r29353; 2026-02-21T09:05:41.1078792Z mov.b32 %r29366, %r29353; 2026-02-21T09:05:41.1078873Z mov.b32 %r29367, %r29353; 2026-02-21T09:05:41.1079064Z mov.b32 %r29368, %r29353; 2026-02-21T09:05:41.1079180Z mov.b32 %r29369, %r29353; 2026-02-21T09:05:41.1079279Z mov.b32 %r29370, %r29353; 2026-02-21T09:05:41.1079407Z mov.b32 %r29371, %r29353; 2026-02-21T09:05:41.1079501Z mov.b32 %r29372, %r29353; 2026-02-21T09:05:41.1079577Z mov.b32 %r29373, %r29353; 2026-02-21T09:05:41.1079786Z mov.b32 %r29374, %r29353; 2026-02-21T09:05:41.1079882Z mov.b32 %r29375, %r29353; 2026-02-21T09:05:41.1079976Z mov.b32 %r29376, %r29353; 2026-02-21T09:05:41.1080069Z mov.b32 %r29377, %r29353; 2026-02-21T09:05:41.1080199Z mov.b32 %r29378, %r29353; 2026-02-21T09:05:41.1080276Z mov.b32 %r29379, %r29353; 2026-02-21T09:05:41.1080411Z mov.b32 %r29380, %r29353; 2026-02-21T09:05:41.1080574Z mov.b32 %r29381, %r29353; 2026-02-21T09:05:41.1080670Z mov.b32 %r29382, %r29353; 2026-02-21T09:05:41.1080765Z mov.b32 %r29383, %r29353; 2026-02-21T09:05:41.1080863Z mov.b32 %r29384, %r29353; 2026-02-21T09:05:41.1080981Z mov.b32 %r29385, %r29353; 2026-02-21T09:05:41.1081125Z mov.b32 %r29386, %r29353; 
2026-02-21T09:05:41.1081255Z mov.b32 %r29387, %r29353; 2026-02-21T09:05:41.1081387Z mov.b32 %r29388, %r29353; 2026-02-21T09:05:41.1081481Z mov.b32 %r29389, %r29353; 2026-02-21T09:05:41.1081572Z mov.b32 %r29390, %r29353; 2026-02-21T09:05:41.1081667Z mov.b32 %r29391, %r29353; 2026-02-21T09:05:41.1081841Z mov.b32 %r29392, %r29353; 2026-02-21T09:05:41.1081952Z mov.b32 %r29393, %r29353; 2026-02-21T09:05:41.1082049Z mov.b32 %r29394, %r29353; 2026-02-21T09:05:41.1082180Z mov.b32 %r29395, %r29353; 2026-02-21T09:05:41.1082277Z mov.b32 %r29396, %r29353; 2026-02-21T09:05:41.1082371Z mov.b32 %r29397, %r29353; 2026-02-21T09:05:41.1082556Z mov.b32 %r29398, %r29353; 2026-02-21T09:05:41.1082667Z mov.b32 %r29399, %r29353; 2026-02-21T09:05:41.1082763Z mov.b32 %r29400, %r29353; 2026-02-21T09:05:41.1082859Z mov.b32 %r29401, %r29353; 2026-02-21T09:05:41.1082994Z mov.b32 %r29402, %r29353; 2026-02-21T09:05:41.1083088Z mov.b32 %r29403, %r29353; 2026-02-21T09:05:41.1083187Z mov.b32 %r29404, %r29353; 2026-02-21T09:05:41.1083469Z mov.b32 %r29405, %r29353; 2026-02-21T09:05:41.1083567Z mov.b32 %r29406, %r29353; 2026-02-21T09:05:41.1084428Z mov.b32 %r29407, %r29353; 2026-02-21T09:05:41.1084524Z mov.b32 %r29408, %r29353; 2026-02-21T09:05:41.1084670Z mov.b32 %r29409, %r29353; 2026-02-21T09:05:41.1084768Z mov.b32 %r29410, %r29353; 2026-02-21T09:05:41.1084905Z mov.b32 %r29411, %r29353; 2026-02-21T09:05:41.1085061Z mov.b32 %r29412, %r29353; 2026-02-21T09:05:41.1085152Z mov.b32 %r29413, %r29353; 2026-02-21T09:05:41.1085247Z mov.b32 %r29414, %r29353; 2026-02-21T09:05:41.1085391Z mov.b32 %r29415, %r29353; 2026-02-21T09:05:41.1085469Z mov.b32 %r29416, %r29353; 2026-02-21T09:05:41.1085612Z mov.b32 %r29417, %r29353; 2026-02-21T09:05:41.1085722Z mov.b32 %r29418, %r29353; 2026-02-21T09:05:41.1086020Z mov.b32 %r29419, %r29353; 2026-02-21T09:05:41.1086137Z mov.b32 %r29420, %r29353; 2026-02-21T09:05:41.1086232Z mov.b32 %r29421, %r29353; 2026-02-21T09:05:41.1086346Z mov.b32 %r29422, %r29353; 
2026-02-21T09:05:41.1086625Z mov.b32 %r29423, %r29353; 2026-02-21T09:05:41.1086748Z mov.b32 %r29424, %r29353; 2026-02-21T09:05:41.1086843Z mov.b32 %r29425, %r29353; 2026-02-21T09:05:41.1086990Z mov.b32 %r29426, %r29353; 2026-02-21T09:05:41.1087085Z mov.b32 %r29427, %r29353; 2026-02-21T09:05:41.1087178Z mov.b32 %r29428, %r29353; 2026-02-21T09:05:41.1087349Z mov.b32 %r29429, %r29353; 2026-02-21T09:05:41.1087460Z mov.b32 %r29430, %r29353; 2026-02-21T09:05:41.1087556Z mov.b32 %r29431, %r29353; 2026-02-21T09:05:41.1087701Z mov.b32 %r29432, %r29353; 2026-02-21T09:05:41.1087795Z mov.b32 %r29433, %r29353; 2026-02-21T09:05:41.1087887Z mov.b32 %r29434, %r29353; 2026-02-21T09:05:41.1087963Z mov.b32 %r29435, %r29353; 2026-02-21T09:05:41.1088154Z mov.b32 %r29436, %r29353; 2026-02-21T09:05:41.1088268Z mov.b32 %r29437, %r29353; 2026-02-21T09:05:41.1088450Z mov.b32 %r29438, %r29353; 2026-02-21T09:05:41.1088596Z mov.b32 %r29439, %r29353; 2026-02-21T09:05:41.1088690Z mov.b32 %r29440, %r29353; 2026-02-21T09:05:41.1088771Z mov.b32 %r29441, %r29353; 2026-02-21T09:05:41.1088983Z mov.b32 %r29442, %r29353; 2026-02-21T09:05:41.1089077Z mov.b32 %r29443, %r29353; 2026-02-21T09:05:41.1089170Z mov.b32 %r29444, %r29353; 2026-02-21T09:05:41.1089275Z mov.b32 %r29445, %r29353; 2026-02-21T09:05:41.1089407Z mov.b32 %r29446, %r29353; 2026-02-21T09:05:41.1089483Z mov.b32 %r29447, %r29353; 2026-02-21T09:05:41.1089620Z mov.b32 %r29448, %r29353; 2026-02-21T09:05:41.1089778Z mov.b32 %r29449, %r29353; 2026-02-21T09:05:41.1089870Z mov.b32 %r29450, %r29353; 2026-02-21T09:05:41.1089964Z mov.b32 %r29451, %r29353; 2026-02-21T09:05:41.1090057Z mov.b32 %r29452, %r29353; 2026-02-21T09:05:41.1090173Z mov.b32 %r29453, %r29353; 2026-02-21T09:05:41.1090304Z mov.b32 %r29454, %r29353; 2026-02-21T09:05:41.1090435Z mov.b32 %r29455, %r29353; 2026-02-21T09:05:41.1090567Z mov.b32 %r29456, %r29353; 2026-02-21T09:05:41.1090658Z mov.b32 %r29457, %r29353; 2026-02-21T09:05:41.1090751Z mov.b32 %r29458, %r29353; 
2026-02-21T09:05:41.1090843Z mov.b32 %r29459, %r29353; 2026-02-21T09:05:41.1091016Z mov.b32 %r29460, %r29353; 2026-02-21T09:05:41.1091126Z mov.b32 %r29461, %r29353; 2026-02-21T09:05:41.1091219Z mov.b32 %r29462, %r29353; 2026-02-21T09:05:41.1091351Z mov.b32 %r29463, %r29353; 2026-02-21T09:05:41.1091445Z mov.b32 %r29464, %r29353; 2026-02-21T09:05:41.1091537Z mov.b32 %r29465, %r29353; 2026-02-21T09:05:41.1091712Z mov.b32 %r29466, %r29353; 2026-02-21T09:05:41.1091822Z mov.b32 %r29467, %r29353; 2026-02-21T09:05:41.1091919Z mov.b32 %r29468, %r29353; 2026-02-21T09:05:41.1092014Z mov.b32 %r29469, %r29353; 2026-02-21T09:05:41.1092148Z mov.b32 %r29470, %r29353; 2026-02-21T09:05:41.1092241Z mov.b32 %r29471, %r29353; 2026-02-21T09:05:41.1092332Z mov.b32 %r29472, %r29353; 2026-02-21T09:05:41.1092546Z mov.b32 %r29473, %r29353; 2026-02-21T09:05:41.1092643Z mov.b32 %r29474, %r29353; 2026-02-21T09:05:41.1092734Z mov.b32 %r29475, %r29353; 2026-02-21T09:05:41.1092829Z mov.b32 %r29476, %r29353; 2026-02-21T09:05:41.1093064Z mov.b32 %r29477, %r29353; 2026-02-21T09:05:41.1093141Z mov.b32 %r29478, %r29353; 2026-02-21T09:05:41.1093281Z mov.b32 %r29479, %r29353; 2026-02-21T09:05:41.1093430Z mov.b32 %r29480, %r29353; 2026-02-21T09:05:41.1093523Z mov.b32 %r29481, %r29353; 2026-02-21T09:05:41.1093614Z mov.b32 %r29482, %r29353; 2026-02-21T09:05:41.1093760Z mov.b32 %r29483, %r29353; 2026-02-21T09:05:41.1093837Z mov.b32 %r29484, %r29353; 2026-02-21T09:05:41.1093977Z mov.b32 %r29485, %r29353; 2026-02-21T09:05:41.1094088Z mov.b32 %r29486, %r29353; 2026-02-21T09:05:41.1094218Z mov.b32 %r29487, %r29353; 2026-02-21T09:05:41.1094313Z mov.b32 %r29488, %r29353; 2026-02-21T09:05:41.1094423Z mov.b32 %r29489, %r29353; 2026-02-21T09:05:41.1094615Z mov.b32 %r29490, %r29353; 2026-02-21T09:05:41.1094816Z mov.b32 %r29491, %r29353; 2026-02-21T09:05:41.1094942Z mov.b32 %r29492, %r29353; 2026-02-21T09:05:41.1095039Z mov.b32 %r29493, %r29353; 2026-02-21T09:05:41.1095189Z mov.b32 %r29494, %r29353; 
2026-02-21T09:05:41.1095286Z mov.b32 %r29495, %r29353; 2026-02-21T09:05:41.1095379Z mov.b32 %r29496, %r29353; 2026-02-21T09:05:41.1095555Z mov.b32 %r29497, %r29353; 2026-02-21T09:05:41.1095664Z mov.b32 %r29498, %r29353; 2026-02-21T09:05:41.1095756Z mov.b32 %r29499, %r29353; 2026-02-21T09:05:41.1095904Z mov.b32 %r29500, %r29353; 2026-02-21T09:05:41.1096003Z mov.b32 %r29501, %r29353; 2026-02-21T09:05:41.1096097Z mov.b32 %r29502, %r29353; 2026-02-21T09:05:41.1096176Z mov.b32 %r29503, %r29353; 2026-02-21T09:05:41.1096367Z mov.b32 %r29504, %r29353; 2026-02-21T09:05:41.1096630Z mov.b32 %r29505, %r29353; 2026-02-21T09:05:41.1096735Z mov.b32 %r29506, %r29353; 2026-02-21T09:05:41.1096867Z mov.b32 %r29507, %r29353; 2026-02-21T09:05:41.1096965Z mov.b32 %r29508, %r29353; 2026-02-21T09:05:41.1097042Z mov.b32 %r29509, %r29353; 2026-02-21T09:05:41.1097271Z mov.b32 %r29510, %r29353; 2026-02-21T09:05:41.1097448Z mov.b32 %r29511, %r29353; 2026-02-21T09:05:41.1097544Z mov.b32 %r29512, %r29353; 2026-02-21T09:05:41.1097640Z mov.b32 %r29513, %r29353; 2026-02-21T09:05:41.1097768Z mov.b32 %r29514, %r29353; 2026-02-21T09:05:41.1097845Z mov.b32 %r29515, %r29353; 2026-02-21T09:05:41.1097994Z mov.b32 %r29516, %r29353; 2026-02-21T09:05:41.1098155Z mov.b32 %r29517, %r29353; 2026-02-21T09:05:41.1098246Z mov.b32 %r29518, %r29353; 2026-02-21T09:05:41.1098343Z mov.b32 %r29519, %r29353; 2026-02-21T09:05:41.1098438Z mov.b32 %r29520, %r29353; 2026-02-21T09:05:41.1098558Z mov.b32 %r29521, %r29353; 2026-02-21T09:05:41.1098697Z mov.b32 %r29522, %r29353; 2026-02-21T09:05:41.1098829Z mov.b32 %r29523, %r29353; 2026-02-21T09:05:41.1098959Z mov.b32 %r29524, %r29353; 2026-02-21T09:05:41.1099054Z mov.b32 %r29525, %r29353; 2026-02-21T09:05:41.1099151Z mov.b32 %r29526, %r29353; 2026-02-21T09:05:41.1099245Z mov.b32 %r29527, %r29353; 2026-02-21T09:05:41.1099436Z mov.b32 %r29528, %r29353; 2026-02-21T09:05:41.1099549Z mov.b32 %r29529, %r29353; 2026-02-21T09:05:41.1099639Z mov.b32 %r29530, %r29353; 
2026-02-21T09:05:41.1099771Z mov.b32 %r29531, %r29353; 2026-02-21T09:05:41.1099865Z mov.b32 %r29532, %r29353; 2026-02-21T09:05:41.1099962Z mov.b32 %r29533, %r29353; 2026-02-21T09:05:41.1100149Z mov.b32 %r29534, %r29353; 2026-02-21T09:05:41.1100274Z mov.b32 %r29535, %r29353; 2026-02-21T09:05:41.1100370Z mov.b32 %r29536, %r29353; 2026-02-21T09:05:41.1100465Z mov.b32 %r29537, %r29353; 2026-02-21T09:05:41.1100597Z mov.b32 %r29538, %r29353; 2026-02-21T09:05:41.1100690Z mov.b32 %r29539, %r29353; 2026-02-21T09:05:41.1100781Z mov.b32 %r29540, %r29353; 2026-02-21T09:05:41.1100978Z mov.b32 %r29541, %r29353; 2026-02-21T09:05:41.1101073Z mov.b32 %r29542, %r29353; 2026-02-21T09:05:41.1101171Z mov.b32 %r29543, %r29353; 2026-02-21T09:05:41.1101269Z mov.b32 %r29544, %r29353; 2026-02-21T09:05:41.1101415Z mov.b32 %r29545, %r29353; 2026-02-21T09:05:41.1101492Z mov.b32 %r29546, %r29353; 2026-02-21T09:05:41.1101628Z mov.b32 %r29547, %r29353; 2026-02-21T09:05:41.1101776Z mov.b32 %r29548, %r29353; 2026-02-21T09:05:41.1101945Z mov.b32 %r29549, %r29353; 2026-02-21T09:05:41.1102050Z mov.b32 %r29550, %r29353; 2026-02-21T09:05:41.1102202Z mov.b32 %r29551, %r29353; 2026-02-21T09:05:41.1102284Z mov.b32 %r29552, %r29353; 2026-02-21T09:05:41.1102425Z mov.b32 %r29553, %r29353; 2026-02-21T09:05:41.1102537Z mov.b32 %r29554, %r29353; 2026-02-21T09:05:41.1102664Z mov.b32 %r29555, %r29353; 2026-02-21T09:05:41.1102756Z mov.b32 %r29556, %r29353; 2026-02-21T09:05:41.1102867Z mov.b32 %r29557, %r29353; 2026-02-21T09:05:41.1102985Z mov.b32 %r29558, %r29353; 2026-02-21T09:05:41.1103127Z mov.b32 %r29559, %r29353; 2026-02-21T09:05:41.1103242Z mov.b32 %r29560, %r29353; 2026-02-21T09:05:41.1103339Z mov.b32 %r29561, %r29353; 2026-02-21T09:05:41.1103563Z mov.b32 %r29562, %r29353; 2026-02-21T09:05:41.1103720Z mov.b32 %r29563, %r29353; 2026-02-21T09:05:41.1103816Z mov.b32 %r29564, %r29353; 2026-02-21T09:05:41.1103982Z mov.b32 %r29565, %r29353; 2026-02-21T09:05:41.1104095Z mov.b32 %r29566, %r29353; 
2026-02-21T09:05:41.1104195Z mov.b32 %r29567, %r29353; 2026-02-21T09:05:41.1104342Z mov.b32 %r29568, %r29353; 2026-02-21T09:05:41.1104438Z mov.b32 %r29569, %r29353; 2026-02-21T09:05:41.1104531Z mov.b32 %r29570, %r29353; 2026-02-21T09:05:41.1104610Z mov.b32 %r29571, %r29353; 2026-02-21T09:05:41.1104803Z mov.b32 %r29572, %r29353; 2026-02-21T09:05:41.1104914Z mov.b32 %r29573, %r29353; 2026-02-21T09:05:41.1105006Z mov.b32 %r29574, %r29353; 2026-02-21T09:05:41.1105136Z mov.b32 %r29575, %r29353; 2026-02-21T09:05:41.1105227Z mov.b32 %r29576, %r29353; 2026-02-21T09:05:41.1105302Z mov.b32 %r29577, %r29353; 2026-02-21T09:05:41.1105437Z mov.b32 %r29578, %r29353; 2026-02-21T09:05:41.1105602Z mov.b32 %r29579, %r29353; 2026-02-21T09:05:41.1105696Z mov.b32 %r29580, %r29353; 2026-02-21T09:05:41.1105864Z mov.b32 %r29581, %r29353; 2026-02-21T09:05:41.1106002Z mov.b32 %r29582, %r29353; 2026-02-21T09:05:41.1106080Z mov.b32 %r29583, %r29353; 2026-02-21T09:05:41.1106218Z mov.b32 %r29584, %r29353; 2026-02-21T09:05:41.1106384Z mov.b32 %r29585, %r29353; 2026-02-21T09:05:41.1106612Z mov.b32 %r29586, %r29353; 2026-02-21T09:05:41.1106716Z mov.b32 %r29587, %r29353; 2026-02-21T09:05:41.1106809Z mov.b32 %r29588, %r29353; 2026-02-21T09:05:41.1106921Z mov.b32 %r29589, %r29353; 2026-02-21T09:05:41.1107076Z mov.b32 %r29590, %r29353; 2026-02-21T09:05:41.1107208Z mov.b32 %r29591, %r29353; 2026-02-21T09:05:41.1107341Z mov.b32 %r29592, %r29353; 2026-02-21T09:05:41.1107436Z mov.b32 %r29593, %r29353; 2026-02-21T09:05:41.1107527Z mov.b32 %r29594, %r29353; 2026-02-21T09:05:41.1107619Z mov.b32 %r29595, %r29353; 2026-02-21T09:05:41.1107792Z mov.b32 %r29596, %r29353; 2026-02-21T09:05:41.1109415Z mov.b32 %r29597, %r29353; 2026-02-21T09:05:41.1109535Z mov.b32 %r29598, %r29353; 2026-02-21T09:05:41.1109668Z mov.b32 %r29599, %r29353; 2026-02-21T09:05:41.1109769Z mov.b32 %r29600, %r29353; 2026-02-21T09:05:41.1109861Z mov.b32 %r29601, %r29353; 2026-02-21T09:05:41.1110038Z mov.b32 %r29602, %r29353; 
2026-02-21T09:05:41.1110160Z mov.b32 %r29603, %r29353; 2026-02-21T09:05:41.1110255Z mov.b32 %r29604, %r29353; 2026-02-21T09:05:41.1110353Z mov.b32 %r29605, %r29353; 2026-02-21T09:05:41.1110485Z mov.b32 %r29606, %r29353; 2026-02-21T09:05:41.1110581Z mov.b32 %r29607, %r29353; 2026-02-21T09:05:41.1110692Z mov.b32 %r29608, %r29353; 2026-02-21T09:05:41.1116819Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T09:05:41.1116991Z // => This Inner Loop Header: Depth=2 2026-02-21T09:05:41.1117229Z .loc 1 55 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:32 2026-02-21T09:05:41.1117310Z add.s64 %rd127, %rd301, %rd20; 2026-02-21T09:05:41.1117397Z add.s64 %rd128, %rd301, %rd19; 2026-02-21T09:05:41.1117466Z add.s64 %rd129, %rd301, %rd18; 2026-02-21T09:05:41.1117698Z .loc 1 55 80 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:80 2026-02-21T09:05:41.1117935Z add.s64 %rd130, %rd301, %rd17; 2026-02-21T09:05:41.1118002Z // begin inline asm 2026-02-21T09:05:41.1118066Z mov.u32 %r8983, 0x0; 2026-02-21T09:05:41.1118131Z mov.u32 %r8984, 0x0; 2026-02-21T09:05:41.1118189Z mov.u32 %r8985, 0x0; 2026-02-21T09:05:41.1118247Z mov.u32 %r8986, 0x0; 2026-02-21T09:05:41.1118387Z ld.global.v4.b32 { %r8983, %r8984, %r8985, %r8986 }, [ %rd127 + 0 ]; 2026-02-21T09:05:41.1118454Z // end inline asm 2026-02-21T09:05:41.1118516Z // begin inline asm 2026-02-21T09:05:41.1118576Z mov.u32 %r8987, 0x0; 2026-02-21T09:05:41.1118641Z mov.u32 %r8988, 0x0; 2026-02-21T09:05:41.1118700Z mov.u32 %r8989, 0x0; 2026-02-21T09:05:41.1118758Z mov.u32 %r8990, 0x0; 2026-02-21T09:05:41.1118887Z ld.global.v4.b32 { %r8987, %r8988, %r8989, %r8990 }, [ %rd128 + 0 ]; 2026-02-21T09:05:41.1119097Z // end inline asm 2026-02-21T09:05:41.1119165Z // begin inline asm 2026-02-21T09:05:41.1119226Z mov.u32 %r8991, 0x0; 2026-02-21T09:05:41.1119288Z mov.u32 %r8992, 0x0; 2026-02-21T09:05:41.1119349Z mov.u32 %r8993, 0x0; 2026-02-21T09:05:41.1119409Z mov.u32 %r8994, 0x0; 2026-02-21T09:05:41.1119550Z 
ld.global.v4.b32 { %r8991, %r8992, %r8993, %r8994 }, [ %rd129 + 0 ]; 2026-02-21T09:05:41.1119618Z // end inline asm 2026-02-21T09:05:41.1119679Z // begin inline asm 2026-02-21T09:05:41.1119739Z mov.u32 %r8995, 0x0; 2026-02-21T09:05:41.1119806Z mov.u32 %r8996, 0x0; 2026-02-21T09:05:41.1119866Z mov.u32 %r8997, 0x0; 2026-02-21T09:05:41.1119924Z mov.u32 %r8998, 0x0; 2026-02-21T09:05:41.1120051Z ld.global.v4.b32 { %r8995, %r8996, %r8997, %r8998 }, [ %rd130 + 0 ]; 2026-02-21T09:05:41.1120109Z // end inline asm 2026-02-21T09:05:41.1120325Z .loc 1 59 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:59:32 2026-02-21T09:05:41.1120386Z bar.sync 0; 2026-02-21T09:05:41.1120556Z st.shared.v2.b32 [%r10], {%r8983, %r8984}; 2026-02-21T09:05:41.1120651Z st.shared.v2.b32 [%r10+2048], {%r8987, %r8988}; 2026-02-21T09:05:41.1120737Z st.shared.v2.b32 [%r10+4096], {%r8991, %r8992}; 2026-02-21T09:05:41.1120824Z st.shared.v2.b32 [%r10+6144], {%r8995, %r8996}; 2026-02-21T09:05:41.1120905Z st.shared.v2.b32 [%r11], {%r8985, %r8986}; 2026-02-21T09:05:41.1120993Z st.shared.v2.b32 [%r11+2048], {%r8989, %r8990}; 2026-02-21T09:05:41.1121078Z st.shared.v2.b32 [%r11+4096], {%r8993, %r8994}; 2026-02-21T09:05:41.1121165Z st.shared.v2.b32 [%r11+6144], {%r8997, %r8998}; 2026-02-21T09:05:41.1121224Z bar.sync 0; 2026-02-21T09:05:41.1121297Z ld.shared.b16 %rs321, [%r12]; 2026-02-21T09:05:41.1121372Z ld.shared.b16 %rs322, [%r12+256]; 2026-02-21T09:05:41.1121443Z ld.shared.b16 %rs323, [%r12+16]; 2026-02-21T09:05:41.1121512Z ld.shared.b16 %rs324, [%r12+272]; 2026-02-21T09:05:41.1121583Z ld.shared.b16 %rs325, [%r12+2048]; 2026-02-21T09:05:41.1121653Z ld.shared.b16 %rs326, [%r12+2304]; 2026-02-21T09:05:41.1121722Z ld.shared.b16 %rs327, [%r12+2064]; 2026-02-21T09:05:41.1121786Z ld.shared.b16 %rs328, [%r12+2320]; 2026-02-21T09:05:41.1121855Z ld.shared.b16 %rs329, [%r12+4096]; 2026-02-21T09:05:41.1121921Z ld.shared.b16 %rs330, [%r12+4352]; 2026-02-21T09:05:41.1121985Z ld.shared.b16 %rs331, [%r12+4112]; 
2026-02-21T09:05:41.1122053Z ld.shared.b16 %rs332, [%r12+4368]; 2026-02-21T09:05:41.1122115Z ld.shared.b16 %rs333, [%r12+6144]; 2026-02-21T09:05:41.1122179Z ld.shared.b16 %rs334, [%r12+6400]; 2026-02-21T09:05:41.1122245Z ld.shared.b16 %rs335, [%r12+6160]; 2026-02-21T09:05:41.1122313Z ld.shared.b16 %rs336, [%r12+6416]; 2026-02-21T09:05:41.1122382Z ld.shared.b16 %rs337, [%r13]; 2026-02-21T09:05:41.1122447Z ld.shared.b16 %rs338, [%r13+256]; 2026-02-21T09:05:41.1122516Z ld.shared.b16 %rs339, [%r13+16]; 2026-02-21T09:05:41.1122583Z ld.shared.b16 %rs340, [%r13+272]; 2026-02-21T09:05:41.1122650Z ld.shared.b16 %rs341, [%r13+2048]; 2026-02-21T09:05:41.1122726Z ld.shared.b16 %rs342, [%r13+2304]; 2026-02-21T09:05:41.1122790Z ld.shared.b16 %rs343, [%r13+2064]; 2026-02-21T09:05:41.1122855Z ld.shared.b16 %rs344, [%r13+2320]; 2026-02-21T09:05:41.1122980Z ld.shared.b16 %rs345, [%r13+4096]; 2026-02-21T09:05:41.1123064Z ld.shared.b16 %rs346, [%r13+4352]; 2026-02-21T09:05:41.1123130Z ld.shared.b16 %rs347, [%r13+4112]; 2026-02-21T09:05:41.1123198Z ld.shared.b16 %rs348, [%r13+4368]; 2026-02-21T09:05:41.1123265Z ld.shared.b16 %rs349, [%r13+6144]; 2026-02-21T09:05:41.1123330Z ld.shared.b16 %rs350, [%r13+6400]; 2026-02-21T09:05:41.1123395Z ld.shared.b16 %rs351, [%r13+6160]; 2026-02-21T09:05:41.1123459Z ld.shared.b16 %rs352, [%r13+6416]; 2026-02-21T09:05:41.1123534Z cvt.f32.bf16 %r9129, %rs321; 2026-02-21T09:05:41.1123601Z cvt.f32.bf16 %r9130, %rs322; 2026-02-21T09:05:41.1123664Z cvt.f32.bf16 %r9131, %rs337; 2026-02-21T09:05:41.1123732Z cvt.f32.bf16 %r9132, %rs338; 2026-02-21T09:05:41.1123794Z cvt.f32.bf16 %r9261, %rs323; 2026-02-21T09:05:41.1123955Z cvt.f32.bf16 %r9262, %rs324; 2026-02-21T09:05:41.1124022Z cvt.f32.bf16 %r9263, %rs339; 2026-02-21T09:05:41.1124094Z cvt.f32.bf16 %r9264, %rs340; 2026-02-21T09:05:41.1124156Z cvt.f32.bf16 %r9393, %rs325; 2026-02-21T09:05:41.1124219Z cvt.f32.bf16 %r9394, %rs326; 2026-02-21T09:05:41.1124287Z cvt.f32.bf16 %r9395, %rs341; 
2026-02-21T09:05:41.1124351Z cvt.f32.bf16 %r9396, %rs342; 2026-02-21T09:05:41.1124415Z cvt.f32.bf16 %r9525, %rs327; 2026-02-21T09:05:41.1124481Z cvt.f32.bf16 %r9526, %rs328; 2026-02-21T09:05:41.1124544Z cvt.f32.bf16 %r9527, %rs343; 2026-02-21T09:05:41.1124605Z cvt.f32.bf16 %r9528, %rs344; 2026-02-21T09:05:41.1124667Z cvt.f32.bf16 %r9657, %rs329; 2026-02-21T09:05:41.1124738Z cvt.f32.bf16 %r9658, %rs330; 2026-02-21T09:05:41.1124800Z cvt.f32.bf16 %r9659, %rs345; 2026-02-21T09:05:41.1124863Z cvt.f32.bf16 %r9660, %rs346; 2026-02-21T09:05:41.1124933Z cvt.f32.bf16 %r9789, %rs331; 2026-02-21T09:05:41.1124996Z cvt.f32.bf16 %r9790, %rs332; 2026-02-21T09:05:41.1125061Z cvt.f32.bf16 %r9791, %rs347; 2026-02-21T09:05:41.1125171Z cvt.f32.bf16 %r9792, %rs348; 2026-02-21T09:05:41.1125244Z cvt.f32.bf16 %r9921, %rs333; 2026-02-21T09:05:41.1125308Z cvt.f32.bf16 %r9922, %rs334; 2026-02-21T09:05:41.1125373Z cvt.f32.bf16 %r9923, %rs349; 2026-02-21T09:05:41.1125440Z cvt.f32.bf16 %r9924, %rs350; 2026-02-21T09:05:41.1125506Z cvt.f32.bf16 %r10053, %rs335; 2026-02-21T09:05:41.1125566Z cvt.f32.bf16 %r10054, %rs336; 2026-02-21T09:05:41.1125628Z cvt.f32.bf16 %r10055, %rs351; 2026-02-21T09:05:41.1125696Z cvt.f32.bf16 %r10056, %rs352; 2026-02-21T09:05:41.1125922Z .loc 1 61 34 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:34 2026-02-21T09:05:41.1125991Z cvt.s64.s32 %rd179, %r29352; 2026-02-21T09:05:41.1126065Z add.s64 %rd131, %rd45, %rd179; 2026-02-21T09:05:41.1126270Z .loc 1 61 87 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:87 2026-02-21T09:05:41.1126336Z // begin inline asm 2026-02-21T09:05:41.1126404Z mov.u32 %r8999, 0x0; 2026-02-21T09:05:41.1126594Z mov.u32 %r9000, 0x0; 2026-02-21T09:05:41.1126707Z ld.global.v2.b32 { %r8999, %r9000 }, [ %rd131 + 0 ]; 2026-02-21T09:05:41.1126774Z // end inline asm 2026-02-21T09:05:41.1126981Z .loc 1 69 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:69:28 2026-02-21T09:05:41.1127039Z bar.sync 0; 
2026-02-21T09:05:41.1127109Z st.shared.b8 [%r14], %r8999; 2026-02-21T09:05:41.1127190Z prmt.b32 %r15351, %r8999, 0, 0x7771U; 2026-02-21T09:05:41.1127259Z st.shared.b8 [%r15], %r15351; 2026-02-21T09:05:41.1127326Z prmt.b32 %r15352, %r8999, 0, 0x7772U; 2026-02-21T09:05:41.1127401Z st.shared.b8 [%r16+256], %r15352; 2026-02-21T09:05:41.1127468Z prmt.b32 %r15353, %r8999, 0, 0x7773U; 2026-02-21T09:05:41.1127534Z st.shared.b8 [%r17+256], %r15353; 2026-02-21T09:05:41.1127599Z st.shared.b8 [%r18+512], %r9000; 2026-02-21T09:05:41.1127683Z prmt.b32 %r15354, %r9000, 0, 0x7771U; 2026-02-21T09:05:41.1127754Z st.shared.b8 [%r19+512], %r15354; 2026-02-21T09:05:41.1127825Z prmt.b32 %r15355, %r9000, 0, 0x7772U; 2026-02-21T09:05:41.1127892Z st.shared.b8 [%r20+768], %r15355; 2026-02-21T09:05:41.1127956Z prmt.b32 %r15356, %r9000, 0, 0x7773U; 2026-02-21T09:05:41.1128108Z st.shared.b8 [%r21+768], %r15356; 2026-02-21T09:05:41.1128166Z bar.sync 0; 2026-02-21T09:05:41.1128240Z ld.shared.b32 %r15357, [%r22]; 2026-02-21T09:05:41.1128314Z prmt.b32 %r15358, %r15357, 0, 0x7770U; 2026-02-21T09:05:41.1128378Z cvt.u16.u32 %rs353, %r15358; 2026-02-21T09:05:41.1128450Z prmt.b32 %r15359, %r15357, 0, 0x7771U; 2026-02-21T09:05:41.1128513Z cvt.u16.u32 %rs354, %r15359; 2026-02-21T09:05:41.1128580Z prmt.b32 %r15360, %r15357, 0, 0x7772U; 2026-02-21T09:05:41.1128641Z cvt.u16.u32 %rs355, %r15360; 2026-02-21T09:05:41.1128719Z prmt.b32 %r15361, %r15357, 0, 0x7773U; 2026-02-21T09:05:41.1128781Z cvt.u16.u32 %rs356, %r15361; 2026-02-21T09:05:41.1128848Z ld.shared.b32 %r15362, [%r23]; 2026-02-21T09:05:41.1129049Z prmt.b32 %r15363, %r15362, 0, 0x7770U; 2026-02-21T09:05:41.1129120Z cvt.u16.u32 %rs357, %r15363; 2026-02-21T09:05:41.1129187Z prmt.b32 %r15364, %r15362, 0, 0x7771U; 2026-02-21T09:05:41.1129257Z cvt.u16.u32 %rs358, %r15364; 2026-02-21T09:05:41.1129325Z prmt.b32 %r15365, %r15362, 0, 0x7772U; 2026-02-21T09:05:41.1129387Z cvt.u16.u32 %rs359, %r15365; 2026-02-21T09:05:41.1129452Z prmt.b32 %r15366, %r15362, 0, 
0x7773U; 2026-02-21T09:05:41.1129520Z cvt.u16.u32 %rs360, %r15366; 2026-02-21T09:05:41.1129733Z .loc 1 64 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:64:28 2026-02-21T09:05:41.1129802Z shl.b16 %rs361, %rs353, 4; 2026-02-21T09:05:41.1129873Z shl.b16 %rs362, %rs357, 4; 2026-02-21T09:05:41.1129936Z shl.b16 %rs363, %rs354, 4; 2026-02-21T09:05:41.1129996Z shl.b16 %rs364, %rs358, 4; 2026-02-21T09:05:41.1130059Z shl.b16 %rs365, %rs355, 4; 2026-02-21T09:05:41.1130130Z shl.b16 %rs366, %rs359, 4; 2026-02-21T09:05:41.1130193Z shl.b16 %rs367, %rs356, 4; 2026-02-21T09:05:41.1130323Z shl.b16 %rs368, %rs360, 4; 2026-02-21T09:05:41.1130537Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1130604Z cvt.s16.s8 %rs369, %rs361; 2026-02-21T09:05:41.1130665Z shr.s16 %rs370, %rs369, 4; 2026-02-21T09:05:41.1130733Z cvt.s16.s8 %rs371, %rs362; 2026-02-21T09:05:41.1130795Z shr.s16 %rs372, %rs371, 4; 2026-02-21T09:05:41.1130865Z prmt.b32 %r15367, %r15357, 0, 0x8880U; 2026-02-21T09:05:41.1130931Z cvt.u16.u32 %rs373, %r15367; 2026-02-21T09:05:41.1130999Z shr.s16 %rs374, %rs373, 4; 2026-02-21T09:05:41.1131067Z prmt.b32 %r15368, %r15362, 0, 0x8880U; 2026-02-21T09:05:41.1131131Z cvt.u16.u32 %rs375, %r15368; 2026-02-21T09:05:41.1131200Z shr.s16 %rs376, %rs375, 4; 2026-02-21T09:05:41.1131403Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1131473Z cvt.rn.f32.s16 %r15369, %rs376; 2026-02-21T09:05:41.1131542Z cvt.rn.f32.s16 %r15370, %rs374; 2026-02-21T09:05:41.1131613Z cvt.rn.f32.s16 %r15371, %rs372; 2026-02-21T09:05:41.1131677Z cvt.rn.f32.s16 %r15372, %rs370; 2026-02-21T09:05:41.1131888Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1131963Z cvt.s16.s8 %rs377, %rs363; 2026-02-21T09:05:41.1132025Z shr.s16 %rs378, %rs377, 4; 2026-02-21T09:05:41.1132088Z cvt.s16.s8 %rs379, %rs364; 2026-02-21T09:05:41.1132156Z shr.s16 %rs380, %rs379, 4; 
2026-02-21T09:05:41.1132232Z prmt.b32 %r15373, %r15357, 0, 0x9991U; 2026-02-21T09:05:41.1132298Z cvt.u16.u32 %rs381, %r15373; 2026-02-21T09:05:41.1132361Z shr.s16 %rs382, %rs381, 4; 2026-02-21T09:05:41.1132436Z prmt.b32 %r15374, %r15362, 0, 0x9991U; 2026-02-21T09:05:41.1132498Z cvt.u16.u32 %rs383, %r15374; 2026-02-21T09:05:41.1132561Z shr.s16 %rs384, %rs383, 4; 2026-02-21T09:05:41.1132766Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1132836Z cvt.rn.f32.s16 %r15375, %rs384; 2026-02-21T09:05:41.1132904Z cvt.rn.f32.s16 %r15376, %rs382; 2026-02-21T09:05:41.1132969Z cvt.rn.f32.s16 %r15377, %rs380; 2026-02-21T09:05:41.1133105Z cvt.rn.f32.s16 %r15378, %rs378; 2026-02-21T09:05:41.1133316Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1133383Z cvt.s16.s8 %rs385, %rs365; 2026-02-21T09:05:41.1133454Z shr.s16 %rs386, %rs385, 4; 2026-02-21T09:05:41.1133517Z cvt.s16.s8 %rs387, %rs366; 2026-02-21T09:05:41.1133581Z shr.s16 %rs388, %rs387, 4; 2026-02-21T09:05:41.1133659Z prmt.b32 %r15379, %r15357, 0, 0xaaa2U; 2026-02-21T09:05:41.1133726Z cvt.u16.u32 %rs389, %r15379; 2026-02-21T09:05:41.1133792Z shr.s16 %rs390, %rs389, 4; 2026-02-21T09:05:41.1133867Z prmt.b32 %r15380, %r15362, 0, 0xaaa2U; 2026-02-21T09:05:41.1133941Z cvt.u16.u32 %rs391, %r15380; 2026-02-21T09:05:41.1134003Z shr.s16 %rs392, %rs391, 4; 2026-02-21T09:05:41.1134336Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1134413Z cvt.rn.f32.s16 %r15381, %rs392; 2026-02-21T09:05:41.1134491Z cvt.rn.f32.s16 %r15382, %rs390; 2026-02-21T09:05:41.1134561Z cvt.rn.f32.s16 %r15383, %rs388; 2026-02-21T09:05:41.1134631Z cvt.rn.f32.s16 %r15384, %rs386; 2026-02-21T09:05:41.1134837Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1134903Z cvt.s16.s8 %rs393, %rs367; 2026-02-21T09:05:41.1134967Z shr.s16 %rs394, %rs393, 4; 
2026-02-21T09:05:41.1135036Z cvt.s16.s8 %rs395, %rs368; 2026-02-21T09:05:41.1135100Z shr.s16 %rs396, %rs395, 4; 2026-02-21T09:05:41.1135171Z prmt.b32 %r15385, %r15357, 0, 0xbbb3U; 2026-02-21T09:05:41.1135241Z cvt.u16.u32 %rs397, %r15385; 2026-02-21T09:05:41.1135307Z shr.s16 %rs398, %rs397, 4; 2026-02-21T09:05:41.1135374Z prmt.b32 %r15386, %r15362, 0, 0xbbb3U; 2026-02-21T09:05:41.1135442Z cvt.u16.u32 %rs399, %r15386; 2026-02-21T09:05:41.1135565Z shr.s16 %rs400, %rs399, 4; 2026-02-21T09:05:41.1135768Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1135835Z cvt.rn.f32.s16 %r15387, %rs400; 2026-02-21T09:05:41.1135909Z cvt.rn.f32.s16 %r15388, %rs398; 2026-02-21T09:05:41.1135974Z cvt.rn.f32.s16 %r15389, %rs396; 2026-02-21T09:05:41.1136038Z cvt.rn.f32.s16 %r15390, %rs394; 2026-02-21T09:05:41.1136101Z bar.sync 0; 2026-02-21T09:05:41.1136223Z st.shared.v4.b32 [%r24], {%r15372, %r15370, %r15371, %r15369}; 2026-02-21T09:05:41.1136340Z st.shared.v4.b32 [%r25], {%r15378, %r15376, %r15377, %r15375}; 2026-02-21T09:05:41.1136566Z st.shared.v4.b32 [%r26], {%r15384, %r15382, %r15383, %r15381}; 2026-02-21T09:05:41.1136695Z st.shared.v4.b32 [%r27], {%r15390, %r15388, %r15389, %r15387}; 2026-02-21T09:05:41.1136753Z $L__tmp9: 2026-02-21T09:05:41.1137038Z .loc 2 291 36 // standard.py:291:36 @[ c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:91:40 ] 2026-02-21T09:05:41.1137110Z // begin inline asm 2026-02-21T09:05:41.1137198Z fence.proxy.async.shared::cta; 2026-02-21T09:05:41.1137257Z // end inline asm 2026-02-21T09:05:41.1137323Z bar.sync 0; 2026-02-21T09:05:41.1137399Z wgmma.fence.sync.aligned; 2026-02-21T09:05:41.1137467Z mov.pred %p56, -1; 2026-02-21T09:05:41.1137528Z // begin inline asm 2026-02-21T09:05:41.1139019Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29353,%r29354,%r29355,%r29356,%r29357,%r29358,%r29359,%r29360,%r29361,%r29362,%r29363,%r29364,%r29365,%r29366,%r29367,%r29368,%r29369,%r29370,%r29371,%r29372,%r29373,%r29374,%r29375,%r29376,%r29377,%r29378,%r29379,%r29380,%r29381,%r29382,%r29383,%r29384,%r29385,%r29386,%r29387,%r29388,%r29389,%r29390,%r29391,%r29392,%r29393,%r29394,%r29395,%r29396,%r29397,%r29398,%r29399,%r29400,%r29401,%r29402,%r29403,%r29404,%r29405,%r29406,%r29407,%r29408,%r29409,%r29410,%r29411,%r29412,%r29413,%r29414,%r29415,%r29416}, {%r9129,%r9130,%r9131,%r9132}, %rd2, %p56, 1, 1; 2026-02-21T09:05:41.1139086Z // end inline asm 2026-02-21T09:05:41.1139154Z // begin inline asm 2026-02-21T09:05:41.1140650Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29353,%r29354,%r29355,%r29356,%r29357,%r29358,%r29359,%r29360,%r29361,%r29362,%r29363,%r29364,%r29365,%r29366,%r29367,%r29368,%r29369,%r29370,%r29371,%r29372,%r29373,%r29374,%r29375,%r29376,%r29377,%r29378,%r29379,%r29380,%r29381,%r29382,%r29383,%r29384,%r29385,%r29386,%r29387,%r29388,%r29389,%r29390,%r29391,%r29392,%r29393,%r29394,%r29395,%r29396,%r29397,%r29398,%r29399,%r29400,%r29401,%r29402,%r29403,%r29404,%r29405,%r29406,%r29407,%r29408,%r29409,%r29410,%r29411,%r29412,%r29413,%r29414,%r29415,%r29416}, {%r9261,%r9262,%r9263,%r9264}, %rd3, %p56, 1, 1; 2026-02-21T09:05:41.1140796Z // end inline asm 2026-02-21T09:05:41.1140864Z // begin inline asm 2026-02-21T09:05:41.1142412Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29417,%r29418,%r29419,%r29420,%r29421,%r29422,%r29423,%r29424,%r29425,%r29426,%r29427,%r29428,%r29429,%r29430,%r29431,%r29432,%r29433,%r29434,%r29435,%r29436,%r29437,%r29438,%r29439,%r29440,%r29441,%r29442,%r29443,%r29444,%r29445,%r29446,%r29447,%r29448,%r29449,%r29450,%r29451,%r29452,%r29453,%r29454,%r29455,%r29456,%r29457,%r29458,%r29459,%r29460,%r29461,%r29462,%r29463,%r29464,%r29465,%r29466,%r29467,%r29468,%r29469,%r29470,%r29471,%r29472,%r29473,%r29474,%r29475,%r29476,%r29477,%r29478,%r29479,%r29480}, 
{%r9393,%r9394,%r9395,%r9396}, %rd2, %p56, 1, 1; 2026-02-21T09:05:41.1142538Z // end inline asm 2026-02-21T09:05:41.1142601Z // begin inline asm 2026-02-21T09:05:41.1144140Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29417,%r29418,%r29419,%r29420,%r29421,%r29422,%r29423,%r29424,%r29425,%r29426,%r29427,%r29428,%r29429,%r29430,%r29431,%r29432,%r29433,%r29434,%r29435,%r29436,%r29437,%r29438,%r29439,%r29440,%r29441,%r29442,%r29443,%r29444,%r29445,%r29446,%r29447,%r29448,%r29449,%r29450,%r29451,%r29452,%r29453,%r29454,%r29455,%r29456,%r29457,%r29458,%r29459,%r29460,%r29461,%r29462,%r29463,%r29464,%r29465,%r29466,%r29467,%r29468,%r29469,%r29470,%r29471,%r29472,%r29473,%r29474,%r29475,%r29476,%r29477,%r29478,%r29479,%r29480}, {%r9525,%r9526,%r9527,%r9528}, %rd3, %p56, 1, 1; 2026-02-21T09:05:41.1144212Z // end inline asm 2026-02-21T09:05:41.1144274Z // begin inline asm 2026-02-21T09:05:41.1145757Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29481,%r29482,%r29483,%r29484,%r29485,%r29486,%r29487,%r29488,%r29489,%r29490,%r29491,%r29492,%r29493,%r29494,%r29495,%r29496,%r29497,%r29498,%r29499,%r29500,%r29501,%r29502,%r29503,%r29504,%r29505,%r29506,%r29507,%r29508,%r29509,%r29510,%r29511,%r29512,%r29513,%r29514,%r29515,%r29516,%r29517,%r29518,%r29519,%r29520,%r29521,%r29522,%r29523,%r29524,%r29525,%r29526,%r29527,%r29528,%r29529,%r29530,%r29531,%r29532,%r29533,%r29534,%r29535,%r29536,%r29537,%r29538,%r29539,%r29540,%r29541,%r29542,%r29543,%r29544}, {%r9657,%r9658,%r9659,%r9660}, %rd2, %p56, 1, 1; 2026-02-21T09:05:41.1145831Z // end inline asm 2026-02-21T09:05:41.1145894Z // begin inline asm 2026-02-21T09:05:41.1147502Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29481,%r29482,%r29483,%r29484,%r29485,%r29486,%r29487,%r29488,%r29489,%r29490,%r29491,%r29492,%r29493,%r29494,%r29495,%r29496,%r29497,%r29498,%r29499,%r29500,%r29501,%r29502,%r29503,%r29504,%r29505,%r29506,%r29507,%r29508,%r29509,%r29510,%r29511,%r29512,%r29513,%r29514,%r29515,%r29516,%r29517,%r29518,%r29519,%r29520,%r29521,%r29522,%r29523,%r29524,%r29525,%r29526,%r29527,%r29528,%r29529,%r29530,%r29531,%r29532,%r29533,%r29534,%r29535,%r29536,%r29537,%r29538,%r29539,%r29540,%r29541,%r29542,%r29543,%r29544}, {%r9789,%r9790,%r9791,%r9792}, %rd3, %p56, 1, 1; 2026-02-21T09:05:41.1147569Z // end inline asm 2026-02-21T09:05:41.1147630Z // begin inline asm 2026-02-21T09:05:41.1149197Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29545,%r29546,%r29547,%r29548,%r29549,%r29550,%r29551,%r29552,%r29553,%r29554,%r29555,%r29556,%r29557,%r29558,%r29559,%r29560,%r29561,%r29562,%r29563,%r29564,%r29565,%r29566,%r29567,%r29568,%r29569,%r29570,%r29571,%r29572,%r29573,%r29574,%r29575,%r29576,%r29577,%r29578,%r29579,%r29580,%r29581,%r29582,%r29583,%r29584,%r29585,%r29586,%r29587,%r29588,%r29589,%r29590,%r29591,%r29592,%r29593,%r29594,%r29595,%r29596,%r29597,%r29598,%r29599,%r29600,%r29601,%r29602,%r29603,%r29604,%r29605,%r29606,%r29607,%r29608}, {%r9921,%r9922,%r9923,%r9924}, %rd2, %p56, 1, 1; 2026-02-21T09:05:41.1149341Z // end inline asm 2026-02-21T09:05:41.1149405Z // begin inline asm 2026-02-21T09:05:41.1150960Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29545,%r29546,%r29547,%r29548,%r29549,%r29550,%r29551,%r29552,%r29553,%r29554,%r29555,%r29556,%r29557,%r29558,%r29559,%r29560,%r29561,%r29562,%r29563,%r29564,%r29565,%r29566,%r29567,%r29568,%r29569,%r29570,%r29571,%r29572,%r29573,%r29574,%r29575,%r29576,%r29577,%r29578,%r29579,%r29580,%r29581,%r29582,%r29583,%r29584,%r29585,%r29586,%r29587,%r29588,%r29589,%r29590,%r29591,%r29592,%r29593,%r29594,%r29595,%r29596,%r29597,%r29598,%r29599,%r29600,%r29601,%r29602,%r29603,%r29604,%r29605,%r29606,%r29607,%r29608}, 
{%r10053,%r10054,%r10055,%r10056}, %rd3, %p56, 1, 1; 2026-02-21T09:05:41.1151079Z // end inline asm 2026-02-21T09:05:41.1151166Z wgmma.commit_group.sync.aligned; 2026-02-21T09:05:41.1151230Z mov.b32 %r15091, 0; 2026-02-21T09:05:41.1151294Z mov.b32 %r10313, %r24001; 2026-02-21T09:05:41.1151354Z mov.b32 %r10314, %r15091; 2026-02-21T09:05:41.1151433Z mov.b32 %r10315, %r15091; 2026-02-21T09:05:41.1151497Z // begin inline asm 2026-02-21T09:05:41.1156675Z // wait for regs: %r29353,%r29354,%r29355,%r29356,%r29357,%r29358,%r29359,%r29360,%r29361,%r29362,%r29363,%r29364,%r29365,%r29366,%r29367,%r29368,%r29369,%r29370,%r29371,%r29372,%r29373,%r29374,%r29375,%r29376,%r29377,%r29378,%r29379,%r29380,%r29381,%r29382,%r29383,%r29384,%r29385,%r29386,%r29387,%r29388,%r29389,%r29390,%r29391,%r29392,%r29393,%r29394,%r29395,%r29396,%r29397,%r29398,%r29399,%r29400,%r29401,%r29402,%r29403,%r29404,%r29405,%r29406,%r29407,%r29408,%r29409,%r29410,%r29411,%r29412,%r29413,%r29414,%r29415,%r29416,%r29417,%r29418,%r29419,%r29420,%r29421,%r29422,%r29423,%r29424,%r29425,%r29426,%r29427,%r29428,%r29429,%r29430,%r29431,%r29432,%r29433,%r29434,%r29435,%r29436,%r29437,%r29438,%r29439,%r29440,%r29441,%r29442,%r29443,%r29444,%r29445,%r29446,%r29447,%r29448,%r29449,%r29450,%r29451,%r29452,%r29453,%r29454,%r29455,%r29456,%r29457,%r29458,%r29459,%r29460,%r29461,%r29462,%r29463,%r29464,%r29465,%r29466,%r29467,%r29468,%r29469,%r29470,%r29471,%r29472,%r29473,%r29474,%r29475,%r29476,%r29477,%r29478,%r29479,%r29480,%r29481,%r29482,%r29483,%r29484,%r29485,%r29486,%r29487,%r29488,%r29489,%r29490,%r29491,%r29492,%r29493,%r29494,%r29495,%r29496,%r29497,%r29498,%r29499,%r29500,%r29501,%r29502,%r29503,%r29504,%r29505,%r29506,%r29507,%r29508,%r29509,%r29510,%r29511,%r29512,%r29513,%r29514,%r29515,%r29516,%r29517,%r29518,%r29519,%r29520,%r29521,%r29522,%r29523,%r29524,%r29525,%r29526,%r29527,%r29528,%r29529,%r29530,%r29531,%r29532,%r29533,%r29534,%r29535,%r29536,%r29537,%r29538,%r29539,%r29540,%r29541,%r29542,%r29543,
%r29544,%r29545,%r29546,%r29547,%r29548,%r29549,%r29550,%r29551,%r29552,%r29553,%r29554,%r29555,%r29556,%r29557,%r29558,%r29559,%r29560,%r29561,%r29562,%r29563,%r29564,%r29565,%r29566,%r29567,%r29568,%r29569,%r29570,%r29571,%r29572,%r29573,%r29574,%r29575,%r29576,%r29577,%r29578,%r29579,%r29580,%r29581,%r29582,%r29583,%r29584,%r29585,%r29586,%r29587,%r29588,%r29589,%r29590,%r29591,%r29592,%r29593,%r29594,%r29595,%r29596,%r29597,%r29598,%r29599,%r29600,%r29601,%r29602,%r29603,%r29604,%r29605,%r29606,%r29607,%r29608,%r10313,%r10314,%r10315 2026-02-21T09:05:41.1156781Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:05:41.1156840Z // end inline asm 2026-02-21T09:05:41.1156898Z $L__tmp10: 2026-02-21T09:05:41.1157118Z .loc 1 55 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:32 2026-02-21T09:05:41.1157192Z add.s64 %rd140, %rd127, 32; 2026-02-21T09:05:41.1157255Z add.s64 %rd141, %rd128, 32; 2026-02-21T09:05:41.1157319Z add.s64 %rd142, %rd129, 32; 2026-02-21T09:05:41.1157537Z .loc 1 55 80 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:80 2026-02-21T09:05:41.1157603Z add.s64 %rd143, %rd130, 32; 2026-02-21T09:05:41.1157737Z // begin inline asm 2026-02-21T09:05:41.1157803Z mov.u32 %r10575, 0x0; 2026-02-21T09:05:41.1157864Z mov.u32 %r10576, 0x0; 2026-02-21T09:05:41.1157924Z mov.u32 %r10577, 0x0; 2026-02-21T09:05:41.1157984Z mov.u32 %r10578, 0x0; 2026-02-21T09:05:41.1158133Z ld.global.v4.b32 { %r10575, %r10576, %r10577, %r10578 }, [ %rd140 + 0 ]; 2026-02-21T09:05:41.1158194Z // end inline asm 2026-02-21T09:05:41.1158254Z // begin inline asm 2026-02-21T09:05:41.1158324Z mov.u32 %r10579, 0x0; 2026-02-21T09:05:41.1158383Z mov.u32 %r10580, 0x0; 2026-02-21T09:05:41.1158443Z mov.u32 %r10581, 0x0; 2026-02-21T09:05:41.1158501Z mov.u32 %r10582, 0x0; 2026-02-21T09:05:41.1158648Z ld.global.v4.b32 { %r10579, %r10580, %r10581, %r10582 }, [ %rd141 + 0 ]; 2026-02-21T09:05:41.1158776Z // end inline asm 2026-02-21T09:05:41.1158899Z // begin inline asm 
2026-02-21T09:05:41.1158970Z mov.u32 %r10583, 0x0; 2026-02-21T09:05:41.1159030Z mov.u32 %r10584, 0x0; 2026-02-21T09:05:41.1159088Z mov.u32 %r10585, 0x0; 2026-02-21T09:05:41.1159156Z mov.u32 %r10586, 0x0; 2026-02-21T09:05:41.1159292Z ld.global.v4.b32 { %r10583, %r10584, %r10585, %r10586 }, [ %rd142 + 0 ]; 2026-02-21T09:05:41.1159350Z // end inline asm 2026-02-21T09:05:41.1159411Z // begin inline asm 2026-02-21T09:05:41.1159476Z mov.u32 %r10587, 0x0; 2026-02-21T09:05:41.1159548Z mov.u32 %r10588, 0x0; 2026-02-21T09:05:41.1159608Z mov.u32 %r10589, 0x0; 2026-02-21T09:05:41.1159673Z mov.u32 %r10590, 0x0; 2026-02-21T09:05:41.1159804Z ld.global.v4.b32 { %r10587, %r10588, %r10589, %r10590 }, [ %rd143 + 0 ]; 2026-02-21T09:05:41.1159862Z // end inline asm 2026-02-21T09:05:41.1160077Z .loc 1 59 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:59:32 2026-02-21T09:05:41.1160139Z bar.sync 0; 2026-02-21T09:05:41.1160284Z st.shared.v2.b32 [%r10], {%r10575, %r10576}; 2026-02-21T09:05:41.1160382Z st.shared.v2.b32 [%r10+2048], {%r10579, %r10580}; 2026-02-21T09:05:41.1160477Z st.shared.v2.b32 [%r10+4096], {%r10583, %r10584}; 2026-02-21T09:05:41.1160567Z st.shared.v2.b32 [%r10+6144], {%r10587, %r10588}; 2026-02-21T09:05:41.1160646Z st.shared.v2.b32 [%r11], {%r10577, %r10578}; 2026-02-21T09:05:41.1160739Z st.shared.v2.b32 [%r11+2048], {%r10581, %r10582}; 2026-02-21T09:05:41.1160825Z st.shared.v2.b32 [%r11+4096], {%r10585, %r10586}; 2026-02-21T09:05:41.1160909Z st.shared.v2.b32 [%r11+6144], {%r10589, %r10590}; 2026-02-21T09:05:41.1160972Z bar.sync 0; 2026-02-21T09:05:41.1161042Z ld.shared.b16 %rs401, [%r12]; 2026-02-21T09:05:41.1161114Z ld.shared.b16 %rs402, [%r12+256]; 2026-02-21T09:05:41.1161182Z ld.shared.b16 %rs403, [%r12+16]; 2026-02-21T09:05:41.1161254Z ld.shared.b16 %rs404, [%r12+272]; 2026-02-21T09:05:41.1161337Z ld.shared.b16 %rs405, [%r12+2048]; 2026-02-21T09:05:41.1161410Z ld.shared.b16 %rs406, [%r12+2304]; 2026-02-21T09:05:41.1161482Z ld.shared.b16 %rs407, 
[%r12+2064]; 2026-02-21T09:05:41.1161548Z ld.shared.b16 %rs408, [%r12+2320]; 2026-02-21T09:05:41.1161613Z ld.shared.b16 %rs409, [%r12+4096]; 2026-02-21T09:05:41.1161680Z ld.shared.b16 %rs410, [%r12+4352]; 2026-02-21T09:05:41.1161751Z ld.shared.b16 %rs411, [%r12+4112]; 2026-02-21T09:05:41.1161817Z ld.shared.b16 %rs412, [%r12+4368]; 2026-02-21T09:05:41.1161883Z ld.shared.b16 %rs413, [%r12+6144]; 2026-02-21T09:05:41.1161953Z ld.shared.b16 %rs414, [%r12+6400]; 2026-02-21T09:05:41.1162018Z ld.shared.b16 %rs415, [%r12+6160]; 2026-02-21T09:05:41.1162083Z ld.shared.b16 %rs416, [%r12+6416]; 2026-02-21T09:05:41.1162150Z ld.shared.b16 %rs417, [%r13]; 2026-02-21T09:05:41.1162221Z ld.shared.b16 %rs418, [%r13+256]; 2026-02-21T09:05:41.1162289Z ld.shared.b16 %rs419, [%r13+16]; 2026-02-21T09:05:41.1162355Z ld.shared.b16 %rs420, [%r13+272]; 2026-02-21T09:05:41.1162428Z ld.shared.b16 %rs421, [%r13+2048]; 2026-02-21T09:05:41.1162508Z ld.shared.b16 %rs422, [%r13+2304]; 2026-02-21T09:05:41.1162577Z ld.shared.b16 %rs423, [%r13+2064]; 2026-02-21T09:05:41.1162648Z ld.shared.b16 %rs424, [%r13+2320]; 2026-02-21T09:05:41.1162713Z ld.shared.b16 %rs425, [%r13+4096]; 2026-02-21T09:05:41.1162856Z ld.shared.b16 %rs426, [%r13+4352]; 2026-02-21T09:05:41.1162919Z ld.shared.b16 %rs427, [%r13+4112]; 2026-02-21T09:05:41.1162990Z ld.shared.b16 %rs428, [%r13+4368]; 2026-02-21T09:05:41.1163056Z ld.shared.b16 %rs429, [%r13+6144]; 2026-02-21T09:05:41.1163120Z ld.shared.b16 %rs430, [%r13+6400]; 2026-02-21T09:05:41.1163190Z ld.shared.b16 %rs431, [%r13+6160]; 2026-02-21T09:05:41.1163255Z ld.shared.b16 %rs432, [%r13+6416]; 2026-02-21T09:05:41.1163322Z cvt.f32.bf16 %r10721, %rs401; 2026-02-21T09:05:41.1163384Z cvt.f32.bf16 %r10722, %rs402; 2026-02-21T09:05:41.1163451Z cvt.f32.bf16 %r10723, %rs417; 2026-02-21T09:05:41.1163514Z cvt.f32.bf16 %r10724, %rs418; 2026-02-21T09:05:41.1163577Z cvt.f32.bf16 %r10853, %rs403; 2026-02-21T09:05:41.1163744Z cvt.f32.bf16 %r10854, %rs404; 2026-02-21T09:05:41.1163809Z cvt.f32.bf16 
%r10855, %rs419; 2026-02-21T09:05:41.1163870Z cvt.f32.bf16 %r10856, %rs420; 2026-02-21T09:05:41.1163935Z cvt.f32.bf16 %r10985, %rs405; 2026-02-21T09:05:41.1164003Z cvt.f32.bf16 %r10986, %rs406; 2026-02-21T09:05:41.1164064Z cvt.f32.bf16 %r10987, %rs421; 2026-02-21T09:05:41.1164137Z cvt.f32.bf16 %r10988, %rs422; 2026-02-21T09:05:41.1164206Z cvt.f32.bf16 %r11117, %rs407; 2026-02-21T09:05:41.1164267Z cvt.f32.bf16 %r11118, %rs408; 2026-02-21T09:05:41.1164328Z cvt.f32.bf16 %r11119, %rs423; 2026-02-21T09:05:41.1164394Z cvt.f32.bf16 %r11120, %rs424; 2026-02-21T09:05:41.1164456Z cvt.f32.bf16 %r11249, %rs409; 2026-02-21T09:05:41.1164517Z cvt.f32.bf16 %r11250, %rs410; 2026-02-21T09:05:41.1164577Z cvt.f32.bf16 %r11251, %rs425; 2026-02-21T09:05:41.1164642Z cvt.f32.bf16 %r11252, %rs426; 2026-02-21T09:05:41.1164704Z cvt.f32.bf16 %r11381, %rs411; 2026-02-21T09:05:41.1164767Z cvt.f32.bf16 %r11382, %rs412; 2026-02-21T09:05:41.1164886Z cvt.f32.bf16 %r11383, %rs427; 2026-02-21T09:05:41.1164951Z cvt.f32.bf16 %r11384, %rs428; 2026-02-21T09:05:41.1165014Z cvt.f32.bf16 %r11513, %rs413; 2026-02-21T09:05:41.1165076Z cvt.f32.bf16 %r11514, %rs414; 2026-02-21T09:05:41.1165144Z cvt.f32.bf16 %r11515, %rs429; 2026-02-21T09:05:41.1165205Z cvt.f32.bf16 %r11516, %rs430; 2026-02-21T09:05:41.1165267Z cvt.f32.bf16 %r11645, %rs415; 2026-02-21T09:05:41.1165332Z cvt.f32.bf16 %r11646, %rs416; 2026-02-21T09:05:41.1165396Z cvt.f32.bf16 %r11647, %rs431; 2026-02-21T09:05:41.1165458Z cvt.f32.bf16 %r11648, %rs432; 2026-02-21T09:05:41.1165667Z .loc 1 61 34 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:34 2026-02-21T09:05:41.1165753Z add.s32 %r15391, %r29352, 65536; 2026-02-21T09:05:41.1165820Z cvt.s64.s32 %rd180, %r15391; 2026-02-21T09:05:41.1165889Z add.s64 %rd144, %rd45, %rd180; 2026-02-21T09:05:41.1166096Z .loc 1 61 87 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:87 2026-02-21T09:05:41.1166161Z // begin inline asm 2026-02-21T09:05:41.1166222Z mov.u32 %r10591, 0x0; 
2026-02-21T09:05:41.1166287Z mov.u32 %r10592, 0x0; 2026-02-21T09:05:41.1166394Z ld.global.v2.b32 { %r10591, %r10592 }, [ %rd144 + 0 ]; 2026-02-21T09:05:41.1166568Z // end inline asm 2026-02-21T09:05:41.1166777Z .loc 1 69 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:69:28 2026-02-21T09:05:41.1166834Z bar.sync 0; 2026-02-21T09:05:41.1166901Z st.shared.b8 [%r14], %r10591; 2026-02-21T09:05:41.1166977Z prmt.b32 %r15392, %r10591, 0, 0x7771U; 2026-02-21T09:05:41.1167041Z st.shared.b8 [%r15], %r15392; 2026-02-21T09:05:41.1167110Z prmt.b32 %r15393, %r10591, 0, 0x7772U; 2026-02-21T09:05:41.1167182Z st.shared.b8 [%r16+256], %r15393; 2026-02-21T09:05:41.1167248Z prmt.b32 %r15394, %r10591, 0, 0x7773U; 2026-02-21T09:05:41.1167316Z st.shared.b8 [%r17+256], %r15394; 2026-02-21T09:05:41.1167383Z st.shared.b8 [%r18+512], %r10592; 2026-02-21T09:05:41.1167456Z prmt.b32 %r15395, %r10592, 0, 0x7771U; 2026-02-21T09:05:41.1167519Z st.shared.b8 [%r19+512], %r15395; 2026-02-21T09:05:41.1167584Z prmt.b32 %r15396, %r10592, 0, 0x7772U; 2026-02-21T09:05:41.1167732Z st.shared.b8 [%r20+768], %r15396; 2026-02-21T09:05:41.1167796Z prmt.b32 %r15397, %r10592, 0, 0x7773U; 2026-02-21T09:05:41.1167859Z st.shared.b8 [%r21+768], %r15397; 2026-02-21T09:05:41.1167918Z bar.sync 0; 2026-02-21T09:05:41.1167984Z ld.shared.b32 %r15398, [%r22]; 2026-02-21T09:05:41.1168049Z prmt.b32 %r15399, %r15398, 0, 0x7770U; 2026-02-21T09:05:41.1168113Z cvt.u16.u32 %rs433, %r15399; 2026-02-21T09:05:41.1168182Z prmt.b32 %r15400, %r15398, 0, 0x7771U; 2026-02-21T09:05:41.1168243Z cvt.u16.u32 %rs434, %r15400; 2026-02-21T09:05:41.1168307Z prmt.b32 %r15401, %r15398, 0, 0x7772U; 2026-02-21T09:05:41.1168371Z cvt.u16.u32 %rs435, %r15401; 2026-02-21T09:05:41.1168437Z prmt.b32 %r15402, %r15398, 0, 0x7773U; 2026-02-21T09:05:41.1168572Z cvt.u16.u32 %rs436, %r15402; 2026-02-21T09:05:41.1168697Z ld.shared.b32 %r15403, [%r23]; 2026-02-21T09:05:41.1168768Z prmt.b32 %r15404, %r15403, 0, 0x7770U; 2026-02-21T09:05:41.1168829Z 
cvt.u16.u32 %rs437, %r15404; 2026-02-21T09:05:41.1168896Z prmt.b32 %r15405, %r15403, 0, 0x7771U; 2026-02-21T09:05:41.1168961Z cvt.u16.u32 %rs438, %r15405; 2026-02-21T09:05:41.1169024Z prmt.b32 %r15406, %r15403, 0, 0x7772U; 2026-02-21T09:05:41.1169085Z cvt.u16.u32 %rs439, %r15406; 2026-02-21T09:05:41.1169149Z prmt.b32 %r15407, %r15403, 0, 0x7773U; 2026-02-21T09:05:41.1169215Z cvt.u16.u32 %rs440, %r15407; 2026-02-21T09:05:41.1169424Z .loc 1 64 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:64:28 2026-02-21T09:05:41.1169491Z shl.b16 %rs441, %rs433, 4; 2026-02-21T09:05:41.1169557Z shl.b16 %rs442, %rs437, 4; 2026-02-21T09:05:41.1169618Z shl.b16 %rs443, %rs434, 4; 2026-02-21T09:05:41.1169679Z shl.b16 %rs444, %rs438, 4; 2026-02-21T09:05:41.1169742Z shl.b16 %rs445, %rs435, 4; 2026-02-21T09:05:41.1169806Z shl.b16 %rs446, %rs439, 4; 2026-02-21T09:05:41.1169931Z shl.b16 %rs447, %rs436, 4; 2026-02-21T09:05:41.1169995Z shl.b16 %rs448, %rs440, 4; 2026-02-21T09:05:41.1170195Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1170261Z cvt.s16.s8 %rs449, %rs441; 2026-02-21T09:05:41.1170320Z shr.s16 %rs450, %rs449, 4; 2026-02-21T09:05:41.1170382Z cvt.s16.s8 %rs451, %rs442; 2026-02-21T09:05:41.1170446Z shr.s16 %rs452, %rs451, 4; 2026-02-21T09:05:41.1170514Z prmt.b32 %r15408, %r15398, 0, 0x8880U; 2026-02-21T09:05:41.1170576Z cvt.u16.u32 %rs453, %r15408; 2026-02-21T09:05:41.1170638Z shr.s16 %rs454, %rs453, 4; 2026-02-21T09:05:41.1170703Z prmt.b32 %r15409, %r15403, 0, 0x8880U; 2026-02-21T09:05:41.1170766Z cvt.u16.u32 %rs455, %r15409; 2026-02-21T09:05:41.1170831Z shr.s16 %rs456, %rs455, 4; 2026-02-21T09:05:41.1171025Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1171096Z cvt.rn.f32.s16 %r15410, %rs456; 2026-02-21T09:05:41.1171163Z cvt.rn.f32.s16 %r15411, %rs454; 2026-02-21T09:05:41.1171226Z cvt.rn.f32.s16 %r15412, %rs452; 2026-02-21T09:05:41.1171290Z cvt.rn.f32.s16 %r15413, 
%rs450; 2026-02-21T09:05:41.1171485Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1171551Z cvt.s16.s8 %rs457, %rs443; 2026-02-21T09:05:41.1171613Z shr.s16 %rs458, %rs457, 4; 2026-02-21T09:05:41.1171673Z cvt.s16.s8 %rs459, %rs444; 2026-02-21T09:05:41.1171754Z shr.s16 %rs460, %rs459, 4; 2026-02-21T09:05:41.1171822Z prmt.b32 %r15414, %r15398, 0, 0x9991U; 2026-02-21T09:05:41.1171885Z cvt.u16.u32 %rs461, %r15414; 2026-02-21T09:05:41.1171950Z shr.s16 %rs462, %rs461, 4; 2026-02-21T09:05:41.1172016Z prmt.b32 %r15415, %r15403, 0, 0x9991U; 2026-02-21T09:05:41.1172077Z cvt.u16.u32 %rs463, %r15415; 2026-02-21T09:05:41.1172138Z shr.s16 %rs464, %rs463, 4; 2026-02-21T09:05:41.1172343Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1172406Z cvt.rn.f32.s16 %r15416, %rs464; 2026-02-21T09:05:41.1172469Z cvt.rn.f32.s16 %r15417, %rs462; 2026-02-21T09:05:41.1172599Z cvt.rn.f32.s16 %r15418, %rs460; 2026-02-21T09:05:41.1172664Z cvt.rn.f32.s16 %r15419, %rs458; 2026-02-21T09:05:41.1172859Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1172922Z cvt.s16.s8 %rs465, %rs445; 2026-02-21T09:05:41.1172986Z shr.s16 %rs466, %rs465, 4; 2026-02-21T09:05:41.1173046Z cvt.s16.s8 %rs467, %rs446; 2026-02-21T09:05:41.1173107Z shr.s16 %rs468, %rs467, 4; 2026-02-21T09:05:41.1173177Z prmt.b32 %r15420, %r15398, 0, 0xaaa2U; 2026-02-21T09:05:41.1173239Z cvt.u16.u32 %rs469, %r15420; 2026-02-21T09:05:41.1173298Z shr.s16 %rs470, %rs469, 4; 2026-02-21T09:05:41.1173367Z prmt.b32 %r15421, %r15403, 0, 0xaaa2U; 2026-02-21T09:05:41.1173524Z cvt.u16.u32 %rs471, %r15421; 2026-02-21T09:05:41.1173592Z shr.s16 %rs472, %rs471, 4; 2026-02-21T09:05:41.1173787Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1173856Z cvt.rn.f32.s16 %r15422, %rs472; 2026-02-21T09:05:41.1173917Z cvt.rn.f32.s16 %r15423, %rs470; 
2026-02-21T09:05:41.1173979Z cvt.rn.f32.s16 %r15424, %rs468; 2026-02-21T09:05:41.1174044Z cvt.rn.f32.s16 %r15425, %rs466; 2026-02-21T09:05:41.1174237Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1174298Z cvt.s16.s8 %rs473, %rs447; 2026-02-21T09:05:41.1174364Z shr.s16 %rs474, %rs473, 4; 2026-02-21T09:05:41.1174425Z cvt.s16.s8 %rs475, %rs448; 2026-02-21T09:05:41.1174485Z shr.s16 %rs476, %rs475, 4; 2026-02-21T09:05:41.1174550Z prmt.b32 %r15426, %r15398, 0, 0xbbb3U; 2026-02-21T09:05:41.1174615Z cvt.u16.u32 %rs477, %r15426; 2026-02-21T09:05:41.1174675Z shr.s16 %rs478, %rs477, 4; 2026-02-21T09:05:41.1174792Z prmt.b32 %r15427, %r15403, 0, 0xbbb3U; 2026-02-21T09:05:41.1174861Z cvt.u16.u32 %rs479, %r15427; 2026-02-21T09:05:41.1174923Z shr.s16 %rs480, %rs479, 4; 2026-02-21T09:05:41.1175117Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1175182Z cvt.rn.f32.s16 %r15428, %rs480; 2026-02-21T09:05:41.1175249Z cvt.rn.f32.s16 %r15429, %rs478; 2026-02-21T09:05:41.1175312Z cvt.rn.f32.s16 %r15430, %rs476; 2026-02-21T09:05:41.1175374Z cvt.rn.f32.s16 %r15431, %rs474; 2026-02-21T09:05:41.1175432Z bar.sync 0; 2026-02-21T09:05:41.1175551Z st.shared.v4.b32 [%r24], {%r15413, %r15411, %r15412, %r15410}; 2026-02-21T09:05:41.1175666Z st.shared.v4.b32 [%r25], {%r15419, %r15417, %r15418, %r15416}; 2026-02-21T09:05:41.1175779Z st.shared.v4.b32 [%r26], {%r15425, %r15423, %r15424, %r15422}; 2026-02-21T09:05:41.1175888Z st.shared.v4.b32 [%r27], {%r15431, %r15429, %r15430, %r15428}; 2026-02-21T09:05:41.1175945Z $L__tmp11: 2026-02-21T09:05:41.1176222Z .loc 2 291 36 // standard.py:291:36 @[ c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:91:40 ] 2026-02-21T09:05:41.1176298Z // begin inline asm 2026-02-21T09:05:41.1176382Z fence.proxy.async.shared::cta; 2026-02-21T09:05:41.1176439Z // end inline asm 2026-02-21T09:05:41.1176611Z bar.sync 0; 2026-02-21T09:05:41.1176689Z 
wgmma.fence.sync.aligned; 2026-02-21T09:05:41.1176748Z // begin inline asm 2026-02-21T09:05:41.1178247Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29353,%r29354,%r29355,%r29356,%r29357,%r29358,%r29359,%r29360,%r29361,%r29362,%r29363,%r29364,%r29365,%r29366,%r29367,%r29368,%r29369,%r29370,%r29371,%r29372,%r29373,%r29374,%r29375,%r29376,%r29377,%r29378,%r29379,%r29380,%r29381,%r29382,%r29383,%r29384,%r29385,%r29386,%r29387,%r29388,%r29389,%r29390,%r29391,%r29392,%r29393,%r29394,%r29395,%r29396,%r29397,%r29398,%r29399,%r29400,%r29401,%r29402,%r29403,%r29404,%r29405,%r29406,%r29407,%r29408,%r29409,%r29410,%r29411,%r29412,%r29413,%r29414,%r29415,%r29416}, {%r10721,%r10722,%r10723,%r10724}, %rd2, %p56, 1, 1; 2026-02-21T09:05:41.1178307Z // end inline asm 2026-02-21T09:05:41.1178366Z // begin inline asm 2026-02-21T09:05:41.1179954Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29353,%r29354,%r29355,%r29356,%r29357,%r29358,%r29359,%r29360,%r29361,%r29362,%r29363,%r29364,%r29365,%r29366,%r29367,%r29368,%r29369,%r29370,%r29371,%r29372,%r29373,%r29374,%r29375,%r29376,%r29377,%r29378,%r29379,%r29380,%r29381,%r29382,%r29383,%r29384,%r29385,%r29386,%r29387,%r29388,%r29389,%r29390,%r29391,%r29392,%r29393,%r29394,%r29395,%r29396,%r29397,%r29398,%r29399,%r29400,%r29401,%r29402,%r29403,%r29404,%r29405,%r29406,%r29407,%r29408,%r29409,%r29410,%r29411,%r29412,%r29413,%r29414,%r29415,%r29416}, {%r10853,%r10854,%r10855,%r10856}, %rd3, %p56, 1, 1; 2026-02-21T09:05:41.1180014Z // end inline asm 2026-02-21T09:05:41.1180073Z // begin inline asm 2026-02-21T09:05:41.1181621Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29417,%r29418,%r29419,%r29420,%r29421,%r29422,%r29423,%r29424,%r29425,%r29426,%r29427,%r29428,%r29429,%r29430,%r29431,%r29432,%r29433,%r29434,%r29435,%r29436,%r29437,%r29438,%r29439,%r29440,%r29441,%r29442,%r29443,%r29444,%r29445,%r29446,%r29447,%r29448,%r29449,%r29450,%r29451,%r29452,%r29453,%r29454,%r29455,%r29456,%r29457,%r29458,%r29459,%r29460,%r29461,%r29462,%r29463,%r29464,%r29465,%r29466,%r29467,%r29468,%r29469,%r29470,%r29471,%r29472,%r29473,%r29474,%r29475,%r29476,%r29477,%r29478,%r29479,%r29480}, {%r10985,%r10986,%r10987,%r10988}, %rd2, %p56, 1, 1; 2026-02-21T09:05:41.1181739Z // end inline asm 2026-02-21T09:05:41.1181806Z // begin inline asm 2026-02-21T09:05:41.1183374Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29417,%r29418,%r29419,%r29420,%r29421,%r29422,%r29423,%r29424,%r29425,%r29426,%r29427,%r29428,%r29429,%r29430,%r29431,%r29432,%r29433,%r29434,%r29435,%r29436,%r29437,%r29438,%r29439,%r29440,%r29441,%r29442,%r29443,%r29444,%r29445,%r29446,%r29447,%r29448,%r29449,%r29450,%r29451,%r29452,%r29453,%r29454,%r29455,%r29456,%r29457,%r29458,%r29459,%r29460,%r29461,%r29462,%r29463,%r29464,%r29465,%r29466,%r29467,%r29468,%r29469,%r29470,%r29471,%r29472,%r29473,%r29474,%r29475,%r29476,%r29477,%r29478,%r29479,%r29480}, {%r11117,%r11118,%r11119,%r11120}, %rd3, %p56, 1, 1; 2026-02-21T09:05:41.1183438Z // end inline asm 2026-02-21T09:05:41.1183500Z // begin inline asm 2026-02-21T09:05:41.1184985Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29481,%r29482,%r29483,%r29484,%r29485,%r29486,%r29487,%r29488,%r29489,%r29490,%r29491,%r29492,%r29493,%r29494,%r29495,%r29496,%r29497,%r29498,%r29499,%r29500,%r29501,%r29502,%r29503,%r29504,%r29505,%r29506,%r29507,%r29508,%r29509,%r29510,%r29511,%r29512,%r29513,%r29514,%r29515,%r29516,%r29517,%r29518,%r29519,%r29520,%r29521,%r29522,%r29523,%r29524,%r29525,%r29526,%r29527,%r29528,%r29529,%r29530,%r29531,%r29532,%r29533,%r29534,%r29535,%r29536,%r29537,%r29538,%r29539,%r29540,%r29541,%r29542,%r29543,%r29544}, {%r11249,%r11250,%r11251,%r11252}, %rd2, %p56, 1, 1; 2026-02-21T09:05:41.1185051Z // end inline asm 2026-02-21T09:05:41.1185110Z // begin inline asm 2026-02-21T09:05:41.1186698Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29481,%r29482,%r29483,%r29484,%r29485,%r29486,%r29487,%r29488,%r29489,%r29490,%r29491,%r29492,%r29493,%r29494,%r29495,%r29496,%r29497,%r29498,%r29499,%r29500,%r29501,%r29502,%r29503,%r29504,%r29505,%r29506,%r29507,%r29508,%r29509,%r29510,%r29511,%r29512,%r29513,%r29514,%r29515,%r29516,%r29517,%r29518,%r29519,%r29520,%r29521,%r29522,%r29523,%r29524,%r29525,%r29526,%r29527,%r29528,%r29529,%r29530,%r29531,%r29532,%r29533,%r29534,%r29535,%r29536,%r29537,%r29538,%r29539,%r29540,%r29541,%r29542,%r29543,%r29544}, {%r11381,%r11382,%r11383,%r11384}, %rd3, %p56, 1, 1; 2026-02-21T09:05:41.1186765Z // end inline asm 2026-02-21T09:05:41.1186824Z // begin inline asm 2026-02-21T09:05:41.1188395Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29545,%r29546,%r29547,%r29548,%r29549,%r29550,%r29551,%r29552,%r29553,%r29554,%r29555,%r29556,%r29557,%r29558,%r29559,%r29560,%r29561,%r29562,%r29563,%r29564,%r29565,%r29566,%r29567,%r29568,%r29569,%r29570,%r29571,%r29572,%r29573,%r29574,%r29575,%r29576,%r29577,%r29578,%r29579,%r29580,%r29581,%r29582,%r29583,%r29584,%r29585,%r29586,%r29587,%r29588,%r29589,%r29590,%r29591,%r29592,%r29593,%r29594,%r29595,%r29596,%r29597,%r29598,%r29599,%r29600,%r29601,%r29602,%r29603,%r29604,%r29605,%r29606,%r29607,%r29608}, {%r11513,%r11514,%r11515,%r11516}, %rd2, %p56, 1, 1; 2026-02-21T09:05:41.1188539Z // end inline asm 2026-02-21T09:05:41.1188600Z // begin inline asm 2026-02-21T09:05:41.1190151Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29545,%r29546,%r29547,%r29548,%r29549,%r29550,%r29551,%r29552,%r29553,%r29554,%r29555,%r29556,%r29557,%r29558,%r29559,%r29560,%r29561,%r29562,%r29563,%r29564,%r29565,%r29566,%r29567,%r29568,%r29569,%r29570,%r29571,%r29572,%r29573,%r29574,%r29575,%r29576,%r29577,%r29578,%r29579,%r29580,%r29581,%r29582,%r29583,%r29584,%r29585,%r29586,%r29587,%r29588,%r29589,%r29590,%r29591,%r29592,%r29593,%r29594,%r29595,%r29596,%r29597,%r29598,%r29599,%r29600,%r29601,%r29602,%r29603,%r29604,%r29605,%r29606,%r29607,%r29608}, {%r11645,%r11646,%r11647,%r11648}, %rd3, %p56, 1, 1; 2026-02-21T09:05:41.1190267Z // end inline asm 2026-02-21T09:05:41.1190344Z wgmma.commit_group.sync.aligned; 2026-02-21T09:05:41.1190410Z mov.b32 %r11905, %r24001; 2026-02-21T09:05:41.1190470Z mov.b32 %r11906, %r15091; 2026-02-21T09:05:41.1190528Z mov.b32 %r11907, %r15091; 2026-02-21T09:05:41.1190590Z // begin inline asm 2026-02-21T09:05:41.1195732Z // wait for regs: 
%r29353,%r29354,%r29355,%r29356,%r29357,%r29358,%r29359,%r29360,%r29361,%r29362,%r29363,%r29364,%r29365,%r29366,%r29367,%r29368,%r29369,%r29370,%r29371,%r29372,%r29373,%r29374,%r29375,%r29376,%r29377,%r29378,%r29379,%r29380,%r29381,%r29382,%r29383,%r29384,%r29385,%r29386,%r29387,%r29388,%r29389,%r29390,%r29391,%r29392,%r29393,%r29394,%r29395,%r29396,%r29397,%r29398,%r29399,%r29400,%r29401,%r29402,%r29403,%r29404,%r29405,%r29406,%r29407,%r29408,%r29409,%r29410,%r29411,%r29412,%r29413,%r29414,%r29415,%r29416,%r29417,%r29418,%r29419,%r29420,%r29421,%r29422,%r29423,%r29424,%r29425,%r29426,%r29427,%r29428,%r29429,%r29430,%r29431,%r29432,%r29433,%r29434,%r29435,%r29436,%r29437,%r29438,%r29439,%r29440,%r29441,%r29442,%r29443,%r29444,%r29445,%r29446,%r29447,%r29448,%r29449,%r29450,%r29451,%r29452,%r29453,%r29454,%r29455,%r29456,%r29457,%r29458,%r29459,%r29460,%r29461,%r29462,%r29463,%r29464,%r29465,%r29466,%r29467,%r29468,%r29469,%r29470,%r29471,%r29472,%r29473,%r29474,%r29475,%r29476,%r29477,%r29478,%r29479,%r29480,%r29481,%r29482,%r29483,%r29484,%r29485,%r29486,%r29487,%r29488,%r29489,%r29490,%r29491,%r29492,%r29493,%r29494,%r29495,%r29496,%r29497,%r29498,%r29499,%r29500,%r29501,%r29502,%r29503,%r29504,%r29505,%r29506,%r29507,%r29508,%r29509,%r29510,%r29511,%r29512,%r29513,%r29514,%r29515,%r29516,%r29517,%r29518,%r29519,%r29520,%r29521,%r29522,%r29523,%r29524,%r29525,%r29526,%r29527,%r29528,%r29529,%r29530,%r29531,%r29532,%r29533,%r29534,%r29535,%r29536,%r29537,%r29538,%r29539,%r29540,%r29541,%r29542,%r29543,%r29544,%r29545,%r29546,%r29547,%r29548,%r29549,%r29550,%r29551,%r29552,%r29553,%r29554,%r29555,%r29556,%r29557,%r29558,%r29559,%r29560,%r29561,%r29562,%r29563,%r29564,%r29565,%r29566,%r29567,%r29568,%r29569,%r29570,%r29571,%r29572,%r29573,%r29574,%r29575,%r29576,%r29577,%r29578,%r29579,%r29580,%r29581,%r29582,%r29583,%r29584,%r29585,%r29586,%r29587,%r29588,%r29589,%r29590,%r29591,%r29592,%r29593,%r29594,%r29595,%r29596,%r29597,%r29598,%r29599,%r29600,%r29601,%r29602,
%r29603,%r29604,%r29605,%r29606,%r29607,%r29608,%r11905,%r11906,%r11907 2026-02-21T09:05:41.1195824Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:05:41.1195882Z // end inline asm 2026-02-21T09:05:41.1195940Z $L__tmp12: 2026-02-21T09:05:41.1196150Z .loc 1 55 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:32 2026-02-21T09:05:41.1196215Z add.s64 %rd153, %rd127, 64; 2026-02-21T09:05:41.1196278Z add.s64 %rd154, %rd128, 64; 2026-02-21T09:05:41.1196341Z add.s64 %rd155, %rd129, 64; 2026-02-21T09:05:41.1196652Z .loc 1 55 80 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:80 2026-02-21T09:05:41.1196796Z add.s64 %rd156, %rd130, 64; 2026-02-21T09:05:41.1196857Z // begin inline asm 2026-02-21T09:05:41.1196916Z mov.u32 %r12167, 0x0; 2026-02-21T09:05:41.1196974Z mov.u32 %r12168, 0x0; 2026-02-21T09:05:41.1197048Z mov.u32 %r12169, 0x0; 2026-02-21T09:05:41.1197107Z mov.u32 %r12170, 0x0; 2026-02-21T09:05:41.1197248Z ld.global.v4.b32 { %r12167, %r12168, %r12169, %r12170 }, [ %rd153 + 0 ]; 2026-02-21T09:05:41.1197307Z // end inline asm 2026-02-21T09:05:41.1197367Z // begin inline asm 2026-02-21T09:05:41.1197425Z mov.u32 %r12171, 0x0; 2026-02-21T09:05:41.1197484Z mov.u32 %r12172, 0x0; 2026-02-21T09:05:41.1197546Z mov.u32 %r12173, 0x0; 2026-02-21T09:05:41.1197603Z mov.u32 %r12174, 0x0; 2026-02-21T09:05:41.1197733Z ld.global.v4.b32 { %r12171, %r12172, %r12173, %r12174 }, [ %rd154 + 0 ]; 2026-02-21T09:05:41.1197921Z // end inline asm 2026-02-21T09:05:41.1197985Z // begin inline asm 2026-02-21T09:05:41.1198044Z mov.u32 %r12175, 0x0; 2026-02-21T09:05:41.1198100Z mov.u32 %r12176, 0x0; 2026-02-21T09:05:41.1198163Z mov.u32 %r12177, 0x0; 2026-02-21T09:05:41.1198220Z mov.u32 %r12178, 0x0; 2026-02-21T09:05:41.1198345Z ld.global.v4.b32 { %r12175, %r12176, %r12177, %r12178 }, [ %rd155 + 0 ]; 2026-02-21T09:05:41.1198405Z // end inline asm 2026-02-21T09:05:41.1198463Z // begin inline asm 2026-02-21T09:05:41.1198522Z mov.u32 %r12179, 0x0; 2026-02-21T09:05:41.1198579Z 
mov.u32 %r12180, 0x0; 2026-02-21T09:05:41.1198638Z mov.u32 %r12181, 0x0; 2026-02-21T09:05:41.1198695Z mov.u32 %r12182, 0x0; 2026-02-21T09:05:41.1198821Z ld.global.v4.b32 { %r12179, %r12180, %r12181, %r12182 }, [ %rd156 + 0 ]; 2026-02-21T09:05:41.1198881Z // end inline asm 2026-02-21T09:05:41.1199079Z .loc 1 59 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:59:32 2026-02-21T09:05:41.1199138Z bar.sync 0; 2026-02-21T09:05:41.1199289Z st.shared.v2.b32 [%r10], {%r12167, %r12168}; 2026-02-21T09:05:41.1199387Z st.shared.v2.b32 [%r10+2048], {%r12171, %r12172}; 2026-02-21T09:05:41.1199476Z st.shared.v2.b32 [%r10+4096], {%r12175, %r12176}; 2026-02-21T09:05:41.1199575Z st.shared.v2.b32 [%r10+6144], {%r12179, %r12180}; 2026-02-21T09:05:41.1199661Z st.shared.v2.b32 [%r11], {%r12169, %r12170}; 2026-02-21T09:05:41.1199747Z st.shared.v2.b32 [%r11+2048], {%r12173, %r12174}; 2026-02-21T09:05:41.1199831Z st.shared.v2.b32 [%r11+4096], {%r12177, %r12178}; 2026-02-21T09:05:41.1199918Z st.shared.v2.b32 [%r11+6144], {%r12181, %r12182}; 2026-02-21T09:05:41.1199974Z bar.sync 0; 2026-02-21T09:05:41.1200041Z ld.shared.b16 %rs481, [%r12]; 2026-02-21T09:05:41.1200110Z ld.shared.b16 %rs482, [%r12+256]; 2026-02-21T09:05:41.1200184Z ld.shared.b16 %rs483, [%r12+16]; 2026-02-21T09:05:41.1200248Z ld.shared.b16 %rs484, [%r12+272]; 2026-02-21T09:05:41.1200314Z ld.shared.b16 %rs485, [%r12+2048]; 2026-02-21T09:05:41.1200385Z ld.shared.b16 %rs486, [%r12+2304]; 2026-02-21T09:05:41.1200449Z ld.shared.b16 %rs487, [%r12+2064]; 2026-02-21T09:05:41.1200515Z ld.shared.b16 %rs488, [%r12+2320]; 2026-02-21T09:05:41.1200586Z ld.shared.b16 %rs489, [%r12+4096]; 2026-02-21T09:05:41.1200650Z ld.shared.b16 %rs490, [%r12+4352]; 2026-02-21T09:05:41.1200714Z ld.shared.b16 %rs491, [%r12+4112]; 2026-02-21T09:05:41.1200778Z ld.shared.b16 %rs492, [%r12+4368]; 2026-02-21T09:05:41.1200844Z ld.shared.b16 %rs493, [%r12+6144]; 2026-02-21T09:05:41.1200910Z ld.shared.b16 %rs494, [%r12+6400]; 2026-02-21T09:05:41.1200973Z 
ld.shared.b16 %rs495, [%r12+6160]; 2026-02-21T09:05:41.1201039Z ld.shared.b16 %rs496, [%r12+6416]; 2026-02-21T09:05:41.1201103Z ld.shared.b16 %rs497, [%r13]; 2026-02-21T09:05:41.1201165Z ld.shared.b16 %rs498, [%r13+256]; 2026-02-21T09:05:41.1201229Z ld.shared.b16 %rs499, [%r13+16]; 2026-02-21T09:05:41.1201297Z ld.shared.b16 %rs500, [%r13+272]; 2026-02-21T09:05:41.1201372Z ld.shared.b16 %rs501, [%r13+2048]; 2026-02-21T09:05:41.1201442Z ld.shared.b16 %rs502, [%r13+2304]; 2026-02-21T09:05:41.1201509Z ld.shared.b16 %rs503, [%r13+2064]; 2026-02-21T09:05:41.1201570Z ld.shared.b16 %rs504, [%r13+2320]; 2026-02-21T09:05:41.1201693Z ld.shared.b16 %rs505, [%r13+4096]; 2026-02-21T09:05:41.1201759Z ld.shared.b16 %rs506, [%r13+4352]; 2026-02-21T09:05:41.1201821Z ld.shared.b16 %rs507, [%r13+4112]; 2026-02-21T09:05:41.1201884Z ld.shared.b16 %rs508, [%r13+4368]; 2026-02-21T09:05:41.1201949Z ld.shared.b16 %rs509, [%r13+6144]; 2026-02-21T09:05:41.1202016Z ld.shared.b16 %rs510, [%r13+6400]; 2026-02-21T09:05:41.1202081Z ld.shared.b16 %rs511, [%r13+6160]; 2026-02-21T09:05:41.1202144Z ld.shared.b16 %rs512, [%r13+6416]; 2026-02-21T09:05:41.1202213Z cvt.f32.bf16 %r12313, %rs481; 2026-02-21T09:05:41.1202273Z cvt.f32.bf16 %r12314, %rs482; 2026-02-21T09:05:41.1202333Z cvt.f32.bf16 %r12315, %rs497; 2026-02-21T09:05:41.1202394Z cvt.f32.bf16 %r12316, %rs498; 2026-02-21T09:05:41.1202513Z cvt.f32.bf16 %r12445, %rs483; 2026-02-21T09:05:41.1202621Z cvt.f32.bf16 %r12446, %rs484; 2026-02-21T09:05:41.1202683Z cvt.f32.bf16 %r12447, %rs499; 2026-02-21T09:05:41.1202746Z cvt.f32.bf16 %r12448, %rs500; 2026-02-21T09:05:41.1202809Z cvt.f32.bf16 %r12577, %rs485; 2026-02-21T09:05:41.1202868Z cvt.f32.bf16 %r12578, %rs486; 2026-02-21T09:05:41.1202928Z cvt.f32.bf16 %r12579, %rs501; 2026-02-21T09:05:41.1202999Z cvt.f32.bf16 %r12580, %rs502; 2026-02-21T09:05:41.1203064Z cvt.f32.bf16 %r12709, %rs487; 2026-02-21T09:05:41.1203124Z cvt.f32.bf16 %r12710, %rs488; 2026-02-21T09:05:41.1203188Z cvt.f32.bf16 %r12711, %rs503; 
2026-02-21T09:05:41.1203247Z cvt.f32.bf16 %r12712, %rs504; 2026-02-21T09:05:41.1203306Z cvt.f32.bf16 %r12841, %rs489; 2026-02-21T09:05:41.1203368Z cvt.f32.bf16 %r12842, %rs490; 2026-02-21T09:05:41.1203427Z cvt.f32.bf16 %r12843, %rs505; 2026-02-21T09:05:41.1203486Z cvt.f32.bf16 %r12844, %rs506; 2026-02-21T09:05:41.1203546Z cvt.f32.bf16 %r12973, %rs491; 2026-02-21T09:05:41.1203615Z cvt.f32.bf16 %r12974, %rs492; 2026-02-21T09:05:41.1203726Z cvt.f32.bf16 %r12975, %rs507; 2026-02-21T09:05:41.1203787Z cvt.f32.bf16 %r12976, %rs508; 2026-02-21T09:05:41.1203852Z cvt.f32.bf16 %r13105, %rs493; 2026-02-21T09:05:41.1203913Z cvt.f32.bf16 %r13106, %rs494; 2026-02-21T09:05:41.1203973Z cvt.f32.bf16 %r13107, %rs509; 2026-02-21T09:05:41.1204033Z cvt.f32.bf16 %r13108, %rs510; 2026-02-21T09:05:41.1204097Z cvt.f32.bf16 %r13237, %rs495; 2026-02-21T09:05:41.1204157Z cvt.f32.bf16 %r13238, %rs496; 2026-02-21T09:05:41.1204217Z cvt.f32.bf16 %r13239, %rs511; 2026-02-21T09:05:41.1204282Z cvt.f32.bf16 %r13240, %rs512; 2026-02-21T09:05:41.1204485Z .loc 1 61 34 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:34 2026-02-21T09:05:41.1204548Z add.s32 %r15432, %r29352, 131072; 2026-02-21T09:05:41.1204612Z cvt.s64.s32 %rd181, %r15432; 2026-02-21T09:05:41.1204678Z add.s64 %rd157, %rd45, %rd181; 2026-02-21T09:05:41.1204877Z .loc 1 61 87 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:87 2026-02-21T09:05:41.1204939Z // begin inline asm 2026-02-21T09:05:41.1205000Z mov.u32 %r12183, 0x0; 2026-02-21T09:05:41.1205057Z mov.u32 %r12184, 0x0; 2026-02-21T09:05:41.1205159Z ld.global.v2.b32 { %r12183, %r12184 }, [ %rd157 + 0 ]; 2026-02-21T09:05:41.1205219Z // end inline asm 2026-02-21T09:05:41.1205415Z .loc 1 69 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:69:28 2026-02-21T09:05:41.1206023Z [286s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:05:41.1207424Z Config: @helion.kernel(config=helion.Config(block_sizes=[8, 256, 128], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['', ''], loop_orders=[[0, 1]], maxnreg=64, num_sm_multiplier=8, num_stages=1, num_warps=4, pid_type='persistent_interleaved', range_flattens=[False, False], range_multi_buffers=[True, True], range_num_stages=[3, 0], range_unroll_factors=[3, 4], range_warp_specializes=[]), static_shapes=True) 2026-02-21T09:05:41.1207581Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:05:41.1207643Z `ptxas` stderr: 2026-02-21T09:05:41.1208188Z ptxas fatal : (C7602) Insufficient registers (64) to compile instruction at line 801 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T09:05:41.1208295Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:05:41.1208302Z 2026-02-21T09:05:41.1208810Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpfrxler8o.ptx -o /tmp/tmpfrxler8o.ptx.o 2026-02-21T09:05:41.1208819Z 2026-02-21T09:05:41.1208973Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:05:41.1210283Z [286s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 256, 128], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['', ''], loop_orders=[[0, 1]], maxnreg=256, num_sm_multiplier=8, num_stages=1, num_warps=2, pid_type='persistent_interleaved', range_flattens=[True, False], range_multi_buffers=[True, True], range_num_stages=[4, 0], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T09:05:41.1210466Z Tensor-likes are not close! 
2026-02-21T09:05:41.1210471Z 2026-02-21T09:05:41.1210561Z Mismatched elements: 33451524 / 33554432 (99.7%) 2026-02-21T09:05:41.1210737Z Greatest absolute difference: 1392.0 at index (3409, 1855) (up to 0.01 allowed) 2026-02-21T09:05:41.1210900Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:05:41.1211026Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:05:41.1211031Z 2026-02-21T09:05:41.1211090Z bar.sync 0; 2026-02-21T09:05:41.1211161Z st.shared.b8 [%r14], %r12183; 2026-02-21T09:05:41.1211236Z prmt.b32 %r15433, %r12183, 0, 0x7771U; 2026-02-21T09:05:41.1211303Z st.shared.b8 [%r15], %r15433; 2026-02-21T09:05:41.1211435Z prmt.b32 %r15434, %r12183, 0, 0x7772U; 2026-02-21T09:05:41.1211514Z st.shared.b8 [%r16+256], %r15434; 2026-02-21T09:05:41.1211581Z prmt.b32 %r15435, %r12183, 0, 0x7773U; 2026-02-21T09:05:41.1211645Z st.shared.b8 [%r17+256], %r15435; 2026-02-21T09:05:41.1211710Z st.shared.b8 [%r18+512], %r12184; 2026-02-21T09:05:41.1211787Z prmt.b32 %r15436, %r12184, 0, 0x7771U; 2026-02-21T09:05:41.1211852Z st.shared.b8 [%r19+512], %r15436; 2026-02-21T09:05:41.1211918Z prmt.b32 %r15437, %r12184, 0, 0x7772U; 2026-02-21T09:05:41.1211984Z st.shared.b8 [%r20+768], %r15437; 2026-02-21T09:05:41.1212048Z prmt.b32 %r15438, %r12184, 0, 0x7773U; 2026-02-21T09:05:41.1212111Z st.shared.b8 [%r21+768], %r15438; 2026-02-21T09:05:41.1212170Z bar.sync 0; 2026-02-21T09:05:41.1212238Z ld.shared.b32 %r15439, [%r22]; 2026-02-21T09:05:41.1212303Z prmt.b32 %r15440, %r15439, 0, 0x7770U; 2026-02-21T09:05:41.1212370Z cvt.u16.u32 %rs513, %r15440; 2026-02-21T09:05:41.1212440Z prmt.b32 %r15441, %r15439, 0, 0x7771U; 2026-02-21T09:05:41.1212502Z cvt.u16.u32 %rs514, %r15441; 2026-02-21T09:05:41.1212566Z prmt.b32 %r15442, %r15439, 0, 0x7772U; 2026-02-21T09:05:41.1212632Z cvt.u16.u32 %rs515, %r15442; 2026-02-21T09:05:41.1212695Z prmt.b32 %r15443, %r15439, 0, 0x7773U; 2026-02-21T09:05:41.1212753Z cvt.u16.u32 %rs516, %r15443; 
2026-02-21T09:05:41.1212821Z ld.shared.b32 %r15444, [%r23]; 2026-02-21T09:05:41.1212885Z prmt.b32 %r15445, %r15444, 0, 0x7770U; 2026-02-21T09:05:41.1212947Z cvt.u16.u32 %rs517, %r15445; 2026-02-21T09:05:41.1213010Z prmt.b32 %r15446, %r15444, 0, 0x7771U; 2026-02-21T09:05:41.1213072Z cvt.u16.u32 %rs518, %r15446; 2026-02-21T09:05:41.1213136Z prmt.b32 %r15447, %r15444, 0, 0x7772U; 2026-02-21T09:05:41.1213196Z cvt.u16.u32 %rs519, %r15447; 2026-02-21T09:05:41.1213263Z prmt.b32 %r15448, %r15444, 0, 0x7773U; 2026-02-21T09:05:41.1213324Z cvt.u16.u32 %rs520, %r15448; 2026-02-21T09:05:41.1213557Z .loc 1 64 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:64:28 2026-02-21T09:05:41.1213624Z shl.b16 %rs521, %rs513, 4; 2026-02-21T09:05:41.1213690Z shl.b16 %rs522, %rs517, 4; 2026-02-21T09:05:41.1213811Z shl.b16 %rs523, %rs514, 4; 2026-02-21T09:05:41.1213871Z shl.b16 %rs524, %rs518, 4; 2026-02-21T09:05:41.1213933Z shl.b16 %rs525, %rs515, 4; 2026-02-21T09:05:41.1213993Z shl.b16 %rs526, %rs519, 4; 2026-02-21T09:05:41.1214052Z shl.b16 %rs527, %rs516, 4; 2026-02-21T09:05:41.1214115Z shl.b16 %rs528, %rs520, 4; 2026-02-21T09:05:41.1214316Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1214378Z cvt.s16.s8 %rs529, %rs521; 2026-02-21T09:05:41.1214438Z shr.s16 %rs530, %rs529, 4; 2026-02-21T09:05:41.1214503Z cvt.s16.s8 %rs531, %rs522; 2026-02-21T09:05:41.1214562Z shr.s16 %rs532, %rs531, 4; 2026-02-21T09:05:41.1214627Z prmt.b32 %r15449, %r15439, 0, 0x8880U; 2026-02-21T09:05:41.1214796Z cvt.u16.u32 %rs533, %r15449; 2026-02-21T09:05:41.1214860Z shr.s16 %rs534, %rs533, 4; 2026-02-21T09:05:41.1214926Z prmt.b32 %r15450, %r15444, 0, 0x8880U; 2026-02-21T09:05:41.1214988Z cvt.u16.u32 %rs535, %r15450; 2026-02-21T09:05:41.1215053Z shr.s16 %rs536, %rs535, 4; 2026-02-21T09:05:41.1215250Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1215316Z cvt.rn.f32.s16 %r15451, %rs536; 
2026-02-21T09:05:41.1215380Z cvt.rn.f32.s16 %r15452, %rs534; 2026-02-21T09:05:41.1215444Z cvt.rn.f32.s16 %r15453, %rs532; 2026-02-21T09:05:41.1215505Z cvt.rn.f32.s16 %r15454, %rs530; 2026-02-21T09:05:41.1215702Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1215763Z cvt.s16.s8 %rs537, %rs523; 2026-02-21T09:05:41.1215824Z shr.s16 %rs538, %rs537, 4; 2026-02-21T09:05:41.1215883Z cvt.s16.s8 %rs539, %rs524; 2026-02-21T09:05:41.1215945Z shr.s16 %rs540, %rs539, 4; 2026-02-21T09:05:41.1216063Z prmt.b32 %r15455, %r15439, 0, 0x9991U; 2026-02-21T09:05:41.1216127Z cvt.u16.u32 %rs541, %r15455; 2026-02-21T09:05:41.1216190Z shr.s16 %rs542, %rs541, 4; 2026-02-21T09:05:41.1216254Z prmt.b32 %r15456, %r15444, 0, 0x9991U; 2026-02-21T09:05:41.1216316Z cvt.u16.u32 %rs543, %r15456; 2026-02-21T09:05:41.1216388Z shr.s16 %rs544, %rs543, 4; 2026-02-21T09:05:41.1216701Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1216768Z cvt.rn.f32.s16 %r15457, %rs544; 2026-02-21T09:05:41.1216830Z cvt.rn.f32.s16 %r15458, %rs542; 2026-02-21T09:05:41.1216896Z cvt.rn.f32.s16 %r15459, %rs540; 2026-02-21T09:05:41.1216956Z cvt.rn.f32.s16 %r15460, %rs538; 2026-02-21T09:05:41.1217147Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1217211Z cvt.s16.s8 %rs545, %rs525; 2026-02-21T09:05:41.1217274Z shr.s16 %rs546, %rs545, 4; 2026-02-21T09:05:41.1217339Z cvt.s16.s8 %rs547, %rs526; 2026-02-21T09:05:41.1217400Z shr.s16 %rs548, %rs547, 4; 2026-02-21T09:05:41.1217482Z prmt.b32 %r15461, %r15439, 0, 0xaaa2U; 2026-02-21T09:05:41.1217544Z cvt.u16.u32 %rs549, %r15461; 2026-02-21T09:05:41.1217607Z shr.s16 %rs550, %rs549, 4; 2026-02-21T09:05:41.1217676Z prmt.b32 %r15462, %r15444, 0, 0xaaa2U; 2026-02-21T09:05:41.1217737Z cvt.u16.u32 %rs551, %r15462; 2026-02-21T09:05:41.1217798Z shr.s16 %rs552, %rs551, 4; 2026-02-21T09:05:41.1217997Z .loc 1 84 32 // 
c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1218061Z cvt.rn.f32.s16 %r15463, %rs552; 2026-02-21T09:05:41.1218123Z cvt.rn.f32.s16 %r15464, %rs550; 2026-02-21T09:05:41.1218186Z cvt.rn.f32.s16 %r15465, %rs548; 2026-02-21T09:05:41.1218250Z cvt.rn.f32.s16 %r15466, %rs546; 2026-02-21T09:05:41.1218456Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1218521Z cvt.s16.s8 %rs553, %rs527; 2026-02-21T09:05:41.1218584Z shr.s16 %rs554, %rs553, 4; 2026-02-21T09:05:41.1218644Z cvt.s16.s8 %rs555, %rs528; 2026-02-21T09:05:41.1218712Z shr.s16 %rs556, %rs555, 4; 2026-02-21T09:05:41.1218862Z prmt.b32 %r15467, %r15439, 0, 0xbbb3U; 2026-02-21T09:05:41.1218930Z cvt.u16.u32 %rs557, %r15467; 2026-02-21T09:05:41.1218991Z shr.s16 %rs558, %rs557, 4; 2026-02-21T09:05:41.1219058Z prmt.b32 %r15468, %r15444, 0, 0xbbb3U; 2026-02-21T09:05:41.1219128Z cvt.u16.u32 %rs559, %r15468; 2026-02-21T09:05:41.1219190Z shr.s16 %rs560, %rs559, 4; 2026-02-21T09:05:41.1219386Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1219453Z cvt.rn.f32.s16 %r15469, %rs560; 2026-02-21T09:05:41.1219519Z cvt.rn.f32.s16 %r15470, %rs558; 2026-02-21T09:05:41.1219584Z cvt.rn.f32.s16 %r15471, %rs556; 2026-02-21T09:05:41.1219646Z cvt.rn.f32.s16 %r15472, %rs554; 2026-02-21T09:05:41.1219708Z bar.sync 0; 2026-02-21T09:05:41.1219961Z st.shared.v4.b32 [%r24], {%r15454, %r15452, %r15453, %r15451}; 2026-02-21T09:05:41.1220079Z st.shared.v4.b32 [%r25], {%r15460, %r15458, %r15459, %r15457}; 2026-02-21T09:05:41.1220191Z st.shared.v4.b32 [%r26], {%r15466, %r15464, %r15465, %r15463}; 2026-02-21T09:05:41.1220303Z st.shared.v4.b32 [%r27], {%r15472, %r15470, %r15471, %r15469}; 2026-02-21T09:05:41.1220359Z $L__tmp13: 2026-02-21T09:05:41.1220640Z .loc 2 291 36 // standard.py:291:36 @[ c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:91:40 ] 2026-02-21T09:05:41.1220704Z // begin inline asm 
2026-02-21T09:05:41.1220782Z fence.proxy.async.shared::cta; 2026-02-21T09:05:41.1220841Z // end inline asm 2026-02-21T09:05:41.1220903Z bar.sync 0; 2026-02-21T09:05:41.1220977Z wgmma.fence.sync.aligned; 2026-02-21T09:05:41.1221048Z // begin inline asm 2026-02-21T09:05:41.1222622Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29353,%r29354,%r29355,%r29356,%r29357,%r29358,%r29359,%r29360,%r29361,%r29362,%r29363,%r29364,%r29365,%r29366,%r29367,%r29368,%r29369,%r29370,%r29371,%r29372,%r29373,%r29374,%r29375,%r29376,%r29377,%r29378,%r29379,%r29380,%r29381,%r29382,%r29383,%r29384,%r29385,%r29386,%r29387,%r29388,%r29389,%r29390,%r29391,%r29392,%r29393,%r29394,%r29395,%r29396,%r29397,%r29398,%r29399,%r29400,%r29401,%r29402,%r29403,%r29404,%r29405,%r29406,%r29407,%r29408,%r29409,%r29410,%r29411,%r29412,%r29413,%r29414,%r29415,%r29416}, {%r12313,%r12314,%r12315,%r12316}, %rd2, %p56, 1, 1; 2026-02-21T09:05:41.1222687Z // end inline asm 2026-02-21T09:05:41.1222748Z // begin inline asm 2026-02-21T09:05:41.1224236Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29353,%r29354,%r29355,%r29356,%r29357,%r29358,%r29359,%r29360,%r29361,%r29362,%r29363,%r29364,%r29365,%r29366,%r29367,%r29368,%r29369,%r29370,%r29371,%r29372,%r29373,%r29374,%r29375,%r29376,%r29377,%r29378,%r29379,%r29380,%r29381,%r29382,%r29383,%r29384,%r29385,%r29386,%r29387,%r29388,%r29389,%r29390,%r29391,%r29392,%r29393,%r29394,%r29395,%r29396,%r29397,%r29398,%r29399,%r29400,%r29401,%r29402,%r29403,%r29404,%r29405,%r29406,%r29407,%r29408,%r29409,%r29410,%r29411,%r29412,%r29413,%r29414,%r29415,%r29416}, {%r12445,%r12446,%r12447,%r12448}, %rd3, %p56, 1, 1; 2026-02-21T09:05:41.1224297Z // end inline asm 2026-02-21T09:05:41.1224361Z // begin inline asm 2026-02-21T09:05:41.1225845Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29417,%r29418,%r29419,%r29420,%r29421,%r29422,%r29423,%r29424,%r29425,%r29426,%r29427,%r29428,%r29429,%r29430,%r29431,%r29432,%r29433,%r29434,%r29435,%r29436,%r29437,%r29438,%r29439,%r29440,%r29441,%r29442,%r29443,%r29444,%r29445,%r29446,%r29447,%r29448,%r29449,%r29450,%r29451,%r29452,%r29453,%r29454,%r29455,%r29456,%r29457,%r29458,%r29459,%r29460,%r29461,%r29462,%r29463,%r29464,%r29465,%r29466,%r29467,%r29468,%r29469,%r29470,%r29471,%r29472,%r29473,%r29474,%r29475,%r29476,%r29477,%r29478,%r29479,%r29480}, {%r12577,%r12578,%r12579,%r12580}, %rd2, %p56, 1, 1; 2026-02-21T09:05:41.1225919Z // end inline asm 2026-02-21T09:05:41.1225988Z // begin inline asm 2026-02-21T09:05:41.1227571Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29417,%r29418,%r29419,%r29420,%r29421,%r29422,%r29423,%r29424,%r29425,%r29426,%r29427,%r29428,%r29429,%r29430,%r29431,%r29432,%r29433,%r29434,%r29435,%r29436,%r29437,%r29438,%r29439,%r29440,%r29441,%r29442,%r29443,%r29444,%r29445,%r29446,%r29447,%r29448,%r29449,%r29450,%r29451,%r29452,%r29453,%r29454,%r29455,%r29456,%r29457,%r29458,%r29459,%r29460,%r29461,%r29462,%r29463,%r29464,%r29465,%r29466,%r29467,%r29468,%r29469,%r29470,%r29471,%r29472,%r29473,%r29474,%r29475,%r29476,%r29477,%r29478,%r29479,%r29480}, {%r12709,%r12710,%r12711,%r12712}, %rd3, %p56, 1, 1; 2026-02-21T09:05:41.1227710Z // end inline asm 2026-02-21T09:05:41.1227770Z // begin inline asm 2026-02-21T09:05:41.1229390Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29481,%r29482,%r29483,%r29484,%r29485,%r29486,%r29487,%r29488,%r29489,%r29490,%r29491,%r29492,%r29493,%r29494,%r29495,%r29496,%r29497,%r29498,%r29499,%r29500,%r29501,%r29502,%r29503,%r29504,%r29505,%r29506,%r29507,%r29508,%r29509,%r29510,%r29511,%r29512,%r29513,%r29514,%r29515,%r29516,%r29517,%r29518,%r29519,%r29520,%r29521,%r29522,%r29523,%r29524,%r29525,%r29526,%r29527,%r29528,%r29529,%r29530,%r29531,%r29532,%r29533,%r29534,%r29535,%r29536,%r29537,%r29538,%r29539,%r29540,%r29541,%r29542,%r29543,%r29544}, {%r12841,%r12842,%r12843,%r12844}, %rd2, %p56, 1, 1; 2026-02-21T09:05:41.1229520Z // end inline asm 2026-02-21T09:05:41.1229586Z // begin inline asm 2026-02-21T09:05:41.1231137Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29481,%r29482,%r29483,%r29484,%r29485,%r29486,%r29487,%r29488,%r29489,%r29490,%r29491,%r29492,%r29493,%r29494,%r29495,%r29496,%r29497,%r29498,%r29499,%r29500,%r29501,%r29502,%r29503,%r29504,%r29505,%r29506,%r29507,%r29508,%r29509,%r29510,%r29511,%r29512,%r29513,%r29514,%r29515,%r29516,%r29517,%r29518,%r29519,%r29520,%r29521,%r29522,%r29523,%r29524,%r29525,%r29526,%r29527,%r29528,%r29529,%r29530,%r29531,%r29532,%r29533,%r29534,%r29535,%r29536,%r29537,%r29538,%r29539,%r29540,%r29541,%r29542,%r29543,%r29544}, {%r12973,%r12974,%r12975,%r12976}, %rd3, %p56, 1, 1; 2026-02-21T09:05:41.1231209Z // end inline asm 2026-02-21T09:05:41.1231269Z // begin inline asm 2026-02-21T09:05:41.1232760Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29545,%r29546,%r29547,%r29548,%r29549,%r29550,%r29551,%r29552,%r29553,%r29554,%r29555,%r29556,%r29557,%r29558,%r29559,%r29560,%r29561,%r29562,%r29563,%r29564,%r29565,%r29566,%r29567,%r29568,%r29569,%r29570,%r29571,%r29572,%r29573,%r29574,%r29575,%r29576,%r29577,%r29578,%r29579,%r29580,%r29581,%r29582,%r29583,%r29584,%r29585,%r29586,%r29587,%r29588,%r29589,%r29590,%r29591,%r29592,%r29593,%r29594,%r29595,%r29596,%r29597,%r29598,%r29599,%r29600,%r29601,%r29602,%r29603,%r29604,%r29605,%r29606,%r29607,%r29608}, {%r13105,%r13106,%r13107,%r13108}, %rd2, %p56, 1, 1; 2026-02-21T09:05:41.1232821Z // end inline asm 2026-02-21T09:05:41.1232881Z // begin inline asm 2026-02-21T09:05:41.1234368Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29545,%r29546,%r29547,%r29548,%r29549,%r29550,%r29551,%r29552,%r29553,%r29554,%r29555,%r29556,%r29557,%r29558,%r29559,%r29560,%r29561,%r29562,%r29563,%r29564,%r29565,%r29566,%r29567,%r29568,%r29569,%r29570,%r29571,%r29572,%r29573,%r29574,%r29575,%r29576,%r29577,%r29578,%r29579,%r29580,%r29581,%r29582,%r29583,%r29584,%r29585,%r29586,%r29587,%r29588,%r29589,%r29590,%r29591,%r29592,%r29593,%r29594,%r29595,%r29596,%r29597,%r29598,%r29599,%r29600,%r29601,%r29602,%r29603,%r29604,%r29605,%r29606,%r29607,%r29608}, {%r13237,%r13238,%r13239,%r13240}, %rd3, %p56, 1, 1; 2026-02-21T09:05:41.1234428Z // end inline asm 2026-02-21T09:05:41.1234508Z wgmma.commit_group.sync.aligned; 2026-02-21T09:05:41.1234576Z mov.b32 %r13497, %r24001; 2026-02-21T09:05:41.1234637Z mov.b32 %r13498, %r15091; 2026-02-21T09:05:41.1234695Z mov.b32 %r13499, %r15091; 2026-02-21T09:05:41.1234760Z // begin inline asm 2026-02-21T09:05:41.1239940Z // wait for regs: 
%r29353,%r29354,%r29355,%r29356,%r29357,%r29358,%r29359,%r29360,%r29361,%r29362,%r29363,%r29364,%r29365,%r29366,%r29367,%r29368,%r29369,%r29370,%r29371,%r29372,%r29373,%r29374,%r29375,%r29376,%r29377,%r29378,%r29379,%r29380,%r29381,%r29382,%r29383,%r29384,%r29385,%r29386,%r29387,%r29388,%r29389,%r29390,%r29391,%r29392,%r29393,%r29394,%r29395,%r29396,%r29397,%r29398,%r29399,%r29400,%r29401,%r29402,%r29403,%r29404,%r29405,%r29406,%r29407,%r29408,%r29409,%r29410,%r29411,%r29412,%r29413,%r29414,%r29415,%r29416,%r29417,%r29418,%r29419,%r29420,%r29421,%r29422,%r29423,%r29424,%r29425,%r29426,%r29427,%r29428,%r29429,%r29430,%r29431,%r29432,%r29433,%r29434,%r29435,%r29436,%r29437,%r29438,%r29439,%r29440,%r29441,%r29442,%r29443,%r29444,%r29445,%r29446,%r29447,%r29448,%r29449,%r29450,%r29451,%r29452,%r29453,%r29454,%r29455,%r29456,%r29457,%r29458,%r29459,%r29460,%r29461,%r29462,%r29463,%r29464,%r29465,%r29466,%r29467,%r29468,%r29469,%r29470,%r29471,%r29472,%r29473,%r29474,%r29475,%r29476,%r29477,%r29478,%r29479,%r29480,%r29481,%r29482,%r29483,%r29484,%r29485,%r29486,%r29487,%r29488,%r29489,%r29490,%r29491,%r29492,%r29493,%r29494,%r29495,%r29496,%r29497,%r29498,%r29499,%r29500,%r29501,%r29502,%r29503,%r29504,%r29505,%r29506,%r29507,%r29508,%r29509,%r29510,%r29511,%r29512,%r29513,%r29514,%r29515,%r29516,%r29517,%r29518,%r29519,%r29520,%r29521,%r29522,%r29523,%r29524,%r29525,%r29526,%r29527,%r29528,%r29529,%r29530,%r29531,%r29532,%r29533,%r29534,%r29535,%r29536,%r29537,%r29538,%r29539,%r29540,%r29541,%r29542,%r29543,%r29544,%r29545,%r29546,%r29547,%r29548,%r29549,%r29550,%r29551,%r29552,%r29553,%r29554,%r29555,%r29556,%r29557,%r29558,%r29559,%r29560,%r29561,%r29562,%r29563,%r29564,%r29565,%r29566,%r29567,%r29568,%r29569,%r29570,%r29571,%r29572,%r29573,%r29574,%r29575,%r29576,%r29577,%r29578,%r29579,%r29580,%r29581,%r29582,%r29583,%r29584,%r29585,%r29586,%r29587,%r29588,%r29589,%r29590,%r29591,%r29592,%r29593,%r29594,%r29595,%r29596,%r29597,%r29598,%r29599,%r29600,%r29601,%r29602,
%r29603,%r29604,%r29605,%r29606,%r29607,%r29608,%r13497,%r13498,%r13499 2026-02-21T09:05:41.1240174Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:05:41.1240296Z // end inline asm 2026-02-21T09:05:41.1240359Z $L__tmp14: 2026-02-21T09:05:41.1240569Z .loc 1 55 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:32 2026-02-21T09:05:41.1240637Z add.s64 %rd166, %rd127, 96; 2026-02-21T09:05:41.1240707Z add.s64 %rd167, %rd128, 96; 2026-02-21T09:05:41.1240768Z add.s64 %rd168, %rd129, 96; 2026-02-21T09:05:41.1240979Z .loc 1 55 80 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:80 2026-02-21T09:05:41.1241047Z add.s64 %rd169, %rd130, 96; 2026-02-21T09:05:41.1241107Z // begin inline asm 2026-02-21T09:05:41.1241170Z mov.u32 %r13759, 0x0; 2026-02-21T09:05:41.1241230Z mov.u32 %r13760, 0x0; 2026-02-21T09:05:41.1241294Z mov.u32 %r13761, 0x0; 2026-02-21T09:05:41.1241350Z mov.u32 %r13762, 0x0; 2026-02-21T09:05:41.1241492Z ld.global.v4.b32 { %r13759, %r13760, %r13761, %r13762 }, [ %rd166 + 0 ]; 2026-02-21T09:05:41.1241560Z // end inline asm 2026-02-21T09:05:41.1241623Z // begin inline asm 2026-02-21T09:05:41.1241682Z mov.u32 %r13763, 0x0; 2026-02-21T09:05:41.1241739Z mov.u32 %r13764, 0x0; 2026-02-21T09:05:41.1241802Z mov.u32 %r13765, 0x0; 2026-02-21T09:05:41.1241862Z mov.u32 %r13766, 0x0; 2026-02-21T09:05:41.1241992Z ld.global.v4.b32 { %r13763, %r13764, %r13765, %r13766 }, [ %rd167 + 0 ]; 2026-02-21T09:05:41.1242053Z // end inline asm 2026-02-21T09:05:41.1242113Z // begin inline asm 2026-02-21T09:05:41.1242171Z mov.u32 %r13767, 0x0; 2026-02-21T09:05:41.1242236Z mov.u32 %r13768, 0x0; 2026-02-21T09:05:41.1242294Z mov.u32 %r13769, 0x0; 2026-02-21T09:05:41.1242352Z mov.u32 %r13770, 0x0; 2026-02-21T09:05:41.1242480Z ld.global.v4.b32 { %r13767, %r13768, %r13769, %r13770 }, [ %rd168 + 0 ]; 2026-02-21T09:05:41.1242544Z // end inline asm 2026-02-21T09:05:41.1242605Z // begin inline asm 2026-02-21T09:05:41.1242663Z mov.u32 %r13771, 0x0; 2026-02-21T09:05:41.1242724Z 
mov.u32 %r13772, 0x0; 2026-02-21T09:05:41.1242784Z mov.u32 %r13773, 0x0; 2026-02-21T09:05:41.1242859Z mov.u32 %r13774, 0x0; 2026-02-21T09:05:41.1242988Z ld.global.v4.b32 { %r13771, %r13772, %r13773, %r13774 }, [ %rd169 + 0 ]; 2026-02-21T09:05:41.1243052Z // end inline asm 2026-02-21T09:05:41.1243319Z .loc 1 59 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:59:32 2026-02-21T09:05:41.1243378Z bar.sync 0; 2026-02-21T09:05:41.1243468Z st.shared.v2.b32 [%r10], {%r13759, %r13760}; 2026-02-21T09:05:41.1243563Z st.shared.v2.b32 [%r10+2048], {%r13763, %r13764}; 2026-02-21T09:05:41.1243652Z st.shared.v2.b32 [%r10+4096], {%r13767, %r13768}; 2026-02-21T09:05:41.1243738Z st.shared.v2.b32 [%r10+6144], {%r13771, %r13772}; 2026-02-21T09:05:41.1243818Z st.shared.v2.b32 [%r11], {%r13761, %r13762}; 2026-02-21T09:05:41.1243901Z st.shared.v2.b32 [%r11+2048], {%r13765, %r13766}; 2026-02-21T09:05:41.1243986Z st.shared.v2.b32 [%r11+4096], {%r13769, %r13770}; 2026-02-21T09:05:41.1244074Z st.shared.v2.b32 [%r11+6144], {%r13773, %r13774}; 2026-02-21T09:05:41.1244227Z bar.sync 0; 2026-02-21T09:05:41.1244300Z ld.shared.b16 %rs561, [%r12]; 2026-02-21T09:05:41.1244374Z ld.shared.b16 %rs562, [%r12+256]; 2026-02-21T09:05:41.1244444Z ld.shared.b16 %rs563, [%r12+16]; 2026-02-21T09:05:41.1244512Z ld.shared.b16 %rs564, [%r12+272]; 2026-02-21T09:05:41.1244581Z ld.shared.b16 %rs565, [%r12+2048]; 2026-02-21T09:05:41.1244654Z ld.shared.b16 %rs566, [%r12+2304]; 2026-02-21T09:05:41.1244718Z ld.shared.b16 %rs567, [%r12+2064]; 2026-02-21T09:05:41.1244783Z ld.shared.b16 %rs568, [%r12+2320]; 2026-02-21T09:05:41.1244853Z ld.shared.b16 %rs569, [%r12+4096]; 2026-02-21T09:05:41.1244916Z ld.shared.b16 %rs570, [%r12+4352]; 2026-02-21T09:05:41.1244981Z ld.shared.b16 %rs571, [%r12+4112]; 2026-02-21T09:05:41.1245050Z ld.shared.b16 %rs572, [%r12+4368]; 2026-02-21T09:05:41.1245118Z ld.shared.b16 %rs573, [%r12+6144]; 2026-02-21T09:05:41.1245184Z ld.shared.b16 %rs574, [%r12+6400]; 2026-02-21T09:05:41.1245253Z 
ld.shared.b16 %rs575, [%r12+6160]; 2026-02-21T09:05:41.1245376Z ld.shared.b16 %rs576, [%r12+6416]; 2026-02-21T09:05:41.1245446Z ld.shared.b16 %rs577, [%r13]; 2026-02-21T09:05:41.1245512Z ld.shared.b16 %rs578, [%r13+256]; 2026-02-21T09:05:41.1245584Z ld.shared.b16 %rs579, [%r13+16]; 2026-02-21T09:05:41.1245648Z ld.shared.b16 %rs580, [%r13+272]; 2026-02-21T09:05:41.1245712Z ld.shared.b16 %rs581, [%r13+2048]; 2026-02-21T09:05:41.1245776Z ld.shared.b16 %rs582, [%r13+2304]; 2026-02-21T09:05:41.1245852Z ld.shared.b16 %rs583, [%r13+2064]; 2026-02-21T09:05:41.1245916Z ld.shared.b16 %rs584, [%r13+2320]; 2026-02-21T09:05:41.1245980Z ld.shared.b16 %rs585, [%r13+4096]; 2026-02-21T09:05:41.1246049Z ld.shared.b16 %rs586, [%r13+4352]; 2026-02-21T09:05:41.1246112Z ld.shared.b16 %rs587, [%r13+4112]; 2026-02-21T09:05:41.1246189Z ld.shared.b16 %rs588, [%r13+4368]; 2026-02-21T09:05:41.1246257Z ld.shared.b16 %rs589, [%r13+6144]; 2026-02-21T09:05:41.1246329Z ld.shared.b16 %rs590, [%r13+6400]; 2026-02-21T09:05:41.1246397Z ld.shared.b16 %rs591, [%r13+6160]; 2026-02-21T09:05:41.1246572Z ld.shared.b16 %rs592, [%r13+6416]; 2026-02-21T09:05:41.1246644Z cvt.f32.bf16 %r13905, %rs561; 2026-02-21T09:05:41.1246707Z cvt.f32.bf16 %r13906, %rs562; 2026-02-21T09:05:41.1246771Z cvt.f32.bf16 %r13907, %rs577; 2026-02-21T09:05:41.1246837Z cvt.f32.bf16 %r13908, %rs578; 2026-02-21T09:05:41.1246897Z cvt.f32.bf16 %r14037, %rs563; 2026-02-21T09:05:41.1246959Z cvt.f32.bf16 %r14038, %rs564; 2026-02-21T09:05:41.1247018Z cvt.f32.bf16 %r14039, %rs579; 2026-02-21T09:05:41.1247084Z cvt.f32.bf16 %r14040, %rs580; 2026-02-21T09:05:41.1247147Z cvt.f32.bf16 %r14169, %rs565; 2026-02-21T09:05:41.1247208Z cvt.f32.bf16 %r14170, %rs566; 2026-02-21T09:05:41.1247272Z cvt.f32.bf16 %r14171, %rs581; 2026-02-21T09:05:41.1247332Z cvt.f32.bf16 %r14172, %rs582; 2026-02-21T09:05:41.1247393Z cvt.f32.bf16 %r14301, %rs567; 2026-02-21T09:05:41.1247454Z cvt.f32.bf16 %r14302, %rs568; 2026-02-21T09:05:41.1247520Z cvt.f32.bf16 %r14303, %rs583; 
2026-02-21T09:05:41.1247584Z cvt.f32.bf16 %r14304, %rs584; 2026-02-21T09:05:41.1247646Z cvt.f32.bf16 %r14433, %rs569; 2026-02-21T09:05:41.1247712Z cvt.f32.bf16 %r14434, %rs570; 2026-02-21T09:05:41.1247774Z cvt.f32.bf16 %r14435, %rs585; 2026-02-21T09:05:41.1247934Z cvt.f32.bf16 %r14436, %rs586; 2026-02-21T09:05:41.1247995Z cvt.f32.bf16 %r14565, %rs571; 2026-02-21T09:05:41.1248061Z cvt.f32.bf16 %r14566, %rs572; 2026-02-21T09:05:41.1248121Z cvt.f32.bf16 %r14567, %rs587; 2026-02-21T09:05:41.1248182Z cvt.f32.bf16 %r14568, %rs588; 2026-02-21T09:05:41.1248247Z cvt.f32.bf16 %r14697, %rs573; 2026-02-21T09:05:41.1248306Z cvt.f32.bf16 %r14698, %rs574; 2026-02-21T09:05:41.1248367Z cvt.f32.bf16 %r14699, %rs589; 2026-02-21T09:05:41.1248432Z cvt.f32.bf16 %r14700, %rs590; 2026-02-21T09:05:41.1248493Z cvt.f32.bf16 %r14829, %rs575; 2026-02-21T09:05:41.1248552Z cvt.f32.bf16 %r14830, %rs576; 2026-02-21T09:05:41.1248612Z cvt.f32.bf16 %r14831, %rs591; 2026-02-21T09:05:41.1248757Z cvt.f32.bf16 %r14832, %rs592; 2026-02-21T09:05:41.1249027Z .loc 1 61 34 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:34 2026-02-21T09:05:41.1249092Z add.s32 %r15473, %r29352, 196608; 2026-02-21T09:05:41.1249166Z cvt.s64.s32 %rd182, %r15473; 2026-02-21T09:05:41.1249232Z add.s64 %rd170, %rd45, %rd182; 2026-02-21T09:05:41.1249431Z .loc 1 61 87 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:87 2026-02-21T09:05:41.1249492Z // begin inline asm 2026-02-21T09:05:41.1249556Z mov.u32 %r13775, 0x0; 2026-02-21T09:05:41.1249615Z mov.u32 %r13776, 0x0; 2026-02-21T09:05:41.1249714Z ld.global.v2.b32 { %r13775, %r13776 }, [ %rd170 + 0 ]; 2026-02-21T09:05:41.1249777Z // end inline asm 2026-02-21T09:05:41.1249971Z .loc 1 69 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:69:28 2026-02-21T09:05:41.1250029Z bar.sync 0; 2026-02-21T09:05:41.1250099Z st.shared.b8 [%r14], %r13775; 2026-02-21T09:05:41.1250175Z prmt.b32 %r15474, %r13775, 0, 0x7771U; 2026-02-21T09:05:41.1250304Z st.shared.b8 
[%r15], %r15474; 2026-02-21T09:05:41.1250374Z prmt.b32 %r15475, %r13775, 0, 0x7772U; 2026-02-21T09:05:41.1250444Z st.shared.b8 [%r16+256], %r15475; 2026-02-21T09:05:41.1250514Z prmt.b32 %r15476, %r13775, 0, 0x7773U; 2026-02-21T09:05:41.1250578Z st.shared.b8 [%r17+256], %r15476; 2026-02-21T09:05:41.1250645Z st.shared.b8 [%r18+512], %r13776; 2026-02-21T09:05:41.1250712Z prmt.b32 %r15477, %r13776, 0, 0x7771U; 2026-02-21T09:05:41.1250777Z st.shared.b8 [%r19+512], %r15477; 2026-02-21T09:05:41.1250845Z prmt.b32 %r15478, %r13776, 0, 0x7772U; 2026-02-21T09:05:41.1250911Z st.shared.b8 [%r20+768], %r15478; 2026-02-21T09:05:41.1250977Z prmt.b32 %r15479, %r13776, 0, 0x7773U; 2026-02-21T09:05:41.1251055Z st.shared.b8 [%r21+768], %r15479; 2026-02-21T09:05:41.1251118Z bar.sync 0; 2026-02-21T09:05:41.1251186Z ld.shared.b32 %r15480, [%r22]; 2026-02-21T09:05:41.1251254Z prmt.b32 %r15481, %r15480, 0, 0x7770U; 2026-02-21T09:05:41.1251325Z cvt.u16.u32 %rs593, %r15481; 2026-02-21T09:05:41.1251392Z prmt.b32 %r15482, %r15480, 0, 0x7771U; 2026-02-21T09:05:41.1251455Z cvt.u16.u32 %rs594, %r15482; 2026-02-21T09:05:41.1251520Z prmt.b32 %r15483, %r15480, 0, 0x7772U; 2026-02-21T09:05:41.1251586Z cvt.u16.u32 %rs595, %r15483; 2026-02-21T09:05:41.1251656Z prmt.b32 %r15484, %r15480, 0, 0x7773U; 2026-02-21T09:05:41.1251718Z cvt.u16.u32 %rs596, %r15484; 2026-02-21T09:05:41.1251787Z ld.shared.b32 %r15485, [%r23]; 2026-02-21T09:05:41.1251853Z prmt.b32 %r15486, %r15485, 0, 0x7770U; 2026-02-21T09:05:41.1251915Z cvt.u16.u32 %rs597, %r15486; 2026-02-21T09:05:41.1251982Z prmt.b32 %r15487, %r15485, 0, 0x7771U; 2026-02-21T09:05:41.1252047Z cvt.u16.u32 %rs598, %r15487; 2026-02-21T09:05:41.1252112Z prmt.b32 %r15488, %r15485, 0, 0x7772U; 2026-02-21T09:05:41.1252174Z cvt.u16.u32 %rs599, %r15488; 2026-02-21T09:05:41.1252249Z prmt.b32 %r15489, %r15485, 0, 0x7773U; 2026-02-21T09:05:41.1252311Z cvt.u16.u32 %rs600, %r15489; 2026-02-21T09:05:41.1252516Z .loc 1 64 28 // 
c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:64:28 2026-02-21T09:05:41.1252585Z shl.b16 %rs601, %rs593, 4; 2026-02-21T09:05:41.1252649Z shl.b16 %rs602, %rs597, 4; 2026-02-21T09:05:41.1252776Z shl.b16 %rs603, %rs594, 4; 2026-02-21T09:05:41.1252837Z shl.b16 %rs604, %rs598, 4; 2026-02-21T09:05:41.1252906Z shl.b16 %rs605, %rs595, 4; 2026-02-21T09:05:41.1252968Z shl.b16 %rs606, %rs599, 4; 2026-02-21T09:05:41.1253029Z shl.b16 %rs607, %rs596, 4; 2026-02-21T09:05:41.1253092Z shl.b16 %rs608, %rs600, 4; 2026-02-21T09:05:41.1253291Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1253356Z cvt.s16.s8 %rs609, %rs601; 2026-02-21T09:05:41.1253417Z shr.s16 %rs610, %rs609, 4; 2026-02-21T09:05:41.1253484Z cvt.s16.s8 %rs611, %rs602; 2026-02-21T09:05:41.1253544Z shr.s16 %rs612, %rs611, 4; 2026-02-21T09:05:41.1253611Z prmt.b32 %r15490, %r15480, 0, 0x8880U; 2026-02-21T09:05:41.1253779Z cvt.u16.u32 %rs613, %r15490; 2026-02-21T09:05:41.1253844Z shr.s16 %rs614, %rs613, 4; 2026-02-21T09:05:41.1253909Z prmt.b32 %r15491, %r15485, 0, 0x8880U; 2026-02-21T09:05:41.1253972Z cvt.u16.u32 %rs615, %r15491; 2026-02-21T09:05:41.1254040Z shr.s16 %rs616, %rs615, 4; 2026-02-21T09:05:41.1254235Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1254301Z cvt.rn.f32.s16 %r15492, %rs616; 2026-02-21T09:05:41.1254369Z cvt.rn.f32.s16 %r15493, %rs614; 2026-02-21T09:05:41.1254430Z cvt.rn.f32.s16 %r15494, %rs612; 2026-02-21T09:05:41.1254493Z cvt.rn.f32.s16 %r15495, %rs610; 2026-02-21T09:05:41.1254691Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1254753Z cvt.s16.s8 %rs617, %rs603; 2026-02-21T09:05:41.1254813Z shr.s16 %rs618, %rs617, 4; 2026-02-21T09:05:41.1254873Z cvt.s16.s8 %rs619, %rs604; 2026-02-21T09:05:41.1254950Z shr.s16 %rs620, %rs619, 4; 2026-02-21T09:05:41.1255073Z prmt.b32 %r15496, %r15480, 0, 0x9991U; 2026-02-21T09:05:41.1255139Z cvt.u16.u32 
%rs621, %r15496; 2026-02-21T09:05:41.1255210Z shr.s16 %rs622, %rs621, 4; 2026-02-21T09:05:41.1255280Z prmt.b32 %r15497, %r15485, 0, 0x9991U; 2026-02-21T09:05:41.1255342Z cvt.u16.u32 %rs623, %r15497; 2026-02-21T09:05:41.1255407Z shr.s16 %rs624, %rs623, 4; 2026-02-21T09:05:41.1255602Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1255664Z cvt.rn.f32.s16 %r15498, %rs624; 2026-02-21T09:05:41.1255727Z cvt.rn.f32.s16 %r15499, %rs622; 2026-02-21T09:05:41.1255793Z cvt.rn.f32.s16 %r15500, %rs620; 2026-02-21T09:05:41.1255864Z cvt.rn.f32.s16 %r15501, %rs618; 2026-02-21T09:05:41.1256060Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1256134Z cvt.s16.s8 %rs625, %rs605; 2026-02-21T09:05:41.1256203Z shr.s16 %rs626, %rs625, 4; 2026-02-21T09:05:41.1256268Z cvt.s16.s8 %rs627, %rs606; 2026-02-21T09:05:41.1256330Z shr.s16 %rs628, %rs627, 4; 2026-02-21T09:05:41.1256402Z prmt.b32 %r15502, %r15480, 0, 0xaaa2U; 2026-02-21T09:05:41.1256578Z cvt.u16.u32 %rs629, %r15502; 2026-02-21T09:05:41.1256644Z shr.s16 %rs630, %rs629, 4; 2026-02-21T09:05:41.1256717Z prmt.b32 %r15503, %r15485, 0, 0xaaa2U; 2026-02-21T09:05:41.1256781Z cvt.u16.u32 %rs631, %r15503; 2026-02-21T09:05:41.1256842Z shr.s16 %rs632, %rs631, 4; 2026-02-21T09:05:41.1257057Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1257123Z cvt.rn.f32.s16 %r15504, %rs632; 2026-02-21T09:05:41.1257187Z cvt.rn.f32.s16 %r15505, %rs630; 2026-02-21T09:05:41.1257250Z cvt.rn.f32.s16 %r15506, %rs628; 2026-02-21T09:05:41.1257316Z cvt.rn.f32.s16 %r15507, %rs626; 2026-02-21T09:05:41.1257511Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1257576Z cvt.s16.s8 %rs633, %rs607; 2026-02-21T09:05:41.1257642Z shr.s16 %rs634, %rs633, 4; 2026-02-21T09:05:41.1257705Z cvt.s16.s8 %rs635, %rs608; 2026-02-21T09:05:41.1257765Z shr.s16 %rs636, %rs635, 4; 
2026-02-21T09:05:41.1257934Z prmt.b32 %r15508, %r15480, 0, 0xbbb3U; 2026-02-21T09:05:41.1258000Z cvt.u16.u32 %rs637, %r15508; 2026-02-21T09:05:41.1258063Z shr.s16 %rs638, %rs637, 4; 2026-02-21T09:05:41.1258128Z prmt.b32 %r15509, %r15485, 0, 0xbbb3U; 2026-02-21T09:05:41.1258192Z cvt.u16.u32 %rs639, %r15509; 2026-02-21T09:05:41.1258253Z shr.s16 %rs640, %rs639, 4; 2026-02-21T09:05:41.1258448Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1258516Z cvt.rn.f32.s16 %r15510, %rs640; 2026-02-21T09:05:41.1258580Z cvt.rn.f32.s16 %r15511, %rs638; 2026-02-21T09:05:41.1258644Z cvt.rn.f32.s16 %r15512, %rs636; 2026-02-21T09:05:41.1258706Z cvt.rn.f32.s16 %r15513, %rs634; 2026-02-21T09:05:41.1258835Z bar.sync 0; 2026-02-21T09:05:41.1259010Z st.shared.v4.b32 [%r24], {%r15495, %r15493, %r15494, %r15492}; 2026-02-21T09:05:41.1259124Z st.shared.v4.b32 [%r25], {%r15501, %r15499, %r15500, %r15498}; 2026-02-21T09:05:41.1259242Z st.shared.v4.b32 [%r26], {%r15507, %r15505, %r15506, %r15504}; 2026-02-21T09:05:41.1259350Z st.shared.v4.b32 [%r27], {%r15513, %r15511, %r15512, %r15510}; 2026-02-21T09:05:41.1259405Z $L__tmp15: 2026-02-21T09:05:41.1259684Z .loc 2 291 36 // standard.py:291:36 @[ c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:91:40 ] 2026-02-21T09:05:41.1259744Z // begin inline asm 2026-02-21T09:05:41.1259822Z fence.proxy.async.shared::cta; 2026-02-21T09:05:41.1259881Z // end inline asm 2026-02-21T09:05:41.1259941Z bar.sync 0; 2026-02-21T09:05:41.1260014Z wgmma.fence.sync.aligned; 2026-02-21T09:05:41.1260075Z // begin inline asm 2026-02-21T09:05:41.1261667Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29353,%r29354,%r29355,%r29356,%r29357,%r29358,%r29359,%r29360,%r29361,%r29362,%r29363,%r29364,%r29365,%r29366,%r29367,%r29368,%r29369,%r29370,%r29371,%r29372,%r29373,%r29374,%r29375,%r29376,%r29377,%r29378,%r29379,%r29380,%r29381,%r29382,%r29383,%r29384,%r29385,%r29386,%r29387,%r29388,%r29389,%r29390,%r29391,%r29392,%r29393,%r29394,%r29395,%r29396,%r29397,%r29398,%r29399,%r29400,%r29401,%r29402,%r29403,%r29404,%r29405,%r29406,%r29407,%r29408,%r29409,%r29410,%r29411,%r29412,%r29413,%r29414,%r29415,%r29416}, {%r13905,%r13906,%r13907,%r13908}, %rd2, %p56, 1, 1; 2026-02-21T09:05:41.1261735Z // end inline asm 2026-02-21T09:05:41.1261795Z // begin inline asm 2026-02-21T09:05:41.1263288Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29353,%r29354,%r29355,%r29356,%r29357,%r29358,%r29359,%r29360,%r29361,%r29362,%r29363,%r29364,%r29365,%r29366,%r29367,%r29368,%r29369,%r29370,%r29371,%r29372,%r29373,%r29374,%r29375,%r29376,%r29377,%r29378,%r29379,%r29380,%r29381,%r29382,%r29383,%r29384,%r29385,%r29386,%r29387,%r29388,%r29389,%r29390,%r29391,%r29392,%r29393,%r29394,%r29395,%r29396,%r29397,%r29398,%r29399,%r29400,%r29401,%r29402,%r29403,%r29404,%r29405,%r29406,%r29407,%r29408,%r29409,%r29410,%r29411,%r29412,%r29413,%r29414,%r29415,%r29416}, {%r14037,%r14038,%r14039,%r14040}, %rd3, %p56, 1, 1; 2026-02-21T09:05:41.1263350Z // end inline asm 2026-02-21T09:05:41.1263416Z // begin inline asm 2026-02-21T09:05:41.1264903Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29417,%r29418,%r29419,%r29420,%r29421,%r29422,%r29423,%r29424,%r29425,%r29426,%r29427,%r29428,%r29429,%r29430,%r29431,%r29432,%r29433,%r29434,%r29435,%r29436,%r29437,%r29438,%r29439,%r29440,%r29441,%r29442,%r29443,%r29444,%r29445,%r29446,%r29447,%r29448,%r29449,%r29450,%r29451,%r29452,%r29453,%r29454,%r29455,%r29456,%r29457,%r29458,%r29459,%r29460,%r29461,%r29462,%r29463,%r29464,%r29465,%r29466,%r29467,%r29468,%r29469,%r29470,%r29471,%r29472,%r29473,%r29474,%r29475,%r29476,%r29477,%r29478,%r29479,%r29480}, {%r14169,%r14170,%r14171,%r14172}, %rd2, %p56, 1, 1; 2026-02-21T09:05:41.1264964Z // end inline asm 2026-02-21T09:05:41.1265031Z // begin inline asm 2026-02-21T09:05:41.1266611Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29417,%r29418,%r29419,%r29420,%r29421,%r29422,%r29423,%r29424,%r29425,%r29426,%r29427,%r29428,%r29429,%r29430,%r29431,%r29432,%r29433,%r29434,%r29435,%r29436,%r29437,%r29438,%r29439,%r29440,%r29441,%r29442,%r29443,%r29444,%r29445,%r29446,%r29447,%r29448,%r29449,%r29450,%r29451,%r29452,%r29453,%r29454,%r29455,%r29456,%r29457,%r29458,%r29459,%r29460,%r29461,%r29462,%r29463,%r29464,%r29465,%r29466,%r29467,%r29468,%r29469,%r29470,%r29471,%r29472,%r29473,%r29474,%r29475,%r29476,%r29477,%r29478,%r29479,%r29480}, {%r14301,%r14302,%r14303,%r14304}, %rd3, %p56, 1, 1; 2026-02-21T09:05:41.1266764Z // end inline asm 2026-02-21T09:05:41.1266826Z // begin inline asm 2026-02-21T09:05:41.1268435Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29481,%r29482,%r29483,%r29484,%r29485,%r29486,%r29487,%r29488,%r29489,%r29490,%r29491,%r29492,%r29493,%r29494,%r29495,%r29496,%r29497,%r29498,%r29499,%r29500,%r29501,%r29502,%r29503,%r29504,%r29505,%r29506,%r29507,%r29508,%r29509,%r29510,%r29511,%r29512,%r29513,%r29514,%r29515,%r29516,%r29517,%r29518,%r29519,%r29520,%r29521,%r29522,%r29523,%r29524,%r29525,%r29526,%r29527,%r29528,%r29529,%r29530,%r29531,%r29532,%r29533,%r29534,%r29535,%r29536,%r29537,%r29538,%r29539,%r29540,%r29541,%r29542,%r29543,%r29544}, {%r14433,%r14434,%r14435,%r14436}, %rd2, %p56, 1, 1; 2026-02-21T09:05:41.1268568Z // end inline asm 2026-02-21T09:05:41.1268630Z // begin inline asm 2026-02-21T09:05:41.1270175Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29481,%r29482,%r29483,%r29484,%r29485,%r29486,%r29487,%r29488,%r29489,%r29490,%r29491,%r29492,%r29493,%r29494,%r29495,%r29496,%r29497,%r29498,%r29499,%r29500,%r29501,%r29502,%r29503,%r29504,%r29505,%r29506,%r29507,%r29508,%r29509,%r29510,%r29511,%r29512,%r29513,%r29514,%r29515,%r29516,%r29517,%r29518,%r29519,%r29520,%r29521,%r29522,%r29523,%r29524,%r29525,%r29526,%r29527,%r29528,%r29529,%r29530,%r29531,%r29532,%r29533,%r29534,%r29535,%r29536,%r29537,%r29538,%r29539,%r29540,%r29541,%r29542,%r29543,%r29544}, {%r14565,%r14566,%r14567,%r14568}, %rd3, %p56, 1, 1; 2026-02-21T09:05:41.1270244Z // end inline asm 2026-02-21T09:05:41.1270306Z // begin inline asm 2026-02-21T09:05:41.1271795Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29545,%r29546,%r29547,%r29548,%r29549,%r29550,%r29551,%r29552,%r29553,%r29554,%r29555,%r29556,%r29557,%r29558,%r29559,%r29560,%r29561,%r29562,%r29563,%r29564,%r29565,%r29566,%r29567,%r29568,%r29569,%r29570,%r29571,%r29572,%r29573,%r29574,%r29575,%r29576,%r29577,%r29578,%r29579,%r29580,%r29581,%r29582,%r29583,%r29584,%r29585,%r29586,%r29587,%r29588,%r29589,%r29590,%r29591,%r29592,%r29593,%r29594,%r29595,%r29596,%r29597,%r29598,%r29599,%r29600,%r29601,%r29602,%r29603,%r29604,%r29605,%r29606,%r29607,%r29608}, {%r14697,%r14698,%r14699,%r14700}, %rd2, %p56, 1, 1; 2026-02-21T09:05:41.1271853Z // end inline asm 2026-02-21T09:05:41.1271910Z // begin inline asm 2026-02-21T09:05:41.1273396Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29545,%r29546,%r29547,%r29548,%r29549,%r29550,%r29551,%r29552,%r29553,%r29554,%r29555,%r29556,%r29557,%r29558,%r29559,%r29560,%r29561,%r29562,%r29563,%r29564,%r29565,%r29566,%r29567,%r29568,%r29569,%r29570,%r29571,%r29572,%r29573,%r29574,%r29575,%r29576,%r29577,%r29578,%r29579,%r29580,%r29581,%r29582,%r29583,%r29584,%r29585,%r29586,%r29587,%r29588,%r29589,%r29590,%r29591,%r29592,%r29593,%r29594,%r29595,%r29596,%r29597,%r29598,%r29599,%r29600,%r29601,%r29602,%r29603,%r29604,%r29605,%r29606,%r29607,%r29608}, {%r14829,%r14830,%r14831,%r14832}, %rd3, %p56, 1, 1; 2026-02-21T09:05:41.1273458Z // end inline asm 2026-02-21T09:05:41.1273542Z wgmma.commit_group.sync.aligned; 2026-02-21T09:05:41.1273606Z mov.b32 %r15089, %r24001; 2026-02-21T09:05:41.1273665Z mov.b32 %r15090, %r15091; 2026-02-21T09:05:41.1273723Z // begin inline asm 2026-02-21T09:05:41.1278889Z // wait for regs: 
%r29353,%r29354,%r29355,%r29356,%r29357,%r29358,%r29359,%r29360,%r29361,%r29362,%r29363,%r29364,%r29365,%r29366,%r29367,%r29368,%r29369,%r29370,%r29371,%r29372,%r29373,%r29374,%r29375,%r29376,%r29377,%r29378,%r29379,%r29380,%r29381,%r29382,%r29383,%r29384,%r29385,%r29386,%r29387,%r29388,%r29389,%r29390,%r29391,%r29392,%r29393,%r29394,%r29395,%r29396,%r29397,%r29398,%r29399,%r29400,%r29401,%r29402,%r29403,%r29404,%r29405,%r29406,%r29407,%r29408,%r29409,%r29410,%r29411,%r29412,%r29413,%r29414,%r29415,%r29416,%r29417,%r29418,%r29419,%r29420,%r29421,%r29422,%r29423,%r29424,%r29425,%r29426,%r29427,%r29428,%r29429,%r29430,%r29431,%r29432,%r29433,%r29434,%r29435,%r29436,%r29437,%r29438,%r29439,%r29440,%r29441,%r29442,%r29443,%r29444,%r29445,%r29446,%r29447,%r29448,%r29449,%r29450,%r29451,%r29452,%r29453,%r29454,%r29455,%r29456,%r29457,%r29458,%r29459,%r29460,%r29461,%r29462,%r29463,%r29464,%r29465,%r29466,%r29467,%r29468,%r29469,%r29470,%r29471,%r29472,%r29473,%r29474,%r29475,%r29476,%r29477,%r29478,%r29479,%r29480,%r29481,%r29482,%r29483,%r29484,%r29485,%r29486,%r29487,%r29488,%r29489,%r29490,%r29491,%r29492,%r29493,%r29494,%r29495,%r29496,%r29497,%r29498,%r29499,%r29500,%r29501,%r29502,%r29503,%r29504,%r29505,%r29506,%r29507,%r29508,%r29509,%r29510,%r29511,%r29512,%r29513,%r29514,%r29515,%r29516,%r29517,%r29518,%r29519,%r29520,%r29521,%r29522,%r29523,%r29524,%r29525,%r29526,%r29527,%r29528,%r29529,%r29530,%r29531,%r29532,%r29533,%r29534,%r29535,%r29536,%r29537,%r29538,%r29539,%r29540,%r29541,%r29542,%r29543,%r29544,%r29545,%r29546,%r29547,%r29548,%r29549,%r29550,%r29551,%r29552,%r29553,%r29554,%r29555,%r29556,%r29557,%r29558,%r29559,%r29560,%r29561,%r29562,%r29563,%r29564,%r29565,%r29566,%r29567,%r29568,%r29569,%r29570,%r29571,%r29572,%r29573,%r29574,%r29575,%r29576,%r29577,%r29578,%r29579,%r29580,%r29581,%r29582,%r29583,%r29584,%r29585,%r29586,%r29587,%r29588,%r29589,%r29590,%r29591,%r29592,%r29593,%r29594,%r29595,%r29596,%r29597,%r29598,%r29599,%r29600,%r29601,%r29602,
%r29603,%r29604,%r29605,%r29606,%r29607,%r29608,%r15089,%r15090,%r15091 2026-02-21T09:05:41.1279099Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:05:41.1279215Z // end inline asm 2026-02-21T09:05:41.1279279Z $L__tmp16: 2026-02-21T09:05:41.1279502Z .loc 1 47 111 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:47:111 2026-02-21T09:05:41.1279569Z add.s64 %rd302, %rd302, 32; 2026-02-21T09:05:41.1279636Z add.s32 %r29352, %r29352, 262144; 2026-02-21T09:05:41.1279704Z add.s64 %rd301, %rd301, 128; 2026-02-21T09:05:41.1279770Z setp.lt.u64 %p88, %rd302, 480; 2026-02-21T09:05:41.1279831Z @%p88 bra $L__BB0_5; 2026-02-21T09:05:41.1279951Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:41.1280161Z .loc 1 94 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:94:28 2026-02-21T09:05:41.1280251Z cvt.rn.bf16x2.f32 %r15518, %r29354, %r29353; 2026-02-21T09:05:41.1280341Z cvt.rn.bf16x2.f32 %r15519, %r29356, %r29355; 2026-02-21T09:05:41.1280421Z cvt.rn.bf16x2.f32 %r15520, %r29358, %r29357; 2026-02-21T09:05:41.1280501Z cvt.rn.bf16x2.f32 %r15521, %r29360, %r29359; 2026-02-21T09:05:41.1280580Z cvt.rn.bf16x2.f32 %r15522, %r29362, %r29361; 2026-02-21T09:05:41.1280663Z cvt.rn.bf16x2.f32 %r15523, %r29364, %r29363; 2026-02-21T09:05:41.1280740Z cvt.rn.bf16x2.f32 %r15524, %r29366, %r29365; 2026-02-21T09:05:41.1280816Z cvt.rn.bf16x2.f32 %r15525, %r29368, %r29367; 2026-02-21T09:05:41.1280896Z cvt.rn.bf16x2.f32 %r15526, %r29370, %r29369; 2026-02-21T09:05:41.1280972Z cvt.rn.bf16x2.f32 %r15527, %r29372, %r29371; 2026-02-21T09:05:41.1281048Z cvt.rn.bf16x2.f32 %r15528, %r29374, %r29373; 2026-02-21T09:05:41.1281128Z cvt.rn.bf16x2.f32 %r15529, %r29376, %r29375; 2026-02-21T09:05:41.1281206Z cvt.rn.bf16x2.f32 %r15530, %r29378, %r29377; 2026-02-21T09:05:41.1281296Z cvt.rn.bf16x2.f32 %r15531, %r29380, %r29379; 2026-02-21T09:05:41.1281376Z cvt.rn.bf16x2.f32 %r15532, %r29382, %r29381; 2026-02-21T09:05:41.1281458Z cvt.rn.bf16x2.f32 %r15533, %r29384, %r29383; 
2026-02-21T09:05:41.1281533Z cvt.rn.bf16x2.f32 %r15534, %r29386, %r29385; 2026-02-21T09:05:41.1281614Z cvt.rn.bf16x2.f32 %r15535, %r29388, %r29387; 2026-02-21T09:05:41.1281694Z cvt.rn.bf16x2.f32 %r15536, %r29390, %r29389; 2026-02-21T09:05:41.1281771Z cvt.rn.bf16x2.f32 %r15537, %r29392, %r29391; 2026-02-21T09:05:41.1281905Z cvt.rn.bf16x2.f32 %r15538, %r29394, %r29393; 2026-02-21T09:05:41.1281986Z cvt.rn.bf16x2.f32 %r15539, %r29396, %r29395; 2026-02-21T09:05:41.1282063Z cvt.rn.bf16x2.f32 %r15540, %r29398, %r29397; 2026-02-21T09:05:41.1282139Z cvt.rn.bf16x2.f32 %r15541, %r29400, %r29399; 2026-02-21T09:05:41.1282214Z cvt.rn.bf16x2.f32 %r15542, %r29402, %r29401; 2026-02-21T09:05:41.1282297Z cvt.rn.bf16x2.f32 %r15543, %r29404, %r29403; 2026-02-21T09:05:41.1282372Z cvt.rn.bf16x2.f32 %r15544, %r29406, %r29405; 2026-02-21T09:05:41.1282448Z cvt.rn.bf16x2.f32 %r15545, %r29408, %r29407; 2026-02-21T09:05:41.1282528Z cvt.rn.bf16x2.f32 %r15546, %r29410, %r29409; 2026-02-21T09:05:41.1282605Z cvt.rn.bf16x2.f32 %r15547, %r29412, %r29411; 2026-02-21T09:05:41.1282783Z cvt.rn.bf16x2.f32 %r15548, %r29414, %r29413; 2026-02-21T09:05:41.1282863Z cvt.rn.bf16x2.f32 %r15549, %r29416, %r29415; 2026-02-21T09:05:41.1282946Z cvt.rn.bf16x2.f32 %r15550, %r29418, %r29417; 2026-02-21T09:05:41.1283025Z cvt.rn.bf16x2.f32 %r15551, %r29420, %r29419; 2026-02-21T09:05:41.1283101Z cvt.rn.bf16x2.f32 %r15552, %r29422, %r29421; 2026-02-21T09:05:41.1283183Z cvt.rn.bf16x2.f32 %r15553, %r29424, %r29423; 2026-02-21T09:05:41.1283257Z cvt.rn.bf16x2.f32 %r15554, %r29426, %r29425; 2026-02-21T09:05:41.1283334Z cvt.rn.bf16x2.f32 %r15555, %r29428, %r29427; 2026-02-21T09:05:41.1283414Z cvt.rn.bf16x2.f32 %r15556, %r29430, %r29429; 2026-02-21T09:05:41.1283490Z cvt.rn.bf16x2.f32 %r15557, %r29432, %r29431; 2026-02-21T09:05:41.1283566Z cvt.rn.bf16x2.f32 %r15558, %r29434, %r29433; 2026-02-21T09:05:41.1283641Z cvt.rn.bf16x2.f32 %r15559, %r29436, %r29435; 2026-02-21T09:05:41.1283720Z cvt.rn.bf16x2.f32 %r15560, %r29438, %r29437; 
2026-02-21T09:05:41.1283798Z cvt.rn.bf16x2.f32 %r15561, %r29440, %r29439; 2026-02-21T09:05:41.1283942Z cvt.rn.bf16x2.f32 %r15562, %r29442, %r29441; 2026-02-21T09:05:41.1284026Z cvt.rn.bf16x2.f32 %r15563, %r29444, %r29443; 2026-02-21T09:05:41.1284101Z cvt.rn.bf16x2.f32 %r15564, %r29446, %r29445; 2026-02-21T09:05:41.1284177Z cvt.rn.bf16x2.f32 %r15565, %r29448, %r29447; 2026-02-21T09:05:41.1284258Z cvt.rn.bf16x2.f32 %r15566, %r29450, %r29449; 2026-02-21T09:05:41.1284335Z cvt.rn.bf16x2.f32 %r15567, %r29452, %r29451; 2026-02-21T09:05:41.1284413Z cvt.rn.bf16x2.f32 %r15568, %r29454, %r29453; 2026-02-21T09:05:41.1284488Z cvt.rn.bf16x2.f32 %r15569, %r29456, %r29455; 2026-02-21T09:05:41.1284570Z cvt.rn.bf16x2.f32 %r15570, %r29458, %r29457; 2026-02-21T09:05:41.1284650Z cvt.rn.bf16x2.f32 %r15571, %r29460, %r29459; 2026-02-21T09:05:41.1284725Z cvt.rn.bf16x2.f32 %r15572, %r29462, %r29461; 2026-02-21T09:05:41.1284821Z cvt.rn.bf16x2.f32 %r15573, %r29464, %r29463; 2026-02-21T09:05:41.1284901Z cvt.rn.bf16x2.f32 %r15574, %r29466, %r29465; 2026-02-21T09:05:41.1284982Z cvt.rn.bf16x2.f32 %r15575, %r29468, %r29467; 2026-02-21T09:05:41.1285066Z cvt.rn.bf16x2.f32 %r15576, %r29470, %r29469; 2026-02-21T09:05:41.1285144Z cvt.rn.bf16x2.f32 %r15577, %r29472, %r29471; 2026-02-21T09:05:41.1285223Z cvt.rn.bf16x2.f32 %r15578, %r29474, %r29473; 2026-02-21T09:05:41.1285301Z cvt.rn.bf16x2.f32 %r15579, %r29476, %r29475; 2026-02-21T09:05:41.1285386Z cvt.rn.bf16x2.f32 %r15580, %r29478, %r29477; 2026-02-21T09:05:41.1285462Z cvt.rn.bf16x2.f32 %r15581, %r29480, %r29479; 2026-02-21T09:05:41.1285539Z cvt.rn.bf16x2.f32 %r15582, %r29482, %r29481; 2026-02-21T09:05:41.1285620Z cvt.rn.bf16x2.f32 %r15583, %r29484, %r29483; 2026-02-21T09:05:41.1285698Z cvt.rn.bf16x2.f32 %r15584, %r29486, %r29485; 2026-02-21T09:05:41.1285776Z cvt.rn.bf16x2.f32 %r15585, %r29488, %r29487; 2026-02-21T09:05:41.1285863Z cvt.rn.bf16x2.f32 %r15586, %r29490, %r29489; 2026-02-21T09:05:41.1285942Z cvt.rn.bf16x2.f32 %r15587, %r29492, %r29491; 
2026-02-21T09:05:41.1286023Z cvt.rn.bf16x2.f32 %r15588, %r29494, %r29493; 2026-02-21T09:05:41.1286104Z cvt.rn.bf16x2.f32 %r15589, %r29496, %r29495; 2026-02-21T09:05:41.1286186Z cvt.rn.bf16x2.f32 %r15590, %r29498, %r29497; 2026-02-21T09:05:41.1286264Z cvt.rn.bf16x2.f32 %r15591, %r29500, %r29499; 2026-02-21T09:05:41.1286398Z cvt.rn.bf16x2.f32 %r15592, %r29502, %r29501; 2026-02-21T09:05:41.1286604Z cvt.rn.bf16x2.f32 %r15593, %r29504, %r29503; 2026-02-21T09:05:41.1286690Z cvt.rn.bf16x2.f32 %r15594, %r29506, %r29505; 2026-02-21T09:05:41.1286767Z cvt.rn.bf16x2.f32 %r15595, %r29508, %r29507; 2026-02-21T09:05:41.1286848Z cvt.rn.bf16x2.f32 %r15596, %r29510, %r29509; 2026-02-21T09:05:41.1286924Z cvt.rn.bf16x2.f32 %r15597, %r29512, %r29511; 2026-02-21T09:05:41.1287002Z cvt.rn.bf16x2.f32 %r15598, %r29514, %r29513; 2026-02-21T09:05:41.1287079Z cvt.rn.bf16x2.f32 %r15599, %r29516, %r29515; 2026-02-21T09:05:41.1287160Z cvt.rn.bf16x2.f32 %r15600, %r29518, %r29517; 2026-02-21T09:05:41.1287237Z cvt.rn.bf16x2.f32 %r15601, %r29520, %r29519; 2026-02-21T09:05:41.1287455Z cvt.rn.bf16x2.f32 %r15602, %r29522, %r29521; 2026-02-21T09:05:41.1287540Z cvt.rn.bf16x2.f32 %r15603, %r29524, %r29523; 2026-02-21T09:05:41.1287617Z cvt.rn.bf16x2.f32 %r15604, %r29526, %r29525; 2026-02-21T09:05:41.1287709Z cvt.rn.bf16x2.f32 %r15605, %r29528, %r29527; 2026-02-21T09:05:41.1287791Z cvt.rn.bf16x2.f32 %r15606, %r29530, %r29529; 2026-02-21T09:05:41.1287867Z cvt.rn.bf16x2.f32 %r15607, %r29532, %r29531; 2026-02-21T09:05:41.1287943Z cvt.rn.bf16x2.f32 %r15608, %r29534, %r29533; 2026-02-21T09:05:41.1288019Z cvt.rn.bf16x2.f32 %r15609, %r29536, %r29535; 2026-02-21T09:05:41.1288104Z cvt.rn.bf16x2.f32 %r15610, %r29538, %r29537; 2026-02-21T09:05:41.1288180Z cvt.rn.bf16x2.f32 %r15611, %r29540, %r29539; 2026-02-21T09:05:41.1288256Z cvt.rn.bf16x2.f32 %r15612, %r29542, %r29541; 2026-02-21T09:05:41.1288335Z cvt.rn.bf16x2.f32 %r15613, %r29544, %r29543; 2026-02-21T09:05:41.1288412Z cvt.rn.bf16x2.f32 %r15614, %r29546, %r29545; 
2026-02-21T09:05:41.1288492Z cvt.rn.bf16x2.f32 %r15615, %r29548, %r29547; 2026-02-21T09:05:41.1288633Z cvt.rn.bf16x2.f32 %r15616, %r29550, %r29549; 2026-02-21T09:05:41.1288719Z cvt.rn.bf16x2.f32 %r15617, %r29552, %r29551; 2026-02-21T09:05:41.1288796Z cvt.rn.bf16x2.f32 %r15618, %r29554, %r29553; 2026-02-21T09:05:41.1288876Z cvt.rn.bf16x2.f32 %r15619, %r29556, %r29555; 2026-02-21T09:05:41.1288958Z cvt.rn.bf16x2.f32 %r15620, %r29558, %r29557; 2026-02-21T09:05:41.1289034Z cvt.rn.bf16x2.f32 %r15621, %r29560, %r29559; 2026-02-21T09:05:41.1289109Z cvt.rn.bf16x2.f32 %r15622, %r29562, %r29561; 2026-02-21T09:05:41.1289188Z cvt.rn.bf16x2.f32 %r15623, %r29564, %r29563; 2026-02-21T09:05:41.1289265Z cvt.rn.bf16x2.f32 %r15624, %r29566, %r29565; 2026-02-21T09:05:41.1289341Z cvt.rn.bf16x2.f32 %r15625, %r29568, %r29567; 2026-02-21T09:05:41.1289417Z cvt.rn.bf16x2.f32 %r15626, %r29570, %r29569; 2026-02-21T09:05:41.1289499Z cvt.rn.bf16x2.f32 %r15627, %r29572, %r29571; 2026-02-21T09:05:41.1289574Z cvt.rn.bf16x2.f32 %r15628, %r29574, %r29573; 2026-02-21T09:05:41.1289656Z cvt.rn.bf16x2.f32 %r15629, %r29576, %r29575; 2026-02-21T09:05:41.1289737Z cvt.rn.bf16x2.f32 %r15630, %r29578, %r29577; 2026-02-21T09:05:41.1289812Z cvt.rn.bf16x2.f32 %r15631, %r29580, %r29579; 2026-02-21T09:05:41.1289891Z cvt.rn.bf16x2.f32 %r15632, %r29582, %r29581; 2026-02-21T09:05:41.1289974Z cvt.rn.bf16x2.f32 %r15633, %r29584, %r29583; 2026-02-21T09:05:41.1290063Z cvt.rn.bf16x2.f32 %r15634, %r29586, %r29585; 2026-02-21T09:05:41.1290144Z cvt.rn.bf16x2.f32 %r15635, %r29588, %r29587; 2026-02-21T09:05:41.1290223Z cvt.rn.bf16x2.f32 %r15636, %r29590, %r29589; 2026-02-21T09:05:41.1290306Z cvt.rn.bf16x2.f32 %r15637, %r29592, %r29591; 2026-02-21T09:05:41.1290382Z cvt.rn.bf16x2.f32 %r15638, %r29594, %r29593; 2026-02-21T09:05:41.1290459Z cvt.rn.bf16x2.f32 %r15639, %r29596, %r29595; 2026-02-21T09:05:41.1290542Z cvt.rn.bf16x2.f32 %r15640, %r29598, %r29597; 2026-02-21T09:05:41.1290618Z cvt.rn.bf16x2.f32 %r15641, %r29600, %r29599; 
2026-02-21T09:05:41.1290699Z cvt.rn.bf16x2.f32 %r15642, %r29602, %r29601; 2026-02-21T09:05:41.1290781Z cvt.rn.bf16x2.f32 %r15643, %r29604, %r29603; 2026-02-21T09:05:41.1290859Z cvt.rn.bf16x2.f32 %r15644, %r29606, %r29605; 2026-02-21T09:05:41.1290935Z cvt.rn.bf16x2.f32 %r15645, %r29608, %r29607; 2026-02-21T09:05:41.1291220Z .loc 1 95 43 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:95:43 2026-02-21T09:05:41.1291286Z bar.sync 0; 2026-02-21T09:05:41.1291486Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r28], {%r15518, %r15519, %r15520, %r15521}; 2026-02-21T09:05:41.1291675Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r29], {%r15534, %r15535, %r15536, %r15537}; 2026-02-21T09:05:41.1291866Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r30], {%r15550, %r15551, %r15552, %r15553}; 2026-02-21T09:05:41.1292049Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r31], {%r15566, %r15567, %r15568, %r15569}; 2026-02-21T09:05:41.1292229Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r32], {%r15582, %r15583, %r15584, %r15585}; 2026-02-21T09:05:41.1292520Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r33], {%r15598, %r15599, %r15600, %r15601}; 2026-02-21T09:05:41.1292702Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r34], {%r15614, %r15615, %r15616, %r15617}; 2026-02-21T09:05:41.1292886Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r35], {%r15630, %r15631, %r15632, %r15633}; 2026-02-21T09:05:41.1293071Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r36], {%r15522, %r15523, %r15524, %r15525}; 2026-02-21T09:05:41.1293252Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r37], {%r15538, %r15539, %r15540, %r15541}; 2026-02-21T09:05:41.1293434Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r38], {%r15554, %r15555, %r15556, %r15557}; 2026-02-21T09:05:41.1293624Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r39], {%r15570, %r15571, %r15572, %r15573}; 2026-02-21T09:05:41.1293805Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r40], {%r15586, %r15587, %r15588, %r15589}; 2026-02-21T09:05:41.1293988Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r41], {%r15602, %r15603, %r15604, %r15605}; 2026-02-21T09:05:41.1294220Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r42], {%r15618, %r15619, %r15620, %r15621}; 2026-02-21T09:05:41.1294408Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r43], {%r15634, %r15635, %r15636, %r15637}; 2026-02-21T09:05:41.1294606Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r44], {%r15526, %r15527, %r15528, %r15529}; 2026-02-21T09:05:41.1294790Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r45], {%r15542, %r15543, %r15544, %r15545}; 2026-02-21T09:05:41.1294976Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r46], {%r15558, %r15559, %r15560, %r15561}; 2026-02-21T09:05:41.1295157Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r47], {%r15574, %r15575, %r15576, %r15577}; 2026-02-21T09:05:41.1295336Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r48], {%r15590, %r15591, %r15592, %r15593}; 2026-02-21T09:05:41.1295523Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r49], {%r15606, %r15607, %r15608, %r15609}; 2026-02-21T09:05:41.1295705Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r50], {%r15622, %r15623, %r15624, %r15625}; 2026-02-21T09:05:41.1295889Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r51], {%r15638, %r15639, %r15640, %r15641}; 2026-02-21T09:05:41.1296083Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r52], {%r15530, %r15531, %r15532, %r15533}; 2026-02-21T09:05:41.1296269Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r53], {%r15546, %r15547, %r15548, %r15549}; 2026-02-21T09:05:41.1296567Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r54], {%r15562, %r15563, %r15564, %r15565}; 2026-02-21T09:05:41.1296763Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r55], {%r15578, %r15579, %r15580, %r15581}; 2026-02-21T09:05:41.1296947Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r56], {%r15594, %r15595, %r15596, %r15597}; 2026-02-21T09:05:41.1297128Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r57], {%r15610, %r15611, %r15612, %r15613}; 2026-02-21T09:05:41.1297313Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r58], {%r15626, %r15627, %r15628, %r15629}; 2026-02-21T09:05:41.1297498Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r59], {%r15642, %r15643, %r15644, %r15645}; 2026-02-21T09:05:41.1297561Z // begin inline asm 2026-02-21T09:05:41.1297651Z fence.proxy.async.shared::cta; 2026-02-21T09:05:41.1297799Z // end inline asm 2026-02-21T09:05:41.1297855Z bar.sync 0; 2026-02-21T09:05:41.1297926Z elect.sync %r15646|%p91, -1; 2026-02-21T09:05:41.1298002Z and.pred %p89, %p167, %p91; 2026-02-21T09:05:41.1298065Z or.b32 %r15514, %r640, %r638; 2026-02-21T09:05:41.1298126Z // begin inline asm 2026-02-21T09:05:41.1298368Z @%p89 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd183, {%r15514, %r15515}], [%r15516]; 2026-02-21T09:05:41.1298427Z // end inline asm 2026-02-21T09:05:41.1298502Z cp.async.bulk.commit_group; 2026-02-21T09:05:41.1298581Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:05:41.1298643Z bar.sync 0; 2026-02-21T09:05:41.1298859Z .loc 1 26 144 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:26:144 2026-02-21T09:05:41.1299052Z add.s32 %r15647, %r29094, 2112; 2026-02-21T09:05:41.1299264Z .loc 1 32 35 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:32:35 2026-02-21T09:05:41.1299328Z shr.s32 %r15648, %r15647, 31; 2026-02-21T09:05:41.1299393Z shr.u32 %r15649, %r15648, 25; 2026-02-21T09:05:41.1299462Z add.s32 %r15650, %r15647, %r15649; 2026-02-21T09:05:41.1299525Z shr.s32 %r15651, %r15650, 7; 2026-02-21T09:05:41.1299722Z .loc 1 33 33 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:33:33 2026-02-21T09:05:41.1299784Z shl.b32 %r15652, %r15651, 1; 2026-02-21T09:05:41.1299985Z .loc 1 34 39 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:34:39 2026-02-21T09:05:41.1300048Z sub.s32 %r15653, 16, %r15652; 2026-02-21T09:05:41.1300243Z .loc 1 34 52 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:34:52 2026-02-21T09:05:41.1300311Z min.s32 %r15654, %r15653, 2; 
2026-02-21T09:05:41.1300571Z .loc 1 35 45 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:35:45 2026-02-21T09:05:41.1300643Z and.b32 %r15655, %r15650, -128; 2026-02-21T09:05:41.1300713Z sub.s32 %r15656, %r15647, %r15655; 2026-02-21T09:05:41.1300909Z .loc 1 36 51 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:36:51 2026-02-21T09:05:41.1300972Z div.s32 %r15657, %r15656, %r15654; 2026-02-21T09:05:41.1301173Z .loc 1 35 64 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:35:64 2026-02-21T09:05:41.1301254Z mul.lo.s32 %r15658, %r15657, %r15654; 2026-02-21T09:05:41.1301318Z sub.s32 %r15659, %r15656, %r15658; 2026-02-21T09:05:41.1301513Z .loc 1 35 30 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:35:30 2026-02-21T09:05:41.1301579Z add.s32 %r15660, %r15659, %r15652; 2026-02-21T09:05:41.1301774Z .loc 1 37 27 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:37:27 2026-02-21T09:05:41.1301841Z shl.b32 %r22201, %r15660, 8; 2026-02-21T09:05:41.1302042Z .loc 1 39 27 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:39:27 2026-02-21T09:05:41.1302107Z shl.b32 %r1157, %r15657, 7; 2026-02-21T09:05:41.1302312Z .loc 1 47 111 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:47:111 2026-02-21T09:05:41.1302382Z add.s32 %r29609, %r60, %r1157; 2026-02-21T09:05:41.1302447Z or.b32 %r15661, %r29091, %r22201; 2026-02-21T09:05:41.1302508Z shl.b32 %r15662, %r15661, 10; 2026-02-21T09:05:41.1302581Z mul.wide.s32 %rd25, %r15662, 2; 2026-02-21T09:05:41.1302643Z or.b32 %r15663, %r29092, %r22201; 2026-02-21T09:05:41.1302704Z shl.b32 %r15664, %r15663, 10; 2026-02-21T09:05:41.1302768Z mul.wide.s32 %rd26, %r15664, 2; 2026-02-21T09:05:41.1302834Z or.b32 %r15665, %r29093, %r22201; 2026-02-21T09:05:41.1302897Z shl.b32 %r15666, %r15665, 10; 2026-02-21T09:05:41.1302963Z mul.wide.s32 %rd27, %r15666, 2; 2026-02-21T09:05:41.1303031Z shl.b32 %r15667, %r15660, 18; 2026-02-21T09:05:41.1303098Z or.b32 %r15668, %r64, %r15667; 
2026-02-21T09:05:41.1303165Z mul.wide.s32 %rd28, %r15668, 2; 2026-02-21T09:05:41.1303227Z mov.b32 %r29610, 0f00000000; 2026-02-21T09:05:41.1303355Z mov.b64 %rd304, -32; 2026-02-21T09:05:41.1303419Z mov.b64 %rd303, %rd4; 2026-02-21T09:05:41.1303482Z mov.b32 %r29611, %r29610; 2026-02-21T09:05:41.1303547Z mov.b32 %r29612, %r29610; 2026-02-21T09:05:41.1303609Z mov.b32 %r29613, %r29610; 2026-02-21T09:05:41.1303669Z mov.b32 %r29614, %r29610; 2026-02-21T09:05:41.1303729Z mov.b32 %r29615, %r29610; 2026-02-21T09:05:41.1303792Z mov.b32 %r29616, %r29610; 2026-02-21T09:05:41.1303851Z mov.b32 %r29617, %r29610; 2026-02-21T09:05:41.1303911Z mov.b32 %r29618, %r29610; 2026-02-21T09:05:41.1303975Z mov.b32 %r29619, %r29610; 2026-02-21T09:05:41.1304035Z mov.b32 %r29620, %r29610; 2026-02-21T09:05:41.1304095Z mov.b32 %r29621, %r29610; 2026-02-21T09:05:41.1304153Z mov.b32 %r29622, %r29610; 2026-02-21T09:05:41.1304342Z mov.b32 %r29623, %r29610; 2026-02-21T09:05:41.1304408Z mov.b32 %r29624, %r29610; 2026-02-21T09:05:41.1304468Z mov.b32 %r29625, %r29610; 2026-02-21T09:05:41.1304532Z mov.b32 %r29626, %r29610; 2026-02-21T09:05:41.1304593Z mov.b32 %r29627, %r29610; 2026-02-21T09:05:41.1304653Z mov.b32 %r29628, %r29610; 2026-02-21T09:05:41.1304713Z mov.b32 %r29629, %r29610; 2026-02-21T09:05:41.1304779Z mov.b32 %r29630, %r29610; 2026-02-21T09:05:41.1304838Z mov.b32 %r29631, %r29610; 2026-02-21T09:05:41.1304898Z mov.b32 %r29632, %r29610; 2026-02-21T09:05:41.1304962Z mov.b32 %r29633, %r29610; 2026-02-21T09:05:41.1305021Z mov.b32 %r29634, %r29610; 2026-02-21T09:05:41.1305080Z mov.b32 %r29635, %r29610; 2026-02-21T09:05:41.1305144Z mov.b32 %r29636, %r29610; 2026-02-21T09:05:41.1305206Z mov.b32 %r29637, %r29610; 2026-02-21T09:05:41.1305266Z mov.b32 %r29638, %r29610; 2026-02-21T09:05:41.1305325Z mov.b32 %r29639, %r29610; 2026-02-21T09:05:41.1305388Z mov.b32 %r29640, %r29610; 2026-02-21T09:05:41.1305451Z mov.b32 %r29641, %r29610; 2026-02-21T09:05:41.1305561Z mov.b32 %r29642, %r29610; 
2026-02-21T09:05:41.1305628Z mov.b32 %r29643, %r29610; 2026-02-21T09:05:41.1305688Z mov.b32 %r29644, %r29610; 2026-02-21T09:05:41.1305750Z mov.b32 %r29645, %r29610; 2026-02-21T09:05:41.1305810Z mov.b32 %r29646, %r29610; 2026-02-21T09:05:41.1305873Z mov.b32 %r29647, %r29610; 2026-02-21T09:05:41.1305932Z mov.b32 %r29648, %r29610; 2026-02-21T09:05:41.1305992Z mov.b32 %r29649, %r29610; 2026-02-21T09:05:41.1306057Z mov.b32 %r29650, %r29610; 2026-02-21T09:05:41.1306115Z mov.b32 %r29651, %r29610; 2026-02-21T09:05:41.1306175Z mov.b32 %r29652, %r29610; 2026-02-21T09:05:41.1306233Z mov.b32 %r29653, %r29610; 2026-02-21T09:05:41.1306310Z mov.b32 %r29654, %r29610; 2026-02-21T09:05:41.1306375Z mov.b32 %r29655, %r29610; 2026-02-21T09:05:41.1306436Z mov.b32 %r29656, %r29610; 2026-02-21T09:05:41.1306609Z mov.b32 %r29657, %r29610; 2026-02-21T09:05:41.1306672Z mov.b32 %r29658, %r29610; 2026-02-21T09:05:41.1306736Z mov.b32 %r29659, %r29610; 2026-02-21T09:05:41.1306797Z mov.b32 %r29660, %r29610; 2026-02-21T09:05:41.1306863Z mov.b32 %r29661, %r29610; 2026-02-21T09:05:41.1306923Z mov.b32 %r29662, %r29610; 2026-02-21T09:05:41.1306984Z mov.b32 %r29663, %r29610; 2026-02-21T09:05:41.1307050Z mov.b32 %r29664, %r29610; 2026-02-21T09:05:41.1307109Z mov.b32 %r29665, %r29610; 2026-02-21T09:05:41.1307168Z mov.b32 %r29666, %r29610; 2026-02-21T09:05:41.1307239Z mov.b32 %r29667, %r29610; 2026-02-21T09:05:41.1307306Z mov.b32 %r29668, %r29610; 2026-02-21T09:05:41.1307366Z mov.b32 %r29669, %r29610; 2026-02-21T09:05:41.1307425Z mov.b32 %r29670, %r29610; 2026-02-21T09:05:41.1307489Z mov.b32 %r29671, %r29610; 2026-02-21T09:05:41.1307550Z mov.b32 %r29672, %r29610; 2026-02-21T09:05:41.1307609Z mov.b32 %r29673, %r29610; 2026-02-21T09:05:41.1307667Z mov.b32 %r29674, %r29610; 2026-02-21T09:05:41.1307732Z mov.b32 %r29675, %r29610; 2026-02-21T09:05:41.1307793Z mov.b32 %r29676, %r29610; 2026-02-21T09:05:41.1307852Z mov.b32 %r29677, %r29610; 2026-02-21T09:05:41.1307917Z mov.b32 %r29678, %r29610; 
2026-02-21T09:05:41.1307976Z mov.b32 %r29679, %r29610; 2026-02-21T09:05:41.1308035Z mov.b32 %r29680, %r29610; 2026-02-21T09:05:41.1308175Z mov.b32 %r29681, %r29610; 2026-02-21T09:05:41.1308240Z mov.b32 %r29682, %r29610; 2026-02-21T09:05:41.1308380Z mov.b32 %r29683, %r29610; 2026-02-21T09:05:41.1308443Z mov.b32 %r29684, %r29610; 2026-02-21T09:05:41.1308507Z mov.b32 %r29685, %r29610; 2026-02-21T09:05:41.1308565Z mov.b32 %r29686, %r29610; 2026-02-21T09:05:41.1308624Z mov.b32 %r29687, %r29610; 2026-02-21T09:05:41.1308686Z mov.b32 %r29688, %r29610; 2026-02-21T09:05:41.1308744Z mov.b32 %r29689, %r29610; 2026-02-21T09:05:41.1308803Z mov.b32 %r29690, %r29610; 2026-02-21T09:05:41.1308863Z mov.b32 %r29691, %r29610; 2026-02-21T09:05:41.1308927Z mov.b32 %r29692, %r29610; 2026-02-21T09:05:41.1308987Z mov.b32 %r29693, %r29610; 2026-02-21T09:05:41.1309118Z mov.b32 %r29694, %r29610; 2026-02-21T09:05:41.1309239Z mov.b32 %r29695, %r29610; 2026-02-21T09:05:41.1309302Z mov.b32 %r29696, %r29610; 2026-02-21T09:05:41.1309360Z mov.b32 %r29697, %r29610; 2026-02-21T09:05:41.1309417Z mov.b32 %r29698, %r29610; 2026-02-21T09:05:41.1309483Z mov.b32 %r29699, %r29610; 2026-02-21T09:05:41.1309543Z mov.b32 %r29700, %r29610; 2026-02-21T09:05:41.1309602Z mov.b32 %r29701, %r29610; 2026-02-21T09:05:41.1309664Z mov.b32 %r29702, %r29610; 2026-02-21T09:05:41.1309723Z mov.b32 %r29703, %r29610; 2026-02-21T09:05:41.1309781Z mov.b32 %r29704, %r29610; 2026-02-21T09:05:41.1309839Z mov.b32 %r29705, %r29610; 2026-02-21T09:05:41.1309902Z mov.b32 %r29706, %r29610; 2026-02-21T09:05:41.1309961Z mov.b32 %r29707, %r29610; 2026-02-21T09:05:41.1310032Z mov.b32 %r29708, %r29610; 2026-02-21T09:05:41.1310098Z mov.b32 %r29709, %r29610; 2026-02-21T09:05:41.1310158Z mov.b32 %r29710, %r29610; 2026-02-21T09:05:41.1310217Z mov.b32 %r29711, %r29610; 2026-02-21T09:05:41.1310279Z mov.b32 %r29712, %r29610; 2026-02-21T09:05:41.1310415Z mov.b32 %r29713, %r29610; 2026-02-21T09:05:41.1310482Z mov.b32 %r29714, %r29610; 
2026-02-21T09:05:41.1310541Z mov.b32 %r29715, %r29610; 2026-02-21T09:05:41.1310606Z mov.b32 %r29716, %r29610; 2026-02-21T09:05:41.1310668Z mov.b32 %r29717, %r29610; 2026-02-21T09:05:41.1310726Z mov.b32 %r29718, %r29610; 2026-02-21T09:05:41.1310786Z mov.b32 %r29719, %r29610; 2026-02-21T09:05:41.1310847Z mov.b32 %r29720, %r29610; 2026-02-21T09:05:41.1310906Z mov.b32 %r29721, %r29610; 2026-02-21T09:05:41.1310966Z mov.b32 %r29722, %r29610; 2026-02-21T09:05:41.1311032Z mov.b32 %r29723, %r29610; 2026-02-21T09:05:41.1311090Z mov.b32 %r29724, %r29610; 2026-02-21T09:05:41.1311150Z mov.b32 %r29725, %r29610; 2026-02-21T09:05:41.1311209Z mov.b32 %r29726, %r29610; 2026-02-21T09:05:41.1311272Z mov.b32 %r29727, %r29610; 2026-02-21T09:05:41.1311332Z mov.b32 %r29728, %r29610; 2026-02-21T09:05:41.1311403Z mov.b32 %r29729, %r29610; 2026-02-21T09:05:41.1311468Z mov.b32 %r29730, %r29610; 2026-02-21T09:05:41.1311531Z mov.b32 %r29731, %r29610; 2026-02-21T09:05:41.1311592Z mov.b32 %r29732, %r29610; 2026-02-21T09:05:41.1311654Z mov.b32 %r29733, %r29610; 2026-02-21T09:05:41.1311713Z mov.b32 %r29734, %r29610; 2026-02-21T09:05:41.1311775Z mov.b32 %r29735, %r29610; 2026-02-21T09:05:41.1311836Z mov.b32 %r29736, %r29610; 2026-02-21T09:05:41.1311900Z mov.b32 %r29737, %r29610; 2026-02-21T09:05:41.1311961Z mov.b32 %r29738, %r29610; 2026-02-21T09:05:41.1312021Z mov.b32 %r29739, %r29610; 2026-02-21T09:05:41.1312086Z mov.b32 %r29740, %r29610; 2026-02-21T09:05:41.1312147Z mov.b32 %r29741, %r29610; 2026-02-21T09:05:41.1312206Z mov.b32 %r29742, %r29610; 2026-02-21T09:05:41.1312264Z mov.b32 %r29743, %r29610; 2026-02-21T09:05:41.1312329Z mov.b32 %r29744, %r29610; 2026-02-21T09:05:41.1312390Z mov.b32 %r29745, %r29610; 2026-02-21T09:05:41.1312463Z mov.b32 %r29746, %r29610; 2026-02-21T09:05:41.1312530Z mov.b32 %r29747, %r29610; 2026-02-21T09:05:41.1312592Z mov.b32 %r29748, %r29610; 2026-02-21T09:05:41.1312654Z mov.b32 %r29749, %r29610; 2026-02-21T09:05:41.1312713Z mov.b32 %r29750, %r29610; 
2026-02-21T09:05:41.1312777Z mov.b32 %r29751, %r29610; 2026-02-21T09:05:41.1312837Z mov.b32 %r29752, %r29610; 2026-02-21T09:05:41.1312957Z mov.b32 %r29753, %r29610; 2026-02-21T09:05:41.1313020Z mov.b32 %r29754, %r29610; 2026-02-21T09:05:41.1313080Z mov.b32 %r29755, %r29610; 2026-02-21T09:05:41.1313141Z mov.b32 %r29756, %r29610; 2026-02-21T09:05:41.1313201Z mov.b32 %r29757, %r29610; 2026-02-21T09:05:41.1313268Z mov.b32 %r29758, %r29610; 2026-02-21T09:05:41.1313329Z mov.b32 %r29759, %r29610; 2026-02-21T09:05:41.1313388Z mov.b32 %r29760, %r29610; 2026-02-21T09:05:41.1313453Z mov.b32 %r29761, %r29610; 2026-02-21T09:05:41.1313512Z mov.b32 %r29762, %r29610; 2026-02-21T09:05:41.1313572Z mov.b32 %r29763, %r29610; 2026-02-21T09:05:41.1313631Z mov.b32 %r29764, %r29610; 2026-02-21T09:05:41.1313693Z mov.b32 %r29765, %r29610; 2026-02-21T09:05:41.1313800Z mov.b32 %r29766, %r29610; 2026-02-21T09:05:41.1313903Z mov.b32 %r29767, %r29610; 2026-02-21T09:05:41.1313968Z mov.b32 %r29768, %r29610; 2026-02-21T09:05:41.1314027Z mov.b32 %r29769, %r29610; 2026-02-21T09:05:41.1314090Z mov.b32 %r29770, %r29610; 2026-02-21T09:05:41.1314153Z mov.b32 %r29771, %r29610; 2026-02-21T09:05:41.1314218Z mov.b32 %r29772, %r29610; 2026-02-21T09:05:41.1314276Z mov.b32 %r29773, %r29610; 2026-02-21T09:05:41.1314335Z mov.b32 %r29774, %r29610; 2026-02-21T09:05:41.1314397Z mov.b32 %r29775, %r29610; 2026-02-21T09:05:41.1314454Z mov.b32 %r29776, %r29610; 2026-02-21T09:05:41.1314512Z mov.b32 %r29777, %r29610; 2026-02-21T09:05:41.1314575Z mov.b32 %r29778, %r29610; 2026-02-21T09:05:41.1314633Z mov.b32 %r29779, %r29610; 2026-02-21T09:05:41.1314691Z mov.b32 %r29780, %r29610; 2026-02-21T09:05:41.1314749Z mov.b32 %r29781, %r29610; 2026-02-21T09:05:41.1314814Z mov.b32 %r29782, %r29610; 2026-02-21T09:05:41.1314874Z mov.b32 %r29783, %r29610; 2026-02-21T09:05:41.1314935Z mov.b32 %r29784, %r29610; 2026-02-21T09:05:41.1315049Z mov.b32 %r29785, %r29610; 2026-02-21T09:05:41.1315113Z mov.b32 %r29786, %r29610; 
2026-02-21T09:05:41.1315171Z mov.b32 %r29787, %r29610; 2026-02-21T09:05:41.1315229Z mov.b32 %r29788, %r29610; 2026-02-21T09:05:41.1315294Z mov.b32 %r29789, %r29610; 2026-02-21T09:05:41.1315352Z mov.b32 %r29790, %r29610; 2026-02-21T09:05:41.1315410Z mov.b32 %r29791, %r29610; 2026-02-21T09:05:41.1315472Z mov.b32 %r29792, %r29610; 2026-02-21T09:05:41.1315530Z mov.b32 %r29793, %r29610; 2026-02-21T09:05:41.1315588Z mov.b32 %r29794, %r29610; 2026-02-21T09:05:41.1315645Z mov.b32 %r29795, %r29610; 2026-02-21T09:05:41.1315708Z mov.b32 %r29796, %r29610; 2026-02-21T09:05:41.1315767Z mov.b32 %r29797, %r29610; 2026-02-21T09:05:41.1315828Z mov.b32 %r29798, %r29610; 2026-02-21T09:05:41.1315890Z mov.b32 %r29799, %r29610; 2026-02-21T09:05:41.1315950Z mov.b32 %r29800, %r29610; 2026-02-21T09:05:41.1316008Z mov.b32 %r29801, %r29610; 2026-02-21T09:05:41.1316068Z mov.b32 %r29802, %r29610; 2026-02-21T09:05:41.1316145Z mov.b32 %r29803, %r29610; 2026-02-21T09:05:41.1316210Z mov.b32 %r29804, %r29610; 2026-02-21T09:05:41.1316271Z mov.b32 %r29805, %r29610; 2026-02-21T09:05:41.1316336Z mov.b32 %r29806, %r29610; 2026-02-21T09:05:41.1316393Z mov.b32 %r29807, %r29610; 2026-02-21T09:05:41.1316560Z mov.b32 %r29808, %r29610; 2026-02-21T09:05:41.1316625Z mov.b32 %r29809, %r29610; 2026-02-21T09:05:41.1316689Z mov.b32 %r29810, %r29610; 2026-02-21T09:05:41.1316749Z mov.b32 %r29811, %r29610; 2026-02-21T09:05:41.1316809Z mov.b32 %r29812, %r29610; 2026-02-21T09:05:41.1316872Z mov.b32 %r29813, %r29610; 2026-02-21T09:05:41.1316931Z mov.b32 %r29814, %r29610; 2026-02-21T09:05:41.1316990Z mov.b32 %r29815, %r29610; 2026-02-21T09:05:41.1317049Z mov.b32 %r29816, %r29610; 2026-02-21T09:05:41.1317115Z mov.b32 %r29817, %r29610; 2026-02-21T09:05:41.1317174Z mov.b32 %r29818, %r29610; 2026-02-21T09:05:41.1317235Z mov.b32 %r29819, %r29610; 2026-02-21T09:05:41.1317299Z mov.b32 %r29820, %r29610; 2026-02-21T09:05:41.1317359Z mov.b32 %r29821, %r29610; 2026-02-21T09:05:41.1317417Z mov.b32 %r29822, %r29610; 
2026-02-21T09:05:41.1317480Z mov.b32 %r29823, %r29610; 2026-02-21T09:05:41.1317636Z mov.b32 %r29824, %r29610; 2026-02-21T09:05:41.1317695Z mov.b32 %r29825, %r29610; 2026-02-21T09:05:41.1317754Z mov.b32 %r29826, %r29610; 2026-02-21T09:05:41.1317818Z mov.b32 %r29827, %r29610; 2026-02-21T09:05:41.1317877Z mov.b32 %r29828, %r29610; 2026-02-21T09:05:41.1317937Z mov.b32 %r29829, %r29610; 2026-02-21T09:05:41.1318001Z mov.b32 %r29830, %r29610; 2026-02-21T09:05:41.1318060Z mov.b32 %r29831, %r29610; 2026-02-21T09:05:41.1318119Z mov.b32 %r29832, %r29610; 2026-02-21T09:05:41.1318177Z mov.b32 %r29833, %r29610; 2026-02-21T09:05:41.1318241Z mov.b32 %r29834, %r29610; 2026-02-21T09:05:41.1318299Z mov.b32 %r29835, %r29610; 2026-02-21T09:05:41.1318358Z mov.b32 %r29836, %r29610; 2026-02-21T09:05:41.1318422Z mov.b32 %r29837, %r29610; 2026-02-21T09:05:41.1318559Z mov.b32 %r29838, %r29610; 2026-02-21T09:05:41.1326787Z mov.b32 %r29839, %r29610; 2026-02-21T09:05:41.1326927Z mov.b32 %r29840, %r29610; 2026-02-21T09:05:41.1327005Z mov.b32 %r29841, %r29610; 2026-02-21T09:05:41.1327082Z mov.b32 %r29842, %r29610; 2026-02-21T09:05:41.1327157Z mov.b32 %r29843, %r29610; 2026-02-21T09:05:41.1327228Z mov.b32 %r29844, %r29610; 2026-02-21T09:05:41.1327288Z mov.b32 %r29845, %r29610; 2026-02-21T09:05:41.1327354Z mov.b32 %r29846, %r29610; 2026-02-21T09:05:41.1327416Z mov.b32 %r29847, %r29610; 2026-02-21T09:05:41.1327476Z mov.b32 %r29848, %r29610; 2026-02-21T09:05:41.1327538Z mov.b32 %r29849, %r29610; 2026-02-21T09:05:41.1327602Z mov.b32 %r29850, %r29610; 2026-02-21T09:05:41.1327660Z mov.b32 %r29851, %r29610; 2026-02-21T09:05:41.1327722Z mov.b32 %r29852, %r29610; 2026-02-21T09:05:41.1327786Z mov.b32 %r29853, %r29610; 2026-02-21T09:05:41.1327845Z mov.b32 %r29854, %r29610; 2026-02-21T09:05:41.1327904Z mov.b32 %r29855, %r29610; 2026-02-21T09:05:41.1327967Z mov.b32 %r29856, %r29610; 2026-02-21T09:05:41.1328122Z mov.b32 %r29857, %r29610; 2026-02-21T09:05:41.1328190Z mov.b32 %r29858, %r29610; 
2026-02-21T09:05:41.1328250Z mov.b32 %r29859, %r29610; 2026-02-21T09:05:41.1328316Z mov.b32 %r29860, %r29610; 2026-02-21T09:05:41.1328376Z mov.b32 %r29861, %r29610; 2026-02-21T09:05:41.1328434Z mov.b32 %r29862, %r29610; 2026-02-21T09:05:41.1328497Z mov.b32 %r29863, %r29610; 2026-02-21T09:05:41.1328556Z mov.b32 %r29864, %r29610; 2026-02-21T09:05:41.1328615Z mov.b32 %r29865, %r29610; 2026-02-21T09:05:41.1328753Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:05:41.1328876Z // => This Inner Loop Header: Depth=2 2026-02-21T09:05:41.1329107Z .loc 1 55 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:32 2026-02-21T09:05:41.1329182Z add.s64 %rd185, %rd303, %rd28; 2026-02-21T09:05:41.1329252Z add.s64 %rd186, %rd303, %rd27; 2026-02-21T09:05:41.1329321Z add.s64 %rd187, %rd303, %rd26; 2026-02-21T09:05:41.1329538Z .loc 1 55 80 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:80 2026-02-21T09:05:41.1329610Z add.s64 %rd188, %rd303, %rd25; 2026-02-21T09:05:41.1329675Z // begin inline asm 2026-02-21T09:05:41.1329738Z mov.u32 %r15669, 0x0; 2026-02-21T09:05:41.1329796Z mov.u32 %r15670, 0x0; 2026-02-21T09:05:41.1329860Z mov.u32 %r15671, 0x0; 2026-02-21T09:05:41.1329918Z mov.u32 %r15672, 0x0; 2026-02-21T09:05:41.1330066Z ld.global.v4.b32 { %r15669, %r15670, %r15671, %r15672 }, [ %rd185 + 0 ]; 2026-02-21T09:05:41.1330136Z // end inline asm 2026-02-21T09:05:41.1330199Z // begin inline asm 2026-02-21T09:05:41.1330259Z mov.u32 %r15673, 0x0; 2026-02-21T09:05:41.1330319Z mov.u32 %r15674, 0x0; 2026-02-21T09:05:41.1330386Z mov.u32 %r15675, 0x0; 2026-02-21T09:05:41.1330446Z mov.u32 %r15676, 0x0; 2026-02-21T09:05:41.1330586Z ld.global.v4.b32 { %r15673, %r15674, %r15675, %r15676 }, [ %rd186 + 0 ]; 2026-02-21T09:05:41.1330660Z // end inline asm 2026-02-21T09:05:41.1330722Z // begin inline asm 2026-02-21T09:05:41.1330780Z mov.u32 %r15677, 0x0; 2026-02-21T09:05:41.1330838Z mov.u32 %r15678, 0x0; 2026-02-21T09:05:41.1330901Z mov.u32 %r15679, 0x0; 
2026-02-21T09:05:41.1331055Z mov.u32 %r15680, 0x0; 2026-02-21T09:05:41.1331195Z ld.global.v4.b32 { %r15677, %r15678, %r15679, %r15680 }, [ %rd187 + 0 ]; 2026-02-21T09:05:41.1331258Z // end inline asm 2026-02-21T09:05:41.1331317Z // begin inline asm 2026-02-21T09:05:41.1331376Z mov.u32 %r15681, 0x0; 2026-02-21T09:05:41.1331434Z mov.u32 %r15682, 0x0; 2026-02-21T09:05:41.1331495Z mov.u32 %r15683, 0x0; 2026-02-21T09:05:41.1331551Z mov.u32 %r15684, 0x0; 2026-02-21T09:05:41.1331677Z ld.global.v4.b32 { %r15681, %r15682, %r15683, %r15684 }, [ %rd188 + 0 ]; 2026-02-21T09:05:41.1331737Z // end inline asm 2026-02-21T09:05:41.1331958Z .loc 1 59 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:59:32 2026-02-21T09:05:41.1332088Z bar.sync 0; 2026-02-21T09:05:41.1332242Z st.shared.v2.b32 [%r10], {%r15669, %r15670}; 2026-02-21T09:05:41.1332341Z st.shared.v2.b32 [%r10+2048], {%r15673, %r15674}; 2026-02-21T09:05:41.1332429Z st.shared.v2.b32 [%r10+4096], {%r15677, %r15678}; 2026-02-21T09:05:41.1332529Z st.shared.v2.b32 [%r10+6144], {%r15681, %r15682}; 2026-02-21T09:05:41.1332616Z st.shared.v2.b32 [%r11], {%r15671, %r15672}; 2026-02-21T09:05:41.1332703Z st.shared.v2.b32 [%r11+2048], {%r15675, %r15676}; 2026-02-21T09:05:41.1332791Z st.shared.v2.b32 [%r11+4096], {%r15679, %r15680}; 2026-02-21T09:05:41.1332880Z st.shared.v2.b32 [%r11+6144], {%r15683, %r15684}; 2026-02-21T09:05:41.1332937Z bar.sync 0; 2026-02-21T09:05:41.1333008Z ld.shared.b16 %rs641, [%r12]; 2026-02-21T09:05:41.1333081Z ld.shared.b16 %rs642, [%r12+256]; 2026-02-21T09:05:41.1333150Z ld.shared.b16 %rs643, [%r12+16]; 2026-02-21T09:05:41.1333218Z ld.shared.b16 %rs644, [%r12+272]; 2026-02-21T09:05:41.1333287Z ld.shared.b16 %rs645, [%r12+2048]; 2026-02-21T09:05:41.1333362Z ld.shared.b16 %rs646, [%r12+2304]; 2026-02-21T09:05:41.1333477Z ld.shared.b16 %rs647, [%r12+2064]; 2026-02-21T09:05:41.1333545Z ld.shared.b16 %rs648, [%r12+2320]; 2026-02-21T09:05:41.1333618Z ld.shared.b16 %rs649, [%r12+4096]; 
2026-02-21T09:05:41.1333685Z ld.shared.b16 %rs650, [%r12+4352]; 2026-02-21T09:05:41.1333750Z ld.shared.b16 %rs651, [%r12+4112]; 2026-02-21T09:05:41.1333813Z ld.shared.b16 %rs652, [%r12+4368]; 2026-02-21T09:05:41.1333887Z ld.shared.b16 %rs653, [%r12+6144]; 2026-02-21T09:05:41.1333952Z ld.shared.b16 %rs654, [%r12+6400]; 2026-02-21T09:05:41.1334017Z ld.shared.b16 %rs655, [%r12+6160]; 2026-02-21T09:05:41.1334084Z ld.shared.b16 %rs656, [%r12+6416]; 2026-02-21T09:05:41.1334160Z ld.shared.b16 %rs657, [%r13]; 2026-02-21T09:05:41.1334232Z ld.shared.b16 %rs658, [%r13+256]; 2026-02-21T09:05:41.1334299Z ld.shared.b16 %rs659, [%r13+16]; 2026-02-21T09:05:41.1334370Z ld.shared.b16 %rs660, [%r13+272]; 2026-02-21T09:05:41.1334435Z ld.shared.b16 %rs661, [%r13+2048]; 2026-02-21T09:05:41.1334502Z ld.shared.b16 %rs662, [%r13+2304]; 2026-02-21T09:05:41.1334573Z ld.shared.b16 %rs663, [%r13+2064]; 2026-02-21T09:05:41.1334638Z ld.shared.b16 %rs664, [%r13+2320]; 2026-02-21T09:05:41.1334705Z ld.shared.b16 %rs665, [%r13+4096]; 2026-02-21T09:05:41.1334774Z ld.shared.b16 %rs666, [%r13+4352]; 2026-02-21T09:05:41.1334838Z ld.shared.b16 %rs667, [%r13+4112]; 2026-02-21T09:05:41.1334903Z ld.shared.b16 %rs668, [%r13+4368]; 2026-02-21T09:05:41.1334966Z ld.shared.b16 %rs669, [%r13+6144]; 2026-02-21T09:05:41.1335038Z ld.shared.b16 %rs670, [%r13+6400]; 2026-02-21T09:05:41.1335102Z ld.shared.b16 %rs671, [%r13+6160]; 2026-02-21T09:05:41.1335169Z ld.shared.b16 %rs672, [%r13+6416]; 2026-02-21T09:05:41.1335235Z cvt.f32.bf16 %r15815, %rs641; 2026-02-21T09:05:41.1335297Z cvt.f32.bf16 %r15816, %rs642; 2026-02-21T09:05:41.1356060Z cvt.f32.bf16 %r15817, %rs657; 2026-02-21T09:05:41.1356160Z cvt.f32.bf16 %r15818, %rs658; 2026-02-21T09:05:41.1356231Z cvt.f32.bf16 %r15947, %rs643; 2026-02-21T09:05:41.1356305Z cvt.f32.bf16 %r15948, %rs644; 2026-02-21T09:05:41.1356364Z cvt.f32.bf16 %r15949, %rs659; 2026-02-21T09:05:41.1356426Z cvt.f32.bf16 %r15950, %rs660; 2026-02-21T09:05:41.1356588Z cvt.f32.bf16 %r16079, %rs645; 
2026-02-21T09:05:41.1356773Z cvt.f32.bf16 %r16080, %rs646; 2026-02-21T09:05:41.1356836Z cvt.f32.bf16 %r16081, %rs661; 2026-02-21T09:05:41.1356895Z cvt.f32.bf16 %r16082, %rs662; 2026-02-21T09:05:41.1356953Z cvt.f32.bf16 %r16211, %rs647; 2026-02-21T09:05:41.1357010Z cvt.f32.bf16 %r16212, %rs648; 2026-02-21T09:05:41.1357070Z cvt.f32.bf16 %r16213, %rs663; 2026-02-21T09:05:41.1357127Z cvt.f32.bf16 %r16214, %rs664; 2026-02-21T09:05:41.1357185Z cvt.f32.bf16 %r16343, %rs649; 2026-02-21T09:05:41.1357244Z cvt.f32.bf16 %r16344, %rs650; 2026-02-21T09:05:41.1357302Z cvt.f32.bf16 %r16345, %rs665; 2026-02-21T09:05:41.1357357Z cvt.f32.bf16 %r16346, %rs666; 2026-02-21T09:05:41.1357414Z cvt.f32.bf16 %r16475, %rs651; 2026-02-21T09:05:41.1357546Z cvt.f32.bf16 %r16476, %rs652; 2026-02-21T09:05:41.1357665Z cvt.f32.bf16 %r16477, %rs667; 2026-02-21T09:05:41.1357737Z cvt.f32.bf16 %r16478, %rs668; 2026-02-21T09:05:41.1357798Z cvt.f32.bf16 %r16607, %rs653; 2026-02-21T09:05:41.1357859Z cvt.f32.bf16 %r16608, %rs654; 2026-02-21T09:05:41.1357916Z cvt.f32.bf16 %r16609, %rs669; 2026-02-21T09:05:41.1357974Z cvt.f32.bf16 %r16610, %rs670; 2026-02-21T09:05:41.1358033Z cvt.f32.bf16 %r16739, %rs655; 2026-02-21T09:05:41.1358089Z cvt.f32.bf16 %r16740, %rs656; 2026-02-21T09:05:41.1358146Z cvt.f32.bf16 %r16741, %rs671; 2026-02-21T09:05:41.1358205Z cvt.f32.bf16 %r16742, %rs672; 2026-02-21T09:05:41.1358431Z .loc 1 61 34 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:34 2026-02-21T09:05:41.1358498Z cvt.s64.s32 %rd237, %r29609; 2026-02-21T09:05:41.1358566Z add.s64 %rd189, %rd45, %rd237; 2026-02-21T09:05:41.1358784Z .loc 1 61 87 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:87 2026-02-21T09:05:41.1358847Z // begin inline asm 2026-02-21T09:05:41.1358982Z mov.u32 %r15685, 0x0; 2026-02-21T09:05:41.1359045Z mov.u32 %r15686, 0x0; 2026-02-21T09:05:41.1359151Z ld.global.v2.b32 { %r15685, %r15686 }, [ %rd189 + 0 ]; 2026-02-21T09:05:41.1359209Z // end inline asm 2026-02-21T09:05:41.1359427Z 
.loc 1 69 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:69:28 2026-02-21T09:05:41.1359483Z bar.sync 0; 2026-02-21T09:05:41.1359550Z st.shared.b8 [%r14], %r15685; 2026-02-21T09:05:41.1359623Z prmt.b32 %r22037, %r15685, 0, 0x7771U; 2026-02-21T09:05:41.1359687Z st.shared.b8 [%r15], %r22037; 2026-02-21T09:05:41.1359753Z prmt.b32 %r22038, %r15685, 0, 0x7772U; 2026-02-21T09:05:41.1359817Z st.shared.b8 [%r16+256], %r22038; 2026-02-21T09:05:41.1359882Z prmt.b32 %r22039, %r15685, 0, 0x7773U; 2026-02-21T09:05:41.1359942Z st.shared.b8 [%r17+256], %r22039; 2026-02-21T09:05:41.1360001Z st.shared.b8 [%r18+512], %r15686; 2026-02-21T09:05:41.1360068Z prmt.b32 %r22040, %r15686, 0, 0x7771U; 2026-02-21T09:05:41.1360131Z st.shared.b8 [%r19+512], %r22040; 2026-02-21T09:05:41.1360195Z prmt.b32 %r22041, %r15686, 0, 0x7772U; 2026-02-21T09:05:41.1360255Z st.shared.b8 [%r20+768], %r22041; 2026-02-21T09:05:41.1360323Z prmt.b32 %r22042, %r15686, 0, 0x7773U; 2026-02-21T09:05:41.1360382Z st.shared.b8 [%r21+768], %r22042; 2026-02-21T09:05:41.1360434Z bar.sync 0; 2026-02-21T09:05:41.1360505Z ld.shared.b32 %r22043, [%r22]; 2026-02-21T09:05:41.1360576Z prmt.b32 %r22044, %r22043, 0, 0x7770U; 2026-02-21T09:05:41.1360638Z cvt.u16.u32 %rs673, %r22044; 2026-02-21T09:05:41.1360698Z prmt.b32 %r22045, %r22043, 0, 0x7771U; 2026-02-21T09:05:41.1360760Z cvt.u16.u32 %rs674, %r22045; 2026-02-21T09:05:41.1360821Z prmt.b32 %r22046, %r22043, 0, 0x7772U; 2026-02-21T09:05:41.1360878Z cvt.u16.u32 %rs675, %r22046; 2026-02-21T09:05:41.1360941Z prmt.b32 %r22047, %r22043, 0, 0x7773U; 2026-02-21T09:05:41.1360997Z cvt.u16.u32 %rs676, %r22047; 2026-02-21T09:05:41.1361063Z ld.shared.b32 %r22048, [%r23]; 2026-02-21T09:05:41.1361128Z prmt.b32 %r22049, %r22048, 0, 0x7770U; 2026-02-21T09:05:41.1361188Z cvt.u16.u32 %rs677, %r22049; 2026-02-21T09:05:41.1361249Z prmt.b32 %r22050, %r22048, 0, 0x7771U; 2026-02-21T09:05:41.1361366Z cvt.u16.u32 %rs678, %r22050; 2026-02-21T09:05:41.1361431Z prmt.b32 %r22051, %r22048, 0, 
0x7772U; 2026-02-21T09:05:41.1361489Z cvt.u16.u32 %rs679, %r22051; 2026-02-21T09:05:41.1361551Z prmt.b32 %r22052, %r22048, 0, 0x7773U; 2026-02-21T09:05:41.1361610Z cvt.u16.u32 %rs680, %r22052; 2026-02-21T09:05:41.1361816Z .loc 1 64 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:64:28 2026-02-21T09:05:41.1361876Z shl.b16 %rs681, %rs673, 4; 2026-02-21T09:05:41.1361934Z shl.b16 %rs682, %rs677, 4; 2026-02-21T09:05:41.1361994Z shl.b16 %rs683, %rs674, 4; 2026-02-21T09:05:41.1362062Z shl.b16 %rs684, %rs678, 4; 2026-02-21T09:05:41.1362121Z shl.b16 %rs685, %rs675, 4; 2026-02-21T09:05:41.1362180Z shl.b16 %rs686, %rs679, 4; 2026-02-21T09:05:41.1362335Z shl.b16 %rs687, %rs676, 4; 2026-02-21T09:05:41.1362396Z shl.b16 %rs688, %rs680, 4; 2026-02-21T09:05:41.1362593Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1362658Z cvt.s16.s8 %rs689, %rs681; 2026-02-21T09:05:41.1362715Z shr.s16 %rs690, %rs689, 4; 2026-02-21T09:05:41.1362771Z cvt.s16.s8 %rs691, %rs682; 2026-02-21T09:05:41.1362831Z shr.s16 %rs692, %rs691, 4; 2026-02-21T09:05:41.1362896Z prmt.b32 %r22053, %r22043, 0, 0x8880U; 2026-02-21T09:05:41.1362955Z cvt.u16.u32 %rs693, %r22053; 2026-02-21T09:05:41.1363014Z shr.s16 %rs694, %rs693, 4; 2026-02-21T09:05:41.1363076Z prmt.b32 %r22054, %r22048, 0, 0x8880U; 2026-02-21T09:05:41.1363133Z cvt.u16.u32 %rs695, %r22054; 2026-02-21T09:05:41.1363190Z shr.s16 %rs696, %rs695, 4; 2026-02-21T09:05:41.1363384Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1363452Z cvt.rn.f32.s16 %r22055, %rs696; 2026-02-21T09:05:41.1363565Z cvt.rn.f32.s16 %r22056, %rs694; 2026-02-21T09:05:41.1363629Z cvt.rn.f32.s16 %r22057, %rs692; 2026-02-21T09:05:41.1363688Z cvt.rn.f32.s16 %r22058, %rs690; 2026-02-21T09:05:41.1363901Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1363967Z cvt.s16.s8 %rs697, %rs683; 2026-02-21T09:05:41.1364028Z shr.s16 
%rs698, %rs697, 4; 2026-02-21T09:05:41.1364088Z cvt.s16.s8 %rs699, %rs684; 2026-02-21T09:05:41.1364147Z shr.s16 %rs700, %rs699, 4; 2026-02-21T09:05:41.1364218Z prmt.b32 %r22059, %r22043, 0, 0x9991U; 2026-02-21T09:05:41.1364279Z cvt.u16.u32 %rs701, %r22059; 2026-02-21T09:05:41.1364335Z shr.s16 %rs702, %rs701, 4; 2026-02-21T09:05:41.1364401Z prmt.b32 %r22060, %r22048, 0, 0x9991U; 2026-02-21T09:05:41.1364460Z cvt.u16.u32 %rs703, %r22060; 2026-02-21T09:05:41.1364524Z shr.s16 %rs704, %rs703, 4; 2026-02-21T09:05:41.1364728Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1364794Z cvt.rn.f32.s16 %r22061, %rs704; 2026-02-21T09:05:41.1364861Z cvt.rn.f32.s16 %r22062, %rs702; 2026-02-21T09:05:41.1364922Z cvt.rn.f32.s16 %r22063, %rs700; 2026-02-21T09:05:41.1364985Z cvt.rn.f32.s16 %r22064, %rs698; 2026-02-21T09:05:41.1365185Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1365246Z cvt.s16.s8 %rs705, %rs685; 2026-02-21T09:05:41.1365307Z shr.s16 %rs706, %rs705, 4; 2026-02-21T09:05:41.1365369Z cvt.s16.s8 %rs707, %rs686; 2026-02-21T09:05:41.1365437Z shr.s16 %rs708, %rs707, 4; 2026-02-21T09:05:41.1365503Z prmt.b32 %r22065, %r22043, 0, 0xaaa2U; 2026-02-21T09:05:41.1365567Z cvt.u16.u32 %rs709, %r22065; 2026-02-21T09:05:41.1365632Z shr.s16 %rs710, %rs709, 4; 2026-02-21T09:05:41.1365697Z prmt.b32 %r22066, %r22048, 0, 0xaaa2U; 2026-02-21T09:05:41.1365757Z cvt.u16.u32 %rs711, %r22066; 2026-02-21T09:05:41.1365821Z shr.s16 %rs712, %rs711, 4; 2026-02-21T09:05:41.1366022Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1366086Z cvt.rn.f32.s16 %r22067, %rs712; 2026-02-21T09:05:41.1366228Z cvt.rn.f32.s16 %r22068, %rs710; 2026-02-21T09:05:41.1366303Z cvt.rn.f32.s16 %r22069, %rs708; 2026-02-21T09:05:41.1366371Z cvt.rn.f32.s16 %r22070, %rs706; 2026-02-21T09:05:41.1366689Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 
2026-02-21T09:05:41.1366771Z cvt.s16.s8 %rs713, %rs687; 2026-02-21T09:05:41.1366834Z shr.s16 %rs714, %rs713, 4; 2026-02-21T09:05:41.1366895Z cvt.s16.s8 %rs715, %rs688; 2026-02-21T09:05:41.1366955Z shr.s16 %rs716, %rs715, 4; 2026-02-21T09:05:41.1367030Z prmt.b32 %r22071, %r22043, 0, 0xbbb3U; 2026-02-21T09:05:41.1367093Z cvt.u16.u32 %rs717, %r22071; 2026-02-21T09:05:41.1367154Z shr.s16 %rs718, %rs717, 4; 2026-02-21T09:05:41.1367224Z prmt.b32 %r22072, %r22048, 0, 0xbbb3U; 2026-02-21T09:05:41.1367429Z cvt.u16.u32 %rs719, %r22072; 2026-02-21T09:05:41.1367497Z shr.s16 %rs720, %rs719, 4; 2026-02-21T09:05:41.1367699Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1367773Z cvt.rn.f32.s16 %r22073, %rs720; 2026-02-21T09:05:41.1367835Z cvt.rn.f32.s16 %r22074, %rs718; 2026-02-21T09:05:41.1367896Z cvt.rn.f32.s16 %r22075, %rs716; 2026-02-21T09:05:41.1367965Z cvt.rn.f32.s16 %r22076, %rs714; 2026-02-21T09:05:41.1368022Z bar.sync 0; 2026-02-21T09:05:41.1368147Z st.shared.v4.b32 [%r24], {%r22058, %r22056, %r22057, %r22055}; 2026-02-21T09:05:41.1368264Z st.shared.v4.b32 [%r25], {%r22064, %r22062, %r22063, %r22061}; 2026-02-21T09:05:41.1368371Z st.shared.v4.b32 [%r26], {%r22070, %r22068, %r22069, %r22067}; 2026-02-21T09:05:41.1368479Z st.shared.v4.b32 [%r27], {%r22076, %r22074, %r22075, %r22073}; 2026-02-21T09:05:41.1368536Z $L__tmp17: 2026-02-21T09:05:41.1368895Z .loc 2 291 36 // standard.py:291:36 @[ c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:91:40 ] 2026-02-21T09:05:41.1368960Z // begin inline asm 2026-02-21T09:05:41.1369046Z fence.proxy.async.shared::cta; 2026-02-21T09:05:41.1369112Z // end inline asm 2026-02-21T09:05:41.1369167Z bar.sync 0; 2026-02-21T09:05:41.1369238Z wgmma.fence.sync.aligned; 2026-02-21T09:05:41.1369311Z mov.pred %p92, -1; 2026-02-21T09:05:41.1369369Z // begin inline asm 2026-02-21T09:05:41.1370851Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29610,%r29611,%r29612,%r29613,%r29614,%r29615,%r29616,%r29617,%r29618,%r29619,%r29620,%r29621,%r29622,%r29623,%r29624,%r29625,%r29626,%r29627,%r29628,%r29629,%r29630,%r29631,%r29632,%r29633,%r29634,%r29635,%r29636,%r29637,%r29638,%r29639,%r29640,%r29641,%r29642,%r29643,%r29644,%r29645,%r29646,%r29647,%r29648,%r29649,%r29650,%r29651,%r29652,%r29653,%r29654,%r29655,%r29656,%r29657,%r29658,%r29659,%r29660,%r29661,%r29662,%r29663,%r29664,%r29665,%r29666,%r29667,%r29668,%r29669,%r29670,%r29671,%r29672,%r29673}, {%r15815,%r15816,%r15817,%r15818}, %rd2, %p92, 1, 1; 2026-02-21T09:05:41.1370918Z // end inline asm 2026-02-21T09:05:41.1370977Z // begin inline asm 2026-02-21T09:05:41.1372437Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29610,%r29611,%r29612,%r29613,%r29614,%r29615,%r29616,%r29617,%r29618,%r29619,%r29620,%r29621,%r29622,%r29623,%r29624,%r29625,%r29626,%r29627,%r29628,%r29629,%r29630,%r29631,%r29632,%r29633,%r29634,%r29635,%r29636,%r29637,%r29638,%r29639,%r29640,%r29641,%r29642,%r29643,%r29644,%r29645,%r29646,%r29647,%r29648,%r29649,%r29650,%r29651,%r29652,%r29653,%r29654,%r29655,%r29656,%r29657,%r29658,%r29659,%r29660,%r29661,%r29662,%r29663,%r29664,%r29665,%r29666,%r29667,%r29668,%r29669,%r29670,%r29671,%r29672,%r29673}, {%r15947,%r15948,%r15949,%r15950}, %rd3, %p92, 1, 1; 2026-02-21T09:05:41.1372502Z // end inline asm 2026-02-21T09:05:41.1372561Z // begin inline asm 2026-02-21T09:05:41.1374030Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29674,%r29675,%r29676,%r29677,%r29678,%r29679,%r29680,%r29681,%r29682,%r29683,%r29684,%r29685,%r29686,%r29687,%r29688,%r29689,%r29690,%r29691,%r29692,%r29693,%r29694,%r29695,%r29696,%r29697,%r29698,%r29699,%r29700,%r29701,%r29702,%r29703,%r29704,%r29705,%r29706,%r29707,%r29708,%r29709,%r29710,%r29711,%r29712,%r29713,%r29714,%r29715,%r29716,%r29717,%r29718,%r29719,%r29720,%r29721,%r29722,%r29723,%r29724,%r29725,%r29726,%r29727,%r29728,%r29729,%r29730,%r29731,%r29732,%r29733,%r29734,%r29735,%r29736,%r29737}, {%r16079,%r16080,%r16081,%r16082}, %rd2, %p92, 1, 1; 2026-02-21T09:05:41.1374163Z // end inline asm 2026-02-21T09:05:41.1374222Z // begin inline asm 2026-02-21T09:05:41.1375725Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29674,%r29675,%r29676,%r29677,%r29678,%r29679,%r29680,%r29681,%r29682,%r29683,%r29684,%r29685,%r29686,%r29687,%r29688,%r29689,%r29690,%r29691,%r29692,%r29693,%r29694,%r29695,%r29696,%r29697,%r29698,%r29699,%r29700,%r29701,%r29702,%r29703,%r29704,%r29705,%r29706,%r29707,%r29708,%r29709,%r29710,%r29711,%r29712,%r29713,%r29714,%r29715,%r29716,%r29717,%r29718,%r29719,%r29720,%r29721,%r29722,%r29723,%r29724,%r29725,%r29726,%r29727,%r29728,%r29729,%r29730,%r29731,%r29732,%r29733,%r29734,%r29735,%r29736,%r29737}, {%r16211,%r16212,%r16213,%r16214}, %rd3, %p92, 1, 1; 2026-02-21T09:05:41.1375836Z // end inline asm 2026-02-21T09:05:41.1375894Z // begin inline asm 2026-02-21T09:05:41.1377497Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29738,%r29739,%r29740,%r29741,%r29742,%r29743,%r29744,%r29745,%r29746,%r29747,%r29748,%r29749,%r29750,%r29751,%r29752,%r29753,%r29754,%r29755,%r29756,%r29757,%r29758,%r29759,%r29760,%r29761,%r29762,%r29763,%r29764,%r29765,%r29766,%r29767,%r29768,%r29769,%r29770,%r29771,%r29772,%r29773,%r29774,%r29775,%r29776,%r29777,%r29778,%r29779,%r29780,%r29781,%r29782,%r29783,%r29784,%r29785,%r29786,%r29787,%r29788,%r29789,%r29790,%r29791,%r29792,%r29793,%r29794,%r29795,%r29796,%r29797,%r29798,%r29799,%r29800,%r29801}, {%r16343,%r16344,%r16345,%r16346}, %rd2, %p92, 1, 1; 2026-02-21T09:05:41.1377565Z // end inline asm 2026-02-21T09:05:41.1377697Z // begin inline asm 2026-02-21T09:05:41.1379171Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29738,%r29739,%r29740,%r29741,%r29742,%r29743,%r29744,%r29745,%r29746,%r29747,%r29748,%r29749,%r29750,%r29751,%r29752,%r29753,%r29754,%r29755,%r29756,%r29757,%r29758,%r29759,%r29760,%r29761,%r29762,%r29763,%r29764,%r29765,%r29766,%r29767,%r29768,%r29769,%r29770,%r29771,%r29772,%r29773,%r29774,%r29775,%r29776,%r29777,%r29778,%r29779,%r29780,%r29781,%r29782,%r29783,%r29784,%r29785,%r29786,%r29787,%r29788,%r29789,%r29790,%r29791,%r29792,%r29793,%r29794,%r29795,%r29796,%r29797,%r29798,%r29799,%r29800,%r29801}, {%r16475,%r16476,%r16477,%r16478}, %rd3, %p92, 1, 1; 2026-02-21T09:05:41.1379234Z // end inline asm 2026-02-21T09:05:41.1379299Z // begin inline asm 2026-02-21T09:05:41.1380755Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29802,%r29803,%r29804,%r29805,%r29806,%r29807,%r29808,%r29809,%r29810,%r29811,%r29812,%r29813,%r29814,%r29815,%r29816,%r29817,%r29818,%r29819,%r29820,%r29821,%r29822,%r29823,%r29824,%r29825,%r29826,%r29827,%r29828,%r29829,%r29830,%r29831,%r29832,%r29833,%r29834,%r29835,%r29836,%r29837,%r29838,%r29839,%r29840,%r29841,%r29842,%r29843,%r29844,%r29845,%r29846,%r29847,%r29848,%r29849,%r29850,%r29851,%r29852,%r29853,%r29854,%r29855,%r29856,%r29857,%r29858,%r29859,%r29860,%r29861,%r29862,%r29863,%r29864,%r29865}, {%r16607,%r16608,%r16609,%r16610}, %rd2, %p92, 1, 1; 2026-02-21T09:05:41.1380823Z // end inline asm 2026-02-21T09:05:41.1380880Z // begin inline asm 2026-02-21T09:05:41.1382333Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29802,%r29803,%r29804,%r29805,%r29806,%r29807,%r29808,%r29809,%r29810,%r29811,%r29812,%r29813,%r29814,%r29815,%r29816,%r29817,%r29818,%r29819,%r29820,%r29821,%r29822,%r29823,%r29824,%r29825,%r29826,%r29827,%r29828,%r29829,%r29830,%r29831,%r29832,%r29833,%r29834,%r29835,%r29836,%r29837,%r29838,%r29839,%r29840,%r29841,%r29842,%r29843,%r29844,%r29845,%r29846,%r29847,%r29848,%r29849,%r29850,%r29851,%r29852,%r29853,%r29854,%r29855,%r29856,%r29857,%r29858,%r29859,%r29860,%r29861,%r29862,%r29863,%r29864,%r29865}, {%r16739,%r16740,%r16741,%r16742}, %rd3, %p92, 1, 1; 2026-02-21T09:05:41.1382463Z // end inline asm 2026-02-21T09:05:41.1382552Z wgmma.commit_group.sync.aligned; 2026-02-21T09:05:41.1382615Z mov.b32 %r21777, 0; 2026-02-21T09:05:41.1382683Z mov.b32 %r16999, %r24001; 2026-02-21T09:05:41.1382743Z mov.b32 %r17000, %r21777; 2026-02-21T09:05:41.1382800Z mov.b32 %r17001, %r21777; 2026-02-21T09:05:41.1382862Z // begin inline asm 2026-02-21T09:05:41.1387985Z // wait for regs: 
%r29610,%r29611,%r29612,%r29613,%r29614,%r29615,%r29616,%r29617,%r29618,%r29619,%r29620,%r29621,%r29622,%r29623,%r29624,%r29625,%r29626,%r29627,%r29628,%r29629,%r29630,%r29631,%r29632,%r29633,%r29634,%r29635,%r29636,%r29637,%r29638,%r29639,%r29640,%r29641,%r29642,%r29643,%r29644,%r29645,%r29646,%r29647,%r29648,%r29649,%r29650,%r29651,%r29652,%r29653,%r29654,%r29655,%r29656,%r29657,%r29658,%r29659,%r29660,%r29661,%r29662,%r29663,%r29664,%r29665,%r29666,%r29667,%r29668,%r29669,%r29670,%r29671,%r29672,%r29673,%r29674,%r29675,%r29676,%r29677,%r29678,%r29679,%r29680,%r29681,%r29682,%r29683,%r29684,%r29685,%r29686,%r29687,%r29688,%r29689,%r29690,%r29691,%r29692,%r29693,%r29694,%r29695,%r29696,%r29697,%r29698,%r29699,%r29700,%r29701,%r29702,%r29703,%r29704,%r29705,%r29706,%r29707,%r29708,%r29709,%r29710,%r29711,%r29712,%r29713,%r29714,%r29715,%r29716,%r29717,%r29718,%r29719,%r29720,%r29721,%r29722,%r29723,%r29724,%r29725,%r29726,%r29727,%r29728,%r29729,%r29730,%r29731,%r29732,%r29733,%r29734,%r29735,%r29736,%r29737,%r29738,%r29739,%r29740,%r29741,%r29742,%r29743,%r29744,%r29745,%r29746,%r29747,%r29748,%r29749,%r29750,%r29751,%r29752,%r29753,%r29754,%r29755,%r29756,%r29757,%r29758,%r29759,%r29760,%r29761,%r29762,%r29763,%r29764,%r29765,%r29766,%r29767,%r29768,%r29769,%r29770,%r29771,%r29772,%r29773,%r29774,%r29775,%r29776,%r29777,%r29778,%r29779,%r29780,%r29781,%r29782,%r29783,%r29784,%r29785,%r29786,%r29787,%r29788,%r29789,%r29790,%r29791,%r29792,%r29793,%r29794,%r29795,%r29796,%r29797,%r29798,%r29799,%r29800,%r29801,%r29802,%r29803,%r29804,%r29805,%r29806,%r29807,%r29808,%r29809,%r29810,%r29811,%r29812,%r29813,%r29814,%r29815,%r29816,%r29817,%r29818,%r29819,%r29820,%r29821,%r29822,%r29823,%r29824,%r29825,%r29826,%r29827,%r29828,%r29829,%r29830,%r29831,%r29832,%r29833,%r29834,%r29835,%r29836,%r29837,%r29838,%r29839,%r29840,%r29841,%r29842,%r29843,%r29844,%r29845,%r29846,%r29847,%r29848,%r29849,%r29850,%r29851,%r29852,%r29853,%r29854,%r29855,%r29856,%r29857,%r29858,%r29859,
%r29860,%r29861,%r29862,%r29863,%r29864,%r29865,%r16999,%r17000,%r17001 2026-02-21T09:05:41.1388151Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:05:41.1388211Z // end inline asm 2026-02-21T09:05:41.1388378Z $L__tmp18: 2026-02-21T09:05:41.1388600Z .loc 1 55 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:32 2026-02-21T09:05:41.1388668Z add.s64 %rd198, %rd185, 32; 2026-02-21T09:05:41.1388743Z add.s64 %rd199, %rd186, 32; 2026-02-21T09:05:41.1388804Z add.s64 %rd200, %rd187, 32; 2026-02-21T09:05:41.1389006Z .loc 1 55 80 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:80 2026-02-21T09:05:41.1389069Z add.s64 %rd201, %rd188, 32; 2026-02-21T09:05:41.1389133Z // begin inline asm 2026-02-21T09:05:41.1389193Z mov.u32 %r17261, 0x0; 2026-02-21T09:05:41.1389253Z mov.u32 %r17262, 0x0; 2026-02-21T09:05:41.1389312Z mov.u32 %r17263, 0x0; 2026-02-21T09:05:41.1389368Z mov.u32 %r17264, 0x0; 2026-02-21T09:05:41.1389507Z ld.global.v4.b32 { %r17261, %r17262, %r17263, %r17264 }, [ %rd198 + 0 ]; 2026-02-21T09:05:41.1389563Z // end inline asm 2026-02-21T09:05:41.1389626Z // begin inline asm 2026-02-21T09:05:41.1389684Z mov.u32 %r17265, 0x0; 2026-02-21T09:05:41.1389740Z mov.u32 %r17266, 0x0; 2026-02-21T09:05:41.1389801Z mov.u32 %r17267, 0x0; 2026-02-21T09:05:41.1389861Z mov.u32 %r17268, 0x0; 2026-02-21T09:05:41.1389992Z ld.global.v4.b32 { %r17265, %r17266, %r17267, %r17268 }, [ %rd199 + 0 ]; 2026-02-21T09:05:41.1390053Z // end inline asm 2026-02-21T09:05:41.1390116Z // begin inline asm 2026-02-21T09:05:41.1390172Z mov.u32 %r17269, 0x0; 2026-02-21T09:05:41.1390309Z mov.u32 %r17270, 0x0; 2026-02-21T09:05:41.1390371Z mov.u32 %r17271, 0x0; 2026-02-21T09:05:41.1390428Z mov.u32 %r17272, 0x0; 2026-02-21T09:05:41.1390551Z ld.global.v4.b32 { %r17269, %r17270, %r17271, %r17272 }, [ %rd200 + 0 ]; 2026-02-21T09:05:41.1390613Z // end inline asm 2026-02-21T09:05:41.1390672Z // begin inline asm 2026-02-21T09:05:41.1390729Z mov.u32 %r17273, 0x0; 2026-02-21T09:05:41.1390786Z 
mov.u32 %r17274, 0x0; 2026-02-21T09:05:41.1390845Z mov.u32 %r17275, 0x0; 2026-02-21T09:05:41.1390901Z mov.u32 %r17276, 0x0; 2026-02-21T09:05:41.1391027Z ld.global.v4.b32 { %r17273, %r17274, %r17275, %r17276 }, [ %rd201 + 0 ]; 2026-02-21T09:05:41.1391088Z // end inline asm 2026-02-21T09:05:41.1391288Z .loc 1 59 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:59:32 2026-02-21T09:05:41.1391451Z bar.sync 0; 2026-02-21T09:05:41.1391539Z st.shared.v2.b32 [%r10], {%r17261, %r17262}; 2026-02-21T09:05:41.1391638Z st.shared.v2.b32 [%r10+2048], {%r17265, %r17266}; 2026-02-21T09:05:41.1391739Z st.shared.v2.b32 [%r10+4096], {%r17269, %r17270}; 2026-02-21T09:05:41.1391824Z st.shared.v2.b32 [%r10+6144], {%r17273, %r17274}; 2026-02-21T09:05:41.1391912Z st.shared.v2.b32 [%r11], {%r17263, %r17264}; 2026-02-21T09:05:41.1391995Z st.shared.v2.b32 [%r11+2048], {%r17267, %r17268}; 2026-02-21T09:05:41.1392082Z st.shared.v2.b32 [%r11+4096], {%r17271, %r17272}; 2026-02-21T09:05:41.1392175Z st.shared.v2.b32 [%r11+6144], {%r17275, %r17276}; 2026-02-21T09:05:41.1392230Z bar.sync 0; 2026-02-21T09:05:41.1392297Z ld.shared.b16 %rs721, [%r12]; 2026-02-21T09:05:41.1392366Z ld.shared.b16 %rs722, [%r12+256]; 2026-02-21T09:05:41.1392437Z ld.shared.b16 %rs723, [%r12+16]; 2026-02-21T09:05:41.1392502Z ld.shared.b16 %rs724, [%r12+272]; 2026-02-21T09:05:41.1392572Z ld.shared.b16 %rs725, [%r12+2048]; 2026-02-21T09:05:41.1392710Z ld.shared.b16 %rs726, [%r12+2304]; 2026-02-21T09:05:41.1392776Z ld.shared.b16 %rs727, [%r12+2064]; 2026-02-21T09:05:41.1392839Z ld.shared.b16 %rs728, [%r12+2320]; 2026-02-21T09:05:41.1392905Z ld.shared.b16 %rs729, [%r12+4096]; 2026-02-21T09:05:41.1392975Z ld.shared.b16 %rs730, [%r12+4352]; 2026-02-21T09:05:41.1393040Z ld.shared.b16 %rs731, [%r12+4112]; 2026-02-21T09:05:41.1393103Z ld.shared.b16 %rs732, [%r12+4368]; 2026-02-21T09:05:41.1393173Z ld.shared.b16 %rs733, [%r12+6144]; 2026-02-21T09:05:41.1393236Z ld.shared.b16 %rs734, [%r12+6400]; 2026-02-21T09:05:41.1393302Z 
ld.shared.b16 %rs735, [%r12+6160]; 2026-02-21T09:05:41.1393370Z ld.shared.b16 %rs736, [%r12+6416]; 2026-02-21T09:05:41.1393434Z ld.shared.b16 %rs737, [%r13]; 2026-02-21T09:05:41.1393497Z ld.shared.b16 %rs738, [%r13+256]; 2026-02-21T09:05:41.1393562Z ld.shared.b16 %rs739, [%r13+16]; 2026-02-21T09:05:41.1393630Z ld.shared.b16 %rs740, [%r13+272]; 2026-02-21T09:05:41.1393709Z ld.shared.b16 %rs741, [%r13+2048]; 2026-02-21T09:05:41.1393780Z ld.shared.b16 %rs742, [%r13+2304]; 2026-02-21T09:05:41.1393848Z ld.shared.b16 %rs743, [%r13+2064]; 2026-02-21T09:05:41.1393914Z ld.shared.b16 %rs744, [%r13+2320]; 2026-02-21T09:05:41.1393978Z ld.shared.b16 %rs745, [%r13+4096]; 2026-02-21T09:05:41.1394040Z ld.shared.b16 %rs746, [%r13+4352]; 2026-02-21T09:05:41.1394109Z ld.shared.b16 %rs747, [%r13+4112]; 2026-02-21T09:05:41.1394172Z ld.shared.b16 %rs748, [%r13+4368]; 2026-02-21T09:05:41.1394236Z ld.shared.b16 %rs749, [%r13+6144]; 2026-02-21T09:05:41.1394304Z ld.shared.b16 %rs750, [%r13+6400]; 2026-02-21T09:05:41.1394367Z ld.shared.b16 %rs751, [%r13+6160]; 2026-02-21T09:05:41.1394443Z ld.shared.b16 %rs752, [%r13+6416]; 2026-02-21T09:05:41.1394508Z cvt.f32.bf16 %r17407, %rs721; 2026-02-21T09:05:41.1394576Z cvt.f32.bf16 %r17408, %rs722; 2026-02-21T09:05:41.1394636Z cvt.f32.bf16 %r17409, %rs737; 2026-02-21T09:05:41.1394696Z cvt.f32.bf16 %r17410, %rs738; 2026-02-21T09:05:41.1394765Z cvt.f32.bf16 %r17539, %rs723; 2026-02-21T09:05:41.1394824Z cvt.f32.bf16 %r17540, %rs724; 2026-02-21T09:05:41.1394885Z cvt.f32.bf16 %r17541, %rs739; 2026-02-21T09:05:41.1394948Z cvt.f32.bf16 %r17542, %rs740; 2026-02-21T09:05:41.1395065Z cvt.f32.bf16 %r17671, %rs725; 2026-02-21T09:05:41.1395126Z cvt.f32.bf16 %r17672, %rs726; 2026-02-21T09:05:41.1395185Z cvt.f32.bf16 %r17673, %rs741; 2026-02-21T09:05:41.1395250Z cvt.f32.bf16 %r17674, %rs742; 2026-02-21T09:05:41.1395309Z cvt.f32.bf16 %r17803, %rs727; 2026-02-21T09:05:41.1395371Z cvt.f32.bf16 %r17804, %rs728; 2026-02-21T09:05:41.1395435Z cvt.f32.bf16 %r17805, %rs743; 
2026-02-21T09:05:41.1395497Z cvt.f32.bf16 %r17806, %rs744; 2026-02-21T09:05:41.1395555Z cvt.f32.bf16 %r17935, %rs729; 2026-02-21T09:05:41.1395614Z cvt.f32.bf16 %r17936, %rs730; 2026-02-21T09:05:41.1395680Z cvt.f32.bf16 %r17937, %rs745; 2026-02-21T09:05:41.1395739Z cvt.f32.bf16 %r17938, %rs746; 2026-02-21T09:05:41.1395853Z cvt.f32.bf16 %r18067, %rs731; 2026-02-21T09:05:41.1395966Z cvt.f32.bf16 %r18068, %rs732; 2026-02-21T09:05:41.1396029Z cvt.f32.bf16 %r18069, %rs747; 2026-02-21T09:05:41.1396088Z cvt.f32.bf16 %r18070, %rs748; 2026-02-21T09:05:41.1396151Z cvt.f32.bf16 %r18199, %rs733; 2026-02-21T09:05:41.1396215Z cvt.f32.bf16 %r18200, %rs734; 2026-02-21T09:05:41.1396275Z cvt.f32.bf16 %r18201, %rs749; 2026-02-21T09:05:41.1396333Z cvt.f32.bf16 %r18202, %rs750; 2026-02-21T09:05:41.1396396Z cvt.f32.bf16 %r18331, %rs735; 2026-02-21T09:05:41.1396578Z cvt.f32.bf16 %r18332, %rs736; 2026-02-21T09:05:41.1396648Z cvt.f32.bf16 %r18333, %rs751; 2026-02-21T09:05:41.1396708Z cvt.f32.bf16 %r18334, %rs752; 2026-02-21T09:05:41.1396918Z .loc 1 61 34 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:34 2026-02-21T09:05:41.1396986Z add.s32 %r22077, %r29609, 65536; 2026-02-21T09:05:41.1397049Z cvt.s64.s32 %rd238, %r22077; 2026-02-21T09:05:41.1397117Z add.s64 %rd202, %rd45, %rd238; 2026-02-21T09:05:41.1397402Z .loc 1 61 87 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:87 2026-02-21T09:05:41.1397467Z // begin inline asm 2026-02-21T09:05:41.1397527Z mov.u32 %r17277, 0x0; 2026-02-21T09:05:41.1397587Z mov.u32 %r17278, 0x0; 2026-02-21T09:05:41.1397697Z ld.global.v2.b32 { %r17277, %r17278 }, [ %rd202 + 0 ]; 2026-02-21T09:05:41.1397766Z // end inline asm 2026-02-21T09:05:41.1397964Z .loc 1 69 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:69:28 2026-02-21T09:05:41.1398027Z bar.sync 0; 2026-02-21T09:05:41.1398093Z st.shared.b8 [%r14], %r17277; 2026-02-21T09:05:41.1398164Z prmt.b32 %r22078, %r17277, 0, 0x7771U; 2026-02-21T09:05:41.1398236Z st.shared.b8 
[%r15], %r22078; 2026-02-21T09:05:41.1398304Z prmt.b32 %r22079, %r17277, 0, 0x7772U; 2026-02-21T09:05:41.1398373Z st.shared.b8 [%r16+256], %r22079; 2026-02-21T09:05:41.1398439Z prmt.b32 %r22080, %r17277, 0, 0x7773U; 2026-02-21T09:05:41.1398512Z st.shared.b8 [%r17+256], %r22080; 2026-02-21T09:05:41.1398578Z st.shared.b8 [%r18+512], %r17278; 2026-02-21T09:05:41.1398643Z prmt.b32 %r22081, %r17278, 0, 0x7771U; 2026-02-21T09:05:41.1398709Z st.shared.b8 [%r19+512], %r22081; 2026-02-21T09:05:41.1398775Z prmt.b32 %r22082, %r17278, 0, 0x7772U; 2026-02-21T09:05:41.1398840Z st.shared.b8 [%r20+768], %r22082; 2026-02-21T09:05:41.1398903Z prmt.b32 %r22083, %r17278, 0, 0x7773U; 2026-02-21T09:05:41.1398971Z st.shared.b8 [%r21+768], %r22083; 2026-02-21T09:05:41.1399027Z bar.sync 0; 2026-02-21T09:05:41.1399093Z ld.shared.b32 %r22084, [%r22]; 2026-02-21T09:05:41.1399164Z prmt.b32 %r22085, %r22084, 0, 0x7770U; 2026-02-21T09:05:41.1399227Z cvt.u16.u32 %rs753, %r22085; 2026-02-21T09:05:41.1399290Z prmt.b32 %r22086, %r22084, 0, 0x7771U; 2026-02-21T09:05:41.1399351Z cvt.u16.u32 %rs754, %r22086; 2026-02-21T09:05:41.1399420Z prmt.b32 %r22087, %r22084, 0, 0x7772U; 2026-02-21T09:05:41.1399480Z cvt.u16.u32 %rs755, %r22087; 2026-02-21T09:05:41.1399549Z prmt.b32 %r22088, %r22084, 0, 0x7773U; 2026-02-21T09:05:41.1399618Z cvt.u16.u32 %rs756, %r22088; 2026-02-21T09:05:41.1399683Z ld.shared.b32 %r22089, [%r23]; 2026-02-21T09:05:41.1399747Z prmt.b32 %r22090, %r22089, 0, 0x7770U; 2026-02-21T09:05:41.1399895Z cvt.u16.u32 %rs757, %r22090; 2026-02-21T09:05:41.1399963Z prmt.b32 %r22091, %r22089, 0, 0x7771U; 2026-02-21T09:05:41.1400023Z cvt.u16.u32 %rs758, %r22091; 2026-02-21T09:05:41.1400088Z prmt.b32 %r22092, %r22089, 0, 0x7772U; 2026-02-21T09:05:41.1400153Z cvt.u16.u32 %rs759, %r22092; 2026-02-21T09:05:41.1400216Z prmt.b32 %r22093, %r22089, 0, 0x7773U; 2026-02-21T09:05:41.1400278Z cvt.u16.u32 %rs760, %r22093; 2026-02-21T09:05:41.1400488Z .loc 1 64 28 // 
c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:64:28 2026-02-21T09:05:41.1400564Z shl.b16 %rs761, %rs753, 4; 2026-02-21T09:05:41.1400626Z shl.b16 %rs762, %rs757, 4; 2026-02-21T09:05:41.1400686Z shl.b16 %rs763, %rs754, 4; 2026-02-21T09:05:41.1400818Z shl.b16 %rs764, %rs758, 4; 2026-02-21T09:05:41.1400939Z shl.b16 %rs765, %rs755, 4; 2026-02-21T09:05:41.1401002Z shl.b16 %rs766, %rs759, 4; 2026-02-21T09:05:41.1401069Z shl.b16 %rs767, %rs756, 4; 2026-02-21T09:05:41.1401130Z shl.b16 %rs768, %rs760, 4; 2026-02-21T09:05:41.1401331Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1401398Z cvt.s16.s8 %rs769, %rs761; 2026-02-21T09:05:41.1401469Z shr.s16 %rs770, %rs769, 4; 2026-02-21T09:05:41.1401530Z cvt.s16.s8 %rs771, %rs762; 2026-02-21T09:05:41.1401591Z shr.s16 %rs772, %rs771, 4; 2026-02-21T09:05:41.1401663Z prmt.b32 %r22094, %r22084, 0, 0x8880U; 2026-02-21T09:05:41.1401723Z cvt.u16.u32 %rs773, %r22094; 2026-02-21T09:05:41.1401783Z shr.s16 %rs774, %rs773, 4; 2026-02-21T09:05:41.1401850Z prmt.b32 %r22095, %r22089, 0, 0x8880U; 2026-02-21T09:05:41.1401910Z cvt.u16.u32 %rs775, %r22095; 2026-02-21T09:05:41.1401970Z shr.s16 %rs776, %rs775, 4; 2026-02-21T09:05:41.1402222Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1402295Z cvt.rn.f32.s16 %r22096, %rs776; 2026-02-21T09:05:41.1402358Z cvt.rn.f32.s16 %r22097, %rs774; 2026-02-21T09:05:41.1402421Z cvt.rn.f32.s16 %r22098, %rs772; 2026-02-21T09:05:41.1402489Z cvt.rn.f32.s16 %r22099, %rs770; 2026-02-21T09:05:41.1402684Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1402745Z cvt.s16.s8 %rs777, %rs763; 2026-02-21T09:05:41.1402821Z shr.s16 %rs778, %rs777, 4; 2026-02-21T09:05:41.1402883Z cvt.s16.s8 %rs779, %rs764; 2026-02-21T09:05:41.1402943Z shr.s16 %rs780, %rs779, 4; 2026-02-21T09:05:41.1403011Z prmt.b32 %r22100, %r22084, 0, 0x9991U; 2026-02-21T09:05:41.1403076Z cvt.u16.u32 
%rs781, %r22100; 2026-02-21T09:05:41.1403137Z shr.s16 %rs782, %rs781, 4; 2026-02-21T09:05:41.1403202Z prmt.b32 %r22101, %r22089, 0, 0x9991U; 2026-02-21T09:05:41.1403265Z cvt.u16.u32 %rs783, %r22101; 2026-02-21T09:05:41.1403328Z shr.s16 %rs784, %rs783, 4; 2026-02-21T09:05:41.1403523Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1403590Z cvt.rn.f32.s16 %r22102, %rs784; 2026-02-21T09:05:41.1403660Z cvt.rn.f32.s16 %r22103, %rs782; 2026-02-21T09:05:41.1403720Z cvt.rn.f32.s16 %r22104, %rs780; 2026-02-21T09:05:41.1403782Z cvt.rn.f32.s16 %r22105, %rs778; 2026-02-21T09:05:41.1403979Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1404039Z cvt.s16.s8 %rs785, %rs765; 2026-02-21T09:05:41.1404107Z shr.s16 %rs786, %rs785, 4; 2026-02-21T09:05:41.1404179Z cvt.s16.s8 %rs787, %rs766; 2026-02-21T09:05:41.1404244Z shr.s16 %rs788, %rs787, 4; 2026-02-21T09:05:41.1404314Z prmt.b32 %r22106, %r22084, 0, 0xaaa2U; 2026-02-21T09:05:41.1404373Z cvt.u16.u32 %rs789, %r22106; 2026-02-21T09:05:41.1404432Z shr.s16 %rs790, %rs789, 4; 2026-02-21T09:05:41.1404501Z prmt.b32 %r22107, %r22089, 0, 0xaaa2U; 2026-02-21T09:05:41.1404566Z cvt.u16.u32 %rs791, %r22107; 2026-02-21T09:05:41.1404626Z shr.s16 %rs792, %rs791, 4; 2026-02-21T09:05:41.1404821Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1404948Z cvt.rn.f32.s16 %r22108, %rs792; 2026-02-21T09:05:41.1405008Z cvt.rn.f32.s16 %r22109, %rs790; 2026-02-21T09:05:41.1405068Z cvt.rn.f32.s16 %r22110, %rs788; 2026-02-21T09:05:41.1405130Z cvt.rn.f32.s16 %r22111, %rs786; 2026-02-21T09:05:41.1405337Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1405400Z cvt.s16.s8 %rs793, %rs767; 2026-02-21T09:05:41.1405462Z shr.s16 %rs794, %rs793, 4; 2026-02-21T09:05:41.1405528Z cvt.s16.s8 %rs795, %rs768; 2026-02-21T09:05:41.1405588Z shr.s16 %rs796, %rs795, 4; 
2026-02-21T09:05:41.1405652Z prmt.b32 %r22112, %r22084, 0, 0xbbb3U; 2026-02-21T09:05:41.1405765Z cvt.u16.u32 %rs797, %r22112; 2026-02-21T09:05:41.1405872Z shr.s16 %rs798, %rs797, 4; 2026-02-21T09:05:41.1405939Z prmt.b32 %r22113, %r22089, 0, 0xbbb3U; 2026-02-21T09:05:41.1405999Z cvt.u16.u32 %rs799, %r22113; 2026-02-21T09:05:41.1406065Z shr.s16 %rs800, %rs799, 4; 2026-02-21T09:05:41.1406259Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1406322Z cvt.rn.f32.s16 %r22114, %rs800; 2026-02-21T09:05:41.1406387Z cvt.rn.f32.s16 %r22115, %rs798; 2026-02-21T09:05:41.1406563Z cvt.rn.f32.s16 %r22116, %rs796; 2026-02-21T09:05:41.1406630Z cvt.rn.f32.s16 %r22117, %rs794; 2026-02-21T09:05:41.1406686Z bar.sync 0; 2026-02-21T09:05:41.1406812Z st.shared.v4.b32 [%r24], {%r22099, %r22097, %r22098, %r22096}; 2026-02-21T09:05:41.1406923Z st.shared.v4.b32 [%r25], {%r22105, %r22103, %r22104, %r22102}; 2026-02-21T09:05:41.1407030Z st.shared.v4.b32 [%r26], {%r22111, %r22109, %r22110, %r22108}; 2026-02-21T09:05:41.1407139Z st.shared.v4.b32 [%r27], {%r22117, %r22115, %r22116, %r22114}; 2026-02-21T09:05:41.1407275Z $L__tmp19: 2026-02-21T09:05:41.1407554Z .loc 2 291 36 // standard.py:291:36 @[ c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:91:40 ] 2026-02-21T09:05:41.1407619Z // begin inline asm 2026-02-21T09:05:41.1407696Z fence.proxy.async.shared::cta; 2026-02-21T09:05:41.1407751Z // end inline asm 2026-02-21T09:05:41.1407805Z bar.sync 0; 2026-02-21T09:05:41.1407881Z wgmma.fence.sync.aligned; 2026-02-21T09:05:41.1407939Z // begin inline asm 2026-02-21T09:05:41.1409405Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29610,%r29611,%r29612,%r29613,%r29614,%r29615,%r29616,%r29617,%r29618,%r29619,%r29620,%r29621,%r29622,%r29623,%r29624,%r29625,%r29626,%r29627,%r29628,%r29629,%r29630,%r29631,%r29632,%r29633,%r29634,%r29635,%r29636,%r29637,%r29638,%r29639,%r29640,%r29641,%r29642,%r29643,%r29644,%r29645,%r29646,%r29647,%r29648,%r29649,%r29650,%r29651,%r29652,%r29653,%r29654,%r29655,%r29656,%r29657,%r29658,%r29659,%r29660,%r29661,%r29662,%r29663,%r29664,%r29665,%r29666,%r29667,%r29668,%r29669,%r29670,%r29671,%r29672,%r29673}, {%r17407,%r17408,%r17409,%r17410}, %rd2, %p92, 1, 1; 2026-02-21T09:05:41.1409468Z // end inline asm 2026-02-21T09:05:41.1409527Z // begin inline asm 2026-02-21T09:05:41.1410994Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29610,%r29611,%r29612,%r29613,%r29614,%r29615,%r29616,%r29617,%r29618,%r29619,%r29620,%r29621,%r29622,%r29623,%r29624,%r29625,%r29626,%r29627,%r29628,%r29629,%r29630,%r29631,%r29632,%r29633,%r29634,%r29635,%r29636,%r29637,%r29638,%r29639,%r29640,%r29641,%r29642,%r29643,%r29644,%r29645,%r29646,%r29647,%r29648,%r29649,%r29650,%r29651,%r29652,%r29653,%r29654,%r29655,%r29656,%r29657,%r29658,%r29659,%r29660,%r29661,%r29662,%r29663,%r29664,%r29665,%r29666,%r29667,%r29668,%r29669,%r29670,%r29671,%r29672,%r29673}, {%r17539,%r17540,%r17541,%r17542}, %rd3, %p92, 1, 1; 2026-02-21T09:05:41.1411054Z // end inline asm 2026-02-21T09:05:41.1411113Z // begin inline asm 2026-02-21T09:05:41.1412579Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29674,%r29675,%r29676,%r29677,%r29678,%r29679,%r29680,%r29681,%r29682,%r29683,%r29684,%r29685,%r29686,%r29687,%r29688,%r29689,%r29690,%r29691,%r29692,%r29693,%r29694,%r29695,%r29696,%r29697,%r29698,%r29699,%r29700,%r29701,%r29702,%r29703,%r29704,%r29705,%r29706,%r29707,%r29708,%r29709,%r29710,%r29711,%r29712,%r29713,%r29714,%r29715,%r29716,%r29717,%r29718,%r29719,%r29720,%r29721,%r29722,%r29723,%r29724,%r29725,%r29726,%r29727,%r29728,%r29729,%r29730,%r29731,%r29732,%r29733,%r29734,%r29735,%r29736,%r29737}, {%r17671,%r17672,%r17673,%r17674}, %rd2, %p92, 1, 1; 2026-02-21T09:05:41.1412708Z // end inline asm 2026-02-21T09:05:41.1412768Z // begin inline asm 2026-02-21T09:05:41.1414279Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29674,%r29675,%r29676,%r29677,%r29678,%r29679,%r29680,%r29681,%r29682,%r29683,%r29684,%r29685,%r29686,%r29687,%r29688,%r29689,%r29690,%r29691,%r29692,%r29693,%r29694,%r29695,%r29696,%r29697,%r29698,%r29699,%r29700,%r29701,%r29702,%r29703,%r29704,%r29705,%r29706,%r29707,%r29708,%r29709,%r29710,%r29711,%r29712,%r29713,%r29714,%r29715,%r29716,%r29717,%r29718,%r29719,%r29720,%r29721,%r29722,%r29723,%r29724,%r29725,%r29726,%r29727,%r29728,%r29729,%r29730,%r29731,%r29732,%r29733,%r29734,%r29735,%r29736,%r29737}, {%r17803,%r17804,%r17805,%r17806}, %rd3, %p92, 1, 1; 2026-02-21T09:05:41.1414415Z // end inline asm 2026-02-21T09:05:41.1414475Z // begin inline asm 2026-02-21T09:05:41.1415970Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29738,%r29739,%r29740,%r29741,%r29742,%r29743,%r29744,%r29745,%r29746,%r29747,%r29748,%r29749,%r29750,%r29751,%r29752,%r29753,%r29754,%r29755,%r29756,%r29757,%r29758,%r29759,%r29760,%r29761,%r29762,%r29763,%r29764,%r29765,%r29766,%r29767,%r29768,%r29769,%r29770,%r29771,%r29772,%r29773,%r29774,%r29775,%r29776,%r29777,%r29778,%r29779,%r29780,%r29781,%r29782,%r29783,%r29784,%r29785,%r29786,%r29787,%r29788,%r29789,%r29790,%r29791,%r29792,%r29793,%r29794,%r29795,%r29796,%r29797,%r29798,%r29799,%r29800,%r29801}, {%r17935,%r17936,%r17937,%r17938}, %rd2, %p92, 1, 1; 2026-02-21T09:05:41.1416036Z // end inline asm 2026-02-21T09:05:41.1416093Z // begin inline asm 2026-02-21T09:05:41.1417652Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29738,%r29739,%r29740,%r29741,%r29742,%r29743,%r29744,%r29745,%r29746,%r29747,%r29748,%r29749,%r29750,%r29751,%r29752,%r29753,%r29754,%r29755,%r29756,%r29757,%r29758,%r29759,%r29760,%r29761,%r29762,%r29763,%r29764,%r29765,%r29766,%r29767,%r29768,%r29769,%r29770,%r29771,%r29772,%r29773,%r29774,%r29775,%r29776,%r29777,%r29778,%r29779,%r29780,%r29781,%r29782,%r29783,%r29784,%r29785,%r29786,%r29787,%r29788,%r29789,%r29790,%r29791,%r29792,%r29793,%r29794,%r29795,%r29796,%r29797,%r29798,%r29799,%r29800,%r29801}, {%r18067,%r18068,%r18069,%r18070}, %rd3, %p92, 1, 1; 2026-02-21T09:05:41.1417717Z // end inline asm 2026-02-21T09:05:41.1417774Z // begin inline asm 2026-02-21T09:05:41.1419227Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29802,%r29803,%r29804,%r29805,%r29806,%r29807,%r29808,%r29809,%r29810,%r29811,%r29812,%r29813,%r29814,%r29815,%r29816,%r29817,%r29818,%r29819,%r29820,%r29821,%r29822,%r29823,%r29824,%r29825,%r29826,%r29827,%r29828,%r29829,%r29830,%r29831,%r29832,%r29833,%r29834,%r29835,%r29836,%r29837,%r29838,%r29839,%r29840,%r29841,%r29842,%r29843,%r29844,%r29845,%r29846,%r29847,%r29848,%r29849,%r29850,%r29851,%r29852,%r29853,%r29854,%r29855,%r29856,%r29857,%r29858,%r29859,%r29860,%r29861,%r29862,%r29863,%r29864,%r29865}, {%r18199,%r18200,%r18201,%r18202}, %rd2, %p92, 1, 1; 2026-02-21T09:05:41.1419303Z // end inline asm 2026-02-21T09:05:41.1419364Z // begin inline asm 2026-02-21T09:05:41.1420823Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29802,%r29803,%r29804,%r29805,%r29806,%r29807,%r29808,%r29809,%r29810,%r29811,%r29812,%r29813,%r29814,%r29815,%r29816,%r29817,%r29818,%r29819,%r29820,%r29821,%r29822,%r29823,%r29824,%r29825,%r29826,%r29827,%r29828,%r29829,%r29830,%r29831,%r29832,%r29833,%r29834,%r29835,%r29836,%r29837,%r29838,%r29839,%r29840,%r29841,%r29842,%r29843,%r29844,%r29845,%r29846,%r29847,%r29848,%r29849,%r29850,%r29851,%r29852,%r29853,%r29854,%r29855,%r29856,%r29857,%r29858,%r29859,%r29860,%r29861,%r29862,%r29863,%r29864,%r29865}, {%r18331,%r18332,%r18333,%r18334}, %rd3, %p92, 1, 1; 2026-02-21T09:05:41.1420956Z // end inline asm 2026-02-21T09:05:41.1421032Z wgmma.commit_group.sync.aligned; 2026-02-21T09:05:41.1421096Z mov.b32 %r18591, %r24001; 2026-02-21T09:05:41.1421155Z mov.b32 %r18592, %r21777; 2026-02-21T09:05:41.1421213Z mov.b32 %r18593, %r21777; 2026-02-21T09:05:41.1421270Z // begin inline asm 2026-02-21T09:05:41.1426238Z // wait for regs: 
%r29610,%r29611,%r29612,%r29613,%r29614,%r29615,%r29616,%r29617,%r29618,%r29619,%r29620,%r29621,%r29622,%r29623,%r29624,%r29625,%r29626,%r29627,%r29628,%r29629,%r29630,%r29631,%r29632,%r29633,%r29634,%r29635,%r29636,%r29637,%r29638,%r29639,%r29640,%r29641,%r29642,%r29643,%r29644,%r29645,%r29646,%r29647,%r29648,%r29649,%r29650,%r29651,%r29652,%r29653,%r29654,%r29655,%r29656,%r29657,%r29658,%r29659,%r29660,%r29661,%r29662,%r29663,%r29664,%r29665,%r29666,%r29667,%r29668,%r29669,%r29670,%r29671,%r29672,%r29673,%r29674,%r29675,%r29676,%r29677,%r29678,%r29679,%r29680,%r29681,%r29682,%r29683,%r29684,%r29685,%r29686,%r29687,%r29688,%r29689,%r29690,%r29691,%r29692,%r29693,%r29694,%r29695,%r29696,%r29697,%r29698,%r29699,%r29700,%r29701,%r29702,%r29703,%r29704,%r29705,%r29706,%r29707,%r29708,%r29709,%r29710,%r29711,%r29712,%r29713,%r29714,%r29715,%r29716,%r29717,%r29718,%r29719,%r29720,%r29721,%r29722,%r29723,%r29724,%r29725,%r29726,%r29727,%r29728,%r29729,%r29730,%r29731,%r29732,%r29733,%r29734,%r29735,%r29736,%r29737,%r29738,%r29739,%r29740,%r29741,%r29742,%r29743,%r29744,%r29745,%r29746,%r29747,%r29748,%r29749,%r29750,%r29751,%r29752,%r29753,%r29754,%r29755,%r29756,%r29757,%r29758,%r29759,%r29760,%r29761,%r29762,%r29763,%r29764,%r29765,%r29766,%r29767,%r29768,%r29769,%r29770,%r29771,%r29772,%r29773,%r29774,%r29775,%r29776,%r29777,%r29778,%r29779,%r29780,%r29781,%r29782,%r29783,%r29784,%r29785,%r29786,%r29787,%r29788,%r29789,%r29790,%r29791,%r29792,%r29793,%r29794,%r29795,%r29796,%r29797,%r29798,%r29799,%r29800,%r29801,%r29802,%r29803,%r29804,%r29805,%r29806,%r29807,%r29808,%r29809,%r29810,%r29811,%r29812,%r29813,%r29814,%r29815,%r29816,%r29817,%r29818,%r29819,%r29820,%r29821,%r29822,%r29823,%r29824,%r29825,%r29826,%r29827,%r29828,%r29829,%r29830,%r29831,%r29832,%r29833,%r29834,%r29835,%r29836,%r29837,%r29838,%r29839,%r29840,%r29841,%r29842,%r29843,%r29844,%r29845,%r29846,%r29847,%r29848,%r29849,%r29850,%r29851,%r29852,%r29853,%r29854,%r29855,%r29856,%r29857,%r29858,%r29859,
%r29860,%r29861,%r29862,%r29863,%r29864,%r29865,%r18591,%r18592,%r18593 2026-02-21T09:05:41.1426392Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:05:41.1426554Z // end inline asm 2026-02-21T09:05:41.1426618Z $L__tmp20: 2026-02-21T09:05:41.1426835Z .loc 1 55 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:32 2026-02-21T09:05:41.1426904Z add.s64 %rd211, %rd185, 64; 2026-02-21T09:05:41.1426969Z add.s64 %rd212, %rd186, 64; 2026-02-21T09:05:41.1427034Z add.s64 %rd213, %rd187, 64; 2026-02-21T09:05:41.1427231Z .loc 1 55 80 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:80 2026-02-21T09:05:41.1427294Z add.s64 %rd214, %rd188, 64; 2026-02-21T09:05:41.1427357Z // begin inline asm 2026-02-21T09:05:41.1427415Z mov.u32 %r18853, 0x0; 2026-02-21T09:05:41.1427473Z mov.u32 %r18854, 0x0; 2026-02-21T09:05:41.1427534Z mov.u32 %r18855, 0x0; 2026-02-21T09:05:41.1427591Z mov.u32 %r18856, 0x0; 2026-02-21T09:05:41.1427726Z ld.global.v4.b32 { %r18853, %r18854, %r18855, %r18856 }, [ %rd211 + 0 ]; 2026-02-21T09:05:41.1427782Z // end inline asm 2026-02-21T09:05:41.1427844Z // begin inline asm 2026-02-21T09:05:41.1427898Z mov.u32 %r18857, 0x0; 2026-02-21T09:05:41.1427954Z mov.u32 %r18858, 0x0; 2026-02-21T09:05:41.1428025Z mov.u32 %r18859, 0x0; 2026-02-21T09:05:41.1428082Z mov.u32 %r18860, 0x0; 2026-02-21T09:05:41.1428214Z ld.global.v4.b32 { %r18857, %r18858, %r18859, %r18860 }, [ %rd212 + 0 ]; 2026-02-21T09:05:41.1428340Z // end inline asm 2026-02-21T09:05:41.1428406Z // begin inline asm 2026-02-21T09:05:41.1428462Z mov.u32 %r18861, 0x0; 2026-02-21T09:05:41.1428600Z mov.u32 %r18862, 0x0; 2026-02-21T09:05:41.1428659Z mov.u32 %r18863, 0x0; 2026-02-21T09:05:41.1428716Z mov.u32 %r18864, 0x0; 2026-02-21T09:05:41.1428839Z ld.global.v4.b32 { %r18861, %r18862, %r18863, %r18864 }, [ %rd213 + 0 ]; 2026-02-21T09:05:41.1428893Z // end inline asm 2026-02-21T09:05:41.1428955Z // begin inline asm 2026-02-21T09:05:41.1429013Z mov.u32 %r18865, 0x0; 2026-02-21T09:05:41.1429069Z 
mov.u32 %r18866, 0x0; 2026-02-21T09:05:41.1429129Z mov.u32 %r18867, 0x0; 2026-02-21T09:05:41.1429184Z mov.u32 %r18868, 0x0; 2026-02-21T09:05:41.1429304Z ld.global.v4.b32 { %r18865, %r18866, %r18867, %r18868 }, [ %rd214 + 0 ]; 2026-02-21T09:05:41.1429359Z // end inline asm 2026-02-21T09:05:41.1429632Z .loc 1 59 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:59:32 2026-02-21T09:05:41.1429762Z bar.sync 0; 2026-02-21T09:05:41.1429847Z st.shared.v2.b32 [%r10], {%r18853, %r18854}; 2026-02-21T09:05:41.1429951Z st.shared.v2.b32 [%r10+2048], {%r18857, %r18858}; 2026-02-21T09:05:41.1430041Z st.shared.v2.b32 [%r10+4096], {%r18861, %r18862}; 2026-02-21T09:05:41.1430125Z st.shared.v2.b32 [%r10+6144], {%r18865, %r18866}; 2026-02-21T09:05:41.1430206Z st.shared.v2.b32 [%r11], {%r18855, %r18856}; 2026-02-21T09:05:41.1430286Z st.shared.v2.b32 [%r11+2048], {%r18859, %r18860}; 2026-02-21T09:05:41.1430368Z st.shared.v2.b32 [%r11+4096], {%r18863, %r18864}; 2026-02-21T09:05:41.1430450Z st.shared.v2.b32 [%r11+6144], {%r18867, %r18868}; 2026-02-21T09:05:41.1430513Z bar.sync 0; 2026-02-21T09:05:41.1430580Z ld.shared.b16 %rs801, [%r12]; 2026-02-21T09:05:41.1430647Z ld.shared.b16 %rs802, [%r12+256]; 2026-02-21T09:05:41.1430716Z ld.shared.b16 %rs803, [%r12+16]; 2026-02-21T09:05:41.1430781Z ld.shared.b16 %rs804, [%r12+272]; 2026-02-21T09:05:41.1430850Z ld.shared.b16 %rs805, [%r12+2048]; 2026-02-21T09:05:41.1430982Z ld.shared.b16 %rs806, [%r12+2304]; 2026-02-21T09:05:41.1431050Z ld.shared.b16 %rs807, [%r12+2064]; 2026-02-21T09:05:41.1431113Z ld.shared.b16 %rs808, [%r12+2320]; 2026-02-21T09:05:41.1431174Z ld.shared.b16 %rs809, [%r12+4096]; 2026-02-21T09:05:41.1431238Z ld.shared.b16 %rs810, [%r12+4352]; 2026-02-21T09:05:41.1431300Z ld.shared.b16 %rs811, [%r12+4112]; 2026-02-21T09:05:41.1431363Z ld.shared.b16 %rs812, [%r12+4368]; 2026-02-21T09:05:41.1431426Z ld.shared.b16 %rs813, [%r12+6144]; 2026-02-21T09:05:41.1431489Z ld.shared.b16 %rs814, [%r12+6400]; 2026-02-21T09:05:41.1431549Z 
ld.shared.b16 %rs815, [%r12+6160]; 2026-02-21T09:05:41.1431611Z ld.shared.b16 %rs816, [%r12+6416]; 2026-02-21T09:05:41.1431678Z ld.shared.b16 %rs817, [%r13]; 2026-02-21T09:05:41.1431740Z ld.shared.b16 %rs818, [%r13+256]; 2026-02-21T09:05:41.1431804Z ld.shared.b16 %rs819, [%r13+16]; 2026-02-21T09:05:41.1431872Z ld.shared.b16 %rs820, [%r13+272]; 2026-02-21T09:05:41.1431938Z ld.shared.b16 %rs821, [%r13+2048]; 2026-02-21T09:05:41.1432001Z ld.shared.b16 %rs822, [%r13+2304]; 2026-02-21T09:05:41.1432066Z ld.shared.b16 %rs823, [%r13+2064]; 2026-02-21T09:05:41.1432129Z ld.shared.b16 %rs824, [%r13+2320]; 2026-02-21T09:05:41.1432192Z ld.shared.b16 %rs825, [%r13+4096]; 2026-02-21T09:05:41.1432253Z ld.shared.b16 %rs826, [%r13+4352]; 2026-02-21T09:05:41.1432319Z ld.shared.b16 %rs827, [%r13+4112]; 2026-02-21T09:05:41.1432381Z ld.shared.b16 %rs828, [%r13+4368]; 2026-02-21T09:05:41.1432443Z ld.shared.b16 %rs829, [%r13+6144]; 2026-02-21T09:05:41.1432509Z ld.shared.b16 %rs830, [%r13+6400]; 2026-02-21T09:05:41.1432572Z ld.shared.b16 %rs831, [%r13+6160]; 2026-02-21T09:05:41.1432632Z ld.shared.b16 %rs832, [%r13+6416]; 2026-02-21T09:05:41.1432694Z cvt.f32.bf16 %r18999, %rs801; 2026-02-21T09:05:41.1432759Z cvt.f32.bf16 %r19000, %rs802; 2026-02-21T09:05:41.1432818Z cvt.f32.bf16 %r19001, %rs817; 2026-02-21T09:05:41.1432882Z cvt.f32.bf16 %r19002, %rs818; 2026-02-21T09:05:41.1432958Z cvt.f32.bf16 %r19131, %rs803; 2026-02-21T09:05:41.1433021Z cvt.f32.bf16 %r19132, %rs804; 2026-02-21T09:05:41.1433080Z cvt.f32.bf16 %r19133, %rs819; 2026-02-21T09:05:41.1433194Z cvt.f32.bf16 %r19134, %rs820; 2026-02-21T09:05:41.1433258Z cvt.f32.bf16 %r19263, %rs805; 2026-02-21T09:05:41.1433318Z cvt.f32.bf16 %r19264, %rs806; 2026-02-21T09:05:41.1433377Z cvt.f32.bf16 %r19265, %rs821; 2026-02-21T09:05:41.1433441Z cvt.f32.bf16 %r19266, %rs822; 2026-02-21T09:05:41.1433500Z cvt.f32.bf16 %r19395, %rs807; 2026-02-21T09:05:41.1433560Z cvt.f32.bf16 %r19396, %rs808; 2026-02-21T09:05:41.1433628Z cvt.f32.bf16 %r19397, %rs823; 
2026-02-21T09:05:41.1433688Z cvt.f32.bf16 %r19398, %rs824; 2026-02-21T09:05:41.1433747Z cvt.f32.bf16 %r19527, %rs809; 2026-02-21T09:05:41.1433806Z cvt.f32.bf16 %r19528, %rs810; 2026-02-21T09:05:41.1433869Z cvt.f32.bf16 %r19529, %rs825; 2026-02-21T09:05:41.1433928Z cvt.f32.bf16 %r19530, %rs826; 2026-02-21T09:05:41.1434079Z cvt.f32.bf16 %r19659, %rs811; 2026-02-21T09:05:41.1434153Z cvt.f32.bf16 %r19660, %rs812; 2026-02-21T09:05:41.1434214Z cvt.f32.bf16 %r19661, %rs827; 2026-02-21T09:05:41.1434272Z cvt.f32.bf16 %r19662, %rs828; 2026-02-21T09:05:41.1434333Z cvt.f32.bf16 %r19791, %rs813; 2026-02-21T09:05:41.1434396Z cvt.f32.bf16 %r19792, %rs814; 2026-02-21T09:05:41.1434457Z cvt.f32.bf16 %r19793, %rs829; 2026-02-21T09:05:41.1434517Z cvt.f32.bf16 %r19794, %rs830; 2026-02-21T09:05:41.1434577Z cvt.f32.bf16 %r19923, %rs815; 2026-02-21T09:05:41.1434637Z cvt.f32.bf16 %r19924, %rs816; 2026-02-21T09:05:41.1434696Z cvt.f32.bf16 %r19925, %rs831; 2026-02-21T09:05:41.1434757Z cvt.f32.bf16 %r19926, %rs832; 2026-02-21T09:05:41.1434974Z .loc 1 61 34 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:34 2026-02-21T09:05:41.1435039Z add.s32 %r22118, %r29609, 131072; 2026-02-21T09:05:41.1435103Z cvt.s64.s32 %rd239, %r22118; 2026-02-21T09:05:41.1435187Z add.s64 %rd215, %rd45, %rd239; 2026-02-21T09:05:41.1435445Z .loc 1 61 87 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:87 2026-02-21T09:05:41.1435508Z // begin inline asm 2026-02-21T09:05:41.1435573Z mov.u32 %r18869, 0x0; 2026-02-21T09:05:41.1435630Z mov.u32 %r18870, 0x0; 2026-02-21T09:05:41.1435730Z ld.global.v2.b32 { %r18869, %r18870 }, [ %rd215 + 0 ]; 2026-02-21T09:05:41.1435786Z // end inline asm 2026-02-21T09:05:41.1435989Z .loc 1 69 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:69:28 2026-02-21T09:05:41.1436044Z bar.sync 0; 2026-02-21T09:05:41.1436108Z st.shared.b8 [%r14], %r18869; 2026-02-21T09:05:41.1436181Z prmt.b32 %r22119, %r18869, 0, 0x7771U; 2026-02-21T09:05:41.1436243Z st.shared.b8 
[%r15], %r22119; 2026-02-21T09:05:41.1436311Z prmt.b32 %r22120, %r18869, 0, 0x7772U; 2026-02-21T09:05:41.1436375Z st.shared.b8 [%r16+256], %r22120; 2026-02-21T09:05:41.1436442Z prmt.b32 %r22121, %r18869, 0, 0x7773U; 2026-02-21T09:05:41.1436638Z st.shared.b8 [%r17+256], %r22121; 2026-02-21T09:05:41.1436704Z st.shared.b8 [%r18+512], %r18870; 2026-02-21T09:05:41.1436774Z prmt.b32 %r22122, %r18870, 0, 0x7771U; 2026-02-21T09:05:41.1436836Z st.shared.b8 [%r19+512], %r22122; 2026-02-21T09:05:41.1436904Z prmt.b32 %r22123, %r18870, 0, 0x7772U; 2026-02-21T09:05:41.1436971Z st.shared.b8 [%r20+768], %r22123; 2026-02-21T09:05:41.1437036Z prmt.b32 %r22124, %r18870, 0, 0x7773U; 2026-02-21T09:05:41.1437099Z st.shared.b8 [%r21+768], %r22124; 2026-02-21T09:05:41.1437154Z bar.sync 0; 2026-02-21T09:05:41.1437225Z ld.shared.b32 %r22125, [%r22]; 2026-02-21T09:05:41.1437287Z prmt.b32 %r22126, %r22125, 0, 0x7770U; 2026-02-21T09:05:41.1437350Z cvt.u16.u32 %rs833, %r22126; 2026-02-21T09:05:41.1437415Z prmt.b32 %r22127, %r22125, 0, 0x7771U; 2026-02-21T09:05:41.1437476Z cvt.u16.u32 %rs834, %r22127; 2026-02-21T09:05:41.1437539Z prmt.b32 %r22128, %r22125, 0, 0x7772U; 2026-02-21T09:05:41.1437599Z cvt.u16.u32 %rs835, %r22128; 2026-02-21T09:05:41.1437671Z prmt.b32 %r22129, %r22125, 0, 0x7773U; 2026-02-21T09:05:41.1437736Z cvt.u16.u32 %rs836, %r22129; 2026-02-21T09:05:41.1437800Z ld.shared.b32 %r22130, [%r23]; 2026-02-21T09:05:41.1437867Z prmt.b32 %r22131, %r22130, 0, 0x7770U; 2026-02-21T09:05:41.1438008Z cvt.u16.u32 %rs837, %r22131; 2026-02-21T09:05:41.1438071Z prmt.b32 %r22132, %r22130, 0, 0x7771U; 2026-02-21T09:05:41.1438147Z cvt.u16.u32 %rs838, %r22132; 2026-02-21T09:05:41.1438214Z prmt.b32 %r22133, %r22130, 0, 0x7772U; 2026-02-21T09:05:41.1438274Z cvt.u16.u32 %rs839, %r22133; 2026-02-21T09:05:41.1438338Z prmt.b32 %r22134, %r22130, 0, 0x7773U; 2026-02-21T09:05:41.1438401Z cvt.u16.u32 %rs840, %r22134; 2026-02-21T09:05:41.1438600Z .loc 1 64 28 // 
c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:64:28 2026-02-21T09:05:41.1438663Z shl.b16 %rs841, %rs833, 4; 2026-02-21T09:05:41.1438730Z shl.b16 %rs842, %rs837, 4; 2026-02-21T09:05:41.1438791Z shl.b16 %rs843, %rs834, 4; 2026-02-21T09:05:41.1439004Z shl.b16 %rs844, %rs838, 4; 2026-02-21T09:05:41.1439070Z shl.b16 %rs845, %rs835, 4; 2026-02-21T09:05:41.1439136Z shl.b16 %rs846, %rs839, 4; 2026-02-21T09:05:41.1439195Z shl.b16 %rs847, %rs836, 4; 2026-02-21T09:05:41.1439258Z shl.b16 %rs848, %rs840, 4; 2026-02-21T09:05:41.1439459Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1439520Z cvt.s16.s8 %rs849, %rs841; 2026-02-21T09:05:41.1439579Z shr.s16 %rs850, %rs849, 4; 2026-02-21T09:05:41.1439643Z cvt.s16.s8 %rs851, %rs842; 2026-02-21T09:05:41.1439700Z shr.s16 %rs852, %rs851, 4; 2026-02-21T09:05:41.1439764Z prmt.b32 %r22135, %r22125, 0, 0x8880U; 2026-02-21T09:05:41.1439825Z cvt.u16.u32 %rs853, %r22135; 2026-02-21T09:05:41.1439889Z shr.s16 %rs854, %rs853, 4; 2026-02-21T09:05:41.1439952Z prmt.b32 %r22136, %r22130, 0, 0x8880U; 2026-02-21T09:05:41.1440011Z cvt.u16.u32 %rs855, %r22136; 2026-02-21T09:05:41.1440078Z shr.s16 %rs856, %rs855, 4; 2026-02-21T09:05:41.1440340Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1440409Z cvt.rn.f32.s16 %r22137, %rs856; 2026-02-21T09:05:41.1440472Z cvt.rn.f32.s16 %r22138, %rs854; 2026-02-21T09:05:41.1440541Z cvt.rn.f32.s16 %r22139, %rs852; 2026-02-21T09:05:41.1440602Z cvt.rn.f32.s16 %r22140, %rs850; 2026-02-21T09:05:41.1440796Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1440860Z cvt.s16.s8 %rs857, %rs843; 2026-02-21T09:05:41.1440920Z shr.s16 %rs858, %rs857, 4; 2026-02-21T09:05:41.1440990Z cvt.s16.s8 %rs859, %rs844; 2026-02-21T09:05:41.1441055Z shr.s16 %rs860, %rs859, 4; 2026-02-21T09:05:41.1441122Z prmt.b32 %r22141, %r22125, 0, 0x9991U; 2026-02-21T09:05:41.1441183Z cvt.u16.u32 
%rs861, %r22141; 2026-02-21T09:05:41.1441241Z shr.s16 %rs862, %rs861, 4; 2026-02-21T09:05:41.1441311Z prmt.b32 %r22142, %r22130, 0, 0x9991U; 2026-02-21T09:05:41.1441374Z cvt.u16.u32 %rs863, %r22142; 2026-02-21T09:05:41.1441436Z shr.s16 %rs864, %rs863, 4; 2026-02-21T09:05:41.1441637Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1441703Z cvt.rn.f32.s16 %r22143, %rs864; 2026-02-21T09:05:41.1441767Z cvt.rn.f32.s16 %r22144, %rs862; 2026-02-21T09:05:41.1441827Z cvt.rn.f32.s16 %r22145, %rs860; 2026-02-21T09:05:41.1441893Z cvt.rn.f32.s16 %r22146, %rs858; 2026-02-21T09:05:41.1442097Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1442159Z cvt.s16.s8 %rs865, %rs845; 2026-02-21T09:05:41.1442222Z shr.s16 %rs866, %rs865, 4; 2026-02-21T09:05:41.1442280Z cvt.s16.s8 %rs867, %rs846; 2026-02-21T09:05:41.1442340Z shr.s16 %rs868, %rs867, 4; 2026-02-21T09:05:41.1442409Z prmt.b32 %r22147, %r22125, 0, 0xaaa2U; 2026-02-21T09:05:41.1442470Z cvt.u16.u32 %rs869, %r22147; 2026-02-21T09:05:41.1442529Z shr.s16 %rs870, %rs869, 4; 2026-02-21T09:05:41.1442598Z prmt.b32 %r22148, %r22130, 0, 0xaaa2U; 2026-02-21T09:05:41.1442663Z cvt.u16.u32 %rs871, %r22148; 2026-02-21T09:05:41.1442721Z shr.s16 %rs872, %rs871, 4; 2026-02-21T09:05:41.1442916Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1443040Z cvt.rn.f32.s16 %r22149, %rs872; 2026-02-21T09:05:41.1443102Z cvt.rn.f32.s16 %r22150, %rs870; 2026-02-21T09:05:41.1443162Z cvt.rn.f32.s16 %r22151, %rs868; 2026-02-21T09:05:41.1443223Z cvt.rn.f32.s16 %r22152, %rs866; 2026-02-21T09:05:41.1443423Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1443483Z cvt.s16.s8 %rs873, %rs847; 2026-02-21T09:05:41.1443542Z shr.s16 %rs874, %rs873, 4; 2026-02-21T09:05:41.1443606Z cvt.s16.s8 %rs875, %rs848; 2026-02-21T09:05:41.1443664Z shr.s16 %rs876, %rs875, 4; 
2026-02-21T09:05:41.1443731Z prmt.b32 %r22153, %r22125, 0, 0xbbb3U; 2026-02-21T09:05:41.1443903Z cvt.u16.u32 %rs877, %r22153; 2026-02-21T09:05:41.1443966Z shr.s16 %rs878, %rs877, 4; 2026-02-21T09:05:41.1444033Z prmt.b32 %r22154, %r22130, 0, 0xbbb3U; 2026-02-21T09:05:41.1444094Z cvt.u16.u32 %rs879, %r22154; 2026-02-21T09:05:41.1444157Z shr.s16 %rs880, %rs879, 4; 2026-02-21T09:05:41.1444352Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1444416Z cvt.rn.f32.s16 %r22155, %rs880; 2026-02-21T09:05:41.1444485Z cvt.rn.f32.s16 %r22156, %rs878; 2026-02-21T09:05:41.1444545Z cvt.rn.f32.s16 %r22157, %rs876; 2026-02-21T09:05:41.1444605Z cvt.rn.f32.s16 %r22158, %rs874; 2026-02-21T09:05:41.1444663Z bar.sync 0; 2026-02-21T09:05:41.1444778Z st.shared.v4.b32 [%r24], {%r22140, %r22138, %r22139, %r22137}; 2026-02-21T09:05:41.1444888Z st.shared.v4.b32 [%r25], {%r22146, %r22144, %r22145, %r22143}; 2026-02-21T09:05:41.1444994Z st.shared.v4.b32 [%r26], {%r22152, %r22150, %r22151, %r22149}; 2026-02-21T09:05:41.1445109Z st.shared.v4.b32 [%r27], {%r22158, %r22156, %r22157, %r22155}; 2026-02-21T09:05:41.1445212Z $L__tmp21: 2026-02-21T09:05:41.1445489Z .loc 2 291 36 // standard.py:291:36 @[ c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:91:40 ] 2026-02-21T09:05:41.1445557Z // begin inline asm 2026-02-21T09:05:41.1445633Z fence.proxy.async.shared::cta; 2026-02-21T09:05:41.1445694Z // end inline asm 2026-02-21T09:05:41.1445764Z bar.sync 0; 2026-02-21T09:05:41.1445838Z wgmma.fence.sync.aligned; 2026-02-21T09:05:41.1445897Z // begin inline asm 2026-02-21T09:05:41.1447481Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29610,%r29611,%r29612,%r29613,%r29614,%r29615,%r29616,%r29617,%r29618,%r29619,%r29620,%r29621,%r29622,%r29623,%r29624,%r29625,%r29626,%r29627,%r29628,%r29629,%r29630,%r29631,%r29632,%r29633,%r29634,%r29635,%r29636,%r29637,%r29638,%r29639,%r29640,%r29641,%r29642,%r29643,%r29644,%r29645,%r29646,%r29647,%r29648,%r29649,%r29650,%r29651,%r29652,%r29653,%r29654,%r29655,%r29656,%r29657,%r29658,%r29659,%r29660,%r29661,%r29662,%r29663,%r29664,%r29665,%r29666,%r29667,%r29668,%r29669,%r29670,%r29671,%r29672,%r29673}, {%r18999,%r19000,%r19001,%r19002}, %rd2, %p92, 1, 1; 2026-02-21T09:05:41.1447558Z // end inline asm 2026-02-21T09:05:41.1447617Z // begin inline asm 2026-02-21T09:05:41.1449072Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29610,%r29611,%r29612,%r29613,%r29614,%r29615,%r29616,%r29617,%r29618,%r29619,%r29620,%r29621,%r29622,%r29623,%r29624,%r29625,%r29626,%r29627,%r29628,%r29629,%r29630,%r29631,%r29632,%r29633,%r29634,%r29635,%r29636,%r29637,%r29638,%r29639,%r29640,%r29641,%r29642,%r29643,%r29644,%r29645,%r29646,%r29647,%r29648,%r29649,%r29650,%r29651,%r29652,%r29653,%r29654,%r29655,%r29656,%r29657,%r29658,%r29659,%r29660,%r29661,%r29662,%r29663,%r29664,%r29665,%r29666,%r29667,%r29668,%r29669,%r29670,%r29671,%r29672,%r29673}, {%r19131,%r19132,%r19133,%r19134}, %rd3, %p92, 1, 1; 2026-02-21T09:05:41.1449133Z // end inline asm 2026-02-21T09:05:41.1449196Z // begin inline asm 2026-02-21T09:05:41.1450656Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29674,%r29675,%r29676,%r29677,%r29678,%r29679,%r29680,%r29681,%r29682,%r29683,%r29684,%r29685,%r29686,%r29687,%r29688,%r29689,%r29690,%r29691,%r29692,%r29693,%r29694,%r29695,%r29696,%r29697,%r29698,%r29699,%r29700,%r29701,%r29702,%r29703,%r29704,%r29705,%r29706,%r29707,%r29708,%r29709,%r29710,%r29711,%r29712,%r29713,%r29714,%r29715,%r29716,%r29717,%r29718,%r29719,%r29720,%r29721,%r29722,%r29723,%r29724,%r29725,%r29726,%r29727,%r29728,%r29729,%r29730,%r29731,%r29732,%r29733,%r29734,%r29735,%r29736,%r29737}, {%r19263,%r19264,%r19265,%r19266}, %rd2, %p92, 1, 1; 2026-02-21T09:05:41.1450793Z // end inline asm 2026-02-21T09:05:41.1450856Z // begin inline asm 2026-02-21T09:05:41.1452375Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29674,%r29675,%r29676,%r29677,%r29678,%r29679,%r29680,%r29681,%r29682,%r29683,%r29684,%r29685,%r29686,%r29687,%r29688,%r29689,%r29690,%r29691,%r29692,%r29693,%r29694,%r29695,%r29696,%r29697,%r29698,%r29699,%r29700,%r29701,%r29702,%r29703,%r29704,%r29705,%r29706,%r29707,%r29708,%r29709,%r29710,%r29711,%r29712,%r29713,%r29714,%r29715,%r29716,%r29717,%r29718,%r29719,%r29720,%r29721,%r29722,%r29723,%r29724,%r29725,%r29726,%r29727,%r29728,%r29729,%r29730,%r29731,%r29732,%r29733,%r29734,%r29735,%r29736,%r29737}, {%r19395,%r19396,%r19397,%r19398}, %rd3, %p92, 1, 1; 2026-02-21T09:05:41.1452492Z // end inline asm 2026-02-21T09:05:41.1452555Z // begin inline asm 2026-02-21T09:05:41.1454065Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29738,%r29739,%r29740,%r29741,%r29742,%r29743,%r29744,%r29745,%r29746,%r29747,%r29748,%r29749,%r29750,%r29751,%r29752,%r29753,%r29754,%r29755,%r29756,%r29757,%r29758,%r29759,%r29760,%r29761,%r29762,%r29763,%r29764,%r29765,%r29766,%r29767,%r29768,%r29769,%r29770,%r29771,%r29772,%r29773,%r29774,%r29775,%r29776,%r29777,%r29778,%r29779,%r29780,%r29781,%r29782,%r29783,%r29784,%r29785,%r29786,%r29787,%r29788,%r29789,%r29790,%r29791,%r29792,%r29793,%r29794,%r29795,%r29796,%r29797,%r29798,%r29799,%r29800,%r29801}, {%r19527,%r19528,%r19529,%r19530}, %rd2, %p92, 1, 1; 2026-02-21T09:05:41.1454130Z // end inline asm 2026-02-21T09:05:41.1454188Z // begin inline asm 2026-02-21T09:05:41.1455638Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29738,%r29739,%r29740,%r29741,%r29742,%r29743,%r29744,%r29745,%r29746,%r29747,%r29748,%r29749,%r29750,%r29751,%r29752,%r29753,%r29754,%r29755,%r29756,%r29757,%r29758,%r29759,%r29760,%r29761,%r29762,%r29763,%r29764,%r29765,%r29766,%r29767,%r29768,%r29769,%r29770,%r29771,%r29772,%r29773,%r29774,%r29775,%r29776,%r29777,%r29778,%r29779,%r29780,%r29781,%r29782,%r29783,%r29784,%r29785,%r29786,%r29787,%r29788,%r29789,%r29790,%r29791,%r29792,%r29793,%r29794,%r29795,%r29796,%r29797,%r29798,%r29799,%r29800,%r29801}, {%r19659,%r19660,%r19661,%r19662}, %rd3, %p92, 1, 1; 2026-02-21T09:05:41.1455702Z // end inline asm 2026-02-21T09:05:41.1455759Z // begin inline asm 2026-02-21T09:05:41.1457315Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29802,%r29803,%r29804,%r29805,%r29806,%r29807,%r29808,%r29809,%r29810,%r29811,%r29812,%r29813,%r29814,%r29815,%r29816,%r29817,%r29818,%r29819,%r29820,%r29821,%r29822,%r29823,%r29824,%r29825,%r29826,%r29827,%r29828,%r29829,%r29830,%r29831,%r29832,%r29833,%r29834,%r29835,%r29836,%r29837,%r29838,%r29839,%r29840,%r29841,%r29842,%r29843,%r29844,%r29845,%r29846,%r29847,%r29848,%r29849,%r29850,%r29851,%r29852,%r29853,%r29854,%r29855,%r29856,%r29857,%r29858,%r29859,%r29860,%r29861,%r29862,%r29863,%r29864,%r29865}, {%r19791,%r19792,%r19793,%r19794}, %rd2, %p92, 1, 1; 2026-02-21T09:05:41.1457394Z // end inline asm 2026-02-21T09:05:41.1457453Z // begin inline asm 2026-02-21T09:05:41.1458915Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29802,%r29803,%r29804,%r29805,%r29806,%r29807,%r29808,%r29809,%r29810,%r29811,%r29812,%r29813,%r29814,%r29815,%r29816,%r29817,%r29818,%r29819,%r29820,%r29821,%r29822,%r29823,%r29824,%r29825,%r29826,%r29827,%r29828,%r29829,%r29830,%r29831,%r29832,%r29833,%r29834,%r29835,%r29836,%r29837,%r29838,%r29839,%r29840,%r29841,%r29842,%r29843,%r29844,%r29845,%r29846,%r29847,%r29848,%r29849,%r29850,%r29851,%r29852,%r29853,%r29854,%r29855,%r29856,%r29857,%r29858,%r29859,%r29860,%r29861,%r29862,%r29863,%r29864,%r29865}, {%r19923,%r19924,%r19925,%r19926}, %rd3, %p92, 1, 1; 2026-02-21T09:05:41.1459048Z // end inline asm 2026-02-21T09:05:41.1459124Z wgmma.commit_group.sync.aligned; 2026-02-21T09:05:41.1459189Z mov.b32 %r20183, %r24001; 2026-02-21T09:05:41.1459248Z mov.b32 %r20184, %r21777; 2026-02-21T09:05:41.1459305Z mov.b32 %r20185, %r21777; 2026-02-21T09:05:41.1459367Z // begin inline asm 2026-02-21T09:05:41.1464358Z // wait for regs: 
%r29610,%r29611,%r29612,%r29613,%r29614,%r29615,%r29616,%r29617,%r29618,%r29619,%r29620,%r29621,%r29622,%r29623,%r29624,%r29625,%r29626,%r29627,%r29628,%r29629,%r29630,%r29631,%r29632,%r29633,%r29634,%r29635,%r29636,%r29637,%r29638,%r29639,%r29640,%r29641,%r29642,%r29643,%r29644,%r29645,%r29646,%r29647,%r29648,%r29649,%r29650,%r29651,%r29652,%r29653,%r29654,%r29655,%r29656,%r29657,%r29658,%r29659,%r29660,%r29661,%r29662,%r29663,%r29664,%r29665,%r29666,%r29667,%r29668,%r29669,%r29670,%r29671,%r29672,%r29673,%r29674,%r29675,%r29676,%r29677,%r29678,%r29679,%r29680,%r29681,%r29682,%r29683,%r29684,%r29685,%r29686,%r29687,%r29688,%r29689,%r29690,%r29691,%r29692,%r29693,%r29694,%r29695,%r29696,%r29697,%r29698,%r29699,%r29700,%r29701,%r29702,%r29703,%r29704,%r29705,%r29706,%r29707,%r29708,%r29709,%r29710,%r29711,%r29712,%r29713,%r29714,%r29715,%r29716,%r29717,%r29718,%r29719,%r29720,%r29721,%r29722,%r29723,%r29724,%r29725,%r29726,%r29727,%r29728,%r29729,%r29730,%r29731,%r29732,%r29733,%r29734,%r29735,%r29736,%r29737,%r29738,%r29739,%r29740,%r29741,%r29742,%r29743,%r29744,%r29745,%r29746,%r29747,%r29748,%r29749,%r29750,%r29751,%r29752,%r29753,%r29754,%r29755,%r29756,%r29757,%r29758,%r29759,%r29760,%r29761,%r29762,%r29763,%r29764,%r29765,%r29766,%r29767,%r29768,%r29769,%r29770,%r29771,%r29772,%r29773,%r29774,%r29775,%r29776,%r29777,%r29778,%r29779,%r29780,%r29781,%r29782,%r29783,%r29784,%r29785,%r29786,%r29787,%r29788,%r29789,%r29790,%r29791,%r29792,%r29793,%r29794,%r29795,%r29796,%r29797,%r29798,%r29799,%r29800,%r29801,%r29802,%r29803,%r29804,%r29805,%r29806,%r29807,%r29808,%r29809,%r29810,%r29811,%r29812,%r29813,%r29814,%r29815,%r29816,%r29817,%r29818,%r29819,%r29820,%r29821,%r29822,%r29823,%r29824,%r29825,%r29826,%r29827,%r29828,%r29829,%r29830,%r29831,%r29832,%r29833,%r29834,%r29835,%r29836,%r29837,%r29838,%r29839,%r29840,%r29841,%r29842,%r29843,%r29844,%r29845,%r29846,%r29847,%r29848,%r29849,%r29850,%r29851,%r29852,%r29853,%r29854,%r29855,%r29856,%r29857,%r29858,%r29859,
%r29860,%r29861,%r29862,%r29863,%r29864,%r29865,%r20183,%r20184,%r20185 2026-02-21T09:05:41.1464507Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:05:41.1464564Z // end inline asm 2026-02-21T09:05:41.1464622Z $L__tmp22: 2026-02-21T09:05:41.1464832Z .loc 1 55 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:32 2026-02-21T09:05:41.1464901Z add.s64 %rd224, %rd185, 96; 2026-02-21T09:05:41.1464968Z add.s64 %rd225, %rd186, 96; 2026-02-21T09:05:41.1465032Z add.s64 %rd226, %rd187, 96; 2026-02-21T09:05:41.1465231Z .loc 1 55 80 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:80 2026-02-21T09:05:41.1465294Z add.s64 %rd227, %rd188, 96; 2026-02-21T09:05:41.1465358Z // begin inline asm 2026-02-21T09:05:41.1465418Z mov.u32 %r20445, 0x0; 2026-02-21T09:05:41.1465476Z mov.u32 %r20446, 0x0; 2026-02-21T09:05:41.1465542Z mov.u32 %r20447, 0x0; 2026-02-21T09:05:41.1465606Z mov.u32 %r20448, 0x0; 2026-02-21T09:05:41.1465748Z ld.global.v4.b32 { %r20445, %r20446, %r20447, %r20448 }, [ %rd224 + 0 ]; 2026-02-21T09:05:41.1465807Z // end inline asm 2026-02-21T09:05:41.1465869Z // begin inline asm 2026-02-21T09:05:41.1465928Z mov.u32 %r20449, 0x0; 2026-02-21T09:05:41.1465994Z mov.u32 %r20450, 0x0; 2026-02-21T09:05:41.1466054Z mov.u32 %r20451, 0x0; 2026-02-21T09:05:41.1466110Z mov.u32 %r20452, 0x0; 2026-02-21T09:05:41.1466244Z ld.global.v4.b32 { %r20449, %r20450, %r20451, %r20452 }, [ %rd225 + 0 ]; 2026-02-21T09:05:41.1466300Z // end inline asm 2026-02-21T09:05:41.1466363Z // begin inline asm 2026-02-21T09:05:41.1466615Z mov.u32 %r20453, 0x0; 2026-02-21T09:05:41.1466676Z mov.u32 %r20454, 0x0; 2026-02-21T09:05:41.1466737Z mov.u32 %r20455, 0x0; 2026-02-21T09:05:41.1466794Z mov.u32 %r20456, 0x0; 2026-02-21T09:05:41.1466919Z ld.global.v4.b32 { %r20453, %r20454, %r20455, %r20456 }, [ %rd226 + 0 ]; 2026-02-21T09:05:41.1466976Z // end inline asm 2026-02-21T09:05:41.1467039Z // begin inline asm 2026-02-21T09:05:41.1467097Z mov.u32 %r20457, 0x0; 2026-02-21T09:05:41.1467153Z 
mov.u32 %r20458, 0x0; 2026-02-21T09:05:41.1467212Z mov.u32 %r20459, 0x0; 2026-02-21T09:05:41.1467270Z mov.u32 %r20460, 0x0; 2026-02-21T09:05:41.1467405Z ld.global.v4.b32 { %r20457, %r20458, %r20459, %r20460 }, [ %rd227 + 0 ]; 2026-02-21T09:05:41.1467466Z // end inline asm 2026-02-21T09:05:41.1467749Z .loc 1 59 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:59:32 2026-02-21T09:05:41.1467868Z bar.sync 0; 2026-02-21T09:05:41.1467952Z st.shared.v2.b32 [%r10], {%r20445, %r20446}; 2026-02-21T09:05:41.1468048Z st.shared.v2.b32 [%r10+2048], {%r20449, %r20450}; 2026-02-21T09:05:41.1468138Z st.shared.v2.b32 [%r10+4096], {%r20453, %r20454}; 2026-02-21T09:05:41.1468222Z st.shared.v2.b32 [%r10+6144], {%r20457, %r20458}; 2026-02-21T09:05:41.1468394Z st.shared.v2.b32 [%r11], {%r20447, %r20448}; 2026-02-21T09:05:41.1468483Z st.shared.v2.b32 [%r11+2048], {%r20451, %r20452}; 2026-02-21T09:05:41.1468571Z st.shared.v2.b32 [%r11+4096], {%r20455, %r20456}; 2026-02-21T09:05:41.1468659Z st.shared.v2.b32 [%r11+6144], {%r20459, %r20460}; 2026-02-21T09:05:41.1468716Z bar.sync 0; 2026-02-21T09:05:41.1468785Z ld.shared.b16 %rs881, [%r12]; 2026-02-21T09:05:41.1468854Z ld.shared.b16 %rs882, [%r12+256]; 2026-02-21T09:05:41.1468928Z ld.shared.b16 %rs883, [%r12+16]; 2026-02-21T09:05:41.1468994Z ld.shared.b16 %rs884, [%r12+272]; 2026-02-21T09:05:41.1469130Z ld.shared.b16 %rs885, [%r12+2048]; 2026-02-21T09:05:41.1469199Z ld.shared.b16 %rs886, [%r12+2304]; 2026-02-21T09:05:41.1469263Z ld.shared.b16 %rs887, [%r12+2064]; 2026-02-21T09:05:41.1469327Z ld.shared.b16 %rs888, [%r12+2320]; 2026-02-21T09:05:41.1469390Z ld.shared.b16 %rs889, [%r12+4096]; 2026-02-21T09:05:41.1469455Z ld.shared.b16 %rs890, [%r12+4352]; 2026-02-21T09:05:41.1469518Z ld.shared.b16 %rs891, [%r12+4112]; 2026-02-21T09:05:41.1469582Z ld.shared.b16 %rs892, [%r12+4368]; 2026-02-21T09:05:41.1469650Z ld.shared.b16 %rs893, [%r12+6144]; 2026-02-21T09:05:41.1469715Z ld.shared.b16 %rs894, [%r12+6400]; 2026-02-21T09:05:41.1469791Z 
ld.shared.b16 %rs895, [%r12+6160]; 2026-02-21T09:05:41.1469856Z ld.shared.b16 %rs896, [%r12+6416]; 2026-02-21T09:05:41.1469926Z ld.shared.b16 %rs897, [%r13]; 2026-02-21T09:05:41.1469989Z ld.shared.b16 %rs898, [%r13+256]; 2026-02-21T09:05:41.1470054Z ld.shared.b16 %rs899, [%r13+16]; 2026-02-21T09:05:41.1470128Z ld.shared.b16 %rs900, [%r13+272]; 2026-02-21T09:05:41.1470194Z ld.shared.b16 %rs901, [%r13+2048]; 2026-02-21T09:05:41.1470259Z ld.shared.b16 %rs902, [%r13+2304]; 2026-02-21T09:05:41.1470325Z ld.shared.b16 %rs903, [%r13+2064]; 2026-02-21T09:05:41.1470390Z ld.shared.b16 %rs904, [%r13+2320]; 2026-02-21T09:05:41.1470454Z ld.shared.b16 %rs905, [%r13+4096]; 2026-02-21T09:05:41.1470517Z ld.shared.b16 %rs906, [%r13+4352]; 2026-02-21T09:05:41.1470586Z ld.shared.b16 %rs907, [%r13+4112]; 2026-02-21T09:05:41.1470648Z ld.shared.b16 %rs908, [%r13+4368]; 2026-02-21T09:05:41.1470713Z ld.shared.b16 %rs909, [%r13+6144]; 2026-02-21T09:05:41.1470779Z ld.shared.b16 %rs910, [%r13+6400]; 2026-02-21T09:05:41.1470841Z ld.shared.b16 %rs911, [%r13+6160]; 2026-02-21T09:05:41.1470904Z ld.shared.b16 %rs912, [%r13+6416]; 2026-02-21T09:05:41.1470966Z cvt.f32.bf16 %r20591, %rs881; 2026-02-21T09:05:41.1471030Z cvt.f32.bf16 %r20592, %rs882; 2026-02-21T09:05:41.1471090Z cvt.f32.bf16 %r20593, %rs897; 2026-02-21T09:05:41.1471154Z cvt.f32.bf16 %r20594, %rs898; 2026-02-21T09:05:41.1471220Z cvt.f32.bf16 %r20723, %rs883; 2026-02-21T09:05:41.1471281Z cvt.f32.bf16 %r20724, %rs884; 2026-02-21T09:05:41.1471341Z cvt.f32.bf16 %r20725, %rs899; 2026-02-21T09:05:41.1471486Z cvt.f32.bf16 %r20726, %rs900; 2026-02-21T09:05:41.1471554Z cvt.f32.bf16 %r20855, %rs885; 2026-02-21T09:05:41.1471615Z cvt.f32.bf16 %r20856, %rs886; 2026-02-21T09:05:41.1471675Z cvt.f32.bf16 %r20857, %rs901; 2026-02-21T09:05:41.1471739Z cvt.f32.bf16 %r20858, %rs902; 2026-02-21T09:05:41.1471799Z cvt.f32.bf16 %r20987, %rs887; 2026-02-21T09:05:41.1471859Z cvt.f32.bf16 %r20988, %rs888; 2026-02-21T09:05:41.1471922Z cvt.f32.bf16 %r20989, %rs903; 
2026-02-21T09:05:41.1471982Z cvt.f32.bf16 %r20990, %rs904; 2026-02-21T09:05:41.1472042Z cvt.f32.bf16 %r21119, %rs889; 2026-02-21T09:05:41.1472101Z cvt.f32.bf16 %r21120, %rs890; 2026-02-21T09:05:41.1472163Z cvt.f32.bf16 %r21121, %rs905; 2026-02-21T09:05:41.1472275Z cvt.f32.bf16 %r21122, %rs906; 2026-02-21T09:05:41.1472383Z cvt.f32.bf16 %r21251, %rs891; 2026-02-21T09:05:41.1472448Z cvt.f32.bf16 %r21252, %rs892; 2026-02-21T09:05:41.1472508Z cvt.f32.bf16 %r21253, %rs907; 2026-02-21T09:05:41.1472571Z cvt.f32.bf16 %r21254, %rs908; 2026-02-21T09:05:41.1472631Z cvt.f32.bf16 %r21383, %rs893; 2026-02-21T09:05:41.1472695Z cvt.f32.bf16 %r21384, %rs894; 2026-02-21T09:05:41.1472755Z cvt.f32.bf16 %r21385, %rs909; 2026-02-21T09:05:41.1472817Z cvt.f32.bf16 %r21386, %rs910; 2026-02-21T09:05:41.1472881Z cvt.f32.bf16 %r21515, %rs895; 2026-02-21T09:05:41.1472941Z cvt.f32.bf16 %r21516, %rs896; 2026-02-21T09:05:41.1473001Z cvt.f32.bf16 %r21517, %rs911; 2026-02-21T09:05:41.1473059Z cvt.f32.bf16 %r21518, %rs912; 2026-02-21T09:05:41.1473280Z .loc 1 61 34 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:34 2026-02-21T09:05:41.1473344Z add.s32 %r22159, %r29609, 196608; 2026-02-21T09:05:41.1473409Z cvt.s64.s32 %rd240, %r22159; 2026-02-21T09:05:41.1473483Z add.s64 %rd228, %rd45, %rd240; 2026-02-21T09:05:41.1473729Z .loc 1 61 87 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:87 2026-02-21T09:05:41.1473793Z // begin inline asm 2026-02-21T09:05:41.1473859Z mov.u32 %r20461, 0x0; 2026-02-21T09:05:41.1473916Z mov.u32 %r20462, 0x0; 2026-02-21T09:05:41.1474015Z ld.global.v2.b32 { %r20461, %r20462 }, [ %rd228 + 0 ]; 2026-02-21T09:05:41.1474071Z // end inline asm 2026-02-21T09:05:41.1474271Z .loc 1 69 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:69:28 2026-02-21T09:05:41.1474326Z bar.sync 0; 2026-02-21T09:05:41.1474389Z st.shared.b8 [%r14], %r20461; 2026-02-21T09:05:41.1474462Z prmt.b32 %r22160, %r20461, 0, 0x7771U; 2026-02-21T09:05:41.1474526Z st.shared.b8 
[%r15], %r22160; 2026-02-21T09:05:41.1474593Z prmt.b32 %r22161, %r20461, 0, 0x7772U; 2026-02-21T09:05:41.1474661Z st.shared.b8 [%r16+256], %r22161; 2026-02-21T09:05:41.1474727Z prmt.b32 %r22162, %r20461, 0, 0x7773U; 2026-02-21T09:05:41.1474796Z st.shared.b8 [%r17+256], %r22162; 2026-02-21T09:05:41.1474858Z st.shared.b8 [%r18+512], %r20462; 2026-02-21T09:05:41.1474926Z prmt.b32 %r22163, %r20462, 0, 0x7771U; 2026-02-21T09:05:41.1474990Z st.shared.b8 [%r19+512], %r22163; 2026-02-21T09:05:41.1475053Z prmt.b32 %r22164, %r20462, 0, 0x7772U; 2026-02-21T09:05:41.1475119Z st.shared.b8 [%r20+768], %r22164; 2026-02-21T09:05:41.1475184Z prmt.b32 %r22165, %r20462, 0, 0x7773U; 2026-02-21T09:05:41.1475245Z st.shared.b8 [%r21+768], %r22165; 2026-02-21T09:05:41.1475300Z bar.sync 0; 2026-02-21T09:05:41.1475370Z ld.shared.b32 %r22166, [%r22]; 2026-02-21T09:05:41.1475435Z prmt.b32 %r22167, %r22166, 0, 0x7770U; 2026-02-21T09:05:41.1475496Z cvt.u16.u32 %rs913, %r22167; 2026-02-21T09:05:41.1475566Z prmt.b32 %r22168, %r22166, 0, 0x7771U; 2026-02-21T09:05:41.1475626Z cvt.u16.u32 %rs914, %r22168; 2026-02-21T09:05:41.1475689Z prmt.b32 %r22169, %r22166, 0, 0x7772U; 2026-02-21T09:05:41.1475754Z cvt.u16.u32 %rs915, %r22169; 2026-02-21T09:05:41.1475825Z prmt.b32 %r22170, %r22166, 0, 0x7773U; 2026-02-21T09:05:41.1475885Z cvt.u16.u32 %rs916, %r22170; 2026-02-21T09:05:41.1475950Z ld.shared.b32 %r22171, [%r23]; 2026-02-21T09:05:41.1476076Z prmt.b32 %r22172, %r22171, 0, 0x7770U; 2026-02-21T09:05:41.1476149Z cvt.u16.u32 %rs917, %r22172; 2026-02-21T09:05:41.1476218Z prmt.b32 %r22173, %r22171, 0, 0x7771U; 2026-02-21T09:05:41.1476286Z cvt.u16.u32 %rs918, %r22173; 2026-02-21T09:05:41.1476352Z prmt.b32 %r22174, %r22171, 0, 0x7772U; 2026-02-21T09:05:41.1476414Z cvt.u16.u32 %rs919, %r22174; 2026-02-21T09:05:41.1476603Z prmt.b32 %r22175, %r22171, 0, 0x7773U; 2026-02-21T09:05:41.1476673Z cvt.u16.u32 %rs920, %r22175; 2026-02-21T09:05:41.1476871Z .loc 1 64 28 // 
c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:64:28 2026-02-21T09:05:41.1476935Z shl.b16 %rs921, %rs913, 4; 2026-02-21T09:05:41.1476998Z shl.b16 %rs922, %rs917, 4; 2026-02-21T09:05:41.1477059Z shl.b16 %rs923, %rs914, 4; 2026-02-21T09:05:41.1477263Z shl.b16 %rs924, %rs918, 4; 2026-02-21T09:05:41.1477328Z shl.b16 %rs925, %rs915, 4; 2026-02-21T09:05:41.1477394Z shl.b16 %rs926, %rs919, 4; 2026-02-21T09:05:41.1477453Z shl.b16 %rs927, %rs916, 4; 2026-02-21T09:05:41.1477518Z shl.b16 %rs928, %rs920, 4; 2026-02-21T09:05:41.1477721Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1477784Z cvt.s16.s8 %rs929, %rs921; 2026-02-21T09:05:41.1477843Z shr.s16 %rs930, %rs929, 4; 2026-02-21T09:05:41.1477907Z cvt.s16.s8 %rs931, %rs922; 2026-02-21T09:05:41.1477967Z shr.s16 %rs932, %rs931, 4; 2026-02-21T09:05:41.1478032Z prmt.b32 %r22176, %r22166, 0, 0x8880U; 2026-02-21T09:05:41.1478094Z cvt.u16.u32 %rs933, %r22176; 2026-02-21T09:05:41.1478162Z shr.s16 %rs934, %rs933, 4; 2026-02-21T09:05:41.1478226Z prmt.b32 %r22177, %r22171, 0, 0x8880U; 2026-02-21T09:05:41.1478288Z cvt.u16.u32 %rs935, %r22177; 2026-02-21T09:05:41.1478354Z shr.s16 %rs936, %rs935, 4; 2026-02-21T09:05:41.1478613Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1478683Z cvt.rn.f32.s16 %r22178, %rs936; 2026-02-21T09:05:41.1478748Z cvt.rn.f32.s16 %r22179, %rs934; 2026-02-21T09:05:41.1478816Z cvt.rn.f32.s16 %r22180, %rs932; 2026-02-21T09:05:41.1478877Z cvt.rn.f32.s16 %r22181, %rs930; 2026-02-21T09:05:41.1479070Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1479136Z cvt.s16.s8 %rs937, %rs923; 2026-02-21T09:05:41.1479196Z shr.s16 %rs938, %rs937, 4; 2026-02-21T09:05:41.1479256Z cvt.s16.s8 %rs939, %rs924; 2026-02-21T09:05:41.1479323Z shr.s16 %rs940, %rs939, 4; 2026-02-21T09:05:41.1479388Z prmt.b32 %r22182, %r22166, 0, 0x9991U; 2026-02-21T09:05:41.1479449Z cvt.u16.u32 
%rs941, %r22182; 2026-02-21T09:05:41.1479508Z shr.s16 %rs942, %rs941, 4; 2026-02-21T09:05:41.1479578Z prmt.b32 %r22183, %r22171, 0, 0x9991U; 2026-02-21T09:05:41.1479641Z cvt.u16.u32 %rs943, %r22183; 2026-02-21T09:05:41.1479703Z shr.s16 %rs944, %rs943, 4; 2026-02-21T09:05:41.1479906Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1479978Z cvt.rn.f32.s16 %r22184, %rs944; 2026-02-21T09:05:41.1480040Z cvt.rn.f32.s16 %r22185, %rs942; 2026-02-21T09:05:41.1480102Z cvt.rn.f32.s16 %r22186, %rs940; 2026-02-21T09:05:41.1480168Z cvt.rn.f32.s16 %r22187, %rs938; 2026-02-21T09:05:41.1480363Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1480422Z cvt.s16.s8 %rs945, %rs925; 2026-02-21T09:05:41.1480487Z shr.s16 %rs946, %rs945, 4; 2026-02-21T09:05:41.1480548Z cvt.s16.s8 %rs947, %rs926; 2026-02-21T09:05:41.1480608Z shr.s16 %rs948, %rs947, 4; 2026-02-21T09:05:41.1480676Z prmt.b32 %r22188, %r22166, 0, 0xaaa2U; 2026-02-21T09:05:41.1480739Z cvt.u16.u32 %rs949, %r22188; 2026-02-21T09:05:41.1480801Z shr.s16 %rs950, %rs949, 4; 2026-02-21T09:05:41.1480867Z prmt.b32 %r22189, %r22171, 0, 0xaaa2U; 2026-02-21T09:05:41.1480931Z cvt.u16.u32 %rs951, %r22189; 2026-02-21T09:05:41.1480991Z shr.s16 %rs952, %rs951, 4; 2026-02-21T09:05:41.1481260Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1481335Z cvt.rn.f32.s16 %r22190, %rs952; 2026-02-21T09:05:41.1481402Z cvt.rn.f32.s16 %r22191, %rs950; 2026-02-21T09:05:41.1481464Z cvt.rn.f32.s16 %r22192, %rs948; 2026-02-21T09:05:41.1481529Z cvt.rn.f32.s16 %r22193, %rs946; 2026-02-21T09:05:41.1481723Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1481783Z cvt.s16.s8 %rs953, %rs927; 2026-02-21T09:05:41.1481843Z shr.s16 %rs954, %rs953, 4; 2026-02-21T09:05:41.1481906Z cvt.s16.s8 %rs955, %rs928; 2026-02-21T09:05:41.1481966Z shr.s16 %rs956, %rs955, 4; 
2026-02-21T09:05:41.1482032Z prmt.b32 %r22194, %r22166, 0, 0xbbb3U; 2026-02-21T09:05:41.1482205Z cvt.u16.u32 %rs957, %r22194; 2026-02-21T09:05:41.1482268Z shr.s16 %rs958, %rs957, 4; 2026-02-21T09:05:41.1482335Z prmt.b32 %r22195, %r22171, 0, 0xbbb3U; 2026-02-21T09:05:41.1482400Z cvt.u16.u32 %rs959, %r22195; 2026-02-21T09:05:41.1482474Z shr.s16 %rs960, %rs959, 4; 2026-02-21T09:05:41.1482671Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1482734Z cvt.rn.f32.s16 %r22196, %rs960; 2026-02-21T09:05:41.1482802Z cvt.rn.f32.s16 %r22197, %rs958; 2026-02-21T09:05:41.1482862Z cvt.rn.f32.s16 %r22198, %rs956; 2026-02-21T09:05:41.1482923Z cvt.rn.f32.s16 %r22199, %rs954; 2026-02-21T09:05:41.1482990Z bar.sync 0; 2026-02-21T09:05:41.1483108Z st.shared.v4.b32 [%r24], {%r22181, %r22179, %r22180, %r22178}; 2026-02-21T09:05:41.1483219Z st.shared.v4.b32 [%r25], {%r22187, %r22185, %r22186, %r22184}; 2026-02-21T09:05:41.1483330Z st.shared.v4.b32 [%r26], {%r22193, %r22191, %r22192, %r22190}; 2026-02-21T09:05:41.1483514Z st.shared.v4.b32 [%r27], {%r22199, %r22197, %r22198, %r22196}; 2026-02-21T09:05:41.1483571Z $L__tmp23: 2026-02-21T09:05:41.1483846Z .loc 2 291 36 // standard.py:291:36 @[ c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:91:40 ] 2026-02-21T09:05:41.1483914Z // begin inline asm 2026-02-21T09:05:41.1483992Z fence.proxy.async.shared::cta; 2026-02-21T09:05:41.1484048Z // end inline asm 2026-02-21T09:05:41.1484108Z bar.sync 0; 2026-02-21T09:05:41.1484183Z wgmma.fence.sync.aligned; 2026-02-21T09:05:41.1484243Z // begin inline asm 2026-02-21T09:05:41.1485709Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29610,%r29611,%r29612,%r29613,%r29614,%r29615,%r29616,%r29617,%r29618,%r29619,%r29620,%r29621,%r29622,%r29623,%r29624,%r29625,%r29626,%r29627,%r29628,%r29629,%r29630,%r29631,%r29632,%r29633,%r29634,%r29635,%r29636,%r29637,%r29638,%r29639,%r29640,%r29641,%r29642,%r29643,%r29644,%r29645,%r29646,%r29647,%r29648,%r29649,%r29650,%r29651,%r29652,%r29653,%r29654,%r29655,%r29656,%r29657,%r29658,%r29659,%r29660,%r29661,%r29662,%r29663,%r29664,%r29665,%r29666,%r29667,%r29668,%r29669,%r29670,%r29671,%r29672,%r29673}, {%r20591,%r20592,%r20593,%r20594}, %rd2, %p92, 1, 1; 2026-02-21T09:05:41.1485770Z // end inline asm 2026-02-21T09:05:41.1485828Z // begin inline asm 2026-02-21T09:05:41.1487395Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29610,%r29611,%r29612,%r29613,%r29614,%r29615,%r29616,%r29617,%r29618,%r29619,%r29620,%r29621,%r29622,%r29623,%r29624,%r29625,%r29626,%r29627,%r29628,%r29629,%r29630,%r29631,%r29632,%r29633,%r29634,%r29635,%r29636,%r29637,%r29638,%r29639,%r29640,%r29641,%r29642,%r29643,%r29644,%r29645,%r29646,%r29647,%r29648,%r29649,%r29650,%r29651,%r29652,%r29653,%r29654,%r29655,%r29656,%r29657,%r29658,%r29659,%r29660,%r29661,%r29662,%r29663,%r29664,%r29665,%r29666,%r29667,%r29668,%r29669,%r29670,%r29671,%r29672,%r29673}, {%r20723,%r20724,%r20725,%r20726}, %rd3, %p92, 1, 1; 2026-02-21T09:05:41.1487458Z // end inline asm 2026-02-21T09:05:41.1487522Z // begin inline asm 2026-02-21T09:05:41.1488977Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29674,%r29675,%r29676,%r29677,%r29678,%r29679,%r29680,%r29681,%r29682,%r29683,%r29684,%r29685,%r29686,%r29687,%r29688,%r29689,%r29690,%r29691,%r29692,%r29693,%r29694,%r29695,%r29696,%r29697,%r29698,%r29699,%r29700,%r29701,%r29702,%r29703,%r29704,%r29705,%r29706,%r29707,%r29708,%r29709,%r29710,%r29711,%r29712,%r29713,%r29714,%r29715,%r29716,%r29717,%r29718,%r29719,%r29720,%r29721,%r29722,%r29723,%r29724,%r29725,%r29726,%r29727,%r29728,%r29729,%r29730,%r29731,%r29732,%r29733,%r29734,%r29735,%r29736,%r29737}, {%r20855,%r20856,%r20857,%r20858}, %rd2, %p92, 1, 1; 2026-02-21T09:05:41.1489123Z // end inline asm 2026-02-21T09:05:41.1489186Z // begin inline asm 2026-02-21T09:05:41.1490734Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29674,%r29675,%r29676,%r29677,%r29678,%r29679,%r29680,%r29681,%r29682,%r29683,%r29684,%r29685,%r29686,%r29687,%r29688,%r29689,%r29690,%r29691,%r29692,%r29693,%r29694,%r29695,%r29696,%r29697,%r29698,%r29699,%r29700,%r29701,%r29702,%r29703,%r29704,%r29705,%r29706,%r29707,%r29708,%r29709,%r29710,%r29711,%r29712,%r29713,%r29714,%r29715,%r29716,%r29717,%r29718,%r29719,%r29720,%r29721,%r29722,%r29723,%r29724,%r29725,%r29726,%r29727,%r29728,%r29729,%r29730,%r29731,%r29732,%r29733,%r29734,%r29735,%r29736,%r29737}, {%r20987,%r20988,%r20989,%r20990}, %rd3, %p92, 1, 1; 2026-02-21T09:05:41.1490852Z // end inline asm 2026-02-21T09:05:41.1490916Z // begin inline asm 2026-02-21T09:05:41.1492451Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29738,%r29739,%r29740,%r29741,%r29742,%r29743,%r29744,%r29745,%r29746,%r29747,%r29748,%r29749,%r29750,%r29751,%r29752,%r29753,%r29754,%r29755,%r29756,%r29757,%r29758,%r29759,%r29760,%r29761,%r29762,%r29763,%r29764,%r29765,%r29766,%r29767,%r29768,%r29769,%r29770,%r29771,%r29772,%r29773,%r29774,%r29775,%r29776,%r29777,%r29778,%r29779,%r29780,%r29781,%r29782,%r29783,%r29784,%r29785,%r29786,%r29787,%r29788,%r29789,%r29790,%r29791,%r29792,%r29793,%r29794,%r29795,%r29796,%r29797,%r29798,%r29799,%r29800,%r29801}, {%r21119,%r21120,%r21121,%r21122}, %rd2, %p92, 1, 1; 2026-02-21T09:05:41.1492520Z // end inline asm 2026-02-21T09:05:41.1492580Z // begin inline asm 2026-02-21T09:05:41.1494061Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29738,%r29739,%r29740,%r29741,%r29742,%r29743,%r29744,%r29745,%r29746,%r29747,%r29748,%r29749,%r29750,%r29751,%r29752,%r29753,%r29754,%r29755,%r29756,%r29757,%r29758,%r29759,%r29760,%r29761,%r29762,%r29763,%r29764,%r29765,%r29766,%r29767,%r29768,%r29769,%r29770,%r29771,%r29772,%r29773,%r29774,%r29775,%r29776,%r29777,%r29778,%r29779,%r29780,%r29781,%r29782,%r29783,%r29784,%r29785,%r29786,%r29787,%r29788,%r29789,%r29790,%r29791,%r29792,%r29793,%r29794,%r29795,%r29796,%r29797,%r29798,%r29799,%r29800,%r29801}, {%r21251,%r21252,%r21253,%r21254}, %rd3, %p92, 1, 1; 2026-02-21T09:05:41.1494122Z // end inline asm 2026-02-21T09:05:41.1494180Z // begin inline asm 2026-02-21T09:05:41.1495666Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29802,%r29803,%r29804,%r29805,%r29806,%r29807,%r29808,%r29809,%r29810,%r29811,%r29812,%r29813,%r29814,%r29815,%r29816,%r29817,%r29818,%r29819,%r29820,%r29821,%r29822,%r29823,%r29824,%r29825,%r29826,%r29827,%r29828,%r29829,%r29830,%r29831,%r29832,%r29833,%r29834,%r29835,%r29836,%r29837,%r29838,%r29839,%r29840,%r29841,%r29842,%r29843,%r29844,%r29845,%r29846,%r29847,%r29848,%r29849,%r29850,%r29851,%r29852,%r29853,%r29854,%r29855,%r29856,%r29857,%r29858,%r29859,%r29860,%r29861,%r29862,%r29863,%r29864,%r29865}, {%r21383,%r21384,%r21385,%r21386}, %rd2, %p92, 1, 1; 2026-02-21T09:05:41.1495735Z // end inline asm 2026-02-21T09:05:41.1495792Z // begin inline asm 2026-02-21T09:05:41.1497401Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29802,%r29803,%r29804,%r29805,%r29806,%r29807,%r29808,%r29809,%r29810,%r29811,%r29812,%r29813,%r29814,%r29815,%r29816,%r29817,%r29818,%r29819,%r29820,%r29821,%r29822,%r29823,%r29824,%r29825,%r29826,%r29827,%r29828,%r29829,%r29830,%r29831,%r29832,%r29833,%r29834,%r29835,%r29836,%r29837,%r29838,%r29839,%r29840,%r29841,%r29842,%r29843,%r29844,%r29845,%r29846,%r29847,%r29848,%r29849,%r29850,%r29851,%r29852,%r29853,%r29854,%r29855,%r29856,%r29857,%r29858,%r29859,%r29860,%r29861,%r29862,%r29863,%r29864,%r29865}, {%r21515,%r21516,%r21517,%r21518}, %rd3, %p92, 1, 1; 2026-02-21T09:05:41.1497538Z // end inline asm 2026-02-21T09:05:41.1497620Z wgmma.commit_group.sync.aligned; 2026-02-21T09:05:41.1497690Z mov.b32 %r21775, %r24001; 2026-02-21T09:05:41.1497750Z mov.b32 %r21776, %r21777; 2026-02-21T09:05:41.1497810Z // begin inline asm 2026-02-21T09:05:41.1502914Z // wait for regs: 
%r29610,%r29611,%r29612,%r29613,%r29614,%r29615,%r29616,%r29617,%r29618,%r29619,%r29620,%r29621,%r29622,%r29623,%r29624,%r29625,%r29626,%r29627,%r29628,%r29629,%r29630,%r29631,%r29632,%r29633,%r29634,%r29635,%r29636,%r29637,%r29638,%r29639,%r29640,%r29641,%r29642,%r29643,%r29644,%r29645,%r29646,%r29647,%r29648,%r29649,%r29650,%r29651,%r29652,%r29653,%r29654,%r29655,%r29656,%r29657,%r29658,%r29659,%r29660,%r29661,%r29662,%r29663,%r29664,%r29665,%r29666,%r29667,%r29668,%r29669,%r29670,%r29671,%r29672,%r29673,%r29674,%r29675,%r29676,%r29677,%r29678,%r29679,%r29680,%r29681,%r29682,%r29683,%r29684,%r29685,%r29686,%r29687,%r29688,%r29689,%r29690,%r29691,%r29692,%r29693,%r29694,%r29695,%r29696,%r29697,%r29698,%r29699,%r29700,%r29701,%r29702,%r29703,%r29704,%r29705,%r29706,%r29707,%r29708,%r29709,%r29710,%r29711,%r29712,%r29713,%r29714,%r29715,%r29716,%r29717,%r29718,%r29719,%r29720,%r29721,%r29722,%r29723,%r29724,%r29725,%r29726,%r29727,%r29728,%r29729,%r29730,%r29731,%r29732,%r29733,%r29734,%r29735,%r29736,%r29737,%r29738,%r29739,%r29740,%r29741,%r29742,%r29743,%r29744,%r29745,%r29746,%r29747,%r29748,%r29749,%r29750,%r29751,%r29752,%r29753,%r29754,%r29755,%r29756,%r29757,%r29758,%r29759,%r29760,%r29761,%r29762,%r29763,%r29764,%r29765,%r29766,%r29767,%r29768,%r29769,%r29770,%r29771,%r29772,%r29773,%r29774,%r29775,%r29776,%r29777,%r29778,%r29779,%r29780,%r29781,%r29782,%r29783,%r29784,%r29785,%r29786,%r29787,%r29788,%r29789,%r29790,%r29791,%r29792,%r29793,%r29794,%r29795,%r29796,%r29797,%r29798,%r29799,%r29800,%r29801,%r29802,%r29803,%r29804,%r29805,%r29806,%r29807,%r29808,%r29809,%r29810,%r29811,%r29812,%r29813,%r29814,%r29815,%r29816,%r29817,%r29818,%r29819,%r29820,%r29821,%r29822,%r29823,%r29824,%r29825,%r29826,%r29827,%r29828,%r29829,%r29830,%r29831,%r29832,%r29833,%r29834,%r29835,%r29836,%r29837,%r29838,%r29839,%r29840,%r29841,%r29842,%r29843,%r29844,%r29845,%r29846,%r29847,%r29848,%r29849,%r29850,%r29851,%r29852,%r29853,%r29854,%r29855,%r29856,%r29857,%r29858,%r29859,
%r29860,%r29861,%r29862,%r29863,%r29864,%r29865,%r21775,%r21776,%r21777 2026-02-21T09:05:41.1503061Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:05:41.1503118Z // end inline asm 2026-02-21T09:05:41.1503172Z $L__tmp24: 2026-02-21T09:05:41.1503391Z .loc 1 47 111 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:47:111 2026-02-21T09:05:41.1503457Z add.s64 %rd304, %rd304, 32; 2026-02-21T09:05:41.1503524Z add.s32 %r29609, %r29609, 262144; 2026-02-21T09:05:41.1503592Z add.s64 %rd303, %rd303, 128; 2026-02-21T09:05:41.1503661Z setp.lt.u64 %p124, %rd304, 480; 2026-02-21T09:05:41.1503723Z @%p124 bra $L__BB0_7; 2026-02-21T09:05:41.1503836Z // %bb.8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:05:41.1504047Z .loc 1 94 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:94:28 2026-02-21T09:05:41.1504134Z cvt.rn.bf16x2.f32 %r22203, %r29611, %r29610; 2026-02-21T09:05:41.1504214Z cvt.rn.bf16x2.f32 %r22204, %r29613, %r29612; 2026-02-21T09:05:41.1504295Z cvt.rn.bf16x2.f32 %r22205, %r29615, %r29614; 2026-02-21T09:05:41.1504372Z cvt.rn.bf16x2.f32 %r22206, %r29617, %r29616; 2026-02-21T09:05:41.1504447Z cvt.rn.bf16x2.f32 %r22207, %r29619, %r29618; 2026-02-21T09:05:41.1504526Z cvt.rn.bf16x2.f32 %r22208, %r29621, %r29620; 2026-02-21T09:05:41.1504602Z cvt.rn.bf16x2.f32 %r22209, %r29623, %r29622; 2026-02-21T09:05:41.1504682Z cvt.rn.bf16x2.f32 %r22210, %r29625, %r29624; 2026-02-21T09:05:41.1504758Z cvt.rn.bf16x2.f32 %r22211, %r29627, %r29626; 2026-02-21T09:05:41.1504853Z cvt.rn.bf16x2.f32 %r22212, %r29629, %r29628; 2026-02-21T09:05:41.1504987Z cvt.rn.bf16x2.f32 %r22213, %r29631, %r29630; 2026-02-21T09:05:41.1505062Z cvt.rn.bf16x2.f32 %r22214, %r29633, %r29632; 2026-02-21T09:05:41.1505141Z cvt.rn.bf16x2.f32 %r22215, %r29635, %r29634; 2026-02-21T09:05:41.1505215Z cvt.rn.bf16x2.f32 %r22216, %r29637, %r29636; 2026-02-21T09:05:41.1505288Z cvt.rn.bf16x2.f32 %r22217, %r29639, %r29638; 2026-02-21T09:05:41.1505367Z cvt.rn.bf16x2.f32 %r22218, %r29641, %r29640; 
2026-02-21T09:05:41.1505442Z cvt.rn.bf16x2.f32 %r22219, %r29643, %r29642; 2026-02-21T09:05:41.1505517Z cvt.rn.bf16x2.f32 %r22220, %r29645, %r29644; 2026-02-21T09:05:41.1505592Z cvt.rn.bf16x2.f32 %r22221, %r29647, %r29646; 2026-02-21T09:05:41.1505672Z cvt.rn.bf16x2.f32 %r22222, %r29649, %r29648; 2026-02-21T09:05:41.1505846Z cvt.rn.bf16x2.f32 %r22223, %r29651, %r29650; 2026-02-21T09:05:41.1505936Z cvt.rn.bf16x2.f32 %r22224, %r29653, %r29652; 2026-02-21T09:05:41.1506016Z cvt.rn.bf16x2.f32 %r22225, %r29655, %r29654; 2026-02-21T09:05:41.1506092Z cvt.rn.bf16x2.f32 %r22226, %r29657, %r29656; 2026-02-21T09:05:41.1506169Z cvt.rn.bf16x2.f32 %r22227, %r29659, %r29658; 2026-02-21T09:05:41.1506245Z cvt.rn.bf16x2.f32 %r22228, %r29661, %r29660; 2026-02-21T09:05:41.1506327Z cvt.rn.bf16x2.f32 %r22229, %r29663, %r29662; 2026-02-21T09:05:41.1506402Z cvt.rn.bf16x2.f32 %r22230, %r29665, %r29664; 2026-02-21T09:05:41.1506598Z cvt.rn.bf16x2.f32 %r22231, %r29667, %r29666; 2026-02-21T09:05:41.1506686Z cvt.rn.bf16x2.f32 %r22232, %r29669, %r29668; 2026-02-21T09:05:41.1506762Z cvt.rn.bf16x2.f32 %r22233, %r29671, %r29670; 2026-02-21T09:05:41.1506838Z cvt.rn.bf16x2.f32 %r22234, %r29673, %r29672; 2026-02-21T09:05:41.1506917Z cvt.rn.bf16x2.f32 %r22235, %r29675, %r29674; 2026-02-21T09:05:41.1506993Z cvt.rn.bf16x2.f32 %r22236, %r29677, %r29676; 2026-02-21T09:05:41.1507143Z cvt.rn.bf16x2.f32 %r22237, %r29679, %r29678; 2026-02-21T09:05:41.1507223Z cvt.rn.bf16x2.f32 %r22238, %r29681, %r29680; 2026-02-21T09:05:41.1507301Z cvt.rn.bf16x2.f32 %r22239, %r29683, %r29682; 2026-02-21T09:05:41.1507378Z cvt.rn.bf16x2.f32 %r22240, %r29685, %r29684; 2026-02-21T09:05:41.1507453Z cvt.rn.bf16x2.f32 %r22241, %r29687, %r29686; 2026-02-21T09:05:41.1507531Z cvt.rn.bf16x2.f32 %r22242, %r29689, %r29688; 2026-02-21T09:05:41.1507606Z cvt.rn.bf16x2.f32 %r22243, %r29691, %r29690; 2026-02-21T09:05:41.1507680Z cvt.rn.bf16x2.f32 %r22244, %r29693, %r29692; 2026-02-21T09:05:41.1507758Z cvt.rn.bf16x2.f32 %r22245, %r29695, %r29694; 
2026-02-21T09:05:41.1507834Z cvt.rn.bf16x2.f32 %r22246, %r29697, %r29696; 2026-02-21T09:05:41.1507911Z cvt.rn.bf16x2.f32 %r22247, %r29699, %r29698; 2026-02-21T09:05:41.1507986Z cvt.rn.bf16x2.f32 %r22248, %r29701, %r29700; 2026-02-21T09:05:41.1508079Z cvt.rn.bf16x2.f32 %r22249, %r29703, %r29702; 2026-02-21T09:05:41.1508163Z cvt.rn.bf16x2.f32 %r22250, %r29705, %r29704; 2026-02-21T09:05:41.1508240Z cvt.rn.bf16x2.f32 %r22251, %r29707, %r29706; 2026-02-21T09:05:41.1508390Z cvt.rn.bf16x2.f32 %r22252, %r29709, %r29708; 2026-02-21T09:05:41.1508469Z cvt.rn.bf16x2.f32 %r22253, %r29711, %r29710; 2026-02-21T09:05:41.1508547Z cvt.rn.bf16x2.f32 %r22254, %r29713, %r29712; 2026-02-21T09:05:41.1508629Z cvt.rn.bf16x2.f32 %r22255, %r29715, %r29714; 2026-02-21T09:05:41.1508705Z cvt.rn.bf16x2.f32 %r22256, %r29717, %r29716; 2026-02-21T09:05:41.1508779Z cvt.rn.bf16x2.f32 %r22257, %r29719, %r29718; 2026-02-21T09:05:41.1508853Z cvt.rn.bf16x2.f32 %r22258, %r29721, %r29720; 2026-02-21T09:05:41.1508932Z cvt.rn.bf16x2.f32 %r22259, %r29723, %r29722; 2026-02-21T09:05:41.1509007Z cvt.rn.bf16x2.f32 %r22260, %r29725, %r29724; 2026-02-21T09:05:41.1509085Z cvt.rn.bf16x2.f32 %r22261, %r29727, %r29726; 2026-02-21T09:05:41.1509166Z cvt.rn.bf16x2.f32 %r22262, %r29729, %r29728; 2026-02-21T09:05:41.1509239Z cvt.rn.bf16x2.f32 %r22263, %r29731, %r29730; 2026-02-21T09:05:41.1509319Z cvt.rn.bf16x2.f32 %r22264, %r29733, %r29732; 2026-02-21T09:05:41.1509399Z cvt.rn.bf16x2.f32 %r22265, %r29735, %r29734; 2026-02-21T09:05:41.1509475Z cvt.rn.bf16x2.f32 %r22266, %r29737, %r29736; 2026-02-21T09:05:41.1509627Z cvt.rn.bf16x2.f32 %r22267, %r29739, %r29738; 2026-02-21T09:05:41.1509703Z cvt.rn.bf16x2.f32 %r22268, %r29741, %r29740; 2026-02-21T09:05:41.1509781Z cvt.rn.bf16x2.f32 %r22269, %r29743, %r29742; 2026-02-21T09:05:41.1509856Z cvt.rn.bf16x2.f32 %r22270, %r29745, %r29744; 2026-02-21T09:05:41.1509932Z cvt.rn.bf16x2.f32 %r22271, %r29747, %r29746; 2026-02-21T09:05:41.1510015Z cvt.rn.bf16x2.f32 %r22272, %r29749, %r29748; 
2026-02-21T09:05:41.1510090Z cvt.rn.bf16x2.f32 %r22273, %r29751, %r29750; 2026-02-21T09:05:41.1510165Z cvt.rn.bf16x2.f32 %r22274, %r29753, %r29752; 2026-02-21T09:05:41.1510244Z cvt.rn.bf16x2.f32 %r22275, %r29755, %r29754; 2026-02-21T09:05:41.1510318Z cvt.rn.bf16x2.f32 %r22276, %r29757, %r29756; 2026-02-21T09:05:41.1510531Z cvt.rn.bf16x2.f32 %r22277, %r29759, %r29758; 2026-02-21T09:05:41.1510612Z cvt.rn.bf16x2.f32 %r22278, %r29761, %r29760; 2026-02-21T09:05:41.1510691Z cvt.rn.bf16x2.f32 %r22279, %r29763, %r29762; 2026-02-21T09:05:41.1510767Z cvt.rn.bf16x2.f32 %r22280, %r29765, %r29764; 2026-02-21T09:05:41.1510842Z cvt.rn.bf16x2.f32 %r22281, %r29767, %r29766; 2026-02-21T09:05:41.1510921Z cvt.rn.bf16x2.f32 %r22282, %r29769, %r29768; 2026-02-21T09:05:41.1510996Z cvt.rn.bf16x2.f32 %r22283, %r29771, %r29770; 2026-02-21T09:05:41.1511071Z cvt.rn.bf16x2.f32 %r22284, %r29773, %r29772; 2026-02-21T09:05:41.1511150Z cvt.rn.bf16x2.f32 %r22285, %r29775, %r29774; 2026-02-21T09:05:41.1511225Z cvt.rn.bf16x2.f32 %r22286, %r29777, %r29776; 2026-02-21T09:05:41.1511300Z cvt.rn.bf16x2.f32 %r22287, %r29779, %r29778; 2026-02-21T09:05:41.1511374Z cvt.rn.bf16x2.f32 %r22288, %r29781, %r29780; 2026-02-21T09:05:41.1511452Z cvt.rn.bf16x2.f32 %r22289, %r29783, %r29782; 2026-02-21T09:05:41.1511540Z cvt.rn.bf16x2.f32 %r22290, %r29785, %r29784; 2026-02-21T09:05:41.1511670Z cvt.rn.bf16x2.f32 %r22291, %r29787, %r29786; 2026-02-21T09:05:41.1511751Z cvt.rn.bf16x2.f32 %r22292, %r29789, %r29788; 2026-02-21T09:05:41.1511826Z cvt.rn.bf16x2.f32 %r22293, %r29791, %r29790; 2026-02-21T09:05:41.1511903Z cvt.rn.bf16x2.f32 %r22294, %r29793, %r29792; 2026-02-21T09:05:41.1511976Z cvt.rn.bf16x2.f32 %r22295, %r29795, %r29794; 2026-02-21T09:05:41.1512056Z cvt.rn.bf16x2.f32 %r22296, %r29797, %r29796; 2026-02-21T09:05:41.1512131Z cvt.rn.bf16x2.f32 %r22297, %r29799, %r29798; 2026-02-21T09:05:41.1512205Z cvt.rn.bf16x2.f32 %r22298, %r29801, %r29800; 2026-02-21T09:05:41.1512282Z cvt.rn.bf16x2.f32 %r22299, %r29803, %r29802; 
2026-02-21T09:05:41.1512356Z cvt.rn.bf16x2.f32 %r22300, %r29805, %r29804; 2026-02-21T09:05:41.1512429Z cvt.rn.bf16x2.f32 %r22301, %r29807, %r29806; 2026-02-21T09:05:41.1512510Z cvt.rn.bf16x2.f32 %r22302, %r29809, %r29808; 2026-02-21T09:05:41.1512585Z cvt.rn.bf16x2.f32 %r22303, %r29811, %r29810; 2026-02-21T09:05:41.1512663Z cvt.rn.bf16x2.f32 %r22304, %r29813, %r29812; 2026-02-21T09:05:41.1512740Z cvt.rn.bf16x2.f32 %r22305, %r29815, %r29814; 2026-02-21T09:05:41.1512818Z cvt.rn.bf16x2.f32 %r22306, %r29817, %r29816; 2026-02-21T09:05:41.1512898Z cvt.rn.bf16x2.f32 %r22307, %r29819, %r29818; 2026-02-21T09:05:41.1512974Z cvt.rn.bf16x2.f32 %r22308, %r29821, %r29820; 2026-02-21T09:05:41.1513052Z cvt.rn.bf16x2.f32 %r22309, %r29823, %r29822; 2026-02-21T09:05:41.1513128Z cvt.rn.bf16x2.f32 %r22310, %r29825, %r29824; 2026-02-21T09:05:41.1513202Z cvt.rn.bf16x2.f32 %r22311, %r29827, %r29826; 2026-02-21T09:05:41.1513281Z cvt.rn.bf16x2.f32 %r22312, %r29829, %r29828; 2026-02-21T09:05:41.1513357Z cvt.rn.bf16x2.f32 %r22313, %r29831, %r29830; 2026-02-21T09:05:41.1513432Z cvt.rn.bf16x2.f32 %r22314, %r29833, %r29832; 2026-02-21T09:05:41.1513506Z cvt.rn.bf16x2.f32 %r22315, %r29835, %r29834; 2026-02-21T09:05:41.1513584Z cvt.rn.bf16x2.f32 %r22316, %r29837, %r29836; 2026-02-21T09:05:41.1513658Z cvt.rn.bf16x2.f32 %r22317, %r29839, %r29838; 2026-02-21T09:05:41.1513736Z cvt.rn.bf16x2.f32 %r22318, %r29841, %r29840; 2026-02-21T09:05:41.1513813Z cvt.rn.bf16x2.f32 %r22319, %r29843, %r29842; 2026-02-21T09:05:41.1513887Z cvt.rn.bf16x2.f32 %r22320, %r29845, %r29844; 2026-02-21T09:05:41.1514019Z cvt.rn.bf16x2.f32 %r22321, %r29847, %r29846; 2026-02-21T09:05:41.1514098Z cvt.rn.bf16x2.f32 %r22322, %r29849, %r29848; 2026-02-21T09:05:41.1514173Z cvt.rn.bf16x2.f32 %r22323, %r29851, %r29850; 2026-02-21T09:05:41.1514248Z cvt.rn.bf16x2.f32 %r22324, %r29853, %r29852; 2026-02-21T09:05:41.1514324Z cvt.rn.bf16x2.f32 %r22325, %r29855, %r29854; 2026-02-21T09:05:41.1514404Z cvt.rn.bf16x2.f32 %r22326, %r29857, %r29856; 
2026-02-21T09:05:41.1514477Z cvt.rn.bf16x2.f32 %r22327, %r29859, %r29858; 2026-02-21T09:05:41.1514552Z cvt.rn.bf16x2.f32 %r22328, %r29861, %r29860; 2026-02-21T09:05:41.1514631Z cvt.rn.bf16x2.f32 %r22329, %r29863, %r29862; 2026-02-21T09:05:41.1514704Z cvt.rn.bf16x2.f32 %r22330, %r29865, %r29864; 2026-02-21T09:05:41.1515007Z .loc 1 95 43 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:95:43 2026-02-21T09:05:41.1515074Z bar.sync 0; 2026-02-21T09:05:41.1515268Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r28], {%r22203, %r22204, %r22205, %r22206}; 2026-02-21T09:05:41.1515455Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r29], {%r22219, %r22220, %r22221, %r22222}; 2026-02-21T09:05:41.1515635Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r30], {%r22235, %r22236, %r22237, %r22238}; 2026-02-21T09:05:41.1515820Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r31], {%r22251, %r22252, %r22253, %r22254}; 2026-02-21T09:05:41.1515998Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r32], {%r22267, %r22268, %r22269, %r22270}; 2026-02-21T09:05:41.1516176Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r33], {%r22283, %r22284, %r22285, %r22286}; 2026-02-21T09:05:41.1516360Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r34], {%r22299, %r22300, %r22301, %r22302}; 2026-02-21T09:05:41.1516670Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r35], {%r22315, %r22316, %r22317, %r22318}; 2026-02-21T09:05:41.1516931Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r36], {%r22207, %r22208, %r22209, %r22210}; 2026-02-21T09:05:41.1517123Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r37], {%r22223, %r22224, %r22225, %r22226}; 2026-02-21T09:05:41.1517317Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r38], {%r22239, %r22240, %r22241, %r22242}; 2026-02-21T09:05:41.1517496Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r39], {%r22255, %r22256, %r22257, %r22258}; 2026-02-21T09:05:41.1517681Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r40], {%r22271, %r22272, %r22273, %r22274}; 2026-02-21T09:05:41.1517858Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r41], {%r22287, %r22288, %r22289, %r22290}; 2026-02-21T09:05:41.1518038Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r42], {%r22303, %r22304, %r22305, %r22306}; 2026-02-21T09:05:41.1518219Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r43], {%r22319, %r22320, %r22321, %r22322}; 2026-02-21T09:05:41.1518402Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r44], {%r22211, %r22212, %r22213, %r22214}; 2026-02-21T09:05:41.1518580Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r45], {%r22227, %r22228, %r22229, %r22230}; 2026-02-21T09:05:41.1518762Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r46], {%r22243, %r22244, %r22245, %r22246}; 2026-02-21T09:05:41.1518943Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r47], {%r22259, %r22260, %r22261, %r22262}; 2026-02-21T09:05:41.1519122Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r48], {%r22275, %r22276, %r22277, %r22278}; 2026-02-21T09:05:41.1519303Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r49], {%r22291, %r22292, %r22293, %r22294}; 2026-02-21T09:05:41.1519482Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r50], {%r22307, %r22308, %r22309, %r22310}; 2026-02-21T09:05:41.1519661Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r51], {%r22323, %r22324, %r22325, %r22326}; 2026-02-21T09:05:41.1519845Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r52], {%r22215, %r22216, %r22217, %r22218}; 2026-02-21T09:05:41.1520030Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r53], {%r22231, %r22232, %r22233, %r22234}; 2026-02-21T09:05:41.1520209Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r54], {%r22247, %r22248, %r22249, %r22250}; 2026-02-21T09:05:41.1520472Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r55], {%r22263, %r22264, %r22265, %r22266}; 2026-02-21T09:05:41.1520658Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r56], {%r22279, %r22280, %r22281, %r22282}; 2026-02-21T09:05:41.1520837Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r57], {%r22295, %r22296, %r22297, %r22298}; 2026-02-21T09:05:41.1521017Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r58], {%r22311, %r22312, %r22313, %r22314}; 2026-02-21T09:05:41.1521199Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r59], {%r22327, %r22328, %r22329, %r22330}; 2026-02-21T09:05:41.1521260Z // begin inline asm 2026-02-21T09:05:41.1521339Z fence.proxy.async.shared::cta; 2026-02-21T09:05:41.1521399Z // end inline asm 2026-02-21T09:05:41.1521523Z bar.sync 0; 2026-02-21T09:05:41.1521661Z elect.sync %r22331|%p127, -1; 2026-02-21T09:05:41.1521736Z and.pred %p125, %p167, %p127; 2026-02-21T09:05:41.1521806Z or.b32 %r22200, %r1157, %r638; 2026-02-21T09:05:41.1521867Z // begin inline asm 2026-02-21T09:05:41.1522105Z @%p125 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd183, {%r22200, %r22201}], [%r15516]; 2026-02-21T09:05:41.1522165Z // end inline asm 2026-02-21T09:05:41.1522240Z cp.async.bulk.commit_group; 2026-02-21T09:05:41.1522317Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:05:41.1522378Z bar.sync 0; 2026-02-21T09:05:41.1522588Z .loc 1 26 144 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:26:144 2026-02-21T09:05:41.1522655Z add.s32 %r29094, %r29094, 3168; 2026-02-21T09:05:41.1522725Z setp.lt.s32 %p128, %r29094, %r29866; 2026-02-21T09:05:41.1522791Z @%p128 bra $L__BB0_2; 2026-02-21T09:05:41.1522881Z $L__BB0_9: // %.preheader 2026-02-21T09:05:41.1522953Z setp.gt.s32 %p129, %r29866, 1023; 2026-02-21T09:05:41.1523069Z @%p129 bra $L__BB0_14; 2026-02-21T09:05:41.1523154Z // %bb.10: // %.lr.ph141 2026-02-21T09:05:41.1523359Z .loc 1 0 144 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:0:144 2026-02-21T09:05:41.1523428Z and.b32 %r22333, %r29078, 1904; 2026-02-21T09:05:41.1523490Z and.b32 %r22335, %r29079, 136; 2026-02-21T09:05:41.1523553Z or.b32 %r22336, %r22335, %r22333; 2026-02-21T09:05:41.1523616Z add.s32 %r65, %r24001, %r22336; 2026-02-21T09:05:41.1523684Z xor.b32 %r22338, %r22336, 8; 2026-02-21T09:05:41.1523745Z add.s32 %r66, %r24001, %r22338; 2026-02-21T09:05:41.1523805Z shl.b32 %r22340, 
%r29080, 4; 2026-02-21T09:05:41.1523875Z and.b32 %r22342, %r29081, 96; 2026-02-21T09:05:41.1523942Z and.b32 %r22344, %r29082, 6; 2026-02-21T09:05:41.1524003Z and.b32 %r22347, %r29084, 136; 2026-02-21T09:05:41.1524064Z or.b32 %r22348, %r22340, %r22342; 2026-02-21T09:05:41.1524129Z or.b32 %r22349, %r22348, %r22344; 2026-02-21T09:05:41.1524192Z or.b32 %r22350, %r22349, %r22347; 2026-02-21T09:05:41.1524254Z add.s32 %r67, %r24001, %r22350; 2026-02-21T09:05:41.1524320Z xor.b32 %r22351, %r22350, 8; 2026-02-21T09:05:41.1524381Z add.s32 %r68, %r24001, %r22351; 2026-02-21T09:05:41.1524442Z and.b32 %r22353, %r29084, 132; 2026-02-21T09:05:41.1524508Z or.b32 %r22354, %r29085, %r22353; 2026-02-21T09:05:41.1524567Z or.b32 %r22355, %r22354, %r7; 2026-02-21T09:05:41.1524626Z add.s32 %r69, %r24001, %r22355; 2026-02-21T09:05:41.1524686Z xor.b32 %r22356, %r22355, 4; 2026-02-21T09:05:41.1524753Z add.s32 %r70, %r24001, %r22356; 2026-02-21T09:05:41.1524814Z xor.b32 %r22357, %r22355, 32; 2026-02-21T09:05:41.1524873Z add.s32 %r71, %r24001, %r22357; 2026-02-21T09:05:41.1524936Z xor.b32 %r22358, %r22355, 36; 2026-02-21T09:05:41.1524997Z add.s32 %r72, %r24001, %r22358; 2026-02-21T09:05:41.1525056Z xor.b32 %r22359, %r22355, 64; 2026-02-21T09:05:41.1525115Z add.s32 %r73, %r24001, %r22359; 2026-02-21T09:05:41.1525182Z xor.b32 %r22360, %r22355, 68; 2026-02-21T09:05:41.1525243Z add.s32 %r74, %r24001, %r22360; 2026-02-21T09:05:41.1525304Z xor.b32 %r22361, %r22355, 96; 2026-02-21T09:05:41.1525369Z add.s32 %r75, %r24001, %r22361; 2026-02-21T09:05:41.1525487Z xor.b32 %r22362, %r22355, 100; 2026-02-21T09:05:41.1525550Z add.s32 %r76, %r24001, %r22362; 2026-02-21T09:05:41.1525611Z mul.lo.s32 %r22366, %r29086, 144; 2026-02-21T09:05:41.1525691Z xor.b32 %r22367, %r22366, %r29087; 2026-02-21T09:05:41.1525754Z or.b32 %r22368, %r22367, %r29088; 2026-02-21T09:05:41.1525816Z add.s32 %r77, %r24001, %r22368; 2026-02-21T09:05:41.1525882Z xor.b32 %r22369, %r22368, 132; 2026-02-21T09:05:41.1525951Z add.s32 %r78, 
%r24001, %r22369; 2026-02-21T09:05:41.1526010Z and.b32 %r22371, %r29089, 8128; 2026-02-21T09:05:41.1526075Z shl.b32 %r22372, %r29086, 3; 2026-02-21T09:05:41.1526135Z or.b32 %r22373, %r22371, %r22372; 2026-02-21T09:05:41.1526198Z add.s32 %r79, %r24001, %r22373; 2026-02-21T09:05:41.1526360Z xor.b32 %r22374, %r22373, 16; 2026-02-21T09:05:41.1526431Z add.s32 %r80, %r24001, %r22374; 2026-02-21T09:05:41.1526619Z xor.b32 %r22375, %r22373, 32; 2026-02-21T09:05:41.1526686Z add.s32 %r81, %r24001, %r22375; 2026-02-21T09:05:41.1526755Z xor.b32 %r22376, %r22373, 48; 2026-02-21T09:05:41.1526816Z add.s32 %r82, %r24001, %r22376; 2026-02-21T09:05:41.1526878Z bfe.u32 %r22377, %r24001, 4, 14; 2026-02-21T09:05:41.1526940Z cvt.u64.u32 %rd242, %r22377; 2026-02-21T09:05:41.1527024Z or.b64 %rd5, %rd242, -9223371899382267904; 2026-02-21T09:05:41.1527089Z add.s32 %r22378, %r24001, 32; 2026-02-21T09:05:41.1527154Z bfe.u32 %r22379, %r22378, 4, 14; 2026-02-21T09:05:41.1527223Z cvt.u64.u32 %rd243, %r22379; 2026-02-21T09:05:41.1527303Z or.b64 %rd6, %rd243, -9223371899382267904; 2026-02-21T09:05:41.1527368Z and.b32 %r22381, %r29089, 6144; 2026-02-21T09:05:41.1527429Z and.b32 %r22382, %r29078, 112; 2026-02-21T09:05:41.1527500Z or.b32 %r22383, %r29090, %r22382; 2026-02-21T09:05:41.1527574Z xor.b32 %r22384, %r22383, %r29083; 2026-02-21T09:05:41.1527708Z or.b32 %r22385, %r22384, %r22381; 2026-02-21T09:05:41.1527783Z add.s32 %r83, %r24001, %r22385; 2026-02-21T09:05:41.1527846Z add.s32 %r84, %r83, 32768; 2026-02-21T09:05:41.1527910Z add.s32 %r85, %r83, 8192; 2026-02-21T09:05:41.1527976Z add.s32 %r86, %r83, 40960; 2026-02-21T09:05:41.1528040Z add.s32 %r87, %r83, 16384; 2026-02-21T09:05:41.1528100Z add.s32 %r88, %r83, 49152; 2026-02-21T09:05:41.1528158Z add.s32 %r89, %r83, 24576; 2026-02-21T09:05:41.1528223Z add.s32 %r90, %r83, 57344; 2026-02-21T09:05:41.1528284Z xor.b32 %r22386, %r22385, 32; 2026-02-21T09:05:41.1528347Z add.s32 %r91, %r24001, %r22386; 2026-02-21T09:05:41.1528411Z add.s32 %r92, %r91, 
32768; 2026-02-21T09:05:41.1528473Z add.s32 %r93, %r91, 8192; 2026-02-21T09:05:41.1528532Z add.s32 %r94, %r91, 40960; 2026-02-21T09:05:41.1528592Z add.s32 %r95, %r91, 16384; 2026-02-21T09:05:41.1528657Z add.s32 %r96, %r91, 49152; 2026-02-21T09:05:41.1528718Z add.s32 %r97, %r91, 24576; 2026-02-21T09:05:41.1528782Z add.s32 %r98, %r91, 57344; 2026-02-21T09:05:41.1528851Z xor.b32 %r22387, %r22385, 64; 2026-02-21T09:05:41.1528913Z add.s32 %r99, %r24001, %r22387; 2026-02-21T09:05:41.1528978Z add.s32 %r100, %r99, 32768; 2026-02-21T09:05:41.1529038Z add.s32 %r101, %r99, 8192; 2026-02-21T09:05:41.1529107Z add.s32 %r102, %r99, 40960; 2026-02-21T09:05:41.1529169Z add.s32 %r103, %r99, 16384; 2026-02-21T09:05:41.1529229Z add.s32 %r104, %r99, 49152; 2026-02-21T09:05:41.1529300Z add.s32 %r105, %r99, 24576; 2026-02-21T09:05:41.1529360Z add.s32 %r106, %r99, 57344; 2026-02-21T09:05:41.1529420Z xor.b32 %r22388, %r22385, 96; 2026-02-21T09:05:41.1529496Z add.s32 %r107, %r24001, %r22388; 2026-02-21T09:05:41.1529564Z add.s32 %r108, %r107, 32768; 2026-02-21T09:05:41.1529626Z add.s32 %r109, %r107, 8192; 2026-02-21T09:05:41.1529687Z add.s32 %r110, %r107, 40960; 2026-02-21T09:05:41.1529750Z add.s32 %r111, %r107, 16384; 2026-02-21T09:05:41.1529809Z add.s32 %r112, %r107, 49152; 2026-02-21T09:05:41.1529873Z add.s32 %r113, %r107, 24576; 2026-02-21T09:05:41.1529941Z add.s32 %r114, %r107, 57344; 2026-02-21T09:05:41.1530158Z .loc 1 26 144 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:26:144 2026-02-21T09:05:41.1530305Z mad.wide.u32 %rd7, %r9, 16, %rd44; 2026-02-21T09:05:41.1530375Z mad.wide.u32 %rd8, %r8, 8192, %rd45; 2026-02-21T09:05:41.1530495Z $L__BB0_11: // =>This Loop Header: Depth=1 2026-02-21T09:05:41.1530593Z // Child Loop BB0_12 Depth 2 2026-02-21T09:05:41.1530798Z .loc 1 32 35 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:32:35 2026-02-21T09:05:41.1530867Z shr.s32 %r22391, %r29866, 31; 2026-02-21T09:05:41.1530930Z shr.u32 %r22392, %r22391, 25; 
2026-02-21T09:05:41.1530993Z add.s32 %r22393, %r29866, %r22392; 2026-02-21T09:05:41.1531194Z .loc 1 35 45 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:35:45 2026-02-21T09:05:41.1531405Z and.b32 %r22394, %r22393, 65408; 2026-02-21T09:05:41.1531471Z sub.s32 %r22395, %r29866, %r22394; 2026-02-21T09:05:41.1531666Z .loc 1 35 64 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:35:64 2026-02-21T09:05:41.1531736Z cvt.u16.u32 %rs961, %r22395; 2026-02-21T09:05:41.1531799Z and.b16 %rs962, %rs961, 128; 2026-02-21T09:05:41.1531994Z .loc 1 36 51 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:36:51 2026-02-21T09:05:41.1532062Z shr.u16 %rs963, %rs962, 7; 2026-02-21T09:05:41.1532127Z add.s16 %rs964, %rs961, %rs963; 2026-02-21T09:05:41.1532191Z cvt.s16.s8 %rs965, %rs964; 2026-02-21T09:05:41.1532260Z shr.s16 %rs966, %rs965, 1; 2026-02-21T09:05:41.1532456Z .loc 1 35 64 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:35:64 2026-02-21T09:05:41.1532517Z and.b16 %rs967, %rs964, 254; 2026-02-21T09:05:41.1532583Z sub.s16 %rs968, %rs961, %rs967; 2026-02-21T09:05:41.1532835Z .loc 1 37 27 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:37:27 2026-02-21T09:05:41.1532897Z shl.b32 %r22396, %r22393, 2; 2026-02-21T09:05:41.1532961Z and.b32 %r22397, %r22396, -512; 2026-02-21T09:05:41.1533029Z cvt.s16.s8 %rs969, %rs968; 2026-02-21T09:05:41.1533095Z mul.wide.s16 %r22398, %rs969, 256; 2026-02-21T09:05:41.1533159Z add.s32 %r1675, %r22398, %r22397; 2026-02-21T09:05:41.1533358Z .loc 1 39 27 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:39:27 2026-02-21T09:05:41.1533423Z mul.wide.s16 %r1676, %rs966, 128; 2026-02-21T09:05:41.1533630Z .loc 1 47 111 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:47:111 2026-02-21T09:05:41.1533691Z or.b32 %r22399, %r29091, %r22397; 2026-02-21T09:05:41.1533758Z add.s32 %r22400, %r22399, %r22398; 2026-02-21T09:05:41.1533820Z shl.b32 %r22401, %r22400, 10; 2026-02-21T09:05:41.1533886Z mul.wide.s32 
%rd33, %r22401, 2; 2026-02-21T09:05:41.1533958Z or.b32 %r22402, %r29092, %r22397; 2026-02-21T09:05:41.1534019Z add.s32 %r22403, %r22402, %r22398; 2026-02-21T09:05:41.1534080Z shl.b32 %r22404, %r22403, 10; 2026-02-21T09:05:41.1534152Z mul.wide.s32 %rd34, %r22404, 2; 2026-02-21T09:05:41.1534213Z or.b32 %r22405, %r29093, %r22397; 2026-02-21T09:05:41.1534276Z add.s32 %r22406, %r22405, %r22398; 2026-02-21T09:05:41.1534337Z shl.b32 %r22407, %r22406, 10; 2026-02-21T09:05:41.1534415Z mul.wide.s32 %rd35, %r22407, 2; 2026-02-21T09:05:41.1534481Z or.b32 %r22408, %r5, %r22397; 2026-02-21T09:05:41.1534546Z add.s32 %r22409, %r22408, %r22398; 2026-02-21T09:05:41.1534611Z shl.b32 %r22410, %r22409, 10; 2026-02-21T09:05:41.1534675Z mul.wide.s32 %rd36, %r22410, 2; 2026-02-21T09:05:41.1534736Z or.b32 %r22411, %r7, %r1676; 2026-02-21T09:05:41.1534800Z cvt.s64.s32 %rd245, %r22411; 2026-02-21T09:05:41.1534868Z add.s64 %rd305, %rd8, %rd245; 2026-02-21T09:05:41.1534930Z mov.b32 %r29867, 0f00000000; 2026-02-21T09:05:41.1534995Z mov.b64 %rd307, -32; 2026-02-21T09:05:41.1535061Z mov.b64 %rd306, %rd7; 2026-02-21T09:05:41.1535124Z mov.b32 %r29868, %r29867; 2026-02-21T09:05:41.1535183Z mov.b32 %r29869, %r29867; 2026-02-21T09:05:41.1535304Z mov.b32 %r29870, %r29867; 2026-02-21T09:05:41.1535369Z mov.b32 %r29871, %r29867; 2026-02-21T09:05:41.1535428Z mov.b32 %r29872, %r29867; 2026-02-21T09:05:41.1535487Z mov.b32 %r29873, %r29867; 2026-02-21T09:05:41.1535550Z mov.b32 %r29874, %r29867; 2026-02-21T09:05:41.1535610Z mov.b32 %r29875, %r29867; 2026-02-21T09:05:41.1535669Z mov.b32 %r29876, %r29867; 2026-02-21T09:05:41.1535727Z mov.b32 %r29877, %r29867; 2026-02-21T09:05:41.1535791Z mov.b32 %r29878, %r29867; 2026-02-21T09:05:41.1535848Z mov.b32 %r29879, %r29867; 2026-02-21T09:05:41.1535907Z mov.b32 %r29880, %r29867; 2026-02-21T09:05:41.1535971Z mov.b32 %r29881, %r29867; 2026-02-21T09:05:41.1536031Z mov.b32 %r29882, %r29867; 2026-02-21T09:05:41.1536090Z mov.b32 %r29883, %r29867; 2026-02-21T09:05:41.1536245Z 
mov.b32 %r29884, %r29867; 2026-02-21T09:05:41.1536307Z mov.b32 %r29885, %r29867; 2026-02-21T09:05:41.1536365Z mov.b32 %r29886, %r29867; 2026-02-21T09:05:41.1536423Z mov.b32 %r29887, %r29867; 2026-02-21T09:05:41.1536616Z mov.b32 %r29888, %r29867; 2026-02-21T09:05:41.1536682Z mov.b32 %r29889, %r29867; 2026-02-21T09:05:41.1536745Z mov.b32 %r29890, %r29867; 2026-02-21T09:05:41.1536808Z mov.b32 %r29891, %r29867; 2026-02-21T09:05:41.1536868Z mov.b32 %r29892, %r29867; 2026-02-21T09:05:41.1536926Z mov.b32 %r29893, %r29867; 2026-02-21T09:05:41.1536984Z mov.b32 %r29894, %r29867; 2026-02-21T09:05:41.1537054Z mov.b32 %r29895, %r29867; 2026-02-21T09:05:41.1537115Z mov.b32 %r29896, %r29867; 2026-02-21T09:05:41.1537177Z mov.b32 %r29897, %r29867; 2026-02-21T09:05:41.1537242Z mov.b32 %r29898, %r29867; 2026-02-21T09:05:41.1537301Z mov.b32 %r29899, %r29867; 2026-02-21T09:05:41.1537359Z mov.b32 %r29900, %r29867; 2026-02-21T09:05:41.1537418Z mov.b32 %r29901, %r29867; 2026-02-21T09:05:41.1537487Z mov.b32 %r29902, %r29867; 2026-02-21T09:05:41.1537621Z mov.b32 %r29903, %r29867; 2026-02-21T09:05:41.1537685Z mov.b32 %r29904, %r29867; 2026-02-21T09:05:41.1537748Z mov.b32 %r29905, %r29867; 2026-02-21T09:05:41.1537809Z mov.b32 %r29906, %r29867; 2026-02-21T09:05:41.1537867Z mov.b32 %r29907, %r29867; 2026-02-21T09:05:41.1537926Z mov.b32 %r29908, %r29867; 2026-02-21T09:05:41.1537999Z mov.b32 %r29909, %r29867; 2026-02-21T09:05:41.1538065Z mov.b32 %r29910, %r29867; 2026-02-21T09:05:41.1538125Z mov.b32 %r29911, %r29867; 2026-02-21T09:05:41.1538190Z mov.b32 %r29912, %r29867; 2026-02-21T09:05:41.1538247Z mov.b32 %r29913, %r29867; 2026-02-21T09:05:41.1538306Z mov.b32 %r29914, %r29867; 2026-02-21T09:05:41.1538364Z mov.b32 %r29915, %r29867; 2026-02-21T09:05:41.1538426Z mov.b32 %r29916, %r29867; 2026-02-21T09:05:41.1538484Z mov.b32 %r29917, %r29867; 2026-02-21T09:05:41.1538544Z mov.b32 %r29918, %r29867; 2026-02-21T09:05:41.1538606Z mov.b32 %r29919, %r29867; 2026-02-21T09:05:41.1538670Z mov.b32 %r29920, 
%r29867; 2026-02-21T09:05:41.1538730Z mov.b32 %r29921, %r29867; 2026-02-21T09:05:41.1538789Z mov.b32 %r29922, %r29867; 2026-02-21T09:05:41.1538853Z mov.b32 %r29923, %r29867; 2026-02-21T09:05:41.1538914Z mov.b32 %r29924, %r29867; 2026-02-21T09:05:41.1538973Z mov.b32 %r29925, %r29867; 2026-02-21T09:05:41.1539036Z mov.b32 %r29926, %r29867; 2026-02-21T09:05:41.1539098Z mov.b32 %r29927, %r29867; 2026-02-21T09:05:41.1539155Z mov.b32 %r29928, %r29867; 2026-02-21T09:05:41.1539221Z mov.b32 %r29929, %r29867; 2026-02-21T09:05:41.1539280Z mov.b32 %r29930, %r29867; 2026-02-21T09:05:41.1539337Z mov.b32 %r29931, %r29867; 2026-02-21T09:05:41.1539397Z mov.b32 %r29932, %r29867; 2026-02-21T09:05:41.1539468Z mov.b32 %r29933, %r29867; 2026-02-21T09:05:41.1539531Z mov.b32 %r29934, %r29867; 2026-02-21T09:05:41.1539592Z mov.b32 %r29935, %r29867; 2026-02-21T09:05:41.1539658Z mov.b32 %r29936, %r29867; 2026-02-21T09:05:41.1539717Z mov.b32 %r29937, %r29867; 2026-02-21T09:05:41.1539779Z mov.b32 %r29938, %r29867; 2026-02-21T09:05:41.1539839Z mov.b32 %r29939, %r29867; 2026-02-21T09:05:41.1539903Z mov.b32 %r29940, %r29867; 2026-02-21T09:05:41.1539963Z mov.b32 %r29941, %r29867; 2026-02-21T09:05:41.1540099Z mov.b32 %r29942, %r29867; 2026-02-21T09:05:41.1540162Z mov.b32 %r29943, %r29867; 2026-02-21T09:05:41.1540221Z mov.b32 %r29944, %r29867; 2026-02-21T09:05:41.1540280Z mov.b32 %r29945, %r29867; 2026-02-21T09:05:41.1540340Z mov.b32 %r29946, %r29867; 2026-02-21T09:05:41.1540404Z mov.b32 %r29947, %r29867; 2026-02-21T09:05:41.1540460Z mov.b32 %r29948, %r29867; 2026-02-21T09:05:41.1540518Z mov.b32 %r29949, %r29867; 2026-02-21T09:05:41.1540583Z mov.b32 %r29950, %r29867; 2026-02-21T09:05:41.1540642Z mov.b32 %r29951, %r29867; 2026-02-21T09:05:41.1540704Z mov.b32 %r29952, %r29867; 2026-02-21T09:05:41.1540763Z mov.b32 %r29953, %r29867; 2026-02-21T09:05:41.1540826Z mov.b32 %r29954, %r29867; 2026-02-21T09:05:41.1540886Z mov.b32 %r29955, %r29867; 2026-02-21T09:05:41.1541079Z mov.b32 %r29956, %r29867; 
2026-02-21T09:05:41.1541147Z mov.b32 %r29957, %r29867; 2026-02-21T09:05:41.1541207Z mov.b32 %r29958, %r29867; 2026-02-21T09:05:41.1541272Z mov.b32 %r29959, %r29867; 2026-02-21T09:05:41.1541343Z mov.b32 %r29960, %r29867; 2026-02-21T09:05:41.1541406Z mov.b32 %r29961, %r29867; 2026-02-21T09:05:41.1541465Z mov.b32 %r29962, %r29867; 2026-02-21T09:05:41.1541524Z mov.b32 %r29963, %r29867; 2026-02-21T09:05:41.1541587Z mov.b32 %r29964, %r29867; 2026-02-21T09:05:41.1541645Z mov.b32 %r29965, %r29867; 2026-02-21T09:05:41.1541704Z mov.b32 %r29966, %r29867; 2026-02-21T09:05:41.1541766Z mov.b32 %r29967, %r29867; 2026-02-21T09:05:41.1541831Z mov.b32 %r29968, %r29867; 2026-02-21T09:05:41.1541889Z mov.b32 %r29969, %r29867; 2026-02-21T09:05:41.1541947Z mov.b32 %r29970, %r29867; 2026-02-21T09:05:41.1542013Z mov.b32 %r29971, %r29867; 2026-02-21T09:05:41.1542073Z mov.b32 %r29972, %r29867; 2026-02-21T09:05:41.1542133Z mov.b32 %r29973, %r29867; 2026-02-21T09:05:41.1548187Z mov.b32 %r29974, %r29867; 2026-02-21T09:05:41.1548513Z mov.b32 %r29975, %r29867; 2026-02-21T09:05:41.1548598Z mov.b32 %r29976, %r29867; 2026-02-21T09:05:41.1548667Z mov.b32 %r29977, %r29867; 2026-02-21T09:05:41.1548744Z mov.b32 %r29978, %r29867; 2026-02-21T09:05:41.1548809Z mov.b32 %r29979, %r29867; 2026-02-21T09:05:41.1548872Z mov.b32 %r29980, %r29867; 2026-02-21T09:05:41.1548933Z mov.b32 %r29981, %r29867; 2026-02-21T09:05:41.1548994Z mov.b32 %r29982, %r29867; 2026-02-21T09:05:41.1549056Z mov.b32 %r29983, %r29867; 2026-02-21T09:05:41.1549115Z mov.b32 %r29984, %r29867; 2026-02-21T09:05:41.1549175Z mov.b32 %r29985, %r29867; 2026-02-21T09:05:41.1549234Z mov.b32 %r29986, %r29867; 2026-02-21T09:05:41.1549298Z mov.b32 %r29987, %r29867; 2026-02-21T09:05:41.1549359Z mov.b32 %r29988, %r29867; 2026-02-21T09:05:41.1549417Z mov.b32 %r29989, %r29867; 2026-02-21T09:05:41.1549483Z mov.b32 %r29990, %r29867; 2026-02-21T09:05:41.1549541Z mov.b32 %r29991, %r29867; 2026-02-21T09:05:41.1549605Z mov.b32 %r29992, %r29867; 
2026-02-21T09:05:41.1549667Z mov.b32 %r29993, %r29867; 2026-02-21T09:05:41.1549730Z mov.b32 %r29994, %r29867; 2026-02-21T09:05:41.1549789Z mov.b32 %r29995, %r29867; 2026-02-21T09:05:41.1549852Z mov.b32 %r29996, %r29867; 2026-02-21T09:05:41.1549915Z mov.b32 %r29997, %r29867; 2026-02-21T09:05:41.1549975Z mov.b32 %r29998, %r29867; 2026-02-21T09:05:41.1550034Z mov.b32 %r29999, %r29867; 2026-02-21T09:05:41.1550090Z mov.b32 %r30000, %r29867; 2026-02-21T09:05:41.1550157Z mov.b32 %r30001, %r29867; 2026-02-21T09:05:41.1550213Z mov.b32 %r30002, %r29867; 2026-02-21T09:05:41.1550273Z mov.b32 %r30003, %r29867; 2026-02-21T09:05:41.1550336Z mov.b32 %r30004, %r29867; 2026-02-21T09:05:41.1550394Z mov.b32 %r30005, %r29867; 2026-02-21T09:05:41.1550453Z mov.b32 %r30006, %r29867; 2026-02-21T09:05:41.1550518Z mov.b32 %r30007, %r29867; 2026-02-21T09:05:41.1550578Z mov.b32 %r30008, %r29867; 2026-02-21T09:05:41.1550636Z mov.b32 %r30009, %r29867; 2026-02-21T09:05:41.1550697Z mov.b32 %r30010, %r29867; 2026-02-21T09:05:41.1550769Z mov.b32 %r30011, %r29867; 2026-02-21T09:05:41.1550828Z mov.b32 %r30012, %r29867; 2026-02-21T09:05:41.1550887Z mov.b32 %r30013, %r29867; 2026-02-21T09:05:41.1551038Z mov.b32 %r30014, %r29867; 2026-02-21T09:05:41.1551099Z mov.b32 %r30015, %r29867; 2026-02-21T09:05:41.1551157Z mov.b32 %r30016, %r29867; 2026-02-21T09:05:41.1551217Z mov.b32 %r30017, %r29867; 2026-02-21T09:05:41.1551281Z mov.b32 %r30018, %r29867; 2026-02-21T09:05:41.1551340Z mov.b32 %r30019, %r29867; 2026-02-21T09:05:41.1551397Z mov.b32 %r30020, %r29867; 2026-02-21T09:05:41.1551462Z mov.b32 %r30021, %r29867; 2026-02-21T09:05:41.1551520Z mov.b32 %r30022, %r29867; 2026-02-21T09:05:41.1551581Z mov.b32 %r30023, %r29867; 2026-02-21T09:05:41.1551641Z mov.b32 %r30024, %r29867; 2026-02-21T09:05:41.1551703Z mov.b32 %r30025, %r29867; 2026-02-21T09:05:41.1551762Z mov.b32 %r30026, %r29867; 2026-02-21T09:05:41.1551824Z mov.b32 %r30027, %r29867; 2026-02-21T09:05:41.1552022Z mov.b32 %r30028, %r29867; 
2026-02-21T09:05:41.1552096Z mov.b32 %r30029, %r29867; 2026-02-21T09:05:41.1552157Z mov.b32 %r30030, %r29867; 2026-02-21T09:05:41.1552219Z mov.b32 %r30031, %r29867; 2026-02-21T09:05:41.1552287Z mov.b32 %r30032, %r29867; 2026-02-21T09:05:41.1552346Z mov.b32 %r30033, %r29867; 2026-02-21T09:05:41.1552403Z mov.b32 %r30034, %r29867; 2026-02-21T09:05:41.1552465Z mov.b32 %r30035, %r29867; 2026-02-21T09:05:41.1552522Z mov.b32 %r30036, %r29867; 2026-02-21T09:05:41.1552582Z mov.b32 %r30037, %r29867; 2026-02-21T09:05:41.1552640Z mov.b32 %r30038, %r29867; 2026-02-21T09:05:41.1552706Z mov.b32 %r30039, %r29867; 2026-02-21T09:05:41.1552768Z mov.b32 %r30040, %r29867; 2026-02-21T09:05:41.1552828Z mov.b32 %r30041, %r29867; 2026-02-21T09:05:41.1552891Z mov.b32 %r30042, %r29867; 2026-02-21T09:05:41.1552948Z mov.b32 %r30043, %r29867; 2026-02-21T09:05:41.1553008Z mov.b32 %r30044, %r29867; 2026-02-21T09:05:41.1553067Z mov.b32 %r30045, %r29867; 2026-02-21T09:05:41.1553135Z mov.b32 %r30046, %r29867; 2026-02-21T09:05:41.1553246Z mov.b32 %r30047, %r29867; 2026-02-21T09:05:41.1553308Z mov.b32 %r30048, %r29867; 2026-02-21T09:05:41.1553370Z mov.b32 %r30049, %r29867; 2026-02-21T09:05:41.1553431Z mov.b32 %r30050, %r29867; 2026-02-21T09:05:41.1553491Z mov.b32 %r30051, %r29867; 2026-02-21T09:05:41.1553550Z mov.b32 %r30052, %r29867; 2026-02-21T09:05:41.1553627Z mov.b32 %r30053, %r29867; 2026-02-21T09:05:41.1553686Z mov.b32 %r30054, %r29867; 2026-02-21T09:05:41.1553745Z mov.b32 %r30055, %r29867; 2026-02-21T09:05:41.1553808Z mov.b32 %r30056, %r29867; 2026-02-21T09:05:41.1553867Z mov.b32 %r30057, %r29867; 2026-02-21T09:05:41.1553926Z mov.b32 %r30058, %r29867; 2026-02-21T09:05:41.1553989Z mov.b32 %r30059, %r29867; 2026-02-21T09:05:41.1554050Z mov.b32 %r30060, %r29867; 2026-02-21T09:05:41.1554108Z mov.b32 %r30061, %r29867; 2026-02-21T09:05:41.1554168Z mov.b32 %r30062, %r29867; 2026-02-21T09:05:41.1554232Z mov.b32 %r30063, %r29867; 2026-02-21T09:05:41.1554294Z mov.b32 %r30064, %r29867; 
2026-02-21T09:05:41.1554355Z mov.b32 %r30065, %r29867; 2026-02-21T09:05:41.1554418Z mov.b32 %r30066, %r29867; 2026-02-21T09:05:41.1554476Z mov.b32 %r30067, %r29867; 2026-02-21T09:05:41.1554536Z mov.b32 %r30068, %r29867; 2026-02-21T09:05:41.1554594Z mov.b32 %r30069, %r29867; 2026-02-21T09:05:41.1554655Z mov.b32 %r30070, %r29867; 2026-02-21T09:05:41.1554713Z mov.b32 %r30071, %r29867; 2026-02-21T09:05:41.1554772Z mov.b32 %r30072, %r29867; 2026-02-21T09:05:41.1554835Z mov.b32 %r30073, %r29867; 2026-02-21T09:05:41.1554895Z mov.b32 %r30074, %r29867; 2026-02-21T09:05:41.1554953Z mov.b32 %r30075, %r29867; 2026-02-21T09:05:41.1555013Z mov.b32 %r30076, %r29867; 2026-02-21T09:05:41.1555076Z mov.b32 %r30077, %r29867; 2026-02-21T09:05:41.1555133Z mov.b32 %r30078, %r29867; 2026-02-21T09:05:41.1555193Z mov.b32 %r30079, %r29867; 2026-02-21T09:05:41.1555256Z mov.b32 %r30080, %r29867; 2026-02-21T09:05:41.1555317Z mov.b32 %r30081, %r29867; 2026-02-21T09:05:41.1555380Z mov.b32 %r30082, %r29867; 2026-02-21T09:05:41.1555441Z mov.b32 %r30083, %r29867; 2026-02-21T09:05:41.1555508Z mov.b32 %r30084, %r29867; 2026-02-21T09:05:41.1555573Z mov.b32 %r30085, %r29867; 2026-02-21T09:05:41.1555716Z mov.b32 %r30086, %r29867; 2026-02-21T09:05:41.1555786Z mov.b32 %r30087, %r29867; 2026-02-21T09:05:41.1555855Z mov.b32 %r30088, %r29867; 2026-02-21T09:05:41.1555917Z mov.b32 %r30089, %r29867; 2026-02-21T09:05:41.1555979Z mov.b32 %r30090, %r29867; 2026-02-21T09:05:41.1556044Z mov.b32 %r30091, %r29867; 2026-02-21T09:05:41.1556103Z mov.b32 %r30092, %r29867; 2026-02-21T09:05:41.1556163Z mov.b32 %r30093, %r29867; 2026-02-21T09:05:41.1556228Z mov.b32 %r30094, %r29867; 2026-02-21T09:05:41.1556289Z mov.b32 %r30095, %r29867; 2026-02-21T09:05:41.1556348Z mov.b32 %r30096, %r29867; 2026-02-21T09:05:41.1556410Z mov.b32 %r30097, %r29867; 2026-02-21T09:05:41.1556619Z mov.b32 %r30098, %r29867; 2026-02-21T09:05:41.1556769Z mov.b32 %r30099, %r29867; 2026-02-21T09:05:41.1556903Z mov.b32 %r30100, %r29867; 
2026-02-21T09:05:41.1556973Z mov.b32 %r30101, %r29867; 2026-02-21T09:05:41.1557036Z mov.b32 %r30102, %r29867; 2026-02-21T09:05:41.1557095Z mov.b32 %r30103, %r29867; 2026-02-21T09:05:41.1557171Z mov.b32 %r30104, %r29867; 2026-02-21T09:05:41.1557232Z mov.b32 %r30105, %r29867; 2026-02-21T09:05:41.1557293Z mov.b32 %r30106, %r29867; 2026-02-21T09:05:41.1557351Z mov.b32 %r30107, %r29867; 2026-02-21T09:05:41.1557416Z mov.b32 %r30108, %r29867; 2026-02-21T09:05:41.1557475Z mov.b32 %r30109, %r29867; 2026-02-21T09:05:41.1557534Z mov.b32 %r30110, %r29867; 2026-02-21T09:05:41.1557598Z mov.b32 %r30111, %r29867; 2026-02-21T09:05:41.1557657Z mov.b32 %r30112, %r29867; 2026-02-21T09:05:41.1557718Z mov.b32 %r30113, %r29867; 2026-02-21T09:05:41.1557776Z mov.b32 %r30114, %r29867; 2026-02-21T09:05:41.1557840Z mov.b32 %r30115, %r29867; 2026-02-21T09:05:41.1557898Z mov.b32 %r30116, %r29867; 2026-02-21T09:05:41.1557971Z mov.b32 %r30117, %r29867; 2026-02-21T09:05:41.1558040Z mov.b32 %r30118, %r29867; 2026-02-21T09:05:41.1558166Z mov.b32 %r30119, %r29867; 2026-02-21T09:05:41.1558230Z mov.b32 %r30120, %r29867; 2026-02-21T09:05:41.1558291Z mov.b32 %r30121, %r29867; 2026-02-21T09:05:41.1558358Z mov.b32 %r30122, %r29867; 2026-02-21T09:05:41.1558495Z $L__BB0_12: // Parent Loop BB0_11 Depth=1 2026-02-21T09:05:41.1558609Z // => This Inner Loop Header: Depth=2 2026-02-21T09:05:41.1558843Z .loc 1 55 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:32 2026-02-21T09:05:41.1558915Z add.s64 %rd246, %rd306, %rd36; 2026-02-21T09:05:41.1558982Z add.s64 %rd247, %rd306, %rd35; 2026-02-21T09:05:41.1559052Z add.s64 %rd248, %rd306, %rd34; 2026-02-21T09:05:41.1559267Z .loc 1 55 80 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:80 2026-02-21T09:05:41.1559331Z add.s64 %rd249, %rd306, %rd33; 2026-02-21T09:05:41.1559398Z // begin inline asm 2026-02-21T09:05:41.1559468Z mov.u32 %r22412, 0x0; 2026-02-21T09:05:41.1559529Z mov.u32 %r22413, 0x0; 2026-02-21T09:05:41.1559587Z mov.u32 %r22414, 
0x0; 2026-02-21T09:05:41.1559655Z mov.u32 %r22415, 0x0; 2026-02-21T09:05:41.1559801Z ld.global.v4.b32 { %r22412, %r22413, %r22414, %r22415 }, [ %rd246 + 0 ]; 2026-02-21T09:05:41.1559861Z // end inline asm 2026-02-21T09:05:41.1559923Z // begin inline asm 2026-02-21T09:05:41.1559987Z mov.u32 %r22416, 0x0; 2026-02-21T09:05:41.1560046Z mov.u32 %r22417, 0x0; 2026-02-21T09:05:41.1560108Z mov.u32 %r22418, 0x0; 2026-02-21T09:05:41.1560169Z mov.u32 %r22419, 0x0; 2026-02-21T09:05:41.1560299Z ld.global.v4.b32 { %r22416, %r22417, %r22418, %r22419 }, [ %rd247 + 0 ]; 2026-02-21T09:05:41.1560358Z // end inline asm 2026-02-21T09:05:41.1560417Z // begin inline asm 2026-02-21T09:05:41.1560481Z mov.u32 %r22420, 0x0; 2026-02-21T09:05:41.1560538Z mov.u32 %r22421, 0x0; 2026-02-21T09:05:41.1560596Z mov.u32 %r22422, 0x0; 2026-02-21T09:05:41.1560663Z mov.u32 %r22423, 0x0; 2026-02-21T09:05:41.1560790Z ld.global.v4.b32 { %r22420, %r22421, %r22422, %r22423 }, [ %rd248 + 0 ]; 2026-02-21T09:05:41.1560849Z // end inline asm 2026-02-21T09:05:41.1560991Z // begin inline asm 2026-02-21T09:05:41.1561053Z mov.u32 %r22424, 0x0; 2026-02-21T09:05:41.1561120Z mov.u32 %r22425, 0x0; 2026-02-21T09:05:41.1561183Z mov.u32 %r22426, 0x0; 2026-02-21T09:05:41.1561247Z mov.u32 %r22427, 0x0; 2026-02-21T09:05:41.1561374Z ld.global.v4.b32 { %r22424, %r22425, %r22426, %r22427 }, [ %rd249 + 0 ]; 2026-02-21T09:05:41.1561431Z // end inline asm 2026-02-21T09:05:41.1561645Z .loc 1 59 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:59:32 2026-02-21T09:05:41.1561705Z bar.sync 0; 2026-02-21T09:05:41.1561793Z st.shared.v2.b32 [%r65], {%r22412, %r22413}; 2026-02-21T09:05:41.1561894Z st.shared.v2.b32 [%r65+2048], {%r22416, %r22417}; 2026-02-21T09:05:41.1561994Z st.shared.v2.b32 [%r65+4096], {%r22420, %r22421}; 2026-02-21T09:05:41.1562182Z st.shared.v2.b32 [%r65+6144], {%r22424, %r22425}; 2026-02-21T09:05:41.1562269Z st.shared.v2.b32 [%r66], {%r22414, %r22415}; 2026-02-21T09:05:41.1562362Z st.shared.v2.b32 
[%r66+2048], {%r22418, %r22419}; 2026-02-21T09:05:41.1562449Z st.shared.v2.b32 [%r66+4096], {%r22422, %r22423}; 2026-02-21T09:05:41.1562536Z st.shared.v2.b32 [%r66+6144], {%r22426, %r22427}; 2026-02-21T09:05:41.1562601Z bar.sync 0; 2026-02-21T09:05:41.1562675Z ld.shared.b16 %rs970, [%r67]; 2026-02-21T09:05:41.1562748Z ld.shared.b16 %rs971, [%r67+256]; 2026-02-21T09:05:41.1562818Z ld.shared.b16 %rs972, [%r67+16]; 2026-02-21T09:05:41.1562894Z ld.shared.b16 %rs973, [%r67+272]; 2026-02-21T09:05:41.1562963Z ld.shared.b16 %rs974, [%r67+2048]; 2026-02-21T09:05:41.1563033Z ld.shared.b16 %rs975, [%r67+2304]; 2026-02-21T09:05:41.1563111Z ld.shared.b16 %rs976, [%r67+2064]; 2026-02-21T09:05:41.1563177Z ld.shared.b16 %rs977, [%r67+2320]; 2026-02-21T09:05:41.1563241Z ld.shared.b16 %rs978, [%r67+4096]; 2026-02-21T09:05:41.1563314Z ld.shared.b16 %rs979, [%r67+4352]; 2026-02-21T09:05:41.1563436Z ld.shared.b16 %rs980, [%r67+4112]; 2026-02-21T09:05:41.1563505Z ld.shared.b16 %rs981, [%r67+4368]; 2026-02-21T09:05:41.1563574Z ld.shared.b16 %rs982, [%r67+6144]; 2026-02-21T09:05:41.1563653Z ld.shared.b16 %rs983, [%r67+6400]; 2026-02-21T09:05:41.1563727Z ld.shared.b16 %rs984, [%r67+6160]; 2026-02-21T09:05:41.1563797Z ld.shared.b16 %rs985, [%r67+6416]; 2026-02-21T09:05:41.1563869Z ld.shared.b16 %rs986, [%r68]; 2026-02-21T09:05:41.1563938Z ld.shared.b16 %rs987, [%r68+256]; 2026-02-21T09:05:41.1564004Z ld.shared.b16 %rs988, [%r68+16]; 2026-02-21T09:05:41.1564069Z ld.shared.b16 %rs989, [%r68+272]; 2026-02-21T09:05:41.1564139Z ld.shared.b16 %rs990, [%r68+2048]; 2026-02-21T09:05:41.1564205Z ld.shared.b16 %rs991, [%r68+2304]; 2026-02-21T09:05:41.1564271Z ld.shared.b16 %rs992, [%r68+2064]; 2026-02-21T09:05:41.1564339Z ld.shared.b16 %rs993, [%r68+2320]; 2026-02-21T09:05:41.1564406Z ld.shared.b16 %rs994, [%r68+4096]; 2026-02-21T09:05:41.1564475Z ld.shared.b16 %rs995, [%r68+4352]; 2026-02-21T09:05:41.1564540Z ld.shared.b16 %rs996, [%r68+4112]; 2026-02-21T09:05:41.1564612Z ld.shared.b16 %rs997, 
[%r68+4368]; 2026-02-21T09:05:41.1564681Z ld.shared.b16 %rs998, [%r68+6144]; 2026-02-21T09:05:41.1564749Z ld.shared.b16 %rs999, [%r68+6400]; 2026-02-21T09:05:41.1564827Z ld.shared.b16 %rs1000, [%r68+6160]; 2026-02-21T09:05:41.1564896Z ld.shared.b16 %rs1001, [%r68+6416]; 2026-02-21T09:05:41.1564959Z cvt.f32.bf16 %r22558, %rs970; 2026-02-21T09:05:41.1565027Z cvt.f32.bf16 %r22559, %rs971; 2026-02-21T09:05:41.1565088Z cvt.f32.bf16 %r22560, %rs986; 2026-02-21T09:05:41.1565152Z cvt.f32.bf16 %r22561, %rs987; 2026-02-21T09:05:41.1565212Z cvt.f32.bf16 %r22690, %rs972; 2026-02-21T09:05:41.1565280Z cvt.f32.bf16 %r22691, %rs973; 2026-02-21T09:05:41.1565341Z cvt.f32.bf16 %r22692, %rs988; 2026-02-21T09:05:41.1565400Z cvt.f32.bf16 %r22693, %rs989; 2026-02-21T09:05:41.1565465Z cvt.f32.bf16 %r22822, %rs974; 2026-02-21T09:05:41.1565533Z cvt.f32.bf16 %r22823, %rs975; 2026-02-21T09:05:41.1565594Z cvt.f32.bf16 %r22824, %rs990; 2026-02-21T09:05:41.1565656Z cvt.f32.bf16 %r22825, %rs991; 2026-02-21T09:05:41.1565722Z cvt.f32.bf16 %r22954, %rs976; 2026-02-21T09:05:41.1565849Z cvt.f32.bf16 %r22955, %rs977; 2026-02-21T09:05:41.1565914Z cvt.f32.bf16 %r22956, %rs992; 2026-02-21T09:05:41.1565981Z cvt.f32.bf16 %r22957, %rs993; 2026-02-21T09:05:41.1566042Z cvt.f32.bf16 %r23086, %rs978; 2026-02-21T09:05:41.1566103Z cvt.f32.bf16 %r23087, %rs979; 2026-02-21T09:05:41.1566163Z cvt.f32.bf16 %r23088, %rs994; 2026-02-21T09:05:41.1566231Z cvt.f32.bf16 %r23089, %rs995; 2026-02-21T09:05:41.1566291Z cvt.f32.bf16 %r23218, %rs980; 2026-02-21T09:05:41.1566354Z cvt.f32.bf16 %r23219, %rs981; 2026-02-21T09:05:41.1566422Z cvt.f32.bf16 %r23220, %rs996; 2026-02-21T09:05:41.1566603Z cvt.f32.bf16 %r23221, %rs997; 2026-02-21T09:05:41.1566671Z cvt.f32.bf16 %r23350, %rs982; 2026-02-21T09:05:41.1566812Z cvt.f32.bf16 %r23351, %rs983; 2026-02-21T09:05:41.1566953Z cvt.f32.bf16 %r23352, %rs998; 2026-02-21T09:05:41.1567020Z cvt.f32.bf16 %r23353, %rs999; 2026-02-21T09:05:41.1567087Z cvt.f32.bf16 %r23482, %rs984; 
2026-02-21T09:05:41.1567156Z cvt.f32.bf16 %r23483, %rs985; 2026-02-21T09:05:41.1567223Z cvt.f32.bf16 %r23484, %rs1000; 2026-02-21T09:05:41.1567287Z cvt.f32.bf16 %r23485, %rs1001; 2026-02-21T09:05:41.1567511Z .loc 1 61 87 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:87 2026-02-21T09:05:41.1567573Z // begin inline asm 2026-02-21T09:05:41.1567633Z mov.u32 %r22428, 0x0; 2026-02-21T09:05:41.1567695Z mov.u32 %r22429, 0x0; 2026-02-21T09:05:41.1567802Z ld.global.v2.b32 { %r22428, %r22429 }, [ %rd305 + 0 ]; 2026-02-21T09:05:41.1567862Z // end inline asm 2026-02-21T09:05:41.1568069Z .loc 1 69 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:69:28 2026-02-21T09:05:41.1568136Z bar.sync 0; 2026-02-21T09:05:41.1568206Z st.shared.b8 [%r69], %r22428; 2026-02-21T09:05:41.1568343Z prmt.b32 %r28780, %r22428, 0, 0x7771U; 2026-02-21T09:05:41.1568413Z st.shared.b8 [%r70], %r28780; 2026-02-21T09:05:41.1568488Z prmt.b32 %r28781, %r22428, 0, 0x7772U; 2026-02-21T09:05:41.1568556Z st.shared.b8 [%r71+256], %r28781; 2026-02-21T09:05:41.1568622Z prmt.b32 %r28782, %r22428, 0, 0x7773U; 2026-02-21T09:05:41.1568693Z st.shared.b8 [%r72+256], %r28782; 2026-02-21T09:05:41.1568758Z st.shared.b8 [%r73+512], %r22429; 2026-02-21T09:05:41.1568824Z prmt.b32 %r28783, %r22429, 0, 0x7771U; 2026-02-21T09:05:41.1568897Z st.shared.b8 [%r74+512], %r28783; 2026-02-21T09:05:41.1568974Z prmt.b32 %r28784, %r22429, 0, 0x7772U; 2026-02-21T09:05:41.1569039Z st.shared.b8 [%r75+768], %r28784; 2026-02-21T09:05:41.1569106Z prmt.b32 %r28785, %r22429, 0, 0x7773U; 2026-02-21T09:05:41.1569175Z st.shared.b8 [%r76+768], %r28785; 2026-02-21T09:05:41.1569233Z bar.sync 0; 2026-02-21T09:05:41.1569301Z ld.shared.b32 %r28786, [%r77]; 2026-02-21T09:05:41.1569378Z prmt.b32 %r28787, %r28786, 0, 0x7770U; 2026-02-21T09:05:41.1569443Z cvt.u16.u32 %rs1002, %r28787; 2026-02-21T09:05:41.1569510Z prmt.b32 %r28788, %r28786, 0, 0x7771U; 2026-02-21T09:05:41.1569575Z cvt.u16.u32 %rs1003, %r28788; 
2026-02-21T09:05:41.1569648Z prmt.b32 %r28789, %r28786, 0, 0x7772U; 2026-02-21T09:05:41.1569710Z cvt.u16.u32 %rs1004, %r28789; 2026-02-21T09:05:41.1569776Z prmt.b32 %r28790, %r28786, 0, 0x7773U; 2026-02-21T09:05:41.1569843Z cvt.u16.u32 %rs1005, %r28790; 2026-02-21T09:05:41.1569909Z ld.shared.b32 %r28791, [%r78]; 2026-02-21T09:05:41.1569976Z prmt.b32 %r28792, %r28791, 0, 0x7770U; 2026-02-21T09:05:41.1570042Z cvt.u16.u32 %rs1006, %r28792; 2026-02-21T09:05:41.1570107Z prmt.b32 %r28793, %r28791, 0, 0x7771U; 2026-02-21T09:05:41.1570169Z cvt.u16.u32 %rs1007, %r28793; 2026-02-21T09:05:41.1570236Z prmt.b32 %r28794, %r28791, 0, 0x7772U; 2026-02-21T09:05:41.1570307Z cvt.u16.u32 %rs1008, %r28794; 2026-02-21T09:05:41.1570373Z prmt.b32 %r28795, %r28791, 0, 0x7773U; 2026-02-21T09:05:41.1570436Z cvt.u16.u32 %rs1009, %r28795; 2026-02-21T09:05:41.1570647Z .loc 1 64 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:64:28 2026-02-21T09:05:41.1570714Z shl.b16 %rs1010, %rs1002, 4; 2026-02-21T09:05:41.1570864Z shl.b16 %rs1011, %rs1006, 4; 2026-02-21T09:05:41.1570930Z shl.b16 %rs1012, %rs1003, 4; 2026-02-21T09:05:41.1570997Z shl.b16 %rs1013, %rs1007, 4; 2026-02-21T09:05:41.1571058Z shl.b16 %rs1014, %rs1004, 4; 2026-02-21T09:05:41.1571119Z shl.b16 %rs1015, %rs1008, 4; 2026-02-21T09:05:41.1571185Z shl.b16 %rs1016, %rs1005, 4; 2026-02-21T09:05:41.1571248Z shl.b16 %rs1017, %rs1009, 4; 2026-02-21T09:05:41.1571447Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1571516Z cvt.s16.s8 %rs1018, %rs1010; 2026-02-21T09:05:41.1571578Z shr.s16 %rs1019, %rs1018, 4; 2026-02-21T09:05:41.1571640Z cvt.s16.s8 %rs1020, %rs1011; 2026-02-21T09:05:41.1571754Z shr.s16 %rs1021, %rs1020, 4; 2026-02-21T09:05:41.1571870Z prmt.b32 %r28796, %r28786, 0, 0x8880U; 2026-02-21T09:05:41.1571935Z cvt.u16.u32 %rs1022, %r28796; 2026-02-21T09:05:41.1571996Z shr.s16 %rs1023, %rs1022, 4; 2026-02-21T09:05:41.1572068Z prmt.b32 %r28797, %r28791, 0, 0x8880U; 
2026-02-21T09:05:41.1572129Z cvt.u16.u32 %rs1024, %r28797; 2026-02-21T09:05:41.1572190Z shr.s16 %rs1025, %rs1024, 4; 2026-02-21T09:05:41.1572394Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1572460Z cvt.rn.f32.s16 %r28798, %rs1025; 2026-02-21T09:05:41.1572523Z cvt.rn.f32.s16 %r28799, %rs1023; 2026-02-21T09:05:41.1572587Z cvt.rn.f32.s16 %r28800, %rs1021; 2026-02-21T09:05:41.1572656Z cvt.rn.f32.s16 %r28801, %rs1019; 2026-02-21T09:05:41.1572850Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1572913Z cvt.s16.s8 %rs1026, %rs1012; 2026-02-21T09:05:41.1572982Z shr.s16 %rs1027, %rs1026, 4; 2026-02-21T09:05:41.1573117Z cvt.s16.s8 %rs1028, %rs1013; 2026-02-21T09:05:41.1573192Z shr.s16 %rs1029, %rs1028, 4; 2026-02-21T09:05:41.1573270Z prmt.b32 %r28802, %r28786, 0, 0x9991U; 2026-02-21T09:05:41.1573335Z cvt.u16.u32 %rs1030, %r28802; 2026-02-21T09:05:41.1573396Z shr.s16 %rs1031, %rs1030, 4; 2026-02-21T09:05:41.1573463Z prmt.b32 %r28803, %r28791, 0, 0x9991U; 2026-02-21T09:05:41.1573531Z cvt.u16.u32 %rs1032, %r28803; 2026-02-21T09:05:41.1573592Z shr.s16 %rs1033, %rs1032, 4; 2026-02-21T09:05:41.1573787Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1573860Z cvt.rn.f32.s16 %r28804, %rs1033; 2026-02-21T09:05:41.1573925Z cvt.rn.f32.s16 %r28805, %rs1031; 2026-02-21T09:05:41.1573989Z cvt.rn.f32.s16 %r28806, %rs1029; 2026-02-21T09:05:41.1574059Z cvt.rn.f32.s16 %r28807, %rs1027; 2026-02-21T09:05:41.1574270Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1574342Z cvt.s16.s8 %rs1034, %rs1014; 2026-02-21T09:05:41.1574404Z shr.s16 %rs1035, %rs1034, 4; 2026-02-21T09:05:41.1574472Z cvt.s16.s8 %rs1036, %rs1015; 2026-02-21T09:05:41.1574536Z shr.s16 %rs1037, %rs1036, 4; 2026-02-21T09:05:41.1574606Z prmt.b32 %r28808, %r28786, 0, 0xaaa2U; 2026-02-21T09:05:41.1574674Z cvt.u16.u32 %rs1038, 
%r28808; 2026-02-21T09:05:41.1574736Z shr.s16 %rs1039, %rs1038, 4; 2026-02-21T09:05:41.1574805Z prmt.b32 %r28809, %r28791, 0, 0xaaa2U; 2026-02-21T09:05:41.1574873Z cvt.u16.u32 %rs1040, %r28809; 2026-02-21T09:05:41.1574939Z shr.s16 %rs1041, %rs1040, 4; 2026-02-21T09:05:41.1575154Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1575223Z cvt.rn.f32.s16 %r28810, %rs1041; 2026-02-21T09:05:41.1575292Z cvt.rn.f32.s16 %r28811, %rs1039; 2026-02-21T09:05:41.1575355Z cvt.rn.f32.s16 %r28812, %rs1037; 2026-02-21T09:05:41.1575418Z cvt.rn.f32.s16 %r28813, %rs1035; 2026-02-21T09:05:41.1575632Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1575699Z cvt.s16.s8 %rs1042, %rs1016; 2026-02-21T09:05:41.1575761Z shr.s16 %rs1043, %rs1042, 4; 2026-02-21T09:05:41.1575880Z cvt.s16.s8 %rs1044, %rs1017; 2026-02-21T09:05:41.1575947Z shr.s16 %rs1045, %rs1044, 4; 2026-02-21T09:05:41.1576015Z prmt.b32 %r28814, %r28786, 0, 0xbbb3U; 2026-02-21T09:05:41.1576078Z cvt.u16.u32 %rs1046, %r28814; 2026-02-21T09:05:41.1576144Z shr.s16 %rs1047, %rs1046, 4; 2026-02-21T09:05:41.1576210Z prmt.b32 %r28815, %r28791, 0, 0xbbb3U; 2026-02-21T09:05:41.1576272Z cvt.u16.u32 %rs1048, %r28815; 2026-02-21T09:05:41.1576344Z shr.s16 %rs1049, %rs1048, 4; 2026-02-21T09:05:41.1576672Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1576742Z cvt.rn.f32.s16 %r28816, %rs1049; 2026-02-21T09:05:41.1576806Z cvt.rn.f32.s16 %r28817, %rs1047; 2026-02-21T09:05:41.1577020Z cvt.rn.f32.s16 %r28818, %rs1045; 2026-02-21T09:05:41.1577089Z cvt.rn.f32.s16 %r28819, %rs1043; 2026-02-21T09:05:41.1577148Z bar.sync 0; 2026-02-21T09:05:41.1577276Z st.shared.v4.b32 [%r79], {%r28801, %r28799, %r28800, %r28798}; 2026-02-21T09:05:41.1577393Z st.shared.v4.b32 [%r80], {%r28807, %r28805, %r28806, %r28804}; 2026-02-21T09:05:41.1577503Z st.shared.v4.b32 [%r81], {%r28813, %r28811, %r28812, %r28810}; 
2026-02-21T09:05:41.1577616Z st.shared.v4.b32 [%r82], {%r28819, %r28817, %r28818, %r28816}; 2026-02-21T09:05:41.1577677Z $L__tmp25: 2026-02-21T09:05:41.1577956Z .loc 2 291 36 // standard.py:291:36 @[ c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:91:40 ] 2026-02-21T09:05:41.1578020Z // begin inline asm 2026-02-21T09:05:41.1578108Z fence.proxy.async.shared::cta; 2026-02-21T09:05:41.1578168Z // end inline asm 2026-02-21T09:05:41.1578225Z bar.sync 0; 2026-02-21T09:05:41.1578313Z shfl.sync.idx.b32 %r28820, %r4, 0, 31, -1; 2026-02-21T09:05:41.1578390Z wgmma.fence.sync.aligned; 2026-02-21T09:05:41.1578522Z mov.pred %p130, -1; 2026-02-21T09:05:41.1578588Z // begin inline asm 2026-02-21T09:05:41.1580073Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29867,%r29868,%r29869,%r29870,%r29871,%r29872,%r29873,%r29874,%r29875,%r29876,%r29877,%r29878,%r29879,%r29880,%r29881,%r29882,%r29883,%r29884,%r29885,%r29886,%r29887,%r29888,%r29889,%r29890,%r29891,%r29892,%r29893,%r29894,%r29895,%r29896,%r29897,%r29898,%r29899,%r29900,%r29901,%r29902,%r29903,%r29904,%r29905,%r29906,%r29907,%r29908,%r29909,%r29910,%r29911,%r29912,%r29913,%r29914,%r29915,%r29916,%r29917,%r29918,%r29919,%r29920,%r29921,%r29922,%r29923,%r29924,%r29925,%r29926,%r29927,%r29928,%r29929,%r29930}, {%r22558,%r22559,%r22560,%r22561}, %rd5, %p130, 1, 1; 2026-02-21T09:05:41.1580136Z // end inline asm 2026-02-21T09:05:41.1580202Z // begin inline asm 2026-02-21T09:05:41.1581674Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29867,%r29868,%r29869,%r29870,%r29871,%r29872,%r29873,%r29874,%r29875,%r29876,%r29877,%r29878,%r29879,%r29880,%r29881,%r29882,%r29883,%r29884,%r29885,%r29886,%r29887,%r29888,%r29889,%r29890,%r29891,%r29892,%r29893,%r29894,%r29895,%r29896,%r29897,%r29898,%r29899,%r29900,%r29901,%r29902,%r29903,%r29904,%r29905,%r29906,%r29907,%r29908,%r29909,%r29910,%r29911,%r29912,%r29913,%r29914,%r29915,%r29916,%r29917,%r29918,%r29919,%r29920,%r29921,%r29922,%r29923,%r29924,%r29925,%r29926,%r29927,%r29928,%r29929,%r29930}, {%r22690,%r22691,%r22692,%r22693}, %rd6, %p130, 1, 1; 2026-02-21T09:05:41.1581748Z // end inline asm 2026-02-21T09:05:41.1581810Z // begin inline asm 2026-02-21T09:05:41.1583280Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29931,%r29932,%r29933,%r29934,%r29935,%r29936,%r29937,%r29938,%r29939,%r29940,%r29941,%r29942,%r29943,%r29944,%r29945,%r29946,%r29947,%r29948,%r29949,%r29950,%r29951,%r29952,%r29953,%r29954,%r29955,%r29956,%r29957,%r29958,%r29959,%r29960,%r29961,%r29962,%r29963,%r29964,%r29965,%r29966,%r29967,%r29968,%r29969,%r29970,%r29971,%r29972,%r29973,%r29974,%r29975,%r29976,%r29977,%r29978,%r29979,%r29980,%r29981,%r29982,%r29983,%r29984,%r29985,%r29986,%r29987,%r29988,%r29989,%r29990,%r29991,%r29992,%r29993,%r29994}, {%r22822,%r22823,%r22824,%r22825}, %rd5, %p130, 1, 1; 2026-02-21T09:05:41.1583408Z // end inline asm 2026-02-21T09:05:41.1583467Z // begin inline asm 2026-02-21T09:05:41.1584927Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29931,%r29932,%r29933,%r29934,%r29935,%r29936,%r29937,%r29938,%r29939,%r29940,%r29941,%r29942,%r29943,%r29944,%r29945,%r29946,%r29947,%r29948,%r29949,%r29950,%r29951,%r29952,%r29953,%r29954,%r29955,%r29956,%r29957,%r29958,%r29959,%r29960,%r29961,%r29962,%r29963,%r29964,%r29965,%r29966,%r29967,%r29968,%r29969,%r29970,%r29971,%r29972,%r29973,%r29974,%r29975,%r29976,%r29977,%r29978,%r29979,%r29980,%r29981,%r29982,%r29983,%r29984,%r29985,%r29986,%r29987,%r29988,%r29989,%r29990,%r29991,%r29992,%r29993,%r29994}, {%r22954,%r22955,%r22956,%r22957}, %rd6, %p130, 1, 1; 2026-02-21T09:05:41.1585031Z // end inline asm 2026-02-21T09:05:41.1585139Z // begin inline asm 2026-02-21T09:05:41.1586723Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29995,%r29996,%r29997,%r29998,%r29999,%r30000,%r30001,%r30002,%r30003,%r30004,%r30005,%r30006,%r30007,%r30008,%r30009,%r30010,%r30011,%r30012,%r30013,%r30014,%r30015,%r30016,%r30017,%r30018,%r30019,%r30020,%r30021,%r30022,%r30023,%r30024,%r30025,%r30026,%r30027,%r30028,%r30029,%r30030,%r30031,%r30032,%r30033,%r30034,%r30035,%r30036,%r30037,%r30038,%r30039,%r30040,%r30041,%r30042,%r30043,%r30044,%r30045,%r30046,%r30047,%r30048,%r30049,%r30050,%r30051,%r30052,%r30053,%r30054,%r30055,%r30056,%r30057,%r30058}, {%r23086,%r23087,%r23088,%r23089}, %rd5, %p130, 1, 1; 2026-02-21T09:05:41.1586789Z // end inline asm 2026-02-21T09:05:41.1586853Z // begin inline asm 2026-02-21T09:05:41.1588465Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29995,%r29996,%r29997,%r29998,%r29999,%r30000,%r30001,%r30002,%r30003,%r30004,%r30005,%r30006,%r30007,%r30008,%r30009,%r30010,%r30011,%r30012,%r30013,%r30014,%r30015,%r30016,%r30017,%r30018,%r30019,%r30020,%r30021,%r30022,%r30023,%r30024,%r30025,%r30026,%r30027,%r30028,%r30029,%r30030,%r30031,%r30032,%r30033,%r30034,%r30035,%r30036,%r30037,%r30038,%r30039,%r30040,%r30041,%r30042,%r30043,%r30044,%r30045,%r30046,%r30047,%r30048,%r30049,%r30050,%r30051,%r30052,%r30053,%r30054,%r30055,%r30056,%r30057,%r30058}, {%r23218,%r23219,%r23220,%r23221}, %rd6, %p130, 1, 1; 2026-02-21T09:05:41.1588534Z // end inline asm 2026-02-21T09:05:41.1588598Z // begin inline asm 2026-02-21T09:05:41.1590058Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r30059,%r30060,%r30061,%r30062,%r30063,%r30064,%r30065,%r30066,%r30067,%r30068,%r30069,%r30070,%r30071,%r30072,%r30073,%r30074,%r30075,%r30076,%r30077,%r30078,%r30079,%r30080,%r30081,%r30082,%r30083,%r30084,%r30085,%r30086,%r30087,%r30088,%r30089,%r30090,%r30091,%r30092,%r30093,%r30094,%r30095,%r30096,%r30097,%r30098,%r30099,%r30100,%r30101,%r30102,%r30103,%r30104,%r30105,%r30106,%r30107,%r30108,%r30109,%r30110,%r30111,%r30112,%r30113,%r30114,%r30115,%r30116,%r30117,%r30118,%r30119,%r30120,%r30121,%r30122}, {%r23350,%r23351,%r23352,%r23353}, %rd5, %p130, 1, 1; 2026-02-21T09:05:41.1590122Z // end inline asm 2026-02-21T09:05:41.1590181Z // begin inline asm 2026-02-21T09:05:41.1591633Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r30059,%r30060,%r30061,%r30062,%r30063,%r30064,%r30065,%r30066,%r30067,%r30068,%r30069,%r30070,%r30071,%r30072,%r30073,%r30074,%r30075,%r30076,%r30077,%r30078,%r30079,%r30080,%r30081,%r30082,%r30083,%r30084,%r30085,%r30086,%r30087,%r30088,%r30089,%r30090,%r30091,%r30092,%r30093,%r30094,%r30095,%r30096,%r30097,%r30098,%r30099,%r30100,%r30101,%r30102,%r30103,%r30104,%r30105,%r30106,%r30107,%r30108,%r30109,%r30110,%r30111,%r30112,%r30113,%r30114,%r30115,%r30116,%r30117,%r30118,%r30119,%r30120,%r30121,%r30122}, {%r23482,%r23483,%r23484,%r23485}, %rd6, %p130, 1, 1; 2026-02-21T09:05:41.1591698Z // end inline asm 2026-02-21T09:05:41.1591779Z wgmma.commit_group.sync.aligned; 2026-02-21T09:05:41.1591843Z mov.b32 %r28519, 0; 2026-02-21T09:05:41.1591912Z mov.b32 %r23743, %r28519; 2026-02-21T09:05:41.1591972Z mov.b32 %r23744, %r28519; 2026-02-21T09:05:41.1592030Z mov.b32 %r23742, %r24001; 2026-02-21T09:05:41.1592157Z // begin inline asm 2026-02-21T09:05:41.1597286Z // wait for regs: %r29867,%r29868,%r29869,%r29870,%r29871,%r29872,%r29873,%r29874,%r29875,%r29876,%r29877,%r29878,%r29879,%r29880,%r29881,%r29882,%r29883,%r29884,%r29885,%r29886,%r29887,%r29888,%r29889,%r29890,%r29891,%r29892,%r29893,%r29894,%r29895,%r29896,%r29897,%r29898,%r29899,%r29900,%r29901,%r29902,%r29903,%r29904,%r29905,%r29906,%r29907,%r29908,%r29909,%r29910,%r29911,%r29912,%r29913,%r29914,%r29915,%r29916,%r29917,%r29918,%r29919,%r29920,%r29921,%r29922,%r29923,%r29924,%r29925,%r29926,%r29927,%r29928,%r29929,%r29930,%r29931,%r29932,%r29933,%r29934,%r29935,%r29936,%r29937,%r29938,%r29939,%r29940,%r29941,%r29942,%r29943,%r29944,%r29945,%r29946,%r29947,%r29948,%r29949,%r29950,%r29951,%r29952,%r29953,%r29954,%r29955,%r29956,%r29957,%r29958,%r29959,%r29960,%r29961,%r29962,%r29963,%r29964,%r29965,%r29966,%r29967,%r29968,%r29969,%r29970,%r29971,%r29972,%r29973,%r29974,%r29975,%r29976,%r29977,%r29978,%r29979,%r29980,%r29981,%r29982,%r29983,%r29984,%r29985,%r29986,%r29987,%r29988,%r29989,%r29990,%r29991,%r29992,%r29
993,%r29994,%r29995,%r29996,%r29997,%r29998,%r29999,%r30000,%r30001,%r30002,%r30003,%r30004,%r30005,%r30006,%r30007,%r30008,%r30009,%r30010,%r30011,%r30012,%r30013,%r30014,%r30015,%r30016,%r30017,%r30018,%r30019,%r30020,%r30021,%r30022,%r30023,%r30024,%r30025,%r30026,%r30027,%r30028,%r30029,%r30030,%r30031,%r30032,%r30033,%r30034,%r30035,%r30036,%r30037,%r30038,%r30039,%r30040,%r30041,%r30042,%r30043,%r30044,%r30045,%r30046,%r30047,%r30048,%r30049,%r30050,%r30051,%r30052,%r30053,%r30054,%r30055,%r30056,%r30057,%r30058,%r30059,%r30060,%r30061,%r30062,%r30063,%r30064,%r30065,%r30066,%r30067,%r30068,%r30069,%r30070,%r30071,%r30072,%r30073,%r30074,%r30075,%r30076,%r30077,%r30078,%r30079,%r30080,%r30081,%r30082,%r30083,%r30084,%r30085,%r30086,%r30087,%r30088,%r30089,%r30090,%r30091,%r30092,%r30093,%r30094,%r30095,%r30096,%r30097,%r30098,%r30099,%r30100,%r30101,%r30102,%r30103,%r30104,%r30105,%r30106,%r30107,%r30108,%r30109,%r30110,%r30111,%r30112,%r30113,%r30114,%r30115,%r30116,%r30117,%r30118,%r30119,%r30120,%r30121,%r30122,%r23742,%r23743,%r23744 2026-02-21T09:05:41.1597445Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:05:41.1597504Z // end inline asm 2026-02-21T09:05:41.1597565Z $L__tmp26: 2026-02-21T09:05:41.1597783Z .loc 1 55 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:32 2026-02-21T09:05:41.1597849Z add.s64 %rd259, %rd246, 32; 2026-02-21T09:05:41.1597918Z add.s64 %rd260, %rd247, 32; 2026-02-21T09:05:41.1597978Z add.s64 %rd261, %rd248, 32; 2026-02-21T09:05:41.1598180Z .loc 1 55 80 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:80 2026-02-21T09:05:41.1598243Z add.s64 %rd262, %rd249, 32; 2026-02-21T09:05:41.1598310Z // begin inline asm 2026-02-21T09:05:41.1598373Z mov.u32 %r24004, 0x0; 2026-02-21T09:05:41.1598432Z mov.u32 %r24005, 0x0; 2026-02-21T09:05:41.1598494Z mov.u32 %r24006, 0x0; 2026-02-21T09:05:41.1598550Z mov.u32 %r24007, 0x0; 2026-02-21T09:05:41.1598703Z ld.global.v4.b32 { %r24004, %r24005, %r24006, %r24007 }, [ %rd259 
+ 0 ]; 2026-02-21T09:05:41.1598762Z // end inline asm 2026-02-21T09:05:41.1598828Z // begin inline asm 2026-02-21T09:05:41.1598888Z mov.u32 %r24008, 0x0; 2026-02-21T09:05:41.1598947Z mov.u32 %r24009, 0x0; 2026-02-21T09:05:41.1599007Z mov.u32 %r24010, 0x0; 2026-02-21T09:05:41.1599064Z mov.u32 %r24011, 0x0; 2026-02-21T09:05:41.1599192Z ld.global.v4.b32 { %r24008, %r24009, %r24010, %r24011 }, [ %rd260 + 0 ]; 2026-02-21T09:05:41.1599248Z // end inline asm 2026-02-21T09:05:41.1599312Z // begin inline asm 2026-02-21T09:05:41.1599371Z mov.u32 %r24012, 0x0; 2026-02-21T09:05:41.1599428Z mov.u32 %r24013, 0x0; 2026-02-21T09:05:41.1599490Z mov.u32 %r24014, 0x0; 2026-02-21T09:05:41.1599553Z mov.u32 %r24015, 0x0; 2026-02-21T09:05:41.1599679Z ld.global.v4.b32 { %r24012, %r24013, %r24014, %r24015 }, [ %rd261 + 0 ]; 2026-02-21T09:05:41.1599744Z // end inline asm 2026-02-21T09:05:41.1599804Z // begin inline asm 2026-02-21T09:05:41.1599990Z mov.u32 %r24016, 0x0; 2026-02-21T09:05:41.1600047Z mov.u32 %r24017, 0x0; 2026-02-21T09:05:41.1600109Z mov.u32 %r24018, 0x0; 2026-02-21T09:05:41.1600167Z mov.u32 %r24019, 0x0; 2026-02-21T09:05:41.1600292Z ld.global.v4.b32 { %r24016, %r24017, %r24018, %r24019 }, [ %rd262 + 0 ]; 2026-02-21T09:05:41.1600355Z // end inline asm 2026-02-21T09:05:41.1600559Z .loc 1 59 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:59:32 2026-02-21T09:05:41.1600615Z bar.sync 0; 2026-02-21T09:05:41.1600711Z st.shared.v2.b32 [%r65], {%r24004, %r24005}; 2026-02-21T09:05:41.1600813Z st.shared.v2.b32 [%r65+2048], {%r24008, %r24009}; 2026-02-21T09:05:41.1600902Z st.shared.v2.b32 [%r65+4096], {%r24012, %r24013}; 2026-02-21T09:05:41.1601109Z st.shared.v2.b32 [%r65+6144], {%r24016, %r24017}; 2026-02-21T09:05:41.1601198Z st.shared.v2.b32 [%r66], {%r24006, %r24007}; 2026-02-21T09:05:41.1601283Z st.shared.v2.b32 [%r66+2048], {%r24010, %r24011}; 2026-02-21T09:05:41.1601372Z st.shared.v2.b32 [%r66+4096], {%r24014, %r24015}; 2026-02-21T09:05:41.1601465Z st.shared.v2.b32 
[%r66+6144], {%r24018, %r24019}; 2026-02-21T09:05:41.1601524Z bar.sync 0; 2026-02-21T09:05:41.1601594Z ld.shared.b16 %rs1050, [%r67]; 2026-02-21T09:05:41.1601667Z ld.shared.b16 %rs1051, [%r67+256]; 2026-02-21T09:05:41.1601743Z ld.shared.b16 %rs1052, [%r67+16]; 2026-02-21T09:05:41.1601809Z ld.shared.b16 %rs1053, [%r67+272]; 2026-02-21T09:05:41.1601880Z ld.shared.b16 %rs1054, [%r67+2048]; 2026-02-21T09:05:41.1601949Z ld.shared.b16 %rs1055, [%r67+2304]; 2026-02-21T09:05:41.1602015Z ld.shared.b16 %rs1056, [%r67+2064]; 2026-02-21T09:05:41.1602081Z ld.shared.b16 %rs1057, [%r67+2320]; 2026-02-21T09:05:41.1602145Z ld.shared.b16 %rs1058, [%r67+4096]; 2026-02-21T09:05:41.1602234Z ld.shared.b16 %rs1059, [%r67+4352]; 2026-02-21T09:05:41.1602377Z ld.shared.b16 %rs1060, [%r67+4112]; 2026-02-21T09:05:41.1602450Z ld.shared.b16 %rs1061, [%r67+4368]; 2026-02-21T09:05:41.1602524Z ld.shared.b16 %rs1062, [%r67+6144]; 2026-02-21T09:05:41.1602591Z ld.shared.b16 %rs1063, [%r67+6400]; 2026-02-21T09:05:41.1602658Z ld.shared.b16 %rs1064, [%r67+6160]; 2026-02-21T09:05:41.1602727Z ld.shared.b16 %rs1065, [%r67+6416]; 2026-02-21T09:05:41.1602794Z ld.shared.b16 %rs1066, [%r68]; 2026-02-21T09:05:41.1602859Z ld.shared.b16 %rs1067, [%r68+256]; 2026-02-21T09:05:41.1602926Z ld.shared.b16 %rs1068, [%r68+16]; 2026-02-21T09:05:41.1602997Z ld.shared.b16 %rs1069, [%r68+272]; 2026-02-21T09:05:41.1603063Z ld.shared.b16 %rs1070, [%r68+2048]; 2026-02-21T09:05:41.1603131Z ld.shared.b16 %rs1071, [%r68+2304]; 2026-02-21T09:05:41.1603200Z ld.shared.b16 %rs1072, [%r68+2064]; 2026-02-21T09:05:41.1603267Z ld.shared.b16 %rs1073, [%r68+2320]; 2026-02-21T09:05:41.1603336Z ld.shared.b16 %rs1074, [%r68+4096]; 2026-02-21T09:05:41.1603404Z ld.shared.b16 %rs1075, [%r68+4352]; 2026-02-21T09:05:41.1603475Z ld.shared.b16 %rs1076, [%r68+4112]; 2026-02-21T09:05:41.1603544Z ld.shared.b16 %rs1077, [%r68+4368]; 2026-02-21T09:05:41.1603612Z ld.shared.b16 %rs1078, [%r68+6144]; 2026-02-21T09:05:41.1603682Z ld.shared.b16 %rs1079, 
[%r68+6400]; 2026-02-21T09:05:41.1603749Z ld.shared.b16 %rs1080, [%r68+6160]; 2026-02-21T09:05:41.1603819Z ld.shared.b16 %rs1081, [%r68+6416]; 2026-02-21T09:05:41.1603890Z cvt.f32.bf16 %r24150, %rs1050; 2026-02-21T09:05:41.1603953Z cvt.f32.bf16 %r24151, %rs1051; 2026-02-21T09:05:41.1604014Z cvt.f32.bf16 %r24152, %rs1066; 2026-02-21T09:05:41.1604076Z cvt.f32.bf16 %r24153, %rs1067; 2026-02-21T09:05:41.1604145Z cvt.f32.bf16 %r24282, %rs1052; 2026-02-21T09:05:41.1604208Z cvt.f32.bf16 %r24283, %rs1053; 2026-02-21T09:05:41.1604271Z cvt.f32.bf16 %r24284, %rs1068; 2026-02-21T09:05:41.1604342Z cvt.f32.bf16 %r24285, %rs1069; 2026-02-21T09:05:41.1604412Z cvt.f32.bf16 %r24414, %rs1054; 2026-02-21T09:05:41.1604477Z cvt.f32.bf16 %r24415, %rs1055; 2026-02-21T09:05:41.1604539Z cvt.f32.bf16 %r24416, %rs1070; 2026-02-21T09:05:41.1604604Z cvt.f32.bf16 %r24417, %rs1071; 2026-02-21T09:05:41.1604726Z cvt.f32.bf16 %r24546, %rs1056; 2026-02-21T09:05:41.1604788Z cvt.f32.bf16 %r24547, %rs1057; 2026-02-21T09:05:41.1604858Z cvt.f32.bf16 %r24548, %rs1072; 2026-02-21T09:05:41.1604919Z cvt.f32.bf16 %r24549, %rs1073; 2026-02-21T09:05:41.1604979Z cvt.f32.bf16 %r24678, %rs1058; 2026-02-21T09:05:41.1605040Z cvt.f32.bf16 %r24679, %rs1059; 2026-02-21T09:05:41.1605105Z cvt.f32.bf16 %r24680, %rs1074; 2026-02-21T09:05:41.1605166Z cvt.f32.bf16 %r24681, %rs1075; 2026-02-21T09:05:41.1605227Z cvt.f32.bf16 %r24810, %rs1060; 2026-02-21T09:05:41.1605294Z cvt.f32.bf16 %r24811, %rs1061; 2026-02-21T09:05:41.1605354Z cvt.f32.bf16 %r24812, %rs1076; 2026-02-21T09:05:41.1605415Z cvt.f32.bf16 %r24813, %rs1077; 2026-02-21T09:05:41.1605480Z cvt.f32.bf16 %r24942, %rs1062; 2026-02-21T09:05:41.1605635Z cvt.f32.bf16 %r24943, %rs1063; 2026-02-21T09:05:41.1605701Z cvt.f32.bf16 %r24944, %rs1078; 2026-02-21T09:05:41.1605763Z cvt.f32.bf16 %r24945, %rs1079; 2026-02-21T09:05:41.1605830Z cvt.f32.bf16 %r25074, %rs1064; 2026-02-21T09:05:41.1605892Z cvt.f32.bf16 %r25075, %rs1065; 2026-02-21T09:05:41.1605953Z cvt.f32.bf16 %r25076, 
%rs1080; 2026-02-21T09:05:41.1606019Z cvt.f32.bf16 %r25077, %rs1081; 2026-02-21T09:05:41.1606226Z .loc 1 61 87 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:87 2026-02-21T09:05:41.1606289Z add.s64 %rd263, %rd305, 65536; 2026-02-21T09:05:41.1606348Z // begin inline asm 2026-02-21T09:05:41.1606412Z mov.u32 %r24020, 0x0; 2026-02-21T09:05:41.1606594Z mov.u32 %r24021, 0x0; 2026-02-21T09:05:41.1606701Z ld.global.v2.b32 { %r24020, %r24021 }, [ %rd263 + 0 ]; 2026-02-21T09:05:41.1606765Z // end inline asm 2026-02-21T09:05:41.1606966Z .loc 1 69 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:69:28 2026-02-21T09:05:41.1607100Z bar.sync 0; 2026-02-21T09:05:41.1607174Z st.shared.b8 [%r69], %r24020; 2026-02-21T09:05:41.1607258Z prmt.b32 %r28821, %r24020, 0, 0x7771U; 2026-02-21T09:05:41.1607325Z st.shared.b8 [%r70], %r28821; 2026-02-21T09:05:41.1607395Z prmt.b32 %r28822, %r24020, 0, 0x7772U; 2026-02-21T09:05:41.1607465Z st.shared.b8 [%r71+256], %r28822; 2026-02-21T09:05:41.1607533Z prmt.b32 %r28823, %r24020, 0, 0x7773U; 2026-02-21T09:05:41.1607599Z st.shared.b8 [%r72+256], %r28823; 2026-02-21T09:05:41.1607668Z st.shared.b8 [%r73+512], %r24021; 2026-02-21T09:05:41.1607736Z prmt.b32 %r28824, %r24021, 0, 0x7771U; 2026-02-21T09:05:41.1607801Z st.shared.b8 [%r74+512], %r28824; 2026-02-21T09:05:41.1607868Z prmt.b32 %r28825, %r24021, 0, 0x7772U; 2026-02-21T09:05:41.1607935Z st.shared.b8 [%r75+768], %r28825; 2026-02-21T09:05:41.1608002Z prmt.b32 %r28826, %r24021, 0, 0x7773U; 2026-02-21T09:05:41.1608066Z st.shared.b8 [%r76+768], %r28826; 2026-02-21T09:05:41.1608130Z bar.sync 0; 2026-02-21T09:05:41.1608201Z ld.shared.b32 %r28827, [%r77]; 2026-02-21T09:05:41.1608266Z prmt.b32 %r28828, %r28827, 0, 0x7770U; 2026-02-21T09:05:41.1608329Z cvt.u16.u32 %rs1082, %r28828; 2026-02-21T09:05:41.1608406Z prmt.b32 %r28829, %r28827, 0, 0x7771U; 2026-02-21T09:05:41.1608472Z cvt.u16.u32 %rs1083, %r28829; 2026-02-21T09:05:41.1608537Z prmt.b32 %r28830, %r28827, 0, 0x7772U; 
2026-02-21T09:05:41.1608603Z cvt.u16.u32 %rs1084, %r28830; 2026-02-21T09:05:41.1608667Z prmt.b32 %r28831, %r28827, 0, 0x7773U; 2026-02-21T09:05:41.1608732Z cvt.u16.u32 %rs1085, %r28831; 2026-02-21T09:05:41.1608804Z ld.shared.b32 %r28832, [%r78]; 2026-02-21T09:05:41.1608869Z prmt.b32 %r28833, %r28832, 0, 0x7770U; 2026-02-21T09:05:41.1608931Z cvt.u16.u32 %rs1086, %r28833; 2026-02-21T09:05:41.1608996Z prmt.b32 %r28834, %r28832, 0, 0x7771U; 2026-02-21T09:05:41.1609064Z cvt.u16.u32 %rs1087, %r28834; 2026-02-21T09:05:41.1609127Z prmt.b32 %r28835, %r28832, 0, 0x7772U; 2026-02-21T09:05:41.1609192Z cvt.u16.u32 %rs1088, %r28835; 2026-02-21T09:05:41.1609266Z prmt.b32 %r28836, %r28832, 0, 0x7773U; 2026-02-21T09:05:41.1609327Z cvt.u16.u32 %rs1089, %r28836; 2026-02-21T09:05:41.1609526Z .loc 1 64 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:64:28 2026-02-21T09:05:41.1609672Z shl.b16 %rs1090, %rs1082, 4; 2026-02-21T09:05:41.1609740Z shl.b16 %rs1091, %rs1086, 4; 2026-02-21T09:05:41.1609800Z shl.b16 %rs1092, %rs1083, 4; 2026-02-21T09:05:41.1609861Z shl.b16 %rs1093, %rs1087, 4; 2026-02-21T09:05:41.1609927Z shl.b16 %rs1094, %rs1084, 4; 2026-02-21T09:05:41.1609989Z shl.b16 %rs1095, %rs1088, 4; 2026-02-21T09:05:41.1610049Z shl.b16 %rs1096, %rs1085, 4; 2026-02-21T09:05:41.1610114Z shl.b16 %rs1097, %rs1089, 4; 2026-02-21T09:05:41.1610313Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1610386Z cvt.s16.s8 %rs1098, %rs1090; 2026-02-21T09:05:41.1610558Z shr.s16 %rs1099, %rs1098, 4; 2026-02-21T09:05:41.1610685Z cvt.s16.s8 %rs1100, %rs1091; 2026-02-21T09:05:41.1610753Z shr.s16 %rs1101, %rs1100, 4; 2026-02-21T09:05:41.1610822Z prmt.b32 %r28837, %r28827, 0, 0x8880U; 2026-02-21T09:05:41.1610895Z cvt.u16.u32 %rs1102, %r28837; 2026-02-21T09:05:41.1610969Z shr.s16 %rs1103, %rs1102, 4; 2026-02-21T09:05:41.1611036Z prmt.b32 %r28838, %r28832, 0, 0x8880U; 2026-02-21T09:05:41.1611100Z cvt.u16.u32 %rs1104, %r28838; 
2026-02-21T09:05:41.1611166Z shr.s16 %rs1105, %rs1104, 4; 2026-02-21T09:05:41.1611362Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1611431Z cvt.rn.f32.s16 %r28839, %rs1105; 2026-02-21T09:05:41.1611504Z cvt.rn.f32.s16 %r28840, %rs1103; 2026-02-21T09:05:41.1611567Z cvt.rn.f32.s16 %r28841, %rs1101; 2026-02-21T09:05:41.1611630Z cvt.rn.f32.s16 %r28842, %rs1099; 2026-02-21T09:05:41.1611832Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1611950Z cvt.s16.s8 %rs1106, %rs1092; 2026-02-21T09:05:41.1612016Z shr.s16 %rs1107, %rs1106, 4; 2026-02-21T09:05:41.1612080Z cvt.s16.s8 %rs1108, %rs1093; 2026-02-21T09:05:41.1612147Z shr.s16 %rs1109, %rs1108, 4; 2026-02-21T09:05:41.1612217Z prmt.b32 %r28843, %r28827, 0, 0x9991U; 2026-02-21T09:05:41.1612279Z cvt.u16.u32 %rs1110, %r28843; 2026-02-21T09:05:41.1612346Z shr.s16 %rs1111, %rs1110, 4; 2026-02-21T09:05:41.1612411Z prmt.b32 %r28844, %r28832, 0, 0x9991U; 2026-02-21T09:05:41.1612474Z cvt.u16.u32 %rs1112, %r28844; 2026-02-21T09:05:41.1612540Z shr.s16 %rs1113, %rs1112, 4; 2026-02-21T09:05:41.1612737Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1612802Z cvt.rn.f32.s16 %r28845, %rs1113; 2026-02-21T09:05:41.1612864Z cvt.rn.f32.s16 %r28846, %rs1111; 2026-02-21T09:05:41.1612931Z cvt.rn.f32.s16 %r28847, %rs1109; 2026-02-21T09:05:41.1612994Z cvt.rn.f32.s16 %r28848, %rs1107; 2026-02-21T09:05:41.1613206Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1613276Z cvt.s16.s8 %rs1114, %rs1094; 2026-02-21T09:05:41.1613339Z shr.s16 %rs1115, %rs1114, 4; 2026-02-21T09:05:41.1613402Z cvt.s16.s8 %rs1116, %rs1095; 2026-02-21T09:05:41.1613462Z shr.s16 %rs1117, %rs1116, 4; 2026-02-21T09:05:41.1613534Z prmt.b32 %r28849, %r28827, 0, 0xaaa2U; 2026-02-21T09:05:41.1613594Z cvt.u16.u32 %rs1118, %r28849; 2026-02-21T09:05:41.1613655Z shr.s16 %rs1119, %rs1118, 
4; 2026-02-21T09:05:41.1613726Z prmt.b32 %r28850, %r28832, 0, 0xaaa2U; 2026-02-21T09:05:41.1613790Z cvt.u16.u32 %rs1120, %r28850; 2026-02-21T09:05:41.1613850Z shr.s16 %rs1121, %rs1120, 4; 2026-02-21T09:05:41.1614048Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1614112Z cvt.rn.f32.s16 %r28851, %rs1121; 2026-02-21T09:05:41.1614176Z cvt.rn.f32.s16 %r28852, %rs1119; 2026-02-21T09:05:41.1614242Z cvt.rn.f32.s16 %r28853, %rs1117; 2026-02-21T09:05:41.1614310Z cvt.rn.f32.s16 %r28854, %rs1115; 2026-02-21T09:05:41.1614506Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1614626Z cvt.s16.s8 %rs1122, %rs1096; 2026-02-21T09:05:41.1614692Z shr.s16 %rs1123, %rs1122, 4; 2026-02-21T09:05:41.1614753Z cvt.s16.s8 %rs1124, %rs1097; 2026-02-21T09:05:41.1614813Z shr.s16 %rs1125, %rs1124, 4; 2026-02-21T09:05:41.1614883Z prmt.b32 %r28855, %r28827, 0, 0xbbb3U; 2026-02-21T09:05:41.1614945Z cvt.u16.u32 %rs1126, %r28855; 2026-02-21T09:05:41.1615006Z shr.s16 %rs1127, %rs1126, 4; 2026-02-21T09:05:41.1615085Z prmt.b32 %r28856, %r28832, 0, 0xbbb3U; 2026-02-21T09:05:41.1615153Z cvt.u16.u32 %rs1128, %r28856; 2026-02-21T09:05:41.1615216Z shr.s16 %rs1129, %rs1128, 4; 2026-02-21T09:05:41.1615412Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1615594Z cvt.rn.f32.s16 %r28857, %rs1129; 2026-02-21T09:05:41.1615659Z cvt.rn.f32.s16 %r28858, %rs1127; 2026-02-21T09:05:41.1615722Z cvt.rn.f32.s16 %r28859, %rs1125; 2026-02-21T09:05:41.1615785Z cvt.rn.f32.s16 %r28860, %rs1123; 2026-02-21T09:05:41.1615851Z bar.sync 0; 2026-02-21T09:05:41.1615970Z st.shared.v4.b32 [%r79], {%r28842, %r28840, %r28841, %r28839}; 2026-02-21T09:05:41.1616083Z st.shared.v4.b32 [%r80], {%r28848, %r28846, %r28847, %r28845}; 2026-02-21T09:05:41.1616198Z st.shared.v4.b32 [%r81], {%r28854, %r28852, %r28853, %r28851}; 2026-02-21T09:05:41.1616305Z st.shared.v4.b32 [%r82], {%r28860, %r28858, 
%r28859, %r28857}; 2026-02-21T09:05:41.1616360Z $L__tmp27: 2026-02-21T09:05:41.1616756Z .loc 2 291 36 // standard.py:291:36 @[ c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:91:40 ] 2026-02-21T09:05:41.1616821Z // begin inline asm 2026-02-21T09:05:41.1616900Z fence.proxy.async.shared::cta; 2026-02-21T09:05:41.1616969Z // end inline asm 2026-02-21T09:05:41.1617034Z bar.sync 0; 2026-02-21T09:05:41.1617183Z wgmma.fence.sync.aligned; 2026-02-21T09:05:41.1617247Z // begin inline asm 2026-02-21T09:05:41.1618722Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29867,%r29868,%r29869,%r29870,%r29871,%r29872,%r29873,%r29874,%r29875,%r29876,%r29877,%r29878,%r29879,%r29880,%r29881,%r29882,%r29883,%r29884,%r29885,%r29886,%r29887,%r29888,%r29889,%r29890,%r29891,%r29892,%r29893,%r29894,%r29895,%r29896,%r29897,%r29898,%r29899,%r29900,%r29901,%r29902,%r29903,%r29904,%r29905,%r29906,%r29907,%r29908,%r29909,%r29910,%r29911,%r29912,%r29913,%r29914,%r29915,%r29916,%r29917,%r29918,%r29919,%r29920,%r29921,%r29922,%r29923,%r29924,%r29925,%r29926,%r29927,%r29928,%r29929,%r29930}, {%r24150,%r24151,%r24152,%r24153}, %rd5, %p130, 1, 1; 2026-02-21T09:05:41.1618782Z // end inline asm 2026-02-21T09:05:41.1618842Z // begin inline asm 2026-02-21T09:05:41.1620306Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29867,%r29868,%r29869,%r29870,%r29871,%r29872,%r29873,%r29874,%r29875,%r29876,%r29877,%r29878,%r29879,%r29880,%r29881,%r29882,%r29883,%r29884,%r29885,%r29886,%r29887,%r29888,%r29889,%r29890,%r29891,%r29892,%r29893,%r29894,%r29895,%r29896,%r29897,%r29898,%r29899,%r29900,%r29901,%r29902,%r29903,%r29904,%r29905,%r29906,%r29907,%r29908,%r29909,%r29910,%r29911,%r29912,%r29913,%r29914,%r29915,%r29916,%r29917,%r29918,%r29919,%r29920,%r29921,%r29922,%r29923,%r29924,%r29925,%r29926,%r29927,%r29928,%r29929,%r29930}, {%r24282,%r24283,%r24284,%r24285}, %rd6, %p130, 1, 1; 2026-02-21T09:05:41.1620368Z // end inline asm 2026-02-21T09:05:41.1620433Z // begin inline asm 
2026-02-21T09:05:41.1621891Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29931,%r29932,%r29933,%r29934,%r29935,%r29936,%r29937,%r29938,%r29939,%r29940,%r29941,%r29942,%r29943,%r29944,%r29945,%r29946,%r29947,%r29948,%r29949,%r29950,%r29951,%r29952,%r29953,%r29954,%r29955,%r29956,%r29957,%r29958,%r29959,%r29960,%r29961,%r29962,%r29963,%r29964,%r29965,%r29966,%r29967,%r29968,%r29969,%r29970,%r29971,%r29972,%r29973,%r29974,%r29975,%r29976,%r29977,%r29978,%r29979,%r29980,%r29981,%r29982,%r29983,%r29984,%r29985,%r29986,%r29987,%r29988,%r29989,%r29990,%r29991,%r29992,%r29993,%r29994}, {%r24414,%r24415,%r24416,%r24417}, %rd5, %p130, 1, 1; 2026-02-21T09:05:41.1622014Z // end inline asm 2026-02-21T09:05:41.1622081Z // begin inline asm 2026-02-21T09:05:41.1623530Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29931,%r29932,%r29933,%r29934,%r29935,%r29936,%r29937,%r29938,%r29939,%r29940,%r29941,%r29942,%r29943,%r29944,%r29945,%r29946,%r29947,%r29948,%r29949,%r29950,%r29951,%r29952,%r29953,%r29954,%r29955,%r29956,%r29957,%r29958,%r29959,%r29960,%r29961,%r29962,%r29963,%r29964,%r29965,%r29966,%r29967,%r29968,%r29969,%r29970,%r29971,%r29972,%r29973,%r29974,%r29975,%r29976,%r29977,%r29978,%r29979,%r29980,%r29981,%r29982,%r29983,%r29984,%r29985,%r29986,%r29987,%r29988,%r29989,%r29990,%r29991,%r29992,%r29993,%r29994}, {%r24546,%r24547,%r24548,%r24549}, %rd6, %p130, 1, 1; 2026-02-21T09:05:41.1623669Z // end inline asm 2026-02-21T09:05:41.1623787Z // begin inline asm 2026-02-21T09:05:41.1625249Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29995,%r29996,%r29997,%r29998,%r29999,%r30000,%r30001,%r30002,%r30003,%r30004,%r30005,%r30006,%r30007,%r30008,%r30009,%r30010,%r30011,%r30012,%r30013,%r30014,%r30015,%r30016,%r30017,%r30018,%r30019,%r30020,%r30021,%r30022,%r30023,%r30024,%r30025,%r30026,%r30027,%r30028,%r30029,%r30030,%r30031,%r30032,%r30033,%r30034,%r30035,%r30036,%r30037,%r30038,%r30039,%r30040,%r30041,%r30042,%r30043,%r30044,%r30045,%r30046,%r30047,%r30048,%r30049,%r30050,%r30051,%r30052,%r30053,%r30054,%r30055,%r30056,%r30057,%r30058}, {%r24678,%r24679,%r24680,%r24681}, %rd5, %p130, 1, 1; 2026-02-21T09:05:41.1625318Z // end inline asm 2026-02-21T09:05:41.1625376Z // begin inline asm 2026-02-21T09:05:41.1627036Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29995,%r29996,%r29997,%r29998,%r29999,%r30000,%r30001,%r30002,%r30003,%r30004,%r30005,%r30006,%r30007,%r30008,%r30009,%r30010,%r30011,%r30012,%r30013,%r30014,%r30015,%r30016,%r30017,%r30018,%r30019,%r30020,%r30021,%r30022,%r30023,%r30024,%r30025,%r30026,%r30027,%r30028,%r30029,%r30030,%r30031,%r30032,%r30033,%r30034,%r30035,%r30036,%r30037,%r30038,%r30039,%r30040,%r30041,%r30042,%r30043,%r30044,%r30045,%r30046,%r30047,%r30048,%r30049,%r30050,%r30051,%r30052,%r30053,%r30054,%r30055,%r30056,%r30057,%r30058}, {%r24810,%r24811,%r24812,%r24813}, %rd6, %p130, 1, 1; 2026-02-21T09:05:41.1627109Z // end inline asm 2026-02-21T09:05:41.1627170Z // begin inline asm 2026-02-21T09:05:41.1628728Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r30059,%r30060,%r30061,%r30062,%r30063,%r30064,%r30065,%r30066,%r30067,%r30068,%r30069,%r30070,%r30071,%r30072,%r30073,%r30074,%r30075,%r30076,%r30077,%r30078,%r30079,%r30080,%r30081,%r30082,%r30083,%r30084,%r30085,%r30086,%r30087,%r30088,%r30089,%r30090,%r30091,%r30092,%r30093,%r30094,%r30095,%r30096,%r30097,%r30098,%r30099,%r30100,%r30101,%r30102,%r30103,%r30104,%r30105,%r30106,%r30107,%r30108,%r30109,%r30110,%r30111,%r30112,%r30113,%r30114,%r30115,%r30116,%r30117,%r30118,%r30119,%r30120,%r30121,%r30122}, {%r24942,%r24943,%r24944,%r24945}, %rd5, %p130, 1, 1; 2026-02-21T09:05:41.1628794Z // end inline asm 2026-02-21T09:05:41.1628856Z // begin inline asm 2026-02-21T09:05:41.1630338Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r30059,%r30060,%r30061,%r30062,%r30063,%r30064,%r30065,%r30066,%r30067,%r30068,%r30069,%r30070,%r30071,%r30072,%r30073,%r30074,%r30075,%r30076,%r30077,%r30078,%r30079,%r30080,%r30081,%r30082,%r30083,%r30084,%r30085,%r30086,%r30087,%r30088,%r30089,%r30090,%r30091,%r30092,%r30093,%r30094,%r30095,%r30096,%r30097,%r30098,%r30099,%r30100,%r30101,%r30102,%r30103,%r30104,%r30105,%r30106,%r30107,%r30108,%r30109,%r30110,%r30111,%r30112,%r30113,%r30114,%r30115,%r30116,%r30117,%r30118,%r30119,%r30120,%r30121,%r30122}, {%r25074,%r25075,%r25076,%r25077}, %rd6, %p130, 1, 1; 2026-02-21T09:05:41.1630400Z // end inline asm 2026-02-21T09:05:41.1630486Z wgmma.commit_group.sync.aligned; 2026-02-21T09:05:41.1630553Z mov.b32 %r25335, %r28519; 2026-02-21T09:05:41.1630616Z mov.b32 %r25336, %r28519; 2026-02-21T09:05:41.1630676Z mov.b32 %r25334, %r24001; 2026-02-21T09:05:41.1630744Z // begin inline asm 2026-02-21T09:05:41.1635907Z // wait for regs: 
%r29867,%r29868,%r29869,%r29870,%r29871,%r29872,%r29873,%r29874,%r29875,%r29876,%r29877,%r29878,%r29879,%r29880,%r29881,%r29882,%r29883,%r29884,%r29885,%r29886,%r29887,%r29888,%r29889,%r29890,%r29891,%r29892,%r29893,%r29894,%r29895,%r29896,%r29897,%r29898,%r29899,%r29900,%r29901,%r29902,%r29903,%r29904,%r29905,%r29906,%r29907,%r29908,%r29909,%r29910,%r29911,%r29912,%r29913,%r29914,%r29915,%r29916,%r29917,%r29918,%r29919,%r29920,%r29921,%r29922,%r29923,%r29924,%r29925,%r29926,%r29927,%r29928,%r29929,%r29930,%r29931,%r29932,%r29933,%r29934,%r29935,%r29936,%r29937,%r29938,%r29939,%r29940,%r29941,%r29942,%r29943,%r29944,%r29945,%r29946,%r29947,%r29948,%r29949,%r29950,%r29951,%r29952,%r29953,%r29954,%r29955,%r29956,%r29957,%r29958,%r29959,%r29960,%r29961,%r29962,%r29963,%r29964,%r29965,%r29966,%r29967,%r29968,%r29969,%r29970,%r29971,%r29972,%r29973,%r29974,%r29975,%r29976,%r29977,%r29978,%r29979,%r29980,%r29981,%r29982,%r29983,%r29984,%r29985,%r29986,%r29987,%r29988,%r29989,%r29990,%r29991,%r29992,%r29993,%r29994,%r29995,%r29996,%r29997,%r29998,%r29999,%r30000,%r30001,%r30002,%r30003,%r30004,%r30005,%r30006,%r30007,%r30008,%r30009,%r30010,%r30011,%r30012,%r30013,%r30014,%r30015,%r30016,%r30017,%r30018,%r30019,%r30020,%r30021,%r30022,%r30023,%r30024,%r30025,%r30026,%r30027,%r30028,%r30029,%r30030,%r30031,%r30032,%r30033,%r30034,%r30035,%r30036,%r30037,%r30038,%r30039,%r30040,%r30041,%r30042,%r30043,%r30044,%r30045,%r30046,%r30047,%r30048,%r30049,%r30050,%r30051,%r30052,%r30053,%r30054,%r30055,%r30056,%r30057,%r30058,%r30059,%r30060,%r30061,%r30062,%r30063,%r30064,%r30065,%r30066,%r30067,%r30068,%r30069,%r30070,%r30071,%r30072,%r30073,%r30074,%r30075,%r30076,%r30077,%r30078,%r30079,%r30080,%r30081,%r30082,%r30083,%r30084,%r30085,%r30086,%r30087,%r30088,%r30089,%r30090,%r30091,%r30092,%r30093,%r30094,%r30095,%r30096,%r30097,%r30098,%r30099,%r30100,%r30101,%r30102,%r30103,%r30104,%r30105,%r30106,%r30107,%r30108,%r30109,%r30110,%r30111,%r30112,%r30113,%r30114,%r30115,%r30116,
%r30117,%r30118,%r30119,%r30120,%r30121,%r30122,%r25334,%r25335,%r25336 2026-02-21T09:05:41.1636060Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:05:41.1636127Z // end inline asm 2026-02-21T09:05:41.1636186Z $L__tmp28: 2026-02-21T09:05:41.1636396Z .loc 1 55 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:32 2026-02-21T09:05:41.1636580Z add.s64 %rd272, %rd246, 64; 2026-02-21T09:05:41.1636658Z add.s64 %rd273, %rd247, 64; 2026-02-21T09:05:41.1636725Z add.s64 %rd274, %rd248, 64; 2026-02-21T09:05:41.1636927Z .loc 1 55 80 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:80 2026-02-21T09:05:41.1636998Z add.s64 %rd275, %rd249, 64; 2026-02-21T09:05:41.1637059Z // begin inline asm 2026-02-21T09:05:41.1637121Z mov.u32 %r25596, 0x0; 2026-02-21T09:05:41.1637194Z mov.u32 %r25597, 0x0; 2026-02-21T09:05:41.1637254Z mov.u32 %r25598, 0x0; 2026-02-21T09:05:41.1637314Z mov.u32 %r25599, 0x0; 2026-02-21T09:05:41.1637453Z ld.global.v4.b32 { %r25596, %r25597, %r25598, %r25599 }, [ %rd272 + 0 ]; 2026-02-21T09:05:41.1637520Z // end inline asm 2026-02-21T09:05:41.1637582Z // begin inline asm 2026-02-21T09:05:41.1637645Z mov.u32 %r25600, 0x0; 2026-02-21T09:05:41.1637712Z mov.u32 %r25601, 0x0; 2026-02-21T09:05:41.1637771Z mov.u32 %r25602, 0x0; 2026-02-21T09:05:41.1637831Z mov.u32 %r25603, 0x0; 2026-02-21T09:05:41.1637974Z ld.global.v4.b32 { %r25600, %r25601, %r25602, %r25603 }, [ %rd273 + 0 ]; 2026-02-21T09:05:41.1638043Z // end inline asm 2026-02-21T09:05:41.1638102Z // begin inline asm 2026-02-21T09:05:41.1638160Z mov.u32 %r25604, 0x0; 2026-02-21T09:05:41.1638226Z mov.u32 %r25605, 0x0; 2026-02-21T09:05:41.1638284Z mov.u32 %r25606, 0x0; 2026-02-21T09:05:41.1638345Z mov.u32 %r25607, 0x0; 2026-02-21T09:05:41.1638477Z ld.global.v4.b32 { %r25604, %r25605, %r25606, %r25607 }, [ %rd274 + 0 ]; 2026-02-21T09:05:41.1638539Z // end inline asm 2026-02-21T09:05:41.1638599Z // begin inline asm 2026-02-21T09:05:41.1638659Z mov.u32 %r25608, 0x0; 2026-02-21T09:05:41.1638804Z 
mov.u32 %r25609, 0x0; 2026-02-21T09:05:41.1638864Z mov.u32 %r25610, 0x0; 2026-02-21T09:05:41.1638923Z mov.u32 %r25611, 0x0; 2026-02-21T09:05:41.1639049Z ld.global.v4.b32 { %r25608, %r25609, %r25610, %r25611 }, [ %rd275 + 0 ]; 2026-02-21T09:05:41.1639127Z // end inline asm 2026-02-21T09:05:41.1639332Z .loc 1 59 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:59:32 2026-02-21T09:05:41.1639390Z bar.sync 0; 2026-02-21T09:05:41.1639480Z st.shared.v2.b32 [%r65], {%r25596, %r25597}; 2026-02-21T09:05:41.1639574Z st.shared.v2.b32 [%r65+2048], {%r25600, %r25601}; 2026-02-21T09:05:41.1639661Z st.shared.v2.b32 [%r65+4096], {%r25604, %r25605}; 2026-02-21T09:05:41.1639752Z st.shared.v2.b32 [%r65+6144], {%r25608, %r25609}; 2026-02-21T09:05:41.1639956Z st.shared.v2.b32 [%r66], {%r25598, %r25599}; 2026-02-21T09:05:41.1640045Z st.shared.v2.b32 [%r66+2048], {%r25602, %r25603}; 2026-02-21T09:05:41.1640129Z st.shared.v2.b32 [%r66+4096], {%r25606, %r25607}; 2026-02-21T09:05:41.1640237Z st.shared.v2.b32 [%r66+6144], {%r25610, %r25611}; 2026-02-21T09:05:41.1640297Z bar.sync 0; 2026-02-21T09:05:41.1640367Z ld.shared.b16 %rs1130, [%r67]; 2026-02-21T09:05:41.1640445Z ld.shared.b16 %rs1131, [%r67+256]; 2026-02-21T09:05:41.1640512Z ld.shared.b16 %rs1132, [%r67+16]; 2026-02-21T09:05:41.1640579Z ld.shared.b16 %rs1133, [%r67+272]; 2026-02-21T09:05:41.1640654Z ld.shared.b16 %rs1134, [%r67+2048]; 2026-02-21T09:05:41.1640721Z ld.shared.b16 %rs1135, [%r67+2304]; 2026-02-21T09:05:41.1640789Z ld.shared.b16 %rs1136, [%r67+2064]; 2026-02-21T09:05:41.1640856Z ld.shared.b16 %rs1137, [%r67+2320]; 2026-02-21T09:05:41.1640933Z ld.shared.b16 %rs1138, [%r67+4096]; 2026-02-21T09:05:41.1641002Z ld.shared.b16 %rs1139, [%r67+4352]; 2026-02-21T09:05:41.1641072Z ld.shared.b16 %rs1140, [%r67+4112]; 2026-02-21T09:05:41.1641210Z ld.shared.b16 %rs1141, [%r67+4368]; 2026-02-21T09:05:41.1641281Z ld.shared.b16 %rs1142, [%r67+6144]; 2026-02-21T09:05:41.1641351Z ld.shared.b16 %rs1143, [%r67+6400]; 
2026-02-21T09:05:41.1641418Z ld.shared.b16 %rs1144, [%r67+6160]; 2026-02-21T09:05:41.1641489Z ld.shared.b16 %rs1145, [%r67+6416]; 2026-02-21T09:05:41.1641559Z ld.shared.b16 %rs1146, [%r68]; 2026-02-21T09:05:41.1641627Z ld.shared.b16 %rs1147, [%r68+256]; 2026-02-21T09:05:41.1641697Z ld.shared.b16 %rs1148, [%r68+16]; 2026-02-21T09:05:41.1641764Z ld.shared.b16 %rs1149, [%r68+272]; 2026-02-21T09:05:41.1641832Z ld.shared.b16 %rs1150, [%r68+2048]; 2026-02-21T09:05:41.1641904Z ld.shared.b16 %rs1151, [%r68+2304]; 2026-02-21T09:05:41.1641971Z ld.shared.b16 %rs1152, [%r68+2064]; 2026-02-21T09:05:41.1642038Z ld.shared.b16 %rs1153, [%r68+2320]; 2026-02-21T09:05:41.1642104Z ld.shared.b16 %rs1154, [%r68+4096]; 2026-02-21T09:05:41.1642183Z ld.shared.b16 %rs1155, [%r68+4352]; 2026-02-21T09:05:41.1642251Z ld.shared.b16 %rs1156, [%r68+4112]; 2026-02-21T09:05:41.1642328Z ld.shared.b16 %rs1157, [%r68+4368]; 2026-02-21T09:05:41.1642406Z ld.shared.b16 %rs1158, [%r68+6144]; 2026-02-21T09:05:41.1642478Z ld.shared.b16 %rs1159, [%r68+6400]; 2026-02-21T09:05:41.1642547Z ld.shared.b16 %rs1160, [%r68+6160]; 2026-02-21T09:05:41.1642614Z ld.shared.b16 %rs1161, [%r68+6416]; 2026-02-21T09:05:41.1642686Z cvt.f32.bf16 %r25742, %rs1130; 2026-02-21T09:05:41.1642750Z cvt.f32.bf16 %r25743, %rs1131; 2026-02-21T09:05:41.1642816Z cvt.f32.bf16 %r25744, %rs1146; 2026-02-21T09:05:41.1642883Z cvt.f32.bf16 %r25745, %rs1147; 2026-02-21T09:05:41.1642947Z cvt.f32.bf16 %r25874, %rs1132; 2026-02-21T09:05:41.1643008Z cvt.f32.bf16 %r25875, %rs1133; 2026-02-21T09:05:41.1643076Z cvt.f32.bf16 %r25876, %rs1148; 2026-02-21T09:05:41.1643138Z cvt.f32.bf16 %r25877, %rs1149; 2026-02-21T09:05:41.1643200Z cvt.f32.bf16 %r26006, %rs1134; 2026-02-21T09:05:41.1643267Z cvt.f32.bf16 %r26007, %rs1135; 2026-02-21T09:05:41.1643338Z cvt.f32.bf16 %r26008, %rs1150; 2026-02-21T09:05:41.1643401Z cvt.f32.bf16 %r26009, %rs1151; 2026-02-21T09:05:41.1643463Z cvt.f32.bf16 %r26138, %rs1136; 2026-02-21T09:05:41.1643609Z cvt.f32.bf16 %r26139, 
%rs1137; 2026-02-21T09:05:41.1643674Z cvt.f32.bf16 %r26140, %rs1152; 2026-02-21T09:05:41.1643744Z cvt.f32.bf16 %r26141, %rs1153; 2026-02-21T09:05:41.1643807Z cvt.f32.bf16 %r26270, %rs1138; 2026-02-21T09:05:41.1643875Z cvt.f32.bf16 %r26271, %rs1139; 2026-02-21T09:05:41.1643938Z cvt.f32.bf16 %r26272, %rs1154; 2026-02-21T09:05:41.1644002Z cvt.f32.bf16 %r26273, %rs1155; 2026-02-21T09:05:41.1644070Z cvt.f32.bf16 %r26402, %rs1140; 2026-02-21T09:05:41.1644134Z cvt.f32.bf16 %r26403, %rs1141; 2026-02-21T09:05:41.1644196Z cvt.f32.bf16 %r26404, %rs1156; 2026-02-21T09:05:41.1644259Z cvt.f32.bf16 %r26405, %rs1157; 2026-02-21T09:05:41.1644326Z cvt.f32.bf16 %r26534, %rs1142; 2026-02-21T09:05:41.1644388Z cvt.f32.bf16 %r26535, %rs1143; 2026-02-21T09:05:41.1644545Z cvt.f32.bf16 %r26536, %rs1158; 2026-02-21T09:05:41.1644617Z cvt.f32.bf16 %r26537, %rs1159; 2026-02-21T09:05:41.1644678Z cvt.f32.bf16 %r26666, %rs1144; 2026-02-21T09:05:41.1644740Z cvt.f32.bf16 %r26667, %rs1145; 2026-02-21T09:05:41.1644805Z cvt.f32.bf16 %r26668, %rs1160; 2026-02-21T09:05:41.1644870Z cvt.f32.bf16 %r26669, %rs1161; 2026-02-21T09:05:41.1645077Z .loc 1 61 87 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:87 2026-02-21T09:05:41.1645144Z add.s64 %rd276, %rd305, 131072; 2026-02-21T09:05:41.1645224Z // begin inline asm 2026-02-21T09:05:41.1645285Z mov.u32 %r25612, 0x0; 2026-02-21T09:05:41.1645346Z mov.u32 %r25613, 0x0; 2026-02-21T09:05:41.1645452Z ld.global.v2.b32 { %r25612, %r25613 }, [ %rd276 + 0 ]; 2026-02-21T09:05:41.1645510Z // end inline asm 2026-02-21T09:05:41.1645708Z .loc 1 69 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:69:28 2026-02-21T09:05:41.1645767Z bar.sync 0; 2026-02-21T09:05:41.1645888Z st.shared.b8 [%r69], %r25612; 2026-02-21T09:05:41.1645966Z prmt.b32 %r28861, %r25612, 0, 0x7771U; 2026-02-21T09:05:41.1646031Z st.shared.b8 [%r70], %r28861; 2026-02-21T09:05:41.1646107Z prmt.b32 %r28862, %r25612, 0, 0x7772U; 2026-02-21T09:05:41.1646175Z st.shared.b8 [%r71+256], 
%r28862; 2026-02-21T09:05:41.1646246Z prmt.b32 %r28863, %r25612, 0, 0x7773U; 2026-02-21T09:05:41.1646317Z st.shared.b8 [%r72+256], %r28863; 2026-02-21T09:05:41.1646384Z st.shared.b8 [%r73+512], %r25613; 2026-02-21T09:05:41.1646749Z prmt.b32 %r28864, %r25613, 0, 0x7771U; 2026-02-21T09:05:41.1646824Z st.shared.b8 [%r74+512], %r28864; 2026-02-21T09:05:41.1646897Z prmt.b32 %r28865, %r25613, 0, 0x7772U; 2026-02-21T09:05:41.1646961Z st.shared.b8 [%r75+768], %r28865; 2026-02-21T09:05:41.1647029Z prmt.b32 %r28866, %r25613, 0, 0x7773U; 2026-02-21T09:05:41.1647097Z st.shared.b8 [%r76+768], %r28866; 2026-02-21T09:05:41.1647157Z bar.sync 0; 2026-02-21T09:05:41.1647224Z ld.shared.b32 %r28867, [%r77]; 2026-02-21T09:05:41.1647293Z prmt.b32 %r28868, %r28867, 0, 0x7770U; 2026-02-21T09:05:41.1647369Z cvt.u16.u32 %rs1162, %r28868; 2026-02-21T09:05:41.1647438Z prmt.b32 %r28869, %r28867, 0, 0x7771U; 2026-02-21T09:05:41.1647502Z cvt.u16.u32 %rs1163, %r28869; 2026-02-21T09:05:41.1647571Z prmt.b32 %r28870, %r28867, 0, 0x7772U; 2026-02-21T09:05:41.1647633Z cvt.u16.u32 %rs1164, %r28870; 2026-02-21T09:05:41.1647697Z prmt.b32 %r28871, %r28867, 0, 0x7773U; 2026-02-21T09:05:41.1647760Z cvt.u16.u32 %rs1165, %r28871; 2026-02-21T09:05:41.1647832Z ld.shared.b32 %r28872, [%r78]; 2026-02-21T09:05:41.1647897Z prmt.b32 %r28873, %r28872, 0, 0x7770U; 2026-02-21T09:05:41.1647959Z cvt.u16.u32 %rs1166, %r28873; 2026-02-21T09:05:41.1648027Z prmt.b32 %r28874, %r28872, 0, 0x7771U; 2026-02-21T09:05:41.1648089Z cvt.u16.u32 %rs1167, %r28874; 2026-02-21T09:05:41.1648155Z prmt.b32 %r28875, %r28872, 0, 0x7772U; 2026-02-21T09:05:41.1648220Z cvt.u16.u32 %rs1168, %r28875; 2026-02-21T09:05:41.1648288Z prmt.b32 %r28876, %r28872, 0, 0x7773U; 2026-02-21T09:05:41.1648353Z cvt.u16.u32 %rs1169, %r28876; 2026-02-21T09:05:41.1648554Z .loc 1 64 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:64:28 2026-02-21T09:05:41.1648728Z shl.b16 %rs1170, %rs1162, 4; 2026-02-21T09:05:41.1648790Z shl.b16 %rs1171, %rs1166, 4; 
2026-02-21T09:05:41.1648853Z shl.b16 %rs1172, %rs1163, 4; 2026-02-21T09:05:41.1648926Z shl.b16 %rs1173, %rs1167, 4; 2026-02-21T09:05:41.1648994Z shl.b16 %rs1174, %rs1164, 4; 2026-02-21T09:05:41.1649059Z shl.b16 %rs1175, %rs1168, 4; 2026-02-21T09:05:41.1649121Z shl.b16 %rs1176, %rs1165, 4; 2026-02-21T09:05:41.1649189Z shl.b16 %rs1177, %rs1169, 4; 2026-02-21T09:05:41.1649388Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1649451Z cvt.s16.s8 %rs1178, %rs1170; 2026-02-21T09:05:41.1649520Z shr.s16 %rs1179, %rs1178, 4; 2026-02-21T09:05:41.1649585Z cvt.s16.s8 %rs1180, %rs1171; 2026-02-21T09:05:41.1649774Z shr.s16 %rs1181, %rs1180, 4; 2026-02-21T09:05:41.1649850Z prmt.b32 %r28877, %r28867, 0, 0x8880U; 2026-02-21T09:05:41.1649914Z cvt.u16.u32 %rs1182, %r28877; 2026-02-21T09:05:41.1649977Z shr.s16 %rs1183, %rs1182, 4; 2026-02-21T09:05:41.1650054Z prmt.b32 %r28878, %r28872, 0, 0x8880U; 2026-02-21T09:05:41.1650126Z cvt.u16.u32 %rs1184, %r28878; 2026-02-21T09:05:41.1650188Z shr.s16 %rs1185, %rs1184, 4; 2026-02-21T09:05:41.1650387Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1650460Z cvt.rn.f32.s16 %r28879, %rs1185; 2026-02-21T09:05:41.1650526Z cvt.rn.f32.s16 %r28880, %rs1183; 2026-02-21T09:05:41.1650589Z cvt.rn.f32.s16 %r28881, %rs1181; 2026-02-21T09:05:41.1650653Z cvt.rn.f32.s16 %r28882, %rs1179; 2026-02-21T09:05:41.1650856Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1650917Z cvt.s16.s8 %rs1186, %rs1172; 2026-02-21T09:05:41.1650983Z shr.s16 %rs1187, %rs1186, 4; 2026-02-21T09:05:41.1651114Z cvt.s16.s8 %rs1188, %rs1173; 2026-02-21T09:05:41.1651178Z shr.s16 %rs1189, %rs1188, 4; 2026-02-21T09:05:41.1651247Z prmt.b32 %r28883, %r28867, 0, 0x9991U; 2026-02-21T09:05:41.1651316Z cvt.u16.u32 %rs1190, %r28883; 2026-02-21T09:05:41.1651377Z shr.s16 %rs1191, %rs1190, 4; 2026-02-21T09:05:41.1651446Z prmt.b32 %r28884, %r28872, 0, 
0x9991U; 2026-02-21T09:05:41.1651509Z cvt.u16.u32 %rs1192, %r28884; 2026-02-21T09:05:41.1651576Z shr.s16 %rs1193, %rs1192, 4; 2026-02-21T09:05:41.1651775Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1651843Z cvt.rn.f32.s16 %r28885, %rs1193; 2026-02-21T09:05:41.1651913Z cvt.rn.f32.s16 %r28886, %rs1191; 2026-02-21T09:05:41.1651978Z cvt.rn.f32.s16 %r28887, %rs1189; 2026-02-21T09:05:41.1652041Z cvt.rn.f32.s16 %r28888, %rs1187; 2026-02-21T09:05:41.1652247Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1652315Z cvt.s16.s8 %rs1194, %rs1174; 2026-02-21T09:05:41.1652377Z shr.s16 %rs1195, %rs1194, 4; 2026-02-21T09:05:41.1652440Z cvt.s16.s8 %rs1196, %rs1175; 2026-02-21T09:05:41.1652510Z shr.s16 %rs1197, %rs1196, 4; 2026-02-21T09:05:41.1652588Z prmt.b32 %r28889, %r28867, 0, 0xaaa2U; 2026-02-21T09:05:41.1652654Z cvt.u16.u32 %rs1198, %r28889; 2026-02-21T09:05:41.1652724Z shr.s16 %rs1199, %rs1198, 4; 2026-02-21T09:05:41.1652790Z prmt.b32 %r28890, %r28872, 0, 0xaaa2U; 2026-02-21T09:05:41.1652852Z cvt.u16.u32 %rs1200, %r28890; 2026-02-21T09:05:41.1652917Z shr.s16 %rs1201, %rs1200, 4; 2026-02-21T09:05:41.1653123Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1653190Z cvt.rn.f32.s16 %r28891, %rs1201; 2026-02-21T09:05:41.1653254Z cvt.rn.f32.s16 %r28892, %rs1199; 2026-02-21T09:05:41.1653325Z cvt.rn.f32.s16 %r28893, %rs1197; 2026-02-21T09:05:41.1653390Z cvt.rn.f32.s16 %r28894, %rs1195; 2026-02-21T09:05:41.1653591Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1653660Z cvt.s16.s8 %rs1202, %rs1176; 2026-02-21T09:05:41.1653781Z shr.s16 %rs1203, %rs1202, 4; 2026-02-21T09:05:41.1653848Z cvt.s16.s8 %rs1204, %rs1177; 2026-02-21T09:05:41.1653911Z shr.s16 %rs1205, %rs1204, 4; 2026-02-21T09:05:41.1653986Z prmt.b32 %r28895, %r28867, 0, 0xbbb3U; 2026-02-21T09:05:41.1654048Z cvt.u16.u32 
%rs1206, %r28895; 2026-02-21T09:05:41.1654111Z shr.s16 %rs1207, %rs1206, 4; 2026-02-21T09:05:41.1654181Z prmt.b32 %r28896, %r28872, 0, 0xbbb3U; 2026-02-21T09:05:41.1654246Z cvt.u16.u32 %rs1208, %r28896; 2026-02-21T09:05:41.1654307Z shr.s16 %rs1209, %rs1208, 4; 2026-02-21T09:05:41.1654510Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1654576Z cvt.rn.f32.s16 %r28897, %rs1209; 2026-02-21T09:05:41.1654734Z cvt.rn.f32.s16 %r28898, %rs1207; 2026-02-21T09:05:41.1654801Z cvt.rn.f32.s16 %r28899, %rs1205; 2026-02-21T09:05:41.1654872Z cvt.rn.f32.s16 %r28900, %rs1203; 2026-02-21T09:05:41.1654928Z bar.sync 0; 2026-02-21T09:05:41.1655050Z st.shared.v4.b32 [%r79], {%r28882, %r28880, %r28881, %r28879}; 2026-02-21T09:05:41.1655168Z st.shared.v4.b32 [%r80], {%r28888, %r28886, %r28887, %r28885}; 2026-02-21T09:05:41.1655277Z st.shared.v4.b32 [%r81], {%r28894, %r28892, %r28893, %r28891}; 2026-02-21T09:05:41.1655386Z st.shared.v4.b32 [%r82], {%r28900, %r28898, %r28899, %r28897}; 2026-02-21T09:05:41.1655447Z $L__tmp29: 2026-02-21T09:05:41.1655723Z .loc 2 291 36 // standard.py:291:36 @[ c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:91:40 ] 2026-02-21T09:05:41.1655785Z // begin inline asm 2026-02-21T09:05:41.1655864Z fence.proxy.async.shared::cta; 2026-02-21T09:05:41.1655926Z // end inline asm 2026-02-21T09:05:41.1655984Z bar.sync 0; 2026-02-21T09:05:41.1656058Z wgmma.fence.sync.aligned; 2026-02-21T09:05:41.1656129Z // begin inline asm 2026-02-21T09:05:41.1657830Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29867,%r29868,%r29869,%r29870,%r29871,%r29872,%r29873,%r29874,%r29875,%r29876,%r29877,%r29878,%r29879,%r29880,%r29881,%r29882,%r29883,%r29884,%r29885,%r29886,%r29887,%r29888,%r29889,%r29890,%r29891,%r29892,%r29893,%r29894,%r29895,%r29896,%r29897,%r29898,%r29899,%r29900,%r29901,%r29902,%r29903,%r29904,%r29905,%r29906,%r29907,%r29908,%r29909,%r29910,%r29911,%r29912,%r29913,%r29914,%r29915,%r29916,%r29917,%r29918,%r29919,%r29920,%r29921,%r29922,%r29923,%r29924,%r29925,%r29926,%r29927,%r29928,%r29929,%r29930}, {%r25742,%r25743,%r25744,%r25745}, %rd5, %p130, 1, 1; 2026-02-21T09:05:41.1657900Z // end inline asm 2026-02-21T09:05:41.1657966Z // begin inline asm 2026-02-21T09:05:41.1659454Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29867,%r29868,%r29869,%r29870,%r29871,%r29872,%r29873,%r29874,%r29875,%r29876,%r29877,%r29878,%r29879,%r29880,%r29881,%r29882,%r29883,%r29884,%r29885,%r29886,%r29887,%r29888,%r29889,%r29890,%r29891,%r29892,%r29893,%r29894,%r29895,%r29896,%r29897,%r29898,%r29899,%r29900,%r29901,%r29902,%r29903,%r29904,%r29905,%r29906,%r29907,%r29908,%r29909,%r29910,%r29911,%r29912,%r29913,%r29914,%r29915,%r29916,%r29917,%r29918,%r29919,%r29920,%r29921,%r29922,%r29923,%r29924,%r29925,%r29926,%r29927,%r29928,%r29929,%r29930}, {%r25874,%r25875,%r25876,%r25877}, %rd6, %p130, 1, 1; 2026-02-21T09:05:41.1659521Z // end inline asm 2026-02-21T09:05:41.1659580Z // begin inline asm 2026-02-21T09:05:41.1661062Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29931,%r29932,%r29933,%r29934,%r29935,%r29936,%r29937,%r29938,%r29939,%r29940,%r29941,%r29942,%r29943,%r29944,%r29945,%r29946,%r29947,%r29948,%r29949,%r29950,%r29951,%r29952,%r29953,%r29954,%r29955,%r29956,%r29957,%r29958,%r29959,%r29960,%r29961,%r29962,%r29963,%r29964,%r29965,%r29966,%r29967,%r29968,%r29969,%r29970,%r29971,%r29972,%r29973,%r29974,%r29975,%r29976,%r29977,%r29978,%r29979,%r29980,%r29981,%r29982,%r29983,%r29984,%r29985,%r29986,%r29987,%r29988,%r29989,%r29990,%r29991,%r29992,%r29993,%r29994}, {%r26006,%r26007,%r26008,%r26009}, %rd5, %p130, 1, 1; 2026-02-21T09:05:41.1661130Z // end inline asm 2026-02-21T09:05:41.1661267Z // begin inline asm 2026-02-21T09:05:41.1662751Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29931,%r29932,%r29933,%r29934,%r29935,%r29936,%r29937,%r29938,%r29939,%r29940,%r29941,%r29942,%r29943,%r29944,%r29945,%r29946,%r29947,%r29948,%r29949,%r29950,%r29951,%r29952,%r29953,%r29954,%r29955,%r29956,%r29957,%r29958,%r29959,%r29960,%r29961,%r29962,%r29963,%r29964,%r29965,%r29966,%r29967,%r29968,%r29969,%r29970,%r29971,%r29972,%r29973,%r29974,%r29975,%r29976,%r29977,%r29978,%r29979,%r29980,%r29981,%r29982,%r29983,%r29984,%r29985,%r29986,%r29987,%r29988,%r29989,%r29990,%r29991,%r29992,%r29993,%r29994}, {%r26138,%r26139,%r26140,%r26141}, %rd6, %p130, 1, 1; 2026-02-21T09:05:41.1662817Z // end inline asm 2026-02-21T09:05:41.1662875Z // begin inline asm 2026-02-21T09:05:41.1664483Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29995,%r29996,%r29997,%r29998,%r29999,%r30000,%r30001,%r30002,%r30003,%r30004,%r30005,%r30006,%r30007,%r30008,%r30009,%r30010,%r30011,%r30012,%r30013,%r30014,%r30015,%r30016,%r30017,%r30018,%r30019,%r30020,%r30021,%r30022,%r30023,%r30024,%r30025,%r30026,%r30027,%r30028,%r30029,%r30030,%r30031,%r30032,%r30033,%r30034,%r30035,%r30036,%r30037,%r30038,%r30039,%r30040,%r30041,%r30042,%r30043,%r30044,%r30045,%r30046,%r30047,%r30048,%r30049,%r30050,%r30051,%r30052,%r30053,%r30054,%r30055,%r30056,%r30057,%r30058}, {%r26270,%r26271,%r26272,%r26273}, %rd5, %p130, 1, 1; 2026-02-21T09:05:41.1664546Z // end inline asm 2026-02-21T09:05:41.1664610Z // begin inline asm 2026-02-21T09:05:41.1666146Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29995,%r29996,%r29997,%r29998,%r29999,%r30000,%r30001,%r30002,%r30003,%r30004,%r30005,%r30006,%r30007,%r30008,%r30009,%r30010,%r30011,%r30012,%r30013,%r30014,%r30015,%r30016,%r30017,%r30018,%r30019,%r30020,%r30021,%r30022,%r30023,%r30024,%r30025,%r30026,%r30027,%r30028,%r30029,%r30030,%r30031,%r30032,%r30033,%r30034,%r30035,%r30036,%r30037,%r30038,%r30039,%r30040,%r30041,%r30042,%r30043,%r30044,%r30045,%r30046,%r30047,%r30048,%r30049,%r30050,%r30051,%r30052,%r30053,%r30054,%r30055,%r30056,%r30057,%r30058}, {%r26402,%r26403,%r26404,%r26405}, %rd6, %p130, 1, 1; 2026-02-21T09:05:41.1666209Z // end inline asm 2026-02-21T09:05:41.1666270Z // begin inline asm 2026-02-21T09:05:41.1667885Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r30059,%r30060,%r30061,%r30062,%r30063,%r30064,%r30065,%r30066,%r30067,%r30068,%r30069,%r30070,%r30071,%r30072,%r30073,%r30074,%r30075,%r30076,%r30077,%r30078,%r30079,%r30080,%r30081,%r30082,%r30083,%r30084,%r30085,%r30086,%r30087,%r30088,%r30089,%r30090,%r30091,%r30092,%r30093,%r30094,%r30095,%r30096,%r30097,%r30098,%r30099,%r30100,%r30101,%r30102,%r30103,%r30104,%r30105,%r30106,%r30107,%r30108,%r30109,%r30110,%r30111,%r30112,%r30113,%r30114,%r30115,%r30116,%r30117,%r30118,%r30119,%r30120,%r30121,%r30122}, {%r26534,%r26535,%r26536,%r26537}, %rd5, %p130, 1, 1; 2026-02-21T09:05:41.1667959Z // end inline asm 2026-02-21T09:05:41.1668026Z // begin inline asm 2026-02-21T09:05:41.1669611Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r30059,%r30060,%r30061,%r30062,%r30063,%r30064,%r30065,%r30066,%r30067,%r30068,%r30069,%r30070,%r30071,%r30072,%r30073,%r30074,%r30075,%r30076,%r30077,%r30078,%r30079,%r30080,%r30081,%r30082,%r30083,%r30084,%r30085,%r30086,%r30087,%r30088,%r30089,%r30090,%r30091,%r30092,%r30093,%r30094,%r30095,%r30096,%r30097,%r30098,%r30099,%r30100,%r30101,%r30102,%r30103,%r30104,%r30105,%r30106,%r30107,%r30108,%r30109,%r30110,%r30111,%r30112,%r30113,%r30114,%r30115,%r30116,%r30117,%r30118,%r30119,%r30120,%r30121,%r30122}, {%r26666,%r26667,%r26668,%r26669}, %rd6, %p130, 1, 1; 2026-02-21T09:05:41.1669670Z // end inline asm 2026-02-21T09:05:41.1669754Z wgmma.commit_group.sync.aligned; 2026-02-21T09:05:41.1669818Z mov.b32 %r26927, %r28519; 2026-02-21T09:05:41.1669883Z mov.b32 %r26928, %r28519; 2026-02-21T09:05:41.1669953Z mov.b32 %r26926, %r24001; 2026-02-21T09:05:41.1670013Z // begin inline asm 2026-02-21T09:05:41.1675133Z // wait for regs: 
%r29867,%r29868,%r29869,%r29870,%r29871,%r29872,%r29873,%r29874,%r29875,%r29876,%r29877,%r29878,%r29879,%r29880,%r29881,%r29882,%r29883,%r29884,%r29885,%r29886,%r29887,%r29888,%r29889,%r29890,%r29891,%r29892,%r29893,%r29894,%r29895,%r29896,%r29897,%r29898,%r29899,%r29900,%r29901,%r29902,%r29903,%r29904,%r29905,%r29906,%r29907,%r29908,%r29909,%r29910,%r29911,%r29912,%r29913,%r29914,%r29915,%r29916,%r29917,%r29918,%r29919,%r29920,%r29921,%r29922,%r29923,%r29924,%r29925,%r29926,%r29927,%r29928,%r29929,%r29930,%r29931,%r29932,%r29933,%r29934,%r29935,%r29936,%r29937,%r29938,%r29939,%r29940,%r29941,%r29942,%r29943,%r29944,%r29945,%r29946,%r29947,%r29948,%r29949,%r29950,%r29951,%r29952,%r29953,%r29954,%r29955,%r29956,%r29957,%r29958,%r29959,%r29960,%r29961,%r29962,%r29963,%r29964,%r29965,%r29966,%r29967,%r29968,%r29969,%r29970,%r29971,%r29972,%r29973,%r29974,%r29975,%r29976,%r29977,%r29978,%r29979,%r29980,%r29981,%r29982,%r29983,%r29984,%r29985,%r29986,%r29987,%r29988,%r29989,%r29990,%r29991,%r29992,%r29993,%r29994,%r29995,%r29996,%r29997,%r29998,%r29999,%r30000,%r30001,%r30002,%r30003,%r30004,%r30005,%r30006,%r30007,%r30008,%r30009,%r30010,%r30011,%r30012,%r30013,%r30014,%r30015,%r30016,%r30017,%r30018,%r30019,%r30020,%r30021,%r30022,%r30023,%r30024,%r30025,%r30026,%r30027,%r30028,%r30029,%r30030,%r30031,%r30032,%r30033,%r30034,%r30035,%r30036,%r30037,%r30038,%r30039,%r30040,%r30041,%r30042,%r30043,%r30044,%r30045,%r30046,%r30047,%r30048,%r30049,%r30050,%r30051,%r30052,%r30053,%r30054,%r30055,%r30056,%r30057,%r30058,%r30059,%r30060,%r30061,%r30062,%r30063,%r30064,%r30065,%r30066,%r30067,%r30068,%r30069,%r30070,%r30071,%r30072,%r30073,%r30074,%r30075,%r30076,%r30077,%r30078,%r30079,%r30080,%r30081,%r30082,%r30083,%r30084,%r30085,%r30086,%r30087,%r30088,%r30089,%r30090,%r30091,%r30092,%r30093,%r30094,%r30095,%r30096,%r30097,%r30098,%r30099,%r30100,%r30101,%r30102,%r30103,%r30104,%r30105,%r30106,%r30107,%r30108,%r30109,%r30110,%r30111,%r30112,%r30113,%r30114,%r30115,%r30116,
%r30117,%r30118,%r30119,%r30120,%r30121,%r30122,%r26926,%r26927,%r26928 2026-02-21T09:05:41.1675379Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:05:41.1675451Z // end inline asm 2026-02-21T09:05:41.1675510Z $L__tmp30: 2026-02-21T09:05:41.1675718Z .loc 1 55 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:32 2026-02-21T09:05:41.1675790Z add.s64 %rd285, %rd246, 96; 2026-02-21T09:05:41.1675855Z add.s64 %rd286, %rd247, 96; 2026-02-21T09:05:41.1675917Z add.s64 %rd287, %rd248, 96; 2026-02-21T09:05:41.1676122Z .loc 1 55 80 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:55:80 2026-02-21T09:05:41.1676184Z add.s64 %rd288, %rd249, 96; 2026-02-21T09:05:41.1676247Z // begin inline asm 2026-02-21T09:05:41.1676307Z mov.u32 %r27188, 0x0; 2026-02-21T09:05:41.1676376Z mov.u32 %r27189, 0x0; 2026-02-21T09:05:41.1676437Z mov.u32 %r27190, 0x0; 2026-02-21T09:05:41.1676616Z mov.u32 %r27191, 0x0; 2026-02-21T09:05:41.1676761Z ld.global.v4.b32 { %r27188, %r27189, %r27190, %r27191 }, [ %rd285 + 0 ]; 2026-02-21T09:05:41.1676833Z // end inline asm 2026-02-21T09:05:41.1676895Z // begin inline asm 2026-02-21T09:05:41.1676954Z mov.u32 %r27192, 0x0; 2026-02-21T09:05:41.1677018Z mov.u32 %r27193, 0x0; 2026-02-21T09:05:41.1677075Z mov.u32 %r27194, 0x0; 2026-02-21T09:05:41.1677134Z mov.u32 %r27195, 0x0; 2026-02-21T09:05:41.1677267Z ld.global.v4.b32 { %r27192, %r27193, %r27194, %r27195 }, [ %rd286 + 0 ]; 2026-02-21T09:05:41.1677326Z // end inline asm 2026-02-21T09:05:41.1677387Z // begin inline asm 2026-02-21T09:05:41.1677454Z mov.u32 %r27196, 0x0; 2026-02-21T09:05:41.1677512Z mov.u32 %r27197, 0x0; 2026-02-21T09:05:41.1677571Z mov.u32 %r27198, 0x0; 2026-02-21T09:05:41.1677630Z mov.u32 %r27199, 0x0; 2026-02-21T09:05:41.1677760Z ld.global.v4.b32 { %r27196, %r27197, %r27198, %r27199 }, [ %rd287 + 0 ]; 2026-02-21T09:05:41.1677823Z // end inline asm 2026-02-21T09:05:41.1677884Z // begin inline asm 2026-02-21T09:05:41.1677952Z mov.u32 %r27200, 0x0; 2026-02-21T09:05:41.1678011Z 
mov.u32 %r27201, 0x0; 2026-02-21T09:05:41.1678156Z mov.u32 %r27202, 0x0; 2026-02-21T09:05:41.1678215Z mov.u32 %r27203, 0x0; 2026-02-21T09:05:41.1678347Z ld.global.v4.b32 { %r27200, %r27201, %r27202, %r27203 }, [ %rd288 + 0 ]; 2026-02-21T09:05:41.1678407Z // end inline asm 2026-02-21T09:05:41.1678618Z .loc 1 59 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:59:32 2026-02-21T09:05:41.1678684Z bar.sync 0; 2026-02-21T09:05:41.1678770Z st.shared.v2.b32 [%r65], {%r27188, %r27189}; 2026-02-21T09:05:41.1678863Z st.shared.v2.b32 [%r65+2048], {%r27192, %r27193}; 2026-02-21T09:05:41.1678961Z st.shared.v2.b32 [%r65+4096], {%r27196, %r27197}; 2026-02-21T09:05:41.1679049Z st.shared.v2.b32 [%r65+6144], {%r27200, %r27201}; 2026-02-21T09:05:41.1679128Z st.shared.v2.b32 [%r66], {%r27190, %r27191}; 2026-02-21T09:05:41.1679343Z st.shared.v2.b32 [%r66+2048], {%r27194, %r27195}; 2026-02-21T09:05:41.1679438Z st.shared.v2.b32 [%r66+4096], {%r27198, %r27199}; 2026-02-21T09:05:41.1679524Z st.shared.v2.b32 [%r66+6144], {%r27202, %r27203}; 2026-02-21T09:05:41.1679582Z bar.sync 0; 2026-02-21T09:05:41.1679660Z ld.shared.b16 %rs1210, [%r67]; 2026-02-21T09:05:41.1679731Z ld.shared.b16 %rs1211, [%r67+256]; 2026-02-21T09:05:41.1679798Z ld.shared.b16 %rs1212, [%r67+16]; 2026-02-21T09:05:41.1679866Z ld.shared.b16 %rs1213, [%r67+272]; 2026-02-21T09:05:41.1679941Z ld.shared.b16 %rs1214, [%r67+2048]; 2026-02-21T09:05:41.1680008Z ld.shared.b16 %rs1215, [%r67+2304]; 2026-02-21T09:05:41.1680074Z ld.shared.b16 %rs1216, [%r67+2064]; 2026-02-21T09:05:41.1680149Z ld.shared.b16 %rs1217, [%r67+2320]; 2026-02-21T09:05:41.1680217Z ld.shared.b16 %rs1218, [%r67+4096]; 2026-02-21T09:05:41.1680286Z ld.shared.b16 %rs1219, [%r67+4352]; 2026-02-21T09:05:41.1680359Z ld.shared.b16 %rs1220, [%r67+4112]; 2026-02-21T09:05:41.1680490Z ld.shared.b16 %rs1221, [%r67+4368]; 2026-02-21T09:05:41.1680560Z ld.shared.b16 %rs1222, [%r67+6144]; 2026-02-21T09:05:41.1680625Z ld.shared.b16 %rs1223, [%r67+6400]; 
2026-02-21T09:05:41.1680698Z ld.shared.b16 %rs1224, [%r67+6160]; 2026-02-21T09:05:41.1680765Z ld.shared.b16 %rs1225, [%r67+6416]; 2026-02-21T09:05:41.1680845Z ld.shared.b16 %rs1226, [%r68]; 2026-02-21T09:05:41.1680918Z ld.shared.b16 %rs1227, [%r68+256]; 2026-02-21T09:05:41.1680985Z ld.shared.b16 %rs1228, [%r68+16]; 2026-02-21T09:05:41.1681050Z ld.shared.b16 %rs1229, [%r68+272]; 2026-02-21T09:05:41.1681117Z ld.shared.b16 %rs1230, [%r68+2048]; 2026-02-21T09:05:41.1681191Z ld.shared.b16 %rs1231, [%r68+2304]; 2026-02-21T09:05:41.1681258Z ld.shared.b16 %rs1232, [%r68+2064]; 2026-02-21T09:05:41.1681324Z ld.shared.b16 %rs1233, [%r68+2320]; 2026-02-21T09:05:41.1681397Z ld.shared.b16 %rs1234, [%r68+4096]; 2026-02-21T09:05:41.1681464Z ld.shared.b16 %rs1235, [%r68+4352]; 2026-02-21T09:05:41.1681536Z ld.shared.b16 %rs1236, [%r68+4112]; 2026-02-21T09:05:41.1681609Z ld.shared.b16 %rs1237, [%r68+4368]; 2026-02-21T09:05:41.1681675Z ld.shared.b16 %rs1238, [%r68+6144]; 2026-02-21T09:05:41.1681742Z ld.shared.b16 %rs1239, [%r68+6400]; 2026-02-21T09:05:41.1681812Z ld.shared.b16 %rs1240, [%r68+6160]; 2026-02-21T09:05:41.1681884Z ld.shared.b16 %rs1241, [%r68+6416]; 2026-02-21T09:05:41.1681964Z cvt.f32.bf16 %r27334, %rs1210; 2026-02-21T09:05:41.1682031Z cvt.f32.bf16 %r27335, %rs1211; 2026-02-21T09:05:41.1682101Z cvt.f32.bf16 %r27336, %rs1226; 2026-02-21T09:05:41.1682166Z cvt.f32.bf16 %r27337, %rs1227; 2026-02-21T09:05:41.1682228Z cvt.f32.bf16 %r27466, %rs1212; 2026-02-21T09:05:41.1682290Z cvt.f32.bf16 %r27467, %rs1213; 2026-02-21T09:05:41.1682361Z cvt.f32.bf16 %r27468, %rs1228; 2026-02-21T09:05:41.1682422Z cvt.f32.bf16 %r27469, %rs1229; 2026-02-21T09:05:41.1682483Z cvt.f32.bf16 %r27598, %rs1214; 2026-02-21T09:05:41.1682553Z cvt.f32.bf16 %r27599, %rs1215; 2026-02-21T09:05:41.1682621Z cvt.f32.bf16 %r27600, %rs1230; 2026-02-21T09:05:41.1682684Z cvt.f32.bf16 %r27601, %rs1231; 2026-02-21T09:05:41.1682751Z cvt.f32.bf16 %r27730, %rs1216; 2026-02-21T09:05:41.1682821Z cvt.f32.bf16 %r27731, 
%rs1217; 2026-02-21T09:05:41.1682942Z cvt.f32.bf16 %r27732, %rs1232; 2026-02-21T09:05:41.1683004Z cvt.f32.bf16 %r27733, %rs1233; 2026-02-21T09:05:41.1683076Z cvt.f32.bf16 %r27862, %rs1218; 2026-02-21T09:05:41.1683142Z cvt.f32.bf16 %r27863, %rs1219; 2026-02-21T09:05:41.1683208Z cvt.f32.bf16 %r27864, %rs1234; 2026-02-21T09:05:41.1683275Z cvt.f32.bf16 %r27865, %rs1235; 2026-02-21T09:05:41.1683337Z cvt.f32.bf16 %r27994, %rs1220; 2026-02-21T09:05:41.1683402Z cvt.f32.bf16 %r27995, %rs1221; 2026-02-21T09:05:41.1683464Z cvt.f32.bf16 %r27996, %rs1236; 2026-02-21T09:05:41.1683532Z cvt.f32.bf16 %r27997, %rs1237; 2026-02-21T09:05:41.1683593Z cvt.f32.bf16 %r28126, %rs1222; 2026-02-21T09:05:41.1683656Z cvt.f32.bf16 %r28127, %rs1223; 2026-02-21T09:05:41.1683726Z cvt.f32.bf16 %r28128, %rs1238; 2026-02-21T09:05:41.1683883Z cvt.f32.bf16 %r28129, %rs1239; 2026-02-21T09:05:41.1683950Z cvt.f32.bf16 %r28258, %rs1224; 2026-02-21T09:05:41.1684014Z cvt.f32.bf16 %r28259, %rs1225; 2026-02-21T09:05:41.1684083Z cvt.f32.bf16 %r28260, %rs1240; 2026-02-21T09:05:41.1684149Z cvt.f32.bf16 %r28261, %rs1241; 2026-02-21T09:05:41.1684354Z .loc 1 61 87 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:61:87 2026-02-21T09:05:41.1684427Z add.s64 %rd289, %rd305, 196608; 2026-02-21T09:05:41.1684489Z // begin inline asm 2026-02-21T09:05:41.1684550Z mov.u32 %r27204, 0x0; 2026-02-21T09:05:41.1684612Z mov.u32 %r27205, 0x0; 2026-02-21T09:05:41.1684720Z ld.global.v2.b32 { %r27204, %r27205 }, [ %rd289 + 0 ]; 2026-02-21T09:05:41.1684778Z // end inline asm 2026-02-21T09:05:41.1684979Z .loc 1 69 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:69:28 2026-02-21T09:05:41.1685047Z bar.sync 0; 2026-02-21T09:05:41.1685114Z st.shared.b8 [%r69], %r27204; 2026-02-21T09:05:41.1685242Z prmt.b32 %r28901, %r27204, 0, 0x7771U; 2026-02-21T09:05:41.1685317Z st.shared.b8 [%r70], %r28901; 2026-02-21T09:05:41.1685400Z prmt.b32 %r28902, %r27204, 0, 0x7772U; 2026-02-21T09:05:41.1685472Z st.shared.b8 [%r71+256], 
%r28902; 2026-02-21T09:05:41.1685540Z prmt.b32 %r28903, %r27204, 0, 0x7773U; 2026-02-21T09:05:41.1685615Z st.shared.b8 [%r72+256], %r28903; 2026-02-21T09:05:41.1685683Z st.shared.b8 [%r73+512], %r27205; 2026-02-21T09:05:41.1685750Z prmt.b32 %r28904, %r27205, 0, 0x7771U; 2026-02-21T09:05:41.1685822Z st.shared.b8 [%r74+512], %r28904; 2026-02-21T09:05:41.1685892Z prmt.b32 %r28905, %r27205, 0, 0x7772U; 2026-02-21T09:05:41.1685964Z st.shared.b8 [%r75+768], %r28905; 2026-02-21T09:05:41.1686029Z prmt.b32 %r28906, %r27205, 0, 0x7773U; 2026-02-21T09:05:41.1686101Z st.shared.b8 [%r76+768], %r28906; 2026-02-21T09:05:41.1686159Z bar.sync 0; 2026-02-21T09:05:41.1686225Z ld.shared.b32 %r28907, [%r77]; 2026-02-21T09:05:41.1686303Z prmt.b32 %r28908, %r28907, 0, 0x7770U; 2026-02-21T09:05:41.1686370Z cvt.u16.u32 %rs1242, %r28908; 2026-02-21T09:05:41.1686438Z prmt.b32 %r28909, %r28907, 0, 0x7771U; 2026-02-21T09:05:41.1686622Z cvt.u16.u32 %rs1243, %r28909; 2026-02-21T09:05:41.1686696Z prmt.b32 %r28910, %r28907, 0, 0x7772U; 2026-02-21T09:05:41.1686770Z cvt.u16.u32 %rs1244, %r28910; 2026-02-21T09:05:41.1686838Z prmt.b32 %r28911, %r28907, 0, 0x7773U; 2026-02-21T09:05:41.1686908Z cvt.u16.u32 %rs1245, %r28911; 2026-02-21T09:05:41.1686974Z ld.shared.b32 %r28912, [%r78]; 2026-02-21T09:05:41.1687040Z prmt.b32 %r28913, %r28912, 0, 0x7770U; 2026-02-21T09:05:41.1687106Z cvt.u16.u32 %rs1246, %r28913; 2026-02-21T09:05:41.1687170Z prmt.b32 %r28914, %r28912, 0, 0x7771U; 2026-02-21T09:05:41.1687230Z cvt.u16.u32 %rs1247, %r28914; 2026-02-21T09:05:41.1687296Z prmt.b32 %r28915, %r28912, 0, 0x7772U; 2026-02-21T09:05:41.1687365Z cvt.u16.u32 %rs1248, %r28915; 2026-02-21T09:05:41.1687430Z prmt.b32 %r28916, %r28912, 0, 0x7773U; 2026-02-21T09:05:41.1687498Z cvt.u16.u32 %rs1249, %r28916; 2026-02-21T09:05:41.1687705Z .loc 1 64 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:64:28 2026-02-21T09:05:41.1687772Z shl.b16 %rs1250, %rs1242, 4; 2026-02-21T09:05:41.1687919Z shl.b16 %rs1251, %rs1246, 4; 
2026-02-21T09:05:41.1687990Z shl.b16 %rs1252, %rs1243, 4; 2026-02-21T09:05:41.1688053Z shl.b16 %rs1253, %rs1247, 4; 2026-02-21T09:05:41.1688113Z shl.b16 %rs1254, %rs1244, 4; 2026-02-21T09:05:41.1688174Z shl.b16 %rs1255, %rs1248, 4; 2026-02-21T09:05:41.1688241Z shl.b16 %rs1256, %rs1245, 4; 2026-02-21T09:05:41.1688302Z shl.b16 %rs1257, %rs1249, 4; 2026-02-21T09:05:41.1688500Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1688571Z cvt.s16.s8 %rs1258, %rs1250; 2026-02-21T09:05:41.1688633Z shr.s16 %rs1259, %rs1258, 4; 2026-02-21T09:05:41.1688694Z cvt.s16.s8 %rs1260, %rs1251; 2026-02-21T09:05:41.1688757Z shr.s16 %rs1261, %rs1260, 4; 2026-02-21T09:05:41.1688958Z prmt.b32 %r28917, %r28907, 0, 0x8880U; 2026-02-21T09:05:41.1689023Z cvt.u16.u32 %rs1262, %r28917; 2026-02-21T09:05:41.1689086Z shr.s16 %rs1263, %rs1262, 4; 2026-02-21T09:05:41.1689160Z prmt.b32 %r28918, %r28912, 0, 0x8880U; 2026-02-21T09:05:41.1689223Z cvt.u16.u32 %rs1264, %r28918; 2026-02-21T09:05:41.1689284Z shr.s16 %rs1265, %rs1264, 4; 2026-02-21T09:05:41.1689486Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1689553Z cvt.rn.f32.s16 %r28919, %rs1265; 2026-02-21T09:05:41.1689618Z cvt.rn.f32.s16 %r28920, %rs1263; 2026-02-21T09:05:41.1689682Z cvt.rn.f32.s16 %r28921, %rs1261; 2026-02-21T09:05:41.1689751Z cvt.rn.f32.s16 %r28922, %rs1259; 2026-02-21T09:05:41.1689946Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1690010Z cvt.s16.s8 %rs1266, %rs1252; 2026-02-21T09:05:41.1690077Z shr.s16 %rs1267, %rs1266, 4; 2026-02-21T09:05:41.1690145Z cvt.s16.s8 %rs1268, %rs1253; 2026-02-21T09:05:41.1690269Z shr.s16 %rs1269, %rs1268, 4; 2026-02-21T09:05:41.1690346Z prmt.b32 %r28923, %r28907, 0, 0x9991U; 2026-02-21T09:05:41.1690412Z cvt.u16.u32 %rs1270, %r28923; 2026-02-21T09:05:41.1690476Z shr.s16 %rs1271, %rs1270, 4; 2026-02-21T09:05:41.1690545Z prmt.b32 %r28924, %r28912, 0, 
0x9991U; 2026-02-21T09:05:41.1690614Z cvt.u16.u32 %rs1272, %r28924; 2026-02-21T09:05:41.1690677Z shr.s16 %rs1273, %rs1272, 4; 2026-02-21T09:05:41.1690872Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1690942Z cvt.rn.f32.s16 %r28925, %rs1273; 2026-02-21T09:05:41.1691005Z cvt.rn.f32.s16 %r28926, %rs1271; 2026-02-21T09:05:41.1691070Z cvt.rn.f32.s16 %r28927, %rs1269; 2026-02-21T09:05:41.1691132Z cvt.rn.f32.s16 %r28928, %rs1267; 2026-02-21T09:05:41.1691334Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1691401Z cvt.s16.s8 %rs1274, %rs1254; 2026-02-21T09:05:41.1691466Z shr.s16 %rs1275, %rs1274, 4; 2026-02-21T09:05:41.1691537Z cvt.s16.s8 %rs1276, %rs1255; 2026-02-21T09:05:41.1691599Z shr.s16 %rs1277, %rs1276, 4; 2026-02-21T09:05:41.1691688Z prmt.b32 %r28929, %r28907, 0, 0xaaa2U; 2026-02-21T09:05:41.1691757Z cvt.u16.u32 %rs1278, %r28929; 2026-02-21T09:05:41.1691821Z shr.s16 %rs1279, %rs1278, 4; 2026-02-21T09:05:41.1691889Z prmt.b32 %r28930, %r28912, 0, 0xaaa2U; 2026-02-21T09:05:41.1691953Z cvt.u16.u32 %rs1280, %r28930; 2026-02-21T09:05:41.1692019Z shr.s16 %rs1281, %rs1280, 4; 2026-02-21T09:05:41.1692215Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1692280Z cvt.rn.f32.s16 %r28931, %rs1281; 2026-02-21T09:05:41.1692349Z cvt.rn.f32.s16 %r28932, %rs1279; 2026-02-21T09:05:41.1692413Z cvt.rn.f32.s16 %r28933, %rs1277; 2026-02-21T09:05:41.1692476Z cvt.rn.f32.s16 %r28934, %rs1275; 2026-02-21T09:05:41.1692680Z .loc 1 66 25 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:66:25 2026-02-21T09:05:41.1692744Z cvt.s16.s8 %rs1282, %rs1256; 2026-02-21T09:05:41.1692806Z shr.s16 %rs1283, %rs1282, 4; 2026-02-21T09:05:41.1692933Z cvt.s16.s8 %rs1284, %rs1257; 2026-02-21T09:05:41.1693001Z shr.s16 %rs1285, %rs1284, 4; 2026-02-21T09:05:41.1693072Z prmt.b32 %r28935, %r28907, 0, 0xbbb3U; 2026-02-21T09:05:41.1693137Z cvt.u16.u32 
%rs1286, %r28935; 2026-02-21T09:05:41.1693212Z shr.s16 %rs1287, %rs1286, 4; 2026-02-21T09:05:41.1693277Z prmt.b32 %r28936, %r28912, 0, 0xbbb3U; 2026-02-21T09:05:41.1693339Z cvt.u16.u32 %rs1288, %r28936; 2026-02-21T09:05:41.1693401Z shr.s16 %rs1289, %rs1288, 4; 2026-02-21T09:05:41.1693610Z .loc 1 84 32 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:84:32 2026-02-21T09:05:41.1693676Z cvt.rn.f32.s16 %r28937, %rs1289; 2026-02-21T09:05:41.1693742Z cvt.rn.f32.s16 %r28938, %rs1287; 2026-02-21T09:05:41.1693928Z cvt.rn.f32.s16 %r28939, %rs1285; 2026-02-21T09:05:41.1693997Z cvt.rn.f32.s16 %r28940, %rs1283; 2026-02-21T09:05:41.1694055Z bar.sync 0; 2026-02-21T09:05:41.1694180Z st.shared.v4.b32 [%r79], {%r28922, %r28920, %r28921, %r28919}; 2026-02-21T09:05:41.1694296Z st.shared.v4.b32 [%r80], {%r28928, %r28926, %r28927, %r28925}; 2026-02-21T09:05:41.1694408Z st.shared.v4.b32 [%r81], {%r28934, %r28932, %r28933, %r28931}; 2026-02-21T09:05:41.1694518Z st.shared.v4.b32 [%r82], {%r28940, %r28938, %r28939, %r28937}; 2026-02-21T09:05:41.1694586Z $L__tmp31: 2026-02-21T09:05:41.1694866Z .loc 2 291 36 // standard.py:291:36 @[ c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:91:40 ] 2026-02-21T09:05:41.1694939Z // begin inline asm 2026-02-21T09:05:41.1695028Z fence.proxy.async.shared::cta; 2026-02-21T09:05:41.1695090Z // end inline asm 2026-02-21T09:05:41.1695146Z bar.sync 0; 2026-02-21T09:05:41.1695227Z wgmma.fence.sync.aligned; 2026-02-21T09:05:41.1695291Z // begin inline asm 2026-02-21T09:05:41.1696994Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29867,%r29868,%r29869,%r29870,%r29871,%r29872,%r29873,%r29874,%r29875,%r29876,%r29877,%r29878,%r29879,%r29880,%r29881,%r29882,%r29883,%r29884,%r29885,%r29886,%r29887,%r29888,%r29889,%r29890,%r29891,%r29892,%r29893,%r29894,%r29895,%r29896,%r29897,%r29898,%r29899,%r29900,%r29901,%r29902,%r29903,%r29904,%r29905,%r29906,%r29907,%r29908,%r29909,%r29910,%r29911,%r29912,%r29913,%r29914,%r29915,%r29916,%r29917,%r29918,%r29919,%r29920,%r29921,%r29922,%r29923,%r29924,%r29925,%r29926,%r29927,%r29928,%r29929,%r29930}, {%r27334,%r27335,%r27336,%r27337}, %rd5, %p130, 1, 1; 2026-02-21T09:05:41.1697074Z // end inline asm 2026-02-21T09:05:41.1697134Z // begin inline asm 2026-02-21T09:05:41.1698626Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29867,%r29868,%r29869,%r29870,%r29871,%r29872,%r29873,%r29874,%r29875,%r29876,%r29877,%r29878,%r29879,%r29880,%r29881,%r29882,%r29883,%r29884,%r29885,%r29886,%r29887,%r29888,%r29889,%r29890,%r29891,%r29892,%r29893,%r29894,%r29895,%r29896,%r29897,%r29898,%r29899,%r29900,%r29901,%r29902,%r29903,%r29904,%r29905,%r29906,%r29907,%r29908,%r29909,%r29910,%r29911,%r29912,%r29913,%r29914,%r29915,%r29916,%r29917,%r29918,%r29919,%r29920,%r29921,%r29922,%r29923,%r29924,%r29925,%r29926,%r29927,%r29928,%r29929,%r29930}, {%r27466,%r27467,%r27468,%r27469}, %rd6, %p130, 1, 1; 2026-02-21T09:05:41.1698698Z // end inline asm 2026-02-21T09:05:41.1698758Z // begin inline asm 2026-02-21T09:05:41.1700252Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29931,%r29932,%r29933,%r29934,%r29935,%r29936,%r29937,%r29938,%r29939,%r29940,%r29941,%r29942,%r29943,%r29944,%r29945,%r29946,%r29947,%r29948,%r29949,%r29950,%r29951,%r29952,%r29953,%r29954,%r29955,%r29956,%r29957,%r29958,%r29959,%r29960,%r29961,%r29962,%r29963,%r29964,%r29965,%r29966,%r29967,%r29968,%r29969,%r29970,%r29971,%r29972,%r29973,%r29974,%r29975,%r29976,%r29977,%r29978,%r29979,%r29980,%r29981,%r29982,%r29983,%r29984,%r29985,%r29986,%r29987,%r29988,%r29989,%r29990,%r29991,%r29992,%r29993,%r29994}, {%r27598,%r27599,%r27600,%r27601}, %rd5, %p130, 1, 1; 2026-02-21T09:05:41.1700315Z // end inline asm 2026-02-21T09:05:41.1700375Z // begin inline asm 2026-02-21T09:05:41.1701938Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29931,%r29932,%r29933,%r29934,%r29935,%r29936,%r29937,%r29938,%r29939,%r29940,%r29941,%r29942,%r29943,%r29944,%r29945,%r29946,%r29947,%r29948,%r29949,%r29950,%r29951,%r29952,%r29953,%r29954,%r29955,%r29956,%r29957,%r29958,%r29959,%r29960,%r29961,%r29962,%r29963,%r29964,%r29965,%r29966,%r29967,%r29968,%r29969,%r29970,%r29971,%r29972,%r29973,%r29974,%r29975,%r29976,%r29977,%r29978,%r29979,%r29980,%r29981,%r29982,%r29983,%r29984,%r29985,%r29986,%r29987,%r29988,%r29989,%r29990,%r29991,%r29992,%r29993,%r29994}, {%r27730,%r27731,%r27732,%r27733}, %rd6, %p130, 1, 1; 2026-02-21T09:05:41.1701999Z // end inline asm 2026-02-21T09:05:41.1702058Z // begin inline asm 2026-02-21T09:05:41.1703602Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r29995,%r29996,%r29997,%r29998,%r29999,%r30000,%r30001,%r30002,%r30003,%r30004,%r30005,%r30006,%r30007,%r30008,%r30009,%r30010,%r30011,%r30012,%r30013,%r30014,%r30015,%r30016,%r30017,%r30018,%r30019,%r30020,%r30021,%r30022,%r30023,%r30024,%r30025,%r30026,%r30027,%r30028,%r30029,%r30030,%r30031,%r30032,%r30033,%r30034,%r30035,%r30036,%r30037,%r30038,%r30039,%r30040,%r30041,%r30042,%r30043,%r30044,%r30045,%r30046,%r30047,%r30048,%r30049,%r30050,%r30051,%r30052,%r30053,%r30054,%r30055,%r30056,%r30057,%r30058}, {%r27862,%r27863,%r27864,%r27865}, %rd5, %p130, 1, 1; 2026-02-21T09:05:41.1703722Z // end inline asm 2026-02-21T09:05:41.1703787Z // begin inline asm 2026-02-21T09:05:41.1705324Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r29995,%r29996,%r29997,%r29998,%r29999,%r30000,%r30001,%r30002,%r30003,%r30004,%r30005,%r30006,%r30007,%r30008,%r30009,%r30010,%r30011,%r30012,%r30013,%r30014,%r30015,%r30016,%r30017,%r30018,%r30019,%r30020,%r30021,%r30022,%r30023,%r30024,%r30025,%r30026,%r30027,%r30028,%r30029,%r30030,%r30031,%r30032,%r30033,%r30034,%r30035,%r30036,%r30037,%r30038,%r30039,%r30040,%r30041,%r30042,%r30043,%r30044,%r30045,%r30046,%r30047,%r30048,%r30049,%r30050,%r30051,%r30052,%r30053,%r30054,%r30055,%r30056,%r30057,%r30058}, {%r27994,%r27995,%r27996,%r27997}, %rd6, %p130, 1, 1; 2026-02-21T09:05:41.1705389Z // end inline asm 2026-02-21T09:05:41.1705457Z // begin inline asm 2026-02-21T09:05:41.1707065Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r30059,%r30060,%r30061,%r30062,%r30063,%r30064,%r30065,%r30066,%r30067,%r30068,%r30069,%r30070,%r30071,%r30072,%r30073,%r30074,%r30075,%r30076,%r30077,%r30078,%r30079,%r30080,%r30081,%r30082,%r30083,%r30084,%r30085,%r30086,%r30087,%r30088,%r30089,%r30090,%r30091,%r30092,%r30093,%r30094,%r30095,%r30096,%r30097,%r30098,%r30099,%r30100,%r30101,%r30102,%r30103,%r30104,%r30105,%r30106,%r30107,%r30108,%r30109,%r30110,%r30111,%r30112,%r30113,%r30114,%r30115,%r30116,%r30117,%r30118,%r30119,%r30120,%r30121,%r30122}, {%r28126,%r28127,%r28128,%r28129}, %rd5, %p130, 1, 1; 2026-02-21T09:05:41.1707142Z // end inline asm 2026-02-21T09:05:41.1707202Z // begin inline asm 2026-02-21T09:05:41.1708774Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r30059,%r30060,%r30061,%r30062,%r30063,%r30064,%r30065,%r30066,%r30067,%r30068,%r30069,%r30070,%r30071,%r30072,%r30073,%r30074,%r30075,%r30076,%r30077,%r30078,%r30079,%r30080,%r30081,%r30082,%r30083,%r30084,%r30085,%r30086,%r30087,%r30088,%r30089,%r30090,%r30091,%r30092,%r30093,%r30094,%r30095,%r30096,%r30097,%r30098,%r30099,%r30100,%r30101,%r30102,%r30103,%r30104,%r30105,%r30106,%r30107,%r30108,%r30109,%r30110,%r30111,%r30112,%r30113,%r30114,%r30115,%r30116,%r30117,%r30118,%r30119,%r30120,%r30121,%r30122}, {%r28258,%r28259,%r28260,%r28261}, %rd6, %p130, 1, 1; 2026-02-21T09:05:41.1708843Z // end inline asm 2026-02-21T09:05:41.1708921Z wgmma.commit_group.sync.aligned; 2026-02-21T09:05:41.1708988Z mov.b32 %r28518, %r24001; 2026-02-21T09:05:41.1709061Z mov.b32 %r28520, %r28519; 2026-02-21T09:05:41.1709129Z // begin inline asm 2026-02-21T09:05:41.1714258Z // wait for regs: 
%r29867,%r29868,%r29869,%r29870,%r29871,%r29872,%r29873,%r29874,%r29875,%r29876,%r29877,%r29878,%r29879,%r29880,%r29881,%r29882,%r29883,%r29884,%r29885,%r29886,%r29887,%r29888,%r29889,%r29890,%r29891,%r29892,%r29893,%r29894,%r29895,%r29896,%r29897,%r29898,%r29899,%r29900,%r29901,%r29902,%r29903,%r29904,%r29905,%r29906,%r29907,%r29908,%r29909,%r29910,%r29911,%r29912,%r29913,%r29914,%r29915,%r29916,%r29917,%r29918,%r29919,%r29920,%r29921,%r29922,%r29923,%r29924,%r29925,%r29926,%r29927,%r29928,%r29929,%r29930,%r29931,%r29932,%r29933,%r29934,%r29935,%r29936,%r29937,%r29938,%r29939,%r29940,%r29941,%r29942,%r29943,%r29944,%r29945,%r29946,%r29947,%r29948,%r29949,%r29950,%r29951,%r29952,%r29953,%r29954,%r29955,%r29956,%r29957,%r29958,%r29959,%r29960,%r29961,%r29962,%r29963,%r29964,%r29965,%r29966,%r29967,%r29968,%r29969,%r29970,%r29971,%r29972,%r29973,%r29974,%r29975,%r29976,%r29977,%r29978,%r29979,%r29980,%r29981,%r29982,%r29983,%r29984,%r29985,%r29986,%r29987,%r29988,%r29989,%r29990,%r29991,%r29992,%r29993,%r29994,%r29995,%r29996,%r29997,%r29998,%r29999,%r30000,%r30001,%r30002,%r30003,%r30004,%r30005,%r30006,%r30007,%r30008,%r30009,%r30010,%r30011,%r30012,%r30013,%r30014,%r30015,%r30016,%r30017,%r30018,%r30019,%r30020,%r30021,%r30022,%r30023,%r30024,%r30025,%r30026,%r30027,%r30028,%r30029,%r30030,%r30031,%r30032,%r30033,%r30034,%r30035,%r30036,%r30037,%r30038,%r30039,%r30040,%r30041,%r30042,%r30043,%r30044,%r30045,%r30046,%r30047,%r30048,%r30049,%r30050,%r30051,%r30052,%r30053,%r30054,%r30055,%r30056,%r30057,%r30058,%r30059,%r30060,%r30061,%r30062,%r30063,%r30064,%r30065,%r30066,%r30067,%r30068,%r30069,%r30070,%r30071,%r30072,%r30073,%r30074,%r30075,%r30076,%r30077,%r30078,%r30079,%r30080,%r30081,%r30082,%r30083,%r30084,%r30085,%r30086,%r30087,%r30088,%r30089,%r30090,%r30091,%r30092,%r30093,%r30094,%r30095,%r30096,%r30097,%r30098,%r30099,%r30100,%r30101,%r30102,%r30103,%r30104,%r30105,%r30106,%r30107,%r30108,%r30109,%r30110,%r30111,%r30112,%r30113,%r30114,%r30115,%r30116,
%r30117,%r30118,%r30119,%r30120,%r30121,%r30122,%r28518,%r28519,%r28520 2026-02-21T09:05:41.1714461Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:05:41.1714526Z // end inline asm 2026-02-21T09:05:41.1714585Z $L__tmp32: 2026-02-21T09:05:41.1714803Z .loc 1 47 111 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:47:111 2026-02-21T09:05:41.1714873Z add.s64 %rd307, %rd307, 32; 2026-02-21T09:05:41.1714938Z add.s64 %rd306, %rd306, 128; 2026-02-21T09:05:41.1715004Z add.s64 %rd305, %rd305, 262144; 2026-02-21T09:05:41.1715082Z setp.lt.u64 %p162, %rd307, 480; 2026-02-21T09:05:41.1715151Z @%p162 bra $L__BB0_12; 2026-02-21T09:05:41.1715263Z // %bb.13: // in Loop: Header=BB0_11 Depth=1 2026-02-21T09:05:41.1715473Z .loc 1 94 28 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:94:28 2026-02-21T09:05:41.1715568Z cvt.rn.bf16x2.f32 %r28944, %r29868, %r29867; 2026-02-21T09:05:41.1715658Z cvt.rn.bf16x2.f32 %r28945, %r29870, %r29869; 2026-02-21T09:05:41.1715739Z cvt.rn.bf16x2.f32 %r28946, %r29872, %r29871; 2026-02-21T09:05:41.1715837Z cvt.rn.bf16x2.f32 %r28947, %r29874, %r29873; 2026-02-21T09:05:41.1715929Z cvt.rn.bf16x2.f32 %r28948, %r29876, %r29875; 2026-02-21T09:05:41.1716007Z cvt.rn.bf16x2.f32 %r28949, %r29878, %r29877; 2026-02-21T09:05:41.1716090Z cvt.rn.bf16x2.f32 %r28950, %r29880, %r29879; 2026-02-21T09:05:41.1716168Z cvt.rn.bf16x2.f32 %r28951, %r29882, %r29881; 2026-02-21T09:05:41.1716246Z cvt.rn.bf16x2.f32 %r28952, %r29884, %r29883; 2026-02-21T09:05:41.1716324Z cvt.rn.bf16x2.f32 %r28953, %r29886, %r29885; 2026-02-21T09:05:41.1716410Z cvt.rn.bf16x2.f32 %r28954, %r29888, %r29887; 2026-02-21T09:05:41.1716603Z cvt.rn.bf16x2.f32 %r28955, %r29890, %r29889; 2026-02-21T09:05:41.1716688Z cvt.rn.bf16x2.f32 %r28956, %r29892, %r29891; 2026-02-21T09:05:41.1716771Z cvt.rn.bf16x2.f32 %r28957, %r29894, %r29893; 2026-02-21T09:05:41.1716851Z cvt.rn.bf16x2.f32 %r28958, %r29896, %r29895; 2026-02-21T09:05:41.1716948Z cvt.rn.bf16x2.f32 %r28959, %r29898, %r29897; 
2026-02-21T09:05:41.1717029Z cvt.rn.bf16x2.f32 %r28960, %r29900, %r29899; 2026-02-21T09:05:41.1717113Z cvt.rn.bf16x2.f32 %r28961, %r29902, %r29901; 2026-02-21T09:05:41.1717271Z cvt.rn.bf16x2.f32 %r28962, %r29904, %r29903; 2026-02-21T09:05:41.1717347Z cvt.rn.bf16x2.f32 %r28963, %r29906, %r29905; 2026-02-21T09:05:41.1717431Z cvt.rn.bf16x2.f32 %r28964, %r29908, %r29907; 2026-02-21T09:05:41.1717507Z cvt.rn.bf16x2.f32 %r28965, %r29910, %r29909; 2026-02-21T09:05:41.1717584Z cvt.rn.bf16x2.f32 %r28966, %r29912, %r29911; 2026-02-21T09:05:41.1717666Z cvt.rn.bf16x2.f32 %r28967, %r29914, %r29913; 2026-02-21T09:05:41.1717744Z cvt.rn.bf16x2.f32 %r28968, %r29916, %r29915; 2026-02-21T09:05:41.1717821Z cvt.rn.bf16x2.f32 %r28969, %r29918, %r29917; 2026-02-21T09:05:41.1717898Z cvt.rn.bf16x2.f32 %r28970, %r29920, %r29919; 2026-02-21T09:05:41.1717981Z cvt.rn.bf16x2.f32 %r28971, %r29922, %r29921; 2026-02-21T09:05:41.1718189Z cvt.rn.bf16x2.f32 %r28972, %r29924, %r29923; 2026-02-21T09:05:41.1718272Z cvt.rn.bf16x2.f32 %r28973, %r29926, %r29925; 2026-02-21T09:05:41.1718355Z cvt.rn.bf16x2.f32 %r28974, %r29928, %r29927; 2026-02-21T09:05:41.1718437Z cvt.rn.bf16x2.f32 %r28975, %r29930, %r29929; 2026-02-21T09:05:41.1718513Z cvt.rn.bf16x2.f32 %r28976, %r29932, %r29931; 2026-02-21T09:05:41.1718597Z cvt.rn.bf16x2.f32 %r28977, %r29934, %r29933; 2026-02-21T09:05:41.1718686Z cvt.rn.bf16x2.f32 %r28978, %r29936, %r29935; 2026-02-21T09:05:41.1718765Z cvt.rn.bf16x2.f32 %r28979, %r29938, %r29937; 2026-02-21T09:05:41.1718843Z cvt.rn.bf16x2.f32 %r28980, %r29940, %r29939; 2026-02-21T09:05:41.1718926Z cvt.rn.bf16x2.f32 %r28981, %r29942, %r29941; 2026-02-21T09:05:41.1719003Z cvt.rn.bf16x2.f32 %r28982, %r29944, %r29943; 2026-02-21T09:05:41.1719079Z cvt.rn.bf16x2.f32 %r28983, %r29946, %r29945; 2026-02-21T09:05:41.1719166Z cvt.rn.bf16x2.f32 %r28984, %r29948, %r29947; 2026-02-21T09:05:41.1719245Z cvt.rn.bf16x2.f32 %r28985, %r29950, %r29949; 2026-02-21T09:05:41.1719390Z cvt.rn.bf16x2.f32 %r28986, %r29952, %r29951; 
2026-02-21T09:05:41.1719478Z cvt.rn.bf16x2.f32 %r28987, %r29954, %r29953; 2026-02-21T09:05:41.1719555Z cvt.rn.bf16x2.f32 %r28988, %r29956, %r29955; 2026-02-21T09:05:41.1719635Z cvt.rn.bf16x2.f32 %r28989, %r29958, %r29957; 2026-02-21T09:05:41.1719712Z cvt.rn.bf16x2.f32 %r28990, %r29960, %r29959; 2026-02-21T09:05:41.1719797Z cvt.rn.bf16x2.f32 %r28991, %r29962, %r29961; 2026-02-21T09:05:41.1719874Z cvt.rn.bf16x2.f32 %r28992, %r29964, %r29963; 2026-02-21T09:05:41.1719951Z cvt.rn.bf16x2.f32 %r28993, %r29966, %r29965; 2026-02-21T09:05:41.1720033Z cvt.rn.bf16x2.f32 %r28994, %r29968, %r29967; 2026-02-21T09:05:41.1720112Z cvt.rn.bf16x2.f32 %r28995, %r29970, %r29969; 2026-02-21T09:05:41.1720189Z cvt.rn.bf16x2.f32 %r28996, %r29972, %r29971; 2026-02-21T09:05:41.1720274Z cvt.rn.bf16x2.f32 %r28997, %r29974, %r29973; 2026-02-21T09:05:41.1720356Z cvt.rn.bf16x2.f32 %r28998, %r29976, %r29975; 2026-02-21T09:05:41.1720445Z cvt.rn.bf16x2.f32 %r28999, %r29978, %r29977; 2026-02-21T09:05:41.1720525Z cvt.rn.bf16x2.f32 %r29000, %r29980, %r29979; 2026-02-21T09:05:41.1720611Z cvt.rn.bf16x2.f32 %r29001, %r29982, %r29981; 2026-02-21T09:05:41.1720697Z cvt.rn.bf16x2.f32 %r29002, %r29984, %r29983; 2026-02-21T09:05:41.1720773Z cvt.rn.bf16x2.f32 %r29003, %r29986, %r29985; 2026-02-21T09:05:41.1720860Z cvt.rn.bf16x2.f32 %r29004, %r29988, %r29987; 2026-02-21T09:05:41.1720937Z cvt.rn.bf16x2.f32 %r29005, %r29990, %r29989; 2026-02-21T09:05:41.1721014Z cvt.rn.bf16x2.f32 %r29006, %r29992, %r29991; 2026-02-21T09:05:41.1721096Z cvt.rn.bf16x2.f32 %r29007, %r29994, %r29993; 2026-02-21T09:05:41.1721174Z cvt.rn.bf16x2.f32 %r29008, %r29996, %r29995; 2026-02-21T09:05:41.1721252Z cvt.rn.bf16x2.f32 %r29009, %r29998, %r29997; 2026-02-21T09:05:41.1721328Z cvt.rn.bf16x2.f32 %r29010, %r30000, %r29999; 2026-02-21T09:05:41.1721410Z cvt.rn.bf16x2.f32 %r29011, %r30002, %r30001; 2026-02-21T09:05:41.1721496Z cvt.rn.bf16x2.f32 %r29012, %r30004, %r30003; 2026-02-21T09:05:41.1721575Z cvt.rn.bf16x2.f32 %r29013, %r30006, %r30005; 
2026-02-21T09:05:41.1721659Z cvt.rn.bf16x2.f32 %r29014, %r30008, %r30007; 2026-02-21T09:05:41.1721734Z cvt.rn.bf16x2.f32 %r29015, %r30010, %r30009; 2026-02-21T09:05:41.1721873Z cvt.rn.bf16x2.f32 %r29016, %r30012, %r30011; 2026-02-21T09:05:41.1721955Z cvt.rn.bf16x2.f32 %r29017, %r30014, %r30013; 2026-02-21T09:05:41.1722033Z cvt.rn.bf16x2.f32 %r29018, %r30016, %r30015; 2026-02-21T09:05:41.1722110Z cvt.rn.bf16x2.f32 %r29019, %r30018, %r30017; 2026-02-21T09:05:41.1722186Z cvt.rn.bf16x2.f32 %r29020, %r30020, %r30019; 2026-02-21T09:05:41.1722268Z cvt.rn.bf16x2.f32 %r29021, %r30022, %r30021; 2026-02-21T09:05:41.1722343Z cvt.rn.bf16x2.f32 %r29022, %r30024, %r30023; 2026-02-21T09:05:41.1722420Z cvt.rn.bf16x2.f32 %r29023, %r30026, %r30025; 2026-02-21T09:05:41.1722503Z cvt.rn.bf16x2.f32 %r29024, %r30028, %r30027; 2026-02-21T09:05:41.1722597Z cvt.rn.bf16x2.f32 %r29025, %r30030, %r30029; 2026-02-21T09:05:41.1722794Z cvt.rn.bf16x2.f32 %r29026, %r30032, %r30031; 2026-02-21T09:05:41.1722881Z cvt.rn.bf16x2.f32 %r29027, %r30034, %r30033; 2026-02-21T09:05:41.1722961Z cvt.rn.bf16x2.f32 %r29028, %r30036, %r30035; 2026-02-21T09:05:41.1723039Z cvt.rn.bf16x2.f32 %r29029, %r30038, %r30037; 2026-02-21T09:05:41.1723115Z cvt.rn.bf16x2.f32 %r29030, %r30040, %r30039; 2026-02-21T09:05:41.1723197Z cvt.rn.bf16x2.f32 %r29031, %r30042, %r30041; 2026-02-21T09:05:41.1723274Z cvt.rn.bf16x2.f32 %r29032, %r30044, %r30043; 2026-02-21T09:05:41.1723350Z cvt.rn.bf16x2.f32 %r29033, %r30046, %r30045; 2026-02-21T09:05:41.1723433Z cvt.rn.bf16x2.f32 %r29034, %r30048, %r30047; 2026-02-21T09:05:41.1723510Z cvt.rn.bf16x2.f32 %r29035, %r30050, %r30049; 2026-02-21T09:05:41.1723588Z cvt.rn.bf16x2.f32 %r29036, %r30052, %r30051; 2026-02-21T09:05:41.1723663Z cvt.rn.bf16x2.f32 %r29037, %r30054, %r30053; 2026-02-21T09:05:41.1723756Z cvt.rn.bf16x2.f32 %r29038, %r30056, %r30055; 2026-02-21T09:05:41.1723841Z cvt.rn.bf16x2.f32 %r29039, %r30058, %r30057; 2026-02-21T09:05:41.1723966Z cvt.rn.bf16x2.f32 %r29040, %r30060, %r30059; 
2026-02-21T09:05:41.1724050Z cvt.rn.bf16x2.f32 %r29041, %r30062, %r30061; 2026-02-21T09:05:41.1724127Z cvt.rn.bf16x2.f32 %r29042, %r30064, %r30063; 2026-02-21T09:05:41.1724208Z cvt.rn.bf16x2.f32 %r29043, %r30066, %r30065; 2026-02-21T09:05:41.1724291Z cvt.rn.bf16x2.f32 %r29044, %r30068, %r30067; 2026-02-21T09:05:41.1724368Z cvt.rn.bf16x2.f32 %r29045, %r30070, %r30069; 2026-02-21T09:05:41.1724446Z cvt.rn.bf16x2.f32 %r29046, %r30072, %r30071; 2026-02-21T09:05:41.1724524Z cvt.rn.bf16x2.f32 %r29047, %r30074, %r30073; 2026-02-21T09:05:41.1724607Z cvt.rn.bf16x2.f32 %r29048, %r30076, %r30075; 2026-02-21T09:05:41.1724685Z cvt.rn.bf16x2.f32 %r29049, %r30078, %r30077; 2026-02-21T09:05:41.1724765Z cvt.rn.bf16x2.f32 %r29050, %r30080, %r30079; 2026-02-21T09:05:41.1724848Z cvt.rn.bf16x2.f32 %r29051, %r30082, %r30081; 2026-02-21T09:05:41.1724924Z cvt.rn.bf16x2.f32 %r29052, %r30084, %r30083; 2026-02-21T09:05:41.1725008Z cvt.rn.bf16x2.f32 %r29053, %r30086, %r30085; 2026-02-21T09:05:41.1725091Z cvt.rn.bf16x2.f32 %r29054, %r30088, %r30087; 2026-02-21T09:05:41.1725168Z cvt.rn.bf16x2.f32 %r29055, %r30090, %r30089; 2026-02-21T09:05:41.1725249Z cvt.rn.bf16x2.f32 %r29056, %r30092, %r30091; 2026-02-21T09:05:41.1725327Z cvt.rn.bf16x2.f32 %r29057, %r30094, %r30093; 2026-02-21T09:05:41.1725410Z cvt.rn.bf16x2.f32 %r29058, %r30096, %r30095; 2026-02-21T09:05:41.1725486Z cvt.rn.bf16x2.f32 %r29059, %r30098, %r30097; 2026-02-21T09:05:41.1725565Z cvt.rn.bf16x2.f32 %r29060, %r30100, %r30099; 2026-02-21T09:05:41.1725648Z cvt.rn.bf16x2.f32 %r29061, %r30102, %r30101; 2026-02-21T09:05:41.1725724Z cvt.rn.bf16x2.f32 %r29062, %r30104, %r30103; 2026-02-21T09:05:41.1725800Z cvt.rn.bf16x2.f32 %r29063, %r30106, %r30105; 2026-02-21T09:05:41.1725882Z cvt.rn.bf16x2.f32 %r29064, %r30108, %r30107; 2026-02-21T09:05:41.1725967Z cvt.rn.bf16x2.f32 %r29065, %r30110, %r30109; 2026-02-21T09:05:41.1726048Z cvt.rn.bf16x2.f32 %r29066, %r30112, %r30111; 2026-02-21T09:05:41.1726127Z cvt.rn.bf16x2.f32 %r29067, %r30114, %r30113; 
2026-02-21T09:05:41.1726211Z cvt.rn.bf16x2.f32 %r29068, %r30116, %r30115; 2026-02-21T09:05:41.1726286Z cvt.rn.bf16x2.f32 %r29069, %r30118, %r30117; 2026-02-21T09:05:41.1726420Z cvt.rn.bf16x2.f32 %r29070, %r30120, %r30119; 2026-02-21T09:05:41.1726622Z cvt.rn.bf16x2.f32 %r29071, %r30122, %r30121; 2026-02-21T09:05:41.1726846Z .loc 1 95 43 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:95:43 2026-02-21T09:05:41.1726909Z bar.sync 0; 2026-02-21T09:05:41.1727115Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r83], {%r28944, %r28945, %r28946, %r28947}; 2026-02-21T09:05:41.1727303Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r84], {%r28960, %r28961, %r28962, %r28963}; 2026-02-21T09:05:41.1727489Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r85], {%r28976, %r28977, %r28978, %r28979}; 2026-02-21T09:05:41.1727764Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r86], {%r28992, %r28993, %r28994, %r28995}; 2026-02-21T09:05:41.1728020Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r87], {%r29008, %r29009, %r29010, %r29011}; 2026-02-21T09:05:41.1728206Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r88], {%r29024, %r29025, %r29026, %r29027}; 2026-02-21T09:05:41.1728394Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r89], {%r29040, %r29041, %r29042, %r29043}; 2026-02-21T09:05:41.1728585Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r90], {%r29056, %r29057, %r29058, %r29059}; 2026-02-21T09:05:41.1728765Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r91], {%r28948, %r28949, %r28950, %r28951}; 2026-02-21T09:05:41.1728947Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r92], {%r28964, %r28965, %r28966, %r28967}; 2026-02-21T09:05:41.1729135Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r93], {%r28980, %r28981, %r28982, %r28983}; 2026-02-21T09:05:41.1729321Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r94], {%r28996, %r28997, %r28998, %r28999}; 2026-02-21T09:05:41.1729508Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r95], {%r29012, %r29013, %r29014, %r29015}; 2026-02-21T09:05:41.1729760Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r96], {%r29028, %r29029, %r29030, %r29031}; 2026-02-21T09:05:41.1729945Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r97], {%r29044, %r29045, %r29046, %r29047}; 2026-02-21T09:05:41.1730130Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r98], {%r29060, %r29061, %r29062, %r29063}; 2026-02-21T09:05:41.1730333Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r99], {%r28952, %r28953, %r28954, %r28955}; 2026-02-21T09:05:41.1730531Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r100], {%r28968, %r28969, %r28970, %r28971}; 2026-02-21T09:05:41.1730719Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r101], {%r28984, %r28985, %r28986, %r28987}; 2026-02-21T09:05:41.1730912Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r102], {%r29000, %r29001, %r29002, %r29003}; 2026-02-21T09:05:41.1731104Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r103], {%r29016, %r29017, %r29018, %r29019}; 2026-02-21T09:05:41.1731297Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r104], {%r29032, %r29033, %r29034, %r29035}; 2026-02-21T09:05:41.1731487Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r105], {%r29048, %r29049, %r29050, %r29051}; 2026-02-21T09:05:41.1731674Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r106], {%r29064, %r29065, %r29066, %r29067}; 2026-02-21T09:05:41.1731860Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r107], {%r28956, %r28957, %r28958, %r28959}; 2026-02-21T09:05:41.1732050Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r108], {%r28972, %r28973, %r28974, %r28975}; 2026-02-21T09:05:41.1732238Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r109], {%r28988, %r28989, %r28990, %r28991}; 2026-02-21T09:05:41.1732425Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r110], {%r29004, %r29005, %r29006, %r29007}; 2026-02-21T09:05:41.1732608Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r111], {%r29020, %r29021, %r29022, %r29023}; 2026-02-21T09:05:41.1732800Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r112], {%r29036, %r29037, %r29038, %r29039}; 2026-02-21T09:05:41.1732988Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r113], {%r29052, %r29053, %r29054, %r29055}; 2026-02-21T09:05:41.1733172Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r114], {%r29068, %r29069, %r29070, %r29071}; 2026-02-21T09:05:41.1733312Z // begin inline asm 2026-02-21T09:05:41.1733396Z fence.proxy.async.shared::cta; 2026-02-21T09:05:41.1733455Z // end inline asm 2026-02-21T09:05:41.1733518Z bar.sync 0; 2026-02-21T09:05:41.1733588Z elect.sync %r29072|%p165, -1; 2026-02-21T09:05:41.1733673Z shfl.sync.idx.b32 %r29073, %r4, 0, 31, -1; 2026-02-21T09:05:41.1733746Z and.pred %p163, %p167, %p165; 2026-02-21T09:05:41.1733816Z and.b32 %r29074, %r29073, 1; 2026-02-21T09:05:41.1733877Z shl.b32 %r29075, %r29074, 15; 2026-02-21T09:05:41.1733944Z add.s32 %r28943, %r24001, %r29075; 2026-02-21T09:05:41.1734011Z shl.b32 %r29077, %r29074, 6; 2026-02-21T09:05:41.1734077Z or.b32 %r28941, %r29077, %r1676; 2026-02-21T09:05:41.1734140Z // begin inline asm 2026-02-21T09:05:41.1734488Z @%p163 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd183, {%r28941, %r1675}], [%r28943]; 2026-02-21T09:05:41.1734550Z // end inline asm 2026-02-21T09:05:41.1734626Z cp.async.bulk.commit_group; 2026-02-21T09:05:41.1734706Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:05:41.1734767Z bar.sync 0; 2026-02-21T09:05:41.1734983Z .loc 1 26 144 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:26:144 2026-02-21T09:05:41.1735049Z add.s32 %r2189, %r29866, 1056; 2026-02-21T09:05:41.1735123Z setp.lt.s32 %p166, %r29866, -32; 2026-02-21T09:05:41.1735187Z mov.b32 %r29866, %r2189; 2026-02-21T09:05:41.1735250Z @%p166 bra $L__BB0_11; 2026-02-21T09:05:41.1735343Z $L__BB0_14: // %._crit_edge 2026-02-21T09:05:41.1735568Z .loc 1 26 4 // c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py:26:4 2026-02-21T09:05:41.1735625Z ret; 2026-02-21T09:05:41.1735683Z $L__tmp33: 2026-02-21T09:05:41.1735749Z $L__func_end0: 2026-02-21T09:05:41.1735896Z // -- End function 2026-02-21T09:05:41.1735957Z } 
2026-02-21T09:05:41.1736212Z .file 1 "/tmp/torchinductor_root/47/c475ugic4r4obbea5yqzfzrloxe7pits7gnypl7mbm2pufl4cvtl.py" 2026-02-21T09:05:41.1736431Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T09:05:41.1736610Z .section .debug_abbrev 2026-02-21T09:05:41.1736677Z { 2026-02-21T09:05:41.1736783Z .b8 1 // Abbreviation Code 2026-02-21T09:05:41.1736882Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:05:41.1736971Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:05:41.1737065Z .b8 37 // DW_AT_producer 2026-02-21T09:05:41.1737146Z .b8 8 // DW_FORM_string 2026-02-21T09:05:41.1737228Z .b8 19 // DW_AT_language 2026-02-21T09:05:41.1737323Z .b8 5 // DW_FORM_data2 2026-02-21T09:05:41.1737406Z .b8 3 // DW_AT_name 2026-02-21T09:05:41.1737499Z .b8 8 // DW_FORM_string 2026-02-21T09:05:41.1737588Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:05:41.1737678Z .b8 6 // DW_FORM_data4 2026-02-21T09:05:41.1737760Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:05:41.1737840Z .b8 8 // DW_FORM_string 2026-02-21T09:05:41.1737924Z .b8 0 // EOM(1) 2026-02-21T09:05:41.1737996Z .b8 0 // EOM(2) 2026-02-21T09:05:41.1738086Z .b8 2 // Abbreviation Code 2026-02-21T09:05:41.1738179Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:05:41.1738260Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:05:41.1738344Z .b8 3 // DW_AT_name 2026-02-21T09:05:41.1738431Z .b8 8 // DW_FORM_string 2026-02-21T09:05:41.1738515Z .b8 32 // DW_AT_inline 2026-02-21T09:05:41.1738678Z .b8 11 // DW_FORM_data1 2026-02-21T09:05:41.1738764Z .b8 0 // EOM(1) 2026-02-21T09:05:41.1738844Z .b8 0 // EOM(2) 2026-02-21T09:05:41.1738933Z .b8 3 // Abbreviation Code 2026-02-21T09:05:41.1739021Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:05:41.1739111Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:05:41.1739195Z .b8 17 // DW_AT_low_pc 2026-02-21T09:05:41.1739284Z .b8 1 // DW_FORM_addr 2026-02-21T09:05:41.1739374Z .b8 18 // DW_AT_high_pc 2026-02-21T09:05:41.1739581Z .b8 1 // DW_FORM_addr 2026-02-21T09:05:41.1739686Z .b8 49 // 
DW_AT_abstract_origin 2026-02-21T09:05:41.1739764Z .b8 19 // DW_FORM_ref4 2026-02-21T09:05:41.1739848Z .b8 0 // EOM(1) 2026-02-21T09:05:41.1739923Z .b8 0 // EOM(2) 2026-02-21T09:05:41.1740013Z .b8 4 // Abbreviation Code 2026-02-21T09:05:41.1740123Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T09:05:41.1740206Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:05:41.1740300Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:05:41.1740388Z .b8 19 // DW_FORM_ref4 2026-02-21T09:05:41.1740479Z .b8 17 // DW_AT_low_pc 2026-02-21T09:05:41.1740558Z .b8 1 // DW_FORM_addr 2026-02-21T09:05:41.1740708Z .b8 18 // DW_AT_high_pc 2026-02-21T09:05:41.1740797Z .b8 1 // DW_FORM_addr 2026-02-21T09:05:41.1740883Z .b8 88 // DW_AT_call_file 2026-02-21T09:05:41.1740968Z .b8 11 // DW_FORM_data1 2026-02-21T09:05:41.1741061Z .b8 89 // DW_AT_call_line 2026-02-21T09:05:41.1741141Z .b8 11 // DW_FORM_data1 2026-02-21T09:05:41.1741228Z .b8 87 // DW_AT_call_column 2026-02-21T09:05:41.1741314Z .b8 11 // DW_FORM_data1 2026-02-21T09:05:41.1741389Z .b8 0 // EOM(1) 2026-02-21T09:05:41.1741462Z .b8 0 // EOM(2) 2026-02-21T09:05:41.1741543Z .b8 0 // EOM(3) 2026-02-21T09:05:41.1741598Z } 2026-02-21T09:05:41.1741666Z .section .debug_info 2026-02-21T09:05:41.1741724Z { 2026-02-21T09:05:41.1741832Z .b32 178 // Length of Unit 2026-02-21T09:05:41.1741932Z .b8 2 // DWARF version number 2026-02-21T09:05:41.1741990Z .b8 0 2026-02-21T09:05:41.1742135Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:05:41.1742235Z .b8 8 // Address Size (in bytes) 2026-02-21T09:05:41.1742354Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T09:05:41.1742443Z .b8 116 // DW_AT_producer 2026-02-21T09:05:41.1742505Z .b8 114 2026-02-21T09:05:41.1742560Z .b8 105 2026-02-21T09:05:41.1742613Z .b8 116 2026-02-21T09:05:41.1742673Z .b8 111 2026-02-21T09:05:41.1742726Z .b8 110 2026-02-21T09:05:41.1742778Z .b8 0 2026-02-21T09:05:41.1742857Z .b8 2 // DW_AT_language 2026-02-21T09:05:41.1742916Z .b8 0 2026-02-21T09:05:41.1742997Z .b8 99 // DW_AT_name 2026-02-21T09:05:41.1743053Z .b8 52 2026-02-21T09:05:41.1743112Z .b8 55 2026-02-21T09:05:41.1743164Z .b8 53 2026-02-21T09:05:41.1743231Z .b8 117 2026-02-21T09:05:41.1743287Z .b8 103 2026-02-21T09:05:41.1743425Z .b8 105 2026-02-21T09:05:41.1743478Z .b8 99 2026-02-21T09:05:41.1743529Z .b8 52 2026-02-21T09:05:41.1743587Z .b8 114 2026-02-21T09:05:41.1743639Z .b8 52 2026-02-21T09:05:41.1743691Z .b8 111 2026-02-21T09:05:41.1743743Z .b8 98 2026-02-21T09:05:41.1743799Z .b8 98 2026-02-21T09:05:41.1743854Z .b8 101 2026-02-21T09:05:41.1743907Z .b8 97 2026-02-21T09:05:41.1743964Z .b8 53 2026-02-21T09:05:41.1744018Z .b8 121 2026-02-21T09:05:41.1744071Z .b8 113 2026-02-21T09:05:41.1744125Z .b8 122 2026-02-21T09:05:41.1744183Z .b8 102 2026-02-21T09:05:41.1744235Z .b8 122 2026-02-21T09:05:41.1744287Z .b8 114 2026-02-21T09:05:41.1744341Z .b8 108 2026-02-21T09:05:41.1744398Z .b8 111 2026-02-21T09:05:41.1744451Z .b8 120 2026-02-21T09:05:41.1744505Z .b8 101 2026-02-21T09:05:41.1744618Z .b8 55 2026-02-21T09:05:41.1744720Z .b8 112 2026-02-21T09:05:41.1744779Z .b8 105 2026-02-21T09:05:41.1744831Z .b8 116 2026-02-21T09:05:41.1744890Z .b8 115 2026-02-21T09:05:41.1744944Z .b8 55 2026-02-21T09:05:41.1744997Z .b8 103 2026-02-21T09:05:41.1745073Z .b8 110 2026-02-21T09:05:41.1745130Z .b8 121 2026-02-21T09:05:41.1745184Z .b8 112 2026-02-21T09:05:41.1745237Z .b8 108 2026-02-21T09:05:41.1745295Z .b8 55 2026-02-21T09:05:41.1745350Z .b8 109 
2026-02-21T09:05:41.1745403Z .b8 98 2026-02-21T09:05:41.1745457Z .b8 109 2026-02-21T09:05:41.1745516Z .b8 50 2026-02-21T09:05:41.1745570Z .b8 112 2026-02-21T09:05:41.1745623Z .b8 117 2026-02-21T09:05:41.1745683Z .b8 102 2026-02-21T09:05:41.1745737Z .b8 108 2026-02-21T09:05:41.1745793Z .b8 52 2026-02-21T09:05:41.1745846Z .b8 99 2026-02-21T09:05:41.1745905Z .b8 118 2026-02-21T09:05:41.1745958Z .b8 116 2026-02-21T09:05:41.1746011Z .b8 108 2026-02-21T09:05:41.1746070Z .b8 46 2026-02-21T09:05:41.1746122Z .b8 112 2026-02-21T09:05:41.1746176Z .b8 121 2026-02-21T09:05:41.1746233Z .b8 0 2026-02-21T09:05:41.1746405Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:05:41.1746595Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:05:41.1746654Z .b8 116 2026-02-21T09:05:41.1746715Z .b8 109 2026-02-21T09:05:41.1746780Z .b8 112 2026-02-21T09:05:41.1746833Z .b8 47 2026-02-21T09:05:41.1746888Z .b8 116 2026-02-21T09:05:41.1750366Z .b8 111 2026-02-21T09:05:41.1750453Z .b8 114 2026-02-21T09:05:41.1750516Z .b8 99 2026-02-21T09:05:41.1750572Z .b8 104 2026-02-21T09:05:41.1750624Z .b8 105 2026-02-21T09:05:41.1750681Z .b8 110 2026-02-21T09:05:41.1750732Z .b8 100 2026-02-21T09:05:41.1750785Z .b8 117 2026-02-21T09:05:41.1750840Z .b8 99 2026-02-21T09:05:41.1750899Z .b8 116 2026-02-21T09:05:41.1750954Z .b8 111 2026-02-21T09:05:41.1751005Z .b8 114 2026-02-21T09:05:41.1751058Z .b8 95 2026-02-21T09:05:41.1751117Z .b8 114 2026-02-21T09:05:41.1751170Z .b8 111 2026-02-21T09:05:41.1751223Z .b8 111 2026-02-21T09:05:41.1751281Z .b8 116 2026-02-21T09:05:41.1751338Z .b8 47 2026-02-21T09:05:41.1751394Z .b8 52 2026-02-21T09:05:41.1751447Z .b8 55 2026-02-21T09:05:41.1751506Z .b8 0 2026-02-21T09:05:41.1751645Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T09:05:41.1751753Z .b8 95 // DW_AT_name 2026-02-21T09:05:41.1751820Z .b8 104 2026-02-21T09:05:41.1751875Z .b8 101 2026-02-21T09:05:41.1751928Z .b8 108 2026-02-21T09:05:41.1751979Z .b8 105 2026-02-21T09:05:41.1752040Z .b8 111 2026-02-21T09:05:41.1752092Z 
.b8 110 2026-02-21T09:05:41.1752145Z .b8 95 2026-02-21T09:05:41.1752202Z .b8 109 2026-02-21T09:05:41.1752256Z .b8 97 2026-02-21T09:05:41.1752312Z .b8 116 2026-02-21T09:05:41.1752365Z .b8 109 2026-02-21T09:05:41.1752424Z .b8 117 2026-02-21T09:05:41.1752477Z .b8 108 2026-02-21T09:05:41.1752527Z .b8 95 2026-02-21T09:05:41.1752585Z .b8 98 2026-02-21T09:05:41.1752638Z .b8 102 2026-02-21T09:05:41.1752692Z .b8 49 2026-02-21T09:05:41.1752745Z .b8 54 2026-02-21T09:05:41.1752803Z .b8 95 2026-02-21T09:05:41.1752861Z .b8 105 2026-02-21T09:05:41.1752917Z .b8 110 2026-02-21T09:05:41.1752971Z .b8 116 2026-02-21T09:05:41.1753029Z .b8 52 2026-02-21T09:05:41.1753080Z .b8 0 2026-02-21T09:05:41.1753175Z .b8 1 // DW_AT_inline 2026-02-21T09:05:41.1753442Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T09:05:41.1753548Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T09:05:41.1753653Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T09:05:41.1753764Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:05:41.1753905Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T09:05:41.1754008Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:05:41.1754104Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T09:05:41.1754216Z .b64 $L__tmp32 // DW_AT_high_pc 2026-02-21T09:05:41.1754447Z .b8 1 // DW_AT_call_file 2026-02-21T09:05:41.1754551Z .b8 91 // DW_AT_call_line 2026-02-21T09:05:41.1754653Z .b8 40 // DW_AT_call_column 2026-02-21T09:05:41.1754750Z .b8 0 // End Of Children Mark 2026-02-21T09:05:41.1754836Z .b8 0 // End Of Children Mark 2026-02-21T09:05:41.1754897Z } 2026-02-21T09:05:41.1754969Z .section .debug_macinfo { } 2026-02-21T09:05:41.1754975Z 2026-02-21T09:05:41.1755057Z ================================================================ 2026-02-21T09:05:41.1755187Z please share the reproducer above with Triton project. 
2026-02-21T09:05:41.6699812Z [287s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 256, 32], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['', ''], loop_orders=[[0, 1]], maxnreg=256, num_sm_multiplier=8, num_stages=1, num_warps=8, pid_type='persistent_interleaved', range_flattens=[True, False], range_multi_buffers=[None, True], range_num_stages=[3, 0], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T09:05:41.6701708Z Tensor-likes are not close! 2026-02-21T09:05:41.6701902Z 2026-02-21T09:05:41.6702026Z Mismatched elements: 33483608 / 33554432 (99.8%) 2026-02-21T09:05:41.6702471Z Greatest absolute difference: 2384.0 at index (1818, 910) (up to 0.01 allowed) 2026-02-21T09:05:41.6703017Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:05:41.6703486Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:05:41.6703756Z 2026-02-21T09:05:41.8618998Z 2026-02-21T09:05:41.8620555Z Generation 3: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 101/101 9.2 configs/s 2026-02-21T09:05:49.7509513Z Generation 3: verifying top configs 100% ━━━━━━━━━━━━━━━━ 514/514 64.5 configs/s 2026-02-21T09:05:49.9907715Z [295s] Generation 3 complete: 2026-02-21T09:05:49.9908026Z error=15 2026-02-21T09:05:49.9908238Z ok=89 2026-02-21T09:05:49.9908498Z min=0.3962 2026-02-21T09:05:49.9908684Z mid=0.8502 2026-02-21T09:05:49.9908850Z max=33.5683 2026-02-21T09:05:49.9909051Z best={'block_sizes': [8, 64, 256], 2026-02-21T09:05:49.9909461Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:05:49.9909882Z 'l2_groupings': [32], 2026-02-21T09:05:49.9910114Z 'load_eviction_policies': ['', ''], 2026-02-21T09:05:49.9910381Z 'loop_orders': [[0, 1]], 2026-02-21T09:05:49.9910608Z 'num_sm_multiplier': 4, 2026-02-21T09:05:49.9910825Z 'num_stages': 7, 2026-02-21T09:05:49.9911022Z 'num_warps': 4, 
2026-02-21T09:05:49.9911236Z 'pid_type': 'persistent_interleaved', 2026-02-21T09:05:49.9911539Z 'range_flattens': [False, False], 2026-02-21T09:05:49.9911799Z 'range_multi_buffers': [None, False], 2026-02-21T09:05:49.9912063Z 'range_num_stages': [0, 0], 2026-02-21T09:05:49.9912313Z 'range_unroll_factors': [2, 2], 2026-02-21T09:05:49.9912584Z 'range_warp_specializes': []} 2026-02-21T09:05:49.9944633Z [295s] Fitting surrogate: 425 points, 425 targets 2026-02-21T09:05:51.8020762Z [297s] Generation 4 starting: 107 neighbors, 5 active search path(s) 2026-02-21T09:06:41.7304625Z [347s] Timeout after 30s compiling Config(block_sizes=[32, 256, 64], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=8, num_stages=1, num_warps=1, pid_type='persistent_interleaved', range_flattens=[True, False], range_multi_buffers=[None, True], range_num_stages=[3, 1], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T09:06:43.0013780Z [348s] Timeout after 30s compiling Config(block_sizes=[32, 256, 64], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['', ''], loop_orders=[[0, 1]], maxnreg=256, num_sm_multiplier=8, num_stages=1, num_warps=1, pid_type='persistent_interleaved', range_flattens=[True, False], range_multi_buffers=[True, True], range_num_stages=[3, 2], range_unroll_factors=[3, 3], range_warp_specializes=[]) 2026-02-21T09:06:43.0040547Z Generation 4: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 108/108 0.7 configs/s 2026-02-21T09:06:46.0044824Z [351s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 64, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=128, num_stages=3, num_warps=1, pid_type='persistent_interleaved', 
range_flattens=[False, None], range_multi_buffers=[False, False], range_num_stages=[3, 3], range_unroll_factors=[1, 0], range_warp_specializes=[]) 2026-02-21T09:06:46.0047108Z Tensor-likes are not close! 2026-02-21T09:06:46.0047289Z 2026-02-21T09:06:46.0047406Z Mismatched elements: 33449778 / 33554432 (99.7%) 2026-02-21T09:06:46.0047831Z Greatest absolute difference: 1416.0 at index (1834, 910) (up to 0.01 allowed) 2026-02-21T09:06:46.0048759Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:06:46.0049269Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:06:46.0049530Z 2026-02-21T09:06:46.4661083Z [351s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 64, 128], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=128, num_stages=4, num_warps=2, pid_type='persistent_interleaved', range_flattens=[False, True], range_multi_buffers=[False, False], range_num_stages=[4, 3], range_unroll_factors=[1, 0], range_warp_specializes=[]) 2026-02-21T09:06:46.4662938Z Tensor-likes are not close! 2026-02-21T09:06:46.4663112Z 2026-02-21T09:06:46.4663233Z Mismatched elements: 33485505 / 33554432 (99.8%) 2026-02-21T09:06:46.4663668Z Greatest absolute difference: 2560.0 at index (1834, 918) (up to 0.01 allowed) 2026-02-21T09:06:46.4664213Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:06:46.4664682Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:06:46.4664938Z 2026-02-21T09:06:46.6685509Z [352s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:06:46.6687693Z Config: @helion.kernel(config=helion.Config(block_sizes=[8, 64, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=128, num_stages=4, num_warps=4, pid_type='persistent_interleaved', range_flattens=[False, True], range_multi_buffers=[False, False], range_num_stages=[3, 3], range_unroll_factors=[1, 0], range_warp_specializes=[]), static_shapes=True) 2026-02-21T09:06:46.6689292Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:06:46.6689592Z `ptxas` stderr: 2026-02-21T09:06:46.6690182Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 457 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T09:06:46.6690846Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:06:46.6691311Z 2026-02-21T09:06:46.6691841Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp48kj7nh1.ptx -o /tmp/tmp48kj7nh1.ptx.o 2026-02-21T09:06:46.6692431Z 2026-02-21T09:06:46.6692594Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:06:46.6692855Z 2026-02-21T09:06:46.6692953Z 2026-02-21T09:06:46.6693043Z ================================================================ 2026-02-21T09:06:46.6693300Z Internal Triton PTX codegen error 2026-02-21T09:06:46.6693497Z `ptxas` stderr: 2026-02-21T09:06:46.6694198Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 457 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 
2026-02-21T09:06:46.6694985Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:06:46.6695178Z 2026-02-21T09:06:46.6695687Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp48kj7nh1.ptx -o /tmp/tmp48kj7nh1.ptx.o 2026-02-21T09:06:46.6696272Z 2026-02-21T09:06:46.6696276Z 2026-02-21T09:06:46.6696341Z // 2026-02-21T09:06:46.6696686Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:06:46.6696897Z // 2026-02-21T09:06:46.6696970Z 2026-02-21T09:06:46.6697029Z .version 8.7 2026-02-21T09:06:46.6697183Z .target sm_90a 2026-02-21T09:06:46.6697331Z .address_size 64 2026-02-21T09:06:46.6697436Z 2026-02-21T09:06:46.6697614Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T09:06:46.6697964Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:06:46.6698217Z // @_helion_matmul_bf16_int4 2026-02-21T09:06:46.6698611Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T09:06:46.6698914Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T09:06:46.6699280Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T09:06:46.6699627Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T09:06:46.6699971Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T09:06:46.6700321Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T09:06:46.6700589Z ) 2026-02-21T09:06:46.6700723Z .reqntid 128 2026-02-21T09:06:46.6700863Z .maxnreg 32 2026-02-21T09:06:46.6700999Z { 2026-02-21T09:06:46.6701127Z .reg .pred %p<29>; 2026-02-21T09:06:46.6701295Z .reg .b16 %rs<49>; 2026-02-21T09:06:46.6701446Z .reg .b32 %r<785>; 2026-02-21T09:06:46.6701605Z .reg .b64 %rd<62>; 2026-02-21T09:06:46.6701762Z $L__func_begin0: 2026-02-21T09:06:46.6701855Z 2026-02-21T09:06:46.6701915Z // %bb.0: 2026-02-21T09:06:46.6702209Z .loc 1 21 67 // 
c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:21:67 2026-02-21T09:06:46.6702568Z mov.u32 %r1, %ctaid.x; 2026-02-21T09:06:46.6702797Z ld.param.b64 %rd15, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T09:06:46.6703048Z mov.u32 %r164, %ctaid.y; 2026-02-21T09:06:46.6703272Z ld.param.b64 %rd32, [_helion_matmul_bf16_int4_param_3]; 2026-02-21T09:06:46.6703515Z mov.u32 %r165, %ctaid.z; 2026-02-21T09:06:46.6703691Z mov.u32 %r166, %nctaid.x; 2026-02-21T09:06:46.6703881Z mov.u32 %r167, %nctaid.y; 2026-02-21T09:06:46.6704064Z mad.lo.s32 %r168, %r165, %r167, %r164; 2026-02-21T09:06:46.6704281Z mad.lo.s32 %r169, %r168, %r166, %r1; 2026-02-21T09:06:46.6704474Z shl.b32 %r170, %r169, 7; 2026-02-21T09:06:46.6704652Z cvt.s64.s32 %rd33, %r170; 2026-02-21T09:06:46.6704827Z add.s64 %rd29, %rd32, %rd33; 2026-02-21T09:06:46.6705011Z mov.u32 %r2, %tid.x; 2026-02-21T09:06:46.6705173Z setp.lt.u32 %p1, %r2, 32; 2026-02-21T09:06:46.6705356Z shl.b32 %r171, %r2, 2; 2026-02-21T09:06:46.6705525Z mov.b32 %r172, global_smem; 2026-02-21T09:06:46.6705710Z add.s32 %r156, %r172, %r171; 2026-02-21T09:06:46.6705890Z mov.b32 %r157, 0; 2026-02-21T09:06:46.6706139Z // begin inline asm 2026-02-21T09:06:46.6706321Z @%p1 st.shared.b32 [ %r156 + 0 ], %r157; 2026-02-21T09:06:46.6706665Z // end inline asm 2026-02-21T09:06:46.6706855Z bar.warp.sync -1; 2026-02-21T09:06:46.6707023Z setp.eq.b32 %p2, %r2, 0; 2026-02-21T09:06:46.6707200Z cvt.u64.u32 %rd14, %r172; 2026-02-21T09:06:46.6707367Z // begin inline asm 2026-02-21T09:06:46.6707693Z @%p2 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd14 + 0 ], %rd15; 2026-02-21T09:06:46.6708044Z // end inline asm 2026-02-21T09:06:46.6708191Z // begin inline asm 2026-02-21T09:06:46.6708555Z @%p2 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd14 + 0 ], 0x1; 2026-02-21T09:06:46.6708860Z // end inline asm 2026-02-21T09:06:46.6709012Z mov.b32 %r158, 64; 2026-02-21T09:06:46.6709312Z // begin inline asm 2026-02-21T09:06:46.6709616Z @%p2 
tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd14 + 0 ], 0x0, %r158; 2026-02-21T09:06:46.6709943Z // end inline asm 2026-02-21T09:06:46.6710101Z // begin inline asm 2026-02-21T09:06:46.6710376Z @%p2 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd14 + 0 ], 0x1, %r158; 2026-02-21T09:06:46.6710694Z // end inline asm 2026-02-21T09:06:46.6710847Z mov.b32 %r160, 8192; 2026-02-21T09:06:46.6711004Z // begin inline asm 2026-02-21T09:06:46.6711295Z @%p2 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd14 + 0 ], 0x0, %r160; 2026-02-21T09:06:46.6711633Z // end inline asm 2026-02-21T09:06:46.6711800Z mov.b32 %r161, 4096; 2026-02-21T09:06:46.6711960Z // begin inline asm 2026-02-21T09:06:46.6712252Z @%p2 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd14 + 0 ], 0x1, %r161; 2026-02-21T09:06:46.6712601Z // end inline asm 2026-02-21T09:06:46.6712755Z mov.b64 %rd22, 16384; 2026-02-21T09:06:46.6712920Z // begin inline asm 2026-02-21T09:06:46.6713301Z @%p2 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd14 + 0 ], 0x0, %rd22; 2026-02-21T09:06:46.6713655Z // end inline asm 2026-02-21T09:06:46.6713808Z mov.b32 %r720, 1; 2026-02-21T09:06:46.6713957Z // begin inline asm 2026-02-21T09:06:46.6714267Z @%p2 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd14 + 0 ], 0x0, %r720; 2026-02-21T09:06:46.6714631Z // end inline asm 2026-02-21T09:06:46.6714789Z // begin inline asm 2026-02-21T09:06:46.6715093Z @%p2 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd14 + 0 ], 0x1, %r720; 2026-02-21T09:06:46.6715438Z // end inline asm 2026-02-21T09:06:46.6715591Z // begin inline asm 2026-02-21T09:06:46.6715859Z @%p2 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd14 + 0 ], 0xa; 2026-02-21T09:06:46.6716181Z // end inline asm 2026-02-21T09:06:46.6716324Z // begin inline asm 2026-02-21T09:06:46.6716777Z @%p2 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd14 + 0 ], 0x0; 
2026-02-21T09:06:46.6717132Z // end inline asm 2026-02-21T09:06:46.6717279Z // begin inline asm 2026-02-21T09:06:46.6717573Z @%p2 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd14 + 0 ], 0x3; 2026-02-21T09:06:46.6717899Z // end inline asm 2026-02-21T09:06:46.6718048Z // begin inline asm 2026-02-21T09:06:46.6718311Z @%p2 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd14 + 0 ], 0x0; 2026-02-21T09:06:46.6718625Z // end inline asm 2026-02-21T09:06:46.6718774Z // begin inline asm 2026-02-21T09:06:46.6719201Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd29 + 0 ], [ %rd14 + 0 ], 0x80; 2026-02-21T09:06:46.6719678Z // end inline asm 2026-02-21T09:06:46.6719821Z // begin inline asm 2026-02-21T09:06:46.6720064Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd29 + 0 ], 0x80; 2026-02-21T09:06:46.6720364Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:06:46.6720590Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:06:46.6720798Z // end inline asm 2026-02-21T09:06:46.6720948Z bar.sync 0; 2026-02-21T09:06:46.6721124Z cvta.global.u64 %rd58, %rd29; 2026-02-21T09:06:46.6721463Z .loc 1 26 145 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:26:145 2026-02-21T09:06:46.6721927Z setp.gt.u32 %p19, %r1, 4095; 2026-02-21T09:06:46.6722107Z @%p19 bra $L__BB0_4; 2026-02-21T09:06:46.6722293Z // %bb.1: // %.lr.ph 2026-02-21T09:06:46.6722654Z .loc 1 0 145 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:0:145 2026-02-21T09:06:46.6723077Z ld.param.b64 %rd13, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T09:06:46.6723374Z ld.param.b64 %rd12, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T09:06:46.6723615Z shr.u32 %r3, %r2, 5; 2026-02-21T09:06:46.6723920Z .loc 1 54 38 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:54:38 2026-02-21T09:06:46.6724340Z and.b32 %r188, %r2, 3; 2026-02-21T09:06:46.6724577Z shl.b32 %r189, %r188, 2; 2026-02-21T09:06:46.6724880Z .loc 1 48 48 // 
c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:48:48 2026-02-21T09:06:46.6725231Z bfe.u32 %r190, %r2, 4, 3; 2026-02-21T09:06:46.6725537Z .loc 1 40 45 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:40:45 2026-02-21T09:06:46.6725876Z bfe.u32 %r191, %r2, 2, 5; 2026-02-21T09:06:46.6726181Z .loc 1 38 45 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:38:45 2026-02-21T09:06:46.6726662Z and.b32 %r192, %r2, 15; 2026-02-21T09:06:46.6726847Z shl.b32 %r193, %r192, 3; 2026-02-21T09:06:46.6727014Z shl.b32 %r194, %r2, 3; 2026-02-21T09:06:46.6727186Z and.b32 %r195, %r194, 888; 2026-02-21T09:06:46.6727363Z and.b32 %r196, %r2, 16; 2026-02-21T09:06:46.6727530Z bfe.s32 %r197, %r2, 4, 1; 2026-02-21T09:06:46.6727705Z and.b32 %r198, %r197, 136; 2026-02-21T09:06:46.6727877Z xor.b32 %r4, %r198, %r195; 2026-02-21T09:06:46.6728144Z add.s32 %r200, %r172, %r4; 2026-02-21T09:06:46.6728325Z add.s32 %r173, %r200, 24576; 2026-02-21T09:06:46.6728508Z add.s32 %r175, %r200, 25600; 2026-02-21T09:06:46.6728683Z shl.b32 %r201, %r190, 13; 2026-02-21T09:06:46.6728854Z and.b32 %r5, %r2, 127; 2026-02-21T09:06:46.6729016Z shl.b32 %r202, %r5, 3; 2026-02-21T09:06:46.6729185Z add.s32 %r203, %r172, %r202; 2026-02-21T09:06:46.6729358Z add.s32 %r177, %r203, 28672; 2026-02-21T09:06:46.6729534Z add.s32 %r179, %r200, 26624; 2026-02-21T09:06:46.6729710Z add.s32 %r181, %r200, 27648; 2026-02-21T09:06:46.6729877Z add.s32 %r183, %r203, 29696; 2026-02-21T09:06:46.6730055Z shl.b32 %r204, %r2, 4; 2026-02-21T09:06:46.6730214Z and.b32 %r205, %r204, 1536; 2026-02-21T09:06:46.6730391Z and.b32 %r206, %r194, 96; 2026-02-21T09:06:46.6730553Z shl.b32 %r207, %r188, 1; 2026-02-21T09:06:46.6730725Z or.b32 %r208, %r205, %r206; 2026-02-21T09:06:46.6730894Z or.b32 %r209, %r208, %r207; 2026-02-21T09:06:46.6731088Z or.b32 %r7, %r209, %r198; 2026-02-21T09:06:46.6731255Z xor.b32 %r8, %r7, 8; 2026-02-21T09:06:46.6731415Z or.b32 %r9, %r2, 896; 2026-02-21T09:06:46.6731578Z shl.b32 %r210, %r5, 6; 
2026-02-21T09:06:46.6731739Z and.b32 %r211, %r194, 48; 2026-02-21T09:06:46.6731908Z or.b32 %r212, %r210, %r211; 2026-02-21T09:06:46.6732079Z add.s32 %r213, %r172, 16384; 2026-02-21T09:06:46.6732256Z add.s32 %r10, %r213, %r212; 2026-02-21T09:06:46.6732425Z xor.b32 %r214, %r212, 16; 2026-02-21T09:06:46.6732596Z add.s32 %r11, %r213, %r214; 2026-02-21T09:06:46.6732769Z xor.b32 %r215, %r212, 32; 2026-02-21T09:06:46.6732947Z add.s32 %r12, %r213, %r215; 2026-02-21T09:06:46.6733125Z xor.b32 %r216, %r212, 48; 2026-02-21T09:06:46.6733304Z add.s32 %r13, %r213, %r216; 2026-02-21T09:06:46.6733484Z bfe.u32 %r217, %r213, 4, 14; 2026-02-21T09:06:46.6733658Z cvt.u64.u32 %rd41, %r217; 2026-02-21T09:06:46.6733852Z or.b64 %rd53, %rd41, -9223371899382267904; 2026-02-21T09:06:46.6734065Z add.s32 %r218, %r172, 16416; 2026-02-21T09:06:46.6734247Z bfe.u32 %r219, %r218, 4, 14; 2026-02-21T09:06:46.6734420Z cvt.u64.u32 %rd42, %r219; 2026-02-21T09:06:46.6734618Z or.b64 %rd54, %rd42, -9223371899382267904; 2026-02-21T09:06:46.6734907Z shl.b32 %r220, %r192, 7; 2026-02-21T09:06:46.6735081Z shl.b32 %r221, %r2, 6; 2026-02-21T09:06:46.6735247Z and.b32 %r222, %r221, 6144; 2026-02-21T09:06:46.6735421Z and.b32 %r223, %r204, 112; 2026-02-21T09:06:46.6735600Z or.b32 %r224, %r220, %r223; 2026-02-21T09:06:46.6735769Z xor.b32 %r225, %r224, %r196; 2026-02-21T09:06:46.6735948Z or.b32 %r226, %r225, %r222; 2026-02-21T09:06:46.6736118Z add.s32 %r14, %r172, %r226; 2026-02-21T09:06:46.6736299Z add.s32 %r15, %r14, 8192; 2026-02-21T09:06:46.6736590Z xor.b32 %r227, %r226, 32; 2026-02-21T09:06:46.6736772Z add.s32 %r16, %r172, %r227; 2026-02-21T09:06:46.6736947Z add.s32 %r17, %r16, 8192; 2026-02-21T09:06:46.6737123Z xor.b32 %r228, %r226, 64; 2026-02-21T09:06:46.6737297Z add.s32 %r18, %r172, %r228; 2026-02-21T09:06:46.6737646Z add.s32 %r19, %r18, 8192; 2026-02-21T09:06:46.6737822Z xor.b32 %r229, %r226, 96; 2026-02-21T09:06:46.6737988Z add.s32 %r20, %r172, %r229; 2026-02-21T09:06:46.6738163Z add.s32 %r21, %r20, 8192; 
2026-02-21T09:06:46.6738474Z .loc 1 39 27 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:39:27 2026-02-21T09:06:46.6738829Z and.b32 %r679, %r1, 4032; 2026-02-21T09:06:46.6739145Z .loc 1 37 27 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:37:27 2026-02-21T09:06:46.6739493Z shl.b32 %r230, %r1, 7; 2026-02-21T09:06:46.6739662Z and.b32 %r23, %r230, 8064; 2026-02-21T09:06:46.6739965Z .loc 1 38 32 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:38:32 2026-02-21T09:06:46.6740317Z or.b32 %r231, %r23, %r193; 2026-02-21T09:06:46.6740622Z .loc 1 40 32 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:40:32 2026-02-21T09:06:46.6740962Z or.b32 %r232, %r679, %r191; 2026-02-21T09:06:46.6741345Z .loc 1 55 53 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:55:53 2026-02-21T09:06:46.6741695Z shl.b32 %r233, %r232, 10; 2026-02-21T09:06:46.6741996Z .loc 1 55 60 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:55:60 2026-02-21T09:06:46.6742332Z or.b32 %r234, %r233, %r189; 2026-02-21T09:06:46.6742638Z .loc 1 55 32 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:55:32 2026-02-21T09:06:46.6742987Z mad.wide.u32 %rd34, %r234, 2, %rd12; 2026-02-21T09:06:46.6743207Z add.s64 %rd35, %rd34, 65536; 2026-02-21T09:06:46.6743378Z mov.b32 %r174, 8; 2026-02-21T09:06:46.6743666Z .loc 1 55 80 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:55:80 2026-02-21T09:06:46.6744011Z // begin inline asm 2026-02-21T09:06:46.6744247Z cp.async.ca.shared.global [ %r173 + 0 ], [ %rd34 + 0 ], 0x8, %r174; 2026-02-21T09:06:46.6744525Z // end inline asm 2026-02-21T09:06:46.6744682Z // begin inline asm 2026-02-21T09:06:46.6744914Z cp.async.ca.shared.global [ %r175 + 0 ], [ %rd35 + 0 ], 0x8, %r174; 2026-02-21T09:06:46.6745179Z // end inline asm 2026-02-21T09:06:46.6745341Z cp.async.commit_group; 2026-02-21T09:06:46.6745650Z .loc 1 61 62 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:61:62 2026-02-21T09:06:46.6745996Z or.b32 
%r235, %r231, %r201; 2026-02-21T09:06:46.6746304Z .loc 1 61 34 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:61:34 2026-02-21T09:06:46.6746773Z cvt.u64.u32 %rd43, %r235; 2026-02-21T09:06:46.6746954Z add.s64 %rd36, %rd13, %rd43; 2026-02-21T09:06:46.6747260Z .loc 1 61 87 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:61:87 2026-02-21T09:06:46.6747603Z // begin inline asm 2026-02-21T09:06:46.6747827Z cp.async.ca.shared.global [ %r177 + 0 ], [ %rd36 + 0 ], 0x8, %r174; 2026-02-21T09:06:46.6748119Z // end inline asm 2026-02-21T09:06:46.6748363Z cp.async.commit_group; 2026-02-21T09:06:46.6748676Z .loc 1 55 32 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:55:32 2026-02-21T09:06:46.6749027Z add.s64 %rd37, %rd34, 32; 2026-02-21T09:06:46.6749299Z or.b32 %r236, %r234, 16; 2026-02-21T09:06:46.6749486Z mad.wide.u32 %rd44, %r236, 2, %rd12; 2026-02-21T09:06:46.6749686Z add.s64 %rd38, %rd44, 65536; 2026-02-21T09:06:46.6749999Z .loc 1 55 80 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:55:80 2026-02-21T09:06:46.6750337Z bar.sync 0; 2026-02-21T09:06:46.6750502Z // begin inline asm 2026-02-21T09:06:46.6750738Z cp.async.ca.shared.global [ %r179 + 0 ], [ %rd37 + 0 ], 0x8, %r174; 2026-02-21T09:06:46.6751003Z // end inline asm 2026-02-21T09:06:46.6751158Z // begin inline asm 2026-02-21T09:06:46.6751379Z cp.async.ca.shared.global [ %r181 + 0 ], [ %rd38 + 0 ], 0x8, %r174; 2026-02-21T09:06:46.6751654Z // end inline asm 2026-02-21T09:06:46.6751894Z cp.async.commit_group; 2026-02-21T09:06:46.6752286Z .loc 1 61 34 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:61:34 2026-02-21T09:06:46.6752637Z add.s64 %rd39, %rd36, 65536; 2026-02-21T09:06:46.6752952Z .loc 1 61 87 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:61:87 2026-02-21T09:06:46.6753293Z // begin inline asm 2026-02-21T09:06:46.6753527Z cp.async.ca.shared.global [ %r183 + 0 ], [ %rd39 + 0 ], 0x8, %r174; 2026-02-21T09:06:46.6753795Z // end inline asm 
2026-02-21T09:06:46.6753946Z cp.async.commit_group; 2026-02-21T09:06:46.6754256Z .loc 1 47 102 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:47:102 2026-02-21T09:06:46.6754608Z mul.wide.u32 %rd45, %r188, 8; 2026-02-21T09:06:46.6754793Z shl.b32 %r237, %r1, 10; 2026-02-21T09:06:46.6754969Z and.b32 %r238, %r237, 4128768; 2026-02-21T09:06:46.6755149Z shl.b32 %r239, %r191, 10; 2026-02-21T09:06:46.6755325Z or.b32 %r240, %r238, %r239; 2026-02-21T09:06:46.6755503Z mul.wide.u32 %rd46, %r240, 2; 2026-02-21T09:06:46.6755770Z or.b64 %rd47, %rd45, %rd46; 2026-02-21T09:06:46.6755949Z add.s64 %rd48, %rd47, %rd12; 2026-02-21T09:06:46.6756128Z add.s64 %rd60, %rd48, 65600; 2026-02-21T09:06:46.6756308Z mul.wide.u32 %rd49, %r190, 8192; 2026-02-21T09:06:46.6756640Z cvt.u64.u32 %rd50, %r231; 2026-02-21T09:06:46.6756817Z or.b64 %rd51, %rd49, %rd50; 2026-02-21T09:06:46.6756997Z add.s64 %rd52, %rd51, %rd13; 2026-02-21T09:06:46.6757178Z add.s64 %rd59, %rd52, 131072; 2026-02-21T09:06:46.6757356Z mov.b32 %r721, 0f00000000; 2026-02-21T09:06:46.6757532Z mov.b32 %r719, -1; 2026-02-21T09:06:46.6757685Z mov.b64 %rd61, -8; 2026-02-21T09:06:46.6757845Z mov.b32 %r722, %r721; 2026-02-21T09:06:46.6758006Z mov.b32 %r723, %r721; 2026-02-21T09:06:46.6758166Z mov.b32 %r724, %r721; 2026-02-21T09:06:46.6758319Z mov.b32 %r725, %r721; 2026-02-21T09:06:46.6758492Z mov.b32 %r726, %r721; 2026-02-21T09:06:46.6758649Z mov.b32 %r727, %r721; 2026-02-21T09:06:46.6758811Z mov.b32 %r728, %r721; 2026-02-21T09:06:46.6758972Z mov.b32 %r729, %r721; 2026-02-21T09:06:46.6759126Z mov.b32 %r730, %r721; 2026-02-21T09:06:46.6759287Z mov.b32 %r731, %r721; 2026-02-21T09:06:46.6759440Z mov.b32 %r732, %r721; 2026-02-21T09:06:46.6759608Z mov.b32 %r733, %r721; 2026-02-21T09:06:46.6759763Z mov.b32 %r734, %r721; 2026-02-21T09:06:46.6759922Z mov.b32 %r735, %r721; 2026-02-21T09:06:46.6760078Z mov.b32 %r736, %r721; 2026-02-21T09:06:46.6760237Z mov.b32 %r737, %r721; 2026-02-21T09:06:46.6760389Z mov.b32 %r738, %r721; 
2026-02-21T09:06:46.6760554Z mov.b32 %r739, %r721; 2026-02-21T09:06:46.6760716Z mov.b32 %r740, %r721; 2026-02-21T09:06:46.6760869Z mov.b32 %r741, %r721; 2026-02-21T09:06:46.6761045Z mov.b32 %r742, %r721; 2026-02-21T09:06:46.6761203Z mov.b32 %r743, %r721; 2026-02-21T09:06:46.6761368Z mov.b32 %r744, %r721; 2026-02-21T09:06:46.6761520Z mov.b32 %r745, %r721; 2026-02-21T09:06:46.6761677Z mov.b32 %r746, %r721; 2026-02-21T09:06:46.6761830Z mov.b32 %r747, %r721; 2026-02-21T09:06:46.6761991Z mov.b32 %r748, %r721; 2026-02-21T09:06:46.6762143Z mov.b32 %r749, %r721; 2026-02-21T09:06:46.6762300Z mov.b32 %r750, %r721; 2026-02-21T09:06:46.6762452Z mov.b32 %r751, %r721; 2026-02-21T09:06:46.6762700Z mov.b32 %r752, %r721; 2026-02-21T09:06:46.6762856Z mov.b32 %r753, %r721; 2026-02-21T09:06:46.6763009Z mov.b32 %r754, %r721; 2026-02-21T09:06:46.6763172Z mov.b32 %r755, %r721; 2026-02-21T09:06:46.6763340Z mov.b32 %r756, %r721; 2026-02-21T09:06:46.6763502Z mov.b32 %r757, %r721; 2026-02-21T09:06:46.6763654Z mov.b32 %r758, %r721; 2026-02-21T09:06:46.6763810Z mov.b32 %r759, %r721; 2026-02-21T09:06:46.6763961Z mov.b32 %r760, %r721; 2026-02-21T09:06:46.6764119Z mov.b32 %r761, %r721; 2026-02-21T09:06:46.6764271Z mov.b32 %r762, %r721; 2026-02-21T09:06:46.6764433Z mov.b32 %r763, %r721; 2026-02-21T09:06:46.6764598Z mov.b32 %r764, %r721; 2026-02-21T09:06:46.6764750Z mov.b32 %r765, %r721; 2026-02-21T09:06:46.6764924Z mov.b32 %r766, %r721; 2026-02-21T09:06:46.6765212Z mov.b32 %r767, %r721; 2026-02-21T09:06:46.6765378Z mov.b32 %r768, %r721; 2026-02-21T09:06:46.6765531Z mov.b32 %r769, %r721; 2026-02-21T09:06:46.6765689Z mov.b32 %r770, %r721; 2026-02-21T09:06:46.6765844Z mov.b32 %r771, %r721; 2026-02-21T09:06:46.6766002Z mov.b32 %r772, %r721; 2026-02-21T09:06:46.6766153Z mov.b32 %r773, %r721; 2026-02-21T09:06:46.6766310Z mov.b32 %r774, %r721; 2026-02-21T09:06:46.6766599Z mov.b32 %r775, %r721; 2026-02-21T09:06:46.6766760Z mov.b32 %r776, %r721; 2026-02-21T09:06:46.6766921Z mov.b32 %r777, %r721; 
2026-02-21T09:06:46.6767073Z mov.b32 %r778, %r721; 2026-02-21T09:06:46.6767233Z mov.b32 %r779, %r721; 2026-02-21T09:06:46.6767389Z mov.b32 %r780, %r721; 2026-02-21T09:06:46.6767548Z mov.b32 %r781, %r721; 2026-02-21T09:06:46.6767703Z mov.b32 %r782, %r721; 2026-02-21T09:06:46.6767863Z mov.b32 %r783, %r721; 2026-02-21T09:06:46.6768019Z mov.b32 %r784, %r721; 2026-02-21T09:06:46.6768242Z $L__BB0_2: // =>This Inner Loop Header: Depth=1 2026-02-21T09:06:46.6768607Z add.s64 %rd61, %rd61, 8; 2026-02-21T09:06:46.6768796Z setp.lt.u64 %p22, %rd61, 496; 2026-02-21T09:06:46.6768988Z add.s32 %r645, %r719, 1; 2026-02-21T09:06:46.6769168Z setp.gt.s32 %p23, %r645, 1; 2026-02-21T09:06:46.6769362Z selp.b32 %r719, 0, %r645, %p23; 2026-02-21T09:06:46.6769691Z .loc 1 55 80 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:55:80 2026-02-21T09:06:46.6770052Z cp.async.wait_group 2; 2026-02-21T09:06:46.6770221Z bar.sync 0; 2026-02-21T09:06:46.6770374Z shl.b32 %r646, %r719, 10; 2026-02-21T09:06:46.6770550Z shl.b32 %r647, %r719, 11; 2026-02-21T09:06:46.6770721Z add.s32 %r649, %r172, 24576; 2026-02-21T09:06:46.6770902Z add.s32 %r650, %r649, %r647; 2026-02-21T09:06:46.6771209Z .loc 1 59 32 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:59:32 2026-02-21T09:06:46.6771560Z add.s32 %r651, %r650, %r7; 2026-02-21T09:06:46.6771740Z ld.shared.b16 %rs1, [%r651]; 2026-02-21T09:06:46.6771929Z ld.shared.b16 %rs2, [%r651+256]; 2026-02-21T09:06:46.6772127Z ld.shared.b16 %rs3, [%r651+16]; 2026-02-21T09:06:46.6772320Z ld.shared.b16 %rs4, [%r651+272]; 2026-02-21T09:06:46.6772514Z add.s32 %r652, %r650, %r8; 2026-02-21T09:06:46.6772688Z ld.shared.b16 %rs5, [%r652]; 2026-02-21T09:06:46.6772869Z ld.shared.b16 %rs6, [%r652+256]; 2026-02-21T09:06:46.6773055Z ld.shared.b16 %rs7, [%r652+16]; 2026-02-21T09:06:46.6773255Z ld.shared.b16 %rs8, [%r652+272]; 2026-02-21T09:06:46.6773443Z cvt.f32.bf16 %r369, %rs1; 2026-02-21T09:06:46.6773614Z cvt.f32.bf16 %r370, %rs2; 2026-02-21T09:06:46.6773781Z 
cvt.f32.bf16 %r371, %rs5; 2026-02-21T09:06:46.6773952Z cvt.f32.bf16 %r372, %rs6; 2026-02-21T09:06:46.6774118Z cvt.f32.bf16 %r501, %rs3; 2026-02-21T09:06:46.6774292Z cvt.f32.bf16 %r502, %rs4; 2026-02-21T09:06:46.6774467Z cvt.f32.bf16 %r503, %rs7; 2026-02-21T09:06:46.6774632Z cvt.f32.bf16 %r504, %rs8; 2026-02-21T09:06:46.6774944Z .loc 1 61 87 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:61:87 2026-02-21T09:06:46.6775285Z add.s32 %r653, %r172, %r646; 2026-02-21T09:06:46.6775465Z add.s32 %r654, %r653, 28672; 2026-02-21T09:06:46.6775849Z .loc 1 74 45 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:74:45 2026-02-21T09:06:46.6776209Z add.s32 %r655, %r654, %r5; 2026-02-21T09:06:46.6776379Z add.s32 %r656, %r654, %r9; 2026-02-21T09:06:46.6776799Z .loc 1 66 25 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:66:25 2026-02-21T09:06:46.6777150Z ld.shared.s8 %rs9, [%r655]; 2026-02-21T09:06:46.6777457Z .loc 1 64 28 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:64:28 2026-02-21T09:06:46.6777803Z shl.b16 %rs10, %rs9, 4; 2026-02-21T09:06:46.6778101Z .loc 1 66 25 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:66:25 2026-02-21T09:06:46.6778454Z ld.shared.s8 %rs11, [%r655+128]; 2026-02-21T09:06:46.6778930Z .loc 1 64 28 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:64:28 2026-02-21T09:06:46.6779282Z shl.b16 %rs12, %rs11, 4; 2026-02-21T09:06:46.6779587Z .loc 1 66 25 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:66:25 2026-02-21T09:06:46.6779932Z ld.shared.s8 %rs13, [%r655+256]; 2026-02-21T09:06:46.6780253Z .loc 1 64 28 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:64:28 2026-02-21T09:06:46.6780593Z shl.b16 %rs14, %rs13, 4; 2026-02-21T09:06:46.6780900Z .loc 1 66 25 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:66:25 2026-02-21T09:06:46.6781253Z ld.shared.s8 %rs15, [%r655+384]; 2026-02-21T09:06:46.6781570Z .loc 1 64 28 // 
c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:64:28 2026-02-21T09:06:46.6781920Z shl.b16 %rs16, %rs15, 4; 2026-02-21T09:06:46.6782235Z .loc 1 66 25 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:66:25 2026-02-21T09:06:46.6782659Z ld.shared.s8 %rs17, [%r655+512]; 2026-02-21T09:06:46.6782978Z .loc 1 64 28 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:64:28 2026-02-21T09:06:46.6783326Z shl.b16 %rs18, %rs17, 4; 2026-02-21T09:06:46.6783632Z .loc 1 66 25 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:66:25 2026-02-21T09:06:46.6783979Z ld.shared.s8 %rs19, [%r655+640]; 2026-02-21T09:06:46.6784300Z .loc 1 64 28 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:64:28 2026-02-21T09:06:46.6784637Z shl.b16 %rs20, %rs19, 4; 2026-02-21T09:06:46.6784939Z .loc 1 66 25 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:66:25 2026-02-21T09:06:46.6785280Z ld.shared.s8 %rs21, [%r655+768]; 2026-02-21T09:06:46.6785602Z .loc 1 64 28 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:64:28 2026-02-21T09:06:46.6785947Z shl.b16 %rs22, %rs21, 4; 2026-02-21T09:06:46.6786246Z .loc 1 66 25 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:66:25 2026-02-21T09:06:46.6786740Z ld.shared.s8 %rs23, [%r656]; 2026-02-21T09:06:46.6787046Z .loc 1 64 28 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:64:28 2026-02-21T09:06:46.6787387Z shl.b16 %rs24, %rs23, 4; 2026-02-21T09:06:46.6787682Z .loc 1 66 25 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:66:25 2026-02-21T09:06:46.6788030Z cvt.s16.s8 %rs25, %rs10; 2026-02-21T09:06:46.6788199Z shr.s16 %rs26, %rs25, 4; 2026-02-21T09:06:46.6788447Z cvt.s16.s8 %rs27, %rs12; 2026-02-21T09:06:46.6788620Z shr.s16 %rs28, %rs27, 4; 2026-02-21T09:06:46.6788783Z shr.s16 %rs29, %rs9, 4; 2026-02-21T09:06:46.6788952Z shr.s16 %rs30, %rs11, 4; 2026-02-21T09:06:46.6789245Z .loc 1 84 32 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:84:32 2026-02-21T09:06:46.6789599Z 
cvt.rn.f32.s16 %r657, %rs30; 2026-02-21T09:06:46.6789777Z cvt.rn.f32.s16 %r658, %rs29; 2026-02-21T09:06:46.6789962Z cvt.rn.f32.s16 %r659, %rs28; 2026-02-21T09:06:46.6790228Z cvt.rn.f32.s16 %r660, %rs26; 2026-02-21T09:06:46.6790539Z .loc 1 66 25 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:66:25 2026-02-21T09:06:46.6790912Z cvt.s16.s8 %rs31, %rs14; 2026-02-21T09:06:46.6791088Z shr.s16 %rs32, %rs31, 4; 2026-02-21T09:06:46.6791261Z cvt.s16.s8 %rs33, %rs16; 2026-02-21T09:06:46.6791428Z shr.s16 %rs34, %rs33, 4; 2026-02-21T09:06:46.6791604Z shr.s16 %rs35, %rs13, 4; 2026-02-21T09:06:46.6791768Z shr.s16 %rs36, %rs15, 4; 2026-02-21T09:06:46.6792080Z .loc 1 84 32 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:84:32 2026-02-21T09:06:46.6792433Z cvt.rn.f32.s16 %r661, %rs36; 2026-02-21T09:06:46.6792611Z cvt.rn.f32.s16 %r662, %rs35; 2026-02-21T09:06:46.6792905Z cvt.rn.f32.s16 %r663, %rs34; 2026-02-21T09:06:46.6793143Z cvt.rn.f32.s16 %r664, %rs32; 2026-02-21T09:06:46.6793459Z .loc 1 66 25 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:66:25 2026-02-21T09:06:46.6793802Z cvt.s16.s8 %rs37, %rs18; 2026-02-21T09:06:46.6793978Z shr.s16 %rs38, %rs37, 4; 2026-02-21T09:06:46.6794146Z cvt.s16.s8 %rs39, %rs20; 2026-02-21T09:06:46.6794326Z shr.s16 %rs40, %rs39, 4; 2026-02-21T09:06:46.6794500Z shr.s16 %rs41, %rs17, 4; 2026-02-21T09:06:46.6794664Z shr.s16 %rs42, %rs19, 4; 2026-02-21T09:06:46.6794968Z .loc 1 84 32 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:84:32 2026-02-21T09:06:46.6795311Z cvt.rn.f32.s16 %r665, %rs42; 2026-02-21T09:06:46.6795494Z cvt.rn.f32.s16 %r666, %rs41; 2026-02-21T09:06:46.6795671Z cvt.rn.f32.s16 %r667, %rs40; 2026-02-21T09:06:46.6795854Z cvt.rn.f32.s16 %r668, %rs38; 2026-02-21T09:06:46.6796160Z .loc 1 66 25 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:66:25 2026-02-21T09:06:46.6796715Z cvt.s16.s8 %rs43, %rs22; 2026-02-21T09:06:46.6796895Z shr.s16 %rs44, %rs43, 4; 2026-02-21T09:06:46.6797060Z cvt.s16.s8 
%rs45, %rs24; 2026-02-21T09:06:46.6797232Z shr.s16 %rs46, %rs45, 4; 2026-02-21T09:06:46.6797398Z shr.s16 %rs47, %rs21, 4; 2026-02-21T09:06:46.6797579Z shr.s16 %rs48, %rs23, 4; 2026-02-21T09:06:46.6797876Z .loc 1 84 32 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:84:32 2026-02-21T09:06:46.6798223Z cvt.rn.f32.s16 %r669, %rs48; 2026-02-21T09:06:46.6798406Z cvt.rn.f32.s16 %r670, %rs47; 2026-02-21T09:06:46.6798580Z cvt.rn.f32.s16 %r671, %rs46; 2026-02-21T09:06:46.6798758Z cvt.rn.f32.s16 %r672, %rs44; 2026-02-21T09:06:46.6798975Z st.shared.v4.b32 [%r10], {%r660, %r658, %r659, %r657}; 2026-02-21T09:06:46.6799267Z st.shared.v4.b32 [%r11], {%r664, %r662, %r663, %r661}; 2026-02-21T09:06:46.6799538Z st.shared.v4.b32 [%r12], {%r668, %r666, %r667, %r665}; 2026-02-21T09:06:46.6799814Z st.shared.v4.b32 [%r13], {%r672, %r670, %r671, %r669}; 2026-02-21T09:06:46.6800040Z $L__tmp0: 2026-02-21T09:06:46.6800394Z .loc 2 291 36 // standard.py:291:36 @[ c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:91:40 ] 2026-02-21T09:06:46.6800809Z // begin inline asm 2026-02-21T09:06:46.6800987Z fence.proxy.async.shared::cta; 2026-02-21T09:06:46.6801182Z // end inline asm 2026-02-21T09:06:46.6801334Z bar.sync 0; 2026-02-21T09:06:46.6801502Z shfl.sync.idx.b32 %r673, %r3, 0, 31, -1; 2026-02-21T09:06:46.6801722Z wgmma.fence.sync.aligned; 2026-02-21T09:06:46.6801910Z mov.pred %p20, -1; 2026-02-21T09:06:46.6802079Z // begin inline asm 2026-02-21T09:06:46.6803262Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r721,%r722,%r723,%r724,%r725,%r726,%r727,%r728,%r729,%r730,%r731,%r732,%r733,%r734,%r735,%r736,%r737,%r738,%r739,%r740,%r741,%r742,%r743,%r744,%r745,%r746,%r747,%r748,%r749,%r750,%r751,%r752,%r753,%r754,%r755,%r756,%r757,%r758,%r759,%r760,%r761,%r762,%r763,%r764,%r765,%r766,%r767,%r768,%r769,%r770,%r771,%r772,%r773,%r774,%r775,%r776,%r777,%r778,%r779,%r780,%r781,%r782,%r783,%r784}, {%r369,%r370,%r371,%r372}, %rd53, %p20, 1, 1; 2026-02-21T09:06:46.6804486Z // end inline 
asm 2026-02-21T09:06:46.6804719Z // begin inline asm 2026-02-21T09:06:46.6805873Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r721,%r722,%r723,%r724,%r725,%r726,%r727,%r728,%r729,%r730,%r731,%r732,%r733,%r734,%r735,%r736,%r737,%r738,%r739,%r740,%r741,%r742,%r743,%r744,%r745,%r746,%r747,%r748,%r749,%r750,%r751,%r752,%r753,%r754,%r755,%r756,%r757,%r758,%r759,%r760,%r761,%r762,%r763,%r764,%r765,%r766,%r767,%r768,%r769,%r770,%r771,%r772,%r773,%r774,%r775,%r776,%r777,%r778,%r779,%r780,%r781,%r782,%r783,%r784}, {%r501,%r502,%r503,%r504}, %rd54, %p20, 1, 1; 2026-02-21T09:06:46.6807190Z // end inline asm 2026-02-21T09:06:46.6807356Z wgmma.commit_group.sync.aligned; 2026-02-21T09:06:46.6807557Z mov.b32 %r570, 0; 2026-02-21T09:06:46.6807708Z mov.b32 %r569, %r213; 2026-02-21T09:06:46.6807967Z mov.b32 %r571, %r570; 2026-02-21T09:06:46.6808186Z // begin inline asm 2026-02-21T09:06:46.6809158Z // wait for regs: %r721,%r722,%r723,%r724,%r725,%r726,%r727,%r728,%r729,%r730,%r731,%r732,%r733,%r734,%r735,%r736,%r737,%r738,%r739,%r740,%r741,%r742,%r743,%r744,%r745,%r746,%r747,%r748,%r749,%r750,%r751,%r752,%r753,%r754,%r755,%r756,%r757,%r758,%r759,%r760,%r761,%r762,%r763,%r764,%r765,%r766,%r767,%r768,%r769,%r770,%r771,%r772,%r773,%r774,%r775,%r776,%r777,%r778,%r779,%r780,%r781,%r782,%r783,%r784,%r569,%r570,%r571 2026-02-21T09:06:46.6810190Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:06:46.6810379Z // end inline asm 2026-02-21T09:06:46.6810529Z $L__tmp1: 2026-02-21T09:06:46.6810829Z .loc 1 47 102 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:47:102 2026-02-21T09:06:46.6811189Z add.s32 %r674, %r720, 1; 2026-02-21T09:06:46.6811378Z setp.gt.s32 %p24, %r674, 1; 2026-02-21T09:06:46.6811563Z selp.b32 %r720, 0, %r674, %p24; 2026-02-21T09:06:46.6811971Z .loc 1 55 32 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:55:32 2026-02-21T09:06:46.6812340Z add.s64 %rd55, %rd60, -65536; 2026-02-21T09:06:46.6812687Z .loc 1 55 80 // 
c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:55:80 2026-02-21T09:06:46.6813046Z shl.b32 %r675, %r720, 10; 2026-02-21T09:06:46.6813222Z shl.b32 %r676, %r720, 11; 2026-02-21T09:06:46.6813397Z add.s32 %r677, %r649, %r676; 2026-02-21T09:06:46.6813579Z add.s32 %r639, %r677, %r4; 2026-02-21T09:06:46.6813764Z selp.b32 %r640, 8, 0, %p22; 2026-02-21T09:06:46.6813940Z // begin inline asm 2026-02-21T09:06:46.6814179Z cp.async.ca.shared.global [ %r639 + 0 ], [ %rd55 + 0 ], 0x8, %r640; 2026-02-21T09:06:46.6814455Z // end inline asm 2026-02-21T09:06:46.6814609Z add.s32 %r641, %r639, 1024; 2026-02-21T09:06:46.6814780Z // begin inline asm 2026-02-21T09:06:46.6815010Z cp.async.ca.shared.global [ %r641 + 0 ], [ %rd60 + 0 ], 0x8, %r640; 2026-02-21T09:06:46.6815287Z // end inline asm 2026-02-21T09:06:46.6815441Z cp.async.commit_group; 2026-02-21T09:06:46.6815751Z .loc 1 61 87 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:61:87 2026-02-21T09:06:46.6816095Z add.s32 %r643, %r177, %r675; 2026-02-21T09:06:46.6816276Z // begin inline asm 2026-02-21T09:06:46.6816633Z cp.async.ca.shared.global [ %r643 + 0 ], [ %rd59 + 0 ], 0x8, %r640; 2026-02-21T09:06:46.6816910Z // end inline asm 2026-02-21T09:06:46.6817061Z cp.async.commit_group; 2026-02-21T09:06:46.6817375Z .loc 1 47 102 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:47:102 2026-02-21T09:06:46.6817735Z add.s64 %rd60, %rd60, 32; 2026-02-21T09:06:46.6817907Z add.s64 %rd59, %rd59, 65536; 2026-02-21T09:06:46.6818095Z setp.lt.u64 %p25, %rd61, 504; 2026-02-21T09:06:46.6818277Z @%p25 bra $L__BB0_2; 2026-02-21T09:06:46.6818480Z // %bb.3: // %._crit_edge.loopexit 2026-02-21T09:06:46.6818856Z .loc 1 0 102 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:0:102 2026-02-21T09:06:46.6819216Z setp.lt.u32 %p27, %r2, 64; 2026-02-21T09:06:46.6819540Z .loc 1 47 102 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:47:102 2026-02-21T09:06:46.6819990Z cp.async.wait_group 0; 2026-02-21T09:06:46.6820164Z 
bar.sync 0; 2026-02-21T09:06:46.6820447Z .loc 1 94 28 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:94:28 2026-02-21T09:06:46.6820806Z cvt.rn.bf16x2.f32 %r681, %r722, %r721; 2026-02-21T09:06:46.6821019Z cvt.rn.bf16x2.f32 %r682, %r724, %r723; 2026-02-21T09:06:46.6821232Z cvt.rn.bf16x2.f32 %r683, %r726, %r725; 2026-02-21T09:06:46.6821439Z cvt.rn.bf16x2.f32 %r684, %r728, %r727; 2026-02-21T09:06:46.6821647Z cvt.rn.bf16x2.f32 %r685, %r730, %r729; 2026-02-21T09:06:46.6821857Z cvt.rn.bf16x2.f32 %r686, %r732, %r731; 2026-02-21T09:06:46.6822059Z cvt.rn.bf16x2.f32 %r687, %r734, %r733; 2026-02-21T09:06:46.6822269Z cvt.rn.bf16x2.f32 %r688, %r736, %r735; 2026-02-21T09:06:46.6822610Z cvt.rn.bf16x2.f32 %r689, %r738, %r737; 2026-02-21T09:06:46.6822824Z cvt.rn.bf16x2.f32 %r690, %r740, %r739; 2026-02-21T09:06:46.6823025Z cvt.rn.bf16x2.f32 %r691, %r742, %r741; 2026-02-21T09:06:46.6823233Z cvt.rn.bf16x2.f32 %r692, %r744, %r743; 2026-02-21T09:06:46.6823435Z cvt.rn.bf16x2.f32 %r693, %r746, %r745; 2026-02-21T09:06:46.6823642Z cvt.rn.bf16x2.f32 %r694, %r748, %r747; 2026-02-21T09:06:46.6823855Z cvt.rn.bf16x2.f32 %r695, %r750, %r749; 2026-02-21T09:06:46.6824058Z cvt.rn.bf16x2.f32 %r696, %r752, %r751; 2026-02-21T09:06:46.6824266Z cvt.rn.bf16x2.f32 %r697, %r754, %r753; 2026-02-21T09:06:46.6824469Z cvt.rn.bf16x2.f32 %r698, %r756, %r755; 2026-02-21T09:06:46.6824679Z cvt.rn.bf16x2.f32 %r699, %r758, %r757; 2026-02-21T09:06:46.6824879Z cvt.rn.bf16x2.f32 %r700, %r760, %r759; 2026-02-21T09:06:46.6825090Z cvt.rn.bf16x2.f32 %r701, %r762, %r761; 2026-02-21T09:06:46.6825294Z cvt.rn.bf16x2.f32 %r702, %r764, %r763; 2026-02-21T09:06:46.6825520Z cvt.rn.bf16x2.f32 %r703, %r766, %r765; 2026-02-21T09:06:46.6825806Z cvt.rn.bf16x2.f32 %r704, %r768, %r767; 2026-02-21T09:06:46.6826014Z cvt.rn.bf16x2.f32 %r705, %r770, %r769; 2026-02-21T09:06:46.6826220Z cvt.rn.bf16x2.f32 %r706, %r772, %r771; 2026-02-21T09:06:46.6826427Z cvt.rn.bf16x2.f32 %r707, %r774, %r773; 2026-02-21T09:06:46.6826773Z 
cvt.rn.bf16x2.f32 %r708, %r776, %r775; 2026-02-21T09:06:46.6826977Z cvt.rn.bf16x2.f32 %r709, %r778, %r777; 2026-02-21T09:06:46.6827182Z cvt.rn.bf16x2.f32 %r710, %r780, %r779; 2026-02-21T09:06:46.6827381Z cvt.rn.bf16x2.f32 %r711, %r782, %r781; 2026-02-21T09:06:46.6827590Z cvt.rn.bf16x2.f32 %r712, %r784, %r783; 2026-02-21T09:06:46.6827922Z .loc 1 95 43 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:95:43 2026-02-21T09:06:46.6828325Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:06:46.6828536Z bar.sync 0; 2026-02-21T09:06:46.6828800Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r14], {%r681, %r682, %r683, %r684}; 2026-02-21T09:06:46.6829246Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r15], {%r697, %r698, %r699, %r700}; 2026-02-21T09:06:46.6829673Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r16], {%r685, %r686, %r687, %r688}; 2026-02-21T09:06:46.6830098Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r17], {%r701, %r702, %r703, %r704}; 2026-02-21T09:06:46.6830521Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r18], {%r689, %r690, %r691, %r692}; 2026-02-21T09:06:46.6830937Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r19], {%r705, %r706, %r707, %r708}; 2026-02-21T09:06:46.6831358Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r20], {%r693, %r694, %r695, %r696}; 2026-02-21T09:06:46.6831775Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r21], {%r709, %r710, %r711, %r712}; 2026-02-21T09:06:46.6832092Z // begin inline asm 2026-02-21T09:06:46.6832268Z fence.proxy.async.shared::cta; 2026-02-21T09:06:46.6832461Z // end inline asm 2026-02-21T09:06:46.6832613Z bar.sync 0; 2026-02-21T09:06:46.6832762Z elect.sync %r713|%p28, -1; 2026-02-21T09:06:46.6832979Z shfl.sync.idx.b32 %r714, %r3, 0, 31, -1; 2026-02-21T09:06:46.6833200Z and.pred %p26, %p27, %p28; 2026-02-21T09:06:46.6833386Z and.b32 %r715, %r714, 1; 2026-02-21T09:06:46.6833560Z shl.b32 %r716, %r715, 13; 2026-02-21T09:06:46.6833826Z add.s32 %r680, %r172, %r716; 2026-02-21T09:06:46.6833999Z shl.b32 %r718, %r715, 6; 
2026-02-21T09:06:46.6834175Z or.b32 %r678, %r718, %r23; 2026-02-21T09:06:46.6834344Z // begin inline asm 2026-02-21T09:06:46.6834667Z @%p26 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd58, {%r678, %r679}], [%r680]; 2026-02-21T09:06:46.6835049Z // end inline asm 2026-02-21T09:06:46.6835210Z cp.async.bulk.commit_group; 2026-02-21T09:06:46.6835422Z $L__BB0_4: // %._crit_edge 2026-02-21T09:06:46.6835791Z .loc 1 26 145 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:26:145 2026-02-21T09:06:46.6836178Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:06:46.6836577Z bar.sync 0; 2026-02-21T09:06:46.6836945Z .loc 1 26 4 // c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py:26:4 2026-02-21T09:06:46.6837293Z ret; 2026-02-21T09:06:46.6837440Z $L__tmp2: 2026-02-21T09:06:46.6837588Z $L__func_end0: 2026-02-21T09:06:46.6837761Z // -- End function 2026-02-21T09:06:46.6837982Z } 2026-02-21T09:06:46.6838294Z .file 1 "/tmp/torchinductor_root/44/c44s4ajpp64dp3mks3mm5a3p5gvxiz6slqpva63qaw3pta3eut3e.py" 2026-02-21T09:06:46.6838833Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T09:06:46.6839190Z .section .debug_abbrev 2026-02-21T09:06:46.6839354Z { 2026-02-21T09:06:46.6839524Z .b8 1 // Abbreviation Code 2026-02-21T09:06:46.6839791Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:06:46.6840049Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:06:46.6840295Z .b8 37 // DW_AT_producer 2026-02-21T09:06:46.6840627Z .b8 8 // DW_FORM_string 2026-02-21T09:06:46.6840870Z .b8 19 // DW_AT_language 2026-02-21T09:06:46.6841115Z .b8 5 // DW_FORM_data2 2026-02-21T09:06:46.6841355Z .b8 3 // DW_AT_name 2026-02-21T09:06:46.6841601Z .b8 8 // DW_FORM_string 2026-02-21T09:06:46.6841846Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:06:46.6842082Z .b8 6 // DW_FORM_data4 2026-02-21T09:06:46.6842322Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:06:46.6842555Z .b8 8 // DW_FORM_string 2026-02-21T09:06:46.6842790Z .b8 0 // EOM(1) 2026-02-21T09:06:46.6843016Z 
.b8 0 // EOM(2) 2026-02-21T09:06:46.6843250Z .b8 2 // Abbreviation Code 2026-02-21T09:06:46.6843509Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:06:46.6843756Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:06:46.6844012Z .b8 3 // DW_AT_name 2026-02-21T09:06:46.6844245Z .b8 8 // DW_FORM_string 2026-02-21T09:06:46.6844487Z .b8 32 // DW_AT_inline 2026-02-21T09:06:46.6844723Z .b8 11 // DW_FORM_data1 2026-02-21T09:06:46.6844955Z .b8 0 // EOM(1) 2026-02-21T09:06:46.6845172Z .b8 0 // EOM(2) 2026-02-21T09:06:46.6845400Z .b8 3 // Abbreviation Code 2026-02-21T09:06:46.6845657Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:06:46.6845902Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:06:46.6846147Z .b8 17 // DW_AT_low_pc 2026-02-21T09:06:46.6846384Z .b8 1 // DW_FORM_addr 2026-02-21T09:06:46.6846758Z .b8 18 // DW_AT_high_pc 2026-02-21T09:06:46.6847100Z .b8 1 // DW_FORM_addr 2026-02-21T09:06:46.6847345Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:06:46.6847600Z .b8 19 // DW_FORM_ref4 2026-02-21T09:06:46.6847825Z .b8 0 // EOM(1) 2026-02-21T09:06:46.6848047Z .b8 0 // EOM(2) 2026-02-21T09:06:46.6848276Z .b8 4 // Abbreviation Code 2026-02-21T09:06:46.6848543Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T09:06:46.6848816Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:06:46.6849078Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:06:46.6849469Z .b8 19 // DW_FORM_ref4 2026-02-21T09:06:46.6849712Z .b8 17 // DW_AT_low_pc 2026-02-21T09:06:46.6849952Z .b8 1 // DW_FORM_addr 2026-02-21T09:06:46.6850187Z .b8 18 // DW_AT_high_pc 2026-02-21T09:06:46.6850425Z .b8 1 // DW_FORM_addr 2026-02-21T09:06:46.6850668Z .b8 88 // DW_AT_call_file 2026-02-21T09:06:46.6850907Z .b8 11 // DW_FORM_data1 2026-02-21T09:06:46.6851150Z .b8 89 // DW_AT_call_line 2026-02-21T09:06:46.6851401Z .b8 11 // DW_FORM_data1 2026-02-21T09:06:46.6851651Z .b8 87 // DW_AT_call_column 2026-02-21T09:06:46.6851895Z .b8 11 // DW_FORM_data1 2026-02-21T09:06:46.6852127Z .b8 0 // EOM(1) 2026-02-21T09:06:46.6852358Z .b8 0 // EOM(2) 2026-02-21T09:06:46.6852651Z .b8 
0 // EOM(3) 2026-02-21T09:06:46.6852856Z } 2026-02-21T09:06:46.6852986Z .section .debug_info 2026-02-21T09:06:46.6853143Z { 2026-02-21T09:06:46.6853299Z .b32 178 // Length of Unit 2026-02-21T09:06:46.6853562Z .b8 2 // DWARF version number 2026-02-21T09:06:46.6853798Z .b8 0 2026-02-21T09:06:46.6854011Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:06:46.6854311Z .b8 8 // Address Size (in bytes) 2026-02-21T09:06:46.6854604Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T09:06:46.6854895Z .b8 116 // DW_AT_producer 2026-02-21T09:06:46.6855109Z .b8 114 2026-02-21T09:06:46.6855237Z .b8 105 2026-02-21T09:06:46.6855359Z .b8 116 2026-02-21T09:06:46.6855486Z .b8 111 2026-02-21T09:06:46.6855606Z .b8 110 2026-02-21T09:06:46.6855734Z .b8 0 2026-02-21T09:06:46.6855887Z .b8 2 // DW_AT_language 2026-02-21T09:06:46.6856101Z .b8 0 2026-02-21T09:06:46.6856257Z .b8 99 // DW_AT_name 2026-02-21T09:06:46.6856591Z .b8 52 2026-02-21T09:06:46.6856726Z .b8 52 2026-02-21T09:06:46.6856847Z .b8 115 2026-02-21T09:06:46.6856974Z .b8 52 2026-02-21T09:06:46.6857096Z .b8 97 2026-02-21T09:06:46.6857223Z .b8 106 2026-02-21T09:06:46.6857344Z .b8 112 2026-02-21T09:06:46.6857471Z .b8 112 2026-02-21T09:06:46.6857591Z .b8 54 2026-02-21T09:06:46.6857720Z .b8 52 2026-02-21T09:06:46.6857840Z .b8 100 2026-02-21T09:06:46.6857970Z .b8 112 2026-02-21T09:06:46.6858088Z .b8 51 2026-02-21T09:06:46.6858216Z .b8 109 2026-02-21T09:06:46.6858347Z .b8 107 2026-02-21T09:06:46.6858476Z .b8 115 2026-02-21T09:06:46.6858602Z .b8 51 2026-02-21T09:06:46.6858726Z .b8 109 2026-02-21T09:06:46.6865115Z .b8 109 2026-02-21T09:06:46.6865317Z .b8 53 2026-02-21T09:06:46.6865460Z .b8 97 2026-02-21T09:06:46.6865605Z .b8 51 2026-02-21T09:06:46.6865736Z .b8 112 2026-02-21T09:06:46.6865871Z .b8 53 2026-02-21T09:06:46.6865995Z .b8 103 2026-02-21T09:06:46.6866132Z .b8 118 2026-02-21T09:06:46.6866260Z .b8 120 2026-02-21T09:06:46.6866749Z .b8 105 2026-02-21T09:06:46.6866891Z .b8 122 2026-02-21T09:06:46.6867033Z .b8 
54 2026-02-21T09:06:46.6867162Z .b8 115 2026-02-21T09:06:46.6867296Z .b8 108 2026-02-21T09:06:46.6867429Z .b8 113 2026-02-21T09:06:46.6867551Z .b8 112 2026-02-21T09:06:46.6867684Z .b8 118 2026-02-21T09:06:46.6867806Z .b8 97 2026-02-21T09:06:46.6867940Z .b8 54 2026-02-21T09:06:46.6868066Z .b8 51 2026-02-21T09:06:46.6868199Z .b8 113 2026-02-21T09:06:46.6868407Z .b8 97 2026-02-21T09:06:46.6868545Z .b8 119 2026-02-21T09:06:46.6868668Z .b8 51 2026-02-21T09:06:46.6868802Z .b8 112 2026-02-21T09:06:46.6868925Z .b8 116 2026-02-21T09:06:46.6869055Z .b8 97 2026-02-21T09:06:46.6869179Z .b8 51 2026-02-21T09:06:46.6869310Z .b8 101 2026-02-21T09:06:46.6869442Z .b8 117 2026-02-21T09:06:46.6869666Z .b8 116 2026-02-21T09:06:46.6869864Z .b8 51 2026-02-21T09:06:46.6869994Z .b8 101 2026-02-21T09:06:46.6870128Z .b8 46 2026-02-21T09:06:46.6870262Z .b8 112 2026-02-21T09:06:46.6870398Z .b8 121 2026-02-21T09:06:46.6870522Z .b8 0 2026-02-21T09:06:46.6870728Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:06:46.6871018Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:06:46.6871258Z .b8 116 2026-02-21T09:06:46.6871389Z .b8 109 2026-02-21T09:06:46.6871523Z .b8 112 2026-02-21T09:06:46.6871660Z .b8 47 2026-02-21T09:06:46.6871797Z .b8 116 2026-02-21T09:06:46.6871931Z .b8 111 2026-02-21T09:06:46.6872052Z .b8 114 2026-02-21T09:06:46.6872188Z .b8 99 2026-02-21T09:06:46.6872312Z .b8 104 2026-02-21T09:06:46.6872444Z .b8 105 2026-02-21T09:06:46.6872568Z .b8 110 2026-02-21T09:06:46.6872699Z .b8 100 2026-02-21T09:06:46.6872823Z .b8 117 2026-02-21T09:06:46.6872957Z .b8 99 2026-02-21T09:06:46.6873084Z .b8 116 2026-02-21T09:06:46.6873231Z .b8 111 2026-02-21T09:06:46.6873367Z .b8 114 2026-02-21T09:06:46.6873502Z .b8 95 2026-02-21T09:06:46.6873711Z .b8 114 2026-02-21T09:06:46.6873853Z .b8 111 2026-02-21T09:06:46.6874000Z .b8 111 2026-02-21T09:06:46.6874128Z .b8 116 2026-02-21T09:06:46.6874259Z .b8 47 2026-02-21T09:06:46.6874384Z .b8 52 2026-02-21T09:06:46.6874520Z .b8 52 2026-02-21T09:06:46.6874644Z .b8 0 
2026-02-21T09:06:46.6874848Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T09:06:46.6875144Z .b8 95 // DW_AT_name 2026-02-21T09:06:46.6875377Z .b8 104 2026-02-21T09:06:46.6875512Z .b8 101 2026-02-21T09:06:46.6875644Z .b8 108 2026-02-21T09:06:46.6875769Z .b8 105 2026-02-21T09:06:46.6875897Z .b8 111 2026-02-21T09:06:46.6876028Z .b8 110 2026-02-21T09:06:46.6876151Z .b8 95 2026-02-21T09:06:46.6876282Z .b8 109 2026-02-21T09:06:46.6876413Z .b8 97 2026-02-21T09:06:46.6876678Z .b8 116 2026-02-21T09:06:46.6876806Z .b8 109 2026-02-21T09:06:46.6876943Z .b8 117 2026-02-21T09:06:46.6877073Z .b8 108 2026-02-21T09:06:46.6877209Z .b8 95 2026-02-21T09:06:46.6877337Z .b8 98 2026-02-21T09:06:46.6877469Z .b8 102 2026-02-21T09:06:46.6877594Z .b8 49 2026-02-21T09:06:46.6877723Z .b8 54 2026-02-21T09:06:46.6877844Z .b8 95 2026-02-21T09:06:46.6877976Z .b8 105 2026-02-21T09:06:46.6878109Z .b8 110 2026-02-21T09:06:46.6878233Z .b8 116 2026-02-21T09:06:46.6878362Z .b8 52 2026-02-21T09:06:46.6878487Z .b8 0 2026-02-21T09:06:46.6878652Z .b8 1 // DW_AT_inline 2026-02-21T09:06:46.6878947Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T09:06:46.6879260Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T09:06:46.6879530Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T09:06:46.6879814Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:06:46.6880137Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T09:06:46.6880458Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:06:46.6880740Z .b64 $L__tmp0 // DW_AT_low_pc 2026-02-21T09:06:46.6881006Z .b64 $L__tmp1 // DW_AT_high_pc 2026-02-21T09:06:46.6881372Z .b8 1 // DW_AT_call_file 2026-02-21T09:06:46.6881621Z .b8 91 // DW_AT_call_line 2026-02-21T09:06:46.6881881Z .b8 40 // DW_AT_call_column 2026-02-21T09:06:46.6882150Z .b8 0 // End Of Children Mark 2026-02-21T09:06:46.6882408Z .b8 0 // End Of Children Mark 2026-02-21T09:06:46.6882643Z } 2026-02-21T09:06:46.6882794Z .section .debug_macinfo { } 2026-02-21T09:06:46.6882927Z 
2026-02-21T09:06:46.6883014Z ================================================================ 2026-02-21T09:06:46.6883312Z please share the reproducer above with Triton project. 2026-02-21T09:06:50.1743757Z 2026-02-21T09:06:50.1744091Z 2026-02-21T09:06:50.1744095Z 2026-02-21T09:06:50.1744647Z ================================================================ 2026-02-21T09:06:50.1745049Z Internal Triton PTX codegen error 2026-02-21T09:06:50.1745330Z `ptxas` stderr: 2026-02-21T09:06:50.1746108Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 541 in function _helion_matmul_bf16_int4. Try to compile with register target of 158 or higher. 2026-02-21T09:06:50.1747194Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:06:50.1747444Z 2026-02-21T09:06:50.1748167Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmptkxx4tst.ptx -o /tmp/tmptkxx4tst.ptx.o 2026-02-21T09:06:50.1749085Z 2026-02-21T09:06:50.1749089Z 2026-02-21T09:06:50.1749165Z // 2026-02-21T09:06:50.1749372Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:06:50.1749630Z // 2026-02-21T09:06:50.1749732Z 2026-02-21T09:06:50.1749811Z .version 8.7 2026-02-21T09:06:50.1750009Z .target sm_90a 2026-02-21T09:06:50.1750204Z .address_size 64 2026-02-21T09:06:50.1750460Z 2026-02-21T09:06:50.1750705Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T09:06:50.1751169Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:06:50.1751514Z // @_helion_matmul_bf16_int4 2026-02-21T09:06:50.1751869Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T09:06:50.1752274Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T09:06:50.1752751Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T09:06:50.1753210Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T09:06:50.1753675Z 
.param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T09:06:50.1754137Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T09:06:50.1754498Z ) 2026-02-21T09:06:50.1754663Z .reqntid 512 2026-02-21T09:06:50.1754855Z .maxnreg 32 2026-02-21T09:06:50.1755025Z { 2026-02-21T09:06:50.1755204Z .reg .pred %p<83>; 2026-02-21T09:06:50.1755428Z .reg .b16 %rs<785>; 2026-02-21T09:06:50.1755641Z .reg .b32 %r<15444>; 2026-02-21T09:06:50.1755856Z .reg .b64 %rd<165>; 2026-02-21T09:06:50.1756256Z .loc 1 19 0 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:19:0 2026-02-21T09:06:50.1756931Z $L__func_begin0: 2026-02-21T09:06:50.1757315Z .loc 1 19 0 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:19:0 2026-02-21T09:06:50.1757721Z 2026-02-21T09:06:50.1757793Z // %bb.0: 2026-02-21T09:06:50.1758048Z ld.param.b64 %rd23, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T09:06:50.1758347Z ld.param.b64 %rd22, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T09:06:50.1758597Z $L__tmp0: 2026-02-21T09:06:50.1758884Z .loc 1 21 67 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:21:67 2026-02-21T09:06:50.1759249Z mov.u32 %r14781, %ctaid.x; 2026-02-21T09:06:50.1759501Z ld.param.b64 %rd25, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T09:06:50.1759758Z mov.u32 %r1292, %ctaid.y; 2026-02-21T09:06:50.1759988Z ld.param.b64 %rd42, [_helion_matmul_bf16_int4_param_3]; 2026-02-21T09:06:50.1760405Z mov.u32 %r1293, %ctaid.z; 2026-02-21T09:06:50.1760586Z mov.u32 %r1294, %nctaid.x; 2026-02-21T09:06:50.1760772Z mov.u32 %r1295, %nctaid.y; 2026-02-21T09:06:50.1760971Z mad.lo.s32 %r1296, %r1293, %r1295, %r1292; 2026-02-21T09:06:50.1761208Z mad.lo.s32 %r1297, %r1296, %r1294, %r14781; 2026-02-21T09:06:50.1761428Z shl.b32 %r1298, %r1297, 7; 2026-02-21T09:06:50.1761605Z cvt.s64.s32 %rd43, %r1298; 2026-02-21T09:06:50.1762074Z add.s64 %rd39, %rd42, %rd43; 2026-02-21T09:06:50.1762269Z mov.u32 %r2, %tid.x; 2026-02-21T09:06:50.1762445Z setp.lt.u32 
%p1, %r2, 32; 2026-02-21T09:06:50.1762625Z shl.b32 %r6, %r2, 2; 2026-02-21T09:06:50.1762798Z mov.b32 %r12221, global_smem; 2026-02-21T09:06:50.1763009Z add.s32 %r1284, %r12221, %r6; 2026-02-21T09:06:50.1795401Z mov.b32 %r1285, 0; 2026-02-21T09:06:50.1795630Z // begin inline asm 2026-02-21T09:06:50.1795860Z @%p1 st.shared.b32 [ %r1284 + 0 ], %r1285; 2026-02-21T09:06:50.1796135Z // end inline asm 2026-02-21T09:06:50.1796343Z bar.warp.sync -1; 2026-02-21T09:06:50.1796734Z setp.eq.b32 %p2, %r2, 0; 2026-02-21T09:06:50.1796976Z cvt.u64.u32 %rd24, %r12221; 2026-02-21T09:06:50.1797207Z // begin inline asm 2026-02-21T09:06:50.1797606Z @%p2 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd24 + 0 ], %rd25; 2026-02-21T09:06:50.1797955Z // end inline asm 2026-02-21T09:06:50.1798109Z // begin inline asm 2026-02-21T09:06:50.1798373Z @%p2 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x1; 2026-02-21T09:06:50.1798682Z // end inline asm 2026-02-21T09:06:50.1798838Z mov.b32 %r1286, 64; 2026-02-21T09:06:50.1799001Z // begin inline asm 2026-02-21T09:06:50.1799306Z @%p2 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x0, %r1286; 2026-02-21T09:06:50.1799664Z // end inline asm 2026-02-21T09:06:50.1799915Z mov.b32 %r1287, 256; 2026-02-21T09:06:50.1800078Z // begin inline asm 2026-02-21T09:06:50.1800370Z @%p2 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x1, %r1287; 2026-02-21T09:06:50.1800706Z // end inline asm 2026-02-21T09:06:50.1800854Z mov.b32 %r1288, 8192; 2026-02-21T09:06:50.1801023Z // begin inline asm 2026-02-21T09:06:50.1801316Z @%p2 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x0, %r1288; 2026-02-21T09:06:50.1801665Z // end inline asm 2026-02-21T09:06:50.1801816Z mov.b32 %r1289, 4096; 2026-02-21T09:06:50.1801983Z // begin inline asm 2026-02-21T09:06:50.1802272Z @%p2 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x1, %r1289; 
2026-02-21T09:06:50.1802613Z // end inline asm 2026-02-21T09:06:50.1802774Z mov.b64 %rd32, 16384; 2026-02-21T09:06:50.1802946Z // begin inline asm 2026-02-21T09:06:50.1803256Z @%p2 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd24 + 0 ], 0x0, %rd32; 2026-02-21T09:06:50.1803610Z // end inline asm 2026-02-21T09:06:50.1803761Z mov.b32 %r1290, 1; 2026-02-21T09:06:50.1803911Z // begin inline asm 2026-02-21T09:06:50.1804233Z @%p2 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x0, %r1290; 2026-02-21T09:06:50.1804608Z // end inline asm 2026-02-21T09:06:50.1804761Z // begin inline asm 2026-02-21T09:06:50.1805079Z @%p2 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x1, %r1290; 2026-02-21T09:06:50.1805433Z // end inline asm 2026-02-21T09:06:50.1805586Z // begin inline asm 2026-02-21T09:06:50.1805864Z @%p2 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd24 + 0 ], 0xa; 2026-02-21T09:06:50.1806190Z // end inline asm 2026-02-21T09:06:50.1806341Z // begin inline asm 2026-02-21T09:06:50.1806795Z @%p2 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x0; 2026-02-21T09:06:50.1807155Z // end inline asm 2026-02-21T09:06:50.1807305Z // begin inline asm 2026-02-21T09:06:50.1807594Z @%p2 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x3; 2026-02-21T09:06:50.1807921Z // end inline asm 2026-02-21T09:06:50.1808195Z // begin inline asm 2026-02-21T09:06:50.1808461Z @%p2 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd24 + 0 ], 0x0; 2026-02-21T09:06:50.1808784Z // end inline asm 2026-02-21T09:06:50.1808940Z // begin inline asm 2026-02-21T09:06:50.1809371Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd39 + 0 ], [ %rd24 + 0 ], 0x80; 2026-02-21T09:06:50.1809848Z // end inline asm 2026-02-21T09:06:50.1809993Z // begin inline asm 2026-02-21T09:06:50.1810242Z @%p1 
fence.proxy.tensormap::generic.acquire.gpu [ %rd39 + 0 ], 0x80; 2026-02-21T09:06:50.1810546Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:06:50.1810778Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:06:50.1810988Z // end inline asm 2026-02-21T09:06:50.1811213Z bar.sync 0; 2026-02-21T09:06:50.1811436Z cvta.global.u64 %rd158, %rd39; 2026-02-21T09:06:50.1811783Z .loc 1 26 144 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:26:144 2026-02-21T09:06:50.1812166Z sub.s32 %r1300, 1567, %r14781; 2026-02-21T09:06:50.1812368Z mul.hi.u32 %r1301, %r1300, 1041204193; 2026-02-21T09:06:50.1812584Z shr.u32 %r1302, %r1301, 8; 2026-02-21T09:06:50.1812766Z mul.hi.u32 %r1303, %r1302, 1431655766; 2026-02-21T09:06:50.1812978Z mad.lo.s32 %r3, %r1303, 3168, %r14781; 2026-02-21T09:06:50.1813325Z .loc 1 38 45 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:38:45 2026-02-21T09:06:50.1813679Z shr.u32 %r4, %r2, 5; 2026-02-21T09:06:50.1813846Z bfe.u32 %r5, %r2, 1, 8; 2026-02-21T09:06:50.1814016Z and.b32 %r7, %r6, 252; 2026-02-21T09:06:50.1814322Z .loc 1 48 48 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:48:48 2026-02-21T09:06:50.1814679Z shr.u32 %r8, %r2, 6; 2026-02-21T09:06:50.1814848Z bfe.u32 %r9, %r2, 6, 3; 2026-02-21T09:06:50.1834830Z .loc 1 54 38 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:54:38 2026-02-21T09:06:50.1835232Z shl.b32 %r10, %r2, 3; 2026-02-21T09:06:50.1835427Z and.b32 %r11, %r10, 8; 2026-02-21T09:06:50.1835739Z .loc 1 72 38 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:72:38 2026-02-21T09:06:50.1836094Z and.b32 %r12, %r2, 256; 2026-02-21T09:06:50.1836402Z .loc 1 26 144 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:26:144 2026-02-21T09:06:50.1836890Z setp.ge.s32 %p19, %r14781, %r3; 2026-02-21T09:06:50.1837084Z shl.b32 %r14764, %r2, 4; 2026-02-21T09:06:50.1837265Z bfe.s32 %r14765, %r2, 3, 1; 2026-02-21T09:06:50.1837449Z and.b32 %r14766, %r10, 96; 2026-02-21T09:06:50.1837630Z and.b32 %r14767, 
%r2, 3; 2026-02-21T09:06:50.1837806Z and.b32 %r14768, %r2, 16; 2026-02-21T09:06:50.1837978Z bfe.s32 %r14769, %r2, 4, 1; 2026-02-21T09:06:50.1838174Z and.b32 %r14770, %r6, 124; 2026-02-21T09:06:50.1838353Z and.b32 %r14771, %r8, 3; 2026-02-21T09:06:50.1838527Z and.b32 %r14772, %r10, 256; 2026-02-21T09:06:50.1838699Z shr.u32 %r14773, %r12, 1; 2026-02-21T09:06:50.1838877Z and.b32 %r14774, %r2, 124; 2026-02-21T09:06:50.1839056Z shl.b32 %r14775, %r2, 1; 2026-02-21T09:06:50.1839232Z shl.b32 %r14776, %r2, 6; 2026-02-21T09:06:50.1839406Z and.b32 %r14777, %r10, 48; 2026-02-21T09:06:50.1839583Z shr.u32 %r14778, %r12, 6; 2026-02-21T09:06:50.1839760Z shl.b32 %r14779, %r2, 7; 2026-02-21T09:06:50.1839928Z and.b32 %r14780, %r2, 480; 2026-02-21T09:06:50.1840110Z setp.eq.b32 %p81, %r12, 0; 2026-02-21T09:06:50.1840289Z setp.lt.u32 %p82, %r2, 128; 2026-02-21T09:06:50.1840471Z @%p19 bra $L__BB0_9; 2026-02-21T09:06:50.1840653Z // %bb.1: // %.lr.ph 2026-02-21T09:06:50.1841027Z .loc 1 0 144 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:0:144 2026-02-21T09:06:50.1841408Z and.b32 %r1305, %r14764, 8048; 2026-02-21T09:06:50.1841598Z and.b32 %r1307, %r14765, 136; 2026-02-21T09:06:50.1841785Z or.b32 %r1308, %r1307, %r1305; 2026-02-21T09:06:50.1841965Z add.s32 %r13, %r12221, %r1308; 2026-02-21T09:06:50.1842254Z xor.b32 %r1310, %r1308, 8; 2026-02-21T09:06:50.1842425Z add.s32 %r14, %r12221, %r1310; 2026-02-21T09:06:50.1842605Z and.b32 %r1311, %r14764, 7680; 2026-02-21T09:06:50.1842783Z shl.b32 %r1314, %r14767, 1; 2026-02-21T09:06:50.1842982Z and.b32 %r1317, %r14769, 136; 2026-02-21T09:06:50.1843163Z or.b32 %r1318, %r1311, %r14766; 2026-02-21T09:06:50.1843355Z or.b32 %r1319, %r1318, %r1314; 2026-02-21T09:06:50.1843538Z or.b32 %r1320, %r1319, %r1317; 2026-02-21T09:06:50.1843714Z add.s32 %r15, %r12221, %r1320; 2026-02-21T09:06:50.1843896Z xor.b32 %r1321, %r1320, 8; 2026-02-21T09:06:50.1844067Z add.s32 %r16, %r12221, %r1321; 2026-02-21T09:06:50.1844255Z or.b32 %r1326, %r14771, %r14772; 
2026-02-21T09:06:50.1844443Z or.b32 %r1327, %r1326, %r14773; 2026-02-21T09:06:50.1844844Z or.b32 %r1328, %r1327, %r14770; 2026-02-21T09:06:50.1845031Z add.s32 %r17, %r12221, %r1328; 2026-02-21T09:06:50.1845219Z xor.b32 %r1329, %r1328, 32; 2026-02-21T09:06:50.1845402Z add.s32 %r18, %r12221, %r1329; 2026-02-21T09:06:50.1845579Z xor.b32 %r1330, %r1328, 64; 2026-02-21T09:06:50.1845760Z add.s32 %r19, %r12221, %r1330; 2026-02-21T09:06:50.1845936Z xor.b32 %r1331, %r1328, 96; 2026-02-21T09:06:50.1846122Z add.s32 %r20, %r12221, %r1331; 2026-02-21T09:06:50.1846303Z shl.b32 %r1332, %r14767, 9; 2026-02-21T09:06:50.1846618Z shl.b32 %r1333, %r14767, 5; 2026-02-21T09:06:50.1846801Z and.b32 %r1336, %r14775, 256; 2026-02-21T09:06:50.1846988Z xor.b32 %r1337, %r1333, %r14774; 2026-02-21T09:06:50.1847175Z add.s32 %r1338, %r12221, %r1332; 2026-02-21T09:06:50.1847365Z add.s32 %r1339, %r1338, %r1336; 2026-02-21T09:06:50.1847563Z add.s32 %r21, %r1339, %r1337; 2026-02-21T09:06:50.1847743Z and.b32 %r1341, %r14776, 16320; 2026-02-21T09:06:50.1847935Z or.b32 %r1344, %r1341, %r14778; 2026-02-21T09:06:50.1848200Z or.b32 %r1345, %r1344, %r14777; 2026-02-21T09:06:50.1848393Z add.s32 %r22, %r12221, %r1345; 2026-02-21T09:06:50.1848574Z xor.b32 %r1346, %r1345, 16; 2026-02-21T09:06:50.1848757Z add.s32 %r23, %r12221, %r1346; 2026-02-21T09:06:50.1848940Z xor.b32 %r1347, %r1345, 32; 2026-02-21T09:06:50.1849137Z add.s32 %r24, %r12221, %r1347; 2026-02-21T09:06:50.1849315Z xor.b32 %r1348, %r1345, 48; 2026-02-21T09:06:50.1849497Z add.s32 %r25, %r12221, %r1348; 2026-02-21T09:06:50.1849684Z bfe.u32 %r1349, %r12221, 4, 14; 2026-02-21T09:06:50.1849871Z cvt.u64.u32 %rd44, %r1349; 2026-02-21T09:06:50.1850075Z or.b64 %rd93, %rd44, -9223371899348713472; 2026-02-21T09:06:50.1850298Z add.s32 %r1350, %r12221, 32; 2026-02-21T09:06:50.1850483Z bfe.u32 %r1351, %r1350, 4, 14; 2026-02-21T09:06:50.1850665Z cvt.u64.u32 %rd45, %r1351; 2026-02-21T09:06:50.1850859Z or.b64 %rd94, %rd45, -9223371899348713472; 
2026-02-21T09:06:50.1851063Z and.b32 %r1353, %r14779, 1920; 2026-02-21T09:06:50.1851266Z shl.b32 %r1355, %r14780, 6; 2026-02-21T09:06:50.1851453Z and.b32 %r1356, %r14764, 112; 2026-02-21T09:06:50.1851628Z or.b32 %r1357, %r1353, %r1356; 2026-02-21T09:06:50.1851809Z xor.b32 %r1358, %r1357, %r14768; 2026-02-21T09:06:50.1851996Z or.b32 %r1359, %r1358, %r1355; 2026-02-21T09:06:50.1852178Z add.s32 %r26, %r12221, %r1359; 2026-02-21T09:06:50.1852353Z add.s32 %r27, %r26, 32768; 2026-02-21T09:06:50.1852527Z add.s32 %r28, %r26, 65536; 2026-02-21T09:06:50.1852695Z add.s32 %r29, %r26, 98304; 2026-02-21T09:06:50.1852871Z xor.b32 %r1360, %r1359, 32; 2026-02-21T09:06:50.1853043Z add.s32 %r30, %r12221, %r1360; 2026-02-21T09:06:50.1853225Z add.s32 %r31, %r30, 32768; 2026-02-21T09:06:50.1853398Z add.s32 %r32, %r30, 65536; 2026-02-21T09:06:50.1853568Z add.s32 %r33, %r30, 98304; 2026-02-21T09:06:50.1853742Z xor.b32 %r1361, %r1359, 64; 2026-02-21T09:06:50.1853927Z add.s32 %r34, %r12221, %r1361; 2026-02-21T09:06:50.1854112Z add.s32 %r35, %r34, 32768; 2026-02-21T09:06:50.1854286Z add.s32 %r36, %r34, 65536; 2026-02-21T09:06:50.1854464Z add.s32 %r37, %r34, 98304; 2026-02-21T09:06:50.1854631Z xor.b32 %r1362, %r1359, 96; 2026-02-21T09:06:50.1854811Z add.s32 %r38, %r12221, %r1362; 2026-02-21T09:06:50.1855076Z add.s32 %r39, %r38, 32768; 2026-02-21T09:06:50.1855246Z add.s32 %r40, %r38, 65536; 2026-02-21T09:06:50.1855421Z add.s32 %r41, %r38, 98304; 2026-02-21T09:06:50.1855593Z shl.b32 %r1363, %r14780, 4; 2026-02-21T09:06:50.1855773Z or.b32 %r1364, %r1363, %r14766; 2026-02-21T09:06:50.1855956Z or.b32 %r1365, %r1364, %r1314; 2026-02-21T09:06:50.1856139Z or.b32 %r1366, %r1365, %r1317; 2026-02-21T09:06:50.1856313Z add.s32 %r42, %r12221, %r1366; 2026-02-21T09:06:50.1856639Z xor.b32 %r1367, %r1366, 8; 2026-02-21T09:06:50.1856824Z add.s32 %r43, %r12221, %r1367; 2026-02-21T09:06:50.1857166Z .loc 1 26 144 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:26:144 2026-02-21T09:06:50.1857559Z 
add.s64 %rd4, %rd22, 96; 2026-02-21T09:06:50.1857891Z shl.b32 %r1368, %r5, 10; 2026-02-21T09:06:50.1858214Z .loc 1 47 124 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:47:124 2026-02-21T09:06:50.1858569Z or.b32 %r44, %r1368, %r11; 2026-02-21T09:06:50.1858887Z .loc 1 26 144 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:26:144 2026-02-21T09:06:50.1859239Z shl.b32 %r1369, %r9, 13; 2026-02-21T09:06:50.1859549Z .loc 1 47 124 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:47:124 2026-02-21T09:06:50.1859901Z or.b32 %r45, %r1369, %r7; 2026-02-21T09:06:50.1860129Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:06:50.1860428Z // Child Loop BB0_3 Depth 2 2026-02-21T09:06:50.1860692Z // Child Loop BB0_5 Depth 2 2026-02-21T09:06:50.1860958Z // Child Loop BB0_7 Depth 2 2026-02-21T09:06:50.1861410Z .loc 1 32 35 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:32:35 2026-02-21T09:06:50.1861770Z shr.s32 %r1371, %r14781, 31; 2026-02-21T09:06:50.1861967Z shr.u32 %r1372, %r1371, 26; 2026-02-21T09:06:50.1862152Z add.s32 %r1373, %r14781, %r1372; 2026-02-21T09:06:50.1862343Z shr.s32 %r1374, %r1373, 6; 2026-02-21T09:06:50.1862646Z .loc 1 33 33 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:33:33 2026-02-21T09:06:50.1862995Z shl.b32 %r1375, %r1374, 1; 2026-02-21T09:06:50.1863301Z .loc 1 34 39 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:34:39 2026-02-21T09:06:50.1863651Z sub.s32 %r1376, 16, %r1375; 2026-02-21T09:06:50.1863964Z .loc 1 34 52 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:34:52 2026-02-21T09:06:50.1864303Z min.s32 %r1377, %r1376, 2; 2026-02-21T09:06:50.1864611Z .loc 1 35 45 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:35:45 2026-02-21T09:06:50.1864955Z and.b32 %r1378, %r1373, -64; 2026-02-21T09:06:50.1865140Z sub.s32 %r1379, %r14781, %r1378; 2026-02-21T09:06:50.1865455Z .loc 1 36 51 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:36:51 
2026-02-21T09:06:50.1865810Z div.s32 %r1380, %r1379, %r1377; 2026-02-21T09:06:50.1866131Z .loc 1 35 64 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:35:64 2026-02-21T09:06:50.1866598Z mul.lo.s32 %r1381, %r1380, %r1377; 2026-02-21T09:06:50.1866804Z sub.s32 %r1382, %r1379, %r1381; 2026-02-21T09:06:50.1867116Z .loc 1 35 30 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:35:30 2026-02-21T09:06:50.1867459Z add.s32 %r1383, %r1382, %r1375; 2026-02-21T09:06:50.1867770Z .loc 1 37 27 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:37:27 2026-02-21T09:06:50.1868117Z shl.b32 %r4630, %r1383, 8; 2026-02-21T09:06:50.1868524Z .loc 1 39 27 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:39:27 2026-02-21T09:06:50.1868868Z shl.b32 %r48, %r1380, 8; 2026-02-21T09:06:50.1869173Z .loc 1 40 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:40:32 2026-02-21T09:06:50.1869604Z or.b32 %r49, %r48, %r7; 2026-02-21T09:06:50.1869915Z .loc 1 47 124 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:47:124 2026-02-21T09:06:50.1870259Z shl.b32 %r1384, %r1383, 18; 2026-02-21T09:06:50.1870441Z or.b32 %r1385, %r44, %r1384; 2026-02-21T09:06:50.1870636Z mad.wide.s32 %rd159, %r1385, 2, %rd4; 2026-02-21T09:06:50.1870841Z add.s32 %r14782, %r45, %r48; 2026-02-21T09:06:50.1871027Z mov.b32 %r14783, 0f00000000; 2026-02-21T09:06:50.1871201Z mov.b64 %rd160, -32; 2026-02-21T09:06:50.1871371Z mov.b32 %r14784, %r14783; 2026-02-21T09:06:50.1871555Z mov.b32 %r14785, %r14783; 2026-02-21T09:06:50.1871809Z mov.b32 %r14786, %r14783; 2026-02-21T09:06:50.1872037Z mov.b32 %r14787, %r14783; 2026-02-21T09:06:50.1872213Z mov.b32 %r14788, %r14783; 2026-02-21T09:06:50.1872379Z mov.b32 %r14789, %r14783; 2026-02-21T09:06:50.1872550Z mov.b32 %r14790, %r14783; 2026-02-21T09:06:50.1872724Z mov.b32 %r14791, %r14783; 2026-02-21T09:06:50.1872889Z mov.b32 %r14792, %r14783; 2026-02-21T09:06:50.1873060Z mov.b32 %r14793, %r14783; 2026-02-21T09:06:50.1873224Z mov.b32 %r14794, 
%r14783; 2026-02-21T09:06:50.1873395Z mov.b32 %r14795, %r14783; 2026-02-21T09:06:50.1873560Z mov.b32 %r14796, %r14783; 2026-02-21T09:06:50.1873732Z mov.b32 %r14797, %r14783; 2026-02-21T09:06:50.1873896Z mov.b32 %r14798, %r14783; 2026-02-21T09:06:50.1874067Z mov.b32 %r14799, %r14783; 2026-02-21T09:06:50.1874245Z mov.b32 %r14800, %r14783; 2026-02-21T09:06:50.1874415Z mov.b32 %r14801, %r14783; 2026-02-21T09:06:50.1874582Z mov.b32 %r14802, %r14783; 2026-02-21T09:06:50.1874743Z mov.b32 %r14803, %r14783; 2026-02-21T09:06:50.1874914Z mov.b32 %r14804, %r14783; 2026-02-21T09:06:50.1875149Z mov.b32 %r14805, %r14783; 2026-02-21T09:06:50.1875320Z mov.b32 %r14806, %r14783; 2026-02-21T09:06:50.1875481Z mov.b32 %r14807, %r14783; 2026-02-21T09:06:50.1875660Z mov.b32 %r14808, %r14783; 2026-02-21T09:06:50.1875823Z mov.b32 %r14809, %r14783; 2026-02-21T09:06:50.1875989Z mov.b32 %r14810, %r14783; 2026-02-21T09:06:50.1876167Z mov.b32 %r14811, %r14783; 2026-02-21T09:06:50.1876335Z mov.b32 %r14812, %r14783; 2026-02-21T09:06:50.1876620Z mov.b32 %r14813, %r14783; 2026-02-21T09:06:50.1876785Z mov.b32 %r14814, %r14783; 2026-02-21T09:06:50.1876953Z mov.b32 %r14815, %r14783; 2026-02-21T09:06:50.1877114Z mov.b32 %r14816, %r14783; 2026-02-21T09:06:50.1877282Z mov.b32 %r14817, %r14783; 2026-02-21T09:06:50.1877442Z mov.b32 %r14818, %r14783; 2026-02-21T09:06:50.1877633Z mov.b32 %r14819, %r14783; 2026-02-21T09:06:50.1877799Z mov.b32 %r14820, %r14783; 2026-02-21T09:06:50.1877970Z mov.b32 %r14821, %r14783; 2026-02-21T09:06:50.1878143Z mov.b32 %r14822, %r14783; 2026-02-21T09:06:50.1878308Z mov.b32 %r14823, %r14783; 2026-02-21T09:06:50.1878477Z mov.b32 %r14824, %r14783; 2026-02-21T09:06:50.1878640Z mov.b32 %r14825, %r14783; 2026-02-21T09:06:50.1878809Z mov.b32 %r14826, %r14783; 2026-02-21T09:06:50.1878973Z mov.b32 %r14827, %r14783; 2026-02-21T09:06:50.1879141Z mov.b32 %r14828, %r14783; 2026-02-21T09:06:50.1879304Z mov.b32 %r14829, %r14783; 2026-02-21T09:06:50.1879473Z mov.b32 %r14830, %r14783; 
2026-02-21T09:06:50.1879634Z mov.b32 %r14831, %r14783; 2026-02-21T09:06:50.1879802Z mov.b32 %r14832, %r14783; 2026-02-21T09:06:50.1879973Z mov.b32 %r14833, %r14783; 2026-02-21T09:06:50.1880137Z mov.b32 %r14834, %r14783; 2026-02-21T09:06:50.1880309Z mov.b32 %r14835, %r14783; 2026-02-21T09:06:50.1880475Z mov.b32 %r14836, %r14783; 2026-02-21T09:06:50.1880646Z mov.b32 %r14837, %r14783; 2026-02-21T09:06:50.1880823Z mov.b32 %r14838, %r14783; 2026-02-21T09:06:50.1880998Z mov.b32 %r14839, %r14783; 2026-02-21T09:06:50.1881169Z mov.b32 %r14840, %r14783; 2026-02-21T09:06:50.1881340Z mov.b32 %r14841, %r14783; 2026-02-21T09:06:50.1881506Z mov.b32 %r14842, %r14783; 2026-02-21T09:06:50.1881677Z mov.b32 %r14843, %r14783; 2026-02-21T09:06:50.1881847Z mov.b32 %r14844, %r14783; 2026-02-21T09:06:50.1882097Z mov.b32 %r14845, %r14783; 2026-02-21T09:06:50.1882268Z mov.b32 %r14846, %r14783; 2026-02-21T09:06:50.1882434Z mov.b32 %r14847, %r14783; 2026-02-21T09:06:50.1882604Z mov.b32 %r14848, %r14783; 2026-02-21T09:06:50.1882766Z mov.b32 %r14849, %r14783; 2026-02-21T09:06:50.1882937Z mov.b32 %r14850, %r14783; 2026-02-21T09:06:50.1883099Z mov.b32 %r14851, %r14783; 2026-02-21T09:06:50.1883267Z mov.b32 %r14852, %r14783; 2026-02-21T09:06:50.1883429Z mov.b32 %r14853, %r14783; 2026-02-21T09:06:50.1883596Z mov.b32 %r14854, %r14783; 2026-02-21T09:06:50.1883762Z mov.b32 %r14855, %r14783; 2026-02-21T09:06:50.1883922Z mov.b32 %r14856, %r14783; 2026-02-21T09:06:50.1884092Z mov.b32 %r14857, %r14783; 2026-02-21T09:06:50.1884333Z mov.b32 %r14858, %r14783; 2026-02-21T09:06:50.1884565Z mov.b32 %r14859, %r14783; 2026-02-21T09:06:50.1884727Z mov.b32 %r14860, %r14783; 2026-02-21T09:06:50.1884895Z mov.b32 %r14861, %r14783; 2026-02-21T09:06:50.1885059Z mov.b32 %r14862, %r14783; 2026-02-21T09:06:50.1885231Z mov.b32 %r14863, %r14783; 2026-02-21T09:06:50.1885408Z mov.b32 %r14864, %r14783; 2026-02-21T09:06:50.1885581Z mov.b32 %r14865, %r14783; 2026-02-21T09:06:50.1885747Z mov.b32 %r14866, %r14783; 
2026-02-21T09:06:50.1885913Z mov.b32 %r14867, %r14783; 2026-02-21T09:06:50.1886083Z mov.b32 %r14868, %r14783; 2026-02-21T09:06:50.1886246Z mov.b32 %r14869, %r14783; 2026-02-21T09:06:50.1886415Z mov.b32 %r14870, %r14783; 2026-02-21T09:06:50.1886690Z mov.b32 %r14871, %r14783; 2026-02-21T09:06:50.1886860Z mov.b32 %r14872, %r14783; 2026-02-21T09:06:50.1887031Z mov.b32 %r14873, %r14783; 2026-02-21T09:06:50.1887201Z mov.b32 %r14874, %r14783; 2026-02-21T09:06:50.1887367Z mov.b32 %r14875, %r14783; 2026-02-21T09:06:50.1887539Z mov.b32 %r14876, %r14783; 2026-02-21T09:06:50.1887797Z mov.b32 %r14877, %r14783; 2026-02-21T09:06:50.1887969Z mov.b32 %r14878, %r14783; 2026-02-21T09:06:50.1888148Z mov.b32 %r14879, %r14783; 2026-02-21T09:06:50.1888311Z mov.b32 %r14880, %r14783; 2026-02-21T09:06:50.1888482Z mov.b32 %r14881, %r14783; 2026-02-21T09:06:50.1888644Z mov.b32 %r14882, %r14783; 2026-02-21T09:06:50.1888815Z mov.b32 %r14883, %r14783; 2026-02-21T09:06:50.1888979Z mov.b32 %r14884, %r14783; 2026-02-21T09:06:50.1889148Z mov.b32 %r14885, %r14783; 2026-02-21T09:06:50.1889312Z mov.b32 %r14886, %r14783; 2026-02-21T09:06:50.1889486Z mov.b32 %r14887, %r14783; 2026-02-21T09:06:50.1889660Z mov.b32 %r14888, %r14783; 2026-02-21T09:06:50.1889824Z mov.b32 %r14889, %r14783; 2026-02-21T09:06:50.1890000Z mov.b32 %r14890, %r14783; 2026-02-21T09:06:50.1890182Z mov.b32 %r14891, %r14783; 2026-02-21T09:06:50.1890354Z mov.b32 %r14892, %r14783; 2026-02-21T09:06:50.1890522Z mov.b32 %r14893, %r14783; 2026-02-21T09:06:50.1890694Z mov.b32 %r14894, %r14783; 2026-02-21T09:06:50.1890867Z mov.b32 %r14895, %r14783; 2026-02-21T09:06:50.1891038Z mov.b32 %r14896, %r14783; 2026-02-21T09:06:50.1891202Z mov.b32 %r14897, %r14783; 2026-02-21T09:06:50.1891378Z mov.b32 %r14898, %r14783; 2026-02-21T09:06:50.1891550Z mov.b32 %r14899, %r14783; 2026-02-21T09:06:50.1891714Z mov.b32 %r14900, %r14783; 2026-02-21T09:06:50.1891883Z mov.b32 %r14901, %r14783; 2026-02-21T09:06:50.1892044Z mov.b32 %r14902, %r14783; 
2026-02-21T09:06:50.1892214Z mov.b32 %r14903, %r14783; 2026-02-21T09:06:50.1892375Z mov.b32 %r14904, %r14783; 2026-02-21T09:06:50.1892542Z mov.b32 %r14905, %r14783; 2026-02-21T09:06:50.1892718Z mov.b32 %r14906, %r14783; 2026-02-21T09:06:50.1892885Z mov.b32 %r14907, %r14783; 2026-02-21T09:06:50.1893048Z mov.b32 %r14908, %r14783; 2026-02-21T09:06:50.1893215Z mov.b32 %r14909, %r14783; 2026-02-21T09:06:50.1893385Z mov.b32 %r14910, %r14783; 2026-02-21T09:06:50.1893609Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T09:06:50.1893933Z // => This Inner Loop Header: Depth=2 2026-02-21T09:06:50.1894322Z .loc 1 55 80 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:80 2026-02-21T09:06:50.1894774Z add.s64 %rd47, %rd159, -96; 2026-02-21T09:06:50.1894958Z // begin inline asm 2026-02-21T09:06:50.1895124Z mov.u32 %r1386, 0x0; 2026-02-21T09:06:50.1895285Z mov.u32 %r1387, 0x0; 2026-02-21T09:06:50.1895438Z mov.u32 %r1388, 0x0; 2026-02-21T09:06:50.1895593Z mov.u32 %r1389, 0x0; 2026-02-21T09:06:50.1895821Z ld.global.v4.b32 { %r1386, %r1387, %r1388, %r1389 }, [ %rd47 + 0 ]; 2026-02-21T09:06:50.1896093Z // end inline asm 2026-02-21T09:06:50.1896390Z .loc 1 59 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:59:32 2026-02-21T09:06:50.1896874Z bar.sync 0; 2026-02-21T09:06:50.1897041Z st.shared.v2.b32 [%r13], {%r1386, %r1387}; 2026-02-21T09:06:50.1897275Z st.shared.v2.b32 [%r14], {%r1388, %r1389}; 2026-02-21T09:06:50.1897653Z bar.sync 0; 2026-02-21T09:06:50.1897820Z ld.shared.b16 %rs1, [%r15]; 2026-02-21T09:06:50.1898022Z ld.shared.b16 %rs2, [%r15+256]; 2026-02-21T09:06:50.1898219Z ld.shared.b16 %rs3, [%r15+16]; 2026-02-21T09:06:50.1898417Z ld.shared.b16 %rs4, [%r15+272]; 2026-02-21T09:06:50.1898607Z ld.shared.b16 %rs5, [%r16]; 2026-02-21T09:06:50.1898801Z ld.shared.b16 %rs6, [%r16+256]; 2026-02-21T09:06:50.1898988Z ld.shared.b16 %rs7, [%r16+16]; 2026-02-21T09:06:50.1899176Z ld.shared.b16 %rs8, [%r16+272]; 2026-02-21T09:06:50.1899363Z cvt.f32.bf16 %r1647, %rs1; 
2026-02-21T09:06:50.1899546Z cvt.f32.bf16 %r1648, %rs2; 2026-02-21T09:06:50.1899734Z cvt.f32.bf16 %r1649, %rs5; 2026-02-21T09:06:50.1899916Z cvt.f32.bf16 %r1650, %rs6; 2026-02-21T09:06:50.1900094Z cvt.f32.bf16 %r1907, %rs3; 2026-02-21T09:06:50.1900272Z cvt.f32.bf16 %r1908, %rs4; 2026-02-21T09:06:50.1900452Z cvt.f32.bf16 %r1909, %rs7; 2026-02-21T09:06:50.1900622Z cvt.f32.bf16 %r1910, %rs8; 2026-02-21T09:06:50.1901030Z .loc 1 61 34 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:34 2026-02-21T09:06:50.1901392Z cvt.s64.s32 %rd63, %r14782; 2026-02-21T09:06:50.1901582Z add.s64 %rd48, %rd23, %rd63; 2026-02-21T09:06:50.1901907Z .loc 1 61 87 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:87 2026-02-21T09:06:50.1902251Z // begin inline asm 2026-02-21T09:06:50.1902419Z mov.u32 %r1390, 0x0; 2026-02-21T09:06:50.1902605Z ld.global.b32 { %r1390 }, [ %rd48 + 0 ]; 2026-02-21T09:06:50.1902816Z // end inline asm 2026-02-21T09:06:50.1903104Z .loc 1 69 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:69:28 2026-02-21T09:06:50.1903448Z bar.sync 0; 2026-02-21T09:06:50.1903597Z st.shared.b8 [%r17], %r1390; 2026-02-21T09:06:50.1903796Z prmt.b32 %r4534, %r1390, 0, 0x7771U; 2026-02-21T09:06:50.1904007Z st.shared.b8 [%r18+512], %r4534; 2026-02-21T09:06:50.1904203Z prmt.b32 %r4535, %r1390, 0, 0x7772U; 2026-02-21T09:06:50.1904415Z st.shared.b8 [%r19+1024], %r4535; 2026-02-21T09:06:50.1904614Z prmt.b32 %r4536, %r1390, 0, 0x7773U; 2026-02-21T09:06:50.1904815Z st.shared.b8 [%r20+1536], %r4536; 2026-02-21T09:06:50.1904998Z bar.sync 0; 2026-02-21T09:06:50.1905156Z ld.shared.b32 %r4537, [%r21]; 2026-02-21T09:06:50.1905344Z prmt.b32 %r4538, %r4537, 0, 0x7770U; 2026-02-21T09:06:50.1905545Z cvt.u16.u32 %rs9, %r4538; 2026-02-21T09:06:50.1905728Z prmt.b32 %r4539, %r4537, 0, 0x7771U; 2026-02-21T09:06:50.1905924Z cvt.u16.u32 %rs10, %r4539; 2026-02-21T09:06:50.1906108Z prmt.b32 %r4540, %r4537, 0, 0x7772U; 2026-02-21T09:06:50.1906298Z cvt.u16.u32 %rs11, %r4540; 
2026-02-21T09:06:50.1906616Z prmt.b32 %r4541, %r4537, 0, 0x7773U; 2026-02-21T09:06:50.1906827Z cvt.u16.u32 %rs12, %r4541; 2026-02-21T09:06:50.1907012Z ld.shared.b32 %r4542, [%r21+128]; 2026-02-21T09:06:50.1907219Z prmt.b32 %r4543, %r4542, 0, 0x7770U; 2026-02-21T09:06:50.1907418Z cvt.u16.u32 %rs13, %r4543; 2026-02-21T09:06:50.1907601Z prmt.b32 %r4544, %r4542, 0, 0x7771U; 2026-02-21T09:06:50.1907802Z cvt.u16.u32 %rs14, %r4544; 2026-02-21T09:06:50.1907982Z prmt.b32 %r4545, %r4542, 0, 0x7772U; 2026-02-21T09:06:50.1908175Z cvt.u16.u32 %rs15, %r4545; 2026-02-21T09:06:50.1908534Z prmt.b32 %r4546, %r4542, 0, 0x7773U; 2026-02-21T09:06:50.1908725Z cvt.u16.u32 %rs16, %r4546; 2026-02-21T09:06:50.1909041Z .loc 1 64 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:64:28 2026-02-21T09:06:50.1909387Z shl.b16 %rs17, %rs9, 4; 2026-02-21T09:06:50.1909572Z shl.b16 %rs18, %rs10, 4; 2026-02-21T09:06:50.1909746Z shl.b16 %rs19, %rs11, 4; 2026-02-21T09:06:50.1909921Z shl.b16 %rs20, %rs12, 4; 2026-02-21T09:06:50.1910099Z shl.b16 %rs21, %rs13, 4; 2026-02-21T09:06:50.1910265Z shl.b16 %rs22, %rs14, 4; 2026-02-21T09:06:50.1910435Z shl.b16 %rs23, %rs15, 4; 2026-02-21T09:06:50.1910603Z shl.b16 %rs24, %rs16, 4; 2026-02-21T09:06:50.1910997Z .loc 1 79 58 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:79:58 2026-02-21T09:06:50.1911419Z selp.b16 %rs25, %rs17, %rs9, %p81; 2026-02-21T09:06:50.1911628Z cvt.s16.s8 %rs26, %rs25; 2026-02-21T09:06:50.1911800Z shr.s16 %rs27, %rs26, 4; 2026-02-21T09:06:50.1911989Z selp.b16 %rs28, %rs18, %rs10, %p81; 2026-02-21T09:06:50.1912195Z cvt.s16.s8 %rs29, %rs28; 2026-02-21T09:06:50.1912361Z shr.s16 %rs30, %rs29, 4; 2026-02-21T09:06:50.1912543Z selp.b16 %rs31, %rs19, %rs11, %p81; 2026-02-21T09:06:50.1912748Z cvt.s16.s8 %rs32, %rs31; 2026-02-21T09:06:50.1912929Z shr.s16 %rs33, %rs32, 4; 2026-02-21T09:06:50.1913104Z selp.b16 %rs34, %rs20, %rs12, %p81; 2026-02-21T09:06:50.1913304Z cvt.s16.s8 %rs35, %rs34; 2026-02-21T09:06:50.1913468Z shr.s16 %rs36, 
%rs35, 4; 2026-02-21T09:06:50.1913645Z selp.b16 %rs37, %rs21, %rs13, %p81; 2026-02-21T09:06:50.1913836Z cvt.s16.s8 %rs38, %rs37; 2026-02-21T09:06:50.1914008Z shr.s16 %rs39, %rs38, 4; 2026-02-21T09:06:50.1914183Z selp.b16 %rs40, %rs22, %rs14, %p81; 2026-02-21T09:06:50.1914383Z cvt.s16.s8 %rs41, %rs40; 2026-02-21T09:06:50.1914637Z shr.s16 %rs42, %rs41, 4; 2026-02-21T09:06:50.1914820Z selp.b16 %rs43, %rs23, %rs15, %p81; 2026-02-21T09:06:50.1915015Z cvt.s16.s8 %rs44, %rs43; 2026-02-21T09:06:50.1915179Z shr.s16 %rs45, %rs44, 4; 2026-02-21T09:06:50.1915354Z selp.b16 %rs46, %rs24, %rs16, %p81; 2026-02-21T09:06:50.1915542Z cvt.s16.s8 %rs47, %rs46; 2026-02-21T09:06:50.1915712Z shr.s16 %rs48, %rs47, 4; 2026-02-21T09:06:50.1916023Z .loc 1 84 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:84:32 2026-02-21T09:06:50.1916377Z cvt.rn.f32.s16 %r4547, %rs27; 2026-02-21T09:06:50.1916736Z cvt.rn.f32.s16 %r4548, %rs30; 2026-02-21T09:06:50.1916918Z cvt.rn.f32.s16 %r4549, %rs33; 2026-02-21T09:06:50.1917100Z cvt.rn.f32.s16 %r4550, %rs36; 2026-02-21T09:06:50.1917287Z cvt.rn.f32.s16 %r4551, %rs39; 2026-02-21T09:06:50.1917476Z cvt.rn.f32.s16 %r4552, %rs42; 2026-02-21T09:06:50.1917652Z cvt.rn.f32.s16 %r4553, %rs45; 2026-02-21T09:06:50.1917841Z cvt.rn.f32.s16 %r4554, %rs48; 2026-02-21T09:06:50.1918021Z bar.sync 0; 2026-02-21T09:06:50.1918172Z st.shared.b32 [%r22], %r4547; 2026-02-21T09:06:50.1918363Z st.shared.b32 [%r22+8], %r4548; 2026-02-21T09:06:50.1918561Z st.shared.b32 [%r23], %r4549; 2026-02-21T09:06:50.1918747Z st.shared.b32 [%r23+8], %r4550; 2026-02-21T09:06:50.1918934Z st.shared.b32 [%r24], %r4551; 2026-02-21T09:06:50.1919122Z st.shared.b32 [%r24+8], %r4552; 2026-02-21T09:06:50.1919307Z st.shared.b32 [%r25], %r4553; 2026-02-21T09:06:50.1919491Z st.shared.b32 [%r25+8], %r4554; 2026-02-21T09:06:50.1919668Z $L__tmp1: 2026-02-21T09:06:50.1920028Z .loc 2 291 36 // standard.py:291:36 @[ cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:91:40 ] 
2026-02-21T09:06:50.1920449Z // begin inline asm 2026-02-21T09:06:50.1920629Z fence.proxy.async.shared::cta; 2026-02-21T09:06:50.1920827Z // end inline asm 2026-02-21T09:06:50.1920976Z bar.sync 0; 2026-02-21T09:06:50.1921147Z shfl.sync.idx.b32 %r4555, %r4, 0, 31, -1; 2026-02-21T09:06:50.1921379Z wgmma.fence.sync.aligned; 2026-02-21T09:06:50.1921581Z mov.pred %p20, -1; 2026-02-21T09:06:50.1921746Z // begin inline asm 2026-02-21T09:06:50.1924669Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r14783,%r14784,%r14785,%r14786,%r14787,%r14788,%r14789,%r14790,%r14791,%r14792,%r14793,%r14794,%r14795,%r14796,%r14797,%r14798,%r14799,%r14800,%r14801,%r14802,%r14803,%r14804,%r14805,%r14806,%r14807,%r14808,%r14809,%r14810,%r14811,%r14812,%r14813,%r14814,%r14815,%r14816,%r14817,%r14818,%r14819,%r14820,%r14821,%r14822,%r14823,%r14824,%r14825,%r14826,%r14827,%r14828,%r14829,%r14830,%r14831,%r14832,%r14833,%r14834,%r14835,%r14836,%r14837,%r14838,%r14839,%r14840,%r14841,%r14842,%r14843,%r14844,%r14845,%r14846,%r14847,%r14848,%r14849,%r14850,%r14851,%r14852,%r14853,%r14854,%r14855,%r14856,%r14857,%r14858,%r14859,%r14860,%r14861,%r14862,%r14863,%r14864,%r14865,%r14866,%r14867,%r14868,%r14869,%r14870,%r14871,%r14872,%r14873,%r14874,%r14875,%r14876,%r14877,%r14878,%r14879,%r14880,%r14881,%r14882,%r14883,%r14884,%r14885,%r14886,%r14887,%r14888,%r14889,%r14890,%r14891,%r14892,%r14893,%r14894,%r14895,%r14896,%r14897,%r14898,%r14899,%r14900,%r14901,%r14902,%r14903,%r14904,%r14905,%r14906,%r14907,%r14908,%r14909,%r14910}, {%r1647,%r1648,%r1649,%r1650}, %rd93, %p20, 1, 1; 2026-02-21T09:06:50.1928009Z // end inline asm 2026-02-21T09:06:50.1928169Z // begin inline asm 2026-02-21T09:06:50.1931080Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14783,%r14784,%r14785,%r14786,%r14787,%r14788,%r14789,%r14790,%r14791,%r14792,%r14793,%r14794,%r14795,%r14796,%r14797,%r14798,%r14799,%r14800,%r14801,%r14802,%r14803,%r14804,%r14805,%r14806,%r14807,%r14808,%r14809,%r14810,%r14811,%r14812,%r14813,%r14814,%r14815,%r14816,%r14817,%r14818,%r14819,%r14820,%r14821,%r14822,%r14823,%r14824,%r14825,%r14826,%r14827,%r14828,%r14829,%r14830,%r14831,%r14832,%r14833,%r14834,%r14835,%r14836,%r14837,%r14838,%r14839,%r14840,%r14841,%r14842,%r14843,%r14844,%r14845,%r14846,%r14847,%r14848,%r14849,%r14850,%r14851,%r14852,%r14853,%r14854,%r14855,%r14856,%r14857,%r14858,%r14859,%r14860,%r14861,%r14862,%r14863,%r14864,%r14865,%r14866,%r14867,%r14868,%r14869,%r14870,%r14871,%r14872,%r14873,%r14874,%r14875,%r14876,%r14877,%r14878,%r14879,%r14880,%r14881,%r14882,%r14883,%r14884,%r14885,%r14886,%r14887,%r14888,%r14889,%r14890,%r14891,%r14892,%r14893,%r14894,%r14895,%r14896,%r14897,%r14898,%r14899,%r14900,%r14901,%r14902,%r14903,%r14904,%r14905,%r14906,%r14907,%r14908,%r14909,%r14910}, {%r1907,%r1908,%r1909,%r1910}, %rd94, %p20, 1, 1; 2026-02-21T09:06:50.1934075Z // end inline asm 2026-02-21T09:06:50.1934243Z wgmma.commit_group.sync.aligned; 2026-02-21T09:06:50.1934443Z mov.b32 %r4402, 0; 2026-02-21T09:06:50.1934597Z mov.b32 %r2039, %r12221; 2026-02-21T09:06:50.1934775Z mov.b32 %r2040, %r4402; 2026-02-21T09:06:50.1934940Z mov.b32 %r2041, %r4402; 2026-02-21T09:06:50.1935110Z // begin inline asm 2026-02-21T09:06:50.1937866Z // wait for regs: 
%r14783,%r14784,%r14785,%r14786,%r14787,%r14788,%r14789,%r14790,%r14791,%r14792,%r14793,%r14794,%r14795,%r14796,%r14797,%r14798,%r14799,%r14800,%r14801,%r14802,%r14803,%r14804,%r14805,%r14806,%r14807,%r14808,%r14809,%r14810,%r14811,%r14812,%r14813,%r14814,%r14815,%r14816,%r14817,%r14818,%r14819,%r14820,%r14821,%r14822,%r14823,%r14824,%r14825,%r14826,%r14827,%r14828,%r14829,%r14830,%r14831,%r14832,%r14833,%r14834,%r14835,%r14836,%r14837,%r14838,%r14839,%r14840,%r14841,%r14842,%r14843,%r14844,%r14845,%r14846,%r14847,%r14848,%r14849,%r14850,%r14851,%r14852,%r14853,%r14854,%r14855,%r14856,%r14857,%r14858,%r14859,%r14860,%r14861,%r14862,%r14863,%r14864,%r14865,%r14866,%r14867,%r14868,%r14869,%r14870,%r14871,%r14872,%r14873,%r14874,%r14875,%r14876,%r14877,%r14878,%r14879,%r14880,%r14881,%r14882,%r14883,%r14884,%r14885,%r14886,%r14887,%r14888,%r14889,%r14890,%r14891,%r14892,%r14893,%r14894,%r14895,%r14896,%r14897,%r14898,%r14899,%r14900,%r14901,%r14902,%r14903,%r14904,%r14905,%r14906,%r14907,%r14908,%r14909,%r14910,%r2039,%r2040,%r2041 2026-02-21T09:06:50.1940675Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:06:50.1940868Z // end inline asm 2026-02-21T09:06:50.1941017Z $L__tmp2: 2026-02-21T09:06:50.1941300Z .loc 1 55 80 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:80 2026-02-21T09:06:50.1941758Z add.s64 %rd51, %rd159, -64; 2026-02-21T09:06:50.1941942Z // begin inline asm 2026-02-21T09:06:50.1942104Z mov.u32 %r2173, 0x0; 2026-02-21T09:06:50.1942258Z mov.u32 %r2174, 0x0; 2026-02-21T09:06:50.1942412Z mov.u32 %r2175, 0x0; 2026-02-21T09:06:50.1942568Z mov.u32 %r2176, 0x0; 2026-02-21T09:06:50.1942788Z ld.global.v4.b32 { %r2173, %r2174, %r2175, %r2176 }, [ %rd51 + 0 ]; 2026-02-21T09:06:50.1943057Z // end inline asm 2026-02-21T09:06:50.1943344Z .loc 1 59 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:59:32 2026-02-21T09:06:50.1943693Z bar.sync 0; 2026-02-21T09:06:50.1943858Z st.shared.v2.b32 [%r13], {%r2173, %r2174}; 2026-02-21T09:06:50.1944090Z 
st.shared.v2.b32 [%r14], {%r2175, %r2176}; 2026-02-21T09:06:50.1944450Z bar.sync 0; 2026-02-21T09:06:50.1944613Z ld.shared.b16 %rs49, [%r15]; 2026-02-21T09:06:50.1944811Z ld.shared.b16 %rs50, [%r15+256]; 2026-02-21T09:06:50.1945006Z ld.shared.b16 %rs51, [%r15+16]; 2026-02-21T09:06:50.1945207Z ld.shared.b16 %rs52, [%r15+272]; 2026-02-21T09:06:50.1945397Z ld.shared.b16 %rs53, [%r16]; 2026-02-21T09:06:50.1945585Z ld.shared.b16 %rs54, [%r16+256]; 2026-02-21T09:06:50.1945777Z ld.shared.b16 %rs55, [%r16+16]; 2026-02-21T09:06:50.1945971Z ld.shared.b16 %rs56, [%r16+272]; 2026-02-21T09:06:50.1946161Z cvt.f32.bf16 %r2434, %rs49; 2026-02-21T09:06:50.1946346Z cvt.f32.bf16 %r2435, %rs50; 2026-02-21T09:06:50.1946670Z cvt.f32.bf16 %r2436, %rs53; 2026-02-21T09:06:50.1946854Z cvt.f32.bf16 %r2437, %rs54; 2026-02-21T09:06:50.1947036Z cvt.f32.bf16 %r2694, %rs51; 2026-02-21T09:06:50.1947208Z cvt.f32.bf16 %r2695, %rs52; 2026-02-21T09:06:50.1947386Z cvt.f32.bf16 %r2696, %rs55; 2026-02-21T09:06:50.1947558Z cvt.f32.bf16 %r2697, %rs56; 2026-02-21T09:06:50.1947745Z cvt.u32.u64 %r4556, %rd160; 2026-02-21T09:06:50.1948005Z add.s32 %r4557, %r4556, 40; 2026-02-21T09:06:50.1948410Z .loc 1 61 62 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:62 2026-02-21T09:06:50.1948768Z or.b32 %r4558, %r8, %r4557; 2026-02-21T09:06:50.1948949Z shl.b32 %r4559, %r4558, 13; 2026-02-21T09:06:50.1949128Z add.s32 %r4560, %r4559, %r49; 2026-02-21T09:06:50.1949441Z .loc 1 61 34 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:34 2026-02-21T09:06:50.1949797Z cvt.s64.s32 %rd64, %r4560; 2026-02-21T09:06:50.1949974Z add.s64 %rd52, %rd23, %rd64; 2026-02-21T09:06:50.1950290Z .loc 1 61 87 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:87 2026-02-21T09:06:50.1950630Z // begin inline asm 2026-02-21T09:06:50.1950794Z mov.u32 %r2177, 0x0; 2026-02-21T09:06:50.1950967Z ld.global.b32 { %r2177 }, [ %rd52 + 0 ]; 2026-02-21T09:06:50.1951176Z // end inline asm 2026-02-21T09:06:50.1951468Z .loc 1 
69 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:69:28 2026-02-21T09:06:50.1951804Z bar.sync 0; 2026-02-21T09:06:50.1951956Z st.shared.b8 [%r17], %r2177; 2026-02-21T09:06:50.1952144Z prmt.b32 %r4561, %r2177, 0, 0x7771U; 2026-02-21T09:06:50.1952349Z st.shared.b8 [%r18+512], %r4561; 2026-02-21T09:06:50.1952538Z prmt.b32 %r4562, %r2177, 0, 0x7772U; 2026-02-21T09:06:50.1952742Z st.shared.b8 [%r19+1024], %r4562; 2026-02-21T09:06:50.1952943Z prmt.b32 %r4563, %r2177, 0, 0x7773U; 2026-02-21T09:06:50.1953138Z st.shared.b8 [%r20+1536], %r4563; 2026-02-21T09:06:50.1953324Z bar.sync 0; 2026-02-21T09:06:50.1953470Z ld.shared.b32 %r4564, [%r21]; 2026-02-21T09:06:50.1953658Z prmt.b32 %r4565, %r4564, 0, 0x7770U; 2026-02-21T09:06:50.1953854Z cvt.u16.u32 %rs57, %r4565; 2026-02-21T09:06:50.1954054Z prmt.b32 %r4566, %r4564, 0, 0x7771U; 2026-02-21T09:06:50.1954247Z cvt.u16.u32 %rs58, %r4566; 2026-02-21T09:06:50.1954430Z prmt.b32 %r4567, %r4564, 0, 0x7772U; 2026-02-21T09:06:50.1954624Z cvt.u16.u32 %rs59, %r4567; 2026-02-21T09:06:50.1954804Z prmt.b32 %r4568, %r4564, 0, 0x7773U; 2026-02-21T09:06:50.1955000Z cvt.u16.u32 %rs60, %r4568; 2026-02-21T09:06:50.1955281Z ld.shared.b32 %r4569, [%r21+128]; 2026-02-21T09:06:50.1955481Z prmt.b32 %r4570, %r4569, 0, 0x7770U; 2026-02-21T09:06:50.1955671Z cvt.u16.u32 %rs61, %r4570; 2026-02-21T09:06:50.1955851Z prmt.b32 %r4571, %r4569, 0, 0x7771U; 2026-02-21T09:06:50.1956042Z cvt.u16.u32 %rs62, %r4571; 2026-02-21T09:06:50.1956222Z prmt.b32 %r4572, %r4569, 0, 0x7772U; 2026-02-21T09:06:50.1956433Z cvt.u16.u32 %rs63, %r4572; 2026-02-21T09:06:50.1956739Z prmt.b32 %r4573, %r4569, 0, 0x7773U; 2026-02-21T09:06:50.1956937Z cvt.u16.u32 %rs64, %r4573; 2026-02-21T09:06:50.1957248Z .loc 1 64 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:64:28 2026-02-21T09:06:50.1957601Z shl.b16 %rs65, %rs57, 4; 2026-02-21T09:06:50.1957873Z shl.b16 %rs66, %rs58, 4; 2026-02-21T09:06:50.1958120Z shl.b16 %rs67, %rs59, 4; 2026-02-21T09:06:50.1958295Z 
shl.b16 %rs68, %rs60, 4; 2026-02-21T09:06:50.1958466Z shl.b16 %rs69, %rs61, 4; 2026-02-21T09:06:50.1969316Z shl.b16 %rs70, %rs62, 4; 2026-02-21T09:06:50.1969593Z shl.b16 %rs71, %rs63, 4; 2026-02-21T09:06:50.1969794Z shl.b16 %rs72, %rs64, 4; 2026-02-21T09:06:50.1970151Z .loc 1 79 58 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:79:58 2026-02-21T09:06:50.1970541Z selp.b16 %rs73, %rs65, %rs57, %p81; 2026-02-21T09:06:50.1970770Z cvt.s16.s8 %rs74, %rs73; 2026-02-21T09:06:50.1970952Z shr.s16 %rs75, %rs74, 4; 2026-02-21T09:06:50.1971156Z selp.b16 %rs76, %rs66, %rs58, %p81; 2026-02-21T09:06:50.1971365Z cvt.s16.s8 %rs77, %rs76; 2026-02-21T09:06:50.1971545Z shr.s16 %rs78, %rs77, 4; 2026-02-21T09:06:50.1971724Z selp.b16 %rs79, %rs67, %rs59, %p81; 2026-02-21T09:06:50.1971929Z cvt.s16.s8 %rs80, %rs79; 2026-02-21T09:06:50.1972100Z shr.s16 %rs81, %rs80, 4; 2026-02-21T09:06:50.1972285Z selp.b16 %rs82, %rs68, %rs60, %p81; 2026-02-21T09:06:50.1972658Z cvt.s16.s8 %rs83, %rs82; 2026-02-21T09:06:50.1972838Z shr.s16 %rs84, %rs83, 4; 2026-02-21T09:06:50.1973022Z selp.b16 %rs85, %rs69, %rs61, %p81; 2026-02-21T09:06:50.1973219Z cvt.s16.s8 %rs86, %rs85; 2026-02-21T09:06:50.1973393Z shr.s16 %rs87, %rs86, 4; 2026-02-21T09:06:50.1973568Z selp.b16 %rs88, %rs70, %rs62, %p81; 2026-02-21T09:06:50.1973768Z cvt.s16.s8 %rs89, %rs88; 2026-02-21T09:06:50.1973936Z shr.s16 %rs90, %rs89, 4; 2026-02-21T09:06:50.1974114Z selp.b16 %rs91, %rs71, %rs63, %p81; 2026-02-21T09:06:50.1974311Z cvt.s16.s8 %rs92, %rs91; 2026-02-21T09:06:50.1974480Z shr.s16 %rs93, %rs92, 4; 2026-02-21T09:06:50.1974659Z selp.b16 %rs94, %rs72, %rs64, %p81; 2026-02-21T09:06:50.1974850Z cvt.s16.s8 %rs95, %rs94; 2026-02-21T09:06:50.1975031Z shr.s16 %rs96, %rs95, 4; 2026-02-21T09:06:50.1975350Z .loc 1 84 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:84:32 2026-02-21T09:06:50.1975729Z cvt.rn.f32.s16 %r4574, %rs75; 2026-02-21T09:06:50.1975934Z cvt.rn.f32.s16 %r4575, %rs78; 2026-02-21T09:06:50.1976128Z 
cvt.rn.f32.s16 %r4576, %rs81; 2026-02-21T09:06:50.1976312Z cvt.rn.f32.s16 %r4577, %rs84; 2026-02-21T09:06:50.1976638Z cvt.rn.f32.s16 %r4578, %rs87; 2026-02-21T09:06:50.1976831Z cvt.rn.f32.s16 %r4579, %rs90; 2026-02-21T09:06:50.1977009Z cvt.rn.f32.s16 %r4580, %rs93; 2026-02-21T09:06:50.1977192Z cvt.rn.f32.s16 %r4581, %rs96; 2026-02-21T09:06:50.1977367Z bar.sync 0; 2026-02-21T09:06:50.1977533Z st.shared.b32 [%r22], %r4574; 2026-02-21T09:06:50.1977720Z st.shared.b32 [%r22+8], %r4575; 2026-02-21T09:06:50.1977931Z st.shared.b32 [%r23], %r4576; 2026-02-21T09:06:50.1978116Z st.shared.b32 [%r23+8], %r4577; 2026-02-21T09:06:50.1978311Z st.shared.b32 [%r24], %r4578; 2026-02-21T09:06:50.1978498Z st.shared.b32 [%r24+8], %r4579; 2026-02-21T09:06:50.1978686Z st.shared.b32 [%r25], %r4580; 2026-02-21T09:06:50.1978874Z st.shared.b32 [%r25+8], %r4581; 2026-02-21T09:06:50.1979061Z $L__tmp3: 2026-02-21T09:06:50.1979437Z .loc 2 291 36 // standard.py:291:36 @[ cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:91:40 ] 2026-02-21T09:06:50.1979987Z // begin inline asm 2026-02-21T09:06:50.1980185Z fence.proxy.async.shared::cta; 2026-02-21T09:06:50.1980377Z // end inline asm 2026-02-21T09:06:50.1980539Z bar.sync 0; 2026-02-21T09:06:50.1980707Z wgmma.fence.sync.aligned; 2026-02-21T09:06:50.1980898Z // begin inline asm 2026-02-21T09:06:50.1983766Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14783,%r14784,%r14785,%r14786,%r14787,%r14788,%r14789,%r14790,%r14791,%r14792,%r14793,%r14794,%r14795,%r14796,%r14797,%r14798,%r14799,%r14800,%r14801,%r14802,%r14803,%r14804,%r14805,%r14806,%r14807,%r14808,%r14809,%r14810,%r14811,%r14812,%r14813,%r14814,%r14815,%r14816,%r14817,%r14818,%r14819,%r14820,%r14821,%r14822,%r14823,%r14824,%r14825,%r14826,%r14827,%r14828,%r14829,%r14830,%r14831,%r14832,%r14833,%r14834,%r14835,%r14836,%r14837,%r14838,%r14839,%r14840,%r14841,%r14842,%r14843,%r14844,%r14845,%r14846,%r14847,%r14848,%r14849,%r14850,%r14851,%r14852,%r14853,%r14854,%r14855,%r14856,%r14857,%r14858,%r14859,%r14860,%r14861,%r14862,%r14863,%r14864,%r14865,%r14866,%r14867,%r14868,%r14869,%r14870,%r14871,%r14872,%r14873,%r14874,%r14875,%r14876,%r14877,%r14878,%r14879,%r14880,%r14881,%r14882,%r14883,%r14884,%r14885,%r14886,%r14887,%r14888,%r14889,%r14890,%r14891,%r14892,%r14893,%r14894,%r14895,%r14896,%r14897,%r14898,%r14899,%r14900,%r14901,%r14902,%r14903,%r14904,%r14905,%r14906,%r14907,%r14908,%r14909,%r14910}, {%r2434,%r2435,%r2436,%r2437}, %rd93, %p20, 1, 1; 2026-02-21T09:06:50.1986915Z // end inline asm 2026-02-21T09:06:50.1987077Z // begin inline asm 2026-02-21T09:06:50.1990006Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14783,%r14784,%r14785,%r14786,%r14787,%r14788,%r14789,%r14790,%r14791,%r14792,%r14793,%r14794,%r14795,%r14796,%r14797,%r14798,%r14799,%r14800,%r14801,%r14802,%r14803,%r14804,%r14805,%r14806,%r14807,%r14808,%r14809,%r14810,%r14811,%r14812,%r14813,%r14814,%r14815,%r14816,%r14817,%r14818,%r14819,%r14820,%r14821,%r14822,%r14823,%r14824,%r14825,%r14826,%r14827,%r14828,%r14829,%r14830,%r14831,%r14832,%r14833,%r14834,%r14835,%r14836,%r14837,%r14838,%r14839,%r14840,%r14841,%r14842,%r14843,%r14844,%r14845,%r14846,%r14847,%r14848,%r14849,%r14850,%r14851,%r14852,%r14853,%r14854,%r14855,%r14856,%r14857,%r14858,%r14859,%r14860,%r14861,%r14862,%r14863,%r14864,%r14865,%r14866,%r14867,%r14868,%r14869,%r14870,%r14871,%r14872,%r14873,%r14874,%r14875,%r14876,%r14877,%r14878,%r14879,%r14880,%r14881,%r14882,%r14883,%r14884,%r14885,%r14886,%r14887,%r14888,%r14889,%r14890,%r14891,%r14892,%r14893,%r14894,%r14895,%r14896,%r14897,%r14898,%r14899,%r14900,%r14901,%r14902,%r14903,%r14904,%r14905,%r14906,%r14907,%r14908,%r14909,%r14910}, {%r2694,%r2695,%r2696,%r2697}, %rd94, %p20, 1, 1; 2026-02-21T09:06:50.1992965Z // end inline asm 2026-02-21T09:06:50.1993162Z wgmma.commit_group.sync.aligned; 2026-02-21T09:06:50.1993389Z mov.b32 %r2826, %r12221; 2026-02-21T09:06:50.1993578Z mov.b32 %r2827, %r4402; 2026-02-21T09:06:50.1993753Z mov.b32 %r2828, %r4402; 2026-02-21T09:06:50.1993926Z // begin inline asm 2026-02-21T09:06:50.1996642Z // wait for regs: 
%r14783,%r14784,%r14785,%r14786,%r14787,%r14788,%r14789,%r14790,%r14791,%r14792,%r14793,%r14794,%r14795,%r14796,%r14797,%r14798,%r14799,%r14800,%r14801,%r14802,%r14803,%r14804,%r14805,%r14806,%r14807,%r14808,%r14809,%r14810,%r14811,%r14812,%r14813,%r14814,%r14815,%r14816,%r14817,%r14818,%r14819,%r14820,%r14821,%r14822,%r14823,%r14824,%r14825,%r14826,%r14827,%r14828,%r14829,%r14830,%r14831,%r14832,%r14833,%r14834,%r14835,%r14836,%r14837,%r14838,%r14839,%r14840,%r14841,%r14842,%r14843,%r14844,%r14845,%r14846,%r14847,%r14848,%r14849,%r14850,%r14851,%r14852,%r14853,%r14854,%r14855,%r14856,%r14857,%r14858,%r14859,%r14860,%r14861,%r14862,%r14863,%r14864,%r14865,%r14866,%r14867,%r14868,%r14869,%r14870,%r14871,%r14872,%r14873,%r14874,%r14875,%r14876,%r14877,%r14878,%r14879,%r14880,%r14881,%r14882,%r14883,%r14884,%r14885,%r14886,%r14887,%r14888,%r14889,%r14890,%r14891,%r14892,%r14893,%r14894,%r14895,%r14896,%r14897,%r14898,%r14899,%r14900,%r14901,%r14902,%r14903,%r14904,%r14905,%r14906,%r14907,%r14908,%r14909,%r14910,%r2826,%r2827,%r2828 2026-02-21T09:06:50.1999545Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:06:50.1999751Z // end inline asm 2026-02-21T09:06:50.1999901Z $L__tmp4: 2026-02-21T09:06:50.2000198Z .loc 1 55 80 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:80 2026-02-21T09:06:50.2000570Z add.s64 %rd55, %rd159, -32; 2026-02-21T09:06:50.2000763Z // begin inline asm 2026-02-21T09:06:50.2000934Z mov.u32 %r2960, 0x0; 2026-02-21T09:06:50.2001096Z mov.u32 %r2961, 0x0; 2026-02-21T09:06:50.2001261Z mov.u32 %r2962, 0x0; 2026-02-21T09:06:50.2001415Z mov.u32 %r2963, 0x0; 2026-02-21T09:06:50.2001651Z ld.global.v4.b32 { %r2960, %r2961, %r2962, %r2963 }, [ %rd55 + 0 ]; 2026-02-21T09:06:50.2001923Z // end inline asm 2026-02-21T09:06:50.2002380Z .loc 1 59 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:59:32 2026-02-21T09:06:50.2002734Z bar.sync 0; 2026-02-21T09:06:50.2002912Z st.shared.v2.b32 [%r13], {%r2960, %r2961}; 2026-02-21T09:06:50.2003153Z 
st.shared.v2.b32 [%r14], {%r2962, %r2963}; 2026-02-21T09:06:50.2003360Z bar.sync 0; 2026-02-21T09:06:50.2003518Z ld.shared.b16 %rs97, [%r15]; 2026-02-21T09:06:50.2003710Z ld.shared.b16 %rs98, [%r15+256]; 2026-02-21T09:06:50.2003932Z ld.shared.b16 %rs99, [%r15+16]; 2026-02-21T09:06:50.2004132Z ld.shared.b16 %rs100, [%r15+272]; 2026-02-21T09:06:50.2004344Z ld.shared.b16 %rs101, [%r16]; 2026-02-21T09:06:50.2004534Z ld.shared.b16 %rs102, [%r16+256]; 2026-02-21T09:06:50.2004738Z ld.shared.b16 %rs103, [%r16+16]; 2026-02-21T09:06:50.2004935Z ld.shared.b16 %rs104, [%r16+272]; 2026-02-21T09:06:50.2005137Z cvt.f32.bf16 %r3221, %rs97; 2026-02-21T09:06:50.2005327Z cvt.f32.bf16 %r3222, %rs98; 2026-02-21T09:06:50.2005508Z cvt.f32.bf16 %r3223, %rs101; 2026-02-21T09:06:50.2005710Z cvt.f32.bf16 %r3224, %rs102; 2026-02-21T09:06:50.2005969Z cvt.f32.bf16 %r3481, %rs99; 2026-02-21T09:06:50.2006158Z cvt.f32.bf16 %r3482, %rs100; 2026-02-21T09:06:50.2006336Z cvt.f32.bf16 %r3483, %rs103; 2026-02-21T09:06:50.2006662Z cvt.f32.bf16 %r3484, %rs104; 2026-02-21T09:06:50.2006987Z .loc 1 61 34 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:34 2026-02-21T09:06:50.2007355Z add.s32 %r4582, %r14782, 131072; 2026-02-21T09:06:50.2007566Z cvt.s64.s32 %rd65, %r4582; 2026-02-21T09:06:50.2007749Z add.s64 %rd56, %rd23, %rd65; 2026-02-21T09:06:50.2008077Z .loc 1 61 87 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:87 2026-02-21T09:06:50.2008429Z // begin inline asm 2026-02-21T09:06:50.2008602Z mov.u32 %r2964, 0x0; 2026-02-21T09:06:50.2008779Z ld.global.b32 { %r2964 }, [ %rd56 + 0 ]; 2026-02-21T09:06:50.2008995Z // end inline asm 2026-02-21T09:06:50.2009287Z .loc 1 69 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:69:28 2026-02-21T09:06:50.2009643Z bar.sync 0; 2026-02-21T09:06:50.2009807Z st.shared.b8 [%r17], %r2964; 2026-02-21T09:06:50.2009998Z prmt.b32 %r4583, %r2964, 0, 0x7771U; 2026-02-21T09:06:50.2010202Z st.shared.b8 [%r18+512], %r4583; 
2026-02-21T09:06:50.2010388Z prmt.b32 %r4584, %r2964, 0, 0x7772U; 2026-02-21T09:06:50.2010585Z st.shared.b8 [%r19+1024], %r4584; 2026-02-21T09:06:50.2010777Z prmt.b32 %r4585, %r2964, 0, 0x7773U; 2026-02-21T09:06:50.2010978Z st.shared.b8 [%r20+1536], %r4585; 2026-02-21T09:06:50.2011160Z bar.sync 0; 2026-02-21T09:06:50.2011332Z ld.shared.b32 %r4586, [%r21]; 2026-02-21T09:06:50.2011527Z prmt.b32 %r4587, %r4586, 0, 0x7770U; 2026-02-21T09:06:50.2011724Z cvt.u16.u32 %rs105, %r4587; 2026-02-21T09:06:50.2011916Z prmt.b32 %r4588, %r4586, 0, 0x7771U; 2026-02-21T09:06:50.2012109Z cvt.u16.u32 %rs106, %r4588; 2026-02-21T09:06:50.2012296Z prmt.b32 %r4589, %r4586, 0, 0x7772U; 2026-02-21T09:06:50.2012493Z cvt.u16.u32 %rs107, %r4589; 2026-02-21T09:06:50.2012702Z prmt.b32 %r4590, %r4586, 0, 0x7773U; 2026-02-21T09:06:50.2012900Z cvt.u16.u32 %rs108, %r4590; 2026-02-21T09:06:50.2013088Z ld.shared.b32 %r4591, [%r21+128]; 2026-02-21T09:06:50.2013373Z prmt.b32 %r4592, %r4591, 0, 0x7770U; 2026-02-21T09:06:50.2013572Z cvt.u16.u32 %rs109, %r4592; 2026-02-21T09:06:50.2013755Z prmt.b32 %r4593, %r4591, 0, 0x7771U; 2026-02-21T09:06:50.2013944Z cvt.u16.u32 %rs110, %r4593; 2026-02-21T09:06:50.2014123Z prmt.b32 %r4594, %r4591, 0, 0x7772U; 2026-02-21T09:06:50.2014312Z cvt.u16.u32 %rs111, %r4594; 2026-02-21T09:06:50.2014487Z prmt.b32 %r4595, %r4591, 0, 0x7773U; 2026-02-21T09:06:50.2014684Z cvt.u16.u32 %rs112, %r4595; 2026-02-21T09:06:50.2015003Z .loc 1 64 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:64:28 2026-02-21T09:06:50.2015376Z shl.b16 %rs113, %rs105, 4; 2026-02-21T09:06:50.2015557Z shl.b16 %rs114, %rs106, 4; 2026-02-21T09:06:50.2015815Z shl.b16 %rs115, %rs107, 4; 2026-02-21T09:06:50.2016054Z shl.b16 %rs116, %rs108, 4; 2026-02-21T09:06:50.2016236Z shl.b16 %rs117, %rs109, 4; 2026-02-21T09:06:50.2016414Z shl.b16 %rs118, %rs110, 4; 2026-02-21T09:06:50.2016718Z shl.b16 %rs119, %rs111, 4; 2026-02-21T09:06:50.2016893Z shl.b16 %rs120, %rs112, 4; 2026-02-21T09:06:50.2017200Z .loc 1 79 
58 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:79:58 2026-02-21T09:06:50.2017555Z selp.b16 %rs121, %rs113, %rs105, %p81; 2026-02-21T09:06:50.2017769Z cvt.s16.s8 %rs122, %rs121; 2026-02-21T09:06:50.2017943Z shr.s16 %rs123, %rs122, 4; 2026-02-21T09:06:50.2018123Z selp.b16 %rs124, %rs114, %rs106, %p81; 2026-02-21T09:06:50.2018332Z cvt.s16.s8 %rs125, %rs124; 2026-02-21T09:06:50.2018500Z shr.s16 %rs126, %rs125, 4; 2026-02-21T09:06:50.2018679Z selp.b16 %rs127, %rs115, %rs107, %p81; 2026-02-21T09:06:50.2018876Z cvt.s16.s8 %rs128, %rs127; 2026-02-21T09:06:50.2019055Z shr.s16 %rs129, %rs128, 4; 2026-02-21T09:06:50.2019247Z selp.b16 %rs130, %rs116, %rs108, %p81; 2026-02-21T09:06:50.2019552Z cvt.s16.s8 %rs131, %rs130; 2026-02-21T09:06:50.2019741Z shr.s16 %rs132, %rs131, 4; 2026-02-21T09:06:50.2019923Z selp.b16 %rs133, %rs117, %rs109, %p81; 2026-02-21T09:06:50.2020131Z cvt.s16.s8 %rs134, %rs133; 2026-02-21T09:06:50.2020304Z shr.s16 %rs135, %rs134, 4; 2026-02-21T09:06:50.2020492Z selp.b16 %rs136, %rs118, %rs110, %p81; 2026-02-21T09:06:50.2020698Z cvt.s16.s8 %rs137, %rs136; 2026-02-21T09:06:50.2020879Z shr.s16 %rs138, %rs137, 4; 2026-02-21T09:06:50.2021059Z selp.b16 %rs139, %rs119, %rs111, %p81; 2026-02-21T09:06:50.2021263Z cvt.s16.s8 %rs140, %rs139; 2026-02-21T09:06:50.2021450Z shr.s16 %rs141, %rs140, 4; 2026-02-21T09:06:50.2021629Z selp.b16 %rs142, %rs120, %rs112, %p81; 2026-02-21T09:06:50.2021848Z cvt.s16.s8 %rs143, %rs142; 2026-02-21T09:06:50.2022035Z shr.s16 %rs144, %rs143, 4; 2026-02-21T09:06:50.2022360Z .loc 1 84 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:84:32 2026-02-21T09:06:50.2022726Z cvt.rn.f32.s16 %r4596, %rs123; 2026-02-21T09:06:50.2022945Z cvt.rn.f32.s16 %r4597, %rs126; 2026-02-21T09:06:50.2023140Z cvt.rn.f32.s16 %r4598, %rs129; 2026-02-21T09:06:50.2023331Z cvt.rn.f32.s16 %r4599, %rs132; 2026-02-21T09:06:50.2023531Z cvt.rn.f32.s16 %r4600, %rs135; 2026-02-21T09:06:50.2023722Z cvt.rn.f32.s16 %r4601, %rs138; 
2026-02-21T09:06:50.2023916Z cvt.rn.f32.s16 %r4602, %rs141; 2026-02-21T09:06:50.2024104Z cvt.rn.f32.s16 %r4603, %rs144; 2026-02-21T09:06:50.2024295Z bar.sync 0; 2026-02-21T09:06:50.2024449Z st.shared.b32 [%r22], %r4596; 2026-02-21T09:06:50.2024657Z st.shared.b32 [%r22+8], %r4597; 2026-02-21T09:06:50.2024852Z st.shared.b32 [%r23], %r4598; 2026-02-21T09:06:50.2025047Z st.shared.b32 [%r23+8], %r4599; 2026-02-21T09:06:50.2025252Z st.shared.b32 [%r24], %r4600; 2026-02-21T09:06:50.2025455Z st.shared.b32 [%r24+8], %r4601; 2026-02-21T09:06:50.2025646Z st.shared.b32 [%r25], %r4602; 2026-02-21T09:06:50.2025827Z st.shared.b32 [%r25+8], %r4603; 2026-02-21T09:06:50.2026011Z $L__tmp5: 2026-02-21T09:06:50.2026364Z .loc 2 291 36 // standard.py:291:36 @[ cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:91:40 ] 2026-02-21T09:06:50.2027000Z // begin inline asm 2026-02-21T09:06:50.2027175Z fence.proxy.async.shared::cta; 2026-02-21T09:06:50.2027381Z // end inline asm 2026-02-21T09:06:50.2027544Z bar.sync 0; 2026-02-21T09:06:50.2027702Z wgmma.fence.sync.aligned; 2026-02-21T09:06:50.2027890Z // begin inline asm 2026-02-21T09:06:50.2030869Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14783,%r14784,%r14785,%r14786,%r14787,%r14788,%r14789,%r14790,%r14791,%r14792,%r14793,%r14794,%r14795,%r14796,%r14797,%r14798,%r14799,%r14800,%r14801,%r14802,%r14803,%r14804,%r14805,%r14806,%r14807,%r14808,%r14809,%r14810,%r14811,%r14812,%r14813,%r14814,%r14815,%r14816,%r14817,%r14818,%r14819,%r14820,%r14821,%r14822,%r14823,%r14824,%r14825,%r14826,%r14827,%r14828,%r14829,%r14830,%r14831,%r14832,%r14833,%r14834,%r14835,%r14836,%r14837,%r14838,%r14839,%r14840,%r14841,%r14842,%r14843,%r14844,%r14845,%r14846,%r14847,%r14848,%r14849,%r14850,%r14851,%r14852,%r14853,%r14854,%r14855,%r14856,%r14857,%r14858,%r14859,%r14860,%r14861,%r14862,%r14863,%r14864,%r14865,%r14866,%r14867,%r14868,%r14869,%r14870,%r14871,%r14872,%r14873,%r14874,%r14875,%r14876,%r14877,%r14878,%r14879,%r14880,%r14881,%r14882,%r14883,%r14884,%r14885,%r14886,%r14887,%r14888,%r14889,%r14890,%r14891,%r14892,%r14893,%r14894,%r14895,%r14896,%r14897,%r14898,%r14899,%r14900,%r14901,%r14902,%r14903,%r14904,%r14905,%r14906,%r14907,%r14908,%r14909,%r14910}, {%r3221,%r3222,%r3223,%r3224}, %rd93, %p20, 1, 1; 2026-02-21T09:06:50.2033929Z // end inline asm 2026-02-21T09:06:50.2034080Z // begin inline asm 2026-02-21T09:06:50.2037081Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14783,%r14784,%r14785,%r14786,%r14787,%r14788,%r14789,%r14790,%r14791,%r14792,%r14793,%r14794,%r14795,%r14796,%r14797,%r14798,%r14799,%r14800,%r14801,%r14802,%r14803,%r14804,%r14805,%r14806,%r14807,%r14808,%r14809,%r14810,%r14811,%r14812,%r14813,%r14814,%r14815,%r14816,%r14817,%r14818,%r14819,%r14820,%r14821,%r14822,%r14823,%r14824,%r14825,%r14826,%r14827,%r14828,%r14829,%r14830,%r14831,%r14832,%r14833,%r14834,%r14835,%r14836,%r14837,%r14838,%r14839,%r14840,%r14841,%r14842,%r14843,%r14844,%r14845,%r14846,%r14847,%r14848,%r14849,%r14850,%r14851,%r14852,%r14853,%r14854,%r14855,%r14856,%r14857,%r14858,%r14859,%r14860,%r14861,%r14862,%r14863,%r14864,%r14865,%r14866,%r14867,%r14868,%r14869,%r14870,%r14871,%r14872,%r14873,%r14874,%r14875,%r14876,%r14877,%r14878,%r14879,%r14880,%r14881,%r14882,%r14883,%r14884,%r14885,%r14886,%r14887,%r14888,%r14889,%r14890,%r14891,%r14892,%r14893,%r14894,%r14895,%r14896,%r14897,%r14898,%r14899,%r14900,%r14901,%r14902,%r14903,%r14904,%r14905,%r14906,%r14907,%r14908,%r14909,%r14910}, {%r3481,%r3482,%r3483,%r3484}, %rd94, %p20, 1, 1; 2026-02-21T09:06:50.2040090Z // end inline asm 2026-02-21T09:06:50.2040274Z wgmma.commit_group.sync.aligned; 2026-02-21T09:06:50.2040487Z mov.b32 %r3613, %r12221; 2026-02-21T09:06:50.2040660Z mov.b32 %r3614, %r4402; 2026-02-21T09:06:50.2040827Z mov.b32 %r3615, %r4402; 2026-02-21T09:06:50.2040994Z // begin inline asm 2026-02-21T09:06:50.2043596Z // wait for regs: 
%r14783,%r14784,%r14785,%r14786,%r14787,%r14788,%r14789,%r14790,%r14791,%r14792,%r14793,%r14794,%r14795,%r14796,%r14797,%r14798,%r14799,%r14800,%r14801,%r14802,%r14803,%r14804,%r14805,%r14806,%r14807,%r14808,%r14809,%r14810,%r14811,%r14812,%r14813,%r14814,%r14815,%r14816,%r14817,%r14818,%r14819,%r14820,%r14821,%r14822,%r14823,%r14824,%r14825,%r14826,%r14827,%r14828,%r14829,%r14830,%r14831,%r14832,%r14833,%r14834,%r14835,%r14836,%r14837,%r14838,%r14839,%r14840,%r14841,%r14842,%r14843,%r14844,%r14845,%r14846,%r14847,%r14848,%r14849,%r14850,%r14851,%r14852,%r14853,%r14854,%r14855,%r14856,%r14857,%r14858,%r14859,%r14860,%r14861,%r14862,%r14863,%r14864,%r14865,%r14866,%r14867,%r14868,%r14869,%r14870,%r14871,%r14872,%r14873,%r14874,%r14875,%r14876,%r14877,%r14878,%r14879,%r14880,%r14881,%r14882,%r14883,%r14884,%r14885,%r14886,%r14887,%r14888,%r14889,%r14890,%r14891,%r14892,%r14893,%r14894,%r14895,%r14896,%r14897,%r14898,%r14899,%r14900,%r14901,%r14902,%r14903,%r14904,%r14905,%r14906,%r14907,%r14908,%r14909,%r14910,%r3613,%r3614,%r3615 2026-02-21T09:06:50.2046582Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:06:50.2046780Z // end inline asm 2026-02-21T09:06:50.2046934Z $L__tmp6: 2026-02-21T09:06:50.2047223Z .loc 1 55 80 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:80 2026-02-21T09:06:50.2047576Z // begin inline asm 2026-02-21T09:06:50.2047736Z mov.u32 %r3747, 0x0; 2026-02-21T09:06:50.2047892Z mov.u32 %r3748, 0x0; 2026-02-21T09:06:50.2048045Z mov.u32 %r3749, 0x0; 2026-02-21T09:06:50.2048195Z mov.u32 %r3750, 0x0; 2026-02-21T09:06:50.2048415Z ld.global.v4.b32 { %r3747, %r3748, %r3749, %r3750 }, [ %rd159 + 0 ]; 2026-02-21T09:06:50.2048681Z // end inline asm 2026-02-21T09:06:50.2049058Z .loc 1 59 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:59:32 2026-02-21T09:06:50.2049479Z bar.sync 0; 2026-02-21T09:06:50.2049641Z st.shared.v2.b32 [%r13], {%r3747, %r3748}; 2026-02-21T09:06:50.2049871Z st.shared.v2.b32 [%r14], {%r3749, %r3750}; 
2026-02-21T09:06:50.2050073Z bar.sync 0; 2026-02-21T09:06:50.2050222Z ld.shared.b16 %rs145, [%r15]; 2026-02-21T09:06:50.2050416Z ld.shared.b16 %rs146, [%r15+256]; 2026-02-21T09:06:50.2050615Z ld.shared.b16 %rs147, [%r15+16]; 2026-02-21T09:06:50.2050811Z ld.shared.b16 %rs148, [%r15+272]; 2026-02-21T09:06:50.2051003Z ld.shared.b16 %rs149, [%r16]; 2026-02-21T09:06:50.2051207Z ld.shared.b16 %rs150, [%r16+256]; 2026-02-21T09:06:50.2051400Z ld.shared.b16 %rs151, [%r16+16]; 2026-02-21T09:06:50.2051594Z ld.shared.b16 %rs152, [%r16+272]; 2026-02-21T09:06:50.2051784Z cvt.f32.bf16 %r4008, %rs145; 2026-02-21T09:06:50.2051968Z cvt.f32.bf16 %r4009, %rs146; 2026-02-21T09:06:50.2052144Z cvt.f32.bf16 %r4010, %rs149; 2026-02-21T09:06:50.2052327Z cvt.f32.bf16 %r4011, %rs150; 2026-02-21T09:06:50.2052576Z cvt.f32.bf16 %r4268, %rs147; 2026-02-21T09:06:50.2052751Z cvt.f32.bf16 %r4269, %rs148; 2026-02-21T09:06:50.2052923Z cvt.f32.bf16 %r4270, %rs151; 2026-02-21T09:06:50.2053097Z cvt.f32.bf16 %r4271, %rs152; 2026-02-21T09:06:50.2053277Z add.s32 %r4604, %r4556, 56; 2026-02-21T09:06:50.2053592Z .loc 1 61 62 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:62 2026-02-21T09:06:50.2053942Z or.b32 %r4605, %r8, %r4604; 2026-02-21T09:06:50.2054115Z shl.b32 %r4606, %r4605, 13; 2026-02-21T09:06:50.2054292Z add.s32 %r4607, %r4606, %r49; 2026-02-21T09:06:50.2054606Z .loc 1 61 34 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:34 2026-02-21T09:06:50.2054952Z cvt.s64.s32 %rd66, %r4607; 2026-02-21T09:06:50.2055145Z add.s64 %rd60, %rd23, %rd66; 2026-02-21T09:06:50.2055457Z .loc 1 61 87 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:87 2026-02-21T09:06:50.2055806Z // begin inline asm 2026-02-21T09:06:50.2055965Z mov.u32 %r3751, 0x0; 2026-02-21T09:06:50.2056141Z ld.global.b32 { %r3751 }, [ %rd60 + 0 ]; 2026-02-21T09:06:50.2056347Z // end inline asm 2026-02-21T09:06:50.2056765Z .loc 1 69 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:69:28 
2026-02-21T09:06:50.2057107Z bar.sync 0; 2026-02-21T09:06:50.2057255Z st.shared.b8 [%r17], %r3751; 2026-02-21T09:06:50.2057444Z prmt.b32 %r4608, %r3751, 0, 0x7771U; 2026-02-21T09:06:50.2057645Z st.shared.b8 [%r18+512], %r4608; 2026-02-21T09:06:50.2057851Z prmt.b32 %r4609, %r3751, 0, 0x7772U; 2026-02-21T09:06:50.2058048Z st.shared.b8 [%r19+1024], %r4609; 2026-02-21T09:06:50.2058243Z prmt.b32 %r4610, %r3751, 0, 0x7773U; 2026-02-21T09:06:50.2058435Z st.shared.b8 [%r20+1536], %r4610; 2026-02-21T09:06:50.2058615Z bar.sync 0; 2026-02-21T09:06:50.2058762Z ld.shared.b32 %r4611, [%r21]; 2026-02-21T09:06:50.2058944Z prmt.b32 %r4612, %r4611, 0, 0x7770U; 2026-02-21T09:06:50.2059142Z cvt.u16.u32 %rs153, %r4612; 2026-02-21T09:06:50.2059320Z prmt.b32 %r4613, %r4611, 0, 0x7771U; 2026-02-21T09:06:50.2059515Z cvt.u16.u32 %rs154, %r4613; 2026-02-21T09:06:50.2059690Z prmt.b32 %r4614, %r4611, 0, 0x7772U; 2026-02-21T09:06:50.2059969Z cvt.u16.u32 %rs155, %r4614; 2026-02-21T09:06:50.2060143Z prmt.b32 %r4615, %r4611, 0, 0x7773U; 2026-02-21T09:06:50.2060348Z cvt.u16.u32 %rs156, %r4615; 2026-02-21T09:06:50.2060531Z ld.shared.b32 %r4616, [%r21+128]; 2026-02-21T09:06:50.2060723Z prmt.b32 %r4617, %r4616, 0, 0x7770U; 2026-02-21T09:06:50.2060915Z cvt.u16.u32 %rs157, %r4617; 2026-02-21T09:06:50.2061090Z prmt.b32 %r4618, %r4616, 0, 0x7771U; 2026-02-21T09:06:50.2061285Z cvt.u16.u32 %rs158, %r4618; 2026-02-21T09:06:50.2061462Z prmt.b32 %r4619, %r4616, 0, 0x7772U; 2026-02-21T09:06:50.2061652Z cvt.u16.u32 %rs159, %r4619; 2026-02-21T09:06:50.2061826Z prmt.b32 %r4620, %r4616, 0, 0x7773U; 2026-02-21T09:06:50.2062018Z cvt.u16.u32 %rs160, %r4620; 2026-02-21T09:06:50.2062472Z .loc 1 64 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:64:28 2026-02-21T09:06:50.2062820Z shl.b16 %rs161, %rs153, 4; 2026-02-21T09:06:50.2063000Z shl.b16 %rs162, %rs154, 4; 2026-02-21T09:06:50.2063176Z shl.b16 %rs163, %rs155, 4; 2026-02-21T09:06:50.2063349Z shl.b16 %rs164, %rs156, 4; 2026-02-21T09:06:50.2063516Z shl.b16 
%rs165, %rs157, 4; 2026-02-21T09:06:50.2063687Z shl.b16 %rs166, %rs158, 4; 2026-02-21T09:06:50.2063854Z shl.b16 %rs167, %rs159, 4; 2026-02-21T09:06:50.2064024Z shl.b16 %rs168, %rs160, 4; 2026-02-21T09:06:50.2064335Z .loc 1 79 58 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:79:58 2026-02-21T09:06:50.2064700Z selp.b16 %rs169, %rs161, %rs153, %p81; 2026-02-21T09:06:50.2064912Z cvt.s16.s8 %rs170, %rs169; 2026-02-21T09:06:50.2065083Z shr.s16 %rs171, %rs170, 4; 2026-02-21T09:06:50.2065265Z selp.b16 %rs172, %rs162, %rs154, %p81; 2026-02-21T09:06:50.2065462Z cvt.s16.s8 %rs173, %rs172; 2026-02-21T09:06:50.2065638Z shr.s16 %rs174, %rs173, 4; 2026-02-21T09:06:50.2065888Z selp.b16 %rs175, %rs163, %rs155, %p81; 2026-02-21T09:06:50.2066090Z cvt.s16.s8 %rs176, %rs175; 2026-02-21T09:06:50.2066259Z shr.s16 %rs177, %rs176, 4; 2026-02-21T09:06:50.2066442Z selp.b16 %rs178, %rs164, %rs156, %p81; 2026-02-21T09:06:50.2066776Z cvt.s16.s8 %rs179, %rs178; 2026-02-21T09:06:50.2066947Z shr.s16 %rs180, %rs179, 4; 2026-02-21T09:06:50.2067138Z selp.b16 %rs181, %rs165, %rs157, %p81; 2026-02-21T09:06:50.2067337Z cvt.s16.s8 %rs182, %rs181; 2026-02-21T09:06:50.2067509Z shr.s16 %rs183, %rs182, 4; 2026-02-21T09:06:50.2067684Z selp.b16 %rs184, %rs166, %rs158, %p81; 2026-02-21T09:06:50.2067882Z cvt.s16.s8 %rs185, %rs184; 2026-02-21T09:06:50.2068048Z shr.s16 %rs186, %rs185, 4; 2026-02-21T09:06:50.2068226Z selp.b16 %rs187, %rs167, %rs159, %p81; 2026-02-21T09:06:50.2068502Z cvt.s16.s8 %rs188, %rs187; 2026-02-21T09:06:50.2068672Z shr.s16 %rs189, %rs188, 4; 2026-02-21T09:06:50.2068851Z selp.b16 %rs190, %rs168, %rs160, %p81; 2026-02-21T09:06:50.2069051Z cvt.s16.s8 %rs191, %rs190; 2026-02-21T09:06:50.2069222Z shr.s16 %rs192, %rs191, 4; 2026-02-21T09:06:50.2069526Z .loc 1 84 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:84:32 2026-02-21T09:06:50.2069880Z cvt.rn.f32.s16 %r4621, %rs171; 2026-02-21T09:06:50.2070066Z cvt.rn.f32.s16 %r4622, %rs174; 2026-02-21T09:06:50.2070247Z 
cvt.rn.f32.s16 %r4623, %rs177; 2026-02-21T09:06:50.2070426Z cvt.rn.f32.s16 %r4624, %rs180; 2026-02-21T09:06:50.2070604Z cvt.rn.f32.s16 %r4625, %rs183; 2026-02-21T09:06:50.2070783Z cvt.rn.f32.s16 %r4626, %rs186; 2026-02-21T09:06:50.2070958Z cvt.rn.f32.s16 %r4627, %rs189; 2026-02-21T09:06:50.2071137Z cvt.rn.f32.s16 %r4628, %rs192; 2026-02-21T09:06:50.2071306Z bar.sync 0; 2026-02-21T09:06:50.2071453Z st.shared.b32 [%r22], %r4621; 2026-02-21T09:06:50.2071637Z st.shared.b32 [%r22+8], %r4622; 2026-02-21T09:06:50.2071825Z st.shared.b32 [%r23], %r4623; 2026-02-21T09:06:50.2072009Z st.shared.b32 [%r23+8], %r4624; 2026-02-21T09:06:50.2072197Z st.shared.b32 [%r24], %r4625; 2026-02-21T09:06:50.2072379Z st.shared.b32 [%r24+8], %r4626; 2026-02-21T09:06:50.2072563Z st.shared.b32 [%r25], %r4627; 2026-02-21T09:06:50.2072847Z st.shared.b32 [%r25+8], %r4628; 2026-02-21T09:06:50.2073020Z $L__tmp7: 2026-02-21T09:06:50.2073369Z .loc 2 291 36 // standard.py:291:36 @[ cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:91:40 ] 2026-02-21T09:06:50.2073782Z // begin inline asm 2026-02-21T09:06:50.2073962Z fence.proxy.async.shared::cta; 2026-02-21T09:06:50.2074153Z // end inline asm 2026-02-21T09:06:50.2074308Z bar.sync 0; 2026-02-21T09:06:50.2074467Z wgmma.fence.sync.aligned; 2026-02-21T09:06:50.2074646Z // begin inline asm 2026-02-21T09:06:50.2077657Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14783,%r14784,%r14785,%r14786,%r14787,%r14788,%r14789,%r14790,%r14791,%r14792,%r14793,%r14794,%r14795,%r14796,%r14797,%r14798,%r14799,%r14800,%r14801,%r14802,%r14803,%r14804,%r14805,%r14806,%r14807,%r14808,%r14809,%r14810,%r14811,%r14812,%r14813,%r14814,%r14815,%r14816,%r14817,%r14818,%r14819,%r14820,%r14821,%r14822,%r14823,%r14824,%r14825,%r14826,%r14827,%r14828,%r14829,%r14830,%r14831,%r14832,%r14833,%r14834,%r14835,%r14836,%r14837,%r14838,%r14839,%r14840,%r14841,%r14842,%r14843,%r14844,%r14845,%r14846,%r14847,%r14848,%r14849,%r14850,%r14851,%r14852,%r14853,%r14854,%r14855,%r14856,%r14857,%r14858,%r14859,%r14860,%r14861,%r14862,%r14863,%r14864,%r14865,%r14866,%r14867,%r14868,%r14869,%r14870,%r14871,%r14872,%r14873,%r14874,%r14875,%r14876,%r14877,%r14878,%r14879,%r14880,%r14881,%r14882,%r14883,%r14884,%r14885,%r14886,%r14887,%r14888,%r14889,%r14890,%r14891,%r14892,%r14893,%r14894,%r14895,%r14896,%r14897,%r14898,%r14899,%r14900,%r14901,%r14902,%r14903,%r14904,%r14905,%r14906,%r14907,%r14908,%r14909,%r14910}, {%r4008,%r4009,%r4010,%r4011}, %rd93, %p20, 1, 1; 2026-02-21T09:06:50.2080736Z // end inline asm 2026-02-21T09:06:50.2080887Z // begin inline asm 2026-02-21T09:06:50.2083769Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14783,%r14784,%r14785,%r14786,%r14787,%r14788,%r14789,%r14790,%r14791,%r14792,%r14793,%r14794,%r14795,%r14796,%r14797,%r14798,%r14799,%r14800,%r14801,%r14802,%r14803,%r14804,%r14805,%r14806,%r14807,%r14808,%r14809,%r14810,%r14811,%r14812,%r14813,%r14814,%r14815,%r14816,%r14817,%r14818,%r14819,%r14820,%r14821,%r14822,%r14823,%r14824,%r14825,%r14826,%r14827,%r14828,%r14829,%r14830,%r14831,%r14832,%r14833,%r14834,%r14835,%r14836,%r14837,%r14838,%r14839,%r14840,%r14841,%r14842,%r14843,%r14844,%r14845,%r14846,%r14847,%r14848,%r14849,%r14850,%r14851,%r14852,%r14853,%r14854,%r14855,%r14856,%r14857,%r14858,%r14859,%r14860,%r14861,%r14862,%r14863,%r14864,%r14865,%r14866,%r14867,%r14868,%r14869,%r14870,%r14871,%r14872,%r14873,%r14874,%r14875,%r14876,%r14877,%r14878,%r14879,%r14880,%r14881,%r14882,%r14883,%r14884,%r14885,%r14886,%r14887,%r14888,%r14889,%r14890,%r14891,%r14892,%r14893,%r14894,%r14895,%r14896,%r14897,%r14898,%r14899,%r14900,%r14901,%r14902,%r14903,%r14904,%r14905,%r14906,%r14907,%r14908,%r14909,%r14910}, {%r4268,%r4269,%r4270,%r4271}, %rd94, %p20, 1, 1; 2026-02-21T09:06:50.2086870Z // end inline asm 2026-02-21T09:06:50.2087043Z wgmma.commit_group.sync.aligned; 2026-02-21T09:06:50.2087247Z mov.b32 %r4400, %r12221; 2026-02-21T09:06:50.2087419Z mov.b32 %r4401, %r4402; 2026-02-21T09:06:50.2087581Z // begin inline asm 2026-02-21T09:06:50.2090190Z // wait for regs: 
%r14783,%r14784,%r14785,%r14786,%r14787,%r14788,%r14789,%r14790,%r14791,%r14792,%r14793,%r14794,%r14795,%r14796,%r14797,%r14798,%r14799,%r14800,%r14801,%r14802,%r14803,%r14804,%r14805,%r14806,%r14807,%r14808,%r14809,%r14810,%r14811,%r14812,%r14813,%r14814,%r14815,%r14816,%r14817,%r14818,%r14819,%r14820,%r14821,%r14822,%r14823,%r14824,%r14825,%r14826,%r14827,%r14828,%r14829,%r14830,%r14831,%r14832,%r14833,%r14834,%r14835,%r14836,%r14837,%r14838,%r14839,%r14840,%r14841,%r14842,%r14843,%r14844,%r14845,%r14846,%r14847,%r14848,%r14849,%r14850,%r14851,%r14852,%r14853,%r14854,%r14855,%r14856,%r14857,%r14858,%r14859,%r14860,%r14861,%r14862,%r14863,%r14864,%r14865,%r14866,%r14867,%r14868,%r14869,%r14870,%r14871,%r14872,%r14873,%r14874,%r14875,%r14876,%r14877,%r14878,%r14879,%r14880,%r14881,%r14882,%r14883,%r14884,%r14885,%r14886,%r14887,%r14888,%r14889,%r14890,%r14891,%r14892,%r14893,%r14894,%r14895,%r14896,%r14897,%r14898,%r14899,%r14900,%r14901,%r14902,%r14903,%r14904,%r14905,%r14906,%r14907,%r14908,%r14909,%r14910,%r4400,%r4401,%r4402 2026-02-21T09:06:50.2093073Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:06:50.2093264Z // end inline asm 2026-02-21T09:06:50.2093413Z $L__tmp8: 2026-02-21T09:06:50.2093701Z .loc 1 47 124 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:47:124 2026-02-21T09:06:50.2094066Z add.s64 %rd160, %rd160, 32; 2026-02-21T09:06:50.2094247Z add.s64 %rd159, %rd159, 128; 2026-02-21T09:06:50.2094431Z add.s32 %r14782, %r14782, 262144; 2026-02-21T09:06:50.2094633Z setp.lt.u64 %p29, %rd160, 480; 2026-02-21T09:06:50.2094950Z @%p29 bra $L__BB0_3; 2026-02-21T09:06:50.2095165Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:06:50.2095566Z .loc 1 94 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:94:28 2026-02-21T09:06:50.2095948Z cvt.rn.bf16x2.f32 %r4633, %r14784, %r14783; 2026-02-21T09:06:50.2096186Z cvt.rn.bf16x2.f32 %r4634, %r14786, %r14785; 2026-02-21T09:06:50.2096413Z cvt.rn.bf16x2.f32 %r4635, %r14788, %r14787; 
2026-02-21T09:06:50.2096774Z cvt.rn.bf16x2.f32 %r4636, %r14790, %r14789; 2026-02-21T09:06:50.2096999Z cvt.rn.bf16x2.f32 %r4637, %r14792, %r14791; 2026-02-21T09:06:50.2097222Z cvt.rn.bf16x2.f32 %r4638, %r14794, %r14793; 2026-02-21T09:06:50.2097443Z cvt.rn.bf16x2.f32 %r4639, %r14796, %r14795; 2026-02-21T09:06:50.2097667Z cvt.rn.bf16x2.f32 %r4640, %r14798, %r14797; 2026-02-21T09:06:50.2097886Z cvt.rn.bf16x2.f32 %r4641, %r14800, %r14799; 2026-02-21T09:06:50.2098108Z cvt.rn.bf16x2.f32 %r4642, %r14802, %r14801; 2026-02-21T09:06:50.2098354Z cvt.rn.bf16x2.f32 %r4643, %r14804, %r14803; 2026-02-21T09:06:50.2098655Z cvt.rn.bf16x2.f32 %r4644, %r14806, %r14805; 2026-02-21T09:06:50.2098883Z cvt.rn.bf16x2.f32 %r4645, %r14808, %r14807; 2026-02-21T09:06:50.2099106Z cvt.rn.bf16x2.f32 %r4646, %r14810, %r14809; 2026-02-21T09:06:50.2099330Z cvt.rn.bf16x2.f32 %r4647, %r14812, %r14811; 2026-02-21T09:06:50.2099549Z cvt.rn.bf16x2.f32 %r4648, %r14814, %r14813; 2026-02-21T09:06:50.2099772Z cvt.rn.bf16x2.f32 %r4649, %r14816, %r14815; 2026-02-21T09:06:50.2099993Z cvt.rn.bf16x2.f32 %r4650, %r14818, %r14817; 2026-02-21T09:06:50.2100215Z cvt.rn.bf16x2.f32 %r4651, %r14820, %r14819; 2026-02-21T09:06:50.2100436Z cvt.rn.bf16x2.f32 %r4652, %r14822, %r14821; 2026-02-21T09:06:50.2100658Z cvt.rn.bf16x2.f32 %r4653, %r14824, %r14823; 2026-02-21T09:06:50.2100879Z cvt.rn.bf16x2.f32 %r4654, %r14826, %r14825; 2026-02-21T09:06:50.2101100Z cvt.rn.bf16x2.f32 %r4655, %r14828, %r14827; 2026-02-21T09:06:50.2101329Z cvt.rn.bf16x2.f32 %r4656, %r14830, %r14829; 2026-02-21T09:06:50.2101554Z cvt.rn.bf16x2.f32 %r4657, %r14832, %r14831; 2026-02-21T09:06:50.2101778Z cvt.rn.bf16x2.f32 %r4658, %r14834, %r14833; 2026-02-21T09:06:50.2102000Z cvt.rn.bf16x2.f32 %r4659, %r14836, %r14835; 2026-02-21T09:06:50.2102228Z cvt.rn.bf16x2.f32 %r4660, %r14838, %r14837; 2026-02-21T09:06:50.2102457Z cvt.rn.bf16x2.f32 %r4661, %r14840, %r14839; 2026-02-21T09:06:50.2102539Z cvt.rn.bf16x2.f32 %r4662, %r14842, %r14841; 2026-02-21T09:06:50.2102613Z 
cvt.rn.bf16x2.f32 %r4663, %r14844, %r14843; 2026-02-21T09:06:50.2102687Z cvt.rn.bf16x2.f32 %r4664, %r14846, %r14845; 2026-02-21T09:06:50.2102765Z cvt.rn.bf16x2.f32 %r4665, %r14848, %r14847; 2026-02-21T09:06:50.2102845Z cvt.rn.bf16x2.f32 %r4666, %r14850, %r14849; 2026-02-21T09:06:50.2102921Z cvt.rn.bf16x2.f32 %r4667, %r14852, %r14851; 2026-02-21T09:06:50.2103002Z cvt.rn.bf16x2.f32 %r4668, %r14854, %r14853; 2026-02-21T09:06:50.2103079Z cvt.rn.bf16x2.f32 %r4669, %r14856, %r14855; 2026-02-21T09:06:50.2103161Z cvt.rn.bf16x2.f32 %r4670, %r14858, %r14857; 2026-02-21T09:06:50.2103239Z cvt.rn.bf16x2.f32 %r4671, %r14860, %r14859; 2026-02-21T09:06:50.2103325Z cvt.rn.bf16x2.f32 %r4672, %r14862, %r14861; 2026-02-21T09:06:50.2103486Z cvt.rn.bf16x2.f32 %r4673, %r14864, %r14863; 2026-02-21T09:06:50.2103564Z cvt.rn.bf16x2.f32 %r4674, %r14866, %r14865; 2026-02-21T09:06:50.2103645Z cvt.rn.bf16x2.f32 %r4675, %r14868, %r14867; 2026-02-21T09:06:50.2103722Z cvt.rn.bf16x2.f32 %r4676, %r14870, %r14869; 2026-02-21T09:06:50.2103800Z cvt.rn.bf16x2.f32 %r4677, %r14872, %r14871; 2026-02-21T09:06:50.2103879Z cvt.rn.bf16x2.f32 %r4678, %r14874, %r14873; 2026-02-21T09:06:50.2103954Z cvt.rn.bf16x2.f32 %r4679, %r14876, %r14875; 2026-02-21T09:06:50.2104029Z cvt.rn.bf16x2.f32 %r4680, %r14878, %r14877; 2026-02-21T09:06:50.2104103Z cvt.rn.bf16x2.f32 %r4681, %r14880, %r14879; 2026-02-21T09:06:50.2104183Z cvt.rn.bf16x2.f32 %r4682, %r14882, %r14881; 2026-02-21T09:06:50.2104258Z cvt.rn.bf16x2.f32 %r4683, %r14884, %r14883; 2026-02-21T09:06:50.2104467Z cvt.rn.bf16x2.f32 %r4684, %r14886, %r14885; 2026-02-21T09:06:50.2104550Z cvt.rn.bf16x2.f32 %r4685, %r14888, %r14887; 2026-02-21T09:06:50.2104626Z cvt.rn.bf16x2.f32 %r4686, %r14890, %r14889; 2026-02-21T09:06:50.2104704Z cvt.rn.bf16x2.f32 %r4687, %r14892, %r14891; 2026-02-21T09:06:50.2104796Z cvt.rn.bf16x2.f32 %r4688, %r14894, %r14893; 2026-02-21T09:06:50.2104875Z cvt.rn.bf16x2.f32 %r4689, %r14896, %r14895; 2026-02-21T09:06:50.2104952Z cvt.rn.bf16x2.f32 %r4690, 
%r14898, %r14897; 2026-02-21T09:06:50.2105029Z cvt.rn.bf16x2.f32 %r4691, %r14900, %r14899; 2026-02-21T09:06:50.2105108Z cvt.rn.bf16x2.f32 %r4692, %r14902, %r14901; 2026-02-21T09:06:50.2105184Z cvt.rn.bf16x2.f32 %r4693, %r14904, %r14903; 2026-02-21T09:06:50.2105258Z cvt.rn.bf16x2.f32 %r4694, %r14906, %r14905; 2026-02-21T09:06:50.2105339Z cvt.rn.bf16x2.f32 %r4695, %r14908, %r14907; 2026-02-21T09:06:50.2105414Z cvt.rn.bf16x2.f32 %r4696, %r14910, %r14909; 2026-02-21T09:06:50.2105674Z .loc 1 95 43 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:95:43 2026-02-21T09:06:50.2105742Z bar.sync 0; 2026-02-21T09:06:50.2105933Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r26], {%r4633, %r4634, %r4635, %r4636}; 2026-02-21T09:06:50.2106120Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r27], {%r4649, %r4650, %r4651, %r4652}; 2026-02-21T09:06:50.2106295Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r28], {%r4665, %r4666, %r4667, %r4668}; 2026-02-21T09:06:50.2106605Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r29], {%r4681, %r4682, %r4683, %r4684}; 2026-02-21T09:06:50.2106790Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r30], {%r4637, %r4638, %r4639, %r4640}; 2026-02-21T09:06:50.2106963Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r31], {%r4653, %r4654, %r4655, %r4656}; 2026-02-21T09:06:50.2107141Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r32], {%r4669, %r4670, %r4671, %r4672}; 2026-02-21T09:06:50.2107314Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r33], {%r4685, %r4686, %r4687, %r4688}; 2026-02-21T09:06:50.2107493Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r34], {%r4641, %r4642, %r4643, %r4644}; 2026-02-21T09:06:50.2107670Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r35], {%r4657, %r4658, %r4659, %r4660}; 2026-02-21T09:06:50.2107847Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r36], {%r4673, %r4674, %r4675, %r4676}; 2026-02-21T09:06:50.2108021Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r37], {%r4689, %r4690, %r4691, %r4692}; 2026-02-21T09:06:50.2108227Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r38], {%r4645, %r4646, %r4647, %r4648}; 2026-02-21T09:06:50.2108479Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r39], {%r4661, %r4662, %r4663, %r4664}; 2026-02-21T09:06:50.2108658Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r40], {%r4677, %r4678, %r4679, %r4680}; 2026-02-21T09:06:50.2108836Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r41], {%r4693, %r4694, %r4695, %r4696}; 2026-02-21T09:06:50.2108901Z // begin inline asm 2026-02-21T09:06:50.2108984Z fence.proxy.async.shared::cta; 2026-02-21T09:06:50.2109046Z // end inline asm 2026-02-21T09:06:50.2109110Z bar.sync 0; 2026-02-21T09:06:50.2109183Z elect.sync %r4697|%p32, -1; 2026-02-21T09:06:50.2109269Z shfl.sync.idx.b32 %r4698, %r4, 0, 31, -1; 2026-02-21T09:06:50.2109432Z and.pred %p30, %p82, %p32; 2026-02-21T09:06:50.2109495Z and.b32 %r4699, %r4698, 3; 2026-02-21T09:06:50.2109558Z shl.b32 %r4700, %r4699, 15; 2026-02-21T09:06:50.2109630Z add.s32 %r7962, %r12221, %r4700; 2026-02-21T09:06:50.2109694Z shl.b32 %r310, %r4699, 6; 2026-02-21T09:06:50.2109757Z or.b32 %r4629, %r310, %r48; 2026-02-21T09:06:50.2109819Z // begin inline asm 2026-02-21T09:06:50.2110057Z @%p30 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd158, {%r4629, %r4630}], [%r7962]; 2026-02-21T09:06:50.2110130Z // end inline asm 2026-02-21T09:06:50.2110206Z cp.async.bulk.commit_group; 2026-02-21T09:06:50.2110287Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:06:50.2110344Z bar.sync 0; 2026-02-21T09:06:50.2110630Z .loc 1 26 144 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:26:144 2026-02-21T09:06:50.2110755Z add.s32 %r4702, %r14781, 1056; 2026-02-21T09:06:50.2110960Z .loc 1 32 35 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:32:35 2026-02-21T09:06:50.2111026Z shr.s32 %r4703, %r4702, 31; 2026-02-21T09:06:50.2111088Z shr.u32 %r4704, %r4703, 26; 2026-02-21T09:06:50.2111159Z add.s32 %r4705, %r4702, %r4704; 2026-02-21T09:06:50.2111222Z shr.s32 %r4706, %r4705, 6; 
2026-02-21T09:06:50.2111416Z .loc 1 33 33 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:33:33 2026-02-21T09:06:50.2111481Z shl.b32 %r4707, %r4706, 1; 2026-02-21T09:06:50.2111674Z .loc 1 34 39 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:34:39 2026-02-21T09:06:50.2111736Z sub.s32 %r4708, 16, %r4707; 2026-02-21T09:06:50.2111932Z .loc 1 34 52 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:34:52 2026-02-21T09:06:50.2112000Z min.s32 %r4709, %r4708, 2; 2026-02-21T09:06:50.2112255Z .loc 1 35 45 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:35:45 2026-02-21T09:06:50.2112323Z and.b32 %r4710, %r4705, -64; 2026-02-21T09:06:50.2112394Z sub.s32 %r4711, %r4702, %r4710; 2026-02-21T09:06:50.2112586Z .loc 1 36 51 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:36:51 2026-02-21T09:06:50.2112651Z div.s32 %r4712, %r4711, %r4709; 2026-02-21T09:06:50.2112855Z .loc 1 35 64 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:35:64 2026-02-21T09:06:50.2112927Z mul.lo.s32 %r4713, %r4712, %r4709; 2026-02-21T09:06:50.2112991Z sub.s32 %r4714, %r4711, %r4713; 2026-02-21T09:06:50.2113190Z .loc 1 35 30 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:35:30 2026-02-21T09:06:50.2113254Z add.s32 %r4715, %r4714, %r4707; 2026-02-21T09:06:50.2113447Z .loc 1 37 27 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:37:27 2026-02-21T09:06:50.2113512Z shl.b32 %r7961, %r4715, 8; 2026-02-21T09:06:50.2113706Z .loc 1 39 27 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:39:27 2026-02-21T09:06:50.2113771Z shl.b32 %r312, %r4712, 8; 2026-02-21T09:06:50.2113962Z .loc 1 40 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:40:32 2026-02-21T09:06:50.2114028Z or.b32 %r313, %r312, %r7; 2026-02-21T09:06:50.2114227Z .loc 1 47 124 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:47:124 2026-02-21T09:06:50.2114290Z shl.b32 %r4716, %r4715, 18; 2026-02-21T09:06:50.2114357Z or.b32 %r4717, %r44, %r4716; 
2026-02-21T09:06:50.2114430Z mad.wide.s32 %rd161, %r4717, 2, %rd4; 2026-02-21T09:06:50.2114496Z add.s32 %r14911, %r45, %r312; 2026-02-21T09:06:50.2114562Z mov.b32 %r14912, 0f00000000; 2026-02-21T09:06:50.2114625Z mov.b64 %rd162, -32; 2026-02-21T09:06:50.2114688Z mov.b32 %r14913, %r14912; 2026-02-21T09:06:50.2114752Z mov.b32 %r14914, %r14912; 2026-02-21T09:06:50.2114822Z mov.b32 %r14915, %r14912; 2026-02-21T09:06:50.2114882Z mov.b32 %r14916, %r14912; 2026-02-21T09:06:50.2115005Z mov.b32 %r14917, %r14912; 2026-02-21T09:06:50.2115069Z mov.b32 %r14918, %r14912; 2026-02-21T09:06:50.2115130Z mov.b32 %r14919, %r14912; 2026-02-21T09:06:50.2115189Z mov.b32 %r14920, %r14912; 2026-02-21T09:06:50.2115250Z mov.b32 %r14921, %r14912; 2026-02-21T09:06:50.2115314Z mov.b32 %r14922, %r14912; 2026-02-21T09:06:50.2115373Z mov.b32 %r14923, %r14912; 2026-02-21T09:06:50.2115433Z mov.b32 %r14924, %r14912; 2026-02-21T09:06:50.2115497Z mov.b32 %r14925, %r14912; 2026-02-21T09:06:50.2115557Z mov.b32 %r14926, %r14912; 2026-02-21T09:06:50.2115617Z mov.b32 %r14927, %r14912; 2026-02-21T09:06:50.2115676Z mov.b32 %r14928, %r14912; 2026-02-21T09:06:50.2115740Z mov.b32 %r14929, %r14912; 2026-02-21T09:06:50.2115800Z mov.b32 %r14930, %r14912; 2026-02-21T09:06:50.2115928Z mov.b32 %r14931, %r14912; 2026-02-21T09:06:50.2116048Z mov.b32 %r14932, %r14912; 2026-02-21T09:06:50.2116114Z mov.b32 %r14933, %r14912; 2026-02-21T09:06:50.2116175Z mov.b32 %r14934, %r14912; 2026-02-21T09:06:50.2116237Z mov.b32 %r14935, %r14912; 2026-02-21T09:06:50.2116304Z mov.b32 %r14936, %r14912; 2026-02-21T09:06:50.2116362Z mov.b32 %r14937, %r14912; 2026-02-21T09:06:50.2116621Z mov.b32 %r14938, %r14912; 2026-02-21T09:06:50.2116735Z mov.b32 %r14939, %r14912; 2026-02-21T09:06:50.2116828Z mov.b32 %r14940, %r14912; 2026-02-21T09:06:50.2116908Z mov.b32 %r14941, %r14912; 2026-02-21T09:06:50.2116970Z mov.b32 %r14942, %r14912; 2026-02-21T09:06:50.2117032Z mov.b32 %r14943, %r14912; 2026-02-21T09:06:50.2117093Z mov.b32 %r14944, %r14912; 
2026-02-21T09:06:50.2117151Z mov.b32 %r14945, %r14912; 2026-02-21T09:06:50.2117214Z mov.b32 %r14946, %r14912; 2026-02-21T09:06:50.2117273Z mov.b32 %r14947, %r14912; 2026-02-21T09:06:50.2117333Z mov.b32 %r14948, %r14912; 2026-02-21T09:06:50.2117397Z mov.b32 %r14949, %r14912; 2026-02-21T09:06:50.2117544Z mov.b32 %r14950, %r14912; 2026-02-21T09:06:50.2117607Z mov.b32 %r14951, %r14912; 2026-02-21T09:06:50.2117668Z mov.b32 %r14952, %r14912; 2026-02-21T09:06:50.2117733Z mov.b32 %r14953, %r14912; 2026-02-21T09:06:50.2117792Z mov.b32 %r14954, %r14912; 2026-02-21T09:06:50.2117851Z mov.b32 %r14955, %r14912; 2026-02-21T09:06:50.2117910Z mov.b32 %r14956, %r14912; 2026-02-21T09:06:50.2117973Z mov.b32 %r14957, %r14912; 2026-02-21T09:06:50.2118031Z mov.b32 %r14958, %r14912; 2026-02-21T09:06:50.2118092Z mov.b32 %r14959, %r14912; 2026-02-21T09:06:50.2118154Z mov.b32 %r14960, %r14912; 2026-02-21T09:06:50.2118214Z mov.b32 %r14961, %r14912; 2026-02-21T09:06:50.2118273Z mov.b32 %r14962, %r14912; 2026-02-21T09:06:50.2118337Z mov.b32 %r14963, %r14912; 2026-02-21T09:06:50.2118397Z mov.b32 %r14964, %r14912; 2026-02-21T09:06:50.2118458Z mov.b32 %r14965, %r14912; 2026-02-21T09:06:50.2118517Z mov.b32 %r14966, %r14912; 2026-02-21T09:06:50.2118585Z mov.b32 %r14967, %r14912; 2026-02-21T09:06:50.2118647Z mov.b32 %r14968, %r14912; 2026-02-21T09:06:50.2118706Z mov.b32 %r14969, %r14912; 2026-02-21T09:06:50.2118770Z mov.b32 %r14970, %r14912; 2026-02-21T09:06:50.2118831Z mov.b32 %r14971, %r14912; 2026-02-21T09:06:50.2118890Z mov.b32 %r14972, %r14912; 2026-02-21T09:06:50.2118949Z mov.b32 %r14973, %r14912; 2026-02-21T09:06:50.2119012Z mov.b32 %r14974, %r14912; 2026-02-21T09:06:50.2119071Z mov.b32 %r14975, %r14912; 2026-02-21T09:06:50.2119129Z mov.b32 %r14976, %r14912; 2026-02-21T09:06:50.2119190Z mov.b32 %r14977, %r14912; 2026-02-21T09:06:50.2119251Z mov.b32 %r14978, %r14912; 2026-02-21T09:06:50.2119310Z mov.b32 %r14979, %r14912; 2026-02-21T09:06:50.2119367Z mov.b32 %r14980, %r14912; 
2026-02-21T09:06:50.2119440Z mov.b32 %r14981, %r14912; 2026-02-21T09:06:50.2119506Z mov.b32 %r14982, %r14912; 2026-02-21T09:06:50.2119566Z mov.b32 %r14983, %r14912; 2026-02-21T09:06:50.2119630Z mov.b32 %r14984, %r14912; 2026-02-21T09:06:50.2119693Z mov.b32 %r14985, %r14912; 2026-02-21T09:06:50.2119754Z mov.b32 %r14986, %r14912; 2026-02-21T09:06:50.2119813Z mov.b32 %r14987, %r14912; 2026-02-21T09:06:50.2119881Z mov.b32 %r14988, %r14912; 2026-02-21T09:06:50.2120022Z mov.b32 %r14989, %r14912; 2026-02-21T09:06:50.2120081Z mov.b32 %r14990, %r14912; 2026-02-21T09:06:50.2120143Z mov.b32 %r14991, %r14912; 2026-02-21T09:06:50.2120207Z mov.b32 %r14992, %r14912; 2026-02-21T09:06:50.2120265Z mov.b32 %r14993, %r14912; 2026-02-21T09:06:50.2120324Z mov.b32 %r14994, %r14912; 2026-02-21T09:06:50.2120393Z mov.b32 %r14995, %r14912; 2026-02-21T09:06:50.2120453Z mov.b32 %r14996, %r14912; 2026-02-21T09:06:50.2120516Z mov.b32 %r14997, %r14912; 2026-02-21T09:06:50.2120579Z mov.b32 %r14998, %r14912; 2026-02-21T09:06:50.2120638Z mov.b32 %r14999, %r14912; 2026-02-21T09:06:50.2120698Z mov.b32 %r15000, %r14912; 2026-02-21T09:06:50.2120758Z mov.b32 %r15001, %r14912; 2026-02-21T09:06:50.2120822Z mov.b32 %r15002, %r14912; 2026-02-21T09:06:50.2121009Z mov.b32 %r15003, %r14912; 2026-02-21T09:06:50.2121073Z mov.b32 %r15004, %r14912; 2026-02-21T09:06:50.2121136Z mov.b32 %r15005, %r14912; 2026-02-21T09:06:50.2121196Z mov.b32 %r15006, %r14912; 2026-02-21T09:06:50.2121258Z mov.b32 %r15007, %r14912; 2026-02-21T09:06:50.2121325Z mov.b32 %r15008, %r14912; 2026-02-21T09:06:50.2121383Z mov.b32 %r15009, %r14912; 2026-02-21T09:06:50.2121443Z mov.b32 %r15010, %r14912; 2026-02-21T09:06:50.2121502Z mov.b32 %r15011, %r14912; 2026-02-21T09:06:50.2121569Z mov.b32 %r15012, %r14912; 2026-02-21T09:06:50.2121627Z mov.b32 %r15013, %r14912; 2026-02-21T09:06:50.2121684Z mov.b32 %r15014, %r14912; 2026-02-21T09:06:50.2121747Z mov.b32 %r15015, %r14912; 2026-02-21T09:06:50.2121805Z mov.b32 %r15016, %r14912; 
2026-02-21T09:06:50.2121864Z mov.b32 %r15017, %r14912; 2026-02-21T09:06:50.2121923Z mov.b32 %r15018, %r14912; 2026-02-21T09:06:50.2121988Z mov.b32 %r15019, %r14912; 2026-02-21T09:06:50.2122047Z mov.b32 %r15020, %r14912; 2026-02-21T09:06:50.2122112Z mov.b32 %r15021, %r14912; 2026-02-21T09:06:50.2122236Z mov.b32 %r15022, %r14912; 2026-02-21T09:06:50.2122301Z mov.b32 %r15023, %r14912; 2026-02-21T09:06:50.2122361Z mov.b32 %r15024, %r14912; 2026-02-21T09:06:50.2122422Z mov.b32 %r15025, %r14912; 2026-02-21T09:06:50.2122489Z mov.b32 %r15026, %r14912; 2026-02-21T09:06:50.2122550Z mov.b32 %r15027, %r14912; 2026-02-21T09:06:50.2122609Z mov.b32 %r15028, %r14912; 2026-02-21T09:06:50.2122672Z mov.b32 %r15029, %r14912; 2026-02-21T09:06:50.2122732Z mov.b32 %r15030, %r14912; 2026-02-21T09:06:50.2122790Z mov.b32 %r15031, %r14912; 2026-02-21T09:06:50.2122849Z mov.b32 %r15032, %r14912; 2026-02-21T09:06:50.2122911Z mov.b32 %r15033, %r14912; 2026-02-21T09:06:50.2122972Z mov.b32 %r15034, %r14912; 2026-02-21T09:06:50.2123032Z mov.b32 %r15035, %r14912; 2026-02-21T09:06:50.2123093Z mov.b32 %r15036, %r14912; 2026-02-21T09:06:50.2123152Z mov.b32 %r15037, %r14912; 2026-02-21T09:06:50.2123212Z mov.b32 %r15038, %r14912; 2026-02-21T09:06:50.2123275Z mov.b32 %r15039, %r14912; 2026-02-21T09:06:50.2123397Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T09:06:50.2123506Z // => This Inner Loop Header: Depth=2 2026-02-21T09:06:50.2123708Z .loc 1 55 80 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:80 2026-02-21T09:06:50.2123779Z add.s64 %rd69, %rd161, -96; 2026-02-21T09:06:50.2123840Z // begin inline asm 2026-02-21T09:06:50.2123900Z mov.u32 %r4718, 0x0; 2026-02-21T09:06:50.2123962Z mov.u32 %r4719, 0x0; 2026-02-21T09:06:50.2124020Z mov.u32 %r4720, 0x0; 2026-02-21T09:06:50.2124078Z mov.u32 %r4721, 0x0; 2026-02-21T09:06:50.2124208Z ld.global.v4.b32 { %r4718, %r4719, %r4720, %r4721 }, [ %rd69 + 0 ]; 2026-02-21T09:06:50.2124270Z // end inline asm 2026-02-21T09:06:50.2124464Z .loc 1 59 32 
// cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:59:32 2026-02-21T09:06:50.2124522Z bar.sync 0; 2026-02-21T09:06:50.2124613Z st.shared.v2.b32 [%r13], {%r4718, %r4719}; 2026-02-21T09:06:50.2124692Z st.shared.v2.b32 [%r14], {%r4720, %r4721}; 2026-02-21T09:06:50.2124749Z bar.sync 0; 2026-02-21T09:06:50.2124824Z ld.shared.b16 %rs193, [%r42]; 2026-02-21T09:06:50.2124956Z ld.shared.b16 %rs194, [%r42+256]; 2026-02-21T09:06:50.2125026Z ld.shared.b16 %rs195, [%r42+16]; 2026-02-21T09:06:50.2125092Z ld.shared.b16 %rs196, [%r42+272]; 2026-02-21T09:06:50.2125175Z ld.shared.b16 %rs197, [%r43]; 2026-02-21T09:06:50.2125242Z ld.shared.b16 %rs198, [%r43+256]; 2026-02-21T09:06:50.2125308Z ld.shared.b16 %rs199, [%r43+16]; 2026-02-21T09:06:50.2125375Z ld.shared.b16 %rs200, [%r43+272]; 2026-02-21T09:06:50.2125443Z cvt.f32.bf16 %r4979, %rs193; 2026-02-21T09:06:50.2125505Z cvt.f32.bf16 %r4980, %rs194; 2026-02-21T09:06:50.2125568Z cvt.f32.bf16 %r4981, %rs197; 2026-02-21T09:06:50.2125634Z cvt.f32.bf16 %r4982, %rs198; 2026-02-21T09:06:50.2125697Z cvt.f32.bf16 %r5239, %rs195; 2026-02-21T09:06:50.2125810Z cvt.f32.bf16 %r5240, %rs196; 2026-02-21T09:06:50.2125921Z cvt.f32.bf16 %r5241, %rs199; 2026-02-21T09:06:50.2125985Z cvt.f32.bf16 %r5242, %rs200; 2026-02-21T09:06:50.2126184Z .loc 1 61 34 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:34 2026-02-21T09:06:50.2126253Z cvt.s64.s32 %rd85, %r14911; 2026-02-21T09:06:50.2126321Z add.s64 %rd70, %rd23, %rd85; 2026-02-21T09:06:50.2126691Z .loc 1 61 87 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:87 2026-02-21T09:06:50.2126758Z // begin inline asm 2026-02-21T09:06:50.2126833Z mov.u32 %r4722, 0x0; 2026-02-21T09:06:50.2126911Z ld.global.b32 { %r4722 }, [ %rd70 + 0 ]; 2026-02-21T09:06:50.2126970Z // end inline asm 2026-02-21T09:06:50.2127259Z .loc 1 69 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:69:28 2026-02-21T09:06:50.2127317Z bar.sync 0; 2026-02-21T09:06:50.2127387Z st.shared.b8 [%r17], %r4722; 
2026-02-21T09:06:50.2127508Z prmt.b32 %r7866, %r4722, 0, 0x7771U; 2026-02-21T09:06:50.2127671Z st.shared.b8 [%r18+512], %r7866; 2026-02-21T09:06:50.2127743Z prmt.b32 %r7867, %r4722, 0, 0x7772U; 2026-02-21T09:06:50.2127813Z st.shared.b8 [%r19+1024], %r7867; 2026-02-21T09:06:50.2127889Z prmt.b32 %r7868, %r4722, 0, 0x7773U; 2026-02-21T09:06:50.2127956Z st.shared.b8 [%r20+1536], %r7868; 2026-02-21T09:06:50.2128014Z bar.sync 0; 2026-02-21T09:06:50.2128084Z ld.shared.b32 %r7869, [%r21]; 2026-02-21T09:06:50.2128150Z prmt.b32 %r7870, %r7869, 0, 0x7770U; 2026-02-21T09:06:50.2128213Z cvt.u16.u32 %rs201, %r7870; 2026-02-21T09:06:50.2128278Z prmt.b32 %r7871, %r7869, 0, 0x7771U; 2026-02-21T09:06:50.2128345Z cvt.u16.u32 %rs202, %r7871; 2026-02-21T09:06:50.2128409Z prmt.b32 %r7872, %r7869, 0, 0x7772U; 2026-02-21T09:06:50.2128470Z cvt.u16.u32 %rs203, %r7872; 2026-02-21T09:06:50.2128541Z prmt.b32 %r7873, %r7869, 0, 0x7773U; 2026-02-21T09:06:50.2128602Z cvt.u16.u32 %rs204, %r7873; 2026-02-21T09:06:50.2128668Z ld.shared.b32 %r7874, [%r21+128]; 2026-02-21T09:06:50.2128738Z prmt.b32 %r7875, %r7874, 0, 0x7770U; 2026-02-21T09:06:50.2128805Z cvt.u16.u32 %rs205, %r7875; 2026-02-21T09:06:50.2128872Z prmt.b32 %r7876, %r7874, 0, 0x7771U; 2026-02-21T09:06:50.2128939Z cvt.u16.u32 %rs206, %r7876; 2026-02-21T09:06:50.2129010Z prmt.b32 %r7877, %r7874, 0, 0x7772U; 2026-02-21T09:06:50.2129073Z cvt.u16.u32 %rs207, %r7877; 2026-02-21T09:06:50.2129138Z prmt.b32 %r7878, %r7874, 0, 0x7773U; 2026-02-21T09:06:50.2129200Z cvt.u16.u32 %rs208, %r7878; 2026-02-21T09:06:50.2129404Z .loc 1 64 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:64:28 2026-02-21T09:06:50.2129470Z shl.b16 %rs209, %rs201, 4; 2026-02-21T09:06:50.2129535Z shl.b16 %rs210, %rs202, 4; 2026-02-21T09:06:50.2129603Z shl.b16 %rs211, %rs203, 4; 2026-02-21T09:06:50.2129665Z shl.b16 %rs212, %rs204, 4; 2026-02-21T09:06:50.2129729Z shl.b16 %rs213, %rs205, 4; 2026-02-21T09:06:50.2129809Z shl.b16 %rs214, %rs206, 4; 
2026-02-21T09:06:50.2129879Z shl.b16 %rs215, %rs207, 4; 2026-02-21T09:06:50.2129944Z shl.b16 %rs216, %rs208, 4; 2026-02-21T09:06:50.2130142Z .loc 1 79 58 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:79:58 2026-02-21T09:06:50.2130298Z selp.b16 %rs217, %rs209, %rs201, %p81; 2026-02-21T09:06:50.2130360Z cvt.s16.s8 %rs218, %rs217; 2026-02-21T09:06:50.2130423Z shr.s16 %rs219, %rs218, 4; 2026-02-21T09:06:50.2130500Z selp.b16 %rs220, %rs210, %rs202, %p81; 2026-02-21T09:06:50.2130562Z cvt.s16.s8 %rs221, %rs220; 2026-02-21T09:06:50.2130623Z shr.s16 %rs222, %rs221, 4; 2026-02-21T09:06:50.2130692Z selp.b16 %rs223, %rs211, %rs203, %p81; 2026-02-21T09:06:50.2130758Z cvt.s16.s8 %rs224, %rs223; 2026-02-21T09:06:50.2130821Z shr.s16 %rs225, %rs224, 4; 2026-02-21T09:06:50.2130891Z selp.b16 %rs226, %rs212, %rs204, %p81; 2026-02-21T09:06:50.2130957Z cvt.s16.s8 %rs227, %rs226; 2026-02-21T09:06:50.2131019Z shr.s16 %rs228, %rs227, 4; 2026-02-21T09:06:50.2131088Z selp.b16 %rs229, %rs213, %rs205, %p81; 2026-02-21T09:06:50.2131284Z cvt.s16.s8 %rs230, %rs229; 2026-02-21T09:06:50.2131349Z shr.s16 %rs231, %rs230, 4; 2026-02-21T09:06:50.2131417Z selp.b16 %rs232, %rs214, %rs206, %p81; 2026-02-21T09:06:50.2131480Z cvt.s16.s8 %rs233, %rs232; 2026-02-21T09:06:50.2131548Z shr.s16 %rs234, %rs233, 4; 2026-02-21T09:06:50.2131615Z selp.b16 %rs235, %rs215, %rs207, %p81; 2026-02-21T09:06:50.2131678Z cvt.s16.s8 %rs236, %rs235; 2026-02-21T09:06:50.2131747Z shr.s16 %rs237, %rs236, 4; 2026-02-21T09:06:50.2131824Z selp.b16 %rs238, %rs216, %rs208, %p81; 2026-02-21T09:06:50.2131887Z cvt.s16.s8 %rs239, %rs238; 2026-02-21T09:06:50.2131948Z shr.s16 %rs240, %rs239, 4; 2026-02-21T09:06:50.2132148Z .loc 1 84 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:84:32 2026-02-21T09:06:50.2132217Z cvt.rn.f32.s16 %r7879, %rs219; 2026-02-21T09:06:50.2132282Z cvt.rn.f32.s16 %r7880, %rs222; 2026-02-21T09:06:50.2132349Z cvt.rn.f32.s16 %r7881, %rs225; 2026-02-21T09:06:50.2132415Z cvt.rn.f32.s16 %r7882, 
%rs228; 2026-02-21T09:06:50.2132539Z cvt.rn.f32.s16 %r7883, %rs231; 2026-02-21T09:06:50.2132616Z cvt.rn.f32.s16 %r7884, %rs234; 2026-02-21T09:06:50.2132685Z cvt.rn.f32.s16 %r7885, %rs237; 2026-02-21T09:06:50.2132751Z cvt.rn.f32.s16 %r7886, %rs240; 2026-02-21T09:06:50.2132811Z bar.sync 0; 2026-02-21T09:06:50.2132880Z st.shared.b32 [%r22], %r7879; 2026-02-21T09:06:50.2132950Z st.shared.b32 [%r22+8], %r7880; 2026-02-21T09:06:50.2133014Z st.shared.b32 [%r23], %r7881; 2026-02-21T09:06:50.2133081Z st.shared.b32 [%r23+8], %r7882; 2026-02-21T09:06:50.2133145Z st.shared.b32 [%r24], %r7883; 2026-02-21T09:06:50.2133209Z st.shared.b32 [%r24+8], %r7884; 2026-02-21T09:06:50.2133273Z st.shared.b32 [%r25], %r7885; 2026-02-21T09:06:50.2133344Z st.shared.b32 [%r25+8], %r7886; 2026-02-21T09:06:50.2133400Z $L__tmp9: 2026-02-21T09:06:50.2133672Z .loc 2 291 36 // standard.py:291:36 @[ cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:91:40 ] 2026-02-21T09:06:50.2133744Z // begin inline asm 2026-02-21T09:06:50.2133825Z fence.proxy.async.shared::cta; 2026-02-21T09:06:50.2133885Z // end inline asm 2026-02-21T09:06:50.2133941Z bar.sync 0; 2026-02-21T09:06:50.2134020Z wgmma.fence.sync.aligned; 2026-02-21T09:06:50.2134084Z mov.pred %p33, -1; 2026-02-21T09:06:50.2134144Z // begin inline asm 2026-02-21T09:06:50.2136995Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14912,%r14913,%r14914,%r14915,%r14916,%r14917,%r14918,%r14919,%r14920,%r14921,%r14922,%r14923,%r14924,%r14925,%r14926,%r14927,%r14928,%r14929,%r14930,%r14931,%r14932,%r14933,%r14934,%r14935,%r14936,%r14937,%r14938,%r14939,%r14940,%r14941,%r14942,%r14943,%r14944,%r14945,%r14946,%r14947,%r14948,%r14949,%r14950,%r14951,%r14952,%r14953,%r14954,%r14955,%r14956,%r14957,%r14958,%r14959,%r14960,%r14961,%r14962,%r14963,%r14964,%r14965,%r14966,%r14967,%r14968,%r14969,%r14970,%r14971,%r14972,%r14973,%r14974,%r14975,%r14976,%r14977,%r14978,%r14979,%r14980,%r14981,%r14982,%r14983,%r14984,%r14985,%r14986,%r14987,%r14988,%r14989,%r14990,%r14991,%r14992,%r14993,%r14994,%r14995,%r14996,%r14997,%r14998,%r14999,%r15000,%r15001,%r15002,%r15003,%r15004,%r15005,%r15006,%r15007,%r15008,%r15009,%r15010,%r15011,%r15012,%r15013,%r15014,%r15015,%r15016,%r15017,%r15018,%r15019,%r15020,%r15021,%r15022,%r15023,%r15024,%r15025,%r15026,%r15027,%r15028,%r15029,%r15030,%r15031,%r15032,%r15033,%r15034,%r15035,%r15036,%r15037,%r15038,%r15039}, {%r4979,%r4980,%r4981,%r4982}, %rd93, %p33, 1, 1; 2026-02-21T09:06:50.2137163Z // end inline asm 2026-02-21T09:06:50.2137238Z // begin inline asm 2026-02-21T09:06:50.2140027Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14912,%r14913,%r14914,%r14915,%r14916,%r14917,%r14918,%r14919,%r14920,%r14921,%r14922,%r14923,%r14924,%r14925,%r14926,%r14927,%r14928,%r14929,%r14930,%r14931,%r14932,%r14933,%r14934,%r14935,%r14936,%r14937,%r14938,%r14939,%r14940,%r14941,%r14942,%r14943,%r14944,%r14945,%r14946,%r14947,%r14948,%r14949,%r14950,%r14951,%r14952,%r14953,%r14954,%r14955,%r14956,%r14957,%r14958,%r14959,%r14960,%r14961,%r14962,%r14963,%r14964,%r14965,%r14966,%r14967,%r14968,%r14969,%r14970,%r14971,%r14972,%r14973,%r14974,%r14975,%r14976,%r14977,%r14978,%r14979,%r14980,%r14981,%r14982,%r14983,%r14984,%r14985,%r14986,%r14987,%r14988,%r14989,%r14990,%r14991,%r14992,%r14993,%r14994,%r14995,%r14996,%r14997,%r14998,%r14999,%r15000,%r15001,%r15002,%r15003,%r15004,%r15005,%r15006,%r15007,%r15008,%r15009,%r15010,%r15011,%r15012,%r15013,%r15014,%r15015,%r15016,%r15017,%r15018,%r15019,%r15020,%r15021,%r15022,%r15023,%r15024,%r15025,%r15026,%r15027,%r15028,%r15029,%r15030,%r15031,%r15032,%r15033,%r15034,%r15035,%r15036,%r15037,%r15038,%r15039}, {%r5239,%r5240,%r5241,%r5242}, %rd94, %p33, 1, 1; 2026-02-21T09:06:50.2140156Z // end inline asm 2026-02-21T09:06:50.2140233Z wgmma.commit_group.sync.aligned; 2026-02-21T09:06:50.2140293Z mov.b32 %r7734, 0; 2026-02-21T09:06:50.2140363Z mov.b32 %r5371, %r12221; 2026-02-21T09:06:50.2140424Z mov.b32 %r5372, %r7734; 2026-02-21T09:06:50.2140487Z mov.b32 %r5373, %r7734; 2026-02-21T09:06:50.2140611Z // begin inline asm 2026-02-21T09:06:50.2143134Z // wait for regs: 
%r14912,%r14913,%r14914,%r14915,%r14916,%r14917,%r14918,%r14919,%r14920,%r14921,%r14922,%r14923,%r14924,%r14925,%r14926,%r14927,%r14928,%r14929,%r14930,%r14931,%r14932,%r14933,%r14934,%r14935,%r14936,%r14937,%r14938,%r14939,%r14940,%r14941,%r14942,%r14943,%r14944,%r14945,%r14946,%r14947,%r14948,%r14949,%r14950,%r14951,%r14952,%r14953,%r14954,%r14955,%r14956,%r14957,%r14958,%r14959,%r14960,%r14961,%r14962,%r14963,%r14964,%r14965,%r14966,%r14967,%r14968,%r14969,%r14970,%r14971,%r14972,%r14973,%r14974,%r14975,%r14976,%r14977,%r14978,%r14979,%r14980,%r14981,%r14982,%r14983,%r14984,%r14985,%r14986,%r14987,%r14988,%r14989,%r14990,%r14991,%r14992,%r14993,%r14994,%r14995,%r14996,%r14997,%r14998,%r14999,%r15000,%r15001,%r15002,%r15003,%r15004,%r15005,%r15006,%r15007,%r15008,%r15009,%r15010,%r15011,%r15012,%r15013,%r15014,%r15015,%r15016,%r15017,%r15018,%r15019,%r15020,%r15021,%r15022,%r15023,%r15024,%r15025,%r15026,%r15027,%r15028,%r15029,%r15030,%r15031,%r15032,%r15033,%r15034,%r15035,%r15036,%r15037,%r15038,%r15039,%r5371,%r5372,%r5373 2026-02-21T09:06:50.2143220Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:06:50.2143281Z // end inline asm 2026-02-21T09:06:50.2143336Z $L__tmp10: 2026-02-21T09:06:50.2143545Z .loc 1 55 80 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:80 2026-02-21T09:06:50.2143612Z add.s64 %rd73, %rd161, -64; 2026-02-21T09:06:50.2143671Z // begin inline asm 2026-02-21T09:06:50.2143731Z mov.u32 %r5505, 0x0; 2026-02-21T09:06:50.2143797Z mov.u32 %r5506, 0x0; 2026-02-21T09:06:50.2143855Z mov.u32 %r5507, 0x0; 2026-02-21T09:06:50.2143912Z mov.u32 %r5508, 0x0; 2026-02-21T09:06:50.2144049Z ld.global.v4.b32 { %r5505, %r5506, %r5507, %r5508 }, [ %rd73 + 0 ]; 2026-02-21T09:06:50.2144108Z // end inline asm 2026-02-21T09:06:50.2144307Z .loc 1 59 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:59:32 2026-02-21T09:06:50.2144378Z bar.sync 0; 2026-02-21T09:06:50.2144470Z st.shared.v2.b32 [%r13], {%r5505, %r5506}; 2026-02-21T09:06:50.2144549Z 
st.shared.v2.b32 [%r14], {%r5507, %r5508}; 2026-02-21T09:06:50.2144665Z bar.sync 0; 2026-02-21T09:06:50.2144738Z ld.shared.b16 %rs241, [%r42]; 2026-02-21T09:06:50.2144808Z ld.shared.b16 %rs242, [%r42+256]; 2026-02-21T09:06:50.2144876Z ld.shared.b16 %rs243, [%r42+16]; 2026-02-21T09:06:50.2144945Z ld.shared.b16 %rs244, [%r42+272]; 2026-02-21T09:06:50.2145011Z ld.shared.b16 %rs245, [%r43]; 2026-02-21T09:06:50.2145081Z ld.shared.b16 %rs246, [%r43+256]; 2026-02-21T09:06:50.2145148Z ld.shared.b16 %rs247, [%r43+16]; 2026-02-21T09:06:50.2145228Z ld.shared.b16 %rs248, [%r43+272]; 2026-02-21T09:06:50.2145298Z cvt.f32.bf16 %r5766, %rs241; 2026-02-21T09:06:50.2145364Z cvt.f32.bf16 %r5767, %rs242; 2026-02-21T09:06:50.2145442Z cvt.f32.bf16 %r5768, %rs245; 2026-02-21T09:06:50.2145512Z cvt.f32.bf16 %r5769, %rs246; 2026-02-21T09:06:50.2145630Z cvt.f32.bf16 %r6026, %rs243; 2026-02-21T09:06:50.2145749Z cvt.f32.bf16 %r6027, %rs244; 2026-02-21T09:06:50.2145825Z cvt.f32.bf16 %r6028, %rs247; 2026-02-21T09:06:50.2145889Z cvt.f32.bf16 %r6029, %rs248; 2026-02-21T09:06:50.2145962Z cvt.u32.u64 %r7887, %rd162; 2026-02-21T09:06:50.2146033Z add.s32 %r7888, %r7887, 40; 2026-02-21T09:06:50.2146235Z .loc 1 61 62 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:62 2026-02-21T09:06:50.2146302Z or.b32 %r7889, %r8, %r7888; 2026-02-21T09:06:50.2146370Z shl.b32 %r7890, %r7889, 13; 2026-02-21T09:06:50.2146562Z add.s32 %r7891, %r7890, %r313; 2026-02-21T09:06:50.2146771Z .loc 1 61 34 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:34 2026-02-21T09:06:50.2146840Z cvt.s64.s32 %rd86, %r7891; 2026-02-21T09:06:50.2146914Z add.s64 %rd74, %rd23, %rd86; 2026-02-21T09:06:50.2147114Z .loc 1 61 87 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:87 2026-02-21T09:06:50.2147187Z // begin inline asm 2026-02-21T09:06:50.2147345Z mov.u32 %r5509, 0x0; 2026-02-21T09:06:50.2147430Z ld.global.b32 { %r5509 }, [ %rd74 + 0 ]; 2026-02-21T09:06:50.2147496Z // end inline asm 
2026-02-21T09:06:50.2147693Z .loc 1 69 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:69:28 2026-02-21T09:06:50.2147763Z bar.sync 0; 2026-02-21T09:06:50.2147828Z st.shared.b8 [%r17], %r5509; 2026-02-21T09:06:50.2147898Z prmt.b32 %r7892, %r5509, 0, 0x7771U; 2026-02-21T09:06:50.2147971Z st.shared.b8 [%r18+512], %r7892; 2026-02-21T09:06:50.2148038Z prmt.b32 %r7893, %r5509, 0, 0x7772U; 2026-02-21T09:06:50.2148102Z st.shared.b8 [%r19+1024], %r7893; 2026-02-21T09:06:50.2148170Z prmt.b32 %r7894, %r5509, 0, 0x7773U; 2026-02-21T09:06:50.2148235Z st.shared.b8 [%r20+1536], %r7894; 2026-02-21T09:06:50.2148381Z bar.sync 0; 2026-02-21T09:06:50.2148458Z ld.shared.b32 %r7895, [%r21]; 2026-02-21T09:06:50.2148532Z prmt.b32 %r7896, %r7895, 0, 0x7770U; 2026-02-21T09:06:50.2148599Z cvt.u16.u32 %rs249, %r7896; 2026-02-21T09:06:50.2148668Z prmt.b32 %r7897, %r7895, 0, 0x7771U; 2026-02-21T09:06:50.2148740Z cvt.u16.u32 %rs250, %r7897; 2026-02-21T09:06:50.2148803Z prmt.b32 %r7898, %r7895, 0, 0x7772U; 2026-02-21T09:06:50.2148867Z cvt.u16.u32 %rs251, %r7898; 2026-02-21T09:06:50.2148933Z prmt.b32 %r7899, %r7895, 0, 0x7773U; 2026-02-21T09:06:50.2149001Z cvt.u16.u32 %rs252, %r7899; 2026-02-21T09:06:50.2149067Z ld.shared.b32 %r7900, [%r21+128]; 2026-02-21T09:06:50.2149132Z prmt.b32 %r7901, %r7900, 0, 0x7770U; 2026-02-21T09:06:50.2149200Z cvt.u16.u32 %rs253, %r7901; 2026-02-21T09:06:50.2149265Z prmt.b32 %r7902, %r7900, 0, 0x7771U; 2026-02-21T09:06:50.2149328Z cvt.u16.u32 %rs254, %r7902; 2026-02-21T09:06:50.2149394Z prmt.b32 %r7903, %r7900, 0, 0x7772U; 2026-02-21T09:06:50.2149462Z cvt.u16.u32 %rs255, %r7903; 2026-02-21T09:06:50.2149525Z prmt.b32 %r7904, %r7900, 0, 0x7773U; 2026-02-21T09:06:50.2149587Z cvt.u16.u32 %rs256, %r7904; 2026-02-21T09:06:50.2149792Z .loc 1 64 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:64:28 2026-02-21T09:06:50.2149859Z shl.b16 %rs257, %rs249, 4; 2026-02-21T09:06:50.2149922Z shl.b16 %rs258, %rs250, 4; 2026-02-21T09:06:50.2150074Z shl.b16 
%rs259, %rs251, 4; 2026-02-21T09:06:50.2150138Z shl.b16 %rs260, %rs252, 4; 2026-02-21T09:06:50.2150202Z shl.b16 %rs261, %rs253, 4; 2026-02-21T09:06:50.2150263Z shl.b16 %rs262, %rs254, 4; 2026-02-21T09:06:50.2150331Z shl.b16 %rs263, %rs255, 4; 2026-02-21T09:06:50.2150394Z shl.b16 %rs264, %rs256, 4; 2026-02-21T09:06:50.2150601Z .loc 1 79 58 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:79:58 2026-02-21T09:06:50.2150683Z selp.b16 %rs265, %rs257, %rs249, %p81; 2026-02-21T09:06:50.2150747Z cvt.s16.s8 %rs266, %rs265; 2026-02-21T09:06:50.2150812Z shr.s16 %rs267, %rs266, 4; 2026-02-21T09:06:50.2150886Z selp.b16 %rs268, %rs258, %rs250, %p81; 2026-02-21T09:06:50.2150951Z cvt.s16.s8 %rs269, %rs268; 2026-02-21T09:06:50.2151146Z shr.s16 %rs270, %rs269, 4; 2026-02-21T09:06:50.2151221Z selp.b16 %rs271, %rs259, %rs251, %p81; 2026-02-21T09:06:50.2151290Z cvt.s16.s8 %rs272, %rs271; 2026-02-21T09:06:50.2151353Z shr.s16 %rs273, %rs272, 4; 2026-02-21T09:06:50.2151425Z selp.b16 %rs274, %rs260, %rs252, %p81; 2026-02-21T09:06:50.2151492Z cvt.s16.s8 %rs275, %rs274; 2026-02-21T09:06:50.2151557Z shr.s16 %rs276, %rs275, 4; 2026-02-21T09:06:50.2151625Z selp.b16 %rs277, %rs261, %rs253, %p81; 2026-02-21T09:06:50.2151689Z cvt.s16.s8 %rs278, %rs277; 2026-02-21T09:06:50.2151757Z shr.s16 %rs279, %rs278, 4; 2026-02-21T09:06:50.2151840Z selp.b16 %rs280, %rs262, %rs254, %p81; 2026-02-21T09:06:50.2151906Z cvt.s16.s8 %rs281, %rs280; 2026-02-21T09:06:50.2151976Z shr.s16 %rs282, %rs281, 4; 2026-02-21T09:06:50.2152045Z selp.b16 %rs283, %rs263, %rs255, %p81; 2026-02-21T09:06:50.2152107Z cvt.s16.s8 %rs284, %rs283; 2026-02-21T09:06:50.2152168Z shr.s16 %rs285, %rs284, 4; 2026-02-21T09:06:50.2152250Z selp.b16 %rs286, %rs264, %rs256, %p81; 2026-02-21T09:06:50.2152367Z cvt.s16.s8 %rs287, %rs286; 2026-02-21T09:06:50.2152432Z shr.s16 %rs288, %rs287, 4; 2026-02-21T09:06:50.2152634Z .loc 1 84 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:84:32 2026-02-21T09:06:50.2152703Z cvt.rn.f32.s16 
%r7905, %rs267; 2026-02-21T09:06:50.2152769Z cvt.rn.f32.s16 %r7906, %rs270; 2026-02-21T09:06:50.2152837Z cvt.rn.f32.s16 %r7907, %rs273; 2026-02-21T09:06:50.2152904Z cvt.rn.f32.s16 %r7908, %rs276; 2026-02-21T09:06:50.2152968Z cvt.rn.f32.s16 %r7909, %rs279; 2026-02-21T09:06:50.2153033Z cvt.rn.f32.s16 %r7910, %rs282; 2026-02-21T09:06:50.2153101Z cvt.rn.f32.s16 %r7911, %rs285; 2026-02-21T09:06:50.2153164Z cvt.rn.f32.s16 %r7912, %rs288; 2026-02-21T09:06:50.2153222Z bar.sync 0; 2026-02-21T09:06:50.2153293Z st.shared.b32 [%r22], %r7905; 2026-02-21T09:06:50.2153359Z st.shared.b32 [%r22+8], %r7906; 2026-02-21T09:06:50.2153424Z st.shared.b32 [%r23], %r7907; 2026-02-21T09:06:50.2153493Z st.shared.b32 [%r23+8], %r7908; 2026-02-21T09:06:50.2153565Z st.shared.b32 [%r24], %r7909; 2026-02-21T09:06:50.2153632Z st.shared.b32 [%r24+8], %r7910; 2026-02-21T09:06:50.2153698Z st.shared.b32 [%r25], %r7911; 2026-02-21T09:06:50.2153770Z st.shared.b32 [%r25+8], %r7912; 2026-02-21T09:06:50.2153829Z $L__tmp11: 2026-02-21T09:06:50.2154100Z .loc 2 291 36 // standard.py:291:36 @[ cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:91:40 ] 2026-02-21T09:06:50.2154164Z // begin inline asm 2026-02-21T09:06:50.2154250Z fence.proxy.async.shared::cta; 2026-02-21T09:06:50.2154312Z // end inline asm 2026-02-21T09:06:50.2154370Z bar.sync 0; 2026-02-21T09:06:50.2154451Z wgmma.fence.sync.aligned; 2026-02-21T09:06:50.2154513Z // begin inline asm 2026-02-21T09:06:50.2157339Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14912,%r14913,%r14914,%r14915,%r14916,%r14917,%r14918,%r14919,%r14920,%r14921,%r14922,%r14923,%r14924,%r14925,%r14926,%r14927,%r14928,%r14929,%r14930,%r14931,%r14932,%r14933,%r14934,%r14935,%r14936,%r14937,%r14938,%r14939,%r14940,%r14941,%r14942,%r14943,%r14944,%r14945,%r14946,%r14947,%r14948,%r14949,%r14950,%r14951,%r14952,%r14953,%r14954,%r14955,%r14956,%r14957,%r14958,%r14959,%r14960,%r14961,%r14962,%r14963,%r14964,%r14965,%r14966,%r14967,%r14968,%r14969,%r14970,%r14971,%r14972,%r14973,%r14974,%r14975,%r14976,%r14977,%r14978,%r14979,%r14980,%r14981,%r14982,%r14983,%r14984,%r14985,%r14986,%r14987,%r14988,%r14989,%r14990,%r14991,%r14992,%r14993,%r14994,%r14995,%r14996,%r14997,%r14998,%r14999,%r15000,%r15001,%r15002,%r15003,%r15004,%r15005,%r15006,%r15007,%r15008,%r15009,%r15010,%r15011,%r15012,%r15013,%r15014,%r15015,%r15016,%r15017,%r15018,%r15019,%r15020,%r15021,%r15022,%r15023,%r15024,%r15025,%r15026,%r15027,%r15028,%r15029,%r15030,%r15031,%r15032,%r15033,%r15034,%r15035,%r15036,%r15037,%r15038,%r15039}, {%r5766,%r5767,%r5768,%r5769}, %rd93, %p33, 1, 1; 2026-02-21T09:06:50.2157496Z // end inline asm 2026-02-21T09:06:50.2157634Z // begin inline asm 2026-02-21T09:06:50.2160481Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14912,%r14913,%r14914,%r14915,%r14916,%r14917,%r14918,%r14919,%r14920,%r14921,%r14922,%r14923,%r14924,%r14925,%r14926,%r14927,%r14928,%r14929,%r14930,%r14931,%r14932,%r14933,%r14934,%r14935,%r14936,%r14937,%r14938,%r14939,%r14940,%r14941,%r14942,%r14943,%r14944,%r14945,%r14946,%r14947,%r14948,%r14949,%r14950,%r14951,%r14952,%r14953,%r14954,%r14955,%r14956,%r14957,%r14958,%r14959,%r14960,%r14961,%r14962,%r14963,%r14964,%r14965,%r14966,%r14967,%r14968,%r14969,%r14970,%r14971,%r14972,%r14973,%r14974,%r14975,%r14976,%r14977,%r14978,%r14979,%r14980,%r14981,%r14982,%r14983,%r14984,%r14985,%r14986,%r14987,%r14988,%r14989,%r14990,%r14991,%r14992,%r14993,%r14994,%r14995,%r14996,%r14997,%r14998,%r14999,%r15000,%r15001,%r15002,%r15003,%r15004,%r15005,%r15006,%r15007,%r15008,%r15009,%r15010,%r15011,%r15012,%r15013,%r15014,%r15015,%r15016,%r15017,%r15018,%r15019,%r15020,%r15021,%r15022,%r15023,%r15024,%r15025,%r15026,%r15027,%r15028,%r15029,%r15030,%r15031,%r15032,%r15033,%r15034,%r15035,%r15036,%r15037,%r15038,%r15039}, {%r6026,%r6027,%r6028,%r6029}, %rd94, %p33, 1, 1; 2026-02-21T09:06:50.2160551Z // end inline asm 2026-02-21T09:06:50.2160634Z wgmma.commit_group.sync.aligned; 2026-02-21T09:06:50.2160701Z mov.b32 %r6158, %r12221; 2026-02-21T09:06:50.2160764Z mov.b32 %r6159, %r7734; 2026-02-21T09:06:50.2160832Z mov.b32 %r6160, %r7734; 2026-02-21T09:06:50.2160893Z // begin inline asm 2026-02-21T09:06:50.2163405Z // wait for regs: 
%r14912,%r14913,%r14914,%r14915,%r14916,%r14917,%r14918,%r14919,%r14920,%r14921,%r14922,%r14923,%r14924,%r14925,%r14926,%r14927,%r14928,%r14929,%r14930,%r14931,%r14932,%r14933,%r14934,%r14935,%r14936,%r14937,%r14938,%r14939,%r14940,%r14941,%r14942,%r14943,%r14944,%r14945,%r14946,%r14947,%r14948,%r14949,%r14950,%r14951,%r14952,%r14953,%r14954,%r14955,%r14956,%r14957,%r14958,%r14959,%r14960,%r14961,%r14962,%r14963,%r14964,%r14965,%r14966,%r14967,%r14968,%r14969,%r14970,%r14971,%r14972,%r14973,%r14974,%r14975,%r14976,%r14977,%r14978,%r14979,%r14980,%r14981,%r14982,%r14983,%r14984,%r14985,%r14986,%r14987,%r14988,%r14989,%r14990,%r14991,%r14992,%r14993,%r14994,%r14995,%r14996,%r14997,%r14998,%r14999,%r15000,%r15001,%r15002,%r15003,%r15004,%r15005,%r15006,%r15007,%r15008,%r15009,%r15010,%r15011,%r15012,%r15013,%r15014,%r15015,%r15016,%r15017,%r15018,%r15019,%r15020,%r15021,%r15022,%r15023,%r15024,%r15025,%r15026,%r15027,%r15028,%r15029,%r15030,%r15031,%r15032,%r15033,%r15034,%r15035,%r15036,%r15037,%r15038,%r15039,%r6158,%r6159,%r6160 2026-02-21T09:06:50.2163506Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:06:50.2163566Z // end inline asm 2026-02-21T09:06:50.2163625Z $L__tmp12: 2026-02-21T09:06:50.2163830Z .loc 1 55 80 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:80 2026-02-21T09:06:50.2163898Z add.s64 %rd77, %rd161, -32; 2026-02-21T09:06:50.2163959Z // begin inline asm 2026-02-21T09:06:50.2164025Z mov.u32 %r6292, 0x0; 2026-02-21T09:06:50.2164086Z mov.u32 %r6293, 0x0; 2026-02-21T09:06:50.2164149Z mov.u32 %r6294, 0x0; 2026-02-21T09:06:50.2164213Z mov.u32 %r6295, 0x0; 2026-02-21T09:06:50.2164343Z ld.global.v4.b32 { %r6292, %r6293, %r6294, %r6295 }, [ %rd77 + 0 ]; 2026-02-21T09:06:50.2164469Z // end inline asm 2026-02-21T09:06:50.2164670Z .loc 1 59 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:59:32 2026-02-21T09:06:50.2164737Z bar.sync 0; 2026-02-21T09:06:50.2164819Z st.shared.v2.b32 [%r13], {%r6292, %r6293}; 2026-02-21T09:06:50.2164898Z 
st.shared.v2.b32 [%r14], {%r6294, %r6295}; 2026-02-21T09:06:50.2164960Z bar.sync 0; 2026-02-21T09:06:50.2165030Z ld.shared.b16 %rs289, [%r42]; 2026-02-21T09:06:50.2165105Z ld.shared.b16 %rs290, [%r42+256]; 2026-02-21T09:06:50.2165181Z ld.shared.b16 %rs291, [%r42+16]; 2026-02-21T09:06:50.2165249Z ld.shared.b16 %rs292, [%r42+272]; 2026-02-21T09:06:50.2165316Z ld.shared.b16 %rs293, [%r43]; 2026-02-21T09:06:50.2165382Z ld.shared.b16 %rs294, [%r43+256]; 2026-02-21T09:06:50.2165556Z ld.shared.b16 %rs295, [%r43+16]; 2026-02-21T09:06:50.2165624Z ld.shared.b16 %rs296, [%r43+272]; 2026-02-21T09:06:50.2165690Z cvt.f32.bf16 %r6553, %rs289; 2026-02-21T09:06:50.2165762Z cvt.f32.bf16 %r6554, %rs290; 2026-02-21T09:06:50.2165830Z cvt.f32.bf16 %r6555, %rs293; 2026-02-21T09:06:50.2165891Z cvt.f32.bf16 %r6556, %rs294; 2026-02-21T09:06:50.2165954Z cvt.f32.bf16 %r6813, %rs291; 2026-02-21T09:06:50.2166021Z cvt.f32.bf16 %r6814, %rs292; 2026-02-21T09:06:50.2166096Z cvt.f32.bf16 %r6815, %rs295; 2026-02-21T09:06:50.2166160Z cvt.f32.bf16 %r6816, %rs296; 2026-02-21T09:06:50.2166371Z .loc 1 61 34 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:34 2026-02-21T09:06:50.2166439Z add.s32 %r7913, %r14911, 131072; 2026-02-21T09:06:50.2166613Z cvt.s64.s32 %rd87, %r7913; 2026-02-21T09:06:50.2166688Z add.s64 %rd78, %rd23, %rd87; 2026-02-21T09:06:50.2166900Z .loc 1 61 87 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:87 2026-02-21T09:06:50.2166969Z // begin inline asm 2026-02-21T09:06:50.2167104Z mov.u32 %r6296, 0x0; 2026-02-21T09:06:50.2167190Z ld.global.b32 { %r6296 }, [ %rd78 + 0 ]; 2026-02-21T09:06:50.2167250Z // end inline asm 2026-02-21T09:06:50.2167464Z .loc 1 69 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:69:28 2026-02-21T09:06:50.2167537Z bar.sync 0; 2026-02-21T09:06:50.2167607Z st.shared.b8 [%r17], %r6296; 2026-02-21T09:06:50.2167680Z prmt.b32 %r7914, %r6296, 0, 0x7771U; 2026-02-21T09:06:50.2167749Z st.shared.b8 [%r18+512], %r7914; 
2026-02-21T09:06:50.2167824Z prmt.b32 %r7915, %r6296, 0, 0x7772U; 2026-02-21T09:06:50.2167892Z st.shared.b8 [%r19+1024], %r7915; 2026-02-21T09:06:50.2167959Z prmt.b32 %r7916, %r6296, 0, 0x7773U; 2026-02-21T09:06:50.2168030Z st.shared.b8 [%r20+1536], %r7916; 2026-02-21T09:06:50.2168087Z bar.sync 0; 2026-02-21T09:06:50.2168154Z ld.shared.b32 %r7917, [%r21]; 2026-02-21T09:06:50.2168224Z prmt.b32 %r7918, %r7917, 0, 0x7770U; 2026-02-21T09:06:50.2168295Z cvt.u16.u32 %rs297, %r7918; 2026-02-21T09:06:50.2168360Z prmt.b32 %r7919, %r7917, 0, 0x7771U; 2026-02-21T09:06:50.2168425Z cvt.u16.u32 %rs298, %r7919; 2026-02-21T09:06:50.2168496Z prmt.b32 %r7920, %r7917, 0, 0x7772U; 2026-02-21T09:06:50.2168561Z cvt.u16.u32 %rs299, %r7920; 2026-02-21T09:06:50.2168626Z prmt.b32 %r7921, %r7917, 0, 0x7773U; 2026-02-21T09:06:50.2168694Z cvt.u16.u32 %rs300, %r7921; 2026-02-21T09:06:50.2168762Z ld.shared.b32 %r7922, [%r21+128]; 2026-02-21T09:06:50.2168828Z prmt.b32 %r7923, %r7922, 0, 0x7770U; 2026-02-21T09:06:50.2168891Z cvt.u16.u32 %rs301, %r7923; 2026-02-21T09:06:50.2168960Z prmt.b32 %r7924, %r7922, 0, 0x7771U; 2026-02-21T09:06:50.2169023Z cvt.u16.u32 %rs302, %r7924; 2026-02-21T09:06:50.2169088Z prmt.b32 %r7925, %r7922, 0, 0x7772U; 2026-02-21T09:06:50.2169158Z cvt.u16.u32 %rs303, %r7925; 2026-02-21T09:06:50.2169223Z prmt.b32 %r7926, %r7922, 0, 0x7773U; 2026-02-21T09:06:50.2169286Z cvt.u16.u32 %rs304, %r7926; 2026-02-21T09:06:50.2169495Z .loc 1 64 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:64:28 2026-02-21T09:06:50.2169563Z shl.b16 %rs305, %rs297, 4; 2026-02-21T09:06:50.2169628Z shl.b16 %rs306, %rs298, 4; 2026-02-21T09:06:50.2169781Z shl.b16 %rs307, %rs299, 4; 2026-02-21T09:06:50.2169850Z shl.b16 %rs308, %rs300, 4; 2026-02-21T09:06:50.2169913Z shl.b16 %rs309, %rs301, 4; 2026-02-21T09:06:50.2169975Z shl.b16 %rs310, %rs302, 4; 2026-02-21T09:06:50.2170045Z shl.b16 %rs311, %rs303, 4; 2026-02-21T09:06:50.2170106Z shl.b16 %rs312, %rs304, 4; 2026-02-21T09:06:50.2170303Z .loc 1 79 
58 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:79:58 2026-02-21T09:06:50.2170379Z selp.b16 %rs313, %rs305, %rs297, %p81; 2026-02-21T09:06:50.2170449Z cvt.s16.s8 %rs314, %rs313; 2026-02-21T09:06:50.2170509Z shr.s16 %rs315, %rs314, 4; 2026-02-21T09:06:50.2170582Z selp.b16 %rs316, %rs306, %rs298, %p81; 2026-02-21T09:06:50.2170721Z cvt.s16.s8 %rs317, %rs316; 2026-02-21T09:06:50.2170851Z shr.s16 %rs318, %rs317, 4; 2026-02-21T09:06:50.2170926Z selp.b16 %rs319, %rs307, %rs299, %p81; 2026-02-21T09:06:50.2170996Z cvt.s16.s8 %rs320, %rs319; 2026-02-21T09:06:50.2171062Z shr.s16 %rs321, %rs320, 4; 2026-02-21T09:06:50.2171133Z selp.b16 %rs322, %rs308, %rs300, %p81; 2026-02-21T09:06:50.2171197Z cvt.s16.s8 %rs323, %rs322; 2026-02-21T09:06:50.2171267Z shr.s16 %rs324, %rs323, 4; 2026-02-21T09:06:50.2171337Z selp.b16 %rs325, %rs309, %rs301, %p81; 2026-02-21T09:06:50.2171400Z cvt.s16.s8 %rs326, %rs325; 2026-02-21T09:06:50.2171466Z shr.s16 %rs327, %rs326, 4; 2026-02-21T09:06:50.2171533Z selp.b16 %rs328, %rs310, %rs302, %p81; 2026-02-21T09:06:50.2171595Z cvt.s16.s8 %rs329, %rs328; 2026-02-21T09:06:50.2171659Z shr.s16 %rs330, %rs329, 4; 2026-02-21T09:06:50.2171743Z selp.b16 %rs331, %rs311, %rs303, %p81; 2026-02-21T09:06:50.2171809Z cvt.s16.s8 %rs332, %rs331; 2026-02-21T09:06:50.2171871Z shr.s16 %rs333, %rs332, 4; 2026-02-21T09:06:50.2171947Z selp.b16 %rs334, %rs312, %rs304, %p81; 2026-02-21T09:06:50.2172062Z cvt.s16.s8 %rs335, %rs334; 2026-02-21T09:06:50.2172127Z shr.s16 %rs336, %rs335, 4; 2026-02-21T09:06:50.2172325Z .loc 1 84 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:84:32 2026-02-21T09:06:50.2172399Z cvt.rn.f32.s16 %r7927, %rs315; 2026-02-21T09:06:50.2172465Z cvt.rn.f32.s16 %r7928, %rs318; 2026-02-21T09:06:50.2172529Z cvt.rn.f32.s16 %r7929, %rs321; 2026-02-21T09:06:50.2172597Z cvt.rn.f32.s16 %r7930, %rs324; 2026-02-21T09:06:50.2172660Z cvt.rn.f32.s16 %r7931, %rs327; 2026-02-21T09:06:50.2172726Z cvt.rn.f32.s16 %r7932, %rs330; 
2026-02-21T09:06:50.2172793Z cvt.rn.f32.s16 %r7933, %rs333; 2026-02-21T09:06:50.2172855Z cvt.rn.f32.s16 %r7934, %rs336; 2026-02-21T09:06:50.2172923Z bar.sync 0; 2026-02-21T09:06:50.2172991Z st.shared.b32 [%r22], %r7927; 2026-02-21T09:06:50.2173063Z st.shared.b32 [%r22+8], %r7928; 2026-02-21T09:06:50.2173127Z st.shared.b32 [%r23], %r7929; 2026-02-21T09:06:50.2173197Z st.shared.b32 [%r23+8], %r7930; 2026-02-21T09:06:50.2173268Z st.shared.b32 [%r24], %r7931; 2026-02-21T09:06:50.2173333Z st.shared.b32 [%r24+8], %r7932; 2026-02-21T09:06:50.2173398Z st.shared.b32 [%r25], %r7933; 2026-02-21T09:06:50.2173464Z st.shared.b32 [%r25+8], %r7934; 2026-02-21T09:06:50.2173525Z $L__tmp13: 2026-02-21T09:06:50.2173802Z .loc 2 291 36 // standard.py:291:36 @[ cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:91:40 ] 2026-02-21T09:06:50.2173865Z // begin inline asm 2026-02-21T09:06:50.2173954Z fence.proxy.async.shared::cta; 2026-02-21T09:06:50.2174013Z // end inline asm 2026-02-21T09:06:50.2174071Z bar.sync 0; 2026-02-21T09:06:50.2174145Z wgmma.fence.sync.aligned; 2026-02-21T09:06:50.2174211Z // begin inline asm 2026-02-21T09:06:50.2177060Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14912,%r14913,%r14914,%r14915,%r14916,%r14917,%r14918,%r14919,%r14920,%r14921,%r14922,%r14923,%r14924,%r14925,%r14926,%r14927,%r14928,%r14929,%r14930,%r14931,%r14932,%r14933,%r14934,%r14935,%r14936,%r14937,%r14938,%r14939,%r14940,%r14941,%r14942,%r14943,%r14944,%r14945,%r14946,%r14947,%r14948,%r14949,%r14950,%r14951,%r14952,%r14953,%r14954,%r14955,%r14956,%r14957,%r14958,%r14959,%r14960,%r14961,%r14962,%r14963,%r14964,%r14965,%r14966,%r14967,%r14968,%r14969,%r14970,%r14971,%r14972,%r14973,%r14974,%r14975,%r14976,%r14977,%r14978,%r14979,%r14980,%r14981,%r14982,%r14983,%r14984,%r14985,%r14986,%r14987,%r14988,%r14989,%r14990,%r14991,%r14992,%r14993,%r14994,%r14995,%r14996,%r14997,%r14998,%r14999,%r15000,%r15001,%r15002,%r15003,%r15004,%r15005,%r15006,%r15007,%r15008,%r15009,%r15010,%r15011,%r15012,%r15013,%r15014,%r15015,%r15016,%r15017,%r15018,%r15019,%r15020,%r15021,%r15022,%r15023,%r15024,%r15025,%r15026,%r15027,%r15028,%r15029,%r15030,%r15031,%r15032,%r15033,%r15034,%r15035,%r15036,%r15037,%r15038,%r15039}, {%r6553,%r6554,%r6555,%r6556}, %rd93, %p33, 1, 1; 2026-02-21T09:06:50.2177212Z // end inline asm 2026-02-21T09:06:50.2177335Z // begin inline asm 2026-02-21T09:06:50.2180176Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14912,%r14913,%r14914,%r14915,%r14916,%r14917,%r14918,%r14919,%r14920,%r14921,%r14922,%r14923,%r14924,%r14925,%r14926,%r14927,%r14928,%r14929,%r14930,%r14931,%r14932,%r14933,%r14934,%r14935,%r14936,%r14937,%r14938,%r14939,%r14940,%r14941,%r14942,%r14943,%r14944,%r14945,%r14946,%r14947,%r14948,%r14949,%r14950,%r14951,%r14952,%r14953,%r14954,%r14955,%r14956,%r14957,%r14958,%r14959,%r14960,%r14961,%r14962,%r14963,%r14964,%r14965,%r14966,%r14967,%r14968,%r14969,%r14970,%r14971,%r14972,%r14973,%r14974,%r14975,%r14976,%r14977,%r14978,%r14979,%r14980,%r14981,%r14982,%r14983,%r14984,%r14985,%r14986,%r14987,%r14988,%r14989,%r14990,%r14991,%r14992,%r14993,%r14994,%r14995,%r14996,%r14997,%r14998,%r14999,%r15000,%r15001,%r15002,%r15003,%r15004,%r15005,%r15006,%r15007,%r15008,%r15009,%r15010,%r15011,%r15012,%r15013,%r15014,%r15015,%r15016,%r15017,%r15018,%r15019,%r15020,%r15021,%r15022,%r15023,%r15024,%r15025,%r15026,%r15027,%r15028,%r15029,%r15030,%r15031,%r15032,%r15033,%r15034,%r15035,%r15036,%r15037,%r15038,%r15039}, {%r6813,%r6814,%r6815,%r6816}, %rd94, %p33, 1, 1; 2026-02-21T09:06:50.2180247Z // end inline asm 2026-02-21T09:06:50.2180332Z wgmma.commit_group.sync.aligned; 2026-02-21T09:06:50.2180395Z mov.b32 %r6945, %r12221; 2026-02-21T09:06:50.2180462Z mov.b32 %r6946, %r7734; 2026-02-21T09:06:50.2180523Z mov.b32 %r6947, %r7734; 2026-02-21T09:06:50.2180588Z // begin inline asm 2026-02-21T09:06:50.2183095Z // wait for regs: 
%r14912,%r14913,%r14914,%r14915,%r14916,%r14917,%r14918,%r14919,%r14920,%r14921,%r14922,%r14923,%r14924,%r14925,%r14926,%r14927,%r14928,%r14929,%r14930,%r14931,%r14932,%r14933,%r14934,%r14935,%r14936,%r14937,%r14938,%r14939,%r14940,%r14941,%r14942,%r14943,%r14944,%r14945,%r14946,%r14947,%r14948,%r14949,%r14950,%r14951,%r14952,%r14953,%r14954,%r14955,%r14956,%r14957,%r14958,%r14959,%r14960,%r14961,%r14962,%r14963,%r14964,%r14965,%r14966,%r14967,%r14968,%r14969,%r14970,%r14971,%r14972,%r14973,%r14974,%r14975,%r14976,%r14977,%r14978,%r14979,%r14980,%r14981,%r14982,%r14983,%r14984,%r14985,%r14986,%r14987,%r14988,%r14989,%r14990,%r14991,%r14992,%r14993,%r14994,%r14995,%r14996,%r14997,%r14998,%r14999,%r15000,%r15001,%r15002,%r15003,%r15004,%r15005,%r15006,%r15007,%r15008,%r15009,%r15010,%r15011,%r15012,%r15013,%r15014,%r15015,%r15016,%r15017,%r15018,%r15019,%r15020,%r15021,%r15022,%r15023,%r15024,%r15025,%r15026,%r15027,%r15028,%r15029,%r15030,%r15031,%r15032,%r15033,%r15034,%r15035,%r15036,%r15037,%r15038,%r15039,%r6945,%r6946,%r6947 2026-02-21T09:06:50.2183185Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:06:50.2183248Z // end inline asm 2026-02-21T09:06:50.2183303Z $L__tmp14: 2026-02-21T09:06:50.2183518Z .loc 1 55 80 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:80 2026-02-21T09:06:50.2183582Z // begin inline asm 2026-02-21T09:06:50.2183644Z mov.u32 %r7079, 0x0; 2026-02-21T09:06:50.2183709Z mov.u32 %r7080, 0x0; 2026-02-21T09:06:50.2183769Z mov.u32 %r7081, 0x0; 2026-02-21T09:06:50.2183830Z mov.u32 %r7082, 0x0; 2026-02-21T09:06:50.2183965Z ld.global.v4.b32 { %r7079, %r7080, %r7081, %r7082 }, [ %rd161 + 0 ]; 2026-02-21T09:06:50.2184032Z // end inline asm 2026-02-21T09:06:50.2184287Z .loc 1 59 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:59:32 2026-02-21T09:06:50.2184347Z bar.sync 0; 2026-02-21T09:06:50.2184437Z st.shared.v2.b32 [%r13], {%r7079, %r7080}; 2026-02-21T09:06:50.2184516Z st.shared.v2.b32 [%r14], {%r7081, %r7082}; 
2026-02-21T09:06:50.2184574Z bar.sync 0; 2026-02-21T09:06:50.2184647Z ld.shared.b16 %rs337, [%r42]; 2026-02-21T09:06:50.2184719Z ld.shared.b16 %rs338, [%r42+256]; 2026-02-21T09:06:50.2184792Z ld.shared.b16 %rs339, [%r42+16]; 2026-02-21T09:06:50.2184857Z ld.shared.b16 %rs340, [%r42+272]; 2026-02-21T09:06:50.2184934Z ld.shared.b16 %rs341, [%r43]; 2026-02-21T09:06:50.2185000Z ld.shared.b16 %rs342, [%r43+256]; 2026-02-21T09:06:50.2185066Z ld.shared.b16 %rs343, [%r43+16]; 2026-02-21T09:06:50.2185271Z ld.shared.b16 %rs344, [%r43+272]; 2026-02-21T09:06:50.2185342Z cvt.f32.bf16 %r7340, %rs337; 2026-02-21T09:06:50.2185406Z cvt.f32.bf16 %r7341, %rs338; 2026-02-21T09:06:50.2185468Z cvt.f32.bf16 %r7342, %rs341; 2026-02-21T09:06:50.2185536Z cvt.f32.bf16 %r7343, %rs342; 2026-02-21T09:06:50.2185599Z cvt.f32.bf16 %r7600, %rs339; 2026-02-21T09:06:50.2185663Z cvt.f32.bf16 %r7601, %rs340; 2026-02-21T09:06:50.2185730Z cvt.f32.bf16 %r7602, %rs343; 2026-02-21T09:06:50.2185790Z cvt.f32.bf16 %r7603, %rs344; 2026-02-21T09:06:50.2185853Z add.s32 %r7935, %r7887, 56; 2026-02-21T09:06:50.2186052Z .loc 1 61 62 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:62 2026-02-21T09:06:50.2186124Z or.b32 %r7936, %r8, %r7935; 2026-02-21T09:06:50.2186186Z shl.b32 %r7937, %r7936, 13; 2026-02-21T09:06:50.2186251Z add.s32 %r7938, %r7937, %r313; 2026-02-21T09:06:50.2186561Z .loc 1 61 34 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:34 2026-02-21T09:06:50.2186714Z cvt.s64.s32 %rd88, %r7938; 2026-02-21T09:06:50.2186783Z add.s64 %rd82, %rd23, %rd88; 2026-02-21T09:06:50.2186982Z .loc 1 61 87 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:87 2026-02-21T09:06:50.2187045Z // begin inline asm 2026-02-21T09:06:50.2187116Z mov.u32 %r7083, 0x0; 2026-02-21T09:06:50.2187194Z ld.global.b32 { %r7083 }, [ %rd82 + 0 ]; 2026-02-21T09:06:50.2187262Z // end inline asm 2026-02-21T09:06:50.2187455Z .loc 1 69 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:69:28 
2026-02-21T09:06:50.2187513Z bar.sync 0; 2026-02-21T09:06:50.2187585Z st.shared.b8 [%r17], %r7083; 2026-02-21T09:06:50.2187656Z prmt.b32 %r7939, %r7083, 0, 0x7771U; 2026-02-21T09:06:50.2187724Z st.shared.b8 [%r18+512], %r7939; 2026-02-21T09:06:50.2187795Z prmt.b32 %r7940, %r7083, 0, 0x7772U; 2026-02-21T09:06:50.2187861Z st.shared.b8 [%r19+1024], %r7940; 2026-02-21T09:06:50.2187931Z prmt.b32 %r7941, %r7083, 0, 0x7773U; 2026-02-21T09:06:50.2188001Z st.shared.b8 [%r20+1536], %r7941; 2026-02-21T09:06:50.2188066Z bar.sync 0; 2026-02-21T09:06:50.2188131Z ld.shared.b32 %r7942, [%r21]; 2026-02-21T09:06:50.2188198Z prmt.b32 %r7943, %r7942, 0, 0x7770U; 2026-02-21T09:06:50.2188266Z cvt.u16.u32 %rs345, %r7943; 2026-02-21T09:06:50.2188435Z prmt.b32 %r7944, %r7942, 0, 0x7771U; 2026-02-21T09:06:50.2188500Z cvt.u16.u32 %rs346, %r7944; 2026-02-21T09:06:50.2188565Z prmt.b32 %r7945, %r7942, 0, 0x7772U; 2026-02-21T09:06:50.2188631Z cvt.u16.u32 %rs347, %r7945; 2026-02-21T09:06:50.2188694Z prmt.b32 %r7946, %r7942, 0, 0x7773U; 2026-02-21T09:06:50.2188756Z cvt.u16.u32 %rs348, %r7946; 2026-02-21T09:06:50.2188829Z ld.shared.b32 %r7947, [%r21+128]; 2026-02-21T09:06:50.2188898Z prmt.b32 %r7948, %r7947, 0, 0x7770U; 2026-02-21T09:06:50.2188961Z cvt.u16.u32 %rs349, %r7948; 2026-02-21T09:06:50.2189025Z prmt.b32 %r7949, %r7947, 0, 0x7771U; 2026-02-21T09:06:50.2189094Z cvt.u16.u32 %rs350, %r7949; 2026-02-21T09:06:50.2189161Z prmt.b32 %r7950, %r7947, 0, 0x7772U; 2026-02-21T09:06:50.2189222Z cvt.u16.u32 %rs351, %r7950; 2026-02-21T09:06:50.2189290Z prmt.b32 %r7951, %r7947, 0, 0x7773U; 2026-02-21T09:06:50.2189439Z cvt.u16.u32 %rs352, %r7951; 2026-02-21T09:06:50.2189636Z .loc 1 64 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:64:28 2026-02-21T09:06:50.2189705Z shl.b16 %rs353, %rs345, 4; 2026-02-21T09:06:50.2189769Z shl.b16 %rs354, %rs346, 4; 2026-02-21T09:06:50.2189831Z shl.b16 %rs355, %rs347, 4; 2026-02-21T09:06:50.2189892Z shl.b16 %rs356, %rs348, 4; 2026-02-21T09:06:50.2189958Z shl.b16 
%rs357, %rs349, 4; 2026-02-21T09:06:50.2190021Z shl.b16 %rs358, %rs350, 4; 2026-02-21T09:06:50.2190083Z shl.b16 %rs359, %rs351, 4; 2026-02-21T09:06:50.2190149Z shl.b16 %rs360, %rs352, 4; 2026-02-21T09:06:50.2190341Z .loc 1 79 58 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:79:58 2026-02-21T09:06:50.2190548Z selp.b16 %rs361, %rs353, %rs345, %p81; 2026-02-21T09:06:50.2190619Z cvt.s16.s8 %rs362, %rs361; 2026-02-21T09:06:50.2190682Z shr.s16 %rs363, %rs362, 4; 2026-02-21T09:06:50.2190753Z selp.b16 %rs364, %rs354, %rs346, %p81; 2026-02-21T09:06:50.2190822Z cvt.s16.s8 %rs365, %rs364; 2026-02-21T09:06:50.2190891Z shr.s16 %rs366, %rs365, 4; 2026-02-21T09:06:50.2190973Z selp.b16 %rs367, %rs355, %rs347, %p81; 2026-02-21T09:06:50.2191037Z cvt.s16.s8 %rs368, %rs367; 2026-02-21T09:06:50.2191107Z shr.s16 %rs369, %rs368, 4; 2026-02-21T09:06:50.2191175Z selp.b16 %rs370, %rs356, %rs348, %p81; 2026-02-21T09:06:50.2191238Z cvt.s16.s8 %rs371, %rs370; 2026-02-21T09:06:50.2191300Z shr.s16 %rs372, %rs371, 4; 2026-02-21T09:06:50.2191374Z selp.b16 %rs373, %rs357, %rs349, %p81; 2026-02-21T09:06:50.2191436Z cvt.s16.s8 %rs374, %rs373; 2026-02-21T09:06:50.2191498Z shr.s16 %rs375, %rs374, 4; 2026-02-21T09:06:50.2191570Z selp.b16 %rs376, %rs358, %rs350, %p81; 2026-02-21T09:06:50.2191632Z cvt.s16.s8 %rs377, %rs376; 2026-02-21T09:06:50.2191699Z shr.s16 %rs378, %rs377, 4; 2026-02-21T09:06:50.2191830Z selp.b16 %rs379, %rs359, %rs351, %p81; 2026-02-21T09:06:50.2191901Z cvt.s16.s8 %rs380, %rs379; 2026-02-21T09:06:50.2191963Z shr.s16 %rs381, %rs380, 4; 2026-02-21T09:06:50.2192035Z selp.b16 %rs382, %rs360, %rs352, %p81; 2026-02-21T09:06:50.2192105Z cvt.s16.s8 %rs383, %rs382; 2026-02-21T09:06:50.2192167Z shr.s16 %rs384, %rs383, 4; 2026-02-21T09:06:50.2192363Z .loc 1 84 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:84:32 2026-02-21T09:06:50.2192435Z cvt.rn.f32.s16 %r7952, %rs363; 2026-02-21T09:06:50.2192503Z cvt.rn.f32.s16 %r7953, %rs366; 2026-02-21T09:06:50.2192570Z 
cvt.rn.f32.s16 %r7954, %rs369; 2026-02-21T09:06:50.2192635Z cvt.rn.f32.s16 %r7955, %rs372; 2026-02-21T09:06:50.2192703Z cvt.rn.f32.s16 %r7956, %rs375; 2026-02-21T09:06:50.2192767Z cvt.rn.f32.s16 %r7957, %rs378; 2026-02-21T09:06:50.2192831Z cvt.rn.f32.s16 %r7958, %rs381; 2026-02-21T09:06:50.2192903Z cvt.rn.f32.s16 %r7959, %rs384; 2026-02-21T09:06:50.2192963Z bar.sync 0; 2026-02-21T09:06:50.2193031Z st.shared.b32 [%r22], %r7952; 2026-02-21T09:06:50.2193097Z st.shared.b32 [%r22+8], %r7953; 2026-02-21T09:06:50.2193170Z st.shared.b32 [%r23], %r7954; 2026-02-21T09:06:50.2193237Z st.shared.b32 [%r23+8], %r7955; 2026-02-21T09:06:50.2193301Z st.shared.b32 [%r24], %r7956; 2026-02-21T09:06:50.2193370Z st.shared.b32 [%r24+8], %r7957; 2026-02-21T09:06:50.2193435Z st.shared.b32 [%r25], %r7958; 2026-02-21T09:06:50.2193501Z st.shared.b32 [%r25+8], %r7959; 2026-02-21T09:06:50.2193555Z $L__tmp15: 2026-02-21T09:06:50.2193844Z .loc 2 291 36 // standard.py:291:36 @[ cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:91:40 ] 2026-02-21T09:06:50.2197859Z // begin inline asm 2026-02-21T09:06:50.2197992Z fence.proxy.async.shared::cta; 2026-02-21T09:06:50.2198058Z // end inline asm 2026-02-21T09:06:50.2198117Z bar.sync 0; 2026-02-21T09:06:50.2198205Z wgmma.fence.sync.aligned; 2026-02-21T09:06:50.2198271Z // begin inline asm 2026-02-21T09:06:50.2201111Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14912,%r14913,%r14914,%r14915,%r14916,%r14917,%r14918,%r14919,%r14920,%r14921,%r14922,%r14923,%r14924,%r14925,%r14926,%r14927,%r14928,%r14929,%r14930,%r14931,%r14932,%r14933,%r14934,%r14935,%r14936,%r14937,%r14938,%r14939,%r14940,%r14941,%r14942,%r14943,%r14944,%r14945,%r14946,%r14947,%r14948,%r14949,%r14950,%r14951,%r14952,%r14953,%r14954,%r14955,%r14956,%r14957,%r14958,%r14959,%r14960,%r14961,%r14962,%r14963,%r14964,%r14965,%r14966,%r14967,%r14968,%r14969,%r14970,%r14971,%r14972,%r14973,%r14974,%r14975,%r14976,%r14977,%r14978,%r14979,%r14980,%r14981,%r14982,%r14983,%r14984,%r14985,%r14986,%r14987,%r14988,%r14989,%r14990,%r14991,%r14992,%r14993,%r14994,%r14995,%r14996,%r14997,%r14998,%r14999,%r15000,%r15001,%r15002,%r15003,%r15004,%r15005,%r15006,%r15007,%r15008,%r15009,%r15010,%r15011,%r15012,%r15013,%r15014,%r15015,%r15016,%r15017,%r15018,%r15019,%r15020,%r15021,%r15022,%r15023,%r15024,%r15025,%r15026,%r15027,%r15028,%r15029,%r15030,%r15031,%r15032,%r15033,%r15034,%r15035,%r15036,%r15037,%r15038,%r15039}, {%r7340,%r7341,%r7342,%r7343}, %rd93, %p33, 1, 1; 2026-02-21T09:06:50.2201303Z // end inline asm 2026-02-21T09:06:50.2201370Z // begin inline asm 2026-02-21T09:06:50.2204118Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14912,%r14913,%r14914,%r14915,%r14916,%r14917,%r14918,%r14919,%r14920,%r14921,%r14922,%r14923,%r14924,%r14925,%r14926,%r14927,%r14928,%r14929,%r14930,%r14931,%r14932,%r14933,%r14934,%r14935,%r14936,%r14937,%r14938,%r14939,%r14940,%r14941,%r14942,%r14943,%r14944,%r14945,%r14946,%r14947,%r14948,%r14949,%r14950,%r14951,%r14952,%r14953,%r14954,%r14955,%r14956,%r14957,%r14958,%r14959,%r14960,%r14961,%r14962,%r14963,%r14964,%r14965,%r14966,%r14967,%r14968,%r14969,%r14970,%r14971,%r14972,%r14973,%r14974,%r14975,%r14976,%r14977,%r14978,%r14979,%r14980,%r14981,%r14982,%r14983,%r14984,%r14985,%r14986,%r14987,%r14988,%r14989,%r14990,%r14991,%r14992,%r14993,%r14994,%r14995,%r14996,%r14997,%r14998,%r14999,%r15000,%r15001,%r15002,%r15003,%r15004,%r15005,%r15006,%r15007,%r15008,%r15009,%r15010,%r15011,%r15012,%r15013,%r15014,%r15015,%r15016,%r15017,%r15018,%r15019,%r15020,%r15021,%r15022,%r15023,%r15024,%r15025,%r15026,%r15027,%r15028,%r15029,%r15030,%r15031,%r15032,%r15033,%r15034,%r15035,%r15036,%r15037,%r15038,%r15039}, {%r7600,%r7601,%r7602,%r7603}, %rd94, %p33, 1, 1; 2026-02-21T09:06:50.2204187Z // end inline asm 2026-02-21T09:06:50.2204272Z wgmma.commit_group.sync.aligned; 2026-02-21T09:06:50.2204349Z mov.b32 %r7733, %r7734; 2026-02-21T09:06:50.2204415Z mov.b32 %r7732, %r12221; 2026-02-21T09:06:50.2204481Z // begin inline asm 2026-02-21T09:06:50.2207072Z // wait for regs: 
%r14912,%r14913,%r14914,%r14915,%r14916,%r14917,%r14918,%r14919,%r14920,%r14921,%r14922,%r14923,%r14924,%r14925,%r14926,%r14927,%r14928,%r14929,%r14930,%r14931,%r14932,%r14933,%r14934,%r14935,%r14936,%r14937,%r14938,%r14939,%r14940,%r14941,%r14942,%r14943,%r14944,%r14945,%r14946,%r14947,%r14948,%r14949,%r14950,%r14951,%r14952,%r14953,%r14954,%r14955,%r14956,%r14957,%r14958,%r14959,%r14960,%r14961,%r14962,%r14963,%r14964,%r14965,%r14966,%r14967,%r14968,%r14969,%r14970,%r14971,%r14972,%r14973,%r14974,%r14975,%r14976,%r14977,%r14978,%r14979,%r14980,%r14981,%r14982,%r14983,%r14984,%r14985,%r14986,%r14987,%r14988,%r14989,%r14990,%r14991,%r14992,%r14993,%r14994,%r14995,%r14996,%r14997,%r14998,%r14999,%r15000,%r15001,%r15002,%r15003,%r15004,%r15005,%r15006,%r15007,%r15008,%r15009,%r15010,%r15011,%r15012,%r15013,%r15014,%r15015,%r15016,%r15017,%r15018,%r15019,%r15020,%r15021,%r15022,%r15023,%r15024,%r15025,%r15026,%r15027,%r15028,%r15029,%r15030,%r15031,%r15032,%r15033,%r15034,%r15035,%r15036,%r15037,%r15038,%r15039,%r7732,%r7733,%r7734 2026-02-21T09:06:50.2207164Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:06:50.2207226Z // end inline asm 2026-02-21T09:06:50.2207282Z $L__tmp16: 2026-02-21T09:06:50.2207510Z .loc 1 47 124 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:47:124 2026-02-21T09:06:50.2207581Z add.s64 %rd162, %rd162, 32; 2026-02-21T09:06:50.2207648Z add.s64 %rd161, %rd161, 128; 2026-02-21T09:06:50.2207715Z add.s32 %r14911, %r14911, 262144; 2026-02-21T09:06:50.2207900Z setp.lt.u64 %p42, %rd162, 480; 2026-02-21T09:06:50.2207963Z @%p42 bra $L__BB0_5; 2026-02-21T09:06:50.2208081Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:06:50.2208295Z .loc 1 94 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:94:28 2026-02-21T09:06:50.2208385Z cvt.rn.bf16x2.f32 %r7964, %r14913, %r14912; 2026-02-21T09:06:50.2208466Z cvt.rn.bf16x2.f32 %r7965, %r14915, %r14914; 2026-02-21T09:06:50.2208546Z cvt.rn.bf16x2.f32 %r7966, %r14917, %r14916; 
2026-02-21T09:06:50.2208621Z cvt.rn.bf16x2.f32 %r7967, %r14919, %r14918; 2026-02-21T09:06:50.2208697Z cvt.rn.bf16x2.f32 %r7968, %r14921, %r14920; 2026-02-21T09:06:50.2208772Z cvt.rn.bf16x2.f32 %r7969, %r14923, %r14922; 2026-02-21T09:06:50.2208984Z cvt.rn.bf16x2.f32 %r7970, %r14925, %r14924; 2026-02-21T09:06:50.2209065Z cvt.rn.bf16x2.f32 %r7971, %r14927, %r14926; 2026-02-21T09:06:50.2209139Z cvt.rn.bf16x2.f32 %r7972, %r14929, %r14928; 2026-02-21T09:06:50.2209222Z cvt.rn.bf16x2.f32 %r7973, %r14931, %r14930; 2026-02-21T09:06:50.2209299Z cvt.rn.bf16x2.f32 %r7974, %r14933, %r14932; 2026-02-21T09:06:50.2209375Z cvt.rn.bf16x2.f32 %r7975, %r14935, %r14934; 2026-02-21T09:06:50.2209456Z cvt.rn.bf16x2.f32 %r7976, %r14937, %r14936; 2026-02-21T09:06:50.2209530Z cvt.rn.bf16x2.f32 %r7977, %r14939, %r14938; 2026-02-21T09:06:50.2209602Z cvt.rn.bf16x2.f32 %r7978, %r14941, %r14940; 2026-02-21T09:06:50.2209680Z cvt.rn.bf16x2.f32 %r7979, %r14943, %r14942; 2026-02-21T09:06:50.2209761Z cvt.rn.bf16x2.f32 %r7980, %r14945, %r14944; 2026-02-21T09:06:50.2209835Z cvt.rn.bf16x2.f32 %r7981, %r14947, %r14946; 2026-02-21T09:06:50.2209910Z cvt.rn.bf16x2.f32 %r7982, %r14949, %r14948; 2026-02-21T09:06:50.2209993Z cvt.rn.bf16x2.f32 %r7983, %r14951, %r14950; 2026-02-21T09:06:50.2210140Z cvt.rn.bf16x2.f32 %r7984, %r14953, %r14952; 2026-02-21T09:06:50.2210220Z cvt.rn.bf16x2.f32 %r7985, %r14955, %r14954; 2026-02-21T09:06:50.2210299Z cvt.rn.bf16x2.f32 %r7986, %r14957, %r14956; 2026-02-21T09:06:50.2210378Z cvt.rn.bf16x2.f32 %r7987, %r14959, %r14958; 2026-02-21T09:06:50.2210452Z cvt.rn.bf16x2.f32 %r7988, %r14961, %r14960; 2026-02-21T09:06:50.2210528Z cvt.rn.bf16x2.f32 %r7989, %r14963, %r14962; 2026-02-21T09:06:50.2210606Z cvt.rn.bf16x2.f32 %r7990, %r14965, %r14964; 2026-02-21T09:06:50.2210682Z cvt.rn.bf16x2.f32 %r7991, %r14967, %r14966; 2026-02-21T09:06:50.2210758Z cvt.rn.bf16x2.f32 %r7992, %r14969, %r14968; 2026-02-21T09:06:50.2210836Z cvt.rn.bf16x2.f32 %r7993, %r14971, %r14970; 2026-02-21T09:06:50.2210912Z 
cvt.rn.bf16x2.f32 %r7994, %r14973, %r14972; 2026-02-21T09:06:50.2210998Z cvt.rn.bf16x2.f32 %r7995, %r14975, %r14974; 2026-02-21T09:06:50.2211077Z cvt.rn.bf16x2.f32 %r7996, %r14977, %r14976; 2026-02-21T09:06:50.2211159Z cvt.rn.bf16x2.f32 %r7997, %r14979, %r14978; 2026-02-21T09:06:50.2211234Z cvt.rn.bf16x2.f32 %r7998, %r14981, %r14980; 2026-02-21T09:06:50.2211309Z cvt.rn.bf16x2.f32 %r7999, %r14983, %r14982; 2026-02-21T09:06:50.2211391Z cvt.rn.bf16x2.f32 %r8000, %r14985, %r14984; 2026-02-21T09:06:50.2211466Z cvt.rn.bf16x2.f32 %r8001, %r14987, %r14986; 2026-02-21T09:06:50.2211540Z cvt.rn.bf16x2.f32 %r8002, %r14989, %r14988; 2026-02-21T09:06:50.2211619Z cvt.rn.bf16x2.f32 %r8003, %r14991, %r14990; 2026-02-21T09:06:50.2211694Z cvt.rn.bf16x2.f32 %r8004, %r14993, %r14992; 2026-02-21T09:06:50.2211767Z cvt.rn.bf16x2.f32 %r8005, %r14995, %r14994; 2026-02-21T09:06:50.2211848Z cvt.rn.bf16x2.f32 %r8006, %r14997, %r14996; 2026-02-21T09:06:50.2211922Z cvt.rn.bf16x2.f32 %r8007, %r14999, %r14998; 2026-02-21T09:06:50.2211996Z cvt.rn.bf16x2.f32 %r8008, %r15001, %r15000; 2026-02-21T09:06:50.2212070Z cvt.rn.bf16x2.f32 %r8009, %r15003, %r15002; 2026-02-21T09:06:50.2212151Z cvt.rn.bf16x2.f32 %r8010, %r15005, %r15004; 2026-02-21T09:06:50.2212227Z cvt.rn.bf16x2.f32 %r8011, %r15007, %r15006; 2026-02-21T09:06:50.2212302Z cvt.rn.bf16x2.f32 %r8012, %r15009, %r15008; 2026-02-21T09:06:50.2212382Z cvt.rn.bf16x2.f32 %r8013, %r15011, %r15010; 2026-02-21T09:06:50.2212529Z cvt.rn.bf16x2.f32 %r8014, %r15013, %r15012; 2026-02-21T09:06:50.2212604Z cvt.rn.bf16x2.f32 %r8015, %r15015, %r15014; 2026-02-21T09:06:50.2212684Z cvt.rn.bf16x2.f32 %r8016, %r15017, %r15016; 2026-02-21T09:06:50.2212762Z cvt.rn.bf16x2.f32 %r8017, %r15019, %r15018; 2026-02-21T09:06:50.2212836Z cvt.rn.bf16x2.f32 %r8018, %r15021, %r15020; 2026-02-21T09:06:50.2212909Z cvt.rn.bf16x2.f32 %r8019, %r15023, %r15022; 2026-02-21T09:06:50.2212988Z cvt.rn.bf16x2.f32 %r8020, %r15025, %r15024; 2026-02-21T09:06:50.2213062Z cvt.rn.bf16x2.f32 %r8021, 
%r15027, %r15026; 2026-02-21T09:06:50.2213138Z cvt.rn.bf16x2.f32 %r8022, %r15029, %r15028; 2026-02-21T09:06:50.2213216Z cvt.rn.bf16x2.f32 %r8023, %r15031, %r15030; 2026-02-21T09:06:50.2213288Z cvt.rn.bf16x2.f32 %r8024, %r15033, %r15032; 2026-02-21T09:06:50.2213506Z cvt.rn.bf16x2.f32 %r8025, %r15035, %r15034; 2026-02-21T09:06:50.2213587Z cvt.rn.bf16x2.f32 %r8026, %r15037, %r15036; 2026-02-21T09:06:50.2213662Z cvt.rn.bf16x2.f32 %r8027, %r15039, %r15038; 2026-02-21T09:06:50.2213881Z .loc 1 95 43 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:95:43 2026-02-21T09:06:50.2213942Z bar.sync 0; 2026-02-21T09:06:50.2214139Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r26], {%r7964, %r7965, %r7966, %r7967}; 2026-02-21T09:06:50.2214320Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r27], {%r7980, %r7981, %r7982, %r7983}; 2026-02-21T09:06:50.2214494Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r28], {%r7996, %r7997, %r7998, %r7999}; 2026-02-21T09:06:50.2214670Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r29], {%r8012, %r8013, %r8014, %r8015}; 2026-02-21T09:06:50.2214842Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r30], {%r7968, %r7969, %r7970, %r7971}; 2026-02-21T09:06:50.2215066Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r31], {%r7984, %r7985, %r7986, %r7987}; 2026-02-21T09:06:50.2215246Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r32], {%r8000, %r8001, %r8002, %r8003}; 2026-02-21T09:06:50.2215421Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r33], {%r8016, %r8017, %r8018, %r8019}; 2026-02-21T09:06:50.2215595Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r34], {%r7972, %r7973, %r7974, %r7975}; 2026-02-21T09:06:50.2215771Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r35], {%r7988, %r7989, %r7990, %r7991}; 2026-02-21T09:06:50.2215943Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r36], {%r8004, %r8005, %r8006, %r8007}; 2026-02-21T09:06:50.2216116Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r37], {%r8020, %r8021, %r8022, %r8023}; 2026-02-21T09:06:50.2216289Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r38], {%r7976, %r7977, %r7978, %r7979}; 2026-02-21T09:06:50.2216588Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r39], {%r7992, %r7993, %r7994, %r7995}; 2026-02-21T09:06:50.2216774Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r40], {%r8008, %r8009, %r8010, %r8011}; 2026-02-21T09:06:50.2216947Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r41], {%r8024, %r8025, %r8026, %r8027}; 2026-02-21T09:06:50.2217018Z // begin inline asm 2026-02-21T09:06:50.2217102Z fence.proxy.async.shared::cta; 2026-02-21T09:06:50.2217161Z // end inline asm 2026-02-21T09:06:50.2217222Z bar.sync 0; 2026-02-21T09:06:50.2217296Z elect.sync %r8028|%p45, -1; 2026-02-21T09:06:50.2217365Z and.pred %p43, %p82, %p45; 2026-02-21T09:06:50.2217436Z or.b32 %r7960, %r312, %r310; 2026-02-21T09:06:50.2217497Z // begin inline asm 2026-02-21T09:06:50.2217730Z @%p43 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd158, {%r7960, %r7961}], [%r7962]; 2026-02-21T09:06:50.2217801Z // end inline asm 2026-02-21T09:06:50.2217885Z cp.async.bulk.commit_group; 2026-02-21T09:06:50.2217964Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:06:50.2218020Z bar.sync 0; 2026-02-21T09:06:50.2218240Z .loc 1 26 144 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:26:144 2026-02-21T09:06:50.2218311Z add.s32 %r8029, %r14781, 2112; 2026-02-21T09:06:50.2218511Z .loc 1 32 35 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:32:35 2026-02-21T09:06:50.2218665Z shr.s32 %r8030, %r8029, 31; 2026-02-21T09:06:50.2218727Z shr.u32 %r8031, %r8030, 26; 2026-02-21T09:06:50.2218793Z add.s32 %r8032, %r8029, %r8031; 2026-02-21T09:06:50.2218856Z shr.s32 %r8033, %r8032, 6; 2026-02-21T09:06:50.2219052Z .loc 1 33 33 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:33:33 2026-02-21T09:06:50.2219127Z shl.b32 %r8034, %r8033, 1; 2026-02-21T09:06:50.2219320Z .loc 1 34 39 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:34:39 2026-02-21T09:06:50.2219386Z sub.s32 %r8035, 16, 
%r8034; 2026-02-21T09:06:50.2219576Z .loc 1 34 52 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:34:52 2026-02-21T09:06:50.2219763Z min.s32 %r8036, %r8035, 2; 2026-02-21T09:06:50.2219960Z .loc 1 35 45 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:35:45 2026-02-21T09:06:50.2220027Z and.b32 %r8037, %r8032, -64; 2026-02-21T09:06:50.2220092Z sub.s32 %r8038, %r8029, %r8037; 2026-02-21T09:06:50.2220283Z .loc 1 36 51 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:36:51 2026-02-21T09:06:50.2220354Z div.s32 %r8039, %r8038, %r8036; 2026-02-21T09:06:50.2220543Z .loc 1 35 64 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:35:64 2026-02-21T09:06:50.2220610Z mul.lo.s32 %r8040, %r8039, %r8036; 2026-02-21T09:06:50.2220677Z sub.s32 %r8041, %r8038, %r8040; 2026-02-21T09:06:50.2220866Z .loc 1 35 30 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:35:30 2026-02-21T09:06:50.2220928Z add.s32 %r8042, %r8041, %r8034; 2026-02-21T09:06:50.2221124Z .loc 1 37 27 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:37:27 2026-02-21T09:06:50.2221253Z shl.b32 %r573, %r8042, 8; 2026-02-21T09:06:50.2221451Z .loc 1 39 27 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:39:27 2026-02-21T09:06:50.2221519Z shl.b32 %r574, %r8039, 8; 2026-02-21T09:06:50.2221710Z .loc 1 40 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:40:32 2026-02-21T09:06:50.2221770Z or.b32 %r575, %r574, %r7; 2026-02-21T09:06:50.2221970Z .loc 1 47 124 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:47:124 2026-02-21T09:06:50.2222040Z shl.b32 %r8043, %r8042, 18; 2026-02-21T09:06:50.2222103Z or.b32 %r8044, %r44, %r8043; 2026-02-21T09:06:50.2222176Z mad.wide.s32 %rd163, %r8044, 2, %rd4; 2026-02-21T09:06:50.2222249Z add.s32 %r15040, %r45, %r574; 2026-02-21T09:06:50.2222315Z mov.b32 %r15041, 0f00000000; 2026-02-21T09:06:50.2222382Z mov.b64 %rd164, -32; 2026-02-21T09:06:50.2222445Z mov.b32 %r15042, %r15041; 2026-02-21T09:06:50.2222511Z 
mov.b32 %r15043, %r15041; 2026-02-21T09:06:50.2222570Z mov.b32 %r15044, %r15041; 2026-02-21T09:06:50.2222628Z mov.b32 %r15045, %r15041; 2026-02-21T09:06:50.2222691Z mov.b32 %r15046, %r15041; 2026-02-21T09:06:50.2222752Z mov.b32 %r15047, %r15041; 2026-02-21T09:06:50.2222811Z mov.b32 %r15048, %r15041; 2026-02-21T09:06:50.2222873Z mov.b32 %r15049, %r15041; 2026-02-21T09:06:50.2222934Z mov.b32 %r15050, %r15041; 2026-02-21T09:06:50.2222994Z mov.b32 %r15051, %r15041; 2026-02-21T09:06:50.2223052Z mov.b32 %r15052, %r15041; 2026-02-21T09:06:50.2223125Z mov.b32 %r15053, %r15041; 2026-02-21T09:06:50.2223192Z mov.b32 %r15054, %r15041; 2026-02-21T09:06:50.2223251Z mov.b32 %r15055, %r15041; 2026-02-21T09:06:50.2223317Z mov.b32 %r15056, %r15041; 2026-02-21T09:06:50.2223377Z mov.b32 %r15057, %r15041; 2026-02-21T09:06:50.2223436Z mov.b32 %r15058, %r15041; 2026-02-21T09:06:50.2223496Z mov.b32 %r15059, %r15041; 2026-02-21T09:06:50.2223561Z mov.b32 %r15060, %r15041; 2026-02-21T09:06:50.2223622Z mov.b32 %r15061, %r15041; 2026-02-21T09:06:50.2223680Z mov.b32 %r15062, %r15041; 2026-02-21T09:06:50.2223745Z mov.b32 %r15063, %r15041; 2026-02-21T09:06:50.2223803Z mov.b32 %r15064, %r15041; 2026-02-21T09:06:50.2223925Z mov.b32 %r15065, %r15041; 2026-02-21T09:06:50.2223984Z mov.b32 %r15066, %r15041; 2026-02-21T09:06:50.2224056Z mov.b32 %r15067, %r15041; 2026-02-21T09:06:50.2224121Z mov.b32 %r15068, %r15041; 2026-02-21T09:06:50.2224183Z mov.b32 %r15069, %r15041; 2026-02-21T09:06:50.2224250Z mov.b32 %r15070, %r15041; 2026-02-21T09:06:50.2224310Z mov.b32 %r15071, %r15041; 2026-02-21T09:06:50.2224369Z mov.b32 %r15072, %r15041; 2026-02-21T09:06:50.2224430Z mov.b32 %r15073, %r15041; 2026-02-21T09:06:50.2224498Z mov.b32 %r15074, %r15041; 2026-02-21T09:06:50.2224557Z mov.b32 %r15075, %r15041; 2026-02-21T09:06:50.2224615Z mov.b32 %r15076, %r15041; 2026-02-21T09:06:50.2224681Z mov.b32 %r15077, %r15041; 2026-02-21T09:06:50.2224795Z mov.b32 %r15078, %r15041; 2026-02-21T09:06:50.2224902Z mov.b32 %r15079, 
%r15041; 2026-02-21T09:06:50.2224965Z mov.b32 %r15080, %r15041; 2026-02-21T09:06:50.2225030Z mov.b32 %r15081, %r15041; 2026-02-21T09:06:50.2225088Z mov.b32 %r15082, %r15041; 2026-02-21T09:06:50.2225149Z mov.b32 %r15083, %r15041; 2026-02-21T09:06:50.2225215Z mov.b32 %r15084, %r15041; 2026-02-21T09:06:50.2225278Z mov.b32 %r15085, %r15041; 2026-02-21T09:06:50.2225335Z mov.b32 %r15086, %r15041; 2026-02-21T09:06:50.2225394Z mov.b32 %r15087, %r15041; 2026-02-21T09:06:50.2225457Z mov.b32 %r15088, %r15041; 2026-02-21T09:06:50.2225515Z mov.b32 %r15089, %r15041; 2026-02-21T09:06:50.2225574Z mov.b32 %r15090, %r15041; 2026-02-21T09:06:50.2225638Z mov.b32 %r15091, %r15041; 2026-02-21T09:06:50.2225698Z mov.b32 %r15092, %r15041; 2026-02-21T09:06:50.2225756Z mov.b32 %r15093, %r15041; 2026-02-21T09:06:50.2225819Z mov.b32 %r15094, %r15041; 2026-02-21T09:06:50.2225880Z mov.b32 %r15095, %r15041; 2026-02-21T09:06:50.2225942Z mov.b32 %r15096, %r15041; 2026-02-21T09:06:50.2226051Z mov.b32 %r15097, %r15041; 2026-02-21T09:06:50.2226117Z mov.b32 %r15098, %r15041; 2026-02-21T09:06:50.2226174Z mov.b32 %r15099, %r15041; 2026-02-21T09:06:50.2226232Z mov.b32 %r15100, %r15041; 2026-02-21T09:06:50.2226299Z mov.b32 %r15101, %r15041; 2026-02-21T09:06:50.2226358Z mov.b32 %r15102, %r15041; 2026-02-21T09:06:50.2226417Z mov.b32 %r15103, %r15041; 2026-02-21T09:06:50.2226600Z mov.b32 %r15104, %r15041; 2026-02-21T09:06:50.2226670Z mov.b32 %r15105, %r15041; 2026-02-21T09:06:50.2226730Z mov.b32 %r15106, %r15041; 2026-02-21T09:06:50.2226788Z mov.b32 %r15107, %r15041; 2026-02-21T09:06:50.2226847Z mov.b32 %r15108, %r15041; 2026-02-21T09:06:50.2226904Z mov.b32 %r15109, %r15041; 2026-02-21T09:06:50.2226961Z mov.b32 %r15110, %r15041; 2026-02-21T09:06:50.2227018Z mov.b32 %r15111, %r15041; 2026-02-21T09:06:50.2227079Z mov.b32 %r15112, %r15041; 2026-02-21T09:06:50.2227135Z mov.b32 %r15113, %r15041; 2026-02-21T09:06:50.2227195Z mov.b32 %r15114, %r15041; 2026-02-21T09:06:50.2227260Z mov.b32 %r15115, %r15041; 
2026-02-21T09:06:50.2227318Z mov.b32 %r15116, %r15041; 2026-02-21T09:06:50.2227377Z mov.b32 %r15117, %r15041; 2026-02-21T09:06:50.2227434Z mov.b32 %r15118, %r15041; 2026-02-21T09:06:50.2227501Z mov.b32 %r15119, %r15041; 2026-02-21T09:06:50.2227560Z mov.b32 %r15120, %r15041; 2026-02-21T09:06:50.2227630Z mov.b32 %r15121, %r15041; 2026-02-21T09:06:50.2227695Z mov.b32 %r15122, %r15041; 2026-02-21T09:06:50.2227754Z mov.b32 %r15123, %r15041; 2026-02-21T09:06:50.2227812Z mov.b32 %r15124, %r15041; 2026-02-21T09:06:50.2227872Z mov.b32 %r15125, %r15041; 2026-02-21T09:06:50.2227939Z mov.b32 %r15126, %r15041; 2026-02-21T09:06:50.2227998Z mov.b32 %r15127, %r15041; 2026-02-21T09:06:50.2228055Z mov.b32 %r15128, %r15041; 2026-02-21T09:06:50.2228117Z mov.b32 %r15129, %r15041; 2026-02-21T09:06:50.2228177Z mov.b32 %r15130, %r15041; 2026-02-21T09:06:50.2228235Z mov.b32 %r15131, %r15041; 2026-02-21T09:06:50.2228367Z mov.b32 %r15132, %r15041; 2026-02-21T09:06:50.2228447Z mov.b32 %r15133, %r15041; 2026-02-21T09:06:50.2228507Z mov.b32 %r15134, %r15041; 2026-02-21T09:06:50.2228568Z mov.b32 %r15135, %r15041; 2026-02-21T09:06:50.2228721Z mov.b32 %r15136, %r15041; 2026-02-21T09:06:50.2228781Z mov.b32 %r15137, %r15041; 2026-02-21T09:06:50.2228840Z mov.b32 %r15138, %r15041; 2026-02-21T09:06:50.2228907Z mov.b32 %r15139, %r15041; 2026-02-21T09:06:50.2228969Z mov.b32 %r15140, %r15041; 2026-02-21T09:06:50.2229029Z mov.b32 %r15141, %r15041; 2026-02-21T09:06:50.2229088Z mov.b32 %r15142, %r15041; 2026-02-21T09:06:50.2229151Z mov.b32 %r15143, %r15041; 2026-02-21T09:06:50.2229207Z mov.b32 %r15144, %r15041; 2026-02-21T09:06:50.2229264Z mov.b32 %r15145, %r15041; 2026-02-21T09:06:50.2229323Z mov.b32 %r15146, %r15041; 2026-02-21T09:06:50.2229381Z mov.b32 %r15147, %r15041; 2026-02-21T09:06:50.2229437Z mov.b32 %r15148, %r15041; 2026-02-21T09:06:50.2229495Z mov.b32 %r15149, %r15041; 2026-02-21T09:06:50.2229624Z mov.b32 %r15150, %r15041; 2026-02-21T09:06:50.2229746Z mov.b32 %r15151, %r15041; 
2026-02-21T09:06:50.2229808Z mov.b32 %r15152, %r15041; 2026-02-21T09:06:50.2229871Z mov.b32 %r15153, %r15041; 2026-02-21T09:06:50.2229932Z mov.b32 %r15154, %r15041; 2026-02-21T09:06:50.2229991Z mov.b32 %r15155, %r15041; 2026-02-21T09:06:50.2230062Z mov.b32 %r15156, %r15041; 2026-02-21T09:06:50.2230128Z mov.b32 %r15157, %r15041; 2026-02-21T09:06:50.2230188Z mov.b32 %r15158, %r15041; 2026-02-21T09:06:50.2230246Z mov.b32 %r15159, %r15041; 2026-02-21T09:06:50.2230309Z mov.b32 %r15160, %r15041; 2026-02-21T09:06:50.2230369Z mov.b32 %r15161, %r15041; 2026-02-21T09:06:50.2230427Z mov.b32 %r15162, %r15041; 2026-02-21T09:06:50.2230483Z mov.b32 %r15163, %r15041; 2026-02-21T09:06:50.2230545Z mov.b32 %r15164, %r15041; 2026-02-21T09:06:50.2230604Z mov.b32 %r15165, %r15041; 2026-02-21T09:06:50.2230665Z mov.b32 %r15166, %r15041; 2026-02-21T09:06:50.2230728Z mov.b32 %r15167, %r15041; 2026-02-21T09:06:50.2230789Z mov.b32 %r15168, %r15041; 2026-02-21T09:06:50.2230975Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:06:50.2231087Z // => This Inner Loop Header: Depth=2 2026-02-21T09:06:50.2231295Z .loc 1 55 80 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:80 2026-02-21T09:06:50.2231362Z add.s64 %rd91, %rd163, -96; 2026-02-21T09:06:50.2231422Z // begin inline asm 2026-02-21T09:06:50.2231483Z mov.u32 %r8045, 0x0; 2026-02-21T09:06:50.2231541Z mov.u32 %r8046, 0x0; 2026-02-21T09:06:50.2231597Z mov.u32 %r8047, 0x0; 2026-02-21T09:06:50.2231656Z mov.u32 %r8048, 0x0; 2026-02-21T09:06:50.2231786Z ld.global.v4.b32 { %r8045, %r8046, %r8047, %r8048 }, [ %rd91 + 0 ]; 2026-02-21T09:06:50.2231844Z // end inline asm 2026-02-21T09:06:50.2232038Z .loc 1 59 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:59:32 2026-02-21T09:06:50.2232102Z bar.sync 0; 2026-02-21T09:06:50.2232189Z st.shared.v2.b32 [%r13], {%r8045, %r8046}; 2026-02-21T09:06:50.2232264Z st.shared.v2.b32 [%r14], {%r8047, %r8048}; 2026-02-21T09:06:50.2232323Z bar.sync 0; 2026-02-21T09:06:50.2232406Z 
ld.shared.b16 %rs385, [%r42]; 2026-02-21T09:06:50.2232477Z ld.shared.b16 %rs386, [%r42+256]; 2026-02-21T09:06:50.2232549Z ld.shared.b16 %rs387, [%r42+16]; 2026-02-21T09:06:50.2232615Z ld.shared.b16 %rs388, [%r42+272]; 2026-02-21T09:06:50.2232680Z ld.shared.b16 %rs389, [%r43]; 2026-02-21T09:06:50.2232751Z ld.shared.b16 %rs390, [%r43+256]; 2026-02-21T09:06:50.2232820Z ld.shared.b16 %rs391, [%r43+16]; 2026-02-21T09:06:50.2232886Z ld.shared.b16 %rs392, [%r43+272]; 2026-02-21T09:06:50.2232953Z cvt.f32.bf16 %r8306, %rs385; 2026-02-21T09:06:50.2233019Z cvt.f32.bf16 %r8307, %rs386; 2026-02-21T09:06:50.2233081Z cvt.f32.bf16 %r8308, %rs389; 2026-02-21T09:06:50.2233143Z cvt.f32.bf16 %r8309, %rs390; 2026-02-21T09:06:50.2233206Z cvt.f32.bf16 %r8566, %rs387; 2026-02-21T09:06:50.2233269Z cvt.f32.bf16 %r8567, %rs388; 2026-02-21T09:06:50.2233332Z cvt.f32.bf16 %r8568, %rs391; 2026-02-21T09:06:50.2233393Z cvt.f32.bf16 %r8569, %rs392; 2026-02-21T09:06:50.2233599Z .loc 1 61 34 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:34 2026-02-21T09:06:50.2233741Z cvt.s64.s32 %rd107, %r15040; 2026-02-21T09:06:50.2233804Z add.s64 %rd92, %rd23, %rd107; 2026-02-21T09:06:50.2234007Z .loc 1 61 87 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:87 2026-02-21T09:06:50.2234072Z // begin inline asm 2026-02-21T09:06:50.2234132Z mov.u32 %r8049, 0x0; 2026-02-21T09:06:50.2234214Z ld.global.b32 { %r8049 }, [ %rd92 + 0 ]; 2026-02-21T09:06:50.2234273Z // end inline asm 2026-02-21T09:06:50.2234469Z .loc 1 69 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:69:28 2026-02-21T09:06:50.2234525Z bar.sync 0; 2026-02-21T09:06:50.2234597Z st.shared.b8 [%r17], %r8049; 2026-02-21T09:06:50.2234720Z prmt.b32 %r11193, %r8049, 0, 0x7771U; 2026-02-21T09:06:50.2234834Z st.shared.b8 [%r18+512], %r11193; 2026-02-21T09:06:50.2234908Z prmt.b32 %r11194, %r8049, 0, 0x7772U; 2026-02-21T09:06:50.2234975Z st.shared.b8 [%r19+1024], %r11194; 2026-02-21T09:06:50.2235045Z prmt.b32 %r11195, %r8049, 
0, 0x7773U; 2026-02-21T09:06:50.2235110Z st.shared.b8 [%r20+1536], %r11195; 2026-02-21T09:06:50.2235171Z bar.sync 0; 2026-02-21T09:06:50.2235239Z ld.shared.b32 %r11196, [%r21]; 2026-02-21T09:06:50.2235308Z prmt.b32 %r11197, %r11196, 0, 0x7770U; 2026-02-21T09:06:50.2235383Z cvt.u16.u32 %rs393, %r11197; 2026-02-21T09:06:50.2235457Z prmt.b32 %r11198, %r11196, 0, 0x7771U; 2026-02-21T09:06:50.2235520Z cvt.u16.u32 %rs394, %r11198; 2026-02-21T09:06:50.2235589Z prmt.b32 %r11199, %r11196, 0, 0x7772U; 2026-02-21T09:06:50.2235651Z cvt.u16.u32 %rs395, %r11199; 2026-02-21T09:06:50.2235717Z prmt.b32 %r11200, %r11196, 0, 0x7773U; 2026-02-21T09:06:50.2235777Z cvt.u16.u32 %rs396, %r11200; 2026-02-21T09:06:50.2235847Z ld.shared.b32 %r11201, [%r21+128]; 2026-02-21T09:06:50.2235966Z prmt.b32 %r11202, %r11201, 0, 0x7770U; 2026-02-21T09:06:50.2236029Z cvt.u16.u32 %rs397, %r11202; 2026-02-21T09:06:50.2236097Z prmt.b32 %r11203, %r11201, 0, 0x7771U; 2026-02-21T09:06:50.2236158Z cvt.u16.u32 %rs398, %r11203; 2026-02-21T09:06:50.2236223Z prmt.b32 %r11204, %r11201, 0, 0x7772U; 2026-02-21T09:06:50.2236286Z cvt.u16.u32 %rs399, %r11204; 2026-02-21T09:06:50.2236367Z prmt.b32 %r11205, %r11201, 0, 0x7773U; 2026-02-21T09:06:50.2236432Z cvt.u16.u32 %rs400, %r11205; 2026-02-21T09:06:50.2236748Z .loc 1 64 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:64:28 2026-02-21T09:06:50.2236820Z shl.b16 %rs401, %rs393, 4; 2026-02-21T09:06:50.2236883Z shl.b16 %rs402, %rs394, 4; 2026-02-21T09:06:50.2236946Z shl.b16 %rs403, %rs395, 4; 2026-02-21T09:06:50.2237011Z shl.b16 %rs404, %rs396, 4; 2026-02-21T09:06:50.2237073Z shl.b16 %rs405, %rs397, 4; 2026-02-21T09:06:50.2237134Z shl.b16 %rs406, %rs398, 4; 2026-02-21T09:06:50.2237199Z shl.b16 %rs407, %rs399, 4; 2026-02-21T09:06:50.2237264Z shl.b16 %rs408, %rs400, 4; 2026-02-21T09:06:50.2237456Z .loc 1 79 58 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:79:58 2026-02-21T09:06:50.2237529Z selp.b16 %rs409, %rs401, %rs393, %p81; 
2026-02-21T09:06:50.2237596Z cvt.s16.s8 %rs410, %rs409; 2026-02-21T09:06:50.2237656Z shr.s16 %rs411, %rs410, 4; 2026-02-21T09:06:50.2237724Z selp.b16 %rs412, %rs402, %rs394, %p81; 2026-02-21T09:06:50.2237786Z cvt.s16.s8 %rs413, %rs412; 2026-02-21T09:06:50.2237860Z shr.s16 %rs414, %rs413, 4; 2026-02-21T09:06:50.2237934Z selp.b16 %rs415, %rs403, %rs395, %p81; 2026-02-21T09:06:50.2237995Z cvt.s16.s8 %rs416, %rs415; 2026-02-21T09:06:50.2238060Z shr.s16 %rs417, %rs416, 4; 2026-02-21T09:06:50.2238129Z selp.b16 %rs418, %rs404, %rs396, %p81; 2026-02-21T09:06:50.2238189Z cvt.s16.s8 %rs419, %rs418; 2026-02-21T09:06:50.2238250Z shr.s16 %rs420, %rs419, 4; 2026-02-21T09:06:50.2238324Z selp.b16 %rs421, %rs405, %rs397, %p81; 2026-02-21T09:06:50.2238389Z cvt.s16.s8 %rs422, %rs421; 2026-02-21T09:06:50.2238451Z shr.s16 %rs423, %rs422, 4; 2026-02-21T09:06:50.2238521Z selp.b16 %rs424, %rs406, %rs398, %p81; 2026-02-21T09:06:50.2238668Z cvt.s16.s8 %rs425, %rs424; 2026-02-21T09:06:50.2238729Z shr.s16 %rs426, %rs425, 4; 2026-02-21T09:06:50.2238801Z selp.b16 %rs427, %rs407, %rs399, %p81; 2026-02-21T09:06:50.2238862Z cvt.s16.s8 %rs428, %rs427; 2026-02-21T09:06:50.2238923Z shr.s16 %rs429, %rs428, 4; 2026-02-21T09:06:50.2238991Z selp.b16 %rs430, %rs408, %rs400, %p81; 2026-02-21T09:06:50.2239057Z cvt.s16.s8 %rs431, %rs430; 2026-02-21T09:06:50.2239116Z shr.s16 %rs432, %rs431, 4; 2026-02-21T09:06:50.2239318Z .loc 1 84 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:84:32 2026-02-21T09:06:50.2239388Z cvt.rn.f32.s16 %r11206, %rs411; 2026-02-21T09:06:50.2239455Z cvt.rn.f32.s16 %r11207, %rs414; 2026-02-21T09:06:50.2239518Z cvt.rn.f32.s16 %r11208, %rs417; 2026-02-21T09:06:50.2239723Z cvt.rn.f32.s16 %r11209, %rs420; 2026-02-21T09:06:50.2239790Z cvt.rn.f32.s16 %r11210, %rs423; 2026-02-21T09:06:50.2239858Z cvt.rn.f32.s16 %r11211, %rs426; 2026-02-21T09:06:50.2239927Z cvt.rn.f32.s16 %r11212, %rs429; 2026-02-21T09:06:50.2239990Z cvt.rn.f32.s16 %r11213, %rs432; 2026-02-21T09:06:50.2240046Z 
bar.sync 0; 2026-02-21T09:06:50.2240115Z st.shared.b32 [%r22], %r11206; 2026-02-21T09:06:50.2240183Z st.shared.b32 [%r22+8], %r11207; 2026-02-21T09:06:50.2240246Z st.shared.b32 [%r23], %r11208; 2026-02-21T09:06:50.2240312Z st.shared.b32 [%r23+8], %r11209; 2026-02-21T09:06:50.2240378Z st.shared.b32 [%r24], %r11210; 2026-02-21T09:06:50.2240443Z st.shared.b32 [%r24+8], %r11211; 2026-02-21T09:06:50.2240505Z st.shared.b32 [%r25], %r11212; 2026-02-21T09:06:50.2240573Z st.shared.b32 [%r25+8], %r11213; 2026-02-21T09:06:50.2240629Z $L__tmp17: 2026-02-21T09:06:50.2240907Z .loc 2 291 36 // standard.py:291:36 @[ cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:91:40 ] 2026-02-21T09:06:50.2241057Z // begin inline asm 2026-02-21T09:06:50.2241142Z fence.proxy.async.shared::cta; 2026-02-21T09:06:50.2241200Z // end inline asm 2026-02-21T09:06:50.2241257Z bar.sync 0; 2026-02-21T09:06:50.2241340Z wgmma.fence.sync.aligned; 2026-02-21T09:06:50.2241410Z mov.pred %p46, -1; 2026-02-21T09:06:50.2241470Z // begin inline asm 2026-02-21T09:06:50.2244132Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r15041,%r15042,%r15043,%r15044,%r15045,%r15046,%r15047,%r15048,%r15049,%r15050,%r15051,%r15052,%r15053,%r15054,%r15055,%r15056,%r15057,%r15058,%r15059,%r15060,%r15061,%r15062,%r15063,%r15064,%r15065,%r15066,%r15067,%r15068,%r15069,%r15070,%r15071,%r15072,%r15073,%r15074,%r15075,%r15076,%r15077,%r15078,%r15079,%r15080,%r15081,%r15082,%r15083,%r15084,%r15085,%r15086,%r15087,%r15088,%r15089,%r15090,%r15091,%r15092,%r15093,%r15094,%r15095,%r15096,%r15097,%r15098,%r15099,%r15100,%r15101,%r15102,%r15103,%r15104,%r15105,%r15106,%r15107,%r15108,%r15109,%r15110,%r15111,%r15112,%r15113,%r15114,%r15115,%r15116,%r15117,%r15118,%r15119,%r15120,%r15121,%r15122,%r15123,%r15124,%r15125,%r15126,%r15127,%r15128,%r15129,%r15130,%r15131,%r15132,%r15133,%r15134,%r15135,%r15136,%r15137,%r15138,%r15139,%r15140,%r15141,%r15142,%r15143,%r15144,%r15145,%r15146,%r15147,%r15148,%r15149,%r15150,%r15151,%r15152,%r15153,%r15154,%r15155,%r15156,%r15157,%r15158,%r15159,%r15160,%r15161,%r15162,%r15163,%r15164,%r15165,%r15166,%r15167,%r15168}, {%r8306,%r8307,%r8308,%r8309}, %rd93, %p46, 1, 1; 2026-02-21T09:06:50.2244197Z // end inline asm 2026-02-21T09:06:50.2244260Z // begin inline asm 2026-02-21T09:06:50.2247026Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r15041,%r15042,%r15043,%r15044,%r15045,%r15046,%r15047,%r15048,%r15049,%r15050,%r15051,%r15052,%r15053,%r15054,%r15055,%r15056,%r15057,%r15058,%r15059,%r15060,%r15061,%r15062,%r15063,%r15064,%r15065,%r15066,%r15067,%r15068,%r15069,%r15070,%r15071,%r15072,%r15073,%r15074,%r15075,%r15076,%r15077,%r15078,%r15079,%r15080,%r15081,%r15082,%r15083,%r15084,%r15085,%r15086,%r15087,%r15088,%r15089,%r15090,%r15091,%r15092,%r15093,%r15094,%r15095,%r15096,%r15097,%r15098,%r15099,%r15100,%r15101,%r15102,%r15103,%r15104,%r15105,%r15106,%r15107,%r15108,%r15109,%r15110,%r15111,%r15112,%r15113,%r15114,%r15115,%r15116,%r15117,%r15118,%r15119,%r15120,%r15121,%r15122,%r15123,%r15124,%r15125,%r15126,%r15127,%r15128,%r15129,%r15130,%r15131,%r15132,%r15133,%r15134,%r15135,%r15136,%r15137,%r15138,%r15139,%r15140,%r15141,%r15142,%r15143,%r15144,%r15145,%r15146,%r15147,%r15148,%r15149,%r15150,%r15151,%r15152,%r15153,%r15154,%r15155,%r15156,%r15157,%r15158,%r15159,%r15160,%r15161,%r15162,%r15163,%r15164,%r15165,%r15166,%r15167,%r15168}, {%r8566,%r8567,%r8568,%r8569}, %rd94, %p46, 1, 1; 2026-02-21T09:06:50.2247174Z // end inline asm 2026-02-21T09:06:50.2247250Z wgmma.commit_group.sync.aligned; 2026-02-21T09:06:50.2247310Z mov.b32 %r11061, 0; 2026-02-21T09:06:50.2247376Z mov.b32 %r8698, %r12221; 2026-02-21T09:06:50.2247506Z mov.b32 %r8699, %r11061; 2026-02-21T09:06:50.2247625Z mov.b32 %r8700, %r11061; 2026-02-21T09:06:50.2247704Z // begin inline asm 2026-02-21T09:06:50.2250219Z // wait for regs: 
%r15041,%r15042,%r15043,%r15044,%r15045,%r15046,%r15047,%r15048,%r15049,%r15050,%r15051,%r15052,%r15053,%r15054,%r15055,%r15056,%r15057,%r15058,%r15059,%r15060,%r15061,%r15062,%r15063,%r15064,%r15065,%r15066,%r15067,%r15068,%r15069,%r15070,%r15071,%r15072,%r15073,%r15074,%r15075,%r15076,%r15077,%r15078,%r15079,%r15080,%r15081,%r15082,%r15083,%r15084,%r15085,%r15086,%r15087,%r15088,%r15089,%r15090,%r15091,%r15092,%r15093,%r15094,%r15095,%r15096,%r15097,%r15098,%r15099,%r15100,%r15101,%r15102,%r15103,%r15104,%r15105,%r15106,%r15107,%r15108,%r15109,%r15110,%r15111,%r15112,%r15113,%r15114,%r15115,%r15116,%r15117,%r15118,%r15119,%r15120,%r15121,%r15122,%r15123,%r15124,%r15125,%r15126,%r15127,%r15128,%r15129,%r15130,%r15131,%r15132,%r15133,%r15134,%r15135,%r15136,%r15137,%r15138,%r15139,%r15140,%r15141,%r15142,%r15143,%r15144,%r15145,%r15146,%r15147,%r15148,%r15149,%r15150,%r15151,%r15152,%r15153,%r15154,%r15155,%r15156,%r15157,%r15158,%r15159,%r15160,%r15161,%r15162,%r15163,%r15164,%r15165,%r15166,%r15167,%r15168,%r8698,%r8699,%r8700 2026-02-21T09:06:50.2250307Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:06:50.2250363Z // end inline asm 2026-02-21T09:06:50.2250417Z $L__tmp18: 2026-02-21T09:06:50.2250619Z .loc 1 55 80 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:80 2026-02-21T09:06:50.2250691Z add.s64 %rd95, %rd163, -64; 2026-02-21T09:06:50.2250751Z // begin inline asm 2026-02-21T09:06:50.2250809Z mov.u32 %r8832, 0x0; 2026-02-21T09:06:50.2250869Z mov.u32 %r8833, 0x0; 2026-02-21T09:06:50.2250927Z mov.u32 %r8834, 0x0; 2026-02-21T09:06:50.2250984Z mov.u32 %r8835, 0x0; 2026-02-21T09:06:50.2251116Z ld.global.v4.b32 { %r8832, %r8833, %r8834, %r8835 }, [ %rd95 + 0 ]; 2026-02-21T09:06:50.2251173Z // end inline asm 2026-02-21T09:06:50.2251368Z .loc 1 59 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:59:32 2026-02-21T09:06:50.2251427Z bar.sync 0; 2026-02-21T09:06:50.2251510Z st.shared.v2.b32 [%r13], {%r8832, %r8833}; 2026-02-21T09:06:50.2251586Z 
st.shared.v2.b32 [%r14], {%r8834, %r8835}; 2026-02-21T09:06:50.2251643Z bar.sync 0; 2026-02-21T09:06:50.2251714Z ld.shared.b16 %rs433, [%r42]; 2026-02-21T09:06:50.2251783Z ld.shared.b16 %rs434, [%r42+256]; 2026-02-21T09:06:50.2251850Z ld.shared.b16 %rs435, [%r42+16]; 2026-02-21T09:06:50.2251915Z ld.shared.b16 %rs436, [%r42+272]; 2026-02-21T09:06:50.2251982Z ld.shared.b16 %rs437, [%r43]; 2026-02-21T09:06:50.2252051Z ld.shared.b16 %rs438, [%r43+256]; 2026-02-21T09:06:50.2252116Z ld.shared.b16 %rs439, [%r43+16]; 2026-02-21T09:06:50.2252182Z ld.shared.b16 %rs440, [%r43+272]; 2026-02-21T09:06:50.2252246Z cvt.f32.bf16 %r9093, %rs433; 2026-02-21T09:06:50.2252308Z cvt.f32.bf16 %r9094, %rs434; 2026-02-21T09:06:50.2252370Z cvt.f32.bf16 %r9095, %rs437; 2026-02-21T09:06:50.2252449Z cvt.f32.bf16 %r9096, %rs438; 2026-02-21T09:06:50.2252516Z cvt.f32.bf16 %r9353, %rs435; 2026-02-21T09:06:50.2252579Z cvt.f32.bf16 %r9354, %rs436; 2026-02-21T09:06:50.2252643Z cvt.f32.bf16 %r9355, %rs439; 2026-02-21T09:06:50.2252703Z cvt.f32.bf16 %r9356, %rs440; 2026-02-21T09:06:50.2252829Z cvt.u32.u64 %r11214, %rd164; 2026-02-21T09:06:50.2252893Z add.s32 %r11215, %r11214, 40; 2026-02-21T09:06:50.2253090Z .loc 1 61 62 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:62 2026-02-21T09:06:50.2253151Z or.b32 %r11216, %r8, %r11215; 2026-02-21T09:06:50.2253210Z shl.b32 %r11217, %r11216, 13; 2026-02-21T09:06:50.2253275Z add.s32 %r11218, %r11217, %r575; 2026-02-21T09:06:50.2253467Z .loc 1 61 34 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:34 2026-02-21T09:06:50.2253531Z cvt.s64.s32 %rd108, %r11218; 2026-02-21T09:06:50.2253598Z add.s64 %rd96, %rd23, %rd108; 2026-02-21T09:06:50.2253840Z .loc 1 61 87 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:87 2026-02-21T09:06:50.2253950Z // begin inline asm 2026-02-21T09:06:50.2254014Z mov.u32 %r8836, 0x0; 2026-02-21T09:06:50.2254088Z ld.global.b32 { %r8836 }, [ %rd96 + 0 ]; 2026-02-21T09:06:50.2254147Z // end inline asm 
2026-02-21T09:06:50.2254355Z .loc 1 69 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:69:28 2026-02-21T09:06:50.2254415Z bar.sync 0; 2026-02-21T09:06:50.2254479Z st.shared.b8 [%r17], %r8836; 2026-02-21T09:06:50.2254547Z prmt.b32 %r11219, %r8836, 0, 0x7771U; 2026-02-21T09:06:50.2254617Z st.shared.b8 [%r18+512], %r11219; 2026-02-21T09:06:50.2254684Z prmt.b32 %r11220, %r8836, 0, 0x7772U; 2026-02-21T09:06:50.2254749Z st.shared.b8 [%r19+1024], %r11220; 2026-02-21T09:06:50.2254814Z prmt.b32 %r11221, %r8836, 0, 0x7773U; 2026-02-21T09:06:50.2254884Z st.shared.b8 [%r20+1536], %r11221; 2026-02-21T09:06:50.2254941Z bar.sync 0; 2026-02-21T09:06:50.2255006Z ld.shared.b32 %r11222, [%r21]; 2026-02-21T09:06:50.2255083Z prmt.b32 %r11223, %r11222, 0, 0x7770U; 2026-02-21T09:06:50.2255213Z cvt.u16.u32 %rs441, %r11223; 2026-02-21T09:06:50.2255281Z prmt.b32 %r11224, %r11222, 0, 0x7771U; 2026-02-21T09:06:50.2255345Z cvt.u16.u32 %rs442, %r11224; 2026-02-21T09:06:50.2255415Z prmt.b32 %r11225, %r11222, 0, 0x7772U; 2026-02-21T09:06:50.2255477Z cvt.u16.u32 %rs443, %r11225; 2026-02-21T09:06:50.2255541Z prmt.b32 %r11226, %r11222, 0, 0x7773U; 2026-02-21T09:06:50.2255605Z cvt.u16.u32 %rs444, %r11226; 2026-02-21T09:06:50.2255670Z ld.shared.b32 %r11227, [%r21+128]; 2026-02-21T09:06:50.2255735Z prmt.b32 %r11228, %r11227, 0, 0x7770U; 2026-02-21T09:06:50.2255798Z cvt.u16.u32 %rs445, %r11228; 2026-02-21T09:06:50.2255862Z prmt.b32 %r11229, %r11227, 0, 0x7771U; 2026-02-21T09:06:50.2255922Z cvt.u16.u32 %rs446, %r11229; 2026-02-21T09:06:50.2255986Z prmt.b32 %r11230, %r11227, 0, 0x7772U; 2026-02-21T09:06:50.2256051Z cvt.u16.u32 %rs447, %r11230; 2026-02-21T09:06:50.2256113Z prmt.b32 %r11231, %r11227, 0, 0x7773U; 2026-02-21T09:06:50.2256179Z cvt.u16.u32 %rs448, %r11231; 2026-02-21T09:06:50.2256374Z .loc 1 64 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:64:28 2026-02-21T09:06:50.2256440Z shl.b16 %rs449, %rs441, 4; 2026-02-21T09:06:50.2256630Z shl.b16 %rs450, %rs442, 4; 
2026-02-21T09:06:50.2256699Z shl.b16 %rs451, %rs443, 4; 2026-02-21T09:06:50.2256762Z shl.b16 %rs452, %rs444, 4; 2026-02-21T09:06:50.2256822Z shl.b16 %rs453, %rs445, 4; 2026-02-21T09:06:50.2256882Z shl.b16 %rs454, %rs446, 4; 2026-02-21T09:06:50.2256947Z shl.b16 %rs455, %rs447, 4; 2026-02-21T09:06:50.2257006Z shl.b16 %rs456, %rs448, 4; 2026-02-21T09:06:50.2257198Z .loc 1 79 58 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:79:58 2026-02-21T09:06:50.2257271Z selp.b16 %rs457, %rs449, %rs441, %p81; 2026-02-21T09:06:50.2257333Z cvt.s16.s8 %rs458, %rs457; 2026-02-21T09:06:50.2257394Z shr.s16 %rs459, %rs458, 4; 2026-02-21T09:06:50.2257463Z selp.b16 %rs460, %rs450, %rs442, %p81; 2026-02-21T09:06:50.2257529Z cvt.s16.s8 %rs461, %rs460; 2026-02-21T09:06:50.2257589Z shr.s16 %rs462, %rs461, 4; 2026-02-21T09:06:50.2257658Z selp.b16 %rs463, %rs451, %rs443, %p81; 2026-02-21T09:06:50.2257822Z cvt.s16.s8 %rs464, %rs463; 2026-02-21T09:06:50.2257884Z shr.s16 %rs465, %rs464, 4; 2026-02-21T09:06:50.2257954Z selp.b16 %rs466, %rs452, %rs444, %p81; 2026-02-21T09:06:50.2258014Z cvt.s16.s8 %rs467, %rs466; 2026-02-21T09:06:50.2258078Z shr.s16 %rs468, %rs467, 4; 2026-02-21T09:06:50.2258145Z selp.b16 %rs469, %rs453, %rs445, %p81; 2026-02-21T09:06:50.2258205Z cvt.s16.s8 %rs470, %rs469; 2026-02-21T09:06:50.2258269Z shr.s16 %rs471, %rs470, 4; 2026-02-21T09:06:50.2258339Z selp.b16 %rs472, %rs454, %rs446, %p81; 2026-02-21T09:06:50.2258399Z cvt.s16.s8 %rs473, %rs472; 2026-02-21T09:06:50.2258462Z shr.s16 %rs474, %rs473, 4; 2026-02-21T09:06:50.2258528Z selp.b16 %rs475, %rs455, %rs447, %p81; 2026-02-21T09:06:50.2258588Z cvt.s16.s8 %rs476, %rs475; 2026-02-21T09:06:50.2258793Z shr.s16 %rs477, %rs476, 4; 2026-02-21T09:06:50.2258867Z selp.b16 %rs478, %rs456, %rs448, %p81; 2026-02-21T09:06:50.2258927Z cvt.s16.s8 %rs479, %rs478; 2026-02-21T09:06:50.2258988Z shr.s16 %rs480, %rs479, 4; 2026-02-21T09:06:50.2259191Z .loc 1 84 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:84:32 
2026-02-21T09:06:50.2259260Z cvt.rn.f32.s16 %r11232, %rs459; 2026-02-21T09:06:50.2259324Z cvt.rn.f32.s16 %r11233, %rs462; 2026-02-21T09:06:50.2259387Z cvt.rn.f32.s16 %r11234, %rs465; 2026-02-21T09:06:50.2259452Z cvt.rn.f32.s16 %r11235, %rs468; 2026-02-21T09:06:50.2259515Z cvt.rn.f32.s16 %r11236, %rs471; 2026-02-21T09:06:50.2259575Z cvt.rn.f32.s16 %r11237, %rs474; 2026-02-21T09:06:50.2259640Z cvt.rn.f32.s16 %r11238, %rs477; 2026-02-21T09:06:50.2259702Z cvt.rn.f32.s16 %r11239, %rs480; 2026-02-21T09:06:50.2259758Z bar.sync 0; 2026-02-21T09:06:50.2259826Z st.shared.b32 [%r22], %r11232; 2026-02-21T09:06:50.2259895Z st.shared.b32 [%r22+8], %r11233; 2026-02-21T09:06:50.2260024Z st.shared.b32 [%r23], %r11234; 2026-02-21T09:06:50.2260091Z st.shared.b32 [%r23+8], %r11235; 2026-02-21T09:06:50.2260158Z st.shared.b32 [%r24], %r11236; 2026-02-21T09:06:50.2260222Z st.shared.b32 [%r24+8], %r11237; 2026-02-21T09:06:50.2260285Z st.shared.b32 [%r25], %r11238; 2026-02-21T09:06:50.2260349Z st.shared.b32 [%r25+8], %r11239; 2026-02-21T09:06:50.2260402Z $L__tmp19: 2026-02-21T09:06:50.2260669Z .loc 2 291 36 // standard.py:291:36 @[ cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:91:40 ] 2026-02-21T09:06:50.2260730Z // begin inline asm 2026-02-21T09:06:50.2260812Z fence.proxy.async.shared::cta; 2026-02-21T09:06:50.2260868Z // end inline asm 2026-02-21T09:06:50.2260923Z bar.sync 0; 2026-02-21T09:06:50.2261009Z wgmma.fence.sync.aligned; 2026-02-21T09:06:50.2261070Z // begin inline asm 2026-02-21T09:06:50.2263737Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r15041,%r15042,%r15043,%r15044,%r15045,%r15046,%r15047,%r15048,%r15049,%r15050,%r15051,%r15052,%r15053,%r15054,%r15055,%r15056,%r15057,%r15058,%r15059,%r15060,%r15061,%r15062,%r15063,%r15064,%r15065,%r15066,%r15067,%r15068,%r15069,%r15070,%r15071,%r15072,%r15073,%r15074,%r15075,%r15076,%r15077,%r15078,%r15079,%r15080,%r15081,%r15082,%r15083,%r15084,%r15085,%r15086,%r15087,%r15088,%r15089,%r15090,%r15091,%r15092,%r15093,%r15094,%r15095,%r15096,%r15097,%r15098,%r15099,%r15100,%r15101,%r15102,%r15103,%r15104,%r15105,%r15106,%r15107,%r15108,%r15109,%r15110,%r15111,%r15112,%r15113,%r15114,%r15115,%r15116,%r15117,%r15118,%r15119,%r15120,%r15121,%r15122,%r15123,%r15124,%r15125,%r15126,%r15127,%r15128,%r15129,%r15130,%r15131,%r15132,%r15133,%r15134,%r15135,%r15136,%r15137,%r15138,%r15139,%r15140,%r15141,%r15142,%r15143,%r15144,%r15145,%r15146,%r15147,%r15148,%r15149,%r15150,%r15151,%r15152,%r15153,%r15154,%r15155,%r15156,%r15157,%r15158,%r15159,%r15160,%r15161,%r15162,%r15163,%r15164,%r15165,%r15166,%r15167,%r15168}, {%r9093,%r9094,%r9095,%r9096}, %rd93, %p46, 1, 1; 2026-02-21T09:06:50.2263806Z // end inline asm 2026-02-21T09:06:50.2263866Z // begin inline asm 2026-02-21T09:06:50.2266712Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r15041,%r15042,%r15043,%r15044,%r15045,%r15046,%r15047,%r15048,%r15049,%r15050,%r15051,%r15052,%r15053,%r15054,%r15055,%r15056,%r15057,%r15058,%r15059,%r15060,%r15061,%r15062,%r15063,%r15064,%r15065,%r15066,%r15067,%r15068,%r15069,%r15070,%r15071,%r15072,%r15073,%r15074,%r15075,%r15076,%r15077,%r15078,%r15079,%r15080,%r15081,%r15082,%r15083,%r15084,%r15085,%r15086,%r15087,%r15088,%r15089,%r15090,%r15091,%r15092,%r15093,%r15094,%r15095,%r15096,%r15097,%r15098,%r15099,%r15100,%r15101,%r15102,%r15103,%r15104,%r15105,%r15106,%r15107,%r15108,%r15109,%r15110,%r15111,%r15112,%r15113,%r15114,%r15115,%r15116,%r15117,%r15118,%r15119,%r15120,%r15121,%r15122,%r15123,%r15124,%r15125,%r15126,%r15127,%r15128,%r15129,%r15130,%r15131,%r15132,%r15133,%r15134,%r15135,%r15136,%r15137,%r15138,%r15139,%r15140,%r15141,%r15142,%r15143,%r15144,%r15145,%r15146,%r15147,%r15148,%r15149,%r15150,%r15151,%r15152,%r15153,%r15154,%r15155,%r15156,%r15157,%r15158,%r15159,%r15160,%r15161,%r15162,%r15163,%r15164,%r15165,%r15166,%r15167,%r15168}, {%r9353,%r9354,%r9355,%r9356}, %rd94, %p46, 1, 1; 2026-02-21T09:06:50.2266894Z // end inline asm 2026-02-21T09:06:50.2266979Z wgmma.commit_group.sync.aligned; 2026-02-21T09:06:50.2267041Z mov.b32 %r9485, %r12221; 2026-02-21T09:06:50.2267112Z mov.b32 %r9486, %r11061; 2026-02-21T09:06:50.2267176Z mov.b32 %r9487, %r11061; 2026-02-21T09:06:50.2267237Z // begin inline asm 2026-02-21T09:06:50.2269836Z // wait for regs: 
%r15041,%r15042,%r15043,%r15044,%r15045,%r15046,%r15047,%r15048,%r15049,%r15050,%r15051,%r15052,%r15053,%r15054,%r15055,%r15056,%r15057,%r15058,%r15059,%r15060,%r15061,%r15062,%r15063,%r15064,%r15065,%r15066,%r15067,%r15068,%r15069,%r15070,%r15071,%r15072,%r15073,%r15074,%r15075,%r15076,%r15077,%r15078,%r15079,%r15080,%r15081,%r15082,%r15083,%r15084,%r15085,%r15086,%r15087,%r15088,%r15089,%r15090,%r15091,%r15092,%r15093,%r15094,%r15095,%r15096,%r15097,%r15098,%r15099,%r15100,%r15101,%r15102,%r15103,%r15104,%r15105,%r15106,%r15107,%r15108,%r15109,%r15110,%r15111,%r15112,%r15113,%r15114,%r15115,%r15116,%r15117,%r15118,%r15119,%r15120,%r15121,%r15122,%r15123,%r15124,%r15125,%r15126,%r15127,%r15128,%r15129,%r15130,%r15131,%r15132,%r15133,%r15134,%r15135,%r15136,%r15137,%r15138,%r15139,%r15140,%r15141,%r15142,%r15143,%r15144,%r15145,%r15146,%r15147,%r15148,%r15149,%r15150,%r15151,%r15152,%r15153,%r15154,%r15155,%r15156,%r15157,%r15158,%r15159,%r15160,%r15161,%r15162,%r15163,%r15164,%r15165,%r15166,%r15167,%r15168,%r9485,%r9486,%r9487 2026-02-21T09:06:50.2269929Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:06:50.2269992Z // end inline asm 2026-02-21T09:06:50.2270051Z $L__tmp20: 2026-02-21T09:06:50.2270262Z .loc 1 55 80 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:80 2026-02-21T09:06:50.2270329Z add.s64 %rd99, %rd163, -32; 2026-02-21T09:06:50.2270391Z // begin inline asm 2026-02-21T09:06:50.2270455Z mov.u32 %r9619, 0x0; 2026-02-21T09:06:50.2270515Z mov.u32 %r9620, 0x0; 2026-02-21T09:06:50.2270572Z mov.u32 %r9621, 0x0; 2026-02-21T09:06:50.2270632Z mov.u32 %r9622, 0x0; 2026-02-21T09:06:50.2270766Z ld.global.v4.b32 { %r9619, %r9620, %r9621, %r9622 }, [ %rd99 + 0 ]; 2026-02-21T09:06:50.2270823Z // end inline asm 2026-02-21T09:06:50.2271024Z .loc 1 59 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:59:32 2026-02-21T09:06:50.2271083Z bar.sync 0; 2026-02-21T09:06:50.2271165Z st.shared.v2.b32 [%r13], {%r9619, %r9620}; 2026-02-21T09:06:50.2271241Z 
st.shared.v2.b32 [%r14], {%r9621, %r9622}; 2026-02-21T09:06:50.2271297Z bar.sync 0; 2026-02-21T09:06:50.2271366Z ld.shared.b16 %rs481, [%r42]; 2026-02-21T09:06:50.2271435Z ld.shared.b16 %rs482, [%r42+256]; 2026-02-21T09:06:50.2271506Z ld.shared.b16 %rs483, [%r42+16]; 2026-02-21T09:06:50.2271573Z ld.shared.b16 %rs484, [%r42+272]; 2026-02-21T09:06:50.2271639Z ld.shared.b16 %rs485, [%r43]; 2026-02-21T09:06:50.2271706Z ld.shared.b16 %rs486, [%r43+256]; 2026-02-21T09:06:50.2271776Z ld.shared.b16 %rs487, [%r43+16]; 2026-02-21T09:06:50.2271853Z ld.shared.b16 %rs488, [%r43+272]; 2026-02-21T09:06:50.2271987Z cvt.f32.bf16 %r9880, %rs481; 2026-02-21T09:06:50.2272050Z cvt.f32.bf16 %r9881, %rs482; 2026-02-21T09:06:50.2272111Z cvt.f32.bf16 %r9882, %rs485; 2026-02-21T09:06:50.2272175Z cvt.f32.bf16 %r9883, %rs486; 2026-02-21T09:06:50.2272236Z cvt.f32.bf16 %r10140, %rs483; 2026-02-21T09:06:50.2272302Z cvt.f32.bf16 %r10141, %rs484; 2026-02-21T09:06:50.2272364Z cvt.f32.bf16 %r10142, %rs487; 2026-02-21T09:06:50.2272425Z cvt.f32.bf16 %r10143, %rs488; 2026-02-21T09:06:50.2272631Z .loc 1 61 34 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:34 2026-02-21T09:06:50.2272695Z add.s32 %r11240, %r15040, 131072; 2026-02-21T09:06:50.2272757Z cvt.s64.s32 %rd109, %r11240; 2026-02-21T09:06:50.2272824Z add.s64 %rd100, %rd23, %rd109; 2026-02-21T09:06:50.2273125Z .loc 1 61 87 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:87 2026-02-21T09:06:50.2273188Z // begin inline asm 2026-02-21T09:06:50.2273246Z mov.u32 %r9623, 0x0; 2026-02-21T09:06:50.2273327Z ld.global.b32 { %r9623 }, [ %rd100 + 0 ]; 2026-02-21T09:06:50.2273385Z // end inline asm 2026-02-21T09:06:50.2273578Z .loc 1 69 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:69:28 2026-02-21T09:06:50.2273649Z bar.sync 0; 2026-02-21T09:06:50.2273715Z st.shared.b8 [%r17], %r9623; 2026-02-21T09:06:50.2273783Z prmt.b32 %r11241, %r9623, 0, 0x7771U; 2026-02-21T09:06:50.2273847Z st.shared.b8 [%r18+512], %r11241; 
2026-02-21T09:06:50.2273916Z prmt.b32 %r11242, %r9623, 0, 0x7772U; 2026-02-21T09:06:50.2273983Z st.shared.b8 [%r19+1024], %r11242; 2026-02-21T09:06:50.2274047Z prmt.b32 %r11243, %r9623, 0, 0x7773U; 2026-02-21T09:06:50.2274115Z st.shared.b8 [%r20+1536], %r11243; 2026-02-21T09:06:50.2274170Z bar.sync 0; 2026-02-21T09:06:50.2274236Z ld.shared.b32 %r11244, [%r21]; 2026-02-21T09:06:50.2274360Z prmt.b32 %r11245, %r11244, 0, 0x7770U; 2026-02-21T09:06:50.2274426Z cvt.u16.u32 %rs489, %r11245; 2026-02-21T09:06:50.2274491Z prmt.b32 %r11246, %r11244, 0, 0x7771U; 2026-02-21T09:06:50.2274555Z cvt.u16.u32 %rs490, %r11246; 2026-02-21T09:06:50.2274633Z prmt.b32 %r11247, %r11244, 0, 0x7772U; 2026-02-21T09:06:50.2274697Z cvt.u16.u32 %rs491, %r11247; 2026-02-21T09:06:50.2274763Z prmt.b32 %r11248, %r11244, 0, 0x7773U; 2026-02-21T09:06:50.2274826Z cvt.u16.u32 %rs492, %r11248; 2026-02-21T09:06:50.2274890Z ld.shared.b32 %r11249, [%r21+128]; 2026-02-21T09:06:50.2274955Z prmt.b32 %r11250, %r11249, 0, 0x7770U; 2026-02-21T09:06:50.2275016Z cvt.u16.u32 %rs493, %r11250; 2026-02-21T09:06:50.2275084Z prmt.b32 %r11251, %r11249, 0, 0x7771U; 2026-02-21T09:06:50.2275145Z cvt.u16.u32 %rs494, %r11251; 2026-02-21T09:06:50.2275209Z prmt.b32 %r11252, %r11249, 0, 0x7772U; 2026-02-21T09:06:50.2275275Z cvt.u16.u32 %rs495, %r11252; 2026-02-21T09:06:50.2275342Z prmt.b32 %r11253, %r11249, 0, 0x7773U; 2026-02-21T09:06:50.2275403Z cvt.u16.u32 %rs496, %r11253; 2026-02-21T09:06:50.2275599Z .loc 1 64 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:64:28 2026-02-21T09:06:50.2275670Z shl.b16 %rs497, %rs489, 4; 2026-02-21T09:06:50.2275731Z shl.b16 %rs498, %rs490, 4; 2026-02-21T09:06:50.2275792Z shl.b16 %rs499, %rs491, 4; 2026-02-21T09:06:50.2275866Z shl.b16 %rs500, %rs492, 4; 2026-02-21T09:06:50.2275928Z shl.b16 %rs501, %rs493, 4; 2026-02-21T09:06:50.2275989Z shl.b16 %rs502, %rs494, 4; 2026-02-21T09:06:50.2276051Z shl.b16 %rs503, %rs495, 4; 2026-02-21T09:06:50.2276113Z shl.b16 %rs504, %rs496, 4; 
2026-02-21T09:06:50.2276306Z .loc 1 79 58 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:79:58 2026-02-21T09:06:50.2276375Z selp.b16 %rs505, %rs497, %rs489, %p81; 2026-02-21T09:06:50.2276440Z cvt.s16.s8 %rs506, %rs505; 2026-02-21T09:06:50.2276622Z shr.s16 %rs507, %rs506, 4; 2026-02-21T09:06:50.2276696Z selp.b16 %rs508, %rs498, %rs490, %p81; 2026-02-21T09:06:50.2276760Z cvt.s16.s8 %rs509, %rs508; 2026-02-21T09:06:50.2276820Z shr.s16 %rs510, %rs509, 4; 2026-02-21T09:06:50.2276970Z selp.b16 %rs511, %rs499, %rs491, %p81; 2026-02-21T09:06:50.2277034Z cvt.s16.s8 %rs512, %rs511; 2026-02-21T09:06:50.2277095Z shr.s16 %rs513, %rs512, 4; 2026-02-21T09:06:50.2277162Z selp.b16 %rs514, %rs500, %rs492, %p81; 2026-02-21T09:06:50.2277224Z cvt.s16.s8 %rs515, %rs514; 2026-02-21T09:06:50.2277287Z shr.s16 %rs516, %rs515, 4; 2026-02-21T09:06:50.2277355Z selp.b16 %rs517, %rs501, %rs493, %p81; 2026-02-21T09:06:50.2277414Z cvt.s16.s8 %rs518, %rs517; 2026-02-21T09:06:50.2277476Z shr.s16 %rs519, %rs518, 4; 2026-02-21T09:06:50.2277542Z selp.b16 %rs520, %rs502, %rs494, %p81; 2026-02-21T09:06:50.2277601Z cvt.s16.s8 %rs521, %rs520; 2026-02-21T09:06:50.2277660Z shr.s16 %rs522, %rs521, 4; 2026-02-21T09:06:50.2277731Z selp.b16 %rs523, %rs503, %rs495, %p81; 2026-02-21T09:06:50.2277937Z cvt.s16.s8 %rs524, %rs523; 2026-02-21T09:06:50.2278003Z shr.s16 %rs525, %rs524, 4; 2026-02-21T09:06:50.2278074Z selp.b16 %rs526, %rs504, %rs496, %p81; 2026-02-21T09:06:50.2278137Z cvt.s16.s8 %rs527, %rs526; 2026-02-21T09:06:50.2278210Z shr.s16 %rs528, %rs527, 4; 2026-02-21T09:06:50.2278411Z .loc 1 84 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:84:32 2026-02-21T09:06:50.2278482Z cvt.rn.f32.s16 %r11254, %rs507; 2026-02-21T09:06:50.2278546Z cvt.rn.f32.s16 %r11255, %rs510; 2026-02-21T09:06:50.2278610Z cvt.rn.f32.s16 %r11256, %rs513; 2026-02-21T09:06:50.2278675Z cvt.rn.f32.s16 %r11257, %rs516; 2026-02-21T09:06:50.2278738Z cvt.rn.f32.s16 %r11258, %rs519; 2026-02-21T09:06:50.2278798Z 
cvt.rn.f32.s16 %r11259, %rs522; 2026-02-21T09:06:50.2278860Z cvt.rn.f32.s16 %r11260, %rs525; 2026-02-21T09:06:50.2278924Z cvt.rn.f32.s16 %r11261, %rs528; 2026-02-21T09:06:50.2278978Z bar.sync 0; 2026-02-21T09:06:50.2279045Z st.shared.b32 [%r22], %r11254; 2026-02-21T09:06:50.2279183Z st.shared.b32 [%r22+8], %r11255; 2026-02-21T09:06:50.2279250Z st.shared.b32 [%r23], %r11256; 2026-02-21T09:06:50.2279315Z st.shared.b32 [%r23+8], %r11257; 2026-02-21T09:06:50.2279383Z st.shared.b32 [%r24], %r11258; 2026-02-21T09:06:50.2279446Z st.shared.b32 [%r24+8], %r11259; 2026-02-21T09:06:50.2279509Z st.shared.b32 [%r25], %r11260; 2026-02-21T09:06:50.2279573Z st.shared.b32 [%r25+8], %r11261; 2026-02-21T09:06:50.2279630Z $L__tmp21: 2026-02-21T09:06:50.2279900Z .loc 2 291 36 // standard.py:291:36 @[ cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:91:40 ] 2026-02-21T09:06:50.2279959Z // begin inline asm 2026-02-21T09:06:50.2280039Z fence.proxy.async.shared::cta; 2026-02-21T09:06:50.2280096Z // end inline asm 2026-02-21T09:06:50.2280150Z bar.sync 0; 2026-02-21T09:06:50.2280233Z wgmma.fence.sync.aligned; 2026-02-21T09:06:50.2280297Z // begin inline asm 2026-02-21T09:06:50.2282969Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r15041,%r15042,%r15043,%r15044,%r15045,%r15046,%r15047,%r15048,%r15049,%r15050,%r15051,%r15052,%r15053,%r15054,%r15055,%r15056,%r15057,%r15058,%r15059,%r15060,%r15061,%r15062,%r15063,%r15064,%r15065,%r15066,%r15067,%r15068,%r15069,%r15070,%r15071,%r15072,%r15073,%r15074,%r15075,%r15076,%r15077,%r15078,%r15079,%r15080,%r15081,%r15082,%r15083,%r15084,%r15085,%r15086,%r15087,%r15088,%r15089,%r15090,%r15091,%r15092,%r15093,%r15094,%r15095,%r15096,%r15097,%r15098,%r15099,%r15100,%r15101,%r15102,%r15103,%r15104,%r15105,%r15106,%r15107,%r15108,%r15109,%r15110,%r15111,%r15112,%r15113,%r15114,%r15115,%r15116,%r15117,%r15118,%r15119,%r15120,%r15121,%r15122,%r15123,%r15124,%r15125,%r15126,%r15127,%r15128,%r15129,%r15130,%r15131,%r15132,%r15133,%r15134,%r15135,%r15136,%r15137,%r15138,%r15139,%r15140,%r15141,%r15142,%r15143,%r15144,%r15145,%r15146,%r15147,%r15148,%r15149,%r15150,%r15151,%r15152,%r15153,%r15154,%r15155,%r15156,%r15157,%r15158,%r15159,%r15160,%r15161,%r15162,%r15163,%r15164,%r15165,%r15166,%r15167,%r15168}, {%r9880,%r9881,%r9882,%r9883}, %rd93, %p46, 1, 1; 2026-02-21T09:06:50.2283035Z // end inline asm 2026-02-21T09:06:50.2283095Z // begin inline asm 2026-02-21T09:06:50.2285818Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r15041,%r15042,%r15043,%r15044,%r15045,%r15046,%r15047,%r15048,%r15049,%r15050,%r15051,%r15052,%r15053,%r15054,%r15055,%r15056,%r15057,%r15058,%r15059,%r15060,%r15061,%r15062,%r15063,%r15064,%r15065,%r15066,%r15067,%r15068,%r15069,%r15070,%r15071,%r15072,%r15073,%r15074,%r15075,%r15076,%r15077,%r15078,%r15079,%r15080,%r15081,%r15082,%r15083,%r15084,%r15085,%r15086,%r15087,%r15088,%r15089,%r15090,%r15091,%r15092,%r15093,%r15094,%r15095,%r15096,%r15097,%r15098,%r15099,%r15100,%r15101,%r15102,%r15103,%r15104,%r15105,%r15106,%r15107,%r15108,%r15109,%r15110,%r15111,%r15112,%r15113,%r15114,%r15115,%r15116,%r15117,%r15118,%r15119,%r15120,%r15121,%r15122,%r15123,%r15124,%r15125,%r15126,%r15127,%r15128,%r15129,%r15130,%r15131,%r15132,%r15133,%r15134,%r15135,%r15136,%r15137,%r15138,%r15139,%r15140,%r15141,%r15142,%r15143,%r15144,%r15145,%r15146,%r15147,%r15148,%r15149,%r15150,%r15151,%r15152,%r15153,%r15154,%r15155,%r15156,%r15157,%r15158,%r15159,%r15160,%r15161,%r15162,%r15163,%r15164,%r15165,%r15166,%r15167,%r15168}, {%r10140,%r10141,%r10142,%r10143}, %rd94, %p46, 1, 1; 2026-02-21T09:06:50.2285966Z // end inline asm 2026-02-21T09:06:50.2286044Z wgmma.commit_group.sync.aligned; 2026-02-21T09:06:50.2286107Z mov.b32 %r10272, %r12221; 2026-02-21T09:06:50.2286168Z mov.b32 %r10273, %r11061; 2026-02-21T09:06:50.2286227Z mov.b32 %r10274, %r11061; 2026-02-21T09:06:50.2286287Z // begin inline asm 2026-02-21T09:06:50.2288946Z // wait for regs: 
%r15041,%r15042,%r15043,%r15044,%r15045,%r15046,%r15047,%r15048,%r15049,%r15050,%r15051,%r15052,%r15053,%r15054,%r15055,%r15056,%r15057,%r15058,%r15059,%r15060,%r15061,%r15062,%r15063,%r15064,%r15065,%r15066,%r15067,%r15068,%r15069,%r15070,%r15071,%r15072,%r15073,%r15074,%r15075,%r15076,%r15077,%r15078,%r15079,%r15080,%r15081,%r15082,%r15083,%r15084,%r15085,%r15086,%r15087,%r15088,%r15089,%r15090,%r15091,%r15092,%r15093,%r15094,%r15095,%r15096,%r15097,%r15098,%r15099,%r15100,%r15101,%r15102,%r15103,%r15104,%r15105,%r15106,%r15107,%r15108,%r15109,%r15110,%r15111,%r15112,%r15113,%r15114,%r15115,%r15116,%r15117,%r15118,%r15119,%r15120,%r15121,%r15122,%r15123,%r15124,%r15125,%r15126,%r15127,%r15128,%r15129,%r15130,%r15131,%r15132,%r15133,%r15134,%r15135,%r15136,%r15137,%r15138,%r15139,%r15140,%r15141,%r15142,%r15143,%r15144,%r15145,%r15146,%r15147,%r15148,%r15149,%r15150,%r15151,%r15152,%r15153,%r15154,%r15155,%r15156,%r15157,%r15158,%r15159,%r15160,%r15161,%r15162,%r15163,%r15164,%r15165,%r15166,%r15167,%r15168,%r10272,%r10273,%r10274 2026-02-21T09:06:50.2289037Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:06:50.2289095Z // end inline asm 2026-02-21T09:06:50.2289148Z $L__tmp22: 2026-02-21T09:06:50.2289352Z .loc 1 55 80 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:80 2026-02-21T09:06:50.2289414Z // begin inline asm 2026-02-21T09:06:50.2289475Z mov.u32 %r10406, 0x0; 2026-02-21T09:06:50.2289534Z mov.u32 %r10407, 0x0; 2026-02-21T09:06:50.2289591Z mov.u32 %r10408, 0x0; 2026-02-21T09:06:50.2289662Z mov.u32 %r10409, 0x0; 2026-02-21T09:06:50.2289803Z ld.global.v4.b32 { %r10406, %r10407, %r10408, %r10409 }, [ %rd163 + 0 ]; 2026-02-21T09:06:50.2289862Z // end inline asm 2026-02-21T09:06:50.2290059Z .loc 1 59 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:59:32 2026-02-21T09:06:50.2290115Z bar.sync 0; 2026-02-21T09:06:50.2290200Z st.shared.v2.b32 [%r13], {%r10406, %r10407}; 2026-02-21T09:06:50.2290279Z st.shared.v2.b32 [%r14], {%r10408, 
%r10409}; 2026-02-21T09:06:50.2290333Z bar.sync 0; 2026-02-21T09:06:50.2290398Z ld.shared.b16 %rs529, [%r42]; 2026-02-21T09:06:50.2290469Z ld.shared.b16 %rs530, [%r42+256]; 2026-02-21T09:06:50.2290535Z ld.shared.b16 %rs531, [%r42+16]; 2026-02-21T09:06:50.2290599Z ld.shared.b16 %rs532, [%r42+272]; 2026-02-21T09:06:50.2290668Z ld.shared.b16 %rs533, [%r43]; 2026-02-21T09:06:50.2290733Z ld.shared.b16 %rs534, [%r43+256]; 2026-02-21T09:06:50.2290798Z ld.shared.b16 %rs535, [%r43+16]; 2026-02-21T09:06:50.2290864Z ld.shared.b16 %rs536, [%r43+272]; 2026-02-21T09:06:50.2291019Z cvt.f32.bf16 %r10667, %rs529; 2026-02-21T09:06:50.2291080Z cvt.f32.bf16 %r10668, %rs530; 2026-02-21T09:06:50.2291140Z cvt.f32.bf16 %r10669, %rs533; 2026-02-21T09:06:50.2291203Z cvt.f32.bf16 %r10670, %rs534; 2026-02-21T09:06:50.2291262Z cvt.f32.bf16 %r10927, %rs531; 2026-02-21T09:06:50.2291325Z cvt.f32.bf16 %r10928, %rs532; 2026-02-21T09:06:50.2291389Z cvt.f32.bf16 %r10929, %rs535; 2026-02-21T09:06:50.2291448Z cvt.f32.bf16 %r10930, %rs536; 2026-02-21T09:06:50.2291508Z add.s32 %r11262, %r11214, 56; 2026-02-21T09:06:50.2291703Z .loc 1 61 62 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:62 2026-02-21T09:06:50.2291767Z or.b32 %r11263, %r8, %r11262; 2026-02-21T09:06:50.2291959Z shl.b32 %r11264, %r11263, 13; 2026-02-21T09:06:50.2292025Z add.s32 %r11265, %r11264, %r575; 2026-02-21T09:06:50.2292224Z .loc 1 61 34 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:34 2026-02-21T09:06:50.2292292Z cvt.s64.s32 %rd110, %r11265; 2026-02-21T09:06:50.2292357Z add.s64 %rd104, %rd23, %rd110; 2026-02-21T09:06:50.2292550Z .loc 1 61 87 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:87 2026-02-21T09:06:50.2292621Z // begin inline asm 2026-02-21T09:06:50.2292682Z mov.u32 %r10410, 0x0; 2026-02-21T09:06:50.2292758Z ld.global.b32 { %r10410 }, [ %rd104 + 0 ]; 2026-02-21T09:06:50.2292817Z // end inline asm 2026-02-21T09:06:50.2293009Z .loc 1 69 28 // 
cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:69:28 2026-02-21T09:06:50.2293065Z bar.sync 0; 2026-02-21T09:06:50.2293130Z st.shared.b8 [%r17], %r10410; 2026-02-21T09:06:50.2293199Z prmt.b32 %r11266, %r10410, 0, 0x7771U; 2026-02-21T09:06:50.2293267Z st.shared.b8 [%r18+512], %r11266; 2026-02-21T09:06:50.2293390Z prmt.b32 %r11267, %r10410, 0, 0x7772U; 2026-02-21T09:06:50.2293458Z st.shared.b8 [%r19+1024], %r11267; 2026-02-21T09:06:50.2293526Z prmt.b32 %r11268, %r10410, 0, 0x7773U; 2026-02-21T09:06:50.2293592Z st.shared.b8 [%r20+1536], %r11268; 2026-02-21T09:06:50.2293651Z bar.sync 0; 2026-02-21T09:06:50.2293715Z ld.shared.b32 %r11269, [%r21]; 2026-02-21T09:06:50.2293780Z prmt.b32 %r11270, %r11269, 0, 0x7770U; 2026-02-21T09:06:50.2293845Z cvt.u16.u32 %rs537, %r11270; 2026-02-21T09:06:50.2293911Z prmt.b32 %r11271, %r11269, 0, 0x7771U; 2026-02-21T09:06:50.2293973Z cvt.u16.u32 %rs538, %r11271; 2026-02-21T09:06:50.2294036Z prmt.b32 %r11272, %r11269, 0, 0x7772U; 2026-02-21T09:06:50.2294099Z cvt.u16.u32 %rs539, %r11272; 2026-02-21T09:06:50.2294163Z prmt.b32 %r11273, %r11269, 0, 0x7773U; 2026-02-21T09:06:50.2294225Z cvt.u16.u32 %rs540, %r11273; 2026-02-21T09:06:50.2294292Z ld.shared.b32 %r11274, [%r21+128]; 2026-02-21T09:06:50.2294361Z prmt.b32 %r11275, %r11274, 0, 0x7770U; 2026-02-21T09:06:50.2294422Z cvt.u16.u32 %rs541, %r11275; 2026-02-21T09:06:50.2294489Z prmt.b32 %r11276, %r11274, 0, 0x7771U; 2026-02-21T09:06:50.2294554Z cvt.u16.u32 %rs542, %r11276; 2026-02-21T09:06:50.2294621Z prmt.b32 %r11277, %r11274, 0, 0x7772U; 2026-02-21T09:06:50.2294693Z cvt.u16.u32 %rs543, %r11277; 2026-02-21T09:06:50.2294768Z prmt.b32 %r11278, %r11274, 0, 0x7773U; 2026-02-21T09:06:50.2294830Z cvt.u16.u32 %rs544, %r11278; 2026-02-21T09:06:50.2295026Z .loc 1 64 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:64:28 2026-02-21T09:06:50.2295094Z shl.b16 %rs545, %rs537, 4; 2026-02-21T09:06:50.2295157Z shl.b16 %rs546, %rs538, 4; 2026-02-21T09:06:50.2295217Z shl.b16 %rs547, 
%rs539, 4; 2026-02-21T09:06:50.2295277Z shl.b16 %rs548, %rs540, 4; 2026-02-21T09:06:50.2295341Z shl.b16 %rs549, %rs541, 4; 2026-02-21T09:06:50.2295402Z shl.b16 %rs550, %rs542, 4; 2026-02-21T09:06:50.2295465Z shl.b16 %rs551, %rs543, 4; 2026-02-21T09:06:50.2295536Z shl.b16 %rs552, %rs544, 4; 2026-02-21T09:06:50.2295729Z .loc 1 79 58 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:79:58 2026-02-21T09:06:50.2295874Z selp.b16 %rs553, %rs545, %rs537, %p81; 2026-02-21T09:06:50.2295941Z cvt.s16.s8 %rs554, %rs553; 2026-02-21T09:06:50.2296001Z shr.s16 %rs555, %rs554, 4; 2026-02-21T09:06:50.2296069Z selp.b16 %rs556, %rs546, %rs538, %p81; 2026-02-21T09:06:50.2296130Z cvt.s16.s8 %rs557, %rs556; 2026-02-21T09:06:50.2296194Z shr.s16 %rs558, %rs557, 4; 2026-02-21T09:06:50.2296262Z selp.b16 %rs559, %rs547, %rs539, %p81; 2026-02-21T09:06:50.2296323Z cvt.s16.s8 %rs560, %rs559; 2026-02-21T09:06:50.2296386Z shr.s16 %rs561, %rs560, 4; 2026-02-21T09:06:50.2296563Z selp.b16 %rs562, %rs548, %rs540, %p81; 2026-02-21T09:06:50.2296627Z cvt.s16.s8 %rs563, %rs562; 2026-02-21T09:06:50.2296690Z shr.s16 %rs564, %rs563, 4; 2026-02-21T09:06:50.2296763Z selp.b16 %rs565, %rs549, %rs541, %p81; 2026-02-21T09:06:50.2296973Z cvt.s16.s8 %rs566, %rs565; 2026-02-21T09:06:50.2297038Z shr.s16 %rs567, %rs566, 4; 2026-02-21T09:06:50.2297111Z selp.b16 %rs568, %rs550, %rs542, %p81; 2026-02-21T09:06:50.2297174Z cvt.s16.s8 %rs569, %rs568; 2026-02-21T09:06:50.2297235Z shr.s16 %rs570, %rs569, 4; 2026-02-21T09:06:50.2297304Z selp.b16 %rs571, %rs551, %rs543, %p81; 2026-02-21T09:06:50.2297368Z cvt.s16.s8 %rs572, %rs571; 2026-02-21T09:06:50.2297429Z shr.s16 %rs573, %rs572, 4; 2026-02-21T09:06:50.2297503Z selp.b16 %rs574, %rs552, %rs544, %p81; 2026-02-21T09:06:50.2297565Z cvt.s16.s8 %rs575, %rs574; 2026-02-21T09:06:50.2297625Z shr.s16 %rs576, %rs575, 4; 2026-02-21T09:06:50.2297818Z .loc 1 84 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:84:32 2026-02-21T09:06:50.2297887Z cvt.rn.f32.s16 %r11279, 
%rs555; 2026-02-21T09:06:50.2297953Z cvt.rn.f32.s16 %r11280, %rs558; 2026-02-21T09:06:50.2298016Z cvt.rn.f32.s16 %r11281, %rs561; 2026-02-21T09:06:50.2298082Z cvt.rn.f32.s16 %r11282, %rs564; 2026-02-21T09:06:50.2298213Z cvt.rn.f32.s16 %r11283, %rs567; 2026-02-21T09:06:50.2298281Z cvt.rn.f32.s16 %r11284, %rs570; 2026-02-21T09:06:50.2298342Z cvt.rn.f32.s16 %r11285, %rs573; 2026-02-21T09:06:50.2298411Z cvt.rn.f32.s16 %r11286, %rs576; 2026-02-21T09:06:50.2298466Z bar.sync 0; 2026-02-21T09:06:50.2298530Z st.shared.b32 [%r22], %r11279; 2026-02-21T09:06:50.2298596Z st.shared.b32 [%r22+8], %r11280; 2026-02-21T09:06:50.2298677Z st.shared.b32 [%r23], %r11281; 2026-02-21T09:06:50.2298747Z st.shared.b32 [%r23+8], %r11282; 2026-02-21T09:06:50.2298810Z st.shared.b32 [%r24], %r11283; 2026-02-21T09:06:50.2298875Z st.shared.b32 [%r24+8], %r11284; 2026-02-21T09:06:50.2298938Z st.shared.b32 [%r25], %r11285; 2026-02-21T09:06:50.2299001Z st.shared.b32 [%r25+8], %r11286; 2026-02-21T09:06:50.2299055Z $L__tmp23: 2026-02-21T09:06:50.2299327Z .loc 2 291 36 // standard.py:291:36 @[ cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:91:40 ] 2026-02-21T09:06:50.2299392Z // begin inline asm 2026-02-21T09:06:50.2299471Z fence.proxy.async.shared::cta; 2026-02-21T09:06:50.2299534Z // end inline asm 2026-02-21T09:06:50.2299588Z bar.sync 0; 2026-02-21T09:06:50.2299662Z wgmma.fence.sync.aligned; 2026-02-21T09:06:50.2299724Z // begin inline asm 2026-02-21T09:06:50.2302446Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r15041,%r15042,%r15043,%r15044,%r15045,%r15046,%r15047,%r15048,%r15049,%r15050,%r15051,%r15052,%r15053,%r15054,%r15055,%r15056,%r15057,%r15058,%r15059,%r15060,%r15061,%r15062,%r15063,%r15064,%r15065,%r15066,%r15067,%r15068,%r15069,%r15070,%r15071,%r15072,%r15073,%r15074,%r15075,%r15076,%r15077,%r15078,%r15079,%r15080,%r15081,%r15082,%r15083,%r15084,%r15085,%r15086,%r15087,%r15088,%r15089,%r15090,%r15091,%r15092,%r15093,%r15094,%r15095,%r15096,%r15097,%r15098,%r15099,%r15100,%r15101,%r15102,%r15103,%r15104,%r15105,%r15106,%r15107,%r15108,%r15109,%r15110,%r15111,%r15112,%r15113,%r15114,%r15115,%r15116,%r15117,%r15118,%r15119,%r15120,%r15121,%r15122,%r15123,%r15124,%r15125,%r15126,%r15127,%r15128,%r15129,%r15130,%r15131,%r15132,%r15133,%r15134,%r15135,%r15136,%r15137,%r15138,%r15139,%r15140,%r15141,%r15142,%r15143,%r15144,%r15145,%r15146,%r15147,%r15148,%r15149,%r15150,%r15151,%r15152,%r15153,%r15154,%r15155,%r15156,%r15157,%r15158,%r15159,%r15160,%r15161,%r15162,%r15163,%r15164,%r15165,%r15166,%r15167,%r15168}, {%r10667,%r10668,%r10669,%r10670}, %rd93, %p46, 1, 1; 2026-02-21T09:06:50.2302592Z // end inline asm 2026-02-21T09:06:50.2302652Z // begin inline asm 2026-02-21T09:06:50.2305429Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r15041,%r15042,%r15043,%r15044,%r15045,%r15046,%r15047,%r15048,%r15049,%r15050,%r15051,%r15052,%r15053,%r15054,%r15055,%r15056,%r15057,%r15058,%r15059,%r15060,%r15061,%r15062,%r15063,%r15064,%r15065,%r15066,%r15067,%r15068,%r15069,%r15070,%r15071,%r15072,%r15073,%r15074,%r15075,%r15076,%r15077,%r15078,%r15079,%r15080,%r15081,%r15082,%r15083,%r15084,%r15085,%r15086,%r15087,%r15088,%r15089,%r15090,%r15091,%r15092,%r15093,%r15094,%r15095,%r15096,%r15097,%r15098,%r15099,%r15100,%r15101,%r15102,%r15103,%r15104,%r15105,%r15106,%r15107,%r15108,%r15109,%r15110,%r15111,%r15112,%r15113,%r15114,%r15115,%r15116,%r15117,%r15118,%r15119,%r15120,%r15121,%r15122,%r15123,%r15124,%r15125,%r15126,%r15127,%r15128,%r15129,%r15130,%r15131,%r15132,%r15133,%r15134,%r15135,%r15136,%r15137,%r15138,%r15139,%r15140,%r15141,%r15142,%r15143,%r15144,%r15145,%r15146,%r15147,%r15148,%r15149,%r15150,%r15151,%r15152,%r15153,%r15154,%r15155,%r15156,%r15157,%r15158,%r15159,%r15160,%r15161,%r15162,%r15163,%r15164,%r15165,%r15166,%r15167,%r15168}, {%r10927,%r10928,%r10929,%r10930}, %rd94, %p46, 1, 1; 2026-02-21T09:06:50.2305540Z // end inline asm 2026-02-21T09:06:50.2305615Z wgmma.commit_group.sync.aligned; 2026-02-21T09:06:50.2305681Z mov.b32 %r11059, %r12221; 2026-02-21T09:06:50.2305742Z mov.b32 %r11060, %r11061; 2026-02-21T09:06:50.2305801Z // begin inline asm 2026-02-21T09:06:50.2308569Z // wait for regs: 
%r15041,%r15042,%r15043,%r15044,%r15045,%r15046,%r15047,%r15048,%r15049,%r15050,%r15051,%r15052,%r15053,%r15054,%r15055,%r15056,%r15057,%r15058,%r15059,%r15060,%r15061,%r15062,%r15063,%r15064,%r15065,%r15066,%r15067,%r15068,%r15069,%r15070,%r15071,%r15072,%r15073,%r15074,%r15075,%r15076,%r15077,%r15078,%r15079,%r15080,%r15081,%r15082,%r15083,%r15084,%r15085,%r15086,%r15087,%r15088,%r15089,%r15090,%r15091,%r15092,%r15093,%r15094,%r15095,%r15096,%r15097,%r15098,%r15099,%r15100,%r15101,%r15102,%r15103,%r15104,%r15105,%r15106,%r15107,%r15108,%r15109,%r15110,%r15111,%r15112,%r15113,%r15114,%r15115,%r15116,%r15117,%r15118,%r15119,%r15120,%r15121,%r15122,%r15123,%r15124,%r15125,%r15126,%r15127,%r15128,%r15129,%r15130,%r15131,%r15132,%r15133,%r15134,%r15135,%r15136,%r15137,%r15138,%r15139,%r15140,%r15141,%r15142,%r15143,%r15144,%r15145,%r15146,%r15147,%r15148,%r15149,%r15150,%r15151,%r15152,%r15153,%r15154,%r15155,%r15156,%r15157,%r15158,%r15159,%r15160,%r15161,%r15162,%r15163,%r15164,%r15165,%r15166,%r15167,%r15168,%r11059,%r11060,%r11061 2026-02-21T09:06:50.2308664Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:06:50.2308724Z // end inline asm 2026-02-21T09:06:50.2308778Z $L__tmp24: 2026-02-21T09:06:50.2308996Z .loc 1 47 124 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:47:124 2026-02-21T09:06:50.2309064Z add.s64 %rd164, %rd164, 32; 2026-02-21T09:06:50.2309132Z add.s64 %rd163, %rd163, 128; 2026-02-21T09:06:50.2309194Z add.s32 %r15040, %r15040, 262144; 2026-02-21T09:06:50.2309260Z setp.lt.u64 %p55, %rd164, 480; 2026-02-21T09:06:50.2309323Z @%p55 bra $L__BB0_7; 2026-02-21T09:06:50.2309435Z // %bb.8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:06:50.2309638Z .loc 1 94 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:94:28 2026-02-21T09:06:50.2309729Z cvt.rn.bf16x2.f32 %r11290, %r15042, %r15041; 2026-02-21T09:06:50.2309821Z cvt.rn.bf16x2.f32 %r11291, %r15044, %r15043; 2026-02-21T09:06:50.2309905Z cvt.rn.bf16x2.f32 %r11292, %r15046, %r15045; 
2026-02-21T09:06:50.2309983Z cvt.rn.bf16x2.f32 %r11293, %r15048, %r15047; 2026-02-21T09:06:50.2310063Z cvt.rn.bf16x2.f32 %r11294, %r15050, %r15049; 2026-02-21T09:06:50.2310215Z cvt.rn.bf16x2.f32 %r11295, %r15052, %r15051; 2026-02-21T09:06:50.2310293Z cvt.rn.bf16x2.f32 %r11296, %r15054, %r15053; 2026-02-21T09:06:50.2310374Z cvt.rn.bf16x2.f32 %r11297, %r15056, %r15055; 2026-02-21T09:06:50.2310448Z cvt.rn.bf16x2.f32 %r11298, %r15058, %r15057; 2026-02-21T09:06:50.2310523Z cvt.rn.bf16x2.f32 %r11299, %r15060, %r15059; 2026-02-21T09:06:50.2310602Z cvt.rn.bf16x2.f32 %r11300, %r15062, %r15061; 2026-02-21T09:06:50.2310678Z cvt.rn.bf16x2.f32 %r11301, %r15064, %r15063; 2026-02-21T09:06:50.2310754Z cvt.rn.bf16x2.f32 %r11302, %r15066, %r15065; 2026-02-21T09:06:50.2310829Z cvt.rn.bf16x2.f32 %r11303, %r15068, %r15067; 2026-02-21T09:06:50.2310908Z cvt.rn.bf16x2.f32 %r11304, %r15070, %r15069; 2026-02-21T09:06:50.2310983Z cvt.rn.bf16x2.f32 %r11305, %r15072, %r15071; 2026-02-21T09:06:50.2311186Z cvt.rn.bf16x2.f32 %r11306, %r15074, %r15073; 2026-02-21T09:06:50.2311268Z cvt.rn.bf16x2.f32 %r11307, %r15076, %r15075; 2026-02-21T09:06:50.2311344Z cvt.rn.bf16x2.f32 %r11308, %r15078, %r15077; 2026-02-21T09:06:50.2311421Z cvt.rn.bf16x2.f32 %r11309, %r15080, %r15079; 2026-02-21T09:06:50.2311499Z cvt.rn.bf16x2.f32 %r11310, %r15082, %r15081; 2026-02-21T09:06:50.2311576Z cvt.rn.bf16x2.f32 %r11311, %r15084, %r15083; 2026-02-21T09:06:50.2311652Z cvt.rn.bf16x2.f32 %r11312, %r15086, %r15085; 2026-02-21T09:06:50.2311725Z cvt.rn.bf16x2.f32 %r11313, %r15088, %r15087; 2026-02-21T09:06:50.2311804Z cvt.rn.bf16x2.f32 %r11314, %r15090, %r15089; 2026-02-21T09:06:50.2311880Z cvt.rn.bf16x2.f32 %r11315, %r15092, %r15091; 2026-02-21T09:06:50.2311954Z cvt.rn.bf16x2.f32 %r11316, %r15094, %r15093; 2026-02-21T09:06:50.2312031Z cvt.rn.bf16x2.f32 %r11317, %r15096, %r15095; 2026-02-21T09:06:50.2312105Z cvt.rn.bf16x2.f32 %r11318, %r15098, %r15097; 2026-02-21T09:06:50.2312184Z cvt.rn.bf16x2.f32 %r11319, %r15100, %r15099; 
2026-02-21T09:06:50.2312311Z cvt.rn.bf16x2.f32 %r11320, %r15102, %r15101; 2026-02-21T09:06:50.2312390Z cvt.rn.bf16x2.f32 %r11321, %r15104, %r15103; 2026-02-21T09:06:50.2312467Z cvt.rn.bf16x2.f32 %r11322, %r15106, %r15105; 2026-02-21T09:06:50.2312542Z cvt.rn.bf16x2.f32 %r11323, %r15108, %r15107; 2026-02-21T09:06:50.2312620Z cvt.rn.bf16x2.f32 %r11324, %r15110, %r15109; 2026-02-21T09:06:50.2312708Z cvt.rn.bf16x2.f32 %r11325, %r15112, %r15111; 2026-02-21T09:06:50.2312786Z cvt.rn.bf16x2.f32 %r11326, %r15114, %r15113; 2026-02-21T09:06:50.2312864Z cvt.rn.bf16x2.f32 %r11327, %r15116, %r15115; 2026-02-21T09:06:50.2312941Z cvt.rn.bf16x2.f32 %r11328, %r15118, %r15117; 2026-02-21T09:06:50.2313015Z cvt.rn.bf16x2.f32 %r11329, %r15120, %r15119; 2026-02-21T09:06:50.2313097Z cvt.rn.bf16x2.f32 %r11330, %r15122, %r15121; 2026-02-21T09:06:50.2313174Z cvt.rn.bf16x2.f32 %r11331, %r15124, %r15123; 2026-02-21T09:06:50.2313250Z cvt.rn.bf16x2.f32 %r11332, %r15126, %r15125; 2026-02-21T09:06:50.2313330Z cvt.rn.bf16x2.f32 %r11333, %r15128, %r15127; 2026-02-21T09:06:50.2313409Z cvt.rn.bf16x2.f32 %r11334, %r15130, %r15129; 2026-02-21T09:06:50.2313485Z cvt.rn.bf16x2.f32 %r11335, %r15132, %r15131; 2026-02-21T09:06:50.2313562Z cvt.rn.bf16x2.f32 %r11336, %r15134, %r15133; 2026-02-21T09:06:50.2313641Z cvt.rn.bf16x2.f32 %r11337, %r15136, %r15135; 2026-02-21T09:06:50.2313715Z cvt.rn.bf16x2.f32 %r11338, %r15138, %r15137; 2026-02-21T09:06:50.2313788Z cvt.rn.bf16x2.f32 %r11339, %r15140, %r15139; 2026-02-21T09:06:50.2313864Z cvt.rn.bf16x2.f32 %r11340, %r15142, %r15141; 2026-02-21T09:06:50.2313942Z cvt.rn.bf16x2.f32 %r11341, %r15144, %r15143; 2026-02-21T09:06:50.2314343Z [355s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:06:50.2315616Z Config: @helion.kernel(config=helion.Config(block_sizes=[8, 256, 256], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=8, num_stages=1, num_warps=16, pid_type='persistent_interleaved', range_flattens=[True, False], range_multi_buffers=[True, False], range_num_stages=[3, 1], range_unroll_factors=[3, 4], range_warp_specializes=[]), static_shapes=True) 2026-02-21T09:06:50.2315828Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:06:50.2315888Z `ptxas` stderr: 2026-02-21T09:06:50.2316350Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 541 in function _helion_matmul_bf16_int4. Try to compile with register target of 158 or higher. 2026-02-21T09:06:50.2316578Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:06:50.2316587Z 2026-02-21T09:06:50.2317097Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmptkxx4tst.ptx -o /tmp/tmptkxx4tst.ptx.o 2026-02-21T09:06:50.2317103Z 2026-02-21T09:06:50.2317413Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:06:50.2317506Z cvt.rn.bf16x2.f32 %r11342, %r15146, %r15145; 2026-02-21T09:06:50.2317589Z cvt.rn.bf16x2.f32 %r11343, %r15148, %r15147; 2026-02-21T09:06:50.2317673Z cvt.rn.bf16x2.f32 %r11344, %r15150, %r15149; 2026-02-21T09:06:50.2317750Z cvt.rn.bf16x2.f32 %r11345, %r15152, %r15151; 2026-02-21T09:06:50.2317827Z cvt.rn.bf16x2.f32 %r11346, %r15154, %r15153; 2026-02-21T09:06:50.2317907Z cvt.rn.bf16x2.f32 %r11347, %r15156, %r15155; 2026-02-21T09:06:50.2317985Z cvt.rn.bf16x2.f32 %r11348, %r15158, %r15157; 2026-02-21T09:06:50.2318062Z cvt.rn.bf16x2.f32 %r11349, %r15160, %r15159; 2026-02-21T09:06:50.2318136Z cvt.rn.bf16x2.f32 %r11350, %r15162, %r15161; 2026-02-21T09:06:50.2318216Z cvt.rn.bf16x2.f32 %r11351, %r15164, %r15163; 2026-02-21T09:06:50.2318293Z cvt.rn.bf16x2.f32 %r11352, %r15166, %r15165; 2026-02-21T09:06:50.2318368Z cvt.rn.bf16x2.f32 %r11353, %r15168, %r15167; 2026-02-21T09:06:50.2318658Z .loc 1 95 43 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:95:43 2026-02-21T09:06:50.2318721Z bar.sync 0; 2026-02-21T09:06:50.2318922Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r26], {%r11290, %r11291, %r11292, %r11293}; 2026-02-21T09:06:50.2319114Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r27], {%r11306, %r11307, %r11308, %r11309}; 2026-02-21T09:06:50.2319299Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r28], {%r11322, %r11323, %r11324, %r11325}; 2026-02-21T09:06:50.2319482Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r29], {%r11338, %r11339, %r11340, %r11341}; 2026-02-21T09:06:50.2319662Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r30], {%r11294, %r11295, %r11296, %r11297}; 2026-02-21T09:06:50.2319845Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r31], {%r11310, %r11311, %r11312, %r11313}; 2026-02-21T09:06:50.2320025Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r32], {%r11326, %r11327, %r11328, %r11329}; 2026-02-21T09:06:50.2320206Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r33], {%r11342, %r11343, %r11344, %r11345}; 2026-02-21T09:06:50.2320393Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r34], {%r11298, %r11299, %r11300, %r11301}; 2026-02-21T09:06:50.2320570Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r35], {%r11314, %r11315, %r11316, %r11317}; 2026-02-21T09:06:50.2320753Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r36], {%r11330, %r11331, %r11332, %r11333}; 2026-02-21T09:06:50.2320935Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r37], {%r11346, %r11347, %r11348, %r11349}; 2026-02-21T09:06:50.2321113Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r38], {%r11302, %r11303, %r11304, %r11305}; 2026-02-21T09:06:50.2321291Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r39], {%r11318, %r11319, %r11320, %r11321}; 2026-02-21T09:06:50.2321473Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r40], {%r11334, %r11335, %r11336, %r11337}; 2026-02-21T09:06:50.2321654Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r41], {%r11350, %r11351, %r11352, %r11353}; 2026-02-21T09:06:50.2321717Z // begin inline asm 2026-02-21T09:06:50.2321804Z fence.proxy.async.shared::cta; 2026-02-21T09:06:50.2321866Z // end inline asm 2026-02-21T09:06:50.2321921Z bar.sync 0; 2026-02-21T09:06:50.2321991Z elect.sync %r11354|%p58, -1; 2026-02-21T09:06:50.2322149Z and.pred %p56, %p82, %p58; 2026-02-21T09:06:50.2322213Z or.b32 %r11287, %r574, %r310; 2026-02-21T09:06:50.2322274Z // begin inline asm 2026-02-21T09:06:50.2322507Z @%p56 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd158, {%r11287, %r573}], [%r7962]; 2026-02-21T09:06:50.2322566Z // end inline asm 2026-02-21T09:06:50.2322639Z cp.async.bulk.commit_group; 2026-02-21T09:06:50.2322719Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:06:50.2322774Z bar.sync 0; 2026-02-21T09:06:50.2322986Z .loc 1 26 144 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:26:144 2026-02-21T09:06:50.2323051Z add.s32 %r14781, %r14781, 3168; 2026-02-21T09:06:50.2323123Z setp.lt.s32 %p59, %r14781, %r3; 2026-02-21T09:06:50.2323237Z @%p59 bra $L__BB0_2; 2026-02-21T09:06:50.2323378Z $L__BB0_9: // %._crit_edge 
2026-02-21T09:06:50.2323447Z sub.s32 %r11355, 512, %r3; 2026-02-21T09:06:50.2323520Z mul.hi.s32 %r11356, %r11355, 1041204193; 2026-02-21T09:06:50.2323584Z shr.u32 %r11357, %r11356, 31; 2026-02-21T09:06:50.2323645Z shr.s32 %r11358, %r11356, 8; 2026-02-21T09:06:50.2323713Z add.s32 %r11359, %r11358, %r11357; 2026-02-21T09:06:50.2323777Z mul.lo.s32 %r11360, %r11359, 1056; 2026-02-21T09:06:50.2323849Z setp.ne.b32 %p60, %r11355, %r11360; 2026-02-21T09:06:50.2323922Z setp.gt.s32 %p61, %r11355, -1; 2026-02-21T09:06:50.2323989Z and.pred %p62, %p61, %p60; 2026-02-21T09:06:50.2324052Z selp.b32 %r11361, 1, 0, %p62; 2026-02-21T09:06:50.2324119Z add.s32 %r836, %r11359, %r11361; 2026-02-21T09:06:50.2324186Z setp.lt.s32 %p63, %r836, 1; 2026-02-21T09:06:50.2324251Z @%p63 bra $L__BB0_16; 2026-02-21T09:06:50.2324339Z // %bb.10: // %.lr.ph192 2026-02-21T09:06:50.2324409Z shl.b32 %r15169, %r836, 4; 2026-02-21T09:06:50.2324526Z add.s32 %r15315, %r3, -1056; 2026-02-21T09:06:50.2324594Z and.b32 %r11366, %r14764, 8048; 2026-02-21T09:06:50.2324661Z and.b32 %r11368, %r14765, 136; 2026-02-21T09:06:50.2324727Z or.b32 %r11369, %r11368, %r11366; 2026-02-21T09:06:50.2324791Z add.s32 %r839, %r12221, %r11369; 2026-02-21T09:06:50.2324853Z xor.b32 %r11371, %r11369, 8; 2026-02-21T09:06:50.2324918Z add.s32 %r840, %r12221, %r11371; 2026-02-21T09:06:50.2324979Z shl.b32 %r11373, %r14780, 4; 2026-02-21T09:06:50.2325039Z shl.b32 %r11376, %r14767, 1; 2026-02-21T09:06:50.2325103Z and.b32 %r11379, %r14769, 136; 2026-02-21T09:06:50.2325164Z or.b32 %r11380, %r11373, %r14766; 2026-02-21T09:06:50.2325224Z or.b32 %r11381, %r11380, %r11376; 2026-02-21T09:06:50.2325284Z or.b32 %r11382, %r11381, %r11379; 2026-02-21T09:06:50.2325354Z add.s32 %r841, %r12221, %r11382; 2026-02-21T09:06:50.2325414Z xor.b32 %r11383, %r11382, 8; 2026-02-21T09:06:50.2325487Z add.s32 %r842, %r12221, %r11383; 2026-02-21T09:06:50.2325557Z or.b32 %r11388, %r14771, %r14772; 2026-02-21T09:06:50.2325619Z or.b32 %r11389, %r11388, %r14773; 
2026-02-21T09:06:50.2325679Z or.b32 %r11390, %r11389, %r14770; 2026-02-21T09:06:50.2325742Z add.s32 %r843, %r12221, %r11390; 2026-02-21T09:06:50.2325805Z xor.b32 %r11391, %r11390, 32; 2026-02-21T09:06:50.2325868Z add.s32 %r844, %r12221, %r11391; 2026-02-21T09:06:50.2325929Z xor.b32 %r11392, %r11390, 64; 2026-02-21T09:06:50.2325993Z add.s32 %r845, %r12221, %r11392; 2026-02-21T09:06:50.2326053Z xor.b32 %r11393, %r11390, 96; 2026-02-21T09:06:50.2326113Z add.s32 %r846, %r12221, %r11393; 2026-02-21T09:06:50.2326182Z shl.b32 %r11394, %r14767, 9; 2026-02-21T09:06:50.2326241Z shl.b32 %r11395, %r14767, 5; 2026-02-21T09:06:50.2326302Z and.b32 %r11398, %r14775, 256; 2026-02-21T09:06:50.2326364Z xor.b32 %r11399, %r11395, %r14774; 2026-02-21T09:06:50.2326429Z add.s32 %r11400, %r12221, %r11394; 2026-02-21T09:06:50.2326611Z add.s32 %r11401, %r11400, %r11398; 2026-02-21T09:06:50.2326680Z add.s32 %r847, %r11401, %r11399; 2026-02-21T09:06:50.2326746Z and.b32 %r11403, %r14776, 16320; 2026-02-21T09:06:50.2326811Z or.b32 %r11406, %r11403, %r14778; 2026-02-21T09:06:50.2326871Z or.b32 %r11407, %r11406, %r14777; 2026-02-21T09:06:50.2327041Z add.s32 %r848, %r12221, %r11407; 2026-02-21T09:06:50.2327105Z xor.b32 %r11408, %r11407, 16; 2026-02-21T09:06:50.2327167Z add.s32 %r849, %r12221, %r11408; 2026-02-21T09:06:50.2327228Z xor.b32 %r11409, %r11407, 32; 2026-02-21T09:06:50.2327294Z add.s32 %r850, %r12221, %r11409; 2026-02-21T09:06:50.2327353Z xor.b32 %r11410, %r11407, 48; 2026-02-21T09:06:50.2327414Z add.s32 %r851, %r12221, %r11410; 2026-02-21T09:06:50.2327477Z bfe.u32 %r11411, %r12221, 4, 14; 2026-02-21T09:06:50.2327542Z cvt.u64.u32 %rd112, %r11411; 2026-02-21T09:06:50.2327626Z or.b64 %rd119, %rd112, -9223371899348713472; 2026-02-21T09:06:50.2327689Z add.s32 %r11412, %r12221, 32; 2026-02-21T09:06:50.2327754Z bfe.u32 %r11413, %r11412, 4, 14; 2026-02-21T09:06:50.2327944Z cvt.u64.u32 %rd113, %r11413; 2026-02-21T09:06:50.2328024Z or.b64 %rd120, %rd113, -9223371899348713472; 
2026-02-21T09:06:50.2328093Z and.b32 %r11415, %r14779, 1920; 2026-02-21T09:06:50.2328155Z shl.b32 %r11416, %r14780, 6; 2026-02-21T09:06:50.2328218Z and.b32 %r11417, %r14764, 112; 2026-02-21T09:06:50.2328281Z or.b32 %r11418, %r11415, %r11417; 2026-02-21T09:06:50.2328347Z xor.b32 %r11419, %r11418, %r14768; 2026-02-21T09:06:50.2328409Z or.b32 %r11420, %r11419, %r11416; 2026-02-21T09:06:50.2328484Z add.s32 %r852, %r12221, %r11420; 2026-02-21T09:06:50.2328551Z add.s32 %r853, %r852, 32768; 2026-02-21T09:06:50.2328613Z add.s32 %r854, %r852, 65536; 2026-02-21T09:06:50.2328673Z add.s32 %r855, %r852, 98304; 2026-02-21T09:06:50.2328736Z xor.b32 %r11421, %r11420, 32; 2026-02-21T09:06:50.2328796Z add.s32 %r856, %r12221, %r11421; 2026-02-21T09:06:50.2328857Z add.s32 %r857, %r856, 32768; 2026-02-21T09:06:50.2328918Z add.s32 %r858, %r856, 65536; 2026-02-21T09:06:50.2328984Z add.s32 %r859, %r856, 98304; 2026-02-21T09:06:50.2329111Z xor.b32 %r11422, %r11420, 64; 2026-02-21T09:06:50.2329177Z add.s32 %r860, %r12221, %r11422; 2026-02-21T09:06:50.2329238Z add.s32 %r861, %r860, 32768; 2026-02-21T09:06:50.2329300Z add.s32 %r862, %r860, 65536; 2026-02-21T09:06:50.2329360Z add.s32 %r863, %r860, 98304; 2026-02-21T09:06:50.2329419Z xor.b32 %r11423, %r11420, 96; 2026-02-21T09:06:50.2329483Z add.s32 %r864, %r12221, %r11423; 2026-02-21T09:06:50.2329542Z add.s32 %r865, %r864, 32768; 2026-02-21T09:06:50.2329601Z add.s32 %r866, %r864, 65536; 2026-02-21T09:06:50.2329663Z add.s32 %r867, %r864, 98304; 2026-02-21T09:06:50.2329723Z mov.b32 %r15307, -1; 2026-02-21T09:06:50.2329783Z mov.b32 %r15177, 0f00000000; 2026-02-21T09:06:50.2329841Z mov.b32 %r15311, 0; 2026-02-21T09:06:50.2329905Z mov.b32 %r15312, %r15311; 2026-02-21T09:06:50.2329965Z mov.b32 %r15313, %r15311; 2026-02-21T09:06:50.2330024Z mov.b32 %r15314, %r15311; 2026-02-21T09:06:50.2330087Z mov.b32 %r15310, %r15311; 2026-02-21T09:06:50.2330149Z mov.b32 %r15309, %r15311; 2026-02-21T09:06:50.2330208Z mov.b32 %r15308, %r15311; 
2026-02-21T09:06:50.2330268Z mov.b32 %r15178, %r15177; 2026-02-21T09:06:50.2330331Z mov.b32 %r15179, %r15177; 2026-02-21T09:06:50.2330392Z mov.b32 %r15180, %r15177; 2026-02-21T09:06:50.2330454Z mov.b32 %r15181, %r15177; 2026-02-21T09:06:50.2330516Z mov.b32 %r15182, %r15177; 2026-02-21T09:06:50.2330580Z mov.b32 %r15183, %r15177; 2026-02-21T09:06:50.2330638Z mov.b32 %r15184, %r15177; 2026-02-21T09:06:50.2330697Z mov.b32 %r15185, %r15177; 2026-02-21T09:06:50.2330770Z mov.b32 %r15186, %r15177; 2026-02-21T09:06:50.2330834Z mov.b32 %r15187, %r15177; 2026-02-21T09:06:50.2330894Z mov.b32 %r15188, %r15177; 2026-02-21T09:06:50.2330954Z mov.b32 %r15189, %r15177; 2026-02-21T09:06:50.2331014Z mov.b32 %r15190, %r15177; 2026-02-21T09:06:50.2331072Z mov.b32 %r15191, %r15177; 2026-02-21T09:06:50.2331133Z mov.b32 %r15192, %r15177; 2026-02-21T09:06:50.2331192Z mov.b32 %r15193, %r15177; 2026-02-21T09:06:50.2331254Z mov.b32 %r15194, %r15177; 2026-02-21T09:06:50.2331314Z mov.b32 %r15195, %r15177; 2026-02-21T09:06:50.2331376Z mov.b32 %r15196, %r15177; 2026-02-21T09:06:50.2331435Z mov.b32 %r15197, %r15177; 2026-02-21T09:06:50.2331555Z mov.b32 %r15198, %r15177; 2026-02-21T09:06:50.2331620Z mov.b32 %r15199, %r15177; 2026-02-21T09:06:50.2331678Z mov.b32 %r15200, %r15177; 2026-02-21T09:06:50.2331737Z mov.b32 %r15201, %r15177; 2026-02-21T09:06:50.2331795Z mov.b32 %r15202, %r15177; 2026-02-21T09:06:50.2331855Z mov.b32 %r15203, %r15177; 2026-02-21T09:06:50.2331914Z mov.b32 %r15204, %r15177; 2026-02-21T09:06:50.2331972Z mov.b32 %r15205, %r15177; 2026-02-21T09:06:50.2332032Z mov.b32 %r15206, %r15177; 2026-02-21T09:06:50.2332089Z mov.b32 %r15207, %r15177; 2026-02-21T09:06:50.2332147Z mov.b32 %r15208, %r15177; 2026-02-21T09:06:50.2332203Z mov.b32 %r15209, %r15177; 2026-02-21T09:06:50.2332265Z mov.b32 %r15210, %r15177; 2026-02-21T09:06:50.2332386Z mov.b32 %r15211, %r15177; 2026-02-21T09:06:50.2332494Z mov.b32 %r15212, %r15177; 2026-02-21T09:06:50.2332562Z mov.b32 %r15213, %r15177; 
2026-02-21T09:06:50.2332622Z mov.b32 %r15214, %r15177; 2026-02-21T09:06:50.2332681Z mov.b32 %r15215, %r15177; 2026-02-21T09:06:50.2332742Z mov.b32 %r15216, %r15177; 2026-02-21T09:06:50.2332804Z mov.b32 %r15217, %r15177; 2026-02-21T09:06:50.2332862Z mov.b32 %r15218, %r15177; 2026-02-21T09:06:50.2332920Z mov.b32 %r15219, %r15177; 2026-02-21T09:06:50.2332986Z mov.b32 %r15220, %r15177; 2026-02-21T09:06:50.2333052Z mov.b32 %r15221, %r15177; 2026-02-21T09:06:50.2333111Z mov.b32 %r15222, %r15177; 2026-02-21T09:06:50.2333169Z mov.b32 %r15223, %r15177; 2026-02-21T09:06:50.2333232Z mov.b32 %r15224, %r15177; 2026-02-21T09:06:50.2333289Z mov.b32 %r15225, %r15177; 2026-02-21T09:06:50.2333347Z mov.b32 %r15226, %r15177; 2026-02-21T09:06:50.2333409Z mov.b32 %r15227, %r15177; 2026-02-21T09:06:50.2333467Z mov.b32 %r15228, %r15177; 2026-02-21T09:06:50.2333528Z mov.b32 %r15229, %r15177; 2026-02-21T09:06:50.2333587Z mov.b32 %r15230, %r15177; 2026-02-21T09:06:50.2333706Z mov.b32 %r15231, %r15177; 2026-02-21T09:06:50.2333768Z mov.b32 %r15232, %r15177; 2026-02-21T09:06:50.2333830Z mov.b32 %r15233, %r15177; 2026-02-21T09:06:50.2333895Z mov.b32 %r15234, %r15177; 2026-02-21T09:06:50.2333953Z mov.b32 %r15235, %r15177; 2026-02-21T09:06:50.2334012Z mov.b32 %r15236, %r15177; 2026-02-21T09:06:50.2334074Z mov.b32 %r15237, %r15177; 2026-02-21T09:06:50.2334133Z mov.b32 %r15238, %r15177; 2026-02-21T09:06:50.2334190Z mov.b32 %r15239, %r15177; 2026-02-21T09:06:50.2334247Z mov.b32 %r15240, %r15177; 2026-02-21T09:06:50.2334307Z mov.b32 %r15241, %r15177; 2026-02-21T09:06:50.2334365Z mov.b32 %r15242, %r15177; 2026-02-21T09:06:50.2334422Z mov.b32 %r15243, %r15177; 2026-02-21T09:06:50.2334495Z mov.b32 %r15244, %r15177; 2026-02-21T09:06:50.2334555Z mov.b32 %r15245, %r15177; 2026-02-21T09:06:50.2334613Z mov.b32 %r15246, %r15177; 2026-02-21T09:06:50.2334673Z mov.b32 %r15247, %r15177; 2026-02-21T09:06:50.2334739Z mov.b32 %r15248, %r15177; 2026-02-21T09:06:50.2334798Z mov.b32 %r15249, %r15177; 
2026-02-21T09:06:50.2334856Z mov.b32 %r15250, %r15177; 2026-02-21T09:06:50.2334918Z mov.b32 %r15251, %r15177; 2026-02-21T09:06:50.2334979Z mov.b32 %r15252, %r15177; 2026-02-21T09:06:50.2335036Z mov.b32 %r15253, %r15177; 2026-02-21T09:06:50.2335094Z mov.b32 %r15254, %r15177; 2026-02-21T09:06:50.2335156Z mov.b32 %r15255, %r15177; 2026-02-21T09:06:50.2335215Z mov.b32 %r15256, %r15177; 2026-02-21T09:06:50.2335273Z mov.b32 %r15257, %r15177; 2026-02-21T09:06:50.2335335Z mov.b32 %r15258, %r15177; 2026-02-21T09:06:50.2335393Z mov.b32 %r15259, %r15177; 2026-02-21T09:06:50.2335450Z mov.b32 %r15260, %r15177; 2026-02-21T09:06:50.2335509Z mov.b32 %r15261, %r15177; 2026-02-21T09:06:50.2335572Z mov.b32 %r15262, %r15177; 2026-02-21T09:06:50.2335629Z mov.b32 %r15263, %r15177; 2026-02-21T09:06:50.2335686Z mov.b32 %r15264, %r15177; 2026-02-21T09:06:50.2335749Z mov.b32 %r15265, %r15177; 2026-02-21T09:06:50.2335810Z mov.b32 %r15266, %r15177; 2026-02-21T09:06:50.2335871Z mov.b32 %r15267, %r15177; 2026-02-21T09:06:50.2335931Z mov.b32 %r15268, %r15177; 2026-02-21T09:06:50.2335996Z mov.b32 %r15269, %r15177; 2026-02-21T09:06:50.2336132Z mov.b32 %r15270, %r15177; 2026-02-21T09:06:50.2336191Z mov.b32 %r15271, %r15177; 2026-02-21T09:06:50.2336254Z mov.b32 %r15272, %r15177; 2026-02-21T09:06:50.2336313Z mov.b32 %r15273, %r15177; 2026-02-21T09:06:50.2336373Z mov.b32 %r15274, %r15177; 2026-02-21T09:06:50.2336432Z mov.b32 %r15275, %r15177; 2026-02-21T09:06:50.2336606Z mov.b32 %r15276, %r15177; 2026-02-21T09:06:50.2336671Z mov.b32 %r15277, %r15177; 2026-02-21T09:06:50.2336728Z mov.b32 %r15278, %r15177; 2026-02-21T09:06:50.2336793Z mov.b32 %r15279, %r15177; 2026-02-21T09:06:50.2336852Z mov.b32 %r15280, %r15177; 2026-02-21T09:06:50.2336910Z mov.b32 %r15281, %r15177; 2026-02-21T09:06:50.2336971Z mov.b32 %r15282, %r15177; 2026-02-21T09:06:50.2337121Z mov.b32 %r15283, %r15177; 2026-02-21T09:06:50.2337244Z mov.b32 %r15284, %r15177; 2026-02-21T09:06:50.2337304Z mov.b32 %r15285, %r15177; 
2026-02-21T09:06:50.2337366Z mov.b32 %r15286, %r15177; 2026-02-21T09:06:50.2337425Z mov.b32 %r15287, %r15177; 2026-02-21T09:06:50.2337485Z mov.b32 %r15288, %r15177; 2026-02-21T09:06:50.2337546Z mov.b32 %r15289, %r15177; 2026-02-21T09:06:50.2337604Z mov.b32 %r15290, %r15177; 2026-02-21T09:06:50.2337665Z mov.b32 %r15291, %r15177; 2026-02-21T09:06:50.2337723Z mov.b32 %r15292, %r15177; 2026-02-21T09:06:50.2337785Z mov.b32 %r15293, %r15177; 2026-02-21T09:06:50.2337843Z mov.b32 %r15294, %r15177; 2026-02-21T09:06:50.2337902Z mov.b32 %r15295, %r15177; 2026-02-21T09:06:50.2337977Z mov.b32 %r15296, %r15177; 2026-02-21T09:06:50.2338037Z mov.b32 %r15297, %r15177; 2026-02-21T09:06:50.2338096Z mov.b32 %r15298, %r15177; 2026-02-21T09:06:50.2338155Z mov.b32 %r15299, %r15177; 2026-02-21T09:06:50.2338216Z mov.b32 %r15300, %r15177; 2026-02-21T09:06:50.2338279Z mov.b32 %r15301, %r15177; 2026-02-21T09:06:50.2338404Z mov.b32 %r15302, %r15177; 2026-02-21T09:06:50.2338467Z mov.b32 %r15303, %r15177; 2026-02-21T09:06:50.2338525Z mov.b32 %r15304, %r15177; 2026-02-21T09:06:50.2338583Z mov.b32 %r15305, %r15311; 2026-02-21T09:06:50.2338648Z bra.uni $L__BB0_11; 2026-02-21T09:06:50.2338775Z $L__BB0_15: // in Loop: Header=BB0_11 Depth=1 2026-02-21T09:06:50.2338995Z .loc 1 26 144 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:26:144 2026-02-21T09:06:50.2339058Z add.s32 %r15305, %r1008, 32; 2026-02-21T09:06:50.2339125Z add.s32 %r15169, %r15169, -1; 2026-02-21T09:06:50.2339191Z setp.ne.b32 %p80, %r15169, 0; 2026-02-21T09:06:50.2339252Z @%p80 bra $L__BB0_11; 2026-02-21T09:06:50.2339316Z bra.uni $L__BB0_16; 2026-02-21T09:06:50.2339436Z $L__BB0_11: // =>This Inner Loop Header: Depth=1 2026-02-21T09:06:50.2339497Z add.s32 %r11424, %r15307, 1; 2026-02-21T09:06:50.2339567Z setp.eq.b32 %p64, %r15307, 15; 2026-02-21T09:06:50.2339641Z selp.b32 %r15307, 0, %r11424, %p64; 2026-02-21T09:06:50.2339708Z setp.ne.b32 %p65, %r15307, 0; 2026-02-21T09:06:50.2339769Z setp.eq.b32 %p66, %r15307, 0; 
2026-02-21T09:06:50.2339839Z @%p65 bra $L__BB0_13; 2026-02-21T09:06:50.2339954Z // %bb.12: // in Loop: Header=BB0_11 Depth=1 2026-02-21T09:06:50.2340018Z add.s32 %r15315, %r15315, 1056; 2026-02-21T09:06:50.2340221Z .loc 1 32 35 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:32:35 2026-02-21T09:06:50.2340286Z shr.s32 %r11425, %r15315, 31; 2026-02-21T09:06:50.2340347Z shr.u32 %r11426, %r11425, 26; 2026-02-21T09:06:50.2340410Z add.s32 %r11427, %r15315, %r11426; 2026-02-21T09:06:50.2340476Z shr.s32 %r11428, %r11427, 6; 2026-02-21T09:06:50.2340671Z .loc 1 33 33 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:33:33 2026-02-21T09:06:50.2340733Z shl.b32 %r11429, %r11428, 1; 2026-02-21T09:06:50.2340932Z .loc 1 34 39 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:34:39 2026-02-21T09:06:50.2340993Z sub.s32 %r11430, 16, %r11429; 2026-02-21T09:06:50.2341184Z .loc 1 34 52 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:34:52 2026-02-21T09:06:50.2341325Z min.s32 %r11431, %r11430, 2; 2026-02-21T09:06:50.2341518Z .loc 1 35 45 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:35:45 2026-02-21T09:06:50.2341582Z and.b32 %r11432, %r11427, -64; 2026-02-21T09:06:50.2341645Z sub.s32 %r11433, %r15315, %r11432; 2026-02-21T09:06:50.2341837Z .loc 1 36 51 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:36:51 2026-02-21T09:06:50.2341900Z div.s32 %r11434, %r11433, %r11431; 2026-02-21T09:06:50.2342092Z .loc 1 35 64 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:35:64 2026-02-21T09:06:50.2342162Z mul.lo.s32 %r11435, %r11434, %r11431; 2026-02-21T09:06:50.2342324Z sub.s32 %r11436, %r11433, %r11435; 2026-02-21T09:06:50.2342518Z .loc 1 35 30 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:35:30 2026-02-21T09:06:50.2342600Z add.s32 %r11437, %r11436, %r11429; 2026-02-21T09:06:50.2342795Z .loc 1 37 27 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:37:27 2026-02-21T09:06:50.2342858Z shl.b32 %r15308, 
%r11437, 8; 2026-02-21T09:06:50.2343051Z .loc 1 38 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:38:32 2026-02-21T09:06:50.2343111Z or.b32 %r15309, %r15308, %r5; 2026-02-21T09:06:50.2343301Z .loc 1 39 27 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:39:27 2026-02-21T09:06:50.2343361Z shl.b32 %r15310, %r11434, 8; 2026-02-21T09:06:50.2343556Z .loc 1 40 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:40:32 2026-02-21T09:06:50.2343619Z or.b32 %r15311, %r15310, %r7; 2026-02-21T09:06:50.2343749Z or.b32 %r15312, %r15311, 1; 2026-02-21T09:06:50.2343817Z or.b32 %r15313, %r15311, 2; 2026-02-21T09:06:50.2343876Z or.b32 %r15314, %r15311, 3; 2026-02-21T09:06:50.2343984Z $L__BB0_13: // in Loop: Header=BB0_11 Depth=1 2026-02-21T09:06:50.2344175Z .loc 1 0 0 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:0 2026-02-21T09:06:50.2344242Z selp.b32 %r1008, 0, %r15305, %p66; 2026-02-21T09:06:50.2344437Z .loc 1 48 35 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:48:35 2026-02-21T09:06:50.2344499Z add.s32 %r14582, %r1008, %r9; 2026-02-21T09:06:50.2344690Z .loc 1 52 22 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:52:22 2026-02-21T09:06:50.2344753Z shl.b32 %r14583, %r1008, 1; 2026-02-21T09:06:50.2344945Z .loc 1 55 53 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:53 2026-02-21T09:06:50.2345013Z shl.b32 %r14584, %r15309, 10; 2026-02-21T09:06:50.2345207Z .loc 1 54 25 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:54:25 2026-02-21T09:06:50.2345270Z or.b32 %r14585, %r14584, %r11; 2026-02-21T09:06:50.2345468Z .loc 1 55 60 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:60 2026-02-21T09:06:50.2345531Z add.s32 %r14586, %r14585, %r14583; 2026-02-21T09:06:50.2345721Z .loc 1 55 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:32 2026-02-21T09:06:50.2345800Z mad.wide.s32 %rd114, %r14586, 2, %rd22; 2026-02-21T09:06:50.2345991Z .loc 1 55 80 // 
cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:80 2026-02-21T09:06:50.2346064Z // begin inline asm 2026-02-21T09:06:50.2346127Z mov.u32 %r11438, 0x0; 2026-02-21T09:06:50.2346187Z mov.u32 %r11439, 0x0; 2026-02-21T09:06:50.2346245Z mov.u32 %r11440, 0x0; 2026-02-21T09:06:50.2346304Z mov.u32 %r11441, 0x0; 2026-02-21T09:06:50.2346560Z ld.global.v4.b32 { %r11438, %r11439, %r11440, %r11441 }, [ %rd114 + 0 ]; 2026-02-21T09:06:50.2346625Z // end inline asm 2026-02-21T09:06:50.2346828Z .loc 1 59 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:59:32 2026-02-21T09:06:50.2346981Z bar.sync 0; 2026-02-21T09:06:50.2347071Z st.shared.v2.b32 [%r839], {%r11438, %r11439}; 2026-02-21T09:06:50.2347152Z st.shared.v2.b32 [%r840], {%r11440, %r11441}; 2026-02-21T09:06:50.2347208Z bar.sync 0; 2026-02-21T09:06:50.2347281Z ld.shared.b16 %rs593, [%r841]; 2026-02-21T09:06:50.2347348Z ld.shared.b16 %rs594, [%r841+256]; 2026-02-21T09:06:50.2347417Z ld.shared.b16 %rs595, [%r841+16]; 2026-02-21T09:06:50.2347484Z ld.shared.b16 %rs596, [%r841+272]; 2026-02-21T09:06:50.2347550Z ld.shared.b16 %rs597, [%r842]; 2026-02-21T09:06:50.2347614Z ld.shared.b16 %rs598, [%r842+256]; 2026-02-21T09:06:50.2347678Z ld.shared.b16 %rs599, [%r842+16]; 2026-02-21T09:06:50.2347815Z ld.shared.b16 %rs600, [%r842+272]; 2026-02-21T09:06:50.2347940Z cvt.f32.bf16 %r11698, %rs593; 2026-02-21T09:06:50.2348005Z cvt.f32.bf16 %r11699, %rs594; 2026-02-21T09:06:50.2348069Z cvt.f32.bf16 %r11700, %rs597; 2026-02-21T09:06:50.2348131Z cvt.f32.bf16 %r11701, %rs598; 2026-02-21T09:06:50.2348191Z cvt.f32.bf16 %r11958, %rs595; 2026-02-21T09:06:50.2348252Z cvt.f32.bf16 %r11959, %rs596; 2026-02-21T09:06:50.2348384Z cvt.f32.bf16 %r11960, %rs599; 2026-02-21T09:06:50.2348446Z cvt.f32.bf16 %r11961, %rs600; 2026-02-21T09:06:50.2348645Z .loc 1 61 55 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:55 2026-02-21T09:06:50.2348710Z shl.b32 %r14587, %r14582, 13; 2026-02-21T09:06:50.2348903Z .loc 1 61 62 // 
cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:62 2026-02-21T09:06:50.2348966Z add.s32 %r14588, %r15311, %r14587; 2026-02-21T09:06:50.2349030Z add.s32 %r14589, %r15312, %r14587; 2026-02-21T09:06:50.2349091Z add.s32 %r14590, %r15313, %r14587; 2026-02-21T09:06:50.2349228Z add.s32 %r14591, %r15314, %r14587; 2026-02-21T09:06:50.2349425Z .loc 1 61 34 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:34 2026-02-21T09:06:50.2349495Z cvt.s64.s32 %rd142, %r14588; 2026-02-21T09:06:50.2349557Z add.s64 %rd115, %rd23, %rd142; 2026-02-21T09:06:50.2349621Z cvt.s64.s32 %rd143, %r14589; 2026-02-21T09:06:50.2349687Z add.s64 %rd116, %rd23, %rd143; 2026-02-21T09:06:50.2349748Z cvt.s64.s32 %rd144, %r14590; 2026-02-21T09:06:50.2349810Z add.s64 %rd117, %rd23, %rd144; 2026-02-21T09:06:50.2349873Z cvt.s64.s32 %rd145, %r14591; 2026-02-21T09:06:50.2349933Z add.s64 %rd118, %rd23, %rd145; 2026-02-21T09:06:50.2350126Z .loc 1 61 87 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:87 2026-02-21T09:06:50.2350188Z // begin inline asm 2026-02-21T09:06:50.2350252Z mov.u16 %rs577, 0x0; 2026-02-21T09:06:50.2350326Z ld.global.b8 { %rs577 }, [ %rd115 + 0 ]; 2026-02-21T09:06:50.2350386Z // end inline asm 2026-02-21T09:06:50.2350450Z // begin inline asm 2026-02-21T09:06:50.2350508Z mov.u16 %rs578, 0x0; 2026-02-21T09:06:50.2350580Z ld.global.b8 { %rs578 }, [ %rd116 + 0 ]; 2026-02-21T09:06:50.2350640Z // end inline asm 2026-02-21T09:06:50.2350702Z // begin inline asm 2026-02-21T09:06:50.2350763Z mov.u16 %rs579, 0x0; 2026-02-21T09:06:50.2350833Z ld.global.b8 { %rs579 }, [ %rd117 + 0 ]; 2026-02-21T09:06:50.2350893Z // end inline asm 2026-02-21T09:06:50.2350954Z // begin inline asm 2026-02-21T09:06:50.2351012Z mov.u16 %rs580, 0x0; 2026-02-21T09:06:50.2351081Z ld.global.b8 { %rs580 }, [ %rd118 + 0 ]; 2026-02-21T09:06:50.2351140Z // end inline asm 2026-02-21T09:06:50.2351334Z .loc 1 69 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:69:28 
2026-02-21T09:06:50.2351390Z bar.sync 0; 2026-02-21T09:06:50.2351458Z st.shared.b8 [%r843], %rs577; 2026-02-21T09:06:50.2351524Z st.shared.b8 [%r844+512], %rs578; 2026-02-21T09:06:50.2351593Z st.shared.b8 [%r845+1024], %rs579; 2026-02-21T09:06:50.2351665Z st.shared.b8 [%r846+1536], %rs580; 2026-02-21T09:06:50.2351720Z bar.sync 0; 2026-02-21T09:06:50.2351788Z ld.shared.b32 %r14592, [%r847]; 2026-02-21T09:06:50.2351924Z prmt.b32 %r14593, %r14592, 0, 0x7770U; 2026-02-21T09:06:50.2351990Z cvt.u16.u32 %rs601, %r14593; 2026-02-21T09:06:50.2352058Z prmt.b32 %r14594, %r14592, 0, 0x7771U; 2026-02-21T09:06:50.2352119Z cvt.u16.u32 %rs602, %r14594; 2026-02-21T09:06:50.2352189Z prmt.b32 %r14595, %r14592, 0, 0x7772U; 2026-02-21T09:06:50.2352252Z cvt.u16.u32 %rs603, %r14595; 2026-02-21T09:06:50.2352315Z prmt.b32 %r14596, %r14592, 0, 0x7773U; 2026-02-21T09:06:50.2352376Z cvt.u16.u32 %rs604, %r14596; 2026-02-21T09:06:50.2352450Z ld.shared.b32 %r14597, [%r847+128]; 2026-02-21T09:06:50.2352514Z prmt.b32 %r14598, %r14597, 0, 0x7770U; 2026-02-21T09:06:50.2352574Z cvt.u16.u32 %rs605, %r14598; 2026-02-21T09:06:50.2352643Z prmt.b32 %r14599, %r14597, 0, 0x7771U; 2026-02-21T09:06:50.2352756Z cvt.u16.u32 %rs606, %r14599; 2026-02-21T09:06:50.2352870Z prmt.b32 %r14600, %r14597, 0, 0x7772U; 2026-02-21T09:06:50.2352935Z cvt.u16.u32 %rs607, %r14600; 2026-02-21T09:06:50.2353000Z prmt.b32 %r14601, %r14597, 0, 0x7773U; 2026-02-21T09:06:50.2353063Z cvt.u16.u32 %rs608, %r14601; 2026-02-21T09:06:50.2353257Z .loc 1 64 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:64:28 2026-02-21T09:06:50.2353325Z shl.b16 %rs609, %rs601, 4; 2026-02-21T09:06:50.2353386Z shl.b16 %rs610, %rs602, 4; 2026-02-21T09:06:50.2353446Z shl.b16 %rs611, %rs603, 4; 2026-02-21T09:06:50.2353509Z shl.b16 %rs612, %rs604, 4; 2026-02-21T09:06:50.2353569Z shl.b16 %rs613, %rs605, 4; 2026-02-21T09:06:50.2353644Z shl.b16 %rs614, %rs606, 4; 2026-02-21T09:06:50.2353707Z shl.b16 %rs615, %rs607, 4; 2026-02-21T09:06:50.2353770Z 
shl.b16 %rs616, %rs608, 4; 2026-02-21T09:06:50.2353966Z .loc 1 79 58 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:79:58 2026-02-21T09:06:50.2354039Z selp.b16 %rs617, %rs609, %rs601, %p81; 2026-02-21T09:06:50.2354156Z cvt.s16.s8 %rs618, %rs617; 2026-02-21T09:06:50.2354219Z shr.s16 %rs619, %rs618, 4; 2026-02-21T09:06:50.2354288Z selp.b16 %rs620, %rs610, %rs602, %p81; 2026-02-21T09:06:50.2354355Z cvt.s16.s8 %rs621, %rs620; 2026-02-21T09:06:50.2354415Z shr.s16 %rs622, %rs621, 4; 2026-02-21T09:06:50.2354482Z selp.b16 %rs623, %rs611, %rs603, %p81; 2026-02-21T09:06:50.2354544Z cvt.s16.s8 %rs624, %rs623; 2026-02-21T09:06:50.2354609Z shr.s16 %rs625, %rs624, 4; 2026-02-21T09:06:50.2354677Z selp.b16 %rs626, %rs612, %rs604, %p81; 2026-02-21T09:06:50.2354738Z cvt.s16.s8 %rs627, %rs626; 2026-02-21T09:06:50.2354799Z shr.s16 %rs628, %rs627, 4; 2026-02-21T09:06:50.2354867Z selp.b16 %rs629, %rs613, %rs605, %p81; 2026-02-21T09:06:50.2354926Z cvt.s16.s8 %rs630, %rs629; 2026-02-21T09:06:50.2354986Z shr.s16 %rs631, %rs630, 4; 2026-02-21T09:06:50.2355057Z selp.b16 %rs632, %rs614, %rs606, %p81; 2026-02-21T09:06:50.2355120Z cvt.s16.s8 %rs633, %rs632; 2026-02-21T09:06:50.2355185Z shr.s16 %rs634, %rs633, 4; 2026-02-21T09:06:50.2355254Z selp.b16 %rs635, %rs615, %rs607, %p81; 2026-02-21T09:06:50.2355314Z cvt.s16.s8 %rs636, %rs635; 2026-02-21T09:06:50.2355376Z shr.s16 %rs637, %rs636, 4; 2026-02-21T09:06:50.2355442Z selp.b16 %rs638, %rs616, %rs608, %p81; 2026-02-21T09:06:50.2355508Z cvt.s16.s8 %rs639, %rs638; 2026-02-21T09:06:50.2355568Z shr.s16 %rs640, %rs639, 4; 2026-02-21T09:06:50.2355762Z .loc 1 84 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:84:32 2026-02-21T09:06:50.2355833Z cvt.rn.f32.s16 %r14602, %rs619; 2026-02-21T09:06:50.2355897Z cvt.rn.f32.s16 %r14603, %rs622; 2026-02-21T09:06:50.2355958Z cvt.rn.f32.s16 %r14604, %rs625; 2026-02-21T09:06:50.2356023Z cvt.rn.f32.s16 %r14605, %rs628; 2026-02-21T09:06:50.2356084Z cvt.rn.f32.s16 %r14606, %rs631; 
2026-02-21T09:06:50.2356146Z cvt.rn.f32.s16 %r14607, %rs634; 2026-02-21T09:06:50.2356209Z cvt.rn.f32.s16 %r14608, %rs637; 2026-02-21T09:06:50.2356276Z cvt.rn.f32.s16 %r14609, %rs640; 2026-02-21T09:06:50.2356334Z bar.sync 0; 2026-02-21T09:06:50.2356397Z st.shared.b32 [%r848], %r14602; 2026-02-21T09:06:50.2356582Z st.shared.b32 [%r848+8], %r14603; 2026-02-21T09:06:50.2356733Z st.shared.b32 [%r849], %r14604; 2026-02-21T09:06:50.2356798Z st.shared.b32 [%r849+8], %r14605; 2026-02-21T09:06:50.2356861Z st.shared.b32 [%r850], %r14606; 2026-02-21T09:06:50.2356925Z st.shared.b32 [%r850+8], %r14607; 2026-02-21T09:06:50.2356988Z st.shared.b32 [%r851], %r14608; 2026-02-21T09:06:50.2357051Z st.shared.b32 [%r851+8], %r14609; 2026-02-21T09:06:50.2357120Z $L__tmp25: 2026-02-21T09:06:50.2357393Z .loc 2 291 36 // standard.py:291:36 @[ cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:91:40 ] 2026-02-21T09:06:50.2357453Z // begin inline asm 2026-02-21T09:06:50.2357535Z fence.proxy.async.shared::cta; 2026-02-21T09:06:50.2357592Z // end inline asm 2026-02-21T09:06:50.2357646Z bar.sync 0; 2026-02-21T09:06:50.2357866Z shfl.sync.idx.b32 %r1025, %r4, 0, 31, -1; 2026-02-21T09:06:50.2357947Z wgmma.fence.sync.aligned; 2026-02-21T09:06:50.2358011Z mov.pred %p67, -1; 2026-02-21T09:06:50.2358069Z // begin inline asm 2026-02-21T09:06:50.2360877Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r15177,%r15178,%r15179,%r15180,%r15181,%r15182,%r15183,%r15184,%r15185,%r15186,%r15187,%r15188,%r15189,%r15190,%r15191,%r15192,%r15193,%r15194,%r15195,%r15196,%r15197,%r15198,%r15199,%r15200,%r15201,%r15202,%r15203,%r15204,%r15205,%r15206,%r15207,%r15208,%r15209,%r15210,%r15211,%r15212,%r15213,%r15214,%r15215,%r15216,%r15217,%r15218,%r15219,%r15220,%r15221,%r15222,%r15223,%r15224,%r15225,%r15226,%r15227,%r15228,%r15229,%r15230,%r15231,%r15232,%r15233,%r15234,%r15235,%r15236,%r15237,%r15238,%r15239,%r15240,%r15241,%r15242,%r15243,%r15244,%r15245,%r15246,%r15247,%r15248,%r15249,%r15250,%r15251,%r15252,%r15253,%r15254,%r15255,%r15256,%r15257,%r15258,%r15259,%r15260,%r15261,%r15262,%r15263,%r15264,%r15265,%r15266,%r15267,%r15268,%r15269,%r15270,%r15271,%r15272,%r15273,%r15274,%r15275,%r15276,%r15277,%r15278,%r15279,%r15280,%r15281,%r15282,%r15283,%r15284,%r15285,%r15286,%r15287,%r15288,%r15289,%r15290,%r15291,%r15292,%r15293,%r15294,%r15295,%r15296,%r15297,%r15298,%r15299,%r15300,%r15301,%r15302,%r15303,%r15304}, {%r11698,%r11699,%r11700,%r11701}, %rd119, %p67, 1, 1; 2026-02-21T09:06:50.2360942Z // end inline asm 2026-02-21T09:06:50.2361004Z // begin inline asm 2026-02-21T09:06:50.2363731Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r15177,%r15178,%r15179,%r15180,%r15181,%r15182,%r15183,%r15184,%r15185,%r15186,%r15187,%r15188,%r15189,%r15190,%r15191,%r15192,%r15193,%r15194,%r15195,%r15196,%r15197,%r15198,%r15199,%r15200,%r15201,%r15202,%r15203,%r15204,%r15205,%r15206,%r15207,%r15208,%r15209,%r15210,%r15211,%r15212,%r15213,%r15214,%r15215,%r15216,%r15217,%r15218,%r15219,%r15220,%r15221,%r15222,%r15223,%r15224,%r15225,%r15226,%r15227,%r15228,%r15229,%r15230,%r15231,%r15232,%r15233,%r15234,%r15235,%r15236,%r15237,%r15238,%r15239,%r15240,%r15241,%r15242,%r15243,%r15244,%r15245,%r15246,%r15247,%r15248,%r15249,%r15250,%r15251,%r15252,%r15253,%r15254,%r15255,%r15256,%r15257,%r15258,%r15259,%r15260,%r15261,%r15262,%r15263,%r15264,%r15265,%r15266,%r15267,%r15268,%r15269,%r15270,%r15271,%r15272,%r15273,%r15274,%r15275,%r15276,%r15277,%r15278,%r15279,%r15280,%r15281,%r15282,%r15283,%r15284,%r15285,%r15286,%r15287,%r15288,%r15289,%r15290,%r15291,%r15292,%r15293,%r15294,%r15295,%r15296,%r15297,%r15298,%r15299,%r15300,%r15301,%r15302,%r15303,%r15304}, {%r11958,%r11959,%r11960,%r11961}, %rd120, %p67, 1, 1; 2026-02-21T09:06:50.2363805Z // end inline asm 2026-02-21T09:06:50.2363884Z wgmma.commit_group.sync.aligned; 2026-02-21T09:06:50.2363944Z mov.b32 %r14449, 0; 2026-02-21T09:06:50.2364008Z mov.b32 %r12090, %r12221; 2026-02-21T09:06:50.2364068Z mov.b32 %r12091, %r14449; 2026-02-21T09:06:50.2364126Z mov.b32 %r12092, %r14449; 2026-02-21T09:06:50.2364187Z // begin inline asm 2026-02-21T09:06:50.2366860Z // wait for regs: 
%r15177,%r15178,%r15179,%r15180,%r15181,%r15182,%r15183,%r15184,%r15185,%r15186,%r15187,%r15188,%r15189,%r15190,%r15191,%r15192,%r15193,%r15194,%r15195,%r15196,%r15197,%r15198,%r15199,%r15200,%r15201,%r15202,%r15203,%r15204,%r15205,%r15206,%r15207,%r15208,%r15209,%r15210,%r15211,%r15212,%r15213,%r15214,%r15215,%r15216,%r15217,%r15218,%r15219,%r15220,%r15221,%r15222,%r15223,%r15224,%r15225,%r15226,%r15227,%r15228,%r15229,%r15230,%r15231,%r15232,%r15233,%r15234,%r15235,%r15236,%r15237,%r15238,%r15239,%r15240,%r15241,%r15242,%r15243,%r15244,%r15245,%r15246,%r15247,%r15248,%r15249,%r15250,%r15251,%r15252,%r15253,%r15254,%r15255,%r15256,%r15257,%r15258,%r15259,%r15260,%r15261,%r15262,%r15263,%r15264,%r15265,%r15266,%r15267,%r15268,%r15269,%r15270,%r15271,%r15272,%r15273,%r15274,%r15275,%r15276,%r15277,%r15278,%r15279,%r15280,%r15281,%r15282,%r15283,%r15284,%r15285,%r15286,%r15287,%r15288,%r15289,%r15290,%r15291,%r15292,%r15293,%r15294,%r15295,%r15296,%r15297,%r15298,%r15299,%r15300,%r15301,%r15302,%r15303,%r15304,%r12090,%r12091,%r12092 2026-02-21T09:06:50.2367175Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:06:50.2367235Z // end inline asm 2026-02-21T09:06:50.2367291Z $L__tmp26: 2026-02-21T09:06:50.2367507Z .loc 1 47 124 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:47:124 2026-02-21T09:06:50.2367577Z add.s32 %r14610, %r1008, 8; 2026-02-21T09:06:50.2367780Z .loc 1 48 35 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:48:35 2026-02-21T09:06:50.2367845Z add.s32 %r14611, %r14610, %r9; 2026-02-21T09:06:50.2368044Z .loc 1 52 22 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:52:22 2026-02-21T09:06:50.2368106Z shl.b32 %r14612, %r14610, 1; 2026-02-21T09:06:50.2368297Z .loc 1 55 60 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:60 2026-02-21T09:06:50.2368364Z add.s32 %r14613, %r14585, %r14612; 2026-02-21T09:06:50.2368558Z .loc 1 55 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:32 
2026-02-21T09:06:50.2368698Z mad.wide.s32 %rd121, %r14613, 2, %rd22; 2026-02-21T09:06:50.2368896Z .loc 1 55 80 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:80 2026-02-21T09:06:50.2368959Z // begin inline asm 2026-02-21T09:06:50.2369018Z mov.u32 %r12224, 0x0; 2026-02-21T09:06:50.2369076Z mov.u32 %r12225, 0x0; 2026-02-21T09:06:50.2369136Z mov.u32 %r12226, 0x0; 2026-02-21T09:06:50.2369193Z mov.u32 %r12227, 0x0; 2026-02-21T09:06:50.2369329Z ld.global.v4.b32 { %r12224, %r12225, %r12226, %r12227 }, [ %rd121 + 0 ]; 2026-02-21T09:06:50.2369390Z // end inline asm 2026-02-21T09:06:50.2369587Z .loc 1 59 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:59:32 2026-02-21T09:06:50.2369645Z bar.sync 0; 2026-02-21T09:06:50.2369735Z st.shared.v2.b32 [%r839], {%r12224, %r12225}; 2026-02-21T09:06:50.2369816Z st.shared.v2.b32 [%r840], {%r12226, %r12227}; 2026-02-21T09:06:50.2369876Z bar.sync 0; 2026-02-21T09:06:50.2369948Z ld.shared.b16 %rs641, [%r841]; 2026-02-21T09:06:50.2370019Z ld.shared.b16 %rs642, [%r841+256]; 2026-02-21T09:06:50.2370086Z ld.shared.b16 %rs643, [%r841+16]; 2026-02-21T09:06:50.2370153Z ld.shared.b16 %rs644, [%r841+272]; 2026-02-21T09:06:50.2370221Z ld.shared.b16 %rs645, [%r842]; 2026-02-21T09:06:50.2370286Z ld.shared.b16 %rs646, [%r842+256]; 2026-02-21T09:06:50.2370350Z ld.shared.b16 %rs647, [%r842+16]; 2026-02-21T09:06:50.2370415Z ld.shared.b16 %rs648, [%r842+272]; 2026-02-21T09:06:50.2370491Z cvt.f32.bf16 %r12484, %rs641; 2026-02-21T09:06:50.2370557Z cvt.f32.bf16 %r12485, %rs642; 2026-02-21T09:06:50.2370619Z cvt.f32.bf16 %r12486, %rs645; 2026-02-21T09:06:50.2370682Z cvt.f32.bf16 %r12487, %rs646; 2026-02-21T09:06:50.2370744Z cvt.f32.bf16 %r12744, %rs643; 2026-02-21T09:06:50.2370805Z cvt.f32.bf16 %r12745, %rs644; 2026-02-21T09:06:50.2370865Z cvt.f32.bf16 %r12746, %rs647; 2026-02-21T09:06:50.2370929Z cvt.f32.bf16 %r12747, %rs648; 2026-02-21T09:06:50.2371131Z .loc 1 61 55 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:55 
2026-02-21T09:06:50.2371195Z shl.b32 %r14614, %r14611, 13; 2026-02-21T09:06:50.2371448Z .loc 1 61 62 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:62 2026-02-21T09:06:50.2371510Z add.s32 %r14615, %r15311, %r14614; 2026-02-21T09:06:50.2371573Z add.s32 %r14616, %r15312, %r14614; 2026-02-21T09:06:50.2371636Z add.s32 %r14617, %r15313, %r14614; 2026-02-21T09:06:50.2371696Z add.s32 %r14618, %r15314, %r14614; 2026-02-21T09:06:50.2371887Z .loc 1 61 34 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:34 2026-02-21T09:06:50.2371951Z cvt.s64.s32 %rd146, %r14615; 2026-02-21T09:06:50.2372018Z add.s64 %rd122, %rd23, %rd146; 2026-02-21T09:06:50.2372080Z cvt.s64.s32 %rd147, %r14616; 2026-02-21T09:06:50.2372142Z add.s64 %rd123, %rd23, %rd147; 2026-02-21T09:06:50.2372207Z cvt.s64.s32 %rd148, %r14617; 2026-02-21T09:06:50.2372366Z add.s64 %rd124, %rd23, %rd148; 2026-02-21T09:06:50.2372431Z cvt.s64.s32 %rd149, %r14618; 2026-02-21T09:06:50.2372497Z add.s64 %rd125, %rd23, %rd149; 2026-02-21T09:06:50.2372690Z .loc 1 61 87 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:87 2026-02-21T09:06:50.2372754Z // begin inline asm 2026-02-21T09:06:50.2372815Z mov.u16 %rs581, 0x0; 2026-02-21T09:06:50.2372891Z ld.global.b8 { %rs581 }, [ %rd122 + 0 ]; 2026-02-21T09:06:50.2372951Z // end inline asm 2026-02-21T09:06:50.2373010Z // begin inline asm 2026-02-21T09:06:50.2373073Z mov.u16 %rs582, 0x0; 2026-02-21T09:06:50.2373145Z ld.global.b8 { %rs582 }, [ %rd123 + 0 ]; 2026-02-21T09:06:50.2373214Z // end inline asm 2026-02-21T09:06:50.2373275Z // begin inline asm 2026-02-21T09:06:50.2373337Z mov.u16 %rs583, 0x0; 2026-02-21T09:06:50.2373407Z ld.global.b8 { %rs583 }, [ %rd124 + 0 ]; 2026-02-21T09:06:50.2373464Z // end inline asm 2026-02-21T09:06:50.2373528Z // begin inline asm 2026-02-21T09:06:50.2373587Z mov.u16 %rs584, 0x0; 2026-02-21T09:06:50.2373713Z ld.global.b8 { %rs584 }, [ %rd125 + 0 ]; 2026-02-21T09:06:50.2373772Z // end inline asm 
2026-02-21T09:06:50.2373969Z .loc 1 69 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:69:28 2026-02-21T09:06:50.2374027Z bar.sync 0; 2026-02-21T09:06:50.2374091Z st.shared.b8 [%r843], %rs581; 2026-02-21T09:06:50.2374160Z st.shared.b8 [%r844+512], %rs582; 2026-02-21T09:06:50.2374223Z st.shared.b8 [%r845+1024], %rs583; 2026-02-21T09:06:50.2374288Z st.shared.b8 [%r846+1536], %rs584; 2026-02-21T09:06:50.2374345Z bar.sync 0; 2026-02-21T09:06:50.2374411Z ld.shared.b32 %r14619, [%r847]; 2026-02-21T09:06:50.2374479Z prmt.b32 %r14620, %r14619, 0, 0x7770U; 2026-02-21T09:06:50.2374541Z cvt.u16.u32 %rs649, %r14620; 2026-02-21T09:06:50.2374610Z prmt.b32 %r14621, %r14619, 0, 0x7771U; 2026-02-21T09:06:50.2374671Z cvt.u16.u32 %rs650, %r14621; 2026-02-21T09:06:50.2374738Z prmt.b32 %r14622, %r14619, 0, 0x7772U; 2026-02-21T09:06:50.2374804Z cvt.u16.u32 %rs651, %r14622; 2026-02-21T09:06:50.2374868Z prmt.b32 %r14623, %r14619, 0, 0x7773U; 2026-02-21T09:06:50.2374928Z cvt.u16.u32 %rs652, %r14623; 2026-02-21T09:06:50.2374998Z ld.shared.b32 %r14624, [%r847+128]; 2026-02-21T09:06:50.2375068Z prmt.b32 %r14625, %r14624, 0, 0x7770U; 2026-02-21T09:06:50.2375130Z cvt.u16.u32 %rs653, %r14625; 2026-02-21T09:06:50.2375194Z prmt.b32 %r14626, %r14624, 0, 0x7771U; 2026-02-21T09:06:50.2375257Z cvt.u16.u32 %rs654, %r14626; 2026-02-21T09:06:50.2375321Z prmt.b32 %r14627, %r14624, 0, 0x7772U; 2026-02-21T09:06:50.2375395Z cvt.u16.u32 %rs655, %r14627; 2026-02-21T09:06:50.2375465Z prmt.b32 %r14628, %r14624, 0, 0x7773U; 2026-02-21T09:06:50.2375528Z cvt.u16.u32 %rs656, %r14628; 2026-02-21T09:06:50.2375722Z .loc 1 64 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:64:28 2026-02-21T09:06:50.2375787Z shl.b16 %rs657, %rs649, 4; 2026-02-21T09:06:50.2375854Z shl.b16 %rs658, %rs650, 4; 2026-02-21T09:06:50.2375916Z shl.b16 %rs659, %rs651, 4; 2026-02-21T09:06:50.2375976Z shl.b16 %rs660, %rs652, 4; 2026-02-21T09:06:50.2376039Z shl.b16 %rs661, %rs653, 4; 2026-02-21T09:06:50.2376158Z shl.b16 
%rs662, %rs654, 4; 2026-02-21T09:06:50.2376218Z shl.b16 %rs663, %rs655, 4; 2026-02-21T09:06:50.2376280Z shl.b16 %rs664, %rs656, 4; 2026-02-21T09:06:50.2376588Z .loc 1 79 58 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:79:58 2026-02-21T09:06:50.2376664Z selp.b16 %rs665, %rs657, %rs649, %p81; 2026-02-21T09:06:50.2376726Z cvt.s16.s8 %rs666, %rs665; 2026-02-21T09:06:50.2376789Z shr.s16 %rs667, %rs666, 4; 2026-02-21T09:06:50.2376871Z selp.b16 %rs668, %rs658, %rs650, %p81; 2026-02-21T09:06:50.2376933Z cvt.s16.s8 %rs669, %rs668; 2026-02-21T09:06:50.2376994Z shr.s16 %rs670, %rs669, 4; 2026-02-21T09:06:50.2377065Z selp.b16 %rs671, %rs659, %rs651, %p81; 2026-02-21T09:06:50.2377125Z cvt.s16.s8 %rs672, %rs671; 2026-02-21T09:06:50.2377329Z shr.s16 %rs673, %rs672, 4; 2026-02-21T09:06:50.2377404Z selp.b16 %rs674, %rs660, %rs652, %p81; 2026-02-21T09:06:50.2377465Z cvt.s16.s8 %rs675, %rs674; 2026-02-21T09:06:50.2377525Z shr.s16 %rs676, %rs675, 4; 2026-02-21T09:06:50.2377598Z selp.b16 %rs677, %rs661, %rs653, %p81; 2026-02-21T09:06:50.2377658Z cvt.s16.s8 %rs678, %rs677; 2026-02-21T09:06:50.2377718Z shr.s16 %rs679, %rs678, 4; 2026-02-21T09:06:50.2377787Z selp.b16 %rs680, %rs662, %rs654, %p81; 2026-02-21T09:06:50.2377849Z cvt.s16.s8 %rs681, %rs680; 2026-02-21T09:06:50.2377908Z shr.s16 %rs682, %rs681, 4; 2026-02-21T09:06:50.2377974Z selp.b16 %rs683, %rs663, %rs655, %p81; 2026-02-21T09:06:50.2378038Z cvt.s16.s8 %rs684, %rs683; 2026-02-21T09:06:50.2378099Z shr.s16 %rs685, %rs684, 4; 2026-02-21T09:06:50.2378165Z selp.b16 %rs686, %rs664, %rs656, %p81; 2026-02-21T09:06:50.2378228Z cvt.s16.s8 %rs687, %rs686; 2026-02-21T09:06:50.2378301Z shr.s16 %rs688, %rs687, 4; 2026-02-21T09:06:50.2378568Z .loc 1 84 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:84:32 2026-02-21T09:06:50.2378638Z cvt.rn.f32.s16 %r14629, %rs667; 2026-02-21T09:06:50.2378704Z cvt.rn.f32.s16 %r14630, %rs670; 2026-02-21T09:06:50.2378769Z cvt.rn.f32.s16 %r14631, %rs673; 2026-02-21T09:06:50.2378830Z 
cvt.rn.f32.s16 %r14632, %rs676; 2026-02-21T09:06:50.2383664Z cvt.rn.f32.s16 %r14633, %rs679; 2026-02-21T09:06:50.2383771Z cvt.rn.f32.s16 %r14634, %rs682; 2026-02-21T09:06:50.2383844Z cvt.rn.f32.s16 %r14635, %rs685; 2026-02-21T09:06:50.2383909Z cvt.rn.f32.s16 %r14636, %rs688; 2026-02-21T09:06:50.2383966Z bar.sync 0; 2026-02-21T09:06:50.2384036Z st.shared.b32 [%r848], %r14629; 2026-02-21T09:06:50.2384109Z st.shared.b32 [%r848+8], %r14630; 2026-02-21T09:06:50.2384175Z st.shared.b32 [%r849], %r14631; 2026-02-21T09:06:50.2384240Z st.shared.b32 [%r849+8], %r14632; 2026-02-21T09:06:50.2384308Z st.shared.b32 [%r850], %r14633; 2026-02-21T09:06:50.2384371Z st.shared.b32 [%r850+8], %r14634; 2026-02-21T09:06:50.2384451Z st.shared.b32 [%r851], %r14635; 2026-02-21T09:06:50.2384520Z st.shared.b32 [%r851+8], %r14636; 2026-02-21T09:06:50.2384583Z $L__tmp27: 2026-02-21T09:06:50.2384877Z .loc 2 291 36 // standard.py:291:36 @[ cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:91:40 ] 2026-02-21T09:06:50.2384943Z // begin inline asm 2026-02-21T09:06:50.2385034Z fence.proxy.async.shared::cta; 2026-02-21T09:06:50.2385090Z // end inline asm 2026-02-21T09:06:50.2385146Z bar.sync 0; 2026-02-21T09:06:50.2385221Z wgmma.fence.sync.aligned; 2026-02-21T09:06:50.2385283Z // begin inline asm 2026-02-21T09:06:50.2388150Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r15177,%r15178,%r15179,%r15180,%r15181,%r15182,%r15183,%r15184,%r15185,%r15186,%r15187,%r15188,%r15189,%r15190,%r15191,%r15192,%r15193,%r15194,%r15195,%r15196,%r15197,%r15198,%r15199,%r15200,%r15201,%r15202,%r15203,%r15204,%r15205,%r15206,%r15207,%r15208,%r15209,%r15210,%r15211,%r15212,%r15213,%r15214,%r15215,%r15216,%r15217,%r15218,%r15219,%r15220,%r15221,%r15222,%r15223,%r15224,%r15225,%r15226,%r15227,%r15228,%r15229,%r15230,%r15231,%r15232,%r15233,%r15234,%r15235,%r15236,%r15237,%r15238,%r15239,%r15240,%r15241,%r15242,%r15243,%r15244,%r15245,%r15246,%r15247,%r15248,%r15249,%r15250,%r15251,%r15252,%r15253,%r15254,%r15255,%r15256,%r15257,%r15258,%r15259,%r15260,%r15261,%r15262,%r15263,%r15264,%r15265,%r15266,%r15267,%r15268,%r15269,%r15270,%r15271,%r15272,%r15273,%r15274,%r15275,%r15276,%r15277,%r15278,%r15279,%r15280,%r15281,%r15282,%r15283,%r15284,%r15285,%r15286,%r15287,%r15288,%r15289,%r15290,%r15291,%r15292,%r15293,%r15294,%r15295,%r15296,%r15297,%r15298,%r15299,%r15300,%r15301,%r15302,%r15303,%r15304}, {%r12484,%r12485,%r12486,%r12487}, %rd119, %p67, 1, 1; 2026-02-21T09:06:50.2388440Z // end inline asm 2026-02-21T09:06:50.2388504Z // begin inline asm 2026-02-21T09:06:50.2391268Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r15177,%r15178,%r15179,%r15180,%r15181,%r15182,%r15183,%r15184,%r15185,%r15186,%r15187,%r15188,%r15189,%r15190,%r15191,%r15192,%r15193,%r15194,%r15195,%r15196,%r15197,%r15198,%r15199,%r15200,%r15201,%r15202,%r15203,%r15204,%r15205,%r15206,%r15207,%r15208,%r15209,%r15210,%r15211,%r15212,%r15213,%r15214,%r15215,%r15216,%r15217,%r15218,%r15219,%r15220,%r15221,%r15222,%r15223,%r15224,%r15225,%r15226,%r15227,%r15228,%r15229,%r15230,%r15231,%r15232,%r15233,%r15234,%r15235,%r15236,%r15237,%r15238,%r15239,%r15240,%r15241,%r15242,%r15243,%r15244,%r15245,%r15246,%r15247,%r15248,%r15249,%r15250,%r15251,%r15252,%r15253,%r15254,%r15255,%r15256,%r15257,%r15258,%r15259,%r15260,%r15261,%r15262,%r15263,%r15264,%r15265,%r15266,%r15267,%r15268,%r15269,%r15270,%r15271,%r15272,%r15273,%r15274,%r15275,%r15276,%r15277,%r15278,%r15279,%r15280,%r15281,%r15282,%r15283,%r15284,%r15285,%r15286,%r15287,%r15288,%r15289,%r15290,%r15291,%r15292,%r15293,%r15294,%r15295,%r15296,%r15297,%r15298,%r15299,%r15300,%r15301,%r15302,%r15303,%r15304}, {%r12744,%r12745,%r12746,%r12747}, %rd120, %p67, 1, 1; 2026-02-21T09:06:50.2391393Z // end inline asm 2026-02-21T09:06:50.2391537Z wgmma.commit_group.sync.aligned; 2026-02-21T09:06:50.2391603Z mov.b32 %r12876, %r12221; 2026-02-21T09:06:50.2391662Z mov.b32 %r12877, %r14449; 2026-02-21T09:06:50.2391721Z mov.b32 %r12878, %r14449; 2026-02-21T09:06:50.2391785Z // begin inline asm 2026-02-21T09:06:50.2394244Z // wait for regs: 
%r15177,%r15178,%r15179,%r15180,%r15181,%r15182,%r15183,%r15184,%r15185,%r15186,%r15187,%r15188,%r15189,%r15190,%r15191,%r15192,%r15193,%r15194,%r15195,%r15196,%r15197,%r15198,%r15199,%r15200,%r15201,%r15202,%r15203,%r15204,%r15205,%r15206,%r15207,%r15208,%r15209,%r15210,%r15211,%r15212,%r15213,%r15214,%r15215,%r15216,%r15217,%r15218,%r15219,%r15220,%r15221,%r15222,%r15223,%r15224,%r15225,%r15226,%r15227,%r15228,%r15229,%r15230,%r15231,%r15232,%r15233,%r15234,%r15235,%r15236,%r15237,%r15238,%r15239,%r15240,%r15241,%r15242,%r15243,%r15244,%r15245,%r15246,%r15247,%r15248,%r15249,%r15250,%r15251,%r15252,%r15253,%r15254,%r15255,%r15256,%r15257,%r15258,%r15259,%r15260,%r15261,%r15262,%r15263,%r15264,%r15265,%r15266,%r15267,%r15268,%r15269,%r15270,%r15271,%r15272,%r15273,%r15274,%r15275,%r15276,%r15277,%r15278,%r15279,%r15280,%r15281,%r15282,%r15283,%r15284,%r15285,%r15286,%r15287,%r15288,%r15289,%r15290,%r15291,%r15292,%r15293,%r15294,%r15295,%r15296,%r15297,%r15298,%r15299,%r15300,%r15301,%r15302,%r15303,%r15304,%r12876,%r12877,%r12878 2026-02-21T09:06:50.2394329Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:06:50.2394388Z // end inline asm 2026-02-21T09:06:50.2394442Z $L__tmp28: 2026-02-21T09:06:50.2394669Z .loc 1 47 124 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:47:124 2026-02-21T09:06:50.2394736Z add.s32 %r14637, %r1008, 16; 2026-02-21T09:06:50.2394954Z .loc 1 48 35 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:48:35 2026-02-21T09:06:50.2395029Z add.s32 %r14638, %r14637, %r9; 2026-02-21T09:06:50.2395232Z .loc 1 52 22 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:52:22 2026-02-21T09:06:50.2395301Z shl.b32 %r14639, %r14637, 1; 2026-02-21T09:06:50.2395496Z .loc 1 55 60 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:60 2026-02-21T09:06:50.2395617Z add.s32 %r14640, %r14585, %r14639; 2026-02-21T09:06:50.2395813Z .loc 1 55 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:32 
2026-02-21T09:06:50.2395894Z mad.wide.s32 %rd128, %r14640, 2, %rd22; 2026-02-21T09:06:50.2396084Z .loc 1 55 80 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:80 2026-02-21T09:06:50.2396146Z // begin inline asm 2026-02-21T09:06:50.2396203Z mov.u32 %r13010, 0x0; 2026-02-21T09:06:50.2396263Z mov.u32 %r13011, 0x0; 2026-02-21T09:06:50.2396319Z mov.u32 %r13012, 0x0; 2026-02-21T09:06:50.2396377Z mov.u32 %r13013, 0x0; 2026-02-21T09:06:50.2396643Z ld.global.v4.b32 { %r13010, %r13011, %r13012, %r13013 }, [ %rd128 + 0 ]; 2026-02-21T09:06:50.2396822Z // end inline asm 2026-02-21T09:06:50.2397092Z .loc 1 59 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:59:32 2026-02-21T09:06:50.2397151Z bar.sync 0; 2026-02-21T09:06:50.2397243Z st.shared.v2.b32 [%r839], {%r13010, %r13011}; 2026-02-21T09:06:50.2397330Z st.shared.v2.b32 [%r840], {%r13012, %r13013}; 2026-02-21T09:06:50.2397390Z bar.sync 0; 2026-02-21T09:06:50.2397462Z ld.shared.b16 %rs689, [%r841]; 2026-02-21T09:06:50.2397530Z ld.shared.b16 %rs690, [%r841+256]; 2026-02-21T09:06:50.2397597Z ld.shared.b16 %rs691, [%r841+16]; 2026-02-21T09:06:50.2397664Z ld.shared.b16 %rs692, [%r841+272]; 2026-02-21T09:06:50.2397729Z ld.shared.b16 %rs693, [%r842]; 2026-02-21T09:06:50.2397793Z ld.shared.b16 %rs694, [%r842+256]; 2026-02-21T09:06:50.2397857Z ld.shared.b16 %rs695, [%r842+16]; 2026-02-21T09:06:50.2397935Z ld.shared.b16 %rs696, [%r842+272]; 2026-02-21T09:06:50.2398002Z cvt.f32.bf16 %r13270, %rs689; 2026-02-21T09:06:50.2398063Z cvt.f32.bf16 %r13271, %rs690; 2026-02-21T09:06:50.2398131Z cvt.f32.bf16 %r13272, %rs693; 2026-02-21T09:06:50.2398263Z cvt.f32.bf16 %r13273, %rs694; 2026-02-21T09:06:50.2398326Z cvt.f32.bf16 %r13530, %rs691; 2026-02-21T09:06:50.2398385Z cvt.f32.bf16 %r13531, %rs692; 2026-02-21T09:06:50.2398450Z cvt.f32.bf16 %r13532, %rs695; 2026-02-21T09:06:50.2398510Z cvt.f32.bf16 %r13533, %rs696; 2026-02-21T09:06:50.2398718Z .loc 1 61 55 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:55 
2026-02-21T09:06:50.2398783Z shl.b32 %r14641, %r14638, 13; 2026-02-21T09:06:50.2398979Z .loc 1 61 62 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:62 2026-02-21T09:06:50.2399042Z add.s32 %r14642, %r15311, %r14641; 2026-02-21T09:06:50.2399107Z add.s32 %r14643, %r15312, %r14641; 2026-02-21T09:06:50.2399180Z add.s32 %r14644, %r15313, %r14641; 2026-02-21T09:06:50.2399245Z add.s32 %r14645, %r15314, %r14641; 2026-02-21T09:06:50.2399447Z .loc 1 61 34 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:34 2026-02-21T09:06:50.2399519Z cvt.s64.s32 %rd150, %r14642; 2026-02-21T09:06:50.2399585Z add.s64 %rd129, %rd23, %rd150; 2026-02-21T09:06:50.2399647Z cvt.s64.s32 %rd151, %r14643; 2026-02-21T09:06:50.2399715Z add.s64 %rd130, %rd23, %rd151; 2026-02-21T09:06:50.2399778Z cvt.s64.s32 %rd152, %r14644; 2026-02-21T09:06:50.2399840Z add.s64 %rd131, %rd23, %rd152; 2026-02-21T09:06:50.2399903Z cvt.s64.s32 %rd153, %r14645; 2026-02-21T09:06:50.2399965Z add.s64 %rd132, %rd23, %rd153; 2026-02-21T09:06:50.2400166Z .loc 1 61 87 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:87 2026-02-21T09:06:50.2400226Z // begin inline asm 2026-02-21T09:06:50.2400291Z mov.u16 %rs585, 0x0; 2026-02-21T09:06:50.2400366Z ld.global.b8 { %rs585 }, [ %rd129 + 0 ]; 2026-02-21T09:06:50.2400423Z // end inline asm 2026-02-21T09:06:50.2400484Z // begin inline asm 2026-02-21T09:06:50.2400541Z mov.u16 %rs586, 0x0; 2026-02-21T09:06:50.2400615Z ld.global.b8 { %rs586 }, [ %rd130 + 0 ]; 2026-02-21T09:06:50.2400673Z // end inline asm 2026-02-21T09:06:50.2400736Z // begin inline asm 2026-02-21T09:06:50.2400794Z mov.u16 %rs587, 0x0; 2026-02-21T09:06:50.2400864Z ld.global.b8 { %rs587 }, [ %rd131 + 0 ]; 2026-02-21T09:06:50.2401001Z // end inline asm 2026-02-21T09:06:50.2401070Z // begin inline asm 2026-02-21T09:06:50.2401129Z mov.u16 %rs588, 0x0; 2026-02-21T09:06:50.2401198Z ld.global.b8 { %rs588 }, [ %rd132 + 0 ]; 2026-02-21T09:06:50.2401256Z // end inline asm 
2026-02-21T09:06:50.2401454Z .loc 1 69 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:69:28 2026-02-21T09:06:50.2401511Z bar.sync 0; 2026-02-21T09:06:50.2401581Z st.shared.b8 [%r843], %rs585; 2026-02-21T09:06:50.2401650Z st.shared.b8 [%r844+512], %rs586; 2026-02-21T09:06:50.2401716Z st.shared.b8 [%r845+1024], %rs587; 2026-02-21T09:06:50.2401784Z st.shared.b8 [%r846+1536], %rs588; 2026-02-21T09:06:50.2401840Z bar.sync 0; 2026-02-21T09:06:50.2402018Z ld.shared.b32 %r14646, [%r847]; 2026-02-21T09:06:50.2402097Z prmt.b32 %r14647, %r14646, 0, 0x7770U; 2026-02-21T09:06:50.2402166Z cvt.u16.u32 %rs697, %r14647; 2026-02-21T09:06:50.2402236Z prmt.b32 %r14648, %r14646, 0, 0x7771U; 2026-02-21T09:06:50.2402302Z cvt.u16.u32 %rs698, %r14648; 2026-02-21T09:06:50.2402372Z prmt.b32 %r14649, %r14646, 0, 0x7772U; 2026-02-21T09:06:50.2402433Z cvt.u16.u32 %rs699, %r14649; 2026-02-21T09:06:50.2402498Z prmt.b32 %r14650, %r14646, 0, 0x7773U; 2026-02-21T09:06:50.2402558Z cvt.u16.u32 %rs700, %r14650; 2026-02-21T09:06:50.2402628Z ld.shared.b32 %r14651, [%r847+128]; 2026-02-21T09:06:50.2402692Z prmt.b32 %r14652, %r14651, 0, 0x7770U; 2026-02-21T09:06:50.2402752Z cvt.u16.u32 %rs701, %r14652; 2026-02-21T09:06:50.2402826Z prmt.b32 %r14653, %r14651, 0, 0x7771U; 2026-02-21T09:06:50.2402886Z cvt.u16.u32 %rs702, %r14653; 2026-02-21T09:06:50.2402958Z prmt.b32 %r14654, %r14651, 0, 0x7772U; 2026-02-21T09:06:50.2403021Z cvt.u16.u32 %rs703, %r14654; 2026-02-21T09:06:50.2403091Z prmt.b32 %r14655, %r14651, 0, 0x7773U; 2026-02-21T09:06:50.2403203Z cvt.u16.u32 %rs704, %r14655; 2026-02-21T09:06:50.2403399Z .loc 1 64 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:64:28 2026-02-21T09:06:50.2403468Z shl.b16 %rs705, %rs697, 4; 2026-02-21T09:06:50.2403532Z shl.b16 %rs706, %rs698, 4; 2026-02-21T09:06:50.2403592Z shl.b16 %rs707, %rs699, 4; 2026-02-21T09:06:50.2403659Z shl.b16 %rs708, %rs700, 4; 2026-02-21T09:06:50.2403719Z shl.b16 %rs709, %rs701, 4; 2026-02-21T09:06:50.2403777Z shl.b16 
%rs710, %rs702, 4; 2026-02-21T09:06:50.2403837Z shl.b16 %rs711, %rs703, 4; 2026-02-21T09:06:50.2403901Z shl.b16 %rs712, %rs704, 4; 2026-02-21T09:06:50.2404094Z .loc 1 79 58 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:79:58 2026-02-21T09:06:50.2404163Z selp.b16 %rs713, %rs705, %rs697, %p81; 2026-02-21T09:06:50.2404227Z cvt.s16.s8 %rs714, %rs713; 2026-02-21T09:06:50.2404288Z shr.s16 %rs715, %rs714, 4; 2026-02-21T09:06:50.2404358Z selp.b16 %rs716, %rs706, %rs698, %p81; 2026-02-21T09:06:50.2404417Z cvt.s16.s8 %rs717, %rs716; 2026-02-21T09:06:50.2404484Z shr.s16 %rs718, %rs717, 4; 2026-02-21T09:06:50.2404555Z selp.b16 %rs719, %rs707, %rs699, %p81; 2026-02-21T09:06:50.2404615Z cvt.s16.s8 %rs720, %rs719; 2026-02-21T09:06:50.2404676Z shr.s16 %rs721, %rs720, 4; 2026-02-21T09:06:50.2404744Z selp.b16 %rs722, %rs708, %rs700, %p81; 2026-02-21T09:06:50.2404804Z cvt.s16.s8 %rs723, %rs722; 2026-02-21T09:06:50.2404867Z shr.s16 %rs724, %rs723, 4; 2026-02-21T09:06:50.2404933Z selp.b16 %rs725, %rs709, %rs701, %p81; 2026-02-21T09:06:50.2404995Z cvt.s16.s8 %rs726, %rs725; 2026-02-21T09:06:50.2405054Z shr.s16 %rs727, %rs726, 4; 2026-02-21T09:06:50.2405125Z selp.b16 %rs728, %rs710, %rs702, %p81; 2026-02-21T09:06:50.2405184Z cvt.s16.s8 %rs729, %rs728; 2026-02-21T09:06:50.2405244Z shr.s16 %rs730, %rs729, 4; 2026-02-21T09:06:50.2405315Z selp.b16 %rs731, %rs711, %rs703, %p81; 2026-02-21T09:06:50.2405378Z cvt.s16.s8 %rs732, %rs731; 2026-02-21T09:06:50.2405439Z shr.s16 %rs733, %rs732, 4; 2026-02-21T09:06:50.2405504Z selp.b16 %rs734, %rs712, %rs704, %p81; 2026-02-21T09:06:50.2405567Z cvt.s16.s8 %rs735, %rs734; 2026-02-21T09:06:50.2405687Z shr.s16 %rs736, %rs735, 4; 2026-02-21T09:06:50.2405879Z .loc 1 84 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:84:32 2026-02-21T09:06:50.2405950Z cvt.rn.f32.s16 %r14656, %rs715; 2026-02-21T09:06:50.2406015Z cvt.rn.f32.s16 %r14657, %rs718; 2026-02-21T09:06:50.2406077Z cvt.rn.f32.s16 %r14658, %rs721; 2026-02-21T09:06:50.2406156Z 
cvt.rn.f32.s16 %r14659, %rs724; 2026-02-21T09:06:50.2406220Z cvt.rn.f32.s16 %r14660, %rs727; 2026-02-21T09:06:50.2406281Z cvt.rn.f32.s16 %r14661, %rs730; 2026-02-21T09:06:50.2406343Z cvt.rn.f32.s16 %r14662, %rs733; 2026-02-21T09:06:50.2406409Z cvt.rn.f32.s16 %r14663, %rs736; 2026-02-21T09:06:50.2406600Z bar.sync 0; 2026-02-21T09:06:50.2406669Z st.shared.b32 [%r848], %r14656; 2026-02-21T09:06:50.2406886Z st.shared.b32 [%r848+8], %r14657; 2026-02-21T09:06:50.2406954Z st.shared.b32 [%r849], %r14658; 2026-02-21T09:06:50.2407017Z st.shared.b32 [%r849+8], %r14659; 2026-02-21T09:06:50.2407084Z st.shared.b32 [%r850], %r14660; 2026-02-21T09:06:50.2407151Z st.shared.b32 [%r850+8], %r14661; 2026-02-21T09:06:50.2407214Z st.shared.b32 [%r851], %r14662; 2026-02-21T09:06:50.2407277Z st.shared.b32 [%r851+8], %r14663; 2026-02-21T09:06:50.2407347Z $L__tmp29: 2026-02-21T09:06:50.2407635Z .loc 2 291 36 // standard.py:291:36 @[ cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:91:40 ] 2026-02-21T09:06:50.2407697Z // begin inline asm 2026-02-21T09:06:50.2407780Z fence.proxy.async.shared::cta; 2026-02-21T09:06:50.2407837Z // end inline asm 2026-02-21T09:06:50.2407892Z bar.sync 0; 2026-02-21T09:06:50.2407966Z wgmma.fence.sync.aligned; 2026-02-21T09:06:50.2408028Z // begin inline asm 2026-02-21T09:06:50.2410827Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r15177,%r15178,%r15179,%r15180,%r15181,%r15182,%r15183,%r15184,%r15185,%r15186,%r15187,%r15188,%r15189,%r15190,%r15191,%r15192,%r15193,%r15194,%r15195,%r15196,%r15197,%r15198,%r15199,%r15200,%r15201,%r15202,%r15203,%r15204,%r15205,%r15206,%r15207,%r15208,%r15209,%r15210,%r15211,%r15212,%r15213,%r15214,%r15215,%r15216,%r15217,%r15218,%r15219,%r15220,%r15221,%r15222,%r15223,%r15224,%r15225,%r15226,%r15227,%r15228,%r15229,%r15230,%r15231,%r15232,%r15233,%r15234,%r15235,%r15236,%r15237,%r15238,%r15239,%r15240,%r15241,%r15242,%r15243,%r15244,%r15245,%r15246,%r15247,%r15248,%r15249,%r15250,%r15251,%r15252,%r15253,%r15254,%r15255,%r15256,%r15257,%r15258,%r15259,%r15260,%r15261,%r15262,%r15263,%r15264,%r15265,%r15266,%r15267,%r15268,%r15269,%r15270,%r15271,%r15272,%r15273,%r15274,%r15275,%r15276,%r15277,%r15278,%r15279,%r15280,%r15281,%r15282,%r15283,%r15284,%r15285,%r15286,%r15287,%r15288,%r15289,%r15290,%r15291,%r15292,%r15293,%r15294,%r15295,%r15296,%r15297,%r15298,%r15299,%r15300,%r15301,%r15302,%r15303,%r15304}, {%r13270,%r13271,%r13272,%r13273}, %rd119, %p67, 1, 1; 2026-02-21T09:06:50.2410898Z // end inline asm 2026-02-21T09:06:50.2410957Z // begin inline asm 2026-02-21T09:06:50.2413687Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r15177,%r15178,%r15179,%r15180,%r15181,%r15182,%r15183,%r15184,%r15185,%r15186,%r15187,%r15188,%r15189,%r15190,%r15191,%r15192,%r15193,%r15194,%r15195,%r15196,%r15197,%r15198,%r15199,%r15200,%r15201,%r15202,%r15203,%r15204,%r15205,%r15206,%r15207,%r15208,%r15209,%r15210,%r15211,%r15212,%r15213,%r15214,%r15215,%r15216,%r15217,%r15218,%r15219,%r15220,%r15221,%r15222,%r15223,%r15224,%r15225,%r15226,%r15227,%r15228,%r15229,%r15230,%r15231,%r15232,%r15233,%r15234,%r15235,%r15236,%r15237,%r15238,%r15239,%r15240,%r15241,%r15242,%r15243,%r15244,%r15245,%r15246,%r15247,%r15248,%r15249,%r15250,%r15251,%r15252,%r15253,%r15254,%r15255,%r15256,%r15257,%r15258,%r15259,%r15260,%r15261,%r15262,%r15263,%r15264,%r15265,%r15266,%r15267,%r15268,%r15269,%r15270,%r15271,%r15272,%r15273,%r15274,%r15275,%r15276,%r15277,%r15278,%r15279,%r15280,%r15281,%r15282,%r15283,%r15284,%r15285,%r15286,%r15287,%r15288,%r15289,%r15290,%r15291,%r15292,%r15293,%r15294,%r15295,%r15296,%r15297,%r15298,%r15299,%r15300,%r15301,%r15302,%r15303,%r15304}, {%r13530,%r13531,%r13532,%r13533}, %rd120, %p67, 1, 1; 2026-02-21T09:06:50.2413815Z // end inline asm 2026-02-21T09:06:50.2413891Z wgmma.commit_group.sync.aligned; 2026-02-21T09:06:50.2413957Z mov.b32 %r13662, %r12221; 2026-02-21T09:06:50.2414017Z mov.b32 %r13663, %r14449; 2026-02-21T09:06:50.2414076Z mov.b32 %r13664, %r14449; 2026-02-21T09:06:50.2414136Z // begin inline asm 2026-02-21T09:06:50.2416850Z // wait for regs: 
%r15177,%r15178,%r15179,%r15180,%r15181,%r15182,%r15183,%r15184,%r15185,%r15186,%r15187,%r15188,%r15189,%r15190,%r15191,%r15192,%r15193,%r15194,%r15195,%r15196,%r15197,%r15198,%r15199,%r15200,%r15201,%r15202,%r15203,%r15204,%r15205,%r15206,%r15207,%r15208,%r15209,%r15210,%r15211,%r15212,%r15213,%r15214,%r15215,%r15216,%r15217,%r15218,%r15219,%r15220,%r15221,%r15222,%r15223,%r15224,%r15225,%r15226,%r15227,%r15228,%r15229,%r15230,%r15231,%r15232,%r15233,%r15234,%r15235,%r15236,%r15237,%r15238,%r15239,%r15240,%r15241,%r15242,%r15243,%r15244,%r15245,%r15246,%r15247,%r15248,%r15249,%r15250,%r15251,%r15252,%r15253,%r15254,%r15255,%r15256,%r15257,%r15258,%r15259,%r15260,%r15261,%r15262,%r15263,%r15264,%r15265,%r15266,%r15267,%r15268,%r15269,%r15270,%r15271,%r15272,%r15273,%r15274,%r15275,%r15276,%r15277,%r15278,%r15279,%r15280,%r15281,%r15282,%r15283,%r15284,%r15285,%r15286,%r15287,%r15288,%r15289,%r15290,%r15291,%r15292,%r15293,%r15294,%r15295,%r15296,%r15297,%r15298,%r15299,%r15300,%r15301,%r15302,%r15303,%r15304,%r13662,%r13663,%r13664 2026-02-21T09:06:50.2417008Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:06:50.2417067Z // end inline asm 2026-02-21T09:06:50.2417122Z $L__tmp30: 2026-02-21T09:06:50.2417340Z .loc 1 47 124 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:47:124 2026-02-21T09:06:50.2417407Z add.s32 %r14664, %r1008, 24; 2026-02-21T09:06:50.2417665Z .loc 1 48 35 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:48:35 2026-02-21T09:06:50.2417731Z add.s32 %r14665, %r14664, %r9; 2026-02-21T09:06:50.2417934Z .loc 1 52 22 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:52:22 2026-02-21T09:06:50.2417998Z shl.b32 %r14666, %r14664, 1; 2026-02-21T09:06:50.2418187Z .loc 1 55 60 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:60 2026-02-21T09:06:50.2418262Z add.s32 %r14667, %r14585, %r14666; 2026-02-21T09:06:50.2418457Z .loc 1 55 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:32 
2026-02-21T09:06:50.2418532Z mad.wide.s32 %rd135, %r14667, 2, %rd22; 2026-02-21T09:06:50.2418724Z .loc 1 55 80 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:55:80 2026-02-21T09:06:50.2418783Z // begin inline asm 2026-02-21T09:06:50.2418845Z mov.u32 %r13796, 0x0; 2026-02-21T09:06:50.2418908Z mov.u32 %r13797, 0x0; 2026-02-21T09:06:50.2418965Z mov.u32 %r13798, 0x0; 2026-02-21T09:06:50.2419022Z mov.u32 %r13799, 0x0; 2026-02-21T09:06:50.2419160Z ld.global.v4.b32 { %r13796, %r13797, %r13798, %r13799 }, [ %rd135 + 0 ]; 2026-02-21T09:06:50.2419220Z // end inline asm 2026-02-21T09:06:50.2419411Z .loc 1 59 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:59:32 2026-02-21T09:06:50.2419466Z bar.sync 0; 2026-02-21T09:06:50.2419553Z st.shared.v2.b32 [%r839], {%r13796, %r13797}; 2026-02-21T09:06:50.2419633Z st.shared.v2.b32 [%r840], {%r13798, %r13799}; 2026-02-21T09:06:50.2419688Z bar.sync 0; 2026-02-21T09:06:50.2419753Z ld.shared.b16 %rs737, [%r841]; 2026-02-21T09:06:50.2419823Z ld.shared.b16 %rs738, [%r841+256]; 2026-02-21T09:06:50.2419890Z ld.shared.b16 %rs739, [%r841+16]; 2026-02-21T09:06:50.2419952Z ld.shared.b16 %rs740, [%r841+272]; 2026-02-21T09:06:50.2420021Z ld.shared.b16 %rs741, [%r842]; 2026-02-21T09:06:50.2420089Z ld.shared.b16 %rs742, [%r842+256]; 2026-02-21T09:06:50.2420152Z ld.shared.b16 %rs743, [%r842+16]; 2026-02-21T09:06:50.2420218Z ld.shared.b16 %rs744, [%r842+272]; 2026-02-21T09:06:50.2420376Z cvt.f32.bf16 %r14056, %rs737; 2026-02-21T09:06:50.2420451Z cvt.f32.bf16 %r14057, %rs738; 2026-02-21T09:06:50.2420516Z cvt.f32.bf16 %r14058, %rs741; 2026-02-21T09:06:50.2420581Z cvt.f32.bf16 %r14059, %rs742; 2026-02-21T09:06:50.2420642Z cvt.f32.bf16 %r14316, %rs739; 2026-02-21T09:06:50.2420701Z cvt.f32.bf16 %r14317, %rs740; 2026-02-21T09:06:50.2420762Z cvt.f32.bf16 %r14318, %rs743; 2026-02-21T09:06:50.2420822Z cvt.f32.bf16 %r14319, %rs744; 2026-02-21T09:06:50.2421015Z .loc 1 61 55 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:55 
2026-02-21T09:06:50.2421076Z shl.b32 %r14668, %r14665, 13; 2026-02-21T09:06:50.2421270Z .loc 1 61 62 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:62 2026-02-21T09:06:50.2421433Z add.s32 %r14669, %r15311, %r14668; 2026-02-21T09:06:50.2421497Z add.s32 %r14670, %r15312, %r14668; 2026-02-21T09:06:50.2421564Z add.s32 %r14671, %r15313, %r14668; 2026-02-21T09:06:50.2421639Z add.s32 %r14672, %r15314, %r14668; 2026-02-21T09:06:50.2421832Z .loc 1 61 34 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:34 2026-02-21T09:06:50.2421898Z cvt.s64.s32 %rd154, %r14669; 2026-02-21T09:06:50.2421962Z add.s64 %rd136, %rd23, %rd154; 2026-02-21T09:06:50.2422025Z cvt.s64.s32 %rd155, %r14670; 2026-02-21T09:06:50.2422089Z add.s64 %rd137, %rd23, %rd155; 2026-02-21T09:06:50.2422152Z cvt.s64.s32 %rd156, %r14671; 2026-02-21T09:06:50.2422213Z add.s64 %rd138, %rd23, %rd156; 2026-02-21T09:06:50.2422272Z cvt.s64.s32 %rd157, %r14672; 2026-02-21T09:06:50.2422334Z add.s64 %rd139, %rd23, %rd157; 2026-02-21T09:06:50.2422523Z .loc 1 61 87 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:61:87 2026-02-21T09:06:50.2422585Z // begin inline asm 2026-02-21T09:06:50.2422699Z mov.u16 %rs589, 0x0; 2026-02-21T09:06:50.2422774Z ld.global.b8 { %rs589 }, [ %rd136 + 0 ]; 2026-02-21T09:06:50.2422832Z // end inline asm 2026-02-21T09:06:50.2422893Z // begin inline asm 2026-02-21T09:06:50.2422954Z mov.u16 %rs590, 0x0; 2026-02-21T09:06:50.2423025Z ld.global.b8 { %rs590 }, [ %rd137 + 0 ]; 2026-02-21T09:06:50.2423080Z // end inline asm 2026-02-21T09:06:50.2423142Z // begin inline asm 2026-02-21T09:06:50.2423202Z mov.u16 %rs591, 0x0; 2026-02-21T09:06:50.2423269Z ld.global.b8 { %rs591 }, [ %rd138 + 0 ]; 2026-02-21T09:06:50.2423325Z // end inline asm 2026-02-21T09:06:50.2423386Z // begin inline asm 2026-02-21T09:06:50.2423454Z mov.u16 %rs592, 0x0; 2026-02-21T09:06:50.2423529Z ld.global.b8 { %rs592 }, [ %rd139 + 0 ]; 2026-02-21T09:06:50.2423587Z // end inline asm 
2026-02-21T09:06:50.2423778Z .loc 1 69 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:69:28 2026-02-21T09:06:50.2423834Z bar.sync 0; 2026-02-21T09:06:50.2423900Z st.shared.b8 [%r843], %rs589; 2026-02-21T09:06:50.2423971Z st.shared.b8 [%r844+512], %rs590; 2026-02-21T09:06:50.2424035Z st.shared.b8 [%r845+1024], %rs591; 2026-02-21T09:06:50.2424103Z st.shared.b8 [%r846+1536], %rs592; 2026-02-21T09:06:50.2424163Z bar.sync 0; 2026-02-21T09:06:50.2424229Z ld.shared.b32 %r14673, [%r847]; 2026-02-21T09:06:50.2424297Z prmt.b32 %r14674, %r14673, 0, 0x7770U; 2026-02-21T09:06:50.2424357Z cvt.u16.u32 %rs745, %r14674; 2026-02-21T09:06:50.2424425Z prmt.b32 %r14675, %r14673, 0, 0x7771U; 2026-02-21T09:06:50.2424485Z cvt.u16.u32 %rs746, %r14675; 2026-02-21T09:06:50.2424550Z prmt.b32 %r14676, %r14673, 0, 0x7772U; 2026-02-21T09:06:50.2424612Z cvt.u16.u32 %rs747, %r14676; 2026-02-21T09:06:50.2424676Z prmt.b32 %r14677, %r14673, 0, 0x7773U; 2026-02-21T09:06:50.2424736Z cvt.u16.u32 %rs748, %r14677; 2026-02-21T09:06:50.2424806Z ld.shared.b32 %r14678, [%r847+128]; 2026-02-21T09:06:50.2424872Z prmt.b32 %r14679, %r14678, 0, 0x7770U; 2026-02-21T09:06:50.2424938Z cvt.u16.u32 %rs749, %r14679; 2026-02-21T09:06:50.2425004Z prmt.b32 %r14680, %r14678, 0, 0x7771U; 2026-02-21T09:06:50.2425066Z cvt.u16.u32 %rs750, %r14680; 2026-02-21T09:06:50.2425191Z prmt.b32 %r14681, %r14678, 0, 0x7772U; 2026-02-21T09:06:50.2425252Z cvt.u16.u32 %rs751, %r14681; 2026-02-21T09:06:50.2425318Z prmt.b32 %r14682, %r14678, 0, 0x7773U; 2026-02-21T09:06:50.2425377Z cvt.u16.u32 %rs752, %r14682; 2026-02-21T09:06:50.2425570Z .loc 1 64 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:64:28 2026-02-21T09:06:50.2425647Z shl.b16 %rs753, %rs745, 4; 2026-02-21T09:06:50.2425710Z shl.b16 %rs754, %rs746, 4; 2026-02-21T09:06:50.2425771Z shl.b16 %rs755, %rs747, 4; 2026-02-21T09:06:50.2425834Z shl.b16 %rs756, %rs748, 4; 2026-02-21T09:06:50.2425893Z shl.b16 %rs757, %rs749, 4; 2026-02-21T09:06:50.2425951Z shl.b16 
%rs758, %rs750, 4; 2026-02-21T09:06:50.2426010Z shl.b16 %rs759, %rs751, 4; 2026-02-21T09:06:50.2426175Z shl.b16 %rs760, %rs752, 4; 2026-02-21T09:06:50.2426375Z .loc 1 79 58 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:79:58 2026-02-21T09:06:50.2426445Z selp.b16 %rs761, %rs753, %rs745, %p81; 2026-02-21T09:06:50.2426638Z cvt.s16.s8 %rs762, %rs761; 2026-02-21T09:06:50.2426698Z shr.s16 %rs763, %rs762, 4; 2026-02-21T09:06:50.2426767Z selp.b16 %rs764, %rs754, %rs746, %p81; 2026-02-21T09:06:50.2426828Z cvt.s16.s8 %rs765, %rs764; 2026-02-21T09:06:50.2426889Z shr.s16 %rs766, %rs765, 4; 2026-02-21T09:06:50.2426956Z selp.b16 %rs767, %rs755, %rs747, %p81; 2026-02-21T09:06:50.2427017Z cvt.s16.s8 %rs768, %rs767; 2026-02-21T09:06:50.2427080Z shr.s16 %rs769, %rs768, 4; 2026-02-21T09:06:50.2427147Z selp.b16 %rs770, %rs756, %rs748, %p81; 2026-02-21T09:06:50.2427205Z cvt.s16.s8 %rs771, %rs770; 2026-02-21T09:06:50.2427264Z shr.s16 %rs772, %rs771, 4; 2026-02-21T09:06:50.2427334Z selp.b16 %rs773, %rs757, %rs749, %p81; 2026-02-21T09:06:50.2427395Z cvt.s16.s8 %rs774, %rs773; 2026-02-21T09:06:50.2427531Z shr.s16 %rs775, %rs774, 4; 2026-02-21T09:06:50.2427605Z selp.b16 %rs776, %rs758, %rs750, %p81; 2026-02-21T09:06:50.2427666Z cvt.s16.s8 %rs777, %rs776; 2026-02-21T09:06:50.2427726Z shr.s16 %rs778, %rs777, 4; 2026-02-21T09:06:50.2427794Z selp.b16 %rs779, %rs759, %rs751, %p81; 2026-02-21T09:06:50.2427853Z cvt.s16.s8 %rs780, %rs779; 2026-02-21T09:06:50.2427913Z shr.s16 %rs781, %rs780, 4; 2026-02-21T09:06:50.2427978Z selp.b16 %rs782, %rs760, %rs752, %p81; 2026-02-21T09:06:50.2428041Z cvt.s16.s8 %rs783, %rs782; 2026-02-21T09:06:50.2428112Z shr.s16 %rs784, %rs783, 4; 2026-02-21T09:06:50.2428385Z .loc 1 84 32 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:84:32 2026-02-21T09:06:50.2428455Z cvt.rn.f32.s16 %r14683, %rs763; 2026-02-21T09:06:50.2428518Z cvt.rn.f32.s16 %r14684, %rs766; 2026-02-21T09:06:50.2428578Z cvt.rn.f32.s16 %r14685, %rs769; 2026-02-21T09:06:50.2428639Z 
cvt.rn.f32.s16 %r14686, %rs772; 2026-02-21T09:06:50.2428705Z cvt.rn.f32.s16 %r14687, %rs775; 2026-02-21T09:06:50.2428769Z cvt.rn.f32.s16 %r14688, %rs778; 2026-02-21T09:06:50.2428830Z cvt.rn.f32.s16 %r14689, %rs781; 2026-02-21T09:06:50.2428892Z cvt.rn.f32.s16 %r14690, %rs784; 2026-02-21T09:06:50.2428947Z bar.sync 0; 2026-02-21T09:06:50.2429009Z st.shared.b32 [%r848], %r14683; 2026-02-21T09:06:50.2429075Z st.shared.b32 [%r848+8], %r14684; 2026-02-21T09:06:50.2429139Z st.shared.b32 [%r849], %r14685; 2026-02-21T09:06:50.2429202Z st.shared.b32 [%r849+8], %r14686; 2026-02-21T09:06:50.2429264Z st.shared.b32 [%r850], %r14687; 2026-02-21T09:06:50.2429331Z st.shared.b32 [%r850+8], %r14688; 2026-02-21T09:06:50.2429393Z st.shared.b32 [%r851], %r14689; 2026-02-21T09:06:50.2429455Z st.shared.b32 [%r851+8], %r14690; 2026-02-21T09:06:50.2429513Z $L__tmp31: 2026-02-21T09:06:50.2429778Z .loc 2 291 36 // standard.py:291:36 @[ cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:91:40 ] 2026-02-21T09:06:50.2429843Z // begin inline asm 2026-02-21T09:06:50.2429921Z fence.proxy.async.shared::cta; 2026-02-21T09:06:50.2429980Z // end inline asm 2026-02-21T09:06:50.2430038Z bar.sync 0; 2026-02-21T09:06:50.2430111Z wgmma.fence.sync.aligned; 2026-02-21T09:06:50.2430265Z // begin inline asm 2026-02-21T09:06:50.2433069Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r15177,%r15178,%r15179,%r15180,%r15181,%r15182,%r15183,%r15184,%r15185,%r15186,%r15187,%r15188,%r15189,%r15190,%r15191,%r15192,%r15193,%r15194,%r15195,%r15196,%r15197,%r15198,%r15199,%r15200,%r15201,%r15202,%r15203,%r15204,%r15205,%r15206,%r15207,%r15208,%r15209,%r15210,%r15211,%r15212,%r15213,%r15214,%r15215,%r15216,%r15217,%r15218,%r15219,%r15220,%r15221,%r15222,%r15223,%r15224,%r15225,%r15226,%r15227,%r15228,%r15229,%r15230,%r15231,%r15232,%r15233,%r15234,%r15235,%r15236,%r15237,%r15238,%r15239,%r15240,%r15241,%r15242,%r15243,%r15244,%r15245,%r15246,%r15247,%r15248,%r15249,%r15250,%r15251,%r15252,%r15253,%r15254,%r15255,%r15256,%r15257,%r15258,%r15259,%r15260,%r15261,%r15262,%r15263,%r15264,%r15265,%r15266,%r15267,%r15268,%r15269,%r15270,%r15271,%r15272,%r15273,%r15274,%r15275,%r15276,%r15277,%r15278,%r15279,%r15280,%r15281,%r15282,%r15283,%r15284,%r15285,%r15286,%r15287,%r15288,%r15289,%r15290,%r15291,%r15292,%r15293,%r15294,%r15295,%r15296,%r15297,%r15298,%r15299,%r15300,%r15301,%r15302,%r15303,%r15304}, {%r14056,%r14057,%r14058,%r14059}, %rd119, %p67, 1, 1; 2026-02-21T09:06:50.2433191Z // end inline asm 2026-02-21T09:06:50.2433250Z // begin inline asm 2026-02-21T09:06:50.2436017Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r15177,%r15178,%r15179,%r15180,%r15181,%r15182,%r15183,%r15184,%r15185,%r15186,%r15187,%r15188,%r15189,%r15190,%r15191,%r15192,%r15193,%r15194,%r15195,%r15196,%r15197,%r15198,%r15199,%r15200,%r15201,%r15202,%r15203,%r15204,%r15205,%r15206,%r15207,%r15208,%r15209,%r15210,%r15211,%r15212,%r15213,%r15214,%r15215,%r15216,%r15217,%r15218,%r15219,%r15220,%r15221,%r15222,%r15223,%r15224,%r15225,%r15226,%r15227,%r15228,%r15229,%r15230,%r15231,%r15232,%r15233,%r15234,%r15235,%r15236,%r15237,%r15238,%r15239,%r15240,%r15241,%r15242,%r15243,%r15244,%r15245,%r15246,%r15247,%r15248,%r15249,%r15250,%r15251,%r15252,%r15253,%r15254,%r15255,%r15256,%r15257,%r15258,%r15259,%r15260,%r15261,%r15262,%r15263,%r15264,%r15265,%r15266,%r15267,%r15268,%r15269,%r15270,%r15271,%r15272,%r15273,%r15274,%r15275,%r15276,%r15277,%r15278,%r15279,%r15280,%r15281,%r15282,%r15283,%r15284,%r15285,%r15286,%r15287,%r15288,%r15289,%r15290,%r15291,%r15292,%r15293,%r15294,%r15295,%r15296,%r15297,%r15298,%r15299,%r15300,%r15301,%r15302,%r15303,%r15304}, {%r14316,%r14317,%r14318,%r14319}, %rd120, %p67, 1, 1; 2026-02-21T09:06:50.2436083Z // end inline asm 2026-02-21T09:06:50.2436158Z wgmma.commit_group.sync.aligned; 2026-02-21T09:06:50.2436224Z mov.b32 %r14448, %r12221; 2026-02-21T09:06:50.2436282Z mov.b32 %r14450, %r14449; 2026-02-21T09:06:50.2436350Z // begin inline asm 2026-02-21T09:06:50.2439007Z // wait for regs: 
%r15177,%r15178,%r15179,%r15180,%r15181,%r15182,%r15183,%r15184,%r15185,%r15186,%r15187,%r15188,%r15189,%r15190,%r15191,%r15192,%r15193,%r15194,%r15195,%r15196,%r15197,%r15198,%r15199,%r15200,%r15201,%r15202,%r15203,%r15204,%r15205,%r15206,%r15207,%r15208,%r15209,%r15210,%r15211,%r15212,%r15213,%r15214,%r15215,%r15216,%r15217,%r15218,%r15219,%r15220,%r15221,%r15222,%r15223,%r15224,%r15225,%r15226,%r15227,%r15228,%r15229,%r15230,%r15231,%r15232,%r15233,%r15234,%r15235,%r15236,%r15237,%r15238,%r15239,%r15240,%r15241,%r15242,%r15243,%r15244,%r15245,%r15246,%r15247,%r15248,%r15249,%r15250,%r15251,%r15252,%r15253,%r15254,%r15255,%r15256,%r15257,%r15258,%r15259,%r15260,%r15261,%r15262,%r15263,%r15264,%r15265,%r15266,%r15267,%r15268,%r15269,%r15270,%r15271,%r15272,%r15273,%r15274,%r15275,%r15276,%r15277,%r15278,%r15279,%r15280,%r15281,%r15282,%r15283,%r15284,%r15285,%r15286,%r15287,%r15288,%r15289,%r15290,%r15291,%r15292,%r15293,%r15294,%r15295,%r15296,%r15297,%r15298,%r15299,%r15300,%r15301,%r15302,%r15303,%r15304,%r14448,%r14449,%r14450 2026-02-21T09:06:50.2439093Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:06:50.2439152Z // end inline asm 2026-02-21T09:06:50.2439206Z $L__tmp32: 2026-02-21T09:06:50.2439418Z .loc 1 26 144 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:26:144 2026-02-21T09:06:50.2439589Z setp.ne.b32 %p76, %r15307, 15; 2026-02-21T09:06:50.2439651Z @%p76 bra $L__BB0_15; 2026-02-21T09:06:50.2439764Z // %bb.14: // in Loop: Header=BB0_11 Depth=1 2026-02-21T09:06:50.2439964Z .loc 1 94 28 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:94:28 2026-02-21T09:06:50.2440054Z cvt.rn.bf16x2.f32 %r14695, %r15178, %r15177; 2026-02-21T09:06:50.2440134Z cvt.rn.bf16x2.f32 %r14696, %r15180, %r15179; 2026-02-21T09:06:50.2440210Z cvt.rn.bf16x2.f32 %r14697, %r15182, %r15181; 2026-02-21T09:06:50.2440289Z cvt.rn.bf16x2.f32 %r14698, %r15184, %r15183; 2026-02-21T09:06:50.2440363Z cvt.rn.bf16x2.f32 %r14699, %r15186, %r15185; 2026-02-21T09:06:50.2440579Z 
cvt.rn.bf16x2.f32 %r14700, %r15188, %r15187; 2026-02-21T09:06:50.2440661Z cvt.rn.bf16x2.f32 %r14701, %r15190, %r15189; 2026-02-21T09:06:50.2440737Z cvt.rn.bf16x2.f32 %r14702, %r15192, %r15191; 2026-02-21T09:06:50.2440812Z cvt.rn.bf16x2.f32 %r14703, %r15194, %r15193; 2026-02-21T09:06:50.2440894Z cvt.rn.bf16x2.f32 %r14704, %r15196, %r15195; 2026-02-21T09:06:50.2440969Z cvt.rn.bf16x2.f32 %r14705, %r15198, %r15197; 2026-02-21T09:06:50.2441044Z cvt.rn.bf16x2.f32 %r14706, %r15200, %r15199; 2026-02-21T09:06:50.2441125Z cvt.rn.bf16x2.f32 %r14707, %r15202, %r15201; 2026-02-21T09:06:50.2441199Z cvt.rn.bf16x2.f32 %r14708, %r15204, %r15203; 2026-02-21T09:06:50.2441273Z cvt.rn.bf16x2.f32 %r14709, %r15206, %r15205; 2026-02-21T09:06:50.2441352Z cvt.rn.bf16x2.f32 %r14710, %r15208, %r15207; 2026-02-21T09:06:50.2441424Z cvt.rn.bf16x2.f32 %r14711, %r15210, %r15209; 2026-02-21T09:06:50.2441498Z cvt.rn.bf16x2.f32 %r14712, %r15212, %r15211; 2026-02-21T09:06:50.2441576Z cvt.rn.bf16x2.f32 %r14713, %r15214, %r15213; 2026-02-21T09:06:50.2441738Z cvt.rn.bf16x2.f32 %r14714, %r15216, %r15215; 2026-02-21T09:06:50.2441818Z cvt.rn.bf16x2.f32 %r14715, %r15218, %r15217; 2026-02-21T09:06:50.2441892Z cvt.rn.bf16x2.f32 %r14716, %r15220, %r15219; 2026-02-21T09:06:50.2441974Z cvt.rn.bf16x2.f32 %r14717, %r15222, %r15221; 2026-02-21T09:06:50.2442049Z cvt.rn.bf16x2.f32 %r14718, %r15224, %r15223; 2026-02-21T09:06:50.2442123Z cvt.rn.bf16x2.f32 %r14719, %r15226, %r15225; 2026-02-21T09:06:50.2442209Z cvt.rn.bf16x2.f32 %r14720, %r15228, %r15227; 2026-02-21T09:06:50.2442291Z cvt.rn.bf16x2.f32 %r14721, %r15230, %r15229; 2026-02-21T09:06:50.2442367Z cvt.rn.bf16x2.f32 %r14722, %r15232, %r15231; 2026-02-21T09:06:50.2442446Z cvt.rn.bf16x2.f32 %r14723, %r15234, %r15233; 2026-02-21T09:06:50.2442521Z cvt.rn.bf16x2.f32 %r14724, %r15236, %r15235; 2026-02-21T09:06:50.2442596Z cvt.rn.bf16x2.f32 %r14725, %r15238, %r15237; 2026-02-21T09:06:50.2442671Z cvt.rn.bf16x2.f32 %r14726, %r15240, %r15239; 2026-02-21T09:06:50.2442753Z 
cvt.rn.bf16x2.f32 %r14727, %r15242, %r15241; 2026-02-21T09:06:50.2442829Z cvt.rn.bf16x2.f32 %r14728, %r15244, %r15243; 2026-02-21T09:06:50.2442904Z cvt.rn.bf16x2.f32 %r14729, %r15246, %r15245; 2026-02-21T09:06:50.2442982Z cvt.rn.bf16x2.f32 %r14730, %r15248, %r15247; 2026-02-21T09:06:50.2443059Z cvt.rn.bf16x2.f32 %r14731, %r15250, %r15249; 2026-02-21T09:06:50.2443132Z cvt.rn.bf16x2.f32 %r14732, %r15252, %r15251; 2026-02-21T09:06:50.2443207Z cvt.rn.bf16x2.f32 %r14733, %r15254, %r15253; 2026-02-21T09:06:50.2443283Z cvt.rn.bf16x2.f32 %r14734, %r15256, %r15255; 2026-02-21T09:06:50.2443357Z cvt.rn.bf16x2.f32 %r14735, %r15258, %r15257; 2026-02-21T09:06:50.2443429Z cvt.rn.bf16x2.f32 %r14736, %r15260, %r15259; 2026-02-21T09:06:50.2443506Z cvt.rn.bf16x2.f32 %r14737, %r15262, %r15261; 2026-02-21T09:06:50.2443581Z cvt.rn.bf16x2.f32 %r14738, %r15264, %r15263; 2026-02-21T09:06:50.2443655Z cvt.rn.bf16x2.f32 %r14739, %r15266, %r15265; 2026-02-21T09:06:50.2443734Z cvt.rn.bf16x2.f32 %r14740, %r15268, %r15267; 2026-02-21T09:06:50.2443810Z cvt.rn.bf16x2.f32 %r14741, %r15270, %r15269; 2026-02-21T09:06:50.2443883Z cvt.rn.bf16x2.f32 %r14742, %r15272, %r15271; 2026-02-21T09:06:50.2443958Z cvt.rn.bf16x2.f32 %r14743, %r15274, %r15273; 2026-02-21T09:06:50.2444108Z cvt.rn.bf16x2.f32 %r14744, %r15276, %r15275; 2026-02-21T09:06:50.2444183Z cvt.rn.bf16x2.f32 %r14745, %r15278, %r15277; 2026-02-21T09:06:50.2444257Z cvt.rn.bf16x2.f32 %r14746, %r15280, %r15279; 2026-02-21T09:06:50.2444336Z cvt.rn.bf16x2.f32 %r14747, %r15282, %r15281; 2026-02-21T09:06:50.2444410Z cvt.rn.bf16x2.f32 %r14748, %r15284, %r15283; 2026-02-21T09:06:50.2444484Z cvt.rn.bf16x2.f32 %r14749, %r15286, %r15285; 2026-02-21T09:06:50.2444560Z cvt.rn.bf16x2.f32 %r14750, %r15288, %r15287; 2026-02-21T09:06:50.2444634Z cvt.rn.bf16x2.f32 %r14751, %r15290, %r15289; 2026-02-21T09:06:50.2444708Z cvt.rn.bf16x2.f32 %r14752, %r15292, %r15291; 2026-02-21T09:06:50.2444781Z cvt.rn.bf16x2.f32 %r14753, %r15294, %r15293; 2026-02-21T09:06:50.2444955Z 
cvt.rn.bf16x2.f32 %r14754, %r15296, %r15295; 2026-02-21T09:06:50.2445032Z cvt.rn.bf16x2.f32 %r14755, %r15298, %r15297; 2026-02-21T09:06:50.2445106Z cvt.rn.bf16x2.f32 %r14756, %r15300, %r15299; 2026-02-21T09:06:50.2445183Z cvt.rn.bf16x2.f32 %r14757, %r15302, %r15301; 2026-02-21T09:06:50.2445256Z cvt.rn.bf16x2.f32 %r14758, %r15304, %r15303; 2026-02-21T09:06:50.2445455Z .loc 1 95 43 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:95:43 2026-02-21T09:06:50.2445514Z bar.sync 0; 2026-02-21T09:06:50.2445714Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r852], {%r14695, %r14696, %r14697, %r14698}; 2026-02-21T09:06:50.2445904Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r853], {%r14711, %r14712, %r14713, %r14714}; 2026-02-21T09:06:50.2446093Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r854], {%r14727, %r14728, %r14729, %r14730}; 2026-02-21T09:06:50.2446276Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r855], {%r14743, %r14744, %r14745, %r14746}; 2026-02-21T09:06:50.2446711Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r856], {%r14699, %r14700, %r14701, %r14702}; 2026-02-21T09:06:50.2446912Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r857], {%r14715, %r14716, %r14717, %r14718}; 2026-02-21T09:06:50.2447102Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r858], {%r14731, %r14732, %r14733, %r14734}; 2026-02-21T09:06:50.2447285Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r859], {%r14747, %r14748, %r14749, %r14750}; 2026-02-21T09:06:50.2447469Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r860], {%r14703, %r14704, %r14705, %r14706}; 2026-02-21T09:06:50.2447670Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r861], {%r14719, %r14720, %r14721, %r14722}; 2026-02-21T09:06:50.2447854Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r862], {%r14735, %r14736, %r14737, %r14738}; 2026-02-21T09:06:50.2448038Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r863], {%r14751, %r14752, %r14753, %r14754}; 2026-02-21T09:06:50.2448225Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r864], {%r14707, %r14708, %r14709, 
%r14710}; 2026-02-21T09:06:50.2448411Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r865], {%r14723, %r14724, %r14725, %r14726}; 2026-02-21T09:06:50.2448592Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r866], {%r14739, %r14740, %r14741, %r14742}; 2026-02-21T09:06:50.2448778Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r867], {%r14755, %r14756, %r14757, %r14758}; 2026-02-21T09:06:50.2448839Z // begin inline asm 2026-02-21T09:06:50.2448920Z fence.proxy.async.shared::cta; 2026-02-21T09:06:50.2448978Z // end inline asm 2026-02-21T09:06:50.2449034Z bar.sync 0; 2026-02-21T09:06:50.2449105Z elect.sync %r14759|%p79, -1; 2026-02-21T09:06:50.2449172Z and.pred %p77, %p82, %p79; 2026-02-21T09:06:50.2449237Z and.b32 %r14760, %r1025, 3; 2026-02-21T09:06:50.2449298Z shl.b32 %r14761, %r14760, 15; 2026-02-21T09:06:50.2449362Z add.s32 %r14693, %r12221, %r14761; 2026-02-21T09:06:50.2449427Z shl.b32 %r14763, %r14760, 6; 2026-02-21T09:06:50.2449488Z or.b32 %r14691, %r14763, %r15310; 2026-02-21T09:06:50.2449552Z // begin inline asm 2026-02-21T09:06:50.2449795Z @%p77 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd158, {%r14691, %r15308}], [%r14693]; 2026-02-21T09:06:50.2449857Z // end inline asm 2026-02-21T09:06:50.2450022Z cp.async.bulk.commit_group; 2026-02-21T09:06:50.2450099Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:06:50.2450157Z bar.sync 0; 2026-02-21T09:06:50.2450216Z mov.b32 %r15177, 0f00000000; 2026-02-21T09:06:50.2450276Z mov.b32 %r15178, %r15177; 2026-02-21T09:06:50.2450339Z mov.b32 %r15179, %r15177; 2026-02-21T09:06:50.2450396Z mov.b32 %r15180, %r15177; 2026-02-21T09:06:50.2450453Z mov.b32 %r15181, %r15177; 2026-02-21T09:06:50.2450511Z mov.b32 %r15182, %r15177; 2026-02-21T09:06:50.2450571Z mov.b32 %r15183, %r15177; 2026-02-21T09:06:50.2450628Z mov.b32 %r15184, %r15177; 2026-02-21T09:06:50.2450685Z mov.b32 %r15185, %r15177; 2026-02-21T09:06:50.2450746Z mov.b32 %r15186, %r15177; 2026-02-21T09:06:50.2450804Z mov.b32 %r15187, %r15177; 2026-02-21T09:06:50.2450992Z 
mov.b32 %r15188, %r15177; 2026-02-21T09:06:50.2451054Z mov.b32 %r15189, %r15177; 2026-02-21T09:06:50.2451114Z mov.b32 %r15190, %r15177; 2026-02-21T09:06:50.2451170Z mov.b32 %r15191, %r15177; 2026-02-21T09:06:50.2451230Z mov.b32 %r15192, %r15177; 2026-02-21T09:06:50.2451295Z mov.b32 %r15193, %r15177; 2026-02-21T09:06:50.2451352Z mov.b32 %r15194, %r15177; 2026-02-21T09:06:50.2451408Z mov.b32 %r15195, %r15177; 2026-02-21T09:06:50.2451463Z mov.b32 %r15196, %r15177; 2026-02-21T09:06:50.2451522Z mov.b32 %r15197, %r15177; 2026-02-21T09:06:50.2451578Z mov.b32 %r15198, %r15177; 2026-02-21T09:06:50.2451634Z mov.b32 %r15199, %r15177; 2026-02-21T09:06:50.2451695Z mov.b32 %r15200, %r15177; 2026-02-21T09:06:50.2451753Z mov.b32 %r15201, %r15177; 2026-02-21T09:06:50.2451810Z mov.b32 %r15202, %r15177; 2026-02-21T09:06:50.2451878Z mov.b32 %r15203, %r15177; 2026-02-21T09:06:50.2451941Z mov.b32 %r15204, %r15177; 2026-02-21T09:06:50.2451997Z mov.b32 %r15205, %r15177; 2026-02-21T09:06:50.2452057Z mov.b32 %r15206, %r15177; 2026-02-21T09:06:50.2452172Z mov.b32 %r15207, %r15177; 2026-02-21T09:06:50.2452232Z mov.b32 %r15208, %r15177; 2026-02-21T09:06:50.2452290Z mov.b32 %r15209, %r15177; 2026-02-21T09:06:50.2452349Z mov.b32 %r15210, %r15177; 2026-02-21T09:06:50.2452408Z mov.b32 %r15211, %r15177; 2026-02-21T09:06:50.2452467Z mov.b32 %r15212, %r15177; 2026-02-21T09:06:50.2452523Z mov.b32 %r15213, %r15177; 2026-02-21T09:06:50.2452583Z mov.b32 %r15214, %r15177; 2026-02-21T09:06:50.2452641Z mov.b32 %r15215, %r15177; 2026-02-21T09:06:50.2452698Z mov.b32 %r15216, %r15177; 2026-02-21T09:06:50.2452757Z mov.b32 %r15217, %r15177; 2026-02-21T09:06:50.2452817Z mov.b32 %r15218, %r15177; 2026-02-21T09:06:50.2452874Z mov.b32 %r15219, %r15177; 2026-02-21T09:06:50.2452929Z mov.b32 %r15220, %r15177; 2026-02-21T09:06:50.2452993Z mov.b32 %r15221, %r15177; 2026-02-21T09:06:50.2453049Z mov.b32 %r15222, %r15177; 2026-02-21T09:06:50.2453105Z mov.b32 %r15223, %r15177; 2026-02-21T09:06:50.2453168Z mov.b32 %r15224, 
%r15177; 2026-02-21T09:06:50.2453227Z mov.b32 %r15225, %r15177; 2026-02-21T09:06:50.2453283Z mov.b32 %r15226, %r15177; 2026-02-21T09:06:50.2453340Z mov.b32 %r15227, %r15177; 2026-02-21T09:06:50.2453403Z mov.b32 %r15228, %r15177; 2026-02-21T09:06:50.2453459Z mov.b32 %r15229, %r15177; 2026-02-21T09:06:50.2453516Z mov.b32 %r15230, %r15177; 2026-02-21T09:06:50.2453576Z mov.b32 %r15231, %r15177; 2026-02-21T09:06:50.2453632Z mov.b32 %r15232, %r15177; 2026-02-21T09:06:50.2453688Z mov.b32 %r15233, %r15177; 2026-02-21T09:06:50.2453746Z mov.b32 %r15234, %r15177; 2026-02-21T09:06:50.2453808Z mov.b32 %r15235, %r15177; 2026-02-21T09:06:50.2453866Z mov.b32 %r15236, %r15177; 2026-02-21T09:06:50.2453922Z mov.b32 %r15237, %r15177; 2026-02-21T09:06:50.2453982Z mov.b32 %r15238, %r15177; 2026-02-21T09:06:50.2454039Z mov.b32 %r15239, %r15177; 2026-02-21T09:06:50.2454095Z mov.b32 %r15240, %r15177; 2026-02-21T09:06:50.2454151Z mov.b32 %r15241, %r15177; 2026-02-21T09:06:50.2454217Z mov.b32 %r15242, %r15177; 2026-02-21T09:06:50.2454276Z mov.b32 %r15243, %r15177; 2026-02-21T09:06:50.2454347Z mov.b32 %r15244, %r15177; 2026-02-21T09:06:50.2454410Z mov.b32 %r15245, %r15177; 2026-02-21T09:06:50.2454533Z mov.b32 %r15246, %r15177; 2026-02-21T09:06:50.2454592Z mov.b32 %r15247, %r15177; 2026-02-21T09:06:50.2454649Z mov.b32 %r15248, %r15177; 2026-02-21T09:06:50.2454710Z mov.b32 %r15249, %r15177; 2026-02-21T09:06:50.2454767Z mov.b32 %r15250, %r15177; 2026-02-21T09:06:50.2454825Z mov.b32 %r15251, %r15177; 2026-02-21T09:06:50.2454883Z mov.b32 %r15252, %r15177; 2026-02-21T09:06:50.2454941Z mov.b32 %r15253, %r15177; 2026-02-21T09:06:50.2454999Z mov.b32 %r15254, %r15177; 2026-02-21T09:06:50.2455055Z mov.b32 %r15255, %r15177; 2026-02-21T09:06:50.2455115Z mov.b32 %r15256, %r15177; 2026-02-21T09:06:50.2455174Z mov.b32 %r15257, %r15177; 2026-02-21T09:06:50.2455233Z mov.b32 %r15258, %r15177; 2026-02-21T09:06:50.2455292Z mov.b32 %r15259, %r15177; 2026-02-21T09:06:50.2455448Z mov.b32 %r15260, %r15177; 
2026-02-21T09:06:50.2455512Z mov.b32 %r15261, %r15177; 2026-02-21T09:06:50.2455569Z mov.b32 %r15262, %r15177; 2026-02-21T09:06:50.2455629Z mov.b32 %r15263, %r15177; 2026-02-21T09:06:50.2455687Z mov.b32 %r15264, %r15177; 2026-02-21T09:06:50.2455745Z mov.b32 %r15265, %r15177; 2026-02-21T09:06:50.2455806Z mov.b32 %r15266, %r15177; 2026-02-21T09:06:50.2455863Z mov.b32 %r15267, %r15177; 2026-02-21T09:06:50.2455919Z mov.b32 %r15268, %r15177; 2026-02-21T09:06:50.2455976Z mov.b32 %r15269, %r15177; 2026-02-21T09:06:50.2456036Z mov.b32 %r15270, %r15177; 2026-02-21T09:06:50.2456094Z mov.b32 %r15271, %r15177; 2026-02-21T09:06:50.2456151Z mov.b32 %r15272, %r15177; 2026-02-21T09:06:50.2456213Z mov.b32 %r15273, %r15177; 2026-02-21T09:06:50.2456269Z mov.b32 %r15274, %r15177; 2026-02-21T09:06:50.2456325Z mov.b32 %r15275, %r15177; 2026-02-21T09:06:50.2456396Z mov.b32 %r15276, %r15177; 2026-02-21T09:06:50.2456574Z mov.b32 %r15277, %r15177; 2026-02-21T09:06:50.2456639Z mov.b32 %r15278, %r15177; 2026-02-21T09:06:50.2456773Z mov.b32 %r15279, %r15177; 2026-02-21T09:06:50.2456837Z mov.b32 %r15280, %r15177; 2026-02-21T09:06:50.2456895Z mov.b32 %r15281, %r15177; 2026-02-21T09:06:50.2456954Z mov.b32 %r15282, %r15177; 2026-02-21T09:06:50.2457015Z mov.b32 %r15283, %r15177; 2026-02-21T09:06:50.2457072Z mov.b32 %r15284, %r15177; 2026-02-21T09:06:50.2457130Z mov.b32 %r15285, %r15177; 2026-02-21T09:06:50.2457189Z mov.b32 %r15286, %r15177; 2026-02-21T09:06:50.2457249Z mov.b32 %r15287, %r15177; 2026-02-21T09:06:50.2457305Z mov.b32 %r15288, %r15177; 2026-02-21T09:06:50.2457372Z mov.b32 %r15289, %r15177; 2026-02-21T09:06:50.2457431Z mov.b32 %r15290, %r15177; 2026-02-21T09:06:50.2457487Z mov.b32 %r15291, %r15177; 2026-02-21T09:06:50.2457544Z mov.b32 %r15292, %r15177; 2026-02-21T09:06:50.2457600Z mov.b32 %r15293, %r15177; 2026-02-21T09:06:50.2457660Z mov.b32 %r15294, %r15177; 2026-02-21T09:06:50.2457716Z mov.b32 %r15295, %r15177; 2026-02-21T09:06:50.2457776Z mov.b32 %r15296, %r15177; 
2026-02-21T09:06:50.2457839Z mov.b32 %r15297, %r15177; 2026-02-21T09:06:50.2457911Z mov.b32 %r15298, %r15177; 2026-02-21T09:06:50.2457973Z mov.b32 %r15299, %r15177; 2026-02-21T09:06:50.2458032Z mov.b32 %r15300, %r15177; 2026-02-21T09:06:50.2458095Z mov.b32 %r15301, %r15177; 2026-02-21T09:06:50.2458154Z mov.b32 %r15302, %r15177; 2026-02-21T09:06:50.2458211Z mov.b32 %r15303, %r15177; 2026-02-21T09:06:50.2458271Z mov.b32 %r15304, %r15177; 2026-02-21T09:06:50.2458331Z bra.uni $L__BB0_15; 2026-02-21T09:06:50.2458429Z $L__BB0_16: // %._crit_edge193 2026-02-21T09:06:50.2458634Z .loc 1 26 4 // cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py:26:4 2026-02-21T09:06:50.2458687Z ret; 2026-02-21T09:06:50.2458741Z $L__tmp33: 2026-02-21T09:06:50.2458796Z $L__func_end0: 2026-02-21T09:06:50.2458885Z // -- End function 2026-02-21T09:06:50.2458942Z } 2026-02-21T09:06:50.2459186Z .file 1 "/tmp/torchinductor_root/ls/cls2fio2wbuj64hv5dhv3txqb2dvznoq446m3r6vew3qc2caaele.py" 2026-02-21T09:06:50.2459402Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T09:06:50.2459548Z .section .debug_abbrev 2026-02-21T09:06:50.2459598Z { 2026-02-21T09:06:50.2459694Z .b8 1 // Abbreviation Code 2026-02-21T09:06:50.2459790Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:06:50.2459876Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:06:50.2459960Z .b8 37 // DW_AT_producer 2026-02-21T09:06:50.2460042Z .b8 8 // DW_FORM_string 2026-02-21T09:06:50.2460122Z .b8 19 // DW_AT_language 2026-02-21T09:06:50.2460203Z .b8 5 // DW_FORM_data2 2026-02-21T09:06:50.2460285Z .b8 3 // DW_AT_name 2026-02-21T09:06:50.2460535Z .b8 8 // DW_FORM_string 2026-02-21T09:06:50.2460628Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:06:50.2460710Z .b8 6 // DW_FORM_data4 2026-02-21T09:06:50.2460797Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:06:50.2460875Z .b8 8 // DW_FORM_string 2026-02-21T09:06:50.2460947Z .b8 0 // EOM(1) 2026-02-21T09:06:50.2461020Z .b8 0 // EOM(2) 2026-02-21T09:06:50.2461110Z .b8 2 
// Abbreviation Code 2026-02-21T09:06:50.2461196Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:06:50.2461279Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:06:50.2461354Z .b8 3 // DW_AT_name 2026-02-21T09:06:50.2461432Z .b8 8 // DW_FORM_string 2026-02-21T09:06:50.2461566Z .b8 32 // DW_AT_inline 2026-02-21T09:06:50.2461648Z .b8 11 // DW_FORM_data1 2026-02-21T09:06:50.2461718Z .b8 0 // EOM(1) 2026-02-21T09:06:50.2461787Z .b8 0 // EOM(2) 2026-02-21T09:06:50.2461876Z .b8 3 // Abbreviation Code 2026-02-21T09:06:50.2461958Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:06:50.2462040Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:06:50.2462122Z .b8 17 // DW_AT_low_pc 2026-02-21T09:06:50.2462197Z .b8 1 // DW_FORM_addr 2026-02-21T09:06:50.2462278Z .b8 18 // DW_AT_high_pc 2026-02-21T09:06:50.2462355Z .b8 1 // DW_FORM_addr 2026-02-21T09:06:50.2462446Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:06:50.2462530Z .b8 19 // DW_FORM_ref4 2026-02-21T09:06:50.2462602Z .b8 0 // EOM(1) 2026-02-21T09:06:50.2462674Z .b8 0 // EOM(2) 2026-02-21T09:06:50.2462761Z .b8 4 // Abbreviation Code 2026-02-21T09:06:50.2462859Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T09:06:50.2462941Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:06:50.2463035Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:06:50.2463116Z .b8 19 // DW_FORM_ref4 2026-02-21T09:06:50.2463194Z .b8 17 // DW_AT_low_pc 2026-02-21T09:06:50.2463269Z .b8 1 // DW_FORM_addr 2026-02-21T09:06:50.2463349Z .b8 18 // DW_AT_high_pc 2026-02-21T09:06:50.2463426Z .b8 1 // DW_FORM_addr 2026-02-21T09:06:50.2463512Z .b8 88 // DW_AT_call_file 2026-02-21T09:06:50.2463592Z .b8 11 // DW_FORM_data1 2026-02-21T09:06:50.2463730Z .b8 89 // DW_AT_call_line 2026-02-21T09:06:50.2463812Z .b8 11 // DW_FORM_data1 2026-02-21T09:06:50.2463895Z .b8 87 // DW_AT_call_column 2026-02-21T09:06:50.2463971Z .b8 11 // DW_FORM_data1 2026-02-21T09:06:50.2464048Z .b8 0 // EOM(1) 2026-02-21T09:06:50.2464117Z .b8 0 // EOM(2) 2026-02-21T09:06:50.2464185Z .b8 0 // EOM(3) 2026-02-21T09:06:50.2464237Z } 
2026-02-21T09:06:50.2464298Z .section .debug_info 2026-02-21T09:06:50.2464360Z { 2026-02-21T09:06:50.2464452Z .b32 178 // Length of Unit 2026-02-21T09:06:50.2464649Z .b8 2 // DWARF version number 2026-02-21T09:06:50.2464704Z .b8 0 2026-02-21T09:06:50.2464839Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:06:50.2464940Z .b8 8 // Address Size (in bytes) 2026-02-21T09:06:50.2465053Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T09:06:50.2465140Z .b8 116 // DW_AT_producer 2026-02-21T09:06:50.2465193Z .b8 114 2026-02-21T09:06:50.2465251Z .b8 105 2026-02-21T09:06:50.2465301Z .b8 116 2026-02-21T09:06:50.2465351Z .b8 111 2026-02-21T09:06:50.2465406Z .b8 110 2026-02-21T09:06:50.2465457Z .b8 0 2026-02-21T09:06:50.2465540Z .b8 2 // DW_AT_language 2026-02-21T09:06:50.2465591Z .b8 0 2026-02-21T09:06:50.2465675Z .b8 99 // DW_AT_name 2026-02-21T09:06:50.2465726Z .b8 108 2026-02-21T09:06:50.2465776Z .b8 115 2026-02-21T09:06:50.2465830Z .b8 50 2026-02-21T09:06:50.2465883Z .b8 102 2026-02-21T09:06:50.2465983Z .b8 105 2026-02-21T09:06:50.2466037Z .b8 111 2026-02-21T09:06:50.2466090Z .b8 50 2026-02-21T09:06:50.2466141Z .b8 119 2026-02-21T09:06:50.2466191Z .b8 98 2026-02-21T09:06:50.2466250Z .b8 117 2026-02-21T09:06:50.2466310Z .b8 106 2026-02-21T09:06:50.2466362Z .b8 54 2026-02-21T09:06:50.2466413Z .b8 52 2026-02-21T09:06:50.2466587Z .b8 104 2026-02-21T09:06:50.2466642Z .b8 118 2026-02-21T09:06:50.2466694Z .b8 53 2026-02-21T09:06:50.2466746Z .b8 100 2026-02-21T09:06:50.2466801Z .b8 104 2026-02-21T09:06:50.2466852Z .b8 118 2026-02-21T09:06:50.2466902Z .b8 51 2026-02-21T09:06:50.2466955Z .b8 116 2026-02-21T09:06:50.2467004Z .b8 120 2026-02-21T09:06:50.2467055Z .b8 113 2026-02-21T09:06:50.2467105Z .b8 98 2026-02-21T09:06:50.2467159Z .b8 50 2026-02-21T09:06:50.2467209Z .b8 100 2026-02-21T09:06:50.2467259Z .b8 118 2026-02-21T09:06:50.2467311Z .b8 122 2026-02-21T09:06:50.2467361Z .b8 110 2026-02-21T09:06:50.2467412Z .b8 111 2026-02-21T09:06:50.2467465Z .b8 
113 2026-02-21T09:06:50.2467519Z .b8 52 2026-02-21T09:06:50.2467570Z .b8 52 2026-02-21T09:06:50.2467620Z .b8 54 2026-02-21T09:06:50.2467673Z .b8 109 2026-02-21T09:06:50.2467724Z .b8 51 2026-02-21T09:06:50.2467776Z .b8 114 2026-02-21T09:06:50.2467827Z .b8 54 2026-02-21T09:06:50.2467896Z .b8 118 2026-02-21T09:06:50.2467951Z .b8 101 2026-02-21T09:06:50.2468003Z .b8 119 2026-02-21T09:06:50.2468054Z .b8 51 2026-02-21T09:06:50.2468108Z .b8 113 2026-02-21T09:06:50.2468158Z .b8 99 2026-02-21T09:06:50.2468208Z .b8 50 2026-02-21T09:06:50.2468261Z .b8 99 2026-02-21T09:06:50.2468387Z .b8 97 2026-02-21T09:06:50.2468439Z .b8 97 2026-02-21T09:06:50.2468491Z .b8 101 2026-02-21T09:06:50.2468544Z .b8 108 2026-02-21T09:06:50.2468595Z .b8 101 2026-02-21T09:06:50.2468645Z .b8 46 2026-02-21T09:06:50.2468701Z .b8 112 2026-02-21T09:06:50.2468751Z .b8 121 2026-02-21T09:06:50.2468799Z .b8 0 2026-02-21T09:06:50.2468907Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:06:50.2468998Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:06:50.2469053Z .b8 116 2026-02-21T09:06:50.2469105Z .b8 109 2026-02-21T09:06:50.2469159Z .b8 112 2026-02-21T09:06:50.2469208Z .b8 47 2026-02-21T09:06:50.2469259Z .b8 116 2026-02-21T09:06:50.2469414Z .b8 111 2026-02-21T09:06:50.2469468Z .b8 114 2026-02-21T09:06:50.2469519Z .b8 99 2026-02-21T09:06:50.2469570Z .b8 104 2026-02-21T09:06:50.2469624Z .b8 105 2026-02-21T09:06:50.2469676Z .b8 110 2026-02-21T09:06:50.2469726Z .b8 100 2026-02-21T09:06:50.2469776Z .b8 117 2026-02-21T09:06:50.2469829Z .b8 99 2026-02-21T09:06:50.2469880Z .b8 116 2026-02-21T09:06:50.2469930Z .b8 111 2026-02-21T09:06:50.2469979Z .b8 114 2026-02-21T09:06:50.2470031Z .b8 95 2026-02-21T09:06:50.2470080Z .b8 114 2026-02-21T09:06:50.2470131Z .b8 111 2026-02-21T09:06:50.2470188Z .b8 111 2026-02-21T09:06:50.2470251Z .b8 116 2026-02-21T09:06:50.2470304Z .b8 47 2026-02-21T09:06:50.2470354Z .b8 108 2026-02-21T09:06:50.2470407Z .b8 115 2026-02-21T09:06:50.2470456Z .b8 0 2026-02-21T09:06:50.2470713Z .b8 2 // Abbrev 
[2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T09:06:50.2470799Z .b8 95 // DW_AT_name 2026-02-21T09:06:50.2470850Z .b8 104 2026-02-21T09:06:50.2470904Z .b8 101 2026-02-21T09:06:50.2470955Z .b8 108 2026-02-21T09:06:50.2471009Z .b8 105 2026-02-21T09:06:50.2471060Z .b8 111 2026-02-21T09:06:50.2471109Z .b8 110 2026-02-21T09:06:50.2471163Z .b8 95 2026-02-21T09:06:50.2471214Z .b8 109 2026-02-21T09:06:50.2471269Z .b8 97 2026-02-21T09:06:50.2471319Z .b8 116 2026-02-21T09:06:50.2471374Z .b8 109 2026-02-21T09:06:50.2471425Z .b8 117 2026-02-21T09:06:50.2471477Z .b8 108 2026-02-21T09:06:50.2471528Z .b8 95 2026-02-21T09:06:50.2471582Z .b8 98 2026-02-21T09:06:50.2471633Z .b8 102 2026-02-21T09:06:50.2471683Z .b8 49 2026-02-21T09:06:50.2471738Z .b8 54 2026-02-21T09:06:50.2471788Z .b8 95 2026-02-21T09:06:50.2471841Z .b8 105 2026-02-21T09:06:50.2471891Z .b8 110 2026-02-21T09:06:50.2471945Z .b8 116 2026-02-21T09:06:50.2471996Z .b8 52 2026-02-21T09:06:50.2472048Z .b8 0 2026-02-21T09:06:50.2472212Z .b8 1 // DW_AT_inline 2026-02-21T09:06:50.2472328Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T09:06:50.2472426Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T09:06:50.2472525Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T09:06:50.2472631Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:06:50.2472763Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T09:06:50.2472863Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:06:50.2472955Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T09:06:50.2473046Z .b64 $L__tmp32 // DW_AT_high_pc 2026-02-21T09:06:50.2473131Z .b8 1 // DW_AT_call_file 2026-02-21T09:06:50.2473215Z .b8 91 // DW_AT_call_line 2026-02-21T09:06:50.2473307Z .b8 40 // DW_AT_call_column 2026-02-21T09:06:50.2473398Z .b8 0 // End Of Children Mark 2026-02-21T09:06:50.2473491Z .b8 0 // End Of Children Mark 2026-02-21T09:06:50.2473544Z } 2026-02-21T09:06:50.2473619Z .section .debug_macinfo { } 2026-02-21T09:06:50.2473625Z 2026-02-21T09:06:50.2473705Z 
================================================================ 2026-02-21T09:06:50.2473836Z please share the reproducer above with Triton project. 2026-02-21T09:06:50.3684795Z [355s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 256, 64], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['', ''], loop_orders=[[0, 1]], num_stages=1, num_warps=1, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, True], range_num_stages=[0, 1], range_unroll_factors=[0, 4], range_warp_specializes=[]) 2026-02-21T09:06:50.3686943Z Tensor-likes are not close! 2026-02-21T09:06:50.3687122Z 2026-02-21T09:06:50.3687244Z Mismatched elements: 33451481 / 33554432 (99.7%) 2026-02-21T09:06:50.3687715Z Greatest absolute difference: 1416.0 at index (1834, 910) (up to 0.01 allowed) 2026-02-21T09:06:50.3688465Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:06:50.3688920Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:06:50.3689173Z 2026-02-21T09:06:50.6775046Z [356s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 256, 64], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['', ''], loop_orders=[[0, 1]], maxnreg=256, num_sm_multiplier=8, num_stages=1, num_warps=2, pid_type='persistent_interleaved', range_flattens=[True, False], range_multi_buffers=[True, False], range_num_stages=[3, 1], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T09:06:50.6778031Z Tensor-likes are not close! 
2026-02-21T09:06:50.6778199Z 2026-02-21T09:06:50.6778314Z Mismatched elements: 33486020 / 33554432 (99.8%) 2026-02-21T09:06:50.6778750Z Greatest absolute difference: 2480.0 at index (3409, 1855) (up to 0.01 allowed) 2026-02-21T09:06:50.6779287Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:06:50.6779756Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:06:50.6780009Z 2026-02-21T09:06:51.0671809Z [356s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 256, 64], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['', ''], loop_orders=[[0, 1]], maxnreg=256, num_sm_multiplier=8, num_stages=1, num_warps=1, pid_type='persistent_interleaved', range_flattens=[False, False], range_multi_buffers=[True, True], range_num_stages=[3, 1], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T09:06:51.0673701Z Tensor-likes are not close! 2026-02-21T09:06:51.0673911Z 2026-02-21T09:06:51.0674502Z Mismatched elements: 33486128 / 33554432 (99.8%) 2026-02-21T09:06:51.0674980Z Greatest absolute difference: 2624.0 at index (1834, 910) (up to 0.01 allowed) 2026-02-21T09:06:51.0675530Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:06:51.0675998Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:06:51.0676253Z 2026-02-21T09:06:51.4624532Z 2026-02-21T09:06:51.4626701Z Generation 4: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━ 108/108 12.8 configs/s 2026-02-21T09:07:01.4571945Z Generation 4: verifying top configs 100% ━━━━━━━━━━━━━━━━ 559/559 55.6 configs/s 2026-02-21T09:07:01.7307150Z [367s] Generation 4 complete: 2026-02-21T09:07:01.7307433Z error=19 2026-02-21T09:07:01.7307621Z timeout=2 2026-02-21T09:07:01.7307791Z ok=91 2026-02-21T09:07:01.7307956Z min=0.3676 2026-02-21T09:07:01.7308124Z mid=0.6950 2026-02-21T09:07:01.7308294Z max=34.0628 2026-02-21T09:07:01.7308599Z best={'block_sizes': [32, 128, 128], 2026-02-21T09:07:01.7308990Z 'indexing': ['pointer', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:07:01.7309369Z 'l2_groupings': [1], 2026-02-21T09:07:01.7309608Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T09:07:01.7309908Z 'loop_orders': [[0, 1]], 2026-02-21T09:07:01.7310123Z 'num_stages': 3, 2026-02-21T09:07:01.7310316Z 'num_warps': 4, 2026-02-21T09:07:01.7310504Z 'pid_type': 'flat', 2026-02-21T09:07:01.7310725Z 'range_flattens': [None, False], 2026-02-21T09:07:01.7311008Z 'range_multi_buffers': [None, False], 2026-02-21T09:07:01.7311307Z 'range_num_stages': [0, 4], 2026-02-21T09:07:01.7311578Z 'range_unroll_factors': [0, 3], 2026-02-21T09:07:01.7311803Z 'range_warp_specializes': []} 2026-02-21T09:07:01.7348752Z [367s] Fitting surrogate: 537 points, 537 targets 2026-02-21T09:07:03.3263708Z [368s] Generation 5 starting: 98 neighbors, 5 active search path(s) 2026-02-21T09:07:26.8618855Z Generation 5: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99/99 3.5 configs/s 2026-02-21T09:07:26.9970105Z [392s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 64, 256], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['', ''], loop_orders=[[0, 1]], num_sm_multiplier=4, num_stages=7, num_warps=4, pid_type='persistent_interleaved', 
range_flattens=[False, False], range_multi_buffers=[False, False], range_num_stages=[0, 0], range_unroll_factors=[2, 2], range_warp_specializes=[]) 2026-02-21T09:07:26.9972514Z Tensor-likes are not close! 2026-02-21T09:07:26.9972704Z 2026-02-21T09:07:26.9972836Z Mismatched elements: 33485594 / 33554432 (99.8%) 2026-02-21T09:07:26.9973303Z Greatest absolute difference: 2384.0 at index (274, 6497) (up to 0.01 allowed) 2026-02-21T09:07:26.9973870Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:07:26.9974393Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:07:26.9974870Z 2026-02-21T09:07:27.2998765Z [392s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 64, 256], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['', 'last'], loop_orders=[[0, 1]], num_sm_multiplier=4, num_stages=7, num_warps=4, pid_type='persistent_interleaved', range_flattens=[False, False], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[2, 2], range_warp_specializes=[]) 2026-02-21T09:07:27.3000760Z Tensor-likes are not close! 2026-02-21T09:07:27.3000930Z 2026-02-21T09:07:27.3001051Z Mismatched elements: 33485594 / 33554432 (99.8%) 2026-02-21T09:07:27.3001470Z Greatest absolute difference: 2384.0 at index (274, 6497) (up to 0.01 allowed) 2026-02-21T09:07:27.3001997Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:07:27.3002459Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:07:27.3002719Z 2026-02-21T09:07:27.4715326Z [392s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 64, 256], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['', ''], loop_orders=[[0, 1]], num_sm_multiplier=4, num_stages=7, num_warps=1, pid_type='persistent_interleaved', range_flattens=[False, False], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[2, 2], range_warp_specializes=[]) 2026-02-21T09:07:27.4717539Z Tensor-likes are not close! 2026-02-21T09:07:27.4717737Z 2026-02-21T09:07:27.4717861Z Mismatched elements: 33450570 / 33554432 (99.7%) 2026-02-21T09:07:27.4718295Z Greatest absolute difference: 1328.0 at index (2623, 5795) (up to 0.01 allowed) 2026-02-21T09:07:27.4718891Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:07:27.4719493Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:07:27.4719750Z 2026-02-21T09:07:28.1951326Z 2026-02-21T09:07:28.1951357Z 2026-02-21T09:07:28.1951997Z ================================================================ 2026-02-21T09:07:28.1952378Z Internal Triton PTX codegen error 2026-02-21T09:07:28.1952650Z `ptxas` stderr: 2026-02-21T09:07:28.1953364Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 562 in function _helion_matmul_bf16_int4. Try to compile with register target of 62 or higher. 
2026-02-21T09:07:28.1954213Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:07:28.1954444Z 2026-02-21T09:07:28.1955098Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp4omcx2bm.ptx -o /tmp/tmp4omcx2bm.ptx.o 2026-02-21T09:07:28.1955836Z 2026-02-21T09:07:28.1955842Z 2026-02-21T09:07:28.1955915Z // 2026-02-21T09:07:28.1956113Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:07:28.1956366Z // 2026-02-21T09:07:28.1956669Z 2026-02-21T09:07:28.1956747Z .version 8.7 2026-02-21T09:07:28.1956939Z .target sm_90a 2026-02-21T09:07:28.1957130Z .address_size 64 2026-02-21T09:07:28.1957249Z 2026-02-21T09:07:28.1957485Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T09:07:28.1957923Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:07:28.1958510Z // @_helion_matmul_bf16_int4 2026-02-21T09:07:28.1958841Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T09:07:28.1959275Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T09:07:28.1959799Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T09:07:28.1960217Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T09:07:28.1960640Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T09:07:28.1961064Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T09:07:28.1961399Z ) 2026-02-21T09:07:28.1961553Z .reqntid 512 2026-02-21T09:07:28.1961855Z .maxnreg 32 2026-02-21T09:07:28.1962115Z { 2026-02-21T09:07:28.1962283Z .reg .pred %p<55>; 2026-02-21T09:07:28.1962477Z .reg .b16 %rs<170>; 2026-02-21T09:07:28.1962684Z .reg .b32 %r<3385>; 2026-02-21T09:07:28.1962875Z .reg .b64 %rd<108>; 2026-02-21T09:07:28.1963268Z .loc 1 19 0 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:19:0 2026-02-21T09:07:28.1963729Z $L__func_begin0: 
2026-02-21T09:07:28.1964100Z .loc 1 19 0 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:19:0 2026-02-21T09:07:28.1964478Z 2026-02-21T09:07:28.1964544Z // %bb.0: 2026-02-21T09:07:28.1964778Z ld.param.b64 %rd31, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T09:07:28.1965154Z ld.param.b64 %rd30, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T09:07:28.1965459Z $L__tmp0: 2026-02-21T09:07:28.1966161Z [393s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:07:28.1968547Z Config: @helion.kernel(config=helion.Config(block_sizes=[4, 64, 256], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=4, num_stages=7, num_warps=16, pid_type='persistent_interleaved', range_flattens=[False, False], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[2, 2], range_warp_specializes=[]), static_shapes=True) 2026-02-21T09:07:28.1970413Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:07:28.1970697Z `ptxas` stderr: 2026-02-21T09:07:28.1971256Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 562 in function _helion_matmul_bf16_int4. Try to compile with register target of 62 or higher. 2026-02-21T09:07:28.1971896Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:07:28.1972076Z 2026-02-21T09:07:28.1972574Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp4omcx2bm.ptx -o /tmp/tmp4omcx2bm.ptx.o 2026-02-21T09:07:28.1973177Z 2026-02-21T09:07:28.1973327Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:07:28.1973786Z .loc 1 21 67 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:21:67 2026-02-21T09:07:28.1974152Z mov.u32 %r3285, %ctaid.x; 2026-02-21T09:07:28.1974389Z ld.param.b64 %rd33, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T09:07:28.1974660Z mov.u32 %r263, %ctaid.y; 2026-02-21T09:07:28.1974889Z ld.param.b64 %rd50, [_helion_matmul_bf16_int4_param_3]; 2026-02-21T09:07:28.1975136Z mov.u32 %r264, %ctaid.z; 2026-02-21T09:07:28.1975315Z mov.u32 %r265, %nctaid.x; 2026-02-21T09:07:28.1975499Z mov.u32 %r266, %nctaid.y; 2026-02-21T09:07:28.1975678Z mad.lo.s32 %r267, %r264, %r266, %r263; 2026-02-21T09:07:28.1975898Z mad.lo.s32 %r268, %r267, %r265, %r3285; 2026-02-21T09:07:28.1976105Z shl.b32 %r269, %r268, 7; 2026-02-21T09:07:28.1976283Z cvt.s64.s32 %rd51, %r269; 2026-02-21T09:07:28.1976667Z add.s64 %rd47, %rd50, %rd51; 2026-02-21T09:07:28.1976869Z mov.u32 %r2, %tid.x; 2026-02-21T09:07:28.1977037Z setp.lt.u32 %p1, %r2, 32; 2026-02-21T09:07:28.1977323Z shl.b32 %r270, %r2, 2; 2026-02-21T09:07:28.1977496Z mov.b32 %r3020, global_smem; 2026-02-21T09:07:28.1977702Z add.s32 %r255, %r3020, %r270; 2026-02-21T09:07:28.1977888Z mov.b32 %r256, 0; 2026-02-21T09:07:28.1978041Z // begin inline asm 2026-02-21T09:07:28.1978225Z @%p1 st.shared.b32 [ %r255 + 0 ], %r256; 2026-02-21T09:07:28.1978429Z // end inline asm 2026-02-21T09:07:28.1978588Z bar.warp.sync -1; 2026-02-21T09:07:28.1978749Z setp.eq.b32 %p2, %r2, 0; 2026-02-21T09:07:28.1978948Z cvt.u64.u32 %rd32, %r3020; 2026-02-21T09:07:28.1979133Z // begin inline asm 2026-02-21T09:07:28.1979493Z @%p2 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd32 + 0 ], %rd33; 2026-02-21T09:07:28.1979874Z // end inline asm 2026-02-21T09:07:28.1980026Z // begin inline asm 2026-02-21T09:07:28.1980464Z @%p2 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd32 + 0 ], 0x1; 2026-02-21T09:07:28.1980779Z // end inline asm 2026-02-21T09:07:28.1980934Z mov.b32 %r257, 64; 2026-02-21T09:07:28.1981088Z // begin 
inline asm 2026-02-21T09:07:28.1981376Z @%p2 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd32 + 0 ], 0x0, %r257; 2026-02-21T09:07:28.1981709Z // end inline asm 2026-02-21T09:07:28.1981868Z // begin inline asm 2026-02-21T09:07:28.1982149Z @%p2 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd32 + 0 ], 0x1, %r257; 2026-02-21T09:07:28.1982478Z // end inline asm 2026-02-21T09:07:28.1982639Z mov.b32 %r259, 8192; 2026-02-21T09:07:28.1982794Z // begin inline asm 2026-02-21T09:07:28.1983085Z @%p2 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd32 + 0 ], 0x0, %r259; 2026-02-21T09:07:28.1983420Z // end inline asm 2026-02-21T09:07:28.1983572Z mov.b32 %r260, 4096; 2026-02-21T09:07:28.1983737Z // begin inline asm 2026-02-21T09:07:28.1984106Z @%p2 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd32 + 0 ], 0x1, %r260; 2026-02-21T09:07:28.1984462Z // end inline asm 2026-02-21T09:07:28.1984617Z mov.b64 %rd40, 16384; 2026-02-21T09:07:28.1984785Z // begin inline asm 2026-02-21T09:07:28.1985094Z @%p2 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd32 + 0 ], 0x0, %rd40; 2026-02-21T09:07:28.1985444Z // end inline asm 2026-02-21T09:07:28.1985589Z mov.b32 %r261, 1; 2026-02-21T09:07:28.1985743Z // begin inline asm 2026-02-21T09:07:28.1986050Z @%p2 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd32 + 0 ], 0x0, %r261; 2026-02-21T09:07:28.1986401Z // end inline asm 2026-02-21T09:07:28.1986692Z // begin inline asm 2026-02-21T09:07:28.1986994Z @%p2 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd32 + 0 ], 0x1, %r261; 2026-02-21T09:07:28.1987347Z // end inline asm 2026-02-21T09:07:28.1987496Z // begin inline asm 2026-02-21T09:07:28.1987780Z @%p2 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd32 + 0 ], 0xa; 2026-02-21T09:07:28.1988123Z // end inline asm 2026-02-21T09:07:28.1988274Z // begin inline asm 2026-02-21T09:07:28.1988685Z @%p2 
tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd32 + 0 ], 0x0; 2026-02-21T09:07:28.1989046Z // end inline asm 2026-02-21T09:07:28.1989202Z // begin inline asm 2026-02-21T09:07:28.1989494Z @%p2 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd32 + 0 ], 0x3; 2026-02-21T09:07:28.1989829Z // end inline asm 2026-02-21T09:07:28.1989975Z // begin inline asm 2026-02-21T09:07:28.1990246Z @%p2 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd32 + 0 ], 0x0; 2026-02-21T09:07:28.1990563Z // end inline asm 2026-02-21T09:07:28.1990708Z // begin inline asm 2026-02-21T09:07:28.1991138Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd47 + 0 ], [ %rd32 + 0 ], 0x80; 2026-02-21T09:07:28.1991609Z // end inline asm 2026-02-21T09:07:28.1991760Z // begin inline asm 2026-02-21T09:07:28.1992013Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd47 + 0 ], 0x80; 2026-02-21T09:07:28.1992338Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:07:28.1992565Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:07:28.1992864Z // end inline asm 2026-02-21T09:07:28.1993014Z bar.sync 0; 2026-02-21T09:07:28.1993176Z cvta.global.u64 %rd69, %rd47; 2026-02-21T09:07:28.1993535Z .loc 1 26 97 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:26:97 2026-02-21T09:07:28.1993904Z sub.s32 %r272, 2575, %r3285; 2026-02-21T09:07:28.1994097Z mul.hi.u32 %r273, %r272, 1041204193; 2026-02-21T09:07:28.1994308Z shr.u32 %r274, %r273, 7; 2026-02-21T09:07:28.1994498Z and.b32 %r275, %r274, 8388606; 2026-02-21T09:07:28.1994707Z mad.lo.s32 %r3352, %r275, 528, %r3285; 2026-02-21T09:07:28.1995067Z .loc 1 38 45 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:38:45 2026-02-21T09:07:28.1995433Z and.b32 %r4, %r2, 31; 2026-02-21T09:07:28.1995736Z shr.u32 %r5, %r2, 5; 2026-02-21T09:07:28.1995909Z bfe.u32 %r6, %r2, 3, 6; 2026-02-21T09:07:28.1996221Z .loc 1 40 45 // 
cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:40:45 2026-02-21T09:07:28.1996733Z and.b32 %r7, %r2, 127; 2026-02-21T09:07:28.1996898Z shl.b32 %r8, %r7, 1; 2026-02-21T09:07:28.1997204Z .loc 1 48 48 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:48:48 2026-02-21T09:07:28.1997560Z bfe.u32 %r10, %r2, 7, 2; 2026-02-21T09:07:28.1997866Z .loc 1 54 38 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:54:38 2026-02-21T09:07:28.1998217Z and.b32 %r11, %r2, 7; 2026-02-21T09:07:28.1998514Z .loc 1 72 38 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:72:38 2026-02-21T09:07:28.1998864Z and.b32 %r12, %r2, 256; 2026-02-21T09:07:28.1999165Z .loc 1 26 97 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:26:97 2026-02-21T09:07:28.1999550Z setp.ge.s32 %p19, %r3285, %r3352; 2026-02-21T09:07:28.1999860Z shl.b32 %r3273, %r2, 1; 2026-02-21T09:07:28.2000037Z and.b32 %r3274, %r2, 96; 2026-02-21T09:07:28.2000210Z shl.b32 %r3275, %r7, 2; 2026-02-21T09:07:28.2000372Z and.b32 %r3276, %r2, 1; 2026-02-21T09:07:28.2000537Z shl.b32 %r3277, %r2, 5; 2026-02-21T09:07:28.2000699Z bfe.s32 %r3278, %r2, 2, 1; 2026-02-21T09:07:28.2000879Z shr.u32 %r3279, %r12, 6; 2026-02-21T09:07:28.2001044Z shl.b32 %r3280, %r5, 3; 2026-02-21T09:07:28.2001210Z shl.b32 %r3281, %r2, 6; 2026-02-21T09:07:28.2001377Z shl.b32 %r3282, %r11, 4; 2026-02-21T09:07:28.2001539Z shl.b32 %r3283, %r2, 7; 2026-02-21T09:07:28.2001707Z and.b32 %r3284, %r2, 16; 2026-02-21T09:07:28.2001886Z setp.eq.b32 %p53, %r12, 0; 2026-02-21T09:07:28.2002076Z setp.lt.u32 %p54, %r2, 128; 2026-02-21T09:07:28.2002256Z @%p19 bra $L__BB0_7; 2026-02-21T09:07:28.2002441Z // %bb.1: // %.lr.ph 2026-02-21T09:07:28.2002813Z .loc 1 0 97 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:0:97 2026-02-21T09:07:28.2003174Z shr.u32 %r9, %r2, 7; 2026-02-21T09:07:28.2003337Z and.b32 %r277, %r3273, 1022; 2026-02-21T09:07:28.2003539Z add.s32 %r2012, %r3020, 32768; 2026-02-21T09:07:28.2003733Z add.s32 %r13, %r2012, %r277; 
2026-02-21T09:07:28.2003911Z shl.b32 %r281, %r3274, 3; 2026-02-21T09:07:28.2004090Z and.b32 %r283, %r270, 112; 2026-02-21T09:07:28.2004262Z and.b32 %r284, %r3273, 6; 2026-02-21T09:07:28.2004442Z add.s32 %r285, %r2012, %r281; 2026-02-21T09:07:28.2004620Z add.s32 %r286, %r285, %r283; 2026-02-21T09:07:28.2004806Z add.s32 %r14, %r286, %r284; 2026-02-21T09:07:28.2004988Z or.b32 %r288, %r10, %r3275; 2026-02-21T09:07:28.2005170Z add.s32 %r15, %r2012, %r288; 2026-02-21T09:07:28.2005372Z xor.b32 %r289, %r288, 64; 2026-02-21T09:07:28.2005550Z add.s32 %r16, %r2012, %r289; 2026-02-21T09:07:28.2005741Z and.b32 %r290, %r3273, 508; 2026-02-21T09:07:28.2005925Z neg.s32 %r292, %r3276; 2026-02-21T09:07:28.2006112Z and.b32 %r293, %r292, 576; 2026-02-21T09:07:28.2006295Z xor.b32 %r294, %r293, %r290; 2026-02-21T09:07:28.2006611Z add.s32 %r17, %r2012, %r294; 2026-02-21T09:07:28.2006795Z and.b32 %r296, %r3277, 8032; 2026-02-21T09:07:28.2007095Z and.b32 %r298, %r3278, 144; 2026-02-21T09:07:28.2007270Z or.b32 %r300, %r3279, %r296; 2026-02-21T09:07:28.2007448Z or.b32 %r301, %r300, %r298; 2026-02-21T09:07:28.2007627Z add.s32 %r18, %r2012, %r301; 2026-02-21T09:07:28.2007812Z xor.b32 %r302, %r301, 16; 2026-02-21T09:07:28.2007992Z add.s32 %r19, %r2012, %r302; 2026-02-21T09:07:28.2008165Z and.b32 %r304, %r3280, 120; 2026-02-21T09:07:28.2008343Z or.b32 %r305, %r304, %r4; 2026-02-21T09:07:28.2008512Z shl.b32 %r306, %r305, 4; 2026-02-21T09:07:28.2008688Z add.s32 %r307, %r3020, 40960; 2026-02-21T09:07:28.2008866Z add.s32 %r1045, %r307, %r306; 2026-02-21T09:07:28.2009059Z and.b32 %r309, %r3281, 1536; 2026-02-21T09:07:28.2009232Z shl.b32 %r311, %r3274, 2; 2026-02-21T09:07:28.2009558Z add.s32 %r312, %r307, %r309; 2026-02-21T09:07:28.2009740Z add.s32 %r313, %r312, %r3282; 2026-02-21T09:07:28.2009915Z add.s32 %r354, %r313, %r311; 2026-02-21T09:07:28.2010094Z bfe.u32 %r314, %r2012, 4, 14; 2026-02-21T09:07:28.2010274Z cvt.u64.u32 %rd52, %r314; 2026-02-21T09:07:28.2010471Z or.b64 %rd74, %rd52, 
-4611685949674356736; 2026-02-21T09:07:28.2010685Z add.s32 %r315, %r3020, 34816; 2026-02-21T09:07:28.2010872Z bfe.u32 %r316, %r315, 4, 14; 2026-02-21T09:07:28.2011052Z cvt.u64.u32 %rd53, %r316; 2026-02-21T09:07:28.2011243Z or.b64 %rd75, %rd53, -4611685949674356736; 2026-02-21T09:07:28.2011461Z add.s32 %r317, %r3020, 36864; 2026-02-21T09:07:28.2011649Z bfe.u32 %r318, %r317, 4, 14; 2026-02-21T09:07:28.2011832Z cvt.u64.u32 %rd54, %r318; 2026-02-21T09:07:28.2012013Z or.b64 %rd76, %rd54, -4611685949674356736; 2026-02-21T09:07:28.2012225Z add.s32 %r319, %r3020, 38912; 2026-02-21T09:07:28.2012398Z bfe.u32 %r320, %r319, 4, 14; 2026-02-21T09:07:28.2012591Z cvt.u64.u32 %rd55, %r320; 2026-02-21T09:07:28.2012849Z or.b64 %rd77, %rd55, -4611685949674356736; 2026-02-21T09:07:28.2013064Z and.b32 %r322, %r3283, 1920; 2026-02-21T09:07:28.2013238Z and.b32 %r323, %r3281, 30720; 2026-02-21T09:07:28.2013425Z or.b32 %r325, %r322, %r323; 2026-02-21T09:07:28.2013612Z xor.b32 %r326, %r3282, %r3284; 2026-02-21T09:07:28.2013795Z or.b32 %r327, %r325, %r326; 2026-02-21T09:07:28.2013977Z add.s32 %r22, %r3020, %r327; 2026-02-21T09:07:28.2014153Z xor.b32 %r328, %r327, 32; 2026-02-21T09:07:28.2014329Z add.s32 %r23, %r3020, %r328; 2026-02-21T09:07:28.2014501Z xor.b32 %r329, %r327, 64; 2026-02-21T09:07:28.2014678Z add.s32 %r24, %r3020, %r329; 2026-02-21T09:07:28.2014852Z xor.b32 %r330, %r327, 96; 2026-02-21T09:07:28.2015045Z add.s32 %r25, %r3020, %r330; 2026-02-21T09:07:28.2015234Z add.s32 %r331, %r312, %r311; 2026-02-21T09:07:28.2015413Z add.s32 %r1325, %r331, %r3282; 2026-02-21T09:07:28.2015758Z .loc 1 26 97 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:26:97 2026-02-21T09:07:28.2016137Z shl.b32 %r332, %r6, 10; 2026-02-21T09:07:28.2016612Z .loc 1 47 110 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:47:110 2026-02-21T09:07:28.2017001Z or.b32 %r27, %r332, %r11; 2026-02-21T09:07:28.2017326Z .loc 1 26 97 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:26:97 
2026-02-21T09:07:28.2017691Z shl.b32 %r333, %r10, 13; 2026-02-21T09:07:28.2018013Z .loc 1 47 110 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:47:110 2026-02-21T09:07:28.2018378Z or.b32 %r28, %r333, %r8; 2026-02-21T09:07:28.2018598Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:07:28.2018890Z // Child Loop BB0_3 Depth 2 2026-02-21T09:07:28.2019156Z // Child Loop BB0_5 Depth 2 2026-02-21T09:07:28.2019540Z .loc 1 32 35 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:32:35 2026-02-21T09:07:28.2019904Z shr.s32 %r335, %r3285, 31; 2026-02-21T09:07:28.2020081Z shr.u32 %r336, %r335, 21; 2026-02-21T09:07:28.2020268Z add.s32 %r337, %r3285, %r336; 2026-02-21T09:07:28.2020539Z shr.s32 %r338, %r337, 11; 2026-02-21T09:07:28.2020856Z .loc 1 33 33 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:33:33 2026-02-21T09:07:28.2021219Z shl.b32 %r339, %r338, 6; 2026-02-21T09:07:28.2021535Z .loc 1 34 39 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:34:39 2026-02-21T09:07:28.2021884Z sub.s32 %r340, 64, %r339; 2026-02-21T09:07:28.2022196Z .loc 1 34 52 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:34:52 2026-02-21T09:07:28.2022550Z min.s32 %r341, %r340, 64; 2026-02-21T09:07:28.2022857Z .loc 1 35 45 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:35:45 2026-02-21T09:07:28.2023286Z and.b32 %r342, %r337, -2048; 2026-02-21T09:07:28.2023531Z sub.s32 %r343, %r3285, %r342; 2026-02-21T09:07:28.2023867Z .loc 1 36 51 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:36:51 2026-02-21T09:07:28.2024221Z div.s32 %r344, %r343, %r341; 2026-02-21T09:07:28.2024543Z .loc 1 35 64 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:35:64 2026-02-21T09:07:28.2024903Z mul.lo.s32 %r345, %r344, %r341; 2026-02-21T09:07:28.2025092Z sub.s32 %r346, %r343, %r345; 2026-02-21T09:07:28.2025411Z .loc 1 35 30 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:35:30 2026-02-21T09:07:28.2025761Z add.s32 %r347, %r346, %r339; 
2026-02-21T09:07:28.2026092Z .loc 1 37 27 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:37:27 2026-02-21T09:07:28.2026441Z shl.b32 %r1276, %r347, 6; 2026-02-21T09:07:28.2026895Z .loc 1 39 27 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:39:27 2026-02-21T09:07:28.2027344Z shl.b32 %r31, %r344, 8; 2026-02-21T09:07:28.2027659Z .loc 1 40 32 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:40:32 2026-02-21T09:07:28.2028016Z or.b32 %r32, %r31, %r8; 2026-02-21T09:07:28.2028408Z .loc 1 47 110 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:47:110 2026-02-21T09:07:28.2028786Z shl.b32 %r348, %r347, 16; 2026-02-21T09:07:28.2028958Z or.b32 %r349, %r27, %r348; 2026-02-21T09:07:28.2029150Z mad.wide.s32 %rd101, %r349, 2, %rd30; 2026-02-21T09:07:28.2029368Z add.s32 %r3286, %r28, %r31; 2026-02-21T09:07:28.2029549Z mov.b32 %r3287, 0f00000000; 2026-02-21T09:07:28.2038663Z mov.b64 %rd102, -8; 2026-02-21T09:07:28.2038882Z mov.b32 %r3288, %r3287; 2026-02-21T09:07:28.2039074Z mov.b32 %r3289, %r3287; 2026-02-21T09:07:28.2039260Z mov.b32 %r3290, %r3287; 2026-02-21T09:07:28.2039440Z mov.b32 %r3291, %r3287; 2026-02-21T09:07:28.2039617Z mov.b32 %r3292, %r3287; 2026-02-21T09:07:28.2039790Z mov.b32 %r3293, %r3287; 2026-02-21T09:07:28.2039964Z mov.b32 %r3294, %r3287; 2026-02-21T09:07:28.2040156Z mov.b32 %r3295, %r3287; 2026-02-21T09:07:28.2040327Z mov.b32 %r3296, %r3287; 2026-02-21T09:07:28.2040503Z mov.b32 %r3297, %r3287; 2026-02-21T09:07:28.2040669Z mov.b32 %r3298, %r3287; 2026-02-21T09:07:28.2040840Z mov.b32 %r3299, %r3287; 2026-02-21T09:07:28.2041015Z mov.b32 %r3300, %r3287; 2026-02-21T09:07:28.2041187Z mov.b32 %r3301, %r3287; 2026-02-21T09:07:28.2041347Z mov.b32 %r3302, %r3287; 2026-02-21T09:07:28.2041515Z mov.b32 %r3303, %r3287; 2026-02-21T09:07:28.2041678Z mov.b32 %r3304, %r3287; 2026-02-21T09:07:28.2041846Z mov.b32 %r3305, %r3287; 2026-02-21T09:07:28.2042014Z mov.b32 %r3306, %r3287; 2026-02-21T09:07:28.2042175Z mov.b32 %r3307, %r3287; 
2026-02-21T09:07:28.2042344Z mov.b32 %r3308, %r3287; 2026-02-21T09:07:28.2042505Z mov.b32 %r3309, %r3287; 2026-02-21T09:07:28.2042676Z mov.b32 %r3310, %r3287; 2026-02-21T09:07:28.2042852Z mov.b32 %r3311, %r3287; 2026-02-21T09:07:28.2043026Z mov.b32 %r3312, %r3287; 2026-02-21T09:07:28.2043196Z mov.b32 %r3313, %r3287; 2026-02-21T09:07:28.2043376Z mov.b32 %r3314, %r3287; 2026-02-21T09:07:28.2043540Z mov.b32 %r3315, %r3287; 2026-02-21T09:07:28.2043872Z mov.b32 %r3316, %r3287; 2026-02-21T09:07:28.2044045Z mov.b32 %r3317, %r3287; 2026-02-21T09:07:28.2044207Z mov.b32 %r3318, %r3287; 2026-02-21T09:07:28.2044436Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T09:07:28.2044741Z // => This Inner Loop Header: Depth=2 2026-02-21T09:07:28.2045153Z .loc 1 55 80 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:55:80 2026-02-21T09:07:28.2045518Z // begin inline asm 2026-02-21T09:07:28.2045692Z mov.u16 %rs1, 0x0; 2026-02-21T09:07:28.2045865Z ld.global.b16 { %rs1 }, [ %rd101 + 0 ]; 2026-02-21T09:07:28.2046079Z // end inline asm 2026-02-21T09:07:28.2046653Z .loc 1 59 32 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:59:32 2026-02-21T09:07:28.2047122Z bar.sync 0; 2026-02-21T09:07:28.2047293Z st.shared.b16 [%r13], %rs1; 2026-02-21T09:07:28.2047495Z bar.sync 0; 2026-02-21T09:07:28.2047663Z ld.shared.b16 %rs5, [%r14]; 2026-02-21T09:07:28.2047855Z ld.shared.b16 %rs6, [%r14+128]; 2026-02-21T09:07:28.2048061Z ld.shared.b16 %rs7, [%r14+8]; 2026-02-21T09:07:28.2048265Z ld.shared.b16 %rs8, [%r14+136]; 2026-02-21T09:07:28.2048473Z cvt.f32.bf16 %r642, %rs5; 2026-02-21T09:07:28.2048658Z cvt.f32.bf16 %r643, %rs6; 2026-02-21T09:07:28.2048833Z cvt.f32.bf16 %r644, %rs7; 2026-02-21T09:07:28.2049013Z cvt.f32.bf16 %r645, %rs8; 2026-02-21T09:07:28.2049345Z .loc 1 61 34 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:61:34 2026-02-21T09:07:28.2049729Z cvt.s64.s32 %rd66, %r3286; 2026-02-21T09:07:28.2049919Z add.s64 %rd58, %rd31, %rd66; 2026-02-21T09:07:28.2050260Z .loc 1 
61 87 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:61:87 2026-02-21T09:07:28.2050707Z // begin inline asm 2026-02-21T09:07:28.2050889Z mov.u16 %rs2, 0x0; 2026-02-21T09:07:28.2051068Z ld.global.b16 { %rs2 }, [ %rd58 + 0 ]; 2026-02-21T09:07:28.2051279Z // end inline asm 2026-02-21T09:07:28.2051457Z shr.u16 %rs9, %rs2, 8; 2026-02-21T09:07:28.2051776Z .loc 1 69 28 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:69:28 2026-02-21T09:07:28.2052130Z bar.sync 0; 2026-02-21T09:07:28.2052286Z st.shared.b8 [%r15], %rs2; 2026-02-21T09:07:28.2052492Z st.shared.b8 [%r16+512], %rs9; 2026-02-21T09:07:28.2052675Z bar.sync 0; 2026-02-21T09:07:28.2052838Z ld.shared.b32 %r1246, [%r17]; 2026-02-21T09:07:28.2053033Z prmt.b32 %r1247, %r1246, 0, 0x7770U; 2026-02-21T09:07:28.2053248Z cvt.u16.u32 %rs10, %r1247; 2026-02-21T09:07:28.2053435Z prmt.b32 %r1248, %r1246, 0, 0x7771U; 2026-02-21T09:07:28.2053631Z cvt.u16.u32 %rs11, %r1248; 2026-02-21T09:07:28.2053816Z prmt.b32 %r1249, %r1246, 0, 0x7772U; 2026-02-21T09:07:28.2054018Z cvt.u16.u32 %rs12, %r1249; 2026-02-21T09:07:28.2054201Z prmt.b32 %r1250, %r1246, 0, 0x7773U; 2026-02-21T09:07:28.2054394Z cvt.u16.u32 %rs13, %r1250; 2026-02-21T09:07:28.2054738Z .loc 1 64 28 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:64:28 2026-02-21T09:07:28.2055092Z shl.b16 %rs14, %rs10, 4; 2026-02-21T09:07:28.2055277Z shl.b16 %rs15, %rs11, 4; 2026-02-21T09:07:28.2055455Z shl.b16 %rs16, %rs12, 4; 2026-02-21T09:07:28.2055624Z shl.b16 %rs17, %rs13, 4; 2026-02-21T09:07:28.2055955Z .loc 1 79 58 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:79:58 2026-02-21T09:07:28.2056323Z selp.b16 %rs18, %rs14, %rs10, %p53; 2026-02-21T09:07:28.2056676Z cvt.s16.s8 %rs19, %rs18; 2026-02-21T09:07:28.2056849Z shr.s16 %rs20, %rs19, 4; 2026-02-21T09:07:28.2057043Z selp.b16 %rs21, %rs15, %rs11, %p53; 2026-02-21T09:07:28.2057245Z cvt.s16.s8 %rs22, %rs21; 2026-02-21T09:07:28.2057424Z shr.s16 %rs23, %rs22, 4; 2026-02-21T09:07:28.2057609Z 
selp.b16 %rs24, %rs16, %rs12, %p53; 2026-02-21T09:07:28.2057816Z cvt.s16.s8 %rs25, %rs24; 2026-02-21T09:07:28.2058000Z shr.s16 %rs26, %rs25, 4; 2026-02-21T09:07:28.2058269Z selp.b16 %rs27, %rs17, %rs13, %p53; 2026-02-21T09:07:28.2058473Z cvt.s16.s8 %rs28, %rs27; 2026-02-21T09:07:28.2058656Z shr.s16 %rs29, %rs28, 4; 2026-02-21T09:07:28.2058997Z .loc 1 84 32 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:84:32 2026-02-21T09:07:28.2059376Z cvt.rn.f32.s16 %r1251, %rs20; 2026-02-21T09:07:28.2059572Z cvt.rn.f32.s16 %r1252, %rs23; 2026-02-21T09:07:28.2059770Z cvt.rn.f32.s16 %r1253, %rs26; 2026-02-21T09:07:28.2059949Z cvt.rn.f32.s16 %r1254, %rs29; 2026-02-21T09:07:28.2060129Z bar.sync 0; 2026-02-21T09:07:28.2060284Z st.shared.b32 [%r18], %r1251; 2026-02-21T09:07:28.2060495Z st.shared.b32 [%r18+8], %r1252; 2026-02-21T09:07:28.2060691Z st.shared.b32 [%r19], %r1253; 2026-02-21T09:07:28.2060976Z st.shared.b32 [%r19+8], %r1254; 2026-02-21T09:07:28.2061306Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3287}; 2026-02-21T09:07:28.2061594Z bar.sync 0; 2026-02-21T09:07:28.2061756Z // begin inline asm 2026-02-21T09:07:28.2062041Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r782, %r814, %r846, %r878}, [%r354]; 2026-02-21T09:07:28.2062373Z // end inline asm 2026-02-21T09:07:28.2062524Z bar.sync 0; 2026-02-21T09:07:28.2062769Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3289}; 2026-02-21T09:07:28.2063051Z bar.sync 0; 2026-02-21T09:07:28.2063208Z // begin inline asm 2026-02-21T09:07:28.2063494Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r784, %r816, %r848, %r880}, [%r354]; 2026-02-21T09:07:28.2063827Z // end inline asm 2026-02-21T09:07:28.2063983Z bar.sync 0; 2026-02-21T09:07:28.2064206Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3288}; 2026-02-21T09:07:28.2064483Z bar.sync 0; 2026-02-21T09:07:28.2064626Z // begin inline asm 2026-02-21T09:07:28.2064921Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r783, %r815, %r847, %r879}, [%r354]; 
2026-02-21T09:07:28.2065315Z // end inline asm 2026-02-21T09:07:28.2065473Z bar.sync 0; 2026-02-21T09:07:28.2065684Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3290}; 2026-02-21T09:07:28.2065960Z bar.sync 0; 2026-02-21T09:07:28.2066108Z // begin inline asm 2026-02-21T09:07:28.2066397Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r785, %r817, %r849, %r881}, [%r354]; 2026-02-21T09:07:28.2066854Z // end inline asm 2026-02-21T09:07:28.2067000Z bar.sync 0; 2026-02-21T09:07:28.2067220Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3291}; 2026-02-21T09:07:28.2067487Z bar.sync 0; 2026-02-21T09:07:28.2067635Z // begin inline asm 2026-02-21T09:07:28.2067903Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r786, %r818, %r850, %r882}, [%r354]; 2026-02-21T09:07:28.2068224Z // end inline asm 2026-02-21T09:07:28.2068447Z bar.sync 0; 2026-02-21T09:07:28.2068668Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3293}; 2026-02-21T09:07:28.2068943Z bar.sync 0; 2026-02-21T09:07:28.2069088Z // begin inline asm 2026-02-21T09:07:28.2069359Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r788, %r820, %r852, %r884}, [%r354]; 2026-02-21T09:07:28.2069691Z // end inline asm 2026-02-21T09:07:28.2069851Z bar.sync 0; 2026-02-21T09:07:28.2070078Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3292}; 2026-02-21T09:07:28.2070352Z bar.sync 0; 2026-02-21T09:07:28.2070497Z // begin inline asm 2026-02-21T09:07:28.2070776Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r787, %r819, %r851, %r883}, [%r354]; 2026-02-21T09:07:28.2071096Z // end inline asm 2026-02-21T09:07:28.2071255Z bar.sync 0; 2026-02-21T09:07:28.2071476Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3294}; 2026-02-21T09:07:28.2071744Z bar.sync 0; 2026-02-21T09:07:28.2071906Z // begin inline asm 2026-02-21T09:07:28.2072185Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r789, %r821, %r853, %r885}, [%r354]; 2026-02-21T09:07:28.2072509Z // end inline asm 2026-02-21T09:07:28.2072661Z bar.sync 0; 
2026-02-21T09:07:28.2072886Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3295}; 2026-02-21T09:07:28.2073152Z bar.sync 0; 2026-02-21T09:07:28.2073300Z // begin inline asm 2026-02-21T09:07:28.2073675Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r790, %r822, %r854, %r886}, [%r354]; 2026-02-21T09:07:28.2073989Z // end inline asm 2026-02-21T09:07:28.2074143Z bar.sync 0; 2026-02-21T09:07:28.2074352Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3297}; 2026-02-21T09:07:28.2074623Z bar.sync 0; 2026-02-21T09:07:28.2074764Z // begin inline asm 2026-02-21T09:07:28.2075035Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r792, %r824, %r856, %r888}, [%r354]; 2026-02-21T09:07:28.2075354Z // end inline asm 2026-02-21T09:07:28.2075503Z bar.sync 0; 2026-02-21T09:07:28.2075722Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3296}; 2026-02-21T09:07:28.2075997Z bar.sync 0; 2026-02-21T09:07:28.2076146Z // begin inline asm 2026-02-21T09:07:28.2076700Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r791, %r823, %r855, %r887}, [%r354]; 2026-02-21T09:07:28.2077038Z // end inline asm 2026-02-21T09:07:28.2077185Z bar.sync 0; 2026-02-21T09:07:28.2077400Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3298}; 2026-02-21T09:07:28.2077673Z bar.sync 0; 2026-02-21T09:07:28.2077816Z // begin inline asm 2026-02-21T09:07:28.2078091Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r793, %r825, %r857, %r889}, [%r354]; 2026-02-21T09:07:28.2078424Z // end inline asm 2026-02-21T09:07:28.2078585Z bar.sync 0; 2026-02-21T09:07:28.2078805Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3299}; 2026-02-21T09:07:28.2079076Z bar.sync 0; 2026-02-21T09:07:28.2079219Z // begin inline asm 2026-02-21T09:07:28.2079508Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r794, %r826, %r858, %r890}, [%r354]; 2026-02-21T09:07:28.2079826Z // end inline asm 2026-02-21T09:07:28.2079979Z bar.sync 0; 2026-02-21T09:07:28.2080200Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3301}; 
2026-02-21T09:07:28.2080471Z bar.sync 0; 2026-02-21T09:07:28.2080689Z // begin inline asm 2026-02-21T09:07:28.2080960Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r796, %r828, %r860, %r892}, [%r354]; 2026-02-21T09:07:28.2081294Z // end inline asm 2026-02-21T09:07:28.2081440Z bar.sync 0; 2026-02-21T09:07:28.2081658Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3300}; 2026-02-21T09:07:28.2081922Z bar.sync 0; 2026-02-21T09:07:28.2082070Z // begin inline asm 2026-02-21T09:07:28.2082340Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r795, %r827, %r859, %r891}, [%r354]; 2026-02-21T09:07:28.2082652Z // end inline asm 2026-02-21T09:07:28.2082802Z bar.sync 0; 2026-02-21T09:07:28.2083014Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3302}; 2026-02-21T09:07:28.2083282Z bar.sync 0; 2026-02-21T09:07:28.2083422Z // begin inline asm 2026-02-21T09:07:28.2083692Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r797, %r829, %r861, %r893}, [%r354]; 2026-02-21T09:07:28.2084007Z // end inline asm 2026-02-21T09:07:28.2084162Z bar.sync 0; 2026-02-21T09:07:28.2084374Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3303}; 2026-02-21T09:07:28.2084643Z bar.sync 0; 2026-02-21T09:07:28.2084785Z // begin inline asm 2026-02-21T09:07:28.2085058Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r798, %r830, %r862, %r894}, [%r354]; 2026-02-21T09:07:28.2085367Z // end inline asm 2026-02-21T09:07:28.2085516Z bar.sync 0; 2026-02-21T09:07:28.2085721Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3305}; 2026-02-21T09:07:28.2085985Z bar.sync 0; 2026-02-21T09:07:28.2086127Z // begin inline asm 2026-02-21T09:07:28.2086390Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r800, %r832, %r864, %r896}, [%r354]; 2026-02-21T09:07:28.2086834Z // end inline asm 2026-02-21T09:07:28.2086985Z bar.sync 0; 2026-02-21T09:07:28.2087199Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3304}; 2026-02-21T09:07:28.2087458Z bar.sync 0; 2026-02-21T09:07:28.2087600Z // begin inline asm 
2026-02-21T09:07:28.2087867Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r799, %r831, %r863, %r895}, [%r354]; 2026-02-21T09:07:28.2088180Z // end inline asm 2026-02-21T09:07:28.2088320Z bar.sync 0; 2026-02-21T09:07:28.2088538Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3306}; 2026-02-21T09:07:28.2088907Z bar.sync 0; 2026-02-21T09:07:28.2089062Z // begin inline asm 2026-02-21T09:07:28.2089338Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r801, %r833, %r865, %r897}, [%r354]; 2026-02-21T09:07:28.2089662Z // end inline asm 2026-02-21T09:07:28.2089813Z bar.sync 0; 2026-02-21T09:07:28.2090031Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3307}; 2026-02-21T09:07:28.2090300Z bar.sync 0; 2026-02-21T09:07:28.2090440Z // begin inline asm 2026-02-21T09:07:28.2090712Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r802, %r834, %r866, %r898}, [%r354]; 2026-02-21T09:07:28.2091026Z // end inline asm 2026-02-21T09:07:28.2091175Z bar.sync 0; 2026-02-21T09:07:28.2091385Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3309}; 2026-02-21T09:07:28.2091825Z bar.sync 0; 2026-02-21T09:07:28.2091975Z // begin inline asm 2026-02-21T09:07:28.2092253Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r804, %r836, %r868, %r900}, [%r354]; 2026-02-21T09:07:28.2092571Z // end inline asm 2026-02-21T09:07:28.2092716Z bar.sync 0; 2026-02-21T09:07:28.2092927Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3308}; 2026-02-21T09:07:28.2093186Z bar.sync 0; 2026-02-21T09:07:28.2093329Z // begin inline asm 2026-02-21T09:07:28.2093592Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r803, %r835, %r867, %r899}, [%r354]; 2026-02-21T09:07:28.2093909Z // end inline asm 2026-02-21T09:07:28.2094056Z bar.sync 0; 2026-02-21T09:07:28.2094261Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3310}; 2026-02-21T09:07:28.2094551Z bar.sync 0; 2026-02-21T09:07:28.2094695Z // begin inline asm 2026-02-21T09:07:28.2094964Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r805, %r837, %r869, %r901}, [%r354]; 
2026-02-21T09:07:28.2095279Z // end inline asm 2026-02-21T09:07:28.2095428Z bar.sync 0; 2026-02-21T09:07:28.2095709Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3311}; 2026-02-21T09:07:28.2095977Z bar.sync 0; 2026-02-21T09:07:28.2096116Z // begin inline asm 2026-02-21T09:07:28.2096389Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r806, %r838, %r870, %r902}, [%r354]; 2026-02-21T09:07:28.2096857Z // end inline asm 2026-02-21T09:07:28.2097002Z bar.sync 0; 2026-02-21T09:07:28.2097216Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3313}; 2026-02-21T09:07:28.2097476Z bar.sync 0; 2026-02-21T09:07:28.2097630Z // begin inline asm 2026-02-21T09:07:28.2097898Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r808, %r840, %r872, %r904}, [%r354]; 2026-02-21T09:07:28.2098217Z // end inline asm 2026-02-21T09:07:28.2098376Z bar.sync 0; 2026-02-21T09:07:28.2098591Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3312}; 2026-02-21T09:07:28.2098855Z bar.sync 0; 2026-02-21T09:07:28.2098995Z // begin inline asm 2026-02-21T09:07:28.2099266Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r807, %r839, %r871, %r903}, [%r354]; 2026-02-21T09:07:28.2099576Z // end inline asm 2026-02-21T09:07:28.2099725Z bar.sync 0; 2026-02-21T09:07:28.2099931Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3314}; 2026-02-21T09:07:28.2100201Z bar.sync 0; 2026-02-21T09:07:28.2100342Z // begin inline asm 2026-02-21T09:07:28.2100610Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r809, %r841, %r873, %r905}, [%r354]; 2026-02-21T09:07:28.2100927Z // end inline asm 2026-02-21T09:07:28.2101069Z bar.sync 0; 2026-02-21T09:07:28.2101279Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3315}; 2026-02-21T09:07:28.2101538Z bar.sync 0; 2026-02-21T09:07:28.2101691Z // begin inline asm 2026-02-21T09:07:28.2101959Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r810, %r842, %r874, %r906}, [%r354]; 2026-02-21T09:07:28.2102272Z // end inline asm 2026-02-21T09:07:28.2102413Z bar.sync 0; 
2026-02-21T09:07:28.2102623Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3317}; 2026-02-21T09:07:28.2102889Z bar.sync 0; 2026-02-21T09:07:28.2103032Z // begin inline asm 2026-02-21T09:07:28.2103300Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r812, %r844, %r876, %r908}, [%r354]; 2026-02-21T09:07:28.2103713Z // end inline asm 2026-02-21T09:07:28.2103861Z bar.sync 0; 2026-02-21T09:07:28.2104069Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3316}; 2026-02-21T09:07:28.2104333Z bar.sync 0; 2026-02-21T09:07:28.2104472Z // begin inline asm 2026-02-21T09:07:28.2104739Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r811, %r843, %r875, %r907}, [%r354]; 2026-02-21T09:07:28.2105050Z // end inline asm 2026-02-21T09:07:28.2105196Z bar.sync 0; 2026-02-21T09:07:28.2105409Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3318}; 2026-02-21T09:07:28.2105669Z bar.sync 0; 2026-02-21T09:07:28.2105810Z // begin inline asm 2026-02-21T09:07:28.2106073Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r813, %r845, %r877, %r909}, [%r354]; 2026-02-21T09:07:28.2106606Z // end inline asm 2026-02-21T09:07:28.2106835Z $L__tmp1: 2026-02-21T09:07:28.2107208Z .loc 2 291 36 // standard.py:291:36 @[ cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:91:40 ] 2026-02-21T09:07:28.2107638Z // begin inline asm 2026-02-21T09:07:28.2107827Z fence.proxy.async.shared::cta; 2026-02-21T09:07:28.2108023Z // end inline asm 2026-02-21T09:07:28.2108192Z shfl.sync.idx.b32 %r1255, %r5, 0, 31, -1; 2026-02-21T09:07:28.2108513Z wgmma.fence.sync.aligned; 2026-02-21T09:07:28.2108699Z mov.pred %p20, -1; 2026-02-21T09:07:28.2108865Z // begin inline asm 2026-02-21T09:07:28.2109628Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r782,%r783,%r784,%r785,%r786,%r787,%r788,%r789,%r790,%r791,%r792,%r793,%r794,%r795,%r796,%r797,%r798,%r799,%r800,%r801,%r802,%r803,%r804,%r805,%r806,%r807,%r808,%r809,%r810,%r811,%r812,%r813}, {%r642,%r643,%r644,%r645}, %rd74, %p20, 1, 1; 2026-02-21T09:07:28.2110438Z // end 
inline asm 2026-02-21T09:07:28.2110594Z // begin inline asm 2026-02-21T09:07:28.2111421Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r814,%r815,%r816,%r817,%r818,%r819,%r820,%r821,%r822,%r823,%r824,%r825,%r826,%r827,%r828,%r829,%r830,%r831,%r832,%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845}, {%r642,%r643,%r644,%r645}, %rd75, %p20, 1, 1; 2026-02-21T09:07:28.2112224Z // end inline asm 2026-02-21T09:07:28.2112374Z // begin inline asm 2026-02-21T09:07:28.2113115Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r846,%r847,%r848,%r849,%r850,%r851,%r852,%r853,%r854,%r855,%r856,%r857,%r858,%r859,%r860,%r861,%r862,%r863,%r864,%r865,%r866,%r867,%r868,%r869,%r870,%r871,%r872,%r873,%r874,%r875,%r876,%r877}, {%r642,%r643,%r644,%r645}, %rd76, %p20, 1, 1; 2026-02-21T09:07:28.2113900Z // end inline asm 2026-02-21T09:07:28.2114045Z // begin inline asm 2026-02-21T09:07:28.2114787Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r878,%r879,%r880,%r881,%r882,%r883,%r884,%r885,%r886,%r887,%r888,%r889,%r890,%r891,%r892,%r893,%r894,%r895,%r896,%r897,%r898,%r899,%r900,%r901,%r902,%r903,%r904,%r905,%r906,%r907,%r908,%r909}, {%r642,%r643,%r644,%r645}, %rd77, %p20, 1, 1; 2026-02-21T09:07:28.2115575Z // end inline asm 2026-02-21T09:07:28.2115744Z wgmma.commit_group.sync.aligned; 2026-02-21T09:07:28.2115961Z mov.b32 %r1209, 0; 2026-02-21T09:07:28.2116118Z mov.b32 %r911, %r1209; 2026-02-21T09:07:28.2116289Z mov.b32 %r912, %r1209; 2026-02-21T09:07:28.2116571Z mov.b32 %r910, %r2012; 2026-02-21T09:07:28.2116743Z // begin inline asm 2026-02-21T09:07:28.2118547Z // wait for regs: 
%r782,%r783,%r784,%r785,%r786,%r787,%r788,%r789,%r790,%r791,%r792,%r793,%r794,%r795,%r796,%r797,%r798,%r799,%r800,%r801,%r802,%r803,%r804,%r805,%r806,%r807,%r808,%r809,%r810,%r811,%r812,%r813,%r814,%r815,%r816,%r817,%r818,%r819,%r820,%r821,%r822,%r823,%r824,%r825,%r826,%r827,%r828,%r829,%r830,%r831,%r832,%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848,%r849,%r850,%r851,%r852,%r853,%r854,%r855,%r856,%r857,%r858,%r859,%r860,%r861,%r862,%r863,%r864,%r865,%r866,%r867,%r868,%r869,%r870,%r871,%r872,%r873,%r874,%r875,%r876,%r877,%r878,%r879,%r880,%r881,%r882,%r883,%r884,%r885,%r886,%r887,%r888,%r889,%r890,%r891,%r892,%r893,%r894,%r895,%r896,%r897,%r898,%r899,%r900,%r901,%r902,%r903,%r904,%r905,%r906,%r907,%r908,%r909,%r910,%r911,%r912 2026-02-21T09:07:28.2120501Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:07:28.2120705Z // end inline asm 2026-02-21T09:07:28.2120853Z $L__tmp2: 2026-02-21T09:07:28.2121151Z .loc 1 55 80 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:55:80 2026-02-21T09:07:28.2121525Z add.s64 %rd63, %rd101, 16; 2026-02-21T09:07:28.2121702Z // begin inline asm 2026-02-21T09:07:28.2121861Z mov.u16 %rs3, 0x0; 2026-02-21T09:07:28.2122027Z ld.global.b16 { %rs3 }, [ %rd63 + 0 ]; 2026-02-21T09:07:28.2122232Z // end inline asm 2026-02-21T09:07:28.2122527Z .loc 1 59 32 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:59:32 2026-02-21T09:07:28.2122956Z bar.sync 0; 2026-02-21T09:07:28.2123172Z st.shared.b16 [%r13], %rs3; 2026-02-21T09:07:28.2123351Z bar.sync 0; 2026-02-21T09:07:28.2123520Z ld.shared.b16 %rs30, [%r14]; 2026-02-21T09:07:28.2123712Z ld.shared.b16 %rs31, [%r14+128]; 2026-02-21T09:07:28.2123920Z ld.shared.b16 %rs32, [%r14+8]; 2026-02-21T09:07:28.2124119Z ld.shared.b16 %rs33, [%r14+136]; 2026-02-21T09:07:28.2124319Z cvt.f32.bf16 %r1172, %rs30; 2026-02-21T09:07:28.2124495Z cvt.f32.bf16 %r1173, %rs31; 2026-02-21T09:07:28.2124675Z cvt.f32.bf16 %r1174, %rs32; 2026-02-21T09:07:28.2124848Z 
cvt.f32.bf16 %r1175, %rs33; 2026-02-21T09:07:28.2125024Z cvt.u32.u64 %r1257, %rd102; 2026-02-21T09:07:28.2125199Z add.s32 %r1258, %r1257, 12; 2026-02-21T09:07:28.2125517Z .loc 1 61 62 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:61:62 2026-02-21T09:07:28.2125877Z or.b32 %r1259, %r9, %r1258; 2026-02-21T09:07:28.2126048Z shl.b32 %r1260, %r1259, 13; 2026-02-21T09:07:28.2126233Z add.s32 %r1261, %r1260, %r32; 2026-02-21T09:07:28.2126764Z .loc 1 61 34 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:61:34 2026-02-21T09:07:28.2127143Z cvt.s64.s32 %rd67, %r1261; 2026-02-21T09:07:28.2127329Z add.s64 %rd64, %rd31, %rd67; 2026-02-21T09:07:28.2127654Z .loc 1 61 87 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:61:87 2026-02-21T09:07:28.2128006Z // begin inline asm 2026-02-21T09:07:28.2128162Z mov.u16 %rs4, 0x0; 2026-02-21T09:07:28.2128340Z ld.global.b16 { %rs4 }, [ %rd64 + 0 ]; 2026-02-21T09:07:28.2128542Z // end inline asm 2026-02-21T09:07:28.2128697Z shr.u16 %rs34, %rs4, 8; 2026-02-21T09:07:28.2129007Z .loc 1 69 28 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:69:28 2026-02-21T09:07:28.2129363Z bar.sync 0; 2026-02-21T09:07:28.2129510Z st.shared.b8 [%r15], %rs4; 2026-02-21T09:07:28.2129699Z st.shared.b8 [%r16+512], %rs34; 2026-02-21T09:07:28.2129882Z bar.sync 0; 2026-02-21T09:07:28.2130032Z ld.shared.b32 %r1262, [%r17]; 2026-02-21T09:07:28.2130227Z prmt.b32 %r1263, %r1262, 0, 0x7770U; 2026-02-21T09:07:28.2130425Z cvt.u16.u32 %rs35, %r1263; 2026-02-21T09:07:28.2130605Z prmt.b32 %r1264, %r1262, 0, 0x7771U; 2026-02-21T09:07:28.2130812Z cvt.u16.u32 %rs36, %r1264; 2026-02-21T09:07:28.2130994Z prmt.b32 %r1265, %r1262, 0, 0x7772U; 2026-02-21T09:07:28.2131186Z cvt.u16.u32 %rs37, %r1265; 2026-02-21T09:07:28.2131363Z prmt.b32 %r1266, %r1262, 0, 0x7773U; 2026-02-21T09:07:28.2131559Z cvt.u16.u32 %rs38, %r1266; 2026-02-21T09:07:28.2131874Z .loc 1 64 28 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:64:28 
2026-02-21T09:07:28.2132230Z shl.b16 %rs39, %rs35, 4; 2026-02-21T09:07:28.2132403Z shl.b16 %rs40, %rs36, 4; 2026-02-21T09:07:28.2132576Z shl.b16 %rs41, %rs37, 4; 2026-02-21T09:07:28.2132742Z shl.b16 %rs42, %rs38, 4; 2026-02-21T09:07:28.2133051Z .loc 1 79 58 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:79:58 2026-02-21T09:07:28.2133418Z selp.b16 %rs43, %rs39, %rs35, %p53; 2026-02-21T09:07:28.2133622Z cvt.s16.s8 %rs44, %rs43; 2026-02-21T09:07:28.2133803Z shr.s16 %rs45, %rs44, 4; 2026-02-21T09:07:28.2133979Z selp.b16 %rs46, %rs40, %rs36, %p53; 2026-02-21T09:07:28.2134256Z cvt.s16.s8 %rs47, %rs46; 2026-02-21T09:07:28.2134421Z shr.s16 %rs48, %rs47, 4; 2026-02-21T09:07:28.2134598Z selp.b16 %rs49, %rs41, %rs37, %p53; 2026-02-21T09:07:28.2134788Z cvt.s16.s8 %rs50, %rs49; 2026-02-21T09:07:28.2134969Z shr.s16 %rs51, %rs50, 4; 2026-02-21T09:07:28.2135143Z selp.b16 %rs52, %rs42, %rs38, %p53; 2026-02-21T09:07:28.2135338Z cvt.s16.s8 %rs53, %rs52; 2026-02-21T09:07:28.2135506Z shr.s16 %rs54, %rs53, 4; 2026-02-21T09:07:28.2135822Z .loc 1 84 32 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:84:32 2026-02-21T09:07:28.2136186Z cvt.rn.f32.s16 %r1267, %rs45; 2026-02-21T09:07:28.2136380Z cvt.rn.f32.s16 %r1268, %rs48; 2026-02-21T09:07:28.2136783Z cvt.rn.f32.s16 %r1269, %rs51; 2026-02-21T09:07:28.2137040Z cvt.rn.f32.s16 %r1270, %rs54; 2026-02-21T09:07:28.2137218Z bar.sync 0; 2026-02-21T09:07:28.2137366Z st.shared.b32 [%r18], %r1267; 2026-02-21T09:07:28.2137552Z st.shared.b32 [%r18+8], %r1268; 2026-02-21T09:07:28.2137751Z st.shared.b32 [%r19], %r1269; 2026-02-21T09:07:28.2137933Z st.shared.b32 [%r19+8], %r1270; 2026-02-21T09:07:28.2138126Z $L__tmp3: 2026-02-21T09:07:28.2138488Z .loc 2 291 36 // standard.py:291:36 @[ cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:91:40 ] 2026-02-21T09:07:28.2139036Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r782, %r814, %r846, %r878}; 2026-02-21T09:07:28.2139369Z bar.sync 0; 2026-02-21T09:07:28.2139534Z // begin 
inline asm 2026-02-21T09:07:28.2139773Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3287}, [%r1045]; 2026-02-21T09:07:28.2140053Z // end inline asm 2026-02-21T09:07:28.2140211Z bar.sync 0; 2026-02-21T09:07:28.2140472Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r784, %r816, %r848, %r880}; 2026-02-21T09:07:28.2140885Z bar.sync 0; 2026-02-21T09:07:28.2141039Z // begin inline asm 2026-02-21T09:07:28.2141277Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3289}, [%r1045]; 2026-02-21T09:07:28.2141549Z // end inline asm 2026-02-21T09:07:28.2141699Z bar.sync 0; 2026-02-21T09:07:28.2141953Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r783, %r815, %r847, %r879}; 2026-02-21T09:07:28.2142271Z bar.sync 0; 2026-02-21T09:07:28.2142412Z // begin inline asm 2026-02-21T09:07:28.2142641Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3288}, [%r1045]; 2026-02-21T09:07:28.2142922Z // end inline asm 2026-02-21T09:07:28.2143068Z bar.sync 0; 2026-02-21T09:07:28.2143324Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r785, %r817, %r849, %r881}; 2026-02-21T09:07:28.2143638Z bar.sync 0; 2026-02-21T09:07:28.2143785Z // begin inline asm 2026-02-21T09:07:28.2144010Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3290}, [%r1045]; 2026-02-21T09:07:28.2144290Z // end inline asm 2026-02-21T09:07:28.2144437Z bar.sync 0; 2026-02-21T09:07:28.2144698Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r786, %r818, %r850, %r882}; 2026-02-21T09:07:28.2145014Z bar.sync 0; 2026-02-21T09:07:28.2145159Z // begin inline asm 2026-02-21T09:07:28.2145390Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3291}, [%r1045]; 2026-02-21T09:07:28.2145659Z // end inline asm 2026-02-21T09:07:28.2145813Z bar.sync 0; 2026-02-21T09:07:28.2146070Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r788, %r820, %r852, %r884}; 2026-02-21T09:07:28.2146404Z bar.sync 0; 2026-02-21T09:07:28.2146686Z // begin inline asm 2026-02-21T09:07:28.2146920Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3293}, [%r1045]; 
2026-02-21T09:07:28.2147191Z // end inline asm 2026-02-21T09:07:28.2147337Z bar.sync 0; 2026-02-21T09:07:28.2147595Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r787, %r819, %r851, %r883}; 2026-02-21T09:07:28.2147907Z bar.sync 0; 2026-02-21T09:07:28.2148056Z // begin inline asm 2026-02-21T09:07:28.2148283Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3292}, [%r1045]; 2026-02-21T09:07:28.2148635Z // end inline asm 2026-02-21T09:07:28.2148782Z bar.sync 0; 2026-02-21T09:07:28.2149047Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r789, %r821, %r853, %r885}; 2026-02-21T09:07:28.2149473Z bar.sync 0; 2026-02-21T09:07:28.2149616Z // begin inline asm 2026-02-21T09:07:28.2149849Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3294}, [%r1045]; 2026-02-21T09:07:28.2150116Z // end inline asm 2026-02-21T09:07:28.2150267Z bar.sync 0; 2026-02-21T09:07:28.2150522Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r790, %r822, %r854, %r886}; 2026-02-21T09:07:28.2150864Z bar.sync 0; 2026-02-21T09:07:28.2151010Z // begin inline asm 2026-02-21T09:07:28.2151244Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3295}, [%r1045]; 2026-02-21T09:07:28.2151514Z // end inline asm 2026-02-21T09:07:28.2151669Z bar.sync 0; 2026-02-21T09:07:28.2152040Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r792, %r824, %r856, %r888}; 2026-02-21T09:07:28.2152425Z bar.sync 0; 2026-02-21T09:07:28.2152576Z // begin inline asm 2026-02-21T09:07:28.2152804Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3297}, [%r1045]; 2026-02-21T09:07:28.2153085Z // end inline asm 2026-02-21T09:07:28.2153236Z bar.sync 0; 2026-02-21T09:07:28.2153496Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r791, %r823, %r855, %r887}; 2026-02-21T09:07:28.2153820Z bar.sync 0; 2026-02-21T09:07:28.2153972Z // begin inline asm 2026-02-21T09:07:28.2154201Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3296}, [%r1045]; 2026-02-21T09:07:28.2154466Z // end inline asm 2026-02-21T09:07:28.2154616Z bar.sync 0; 
2026-02-21T09:07:28.2154867Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r793, %r825, %r857, %r889}; 2026-02-21T09:07:28.2155185Z bar.sync 0; 2026-02-21T09:07:28.2155324Z // begin inline asm 2026-02-21T09:07:28.2155551Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3298}, [%r1045]; 2026-02-21T09:07:28.2155817Z // end inline asm 2026-02-21T09:07:28.2155970Z bar.sync 0; 2026-02-21T09:07:28.2156310Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r794, %r826, %r858, %r890}; 2026-02-21T09:07:28.2156772Z bar.sync 0; 2026-02-21T09:07:28.2156924Z // begin inline asm 2026-02-21T09:07:28.2157149Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3299}, [%r1045]; 2026-02-21T09:07:28.2157421Z // end inline asm 2026-02-21T09:07:28.2157567Z bar.sync 0; 2026-02-21T09:07:28.2157830Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r796, %r828, %r860, %r892}; 2026-02-21T09:07:28.2158141Z bar.sync 0; 2026-02-21T09:07:28.2158290Z // begin inline asm 2026-02-21T09:07:28.2158524Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3301}, [%r1045]; 2026-02-21T09:07:28.2158807Z // end inline asm 2026-02-21T09:07:28.2158956Z bar.sync 0; 2026-02-21T09:07:28.2159210Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r795, %r827, %r859, %r891}; 2026-02-21T09:07:28.2159529Z bar.sync 0; 2026-02-21T09:07:28.2159675Z // begin inline asm 2026-02-21T09:07:28.2159918Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3300}, [%r1045]; 2026-02-21T09:07:28.2160186Z // end inline asm 2026-02-21T09:07:28.2160338Z bar.sync 0; 2026-02-21T09:07:28.2160606Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r797, %r829, %r861, %r893}; 2026-02-21T09:07:28.2160927Z bar.sync 0; 2026-02-21T09:07:28.2161075Z // begin inline asm 2026-02-21T09:07:28.2161303Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3302}, [%r1045]; 2026-02-21T09:07:28.2161573Z // end inline asm 2026-02-21T09:07:28.2161719Z bar.sync 0; 2026-02-21T09:07:28.2161980Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r798, %r830, %r862, %r894}; 
2026-02-21T09:07:28.2162294Z bar.sync 0; 2026-02-21T09:07:28.2162447Z // begin inline asm 2026-02-21T09:07:28.2162672Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3303}, [%r1045]; 2026-02-21T09:07:28.2162953Z // end inline asm 2026-02-21T09:07:28.2163098Z bar.sync 0; 2026-02-21T09:07:28.2163359Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r800, %r832, %r864, %r896}; 2026-02-21T09:07:28.2163688Z bar.sync 0; 2026-02-21T09:07:28.2163832Z // begin inline asm 2026-02-21T09:07:28.2164063Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3305}, [%r1045]; 2026-02-21T09:07:28.2164426Z // end inline asm 2026-02-21T09:07:28.2164579Z bar.sync 0; 2026-02-21T09:07:28.2164832Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r799, %r831, %r863, %r895}; 2026-02-21T09:07:28.2165152Z bar.sync 0; 2026-02-21T09:07:28.2165294Z // begin inline asm 2026-02-21T09:07:28.2165520Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3304}, [%r1045]; 2026-02-21T09:07:28.2165796Z // end inline asm 2026-02-21T09:07:28.2165942Z bar.sync 0; 2026-02-21T09:07:28.2166206Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r801, %r833, %r865, %r897}; 2026-02-21T09:07:28.2166661Z bar.sync 0; 2026-02-21T09:07:28.2166807Z // begin inline asm 2026-02-21T09:07:28.2167029Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3306}, [%r1045]; 2026-02-21T09:07:28.2167463Z // end inline asm 2026-02-21T09:07:28.2167621Z bar.sync 0; 2026-02-21T09:07:28.2167899Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r802, %r834, %r866, %r898}; 2026-02-21T09:07:28.2168221Z bar.sync 0; 2026-02-21T09:07:28.2168369Z // begin inline asm 2026-02-21T09:07:28.2168607Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3307}, [%r1045]; 2026-02-21T09:07:28.2168873Z // end inline asm 2026-02-21T09:07:28.2169020Z bar.sync 0; 2026-02-21T09:07:28.2169270Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r804, %r836, %r868, %r900}; 2026-02-21T09:07:28.2169598Z bar.sync 0; 2026-02-21T09:07:28.2169738Z // begin inline asm 
2026-02-21T09:07:28.2169963Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3309}, [%r1045]; 2026-02-21T09:07:28.2170234Z // end inline asm 2026-02-21T09:07:28.2170379Z bar.sync 0; 2026-02-21T09:07:28.2170638Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r803, %r835, %r867, %r899}; 2026-02-21T09:07:28.2170949Z bar.sync 0; 2026-02-21T09:07:28.2171116Z // begin inline asm 2026-02-21T09:07:28.2171424Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3308}, [%r1045]; 2026-02-21T09:07:28.2171704Z // end inline asm 2026-02-21T09:07:28.2171847Z bar.sync 0; 2026-02-21T09:07:28.2172113Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r805, %r837, %r869, %r901}; 2026-02-21T09:07:28.2172429Z bar.sync 0; 2026-02-21T09:07:28.2172579Z // begin inline asm 2026-02-21T09:07:28.2172810Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3310}, [%r1045]; 2026-02-21T09:07:28.2173079Z // end inline asm 2026-02-21T09:07:28.2173233Z bar.sync 0; 2026-02-21T09:07:28.2173486Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r806, %r838, %r870, %r902}; 2026-02-21T09:07:28.2173808Z bar.sync 0; 2026-02-21T09:07:28.2173952Z // begin inline asm 2026-02-21T09:07:28.2174190Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3311}, [%r1045]; 2026-02-21T09:07:28.2174455Z // end inline asm 2026-02-21T09:07:28.2174611Z bar.sync 0; 2026-02-21T09:07:28.2174873Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r808, %r840, %r872, %r904}; 2026-02-21T09:07:28.2175188Z bar.sync 0; 2026-02-21T09:07:28.2175344Z // begin inline asm 2026-02-21T09:07:28.2175569Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3313}, [%r1045]; 2026-02-21T09:07:28.2175845Z // end inline asm 2026-02-21T09:07:28.2176003Z bar.sync 0; 2026-02-21T09:07:28.2176265Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r807, %r839, %r871, %r903}; 2026-02-21T09:07:28.2176717Z bar.sync 0; 2026-02-21T09:07:28.2176862Z // begin inline asm 2026-02-21T09:07:28.2177086Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3312}, [%r1045]; 
2026-02-21T09:07:28.2177361Z // end inline asm 2026-02-21T09:07:28.2177503Z bar.sync 0; 2026-02-21T09:07:28.2177751Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r809, %r841, %r873, %r905}; 2026-02-21T09:07:28.2178064Z bar.sync 0; 2026-02-21T09:07:28.2178206Z // begin inline asm 2026-02-21T09:07:28.2178434Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3314}, [%r1045]; 2026-02-21T09:07:28.2178704Z // end inline asm 2026-02-21T09:07:28.2178855Z bar.sync 0; 2026-02-21T09:07:28.2179122Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r810, %r842, %r874, %r906}; 2026-02-21T09:07:28.2179442Z bar.sync 0; 2026-02-21T09:07:28.2179677Z // begin inline asm 2026-02-21T09:07:28.2179897Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3315}, [%r1045]; 2026-02-21T09:07:28.2180184Z // end inline asm 2026-02-21T09:07:28.2180328Z bar.sync 0; 2026-02-21T09:07:28.2180584Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r812, %r844, %r876, %r908}; 2026-02-21T09:07:28.2180897Z bar.sync 0; 2026-02-21T09:07:28.2181042Z // begin inline asm 2026-02-21T09:07:28.2181264Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3317}, [%r1045]; 2026-02-21T09:07:28.2181535Z // end inline asm 2026-02-21T09:07:28.2181684Z bar.sync 0; 2026-02-21T09:07:28.2181936Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r811, %r843, %r875, %r907}; 2026-02-21T09:07:28.2182253Z bar.sync 0; 2026-02-21T09:07:28.2182543Z // begin inline asm 2026-02-21T09:07:28.2182775Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3316}, [%r1045]; 2026-02-21T09:07:28.2183044Z // end inline asm 2026-02-21T09:07:28.2183195Z bar.sync 0; 2026-02-21T09:07:28.2183452Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r354], {%r813, %r845, %r877, %r909}; 2026-02-21T09:07:28.2183770Z bar.sync 0; 2026-02-21T09:07:28.2183910Z // begin inline asm 2026-02-21T09:07:28.2184141Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3318}, [%r1045]; 2026-02-21T09:07:28.2184413Z // end inline asm 2026-02-21T09:07:28.2184564Z // begin inline asm 
2026-02-21T09:07:28.2184757Z fence.proxy.async.shared::cta; 2026-02-21T09:07:28.2184952Z // end inline asm 2026-02-21T09:07:28.2185122Z wgmma.fence.sync.aligned; 2026-02-21T09:07:28.2185307Z shl.b32 %r1271, %r1255, 9; 2026-02-21T09:07:28.2185500Z and.b32 %r1272, %r1271, 6144; 2026-02-21T09:07:28.2185689Z add.s32 %r1273, %r1272, %r2012; 2026-02-21T09:07:28.2185892Z bfe.u32 %r1274, %r1273, 4, 14; 2026-02-21T09:07:28.2186092Z cvt.u64.u32 %rd68, %r1274; 2026-02-21T09:07:28.2186374Z or.b64 %rd65, %rd68, -4611685949674356736; 2026-02-21T09:07:28.2186753Z // begin inline asm 2026-02-21T09:07:28.2187619Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3287,%r3288,%r3289,%r3290,%r3291,%r3292,%r3293,%r3294,%r3295,%r3296,%r3297,%r3298,%r3299,%r3300,%r3301,%r3302,%r3303,%r3304,%r3305,%r3306,%r3307,%r3308,%r3309,%r3310,%r3311,%r3312,%r3313,%r3314,%r3315,%r3316,%r3317,%r3318}, {%r1172,%r1173,%r1174,%r1175}, %rd65, %p20, 1, 1; 2026-02-21T09:07:28.2188616Z // end inline asm 2026-02-21T09:07:28.2188790Z wgmma.commit_group.sync.aligned; 2026-02-21T09:07:28.2189003Z mov.b32 %r1208, %r2012; 2026-02-21T09:07:28.2189181Z mov.b32 %r1210, %r1209; 2026-02-21T09:07:28.2189348Z // begin inline asm 2026-02-21T09:07:28.2190018Z // wait for regs: %r3287,%r3288,%r3289,%r3290,%r3291,%r3292,%r3293,%r3294,%r3295,%r3296,%r3297,%r3298,%r3299,%r3300,%r3301,%r3302,%r3303,%r3304,%r3305,%r3306,%r3307,%r3308,%r3309,%r3310,%r3311,%r3312,%r3313,%r3314,%r3315,%r3316,%r3317,%r3318,%r1208,%r1209,%r1210 2026-02-21T09:07:28.2190742Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:07:28.2190944Z // end inline asm 2026-02-21T09:07:28.2191092Z $L__tmp4: 2026-02-21T09:07:28.2191403Z .loc 1 47 110 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:47:110 2026-02-21T09:07:28.2191784Z add.s64 %rd102, %rd102, 8; 2026-02-21T09:07:28.2191968Z add.s64 %rd101, %rd101, 32; 2026-02-21T09:07:28.2192160Z add.s32 %r3286, %r3286, 65536; 2026-02-21T09:07:28.2192365Z setp.lt.u64 %p26, %rd102, 504; 
2026-02-21T09:07:28.2192561Z @%p26 bra $L__BB0_3; 2026-02-21T09:07:28.2192783Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:07:28.2193201Z .loc 1 94 28 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:94:28 2026-02-21T09:07:28.2193586Z cvt.rn.bf16x2.f32 %r1279, %r3288, %r3287; 2026-02-21T09:07:28.2193816Z cvt.rn.bf16x2.f32 %r1280, %r3290, %r3289; 2026-02-21T09:07:28.2194053Z cvt.rn.bf16x2.f32 %r1281, %r3292, %r3291; 2026-02-21T09:07:28.2194268Z cvt.rn.bf16x2.f32 %r1282, %r3294, %r3293; 2026-02-21T09:07:28.2194489Z cvt.rn.bf16x2.f32 %r1283, %r3296, %r3295; 2026-02-21T09:07:28.2194803Z cvt.rn.bf16x2.f32 %r1284, %r3298, %r3297; 2026-02-21T09:07:28.2195026Z cvt.rn.bf16x2.f32 %r1285, %r3300, %r3299; 2026-02-21T09:07:28.2195242Z cvt.rn.bf16x2.f32 %r1286, %r3302, %r3301; 2026-02-21T09:07:28.2195465Z cvt.rn.bf16x2.f32 %r1287, %r3304, %r3303; 2026-02-21T09:07:28.2195686Z cvt.rn.bf16x2.f32 %r1288, %r3306, %r3305; 2026-02-21T09:07:28.2195900Z cvt.rn.bf16x2.f32 %r1289, %r3308, %r3307; 2026-02-21T09:07:28.2196122Z cvt.rn.bf16x2.f32 %r1290, %r3310, %r3309; 2026-02-21T09:07:28.2196339Z cvt.rn.bf16x2.f32 %r1291, %r3312, %r3311; 2026-02-21T09:07:28.2196704Z cvt.rn.bf16x2.f32 %r1292, %r3314, %r3313; 2026-02-21T09:07:28.2196922Z cvt.rn.bf16x2.f32 %r1293, %r3316, %r3315; 2026-02-21T09:07:28.2197143Z cvt.rn.bf16x2.f32 %r1294, %r3318, %r3317; 2026-02-21T09:07:28.2197659Z .loc 1 95 43 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:95:43 2026-02-21T09:07:28.2198048Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:07:28.2198258Z bar.sync 0; 2026-02-21T09:07:28.2198533Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r22], {%r1279, %r1280, %r1281, %r1282}; 2026-02-21T09:07:28.2198994Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r23], {%r1283, %r1284, %r1285, %r1286}; 2026-02-21T09:07:28.2199437Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r24], {%r1287, %r1288, %r1289, %r1290}; 2026-02-21T09:07:28.2199890Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r25], {%r1291, 
%r1292, %r1293, %r1294}; 2026-02-21T09:07:28.2200220Z // begin inline asm 2026-02-21T09:07:28.2200407Z fence.proxy.async.shared::cta; 2026-02-21T09:07:28.2200609Z // end inline asm 2026-02-21T09:07:28.2200764Z bar.sync 0; 2026-02-21T09:07:28.2200929Z elect.sync %r1295|%p29, -1; 2026-02-21T09:07:28.2201136Z shfl.sync.idx.b32 %r1296, %r5, 0, 31, -1; 2026-02-21T09:07:28.2201372Z and.pred %p27, %p54, %p29; 2026-02-21T09:07:28.2201639Z and.b32 %r1297, %r1296, 3; 2026-02-21T09:07:28.2201829Z shl.b32 %r1298, %r1297, 13; 2026-02-21T09:07:28.2202012Z add.s32 %r2243, %r3020, %r1298; 2026-02-21T09:07:28.2202222Z shl.b32 %r101, %r1297, 6; 2026-02-21T09:07:28.2202406Z or.b32 %r1275, %r101, %r31; 2026-02-21T09:07:28.2202587Z // begin inline asm 2026-02-21T09:07:28.2202929Z @%p27 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd69, {%r1275, %r1276}], [%r2243]; 2026-02-21T09:07:28.2203302Z // end inline asm 2026-02-21T09:07:28.2203476Z cp.async.bulk.commit_group; 2026-02-21T09:07:28.2203811Z .loc 1 26 97 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:26:97 2026-02-21T09:07:28.2204186Z add.s32 %r1300, %r3285, 528; 2026-02-21T09:07:28.2204516Z .loc 1 32 35 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:32:35 2026-02-21T09:07:28.2204883Z shr.s32 %r1301, %r1300, 31; 2026-02-21T09:07:28.2205067Z shr.u32 %r1302, %r1301, 21; 2026-02-21T09:07:28.2205260Z add.s32 %r1303, %r1300, %r1302; 2026-02-21T09:07:28.2205459Z shr.s32 %r1304, %r1303, 11; 2026-02-21T09:07:28.2205772Z .loc 1 33 33 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:33:33 2026-02-21T09:07:28.2206128Z shl.b32 %r1305, %r1304, 6; 2026-02-21T09:07:28.2206442Z .loc 1 34 39 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:34:39 2026-02-21T09:07:28.2206948Z sub.s32 %r1306, 64, %r1305; 2026-02-21T09:07:28.2207264Z .loc 1 34 52 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:34:52 2026-02-21T09:07:28.2207611Z min.s32 %r1307, %r1306, 64; 2026-02-21T09:07:28.2207931Z .loc 
1 35 45 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:35:45 2026-02-21T09:07:28.2208288Z and.b32 %r1308, %r1303, -2048; 2026-02-21T09:07:28.2208484Z sub.s32 %r1309, %r1300, %r1308; 2026-02-21T09:07:28.2208816Z .loc 1 36 51 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:36:51 2026-02-21T09:07:28.2209189Z div.s32 %r1310, %r1309, %r1307; 2026-02-21T09:07:28.2209532Z .loc 1 35 64 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:35:64 2026-02-21T09:07:28.2210004Z mul.lo.s32 %r1311, %r1310, %r1307; 2026-02-21T09:07:28.2210218Z sub.s32 %r1312, %r1309, %r1311; 2026-02-21T09:07:28.2210543Z .loc 1 35 30 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:35:30 2026-02-21T09:07:28.2210900Z add.s32 %r1313, %r1312, %r1305; 2026-02-21T09:07:28.2211224Z .loc 1 37 27 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:37:27 2026-02-21T09:07:28.2211585Z shl.b32 %r102, %r1313, 6; 2026-02-21T09:07:28.2211903Z .loc 1 39 27 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:39:27 2026-02-21T09:07:28.2212257Z shl.b32 %r103, %r1310, 8; 2026-02-21T09:07:28.2212715Z .loc 1 40 32 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:40:32 2026-02-21T09:07:28.2213068Z or.b32 %r104, %r103, %r8; 2026-02-21T09:07:28.2213262Z shl.b32 %r1314, %r1296, 9; 2026-02-21T09:07:28.2213450Z and.b32 %r1315, %r1314, 6144; 2026-02-21T09:07:28.2213641Z add.s32 %r1316, %r3020, %r1315; 2026-02-21T09:07:28.2213835Z add.s32 %r1317, %r1316, 32768; 2026-02-21T09:07:28.2214023Z bfe.u32 %r1318, %r1317, 4, 14; 2026-02-21T09:07:28.2214214Z cvt.u64.u32 %rd71, %r1318; 2026-02-21T09:07:28.2214408Z or.b64 %rd80, %rd71, -4611685949674356736; 2026-02-21T09:07:28.2214793Z .loc 1 47 110 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:47:110 2026-02-21T09:07:28.2215162Z shl.b32 %r1319, %r1313, 16; 2026-02-21T09:07:28.2215353Z or.b32 %r1320, %r27, %r1319; 2026-02-21T09:07:28.2215547Z mad.wide.s32 %rd103, %r1320, 2, %rd30; 2026-02-21T09:07:28.2215764Z 
add.s32 %r3319, %r28, %r103; 2026-02-21T09:07:28.2215959Z mov.b32 %r3320, 0f00000000; 2026-02-21T09:07:28.2216219Z mov.b64 %rd104, -8; 2026-02-21T09:07:28.2216398Z mov.b32 %r3321, %r3320; 2026-02-21T09:07:28.2216706Z mov.b32 %r3322, %r3320; 2026-02-21T09:07:28.2216887Z mov.b32 %r3323, %r3320; 2026-02-21T09:07:28.2217052Z mov.b32 %r3324, %r3320; 2026-02-21T09:07:28.2217223Z mov.b32 %r3325, %r3320; 2026-02-21T09:07:28.2217387Z mov.b32 %r3326, %r3320; 2026-02-21T09:07:28.2217560Z mov.b32 %r3327, %r3320; 2026-02-21T09:07:28.2217725Z mov.b32 %r3328, %r3320; 2026-02-21T09:07:28.2217896Z mov.b32 %r3329, %r3320; 2026-02-21T09:07:28.2218065Z mov.b32 %r3330, %r3320; 2026-02-21T09:07:28.2218228Z mov.b32 %r3331, %r3320; 2026-02-21T09:07:28.2218400Z mov.b32 %r3332, %r3320; 2026-02-21T09:07:28.2218563Z mov.b32 %r3333, %r3320; 2026-02-21T09:07:28.2218735Z mov.b32 %r3334, %r3320; 2026-02-21T09:07:28.2218897Z mov.b32 %r3335, %r3320; 2026-02-21T09:07:28.2219065Z mov.b32 %r3336, %r3320; 2026-02-21T09:07:28.2219225Z mov.b32 %r3337, %r3320; 2026-02-21T09:07:28.2219398Z mov.b32 %r3338, %r3320; 2026-02-21T09:07:28.2219566Z mov.b32 %r3339, %r3320; 2026-02-21T09:07:28.2219733Z mov.b32 %r3340, %r3320; 2026-02-21T09:07:28.2219900Z mov.b32 %r3341, %r3320; 2026-02-21T09:07:28.2220082Z mov.b32 %r3342, %r3320; 2026-02-21T09:07:28.2220252Z mov.b32 %r3343, %r3320; 2026-02-21T09:07:28.2220416Z mov.b32 %r3344, %r3320; 2026-02-21T09:07:28.2220584Z mov.b32 %r3345, %r3320; 2026-02-21T09:07:28.2220746Z mov.b32 %r3346, %r3320; 2026-02-21T09:07:28.2220915Z mov.b32 %r3347, %r3320; 2026-02-21T09:07:28.2221079Z mov.b32 %r3348, %r3320; 2026-02-21T09:07:28.2221248Z mov.b32 %r3349, %r3320; 2026-02-21T09:07:28.2221411Z mov.b32 %r3350, %r3320; 2026-02-21T09:07:28.2221583Z mov.b32 %r3351, %r3320; 2026-02-21T09:07:28.2221807Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T09:07:28.2222105Z // => This Inner Loop Header: Depth=2 2026-02-21T09:07:28.2222519Z .loc 1 55 80 // 
cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:55:80 2026-02-21T09:07:28.2222878Z // begin inline asm 2026-02-21T09:07:28.2223045Z mov.u16 %rs55, 0x0; 2026-02-21T09:07:28.2223219Z ld.global.b16 { %rs55 }, [ %rd103 + 0 ]; 2026-02-21T09:07:28.2223544Z // end inline asm 2026-02-21T09:07:28.2223842Z .loc 1 59 32 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:59:32 2026-02-21T09:07:28.2224212Z bar.sync 0; 2026-02-21T09:07:28.2224374Z st.shared.b16 [%r13], %rs55; 2026-02-21T09:07:28.2224553Z bar.sync 0; 2026-02-21T09:07:28.2224710Z ld.shared.b16 %rs59, [%r14]; 2026-02-21T09:07:28.2224900Z ld.shared.b16 %rs60, [%r14+128]; 2026-02-21T09:07:28.2225112Z ld.shared.b16 %rs61, [%r14+8]; 2026-02-21T09:07:28.2225306Z ld.shared.b16 %rs62, [%r14+136]; 2026-02-21T09:07:28.2225510Z cvt.f32.bf16 %r1613, %rs59; 2026-02-21T09:07:28.2225694Z cvt.f32.bf16 %r1614, %rs60; 2026-02-21T09:07:28.2225881Z cvt.f32.bf16 %r1615, %rs61; 2026-02-21T09:07:28.2226205Z cvt.f32.bf16 %r1616, %rs62; 2026-02-21T09:07:28.2226668Z .loc 1 61 34 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:61:34 2026-02-21T09:07:28.2227051Z cvt.s64.s32 %rd81, %r3319; 2026-02-21T09:07:28.2227238Z add.s64 %rd73, %rd31, %rd81; 2026-02-21T09:07:28.2227453Z .loc 1 61 87 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:61:87 2026-02-21T09:07:28.2227528Z // begin inline asm 2026-02-21T09:07:28.2227593Z mov.u16 %rs56, 0x0; 2026-02-21T09:07:28.2227671Z ld.global.b16 { %rs56 }, [ %rd73 + 0 ]; 2026-02-21T09:07:28.2227739Z // end inline asm 2026-02-21T09:07:28.2227806Z shr.u16 %rs63, %rs56, 8; 2026-02-21T09:07:28.2228011Z .loc 1 69 28 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:69:28 2026-02-21T09:07:28.2228083Z bar.sync 0; 2026-02-21T09:07:28.2228151Z st.shared.b8 [%r15], %rs56; 2026-02-21T09:07:28.2228220Z st.shared.b8 [%r16+512], %rs63; 2026-02-21T09:07:28.2228280Z bar.sync 0; 2026-02-21T09:07:28.2228526Z ld.shared.b32 %r2217, [%r17]; 2026-02-21T09:07:28.2228608Z prmt.b32 
%r2218, %r2217, 0, 0x7770U; 2026-02-21T09:07:28.2228677Z cvt.u16.u32 %rs64, %r2218; 2026-02-21T09:07:28.2228754Z prmt.b32 %r2219, %r2217, 0, 0x7771U; 2026-02-21T09:07:28.2228820Z cvt.u16.u32 %rs65, %r2219; 2026-02-21T09:07:28.2228888Z prmt.b32 %r2220, %r2217, 0, 0x7772U; 2026-02-21T09:07:28.2228969Z cvt.u16.u32 %rs66, %r2220; 2026-02-21T09:07:28.2229037Z prmt.b32 %r2221, %r2217, 0, 0x7773U; 2026-02-21T09:07:28.2229100Z cvt.u16.u32 %rs67, %r2221; 2026-02-21T09:07:28.2229316Z .loc 1 64 28 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:64:28 2026-02-21T09:07:28.2229389Z shl.b16 %rs68, %rs64, 4; 2026-02-21T09:07:28.2229451Z shl.b16 %rs69, %rs65, 4; 2026-02-21T09:07:28.2229512Z shl.b16 %rs70, %rs66, 4; 2026-02-21T09:07:28.2229579Z shl.b16 %rs71, %rs67, 4; 2026-02-21T09:07:28.2229788Z .loc 1 79 58 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:79:58 2026-02-21T09:07:28.2229865Z selp.b16 %rs72, %rs68, %rs64, %p53; 2026-02-21T09:07:28.2229946Z cvt.s16.s8 %rs73, %rs72; 2026-02-21T09:07:28.2230010Z shr.s16 %rs74, %rs73, 4; 2026-02-21T09:07:28.2230084Z selp.b16 %rs75, %rs69, %rs65, %p53; 2026-02-21T09:07:28.2230147Z cvt.s16.s8 %rs76, %rs75; 2026-02-21T09:07:28.2230214Z shr.s16 %rs77, %rs76, 4; 2026-02-21T09:07:28.2230285Z selp.b16 %rs78, %rs70, %rs66, %p53; 2026-02-21T09:07:28.2230348Z cvt.s16.s8 %rs79, %rs78; 2026-02-21T09:07:28.2230415Z shr.s16 %rs80, %rs79, 4; 2026-02-21T09:07:28.2230483Z selp.b16 %rs81, %rs71, %rs67, %p53; 2026-02-21T09:07:28.2230546Z cvt.s16.s8 %rs82, %rs81; 2026-02-21T09:07:28.2230608Z shr.s16 %rs83, %rs82, 4; 2026-02-21T09:07:28.2230819Z .loc 1 84 32 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:84:32 2026-02-21T09:07:28.2230887Z cvt.rn.f32.s16 %r2222, %rs74; 2026-02-21T09:07:28.2230953Z cvt.rn.f32.s16 %r2223, %rs77; 2026-02-21T09:07:28.2231028Z cvt.rn.f32.s16 %r2224, %rs80; 2026-02-21T09:07:28.2231093Z cvt.rn.f32.s16 %r2225, %rs83; 2026-02-21T09:07:28.2231153Z bar.sync 0; 2026-02-21T09:07:28.2231218Z st.shared.b32 
[%r18], %r2222; 2026-02-21T09:07:28.2231384Z st.shared.b32 [%r18+8], %r2223; 2026-02-21T09:07:28.2231450Z st.shared.b32 [%r19], %r2224; 2026-02-21T09:07:28.2231519Z st.shared.b32 [%r19+8], %r2225; 2026-02-21T09:07:28.2231671Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3320}; 2026-02-21T09:07:28.2231730Z bar.sync 0; 2026-02-21T09:07:28.2231795Z // begin inline asm 2026-02-21T09:07:28.2231995Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1753, %r1785, %r1817, %r1849}, [%r1325]; 2026-02-21T09:07:28.2232055Z // end inline asm 2026-02-21T09:07:28.2232113Z bar.sync 0; 2026-02-21T09:07:28.2232249Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3322}; 2026-02-21T09:07:28.2232313Z bar.sync 0; 2026-02-21T09:07:28.2232375Z // begin inline asm 2026-02-21T09:07:28.2232702Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1755, %r1787, %r1819, %r1851}, [%r1325]; 2026-02-21T09:07:28.2232771Z // end inline asm 2026-02-21T09:07:28.2232830Z bar.sync 0; 2026-02-21T09:07:28.2232963Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3321}; 2026-02-21T09:07:28.2233035Z bar.sync 0; 2026-02-21T09:07:28.2233102Z // begin inline asm 2026-02-21T09:07:28.2233288Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1754, %r1786, %r1818, %r1850}, [%r1325]; 2026-02-21T09:07:28.2233348Z // end inline asm 2026-02-21T09:07:28.2233411Z bar.sync 0; 2026-02-21T09:07:28.2233541Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3323}; 2026-02-21T09:07:28.2233603Z bar.sync 0; 2026-02-21T09:07:28.2233670Z // begin inline asm 2026-02-21T09:07:28.2233853Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1756, %r1788, %r1820, %r1852}, [%r1325]; 2026-02-21T09:07:28.2233912Z // end inline asm 2026-02-21T09:07:28.2233970Z bar.sync 0; 2026-02-21T09:07:28.2234107Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3324}; 2026-02-21T09:07:28.2234169Z bar.sync 0; 2026-02-21T09:07:28.2234286Z // begin inline asm 2026-02-21T09:07:28.2234479Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1757, %r1789, %r1821, 
%r1853}, [%r1325]; 2026-02-21T09:07:28.2234553Z // end inline asm 2026-02-21T09:07:28.2234612Z bar.sync 0; 2026-02-21T09:07:28.2234744Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3326}; 2026-02-21T09:07:28.2234810Z bar.sync 0; 2026-02-21T09:07:28.2234872Z // begin inline asm 2026-02-21T09:07:28.2235051Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1759, %r1791, %r1823, %r1855}, [%r1325]; 2026-02-21T09:07:28.2235118Z // end inline asm 2026-02-21T09:07:28.2235175Z bar.sync 0; 2026-02-21T09:07:28.2239850Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3325}; 2026-02-21T09:07:28.2239957Z bar.sync 0; 2026-02-21T09:07:28.2240028Z // begin inline asm 2026-02-21T09:07:28.2240252Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1758, %r1790, %r1822, %r1854}, [%r1325]; 2026-02-21T09:07:28.2240322Z // end inline asm 2026-02-21T09:07:28.2240384Z bar.sync 0; 2026-02-21T09:07:28.2240536Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3327}; 2026-02-21T09:07:28.2240602Z bar.sync 0; 2026-02-21T09:07:28.2240670Z // begin inline asm 2026-02-21T09:07:28.2240866Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1760, %r1792, %r1824, %r1856}, [%r1325]; 2026-02-21T09:07:28.2240935Z // end inline asm 2026-02-21T09:07:28.2240992Z bar.sync 0; 2026-02-21T09:07:28.2241135Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3328}; 2026-02-21T09:07:28.2241194Z bar.sync 0; 2026-02-21T09:07:28.2241269Z // begin inline asm 2026-02-21T09:07:28.2241458Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1761, %r1793, %r1825, %r1857}, [%r1325]; 2026-02-21T09:07:28.2241522Z // end inline asm 2026-02-21T09:07:28.2241587Z bar.sync 0; 2026-02-21T09:07:28.2241721Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3330}; 2026-02-21T09:07:28.2241779Z bar.sync 0; 2026-02-21T09:07:28.2241849Z // begin inline asm 2026-02-21T09:07:28.2242038Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1763, %r1795, %r1827, %r1859}, [%r1325]; 2026-02-21T09:07:28.2242098Z // end inline asm 
2026-02-21T09:07:28.2242155Z bar.sync 0; 2026-02-21T09:07:28.2242455Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3329}; 2026-02-21T09:07:28.2242517Z bar.sync 0; 2026-02-21T09:07:28.2242579Z // begin inline asm 2026-02-21T09:07:28.2242767Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1762, %r1794, %r1826, %r1858}, [%r1325]; 2026-02-21T09:07:28.2242826Z // end inline asm 2026-02-21T09:07:28.2242885Z bar.sync 0; 2026-02-21T09:07:28.2243016Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3331}; 2026-02-21T09:07:28.2243078Z bar.sync 0; 2026-02-21T09:07:28.2243137Z // begin inline asm 2026-02-21T09:07:28.2243313Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1764, %r1796, %r1828, %r1860}, [%r1325]; 2026-02-21T09:07:28.2243379Z // end inline asm 2026-02-21T09:07:28.2243435Z bar.sync 0; 2026-02-21T09:07:28.2243717Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3332}; 2026-02-21T09:07:28.2243785Z bar.sync 0; 2026-02-21T09:07:28.2243845Z // begin inline asm 2026-02-21T09:07:28.2244026Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1765, %r1797, %r1829, %r1861}, [%r1325]; 2026-02-21T09:07:28.2244089Z // end inline asm 2026-02-21T09:07:28.2244153Z bar.sync 0; 2026-02-21T09:07:28.2244293Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3334}; 2026-02-21T09:07:28.2244352Z bar.sync 0; 2026-02-21T09:07:28.2244416Z // begin inline asm 2026-02-21T09:07:28.2244593Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1767, %r1799, %r1831, %r1863}, [%r1325]; 2026-02-21T09:07:28.2244653Z // end inline asm 2026-02-21T09:07:28.2244711Z bar.sync 0; 2026-02-21T09:07:28.2244855Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3333}; 2026-02-21T09:07:28.2244911Z bar.sync 0; 2026-02-21T09:07:28.2244973Z // begin inline asm 2026-02-21T09:07:28.2245165Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1766, %r1798, %r1830, %r1862}, [%r1325]; 2026-02-21T09:07:28.2245228Z // end inline asm 2026-02-21T09:07:28.2245376Z bar.sync 0; 2026-02-21T09:07:28.2245514Z 
stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3335}; 2026-02-21T09:07:28.2245586Z bar.sync 0; 2026-02-21T09:07:28.2245648Z // begin inline asm 2026-02-21T09:07:28.2245830Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1768, %r1800, %r1832, %r1864}, [%r1325]; 2026-02-21T09:07:28.2245895Z // end inline asm 2026-02-21T09:07:28.2245951Z bar.sync 0; 2026-02-21T09:07:28.2246079Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3336}; 2026-02-21T09:07:28.2246140Z bar.sync 0; 2026-02-21T09:07:28.2246199Z // begin inline asm 2026-02-21T09:07:28.2246377Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1769, %r1801, %r1833, %r1865}, [%r1325]; 2026-02-21T09:07:28.2246435Z // end inline asm 2026-02-21T09:07:28.2246659Z bar.sync 0; 2026-02-21T09:07:28.2246796Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3338}; 2026-02-21T09:07:28.2246855Z bar.sync 0; 2026-02-21T09:07:28.2246921Z // begin inline asm 2026-02-21T09:07:28.2247102Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1771, %r1803, %r1835, %r1867}, [%r1325]; 2026-02-21T09:07:28.2247162Z // end inline asm 2026-02-21T09:07:28.2247231Z bar.sync 0; 2026-02-21T09:07:28.2247367Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3337}; 2026-02-21T09:07:28.2247424Z bar.sync 0; 2026-02-21T09:07:28.2247484Z // begin inline asm 2026-02-21T09:07:28.2247666Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1770, %r1802, %r1834, %r1866}, [%r1325]; 2026-02-21T09:07:28.2247724Z // end inline asm 2026-02-21T09:07:28.2247780Z bar.sync 0; 2026-02-21T09:07:28.2247911Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3339}; 2026-02-21T09:07:28.2247967Z bar.sync 0; 2026-02-21T09:07:28.2248026Z // begin inline asm 2026-02-21T09:07:28.2248201Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1772, %r1804, %r1836, %r1868}, [%r1325]; 2026-02-21T09:07:28.2248264Z // end inline asm 2026-02-21T09:07:28.2248324Z bar.sync 0; 2026-02-21T09:07:28.2248453Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3340}; 2026-02-21T09:07:28.2248526Z 
bar.sync 0; 2026-02-21T09:07:28.2248587Z // begin inline asm 2026-02-21T09:07:28.2248858Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1773, %r1805, %r1837, %r1869}, [%r1325]; 2026-02-21T09:07:28.2248919Z // end inline asm 2026-02-21T09:07:28.2248979Z bar.sync 0; 2026-02-21T09:07:28.2249106Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3342}; 2026-02-21T09:07:28.2249165Z bar.sync 0; 2026-02-21T09:07:28.2249229Z // begin inline asm 2026-02-21T09:07:28.2249405Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1775, %r1807, %r1839, %r1871}, [%r1325]; 2026-02-21T09:07:28.2249464Z // end inline asm 2026-02-21T09:07:28.2249531Z bar.sync 0; 2026-02-21T09:07:28.2249659Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3341}; 2026-02-21T09:07:28.2249715Z bar.sync 0; 2026-02-21T09:07:28.2249775Z // begin inline asm 2026-02-21T09:07:28.2250031Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1774, %r1806, %r1838, %r1870}, [%r1325]; 2026-02-21T09:07:28.2250154Z // end inline asm 2026-02-21T09:07:28.2250221Z bar.sync 0; 2026-02-21T09:07:28.2250360Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3343}; 2026-02-21T09:07:28.2250419Z bar.sync 0; 2026-02-21T09:07:28.2250479Z // begin inline asm 2026-02-21T09:07:28.2250657Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1776, %r1808, %r1840, %r1872}, [%r1325]; 2026-02-21T09:07:28.2250724Z // end inline asm 2026-02-21T09:07:28.2250780Z bar.sync 0; 2026-02-21T09:07:28.2250907Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3344}; 2026-02-21T09:07:28.2250975Z bar.sync 0; 2026-02-21T09:07:28.2251034Z // begin inline asm 2026-02-21T09:07:28.2251211Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1777, %r1809, %r1841, %r1873}, [%r1325]; 2026-02-21T09:07:28.2251278Z // end inline asm 2026-02-21T09:07:28.2251334Z bar.sync 0; 2026-02-21T09:07:28.2251461Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3346}; 2026-02-21T09:07:28.2251521Z bar.sync 0; 2026-02-21T09:07:28.2251663Z // begin inline asm 2026-02-21T09:07:28.2251848Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1779, %r1811, %r1843, %r1875}, [%r1325]; 2026-02-21T09:07:28.2251909Z // end inline asm 2026-02-21T09:07:28.2251972Z bar.sync 0; 2026-02-21T09:07:28.2252101Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3345}; 2026-02-21T09:07:28.2252159Z bar.sync 0; 2026-02-21T09:07:28.2252228Z // begin inline asm 2026-02-21T09:07:28.2252416Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1778, %r1810, %r1842, %r1874}, [%r1325]; 2026-02-21T09:07:28.2252473Z // end inline asm 2026-02-21T09:07:28.2252531Z bar.sync 0; 2026-02-21T09:07:28.2252662Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3347}; 2026-02-21T09:07:28.2252720Z bar.sync 0; 2026-02-21T09:07:28.2252784Z // begin inline asm 2026-02-21T09:07:28.2252968Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1780, %r1812, %r1844, %r1876}, [%r1325]; 2026-02-21T09:07:28.2253027Z // end inline asm 2026-02-21T09:07:28.2253085Z bar.sync 0; 2026-02-21T09:07:28.2253216Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3348}; 2026-02-21T09:07:28.2253278Z bar.sync 0; 2026-02-21T09:07:28.2253339Z // begin inline asm 2026-02-21T09:07:28.2253518Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1781, %r1813, %r1845, %r1877}, [%r1325]; 2026-02-21T09:07:28.2253583Z // end inline asm 2026-02-21T09:07:28.2253640Z bar.sync 0; 2026-02-21T09:07:28.2253768Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3350}; 2026-02-21T09:07:28.2253834Z bar.sync 0; 2026-02-21T09:07:28.2253902Z // begin inline asm 2026-02-21T09:07:28.2254079Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1783, %r1815, %r1847, %r1879}, [%r1325]; 2026-02-21T09:07:28.2254137Z // end inline asm 2026-02-21T09:07:28.2254200Z bar.sync 0; 2026-02-21T09:07:28.2254328Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3349}; 2026-02-21T09:07:28.2254386Z bar.sync 0; 2026-02-21T09:07:28.2254450Z // begin inline asm 2026-02-21T09:07:28.2254639Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1782, %r1814, %r1846, %r1878}, [%r1325]; 
2026-02-21T09:07:28.2254699Z // end inline asm 2026-02-21T09:07:28.2254756Z bar.sync 0; 2026-02-21T09:07:28.2254961Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1045], {%r3351}; 2026-02-21T09:07:28.2255020Z bar.sync 0; 2026-02-21T09:07:28.2255080Z // begin inline asm 2026-02-21T09:07:28.2255264Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1784, %r1816, %r1848, %r1880}, [%r1325]; 2026-02-21T09:07:28.2255320Z // end inline asm 2026-02-21T09:07:28.2255378Z $L__tmp5: 2026-02-21T09:07:28.2255675Z .loc 2 291 36 // standard.py:291:36 @[ cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:91:40 ] 2026-02-21T09:07:28.2255747Z // begin inline asm 2026-02-21T09:07:28.2255837Z fence.proxy.async.shared::cta; 2026-02-21T09:07:28.2255896Z // end inline asm 2026-02-21T09:07:28.2255976Z wgmma.fence.sync.aligned; 2026-02-21T09:07:28.2256046Z mov.pred %p30, -1; 2026-02-21T09:07:28.2256218Z // begin inline asm 2026-02-21T09:07:28.2257256Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1753,%r1754,%r1755,%r1756,%r1757,%r1758,%r1759,%r1760,%r1761,%r1762,%r1763,%r1764,%r1765,%r1766,%r1767,%r1768,%r1769,%r1770,%r1771,%r1772,%r1773,%r1774,%r1775,%r1776,%r1777,%r1778,%r1779,%r1780,%r1781,%r1782,%r1783,%r1784}, {%r1613,%r1614,%r1615,%r1616}, %rd74, %p30, 1, 1; 2026-02-21T09:07:28.2257326Z // end inline asm 2026-02-21T09:07:28.2257388Z // begin inline asm 2026-02-21T09:07:28.2258167Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1785,%r1786,%r1787,%r1788,%r1789,%r1790,%r1791,%r1792,%r1793,%r1794,%r1795,%r1796,%r1797,%r1798,%r1799,%r1800,%r1801,%r1802,%r1803,%r1804,%r1805,%r1806,%r1807,%r1808,%r1809,%r1810,%r1811,%r1812,%r1813,%r1814,%r1815,%r1816}, {%r1613,%r1614,%r1615,%r1616}, %rd75, %p30, 1, 1; 2026-02-21T09:07:28.2258231Z // end inline asm 2026-02-21T09:07:28.2258293Z // begin inline asm 2026-02-21T09:07:28.2259155Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r1817,%r1818,%r1819,%r1820,%r1821,%r1822,%r1823,%r1824,%r1825,%r1826,%r1827,%r1828,%r1829,%r1830,%r1831,%r1832,%r1833,%r1834,%r1835,%r1836,%r1837,%r1838,%r1839,%r1840,%r1841,%r1842,%r1843,%r1844,%r1845,%r1846,%r1847,%r1848}, {%r1613,%r1614,%r1615,%r1616}, %rd76, %p30, 1, 1; 2026-02-21T09:07:28.2259226Z // end inline asm 2026-02-21T09:07:28.2259289Z // begin inline asm 2026-02-21T09:07:28.2260060Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1849,%r1850,%r1851,%r1852,%r1853,%r1854,%r1855,%r1856,%r1857,%r1858,%r1859,%r1860,%r1861,%r1862,%r1863,%r1864,%r1865,%r1866,%r1867,%r1868,%r1869,%r1870,%r1871,%r1872,%r1873,%r1874,%r1875,%r1876,%r1877,%r1878,%r1879,%r1880}, {%r1613,%r1614,%r1615,%r1616}, %rd77, %p30, 1, 1; 2026-02-21T09:07:28.2260120Z // end inline asm 2026-02-21T09:07:28.2260202Z wgmma.commit_group.sync.aligned; 2026-02-21T09:07:28.2260271Z mov.b32 %r2181, 0; 2026-02-21T09:07:28.2260335Z mov.b32 %r1881, %r2012; 2026-02-21T09:07:28.2260395Z mov.b32 %r1882, %r2181; 2026-02-21T09:07:28.2260457Z mov.b32 %r1883, %r2181; 2026-02-21T09:07:28.2260530Z // begin inline asm 2026-02-21T09:07:28.2262626Z // wait for regs: 
%r1753,%r1754,%r1755,%r1756,%r1757,%r1758,%r1759,%r1760,%r1761,%r1762,%r1763,%r1764,%r1765,%r1766,%r1767,%r1768,%r1769,%r1770,%r1771,%r1772,%r1773,%r1774,%r1775,%r1776,%r1777,%r1778,%r1779,%r1780,%r1781,%r1782,%r1783,%r1784,%r1785,%r1786,%r1787,%r1788,%r1789,%r1790,%r1791,%r1792,%r1793,%r1794,%r1795,%r1796,%r1797,%r1798,%r1799,%r1800,%r1801,%r1802,%r1803,%r1804,%r1805,%r1806,%r1807,%r1808,%r1809,%r1810,%r1811,%r1812,%r1813,%r1814,%r1815,%r1816,%r1817,%r1818,%r1819,%r1820,%r1821,%r1822,%r1823,%r1824,%r1825,%r1826,%r1827,%r1828,%r1829,%r1830,%r1831,%r1832,%r1833,%r1834,%r1835,%r1836,%r1837,%r1838,%r1839,%r1840,%r1841,%r1842,%r1843,%r1844,%r1845,%r1846,%r1847,%r1848,%r1849,%r1850,%r1851,%r1852,%r1853,%r1854,%r1855,%r1856,%r1857,%r1858,%r1859,%r1860,%r1861,%r1862,%r1863,%r1864,%r1865,%r1866,%r1867,%r1868,%r1869,%r1870,%r1871,%r1872,%r1873,%r1874,%r1875,%r1876,%r1877,%r1878,%r1879,%r1880,%r1881,%r1882,%r1883 2026-02-21T09:07:28.2262729Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:07:28.2262789Z // end inline asm 2026-02-21T09:07:28.2262850Z $L__tmp6: 2026-02-21T09:07:28.2263088Z .loc 1 55 80 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:55:80 2026-02-21T09:07:28.2263247Z add.s64 %rd78, %rd103, 16; 2026-02-21T09:07:28.2263314Z // begin inline asm 2026-02-21T09:07:28.2263374Z mov.u16 %rs57, 0x0; 2026-02-21T09:07:28.2263468Z ld.global.b16 { %rs57 }, [ %rd78 + 0 ]; 2026-02-21T09:07:28.2263532Z // end inline asm 2026-02-21T09:07:28.2263748Z .loc 1 59 32 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:59:32 2026-02-21T09:07:28.2263812Z bar.sync 0; 2026-02-21T09:07:28.2263880Z st.shared.b16 [%r13], %rs57; 2026-02-21T09:07:28.2263936Z bar.sync 0; 2026-02-21T09:07:28.2264005Z ld.shared.b16 %rs84, [%r14]; 2026-02-21T09:07:28.2264085Z ld.shared.b16 %rs85, [%r14+128]; 2026-02-21T09:07:28.2264296Z ld.shared.b16 %rs86, [%r14+8]; 2026-02-21T09:07:28.2264371Z ld.shared.b16 %rs87, [%r14+136]; 2026-02-21T09:07:28.2264451Z cvt.f32.bf16 %r2143, %rs84; 
2026-02-21T09:07:28.2264518Z cvt.f32.bf16 %r2144, %rs85; 2026-02-21T09:07:28.2264585Z cvt.f32.bf16 %r2145, %rs86; 2026-02-21T09:07:28.2264655Z cvt.f32.bf16 %r2146, %rs87; 2026-02-21T09:07:28.2264719Z cvt.u32.u64 %r2227, %rd104; 2026-02-21T09:07:28.2264792Z add.s32 %r2228, %r2227, 12; 2026-02-21T09:07:28.2265005Z .loc 1 61 62 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:61:62 2026-02-21T09:07:28.2265074Z or.b32 %r2229, %r9, %r2228; 2026-02-21T09:07:28.2265136Z shl.b32 %r2230, %r2229, 13; 2026-02-21T09:07:28.2265202Z add.s32 %r2231, %r2230, %r104; 2026-02-21T09:07:28.2265410Z .loc 1 61 34 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:61:34 2026-02-21T09:07:28.2265477Z cvt.s64.s32 %rd82, %r2231; 2026-02-21T09:07:28.2265545Z add.s64 %rd79, %rd31, %rd82; 2026-02-21T09:07:28.2265818Z .loc 1 61 87 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:61:87 2026-02-21T09:07:28.2265887Z // begin inline asm 2026-02-21T09:07:28.2265948Z mov.u16 %rs58, 0x0; 2026-02-21T09:07:28.2266027Z ld.global.b16 { %rs58 }, [ %rd79 + 0 ]; 2026-02-21T09:07:28.2266093Z // end inline asm 2026-02-21T09:07:28.2266157Z shr.u16 %rs88, %rs58, 8; 2026-02-21T09:07:28.2266358Z .loc 1 69 28 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:69:28 2026-02-21T09:07:28.2266421Z bar.sync 0; 2026-02-21T09:07:28.2266620Z st.shared.b8 [%r15], %rs58; 2026-02-21T09:07:28.2266694Z st.shared.b8 [%r16+512], %rs88; 2026-02-21T09:07:28.2266752Z bar.sync 0; 2026-02-21T09:07:28.2266825Z ld.shared.b32 %r2232, [%r17]; 2026-02-21T09:07:28.2266898Z prmt.b32 %r2233, %r2232, 0, 0x7770U; 2026-02-21T09:07:28.2266963Z cvt.u16.u32 %rs89, %r2233; 2026-02-21T09:07:28.2267035Z prmt.b32 %r2234, %r2232, 0, 0x7771U; 2026-02-21T09:07:28.2267100Z cvt.u16.u32 %rs90, %r2234; 2026-02-21T09:07:28.2267178Z prmt.b32 %r2235, %r2232, 0, 0x7772U; 2026-02-21T09:07:28.2267245Z cvt.u16.u32 %rs91, %r2235; 2026-02-21T09:07:28.2267318Z prmt.b32 %r2236, %r2232, 0, 0x7773U; 2026-02-21T09:07:28.2267391Z 
cvt.u16.u32 %rs92, %r2236; 2026-02-21T09:07:28.2267597Z .loc 1 64 28 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:64:28 2026-02-21T09:07:28.2267668Z shl.b16 %rs93, %rs89, 4; 2026-02-21T09:07:28.2267732Z shl.b16 %rs94, %rs90, 4; 2026-02-21T09:07:28.2267793Z shl.b16 %rs95, %rs91, 4; 2026-02-21T09:07:28.2267859Z shl.b16 %rs96, %rs92, 4; 2026-02-21T09:07:28.2268058Z .loc 1 79 58 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:79:58 2026-02-21T09:07:28.2268132Z selp.b16 %rs97, %rs93, %rs89, %p53; 2026-02-21T09:07:28.2268197Z cvt.s16.s8 %rs98, %rs97; 2026-02-21T09:07:28.2268263Z shr.s16 %rs99, %rs98, 4; 2026-02-21T09:07:28.2268415Z selp.b16 %rs100, %rs94, %rs90, %p53; 2026-02-21T09:07:28.2268488Z cvt.s16.s8 %rs101, %rs100; 2026-02-21T09:07:28.2268563Z shr.s16 %rs102, %rs101, 4; 2026-02-21T09:07:28.2268634Z selp.b16 %rs103, %rs95, %rs91, %p53; 2026-02-21T09:07:28.2268698Z cvt.s16.s8 %rs104, %rs103; 2026-02-21T09:07:28.2268851Z shr.s16 %rs105, %rs104, 4; 2026-02-21T09:07:28.2268920Z selp.b16 %rs106, %rs96, %rs92, %p53; 2026-02-21T09:07:28.2268982Z cvt.s16.s8 %rs107, %rs106; 2026-02-21T09:07:28.2269044Z shr.s16 %rs108, %rs107, 4; 2026-02-21T09:07:28.2269267Z .loc 1 84 32 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:84:32 2026-02-21T09:07:28.2269336Z cvt.rn.f32.s16 %r2237, %rs99; 2026-02-21T09:07:28.2269404Z cvt.rn.f32.s16 %r2238, %rs102; 2026-02-21T09:07:28.2269483Z cvt.rn.f32.s16 %r2239, %rs105; 2026-02-21T09:07:28.2269548Z cvt.rn.f32.s16 %r2240, %rs108; 2026-02-21T09:07:28.2269606Z bar.sync 0; 2026-02-21T09:07:28.2269672Z st.shared.b32 [%r18], %r2237; 2026-02-21T09:07:28.2269747Z st.shared.b32 [%r18+8], %r2238; 2026-02-21T09:07:28.2269960Z st.shared.b32 [%r19], %r2239; 2026-02-21T09:07:28.2270030Z st.shared.b32 [%r19+8], %r2240; 2026-02-21T09:07:28.2270094Z $L__tmp7: 2026-02-21T09:07:28.2270375Z .loc 2 291 36 // standard.py:291:36 @[ cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:91:40 ] 2026-02-21T09:07:28.2270576Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1753, %r1785, %r1817, %r1849}; 2026-02-21T09:07:28.2270639Z bar.sync 0; 2026-02-21T09:07:28.2270702Z // begin inline asm 2026-02-21T09:07:28.2270843Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3320}, [%r1045]; 2026-02-21T09:07:28.2270903Z // end inline asm 2026-02-21T09:07:28.2270967Z bar.sync 0; 2026-02-21T09:07:28.2271150Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1755, %r1787, %r1819, %r1851}; 2026-02-21T09:07:28.2271210Z bar.sync 0; 2026-02-21T09:07:28.2271276Z // begin inline asm 2026-02-21T09:07:28.2271409Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3322}, [%r1045]; 2026-02-21T09:07:28.2271481Z // end inline asm 2026-02-21T09:07:28.2271542Z bar.sync 0; 2026-02-21T09:07:28.2271799Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1754, %r1786, %r1818, %r1850}; 2026-02-21T09:07:28.2271859Z bar.sync 0; 2026-02-21T09:07:28.2271921Z // begin inline asm 2026-02-21T09:07:28.2272056Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3321}, [%r1045]; 2026-02-21T09:07:28.2272114Z // end inline asm 2026-02-21T09:07:28.2272171Z bar.sync 0; 2026-02-21T09:07:28.2272354Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1756, %r1788, %r1820, %r1852}; 2026-02-21T09:07:28.2272410Z bar.sync 0; 2026-02-21T09:07:28.2272470Z // begin inline asm 2026-02-21T09:07:28.2272599Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3323}, [%r1045]; 2026-02-21T09:07:28.2272664Z // end inline asm 2026-02-21T09:07:28.2272720Z bar.sync 0; 2026-02-21T09:07:28.2272899Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1757, %r1789, %r1821, %r1853}; 2026-02-21T09:07:28.2272961Z bar.sync 0; 2026-02-21T09:07:28.2273026Z // begin inline asm 2026-02-21T09:07:28.2273157Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3324}, [%r1045]; 2026-02-21T09:07:28.2273216Z // end inline asm 2026-02-21T09:07:28.2273276Z bar.sync 0; 2026-02-21T09:07:28.2273474Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1759, %r1791, %r1823, %r1855}; 
2026-02-21T09:07:28.2273533Z bar.sync 0; 2026-02-21T09:07:28.2273599Z // begin inline asm 2026-02-21T09:07:28.2273729Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3326}, [%r1045]; 2026-02-21T09:07:28.2273788Z // end inline asm 2026-02-21T09:07:28.2273845Z bar.sync 0; 2026-02-21T09:07:28.2274030Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1758, %r1790, %r1822, %r1854}; 2026-02-21T09:07:28.2274087Z bar.sync 0; 2026-02-21T09:07:28.2274147Z // begin inline asm 2026-02-21T09:07:28.2274280Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3325}, [%r1045]; 2026-02-21T09:07:28.2274341Z // end inline asm 2026-02-21T09:07:28.2274397Z bar.sync 0; 2026-02-21T09:07:28.2274586Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1760, %r1792, %r1824, %r1856}; 2026-02-21T09:07:28.2274641Z bar.sync 0; 2026-02-21T09:07:28.2274702Z // begin inline asm 2026-02-21T09:07:28.2274832Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3327}, [%r1045]; 2026-02-21T09:07:28.2274962Z // end inline asm 2026-02-21T09:07:28.2275021Z bar.sync 0; 2026-02-21T09:07:28.2275203Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1761, %r1793, %r1825, %r1857}; 2026-02-21T09:07:28.2275265Z bar.sync 0; 2026-02-21T09:07:28.2275324Z // begin inline asm 2026-02-21T09:07:28.2275452Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3328}, [%r1045]; 2026-02-21T09:07:28.2275514Z // end inline asm 2026-02-21T09:07:28.2275569Z bar.sync 0; 2026-02-21T09:07:28.2275747Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1763, %r1795, %r1827, %r1859}; 2026-02-21T09:07:28.2275808Z bar.sync 0; 2026-02-21T09:07:28.2275867Z // begin inline asm 2026-02-21T09:07:28.2275996Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3330}, [%r1045]; 2026-02-21T09:07:28.2276165Z // end inline asm 2026-02-21T09:07:28.2276226Z bar.sync 0; 2026-02-21T09:07:28.2276404Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1762, %r1794, %r1826, %r1858}; 2026-02-21T09:07:28.2276600Z bar.sync 0; 2026-02-21T09:07:28.2276671Z // begin inline asm 
2026-02-21T09:07:28.2276810Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3329}, [%r1045]; 2026-02-21T09:07:28.2276870Z // end inline asm 2026-02-21T09:07:28.2276929Z bar.sync 0; 2026-02-21T09:07:28.2277115Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1764, %r1796, %r1828, %r1860}; 2026-02-21T09:07:28.2277171Z bar.sync 0; 2026-02-21T09:07:28.2277233Z // begin inline asm 2026-02-21T09:07:28.2277367Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3331}, [%r1045]; 2026-02-21T09:07:28.2277427Z // end inline asm 2026-02-21T09:07:28.2277487Z bar.sync 0; 2026-02-21T09:07:28.2277668Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1765, %r1797, %r1829, %r1861}; 2026-02-21T09:07:28.2277733Z bar.sync 0; 2026-02-21T09:07:28.2277870Z // begin inline asm 2026-02-21T09:07:28.2278002Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3332}, [%r1045]; 2026-02-21T09:07:28.2278064Z // end inline asm 2026-02-21T09:07:28.2278121Z bar.sync 0; 2026-02-21T09:07:28.2278299Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1767, %r1799, %r1831, %r1863}; 2026-02-21T09:07:28.2278360Z bar.sync 0; 2026-02-21T09:07:28.2278420Z // begin inline asm 2026-02-21T09:07:28.2278560Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3334}, [%r1045]; 2026-02-21T09:07:28.2278619Z // end inline asm 2026-02-21T09:07:28.2278682Z bar.sync 0; 2026-02-21T09:07:28.2278861Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1766, %r1798, %r1830, %r1862}; 2026-02-21T09:07:28.2278915Z bar.sync 0; 2026-02-21T09:07:28.2278978Z // begin inline asm 2026-02-21T09:07:28.2279104Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3333}, [%r1045]; 2026-02-21T09:07:28.2279164Z // end inline asm 2026-02-21T09:07:28.2279221Z bar.sync 0; 2026-02-21T09:07:28.2279408Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1768, %r1800, %r1832, %r1864}; 2026-02-21T09:07:28.2279465Z bar.sync 0; 2026-02-21T09:07:28.2279525Z // begin inline asm 2026-02-21T09:07:28.2279658Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3335}, [%r1045]; 
2026-02-21T09:07:28.2279723Z // end inline asm 2026-02-21T09:07:28.2279778Z bar.sync 0; 2026-02-21T09:07:28.2279965Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1769, %r1801, %r1833, %r1865}; 2026-02-21T09:07:28.2280021Z bar.sync 0; 2026-02-21T09:07:28.2280080Z // begin inline asm 2026-02-21T09:07:28.2280209Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3336}, [%r1045]; 2026-02-21T09:07:28.2280271Z // end inline asm 2026-02-21T09:07:28.2280328Z bar.sync 0; 2026-02-21T09:07:28.2280506Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1771, %r1803, %r1835, %r1867}; 2026-02-21T09:07:28.2280565Z bar.sync 0; 2026-02-21T09:07:28.2280624Z // begin inline asm 2026-02-21T09:07:28.2280756Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3338}, [%r1045]; 2026-02-21T09:07:28.2280813Z // end inline asm 2026-02-21T09:07:28.2280873Z bar.sync 0; 2026-02-21T09:07:28.2281053Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1770, %r1802, %r1834, %r1866}; 2026-02-21T09:07:28.2281188Z bar.sync 0; 2026-02-21T09:07:28.2281262Z // begin inline asm 2026-02-21T09:07:28.2281395Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3337}, [%r1045]; 2026-02-21T09:07:28.2281453Z // end inline asm 2026-02-21T09:07:28.2281511Z bar.sync 0; 2026-02-21T09:07:28.2281690Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1772, %r1804, %r1836, %r1868}; 2026-02-21T09:07:28.2281746Z bar.sync 0; 2026-02-21T09:07:28.2281806Z // begin inline asm 2026-02-21T09:07:28.2281937Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3339}, [%r1045]; 2026-02-21T09:07:28.2281994Z // end inline asm 2026-02-21T09:07:28.2282051Z bar.sync 0; 2026-02-21T09:07:28.2282300Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1773, %r1805, %r1837, %r1869}; 2026-02-21T09:07:28.2282418Z bar.sync 0; 2026-02-21T09:07:28.2282478Z // begin inline asm 2026-02-21T09:07:28.2282607Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3340}, [%r1045]; 2026-02-21T09:07:28.2282671Z // end inline asm 2026-02-21T09:07:28.2282727Z bar.sync 0; 
2026-02-21T09:07:28.2282906Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1775, %r1807, %r1839, %r1871}; 2026-02-21T09:07:28.2282967Z bar.sync 0; 2026-02-21T09:07:28.2283025Z // begin inline asm 2026-02-21T09:07:28.2283153Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3342}, [%r1045]; 2026-02-21T09:07:28.2283214Z // end inline asm 2026-02-21T09:07:28.2283269Z bar.sync 0; 2026-02-21T09:07:28.2283445Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1774, %r1806, %r1838, %r1870}; 2026-02-21T09:07:28.2283506Z bar.sync 0; 2026-02-21T09:07:28.2283564Z // begin inline asm 2026-02-21T09:07:28.2283690Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3341}, [%r1045]; 2026-02-21T09:07:28.2283752Z // end inline asm 2026-02-21T09:07:28.2283807Z bar.sync 0; 2026-02-21T09:07:28.2284036Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1776, %r1808, %r1840, %r1872}; 2026-02-21T09:07:28.2284097Z bar.sync 0; 2026-02-21T09:07:28.2284156Z // begin inline asm 2026-02-21T09:07:28.2284282Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3343}, [%r1045]; 2026-02-21T09:07:28.2284338Z // end inline asm 2026-02-21T09:07:28.2284397Z bar.sync 0; 2026-02-21T09:07:28.2284573Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1777, %r1809, %r1841, %r1873}; 2026-02-21T09:07:28.2284626Z bar.sync 0; 2026-02-21T09:07:28.2284689Z // begin inline asm 2026-02-21T09:07:28.2284813Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3344}, [%r1045]; 2026-02-21T09:07:28.2284869Z // end inline asm 2026-02-21T09:07:28.2284923Z bar.sync 0; 2026-02-21T09:07:28.2285105Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1779, %r1811, %r1843, %r1875}; 2026-02-21T09:07:28.2285161Z bar.sync 0; 2026-02-21T09:07:28.2285224Z // begin inline asm 2026-02-21T09:07:28.2285355Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3346}, [%r1045]; 2026-02-21T09:07:28.2285415Z // end inline asm 2026-02-21T09:07:28.2285470Z bar.sync 0; 2026-02-21T09:07:28.2285657Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1778, 
%r1810, %r1842, %r1874}; 2026-02-21T09:07:28.2285711Z bar.sync 0; 2026-02-21T09:07:28.2285770Z // begin inline asm 2026-02-21T09:07:28.2285895Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3345}, [%r1045]; 2026-02-21T09:07:28.2285955Z // end inline asm 2026-02-21T09:07:28.2286010Z bar.sync 0; 2026-02-21T09:07:28.2286187Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1780, %r1812, %r1844, %r1876}; 2026-02-21T09:07:28.2286244Z bar.sync 0; 2026-02-21T09:07:28.2286302Z // begin inline asm 2026-02-21T09:07:28.2286428Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3347}, [%r1045]; 2026-02-21T09:07:28.2286610Z // end inline asm 2026-02-21T09:07:28.2286672Z bar.sync 0; 2026-02-21T09:07:28.2286859Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1781, %r1813, %r1845, %r1877}; 2026-02-21T09:07:28.2286914Z bar.sync 0; 2026-02-21T09:07:28.2286982Z // begin inline asm 2026-02-21T09:07:28.2287115Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3348}, [%r1045]; 2026-02-21T09:07:28.2287262Z // end inline asm 2026-02-21T09:07:28.2287320Z bar.sync 0; 2026-02-21T09:07:28.2287500Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1783, %r1815, %r1847, %r1879}; 2026-02-21T09:07:28.2287554Z bar.sync 0; 2026-02-21T09:07:28.2287613Z // begin inline asm 2026-02-21T09:07:28.2287742Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3350}, [%r1045]; 2026-02-21T09:07:28.2287797Z // end inline asm 2026-02-21T09:07:28.2287852Z bar.sync 0; 2026-02-21T09:07:28.2288038Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1782, %r1814, %r1846, %r1878}; 2026-02-21T09:07:28.2288098Z bar.sync 0; 2026-02-21T09:07:28.2288158Z // begin inline asm 2026-02-21T09:07:28.2288286Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3349}, [%r1045]; 2026-02-21T09:07:28.2288476Z // end inline asm 2026-02-21T09:07:28.2288532Z bar.sync 0; 2026-02-21T09:07:28.2288710Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1325], {%r1784, %r1816, %r1848, %r1880}; 2026-02-21T09:07:28.2288772Z bar.sync 0; 
2026-02-21T09:07:28.2288831Z // begin inline asm 2026-02-21T09:07:28.2288957Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3351}, [%r1045]; 2026-02-21T09:07:28.2289016Z // end inline asm 2026-02-21T09:07:28.2289074Z // begin inline asm 2026-02-21T09:07:28.2289165Z fence.proxy.async.shared::cta; 2026-02-21T09:07:28.2289223Z // end inline asm 2026-02-21T09:07:28.2289299Z wgmma.fence.sync.aligned; 2026-02-21T09:07:28.2289356Z // begin inline asm 2026-02-21T09:07:28.2290126Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3320,%r3321,%r3322,%r3323,%r3324,%r3325,%r3326,%r3327,%r3328,%r3329,%r3330,%r3331,%r3332,%r3333,%r3334,%r3335,%r3336,%r3337,%r3338,%r3339,%r3340,%r3341,%r3342,%r3343,%r3344,%r3345,%r3346,%r3347,%r3348,%r3349,%r3350,%r3351}, {%r2143,%r2144,%r2145,%r2146}, %rd80, %p30, 1, 1; 2026-02-21T09:07:28.2290255Z // end inline asm 2026-02-21T09:07:28.2290336Z wgmma.commit_group.sync.aligned; 2026-02-21T09:07:28.2290400Z mov.b32 %r2180, %r2181; 2026-02-21T09:07:28.2290466Z mov.b32 %r2179, %r2012; 2026-02-21T09:07:28.2290524Z // begin inline asm 2026-02-21T09:07:28.2291081Z // wait for regs: %r3320,%r3321,%r3322,%r3323,%r3324,%r3325,%r3326,%r3327,%r3328,%r3329,%r3330,%r3331,%r3332,%r3333,%r3334,%r3335,%r3336,%r3337,%r3338,%r3339,%r3340,%r3341,%r3342,%r3343,%r3344,%r3345,%r3346,%r3347,%r3348,%r3349,%r3350,%r3351,%r2179,%r2180,%r2181 2026-02-21T09:07:28.2291160Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:07:28.2291217Z // end inline asm 2026-02-21T09:07:28.2291272Z $L__tmp8: 2026-02-21T09:07:28.2291496Z .loc 1 47 110 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:47:110 2026-02-21T09:07:28.2291564Z add.s64 %rd104, %rd104, 8; 2026-02-21T09:07:28.2291627Z add.s64 %rd103, %rd103, 32; 2026-02-21T09:07:28.2291694Z add.s32 %r3319, %r3319, 65536; 2026-02-21T09:07:28.2291764Z setp.lt.u64 %p36, %rd104, 504; 2026-02-21T09:07:28.2291824Z @%p36 bra $L__BB0_5; 2026-02-21T09:07:28.2291937Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 
2026-02-21T09:07:28.2292149Z .loc 1 94 28 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:94:28 2026-02-21T09:07:28.2292238Z cvt.rn.bf16x2.f32 %r2244, %r3321, %r3320; 2026-02-21T09:07:28.2292313Z cvt.rn.bf16x2.f32 %r2245, %r3323, %r3322; 2026-02-21T09:07:28.2292385Z cvt.rn.bf16x2.f32 %r2246, %r3325, %r3324; 2026-02-21T09:07:28.2292461Z cvt.rn.bf16x2.f32 %r2247, %r3327, %r3326; 2026-02-21T09:07:28.2292532Z cvt.rn.bf16x2.f32 %r2248, %r3329, %r3328; 2026-02-21T09:07:28.2292602Z cvt.rn.bf16x2.f32 %r2249, %r3331, %r3330; 2026-02-21T09:07:28.2292678Z cvt.rn.bf16x2.f32 %r2250, %r3333, %r3332; 2026-02-21T09:07:28.2292749Z cvt.rn.bf16x2.f32 %r2251, %r3335, %r3334; 2026-02-21T09:07:28.2292822Z cvt.rn.bf16x2.f32 %r2252, %r3337, %r3336; 2026-02-21T09:07:28.2292895Z cvt.rn.bf16x2.f32 %r2253, %r3339, %r3338; 2026-02-21T09:07:28.2292967Z cvt.rn.bf16x2.f32 %r2254, %r3341, %r3340; 2026-02-21T09:07:28.2293038Z cvt.rn.bf16x2.f32 %r2255, %r3343, %r3342; 2026-02-21T09:07:28.2293182Z cvt.rn.bf16x2.f32 %r2256, %r3345, %r3344; 2026-02-21T09:07:28.2293257Z cvt.rn.bf16x2.f32 %r2257, %r3347, %r3346; 2026-02-21T09:07:28.2293330Z cvt.rn.bf16x2.f32 %r2258, %r3349, %r3348; 2026-02-21T09:07:28.2293400Z cvt.rn.bf16x2.f32 %r2259, %r3351, %r3350; 2026-02-21T09:07:28.2293614Z .loc 1 95 43 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:95:43 2026-02-21T09:07:28.2293692Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:07:28.2293749Z bar.sync 0; 2026-02-21T09:07:28.2293936Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r22], {%r2244, %r2245, %r2246, %r2247}; 2026-02-21T09:07:28.2294120Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r23], {%r2248, %r2249, %r2250, %r2251}; 2026-02-21T09:07:28.2294417Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r24], {%r2252, %r2253, %r2254, %r2255}; 2026-02-21T09:07:28.2294594Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r25], {%r2256, %r2257, %r2258, %r2259}; 2026-02-21T09:07:28.2294669Z // begin inline asm 2026-02-21T09:07:28.2294752Z 
fence.proxy.async.shared::cta; 2026-02-21T09:07:28.2294809Z // end inline asm 2026-02-21T09:07:28.2294869Z bar.sync 0; 2026-02-21T09:07:28.2294940Z elect.sync %r2260|%p39, -1; 2026-02-21T09:07:28.2295009Z and.pred %p37, %p54, %p39; 2026-02-21T09:07:28.2295071Z or.b32 %r2241, %r103, %r101; 2026-02-21T09:07:28.2295133Z // begin inline asm 2026-02-21T09:07:28.2295359Z @%p37 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd69, {%r2241, %r102}], [%r2243]; 2026-02-21T09:07:28.2295419Z // end inline asm 2026-02-21T09:07:28.2295495Z cp.async.bulk.commit_group; 2026-02-21T09:07:28.2295705Z .loc 1 26 97 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:26:97 2026-02-21T09:07:28.2295773Z add.s32 %r3285, %r3285, 1056; 2026-02-21T09:07:28.2295899Z setp.lt.s32 %p40, %r3285, %r3352; 2026-02-21T09:07:28.2295972Z @%p40 bra $L__BB0_2; 2026-02-21T09:07:28.2296061Z $L__BB0_7: // %._crit_edge 2026-02-21T09:07:28.2296140Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:07:28.2296198Z bar.sync 0; 2026-02-21T09:07:28.2296264Z setp.gt.s32 %p41, %r3352, 2047; 2026-02-21T09:07:28.2296324Z @%p41 bra $L__BB0_12; 2026-02-21T09:07:28.2296410Z // %bb.8: // %.lr.ph48 2026-02-21T09:07:28.2296736Z .loc 1 0 97 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:0:97 2026-02-21T09:07:28.2296804Z and.b32 %r2262, %r3273, 1022; 2026-02-21T09:07:28.2296871Z add.s32 %r173, %r3020, %r2262; 2026-02-21T09:07:28.2296935Z shl.b32 %r2265, %r3274, 3; 2026-02-21T09:07:28.2296996Z and.b32 %r2267, %r270, 112; 2026-02-21T09:07:28.2297055Z and.b32 %r2268, %r3273, 6; 2026-02-21T09:07:28.2297124Z add.s32 %r2269, %r3020, %r2265; 2026-02-21T09:07:28.2297189Z add.s32 %r2270, %r2269, %r2267; 2026-02-21T09:07:28.2297250Z add.s32 %r174, %r2270, %r2268; 2026-02-21T09:07:28.2297315Z or.b32 %r2272, %r10, %r3275; 2026-02-21T09:07:28.2297392Z add.s32 %r175, %r3020, %r2272; 2026-02-21T09:07:28.2297456Z xor.b32 %r2273, %r2272, 64; 2026-02-21T09:07:28.2297525Z add.s32 %r176, %r3020, %r2273; 
2026-02-21T09:07:28.2297594Z and.b32 %r2274, %r3273, 508; 2026-02-21T09:07:28.2297656Z neg.s32 %r2276, %r3276; 2026-02-21T09:07:28.2297718Z and.b32 %r2277, %r2276, 576; 2026-02-21T09:07:28.2297786Z xor.b32 %r2278, %r2277, %r2274; 2026-02-21T09:07:28.2297847Z add.s32 %r177, %r3020, %r2278; 2026-02-21T09:07:28.2297909Z and.b32 %r2280, %r3277, 8032; 2026-02-21T09:07:28.2297969Z and.b32 %r2282, %r3278, 144; 2026-02-21T09:07:28.2298032Z or.b32 %r2284, %r3279, %r2280; 2026-02-21T09:07:28.2298092Z or.b32 %r2285, %r2284, %r2282; 2026-02-21T09:07:28.2298153Z add.s32 %r178, %r3020, %r2285; 2026-02-21T09:07:28.2298223Z xor.b32 %r2286, %r2285, 16; 2026-02-21T09:07:28.2298285Z add.s32 %r179, %r3020, %r2286; 2026-02-21T09:07:28.2298350Z and.b32 %r2288, %r3280, 120; 2026-02-21T09:07:28.2298413Z or.b32 %r2289, %r2288, %r4; 2026-02-21T09:07:28.2298580Z shl.b32 %r2290, %r2289, 4; 2026-02-21T09:07:28.2298641Z add.s32 %r2291, %r3020, 8192; 2026-02-21T09:07:28.2298705Z add.s32 %r3024, %r2291, %r2290; 2026-02-21T09:07:28.2298771Z and.b32 %r2293, %r3281, 1536; 2026-02-21T09:07:28.2298831Z shl.b32 %r2295, %r3274, 2; 2026-02-21T09:07:28.2298904Z add.s32 %r2296, %r2291, %r2293; 2026-02-21T09:07:28.2298972Z add.s32 %r2297, %r2296, %r3282; 2026-02-21T09:07:28.2299033Z add.s32 %r2333, %r2297, %r2295; 2026-02-21T09:07:28.2299094Z bfe.u32 %r2298, %r3020, 4, 14; 2026-02-21T09:07:28.2299156Z cvt.u64.u32 %rd84, %r2298; 2026-02-21T09:07:28.2299242Z or.b64 %rd92, %rd84, -4611685949674356736; 2026-02-21T09:07:28.2299306Z add.s32 %r2299, %r3020, 2048; 2026-02-21T09:07:28.2299365Z bfe.u32 %r2300, %r2299, 4, 14; 2026-02-21T09:07:28.2299561Z cvt.u64.u32 %rd85, %r2300; 2026-02-21T09:07:28.2299641Z or.b64 %rd93, %rd85, -4611685949674356736; 2026-02-21T09:07:28.2299702Z add.s32 %r2301, %r3020, 4096; 2026-02-21T09:07:28.2299777Z bfe.u32 %r2302, %r2301, 4, 14; 2026-02-21T09:07:28.2299844Z cvt.u64.u32 %rd86, %r2302; 2026-02-21T09:07:28.2299917Z or.b64 %rd94, %rd86, -4611685949674356736; 
2026-02-21T09:07:28.2299979Z add.s32 %r2303, %r3020, 6144; 2026-02-21T09:07:28.2300042Z bfe.u32 %r2304, %r2303, 4, 14; 2026-02-21T09:07:28.2300103Z cvt.u64.u32 %rd87, %r2304; 2026-02-21T09:07:28.2300175Z or.b64 %rd95, %rd87, -4611685949674356736; 2026-02-21T09:07:28.2300238Z and.b32 %r2306, %r3283, 1920; 2026-02-21T09:07:28.2300301Z and.b32 %r2307, %r3281, 30720; 2026-02-21T09:07:28.2300361Z or.b32 %r2309, %r2306, %r2307; 2026-02-21T09:07:28.2300425Z xor.b32 %r2310, %r3282, %r3284; 2026-02-21T09:07:28.2300491Z or.b32 %r2311, %r2309, %r2310; 2026-02-21T09:07:28.2300552Z add.s32 %r182, %r3020, %r2311; 2026-02-21T09:07:28.2300617Z xor.b32 %r2312, %r2311, 32; 2026-02-21T09:07:28.2300755Z add.s32 %r183, %r3020, %r2312; 2026-02-21T09:07:28.2300821Z xor.b32 %r2313, %r2311, 64; 2026-02-21T09:07:28.2300883Z add.s32 %r184, %r3020, %r2313; 2026-02-21T09:07:28.2300945Z xor.b32 %r2314, %r2311, 96; 2026-02-21T09:07:28.2301023Z add.s32 %r185, %r3020, %r2314; 2026-02-21T09:07:28.2301241Z .loc 1 26 97 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:26:97 2026-02-21T09:07:28.2301315Z mad.wide.u32 %rd21, %r10, 8192, %rd31; 2026-02-21T09:07:28.2301430Z $L__BB0_9: // =>This Loop Header: Depth=1 2026-02-21T09:07:28.2301528Z // Child Loop BB0_10 Depth 2 2026-02-21T09:07:28.2301735Z .loc 1 32 35 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:32:35 2026-02-21T09:07:28.2301799Z shr.s32 %r2316, %r3352, 31; 2026-02-21T09:07:28.2301861Z shr.u32 %r2317, %r2316, 21; 2026-02-21T09:07:28.2301928Z add.s32 %r2318, %r3352, %r2317; 2026-02-21T09:07:28.2302128Z .loc 1 35 45 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:35:45 2026-02-21T09:07:28.2302197Z and.b32 %r2319, %r2318, 63488; 2026-02-21T09:07:28.2302272Z sub.s32 %r2320, %r3352, %r2319; 2026-02-21T09:07:28.2302474Z .loc 1 35 64 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:35:64 2026-02-21T09:07:28.2302542Z cvt.u16.u32 %rs109, %r2320; 2026-02-21T09:07:28.2302741Z .loc 1 36 51 // 
cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:36:51 2026-02-21T09:07:28.2302803Z shr.s16 %rs110, %rs109, 15; 2026-02-21T09:07:28.2302869Z shr.u16 %rs111, %rs110, 10; 2026-02-21T09:07:28.2302931Z add.s16 %rs112, %rs109, %rs111; 2026-02-21T09:07:28.2302994Z shr.s16 %rs113, %rs112, 6; 2026-02-21T09:07:28.2303191Z .loc 1 35 64 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:35:64 2026-02-21T09:07:28.2303264Z and.b16 %rs114, %rs112, -64; 2026-02-21T09:07:28.2303328Z sub.s16 %rs115, %rs109, %rs114; 2026-02-21T09:07:28.2303526Z .loc 1 37 27 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:37:27 2026-02-21T09:07:28.2303658Z shl.b32 %r2321, %r2318, 1; 2026-02-21T09:07:28.2303720Z and.b32 %r2322, %r2321, -4096; 2026-02-21T09:07:28.2303787Z mul.wide.s16 %r2323, %rs115, 64; 2026-02-21T09:07:28.2303852Z add.s32 %r188, %r2323, %r2322; 2026-02-21T09:07:28.2304051Z .loc 1 39 27 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:39:27 2026-02-21T09:07:28.2304118Z mul.wide.s16 %r189, %rs113, 256; 2026-02-21T09:07:28.2304330Z .loc 1 47 110 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:47:110 2026-02-21T09:07:28.2304406Z or.b32 %r2324, %r6, %r2322; 2026-02-21T09:07:28.2304471Z add.s32 %r2325, %r2324, %r2323; 2026-02-21T09:07:28.2304535Z shl.b32 %r2326, %r2325, 10; 2026-02-21T09:07:28.2304653Z or.b32 %r2327, %r11, %r2326; 2026-02-21T09:07:28.2304775Z mad.wide.s32 %rd106, %r2327, 2, %rd30; 2026-02-21T09:07:28.2304838Z or.b32 %r2328, %r8, %r189; 2026-02-21T09:07:28.2304904Z cvt.s64.s32 %rd89, %r2328; 2026-02-21T09:07:28.2304970Z add.s64 %rd105, %rd21, %rd89; 2026-02-21T09:07:28.2305029Z mov.b32 %r3353, 0f00000000; 2026-02-21T09:07:28.2305089Z mov.b64 %rd107, -8; 2026-02-21T09:07:28.2305154Z mov.b32 %r3354, %r3353; 2026-02-21T09:07:28.2305215Z mov.b32 %r3355, %r3353; 2026-02-21T09:07:28.2305274Z mov.b32 %r3356, %r3353; 2026-02-21T09:07:28.2305336Z mov.b32 %r3357, %r3353; 2026-02-21T09:07:28.2305395Z mov.b32 %r3358, %r3353; 
2026-02-21T09:07:28.2305453Z mov.b32 %r3359, %r3353; 2026-02-21T09:07:28.2305512Z mov.b32 %r3360, %r3353; 2026-02-21T09:07:28.2305572Z mov.b32 %r3361, %r3353; 2026-02-21T09:07:28.2305632Z mov.b32 %r3362, %r3353; 2026-02-21T09:07:28.2305692Z mov.b32 %r3363, %r3353; 2026-02-21T09:07:28.2305752Z mov.b32 %r3364, %r3353; 2026-02-21T09:07:28.2305812Z mov.b32 %r3365, %r3353; 2026-02-21T09:07:28.2305922Z mov.b32 %r3366, %r3353; 2026-02-21T09:07:28.2305984Z mov.b32 %r3367, %r3353; 2026-02-21T09:07:28.2306049Z mov.b32 %r3368, %r3353; 2026-02-21T09:07:28.2306106Z mov.b32 %r3369, %r3353; 2026-02-21T09:07:28.2306166Z mov.b32 %r3370, %r3353; 2026-02-21T09:07:28.2306227Z mov.b32 %r3371, %r3353; 2026-02-21T09:07:28.2306285Z mov.b32 %r3372, %r3353; 2026-02-21T09:07:28.2306341Z mov.b32 %r3373, %r3353; 2026-02-21T09:07:28.2306400Z mov.b32 %r3374, %r3353; 2026-02-21T09:07:28.2306590Z mov.b32 %r3375, %r3353; 2026-02-21T09:07:28.2306654Z mov.b32 %r3376, %r3353; 2026-02-21T09:07:28.2306713Z mov.b32 %r3377, %r3353; 2026-02-21T09:07:28.2306773Z mov.b32 %r3378, %r3353; 2026-02-21T09:07:28.2306831Z mov.b32 %r3379, %r3353; 2026-02-21T09:07:28.2306890Z mov.b32 %r3380, %r3353; 2026-02-21T09:07:28.2306948Z mov.b32 %r3381, %r3353; 2026-02-21T09:07:28.2307012Z mov.b32 %r3382, %r3353; 2026-02-21T09:07:28.2307070Z mov.b32 %r3383, %r3353; 2026-02-21T09:07:28.2307133Z mov.b32 %r3384, %r3353; 2026-02-21T09:07:28.2307253Z $L__BB0_10: // Parent Loop BB0_9 Depth=1 2026-02-21T09:07:28.2307358Z // => This Inner Loop Header: Depth=2 2026-02-21T09:07:28.2307562Z .loc 1 55 80 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:55:80 2026-02-21T09:07:28.2307637Z // begin inline asm 2026-02-21T09:07:28.2307700Z mov.u16 %rs116, 0x0; 2026-02-21T09:07:28.2307777Z ld.global.b16 { %rs116 }, [ %rd106 + 0 ]; 2026-02-21T09:07:28.2307835Z // end inline asm 2026-02-21T09:07:28.2308039Z .loc 1 59 32 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:59:32 2026-02-21T09:07:28.2308095Z bar.sync 0; 
2026-02-21T09:07:28.2308162Z st.shared.b16 [%r173], %rs116; 2026-02-21T09:07:28.2308222Z bar.sync 0; 2026-02-21T09:07:28.2308288Z ld.shared.b16 %rs120, [%r174]; 2026-02-21T09:07:28.2308440Z ld.shared.b16 %rs121, [%r174+128]; 2026-02-21T09:07:28.2308513Z ld.shared.b16 %rs122, [%r174+8]; 2026-02-21T09:07:28.2308583Z ld.shared.b16 %rs123, [%r174+136]; 2026-02-21T09:07:28.2308649Z cvt.f32.bf16 %r2621, %rs120; 2026-02-21T09:07:28.2308712Z cvt.f32.bf16 %r2622, %rs121; 2026-02-21T09:07:28.2308871Z cvt.f32.bf16 %r2623, %rs122; 2026-02-21T09:07:28.2308932Z cvt.f32.bf16 %r2624, %rs123; 2026-02-21T09:07:28.2309135Z .loc 1 61 87 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:61:87 2026-02-21T09:07:28.2309200Z // begin inline asm 2026-02-21T09:07:28.2309259Z mov.u16 %rs117, 0x0; 2026-02-21T09:07:28.2309332Z ld.global.b16 { %rs117 }, [ %rd105 + 0 ]; 2026-02-21T09:07:28.2309392Z // end inline asm 2026-02-21T09:07:28.2309459Z shr.u16 %rs124, %rs117, 8; 2026-02-21T09:07:28.2309657Z .loc 1 69 28 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:69:28 2026-02-21T09:07:28.2309714Z bar.sync 0; 2026-02-21T09:07:28.2309784Z st.shared.b8 [%r175], %rs117; 2026-02-21T09:07:28.2309983Z st.shared.b8 [%r176+512], %rs124; 2026-02-21T09:07:28.2310042Z bar.sync 0; 2026-02-21T09:07:28.2310109Z ld.shared.b32 %r3225, [%r177]; 2026-02-21T09:07:28.2310182Z prmt.b32 %r3226, %r3225, 0, 0x7770U; 2026-02-21T09:07:28.2310247Z cvt.u16.u32 %rs125, %r3226; 2026-02-21T09:07:28.2310313Z prmt.b32 %r3227, %r3225, 0, 0x7771U; 2026-02-21T09:07:28.2310379Z cvt.u16.u32 %rs126, %r3227; 2026-02-21T09:07:28.2310444Z prmt.b32 %r3228, %r3225, 0, 0x7772U; 2026-02-21T09:07:28.2310505Z cvt.u16.u32 %rs127, %r3228; 2026-02-21T09:07:28.2310573Z prmt.b32 %r3229, %r3225, 0, 0x7773U; 2026-02-21T09:07:28.2310635Z cvt.u16.u32 %rs128, %r3229; 2026-02-21T09:07:28.2310835Z .loc 1 64 28 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:64:28 2026-02-21T09:07:28.2310898Z shl.b16 %rs129, %rs125, 4; 
2026-02-21T09:07:28.2310964Z shl.b16 %rs130, %rs126, 4; 2026-02-21T09:07:28.2311025Z shl.b16 %rs131, %rs127, 4; 2026-02-21T09:07:28.2311086Z shl.b16 %rs132, %rs128, 4; 2026-02-21T09:07:28.2311360Z .loc 1 79 58 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:79:58 2026-02-21T09:07:28.2311448Z selp.b16 %rs133, %rs129, %rs125, %p53; 2026-02-21T09:07:28.2311513Z cvt.s16.s8 %rs134, %rs133; 2026-02-21T09:07:28.2311578Z shr.s16 %rs135, %rs134, 4; 2026-02-21T09:07:28.2311648Z selp.b16 %rs136, %rs130, %rs126, %p53; 2026-02-21T09:07:28.2311710Z cvt.s16.s8 %rs137, %rs136; 2026-02-21T09:07:28.2311770Z shr.s16 %rs138, %rs137, 4; 2026-02-21T09:07:28.2311851Z selp.b16 %rs139, %rs131, %rs127, %p53; 2026-02-21T09:07:28.2311913Z cvt.s16.s8 %rs140, %rs139; 2026-02-21T09:07:28.2311976Z shr.s16 %rs141, %rs140, 4; 2026-02-21T09:07:28.2312046Z selp.b16 %rs142, %rs132, %rs128, %p53; 2026-02-21T09:07:28.2312108Z cvt.s16.s8 %rs143, %rs142; 2026-02-21T09:07:28.2312171Z shr.s16 %rs144, %rs143, 4; 2026-02-21T09:07:28.2312371Z .loc 1 84 32 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:84:32 2026-02-21T09:07:28.2312444Z cvt.rn.f32.s16 %r3230, %rs135; 2026-02-21T09:07:28.2312509Z cvt.rn.f32.s16 %r3231, %rs138; 2026-02-21T09:07:28.2312573Z cvt.rn.f32.s16 %r3232, %rs141; 2026-02-21T09:07:28.2312638Z cvt.rn.f32.s16 %r3233, %rs144; 2026-02-21T09:07:28.2312696Z bar.sync 0; 2026-02-21T09:07:28.2312758Z st.shared.b32 [%r178], %r3230; 2026-02-21T09:07:28.2312824Z st.shared.b32 [%r178+8], %r3231; 2026-02-21T09:07:28.2312894Z st.shared.b32 [%r179], %r3232; 2026-02-21T09:07:28.2312959Z st.shared.b32 [%r179+8], %r3233; 2026-02-21T09:07:28.2313101Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3353}; 2026-02-21T09:07:28.2313165Z bar.sync 0; 2026-02-21T09:07:28.2313225Z // begin inline asm 2026-02-21T09:07:28.2313417Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2761, %r2793, %r2825, %r2857}, [%r2333]; 2026-02-21T09:07:28.2313478Z // end inline asm 2026-02-21T09:07:28.2313532Z 
bar.sync 0; 2026-02-21T09:07:28.2313664Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3355}; 2026-02-21T09:07:28.2313732Z bar.sync 0; 2026-02-21T09:07:28.2313799Z // begin inline asm 2026-02-21T09:07:28.2313982Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2763, %r2795, %r2827, %r2859}, [%r2333]; 2026-02-21T09:07:28.2314039Z // end inline asm 2026-02-21T09:07:28.2314188Z bar.sync 0; 2026-02-21T09:07:28.2314321Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3354}; 2026-02-21T09:07:28.2314375Z bar.sync 0; 2026-02-21T09:07:28.2314435Z // begin inline asm 2026-02-21T09:07:28.2314618Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2762, %r2794, %r2826, %r2858}, [%r2333]; 2026-02-21T09:07:28.2314675Z // end inline asm 2026-02-21T09:07:28.2314731Z bar.sync 0; 2026-02-21T09:07:28.2314861Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3356}; 2026-02-21T09:07:28.2314916Z bar.sync 0; 2026-02-21T09:07:28.2314974Z // begin inline asm 2026-02-21T09:07:28.2315154Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2764, %r2796, %r2828, %r2860}, [%r2333]; 2026-02-21T09:07:28.2315210Z // end inline asm 2026-02-21T09:07:28.2315364Z bar.sync 0; 2026-02-21T09:07:28.2315507Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3357}; 2026-02-21T09:07:28.2315567Z bar.sync 0; 2026-02-21T09:07:28.2315628Z // begin inline asm 2026-02-21T09:07:28.2315810Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2765, %r2797, %r2829, %r2861}, [%r2333]; 2026-02-21T09:07:28.2315871Z // end inline asm 2026-02-21T09:07:28.2315926Z bar.sync 0; 2026-02-21T09:07:28.2316052Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3359}; 2026-02-21T09:07:28.2316107Z bar.sync 0; 2026-02-21T09:07:28.2316170Z // begin inline asm 2026-02-21T09:07:28.2316347Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2767, %r2799, %r2831, %r2863}, [%r2333]; 2026-02-21T09:07:28.2316403Z // end inline asm 2026-02-21T09:07:28.2316590Z bar.sync 0; 2026-02-21T09:07:28.2316725Z stmatrix.sync.aligned.m8n8.x1.shared.b16 
[%r3024], {%r3358}; 2026-02-21T09:07:28.2316778Z bar.sync 0; 2026-02-21T09:07:28.2316841Z // begin inline asm 2026-02-21T09:07:28.2317111Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2766, %r2798, %r2830, %r2862}, [%r2333]; 2026-02-21T09:07:28.2317173Z // end inline asm 2026-02-21T09:07:28.2317229Z bar.sync 0; 2026-02-21T09:07:28.2317367Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3360}; 2026-02-21T09:07:28.2317427Z bar.sync 0; 2026-02-21T09:07:28.2317487Z // begin inline asm 2026-02-21T09:07:28.2317669Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2768, %r2800, %r2832, %r2864}, [%r2333]; 2026-02-21T09:07:28.2317726Z // end inline asm 2026-02-21T09:07:28.2317781Z bar.sync 0; 2026-02-21T09:07:28.2317909Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3361}; 2026-02-21T09:07:28.2317966Z bar.sync 0; 2026-02-21T09:07:28.2318025Z // begin inline asm 2026-02-21T09:07:28.2318201Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2769, %r2801, %r2833, %r2865}, [%r2333]; 2026-02-21T09:07:28.2318259Z // end inline asm 2026-02-21T09:07:28.2318316Z bar.sync 0; 2026-02-21T09:07:28.2318445Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3363}; 2026-02-21T09:07:28.2318504Z bar.sync 0; 2026-02-21T09:07:28.2318579Z // begin inline asm 2026-02-21T09:07:28.2318762Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2771, %r2803, %r2835, %r2867}, [%r2333]; 2026-02-21T09:07:28.2318821Z // end inline asm 2026-02-21T09:07:28.2318880Z bar.sync 0; 2026-02-21T09:07:28.2319006Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3362}; 2026-02-21T09:07:28.2319061Z bar.sync 0; 2026-02-21T09:07:28.2319124Z // begin inline asm 2026-02-21T09:07:28.2319301Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2770, %r2802, %r2834, %r2866}, [%r2333]; 2026-02-21T09:07:28.2319358Z // end inline asm 2026-02-21T09:07:28.2319411Z bar.sync 0; 2026-02-21T09:07:28.2319540Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3364}; 2026-02-21T09:07:28.2319594Z bar.sync 0; 2026-02-21T09:07:28.2319653Z 
// begin inline asm 2026-02-21T09:07:28.2319846Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2772, %r2804, %r2836, %r2868}, [%r2333]; 2026-02-21T09:07:28.2319907Z // end inline asm 2026-02-21T09:07:28.2319963Z bar.sync 0; 2026-02-21T09:07:28.2320103Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3365}; 2026-02-21T09:07:28.2320166Z bar.sync 0; 2026-02-21T09:07:28.2320306Z // begin inline asm 2026-02-21T09:07:28.2320488Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2773, %r2805, %r2837, %r2869}, [%r2333]; 2026-02-21T09:07:28.2320556Z // end inline asm 2026-02-21T09:07:28.2320617Z bar.sync 0; 2026-02-21T09:07:28.2320746Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3367}; 2026-02-21T09:07:28.2320810Z bar.sync 0; 2026-02-21T09:07:28.2320870Z // begin inline asm 2026-02-21T09:07:28.2321046Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2775, %r2807, %r2839, %r2871}, [%r2333]; 2026-02-21T09:07:28.2321104Z // end inline asm 2026-02-21T09:07:28.2321167Z bar.sync 0; 2026-02-21T09:07:28.2321293Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3366}; 2026-02-21T09:07:28.2321350Z bar.sync 0; 2026-02-21T09:07:28.2321493Z // begin inline asm 2026-02-21T09:07:28.2321735Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2774, %r2806, %r2838, %r2870}, [%r2333]; 2026-02-21T09:07:28.2321794Z // end inline asm 2026-02-21T09:07:28.2321852Z bar.sync 0; 2026-02-21T09:07:28.2321986Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3368}; 2026-02-21T09:07:28.2322041Z bar.sync 0; 2026-02-21T09:07:28.2322103Z // begin inline asm 2026-02-21T09:07:28.2322286Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2776, %r2808, %r2840, %r2872}, [%r2333]; 2026-02-21T09:07:28.2322345Z // end inline asm 2026-02-21T09:07:28.2322400Z bar.sync 0; 2026-02-21T09:07:28.2322529Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3369}; 2026-02-21T09:07:28.2322585Z bar.sync 0; 2026-02-21T09:07:28.2322645Z // begin inline asm 2026-02-21T09:07:28.2322822Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 
{%r2777, %r2809, %r2841, %r2873}, [%r2333]; 2026-02-21T09:07:28.2322886Z // end inline asm 2026-02-21T09:07:28.2322942Z bar.sync 0; 2026-02-21T09:07:28.2323125Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3371}; 2026-02-21T09:07:28.2323191Z bar.sync 0; 2026-02-21T09:07:28.2323250Z // begin inline asm 2026-02-21T09:07:28.2323427Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2779, %r2811, %r2843, %r2875}, [%r2333]; 2026-02-21T09:07:28.2323485Z // end inline asm 2026-02-21T09:07:28.2323546Z bar.sync 0; 2026-02-21T09:07:28.2323675Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3370}; 2026-02-21T09:07:28.2323730Z bar.sync 0; 2026-02-21T09:07:28.2323793Z // begin inline asm 2026-02-21T09:07:28.2323969Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2778, %r2810, %r2842, %r2874}, [%r2333]; 2026-02-21T09:07:28.2324027Z // end inline asm 2026-02-21T09:07:28.2324086Z bar.sync 0; 2026-02-21T09:07:28.2324213Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3372}; 2026-02-21T09:07:28.2324282Z bar.sync 0; 2026-02-21T09:07:28.2324345Z // begin inline asm 2026-02-21T09:07:28.2324530Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2780, %r2812, %r2844, %r2876}, [%r2333]; 2026-02-21T09:07:28.2324589Z // end inline asm 2026-02-21T09:07:28.2324644Z bar.sync 0; 2026-02-21T09:07:28.2324774Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3373}; 2026-02-21T09:07:28.2324832Z bar.sync 0; 2026-02-21T09:07:28.2324892Z // begin inline asm 2026-02-21T09:07:28.2325068Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2781, %r2813, %r2845, %r2877}, [%r2333]; 2026-02-21T09:07:28.2325130Z // end inline asm 2026-02-21T09:07:28.2325187Z bar.sync 0; 2026-02-21T09:07:28.2325312Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3375}; 2026-02-21T09:07:28.2325373Z bar.sync 0; 2026-02-21T09:07:28.2325433Z // begin inline asm 2026-02-21T09:07:28.2325612Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2783, %r2815, %r2847, %r2879}, [%r2333]; 2026-02-21T09:07:28.2325674Z // end inline 
asm 2026-02-21T09:07:28.2325729Z bar.sync 0; 2026-02-21T09:07:28.2325858Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3374}; 2026-02-21T09:07:28.2325916Z bar.sync 0; 2026-02-21T09:07:28.2325982Z // begin inline asm 2026-02-21T09:07:28.2326162Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2782, %r2814, %r2846, %r2878}, [%r2333]; 2026-02-21T09:07:28.2326220Z // end inline asm 2026-02-21T09:07:28.2326347Z bar.sync 0; 2026-02-21T09:07:28.2326586Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3376}; 2026-02-21T09:07:28.2326644Z bar.sync 0; 2026-02-21T09:07:28.2326715Z // begin inline asm 2026-02-21T09:07:28.2326901Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2784, %r2816, %r2848, %r2880}, [%r2333]; 2026-02-21T09:07:28.2326959Z // end inline asm 2026-02-21T09:07:28.2327015Z bar.sync 0; 2026-02-21T09:07:28.2327149Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3377}; 2026-02-21T09:07:28.2327204Z bar.sync 0; 2026-02-21T09:07:28.2327264Z // begin inline asm 2026-02-21T09:07:28.2327442Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2785, %r2817, %r2849, %r2881}, [%r2333]; 2026-02-21T09:07:28.2327505Z // end inline asm 2026-02-21T09:07:28.2327712Z bar.sync 0; 2026-02-21T09:07:28.2327846Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3379}; 2026-02-21T09:07:28.2327909Z bar.sync 0; 2026-02-21T09:07:28.2327969Z // begin inline asm 2026-02-21T09:07:28.2328149Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2787, %r2819, %r2851, %r2883}, [%r2333]; 2026-02-21T09:07:28.2328211Z // end inline asm 2026-02-21T09:07:28.2328267Z bar.sync 0; 2026-02-21T09:07:28.2328395Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3378}; 2026-02-21T09:07:28.2328450Z bar.sync 0; 2026-02-21T09:07:28.2328512Z // begin inline asm 2026-02-21T09:07:28.2328689Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2786, %r2818, %r2850, %r2882}, [%r2333]; 2026-02-21T09:07:28.2328747Z // end inline asm 2026-02-21T09:07:28.2328806Z bar.sync 0; 2026-02-21T09:07:28.2328933Z 
stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3380}; 2026-02-21T09:07:28.2328989Z bar.sync 0; 2026-02-21T09:07:28.2329048Z // begin inline asm 2026-02-21T09:07:28.2329299Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2788, %r2820, %r2852, %r2884}, [%r2333]; 2026-02-21T09:07:28.2329359Z // end inline asm 2026-02-21T09:07:28.2329414Z bar.sync 0; 2026-02-21T09:07:28.2329554Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3381}; 2026-02-21T09:07:28.2329612Z bar.sync 0; 2026-02-21T09:07:28.2329671Z // begin inline asm 2026-02-21T09:07:28.2329851Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2789, %r2821, %r2853, %r2885}, [%r2333]; 2026-02-21T09:07:28.2329908Z // end inline asm 2026-02-21T09:07:28.2329964Z bar.sync 0; 2026-02-21T09:07:28.2330093Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3383}; 2026-02-21T09:07:28.2330151Z bar.sync 0; 2026-02-21T09:07:28.2330210Z // begin inline asm 2026-02-21T09:07:28.2330398Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2791, %r2823, %r2855, %r2887}, [%r2333]; 2026-02-21T09:07:28.2330460Z // end inline asm 2026-02-21T09:07:28.2330518Z bar.sync 0; 2026-02-21T09:07:28.2330646Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3382}; 2026-02-21T09:07:28.2330705Z bar.sync 0; 2026-02-21T09:07:28.2330769Z // begin inline asm 2026-02-21T09:07:28.2330947Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2790, %r2822, %r2854, %r2886}, [%r2333]; 2026-02-21T09:07:28.2331016Z // end inline asm 2026-02-21T09:07:28.2331076Z bar.sync 0; 2026-02-21T09:07:28.2331206Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3024], {%r3384}; 2026-02-21T09:07:28.2331261Z bar.sync 0; 2026-02-21T09:07:28.2331326Z // begin inline asm 2026-02-21T09:07:28.2331505Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2792, %r2824, %r2856, %r2888}, [%r2333]; 2026-02-21T09:07:28.2331563Z // end inline asm 2026-02-21T09:07:28.2331619Z $L__tmp9: 2026-02-21T09:07:28.2331904Z .loc 2 291 36 // standard.py:291:36 @[ 
cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:91:40 ] 2026-02-21T09:07:28.2331965Z // begin inline asm 2026-02-21T09:07:28.2332046Z fence.proxy.async.shared::cta; 2026-02-21T09:07:28.2332111Z // end inline asm 2026-02-21T09:07:28.2332196Z shfl.sync.idx.b32 %r3234, %r5, 0, 31, -1; 2026-02-21T09:07:28.2332273Z wgmma.fence.sync.aligned; 2026-02-21T09:07:28.2332337Z mov.pred %p42, -1; 2026-02-21T09:07:28.2332402Z // begin inline asm 2026-02-21T09:07:28.2333250Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2761,%r2762,%r2763,%r2764,%r2765,%r2766,%r2767,%r2768,%r2769,%r2770,%r2771,%r2772,%r2773,%r2774,%r2775,%r2776,%r2777,%r2778,%r2779,%r2780,%r2781,%r2782,%r2783,%r2784,%r2785,%r2786,%r2787,%r2788,%r2789,%r2790,%r2791,%r2792}, {%r2621,%r2622,%r2623,%r2624}, %rd92, %p42, 1, 1; 2026-02-21T09:07:28.2333311Z // end inline asm 2026-02-21T09:07:28.2333373Z // begin inline asm 2026-02-21T09:07:28.2334125Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2793,%r2794,%r2795,%r2796,%r2797,%r2798,%r2799,%r2800,%r2801,%r2802,%r2803,%r2804,%r2805,%r2806,%r2807,%r2808,%r2809,%r2810,%r2811,%r2812,%r2813,%r2814,%r2815,%r2816,%r2817,%r2818,%r2819,%r2820,%r2821,%r2822,%r2823,%r2824}, {%r2621,%r2622,%r2623,%r2624}, %rd93, %p42, 1, 1; 2026-02-21T09:07:28.2334278Z // end inline asm 2026-02-21T09:07:28.2334348Z // begin inline asm 2026-02-21T09:07:28.2335094Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2825,%r2826,%r2827,%r2828,%r2829,%r2830,%r2831,%r2832,%r2833,%r2834,%r2835,%r2836,%r2837,%r2838,%r2839,%r2840,%r2841,%r2842,%r2843,%r2844,%r2845,%r2846,%r2847,%r2848,%r2849,%r2850,%r2851,%r2852,%r2853,%r2854,%r2855,%r2856}, {%r2621,%r2622,%r2623,%r2624}, %rd94, %p42, 1, 1; 2026-02-21T09:07:28.2335154Z // end inline asm 2026-02-21T09:07:28.2335218Z // begin inline asm 2026-02-21T09:07:28.2335964Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r2857,%r2858,%r2859,%r2860,%r2861,%r2862,%r2863,%r2864,%r2865,%r2866,%r2867,%r2868,%r2869,%r2870,%r2871,%r2872,%r2873,%r2874,%r2875,%r2876,%r2877,%r2878,%r2879,%r2880,%r2881,%r2882,%r2883,%r2884,%r2885,%r2886,%r2887,%r2888}, {%r2621,%r2622,%r2623,%r2624}, %rd95, %p42, 1, 1; 2026-02-21T09:07:28.2336023Z // end inline asm 2026-02-21T09:07:28.2336108Z wgmma.commit_group.sync.aligned; 2026-02-21T09:07:28.2336254Z mov.b32 %r3189, 0; 2026-02-21T09:07:28.2336320Z mov.b32 %r2889, %r3020; 2026-02-21T09:07:28.2336390Z mov.b32 %r2890, %r3189; 2026-02-21T09:07:28.2336559Z mov.b32 %r2891, %r3189; 2026-02-21T09:07:28.2336637Z // begin inline asm 2026-02-21T09:07:28.2338719Z // wait for regs: %r2761,%r2762,%r2763,%r2764,%r2765,%r2766,%r2767,%r2768,%r2769,%r2770,%r2771,%r2772,%r2773,%r2774,%r2775,%r2776,%r2777,%r2778,%r2779,%r2780,%r2781,%r2782,%r2783,%r2784,%r2785,%r2786,%r2787,%r2788,%r2789,%r2790,%r2791,%r2792,%r2793,%r2794,%r2795,%r2796,%r2797,%r2798,%r2799,%r2800,%r2801,%r2802,%r2803,%r2804,%r2805,%r2806,%r2807,%r2808,%r2809,%r2810,%r2811,%r2812,%r2813,%r2814,%r2815,%r2816,%r2817,%r2818,%r2819,%r2820,%r2821,%r2822,%r2823,%r2824,%r2825,%r2826,%r2827,%r2828,%r2829,%r2830,%r2831,%r2832,%r2833,%r2834,%r2835,%r2836,%r2837,%r2838,%r2839,%r2840,%r2841,%r2842,%r2843,%r2844,%r2845,%r2846,%r2847,%r2848,%r2849,%r2850,%r2851,%r2852,%r2853,%r2854,%r2855,%r2856,%r2857,%r2858,%r2859,%r2860,%r2861,%r2862,%r2863,%r2864,%r2865,%r2866,%r2867,%r2868,%r2869,%r2870,%r2871,%r2872,%r2873,%r2874,%r2875,%r2876,%r2877,%r2878,%r2879,%r2880,%r2881,%r2882,%r2883,%r2884,%r2885,%r2886,%r2887,%r2888,%r2889,%r2890,%r2891 2026-02-21T09:07:28.2338800Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:07:28.2338861Z // end inline asm 2026-02-21T09:07:28.2338917Z $L__tmp10: 2026-02-21T09:07:28.2339129Z .loc 1 55 80 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:55:80 2026-02-21T09:07:28.2339195Z add.s64 %rd96, %rd106, 16; 2026-02-21T09:07:28.2339259Z // begin inline asm 2026-02-21T09:07:28.2339319Z 
mov.u16 %rs118, 0x0; 2026-02-21T09:07:28.2339397Z ld.global.b16 { %rs118 }, [ %rd96 + 0 ]; 2026-02-21T09:07:28.2339459Z // end inline asm 2026-02-21T09:07:28.2339663Z .loc 1 59 32 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:59:32 2026-02-21T09:07:28.2339721Z bar.sync 0; 2026-02-21T09:07:28.2339791Z st.shared.b16 [%r173], %rs118; 2026-02-21T09:07:28.2339855Z bar.sync 0; 2026-02-21T09:07:28.2339922Z ld.shared.b16 %rs145, [%r174]; 2026-02-21T09:07:28.2339992Z ld.shared.b16 %rs146, [%r174+128]; 2026-02-21T09:07:28.2340076Z ld.shared.b16 %rs147, [%r174+8]; 2026-02-21T09:07:28.2340231Z ld.shared.b16 %rs148, [%r174+136]; 2026-02-21T09:07:28.2340298Z cvt.f32.bf16 %r3151, %rs145; 2026-02-21T09:07:28.2340363Z cvt.f32.bf16 %r3152, %rs146; 2026-02-21T09:07:28.2340424Z cvt.f32.bf16 %r3153, %rs147; 2026-02-21T09:07:28.2340485Z cvt.f32.bf16 %r3154, %rs148; 2026-02-21T09:07:28.2340688Z .loc 1 61 87 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:61:87 2026-02-21T09:07:28.2340757Z add.s64 %rd97, %rd105, 32768; 2026-02-21T09:07:28.2340818Z // begin inline asm 2026-02-21T09:07:28.2340878Z mov.u16 %rs119, 0x0; 2026-02-21T09:07:28.2340958Z ld.global.b16 { %rs119 }, [ %rd97 + 0 ]; 2026-02-21T09:07:28.2341017Z // end inline asm 2026-02-21T09:07:28.2341079Z shr.u16 %rs149, %rs119, 8; 2026-02-21T09:07:28.2341426Z .loc 1 69 28 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:69:28 2026-02-21T09:07:28.2341489Z bar.sync 0; 2026-02-21T09:07:28.2341556Z st.shared.b8 [%r175], %rs119; 2026-02-21T09:07:28.2341626Z st.shared.b8 [%r176+512], %rs149; 2026-02-21T09:07:28.2341686Z bar.sync 0; 2026-02-21T09:07:28.2341753Z ld.shared.b32 %r3235, [%r177]; 2026-02-21T09:07:28.2341822Z prmt.b32 %r3236, %r3235, 0, 0x7770U; 2026-02-21T09:07:28.2341887Z cvt.u16.u32 %rs150, %r3236; 2026-02-21T09:07:28.2341959Z prmt.b32 %r3237, %r3235, 0, 0x7771U; 2026-02-21T09:07:28.2342032Z cvt.u16.u32 %rs151, %r3237; 2026-02-21T09:07:28.2342100Z prmt.b32 %r3238, %r3235, 0, 0x7772U; 
2026-02-21T09:07:28.2342168Z cvt.u16.u32 %rs152, %r3238; 2026-02-21T09:07:28.2342234Z prmt.b32 %r3239, %r3235, 0, 0x7773U; 2026-02-21T09:07:28.2342299Z cvt.u16.u32 %rs153, %r3239; 2026-02-21T09:07:28.2342507Z .loc 1 64 28 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:64:28 2026-02-21T09:07:28.2342640Z shl.b16 %rs154, %rs150, 4; 2026-02-21T09:07:28.2342705Z shl.b16 %rs155, %rs151, 4; 2026-02-21T09:07:28.2342766Z shl.b16 %rs156, %rs152, 4; 2026-02-21T09:07:28.2342831Z shl.b16 %rs157, %rs153, 4; 2026-02-21T09:07:28.2343037Z .loc 1 79 58 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:79:58 2026-02-21T09:07:28.2343111Z selp.b16 %rs158, %rs154, %rs150, %p53; 2026-02-21T09:07:28.2343176Z cvt.s16.s8 %rs159, %rs158; 2026-02-21T09:07:28.2343237Z shr.s16 %rs160, %rs159, 4; 2026-02-21T09:07:28.2343307Z selp.b16 %rs161, %rs155, %rs151, %p53; 2026-02-21T09:07:28.2343371Z cvt.s16.s8 %rs162, %rs161; 2026-02-21T09:07:28.2343436Z shr.s16 %rs163, %rs162, 4; 2026-02-21T09:07:28.2343504Z selp.b16 %rs164, %rs156, %rs152, %p53; 2026-02-21T09:07:28.2343565Z cvt.s16.s8 %rs165, %rs164; 2026-02-21T09:07:28.2343628Z shr.s16 %rs166, %rs165, 4; 2026-02-21T09:07:28.2343697Z selp.b16 %rs167, %rs157, %rs153, %p53; 2026-02-21T09:07:28.2343763Z cvt.s16.s8 %rs168, %rs167; 2026-02-21T09:07:28.2343835Z shr.s16 %rs169, %rs168, 4; 2026-02-21T09:07:28.2344045Z .loc 1 84 32 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:84:32 2026-02-21T09:07:28.2344114Z cvt.rn.f32.s16 %r3240, %rs160; 2026-02-21T09:07:28.2344182Z cvt.rn.f32.s16 %r3241, %rs163; 2026-02-21T09:07:28.2344252Z cvt.rn.f32.s16 %r3242, %rs166; 2026-02-21T09:07:28.2344315Z cvt.rn.f32.s16 %r3243, %rs169; 2026-02-21T09:07:28.2344370Z bar.sync 0; 2026-02-21T09:07:28.2344443Z st.shared.b32 [%r178], %r3240; 2026-02-21T09:07:28.2344508Z st.shared.b32 [%r178+8], %r3241; 2026-02-21T09:07:28.2344571Z st.shared.b32 [%r179], %r3242; 2026-02-21T09:07:28.2344636Z st.shared.b32 [%r179+8], %r3243; 2026-02-21T09:07:28.2344694Z 
$L__tmp11: 2026-02-21T09:07:28.2344971Z .loc 2 291 36 // standard.py:291:36 @[ cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:91:40 ] 2026-02-21T09:07:28.2345162Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2761, %r2793, %r2825, %r2857}; 2026-02-21T09:07:28.2345228Z bar.sync 0; 2026-02-21T09:07:28.2345289Z // begin inline asm 2026-02-21T09:07:28.2345424Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3353}, [%r3024]; 2026-02-21T09:07:28.2345556Z // end inline asm 2026-02-21T09:07:28.2345612Z bar.sync 0; 2026-02-21T09:07:28.2345796Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2763, %r2795, %r2827, %r2859}; 2026-02-21T09:07:28.2345851Z bar.sync 0; 2026-02-21T09:07:28.2345914Z // begin inline asm 2026-02-21T09:07:28.2346044Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3355}, [%r3024]; 2026-02-21T09:07:28.2346101Z // end inline asm 2026-02-21T09:07:28.2346162Z bar.sync 0; 2026-02-21T09:07:28.2346343Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2762, %r2794, %r2826, %r2858}; 2026-02-21T09:07:28.2346399Z bar.sync 0; 2026-02-21T09:07:28.2346608Z // begin inline asm 2026-02-21T09:07:28.2346745Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3354}, [%r3024]; 2026-02-21T09:07:28.2346959Z // end inline asm 2026-02-21T09:07:28.2347021Z bar.sync 0; 2026-02-21T09:07:28.2347205Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2764, %r2796, %r2828, %r2860}; 2026-02-21T09:07:28.2347262Z bar.sync 0; 2026-02-21T09:07:28.2347322Z // begin inline asm 2026-02-21T09:07:28.2347453Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3356}, [%r3024]; 2026-02-21T09:07:28.2347509Z // end inline asm 2026-02-21T09:07:28.2347563Z bar.sync 0; 2026-02-21T09:07:28.2347740Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2765, %r2797, %r2829, %r2861}; 2026-02-21T09:07:28.2347800Z bar.sync 0; 2026-02-21T09:07:28.2347858Z // begin inline asm 2026-02-21T09:07:28.2347986Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3357}, [%r3024]; 2026-02-21T09:07:28.2348046Z // 
end inline asm 2026-02-21T09:07:28.2348101Z bar.sync 0; 2026-02-21T09:07:28.2348277Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2767, %r2799, %r2831, %r2863}; 2026-02-21T09:07:28.2348407Z bar.sync 0; 2026-02-21T09:07:28.2348472Z // begin inline asm 2026-02-21T09:07:28.2348673Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3359}, [%r3024]; 2026-02-21T09:07:28.2348734Z // end inline asm 2026-02-21T09:07:28.2348794Z bar.sync 0; 2026-02-21T09:07:28.2348973Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2766, %r2798, %r2830, %r2862}; 2026-02-21T09:07:28.2349027Z bar.sync 0; 2026-02-21T09:07:28.2349089Z // begin inline asm 2026-02-21T09:07:28.2349214Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3358}, [%r3024]; 2026-02-21T09:07:28.2349274Z // end inline asm 2026-02-21T09:07:28.2349329Z bar.sync 0; 2026-02-21T09:07:28.2349510Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2768, %r2800, %r2832, %r2864}; 2026-02-21T09:07:28.2349574Z bar.sync 0; 2026-02-21T09:07:28.2349635Z // begin inline asm 2026-02-21T09:07:28.2349776Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3360}, [%r3024]; 2026-02-21T09:07:28.2349839Z // end inline asm 2026-02-21T09:07:28.2349895Z bar.sync 0; 2026-02-21T09:07:28.2350080Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2769, %r2801, %r2833, %r2865}; 2026-02-21T09:07:28.2350140Z bar.sync 0; 2026-02-21T09:07:28.2350200Z // begin inline asm 2026-02-21T09:07:28.2350330Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3361}, [%r3024]; 2026-02-21T09:07:28.2350391Z // end inline asm 2026-02-21T09:07:28.2350445Z bar.sync 0; 2026-02-21T09:07:28.2350624Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2771, %r2803, %r2835, %r2867}; 2026-02-21T09:07:28.2350683Z bar.sync 0; 2026-02-21T09:07:28.2350743Z // begin inline asm 2026-02-21T09:07:28.2350874Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3363}, [%r3024]; 2026-02-21T09:07:28.2350931Z // end inline asm 2026-02-21T09:07:28.2350991Z bar.sync 0; 2026-02-21T09:07:28.2351169Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2770, %r2802, %r2834, %r2866}; 2026-02-21T09:07:28.2351223Z bar.sync 0; 2026-02-21T09:07:28.2351285Z // begin inline asm 2026-02-21T09:07:28.2351416Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3362}, [%r3024]; 2026-02-21T09:07:28.2351475Z // end inline asm 2026-02-21T09:07:28.2351530Z bar.sync 0; 2026-02-21T09:07:28.2351709Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2772, %r2804, %r2836, %r2868}; 2026-02-21T09:07:28.2351857Z bar.sync 0; 2026-02-21T09:07:28.2351918Z // begin inline asm 2026-02-21T09:07:28.2352052Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3364}, [%r3024]; 2026-02-21T09:07:28.2352107Z // end inline asm 2026-02-21T09:07:28.2352162Z bar.sync 0; 2026-02-21T09:07:28.2352343Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2773, %r2805, %r2837, %r2869}; 2026-02-21T09:07:28.2352398Z bar.sync 0; 2026-02-21T09:07:28.2352458Z // begin inline asm 2026-02-21T09:07:28.2352586Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3365}, [%r3024]; 2026-02-21T09:07:28.2352646Z // end inline asm 2026-02-21T09:07:28.2352703Z bar.sync 0; 2026-02-21T09:07:28.2352882Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2775, %r2807, %r2839, %r2871}; 2026-02-21T09:07:28.2353039Z bar.sync 0; 2026-02-21T09:07:28.2353102Z // begin inline asm 2026-02-21T09:07:28.2353229Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3367}, [%r3024]; 2026-02-21T09:07:28.2353285Z // end inline asm 2026-02-21T09:07:28.2353348Z bar.sync 0; 2026-02-21T09:07:28.2353525Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2774, %r2806, %r2838, %r2870}; 2026-02-21T09:07:28.2353579Z bar.sync 0; 2026-02-21T09:07:28.2353641Z // begin inline asm 2026-02-21T09:07:28.2353772Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3366}, [%r3024]; 2026-02-21T09:07:28.2353828Z // end inline asm 2026-02-21T09:07:28.2353886Z bar.sync 0; 2026-02-21T09:07:28.2354063Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2776, %r2808, %r2840, %r2872}; 
2026-02-21T09:07:28.2354118Z bar.sync 0; 2026-02-21T09:07:28.2354177Z // begin inline asm 2026-02-21T09:07:28.2354310Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3368}, [%r3024]; 2026-02-21T09:07:28.2354368Z // end inline asm 2026-02-21T09:07:28.2354426Z bar.sync 0; 2026-02-21T09:07:28.2354658Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2777, %r2809, %r2841, %r2873}; 2026-02-21T09:07:28.2354715Z bar.sync 0; 2026-02-21T09:07:28.2354788Z // begin inline asm 2026-02-21T09:07:28.2354919Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3369}, [%r3024]; 2026-02-21T09:07:28.2354979Z // end inline asm 2026-02-21T09:07:28.2355034Z bar.sync 0; 2026-02-21T09:07:28.2355209Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2779, %r2811, %r2843, %r2875}; 2026-02-21T09:07:28.2355267Z bar.sync 0; 2026-02-21T09:07:28.2355326Z // begin inline asm 2026-02-21T09:07:28.2355452Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3371}, [%r3024]; 2026-02-21T09:07:28.2355512Z // end inline asm 2026-02-21T09:07:28.2355566Z bar.sync 0; 2026-02-21T09:07:28.2355741Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2778, %r2810, %r2842, %r2874}; 2026-02-21T09:07:28.2355798Z bar.sync 0; 2026-02-21T09:07:28.2355863Z // begin inline asm 2026-02-21T09:07:28.2355993Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3370}, [%r3024]; 2026-02-21T09:07:28.2356050Z // end inline asm 2026-02-21T09:07:28.2356110Z bar.sync 0; 2026-02-21T09:07:28.2356286Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2780, %r2812, %r2844, %r2876}; 2026-02-21T09:07:28.2356354Z bar.sync 0; 2026-02-21T09:07:28.2356413Z // begin inline asm 2026-02-21T09:07:28.2356664Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3372}, [%r3024]; 2026-02-21T09:07:28.2356723Z // end inline asm 2026-02-21T09:07:28.2356777Z bar.sync 0; 2026-02-21T09:07:28.2356961Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2781, %r2813, %r2845, %r2877}; 2026-02-21T09:07:28.2357022Z bar.sync 0; 2026-02-21T09:07:28.2357082Z // begin inline asm 
2026-02-21T09:07:28.2357211Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3373}, [%r3024]; 2026-02-21T09:07:28.2357271Z // end inline asm 2026-02-21T09:07:28.2357326Z bar.sync 0; 2026-02-21T09:07:28.2357507Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2783, %r2815, %r2847, %r2879}; 2026-02-21T09:07:28.2357570Z bar.sync 0; 2026-02-21T09:07:28.2357630Z // begin inline asm 2026-02-21T09:07:28.2357757Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3375}, [%r3024]; 2026-02-21T09:07:28.2357909Z // end inline asm 2026-02-21T09:07:28.2357967Z bar.sync 0; 2026-02-21T09:07:28.2358149Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2782, %r2814, %r2846, %r2878}; 2026-02-21T09:07:28.2358205Z bar.sync 0; 2026-02-21T09:07:28.2358268Z // begin inline asm 2026-02-21T09:07:28.2358397Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3374}, [%r3024]; 2026-02-21T09:07:28.2358454Z // end inline asm 2026-02-21T09:07:28.2358511Z bar.sync 0; 2026-02-21T09:07:28.2358688Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2784, %r2816, %r2848, %r2880}; 2026-02-21T09:07:28.2358744Z bar.sync 0; 2026-02-21T09:07:28.2358804Z // begin inline asm 2026-02-21T09:07:28.2358935Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3376}, [%r3024]; 2026-02-21T09:07:28.2359137Z // end inline asm 2026-02-21T09:07:28.2359196Z bar.sync 0; 2026-02-21T09:07:28.2359379Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2785, %r2817, %r2849, %r2881}; 2026-02-21T09:07:28.2359448Z bar.sync 0; 2026-02-21T09:07:28.2359511Z // begin inline asm 2026-02-21T09:07:28.2359648Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3377}, [%r3024]; 2026-02-21T09:07:28.2359712Z // end inline asm 2026-02-21T09:07:28.2359772Z bar.sync 0; 2026-02-21T09:07:28.2359953Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2787, %r2819, %r2851, %r2883}; 2026-02-21T09:07:28.2360013Z bar.sync 0; 2026-02-21T09:07:28.2360073Z // begin inline asm 2026-02-21T09:07:28.2360199Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3379}, [%r3024]; 
2026-02-21T09:07:28.2360261Z // end inline asm 2026-02-21T09:07:28.2360317Z bar.sync 0; 2026-02-21T09:07:28.2360493Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2786, %r2818, %r2850, %r2882}; 2026-02-21T09:07:28.2360552Z bar.sync 0; 2026-02-21T09:07:28.2360617Z // begin inline asm 2026-02-21T09:07:28.2360809Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3378}, [%r3024]; 2026-02-21T09:07:28.2360868Z // end inline asm 2026-02-21T09:07:28.2360927Z bar.sync 0; 2026-02-21T09:07:28.2361106Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2788, %r2820, %r2852, %r2884}; 2026-02-21T09:07:28.2361162Z bar.sync 0; 2026-02-21T09:07:28.2361225Z // begin inline asm 2026-02-21T09:07:28.2361365Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3380}, [%r3024]; 2026-02-21T09:07:28.2361423Z // end inline asm 2026-02-21T09:07:28.2361479Z bar.sync 0; 2026-02-21T09:07:28.2361663Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2789, %r2821, %r2853, %r2885}; 2026-02-21T09:07:28.2361719Z bar.sync 0; 2026-02-21T09:07:28.2361778Z // begin inline asm 2026-02-21T09:07:28.2361910Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3381}, [%r3024]; 2026-02-21T09:07:28.2361967Z // end inline asm 2026-02-21T09:07:28.2362027Z bar.sync 0; 2026-02-21T09:07:28.2362211Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2791, %r2823, %r2855, %r2887}; 2026-02-21T09:07:28.2362272Z bar.sync 0; 2026-02-21T09:07:28.2362330Z // begin inline asm 2026-02-21T09:07:28.2362459Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3383}, [%r3024]; 2026-02-21T09:07:28.2362522Z // end inline asm 2026-02-21T09:07:28.2362592Z bar.sync 0; 2026-02-21T09:07:28.2362773Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2790, %r2822, %r2854, %r2886}; 2026-02-21T09:07:28.2362832Z bar.sync 0; 2026-02-21T09:07:28.2362892Z // begin inline asm 2026-02-21T09:07:28.2363020Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3382}, [%r3024]; 2026-02-21T09:07:28.2363076Z // end inline asm 2026-02-21T09:07:28.2363137Z bar.sync 0; 
2026-02-21T09:07:28.2363316Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2333], {%r2792, %r2824, %r2856, %r2888}; 2026-02-21T09:07:28.2363373Z bar.sync 0; 2026-02-21T09:07:28.2363435Z // begin inline asm 2026-02-21T09:07:28.2363565Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3384}, [%r3024]; 2026-02-21T09:07:28.2363623Z // end inline asm 2026-02-21T09:07:28.2363682Z // begin inline asm 2026-02-21T09:07:28.2363762Z fence.proxy.async.shared::cta; 2026-02-21T09:07:28.2363882Z // end inline asm 2026-02-21T09:07:28.2363957Z wgmma.fence.sync.aligned; 2026-02-21T09:07:28.2364025Z shl.b32 %r3244, %r3234, 9; 2026-02-21T09:07:28.2364098Z and.b32 %r3245, %r3244, 6144; 2026-02-21T09:07:28.2364166Z add.s32 %r3246, %r3245, %r3020; 2026-02-21T09:07:28.2364228Z bfe.u32 %r3247, %r3246, 4, 14; 2026-02-21T09:07:28.2364294Z cvt.u64.u32 %rd99, %r3247; 2026-02-21T09:07:28.2364375Z or.b64 %rd98, %rd99, -4611685949674356736; 2026-02-21T09:07:28.2364433Z // begin inline asm 2026-02-21T09:07:28.2365247Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3353,%r3354,%r3355,%r3356,%r3357,%r3358,%r3359,%r3360,%r3361,%r3362,%r3363,%r3364,%r3365,%r3366,%r3367,%r3368,%r3369,%r3370,%r3371,%r3372,%r3373,%r3374,%r3375,%r3376,%r3377,%r3378,%r3379,%r3380,%r3381,%r3382,%r3383,%r3384}, {%r3151,%r3152,%r3153,%r3154}, %rd98, %p42, 1, 1; 2026-02-21T09:07:28.2365350Z // end inline asm 2026-02-21T09:07:28.2365426Z wgmma.commit_group.sync.aligned; 2026-02-21T09:07:28.2365490Z mov.b32 %r3187, %r3020; 2026-02-21T09:07:28.2365561Z mov.b32 %r3188, %r3189; 2026-02-21T09:07:28.2365624Z // begin inline asm 2026-02-21T09:07:28.2366187Z // wait for regs: %r3353,%r3354,%r3355,%r3356,%r3357,%r3358,%r3359,%r3360,%r3361,%r3362,%r3363,%r3364,%r3365,%r3366,%r3367,%r3368,%r3369,%r3370,%r3371,%r3372,%r3373,%r3374,%r3375,%r3376,%r3377,%r3378,%r3379,%r3380,%r3381,%r3382,%r3383,%r3384,%r3187,%r3188,%r3189 2026-02-21T09:07:28.2366262Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:07:28.2366319Z // end inline asm 
2026-02-21T09:07:28.2366376Z $L__tmp12: 2026-02-21T09:07:28.2366727Z .loc 1 47 110 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:47:110 2026-02-21T09:07:28.2366794Z add.s64 %rd107, %rd107, 8; 2026-02-21T09:07:28.2366857Z add.s64 %rd106, %rd106, 32; 2026-02-21T09:07:28.2367010Z add.s64 %rd105, %rd105, 65536; 2026-02-21T09:07:28.2367083Z setp.lt.u64 %p48, %rd107, 504; 2026-02-21T09:07:28.2367144Z @%p48 bra $L__BB0_10; 2026-02-21T09:07:28.2367258Z // %bb.11: // in Loop: Header=BB0_9 Depth=1 2026-02-21T09:07:28.2367469Z .loc 1 94 28 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:94:28 2026-02-21T09:07:28.2367549Z cvt.rn.bf16x2.f32 %r3251, %r3354, %r3353; 2026-02-21T09:07:28.2367623Z cvt.rn.bf16x2.f32 %r3252, %r3356, %r3355; 2026-02-21T09:07:28.2367696Z cvt.rn.bf16x2.f32 %r3253, %r3358, %r3357; 2026-02-21T09:07:28.2367766Z cvt.rn.bf16x2.f32 %r3254, %r3360, %r3359; 2026-02-21T09:07:28.2367836Z cvt.rn.bf16x2.f32 %r3255, %r3362, %r3361; 2026-02-21T09:07:28.2367910Z cvt.rn.bf16x2.f32 %r3256, %r3364, %r3363; 2026-02-21T09:07:28.2367980Z cvt.rn.bf16x2.f32 %r3257, %r3366, %r3365; 2026-02-21T09:07:28.2368050Z cvt.rn.bf16x2.f32 %r3258, %r3368, %r3367; 2026-02-21T09:07:28.2368131Z cvt.rn.bf16x2.f32 %r3259, %r3370, %r3369; 2026-02-21T09:07:28.2368204Z cvt.rn.bf16x2.f32 %r3260, %r3372, %r3371; 2026-02-21T09:07:28.2368277Z cvt.rn.bf16x2.f32 %r3261, %r3374, %r3373; 2026-02-21T09:07:28.2368350Z cvt.rn.bf16x2.f32 %r3262, %r3376, %r3375; 2026-02-21T09:07:28.2368437Z cvt.rn.bf16x2.f32 %r3263, %r3378, %r3377; 2026-02-21T09:07:28.2368511Z cvt.rn.bf16x2.f32 %r3264, %r3380, %r3379; 2026-02-21T09:07:28.2368582Z cvt.rn.bf16x2.f32 %r3265, %r3382, %r3381; 2026-02-21T09:07:28.2368657Z cvt.rn.bf16x2.f32 %r3266, %r3384, %r3383; 2026-02-21T09:07:28.2368868Z .loc 1 95 43 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:95:43 2026-02-21T09:07:28.2368929Z bar.sync 0; 2026-02-21T09:07:28.2369122Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r182], {%r3251, %r3252, 
%r3253, %r3254}; 2026-02-21T09:07:28.2369305Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r183], {%r3255, %r3256, %r3257, %r3258}; 2026-02-21T09:07:28.2369486Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r184], {%r3259, %r3260, %r3261, %r3262}; 2026-02-21T09:07:28.2369668Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r185], {%r3263, %r3264, %r3265, %r3266}; 2026-02-21T09:07:28.2369735Z // begin inline asm 2026-02-21T09:07:28.2369901Z fence.proxy.async.shared::cta; 2026-02-21T09:07:28.2369959Z // end inline asm 2026-02-21T09:07:28.2370020Z bar.sync 0; 2026-02-21T09:07:28.2370091Z elect.sync %r3267|%p51, -1; 2026-02-21T09:07:28.2370169Z shfl.sync.idx.b32 %r3268, %r5, 0, 31, -1; 2026-02-21T09:07:28.2370247Z and.pred %p49, %p54, %p51; 2026-02-21T09:07:28.2370309Z and.b32 %r3269, %r3268, 3; 2026-02-21T09:07:28.2370372Z shl.b32 %r3270, %r3269, 13; 2026-02-21T09:07:28.2370438Z add.s32 %r3250, %r3020, %r3270; 2026-02-21T09:07:28.2370503Z shl.b32 %r3272, %r3269, 6; 2026-02-21T09:07:28.2370565Z or.b32 %r3248, %r3272, %r189; 2026-02-21T09:07:28.2370624Z // begin inline asm 2026-02-21T09:07:28.2370853Z @%p49 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd69, {%r3248, %r188}], [%r3250]; 2026-02-21T09:07:28.2371051Z // end inline asm 2026-02-21T09:07:28.2371129Z cp.async.bulk.commit_group; 2026-02-21T09:07:28.2371207Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:07:28.2371269Z bar.sync 0; 2026-02-21T09:07:28.2371479Z .loc 1 26 97 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:26:97 2026-02-21T09:07:28.2371541Z add.s32 %r254, %r3352, 528; 2026-02-21T09:07:28.2371611Z setp.lt.s32 %p52, %r3352, 1520; 2026-02-21T09:07:28.2371673Z mov.b32 %r3352, %r254; 2026-02-21T09:07:28.2371733Z @%p52 bra $L__BB0_9; 2026-02-21T09:07:28.2371832Z $L__BB0_12: // %._crit_edge49 2026-02-21T09:07:28.2372032Z .loc 1 26 4 // cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py:26:4 2026-02-21T09:07:28.2372087Z ret; 2026-02-21T09:07:28.2372141Z $L__tmp13: 
2026-02-21T09:07:28.2372204Z $L__func_end0: 2026-02-21T09:07:28.2372289Z // -- End function 2026-02-21T09:07:28.2372348Z } 2026-02-21T09:07:28.2372657Z .file 1 "/tmp/torchinductor_root/ut/cutoohackoopo5cxm7kpb26c2qgsdzao4pxjebwl4pb2p3lmbwce.py" 2026-02-21T09:07:28.2372874Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T09:07:28.2372954Z .section .debug_abbrev 2026-02-21T09:07:28.2373015Z { 2026-02-21T09:07:28.2373115Z .b8 1 // Abbreviation Code 2026-02-21T09:07:28.2373210Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:07:28.2373296Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:07:28.2373387Z .b8 37 // DW_AT_producer 2026-02-21T09:07:28.2373467Z .b8 8 // DW_FORM_string 2026-02-21T09:07:28.2373548Z .b8 19 // DW_AT_language 2026-02-21T09:07:28.2373636Z .b8 5 // DW_FORM_data2 2026-02-21T09:07:28.2373720Z .b8 3 // DW_AT_name 2026-02-21T09:07:28.2373804Z .b8 8 // DW_FORM_string 2026-02-21T09:07:28.2373893Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:07:28.2373990Z .b8 6 // DW_FORM_data4 2026-02-21T09:07:28.2374076Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:07:28.2374155Z .b8 8 // DW_FORM_string 2026-02-21T09:07:28.2374235Z .b8 0 // EOM(1) 2026-02-21T09:07:28.2374306Z .b8 0 // EOM(2) 2026-02-21T09:07:28.2374394Z .b8 2 // Abbreviation Code 2026-02-21T09:07:28.2374485Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:07:28.2374564Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:07:28.2374641Z .b8 3 // DW_AT_name 2026-02-21T09:07:28.2374725Z .b8 8 // DW_FORM_string 2026-02-21T09:07:28.2374811Z .b8 32 // DW_AT_inline 2026-02-21T09:07:28.2374891Z .b8 11 // DW_FORM_data1 2026-02-21T09:07:28.2375032Z .b8 0 // EOM(1) 2026-02-21T09:07:28.2375106Z .b8 0 // EOM(2) 2026-02-21T09:07:28.2375193Z .b8 3 // Abbreviation Code 2026-02-21T09:07:28.2375278Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:07:28.2375366Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:07:28.2375447Z .b8 17 // DW_AT_low_pc 2026-02-21T09:07:28.2375527Z .b8 1 // DW_FORM_addr 2026-02-21T09:07:28.2375612Z 
.b8 18 // DW_AT_high_pc 2026-02-21T09:07:28.2375689Z .b8 1 // DW_FORM_addr 2026-02-21T09:07:28.2375835Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:07:28.2375970Z .b8 19 // DW_FORM_ref4 2026-02-21T09:07:28.2376051Z .b8 0 // EOM(1) 2026-02-21T09:07:28.2376122Z .b8 0 // EOM(2) 2026-02-21T09:07:28.2376208Z .b8 4 // Abbreviation Code 2026-02-21T09:07:28.2376314Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T09:07:28.2376395Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:07:28.2376615Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:07:28.2376703Z .b8 19 // DW_FORM_ref4 2026-02-21T09:07:28.2376782Z .b8 17 // DW_AT_low_pc 2026-02-21T09:07:28.2376858Z .b8 1 // DW_FORM_addr 2026-02-21T09:07:28.2376943Z .b8 18 // DW_AT_high_pc 2026-02-21T09:07:28.2377035Z .b8 1 // DW_FORM_addr 2026-02-21T09:07:28.2377195Z .b8 88 // DW_AT_call_file 2026-02-21T09:07:28.2377277Z .b8 11 // DW_FORM_data1 2026-02-21T09:07:28.2377365Z .b8 89 // DW_AT_call_line 2026-02-21T09:07:28.2377447Z .b8 11 // DW_FORM_data1 2026-02-21T09:07:28.2377532Z .b8 87 // DW_AT_call_column 2026-02-21T09:07:28.2377626Z .b8 11 // DW_FORM_data1 2026-02-21T09:07:28.2377698Z .b8 0 // EOM(1) 2026-02-21T09:07:28.2377768Z .b8 0 // EOM(2) 2026-02-21T09:07:28.2377840Z .b8 0 // EOM(3) 2026-02-21T09:07:28.2377892Z } 2026-02-21T09:07:28.2377956Z .section .debug_info 2026-02-21T09:07:28.2378007Z { 2026-02-21T09:07:28.2378099Z .b32 178 // Length of Unit 2026-02-21T09:07:28.2378199Z .b8 2 // DWARF version number 2026-02-21T09:07:28.2378252Z .b8 0 2026-02-21T09:07:28.2378388Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:07:28.2378486Z .b8 8 // Address Size (in bytes) 2026-02-21T09:07:28.2378601Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T09:07:28.2378702Z .b8 116 // DW_AT_producer 2026-02-21T09:07:28.2378758Z .b8 114 2026-02-21T09:07:28.2378811Z .b8 105 2026-02-21T09:07:28.2378862Z .b8 116 2026-02-21T09:07:28.2378920Z .b8 111 2026-02-21T09:07:28.2378974Z .b8 110 2026-02-21T09:07:28.2379026Z .b8 0 2026-02-21T09:07:28.2379110Z .b8 2 // DW_AT_language 2026-02-21T09:07:28.2379163Z .b8 0 2026-02-21T09:07:28.2379241Z .b8 99 // DW_AT_name 2026-02-21T09:07:28.2379295Z .b8 117 2026-02-21T09:07:28.2379353Z .b8 116 2026-02-21T09:07:28.2379405Z .b8 111 2026-02-21T09:07:28.2379458Z .b8 111 2026-02-21T09:07:28.2379516Z .b8 104 2026-02-21T09:07:28.2379569Z .b8 97 2026-02-21T09:07:28.2379621Z .b8 99 2026-02-21T09:07:28.2379673Z .b8 107 2026-02-21T09:07:28.2379811Z .b8 111 2026-02-21T09:07:28.2379874Z .b8 111 2026-02-21T09:07:28.2379928Z .b8 112 2026-02-21T09:07:28.2379982Z .b8 111 2026-02-21T09:07:28.2380037Z .b8 53 2026-02-21T09:07:28.2380088Z .b8 99 2026-02-21T09:07:28.2380143Z .b8 120 2026-02-21T09:07:28.2380198Z .b8 109 2026-02-21T09:07:28.2380249Z .b8 55 2026-02-21T09:07:28.2380302Z .b8 107 2026-02-21T09:07:28.2380355Z .b8 112 2026-02-21T09:07:28.2380409Z .b8 98 2026-02-21T09:07:28.2380463Z .b8 50 2026-02-21T09:07:28.2380513Z .b8 54 2026-02-21T09:07:28.2380567Z .b8 99 2026-02-21T09:07:28.2380618Z .b8 50 2026-02-21T09:07:28.2380671Z .b8 113 2026-02-21T09:07:28.2380723Z .b8 103 2026-02-21T09:07:28.2380781Z .b8 115 2026-02-21T09:07:28.2380835Z .b8 100 2026-02-21T09:07:28.2380888Z .b8 122 2026-02-21T09:07:28.2381042Z .b8 97 2026-02-21T09:07:28.2381161Z .b8 111 2026-02-21T09:07:28.2381214Z .b8 52 2026-02-21T09:07:28.2381269Z .b8 112 2026-02-21T09:07:28.2381325Z .b8 120 2026-02-21T09:07:28.2381379Z .b8 106 2026-02-21T09:07:28.2381431Z .b8 101 2026-02-21T09:07:28.2381484Z .b8 98 2026-02-21T09:07:28.2381540Z .b8 119 2026-02-21T09:07:28.2381595Z .b8 108 
2026-02-21T09:07:28.2381646Z .b8 52 2026-02-21T09:07:28.2381703Z .b8 112 2026-02-21T09:07:28.2381756Z .b8 98 2026-02-21T09:07:28.2381807Z .b8 50 2026-02-21T09:07:28.2381861Z .b8 112 2026-02-21T09:07:28.2381918Z .b8 51 2026-02-21T09:07:28.2381969Z .b8 108 2026-02-21T09:07:28.2382022Z .b8 109 2026-02-21T09:07:28.2382088Z .b8 98 2026-02-21T09:07:28.2382141Z .b8 119 2026-02-21T09:07:28.2382193Z .b8 99 2026-02-21T09:07:28.2382250Z .b8 101 2026-02-21T09:07:28.2382304Z .b8 46 2026-02-21T09:07:28.2382356Z .b8 112 2026-02-21T09:07:28.2382407Z .b8 121 2026-02-21T09:07:28.2382457Z .b8 0 2026-02-21T09:07:28.2382566Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:07:28.2382654Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:07:28.2382762Z .b8 116 2026-02-21T09:07:28.2382821Z .b8 109 2026-02-21T09:07:28.2382877Z .b8 112 2026-02-21T09:07:28.2382928Z .b8 47 2026-02-21T09:07:28.2382994Z .b8 116 2026-02-21T09:07:28.2383051Z .b8 111 2026-02-21T09:07:28.2383103Z .b8 114 2026-02-21T09:07:28.2383153Z .b8 99 2026-02-21T09:07:28.2383209Z .b8 104 2026-02-21T09:07:28.2383267Z .b8 105 2026-02-21T09:07:28.2383320Z .b8 110 2026-02-21T09:07:28.2383372Z .b8 100 2026-02-21T09:07:28.2383427Z .b8 117 2026-02-21T09:07:28.2383478Z .b8 99 2026-02-21T09:07:28.2383530Z .b8 116 2026-02-21T09:07:28.2383584Z .b8 111 2026-02-21T09:07:28.2383636Z .b8 114 2026-02-21T09:07:28.2383687Z .b8 95 2026-02-21T09:07:28.2383740Z .b8 114 2026-02-21T09:07:28.2383794Z .b8 111 2026-02-21T09:07:28.2383847Z .b8 111 2026-02-21T09:07:28.2383898Z .b8 116 2026-02-21T09:07:28.2383950Z .b8 47 2026-02-21T09:07:28.2384010Z .b8 117 2026-02-21T09:07:28.2384061Z .b8 116 2026-02-21T09:07:28.2384115Z .b8 0 2026-02-21T09:07:28.2384234Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T09:07:28.2384312Z .b8 95 // DW_AT_name 2026-02-21T09:07:28.2384364Z .b8 104 2026-02-21T09:07:28.2384420Z .b8 101 2026-02-21T09:07:28.2384473Z .b8 108 2026-02-21T09:07:28.2384526Z .b8 105 2026-02-21T09:07:28.2384579Z .b8 111 2026-02-21T09:07:28.2384641Z 
.b8 110 2026-02-21T09:07:28.2384699Z .b8 95 2026-02-21T09:07:28.2384752Z .b8 109 2026-02-21T09:07:28.2384804Z .b8 97 2026-02-21T09:07:28.2384860Z .b8 116 2026-02-21T09:07:28.2384911Z .b8 109 2026-02-21T09:07:28.2384963Z .b8 117 2026-02-21T09:07:28.2385017Z .b8 108 2026-02-21T09:07:28.2385070Z .b8 95 2026-02-21T09:07:28.2385132Z .b8 98 2026-02-21T09:07:28.2385186Z .b8 102 2026-02-21T09:07:28.2385239Z .b8 49 2026-02-21T09:07:28.2385290Z .b8 54 2026-02-21T09:07:28.2385342Z .b8 95 2026-02-21T09:07:28.2385402Z .b8 105 2026-02-21T09:07:28.2385454Z .b8 110 2026-02-21T09:07:28.2385506Z .b8 116 2026-02-21T09:07:28.2385560Z .b8 52 2026-02-21T09:07:28.2385616Z .b8 0 2026-02-21T09:07:28.2385698Z .b8 1 // DW_AT_inline 2026-02-21T09:07:28.2385805Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T09:07:28.2385976Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T09:07:28.2386073Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T09:07:28.2386171Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:07:28.2386300Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T09:07:28.2386401Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:07:28.2386615Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T09:07:28.2386714Z .b64 $L__tmp12 // DW_AT_high_pc 2026-02-21T09:07:28.2386811Z .b8 1 // DW_AT_call_file 2026-02-21T09:07:28.2386975Z .b8 91 // DW_AT_call_line 2026-02-21T09:07:28.2387125Z .b8 40 // DW_AT_call_column 2026-02-21T09:07:28.2387224Z .b8 0 // End Of Children Mark 2026-02-21T09:07:28.2387320Z .b8 0 // End Of Children Mark 2026-02-21T09:07:28.2387373Z } 2026-02-21T09:07:28.2387445Z .section .debug_macinfo { } 2026-02-21T09:07:28.2387454Z 2026-02-21T09:07:28.2387534Z ================================================================ 2026-02-21T09:07:28.2387655Z please share the reproducer above with Triton project. 
2026-02-21T09:07:28.2389105Z [393s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 64, 256], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['', 'first'], loop_orders=[[0, 1]], num_sm_multiplier=4, num_stages=7, num_warps=4, pid_type='persistent_interleaved', range_flattens=[False, False], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[2, 2], range_warp_specializes=[]) 2026-02-21T09:07:28.2389185Z Tensor-likes are not close! 2026-02-21T09:07:28.2389191Z 2026-02-21T09:07:28.2389284Z Mismatched elements: 33485594 / 33554432 (99.8%) 2026-02-21T09:07:28.2389463Z Greatest absolute difference: 2384.0 at index (274, 6497) (up to 0.01 allowed) 2026-02-21T09:07:28.2389637Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:07:28.2389764Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:07:28.2389769Z 2026-02-21T09:07:30.0060732Z [395s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 64, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], num_stages=3, num_warps=4, pid_type='flat', range_flattens=[None, True], range_multi_buffers=[None, False], range_num_stages=[0, 3], range_unroll_factors=[0, 0], range_warp_specializes=[]) 2026-02-21T09:07:30.0062467Z Tensor-likes are not close! 2026-02-21T09:07:30.0062641Z 2026-02-21T09:07:30.0062765Z Mismatched elements: 33484956 / 33554432 (99.8%) 2026-02-21T09:07:30.0063209Z Greatest absolute difference: 2384.0 at index (274, 6497) (up to 0.01 allowed) 2026-02-21T09:07:30.0063746Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:07:30.0064202Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:07:30.0064459Z 2026-02-21T09:07:30.4489023Z [395s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 64, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], num_stages=4, num_warps=4, pid_type='flat', range_flattens=[None, True], range_multi_buffers=[None, False], range_num_stages=[0, 3], range_unroll_factors=[0, 0], range_warp_specializes=[]) 2026-02-21T09:07:30.4490866Z Tensor-likes are not close! 2026-02-21T09:07:30.4491046Z 2026-02-21T09:07:30.4491169Z Mismatched elements: 33484956 / 33554432 (99.8%) 2026-02-21T09:07:30.4491620Z Greatest absolute difference: 2384.0 at index (274, 6497) (up to 0.01 allowed) 2026-02-21T09:07:30.4492449Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:07:30.4492943Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:07:30.4493212Z 2026-02-21T09:07:33.2939026Z 2026-02-21T09:07:33.2939988Z Generation 5: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 99/99 15.5 configs/s 2026-02-21T09:07:41.6795372Z Generation 5: verifying top configs 100% ━━━━━━━━━━━━━━━━ 569/569 67.4 configs/s 2026-02-21T09:07:41.9263460Z [407s] Generation 5 complete: 2026-02-21T09:07:41.9263735Z error=33 2026-02-21T09:07:41.9263911Z ok=70 2026-02-21T09:07:41.9264081Z min=0.3620 2026-02-21T09:07:41.9264263Z mid=0.5617 2026-02-21T09:07:41.9264427Z max=32.4317 2026-02-21T09:07:41.9264619Z best={'block_sizes': [32, 128, 128], 2026-02-21T09:07:41.9265598Z 'indexing': ['pointer', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:07:41.9266002Z 'l2_groupings': [2], 2026-02-21T09:07:41.9266257Z 'load_eviction_policies': ['last', 'first'], 2026-02-21T09:07:41.9266934Z 'loop_orders': [[0, 1]], 2026-02-21T09:07:41.9267159Z 'num_stages': 3, 2026-02-21T09:07:41.9267351Z 'num_warps': 4, 2026-02-21T09:07:41.9267567Z 'pid_type': 'flat', 2026-02-21T09:07:41.9267786Z 
'range_flattens': [None, False], 2026-02-21T09:07:41.9274498Z 'range_multi_buffers': [None, False], 2026-02-21T09:07:41.9274847Z 'range_num_stages': [0, 4], 2026-02-21T09:07:41.9275109Z 'range_unroll_factors': [0, 3], 2026-02-21T09:07:41.9275367Z 'range_warp_specializes': []} 2026-02-21T09:07:41.9304999Z [407s] Fitting surrogate: 640 points, 640 targets 2026-02-21T09:07:44.0918983Z [409s] Generation 6 starting: 76 neighbors, 4 active search path(s) 2026-02-21T09:08:16.0173757Z Generation 6: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78/78 0.7 configs/s 2026-02-21T09:08:18.8915147Z 2026-02-21T09:08:18.8915213Z 2026-02-21T09:08:18.8916165Z ================================================================ 2026-02-21T09:08:18.8916765Z Internal Triton PTX codegen error 2026-02-21T09:08:18.8917045Z `ptxas` stderr: 2026-02-21T09:08:18.8917752Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 418 in function _helion_matmul_bf16_int4. Try to compile with register target of 38 or higher. 2026-02-21T09:08:18.8918576Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:08:18.8918810Z 2026-02-21T09:08:18.8919458Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpsp2gfe2y.ptx -o /tmp/tmpsp2gfe2y.ptx.o 2026-02-21T09:08:18.8920229Z 2026-02-21T09:08:18.8920233Z 2026-02-21T09:08:18.8920338Z // 2026-02-21T09:08:18.8920571Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:08:18.8920854Z // 2026-02-21T09:08:18.8920968Z 2026-02-21T09:08:18.8921055Z .version 8.7 2026-02-21T09:08:18.8921269Z .target sm_90a 2026-02-21T09:08:18.8921482Z .address_size 64 2026-02-21T09:08:18.8922166Z [444s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:08:18.8924127Z Config: @helion.kernel(config=helion.Config(block_sizes=[16, 64, 64], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[4], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=6, num_warps=16, pid_type='persistent_interleaved', range_flattens=[False, True], range_multi_buffers=[True, False], range_num_stages=[4, 1], range_unroll_factors=[3, 2], range_warp_specializes=[]), static_shapes=True) 2026-02-21T09:08:18.8926003Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:08:18.8926319Z `ptxas` stderr: 2026-02-21T09:08:18.8927088Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 418 in function _helion_matmul_bf16_int4. Try to compile with register target of 38 or higher. 2026-02-21T09:08:18.8927811Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:08:18.8928023Z 2026-02-21T09:08:18.8928585Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpsp2gfe2y.ptx -o /tmp/tmpsp2gfe2y.ptx.o 2026-02-21T09:08:18.8929398Z 2026-02-21T09:08:18.8929564Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:08:18.8929829Z 2026-02-21T09:08:18.8930022Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T09:08:18.8930435Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:08:18.8930708Z // @_helion_matmul_bf16_int4 2026-02-21T09:08:18.8930999Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T09:08:18.8931319Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T09:08:18.8931818Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T09:08:18.8932291Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T09:08:18.8932696Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T09:08:18.8933079Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T09:08:18.8933374Z ) 2026-02-21T09:08:18.8933516Z .reqntid 512 2026-02-21T09:08:18.8933665Z .maxnreg 32 2026-02-21T09:08:18.8933811Z { 2026-02-21T09:08:18.8933951Z .reg .pred %p<94>; 2026-02-21T09:08:18.8934130Z .reg .b16 %rs<312>; 2026-02-21T09:08:18.8934297Z .reg .b32 %r<2851>; 2026-02-21T09:08:18.8934472Z .reg .b64 %rd<256>; 2026-02-21T09:08:18.8934813Z .loc 1 14 0 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:14:0 2026-02-21T09:08:18.8935219Z $L__func_begin0: 2026-02-21T09:08:18.8935550Z .loc 1 14 0 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:14:0 2026-02-21T09:08:18.8935869Z 2026-02-21T09:08:18.8935929Z // %bb.0: 2026-02-21T09:08:18.8936225Z ld.param.b64 %rd55, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T09:08:18.8936683Z ld.param.b64 %rd54, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T09:08:18.8936994Z ld.param.b64 %rd53, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T09:08:18.8937241Z $L__tmp0: 2026-02-21T09:08:18.8937531Z .loc 1 19 46 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:19:46 2026-02-21T09:08:18.8937903Z mov.u32 %r2813, %ctaid.x; 2026-02-21T09:08:18.8938215Z .loc 1 0 0 // 
ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:0 2026-02-21T09:08:18.8938587Z sub.s32 %r144, 10303, %r2813; 2026-02-21T09:08:18.8938927Z .loc 1 19 145 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:19:145 2026-02-21T09:08:18.8939301Z mul.hi.u32 %r145, %r144, 1041204193; 2026-02-21T09:08:18.8939514Z shr.u32 %r146, %r145, 9; 2026-02-21T09:08:18.8939687Z mul.hi.u32 %r147, %r146, 1431655766; 2026-02-21T09:08:18.8939902Z mad.lo.s32 %r2841, %r147, 6336, %r2813; 2026-02-21T09:08:18.8940257Z .loc 1 31 45 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:31:45 2026-02-21T09:08:18.8940617Z mov.u32 %r3, %tid.x; 2026-02-21T09:08:18.8940780Z and.b32 %r4, %r3, 31; 2026-02-21T09:08:18.8940954Z shr.u32 %r5, %r3, 5; 2026-02-21T09:08:18.8941117Z and.b32 %r6, %r3, 504; 2026-02-21T09:08:18.8941300Z bfe.u32 %r7, %r3, 3, 6; 2026-02-21T09:08:18.8941475Z shl.b32 %r8, %r3, 1; 2026-02-21T09:08:18.8941634Z and.b32 %r9, %r8, 62; 2026-02-21T09:08:18.8941800Z and.b32 %r10, %r3, 7; 2026-02-21T09:08:18.8941970Z shl.b32 %r11, %r10, 3; 2026-02-21T09:08:18.8942284Z .loc 1 41 48 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:41:48 2026-02-21T09:08:18.8942656Z bfe.u32 %r12, %r3, 5, 4; 2026-02-21T09:08:18.8942980Z .loc 1 47 38 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:47:38 2026-02-21T09:08:18.8943333Z shl.b32 %r13, %r10, 2; 2026-02-21T09:08:18.8943647Z .loc 1 65 38 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:65:38 2026-02-21T09:08:18.8944001Z and.b32 %r14, %r3, 64; 2026-02-21T09:08:18.8944429Z .loc 1 19 145 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:19:145 2026-02-21T09:08:18.8944827Z setp.ge.s32 %p1, %r2813, %r2841; 2026-02-21T09:08:18.8945024Z shl.b32 %r2792, %r3, 3; 2026-02-21T09:08:18.8945208Z shr.u32 %r2793, %r3, 1; 2026-02-21T09:08:18.8945379Z mov.b32 %r2624, global_smem; 2026-02-21T09:08:18.8945568Z and.b32 %r2795, %r3, 96; 2026-02-21T09:08:18.8945741Z shl.b32 %r2796, %r3, 4; 2026-02-21T09:08:18.8945904Z 
and.b32 %r2797, %r8, 6; 2026-02-21T09:08:18.8946085Z and.b32 %r2798, %r3, 24; 2026-02-21T09:08:18.8946253Z shl.b32 %r2799, %r3, 2; 2026-02-21T09:08:18.8946423Z and.b32 %r2800, %r3, 384; 2026-02-21T09:08:18.8946786Z bfe.u32 %r2801, %r3, 7, 2; 2026-02-21T09:08:18.8947071Z and.b32 %r2802, %r8, 124; 2026-02-21T09:08:18.8947323Z and.b32 %r2803, %r3, 1; 2026-02-21T09:08:18.8947510Z shl.b32 %r2804, %r3, 7; 2026-02-21T09:08:18.8947682Z shl.b32 %r2805, %r10, 4; 2026-02-21T09:08:18.8947864Z shr.u32 %r2806, %r3, 4; 2026-02-21T09:08:18.8948039Z shl.b32 %r2807, %r5, 3; 2026-02-21T09:08:18.8948207Z shl.b32 %r2808, %r10, 10; 2026-02-21T09:08:18.8948472Z and.b32 %r2809, %r3, 16; 2026-02-21T09:08:18.8948644Z shl.b32 %r2810, %r6, 1; 2026-02-21T09:08:18.8948821Z shl.b32 %r2811, %r7, 10; 2026-02-21T09:08:18.8948990Z shl.b32 %r2812, %r12, 13; 2026-02-21T09:08:18.8949185Z setp.eq.b32 %p93, %r14, 0; 2026-02-21T09:08:18.8949372Z @%p1 bra $L__BB0_9; 2026-02-21T09:08:18.8949569Z // %bb.1: // %.lr.ph 2026-02-21T09:08:18.8949961Z .loc 1 0 145 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:0:145 2026-02-21T09:08:18.8950343Z and.b32 %r149, %r2792, 4088; 2026-02-21T09:08:18.8950537Z and.b32 %r151, %r2793, 24; 2026-02-21T09:08:18.8950723Z xor.b32 %r152, %r149, %r151; 2026-02-21T09:08:18.8950982Z add.s32 %r15, %r2624, %r152; 2026-02-21T09:08:18.8951164Z shl.b32 %r155, %r2795, 5; 2026-02-21T09:08:18.8951347Z and.b32 %r157, %r2796, 448; 2026-02-21T09:08:18.8951528Z or.b32 %r160, %r155, %r157; 2026-02-21T09:08:18.8951715Z or.b32 %r161, %r2798, %r2797; 2026-02-21T09:08:18.8951901Z or.b32 %r162, %r160, %r161; 2026-02-21T09:08:18.8952085Z add.s32 %r16, %r2624, %r162; 2026-02-21T09:08:18.8952274Z xor.b32 %r163, %r162, 8; 2026-02-21T09:08:18.8952444Z add.s32 %r17, %r2624, %r163; 2026-02-21T09:08:18.8952634Z xor.b32 %r164, %r162, 16; 2026-02-21T09:08:18.8952813Z add.s32 %r18, %r2624, %r164; 2026-02-21T09:08:18.8952993Z xor.b32 %r165, %r162, 24; 2026-02-21T09:08:18.8953160Z add.s32 %r19, %r2624, 
%r165; 2026-02-21T09:08:18.8953338Z and.b32 %r167, %r2799, 508; 2026-02-21T09:08:18.8953519Z or.b32 %r170, %r2801, %r167; 2026-02-21T09:08:18.8953691Z add.s32 %r20, %r2624, %r170; 2026-02-21T09:08:18.8953879Z xor.b32 %r171, %r170, 64; 2026-02-21T09:08:18.8954052Z add.s32 %r21, %r2624, %r171; 2026-02-21T09:08:18.8954235Z neg.s32 %r174, %r2803; 2026-02-21T09:08:18.8954407Z and.b32 %r175, %r174, 576; 2026-02-21T09:08:18.8954589Z or.b32 %r176, %r2802, %r2800; 2026-02-21T09:08:18.8954765Z xor.b32 %r177, %r176, %r175; 2026-02-21T09:08:18.8954953Z add.s32 %r22, %r2624, %r177; 2026-02-21T09:08:18.8955125Z and.b32 %r179, %r2804, 8064; 2026-02-21T09:08:18.8955312Z and.b32 %r182, %r2806, 28; 2026-02-21T09:08:18.8955495Z xor.b32 %r183, %r2805, %r182; 2026-02-21T09:08:18.8955673Z or.b32 %r184, %r183, %r179; 2026-02-21T09:08:18.8955856Z add.s32 %r23, %r2624, %r184; 2026-02-21T09:08:18.8956042Z xor.b32 %r185, %r184, 32; 2026-02-21T09:08:18.8956222Z add.s32 %r24, %r2624, %r185; 2026-02-21T09:08:18.8956398Z xor.b32 %r186, %r184, 64; 2026-02-21T09:08:18.8956706Z add.s32 %r25, %r2624, %r186; 2026-02-21T09:08:18.8956882Z xor.b32 %r187, %r184, 96; 2026-02-21T09:08:18.8957058Z add.s32 %r26, %r2624, %r187; 2026-02-21T09:08:18.8957234Z and.b32 %r189, %r2807, 120; 2026-02-21T09:08:18.8957413Z or.b32 %r190, %r189, %r4; 2026-02-21T09:08:18.8957586Z shl.b32 %r191, %r190, 4; 2026-02-21T09:08:18.8957757Z add.s32 %r192, %r2624, 8192; 2026-02-21T09:08:18.8958038Z add.s32 %r695, %r192, %r191; 2026-02-21T09:08:18.8958208Z shl.b32 %r193, %r2798, 6; 2026-02-21T09:08:18.8958379Z shl.b32 %r194, %r2795, 2; 2026-02-21T09:08:18.8958545Z add.s32 %r195, %r192, %r193; 2026-02-21T09:08:18.8958720Z add.s32 %r196, %r195, %r194; 2026-02-21T09:08:18.8958895Z add.s32 %r266, %r196, %r2805; 2026-02-21T09:08:18.8959091Z bfe.u32 %r197, %r2624, 4, 14; 2026-02-21T09:08:18.8959273Z cvt.u64.u32 %rd56, %r197; 2026-02-21T09:08:18.8959465Z or.b64 %rd1, %rd56, 4611686293338849280; 2026-02-21T09:08:18.8959683Z add.s32 %r198, 
%r2624, 32; 2026-02-21T09:08:18.8959862Z bfe.u32 %r199, %r198, 4, 14; 2026-02-21T09:08:18.8960049Z cvt.u64.u32 %rd57, %r199; 2026-02-21T09:08:18.8960229Z or.b64 %rd2, %rd57, 4611686293338849280; 2026-02-21T09:08:18.8960587Z add.s32 %r200, %r2624, 64; 2026-02-21T09:08:18.8960770Z bfe.u32 %r201, %r200, 4, 14; 2026-02-21T09:08:18.8960952Z cvt.u64.u32 %rd58, %r201; 2026-02-21T09:08:18.8961155Z or.b64 %rd3, %rd58, 4611686293338849280; 2026-02-21T09:08:18.8961362Z add.s32 %r202, %r2624, 96; 2026-02-21T09:08:18.8961542Z bfe.u32 %r203, %r202, 4, 14; 2026-02-21T09:08:18.8961719Z cvt.u64.u32 %rd59, %r203; 2026-02-21T09:08:18.8961911Z or.b64 %rd4, %rd59, 4611686293338849280; 2026-02-21T09:08:18.8962119Z add.s32 %r204, %r2624, 2048; 2026-02-21T09:08:18.8962309Z bfe.u32 %r205, %r204, 4, 14; 2026-02-21T09:08:18.8962494Z cvt.u64.u32 %rd60, %r205; 2026-02-21T09:08:18.8962677Z or.b64 %rd5, %rd60, 4611686293338849280; 2026-02-21T09:08:18.8962887Z add.s32 %r206, %r2624, 2080; 2026-02-21T09:08:18.8963061Z bfe.u32 %r207, %r206, 4, 14; 2026-02-21T09:08:18.8963241Z cvt.u64.u32 %rd61, %r207; 2026-02-21T09:08:18.8963422Z or.b64 %rd6, %rd61, 4611686293338849280; 2026-02-21T09:08:18.8963631Z add.s32 %r208, %r2624, 2112; 2026-02-21T09:08:18.8963895Z bfe.u32 %r209, %r208, 4, 14; 2026-02-21T09:08:18.8964077Z cvt.u64.u32 %rd62, %r209; 2026-02-21T09:08:18.8964258Z or.b64 %rd7, %rd62, 4611686293338849280; 2026-02-21T09:08:18.8964459Z add.s32 %r210, %r2624, 2144; 2026-02-21T09:08:18.8964637Z bfe.u32 %r211, %r210, 4, 14; 2026-02-21T09:08:18.8964807Z cvt.u64.u32 %rd63, %r211; 2026-02-21T09:08:18.8964988Z or.b64 %rd8, %rd63, 4611686293338849280; 2026-02-21T09:08:18.8965195Z add.s32 %r212, %r2624, 4096; 2026-02-21T09:08:18.8965374Z bfe.u32 %r213, %r212, 4, 14; 2026-02-21T09:08:18.8965547Z cvt.u64.u32 %rd64, %r213; 2026-02-21T09:08:18.8965731Z or.b64 %rd9, %rd64, 4611686293338849280; 2026-02-21T09:08:18.8965938Z add.s32 %r214, %r2624, 4128; 2026-02-21T09:08:18.8966113Z bfe.u32 %r215, %r214, 4, 14; 
2026-02-21T09:08:18.8966288Z cvt.u64.u32 %rd65, %r215; 2026-02-21T09:08:18.8966613Z or.b64 %rd10, %rd65, 4611686293338849280; 2026-02-21T09:08:18.8966848Z add.s32 %r216, %r2624, 4160; 2026-02-21T09:08:18.8967033Z bfe.u32 %r217, %r216, 4, 14; 2026-02-21T09:08:18.8967222Z cvt.u64.u32 %rd66, %r217; 2026-02-21T09:08:18.8967403Z or.b64 %rd11, %rd66, 4611686293338849280; 2026-02-21T09:08:18.8967616Z add.s32 %r218, %r2624, 4192; 2026-02-21T09:08:18.8967802Z bfe.u32 %r219, %r218, 4, 14; 2026-02-21T09:08:18.8967977Z cvt.u64.u32 %rd67, %r219; 2026-02-21T09:08:18.8968162Z or.b64 %rd12, %rd67, 4611686293338849280; 2026-02-21T09:08:18.8968364Z add.s32 %r220, %r2624, 6144; 2026-02-21T09:08:18.8968545Z bfe.u32 %r221, %r220, 4, 14; 2026-02-21T09:08:18.8968719Z cvt.u64.u32 %rd68, %r221; 2026-02-21T09:08:18.8968901Z or.b64 %rd13, %rd68, 4611686293338849280; 2026-02-21T09:08:18.8969100Z add.s32 %r222, %r2624, 6176; 2026-02-21T09:08:18.8969278Z bfe.u32 %r223, %r222, 4, 14; 2026-02-21T09:08:18.8969456Z cvt.u64.u32 %rd69, %r223; 2026-02-21T09:08:18.8969633Z or.b64 %rd14, %rd69, 4611686293338849280; 2026-02-21T09:08:18.8969840Z add.s32 %r224, %r2624, 6208; 2026-02-21T09:08:18.8970017Z bfe.u32 %r225, %r224, 4, 14; 2026-02-21T09:08:18.8970199Z cvt.u64.u32 %rd70, %r225; 2026-02-21T09:08:18.8970375Z or.b64 %rd15, %rd70, 4611686293338849280; 2026-02-21T09:08:18.8970581Z add.s32 %r226, %r2624, 6240; 2026-02-21T09:08:18.8970863Z bfe.u32 %r227, %r226, 4, 14; 2026-02-21T09:08:18.8971046Z cvt.u64.u32 %rd71, %r227; 2026-02-21T09:08:18.8971226Z or.b64 %rd16, %rd71, 4611686293338849280; 2026-02-21T09:08:18.8971438Z and.b32 %r229, %r2796, 240; 2026-02-21T09:08:18.8971623Z shl.b32 %r230, %r2795, 3; 2026-02-21T09:08:18.8971796Z shr.u32 %r231, %r2800, 2; 2026-02-21T09:08:18.8971970Z or.b32 %r233, %r229, %r230; 2026-02-21T09:08:18.8972148Z or.b32 %r234, %r231, %r2809; 2026-02-21T09:08:18.8972332Z xor.b32 %r235, %r233, %r234; 2026-02-21T09:08:18.8972511Z add.s32 %r236, %r2624, %r2808; 
2026-02-21T09:08:18.8972701Z add.s32 %r29, %r236, %r235; 2026-02-21T09:08:18.8972878Z and.b32 %r237, %r2804, 7168; 2026-02-21T09:08:18.8973064Z xor.b32 %r239, %r2805, %r2810; 2026-02-21T09:08:18.8973427Z add.s32 %r240, %r2624, %r237; 2026-02-21T09:08:18.8973611Z add.s32 %r30, %r240, %r239; 2026-02-21T09:08:18.8973957Z .loc 1 40 124 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:40:124 2026-02-21T09:08:18.8974331Z or.b32 %r31, %r2811, %r13; 2026-02-21T09:08:18.8974513Z or.b32 %r32, %r2812, %r9; 2026-02-21T09:08:18.8974734Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:08:18.8975055Z // Child Loop BB0_3 Depth 2 2026-02-21T09:08:18.8975327Z // Child Loop BB0_5 Depth 2 2026-02-21T09:08:18.8975592Z // Child Loop BB0_7 Depth 2 2026-02-21T09:08:18.8975972Z .loc 1 25 35 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:25:35 2026-02-21T09:08:18.8976331Z shr.s32 %r244, %r2813, 31; 2026-02-21T09:08:18.8976642Z shr.u32 %r245, %r244, 24; 2026-02-21T09:08:18.8976820Z add.s32 %r246, %r2813, %r245; 2026-02-21T09:08:18.8977095Z shr.s32 %r247, %r246, 8; 2026-02-21T09:08:18.8977424Z .loc 1 26 33 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:26:33 2026-02-21T09:08:18.8977784Z shl.b32 %r248, %r247, 2; 2026-02-21T09:08:18.8978113Z .loc 1 27 39 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:27:39 2026-02-21T09:08:18.8978468Z sub.s32 %r249, 128, %r248; 2026-02-21T09:08:18.8978787Z .loc 1 27 52 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:27:52 2026-02-21T09:08:18.8979135Z min.s32 %r250, %r249, 4; 2026-02-21T09:08:18.8979461Z .loc 1 28 45 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:28:45 2026-02-21T09:08:18.8979815Z and.b32 %r251, %r246, -256; 2026-02-21T09:08:18.8980002Z sub.s32 %r252, %r2813, %r251; 2026-02-21T09:08:18.8980328Z .loc 1 29 51 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:29:51 2026-02-21T09:08:18.8980685Z div.s32 %r253, %r252, %r250; 2026-02-21T09:08:18.8981008Z .loc 1 28 64 
// ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:28:64 2026-02-21T09:08:18.8981365Z mul.lo.s32 %r254, %r253, %r250; 2026-02-21T09:08:18.8981562Z sub.s32 %r255, %r252, %r254; 2026-02-21T09:08:18.8981886Z .loc 1 28 30 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:28:30 2026-02-21T09:08:18.8982246Z add.s32 %r256, %r255, %r248; 2026-02-21T09:08:18.8982564Z .loc 1 30 27 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:30:27 2026-02-21T09:08:18.8982915Z shl.b32 %r53, %r256, 6; 2026-02-21T09:08:18.8983231Z .loc 1 31 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:31:32 2026-02-21T09:08:18.8983577Z or.b32 %r54, %r53, %r9; 2026-02-21T09:08:18.8983886Z .loc 1 32 27 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:32:27 2026-02-21T09:08:18.8984241Z shl.b32 %r257, %r253, 6; 2026-02-21T09:08:18.8984554Z .loc 1 33 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:33:32 2026-02-21T09:08:18.8985002Z or.b32 %r55, %r257, %r7; 2026-02-21T09:08:18.8985326Z .loc 1 40 124 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:40:124 2026-02-21T09:08:18.8985698Z shl.b32 %r258, %r253, 16; 2026-02-21T09:08:18.8985875Z or.b32 %r259, %r31, %r258; 2026-02-21T09:08:18.8986067Z mad.wide.s32 %rd248, %r259, 2, %rd53; 2026-02-21T09:08:18.8986269Z add.s32 %r2814, %r32, %r53; 2026-02-21T09:08:18.8986585Z mov.b32 %r730, 0f00000000; 2026-02-21T09:08:18.8986784Z mov.b64 %rd249, -32; 2026-02-21T09:08:18.8986958Z mov.b32 %r731, %r730; 2026-02-21T09:08:18.8987127Z mov.b32 %r732, %r730; 2026-02-21T09:08:18.8987285Z mov.b32 %r733, %r730; 2026-02-21T09:08:18.8987448Z mov.b32 %r734, %r730; 2026-02-21T09:08:18.8987609Z mov.b32 %r735, %r730; 2026-02-21T09:08:18.8987934Z mov.b32 %r736, %r730; 2026-02-21T09:08:18.8988096Z mov.b32 %r737, %r730; 2026-02-21T09:08:18.8988312Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T09:08:18.8988684Z // => This Inner Loop Header: Depth=2 2026-02-21T09:08:18.8989089Z .loc 1 48 80 // 
ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:48:80 2026-02-21T09:08:18.8989463Z // begin inline asm 2026-02-21T09:08:18.8989620Z mov.u64 %rd73, 0x0; 2026-02-21T09:08:18.8989860Z createpolicy.fractional.L2::evict_last.b64 %rd73, 1.0; 2026-02-21T09:08:18.8990115Z // end inline asm 2026-02-21T09:08:18.8990270Z // begin inline asm 2026-02-21T09:08:18.8990421Z mov.u32 %r260, 0x0; 2026-02-21T09:08:18.8990577Z mov.u32 %r261, 0x0; 2026-02-21T09:08:18.8990847Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r260, %r261 }, [ %rd248 + 0 ], %rd73; 2026-02-21T09:08:18.8991190Z // end inline asm 2026-02-21T09:08:18.8991583Z .loc 1 52 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:52:32 2026-02-21T09:08:18.8991938Z bar.sync 0; 2026-02-21T09:08:18.8992110Z st.shared.v2.b32 [%r15], {%r260, %r261}; 2026-02-21T09:08:18.8992318Z bar.sync 0; 2026-02-21T09:08:18.8992476Z ld.shared.b16 %rs3, [%r16]; 2026-02-21T09:08:18.8992668Z ld.shared.b16 %rs4, [%r16+512]; 2026-02-21T09:08:18.8992869Z ld.shared.b16 %rs5, [%r16+32]; 2026-02-21T09:08:18.8993059Z ld.shared.b16 %rs6, [%r16+544]; 2026-02-21T09:08:18.8993255Z ld.shared.b16 %rs7, [%r17]; 2026-02-21T09:08:18.8993443Z ld.shared.b16 %rs8, [%r17+512]; 2026-02-21T09:08:18.8993638Z ld.shared.b16 %rs9, [%r17+32]; 2026-02-21T09:08:18.8993846Z ld.shared.b16 %rs10, [%r17+544]; 2026-02-21T09:08:18.8994045Z ld.shared.b16 %rs11, [%r18]; 2026-02-21T09:08:18.8994237Z ld.shared.b16 %rs12, [%r18+512]; 2026-02-21T09:08:18.8994429Z ld.shared.b16 %rs13, [%r18+32]; 2026-02-21T09:08:18.8994623Z ld.shared.b16 %rs14, [%r18+544]; 2026-02-21T09:08:18.8994817Z ld.shared.b16 %rs15, [%r19]; 2026-02-21T09:08:18.8995008Z ld.shared.b16 %rs16, [%r19+512]; 2026-02-21T09:08:18.8995205Z ld.shared.b16 %rs17, [%r19+32]; 2026-02-21T09:08:18.8995394Z ld.shared.b16 %rs18, [%r19+544]; 2026-02-21T09:08:18.8995590Z cvt.f32.bf16 %r398, %rs3; 2026-02-21T09:08:18.8995767Z cvt.f32.bf16 %r399, %rs4; 2026-02-21T09:08:18.8995944Z cvt.f32.bf16 %r400, %rs7; 
2026-02-21T09:08:18.8996113Z cvt.f32.bf16 %r401, %rs8; 2026-02-21T09:08:18.8996293Z cvt.f32.bf16 %r418, %rs11; 2026-02-21T09:08:18.8996601Z cvt.f32.bf16 %r419, %rs12; 2026-02-21T09:08:18.8996800Z cvt.f32.bf16 %r420, %rs15; 2026-02-21T09:08:18.8996977Z cvt.f32.bf16 %r421, %rs16; 2026-02-21T09:08:18.8997163Z cvt.f32.bf16 %r438, %rs5; 2026-02-21T09:08:18.8997345Z cvt.f32.bf16 %r439, %rs6; 2026-02-21T09:08:18.8997532Z cvt.f32.bf16 %r440, %rs9; 2026-02-21T09:08:18.8997715Z cvt.f32.bf16 %r441, %rs10; 2026-02-21T09:08:18.8997893Z cvt.f32.bf16 %r458, %rs13; 2026-02-21T09:08:18.8998075Z cvt.f32.bf16 %r459, %rs14; 2026-02-21T09:08:18.8998253Z cvt.f32.bf16 %r460, %rs17; 2026-02-21T09:08:18.8998433Z cvt.f32.bf16 %r461, %rs18; 2026-02-21T09:08:18.8998753Z .loc 1 54 34 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:54:34 2026-02-21T09:08:18.8999219Z cvt.s64.s32 %rd105, %r2814; 2026-02-21T09:08:18.8999408Z add.s64 %rd77, %rd54, %rd105; 2026-02-21T09:08:18.8999739Z .loc 1 54 87 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:54:87 2026-02-21T09:08:18.9000098Z // begin inline asm 2026-02-21T09:08:18.9000256Z mov.u64 %rd76, 0x0; 2026-02-21T09:08:18.9000478Z createpolicy.fractional.L2::evict_first.b64 %rd76, 1.0; 2026-02-21T09:08:18.9000735Z // end inline asm 2026-02-21T09:08:18.9000891Z // begin inline asm 2026-02-21T09:08:18.9001043Z mov.u16 %rs1, 0x0; 2026-02-21T09:08:18.9001300Z ld.global.L1::evict_first.L2::cache_hint.b16 { %rs1 }, [ %rd77 + 0 ], %rd76; 2026-02-21T09:08:18.9001615Z // end inline asm 2026-02-21T09:08:18.9001901Z shr.u16 %rs19, %rs1, 8; 2026-02-21T09:08:18.9002227Z .loc 1 62 28 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:62:28 2026-02-21T09:08:18.9002573Z bar.sync 0; 2026-02-21T09:08:18.9002730Z st.shared.b8 [%r20], %rs1; 2026-02-21T09:08:18.9002917Z st.shared.b8 [%r21+512], %rs19; 2026-02-21T09:08:18.9003109Z bar.sync 0; 2026-02-21T09:08:18.9003272Z ld.shared.b32 %r812, [%r22]; 2026-02-21T09:08:18.9003470Z prmt.b32 %r813, 
%r812, 0, 0x7770U; 2026-02-21T09:08:18.9003669Z cvt.u16.u32 %rs20, %r813; 2026-02-21T09:08:18.9003847Z prmt.b32 %r814, %r812, 0, 0x7771U; 2026-02-21T09:08:18.9004045Z cvt.u16.u32 %rs21, %r814; 2026-02-21T09:08:18.9004219Z prmt.b32 %r815, %r812, 0, 0x7772U; 2026-02-21T09:08:18.9004415Z cvt.u16.u32 %rs22, %r815; 2026-02-21T09:08:18.9004587Z prmt.b32 %r816, %r812, 0, 0x7773U; 2026-02-21T09:08:18.9004781Z cvt.u16.u32 %rs23, %r816; 2026-02-21T09:08:18.9005105Z .loc 1 57 28 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:57:28 2026-02-21T09:08:18.9005537Z shl.b16 %rs24, %rs20, 4; 2026-02-21T09:08:18.9005720Z shl.b16 %rs25, %rs21, 4; 2026-02-21T09:08:18.9005888Z shl.b16 %rs26, %rs22, 4; 2026-02-21T09:08:18.9006060Z shl.b16 %rs27, %rs23, 4; 2026-02-21T09:08:18.9006370Z .loc 1 72 58 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:72:58 2026-02-21T09:08:18.9006878Z selp.b16 %rs28, %rs24, %rs20, %p93; 2026-02-21T09:08:18.9007077Z cvt.s16.s8 %rs29, %rs28; 2026-02-21T09:08:18.9007249Z shr.s16 %rs30, %rs29, 4; 2026-02-21T09:08:18.9007435Z selp.b16 %rs31, %rs25, %rs21, %p93; 2026-02-21T09:08:18.9007651Z cvt.s16.s8 %rs32, %rs31; 2026-02-21T09:08:18.9007817Z shr.s16 %rs33, %rs32, 4; 2026-02-21T09:08:18.9008000Z selp.b16 %rs34, %rs26, %rs22, %p93; 2026-02-21T09:08:18.9008198Z cvt.s16.s8 %rs35, %rs34; 2026-02-21T09:08:18.9008369Z shr.s16 %rs36, %rs35, 4; 2026-02-21T09:08:18.9008552Z selp.b16 %rs37, %rs27, %rs23, %p93; 2026-02-21T09:08:18.9008750Z cvt.s16.s8 %rs38, %rs37; 2026-02-21T09:08:18.9008931Z shr.s16 %rs39, %rs38, 4; 2026-02-21T09:08:18.9009240Z .loc 1 77 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:77:32 2026-02-21T09:08:18.9009606Z cvt.rn.f32.s16 %r817, %rs30; 2026-02-21T09:08:18.9009793Z cvt.rn.f32.s16 %r818, %rs33; 2026-02-21T09:08:18.9009980Z cvt.rn.f32.s16 %r819, %rs36; 2026-02-21T09:08:18.9010164Z cvt.rn.f32.s16 %r820, %rs39; 2026-02-21T09:08:18.9010336Z bar.sync 0; 2026-02-21T09:08:18.9010490Z st.shared.b32 [%r23], %r817; 
2026-02-21T09:08:18.9010669Z st.shared.b32 [%r24], %r818; 2026-02-21T09:08:18.9010856Z st.shared.b32 [%r25], %r819; 2026-02-21T09:08:18.9011033Z st.shared.b32 [%r26], %r820; 2026-02-21T09:08:18.9011286Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r730}; 2026-02-21T09:08:18.9011549Z bar.sync 0; 2026-02-21T09:08:18.9011696Z // begin inline asm 2026-02-21T09:08:18.9011976Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r322, %r402, %r482, %r562}, [%r266]; 2026-02-21T09:08:18.9012299Z // end inline asm 2026-02-21T09:08:18.9012453Z bar.sync 0; 2026-02-21T09:08:18.9012677Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r732}; 2026-02-21T09:08:18.9012941Z bar.sync 0; 2026-02-21T09:08:18.9013173Z // begin inline asm 2026-02-21T09:08:18.9013449Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r324, %r404, %r484, %r564}, [%r266]; 2026-02-21T09:08:18.9013766Z // end inline asm 2026-02-21T09:08:18.9013937Z bar.sync 0; 2026-02-21T09:08:18.9014142Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r731}; 2026-02-21T09:08:18.9014407Z bar.sync 0; 2026-02-21T09:08:18.9014552Z // begin inline asm 2026-02-21T09:08:18.9014819Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r323, %r403, %r483, %r563}, [%r266]; 2026-02-21T09:08:18.9015143Z // end inline asm 2026-02-21T09:08:18.9015289Z bar.sync 0; 2026-02-21T09:08:18.9015503Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r733}; 2026-02-21T09:08:18.9015763Z bar.sync 0; 2026-02-21T09:08:18.9015983Z // begin inline asm 2026-02-21T09:08:18.9016310Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r325, %r405, %r485, %r565}, [%r266]; 2026-02-21T09:08:18.9016764Z // end inline asm 2026-02-21T09:08:18.9016916Z bar.sync 0; 2026-02-21T09:08:18.9017124Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r734}; 2026-02-21T09:08:18.9017399Z bar.sync 0; 2026-02-21T09:08:18.9017542Z // begin inline asm 2026-02-21T09:08:18.9017812Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r326, %r406, %r486, %r566}, [%r266]; 2026-02-21T09:08:18.9018127Z // end 
inline asm 2026-02-21T09:08:18.9018275Z bar.sync 0; 2026-02-21T09:08:18.9018480Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r736}; 2026-02-21T09:08:18.9018743Z bar.sync 0; 2026-02-21T09:08:18.9018893Z // begin inline asm 2026-02-21T09:08:18.9019160Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r328, %r408, %r488, %r568}, [%r266]; 2026-02-21T09:08:18.9019479Z // end inline asm 2026-02-21T09:08:18.9019623Z bar.sync 0; 2026-02-21T09:08:18.9019838Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r735}; 2026-02-21T09:08:18.9020175Z bar.sync 0; 2026-02-21T09:08:18.9020328Z // begin inline asm 2026-02-21T09:08:18.9020596Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r327, %r407, %r487, %r567}, [%r266]; 2026-02-21T09:08:18.9020923Z // end inline asm 2026-02-21T09:08:18.9021069Z bar.sync 0; 2026-02-21T09:08:18.9021282Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r737}; 2026-02-21T09:08:18.9021543Z bar.sync 0; 2026-02-21T09:08:18.9021697Z // begin inline asm 2026-02-21T09:08:18.9021968Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r329, %r409, %r489, %r569}, [%r266]; 2026-02-21T09:08:18.9022282Z // end inline asm 2026-02-21T09:08:18.9022430Z $L__tmp1: 2026-02-21T09:08:18.9022785Z .loc 2 291 36 // standard.py:291:36 @[ ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:84:40 ] 2026-02-21T09:08:18.9023207Z // begin inline asm 2026-02-21T09:08:18.9023385Z fence.proxy.async.shared::cta; 2026-02-21T09:08:18.9023578Z // end inline asm 2026-02-21T09:08:18.9023766Z shfl.sync.idx.b32 %r821, %r5, 0, 31, -1; 2026-02-21T09:08:18.9023995Z wgmma.fence.sync.aligned; 2026-02-21T09:08:18.9024179Z mov.pred %p2, -1; 2026-02-21T09:08:18.9024333Z // begin inline asm 2026-02-21T09:08:18.9024778Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r322,%r323,%r324,%r325,%r326,%r327,%r328,%r329}, {%r398,%r399,%r400,%r401}, %rd1, %p2, 1, 1; 2026-02-21T09:08:18.9025274Z // end inline asm 2026-02-21T09:08:18.9025428Z // begin inline asm 2026-02-21T09:08:18.9025859Z 
wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r322,%r323,%r324,%r325,%r326,%r327,%r328,%r329}, {%r418,%r419,%r420,%r421}, %rd2, %p2, 1, 1; 2026-02-21T09:08:18.9026339Z // end inline asm 2026-02-21T09:08:18.9026610Z // begin inline asm 2026-02-21T09:08:18.9027039Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r322,%r323,%r324,%r325,%r326,%r327,%r328,%r329}, {%r438,%r439,%r440,%r441}, %rd3, %p2, 1, 1; 2026-02-21T09:08:18.9027518Z // end inline asm 2026-02-21T09:08:18.9027667Z // begin inline asm 2026-02-21T09:08:18.9028090Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r322,%r323,%r324,%r325,%r326,%r327,%r328,%r329}, {%r458,%r459,%r460,%r461}, %rd4, %p2, 1, 1; 2026-02-21T09:08:18.9028768Z // end inline asm 2026-02-21T09:08:18.9028918Z // begin inline asm 2026-02-21T09:08:18.9029345Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r402,%r403,%r404,%r405,%r406,%r407,%r408,%r409}, {%r398,%r399,%r400,%r401}, %rd5, %p2, 1, 1; 2026-02-21T09:08:18.9029816Z // end inline asm 2026-02-21T09:08:18.9029969Z // begin inline asm 2026-02-21T09:08:18.9030387Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r402,%r403,%r404,%r405,%r406,%r407,%r408,%r409}, {%r418,%r419,%r420,%r421}, %rd6, %p2, 1, 1; 2026-02-21T09:08:18.9030862Z // end inline asm 2026-02-21T09:08:18.9031015Z // begin inline asm 2026-02-21T09:08:18.9031431Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r402,%r403,%r404,%r405,%r406,%r407,%r408,%r409}, {%r438,%r439,%r440,%r441}, %rd7, %p2, 1, 1; 2026-02-21T09:08:18.9032067Z // end inline asm 2026-02-21T09:08:18.9032222Z // begin inline asm 2026-02-21T09:08:18.9032650Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r402,%r403,%r404,%r405,%r406,%r407,%r408,%r409}, {%r458,%r459,%r460,%r461}, %rd8, %p2, 1, 1; 2026-02-21T09:08:18.9033124Z // end inline asm 2026-02-21T09:08:18.9033278Z // begin inline asm 2026-02-21T09:08:18.9033703Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 
{%r482,%r483,%r484,%r485,%r486,%r487,%r488,%r489}, {%r398,%r399,%r400,%r401}, %rd9, %p2, 1, 1; 2026-02-21T09:08:18.9034183Z // end inline asm 2026-02-21T09:08:18.9034351Z // begin inline asm 2026-02-21T09:08:18.9034777Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r482,%r483,%r484,%r485,%r486,%r487,%r488,%r489}, {%r418,%r419,%r420,%r421}, %rd10, %p2, 1, 1; 2026-02-21T09:08:18.9035269Z // end inline asm 2026-02-21T09:08:18.9035421Z // begin inline asm 2026-02-21T09:08:18.9035923Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r482,%r483,%r484,%r485,%r486,%r487,%r488,%r489}, {%r438,%r439,%r440,%r441}, %rd11, %p2, 1, 1; 2026-02-21T09:08:18.9036410Z // end inline asm 2026-02-21T09:08:18.9036704Z // begin inline asm 2026-02-21T09:08:18.9037142Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r482,%r483,%r484,%r485,%r486,%r487,%r488,%r489}, {%r458,%r459,%r460,%r461}, %rd12, %p2, 1, 1; 2026-02-21T09:08:18.9037614Z // end inline asm 2026-02-21T09:08:18.9037771Z // begin inline asm 2026-02-21T09:08:18.9038195Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r562,%r563,%r564,%r565,%r566,%r567,%r568,%r569}, {%r398,%r399,%r400,%r401}, %rd13, %p2, 1, 1; 2026-02-21T09:08:18.9038671Z // end inline asm 2026-02-21T09:08:18.9038825Z // begin inline asm 2026-02-21T09:08:18.9039244Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r562,%r563,%r564,%r565,%r566,%r567,%r568,%r569}, {%r418,%r419,%r420,%r421}, %rd14, %p2, 1, 1; 2026-02-21T09:08:18.9039721Z // end inline asm 2026-02-21T09:08:18.9039865Z // begin inline asm 2026-02-21T09:08:18.9040290Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r562,%r563,%r564,%r565,%r566,%r567,%r568,%r569}, {%r438,%r439,%r440,%r441}, %rd15, %p2, 1, 1; 2026-02-21T09:08:18.9040764Z // end inline asm 2026-02-21T09:08:18.9040918Z // begin inline asm 2026-02-21T09:08:18.9041341Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r562,%r563,%r564,%r565,%r566,%r567,%r568,%r569}, {%r458,%r459,%r460,%r461}, %rd16, %p2, 
1, 1; 2026-02-21T09:08:18.9041812Z // end inline asm 2026-02-21T09:08:18.9041994Z wgmma.commit_group.sync.aligned; 2026-02-21T09:08:18.9042199Z mov.b32 %r799, 0; 2026-02-21T09:08:18.9042356Z mov.b32 %r655, %r799; 2026-02-21T09:08:18.9042522Z mov.b32 %r656, %r799; 2026-02-21T09:08:18.9042689Z mov.b32 %r654, %r2624; 2026-02-21T09:08:18.9042852Z // begin inline asm 2026-02-21T09:08:18.9043420Z // wait for regs: %r322,%r323,%r324,%r325,%r326,%r327,%r328,%r329,%r402,%r403,%r404,%r405,%r406,%r407,%r408,%r409,%r482,%r483,%r484,%r485,%r486,%r487,%r488,%r489,%r562,%r563,%r564,%r565,%r566,%r567,%r568,%r569,%r654,%r655,%r656 2026-02-21T09:08:18.9044055Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:08:18.9044248Z // end inline asm 2026-02-21T09:08:18.9044397Z $L__tmp2: 2026-02-21T09:08:18.9044781Z .loc 1 48 80 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:48:80 2026-02-21T09:08:18.9045159Z add.s64 %rd96, %rd248, 64; 2026-02-21T09:08:18.9045336Z // begin inline asm 2026-02-21T09:08:18.9045495Z mov.u64 %rd95, 0x0; 2026-02-21T09:08:18.9045734Z createpolicy.fractional.L2::evict_last.b64 %rd95, 1.0; 2026-02-21T09:08:18.9045990Z // end inline asm 2026-02-21T09:08:18.9046145Z // begin inline asm 2026-02-21T09:08:18.9046298Z mov.u32 %r692, 0x0; 2026-02-21T09:08:18.9046595Z mov.u32 %r693, 0x0; 2026-02-21T09:08:18.9046867Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r692, %r693 }, [ %rd96 + 0 ], %rd95; 2026-02-21T09:08:18.9047195Z // end inline asm 2026-02-21T09:08:18.9047584Z .loc 1 52 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:52:32 2026-02-21T09:08:18.9048018Z bar.sync 0; 2026-02-21T09:08:18.9048193Z st.shared.v2.b32 [%r15], {%r692, %r693}; 2026-02-21T09:08:18.9048401Z bar.sync 0; 2026-02-21T09:08:18.9048562Z ld.shared.b16 %rs40, [%r16]; 2026-02-21T09:08:18.9048757Z ld.shared.b16 %rs41, [%r16+512]; 2026-02-21T09:08:18.9048963Z ld.shared.b16 %rs42, [%r16+32]; 2026-02-21T09:08:18.9049157Z ld.shared.b16 %rs43, [%r16+544]; 
2026-02-21T09:08:18.9049356Z ld.shared.b16 %rs44, [%r17]; 2026-02-21T09:08:18.9049539Z ld.shared.b16 %rs45, [%r17+512]; 2026-02-21T09:08:18.9049738Z ld.shared.b16 %rs46, [%r17+32]; 2026-02-21T09:08:18.9049933Z ld.shared.b16 %rs47, [%r17+544]; 2026-02-21T09:08:18.9050135Z ld.shared.b16 %rs48, [%r18]; 2026-02-21T09:08:18.9050326Z ld.shared.b16 %rs49, [%r18+512]; 2026-02-21T09:08:18.9050520Z ld.shared.b16 %rs50, [%r18+32]; 2026-02-21T09:08:18.9050719Z ld.shared.b16 %rs51, [%r18+544]; 2026-02-21T09:08:18.9050910Z ld.shared.b16 %rs52, [%r19]; 2026-02-21T09:08:18.9051104Z ld.shared.b16 %rs53, [%r19+512]; 2026-02-21T09:08:18.9051387Z ld.shared.b16 %rs54, [%r19+32]; 2026-02-21T09:08:18.9051593Z ld.shared.b16 %rs55, [%r19+544]; 2026-02-21T09:08:18.9051785Z cvt.f32.bf16 %r726, %rs40; 2026-02-21T09:08:18.9051968Z cvt.f32.bf16 %r727, %rs41; 2026-02-21T09:08:18.9052148Z cvt.f32.bf16 %r728, %rs44; 2026-02-21T09:08:18.9052318Z cvt.f32.bf16 %r729, %rs45; 2026-02-21T09:08:18.9052495Z cvt.f32.bf16 %r746, %rs48; 2026-02-21T09:08:18.9052666Z cvt.f32.bf16 %r747, %rs49; 2026-02-21T09:08:18.9052858Z cvt.f32.bf16 %r748, %rs52; 2026-02-21T09:08:18.9053031Z cvt.f32.bf16 %r749, %rs53; 2026-02-21T09:08:18.9053205Z cvt.f32.bf16 %r766, %rs42; 2026-02-21T09:08:18.9053376Z cvt.f32.bf16 %r767, %rs43; 2026-02-21T09:08:18.9053553Z cvt.f32.bf16 %r768, %rs46; 2026-02-21T09:08:18.9053723Z cvt.f32.bf16 %r769, %rs47; 2026-02-21T09:08:18.9053897Z cvt.f32.bf16 %r786, %rs50; 2026-02-21T09:08:18.9054075Z cvt.f32.bf16 %r787, %rs51; 2026-02-21T09:08:18.9054252Z cvt.f32.bf16 %r788, %rs54; 2026-02-21T09:08:18.9054434Z cvt.f32.bf16 %r789, %rs55; 2026-02-21T09:08:18.9054605Z cvt.u32.u64 %r822, %rd249; 2026-02-21T09:08:18.9054781Z add.s32 %r823, %r822, 48; 2026-02-21T09:08:18.9055111Z .loc 1 54 62 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:54:62 2026-02-21T09:08:18.9055474Z or.b32 %r824, %r5, %r823; 2026-02-21T09:08:18.9055655Z shl.b32 %r825, %r824, 13; 2026-02-21T09:08:18.9055833Z add.s32 %r826, 
%r825, %r54; 2026-02-21T09:08:18.9056161Z .loc 1 54 34 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:54:34 2026-02-21T09:08:18.9056657Z cvt.s64.s32 %rd106, %r826; 2026-02-21T09:08:18.9056845Z add.s64 %rd99, %rd54, %rd106; 2026-02-21T09:08:18.9057165Z .loc 1 54 87 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:54:87 2026-02-21T09:08:18.9057519Z // begin inline asm 2026-02-21T09:08:18.9057675Z mov.u64 %rd98, 0x0; 2026-02-21T09:08:18.9057901Z createpolicy.fractional.L2::evict_first.b64 %rd98, 1.0; 2026-02-21T09:08:18.9058159Z // end inline asm 2026-02-21T09:08:18.9058315Z // begin inline asm 2026-02-21T09:08:18.9058484Z mov.u16 %rs2, 0x0; 2026-02-21T09:08:18.9058825Z ld.global.L1::evict_first.L2::cache_hint.b16 { %rs2 }, [ %rd99 + 0 ], %rd98; 2026-02-21T09:08:18.9059126Z // end inline asm 2026-02-21T09:08:18.9059278Z shr.u16 %rs56, %rs2, 8; 2026-02-21T09:08:18.9059601Z .loc 1 62 28 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:62:28 2026-02-21T09:08:18.9059960Z bar.sync 0; 2026-02-21T09:08:18.9060127Z st.shared.b8 [%r20], %rs2; 2026-02-21T09:08:18.9060311Z st.shared.b8 [%r21+512], %rs56; 2026-02-21T09:08:18.9060500Z bar.sync 0; 2026-02-21T09:08:18.9060655Z ld.shared.b32 %r827, [%r22]; 2026-02-21T09:08:18.9060843Z prmt.b32 %r828, %r827, 0, 0x7770U; 2026-02-21T09:08:18.9061048Z cvt.u16.u32 %rs57, %r828; 2026-02-21T09:08:18.9061224Z prmt.b32 %r829, %r827, 0, 0x7771U; 2026-02-21T09:08:18.9061586Z cvt.u16.u32 %rs58, %r829; 2026-02-21T09:08:18.9061770Z prmt.b32 %r830, %r827, 0, 0x7772U; 2026-02-21T09:08:18.9061969Z cvt.u16.u32 %rs59, %r830; 2026-02-21T09:08:18.9062143Z prmt.b32 %r831, %r827, 0, 0x7773U; 2026-02-21T09:08:18.9062360Z cvt.u16.u32 %rs60, %r831; 2026-02-21T09:08:18.9062685Z .loc 1 57 28 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:57:28 2026-02-21T09:08:18.9063040Z shl.b16 %rs61, %rs57, 4; 2026-02-21T09:08:18.9063219Z shl.b16 %rs62, %rs58, 4; 2026-02-21T09:08:18.9063390Z shl.b16 %rs63, %rs59, 4; 
2026-02-21T09:08:18.9063564Z shl.b16 %rs64, %rs60, 4; 2026-02-21T09:08:18.9063874Z .loc 1 72 58 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:72:58 2026-02-21T09:08:18.9064239Z selp.b16 %rs65, %rs61, %rs57, %p93; 2026-02-21T09:08:18.9064436Z cvt.s16.s8 %rs66, %rs65; 2026-02-21T09:08:18.9064611Z shr.s16 %rs67, %rs66, 4; 2026-02-21T09:08:18.9064790Z selp.b16 %rs68, %rs62, %rs58, %p93; 2026-02-21T09:08:18.9064993Z cvt.s16.s8 %rs69, %rs68; 2026-02-21T09:08:18.9065244Z shr.s16 %rs70, %rs69, 4; 2026-02-21T09:08:18.9065422Z selp.b16 %rs71, %rs63, %rs59, %p93; 2026-02-21T09:08:18.9065618Z cvt.s16.s8 %rs72, %rs71; 2026-02-21T09:08:18.9065786Z shr.s16 %rs73, %rs72, 4; 2026-02-21T09:08:18.9065981Z selp.b16 %rs74, %rs64, %rs60, %p93; 2026-02-21T09:08:18.9066174Z cvt.s16.s8 %rs75, %rs74; 2026-02-21T09:08:18.9066343Z shr.s16 %rs76, %rs75, 4; 2026-02-21T09:08:18.9066780Z .loc 1 77 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:77:32 2026-02-21T09:08:18.9067136Z cvt.rn.f32.s16 %r832, %rs67; 2026-02-21T09:08:18.9067325Z cvt.rn.f32.s16 %r833, %rs70; 2026-02-21T09:08:18.9067520Z cvt.rn.f32.s16 %r834, %rs73; 2026-02-21T09:08:18.9067708Z cvt.rn.f32.s16 %r835, %rs76; 2026-02-21T09:08:18.9067877Z bar.sync 0; 2026-02-21T09:08:18.9068033Z st.shared.b32 [%r23], %r832; 2026-02-21T09:08:18.9068213Z st.shared.b32 [%r24], %r833; 2026-02-21T09:08:18.9068465Z st.shared.b32 [%r25], %r834; 2026-02-21T09:08:18.9068648Z st.shared.b32 [%r26], %r835; 2026-02-21T09:08:18.9068827Z $L__tmp3: 2026-02-21T09:08:18.9069188Z .loc 2 291 36 // standard.py:291:36 @[ ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:84:40 ] 2026-02-21T09:08:18.9069726Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r266], {%r322, %r402, %r482, %r562}; 2026-02-21T09:08:18.9070052Z bar.sync 0; 2026-02-21T09:08:18.9070194Z // begin inline asm 2026-02-21T09:08:18.9070428Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r730}, [%r695]; 2026-02-21T09:08:18.9070694Z // end inline asm 2026-02-21T09:08:18.9070849Z 
bar.sync 0; 2026-02-21T09:08:18.9071106Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r266], {%r324, %r404, %r484, %r564}; 2026-02-21T09:08:18.9071428Z bar.sync 0; 2026-02-21T09:08:18.9071578Z // begin inline asm 2026-02-21T09:08:18.9071807Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r732}, [%r695]; 2026-02-21T09:08:18.9072075Z // end inline asm 2026-02-21T09:08:18.9072225Z bar.sync 0; 2026-02-21T09:08:18.9072492Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r266], {%r323, %r403, %r483, %r563}; 2026-02-21T09:08:18.9072804Z bar.sync 0; 2026-02-21T09:08:18.9072958Z // begin inline asm 2026-02-21T09:08:18.9073279Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r731}, [%r695]; 2026-02-21T09:08:18.9073547Z // end inline asm 2026-02-21T09:08:18.9073699Z bar.sync 0; 2026-02-21T09:08:18.9073955Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r266], {%r325, %r405, %r485, %r565}; 2026-02-21T09:08:18.9074272Z bar.sync 0; 2026-02-21T09:08:18.9074414Z // begin inline asm 2026-02-21T09:08:18.9074633Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r733}, [%r695]; 2026-02-21T09:08:18.9074908Z // end inline asm 2026-02-21T09:08:18.9075059Z bar.sync 0; 2026-02-21T09:08:18.9075316Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r266], {%r326, %r406, %r486, %r566}; 2026-02-21T09:08:18.9075631Z bar.sync 0; 2026-02-21T09:08:18.9075774Z // begin inline asm 2026-02-21T09:08:18.9076138Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r734}, [%r695]; 2026-02-21T09:08:18.9076405Z // end inline asm 2026-02-21T09:08:18.9076681Z bar.sync 0; 2026-02-21T09:08:18.9076940Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r266], {%r328, %r408, %r488, %r568}; 2026-02-21T09:08:18.9077258Z bar.sync 0; 2026-02-21T09:08:18.9077402Z // begin inline asm 2026-02-21T09:08:18.9077620Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r736}, [%r695]; 2026-02-21T09:08:18.9077886Z // end inline asm 2026-02-21T09:08:18.9078030Z bar.sync 0; 2026-02-21T09:08:18.9078290Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r266], {%r327, %r407, %r487, %r567}; 
2026-02-21T09:08:18.9078612Z bar.sync 0; 2026-02-21T09:08:18.9078763Z // begin inline asm 2026-02-21T09:08:18.9078988Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r735}, [%r695]; 2026-02-21T09:08:18.9079245Z // end inline asm 2026-02-21T09:08:18.9079396Z bar.sync 0; 2026-02-21T09:08:18.9079649Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r266], {%r329, %r409, %r489, %r569}; 2026-02-21T09:08:18.9079969Z bar.sync 0; 2026-02-21T09:08:18.9080216Z // begin inline asm 2026-02-21T09:08:18.9080443Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r737}, [%r695]; 2026-02-21T09:08:18.9080710Z // end inline asm 2026-02-21T09:08:18.9080860Z // begin inline asm 2026-02-21T09:08:18.9081038Z fence.proxy.async.shared::cta; 2026-02-21T09:08:18.9081229Z // end inline asm 2026-02-21T09:08:18.9081397Z wgmma.fence.sync.aligned; 2026-02-21T09:08:18.9081577Z shl.b32 %r836, %r821, 9; 2026-02-21T09:08:18.9081754Z and.b32 %r837, %r836, 6144; 2026-02-21T09:08:18.9081932Z add.s32 %r838, %r837, %r2624; 2026-02-21T09:08:18.9082120Z bfe.u32 %r839, %r838, 4, 14; 2026-02-21T09:08:18.9082302Z cvt.u64.u32 %rd107, %r839; 2026-02-21T09:08:18.9082499Z or.b64 %rd101, %rd107, 4611686293338849280; 2026-02-21T09:08:18.9082716Z // begin inline asm 2026-02-21T09:08:18.9083159Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r730,%r731,%r732,%r733,%r734,%r735,%r736,%r737}, {%r726,%r727,%r728,%r729}, %rd101, %p2, 1, 1; 2026-02-21T09:08:18.9083656Z // end inline asm 2026-02-21T09:08:18.9083811Z add.s32 %r840, %r838, 32; 2026-02-21T09:08:18.9083990Z bfe.u32 %r841, %r840, 4, 14; 2026-02-21T09:08:18.9084168Z cvt.u64.u32 %rd108, %r841; 2026-02-21T09:08:18.9084365Z or.b64 %rd102, %rd108, 4611686293338849280; 2026-02-21T09:08:18.9084594Z // begin inline asm 2026-02-21T09:08:18.9085039Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r730,%r731,%r732,%r733,%r734,%r735,%r736,%r737}, {%r746,%r747,%r748,%r749}, %rd102, %p2, 1, 1; 2026-02-21T09:08:18.9085527Z // end inline asm 2026-02-21T09:08:18.9085684Z add.s32 
%r842, %r838, 64; 2026-02-21T09:08:18.9085866Z bfe.u32 %r843, %r842, 4, 14; 2026-02-21T09:08:18.9086045Z cvt.u64.u32 %rd109, %r843; 2026-02-21T09:08:18.9086240Z or.b64 %rd103, %rd109, 4611686293338849280; 2026-02-21T09:08:18.9086576Z // begin inline asm 2026-02-21T09:08:18.9087020Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r730,%r731,%r732,%r733,%r734,%r735,%r736,%r737}, {%r766,%r767,%r768,%r769}, %rd103, %p2, 1, 1; 2026-02-21T09:08:18.9087502Z // end inline asm 2026-02-21T09:08:18.9087651Z add.s32 %r844, %r838, 96; 2026-02-21T09:08:18.9087828Z bfe.u32 %r845, %r844, 4, 14; 2026-02-21T09:08:18.9088102Z cvt.u64.u32 %rd110, %r845; 2026-02-21T09:08:18.9088293Z or.b64 %rd104, %rd110, 4611686293338849280; 2026-02-21T09:08:18.9088500Z // begin inline asm 2026-02-21T09:08:18.9088935Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r730,%r731,%r732,%r733,%r734,%r735,%r736,%r737}, {%r786,%r787,%r788,%r789}, %rd104, %p2, 1, 1; 2026-02-21T09:08:18.9089412Z // end inline asm 2026-02-21T09:08:18.9089580Z wgmma.commit_group.sync.aligned; 2026-02-21T09:08:18.9089783Z mov.b32 %r798, %r2624; 2026-02-21T09:08:18.9089950Z mov.b32 %r800, %r799; 2026-02-21T09:08:18.9090115Z // begin inline asm 2026-02-21T09:08:18.9090357Z // wait for regs: %r730,%r731,%r732,%r733,%r734,%r735,%r736,%r737,%r798,%r799,%r800 2026-02-21T09:08:18.9090682Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:08:18.9091030Z // end inline asm 2026-02-21T09:08:18.9091186Z $L__tmp4: 2026-02-21T09:08:18.9091484Z .loc 1 40 124 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:40:124 2026-02-21T09:08:18.9091857Z add.s64 %rd249, %rd249, 32; 2026-02-21T09:08:18.9092048Z add.s64 %rd248, %rd248, 128; 2026-02-21T09:08:18.9092229Z add.s32 %r2814, %r2814, 262144; 2026-02-21T09:08:18.9092428Z setp.lt.u64 %p23, %rd249, 480; 2026-02-21T09:08:18.9092614Z @%p23 bra $L__BB0_3; 2026-02-21T09:08:18.9092831Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:08:18.9093235Z .loc 1 31 32 // 
ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:31:32 2026-02-21T09:08:18.9093600Z or.b32 %r851, %r53, %r11; 2026-02-21T09:08:18.9093923Z .loc 1 87 28 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:87:28 2026-02-21T09:08:18.9094289Z cvt.rn.bf16x2.f32 %r852, %r731, %r730; 2026-02-21T09:08:18.9094515Z cvt.rn.bf16x2.f32 %r853, %r733, %r732; 2026-02-21T09:08:18.9094804Z cvt.rn.bf16x2.f32 %r854, %r735, %r734; 2026-02-21T09:08:18.9095026Z cvt.rn.bf16x2.f32 %r855, %r737, %r736; 2026-02-21T09:08:18.9095373Z .loc 1 88 43 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:88:43 2026-02-21T09:08:18.9095731Z shl.b32 %r856, %r55, 13; 2026-02-21T09:08:18.9096049Z .loc 1 88 50 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:88:50 2026-02-21T09:08:18.9096422Z add.s32 %r857, %r851, %r856; 2026-02-21T09:08:18.9096884Z .loc 1 88 22 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:88:22 2026-02-21T09:08:18.9097240Z mad.wide.s32 %rd111, %r857, 2, %rd55; 2026-02-21T09:08:18.9097590Z .loc 1 88 81 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:88:81 2026-02-21T09:08:18.9097933Z bar.sync 0; 2026-02-21T09:08:18.9098209Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r29], {%r852, %r853, %r854, %r855}; 2026-02-21T09:08:18.9098537Z bar.sync 0; 2026-02-21T09:08:18.9098726Z ld.shared.v4.b32 {%r846, %r847, %r848, %r849}, [%r30]; 2026-02-21T09:08:18.9098981Z // begin inline asm 2026-02-21T09:08:18.9099204Z st.global.v4.b32 [ %rd111 + 0 ], { %r846, %r847, %r848, %r849 }; 2026-02-21T09:08:18.9099470Z // end inline asm 2026-02-21T09:08:18.9099778Z .loc 1 19 145 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:19:145 2026-02-21T09:08:18.9100152Z add.s32 %r858, %r2813, 2112; 2026-02-21T09:08:18.9100475Z .loc 1 25 35 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:25:35 2026-02-21T09:08:18.9100844Z shr.s32 %r859, %r858, 31; 2026-02-21T09:08:18.9101034Z shr.u32 %r860, %r859, 24; 2026-02-21T09:08:18.9101207Z add.s32 %r861, 
%r858, %r860; 2026-02-21T09:08:18.9101391Z shr.s32 %r862, %r861, 8; 2026-02-21T09:08:18.9101704Z .loc 1 26 33 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:26:33 2026-02-21T09:08:18.9102066Z shl.b32 %r863, %r862, 2; 2026-02-21T09:08:18.9102375Z .loc 1 27 39 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:27:39 2026-02-21T09:08:18.9102818Z sub.s32 %r864, 128, %r863; 2026-02-21T09:08:18.9103139Z .loc 1 27 52 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:27:52 2026-02-21T09:08:18.9103503Z min.s32 %r865, %r864, 4; 2026-02-21T09:08:18.9103817Z .loc 1 28 45 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:28:45 2026-02-21T09:08:18.9104171Z and.b32 %r866, %r861, -256; 2026-02-21T09:08:18.9104356Z sub.s32 %r867, %r858, %r866; 2026-02-21T09:08:18.9104670Z .loc 1 29 51 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:29:51 2026-02-21T09:08:18.9105023Z div.s32 %r868, %r867, %r865; 2026-02-21T09:08:18.9105340Z .loc 1 28 64 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:28:64 2026-02-21T09:08:18.9105832Z mul.lo.s32 %r869, %r868, %r865; 2026-02-21T09:08:18.9106028Z sub.s32 %r870, %r867, %r869; 2026-02-21T09:08:18.9106345Z .loc 1 28 30 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:28:30 2026-02-21T09:08:18.9106829Z add.s32 %r871, %r870, %r863; 2026-02-21T09:08:18.9107144Z .loc 1 30 27 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:30:27 2026-02-21T09:08:18.9107499Z shl.b32 %r75, %r871, 6; 2026-02-21T09:08:18.9107813Z .loc 1 31 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:31:32 2026-02-21T09:08:18.9108161Z or.b32 %r76, %r75, %r9; 2026-02-21T09:08:18.9108555Z .loc 1 32 27 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:32:27 2026-02-21T09:08:18.9108907Z shl.b32 %r872, %r868, 6; 2026-02-21T09:08:18.9109224Z .loc 1 33 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:33:32 2026-02-21T09:08:18.9109579Z or.b32 %r77, %r872, %r7; 
2026-02-21T09:08:18.9109990Z .loc 1 40 124 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:40:124 2026-02-21T09:08:18.9110364Z shl.b32 %r873, %r868, 16; 2026-02-21T09:08:18.9110538Z or.b32 %r874, %r31, %r873; 2026-02-21T09:08:18.9110732Z mad.wide.s32 %rd250, %r874, 2, %rd53; 2026-02-21T09:08:18.9110936Z add.s32 %r2823, %r32, %r75; 2026-02-21T09:08:18.9111118Z mov.b32 %r1345, 0f00000000; 2026-02-21T09:08:18.9111294Z mov.b64 %rd251, -32; 2026-02-21T09:08:18.9111475Z mov.b32 %r1346, %r1345; 2026-02-21T09:08:18.9111641Z mov.b32 %r1347, %r1345; 2026-02-21T09:08:18.9111818Z mov.b32 %r1348, %r1345; 2026-02-21T09:08:18.9122853Z mov.b32 %r1349, %r1345; 2026-02-21T09:08:18.9123062Z mov.b32 %r1350, %r1345; 2026-02-21T09:08:18.9123249Z mov.b32 %r1351, %r1345; 2026-02-21T09:08:18.9123431Z mov.b32 %r1352, %r1345; 2026-02-21T09:08:18.9123662Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T09:08:18.9123986Z // => This Inner Loop Header: Depth=2 2026-02-21T09:08:18.9124391Z .loc 1 48 80 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:48:80 2026-02-21T09:08:18.9124784Z // begin inline asm 2026-02-21T09:08:18.9124949Z mov.u64 %rd113, 0x0; 2026-02-21T09:08:18.9125203Z createpolicy.fractional.L2::evict_last.b64 %rd113, 1.0; 2026-02-21T09:08:18.9125465Z // end inline asm 2026-02-21T09:08:18.9125620Z // begin inline asm 2026-02-21T09:08:18.9125783Z mov.u32 %r875, 0x0; 2026-02-21T09:08:18.9125930Z mov.u32 %r876, 0x0; 2026-02-21T09:08:18.9126217Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r875, %r876 }, [ %rd250 + 0 ], %rd113; 2026-02-21T09:08:18.9126721Z // end inline asm 2026-02-21T09:08:18.9127043Z .loc 1 52 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:52:32 2026-02-21T09:08:18.9127416Z bar.sync 0; 2026-02-21T09:08:18.9127590Z st.shared.v2.b32 [%r15], {%r875, %r876}; 2026-02-21T09:08:18.9127807Z bar.sync 0; 2026-02-21T09:08:18.9127961Z ld.shared.b16 %rs79, [%r16]; 2026-02-21T09:08:18.9128158Z ld.shared.b16 %rs80, [%r16+512]; 
2026-02-21T09:08:18.9128353Z ld.shared.b16 %rs81, [%r16+32]; 2026-02-21T09:08:18.9128721Z ld.shared.b16 %rs82, [%r16+544]; 2026-02-21T09:08:18.9128924Z ld.shared.b16 %rs83, [%r17]; 2026-02-21T09:08:18.9129126Z ld.shared.b16 %rs84, [%r17+512]; 2026-02-21T09:08:18.9129319Z ld.shared.b16 %rs85, [%r17+32]; 2026-02-21T09:08:18.9129529Z ld.shared.b16 %rs86, [%r17+544]; 2026-02-21T09:08:18.9129729Z ld.shared.b16 %rs87, [%r18]; 2026-02-21T09:08:18.9129931Z ld.shared.b16 %rs88, [%r18+512]; 2026-02-21T09:08:18.9130141Z ld.shared.b16 %rs89, [%r18+32]; 2026-02-21T09:08:18.9130335Z ld.shared.b16 %rs90, [%r18+544]; 2026-02-21T09:08:18.9130543Z ld.shared.b16 %rs91, [%r19]; 2026-02-21T09:08:18.9130725Z ld.shared.b16 %rs92, [%r19+512]; 2026-02-21T09:08:18.9130924Z ld.shared.b16 %rs93, [%r19+32]; 2026-02-21T09:08:18.9131294Z ld.shared.b16 %rs94, [%r19+544]; 2026-02-21T09:08:18.9131506Z cvt.f32.bf16 %r1013, %rs79; 2026-02-21T09:08:18.9131693Z cvt.f32.bf16 %r1014, %rs80; 2026-02-21T09:08:18.9131887Z cvt.f32.bf16 %r1015, %rs83; 2026-02-21T09:08:18.9132085Z cvt.f32.bf16 %r1016, %rs84; 2026-02-21T09:08:18.9132265Z cvt.f32.bf16 %r1033, %rs87; 2026-02-21T09:08:18.9132451Z cvt.f32.bf16 %r1034, %rs88; 2026-02-21T09:08:18.9132632Z cvt.f32.bf16 %r1035, %rs91; 2026-02-21T09:08:18.9132814Z cvt.f32.bf16 %r1036, %rs92; 2026-02-21T09:08:18.9132990Z cvt.f32.bf16 %r1053, %rs81; 2026-02-21T09:08:18.9133163Z cvt.f32.bf16 %r1054, %rs82; 2026-02-21T09:08:18.9133352Z cvt.f32.bf16 %r1055, %rs85; 2026-02-21T09:08:18.9133534Z cvt.f32.bf16 %r1056, %rs86; 2026-02-21T09:08:18.9133709Z cvt.f32.bf16 %r1073, %rs89; 2026-02-21T09:08:18.9133883Z cvt.f32.bf16 %r1074, %rs90; 2026-02-21T09:08:18.9134059Z cvt.f32.bf16 %r1075, %rs93; 2026-02-21T09:08:18.9134226Z cvt.f32.bf16 %r1076, %rs94; 2026-02-21T09:08:18.9134646Z .loc 1 54 34 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:54:34 2026-02-21T09:08:18.9135032Z cvt.s64.s32 %rd145, %r2823; 2026-02-21T09:08:18.9135251Z add.s64 %rd117, %rd54, %rd145; 
2026-02-21T09:08:18.9135597Z .loc 1 54 87 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:54:87 2026-02-21T09:08:18.9135962Z // begin inline asm 2026-02-21T09:08:18.9136137Z mov.u64 %rd116, 0x0; 2026-02-21T09:08:18.9136363Z createpolicy.fractional.L2::evict_first.b64 %rd116, 1.0; 2026-02-21T09:08:18.9136767Z // end inline asm 2026-02-21T09:08:18.9136935Z // begin inline asm 2026-02-21T09:08:18.9137102Z mov.u16 %rs77, 0x0; 2026-02-21T09:08:18.9137357Z ld.global.L1::evict_first.L2::cache_hint.b16 { %rs77 }, [ %rd117 + 0 ], %rd116; 2026-02-21T09:08:18.9137679Z // end inline asm 2026-02-21T09:08:18.9137839Z shr.u16 %rs95, %rs77, 8; 2026-02-21T09:08:18.9138166Z .loc 1 62 28 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:62:28 2026-02-21T09:08:18.9138529Z bar.sync 0; 2026-02-21T09:08:18.9138688Z st.shared.b8 [%r20], %rs77; 2026-02-21T09:08:18.9138881Z st.shared.b8 [%r21+512], %rs95; 2026-02-21T09:08:18.9139061Z bar.sync 0; 2026-02-21T09:08:18.9139232Z ld.shared.b32 %r1427, [%r22]; 2026-02-21T09:08:18.9139431Z prmt.b32 %r1428, %r1427, 0, 0x7770U; 2026-02-21T09:08:18.9139650Z cvt.u16.u32 %rs96, %r1428; 2026-02-21T09:08:18.9139834Z prmt.b32 %r1429, %r1427, 0, 0x7771U; 2026-02-21T09:08:18.9140032Z cvt.u16.u32 %rs97, %r1429; 2026-02-21T09:08:18.9140210Z prmt.b32 %r1430, %r1427, 0, 0x7772U; 2026-02-21T09:08:18.9140398Z cvt.u16.u32 %rs98, %r1430; 2026-02-21T09:08:18.9140575Z prmt.b32 %r1431, %r1427, 0, 0x7773U; 2026-02-21T09:08:18.9140763Z cvt.u16.u32 %rs99, %r1431; 2026-02-21T09:08:18.9141090Z .loc 1 57 28 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:57:28 2026-02-21T09:08:18.9141451Z shl.b16 %rs100, %rs96, 4; 2026-02-21T09:08:18.9141639Z shl.b16 %rs101, %rs97, 4; 2026-02-21T09:08:18.9141810Z shl.b16 %rs102, %rs98, 4; 2026-02-21T09:08:18.9141982Z shl.b16 %rs103, %rs99, 4; 2026-02-21T09:08:18.9142303Z .loc 1 72 58 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:72:58 2026-02-21T09:08:18.9142779Z selp.b16 %rs104, %rs100, %rs96, 
%p93; 2026-02-21T09:08:18.9142994Z cvt.s16.s8 %rs105, %rs104; 2026-02-21T09:08:18.9143167Z shr.s16 %rs106, %rs105, 4; 2026-02-21T09:08:18.9143351Z selp.b16 %rs107, %rs101, %rs97, %p93; 2026-02-21T09:08:18.9143547Z cvt.s16.s8 %rs108, %rs107; 2026-02-21T09:08:18.9143724Z shr.s16 %rs109, %rs108, 4; 2026-02-21T09:08:18.9143898Z selp.b16 %rs110, %rs102, %rs98, %p93; 2026-02-21T09:08:18.9144100Z cvt.s16.s8 %rs111, %rs110; 2026-02-21T09:08:18.9144285Z shr.s16 %rs112, %rs111, 4; 2026-02-21T09:08:18.9144465Z selp.b16 %rs113, %rs103, %rs99, %p93; 2026-02-21T09:08:18.9144665Z cvt.s16.s8 %rs114, %rs113; 2026-02-21T09:08:18.9144835Z shr.s16 %rs115, %rs114, 4; 2026-02-21T09:08:18.9145324Z .loc 1 77 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:77:32 2026-02-21T09:08:18.9145699Z cvt.rn.f32.s16 %r1432, %rs106; 2026-02-21T09:08:18.9145894Z cvt.rn.f32.s16 %r1433, %rs109; 2026-02-21T09:08:18.9146079Z cvt.rn.f32.s16 %r1434, %rs112; 2026-02-21T09:08:18.9146263Z cvt.rn.f32.s16 %r1435, %rs115; 2026-02-21T09:08:18.9146441Z bar.sync 0; 2026-02-21T09:08:18.9146729Z st.shared.b32 [%r23], %r1432; 2026-02-21T09:08:18.9146921Z st.shared.b32 [%r24], %r1433; 2026-02-21T09:08:18.9147099Z st.shared.b32 [%r25], %r1434; 2026-02-21T09:08:18.9147279Z st.shared.b32 [%r26], %r1435; 2026-02-21T09:08:18.9147528Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r1345}; 2026-02-21T09:08:18.9147797Z bar.sync 0; 2026-02-21T09:08:18.9147937Z // begin inline asm 2026-02-21T09:08:18.9148224Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r937, %r1017, %r1097, %r1177}, [%r266]; 2026-02-21T09:08:18.9148649Z // end inline asm 2026-02-21T09:08:18.9148802Z bar.sync 0; 2026-02-21T09:08:18.9149099Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r1347}; 2026-02-21T09:08:18.9149362Z bar.sync 0; 2026-02-21T09:08:18.9149506Z // begin inline asm 2026-02-21T09:08:18.9149776Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r939, %r1019, %r1099, %r1179}, [%r266]; 2026-02-21T09:08:18.9150101Z // end inline asm 
2026-02-21T09:08:18.9150246Z bar.sync 0; 2026-02-21T09:08:18.9150457Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r1346}; 2026-02-21T09:08:18.9150727Z bar.sync 0; 2026-02-21T09:08:18.9150871Z // begin inline asm 2026-02-21T09:08:18.9151144Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r938, %r1018, %r1098, %r1178}, [%r266]; 2026-02-21T09:08:18.9151465Z // end inline asm 2026-02-21T09:08:18.9151612Z bar.sync 0; 2026-02-21T09:08:18.9151818Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r1348}; 2026-02-21T09:08:18.9152083Z bar.sync 0; 2026-02-21T09:08:18.9152219Z // begin inline asm 2026-02-21T09:08:18.9152492Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r940, %r1020, %r1100, %r1180}, [%r266]; 2026-02-21T09:08:18.9152814Z // end inline asm 2026-02-21T09:08:18.9152958Z bar.sync 0; 2026-02-21T09:08:18.9153158Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r1349}; 2026-02-21T09:08:18.9153417Z bar.sync 0; 2026-02-21T09:08:18.9153556Z // begin inline asm 2026-02-21T09:08:18.9153816Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r941, %r1021, %r1101, %r1181}, [%r266]; 2026-02-21T09:08:18.9154135Z // end inline asm 2026-02-21T09:08:18.9154272Z bar.sync 0; 2026-02-21T09:08:18.9154480Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r1351}; 2026-02-21T09:08:18.9154748Z bar.sync 0; 2026-02-21T09:08:18.9154890Z // begin inline asm 2026-02-21T09:08:18.9155154Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r943, %r1023, %r1103, %r1183}, [%r266]; 2026-02-21T09:08:18.9155473Z // end inline asm 2026-02-21T09:08:18.9155617Z bar.sync 0; 2026-02-21T09:08:18.9155818Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r1350}; 2026-02-21T09:08:18.9156078Z bar.sync 0; 2026-02-21T09:08:18.9156215Z // begin inline asm 2026-02-21T09:08:18.9156624Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r942, %r1022, %r1102, %r1182}, [%r266]; 2026-02-21T09:08:18.9156960Z // end inline asm 2026-02-21T09:08:18.9157204Z bar.sync 0; 2026-02-21T09:08:18.9157408Z 
stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r1352}; 2026-02-21T09:08:18.9157670Z bar.sync 0; 2026-02-21T09:08:18.9157813Z // begin inline asm 2026-02-21T09:08:18.9158079Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r944, %r1024, %r1104, %r1184}, [%r266]; 2026-02-21T09:08:18.9158400Z // end inline asm 2026-02-21T09:08:18.9158540Z $L__tmp5: 2026-02-21T09:08:18.9158900Z .loc 2 291 36 // standard.py:291:36 @[ ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:84:40 ] 2026-02-21T09:08:18.9159318Z // begin inline asm 2026-02-21T09:08:18.9159492Z fence.proxy.async.shared::cta; 2026-02-21T09:08:18.9159681Z // end inline asm 2026-02-21T09:08:18.9159846Z shfl.sync.idx.b32 %r1436, %r5, 0, 31, -1; 2026-02-21T09:08:18.9160224Z wgmma.fence.sync.aligned; 2026-02-21T09:08:18.9160415Z mov.pred %p24, -1; 2026-02-21T09:08:18.9160575Z // begin inline asm 2026-02-21T09:08:18.9161013Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r937,%r938,%r939,%r940,%r941,%r942,%r943,%r944}, {%r1013,%r1014,%r1015,%r1016}, %rd1, %p24, 1, 1; 2026-02-21T09:08:18.9161516Z // end inline asm 2026-02-21T09:08:18.9161657Z // begin inline asm 2026-02-21T09:08:18.9162090Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r937,%r938,%r939,%r940,%r941,%r942,%r943,%r944}, {%r1033,%r1034,%r1035,%r1036}, %rd2, %p24, 1, 1; 2026-02-21T09:08:18.9162572Z // end inline asm 2026-02-21T09:08:18.9162713Z // begin inline asm 2026-02-21T09:08:18.9163142Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r937,%r938,%r939,%r940,%r941,%r942,%r943,%r944}, {%r1053,%r1054,%r1055,%r1056}, %rd3, %p24, 1, 1; 2026-02-21T09:08:18.9163618Z // end inline asm 2026-02-21T09:08:18.9163763Z // begin inline asm 2026-02-21T09:08:18.9164281Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r937,%r938,%r939,%r940,%r941,%r942,%r943,%r944}, {%r1073,%r1074,%r1075,%r1076}, %rd4, %p24, 1, 1; 2026-02-21T09:08:18.9164771Z // end inline asm 2026-02-21T09:08:18.9164918Z // begin inline asm 2026-02-21T09:08:18.9165394Z 
wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1017,%r1018,%r1019,%r1020,%r1021,%r1022,%r1023,%r1024}, {%r1013,%r1014,%r1015,%r1016}, %rd5, %p24, 1, 1; 2026-02-21T09:08:18.9165904Z // end inline asm 2026-02-21T09:08:18.9166045Z // begin inline asm 2026-02-21T09:08:18.9166624Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1017,%r1018,%r1019,%r1020,%r1021,%r1022,%r1023,%r1024}, {%r1033,%r1034,%r1035,%r1036}, %rd6, %p24, 1, 1; 2026-02-21T09:08:18.9167140Z // end inline asm 2026-02-21T09:08:18.9167281Z // begin inline asm 2026-02-21T09:08:18.9167739Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1017,%r1018,%r1019,%r1020,%r1021,%r1022,%r1023,%r1024}, {%r1053,%r1054,%r1055,%r1056}, %rd7, %p24, 1, 1; 2026-02-21T09:08:18.9168247Z // end inline asm 2026-02-21T09:08:18.9168393Z // begin inline asm 2026-02-21T09:08:18.9168844Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1017,%r1018,%r1019,%r1020,%r1021,%r1022,%r1023,%r1024}, {%r1073,%r1074,%r1075,%r1076}, %rd8, %p24, 1, 1; 2026-02-21T09:08:18.9169354Z // end inline asm 2026-02-21T09:08:18.9169515Z // begin inline asm 2026-02-21T09:08:18.9169969Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1097,%r1098,%r1099,%r1100,%r1101,%r1102,%r1103,%r1104}, {%r1013,%r1014,%r1015,%r1016}, %rd9, %p24, 1, 1; 2026-02-21T09:08:18.9170476Z // end inline asm 2026-02-21T09:08:18.9170615Z // begin inline asm 2026-02-21T09:08:18.9171072Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1097,%r1098,%r1099,%r1100,%r1101,%r1102,%r1103,%r1104}, {%r1033,%r1034,%r1035,%r1036}, %rd10, %p24, 1, 1; 2026-02-21T09:08:18.9171577Z // end inline asm 2026-02-21T09:08:18.9171724Z // begin inline asm 2026-02-21T09:08:18.9172189Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1097,%r1098,%r1099,%r1100,%r1101,%r1102,%r1103,%r1104}, {%r1053,%r1054,%r1055,%r1056}, %rd11, %p24, 1, 1; 2026-02-21T09:08:18.9172698Z // end inline asm 2026-02-21T09:08:18.9172845Z // begin inline asm 2026-02-21T09:08:18.9173398Z 
wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1097,%r1098,%r1099,%r1100,%r1101,%r1102,%r1103,%r1104}, {%r1073,%r1074,%r1075,%r1076}, %rd12, %p24, 1, 1; 2026-02-21T09:08:18.9173906Z // end inline asm 2026-02-21T09:08:18.9174050Z // begin inline asm 2026-02-21T09:08:18.9174513Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1177,%r1178,%r1179,%r1180,%r1181,%r1182,%r1183,%r1184}, {%r1013,%r1014,%r1015,%r1016}, %rd13, %p24, 1, 1; 2026-02-21T09:08:18.9175027Z // end inline asm 2026-02-21T09:08:18.9175176Z // begin inline asm 2026-02-21T09:08:18.9175628Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1177,%r1178,%r1179,%r1180,%r1181,%r1182,%r1183,%r1184}, {%r1033,%r1034,%r1035,%r1036}, %rd14, %p24, 1, 1; 2026-02-21T09:08:18.9176212Z // end inline asm 2026-02-21T09:08:18.9176571Z // begin inline asm 2026-02-21T09:08:18.9177065Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1177,%r1178,%r1179,%r1180,%r1181,%r1182,%r1183,%r1184}, {%r1053,%r1054,%r1055,%r1056}, %rd15, %p24, 1, 1; 2026-02-21T09:08:18.9177581Z // end inline asm 2026-02-21T09:08:18.9177731Z // begin inline asm 2026-02-21T09:08:18.9178197Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1177,%r1178,%r1179,%r1180,%r1181,%r1182,%r1183,%r1184}, {%r1073,%r1074,%r1075,%r1076}, %rd16, %p24, 1, 1; 2026-02-21T09:08:18.9178708Z // end inline asm 2026-02-21T09:08:18.9178887Z wgmma.commit_group.sync.aligned; 2026-02-21T09:08:18.9179087Z mov.b32 %r1415, 0; 2026-02-21T09:08:18.9179244Z mov.b32 %r1270, %r1415; 2026-02-21T09:08:18.9179408Z mov.b32 %r1271, %r1415; 2026-02-21T09:08:18.9179575Z mov.b32 %r1269, %r2624; 2026-02-21T09:08:18.9179734Z // begin inline asm 2026-02-21T09:08:18.9180451Z // wait for regs: %r937,%r938,%r939,%r940,%r941,%r942,%r943,%r944,%r1017,%r1018,%r1019,%r1020,%r1021,%r1022,%r1023,%r1024,%r1097,%r1098,%r1099,%r1100,%r1101,%r1102,%r1103,%r1104,%r1177,%r1178,%r1179,%r1180,%r1181,%r1182,%r1183,%r1184,%r1269,%r1270,%r1271 2026-02-21T09:08:18.9181169Z 
wgmma.wait_group.sync.aligned 0; 2026-02-21T09:08:18.9181364Z // end inline asm 2026-02-21T09:08:18.9181508Z $L__tmp6: 2026-02-21T09:08:18.9181802Z .loc 1 48 80 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:48:80 2026-02-21T09:08:18.9182170Z add.s64 %rd136, %rd250, 64; 2026-02-21T09:08:18.9182345Z // begin inline asm 2026-02-21T09:08:18.9182500Z mov.u64 %rd135, 0x0; 2026-02-21T09:08:18.9182724Z createpolicy.fractional.L2::evict_last.b64 %rd135, 1.0; 2026-02-21T09:08:18.9182976Z // end inline asm 2026-02-21T09:08:18.9183126Z // begin inline asm 2026-02-21T09:08:18.9183276Z mov.u32 %r1307, 0x0; 2026-02-21T09:08:18.9183431Z mov.u32 %r1308, 0x0; 2026-02-21T09:08:18.9183711Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r1307, %r1308 }, [ %rd136 + 0 ], %rd135; 2026-02-21T09:08:18.9184048Z // end inline asm 2026-02-21T09:08:18.9184359Z .loc 1 52 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:52:32 2026-02-21T09:08:18.9184719Z bar.sync 0; 2026-02-21T09:08:18.9184886Z st.shared.v2.b32 [%r15], {%r1307, %r1308}; 2026-02-21T09:08:18.9185093Z bar.sync 0; 2026-02-21T09:08:18.9185249Z ld.shared.b16 %rs116, [%r16]; 2026-02-21T09:08:18.9185451Z ld.shared.b16 %rs117, [%r16+512]; 2026-02-21T09:08:18.9185655Z ld.shared.b16 %rs118, [%r16+32]; 2026-02-21T09:08:18.9185848Z ld.shared.b16 %rs119, [%r16+544]; 2026-02-21T09:08:18.9186045Z ld.shared.b16 %rs120, [%r17]; 2026-02-21T09:08:18.9186226Z ld.shared.b16 %rs121, [%r17+512]; 2026-02-21T09:08:18.9186426Z ld.shared.b16 %rs122, [%r17+32]; 2026-02-21T09:08:18.9186765Z ld.shared.b16 %rs123, [%r17+544]; 2026-02-21T09:08:18.9186966Z ld.shared.b16 %rs124, [%r18]; 2026-02-21T09:08:18.9187162Z ld.shared.b16 %rs125, [%r18+512]; 2026-02-21T09:08:18.9187362Z ld.shared.b16 %rs126, [%r18+32]; 2026-02-21T09:08:18.9187557Z ld.shared.b16 %rs127, [%r18+544]; 2026-02-21T09:08:18.9187745Z ld.shared.b16 %rs128, [%r19]; 2026-02-21T09:08:18.9187937Z ld.shared.b16 %rs129, [%r19+512]; 2026-02-21T09:08:18.9188227Z ld.shared.b16 
%rs130, [%r19+32]; 2026-02-21T09:08:18.9188491Z ld.shared.b16 %rs131, [%r19+544]; 2026-02-21T09:08:18.9188689Z cvt.f32.bf16 %r1341, %rs116; 2026-02-21T09:08:18.9188871Z cvt.f32.bf16 %r1342, %rs117; 2026-02-21T09:08:18.9189053Z cvt.f32.bf16 %r1343, %rs120; 2026-02-21T09:08:18.9189229Z cvt.f32.bf16 %r1344, %rs121; 2026-02-21T09:08:18.9189407Z cvt.f32.bf16 %r1361, %rs124; 2026-02-21T09:08:18.9189579Z cvt.f32.bf16 %r1362, %rs125; 2026-02-21T09:08:18.9189765Z cvt.f32.bf16 %r1363, %rs128; 2026-02-21T09:08:18.9189939Z cvt.f32.bf16 %r1364, %rs129; 2026-02-21T09:08:18.9190119Z cvt.f32.bf16 %r1381, %rs118; 2026-02-21T09:08:18.9190302Z cvt.f32.bf16 %r1382, %rs119; 2026-02-21T09:08:18.9190484Z cvt.f32.bf16 %r1383, %rs122; 2026-02-21T09:08:18.9190818Z cvt.f32.bf16 %r1384, %rs123; 2026-02-21T09:08:18.9191000Z cvt.f32.bf16 %r1401, %rs126; 2026-02-21T09:08:18.9191177Z cvt.f32.bf16 %r1402, %rs127; 2026-02-21T09:08:18.9191349Z cvt.f32.bf16 %r1403, %rs130; 2026-02-21T09:08:18.9191530Z cvt.f32.bf16 %r1404, %rs131; 2026-02-21T09:08:18.9191705Z cvt.u32.u64 %r1437, %rd251; 2026-02-21T09:08:18.9191886Z add.s32 %r1438, %r1437, 48; 2026-02-21T09:08:18.9192206Z .loc 1 54 62 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:54:62 2026-02-21T09:08:18.9192568Z or.b32 %r1439, %r5, %r1438; 2026-02-21T09:08:18.9192743Z shl.b32 %r1440, %r1439, 13; 2026-02-21T09:08:18.9192806Z add.s32 %r1441, %r1440, %r76; 2026-02-21T09:08:18.9193014Z .loc 1 54 34 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:54:34 2026-02-21T09:08:18.9193080Z cvt.s64.s32 %rd146, %r1441; 2026-02-21T09:08:18.9193153Z add.s64 %rd139, %rd54, %rd146; 2026-02-21T09:08:18.9193428Z .loc 1 54 87 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:54:87 2026-02-21T09:08:18.9193496Z // begin inline asm 2026-02-21T09:08:18.9193562Z mov.u64 %rd138, 0x0; 2026-02-21T09:08:18.9193693Z createpolicy.fractional.L2::evict_first.b64 %rd138, 1.0; 2026-02-21T09:08:18.9193755Z // end inline asm 2026-02-21T09:08:18.9193818Z // 
begin inline asm 2026-02-21T09:08:18.9193878Z mov.u16 %rs78, 0x0; 2026-02-21T09:08:18.9194041Z ld.global.L1::evict_first.L2::cache_hint.b16 { %rs78 }, [ %rd139 + 0 ], %rd138; 2026-02-21T09:08:18.9194099Z // end inline asm 2026-02-21T09:08:18.9194177Z shr.u16 %rs132, %rs78, 8; 2026-02-21T09:08:18.9194393Z .loc 1 62 28 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:62:28 2026-02-21T09:08:18.9194451Z bar.sync 0; 2026-02-21T09:08:18.9194523Z st.shared.b8 [%r20], %rs78; 2026-02-21T09:08:18.9194591Z st.shared.b8 [%r21+512], %rs132; 2026-02-21T09:08:18.9194650Z bar.sync 0; 2026-02-21T09:08:18.9194720Z ld.shared.b32 %r1442, [%r22]; 2026-02-21T09:08:18.9194797Z prmt.b32 %r1443, %r1442, 0, 0x7770U; 2026-02-21T09:08:18.9194861Z cvt.u16.u32 %rs133, %r1443; 2026-02-21T09:08:18.9194926Z prmt.b32 %r1444, %r1442, 0, 0x7771U; 2026-02-21T09:08:18.9194995Z cvt.u16.u32 %rs134, %r1444; 2026-02-21T09:08:18.9195059Z prmt.b32 %r1445, %r1442, 0, 0x7772U; 2026-02-21T09:08:18.9195122Z cvt.u16.u32 %rs135, %r1445; 2026-02-21T09:08:18.9195192Z prmt.b32 %r1446, %r1442, 0, 0x7773U; 2026-02-21T09:08:18.9195260Z cvt.u16.u32 %rs136, %r1446; 2026-02-21T09:08:18.9195464Z .loc 1 57 28 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:57:28 2026-02-21T09:08:18.9195531Z shl.b16 %rs137, %rs133, 4; 2026-02-21T09:08:18.9195600Z shl.b16 %rs138, %rs134, 4; 2026-02-21T09:08:18.9195663Z shl.b16 %rs139, %rs135, 4; 2026-02-21T09:08:18.9195722Z shl.b16 %rs140, %rs136, 4; 2026-02-21T09:08:18.9195929Z .loc 1 72 58 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:72:58 2026-02-21T09:08:18.9196009Z selp.b16 %rs141, %rs137, %rs133, %p93; 2026-02-21T09:08:18.9196076Z cvt.s16.s8 %rs142, %rs141; 2026-02-21T09:08:18.9196143Z shr.s16 %rs143, %rs142, 4; 2026-02-21T09:08:18.9196217Z selp.b16 %rs144, %rs138, %rs134, %p93; 2026-02-21T09:08:18.9196342Z cvt.s16.s8 %rs145, %rs144; 2026-02-21T09:08:18.9196404Z shr.s16 %rs146, %rs145, 4; 2026-02-21T09:08:18.9196604Z selp.b16 %rs147, %rs139, %rs135, 
%p93; 2026-02-21T09:08:18.9196672Z cvt.s16.s8 %rs148, %rs147; 2026-02-21T09:08:18.9196733Z shr.s16 %rs149, %rs148, 4; 2026-02-21T09:08:18.9196808Z selp.b16 %rs150, %rs140, %rs136, %p93; 2026-02-21T09:08:18.9196870Z cvt.s16.s8 %rs151, %rs150; 2026-02-21T09:08:18.9196931Z shr.s16 %rs152, %rs151, 4; 2026-02-21T09:08:18.9197134Z .loc 1 77 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:77:32 2026-02-21T09:08:18.9197213Z cvt.rn.f32.s16 %r1447, %rs143; 2026-02-21T09:08:18.9197276Z cvt.rn.f32.s16 %r1448, %rs146; 2026-02-21T09:08:18.9197450Z cvt.rn.f32.s16 %r1449, %rs149; 2026-02-21T09:08:18.9197585Z cvt.rn.f32.s16 %r1450, %rs152; 2026-02-21T09:08:18.9197644Z bar.sync 0; 2026-02-21T09:08:18.9197710Z st.shared.b32 [%r23], %r1447; 2026-02-21T09:08:18.9197786Z st.shared.b32 [%r24], %r1448; 2026-02-21T09:08:18.9197853Z st.shared.b32 [%r25], %r1449; 2026-02-21T09:08:18.9197922Z st.shared.b32 [%r26], %r1450; 2026-02-21T09:08:18.9197978Z $L__tmp7: 2026-02-21T09:08:18.9198265Z .loc 2 291 36 // standard.py:291:36 @[ ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:84:40 ] 2026-02-21T09:08:18.9198457Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r266], {%r937, %r1017, %r1097, %r1177}; 2026-02-21T09:08:18.9198515Z bar.sync 0; 2026-02-21T09:08:18.9198582Z // begin inline asm 2026-02-21T09:08:18.9198719Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1345}, [%r695]; 2026-02-21T09:08:18.9198778Z // end inline asm 2026-02-21T09:08:18.9198834Z bar.sync 0; 2026-02-21T09:08:18.9199022Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r266], {%r939, %r1019, %r1099, %r1179}; 2026-02-21T09:08:18.9199158Z bar.sync 0; 2026-02-21T09:08:18.9199225Z // begin inline asm 2026-02-21T09:08:18.9199363Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1347}, [%r695]; 2026-02-21T09:08:18.9199423Z // end inline asm 2026-02-21T09:08:18.9199479Z bar.sync 0; 2026-02-21T09:08:18.9199663Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r266], {%r938, %r1018, %r1098, %r1178}; 2026-02-21T09:08:18.9199720Z 
bar.sync 0; 2026-02-21T09:08:18.9199779Z // begin inline asm 2026-02-21T09:08:18.9199907Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1346}, [%r695]; 2026-02-21T09:08:18.9199973Z // end inline asm 2026-02-21T09:08:18.9200029Z bar.sync 0; 2026-02-21T09:08:18.9200205Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r266], {%r940, %r1020, %r1100, %r1180}; 2026-02-21T09:08:18.9200268Z bar.sync 0; 2026-02-21T09:08:18.9200329Z // begin inline asm 2026-02-21T09:08:18.9200457Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1348}, [%r695]; 2026-02-21T09:08:18.9200517Z // end inline asm 2026-02-21T09:08:18.9200585Z bar.sync 0; 2026-02-21T09:08:18.9200757Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r266], {%r941, %r1021, %r1101, %r1181}; 2026-02-21T09:08:18.9200814Z bar.sync 0; 2026-02-21T09:08:18.9200879Z // begin inline asm 2026-02-21T09:08:18.9201006Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1349}, [%r695]; 2026-02-21T09:08:18.9201063Z // end inline asm 2026-02-21T09:08:18.9201118Z bar.sync 0; 2026-02-21T09:08:18.9201295Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r266], {%r943, %r1023, %r1103, %r1183}; 2026-02-21T09:08:18.9201352Z bar.sync 0; 2026-02-21T09:08:18.9201412Z // begin inline asm 2026-02-21T09:08:18.9201543Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1351}, [%r695]; 2026-02-21T09:08:18.9201600Z // end inline asm 2026-02-21T09:08:18.9201655Z bar.sync 0; 2026-02-21T09:08:18.9201843Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r266], {%r942, %r1022, %r1102, %r1182}; 2026-02-21T09:08:18.9201899Z bar.sync 0; 2026-02-21T09:08:18.9201960Z // begin inline asm 2026-02-21T09:08:18.9202091Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1350}, [%r695]; 2026-02-21T09:08:18.9202158Z // end inline asm 2026-02-21T09:08:18.9202213Z bar.sync 0; 2026-02-21T09:08:18.9202461Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r266], {%r944, %r1024, %r1104, %r1184}; 2026-02-21T09:08:18.9202521Z bar.sync 0; 2026-02-21T09:08:18.9202579Z // begin inline asm 2026-02-21T09:08:18.9202704Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1352}, [%r695]; 2026-02-21T09:08:18.9202759Z // end inline asm 2026-02-21T09:08:18.9202822Z // begin inline asm 2026-02-21T09:08:18.9202900Z fence.proxy.async.shared::cta; 2026-02-21T09:08:18.9202956Z // end inline asm 2026-02-21T09:08:18.9203036Z wgmma.fence.sync.aligned; 2026-02-21T09:08:18.9203097Z shl.b32 %r1451, %r1436, 9; 2026-02-21T09:08:18.9203160Z and.b32 %r1452, %r1451, 6144; 2026-02-21T09:08:18.9203230Z add.s32 %r1453, %r1452, %r2624; 2026-02-21T09:08:18.9203296Z bfe.u32 %r1454, %r1453, 4, 14; 2026-02-21T09:08:18.9203456Z cvt.u64.u32 %rd147, %r1454; 2026-02-21T09:08:18.9203539Z or.b64 %rd141, %rd147, 4611686293338849280; 2026-02-21T09:08:18.9203606Z // begin inline asm 2026-02-21T09:08:18.9203992Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1345,%r1346,%r1347,%r1348,%r1349,%r1350,%r1351,%r1352}, {%r1341,%r1342,%r1343,%r1344}, %rd141, %p24, 1, 1; 2026-02-21T09:08:18.9204051Z // end inline asm 2026-02-21T09:08:18.9204117Z add.s32 %r1455, %r1453, 32; 2026-02-21T09:08:18.9204179Z bfe.u32 %r1456, %r1455, 4, 14; 2026-02-21T09:08:18.9204240Z cvt.u64.u32 %rd148, %r1456; 2026-02-21T09:08:18.9204316Z or.b64 %rd142, %rd148, 4611686293338849280; 2026-02-21T09:08:18.9204382Z // begin inline asm 2026-02-21T09:08:18.9204751Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1345,%r1346,%r1347,%r1348,%r1349,%r1350,%r1351,%r1352}, {%r1361,%r1362,%r1363,%r1364}, %rd142, %p24, 1, 1; 2026-02-21T09:08:18.9204808Z // end inline asm 2026-02-21T09:08:18.9204874Z add.s32 %r1457, %r1453, 64; 2026-02-21T09:08:18.9204940Z bfe.u32 %r1458, %r1457, 4, 14; 2026-02-21T09:08:18.9205061Z cvt.u64.u32 %rd149, %r1458; 2026-02-21T09:08:18.9205144Z or.b64 %rd143, %rd149, 4611686293338849280; 2026-02-21T09:08:18.9205212Z // begin inline asm 2026-02-21T09:08:18.9205583Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1345,%r1346,%r1347,%r1348,%r1349,%r1350,%r1351,%r1352}, {%r1381,%r1382,%r1383,%r1384}, %rd143, %p24, 1, 1; 
2026-02-21T09:08:18.9205646Z // end inline asm 2026-02-21T09:08:18.9205708Z add.s32 %r1459, %r1453, 96; 2026-02-21T09:08:18.9205769Z bfe.u32 %r1460, %r1459, 4, 14; 2026-02-21T09:08:18.9205831Z cvt.u64.u32 %rd150, %r1460; 2026-02-21T09:08:18.9205910Z or.b64 %rd144, %rd150, 4611686293338849280; 2026-02-21T09:08:18.9205970Z // begin inline asm 2026-02-21T09:08:18.9206336Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1345,%r1346,%r1347,%r1348,%r1349,%r1350,%r1351,%r1352}, {%r1401,%r1402,%r1403,%r1404}, %rd144, %p24, 1, 1; 2026-02-21T09:08:18.9206398Z // end inline asm 2026-02-21T09:08:18.9206589Z wgmma.commit_group.sync.aligned; 2026-02-21T09:08:18.9206657Z mov.b32 %r1414, %r1415; 2026-02-21T09:08:18.9206717Z mov.b32 %r1413, %r2624; 2026-02-21T09:08:18.9206781Z // begin inline asm 2026-02-21T09:08:18.9206962Z // wait for regs: %r1345,%r1346,%r1347,%r1348,%r1349,%r1350,%r1351,%r1352,%r1413,%r1414,%r1415 2026-02-21T09:08:18.9207040Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:08:18.9207106Z // end inline asm 2026-02-21T09:08:18.9207162Z $L__tmp8: 2026-02-21T09:08:18.9207384Z .loc 1 40 124 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:40:124 2026-02-21T09:08:18.9207451Z add.s64 %rd251, %rd251, 32; 2026-02-21T09:08:18.9207517Z add.s64 %rd250, %rd250, 128; 2026-02-21T09:08:18.9207592Z add.s32 %r2823, %r2823, 262144; 2026-02-21T09:08:18.9207662Z setp.lt.u64 %p45, %rd251, 480; 2026-02-21T09:08:18.9207731Z @%p45 bra $L__BB0_5; 2026-02-21T09:08:18.9207845Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:08:18.9208062Z .loc 1 31 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:31:32 2026-02-21T09:08:18.9208133Z or.b32 %r1466, %r75, %r11; 2026-02-21T09:08:18.9208417Z .loc 1 87 28 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:87:28 2026-02-21T09:08:18.9208500Z cvt.rn.bf16x2.f32 %r1467, %r1346, %r1345; 2026-02-21T09:08:18.9208580Z cvt.rn.bf16x2.f32 %r1468, %r1348, %r1347; 2026-02-21T09:08:18.9208656Z cvt.rn.bf16x2.f32 
%r1469, %r1350, %r1349; 2026-02-21T09:08:18.9208727Z cvt.rn.bf16x2.f32 %r1470, %r1352, %r1351; 2026-02-21T09:08:18.9208938Z .loc 1 88 43 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:88:43 2026-02-21T09:08:18.9209010Z shl.b32 %r1471, %r77, 13; 2026-02-21T09:08:18.9209210Z .loc 1 88 50 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:88:50 2026-02-21T09:08:18.9209275Z add.s32 %r1472, %r1466, %r1471; 2026-02-21T09:08:18.9209624Z .loc 1 88 22 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:88:22 2026-02-21T09:08:18.9209700Z mad.wide.s32 %rd151, %r1472, 2, %rd55; 2026-02-21T09:08:18.9209911Z .loc 1 88 81 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:88:81 2026-02-21T09:08:18.9209978Z bar.sync 0; 2026-02-21T09:08:18.9210168Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r29], {%r1467, %r1468, %r1469, %r1470}; 2026-02-21T09:08:18.9210229Z bar.sync 0; 2026-02-21T09:08:18.9210349Z ld.shared.v4.b32 {%r1461, %r1462, %r1463, %r1464}, [%r30]; 2026-02-21T09:08:18.9210411Z // begin inline asm 2026-02-21T09:08:18.9210542Z st.global.v4.b32 [ %rd151 + 0 ], { %r1461, %r1462, %r1463, %r1464 }; 2026-02-21T09:08:18.9210602Z // end inline asm 2026-02-21T09:08:18.9210821Z .loc 1 19 145 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:19:145 2026-02-21T09:08:18.9210886Z add.s32 %r1473, %r2813, 4224; 2026-02-21T09:08:18.9211156Z .loc 1 25 35 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:25:35 2026-02-21T09:08:18.9211227Z shr.s32 %r1474, %r1473, 31; 2026-02-21T09:08:18.9211287Z shr.u32 %r1475, %r1474, 24; 2026-02-21T09:08:18.9211353Z add.s32 %r1476, %r1473, %r1475; 2026-02-21T09:08:18.9211416Z shr.s32 %r1477, %r1476, 8; 2026-02-21T09:08:18.9211622Z .loc 1 26 33 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:26:33 2026-02-21T09:08:18.9211684Z shl.b32 %r1478, %r1477, 2; 2026-02-21T09:08:18.9211882Z .loc 1 27 39 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:27:39 2026-02-21T09:08:18.9211950Z sub.s32 
%r1479, 128, %r1478; 2026-02-21T09:08:18.9212148Z .loc 1 27 52 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:27:52 2026-02-21T09:08:18.9212208Z min.s32 %r1480, %r1479, 4; 2026-02-21T09:08:18.9212410Z .loc 1 28 45 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:28:45 2026-02-21T09:08:18.9212478Z and.b32 %r1481, %r1476, -256; 2026-02-21T09:08:18.9212542Z sub.s32 %r1482, %r1473, %r1481; 2026-02-21T09:08:18.9212744Z .loc 1 29 51 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:29:51 2026-02-21T09:08:18.9212808Z div.s32 %r1483, %r1482, %r1480; 2026-02-21T09:08:18.9213007Z .loc 1 28 64 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:28:64 2026-02-21T09:08:18.9213074Z mul.lo.s32 %r1484, %r1483, %r1480; 2026-02-21T09:08:18.9213159Z sub.s32 %r1485, %r1482, %r1484; 2026-02-21T09:08:18.9213367Z .loc 1 28 30 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:28:30 2026-02-21T09:08:18.9213432Z add.s32 %r1486, %r1485, %r1478; 2026-02-21T09:08:18.9213638Z .loc 1 30 27 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:30:27 2026-02-21T09:08:18.9213699Z shl.b32 %r97, %r1486, 6; 2026-02-21T09:08:18.9213902Z .loc 1 31 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:31:32 2026-02-21T09:08:18.9213969Z or.b32 %r98, %r97, %r9; 2026-02-21T09:08:18.9214167Z .loc 1 32 27 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:32:27 2026-02-21T09:08:18.9214289Z shl.b32 %r1487, %r1483, 6; 2026-02-21T09:08:18.9214493Z .loc 1 33 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:33:32 2026-02-21T09:08:18.9214555Z or.b32 %r99, %r1487, %r7; 2026-02-21T09:08:18.9214761Z .loc 1 40 124 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:40:124 2026-02-21T09:08:18.9214823Z shl.b32 %r1488, %r1483, 16; 2026-02-21T09:08:18.9214890Z or.b32 %r1489, %r31, %r1488; 2026-02-21T09:08:18.9214960Z mad.wide.s32 %rd252, %r1489, 2, %rd53; 2026-02-21T09:08:18.9215021Z add.s32 %r2832, %r32, %r97; 
2026-02-21T09:08:18.9215086Z mov.b32 %r1960, 0f00000000; 2026-02-21T09:08:18.9215147Z mov.b64 %rd253, -32; 2026-02-21T09:08:18.9215304Z mov.b32 %r1961, %r1960; 2026-02-21T09:08:18.9215374Z mov.b32 %r1962, %r1960; 2026-02-21T09:08:18.9215434Z mov.b32 %r1963, %r1960; 2026-02-21T09:08:18.9215492Z mov.b32 %r1964, %r1960; 2026-02-21T09:08:18.9215557Z mov.b32 %r1965, %r1960; 2026-02-21T09:08:18.9215621Z mov.b32 %r1966, %r1960; 2026-02-21T09:08:18.9215682Z mov.b32 %r1967, %r1960; 2026-02-21T09:08:18.9215796Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:08:18.9215909Z // => This Inner Loop Header: Depth=2 2026-02-21T09:08:18.9216110Z .loc 1 48 80 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:48:80 2026-02-21T09:08:18.9216174Z // begin inline asm 2026-02-21T09:08:18.9216236Z mov.u64 %rd153, 0x0; 2026-02-21T09:08:18.9216369Z createpolicy.fractional.L2::evict_last.b64 %rd153, 1.0; 2026-02-21T09:08:18.9216426Z // end inline asm 2026-02-21T09:08:18.9216749Z // begin inline asm 2026-02-21T09:08:18.9216824Z mov.u32 %r1490, 0x0; 2026-02-21T09:08:18.9216967Z mov.u32 %r1491, 0x0; 2026-02-21T09:08:18.9217169Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r1490, %r1491 }, [ %rd252 + 0 ], %rd153; 2026-02-21T09:08:18.9217237Z // end inline asm 2026-02-21T09:08:18.9217445Z .loc 1 52 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:52:32 2026-02-21T09:08:18.9217504Z bar.sync 0; 2026-02-21T09:08:18.9217585Z st.shared.v2.b32 [%r15], {%r1490, %r1491}; 2026-02-21T09:08:18.9217647Z bar.sync 0; 2026-02-21T09:08:18.9217715Z ld.shared.b16 %rs155, [%r16]; 2026-02-21T09:08:18.9217786Z ld.shared.b16 %rs156, [%r16+512]; 2026-02-21T09:08:18.9217863Z ld.shared.b16 %rs157, [%r16+32]; 2026-02-21T09:08:18.9217930Z ld.shared.b16 %rs158, [%r16+544]; 2026-02-21T09:08:18.9217994Z ld.shared.b16 %rs159, [%r17]; 2026-02-21T09:08:18.9218064Z ld.shared.b16 %rs160, [%r17+512]; 2026-02-21T09:08:18.9218131Z ld.shared.b16 %rs161, [%r17+32]; 2026-02-21T09:08:18.9218198Z 
ld.shared.b16 %rs162, [%r17+544]; 2026-02-21T09:08:18.9218265Z ld.shared.b16 %rs163, [%r18]; 2026-02-21T09:08:18.9218336Z ld.shared.b16 %rs164, [%r18+512]; 2026-02-21T09:08:18.9218402Z ld.shared.b16 %rs165, [%r18+32]; 2026-02-21T09:08:18.9218470Z ld.shared.b16 %rs166, [%r18+544]; 2026-02-21T09:08:18.9218540Z ld.shared.b16 %rs167, [%r19]; 2026-02-21T09:08:18.9218608Z ld.shared.b16 %rs168, [%r19+512]; 2026-02-21T09:08:18.9218675Z ld.shared.b16 %rs169, [%r19+32]; 2026-02-21T09:08:18.9218739Z ld.shared.b16 %rs170, [%r19+544]; 2026-02-21T09:08:18.9218814Z cvt.f32.bf16 %r1628, %rs155; 2026-02-21T09:08:18.9218879Z cvt.f32.bf16 %r1629, %rs156; 2026-02-21T09:08:18.9218942Z cvt.f32.bf16 %r1630, %rs159; 2026-02-21T09:08:18.9219009Z cvt.f32.bf16 %r1631, %rs160; 2026-02-21T09:08:18.9219072Z cvt.f32.bf16 %r1648, %rs163; 2026-02-21T09:08:18.9219134Z cvt.f32.bf16 %r1649, %rs164; 2026-02-21T09:08:18.9219197Z cvt.f32.bf16 %r1650, %rs167; 2026-02-21T09:08:18.9219267Z cvt.f32.bf16 %r1651, %rs168; 2026-02-21T09:08:18.9219333Z cvt.f32.bf16 %r1668, %rs157; 2026-02-21T09:08:18.9219395Z cvt.f32.bf16 %r1669, %rs158; 2026-02-21T09:08:18.9219464Z cvt.f32.bf16 %r1670, %rs161; 2026-02-21T09:08:18.9219526Z cvt.f32.bf16 %r1671, %rs162; 2026-02-21T09:08:18.9219679Z cvt.f32.bf16 %r1688, %rs165; 2026-02-21T09:08:18.9219750Z cvt.f32.bf16 %r1689, %rs166; 2026-02-21T09:08:18.9219813Z cvt.f32.bf16 %r1690, %rs169; 2026-02-21T09:08:18.9219877Z cvt.f32.bf16 %r1691, %rs170; 2026-02-21T09:08:18.9220080Z .loc 1 54 34 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:54:34 2026-02-21T09:08:18.9220152Z cvt.s64.s32 %rd185, %r2832; 2026-02-21T09:08:18.9220224Z add.s64 %rd157, %rd54, %rd185; 2026-02-21T09:08:18.9220426Z .loc 1 54 87 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:54:87 2026-02-21T09:08:18.9220493Z // begin inline asm 2026-02-21T09:08:18.9220554Z mov.u64 %rd156, 0x0; 2026-02-21T09:08:18.9220681Z createpolicy.fractional.L2::evict_first.b64 %rd156, 1.0; 
2026-02-21T09:08:18.9220868Z // end inline asm 2026-02-21T09:08:18.9220936Z // begin inline asm 2026-02-21T09:08:18.9220995Z mov.u16 %rs153, 0x0; 2026-02-21T09:08:18.9221172Z ld.global.L1::evict_first.L2::cache_hint.b16 { %rs153 }, [ %rd157 + 0 ], %rd156; 2026-02-21T09:08:18.9221239Z // end inline asm 2026-02-21T09:08:18.9221303Z shr.u16 %rs171, %rs153, 8; 2026-02-21T09:08:18.9221506Z .loc 1 62 28 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:62:28 2026-02-21T09:08:18.9221567Z bar.sync 0; 2026-02-21T09:08:18.9221633Z st.shared.b8 [%r20], %rs153; 2026-02-21T09:08:18.9221698Z st.shared.b8 [%r21+512], %rs171; 2026-02-21T09:08:18.9221753Z bar.sync 0; 2026-02-21T09:08:18.9221824Z ld.shared.b32 %r2042, [%r22]; 2026-02-21T09:08:18.9221893Z prmt.b32 %r2043, %r2042, 0, 0x7770U; 2026-02-21T09:08:18.9221956Z cvt.u16.u32 %rs172, %r2043; 2026-02-21T09:08:18.9222028Z prmt.b32 %r2044, %r2042, 0, 0x7771U; 2026-02-21T09:08:18.9222092Z cvt.u16.u32 %rs173, %r2044; 2026-02-21T09:08:18.9222210Z prmt.b32 %r2045, %r2042, 0, 0x7772U; 2026-02-21T09:08:18.9222280Z cvt.u16.u32 %rs174, %r2045; 2026-02-21T09:08:18.9222344Z prmt.b32 %r2046, %r2042, 0, 0x7773U; 2026-02-21T09:08:18.9222410Z cvt.u16.u32 %rs175, %r2046; 2026-02-21T09:08:18.9222611Z .loc 1 57 28 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:57:28 2026-02-21T09:08:18.9222679Z shl.b16 %rs176, %rs172, 4; 2026-02-21T09:08:18.9222741Z shl.b16 %rs177, %rs173, 4; 2026-02-21T09:08:18.9222817Z shl.b16 %rs178, %rs174, 4; 2026-02-21T09:08:18.9222884Z shl.b16 %rs179, %rs175, 4; 2026-02-21T09:08:18.9223087Z .loc 1 72 58 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:72:58 2026-02-21T09:08:18.9223160Z selp.b16 %rs180, %rs176, %rs172, %p93; 2026-02-21T09:08:18.9223222Z cvt.s16.s8 %rs181, %rs180; 2026-02-21T09:08:18.9223289Z shr.s16 %rs182, %rs181, 4; 2026-02-21T09:08:18.9223359Z selp.b16 %rs183, %rs177, %rs173, %p93; 2026-02-21T09:08:18.9223425Z cvt.s16.s8 %rs184, %rs183; 2026-02-21T09:08:18.9223493Z shr.s16 
%rs185, %rs184, 4; 2026-02-21T09:08:18.9223564Z selp.b16 %rs186, %rs178, %rs174, %p93; 2026-02-21T09:08:18.9223624Z cvt.s16.s8 %rs187, %rs186; 2026-02-21T09:08:18.9223696Z shr.s16 %rs188, %rs187, 4; 2026-02-21T09:08:18.9223765Z selp.b16 %rs189, %rs179, %rs175, %p93; 2026-02-21T09:08:18.9223826Z cvt.s16.s8 %rs190, %rs189; 2026-02-21T09:08:18.9223886Z shr.s16 %rs191, %rs190, 4; 2026-02-21T09:08:18.9224093Z .loc 1 77 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:77:32 2026-02-21T09:08:18.9224160Z cvt.rn.f32.s16 %r2047, %rs182; 2026-02-21T09:08:18.9224224Z cvt.rn.f32.s16 %r2048, %rs185; 2026-02-21T09:08:18.9224293Z cvt.rn.f32.s16 %r2049, %rs188; 2026-02-21T09:08:18.9224356Z cvt.rn.f32.s16 %r2050, %rs191; 2026-02-21T09:08:18.9224412Z bar.sync 0; 2026-02-21T09:08:18.9224477Z st.shared.b32 [%r23], %r2047; 2026-02-21T09:08:18.9224550Z st.shared.b32 [%r24], %r2048; 2026-02-21T09:08:18.9224617Z st.shared.b32 [%r25], %r2049; 2026-02-21T09:08:18.9224681Z st.shared.b32 [%r26], %r2050; 2026-02-21T09:08:18.9224823Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r1960}; 2026-02-21T09:08:18.9224939Z bar.sync 0; 2026-02-21T09:08:18.9225000Z // begin inline asm 2026-02-21T09:08:18.9225189Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1552, %r1632, %r1712, %r1792}, [%r266]; 2026-02-21T09:08:18.9225254Z // end inline asm 2026-02-21T09:08:18.9225309Z bar.sync 0; 2026-02-21T09:08:18.9225446Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r1962}; 2026-02-21T09:08:18.9225507Z bar.sync 0; 2026-02-21T09:08:18.9225567Z // begin inline asm 2026-02-21T09:08:18.9225746Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1554, %r1634, %r1714, %r1794}, [%r266]; 2026-02-21T09:08:18.9225808Z // end inline asm 2026-02-21T09:08:18.9225864Z bar.sync 0; 2026-02-21T09:08:18.9226003Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r1961}; 2026-02-21T09:08:18.9226118Z bar.sync 0; 2026-02-21T09:08:18.9226238Z // begin inline asm 2026-02-21T09:08:18.9226420Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1553, %r1633, %r1713, %r1793}, [%r266]; 2026-02-21T09:08:18.9226599Z // end inline asm 2026-02-21T09:08:18.9226667Z bar.sync 0; 2026-02-21T09:08:18.9226799Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r1963}; 2026-02-21T09:08:18.9226855Z bar.sync 0; 2026-02-21T09:08:18.9226915Z // begin inline asm 2026-02-21T09:08:18.9227098Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1555, %r1635, %r1715, %r1795}, [%r266]; 2026-02-21T09:08:18.9227157Z // end inline asm 2026-02-21T09:08:18.9227213Z bar.sync 0; 2026-02-21T09:08:18.9227347Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r1964}; 2026-02-21T09:08:18.9227403Z bar.sync 0; 2026-02-21T09:08:18.9227463Z // begin inline asm 2026-02-21T09:08:18.9227646Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1556, %r1636, %r1716, %r1796}, [%r266]; 2026-02-21T09:08:18.9227705Z // end inline asm 2026-02-21T09:08:18.9227762Z bar.sync 0; 2026-02-21T09:08:18.9227977Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r1966}; 2026-02-21T09:08:18.9228052Z bar.sync 0; 2026-02-21T09:08:18.9228117Z // begin inline asm 2026-02-21T09:08:18.9228298Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1558, %r1638, %r1718, %r1798}, [%r266]; 2026-02-21T09:08:18.9228461Z // end inline asm 2026-02-21T09:08:18.9228520Z bar.sync 0; 2026-02-21T09:08:18.9228649Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r1965}; 2026-02-21T09:08:18.9228706Z bar.sync 0; 2026-02-21T09:08:18.9228774Z // begin inline asm 2026-02-21T09:08:18.9228949Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1557, %r1637, %r1717, %r1797}, [%r266]; 2026-02-21T09:08:18.9229007Z // end inline asm 2026-02-21T09:08:18.9229071Z bar.sync 0; 2026-02-21T09:08:18.9229198Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r1967}; 2026-02-21T09:08:18.9229255Z bar.sync 0; 2026-02-21T09:08:18.9229315Z // begin inline asm 2026-02-21T09:08:18.9229502Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1559, %r1639, %r1719, %r1799}, [%r266]; 
2026-02-21T09:08:18.9229558Z // end inline asm 2026-02-21T09:08:18.9229614Z $L__tmp9: 2026-02-21T09:08:18.9229906Z .loc 2 291 36 // standard.py:291:36 @[ ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:84:40 ] 2026-02-21T09:08:18.9229969Z // begin inline asm 2026-02-21T09:08:18.9230048Z fence.proxy.async.shared::cta; 2026-02-21T09:08:18.9230109Z // end inline asm 2026-02-21T09:08:18.9230201Z shfl.sync.idx.b32 %r2051, %r5, 0, 31, -1; 2026-02-21T09:08:18.9230278Z wgmma.fence.sync.aligned; 2026-02-21T09:08:18.9230342Z mov.pred %p46, -1; 2026-02-21T09:08:18.9230406Z // begin inline asm 2026-02-21T09:08:18.9230786Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1552,%r1553,%r1554,%r1555,%r1556,%r1557,%r1558,%r1559}, {%r1628,%r1629,%r1630,%r1631}, %rd1, %p46, 1, 1; 2026-02-21T09:08:18.9230843Z // end inline asm 2026-02-21T09:08:18.9230905Z // begin inline asm 2026-02-21T09:08:18.9231275Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1552,%r1553,%r1554,%r1555,%r1556,%r1557,%r1558,%r1559}, {%r1648,%r1649,%r1650,%r1651}, %rd2, %p46, 1, 1; 2026-02-21T09:08:18.9231333Z // end inline asm 2026-02-21T09:08:18.9231495Z // begin inline asm 2026-02-21T09:08:18.9231861Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1552,%r1553,%r1554,%r1555,%r1556,%r1557,%r1558,%r1559}, {%r1668,%r1669,%r1670,%r1671}, %rd3, %p46, 1, 1; 2026-02-21T09:08:18.9231918Z // end inline asm 2026-02-21T09:08:18.9231980Z // begin inline asm 2026-02-21T09:08:18.9232343Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1552,%r1553,%r1554,%r1555,%r1556,%r1557,%r1558,%r1559}, {%r1688,%r1689,%r1690,%r1691}, %rd4, %p46, 1, 1; 2026-02-21T09:08:18.9232401Z // end inline asm 2026-02-21T09:08:18.9232458Z // begin inline asm 2026-02-21T09:08:18.9232823Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1632,%r1633,%r1634,%r1635,%r1636,%r1637,%r1638,%r1639}, {%r1628,%r1629,%r1630,%r1631}, %rd5, %p46, 1, 1; 2026-02-21T09:08:18.9233015Z // end inline asm 
2026-02-21T09:08:18.9233078Z // begin inline asm 2026-02-21T09:08:18.9233447Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1632,%r1633,%r1634,%r1635,%r1636,%r1637,%r1638,%r1639}, {%r1648,%r1649,%r1650,%r1651}, %rd6, %p46, 1, 1; 2026-02-21T09:08:18.9233506Z // end inline asm 2026-02-21T09:08:18.9233566Z // begin inline asm 2026-02-21T09:08:18.9233928Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1632,%r1633,%r1634,%r1635,%r1636,%r1637,%r1638,%r1639}, {%r1668,%r1669,%r1670,%r1671}, %rd7, %p46, 1, 1; 2026-02-21T09:08:18.9233986Z // end inline asm 2026-02-21T09:08:18.9234044Z // begin inline asm 2026-02-21T09:08:18.9234406Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1632,%r1633,%r1634,%r1635,%r1636,%r1637,%r1638,%r1639}, {%r1688,%r1689,%r1690,%r1691}, %rd8, %p46, 1, 1; 2026-02-21T09:08:18.9234463Z // end inline asm 2026-02-21T09:08:18.9234521Z // begin inline asm 2026-02-21T09:08:18.9234935Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1712,%r1713,%r1714,%r1715,%r1716,%r1717,%r1718,%r1719}, {%r1628,%r1629,%r1630,%r1631}, %rd9, %p46, 1, 1; 2026-02-21T09:08:18.9235001Z // end inline asm 2026-02-21T09:08:18.9235058Z // begin inline asm 2026-02-21T09:08:18.9235433Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1712,%r1713,%r1714,%r1715,%r1716,%r1717,%r1718,%r1719}, {%r1648,%r1649,%r1650,%r1651}, %rd10, %p46, 1, 1; 2026-02-21T09:08:18.9235498Z // end inline asm 2026-02-21T09:08:18.9235557Z // begin inline asm 2026-02-21T09:08:18.9235923Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1712,%r1713,%r1714,%r1715,%r1716,%r1717,%r1718,%r1719}, {%r1668,%r1669,%r1670,%r1671}, %rd11, %p46, 1, 1; 2026-02-21T09:08:18.9235996Z // end inline asm 2026-02-21T09:08:18.9236058Z // begin inline asm 2026-02-21T09:08:18.9236422Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1712,%r1713,%r1714,%r1715,%r1716,%r1717,%r1718,%r1719}, {%r1688,%r1689,%r1690,%r1691}, %rd12, %p46, 1, 1; 2026-02-21T09:08:18.9236604Z // end inline asm 
2026-02-21T09:08:18.9236669Z // begin inline asm 2026-02-21T09:08:18.9237032Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1792,%r1793,%r1794,%r1795,%r1796,%r1797,%r1798,%r1799}, {%r1628,%r1629,%r1630,%r1631}, %rd13, %p46, 1, 1; 2026-02-21T09:08:18.9237091Z // end inline asm 2026-02-21T09:08:18.9237160Z // begin inline asm 2026-02-21T09:08:18.9237525Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1792,%r1793,%r1794,%r1795,%r1796,%r1797,%r1798,%r1799}, {%r1648,%r1649,%r1650,%r1651}, %rd14, %p46, 1, 1; 2026-02-21T09:08:18.9237583Z // end inline asm 2026-02-21T09:08:18.9237648Z // begin inline asm 2026-02-21T09:08:18.9238011Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1792,%r1793,%r1794,%r1795,%r1796,%r1797,%r1798,%r1799}, {%r1668,%r1669,%r1670,%r1671}, %rd15, %p46, 1, 1; 2026-02-21T09:08:18.9238069Z // end inline asm 2026-02-21T09:08:18.9238133Z // begin inline asm 2026-02-21T09:08:18.9238498Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1792,%r1793,%r1794,%r1795,%r1796,%r1797,%r1798,%r1799}, {%r1688,%r1689,%r1690,%r1691}, %rd16, %p46, 1, 1; 2026-02-21T09:08:18.9238559Z // end inline asm 2026-02-21T09:08:18.9238645Z wgmma.commit_group.sync.aligned; 2026-02-21T09:08:18.9238794Z mov.b32 %r2030, 0; 2026-02-21T09:08:18.9238858Z mov.b32 %r1884, %r2624; 2026-02-21T09:08:18.9238917Z mov.b32 %r1885, %r2030; 2026-02-21T09:08:18.9238983Z mov.b32 %r1886, %r2030; 2026-02-21T09:08:18.9239042Z // begin inline asm 2026-02-21T09:08:18.9239603Z // wait for regs: %r1552,%r1553,%r1554,%r1555,%r1556,%r1557,%r1558,%r1559,%r1632,%r1633,%r1634,%r1635,%r1636,%r1637,%r1638,%r1639,%r1712,%r1713,%r1714,%r1715,%r1716,%r1717,%r1718,%r1719,%r1792,%r1793,%r1794,%r1795,%r1796,%r1797,%r1798,%r1799,%r1884,%r1885,%r1886 2026-02-21T09:08:18.9239690Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:08:18.9239747Z // end inline asm 2026-02-21T09:08:18.9239804Z $L__tmp10: 2026-02-21T09:08:18.9240094Z .loc 1 48 80 // 
ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:48:80 2026-02-21T09:08:18.9240226Z add.s64 %rd176, %rd252, 64; 2026-02-21T09:08:18.9240290Z // begin inline asm 2026-02-21T09:08:18.9240350Z mov.u64 %rd175, 0x0; 2026-02-21T09:08:18.9240484Z createpolicy.fractional.L2::evict_last.b64 %rd175, 1.0; 2026-02-21T09:08:18.9240545Z // end inline asm 2026-02-21T09:08:18.9240604Z // begin inline asm 2026-02-21T09:08:18.9240667Z mov.u32 %r1922, 0x0; 2026-02-21T09:08:18.9240729Z mov.u32 %r1923, 0x0; 2026-02-21T09:08:18.9240922Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r1922, %r1923 }, [ %rd176 + 0 ], %rd175; 2026-02-21T09:08:18.9240983Z // end inline asm 2026-02-21T09:08:18.9241201Z .loc 1 52 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:52:32 2026-02-21T09:08:18.9241258Z bar.sync 0; 2026-02-21T09:08:18.9241341Z st.shared.v2.b32 [%r15], {%r1922, %r1923}; 2026-02-21T09:08:18.9241403Z bar.sync 0; 2026-02-21T09:08:18.9241472Z ld.shared.b16 %rs192, [%r16]; 2026-02-21T09:08:18.9241548Z ld.shared.b16 %rs193, [%r16+512]; 2026-02-21T09:08:18.9241694Z ld.shared.b16 %rs194, [%r16+32]; 2026-02-21T09:08:18.9241767Z ld.shared.b16 %rs195, [%r16+544]; 2026-02-21T09:08:18.9241834Z ld.shared.b16 %rs196, [%r17]; 2026-02-21T09:08:18.9241901Z ld.shared.b16 %rs197, [%r17+512]; 2026-02-21T09:08:18.9241977Z ld.shared.b16 %rs198, [%r17+32]; 2026-02-21T09:08:18.9242041Z ld.shared.b16 %rs199, [%r17+544]; 2026-02-21T09:08:18.9242105Z ld.shared.b16 %rs200, [%r18]; 2026-02-21T09:08:18.9242176Z ld.shared.b16 %rs201, [%r18+512]; 2026-02-21T09:08:18.9242243Z ld.shared.b16 %rs202, [%r18+32]; 2026-02-21T09:08:18.9242309Z ld.shared.b16 %rs203, [%r18+544]; 2026-02-21T09:08:18.9242373Z ld.shared.b16 %rs204, [%r19]; 2026-02-21T09:08:18.9242445Z ld.shared.b16 %rs205, [%r19+512]; 2026-02-21T09:08:18.9242514Z ld.shared.b16 %rs206, [%r19+32]; 2026-02-21T09:08:18.9242580Z ld.shared.b16 %rs207, [%r19+544]; 2026-02-21T09:08:18.9242657Z cvt.f32.bf16 %r1956, %rs192; 
2026-02-21T09:08:18.9242723Z cvt.f32.bf16 %r1957, %rs193; 2026-02-21T09:08:18.9242787Z cvt.f32.bf16 %r1958, %rs196; 2026-02-21T09:08:18.9242864Z cvt.f32.bf16 %r1959, %rs197; 2026-02-21T09:08:18.9242925Z cvt.f32.bf16 %r1976, %rs200; 2026-02-21T09:08:18.9242990Z cvt.f32.bf16 %r1977, %rs201; 2026-02-21T09:08:18.9243050Z cvt.f32.bf16 %r1978, %rs204; 2026-02-21T09:08:18.9243121Z cvt.f32.bf16 %r1979, %rs205; 2026-02-21T09:08:18.9243197Z cvt.f32.bf16 %r1996, %rs194; 2026-02-21T09:08:18.9243261Z cvt.f32.bf16 %r1997, %rs195; 2026-02-21T09:08:18.9243328Z cvt.f32.bf16 %r1998, %rs198; 2026-02-21T09:08:18.9243389Z cvt.f32.bf16 %r1999, %rs199; 2026-02-21T09:08:18.9243449Z cvt.f32.bf16 %r2016, %rs202; 2026-02-21T09:08:18.9243510Z cvt.f32.bf16 %r2017, %rs203; 2026-02-21T09:08:18.9243579Z cvt.f32.bf16 %r2018, %rs206; 2026-02-21T09:08:18.9243640Z cvt.f32.bf16 %r2019, %rs207; 2026-02-21T09:08:18.9243702Z cvt.u32.u64 %r2052, %rd253; 2026-02-21T09:08:18.9243767Z add.s32 %r2053, %r2052, 48; 2026-02-21T09:08:18.9243978Z .loc 1 54 62 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:54:62 2026-02-21T09:08:18.9244040Z or.b32 %r2054, %r5, %r2053; 2026-02-21T09:08:18.9244104Z shl.b32 %r2055, %r2054, 13; 2026-02-21T09:08:18.9244225Z add.s32 %r2056, %r2055, %r98; 2026-02-21T09:08:18.9244426Z .loc 1 54 34 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:54:34 2026-02-21T09:08:18.9244490Z cvt.s64.s32 %rd186, %r2056; 2026-02-21T09:08:18.9244559Z add.s64 %rd179, %rd54, %rd186; 2026-02-21T09:08:18.9244757Z .loc 1 54 87 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:54:87 2026-02-21T09:08:18.9244818Z // begin inline asm 2026-02-21T09:08:18.9244882Z mov.u64 %rd178, 0x0; 2026-02-21T09:08:18.9245009Z createpolicy.fractional.L2::evict_first.b64 %rd178, 1.0; 2026-02-21T09:08:18.9245074Z // end inline asm 2026-02-21T09:08:18.9245135Z // begin inline asm 2026-02-21T09:08:18.9245201Z mov.u16 %rs154, 0x0; 2026-02-21T09:08:18.9245465Z ld.global.L1::evict_first.L2::cache_hint.b16 
{ %rs154 }, [ %rd179 + 0 ], %rd178; 2026-02-21T09:08:18.9245525Z // end inline asm 2026-02-21T09:08:18.9245594Z shr.u16 %rs208, %rs154, 8; 2026-02-21T09:08:18.9245796Z .loc 1 62 28 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:62:28 2026-02-21T09:08:18.9245855Z bar.sync 0; 2026-02-21T09:08:18.9245926Z st.shared.b8 [%r20], %rs154; 2026-02-21T09:08:18.9245995Z st.shared.b8 [%r21+512], %rs208; 2026-02-21T09:08:18.9246049Z bar.sync 0; 2026-02-21T09:08:18.9246114Z ld.shared.b32 %r2057, [%r22]; 2026-02-21T09:08:18.9246188Z prmt.b32 %r2058, %r2057, 0, 0x7770U; 2026-02-21T09:08:18.9246251Z cvt.u16.u32 %rs209, %r2058; 2026-02-21T09:08:18.9246320Z prmt.b32 %r2059, %r2057, 0, 0x7771U; 2026-02-21T09:08:18.9246387Z cvt.u16.u32 %rs210, %r2059; 2026-02-21T09:08:18.9246571Z prmt.b32 %r2060, %r2057, 0, 0x7772U; 2026-02-21T09:08:18.9246640Z cvt.u16.u32 %rs211, %r2060; 2026-02-21T09:08:18.9246713Z prmt.b32 %r2061, %r2057, 0, 0x7773U; 2026-02-21T09:08:18.9246858Z cvt.u16.u32 %rs212, %r2061; 2026-02-21T09:08:18.9247068Z .loc 1 57 28 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:57:28 2026-02-21T09:08:18.9247134Z shl.b16 %rs213, %rs209, 4; 2026-02-21T09:08:18.9247205Z shl.b16 %rs214, %rs210, 4; 2026-02-21T09:08:18.9247267Z shl.b16 %rs215, %rs211, 4; 2026-02-21T09:08:18.9247335Z shl.b16 %rs216, %rs212, 4; 2026-02-21T09:08:18.9247541Z .loc 1 72 58 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:72:58 2026-02-21T09:08:18.9247615Z selp.b16 %rs217, %rs213, %rs209, %p93; 2026-02-21T09:08:18.9247678Z cvt.s16.s8 %rs218, %rs217; 2026-02-21T09:08:18.9247741Z shr.s16 %rs219, %rs218, 4; 2026-02-21T09:08:18.9247815Z selp.b16 %rs220, %rs214, %rs210, %p93; 2026-02-21T09:08:18.9247878Z cvt.s16.s8 %rs221, %rs220; 2026-02-21T09:08:18.9247939Z shr.s16 %rs222, %rs221, 4; 2026-02-21T09:08:18.9248017Z selp.b16 %rs223, %rs215, %rs211, %p93; 2026-02-21T09:08:18.9248085Z cvt.s16.s8 %rs224, %rs223; 2026-02-21T09:08:18.9248148Z shr.s16 %rs225, %rs224, 4; 
2026-02-21T09:08:18.9248227Z selp.b16 %rs226, %rs216, %rs212, %p93; 2026-02-21T09:08:18.9248290Z cvt.s16.s8 %rs227, %rs226; 2026-02-21T09:08:18.9248354Z shr.s16 %rs228, %rs227, 4; 2026-02-21T09:08:18.9248555Z .loc 1 77 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:77:32 2026-02-21T09:08:18.9248634Z cvt.rn.f32.s16 %r2062, %rs219; 2026-02-21T09:08:18.9248699Z cvt.rn.f32.s16 %r2063, %rs222; 2026-02-21T09:08:18.9248765Z cvt.rn.f32.s16 %r2064, %rs225; 2026-02-21T09:08:18.9248832Z cvt.rn.f32.s16 %r2065, %rs228; 2026-02-21T09:08:18.9248888Z bar.sync 0; 2026-02-21T09:08:18.9248952Z st.shared.b32 [%r23], %r2062; 2026-02-21T09:08:18.9249018Z st.shared.b32 [%r24], %r2063; 2026-02-21T09:08:18.9249088Z st.shared.b32 [%r25], %r2064; 2026-02-21T09:08:18.9249153Z st.shared.b32 [%r26], %r2065; 2026-02-21T09:08:18.9249209Z $L__tmp11: 2026-02-21T09:08:18.9249495Z .loc 2 291 36 // standard.py:291:36 @[ ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:84:40 ] 2026-02-21T09:08:18.9249688Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r266], {%r1552, %r1632, %r1712, %r1792}; 2026-02-21T09:08:18.9249835Z bar.sync 0; 2026-02-21T09:08:18.9249905Z // begin inline asm 2026-02-21T09:08:18.9250041Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1960}, [%r695]; 2026-02-21T09:08:18.9250098Z // end inline asm 2026-02-21T09:08:18.9250154Z bar.sync 0; 2026-02-21T09:08:18.9250339Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r266], {%r1554, %r1634, %r1714, %r1794}; 2026-02-21T09:08:18.9250398Z bar.sync 0; 2026-02-21T09:08:18.9250460Z // begin inline asm 2026-02-21T09:08:18.9250595Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1962}, [%r695]; 2026-02-21T09:08:18.9250654Z // end inline asm 2026-02-21T09:08:18.9250710Z bar.sync 0; 2026-02-21T09:08:18.9250891Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r266], {%r1553, %r1633, %r1713, %r1793}; 2026-02-21T09:08:18.9251083Z bar.sync 0; 2026-02-21T09:08:18.9251148Z // begin inline asm 2026-02-21T09:08:18.9251280Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1961}, [%r695]; 2026-02-21T09:08:18.9251346Z // end inline asm 2026-02-21T09:08:18.9251404Z bar.sync 0; 2026-02-21T09:08:18.9251581Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r266], {%r1555, %r1635, %r1715, %r1795}; 2026-02-21T09:08:18.9251655Z bar.sync 0; 2026-02-21T09:08:18.9251719Z // begin inline asm 2026-02-21T09:08:18.9251850Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1963}, [%r695]; 2026-02-21T09:08:18.9251908Z // end inline asm 2026-02-21T09:08:18.9251975Z bar.sync 0; 2026-02-21T09:08:18.9252153Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r266], {%r1556, %r1636, %r1716, %r1796}; 2026-02-21T09:08:18.9252215Z bar.sync 0; 2026-02-21T09:08:18.9252277Z // begin inline asm 2026-02-21T09:08:18.9252402Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1964}, [%r695]; 2026-02-21T09:08:18.9252460Z // end inline asm 2026-02-21T09:08:18.9252518Z bar.sync 0; 2026-02-21T09:08:18.9252769Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r266], {%r1558, %r1638, %r1718, %r1798}; 2026-02-21T09:08:18.9252829Z bar.sync 0; 2026-02-21T09:08:18.9252889Z // begin inline asm 2026-02-21T09:08:18.9253027Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1966}, [%r695]; 2026-02-21T09:08:18.9253092Z // end inline asm 2026-02-21T09:08:18.9253147Z bar.sync 0; 2026-02-21T09:08:18.9253325Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r266], {%r1557, %r1637, %r1717, %r1797}; 2026-02-21T09:08:18.9253497Z bar.sync 0; 2026-02-21T09:08:18.9253555Z // begin inline asm 2026-02-21T09:08:18.9253681Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1965}, [%r695]; 2026-02-21T09:08:18.9253754Z // end inline asm 2026-02-21T09:08:18.9253811Z bar.sync 0; 2026-02-21T09:08:18.9253989Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r266], {%r1559, %r1639, %r1719, %r1799}; 2026-02-21T09:08:18.9254056Z bar.sync 0; 2026-02-21T09:08:18.9254116Z // begin inline asm 2026-02-21T09:08:18.9254246Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1967}, [%r695]; 2026-02-21T09:08:18.9254304Z // end 
inline asm 2026-02-21T09:08:18.9254366Z // begin inline asm 2026-02-21T09:08:18.9254444Z fence.proxy.async.shared::cta; 2026-02-21T09:08:18.9254503Z // end inline asm 2026-02-21T09:08:18.9254582Z wgmma.fence.sync.aligned; 2026-02-21T09:08:18.9254644Z shl.b32 %r2066, %r2051, 9; 2026-02-21T09:08:18.9254705Z and.b32 %r2067, %r2066, 6144; 2026-02-21T09:08:18.9254772Z add.s32 %r2068, %r2067, %r2624; 2026-02-21T09:08:18.9254839Z bfe.u32 %r2069, %r2068, 4, 14; 2026-02-21T09:08:18.9254901Z cvt.u64.u32 %rd187, %r2069; 2026-02-21T09:08:18.9254980Z or.b64 %rd181, %rd187, 4611686293338849280; 2026-02-21T09:08:18.9255045Z // begin inline asm 2026-02-21T09:08:18.9255433Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1960,%r1961,%r1962,%r1963,%r1964,%r1965,%r1966,%r1967}, {%r1956,%r1957,%r1958,%r1959}, %rd181, %p46, 1, 1; 2026-02-21T09:08:18.9255492Z // end inline asm 2026-02-21T09:08:18.9255562Z add.s32 %r2070, %r2068, 32; 2026-02-21T09:08:18.9255625Z bfe.u32 %r2071, %r2070, 4, 14; 2026-02-21T09:08:18.9255687Z cvt.u64.u32 %rd188, %r2071; 2026-02-21T09:08:18.9255762Z or.b64 %rd182, %rd188, 4611686293338849280; 2026-02-21T09:08:18.9255890Z // begin inline asm 2026-02-21T09:08:18.9256261Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1960,%r1961,%r1962,%r1963,%r1964,%r1965,%r1966,%r1967}, {%r1976,%r1977,%r1978,%r1979}, %rd182, %p46, 1, 1; 2026-02-21T09:08:18.9256318Z // end inline asm 2026-02-21T09:08:18.9256385Z add.s32 %r2072, %r2068, 64; 2026-02-21T09:08:18.9256562Z bfe.u32 %r2073, %r2072, 4, 14; 2026-02-21T09:08:18.9256630Z cvt.u64.u32 %rd189, %r2073; 2026-02-21T09:08:18.9256709Z or.b64 %rd183, %rd189, 4611686293338849280; 2026-02-21T09:08:18.9256771Z // begin inline asm 2026-02-21T09:08:18.9257141Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1960,%r1961,%r1962,%r1963,%r1964,%r1965,%r1966,%r1967}, {%r1996,%r1997,%r1998,%r1999}, %rd183, %p46, 1, 1; 2026-02-21T09:08:18.9257339Z // end inline asm 2026-02-21T09:08:18.9257412Z add.s32 %r2074, %r2068, 96; 
2026-02-21T09:08:18.9257472Z bfe.u32 %r2075, %r2074, 4, 14; 2026-02-21T09:08:18.9257534Z cvt.u64.u32 %rd190, %r2075; 2026-02-21T09:08:18.9257614Z or.b64 %rd184, %rd190, 4611686293338849280; 2026-02-21T09:08:18.9257675Z // begin inline asm 2026-02-21T09:08:18.9258045Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1960,%r1961,%r1962,%r1963,%r1964,%r1965,%r1966,%r1967}, {%r2016,%r2017,%r2018,%r2019}, %rd184, %p46, 1, 1; 2026-02-21T09:08:18.9258110Z // end inline asm 2026-02-21T09:08:18.9258187Z wgmma.commit_group.sync.aligned; 2026-02-21T09:08:18.9258249Z mov.b32 %r2028, %r2624; 2026-02-21T09:08:18.9258323Z mov.b32 %r2029, %r2030; 2026-02-21T09:08:18.9258392Z // begin inline asm 2026-02-21T09:08:18.9258570Z // wait for regs: %r1960,%r1961,%r1962,%r1963,%r1964,%r1965,%r1966,%r1967,%r2028,%r2029,%r2030 2026-02-21T09:08:18.9258646Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:08:18.9258711Z // end inline asm 2026-02-21T09:08:18.9258769Z $L__tmp12: 2026-02-21T09:08:18.9259057Z .loc 1 40 124 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:40:124 2026-02-21T09:08:18.9259133Z add.s64 %rd253, %rd253, 32; 2026-02-21T09:08:18.9259198Z add.s64 %rd252, %rd252, 128; 2026-02-21T09:08:18.9259262Z add.s32 %r2832, %r2832, 262144; 2026-02-21T09:08:18.9259329Z setp.lt.u64 %p67, %rd253, 480; 2026-02-21T09:08:18.9259407Z @%p67 bra $L__BB0_7; 2026-02-21T09:08:18.9259522Z // %bb.8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:08:18.9259729Z .loc 1 31 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:31:32 2026-02-21T09:08:18.9259796Z or.b32 %r2080, %r97, %r11; 2026-02-21T09:08:18.9259998Z .loc 1 87 28 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:87:28 2026-02-21T09:08:18.9260079Z cvt.rn.bf16x2.f32 %r2081, %r1961, %r1960; 2026-02-21T09:08:18.9260163Z cvt.rn.bf16x2.f32 %r2082, %r1963, %r1962; 2026-02-21T09:08:18.9260239Z cvt.rn.bf16x2.f32 %r2083, %r1965, %r1964; 2026-02-21T09:08:18.9260311Z cvt.rn.bf16x2.f32 %r2084, %r1967, %r1966; 
2026-02-21T09:08:18.9260513Z .loc 1 88 43 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:88:43 2026-02-21T09:08:18.9260584Z shl.b32 %r2085, %r99, 13; 2026-02-21T09:08:18.9260784Z .loc 1 88 50 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:88:50 2026-02-21T09:08:18.9260846Z add.s32 %r2086, %r2080, %r2085; 2026-02-21T09:08:18.9261049Z .loc 1 88 22 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:88:22 2026-02-21T09:08:18.9261122Z mad.wide.s32 %rd191, %r2086, 2, %rd55; 2026-02-21T09:08:18.9261318Z .loc 1 88 81 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:88:81 2026-02-21T09:08:18.9261376Z bar.sync 0; 2026-02-21T09:08:18.9261563Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r29], {%r2081, %r2082, %r2083, %r2084}; 2026-02-21T09:08:18.9261623Z bar.sync 0; 2026-02-21T09:08:18.9261734Z ld.shared.v4.b32 {%r2076, %r2077, %r2078, %r2079}, [%r30]; 2026-02-21T09:08:18.9261801Z // begin inline asm 2026-02-21T09:08:18.9262016Z st.global.v4.b32 [ %rd191 + 0 ], { %r2076, %r2077, %r2078, %r2079 }; 2026-02-21T09:08:18.9262075Z // end inline asm 2026-02-21T09:08:18.9262293Z .loc 1 19 145 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:19:145 2026-02-21T09:08:18.9262358Z add.s32 %r2813, %r2813, 6336; 2026-02-21T09:08:18.9262429Z setp.lt.s32 %p68, %r2813, %r2841; 2026-02-21T09:08:18.9262494Z @%p68 bra $L__BB0_2; 2026-02-21T09:08:18.9262583Z $L__BB0_9: // %.preheader 2026-02-21T09:08:18.9262649Z setp.gt.s32 %p69, %r2841, 8191; 2026-02-21T09:08:18.9262709Z @%p69 bra $L__BB0_14; 2026-02-21T09:08:18.9262800Z // %bb.10: // %.lr.ph161 2026-02-21T09:08:18.9263053Z .loc 1 0 145 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:0:145 2026-02-21T09:08:18.9263160Z and.b32 %r2088, %r2792, 4088; 2026-02-21T09:08:18.9263225Z and.b32 %r2090, %r2793, 24; 2026-02-21T09:08:18.9263287Z xor.b32 %r2091, %r2088, %r2090; 2026-02-21T09:08:18.9263349Z add.s32 %r33, %r2624, %r2091; 2026-02-21T09:08:18.9263414Z shl.b32 %r2094, %r2795, 5; 
2026-02-21T09:08:18.9263474Z and.b32 %r2096, %r2796, 448; 2026-02-21T09:08:18.9263535Z or.b32 %r2099, %r2094, %r2096; 2026-02-21T09:08:18.9263598Z or.b32 %r2100, %r2798, %r2797; 2026-02-21T09:08:18.9263664Z or.b32 %r2101, %r2099, %r2100; 2026-02-21T09:08:18.9263726Z add.s32 %r34, %r2624, %r2101; 2026-02-21T09:08:18.9263791Z xor.b32 %r2102, %r2101, 8; 2026-02-21T09:08:18.9263863Z add.s32 %r35, %r2624, %r2102; 2026-02-21T09:08:18.9263927Z xor.b32 %r2103, %r2101, 16; 2026-02-21T09:08:18.9263989Z add.s32 %r36, %r2624, %r2103; 2026-02-21T09:08:18.9264049Z xor.b32 %r2104, %r2101, 24; 2026-02-21T09:08:18.9264117Z add.s32 %r37, %r2624, %r2104; 2026-02-21T09:08:18.9264229Z and.b32 %r2106, %r2799, 508; 2026-02-21T09:08:18.9264298Z or.b32 %r2109, %r2801, %r2106; 2026-02-21T09:08:18.9264363Z add.s32 %r38, %r2624, %r2109; 2026-02-21T09:08:18.9264428Z xor.b32 %r2110, %r2109, 64; 2026-02-21T09:08:18.9264491Z add.s32 %r39, %r2624, %r2110; 2026-02-21T09:08:18.9264553Z neg.s32 %r2113, %r2803; 2026-02-21T09:08:18.9264618Z and.b32 %r2114, %r2113, 576; 2026-02-21T09:08:18.9264680Z or.b32 %r2115, %r2802, %r2800; 2026-02-21T09:08:18.9264747Z xor.b32 %r2116, %r2115, %r2114; 2026-02-21T09:08:18.9264813Z add.s32 %r40, %r2624, %r2116; 2026-02-21T09:08:18.9264873Z and.b32 %r2118, %r2804, 8064; 2026-02-21T09:08:18.9264947Z and.b32 %r2121, %r2806, 28; 2026-02-21T09:08:18.9265019Z xor.b32 %r2122, %r2805, %r2121; 2026-02-21T09:08:18.9265083Z or.b32 %r2123, %r2122, %r2118; 2026-02-21T09:08:18.9265144Z add.s32 %r41, %r2624, %r2123; 2026-02-21T09:08:18.9265215Z xor.b32 %r2124, %r2123, 32; 2026-02-21T09:08:18.9265283Z add.s32 %r42, %r2624, %r2124; 2026-02-21T09:08:18.9265347Z xor.b32 %r2125, %r2123, 64; 2026-02-21T09:08:18.9265410Z add.s32 %r43, %r2624, %r2125; 2026-02-21T09:08:18.9265476Z xor.b32 %r2126, %r2123, 96; 2026-02-21T09:08:18.9265541Z add.s32 %r44, %r2624, %r2126; 2026-02-21T09:08:18.9265603Z and.b32 %r2128, %r2807, 120; 2026-02-21T09:08:18.9265667Z or.b32 %r2129, %r2128, %r4; 
2026-02-21T09:08:18.9265734Z shl.b32 %r2130, %r2129, 4; 2026-02-21T09:08:18.9265795Z add.s32 %r2131, %r2624, 8192; 2026-02-21T09:08:18.9265858Z add.s32 %r2630, %r2131, %r2130; 2026-02-21T09:08:18.9265924Z shl.b32 %r2132, %r2798, 6; 2026-02-21T09:08:18.9265985Z shl.b32 %r2133, %r2795, 2; 2026-02-21T09:08:18.9266048Z add.s32 %r2134, %r2131, %r2132; 2026-02-21T09:08:18.9266114Z add.s32 %r2135, %r2134, %r2133; 2026-02-21T09:08:18.9266183Z add.s32 %r2201, %r2135, %r2805; 2026-02-21T09:08:18.9266246Z bfe.u32 %r2136, %r2624, 4, 14; 2026-02-21T09:08:18.9266311Z cvt.u64.u32 %rd192, %r2136; 2026-02-21T09:08:18.9266399Z or.b64 %rd215, %rd192, 4611686293338849280; 2026-02-21T09:08:18.9266568Z add.s32 %r2137, %r2624, 32; 2026-02-21T09:08:18.9266635Z bfe.u32 %r2138, %r2137, 4, 14; 2026-02-21T09:08:18.9266703Z cvt.u64.u32 %rd193, %r2138; 2026-02-21T09:08:18.9266871Z or.b64 %rd216, %rd193, 4611686293338849280; 2026-02-21T09:08:18.9266936Z add.s32 %r2139, %r2624, 64; 2026-02-21T09:08:18.9267000Z bfe.u32 %r2140, %r2139, 4, 14; 2026-02-21T09:08:18.9267069Z cvt.u64.u32 %rd194, %r2140; 2026-02-21T09:08:18.9267146Z or.b64 %rd217, %rd194, 4611686293338849280; 2026-02-21T09:08:18.9267207Z add.s32 %r2141, %r2624, 96; 2026-02-21T09:08:18.9267274Z bfe.u32 %r2142, %r2141, 4, 14; 2026-02-21T09:08:18.9267337Z cvt.u64.u32 %rd195, %r2142; 2026-02-21T09:08:18.9267410Z or.b64 %rd218, %rd195, 4611686293338849280; 2026-02-21T09:08:18.9267472Z add.s32 %r2143, %r2624, 2048; 2026-02-21T09:08:18.9267539Z bfe.u32 %r2144, %r2143, 4, 14; 2026-02-21T09:08:18.9267601Z cvt.u64.u32 %rd196, %r2144; 2026-02-21T09:08:18.9267801Z or.b64 %rd219, %rd196, 4611686293338849280; 2026-02-21T09:08:18.9267872Z add.s32 %r2145, %r2624, 2080; 2026-02-21T09:08:18.9267935Z bfe.u32 %r2146, %r2145, 4, 14; 2026-02-21T09:08:18.9267997Z cvt.u64.u32 %rd197, %r2146; 2026-02-21T09:08:18.9268072Z or.b64 %rd220, %rd197, 4611686293338849280; 2026-02-21T09:08:18.9268140Z add.s32 %r2147, %r2624, 2112; 2026-02-21T09:08:18.9268201Z bfe.u32 
%r2148, %r2147, 4, 14; 2026-02-21T09:08:18.9268263Z cvt.u64.u32 %rd198, %r2148; 2026-02-21T09:08:18.9268441Z or.b64 %rd221, %rd198, 4611686293338849280; 2026-02-21T09:08:18.9268506Z add.s32 %r2149, %r2624, 2144; 2026-02-21T09:08:18.9268569Z bfe.u32 %r2150, %r2149, 4, 14; 2026-02-21T09:08:18.9268639Z cvt.u64.u32 %rd199, %r2150; 2026-02-21T09:08:18.9268712Z or.b64 %rd222, %rd199, 4611686293338849280; 2026-02-21T09:08:18.9268775Z add.s32 %r2151, %r2624, 4096; 2026-02-21T09:08:18.9268840Z bfe.u32 %r2152, %r2151, 4, 14; 2026-02-21T09:08:18.9268910Z cvt.u64.u32 %rd200, %r2152; 2026-02-21T09:08:18.9268989Z or.b64 %rd223, %rd200, 4611686293338849280; 2026-02-21T09:08:18.9269124Z add.s32 %r2153, %r2624, 4128; 2026-02-21T09:08:18.9269195Z bfe.u32 %r2154, %r2153, 4, 14; 2026-02-21T09:08:18.9269260Z cvt.u64.u32 %rd201, %r2154; 2026-02-21T09:08:18.9269335Z or.b64 %rd224, %rd201, 4611686293338849280; 2026-02-21T09:08:18.9269399Z add.s32 %r2155, %r2624, 4160; 2026-02-21T09:08:18.9269467Z bfe.u32 %r2156, %r2155, 4, 14; 2026-02-21T09:08:18.9269531Z cvt.u64.u32 %rd202, %r2156; 2026-02-21T09:08:18.9269602Z or.b64 %rd225, %rd202, 4611686293338849280; 2026-02-21T09:08:18.9269670Z add.s32 %r2157, %r2624, 4192; 2026-02-21T09:08:18.9269732Z bfe.u32 %r2158, %r2157, 4, 14; 2026-02-21T09:08:18.9269793Z cvt.u64.u32 %rd203, %r2158; 2026-02-21T09:08:18.9269871Z or.b64 %rd226, %rd203, 4611686293338849280; 2026-02-21T09:08:18.9269933Z add.s32 %r2159, %r2624, 6144; 2026-02-21T09:08:18.9269994Z bfe.u32 %r2160, %r2159, 4, 14; 2026-02-21T09:08:18.9270066Z cvt.u64.u32 %rd204, %r2160; 2026-02-21T09:08:18.9270149Z or.b64 %rd227, %rd204, 4611686293338849280; 2026-02-21T09:08:18.9270213Z add.s32 %r2161, %r2624, 6176; 2026-02-21T09:08:18.9270275Z bfe.u32 %r2162, %r2161, 4, 14; 2026-02-21T09:08:18.9270344Z cvt.u64.u32 %rd205, %r2162; 2026-02-21T09:08:18.9270418Z or.b64 %rd228, %rd205, 4611686293338849280; 2026-02-21T09:08:18.9270480Z add.s32 %r2163, %r2624, 6208; 2026-02-21T09:08:18.9270541Z bfe.u32 %r2164, 
%r2163, 4, 14; 2026-02-21T09:08:18.9270612Z cvt.u64.u32 %rd206, %r2164; 2026-02-21T09:08:18.9270683Z or.b64 %rd229, %rd206, 4611686293338849280; 2026-02-21T09:08:18.9270744Z add.s32 %r2165, %r2624, 6240; 2026-02-21T09:08:18.9270810Z bfe.u32 %r2166, %r2165, 4, 14; 2026-02-21T09:08:18.9270871Z cvt.u64.u32 %rd207, %r2166; 2026-02-21T09:08:18.9270943Z or.b64 %rd230, %rd207, 4611686293338849280; 2026-02-21T09:08:18.9271005Z and.b32 %r2168, %r2796, 240; 2026-02-21T09:08:18.9271074Z and.b32 %r2169, %r2792, 768; 2026-02-21T09:08:18.9271134Z shr.u32 %r2170, %r3, 2; 2026-02-21T09:08:18.9271198Z and.b32 %r2171, %r2170, 96; 2026-02-21T09:08:18.9271269Z or.b32 %r2173, %r2168, %r2169; 2026-02-21T09:08:18.9271330Z or.b32 %r2174, %r2171, %r2809; 2026-02-21T09:08:18.9271392Z xor.b32 %r2175, %r2173, %r2174; 2026-02-21T09:08:18.9271536Z add.s32 %r2176, %r2624, %r2808; 2026-02-21T09:08:18.9271598Z add.s32 %r47, %r2176, %r2175; 2026-02-21T09:08:18.9271660Z and.b32 %r2177, %r2804, 7168; 2026-02-21T09:08:18.9271721Z xor.b32 %r2179, %r2805, %r2810; 2026-02-21T09:08:18.9271789Z add.s32 %r2180, %r2624, %r2177; 2026-02-21T09:08:18.9271854Z add.s32 %r48, %r2180, %r2179; 2026-02-21T09:08:18.9272081Z .loc 1 40 124 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:40:124 2026-02-21T09:08:18.9272148Z or.b32 %r49, %r2811, %r13; 2026-02-21T09:08:18.9272211Z or.b32 %r50, %r2812, %r9; 2026-02-21T09:08:18.9272330Z $L__BB0_11: // =>This Loop Header: Depth=1 2026-02-21T09:08:18.9272429Z // Child Loop BB0_12 Depth 2 2026-02-21T09:08:18.9272766Z .loc 1 25 35 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:25:35 2026-02-21T09:08:18.9272832Z shr.s32 %r2184, %r2841, 31; 2026-02-21T09:08:18.9272896Z shr.u32 %r2185, %r2184, 24; 2026-02-21T09:08:18.9272966Z add.s32 %r2186, %r2841, %r2185; 2026-02-21T09:08:18.9273170Z .loc 1 28 45 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:28:45 2026-02-21T09:08:18.9273235Z and.b32 %r2187, %r2186, -256; 2026-02-21T09:08:18.9273303Z sub.s32 
%r2188, %r2841, %r2187; 2026-02-21T09:08:18.9273502Z .loc 1 28 64 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:28:64 2026-02-21T09:08:18.9273567Z cvt.u16.u32 %rs229, %r2188; 2026-02-21T09:08:18.9273774Z .loc 1 29 51 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:29:51 2026-02-21T09:08:18.9273837Z shr.s16 %rs230, %rs229, 15; 2026-02-21T09:08:18.9273899Z shr.u16 %rs231, %rs230, 14; 2026-02-21T09:08:18.9273967Z add.s16 %rs232, %rs229, %rs231; 2026-02-21T09:08:18.9274086Z shr.s16 %rs233, %rs232, 2; 2026-02-21T09:08:18.9274287Z .loc 1 28 64 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:28:64 2026-02-21T09:08:18.9274354Z and.b16 %rs234, %rs232, -4; 2026-02-21T09:08:18.9274425Z sub.s16 %rs235, %rs229, %rs234; 2026-02-21T09:08:18.9274625Z .loc 1 29 51 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:29:51 2026-02-21T09:08:18.9274687Z cvt.u32.u16 %r2189, %rs233; 2026-02-21T09:08:18.9274892Z .loc 1 30 27 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:30:27 2026-02-21T09:08:18.9274960Z mul.wide.s16 %r2190, %rs235, 64; 2026-02-21T09:08:18.9275039Z add.s32 %r121, %r2190, %r2187; 2026-02-21T09:08:18.9275240Z .loc 1 31 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:31:32 2026-02-21T09:08:18.9275315Z or.b32 %r122, %r121, %r9; 2026-02-21T09:08:18.9275522Z .loc 1 32 27 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:32:27 2026-02-21T09:08:18.9275589Z mul.wide.s16 %r2191, %rs233, 64; 2026-02-21T09:08:18.9275795Z .loc 1 33 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:33:32 2026-02-21T09:08:18.9275859Z or.b32 %r123, %r2191, %r7; 2026-02-21T09:08:18.9276070Z .loc 1 40 124 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:40:124 2026-02-21T09:08:18.9276137Z shl.b32 %r2192, %r2189, 16; 2026-02-21T09:08:18.9276199Z or.b32 %r2193, %r49, %r2192; 2026-02-21T09:08:18.9276273Z mad.wide.s32 %rd254, %r2193, 2, %rd53; 2026-02-21T09:08:18.9276335Z add.s32 %r2194, %r50, %r2187; 
2026-02-21T09:08:18.9276404Z add.s32 %r2842, %r2194, %r2190; 2026-02-21T09:08:18.9276584Z mov.b32 %r2665, 0f00000000; 2026-02-21T09:08:18.9276652Z mov.b64 %rd255, -32; 2026-02-21T09:08:18.9276722Z mov.b32 %r2666, %r2665; 2026-02-21T09:08:18.9276784Z mov.b32 %r2667, %r2665; 2026-02-21T09:08:18.9276849Z mov.b32 %r2668, %r2665; 2026-02-21T09:08:18.9276915Z mov.b32 %r2669, %r2665; 2026-02-21T09:08:18.9276977Z mov.b32 %r2670, %r2665; 2026-02-21T09:08:18.9277038Z mov.b32 %r2671, %r2665; 2026-02-21T09:08:18.9277179Z mov.b32 %r2672, %r2665; 2026-02-21T09:08:18.9277302Z $L__BB0_12: // Parent Loop BB0_11 Depth=1 2026-02-21T09:08:18.9277411Z // => This Inner Loop Header: Depth=2 2026-02-21T09:08:18.9277616Z .loc 1 48 80 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:48:80 2026-02-21T09:08:18.9277685Z // begin inline asm 2026-02-21T09:08:18.9277748Z mov.u64 %rd209, 0x0; 2026-02-21T09:08:18.9277878Z createpolicy.fractional.L2::evict_last.b64 %rd209, 1.0; 2026-02-21T09:08:18.9277945Z // end inline asm 2026-02-21T09:08:18.9278006Z // begin inline asm 2026-02-21T09:08:18.9278065Z mov.u32 %r2195, 0x0; 2026-02-21T09:08:18.9278136Z mov.u32 %r2196, 0x0; 2026-02-21T09:08:18.9278468Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r2195, %r2196 }, [ %rd254 + 0 ], %rd209; 2026-02-21T09:08:18.9278532Z // end inline asm 2026-02-21T09:08:18.9278734Z .loc 1 52 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:52:32 2026-02-21T09:08:18.9278801Z bar.sync 0; 2026-02-21T09:08:18.9278884Z st.shared.v2.b32 [%r33], {%r2195, %r2196}; 2026-02-21T09:08:18.9278940Z bar.sync 0; 2026-02-21T09:08:18.9279010Z ld.shared.b16 %rs238, [%r34]; 2026-02-21T09:08:18.9279087Z ld.shared.b16 %rs239, [%r34+512]; 2026-02-21T09:08:18.9279155Z ld.shared.b16 %rs240, [%r34+32]; 2026-02-21T09:08:18.9279223Z ld.shared.b16 %rs241, [%r34+544]; 2026-02-21T09:08:18.9279293Z ld.shared.b16 %rs242, [%r35]; 2026-02-21T09:08:18.9279358Z ld.shared.b16 %rs243, [%r35+512]; 2026-02-21T09:08:18.9279435Z 
ld.shared.b16 %rs244, [%r35+32]; 2026-02-21T09:08:18.9279508Z ld.shared.b16 %rs245, [%r35+544]; 2026-02-21T09:08:18.9279573Z ld.shared.b16 %rs246, [%r36]; 2026-02-21T09:08:18.9279643Z ld.shared.b16 %rs247, [%r36+512]; 2026-02-21T09:08:18.9279782Z ld.shared.b16 %rs248, [%r36+32]; 2026-02-21T09:08:18.9279857Z ld.shared.b16 %rs249, [%r36+544]; 2026-02-21T09:08:18.9279926Z ld.shared.b16 %rs250, [%r37]; 2026-02-21T09:08:18.9279990Z ld.shared.b16 %rs251, [%r37+512]; 2026-02-21T09:08:18.9280059Z ld.shared.b16 %rs252, [%r37+32]; 2026-02-21T09:08:18.9280124Z ld.shared.b16 %rs253, [%r37+544]; 2026-02-21T09:08:18.9280188Z cvt.f32.bf16 %r2333, %rs238; 2026-02-21T09:08:18.9280252Z cvt.f32.bf16 %r2334, %rs239; 2026-02-21T09:08:18.9280318Z cvt.f32.bf16 %r2335, %rs242; 2026-02-21T09:08:18.9280380Z cvt.f32.bf16 %r2336, %rs243; 2026-02-21T09:08:18.9280454Z cvt.f32.bf16 %r2353, %rs246; 2026-02-21T09:08:18.9280521Z cvt.f32.bf16 %r2354, %rs247; 2026-02-21T09:08:18.9280584Z cvt.f32.bf16 %r2355, %rs250; 2026-02-21T09:08:18.9280644Z cvt.f32.bf16 %r2356, %rs251; 2026-02-21T09:08:18.9280705Z cvt.f32.bf16 %r2373, %rs240; 2026-02-21T09:08:18.9280775Z cvt.f32.bf16 %r2374, %rs241; 2026-02-21T09:08:18.9280838Z cvt.f32.bf16 %r2375, %rs244; 2026-02-21T09:08:18.9280899Z cvt.f32.bf16 %r2376, %rs245; 2026-02-21T09:08:18.9280965Z cvt.f32.bf16 %r2393, %rs248; 2026-02-21T09:08:18.9281029Z cvt.f32.bf16 %r2394, %rs249; 2026-02-21T09:08:18.9281090Z cvt.f32.bf16 %r2395, %rs252; 2026-02-21T09:08:18.9281157Z cvt.f32.bf16 %r2396, %rs253; 2026-02-21T09:08:18.9281369Z .loc 1 54 34 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:54:34 2026-02-21T09:08:18.9281434Z cvt.s64.s32 %rd241, %r2842; 2026-02-21T09:08:18.9281500Z add.s64 %rd213, %rd54, %rd241; 2026-02-21T09:08:18.9281709Z .loc 1 54 87 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:54:87 2026-02-21T09:08:18.9281773Z // begin inline asm 2026-02-21T09:08:18.9281832Z mov.u64 %rd212, 0x0; 2026-02-21T09:08:18.9281967Z 
createpolicy.fractional.L2::evict_first.b64 %rd212, 1.0; 2026-02-21T09:08:18.9282025Z // end inline asm 2026-02-21T09:08:18.9282090Z // begin inline asm 2026-02-21T09:08:18.9282157Z mov.u16 %rs236, 0x0; 2026-02-21T09:08:18.9282322Z ld.global.L1::evict_first.L2::cache_hint.b16 { %rs236 }, [ %rd213 + 0 ], %rd212; 2026-02-21T09:08:18.9282454Z // end inline asm 2026-02-21T09:08:18.9282519Z shr.u16 %rs254, %rs236, 8; 2026-02-21T09:08:18.9282731Z .loc 1 62 28 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:62:28 2026-02-21T09:08:18.9282788Z bar.sync 0; 2026-02-21T09:08:18.9282855Z st.shared.b8 [%r38], %rs236; 2026-02-21T09:08:18.9282926Z st.shared.b8 [%r39+512], %rs254; 2026-02-21T09:08:18.9282983Z bar.sync 0; 2026-02-21T09:08:18.9283049Z ld.shared.b32 %r2747, [%r40]; 2026-02-21T09:08:18.9283120Z prmt.b32 %r2748, %r2747, 0, 0x7770U; 2026-02-21T09:08:18.9283191Z cvt.u16.u32 %rs255, %r2748; 2026-02-21T09:08:18.9283259Z prmt.b32 %r2749, %r2747, 0, 0x7771U; 2026-02-21T09:08:18.9283325Z cvt.u16.u32 %rs256, %r2749; 2026-02-21T09:08:18.9283396Z prmt.b32 %r2750, %r2747, 0, 0x7772U; 2026-02-21T09:08:18.9283561Z cvt.u16.u32 %rs257, %r2750; 2026-02-21T09:08:18.9283631Z prmt.b32 %r2751, %r2747, 0, 0x7773U; 2026-02-21T09:08:18.9283701Z cvt.u16.u32 %rs258, %r2751; 2026-02-21T09:08:18.9283902Z .loc 1 57 28 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:57:28 2026-02-21T09:08:18.9283968Z shl.b16 %rs259, %rs255, 4; 2026-02-21T09:08:18.9284032Z shl.b16 %rs260, %rs256, 4; 2026-02-21T09:08:18.9284099Z shl.b16 %rs261, %rs257, 4; 2026-02-21T09:08:18.9284160Z shl.b16 %rs262, %rs258, 4; 2026-02-21T09:08:18.9284359Z .loc 1 72 58 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:72:58 2026-02-21T09:08:18.9284441Z selp.b16 %rs263, %rs259, %rs255, %p93; 2026-02-21T09:08:18.9284516Z cvt.s16.s8 %rs264, %rs263; 2026-02-21T09:08:18.9284581Z shr.s16 %rs265, %rs264, 4; 2026-02-21T09:08:18.9284655Z selp.b16 %rs266, %rs260, %rs256, %p93; 2026-02-21T09:08:18.9284724Z 
cvt.s16.s8 %rs267, %rs266; 2026-02-21T09:08:18.9284788Z shr.s16 %rs268, %rs267, 4; 2026-02-21T09:08:18.9284910Z selp.b16 %rs269, %rs261, %rs257, %p93; 2026-02-21T09:08:18.9284982Z cvt.s16.s8 %rs270, %rs269; 2026-02-21T09:08:18.9285049Z shr.s16 %rs271, %rs270, 4; 2026-02-21T09:08:18.9285121Z selp.b16 %rs272, %rs262, %rs258, %p93; 2026-02-21T09:08:18.9285192Z cvt.s16.s8 %rs273, %rs272; 2026-02-21T09:08:18.9285253Z shr.s16 %rs274, %rs273, 4; 2026-02-21T09:08:18.9285456Z .loc 1 77 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:77:32 2026-02-21T09:08:18.9285524Z cvt.rn.f32.s16 %r2752, %rs265; 2026-02-21T09:08:18.9285597Z cvt.rn.f32.s16 %r2753, %rs268; 2026-02-21T09:08:18.9285663Z cvt.rn.f32.s16 %r2754, %rs271; 2026-02-21T09:08:18.9285727Z cvt.rn.f32.s16 %r2755, %rs274; 2026-02-21T09:08:18.9285790Z bar.sync 0; 2026-02-21T09:08:18.9285855Z st.shared.b32 [%r41], %r2752; 2026-02-21T09:08:18.9285920Z st.shared.b32 [%r42], %r2753; 2026-02-21T09:08:18.9285987Z st.shared.b32 [%r43], %r2754; 2026-02-21T09:08:18.9286062Z st.shared.b32 [%r44], %r2755; 2026-02-21T09:08:18.9286209Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r2630], {%r2665}; 2026-02-21T09:08:18.9286267Z bar.sync 0; 2026-02-21T09:08:18.9286337Z // begin inline asm 2026-02-21T09:08:18.9286653Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2257, %r2337, %r2417, %r2497}, [%r2201]; 2026-02-21T09:08:18.9286714Z // end inline asm 2026-02-21T09:08:18.9286778Z bar.sync 0; 2026-02-21T09:08:18.9286919Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r2630], {%r2667}; 2026-02-21T09:08:18.9286976Z bar.sync 0; 2026-02-21T09:08:18.9287038Z // begin inline asm 2026-02-21T09:08:18.9287229Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2259, %r2339, %r2419, %r2499}, [%r2201]; 2026-02-21T09:08:18.9287288Z // end inline asm 2026-02-21T09:08:18.9287345Z bar.sync 0; 2026-02-21T09:08:18.9287482Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r2630], {%r2666}; 2026-02-21T09:08:18.9287538Z bar.sync 0; 2026-02-21T09:08:18.9287599Z // begin 
inline asm 2026-02-21T09:08:18.9287785Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2258, %r2338, %r2418, %r2498}, [%r2201]; 2026-02-21T09:08:18.9287863Z // end inline asm 2026-02-21T09:08:18.9287920Z bar.sync 0; 2026-02-21T09:08:18.9288133Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r2630], {%r2668}; 2026-02-21T09:08:18.9288194Z bar.sync 0; 2026-02-21T09:08:18.9288254Z // begin inline asm 2026-02-21T09:08:18.9288434Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2260, %r2340, %r2420, %r2500}, [%r2201]; 2026-02-21T09:08:18.9288497Z // end inline asm 2026-02-21T09:08:18.9288554Z bar.sync 0; 2026-02-21T09:08:18.9288681Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r2630], {%r2669}; 2026-02-21T09:08:18.9288738Z bar.sync 0; 2026-02-21T09:08:18.9288803Z // begin inline asm 2026-02-21T09:08:18.9288983Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2261, %r2341, %r2421, %r2501}, [%r2201]; 2026-02-21T09:08:18.9289042Z // end inline asm 2026-02-21T09:08:18.9289105Z bar.sync 0; 2026-02-21T09:08:18.9289375Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r2630], {%r2671}; 2026-02-21T09:08:18.9289435Z bar.sync 0; 2026-02-21T09:08:18.9289494Z // begin inline asm 2026-02-21T09:08:18.9289678Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2263, %r2343, %r2423, %r2503}, [%r2201]; 2026-02-21T09:08:18.9289737Z // end inline asm 2026-02-21T09:08:18.9289791Z bar.sync 0; 2026-02-21T09:08:18.9289925Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r2630], {%r2670}; 2026-02-21T09:08:18.9289980Z bar.sync 0; 2026-02-21T09:08:18.9290038Z // begin inline asm 2026-02-21T09:08:18.9290220Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2262, %r2342, %r2422, %r2502}, [%r2201]; 2026-02-21T09:08:18.9290282Z // end inline asm 2026-02-21T09:08:18.9290337Z bar.sync 0; 2026-02-21T09:08:18.9290479Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r2630], {%r2672}; 2026-02-21T09:08:18.9290542Z bar.sync 0; 2026-02-21T09:08:18.9290601Z // begin inline asm 2026-02-21T09:08:18.9290782Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2264, 
%r2344, %r2424, %r2504}, [%r2201]; 2026-02-21T09:08:18.9290849Z // end inline asm 2026-02-21T09:08:18.9290973Z $L__tmp13: 2026-02-21T09:08:18.9291254Z .loc 2 291 36 // standard.py:291:36 @[ ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:84:40 ] 2026-02-21T09:08:18.9291321Z // begin inline asm 2026-02-21T09:08:18.9291407Z fence.proxy.async.shared::cta; 2026-02-21T09:08:18.9291468Z // end inline asm 2026-02-21T09:08:18.9291549Z shfl.sync.idx.b32 %r2756, %r5, 0, 31, -1; 2026-02-21T09:08:18.9291631Z wgmma.fence.sync.aligned; 2026-02-21T09:08:18.9291696Z mov.pred %p70, -1; 2026-02-21T09:08:18.9291756Z // begin inline asm 2026-02-21T09:08:18.9292151Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2257,%r2258,%r2259,%r2260,%r2261,%r2262,%r2263,%r2264}, {%r2333,%r2334,%r2335,%r2336}, %rd215, %p70, 1, 1; 2026-02-21T09:08:18.9292222Z // end inline asm 2026-02-21T09:08:18.9292282Z // begin inline asm 2026-02-21T09:08:18.9292662Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2257,%r2258,%r2259,%r2260,%r2261,%r2262,%r2263,%r2264}, {%r2353,%r2354,%r2355,%r2356}, %rd216, %p70, 1, 1; 2026-02-21T09:08:18.9292730Z // end inline asm 2026-02-21T09:08:18.9292789Z // begin inline asm 2026-02-21T09:08:18.9293159Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2257,%r2258,%r2259,%r2260,%r2261,%r2262,%r2263,%r2264}, {%r2373,%r2374,%r2375,%r2376}, %rd217, %p70, 1, 1; 2026-02-21T09:08:18.9293222Z // end inline asm 2026-02-21T09:08:18.9293282Z // begin inline asm 2026-02-21T09:08:18.9293648Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2257,%r2258,%r2259,%r2260,%r2261,%r2262,%r2263,%r2264}, {%r2393,%r2394,%r2395,%r2396}, %rd218, %p70, 1, 1; 2026-02-21T09:08:18.9293713Z // end inline asm 2026-02-21T09:08:18.9293773Z // begin inline asm 2026-02-21T09:08:18.9294140Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2337,%r2338,%r2339,%r2340,%r2341,%r2342,%r2343,%r2344}, {%r2333,%r2334,%r2335,%r2336}, %rd219, %p70, 1, 1; 
2026-02-21T09:08:18.9294202Z // end inline asm 2026-02-21T09:08:18.9294267Z // begin inline asm 2026-02-21T09:08:18.9294638Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2337,%r2338,%r2339,%r2340,%r2341,%r2342,%r2343,%r2344}, {%r2353,%r2354,%r2355,%r2356}, %rd220, %p70, 1, 1; 2026-02-21T09:08:18.9294757Z // end inline asm 2026-02-21T09:08:18.9294823Z // begin inline asm 2026-02-21T09:08:18.9295197Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2337,%r2338,%r2339,%r2340,%r2341,%r2342,%r2343,%r2344}, {%r2373,%r2374,%r2375,%r2376}, %rd221, %p70, 1, 1; 2026-02-21T09:08:18.9295256Z // end inline asm 2026-02-21T09:08:18.9295322Z // begin inline asm 2026-02-21T09:08:18.9295705Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2337,%r2338,%r2339,%r2340,%r2341,%r2342,%r2343,%r2344}, {%r2393,%r2394,%r2395,%r2396}, %rd222, %p70, 1, 1; 2026-02-21T09:08:18.9295766Z // end inline asm 2026-02-21T09:08:18.9295832Z // begin inline asm 2026-02-21T09:08:18.9296289Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424}, {%r2333,%r2334,%r2335,%r2336}, %rd223, %p70, 1, 1; 2026-02-21T09:08:18.9296393Z // end inline asm 2026-02-21T09:08:18.9296574Z // begin inline asm 2026-02-21T09:08:18.9296961Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424}, {%r2353,%r2354,%r2355,%r2356}, %rd224, %p70, 1, 1; 2026-02-21T09:08:18.9297027Z // end inline asm 2026-02-21T09:08:18.9297087Z // begin inline asm 2026-02-21T09:08:18.9297466Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424}, {%r2373,%r2374,%r2375,%r2376}, %rd225, %p70, 1, 1; 2026-02-21T09:08:18.9297524Z // end inline asm 2026-02-21T09:08:18.9297585Z // begin inline asm 2026-02-21T09:08:18.9302137Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424}, {%r2393,%r2394,%r2395,%r2396}, %rd226, 
%p70, 1, 1; 2026-02-21T09:08:18.9302237Z // end inline asm 2026-02-21T09:08:18.9302319Z // begin inline asm 2026-02-21T09:08:18.9302859Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504}, {%r2333,%r2334,%r2335,%r2336}, %rd227, %p70, 1, 1; 2026-02-21T09:08:18.9302931Z // end inline asm 2026-02-21T09:08:18.9303004Z // begin inline asm 2026-02-21T09:08:18.9303388Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504}, {%r2353,%r2354,%r2355,%r2356}, %rd228, %p70, 1, 1; 2026-02-21T09:08:18.9303449Z // end inline asm 2026-02-21T09:08:18.9303523Z // begin inline asm 2026-02-21T09:08:18.9303894Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504}, {%r2373,%r2374,%r2375,%r2376}, %rd229, %p70, 1, 1; 2026-02-21T09:08:18.9303953Z // end inline asm 2026-02-21T09:08:18.9304023Z // begin inline asm 2026-02-21T09:08:18.9304393Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504}, {%r2393,%r2394,%r2395,%r2396}, %rd230, %p70, 1, 1; 2026-02-21T09:08:18.9304458Z // end inline asm 2026-02-21T09:08:18.9304547Z wgmma.commit_group.sync.aligned; 2026-02-21T09:08:18.9304616Z mov.b32 %r2735, 0; 2026-02-21T09:08:18.9304686Z mov.b32 %r2590, %r2735; 2026-02-21T09:08:18.9304748Z mov.b32 %r2591, %r2735; 2026-02-21T09:08:18.9304814Z mov.b32 %r2589, %r2624; 2026-02-21T09:08:18.9304878Z // begin inline asm 2026-02-21T09:08:18.9305459Z // wait for regs: %r2257,%r2258,%r2259,%r2260,%r2261,%r2262,%r2263,%r2264,%r2337,%r2338,%r2339,%r2340,%r2341,%r2342,%r2343,%r2344,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2589,%r2590,%r2591 2026-02-21T09:08:18.9305549Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:08:18.9305610Z // end inline asm 2026-02-21T09:08:18.9305669Z $L__tmp14: 2026-02-21T09:08:18.9305896Z .loc 
1 48 80 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:48:80 2026-02-21T09:08:18.9305989Z add.s64 %rd232, %rd254, 64; 2026-02-21T09:08:18.9306055Z // begin inline asm 2026-02-21T09:08:18.9306118Z mov.u64 %rd231, 0x0; 2026-02-21T09:08:18.9306263Z createpolicy.fractional.L2::evict_last.b64 %rd231, 1.0; 2026-02-21T09:08:18.9306428Z // end inline asm 2026-02-21T09:08:18.9306689Z // begin inline asm 2026-02-21T09:08:18.9306765Z mov.u32 %r2627, 0x0; 2026-02-21T09:08:18.9306827Z mov.u32 %r2628, 0x0; 2026-02-21T09:08:18.9307026Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r2627, %r2628 }, [ %rd232 + 0 ], %rd231; 2026-02-21T09:08:18.9307087Z // end inline asm 2026-02-21T09:08:18.9307309Z .loc 1 52 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:52:32 2026-02-21T09:08:18.9307371Z bar.sync 0; 2026-02-21T09:08:18.9307457Z st.shared.v2.b32 [%r33], {%r2627, %r2628}; 2026-02-21T09:08:18.9307520Z bar.sync 0; 2026-02-21T09:08:18.9307590Z ld.shared.b16 %rs275, [%r34]; 2026-02-21T09:08:18.9307661Z ld.shared.b16 %rs276, [%r34+512]; 2026-02-21T09:08:18.9307890Z ld.shared.b16 %rs277, [%r34+32]; 2026-02-21T09:08:18.9307970Z ld.shared.b16 %rs278, [%r34+544]; 2026-02-21T09:08:18.9308040Z ld.shared.b16 %rs279, [%r35]; 2026-02-21T09:08:18.9308104Z ld.shared.b16 %rs280, [%r35+512]; 2026-02-21T09:08:18.9308177Z ld.shared.b16 %rs281, [%r35+32]; 2026-02-21T09:08:18.9308241Z ld.shared.b16 %rs282, [%r35+544]; 2026-02-21T09:08:18.9308306Z ld.shared.b16 %rs283, [%r36]; 2026-02-21T09:08:18.9308453Z ld.shared.b16 %rs284, [%r36+512]; 2026-02-21T09:08:18.9308522Z ld.shared.b16 %rs285, [%r36+32]; 2026-02-21T09:08:18.9308588Z ld.shared.b16 %rs286, [%r36+544]; 2026-02-21T09:08:18.9308655Z ld.shared.b16 %rs287, [%r37]; 2026-02-21T09:08:18.9308725Z ld.shared.b16 %rs288, [%r37+512]; 2026-02-21T09:08:18.9308789Z ld.shared.b16 %rs289, [%r37+32]; 2026-02-21T09:08:18.9308854Z ld.shared.b16 %rs290, [%r37+544]; 2026-02-21T09:08:18.9308924Z cvt.f32.bf16 %r2661, %rs275; 
2026-02-21T09:08:18.9308990Z cvt.f32.bf16 %r2662, %rs276; 2026-02-21T09:08:18.9309055Z cvt.f32.bf16 %r2663, %rs279; 2026-02-21T09:08:18.9309193Z cvt.f32.bf16 %r2664, %rs280; 2026-02-21T09:08:18.9309272Z cvt.f32.bf16 %r2681, %rs283; 2026-02-21T09:08:18.9309335Z cvt.f32.bf16 %r2682, %rs284; 2026-02-21T09:08:18.9309398Z cvt.f32.bf16 %r2683, %rs287; 2026-02-21T09:08:18.9309468Z cvt.f32.bf16 %r2684, %rs288; 2026-02-21T09:08:18.9309530Z cvt.f32.bf16 %r2701, %rs277; 2026-02-21T09:08:18.9309592Z cvt.f32.bf16 %r2702, %rs278; 2026-02-21T09:08:18.9309654Z cvt.f32.bf16 %r2703, %rs281; 2026-02-21T09:08:18.9309719Z cvt.f32.bf16 %r2704, %rs282; 2026-02-21T09:08:18.9309780Z cvt.f32.bf16 %r2721, %rs285; 2026-02-21T09:08:18.9309842Z cvt.f32.bf16 %r2722, %rs286; 2026-02-21T09:08:18.9309909Z cvt.f32.bf16 %r2723, %rs289; 2026-02-21T09:08:18.9309971Z cvt.f32.bf16 %r2724, %rs290; 2026-02-21T09:08:18.9310033Z cvt.u32.u64 %r2757, %rd255; 2026-02-21T09:08:18.9310099Z add.s32 %r2758, %r2757, 48; 2026-02-21T09:08:18.9310322Z .loc 1 54 62 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:54:62 2026-02-21T09:08:18.9310388Z or.b32 %r2759, %r5, %r2758; 2026-02-21T09:08:18.9310449Z shl.b32 %r2760, %r2759, 13; 2026-02-21T09:08:18.9310522Z add.s32 %r2761, %r2760, %r122; 2026-02-21T09:08:18.9310740Z .loc 1 54 34 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:54:34 2026-02-21T09:08:18.9310808Z cvt.s64.s32 %rd242, %r2761; 2026-02-21T09:08:18.9310878Z add.s64 %rd235, %rd54, %rd242; 2026-02-21T09:08:18.9311082Z .loc 1 54 87 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:54:87 2026-02-21T09:08:18.9311144Z // begin inline asm 2026-02-21T09:08:18.9311212Z mov.u64 %rd234, 0x0; 2026-02-21T09:08:18.9311347Z createpolicy.fractional.L2::evict_first.b64 %rd234, 1.0; 2026-02-21T09:08:18.9311407Z // end inline asm 2026-02-21T09:08:18.9311467Z // begin inline asm 2026-02-21T09:08:18.9311533Z mov.u16 %rs237, 0x0; 2026-02-21T09:08:18.9311705Z 
ld.global.L1::evict_first.L2::cache_hint.b16 { %rs237 }, [ %rd235 + 0 ], %rd234; 2026-02-21T09:08:18.9311779Z // end inline asm 2026-02-21T09:08:18.9311851Z shr.u16 %rs291, %rs237, 8; 2026-02-21T09:08:18.9312070Z .loc 1 62 28 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:62:28 2026-02-21T09:08:18.9312208Z bar.sync 0; 2026-02-21T09:08:18.9312286Z st.shared.b8 [%r38], %rs237; 2026-02-21T09:08:18.9312356Z st.shared.b8 [%r39+512], %rs291; 2026-02-21T09:08:18.9312412Z bar.sync 0; 2026-02-21T09:08:18.9312482Z ld.shared.b32 %r2762, [%r40]; 2026-02-21T09:08:18.9312559Z prmt.b32 %r2763, %r2762, 0, 0x7770U; 2026-02-21T09:08:18.9312625Z cvt.u16.u32 %rs292, %r2763; 2026-02-21T09:08:18.9312695Z prmt.b32 %r2764, %r2762, 0, 0x7771U; 2026-02-21T09:08:18.9312765Z cvt.u16.u32 %rs293, %r2764; 2026-02-21T09:08:18.9312829Z prmt.b32 %r2765, %r2762, 0, 0x7772U; 2026-02-21T09:08:18.9312892Z cvt.u16.u32 %rs294, %r2765; 2026-02-21T09:08:18.9312956Z prmt.b32 %r2766, %r2762, 0, 0x7773U; 2026-02-21T09:08:18.9313128Z cvt.u16.u32 %rs295, %r2766; 2026-02-21T09:08:18.9313340Z .loc 1 57 28 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:57:28 2026-02-21T09:08:18.9313405Z shl.b16 %rs296, %rs292, 4; 2026-02-21T09:08:18.9313478Z shl.b16 %rs297, %rs293, 4; 2026-02-21T09:08:18.9313540Z shl.b16 %rs298, %rs294, 4; 2026-02-21T09:08:18.9313612Z shl.b16 %rs299, %rs295, 4; 2026-02-21T09:08:18.9313826Z .loc 1 72 58 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:72:58 2026-02-21T09:08:18.9313902Z selp.b16 %rs300, %rs296, %rs292, %p93; 2026-02-21T09:08:18.9313964Z cvt.s16.s8 %rs301, %rs300; 2026-02-21T09:08:18.9314026Z shr.s16 %rs302, %rs301, 4; 2026-02-21T09:08:18.9314103Z selp.b16 %rs303, %rs297, %rs293, %p93; 2026-02-21T09:08:18.9314166Z cvt.s16.s8 %rs304, %rs303; 2026-02-21T09:08:18.9314228Z shr.s16 %rs305, %rs304, 4; 2026-02-21T09:08:18.9314304Z selp.b16 %rs306, %rs298, %rs294, %p93; 2026-02-21T09:08:18.9314366Z cvt.s16.s8 %rs307, %rs306; 2026-02-21T09:08:18.9314432Z 
shr.s16 %rs308, %rs307, 4; 2026-02-21T09:08:18.9314555Z selp.b16 %rs309, %rs299, %rs295, %p93; 2026-02-21T09:08:18.9314633Z cvt.s16.s8 %rs310, %rs309; 2026-02-21T09:08:18.9314696Z shr.s16 %rs311, %rs310, 4; 2026-02-21T09:08:18.9314909Z .loc 1 77 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:77:32 2026-02-21T09:08:18.9314980Z cvt.rn.f32.s16 %r2767, %rs302; 2026-02-21T09:08:18.9315045Z cvt.rn.f32.s16 %r2768, %rs305; 2026-02-21T09:08:18.9315108Z cvt.rn.f32.s16 %r2769, %rs308; 2026-02-21T09:08:18.9315179Z cvt.rn.f32.s16 %r2770, %rs311; 2026-02-21T09:08:18.9315235Z bar.sync 0; 2026-02-21T09:08:18.9315309Z st.shared.b32 [%r41], %r2767; 2026-02-21T09:08:18.9315374Z st.shared.b32 [%r42], %r2768; 2026-02-21T09:08:18.9315443Z st.shared.b32 [%r43], %r2769; 2026-02-21T09:08:18.9315506Z st.shared.b32 [%r44], %r2770; 2026-02-21T09:08:18.9315562Z $L__tmp15: 2026-02-21T09:08:18.9315851Z .loc 2 291 36 // standard.py:291:36 @[ ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:84:40 ] 2026-02-21T09:08:18.9316053Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2201], {%r2257, %r2337, %r2417, %r2497}; 2026-02-21T09:08:18.9316114Z bar.sync 0; 2026-02-21T09:08:18.9316176Z // begin inline asm 2026-02-21T09:08:18.9316321Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2665}, [%r2630]; 2026-02-21T09:08:18.9316379Z // end inline asm 2026-02-21T09:08:18.9316435Z bar.sync 0; 2026-02-21T09:08:18.9316752Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2201], {%r2259, %r2339, %r2419, %r2499}; 2026-02-21T09:08:18.9316812Z bar.sync 0; 2026-02-21T09:08:18.9316871Z // begin inline asm 2026-02-21T09:08:18.9317006Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2667}, [%r2630]; 2026-02-21T09:08:18.9317076Z // end inline asm 2026-02-21T09:08:18.9317133Z bar.sync 0; 2026-02-21T09:08:18.9317317Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2201], {%r2258, %r2338, %r2418, %r2498}; 2026-02-21T09:08:18.9317378Z bar.sync 0; 2026-02-21T09:08:18.9317439Z // begin inline asm 2026-02-21T09:08:18.9317572Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2666}, [%r2630]; 2026-02-21T09:08:18.9317637Z // end inline asm 2026-02-21T09:08:18.9317691Z bar.sync 0; 2026-02-21T09:08:18.9317956Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2201], {%r2260, %r2340, %r2420, %r2500}; 2026-02-21T09:08:18.9318013Z bar.sync 0; 2026-02-21T09:08:18.9318078Z // begin inline asm 2026-02-21T09:08:18.9318207Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2668}, [%r2630]; 2026-02-21T09:08:18.9318264Z // end inline asm 2026-02-21T09:08:18.9318325Z bar.sync 0; 2026-02-21T09:08:18.9318504Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2201], {%r2261, %r2341, %r2421, %r2501}; 2026-02-21T09:08:18.9318561Z bar.sync 0; 2026-02-21T09:08:18.9318625Z // begin inline asm 2026-02-21T09:08:18.9318753Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2669}, [%r2630]; 2026-02-21T09:08:18.9318809Z // end inline asm 2026-02-21T09:08:18.9318863Z bar.sync 0; 2026-02-21T09:08:18.9319191Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2201], {%r2263, %r2343, %r2423, %r2503}; 2026-02-21T09:08:18.9319249Z bar.sync 0; 2026-02-21T09:08:18.9319308Z // begin inline asm 2026-02-21T09:08:18.9319443Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2671}, [%r2630]; 2026-02-21T09:08:18.9319501Z // end inline asm 2026-02-21T09:08:18.9319556Z bar.sync 0; 2026-02-21T09:08:18.9319734Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2201], {%r2262, %r2342, %r2422, %r2502}; 2026-02-21T09:08:18.9319795Z bar.sync 0; 2026-02-21T09:08:18.9319864Z // begin inline asm 2026-02-21T09:08:18.9319993Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2670}, [%r2630]; 2026-02-21T09:08:18.9320055Z // end inline asm 2026-02-21T09:08:18.9320111Z bar.sync 0; 2026-02-21T09:08:18.9320289Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r2201], {%r2264, %r2344, %r2424, %r2504}; 2026-02-21T09:08:18.9320350Z bar.sync 0; 2026-02-21T09:08:18.9320413Z // begin inline asm 2026-02-21T09:08:18.9320545Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2672}, [%r2630]; 2026-02-21T09:08:18.9320672Z 
// end inline asm 2026-02-21T09:08:18.9320739Z // begin inline asm 2026-02-21T09:08:18.9320818Z fence.proxy.async.shared::cta; 2026-02-21T09:08:18.9320879Z // end inline asm 2026-02-21T09:08:18.9320958Z wgmma.fence.sync.aligned; 2026-02-21T09:08:18.9321022Z shl.b32 %r2771, %r2756, 9; 2026-02-21T09:08:18.9321085Z and.b32 %r2772, %r2771, 6144; 2026-02-21T09:08:18.9321153Z add.s32 %r2773, %r2772, %r2624; 2026-02-21T09:08:18.9321221Z bfe.u32 %r2774, %r2773, 4, 14; 2026-02-21T09:08:18.9321287Z cvt.u64.u32 %rd243, %r2774; 2026-02-21T09:08:18.9321366Z or.b64 %rd237, %rd243, 4611686293338849280; 2026-02-21T09:08:18.9321429Z // begin inline asm 2026-02-21T09:08:18.9321811Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2665,%r2666,%r2667,%r2668,%r2669,%r2670,%r2671,%r2672}, {%r2661,%r2662,%r2663,%r2664}, %rd237, %p70, 1, 1; 2026-02-21T09:08:18.9321870Z // end inline asm 2026-02-21T09:08:18.9321939Z add.s32 %r2775, %r2773, 32; 2026-02-21T09:08:18.9322006Z bfe.u32 %r2776, %r2775, 4, 14; 2026-02-21T09:08:18.9322070Z cvt.u64.u32 %rd244, %r2776; 2026-02-21T09:08:18.9322150Z or.b64 %rd238, %rd244, 4611686293338849280; 2026-02-21T09:08:18.9322217Z // begin inline asm 2026-02-21T09:08:18.9322591Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2665,%r2666,%r2667,%r2668,%r2669,%r2670,%r2671,%r2672}, {%r2681,%r2682,%r2683,%r2684}, %rd238, %p70, 1, 1; 2026-02-21T09:08:18.9322654Z // end inline asm 2026-02-21T09:08:18.9322733Z add.s32 %r2777, %r2773, 64; 2026-02-21T09:08:18.9322796Z bfe.u32 %r2778, %r2777, 4, 14; 2026-02-21T09:08:18.9322857Z cvt.u64.u32 %rd245, %r2778; 2026-02-21T09:08:18.9322931Z or.b64 %rd239, %rd245, 4611686293338849280; 2026-02-21T09:08:18.9322997Z // begin inline asm 2026-02-21T09:08:18.9323366Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2665,%r2666,%r2667,%r2668,%r2669,%r2670,%r2671,%r2672}, {%r2701,%r2702,%r2703,%r2704}, %rd239, %p70, 1, 1; 2026-02-21T09:08:18.9323427Z // end inline asm 2026-02-21T09:08:18.9323497Z add.s32 %r2779, 
%r2773, 96; 2026-02-21T09:08:18.9323558Z bfe.u32 %r2780, %r2779, 4, 14; 2026-02-21T09:08:18.9323620Z cvt.u64.u32 %rd246, %r2780; 2026-02-21T09:08:18.9323786Z or.b64 %rd240, %rd246, 4611686293338849280; 2026-02-21T09:08:18.9323847Z // begin inline asm 2026-02-21T09:08:18.9324213Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2665,%r2666,%r2667,%r2668,%r2669,%r2670,%r2671,%r2672}, {%r2721,%r2722,%r2723,%r2724}, %rd240, %p70, 1, 1; 2026-02-21T09:08:18.9324280Z // end inline asm 2026-02-21T09:08:18.9324358Z wgmma.commit_group.sync.aligned; 2026-02-21T09:08:18.9324421Z mov.b32 %r2733, %r2624; 2026-02-21T09:08:18.9324485Z mov.b32 %r2734, %r2735; 2026-02-21T09:08:18.9324549Z // begin inline asm 2026-02-21T09:08:18.9324728Z // wait for regs: %r2665,%r2666,%r2667,%r2668,%r2669,%r2670,%r2671,%r2672,%r2733,%r2734,%r2735 2026-02-21T09:08:18.9324804Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:08:18.9324927Z // end inline asm 2026-02-21T09:08:18.9325035Z $L__tmp16: 2026-02-21T09:08:18.9325260Z .loc 1 40 124 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:40:124 2026-02-21T09:08:18.9325326Z add.s64 %rd255, %rd255, 32; 2026-02-21T09:08:18.9325397Z add.s64 %rd254, %rd254, 128; 2026-02-21T09:08:18.9325461Z add.s32 %r2842, %r2842, 262144; 2026-02-21T09:08:18.9325529Z setp.lt.u64 %p91, %rd255, 480; 2026-02-21T09:08:18.9325596Z @%p91 bra $L__BB0_12; 2026-02-21T09:08:18.9325710Z // %bb.13: // in Loop: Header=BB0_11 Depth=1 2026-02-21T09:08:18.9325919Z .loc 1 31 32 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:31:32 2026-02-21T09:08:18.9326001Z or.b32 %r2785, %r121, %r11; 2026-02-21T09:08:18.9326206Z .loc 1 87 28 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:87:28 2026-02-21T09:08:18.9326284Z cvt.rn.bf16x2.f32 %r2786, %r2666, %r2665; 2026-02-21T09:08:18.9326363Z cvt.rn.bf16x2.f32 %r2787, %r2668, %r2667; 2026-02-21T09:08:18.9326655Z cvt.rn.bf16x2.f32 %r2788, %r2670, %r2669; 2026-02-21T09:08:18.9326749Z cvt.rn.bf16x2.f32 %r2789, %r2672, 
%r2671; 2026-02-21T09:08:18.9326968Z .loc 1 88 43 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:88:43 2026-02-21T09:08:18.9327041Z shl.b32 %r2790, %r123, 13; 2026-02-21T09:08:18.9327259Z .loc 1 88 50 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:88:50 2026-02-21T09:08:18.9327326Z add.s32 %r2791, %r2785, %r2790; 2026-02-21T09:08:18.9327537Z .loc 1 88 22 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:88:22 2026-02-21T09:08:18.9327612Z mad.wide.s32 %rd247, %r2791, 2, %rd55; 2026-02-21T09:08:18.9327820Z .loc 1 88 81 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:88:81 2026-02-21T09:08:18.9327885Z bar.sync 0; 2026-02-21T09:08:18.9328079Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r47], {%r2786, %r2787, %r2788, %r2789}; 2026-02-21T09:08:18.9328139Z bar.sync 0; 2026-02-21T09:08:18.9328257Z ld.shared.v4.b32 {%r2781, %r2782, %r2783, %r2784}, [%r48]; 2026-02-21T09:08:18.9328324Z // begin inline asm 2026-02-21T09:08:18.9328454Z st.global.v4.b32 [ %rd247 + 0 ], { %r2781, %r2782, %r2783, %r2784 }; 2026-02-21T09:08:18.9328513Z // end inline asm 2026-02-21T09:08:18.9328734Z .loc 1 19 145 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:19:145 2026-02-21T09:08:18.9328812Z add.s32 %r143, %r2841, 2112; 2026-02-21T09:08:18.9328883Z setp.lt.s32 %p92, %r2841, 6080; 2026-02-21T09:08:18.9328950Z mov.b32 %r2841, %r143; 2026-02-21T09:08:18.9329012Z @%p92 bra $L__BB0_11; 2026-02-21T09:08:18.9329102Z $L__BB0_14: // %._crit_edge 2026-02-21T09:08:18.9329302Z .loc 1 19 4 // ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py:19:4 2026-02-21T09:08:18.9329360Z ret; 2026-02-21T09:08:18.9329417Z $L__tmp17: 2026-02-21T09:08:18.9329475Z $L__func_end0: 2026-02-21T09:08:18.9329572Z // -- End function 2026-02-21T09:08:18.9329629Z } 2026-02-21T09:08:18.9329883Z .file 1 "/tmp/torchinductor_root/i3/ci3efhc7sgvqdu4chqtxoysgyphbcpzl6qflm226etdlrdmmnp2j.py" 2026-02-21T09:08:18.9330174Z .file 2 
"/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T09:08:18.9330244Z .section .debug_abbrev 2026-02-21T09:08:18.9330297Z { 2026-02-21T09:08:18.9330393Z .b8 1 // Abbreviation Code 2026-02-21T09:08:18.9330493Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:08:18.9330578Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:08:18.9330666Z .b8 37 // DW_AT_producer 2026-02-21T09:08:18.9330764Z .b8 8 // DW_FORM_string 2026-02-21T09:08:18.9330846Z .b8 19 // DW_AT_language 2026-02-21T09:08:18.9331054Z .b8 5 // DW_FORM_data2 2026-02-21T09:08:18.9331139Z .b8 3 // DW_AT_name 2026-02-21T09:08:18.9331223Z .b8 8 // DW_FORM_string 2026-02-21T09:08:18.9331317Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:08:18.9331400Z .b8 6 // DW_FORM_data4 2026-02-21T09:08:18.9331486Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:08:18.9331564Z .b8 8 // DW_FORM_string 2026-02-21T09:08:18.9331638Z .b8 0 // EOM(1) 2026-02-21T09:08:18.9331712Z .b8 0 // EOM(2) 2026-02-21T09:08:18.9331802Z .b8 2 // Abbreviation Code 2026-02-21T09:08:18.9331891Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:08:18.9331975Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:08:18.9332054Z .b8 3 // DW_AT_name 2026-02-21T09:08:18.9332184Z .b8 8 // DW_FORM_string 2026-02-21T09:08:18.9332266Z .b8 32 // DW_AT_inline 2026-02-21T09:08:18.9332353Z .b8 11 // DW_FORM_data1 2026-02-21T09:08:18.9332423Z .b8 0 // EOM(1) 2026-02-21T09:08:18.9332492Z .b8 0 // EOM(2) 2026-02-21T09:08:18.9332581Z .b8 3 // Abbreviation Code 2026-02-21T09:08:18.9332665Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:08:18.9332748Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:08:18.9332831Z .b8 17 // DW_AT_low_pc 2026-02-21T09:08:18.9332907Z .b8 1 // DW_FORM_addr 2026-02-21T09:08:18.9332989Z .b8 18 // DW_AT_high_pc 2026-02-21T09:08:18.9333069Z .b8 1 // DW_FORM_addr 2026-02-21T09:08:18.9333166Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:08:18.9333241Z .b8 19 // DW_FORM_ref4 2026-02-21T09:08:18.9333317Z .b8 0 // EOM(1) 2026-02-21T09:08:18.9333392Z .b8 0 // EOM(2) 
2026-02-21T09:08:18.9333478Z .b8 4 // Abbreviation Code 2026-02-21T09:08:18.9333579Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T09:08:18.9333667Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:08:18.9333759Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:08:18.9333836Z .b8 19 // DW_FORM_ref4 2026-02-21T09:08:18.9333917Z .b8 17 // DW_AT_low_pc 2026-02-21T09:08:18.9333993Z .b8 1 // DW_FORM_addr 2026-02-21T09:08:18.9334080Z .b8 18 // DW_AT_high_pc 2026-02-21T09:08:18.9334160Z .b8 1 // DW_FORM_addr 2026-02-21T09:08:18.9334242Z .b8 88 // DW_AT_call_file 2026-02-21T09:08:18.9334378Z .b8 11 // DW_FORM_data1 2026-02-21T09:08:18.9334458Z .b8 89 // DW_AT_call_line 2026-02-21T09:08:18.9334539Z .b8 11 // DW_FORM_data1 2026-02-21T09:08:18.9334621Z .b8 87 // DW_AT_call_column 2026-02-21T09:08:18.9334699Z .b8 11 // DW_FORM_data1 2026-02-21T09:08:18.9334789Z .b8 0 // EOM(1) 2026-02-21T09:08:18.9334860Z .b8 0 // EOM(2) 2026-02-21T09:08:18.9334931Z .b8 0 // EOM(3) 2026-02-21T09:08:18.9334988Z } 2026-02-21T09:08:18.9335053Z .section .debug_info 2026-02-21T09:08:18.9335153Z { 2026-02-21T09:08:18.9335288Z .b32 178 // Length of Unit 2026-02-21T09:08:18.9335395Z .b8 2 // DWARF version number 2026-02-21T09:08:18.9335449Z .b8 0 2026-02-21T09:08:18.9335581Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:08:18.9335681Z .b8 8 // Address Size (in bytes) 2026-02-21T09:08:18.9335797Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T09:08:18.9335884Z .b8 116 // DW_AT_producer 2026-02-21T09:08:18.9335943Z .b8 114 2026-02-21T09:08:18.9335997Z .b8 105 2026-02-21T09:08:18.9336051Z .b8 116 2026-02-21T09:08:18.9336102Z .b8 111 2026-02-21T09:08:18.9336158Z .b8 110 2026-02-21T09:08:18.9336211Z .b8 0 2026-02-21T09:08:18.9336289Z .b8 2 // DW_AT_language 2026-02-21T09:08:18.9336340Z .b8 0 2026-02-21T09:08:18.9336427Z .b8 99 // DW_AT_name 2026-02-21T09:08:18.9336687Z .b8 105 2026-02-21T09:08:18.9336754Z .b8 51 2026-02-21T09:08:18.9336811Z .b8 101 2026-02-21T09:08:18.9336863Z .b8 102 2026-02-21T09:08:18.9336915Z .b8 104 2026-02-21T09:08:18.9336967Z .b8 99 2026-02-21T09:08:18.9337021Z .b8 55 2026-02-21T09:08:18.9337072Z .b8 115 2026-02-21T09:08:18.9337122Z .b8 103 2026-02-21T09:08:18.9337177Z .b8 118 2026-02-21T09:08:18.9337229Z .b8 113 2026-02-21T09:08:18.9337279Z .b8 100 2026-02-21T09:08:18.9337333Z .b8 117 2026-02-21T09:08:18.9337390Z .b8 52 2026-02-21T09:08:18.9337440Z .b8 99 2026-02-21T09:08:18.9337492Z .b8 104 2026-02-21T09:08:18.9337546Z .b8 113 2026-02-21T09:08:18.9337598Z .b8 116 2026-02-21T09:08:18.9337650Z .b8 120 2026-02-21T09:08:18.9337701Z .b8 111 2026-02-21T09:08:18.9337755Z .b8 121 2026-02-21T09:08:18.9337805Z .b8 115 2026-02-21T09:08:18.9337857Z .b8 103 2026-02-21T09:08:18.9337908Z .b8 121 2026-02-21T09:08:18.9337964Z .b8 112 2026-02-21T09:08:18.9338014Z .b8 104 2026-02-21T09:08:18.9338066Z .b8 98 2026-02-21T09:08:18.9338131Z .b8 99 2026-02-21T09:08:18.9338190Z .b8 112 2026-02-21T09:08:18.9338241Z .b8 122 2026-02-21T09:08:18.9338293Z .b8 108 2026-02-21T09:08:18.9338347Z .b8 54 2026-02-21T09:08:18.9338398Z .b8 113 2026-02-21T09:08:18.9338451Z .b8 102 2026-02-21T09:08:18.9338504Z .b8 108 2026-02-21T09:08:18.9338556Z .b8 109 2026-02-21T09:08:18.9338606Z .b8 50 2026-02-21T09:08:18.9338657Z .b8 50 2026-02-21T09:08:18.9338711Z .b8 54 
2026-02-21T09:08:18.9338761Z .b8 101 2026-02-21T09:08:18.9338814Z .b8 116 2026-02-21T09:08:18.9338879Z .b8 100 2026-02-21T09:08:18.9338932Z .b8 108 2026-02-21T09:08:18.9338984Z .b8 114 2026-02-21T09:08:18.9339036Z .b8 100 2026-02-21T09:08:18.9339091Z .b8 109 2026-02-21T09:08:18.9339142Z .b8 109 2026-02-21T09:08:18.9339193Z .b8 110 2026-02-21T09:08:18.9339245Z .b8 112 2026-02-21T09:08:18.9339302Z .b8 50 2026-02-21T09:08:18.9339352Z .b8 106 2026-02-21T09:08:18.9339403Z .b8 46 2026-02-21T09:08:18.9339457Z .b8 112 2026-02-21T09:08:18.9339508Z .b8 121 2026-02-21T09:08:18.9339568Z .b8 0 2026-02-21T09:08:18.9339677Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:08:18.9339759Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:08:18.9339810Z .b8 116 2026-02-21T09:08:18.9339944Z .b8 109 2026-02-21T09:08:18.9339996Z .b8 112 2026-02-21T09:08:18.9340045Z .b8 47 2026-02-21T09:08:18.9340097Z .b8 116 2026-02-21T09:08:18.9340148Z .b8 111 2026-02-21T09:08:18.9340198Z .b8 114 2026-02-21T09:08:18.9340247Z .b8 99 2026-02-21T09:08:18.9340300Z .b8 104 2026-02-21T09:08:18.9340351Z .b8 105 2026-02-21T09:08:18.9340401Z .b8 110 2026-02-21T09:08:18.9340451Z .b8 100 2026-02-21T09:08:18.9340503Z .b8 117 2026-02-21T09:08:18.9340553Z .b8 99 2026-02-21T09:08:18.9340603Z .b8 116 2026-02-21T09:08:18.9340663Z .b8 111 2026-02-21T09:08:18.9340727Z .b8 114 2026-02-21T09:08:18.9340782Z .b8 95 2026-02-21T09:08:18.9340842Z .b8 114 2026-02-21T09:08:18.9340894Z .b8 111 2026-02-21T09:08:18.9340946Z .b8 111 2026-02-21T09:08:18.9340998Z .b8 116 2026-02-21T09:08:18.9341126Z .b8 47 2026-02-21T09:08:18.9341239Z .b8 105 2026-02-21T09:08:18.9341292Z .b8 51 2026-02-21T09:08:18.9341342Z .b8 0 2026-02-21T09:08:18.9341461Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T09:08:18.9341543Z .b8 95 // DW_AT_name 2026-02-21T09:08:18.9341595Z .b8 104 2026-02-21T09:08:18.9341647Z .b8 101 2026-02-21T09:08:18.9341697Z .b8 108 2026-02-21T09:08:18.9341746Z .b8 105 2026-02-21T09:08:18.9341797Z .b8 111 
2026-02-21T09:08:18.9341849Z .b8 110 2026-02-21T09:08:18.9341899Z .b8 95 2026-02-21T09:08:18.9341949Z .b8 109 2026-02-21T09:08:18.9342000Z .b8 97 2026-02-21T09:08:18.9342049Z .b8 116 2026-02-21T09:08:18.9342099Z .b8 109 2026-02-21T09:08:18.9342149Z .b8 117 2026-02-21T09:08:18.9342202Z .b8 108 2026-02-21T09:08:18.9342252Z .b8 95 2026-02-21T09:08:18.9342301Z .b8 98 2026-02-21T09:08:18.9342351Z .b8 102 2026-02-21T09:08:18.9342403Z .b8 49 2026-02-21T09:08:18.9342452Z .b8 54 2026-02-21T09:08:18.9342502Z .b8 95 2026-02-21T09:08:18.9342556Z .b8 105 2026-02-21T09:08:18.9342610Z .b8 110 2026-02-21T09:08:18.9342734Z .b8 116 2026-02-21T09:08:18.9342787Z .b8 52 2026-02-21T09:08:18.9342839Z .b8 0 2026-02-21T09:08:18.9342921Z .b8 1 // DW_AT_inline 2026-02-21T09:08:18.9343030Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T09:08:18.9343138Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T09:08:18.9343237Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T09:08:18.9343339Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:08:18.9343471Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T09:08:18.9343567Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:08:18.9343654Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T09:08:18.9343743Z .b64 $L__tmp16 // DW_AT_high_pc 2026-02-21T09:08:18.9343836Z .b8 1 // DW_AT_call_file 2026-02-21T09:08:18.9343919Z .b8 84 // DW_AT_call_line 2026-02-21T09:08:18.9344004Z .b8 40 // DW_AT_call_column 2026-02-21T09:08:18.9344098Z .b8 0 // End Of Children Mark 2026-02-21T09:08:18.9344182Z .b8 0 // End Of Children Mark 2026-02-21T09:08:18.9344234Z } 2026-02-21T09:08:18.9344306Z .section .debug_macinfo { } 2026-02-21T09:08:18.9344313Z 2026-02-21T09:08:18.9344392Z ================================================================ 2026-02-21T09:08:18.9344509Z please share the reproducer above with Triton project. 
2026-02-21T09:08:22.0589064Z 2026-02-21T09:08:22.0589904Z Generation 6: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 78/78 13.0 configs/s 2026-02-21T09:08:28.4306049Z Generation 6: verifying top configs 100% ━━━━━━━━━━━━━━━━ 616/616 95.7 configs/s 2026-02-21T09:08:28.6442065Z [454s] Generation 6 complete: 2026-02-21T09:08:28.6442343Z error=21 2026-02-21T09:08:28.6442521Z ok=59 2026-02-21T09:08:28.6442678Z min=0.3337 2026-02-21T09:08:28.6442848Z mid=0.6488 2026-02-21T09:08:28.6443006Z max=30.8671 2026-02-21T09:08:28.6443716Z best={'block_sizes': [32, 128, 128], 2026-02-21T09:08:28.6444100Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:08:28.6444500Z 'l2_groupings': [32], 2026-02-21T09:08:28.6444736Z 'load_eviction_policies': ['first', 'first'], 2026-02-21T09:08:28.6445031Z 'loop_orders': [[1, 0]], 2026-02-21T09:08:28.6445243Z 'num_stages': 4, 2026-02-21T09:08:28.6445429Z 'num_warps': 4, 2026-02-21T09:08:28.6445624Z 'pid_type': 'flat', 2026-02-21T09:08:28.6445830Z 'range_flattens': [None, True], 2026-02-21T09:08:28.6446083Z 'range_multi_buffers': [None, False], 2026-02-21T09:08:28.6446340Z 'range_num_stages': [0, 3], 2026-02-21T09:08:28.6446854Z 'range_unroll_factors': [0, 0], 2026-02-21T09:08:28.6447097Z 'range_warp_specializes': []} 2026-02-21T09:08:28.6491926Z [454s] Fitting surrogate: 720 points, 720 targets 2026-02-21T09:08:29.9528057Z [455s] Generation 7 starting: 77 neighbors, 4 active search path(s) 2026-02-21T09:09:11.2747653Z Generation 7: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78/78 0.6 configs/s 2026-02-21T09:09:14.0562042Z [499s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 128, 64], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[2], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=16, num_stages=6, num_warps=1, pid_type='persistent_interleaved', range_flattens=[False, True], range_multi_buffers=[None, False], 
range_num_stages=[4, 1], range_unroll_factors=[2, 2], range_warp_specializes=[]) 2026-02-21T09:09:14.0563864Z Tensor-likes are not close! 2026-02-21T09:09:14.0564024Z 2026-02-21T09:09:14.0564135Z Mismatched elements: 33449807 / 33554432 (99.7%) 2026-02-21T09:09:14.0564566Z Greatest absolute difference: 1416.0 at index (1834, 910) (up to 0.01 allowed) 2026-02-21T09:09:14.0565565Z Greatest relative difference: inf at index (788, 2619) (up to 0.01 allowed) 2026-02-21T09:09:14.0566057Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:09:14.0566323Z 2026-02-21T09:09:17.1222362Z Generation 7: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 78/78 13.4 configs/s 2026-02-21T09:09:26.7363132Z Generation 7: verifying top configs 100% ━━━━━━━━━━━━━━━━ 621/621 64.3 configs/s 2026-02-21T09:09:27.0110343Z [512s] Generation 7 complete: 2026-02-21T09:09:27.0110608Z error=12 2026-02-21T09:09:27.0110785Z ok=69 2026-02-21T09:09:27.0110948Z min=0.3316 2026-02-21T09:09:27.0111127Z mid=0.4988 2026-02-21T09:09:27.0111290Z max=36.0456 2026-02-21T09:09:27.0111484Z best={'block_sizes': [32, 128, 128], 2026-02-21T09:09:27.0111877Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:09:27.0112271Z 'l2_groupings': [64], 2026-02-21T09:09:27.0112526Z 'load_eviction_policies': ['first', 'first'], 2026-02-21T09:09:27.0112839Z 'loop_orders': [[1, 0]], 2026-02-21T09:09:27.0113065Z 'num_stages': 4, 2026-02-21T09:09:27.0113253Z 'num_warps': 4, 2026-02-21T09:09:27.0113463Z 'pid_type': 'flat', 2026-02-21T09:09:27.0113681Z 'range_flattens': [None, True], 2026-02-21T09:09:27.0114290Z 'range_multi_buffers': [None, False], 2026-02-21T09:09:27.0114562Z 'range_num_stages': [0, 3], 2026-02-21T09:09:27.0114794Z 'range_unroll_factors': [0, 0], 2026-02-21T09:09:27.0115054Z 'range_warp_specializes': []} 2026-02-21T09:09:27.0158989Z [512s] Fitting surrogate: 801 points, 801 targets 2026-02-21T09:09:28.0059139Z [513s] Generation 8 starting: 56 neighbors, 3 
active search path(s) 2026-02-21T09:10:07.5060565Z [552s] Timeout after 30s compiling Config(block_sizes=[16, 128, 256], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], num_sm_multiplier=8, num_stages=2, num_warps=2, pid_type='persistent_interleaved', range_flattens=[True, None], range_multi_buffers=[None, True], range_num_stages=[3, 0], range_unroll_factors=[3, 3], range_warp_specializes=[]) 2026-02-21T09:10:07.5078718Z Generation 8: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56/56 0.5 configs/s 2026-02-21T09:10:11.7306389Z Generation 8: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 56/56 13.4 configs/s 2026-02-21T09:10:16.4469156Z Generation 8: verifying top configs 100% ━━━━━━━━━━━━━━━ 621/621 129.6 configs/s 2026-02-21T09:10:16.6310874Z [562s] Generation 8 complete: 2026-02-21T09:10:16.6311445Z error=13 2026-02-21T09:10:16.6311898Z timeout=1 2026-02-21T09:10:16.6312296Z ok=45 2026-02-21T09:10:16.6312627Z min=0.3294 2026-02-21T09:10:16.6312825Z mid=0.5745 2026-02-21T09:10:16.6313047Z max=25.2951 2026-02-21T09:10:16.6313262Z best={'block_sizes': [32, 128, 128], 2026-02-21T09:10:16.6313705Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:10:16.6314156Z 'l2_groupings': [64], 2026-02-21T09:10:16.6314426Z 'load_eviction_policies': ['first', 'first'], 2026-02-21T09:10:16.6315297Z 'loop_orders': [[1, 0]], 2026-02-21T09:10:16.6315552Z 'num_stages': 4, 2026-02-21T09:10:16.6315809Z 'num_warps': 4, 2026-02-21T09:10:16.6316031Z 'pid_type': 'flat', 2026-02-21T09:10:16.6316306Z 'range_flattens': [None, True], 2026-02-21T09:10:16.6316948Z 'range_multi_buffers': [None, False], 2026-02-21T09:10:16.6317268Z 'range_num_stages': [0, 3], 2026-02-21T09:10:16.6317526Z 'range_unroll_factors': [0, 0], 2026-02-21T09:10:16.6317825Z 'range_warp_specializes': []} 2026-02-21T09:10:16.6364328Z [562s] Fitting surrogate: 860 points, 860 targets 
2026-02-21T09:10:17.4892512Z [562s] Generation 9 starting: 49 neighbors, 3 active search path(s) 2026-02-21T09:10:53.6082241Z Generation 9: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 49/49 0.5 configs/s 2026-02-21T09:10:57.0372647Z Generation 9: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 49/49 14.4 configs/s 2026-02-21T09:11:03.3999688Z Generation 9: verifying top configs 100% ━━━━━━━━━━━━━━━━ 621/621 96.6 configs/s 2026-02-21T09:11:03.6119605Z [609s] Generation 9 complete: 2026-02-21T09:11:03.6119941Z error=12 2026-02-21T09:11:03.6120153Z ok=40 2026-02-21T09:11:03.6120389Z min=0.3286 2026-02-21T09:11:03.6120642Z mid=0.4332 2026-02-21T09:11:03.6120866Z max=30.3776 2026-02-21T09:11:03.6121124Z best={'block_sizes': [32, 128, 128], 2026-02-21T09:11:03.6121553Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:11:03.6122021Z 'l2_groupings': [64], 2026-02-21T09:11:03.6122304Z 'load_eviction_policies': ['first', 'first'], 2026-02-21T09:11:03.6122661Z 'loop_orders': [[1, 0]], 2026-02-21T09:11:03.6122921Z 'num_stages': 4, 2026-02-21T09:11:03.6123183Z 'num_warps': 4, 2026-02-21T09:11:03.6123432Z 'pid_type': 'flat', 2026-02-21T09:11:03.6123717Z 'range_flattens': [None, True], 2026-02-21T09:11:03.6124037Z 'range_multi_buffers': [None, False], 2026-02-21T09:11:03.6124342Z 'range_num_stages': [0, 3], 2026-02-21T09:11:03.6124645Z 'range_unroll_factors': [0, 0], 2026-02-21T09:11:03.6124942Z 'range_warp_specializes': []} 2026-02-21T09:11:03.6170078Z [609s] Fitting surrogate: 912 points, 912 targets 2026-02-21T09:11:03.9996404Z [609s] Generation 10 starting: 17 neighbors, 1 active search path(s) 2026-02-21T09:11:11.0811415Z Generation 10: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 17/17 2.0 configs/s 2026-02-21T09:11:12.5978393Z Generation 10: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 17/17 10.6 configs/s 2026-02-21T09:11:14.2209170Z Generation 10: verifying top configs 100% ━━━━━━━━━━━━━━ 621/621 363.8 configs/s 
2026-02-21T09:11:14.3450224Z [619s] Generation 10 complete: 2026-02-21T09:11:14.3450808Z error=6 2026-02-21T09:11:14.3451163Z ok=13 2026-02-21T09:11:14.3451543Z min=0.3241 2026-02-21T09:11:14.3451876Z mid=0.5142 2026-02-21T09:11:14.3452244Z max=24.8131 2026-02-21T09:11:14.3452643Z best={'block_sizes': [32, 128, 128], 2026-02-21T09:11:14.3453315Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:11:14.3454650Z 'l2_groupings': [64], 2026-02-21T09:11:14.3455338Z 'load_eviction_policies': ['first', 'first'], 2026-02-21T09:11:14.3455940Z 'loop_orders': [[1, 0]], 2026-02-21T09:11:14.3456358Z 'num_stages': 4, 2026-02-21T09:11:14.3457241Z 'num_warps': 4, 2026-02-21T09:11:14.3457646Z 'pid_type': 'flat', 2026-02-21T09:11:14.3458114Z 'range_flattens': [None, True], 2026-02-21T09:11:14.3458635Z 'range_multi_buffers': [None, False], 2026-02-21T09:11:14.3459112Z 'range_num_stages': [0, 3], 2026-02-21T09:11:14.3459587Z 'range_unroll_factors': [0, 0], 2026-02-21T09:11:14.3460053Z 'range_warp_specializes': []} 2026-02-21T09:11:14.3494162Z [619s] Fitting surrogate: 931 points, 931 targets 2026-02-21T09:11:14.7747980Z [620s] Generation 11 starting: 18 neighbors, 1 active search path(s) 2026-02-21T09:11:20.4668840Z Generation 11: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18/18 4.2 configs/s 2026-02-21T09:11:21.5184096Z Generation 11: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 18/18 17.9 configs/s 2026-02-21T09:11:24.0470572Z Generation 11: verifying top configs 100% ━━━━━━━━━━━━━━ 621/621 237.8 configs/s 2026-02-21T09:11:24.1920791Z [629s] Generation 11 complete: 2026-02-21T09:11:24.1921150Z error=5 2026-02-21T09:11:24.1921384Z ok=15 2026-02-21T09:11:24.1921658Z min=0.3279 2026-02-21T09:11:24.1921934Z mid=0.4023 2026-02-21T09:11:24.1922170Z max=22.6054 2026-02-21T09:11:24.1922455Z best={'block_sizes': [32, 128, 128], 2026-02-21T09:11:24.1923034Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 
2026-02-21T09:11:24.1923708Z 'l2_groupings': [64], 2026-02-21T09:11:24.1924087Z 'load_eviction_policies': ['first', 'first'], 2026-02-21T09:11:24.1924541Z 'loop_orders': [[1, 0]], 2026-02-21T09:11:24.1924851Z 'num_stages': 4, 2026-02-21T09:11:24.1925164Z 'num_warps': 4, 2026-02-21T09:11:24.1925449Z 'pid_type': 'flat', 2026-02-21T09:11:24.1925792Z 'range_flattens': [None, True], 2026-02-21T09:11:24.1926183Z 'range_multi_buffers': [None, False], 2026-02-21T09:11:24.1926915Z 'range_num_stages': [0, 3], 2026-02-21T09:11:24.1927322Z 'range_unroll_factors': [0, 0], 2026-02-21T09:11:24.1927678Z 'range_warp_specializes': []} 2026-02-21T09:11:24.1965498Z [629s] Fitting surrogate: 951 points, 951 targets 2026-02-21T09:11:24.6123450Z [630s] Generation 12 starting: 18 neighbors, 1 active search path(s) 2026-02-21T09:11:35.9116898Z Generation 12: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18/18 0.7 configs/s 2026-02-21T09:11:36.9782558Z Generation 12: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 18/18 16.6 configs/s 2026-02-21T09:11:38.3073872Z Generation 12: verifying top configs 100% ━━━━━━━━━━━━━━ 621/621 439.8 configs/s 2026-02-21T09:11:38.4243259Z [643s] Generation 12 complete: 2026-02-21T09:11:38.4243623Z error=6 2026-02-21T09:11:38.4243860Z ok=14 2026-02-21T09:11:38.4244116Z min=0.3242 2026-02-21T09:11:38.4244355Z mid=0.5834 2026-02-21T09:11:38.4244623Z max=21.3571 2026-02-21T09:11:38.4245863Z best={'block_sizes': [32, 128, 128], 2026-02-21T09:11:38.4246418Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:11:38.4247189Z 'l2_groupings': [64], 2026-02-21T09:11:38.4247533Z 'load_eviction_policies': ['first', 'first'], 2026-02-21T09:11:38.4247945Z 'loop_orders': [[1, 0]], 2026-02-21T09:11:38.4248560Z 'num_stages': 4, 2026-02-21T09:11:38.4248820Z 'num_warps': 4, 2026-02-21T09:11:38.4249093Z 'pid_type': 'flat', 2026-02-21T09:11:38.4249370Z 'range_flattens': [None, True], 2026-02-21T09:11:38.4249714Z 'range_multi_buffers': [None, 
False], 2026-02-21T09:11:38.4250067Z 'range_num_stages': [0, 3], 2026-02-21T09:11:38.4250386Z 'range_unroll_factors': [0, 0], 2026-02-21T09:11:38.4250697Z 'range_warp_specializes': []} 2026-02-21T09:11:38.4299596Z [643s] Fitting surrogate: 971 points, 971 targets 2026-02-21T09:11:38.8585052Z [644s] Generation 13 starting: 18 neighbors, 1 active search path(s) 2026-02-21T09:11:48.2133109Z Generation 13: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 18/18 1.0 configs/s 2026-02-21T09:11:49.6495053Z Generation 13: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 18/18 12.0 configs/s 2026-02-21T09:11:50.6971192Z Generation 13: verifying top configs 100% ━━━━━━━━━━━━━━ 621/621 546.8 configs/s 2026-02-21T09:11:50.8086937Z [656s] Generation 13 complete: 2026-02-21T09:11:50.8087313Z error=6 2026-02-21T09:11:50.8087571Z ok=14 2026-02-21T09:11:50.8087787Z min=0.3242 2026-02-21T09:11:50.8088040Z mid=1.0313 2026-02-21T09:11:50.8088258Z max=24.2537 2026-02-21T09:11:50.8088530Z best={'block_sizes': [32, 128, 128], 2026-02-21T09:11:50.8088993Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:11:50.8089495Z 'l2_groupings': [64], 2026-02-21T09:11:50.8089795Z 'load_eviction_policies': ['first', 'first'], 2026-02-21T09:11:50.8090169Z 'loop_orders': [[1, 0]], 2026-02-21T09:11:50.8090466Z 'num_stages': 4, 2026-02-21T09:11:50.8090713Z 'num_warps': 4, 2026-02-21T09:11:50.8090991Z 'pid_type': 'flat', 2026-02-21T09:11:50.8091257Z 'range_flattens': [None, True], 2026-02-21T09:11:50.8091602Z 'range_multi_buffers': [None, False], 2026-02-21T09:11:50.8091935Z 'range_num_stages': [0, 3], 2026-02-21T09:11:50.8092441Z 'range_unroll_factors': [0, 0], 2026-02-21T09:11:50.8092762Z 'range_warp_specializes': []} 2026-02-21T09:11:50.8131969Z [656s] Fitting surrogate: 991 points, 991 targets 2026-02-21T09:11:50.9827032Z [656s] Autotuning complete in 656.4s after searching 946 configs. 
2026-02-21T09:11:50.9827612Z One can hardcode the best config and skip autotuning with: 2026-02-21T09:11:50.9829647Z @helion.kernel(config=helion.Config(block_sizes=[32, 128, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['first', 'first'], loop_orders=[[1, 0]], num_stages=4, num_warps=4, pid_type='flat', range_flattens=[None, True], range_multi_buffers=[None, False], range_num_stages=[0, 3], range_unroll_factors=[0, 0], range_warp_specializes=[]), static_shapes=True) 2026-02-21T09:11:50.9831419Z 2026-02-21T09:11:50.9831888Z [656s] Code of selected kernel: /tmp/torchinductor_root/mx/cmx64pievj3dapm7ybq367dah2qbpqfykiq5khwktpx7pg3tlpr7.py 2026-02-21T09:11:52.2379174Z WARNING:tritonbench.utils.triton_op:Completed input ID 17: 2026-02-21T09:11:52.2379690Z x_val 2026-02-21T09:11:52.2379949Z --------------------- 2026-02-21T09:11:52.2380301Z (1, 4096, 8192, 1024) 2026-02-21T09:11:52.2380482Z 2026-02-21T09:11:52.2417657Z 60%|██████ | 6/10 [53:32<37:48, 567.07s/it]WARNING:tritonbench.utils.triton_op:Running input ID 21: 2026-02-21T09:11:52.2418437Z x_val 2026-02-21T09:11:52.2418852Z --------------------- 2026-02-21T09:11:52.2419243Z (4, 4096, 8192, 1024) 2026-02-21T09:11:52.2422939Z INFO:tritonbench.utils.triton_op:Took 0.26ms to get benchmark function for preprocessed_eager_int4_gemm 2026-02-21T09:11:53.4912668Z INFO:tritonbench.utils.triton_op:Took 3.49ms to get benchmark function for preprocessed_torch_compile_int4_gemm 2026-02-21T09:11:56.3906140Z Autotune Choices Stats: 2026-02-21T09:11:56.3908899Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 0.4460799992084503, "best_triton_pos": 1, "best_triton_time": 0.5281280279159546, "best_triton_kernel": "triton_mm_87", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4"} 
2026-02-21T09:11:56.3912716Z AUTOTUNE mm(16384x1024, 1024x8192) 2026-02-21T09:11:56.3913238Z strides: [1024, 1], [8192, 1] 2026-02-21T09:11:56.3913571Z dtypes: torch.bfloat16, torch.bfloat16 2026-02-21T09:11:56.3913909Z mm 0.4461 ms 100.0% 2026-02-21T09:11:56.3914622Z triton_mm_87 0.5281 ms 84.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T09:11:56.3915860Z triton_mm_86 0.5457 ms 81.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T09:11:56.3917429Z triton_mm_88 0.6391 ms 69.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2026-02-21T09:11:56.3918847Z triton_mm_81 0.7150 ms 62.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T09:11:56.3920073Z triton_mm_79 0.7330 ms 60.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T09:11:56.3921264Z triton_mm_80 0.7540 ms 59.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2026-02-21T09:11:56.3922492Z triton_mm_84 0.7719 ms 57.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2026-02-21T09:11:56.3923745Z triton_mm_82 0.7850 ms 56.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2026-02-21T09:11:56.3924735Z triton_mm_83 0.7863 ms 56.7% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T09:11:56.3925571Z SingleProcess AUTOTUNE benchmarking takes 0.7892 seconds and 1.8734 seconds precompiling for 20 choices 2026-02-21T09:11:57.8199438Z INFO:tritonbench.utils.triton_op:Took 0.19ms to get benchmark function for preprocessed_triton_int4_gemm 2026-02-21T09:11:58.9996386Z WARNING:__main__:Input tensor metadata: 2026-02-21T09:11:58.9996896Z { 'args': ( { 'device': 'cuda:0', 2026-02-21T09:11:58.9997200Z 'dtype': 'torch.bfloat16', 2026-02-21T09:11:58.9997547Z 'shape': (4, 4096, 1024), 2026-02-21T09:11:58.9997849Z 'stride': (4194304, 1024, 1)}, 2026-02-21T09:11:58.9998190Z { 'device': 'cuda:0', 2026-02-21T09:11:58.9998474Z 'dtype': 'torch.int32', 2026-02-21T09:11:58.9998789Z 'shape': (1024, 8192), 2026-02-21T09:11:58.9999068Z 'stride': (8192, 1)}), 2026-02-21T09:11:58.9999370Z 'kwargs': {}} 2026-02-21T09:11:59.0056848Z INFO:tritonbench.utils.triton_op:Took 6.31ms to get benchmark function for helion_int4_gemm_tritonbench 2026-02-21T09:11:59.4159671Z [0s] Autotune random seed: 2135373392 2026-02-21T09:11:59.5976968Z [0s] Starting LFBOPatternSearch with initial_population=FROM_RANDOM, copies=5, max_generations=20, similarity_penalty=1.0 2026-02-21T09:12:33.8833510Z [34s] Timeout after 30s compiling Config(block_sizes=[32, 2048, 2], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[8], load_eviction_policies=['', 'first'], loop_orders=[[0, 1]], maxnreg=64, num_sm_multiplier=4, num_stages=7, num_warps=8, pid_type='persistent_blocked', range_flattens=[True, None], range_multi_buffers=[False, False], range_num_stages=[3, 3], range_unroll_factors=[2, 0], range_warp_specializes=[]) 2026-02-21T09:12:39.6419027Z [40s] Timeout after 30s compiling Config(block_sizes=[4, 16384, 1], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[32], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=256, 
num_sm_multiplier=8, num_stages=7, num_warps=2, pid_type='persistent_blocked', range_flattens=[None, False], range_multi_buffers=[None, False], range_num_stages=[1, 0], range_unroll_factors=[1, 2], range_warp_specializes=[]) 2026-02-21T09:12:40.7352547Z [41s] Timeout after 30s compiling Config(block_sizes=[32, 2048, 1], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], num_stages=8, num_warps=2, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, None], range_num_stages=[0, 2], range_unroll_factors=[0, 3], range_warp_specializes=[]) 2026-02-21T09:12:51.9717642Z [52s] Timeout after 30s compiling Config(block_sizes=[16, 2048, 1], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['first', 'first'], loop_orders=[[1, 0]], num_stages=5, num_warps=2, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, True], range_num_stages=[0, 4], range_unroll_factors=[0, 3], range_warp_specializes=[]) 2026-02-21T09:12:51.9739203Z Initial population precompiling 100% ━━━━━━━━━━━━━━━━━━━━━ 100/100 0.5 configs/s 2026-02-21T09:12:54.8961258Z [55s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 32, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[1], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], num_sm_multiplier=32, num_stages=1, num_warps=1, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[False, False], range_num_stages=[0, 3], range_unroll_factors=[0, 1], range_warp_specializes=[]) 2026-02-21T09:12:54.8963239Z Tensor-likes are not close! 
2026-02-21T09:12:54.8963434Z 2026-02-21T09:12:54.8964048Z Mismatched elements: 134037821 / 134217728 (99.9%) 2026-02-21T09:12:54.8964582Z Greatest absolute difference: 3424.0 at index (2330, 2324) (up to 0.01 allowed) 2026-02-21T09:12:54.8965168Z Greatest relative difference: inf at index (3448, 3113) (up to 0.01 allowed) 2026-02-21T09:12:54.8965704Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:12:54.8965978Z 2026-02-21T09:13:20.3541109Z [80s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 16, 32], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[32], load_eviction_policies=['last', 'last'], loop_orders=[[1, 0]], num_sm_multiplier=4, num_stages=1, num_warps=2, pid_type='persistent_interleaved', range_flattens=[False, False], range_multi_buffers=[False, True], range_num_stages=[4, 1], range_unroll_factors=[0, 4], range_warp_specializes=[]) 2026-02-21T09:13:20.3542988Z Tensor-likes are not close! 2026-02-21T09:13:20.3543222Z 2026-02-21T09:13:20.3543393Z Mismatched elements: 133906199 / 134217728 (99.8%) 2026-02-21T09:13:20.3543939Z Greatest absolute difference: 2448.0 at index (11770, 6169) (up to 0.01 allowed) 2026-02-21T09:13:20.3544574Z Greatest relative difference: inf at index (3448, 3113) (up to 0.01 allowed) 2026-02-21T09:13:20.3545099Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:13:20.3545421Z 2026-02-21T09:13:31.7130147Z [92s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 128, 512], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['first', 'first'], loop_orders=[[1, 0]], num_sm_multiplier=64, num_stages=2, num_warps=32, pid_type='persistent_blocked', range_flattens=[True, None], range_multi_buffers=[False, True], range_num_stages=[4, 2], range_unroll_factors=[4, 1], range_warp_specializes=[]) 2026-02-21T09:13:31.7132195Z Tensor-likes are not close! 
2026-02-21T09:13:31.7132433Z 2026-02-21T09:13:31.7132639Z Mismatched elements: 134026375 / 134217728 (99.9%) 2026-02-21T09:13:31.7133142Z Greatest absolute difference: 3312.0 at index (11761, 3387) (up to 0.01 allowed) 2026-02-21T09:13:31.7133731Z Greatest relative difference: inf at index (3448, 3113) (up to 0.01 allowed) 2026-02-21T09:13:31.7134620Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:13:31.7134913Z 2026-02-21T09:14:03.2280151Z 2026-02-21T09:14:03.2280163Z 2026-02-21T09:14:03.2280563Z ================================================================ 2026-02-21T09:14:03.2281005Z Internal Triton PTX codegen error 2026-02-21T09:14:03.2281306Z `ptxas` stderr: 2026-02-21T09:14:03.2282110Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 300 in function _helion_matmul_bf16_int4. Try to compile with register target of 46 or higher. 2026-02-21T09:14:03.2283234Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:14:03.2283646Z 2026-02-21T09:14:03.2285238Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp3p90ww4s.ptx -o /tmp/tmp3p90ww4s.ptx.o 2026-02-21T09:14:03.2286098Z 2026-02-21T09:14:03.2286113Z 2026-02-21T09:14:03.2286215Z // 2026-02-21T09:14:03.2286656Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:14:03.2287029Z // 2026-02-21T09:14:03.2287168Z 2026-02-21T09:14:03.2287271Z .version 8.7 2026-02-21T09:14:03.2287546Z .target sm_90a 2026-02-21T09:14:03.2287814Z .address_size 64 2026-02-21T09:14:03.2287958Z 2026-02-21T09:14:03.2288216Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T09:14:03.2288756Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:14:03.2289134Z // @_helion_matmul_bf16_int4 2026-02-21T09:14:03.2289551Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T09:14:03.2289976Z .param .u64 .ptr .global .align 1 
_helion_matmul_bf16_int4_param_0, 2026-02-21T09:14:03.2290681Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T09:14:03.2291216Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T09:14:03.2291702Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T09:14:03.2292248Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T09:14:03.2292643Z ) 2026-02-21T09:14:03.2292846Z .reqntid 1024 2026-02-21T09:14:03.2293041Z .maxnreg 32 2026-02-21T09:14:03.2293249Z { 2026-02-21T09:14:03.2293434Z .reg .pred %p<34>; 2026-02-21T09:14:03.2293689Z .reg .b16 %rs<59>; 2026-02-21T09:14:03.2293922Z .reg .b32 %r<1170>; 2026-02-21T09:14:03.2294129Z .reg .b64 %rd<144>; 2026-02-21T09:14:03.2294544Z .loc 1 14 0 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:14:0 2026-02-21T09:14:03.2294985Z $L__func_begin0: 2026-02-21T09:14:03.2295372Z .loc 1 14 0 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:14:0 2026-02-21T09:14:03.2295718Z 2026-02-21T09:14:03.2295804Z // %bb.0: 2026-02-21T09:14:03.2296088Z ld.param.b64 %rd34, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T09:14:03.2296646Z ld.param.b64 %rd33, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T09:14:03.2297052Z ld.param.b64 %rd32, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T09:14:03.2297383Z $L__tmp0: 2026-02-21T09:14:03.2310113Z .loc 1 19 46 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:19:46 2026-02-21T09:14:03.2310682Z mov.u32 %r1089, %ctaid.x; 2026-02-21T09:14:03.2311089Z .loc 1 0 0 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:0 2026-02-21T09:14:03.2311531Z sub.s32 %r224, 12415, %r1089; 2026-02-21T09:14:03.2311934Z .loc 1 19 144 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:19:144 2026-02-21T09:14:03.2312405Z mul.hi.u32 %r225, %r224, 1041204193; 2026-02-21T09:14:03.2312815Z shr.u32 %r226, %r225, 10; 2026-02-21T09:14:03.2313079Z mul.hi.u32 %r227, %r226, 1431655766; 
2026-02-21T09:14:03.2313373Z mad.lo.s32 %r1150, %r227, 12672, %r1089; 2026-02-21T09:14:03.2313782Z .loc 1 31 45 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:31:45 2026-02-21T09:14:03.2314470Z mov.u32 %r3, %tid.x; 2026-02-21T09:14:03.2314680Z shr.u32 %r4, %r3, 5; 2026-02-21T09:14:03.2314920Z shr.u32 %r5, %r3, 2; 2026-02-21T09:14:03.2315150Z or.b32 %r6, %r5, 256; 2026-02-21T09:14:03.2315508Z .loc 1 33 45 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:33:45 2026-02-21T09:14:03.2315949Z and.b32 %r7, %r3, 31; 2026-02-21T09:14:03.2316161Z and.b32 %r8, %r3, 3; 2026-02-21T09:14:03.2316389Z shl.b32 %r9, %r8, 3; 2026-02-21T09:14:03.2316908Z .loc 1 41 48 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:41:48 2026-02-21T09:14:03.2317371Z bfe.u32 %r10, %r3, 6, 3; 2026-02-21T09:14:03.2317881Z .loc 1 47 38 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:47:38 2026-02-21T09:14:03.2318421Z shl.b32 %r11, %r8, 2; 2026-02-21T09:14:03.2318792Z .loc 1 65 38 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:65:38 2026-02-21T09:14:03.2319186Z and.b32 %r12, %r3, 32; 2026-02-21T09:14:03.2319587Z .loc 1 19 144 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:19:144 2026-02-21T09:14:03.2319996Z setp.ge.s32 %p1, %r1089, %r1150; 2026-02-21T09:14:03.2320271Z shl.b32 %r1078, %r3, 3; 2026-02-21T09:14:03.2320490Z bfe.s32 %r1079, %r3, 4, 1; 2026-02-21T09:14:03.2320752Z mov.b32 %r1080, global_smem; 2026-02-21T09:14:03.2321007Z shl.b32 %r1081, %r3, 4; 2026-02-21T09:14:03.2321230Z shl.b32 %r1082, %r8, 1; 2026-02-21T09:14:03.2321468Z shl.b32 %r1083, %r7, 6; 2026-02-21T09:14:03.2321680Z shr.u32 %r1084, %r3, 3; 2026-02-21T09:14:03.2321934Z shl.b32 %r1085, %r3, 10; 2026-02-21T09:14:03.2322569Z [123s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:14:03.2324272Z Config: @helion.kernel(config=helion.Config(block_sizes=[8, 512, 32], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[2], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=32, num_stages=7, num_warps=32, pid_type='persistent_interleaved', range_flattens=[False, True], range_multi_buffers=[False, False], range_num_stages=[3, 3], range_unroll_factors=[3, 0], range_warp_specializes=[]), static_shapes=True) 2026-02-21T09:14:03.2325824Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:14:03.2326157Z `ptxas` stderr: 2026-02-21T09:14:03.2326948Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 300 in function _helion_matmul_bf16_int4. Try to compile with register target of 46 or higher. 2026-02-21T09:14:03.2327668Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:14:03.2327887Z 2026-02-21T09:14:03.2328426Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp3p90ww4s.ptx -o /tmp/tmp3p90ww4s.ptx.o 2026-02-21T09:14:03.2329060Z 2026-02-21T09:14:03.2329240Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:14:03.2329618Z shl.b32 %r1086, %r8, 5; 2026-02-21T09:14:03.2329843Z shl.b32 %r1087, %r3, 2; 2026-02-21T09:14:03.2330091Z shl.b32 %r1088, %r5, 10; 2026-02-21T09:14:03.2330318Z setp.eq.b32 %p33, %r12, 0; 2026-02-21T09:14:03.2330595Z cvt.u64.u32 %rd134, %r11; 2026-02-21T09:14:03.2330831Z @%p1 bra $L__BB0_9; 2026-02-21T09:14:03.2331100Z // %bb.1: // %.lr.ph 2026-02-21T09:14:03.2331538Z .loc 1 0 144 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:0:144 2026-02-21T09:14:03.2332011Z and.b32 %r229, %r1078, 8056; 2026-02-21T09:14:03.2332283Z and.b32 %r231, %r1079, 136; 2026-02-21T09:14:03.2332515Z xor.b32 %r13, %r231, %r229; 2026-02-21T09:14:03.2332774Z add.s32 %r14, %r1080, %r13; 2026-02-21T09:14:03.2333012Z add.s32 %r15, %r14, 8192; 2026-02-21T09:14:03.2333270Z add.s32 %r16, %r14, 16384; 2026-02-21T09:14:03.2333491Z add.s32 %r17, %r14, 24576; 2026-02-21T09:14:03.2333874Z and.b32 %r234, %r1081, 15872; 2026-02-21T09:14:03.2334106Z and.b32 %r235, %r1078, 96; 2026-02-21T09:14:03.2334343Z or.b32 %r237, %r234, %r235; 2026-02-21T09:14:03.2334566Z or.b32 %r238, %r237, %r1082; 2026-02-21T09:14:03.2334821Z or.b32 %r18, %r238, %r231; 2026-02-21T09:14:03.2335067Z xor.b32 %r19, %r18, 8; 2026-02-21T09:14:03.2335282Z and.b32 %r240, %r1078, 48; 2026-02-21T09:14:03.2335528Z and.b32 %r242, %r1084, 60; 2026-02-21T09:14:03.2335759Z xor.b32 %r243, %r240, %r242; 2026-02-21T09:14:03.2336018Z add.s32 %r244, %r1080, 32768; 2026-02-21T09:14:03.2336245Z add.s32 %r245, %r244, %r1083; 2026-02-21T09:14:03.2336638Z add.s32 %r20, %r245, %r243; 2026-02-21T09:14:03.2336890Z bfe.u32 %r246, %r244, 4, 14; 2026-02-21T09:14:03.2337253Z cvt.u64.u32 %rd35, %r246; 2026-02-21T09:14:03.2337607Z or.b64 %rd1, %rd35, -9223371899407433728; 2026-02-21T09:14:03.2337876Z add.s32 %r247, %r1080, 32800; 2026-02-21T09:14:03.2338131Z bfe.u32 %r248, %r247, 4, 14; 2026-02-21T09:14:03.2338358Z cvt.u64.u32 %rd36, %r248; 2026-02-21T09:14:03.2338617Z or.b64 %rd2, %rd36, -9223371899407433728; 
2026-02-21T09:14:03.2338868Z and.b32 %r250, %r1085, 24576; 2026-02-21T09:14:03.2339124Z and.b32 %r252, %r3, 992; 2026-02-21T09:14:03.2339343Z shl.b32 %r253, %r252, 3; 2026-02-21T09:14:03.2339604Z and.b32 %r255, %r1087, 112; 2026-02-21T09:14:03.2339834Z or.b32 %r256, %r1086, %r253; 2026-02-21T09:14:03.2340088Z xor.b32 %r257, %r256, %r255; 2026-02-21T09:14:03.2340346Z add.s32 %r258, %r1080, %r250; 2026-02-21T09:14:03.2340579Z add.s32 %r21, %r258, %r257; 2026-02-21T09:14:03.2340833Z and.b32 %r259, %r3, 6; 2026-02-21T09:14:03.2341051Z shl.b32 %r260, %r259, 12; 2026-02-21T09:14:03.2341299Z and.b32 %r261, %r1081, 112; 2026-02-21T09:14:03.2341523Z and.b32 %r262, %r1087, 4064; 2026-02-21T09:14:03.2341853Z or.b32 %r263, %r260, %r261; 2026-02-21T09:14:03.2342089Z xor.b32 %r264, %r263, %r262; 2026-02-21T09:14:03.2342352Z add.s32 %r438, %r1080, %r264; 2026-02-21T09:14:03.2342578Z add.s32 %r443, %r438, 4096; 2026-02-21T09:14:03.2342830Z shl.b32 %r265, %r252, 4; 2026-02-21T09:14:03.2343077Z or.b32 %r266, %r265, %r235; 2026-02-21T09:14:03.2343300Z or.b32 %r267, %r266, %r1082; 2026-02-21T09:14:03.2343551Z or.b32 %r24, %r267, %r231; 2026-02-21T09:14:03.2343775Z xor.b32 %r25, %r24, 8; 2026-02-21T09:14:03.2344034Z shl.b32 %r268, %r259, 3; 2026-02-21T09:14:03.2344253Z xor.b32 %r269, %r268, %r242; 2026-02-21T09:14:03.2344507Z add.s32 %r26, %r245, %r269; 2026-02-21T09:14:03.2344895Z .loc 1 19 144 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:19:144 2026-02-21T09:14:03.2345359Z mad.wide.u32 %rd37, %r8, 8, %rd32; 2026-02-21T09:14:03.2345643Z add.s64 %rd3, %rd37, 64; 2026-02-21T09:14:03.2346015Z .loc 1 40 102 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:40:102 2026-02-21T09:14:03.2346593Z or.b32 %r271, %r1088, %r11; 2026-02-21T09:14:03.2346840Z or.b32 %r28, %r271, 262176; 2026-02-21T09:14:03.2347251Z .loc 1 19 144 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:19:144 2026-02-21T09:14:03.2347651Z shl.b32 %r272, %r10, 13; 2026-02-21T09:14:03.2348041Z 
.loc 1 40 102 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:40:102 2026-02-21T09:14:03.2348552Z or.b32 %r29, %r272, %r7; 2026-02-21T09:14:03.2348827Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:14:03.2349194Z // Child Loop BB0_3 Depth 2 2026-02-21T09:14:03.2349505Z // Child Loop BB0_5 Depth 2 2026-02-21T09:14:03.2349846Z // Child Loop BB0_7 Depth 2 2026-02-21T09:14:03.2350271Z .loc 1 25 35 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:25:35 2026-02-21T09:14:03.2350707Z shr.s32 %r284, %r1089, 31; 2026-02-21T09:14:03.2350961Z shr.u32 %r285, %r284, 23; 2026-02-21T09:14:03.2351295Z add.s32 %r286, %r1089, %r285; 2026-02-21T09:14:03.2351549Z shr.s32 %r287, %r286, 9; 2026-02-21T09:14:03.2351902Z .loc 1 26 33 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:26:33 2026-02-21T09:14:03.2352333Z shl.b32 %r288, %r287, 1; 2026-02-21T09:14:03.2352684Z .loc 1 27 39 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:27:39 2026-02-21T09:14:03.2353127Z sub.s32 %r289, 32, %r288; 2026-02-21T09:14:03.2353508Z .loc 1 27 52 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:27:52 2026-02-21T09:14:03.2353900Z min.s32 %r290, %r289, 2; 2026-02-21T09:14:03.2354279Z .loc 1 28 45 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:28:45 2026-02-21T09:14:03.2354839Z and.b32 %r291, %r286, -512; 2026-02-21T09:14:03.2355114Z sub.s32 %r292, %r1089, %r291; 2026-02-21T09:14:03.2355478Z .loc 1 29 51 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:29:51 2026-02-21T09:14:03.2355905Z div.s32 %r293, %r292, %r290; 2026-02-21T09:14:03.2356296Z .loc 1 28 64 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:28:64 2026-02-21T09:14:03.2356862Z mul.lo.s32 %r294, %r293, %r290; 2026-02-21T09:14:03.2357137Z sub.s32 %r295, %r292, %r294; 2026-02-21T09:14:03.2357529Z .loc 1 28 30 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:28:30 2026-02-21T09:14:03.2357967Z add.s32 %r296, %r295, %r288; 2026-02-21T09:14:03.2358332Z 
.loc 1 30 27 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:30:27 2026-02-21T09:14:03.2358753Z shl.b32 %r297, %r296, 9; 2026-02-21T09:14:03.2359140Z .loc 1 31 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:31:32 2026-02-21T09:14:03.2359619Z or.b32 %r44, %r297, %r5; 2026-02-21T09:14:03.2359870Z or.b32 %r45, %r297, %r6; 2026-02-21T09:14:03.2360225Z .loc 1 32 27 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:32:27 2026-02-21T09:14:03.2360657Z shl.b32 %r46, %r293, 5; 2026-02-21T09:14:03.2361006Z .loc 1 48 53 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:53 2026-02-21T09:14:03.2361427Z shl.b32 %r298, %r44, 10; 2026-02-21T09:14:03.2361675Z shl.b32 %r299, %r45, 10; 2026-02-21T09:14:03.2362026Z .loc 1 48 60 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:60 2026-02-21T09:14:03.2362457Z or.b32 %r300, %r298, %r11; 2026-02-21T09:14:03.2362684Z or.b32 %r301, %r299, %r11; 2026-02-21T09:14:03.2363071Z .loc 1 48 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:32 2026-02-21T09:14:03.2363585Z mad.wide.s32 %rd38, %r300, 2, %rd32; 2026-02-21T09:14:03.2363862Z mad.wide.s32 %rd39, %r301, 2, %rd32; 2026-02-21T09:14:03.2364275Z .loc 1 48 80 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:80 2026-02-21T09:14:03.2364668Z bar.sync 0; 2026-02-21T09:14:03.2364888Z mov.b32 %r274, 8; 2026-02-21T09:14:03.2365091Z // begin inline asm 2026-02-21T09:14:03.2365402Z cp.async.ca.shared.global [ %r14 + 0 ], [ %rd38 + 0 ], 0x8, %r274; 2026-02-21T09:14:03.2365728Z // end inline asm 2026-02-21T09:14:03.2365953Z // begin inline asm 2026-02-21T09:14:03.2366227Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd39 + 0 ], 0x8, %r274; 2026-02-21T09:14:03.2366725Z // end inline asm 2026-02-21T09:14:03.2366970Z cp.async.commit_group; 2026-02-21T09:14:03.2367331Z .loc 1 48 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:32 2026-02-21T09:14:03.2367761Z cvt.s64.s32 %rd43, %r298; 
2026-02-21T09:14:03.2367987Z or.b64 %rd44, %rd43, %rd134; 2026-02-21T09:14:03.2368244Z shl.b64 %rd45, %rd44, 1; 2026-02-21T09:14:03.2368465Z add.s64 %rd46, %rd32, %rd45; 2026-02-21T09:14:03.2368717Z add.s64 %rd40, %rd46, 32; 2026-02-21T09:14:03.2368935Z cvt.s64.s32 %rd47, %r299; 2026-02-21T09:14:03.2369305Z or.b64 %rd48, %rd47, %rd134; 2026-02-21T09:14:03.2369553Z shl.b64 %rd49, %rd48, 1; 2026-02-21T09:14:03.2369774Z add.s64 %rd50, %rd32, %rd49; 2026-02-21T09:14:03.2370022Z add.s64 %rd41, %rd50, 32; 2026-02-21T09:14:03.2370374Z .loc 1 48 80 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:80 2026-02-21T09:14:03.2370792Z bar.sync 0; 2026-02-21T09:14:03.2370989Z // begin inline asm 2026-02-21T09:14:03.2371290Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd40 + 0 ], 0x8, %r274; 2026-02-21T09:14:03.2371606Z // end inline asm 2026-02-21T09:14:03.2371842Z // begin inline asm 2026-02-21T09:14:03.2372137Z cp.async.ca.shared.global [ %r17 + 0 ], [ %rd41 + 0 ], 0x8, %r274; 2026-02-21T09:14:03.2372455Z // end inline asm 2026-02-21T09:14:03.2372878Z cp.async.commit_group; 2026-02-21T09:14:03.2373249Z .loc 1 40 102 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:40:102 2026-02-21T09:14:03.2373684Z shl.b32 %r302, %r296, 19; 2026-02-21T09:14:03.2373902Z or.b32 %r303, %r1088, %r302; 2026-02-21T09:14:03.2374163Z mad.wide.s32 %rd135, %r303, 2, %rd3; 2026-02-21T09:14:03.2374413Z shl.b32 %r304, %r1089, 19; 2026-02-21T09:14:03.2374667Z or.b32 %r305, %r28, %r304; 2026-02-21T09:14:03.2374908Z shl.b32 %r306, %r294, 19; 2026-02-21T09:14:03.2375121Z sub.s32 %r307, %r305, %r306; 2026-02-21T09:14:03.2375378Z mul.lo.s32 %r308, %r287, 267386880; 2026-02-21T09:14:03.2375623Z sub.s32 %r1091, %r307, %r308; 2026-02-21T09:14:03.2375878Z add.s32 %r1090, %r29, %r46; 2026-02-21T09:14:03.2376099Z mov.b32 %r1094, 0f00000000; 2026-02-21T09:14:03.2376338Z mov.b32 %r1093, 1; 2026-02-21T09:14:03.2376669Z mov.b32 %r1092, -1; 2026-02-21T09:14:03.2376922Z mov.b64 %rd136, -8; 
2026-02-21T09:14:03.2377133Z mov.b32 %r1095, %r1094; 2026-02-21T09:14:03.2377422Z mov.b32 %r1096, %r1094; 2026-02-21T09:14:03.2377601Z mov.b32 %r1097, %r1094; 2026-02-21T09:14:03.2377767Z mov.b32 %r1098, %r1094; 2026-02-21T09:14:03.2377935Z mov.b32 %r1099, %r1094; 2026-02-21T09:14:03.2378109Z mov.b32 %r1100, %r1094; 2026-02-21T09:14:03.2378276Z mov.b32 %r1101, %r1094; 2026-02-21T09:14:03.2378445Z mov.b32 %r1102, %r1094; 2026-02-21T09:14:03.2378623Z mov.b32 %r1103, %r1094; 2026-02-21T09:14:03.2378795Z mov.b32 %r1104, %r1094; 2026-02-21T09:14:03.2378962Z mov.b32 %r1105, %r1094; 2026-02-21T09:14:03.2379132Z mov.b32 %r1106, %r1094; 2026-02-21T09:14:03.2379296Z mov.b32 %r1107, %r1094; 2026-02-21T09:14:03.2379465Z mov.b32 %r1108, %r1094; 2026-02-21T09:14:03.2379638Z mov.b32 %r1109, %r1094; 2026-02-21T09:14:03.2379868Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T09:14:03.2380175Z // => This Inner Loop Header: Depth=2 2026-02-21T09:14:03.2380432Z add.s64 %rd136, %rd136, 8; 2026-02-21T09:14:03.2380628Z setp.lt.u64 %p5, %rd136, 496; 2026-02-21T09:14:03.2380818Z add.s32 %r423, %r1092, 1; 2026-02-21T09:14:03.2381001Z setp.gt.s32 %p6, %r423, 1; 2026-02-21T09:14:03.2381188Z selp.b32 %r1092, 0, %r423, %p6; 2026-02-21T09:14:03.2381530Z .loc 1 48 80 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:80 2026-02-21T09:14:03.2381897Z cp.async.wait_group 1; 2026-02-21T09:14:03.2382067Z bar.sync 0; 2026-02-21T09:14:03.2382220Z shl.b32 %r424, %r1092, 14; 2026-02-21T09:14:03.2382395Z add.s32 %r426, %r1080, %r424; 2026-02-21T09:14:03.2382720Z .loc 1 52 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:52:32 2026-02-21T09:14:03.2383073Z add.s32 %r427, %r426, %r18; 2026-02-21T09:14:03.2383265Z ld.shared.b16 %rs2, [%r427]; 2026-02-21T09:14:03.2383456Z ld.shared.b16 %rs3, [%r427+256]; 2026-02-21T09:14:03.2383662Z ld.shared.b16 %rs4, [%r427+16]; 2026-02-21T09:14:03.2383865Z ld.shared.b16 %rs5, [%r427+272]; 2026-02-21T09:14:03.2384069Z add.s32 %r428, %r426, %r19; 
2026-02-21T09:14:03.2384260Z ld.shared.b16 %rs6, [%r428]; 2026-02-21T09:14:03.2384442Z ld.shared.b16 %rs7, [%r428+256]; 2026-02-21T09:14:03.2384727Z ld.shared.b16 %rs8, [%r428+16]; 2026-02-21T09:14:03.2384919Z ld.shared.b16 %rs9, [%r428+272]; 2026-02-21T09:14:03.2385134Z cvt.f32.bf16 %r341, %rs2; 2026-02-21T09:14:03.2385311Z cvt.f32.bf16 %r342, %rs3; 2026-02-21T09:14:03.2385489Z cvt.f32.bf16 %r343, %rs6; 2026-02-21T09:14:03.2385662Z cvt.f32.bf16 %r344, %rs7; 2026-02-21T09:14:03.2385840Z cvt.f32.bf16 %r377, %rs4; 2026-02-21T09:14:03.2386018Z cvt.f32.bf16 %r378, %rs5; 2026-02-21T09:14:03.2386191Z cvt.f32.bf16 %r379, %rs8; 2026-02-21T09:14:03.2386367Z cvt.f32.bf16 %r380, %rs9; 2026-02-21T09:14:03.2386812Z .loc 1 54 34 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:54:34 2026-02-21T09:14:03.2387191Z cvt.s64.s32 %rd58, %r1090; 2026-02-21T09:14:03.2387453Z add.s64 %rd52, %rd33, %rd58; 2026-02-21T09:14:03.2387845Z .loc 1 54 87 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:54:87 2026-02-21T09:14:03.2388201Z // begin inline asm 2026-02-21T09:14:03.2388469Z mov.u64 %rd51, 0x0; 2026-02-21T09:14:03.2388722Z createpolicy.fractional.L2::evict_last.b64 %rd51, 1.0; 2026-02-21T09:14:03.2388995Z // end inline asm 2026-02-21T09:14:03.2389158Z // begin inline asm 2026-02-21T09:14:03.2389320Z mov.u16 %rs1, 0x0; 2026-02-21T09:14:03.2389582Z ld.global.L1::evict_last.L2::cache_hint.b8 { %rs1 }, [ %rd52 + 0 ], %rd51; 2026-02-21T09:14:03.2389880Z // end inline asm 2026-02-21T09:14:03.2390201Z .loc 1 57 28 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:57:28 2026-02-21T09:14:03.2390573Z shl.b16 %rs10, %rs1, 4; 2026-02-21T09:14:03.2390895Z .loc 1 72 58 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:72:58 2026-02-21T09:14:03.2391270Z selp.b16 %rs11, %rs10, %rs1, %p33; 2026-02-21T09:14:03.2391482Z cvt.s16.s8 %rs12, %rs11; 2026-02-21T09:14:03.2391752Z shr.s16 %rs13, %rs12, 4; 2026-02-21T09:14:03.2392070Z .loc 1 77 32 // 
csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:77:32 2026-02-21T09:14:03.2392436Z cvt.rn.f32.s16 %r429, %rs13; 2026-02-21T09:14:03.2392626Z st.shared.b32 [%r20], %r429; 2026-02-21T09:14:03.2392807Z $L__tmp1: 2026-02-21T09:14:03.2393167Z .loc 2 291 36 // standard.py:291:36 @[ csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:84:40 ] 2026-02-21T09:14:03.2393585Z // begin inline asm 2026-02-21T09:14:03.2393769Z fence.proxy.async.shared::cta; 2026-02-21T09:14:03.2393961Z // end inline asm 2026-02-21T09:14:03.2394119Z bar.sync 0; 2026-02-21T09:14:03.2394290Z shfl.sync.idx.b32 %r430, %r4, 0, 31, -1; 2026-02-21T09:14:03.2394523Z wgmma.fence.sync.aligned; 2026-02-21T09:14:03.2394704Z mov.pred %p2, -1; 2026-02-21T09:14:03.2394868Z // begin inline asm 2026-02-21T09:14:03.2395479Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1094,%r1095,%r1096,%r1097,%r1098,%r1099,%r1100,%r1101,%r1102,%r1103,%r1104,%r1105,%r1106,%r1107,%r1108,%r1109}, {%r341,%r342,%r343,%r344}, %rd1, %p2, 1, 1; 2026-02-21T09:14:03.2396117Z // end inline asm 2026-02-21T09:14:03.2396280Z // begin inline asm 2026-02-21T09:14:03.2396998Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1094,%r1095,%r1096,%r1097,%r1098,%r1099,%r1100,%r1101,%r1102,%r1103,%r1104,%r1105,%r1106,%r1107,%r1108,%r1109}, {%r377,%r378,%r379,%r380}, %rd2, %p2, 1, 1; 2026-02-21T09:14:03.2397653Z // end inline asm 2026-02-21T09:14:03.2397828Z wgmma.commit_group.sync.aligned; 2026-02-21T09:14:03.2398027Z mov.b32 %r399, 0; 2026-02-21T09:14:03.2398187Z mov.b32 %r398, %r399; 2026-02-21T09:14:03.2398352Z mov.b32 %r397, %r244; 2026-02-21T09:14:03.2398521Z // begin inline asm 2026-02-21T09:14:03.2398919Z // wait for regs: %r1094,%r1095,%r1096,%r1097,%r1098,%r1099,%r1100,%r1101,%r1102,%r1103,%r1104,%r1105,%r1106,%r1107,%r1108,%r1109,%r397,%r398,%r399 2026-02-21T09:14:03.2399398Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:14:03.2399593Z // end inline asm 2026-02-21T09:14:03.2399748Z $L__tmp2: 
2026-02-21T09:14:03.2400055Z .loc 1 40 102 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:40:102 2026-02-21T09:14:03.2400524Z add.s32 %r431, %r1093, 1; 2026-02-21T09:14:03.2400715Z setp.gt.s32 %p7, %r431, 1; 2026-02-21T09:14:03.2400903Z selp.b32 %r1093, 0, %r431, %p7; 2026-02-21T09:14:03.2401255Z .loc 1 48 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:32 2026-02-21T09:14:03.2401620Z mad.wide.s32 %rd57, %r1091, 2, %rd32; 2026-02-21T09:14:03.2401968Z .loc 1 48 80 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:80 2026-02-21T09:14:03.2402318Z shl.b32 %r432, %r1093, 14; 2026-02-21T09:14:03.2402503Z add.s32 %r433, %r1080, %r432; 2026-02-21T09:14:03.2402697Z add.s32 %r419, %r433, %r13; 2026-02-21T09:14:03.2402982Z selp.b32 %r420, 8, 0, %p5; 2026-02-21T09:14:03.2403256Z // begin inline asm 2026-02-21T09:14:03.2403503Z cp.async.ca.shared.global [ %r419 + 0 ], [ %rd135 + 0 ], 0x8, %r420; 2026-02-21T09:14:03.2403791Z // end inline asm 2026-02-21T09:14:03.2403950Z add.s32 %r421, %r419, 8192; 2026-02-21T09:14:03.2404139Z // begin inline asm 2026-02-21T09:14:03.2404370Z cp.async.ca.shared.global [ %r421 + 0 ], [ %rd57 + 0 ], 0x8, %r420; 2026-02-21T09:14:03.2404651Z // end inline asm 2026-02-21T09:14:03.2404820Z cp.async.commit_group; 2026-02-21T09:14:03.2405142Z .loc 1 40 102 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:40:102 2026-02-21T09:14:03.2405514Z add.s64 %rd135, %rd135, 32; 2026-02-21T09:14:03.2405696Z add.s32 %r1091, %r1091, 16; 2026-02-21T09:14:03.2405897Z add.s32 %r1090, %r1090, 65536; 2026-02-21T09:14:03.2406093Z setp.lt.u64 %p8, %rd136, 504; 2026-02-21T09:14:03.2406281Z @%p8 bra $L__BB0_3; 2026-02-21T09:14:03.2406622Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:14:03.2407131Z .loc 1 33 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:33:32 2026-02-21T09:14:03.2407511Z or.b32 %r463, %r46, %r9; 2026-02-21T09:14:03.2407834Z .loc 1 40 102 // 
csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:40:102 2026-02-21T09:14:03.2408207Z cp.async.wait_group 0; 2026-02-21T09:14:03.2408379Z bar.sync 0; 2026-02-21T09:14:03.2408670Z .loc 1 87 28 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:87:28 2026-02-21T09:14:03.2409034Z cvt.rn.bf16x2.f32 %r464, %r1095, %r1094; 2026-02-21T09:14:03.2409264Z cvt.rn.bf16x2.f32 %r465, %r1097, %r1096; 2026-02-21T09:14:03.2409486Z cvt.rn.bf16x2.f32 %r466, %r1099, %r1098; 2026-02-21T09:14:03.2409707Z cvt.rn.bf16x2.f32 %r467, %r1101, %r1100; 2026-02-21T09:14:03.2409928Z cvt.rn.bf16x2.f32 %r468, %r1103, %r1102; 2026-02-21T09:14:03.2410142Z cvt.rn.bf16x2.f32 %r469, %r1105, %r1104; 2026-02-21T09:14:03.2410366Z cvt.rn.bf16x2.f32 %r470, %r1107, %r1106; 2026-02-21T09:14:03.2410595Z cvt.rn.bf16x2.f32 %r471, %r1109, %r1108; 2026-02-21T09:14:03.2410946Z .loc 1 88 43 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:88:43 2026-02-21T09:14:03.2411308Z shl.b32 %r472, %r44, 13; 2026-02-21T09:14:03.2411483Z shl.b32 %r473, %r45, 13; 2026-02-21T09:14:03.2411796Z .loc 1 88 50 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:88:50 2026-02-21T09:14:03.2412150Z add.s32 %r474, %r472, %r463; 2026-02-21T09:14:03.2412342Z add.s32 %r475, %r473, %r463; 2026-02-21T09:14:03.2412658Z .loc 1 88 22 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:88:22 2026-02-21T09:14:03.2413024Z mad.wide.s32 %rd59, %r474, 2, %rd34; 2026-02-21T09:14:03.2413233Z mad.wide.s32 %rd60, %r475, 2, %rd34; 2026-02-21T09:14:03.2413574Z .loc 1 88 81 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:88:81 2026-02-21T09:14:03.2413980Z st.shared.v4.b32 [%r21], {%r464, %r466, %r468, %r470}; 2026-02-21T09:14:03.2414410Z st.shared.v4.b32 [%r21+128], {%r465, %r467, %r469, %r471}; 2026-02-21T09:14:03.2414666Z bar.sync 0; 2026-02-21T09:14:03.2414918Z // begin inline asm 2026-02-21T09:14:03.2415203Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r444, %r445, %r446, %r447}, [%r438]; 
2026-02-21T09:14:03.2415530Z // end inline asm 2026-02-21T09:14:03.2415690Z // begin inline asm 2026-02-21T09:14:03.2415969Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r448, %r449, %r450, %r451}, [%r443]; 2026-02-21T09:14:03.2416297Z // end inline asm 2026-02-21T09:14:03.2416597Z // begin inline asm 2026-02-21T09:14:03.2416820Z st.global.v4.b32 [ %rd59 + 0 ], { %r444, %r445, %r446, %r447 }; 2026-02-21T09:14:03.2417091Z // end inline asm 2026-02-21T09:14:03.2417246Z // begin inline asm 2026-02-21T09:14:03.2417459Z st.global.v4.b32 [ %rd60 + 0 ], { %r448, %r449, %r450, %r451 }; 2026-02-21T09:14:03.2417794Z // end inline asm 2026-02-21T09:14:03.2418175Z .loc 1 19 144 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:19:144 2026-02-21T09:14:03.2418551Z add.s32 %r476, %r1089, 4224; 2026-02-21T09:14:03.2418888Z .loc 1 25 35 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:25:35 2026-02-21T09:14:03.2419259Z shr.s32 %r477, %r476, 31; 2026-02-21T09:14:03.2419448Z shr.u32 %r478, %r477, 23; 2026-02-21T09:14:03.2419640Z add.s32 %r479, %r476, %r478; 2026-02-21T09:14:03.2419827Z shr.s32 %r480, %r479, 9; 2026-02-21T09:14:03.2420160Z .loc 1 26 33 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:26:33 2026-02-21T09:14:03.2420526Z shl.b32 %r481, %r480, 1; 2026-02-21T09:14:03.2420850Z .loc 1 27 39 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:27:39 2026-02-21T09:14:03.2421224Z sub.s32 %r482, 32, %r481; 2026-02-21T09:14:03.2421533Z .loc 1 27 52 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:27:52 2026-02-21T09:14:03.2421966Z min.s32 %r483, %r482, 2; 2026-02-21T09:14:03.2422282Z .loc 1 28 45 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:28:45 2026-02-21T09:14:03.2422638Z and.b32 %r484, %r479, -512; 2026-02-21T09:14:03.2422823Z sub.s32 %r485, %r476, %r484; 2026-02-21T09:14:03.2423160Z .loc 1 29 51 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:29:51 2026-02-21T09:14:03.2423515Z div.s32 %r486, %r485, %r483; 
2026-02-21T09:14:03.2423829Z .loc 1 28 64 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:28:64 2026-02-21T09:14:03.2424186Z mul.lo.s32 %r487, %r486, %r483; 2026-02-21T09:14:03.2424377Z sub.s32 %r488, %r485, %r487; 2026-02-21T09:14:03.2424691Z .loc 1 28 30 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:28:30 2026-02-21T09:14:03.2425037Z add.s32 %r489, %r488, %r481; 2026-02-21T09:14:03.2425363Z .loc 1 30 27 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:30:27 2026-02-21T09:14:03.2425712Z shl.b32 %r490, %r489, 9; 2026-02-21T09:14:03.2426016Z .loc 1 31 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:31:32 2026-02-21T09:14:03.2426377Z or.b32 %r89, %r490, %r5; 2026-02-21T09:14:03.2426702Z or.b32 %r90, %r490, %r6; 2026-02-21T09:14:03.2427028Z .loc 1 32 27 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:32:27 2026-02-21T09:14:03.2427377Z shl.b32 %r91, %r486, 5; 2026-02-21T09:14:03.2427687Z .loc 1 48 53 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:53 2026-02-21T09:14:03.2428040Z shl.b32 %r491, %r89, 10; 2026-02-21T09:14:03.2428208Z shl.b32 %r492, %r90, 10; 2026-02-21T09:14:03.2428592Z .loc 1 48 60 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:60 2026-02-21T09:14:03.2428941Z or.b32 %r493, %r491, %r11; 2026-02-21T09:14:03.2429129Z or.b32 %r494, %r492, %r11; 2026-02-21T09:14:03.2429441Z .loc 1 48 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:32 2026-02-21T09:14:03.2429804Z mad.wide.s32 %rd61, %r493, 2, %rd32; 2026-02-21T09:14:03.2430112Z mad.wide.s32 %rd62, %r494, 2, %rd32; 2026-02-21T09:14:03.2430453Z .loc 1 48 80 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:80 2026-02-21T09:14:03.2430806Z bar.sync 0; 2026-02-21T09:14:03.2430963Z mov.b32 %r453, 8; 2026-02-21T09:14:03.2431125Z // begin inline asm 2026-02-21T09:14:03.2431361Z cp.async.ca.shared.global [ %r14 + 0 ], [ %rd61 + 0 ], 0x8, %r453; 2026-02-21T09:14:03.2431653Z // end inline asm 
2026-02-21T09:14:03.2431807Z // begin inline asm 2026-02-21T09:14:03.2432043Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd62 + 0 ], 0x8, %r453; 2026-02-21T09:14:03.2432321Z // end inline asm 2026-02-21T09:14:03.2432480Z cp.async.commit_group; 2026-02-21T09:14:03.2432970Z .loc 1 48 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:32 2026-02-21T09:14:03.2433342Z cvt.s64.s32 %rd66, %r491; 2026-02-21T09:14:03.2433532Z or.b64 %rd67, %rd66, %rd134; 2026-02-21T09:14:03.2433716Z shl.b64 %rd68, %rd67, 1; 2026-02-21T09:14:03.2433895Z add.s64 %rd69, %rd32, %rd68; 2026-02-21T09:14:03.2434076Z add.s64 %rd63, %rd69, 32; 2026-02-21T09:14:03.2434258Z cvt.s64.s32 %rd70, %r492; 2026-02-21T09:14:03.2434435Z or.b64 %rd71, %rd70, %rd134; 2026-02-21T09:14:03.2434613Z shl.b64 %rd72, %rd71, 1; 2026-02-21T09:14:03.2434790Z add.s64 %rd73, %rd32, %rd72; 2026-02-21T09:14:03.2434965Z add.s64 %rd64, %rd73, 32; 2026-02-21T09:14:03.2435283Z .loc 1 48 80 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:80 2026-02-21T09:14:03.2435630Z bar.sync 0; 2026-02-21T09:14:03.2435783Z // begin inline asm 2026-02-21T09:14:03.2436008Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd63 + 0 ], 0x8, %r453; 2026-02-21T09:14:03.2436288Z // end inline asm 2026-02-21T09:14:03.2436672Z // begin inline asm 2026-02-21T09:14:03.2436929Z cp.async.ca.shared.global [ %r17 + 0 ], [ %rd64 + 0 ], 0x8, %r453; 2026-02-21T09:14:03.2437210Z // end inline asm 2026-02-21T09:14:03.2437384Z cp.async.commit_group; 2026-02-21T09:14:03.2437719Z .loc 1 40 102 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:40:102 2026-02-21T09:14:03.2438083Z shl.b32 %r495, %r489, 19; 2026-02-21T09:14:03.2438267Z or.b32 %r496, %r1088, %r495; 2026-02-21T09:14:03.2438472Z mad.wide.s32 %rd137, %r496, 2, %rd3; 2026-02-21T09:14:03.2438680Z shl.b32 %r497, %r476, 19; 2026-02-21T09:14:03.2438861Z or.b32 %r498, %r28, %r497; 2026-02-21T09:14:03.2439035Z shl.b32 %r499, %r487, 19; 2026-02-21T09:14:03.2439214Z sub.s32 %r500, %r498, 
%r499; 2026-02-21T09:14:03.2439397Z mul.lo.s32 %r501, %r480, 267386880; 2026-02-21T09:14:03.2439604Z sub.s32 %r1111, %r500, %r501; 2026-02-21T09:14:03.2439790Z add.s32 %r1110, %r29, %r91; 2026-02-21T09:14:03.2439981Z mov.b32 %r1114, 0f00000000; 2026-02-21T09:14:03.2440155Z mov.b32 %r1113, 1; 2026-02-21T09:14:03.2440323Z mov.b32 %r1112, -1; 2026-02-21T09:14:03.2440484Z mov.b64 %rd138, -8; 2026-02-21T09:14:03.2440650Z mov.b32 %r1115, %r1114; 2026-02-21T09:14:03.2440827Z mov.b32 %r1116, %r1114; 2026-02-21T09:14:03.2440993Z mov.b32 %r1117, %r1114; 2026-02-21T09:14:03.2441165Z mov.b32 %r1118, %r1114; 2026-02-21T09:14:03.2441329Z mov.b32 %r1119, %r1114; 2026-02-21T09:14:03.2441498Z mov.b32 %r1120, %r1114; 2026-02-21T09:14:03.2441663Z mov.b32 %r1121, %r1114; 2026-02-21T09:14:03.2441832Z mov.b32 %r1122, %r1114; 2026-02-21T09:14:03.2441993Z mov.b32 %r1123, %r1114; 2026-02-21T09:14:03.2442170Z mov.b32 %r1124, %r1114; 2026-02-21T09:14:03.2442337Z mov.b32 %r1125, %r1114; 2026-02-21T09:14:03.2442506Z mov.b32 %r1126, %r1114; 2026-02-21T09:14:03.2442686Z mov.b32 %r1127, %r1114; 2026-02-21T09:14:03.2442859Z mov.b32 %r1128, %r1114; 2026-02-21T09:14:03.2443030Z mov.b32 %r1129, %r1114; 2026-02-21T09:14:03.2443257Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T09:14:03.2443573Z // => This Inner Loop Header: Depth=2 2026-02-21T09:14:03.2443932Z add.s64 %rd138, %rd138, 8; 2026-02-21T09:14:03.2444130Z setp.lt.u64 %p12, %rd138, 496; 2026-02-21T09:14:03.2444328Z add.s32 %r616, %r1112, 1; 2026-02-21T09:14:03.2444515Z setp.gt.s32 %p13, %r616, 1; 2026-02-21T09:14:03.2444711Z selp.b32 %r1112, 0, %r616, %p13; 2026-02-21T09:14:03.2445050Z .loc 1 48 80 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:80 2026-02-21T09:14:03.2445428Z cp.async.wait_group 1; 2026-02-21T09:14:03.2445606Z bar.sync 0; 2026-02-21T09:14:03.2445765Z shl.b32 %r617, %r1112, 14; 2026-02-21T09:14:03.2445941Z add.s32 %r619, %r1080, %r617; 2026-02-21T09:14:03.2446267Z .loc 1 52 32 // 
csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:52:32 2026-02-21T09:14:03.2446905Z add.s32 %r620, %r619, %r24; 2026-02-21T09:14:03.2447178Z ld.shared.b16 %rs15, [%r620]; 2026-02-21T09:14:03.2447383Z ld.shared.b16 %rs16, [%r620+256]; 2026-02-21T09:14:03.2447595Z ld.shared.b16 %rs17, [%r620+16]; 2026-02-21T09:14:03.2447798Z ld.shared.b16 %rs18, [%r620+272]; 2026-02-21T09:14:03.2447988Z add.s32 %r621, %r619, %r25; 2026-02-21T09:14:03.2448171Z ld.shared.b16 %rs19, [%r621]; 2026-02-21T09:14:03.2448357Z ld.shared.b16 %rs20, [%r621+256]; 2026-02-21T09:14:03.2448556Z ld.shared.b16 %rs21, [%r621+16]; 2026-02-21T09:14:03.2448748Z ld.shared.b16 %rs22, [%r621+272]; 2026-02-21T09:14:03.2448944Z cvt.f32.bf16 %r534, %rs15; 2026-02-21T09:14:03.2449125Z cvt.f32.bf16 %r535, %rs16; 2026-02-21T09:14:03.2449298Z cvt.f32.bf16 %r536, %rs19; 2026-02-21T09:14:03.2449481Z cvt.f32.bf16 %r537, %rs20; 2026-02-21T09:14:03.2449666Z cvt.f32.bf16 %r570, %rs17; 2026-02-21T09:14:03.2449846Z cvt.f32.bf16 %r571, %rs18; 2026-02-21T09:14:03.2450022Z cvt.f32.bf16 %r572, %rs21; 2026-02-21T09:14:03.2450204Z cvt.f32.bf16 %r573, %rs22; 2026-02-21T09:14:03.2450602Z .loc 1 54 34 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:54:34 2026-02-21T09:14:03.2450969Z cvt.s64.s32 %rd81, %r1110; 2026-02-21T09:14:03.2451152Z add.s64 %rd75, %rd33, %rd81; 2026-02-21T09:14:03.2451477Z .loc 1 54 87 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:54:87 2026-02-21T09:14:03.2451834Z // begin inline asm 2026-02-21T09:14:03.2451999Z mov.u64 %rd74, 0x0; 2026-02-21T09:14:03.2452233Z createpolicy.fractional.L2::evict_last.b64 %rd74, 1.0; 2026-02-21T09:14:03.2452503Z // end inline asm 2026-02-21T09:14:03.2452669Z // begin inline asm 2026-02-21T09:14:03.2452826Z mov.u16 %rs14, 0x0; 2026-02-21T09:14:03.2453084Z ld.global.L1::evict_last.L2::cache_hint.b8 { %rs14 }, [ %rd75 + 0 ], %rd74; 2026-02-21T09:14:03.2453387Z // end inline asm 2026-02-21T09:14:03.2453684Z .loc 1 57 28 // 
csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:57:28 2026-02-21T09:14:03.2454045Z shl.b16 %rs23, %rs14, 4; 2026-02-21T09:14:03.2454363Z .loc 1 72 58 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:72:58 2026-02-21T09:14:03.2454736Z selp.b16 %rs24, %rs23, %rs14, %p33; 2026-02-21T09:14:03.2454943Z cvt.s16.s8 %rs25, %rs24; 2026-02-21T09:14:03.2455129Z shr.s16 %rs26, %rs25, 4; 2026-02-21T09:14:03.2455439Z .loc 1 77 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:77:32 2026-02-21T09:14:03.2455804Z cvt.rn.f32.s16 %r622, %rs26; 2026-02-21T09:14:03.2455999Z st.shared.b32 [%r26], %r622; 2026-02-21T09:14:03.2456178Z $L__tmp3: 2026-02-21T09:14:03.2456700Z .loc 2 291 36 // standard.py:291:36 @[ csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:84:40 ] 2026-02-21T09:14:03.2457142Z // begin inline asm 2026-02-21T09:14:03.2457328Z fence.proxy.async.shared::cta; 2026-02-21T09:14:03.2457521Z // end inline asm 2026-02-21T09:14:03.2457681Z bar.sync 0; 2026-02-21T09:14:03.2457847Z shfl.sync.idx.b32 %r623, %r4, 0, 31, -1; 2026-02-21T09:14:03.2458084Z wgmma.fence.sync.aligned; 2026-02-21T09:14:03.2458271Z mov.pred %p9, -1; 2026-02-21T09:14:03.2458428Z // begin inline asm 2026-02-21T09:14:03.2459146Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1114,%r1115,%r1116,%r1117,%r1118,%r1119,%r1120,%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129}, {%r534,%r535,%r536,%r537}, %rd1, %p9, 1, 1; 2026-02-21T09:14:03.2459782Z // end inline asm 2026-02-21T09:14:03.2459937Z // begin inline asm 2026-02-21T09:14:03.2460517Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1114,%r1115,%r1116,%r1117,%r1118,%r1119,%r1120,%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129}, {%r570,%r571,%r572,%r573}, %rd2, %p9, 1, 1; 2026-02-21T09:14:03.2461151Z // end inline asm 2026-02-21T09:14:03.2461327Z wgmma.commit_group.sync.aligned; 2026-02-21T09:14:03.2461525Z mov.b32 %r591, 0; 2026-02-21T09:14:03.2461685Z mov.b32 %r592, %r591; 
2026-02-21T09:14:03.2461994Z mov.b32 %r590, %r244; 2026-02-21T09:14:03.2462173Z // begin inline asm 2026-02-21T09:14:03.2462572Z // wait for regs: %r1114,%r1115,%r1116,%r1117,%r1118,%r1119,%r1120,%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r590,%r591,%r592 2026-02-21T09:14:03.2463064Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:14:03.2463265Z // end inline asm 2026-02-21T09:14:03.2463413Z $L__tmp4: 2026-02-21T09:14:03.2463720Z .loc 1 40 102 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:40:102 2026-02-21T09:14:03.2464090Z add.s32 %r624, %r1113, 1; 2026-02-21T09:14:03.2464278Z setp.gt.s32 %p14, %r624, 1; 2026-02-21T09:14:03.2464469Z selp.b32 %r1113, 0, %r624, %p14; 2026-02-21T09:14:03.2464827Z .loc 1 48 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:32 2026-02-21T09:14:03.2465315Z mad.wide.s32 %rd80, %r1111, 2, %rd32; 2026-02-21T09:14:03.2465678Z .loc 1 48 80 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:80 2026-02-21T09:14:03.2466135Z shl.b32 %r625, %r1113, 14; 2026-02-21T09:14:03.2466330Z add.s32 %r626, %r1080, %r625; 2026-02-21T09:14:03.2466650Z add.s32 %r612, %r626, %r13; 2026-02-21T09:14:03.2466839Z selp.b32 %r613, 8, 0, %p12; 2026-02-21T09:14:03.2467033Z // begin inline asm 2026-02-21T09:14:03.2467279Z cp.async.ca.shared.global [ %r612 + 0 ], [ %rd137 + 0 ], 0x8, %r613; 2026-02-21T09:14:03.2467569Z // end inline asm 2026-02-21T09:14:03.2467725Z add.s32 %r614, %r612, 8192; 2026-02-21T09:14:03.2467908Z // begin inline asm 2026-02-21T09:14:03.2468146Z cp.async.ca.shared.global [ %r614 + 0 ], [ %rd80 + 0 ], 0x8, %r613; 2026-02-21T09:14:03.2468491Z // end inline asm 2026-02-21T09:14:03.2468658Z cp.async.commit_group; 2026-02-21T09:14:03.2468984Z .loc 1 40 102 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:40:102 2026-02-21T09:14:03.2469360Z add.s64 %rd137, %rd137, 32; 2026-02-21T09:14:03.2469544Z add.s32 %r1111, %r1111, 16; 2026-02-21T09:14:03.2469741Z add.s32 %r1110, %r1110, 65536; 
2026-02-21T09:14:03.2469936Z setp.lt.u64 %p15, %rd138, 504; 2026-02-21T09:14:03.2470131Z @%p15 bra $L__BB0_5; 2026-02-21T09:14:03.2470355Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:14:03.2470763Z .loc 1 33 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:33:32 2026-02-21T09:14:03.2471127Z or.b32 %r656, %r91, %r9; 2026-02-21T09:14:03.2471451Z .loc 1 40 102 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:40:102 2026-02-21T09:14:03.2471830Z cp.async.wait_group 0; 2026-02-21T09:14:03.2472000Z bar.sync 0; 2026-02-21T09:14:03.2472298Z .loc 1 87 28 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:87:28 2026-02-21T09:14:03.2472682Z cvt.rn.bf16x2.f32 %r657, %r1115, %r1114; 2026-02-21T09:14:03.2472925Z cvt.rn.bf16x2.f32 %r658, %r1117, %r1116; 2026-02-21T09:14:03.2473159Z cvt.rn.bf16x2.f32 %r659, %r1119, %r1118; 2026-02-21T09:14:03.2473380Z cvt.rn.bf16x2.f32 %r660, %r1121, %r1120; 2026-02-21T09:14:03.2473600Z cvt.rn.bf16x2.f32 %r661, %r1123, %r1122; 2026-02-21T09:14:03.2473813Z cvt.rn.bf16x2.f32 %r662, %r1125, %r1124; 2026-02-21T09:14:03.2474142Z cvt.rn.bf16x2.f32 %r663, %r1127, %r1126; 2026-02-21T09:14:03.2474363Z cvt.rn.bf16x2.f32 %r664, %r1129, %r1128; 2026-02-21T09:14:03.2474722Z .loc 1 88 43 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:88:43 2026-02-21T09:14:03.2475090Z shl.b32 %r665, %r89, 13; 2026-02-21T09:14:03.2475269Z shl.b32 %r666, %r90, 13; 2026-02-21T09:14:03.2475583Z .loc 1 88 50 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:88:50 2026-02-21T09:14:03.2475939Z add.s32 %r667, %r665, %r656; 2026-02-21T09:14:03.2476131Z add.s32 %r668, %r666, %r656; 2026-02-21T09:14:03.2476570Z .loc 1 88 22 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:88:22 2026-02-21T09:14:03.2477109Z mad.wide.s32 %rd82, %r667, 2, %rd34; 2026-02-21T09:14:03.2477340Z mad.wide.s32 %rd83, %r668, 2, %rd34; 2026-02-21T09:14:03.2477683Z .loc 1 88 81 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:88:81 
2026-02-21T09:14:03.2478087Z st.shared.v4.b32 [%r21], {%r657, %r659, %r661, %r663}; 2026-02-21T09:14:03.2478379Z st.shared.v4.b32 [%r21+128], {%r658, %r660, %r662, %r664}; 2026-02-21T09:14:03.2478634Z bar.sync 0; 2026-02-21T09:14:03.2478783Z // begin inline asm 2026-02-21T09:14:03.2479073Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r637, %r638, %r639, %r640}, [%r438]; 2026-02-21T09:14:03.2479406Z // end inline asm 2026-02-21T09:14:03.2479566Z // begin inline asm 2026-02-21T09:14:03.2479845Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r641, %r642, %r643, %r644}, [%r443]; 2026-02-21T09:14:03.2480174Z // end inline asm 2026-02-21T09:14:03.2480336Z // begin inline asm 2026-02-21T09:14:03.2480552Z st.global.v4.b32 [ %rd82 + 0 ], { %r637, %r638, %r639, %r640 }; 2026-02-21T09:14:03.2480822Z // end inline asm 2026-02-21T09:14:03.2481052Z // begin inline asm 2026-02-21T09:14:03.2481273Z st.global.v4.b32 [ %rd83 + 0 ], { %r641, %r642, %r643, %r644 }; 2026-02-21T09:14:03.2481544Z // end inline asm 2026-02-21T09:14:03.2481852Z .loc 1 19 144 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:19:144 2026-02-21T09:14:03.2482227Z add.s32 %r669, %r1089, 8448; 2026-02-21T09:14:03.2482562Z .loc 1 25 35 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:25:35 2026-02-21T09:14:03.2482979Z shr.s32 %r670, %r669, 31; 2026-02-21T09:14:03.2483204Z shr.u32 %r671, %r670, 23; 2026-02-21T09:14:03.2483444Z add.s32 %r672, %r669, %r671; 2026-02-21T09:14:03.2483669Z shr.s32 %r673, %r672, 9; 2026-02-21T09:14:03.2484051Z .loc 1 26 33 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:26:33 2026-02-21T09:14:03.2484479Z shl.b32 %r674, %r673, 1; 2026-02-21T09:14:03.2484837Z .loc 1 27 39 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:27:39 2026-02-21T09:14:03.2485258Z sub.s32 %r675, 32, %r674; 2026-02-21T09:14:03.2485612Z .loc 1 27 52 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:27:52 2026-02-21T09:14:03.2486034Z min.s32 %r676, %r675, 2; 
2026-02-21T09:14:03.2486383Z .loc 1 28 45 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:28:45 2026-02-21T09:14:03.2486950Z and.b32 %r677, %r672, -512; 2026-02-21T09:14:03.2487209Z sub.s32 %r678, %r669, %r677; 2026-02-21T09:14:03.2487568Z .loc 1 29 51 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:29:51 2026-02-21T09:14:03.2487987Z div.s32 %r679, %r678, %r676; 2026-02-21T09:14:03.2488344Z .loc 1 28 64 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:28:64 2026-02-21T09:14:03.2488768Z mul.lo.s32 %r680, %r679, %r676; 2026-02-21T09:14:03.2489009Z sub.s32 %r681, %r678, %r680; 2026-02-21T09:14:03.2489397Z .loc 1 28 30 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:28:30 2026-02-21T09:14:03.2489823Z add.s32 %r682, %r681, %r674; 2026-02-21T09:14:03.2490302Z .loc 1 30 27 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:30:27 2026-02-21T09:14:03.2490725Z shl.b32 %r683, %r682, 9; 2026-02-21T09:14:03.2491080Z .loc 1 31 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:31:32 2026-02-21T09:14:03.2491502Z or.b32 %r134, %r683, %r5; 2026-02-21T09:14:03.2491716Z or.b32 %r135, %r683, %r6; 2026-02-21T09:14:03.2492096Z .loc 1 32 27 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:32:27 2026-02-21T09:14:03.2492526Z shl.b32 %r136, %r679, 5; 2026-02-21T09:14:03.2492882Z .loc 1 48 53 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:53 2026-02-21T09:14:03.2493397Z shl.b32 %r684, %r134, 10; 2026-02-21T09:14:03.2493678Z shl.b32 %r685, %r135, 10; 2026-02-21T09:14:03.2494088Z .loc 1 48 60 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:60 2026-02-21T09:14:03.2494488Z or.b32 %r686, %r684, %r11; 2026-02-21T09:14:03.2494743Z or.b32 %r687, %r685, %r11; 2026-02-21T09:14:03.2495131Z .loc 1 48 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:32 2026-02-21T09:14:03.2495538Z mad.wide.s32 %rd84, %r686, 2, %rd32; 2026-02-21T09:14:03.2495821Z mad.wide.s32 %rd85, %r687, 2, %rd32; 
2026-02-21T09:14:03.2496206Z .loc 1 48 80 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:80 2026-02-21T09:14:03.2496761Z bar.sync 0; 2026-02-21T09:14:03.2496965Z mov.b32 %r646, 8; 2026-02-21T09:14:03.2497198Z // begin inline asm 2026-02-21T09:14:03.2497481Z cp.async.ca.shared.global [ %r14 + 0 ], [ %rd84 + 0 ], 0x8, %r646; 2026-02-21T09:14:03.2497832Z // end inline asm 2026-02-21T09:14:03.2498061Z // begin inline asm 2026-02-21T09:14:03.2498408Z cp.async.ca.shared.global [ %r15 + 0 ], [ %rd85 + 0 ], 0x8, %r646; 2026-02-21T09:14:03.2498770Z // end inline asm 2026-02-21T09:14:03.2498954Z cp.async.commit_group; 2026-02-21T09:14:03.2499350Z .loc 1 48 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:32 2026-02-21T09:14:03.2499760Z cvt.s64.s32 %rd89, %r684; 2026-02-21T09:14:03.2500020Z or.b64 %rd90, %rd89, %rd134; 2026-02-21T09:14:03.2500273Z shl.b64 %rd91, %rd90, 1; 2026-02-21T09:14:03.2500504Z add.s64 %rd92, %rd32, %rd91; 2026-02-21T09:14:03.2500760Z add.s64 %rd86, %rd92, 32; 2026-02-21T09:14:03.2500981Z cvt.s64.s32 %rd93, %r685; 2026-02-21T09:14:03.2501227Z or.b64 %rd94, %rd93, %rd134; 2026-02-21T09:14:03.2501454Z shl.b64 %rd95, %rd94, 1; 2026-02-21T09:14:03.2501699Z add.s64 %rd96, %rd32, %rd95; 2026-02-21T09:14:03.2501920Z add.s64 %rd87, %rd96, 32; 2026-02-21T09:14:03.2502304Z .loc 1 48 80 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:80 2026-02-21T09:14:03.2502699Z bar.sync 0; 2026-02-21T09:14:03.2502923Z // begin inline asm 2026-02-21T09:14:03.2503220Z cp.async.ca.shared.global [ %r16 + 0 ], [ %rd86 + 0 ], 0x8, %r646; 2026-02-21T09:14:03.2503537Z // end inline asm 2026-02-21T09:14:03.2503764Z // begin inline asm 2026-02-21T09:14:03.2504034Z cp.async.ca.shared.global [ %r17 + 0 ], [ %rd87 + 0 ], 0x8, %r646; 2026-02-21T09:14:03.2504378Z // end inline asm 2026-02-21T09:14:03.2504584Z cp.async.commit_group; 2026-02-21T09:14:03.2504976Z .loc 1 40 102 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:40:102 
2026-02-21T09:14:03.2505382Z shl.b32 %r688, %r682, 19; 2026-02-21T09:14:03.2505633Z or.b32 %r689, %r1088, %r688; 2026-02-21T09:14:03.2505893Z mad.wide.s32 %rd139, %r689, 2, %rd3; 2026-02-21T09:14:03.2506135Z shl.b32 %r690, %r669, 19; 2026-02-21T09:14:03.2506380Z or.b32 %r691, %r28, %r690; 2026-02-21T09:14:03.2506741Z shl.b32 %r692, %r680, 19; 2026-02-21T09:14:03.2507007Z sub.s32 %r693, %r691, %r692; 2026-02-21T09:14:03.2507240Z mul.lo.s32 %r694, %r673, 267386880; 2026-02-21T09:14:03.2507514Z sub.s32 %r1131, %r693, %r694; 2026-02-21T09:14:03.2507731Z add.s32 %r1130, %r29, %r136; 2026-02-21T09:14:03.2508101Z mov.b32 %r1134, 0f00000000; 2026-02-21T09:14:03.2508440Z mov.b32 %r1133, 1; 2026-02-21T09:14:03.2508656Z mov.b32 %r1132, -1; 2026-02-21T09:14:03.2508891Z mov.b64 %rd140, -8; 2026-02-21T09:14:03.2509099Z mov.b32 %r1135, %r1134; 2026-02-21T09:14:03.2509344Z mov.b32 %r1136, %r1134; 2026-02-21T09:14:03.2509555Z mov.b32 %r1137, %r1134; 2026-02-21T09:14:03.2509787Z mov.b32 %r1138, %r1134; 2026-02-21T09:14:03.2510006Z mov.b32 %r1139, %r1134; 2026-02-21T09:14:03.2510223Z mov.b32 %r1140, %r1134; 2026-02-21T09:14:03.2510425Z mov.b32 %r1141, %r1134; 2026-02-21T09:14:03.2510675Z mov.b32 %r1142, %r1134; 2026-02-21T09:14:03.2510912Z mov.b32 %r1143, %r1134; 2026-02-21T09:14:03.2511120Z mov.b32 %r1144, %r1134; 2026-02-21T09:14:03.2511513Z mov.b32 %r1145, %r1134; 2026-02-21T09:14:03.2511734Z mov.b32 %r1146, %r1134; 2026-02-21T09:14:03.2511977Z mov.b32 %r1147, %r1134; 2026-02-21T09:14:03.2512187Z mov.b32 %r1148, %r1134; 2026-02-21T09:14:03.2512431Z mov.b32 %r1149, %r1134; 2026-02-21T09:14:03.2512692Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:14:03.2513067Z // => This Inner Loop Header: Depth=2 2026-02-21T09:14:03.2513373Z add.s64 %rd140, %rd140, 8; 2026-02-21T09:14:03.2513636Z setp.lt.u64 %p19, %rd140, 496; 2026-02-21T09:14:03.2513889Z add.s32 %r809, %r1132, 1; 2026-02-21T09:14:03.2514111Z setp.gt.s32 %p20, %r809, 1; 2026-02-21T09:14:03.2514378Z selp.b32 %r1132, 0, 
%r809, %p20; 2026-02-21T09:14:03.2514768Z .loc 1 48 80 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:80 2026-02-21T09:14:03.2515201Z cp.async.wait_group 1; 2026-02-21T09:14:03.2515419Z bar.sync 0; 2026-02-21T09:14:03.2515724Z shl.b32 %r810, %r1132, 14; 2026-02-21T09:14:03.2516102Z add.s32 %r812, %r1080, %r810; 2026-02-21T09:14:03.2516662Z .loc 1 52 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:52:32 2026-02-21T09:14:03.2517103Z add.s32 %r813, %r812, %r24; 2026-02-21T09:14:03.2517337Z ld.shared.b16 %rs28, [%r813]; 2026-02-21T09:14:03.2517621Z ld.shared.b16 %rs29, [%r813+256]; 2026-02-21T09:14:03.2517869Z ld.shared.b16 %rs30, [%r813+16]; 2026-02-21T09:14:03.2518138Z ld.shared.b16 %rs31, [%r813+272]; 2026-02-21T09:14:03.2518379Z add.s32 %r814, %r812, %r25; 2026-02-21T09:14:03.2518636Z ld.shared.b16 %rs32, [%r814]; 2026-02-21T09:14:03.2518865Z ld.shared.b16 %rs33, [%r814+256]; 2026-02-21T09:14:03.2519136Z ld.shared.b16 %rs34, [%r814+16]; 2026-02-21T09:14:03.2519403Z ld.shared.b16 %rs35, [%r814+272]; 2026-02-21T09:14:03.2519642Z cvt.f32.bf16 %r727, %rs28; 2026-02-21T09:14:03.2519862Z cvt.f32.bf16 %r728, %rs29; 2026-02-21T09:14:03.2520078Z cvt.f32.bf16 %r729, %rs32; 2026-02-21T09:14:03.2520319Z cvt.f32.bf16 %r730, %rs33; 2026-02-21T09:14:03.2520542Z cvt.f32.bf16 %r763, %rs30; 2026-02-21T09:14:03.2520744Z cvt.f32.bf16 %r764, %rs31; 2026-02-21T09:14:03.2520918Z cvt.f32.bf16 %r765, %rs34; 2026-02-21T09:14:03.2521101Z cvt.f32.bf16 %r766, %rs35; 2026-02-21T09:14:03.2521420Z .loc 1 54 34 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:54:34 2026-02-21T09:14:03.2521777Z cvt.s64.s32 %rd104, %r1130; 2026-02-21T09:14:03.2521966Z add.s64 %rd98, %rd33, %rd104; 2026-02-21T09:14:03.2522293Z .loc 1 54 87 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:54:87 2026-02-21T09:14:03.2522664Z // begin inline asm 2026-02-21T09:14:03.2522826Z mov.u64 %rd97, 0x0; 2026-02-21T09:14:03.2523057Z createpolicy.fractional.L2::evict_last.b64 %rd97, 
1.0; 2026-02-21T09:14:03.2523315Z // end inline asm 2026-02-21T09:14:03.2523475Z // begin inline asm 2026-02-21T09:14:03.2523634Z mov.u16 %rs27, 0x0; 2026-02-21T09:14:03.2523891Z ld.global.L1::evict_last.L2::cache_hint.b8 { %rs27 }, [ %rd98 + 0 ], %rd97; 2026-02-21T09:14:03.2524190Z // end inline asm 2026-02-21T09:14:03.2524487Z .loc 1 57 28 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:57:28 2026-02-21T09:14:03.2524948Z shl.b16 %rs36, %rs27, 4; 2026-02-21T09:14:03.2525259Z .loc 1 72 58 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:72:58 2026-02-21T09:14:03.2525622Z selp.b16 %rs37, %rs36, %rs27, %p33; 2026-02-21T09:14:03.2525835Z cvt.s16.s8 %rs38, %rs37; 2026-02-21T09:14:03.2526008Z shr.s16 %rs39, %rs38, 4; 2026-02-21T09:14:03.2526323Z .loc 1 77 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:77:32 2026-02-21T09:14:03.2526838Z cvt.rn.f32.s16 %r815, %rs39; 2026-02-21T09:14:03.2527046Z st.shared.b32 [%r26], %r815; 2026-02-21T09:14:03.2527226Z $L__tmp5: 2026-02-21T09:14:03.2527666Z .loc 2 291 36 // standard.py:291:36 @[ csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:84:40 ] 2026-02-21T09:14:03.2528156Z // begin inline asm 2026-02-21T09:14:03.2528356Z fence.proxy.async.shared::cta; 2026-02-21T09:14:03.2528561Z // end inline asm 2026-02-21T09:14:03.2528713Z bar.sync 0; 2026-02-21T09:14:03.2528888Z shfl.sync.idx.b32 %r816, %r4, 0, 31, -1; 2026-02-21T09:14:03.2529118Z wgmma.fence.sync.aligned; 2026-02-21T09:14:03.2529312Z mov.pred %p16, -1; 2026-02-21T09:14:03.2529473Z // begin inline asm 2026-02-21T09:14:03.2530081Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1134,%r1135,%r1136,%r1137,%r1138,%r1139,%r1140,%r1141,%r1142,%r1143,%r1144,%r1145,%r1146,%r1147,%r1148,%r1149}, {%r727,%r728,%r729,%r730}, %rd1, %p16, 1, 1; 2026-02-21T09:14:03.2530726Z // end inline asm 2026-02-21T09:14:03.2530879Z // begin inline asm 2026-02-21T09:14:03.2531482Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r1134,%r1135,%r1136,%r1137,%r1138,%r1139,%r1140,%r1141,%r1142,%r1143,%r1144,%r1145,%r1146,%r1147,%r1148,%r1149}, {%r763,%r764,%r765,%r766}, %rd2, %p16, 1, 1; 2026-02-21T09:14:03.2532200Z // end inline asm 2026-02-21T09:14:03.2532379Z wgmma.commit_group.sync.aligned; 2026-02-21T09:14:03.2532577Z mov.b32 %r785, 0; 2026-02-21T09:14:03.2532739Z mov.b32 %r784, %r785; 2026-02-21T09:14:03.2532901Z mov.b32 %r783, %r244; 2026-02-21T09:14:03.2533066Z // begin inline asm 2026-02-21T09:14:03.2533466Z // wait for regs: %r1134,%r1135,%r1136,%r1137,%r1138,%r1139,%r1140,%r1141,%r1142,%r1143,%r1144,%r1145,%r1146,%r1147,%r1148,%r1149,%r783,%r784,%r785 2026-02-21T09:14:03.2533933Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:14:03.2534129Z // end inline asm 2026-02-21T09:14:03.2534274Z $L__tmp6: 2026-02-21T09:14:03.2534590Z .loc 1 40 102 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:40:102 2026-02-21T09:14:03.2534955Z add.s32 %r817, %r1133, 1; 2026-02-21T09:14:03.2535139Z setp.gt.s32 %p21, %r817, 1; 2026-02-21T09:14:03.2535328Z selp.b32 %r1133, 0, %r817, %p21; 2026-02-21T09:14:03.2535671Z .loc 1 48 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:32 2026-02-21T09:14:03.2536041Z mad.wide.s32 %rd103, %r1131, 2, %rd32; 2026-02-21T09:14:03.2536387Z .loc 1 48 80 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:80 2026-02-21T09:14:03.2536900Z shl.b32 %r818, %r1133, 14; 2026-02-21T09:14:03.2537092Z add.s32 %r819, %r1080, %r818; 2026-02-21T09:14:03.2537286Z add.s32 %r805, %r819, %r13; 2026-02-21T09:14:03.2537468Z selp.b32 %r806, 8, 0, %p19; 2026-02-21T09:14:03.2537652Z // begin inline asm 2026-02-21T09:14:03.2537896Z cp.async.ca.shared.global [ %r805 + 0 ], [ %rd139 + 0 ], 0x8, %r806; 2026-02-21T09:14:03.2538190Z // end inline asm 2026-02-21T09:14:03.2538357Z add.s32 %r807, %r805, 8192; 2026-02-21T09:14:03.2538535Z // begin inline asm 2026-02-21T09:14:03.2538771Z cp.async.ca.shared.global [ %r807 + 0 ], [ %rd103 + 0 ], 0x8, %r806; 
2026-02-21T09:14:03.2539044Z // end inline asm 2026-02-21T09:14:03.2539212Z cp.async.commit_group; 2026-02-21T09:14:03.2539534Z .loc 1 40 102 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:40:102 2026-02-21T09:14:03.2539907Z add.s64 %rd139, %rd139, 32; 2026-02-21T09:14:03.2540184Z add.s32 %r1131, %r1131, 16; 2026-02-21T09:14:03.2540364Z add.s32 %r1130, %r1130, 65536; 2026-02-21T09:14:03.2540571Z setp.lt.u64 %p22, %rd140, 504; 2026-02-21T09:14:03.2540762Z @%p22 bra $L__BB0_7; 2026-02-21T09:14:03.2540984Z // %bb.8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:14:03.2541388Z .loc 1 33 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:33:32 2026-02-21T09:14:03.2541756Z or.b32 %r838, %r136, %r9; 2026-02-21T09:14:03.2548645Z .loc 1 40 102 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:40:102 2026-02-21T09:14:03.2549063Z cp.async.wait_group 0; 2026-02-21T09:14:03.2549249Z bar.sync 0; 2026-02-21T09:14:03.2549839Z .loc 1 87 28 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:87:28 2026-02-21T09:14:03.2550233Z cvt.rn.bf16x2.f32 %r839, %r1135, %r1134; 2026-02-21T09:14:03.2550489Z cvt.rn.bf16x2.f32 %r840, %r1137, %r1136; 2026-02-21T09:14:03.2550715Z cvt.rn.bf16x2.f32 %r841, %r1139, %r1138; 2026-02-21T09:14:03.2550939Z cvt.rn.bf16x2.f32 %r842, %r1141, %r1140; 2026-02-21T09:14:03.2551155Z cvt.rn.bf16x2.f32 %r843, %r1143, %r1142; 2026-02-21T09:14:03.2551376Z cvt.rn.bf16x2.f32 %r844, %r1145, %r1144; 2026-02-21T09:14:03.2551594Z cvt.rn.bf16x2.f32 %r845, %r1147, %r1146; 2026-02-21T09:14:03.2551808Z cvt.rn.bf16x2.f32 %r846, %r1149, %r1148; 2026-02-21T09:14:03.2552170Z .loc 1 88 43 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:88:43 2026-02-21T09:14:03.2552537Z shl.b32 %r847, %r134, 13; 2026-02-21T09:14:03.2552720Z shl.b32 %r848, %r135, 13; 2026-02-21T09:14:03.2553033Z .loc 1 88 50 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:88:50 2026-02-21T09:14:03.2553512Z add.s32 %r849, %r847, %r838; 
2026-02-21T09:14:03.2553717Z add.s32 %r850, %r848, %r838; 2026-02-21T09:14:03.2554051Z .loc 1 88 22 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:88:22 2026-02-21T09:14:03.2554428Z mad.wide.s32 %rd105, %r849, 2, %rd34; 2026-02-21T09:14:03.2554642Z mad.wide.s32 %rd106, %r850, 2, %rd34; 2026-02-21T09:14:03.2554984Z .loc 1 88 81 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:88:81 2026-02-21T09:14:03.2555376Z st.shared.v4.b32 [%r21], {%r839, %r841, %r843, %r845}; 2026-02-21T09:14:03.2555673Z st.shared.v4.b32 [%r21+128], {%r840, %r842, %r844, %r846}; 2026-02-21T09:14:03.2555924Z bar.sync 0; 2026-02-21T09:14:03.2556074Z // begin inline asm 2026-02-21T09:14:03.2556363Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r820, %r821, %r822, %r823}, [%r438]; 2026-02-21T09:14:03.2556861Z // end inline asm 2026-02-21T09:14:03.2557022Z // begin inline asm 2026-02-21T09:14:03.2557314Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r825, %r826, %r827, %r828}, [%r443]; 2026-02-21T09:14:03.2557645Z // end inline asm 2026-02-21T09:14:03.2557796Z // begin inline asm 2026-02-21T09:14:03.2558022Z st.global.v4.b32 [ %rd105 + 0 ], { %r820, %r821, %r822, %r823 }; 2026-02-21T09:14:03.2558290Z // end inline asm 2026-02-21T09:14:03.2558439Z // begin inline asm 2026-02-21T09:14:03.2558651Z st.global.v4.b32 [ %rd106 + 0 ], { %r825, %r826, %r827, %r828 }; 2026-02-21T09:14:03.2558900Z // end inline asm 2026-02-21T09:14:03.2559220Z .loc 1 19 144 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:19:144 2026-02-21T09:14:03.2559589Z add.s32 %r1089, %r1089, 12672; 2026-02-21T09:14:03.2559808Z setp.lt.s32 %p23, %r1089, %r1150; 2026-02-21T09:14:03.2560007Z @%p23 bra $L__BB0_2; 2026-02-21T09:14:03.2560213Z $L__BB0_9: // %.preheader 2026-02-21T09:14:03.2560468Z setp.gt.s32 %p24, %r1150, 8191; 2026-02-21T09:14:03.2560669Z @%p24 bra $L__BB0_14; 2026-02-21T09:14:03.2560884Z // %bb.10: // %.lr.ph29 2026-02-21T09:14:03.2561267Z .loc 1 0 144 // 
csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:0:144 2026-02-21T09:14:03.2561745Z and.b32 %r852, %r1078, 8056; 2026-02-21T09:14:03.2561930Z and.b32 %r854, %r1079, 136; 2026-02-21T09:14:03.2562113Z xor.b32 %r30, %r854, %r852; 2026-02-21T09:14:03.2562297Z add.s32 %r31, %r1080, %r30; 2026-02-21T09:14:03.2562484Z add.s32 %r892, %r31, 8192; 2026-02-21T09:14:03.2562668Z add.s32 %r894, %r31, 16384; 2026-02-21T09:14:03.2562841Z add.s32 %r896, %r31, 24576; 2026-02-21T09:14:03.2563025Z and.b32 %r857, %r1081, 15872; 2026-02-21T09:14:03.2563213Z and.b32 %r858, %r1078, 96; 2026-02-21T09:14:03.2563410Z or.b32 %r860, %r857, %r858; 2026-02-21T09:14:03.2563587Z or.b32 %r861, %r860, %r1082; 2026-02-21T09:14:03.2563772Z or.b32 %r35, %r861, %r854; 2026-02-21T09:14:03.2563950Z xor.b32 %r36, %r35, 8; 2026-02-21T09:14:03.2564276Z and.b32 %r863, %r1078, 48; 2026-02-21T09:14:03.2564457Z and.b32 %r865, %r1084, 60; 2026-02-21T09:14:03.2564636Z xor.b32 %r866, %r863, %r865; 2026-02-21T09:14:03.2564820Z add.s32 %r867, %r1080, 32768; 2026-02-21T09:14:03.2565000Z add.s32 %r868, %r867, %r1083; 2026-02-21T09:14:03.2565186Z add.s32 %r37, %r868, %r866; 2026-02-21T09:14:03.2565364Z bfe.u32 %r869, %r867, 4, 14; 2026-02-21T09:14:03.2565550Z cvt.u64.u32 %rd107, %r869; 2026-02-21T09:14:03.2565751Z or.b64 %rd128, %rd107, -9223371899407433728; 2026-02-21T09:14:03.2565979Z add.s32 %r870, %r1080, 32800; 2026-02-21T09:14:03.2566161Z bfe.u32 %r871, %r870, 4, 14; 2026-02-21T09:14:03.2566402Z cvt.u64.u32 %rd108, %r871; 2026-02-21T09:14:03.2566839Z or.b64 %rd129, %rd108, -9223371899407433728; 2026-02-21T09:14:03.2567133Z and.b32 %r873, %r1085, 24576; 2026-02-21T09:14:03.2567322Z and.b32 %r875, %r1078, 7936; 2026-02-21T09:14:03.2567499Z and.b32 %r877, %r1087, 112; 2026-02-21T09:14:03.2567680Z or.b32 %r878, %r1086, %r875; 2026-02-21T09:14:03.2567859Z xor.b32 %r879, %r878, %r877; 2026-02-21T09:14:03.2568129Z add.s32 %r880, %r1080, %r873; 2026-02-21T09:14:03.2568322Z add.s32 %r38, %r880, %r879; 
2026-02-21T09:14:03.2568508Z shl.b32 %r881, %r3, 12; 2026-02-21T09:14:03.2568679Z and.b32 %r882, %r881, 24576; 2026-02-21T09:14:03.2568859Z and.b32 %r883, %r1081, 112; 2026-02-21T09:14:03.2569039Z and.b32 %r884, %r1087, 4064; 2026-02-21T09:14:03.2569211Z or.b32 %r885, %r882, %r883; 2026-02-21T09:14:03.2569392Z xor.b32 %r886, %r885, %r884; 2026-02-21T09:14:03.2569567Z add.s32 %r1051, %r1080, %r886; 2026-02-21T09:14:03.2569760Z add.s32 %r1056, %r1051, 4096; 2026-02-21T09:14:03.2570105Z .loc 1 19 144 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:19:144 2026-02-21T09:14:03.2570497Z mad.wide.u32 %rd109, %r8, 8, %rd32; 2026-02-21T09:14:03.2570707Z add.s64 %rd6, %rd109, 64; 2026-02-21T09:14:03.2571044Z .loc 1 40 102 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:40:102 2026-02-21T09:14:03.2571416Z or.b32 %r889, %r1088, %r11; 2026-02-21T09:14:03.2571593Z or.b32 %r41, %r889, 262176; 2026-02-21T09:14:03.2571924Z .loc 1 19 144 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:19:144 2026-02-21T09:14:03.2572297Z mad.wide.u32 %rd7, %r10, 8192, %rd33; 2026-02-21T09:14:03.2572560Z $L__BB0_11: // =>This Loop Header: Depth=1 2026-02-21T09:14:03.2572851Z // Child Loop BB0_12 Depth 2 2026-02-21T09:14:03.2573239Z .loc 1 25 35 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:25:35 2026-02-21T09:14:03.2573602Z shr.s32 %r901, %r1150, 31; 2026-02-21T09:14:03.2573784Z shr.u32 %r902, %r901, 23; 2026-02-21T09:14:03.2573969Z add.s32 %r903, %r1150, %r902; 2026-02-21T09:14:03.2574152Z shr.s32 %r904, %r903, 9; 2026-02-21T09:14:03.2574470Z .loc 1 28 45 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:28:45 2026-02-21T09:14:03.2574833Z and.b32 %r905, %r903, 65024; 2026-02-21T09:14:03.2575020Z sub.s32 %r906, %r1150, %r905; 2026-02-21T09:14:03.2575345Z .loc 1 28 64 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:28:64 2026-02-21T09:14:03.2575796Z cvt.u16.u32 %rs40, %r906; 2026-02-21T09:14:03.2576114Z .loc 1 29 51 // 
csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:29:51 2026-02-21T09:14:03.2576595Z shr.u16 %rs41, %rs40, 15; 2026-02-21T09:14:03.2576788Z add.s16 %rs42, %rs40, %rs41; 2026-02-21T09:14:03.2576982Z shr.s16 %rs43, %rs42, 1; 2026-02-21T09:14:03.2577301Z .loc 1 28 64 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:28:64 2026-02-21T09:14:03.2577655Z and.b16 %rs44, %rs42, -2; 2026-02-21T09:14:03.2577837Z sub.s16 %rs45, %rs40, %rs44; 2026-02-21T09:14:03.2578020Z cvt.u32.u16 %r907, %rs45; 2026-02-21T09:14:03.2578408Z .loc 1 30 27 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:30:27 2026-02-21T09:14:03.2578823Z shl.b32 %r908, %r904, 10; 2026-02-21T09:14:03.2579001Z mul.wide.s16 %r909, %rs45, 512; 2026-02-21T09:14:03.2579203Z add.s32 %r910, %r909, %r908; 2026-02-21T09:14:03.2579516Z .loc 1 31 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:31:32 2026-02-21T09:14:03.2579866Z or.b32 %r181, %r910, %r5; 2026-02-21T09:14:03.2580045Z or.b32 %r182, %r910, %r6; 2026-02-21T09:14:03.2580361Z .loc 1 32 27 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:32:27 2026-02-21T09:14:03.2580724Z mul.wide.s16 %r183, %rs43, 32; 2026-02-21T09:14:03.2581044Z .loc 1 48 53 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:53 2026-02-21T09:14:03.2581392Z shl.b32 %r911, %r181, 10; 2026-02-21T09:14:03.2581563Z shl.b32 %r912, %r182, 10; 2026-02-21T09:14:03.2581876Z .loc 1 48 60 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:60 2026-02-21T09:14:03.2582300Z or.b32 %r913, %r911, %r11; 2026-02-21T09:14:03.2582493Z or.b32 %r914, %r912, %r11; 2026-02-21T09:14:03.2582811Z .loc 1 48 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:32 2026-02-21T09:14:03.2583170Z mad.wide.s32 %rd110, %r913, 2, %rd32; 2026-02-21T09:14:03.2583386Z mad.wide.s32 %rd111, %r914, 2, %rd32; 2026-02-21T09:14:03.2583722Z .loc 1 48 80 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:80 2026-02-21T09:14:03.2584075Z bar.sync 0; 
2026-02-21T09:14:03.2584220Z mov.b32 %r891, 8; 2026-02-21T09:14:03.2584381Z // begin inline asm 2026-02-21T09:14:03.2584626Z cp.async.ca.shared.global [ %r31 + 0 ], [ %rd110 + 0 ], 0x8, %r891; 2026-02-21T09:14:03.2584902Z // end inline asm 2026-02-21T09:14:03.2585060Z // begin inline asm 2026-02-21T09:14:03.2585306Z cp.async.ca.shared.global [ %r892 + 0 ], [ %rd111 + 0 ], 0x8, %r891; 2026-02-21T09:14:03.2585590Z // end inline asm 2026-02-21T09:14:03.2585751Z cp.async.commit_group; 2026-02-21T09:14:03.2586073Z .loc 1 48 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:32 2026-02-21T09:14:03.2586426Z cvt.s64.s32 %rd115, %r911; 2026-02-21T09:14:03.2586756Z or.b64 %rd117, %rd115, %rd134; 2026-02-21T09:14:03.2586965Z shl.b64 %rd118, %rd117, 1; 2026-02-21T09:14:03.2587149Z add.s64 %rd119, %rd32, %rd118; 2026-02-21T09:14:03.2587340Z add.s64 %rd112, %rd119, 32; 2026-02-21T09:14:03.2587525Z cvt.s64.s32 %rd120, %r912; 2026-02-21T09:14:03.2587710Z or.b64 %rd121, %rd120, %rd134; 2026-02-21T09:14:03.2587898Z shl.b64 %rd122, %rd121, 1; 2026-02-21T09:14:03.2588083Z add.s64 %rd123, %rd32, %rd122; 2026-02-21T09:14:03.2588270Z add.s64 %rd113, %rd123, 32; 2026-02-21T09:14:03.2588668Z .loc 1 48 80 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:80 2026-02-21T09:14:03.2589020Z bar.sync 0; 2026-02-21T09:14:03.2589167Z // begin inline asm 2026-02-21T09:14:03.2589411Z cp.async.ca.shared.global [ %r894 + 0 ], [ %rd112 + 0 ], 0x8, %r891; 2026-02-21T09:14:03.2589689Z // end inline asm 2026-02-21T09:14:03.2589846Z // begin inline asm 2026-02-21T09:14:03.2590072Z cp.async.ca.shared.global [ %r896 + 0 ], [ %rd113 + 0 ], 0x8, %r891; 2026-02-21T09:14:03.2590453Z // end inline asm 2026-02-21T09:14:03.2590623Z cp.async.commit_group; 2026-02-21T09:14:03.2590957Z .loc 1 40 102 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:40:102 2026-02-21T09:14:03.2591330Z or.b32 %r915, %r5, %r908; 2026-02-21T09:14:03.2591508Z add.s32 %r916, %r915, %r909; 
2026-02-21T09:14:03.2591695Z shl.b32 %r917, %r916, 10; 2026-02-21T09:14:03.2591879Z mad.wide.s32 %rd142, %r917, 2, %rd6; 2026-02-21T09:14:03.2592087Z shl.b32 %r918, %r904, 20; 2026-02-21T09:14:03.2592259Z or.b32 %r919, %r41, %r918; 2026-02-21T09:14:03.2592445Z shl.b32 %r920, %r907, 19; 2026-02-21T09:14:03.2592627Z add.s32 %r1151, %r919, %r920; 2026-02-21T09:14:03.2592881Z or.b32 %r921, %r7, %r183; 2026-02-21T09:14:03.2593124Z cvt.s64.s32 %rd124, %r921; 2026-02-21T09:14:03.2593305Z add.s64 %rd141, %rd7, %rd124; 2026-02-21T09:14:03.2593491Z mov.b32 %r1154, 0f00000000; 2026-02-21T09:14:03.2593677Z mov.b32 %r1153, 1; 2026-02-21T09:14:03.2593840Z mov.b32 %r1152, -1; 2026-02-21T09:14:03.2593998Z mov.b64 %rd143, -8; 2026-02-21T09:14:03.2594163Z mov.b32 %r1155, %r1154; 2026-02-21T09:14:03.2594327Z mov.b32 %r1156, %r1154; 2026-02-21T09:14:03.2594494Z mov.b32 %r1157, %r1154; 2026-02-21T09:14:03.2594660Z mov.b32 %r1158, %r1154; 2026-02-21T09:14:03.2594820Z mov.b32 %r1159, %r1154; 2026-02-21T09:14:03.2594986Z mov.b32 %r1160, %r1154; 2026-02-21T09:14:03.2595148Z mov.b32 %r1161, %r1154; 2026-02-21T09:14:03.2595324Z mov.b32 %r1162, %r1154; 2026-02-21T09:14:03.2595488Z mov.b32 %r1163, %r1154; 2026-02-21T09:14:03.2595652Z mov.b32 %r1164, %r1154; 2026-02-21T09:14:03.2595812Z mov.b32 %r1165, %r1154; 2026-02-21T09:14:03.2595979Z mov.b32 %r1166, %r1154; 2026-02-21T09:14:03.2596146Z mov.b32 %r1167, %r1154; 2026-02-21T09:14:03.2596387Z mov.b32 %r1168, %r1154; 2026-02-21T09:14:03.2596595Z mov.b32 %r1169, %r1154; 2026-02-21T09:14:03.2596733Z $L__BB0_12: // Parent Loop BB0_11 Depth=1 2026-02-21T09:14:03.2596846Z // => This Inner Loop Header: Depth=2 2026-02-21T09:14:03.2596927Z add.s64 %rd143, %rd143, 8; 2026-02-21T09:14:03.2597006Z setp.lt.u64 %p28, %rd143, 496; 2026-02-21T09:14:03.2597076Z add.s32 %r1036, %r1152, 1; 2026-02-21T09:14:03.2597147Z setp.gt.s32 %p29, %r1036, 1; 2026-02-21T09:14:03.2597220Z selp.b32 %r1152, 0, %r1036, %p29; 2026-02-21T09:14:03.2597461Z .loc 1 48 80 // 
csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:80 2026-02-21T09:14:03.2597532Z cp.async.wait_group 1; 2026-02-21T09:14:03.2597592Z bar.sync 0; 2026-02-21T09:14:03.2597664Z shl.b32 %r1037, %r1152, 14; 2026-02-21T09:14:03.2597731Z add.s32 %r1039, %r1080, %r1037; 2026-02-21T09:14:03.2597949Z .loc 1 52 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:52:32 2026-02-21T09:14:03.2598022Z add.s32 %r1040, %r1039, %r35; 2026-02-21T09:14:03.2598093Z ld.shared.b16 %rs47, [%r1040]; 2026-02-21T09:14:03.2598165Z ld.shared.b16 %rs48, [%r1040+256]; 2026-02-21T09:14:03.2598239Z ld.shared.b16 %rs49, [%r1040+16]; 2026-02-21T09:14:03.2598307Z ld.shared.b16 %rs50, [%r1040+272]; 2026-02-21T09:14:03.2598370Z add.s32 %r1041, %r1039, %r36; 2026-02-21T09:14:03.2598444Z ld.shared.b16 %rs51, [%r1041]; 2026-02-21T09:14:03.2598512Z ld.shared.b16 %rs52, [%r1041+256]; 2026-02-21T09:14:03.2598578Z ld.shared.b16 %rs53, [%r1041+16]; 2026-02-21T09:14:03.2598644Z ld.shared.b16 %rs54, [%r1041+272]; 2026-02-21T09:14:03.2598717Z cvt.f32.bf16 %r954, %rs47; 2026-02-21T09:14:03.2598781Z cvt.f32.bf16 %r955, %rs48; 2026-02-21T09:14:03.2598845Z cvt.f32.bf16 %r956, %rs51; 2026-02-21T09:14:03.2598914Z cvt.f32.bf16 %r957, %rs52; 2026-02-21T09:14:03.2598981Z cvt.f32.bf16 %r990, %rs49; 2026-02-21T09:14:03.2599047Z cvt.f32.bf16 %r991, %rs50; 2026-02-21T09:14:03.2599112Z cvt.f32.bf16 %r992, %rs53; 2026-02-21T09:14:03.2599179Z cvt.f32.bf16 %r993, %rs54; 2026-02-21T09:14:03.2599472Z .loc 1 54 87 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:54:87 2026-02-21T09:14:03.2599536Z // begin inline asm 2026-02-21T09:14:03.2599604Z mov.u64 %rd125, 0x0; 2026-02-21T09:14:03.2599743Z createpolicy.fractional.L2::evict_last.b64 %rd125, 1.0; 2026-02-21T09:14:03.2599807Z // end inline asm 2026-02-21T09:14:03.2599873Z // begin inline asm 2026-02-21T09:14:03.2599933Z mov.u16 %rs46, 0x0; 2026-02-21T09:14:03.2600095Z ld.global.L1::evict_last.L2::cache_hint.b8 { %rs46 }, [ %rd141 + 0 ], %rd125; 
2026-02-21T09:14:03.2600153Z // end inline asm 2026-02-21T09:14:03.2600361Z .loc 1 57 28 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:57:28 2026-02-21T09:14:03.2600427Z shl.b16 %rs55, %rs46, 4; 2026-02-21T09:14:03.2600782Z .loc 1 72 58 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:72:58 2026-02-21T09:14:03.2600874Z selp.b16 %rs56, %rs55, %rs46, %p33; 2026-02-21T09:14:03.2600942Z cvt.s16.s8 %rs57, %rs56; 2026-02-21T09:14:03.2601009Z shr.s16 %rs58, %rs57, 4; 2026-02-21T09:14:03.2601214Z .loc 1 77 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:77:32 2026-02-21T09:14:03.2601281Z cvt.rn.f32.s16 %r1042, %rs58; 2026-02-21T09:14:03.2601347Z st.shared.b32 [%r37], %r1042; 2026-02-21T09:14:03.2601405Z $L__tmp7: 2026-02-21T09:14:03.2601685Z .loc 2 291 36 // standard.py:291:36 @[ csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:84:40 ] 2026-02-21T09:14:03.2601748Z // begin inline asm 2026-02-21T09:14:03.2601825Z fence.proxy.async.shared::cta; 2026-02-21T09:14:03.2601891Z // end inline asm 2026-02-21T09:14:03.2601950Z bar.sync 0; 2026-02-21T09:14:03.2602033Z shfl.sync.idx.b32 %r1043, %r4, 0, 31, -1; 2026-02-21T09:14:03.2602118Z wgmma.fence.sync.aligned; 2026-02-21T09:14:03.2602265Z mov.pred %p25, -1; 2026-02-21T09:14:03.2602329Z // begin inline asm 2026-02-21T09:14:03.2602840Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1154,%r1155,%r1156,%r1157,%r1158,%r1159,%r1160,%r1161,%r1162,%r1163,%r1164,%r1165,%r1166,%r1167,%r1168,%r1169}, {%r954,%r955,%r956,%r957}, %rd128, %p25, 1, 1; 2026-02-21T09:14:03.2602907Z // end inline asm 2026-02-21T09:14:03.2602966Z // begin inline asm 2026-02-21T09:14:03.2603472Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1154,%r1155,%r1156,%r1157,%r1158,%r1159,%r1160,%r1161,%r1162,%r1163,%r1164,%r1165,%r1166,%r1167,%r1168,%r1169}, {%r990,%r991,%r992,%r993}, %rd129, %p25, 1, 1; 2026-02-21T09:14:03.2603537Z // end inline asm 2026-02-21T09:14:03.2603616Z wgmma.commit_group.sync.aligned; 
2026-02-21T09:14:03.2603675Z mov.b32 %r1011, 0; 2026-02-21T09:14:03.2603744Z mov.b32 %r1010, %r867; 2026-02-21T09:14:03.2603810Z mov.b32 %r1012, %r1011; 2026-02-21T09:14:03.2603873Z // begin inline asm 2026-02-21T09:14:03.2604184Z // wait for regs: %r1154,%r1155,%r1156,%r1157,%r1158,%r1159,%r1160,%r1161,%r1162,%r1163,%r1164,%r1165,%r1166,%r1167,%r1168,%r1169,%r1010,%r1011,%r1012 2026-02-21T09:14:03.2604268Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:14:03.2604328Z // end inline asm 2026-02-21T09:14:03.2604384Z $L__tmp8: 2026-02-21T09:14:03.2604606Z .loc 1 40 102 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:40:102 2026-02-21T09:14:03.2604672Z add.s32 %r1044, %r1153, 1; 2026-02-21T09:14:03.2604740Z setp.gt.s32 %p30, %r1044, 1; 2026-02-21T09:14:03.2604814Z selp.b32 %r1153, 0, %r1044, %p30; 2026-02-21T09:14:03.2605016Z .loc 1 48 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:32 2026-02-21T09:14:03.2605091Z mad.wide.s32 %rd131, %r1151, 2, %rd32; 2026-02-21T09:14:03.2605293Z .loc 1 48 80 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:48:80 2026-02-21T09:14:03.2605371Z shl.b32 %r1045, %r1153, 14; 2026-02-21T09:14:03.2605438Z add.s32 %r1046, %r1080, %r1045; 2026-02-21T09:14:03.2605501Z add.s32 %r1032, %r1046, %r30; 2026-02-21T09:14:03.2605572Z selp.b32 %r1033, 8, 0, %p28; 2026-02-21T09:14:03.2605697Z // begin inline asm 2026-02-21T09:14:03.2605845Z cp.async.ca.shared.global [ %r1032 + 0 ], [ %rd142 + 0 ], 0x8, %r1033; 2026-02-21T09:14:03.2605910Z // end inline asm 2026-02-21T09:14:03.2605974Z add.s32 %r1034, %r1032, 8192; 2026-02-21T09:14:03.2606038Z // begin inline asm 2026-02-21T09:14:03.2606175Z cp.async.ca.shared.global [ %r1034 + 0 ], [ %rd131 + 0 ], 0x8, %r1033; 2026-02-21T09:14:03.2606252Z // end inline asm 2026-02-21T09:14:03.2606321Z cp.async.commit_group; 2026-02-21T09:14:03.2606671Z .loc 1 40 102 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:40:102 2026-02-21T09:14:03.2606747Z add.s64 %rd142, %rd142, 
32; 2026-02-21T09:14:03.2606809Z add.s32 %r1151, %r1151, 16; 2026-02-21T09:14:03.2607030Z add.s64 %rd141, %rd141, 65536; 2026-02-21T09:14:03.2607114Z setp.lt.u64 %p31, %rd143, 504; 2026-02-21T09:14:03.2607185Z @%p31 bra $L__BB0_12; 2026-02-21T09:14:03.2607303Z // %bb.13: // in Loop: Header=BB0_11 Depth=1 2026-02-21T09:14:03.2607506Z .loc 1 33 32 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:33:32 2026-02-21T09:14:03.2607578Z or.b32 %r1065, %r183, %r9; 2026-02-21T09:14:03.2607784Z .loc 1 40 102 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:40:102 2026-02-21T09:14:03.2607853Z cp.async.wait_group 0; 2026-02-21T09:14:03.2607917Z bar.sync 0; 2026-02-21T09:14:03.2608117Z .loc 1 87 28 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:87:28 2026-02-21T09:14:03.2608198Z cvt.rn.bf16x2.f32 %r1066, %r1155, %r1154; 2026-02-21T09:14:03.2608279Z cvt.rn.bf16x2.f32 %r1067, %r1157, %r1156; 2026-02-21T09:14:03.2608353Z cvt.rn.bf16x2.f32 %r1068, %r1159, %r1158; 2026-02-21T09:14:03.2608496Z cvt.rn.bf16x2.f32 %r1069, %r1161, %r1160; 2026-02-21T09:14:03.2608571Z cvt.rn.bf16x2.f32 %r1070, %r1163, %r1162; 2026-02-21T09:14:03.2608651Z cvt.rn.bf16x2.f32 %r1071, %r1165, %r1164; 2026-02-21T09:14:03.2608725Z cvt.rn.bf16x2.f32 %r1072, %r1167, %r1166; 2026-02-21T09:14:03.2608797Z cvt.rn.bf16x2.f32 %r1073, %r1169, %r1168; 2026-02-21T09:14:03.2609007Z .loc 1 88 43 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:88:43 2026-02-21T09:14:03.2609072Z shl.b32 %r1074, %r181, 13; 2026-02-21T09:14:03.2609135Z shl.b32 %r1075, %r182, 13; 2026-02-21T09:14:03.2609333Z .loc 1 88 50 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:88:50 2026-02-21T09:14:03.2609406Z add.s32 %r1076, %r1074, %r1065; 2026-02-21T09:14:03.2609470Z add.s32 %r1077, %r1075, %r1065; 2026-02-21T09:14:03.2609667Z .loc 1 88 22 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:88:22 2026-02-21T09:14:03.2609749Z mad.wide.s32 %rd132, %r1076, 2, %rd34; 2026-02-21T09:14:03.2609824Z 
mad.wide.s32 %rd133, %r1077, 2, %rd34; 2026-02-21T09:14:03.2610022Z .loc 1 88 81 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:88:81 2026-02-21T09:14:03.2610147Z st.shared.v4.b32 [%r38], {%r1066, %r1068, %r1070, %r1072}; 2026-02-21T09:14:03.2610266Z st.shared.v4.b32 [%r38+128], {%r1067, %r1069, %r1071, %r1073}; 2026-02-21T09:14:03.2610324Z bar.sync 0; 2026-02-21T09:14:03.2610392Z // begin inline asm 2026-02-21T09:14:03.2610586Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1047, %r1048, %r1049, %r1050}, [%r1051]; 2026-02-21T09:14:03.2610648Z // end inline asm 2026-02-21T09:14:03.2610709Z // begin inline asm 2026-02-21T09:14:03.2610905Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1052, %r1053, %r1054, %r1055}, [%r1056]; 2026-02-21T09:14:03.2610977Z // end inline asm 2026-02-21T09:14:03.2611042Z // begin inline asm 2026-02-21T09:14:03.2611180Z st.global.v4.b32 [ %rd132 + 0 ], { %r1047, %r1048, %r1049, %r1050 }; 2026-02-21T09:14:03.2611243Z // end inline asm 2026-02-21T09:14:03.2611305Z // begin inline asm 2026-02-21T09:14:03.2611424Z st.global.v4.b32 [ %rd133 + 0 ], { %r1052, %r1053, %r1054, %r1055 }; 2026-02-21T09:14:03.2611565Z // end inline asm 2026-02-21T09:14:03.2611778Z .loc 1 19 144 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:19:144 2026-02-21T09:14:03.2611843Z add.s32 %r223, %r1150, 4224; 2026-02-21T09:14:03.2611925Z setp.lt.s32 %p32, %r1150, 3968; 2026-02-21T09:14:03.2611992Z mov.b32 %r1150, %r223; 2026-02-21T09:14:03.2612055Z @%p32 bra $L__BB0_11; 2026-02-21T09:14:03.2612149Z $L__BB0_14: // %._crit_edge 2026-02-21T09:14:03.2612348Z .loc 1 19 4 // csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py:19:4 2026-02-21T09:14:03.2612403Z ret; 2026-02-21T09:14:03.2612460Z $L__tmp9: 2026-02-21T09:14:03.2612526Z $L__func_end0: 2026-02-21T09:14:03.2612722Z // -- End function 2026-02-21T09:14:03.2612783Z } 2026-02-21T09:14:03.2613038Z .file 1 "/tmp/torchinductor_root/sr/csrxdzhubgqhh7ymjf7oci5gx6qer46dfgpyh5e77rrwhu6ayf66.py" 
2026-02-21T09:14:03.2613254Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T09:14:03.2613320Z .section .debug_abbrev 2026-02-21T09:14:03.2613379Z { 2026-02-21T09:14:03.2613473Z .b8 1 // Abbreviation Code 2026-02-21T09:14:03.2613571Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:14:03.2613659Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:14:03.2613753Z .b8 37 // DW_AT_producer 2026-02-21T09:14:03.2613832Z .b8 8 // DW_FORM_string 2026-02-21T09:14:03.2613913Z .b8 19 // DW_AT_language 2026-02-21T09:14:03.2613999Z .b8 5 // DW_FORM_data2 2026-02-21T09:14:03.2614129Z .b8 3 // DW_AT_name 2026-02-21T09:14:03.2614213Z .b8 8 // DW_FORM_string 2026-02-21T09:14:03.2614301Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:14:03.2614384Z .b8 6 // DW_FORM_data4 2026-02-21T09:14:03.2614478Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:14:03.2614560Z .b8 8 // DW_FORM_string 2026-02-21T09:14:03.2614641Z .b8 0 // EOM(1) 2026-02-21T09:14:03.2614710Z .b8 0 // EOM(2) 2026-02-21T09:14:03.2614800Z .b8 2 // Abbreviation Code 2026-02-21T09:14:03.2614892Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:14:03.2614974Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:14:03.2615053Z .b8 3 // DW_AT_name 2026-02-21T09:14:03.2615142Z .b8 8 // DW_FORM_string 2026-02-21T09:14:03.2615224Z .b8 32 // DW_AT_inline 2026-02-21T09:14:03.2615304Z .b8 11 // DW_FORM_data1 2026-02-21T09:14:03.2615379Z .b8 0 // EOM(1) 2026-02-21T09:14:03.2615455Z .b8 0 // EOM(2) 2026-02-21T09:14:03.2615541Z .b8 3 // Abbreviation Code 2026-02-21T09:14:03.2615628Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:14:03.2615720Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:14:03.2615800Z .b8 17 // DW_AT_low_pc 2026-02-21T09:14:03.2615877Z .b8 1 // DW_FORM_addr 2026-02-21T09:14:03.2615968Z .b8 18 // DW_AT_high_pc 2026-02-21T09:14:03.2616046Z .b8 1 // DW_FORM_addr 2026-02-21T09:14:03.2616151Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:14:03.2616233Z .b8 19 // DW_FORM_ref4 2026-02-21T09:14:03.2616305Z .b8 0 // EOM(1) 
2026-02-21T09:14:03.2616578Z .b8 0 // EOM(2) 2026-02-21T09:14:03.2616681Z .b8 4 // Abbreviation Code 2026-02-21T09:14:03.2616789Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T09:14:03.2616871Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:14:03.2616962Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:14:03.2617057Z .b8 19 // DW_FORM_ref4 2026-02-21T09:14:03.2617138Z .b8 17 // DW_AT_low_pc 2026-02-21T09:14:03.2617219Z .b8 1 // DW_FORM_addr 2026-02-21T09:14:03.2617306Z .b8 18 // DW_AT_high_pc 2026-02-21T09:14:03.2617584Z .b8 1 // DW_FORM_addr 2026-02-21T09:14:03.2617730Z .b8 88 // DW_AT_call_file 2026-02-21T09:14:03.2617821Z .b8 11 // DW_FORM_data1 2026-02-21T09:14:03.2617911Z .b8 89 // DW_AT_call_line 2026-02-21T09:14:03.2617994Z .b8 11 // DW_FORM_data1 2026-02-21T09:14:03.2618085Z .b8 87 // DW_AT_call_column 2026-02-21T09:14:03.2618168Z .b8 11 // DW_FORM_data1 2026-02-21T09:14:03.2618245Z .b8 0 // EOM(1) 2026-02-21T09:14:03.2618316Z .b8 0 // EOM(2) 2026-02-21T09:14:03.2618393Z .b8 0 // EOM(3) 2026-02-21T09:14:03.2618448Z } 2026-02-21T09:14:03.2618514Z .section .debug_info 2026-02-21T09:14:03.2618568Z { 2026-02-21T09:14:03.2618671Z .b32 178 // Length of Unit 2026-02-21T09:14:03.2618843Z .b8 2 // DWARF version number 2026-02-21T09:14:03.2618901Z .b8 0 2026-02-21T09:14:03.2619043Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:14:03.2619143Z .b8 8 // Address Size (in bytes) 2026-02-21T09:14:03.2619264Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T09:14:03.2619366Z .b8 116 // DW_AT_producer 2026-02-21T09:14:03.2619422Z .b8 114 2026-02-21T09:14:03.2619476Z .b8 105 2026-02-21T09:14:03.2619531Z .b8 116 2026-02-21T09:14:03.2619594Z .b8 111 2026-02-21T09:14:03.2619647Z .b8 110 2026-02-21T09:14:03.2619700Z .b8 0 2026-02-21T09:14:03.2619786Z .b8 2 // DW_AT_language 2026-02-21T09:14:03.2619840Z .b8 0 2026-02-21T09:14:03.2619932Z .b8 99 // DW_AT_name 2026-02-21T09:14:03.2619991Z .b8 115 2026-02-21T09:14:03.2620054Z .b8 114 2026-02-21T09:14:03.2620111Z .b8 120 2026-02-21T09:14:03.2620166Z .b8 100 2026-02-21T09:14:03.2620225Z .b8 122 2026-02-21T09:14:03.2620278Z .b8 104 2026-02-21T09:14:03.2620334Z .b8 117 2026-02-21T09:14:03.2620387Z .b8 98 2026-02-21T09:14:03.2620445Z .b8 103 2026-02-21T09:14:03.2620502Z .b8 113 2026-02-21T09:14:03.2620557Z .b8 104 2026-02-21T09:14:03.2620612Z .b8 104 2026-02-21T09:14:03.2620672Z .b8 55 2026-02-21T09:14:03.2620724Z .b8 121 2026-02-21T09:14:03.2620776Z .b8 109 2026-02-21T09:14:03.2620833Z .b8 106 2026-02-21T09:14:03.2620888Z .b8 102 2026-02-21T09:14:03.2620940Z .b8 55 2026-02-21T09:14:03.2620992Z .b8 111 2026-02-21T09:14:03.2621049Z .b8 99 2026-02-21T09:14:03.2621104Z .b8 105 2026-02-21T09:14:03.2621155Z .b8 53 2026-02-21T09:14:03.2621213Z .b8 103 2026-02-21T09:14:03.2621276Z .b8 120 2026-02-21T09:14:03.2621332Z .b8 54 2026-02-21T09:14:03.2621385Z .b8 113 2026-02-21T09:14:03.2621447Z .b8 101 2026-02-21T09:14:03.2621500Z .b8 114 2026-02-21T09:14:03.2621553Z .b8 52 2026-02-21T09:14:03.2621610Z .b8 54 2026-02-21T09:14:03.2621671Z .b8 100 2026-02-21T09:14:03.2621724Z .b8 102 2026-02-21T09:14:03.2621778Z .b8 103 2026-02-21T09:14:03.2621838Z .b8 112 2026-02-21T09:14:03.2621968Z .b8 121 2026-02-21T09:14:03.2622022Z .b8 104 2026-02-21T09:14:03.2622077Z .b8 53 2026-02-21T09:14:03.2622143Z .b8 101 2026-02-21T09:14:03.2622198Z .b8 55 
2026-02-21T09:14:03.2622259Z .b8 55 2026-02-21T09:14:03.2622317Z .b8 114 2026-02-21T09:14:03.2622370Z .b8 114 2026-02-21T09:14:03.2622423Z .b8 119 2026-02-21T09:14:03.2622476Z .b8 104 2026-02-21T09:14:03.2622534Z .b8 117 2026-02-21T09:14:03.2622589Z .b8 54 2026-02-21T09:14:03.2622641Z .b8 97 2026-02-21T09:14:03.2622698Z .b8 121 2026-02-21T09:14:03.2622752Z .b8 102 2026-02-21T09:14:03.2622805Z .b8 54 2026-02-21T09:14:03.2622855Z .b8 54 2026-02-21T09:14:03.2622913Z .b8 46 2026-02-21T09:14:03.2622967Z .b8 112 2026-02-21T09:14:03.2623019Z .b8 121 2026-02-21T09:14:03.2623070Z .b8 0 2026-02-21T09:14:03.2623178Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:14:03.2623391Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:14:03.2623447Z .b8 116 2026-02-21T09:14:03.2623506Z .b8 109 2026-02-21T09:14:03.2623558Z .b8 112 2026-02-21T09:14:03.2623613Z .b8 47 2026-02-21T09:14:03.2623665Z .b8 116 2026-02-21T09:14:03.2623723Z .b8 111 2026-02-21T09:14:03.2623776Z .b8 114 2026-02-21T09:14:03.2623828Z .b8 99 2026-02-21T09:14:03.2623887Z .b8 104 2026-02-21T09:14:03.2623939Z .b8 105 2026-02-21T09:14:03.2623992Z .b8 110 2026-02-21T09:14:03.2624044Z .b8 100 2026-02-21T09:14:03.2624102Z .b8 117 2026-02-21T09:14:03.2624167Z .b8 99 2026-02-21T09:14:03.2624223Z .b8 116 2026-02-21T09:14:03.2624283Z .b8 111 2026-02-21T09:14:03.2624335Z .b8 114 2026-02-21T09:14:03.2624389Z .b8 95 2026-02-21T09:14:03.2624441Z .b8 114 2026-02-21T09:14:03.2624499Z .b8 111 2026-02-21T09:14:03.2624551Z .b8 111 2026-02-21T09:14:03.2624604Z .b8 116 2026-02-21T09:14:03.2624656Z .b8 47 2026-02-21T09:14:03.2624714Z .b8 115 2026-02-21T09:14:03.2624769Z .b8 114 2026-02-21T09:14:03.2624824Z .b8 0 2026-02-21T09:14:03.2625002Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T09:14:03.2625084Z .b8 95 // DW_AT_name 2026-02-21T09:14:03.2625140Z .b8 104 2026-02-21T09:14:03.2625200Z .b8 101 2026-02-21T09:14:03.2625253Z .b8 108 2026-02-21T09:14:03.2625307Z .b8 105 2026-02-21T09:14:03.2625360Z .b8 111 2026-02-21T09:14:03.2625424Z 
.b8 110 2026-02-21T09:14:03.2625479Z .b8 95 2026-02-21T09:14:03.2625531Z .b8 109 2026-02-21T09:14:03.2625583Z .b8 97 2026-02-21T09:14:03.2625642Z .b8 116 2026-02-21T09:14:03.2625695Z .b8 109 2026-02-21T09:14:03.2625760Z .b8 117 2026-02-21T09:14:03.2625822Z .b8 108 2026-02-21T09:14:03.2625874Z .b8 95 2026-02-21T09:14:03.2625928Z .b8 98 2026-02-21T09:14:03.2625981Z .b8 102 2026-02-21T09:14:03.2626041Z .b8 49 2026-02-21T09:14:03.2626094Z .b8 54 2026-02-21T09:14:03.2626147Z .b8 95 2026-02-21T09:14:03.2626206Z .b8 105 2026-02-21T09:14:03.2626259Z .b8 110 2026-02-21T09:14:03.2626317Z .b8 116 2026-02-21T09:14:03.2626372Z .b8 52 2026-02-21T09:14:03.2626431Z .b8 0 2026-02-21T09:14:03.2626670Z .b8 1 // DW_AT_inline 2026-02-21T09:14:03.2626785Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T09:14:03.2626896Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T09:14:03.2626995Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T09:14:03.2627097Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:14:03.2627229Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T09:14:03.2627332Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:14:03.2627422Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T09:14:03.2627513Z .b64 $L__tmp8 // DW_AT_high_pc 2026-02-21T09:14:03.2627604Z .b8 1 // DW_AT_call_file 2026-02-21T09:14:03.2627692Z .b8 84 // DW_AT_call_line 2026-02-21T09:14:03.2627782Z .b8 40 // DW_AT_call_column 2026-02-21T09:14:03.2627882Z .b8 0 // End Of Children Mark 2026-02-21T09:14:03.2628078Z .b8 0 // End Of Children Mark 2026-02-21T09:14:03.2628135Z } 2026-02-21T09:14:03.2628208Z .section .debug_macinfo { } 2026-02-21T09:14:03.2628220Z 2026-02-21T09:14:03.2628363Z ================================================================ 2026-02-21T09:14:03.2628495Z please share the reproducer above with Triton project. 
2026-02-21T09:14:18.9794479Z [139s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 64, 64], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[2], load_eviction_policies=['first', 'first'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=2, num_stages=2, num_warps=4, pid_type='persistent_interleaved', range_flattens=[True, True], range_multi_buffers=[True, None], range_num_stages=[4, 1], range_unroll_factors=[1, 1], range_warp_specializes=[]) 2026-02-21T09:14:18.9797349Z Tensor-likes are not close! 2026-02-21T09:14:18.9797598Z 2026-02-21T09:14:18.9797765Z Mismatched elements: 134037546 / 134217728 (99.9%) 2026-02-21T09:14:18.9798203Z Greatest absolute difference: 3248.0 at index (2314, 2332) (up to 0.01 allowed) 2026-02-21T09:14:18.9798729Z Greatest relative difference: inf at index (3448, 3113) (up to 0.01 allowed) 2026-02-21T09:14:18.9799197Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:14:18.9799450Z 2026-02-21T09:14:24.4034365Z [144s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 256], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[32], load_eviction_policies=['', ''], loop_orders=[[0, 1]], maxnreg=64, num_sm_multiplier=4, num_stages=7, num_warps=2, pid_type='persistent_interleaved', range_flattens=[False, True], range_multi_buffers=[True, True], range_num_stages=[0, 3], range_unroll_factors=[3, 0], range_warp_specializes=[]) 2026-02-21T09:14:24.4036016Z Tensor-likes are not close! 2026-02-21T09:14:24.4036168Z 2026-02-21T09:14:24.4036277Z Mismatched elements: 133768015 / 134217728 (99.7%) 2026-02-21T09:14:24.4037001Z Greatest absolute difference: 1408.0 at index (5503, 5031) (up to 0.01 allowed) 2026-02-21T09:14:24.4037500Z Greatest relative difference: inf at index (3448, 3113) (up to 0.01 allowed) 2026-02-21T09:14:24.4037938Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:14:24.4038172Z 2026-02-21T09:14:25.6080557Z [146s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 1024, 256], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[16], load_eviction_policies=['', 'first'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=4, num_stages=4, num_warps=32, pid_type='persistent_blocked', range_flattens=[False, False], range_multi_buffers=[False, False], range_num_stages=[3, 0], range_unroll_factors=[0, 0], range_warp_specializes=[]) 2026-02-21T09:14:25.6082480Z Tensor-likes are not close! 2026-02-21T09:14:25.6082643Z 2026-02-21T09:14:25.6082764Z Mismatched elements: 133944767 / 134217728 (99.8%) 2026-02-21T09:14:25.6083198Z Greatest absolute difference: 2448.0 at index (1487, 3813) (up to 0.01 allowed) 2026-02-21T09:14:25.6083731Z Greatest relative difference: inf at index (3448, 3113) (up to 0.01 allowed) 2026-02-21T09:14:25.6084193Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:14:25.6084449Z 2026-02-21T09:14:39.2353909Z [159s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 16, 256], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['', 'first'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=1, num_stages=7, num_warps=32, pid_type='persistent_blocked', range_flattens=[None, True], range_multi_buffers=[True, False], range_num_stages=[1, 2], range_unroll_factors=[4, 0], range_warp_specializes=[]) 2026-02-21T09:14:39.2355853Z Tensor-likes are not close! 2026-02-21T09:14:39.2356012Z 2026-02-21T09:14:39.2356137Z Mismatched elements: 134215283 / 134217728 (100.0%) 2026-02-21T09:14:39.2357381Z Greatest absolute difference: 2768.0 at index (5819, 7671) (up to 0.01 allowed) 2026-02-21T09:14:39.2357928Z Greatest relative difference: 3.03125 at index (14, 6207) (up to 0.01 allowed) 2026-02-21T09:14:39.2358399Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:14:39.2358660Z 2026-02-21T09:15:05.8010101Z [186s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 64, 256], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[2], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], num_stages=4, num_warps=16, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, False], range_num_stages=[0, 4], range_unroll_factors=[0, 2], range_warp_specializes=[]) 2026-02-21T09:15:05.8012401Z Tensor-likes are not close! 2026-02-21T09:15:05.8012716Z 2026-02-21T09:15:05.8012839Z Mismatched elements: 134038581 / 134217728 (99.9%) 2026-02-21T09:15:05.8013257Z Greatest absolute difference: 3440.0 at index (2330, 2316) (up to 0.01 allowed) 2026-02-21T09:15:05.8013761Z Greatest relative difference: inf at index (3448, 3113) (up to 0.01 allowed) 2026-02-21T09:15:05.8014201Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:15:05.8014437Z 2026-02-21T09:15:42.1344512Z [222s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 64, 1024], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[32], load_eviction_policies=['', ''], loop_orders=[[0, 1]], maxnreg=128, num_sm_multiplier=1, num_stages=6, num_warps=8, pid_type='persistent_interleaved', range_flattens=[False, False], range_multi_buffers=[None, None], range_num_stages=[4, 0], range_unroll_factors=[3, 0], range_warp_specializes=[]) 2026-02-21T09:15:42.1347425Z Tensor-likes are not close! 2026-02-21T09:15:42.1347710Z 2026-02-21T09:15:42.1348486Z Mismatched elements: 134016187 / 134217728 (99.8%) 2026-02-21T09:15:42.1349197Z Greatest absolute difference: 3248.0 at index (16335, 3919) (up to 0.01 allowed) 2026-02-21T09:15:42.1350018Z Greatest relative difference: inf at index (3448, 3113) (up to 0.01 allowed) 2026-02-21T09:15:42.1350714Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:15:42.1351108Z 2026-02-21T09:15:42.7168551Z [223s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 32, 2048], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['', 'last'], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=128, num_stages=8, num_warps=2, pid_type='persistent_blocked', range_flattens=[True, True], range_multi_buffers=[False, True], range_num_stages=[0, 4], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T09:15:42.7171396Z Tensor-likes are not close! 2026-02-21T09:15:42.7171652Z 2026-02-21T09:15:42.7171842Z Mismatched elements: 134011873 / 134217728 (99.8%) 2026-02-21T09:15:42.7172498Z Greatest absolute difference: 3424.0 at index (2330, 2588) (up to 0.01 allowed) 2026-02-21T09:15:42.7173401Z Greatest relative difference: inf at index (3448, 3113) (up to 0.01 allowed) 2026-02-21T09:15:42.7173906Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:15:42.7174163Z 2026-02-21T09:15:45.0804968Z [225s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 128], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[1], load_eviction_policies=['', 'last'], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=16, num_stages=5, num_warps=8, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[False, True], range_num_stages=[0, 0], range_unroll_factors=[2, 2], range_warp_specializes=[]) 2026-02-21T09:15:45.0806941Z Tensor-likes are not close! 2026-02-21T09:15:45.0807093Z 2026-02-21T09:15:45.0807216Z Mismatched elements: 133575894 / 134217728 (99.5%) 2026-02-21T09:15:45.0807635Z Greatest absolute difference: 1360.0 at index (6311, 4721) (up to 0.01 allowed) 2026-02-21T09:15:45.0808131Z Greatest relative difference: inf at index (3448, 3113) (up to 0.01 allowed) 2026-02-21T09:15:45.0808817Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:15:45.0809054Z 2026-02-21T09:15:48.2465833Z [228s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 64, 16], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[4], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=16, num_stages=6, num_warps=32, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[False, True], range_num_stages=[3, 3], range_unroll_factors=[0, 1], range_warp_specializes=[]) 2026-02-21T09:15:48.2469190Z Tensor-likes are not close! 2026-02-21T09:15:48.2469461Z 2026-02-21T09:15:48.2469650Z Mismatched elements: 134215283 / 134217728 (100.0%) 2026-02-21T09:15:48.2471050Z Greatest absolute difference: 2768.0 at index (5819, 7671) (up to 0.01 allowed) 2026-02-21T09:15:48.2471935Z Greatest relative difference: 3.03125 at index (14, 6207) (up to 0.01 allowed) 2026-02-21T09:15:48.2472665Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:15:48.2473073Z 2026-02-21T09:16:11.7841462Z [252s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 64, 64], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=16, num_stages=7, num_warps=2, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[True, None], range_num_stages=[2, 2], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T09:16:11.7844202Z Tensor-likes are not close! 2026-02-21T09:16:11.7844459Z 2026-02-21T09:16:11.7844643Z Mismatched elements: 133775108 / 134217728 (99.7%) 2026-02-21T09:16:11.7845651Z Greatest absolute difference: 1488.0 at index (5644, 599) (up to 0.01 allowed) 2026-02-21T09:16:11.7846792Z Greatest relative difference: inf at index (3448, 3113) (up to 0.01 allowed) 2026-02-21T09:16:11.7847511Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:16:11.7847905Z 2026-02-21T09:16:36.7244056Z [277s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 32, 16], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[32], load_eviction_policies=['first', 'first'], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=1, num_stages=7, num_warps=8, pid_type='persistent_interleaved', range_flattens=[True, None], range_multi_buffers=[False, None], range_num_stages=[1, 3], range_unroll_factors=[0, 4], range_warp_specializes=[]) 2026-02-21T09:16:36.7246018Z Tensor-likes are not close! 2026-02-21T09:16:36.7246196Z 2026-02-21T09:16:36.7246326Z Mismatched elements: 133905832 / 134217728 (99.8%) 2026-02-21T09:16:36.7247176Z Greatest absolute difference: 2400.0 at index (6169, 6459) (up to 0.01 allowed) 2026-02-21T09:16:36.7247789Z Greatest relative difference: inf at index (3448, 3113) (up to 0.01 allowed) 2026-02-21T09:16:36.7248308Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:16:36.7248594Z 2026-02-21T09:16:36.7958880Z [277s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 32, 1024], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[4], load_eviction_policies=['first', 'first'], loop_orders=[[0, 1]], maxnreg=256, num_sm_multiplier=4, num_stages=8, num_warps=2, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[True, True], range_num_stages=[3, 1], range_unroll_factors=[3, 1], range_warp_specializes=[]) 2026-02-21T09:16:36.7960894Z Tensor-likes are not close! 2026-02-21T09:16:36.7961074Z 2026-02-21T09:16:36.7961196Z Mismatched elements: 133943599 / 134217728 (99.8%) 2026-02-21T09:16:36.7961678Z Greatest absolute difference: 2416.0 at index (8123, 7471) (up to 0.01 allowed) 2026-02-21T09:16:36.7962260Z Greatest relative difference: inf at index (3448, 3113) (up to 0.01 allowed) 2026-02-21T09:16:36.7962767Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:16:36.7963461Z 2026-02-21T09:16:49.7482579Z 2026-02-21T09:16:49.7484956Z Initial population exploring neighbors 100% ━━━━━━━━━━━━━━ 100/100 0.5 configs/s 2026-02-21T09:16:49.7502003Z [290s] Adaptive compile timeout: 30s (90% percentile=20.9s, bounds=[30.0s, 30s]) 2026-02-21T09:16:49.7752548Z Verifying initial results 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 25/25 - configs/s 2026-02-21T09:16:50.1831955Z [290s] Initial random population of 100, 5 starting points: 2026-02-21T09:16:50.1832598Z error=19 2026-02-21T09:16:50.1832898Z timeout=4 2026-02-21T09:16:50.1833159Z ok=77 2026-02-21T09:16:50.1833426Z min=7.9460 2026-02-21T09:16:50.1833696Z mid=216.7040 2026-02-21T09:16:50.1833999Z max=2613.2520 2026-02-21T09:16:50.1834362Z best={'block_sizes': [16, 16, 128], 2026-02-21T09:16:50.1835436Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:16:50.1835833Z 'l2_groupings': [1], 2026-02-21T09:16:50.1836066Z 'load_eviction_policies': ['last', 'last'], 2026-02-21T09:16:50.1836350Z 'loop_orders': [[0, 1]], 2026-02-21T09:16:50.1836849Z 'maxnreg': 128, 2026-02-21T09:16:50.1837051Z 'num_sm_multiplier': 16, 2026-02-21T09:16:50.1837272Z 'num_stages': 7, 2026-02-21T09:16:50.1837463Z 'num_warps': 4, 2026-02-21T09:16:50.1837663Z 'pid_type': 'persistent_blocked', 2026-02-21T09:16:50.1837925Z 'range_flattens': [None, None], 2026-02-21T09:16:50.1838171Z 'range_multi_buffers': [False, None], 2026-02-21T09:16:50.1838437Z 'range_num_stages': [2, 2], 2026-02-21T09:16:50.1838667Z 'range_unroll_factors': [4, 0], 2026-02-21T09:16:50.1838916Z 'range_warp_specializes': []} 2026-02-21T09:16:50.1854291Z [290s] Fitting surrogate: 100 points, 100 targets 2026-02-21T09:16:52.1622386Z [292s] Generation 1 starting: 114 neighbors, 5 active search path(s) 2026-02-21T09:17:41.8833656Z [342s] Timeout after 30s compiling Config(block_sizes=[256, 64, 8], indexing=['pointer', 'pointer', 'tensor_descriptor'], l2_groupings=[1], load_eviction_policies=['', 
'first'], loop_orders=[[1, 0]], num_sm_multiplier=32, num_stages=7, num_warps=2, pid_type='persistent_blocked', range_flattens=[None, True], range_multi_buffers=[False, False], range_num_stages=[1, 0], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T09:17:41.8852301Z Generation 1: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 116/116 0.5 configs/s 2026-02-21T09:17:42.3583288Z [342s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[1], load_eviction_policies=['last', 'last'], loop_orders=[[1, 0]], maxnreg=128, num_sm_multiplier=16, num_stages=7, num_warps=4, pid_type='persistent_blocked', range_flattens=[None, None], range_multi_buffers=[False, None], range_num_stages=[2, 2], range_unroll_factors=[4, 0], range_warp_specializes=[]) 2026-02-21T09:17:42.3585202Z Tensor-likes are not close! 2026-02-21T09:17:42.3585375Z 2026-02-21T09:17:42.3585498Z Mismatched elements: 133685123 / 134217728 (99.6%) 2026-02-21T09:17:42.3586186Z Greatest absolute difference: 1408.0 at index (13960, 7749) (up to 0.01 allowed) 2026-02-21T09:17:42.3586954Z Greatest relative difference: inf at index (3448, 3113) (up to 0.01 allowed) 2026-02-21T09:17:42.3587438Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:17:42.3587696Z 2026-02-21T09:17:46.7141897Z [347s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 64, 32], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[64], load_eviction_policies=['', 'last'], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=8, num_stages=7, num_warps=16, pid_type='persistent_interleaved', range_flattens=[None, None], range_multi_buffers=[True, None], range_num_stages=[4, 4], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T09:17:46.7143830Z Tensor-likes are not close! 
2026-02-21T09:17:46.7144020Z 2026-02-21T09:17:46.7144143Z Mismatched elements: 133923697 / 134217728 (99.8%) 2026-02-21T09:17:46.7144592Z Greatest absolute difference: 2448.0 at index (11770, 6169) (up to 0.01 allowed) 2026-02-21T09:17:46.7145141Z Greatest relative difference: inf at index (3448, 3113) (up to 0.01 allowed) 2026-02-21T09:17:46.7145609Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:17:46.7145863Z 2026-02-21T09:18:03.9977323Z Generation 1: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 116/116 5.2 configs/s 2026-02-21T09:18:04.1061158Z Generation 1: verifying top configs 100% ━━━━━━━━━━━━━━━ 128/128 2930.0 2026-02-21T09:18:04.1061639Z configs/s 2026-02-21T09:18:04.5142594Z [364s] Generation 1 complete: 2026-02-21T09:18:04.5142872Z error=26 2026-02-21T09:18:04.5143050Z timeout=1 2026-02-21T09:18:04.5143223Z ok=92 2026-02-21T09:18:04.5161884Z min=1.5620 2026-02-21T09:18:04.5162213Z mid=13.1951 2026-02-21T09:18:04.5162389Z max=209.8972 2026-02-21T09:18:04.5162587Z best={'block_sizes': [16, 64, 128], 2026-02-21T09:18:04.5162977Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:18:04.5163383Z 'l2_groupings': [1], 2026-02-21T09:18:04.5163617Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:18:04.5163889Z 'loop_orders': [[0, 1]], 2026-02-21T09:18:04.5164100Z 'maxnreg': 128, 2026-02-21T09:18:04.5164287Z 'num_sm_multiplier': 8, 2026-02-21T09:18:04.5164497Z 'num_stages': 7, 2026-02-21T09:18:04.5164676Z 'num_warps': 4, 2026-02-21T09:18:04.5164880Z 'pid_type': 'persistent_blocked', 2026-02-21T09:18:04.5165134Z 'range_flattens': [None, None], 2026-02-21T09:18:04.5165388Z 'range_multi_buffers': [False, None], 2026-02-21T09:18:04.5165652Z 'range_num_stages': [2, 2], 2026-02-21T09:18:04.5165881Z 'range_unroll_factors': [4, 0], 2026-02-21T09:18:04.5166126Z 'range_warp_specializes': []} 2026-02-21T09:18:04.5168497Z [364s] Fitting surrogate: 219 points, 219 targets 
2026-02-21T09:18:06.4202424Z [366s] Generation 2 starting: 111 neighbors, 5 active search path(s) 2026-02-21T09:19:02.0897162Z Generation 2: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 114/114 0.9 configs/s 2026-02-21T09:19:08.0261232Z [428s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 64, 32], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['', 'last'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=8, num_stages=7, num_warps=1, pid_type='persistent_interleaved', range_flattens=[None, None], range_multi_buffers=[True, None], range_num_stages=[3, 4], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T09:19:08.0263083Z Tensor-likes are not close! 2026-02-21T09:19:08.0263253Z 2026-02-21T09:19:08.0263390Z Mismatched elements: 133779192 / 134217728 (99.7%) 2026-02-21T09:19:08.0263826Z Greatest absolute difference: 1488.0 at index (5644, 599) (up to 0.01 allowed) 2026-02-21T09:19:08.0264422Z Greatest relative difference: inf at index (3448, 3113) (up to 0.01 allowed) 2026-02-21T09:19:08.0264890Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:19:08.0265616Z 2026-02-21T09:19:14.1161344Z 2026-02-21T09:19:14.1161359Z 2026-02-21T09:19:14.1161734Z ================================================================ 2026-02-21T09:19:14.1162125Z Internal Triton PTX codegen error 2026-02-21T09:19:14.1162392Z `ptxas` stderr: 2026-02-21T09:19:14.1163128Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 1282 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 
2026-02-21T09:19:14.1163958Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:19:14.1164186Z 2026-02-21T09:19:14.1164835Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpu2yd3u9a.ptx -o /tmp/tmpu2yd3u9a.ptx.o 2026-02-21T09:19:14.1165991Z 2026-02-21T09:19:14.1166000Z 2026-02-21T09:19:14.1166098Z // 2026-02-21T09:19:14.1166296Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:19:14.1166796Z // 2026-02-21T09:19:14.1166895Z 2026-02-21T09:19:14.1166971Z .version 8.7 2026-02-21T09:19:14.1167143Z .target sm_90a 2026-02-21T09:19:14.1167324Z .address_size 64 2026-02-21T09:19:14.1167434Z 2026-02-21T09:19:14.1167658Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T09:19:14.1168105Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:19:14.1168430Z // @_helion_matmul_bf16_int4 2026-02-21T09:19:14.1168751Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T09:19:14.1169129Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T09:19:14.1169563Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T09:19:14.1170000Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T09:19:14.1170626Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T09:19:14.1171023Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T09:19:14.1171321Z ) 2026-02-21T09:19:14.1171459Z .reqntid 256 2026-02-21T09:19:14.1171617Z .maxnreg 32 2026-02-21T09:19:14.1171755Z { 2026-02-21T09:19:14.1171899Z .reg .pred %p<111>; 2026-02-21T09:19:14.1172079Z .reg .b16 %rs<728>; 2026-02-21T09:19:14.1172250Z .reg .b32 %r<24289>; 2026-02-21T09:19:14.1172418Z .reg .b64 %rd<1124>; 2026-02-21T09:19:14.1172759Z .loc 1 14 0 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:14:0 2026-02-21T09:19:14.1173160Z $L__func_begin0: 
2026-02-21T09:19:14.1173484Z .loc 1 14 0 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:14:0 2026-02-21T09:19:14.1173809Z 2026-02-21T09:19:14.1173873Z // %bb.0: 2026-02-21T09:19:14.1174083Z ld.param.b64 %rd66, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T09:19:14.1174529Z ld.param.b64 %rd65, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T09:19:14.1174849Z ld.param.b64 %rd64, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T09:19:14.1175115Z $L__tmp0: 2026-02-21T09:19:14.1175440Z .loc 1 20 30 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:20:30 2026-02-21T09:19:14.1175849Z mov.u32 %r22988, %ctaid.x; 2026-02-21T09:19:14.1176203Z .loc 1 21 49 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:21:49 2026-02-21T09:19:14.1176781Z min.u32 %r2, %r22988, 2047; 2026-02-21T09:19:14.1177145Z .loc 1 22 121 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:22:121 2026-02-21T09:19:14.1177543Z sub.s32 %r2950, %r2, %r22988; 2026-02-21T09:19:14.1177740Z add.s32 %r2951, %r2950, 1; 2026-02-21T09:19:14.1177929Z shr.s32 %r2952, %r2951, 31; 2026-02-21T09:19:14.1178115Z shr.u32 %r2953, %r2952, 30; 2026-02-21T09:19:14.1178316Z add.s32 %r2954, %r2951, %r2953; 2026-02-21T09:19:14.1178528Z and.b32 %r2955, %r2954, -4; 2026-02-21T09:19:14.1178734Z add.s32 %r24029, %r2955, %r22988; 2026-02-21T09:19:14.1179094Z .loc 1 34 45 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:34:45 2026-02-21T09:19:14.1179654Z mov.u32 %r4, %tid.x; 2026-02-21T09:19:14.1179818Z shr.u32 %r5, %r4, 5; 2026-02-21T09:19:14.1179977Z shl.b32 %r6, %r4, 2; 2026-02-21T09:19:14.1180131Z and.b32 %r7, %r6, 124; 2026-02-21T09:19:14.1180314Z shl.b32 %r8, %r4, 3; 2026-02-21T09:19:14.1180474Z and.b32 %r9, %r8, 120; 2026-02-21T09:19:14.1180785Z .loc 1 36 45 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:36:45 2026-02-21T09:19:14.1181149Z and.b32 %r10, %r4, 252; 2026-02-21T09:19:14.1181324Z bfe.u32 %r11, %r4, 2, 6; 2026-02-21T09:19:14.1181545Z or.b32 %r12, %r11, 64; 
2026-02-21T09:19:14.1181710Z or.b32 %r13, %r11, 128; 2026-02-21T09:19:14.1181878Z or.b32 %r14, %r11, 192; 2026-02-21T09:19:14.1182044Z or.b32 %r15, %r11, 256; 2026-02-21T09:19:14.1182306Z or.b32 %r16, %r11, 320; 2026-02-21T09:19:14.1182479Z or.b32 %r17, %r11, 384; 2026-02-21T09:19:14.1182639Z or.b32 %r18, %r11, 448; 2026-02-21T09:19:14.1182803Z shr.u32 %r2956, %r4, 4; 2026-02-21T09:19:14.1182989Z bfe.u32 %r19, %r4, 4, 4; 2026-02-21T09:19:14.1183163Z or.b32 %r20, %r19, 16; 2026-02-21T09:19:14.1183321Z or.b32 %r21, %r19, 32; 2026-02-21T09:19:14.1186166Z or.b32 %r22, %r2956, 48; 2026-02-21T09:19:14.1186373Z or.b32 %r23, %r19, 64; 2026-02-21T09:19:14.1186771Z or.b32 %r24, %r19, 80; 2026-02-21T09:19:14.1186944Z or.b32 %r25, %r19, 96; 2026-02-21T09:19:14.1187108Z or.b32 %r26, %r2956, 112; 2026-02-21T09:19:14.1187285Z or.b32 %r27, %r19, 128; 2026-02-21T09:19:14.1187452Z or.b32 %r28, %r19, 144; 2026-02-21T09:19:14.1187619Z or.b32 %r29, %r19, 160; 2026-02-21T09:19:14.1187777Z or.b32 %r30, %r2956, 176; 2026-02-21T09:19:14.1201691Z or.b32 %r31, %r19, 192; 2026-02-21T09:19:14.1201918Z or.b32 %r32, %r19, 208; 2026-02-21T09:19:14.1202301Z or.b32 %r33, %r19, 224; 2026-02-21T09:19:14.1202495Z or.b32 %r34, %r2956, 240; 2026-02-21T09:19:14.1202677Z or.b32 %r35, %r19, 256; 2026-02-21T09:19:14.1202858Z or.b32 %r36, %r19, 272; 2026-02-21T09:19:14.1203026Z or.b32 %r37, %r19, 288; 2026-02-21T09:19:14.1203201Z or.b32 %r38, %r2956, 304; 2026-02-21T09:19:14.1203385Z or.b32 %r39, %r19, 320; 2026-02-21T09:19:14.1203552Z or.b32 %r40, %r19, 336; 2026-02-21T09:19:14.1203728Z or.b32 %r41, %r19, 352; 2026-02-21T09:19:14.1203908Z or.b32 %r42, %r2956, 368; 2026-02-21T09:19:14.1204110Z or.b32 %r43, %r19, 384; 2026-02-21T09:19:14.1204285Z or.b32 %r44, %r19, 400; 2026-02-21T09:19:14.1204462Z or.b32 %r45, %r19, 416; 2026-02-21T09:19:14.1204628Z or.b32 %r46, %r2956, 432; 2026-02-21T09:19:14.1204825Z or.b32 %r47, %r19, 448; 2026-02-21T09:19:14.1205004Z or.b32 %r48, %r19, 464; 2026-02-21T09:19:14.1205179Z 
or.b32 %r49, %r19, 480; 2026-02-21T09:19:14.1205358Z or.b32 %r50, %r2956, 496; 2026-02-21T09:19:14.1205816Z .loc 1 44 48 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:44:48 2026-02-21T09:19:14.1206223Z and.b32 %r51, %r4, 224; 2026-02-21T09:19:14.1206407Z bfe.u32 %r52, %r4, 5, 3; 2026-02-21T09:19:14.1206931Z .loc 1 50 38 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:50:38 2026-02-21T09:19:14.1207300Z and.b32 %r53, %r4, 3; 2026-02-21T09:19:14.1207475Z shl.b32 %r54, %r53, 2; 2026-02-21T09:19:14.1207799Z .loc 1 68 38 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:68:38 2026-02-21T09:19:14.1208160Z and.b32 %r55, %r4, 128; 2026-02-21T09:19:14.1208500Z .loc 1 22 121 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:22:121 2026-02-21T09:19:14.1208893Z setp.ge.s32 %p1, %r22988, %r24029; 2026-02-21T09:19:14.1209133Z and.b32 %r22965, %r8, 1912; 2026-02-21T09:19:14.1209323Z bfe.s32 %r22966, %r4, 4, 1; 2026-02-21T09:19:14.1209535Z mov.b32 %r22967, global_smem; 2026-02-21T09:19:14.1209738Z shl.b32 %r22968, %r51, 8; 2026-02-21T09:19:14.1209921Z and.b32 %r22969, %r6, 1020; 2026-02-21T09:19:14.1210103Z shl.b32 %r22970, %r51, 4; 2026-02-21T09:19:14.1210272Z and.b32 %r22971, %r8, 96; 2026-02-21T09:19:14.1210586Z shl.b32 %r22972, %r53, 1; 2026-02-21T09:19:14.1210758Z and.b32 %r22973, %r4, 127; 2026-02-21T09:19:14.1210942Z or.b32 %r22974, %r4, 896; 2026-02-21T09:19:14.1211112Z and.b32 %r22975, %r8, 48; 2026-02-21T09:19:14.1211287Z shr.u32 %r22976, %r55, 5; 2026-02-21T09:19:14.1211460Z shl.b32 %r22977, %r53, 13; 2026-02-21T09:19:14.1211642Z shl.b32 %r22978, %r4, 5; 2026-02-21T09:19:14.1211820Z and.b32 %r22979, %r4, 24; 2026-02-21T09:19:14.1211990Z and.b32 %r22980, %r6, 16; 2026-02-21T09:19:14.1212164Z shl.b32 %r22981, %r53, 5; 2026-02-21T09:19:14.1212334Z shl.b32 %r22982, %r10, 2; 2026-02-21T09:19:14.1212511Z shl.b32 %r22983, %r52, 13; 2026-02-21T09:19:14.1212683Z shl.b32 %r22984, %r11, 10; 2026-02-21T09:19:14.1212890Z setp.eq.b32 %p110, 
%r55, 0; 2026-02-21T09:19:14.1213176Z cvt.u64.u32 %rd1113, %r54; 2026-02-21T09:19:14.1213368Z @%p1 bra $L__BB0_11; 2026-02-21T09:19:14.1213566Z // %bb.1: // %.lr.ph 2026-02-21T09:19:14.1213954Z .loc 1 0 121 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:0:121 2026-02-21T09:19:14.1214337Z and.b32 %r2959, %r22966, 136; 2026-02-21T09:19:14.1214526Z xor.b32 %r56, %r2959, %r22965; 2026-02-21T09:19:14.1214723Z add.s32 %r57, %r22967, %r56; 2026-02-21T09:19:14.1214911Z add.s32 %r58, %r57, 2048; 2026-02-21T09:19:14.1215092Z add.s32 %r59, %r57, 4096; 2026-02-21T09:19:14.1215261Z add.s32 %r60, %r57, 6144; 2026-02-21T09:19:14.1215433Z add.s32 %r61, %r57, 8192; 2026-02-21T09:19:14.1215608Z add.s32 %r62, %r57, 10240; 2026-02-21T09:19:14.1215784Z add.s32 %r63, %r57, 12288; 2026-02-21T09:19:14.1215963Z add.s32 %r64, %r57, 14336; 2026-02-21T09:19:14.1216141Z add.s32 %r2962, %r22967, %r22969; 2026-02-21T09:19:14.1216342Z add.s32 %r66, %r2962, 172032; 2026-02-21T09:19:14.1216827Z add.s32 %r67, %r57, 81920; 2026-02-21T09:19:14.1217016Z add.s32 %r68, %r57, 83968; 2026-02-21T09:19:14.1217205Z add.s32 %r69, %r57, 86016; 2026-02-21T09:19:14.1217385Z add.s32 %r70, %r57, 88064; 2026-02-21T09:19:14.1217559Z add.s32 %r71, %r57, 90112; 2026-02-21T09:19:14.1217740Z add.s32 %r72, %r57, 92160; 2026-02-21T09:19:14.1217918Z add.s32 %r73, %r57, 94208; 2026-02-21T09:19:14.1218094Z add.s32 %r74, %r57, 96256; 2026-02-21T09:19:14.1218265Z or.b32 %r75, %r22968, 65536; 2026-02-21T09:19:14.1218450Z add.s32 %r76, %r2962, 177152; 2026-02-21T09:19:14.1218627Z add.s32 %r77, %r57, 16384; 2026-02-21T09:19:14.1218802Z add.s32 %r78, %r57, 18432; 2026-02-21T09:19:14.1218970Z add.s32 %r79, %r57, 20480; 2026-02-21T09:19:14.1219150Z add.s32 %r80, %r57, 22528; 2026-02-21T09:19:14.1219324Z add.s32 %r81, %r57, 24576; 2026-02-21T09:19:14.1219492Z add.s32 %r82, %r57, 26624; 2026-02-21T09:19:14.1219669Z add.s32 %r83, %r57, 28672; 2026-02-21T09:19:14.1219843Z add.s32 %r84, %r57, 30720; 
2026-02-21T09:19:14.1220109Z or.b32 %r85, %r22968, 131072; 2026-02-21T09:19:14.1220291Z add.s32 %r86, %r2962, 173056; 2026-02-21T09:19:14.1220474Z add.s32 %r87, %r57, 98304; 2026-02-21T09:19:14.1220650Z add.s32 %r88, %r57, 100352; 2026-02-21T09:19:14.1220838Z add.s32 %r89, %r57, 102400; 2026-02-21T09:19:14.1221031Z add.s32 %r90, %r57, 104448; 2026-02-21T09:19:14.1221208Z add.s32 %r91, %r57, 106496; 2026-02-21T09:19:14.1221388Z add.s32 %r92, %r57, 108544; 2026-02-21T09:19:14.1221563Z add.s32 %r93, %r57, 110592; 2026-02-21T09:19:14.1221745Z add.s32 %r94, %r57, 112640; 2026-02-21T09:19:14.1221920Z or.b32 %r95, %r22968, 196608; 2026-02-21T09:19:14.1222102Z add.s32 %r96, %r2962, 178176; 2026-02-21T09:19:14.1222277Z add.s32 %r97, %r57, 32768; 2026-02-21T09:19:14.1222455Z add.s32 %r98, %r57, 34816; 2026-02-21T09:19:14.1222624Z add.s32 %r99, %r57, 36864; 2026-02-21T09:19:14.1222803Z add.s32 %r100, %r57, 38912; 2026-02-21T09:19:14.1222979Z add.s32 %r101, %r57, 40960; 2026-02-21T09:19:14.1223153Z add.s32 %r102, %r57, 43008; 2026-02-21T09:19:14.1223330Z add.s32 %r103, %r57, 45056; 2026-02-21T09:19:14.1223499Z add.s32 %r104, %r57, 47104; 2026-02-21T09:19:14.1223695Z or.b32 %r105, %r22968, 262144; 2026-02-21T09:19:14.1223977Z add.s32 %r106, %r2962, 174080; 2026-02-21T09:19:14.1224165Z add.s32 %r107, %r57, 114688; 2026-02-21T09:19:14.1224342Z add.s32 %r108, %r57, 116736; 2026-02-21T09:19:14.1224520Z add.s32 %r109, %r57, 118784; 2026-02-21T09:19:14.1224692Z add.s32 %r110, %r57, 120832; 2026-02-21T09:19:14.1224868Z add.s32 %r111, %r57, 122880; 2026-02-21T09:19:14.1225043Z add.s32 %r112, %r57, 124928; 2026-02-21T09:19:14.1225213Z add.s32 %r113, %r57, 126976; 2026-02-21T09:19:14.1225399Z add.s32 %r114, %r57, 129024; 2026-02-21T09:19:14.1225579Z or.b32 %r115, %r22968, 327680; 2026-02-21T09:19:14.1225785Z add.s32 %r116, %r2962, 179200; 2026-02-21T09:19:14.1225971Z add.s32 %r117, %r57, 49152; 2026-02-21T09:19:14.1226155Z add.s32 %r118, %r57, 51200; 2026-02-21T09:19:14.1226417Z add.s32 
%r119, %r57, 53248; 2026-02-21T09:19:14.1226728Z add.s32 %r120, %r57, 55296; 2026-02-21T09:19:14.1226911Z add.s32 %r121, %r57, 57344; 2026-02-21T09:19:14.1227084Z add.s32 %r122, %r57, 59392; 2026-02-21T09:19:14.1227264Z add.s32 %r123, %r57, 61440; 2026-02-21T09:19:14.1227441Z add.s32 %r124, %r57, 63488; 2026-02-21T09:19:14.1227614Z or.b32 %r125, %r22968, 393216; 2026-02-21T09:19:14.1227801Z add.s32 %r126, %r2962, 175104; 2026-02-21T09:19:14.1227981Z add.s32 %r127, %r57, 131072; 2026-02-21T09:19:14.1228163Z add.s32 %r128, %r57, 133120; 2026-02-21T09:19:14.1228336Z add.s32 %r129, %r57, 135168; 2026-02-21T09:19:14.1228617Z add.s32 %r130, %r57, 137216; 2026-02-21T09:19:14.1228802Z add.s32 %r131, %r57, 139264; 2026-02-21T09:19:14.1228983Z add.s32 %r132, %r57, 141312; 2026-02-21T09:19:14.1229166Z add.s32 %r133, %r57, 143360; 2026-02-21T09:19:14.1229343Z add.s32 %r134, %r57, 145408; 2026-02-21T09:19:14.1229519Z or.b32 %r135, %r22968, 458752; 2026-02-21T09:19:14.1229795Z add.s32 %r136, %r2962, 180224; 2026-02-21T09:19:14.1229980Z add.s32 %r137, %r57, 65536; 2026-02-21T09:19:14.1230150Z add.s32 %r138, %r57, 67584; 2026-02-21T09:19:14.1230326Z add.s32 %r139, %r57, 69632; 2026-02-21T09:19:14.1230503Z add.s32 %r140, %r57, 71680; 2026-02-21T09:19:14.1230678Z add.s32 %r141, %r57, 73728; 2026-02-21T09:19:14.1230846Z add.s32 %r142, %r57, 75776; 2026-02-21T09:19:14.1231025Z add.s32 %r143, %r57, 77824; 2026-02-21T09:19:14.1231198Z add.s32 %r144, %r57, 79872; 2026-02-21T09:19:14.1231371Z or.b32 %r145, %r22968, 524288; 2026-02-21T09:19:14.1231561Z add.s32 %r146, %r2962, 176128; 2026-02-21T09:19:14.1231740Z add.s32 %r147, %r57, 147456; 2026-02-21T09:19:14.1231920Z add.s32 %r148, %r57, 149504; 2026-02-21T09:19:14.1232091Z add.s32 %r149, %r57, 151552; 2026-02-21T09:19:14.1232267Z add.s32 %r150, %r57, 153600; 2026-02-21T09:19:14.1232437Z add.s32 %r151, %r57, 155648; 2026-02-21T09:19:14.1232615Z add.s32 %r152, %r57, 157696; 2026-02-21T09:19:14.1232791Z add.s32 %r153, %r57, 159744; 
2026-02-21T09:19:14.1233054Z add.s32 %r154, %r57, 161792; 2026-02-21T09:19:14.1233236Z or.b32 %r155, %r22968, 589824; 2026-02-21T09:19:14.1233413Z add.s32 %r156, %r2962, 181248; 2026-02-21T09:19:14.1233605Z or.b32 %r2966, %r22970, %r22971; 2026-02-21T09:19:14.1233801Z or.b32 %r2967, %r2966, %r22972; 2026-02-21T09:19:14.1233994Z or.b32 %r157, %r2967, %r2959; 2026-02-21T09:19:14.1234173Z xor.b32 %r158, %r157, 8; 2026-02-21T09:19:14.1234349Z shl.b32 %r2968, %r22973, 6; 2026-02-21T09:19:14.1234522Z or.b32 %r2971, %r2968, %r22976; 2026-02-21T09:19:14.1234714Z or.b32 %r2972, %r2971, %r22975; 2026-02-21T09:19:14.1234907Z add.s32 %r2973, %r22967, 163840; 2026-02-21T09:19:14.1235096Z add.s32 %r161, %r2973, %r2972; 2026-02-21T09:19:14.1235284Z xor.b32 %r2974, %r2972, 16; 2026-02-21T09:19:14.1235459Z add.s32 %r162, %r2973, %r2974; 2026-02-21T09:19:14.1235646Z xor.b32 %r2975, %r2972, 32; 2026-02-21T09:19:14.1235819Z add.s32 %r163, %r2973, %r2975; 2026-02-21T09:19:14.1236014Z xor.b32 %r2976, %r2972, 48; 2026-02-21T09:19:14.1236189Z add.s32 %r164, %r2973, %r2976; 2026-02-21T09:19:14.1236376Z bfe.u32 %r2977, %r2973, 4, 14; 2026-02-21T09:19:14.1236799Z cvt.u64.u32 %rd67, %r2977; 2026-02-21T09:19:14.1237006Z or.b64 %rd1, %rd67, -9223371899382267904; 2026-02-21T09:19:14.1237234Z add.s32 %r2978, %r22967, 163872; 2026-02-21T09:19:14.1237421Z bfe.u32 %r2979, %r2978, 4, 14; 2026-02-21T09:19:14.1237614Z cvt.u64.u32 %rd68, %r2979; 2026-02-21T09:19:14.1237804Z or.b64 %rd2, %rd68, -9223371899382267904; 2026-02-21T09:19:14.1238021Z and.b32 %r2982, %r22978, 7264; 2026-02-21T09:19:14.1238197Z shl.b32 %r2984, %r22979, 4; 2026-02-21T09:19:14.1238375Z or.b32 %r2986, %r22977, %r22980; 2026-02-21T09:19:14.1238555Z or.b32 %r2987, %r2982, %r2984; 2026-02-21T09:19:14.1238732Z or.b32 %r2988, %r2986, %r2987; 2026-02-21T09:19:14.1238909Z add.s32 %r165, %r22967, %r2988; 2026-02-21T09:19:14.1239086Z xor.b32 %r2989, %r2988, 32; 2026-02-21T09:19:14.1239348Z add.s32 %r166, %r22967, %r2989; 
2026-02-21T09:19:14.1239531Z xor.b32 %r2990, %r2988, 64; 2026-02-21T09:19:14.1239719Z add.s32 %r167, %r22967, %r2990; 2026-02-21T09:19:14.1239897Z xor.b32 %r2991, %r2988, 96; 2026-02-21T09:19:14.1240070Z add.s32 %r168, %r22967, %r2991; 2026-02-21T09:19:14.1240246Z shl.b32 %r2992, %r22979, 10; 2026-02-21T09:19:14.1240421Z or.b32 %r2995, %r2992, %r22981; 2026-02-21T09:19:14.1240599Z xor.b32 %r2996, %r2995, %r22982; 2026-02-21T09:19:14.1240786Z add.s32 %r6479, %r22967, %r2996; 2026-02-21T09:19:14.1240980Z add.s32 %r6484, %r6479, 1024; 2026-02-21T09:19:14.1241158Z add.s32 %r6489, %r6479, 2048; 2026-02-21T09:19:14.1241333Z add.s32 %r6494, %r6479, 3072; 2026-02-21T09:19:14.1241502Z add.s32 %r6499, %r6479, 4096; 2026-02-21T09:19:14.1241674Z add.s32 %r6504, %r6479, 5120; 2026-02-21T09:19:14.1241843Z add.s32 %r6509, %r6479, 6144; 2026-02-21T09:19:14.1242016Z add.s32 %r6514, %r6479, 7168; 2026-02-21T09:19:14.1242351Z .loc 1 43 78 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:43:78 2026-02-21T09:19:14.1242803Z or.b32 %r2998, %r22983, %r7; 2026-02-21T09:19:14.1242984Z or.b32 %r177, %r2998, 720896; 2026-02-21T09:19:14.1243317Z .loc 1 22 121 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:22:121 2026-02-21T09:19:14.1243691Z mad.wide.u32 %rd3, %r53, 8, %rd64; 2026-02-21T09:19:14.1244030Z .loc 1 43 78 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:43:78 2026-02-21T09:19:14.1244396Z or.b32 %r3000, %r22984, %r54; 2026-02-21T09:19:14.1244572Z or.b32 %r185, %r3000, 458928; 2026-02-21T09:19:14.1244906Z .loc 1 22 121 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:22:121 2026-02-21T09:19:14.1245270Z add.s32 %r22987, %r22988, 1; 2026-02-21T09:19:14.1245454Z add.s32 %r22986, %r22988, 2; 2026-02-21T09:19:14.1245634Z add.s32 %r22985, %r22988, 3; 2026-02-21T09:19:14.1245859Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:19:14.1246227Z // Child Loop BB0_3 Depth 2 2026-02-21T09:19:14.1246626Z // Child Loop BB0_5 Depth 2 
2026-02-21T09:19:14.1246907Z // Child Loop BB0_7 Depth 2 2026-02-21T09:19:14.1247172Z // Child Loop BB0_9 Depth 2 2026-02-21T09:19:14.1247543Z .loc 1 29 33 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:29:33 2026-02-21T09:19:14.1247905Z shr.u32 %r3184, %r22985, 5; 2026-02-21T09:19:14.1248090Z and.b32 %r324, %r3184, 67108856; 2026-02-21T09:19:14.1248286Z shr.u32 %r3185, %r22986, 5; 2026-02-21T09:19:14.1248468Z and.b32 %r325, %r3185, 67108856; 2026-02-21T09:19:14.1248666Z shr.u32 %r3186, %r22987, 5; 2026-02-21T09:19:14.1248842Z and.b32 %r326, %r3186, 67108856; 2026-02-21T09:19:14.1249038Z shr.u32 %r3187, %r22988, 5; 2026-02-21T09:19:14.1249223Z and.b32 %r3188, %r3187, 33554424; 2026-02-21T09:19:14.1249415Z and.b32 %r3189, %r3187, 67108856; 2026-02-21T09:19:14.1249765Z .loc 1 30 39 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:30:39 2026-02-21T09:19:14.1250224Z sub.s32 %r3190, 64, %r3189; 2026-02-21T09:19:14.1250544Z .loc 1 30 52 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:30:52 2026-02-21T09:19:14.1250895Z min.s32 %r3191, %r3190, 8; 2026-02-21T09:19:14.1251216Z .loc 1 31 45 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:31:45 2026-02-21T09:19:14.1251570Z and.b32 %r3192, %r22988, 255; 2026-02-21T09:19:14.1251882Z .loc 1 32 51 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:32:51 2026-02-21T09:19:14.1252238Z div.s32 %r3193, %r3192, %r3191; 2026-02-21T09:19:14.1252560Z .loc 1 31 64 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:31:64 2026-02-21T09:19:14.1252996Z mul.lo.s32 %r3194, %r3193, %r3191; 2026-02-21T09:19:14.1253196Z sub.s32 %r3195, %r3192, %r3194; 2026-02-21T09:19:14.1253521Z .loc 1 31 30 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:31:30 2026-02-21T09:19:14.1253877Z add.s32 %r3196, %r3195, %r3189; 2026-02-21T09:19:14.1254197Z .loc 1 33 27 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:33:27 2026-02-21T09:19:14.1254550Z shl.b32 %r327, %r3196, 7; 
2026-02-21T09:19:14.1254881Z .loc 1 34 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:34:32 2026-02-21T09:19:14.1255246Z or.b32 %r3197, %r327, %r7; 2026-02-21T09:19:14.1255567Z .loc 1 35 27 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:35:27 2026-02-21T09:19:14.1255926Z shl.b32 %r328, %r3193, 9; 2026-02-21T09:19:14.1256243Z .loc 1 36 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:36:32 2026-02-21T09:19:14.1256795Z or.b32 %r3198, %r328, %r11; 2026-02-21T09:19:14.1256989Z or.b32 %r3199, %r328, %r12; 2026-02-21T09:19:14.1257180Z or.b32 %r3200, %r328, %r13; 2026-02-21T09:19:14.1257363Z or.b32 %r3201, %r328, %r14; 2026-02-21T09:19:14.1257536Z or.b32 %r3202, %r328, %r15; 2026-02-21T09:19:14.1257715Z or.b32 %r3203, %r328, %r16; 2026-02-21T09:19:14.1257890Z or.b32 %r3204, %r328, %r17; 2026-02-21T09:19:14.1258069Z or.b32 %r3205, %r328, %r18; 2026-02-21T09:19:14.1258401Z .loc 1 51 53 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:53 2026-02-21T09:19:14.1258764Z shl.b32 %r3206, %r3198, 10; 2026-02-21T09:19:14.1258947Z shl.b32 %r3207, %r3199, 10; 2026-02-21T09:19:14.1259128Z shl.b32 %r3208, %r3200, 10; 2026-02-21T09:19:14.1259321Z shl.b32 %r3209, %r3201, 10; 2026-02-21T09:19:14.1259494Z shl.b32 %r3210, %r3202, 10; 2026-02-21T09:19:14.1259673Z shl.b32 %r3211, %r3203, 10; 2026-02-21T09:19:14.1259846Z shl.b32 %r3212, %r3204, 10; 2026-02-21T09:19:14.1260026Z shl.b32 %r3213, %r3205, 10; 2026-02-21T09:19:14.1260424Z .loc 1 51 60 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:60 2026-02-21T09:19:14.1260785Z or.b32 %r3214, %r3206, %r54; 2026-02-21T09:19:14.1260975Z or.b32 %r3215, %r3207, %r54; 2026-02-21T09:19:14.1261160Z or.b32 %r3216, %r3208, %r54; 2026-02-21T09:19:14.1261345Z or.b32 %r3217, %r3209, %r54; 2026-02-21T09:19:14.1261517Z or.b32 %r3218, %r3210, %r54; 2026-02-21T09:19:14.1261697Z or.b32 %r3219, %r3211, %r54; 2026-02-21T09:19:14.1261873Z or.b32 %r3220, %r3212, %r54; 2026-02-21T09:19:14.1262052Z or.b32 
%r3221, %r3213, %r54; 2026-02-21T09:19:14.1262372Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1262734Z mad.wide.s32 %rd69, %r3214, 2, %rd64; 2026-02-21T09:19:14.1262959Z mad.wide.s32 %rd70, %r3215, 2, %rd64; 2026-02-21T09:19:14.1263163Z mad.wide.s32 %rd71, %r3216, 2, %rd64; 2026-02-21T09:19:14.1263376Z mad.wide.s32 %rd72, %r3217, 2, %rd64; 2026-02-21T09:19:14.1263581Z mad.wide.s32 %rd73, %r3218, 2, %rd64; 2026-02-21T09:19:14.1263789Z mad.wide.s32 %rd74, %r3219, 2, %rd64; 2026-02-21T09:19:14.1263993Z mad.wide.s32 %rd75, %r3220, 2, %rd64; 2026-02-21T09:19:14.1264285Z mad.wide.s32 %rd76, %r3221, 2, %rd64; 2026-02-21T09:19:14.1264631Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1264979Z bar.sync 0; 2026-02-21T09:19:14.1265130Z mov.b32 %r3002, 8; 2026-02-21T09:19:14.1265291Z // begin inline asm 2026-02-21T09:19:14.1265535Z cp.async.ca.shared.global [ %r57 + 0 ], [ %rd69 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1265814Z // end inline asm 2026-02-21T09:19:14.1265988Z // begin inline asm 2026-02-21T09:19:14.1266218Z cp.async.ca.shared.global [ %r58 + 0 ], [ %rd70 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1266604Z // end inline asm 2026-02-21T09:19:14.1266768Z // begin inline asm 2026-02-21T09:19:14.1267078Z cp.async.ca.shared.global [ %r59 + 0 ], [ %rd71 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1267352Z // end inline asm 2026-02-21T09:19:14.1267497Z // begin inline asm 2026-02-21T09:19:14.1267719Z cp.async.ca.shared.global [ %r60 + 0 ], [ %rd72 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1267985Z // end inline asm 2026-02-21T09:19:14.1268149Z // begin inline asm 2026-02-21T09:19:14.1268376Z cp.async.ca.shared.global [ %r61 + 0 ], [ %rd73 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1268743Z // end inline asm 2026-02-21T09:19:14.1268910Z // begin inline asm 2026-02-21T09:19:14.1269138Z cp.async.ca.shared.global [ %r62 + 0 ], [ %rd74 + 0 ], 0x8, %r3002; 
2026-02-21T09:19:14.1269416Z // end inline asm 2026-02-21T09:19:14.1269570Z // begin inline asm 2026-02-21T09:19:14.1269803Z cp.async.ca.shared.global [ %r63 + 0 ], [ %rd75 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1270069Z // end inline asm 2026-02-21T09:19:14.1270229Z // begin inline asm 2026-02-21T09:19:14.1270455Z cp.async.ca.shared.global [ %r64 + 0 ], [ %rd76 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1270818Z // end inline asm 2026-02-21T09:19:14.1270986Z cp.async.commit_group; 2026-02-21T09:19:14.1271307Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1271679Z add.s32 %r3222, %r3197, %r22968; 2026-02-21T09:19:14.1272012Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1272375Z cvt.s64.s32 %rd160, %r3222; 2026-02-21T09:19:14.1272563Z add.s64 %rd77, %rd65, %rd160; 2026-02-21T09:19:14.1272753Z mov.b32 %r22992, 4; 2026-02-21T09:19:14.1273051Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1273415Z // begin inline asm 2026-02-21T09:19:14.1273671Z cp.async.ca.shared.global [ %r66 + 0 ], [ %rd77 + 0 ], 0x4, %r22992; 2026-02-21T09:19:14.1273954Z // end inline asm 2026-02-21T09:19:14.1274121Z cp.async.commit_group; 2026-02-21T09:19:14.1274511Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1274881Z cvt.s64.s32 %rd161, %r3206; 2026-02-21T09:19:14.1275071Z or.b64 %rd162, %rd161, %rd1113; 2026-02-21T09:19:14.1275275Z shl.b64 %rd163, %rd162, 1; 2026-02-21T09:19:14.1275470Z add.s64 %rd164, %rd64, %rd163; 2026-02-21T09:19:14.1275659Z add.s64 %rd78, %rd164, 32; 2026-02-21T09:19:14.1275847Z cvt.s64.s32 %rd165, %r3207; 2026-02-21T09:19:14.1276030Z or.b64 %rd166, %rd165, %rd1113; 2026-02-21T09:19:14.1276226Z shl.b64 %rd167, %rd166, 1; 2026-02-21T09:19:14.1276405Z add.s64 %rd168, %rd64, %rd167; 2026-02-21T09:19:14.1276711Z add.s64 %rd79, %rd168, 32; 2026-02-21T09:19:14.1276886Z 
cvt.s64.s32 %rd169, %r3208; 2026-02-21T09:19:14.1277073Z or.b64 %rd170, %rd169, %rd1113; 2026-02-21T09:19:14.1277260Z shl.b64 %rd171, %rd170, 1; 2026-02-21T09:19:14.1277448Z add.s64 %rd172, %rd64, %rd171; 2026-02-21T09:19:14.1277646Z add.s64 %rd80, %rd172, 32; 2026-02-21T09:19:14.1277830Z cvt.s64.s32 %rd173, %r3209; 2026-02-21T09:19:14.1278014Z or.b64 %rd174, %rd173, %rd1113; 2026-02-21T09:19:14.1278196Z shl.b64 %rd175, %rd174, 1; 2026-02-21T09:19:14.1278377Z add.s64 %rd176, %rd64, %rd175; 2026-02-21T09:19:14.1278654Z add.s64 %rd81, %rd176, 32; 2026-02-21T09:19:14.1278833Z cvt.s64.s32 %rd177, %r3210; 2026-02-21T09:19:14.1279012Z or.b64 %rd178, %rd177, %rd1113; 2026-02-21T09:19:14.1279202Z shl.b64 %rd179, %rd178, 1; 2026-02-21T09:19:14.1279386Z add.s64 %rd180, %rd64, %rd179; 2026-02-21T09:19:14.1279573Z add.s64 %rd82, %rd180, 32; 2026-02-21T09:19:14.1279752Z cvt.s64.s32 %rd181, %r3211; 2026-02-21T09:19:14.1279929Z or.b64 %rd182, %rd181, %rd1113; 2026-02-21T09:19:14.1280120Z shl.b64 %rd183, %rd182, 1; 2026-02-21T09:19:14.1280294Z add.s64 %rd184, %rd64, %rd183; 2026-02-21T09:19:14.1280486Z add.s64 %rd83, %rd184, 32; 2026-02-21T09:19:14.1280674Z cvt.s64.s32 %rd185, %r3212; 2026-02-21T09:19:14.1280863Z or.b64 %rd186, %rd185, %rd1113; 2026-02-21T09:19:14.1281127Z shl.b64 %rd187, %rd186, 1; 2026-02-21T09:19:14.1281315Z add.s64 %rd188, %rd64, %rd187; 2026-02-21T09:19:14.1281508Z add.s64 %rd84, %rd188, 32; 2026-02-21T09:19:14.1281681Z cvt.s64.s32 %rd189, %r3213; 2026-02-21T09:19:14.1281867Z or.b64 %rd190, %rd189, %rd1113; 2026-02-21T09:19:14.1282055Z shl.b64 %rd191, %rd190, 1; 2026-02-21T09:19:14.1282237Z add.s64 %rd192, %rd64, %rd191; 2026-02-21T09:19:14.1282417Z add.s64 %rd85, %rd192, 32; 2026-02-21T09:19:14.1282738Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1283095Z // begin inline asm 2026-02-21T09:19:14.1283337Z cp.async.ca.shared.global [ %r67 + 0 ], [ %rd78 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1283615Z // end 
inline asm 2026-02-21T09:19:14.1283767Z // begin inline asm 2026-02-21T09:19:14.1284001Z cp.async.ca.shared.global [ %r68 + 0 ], [ %rd79 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1284269Z // end inline asm 2026-02-21T09:19:14.1284508Z // begin inline asm 2026-02-21T09:19:14.1284747Z cp.async.ca.shared.global [ %r69 + 0 ], [ %rd80 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1285027Z // end inline asm 2026-02-21T09:19:14.1285183Z // begin inline asm 2026-02-21T09:19:14.1285417Z cp.async.ca.shared.global [ %r70 + 0 ], [ %rd81 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1285772Z // end inline asm 2026-02-21T09:19:14.1285939Z // begin inline asm 2026-02-21T09:19:14.1286174Z cp.async.ca.shared.global [ %r71 + 0 ], [ %rd82 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1286542Z // end inline asm 2026-02-21T09:19:14.1286705Z // begin inline asm 2026-02-21T09:19:14.1286929Z cp.async.ca.shared.global [ %r72 + 0 ], [ %rd83 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1287214Z // end inline asm 2026-02-21T09:19:14.1287367Z // begin inline asm 2026-02-21T09:19:14.1287598Z cp.async.ca.shared.global [ %r73 + 0 ], [ %rd84 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1287865Z // end inline asm 2026-02-21T09:19:14.1288017Z // begin inline asm 2026-02-21T09:19:14.1288249Z cp.async.ca.shared.global [ %r74 + 0 ], [ %rd85 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1288603Z // end inline asm 2026-02-21T09:19:14.1288772Z cp.async.commit_group; 2026-02-21T09:19:14.1289095Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1289462Z add.s32 %r3223, %r3197, %r75; 2026-02-21T09:19:14.1289785Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1290146Z cvt.s64.s32 %rd193, %r3223; 2026-02-21T09:19:14.1290336Z add.s64 %rd86, %rd65, %rd193; 2026-02-21T09:19:14.1290674Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1291035Z // begin inline asm 2026-02-21T09:19:14.1291274Z 
cp.async.ca.shared.global [ %r76 + 0 ], [ %rd86 + 0 ], 0x4, %r22992; 2026-02-21T09:19:14.1291559Z // end inline asm 2026-02-21T09:19:14.1291718Z cp.async.commit_group; 2026-02-21T09:19:14.1292043Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1292405Z add.s64 %rd87, %rd164, 64; 2026-02-21T09:19:14.1292597Z add.s64 %rd88, %rd168, 64; 2026-02-21T09:19:14.1292859Z add.s64 %rd89, %rd172, 64; 2026-02-21T09:19:14.1293035Z add.s64 %rd90, %rd176, 64; 2026-02-21T09:19:14.1293215Z add.s64 %rd91, %rd180, 64; 2026-02-21T09:19:14.1293387Z add.s64 %rd92, %rd184, 64; 2026-02-21T09:19:14.1293568Z add.s64 %rd93, %rd188, 64; 2026-02-21T09:19:14.1293742Z add.s64 %rd94, %rd192, 64; 2026-02-21T09:19:14.1294062Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1294413Z bar.sync 0; 2026-02-21T09:19:14.1294569Z // begin inline asm 2026-02-21T09:19:14.1294809Z cp.async.ca.shared.global [ %r77 + 0 ], [ %rd87 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1295083Z // end inline asm 2026-02-21T09:19:14.1295244Z // begin inline asm 2026-02-21T09:19:14.1295553Z cp.async.ca.shared.global [ %r78 + 0 ], [ %rd88 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1295847Z // end inline asm 2026-02-21T09:19:14.1296002Z // begin inline asm 2026-02-21T09:19:14.1296233Z cp.async.ca.shared.global [ %r79 + 0 ], [ %rd89 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1296622Z // end inline asm 2026-02-21T09:19:14.1296784Z // begin inline asm 2026-02-21T09:19:14.1297007Z cp.async.ca.shared.global [ %r80 + 0 ], [ %rd90 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1297281Z // end inline asm 2026-02-21T09:19:14.1297438Z // begin inline asm 2026-02-21T09:19:14.1297660Z cp.async.ca.shared.global [ %r81 + 0 ], [ %rd91 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1297937Z // end inline asm 2026-02-21T09:19:14.1298090Z // begin inline asm 2026-02-21T09:19:14.1298322Z cp.async.ca.shared.global [ %r82 + 0 ], [ %rd92 + 0 ], 0x8, %r3002; 
2026-02-21T09:19:14.1298604Z // end inline asm 2026-02-21T09:19:14.1298759Z // begin inline asm 2026-02-21T09:19:14.1298982Z cp.async.ca.shared.global [ %r83 + 0 ], [ %rd93 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1299346Z // end inline asm 2026-02-21T09:19:14.1299505Z // begin inline asm 2026-02-21T09:19:14.1299724Z cp.async.ca.shared.global [ %r84 + 0 ], [ %rd94 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1300001Z // end inline asm 2026-02-21T09:19:14.1300155Z cp.async.commit_group; 2026-02-21T09:19:14.1300474Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1300830Z add.s32 %r3224, %r3197, %r85; 2026-02-21T09:19:14.1301159Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1301521Z cvt.s64.s32 %rd194, %r3224; 2026-02-21T09:19:14.1301706Z add.s64 %rd95, %rd65, %rd194; 2026-02-21T09:19:14.1302029Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1302375Z // begin inline asm 2026-02-21T09:19:14.1302616Z cp.async.ca.shared.global [ %r86 + 0 ], [ %rd95 + 0 ], 0x4, %r22992; 2026-02-21T09:19:14.1302892Z // end inline asm 2026-02-21T09:19:14.1303146Z cp.async.commit_group; 2026-02-21T09:19:14.1303477Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1303841Z add.s64 %rd96, %rd164, 96; 2026-02-21T09:19:14.1304028Z add.s64 %rd97, %rd168, 96; 2026-02-21T09:19:14.1304203Z add.s64 %rd98, %rd172, 96; 2026-02-21T09:19:14.1304380Z add.s64 %rd99, %rd176, 96; 2026-02-21T09:19:14.1304555Z add.s64 %rd100, %rd180, 96; 2026-02-21T09:19:14.1304738Z add.s64 %rd101, %rd184, 96; 2026-02-21T09:19:14.1304917Z add.s64 %rd102, %rd188, 96; 2026-02-21T09:19:14.1305099Z add.s64 %rd103, %rd192, 96; 2026-02-21T09:19:14.1305413Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1305774Z // begin inline asm 2026-02-21T09:19:14.1306008Z cp.async.ca.shared.global [ 
%r87 + 0 ], [ %rd96 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1306279Z // end inline asm 2026-02-21T09:19:14.1306437Z // begin inline asm 2026-02-21T09:19:14.1306807Z cp.async.ca.shared.global [ %r88 + 0 ], [ %rd97 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1307084Z // end inline asm 2026-02-21T09:19:14.1307317Z // begin inline asm 2026-02-21T09:19:14.1307546Z cp.async.ca.shared.global [ %r89 + 0 ], [ %rd98 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1307817Z // end inline asm 2026-02-21T09:19:14.1307975Z // begin inline asm 2026-02-21T09:19:14.1308202Z cp.async.ca.shared.global [ %r90 + 0 ], [ %rd99 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1308467Z // end inline asm 2026-02-21T09:19:14.1308885Z // begin inline asm 2026-02-21T09:19:14.1309112Z cp.async.ca.shared.global [ %r91 + 0 ], [ %rd100 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1309397Z // end inline asm 2026-02-21T09:19:14.1309548Z // begin inline asm 2026-02-21T09:19:14.1309781Z cp.async.ca.shared.global [ %r92 + 0 ], [ %rd101 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1310057Z // end inline asm 2026-02-21T09:19:14.1310216Z // begin inline asm 2026-02-21T09:19:14.1310523Z cp.async.ca.shared.global [ %r93 + 0 ], [ %rd102 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1310808Z // end inline asm 2026-02-21T09:19:14.1310964Z // begin inline asm 2026-02-21T09:19:14.1311191Z cp.async.ca.shared.global [ %r94 + 0 ], [ %rd103 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1311468Z // end inline asm 2026-02-21T09:19:14.1311626Z cp.async.commit_group; 2026-02-21T09:19:14.1311948Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1312307Z add.s32 %r3225, %r3197, %r95; 2026-02-21T09:19:14.1312650Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1313010Z cvt.s64.s32 %rd195, %r3225; 2026-02-21T09:19:14.1313195Z add.s64 %rd104, %rd65, %rd195; 2026-02-21T09:19:14.1313522Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 
2026-02-21T09:19:14.1313952Z // begin inline asm 2026-02-21T09:19:14.1314193Z cp.async.ca.shared.global [ %r96 + 0 ], [ %rd104 + 0 ], 0x4, %r22992; 2026-02-21T09:19:14.1314468Z // end inline asm 2026-02-21T09:19:14.1314629Z cp.async.commit_group; 2026-02-21T09:19:14.1314934Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1315296Z add.s64 %rd105, %rd164, 128; 2026-02-21T09:19:14.1315497Z add.s64 %rd106, %rd168, 128; 2026-02-21T09:19:14.1315679Z add.s64 %rd107, %rd172, 128; 2026-02-21T09:19:14.1315859Z add.s64 %rd108, %rd176, 128; 2026-02-21T09:19:14.1316032Z add.s64 %rd109, %rd180, 128; 2026-02-21T09:19:14.1316211Z add.s64 %rd110, %rd184, 128; 2026-02-21T09:19:14.1316386Z add.s64 %rd111, %rd188, 128; 2026-02-21T09:19:14.1316690Z add.s64 %rd112, %rd192, 128; 2026-02-21T09:19:14.1317005Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1317356Z bar.sync 0; 2026-02-21T09:19:14.1317511Z // begin inline asm 2026-02-21T09:19:14.1317832Z cp.async.ca.shared.global [ %r97 + 0 ], [ %rd105 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1318117Z // end inline asm 2026-02-21T09:19:14.1318268Z // begin inline asm 2026-02-21T09:19:14.1318504Z cp.async.ca.shared.global [ %r98 + 0 ], [ %rd106 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1318778Z // end inline asm 2026-02-21T09:19:14.1318936Z // begin inline asm 2026-02-21T09:19:14.1319160Z cp.async.ca.shared.global [ %r99 + 0 ], [ %rd107 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1319438Z // end inline asm 2026-02-21T09:19:14.1319594Z // begin inline asm 2026-02-21T09:19:14.1319823Z cp.async.ca.shared.global [ %r100 + 0 ], [ %rd108 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1320106Z // end inline asm 2026-02-21T09:19:14.1320254Z // begin inline asm 2026-02-21T09:19:14.1320485Z cp.async.ca.shared.global [ %r101 + 0 ], [ %rd109 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1320760Z // end inline asm 2026-02-21T09:19:14.1320915Z // begin inline asm 
2026-02-21T09:19:14.1321148Z cp.async.ca.shared.global [ %r102 + 0 ], [ %rd110 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1321430Z // end inline asm 2026-02-21T09:19:14.1321593Z // begin inline asm 2026-02-21T09:19:14.1321903Z cp.async.ca.shared.global [ %r103 + 0 ], [ %rd111 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1322184Z // end inline asm 2026-02-21T09:19:14.1322336Z // begin inline asm 2026-02-21T09:19:14.1322588Z cp.async.ca.shared.global [ %r104 + 0 ], [ %rd112 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1322862Z // end inline asm 2026-02-21T09:19:14.1323029Z cp.async.commit_group; 2026-02-21T09:19:14.1323342Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1323709Z add.s32 %r3226, %r3197, %r105; 2026-02-21T09:19:14.1324041Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1324394Z cvt.s64.s32 %rd196, %r3226; 2026-02-21T09:19:14.1324592Z add.s64 %rd113, %rd65, %rd196; 2026-02-21T09:19:14.1324994Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1325356Z // begin inline asm 2026-02-21T09:19:14.1325596Z cp.async.ca.shared.global [ %r106 + 0 ], [ %rd113 + 0 ], 0x4, %r22992; 2026-02-21T09:19:14.1325886Z // end inline asm 2026-02-21T09:19:14.1326048Z cp.async.commit_group; 2026-02-21T09:19:14.1326356Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1326853Z add.s64 %rd114, %rd164, 160; 2026-02-21T09:19:14.1327037Z add.s64 %rd115, %rd168, 160; 2026-02-21T09:19:14.1327220Z add.s64 %rd116, %rd172, 160; 2026-02-21T09:19:14.1327394Z add.s64 %rd117, %rd176, 160; 2026-02-21T09:19:14.1327574Z add.s64 %rd118, %rd180, 160; 2026-02-21T09:19:14.1327750Z add.s64 %rd119, %rd184, 160; 2026-02-21T09:19:14.1327930Z add.s64 %rd120, %rd188, 160; 2026-02-21T09:19:14.1328110Z add.s64 %rd121, %rd192, 160; 2026-02-21T09:19:14.1328511Z .loc 1 51 80 // 
co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1328869Z // begin inline asm 2026-02-21T09:19:14.1329100Z cp.async.ca.shared.global [ %r107 + 0 ], [ %rd114 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1329381Z // end inline asm 2026-02-21T09:19:14.1329530Z // begin inline asm 2026-02-21T09:19:14.1329765Z cp.async.ca.shared.global [ %r108 + 0 ], [ %rd115 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1330044Z // end inline asm 2026-02-21T09:19:14.1330204Z // begin inline asm 2026-02-21T09:19:14.1330439Z cp.async.ca.shared.global [ %r109 + 0 ], [ %rd116 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1330713Z // end inline asm 2026-02-21T09:19:14.1330869Z // begin inline asm 2026-02-21T09:19:14.1331100Z cp.async.ca.shared.global [ %r110 + 0 ], [ %rd117 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1331379Z // end inline asm 2026-02-21T09:19:14.1331527Z // begin inline asm 2026-02-21T09:19:14.1331762Z cp.async.ca.shared.global [ %r111 + 0 ], [ %rd118 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1332114Z // end inline asm 2026-02-21T09:19:14.1332275Z // begin inline asm 2026-02-21T09:19:14.1332509Z cp.async.ca.shared.global [ %r112 + 0 ], [ %rd119 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1332785Z // end inline asm 2026-02-21T09:19:14.1332939Z // begin inline asm 2026-02-21T09:19:14.1333165Z cp.async.ca.shared.global [ %r113 + 0 ], [ %rd120 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1333443Z // end inline asm 2026-02-21T09:19:14.1333603Z // begin inline asm 2026-02-21T09:19:14.1333839Z cp.async.ca.shared.global [ %r114 + 0 ], [ %rd121 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1334114Z // end inline asm 2026-02-21T09:19:14.1334279Z cp.async.commit_group; 2026-02-21T09:19:14.1334598Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1334957Z add.s32 %r3227, %r3197, %r115; 2026-02-21T09:19:14.1335289Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1335649Z cvt.s64.s32 %rd197, 
%r3227; 2026-02-21T09:19:14.1335841Z add.s64 %rd122, %rd65, %rd197; 2026-02-21T09:19:14.1336161Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1336721Z // begin inline asm 2026-02-21T09:19:14.1336961Z cp.async.ca.shared.global [ %r116 + 0 ], [ %rd122 + 0 ], 0x4, %r22992; 2026-02-21T09:19:14.1337250Z // end inline asm 2026-02-21T09:19:14.1337415Z cp.async.commit_group; 2026-02-21T09:19:14.1337727Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1338083Z add.s64 %rd123, %rd164, 192; 2026-02-21T09:19:14.1338264Z add.s64 %rd124, %rd168, 192; 2026-02-21T09:19:14.1338450Z add.s64 %rd125, %rd172, 192; 2026-02-21T09:19:14.1338629Z add.s64 %rd126, %rd176, 192; 2026-02-21T09:19:14.1338810Z add.s64 %rd127, %rd180, 192; 2026-02-21T09:19:14.1338990Z add.s64 %rd128, %rd184, 192; 2026-02-21T09:19:14.1339251Z add.s64 %rd129, %rd188, 192; 2026-02-21T09:19:14.1339436Z add.s64 %rd130, %rd192, 192; 2026-02-21T09:19:14.1339749Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1340104Z bar.sync 0; 2026-02-21T09:19:14.1340253Z // begin inline asm 2026-02-21T09:19:14.1340489Z cp.async.ca.shared.global [ %r117 + 0 ], [ %rd123 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1340761Z // end inline asm 2026-02-21T09:19:14.1340921Z // begin inline asm 2026-02-21T09:19:14.1341149Z cp.async.ca.shared.global [ %r118 + 0 ], [ %rd124 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1341445Z // end inline asm 2026-02-21T09:19:14.1341600Z // begin inline asm 2026-02-21T09:19:14.1341825Z cp.async.ca.shared.global [ %r119 + 0 ], [ %rd125 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1342103Z // end inline asm 2026-02-21T09:19:14.1342251Z // begin inline asm 2026-02-21T09:19:14.1342484Z cp.async.ca.shared.global [ %r120 + 0 ], [ %rd126 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1342839Z // end inline asm 2026-02-21T09:19:14.1342993Z // begin inline asm 2026-02-21T09:19:14.1343219Z 
cp.async.ca.shared.global [ %r121 + 0 ], [ %rd127 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1343500Z // end inline asm 2026-02-21T09:19:14.1343654Z // begin inline asm 2026-02-21T09:19:14.1343881Z cp.async.ca.shared.global [ %r122 + 0 ], [ %rd128 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1344161Z // end inline asm 2026-02-21T09:19:14.1344311Z // begin inline asm 2026-02-21T09:19:14.1344542Z cp.async.ca.shared.global [ %r123 + 0 ], [ %rd129 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1344815Z // end inline asm 2026-02-21T09:19:14.1344985Z // begin inline asm 2026-02-21T09:19:14.1345213Z cp.async.ca.shared.global [ %r124 + 0 ], [ %rd130 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1345493Z // end inline asm 2026-02-21T09:19:14.1345656Z cp.async.commit_group; 2026-02-21T09:19:14.1345965Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1346416Z add.s32 %r3228, %r3197, %r125; 2026-02-21T09:19:14.1346876Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1347252Z cvt.s64.s32 %rd198, %r3228; 2026-02-21T09:19:14.1347439Z add.s64 %rd131, %rd65, %rd198; 2026-02-21T09:19:14.1347772Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1348134Z // begin inline asm 2026-02-21T09:19:14.1348371Z cp.async.ca.shared.global [ %r126 + 0 ], [ %rd131 + 0 ], 0x4, %r22992; 2026-02-21T09:19:14.1348777Z // end inline asm 2026-02-21T09:19:14.1348936Z cp.async.commit_group; 2026-02-21T09:19:14.1349251Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1349608Z add.s64 %rd132, %rd164, 224; 2026-02-21T09:19:14.1349796Z add.s64 %rd133, %rd168, 224; 2026-02-21T09:19:14.1349978Z add.s64 %rd134, %rd172, 224; 2026-02-21T09:19:14.1350168Z add.s64 %rd135, %rd176, 224; 2026-02-21T09:19:14.1350350Z add.s64 %rd136, %rd180, 224; 2026-02-21T09:19:14.1350526Z add.s64 %rd137, %rd184, 224; 2026-02-21T09:19:14.1350794Z add.s64 %rd138, 
%rd188, 224; 2026-02-21T09:19:14.1350968Z add.s64 %rd139, %rd192, 224; 2026-02-21T09:19:14.1351285Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1351632Z // begin inline asm 2026-02-21T09:19:14.1351866Z cp.async.ca.shared.global [ %r127 + 0 ], [ %rd132 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1352140Z // end inline asm 2026-02-21T09:19:14.1352310Z // begin inline asm 2026-02-21T09:19:14.1352546Z cp.async.ca.shared.global [ %r128 + 0 ], [ %rd133 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1352819Z // end inline asm 2026-02-21T09:19:14.1352972Z // begin inline asm 2026-02-21T09:19:14.1353197Z cp.async.ca.shared.global [ %r129 + 0 ], [ %rd134 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1353477Z // end inline asm 2026-02-21T09:19:14.1353702Z // begin inline asm 2026-02-21T09:19:14.1353955Z cp.async.ca.shared.global [ %r130 + 0 ], [ %rd135 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1354230Z // end inline asm 2026-02-21T09:19:14.1354386Z // begin inline asm 2026-02-21T09:19:14.1354618Z cp.async.ca.shared.global [ %r131 + 0 ], [ %rd136 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1354889Z // end inline asm 2026-02-21T09:19:14.1355044Z // begin inline asm 2026-02-21T09:19:14.1355272Z cp.async.ca.shared.global [ %r132 + 0 ], [ %rd137 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1355552Z // end inline asm 2026-02-21T09:19:14.1355699Z // begin inline asm 2026-02-21T09:19:14.1355929Z cp.async.ca.shared.global [ %r133 + 0 ], [ %rd138 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1356201Z // end inline asm 2026-02-21T09:19:14.1356355Z // begin inline asm 2026-02-21T09:19:14.1356715Z cp.async.ca.shared.global [ %r134 + 0 ], [ %rd139 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1356987Z // end inline asm 2026-02-21T09:19:14.1357269Z cp.async.commit_group; 2026-02-21T09:19:14.1357581Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1357945Z add.s32 %r3229, %r3197, %r135; 2026-02-21T09:19:14.1358274Z .loc 1 57 34 
// co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1358633Z cvt.s64.s32 %rd199, %r3229; 2026-02-21T09:19:14.1358816Z add.s64 %rd140, %rd65, %rd199; 2026-02-21T09:19:14.1359152Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1359507Z // begin inline asm 2026-02-21T09:19:14.1359746Z cp.async.ca.shared.global [ %r136 + 0 ], [ %rd140 + 0 ], 0x4, %r22992; 2026-02-21T09:19:14.1360035Z // end inline asm 2026-02-21T09:19:14.1360194Z cp.async.commit_group; 2026-02-21T09:19:14.1360508Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1360868Z add.s64 %rd141, %rd164, 256; 2026-02-21T09:19:14.1361137Z add.s64 %rd142, %rd168, 256; 2026-02-21T09:19:14.1361330Z add.s64 %rd143, %rd172, 256; 2026-02-21T09:19:14.1361508Z add.s64 %rd144, %rd176, 256; 2026-02-21T09:19:14.1361693Z add.s64 %rd145, %rd180, 256; 2026-02-21T09:19:14.1361871Z add.s64 %rd146, %rd184, 256; 2026-02-21T09:19:14.1362057Z add.s64 %rd147, %rd188, 256; 2026-02-21T09:19:14.1362234Z add.s64 %rd148, %rd192, 256; 2026-02-21T09:19:14.1362556Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1362899Z bar.sync 0; 2026-02-21T09:19:14.1363049Z // begin inline asm 2026-02-21T09:19:14.1363287Z cp.async.ca.shared.global [ %r137 + 0 ], [ %rd141 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1363561Z // end inline asm 2026-02-21T09:19:14.1363715Z // begin inline asm 2026-02-21T09:19:14.1363943Z cp.async.ca.shared.global [ %r138 + 0 ], [ %rd142 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1364223Z // end inline asm 2026-02-21T09:19:14.1364372Z // begin inline asm 2026-02-21T09:19:14.1364602Z cp.async.ca.shared.global [ %r139 + 0 ], [ %rd143 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1364874Z // end inline asm 2026-02-21T09:19:14.1365127Z // begin inline asm 2026-02-21T09:19:14.1365354Z cp.async.ca.shared.global [ %r140 + 0 ], [ %rd144 + 0 ], 0x8, %r3002; 
2026-02-21T09:19:14.1365635Z // end inline asm 2026-02-21T09:19:14.1365787Z // begin inline asm 2026-02-21T09:19:14.1366012Z cp.async.ca.shared.global [ %r141 + 0 ], [ %rd145 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1366291Z // end inline asm 2026-02-21T09:19:14.1366440Z // begin inline asm 2026-02-21T09:19:14.1366800Z cp.async.ca.shared.global [ %r142 + 0 ], [ %rd146 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1367072Z // end inline asm 2026-02-21T09:19:14.1367227Z // begin inline asm 2026-02-21T09:19:14.1367456Z cp.async.ca.shared.global [ %r143 + 0 ], [ %rd147 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1367736Z // end inline asm 2026-02-21T09:19:14.1367893Z // begin inline asm 2026-02-21T09:19:14.1368208Z cp.async.ca.shared.global [ %r144 + 0 ], [ %rd148 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1368511Z // end inline asm 2026-02-21T09:19:14.1368672Z cp.async.commit_group; 2026-02-21T09:19:14.1368992Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1369357Z add.s32 %r3230, %r3197, %r145; 2026-02-21T09:19:14.1369684Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1370043Z cvt.s64.s32 %rd200, %r3230; 2026-02-21T09:19:14.1370225Z add.s64 %rd149, %rd65, %rd200; 2026-02-21T09:19:14.1370551Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1370901Z // begin inline asm 2026-02-21T09:19:14.1371141Z cp.async.ca.shared.global [ %r146 + 0 ], [ %rd149 + 0 ], 0x4, %r22992; 2026-02-21T09:19:14.1371420Z // end inline asm 2026-02-21T09:19:14.1371770Z cp.async.commit_group; 2026-02-21T09:19:14.1372081Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1372554Z add.s64 %rd150, %rd164, 288; 2026-02-21T09:19:14.1372861Z add.s64 %rd151, %rd168, 288; 2026-02-21T09:19:14.1373120Z add.s64 %rd152, %rd172, 288; 2026-02-21T09:19:14.1373501Z add.s64 %rd153, %rd176, 288; 2026-02-21T09:19:14.1373800Z 
add.s64 %rd154, %rd180, 288; 2026-02-21T09:19:14.1374037Z add.s64 %rd155, %rd184, 288; 2026-02-21T09:19:14.1374368Z add.s64 %rd156, %rd188, 288; 2026-02-21T09:19:14.1374613Z add.s64 %rd157, %rd192, 288; 2026-02-21T09:19:14.1375045Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1375547Z // begin inline asm 2026-02-21T09:19:14.1375865Z cp.async.ca.shared.global [ %r147 + 0 ], [ %rd150 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1376238Z // end inline asm 2026-02-21T09:19:14.1376841Z // begin inline asm 2026-02-21T09:19:14.1377307Z cp.async.ca.shared.global [ %r148 + 0 ], [ %rd151 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1377650Z // end inline asm 2026-02-21T09:19:14.1377937Z // begin inline asm 2026-02-21T09:19:14.1378320Z cp.async.ca.shared.global [ %r149 + 0 ], [ %rd152 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1378660Z // end inline asm 2026-02-21T09:19:14.1378936Z // begin inline asm 2026-02-21T09:19:14.1379261Z cp.async.ca.shared.global [ %r150 + 0 ], [ %rd153 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1379635Z // end inline asm 2026-02-21T09:19:14.1379831Z // begin inline asm 2026-02-21T09:19:14.1380234Z cp.async.ca.shared.global [ %r151 + 0 ], [ %rd154 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1380626Z // end inline asm 2026-02-21T09:19:14.1380821Z // begin inline asm 2026-02-21T09:19:14.1381230Z cp.async.ca.shared.global [ %r152 + 0 ], [ %rd155 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1381566Z // end inline asm 2026-02-21T09:19:14.1381798Z // begin inline asm 2026-02-21T09:19:14.1382170Z cp.async.ca.shared.global [ %r153 + 0 ], [ %rd156 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1382548Z // end inline asm 2026-02-21T09:19:14.1382760Z // begin inline asm 2026-02-21T09:19:14.1383146Z cp.async.ca.shared.global [ %r154 + 0 ], [ %rd157 + 0 ], 0x8, %r3002; 2026-02-21T09:19:14.1383605Z // end inline asm 2026-02-21T09:19:14.1383845Z cp.async.commit_group; 2026-02-21T09:19:14.1384314Z .loc 1 57 62 // 
co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1384742Z add.s32 %r3231, %r3197, %r155; 2026-02-21T09:19:14.1385552Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1386002Z cvt.s64.s32 %rd201, %r3231; 2026-02-21T09:19:14.1386286Z add.s64 %rd158, %rd65, %rd201; 2026-02-21T09:19:14.1406389Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1406926Z // begin inline asm 2026-02-21T09:19:14.1407358Z cp.async.ca.shared.global [ %r156 + 0 ], [ %rd158 + 0 ], 0x4, %r22992; 2026-02-21T09:19:14.1407657Z // end inline asm 2026-02-21T09:19:14.1407833Z cp.async.commit_group; 2026-02-21T09:19:14.1408165Z .loc 1 43 78 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:43:78 2026-02-21T09:19:14.1408547Z add.s32 %r3232, %r3192, %r3188; 2026-02-21T09:19:14.1408747Z sub.s32 %r3233, %r3232, %r3194; 2026-02-21T09:19:14.1408931Z shl.b32 %r3234, %r3233, 7; 2026-02-21T09:19:14.1409121Z add.s32 %r22990, %r177, %r3234; 2026-02-21T09:19:14.1409317Z or.b32 %r3235, %r17, %r328; 2026-02-21T09:19:14.1409502Z shl.b32 %r3236, %r3235, 10; 2026-02-21T09:19:14.1409690Z mul.wide.s32 %rd8, %r3236, 2; 2026-02-21T09:19:14.1409878Z or.b32 %r3237, %r16, %r328; 2026-02-21T09:19:14.1410048Z shl.b32 %r3238, %r3237, 10; 2026-02-21T09:19:14.1410235Z mul.wide.s32 %rd9, %r3238, 2; 2026-02-21T09:19:14.1410416Z or.b32 %r3239, %r15, %r328; 2026-02-21T09:19:14.1410581Z shl.b32 %r3240, %r3239, 10; 2026-02-21T09:19:14.1410857Z mul.wide.s32 %rd10, %r3240, 2; 2026-02-21T09:19:14.1411038Z or.b32 %r3241, %r14, %r328; 2026-02-21T09:19:14.1411213Z shl.b32 %r3242, %r3241, 10; 2026-02-21T09:19:14.1411383Z mul.wide.s32 %rd11, %r3242, 2; 2026-02-21T09:19:14.1411567Z or.b32 %r3243, %r13, %r328; 2026-02-21T09:19:14.1411732Z shl.b32 %r3244, %r3243, 10; 2026-02-21T09:19:14.1411908Z mul.wide.s32 %rd12, %r3244, 2; 2026-02-21T09:19:14.1412085Z or.b32 %r3245, %r12, %r328; 
2026-02-21T09:19:14.1412254Z shl.b32 %r3246, %r3245, 10; 2026-02-21T09:19:14.1412430Z mul.wide.s32 %rd13, %r3246, 2; 2026-02-21T09:19:14.1412604Z shl.b32 %r3247, %r3193, 19; 2026-02-21T09:19:14.1412793Z or.b32 %r3248, %r22984, %r3247; 2026-02-21T09:19:14.1412983Z mul.wide.s32 %rd14, %r3248, 2; 2026-02-21T09:19:14.1413165Z or.b32 %r22989, %r185, %r3247; 2026-02-21T09:19:14.1413342Z mov.b32 %r22993, 0f00000000; 2026-02-21T09:19:14.1413536Z mov.b32 %r22991, -1; 2026-02-21T09:19:14.1413706Z mov.b64 %rd1115, -16; 2026-02-21T09:19:14.1413878Z mov.b64 %rd1114, %rd3; 2026-02-21T09:19:14.1414127Z mov.b32 %r22994, %r22993; 2026-02-21T09:19:14.1414312Z mov.b32 %r22995, %r22993; 2026-02-21T09:19:14.1414486Z mov.b32 %r22996, %r22993; 2026-02-21T09:19:14.1414657Z mov.b32 %r22997, %r22993; 2026-02-21T09:19:14.1414827Z mov.b32 %r22998, %r22993; 2026-02-21T09:19:14.1414992Z mov.b32 %r22999, %r22993; 2026-02-21T09:19:14.1415180Z mov.b32 %r23000, %r22993; 2026-02-21T09:19:14.1415352Z mov.b32 %r23001, %r22993; 2026-02-21T09:19:14.1415521Z mov.b32 %r23002, %r22993; 2026-02-21T09:19:14.1415688Z mov.b32 %r23003, %r22993; 2026-02-21T09:19:14.1415860Z mov.b32 %r23004, %r22993; 2026-02-21T09:19:14.1416026Z mov.b32 %r23005, %r22993; 2026-02-21T09:19:14.1416203Z mov.b32 %r23006, %r22993; 2026-02-21T09:19:14.1416378Z mov.b32 %r23007, %r22993; 2026-02-21T09:19:14.1416664Z mov.b32 %r23008, %r22993; 2026-02-21T09:19:14.1416842Z mov.b32 %r23009, %r22993; 2026-02-21T09:19:14.1417011Z mov.b32 %r23010, %r22993; 2026-02-21T09:19:14.1417184Z mov.b32 %r23011, %r22993; 2026-02-21T09:19:14.1417351Z mov.b32 %r23012, %r22993; 2026-02-21T09:19:14.1417525Z mov.b32 %r23013, %r22993; 2026-02-21T09:19:14.1417705Z mov.b32 %r23014, %r22993; 2026-02-21T09:19:14.1418013Z mov.b32 %r23015, %r22993; 2026-02-21T09:19:14.1418180Z mov.b32 %r23016, %r22993; 2026-02-21T09:19:14.1418348Z mov.b32 %r23017, %r22993; 2026-02-21T09:19:14.1418521Z mov.b32 %r23018, %r22993; 2026-02-21T09:19:14.1418686Z mov.b32 %r23019, %r22993; 
2026-02-21T09:19:14.1418858Z mov.b32 %r23020, %r22993; 2026-02-21T09:19:14.1419030Z mov.b32 %r23021, %r22993; 2026-02-21T09:19:14.1419202Z mov.b32 %r23022, %r22993; 2026-02-21T09:19:14.1419369Z mov.b32 %r23023, %r22993; 2026-02-21T09:19:14.1419540Z mov.b32 %r23024, %r22993; 2026-02-21T09:19:14.1419706Z mov.b32 %r23025, %r22993; 2026-02-21T09:19:14.1419882Z mov.b32 %r23026, %r22993; 2026-02-21T09:19:14.1420049Z mov.b32 %r23027, %r22993; 2026-02-21T09:19:14.1420238Z mov.b32 %r23028, %r22993; 2026-02-21T09:19:14.1420497Z mov.b32 %r23029, %r22993; 2026-02-21T09:19:14.1420669Z mov.b32 %r23030, %r22993; 2026-02-21T09:19:14.1420849Z mov.b32 %r23031, %r22993; 2026-02-21T09:19:14.1421016Z mov.b32 %r23032, %r22993; 2026-02-21T09:19:14.1421188Z mov.b32 %r23033, %r22993; 2026-02-21T09:19:14.1421358Z mov.b32 %r23034, %r22993; 2026-02-21T09:19:14.1421530Z mov.b32 %r23035, %r22993; 2026-02-21T09:19:14.1421698Z mov.b32 %r23036, %r22993; 2026-02-21T09:19:14.1421867Z mov.b32 %r23037, %r22993; 2026-02-21T09:19:14.1422044Z mov.b32 %r23038, %r22993; 2026-02-21T09:19:14.1422210Z mov.b32 %r23039, %r22993; 2026-02-21T09:19:14.1422385Z mov.b32 %r23040, %r22993; 2026-02-21T09:19:14.1422564Z mov.b32 %r23041, %r22993; 2026-02-21T09:19:14.1422751Z mov.b32 %r23042, %r22993; 2026-02-21T09:19:14.1422920Z mov.b32 %r23043, %r22993; 2026-02-21T09:19:14.1423099Z mov.b32 %r23044, %r22993; 2026-02-21T09:19:14.1423269Z mov.b32 %r23045, %r22993; 2026-02-21T09:19:14.1423439Z mov.b32 %r23046, %r22993; 2026-02-21T09:19:14.1423688Z mov.b32 %r23047, %r22993; 2026-02-21T09:19:14.1423865Z mov.b32 %r23048, %r22993; 2026-02-21T09:19:14.1424038Z mov.b32 %r23049, %r22993; 2026-02-21T09:19:14.1424205Z mov.b32 %r23050, %r22993; 2026-02-21T09:19:14.1424382Z mov.b32 %r23051, %r22993; 2026-02-21T09:19:14.1424556Z mov.b32 %r23052, %r22993; 2026-02-21T09:19:14.1424731Z mov.b32 %r23053, %r22993; 2026-02-21T09:19:14.1424897Z mov.b32 %r23054, %r22993; 2026-02-21T09:19:14.1425073Z mov.b32 %r23055, %r22993; 
2026-02-21T09:19:14.1425241Z mov.b32 %r23056, %r22993; 2026-02-21T09:19:14.1425416Z mov.b32 %r23057, %r22993; 2026-02-21T09:19:14.1425584Z mov.b32 %r23058, %r22993; 2026-02-21T09:19:14.1425758Z mov.b32 %r23059, %r22993; 2026-02-21T09:19:14.1425933Z mov.b32 %r23060, %r22993; 2026-02-21T09:19:14.1426101Z mov.b32 %r23061, %r22993; 2026-02-21T09:19:14.1426276Z mov.b32 %r23062, %r22993; 2026-02-21T09:19:14.1426568Z mov.b32 %r23063, %r22993; 2026-02-21T09:19:14.1426767Z mov.b32 %r23064, %r22993; 2026-02-21T09:19:14.1426942Z mov.b32 %r23065, %r22993; 2026-02-21T09:19:14.1427198Z mov.b32 %r23066, %r22993; 2026-02-21T09:19:14.1427374Z mov.b32 %r23067, %r22993; 2026-02-21T09:19:14.1427547Z mov.b32 %r23068, %r22993; 2026-02-21T09:19:14.1427717Z mov.b32 %r23069, %r22993; 2026-02-21T09:19:14.1427891Z mov.b32 %r23070, %r22993; 2026-02-21T09:19:14.1428069Z mov.b32 %r23071, %r22993; 2026-02-21T09:19:14.1428235Z mov.b32 %r23072, %r22993; 2026-02-21T09:19:14.1428408Z mov.b32 %r23073, %r22993; 2026-02-21T09:19:14.1428670Z mov.b32 %r23074, %r22993; 2026-02-21T09:19:14.1428849Z mov.b32 %r23075, %r22993; 2026-02-21T09:19:14.1429014Z mov.b32 %r23076, %r22993; 2026-02-21T09:19:14.1429189Z mov.b32 %r23077, %r22993; 2026-02-21T09:19:14.1429358Z mov.b32 %r23078, %r22993; 2026-02-21T09:19:14.1429532Z mov.b32 %r23079, %r22993; 2026-02-21T09:19:14.1429704Z mov.b32 %r23080, %r22993; 2026-02-21T09:19:14.1429880Z mov.b32 %r23081, %r22993; 2026-02-21T09:19:14.1430054Z mov.b32 %r23082, %r22993; 2026-02-21T09:19:14.1430225Z mov.b32 %r23083, %r22993; 2026-02-21T09:19:14.1430400Z mov.b32 %r23084, %r22993; 2026-02-21T09:19:14.1430566Z mov.b32 %r23085, %r22993; 2026-02-21T09:19:14.1430742Z mov.b32 %r23086, %r22993; 2026-02-21T09:19:14.1430997Z mov.b32 %r23087, %r22993; 2026-02-21T09:19:14.1431171Z mov.b32 %r23088, %r22993; 2026-02-21T09:19:14.1431337Z mov.b32 %r23089, %r22993; 2026-02-21T09:19:14.1431509Z mov.b32 %r23090, %r22993; 2026-02-21T09:19:14.1431675Z mov.b32 %r23091, %r22993; 
2026-02-21T09:19:14.1431860Z mov.b32 %r23092, %r22993; 2026-02-21T09:19:14.1432035Z mov.b32 %r23093, %r22993; 2026-02-21T09:19:14.1432200Z mov.b32 %r23094, %r22993; 2026-02-21T09:19:14.1432375Z mov.b32 %r23095, %r22993; 2026-02-21T09:19:14.1432539Z mov.b32 %r23096, %r22993; 2026-02-21T09:19:14.1432712Z mov.b32 %r23097, %r22993; 2026-02-21T09:19:14.1432877Z mov.b32 %r23098, %r22993; 2026-02-21T09:19:14.1433050Z mov.b32 %r23099, %r22993; 2026-02-21T09:19:14.1433219Z mov.b32 %r23100, %r22993; 2026-02-21T09:19:14.1433472Z mov.b32 %r23101, %r22993; 2026-02-21T09:19:14.1433647Z mov.b32 %r23102, %r22993; 2026-02-21T09:19:14.1433824Z mov.b32 %r23103, %r22993; 2026-02-21T09:19:14.1433996Z mov.b32 %r23104, %r22993; 2026-02-21T09:19:14.1434169Z mov.b32 %r23105, %r22993; 2026-02-21T09:19:14.1434350Z mov.b32 %r23106, %r22993; 2026-02-21T09:19:14.1434518Z mov.b32 %r23107, %r22993; 2026-02-21T09:19:14.1434693Z mov.b32 %r23108, %r22993; 2026-02-21T09:19:14.1434860Z mov.b32 %r23109, %r22993; 2026-02-21T09:19:14.1435031Z mov.b32 %r23110, %r22993; 2026-02-21T09:19:14.1435204Z mov.b32 %r23111, %r22993; 2026-02-21T09:19:14.1435368Z mov.b32 %r23112, %r22993; 2026-02-21T09:19:14.1435539Z mov.b32 %r23113, %r22993; 2026-02-21T09:19:14.1435705Z mov.b32 %r23114, %r22993; 2026-02-21T09:19:14.1435878Z mov.b32 %r23115, %r22993; 2026-02-21T09:19:14.1436045Z mov.b32 %r23116, %r22993; 2026-02-21T09:19:14.1436220Z mov.b32 %r23117, %r22993; 2026-02-21T09:19:14.1436387Z mov.b32 %r23118, %r22993; 2026-02-21T09:19:14.1436789Z mov.b32 %r23119, %r22993; 2026-02-21T09:19:14.1436962Z mov.b32 %r23120, %r22993; 2026-02-21T09:19:14.1437136Z mov.b32 %r23121, %r22993; 2026-02-21T09:19:14.1437310Z mov.b32 %r23122, %r22993; 2026-02-21T09:19:14.1437481Z mov.b32 %r23123, %r22993; 2026-02-21T09:19:14.1437652Z mov.b32 %r23124, %r22993; 2026-02-21T09:19:14.1437821Z mov.b32 %r23125, %r22993; 2026-02-21T09:19:14.1437994Z mov.b32 %r23126, %r22993; 2026-02-21T09:19:14.1438174Z mov.b32 %r23127, %r22993; 
2026-02-21T09:19:14.1438351Z mov.b32 %r23128, %r22993; 2026-02-21T09:19:14.1438518Z mov.b32 %r23129, %r22993; 2026-02-21T09:19:14.1438689Z mov.b32 %r23130, %r22993; 2026-02-21T09:19:14.1438858Z mov.b32 %r23131, %r22993; 2026-02-21T09:19:14.1439033Z mov.b32 %r23132, %r22993; 2026-02-21T09:19:14.1439203Z mov.b32 %r23133, %r22993; 2026-02-21T09:19:14.1439368Z mov.b32 %r23134, %r22993; 2026-02-21T09:19:14.1439535Z mov.b32 %r23135, %r22993; 2026-02-21T09:19:14.1439699Z mov.b32 %r23136, %r22993; 2026-02-21T09:19:14.1439875Z mov.b32 %r23137, %r22993; 2026-02-21T09:19:14.1440130Z mov.b32 %r23138, %r22993; 2026-02-21T09:19:14.1440304Z mov.b32 %r23139, %r22993; 2026-02-21T09:19:14.1440468Z mov.b32 %r23140, %r22993; 2026-02-21T09:19:14.1440639Z mov.b32 %r23141, %r22993; 2026-02-21T09:19:14.1440804Z mov.b32 %r23142, %r22993; 2026-02-21T09:19:14.1440974Z mov.b32 %r23143, %r22993; 2026-02-21T09:19:14.1441146Z mov.b32 %r23144, %r22993; 2026-02-21T09:19:14.1441208Z mov.b32 %r23145, %r22993; 2026-02-21T09:19:14.1441271Z mov.b32 %r23146, %r22993; 2026-02-21T09:19:14.1441332Z mov.b32 %r23147, %r22993; 2026-02-21T09:19:14.1441398Z mov.b32 %r23148, %r22993; 2026-02-21T09:19:14.1441457Z mov.b32 %r23149, %r22993; 2026-02-21T09:19:14.1441518Z mov.b32 %r23150, %r22993; 2026-02-21T09:19:14.1441587Z mov.b32 %r23151, %r22993; 2026-02-21T09:19:14.1441658Z mov.b32 %r23152, %r22993; 2026-02-21T09:19:14.1441720Z mov.b32 %r23153, %r22993; 2026-02-21T09:19:14.1441782Z mov.b32 %r23154, %r22993; 2026-02-21T09:19:14.1441851Z mov.b32 %r23155, %r22993; 2026-02-21T09:19:14.1441915Z mov.b32 %r23156, %r22993; 2026-02-21T09:19:14.1441977Z mov.b32 %r23157, %r22993; 2026-02-21T09:19:14.1442044Z mov.b32 %r23158, %r22993; 2026-02-21T09:19:14.1442190Z mov.b32 %r23159, %r22993; 2026-02-21T09:19:14.1442252Z mov.b32 %r23160, %r22993; 2026-02-21T09:19:14.1442315Z mov.b32 %r23161, %r22993; 2026-02-21T09:19:14.1442382Z mov.b32 %r23162, %r22993; 2026-02-21T09:19:14.1442443Z mov.b32 %r23163, %r22993; 
2026-02-21T09:19:14.1442505Z mov.b32 %r23164, %r22993; 2026-02-21T09:19:14.1442571Z mov.b32 %r23165, %r22993; 2026-02-21T09:19:14.1442632Z mov.b32 %r23166, %r22993; 2026-02-21T09:19:14.1442693Z mov.b32 %r23167, %r22993; 2026-02-21T09:19:14.1442755Z mov.b32 %r23168, %r22993; 2026-02-21T09:19:14.1442822Z mov.b32 %r23169, %r22993; 2026-02-21T09:19:14.1442880Z mov.b32 %r23170, %r22993; 2026-02-21T09:19:14.1442954Z mov.b32 %r23171, %r22993; 2026-02-21T09:19:14.1443017Z mov.b32 %r23172, %r22993; 2026-02-21T09:19:14.1443155Z mov.b32 %r23173, %r22993; 2026-02-21T09:19:14.1443229Z mov.b32 %r23174, %r22993; 2026-02-21T09:19:14.1443290Z mov.b32 %r23175, %r22993; 2026-02-21T09:19:14.1443351Z mov.b32 %r23176, %r22993; 2026-02-21T09:19:14.1443415Z mov.b32 %r23177, %r22993; 2026-02-21T09:19:14.1443483Z mov.b32 %r23178, %r22993; 2026-02-21T09:19:14.1443547Z mov.b32 %r23179, %r22993; 2026-02-21T09:19:14.1443608Z mov.b32 %r23180, %r22993; 2026-02-21T09:19:14.1443675Z mov.b32 %r23181, %r22993; 2026-02-21T09:19:14.1443747Z mov.b32 %r23182, %r22993; 2026-02-21T09:19:14.1443810Z mov.b32 %r23183, %r22993; 2026-02-21T09:19:14.1443872Z mov.b32 %r23184, %r22993; 2026-02-21T09:19:14.1443940Z mov.b32 %r23185, %r22993; 2026-02-21T09:19:14.1444001Z mov.b32 %r23186, %r22993; 2026-02-21T09:19:14.1444063Z mov.b32 %r23187, %r22993; 2026-02-21T09:19:14.1444134Z mov.b32 %r23188, %r22993; 2026-02-21T09:19:14.1444196Z mov.b32 %r23189, %r22993; 2026-02-21T09:19:14.1444256Z mov.b32 %r23190, %r22993; 2026-02-21T09:19:14.1444398Z mov.b32 %r23191, %r22993; 2026-02-21T09:19:14.1444462Z mov.b32 %r23192, %r22993; 2026-02-21T09:19:14.1444525Z mov.b32 %r23193, %r22993; 2026-02-21T09:19:14.1444585Z mov.b32 %r23194, %r22993; 2026-02-21T09:19:14.1444654Z mov.b32 %r23195, %r22993; 2026-02-21T09:19:14.1444713Z mov.b32 %r23196, %r22993; 2026-02-21T09:19:14.1444774Z mov.b32 %r23197, %r22993; 2026-02-21T09:19:14.1444838Z mov.b32 %r23198, %r22993; 2026-02-21T09:19:14.1444901Z mov.b32 %r23199, %r22993; 
2026-02-21T09:19:14.1444962Z mov.b32 %r23200, %r22993; 2026-02-21T09:19:14.1445022Z mov.b32 %r23201, %r22993; 2026-02-21T09:19:14.1445087Z mov.b32 %r23202, %r22993; 2026-02-21T09:19:14.1445147Z mov.b32 %r23203, %r22993; 2026-02-21T09:19:14.1445207Z mov.b32 %r23204, %r22993; 2026-02-21T09:19:14.1445273Z mov.b32 %r23205, %r22993; 2026-02-21T09:19:14.1445334Z mov.b32 %r23206, %r22993; 2026-02-21T09:19:14.1445393Z mov.b32 %r23207, %r22993; 2026-02-21T09:19:14.1445453Z mov.b32 %r23208, %r22993; 2026-02-21T09:19:14.1445522Z mov.b32 %r23209, %r22993; 2026-02-21T09:19:14.1445633Z mov.b32 %r23210, %r22993; 2026-02-21T09:19:14.1445697Z mov.b32 %r23211, %r22993; 2026-02-21T09:19:14.1445761Z mov.b32 %r23212, %r22993; 2026-02-21T09:19:14.1445824Z mov.b32 %r23213, %r22993; 2026-02-21T09:19:14.1445885Z mov.b32 %r23214, %r22993; 2026-02-21T09:19:14.1445947Z mov.b32 %r23215, %r22993; 2026-02-21T09:19:14.1446021Z mov.b32 %r23216, %r22993; 2026-02-21T09:19:14.1446086Z mov.b32 %r23217, %r22993; 2026-02-21T09:19:14.1446147Z mov.b32 %r23218, %r22993; 2026-02-21T09:19:14.1446218Z mov.b32 %r23219, %r22993; 2026-02-21T09:19:14.1446279Z mov.b32 %r23220, %r22993; 2026-02-21T09:19:14.1446339Z mov.b32 %r23221, %r22993; 2026-02-21T09:19:14.1446399Z mov.b32 %r23222, %r22993; 2026-02-21T09:19:14.1446580Z mov.b32 %r23223, %r22993; 2026-02-21T09:19:14.1446646Z mov.b32 %r23224, %r22993; 2026-02-21T09:19:14.1446707Z mov.b32 %r23225, %r22993; 2026-02-21T09:19:14.1446776Z mov.b32 %r23226, %r22993; 2026-02-21T09:19:14.1446838Z mov.b32 %r23227, %r22993; 2026-02-21T09:19:14.1446903Z mov.b32 %r23228, %r22993; 2026-02-21T09:19:14.1446965Z mov.b32 %r23229, %r22993; 2026-02-21T09:19:14.1447031Z mov.b32 %r23230, %r22993; 2026-02-21T09:19:14.1447172Z mov.b32 %r23231, %r22993; 2026-02-21T09:19:14.1447234Z mov.b32 %r23232, %r22993; 2026-02-21T09:19:14.1447301Z mov.b32 %r23233, %r22993; 2026-02-21T09:19:14.1447363Z mov.b32 %r23234, %r22993; 2026-02-21T09:19:14.1447424Z mov.b32 %r23235, %r22993; 
2026-02-21T09:19:14.1447487Z mov.b32 %r23236, %r22993; 2026-02-21T09:19:14.1447553Z mov.b32 %r23237, %r22993; 2026-02-21T09:19:14.1447613Z mov.b32 %r23238, %r22993; 2026-02-21T09:19:14.1447674Z mov.b32 %r23239, %r22993; 2026-02-21T09:19:14.1447744Z mov.b32 %r23240, %r22993; 2026-02-21T09:19:14.1447806Z mov.b32 %r23241, %r22993; 2026-02-21T09:19:14.1447866Z mov.b32 %r23242, %r22993; 2026-02-21T09:19:14.1447932Z mov.b32 %r23243, %r22993; 2026-02-21T09:19:14.1447996Z mov.b32 %r23244, %r22993; 2026-02-21T09:19:14.1448128Z mov.b32 %r23245, %r22993; 2026-02-21T09:19:14.1448195Z mov.b32 %r23246, %r22993; 2026-02-21T09:19:14.1448263Z mov.b32 %r23247, %r22993; 2026-02-21T09:19:14.1448325Z mov.b32 %r23248, %r22993; 2026-02-21T09:19:14.1448452Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T09:19:14.1448569Z // => This Inner Loop Header: Depth=2 2026-02-21T09:19:14.1448638Z add.s64 %rd1115, %rd1115, 16; 2026-02-21T09:19:14.1448713Z setp.lt.u64 %p19, %rd1115, 432; 2026-02-21T09:19:14.1448780Z add.s32 %r6433, %r22991, 1; 2026-02-21T09:19:14.1448856Z setp.gt.s32 %p20, %r6433, 4; 2026-02-21T09:19:14.1448928Z selp.b32 %r22991, 0, %r6433, %p20; 2026-02-21T09:19:14.1449159Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1449237Z cp.async.wait_group 16; 2026-02-21T09:19:14.1449298Z bar.sync 0; 2026-02-21T09:19:14.1449434Z shl.b32 %r6434, %r22991, 14; 2026-02-21T09:19:14.1449512Z add.s32 %r6436, %r22967, %r6434; 2026-02-21T09:19:14.1449732Z .loc 1 55 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:55:32 2026-02-21T09:19:14.1449806Z add.s32 %r6437, %r6436, %r157; 2026-02-21T09:19:14.1449880Z ld.shared.b16 %rs1, [%r6437]; 2026-02-21T09:19:14.1449970Z ld.shared.b16 %rs2, [%r6437+256]; 2026-02-21T09:19:14.1450043Z ld.shared.b16 %rs3, [%r6437+16]; 2026-02-21T09:19:14.1450111Z ld.shared.b16 %rs4, [%r6437+272]; 2026-02-21T09:19:14.1450188Z ld.shared.b16 %rs5, [%r6437+4096]; 2026-02-21T09:19:14.1450255Z ld.shared.b16 %rs6, 
[%r6437+4352]; 2026-02-21T09:19:14.1450320Z ld.shared.b16 %rs7, [%r6437+4112]; 2026-02-21T09:19:14.1450385Z ld.shared.b16 %rs8, [%r6437+4368]; 2026-02-21T09:19:14.1450458Z ld.shared.b16 %rs9, [%r6437+8192]; 2026-02-21T09:19:14.1450533Z ld.shared.b16 %rs10, [%r6437+8448]; 2026-02-21T09:19:14.1450607Z ld.shared.b16 %rs11, [%r6437+8208]; 2026-02-21T09:19:14.1450686Z ld.shared.b16 %rs12, [%r6437+8464]; 2026-02-21T09:19:14.1450848Z ld.shared.b16 %rs13, [%r6437+12288]; 2026-02-21T09:19:14.1450924Z ld.shared.b16 %rs14, [%r6437+12544]; 2026-02-21T09:19:14.1451001Z ld.shared.b16 %rs15, [%r6437+12304]; 2026-02-21T09:19:14.1451072Z ld.shared.b16 %rs16, [%r6437+12560]; 2026-02-21T09:19:14.1451140Z add.s32 %r6438, %r6436, %r158; 2026-02-21T09:19:14.1451208Z ld.shared.b16 %rs17, [%r6438]; 2026-02-21T09:19:14.1451291Z ld.shared.b16 %rs18, [%r6438+256]; 2026-02-21T09:19:14.1451362Z ld.shared.b16 %rs19, [%r6438+16]; 2026-02-21T09:19:14.1451429Z ld.shared.b16 %rs20, [%r6438+272]; 2026-02-21T09:19:14.1451506Z ld.shared.b16 %rs21, [%r6438+4096]; 2026-02-21T09:19:14.1451575Z ld.shared.b16 %rs22, [%r6438+4352]; 2026-02-21T09:19:14.1451643Z ld.shared.b16 %rs23, [%r6438+4112]; 2026-02-21T09:19:14.1451711Z ld.shared.b16 %rs24, [%r6438+4368]; 2026-02-21T09:19:14.1451788Z ld.shared.b16 %rs25, [%r6438+8192]; 2026-02-21T09:19:14.1451855Z ld.shared.b16 %rs26, [%r6438+8448]; 2026-02-21T09:19:14.1451927Z ld.shared.b16 %rs27, [%r6438+8208]; 2026-02-21T09:19:14.1452000Z ld.shared.b16 %rs28, [%r6438+8464]; 2026-02-21T09:19:14.1452069Z ld.shared.b16 %rs29, [%r6438+12288]; 2026-02-21T09:19:14.1452193Z ld.shared.b16 %rs30, [%r6438+12544]; 2026-02-21T09:19:14.1452265Z ld.shared.b16 %rs31, [%r6438+12304]; 2026-02-21T09:19:14.1452333Z ld.shared.b16 %rs32, [%r6438+12560]; 2026-02-21T09:19:14.1452401Z cvt.f32.bf16 %r3377, %rs1; 2026-02-21T09:19:14.1452464Z cvt.f32.bf16 %r3378, %rs2; 2026-02-21T09:19:14.1452539Z cvt.f32.bf16 %r3379, %rs17; 2026-02-21T09:19:14.1452604Z cvt.f32.bf16 %r3380, %rs18; 
2026-02-21T09:19:14.1452668Z cvt.f32.bf16 %r3509, %rs3; 2026-02-21T09:19:14.1452737Z cvt.f32.bf16 %r3510, %rs4; 2026-02-21T09:19:14.1452801Z cvt.f32.bf16 %r3511, %rs19; 2026-02-21T09:19:14.1452867Z cvt.f32.bf16 %r3512, %rs20; 2026-02-21T09:19:14.1452930Z cvt.f32.bf16 %r3641, %rs5; 2026-02-21T09:19:14.1452999Z cvt.f32.bf16 %r3642, %rs6; 2026-02-21T09:19:14.1453119Z cvt.f32.bf16 %r3643, %rs21; 2026-02-21T09:19:14.1453187Z cvt.f32.bf16 %r3644, %rs22; 2026-02-21T09:19:14.1453255Z cvt.f32.bf16 %r3773, %rs7; 2026-02-21T09:19:14.1453320Z cvt.f32.bf16 %r3774, %rs8; 2026-02-21T09:19:14.1453385Z cvt.f32.bf16 %r3775, %rs23; 2026-02-21T09:19:14.1453450Z cvt.f32.bf16 %r3776, %rs24; 2026-02-21T09:19:14.1453517Z cvt.f32.bf16 %r3905, %rs9; 2026-02-21T09:19:14.1453580Z cvt.f32.bf16 %r3906, %rs10; 2026-02-21T09:19:14.1453642Z cvt.f32.bf16 %r3907, %rs25; 2026-02-21T09:19:14.1453710Z cvt.f32.bf16 %r3908, %rs26; 2026-02-21T09:19:14.1453772Z cvt.f32.bf16 %r4037, %rs11; 2026-02-21T09:19:14.1453846Z cvt.f32.bf16 %r4038, %rs12; 2026-02-21T09:19:14.1453916Z cvt.f32.bf16 %r4039, %rs27; 2026-02-21T09:19:14.1453979Z cvt.f32.bf16 %r4040, %rs28; 2026-02-21T09:19:14.1454043Z cvt.f32.bf16 %r4169, %rs13; 2026-02-21T09:19:14.1454105Z cvt.f32.bf16 %r4170, %rs14; 2026-02-21T09:19:14.1454173Z cvt.f32.bf16 %r4171, %rs29; 2026-02-21T09:19:14.1454291Z cvt.f32.bf16 %r4172, %rs30; 2026-02-21T09:19:14.1454356Z cvt.f32.bf16 %r4301, %rs15; 2026-02-21T09:19:14.1454427Z cvt.f32.bf16 %r4302, %rs16; 2026-02-21T09:19:14.1454489Z cvt.f32.bf16 %r4303, %rs31; 2026-02-21T09:19:14.1454551Z cvt.f32.bf16 %r4304, %rs32; 2026-02-21T09:19:14.1454765Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1454839Z shl.b32 %r6439, %r22991, 10; 2026-02-21T09:19:14.1454905Z add.s32 %r6440, %r22967, %r6439; 2026-02-21T09:19:14.1454971Z add.s32 %r6441, %r6440, 172032; 2026-02-21T09:19:14.1455183Z .loc 1 70 45 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:70:45 
2026-02-21T09:19:14.1455247Z add.s32 %r6442, %r6441, %r22973; 2026-02-21T09:19:14.1455317Z ld.shared.b8 %rs33, [%r6442]; 2026-02-21T09:19:14.1455393Z ld.shared.b8 %rs34, [%r6442+128]; 2026-02-21T09:19:14.1455462Z ld.shared.b8 %rs35, [%r6442+256]; 2026-02-21T09:19:14.1455528Z ld.shared.b8 %rs36, [%r6442+384]; 2026-02-21T09:19:14.1455597Z ld.shared.b8 %rs37, [%r6442+512]; 2026-02-21T09:19:14.1455721Z ld.shared.b8 %rs38, [%r6442+640]; 2026-02-21T09:19:14.1455789Z ld.shared.b8 %rs39, [%r6442+768]; 2026-02-21T09:19:14.1455854Z add.s32 %r6443, %r6441, %r22974; 2026-02-21T09:19:14.1455926Z ld.shared.b8 %rs40, [%r6443]; 2026-02-21T09:19:14.1456126Z .loc 1 60 28 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:60:28 2026-02-21T09:19:14.1456192Z shl.b16 %rs41, %rs33, 4; 2026-02-21T09:19:14.1456255Z shl.b16 %rs42, %rs34, 4; 2026-02-21T09:19:14.1456324Z shl.b16 %rs43, %rs35, 4; 2026-02-21T09:19:14.1456386Z shl.b16 %rs44, %rs36, 4; 2026-02-21T09:19:14.1456447Z shl.b16 %rs45, %rs37, 4; 2026-02-21T09:19:14.1456645Z shl.b16 %rs46, %rs38, 4; 2026-02-21T09:19:14.1456709Z shl.b16 %rs47, %rs39, 4; 2026-02-21T09:19:14.1456771Z shl.b16 %rs48, %rs40, 4; 2026-02-21T09:19:14.1456978Z .loc 1 75 58 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:75:58 2026-02-21T09:19:14.1457058Z selp.b16 %rs49, %rs41, %rs33, %p110; 2026-02-21T09:19:14.1457122Z cvt.s16.s8 %rs50, %rs49; 2026-02-21T09:19:14.1457186Z shr.s16 %rs51, %rs50, 4; 2026-02-21T09:19:14.1457262Z selp.b16 %rs52, %rs42, %rs34, %p110; 2026-02-21T09:19:14.1457407Z cvt.s16.s8 %rs53, %rs52; 2026-02-21T09:19:14.1457469Z shr.s16 %rs54, %rs53, 4; 2026-02-21T09:19:14.1457547Z selp.b16 %rs55, %rs43, %rs35, %p110; 2026-02-21T09:19:14.1457613Z cvt.s16.s8 %rs56, %rs55; 2026-02-21T09:19:14.1457675Z shr.s16 %rs57, %rs56, 4; 2026-02-21T09:19:14.1457743Z selp.b16 %rs58, %rs44, %rs36, %p110; 2026-02-21T09:19:14.1457810Z cvt.s16.s8 %rs59, %rs58; 2026-02-21T09:19:14.1457872Z shr.s16 %rs60, %rs59, 4; 2026-02-21T09:19:14.1457944Z selp.b16 
%rs61, %rs45, %rs37, %p110; 2026-02-21T09:19:14.1458024Z cvt.s16.s8 %rs62, %rs61; 2026-02-21T09:19:14.1458088Z shr.s16 %rs63, %rs62, 4; 2026-02-21T09:19:14.1458158Z selp.b16 %rs64, %rs46, %rs38, %p110; 2026-02-21T09:19:14.1458223Z cvt.s16.s8 %rs65, %rs64; 2026-02-21T09:19:14.1458367Z shr.s16 %rs66, %rs65, 4; 2026-02-21T09:19:14.1458444Z selp.b16 %rs67, %rs47, %rs39, %p110; 2026-02-21T09:19:14.1458510Z cvt.s16.s8 %rs68, %rs67; 2026-02-21T09:19:14.1458580Z shr.s16 %rs69, %rs68, 4; 2026-02-21T09:19:14.1458654Z selp.b16 %rs70, %rs48, %rs40, %p110; 2026-02-21T09:19:14.1458721Z cvt.s16.s8 %rs71, %rs70; 2026-02-21T09:19:14.1458793Z shr.s16 %rs72, %rs71, 4; 2026-02-21T09:19:14.1458995Z .loc 1 80 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:80:32 2026-02-21T09:19:14.1459064Z cvt.rn.f32.s16 %r6444, %rs51; 2026-02-21T09:19:14.1459129Z cvt.rn.f32.s16 %r6445, %rs54; 2026-02-21T09:19:14.1459201Z cvt.rn.f32.s16 %r6446, %rs57; 2026-02-21T09:19:14.1459267Z cvt.rn.f32.s16 %r6447, %rs60; 2026-02-21T09:19:14.1459343Z cvt.rn.f32.s16 %r6448, %rs63; 2026-02-21T09:19:14.1459419Z cvt.rn.f32.s16 %r6449, %rs66; 2026-02-21T09:19:14.1459485Z cvt.rn.f32.s16 %r6450, %rs69; 2026-02-21T09:19:14.1459633Z cvt.rn.f32.s16 %r6451, %rs72; 2026-02-21T09:19:14.1459705Z st.shared.b32 [%r161], %r6444; 2026-02-21T09:19:14.1459780Z st.shared.b32 [%r161+8], %r6445; 2026-02-21T09:19:14.1459847Z st.shared.b32 [%r162], %r6446; 2026-02-21T09:19:14.1459916Z st.shared.b32 [%r162+8], %r6447; 2026-02-21T09:19:14.1459986Z st.shared.b32 [%r163], %r6448; 2026-02-21T09:19:14.1460053Z st.shared.b32 [%r163+8], %r6449; 2026-02-21T09:19:14.1460119Z st.shared.b32 [%r164], %r6450; 2026-02-21T09:19:14.1460185Z st.shared.b32 [%r164+8], %r6451; 2026-02-21T09:19:14.1460250Z $L__tmp1: 2026-02-21T09:19:14.1460531Z .loc 2 291 36 // standard.py:291:36 @[ co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:87:40 ] 2026-02-21T09:19:14.1460598Z // begin inline asm 2026-02-21T09:19:14.1460703Z 
fence.proxy.async.shared::cta; 2026-02-21T09:19:14.1460762Z // end inline asm 2026-02-21T09:19:14.1460819Z bar.sync 0; 2026-02-21T09:19:14.1460906Z shfl.sync.idx.b32 %r6452, %r5, 0, 31, -1; 2026-02-21T09:19:14.1460982Z wgmma.fence.sync.aligned; 2026-02-21T09:19:14.1461117Z mov.pred %p2, -1; 2026-02-21T09:19:14.1461184Z // begin inline asm 2026-02-21T09:19:14.1462651Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r22993,%r22994,%r22995,%r22996,%r22997,%r22998,%r22999,%r23000,%r23001,%r23002,%r23003,%r23004,%r23005,%r23006,%r23007,%r23008,%r23009,%r23010,%r23011,%r23012,%r23013,%r23014,%r23015,%r23016,%r23017,%r23018,%r23019,%r23020,%r23021,%r23022,%r23023,%r23024,%r23025,%r23026,%r23027,%r23028,%r23029,%r23030,%r23031,%r23032,%r23033,%r23034,%r23035,%r23036,%r23037,%r23038,%r23039,%r23040,%r23041,%r23042,%r23043,%r23044,%r23045,%r23046,%r23047,%r23048,%r23049,%r23050,%r23051,%r23052,%r23053,%r23054,%r23055,%r23056}, {%r3377,%r3378,%r3379,%r3380}, %rd1, %p2, 1, 1; 2026-02-21T09:19:14.1462714Z // end inline asm 2026-02-21T09:19:14.1462779Z // begin inline asm 2026-02-21T09:19:14.1464229Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r22993,%r22994,%r22995,%r22996,%r22997,%r22998,%r22999,%r23000,%r23001,%r23002,%r23003,%r23004,%r23005,%r23006,%r23007,%r23008,%r23009,%r23010,%r23011,%r23012,%r23013,%r23014,%r23015,%r23016,%r23017,%r23018,%r23019,%r23020,%r23021,%r23022,%r23023,%r23024,%r23025,%r23026,%r23027,%r23028,%r23029,%r23030,%r23031,%r23032,%r23033,%r23034,%r23035,%r23036,%r23037,%r23038,%r23039,%r23040,%r23041,%r23042,%r23043,%r23044,%r23045,%r23046,%r23047,%r23048,%r23049,%r23050,%r23051,%r23052,%r23053,%r23054,%r23055,%r23056}, {%r3509,%r3510,%r3511,%r3512}, %rd2, %p2, 1, 1; 2026-02-21T09:19:14.1464341Z // end inline asm 2026-02-21T09:19:14.1464407Z // begin inline asm 2026-02-21T09:19:14.1465902Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r23057,%r23058,%r23059,%r23060,%r23061,%r23062,%r23063,%r23064,%r23065,%r23066,%r23067,%r23068,%r23069,%r23070,%r23071,%r23072,%r23073,%r23074,%r23075,%r23076,%r23077,%r23078,%r23079,%r23080,%r23081,%r23082,%r23083,%r23084,%r23085,%r23086,%r23087,%r23088,%r23089,%r23090,%r23091,%r23092,%r23093,%r23094,%r23095,%r23096,%r23097,%r23098,%r23099,%r23100,%r23101,%r23102,%r23103,%r23104,%r23105,%r23106,%r23107,%r23108,%r23109,%r23110,%r23111,%r23112,%r23113,%r23114,%r23115,%r23116,%r23117,%r23118,%r23119,%r23120}, {%r3641,%r3642,%r3643,%r3644}, %rd1, %p2, 1, 1; 2026-02-21T09:19:14.1465970Z // end inline asm 2026-02-21T09:19:14.1466030Z // begin inline asm 2026-02-21T09:19:14.1467589Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23057,%r23058,%r23059,%r23060,%r23061,%r23062,%r23063,%r23064,%r23065,%r23066,%r23067,%r23068,%r23069,%r23070,%r23071,%r23072,%r23073,%r23074,%r23075,%r23076,%r23077,%r23078,%r23079,%r23080,%r23081,%r23082,%r23083,%r23084,%r23085,%r23086,%r23087,%r23088,%r23089,%r23090,%r23091,%r23092,%r23093,%r23094,%r23095,%r23096,%r23097,%r23098,%r23099,%r23100,%r23101,%r23102,%r23103,%r23104,%r23105,%r23106,%r23107,%r23108,%r23109,%r23110,%r23111,%r23112,%r23113,%r23114,%r23115,%r23116,%r23117,%r23118,%r23119,%r23120}, {%r3773,%r3774,%r3775,%r3776}, %rd2, %p2, 1, 1; 2026-02-21T09:19:14.1467658Z // end inline asm 2026-02-21T09:19:14.1467792Z // begin inline asm 2026-02-21T09:19:14.1469336Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23121,%r23122,%r23123,%r23124,%r23125,%r23126,%r23127,%r23128,%r23129,%r23130,%r23131,%r23132,%r23133,%r23134,%r23135,%r23136,%r23137,%r23138,%r23139,%r23140,%r23141,%r23142,%r23143,%r23144,%r23145,%r23146,%r23147,%r23148,%r23149,%r23150,%r23151,%r23152,%r23153,%r23154,%r23155,%r23156,%r23157,%r23158,%r23159,%r23160,%r23161,%r23162,%r23163,%r23164,%r23165,%r23166,%r23167,%r23168,%r23169,%r23170,%r23171,%r23172,%r23173,%r23174,%r23175,%r23176,%r23177,%r23178,%r23179,%r23180,%r23181,%r23182,%r23183,%r23184}, 
{%r3905,%r3906,%r3907,%r3908}, %rd1, %p2, 1, 1; 2026-02-21T09:19:14.1469408Z // end inline asm 2026-02-21T09:19:14.1469471Z // begin inline asm 2026-02-21T09:19:14.1471000Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23121,%r23122,%r23123,%r23124,%r23125,%r23126,%r23127,%r23128,%r23129,%r23130,%r23131,%r23132,%r23133,%r23134,%r23135,%r23136,%r23137,%r23138,%r23139,%r23140,%r23141,%r23142,%r23143,%r23144,%r23145,%r23146,%r23147,%r23148,%r23149,%r23150,%r23151,%r23152,%r23153,%r23154,%r23155,%r23156,%r23157,%r23158,%r23159,%r23160,%r23161,%r23162,%r23163,%r23164,%r23165,%r23166,%r23167,%r23168,%r23169,%r23170,%r23171,%r23172,%r23173,%r23174,%r23175,%r23176,%r23177,%r23178,%r23179,%r23180,%r23181,%r23182,%r23183,%r23184}, {%r4037,%r4038,%r4039,%r4040}, %rd2, %p2, 1, 1; 2026-02-21T09:19:14.1471068Z // end inline asm 2026-02-21T09:19:14.1471129Z // begin inline asm 2026-02-21T09:19:14.1472589Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23185,%r23186,%r23187,%r23188,%r23189,%r23190,%r23191,%r23192,%r23193,%r23194,%r23195,%r23196,%r23197,%r23198,%r23199,%r23200,%r23201,%r23202,%r23203,%r23204,%r23205,%r23206,%r23207,%r23208,%r23209,%r23210,%r23211,%r23212,%r23213,%r23214,%r23215,%r23216,%r23217,%r23218,%r23219,%r23220,%r23221,%r23222,%r23223,%r23224,%r23225,%r23226,%r23227,%r23228,%r23229,%r23230,%r23231,%r23232,%r23233,%r23234,%r23235,%r23236,%r23237,%r23238,%r23239,%r23240,%r23241,%r23242,%r23243,%r23244,%r23245,%r23246,%r23247,%r23248}, {%r4169,%r4170,%r4171,%r4172}, %rd1, %p2, 1, 1; 2026-02-21T09:19:14.1472651Z // end inline asm 2026-02-21T09:19:14.1472712Z // begin inline asm 2026-02-21T09:19:14.1474243Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r23185,%r23186,%r23187,%r23188,%r23189,%r23190,%r23191,%r23192,%r23193,%r23194,%r23195,%r23196,%r23197,%r23198,%r23199,%r23200,%r23201,%r23202,%r23203,%r23204,%r23205,%r23206,%r23207,%r23208,%r23209,%r23210,%r23211,%r23212,%r23213,%r23214,%r23215,%r23216,%r23217,%r23218,%r23219,%r23220,%r23221,%r23222,%r23223,%r23224,%r23225,%r23226,%r23227,%r23228,%r23229,%r23230,%r23231,%r23232,%r23233,%r23234,%r23235,%r23236,%r23237,%r23238,%r23239,%r23240,%r23241,%r23242,%r23243,%r23244,%r23245,%r23246,%r23247,%r23248}, {%r4301,%r4302,%r4303,%r4304}, %rd2, %p2, 1, 1; 2026-02-21T09:19:14.1474303Z // end inline asm 2026-02-21T09:19:14.1474387Z wgmma.commit_group.sync.aligned; 2026-02-21T09:19:14.1474460Z mov.b32 %r6137, 0; 2026-02-21T09:19:14.1474593Z mov.b32 %r4561, %r2973; 2026-02-21T09:19:14.1474660Z mov.b32 %r4562, %r6137; 2026-02-21T09:19:14.1474727Z mov.b32 %r4563, %r6137; 2026-02-21T09:19:14.1474790Z // begin inline asm 2026-02-21T09:19:14.1479869Z // wait for regs: %r22993,%r22994,%r22995,%r22996,%r22997,%r22998,%r22999,%r23000,%r23001,%r23002,%r23003,%r23004,%r23005,%r23006,%r23007,%r23008,%r23009,%r23010,%r23011,%r23012,%r23013,%r23014,%r23015,%r23016,%r23017,%r23018,%r23019,%r23020,%r23021,%r23022,%r23023,%r23024,%r23025,%r23026,%r23027,%r23028,%r23029,%r23030,%r23031,%r23032,%r23033,%r23034,%r23035,%r23036,%r23037,%r23038,%r23039,%r23040,%r23041,%r23042,%r23043,%r23044,%r23045,%r23046,%r23047,%r23048,%r23049,%r23050,%r23051,%r23052,%r23053,%r23054,%r23055,%r23056,%r23057,%r23058,%r23059,%r23060,%r23061,%r23062,%r23063,%r23064,%r23065,%r23066,%r23067,%r23068,%r23069,%r23070,%r23071,%r23072,%r23073,%r23074,%r23075,%r23076,%r23077,%r23078,%r23079,%r23080,%r23081,%r23082,%r23083,%r23084,%r23085,%r23086,%r23087,%r23088,%r23089,%r23090,%r23091,%r23092,%r23093,%r23094,%r23095,%r23096,%r23097,%r23098,%r23099,%r23100,%r23101,%r23102,%r23103,%r23104,%r23105,%r23106,%r23107,%r23108,%r23109,%r23110,%r23111,%r23112,%r23113,%r23114,%r23115,%r23116,%r23117,%r23118,%r23119,%r23120,%
r23121,%r23122,%r23123,%r23124,%r23125,%r23126,%r23127,%r23128,%r23129,%r23130,%r23131,%r23132,%r23133,%r23134,%r23135,%r23136,%r23137,%r23138,%r23139,%r23140,%r23141,%r23142,%r23143,%r23144,%r23145,%r23146,%r23147,%r23148,%r23149,%r23150,%r23151,%r23152,%r23153,%r23154,%r23155,%r23156,%r23157,%r23158,%r23159,%r23160,%r23161,%r23162,%r23163,%r23164,%r23165,%r23166,%r23167,%r23168,%r23169,%r23170,%r23171,%r23172,%r23173,%r23174,%r23175,%r23176,%r23177,%r23178,%r23179,%r23180,%r23181,%r23182,%r23183,%r23184,%r23185,%r23186,%r23187,%r23188,%r23189,%r23190,%r23191,%r23192,%r23193,%r23194,%r23195,%r23196,%r23197,%r23198,%r23199,%r23200,%r23201,%r23202,%r23203,%r23204,%r23205,%r23206,%r23207,%r23208,%r23209,%r23210,%r23211,%r23212,%r23213,%r23214,%r23215,%r23216,%r23217,%r23218,%r23219,%r23220,%r23221,%r23222,%r23223,%r23224,%r23225,%r23226,%r23227,%r23228,%r23229,%r23230,%r23231,%r23232,%r23233,%r23234,%r23235,%r23236,%r23237,%r23238,%r23239,%r23240,%r23241,%r23242,%r23243,%r23244,%r23245,%r23246,%r23247,%r23248,%r4561,%r4562,%r4563 2026-02-21T09:19:14.1480024Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:19:14.1480085Z // end inline asm 2026-02-21T09:19:14.1480144Z $L__tmp2: 2026-02-21T09:19:14.1480362Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1480446Z add.s32 %r6453, %r22967, 81920; 2026-02-21T09:19:14.1480515Z add.s32 %r6454, %r6453, %r6434; 2026-02-21T09:19:14.1480725Z .loc 1 55 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:55:32 2026-02-21T09:19:14.1480796Z add.s32 %r6455, %r6454, %r157; 2026-02-21T09:19:14.1480865Z ld.shared.b16 %rs73, [%r6455]; 2026-02-21T09:19:14.1480936Z ld.shared.b16 %rs74, [%r6455+256]; 2026-02-21T09:19:14.1481013Z ld.shared.b16 %rs75, [%r6455+16]; 2026-02-21T09:19:14.1481081Z ld.shared.b16 %rs76, [%r6455+272]; 2026-02-21T09:19:14.1481154Z ld.shared.b16 %rs77, [%r6455+4096]; 2026-02-21T09:19:14.1481222Z ld.shared.b16 %rs78, [%r6455+4352]; 2026-02-21T09:19:14.1481365Z 
ld.shared.b16 %rs79, [%r6455+4112]; 2026-02-21T09:19:14.1481433Z ld.shared.b16 %rs80, [%r6455+4368]; 2026-02-21T09:19:14.1481500Z ld.shared.b16 %rs81, [%r6455+8192]; 2026-02-21T09:19:14.1481572Z ld.shared.b16 %rs82, [%r6455+8448]; 2026-02-21T09:19:14.1481640Z ld.shared.b16 %rs83, [%r6455+8208]; 2026-02-21T09:19:14.1481707Z ld.shared.b16 %rs84, [%r6455+8464]; 2026-02-21T09:19:14.1481783Z ld.shared.b16 %rs85, [%r6455+12288]; 2026-02-21T09:19:14.1481853Z ld.shared.b16 %rs86, [%r6455+12544]; 2026-02-21T09:19:14.1481922Z ld.shared.b16 %rs87, [%r6455+12304]; 2026-02-21T09:19:14.1481988Z ld.shared.b16 %rs88, [%r6455+12560]; 2026-02-21T09:19:14.1482058Z add.s32 %r6456, %r6454, %r158; 2026-02-21T09:19:14.1482129Z ld.shared.b16 %rs89, [%r6456]; 2026-02-21T09:19:14.1482264Z ld.shared.b16 %rs90, [%r6456+256]; 2026-02-21T09:19:14.1482342Z ld.shared.b16 %rs91, [%r6456+16]; 2026-02-21T09:19:14.1482409Z ld.shared.b16 %rs92, [%r6456+272]; 2026-02-21T09:19:14.1482479Z ld.shared.b16 %rs93, [%r6456+4096]; 2026-02-21T09:19:14.1482548Z ld.shared.b16 %rs94, [%r6456+4352]; 2026-02-21T09:19:14.1482626Z ld.shared.b16 %rs95, [%r6456+4112]; 2026-02-21T09:19:14.1482695Z ld.shared.b16 %rs96, [%r6456+4368]; 2026-02-21T09:19:14.1482761Z ld.shared.b16 %rs97, [%r6456+8192]; 2026-02-21T09:19:14.1482837Z ld.shared.b16 %rs98, [%r6456+8448]; 2026-02-21T09:19:14.1482909Z ld.shared.b16 %rs99, [%r6456+8208]; 2026-02-21T09:19:14.1482978Z ld.shared.b16 %rs100, [%r6456+8464]; 2026-02-21T09:19:14.1483057Z ld.shared.b16 %rs101, [%r6456+12288]; 2026-02-21T09:19:14.1483126Z ld.shared.b16 %rs102, [%r6456+12544]; 2026-02-21T09:19:14.1483195Z ld.shared.b16 %rs103, [%r6456+12304]; 2026-02-21T09:19:14.1483267Z ld.shared.b16 %rs104, [%r6456+12560]; 2026-02-21T09:19:14.1483395Z cvt.f32.bf16 %r4951, %rs73; 2026-02-21T09:19:14.1483462Z cvt.f32.bf16 %r4952, %rs74; 2026-02-21T09:19:14.1483528Z cvt.f32.bf16 %r4953, %rs89; 2026-02-21T09:19:14.1483597Z cvt.f32.bf16 %r4954, %rs90; 2026-02-21T09:19:14.1483664Z cvt.f32.bf16 
%r5083, %rs75; 2026-02-21T09:19:14.1483730Z cvt.f32.bf16 %r5084, %rs76; 2026-02-21T09:19:14.1483794Z cvt.f32.bf16 %r5085, %rs91; 2026-02-21T09:19:14.1483865Z cvt.f32.bf16 %r5086, %rs92; 2026-02-21T09:19:14.1483930Z cvt.f32.bf16 %r5215, %rs77; 2026-02-21T09:19:14.1483994Z cvt.f32.bf16 %r5216, %rs78; 2026-02-21T09:19:14.1484063Z cvt.f32.bf16 %r5217, %rs93; 2026-02-21T09:19:14.1484126Z cvt.f32.bf16 %r5218, %rs94; 2026-02-21T09:19:14.1484188Z cvt.f32.bf16 %r5347, %rs79; 2026-02-21T09:19:14.1484255Z cvt.f32.bf16 %r5348, %rs80; 2026-02-21T09:19:14.1484325Z cvt.f32.bf16 %r5349, %rs95; 2026-02-21T09:19:14.1484387Z cvt.f32.bf16 %r5350, %rs96; 2026-02-21T09:19:14.1484449Z cvt.f32.bf16 %r5479, %rs81; 2026-02-21T09:19:14.1484519Z cvt.f32.bf16 %r5480, %rs82; 2026-02-21T09:19:14.1484636Z cvt.f32.bf16 %r5481, %rs97; 2026-02-21T09:19:14.1484702Z cvt.f32.bf16 %r5482, %rs98; 2026-02-21T09:19:14.1484768Z cvt.f32.bf16 %r5611, %rs83; 2026-02-21T09:19:14.1484832Z cvt.f32.bf16 %r5612, %rs84; 2026-02-21T09:19:14.1484894Z cvt.f32.bf16 %r5613, %rs99; 2026-02-21T09:19:14.1484959Z cvt.f32.bf16 %r5614, %rs100; 2026-02-21T09:19:14.1485028Z cvt.f32.bf16 %r5743, %rs85; 2026-02-21T09:19:14.1485091Z cvt.f32.bf16 %r5744, %rs86; 2026-02-21T09:19:14.1485156Z cvt.f32.bf16 %r5745, %rs101; 2026-02-21T09:19:14.1485224Z cvt.f32.bf16 %r5746, %rs102; 2026-02-21T09:19:14.1485286Z cvt.f32.bf16 %r5875, %rs87; 2026-02-21T09:19:14.1485349Z cvt.f32.bf16 %r5876, %rs88; 2026-02-21T09:19:14.1485412Z cvt.f32.bf16 %r5877, %rs103; 2026-02-21T09:19:14.1485495Z cvt.f32.bf16 %r5878, %rs104; 2026-02-21T09:19:14.1485701Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1485771Z add.s32 %r6457, %r6440, 177152; 2026-02-21T09:19:14.1485979Z .loc 1 70 45 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:70:45 2026-02-21T09:19:14.1486046Z add.s32 %r6458, %r6457, %r22973; 2026-02-21T09:19:14.1486169Z ld.shared.b8 %rs105, [%r6458]; 2026-02-21T09:19:14.1486241Z ld.shared.b8 
%rs106, [%r6458+128]; 2026-02-21T09:19:14.1486308Z ld.shared.b8 %rs107, [%r6458+256]; 2026-02-21T09:19:14.1486374Z ld.shared.b8 %rs108, [%r6458+384]; 2026-02-21T09:19:14.1486439Z ld.shared.b8 %rs109, [%r6458+512]; 2026-02-21T09:19:14.1486626Z ld.shared.b8 %rs110, [%r6458+640]; 2026-02-21T09:19:14.1486696Z ld.shared.b8 %rs111, [%r6458+768]; 2026-02-21T09:19:14.1486762Z add.s32 %r6459, %r6457, %r22974; 2026-02-21T09:19:14.1486834Z ld.shared.b8 %rs112, [%r6459]; 2026-02-21T09:19:14.1487039Z .loc 1 60 28 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:60:28 2026-02-21T09:19:14.1487106Z shl.b16 %rs113, %rs105, 4; 2026-02-21T09:19:14.1487248Z shl.b16 %rs114, %rs106, 4; 2026-02-21T09:19:14.1487321Z shl.b16 %rs115, %rs107, 4; 2026-02-21T09:19:14.1487385Z shl.b16 %rs116, %rs108, 4; 2026-02-21T09:19:14.1487448Z shl.b16 %rs117, %rs109, 4; 2026-02-21T09:19:14.1487519Z shl.b16 %rs118, %rs110, 4; 2026-02-21T09:19:14.1487581Z shl.b16 %rs119, %rs111, 4; 2026-02-21T09:19:14.1487644Z shl.b16 %rs120, %rs112, 4; 2026-02-21T09:19:14.1487847Z .loc 1 75 58 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:75:58 2026-02-21T09:19:14.1487926Z selp.b16 %rs121, %rs113, %rs105, %p110; 2026-02-21T09:19:14.1487991Z cvt.s16.s8 %rs122, %rs121; 2026-02-21T09:19:14.1488055Z shr.s16 %rs123, %rs122, 4; 2026-02-21T09:19:14.1488134Z selp.b16 %rs124, %rs114, %rs106, %p110; 2026-02-21T09:19:14.1488197Z cvt.s16.s8 %rs125, %rs124; 2026-02-21T09:19:14.1488261Z shr.s16 %rs126, %rs125, 4; 2026-02-21T09:19:14.1488339Z selp.b16 %rs127, %rs115, %rs107, %p110; 2026-02-21T09:19:14.1488479Z cvt.s16.s8 %rs128, %rs127; 2026-02-21T09:19:14.1488543Z shr.s16 %rs129, %rs128, 4; 2026-02-21T09:19:14.1488629Z selp.b16 %rs130, %rs116, %rs108, %p110; 2026-02-21T09:19:14.1488701Z cvt.s16.s8 %rs131, %rs130; 2026-02-21T09:19:14.1488765Z shr.s16 %rs132, %rs131, 4; 2026-02-21T09:19:14.1488838Z selp.b16 %rs133, %rs117, %rs109, %p110; 2026-02-21T09:19:14.1488907Z cvt.s16.s8 %rs134, %rs133; 
2026-02-21T09:19:14.1488970Z shr.s16 %rs135, %rs134, 4; 2026-02-21T09:19:14.1489041Z selp.b16 %rs136, %rs118, %rs110, %p110; 2026-02-21T09:19:14.1489105Z cvt.s16.s8 %rs137, %rs136; 2026-02-21T09:19:14.1489174Z shr.s16 %rs138, %rs137, 4; 2026-02-21T09:19:14.1489245Z selp.b16 %rs139, %rs119, %rs111, %p110; 2026-02-21T09:19:14.1489309Z cvt.s16.s8 %rs140, %rs139; 2026-02-21T09:19:14.1489376Z shr.s16 %rs141, %rs140, 4; 2026-02-21T09:19:14.1489446Z selp.b16 %rs142, %rs120, %rs112, %p110; 2026-02-21T09:19:14.1489511Z cvt.s16.s8 %rs143, %rs142; 2026-02-21T09:19:14.1489581Z shr.s16 %rs144, %rs143, 4; 2026-02-21T09:19:14.1489855Z .loc 1 80 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:80:32 2026-02-21T09:19:14.1489930Z cvt.rn.f32.s16 %r6460, %rs123; 2026-02-21T09:19:14.1489997Z cvt.rn.f32.s16 %r6461, %rs126; 2026-02-21T09:19:14.1490070Z cvt.rn.f32.s16 %r6462, %rs129; 2026-02-21T09:19:14.1490135Z cvt.rn.f32.s16 %r6463, %rs132; 2026-02-21T09:19:14.1490199Z cvt.rn.f32.s16 %r6464, %rs135; 2026-02-21T09:19:14.1490268Z cvt.rn.f32.s16 %r6465, %rs138; 2026-02-21T09:19:14.1490333Z cvt.rn.f32.s16 %r6466, %rs141; 2026-02-21T09:19:14.1490398Z cvt.rn.f32.s16 %r6467, %rs144; 2026-02-21T09:19:14.1490465Z bar.sync 0; 2026-02-21T09:19:14.1490539Z st.shared.b32 [%r161], %r6460; 2026-02-21T09:19:14.1490609Z st.shared.b32 [%r161+8], %r6461; 2026-02-21T09:19:14.1490676Z st.shared.b32 [%r162], %r6462; 2026-02-21T09:19:14.1490750Z st.shared.b32 [%r162+8], %r6463; 2026-02-21T09:19:14.1490817Z st.shared.b32 [%r163], %r6464; 2026-02-21T09:19:14.1490883Z st.shared.b32 [%r163+8], %r6465; 2026-02-21T09:19:14.1490958Z st.shared.b32 [%r164], %r6466; 2026-02-21T09:19:14.1491027Z st.shared.b32 [%r164+8], %r6467; 2026-02-21T09:19:14.1491085Z $L__tmp3: 2026-02-21T09:19:14.1491367Z .loc 2 291 36 // standard.py:291:36 @[ co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:87:40 ] 2026-02-21T09:19:14.1491518Z // begin inline asm 2026-02-21T09:19:14.1491603Z fence.proxy.async.shared::cta; 
2026-02-21T09:19:14.1491664Z // end inline asm 2026-02-21T09:19:14.1491729Z bar.sync 0; 2026-02-21T09:19:14.1491804Z wgmma.fence.sync.aligned; 2026-02-21T09:19:14.1491866Z // begin inline asm 2026-02-21T09:19:14.1493385Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r22993,%r22994,%r22995,%r22996,%r22997,%r22998,%r22999,%r23000,%r23001,%r23002,%r23003,%r23004,%r23005,%r23006,%r23007,%r23008,%r23009,%r23010,%r23011,%r23012,%r23013,%r23014,%r23015,%r23016,%r23017,%r23018,%r23019,%r23020,%r23021,%r23022,%r23023,%r23024,%r23025,%r23026,%r23027,%r23028,%r23029,%r23030,%r23031,%r23032,%r23033,%r23034,%r23035,%r23036,%r23037,%r23038,%r23039,%r23040,%r23041,%r23042,%r23043,%r23044,%r23045,%r23046,%r23047,%r23048,%r23049,%r23050,%r23051,%r23052,%r23053,%r23054,%r23055,%r23056}, {%r4951,%r4952,%r4953,%r4954}, %rd1, %p2, 1, 1; 2026-02-21T09:19:14.1493455Z // end inline asm 2026-02-21T09:19:14.1493520Z // begin inline asm 2026-02-21T09:19:14.1494974Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r22993,%r22994,%r22995,%r22996,%r22997,%r22998,%r22999,%r23000,%r23001,%r23002,%r23003,%r23004,%r23005,%r23006,%r23007,%r23008,%r23009,%r23010,%r23011,%r23012,%r23013,%r23014,%r23015,%r23016,%r23017,%r23018,%r23019,%r23020,%r23021,%r23022,%r23023,%r23024,%r23025,%r23026,%r23027,%r23028,%r23029,%r23030,%r23031,%r23032,%r23033,%r23034,%r23035,%r23036,%r23037,%r23038,%r23039,%r23040,%r23041,%r23042,%r23043,%r23044,%r23045,%r23046,%r23047,%r23048,%r23049,%r23050,%r23051,%r23052,%r23053,%r23054,%r23055,%r23056}, {%r5083,%r5084,%r5085,%r5086}, %rd2, %p2, 1, 1; 2026-02-21T09:19:14.1495083Z // end inline asm 2026-02-21T09:19:14.1495147Z // begin inline asm 2026-02-21T09:19:14.1496722Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r23057,%r23058,%r23059,%r23060,%r23061,%r23062,%r23063,%r23064,%r23065,%r23066,%r23067,%r23068,%r23069,%r23070,%r23071,%r23072,%r23073,%r23074,%r23075,%r23076,%r23077,%r23078,%r23079,%r23080,%r23081,%r23082,%r23083,%r23084,%r23085,%r23086,%r23087,%r23088,%r23089,%r23090,%r23091,%r23092,%r23093,%r23094,%r23095,%r23096,%r23097,%r23098,%r23099,%r23100,%r23101,%r23102,%r23103,%r23104,%r23105,%r23106,%r23107,%r23108,%r23109,%r23110,%r23111,%r23112,%r23113,%r23114,%r23115,%r23116,%r23117,%r23118,%r23119,%r23120}, {%r5215,%r5216,%r5217,%r5218}, %rd1, %p2, 1, 1; 2026-02-21T09:19:14.1496791Z // end inline asm 2026-02-21T09:19:14.1496857Z // begin inline asm 2026-02-21T09:19:14.1498378Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23057,%r23058,%r23059,%r23060,%r23061,%r23062,%r23063,%r23064,%r23065,%r23066,%r23067,%r23068,%r23069,%r23070,%r23071,%r23072,%r23073,%r23074,%r23075,%r23076,%r23077,%r23078,%r23079,%r23080,%r23081,%r23082,%r23083,%r23084,%r23085,%r23086,%r23087,%r23088,%r23089,%r23090,%r23091,%r23092,%r23093,%r23094,%r23095,%r23096,%r23097,%r23098,%r23099,%r23100,%r23101,%r23102,%r23103,%r23104,%r23105,%r23106,%r23107,%r23108,%r23109,%r23110,%r23111,%r23112,%r23113,%r23114,%r23115,%r23116,%r23117,%r23118,%r23119,%r23120}, {%r5347,%r5348,%r5349,%r5350}, %rd2, %p2, 1, 1; 2026-02-21T09:19:14.1498444Z // end inline asm 2026-02-21T09:19:14.1498514Z // begin inline asm 2026-02-21T09:19:14.1499957Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23121,%r23122,%r23123,%r23124,%r23125,%r23126,%r23127,%r23128,%r23129,%r23130,%r23131,%r23132,%r23133,%r23134,%r23135,%r23136,%r23137,%r23138,%r23139,%r23140,%r23141,%r23142,%r23143,%r23144,%r23145,%r23146,%r23147,%r23148,%r23149,%r23150,%r23151,%r23152,%r23153,%r23154,%r23155,%r23156,%r23157,%r23158,%r23159,%r23160,%r23161,%r23162,%r23163,%r23164,%r23165,%r23166,%r23167,%r23168,%r23169,%r23170,%r23171,%r23172,%r23173,%r23174,%r23175,%r23176,%r23177,%r23178,%r23179,%r23180,%r23181,%r23182,%r23183,%r23184}, 
{%r5479,%r5480,%r5481,%r5482}, %rd1, %p2, 1, 1; 2026-02-21T09:19:14.1500023Z // end inline asm 2026-02-21T09:19:14.1500149Z // begin inline asm 2026-02-21T09:19:14.1501594Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23121,%r23122,%r23123,%r23124,%r23125,%r23126,%r23127,%r23128,%r23129,%r23130,%r23131,%r23132,%r23133,%r23134,%r23135,%r23136,%r23137,%r23138,%r23139,%r23140,%r23141,%r23142,%r23143,%r23144,%r23145,%r23146,%r23147,%r23148,%r23149,%r23150,%r23151,%r23152,%r23153,%r23154,%r23155,%r23156,%r23157,%r23158,%r23159,%r23160,%r23161,%r23162,%r23163,%r23164,%r23165,%r23166,%r23167,%r23168,%r23169,%r23170,%r23171,%r23172,%r23173,%r23174,%r23175,%r23176,%r23177,%r23178,%r23179,%r23180,%r23181,%r23182,%r23183,%r23184}, {%r5611,%r5612,%r5613,%r5614}, %rd2, %p2, 1, 1; 2026-02-21T09:19:14.1501662Z // end inline asm 2026-02-21T09:19:14.1501723Z // begin inline asm 2026-02-21T09:19:14.1503252Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23185,%r23186,%r23187,%r23188,%r23189,%r23190,%r23191,%r23192,%r23193,%r23194,%r23195,%r23196,%r23197,%r23198,%r23199,%r23200,%r23201,%r23202,%r23203,%r23204,%r23205,%r23206,%r23207,%r23208,%r23209,%r23210,%r23211,%r23212,%r23213,%r23214,%r23215,%r23216,%r23217,%r23218,%r23219,%r23220,%r23221,%r23222,%r23223,%r23224,%r23225,%r23226,%r23227,%r23228,%r23229,%r23230,%r23231,%r23232,%r23233,%r23234,%r23235,%r23236,%r23237,%r23238,%r23239,%r23240,%r23241,%r23242,%r23243,%r23244,%r23245,%r23246,%r23247,%r23248}, {%r5743,%r5744,%r5745,%r5746}, %rd1, %p2, 1, 1; 2026-02-21T09:19:14.1503327Z // end inline asm 2026-02-21T09:19:14.1503388Z // begin inline asm 2026-02-21T09:19:14.1504843Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r23185,%r23186,%r23187,%r23188,%r23189,%r23190,%r23191,%r23192,%r23193,%r23194,%r23195,%r23196,%r23197,%r23198,%r23199,%r23200,%r23201,%r23202,%r23203,%r23204,%r23205,%r23206,%r23207,%r23208,%r23209,%r23210,%r23211,%r23212,%r23213,%r23214,%r23215,%r23216,%r23217,%r23218,%r23219,%r23220,%r23221,%r23222,%r23223,%r23224,%r23225,%r23226,%r23227,%r23228,%r23229,%r23230,%r23231,%r23232,%r23233,%r23234,%r23235,%r23236,%r23237,%r23238,%r23239,%r23240,%r23241,%r23242,%r23243,%r23244,%r23245,%r23246,%r23247,%r23248}, {%r5875,%r5876,%r5877,%r5878}, %rd2, %p2, 1, 1; 2026-02-21T09:19:14.1504974Z // end inline asm 2026-02-21T09:19:14.1505055Z wgmma.commit_group.sync.aligned; 2026-02-21T09:19:14.1505123Z mov.b32 %r6136, %r6137; 2026-02-21T09:19:14.1505186Z mov.b32 %r6135, %r2973; 2026-02-21T09:19:14.1505247Z // begin inline asm 2026-02-21T09:19:14.1510367Z // wait for regs: %r22993,%r22994,%r22995,%r22996,%r22997,%r22998,%r22999,%r23000,%r23001,%r23002,%r23003,%r23004,%r23005,%r23006,%r23007,%r23008,%r23009,%r23010,%r23011,%r23012,%r23013,%r23014,%r23015,%r23016,%r23017,%r23018,%r23019,%r23020,%r23021,%r23022,%r23023,%r23024,%r23025,%r23026,%r23027,%r23028,%r23029,%r23030,%r23031,%r23032,%r23033,%r23034,%r23035,%r23036,%r23037,%r23038,%r23039,%r23040,%r23041,%r23042,%r23043,%r23044,%r23045,%r23046,%r23047,%r23048,%r23049,%r23050,%r23051,%r23052,%r23053,%r23054,%r23055,%r23056,%r23057,%r23058,%r23059,%r23060,%r23061,%r23062,%r23063,%r23064,%r23065,%r23066,%r23067,%r23068,%r23069,%r23070,%r23071,%r23072,%r23073,%r23074,%r23075,%r23076,%r23077,%r23078,%r23079,%r23080,%r23081,%r23082,%r23083,%r23084,%r23085,%r23086,%r23087,%r23088,%r23089,%r23090,%r23091,%r23092,%r23093,%r23094,%r23095,%r23096,%r23097,%r23098,%r23099,%r23100,%r23101,%r23102,%r23103,%r23104,%r23105,%r23106,%r23107,%r23108,%r23109,%r23110,%r23111,%r23112,%r23113,%r23114,%r23115,%r23116,%r23117,%r23118,%r23119,%r23120,%r23121,%r23122,%r23123,%r23124,%r23125,%r23126,%r23127,%r23128,%r23129,%r23130,%r23131,%r23132,%r2313
3,%r23134,%r23135,%r23136,%r23137,%r23138,%r23139,%r23140,%r23141,%r23142,%r23143,%r23144,%r23145,%r23146,%r23147,%r23148,%r23149,%r23150,%r23151,%r23152,%r23153,%r23154,%r23155,%r23156,%r23157,%r23158,%r23159,%r23160,%r23161,%r23162,%r23163,%r23164,%r23165,%r23166,%r23167,%r23168,%r23169,%r23170,%r23171,%r23172,%r23173,%r23174,%r23175,%r23176,%r23177,%r23178,%r23179,%r23180,%r23181,%r23182,%r23183,%r23184,%r23185,%r23186,%r23187,%r23188,%r23189,%r23190,%r23191,%r23192,%r23193,%r23194,%r23195,%r23196,%r23197,%r23198,%r23199,%r23200,%r23201,%r23202,%r23203,%r23204,%r23205,%r23206,%r23207,%r23208,%r23209,%r23210,%r23211,%r23212,%r23213,%r23214,%r23215,%r23216,%r23217,%r23218,%r23219,%r23220,%r23221,%r23222,%r23223,%r23224,%r23225,%r23226,%r23227,%r23228,%r23229,%r23230,%r23231,%r23232,%r23233,%r23234,%r23235,%r23236,%r23237,%r23238,%r23239,%r23240,%r23241,%r23242,%r23243,%r23244,%r23245,%r23246,%r23247,%r23248,%r6135,%r6136,%r6137 2026-02-21T09:19:14.1510527Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:19:14.1510588Z // end inline asm 2026-02-21T09:19:14.1510645Z $L__tmp4: 2026-02-21T09:19:14.1510868Z .loc 1 43 78 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:43:78 2026-02-21T09:19:14.1510934Z add.s32 %r6468, %r22992, 1; 2026-02-21T09:19:14.1511073Z setp.gt.s32 %p21, %r6468, 4; 2026-02-21T09:19:14.1511153Z selp.b32 %r22992, 0, %r6468, %p21; 2026-02-21T09:19:14.1511363Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1511436Z add.s32 %r6469, %r22989, -16; 2026-02-21T09:19:14.1511506Z add.s64 %rd236, %rd1114, %rd14; 2026-02-21T09:19:14.1511578Z add.s64 %rd218, %rd236, 320; 2026-02-21T09:19:14.1511643Z add.s64 %rd237, %rd1114, %rd13; 2026-02-21T09:19:14.1511708Z add.s64 %rd219, %rd237, 320; 2026-02-21T09:19:14.1511777Z add.s64 %rd238, %rd1114, %rd12; 2026-02-21T09:19:14.1511841Z add.s64 %rd220, %rd238, 320; 2026-02-21T09:19:14.1511904Z add.s64 %rd239, %rd1114, %rd11; 2026-02-21T09:19:14.1511970Z add.s64 
%rd221, %rd239, 320; 2026-02-21T09:19:14.1512041Z add.s64 %rd240, %rd1114, %rd10; 2026-02-21T09:19:14.1512105Z add.s64 %rd222, %rd240, 320; 2026-02-21T09:19:14.1512171Z add.s64 %rd241, %rd1114, %rd9; 2026-02-21T09:19:14.1512312Z add.s64 %rd223, %rd241, 320; 2026-02-21T09:19:14.1512380Z add.s64 %rd242, %rd1114, %rd8; 2026-02-21T09:19:14.1512443Z add.s64 %rd224, %rd242, 320; 2026-02-21T09:19:14.1512524Z mad.wide.s32 %rd225, %r6469, 2, %rd64; 2026-02-21T09:19:14.1512731Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1512795Z shl.b32 %r6470, %r22992, 14; 2026-02-21T09:19:14.1512862Z add.s32 %r6471, %r22967, %r6470; 2026-02-21T09:19:14.1512944Z add.s32 %r6397, %r6471, %r56; 2026-02-21T09:19:14.1513013Z selp.b32 %r6398, 8, 0, %p19; 2026-02-21T09:19:14.1513077Z // begin inline asm 2026-02-21T09:19:14.1513231Z cp.async.ca.shared.global [ %r6397 + 0 ], [ %rd218 + 0 ], 0x8, %r6398; 2026-02-21T09:19:14.1513292Z // end inline asm 2026-02-21T09:19:14.1513354Z add.s32 %r6399, %r6397, 2048; 2026-02-21T09:19:14.1513415Z // begin inline asm 2026-02-21T09:19:14.1513561Z cp.async.ca.shared.global [ %r6399 + 0 ], [ %rd219 + 0 ], 0x8, %r6398; 2026-02-21T09:19:14.1513625Z // end inline asm 2026-02-21T09:19:14.1513740Z add.s32 %r6401, %r6397, 4096; 2026-02-21T09:19:14.1513809Z // begin inline asm 2026-02-21T09:19:14.1513942Z cp.async.ca.shared.global [ %r6401 + 0 ], [ %rd220 + 0 ], 0x8, %r6398; 2026-02-21T09:19:14.1514004Z // end inline asm 2026-02-21T09:19:14.1514072Z add.s32 %r6403, %r6397, 6144; 2026-02-21T09:19:14.1514132Z // begin inline asm 2026-02-21T09:19:14.1514269Z cp.async.ca.shared.global [ %r6403 + 0 ], [ %rd221 + 0 ], 0x8, %r6398; 2026-02-21T09:19:14.1514328Z // end inline asm 2026-02-21T09:19:14.1514398Z add.s32 %r6405, %r6397, 8192; 2026-02-21T09:19:14.1514458Z // begin inline asm 2026-02-21T09:19:14.1514591Z cp.async.ca.shared.global [ %r6405 + 0 ], [ %rd222 + 0 ], 0x8, %r6398; 2026-02-21T09:19:14.1514655Z // end inline 
asm 2026-02-21T09:19:14.1514719Z add.s32 %r6407, %r6397, 10240; 2026-02-21T09:19:14.1514780Z // begin inline asm 2026-02-21T09:19:14.1514911Z cp.async.ca.shared.global [ %r6407 + 0 ], [ %rd223 + 0 ], 0x8, %r6398; 2026-02-21T09:19:14.1514980Z // end inline asm 2026-02-21T09:19:14.1515046Z add.s32 %r6409, %r6397, 12288; 2026-02-21T09:19:14.1515107Z // begin inline asm 2026-02-21T09:19:14.1515246Z cp.async.ca.shared.global [ %r6409 + 0 ], [ %rd224 + 0 ], 0x8, %r6398; 2026-02-21T09:19:14.1515366Z // end inline asm 2026-02-21T09:19:14.1515428Z add.s32 %r6411, %r6397, 14336; 2026-02-21T09:19:14.1515497Z // begin inline asm 2026-02-21T09:19:14.1515630Z cp.async.ca.shared.global [ %r6411 + 0 ], [ %rd225 + 0 ], 0x8, %r6398; 2026-02-21T09:19:14.1515690Z // end inline asm 2026-02-21T09:19:14.1515760Z cp.async.commit_group; 2026-02-21T09:19:14.1515980Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1516054Z add.s32 %r6472, %r22990, -65536; 2026-02-21T09:19:14.1516123Z cvt.s64.s32 %rd243, %r6472; 2026-02-21T09:19:14.1516197Z add.s64 %rd226, %rd65, %rd243; 2026-02-21T09:19:14.1516571Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1516650Z shl.b32 %r6473, %r22992, 10; 2026-02-21T09:19:14.1516715Z add.s32 %r6413, %r66, %r6473; 2026-02-21T09:19:14.1516789Z selp.b32 %r6414, 4, 0, %p19; 2026-02-21T09:19:14.1516854Z // begin inline asm 2026-02-21T09:19:14.1516995Z cp.async.ca.shared.global [ %r6413 + 0 ], [ %rd226 + 0 ], 0x4, %r6414; 2026-02-21T09:19:14.1517062Z // end inline asm 2026-02-21T09:19:14.1517131Z cp.async.commit_group; 2026-02-21T09:19:14.1517333Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1517403Z add.s64 %rd227, %rd236, 352; 2026-02-21T09:19:14.1517468Z add.s64 %rd228, %rd237, 352; 2026-02-21T09:19:14.1517533Z add.s64 %rd229, %rd238, 352; 2026-02-21T09:19:14.1517597Z add.s64 %rd230, %rd239, 352; 
2026-02-21T09:19:14.1517675Z add.s64 %rd231, %rd240, 352; 2026-02-21T09:19:14.1517743Z add.s64 %rd232, %rd241, 352; 2026-02-21T09:19:14.1517888Z add.s64 %rd233, %rd242, 352; 2026-02-21T09:19:14.1517973Z mad.wide.s32 %rd234, %r22989, 2, %rd64; 2026-02-21T09:19:14.1518178Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1518245Z add.s32 %r6474, %r6453, %r6470; 2026-02-21T09:19:14.1518312Z add.s32 %r6415, %r6474, %r56; 2026-02-21T09:19:14.1518375Z // begin inline asm 2026-02-21T09:19:14.1518514Z cp.async.ca.shared.global [ %r6415 + 0 ], [ %rd227 + 0 ], 0x8, %r6398; 2026-02-21T09:19:14.1518574Z // end inline asm 2026-02-21T09:19:14.1518642Z add.s32 %r6417, %r6415, 2048; 2026-02-21T09:19:14.1518702Z // begin inline asm 2026-02-21T09:19:14.1518835Z cp.async.ca.shared.global [ %r6417 + 0 ], [ %rd228 + 0 ], 0x8, %r6398; 2026-02-21T09:19:14.1518897Z // end inline asm 2026-02-21T09:19:14.1518959Z add.s32 %r6419, %r6415, 4096; 2026-02-21T09:19:14.1519020Z // begin inline asm 2026-02-21T09:19:14.1519151Z cp.async.ca.shared.global [ %r6419 + 0 ], [ %rd229 + 0 ], 0x8, %r6398; 2026-02-21T09:19:14.1519219Z // end inline asm 2026-02-21T09:19:14.1519348Z add.s32 %r6421, %r6415, 6144; 2026-02-21T09:19:14.1519412Z // begin inline asm 2026-02-21T09:19:14.1519551Z cp.async.ca.shared.global [ %r6421 + 0 ], [ %rd230 + 0 ], 0x8, %r6398; 2026-02-21T09:19:14.1519611Z // end inline asm 2026-02-21T09:19:14.1519672Z add.s32 %r6423, %r6415, 8192; 2026-02-21T09:19:14.1519732Z // begin inline asm 2026-02-21T09:19:14.1519868Z cp.async.ca.shared.global [ %r6423 + 0 ], [ %rd231 + 0 ], 0x8, %r6398; 2026-02-21T09:19:14.1519926Z // end inline asm 2026-02-21T09:19:14.1519990Z add.s32 %r6425, %r6415, 10240; 2026-02-21T09:19:14.1520058Z // begin inline asm 2026-02-21T09:19:14.1520189Z cp.async.ca.shared.global [ %r6425 + 0 ], [ %rd232 + 0 ], 0x8, %r6398; 2026-02-21T09:19:14.1520248Z // end inline asm 2026-02-21T09:19:14.1520317Z add.s32 %r6427, %r6415, 
12288; 2026-02-21T09:19:14.1520377Z // begin inline asm 2026-02-21T09:19:14.1520510Z cp.async.ca.shared.global [ %r6427 + 0 ], [ %rd233 + 0 ], 0x8, %r6398; 2026-02-21T09:19:14.1520572Z // end inline asm 2026-02-21T09:19:14.1520643Z add.s32 %r6429, %r6415, 14336; 2026-02-21T09:19:14.1520705Z // begin inline asm 2026-02-21T09:19:14.1520838Z cp.async.ca.shared.global [ %r6429 + 0 ], [ %rd234 + 0 ], 0x8, %r6398; 2026-02-21T09:19:14.1520973Z // end inline asm 2026-02-21T09:19:14.1521042Z cp.async.commit_group; 2026-02-21T09:19:14.1521242Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1521310Z cvt.s64.s32 %rd244, %r22990; 2026-02-21T09:19:14.1521383Z add.s64 %rd235, %rd65, %rd244; 2026-02-21T09:19:14.1521583Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1521647Z add.s32 %r6431, %r76, %r6473; 2026-02-21T09:19:14.1521714Z // begin inline asm 2026-02-21T09:19:14.1521849Z cp.async.ca.shared.global [ %r6431 + 0 ], [ %rd235 + 0 ], 0x4, %r6414; 2026-02-21T09:19:14.1521908Z // end inline asm 2026-02-21T09:19:14.1522049Z cp.async.commit_group; 2026-02-21T09:19:14.1522255Z .loc 1 43 78 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:43:78 2026-02-21T09:19:14.1522322Z add.s32 %r22990, %r22990, 131072; 2026-02-21T09:19:14.1522390Z add.s64 %rd1114, %rd1114, 64; 2026-02-21T09:19:14.1522457Z add.s32 %r22989, %r22989, 32; 2026-02-21T09:19:14.1522527Z setp.lt.u64 %p22, %rd1115, 496; 2026-02-21T09:19:14.1522593Z @%p22 bra $L__BB0_3; 2026-02-21T09:19:14.1522713Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:19:14.1522917Z .loc 1 34 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:34:32 2026-02-21T09:19:14.1522986Z or.b32 %r6946, %r327, %r9; 2026-02-21T09:19:14.1523191Z .loc 1 36 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:36:32 2026-02-21T09:19:14.1523258Z or.b32 %r6947, %r328, %r19; 2026-02-21T09:19:14.1523319Z or.b32 %r6948, %r328, 
%r20; 2026-02-21T09:19:14.1523442Z or.b32 %r6949, %r328, %r21; 2026-02-21T09:19:14.1523512Z or.b32 %r6950, %r328, %r22; 2026-02-21T09:19:14.1523574Z or.b32 %r6951, %r328, %r23; 2026-02-21T09:19:14.1523635Z or.b32 %r6952, %r328, %r24; 2026-02-21T09:19:14.1523702Z or.b32 %r6953, %r328, %r25; 2026-02-21T09:19:14.1523765Z or.b32 %r6954, %r328, %r26; 2026-02-21T09:19:14.1523826Z or.b32 %r6955, %r328, %r27; 2026-02-21T09:19:14.1523893Z or.b32 %r6956, %r328, %r28; 2026-02-21T09:19:14.1523954Z or.b32 %r6957, %r328, %r29; 2026-02-21T09:19:14.1524016Z or.b32 %r6958, %r328, %r30; 2026-02-21T09:19:14.1524078Z or.b32 %r6959, %r328, %r31; 2026-02-21T09:19:14.1524147Z or.b32 %r6960, %r328, %r32; 2026-02-21T09:19:14.1524208Z or.b32 %r6961, %r328, %r33; 2026-02-21T09:19:14.1524270Z or.b32 %r6962, %r328, %r34; 2026-02-21T09:19:14.1524338Z or.b32 %r6963, %r328, %r35; 2026-02-21T09:19:14.1524398Z or.b32 %r6964, %r328, %r36; 2026-02-21T09:19:14.1524460Z or.b32 %r6965, %r328, %r37; 2026-02-21T09:19:14.1524526Z or.b32 %r6966, %r328, %r38; 2026-02-21T09:19:14.1524646Z or.b32 %r6967, %r328, %r39; 2026-02-21T09:19:14.1524711Z or.b32 %r6968, %r328, %r40; 2026-02-21T09:19:14.1524773Z or.b32 %r6969, %r328, %r41; 2026-02-21T09:19:14.1524852Z or.b32 %r6970, %r328, %r42; 2026-02-21T09:19:14.1524922Z or.b32 %r6971, %r328, %r43; 2026-02-21T09:19:14.1524988Z or.b32 %r6972, %r328, %r44; 2026-02-21T09:19:14.1525053Z or.b32 %r6973, %r328, %r45; 2026-02-21T09:19:14.1525121Z or.b32 %r6974, %r328, %r46; 2026-02-21T09:19:14.1525185Z or.b32 %r6975, %r328, %r47; 2026-02-21T09:19:14.1525247Z or.b32 %r6976, %r328, %r48; 2026-02-21T09:19:14.1525314Z or.b32 %r6977, %r328, %r49; 2026-02-21T09:19:14.1525378Z or.b32 %r6978, %r328, %r50; 2026-02-21T09:19:14.1525585Z .loc 1 43 78 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:43:78 2026-02-21T09:19:14.1525668Z cp.async.wait_group 0; 2026-02-21T09:19:14.1525729Z bar.sync 0; 2026-02-21T09:19:14.1525934Z .loc 1 90 28 // 
co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:90:28 2026-02-21T09:19:14.1526024Z cvt.rn.bf16x2.f32 %r6979, %r22994, %r22993; 2026-02-21T09:19:14.1526111Z cvt.rn.bf16x2.f32 %r6980, %r22996, %r22995; 2026-02-21T09:19:14.1526272Z cvt.rn.bf16x2.f32 %r6981, %r22998, %r22997; 2026-02-21T09:19:14.1526349Z cvt.rn.bf16x2.f32 %r6982, %r23000, %r22999; 2026-02-21T09:19:14.1526432Z cvt.rn.bf16x2.f32 %r6983, %r23002, %r23001; 2026-02-21T09:19:14.1526618Z cvt.rn.bf16x2.f32 %r6984, %r23004, %r23003; 2026-02-21T09:19:14.1526698Z cvt.rn.bf16x2.f32 %r6985, %r23006, %r23005; 2026-02-21T09:19:14.1526783Z cvt.rn.bf16x2.f32 %r6986, %r23008, %r23007; 2026-02-21T09:19:14.1526859Z cvt.rn.bf16x2.f32 %r6987, %r23010, %r23009; 2026-02-21T09:19:14.1526936Z cvt.rn.bf16x2.f32 %r6988, %r23012, %r23011; 2026-02-21T09:19:14.1527011Z cvt.rn.bf16x2.f32 %r6989, %r23014, %r23013; 2026-02-21T09:19:14.1527111Z cvt.rn.bf16x2.f32 %r6990, %r23016, %r23015; 2026-02-21T09:19:14.1527273Z cvt.rn.bf16x2.f32 %r6991, %r23018, %r23017; 2026-02-21T09:19:14.1527356Z cvt.rn.bf16x2.f32 %r6992, %r23020, %r23019; 2026-02-21T09:19:14.1527438Z cvt.rn.bf16x2.f32 %r6993, %r23022, %r23021; 2026-02-21T09:19:14.1527516Z cvt.rn.bf16x2.f32 %r6994, %r23024, %r23023; 2026-02-21T09:19:14.1527594Z cvt.rn.bf16x2.f32 %r6995, %r23026, %r23025; 2026-02-21T09:19:14.1527672Z cvt.rn.bf16x2.f32 %r6996, %r23028, %r23027; 2026-02-21T09:19:14.1527754Z cvt.rn.bf16x2.f32 %r6997, %r23030, %r23029; 2026-02-21T09:19:14.1527830Z cvt.rn.bf16x2.f32 %r6998, %r23032, %r23031; 2026-02-21T09:19:14.1527906Z cvt.rn.bf16x2.f32 %r6999, %r23034, %r23033; 2026-02-21T09:19:14.1527987Z cvt.rn.bf16x2.f32 %r7000, %r23036, %r23035; 2026-02-21T09:19:14.1528064Z cvt.rn.bf16x2.f32 %r7001, %r23038, %r23037; 2026-02-21T09:19:14.1528141Z cvt.rn.bf16x2.f32 %r7002, %r23040, %r23039; 2026-02-21T09:19:14.1528223Z cvt.rn.bf16x2.f32 %r7003, %r23042, %r23041; 2026-02-21T09:19:14.1528300Z cvt.rn.bf16x2.f32 %r7004, %r23044, %r23043; 2026-02-21T09:19:14.1528449Z 
cvt.rn.bf16x2.f32 %r7005, %r23046, %r23045; 2026-02-21T09:19:14.1528530Z cvt.rn.bf16x2.f32 %r7006, %r23048, %r23047; 2026-02-21T09:19:14.1528612Z cvt.rn.bf16x2.f32 %r7007, %r23050, %r23049; 2026-02-21T09:19:14.1528692Z cvt.rn.bf16x2.f32 %r7008, %r23052, %r23051; 2026-02-21T09:19:14.1528769Z cvt.rn.bf16x2.f32 %r7009, %r23054, %r23053; 2026-02-21T09:19:14.1528853Z cvt.rn.bf16x2.f32 %r7010, %r23056, %r23055; 2026-02-21T09:19:14.1528935Z cvt.rn.bf16x2.f32 %r7011, %r23058, %r23057; 2026-02-21T09:19:14.1529014Z cvt.rn.bf16x2.f32 %r7012, %r23060, %r23059; 2026-02-21T09:19:14.1529094Z cvt.rn.bf16x2.f32 %r7013, %r23062, %r23061; 2026-02-21T09:19:14.1529170Z cvt.rn.bf16x2.f32 %r7014, %r23064, %r23063; 2026-02-21T09:19:14.1529249Z cvt.rn.bf16x2.f32 %r7015, %r23066, %r23065; 2026-02-21T09:19:14.1529321Z cvt.rn.bf16x2.f32 %r7016, %r23068, %r23067; 2026-02-21T09:19:14.1529403Z cvt.rn.bf16x2.f32 %r7017, %r23070, %r23069; 2026-02-21T09:19:14.1529477Z cvt.rn.bf16x2.f32 %r7018, %r23072, %r23071; 2026-02-21T09:19:14.1529619Z cvt.rn.bf16x2.f32 %r7019, %r23074, %r23073; 2026-02-21T09:19:14.1529707Z cvt.rn.bf16x2.f32 %r7020, %r23076, %r23075; 2026-02-21T09:19:14.1529784Z cvt.rn.bf16x2.f32 %r7021, %r23078, %r23077; 2026-02-21T09:19:14.1529862Z cvt.rn.bf16x2.f32 %r7022, %r23080, %r23079; 2026-02-21T09:19:14.1529943Z cvt.rn.bf16x2.f32 %r7023, %r23082, %r23081; 2026-02-21T09:19:14.1530019Z cvt.rn.bf16x2.f32 %r7024, %r23084, %r23083; 2026-02-21T09:19:14.1530097Z cvt.rn.bf16x2.f32 %r7025, %r23086, %r23085; 2026-02-21T09:19:14.1530174Z cvt.rn.bf16x2.f32 %r7026, %r23088, %r23087; 2026-02-21T09:19:14.1530256Z cvt.rn.bf16x2.f32 %r7027, %r23090, %r23089; 2026-02-21T09:19:14.1530332Z cvt.rn.bf16x2.f32 %r7028, %r23092, %r23091; 2026-02-21T09:19:14.1530408Z cvt.rn.bf16x2.f32 %r7029, %r23094, %r23093; 2026-02-21T09:19:14.1530504Z cvt.rn.bf16x2.f32 %r7030, %r23096, %r23095; 2026-02-21T09:19:14.1530586Z cvt.rn.bf16x2.f32 %r7031, %r23098, %r23097; 2026-02-21T09:19:14.1530667Z cvt.rn.bf16x2.f32 %r7032, 
%r23100, %r23099; 2026-02-21T09:19:14.1530749Z cvt.rn.bf16x2.f32 %r7033, %r23102, %r23101; 2026-02-21T09:19:14.1530826Z cvt.rn.bf16x2.f32 %r7034, %r23104, %r23103; 2026-02-21T09:19:14.1530904Z cvt.rn.bf16x2.f32 %r7035, %r23106, %r23105; 2026-02-21T09:19:14.1531051Z cvt.rn.bf16x2.f32 %r7036, %r23108, %r23107; 2026-02-21T09:19:14.1531135Z cvt.rn.bf16x2.f32 %r7037, %r23110, %r23109; 2026-02-21T09:19:14.1531210Z cvt.rn.bf16x2.f32 %r7038, %r23112, %r23111; 2026-02-21T09:19:14.1531288Z cvt.rn.bf16x2.f32 %r7039, %r23114, %r23113; 2026-02-21T09:19:14.1531372Z cvt.rn.bf16x2.f32 %r7040, %r23116, %r23115; 2026-02-21T09:19:14.1531449Z cvt.rn.bf16x2.f32 %r7041, %r23118, %r23117; 2026-02-21T09:19:14.1531525Z cvt.rn.bf16x2.f32 %r7042, %r23120, %r23119; 2026-02-21T09:19:14.1531609Z cvt.rn.bf16x2.f32 %r7043, %r23122, %r23121; 2026-02-21T09:19:14.1531687Z cvt.rn.bf16x2.f32 %r7044, %r23124, %r23123; 2026-02-21T09:19:14.1531763Z cvt.rn.bf16x2.f32 %r7045, %r23126, %r23125; 2026-02-21T09:19:14.1531896Z cvt.rn.bf16x2.f32 %r7046, %r23128, %r23127; 2026-02-21T09:19:14.1531982Z cvt.rn.bf16x2.f32 %r7047, %r23130, %r23129; 2026-02-21T09:19:14.1532060Z cvt.rn.bf16x2.f32 %r7048, %r23132, %r23131; 2026-02-21T09:19:14.1532138Z cvt.rn.bf16x2.f32 %r7049, %r23134, %r23133; 2026-02-21T09:19:14.1532223Z cvt.rn.bf16x2.f32 %r7050, %r23136, %r23135; 2026-02-21T09:19:14.1532300Z cvt.rn.bf16x2.f32 %r7051, %r23138, %r23137; 2026-02-21T09:19:14.1532376Z cvt.rn.bf16x2.f32 %r7052, %r23140, %r23139; 2026-02-21T09:19:14.1532458Z cvt.rn.bf16x2.f32 %r7053, %r23142, %r23141; 2026-02-21T09:19:14.1532534Z cvt.rn.bf16x2.f32 %r7054, %r23144, %r23143; 2026-02-21T09:19:14.1532613Z cvt.rn.bf16x2.f32 %r7055, %r23146, %r23145; 2026-02-21T09:19:14.1532690Z cvt.rn.bf16x2.f32 %r7056, %r23148, %r23147; 2026-02-21T09:19:14.1532772Z cvt.rn.bf16x2.f32 %r7057, %r23150, %r23149; 2026-02-21T09:19:14.1532850Z cvt.rn.bf16x2.f32 %r7058, %r23152, %r23151; 2026-02-21T09:19:14.1532927Z cvt.rn.bf16x2.f32 %r7059, %r23154, %r23153; 
2026-02-21T09:19:14.1533075Z cvt.rn.bf16x2.f32 %r7060, %r23156, %r23155; 2026-02-21T09:19:14.1533152Z cvt.rn.bf16x2.f32 %r7061, %r23158, %r23157; 2026-02-21T09:19:14.1533228Z cvt.rn.bf16x2.f32 %r7062, %r23160, %r23159; 2026-02-21T09:19:14.1533310Z cvt.rn.bf16x2.f32 %r7063, %r23162, %r23161; 2026-02-21T09:19:14.1533397Z cvt.rn.bf16x2.f32 %r7064, %r23164, %r23163; 2026-02-21T09:19:14.1533473Z cvt.rn.bf16x2.f32 %r7065, %r23166, %r23165; 2026-02-21T09:19:14.1533555Z cvt.rn.bf16x2.f32 %r7066, %r23168, %r23167; 2026-02-21T09:19:14.1533638Z cvt.rn.bf16x2.f32 %r7067, %r23170, %r23169; 2026-02-21T09:19:14.1533717Z cvt.rn.bf16x2.f32 %r7068, %r23172, %r23171; 2026-02-21T09:19:14.1533794Z cvt.rn.bf16x2.f32 %r7069, %r23174, %r23173; 2026-02-21T09:19:14.1533876Z cvt.rn.bf16x2.f32 %r7070, %r23176, %r23175; 2026-02-21T09:19:14.1533952Z cvt.rn.bf16x2.f32 %r7071, %r23178, %r23177; 2026-02-21T09:19:14.1534028Z cvt.rn.bf16x2.f32 %r7072, %r23180, %r23179; 2026-02-21T09:19:14.1534108Z cvt.rn.bf16x2.f32 %r7073, %r23182, %r23181; 2026-02-21T09:19:14.1534242Z cvt.rn.bf16x2.f32 %r7074, %r23184, %r23183; 2026-02-21T09:19:14.1534326Z cvt.rn.bf16x2.f32 %r7075, %r23186, %r23185; 2026-02-21T09:19:14.1534404Z cvt.rn.bf16x2.f32 %r7076, %r23188, %r23187; 2026-02-21T09:19:14.1534489Z cvt.rn.bf16x2.f32 %r7077, %r23190, %r23189; 2026-02-21T09:19:14.1534565Z cvt.rn.bf16x2.f32 %r7078, %r23192, %r23191; 2026-02-21T09:19:14.1534642Z cvt.rn.bf16x2.f32 %r7079, %r23194, %r23193; 2026-02-21T09:19:14.1534725Z cvt.rn.bf16x2.f32 %r7080, %r23196, %r23195; 2026-02-21T09:19:14.1534802Z cvt.rn.bf16x2.f32 %r7081, %r23198, %r23197; 2026-02-21T09:19:14.1534878Z cvt.rn.bf16x2.f32 %r7082, %r23200, %r23199; 2026-02-21T09:19:14.1534954Z cvt.rn.bf16x2.f32 %r7083, %r23202, %r23201; 2026-02-21T09:19:14.1535035Z cvt.rn.bf16x2.f32 %r7084, %r23204, %r23203; 2026-02-21T09:19:14.1535111Z cvt.rn.bf16x2.f32 %r7085, %r23206, %r23205; 2026-02-21T09:19:14.1535187Z cvt.rn.bf16x2.f32 %r7086, %r23208, %r23207; 2026-02-21T09:19:14.1535271Z 
cvt.rn.bf16x2.f32 %r7087, %r23210, %r23209; 2026-02-21T09:19:14.1535350Z cvt.rn.bf16x2.f32 %r7088, %r23212, %r23211; 2026-02-21T09:19:14.1535427Z cvt.rn.bf16x2.f32 %r7089, %r23214, %r23213; 2026-02-21T09:19:14.1535562Z cvt.rn.bf16x2.f32 %r7090, %r23216, %r23215; 2026-02-21T09:19:14.1535637Z cvt.rn.bf16x2.f32 %r7091, %r23218, %r23217; 2026-02-21T09:19:14.1535715Z cvt.rn.bf16x2.f32 %r7092, %r23220, %r23219; 2026-02-21T09:19:14.1535790Z cvt.rn.bf16x2.f32 %r7093, %r23222, %r23221; 2026-02-21T09:19:14.1535871Z cvt.rn.bf16x2.f32 %r7094, %r23224, %r23223; 2026-02-21T09:19:14.1535946Z cvt.rn.bf16x2.f32 %r7095, %r23226, %r23225; 2026-02-21T09:19:14.1536035Z cvt.rn.bf16x2.f32 %r7096, %r23228, %r23227; 2026-02-21T09:19:14.1536118Z cvt.rn.bf16x2.f32 %r7097, %r23230, %r23229; 2026-02-21T09:19:14.1536195Z cvt.rn.bf16x2.f32 %r7098, %r23232, %r23231; 2026-02-21T09:19:14.1536272Z cvt.rn.bf16x2.f32 %r7099, %r23234, %r23233; 2026-02-21T09:19:14.1536353Z cvt.rn.bf16x2.f32 %r7100, %r23236, %r23235; 2026-02-21T09:19:14.1536613Z cvt.rn.bf16x2.f32 %r7101, %r23238, %r23237; 2026-02-21T09:19:14.1536705Z cvt.rn.bf16x2.f32 %r7102, %r23240, %r23239; 2026-02-21T09:19:14.1536783Z cvt.rn.bf16x2.f32 %r7103, %r23242, %r23241; 2026-02-21T09:19:14.1536868Z cvt.rn.bf16x2.f32 %r7104, %r23244, %r23243; 2026-02-21T09:19:14.1536946Z cvt.rn.bf16x2.f32 %r7105, %r23246, %r23245; 2026-02-21T09:19:14.1537023Z cvt.rn.bf16x2.f32 %r7106, %r23248, %r23247; 2026-02-21T09:19:14.1537239Z .loc 1 91 43 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:91:43 2026-02-21T09:19:14.1537306Z shl.b32 %r7107, %r6947, 13; 2026-02-21T09:19:14.1537368Z shl.b32 %r7108, %r6948, 13; 2026-02-21T09:19:14.1537436Z shl.b32 %r7109, %r6949, 13; 2026-02-21T09:19:14.1537497Z shl.b32 %r7110, %r6950, 13; 2026-02-21T09:19:14.1537558Z shl.b32 %r7111, %r6951, 13; 2026-02-21T09:19:14.1537619Z shl.b32 %r7112, %r6952, 13; 2026-02-21T09:19:14.1537686Z shl.b32 %r7113, %r6953, 13; 2026-02-21T09:19:14.1537830Z shl.b32 %r7114, %r6954, 13; 
2026-02-21T09:19:14.1537893Z shl.b32 %r7115, %r6955, 13; 2026-02-21T09:19:14.1537960Z shl.b32 %r7116, %r6956, 13; 2026-02-21T09:19:14.1538022Z shl.b32 %r7117, %r6957, 13; 2026-02-21T09:19:14.1538086Z shl.b32 %r7118, %r6958, 13; 2026-02-21T09:19:14.1538148Z shl.b32 %r7119, %r6959, 13; 2026-02-21T09:19:14.1538219Z shl.b32 %r7120, %r6960, 13; 2026-02-21T09:19:14.1538281Z shl.b32 %r7121, %r6961, 13; 2026-02-21T09:19:14.1538342Z shl.b32 %r7122, %r6962, 13; 2026-02-21T09:19:14.1538409Z shl.b32 %r7123, %r6963, 13; 2026-02-21T09:19:14.1538470Z shl.b32 %r7124, %r6964, 13; 2026-02-21T09:19:14.1538530Z shl.b32 %r7125, %r6965, 13; 2026-02-21T09:19:14.1538594Z shl.b32 %r7126, %r6966, 13; 2026-02-21T09:19:14.1538664Z shl.b32 %r7127, %r6967, 13; 2026-02-21T09:19:14.1538725Z shl.b32 %r7128, %r6968, 13; 2026-02-21T09:19:14.1538786Z shl.b32 %r7129, %r6969, 13; 2026-02-21T09:19:14.1538853Z shl.b32 %r7130, %r6970, 13; 2026-02-21T09:19:14.1538917Z shl.b32 %r7131, %r6971, 13; 2026-02-21T09:19:14.1539046Z shl.b32 %r7132, %r6972, 13; 2026-02-21T09:19:14.1539130Z shl.b32 %r7133, %r6973, 13; 2026-02-21T09:19:14.1539195Z shl.b32 %r7134, %r6974, 13; 2026-02-21T09:19:14.1539259Z shl.b32 %r7135, %r6975, 13; 2026-02-21T09:19:14.1539323Z shl.b32 %r7136, %r6976, 13; 2026-02-21T09:19:14.1539392Z shl.b32 %r7137, %r6977, 13; 2026-02-21T09:19:14.1539454Z shl.b32 %r7138, %r6978, 13; 2026-02-21T09:19:14.1539661Z .loc 1 91 50 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:91:50 2026-02-21T09:19:14.1539734Z add.s32 %r7139, %r7107, %r6946; 2026-02-21T09:19:14.1539799Z add.s32 %r7140, %r7108, %r6946; 2026-02-21T09:19:14.1539863Z add.s32 %r7141, %r7109, %r6946; 2026-02-21T09:19:14.1539927Z add.s32 %r7142, %r7110, %r6946; 2026-02-21T09:19:14.1539995Z add.s32 %r7143, %r7111, %r6946; 2026-02-21T09:19:14.1540059Z add.s32 %r7144, %r7112, %r6946; 2026-02-21T09:19:14.1540122Z add.s32 %r7145, %r7113, %r6946; 2026-02-21T09:19:14.1540195Z add.s32 %r7146, %r7114, %r6946; 2026-02-21T09:19:14.1540260Z add.s32 
%r7147, %r7115, %r6946; 2026-02-21T09:19:14.1540324Z add.s32 %r7148, %r7116, %r6946; 2026-02-21T09:19:14.1540393Z add.s32 %r7149, %r7117, %r6946; 2026-02-21T09:19:14.1540531Z add.s32 %r7150, %r7118, %r6946; 2026-02-21T09:19:14.1540598Z add.s32 %r7151, %r7119, %r6946; 2026-02-21T09:19:14.1540662Z add.s32 %r7152, %r7120, %r6946; 2026-02-21T09:19:14.1540732Z add.s32 %r7153, %r7121, %r6946; 2026-02-21T09:19:14.1540794Z add.s32 %r7154, %r7122, %r6946; 2026-02-21T09:19:14.1540857Z add.s32 %r7155, %r7123, %r6946; 2026-02-21T09:19:14.1540926Z add.s32 %r7156, %r7124, %r6946; 2026-02-21T09:19:14.1540990Z add.s32 %r7157, %r7125, %r6946; 2026-02-21T09:19:14.1541054Z add.s32 %r7158, %r7126, %r6946; 2026-02-21T09:19:14.1541117Z add.s32 %r7159, %r7127, %r6946; 2026-02-21T09:19:14.1541187Z add.s32 %r7160, %r7128, %r6946; 2026-02-21T09:19:14.1541250Z add.s32 %r7161, %r7129, %r6946; 2026-02-21T09:19:14.1541382Z add.s32 %r7162, %r7130, %r6946; 2026-02-21T09:19:14.1541457Z add.s32 %r7163, %r7131, %r6946; 2026-02-21T09:19:14.1541521Z add.s32 %r7164, %r7132, %r6946; 2026-02-21T09:19:14.1541585Z add.s32 %r7165, %r7133, %r6946; 2026-02-21T09:19:14.1541655Z add.s32 %r7166, %r7134, %r6946; 2026-02-21T09:19:14.1541725Z add.s32 %r7167, %r7135, %r6946; 2026-02-21T09:19:14.1541789Z add.s32 %r7168, %r7136, %r6946; 2026-02-21T09:19:14.1541852Z add.s32 %r7169, %r7137, %r6946; 2026-02-21T09:19:14.1541922Z add.s32 %r7170, %r7138, %r6946; 2026-02-21T09:19:14.1542124Z .loc 1 91 22 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:91:22 2026-02-21T09:19:14.1542202Z mad.wide.s32 %rd245, %r7139, 2, %rd66; 2026-02-21T09:19:14.1542281Z mad.wide.s32 %rd246, %r7140, 2, %rd66; 2026-02-21T09:19:14.1542353Z mad.wide.s32 %rd247, %r7141, 2, %rd66; 2026-02-21T09:19:14.1542424Z mad.wide.s32 %rd248, %r7142, 2, %rd66; 2026-02-21T09:19:14.1542494Z mad.wide.s32 %rd249, %r7143, 2, %rd66; 2026-02-21T09:19:14.1542623Z mad.wide.s32 %rd250, %r7144, 2, %rd66; 2026-02-21T09:19:14.1542694Z mad.wide.s32 %rd251, %r7145, 
2, %rd66; 2026-02-21T09:19:14.1542762Z mad.wide.s32 %rd252, %r7146, 2, %rd66; 2026-02-21T09:19:14.1542838Z mad.wide.s32 %rd253, %r7147, 2, %rd66; 2026-02-21T09:19:14.1542907Z mad.wide.s32 %rd254, %r7148, 2, %rd66; 2026-02-21T09:19:14.1542977Z mad.wide.s32 %rd255, %r7149, 2, %rd66; 2026-02-21T09:19:14.1543051Z mad.wide.s32 %rd256, %r7150, 2, %rd66; 2026-02-21T09:19:14.1543120Z mad.wide.s32 %rd257, %r7151, 2, %rd66; 2026-02-21T09:19:14.1543190Z mad.wide.s32 %rd258, %r7152, 2, %rd66; 2026-02-21T09:19:14.1543259Z mad.wide.s32 %rd259, %r7153, 2, %rd66; 2026-02-21T09:19:14.1543332Z mad.wide.s32 %rd260, %r7154, 2, %rd66; 2026-02-21T09:19:14.1543406Z mad.wide.s32 %rd261, %r7155, 2, %rd66; 2026-02-21T09:19:14.1543475Z mad.wide.s32 %rd262, %r7156, 2, %rd66; 2026-02-21T09:19:14.1543547Z mad.wide.s32 %rd263, %r7157, 2, %rd66; 2026-02-21T09:19:14.1543620Z mad.wide.s32 %rd264, %r7158, 2, %rd66; 2026-02-21T09:19:14.1543770Z mad.wide.s32 %rd265, %r7159, 2, %rd66; 2026-02-21T09:19:14.1543846Z mad.wide.s32 %rd266, %r7160, 2, %rd66; 2026-02-21T09:19:14.1543925Z mad.wide.s32 %rd267, %r7161, 2, %rd66; 2026-02-21T09:19:14.1543995Z mad.wide.s32 %rd268, %r7162, 2, %rd66; 2026-02-21T09:19:14.1544063Z mad.wide.s32 %rd269, %r7163, 2, %rd66; 2026-02-21T09:19:14.1544138Z mad.wide.s32 %rd270, %r7164, 2, %rd66; 2026-02-21T09:19:14.1544207Z mad.wide.s32 %rd271, %r7165, 2, %rd66; 2026-02-21T09:19:14.1544274Z mad.wide.s32 %rd272, %r7166, 2, %rd66; 2026-02-21T09:19:14.1544346Z mad.wide.s32 %rd273, %r7167, 2, %rd66; 2026-02-21T09:19:14.1544414Z mad.wide.s32 %rd274, %r7168, 2, %rd66; 2026-02-21T09:19:14.1544483Z mad.wide.s32 %rd275, %r7169, 2, %rd66; 2026-02-21T09:19:14.1544553Z mad.wide.s32 %rd276, %r7170, 2, %rd66; 2026-02-21T09:19:14.1544760Z .loc 1 91 81 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:91:81 2026-02-21T09:19:14.1544882Z st.shared.v4.b32 [%r165], {%r6979, %r6981, %r6983, %r6985}; 2026-02-21T09:19:14.1545009Z st.shared.v4.b32 [%r165+512], {%r6980, %r6982, %r6984, %r6986}; 
2026-02-21T09:19:14.1545125Z st.shared.v4.b32 [%r166], {%r6987, %r6989, %r6991, %r6993}; 2026-02-21T09:19:14.1545295Z st.shared.v4.b32 [%r166+512], {%r6988, %r6990, %r6992, %r6994}; 2026-02-21T09:19:14.1545401Z st.shared.v4.b32 [%r167], {%r6995, %r6997, %r6999, %r7001}; 2026-02-21T09:19:14.1545517Z st.shared.v4.b32 [%r167+512], {%r6996, %r6998, %r7000, %r7002}; 2026-02-21T09:19:14.1545622Z st.shared.v4.b32 [%r168], {%r7003, %r7005, %r7007, %r7009}; 2026-02-21T09:19:14.1545733Z st.shared.v4.b32 [%r168+512], {%r7004, %r7006, %r7008, %r7010}; 2026-02-21T09:19:14.1545793Z bar.sync 0; 2026-02-21T09:19:14.1545861Z // begin inline asm 2026-02-21T09:19:14.1546057Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6635, %r6636, %r6637, %r6638}, [%r6479]; 2026-02-21T09:19:14.1546119Z // end inline asm 2026-02-21T09:19:14.1546185Z // begin inline asm 2026-02-21T09:19:14.1546425Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6639, %r6640, %r6641, %r6642}, [%r6484]; 2026-02-21T09:19:14.1546606Z // end inline asm 2026-02-21T09:19:14.1546675Z // begin inline asm 2026-02-21T09:19:14.1546859Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6643, %r6644, %r6645, %r6646}, [%r6489]; 2026-02-21T09:19:14.1546918Z // end inline asm 2026-02-21T09:19:14.1546979Z // begin inline asm 2026-02-21T09:19:14.1547164Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6647, %r6648, %r6649, %r6650}, [%r6494]; 2026-02-21T09:19:14.1547223Z // end inline asm 2026-02-21T09:19:14.1547283Z // begin inline asm 2026-02-21T09:19:14.1547468Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6651, %r6652, %r6653, %r6654}, [%r6499]; 2026-02-21T09:19:14.1547527Z // end inline asm 2026-02-21T09:19:14.1547588Z // begin inline asm 2026-02-21T09:19:14.1547774Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6655, %r6656, %r6657, %r6658}, [%r6504]; 2026-02-21T09:19:14.1547833Z // end inline asm 2026-02-21T09:19:14.1547971Z // begin inline asm 2026-02-21T09:19:14.1548158Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6659, %r6660, %r6661, %r6662}, 
[%r6509]; 2026-02-21T09:19:14.1548230Z // end inline asm 2026-02-21T09:19:14.1548293Z // begin inline asm 2026-02-21T09:19:14.1548474Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6663, %r6664, %r6665, %r6666}, [%r6514]; 2026-02-21T09:19:14.1548611Z // end inline asm 2026-02-21T09:19:14.1548673Z bar.sync 0; 2026-02-21T09:19:14.1548786Z st.shared.v4.b32 [%r165], {%r7011, %r7013, %r7015, %r7017}; 2026-02-21T09:19:14.1548901Z st.shared.v4.b32 [%r165+512], {%r7012, %r7014, %r7016, %r7018}; 2026-02-21T09:19:14.1549016Z st.shared.v4.b32 [%r166], {%r7019, %r7021, %r7023, %r7025}; 2026-02-21T09:19:14.1549131Z st.shared.v4.b32 [%r166+512], {%r7020, %r7022, %r7024, %r7026}; 2026-02-21T09:19:14.1549237Z st.shared.v4.b32 [%r167], {%r7027, %r7029, %r7031, %r7033}; 2026-02-21T09:19:14.1549355Z st.shared.v4.b32 [%r167+512], {%r7028, %r7030, %r7032, %r7034}; 2026-02-21T09:19:14.1549535Z st.shared.v4.b32 [%r168], {%r7035, %r7037, %r7039, %r7041}; 2026-02-21T09:19:14.1549652Z st.shared.v4.b32 [%r168+512], {%r7036, %r7038, %r7040, %r7042}; 2026-02-21T09:19:14.1549716Z bar.sync 0; 2026-02-21T09:19:14.1549779Z // begin inline asm 2026-02-21T09:19:14.1549966Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6667, %r6668, %r6669, %r6670}, [%r6479]; 2026-02-21T09:19:14.1550025Z // end inline asm 2026-02-21T09:19:14.1550095Z // begin inline asm 2026-02-21T09:19:14.1550276Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6671, %r6672, %r6673, %r6674}, [%r6484]; 2026-02-21T09:19:14.1550336Z // end inline asm 2026-02-21T09:19:14.1550406Z // begin inline asm 2026-02-21T09:19:14.1550588Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6675, %r6676, %r6677, %r6678}, [%r6489]; 2026-02-21T09:19:14.1550647Z // end inline asm 2026-02-21T09:19:14.1550713Z // begin inline asm 2026-02-21T09:19:14.1550892Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6679, %r6680, %r6681, %r6682}, [%r6494]; 2026-02-21T09:19:14.1550956Z // end inline asm 2026-02-21T09:19:14.1551018Z // begin inline asm 2026-02-21T09:19:14.1551204Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6683, %r6684, %r6685, %r6686}, [%r6499]; 2026-02-21T09:19:14.1551335Z // end inline asm 2026-02-21T09:19:14.1551396Z // begin inline asm 2026-02-21T09:19:14.1551579Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6687, %r6688, %r6689, %r6690}, [%r6504]; 2026-02-21T09:19:14.1551637Z // end inline asm 2026-02-21T09:19:14.1553981Z // begin inline asm 2026-02-21T09:19:14.1554190Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6691, %r6692, %r6693, %r6694}, [%r6509]; 2026-02-21T09:19:14.1554255Z // end inline asm 2026-02-21T09:19:14.1554317Z // begin inline asm 2026-02-21T09:19:14.1554506Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6695, %r6696, %r6697, %r6698}, [%r6514]; 2026-02-21T09:19:14.1554562Z // end inline asm 2026-02-21T09:19:14.1554625Z bar.sync 0; 2026-02-21T09:19:14.1554746Z st.shared.v4.b32 [%r165], {%r7043, %r7045, %r7047, %r7049}; 2026-02-21T09:19:14.1554969Z st.shared.v4.b32 [%r165+512], {%r7044, %r7046, %r7048, %r7050}; 2026-02-21T09:19:14.1555086Z st.shared.v4.b32 [%r166], {%r7051, %r7053, %r7055, %r7057}; 2026-02-21T09:19:14.1555199Z st.shared.v4.b32 [%r166+512], {%r7052, %r7054, %r7056, %r7058}; 2026-02-21T09:19:14.1555305Z st.shared.v4.b32 [%r167], {%r7059, %r7061, %r7063, %r7065}; 2026-02-21T09:19:14.1555416Z st.shared.v4.b32 [%r167+512], {%r7060, %r7062, %r7064, %r7066}; 2026-02-21T09:19:14.1555537Z st.shared.v4.b32 [%r168], {%r7067, %r7069, %r7071, %r7073}; 2026-02-21T09:19:14.1555644Z st.shared.v4.b32 [%r168+512], {%r7068, %r7070, %r7072, %r7074}; 2026-02-21T09:19:14.1555703Z bar.sync 0; 2026-02-21T09:19:14.1555771Z // begin inline asm 2026-02-21T09:19:14.1555963Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6699, %r6700, %r6701, %r6702}, [%r6479]; 2026-02-21T09:19:14.1556019Z // end inline asm 2026-02-21T09:19:14.1556083Z // begin inline asm 2026-02-21T09:19:14.1556266Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6703, %r6704, %r6705, %r6706}, [%r6484]; 2026-02-21T09:19:14.1556326Z // end inline asm 
2026-02-21T09:19:14.1556386Z // begin inline asm 2026-02-21T09:19:14.1556726Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6707, %r6708, %r6709, %r6710}, [%r6489]; 2026-02-21T09:19:14.1556788Z // end inline asm 2026-02-21T09:19:14.1556848Z // begin inline asm 2026-02-21T09:19:14.1557028Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6711, %r6712, %r6713, %r6714}, [%r6494]; 2026-02-21T09:19:14.1557085Z // end inline asm 2026-02-21T09:19:14.1557145Z // begin inline asm 2026-02-21T09:19:14.1557325Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6715, %r6716, %r6717, %r6718}, [%r6499]; 2026-02-21T09:19:14.1557386Z // end inline asm 2026-02-21T09:19:14.1557444Z // begin inline asm 2026-02-21T09:19:14.1557620Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6719, %r6720, %r6721, %r6722}, [%r6504]; 2026-02-21T09:19:14.1557678Z // end inline asm 2026-02-21T09:19:14.1557739Z // begin inline asm 2026-02-21T09:19:14.1558011Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6723, %r6724, %r6725, %r6726}, [%r6509]; 2026-02-21T09:19:14.1558074Z // end inline asm 2026-02-21T09:19:14.1558133Z // begin inline asm 2026-02-21T09:19:14.1558311Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6727, %r6728, %r6729, %r6730}, [%r6514]; 2026-02-21T09:19:14.1558368Z // end inline asm 2026-02-21T09:19:14.1558428Z bar.sync 0; 2026-02-21T09:19:14.1558539Z st.shared.v4.b32 [%r165], {%r7075, %r7077, %r7079, %r7081}; 2026-02-21T09:19:14.1558653Z st.shared.v4.b32 [%r165+512], {%r7076, %r7078, %r7080, %r7082}; 2026-02-21T09:19:14.1558764Z st.shared.v4.b32 [%r166], {%r7083, %r7085, %r7087, %r7089}; 2026-02-21T09:19:14.1558874Z st.shared.v4.b32 [%r166+512], {%r7084, %r7086, %r7088, %r7090}; 2026-02-21T09:19:14.1558975Z st.shared.v4.b32 [%r167], {%r7091, %r7093, %r7095, %r7097}; 2026-02-21T09:19:14.1559088Z st.shared.v4.b32 [%r167+512], {%r7092, %r7094, %r7096, %r7098}; 2026-02-21T09:19:14.1559189Z st.shared.v4.b32 [%r168], {%r7099, %r7101, %r7103, %r7105}; 2026-02-21T09:19:14.1559300Z st.shared.v4.b32 [%r168+512], 
{%r7100, %r7102, %r7104, %r7106}; 2026-02-21T09:19:14.1559358Z bar.sync 0; 2026-02-21T09:19:14.1559424Z // begin inline asm 2026-02-21T09:19:14.1559683Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6731, %r6732, %r6733, %r6734}, [%r6479]; 2026-02-21T09:19:14.1559740Z // end inline asm 2026-02-21T09:19:14.1559806Z // begin inline asm 2026-02-21T09:19:14.1559989Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6735, %r6736, %r6737, %r6738}, [%r6484]; 2026-02-21T09:19:14.1560177Z // end inline asm 2026-02-21T09:19:14.1560254Z // begin inline asm 2026-02-21T09:19:14.1560437Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6739, %r6740, %r6741, %r6742}, [%r6489]; 2026-02-21T09:19:14.1560496Z // end inline asm 2026-02-21T09:19:14.1560554Z // begin inline asm 2026-02-21T09:19:14.1560739Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6743, %r6744, %r6745, %r6746}, [%r6494]; 2026-02-21T09:19:14.1560799Z // end inline asm 2026-02-21T09:19:14.1560927Z // begin inline asm 2026-02-21T09:19:14.1561115Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6747, %r6748, %r6749, %r6750}, [%r6499]; 2026-02-21T09:19:14.1561173Z // end inline asm 2026-02-21T09:19:14.1561235Z // begin inline asm 2026-02-21T09:19:14.1561410Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6751, %r6752, %r6753, %r6754}, [%r6504]; 2026-02-21T09:19:14.1561474Z // end inline asm 2026-02-21T09:19:14.1561533Z // begin inline asm 2026-02-21T09:19:14.1561713Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6755, %r6756, %r6757, %r6758}, [%r6509]; 2026-02-21T09:19:14.1561782Z // end inline asm 2026-02-21T09:19:14.1561845Z // begin inline asm 2026-02-21T09:19:14.1562035Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6759, %r6760, %r6761, %r6762}, [%r6514]; 2026-02-21T09:19:14.1562099Z // end inline asm 2026-02-21T09:19:14.1562160Z // begin inline asm 2026-02-21T09:19:14.1562292Z st.global.v4.b32 [ %rd245 + 0 ], { %r6635, %r6636, %r6637, %r6638 }; 2026-02-21T09:19:14.1562350Z // end inline asm 2026-02-21T09:19:14.1562418Z // begin inline asm 
2026-02-21T09:19:14.1562543Z st.global.v4.b32 [ %rd246 + 0 ], { %r6639, %r6640, %r6641, %r6642 }; 2026-02-21T09:19:14.1562601Z // end inline asm 2026-02-21T09:19:14.1562668Z // begin inline asm 2026-02-21T09:19:14.1562786Z st.global.v4.b32 [ %rd247 + 0 ], { %r6643, %r6644, %r6645, %r6646 }; 2026-02-21T09:19:14.1562844Z // end inline asm 2026-02-21T09:19:14.1562907Z // begin inline asm 2026-02-21T09:19:14.1563026Z st.global.v4.b32 [ %rd248 + 0 ], { %r6647, %r6648, %r6649, %r6650 }; 2026-02-21T09:19:14.1563086Z // end inline asm 2026-02-21T09:19:14.1563146Z // begin inline asm 2026-02-21T09:19:14.1563275Z st.global.v4.b32 [ %rd249 + 0 ], { %r6651, %r6652, %r6653, %r6654 }; 2026-02-21T09:19:14.1563335Z // end inline asm 2026-02-21T09:19:14.1563396Z // begin inline asm 2026-02-21T09:19:14.1563513Z st.global.v4.b32 [ %rd250 + 0 ], { %r6655, %r6656, %r6657, %r6658 }; 2026-02-21T09:19:14.1563571Z // end inline asm 2026-02-21T09:19:14.1563631Z // begin inline asm 2026-02-21T09:19:14.1563799Z st.global.v4.b32 [ %rd251 + 0 ], { %r6659, %r6660, %r6661, %r6662 }; 2026-02-21T09:19:14.1563863Z // end inline asm 2026-02-21T09:19:14.1563921Z // begin inline asm 2026-02-21T09:19:14.1564034Z st.global.v4.b32 [ %rd252 + 0 ], { %r6663, %r6664, %r6665, %r6666 }; 2026-02-21T09:19:14.1564093Z // end inline asm 2026-02-21T09:19:14.1564152Z // begin inline asm 2026-02-21T09:19:14.1564266Z st.global.v4.b32 [ %rd253 + 0 ], { %r6667, %r6668, %r6669, %r6670 }; 2026-02-21T09:19:14.1564327Z // end inline asm 2026-02-21T09:19:14.1564394Z // begin inline asm 2026-02-21T09:19:14.1564508Z st.global.v4.b32 [ %rd254 + 0 ], { %r6671, %r6672, %r6673, %r6674 }; 2026-02-21T09:19:14.1564566Z // end inline asm 2026-02-21T09:19:14.1564631Z // begin inline asm 2026-02-21T09:19:14.1564743Z st.global.v4.b32 [ %rd255 + 0 ], { %r6675, %r6676, %r6677, %r6678 }; 2026-02-21T09:19:14.1564801Z // end inline asm 2026-02-21T09:19:14.1564863Z // begin inline asm 2026-02-21T09:19:14.1564979Z st.global.v4.b32 [ %rd256 + 0 
], { %r6679, %r6680, %r6681, %r6682 }; 2026-02-21T09:19:14.1565037Z // end inline asm 2026-02-21T09:19:14.1565100Z // begin inline asm 2026-02-21T09:19:14.1565220Z st.global.v4.b32 [ %rd257 + 0 ], { %r6683, %r6684, %r6685, %r6686 }; 2026-02-21T09:19:14.1565335Z // end inline asm 2026-02-21T09:19:14.1565394Z // begin inline asm 2026-02-21T09:19:14.1565515Z st.global.v4.b32 [ %rd258 + 0 ], { %r6687, %r6688, %r6689, %r6690 }; 2026-02-21T09:19:14.1565632Z // end inline asm 2026-02-21T09:19:14.1565692Z // begin inline asm 2026-02-21T09:19:14.1565805Z st.global.v4.b32 [ %rd259 + 0 ], { %r6691, %r6692, %r6693, %r6694 }; 2026-02-21T09:19:14.1565866Z // end inline asm 2026-02-21T09:19:14.1565923Z // begin inline asm 2026-02-21T09:19:14.1566037Z st.global.v4.b32 [ %rd260 + 0 ], { %r6695, %r6696, %r6697, %r6698 }; 2026-02-21T09:19:14.1566098Z // end inline asm 2026-02-21T09:19:14.1566157Z // begin inline asm 2026-02-21T09:19:14.1566350Z st.global.v4.b32 [ %rd261 + 0 ], { %r6699, %r6700, %r6701, %r6702 }; 2026-02-21T09:19:14.1566418Z // end inline asm 2026-02-21T09:19:14.1566603Z // begin inline asm 2026-02-21T09:19:14.1566724Z st.global.v4.b32 [ %rd262 + 0 ], { %r6703, %r6704, %r6705, %r6706 }; 2026-02-21T09:19:14.1566786Z // end inline asm 2026-02-21T09:19:14.1566850Z // begin inline asm 2026-02-21T09:19:14.1566963Z st.global.v4.b32 [ %rd263 + 0 ], { %r6707, %r6708, %r6709, %r6710 }; 2026-02-21T09:19:14.1567019Z // end inline asm 2026-02-21T09:19:14.1567085Z // begin inline asm 2026-02-21T09:19:14.1567198Z st.global.v4.b32 [ %rd264 + 0 ], { %r6711, %r6712, %r6713, %r6714 }; 2026-02-21T09:19:14.1567254Z // end inline asm 2026-02-21T09:19:14.1567313Z // begin inline asm 2026-02-21T09:19:14.1567431Z st.global.v4.b32 [ %rd265 + 0 ], { %r6715, %r6716, %r6717, %r6718 }; 2026-02-21T09:19:14.1567487Z // end inline asm 2026-02-21T09:19:14.1567546Z // begin inline asm 2026-02-21T09:19:14.1567663Z st.global.v4.b32 [ %rd266 + 0 ], { %r6719, %r6720, %r6721, %r6722 }; 
2026-02-21T09:19:14.1567721Z // end inline asm 2026-02-21T09:19:14.1567781Z // begin inline asm 2026-02-21T09:19:14.1567900Z st.global.v4.b32 [ %rd267 + 0 ], { %r6723, %r6724, %r6725, %r6726 }; 2026-02-21T09:19:14.1567958Z // end inline asm 2026-02-21T09:19:14.1568016Z // begin inline asm 2026-02-21T09:19:14.1568129Z st.global.v4.b32 [ %rd268 + 0 ], { %r6727, %r6728, %r6729, %r6730 }; 2026-02-21T09:19:14.1568201Z // end inline asm 2026-02-21T09:19:14.1568263Z // begin inline asm 2026-02-21T09:19:14.1568379Z st.global.v4.b32 [ %rd269 + 0 ], { %r6731, %r6732, %r6733, %r6734 }; 2026-02-21T09:19:14.1568443Z // end inline asm 2026-02-21T09:19:14.1568502Z // begin inline asm 2026-02-21T09:19:14.1568617Z st.global.v4.b32 [ %rd270 + 0 ], { %r6735, %r6736, %r6737, %r6738 }; 2026-02-21T09:19:14.1568676Z // end inline asm 2026-02-21T09:19:14.1568738Z // begin inline asm 2026-02-21T09:19:14.1568853Z st.global.v4.b32 [ %rd271 + 0 ], { %r6739, %r6740, %r6741, %r6742 }; 2026-02-21T09:19:14.1568910Z // end inline asm 2026-02-21T09:19:14.1569056Z // begin inline asm 2026-02-21T09:19:14.1569173Z st.global.v4.b32 [ %rd272 + 0 ], { %r6743, %r6744, %r6745, %r6746 }; 2026-02-21T09:19:14.1569230Z // end inline asm 2026-02-21T09:19:14.1569291Z // begin inline asm 2026-02-21T09:19:14.1569414Z st.global.v4.b32 [ %rd273 + 0 ], { %r6747, %r6748, %r6749, %r6750 }; 2026-02-21T09:19:14.1569471Z // end inline asm 2026-02-21T09:19:14.1569532Z // begin inline asm 2026-02-21T09:19:14.1569651Z st.global.v4.b32 [ %rd274 + 0 ], { %r6751, %r6752, %r6753, %r6754 }; 2026-02-21T09:19:14.1569710Z // end inline asm 2026-02-21T09:19:14.1569772Z // begin inline asm 2026-02-21T09:19:14.1569891Z st.global.v4.b32 [ %rd275 + 0 ], { %r6755, %r6756, %r6757, %r6758 }; 2026-02-21T09:19:14.1569948Z // end inline asm 2026-02-21T09:19:14.1570006Z // begin inline asm 2026-02-21T09:19:14.1570124Z st.global.v4.b32 [ %rd276 + 0 ], { %r6759, %r6760, %r6761, %r6762 }; 2026-02-21T09:19:14.1570186Z // end inline asm 
2026-02-21T09:19:14.1570416Z .loc 1 22 121 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:22:121 2026-02-21T09:19:14.1570485Z add.s32 %r7171, %r22988, 1; 2026-02-21T09:19:14.1570722Z .loc 1 29 33 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:29:33 2026-02-21T09:19:14.1570864Z shr.u32 %r7172, %r7171, 5; 2026-02-21T09:19:14.1570933Z and.b32 %r7173, %r7172, 67108856; 2026-02-21T09:19:14.1571144Z .loc 1 30 39 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:30:39 2026-02-21T09:19:14.1571271Z sub.s32 %r7174, 64, %r7173; 2026-02-21T09:19:14.1571486Z .loc 1 30 52 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:30:52 2026-02-21T09:19:14.1571551Z min.s32 %r7175, %r7174, 8; 2026-02-21T09:19:14.1571761Z .loc 1 31 45 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:31:45 2026-02-21T09:19:14.1571826Z and.b32 %r7176, %r7171, 255; 2026-02-21T09:19:14.1572090Z .loc 1 32 51 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:32:51 2026-02-21T09:19:14.1572168Z div.s32 %r7177, %r7176, %r7175; 2026-02-21T09:19:14.1572368Z .loc 1 31 64 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:31:64 2026-02-21T09:19:14.1572439Z mul.lo.s32 %r7178, %r7177, %r7175; 2026-02-21T09:19:14.1572511Z sub.s32 %r7179, %r7176, %r7178; 2026-02-21T09:19:14.1572710Z .loc 1 31 30 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:31:30 2026-02-21T09:19:14.1572776Z add.s32 %r7180, %r7179, %r7173; 2026-02-21T09:19:14.1572976Z .loc 1 33 27 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:33:27 2026-02-21T09:19:14.1573039Z shl.b32 %r851, %r7180, 7; 2026-02-21T09:19:14.1573236Z .loc 1 34 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:34:32 2026-02-21T09:19:14.1573302Z or.b32 %r7181, %r851, %r7; 2026-02-21T09:19:14.1573512Z .loc 1 35 27 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:35:27 2026-02-21T09:19:14.1573573Z shl.b32 %r852, %r7177, 9; 2026-02-21T09:19:14.1573773Z .loc 1 36 32 // 
co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:36:32 2026-02-21T09:19:14.1573849Z or.b32 %r7182, %r852, %r11; 2026-02-21T09:19:14.1573912Z or.b32 %r7183, %r852, %r12; 2026-02-21T09:19:14.1573973Z or.b32 %r7184, %r852, %r13; 2026-02-21T09:19:14.1574044Z or.b32 %r7185, %r852, %r14; 2026-02-21T09:19:14.1574113Z or.b32 %r7186, %r852, %r15; 2026-02-21T09:19:14.1574176Z or.b32 %r7187, %r852, %r16; 2026-02-21T09:19:14.1574236Z or.b32 %r7188, %r852, %r17; 2026-02-21T09:19:14.1574301Z or.b32 %r7189, %r852, %r18; 2026-02-21T09:19:14.1574501Z .loc 1 51 53 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:53 2026-02-21T09:19:14.1574562Z shl.b32 %r7190, %r7182, 10; 2026-02-21T09:19:14.1574625Z shl.b32 %r7191, %r7183, 10; 2026-02-21T09:19:14.1574685Z shl.b32 %r7192, %r7184, 10; 2026-02-21T09:19:14.1574816Z shl.b32 %r7193, %r7185, 10; 2026-02-21T09:19:14.1574883Z shl.b32 %r7194, %r7186, 10; 2026-02-21T09:19:14.1574943Z shl.b32 %r7195, %r7187, 10; 2026-02-21T09:19:14.1575005Z shl.b32 %r7196, %r7188, 10; 2026-02-21T09:19:14.1575066Z shl.b32 %r7197, %r7189, 10; 2026-02-21T09:19:14.1575271Z .loc 1 51 60 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:60 2026-02-21T09:19:14.1575337Z or.b32 %r7198, %r7190, %r54; 2026-02-21T09:19:14.1575397Z or.b32 %r7199, %r7191, %r54; 2026-02-21T09:19:14.1575461Z or.b32 %r7200, %r7192, %r54; 2026-02-21T09:19:14.1575531Z or.b32 %r7201, %r7193, %r54; 2026-02-21T09:19:14.1575594Z or.b32 %r7202, %r7194, %r54; 2026-02-21T09:19:14.1575655Z or.b32 %r7203, %r7195, %r54; 2026-02-21T09:19:14.1575717Z or.b32 %r7204, %r7196, %r54; 2026-02-21T09:19:14.1575777Z or.b32 %r7205, %r7197, %r54; 2026-02-21T09:19:14.1575980Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1576059Z mad.wide.s32 %rd277, %r7198, 2, %rd64; 2026-02-21T09:19:14.1576129Z mad.wide.s32 %rd278, %r7199, 2, %rd64; 2026-02-21T09:19:14.1576254Z mad.wide.s32 %rd279, %r7200, 2, %rd64; 2026-02-21T09:19:14.1576327Z 
mad.wide.s32 %rd280, %r7201, 2, %rd64; 2026-02-21T09:19:14.1576394Z mad.wide.s32 %rd281, %r7202, 2, %rd64; 2026-02-21T09:19:14.1576568Z mad.wide.s32 %rd282, %r7203, 2, %rd64; 2026-02-21T09:19:14.1576714Z mad.wide.s32 %rd283, %r7204, 2, %rd64; 2026-02-21T09:19:14.1576784Z mad.wide.s32 %rd284, %r7205, 2, %rd64; 2026-02-21T09:19:14.1576981Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1577038Z bar.sync 0; 2026-02-21T09:19:14.1577099Z mov.b32 %r6764, 8; 2026-02-21T09:19:14.1577158Z // begin inline asm 2026-02-21T09:19:14.1577297Z cp.async.ca.shared.global [ %r57 + 0 ], [ %rd277 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1577429Z // end inline asm 2026-02-21T09:19:14.1577492Z // begin inline asm 2026-02-21T09:19:14.1577624Z cp.async.ca.shared.global [ %r58 + 0 ], [ %rd278 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1577694Z // end inline asm 2026-02-21T09:19:14.1577761Z // begin inline asm 2026-02-21T09:19:14.1577890Z cp.async.ca.shared.global [ %r59 + 0 ], [ %rd279 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1577947Z // end inline asm 2026-02-21T09:19:14.1578009Z // begin inline asm 2026-02-21T09:19:14.1578137Z cp.async.ca.shared.global [ %r60 + 0 ], [ %rd280 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1578195Z // end inline asm 2026-02-21T09:19:14.1578254Z // begin inline asm 2026-02-21T09:19:14.1578385Z cp.async.ca.shared.global [ %r61 + 0 ], [ %rd281 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1578442Z // end inline asm 2026-02-21T09:19:14.1578500Z // begin inline asm 2026-02-21T09:19:14.1578629Z cp.async.ca.shared.global [ %r62 + 0 ], [ %rd282 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1578685Z // end inline asm 2026-02-21T09:19:14.1578746Z // begin inline asm 2026-02-21T09:19:14.1578875Z cp.async.ca.shared.global [ %r63 + 0 ], [ %rd283 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1578937Z // end inline asm 2026-02-21T09:19:14.1578995Z // begin inline asm 2026-02-21T09:19:14.1579124Z cp.async.ca.shared.global [ %r64 + 0 ], [ %rd284 + 0 ], 
0x8, %r6764; 2026-02-21T09:19:14.1579185Z // end inline asm 2026-02-21T09:19:14.1579252Z cp.async.commit_group; 2026-02-21T09:19:14.1579452Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1579525Z add.s32 %r7206, %r7181, %r22968; 2026-02-21T09:19:14.1579724Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1579793Z cvt.s64.s32 %rd368, %r7206; 2026-02-21T09:19:14.1579862Z add.s64 %rd285, %rd65, %rd368; 2026-02-21T09:19:14.1579927Z mov.b32 %r23252, 4; 2026-02-21T09:19:14.1580128Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1580261Z // begin inline asm 2026-02-21T09:19:14.1580410Z cp.async.ca.shared.global [ %r66 + 0 ], [ %rd285 + 0 ], 0x4, %r23252; 2026-02-21T09:19:14.1580470Z // end inline asm 2026-02-21T09:19:14.1580539Z cp.async.commit_group; 2026-02-21T09:19:14.1580742Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1580808Z cvt.s64.s32 %rd369, %r7190; 2026-02-21T09:19:14.1580876Z or.b64 %rd370, %rd369, %rd1113; 2026-02-21T09:19:14.1580942Z shl.b64 %rd371, %rd370, 1; 2026-02-21T09:19:14.1581014Z add.s64 %rd372, %rd64, %rd371; 2026-02-21T09:19:14.1581079Z add.s64 %rd286, %rd372, 32; 2026-02-21T09:19:14.1581141Z cvt.s64.s32 %rd373, %r7191; 2026-02-21T09:19:14.1581213Z or.b64 %rd374, %rd373, %rd1113; 2026-02-21T09:19:14.1599591Z shl.b64 %rd375, %rd374, 1; 2026-02-21T09:19:14.1599711Z add.s64 %rd376, %rd64, %rd375; 2026-02-21T09:19:14.1599784Z add.s64 %rd287, %rd376, 32; 2026-02-21T09:19:14.1599859Z cvt.s64.s32 %rd377, %r7192; 2026-02-21T09:19:14.1599931Z or.b64 %rd378, %rd377, %rd1113; 2026-02-21T09:19:14.1599997Z shl.b64 %rd379, %rd378, 1; 2026-02-21T09:19:14.1600061Z add.s64 %rd380, %rd64, %rd379; 2026-02-21T09:19:14.1600283Z add.s64 %rd288, %rd380, 32; 2026-02-21T09:19:14.1600349Z cvt.s64.s32 %rd381, %r7193; 2026-02-21T09:19:14.1600412Z or.b64 %rd382, %rd381, 
%rd1113; 2026-02-21T09:19:14.1600474Z shl.b64 %rd383, %rd382, 1; 2026-02-21T09:19:14.1600619Z add.s64 %rd384, %rd64, %rd383; 2026-02-21T09:19:14.1600678Z add.s64 %rd289, %rd384, 32; 2026-02-21T09:19:14.1600737Z cvt.s64.s32 %rd385, %r7194; 2026-02-21T09:19:14.1600801Z or.b64 %rd386, %rd385, %rd1113; 2026-02-21T09:19:14.1600867Z shl.b64 %rd387, %rd386, 1; 2026-02-21T09:19:14.1600927Z add.s64 %rd388, %rd64, %rd387; 2026-02-21T09:19:14.1600992Z add.s64 %rd290, %rd388, 32; 2026-02-21T09:19:14.1601069Z cvt.s64.s32 %rd389, %r7195; 2026-02-21T09:19:14.1601134Z or.b64 %rd390, %rd389, %rd1113; 2026-02-21T09:19:14.1601273Z shl.b64 %rd391, %rd390, 1; 2026-02-21T09:19:14.1601340Z add.s64 %rd392, %rd64, %rd391; 2026-02-21T09:19:14.1601406Z add.s64 %rd291, %rd392, 32; 2026-02-21T09:19:14.1601465Z cvt.s64.s32 %rd393, %r7196; 2026-02-21T09:19:14.1601529Z or.b64 %rd394, %rd393, %rd1113; 2026-02-21T09:19:14.1601596Z shl.b64 %rd395, %rd394, 1; 2026-02-21T09:19:14.1601660Z add.s64 %rd396, %rd64, %rd395; 2026-02-21T09:19:14.1601723Z add.s64 %rd292, %rd396, 32; 2026-02-21T09:19:14.1601790Z cvt.s64.s32 %rd397, %r7197; 2026-02-21T09:19:14.1601853Z or.b64 %rd398, %rd397, %rd1113; 2026-02-21T09:19:14.1601915Z shl.b64 %rd399, %rd398, 1; 2026-02-21T09:19:14.1601983Z add.s64 %rd400, %rd64, %rd399; 2026-02-21T09:19:14.1602053Z add.s64 %rd293, %rd400, 32; 2026-02-21T09:19:14.1602296Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1602361Z // begin inline asm 2026-02-21T09:19:14.1602521Z cp.async.ca.shared.global [ %r67 + 0 ], [ %rd286 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1602582Z // end inline asm 2026-02-21T09:19:14.1602641Z // begin inline asm 2026-02-21T09:19:14.1602776Z cp.async.ca.shared.global [ %r68 + 0 ], [ %rd287 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1602839Z // end inline asm 2026-02-21T09:19:14.1602899Z // begin inline asm 2026-02-21T09:19:14.1603027Z cp.async.ca.shared.global [ %r69 + 0 ], [ %rd288 + 0 ], 0x8, %r6764; 
2026-02-21T09:19:14.1603089Z // end inline asm 2026-02-21T09:19:14.1603148Z // begin inline asm 2026-02-21T09:19:14.1603273Z cp.async.ca.shared.global [ %r70 + 0 ], [ %rd289 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1603332Z // end inline asm 2026-02-21T09:19:14.1603390Z // begin inline asm 2026-02-21T09:19:14.1603518Z cp.async.ca.shared.global [ %r71 + 0 ], [ %rd290 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1603574Z // end inline asm 2026-02-21T09:19:14.1603640Z // begin inline asm 2026-02-21T09:19:14.1603765Z cp.async.ca.shared.global [ %r72 + 0 ], [ %rd291 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1603822Z // end inline asm 2026-02-21T09:19:14.1603960Z // begin inline asm 2026-02-21T09:19:14.1604096Z cp.async.ca.shared.global [ %r73 + 0 ], [ %rd292 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1604157Z // end inline asm 2026-02-21T09:19:14.1604215Z // begin inline asm 2026-02-21T09:19:14.1604356Z cp.async.ca.shared.global [ %r74 + 0 ], [ %rd293 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1604413Z // end inline asm 2026-02-21T09:19:14.1604494Z cp.async.commit_group; 2026-02-21T09:19:14.1604725Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1604793Z add.s32 %r7207, %r7181, %r75; 2026-02-21T09:19:14.1604997Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1605064Z cvt.s64.s32 %rd401, %r7207; 2026-02-21T09:19:14.1605131Z add.s64 %rd294, %rd65, %rd401; 2026-02-21T09:19:14.1605328Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1605389Z // begin inline asm 2026-02-21T09:19:14.1605537Z cp.async.ca.shared.global [ %r76 + 0 ], [ %rd294 + 0 ], 0x4, %r23252; 2026-02-21T09:19:14.1605654Z // end inline asm 2026-02-21T09:19:14.1605721Z cp.async.commit_group; 2026-02-21T09:19:14.1605927Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1605990Z add.s64 %rd295, %rd372, 64; 
2026-02-21T09:19:14.1606097Z add.s64 %rd296, %rd376, 64; 2026-02-21T09:19:14.1606163Z add.s64 %rd297, %rd380, 64; 2026-02-21T09:19:14.1606223Z add.s64 %rd298, %rd384, 64; 2026-02-21T09:19:14.1606286Z add.s64 %rd299, %rd388, 64; 2026-02-21T09:19:14.1606344Z add.s64 %rd300, %rd392, 64; 2026-02-21T09:19:14.1606410Z add.s64 %rd301, %rd396, 64; 2026-02-21T09:19:14.1606613Z add.s64 %rd302, %rd400, 64; 2026-02-21T09:19:14.1606836Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1606994Z bar.sync 0; 2026-02-21T09:19:14.1607070Z // begin inline asm 2026-02-21T09:19:14.1607219Z cp.async.ca.shared.global [ %r77 + 0 ], [ %rd295 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1607281Z // end inline asm 2026-02-21T09:19:14.1607347Z // begin inline asm 2026-02-21T09:19:14.1607484Z cp.async.ca.shared.global [ %r78 + 0 ], [ %rd296 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1607544Z // end inline asm 2026-02-21T09:19:14.1607761Z // begin inline asm 2026-02-21T09:19:14.1608014Z cp.async.ca.shared.global [ %r79 + 0 ], [ %rd297 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1608300Z // end inline asm 2026-02-21T09:19:14.1608460Z // begin inline asm 2026-02-21T09:19:14.1608695Z cp.async.ca.shared.global [ %r80 + 0 ], [ %rd298 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1608972Z // end inline asm 2026-02-21T09:19:14.1609135Z // begin inline asm 2026-02-21T09:19:14.1609387Z cp.async.ca.shared.global [ %r81 + 0 ], [ %rd299 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1609664Z // end inline asm 2026-02-21T09:19:14.1609825Z // begin inline asm 2026-02-21T09:19:14.1610054Z cp.async.ca.shared.global [ %r82 + 0 ], [ %rd300 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1610344Z // end inline asm 2026-02-21T09:19:14.1610504Z // begin inline asm 2026-02-21T09:19:14.1610730Z cp.async.ca.shared.global [ %r83 + 0 ], [ %rd301 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1611005Z // end inline asm 2026-02-21T09:19:14.1611149Z // begin inline asm 2026-02-21T09:19:14.1611374Z 
cp.async.ca.shared.global [ %r84 + 0 ], [ %rd302 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1611650Z // end inline asm 2026-02-21T09:19:14.1611815Z cp.async.commit_group; 2026-02-21T09:19:14.1612141Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1612514Z add.s32 %r7208, %r7181, %r85; 2026-02-21T09:19:14.1612859Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1613219Z cvt.s64.s32 %rd402, %r7208; 2026-02-21T09:19:14.1613495Z add.s64 %rd303, %rd65, %rd402; 2026-02-21T09:19:14.1613826Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1614181Z // begin inline asm 2026-02-21T09:19:14.1614418Z cp.async.ca.shared.global [ %r86 + 0 ], [ %rd303 + 0 ], 0x4, %r23252; 2026-02-21T09:19:14.1614694Z // end inline asm 2026-02-21T09:19:14.1614852Z cp.async.commit_group; 2026-02-21T09:19:14.1615250Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1615613Z add.s64 %rd304, %rd372, 96; 2026-02-21T09:19:14.1615801Z add.s64 %rd305, %rd376, 96; 2026-02-21T09:19:14.1615979Z add.s64 %rd306, %rd380, 96; 2026-02-21T09:19:14.1616159Z add.s64 %rd307, %rd384, 96; 2026-02-21T09:19:14.1616333Z add.s64 %rd308, %rd388, 96; 2026-02-21T09:19:14.1616634Z add.s64 %rd309, %rd392, 96; 2026-02-21T09:19:14.1616808Z add.s64 %rd310, %rd396, 96; 2026-02-21T09:19:14.1616988Z add.s64 %rd311, %rd400, 96; 2026-02-21T09:19:14.1617302Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1617658Z // begin inline asm 2026-02-21T09:19:14.1617990Z cp.async.ca.shared.global [ %r87 + 0 ], [ %rd304 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1618261Z // end inline asm 2026-02-21T09:19:14.1618420Z // begin inline asm 2026-02-21T09:19:14.1618651Z cp.async.ca.shared.global [ %r88 + 0 ], [ %rd305 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1619008Z // end inline asm 2026-02-21T09:19:14.1619162Z 
// begin inline asm 2026-02-21T09:19:14.1619404Z cp.async.ca.shared.global [ %r89 + 0 ], [ %rd306 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1619685Z // end inline asm 2026-02-21T09:19:14.1619848Z // begin inline asm 2026-02-21T09:19:14.1620076Z cp.async.ca.shared.global [ %r90 + 0 ], [ %rd307 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1620342Z // end inline asm 2026-02-21T09:19:14.1620502Z // begin inline asm 2026-02-21T09:19:14.1620804Z cp.async.ca.shared.global [ %r91 + 0 ], [ %rd308 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1621083Z // end inline asm 2026-02-21T09:19:14.1621236Z // begin inline asm 2026-02-21T09:19:14.1621470Z cp.async.ca.shared.global [ %r92 + 0 ], [ %rd309 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1621735Z // end inline asm 2026-02-21T09:19:14.1621896Z // begin inline asm 2026-02-21T09:19:14.1622120Z cp.async.ca.shared.global [ %r93 + 0 ], [ %rd310 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1622394Z // end inline asm 2026-02-21T09:19:14.1622553Z // begin inline asm 2026-02-21T09:19:14.1622776Z cp.async.ca.shared.global [ %r94 + 0 ], [ %rd311 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1623049Z // end inline asm 2026-02-21T09:19:14.1623204Z cp.async.commit_group; 2026-02-21T09:19:14.1623533Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1623902Z add.s32 %r7209, %r7181, %r95; 2026-02-21T09:19:14.1624250Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1624618Z cvt.s64.s32 %rd403, %r7209; 2026-02-21T09:19:14.1624806Z add.s64 %rd312, %rd65, %rd403; 2026-02-21T09:19:14.1625139Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1625501Z // begin inline asm 2026-02-21T09:19:14.1625769Z cp.async.ca.shared.global [ %r96 + 0 ], [ %rd312 + 0 ], 0x4, %r23252; 2026-02-21T09:19:14.1626056Z // end inline asm 2026-02-21T09:19:14.1626222Z cp.async.commit_group; 2026-02-21T09:19:14.1626650Z .loc 1 51 32 // 
co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1627022Z add.s64 %rd313, %rd372, 128; 2026-02-21T09:19:14.1627214Z add.s64 %rd314, %rd376, 128; 2026-02-21T09:19:14.1627395Z add.s64 %rd315, %rd380, 128; 2026-02-21T09:19:14.1627576Z add.s64 %rd316, %rd384, 128; 2026-02-21T09:19:14.1627752Z add.s64 %rd317, %rd388, 128; 2026-02-21T09:19:14.1627941Z add.s64 %rd318, %rd392, 128; 2026-02-21T09:19:14.1628210Z add.s64 %rd319, %rd396, 128; 2026-02-21T09:19:14.1628406Z add.s64 %rd320, %rd400, 128; 2026-02-21T09:19:14.1628805Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1629160Z bar.sync 0; 2026-02-21T09:19:14.1629318Z // begin inline asm 2026-02-21T09:19:14.1629559Z cp.async.ca.shared.global [ %r97 + 0 ], [ %rd313 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1629842Z // end inline asm 2026-02-21T09:19:14.1629992Z // begin inline asm 2026-02-21T09:19:14.1630226Z cp.async.ca.shared.global [ %r98 + 0 ], [ %rd314 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1630503Z // end inline asm 2026-02-21T09:19:14.1630659Z // begin inline asm 2026-02-21T09:19:14.1630891Z cp.async.ca.shared.global [ %r99 + 0 ], [ %rd315 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1631168Z // end inline asm 2026-02-21T09:19:14.1631323Z // begin inline asm 2026-02-21T09:19:14.1631564Z cp.async.ca.shared.global [ %r100 + 0 ], [ %rd316 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1631839Z // end inline asm 2026-02-21T09:19:14.1631992Z // begin inline asm 2026-02-21T09:19:14.1632220Z cp.async.ca.shared.global [ %r101 + 0 ], [ %rd317 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1632605Z // end inline asm 2026-02-21T09:19:14.1632750Z // begin inline asm 2026-02-21T09:19:14.1632983Z cp.async.ca.shared.global [ %r102 + 0 ], [ %rd318 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1633341Z // end inline asm 2026-02-21T09:19:14.1633504Z // begin inline asm 2026-02-21T09:19:14.1633751Z cp.async.ca.shared.global [ %r103 + 0 ], [ %rd319 + 0 ], 0x8, %r6764; 
2026-02-21T09:19:14.1634033Z // end inline asm 2026-02-21T09:19:14.1634190Z // begin inline asm 2026-02-21T09:19:14.1634417Z cp.async.ca.shared.global [ %r104 + 0 ], [ %rd320 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1634694Z // end inline asm 2026-02-21T09:19:14.1634849Z cp.async.commit_group; 2026-02-21T09:19:14.1635255Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1635630Z add.s32 %r7210, %r7181, %r105; 2026-02-21T09:19:14.1635953Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1636319Z cvt.s64.s32 %rd404, %r7210; 2026-02-21T09:19:14.1636636Z add.s64 %rd321, %rd65, %rd404; 2026-02-21T09:19:14.1636969Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1637321Z // begin inline asm 2026-02-21T09:19:14.1637564Z cp.async.ca.shared.global [ %r106 + 0 ], [ %rd321 + 0 ], 0x4, %r23252; 2026-02-21T09:19:14.1637847Z // end inline asm 2026-02-21T09:19:14.1638002Z cp.async.commit_group; 2026-02-21T09:19:14.1638317Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1638667Z add.s64 %rd322, %rd372, 160; 2026-02-21T09:19:14.1638854Z add.s64 %rd323, %rd376, 160; 2026-02-21T09:19:14.1639031Z add.s64 %rd324, %rd380, 160; 2026-02-21T09:19:14.1639212Z add.s64 %rd325, %rd384, 160; 2026-02-21T09:19:14.1639387Z add.s64 %rd326, %rd388, 160; 2026-02-21T09:19:14.1639564Z add.s64 %rd327, %rd392, 160; 2026-02-21T09:19:14.1639752Z add.s64 %rd328, %rd396, 160; 2026-02-21T09:19:14.1639938Z add.s64 %rd329, %rd400, 160; 2026-02-21T09:19:14.1640255Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1640604Z // begin inline asm 2026-02-21T09:19:14.1640842Z cp.async.ca.shared.global [ %r107 + 0 ], [ %rd322 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1641115Z // end inline asm 2026-02-21T09:19:14.1641268Z // begin inline asm 2026-02-21T09:19:14.1641494Z 
cp.async.ca.shared.global [ %r108 + 0 ], [ %rd323 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1641772Z // end inline asm 2026-02-21T09:19:14.1641928Z // begin inline asm 2026-02-21T09:19:14.1642155Z cp.async.ca.shared.global [ %r109 + 0 ], [ %rd324 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1642431Z // end inline asm 2026-02-21T09:19:14.1642666Z // begin inline asm 2026-02-21T09:19:14.1642904Z cp.async.ca.shared.global [ %r110 + 0 ], [ %rd325 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1643178Z // end inline asm 2026-02-21T09:19:14.1643332Z // begin inline asm 2026-02-21T09:19:14.1643562Z cp.async.ca.shared.global [ %r111 + 0 ], [ %rd326 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1643831Z // end inline asm 2026-02-21T09:19:14.1643985Z // begin inline asm 2026-02-21T09:19:14.1644213Z cp.async.ca.shared.global [ %r112 + 0 ], [ %rd327 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1644491Z // end inline asm 2026-02-21T09:19:14.1644643Z // begin inline asm 2026-02-21T09:19:14.1644866Z cp.async.ca.shared.global [ %r113 + 0 ], [ %rd328 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1645144Z // end inline asm 2026-02-21T09:19:14.1645291Z // begin inline asm 2026-02-21T09:19:14.1645523Z cp.async.ca.shared.global [ %r114 + 0 ], [ %rd329 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1645795Z // end inline asm 2026-02-21T09:19:14.1645958Z cp.async.commit_group; 2026-02-21T09:19:14.1646273Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1646854Z add.s32 %r7211, %r7181, %r115; 2026-02-21T09:19:14.1647188Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1647539Z cvt.s64.s32 %rd405, %r7211; 2026-02-21T09:19:14.1647806Z add.s64 %rd330, %rd65, %rd405; 2026-02-21T09:19:14.1648125Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1648480Z // begin inline asm 2026-02-21T09:19:14.1648715Z cp.async.ca.shared.global [ %r116 + 0 ], [ %rd330 + 0 ], 0x4, %r23252; 
2026-02-21T09:19:14.1649001Z // end inline asm 2026-02-21T09:19:14.1649160Z cp.async.commit_group; 2026-02-21T09:19:14.1649468Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1649895Z add.s64 %rd331, %rd372, 192; 2026-02-21T09:19:14.1650082Z add.s64 %rd332, %rd376, 192; 2026-02-21T09:19:14.1650264Z add.s64 %rd333, %rd380, 192; 2026-02-21T09:19:14.1650441Z add.s64 %rd334, %rd384, 192; 2026-02-21T09:19:14.1650621Z add.s64 %rd335, %rd388, 192; 2026-02-21T09:19:14.1650796Z add.s64 %rd336, %rd392, 192; 2026-02-21T09:19:14.1650977Z add.s64 %rd337, %rd396, 192; 2026-02-21T09:19:14.1651152Z add.s64 %rd338, %rd400, 192; 2026-02-21T09:19:14.1651470Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1651817Z bar.sync 0; 2026-02-21T09:19:14.1651963Z // begin inline asm 2026-02-21T09:19:14.1652200Z cp.async.ca.shared.global [ %r117 + 0 ], [ %rd331 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1652473Z // end inline asm 2026-02-21T09:19:14.1652628Z // begin inline asm 2026-02-21T09:19:14.1652859Z cp.async.ca.shared.global [ %r118 + 0 ], [ %rd332 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1653140Z // end inline asm 2026-02-21T09:19:14.1653295Z // begin inline asm 2026-02-21T09:19:14.1653521Z cp.async.ca.shared.global [ %r119 + 0 ], [ %rd333 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1653801Z // end inline asm 2026-02-21T09:19:14.1653950Z // begin inline asm 2026-02-21T09:19:14.1654179Z cp.async.ca.shared.global [ %r120 + 0 ], [ %rd334 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1654449Z // end inline asm 2026-02-21T09:19:14.1654602Z // begin inline asm 2026-02-21T09:19:14.1654830Z cp.async.ca.shared.global [ %r121 + 0 ], [ %rd335 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1655105Z // end inline asm 2026-02-21T09:19:14.1655254Z // begin inline asm 2026-02-21T09:19:14.1655489Z cp.async.ca.shared.global [ %r122 + 0 ], [ %rd336 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1655768Z // end inline asm 
2026-02-21T09:19:14.1655917Z // begin inline asm 2026-02-21T09:19:14.1656148Z cp.async.ca.shared.global [ %r123 + 0 ], [ %rd337 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1656419Z // end inline asm 2026-02-21T09:19:14.1656780Z // begin inline asm 2026-02-21T09:19:14.1657011Z cp.async.ca.shared.global [ %r124 + 0 ], [ %rd338 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1657288Z // end inline asm 2026-02-21T09:19:14.1657443Z cp.async.commit_group; 2026-02-21T09:19:14.1657751Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1658106Z add.s32 %r7212, %r7181, %r125; 2026-02-21T09:19:14.1658421Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1658775Z cvt.s64.s32 %rd406, %r7212; 2026-02-21T09:19:14.1658951Z add.s64 %rd339, %rd65, %rd406; 2026-02-21T09:19:14.1659267Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1659609Z // begin inline asm 2026-02-21T09:19:14.1659840Z cp.async.ca.shared.global [ %r126 + 0 ], [ %rd339 + 0 ], 0x4, %r23252; 2026-02-21T09:19:14.1660119Z // end inline asm 2026-02-21T09:19:14.1660276Z cp.async.commit_group; 2026-02-21T09:19:14.1660586Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1661018Z add.s64 %rd340, %rd372, 224; 2026-02-21T09:19:14.1661199Z add.s64 %rd341, %rd376, 224; 2026-02-21T09:19:14.1661375Z add.s64 %rd342, %rd380, 224; 2026-02-21T09:19:14.1661553Z add.s64 %rd343, %rd384, 224; 2026-02-21T09:19:14.1661726Z add.s64 %rd344, %rd388, 224; 2026-02-21T09:19:14.1661969Z add.s64 %rd345, %rd392, 224; 2026-02-21T09:19:14.1662143Z add.s64 %rd346, %rd396, 224; 2026-02-21T09:19:14.1662320Z add.s64 %rd347, %rd400, 224; 2026-02-21T09:19:14.1662646Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1662990Z // begin inline asm 2026-02-21T09:19:14.1663216Z cp.async.ca.shared.global [ %r127 + 0 ], [ %rd340 + 0 
], 0x8, %r6764; 2026-02-21T09:19:14.1663487Z // end inline asm 2026-02-21T09:19:14.1663707Z // begin inline asm 2026-02-21T09:19:14.1663937Z cp.async.ca.shared.global [ %r128 + 0 ], [ %rd341 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1664209Z // end inline asm 2026-02-21T09:19:14.1664361Z // begin inline asm 2026-02-21T09:19:14.1664586Z cp.async.ca.shared.global [ %r129 + 0 ], [ %rd342 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1664863Z // end inline asm 2026-02-21T09:19:14.1665009Z // begin inline asm 2026-02-21T09:19:14.1665238Z cp.async.ca.shared.global [ %r130 + 0 ], [ %rd343 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1665510Z // end inline asm 2026-02-21T09:19:14.1665663Z // begin inline asm 2026-02-21T09:19:14.1665889Z cp.async.ca.shared.global [ %r131 + 0 ], [ %rd344 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1666165Z // end inline asm 2026-02-21T09:19:14.1666321Z // begin inline asm 2026-02-21T09:19:14.1666674Z cp.async.ca.shared.global [ %r132 + 0 ], [ %rd345 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1666956Z // end inline asm 2026-02-21T09:19:14.1667102Z // begin inline asm 2026-02-21T09:19:14.1667336Z cp.async.ca.shared.global [ %r133 + 0 ], [ %rd346 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1667606Z // end inline asm 2026-02-21T09:19:14.1667760Z // begin inline asm 2026-02-21T09:19:14.1667988Z cp.async.ca.shared.global [ %r134 + 0 ], [ %rd347 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1668264Z // end inline asm 2026-02-21T09:19:14.1668428Z cp.async.commit_group; 2026-02-21T09:19:14.1668833Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1669191Z add.s32 %r7213, %r7181, %r135; 2026-02-21T09:19:14.1669505Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1669863Z cvt.s64.s32 %rd407, %r7213; 2026-02-21T09:19:14.1670038Z add.s64 %rd348, %rd65, %rd407; 2026-02-21T09:19:14.1670361Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 
2026-02-21T09:19:14.1670711Z // begin inline asm 2026-02-21T09:19:14.1671026Z cp.async.ca.shared.global [ %r136 + 0 ], [ %rd348 + 0 ], 0x4, %r23252; 2026-02-21T09:19:14.1671312Z // end inline asm 2026-02-21T09:19:14.1671469Z cp.async.commit_group; 2026-02-21T09:19:14.1671786Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1672138Z add.s64 %rd349, %rd372, 256; 2026-02-21T09:19:14.1672332Z add.s64 %rd350, %rd376, 256; 2026-02-21T09:19:14.1672511Z add.s64 %rd351, %rd380, 256; 2026-02-21T09:19:14.1672693Z add.s64 %rd352, %rd384, 256; 2026-02-21T09:19:14.1672870Z add.s64 %rd353, %rd388, 256; 2026-02-21T09:19:14.1673043Z add.s64 %rd354, %rd392, 256; 2026-02-21T09:19:14.1673221Z add.s64 %rd355, %rd396, 256; 2026-02-21T09:19:14.1673397Z add.s64 %rd356, %rd400, 256; 2026-02-21T09:19:14.1673709Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1674052Z bar.sync 0; 2026-02-21T09:19:14.1674205Z // begin inline asm 2026-02-21T09:19:14.1674433Z cp.async.ca.shared.global [ %r137 + 0 ], [ %rd349 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1674710Z // end inline asm 2026-02-21T09:19:14.1674862Z // begin inline asm 2026-02-21T09:19:14.1675167Z cp.async.ca.shared.global [ %r138 + 0 ], [ %rd350 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1675439Z // end inline asm 2026-02-21T09:19:14.1675601Z // begin inline asm 2026-02-21T09:19:14.1675829Z cp.async.ca.shared.global [ %r139 + 0 ], [ %rd351 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1676163Z // end inline asm 2026-02-21T09:19:14.1676313Z // begin inline asm 2026-02-21T09:19:14.1676651Z cp.async.ca.shared.global [ %r140 + 0 ], [ %rd352 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1676929Z // end inline asm 2026-02-21T09:19:14.1677082Z // begin inline asm 2026-02-21T09:19:14.1677306Z cp.async.ca.shared.global [ %r141 + 0 ], [ %rd353 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1677581Z // end inline asm 2026-02-21T09:19:14.1677728Z // begin inline asm 
2026-02-21T09:19:14.1678044Z cp.async.ca.shared.global [ %r142 + 0 ], [ %rd354 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1678321Z // end inline asm 2026-02-21T09:19:14.1678475Z // begin inline asm 2026-02-21T09:19:14.1678702Z cp.async.ca.shared.global [ %r143 + 0 ], [ %rd355 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1678979Z // end inline asm 2026-02-21T09:19:14.1679133Z // begin inline asm 2026-02-21T09:19:14.1679362Z cp.async.ca.shared.global [ %r144 + 0 ], [ %rd356 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1679639Z // end inline asm 2026-02-21T09:19:14.1679795Z cp.async.commit_group; 2026-02-21T09:19:14.1680106Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1680467Z add.s32 %r7214, %r7181, %r145; 2026-02-21T09:19:14.1680800Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1681150Z cvt.s64.s32 %rd408, %r7214; 2026-02-21T09:19:14.1681337Z add.s64 %rd357, %rd65, %rd408; 2026-02-21T09:19:14.1681659Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1682006Z // begin inline asm 2026-02-21T09:19:14.1682249Z cp.async.ca.shared.global [ %r146 + 0 ], [ %rd357 + 0 ], 0x4, %r23252; 2026-02-21T09:19:14.1682526Z // end inline asm 2026-02-21T09:19:14.1682687Z cp.async.commit_group; 2026-02-21T09:19:14.1682989Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1683344Z add.s64 %rd358, %rd372, 288; 2026-02-21T09:19:14.1683521Z add.s64 %rd359, %rd376, 288; 2026-02-21T09:19:14.1683702Z add.s64 %rd360, %rd380, 288; 2026-02-21T09:19:14.1683881Z add.s64 %rd361, %rd384, 288; 2026-02-21T09:19:14.1684054Z add.s64 %rd362, %rd388, 288; 2026-02-21T09:19:14.1684232Z add.s64 %rd363, %rd392, 288; 2026-02-21T09:19:14.1684405Z add.s64 %rd364, %rd396, 288; 2026-02-21T09:19:14.1684583Z add.s64 %rd365, %rd400, 288; 2026-02-21T09:19:14.1684996Z .loc 1 51 80 // 
co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1685364Z // begin inline asm 2026-02-21T09:19:14.1685593Z cp.async.ca.shared.global [ %r147 + 0 ], [ %rd358 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1685872Z // end inline asm 2026-02-21T09:19:14.1686024Z // begin inline asm 2026-02-21T09:19:14.1686249Z cp.async.ca.shared.global [ %r148 + 0 ], [ %rd359 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1686632Z // end inline asm 2026-02-21T09:19:14.1686779Z // begin inline asm 2026-02-21T09:19:14.1687001Z cp.async.ca.shared.global [ %r149 + 0 ], [ %rd360 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1687265Z // end inline asm 2026-02-21T09:19:14.1687428Z // begin inline asm 2026-02-21T09:19:14.1687651Z cp.async.ca.shared.global [ %r150 + 0 ], [ %rd361 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1687922Z // end inline asm 2026-02-21T09:19:14.1688071Z // begin inline asm 2026-02-21T09:19:14.1688310Z cp.async.ca.shared.global [ %r151 + 0 ], [ %rd362 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1688586Z // end inline asm 2026-02-21T09:19:14.1688732Z // begin inline asm 2026-02-21T09:19:14.1688958Z cp.async.ca.shared.global [ %r152 + 0 ], [ %rd363 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1689318Z // end inline asm 2026-02-21T09:19:14.1689469Z // begin inline asm 2026-02-21T09:19:14.1689692Z cp.async.ca.shared.global [ %r153 + 0 ], [ %rd364 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1689964Z // end inline asm 2026-02-21T09:19:14.1690181Z // begin inline asm 2026-02-21T09:19:14.1690406Z cp.async.ca.shared.global [ %r154 + 0 ], [ %rd365 + 0 ], 0x8, %r6764; 2026-02-21T09:19:14.1690682Z // end inline asm 2026-02-21T09:19:14.1690834Z cp.async.commit_group; 2026-02-21T09:19:14.1691147Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1691500Z add.s32 %r7215, %r7181, %r155; 2026-02-21T09:19:14.1691892Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1692255Z cvt.s64.s32 %rd409, 
%r7215; 2026-02-21T09:19:14.1692448Z add.s64 %rd366, %rd65, %rd409; 2026-02-21T09:19:14.1692778Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1693128Z // begin inline asm 2026-02-21T09:19:14.1693368Z cp.async.ca.shared.global [ %r156 + 0 ], [ %rd366 + 0 ], 0x4, %r23252; 2026-02-21T09:19:14.1693645Z // end inline asm 2026-02-21T09:19:14.1693805Z cp.async.commit_group; 2026-02-21T09:19:14.1694109Z .loc 1 43 78 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:43:78 2026-02-21T09:19:14.1694466Z add.s32 %r7216, %r7176, %r326; 2026-02-21T09:19:14.1694659Z sub.s32 %r7217, %r7216, %r7178; 2026-02-21T09:19:14.1694851Z shl.b32 %r7218, %r7217, 7; 2026-02-21T09:19:14.1695037Z add.s32 %r23250, %r177, %r7218; 2026-02-21T09:19:14.1695223Z or.b32 %r7219, %r17, %r852; 2026-02-21T09:19:14.1695406Z shl.b32 %r7220, %r7219, 10; 2026-02-21T09:19:14.1695591Z mul.wide.s32 %rd19, %r7220, 2; 2026-02-21T09:19:14.1695777Z or.b32 %r7221, %r16, %r852; 2026-02-21T09:19:14.1695949Z shl.b32 %r7222, %r7221, 10; 2026-02-21T09:19:14.1696129Z mul.wide.s32 %rd20, %r7222, 2; 2026-02-21T09:19:14.1696319Z or.b32 %r7223, %r15, %r852; 2026-02-21T09:19:14.1696620Z shl.b32 %r7224, %r7223, 10; 2026-02-21T09:19:14.1696810Z mul.wide.s32 %rd21, %r7224, 2; 2026-02-21T09:19:14.1696990Z or.b32 %r7225, %r14, %r852; 2026-02-21T09:19:14.1697168Z shl.b32 %r7226, %r7225, 10; 2026-02-21T09:19:14.1697343Z mul.wide.s32 %rd22, %r7226, 2; 2026-02-21T09:19:14.1697526Z or.b32 %r7227, %r13, %r852; 2026-02-21T09:19:14.1697698Z shl.b32 %r7228, %r7227, 10; 2026-02-21T09:19:14.1697877Z mul.wide.s32 %rd23, %r7228, 2; 2026-02-21T09:19:14.1698058Z or.b32 %r7229, %r12, %r852; 2026-02-21T09:19:14.1698233Z shl.b32 %r7230, %r7229, 10; 2026-02-21T09:19:14.1698413Z mul.wide.s32 %rd24, %r7230, 2; 2026-02-21T09:19:14.1698593Z shl.b32 %r7231, %r7177, 19; 2026-02-21T09:19:14.1698868Z or.b32 %r7232, %r22984, %r7231; 2026-02-21T09:19:14.1699064Z mul.wide.s32 %rd25, %r7232, 2; 
2026-02-21T09:19:14.1699254Z or.b32 %r23249, %r185, %r7231; 2026-02-21T09:19:14.1699442Z mov.b32 %r23253, 0f00000000; 2026-02-21T09:19:14.1699625Z mov.b32 %r23251, -1; 2026-02-21T09:19:14.1699789Z mov.b64 %rd1117, -16; 2026-02-21T09:19:14.1699963Z mov.b64 %rd1116, %rd3; 2026-02-21T09:19:14.1700139Z mov.b32 %r23254, %r23253; 2026-02-21T09:19:14.1700318Z mov.b32 %r23255, %r23253; 2026-02-21T09:19:14.1700489Z mov.b32 %r23256, %r23253; 2026-02-21T09:19:14.1700655Z mov.b32 %r23257, %r23253; 2026-02-21T09:19:14.1700825Z mov.b32 %r23258, %r23253; 2026-02-21T09:19:14.1700992Z mov.b32 %r23259, %r23253; 2026-02-21T09:19:14.1701162Z mov.b32 %r23260, %r23253; 2026-02-21T09:19:14.1701325Z mov.b32 %r23261, %r23253; 2026-02-21T09:19:14.1701496Z mov.b32 %r23262, %r23253; 2026-02-21T09:19:14.1701664Z mov.b32 %r23263, %r23253; 2026-02-21T09:19:14.1701836Z mov.b32 %r23264, %r23253; 2026-02-21T09:19:14.1702008Z mov.b32 %r23265, %r23253; 2026-02-21T09:19:14.1702183Z mov.b32 %r23266, %r23253; 2026-02-21T09:19:14.1702362Z mov.b32 %r23267, %r23253; 2026-02-21T09:19:14.1702536Z mov.b32 %r23268, %r23253; 2026-02-21T09:19:14.1702790Z mov.b32 %r23269, %r23253; 2026-02-21T09:19:14.1702956Z mov.b32 %r23270, %r23253; 2026-02-21T09:19:14.1703130Z mov.b32 %r23271, %r23253; 2026-02-21T09:19:14.1703296Z mov.b32 %r23272, %r23253; 2026-02-21T09:19:14.1703535Z mov.b32 %r23273, %r23253; 2026-02-21T09:19:14.1703701Z mov.b32 %r23274, %r23253; 2026-02-21T09:19:14.1703875Z mov.b32 %r23275, %r23253; 2026-02-21T09:19:14.1704041Z mov.b32 %r23276, %r23253; 2026-02-21T09:19:14.1704212Z mov.b32 %r23277, %r23253; 2026-02-21T09:19:14.1704382Z mov.b32 %r23278, %r23253; 2026-02-21T09:19:14.1704562Z mov.b32 %r23279, %r23253; 2026-02-21T09:19:14.1704739Z mov.b32 %r23280, %r23253; 2026-02-21T09:19:14.1704907Z mov.b32 %r23281, %r23253; 2026-02-21T09:19:14.1705079Z mov.b32 %r23282, %r23253; 2026-02-21T09:19:14.1705317Z mov.b32 %r23283, %r23253; 2026-02-21T09:19:14.1705492Z mov.b32 %r23284, %r23253; 
2026-02-21T09:19:14.1705656Z mov.b32 %r23285, %r23253; 2026-02-21T09:19:14.1705825Z mov.b32 %r23286, %r23253; 2026-02-21T09:19:14.1705990Z mov.b32 %r23287, %r23253; 2026-02-21T09:19:14.1706159Z mov.b32 %r23288, %r23253; 2026-02-21T09:19:14.1706330Z mov.b32 %r23289, %r23253; 2026-02-21T09:19:14.1706603Z mov.b32 %r23290, %r23253; 2026-02-21T09:19:14.1706782Z mov.b32 %r23291, %r23253; 2026-02-21T09:19:14.1706946Z mov.b32 %r23292, %r23253; 2026-02-21T09:19:14.1707113Z mov.b32 %r23293, %r23253; 2026-02-21T09:19:14.1707276Z mov.b32 %r23294, %r23253; 2026-02-21T09:19:14.1707443Z mov.b32 %r23295, %r23253; 2026-02-21T09:19:14.1707608Z mov.b32 %r23296, %r23253; 2026-02-21T09:19:14.1707779Z mov.b32 %r23297, %r23253; 2026-02-21T09:19:14.1707950Z mov.b32 %r23298, %r23253; 2026-02-21T09:19:14.1708117Z mov.b32 %r23299, %r23253; 2026-02-21T09:19:14.1708294Z mov.b32 %r23300, %r23253; 2026-02-21T09:19:14.1708472Z mov.b32 %r23301, %r23253; 2026-02-21T09:19:14.1708712Z mov.b32 %r23302, %r23253; 2026-02-21T09:19:14.1708877Z mov.b32 %r23303, %r23253; 2026-02-21T09:19:14.1709050Z mov.b32 %r23304, %r23253; 2026-02-21T09:19:14.1709218Z mov.b32 %r23305, %r23253; 2026-02-21T09:19:14.1709388Z mov.b32 %r23306, %r23253; 2026-02-21T09:19:14.1709555Z mov.b32 %r23307, %r23253; 2026-02-21T09:19:14.1709727Z mov.b32 %r23308, %r23253; 2026-02-21T09:19:14.1709898Z mov.b32 %r23309, %r23253; 2026-02-21T09:19:14.1710064Z mov.b32 %r23310, %r23253; 2026-02-21T09:19:14.1710236Z mov.b32 %r23311, %r23253; 2026-02-21T09:19:14.1710400Z mov.b32 %r23312, %r23253; 2026-02-21T09:19:14.1710571Z mov.b32 %r23313, %r23253; 2026-02-21T09:19:14.1710734Z mov.b32 %r23314, %r23253; 2026-02-21T09:19:14.1710905Z mov.b32 %r23315, %r23253; 2026-02-21T09:19:14.1711070Z mov.b32 %r23316, %r23253; 2026-02-21T09:19:14.1711239Z mov.b32 %r23317, %r23253; 2026-02-21T09:19:14.1711406Z mov.b32 %r23318, %r23253; 2026-02-21T09:19:14.1711668Z mov.b32 %r23319, %r23253; 2026-02-21T09:19:14.1711847Z mov.b32 %r23320, %r23253; 
2026-02-21T09:19:14.1712011Z mov.b32 %r23321, %r23253; 2026-02-21T09:19:14.1712187Z mov.b32 %r23322, %r23253; 2026-02-21T09:19:14.1712355Z mov.b32 %r23323, %r23253; 2026-02-21T09:19:14.1712528Z mov.b32 %r23324, %r23253; 2026-02-21T09:19:14.1712696Z mov.b32 %r23325, %r23253; 2026-02-21T09:19:14.1712867Z mov.b32 %r23326, %r23253; 2026-02-21T09:19:14.1713036Z mov.b32 %r23327, %r23253; 2026-02-21T09:19:14.1713226Z mov.b32 %r23328, %r23253; 2026-02-21T09:19:14.1713396Z mov.b32 %r23329, %r23253; 2026-02-21T09:19:14.1713569Z mov.b32 %r23330, %r23253; 2026-02-21T09:19:14.1713738Z mov.b32 %r23331, %r23253; 2026-02-21T09:19:14.1713902Z mov.b32 %r23332, %r23253; 2026-02-21T09:19:14.1714072Z mov.b32 %r23333, %r23253; 2026-02-21T09:19:14.1714234Z mov.b32 %r23334, %r23253; 2026-02-21T09:19:14.1714403Z mov.b32 %r23335, %r23253; 2026-02-21T09:19:14.1714570Z mov.b32 %r23336, %r23253; 2026-02-21T09:19:14.1714743Z mov.b32 %r23337, %r23253; 2026-02-21T09:19:14.1714906Z mov.b32 %r23338, %r23253; 2026-02-21T09:19:14.1715076Z mov.b32 %r23339, %r23253; 2026-02-21T09:19:14.1715324Z mov.b32 %r23340, %r23253; 2026-02-21T09:19:14.1715497Z mov.b32 %r23341, %r23253; 2026-02-21T09:19:14.1715667Z mov.b32 %r23342, %r23253; 2026-02-21T09:19:14.1715830Z mov.b32 %r23343, %r23253; 2026-02-21T09:19:14.1716003Z mov.b32 %r23344, %r23253; 2026-02-21T09:19:14.1716234Z mov.b32 %r23345, %r23253; 2026-02-21T09:19:14.1716407Z mov.b32 %r23346, %r23253; 2026-02-21T09:19:14.1716697Z mov.b32 %r23347, %r23253; 2026-02-21T09:19:14.1716868Z mov.b32 %r23348, %r23253; 2026-02-21T09:19:14.1717031Z mov.b32 %r23349, %r23253; 2026-02-21T09:19:14.1717204Z mov.b32 %r23350, %r23253; 2026-02-21T09:19:14.1717369Z mov.b32 %r23351, %r23253; 2026-02-21T09:19:14.1717542Z mov.b32 %r23352, %r23253; 2026-02-21T09:19:14.1717716Z mov.b32 %r23353, %r23253; 2026-02-21T09:19:14.1717882Z mov.b32 %r23354, %r23253; 2026-02-21T09:19:14.1718152Z mov.b32 %r23355, %r23253; 2026-02-21T09:19:14.1718322Z mov.b32 %r23356, %r23253; 
2026-02-21T09:19:14.1718492Z mov.b32 %r23357, %r23253; 2026-02-21T09:19:14.1718660Z mov.b32 %r23358, %r23253; 2026-02-21T09:19:14.1718828Z mov.b32 %r23359, %r23253; 2026-02-21T09:19:14.1718997Z mov.b32 %r23360, %r23253; 2026-02-21T09:19:14.1719167Z mov.b32 %r23361, %r23253; 2026-02-21T09:19:14.1719333Z mov.b32 %r23362, %r23253; 2026-02-21T09:19:14.1719506Z mov.b32 %r23363, %r23253; 2026-02-21T09:19:14.1719687Z mov.b32 %r23364, %r23253; 2026-02-21T09:19:14.1719863Z mov.b32 %r23365, %r23253; 2026-02-21T09:19:14.1720034Z mov.b32 %r23366, %r23253; 2026-02-21T09:19:14.1720200Z mov.b32 %r23367, %r23253; 2026-02-21T09:19:14.1720370Z mov.b32 %r23368, %r23253; 2026-02-21T09:19:14.1720536Z mov.b32 %r23369, %r23253; 2026-02-21T09:19:14.1720709Z mov.b32 %r23370, %r23253; 2026-02-21T09:19:14.1720879Z mov.b32 %r23371, %r23253; 2026-02-21T09:19:14.1721054Z mov.b32 %r23372, %r23253; 2026-02-21T09:19:14.1721222Z mov.b32 %r23373, %r23253; 2026-02-21T09:19:14.1721397Z mov.b32 %r23374, %r23253; 2026-02-21T09:19:14.1721567Z mov.b32 %r23375, %r23253; 2026-02-21T09:19:14.1721737Z mov.b32 %r23376, %r23253; 2026-02-21T09:19:14.1721798Z mov.b32 %r23377, %r23253; 2026-02-21T09:19:14.1721866Z mov.b32 %r23378, %r23253; 2026-02-21T09:19:14.1721926Z mov.b32 %r23379, %r23253; 2026-02-21T09:19:14.1721990Z mov.b32 %r23380, %r23253; 2026-02-21T09:19:14.1722059Z mov.b32 %r23381, %r23253; 2026-02-21T09:19:14.1722121Z mov.b32 %r23382, %r23253; 2026-02-21T09:19:14.1722182Z mov.b32 %r23383, %r23253; 2026-02-21T09:19:14.1722244Z mov.b32 %r23384, %r23253; 2026-02-21T09:19:14.1722310Z mov.b32 %r23385, %r23253; 2026-02-21T09:19:14.1722371Z mov.b32 %r23386, %r23253; 2026-02-21T09:19:14.1722436Z mov.b32 %r23387, %r23253; 2026-02-21T09:19:14.1722513Z mov.b32 %r23388, %r23253; 2026-02-21T09:19:14.1722575Z mov.b32 %r23389, %r23253; 2026-02-21T09:19:14.1722637Z mov.b32 %r23390, %r23253; 2026-02-21T09:19:14.1722777Z mov.b32 %r23391, %r23253; 2026-02-21T09:19:14.1722850Z mov.b32 %r23392, %r23253; 
2026-02-21T09:19:14.1722912Z mov.b32 %r23393, %r23253; 2026-02-21T09:19:14.1722973Z mov.b32 %r23394, %r23253; 2026-02-21T09:19:14.1723038Z mov.b32 %r23395, %r23253; 2026-02-21T09:19:14.1723098Z mov.b32 %r23396, %r23253; 2026-02-21T09:19:14.1723159Z mov.b32 %r23397, %r23253; 2026-02-21T09:19:14.1723218Z mov.b32 %r23398, %r23253; 2026-02-21T09:19:14.1723289Z mov.b32 %r23399, %r23253; 2026-02-21T09:19:14.1723349Z mov.b32 %r23400, %r23253; 2026-02-21T09:19:14.1723412Z mov.b32 %r23401, %r23253; 2026-02-21T09:19:14.1723477Z mov.b32 %r23402, %r23253; 2026-02-21T09:19:14.1723537Z mov.b32 %r23403, %r23253; 2026-02-21T09:19:14.1723596Z mov.b32 %r23404, %r23253; 2026-02-21T09:19:14.1723657Z mov.b32 %r23405, %r23253; 2026-02-21T09:19:14.1723723Z mov.b32 %r23406, %r23253; 2026-02-21T09:19:14.1723794Z mov.b32 %r23407, %r23253; 2026-02-21T09:19:14.1723858Z mov.b32 %r23408, %r23253; 2026-02-21T09:19:14.1723925Z mov.b32 %r23409, %r23253; 2026-02-21T09:19:14.1723986Z mov.b32 %r23410, %r23253; 2026-02-21T09:19:14.1724045Z mov.b32 %r23411, %r23253; 2026-02-21T09:19:14.1724184Z mov.b32 %r23412, %r23253; 2026-02-21T09:19:14.1724244Z mov.b32 %r23413, %r23253; 2026-02-21T09:19:14.1724304Z mov.b32 %r23414, %r23253; 2026-02-21T09:19:14.1724365Z mov.b32 %r23415, %r23253; 2026-02-21T09:19:14.1724429Z mov.b32 %r23416, %r23253; 2026-02-21T09:19:14.1724549Z mov.b32 %r23417, %r23253; 2026-02-21T09:19:14.1724609Z mov.b32 %r23418, %r23253; 2026-02-21T09:19:14.1724675Z mov.b32 %r23419, %r23253; 2026-02-21T09:19:14.1724735Z mov.b32 %r23420, %r23253; 2026-02-21T09:19:14.1724794Z mov.b32 %r23421, %r23253; 2026-02-21T09:19:14.1724856Z mov.b32 %r23422, %r23253; 2026-02-21T09:19:14.1724922Z mov.b32 %r23423, %r23253; 2026-02-21T09:19:14.1724982Z mov.b32 %r23424, %r23253; 2026-02-21T09:19:14.1725042Z mov.b32 %r23425, %r23253; 2026-02-21T09:19:14.1725176Z mov.b32 %r23426, %r23253; 2026-02-21T09:19:14.1725240Z mov.b32 %r23427, %r23253; 2026-02-21T09:19:14.1725301Z mov.b32 %r23428, %r23253; 
2026-02-21T09:19:14.1725365Z mov.b32 %r23429, %r23253; 2026-02-21T09:19:14.1725434Z mov.b32 %r23430, %r23253; 2026-02-21T09:19:14.1725507Z mov.b32 %r23431, %r23253; 2026-02-21T09:19:14.1725571Z mov.b32 %r23432, %r23253; 2026-02-21T09:19:14.1725640Z mov.b32 %r23433, %r23253; 2026-02-21T09:19:14.1725706Z mov.b32 %r23434, %r23253; 2026-02-21T09:19:14.1725769Z mov.b32 %r23435, %r23253; 2026-02-21T09:19:14.1725830Z mov.b32 %r23436, %r23253; 2026-02-21T09:19:14.1725896Z mov.b32 %r23437, %r23253; 2026-02-21T09:19:14.1725956Z mov.b32 %r23438, %r23253; 2026-02-21T09:19:14.1726016Z mov.b32 %r23439, %r23253; 2026-02-21T09:19:14.1726084Z mov.b32 %r23440, %r23253; 2026-02-21T09:19:14.1726146Z mov.b32 %r23441, %r23253; 2026-02-21T09:19:14.1726207Z mov.b32 %r23442, %r23253; 2026-02-21T09:19:14.1726267Z mov.b32 %r23443, %r23253; 2026-02-21T09:19:14.1726334Z mov.b32 %r23444, %r23253; 2026-02-21T09:19:14.1726396Z mov.b32 %r23445, %r23253; 2026-02-21T09:19:14.1726588Z mov.b32 %r23446, %r23253; 2026-02-21T09:19:14.1726662Z mov.b32 %r23447, %r23253; 2026-02-21T09:19:14.1726724Z mov.b32 %r23448, %r23253; 2026-02-21T09:19:14.1726785Z mov.b32 %r23449, %r23253; 2026-02-21T09:19:14.1726844Z mov.b32 %r23450, %r23253; 2026-02-21T09:19:14.1726912Z mov.b32 %r23451, %r23253; 2026-02-21T09:19:14.1726973Z mov.b32 %r23452, %r23253; 2026-02-21T09:19:14.1727038Z mov.b32 %r23453, %r23253; 2026-02-21T09:19:14.1727105Z mov.b32 %r23454, %r23253; 2026-02-21T09:19:14.1727166Z mov.b32 %r23455, %r23253; 2026-02-21T09:19:14.1727227Z mov.b32 %r23456, %r23253; 2026-02-21T09:19:14.1727293Z mov.b32 %r23457, %r23253; 2026-02-21T09:19:14.1727355Z mov.b32 %r23458, %r23253; 2026-02-21T09:19:14.1727422Z mov.b32 %r23459, %r23253; 2026-02-21T09:19:14.1727483Z mov.b32 %r23460, %r23253; 2026-02-21T09:19:14.1727550Z mov.b32 %r23461, %r23253; 2026-02-21T09:19:14.1727613Z mov.b32 %r23462, %r23253; 2026-02-21T09:19:14.1727756Z mov.b32 %r23463, %r23253; 2026-02-21T09:19:14.1727830Z mov.b32 %r23464, %r23253; 
2026-02-21T09:19:14.1727894Z mov.b32 %r23465, %r23253; 2026-02-21T09:19:14.1727959Z mov.b32 %r23466, %r23253; 2026-02-21T09:19:14.1728019Z mov.b32 %r23467, %r23253; 2026-02-21T09:19:14.1728084Z mov.b32 %r23468, %r23253; 2026-02-21T09:19:14.1728144Z mov.b32 %r23469, %r23253; 2026-02-21T09:19:14.1728207Z mov.b32 %r23470, %r23253; 2026-02-21T09:19:14.1728276Z mov.b32 %r23471, %r23253; 2026-02-21T09:19:14.1728336Z mov.b32 %r23472, %r23253; 2026-02-21T09:19:14.1728401Z mov.b32 %r23473, %r23253; 2026-02-21T09:19:14.1728464Z mov.b32 %r23474, %r23253; 2026-02-21T09:19:14.1728530Z mov.b32 %r23475, %r23253; 2026-02-21T09:19:14.1728591Z mov.b32 %r23476, %r23253; 2026-02-21T09:19:14.1728652Z mov.b32 %r23477, %r23253; 2026-02-21T09:19:14.1728719Z mov.b32 %r23478, %r23253; 2026-02-21T09:19:14.1728781Z mov.b32 %r23479, %r23253; 2026-02-21T09:19:14.1728845Z mov.b32 %r23480, %r23253; 2026-02-21T09:19:14.1728908Z mov.b32 %r23481, %r23253; 2026-02-21T09:19:14.1728984Z mov.b32 %r23482, %r23253; 2026-02-21T09:19:14.1729048Z mov.b32 %r23483, %r23253; 2026-02-21T09:19:14.1729185Z mov.b32 %r23484, %r23253; 2026-02-21T09:19:14.1729250Z mov.b32 %r23485, %r23253; 2026-02-21T09:19:14.1729311Z mov.b32 %r23486, %r23253; 2026-02-21T09:19:14.1729374Z mov.b32 %r23487, %r23253; 2026-02-21T09:19:14.1729435Z mov.b32 %r23488, %r23253; 2026-02-21T09:19:14.1729560Z mov.b32 %r23489, %r23253; 2026-02-21T09:19:14.1729621Z mov.b32 %r23490, %r23253; 2026-02-21T09:19:14.1729681Z mov.b32 %r23491, %r23253; 2026-02-21T09:19:14.1729749Z mov.b32 %r23492, %r23253; 2026-02-21T09:19:14.1729809Z mov.b32 %r23493, %r23253; 2026-02-21T09:19:14.1729869Z mov.b32 %r23494, %r23253; 2026-02-21T09:19:14.1729928Z mov.b32 %r23495, %r23253; 2026-02-21T09:19:14.1729993Z mov.b32 %r23496, %r23253; 2026-02-21T09:19:14.1730055Z mov.b32 %r23497, %r23253; 2026-02-21T09:19:14.1730180Z mov.b32 %r23498, %r23253; 2026-02-21T09:19:14.1730250Z mov.b32 %r23499, %r23253; 2026-02-21T09:19:14.1730312Z mov.b32 %r23500, %r23253; 
2026-02-21T09:19:14.1730371Z mov.b32 %r23501, %r23253; 2026-02-21T09:19:14.1730442Z mov.b32 %r23502, %r23253; 2026-02-21T09:19:14.1730502Z mov.b32 %r23503, %r23253; 2026-02-21T09:19:14.1730562Z mov.b32 %r23504, %r23253; 2026-02-21T09:19:14.1730622Z mov.b32 %r23505, %r23253; 2026-02-21T09:19:14.1730690Z mov.b32 %r23506, %r23253; 2026-02-21T09:19:14.1730752Z mov.b32 %r23507, %r23253; 2026-02-21T09:19:14.1730811Z mov.b32 %r23508, %r23253; 2026-02-21T09:19:14.1730940Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T09:19:14.1731062Z // => This Inner Loop Header: Depth=2 2026-02-21T09:19:14.1731131Z add.s64 %rd1117, %rd1117, 16; 2026-02-21T09:19:14.1731206Z setp.lt.u64 %p40, %rd1117, 432; 2026-02-21T09:19:14.1731275Z add.s32 %r10417, %r23251, 1; 2026-02-21T09:19:14.1731343Z setp.gt.s32 %p41, %r10417, 4; 2026-02-21T09:19:14.1731417Z selp.b32 %r23251, 0, %r10417, %p41; 2026-02-21T09:19:14.1731638Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1731714Z cp.async.wait_group 16; 2026-02-21T09:19:14.1731772Z bar.sync 0; 2026-02-21T09:19:14.1731842Z shl.b32 %r10418, %r23251, 14; 2026-02-21T09:19:14.1731910Z add.s32 %r10420, %r22967, %r10418; 2026-02-21T09:19:14.1732116Z .loc 1 55 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:55:32 2026-02-21T09:19:14.1732186Z add.s32 %r10421, %r10420, %r157; 2026-02-21T09:19:14.1732261Z ld.shared.b16 %rs145, [%r10421]; 2026-02-21T09:19:14.1732335Z ld.shared.b16 %rs146, [%r10421+256]; 2026-02-21T09:19:14.1732406Z ld.shared.b16 %rs147, [%r10421+16]; 2026-02-21T09:19:14.1732480Z ld.shared.b16 %rs148, [%r10421+272]; 2026-02-21T09:19:14.1732553Z ld.shared.b16 %rs149, [%r10421+4096]; 2026-02-21T09:19:14.1732623Z ld.shared.b16 %rs150, [%r10421+4352]; 2026-02-21T09:19:14.1732747Z ld.shared.b16 %rs151, [%r10421+4112]; 2026-02-21T09:19:14.1732824Z ld.shared.b16 %rs152, [%r10421+4368]; 2026-02-21T09:19:14.1732892Z ld.shared.b16 %rs153, [%r10421+8192]; 2026-02-21T09:19:14.1732962Z 
ld.shared.b16 %rs154, [%r10421+8448]; 2026-02-21T09:19:14.1733039Z ld.shared.b16 %rs155, [%r10421+8208]; 2026-02-21T09:19:14.1733107Z ld.shared.b16 %rs156, [%r10421+8464]; 2026-02-21T09:19:14.1733182Z ld.shared.b16 %rs157, [%r10421+12288]; 2026-02-21T09:19:14.1733260Z ld.shared.b16 %rs158, [%r10421+12544]; 2026-02-21T09:19:14.1733330Z ld.shared.b16 %rs159, [%r10421+12304]; 2026-02-21T09:19:14.1733399Z ld.shared.b16 %rs160, [%r10421+12560]; 2026-02-21T09:19:14.1733465Z add.s32 %r10422, %r10420, %r158; 2026-02-21T09:19:14.1733539Z ld.shared.b16 %rs161, [%r10422]; 2026-02-21T09:19:14.1733607Z ld.shared.b16 %rs162, [%r10422+256]; 2026-02-21T09:19:14.1733676Z ld.shared.b16 %rs163, [%r10422+16]; 2026-02-21T09:19:14.1733750Z ld.shared.b16 %rs164, [%r10422+272]; 2026-02-21T09:19:14.1733822Z ld.shared.b16 %rs165, [%r10422+4096]; 2026-02-21T09:19:14.1733891Z ld.shared.b16 %rs166, [%r10422+4352]; 2026-02-21T09:19:14.1733960Z ld.shared.b16 %rs167, [%r10422+4112]; 2026-02-21T09:19:14.1734086Z ld.shared.b16 %rs168, [%r10422+4368]; 2026-02-21T09:19:14.1734154Z ld.shared.b16 %rs169, [%r10422+8192]; 2026-02-21T09:19:14.1734222Z ld.shared.b16 %rs170, [%r10422+8448]; 2026-02-21T09:19:14.1734297Z ld.shared.b16 %rs171, [%r10422+8208]; 2026-02-21T09:19:14.1734409Z ld.shared.b16 %rs172, [%r10422+8464]; 2026-02-21T09:19:14.1734481Z ld.shared.b16 %rs173, [%r10422+12288]; 2026-02-21T09:19:14.1734559Z ld.shared.b16 %rs174, [%r10422+12544]; 2026-02-21T09:19:14.1734629Z ld.shared.b16 %rs175, [%r10422+12304]; 2026-02-21T09:19:14.1734697Z ld.shared.b16 %rs176, [%r10422+12560]; 2026-02-21T09:19:14.1734767Z cvt.f32.bf16 %r7361, %rs145; 2026-02-21T09:19:14.1734840Z cvt.f32.bf16 %r7362, %rs146; 2026-02-21T09:19:14.1734952Z cvt.f32.bf16 %r7363, %rs161; 2026-02-21T09:19:14.1735031Z cvt.f32.bf16 %r7364, %rs162; 2026-02-21T09:19:14.1735110Z cvt.f32.bf16 %r7493, %rs147; 2026-02-21T09:19:14.1735176Z cvt.f32.bf16 %r7494, %rs148; 2026-02-21T09:19:14.1735241Z cvt.f32.bf16 %r7495, %rs163; 
2026-02-21T09:19:14.1735306Z cvt.f32.bf16 %r7496, %rs164; 2026-02-21T09:19:14.1735378Z cvt.f32.bf16 %r7625, %rs149; 2026-02-21T09:19:14.1735442Z cvt.f32.bf16 %r7626, %rs150; 2026-02-21T09:19:14.1735504Z cvt.f32.bf16 %r7627, %rs165; 2026-02-21T09:19:14.1735577Z cvt.f32.bf16 %r7628, %rs166; 2026-02-21T09:19:14.1735642Z cvt.f32.bf16 %r7757, %rs151; 2026-02-21T09:19:14.1735710Z cvt.f32.bf16 %r7758, %rs152; 2026-02-21T09:19:14.1735778Z cvt.f32.bf16 %r7759, %rs167; 2026-02-21T09:19:14.1735839Z cvt.f32.bf16 %r7760, %rs168; 2026-02-21T09:19:14.1735903Z cvt.f32.bf16 %r7889, %rs153; 2026-02-21T09:19:14.1735966Z cvt.f32.bf16 %r7890, %rs154; 2026-02-21T09:19:14.1736036Z cvt.f32.bf16 %r7891, %rs169; 2026-02-21T09:19:14.1736103Z cvt.f32.bf16 %r7892, %rs170; 2026-02-21T09:19:14.1736168Z cvt.f32.bf16 %r8021, %rs155; 2026-02-21T09:19:14.1736236Z cvt.f32.bf16 %r8022, %rs156; 2026-02-21T09:19:14.1736298Z cvt.f32.bf16 %r8023, %rs171; 2026-02-21T09:19:14.1736363Z cvt.f32.bf16 %r8024, %rs172; 2026-02-21T09:19:14.1736426Z cvt.f32.bf16 %r8153, %rs157; 2026-02-21T09:19:14.1736618Z cvt.f32.bf16 %r8154, %rs158; 2026-02-21T09:19:14.1736687Z cvt.f32.bf16 %r8155, %rs173; 2026-02-21T09:19:14.1736750Z cvt.f32.bf16 %r8156, %rs174; 2026-02-21T09:19:14.1736819Z cvt.f32.bf16 %r8285, %rs159; 2026-02-21T09:19:14.1736881Z cvt.f32.bf16 %r8286, %rs160; 2026-02-21T09:19:14.1736941Z cvt.f32.bf16 %r8287, %rs175; 2026-02-21T09:19:14.1737005Z cvt.f32.bf16 %r8288, %rs176; 2026-02-21T09:19:14.1737218Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1737286Z shl.b32 %r10423, %r23251, 10; 2026-02-21T09:19:14.1737352Z add.s32 %r10424, %r22967, %r10423; 2026-02-21T09:19:14.1737424Z add.s32 %r10425, %r10424, 172032; 2026-02-21T09:19:14.1737703Z .loc 1 70 45 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:70:45 2026-02-21T09:19:14.1737771Z add.s32 %r10426, %r10425, %r22973; 2026-02-21T09:19:14.1737849Z ld.shared.b8 %rs177, [%r10426]; 
2026-02-21T09:19:14.1737917Z ld.shared.b8 %rs178, [%r10426+128]; 2026-02-21T09:19:14.1737984Z ld.shared.b8 %rs179, [%r10426+256]; 2026-02-21T09:19:14.1738050Z ld.shared.b8 %rs180, [%r10426+384]; 2026-02-21T09:19:14.1738126Z ld.shared.b8 %rs181, [%r10426+512]; 2026-02-21T09:19:14.1738193Z ld.shared.b8 %rs182, [%r10426+640]; 2026-02-21T09:19:14.1738260Z ld.shared.b8 %rs183, [%r10426+768]; 2026-02-21T09:19:14.1738327Z add.s32 %r10427, %r10425, %r22974; 2026-02-21T09:19:14.1738393Z ld.shared.b8 %rs184, [%r10427]; 2026-02-21T09:19:14.1738594Z .loc 1 60 28 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:60:28 2026-02-21T09:19:14.1738666Z shl.b16 %rs185, %rs177, 4; 2026-02-21T09:19:14.1738733Z shl.b16 %rs186, %rs178, 4; 2026-02-21T09:19:14.1738797Z shl.b16 %rs187, %rs179, 4; 2026-02-21T09:19:14.1738860Z shl.b16 %rs188, %rs180, 4; 2026-02-21T09:19:14.1738926Z shl.b16 %rs189, %rs181, 4; 2026-02-21T09:19:14.1739061Z shl.b16 %rs190, %rs182, 4; 2026-02-21T09:19:14.1739126Z shl.b16 %rs191, %rs183, 4; 2026-02-21T09:19:14.1739193Z shl.b16 %rs192, %rs184, 4; 2026-02-21T09:19:14.1739403Z .loc 1 75 58 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:75:58 2026-02-21T09:19:14.1739543Z selp.b16 %rs193, %rs185, %rs177, %p110; 2026-02-21T09:19:14.1739608Z cvt.s16.s8 %rs194, %rs193; 2026-02-21T09:19:14.1739677Z shr.s16 %rs195, %rs194, 4; 2026-02-21T09:19:14.1739749Z selp.b16 %rs196, %rs186, %rs178, %p110; 2026-02-21T09:19:14.1739813Z cvt.s16.s8 %rs197, %rs196; 2026-02-21T09:19:14.1739880Z shr.s16 %rs198, %rs197, 4; 2026-02-21T09:19:14.1739955Z selp.b16 %rs199, %rs187, %rs179, %p110; 2026-02-21T09:19:14.1740021Z cvt.s16.s8 %rs200, %rs199; 2026-02-21T09:19:14.1740153Z shr.s16 %rs201, %rs200, 4; 2026-02-21T09:19:14.1740227Z selp.b16 %rs202, %rs188, %rs180, %p110; 2026-02-21T09:19:14.1740291Z cvt.s16.s8 %rs203, %rs202; 2026-02-21T09:19:14.1740358Z shr.s16 %rs204, %rs203, 4; 2026-02-21T09:19:14.1740434Z selp.b16 %rs205, %rs189, %rs181, %p110; 2026-02-21T09:19:14.1740500Z 
cvt.s16.s8 %rs206, %rs205; 2026-02-21T09:19:14.1740562Z shr.s16 %rs207, %rs206, 4; 2026-02-21T09:19:14.1740637Z selp.b16 %rs208, %rs190, %rs182, %p110; 2026-02-21T09:19:14.1740701Z cvt.s16.s8 %rs209, %rs208; 2026-02-21T09:19:14.1740762Z shr.s16 %rs210, %rs209, 4; 2026-02-21T09:19:14.1740834Z selp.b16 %rs211, %rs191, %rs183, %p110; 2026-02-21T09:19:14.1740903Z cvt.s16.s8 %rs212, %rs211; 2026-02-21T09:19:14.1740964Z shr.s16 %rs213, %rs212, 4; 2026-02-21T09:19:14.1741034Z selp.b16 %rs214, %rs192, %rs184, %p110; 2026-02-21T09:19:14.1741103Z cvt.s16.s8 %rs215, %rs214; 2026-02-21T09:19:14.1741166Z shr.s16 %rs216, %rs215, 4; 2026-02-21T09:19:14.1741369Z .loc 1 80 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:80:32 2026-02-21T09:19:14.1741439Z cvt.rn.f32.s16 %r10428, %rs195; 2026-02-21T09:19:14.1741512Z cvt.rn.f32.s16 %r10429, %rs198; 2026-02-21T09:19:14.1741581Z cvt.rn.f32.s16 %r10430, %rs201; 2026-02-21T09:19:14.1741646Z cvt.rn.f32.s16 %r10431, %rs204; 2026-02-21T09:19:14.1741717Z cvt.rn.f32.s16 %r10432, %rs207; 2026-02-21T09:19:14.1741782Z cvt.rn.f32.s16 %r10433, %rs210; 2026-02-21T09:19:14.1741865Z cvt.rn.f32.s16 %r10434, %rs213; 2026-02-21T09:19:14.1741937Z cvt.rn.f32.s16 %r10435, %rs216; 2026-02-21T09:19:14.1742006Z st.shared.b32 [%r161], %r10428; 2026-02-21T09:19:14.1742077Z st.shared.b32 [%r161+8], %r10429; 2026-02-21T09:19:14.1742143Z st.shared.b32 [%r162], %r10430; 2026-02-21T09:19:14.1742217Z st.shared.b32 [%r162+8], %r10431; 2026-02-21T09:19:14.1742283Z st.shared.b32 [%r163], %r10432; 2026-02-21T09:19:14.1742350Z st.shared.b32 [%r163+8], %r10433; 2026-02-21T09:19:14.1742422Z st.shared.b32 [%r164], %r10434; 2026-02-21T09:19:14.1742545Z st.shared.b32 [%r164+8], %r10435; 2026-02-21T09:19:14.1742606Z $L__tmp5: 2026-02-21T09:19:14.1742898Z .loc 2 291 36 // standard.py:291:36 @[ co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:87:40 ] 2026-02-21T09:19:14.1742973Z // begin inline asm 2026-02-21T09:19:14.1743063Z fence.proxy.async.shared::cta; 
2026-02-21T09:19:14.1743124Z // end inline asm 2026-02-21T09:19:14.1743189Z bar.sync 0; 2026-02-21T09:19:14.1743278Z shfl.sync.idx.b32 %r10436, %r5, 0, 31, -1; 2026-02-21T09:19:14.1743356Z wgmma.fence.sync.aligned; 2026-02-21T09:19:14.1743432Z mov.pred %p23, -1; 2026-02-21T09:19:14.1743498Z // begin inline asm 2026-02-21T09:19:14.1744998Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23253,%r23254,%r23255,%r23256,%r23257,%r23258,%r23259,%r23260,%r23261,%r23262,%r23263,%r23264,%r23265,%r23266,%r23267,%r23268,%r23269,%r23270,%r23271,%r23272,%r23273,%r23274,%r23275,%r23276,%r23277,%r23278,%r23279,%r23280,%r23281,%r23282,%r23283,%r23284,%r23285,%r23286,%r23287,%r23288,%r23289,%r23290,%r23291,%r23292,%r23293,%r23294,%r23295,%r23296,%r23297,%r23298,%r23299,%r23300,%r23301,%r23302,%r23303,%r23304,%r23305,%r23306,%r23307,%r23308,%r23309,%r23310,%r23311,%r23312,%r23313,%r23314,%r23315,%r23316}, {%r7361,%r7362,%r7363,%r7364}, %rd1, %p23, 1, 1; 2026-02-21T09:19:14.1745127Z // end inline asm 2026-02-21T09:19:14.1745188Z // begin inline asm 2026-02-21T09:19:14.1746872Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23253,%r23254,%r23255,%r23256,%r23257,%r23258,%r23259,%r23260,%r23261,%r23262,%r23263,%r23264,%r23265,%r23266,%r23267,%r23268,%r23269,%r23270,%r23271,%r23272,%r23273,%r23274,%r23275,%r23276,%r23277,%r23278,%r23279,%r23280,%r23281,%r23282,%r23283,%r23284,%r23285,%r23286,%r23287,%r23288,%r23289,%r23290,%r23291,%r23292,%r23293,%r23294,%r23295,%r23296,%r23297,%r23298,%r23299,%r23300,%r23301,%r23302,%r23303,%r23304,%r23305,%r23306,%r23307,%r23308,%r23309,%r23310,%r23311,%r23312,%r23313,%r23314,%r23315,%r23316}, {%r7493,%r7494,%r7495,%r7496}, %rd2, %p23, 1, 1; 2026-02-21T09:19:14.1747001Z // end inline asm 2026-02-21T09:19:14.1747063Z // begin inline asm 2026-02-21T09:19:14.1750320Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r23317,%r23318,%r23319,%r23320,%r23321,%r23322,%r23323,%r23324,%r23325,%r23326,%r23327,%r23328,%r23329,%r23330,%r23331,%r23332,%r23333,%r23334,%r23335,%r23336,%r23337,%r23338,%r23339,%r23340,%r23341,%r23342,%r23343,%r23344,%r23345,%r23346,%r23347,%r23348,%r23349,%r23350,%r23351,%r23352,%r23353,%r23354,%r23355,%r23356,%r23357,%r23358,%r23359,%r23360,%r23361,%r23362,%r23363,%r23364,%r23365,%r23366,%r23367,%r23368,%r23369,%r23370,%r23371,%r23372,%r23373,%r23374,%r23375,%r23376,%r23377,%r23378,%r23379,%r23380}, {%r7625,%r7626,%r7627,%r7628}, %rd1, %p23, 1, 1; 2026-02-21T09:19:14.1750441Z // end inline asm 2026-02-21T09:19:14.1750510Z // begin inline asm 2026-02-21T09:19:14.1751987Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23317,%r23318,%r23319,%r23320,%r23321,%r23322,%r23323,%r23324,%r23325,%r23326,%r23327,%r23328,%r23329,%r23330,%r23331,%r23332,%r23333,%r23334,%r23335,%r23336,%r23337,%r23338,%r23339,%r23340,%r23341,%r23342,%r23343,%r23344,%r23345,%r23346,%r23347,%r23348,%r23349,%r23350,%r23351,%r23352,%r23353,%r23354,%r23355,%r23356,%r23357,%r23358,%r23359,%r23360,%r23361,%r23362,%r23363,%r23364,%r23365,%r23366,%r23367,%r23368,%r23369,%r23370,%r23371,%r23372,%r23373,%r23374,%r23375,%r23376,%r23377,%r23378,%r23379,%r23380}, {%r7757,%r7758,%r7759,%r7760}, %rd2, %p23, 1, 1; 2026-02-21T09:19:14.1752061Z // end inline asm 2026-02-21T09:19:14.1752125Z // begin inline asm 2026-02-21T09:19:14.1753603Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23381,%r23382,%r23383,%r23384,%r23385,%r23386,%r23387,%r23388,%r23389,%r23390,%r23391,%r23392,%r23393,%r23394,%r23395,%r23396,%r23397,%r23398,%r23399,%r23400,%r23401,%r23402,%r23403,%r23404,%r23405,%r23406,%r23407,%r23408,%r23409,%r23410,%r23411,%r23412,%r23413,%r23414,%r23415,%r23416,%r23417,%r23418,%r23419,%r23420,%r23421,%r23422,%r23423,%r23424,%r23425,%r23426,%r23427,%r23428,%r23429,%r23430,%r23431,%r23432,%r23433,%r23434,%r23435,%r23436,%r23437,%r23438,%r23439,%r23440,%r23441,%r23442,%r23443,%r23444}, 
{%r7889,%r7890,%r7891,%r7892}, %rd1, %p23, 1, 1; 2026-02-21T09:19:14.1753675Z // end inline asm 2026-02-21T09:19:14.1753735Z // begin inline asm 2026-02-21T09:19:14.1755202Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23381,%r23382,%r23383,%r23384,%r23385,%r23386,%r23387,%r23388,%r23389,%r23390,%r23391,%r23392,%r23393,%r23394,%r23395,%r23396,%r23397,%r23398,%r23399,%r23400,%r23401,%r23402,%r23403,%r23404,%r23405,%r23406,%r23407,%r23408,%r23409,%r23410,%r23411,%r23412,%r23413,%r23414,%r23415,%r23416,%r23417,%r23418,%r23419,%r23420,%r23421,%r23422,%r23423,%r23424,%r23425,%r23426,%r23427,%r23428,%r23429,%r23430,%r23431,%r23432,%r23433,%r23434,%r23435,%r23436,%r23437,%r23438,%r23439,%r23440,%r23441,%r23442,%r23443,%r23444}, {%r8021,%r8022,%r8023,%r8024}, %rd2, %p23, 1, 1; 2026-02-21T09:19:14.1755267Z // end inline asm 2026-02-21T09:19:14.1755328Z // begin inline asm 2026-02-21T09:19:14.1757010Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23445,%r23446,%r23447,%r23448,%r23449,%r23450,%r23451,%r23452,%r23453,%r23454,%r23455,%r23456,%r23457,%r23458,%r23459,%r23460,%r23461,%r23462,%r23463,%r23464,%r23465,%r23466,%r23467,%r23468,%r23469,%r23470,%r23471,%r23472,%r23473,%r23474,%r23475,%r23476,%r23477,%r23478,%r23479,%r23480,%r23481,%r23482,%r23483,%r23484,%r23485,%r23486,%r23487,%r23488,%r23489,%r23490,%r23491,%r23492,%r23493,%r23494,%r23495,%r23496,%r23497,%r23498,%r23499,%r23500,%r23501,%r23502,%r23503,%r23504,%r23505,%r23506,%r23507,%r23508}, {%r8153,%r8154,%r8155,%r8156}, %rd1, %p23, 1, 1; 2026-02-21T09:19:14.1757239Z // end inline asm 2026-02-21T09:19:14.1757304Z // begin inline asm 2026-02-21T09:19:14.1758861Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r23445,%r23446,%r23447,%r23448,%r23449,%r23450,%r23451,%r23452,%r23453,%r23454,%r23455,%r23456,%r23457,%r23458,%r23459,%r23460,%r23461,%r23462,%r23463,%r23464,%r23465,%r23466,%r23467,%r23468,%r23469,%r23470,%r23471,%r23472,%r23473,%r23474,%r23475,%r23476,%r23477,%r23478,%r23479,%r23480,%r23481,%r23482,%r23483,%r23484,%r23485,%r23486,%r23487,%r23488,%r23489,%r23490,%r23491,%r23492,%r23493,%r23494,%r23495,%r23496,%r23497,%r23498,%r23499,%r23500,%r23501,%r23502,%r23503,%r23504,%r23505,%r23506,%r23507,%r23508}, {%r8285,%r8286,%r8287,%r8288}, %rd2, %p23, 1, 1; 2026-02-21T09:19:14.1758926Z // end inline asm 2026-02-21T09:19:14.1759012Z wgmma.commit_group.sync.aligned; 2026-02-21T09:19:14.1759073Z mov.b32 %r10121, 0; 2026-02-21T09:19:14.1759136Z mov.b32 %r8545, %r2973; 2026-02-21T09:19:14.1759199Z mov.b32 %r8546, %r10121; 2026-02-21T09:19:14.1759264Z mov.b32 %r8547, %r10121; 2026-02-21T09:19:14.1759445Z // begin inline asm 2026-02-21T09:19:14.1764448Z // wait for regs: %r23253,%r23254,%r23255,%r23256,%r23257,%r23258,%r23259,%r23260,%r23261,%r23262,%r23263,%r23264,%r23265,%r23266,%r23267,%r23268,%r23269,%r23270,%r23271,%r23272,%r23273,%r23274,%r23275,%r23276,%r23277,%r23278,%r23279,%r23280,%r23281,%r23282,%r23283,%r23284,%r23285,%r23286,%r23287,%r23288,%r23289,%r23290,%r23291,%r23292,%r23293,%r23294,%r23295,%r23296,%r23297,%r23298,%r23299,%r23300,%r23301,%r23302,%r23303,%r23304,%r23305,%r23306,%r23307,%r23308,%r23309,%r23310,%r23311,%r23312,%r23313,%r23314,%r23315,%r23316,%r23317,%r23318,%r23319,%r23320,%r23321,%r23322,%r23323,%r23324,%r23325,%r23326,%r23327,%r23328,%r23329,%r23330,%r23331,%r23332,%r23333,%r23334,%r23335,%r23336,%r23337,%r23338,%r23339,%r23340,%r23341,%r23342,%r23343,%r23344,%r23345,%r23346,%r23347,%r23348,%r23349,%r23350,%r23351,%r23352,%r23353,%r23354,%r23355,%r23356,%r23357,%r23358,%r23359,%r23360,%r23361,%r23362,%r23363,%r23364,%r23365,%r23366,%r23367,%r23368,%r23369,%r23370,%r23371,%r23372,%r23373,%r23374,%r23375,%r23376,%r23377,%r23378,%r23379,%r233
80,%r23381,%r23382,%r23383,%r23384,%r23385,%r23386,%r23387,%r23388,%r23389,%r23390,%r23391,%r23392,%r23393,%r23394,%r23395,%r23396,%r23397,%r23398,%r23399,%r23400,%r23401,%r23402,%r23403,%r23404,%r23405,%r23406,%r23407,%r23408,%r23409,%r23410,%r23411,%r23412,%r23413,%r23414,%r23415,%r23416,%r23417,%r23418,%r23419,%r23420,%r23421,%r23422,%r23423,%r23424,%r23425,%r23426,%r23427,%r23428,%r23429,%r23430,%r23431,%r23432,%r23433,%r23434,%r23435,%r23436,%r23437,%r23438,%r23439,%r23440,%r23441,%r23442,%r23443,%r23444,%r23445,%r23446,%r23447,%r23448,%r23449,%r23450,%r23451,%r23452,%r23453,%r23454,%r23455,%r23456,%r23457,%r23458,%r23459,%r23460,%r23461,%r23462,%r23463,%r23464,%r23465,%r23466,%r23467,%r23468,%r23469,%r23470,%r23471,%r23472,%r23473,%r23474,%r23475,%r23476,%r23477,%r23478,%r23479,%r23480,%r23481,%r23482,%r23483,%r23484,%r23485,%r23486,%r23487,%r23488,%r23489,%r23490,%r23491,%r23492,%r23493,%r23494,%r23495,%r23496,%r23497,%r23498,%r23499,%r23500,%r23501,%r23502,%r23503,%r23504,%r23505,%r23506,%r23507,%r23508,%r8545,%r8546,%r8547 2026-02-21T09:19:14.1764543Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:19:14.1764602Z // end inline asm 2026-02-21T09:19:14.1764661Z $L__tmp6: 2026-02-21T09:19:14.1764883Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1764956Z add.s32 %r10438, %r6453, %r10418; 2026-02-21T09:19:14.1765217Z .loc 1 55 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:55:32 2026-02-21T09:19:14.1765285Z add.s32 %r10439, %r10438, %r157; 2026-02-21T09:19:14.1765359Z ld.shared.b16 %rs217, [%r10439]; 2026-02-21T09:19:14.1765478Z ld.shared.b16 %rs218, [%r10439+256]; 2026-02-21T09:19:14.1765561Z ld.shared.b16 %rs219, [%r10439+16]; 2026-02-21T09:19:14.1765636Z ld.shared.b16 %rs220, [%r10439+272]; 2026-02-21T09:19:14.1765710Z ld.shared.b16 %rs221, [%r10439+4096]; 2026-02-21T09:19:14.1765782Z ld.shared.b16 %rs222, [%r10439+4352]; 2026-02-21T09:19:14.1765850Z ld.shared.b16 %rs223, [%r10439+4112]; 
2026-02-21T09:19:14.1765923Z ld.shared.b16 %rs224, [%r10439+4368]; 2026-02-21T09:19:14.1765990Z ld.shared.b16 %rs225, [%r10439+8192]; 2026-02-21T09:19:14.1766109Z ld.shared.b16 %rs226, [%r10439+8448]; 2026-02-21T09:19:14.1766184Z ld.shared.b16 %rs227, [%r10439+8208]; 2026-02-21T09:19:14.1766255Z ld.shared.b16 %rs228, [%r10439+8464]; 2026-02-21T09:19:14.1766330Z ld.shared.b16 %rs229, [%r10439+12288]; 2026-02-21T09:19:14.1766404Z ld.shared.b16 %rs230, [%r10439+12544]; 2026-02-21T09:19:14.1766604Z ld.shared.b16 %rs231, [%r10439+12304]; 2026-02-21T09:19:14.1766679Z ld.shared.b16 %rs232, [%r10439+12560]; 2026-02-21T09:19:14.1766747Z add.s32 %r10440, %r10438, %r158; 2026-02-21T09:19:14.1766819Z ld.shared.b16 %rs233, [%r10440]; 2026-02-21T09:19:14.1766889Z ld.shared.b16 %rs234, [%r10440+256]; 2026-02-21T09:19:14.1766959Z ld.shared.b16 %rs235, [%r10440+16]; 2026-02-21T09:19:14.1767119Z ld.shared.b16 %rs236, [%r10440+272]; 2026-02-21T09:19:14.1767191Z ld.shared.b16 %rs237, [%r10440+4096]; 2026-02-21T09:19:14.1767258Z ld.shared.b16 %rs238, [%r10440+4352]; 2026-02-21T09:19:14.1767326Z ld.shared.b16 %rs239, [%r10440+4112]; 2026-02-21T09:19:14.1767401Z ld.shared.b16 %rs240, [%r10440+4368]; 2026-02-21T09:19:14.1767468Z ld.shared.b16 %rs241, [%r10440+8192]; 2026-02-21T09:19:14.1767536Z ld.shared.b16 %rs242, [%r10440+8448]; 2026-02-21T09:19:14.1767613Z ld.shared.b16 %rs243, [%r10440+8208]; 2026-02-21T09:19:14.1767682Z ld.shared.b16 %rs244, [%r10440+8464]; 2026-02-21T09:19:14.1767752Z ld.shared.b16 %rs245, [%r10440+12288]; 2026-02-21T09:19:14.1767827Z ld.shared.b16 %rs246, [%r10440+12544]; 2026-02-21T09:19:14.1767899Z ld.shared.b16 %rs247, [%r10440+12304]; 2026-02-21T09:19:14.1767966Z ld.shared.b16 %rs248, [%r10440+12560]; 2026-02-21T09:19:14.1768034Z cvt.f32.bf16 %r8935, %rs217; 2026-02-21T09:19:14.1768104Z cvt.f32.bf16 %r8936, %rs218; 2026-02-21T09:19:14.1768167Z cvt.f32.bf16 %r8937, %rs233; 2026-02-21T09:19:14.1768232Z cvt.f32.bf16 %r8938, %rs234; 2026-02-21T09:19:14.1768301Z 
cvt.f32.bf16 %r9067, %rs219; 2026-02-21T09:19:14.1768365Z cvt.f32.bf16 %r9068, %rs220; 2026-02-21T09:19:14.1768429Z cvt.f32.bf16 %r9069, %rs235; 2026-02-21T09:19:14.1768492Z cvt.f32.bf16 %r9070, %rs236; 2026-02-21T09:19:14.1768559Z cvt.f32.bf16 %r9199, %rs221; 2026-02-21T09:19:14.1768621Z cvt.f32.bf16 %r9200, %rs222; 2026-02-21T09:19:14.1768685Z cvt.f32.bf16 %r9201, %rs237; 2026-02-21T09:19:14.1768753Z cvt.f32.bf16 %r9202, %rs238; 2026-02-21T09:19:14.1768815Z cvt.f32.bf16 %r9331, %rs223; 2026-02-21T09:19:14.1768877Z cvt.f32.bf16 %r9332, %rs224; 2026-02-21T09:19:14.1768938Z cvt.f32.bf16 %r9333, %rs239; 2026-02-21T09:19:14.1769007Z cvt.f32.bf16 %r9334, %rs240; 2026-02-21T09:19:14.1769071Z cvt.f32.bf16 %r9463, %rs225; 2026-02-21T09:19:14.1769135Z cvt.f32.bf16 %r9464, %rs226; 2026-02-21T09:19:14.1769203Z cvt.f32.bf16 %r9465, %rs241; 2026-02-21T09:19:14.1769265Z cvt.f32.bf16 %r9466, %rs242; 2026-02-21T09:19:14.1769328Z cvt.f32.bf16 %r9595, %rs227; 2026-02-21T09:19:14.1769398Z cvt.f32.bf16 %r9596, %rs228; 2026-02-21T09:19:14.1769465Z cvt.f32.bf16 %r9597, %rs243; 2026-02-21T09:19:14.1769541Z cvt.f32.bf16 %r9598, %rs244; 2026-02-21T09:19:14.1769606Z cvt.f32.bf16 %r9727, %rs229; 2026-02-21T09:19:14.1769678Z cvt.f32.bf16 %r9728, %rs230; 2026-02-21T09:19:14.1769744Z cvt.f32.bf16 %r9729, %rs245; 2026-02-21T09:19:14.1769892Z cvt.f32.bf16 %r9730, %rs246; 2026-02-21T09:19:14.1769959Z cvt.f32.bf16 %r9859, %rs231; 2026-02-21T09:19:14.1770021Z cvt.f32.bf16 %r9860, %rs232; 2026-02-21T09:19:14.1770082Z cvt.f32.bf16 %r9861, %rs247; 2026-02-21T09:19:14.1770145Z cvt.f32.bf16 %r9862, %rs248; 2026-02-21T09:19:14.1770416Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1770483Z add.s32 %r10441, %r10424, 177152; 2026-02-21T09:19:14.1770683Z .loc 1 70 45 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:70:45 2026-02-21T09:19:14.1770757Z add.s32 %r10442, %r10441, %r22973; 2026-02-21T09:19:14.1770827Z ld.shared.b8 %rs249, [%r10442]; 
2026-02-21T09:19:14.1770895Z ld.shared.b8 %rs250, [%r10442+128]; 2026-02-21T09:19:14.1771032Z ld.shared.b8 %rs251, [%r10442+256]; 2026-02-21T09:19:14.1771106Z ld.shared.b8 %rs252, [%r10442+384]; 2026-02-21T09:19:14.1771176Z ld.shared.b8 %rs253, [%r10442+512]; 2026-02-21T09:19:14.1771244Z ld.shared.b8 %rs254, [%r10442+640]; 2026-02-21T09:19:14.1771318Z ld.shared.b8 %rs255, [%r10442+768]; 2026-02-21T09:19:14.1771380Z add.s32 %r10443, %r10441, %r22974; 2026-02-21T09:19:14.1771446Z ld.shared.b8 %rs256, [%r10443]; 2026-02-21T09:19:14.1771652Z .loc 1 60 28 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:60:28 2026-02-21T09:19:14.1771717Z shl.b16 %rs257, %rs249, 4; 2026-02-21T09:19:14.1771780Z shl.b16 %rs258, %rs250, 4; 2026-02-21T09:19:14.1771847Z shl.b16 %rs259, %rs251, 4; 2026-02-21T09:19:14.1771956Z shl.b16 %rs260, %rs252, 4; 2026-02-21T09:19:14.1772020Z shl.b16 %rs261, %rs253, 4; 2026-02-21T09:19:14.1772081Z shl.b16 %rs262, %rs254, 4; 2026-02-21T09:19:14.1772148Z shl.b16 %rs263, %rs255, 4; 2026-02-21T09:19:14.1772211Z shl.b16 %rs264, %rs256, 4; 2026-02-21T09:19:14.1772411Z .loc 1 75 58 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:75:58 2026-02-21T09:19:14.1772493Z selp.b16 %rs265, %rs257, %rs249, %p110; 2026-02-21T09:19:14.1772558Z cvt.s16.s8 %rs266, %rs265; 2026-02-21T09:19:14.1772619Z shr.s16 %rs267, %rs266, 4; 2026-02-21T09:19:14.1772690Z selp.b16 %rs268, %rs258, %rs250, %p110; 2026-02-21T09:19:14.1772758Z cvt.s16.s8 %rs269, %rs268; 2026-02-21T09:19:14.1772822Z shr.s16 %rs270, %rs269, 4; 2026-02-21T09:19:14.1772896Z selp.b16 %rs271, %rs259, %rs251, %p110; 2026-02-21T09:19:14.1772963Z cvt.s16.s8 %rs272, %rs271; 2026-02-21T09:19:14.1773037Z shr.s16 %rs273, %rs272, 4; 2026-02-21T09:19:14.1773111Z selp.b16 %rs274, %rs260, %rs252, %p110; 2026-02-21T09:19:14.1773175Z cvt.s16.s8 %rs275, %rs274; 2026-02-21T09:19:14.1773243Z shr.s16 %rs276, %rs275, 4; 2026-02-21T09:19:14.1773313Z selp.b16 %rs277, %rs261, %rs253, %p110; 2026-02-21T09:19:14.1773376Z 
cvt.s16.s8 %rs278, %rs277; 2026-02-21T09:19:14.1773443Z shr.s16 %rs279, %rs278, 4; 2026-02-21T09:19:14.1773515Z selp.b16 %rs280, %rs262, %rs254, %p110; 2026-02-21T09:19:14.1773577Z cvt.s16.s8 %rs281, %rs280; 2026-02-21T09:19:14.1773651Z shr.s16 %rs282, %rs281, 4; 2026-02-21T09:19:14.1773721Z selp.b16 %rs283, %rs263, %rs255, %p110; 2026-02-21T09:19:14.1773785Z cvt.s16.s8 %rs284, %rs283; 2026-02-21T09:19:14.1773847Z shr.s16 %rs285, %rs284, 4; 2026-02-21T09:19:14.1773921Z selp.b16 %rs286, %rs264, %rs256, %p110; 2026-02-21T09:19:14.1773986Z cvt.s16.s8 %rs287, %rs286; 2026-02-21T09:19:14.1774051Z shr.s16 %rs288, %rs287, 4; 2026-02-21T09:19:14.1774262Z .loc 1 80 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:80:32 2026-02-21T09:19:14.1774331Z cvt.rn.f32.s16 %r10444, %rs267; 2026-02-21T09:19:14.1774396Z cvt.rn.f32.s16 %r10445, %rs270; 2026-02-21T09:19:14.1774463Z cvt.rn.f32.s16 %r10446, %rs273; 2026-02-21T09:19:14.1774533Z cvt.rn.f32.s16 %r10447, %rs276; 2026-02-21T09:19:14.1774597Z cvt.rn.f32.s16 %r10448, %rs279; 2026-02-21T09:19:14.1774661Z cvt.rn.f32.s16 %r10449, %rs282; 2026-02-21T09:19:14.1774730Z cvt.rn.f32.s16 %r10450, %rs285; 2026-02-21T09:19:14.1774875Z cvt.rn.f32.s16 %r10451, %rs288; 2026-02-21T09:19:14.1774936Z bar.sync 0; 2026-02-21T09:19:14.1775008Z st.shared.b32 [%r161], %r10444; 2026-02-21T09:19:14.1775075Z st.shared.b32 [%r161+8], %r10445; 2026-02-21T09:19:14.1775141Z st.shared.b32 [%r162], %r10446; 2026-02-21T09:19:14.1775253Z st.shared.b32 [%r162+8], %r10447; 2026-02-21T09:19:14.1775333Z st.shared.b32 [%r163], %r10448; 2026-02-21T09:19:14.1775405Z st.shared.b32 [%r163+8], %r10449; 2026-02-21T09:19:14.1775470Z st.shared.b32 [%r164], %r10450; 2026-02-21T09:19:14.1775541Z st.shared.b32 [%r164+8], %r10451; 2026-02-21T09:19:14.1775601Z $L__tmp7: 2026-02-21T09:19:14.1775884Z .loc 2 291 36 // standard.py:291:36 @[ co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:87:40 ] 2026-02-21T09:19:14.1775999Z // begin inline asm 
2026-02-21T09:19:14.1776093Z fence.proxy.async.shared::cta; 2026-02-21T09:19:14.1776153Z // end inline asm 2026-02-21T09:19:14.1776212Z bar.sync 0; 2026-02-21T09:19:14.1776295Z wgmma.fence.sync.aligned; 2026-02-21T09:19:14.1776356Z // begin inline asm 2026-02-21T09:19:14.1778051Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23253,%r23254,%r23255,%r23256,%r23257,%r23258,%r23259,%r23260,%r23261,%r23262,%r23263,%r23264,%r23265,%r23266,%r23267,%r23268,%r23269,%r23270,%r23271,%r23272,%r23273,%r23274,%r23275,%r23276,%r23277,%r23278,%r23279,%r23280,%r23281,%r23282,%r23283,%r23284,%r23285,%r23286,%r23287,%r23288,%r23289,%r23290,%r23291,%r23292,%r23293,%r23294,%r23295,%r23296,%r23297,%r23298,%r23299,%r23300,%r23301,%r23302,%r23303,%r23304,%r23305,%r23306,%r23307,%r23308,%r23309,%r23310,%r23311,%r23312,%r23313,%r23314,%r23315,%r23316}, {%r8935,%r8936,%r8937,%r8938}, %rd1, %p23, 1, 1; 2026-02-21T09:19:14.1778126Z // end inline asm 2026-02-21T09:19:14.1778190Z // begin inline asm 2026-02-21T09:19:14.1779671Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23253,%r23254,%r23255,%r23256,%r23257,%r23258,%r23259,%r23260,%r23261,%r23262,%r23263,%r23264,%r23265,%r23266,%r23267,%r23268,%r23269,%r23270,%r23271,%r23272,%r23273,%r23274,%r23275,%r23276,%r23277,%r23278,%r23279,%r23280,%r23281,%r23282,%r23283,%r23284,%r23285,%r23286,%r23287,%r23288,%r23289,%r23290,%r23291,%r23292,%r23293,%r23294,%r23295,%r23296,%r23297,%r23298,%r23299,%r23300,%r23301,%r23302,%r23303,%r23304,%r23305,%r23306,%r23307,%r23308,%r23309,%r23310,%r23311,%r23312,%r23313,%r23314,%r23315,%r23316}, {%r9067,%r9068,%r9069,%r9070}, %rd2, %p23, 1, 1; 2026-02-21T09:19:14.1779741Z // end inline asm 2026-02-21T09:19:14.1779802Z // begin inline asm 2026-02-21T09:19:14.1781287Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r23317,%r23318,%r23319,%r23320,%r23321,%r23322,%r23323,%r23324,%r23325,%r23326,%r23327,%r23328,%r23329,%r23330,%r23331,%r23332,%r23333,%r23334,%r23335,%r23336,%r23337,%r23338,%r23339,%r23340,%r23341,%r23342,%r23343,%r23344,%r23345,%r23346,%r23347,%r23348,%r23349,%r23350,%r23351,%r23352,%r23353,%r23354,%r23355,%r23356,%r23357,%r23358,%r23359,%r23360,%r23361,%r23362,%r23363,%r23364,%r23365,%r23366,%r23367,%r23368,%r23369,%r23370,%r23371,%r23372,%r23373,%r23374,%r23375,%r23376,%r23377,%r23378,%r23379,%r23380}, {%r9199,%r9200,%r9201,%r9202}, %rd1, %p23, 1, 1; 2026-02-21T09:19:14.1781350Z // end inline asm 2026-02-21T09:19:14.1781409Z // begin inline asm 2026-02-21T09:19:14.1788491Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23317,%r23318,%r23319,%r23320,%r23321,%r23322,%r23323,%r23324,%r23325,%r23326,%r23327,%r23328,%r23329,%r23330,%r23331,%r23332,%r23333,%r23334,%r23335,%r23336,%r23337,%r23338,%r23339,%r23340,%r23341,%r23342,%r23343,%r23344,%r23345,%r23346,%r23347,%r23348,%r23349,%r23350,%r23351,%r23352,%r23353,%r23354,%r23355,%r23356,%r23357,%r23358,%r23359,%r23360,%r23361,%r23362,%r23363,%r23364,%r23365,%r23366,%r23367,%r23368,%r23369,%r23370,%r23371,%r23372,%r23373,%r23374,%r23375,%r23376,%r23377,%r23378,%r23379,%r23380}, {%r9331,%r9332,%r9333,%r9334}, %rd2, %p23, 1, 1; 2026-02-21T09:19:14.1788687Z // end inline asm 2026-02-21T09:19:14.1788756Z // begin inline asm 2026-02-21T09:19:14.1790390Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23381,%r23382,%r23383,%r23384,%r23385,%r23386,%r23387,%r23388,%r23389,%r23390,%r23391,%r23392,%r23393,%r23394,%r23395,%r23396,%r23397,%r23398,%r23399,%r23400,%r23401,%r23402,%r23403,%r23404,%r23405,%r23406,%r23407,%r23408,%r23409,%r23410,%r23411,%r23412,%r23413,%r23414,%r23415,%r23416,%r23417,%r23418,%r23419,%r23420,%r23421,%r23422,%r23423,%r23424,%r23425,%r23426,%r23427,%r23428,%r23429,%r23430,%r23431,%r23432,%r23433,%r23434,%r23435,%r23436,%r23437,%r23438,%r23439,%r23440,%r23441,%r23442,%r23443,%r23444}, 
{%r9463,%r9464,%r9465,%r9466}, %rd1, %p23, 1, 1; 2026-02-21T09:19:14.1790527Z // end inline asm 2026-02-21T09:19:14.1790588Z // begin inline asm 2026-02-21T09:19:14.1792123Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23381,%r23382,%r23383,%r23384,%r23385,%r23386,%r23387,%r23388,%r23389,%r23390,%r23391,%r23392,%r23393,%r23394,%r23395,%r23396,%r23397,%r23398,%r23399,%r23400,%r23401,%r23402,%r23403,%r23404,%r23405,%r23406,%r23407,%r23408,%r23409,%r23410,%r23411,%r23412,%r23413,%r23414,%r23415,%r23416,%r23417,%r23418,%r23419,%r23420,%r23421,%r23422,%r23423,%r23424,%r23425,%r23426,%r23427,%r23428,%r23429,%r23430,%r23431,%r23432,%r23433,%r23434,%r23435,%r23436,%r23437,%r23438,%r23439,%r23440,%r23441,%r23442,%r23443,%r23444}, {%r9595,%r9596,%r9597,%r9598}, %rd2, %p23, 1, 1; 2026-02-21T09:19:14.1792190Z // end inline asm 2026-02-21T09:19:14.1792250Z // begin inline asm 2026-02-21T09:19:14.1793768Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23445,%r23446,%r23447,%r23448,%r23449,%r23450,%r23451,%r23452,%r23453,%r23454,%r23455,%r23456,%r23457,%r23458,%r23459,%r23460,%r23461,%r23462,%r23463,%r23464,%r23465,%r23466,%r23467,%r23468,%r23469,%r23470,%r23471,%r23472,%r23473,%r23474,%r23475,%r23476,%r23477,%r23478,%r23479,%r23480,%r23481,%r23482,%r23483,%r23484,%r23485,%r23486,%r23487,%r23488,%r23489,%r23490,%r23491,%r23492,%r23493,%r23494,%r23495,%r23496,%r23497,%r23498,%r23499,%r23500,%r23501,%r23502,%r23503,%r23504,%r23505,%r23506,%r23507,%r23508}, {%r9727,%r9728,%r9729,%r9730}, %rd1, %p23, 1, 1; 2026-02-21T09:19:14.1793832Z // end inline asm 2026-02-21T09:19:14.1793896Z // begin inline asm 2026-02-21T09:19:14.1795350Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r23445,%r23446,%r23447,%r23448,%r23449,%r23450,%r23451,%r23452,%r23453,%r23454,%r23455,%r23456,%r23457,%r23458,%r23459,%r23460,%r23461,%r23462,%r23463,%r23464,%r23465,%r23466,%r23467,%r23468,%r23469,%r23470,%r23471,%r23472,%r23473,%r23474,%r23475,%r23476,%r23477,%r23478,%r23479,%r23480,%r23481,%r23482,%r23483,%r23484,%r23485,%r23486,%r23487,%r23488,%r23489,%r23490,%r23491,%r23492,%r23493,%r23494,%r23495,%r23496,%r23497,%r23498,%r23499,%r23500,%r23501,%r23502,%r23503,%r23504,%r23505,%r23506,%r23507,%r23508}, {%r9859,%r9860,%r9861,%r9862}, %rd2, %p23, 1, 1; 2026-02-21T09:19:14.1795413Z // end inline asm 2026-02-21T09:19:14.1795499Z wgmma.commit_group.sync.aligned; 2026-02-21T09:19:14.1795562Z mov.b32 %r10119, %r2973; 2026-02-21T09:19:14.1795624Z mov.b32 %r10120, %r10121; 2026-02-21T09:19:14.1795687Z // begin inline asm 2026-02-21T09:19:14.1800799Z // wait for regs: %r23253,%r23254,%r23255,%r23256,%r23257,%r23258,%r23259,%r23260,%r23261,%r23262,%r23263,%r23264,%r23265,%r23266,%r23267,%r23268,%r23269,%r23270,%r23271,%r23272,%r23273,%r23274,%r23275,%r23276,%r23277,%r23278,%r23279,%r23280,%r23281,%r23282,%r23283,%r23284,%r23285,%r23286,%r23287,%r23288,%r23289,%r23290,%r23291,%r23292,%r23293,%r23294,%r23295,%r23296,%r23297,%r23298,%r23299,%r23300,%r23301,%r23302,%r23303,%r23304,%r23305,%r23306,%r23307,%r23308,%r23309,%r23310,%r23311,%r23312,%r23313,%r23314,%r23315,%r23316,%r23317,%r23318,%r23319,%r23320,%r23321,%r23322,%r23323,%r23324,%r23325,%r23326,%r23327,%r23328,%r23329,%r23330,%r23331,%r23332,%r23333,%r23334,%r23335,%r23336,%r23337,%r23338,%r23339,%r23340,%r23341,%r23342,%r23343,%r23344,%r23345,%r23346,%r23347,%r23348,%r23349,%r23350,%r23351,%r23352,%r23353,%r23354,%r23355,%r23356,%r23357,%r23358,%r23359,%r23360,%r23361,%r23362,%r23363,%r23364,%r23365,%r23366,%r23367,%r23368,%r23369,%r23370,%r23371,%r23372,%r23373,%r23374,%r23375,%r23376,%r23377,%r23378,%r23379,%r23380,%r23381,%r23382,%r23383,%r23384,%r23385,%r23386,%r23387,%r23388,%r23389,%r23390,%r23391,%r23392,%r
23393,%r23394,%r23395,%r23396,%r23397,%r23398,%r23399,%r23400,%r23401,%r23402,%r23403,%r23404,%r23405,%r23406,%r23407,%r23408,%r23409,%r23410,%r23411,%r23412,%r23413,%r23414,%r23415,%r23416,%r23417,%r23418,%r23419,%r23420,%r23421,%r23422,%r23423,%r23424,%r23425,%r23426,%r23427,%r23428,%r23429,%r23430,%r23431,%r23432,%r23433,%r23434,%r23435,%r23436,%r23437,%r23438,%r23439,%r23440,%r23441,%r23442,%r23443,%r23444,%r23445,%r23446,%r23447,%r23448,%r23449,%r23450,%r23451,%r23452,%r23453,%r23454,%r23455,%r23456,%r23457,%r23458,%r23459,%r23460,%r23461,%r23462,%r23463,%r23464,%r23465,%r23466,%r23467,%r23468,%r23469,%r23470,%r23471,%r23472,%r23473,%r23474,%r23475,%r23476,%r23477,%r23478,%r23479,%r23480,%r23481,%r23482,%r23483,%r23484,%r23485,%r23486,%r23487,%r23488,%r23489,%r23490,%r23491,%r23492,%r23493,%r23494,%r23495,%r23496,%r23497,%r23498,%r23499,%r23500,%r23501,%r23502,%r23503,%r23504,%r23505,%r23506,%r23507,%r23508,%r10119,%r10120,%r10121 2026-02-21T09:19:14.1801011Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:19:14.1801071Z // end inline asm 2026-02-21T09:19:14.1801137Z $L__tmp8: 2026-02-21T09:19:14.1801360Z .loc 1 43 78 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:43:78 2026-02-21T09:19:14.1801430Z add.s32 %r10452, %r23252, 1; 2026-02-21T09:19:14.1801499Z setp.gt.s32 %p42, %r10452, 4; 2026-02-21T09:19:14.1801638Z selp.b32 %r23252, 0, %r10452, %p42; 2026-02-21T09:19:14.1801849Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1801918Z add.s32 %r10453, %r23249, -16; 2026-02-21T09:19:14.1801993Z add.s64 %rd444, %rd1116, %rd25; 2026-02-21T09:19:14.1802056Z add.s64 %rd426, %rd444, 320; 2026-02-21T09:19:14.1802120Z add.s64 %rd445, %rd1116, %rd24; 2026-02-21T09:19:14.1802183Z add.s64 %rd427, %rd445, 320; 2026-02-21T09:19:14.1802248Z add.s64 %rd446, %rd1116, %rd23; 2026-02-21T09:19:14.1802311Z add.s64 %rd428, %rd446, 320; 2026-02-21T09:19:14.1802372Z add.s64 %rd447, %rd1116, %rd22; 
2026-02-21T09:19:14.1802436Z add.s64 %rd429, %rd447, 320; 2026-02-21T09:19:14.1802498Z add.s64 %rd448, %rd1116, %rd21; 2026-02-21T09:19:14.1802559Z add.s64 %rd430, %rd448, 320; 2026-02-21T09:19:14.1802624Z add.s64 %rd449, %rd1116, %rd20; 2026-02-21T09:19:14.1802687Z add.s64 %rd431, %rd449, 320; 2026-02-21T09:19:14.1802761Z add.s64 %rd450, %rd1116, %rd19; 2026-02-21T09:19:14.1802827Z add.s64 %rd432, %rd450, 320; 2026-02-21T09:19:14.1802906Z mad.wide.s32 %rd433, %r10453, 2, %rd64; 2026-02-21T09:19:14.1803110Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1803174Z shl.b32 %r10454, %r23252, 14; 2026-02-21T09:19:14.1803243Z add.s32 %r10455, %r22967, %r10454; 2026-02-21T09:19:14.1803306Z add.s32 %r10381, %r10455, %r56; 2026-02-21T09:19:14.1803372Z selp.b32 %r10382, 8, 0, %p40; 2026-02-21T09:19:14.1803436Z // begin inline asm 2026-02-21T09:19:14.1803590Z cp.async.ca.shared.global [ %r10381 + 0 ], [ %rd426 + 0 ], 0x8, %r10382; 2026-02-21T09:19:14.1803648Z // end inline asm 2026-02-21T09:19:14.1803711Z add.s32 %r10383, %r10381, 2048; 2026-02-21T09:19:14.1803774Z // begin inline asm 2026-02-21T09:19:14.1803914Z cp.async.ca.shared.global [ %r10383 + 0 ], [ %rd427 + 0 ], 0x8, %r10382; 2026-02-21T09:19:14.1803973Z // end inline asm 2026-02-21T09:19:14.1804038Z add.s32 %r10385, %r10381, 4096; 2026-02-21T09:19:14.1804098Z // begin inline asm 2026-02-21T09:19:14.1804234Z cp.async.ca.shared.global [ %r10385 + 0 ], [ %rd428 + 0 ], 0x8, %r10382; 2026-02-21T09:19:14.1804296Z // end inline asm 2026-02-21T09:19:14.1804360Z add.s32 %r10387, %r10381, 6144; 2026-02-21T09:19:14.1804420Z // begin inline asm 2026-02-21T09:19:14.1804551Z cp.async.ca.shared.global [ %r10387 + 0 ], [ %rd429 + 0 ], 0x8, %r10382; 2026-02-21T09:19:14.1804676Z // end inline asm 2026-02-21T09:19:14.1804736Z add.s32 %r10389, %r10381, 8192; 2026-02-21T09:19:14.1804796Z // begin inline asm 2026-02-21T09:19:14.1804933Z cp.async.ca.shared.global [ %r10389 + 0 ], [ %rd430 
+ 0 ], 0x8, %r10382; 2026-02-21T09:19:14.1805039Z // end inline asm 2026-02-21T09:19:14.1805102Z add.s32 %r10391, %r10381, 10240; 2026-02-21T09:19:14.1805160Z // begin inline asm 2026-02-21T09:19:14.1805299Z cp.async.ca.shared.global [ %r10391 + 0 ], [ %rd431 + 0 ], 0x8, %r10382; 2026-02-21T09:19:14.1805356Z // end inline asm 2026-02-21T09:19:14.1805434Z add.s32 %r10393, %r10381, 12288; 2026-02-21T09:19:14.1805500Z // begin inline asm 2026-02-21T09:19:14.1805635Z cp.async.ca.shared.global [ %r10393 + 0 ], [ %rd432 + 0 ], 0x8, %r10382; 2026-02-21T09:19:14.1805742Z // end inline asm 2026-02-21T09:19:14.1805808Z add.s32 %r10395, %r10381, 14336; 2026-02-21T09:19:14.1805873Z // begin inline asm 2026-02-21T09:19:14.1806007Z cp.async.ca.shared.global [ %r10395 + 0 ], [ %rd433 + 0 ], 0x8, %r10382; 2026-02-21T09:19:14.1806068Z // end inline asm 2026-02-21T09:19:14.1806141Z cp.async.commit_group; 2026-02-21T09:19:14.1806343Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1806412Z add.s32 %r10456, %r23250, -65536; 2026-02-21T09:19:14.1806596Z cvt.s64.s32 %rd451, %r10456; 2026-02-21T09:19:14.1806664Z add.s64 %rd434, %rd65, %rd451; 2026-02-21T09:19:14.1806881Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1807044Z shl.b32 %r10457, %r23252, 10; 2026-02-21T09:19:14.1807116Z add.s32 %r10397, %r66, %r10457; 2026-02-21T09:19:14.1807180Z selp.b32 %r10398, 4, 0, %p40; 2026-02-21T09:19:14.1807241Z // begin inline asm 2026-02-21T09:19:14.1807394Z cp.async.ca.shared.global [ %r10397 + 0 ], [ %rd434 + 0 ], 0x4, %r10398; 2026-02-21T09:19:14.1807452Z // end inline asm 2026-02-21T09:19:14.1807517Z cp.async.commit_group; 2026-02-21T09:19:14.1807730Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1807796Z add.s64 %rd435, %rd444, 352; 2026-02-21T09:19:14.1807860Z add.s64 %rd436, %rd445, 352; 2026-02-21T09:19:14.1807925Z add.s64 %rd437, %rd446, 
352; 2026-02-21T09:19:14.1807991Z add.s64 %rd438, %rd447, 352; 2026-02-21T09:19:14.1808053Z add.s64 %rd439, %rd448, 352; 2026-02-21T09:19:14.1808115Z add.s64 %rd440, %rd449, 352; 2026-02-21T09:19:14.1808182Z add.s64 %rd441, %rd450, 352; 2026-02-21T09:19:14.1808261Z mad.wide.s32 %rd442, %r23249, 2, %rd64; 2026-02-21T09:19:14.1808473Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1808550Z add.s32 %r10458, %r6453, %r10454; 2026-02-21T09:19:14.1808619Z add.s32 %r10399, %r10458, %r56; 2026-02-21T09:19:14.1808682Z // begin inline asm 2026-02-21T09:19:14.1808832Z cp.async.ca.shared.global [ %r10399 + 0 ], [ %rd435 + 0 ], 0x8, %r10382; 2026-02-21T09:19:14.1808898Z // end inline asm 2026-02-21T09:19:14.1808962Z add.s32 %r10401, %r10399, 2048; 2026-02-21T09:19:14.1809024Z // begin inline asm 2026-02-21T09:19:14.1809168Z cp.async.ca.shared.global [ %r10401 + 0 ], [ %rd436 + 0 ], 0x8, %r10382; 2026-02-21T09:19:14.1809228Z // end inline asm 2026-02-21T09:19:14.1809290Z add.s32 %r10403, %r10399, 4096; 2026-02-21T09:19:14.1809352Z // begin inline asm 2026-02-21T09:19:14.1809493Z cp.async.ca.shared.global [ %r10403 + 0 ], [ %rd437 + 0 ], 0x8, %r10382; 2026-02-21T09:19:14.1809552Z // end inline asm 2026-02-21T09:19:14.1809615Z add.s32 %r10405, %r10399, 6144; 2026-02-21T09:19:14.1809681Z // begin inline asm 2026-02-21T09:19:14.1809817Z cp.async.ca.shared.global [ %r10405 + 0 ], [ %rd438 + 0 ], 0x8, %r10382; 2026-02-21T09:19:14.1809875Z // end inline asm 2026-02-21T09:19:14.1809936Z add.s32 %r10407, %r10399, 8192; 2026-02-21T09:19:14.1810001Z // begin inline asm 2026-02-21T09:19:14.1810149Z cp.async.ca.shared.global [ %r10407 + 0 ], [ %rd439 + 0 ], 0x8, %r10382; 2026-02-21T09:19:14.1810287Z // end inline asm 2026-02-21T09:19:14.1810355Z add.s32 %r10409, %r10399, 10240; 2026-02-21T09:19:14.1810416Z // begin inline asm 2026-02-21T09:19:14.1810553Z cp.async.ca.shared.global [ %r10409 + 0 ], [ %rd440 + 0 ], 0x8, %r10382; 
2026-02-21T09:19:14.1810675Z // end inline asm 2026-02-21T09:19:14.1810736Z add.s32 %r10411, %r10399, 12288; 2026-02-21T09:19:14.1810797Z // begin inline asm 2026-02-21T09:19:14.1810933Z cp.async.ca.shared.global [ %r10411 + 0 ], [ %rd441 + 0 ], 0x8, %r10382; 2026-02-21T09:19:14.1810997Z // end inline asm 2026-02-21T09:19:14.1811062Z add.s32 %r10413, %r10399, 14336; 2026-02-21T09:19:14.1811122Z // begin inline asm 2026-02-21T09:19:14.1811332Z cp.async.ca.shared.global [ %r10413 + 0 ], [ %rd442 + 0 ], 0x8, %r10382; 2026-02-21T09:19:14.1811395Z // end inline asm 2026-02-21T09:19:14.1811465Z cp.async.commit_group; 2026-02-21T09:19:14.1811679Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1811767Z cvt.s64.s32 %rd452, %r23250; 2026-02-21T09:19:14.1811835Z add.s64 %rd443, %rd65, %rd452; 2026-02-21T09:19:14.1812038Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1812107Z add.s32 %r10415, %r76, %r10457; 2026-02-21T09:19:14.1812171Z // begin inline asm 2026-02-21T09:19:14.1812317Z cp.async.ca.shared.global [ %r10415 + 0 ], [ %rd443 + 0 ], 0x4, %r10398; 2026-02-21T09:19:14.1812378Z // end inline asm 2026-02-21T09:19:14.1812495Z cp.async.commit_group; 2026-02-21T09:19:14.1812694Z .loc 1 43 78 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:43:78 2026-02-21T09:19:14.1812759Z add.s32 %r23250, %r23250, 131072; 2026-02-21T09:19:14.1812831Z add.s64 %rd1116, %rd1116, 64; 2026-02-21T09:19:14.1812893Z add.s32 %r23249, %r23249, 32; 2026-02-21T09:19:14.1812962Z setp.lt.u64 %p43, %rd1117, 496; 2026-02-21T09:19:14.1813033Z @%p43 bra $L__BB0_5; 2026-02-21T09:19:14.1813144Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:19:14.1813348Z .loc 1 34 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:34:32 2026-02-21T09:19:14.1813419Z or.b32 %r10930, %r851, %r9; 2026-02-21T09:19:14.1813615Z .loc 1 36 32 // 
co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:36:32 2026-02-21T09:19:14.1813678Z or.b32 %r10931, %r852, %r19; 2026-02-21T09:19:14.1813739Z or.b32 %r10932, %r852, %r20; 2026-02-21T09:19:14.1813806Z or.b32 %r10933, %r852, %r21; 2026-02-21T09:19:14.1813866Z or.b32 %r10934, %r852, %r22; 2026-02-21T09:19:14.1813927Z or.b32 %r10935, %r852, %r23; 2026-02-21T09:19:14.1813995Z or.b32 %r10936, %r852, %r24; 2026-02-21T09:19:14.1814057Z or.b32 %r10937, %r852, %r25; 2026-02-21T09:19:14.1814117Z or.b32 %r10938, %r852, %r26; 2026-02-21T09:19:14.1814183Z or.b32 %r10939, %r852, %r27; 2026-02-21T09:19:14.1814242Z or.b32 %r10940, %r852, %r28; 2026-02-21T09:19:14.1814303Z or.b32 %r10941, %r852, %r29; 2026-02-21T09:19:14.1814361Z or.b32 %r10942, %r852, %r30; 2026-02-21T09:19:14.1814426Z or.b32 %r10943, %r852, %r31; 2026-02-21T09:19:14.1814485Z or.b32 %r10944, %r852, %r32; 2026-02-21T09:19:14.1814547Z or.b32 %r10945, %r852, %r33; 2026-02-21T09:19:14.1814611Z or.b32 %r10946, %r852, %r34; 2026-02-21T09:19:14.1814669Z or.b32 %r10947, %r852, %r35; 2026-02-21T09:19:14.1814729Z or.b32 %r10948, %r852, %r36; 2026-02-21T09:19:14.1814788Z or.b32 %r10949, %r852, %r37; 2026-02-21T09:19:14.1814855Z or.b32 %r10950, %r852, %r38; 2026-02-21T09:19:14.1814915Z or.b32 %r10951, %r852, %r39; 2026-02-21T09:19:14.1814977Z or.b32 %r10952, %r852, %r40; 2026-02-21T09:19:14.1815044Z or.b32 %r10953, %r852, %r41; 2026-02-21T09:19:14.1815106Z or.b32 %r10954, %r852, %r42; 2026-02-21T09:19:14.1815166Z or.b32 %r10955, %r852, %r43; 2026-02-21T09:19:14.1815225Z or.b32 %r10956, %r852, %r44; 2026-02-21T09:19:14.1815289Z or.b32 %r10957, %r852, %r45; 2026-02-21T09:19:14.1815416Z or.b32 %r10958, %r852, %r46; 2026-02-21T09:19:14.1815478Z or.b32 %r10959, %r852, %r47; 2026-02-21T09:19:14.1815542Z or.b32 %r10960, %r852, %r48; 2026-02-21T09:19:14.1815601Z or.b32 %r10961, %r852, %r49; 2026-02-21T09:19:14.1815708Z or.b32 %r10962, %r852, %r50; 2026-02-21T09:19:14.1815911Z .loc 1 43 78 // 
co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:43:78 2026-02-21T09:19:14.1815982Z cp.async.wait_group 0; 2026-02-21T09:19:14.1816042Z bar.sync 0; 2026-02-21T09:19:14.1816241Z .loc 1 90 28 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:90:28 2026-02-21T09:19:14.1816333Z cvt.rn.bf16x2.f32 %r10963, %r23254, %r23253; 2026-02-21T09:19:14.1816620Z cvt.rn.bf16x2.f32 %r10964, %r23256, %r23255; 2026-02-21T09:19:14.1816715Z cvt.rn.bf16x2.f32 %r10965, %r23258, %r23257; 2026-02-21T09:19:14.1816804Z cvt.rn.bf16x2.f32 %r10966, %r23260, %r23259; 2026-02-21T09:19:14.1816881Z cvt.rn.bf16x2.f32 %r10967, %r23262, %r23261; 2026-02-21T09:19:14.1816962Z cvt.rn.bf16x2.f32 %r10968, %r23264, %r23263; 2026-02-21T09:19:14.1817045Z cvt.rn.bf16x2.f32 %r10969, %r23266, %r23265; 2026-02-21T09:19:14.1817122Z cvt.rn.bf16x2.f32 %r10970, %r23268, %r23267; 2026-02-21T09:19:14.1817205Z cvt.rn.bf16x2.f32 %r10971, %r23270, %r23269; 2026-02-21T09:19:14.1817281Z cvt.rn.bf16x2.f32 %r10972, %r23272, %r23271; 2026-02-21T09:19:14.1817363Z cvt.rn.bf16x2.f32 %r10973, %r23274, %r23273; 2026-02-21T09:19:14.1817438Z cvt.rn.bf16x2.f32 %r10974, %r23276, %r23275; 2026-02-21T09:19:14.1817590Z cvt.rn.bf16x2.f32 %r10975, %r23278, %r23277; 2026-02-21T09:19:14.1817675Z cvt.rn.bf16x2.f32 %r10976, %r23280, %r23279; 2026-02-21T09:19:14.1817753Z cvt.rn.bf16x2.f32 %r10977, %r23282, %r23281; 2026-02-21T09:19:14.1817832Z cvt.rn.bf16x2.f32 %r10978, %r23284, %r23283; 2026-02-21T09:19:14.1817913Z cvt.rn.bf16x2.f32 %r10979, %r23286, %r23285; 2026-02-21T09:19:14.1817991Z cvt.rn.bf16x2.f32 %r10980, %r23288, %r23287; 2026-02-21T09:19:14.1818069Z cvt.rn.bf16x2.f32 %r10981, %r23290, %r23289; 2026-02-21T09:19:14.1818144Z cvt.rn.bf16x2.f32 %r10982, %r23292, %r23291; 2026-02-21T09:19:14.1818224Z cvt.rn.bf16x2.f32 %r10983, %r23294, %r23293; 2026-02-21T09:19:14.1818300Z cvt.rn.bf16x2.f32 %r10984, %r23296, %r23295; 2026-02-21T09:19:14.1818379Z cvt.rn.bf16x2.f32 %r10985, %r23298, %r23297; 2026-02-21T09:19:14.1818462Z 
cvt.rn.bf16x2.f32 %r10986, %r23300, %r23299; 2026-02-21T09:19:14.1818540Z cvt.rn.bf16x2.f32 %r10987, %r23302, %r23301; 2026-02-21T09:19:14.1818617Z cvt.rn.bf16x2.f32 %r10988, %r23304, %r23303; 2026-02-21T09:19:14.1818708Z cvt.rn.bf16x2.f32 %r10989, %r23306, %r23305; 2026-02-21T09:19:14.1818797Z cvt.rn.bf16x2.f32 %r10990, %r23308, %r23307; 2026-02-21T09:19:14.1818880Z cvt.rn.bf16x2.f32 %r10991, %r23310, %r23309; 2026-02-21T09:19:14.1818963Z cvt.rn.bf16x2.f32 %r10992, %r23312, %r23311; 2026-02-21T09:19:14.1819047Z cvt.rn.bf16x2.f32 %r10993, %r23314, %r23313; 2026-02-21T09:19:14.1819127Z cvt.rn.bf16x2.f32 %r10994, %r23316, %r23315; 2026-02-21T09:19:14.1819204Z cvt.rn.bf16x2.f32 %r10995, %r23318, %r23317; 2026-02-21T09:19:14.1819286Z cvt.rn.bf16x2.f32 %r10996, %r23320, %r23319; 2026-02-21T09:19:14.1819364Z cvt.rn.bf16x2.f32 %r10997, %r23322, %r23321; 2026-02-21T09:19:14.1819443Z cvt.rn.bf16x2.f32 %r10998, %r23324, %r23323; 2026-02-21T09:19:14.1819519Z cvt.rn.bf16x2.f32 %r10999, %r23326, %r23325; 2026-02-21T09:19:14.1819603Z cvt.rn.bf16x2.f32 %r11000, %r23328, %r23327; 2026-02-21T09:19:14.1819679Z cvt.rn.bf16x2.f32 %r11001, %r23330, %r23329; 2026-02-21T09:19:14.1819758Z cvt.rn.bf16x2.f32 %r11002, %r23332, %r23331; 2026-02-21T09:19:14.1819843Z cvt.rn.bf16x2.f32 %r11003, %r23334, %r23333; 2026-02-21T09:19:14.1819918Z cvt.rn.bf16x2.f32 %r11004, %r23336, %r23335; 2026-02-21T09:19:14.1819996Z cvt.rn.bf16x2.f32 %r11005, %r23338, %r23337; 2026-02-21T09:19:14.1820078Z cvt.rn.bf16x2.f32 %r11006, %r23340, %r23339; 2026-02-21T09:19:14.1820155Z cvt.rn.bf16x2.f32 %r11007, %r23342, %r23341; 2026-02-21T09:19:14.1820309Z cvt.rn.bf16x2.f32 %r11008, %r23344, %r23343; 2026-02-21T09:19:14.1820385Z cvt.rn.bf16x2.f32 %r11009, %r23346, %r23345; 2026-02-21T09:19:14.1820465Z cvt.rn.bf16x2.f32 %r11010, %r23348, %r23347; 2026-02-21T09:19:14.1820542Z cvt.rn.bf16x2.f32 %r11011, %r23350, %r23349; 2026-02-21T09:19:14.1820681Z cvt.rn.bf16x2.f32 %r11012, %r23352, %r23351; 2026-02-21T09:19:14.1820764Z 
cvt.rn.bf16x2.f32 %r11013, %r23354, %r23353; 2026-02-21T09:19:14.1820840Z cvt.rn.bf16x2.f32 %r11014, %r23356, %r23355; 2026-02-21T09:19:14.1820919Z cvt.rn.bf16x2.f32 %r11015, %r23358, %r23357; 2026-02-21T09:19:14.1821002Z cvt.rn.bf16x2.f32 %r11016, %r23360, %r23359; 2026-02-21T09:19:14.1821078Z cvt.rn.bf16x2.f32 %r11017, %r23362, %r23361; 2026-02-21T09:19:14.1821213Z cvt.rn.bf16x2.f32 %r11018, %r23364, %r23363; 2026-02-21T09:19:14.1821293Z cvt.rn.bf16x2.f32 %r11019, %r23366, %r23365; 2026-02-21T09:19:14.1821374Z cvt.rn.bf16x2.f32 %r11020, %r23368, %r23367; 2026-02-21T09:19:14.1821455Z cvt.rn.bf16x2.f32 %r11021, %r23370, %r23369; 2026-02-21T09:19:14.1821532Z cvt.rn.bf16x2.f32 %r11022, %r23372, %r23371; 2026-02-21T09:19:14.1821615Z cvt.rn.bf16x2.f32 %r11023, %r23374, %r23373; 2026-02-21T09:19:14.1821689Z cvt.rn.bf16x2.f32 %r11024, %r23376, %r23375; 2026-02-21T09:19:14.1821767Z cvt.rn.bf16x2.f32 %r11025, %r23378, %r23377; 2026-02-21T09:19:14.1821848Z cvt.rn.bf16x2.f32 %r11026, %r23380, %r23379; 2026-02-21T09:19:14.1821925Z cvt.rn.bf16x2.f32 %r11027, %r23382, %r23381; 2026-02-21T09:19:14.1822000Z cvt.rn.bf16x2.f32 %r11028, %r23384, %r23383; 2026-02-21T09:19:14.1822133Z cvt.rn.bf16x2.f32 %r11029, %r23386, %r23385; 2026-02-21T09:19:14.1822232Z cvt.rn.bf16x2.f32 %r11030, %r23388, %r23387; 2026-02-21T09:19:14.1822310Z cvt.rn.bf16x2.f32 %r11031, %r23390, %r23389; 2026-02-21T09:19:14.1822389Z cvt.rn.bf16x2.f32 %r11032, %r23392, %r23391; 2026-02-21T09:19:14.1822473Z cvt.rn.bf16x2.f32 %r11033, %r23394, %r23393; 2026-02-21T09:19:14.1822549Z cvt.rn.bf16x2.f32 %r11034, %r23396, %r23395; 2026-02-21T09:19:14.1822627Z cvt.rn.bf16x2.f32 %r11035, %r23398, %r23397; 2026-02-21T09:19:14.1822707Z cvt.rn.bf16x2.f32 %r11036, %r23400, %r23399; 2026-02-21T09:19:14.1822784Z cvt.rn.bf16x2.f32 %r11037, %r23402, %r23401; 2026-02-21T09:19:14.1822860Z cvt.rn.bf16x2.f32 %r11038, %r23404, %r23403; 2026-02-21T09:19:14.1822939Z cvt.rn.bf16x2.f32 %r11039, %r23406, %r23405; 2026-02-21T09:19:14.1823022Z 
cvt.rn.bf16x2.f32 %r11040, %r23408, %r23407; 2026-02-21T09:19:14.1823098Z cvt.rn.bf16x2.f32 %r11041, %r23410, %r23409; 2026-02-21T09:19:14.1823175Z cvt.rn.bf16x2.f32 %r11042, %r23412, %r23411; 2026-02-21T09:19:14.1823258Z cvt.rn.bf16x2.f32 %r11043, %r23414, %r23413; 2026-02-21T09:19:14.1823334Z cvt.rn.bf16x2.f32 %r11044, %r23416, %r23415; 2026-02-21T09:19:14.1823412Z cvt.rn.bf16x2.f32 %r11045, %r23418, %r23417; 2026-02-21T09:19:14.1823495Z cvt.rn.bf16x2.f32 %r11046, %r23420, %r23419; 2026-02-21T09:19:14.1823571Z cvt.rn.bf16x2.f32 %r11047, %r23422, %r23421; 2026-02-21T09:19:14.1823648Z cvt.rn.bf16x2.f32 %r11048, %r23424, %r23423; 2026-02-21T09:19:14.1823724Z cvt.rn.bf16x2.f32 %r11049, %r23426, %r23425; 2026-02-21T09:19:14.1823805Z cvt.rn.bf16x2.f32 %r11050, %r23428, %r23427; 2026-02-21T09:19:14.1823882Z cvt.rn.bf16x2.f32 %r11051, %r23430, %r23429; 2026-02-21T09:19:14.1823961Z cvt.rn.bf16x2.f32 %r11052, %r23432, %r23431; 2026-02-21T09:19:14.1824044Z cvt.rn.bf16x2.f32 %r11053, %r23434, %r23433; 2026-02-21T09:19:14.1824122Z cvt.rn.bf16x2.f32 %r11054, %r23436, %r23435; 2026-02-21T09:19:14.1824197Z cvt.rn.bf16x2.f32 %r11055, %r23438, %r23437; 2026-02-21T09:19:14.1824274Z cvt.rn.bf16x2.f32 %r11056, %r23440, %r23439; 2026-02-21T09:19:14.1824362Z cvt.rn.bf16x2.f32 %r11057, %r23442, %r23441; 2026-02-21T09:19:14.1824440Z cvt.rn.bf16x2.f32 %r11058, %r23444, %r23443; 2026-02-21T09:19:14.1824517Z cvt.rn.bf16x2.f32 %r11059, %r23446, %r23445; 2026-02-21T09:19:14.1824600Z cvt.rn.bf16x2.f32 %r11060, %r23448, %r23447; 2026-02-21T09:19:14.1824677Z cvt.rn.bf16x2.f32 %r11061, %r23450, %r23449; 2026-02-21T09:19:14.1824813Z cvt.rn.bf16x2.f32 %r11062, %r23452, %r23451; 2026-02-21T09:19:14.1824894Z cvt.rn.bf16x2.f32 %r11063, %r23454, %r23453; 2026-02-21T09:19:14.1824971Z cvt.rn.bf16x2.f32 %r11064, %r23456, %r23455; 2026-02-21T09:19:14.1825049Z cvt.rn.bf16x2.f32 %r11065, %r23458, %r23457; 2026-02-21T09:19:14.1825177Z cvt.rn.bf16x2.f32 %r11066, %r23460, %r23459; 2026-02-21T09:19:14.1825258Z 
cvt.rn.bf16x2.f32 %r11067, %r23462, %r23461; 2026-02-21T09:19:14.1825349Z cvt.rn.bf16x2.f32 %r11068, %r23464, %r23463; 2026-02-21T09:19:14.1825431Z cvt.rn.bf16x2.f32 %r11069, %r23466, %r23465; 2026-02-21T09:19:14.1825511Z cvt.rn.bf16x2.f32 %r11070, %r23468, %r23467; 2026-02-21T09:19:14.1825593Z cvt.rn.bf16x2.f32 %r11071, %r23470, %r23469; 2026-02-21T09:19:14.1825741Z cvt.rn.bf16x2.f32 %r11072, %r23472, %r23471; 2026-02-21T09:19:14.1825824Z cvt.rn.bf16x2.f32 %r11073, %r23474, %r23473; 2026-02-21T09:19:14.1825900Z cvt.rn.bf16x2.f32 %r11074, %r23476, %r23475; 2026-02-21T09:19:14.1825978Z cvt.rn.bf16x2.f32 %r11075, %r23478, %r23477; 2026-02-21T09:19:14.1826059Z cvt.rn.bf16x2.f32 %r11076, %r23480, %r23479; 2026-02-21T09:19:14.1826135Z cvt.rn.bf16x2.f32 %r11077, %r23482, %r23481; 2026-02-21T09:19:14.1826213Z cvt.rn.bf16x2.f32 %r11078, %r23484, %r23483; 2026-02-21T09:19:14.1826291Z cvt.rn.bf16x2.f32 %r11079, %r23486, %r23485; 2026-02-21T09:19:14.1826372Z cvt.rn.bf16x2.f32 %r11080, %r23488, %r23487; 2026-02-21T09:19:14.1826559Z cvt.rn.bf16x2.f32 %r11081, %r23490, %r23489; 2026-02-21T09:19:14.1826639Z cvt.rn.bf16x2.f32 %r11082, %r23492, %r23491; 2026-02-21T09:19:14.1826794Z cvt.rn.bf16x2.f32 %r11083, %r23494, %r23493; 2026-02-21T09:19:14.1826874Z cvt.rn.bf16x2.f32 %r11084, %r23496, %r23495; 2026-02-21T09:19:14.1826951Z cvt.rn.bf16x2.f32 %r11085, %r23498, %r23497; 2026-02-21T09:19:14.1827037Z cvt.rn.bf16x2.f32 %r11086, %r23500, %r23499; 2026-02-21T09:19:14.1827113Z cvt.rn.bf16x2.f32 %r11087, %r23502, %r23501; 2026-02-21T09:19:14.1827192Z cvt.rn.bf16x2.f32 %r11088, %r23504, %r23503; 2026-02-21T09:19:14.1827270Z cvt.rn.bf16x2.f32 %r11089, %r23506, %r23505; 2026-02-21T09:19:14.1827355Z cvt.rn.bf16x2.f32 %r11090, %r23508, %r23507; 2026-02-21T09:19:14.1827568Z .loc 1 91 43 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:91:43 2026-02-21T09:19:14.1827638Z shl.b32 %r11091, %r10931, 13; 2026-02-21T09:19:14.1827717Z shl.b32 %r11092, %r10932, 13; 
2026-02-21T09:19:14.1827788Z shl.b32 %r11093, %r10933, 13; 2026-02-21T09:19:14.1827849Z shl.b32 %r11094, %r10934, 13; 2026-02-21T09:19:14.1827915Z shl.b32 %r11095, %r10935, 13; 2026-02-21T09:19:14.1827975Z shl.b32 %r11096, %r10936, 13; 2026-02-21T09:19:14.1828036Z shl.b32 %r11097, %r10937, 13; 2026-02-21T09:19:14.1828098Z shl.b32 %r11098, %r10938, 13; 2026-02-21T09:19:14.1828165Z shl.b32 %r11099, %r10939, 13; 2026-02-21T09:19:14.1828226Z shl.b32 %r11100, %r10940, 13; 2026-02-21T09:19:14.1828286Z shl.b32 %r11101, %r10941, 13; 2026-02-21T09:19:14.1828351Z shl.b32 %r11102, %r10942, 13; 2026-02-21T09:19:14.1828412Z shl.b32 %r11103, %r10943, 13; 2026-02-21T09:19:14.1828472Z shl.b32 %r11104, %r10944, 13; 2026-02-21T09:19:14.1828602Z shl.b32 %r11105, %r10945, 13; 2026-02-21T09:19:14.1828669Z shl.b32 %r11106, %r10946, 13; 2026-02-21T09:19:14.1828732Z shl.b32 %r11107, %r10947, 13; 2026-02-21T09:19:14.1828796Z shl.b32 %r11108, %r10948, 13; 2026-02-21T09:19:14.1828862Z shl.b32 %r11109, %r10949, 13; 2026-02-21T09:19:14.1828922Z shl.b32 %r11110, %r10950, 13; 2026-02-21T09:19:14.1828987Z shl.b32 %r11111, %r10951, 13; 2026-02-21T09:19:14.1829060Z shl.b32 %r11112, %r10952, 13; 2026-02-21T09:19:14.1829121Z shl.b32 %r11113, %r10953, 13; 2026-02-21T09:19:14.1829182Z shl.b32 %r11114, %r10954, 13; 2026-02-21T09:19:14.1829242Z shl.b32 %r11115, %r10955, 13; 2026-02-21T09:19:14.1829308Z shl.b32 %r11116, %r10956, 13; 2026-02-21T09:19:14.1829368Z shl.b32 %r11117, %r10957, 13; 2026-02-21T09:19:14.1829428Z shl.b32 %r11118, %r10958, 13; 2026-02-21T09:19:14.1829494Z shl.b32 %r11119, %r10959, 13; 2026-02-21T09:19:14.1829637Z shl.b32 %r11120, %r10960, 13; 2026-02-21T09:19:14.1829697Z shl.b32 %r11121, %r10961, 13; 2026-02-21T09:19:14.1829757Z shl.b32 %r11122, %r10962, 13; 2026-02-21T09:19:14.1829974Z .loc 1 91 50 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:91:50 2026-02-21T09:19:14.1830106Z add.s32 %r11123, %r11091, %r10930; 2026-02-21T09:19:14.1830170Z add.s32 %r11124, %r11092, 
%r10930; 2026-02-21T09:19:14.1830238Z add.s32 %r11125, %r11093, %r10930; 2026-02-21T09:19:14.1830300Z add.s32 %r11126, %r11094, %r10930; 2026-02-21T09:19:14.1830363Z add.s32 %r11127, %r11095, %r10930; 2026-02-21T09:19:14.1830426Z add.s32 %r11128, %r11096, %r10930; 2026-02-21T09:19:14.1830504Z add.s32 %r11129, %r11097, %r10930; 2026-02-21T09:19:14.1830632Z add.s32 %r11130, %r11098, %r10930; 2026-02-21T09:19:14.1830697Z add.s32 %r11131, %r11099, %r10930; 2026-02-21T09:19:14.1830764Z add.s32 %r11132, %r11100, %r10930; 2026-02-21T09:19:14.1830828Z add.s32 %r11133, %r11101, %r10930; 2026-02-21T09:19:14.1830891Z add.s32 %r11134, %r11102, %r10930; 2026-02-21T09:19:14.1830955Z add.s32 %r11135, %r11103, %r10930; 2026-02-21T09:19:14.1831017Z add.s32 %r11136, %r11104, %r10930; 2026-02-21T09:19:14.1831079Z add.s32 %r11137, %r11105, %r10930; 2026-02-21T09:19:14.1831142Z add.s32 %r11138, %r11106, %r10930; 2026-02-21T09:19:14.1831209Z add.s32 %r11139, %r11107, %r10930; 2026-02-21T09:19:14.1831270Z add.s32 %r11140, %r11108, %r10930; 2026-02-21T09:19:14.1831332Z add.s32 %r11141, %r11109, %r10930; 2026-02-21T09:19:14.1831447Z add.s32 %r11142, %r11110, %r10930; 2026-02-21T09:19:14.1831513Z add.s32 %r11143, %r11111, %r10930; 2026-02-21T09:19:14.1831574Z add.s32 %r11144, %r11112, %r10930; 2026-02-21T09:19:14.1831638Z add.s32 %r11145, %r11113, %r10930; 2026-02-21T09:19:14.1831704Z add.s32 %r11146, %r11114, %r10930; 2026-02-21T09:19:14.1831766Z add.s32 %r11147, %r11115, %r10930; 2026-02-21T09:19:14.1831827Z add.s32 %r11148, %r11116, %r10930; 2026-02-21T09:19:14.1831896Z add.s32 %r11149, %r11117, %r10930; 2026-02-21T09:19:14.1831958Z add.s32 %r11150, %r11118, %r10930; 2026-02-21T09:19:14.1832020Z add.s32 %r11151, %r11119, %r10930; 2026-02-21T09:19:14.1832086Z add.s32 %r11152, %r11120, %r10930; 2026-02-21T09:19:14.1832152Z add.s32 %r11153, %r11121, %r10930; 2026-02-21T09:19:14.1832215Z add.s32 %r11154, %r11122, %r10930; 2026-02-21T09:19:14.1832424Z .loc 1 91 22 // 
co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:91:22 2026-02-21T09:19:14.1832506Z mad.wide.s32 %rd453, %r11123, 2, %rd66; 2026-02-21T09:19:14.1832580Z mad.wide.s32 %rd454, %r11124, 2, %rd66; 2026-02-21T09:19:14.1832651Z mad.wide.s32 %rd455, %r11125, 2, %rd66; 2026-02-21T09:19:14.1832726Z mad.wide.s32 %rd456, %r11126, 2, %rd66; 2026-02-21T09:19:14.1832798Z mad.wide.s32 %rd457, %r11127, 2, %rd66; 2026-02-21T09:19:14.1832874Z mad.wide.s32 %rd458, %r11128, 2, %rd66; 2026-02-21T09:19:14.1832944Z mad.wide.s32 %rd459, %r11129, 2, %rd66; 2026-02-21T09:19:14.1833013Z mad.wide.s32 %rd460, %r11130, 2, %rd66; 2026-02-21T09:19:14.1833081Z mad.wide.s32 %rd461, %r11131, 2, %rd66; 2026-02-21T09:19:14.1833154Z mad.wide.s32 %rd462, %r11132, 2, %rd66; 2026-02-21T09:19:14.1833221Z mad.wide.s32 %rd463, %r11133, 2, %rd66; 2026-02-21T09:19:14.1833292Z mad.wide.s32 %rd464, %r11134, 2, %rd66; 2026-02-21T09:19:14.1833371Z mad.wide.s32 %rd465, %r11135, 2, %rd66; 2026-02-21T09:19:14.1833446Z mad.wide.s32 %rd466, %r11136, 2, %rd66; 2026-02-21T09:19:14.1833516Z mad.wide.s32 %rd467, %r11137, 2, %rd66; 2026-02-21T09:19:14.1833590Z mad.wide.s32 %rd468, %r11138, 2, %rd66; 2026-02-21T09:19:14.1833658Z mad.wide.s32 %rd469, %r11139, 2, %rd66; 2026-02-21T09:19:14.1833726Z mad.wide.s32 %rd470, %r11140, 2, %rd66; 2026-02-21T09:19:14.1833798Z mad.wide.s32 %rd471, %r11141, 2, %rd66; 2026-02-21T09:19:14.1833870Z mad.wide.s32 %rd472, %r11142, 2, %rd66; 2026-02-21T09:19:14.1833937Z mad.wide.s32 %rd473, %r11143, 2, %rd66; 2026-02-21T09:19:14.1834073Z mad.wide.s32 %rd474, %r11144, 2, %rd66; 2026-02-21T09:19:14.1834145Z mad.wide.s32 %rd475, %r11145, 2, %rd66; 2026-02-21T09:19:14.1834212Z mad.wide.s32 %rd476, %r11146, 2, %rd66; 2026-02-21T09:19:14.1834279Z mad.wide.s32 %rd477, %r11147, 2, %rd66; 2026-02-21T09:19:14.1834402Z mad.wide.s32 %rd478, %r11148, 2, %rd66; 2026-02-21T09:19:14.1834472Z mad.wide.s32 %rd479, %r11149, 2, %rd66; 2026-02-21T09:19:14.1834540Z mad.wide.s32 %rd480, %r11150, 2, %rd66; 
2026-02-21T09:19:14.1834608Z mad.wide.s32 %rd481, %r11151, 2, %rd66; 2026-02-21T09:19:14.1834681Z mad.wide.s32 %rd482, %r11152, 2, %rd66; 2026-02-21T09:19:14.1834749Z mad.wide.s32 %rd483, %r11153, 2, %rd66; 2026-02-21T09:19:14.1834816Z mad.wide.s32 %rd484, %r11154, 2, %rd66; 2026-02-21T09:19:14.1835077Z .loc 1 91 81 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:91:81 2026-02-21T09:19:14.1835204Z st.shared.v4.b32 [%r165], {%r10963, %r10965, %r10967, %r10969}; 2026-02-21T09:19:14.1835332Z st.shared.v4.b32 [%r165+512], {%r10964, %r10966, %r10968, %r10970}; 2026-02-21T09:19:14.1835454Z st.shared.v4.b32 [%r166], {%r10971, %r10973, %r10975, %r10977}; 2026-02-21T09:19:14.1835573Z st.shared.v4.b32 [%r166+512], {%r10972, %r10974, %r10976, %r10978}; 2026-02-21T09:19:14.1835685Z st.shared.v4.b32 [%r167], {%r10979, %r10981, %r10983, %r10985}; 2026-02-21T09:19:14.1835799Z st.shared.v4.b32 [%r167+512], {%r10980, %r10982, %r10984, %r10986}; 2026-02-21T09:19:14.1835914Z st.shared.v4.b32 [%r168], {%r10987, %r10989, %r10991, %r10993}; 2026-02-21T09:19:14.1836074Z st.shared.v4.b32 [%r168+512], {%r10988, %r10990, %r10992, %r10994}; 2026-02-21T09:19:14.1836134Z bar.sync 0; 2026-02-21T09:19:14.1836201Z // begin inline asm 2026-02-21T09:19:14.1836408Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10619, %r10620, %r10621, %r10622}, [%r6479]; 2026-02-21T09:19:14.1836584Z // end inline asm 2026-02-21T09:19:14.1836654Z // begin inline asm 2026-02-21T09:19:14.1836849Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10623, %r10624, %r10625, %r10626}, [%r6484]; 2026-02-21T09:19:14.1836910Z // end inline asm 2026-02-21T09:19:14.1836969Z // begin inline asm 2026-02-21T09:19:14.1837160Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10627, %r10628, %r10629, %r10630}, [%r6489]; 2026-02-21T09:19:14.1837220Z // end inline asm 2026-02-21T09:19:14.1837279Z // begin inline asm 2026-02-21T09:19:14.1837468Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10631, %r10632, %r10633, %r10634}, [%r6494]; 
2026-02-21T09:19:14.1837526Z // end inline asm 2026-02-21T09:19:14.1837583Z // begin inline asm 2026-02-21T09:19:14.1837772Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10635, %r10636, %r10637, %r10638}, [%r6499]; 2026-02-21T09:19:14.1837828Z // end inline asm 2026-02-21T09:19:14.1837891Z // begin inline asm 2026-02-21T09:19:14.1838078Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10639, %r10640, %r10641, %r10642}, [%r6504]; 2026-02-21T09:19:14.1838138Z // end inline asm 2026-02-21T09:19:14.1838197Z // begin inline asm 2026-02-21T09:19:14.1838384Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10643, %r10644, %r10645, %r10646}, [%r6509]; 2026-02-21T09:19:14.1838444Z // end inline asm 2026-02-21T09:19:14.1838503Z // begin inline asm 2026-02-21T09:19:14.1838687Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10647, %r10648, %r10649, %r10650}, [%r6514]; 2026-02-21T09:19:14.1838750Z // end inline asm 2026-02-21T09:19:14.1838807Z bar.sync 0; 2026-02-21T09:19:14.1838920Z st.shared.v4.b32 [%r165], {%r10995, %r10997, %r10999, %r11001}; 2026-02-21T09:19:14.1839039Z st.shared.v4.b32 [%r165+512], {%r10996, %r10998, %r11000, %r11002}; 2026-02-21T09:19:14.1839153Z st.shared.v4.b32 [%r166], {%r11003, %r11005, %r11007, %r11009}; 2026-02-21T09:19:14.1839269Z st.shared.v4.b32 [%r166+512], {%r11004, %r11006, %r11008, %r11010}; 2026-02-21T09:19:14.1839379Z st.shared.v4.b32 [%r167], {%r11011, %r11013, %r11015, %r11017}; 2026-02-21T09:19:14.1839498Z st.shared.v4.b32 [%r167+512], {%r11012, %r11014, %r11016, %r11018}; 2026-02-21T09:19:14.1839692Z st.shared.v4.b32 [%r168], {%r11019, %r11021, %r11023, %r11025}; 2026-02-21T09:19:14.1839806Z st.shared.v4.b32 [%r168+512], {%r11020, %r11022, %r11024, %r11026}; 2026-02-21T09:19:14.1839867Z bar.sync 0; 2026-02-21T09:19:14.1839927Z // begin inline asm 2026-02-21T09:19:14.1840183Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10651, %r10652, %r10653, %r10654}, [%r6479]; 2026-02-21T09:19:14.1840239Z // end inline asm 2026-02-21T09:19:14.1840303Z // begin 
inline asm 2026-02-21T09:19:14.1840492Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10655, %r10656, %r10657, %r10658}, [%r6484]; 2026-02-21T09:19:14.1840552Z // end inline asm 2026-02-21T09:19:14.1840617Z // begin inline asm 2026-02-21T09:19:14.1840805Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10659, %r10660, %r10661, %r10662}, [%r6489]; 2026-02-21T09:19:14.1840925Z // end inline asm 2026-02-21T09:19:14.1840991Z // begin inline asm 2026-02-21T09:19:14.1841178Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10663, %r10664, %r10665, %r10666}, [%r6494]; 2026-02-21T09:19:14.1841237Z // end inline asm 2026-02-21T09:19:14.1841296Z // begin inline asm 2026-02-21T09:19:14.1841486Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10667, %r10668, %r10669, %r10670}, [%r6499]; 2026-02-21T09:19:14.1841544Z // end inline asm 2026-02-21T09:19:14.1841605Z // begin inline asm 2026-02-21T09:19:14.1841802Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10671, %r10672, %r10673, %r10674}, [%r6504]; 2026-02-21T09:19:14.1841866Z // end inline asm 2026-02-21T09:19:14.1841928Z // begin inline asm 2026-02-21T09:19:14.1842188Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10675, %r10676, %r10677, %r10678}, [%r6509]; 2026-02-21T09:19:14.1842251Z // end inline asm 2026-02-21T09:19:14.1842309Z // begin inline asm 2026-02-21T09:19:14.1842498Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10679, %r10680, %r10681, %r10682}, [%r6514]; 2026-02-21T09:19:14.1842559Z // end inline asm 2026-02-21T09:19:14.1842615Z bar.sync 0; 2026-02-21T09:19:14.1842727Z st.shared.v4.b32 [%r165], {%r11027, %r11029, %r11031, %r11033}; 2026-02-21T09:19:14.1842850Z st.shared.v4.b32 [%r165+512], {%r11028, %r11030, %r11032, %r11034}; 2026-02-21T09:19:14.1842960Z st.shared.v4.b32 [%r166], {%r11035, %r11037, %r11039, %r11041}; 2026-02-21T09:19:14.1843074Z st.shared.v4.b32 [%r166+512], {%r11036, %r11038, %r11040, %r11042}; 2026-02-21T09:19:14.1843184Z st.shared.v4.b32 [%r167], {%r11043, %r11045, %r11047, %r11049}; 
2026-02-21T09:19:14.1843301Z st.shared.v4.b32 [%r167+512], {%r11044, %r11046, %r11048, %r11050}; 2026-02-21T09:19:14.1843415Z st.shared.v4.b32 [%r168], {%r11051, %r11053, %r11055, %r11057}; 2026-02-21T09:19:14.1843531Z st.shared.v4.b32 [%r168+512], {%r11052, %r11054, %r11056, %r11058}; 2026-02-21T09:19:14.1843592Z bar.sync 0; 2026-02-21T09:19:14.1843651Z // begin inline asm 2026-02-21T09:19:14.1843842Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10683, %r10684, %r10685, %r10686}, [%r6479]; 2026-02-21T09:19:14.1843903Z // end inline asm 2026-02-21T09:19:14.1843961Z // begin inline asm 2026-02-21T09:19:14.1844151Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10687, %r10688, %r10689, %r10690}, [%r6484]; 2026-02-21T09:19:14.1844207Z // end inline asm 2026-02-21T09:19:14.1844267Z // begin inline asm 2026-02-21T09:19:14.1844454Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10691, %r10692, %r10693, %r10694}, [%r6489]; 2026-02-21T09:19:14.1844513Z // end inline asm 2026-02-21T09:19:14.1844575Z // begin inline asm 2026-02-21T09:19:14.1844760Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10695, %r10696, %r10697, %r10698}, [%r6494]; 2026-02-21T09:19:14.1844816Z // end inline asm 2026-02-21T09:19:14.1844878Z // begin inline asm 2026-02-21T09:19:14.1845076Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10699, %r10700, %r10701, %r10702}, [%r6499]; 2026-02-21T09:19:14.1845135Z // end inline asm 2026-02-21T09:19:14.1845194Z // begin inline asm 2026-02-21T09:19:14.1845387Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10703, %r10704, %r10705, %r10706}, [%r6504]; 2026-02-21T09:19:14.1845504Z // end inline asm 2026-02-21T09:19:14.1845562Z // begin inline asm 2026-02-21T09:19:14.1845750Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10707, %r10708, %r10709, %r10710}, [%r6509]; 2026-02-21T09:19:14.1845807Z // end inline asm 2026-02-21T09:19:14.1845866Z // begin inline asm 2026-02-21T09:19:14.1846123Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10711, %r10712, %r10713, %r10714}, [%r6514]; 
2026-02-21T09:19:14.1846180Z // end inline asm 2026-02-21T09:19:14.1846238Z bar.sync 0; 2026-02-21T09:19:14.1846349Z st.shared.v4.b32 [%r165], {%r11059, %r11061, %r11063, %r11065}; 2026-02-21T09:19:14.1846585Z st.shared.v4.b32 [%r165+512], {%r11060, %r11062, %r11064, %r11066}; 2026-02-21T09:19:14.1846704Z st.shared.v4.b32 [%r166], {%r11067, %r11069, %r11071, %r11073}; 2026-02-21T09:19:14.1846892Z st.shared.v4.b32 [%r166+512], {%r11068, %r11070, %r11072, %r11074}; 2026-02-21T09:19:14.1847008Z st.shared.v4.b32 [%r167], {%r11075, %r11077, %r11079, %r11081}; 2026-02-21T09:19:14.1847121Z st.shared.v4.b32 [%r167+512], {%r11076, %r11078, %r11080, %r11082}; 2026-02-21T09:19:14.1847230Z st.shared.v4.b32 [%r168], {%r11083, %r11085, %r11087, %r11089}; 2026-02-21T09:19:14.1847348Z st.shared.v4.b32 [%r168+512], {%r11084, %r11086, %r11088, %r11090}; 2026-02-21T09:19:14.1847405Z bar.sync 0; 2026-02-21T09:19:14.1847468Z // begin inline asm 2026-02-21T09:19:14.1847657Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10715, %r10716, %r10717, %r10718}, [%r6479]; 2026-02-21T09:19:14.1847717Z // end inline asm 2026-02-21T09:19:14.1847776Z // begin inline asm 2026-02-21T09:19:14.1848026Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10719, %r10720, %r10721, %r10722}, [%r6484]; 2026-02-21T09:19:14.1848089Z // end inline asm 2026-02-21T09:19:14.1848149Z // begin inline asm 2026-02-21T09:19:14.1848352Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10723, %r10724, %r10725, %r10726}, [%r6489]; 2026-02-21T09:19:14.1848416Z // end inline asm 2026-02-21T09:19:14.1848475Z // begin inline asm 2026-02-21T09:19:14.1848660Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10727, %r10728, %r10729, %r10730}, [%r6494]; 2026-02-21T09:19:14.1848718Z // end inline asm 2026-02-21T09:19:14.1848780Z // begin inline asm 2026-02-21T09:19:14.1848966Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10731, %r10732, %r10733, %r10734}, [%r6499]; 2026-02-21T09:19:14.1849024Z // end inline asm 2026-02-21T09:19:14.1849086Z // begin 
inline asm 2026-02-21T09:19:14.1849271Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10735, %r10736, %r10737, %r10738}, [%r6504]; 2026-02-21T09:19:14.1849328Z // end inline asm 2026-02-21T09:19:14.1849388Z // begin inline asm 2026-02-21T09:19:14.1849579Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10739, %r10740, %r10741, %r10742}, [%r6509]; 2026-02-21T09:19:14.1849636Z // end inline asm 2026-02-21T09:19:14.1849696Z // begin inline asm 2026-02-21T09:19:14.1849889Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10743, %r10744, %r10745, %r10746}, [%r6514]; 2026-02-21T09:19:14.1849948Z // end inline asm 2026-02-21T09:19:14.1850008Z // begin inline asm 2026-02-21T09:19:14.1850141Z st.global.v4.b32 [ %rd453 + 0 ], { %r10619, %r10620, %r10621, %r10622 }; 2026-02-21T09:19:14.1850199Z // end inline asm 2026-02-21T09:19:14.1850258Z // begin inline asm 2026-02-21T09:19:14.1850382Z st.global.v4.b32 [ %rd454 + 0 ], { %r10623, %r10624, %r10625, %r10626 }; 2026-02-21T09:19:14.1850444Z // end inline asm 2026-02-21T09:19:14.1850505Z // begin inline asm 2026-02-21T09:19:14.1850625Z st.global.v4.b32 [ %rd455 + 0 ], { %r10627, %r10628, %r10629, %r10630 }; 2026-02-21T09:19:14.1850685Z // end inline asm 2026-02-21T09:19:14.1850747Z // begin inline asm 2026-02-21T09:19:14.1850864Z st.global.v4.b32 [ %rd456 + 0 ], { %r10631, %r10632, %r10633, %r10634 }; 2026-02-21T09:19:14.1850924Z // end inline asm 2026-02-21T09:19:14.1850984Z // begin inline asm 2026-02-21T09:19:14.1851102Z st.global.v4.b32 [ %rd457 + 0 ], { %r10635, %r10636, %r10637, %r10638 }; 2026-02-21T09:19:14.1851158Z // end inline asm 2026-02-21T09:19:14.1851221Z // begin inline asm 2026-02-21T09:19:14.1851430Z st.global.v4.b32 [ %rd458 + 0 ], { %r10639, %r10640, %r10641, %r10642 }; 2026-02-21T09:19:14.1851489Z // end inline asm 2026-02-21T09:19:14.1851552Z // begin inline asm 2026-02-21T09:19:14.1851671Z st.global.v4.b32 [ %rd459 + 0 ], { %r10643, %r10644, %r10645, %r10646 }; 2026-02-21T09:19:14.1851792Z // end inline asm 
2026-02-21T09:19:14.1851850Z // begin inline asm 2026-02-21T09:19:14.1851972Z st.global.v4.b32 [ %rd460 + 0 ], { %r10647, %r10648, %r10649, %r10650 }; 2026-02-21T09:19:14.1852029Z // end inline asm 2026-02-21T09:19:14.1852089Z // begin inline asm 2026-02-21T09:19:14.1852213Z st.global.v4.b32 [ %rd461 + 0 ], { %r10651, %r10652, %r10653, %r10654 }; 2026-02-21T09:19:14.1852270Z // end inline asm 2026-02-21T09:19:14.1852328Z // begin inline asm 2026-02-21T09:19:14.1852498Z st.global.v4.b32 [ %rd462 + 0 ], { %r10655, %r10656, %r10657, %r10658 }; 2026-02-21T09:19:14.1852558Z // end inline asm 2026-02-21T09:19:14.1852617Z // begin inline asm 2026-02-21T09:19:14.1852736Z st.global.v4.b32 [ %rd463 + 0 ], { %r10659, %r10660, %r10661, %r10662 }; 2026-02-21T09:19:14.1852796Z // end inline asm 2026-02-21T09:19:14.1852866Z // begin inline asm 2026-02-21T09:19:14.1852987Z st.global.v4.b32 [ %rd464 + 0 ], { %r10663, %r10664, %r10665, %r10666 }; 2026-02-21T09:19:14.1853048Z // end inline asm 2026-02-21T09:19:14.1853107Z // begin inline asm 2026-02-21T09:19:14.1853223Z st.global.v4.b32 [ %rd465 + 0 ], { %r10667, %r10668, %r10669, %r10670 }; 2026-02-21T09:19:14.1853280Z // end inline asm 2026-02-21T09:19:14.1853344Z // begin inline asm 2026-02-21T09:19:14.1853509Z st.global.v4.b32 [ %rd466 + 0 ], { %r10671, %r10672, %r10673, %r10674 }; 2026-02-21T09:19:14.1853569Z // end inline asm 2026-02-21T09:19:14.1853632Z // begin inline asm 2026-02-21T09:19:14.1853751Z st.global.v4.b32 [ %rd467 + 0 ], { %r10675, %r10676, %r10677, %r10678 }; 2026-02-21T09:19:14.1853808Z // end inline asm 2026-02-21T09:19:14.1853867Z // begin inline asm 2026-02-21T09:19:14.1853991Z st.global.v4.b32 [ %rd468 + 0 ], { %r10679, %r10680, %r10681, %r10682 }; 2026-02-21T09:19:14.1854047Z // end inline asm 2026-02-21T09:19:14.1854106Z // begin inline asm 2026-02-21T09:19:14.1854225Z st.global.v4.b32 [ %rd469 + 0 ], { %r10683, %r10684, %r10685, %r10686 }; 2026-02-21T09:19:14.1854282Z // end inline asm 
2026-02-21T09:19:14.1854340Z // begin inline asm 2026-02-21T09:19:14.1854465Z st.global.v4.b32 [ %rd470 + 0 ], { %r10687, %r10688, %r10689, %r10690 }; 2026-02-21T09:19:14.1854521Z // end inline asm 2026-02-21T09:19:14.1854582Z // begin inline asm 2026-02-21T09:19:14.1854701Z st.global.v4.b32 [ %rd471 + 0 ], { %r10691, %r10692, %r10693, %r10694 }; 2026-02-21T09:19:14.1854761Z // end inline asm 2026-02-21T09:19:14.1854817Z // begin inline asm 2026-02-21T09:19:14.1854935Z st.global.v4.b32 [ %rd472 + 0 ], { %r10695, %r10696, %r10697, %r10698 }; 2026-02-21T09:19:14.1854993Z // end inline asm 2026-02-21T09:19:14.1855052Z // begin inline asm 2026-02-21T09:19:14.1855169Z st.global.v4.b32 [ %rd473 + 0 ], { %r10699, %r10700, %r10701, %r10702 }; 2026-02-21T09:19:14.1855231Z // end inline asm 2026-02-21T09:19:14.1855291Z // begin inline asm 2026-02-21T09:19:14.1855410Z st.global.v4.b32 [ %rd474 + 0 ], { %r10703, %r10704, %r10705, %r10706 }; 2026-02-21T09:19:14.1855469Z // end inline asm 2026-02-21T09:19:14.1855532Z // begin inline asm 2026-02-21T09:19:14.1855647Z st.global.v4.b32 [ %rd475 + 0 ], { %r10707, %r10708, %r10709, %r10710 }; 2026-02-21T09:19:14.1855703Z // end inline asm 2026-02-21T09:19:14.1855767Z // begin inline asm 2026-02-21T09:19:14.1855886Z st.global.v4.b32 [ %rd476 + 0 ], { %r10711, %r10712, %r10713, %r10714 }; 2026-02-21T09:19:14.1855942Z // end inline asm 2026-02-21T09:19:14.1856001Z // begin inline asm 2026-02-21T09:19:14.1856123Z st.global.v4.b32 [ %rd477 + 0 ], { %r10715, %r10716, %r10717, %r10718 }; 2026-02-21T09:19:14.1856180Z // end inline asm 2026-02-21T09:19:14.1856238Z // begin inline asm 2026-02-21T09:19:14.1856365Z st.global.v4.b32 [ %rd478 + 0 ], { %r10719, %r10720, %r10721, %r10722 }; 2026-02-21T09:19:14.1856591Z // end inline asm 2026-02-21T09:19:14.1856654Z // begin inline asm 2026-02-21T09:19:14.1856775Z st.global.v4.b32 [ %rd479 + 0 ], { %r10723, %r10724, %r10725, %r10726 }; 2026-02-21T09:19:14.1856848Z // end inline asm 
2026-02-21T09:19:14.1856979Z // begin inline asm 2026-02-21T09:19:14.1857098Z st.global.v4.b32 [ %rd480 + 0 ], { %r10727, %r10728, %r10729, %r10730 }; 2026-02-21T09:19:14.1857158Z // end inline asm 2026-02-21T09:19:14.1857216Z // begin inline asm 2026-02-21T09:19:14.1857334Z st.global.v4.b32 [ %rd481 + 0 ], { %r10731, %r10732, %r10733, %r10734 }; 2026-02-21T09:19:14.1857393Z // end inline asm 2026-02-21T09:19:14.1857452Z // begin inline asm 2026-02-21T09:19:14.1857633Z st.global.v4.b32 [ %rd482 + 0 ], { %r10735, %r10736, %r10737, %r10738 }; 2026-02-21T09:19:14.1857693Z // end inline asm 2026-02-21T09:19:14.1857756Z // begin inline asm 2026-02-21T09:19:14.1857872Z st.global.v4.b32 [ %rd483 + 0 ], { %r10739, %r10740, %r10741, %r10742 }; 2026-02-21T09:19:14.1857931Z // end inline asm 2026-02-21T09:19:14.1857997Z // begin inline asm 2026-02-21T09:19:14.1858125Z st.global.v4.b32 [ %rd484 + 0 ], { %r10743, %r10744, %r10745, %r10746 }; 2026-02-21T09:19:14.1858182Z // end inline asm 2026-02-21T09:19:14.1858419Z .loc 1 22 121 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:22:121 2026-02-21T09:19:14.1858492Z add.s32 %r11155, %r22988, 2; 2026-02-21T09:19:14.1858708Z .loc 1 29 33 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:29:33 2026-02-21T09:19:14.1858836Z shr.u32 %r11156, %r11155, 5; 2026-02-21T09:19:14.1858912Z and.b32 %r11157, %r11156, 67108856; 2026-02-21T09:19:14.1859115Z .loc 1 30 39 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:30:39 2026-02-21T09:19:14.1859180Z sub.s32 %r11158, 64, %r11157; 2026-02-21T09:19:14.1859382Z .loc 1 30 52 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:30:52 2026-02-21T09:19:14.1859448Z min.s32 %r11159, %r11158, 8; 2026-02-21T09:19:14.1859645Z .loc 1 31 45 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:31:45 2026-02-21T09:19:14.1859712Z and.b32 %r11160, %r11155, 255; 2026-02-21T09:19:14.1859908Z .loc 1 32 51 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:32:51 
2026-02-21T09:19:14.1859975Z div.s32 %r11161, %r11160, %r11159; 2026-02-21T09:19:14.1860175Z .loc 1 31 64 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:31:64 2026-02-21T09:19:14.1860248Z mul.lo.s32 %r11162, %r11161, %r11159; 2026-02-21T09:19:14.1860311Z sub.s32 %r11163, %r11160, %r11162; 2026-02-21T09:19:14.1860508Z .loc 1 31 30 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:31:30 2026-02-21T09:19:14.1860573Z add.s32 %r11164, %r11163, %r11157; 2026-02-21T09:19:14.1860768Z .loc 1 33 27 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:33:27 2026-02-21T09:19:14.1860833Z shl.b32 %r1375, %r11164, 7; 2026-02-21T09:19:14.1861030Z .loc 1 34 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:34:32 2026-02-21T09:19:14.1861094Z or.b32 %r11165, %r1375, %r7; 2026-02-21T09:19:14.1861290Z .loc 1 35 27 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:35:27 2026-02-21T09:19:14.1861355Z shl.b32 %r1376, %r11161, 9; 2026-02-21T09:19:14.1861551Z .loc 1 36 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:36:32 2026-02-21T09:19:14.1861616Z or.b32 %r11166, %r1376, %r11; 2026-02-21T09:19:14.1861678Z or.b32 %r11167, %r1376, %r12; 2026-02-21T09:19:14.1861742Z or.b32 %r11168, %r1376, %r13; 2026-02-21T09:19:14.1861802Z or.b32 %r11169, %r1376, %r14; 2026-02-21T09:19:14.1861863Z or.b32 %r11170, %r1376, %r15; 2026-02-21T09:19:14.1861928Z or.b32 %r11171, %r1376, %r16; 2026-02-21T09:19:14.1861988Z or.b32 %r11172, %r1376, %r17; 2026-02-21T09:19:14.1862126Z or.b32 %r11173, %r1376, %r18; 2026-02-21T09:19:14.1862326Z .loc 1 51 53 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:53 2026-02-21T09:19:14.1862387Z shl.b32 %r11174, %r11166, 10; 2026-02-21T09:19:14.1862493Z shl.b32 %r11175, %r11167, 10; 2026-02-21T09:19:14.1862552Z shl.b32 %r11176, %r11168, 10; 2026-02-21T09:19:14.1862616Z shl.b32 %r11177, %r11169, 10; 2026-02-21T09:19:14.1862676Z shl.b32 %r11178, %r11170, 10; 2026-02-21T09:19:14.1862738Z shl.b32 %r11179, 
%r11171, 10; 2026-02-21T09:19:14.1862803Z shl.b32 %r11180, %r11172, 10; 2026-02-21T09:19:14.1862862Z shl.b32 %r11181, %r11173, 10; 2026-02-21T09:19:14.1863119Z .loc 1 51 60 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:60 2026-02-21T09:19:14.1863191Z or.b32 %r11182, %r11174, %r54; 2026-02-21T09:19:14.1863254Z or.b32 %r11183, %r11175, %r54; 2026-02-21T09:19:14.1863318Z or.b32 %r11184, %r11176, %r54; 2026-02-21T09:19:14.1863383Z or.b32 %r11185, %r11177, %r54; 2026-02-21T09:19:14.1863447Z or.b32 %r11186, %r11178, %r54; 2026-02-21T09:19:14.1863509Z or.b32 %r11187, %r11179, %r54; 2026-02-21T09:19:14.1863568Z or.b32 %r11188, %r11180, %r54; 2026-02-21T09:19:14.1863632Z or.b32 %r11189, %r11181, %r54; 2026-02-21T09:19:14.1863831Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1863905Z mad.wide.s32 %rd485, %r11182, 2, %rd64; 2026-02-21T09:19:14.1863976Z mad.wide.s32 %rd486, %r11183, 2, %rd64; 2026-02-21T09:19:14.1864093Z mad.wide.s32 %rd487, %r11184, 2, %rd64; 2026-02-21T09:19:14.1864164Z mad.wide.s32 %rd488, %r11185, 2, %rd64; 2026-02-21T09:19:14.1864233Z mad.wide.s32 %rd489, %r11186, 2, %rd64; 2026-02-21T09:19:14.1864305Z mad.wide.s32 %rd490, %r11187, 2, %rd64; 2026-02-21T09:19:14.1864374Z mad.wide.s32 %rd491, %r11188, 2, %rd64; 2026-02-21T09:19:14.1864441Z mad.wide.s32 %rd492, %r11189, 2, %rd64; 2026-02-21T09:19:14.1864641Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1864699Z bar.sync 0; 2026-02-21T09:19:14.1864757Z mov.b32 %r10748, 8; 2026-02-21T09:19:14.1864818Z // begin inline asm 2026-02-21T09:19:14.1864966Z cp.async.ca.shared.global [ %r57 + 0 ], [ %rd485 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1865023Z // end inline asm 2026-02-21T09:19:14.1865082Z // begin inline asm 2026-02-21T09:19:14.1865221Z cp.async.ca.shared.global [ %r58 + 0 ], [ %rd486 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1865278Z // end inline asm 2026-02-21T09:19:14.1865338Z // 
begin inline asm 2026-02-21T09:19:14.1865469Z cp.async.ca.shared.global [ %r59 + 0 ], [ %rd487 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1865529Z // end inline asm 2026-02-21T09:19:14.1865589Z // begin inline asm 2026-02-21T09:19:14.1865719Z cp.async.ca.shared.global [ %r60 + 0 ], [ %rd488 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1865779Z // end inline asm 2026-02-21T09:19:14.1865839Z // begin inline asm 2026-02-21T09:19:14.1865968Z cp.async.ca.shared.global [ %r61 + 0 ], [ %rd489 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1866027Z // end inline asm 2026-02-21T09:19:14.1866085Z // begin inline asm 2026-02-21T09:19:14.1866213Z cp.async.ca.shared.global [ %r62 + 0 ], [ %rd490 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1866272Z // end inline asm 2026-02-21T09:19:14.1866334Z // begin inline asm 2026-02-21T09:19:14.1866588Z cp.async.ca.shared.global [ %r63 + 0 ], [ %rd491 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1866653Z // end inline asm 2026-02-21T09:19:14.1866716Z // begin inline asm 2026-02-21T09:19:14.1866849Z cp.async.ca.shared.global [ %r64 + 0 ], [ %rd492 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1866906Z // end inline asm 2026-02-21T09:19:14.1866975Z cp.async.commit_group; 2026-02-21T09:19:14.1867179Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1867245Z add.s32 %r11190, %r11165, %r22968; 2026-02-21T09:19:14.1867563Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1867634Z cvt.s64.s32 %rd576, %r11190; 2026-02-21T09:19:14.1867700Z add.s64 %rd493, %rd65, %rd576; 2026-02-21T09:19:14.1867827Z mov.b32 %r23512, 4; 2026-02-21T09:19:14.1868027Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1868086Z // begin inline asm 2026-02-21T09:19:14.1868219Z cp.async.ca.shared.global [ %r66 + 0 ], [ %rd493 + 0 ], 0x4, %r23512; 2026-02-21T09:19:14.1868280Z // end inline asm 2026-02-21T09:19:14.1868352Z cp.async.commit_group; 
2026-02-21T09:19:14.1868644Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1868783Z cvt.s64.s32 %rd577, %r11174; 2026-02-21T09:19:14.1868859Z or.b64 %rd578, %rd577, %rd1113; 2026-02-21T09:19:14.1868923Z shl.b64 %rd579, %rd578, 1; 2026-02-21T09:19:14.1868989Z add.s64 %rd580, %rd64, %rd579; 2026-02-21T09:19:14.1869056Z add.s64 %rd494, %rd580, 32; 2026-02-21T09:19:14.1869118Z cvt.s64.s32 %rd581, %r11175; 2026-02-21T09:19:14.1869182Z or.b64 %rd582, %rd581, %rd1113; 2026-02-21T09:19:14.1869244Z shl.b64 %rd583, %rd582, 1; 2026-02-21T09:19:14.1869314Z add.s64 %rd584, %rd64, %rd583; 2026-02-21T09:19:14.1869376Z add.s64 %rd495, %rd584, 32; 2026-02-21T09:19:14.1869438Z cvt.s64.s32 %rd585, %r11176; 2026-02-21T09:19:14.1869501Z or.b64 %rd586, %rd585, %rd1113; 2026-02-21T09:19:14.1869562Z shl.b64 %rd587, %rd586, 1; 2026-02-21T09:19:14.1869690Z add.s64 %rd588, %rd64, %rd587; 2026-02-21T09:19:14.1869755Z add.s64 %rd496, %rd588, 32; 2026-02-21T09:19:14.1869818Z cvt.s64.s32 %rd589, %r11177; 2026-02-21T09:19:14.1869880Z or.b64 %rd590, %rd589, %rd1113; 2026-02-21T09:19:14.1869943Z shl.b64 %rd591, %rd590, 1; 2026-02-21T09:19:14.1870010Z add.s64 %rd592, %rd64, %rd591; 2026-02-21T09:19:14.1870070Z add.s64 %rd497, %rd592, 32; 2026-02-21T09:19:14.1870132Z cvt.s64.s32 %rd593, %r11178; 2026-02-21T09:19:14.1870195Z or.b64 %rd594, %rd593, %rd1113; 2026-02-21T09:19:14.1870260Z shl.b64 %rd595, %rd594, 1; 2026-02-21T09:19:14.1870322Z add.s64 %rd596, %rd64, %rd595; 2026-02-21T09:19:14.1870384Z add.s64 %rd498, %rd596, 32; 2026-02-21T09:19:14.1870449Z cvt.s64.s32 %rd597, %r11179; 2026-02-21T09:19:14.1870511Z or.b64 %rd598, %rd597, %rd1113; 2026-02-21T09:19:14.1870571Z shl.b64 %rd599, %rd598, 1; 2026-02-21T09:19:14.1870639Z add.s64 %rd600, %rd64, %rd599; 2026-02-21T09:19:14.1870700Z add.s64 %rd499, %rd600, 32; 2026-02-21T09:19:14.1870762Z cvt.s64.s32 %rd601, %r11180; 2026-02-21T09:19:14.1870824Z or.b64 %rd602, %rd601, %rd1113; 
2026-02-21T09:19:14.1870889Z shl.b64 %rd603, %rd602, 1; 2026-02-21T09:19:14.1870951Z add.s64 %rd604, %rd64, %rd603; 2026-02-21T09:19:14.1871015Z add.s64 %rd500, %rd604, 32; 2026-02-21T09:19:14.1871089Z cvt.s64.s32 %rd605, %r11181; 2026-02-21T09:19:14.1871154Z or.b64 %rd606, %rd605, %rd1113; 2026-02-21T09:19:14.1871219Z shl.b64 %rd607, %rd606, 1; 2026-02-21T09:19:14.1871283Z add.s64 %rd608, %rd64, %rd607; 2026-02-21T09:19:14.1871351Z add.s64 %rd501, %rd608, 32; 2026-02-21T09:19:14.1871553Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1871616Z // begin inline asm 2026-02-21T09:19:14.1871761Z cp.async.ca.shared.global [ %r67 + 0 ], [ %rd494 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1871831Z // end inline asm 2026-02-21T09:19:14.1871893Z // begin inline asm 2026-02-21T09:19:14.1872039Z cp.async.ca.shared.global [ %r68 + 0 ], [ %rd495 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1872099Z // end inline asm 2026-02-21T09:19:14.1872161Z // begin inline asm 2026-02-21T09:19:14.1872294Z cp.async.ca.shared.global [ %r69 + 0 ], [ %rd496 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1872360Z // end inline asm 2026-02-21T09:19:14.1872420Z // begin inline asm 2026-02-21T09:19:14.1872550Z cp.async.ca.shared.global [ %r70 + 0 ], [ %rd497 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1872675Z // end inline asm 2026-02-21T09:19:14.1872737Z // begin inline asm 2026-02-21T09:19:14.1872868Z cp.async.ca.shared.global [ %r71 + 0 ], [ %rd498 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1872928Z // end inline asm 2026-02-21T09:19:14.1873042Z // begin inline asm 2026-02-21T09:19:14.1873173Z cp.async.ca.shared.global [ %r72 + 0 ], [ %rd499 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1873232Z // end inline asm 2026-02-21T09:19:14.1873298Z // begin inline asm 2026-02-21T09:19:14.1873428Z cp.async.ca.shared.global [ %r73 + 0 ], [ %rd500 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1873488Z // end inline asm 2026-02-21T09:19:14.1873549Z // begin inline asm 
2026-02-21T09:19:14.1873684Z cp.async.ca.shared.global [ %r74 + 0 ], [ %rd501 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1873791Z // end inline asm 2026-02-21T09:19:14.1873864Z cp.async.commit_group; 2026-02-21T09:19:14.1874086Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1874155Z add.s32 %r11191, %r11165, %r75; 2026-02-21T09:19:14.1874358Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1874432Z cvt.s64.s32 %rd609, %r11191; 2026-02-21T09:19:14.1874500Z add.s64 %rd502, %rd65, %rd609; 2026-02-21T09:19:14.1874701Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1874767Z // begin inline asm 2026-02-21T09:19:14.1874953Z cp.async.ca.shared.global [ %r76 + 0 ], [ %rd502 + 0 ], 0x4, %r23512; 2026-02-21T09:19:14.1875014Z // end inline asm 2026-02-21T09:19:14.1875094Z cp.async.commit_group; 2026-02-21T09:19:14.1875304Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1875370Z add.s64 %rd503, %rd580, 64; 2026-02-21T09:19:14.1875432Z add.s64 %rd504, %rd584, 64; 2026-02-21T09:19:14.1875499Z add.s64 %rd505, %rd588, 64; 2026-02-21T09:19:14.1875563Z add.s64 %rd506, %rd592, 64; 2026-02-21T09:19:14.1875625Z add.s64 %rd507, %rd596, 64; 2026-02-21T09:19:14.1875686Z add.s64 %rd508, %rd600, 64; 2026-02-21T09:19:14.1875752Z add.s64 %rd509, %rd604, 64; 2026-02-21T09:19:14.1875814Z add.s64 %rd510, %rd608, 64; 2026-02-21T09:19:14.1876014Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1876077Z bar.sync 0; 2026-02-21T09:19:14.1876138Z // begin inline asm 2026-02-21T09:19:14.1876275Z cp.async.ca.shared.global [ %r77 + 0 ], [ %rd503 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1876341Z // end inline asm 2026-02-21T09:19:14.1876400Z // begin inline asm 2026-02-21T09:19:14.1876646Z cp.async.ca.shared.global [ %r78 + 0 ], [ %rd504 + 0 ], 0x8, %r10748; 
2026-02-21T09:19:14.1876709Z // end inline asm 2026-02-21T09:19:14.1876777Z // begin inline asm 2026-02-21T09:19:14.1876908Z cp.async.ca.shared.global [ %r79 + 0 ], [ %rd505 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1876980Z // end inline asm 2026-02-21T09:19:14.1877047Z // begin inline asm 2026-02-21T09:19:14.1877179Z cp.async.ca.shared.global [ %r80 + 0 ], [ %rd506 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1877236Z // end inline asm 2026-02-21T09:19:14.1877297Z // begin inline asm 2026-02-21T09:19:14.1877434Z cp.async.ca.shared.global [ %r81 + 0 ], [ %rd507 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1877493Z // end inline asm 2026-02-21T09:19:14.1877555Z // begin inline asm 2026-02-21T09:19:14.1877691Z cp.async.ca.shared.global [ %r82 + 0 ], [ %rd508 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1877751Z // end inline asm 2026-02-21T09:19:14.1877812Z // begin inline asm 2026-02-21T09:19:14.1877948Z cp.async.ca.shared.global [ %r83 + 0 ], [ %rd509 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1878010Z // end inline asm 2026-02-21T09:19:14.1878071Z // begin inline asm 2026-02-21T09:19:14.1878201Z cp.async.ca.shared.global [ %r84 + 0 ], [ %rd510 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1878264Z // end inline asm 2026-02-21T09:19:14.1878414Z cp.async.commit_group; 2026-02-21T09:19:14.1878615Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1878684Z add.s32 %r11192, %r11165, %r85; 2026-02-21T09:19:14.1878947Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1879025Z cvt.s64.s32 %rd610, %r11192; 2026-02-21T09:19:14.1879099Z add.s64 %rd511, %rd65, %rd610; 2026-02-21T09:19:14.1879299Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1879360Z // begin inline asm 2026-02-21T09:19:14.1879493Z cp.async.ca.shared.global [ %r86 + 0 ], [ %rd511 + 0 ], 0x4, %r23512; 2026-02-21T09:19:14.1879623Z // end inline asm 2026-02-21T09:19:14.1879693Z 
cp.async.commit_group; 2026-02-21T09:19:14.1879891Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1879961Z add.s64 %rd512, %rd580, 96; 2026-02-21T09:19:14.1880024Z add.s64 %rd513, %rd584, 96; 2026-02-21T09:19:14.1880088Z add.s64 %rd514, %rd588, 96; 2026-02-21T09:19:14.1880150Z add.s64 %rd515, %rd592, 96; 2026-02-21T09:19:14.1880217Z add.s64 %rd516, %rd596, 96; 2026-02-21T09:19:14.1880280Z add.s64 %rd517, %rd600, 96; 2026-02-21T09:19:14.1880342Z add.s64 %rd518, %rd604, 96; 2026-02-21T09:19:14.1880409Z add.s64 %rd519, %rd608, 96; 2026-02-21T09:19:14.1880608Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1880734Z // begin inline asm 2026-02-21T09:19:14.1880875Z cp.async.ca.shared.global [ %r87 + 0 ], [ %rd512 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1880935Z // end inline asm 2026-02-21T09:19:14.1881000Z // begin inline asm 2026-02-21T09:19:14.1881131Z cp.async.ca.shared.global [ %r88 + 0 ], [ %rd513 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1881195Z // end inline asm 2026-02-21T09:19:14.1881258Z // begin inline asm 2026-02-21T09:19:14.1881390Z cp.async.ca.shared.global [ %r89 + 0 ], [ %rd514 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1881454Z // end inline asm 2026-02-21T09:19:14.1881516Z // begin inline asm 2026-02-21T09:19:14.1881647Z cp.async.ca.shared.global [ %r90 + 0 ], [ %rd515 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1881708Z // end inline asm 2026-02-21T09:19:14.1881773Z // begin inline asm 2026-02-21T09:19:14.1881903Z cp.async.ca.shared.global [ %r91 + 0 ], [ %rd516 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1881961Z // end inline asm 2026-02-21T09:19:14.1882027Z // begin inline asm 2026-02-21T09:19:14.1882156Z cp.async.ca.shared.global [ %r92 + 0 ], [ %rd517 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1882215Z // end inline asm 2026-02-21T09:19:14.1882282Z // begin inline asm 2026-02-21T09:19:14.1882413Z cp.async.ca.shared.global [ %r93 + 0 ], [ %rd518 + 0 ], 
0x8, %r10748; 2026-02-21T09:19:14.1882486Z // end inline asm 2026-02-21T09:19:14.1882551Z // begin inline asm 2026-02-21T09:19:14.1882692Z cp.async.ca.shared.global [ %r94 + 0 ], [ %rd519 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1882751Z // end inline asm 2026-02-21T09:19:14.1882820Z cp.async.commit_group; 2026-02-21T09:19:14.1883025Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1883091Z add.s32 %r11193, %r11165, %r95; 2026-02-21T09:19:14.1883290Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1883363Z cvt.s64.s32 %rd611, %r11193; 2026-02-21T09:19:14.1883431Z add.s64 %rd520, %rd65, %rd611; 2026-02-21T09:19:14.1883628Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1883689Z // begin inline asm 2026-02-21T09:19:14.1883828Z cp.async.ca.shared.global [ %r96 + 0 ], [ %rd520 + 0 ], 0x4, %r23512; 2026-02-21T09:19:14.1883887Z // end inline asm 2026-02-21T09:19:14.1883953Z cp.async.commit_group; 2026-02-21T09:19:14.1884221Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1884288Z add.s64 %rd521, %rd580, 128; 2026-02-21T09:19:14.1884351Z add.s64 %rd522, %rd584, 128; 2026-02-21T09:19:14.1884464Z add.s64 %rd523, %rd588, 128; 2026-02-21T09:19:14.1884527Z add.s64 %rd524, %rd592, 128; 2026-02-21T09:19:14.1884589Z add.s64 %rd525, %rd596, 128; 2026-02-21T09:19:14.1884649Z add.s64 %rd526, %rd600, 128; 2026-02-21T09:19:14.1884716Z add.s64 %rd527, %rd604, 128; 2026-02-21T09:19:14.1884777Z add.s64 %rd528, %rd608, 128; 2026-02-21T09:19:14.1884978Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1885040Z bar.sync 0; 2026-02-21T09:19:14.1885148Z // begin inline asm 2026-02-21T09:19:14.1885283Z cp.async.ca.shared.global [ %r97 + 0 ], [ %rd521 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1885342Z // end inline asm 2026-02-21T09:19:14.1885407Z 
// begin inline asm 2026-02-21T09:19:14.1885543Z cp.async.ca.shared.global [ %r98 + 0 ], [ %rd522 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1885600Z // end inline asm 2026-02-21T09:19:14.1885664Z // begin inline asm 2026-02-21T09:19:14.1885792Z cp.async.ca.shared.global [ %r99 + 0 ], [ %rd523 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1885852Z // end inline asm 2026-02-21T09:19:14.1885915Z // begin inline asm 2026-02-21T09:19:14.1886055Z cp.async.ca.shared.global [ %r100 + 0 ], [ %rd524 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1886113Z // end inline asm 2026-02-21T09:19:14.1886172Z // begin inline asm 2026-02-21T09:19:14.1886377Z cp.async.ca.shared.global [ %r101 + 0 ], [ %rd525 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1886438Z // end inline asm 2026-02-21T09:19:14.1886614Z // begin inline asm 2026-02-21T09:19:14.1886758Z cp.async.ca.shared.global [ %r102 + 0 ], [ %rd526 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1886817Z // end inline asm 2026-02-21T09:19:14.1886876Z // begin inline asm 2026-02-21T09:19:14.1887009Z cp.async.ca.shared.global [ %r103 + 0 ], [ %rd527 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1887072Z // end inline asm 2026-02-21T09:19:14.1887131Z // begin inline asm 2026-02-21T09:19:14.1887266Z cp.async.ca.shared.global [ %r104 + 0 ], [ %rd528 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1887332Z // end inline asm 2026-02-21T09:19:14.1887398Z cp.async.commit_group; 2026-02-21T09:19:14.1887599Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1887672Z add.s32 %r11194, %r11165, %r105; 2026-02-21T09:19:14.1887873Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1887937Z cvt.s64.s32 %rd612, %r11194; 2026-02-21T09:19:14.1888002Z add.s64 %rd529, %rd65, %rd612; 2026-02-21T09:19:14.1888206Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1888271Z // begin inline asm 2026-02-21T09:19:14.1888406Z cp.async.ca.shared.global [ 
%r106 + 0 ], [ %rd529 + 0 ], 0x4, %r23512; 2026-02-21T09:19:14.1888472Z // end inline asm 2026-02-21T09:19:14.1888541Z cp.async.commit_group; 2026-02-21T09:19:14.1888739Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1888811Z add.s64 %rd530, %rd580, 160; 2026-02-21T09:19:14.1888874Z add.s64 %rd531, %rd584, 160; 2026-02-21T09:19:14.1888937Z add.s64 %rd532, %rd588, 160; 2026-02-21T09:19:14.1888999Z add.s64 %rd533, %rd592, 160; 2026-02-21T09:19:14.1889069Z add.s64 %rd534, %rd596, 160; 2026-02-21T09:19:14.1889131Z add.s64 %rd535, %rd600, 160; 2026-02-21T09:19:14.1889194Z add.s64 %rd536, %rd604, 160; 2026-02-21T09:19:14.1889261Z add.s64 %rd537, %rd608, 160; 2026-02-21T09:19:14.1889461Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1889524Z // begin inline asm 2026-02-21T09:19:14.1889657Z cp.async.ca.shared.global [ %r107 + 0 ], [ %rd530 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1889814Z // end inline asm 2026-02-21T09:19:14.1889875Z // begin inline asm 2026-02-21T09:19:14.1890007Z cp.async.ca.shared.global [ %r108 + 0 ], [ %rd531 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1890134Z // end inline asm 2026-02-21T09:19:14.1890195Z // begin inline asm 2026-02-21T09:19:14.1890328Z cp.async.ca.shared.global [ %r109 + 0 ], [ %rd532 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1890397Z // end inline asm 2026-02-21T09:19:14.1890457Z // begin inline asm 2026-02-21T09:19:14.1890591Z cp.async.ca.shared.global [ %r110 + 0 ], [ %rd533 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1890651Z // end inline asm 2026-02-21T09:19:14.1890717Z // begin inline asm 2026-02-21T09:19:14.1890913Z cp.async.ca.shared.global [ %r111 + 0 ], [ %rd534 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1890974Z // end inline asm 2026-02-21T09:19:14.1891040Z // begin inline asm 2026-02-21T09:19:14.1891172Z cp.async.ca.shared.global [ %r112 + 0 ], [ %rd535 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1891234Z // end inline asm 
2026-02-21T09:19:14.1891294Z // begin inline asm 2026-02-21T09:19:14.1891432Z cp.async.ca.shared.global [ %r113 + 0 ], [ %rd536 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1891492Z // end inline asm 2026-02-21T09:19:14.1891558Z // begin inline asm 2026-02-21T09:19:14.1891698Z cp.async.ca.shared.global [ %r114 + 0 ], [ %rd537 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1891759Z // end inline asm 2026-02-21T09:19:14.1891825Z cp.async.commit_group; 2026-02-21T09:19:14.1892105Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1892178Z add.s32 %r11195, %r11165, %r115; 2026-02-21T09:19:14.1892382Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1892448Z cvt.s64.s32 %rd613, %r11195; 2026-02-21T09:19:14.1892522Z add.s64 %rd538, %rd65, %rd613; 2026-02-21T09:19:14.1892720Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1892786Z // begin inline asm 2026-02-21T09:19:14.1892925Z cp.async.ca.shared.global [ %r116 + 0 ], [ %rd538 + 0 ], 0x4, %r23512; 2026-02-21T09:19:14.1892984Z // end inline asm 2026-02-21T09:19:14.1893054Z cp.async.commit_group; 2026-02-21T09:19:14.1893258Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1893321Z add.s64 %rd539, %rd580, 192; 2026-02-21T09:19:14.1893384Z add.s64 %rd540, %rd584, 192; 2026-02-21T09:19:14.1893449Z add.s64 %rd541, %rd588, 192; 2026-02-21T09:19:14.1893517Z add.s64 %rd542, %rd592, 192; 2026-02-21T09:19:14.1893578Z add.s64 %rd543, %rd596, 192; 2026-02-21T09:19:14.1893640Z add.s64 %rd544, %rd600, 192; 2026-02-21T09:19:14.1893707Z add.s64 %rd545, %rd604, 192; 2026-02-21T09:19:14.1893768Z add.s64 %rd546, %rd608, 192; 2026-02-21T09:19:14.1893967Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1894032Z bar.sync 0; 2026-02-21T09:19:14.1894094Z // begin inline asm 2026-02-21T09:19:14.1894228Z 
cp.async.ca.shared.global [ %r117 + 0 ], [ %rd539 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1894287Z // end inline asm 2026-02-21T09:19:14.1894351Z // begin inline asm 2026-02-21T09:19:14.1894483Z cp.async.ca.shared.global [ %r118 + 0 ], [ %rd540 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1894541Z // end inline asm 2026-02-21T09:19:14.1894604Z // begin inline asm 2026-02-21T09:19:14.1894739Z cp.async.ca.shared.global [ %r119 + 0 ], [ %rd541 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1894797Z // end inline asm 2026-02-21T09:19:14.1894857Z // begin inline asm 2026-02-21T09:19:14.1894993Z cp.async.ca.shared.global [ %r120 + 0 ], [ %rd542 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1895051Z // end inline asm 2026-02-21T09:19:14.1895110Z // begin inline asm 2026-02-21T09:19:14.1895245Z cp.async.ca.shared.global [ %r121 + 0 ], [ %rd543 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1895366Z // end inline asm 2026-02-21T09:19:14.1895425Z // begin inline asm 2026-02-21T09:19:14.1895557Z cp.async.ca.shared.global [ %r122 + 0 ], [ %rd544 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1895620Z // end inline asm 2026-02-21T09:19:14.1895727Z // begin inline asm 2026-02-21T09:19:14.1895859Z cp.async.ca.shared.global [ %r123 + 0 ], [ %rd545 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1895921Z // end inline asm 2026-02-21T09:19:14.1895982Z // begin inline asm 2026-02-21T09:19:14.1896114Z cp.async.ca.shared.global [ %r124 + 0 ], [ %rd546 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1896178Z // end inline asm 2026-02-21T09:19:14.1896244Z cp.async.commit_group; 2026-02-21T09:19:14.1896641Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1896720Z add.s32 %r11196, %r11165, %r125; 2026-02-21T09:19:14.1896941Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1897012Z cvt.s64.s32 %rd614, %r11196; 2026-02-21T09:19:14.1897078Z add.s64 %rd547, %rd65, %rd614; 2026-02-21T09:19:14.1897287Z .loc 1 57 87 // 
co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1897352Z // begin inline asm 2026-02-21T09:19:14.1897494Z cp.async.ca.shared.global [ %r126 + 0 ], [ %rd547 + 0 ], 0x4, %r23512; 2026-02-21T09:19:14.1897560Z // end inline asm 2026-02-21T09:19:14.1897628Z cp.async.commit_group; 2026-02-21T09:19:14.1897898Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1897965Z add.s64 %rd548, %rd580, 224; 2026-02-21T09:19:14.1898033Z add.s64 %rd549, %rd584, 224; 2026-02-21T09:19:14.1898100Z add.s64 %rd550, %rd588, 224; 2026-02-21T09:19:14.1898163Z add.s64 %rd551, %rd592, 224; 2026-02-21T09:19:14.1898229Z add.s64 %rd552, %rd596, 224; 2026-02-21T09:19:14.1898293Z add.s64 %rd553, %rd600, 224; 2026-02-21T09:19:14.1898357Z add.s64 %rd554, %rd604, 224; 2026-02-21T09:19:14.1898424Z add.s64 %rd555, %rd608, 224; 2026-02-21T09:19:14.1898624Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1898687Z // begin inline asm 2026-02-21T09:19:14.1898823Z cp.async.ca.shared.global [ %r127 + 0 ], [ %rd548 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1898886Z // end inline asm 2026-02-21T09:19:14.1898947Z // begin inline asm 2026-02-21T09:19:14.1899080Z cp.async.ca.shared.global [ %r128 + 0 ], [ %rd549 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1899145Z // end inline asm 2026-02-21T09:19:14.1899206Z // begin inline asm 2026-02-21T09:19:14.1899339Z cp.async.ca.shared.global [ %r129 + 0 ], [ %rd550 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1899398Z // end inline asm 2026-02-21T09:19:14.1899463Z // begin inline asm 2026-02-21T09:19:14.1899595Z cp.async.ca.shared.global [ %r130 + 0 ], [ %rd551 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1899654Z // end inline asm 2026-02-21T09:19:14.1899719Z // begin inline asm 2026-02-21T09:19:14.1899850Z cp.async.ca.shared.global [ %r131 + 0 ], [ %rd552 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1899908Z // end inline asm 2026-02-21T09:19:14.1899972Z // 
begin inline asm 2026-02-21T09:19:14.1900105Z cp.async.ca.shared.global [ %r132 + 0 ], [ %rd553 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1900163Z // end inline asm 2026-02-21T09:19:14.1900224Z // begin inline asm 2026-02-21T09:19:14.1900364Z cp.async.ca.shared.global [ %r133 + 0 ], [ %rd554 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1900425Z // end inline asm 2026-02-21T09:19:14.1900498Z // begin inline asm 2026-02-21T09:19:14.1900638Z cp.async.ca.shared.global [ %r134 + 0 ], [ %rd555 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1900698Z // end inline asm 2026-02-21T09:19:14.1900766Z cp.async.commit_group; 2026-02-21T09:19:14.1900969Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1901115Z add.s32 %r11197, %r11165, %r135; 2026-02-21T09:19:14.1901314Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1901379Z cvt.s64.s32 %rd615, %r11197; 2026-02-21T09:19:14.1901451Z add.s64 %rd556, %rd65, %rd615; 2026-02-21T09:19:14.1901712Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1901772Z // begin inline asm 2026-02-21T09:19:14.1901910Z cp.async.ca.shared.global [ %r136 + 0 ], [ %rd556 + 0 ], 0x4, %r23512; 2026-02-21T09:19:14.1901971Z // end inline asm 2026-02-21T09:19:14.1902037Z cp.async.commit_group; 2026-02-21T09:19:14.1902235Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1902358Z add.s64 %rd557, %rd580, 256; 2026-02-21T09:19:14.1902425Z add.s64 %rd558, %rd584, 256; 2026-02-21T09:19:14.1902487Z add.s64 %rd559, %rd588, 256; 2026-02-21T09:19:14.1902556Z add.s64 %rd560, %rd592, 256; 2026-02-21T09:19:14.1902619Z add.s64 %rd561, %rd596, 256; 2026-02-21T09:19:14.1902683Z add.s64 %rd562, %rd600, 256; 2026-02-21T09:19:14.1902751Z add.s64 %rd563, %rd604, 256; 2026-02-21T09:19:14.1902812Z add.s64 %rd564, %rd608, 256; 2026-02-21T09:19:14.1903011Z .loc 1 51 80 // 
co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1903068Z bar.sync 0; 2026-02-21T09:19:14.1903133Z // begin inline asm 2026-02-21T09:19:14.1903266Z cp.async.ca.shared.global [ %r137 + 0 ], [ %rd557 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1903381Z // end inline asm 2026-02-21T09:19:14.1903451Z // begin inline asm 2026-02-21T09:19:14.1903587Z cp.async.ca.shared.global [ %r138 + 0 ], [ %rd558 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1903645Z // end inline asm 2026-02-21T09:19:14.1903706Z // begin inline asm 2026-02-21T09:19:14.1903844Z cp.async.ca.shared.global [ %r139 + 0 ], [ %rd559 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1903904Z // end inline asm 2026-02-21T09:19:14.1903963Z // begin inline asm 2026-02-21T09:19:14.1904098Z cp.async.ca.shared.global [ %r140 + 0 ], [ %rd560 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1904157Z // end inline asm 2026-02-21T09:19:14.1904216Z // begin inline asm 2026-02-21T09:19:14.1904354Z cp.async.ca.shared.global [ %r141 + 0 ], [ %rd561 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1904411Z // end inline asm 2026-02-21T09:19:14.1904473Z // begin inline asm 2026-02-21T09:19:14.1904603Z cp.async.ca.shared.global [ %r142 + 0 ], [ %rd562 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1904666Z // end inline asm 2026-02-21T09:19:14.1904726Z // begin inline asm 2026-02-21T09:19:14.1904855Z cp.async.ca.shared.global [ %r143 + 0 ], [ %rd563 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1904920Z // end inline asm 2026-02-21T09:19:14.1904980Z // begin inline asm 2026-02-21T09:19:14.1905111Z cp.async.ca.shared.global [ %r144 + 0 ], [ %rd564 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1905169Z // end inline asm 2026-02-21T09:19:14.1905255Z cp.async.commit_group; 2026-02-21T09:19:14.1905457Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1905521Z add.s32 %r11198, %r11165, %r145; 2026-02-21T09:19:14.1905722Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 
2026-02-21T09:19:14.1905788Z cvt.s64.s32 %rd616, %r11198; 2026-02-21T09:19:14.1905853Z add.s64 %rd565, %rd65, %rd616; 2026-02-21T09:19:14.1906059Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1906120Z // begin inline asm 2026-02-21T09:19:14.1906254Z cp.async.ca.shared.global [ %r146 + 0 ], [ %rd565 + 0 ], 0x4, %r23512; 2026-02-21T09:19:14.1906311Z // end inline asm 2026-02-21T09:19:14.1906384Z cp.async.commit_group; 2026-02-21T09:19:14.1906697Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1906846Z add.s64 %rd566, %rd580, 288; 2026-02-21T09:19:14.1906913Z add.s64 %rd567, %rd584, 288; 2026-02-21T09:19:14.1906975Z add.s64 %rd568, %rd588, 288; 2026-02-21T09:19:14.1907036Z add.s64 %rd569, %rd592, 288; 2026-02-21T09:19:14.1907103Z add.s64 %rd570, %rd596, 288; 2026-02-21T09:19:14.1907248Z add.s64 %rd571, %rd600, 288; 2026-02-21T09:19:14.1907311Z add.s64 %rd572, %rd604, 288; 2026-02-21T09:19:14.1907376Z add.s64 %rd573, %rd608, 288; 2026-02-21T09:19:14.1907580Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1907644Z // begin inline asm 2026-02-21T09:19:14.1907780Z cp.async.ca.shared.global [ %r147 + 0 ], [ %rd566 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1907842Z // end inline asm 2026-02-21T09:19:14.1907987Z // begin inline asm 2026-02-21T09:19:14.1908122Z cp.async.ca.shared.global [ %r148 + 0 ], [ %rd567 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1908188Z // end inline asm 2026-02-21T09:19:14.1908249Z // begin inline asm 2026-02-21T09:19:14.1908381Z cp.async.ca.shared.global [ %r149 + 0 ], [ %rd568 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1908439Z // end inline asm 2026-02-21T09:19:14.1908582Z // begin inline asm 2026-02-21T09:19:14.1908718Z cp.async.ca.shared.global [ %r150 + 0 ], [ %rd569 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1908778Z // end inline asm 2026-02-21T09:19:14.1908844Z // begin inline asm 
2026-02-21T09:19:14.1908975Z cp.async.ca.shared.global [ %r151 + 0 ], [ %rd570 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1909032Z // end inline asm 2026-02-21T09:19:14.1909163Z // begin inline asm 2026-02-21T09:19:14.1909302Z cp.async.ca.shared.global [ %r152 + 0 ], [ %rd571 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1909358Z // end inline asm 2026-02-21T09:19:14.1909416Z // begin inline asm 2026-02-21T09:19:14.1909553Z cp.async.ca.shared.global [ %r153 + 0 ], [ %rd572 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1909610Z // end inline asm 2026-02-21T09:19:14.1909670Z // begin inline asm 2026-02-21T09:19:14.1909806Z cp.async.ca.shared.global [ %r154 + 0 ], [ %rd573 + 0 ], 0x8, %r10748; 2026-02-21T09:19:14.1909869Z // end inline asm 2026-02-21T09:19:14.1909935Z cp.async.commit_group; 2026-02-21T09:19:14.1910136Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.1910213Z add.s32 %r11199, %r11165, %r155; 2026-02-21T09:19:14.1910412Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.1910477Z cvt.s64.s32 %rd617, %r11199; 2026-02-21T09:19:14.1910551Z add.s64 %rd574, %rd65, %rd617; 2026-02-21T09:19:14.1910751Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1910813Z // begin inline asm 2026-02-21T09:19:14.1910954Z cp.async.ca.shared.global [ %r156 + 0 ], [ %rd574 + 0 ], 0x4, %r23512; 2026-02-21T09:19:14.1911018Z // end inline asm 2026-02-21T09:19:14.1911085Z cp.async.commit_group; 2026-02-21T09:19:14.1911285Z .loc 1 43 78 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:43:78 2026-02-21T09:19:14.1911355Z add.s32 %r11200, %r11160, %r325; 2026-02-21T09:19:14.1911423Z sub.s32 %r11201, %r11200, %r11162; 2026-02-21T09:19:14.1911488Z shl.b32 %r11202, %r11201, 7; 2026-02-21T09:19:14.1911556Z add.s32 %r23510, %r177, %r11202; 2026-02-21T09:19:14.1911623Z or.b32 %r11203, %r17, %r1376; 2026-02-21T09:19:14.1911687Z shl.b32 %r11204, 
%r11203, 10; 2026-02-21T09:19:14.1911757Z mul.wide.s32 %rd30, %r11204, 2; 2026-02-21T09:19:14.1911830Z or.b32 %r11205, %r16, %r1376; 2026-02-21T09:19:14.1911893Z shl.b32 %r11206, %r11205, 10; 2026-02-21T09:19:14.1911961Z mul.wide.s32 %rd31, %r11206, 2; 2026-02-21T09:19:14.1912028Z or.b32 %r11207, %r15, %r1376; 2026-02-21T09:19:14.1912093Z shl.b32 %r11208, %r11207, 10; 2026-02-21T09:19:14.1912159Z mul.wide.s32 %rd32, %r11208, 2; 2026-02-21T09:19:14.1912224Z or.b32 %r11209, %r14, %r1376; 2026-02-21T09:19:14.1912348Z shl.b32 %r11210, %r11209, 10; 2026-02-21T09:19:14.1912414Z mul.wide.s32 %rd33, %r11210, 2; 2026-02-21T09:19:14.1912478Z or.b32 %r11211, %r13, %r1376; 2026-02-21T09:19:14.1912544Z shl.b32 %r11212, %r11211, 10; 2026-02-21T09:19:14.1912608Z mul.wide.s32 %rd34, %r11212, 2; 2026-02-21T09:19:14.1912721Z or.b32 %r11213, %r12, %r1376; 2026-02-21T09:19:14.1912786Z shl.b32 %r11214, %r11213, 10; 2026-02-21T09:19:14.1912851Z mul.wide.s32 %rd35, %r11214, 2; 2026-02-21T09:19:14.1912911Z shl.b32 %r11215, %r11161, 19; 2026-02-21T09:19:14.1912975Z or.b32 %r11216, %r22984, %r11215; 2026-02-21T09:19:14.1913047Z mul.wide.s32 %rd36, %r11216, 2; 2026-02-21T09:19:14.1913111Z or.b32 %r23509, %r185, %r11215; 2026-02-21T09:19:14.1913176Z mov.b32 %r23513, 0f00000000; 2026-02-21T09:19:14.1913293Z mov.b32 %r23511, -1; 2026-02-21T09:19:14.1913357Z mov.b64 %rd1119, -16; 2026-02-21T09:19:14.1913420Z mov.b64 %rd1118, %rd3; 2026-02-21T09:19:14.1913483Z mov.b32 %r23514, %r23513; 2026-02-21T09:19:14.1913551Z mov.b32 %r23515, %r23513; 2026-02-21T09:19:14.1913611Z mov.b32 %r23516, %r23513; 2026-02-21T09:19:14.1913671Z mov.b32 %r23517, %r23513; 2026-02-21T09:19:14.1913736Z mov.b32 %r23518, %r23513; 2026-02-21T09:19:14.1913796Z mov.b32 %r23519, %r23513; 2026-02-21T09:19:14.1913857Z mov.b32 %r23520, %r23513; 2026-02-21T09:19:14.1913918Z mov.b32 %r23521, %r23513; 2026-02-21T09:19:14.1913983Z mov.b32 %r23522, %r23513; 2026-02-21T09:19:14.1914045Z mov.b32 %r23523, %r23513; 
2026-02-21T09:19:14.1914105Z mov.b32 %r23524, %r23513; 2026-02-21T09:19:14.1914170Z mov.b32 %r23525, %r23513; 2026-02-21T09:19:14.1914280Z mov.b32 %r23526, %r23513; 2026-02-21T09:19:14.1914343Z mov.b32 %r23527, %r23513; 2026-02-21T09:19:14.1914404Z mov.b32 %r23528, %r23513; 2026-02-21T09:19:14.1914470Z mov.b32 %r23529, %r23513; 2026-02-21T09:19:14.1914533Z mov.b32 %r23530, %r23513; 2026-02-21T09:19:14.1914593Z mov.b32 %r23531, %r23513; 2026-02-21T09:19:14.1914658Z mov.b32 %r23532, %r23513; 2026-02-21T09:19:14.1914720Z mov.b32 %r23533, %r23513; 2026-02-21T09:19:14.1914781Z mov.b32 %r23534, %r23513; 2026-02-21T09:19:14.1914847Z mov.b32 %r23535, %r23513; 2026-02-21T09:19:14.1914908Z mov.b32 %r23536, %r23513; 2026-02-21T09:19:14.1914969Z mov.b32 %r23537, %r23513; 2026-02-21T09:19:14.1915033Z mov.b32 %r23538, %r23513; 2026-02-21T09:19:14.1915098Z mov.b32 %r23539, %r23513; 2026-02-21T09:19:14.1915171Z mov.b32 %r23540, %r23513; 2026-02-21T09:19:14.1915233Z mov.b32 %r23541, %r23513; 2026-02-21T09:19:14.1915298Z mov.b32 %r23542, %r23513; 2026-02-21T09:19:14.1915360Z mov.b32 %r23543, %r23513; 2026-02-21T09:19:14.1915421Z mov.b32 %r23544, %r23513; 2026-02-21T09:19:14.1915481Z mov.b32 %r23545, %r23513; 2026-02-21T09:19:14.1915549Z mov.b32 %r23546, %r23513; 2026-02-21T09:19:14.1915611Z mov.b32 %r23547, %r23513; 2026-02-21T09:19:14.1915672Z mov.b32 %r23548, %r23513; 2026-02-21T09:19:14.1915737Z mov.b32 %r23549, %r23513; 2026-02-21T09:19:14.1915797Z mov.b32 %r23550, %r23513; 2026-02-21T09:19:14.1915860Z mov.b32 %r23551, %r23513; 2026-02-21T09:19:14.1915921Z mov.b32 %r23552, %r23513; 2026-02-21T09:19:14.1915988Z mov.b32 %r23553, %r23513; 2026-02-21T09:19:14.1916048Z mov.b32 %r23554, %r23513; 2026-02-21T09:19:14.1916108Z mov.b32 %r23555, %r23513; 2026-02-21T09:19:14.1916176Z mov.b32 %r23556, %r23513; 2026-02-21T09:19:14.1916236Z mov.b32 %r23557, %r23513; 2026-02-21T09:19:14.1916295Z mov.b32 %r23558, %r23513; 2026-02-21T09:19:14.1916355Z mov.b32 %r23559, %r23513; 
2026-02-21T09:19:14.1916422Z mov.b32 %r23560, %r23513; 2026-02-21T09:19:14.1916601Z mov.b32 %r23561, %r23513; 2026-02-21T09:19:14.1916669Z mov.b32 %r23562, %r23513; 2026-02-21T09:19:14.1916734Z mov.b32 %r23563, %r23513; 2026-02-21T09:19:14.1916795Z mov.b32 %r23564, %r23513; 2026-02-21T09:19:14.1916860Z mov.b32 %r23565, %r23513; 2026-02-21T09:19:14.1916921Z mov.b32 %r23566, %r23513; 2026-02-21T09:19:14.1916987Z mov.b32 %r23567, %r23513; 2026-02-21T09:19:14.1917048Z mov.b32 %r23568, %r23513; 2026-02-21T09:19:14.1917188Z mov.b32 %r23569, %r23513; 2026-02-21T09:19:14.1917255Z mov.b32 %r23570, %r23513; 2026-02-21T09:19:14.1917316Z mov.b32 %r23571, %r23513; 2026-02-21T09:19:14.1917377Z mov.b32 %r23572, %r23513; 2026-02-21T09:19:14.1917438Z mov.b32 %r23573, %r23513; 2026-02-21T09:19:14.1917567Z mov.b32 %r23574, %r23513; 2026-02-21T09:19:14.1917627Z mov.b32 %r23575, %r23513; 2026-02-21T09:19:14.1917687Z mov.b32 %r23576, %r23513; 2026-02-21T09:19:14.1917755Z mov.b32 %r23577, %r23513; 2026-02-21T09:19:14.1917817Z mov.b32 %r23578, %r23513; 2026-02-21T09:19:14.1917879Z mov.b32 %r23579, %r23513; 2026-02-21T09:19:14.1917948Z mov.b32 %r23580, %r23513; 2026-02-21T09:19:14.1918008Z mov.b32 %r23581, %r23513; 2026-02-21T09:19:14.1918068Z mov.b32 %r23582, %r23513; 2026-02-21T09:19:14.1918190Z mov.b32 %r23583, %r23513; 2026-02-21T09:19:14.1918260Z mov.b32 %r23584, %r23513; 2026-02-21T09:19:14.1918321Z mov.b32 %r23585, %r23513; 2026-02-21T09:19:14.1918392Z mov.b32 %r23586, %r23513; 2026-02-21T09:19:14.1918457Z mov.b32 %r23587, %r23513; 2026-02-21T09:19:14.1918517Z mov.b32 %r23588, %r23513; 2026-02-21T09:19:14.1918574Z mov.b32 %r23589, %r23513; 2026-02-21T09:19:14.1918635Z mov.b32 %r23590, %r23513; 2026-02-21T09:19:14.1918698Z mov.b32 %r23591, %r23513; 2026-02-21T09:19:14.1918758Z mov.b32 %r23592, %r23513; 2026-02-21T09:19:14.1918818Z mov.b32 %r23593, %r23513; 2026-02-21T09:19:14.1918884Z mov.b32 %r23594, %r23513; 2026-02-21T09:19:14.1918944Z mov.b32 %r23595, %r23513; 
2026-02-21T09:19:14.1919002Z mov.b32 %r23596, %r23513; 2026-02-21T09:19:14.1919128Z mov.b32 %r23597, %r23513; 2026-02-21T09:19:14.1919193Z mov.b32 %r23598, %r23513; 2026-02-21T09:19:14.1919253Z mov.b32 %r23599, %r23513; 2026-02-21T09:19:14.1919311Z mov.b32 %r23600, %r23513; 2026-02-21T09:19:14.1919374Z mov.b32 %r23601, %r23513; 2026-02-21T09:19:14.1919433Z mov.b32 %r23602, %r23513; 2026-02-21T09:19:14.1919491Z mov.b32 %r23603, %r23513; 2026-02-21T09:19:14.1919561Z mov.b32 %r23604, %r23513; 2026-02-21T09:19:14.1919632Z mov.b32 %r23605, %r23513; 2026-02-21T09:19:14.1919691Z mov.b32 %r23606, %r23513; 2026-02-21T09:19:14.1919750Z mov.b32 %r23607, %r23513; 2026-02-21T09:19:14.1919816Z mov.b32 %r23608, %r23513; 2026-02-21T09:19:14.1919874Z mov.b32 %r23609, %r23513; 2026-02-21T09:19:14.1919937Z mov.b32 %r23610, %r23513; 2026-02-21T09:19:14.1919996Z mov.b32 %r23611, %r23513; 2026-02-21T09:19:14.1920058Z mov.b32 %r23612, %r23513; 2026-02-21T09:19:14.1920117Z mov.b32 %r23613, %r23513; 2026-02-21T09:19:14.1920176Z mov.b32 %r23614, %r23513; 2026-02-21T09:19:14.1920241Z mov.b32 %r23615, %r23513; 2026-02-21T09:19:14.1920300Z mov.b32 %r23616, %r23513; 2026-02-21T09:19:14.1920360Z mov.b32 %r23617, %r23513; 2026-02-21T09:19:14.1920419Z mov.b32 %r23618, %r23513; 2026-02-21T09:19:14.1920484Z mov.b32 %r23619, %r23513; 2026-02-21T09:19:14.1920544Z mov.b32 %r23620, %r23513; 2026-02-21T09:19:14.1920605Z mov.b32 %r23621, %r23513; 2026-02-21T09:19:14.1920666Z mov.b32 %r23622, %r23513; 2026-02-21T09:19:14.1920728Z mov.b32 %r23623, %r23513; 2026-02-21T09:19:14.1920787Z mov.b32 %r23624, %r23513; 2026-02-21T09:19:14.1920848Z mov.b32 %r23625, %r23513; 2026-02-21T09:19:14.1920909Z mov.b32 %r23626, %r23513; 2026-02-21T09:19:14.1920968Z mov.b32 %r23627, %r23513; 2026-02-21T09:19:14.1921029Z mov.b32 %r23628, %r23513; 2026-02-21T09:19:14.1921091Z mov.b32 %r23629, %r23513; 2026-02-21T09:19:14.1921151Z mov.b32 %r23630, %r23513; 2026-02-21T09:19:14.1921209Z mov.b32 %r23631, %r23513; 
2026-02-21T09:19:14.1921272Z mov.b32 %r23632, %r23513; 2026-02-21T09:19:14.1921332Z mov.b32 %r23633, %r23513; 2026-02-21T09:19:14.1921391Z mov.b32 %r23634, %r23513; 2026-02-21T09:19:14.1921451Z mov.b32 %r23635, %r23513; 2026-02-21T09:19:14.1921515Z mov.b32 %r23636, %r23513; 2026-02-21T09:19:14.1921576Z mov.b32 %r23637, %r23513; 2026-02-21T09:19:14.1921637Z mov.b32 %r23638, %r23513; 2026-02-21T09:19:14.1921701Z mov.b32 %r23639, %r23513; 2026-02-21T09:19:14.1921760Z mov.b32 %r23640, %r23513; 2026-02-21T09:19:14.1921882Z mov.b32 %r23641, %r23513; 2026-02-21T09:19:14.1921941Z mov.b32 %r23642, %r23513; 2026-02-21T09:19:14.1922008Z mov.b32 %r23643, %r23513; 2026-02-21T09:19:14.1922070Z mov.b32 %r23644, %r23513; 2026-02-21T09:19:14.1922130Z mov.b32 %r23645, %r23513; 2026-02-21T09:19:14.1922246Z mov.b32 %r23646, %r23513; 2026-02-21T09:19:14.1922306Z mov.b32 %r23647, %r23513; 2026-02-21T09:19:14.1922364Z mov.b32 %r23648, %r23513; 2026-02-21T09:19:14.1922424Z mov.b32 %r23649, %r23513; 2026-02-21T09:19:14.1922490Z mov.b32 %r23650, %r23513; 2026-02-21T09:19:14.1922552Z mov.b32 %r23651, %r23513; 2026-02-21T09:19:14.1922612Z mov.b32 %r23652, %r23513; 2026-02-21T09:19:14.1922677Z mov.b32 %r23653, %r23513; 2026-02-21T09:19:14.1922738Z mov.b32 %r23654, %r23513; 2026-02-21T09:19:14.1922847Z mov.b32 %r23655, %r23513; 2026-02-21T09:19:14.1922911Z mov.b32 %r23656, %r23513; 2026-02-21T09:19:14.1922977Z mov.b32 %r23657, %r23513; 2026-02-21T09:19:14.1923038Z mov.b32 %r23658, %r23513; 2026-02-21T09:19:14.1923100Z mov.b32 %r23659, %r23513; 2026-02-21T09:19:14.1923164Z mov.b32 %r23660, %r23513; 2026-02-21T09:19:14.1923226Z mov.b32 %r23661, %r23513; 2026-02-21T09:19:14.1923287Z mov.b32 %r23662, %r23513; 2026-02-21T09:19:14.1923347Z mov.b32 %r23663, %r23513; 2026-02-21T09:19:14.1923423Z mov.b32 %r23664, %r23513; 2026-02-21T09:19:14.1923488Z mov.b32 %r23665, %r23513; 2026-02-21T09:19:14.1923551Z mov.b32 %r23666, %r23513; 2026-02-21T09:19:14.1923617Z mov.b32 %r23667, %r23513; 
2026-02-21T09:19:14.1923678Z mov.b32 %r23668, %r23513; 2026-02-21T09:19:14.1923791Z mov.b32 %r23669, %r23513; 2026-02-21T09:19:14.1923860Z mov.b32 %r23670, %r23513; 2026-02-21T09:19:14.1923920Z mov.b32 %r23671, %r23513; 2026-02-21T09:19:14.1923981Z mov.b32 %r23672, %r23513; 2026-02-21T09:19:14.1924042Z mov.b32 %r23673, %r23513; 2026-02-21T09:19:14.1924105Z mov.b32 %r23674, %r23513; 2026-02-21T09:19:14.1924165Z mov.b32 %r23675, %r23513; 2026-02-21T09:19:14.1924223Z mov.b32 %r23676, %r23513; 2026-02-21T09:19:14.1924290Z mov.b32 %r23677, %r23513; 2026-02-21T09:19:14.1924349Z mov.b32 %r23678, %r23513; 2026-02-21T09:19:14.1924408Z mov.b32 %r23679, %r23513; 2026-02-21T09:19:14.1924466Z mov.b32 %r23680, %r23513; 2026-02-21T09:19:14.1924531Z mov.b32 %r23681, %r23513; 2026-02-21T09:19:14.1924593Z mov.b32 %r23682, %r23513; 2026-02-21T09:19:14.1924653Z mov.b32 %r23683, %r23513; 2026-02-21T09:19:14.1924715Z mov.b32 %r23684, %r23513; 2026-02-21T09:19:14.1924774Z mov.b32 %r23685, %r23513; 2026-02-21T09:19:14.1924833Z mov.b32 %r23686, %r23513; 2026-02-21T09:19:14.1924893Z mov.b32 %r23687, %r23513; 2026-02-21T09:19:14.1924960Z mov.b32 %r23688, %r23513; 2026-02-21T09:19:14.1925020Z mov.b32 %r23689, %r23513; 2026-02-21T09:19:14.1925082Z mov.b32 %r23690, %r23513; 2026-02-21T09:19:14.1925148Z mov.b32 %r23691, %r23513; 2026-02-21T09:19:14.1925211Z mov.b32 %r23692, %r23513; 2026-02-21T09:19:14.1925274Z mov.b32 %r23693, %r23513; 2026-02-21T09:19:14.1925335Z mov.b32 %r23694, %r23513; 2026-02-21T09:19:14.1925402Z mov.b32 %r23695, %r23513; 2026-02-21T09:19:14.1925462Z mov.b32 %r23696, %r23513; 2026-02-21T09:19:14.1925534Z mov.b32 %r23697, %r23513; 2026-02-21T09:19:14.1925600Z mov.b32 %r23698, %r23513; 2026-02-21T09:19:14.1925661Z mov.b32 %r23699, %r23513; 2026-02-21T09:19:14.1925723Z mov.b32 %r23700, %r23513; 2026-02-21T09:19:14.1925782Z mov.b32 %r23701, %r23513; 2026-02-21T09:19:14.1925846Z mov.b32 %r23702, %r23513; 2026-02-21T09:19:14.1925905Z mov.b32 %r23703, %r23513; 
2026-02-21T09:19:14.1925967Z mov.b32 %r23704, %r23513; 2026-02-21T09:19:14.1926034Z mov.b32 %r23705, %r23513; 2026-02-21T09:19:14.1926093Z mov.b32 %r23706, %r23513; 2026-02-21T09:19:14.1926153Z mov.b32 %r23707, %r23513; 2026-02-21T09:19:14.1926213Z mov.b32 %r23708, %r23513; 2026-02-21T09:19:14.1926278Z mov.b32 %r23709, %r23513; 2026-02-21T09:19:14.1926339Z mov.b32 %r23710, %r23513; 2026-02-21T09:19:14.1926399Z mov.b32 %r23711, %r23513; 2026-02-21T09:19:14.1926575Z mov.b32 %r23712, %r23513; 2026-02-21T09:19:14.1926735Z mov.b32 %r23713, %r23513; 2026-02-21T09:19:14.1926794Z mov.b32 %r23714, %r23513; 2026-02-21T09:19:14.1926858Z mov.b32 %r23715, %r23513; 2026-02-21T09:19:14.1926918Z mov.b32 %r23716, %r23513; 2026-02-21T09:19:14.1926979Z mov.b32 %r23717, %r23513; 2026-02-21T09:19:14.1927119Z mov.b32 %r23718, %r23513; 2026-02-21T09:19:14.1927185Z mov.b32 %r23719, %r23513; 2026-02-21T09:19:14.1927244Z mov.b32 %r23720, %r23513; 2026-02-21T09:19:14.1927304Z mov.b32 %r23721, %r23513; 2026-02-21T09:19:14.1927369Z mov.b32 %r23722, %r23513; 2026-02-21T09:19:14.1927430Z mov.b32 %r23723, %r23513; 2026-02-21T09:19:14.1927490Z mov.b32 %r23724, %r23513; 2026-02-21T09:19:14.1927550Z mov.b32 %r23725, %r23513; 2026-02-21T09:19:14.1927614Z mov.b32 %r23726, %r23513; 2026-02-21T09:19:14.1927737Z mov.b32 %r23727, %r23513; 2026-02-21T09:19:14.1927802Z mov.b32 %r23728, %r23513; 2026-02-21T09:19:14.1927869Z mov.b32 %r23729, %r23513; 2026-02-21T09:19:14.1927928Z mov.b32 %r23730, %r23513; 2026-02-21T09:19:14.1927991Z mov.b32 %r23731, %r23513; 2026-02-21T09:19:14.1928054Z mov.b32 %r23732, %r23513; 2026-02-21T09:19:14.1928116Z mov.b32 %r23733, %r23513; 2026-02-21T09:19:14.1928177Z mov.b32 %r23734, %r23513; 2026-02-21T09:19:14.1928247Z mov.b32 %r23735, %r23513; 2026-02-21T09:19:14.1928315Z mov.b32 %r23736, %r23513; 2026-02-21T09:19:14.1928376Z mov.b32 %r23737, %r23513; 2026-02-21T09:19:14.1928437Z mov.b32 %r23738, %r23513; 2026-02-21T09:19:14.1928498Z mov.b32 %r23739, %r23513; 
2026-02-21T09:19:14.1928563Z mov.b32 %r23740, %r23513; 2026-02-21T09:19:14.1928685Z mov.b32 %r23741, %r23513; 2026-02-21T09:19:14.1928748Z mov.b32 %r23742, %r23513; 2026-02-21T09:19:14.1928812Z mov.b32 %r23743, %r23513; 2026-02-21T09:19:14.1928873Z mov.b32 %r23744, %r23513; 2026-02-21T09:19:14.1928934Z mov.b32 %r23745, %r23513; 2026-02-21T09:19:14.1928994Z mov.b32 %r23746, %r23513; 2026-02-21T09:19:14.1929059Z mov.b32 %r23747, %r23513; 2026-02-21T09:19:14.1929119Z mov.b32 %r23748, %r23513; 2026-02-21T09:19:14.1929179Z mov.b32 %r23749, %r23513; 2026-02-21T09:19:14.1929244Z mov.b32 %r23750, %r23513; 2026-02-21T09:19:14.1929303Z mov.b32 %r23751, %r23513; 2026-02-21T09:19:14.1929363Z mov.b32 %r23752, %r23513; 2026-02-21T09:19:14.1929422Z mov.b32 %r23753, %r23513; 2026-02-21T09:19:14.1929491Z mov.b32 %r23754, %r23513; 2026-02-21T09:19:14.1929552Z mov.b32 %r23755, %r23513; 2026-02-21T09:19:14.1929612Z mov.b32 %r23756, %r23513; 2026-02-21T09:19:14.1929676Z mov.b32 %r23757, %r23513; 2026-02-21T09:19:14.1929737Z mov.b32 %r23758, %r23513; 2026-02-21T09:19:14.1929800Z mov.b32 %r23759, %r23513; 2026-02-21T09:19:14.1929860Z mov.b32 %r23760, %r23513; 2026-02-21T09:19:14.1929926Z mov.b32 %r23761, %r23513; 2026-02-21T09:19:14.1929987Z mov.b32 %r23762, %r23513; 2026-02-21T09:19:14.1930048Z mov.b32 %r23763, %r23513; 2026-02-21T09:19:14.1930115Z mov.b32 %r23764, %r23513; 2026-02-21T09:19:14.1930175Z mov.b32 %r23765, %r23513; 2026-02-21T09:19:14.1930248Z mov.b32 %r23766, %r23513; 2026-02-21T09:19:14.1930317Z mov.b32 %r23767, %r23513; 2026-02-21T09:19:14.1930378Z mov.b32 %r23768, %r23513; 2026-02-21T09:19:14.1930499Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:19:14.1930610Z // => This Inner Loop Header: Depth=2 2026-02-21T09:19:14.1930682Z add.s64 %rd1119, %rd1119, 16; 2026-02-21T09:19:14.1930755Z setp.lt.u64 %p61, %rd1119, 432; 2026-02-21T09:19:14.1930820Z add.s32 %r14401, %r23511, 1; 2026-02-21T09:19:14.1930897Z setp.gt.s32 %p62, %r14401, 4; 2026-02-21T09:19:14.1930972Z 
selp.b32 %r23511, 0, %r14401, %p62; 2026-02-21T09:19:14.1931182Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1931265Z cp.async.wait_group 16; 2026-02-21T09:19:14.1931324Z bar.sync 0; 2026-02-21T09:19:14.1931385Z shl.b32 %r14402, %r23511, 14; 2026-02-21T09:19:14.1931450Z add.s32 %r14404, %r22967, %r14402; 2026-02-21T09:19:14.1931722Z .loc 1 55 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:55:32 2026-02-21T09:19:14.1931787Z add.s32 %r14405, %r14404, %r157; 2026-02-21T09:19:14.1931857Z ld.shared.b16 %rs289, [%r14405]; 2026-02-21T09:19:14.1931936Z ld.shared.b16 %rs290, [%r14405+256]; 2026-02-21T09:19:14.1932053Z ld.shared.b16 %rs291, [%r14405+16]; 2026-02-21T09:19:14.1932123Z ld.shared.b16 %rs292, [%r14405+272]; 2026-02-21T09:19:14.1932195Z ld.shared.b16 %rs293, [%r14405+4096]; 2026-02-21T09:19:14.1932268Z ld.shared.b16 %rs294, [%r14405+4352]; 2026-02-21T09:19:14.1932339Z ld.shared.b16 %rs295, [%r14405+4112]; 2026-02-21T09:19:14.1932407Z ld.shared.b16 %rs296, [%r14405+4368]; 2026-02-21T09:19:14.1932478Z ld.shared.b16 %rs297, [%r14405+8192]; 2026-02-21T09:19:14.1932595Z ld.shared.b16 %rs298, [%r14405+8448]; 2026-02-21T09:19:14.1932665Z ld.shared.b16 %rs299, [%r14405+8208]; 2026-02-21T09:19:14.1932736Z ld.shared.b16 %rs300, [%r14405+8464]; 2026-02-21T09:19:14.1932808Z ld.shared.b16 %rs301, [%r14405+12288]; 2026-02-21T09:19:14.1932880Z ld.shared.b16 %rs302, [%r14405+12544]; 2026-02-21T09:19:14.1932949Z ld.shared.b16 %rs303, [%r14405+12304]; 2026-02-21T09:19:14.1933027Z ld.shared.b16 %rs304, [%r14405+12560]; 2026-02-21T09:19:14.1933105Z add.s32 %r14406, %r14404, %r158; 2026-02-21T09:19:14.1933176Z ld.shared.b16 %rs305, [%r14406]; 2026-02-21T09:19:14.1933247Z ld.shared.b16 %rs306, [%r14406+256]; 2026-02-21T09:19:14.1933315Z ld.shared.b16 %rs307, [%r14406+16]; 2026-02-21T09:19:14.1933384Z ld.shared.b16 %rs308, [%r14406+272]; 2026-02-21T09:19:14.1933503Z ld.shared.b16 %rs309, [%r14406+4096]; 
2026-02-21T09:19:14.1933577Z ld.shared.b16 %rs310, [%r14406+4352]; 2026-02-21T09:19:14.1933646Z ld.shared.b16 %rs311, [%r14406+4112]; 2026-02-21T09:19:14.1933713Z ld.shared.b16 %rs312, [%r14406+4368]; 2026-02-21T09:19:14.1933784Z ld.shared.b16 %rs313, [%r14406+8192]; 2026-02-21T09:19:14.1933851Z ld.shared.b16 %rs314, [%r14406+8448]; 2026-02-21T09:19:14.1933920Z ld.shared.b16 %rs315, [%r14406+8208]; 2026-02-21T09:19:14.1933994Z ld.shared.b16 %rs316, [%r14406+8464]; 2026-02-21T09:19:14.1934065Z ld.shared.b16 %rs317, [%r14406+12288]; 2026-02-21T09:19:14.1934134Z ld.shared.b16 %rs318, [%r14406+12544]; 2026-02-21T09:19:14.1934204Z ld.shared.b16 %rs319, [%r14406+12304]; 2026-02-21T09:19:14.1934278Z ld.shared.b16 %rs320, [%r14406+12560]; 2026-02-21T09:19:14.1934342Z cvt.f32.bf16 %r11345, %rs289; 2026-02-21T09:19:14.1934407Z cvt.f32.bf16 %r11346, %rs290; 2026-02-21T09:19:14.1934472Z cvt.f32.bf16 %r11347, %rs305; 2026-02-21T09:19:14.1934535Z cvt.f32.bf16 %r11348, %rs306; 2026-02-21T09:19:14.1934597Z cvt.f32.bf16 %r11477, %rs291; 2026-02-21T09:19:14.1934658Z cvt.f32.bf16 %r11478, %rs292; 2026-02-21T09:19:14.1934722Z cvt.f32.bf16 %r11479, %rs307; 2026-02-21T09:19:14.1934785Z cvt.f32.bf16 %r11480, %rs308; 2026-02-21T09:19:14.1934845Z cvt.f32.bf16 %r11609, %rs293; 2026-02-21T09:19:14.1934920Z cvt.f32.bf16 %r11610, %rs294; 2026-02-21T09:19:14.1934988Z cvt.f32.bf16 %r11611, %rs309; 2026-02-21T09:19:14.1935050Z cvt.f32.bf16 %r11612, %rs310; 2026-02-21T09:19:14.1935113Z cvt.f32.bf16 %r11741, %rs295; 2026-02-21T09:19:14.1935179Z cvt.f32.bf16 %r11742, %rs296; 2026-02-21T09:19:14.1935240Z cvt.f32.bf16 %r11743, %rs311; 2026-02-21T09:19:14.1935306Z cvt.f32.bf16 %r11744, %rs312; 2026-02-21T09:19:14.1935374Z cvt.f32.bf16 %r11873, %rs297; 2026-02-21T09:19:14.1935436Z cvt.f32.bf16 %r11874, %rs298; 2026-02-21T09:19:14.1935497Z cvt.f32.bf16 %r11875, %rs313; 2026-02-21T09:19:14.1935562Z cvt.f32.bf16 %r11876, %rs314; 2026-02-21T09:19:14.1935626Z cvt.f32.bf16 %r12005, %rs299; 
2026-02-21T09:19:14.1935688Z cvt.f32.bf16 %r12006, %rs300; 2026-02-21T09:19:14.1935749Z cvt.f32.bf16 %r12007, %rs315; 2026-02-21T09:19:14.1935814Z cvt.f32.bf16 %r12008, %rs316; 2026-02-21T09:19:14.1935876Z cvt.f32.bf16 %r12137, %rs301; 2026-02-21T09:19:14.1935938Z cvt.f32.bf16 %r12138, %rs302; 2026-02-21T09:19:14.1936004Z cvt.f32.bf16 %r12139, %rs317; 2026-02-21T09:19:14.1936141Z cvt.f32.bf16 %r12140, %rs318; 2026-02-21T09:19:14.1936203Z cvt.f32.bf16 %r12269, %rs303; 2026-02-21T09:19:14.1936264Z cvt.f32.bf16 %r12270, %rs304; 2026-02-21T09:19:14.1936329Z cvt.f32.bf16 %r12271, %rs319; 2026-02-21T09:19:14.1936390Z cvt.f32.bf16 %r12272, %rs320; 2026-02-21T09:19:14.1936754Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1936822Z shl.b32 %r14407, %r23511, 10; 2026-02-21T09:19:14.1936887Z add.s32 %r14408, %r22967, %r14407; 2026-02-21T09:19:14.1936954Z add.s32 %r14409, %r14408, 172032; 2026-02-21T09:19:14.1937158Z .loc 1 70 45 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:70:45 2026-02-21T09:19:14.1937222Z add.s32 %r14410, %r14409, %r22973; 2026-02-21T09:19:14.1937375Z ld.shared.b8 %rs321, [%r14410]; 2026-02-21T09:19:14.1937450Z ld.shared.b8 %rs322, [%r14410+128]; 2026-02-21T09:19:14.1937525Z ld.shared.b8 %rs323, [%r14410+256]; 2026-02-21T09:19:14.1937594Z ld.shared.b8 %rs324, [%r14410+384]; 2026-02-21T09:19:14.1937662Z ld.shared.b8 %rs325, [%r14410+512]; 2026-02-21T09:19:14.1937734Z ld.shared.b8 %rs326, [%r14410+640]; 2026-02-21T09:19:14.1937800Z ld.shared.b8 %rs327, [%r14410+768]; 2026-02-21T09:19:14.1937866Z add.s32 %r14411, %r14409, %r22974; 2026-02-21T09:19:14.1937933Z ld.shared.b8 %rs328, [%r14411]; 2026-02-21T09:19:14.1938137Z .loc 1 60 28 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:60:28 2026-02-21T09:19:14.1938203Z shl.b16 %rs329, %rs321, 4; 2026-02-21T09:19:14.1938331Z shl.b16 %rs330, %rs322, 4; 2026-02-21T09:19:14.1938403Z shl.b16 %rs331, %rs323, 4; 2026-02-21T09:19:14.1941195Z 
shl.b16 %rs332, %rs324, 4; 2026-02-21T09:19:14.1941295Z shl.b16 %rs333, %rs325, 4; 2026-02-21T09:19:14.1941365Z shl.b16 %rs334, %rs326, 4; 2026-02-21T09:19:14.1941428Z shl.b16 %rs335, %rs327, 4; 2026-02-21T09:19:14.1941491Z shl.b16 %rs336, %rs328, 4; 2026-02-21T09:19:14.1941729Z .loc 1 75 58 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:75:58 2026-02-21T09:19:14.1941814Z selp.b16 %rs337, %rs329, %rs321, %p110; 2026-02-21T09:19:14.1941895Z cvt.s16.s8 %rs338, %rs337; 2026-02-21T09:19:14.1941965Z shr.s16 %rs339, %rs338, 4; 2026-02-21T09:19:14.1942042Z selp.b16 %rs340, %rs330, %rs322, %p110; 2026-02-21T09:19:14.1942102Z cvt.s16.s8 %rs341, %rs340; 2026-02-21T09:19:14.1942168Z shr.s16 %rs342, %rs341, 4; 2026-02-21T09:19:14.1942238Z selp.b16 %rs343, %rs331, %rs323, %p110; 2026-02-21T09:19:14.1942302Z cvt.s16.s8 %rs344, %rs343; 2026-02-21T09:19:14.1942369Z shr.s16 %rs345, %rs344, 4; 2026-02-21T09:19:14.1942437Z selp.b16 %rs346, %rs332, %rs324, %p110; 2026-02-21T09:19:14.1942499Z cvt.s16.s8 %rs347, %rs346; 2026-02-21T09:19:14.1942562Z shr.s16 %rs348, %rs347, 4; 2026-02-21T09:19:14.1942633Z selp.b16 %rs349, %rs333, %rs325, %p110; 2026-02-21T09:19:14.1942693Z cvt.s16.s8 %rs350, %rs349; 2026-02-21T09:19:14.1942753Z shr.s16 %rs351, %rs350, 4; 2026-02-21T09:19:14.1942828Z selp.b16 %rs352, %rs334, %rs326, %p110; 2026-02-21T09:19:14.1942888Z cvt.s16.s8 %rs353, %rs352; 2026-02-21T09:19:14.1942948Z shr.s16 %rs354, %rs353, 4; 2026-02-21T09:19:14.1943017Z selp.b16 %rs355, %rs335, %rs327, %p110; 2026-02-21T09:19:14.1943085Z cvt.s16.s8 %rs356, %rs355; 2026-02-21T09:19:14.1943147Z shr.s16 %rs357, %rs356, 4; 2026-02-21T09:19:14.1943215Z selp.b16 %rs358, %rs336, %rs328, %p110; 2026-02-21T09:19:14.1943280Z cvt.s16.s8 %rs359, %rs358; 2026-02-21T09:19:14.1943345Z shr.s16 %rs360, %rs359, 4; 2026-02-21T09:19:14.1943566Z .loc 1 80 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:80:32 2026-02-21T09:19:14.1943640Z cvt.rn.f32.s16 %r14412, %rs339; 
2026-02-21T09:19:14.1943705Z cvt.rn.f32.s16 %r14413, %rs342; 2026-02-21T09:19:14.1943768Z cvt.rn.f32.s16 %r14414, %rs345; 2026-02-21T09:19:14.1943829Z cvt.rn.f32.s16 %r14415, %rs348; 2026-02-21T09:19:14.1943895Z cvt.rn.f32.s16 %r14416, %rs351; 2026-02-21T09:19:14.1944076Z cvt.rn.f32.s16 %r14417, %rs354; 2026-02-21T09:19:14.1944138Z cvt.rn.f32.s16 %r14418, %rs357; 2026-02-21T09:19:14.1944202Z cvt.rn.f32.s16 %r14419, %rs360; 2026-02-21T09:19:14.1944267Z st.shared.b32 [%r161], %r14412; 2026-02-21T09:19:14.1944334Z st.shared.b32 [%r161+8], %r14413; 2026-02-21T09:19:14.1944479Z st.shared.b32 [%r162], %r14414; 2026-02-21T09:19:14.1944547Z st.shared.b32 [%r162+8], %r14415; 2026-02-21T09:19:14.1944610Z st.shared.b32 [%r163], %r14416; 2026-02-21T09:19:14.1944673Z st.shared.b32 [%r163+8], %r14417; 2026-02-21T09:19:14.1944739Z st.shared.b32 [%r164], %r14418; 2026-02-21T09:19:14.1944806Z st.shared.b32 [%r164+8], %r14419; 2026-02-21T09:19:14.1944864Z $L__tmp9: 2026-02-21T09:19:14.1945204Z .loc 2 291 36 // standard.py:291:36 @[ co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:87:40 ] 2026-02-21T09:19:14.1945283Z // begin inline asm 2026-02-21T09:19:14.1945373Z fence.proxy.async.shared::cta; 2026-02-21T09:19:14.1945432Z // end inline asm 2026-02-21T09:19:14.1945493Z bar.sync 0; 2026-02-21T09:19:14.1945587Z shfl.sync.idx.b32 %r14420, %r5, 0, 31, -1; 2026-02-21T09:19:14.1945662Z wgmma.fence.sync.aligned; 2026-02-21T09:19:14.1945733Z mov.pred %p44, -1; 2026-02-21T09:19:14.1945793Z // begin inline asm 2026-02-21T09:19:14.1947546Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r23513,%r23514,%r23515,%r23516,%r23517,%r23518,%r23519,%r23520,%r23521,%r23522,%r23523,%r23524,%r23525,%r23526,%r23527,%r23528,%r23529,%r23530,%r23531,%r23532,%r23533,%r23534,%r23535,%r23536,%r23537,%r23538,%r23539,%r23540,%r23541,%r23542,%r23543,%r23544,%r23545,%r23546,%r23547,%r23548,%r23549,%r23550,%r23551,%r23552,%r23553,%r23554,%r23555,%r23556,%r23557,%r23558,%r23559,%r23560,%r23561,%r23562,%r23563,%r23564,%r23565,%r23566,%r23567,%r23568,%r23569,%r23570,%r23571,%r23572,%r23573,%r23574,%r23575,%r23576}, {%r11345,%r11346,%r11347,%r11348}, %rd1, %p44, 1, 1; 2026-02-21T09:19:14.1947630Z // end inline asm 2026-02-21T09:19:14.1947691Z // begin inline asm 2026-02-21T09:19:14.1949258Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23513,%r23514,%r23515,%r23516,%r23517,%r23518,%r23519,%r23520,%r23521,%r23522,%r23523,%r23524,%r23525,%r23526,%r23527,%r23528,%r23529,%r23530,%r23531,%r23532,%r23533,%r23534,%r23535,%r23536,%r23537,%r23538,%r23539,%r23540,%r23541,%r23542,%r23543,%r23544,%r23545,%r23546,%r23547,%r23548,%r23549,%r23550,%r23551,%r23552,%r23553,%r23554,%r23555,%r23556,%r23557,%r23558,%r23559,%r23560,%r23561,%r23562,%r23563,%r23564,%r23565,%r23566,%r23567,%r23568,%r23569,%r23570,%r23571,%r23572,%r23573,%r23574,%r23575,%r23576}, {%r11477,%r11478,%r11479,%r11480}, %rd2, %p44, 1, 1; 2026-02-21T09:19:14.1949326Z // end inline asm 2026-02-21T09:19:14.1949384Z // begin inline asm 2026-02-21T09:19:14.1950867Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r23577,%r23578,%r23579,%r23580,%r23581,%r23582,%r23583,%r23584,%r23585,%r23586,%r23587,%r23588,%r23589,%r23590,%r23591,%r23592,%r23593,%r23594,%r23595,%r23596,%r23597,%r23598,%r23599,%r23600,%r23601,%r23602,%r23603,%r23604,%r23605,%r23606,%r23607,%r23608,%r23609,%r23610,%r23611,%r23612,%r23613,%r23614,%r23615,%r23616,%r23617,%r23618,%r23619,%r23620,%r23621,%r23622,%r23623,%r23624,%r23625,%r23626,%r23627,%r23628,%r23629,%r23630,%r23631,%r23632,%r23633,%r23634,%r23635,%r23636,%r23637,%r23638,%r23639,%r23640}, {%r11609,%r11610,%r11611,%r11612}, %rd1, %p44, 1, 1; 2026-02-21T09:19:14.1950930Z // end inline asm 2026-02-21T09:19:14.1950988Z // begin inline asm 2026-02-21T09:19:14.1952479Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23577,%r23578,%r23579,%r23580,%r23581,%r23582,%r23583,%r23584,%r23585,%r23586,%r23587,%r23588,%r23589,%r23590,%r23591,%r23592,%r23593,%r23594,%r23595,%r23596,%r23597,%r23598,%r23599,%r23600,%r23601,%r23602,%r23603,%r23604,%r23605,%r23606,%r23607,%r23608,%r23609,%r23610,%r23611,%r23612,%r23613,%r23614,%r23615,%r23616,%r23617,%r23618,%r23619,%r23620,%r23621,%r23622,%r23623,%r23624,%r23625,%r23626,%r23627,%r23628,%r23629,%r23630,%r23631,%r23632,%r23633,%r23634,%r23635,%r23636,%r23637,%r23638,%r23639,%r23640}, {%r11741,%r11742,%r11743,%r11744}, %rd2, %p44, 1, 1; 2026-02-21T09:19:14.1952633Z // end inline asm 2026-02-21T09:19:14.1952692Z // begin inline asm 2026-02-21T09:19:14.1954241Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r23641,%r23642,%r23643,%r23644,%r23645,%r23646,%r23647,%r23648,%r23649,%r23650,%r23651,%r23652,%r23653,%r23654,%r23655,%r23656,%r23657,%r23658,%r23659,%r23660,%r23661,%r23662,%r23663,%r23664,%r23665,%r23666,%r23667,%r23668,%r23669,%r23670,%r23671,%r23672,%r23673,%r23674,%r23675,%r23676,%r23677,%r23678,%r23679,%r23680,%r23681,%r23682,%r23683,%r23684,%r23685,%r23686,%r23687,%r23688,%r23689,%r23690,%r23691,%r23692,%r23693,%r23694,%r23695,%r23696,%r23697,%r23698,%r23699,%r23700,%r23701,%r23702,%r23703,%r23704}, {%r11873,%r11874,%r11875,%r11876}, %rd1, %p44, 1, 1; 2026-02-21T09:19:14.1954357Z // end inline asm 2026-02-21T09:19:14.1954417Z // begin inline asm 2026-02-21T09:19:14.1955914Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23641,%r23642,%r23643,%r23644,%r23645,%r23646,%r23647,%r23648,%r23649,%r23650,%r23651,%r23652,%r23653,%r23654,%r23655,%r23656,%r23657,%r23658,%r23659,%r23660,%r23661,%r23662,%r23663,%r23664,%r23665,%r23666,%r23667,%r23668,%r23669,%r23670,%r23671,%r23672,%r23673,%r23674,%r23675,%r23676,%r23677,%r23678,%r23679,%r23680,%r23681,%r23682,%r23683,%r23684,%r23685,%r23686,%r23687,%r23688,%r23689,%r23690,%r23691,%r23692,%r23693,%r23694,%r23695,%r23696,%r23697,%r23698,%r23699,%r23700,%r23701,%r23702,%r23703,%r23704}, {%r12005,%r12006,%r12007,%r12008}, %rd2, %p44, 1, 1; 2026-02-21T09:19:14.1956022Z // end inline asm 2026-02-21T09:19:14.1956085Z // begin inline asm 2026-02-21T09:19:14.1957679Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r23705,%r23706,%r23707,%r23708,%r23709,%r23710,%r23711,%r23712,%r23713,%r23714,%r23715,%r23716,%r23717,%r23718,%r23719,%r23720,%r23721,%r23722,%r23723,%r23724,%r23725,%r23726,%r23727,%r23728,%r23729,%r23730,%r23731,%r23732,%r23733,%r23734,%r23735,%r23736,%r23737,%r23738,%r23739,%r23740,%r23741,%r23742,%r23743,%r23744,%r23745,%r23746,%r23747,%r23748,%r23749,%r23750,%r23751,%r23752,%r23753,%r23754,%r23755,%r23756,%r23757,%r23758,%r23759,%r23760,%r23761,%r23762,%r23763,%r23764,%r23765,%r23766,%r23767,%r23768}, {%r12137,%r12138,%r12139,%r12140}, %rd1, %p44, 1, 1; 2026-02-21T09:19:14.1957747Z // end inline asm 2026-02-21T09:19:14.1957806Z // begin inline asm 2026-02-21T09:19:14.1959290Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23705,%r23706,%r23707,%r23708,%r23709,%r23710,%r23711,%r23712,%r23713,%r23714,%r23715,%r23716,%r23717,%r23718,%r23719,%r23720,%r23721,%r23722,%r23723,%r23724,%r23725,%r23726,%r23727,%r23728,%r23729,%r23730,%r23731,%r23732,%r23733,%r23734,%r23735,%r23736,%r23737,%r23738,%r23739,%r23740,%r23741,%r23742,%r23743,%r23744,%r23745,%r23746,%r23747,%r23748,%r23749,%r23750,%r23751,%r23752,%r23753,%r23754,%r23755,%r23756,%r23757,%r23758,%r23759,%r23760,%r23761,%r23762,%r23763,%r23764,%r23765,%r23766,%r23767,%r23768}, {%r12269,%r12270,%r12271,%r12272}, %rd2, %p44, 1, 1; 2026-02-21T09:19:14.1959355Z // end inline asm 2026-02-21T09:19:14.1959433Z wgmma.commit_group.sync.aligned; 2026-02-21T09:19:14.1959494Z mov.b32 %r14105, 0; 2026-02-21T09:19:14.1959562Z mov.b32 %r12529, %r2973; 2026-02-21T09:19:14.1959626Z mov.b32 %r12530, %r14105; 2026-02-21T09:19:14.1959685Z mov.b32 %r12531, %r14105; 2026-02-21T09:19:14.1959745Z // begin inline asm 2026-02-21T09:19:14.1964826Z // wait for regs: 
%r23513,%r23514,%r23515,%r23516,%r23517,%r23518,%r23519,%r23520,%r23521,%r23522,%r23523,%r23524,%r23525,%r23526,%r23527,%r23528,%r23529,%r23530,%r23531,%r23532,%r23533,%r23534,%r23535,%r23536,%r23537,%r23538,%r23539,%r23540,%r23541,%r23542,%r23543,%r23544,%r23545,%r23546,%r23547,%r23548,%r23549,%r23550,%r23551,%r23552,%r23553,%r23554,%r23555,%r23556,%r23557,%r23558,%r23559,%r23560,%r23561,%r23562,%r23563,%r23564,%r23565,%r23566,%r23567,%r23568,%r23569,%r23570,%r23571,%r23572,%r23573,%r23574,%r23575,%r23576,%r23577,%r23578,%r23579,%r23580,%r23581,%r23582,%r23583,%r23584,%r23585,%r23586,%r23587,%r23588,%r23589,%r23590,%r23591,%r23592,%r23593,%r23594,%r23595,%r23596,%r23597,%r23598,%r23599,%r23600,%r23601,%r23602,%r23603,%r23604,%r23605,%r23606,%r23607,%r23608,%r23609,%r23610,%r23611,%r23612,%r23613,%r23614,%r23615,%r23616,%r23617,%r23618,%r23619,%r23620,%r23621,%r23622,%r23623,%r23624,%r23625,%r23626,%r23627,%r23628,%r23629,%r23630,%r23631,%r23632,%r23633,%r23634,%r23635,%r23636,%r23637,%r23638,%r23639,%r23640,%r23641,%r23642,%r23643,%r23644,%r23645,%r23646,%r23647,%r23648,%r23649,%r23650,%r23651,%r23652,%r23653,%r23654,%r23655,%r23656,%r23657,%r23658,%r23659,%r23660,%r23661,%r23662,%r23663,%r23664,%r23665,%r23666,%r23667,%r23668,%r23669,%r23670,%r23671,%r23672,%r23673,%r23674,%r23675,%r23676,%r23677,%r23678,%r23679,%r23680,%r23681,%r23682,%r23683,%r23684,%r23685,%r23686,%r23687,%r23688,%r23689,%r23690,%r23691,%r23692,%r23693,%r23694,%r23695,%r23696,%r23697,%r23698,%r23699,%r23700,%r23701,%r23702,%r23703,%r23704,%r23705,%r23706,%r23707,%r23708,%r23709,%r23710,%r23711,%r23712,%r23713,%r23714,%r23715,%r23716,%r23717,%r23718,%r23719,%r23720,%r23721,%r23722,%r23723,%r23724,%r23725,%r23726,%r23727,%r23728,%r23729,%r23730,%r23731,%r23732,%r23733,%r23734,%r23735,%r23736,%r23737,%r23738,%r23739,%r23740,%r23741,%r23742,%r23743,%r23744,%r23745,%r23746,%r23747,%r23748,%r23749,%r23750,%r23751,%r23752,%r23753,%r23754,%r23755,%r23756,%r23757,%r23758,%r23759,%r23760,%r23761,%r23762,
%r23763,%r23764,%r23765,%r23766,%r23767,%r23768,%r12529,%r12530,%r12531 2026-02-21T09:19:14.1965029Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:19:14.1965086Z // end inline asm 2026-02-21T09:19:14.1965211Z $L__tmp10: 2026-02-21T09:19:14.1965439Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1965510Z add.s32 %r14422, %r6453, %r14402; 2026-02-21T09:19:14.1965731Z .loc 1 55 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:55:32 2026-02-21T09:19:14.1965798Z add.s32 %r14423, %r14422, %r157; 2026-02-21T09:19:14.1965869Z ld.shared.b16 %rs361, [%r14423]; 2026-02-21T09:19:14.1965940Z ld.shared.b16 %rs362, [%r14423+256]; 2026-02-21T09:19:14.1966012Z ld.shared.b16 %rs363, [%r14423+16]; 2026-02-21T09:19:14.1966082Z ld.shared.b16 %rs364, [%r14423+272]; 2026-02-21T09:19:14.1966155Z ld.shared.b16 %rs365, [%r14423+4096]; 2026-02-21T09:19:14.1966227Z ld.shared.b16 %rs366, [%r14423+4352]; 2026-02-21T09:19:14.1966294Z ld.shared.b16 %rs367, [%r14423+4112]; 2026-02-21T09:19:14.1966359Z ld.shared.b16 %rs368, [%r14423+4368]; 2026-02-21T09:19:14.1966431Z ld.shared.b16 %rs369, [%r14423+8192]; 2026-02-21T09:19:14.1966610Z ld.shared.b16 %rs370, [%r14423+8448]; 2026-02-21T09:19:14.1966677Z ld.shared.b16 %rs371, [%r14423+8208]; 2026-02-21T09:19:14.1966742Z ld.shared.b16 %rs372, [%r14423+8464]; 2026-02-21T09:19:14.1966821Z ld.shared.b16 %rs373, [%r14423+12288]; 2026-02-21T09:19:14.1966892Z ld.shared.b16 %rs374, [%r14423+12544]; 2026-02-21T09:19:14.1966960Z ld.shared.b16 %rs375, [%r14423+12304]; 2026-02-21T09:19:14.1967035Z ld.shared.b16 %rs376, [%r14423+12560]; 2026-02-21T09:19:14.1967100Z add.s32 %r14424, %r14422, %r158; 2026-02-21T09:19:14.1967166Z ld.shared.b16 %rs377, [%r14424]; 2026-02-21T09:19:14.1967233Z ld.shared.b16 %rs378, [%r14424+256]; 2026-02-21T09:19:14.1967304Z ld.shared.b16 %rs379, [%r14424+16]; 2026-02-21T09:19:14.1967382Z ld.shared.b16 %rs380, [%r14424+272]; 2026-02-21T09:19:14.1967452Z ld.shared.b16 %rs381, 
[%r14424+4096]; 2026-02-21T09:19:14.1967525Z ld.shared.b16 %rs382, [%r14424+4352]; 2026-02-21T09:19:14.1967593Z ld.shared.b16 %rs383, [%r14424+4112]; 2026-02-21T09:19:14.1967658Z ld.shared.b16 %rs384, [%r14424+4368]; 2026-02-21T09:19:14.1967729Z ld.shared.b16 %rs385, [%r14424+8192]; 2026-02-21T09:19:14.1967798Z ld.shared.b16 %rs386, [%r14424+8448]; 2026-02-21T09:19:14.1967862Z ld.shared.b16 %rs387, [%r14424+8208]; 2026-02-21T09:19:14.1967927Z ld.shared.b16 %rs388, [%r14424+8464]; 2026-02-21T09:19:14.1968002Z ld.shared.b16 %rs389, [%r14424+12288]; 2026-02-21T09:19:14.1968153Z ld.shared.b16 %rs390, [%r14424+12544]; 2026-02-21T09:19:14.1968225Z ld.shared.b16 %rs391, [%r14424+12304]; 2026-02-21T09:19:14.1968296Z ld.shared.b16 %rs392, [%r14424+12560]; 2026-02-21T09:19:14.1968362Z cvt.f32.bf16 %r12919, %rs361; 2026-02-21T09:19:14.1968487Z cvt.f32.bf16 %r12920, %rs362; 2026-02-21T09:19:14.1968564Z cvt.f32.bf16 %r12921, %rs377; 2026-02-21T09:19:14.1968629Z cvt.f32.bf16 %r12922, %rs378; 2026-02-21T09:19:14.1968690Z cvt.f32.bf16 %r13051, %rs363; 2026-02-21T09:19:14.1968752Z cvt.f32.bf16 %r13052, %rs364; 2026-02-21T09:19:14.1968818Z cvt.f32.bf16 %r13053, %rs379; 2026-02-21T09:19:14.1968878Z cvt.f32.bf16 %r13054, %rs380; 2026-02-21T09:19:14.1968939Z cvt.f32.bf16 %r13183, %rs365; 2026-02-21T09:19:14.1969063Z cvt.f32.bf16 %r13184, %rs366; 2026-02-21T09:19:14.1969130Z cvt.f32.bf16 %r13185, %rs381; 2026-02-21T09:19:14.1969190Z cvt.f32.bf16 %r13186, %rs382; 2026-02-21T09:19:14.1969250Z cvt.f32.bf16 %r13315, %rs367; 2026-02-21T09:19:14.1969315Z cvt.f32.bf16 %r13316, %rs368; 2026-02-21T09:19:14.1969378Z cvt.f32.bf16 %r13317, %rs383; 2026-02-21T09:19:14.1969438Z cvt.f32.bf16 %r13318, %rs384; 2026-02-21T09:19:14.1969502Z cvt.f32.bf16 %r13447, %rs369; 2026-02-21T09:19:14.1969564Z cvt.f32.bf16 %r13448, %rs370; 2026-02-21T09:19:14.1969626Z cvt.f32.bf16 %r13449, %rs385; 2026-02-21T09:19:14.1969687Z cvt.f32.bf16 %r13450, %rs386; 2026-02-21T09:19:14.1969752Z cvt.f32.bf16 %r13579, %rs371; 
2026-02-21T09:19:14.1969812Z cvt.f32.bf16 %r13580, %rs372; 2026-02-21T09:19:14.1969872Z cvt.f32.bf16 %r13581, %rs387; 2026-02-21T09:19:14.1970008Z cvt.f32.bf16 %r13582, %rs388; 2026-02-21T09:19:14.1970074Z cvt.f32.bf16 %r13711, %rs373; 2026-02-21T09:19:14.1970135Z cvt.f32.bf16 %r13712, %rs374; 2026-02-21T09:19:14.1970196Z cvt.f32.bf16 %r13713, %rs389; 2026-02-21T09:19:14.1970260Z cvt.f32.bf16 %r13714, %rs390; 2026-02-21T09:19:14.1970320Z cvt.f32.bf16 %r13843, %rs375; 2026-02-21T09:19:14.1970381Z cvt.f32.bf16 %r13844, %rs376; 2026-02-21T09:19:14.1970446Z cvt.f32.bf16 %r13845, %rs391; 2026-02-21T09:19:14.1970507Z cvt.f32.bf16 %r13846, %rs392; 2026-02-21T09:19:14.1970720Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.1970787Z add.s32 %r14425, %r14408, 177152; 2026-02-21T09:19:14.1970990Z .loc 1 70 45 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:70:45 2026-02-21T09:19:14.1971057Z add.s32 %r14426, %r14425, %r22973; 2026-02-21T09:19:14.1971125Z ld.shared.b8 %rs393, [%r14426]; 2026-02-21T09:19:14.1971198Z ld.shared.b8 %rs394, [%r14426+128]; 2026-02-21T09:19:14.1971263Z ld.shared.b8 %rs395, [%r14426+256]; 2026-02-21T09:19:14.1971327Z ld.shared.b8 %rs396, [%r14426+384]; 2026-02-21T09:19:14.1971397Z ld.shared.b8 %rs397, [%r14426+512]; 2026-02-21T09:19:14.1971463Z ld.shared.b8 %rs398, [%r14426+640]; 2026-02-21T09:19:14.1971526Z ld.shared.b8 %rs399, [%r14426+768]; 2026-02-21T09:19:14.1971590Z add.s32 %r14427, %r14425, %r22974; 2026-02-21T09:19:14.1971660Z ld.shared.b8 %rs400, [%r14427]; 2026-02-21T09:19:14.1971858Z .loc 1 60 28 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:60:28 2026-02-21T09:19:14.1971925Z shl.b16 %rs401, %rs393, 4; 2026-02-21T09:19:14.1971991Z shl.b16 %rs402, %rs394, 4; 2026-02-21T09:19:14.1972052Z shl.b16 %rs403, %rs395, 4; 2026-02-21T09:19:14.1972114Z shl.b16 %rs404, %rs396, 4; 2026-02-21T09:19:14.1972179Z shl.b16 %rs405, %rs397, 4; 2026-02-21T09:19:14.1972238Z shl.b16 %rs406, 
%rs398, 4; 2026-02-21T09:19:14.1972299Z shl.b16 %rs407, %rs399, 4; 2026-02-21T09:19:14.1972360Z shl.b16 %rs408, %rs400, 4; 2026-02-21T09:19:14.1972564Z .loc 1 75 58 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:75:58 2026-02-21T09:19:14.1972641Z selp.b16 %rs409, %rs401, %rs393, %p110; 2026-02-21T09:19:14.1972703Z cvt.s16.s8 %rs410, %rs409; 2026-02-21T09:19:14.1972769Z shr.s16 %rs411, %rs410, 4; 2026-02-21T09:19:14.1972908Z selp.b16 %rs412, %rs402, %rs394, %p110; 2026-02-21T09:19:14.1972970Z cvt.s16.s8 %rs413, %rs412; 2026-02-21T09:19:14.1973032Z shr.s16 %rs414, %rs413, 4; 2026-02-21T09:19:14.1973106Z selp.b16 %rs415, %rs403, %rs395, %p110; 2026-02-21T09:19:14.1973170Z cvt.s16.s8 %rs416, %rs415; 2026-02-21T09:19:14.1973277Z shr.s16 %rs417, %rs416, 4; 2026-02-21T09:19:14.1973349Z selp.b16 %rs418, %rs404, %rs396, %p110; 2026-02-21T09:19:14.1973410Z cvt.s16.s8 %rs419, %rs418; 2026-02-21T09:19:14.1973470Z shr.s16 %rs420, %rs419, 4; 2026-02-21T09:19:14.1973545Z selp.b16 %rs421, %rs405, %rs397, %p110; 2026-02-21T09:19:14.1973607Z cvt.s16.s8 %rs422, %rs421; 2026-02-21T09:19:14.1973669Z shr.s16 %rs423, %rs422, 4; 2026-02-21T09:19:14.1973740Z selp.b16 %rs424, %rs406, %rs398, %p110; 2026-02-21T09:19:14.1973858Z cvt.s16.s8 %rs425, %rs424; 2026-02-21T09:19:14.1973922Z shr.s16 %rs426, %rs425, 4; 2026-02-21T09:19:14.1973991Z selp.b16 %rs427, %rs407, %rs399, %p110; 2026-02-21T09:19:14.1974056Z cvt.s16.s8 %rs428, %rs427; 2026-02-21T09:19:14.1974121Z shr.s16 %rs429, %rs428, 4; 2026-02-21T09:19:14.1974188Z selp.b16 %rs430, %rs408, %rs400, %p110; 2026-02-21T09:19:14.1974251Z cvt.s16.s8 %rs431, %rs430; 2026-02-21T09:19:14.1974321Z shr.s16 %rs432, %rs431, 4; 2026-02-21T09:19:14.1974528Z .loc 1 80 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:80:32 2026-02-21T09:19:14.1974593Z cvt.rn.f32.s16 %r14428, %rs411; 2026-02-21T09:19:14.1974669Z cvt.rn.f32.s16 %r14429, %rs414; 2026-02-21T09:19:14.1974734Z cvt.rn.f32.s16 %r14430, %rs417; 2026-02-21T09:19:14.1974863Z 
cvt.rn.f32.s16 %r14431, %rs420; 2026-02-21T09:19:14.1974930Z cvt.rn.f32.s16 %r14432, %rs423; 2026-02-21T09:19:14.1974994Z cvt.rn.f32.s16 %r14433, %rs426; 2026-02-21T09:19:14.1975071Z cvt.rn.f32.s16 %r14434, %rs429; 2026-02-21T09:19:14.1975135Z cvt.rn.f32.s16 %r14435, %rs432; 2026-02-21T09:19:14.1975197Z bar.sync 0; 2026-02-21T09:19:14.1975261Z st.shared.b32 [%r161], %r14428; 2026-02-21T09:19:14.1975332Z st.shared.b32 [%r161+8], %r14429; 2026-02-21T09:19:14.1975404Z st.shared.b32 [%r162], %r14430; 2026-02-21T09:19:14.1975473Z st.shared.b32 [%r162+8], %r14431; 2026-02-21T09:19:14.1975538Z st.shared.b32 [%r163], %r14432; 2026-02-21T09:19:14.1975601Z st.shared.b32 [%r163+8], %r14433; 2026-02-21T09:19:14.1975673Z st.shared.b32 [%r164], %r14434; 2026-02-21T09:19:14.1975737Z st.shared.b32 [%r164+8], %r14435; 2026-02-21T09:19:14.1975793Z $L__tmp11: 2026-02-21T09:19:14.1976080Z .loc 2 291 36 // standard.py:291:36 @[ co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:87:40 ] 2026-02-21T09:19:14.1976143Z // begin inline asm 2026-02-21T09:19:14.1976221Z fence.proxy.async.shared::cta; 2026-02-21T09:19:14.1976291Z // end inline asm 2026-02-21T09:19:14.1976358Z bar.sync 0; 2026-02-21T09:19:14.1976433Z wgmma.fence.sync.aligned; 2026-02-21T09:19:14.1976617Z // begin inline asm 2026-02-21T09:19:14.1978129Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23513,%r23514,%r23515,%r23516,%r23517,%r23518,%r23519,%r23520,%r23521,%r23522,%r23523,%r23524,%r23525,%r23526,%r23527,%r23528,%r23529,%r23530,%r23531,%r23532,%r23533,%r23534,%r23535,%r23536,%r23537,%r23538,%r23539,%r23540,%r23541,%r23542,%r23543,%r23544,%r23545,%r23546,%r23547,%r23548,%r23549,%r23550,%r23551,%r23552,%r23553,%r23554,%r23555,%r23556,%r23557,%r23558,%r23559,%r23560,%r23561,%r23562,%r23563,%r23564,%r23565,%r23566,%r23567,%r23568,%r23569,%r23570,%r23571,%r23572,%r23573,%r23574,%r23575,%r23576}, {%r12919,%r12920,%r12921,%r12922}, %rd1, %p44, 1, 1; 2026-02-21T09:19:14.1978196Z // end inline asm 
2026-02-21T09:19:14.1978260Z // begin inline asm 2026-02-21T09:19:14.1979756Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23513,%r23514,%r23515,%r23516,%r23517,%r23518,%r23519,%r23520,%r23521,%r23522,%r23523,%r23524,%r23525,%r23526,%r23527,%r23528,%r23529,%r23530,%r23531,%r23532,%r23533,%r23534,%r23535,%r23536,%r23537,%r23538,%r23539,%r23540,%r23541,%r23542,%r23543,%r23544,%r23545,%r23546,%r23547,%r23548,%r23549,%r23550,%r23551,%r23552,%r23553,%r23554,%r23555,%r23556,%r23557,%r23558,%r23559,%r23560,%r23561,%r23562,%r23563,%r23564,%r23565,%r23566,%r23567,%r23568,%r23569,%r23570,%r23571,%r23572,%r23573,%r23574,%r23575,%r23576}, {%r13051,%r13052,%r13053,%r13054}, %rd2, %p44, 1, 1; 2026-02-21T09:19:14.1979955Z // end inline asm 2026-02-21T09:19:14.1980021Z // begin inline asm 2026-02-21T09:19:14.1981571Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23577,%r23578,%r23579,%r23580,%r23581,%r23582,%r23583,%r23584,%r23585,%r23586,%r23587,%r23588,%r23589,%r23590,%r23591,%r23592,%r23593,%r23594,%r23595,%r23596,%r23597,%r23598,%r23599,%r23600,%r23601,%r23602,%r23603,%r23604,%r23605,%r23606,%r23607,%r23608,%r23609,%r23610,%r23611,%r23612,%r23613,%r23614,%r23615,%r23616,%r23617,%r23618,%r23619,%r23620,%r23621,%r23622,%r23623,%r23624,%r23625,%r23626,%r23627,%r23628,%r23629,%r23630,%r23631,%r23632,%r23633,%r23634,%r23635,%r23636,%r23637,%r23638,%r23639,%r23640}, {%r13183,%r13184,%r13185,%r13186}, %rd1, %p44, 1, 1; 2026-02-21T09:19:14.1981639Z // end inline asm 2026-02-21T09:19:14.1981700Z // begin inline asm 2026-02-21T09:19:14.1983248Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r23577,%r23578,%r23579,%r23580,%r23581,%r23582,%r23583,%r23584,%r23585,%r23586,%r23587,%r23588,%r23589,%r23590,%r23591,%r23592,%r23593,%r23594,%r23595,%r23596,%r23597,%r23598,%r23599,%r23600,%r23601,%r23602,%r23603,%r23604,%r23605,%r23606,%r23607,%r23608,%r23609,%r23610,%r23611,%r23612,%r23613,%r23614,%r23615,%r23616,%r23617,%r23618,%r23619,%r23620,%r23621,%r23622,%r23623,%r23624,%r23625,%r23626,%r23627,%r23628,%r23629,%r23630,%r23631,%r23632,%r23633,%r23634,%r23635,%r23636,%r23637,%r23638,%r23639,%r23640}, {%r13315,%r13316,%r13317,%r13318}, %rd2, %p44, 1, 1; 2026-02-21T09:19:14.1983315Z // end inline asm 2026-02-21T09:19:14.1983376Z // begin inline asm 2026-02-21T09:19:14.1984854Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23641,%r23642,%r23643,%r23644,%r23645,%r23646,%r23647,%r23648,%r23649,%r23650,%r23651,%r23652,%r23653,%r23654,%r23655,%r23656,%r23657,%r23658,%r23659,%r23660,%r23661,%r23662,%r23663,%r23664,%r23665,%r23666,%r23667,%r23668,%r23669,%r23670,%r23671,%r23672,%r23673,%r23674,%r23675,%r23676,%r23677,%r23678,%r23679,%r23680,%r23681,%r23682,%r23683,%r23684,%r23685,%r23686,%r23687,%r23688,%r23689,%r23690,%r23691,%r23692,%r23693,%r23694,%r23695,%r23696,%r23697,%r23698,%r23699,%r23700,%r23701,%r23702,%r23703,%r23704}, {%r13447,%r13448,%r13449,%r13450}, %rd1, %p44, 1, 1; 2026-02-21T09:19:14.1984920Z // end inline asm 2026-02-21T09:19:14.1984980Z // begin inline asm 2026-02-21T09:19:14.1986566Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r23641,%r23642,%r23643,%r23644,%r23645,%r23646,%r23647,%r23648,%r23649,%r23650,%r23651,%r23652,%r23653,%r23654,%r23655,%r23656,%r23657,%r23658,%r23659,%r23660,%r23661,%r23662,%r23663,%r23664,%r23665,%r23666,%r23667,%r23668,%r23669,%r23670,%r23671,%r23672,%r23673,%r23674,%r23675,%r23676,%r23677,%r23678,%r23679,%r23680,%r23681,%r23682,%r23683,%r23684,%r23685,%r23686,%r23687,%r23688,%r23689,%r23690,%r23691,%r23692,%r23693,%r23694,%r23695,%r23696,%r23697,%r23698,%r23699,%r23700,%r23701,%r23702,%r23703,%r23704}, {%r13579,%r13580,%r13581,%r13582}, %rd2, %p44, 1, 1; 2026-02-21T09:19:14.1986634Z // end inline asm 2026-02-21T09:19:14.1986695Z // begin inline asm 2026-02-21T09:19:14.1988189Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23705,%r23706,%r23707,%r23708,%r23709,%r23710,%r23711,%r23712,%r23713,%r23714,%r23715,%r23716,%r23717,%r23718,%r23719,%r23720,%r23721,%r23722,%r23723,%r23724,%r23725,%r23726,%r23727,%r23728,%r23729,%r23730,%r23731,%r23732,%r23733,%r23734,%r23735,%r23736,%r23737,%r23738,%r23739,%r23740,%r23741,%r23742,%r23743,%r23744,%r23745,%r23746,%r23747,%r23748,%r23749,%r23750,%r23751,%r23752,%r23753,%r23754,%r23755,%r23756,%r23757,%r23758,%r23759,%r23760,%r23761,%r23762,%r23763,%r23764,%r23765,%r23766,%r23767,%r23768}, {%r13711,%r13712,%r13713,%r13714}, %rd1, %p44, 1, 1; 2026-02-21T09:19:14.1988254Z // end inline asm 2026-02-21T09:19:14.1988391Z // begin inline asm 2026-02-21T09:19:14.1989978Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r23705,%r23706,%r23707,%r23708,%r23709,%r23710,%r23711,%r23712,%r23713,%r23714,%r23715,%r23716,%r23717,%r23718,%r23719,%r23720,%r23721,%r23722,%r23723,%r23724,%r23725,%r23726,%r23727,%r23728,%r23729,%r23730,%r23731,%r23732,%r23733,%r23734,%r23735,%r23736,%r23737,%r23738,%r23739,%r23740,%r23741,%r23742,%r23743,%r23744,%r23745,%r23746,%r23747,%r23748,%r23749,%r23750,%r23751,%r23752,%r23753,%r23754,%r23755,%r23756,%r23757,%r23758,%r23759,%r23760,%r23761,%r23762,%r23763,%r23764,%r23765,%r23766,%r23767,%r23768}, {%r13843,%r13844,%r13845,%r13846}, %rd2, %p44, 1, 1; 2026-02-21T09:19:14.1990109Z // end inline asm 2026-02-21T09:19:14.1990191Z wgmma.commit_group.sync.aligned; 2026-02-21T09:19:14.1990314Z mov.b32 %r14103, %r2973; 2026-02-21T09:19:14.1990380Z mov.b32 %r14104, %r14105; 2026-02-21T09:19:14.1990439Z // begin inline asm 2026-02-21T09:19:14.1995483Z // wait for regs: %r23513,%r23514,%r23515,%r23516,%r23517,%r23518,%r23519,%r23520,%r23521,%r23522,%r23523,%r23524,%r23525,%r23526,%r23527,%r23528,%r23529,%r23530,%r23531,%r23532,%r23533,%r23534,%r23535,%r23536,%r23537,%r23538,%r23539,%r23540,%r23541,%r23542,%r23543,%r23544,%r23545,%r23546,%r23547,%r23548,%r23549,%r23550,%r23551,%r23552,%r23553,%r23554,%r23555,%r23556,%r23557,%r23558,%r23559,%r23560,%r23561,%r23562,%r23563,%r23564,%r23565,%r23566,%r23567,%r23568,%r23569,%r23570,%r23571,%r23572,%r23573,%r23574,%r23575,%r23576,%r23577,%r23578,%r23579,%r23580,%r23581,%r23582,%r23583,%r23584,%r23585,%r23586,%r23587,%r23588,%r23589,%r23590,%r23591,%r23592,%r23593,%r23594,%r23595,%r23596,%r23597,%r23598,%r23599,%r23600,%r23601,%r23602,%r23603,%r23604,%r23605,%r23606,%r23607,%r23608,%r23609,%r23610,%r23611,%r23612,%r23613,%r23614,%r23615,%r23616,%r23617,%r23618,%r23619,%r23620,%r23621,%r23622,%r23623,%r23624,%r23625,%r23626,%r23627,%r23628,%r23629,%r23630,%r23631,%r23632,%r23633,%r23634,%r23635,%r23636,%r23637,%r23638,%r23639,%r23640,%r23641,%r23642,%r23643,%r23644,%r23645,%r23646,%r23647,%r23648,%r23649,%r23650,%r23651,%r2365
2,%r23653,%r23654,%r23655,%r23656,%r23657,%r23658,%r23659,%r23660,%r23661,%r23662,%r23663,%r23664,%r23665,%r23666,%r23667,%r23668,%r23669,%r23670,%r23671,%r23672,%r23673,%r23674,%r23675,%r23676,%r23677,%r23678,%r23679,%r23680,%r23681,%r23682,%r23683,%r23684,%r23685,%r23686,%r23687,%r23688,%r23689,%r23690,%r23691,%r23692,%r23693,%r23694,%r23695,%r23696,%r23697,%r23698,%r23699,%r23700,%r23701,%r23702,%r23703,%r23704,%r23705,%r23706,%r23707,%r23708,%r23709,%r23710,%r23711,%r23712,%r23713,%r23714,%r23715,%r23716,%r23717,%r23718,%r23719,%r23720,%r23721,%r23722,%r23723,%r23724,%r23725,%r23726,%r23727,%r23728,%r23729,%r23730,%r23731,%r23732,%r23733,%r23734,%r23735,%r23736,%r23737,%r23738,%r23739,%r23740,%r23741,%r23742,%r23743,%r23744,%r23745,%r23746,%r23747,%r23748,%r23749,%r23750,%r23751,%r23752,%r23753,%r23754,%r23755,%r23756,%r23757,%r23758,%r23759,%r23760,%r23761,%r23762,%r23763,%r23764,%r23765,%r23766,%r23767,%r23768,%r14103,%r14104,%r14105 2026-02-21T09:19:14.1995576Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:19:14.1995633Z // end inline asm 2026-02-21T09:19:14.1995695Z $L__tmp12: 2026-02-21T09:19:14.1995905Z .loc 1 43 78 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:43:78 2026-02-21T09:19:14.1995973Z add.s32 %r14436, %r23512, 1; 2026-02-21T09:19:14.1996041Z setp.gt.s32 %p63, %r14436, 4; 2026-02-21T09:19:14.1996119Z selp.b32 %r23512, 0, %r14436, %p63; 2026-02-21T09:19:14.1996321Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.1996391Z add.s32 %r14437, %r23509, -16; 2026-02-21T09:19:14.1996579Z add.s64 %rd652, %rd1118, %rd36; 2026-02-21T09:19:14.1996649Z add.s64 %rd634, %rd652, 320; 2026-02-21T09:19:14.1996716Z add.s64 %rd653, %rd1118, %rd35; 2026-02-21T09:19:14.1996785Z add.s64 %rd635, %rd653, 320; 2026-02-21T09:19:14.1996848Z add.s64 %rd654, %rd1118, %rd34; 2026-02-21T09:19:14.1996910Z add.s64 %rd636, %rd654, 320; 2026-02-21T09:19:14.1997050Z add.s64 %rd655, %rd1118, %rd33; 
2026-02-21T09:19:14.1997117Z add.s64 %rd637, %rd655, 320; 2026-02-21T09:19:14.1997180Z add.s64 %rd656, %rd1118, %rd32; 2026-02-21T09:19:14.1997242Z add.s64 %rd638, %rd656, 320; 2026-02-21T09:19:14.1997373Z add.s64 %rd657, %rd1118, %rd31; 2026-02-21T09:19:14.1997434Z add.s64 %rd639, %rd657, 320; 2026-02-21T09:19:14.1997497Z add.s64 %rd658, %rd1118, %rd30; 2026-02-21T09:19:14.1997558Z add.s64 %rd640, %rd658, 320; 2026-02-21T09:19:14.1997637Z mad.wide.s32 %rd641, %r14437, 2, %rd64; 2026-02-21T09:19:14.1997838Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.1997901Z shl.b32 %r14438, %r23512, 14; 2026-02-21T09:19:14.1998040Z add.s32 %r14439, %r22967, %r14438; 2026-02-21T09:19:14.1998106Z add.s32 %r14365, %r14439, %r56; 2026-02-21T09:19:14.1998170Z selp.b32 %r14366, 8, 0, %p61; 2026-02-21T09:19:14.1998234Z // begin inline asm 2026-02-21T09:19:14.1998391Z cp.async.ca.shared.global [ %r14365 + 0 ], [ %rd634 + 0 ], 0x8, %r14366; 2026-02-21T09:19:14.1998452Z // end inline asm 2026-02-21T09:19:14.1998514Z add.s32 %r14367, %r14365, 2048; 2026-02-21T09:19:14.1998581Z // begin inline asm 2026-02-21T09:19:14.1998725Z cp.async.ca.shared.global [ %r14367 + 0 ], [ %rd635 + 0 ], 0x8, %r14366; 2026-02-21T09:19:14.1998781Z // end inline asm 2026-02-21T09:19:14.1998847Z add.s32 %r14369, %r14365, 4096; 2026-02-21T09:19:14.1998907Z // begin inline asm 2026-02-21T09:19:14.1999126Z cp.async.ca.shared.global [ %r14369 + 0 ], [ %rd636 + 0 ], 0x8, %r14366; 2026-02-21T09:19:14.1999194Z // end inline asm 2026-02-21T09:19:14.1999255Z add.s32 %r14371, %r14365, 6144; 2026-02-21T09:19:14.1999314Z // begin inline asm 2026-02-21T09:19:14.1999457Z cp.async.ca.shared.global [ %r14371 + 0 ], [ %rd637 + 0 ], 0x8, %r14366; 2026-02-21T09:19:14.1999514Z // end inline asm 2026-02-21T09:19:14.1999577Z add.s32 %r14373, %r14365, 8192; 2026-02-21T09:19:14.1999637Z // begin inline asm 2026-02-21T09:19:14.1999781Z cp.async.ca.shared.global [ %r14373 + 0 ], [ %rd638 
+ 0 ], 0x8, %r14366; 2026-02-21T09:19:14.1999840Z // end inline asm 2026-02-21T09:19:14.1999906Z add.s32 %r14375, %r14365, 10240; 2026-02-21T09:19:14.1999968Z // begin inline asm 2026-02-21T09:19:14.2000108Z cp.async.ca.shared.global [ %r14375 + 0 ], [ %rd639 + 0 ], 0x8, %r14366; 2026-02-21T09:19:14.2000165Z // end inline asm 2026-02-21T09:19:14.2000229Z add.s32 %r14377, %r14365, 12288; 2026-02-21T09:19:14.2000293Z // begin inline asm 2026-02-21T09:19:14.2000429Z cp.async.ca.shared.global [ %r14377 + 0 ], [ %rd640 + 0 ], 0x8, %r14366; 2026-02-21T09:19:14.2000486Z // end inline asm 2026-02-21T09:19:14.2000553Z add.s32 %r14379, %r14365, 14336; 2026-02-21T09:19:14.2000614Z // begin inline asm 2026-02-21T09:19:14.2000749Z cp.async.ca.shared.global [ %r14379 + 0 ], [ %rd641 + 0 ], 0x8, %r14366; 2026-02-21T09:19:14.2000811Z // end inline asm 2026-02-21T09:19:14.2000878Z cp.async.commit_group; 2026-02-21T09:19:14.2001087Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2001154Z add.s32 %r14440, %r23510, -65536; 2026-02-21T09:19:14.2001227Z cvt.s64.s32 %rd659, %r14440; 2026-02-21T09:19:14.2001295Z add.s64 %rd642, %rd65, %rd659; 2026-02-21T09:19:14.2001493Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2001559Z shl.b32 %r14441, %r23512, 10; 2026-02-21T09:19:14.2001621Z add.s32 %r14381, %r66, %r14441; 2026-02-21T09:19:14.2001688Z selp.b32 %r14382, 4, 0, %p61; 2026-02-21T09:19:14.2001746Z // begin inline asm 2026-02-21T09:19:14.2001887Z cp.async.ca.shared.global [ %r14381 + 0 ], [ %rd642 + 0 ], 0x4, %r14382; 2026-02-21T09:19:14.2001945Z // end inline asm 2026-02-21T09:19:14.2002010Z cp.async.commit_group; 2026-02-21T09:19:14.2002214Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.2002355Z add.s64 %rd643, %rd652, 352; 2026-02-21T09:19:14.2002419Z add.s64 %rd644, %rd653, 352; 2026-02-21T09:19:14.2002486Z add.s64 %rd645, %rd654, 
352; 2026-02-21T09:19:14.2002547Z add.s64 %rd646, %rd655, 352; 2026-02-21T09:19:14.2002609Z add.s64 %rd647, %rd656, 352; 2026-02-21T09:19:14.2002721Z add.s64 %rd648, %rd657, 352; 2026-02-21T09:19:14.2002786Z add.s64 %rd649, %rd658, 352; 2026-02-21T09:19:14.2002858Z mad.wide.s32 %rd650, %r23509, 2, %rd64; 2026-02-21T09:19:14.2003056Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2003124Z add.s32 %r14442, %r6453, %r14438; 2026-02-21T09:19:14.2003185Z add.s32 %r14383, %r14442, %r56; 2026-02-21T09:19:14.2003243Z // begin inline asm 2026-02-21T09:19:14.2003457Z cp.async.ca.shared.global [ %r14383 + 0 ], [ %rd643 + 0 ], 0x8, %r14366; 2026-02-21T09:19:14.2003517Z // end inline asm 2026-02-21T09:19:14.2003580Z add.s32 %r14385, %r14383, 2048; 2026-02-21T09:19:14.2003642Z // begin inline asm 2026-02-21T09:19:14.2003783Z cp.async.ca.shared.global [ %r14385 + 0 ], [ %rd644 + 0 ], 0x8, %r14366; 2026-02-21T09:19:14.2003840Z // end inline asm 2026-02-21T09:19:14.2003901Z add.s32 %r14387, %r14383, 4096; 2026-02-21T09:19:14.2003965Z // begin inline asm 2026-02-21T09:19:14.2004100Z cp.async.ca.shared.global [ %r14387 + 0 ], [ %rd645 + 0 ], 0x8, %r14366; 2026-02-21T09:19:14.2004158Z // end inline asm 2026-02-21T09:19:14.2004218Z add.s32 %r14389, %r14383, 6144; 2026-02-21T09:19:14.2004280Z // begin inline asm 2026-02-21T09:19:14.2004468Z cp.async.ca.shared.global [ %r14389 + 0 ], [ %rd646 + 0 ], 0x8, %r14366; 2026-02-21T09:19:14.2004527Z // end inline asm 2026-02-21T09:19:14.2004594Z add.s32 %r14391, %r14383, 8192; 2026-02-21T09:19:14.2004655Z // begin inline asm 2026-02-21T09:19:14.2004791Z cp.async.ca.shared.global [ %r14391 + 0 ], [ %rd647 + 0 ], 0x8, %r14366; 2026-02-21T09:19:14.2004849Z // end inline asm 2026-02-21T09:19:14.2004911Z add.s32 %r14393, %r14383, 10240; 2026-02-21T09:19:14.2004972Z // begin inline asm 2026-02-21T09:19:14.2005107Z cp.async.ca.shared.global [ %r14393 + 0 ], [ %rd648 + 0 ], 0x8, %r14366; 
2026-02-21T09:19:14.2005175Z // end inline asm 2026-02-21T09:19:14.2005243Z add.s32 %r14395, %r14383, 12288; 2026-02-21T09:19:14.2005304Z // begin inline asm 2026-02-21T09:19:14.2005444Z cp.async.ca.shared.global [ %r14395 + 0 ], [ %rd649 + 0 ], 0x8, %r14366; 2026-02-21T09:19:14.2005500Z // end inline asm 2026-02-21T09:19:14.2005562Z add.s32 %r14397, %r14383, 14336; 2026-02-21T09:19:14.2005621Z // begin inline asm 2026-02-21T09:19:14.2005762Z cp.async.ca.shared.global [ %r14397 + 0 ], [ %rd650 + 0 ], 0x8, %r14366; 2026-02-21T09:19:14.2005819Z // end inline asm 2026-02-21T09:19:14.2005884Z cp.async.commit_group; 2026-02-21T09:19:14.2006086Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2006149Z cvt.s64.s32 %rd660, %r23510; 2026-02-21T09:19:14.2006215Z add.s64 %rd651, %rd65, %rd660; 2026-02-21T09:19:14.2006414Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2006601Z add.s32 %r14399, %r76, %r14441; 2026-02-21T09:19:14.2006664Z // begin inline asm 2026-02-21T09:19:14.2006806Z cp.async.ca.shared.global [ %r14399 + 0 ], [ %rd651 + 0 ], 0x4, %r14382; 2026-02-21T09:19:14.2006867Z // end inline asm 2026-02-21T09:19:14.2006934Z cp.async.commit_group; 2026-02-21T09:19:14.2007130Z .loc 1 43 78 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:43:78 2026-02-21T09:19:14.2007197Z add.s32 %r23510, %r23510, 131072; 2026-02-21T09:19:14.2007259Z add.s64 %rd1118, %rd1118, 64; 2026-02-21T09:19:14.2007325Z add.s32 %r23509, %r23509, 32; 2026-02-21T09:19:14.2007393Z setp.lt.u64 %p64, %rd1119, 496; 2026-02-21T09:19:14.2007454Z @%p64 bra $L__BB0_7; 2026-02-21T09:19:14.2007562Z // %bb.8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:19:14.2007850Z .loc 1 34 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:34:32 2026-02-21T09:19:14.2007912Z or.b32 %r14914, %r1375, %r9; 2026-02-21T09:19:14.2008108Z .loc 1 36 32 // 
co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:36:32 2026-02-21T09:19:14.2008232Z or.b32 %r14915, %r1376, %r19; 2026-02-21T09:19:14.2008293Z or.b32 %r14916, %r1376, %r20; 2026-02-21T09:19:14.2008356Z or.b32 %r14917, %r1376, %r21; 2026-02-21T09:19:14.2008430Z or.b32 %r14918, %r1376, %r22; 2026-02-21T09:19:14.2008492Z or.b32 %r14919, %r1376, %r23; 2026-02-21T09:19:14.2008550Z or.b32 %r14920, %r1376, %r24; 2026-02-21T09:19:14.2008609Z or.b32 %r14921, %r1376, %r25; 2026-02-21T09:19:14.2008674Z or.b32 %r14922, %r1376, %r26; 2026-02-21T09:19:14.2008798Z or.b32 %r14923, %r1376, %r27; 2026-02-21T09:19:14.2008861Z or.b32 %r14924, %r1376, %r28; 2026-02-21T09:19:14.2008923Z or.b32 %r14925, %r1376, %r29; 2026-02-21T09:19:14.2008985Z or.b32 %r14926, %r1376, %r30; 2026-02-21T09:19:14.2009053Z or.b32 %r14927, %r1376, %r31; 2026-02-21T09:19:14.2009114Z or.b32 %r14928, %r1376, %r32; 2026-02-21T09:19:14.2009174Z or.b32 %r14929, %r1376, %r33; 2026-02-21T09:19:14.2009231Z or.b32 %r14930, %r1376, %r34; 2026-02-21T09:19:14.2009292Z or.b32 %r14931, %r1376, %r35; 2026-02-21T09:19:14.2009355Z or.b32 %r14932, %r1376, %r36; 2026-02-21T09:19:14.2009413Z or.b32 %r14933, %r1376, %r37; 2026-02-21T09:19:14.2009471Z or.b32 %r14934, %r1376, %r38; 2026-02-21T09:19:14.2009533Z or.b32 %r14935, %r1376, %r39; 2026-02-21T09:19:14.2009654Z or.b32 %r14936, %r1376, %r40; 2026-02-21T09:19:14.2009715Z or.b32 %r14937, %r1376, %r41; 2026-02-21T09:19:14.2009773Z or.b32 %r14938, %r1376, %r42; 2026-02-21T09:19:14.2009833Z or.b32 %r14939, %r1376, %r43; 2026-02-21T09:19:14.2009895Z or.b32 %r14940, %r1376, %r44; 2026-02-21T09:19:14.2009954Z or.b32 %r14941, %r1376, %r45; 2026-02-21T09:19:14.2010015Z or.b32 %r14942, %r1376, %r46; 2026-02-21T09:19:14.2010075Z or.b32 %r14943, %r1376, %r47; 2026-02-21T09:19:14.2010133Z or.b32 %r14944, %r1376, %r48; 2026-02-21T09:19:14.2010193Z or.b32 %r14945, %r1376, %r49; 2026-02-21T09:19:14.2010255Z or.b32 %r14946, %r1376, %r50; 2026-02-21T09:19:14.2010453Z .loc 1 43 78 // 
co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:43:78 2026-02-21T09:19:14.2010523Z cp.async.wait_group 0; 2026-02-21T09:19:14.2010582Z bar.sync 0; 2026-02-21T09:19:14.2010780Z .loc 1 90 28 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:90:28 2026-02-21T09:19:14.2010866Z cvt.rn.bf16x2.f32 %r14947, %r23514, %r23513; 2026-02-21T09:19:14.2010949Z cvt.rn.bf16x2.f32 %r14948, %r23516, %r23515; 2026-02-21T09:19:14.2011026Z cvt.rn.bf16x2.f32 %r14949, %r23518, %r23517; 2026-02-21T09:19:14.2011102Z cvt.rn.bf16x2.f32 %r14950, %r23520, %r23519; 2026-02-21T09:19:14.2011177Z cvt.rn.bf16x2.f32 %r14951, %r23522, %r23521; 2026-02-21T09:19:14.2011255Z cvt.rn.bf16x2.f32 %r14952, %r23524, %r23523; 2026-02-21T09:19:14.2011334Z cvt.rn.bf16x2.f32 %r14953, %r23526, %r23525; 2026-02-21T09:19:14.2011409Z cvt.rn.bf16x2.f32 %r14954, %r23528, %r23527; 2026-02-21T09:19:14.2011487Z cvt.rn.bf16x2.f32 %r14955, %r23530, %r23529; 2026-02-21T09:19:14.2011564Z cvt.rn.bf16x2.f32 %r14956, %r23532, %r23531; 2026-02-21T09:19:14.2011638Z cvt.rn.bf16x2.f32 %r14957, %r23534, %r23533; 2026-02-21T09:19:14.2011715Z cvt.rn.bf16x2.f32 %r14958, %r23536, %r23535; 2026-02-21T09:19:14.2011790Z cvt.rn.bf16x2.f32 %r14959, %r23538, %r23537; 2026-02-21T09:19:14.2011866Z cvt.rn.bf16x2.f32 %r14960, %r23540, %r23539; 2026-02-21T09:19:14.2011941Z cvt.rn.bf16x2.f32 %r14961, %r23542, %r23541; 2026-02-21T09:19:14.2012017Z cvt.rn.bf16x2.f32 %r14962, %r23544, %r23543; 2026-02-21T09:19:14.2012095Z cvt.rn.bf16x2.f32 %r14963, %r23546, %r23545; 2026-02-21T09:19:14.2012169Z cvt.rn.bf16x2.f32 %r14964, %r23548, %r23547; 2026-02-21T09:19:14.2012247Z cvt.rn.bf16x2.f32 %r14965, %r23550, %r23549; 2026-02-21T09:19:14.2012382Z cvt.rn.bf16x2.f32 %r14966, %r23552, %r23551; 2026-02-21T09:19:14.2012457Z cvt.rn.bf16x2.f32 %r14967, %r23554, %r23553; 2026-02-21T09:19:14.2012534Z cvt.rn.bf16x2.f32 %r14968, %r23556, %r23555; 2026-02-21T09:19:14.2012609Z cvt.rn.bf16x2.f32 %r14969, %r23558, %r23557; 2026-02-21T09:19:14.2012731Z 
cvt.rn.bf16x2.f32 %r14970, %r23560, %r23559; 2026-02-21T09:19:14.2012806Z cvt.rn.bf16x2.f32 %r14971, %r23562, %r23561; 2026-02-21T09:19:14.2012883Z cvt.rn.bf16x2.f32 %r14972, %r23564, %r23563; 2026-02-21T09:19:14.2012958Z cvt.rn.bf16x2.f32 %r14973, %r23566, %r23565; 2026-02-21T09:19:14.2013034Z cvt.rn.bf16x2.f32 %r14974, %r23568, %r23567; 2026-02-21T09:19:14.2013111Z cvt.rn.bf16x2.f32 %r14975, %r23570, %r23569; 2026-02-21T09:19:14.2013185Z cvt.rn.bf16x2.f32 %r14976, %r23572, %r23571; 2026-02-21T09:19:14.2013308Z cvt.rn.bf16x2.f32 %r14977, %r23574, %r23573; 2026-02-21T09:19:14.2013388Z cvt.rn.bf16x2.f32 %r14978, %r23576, %r23575; 2026-02-21T09:19:14.2013463Z cvt.rn.bf16x2.f32 %r14979, %r23578, %r23577; 2026-02-21T09:19:14.2013540Z cvt.rn.bf16x2.f32 %r14980, %r23580, %r23579; 2026-02-21T09:19:14.2013615Z cvt.rn.bf16x2.f32 %r14981, %r23582, %r23581; 2026-02-21T09:19:14.2013693Z cvt.rn.bf16x2.f32 %r14982, %r23584, %r23583; 2026-02-21T09:19:14.2013770Z cvt.rn.bf16x2.f32 %r14983, %r23586, %r23585; 2026-02-21T09:19:14.2013846Z cvt.rn.bf16x2.f32 %r14984, %r23588, %r23587; 2026-02-21T09:19:14.2013925Z cvt.rn.bf16x2.f32 %r14985, %r23590, %r23589; 2026-02-21T09:19:14.2013999Z cvt.rn.bf16x2.f32 %r14986, %r23592, %r23591; 2026-02-21T09:19:14.2014123Z cvt.rn.bf16x2.f32 %r14987, %r23594, %r23593; 2026-02-21T09:19:14.2014201Z cvt.rn.bf16x2.f32 %r14988, %r23596, %r23595; 2026-02-21T09:19:14.2014293Z cvt.rn.bf16x2.f32 %r14989, %r23598, %r23597; 2026-02-21T09:19:14.2014371Z cvt.rn.bf16x2.f32 %r14990, %r23600, %r23599; 2026-02-21T09:19:14.2014448Z cvt.rn.bf16x2.f32 %r14991, %r23602, %r23601; 2026-02-21T09:19:14.2014532Z cvt.rn.bf16x2.f32 %r14992, %r23604, %r23603; 2026-02-21T09:19:14.2014607Z cvt.rn.bf16x2.f32 %r14993, %r23606, %r23605; 2026-02-21T09:19:14.2014683Z cvt.rn.bf16x2.f32 %r14994, %r23608, %r23607; 2026-02-21T09:19:14.2014760Z cvt.rn.bf16x2.f32 %r14995, %r23610, %r23609; 2026-02-21T09:19:14.2014835Z cvt.rn.bf16x2.f32 %r14996, %r23612, %r23611; 2026-02-21T09:19:14.2014912Z 
cvt.rn.bf16x2.f32 %r14997, %r23614, %r23613; 2026-02-21T09:19:14.2014986Z cvt.rn.bf16x2.f32 %r14998, %r23616, %r23615; 2026-02-21T09:19:14.2015066Z cvt.rn.bf16x2.f32 %r14999, %r23618, %r23617; 2026-02-21T09:19:14.2015140Z cvt.rn.bf16x2.f32 %r15000, %r23620, %r23619; 2026-02-21T09:19:14.2015215Z cvt.rn.bf16x2.f32 %r15001, %r23622, %r23621; 2026-02-21T09:19:14.2015293Z cvt.rn.bf16x2.f32 %r15002, %r23624, %r23623; 2026-02-21T09:19:14.2015367Z cvt.rn.bf16x2.f32 %r15003, %r23626, %r23625; 2026-02-21T09:19:14.2015443Z cvt.rn.bf16x2.f32 %r15004, %r23628, %r23627; 2026-02-21T09:19:14.2015522Z cvt.rn.bf16x2.f32 %r15005, %r23630, %r23629; 2026-02-21T09:19:14.2015597Z cvt.rn.bf16x2.f32 %r15006, %r23632, %r23631; 2026-02-21T09:19:14.2015673Z cvt.rn.bf16x2.f32 %r15007, %r23634, %r23633; 2026-02-21T09:19:14.2015747Z cvt.rn.bf16x2.f32 %r15008, %r23636, %r23635; 2026-02-21T09:19:14.2015824Z cvt.rn.bf16x2.f32 %r15009, %r23638, %r23637; 2026-02-21T09:19:14.2015899Z cvt.rn.bf16x2.f32 %r15010, %r23640, %r23639; 2026-02-21T09:19:14.2015972Z cvt.rn.bf16x2.f32 %r15011, %r23642, %r23641; 2026-02-21T09:19:14.2016049Z cvt.rn.bf16x2.f32 %r15012, %r23644, %r23643; 2026-02-21T09:19:14.2016123Z cvt.rn.bf16x2.f32 %r15013, %r23646, %r23645; 2026-02-21T09:19:14.2016199Z cvt.rn.bf16x2.f32 %r15014, %r23648, %r23647; 2026-02-21T09:19:14.2016276Z cvt.rn.bf16x2.f32 %r15015, %r23650, %r23649; 2026-02-21T09:19:14.2016351Z cvt.rn.bf16x2.f32 %r15016, %r23652, %r23651; 2026-02-21T09:19:14.2016426Z cvt.rn.bf16x2.f32 %r15017, %r23654, %r23653; 2026-02-21T09:19:14.2016625Z cvt.rn.bf16x2.f32 %r15018, %r23656, %r23655; 2026-02-21T09:19:14.2016706Z cvt.rn.bf16x2.f32 %r15019, %r23658, %r23657; 2026-02-21T09:19:14.2016874Z cvt.rn.bf16x2.f32 %r15020, %r23660, %r23659; 2026-02-21T09:19:14.2016952Z cvt.rn.bf16x2.f32 %r15021, %r23662, %r23661; 2026-02-21T09:19:14.2017031Z cvt.rn.bf16x2.f32 %r15022, %r23664, %r23663; 2026-02-21T09:19:14.2017104Z cvt.rn.bf16x2.f32 %r15023, %r23666, %r23665; 2026-02-21T09:19:14.2017241Z 
cvt.rn.bf16x2.f32 %r15024, %r23668, %r23667; 2026-02-21T09:19:14.2017319Z cvt.rn.bf16x2.f32 %r15025, %r23670, %r23669; 2026-02-21T09:19:14.2017394Z cvt.rn.bf16x2.f32 %r15026, %r23672, %r23671; 2026-02-21T09:19:14.2017468Z cvt.rn.bf16x2.f32 %r15027, %r23674, %r23673; 2026-02-21T09:19:14.2017543Z cvt.rn.bf16x2.f32 %r15028, %r23676, %r23675; 2026-02-21T09:19:14.2017619Z cvt.rn.bf16x2.f32 %r15029, %r23678, %r23677; 2026-02-21T09:19:14.2017693Z cvt.rn.bf16x2.f32 %r15030, %r23680, %r23679; 2026-02-21T09:19:14.2017831Z cvt.rn.bf16x2.f32 %r15031, %r23682, %r23681; 2026-02-21T09:19:14.2017912Z cvt.rn.bf16x2.f32 %r15032, %r23684, %r23683; 2026-02-21T09:19:14.2017986Z cvt.rn.bf16x2.f32 %r15033, %r23686, %r23685; 2026-02-21T09:19:14.2018063Z cvt.rn.bf16x2.f32 %r15034, %r23688, %r23687; 2026-02-21T09:19:14.2018139Z cvt.rn.bf16x2.f32 %r15035, %r23690, %r23689; 2026-02-21T09:19:14.2018213Z cvt.rn.bf16x2.f32 %r15036, %r23692, %r23691; 2026-02-21T09:19:14.2018289Z cvt.rn.bf16x2.f32 %r15037, %r23694, %r23693; 2026-02-21T09:19:14.2018366Z cvt.rn.bf16x2.f32 %r15038, %r23696, %r23695; 2026-02-21T09:19:14.2018443Z cvt.rn.bf16x2.f32 %r15039, %r23698, %r23697; 2026-02-21T09:19:14.2018517Z cvt.rn.bf16x2.f32 %r15040, %r23700, %r23699; 2026-02-21T09:19:14.2018654Z cvt.rn.bf16x2.f32 %r15041, %r23702, %r23701; 2026-02-21T09:19:14.2018739Z cvt.rn.bf16x2.f32 %r15042, %r23704, %r23703; 2026-02-21T09:19:14.2018814Z cvt.rn.bf16x2.f32 %r15043, %r23706, %r23705; 2026-02-21T09:19:14.2018890Z cvt.rn.bf16x2.f32 %r15044, %r23708, %r23707; 2026-02-21T09:19:14.2018967Z cvt.rn.bf16x2.f32 %r15045, %r23710, %r23709; 2026-02-21T09:19:14.2019041Z cvt.rn.bf16x2.f32 %r15046, %r23712, %r23711; 2026-02-21T09:19:14.2019116Z cvt.rn.bf16x2.f32 %r15047, %r23714, %r23713; 2026-02-21T09:19:14.2019191Z cvt.rn.bf16x2.f32 %r15048, %r23716, %r23715; 2026-02-21T09:19:14.2019267Z cvt.rn.bf16x2.f32 %r15049, %r23718, %r23717; 2026-02-21T09:19:14.2019340Z cvt.rn.bf16x2.f32 %r15050, %r23720, %r23719; 2026-02-21T09:19:14.2019418Z 
cvt.rn.bf16x2.f32 %r15051, %r23722, %r23721; 2026-02-21T09:19:14.2019499Z cvt.rn.bf16x2.f32 %r15052, %r23724, %r23723; 2026-02-21T09:19:14.2019576Z cvt.rn.bf16x2.f32 %r15053, %r23726, %r23725; 2026-02-21T09:19:14.2019650Z cvt.rn.bf16x2.f32 %r15054, %r23728, %r23727; 2026-02-21T09:19:14.2019727Z cvt.rn.bf16x2.f32 %r15055, %r23730, %r23729; 2026-02-21T09:19:14.2019803Z cvt.rn.bf16x2.f32 %r15056, %r23732, %r23731; 2026-02-21T09:19:14.2019879Z cvt.rn.bf16x2.f32 %r15057, %r23734, %r23733; 2026-02-21T09:19:14.2019954Z cvt.rn.bf16x2.f32 %r15058, %r23736, %r23735; 2026-02-21T09:19:14.2020032Z cvt.rn.bf16x2.f32 %r15059, %r23738, %r23737; 2026-02-21T09:19:14.2020106Z cvt.rn.bf16x2.f32 %r15060, %r23740, %r23739; 2026-02-21T09:19:14.2020182Z cvt.rn.bf16x2.f32 %r15061, %r23742, %r23741; 2026-02-21T09:19:14.2020259Z cvt.rn.bf16x2.f32 %r15062, %r23744, %r23743; 2026-02-21T09:19:14.2020338Z cvt.rn.bf16x2.f32 %r15063, %r23746, %r23745; 2026-02-21T09:19:14.2020416Z cvt.rn.bf16x2.f32 %r15064, %r23748, %r23747; 2026-02-21T09:19:14.2020489Z cvt.rn.bf16x2.f32 %r15065, %r23750, %r23749; 2026-02-21T09:19:14.2020567Z cvt.rn.bf16x2.f32 %r15066, %r23752, %r23751; 2026-02-21T09:19:14.2020643Z cvt.rn.bf16x2.f32 %r15067, %r23754, %r23753; 2026-02-21T09:19:14.2020718Z cvt.rn.bf16x2.f32 %r15068, %r23756, %r23755; 2026-02-21T09:19:14.2020807Z cvt.rn.bf16x2.f32 %r15069, %r23758, %r23757; 2026-02-21T09:19:14.2020884Z cvt.rn.bf16x2.f32 %r15070, %r23760, %r23759; 2026-02-21T09:19:14.2020961Z cvt.rn.bf16x2.f32 %r15071, %r23762, %r23761; 2026-02-21T09:19:14.2021038Z cvt.rn.bf16x2.f32 %r15072, %r23764, %r23763; 2026-02-21T09:19:14.2021114Z cvt.rn.bf16x2.f32 %r15073, %r23766, %r23765; 2026-02-21T09:19:14.2021248Z cvt.rn.bf16x2.f32 %r15074, %r23768, %r23767; 2026-02-21T09:19:14.2021464Z .loc 1 91 43 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:91:43 2026-02-21T09:19:14.2021531Z shl.b32 %r15075, %r14915, 13; 2026-02-21T09:19:14.2021657Z shl.b32 %r15076, %r14916, 13; 
2026-02-21T09:19:14.2021717Z shl.b32 %r15077, %r14917, 13; 2026-02-21T09:19:14.2021779Z shl.b32 %r15078, %r14918, 13; 2026-02-21T09:19:14.2021837Z shl.b32 %r15079, %r14919, 13; 2026-02-21T09:19:14.2021895Z shl.b32 %r15080, %r14920, 13; 2026-02-21T09:19:14.2021956Z shl.b32 %r15081, %r14921, 13; 2026-02-21T09:19:14.2022016Z shl.b32 %r15082, %r14922, 13; 2026-02-21T09:19:14.2022074Z shl.b32 %r15083, %r14923, 13; 2026-02-21T09:19:14.2022131Z shl.b32 %r15084, %r14924, 13; 2026-02-21T09:19:14.2022247Z shl.b32 %r15085, %r14925, 13; 2026-02-21T09:19:14.2022310Z shl.b32 %r15086, %r14926, 13; 2026-02-21T09:19:14.2022370Z shl.b32 %r15087, %r14927, 13; 2026-02-21T09:19:14.2022434Z shl.b32 %r15088, %r14928, 13; 2026-02-21T09:19:14.2022493Z shl.b32 %r15089, %r14929, 13; 2026-02-21T09:19:14.2022552Z shl.b32 %r15090, %r14930, 13; 2026-02-21T09:19:14.2022611Z shl.b32 %r15091, %r14931, 13; 2026-02-21T09:19:14.2022673Z shl.b32 %r15092, %r14932, 13; 2026-02-21T09:19:14.2022733Z shl.b32 %r15093, %r14933, 13; 2026-02-21T09:19:14.2022792Z shl.b32 %r15094, %r14934, 13; 2026-02-21T09:19:14.2022854Z shl.b32 %r15095, %r14935, 13; 2026-02-21T09:19:14.2022912Z shl.b32 %r15096, %r14936, 13; 2026-02-21T09:19:14.2022971Z shl.b32 %r15097, %r14937, 13; 2026-02-21T09:19:14.2023076Z shl.b32 %r15098, %r14938, 13; 2026-02-21T09:19:14.2023141Z shl.b32 %r15099, %r14939, 13; 2026-02-21T09:19:14.2023200Z shl.b32 %r15100, %r14940, 13; 2026-02-21T09:19:14.2023260Z shl.b32 %r15101, %r14941, 13; 2026-02-21T09:19:14.2023323Z shl.b32 %r15102, %r14942, 13; 2026-02-21T09:19:14.2023381Z shl.b32 %r15103, %r14943, 13; 2026-02-21T09:19:14.2023440Z shl.b32 %r15104, %r14944, 13; 2026-02-21T09:19:14.2023503Z shl.b32 %r15105, %r14945, 13; 2026-02-21T09:19:14.2023561Z shl.b32 %r15106, %r14946, 13; 2026-02-21T09:19:14.2023767Z .loc 1 91 50 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:91:50 2026-02-21T09:19:14.2023834Z add.s32 %r15107, %r15075, %r14914; 2026-02-21T09:19:14.2023900Z add.s32 %r15108, %r15076, 
%r14914; 2026-02-21T09:19:14.2023961Z add.s32 %r15109, %r15077, %r14914; 2026-02-21T09:19:14.2024022Z add.s32 %r15110, %r15078, %r14914; 2026-02-21T09:19:14.2024084Z add.s32 %r15111, %r15079, %r14914; 2026-02-21T09:19:14.2024145Z add.s32 %r15112, %r15080, %r14914; 2026-02-21T09:19:14.2024205Z add.s32 %r15113, %r15081, %r14914; 2026-02-21T09:19:14.2024263Z add.s32 %r15114, %r15082, %r14914; 2026-02-21T09:19:14.2024339Z add.s32 %r15115, %r15083, %r14914; 2026-02-21T09:19:14.2024402Z add.s32 %r15116, %r15084, %r14914; 2026-02-21T09:19:14.2024463Z add.s32 %r15117, %r15085, %r14914; 2026-02-21T09:19:14.2024526Z add.s32 %r15118, %r15086, %r14914; 2026-02-21T09:19:14.2024589Z add.s32 %r15119, %r15087, %r14914; 2026-02-21T09:19:14.2024650Z add.s32 %r15120, %r15088, %r14914; 2026-02-21T09:19:14.2024713Z add.s32 %r15121, %r15089, %r14914; 2026-02-21T09:19:14.2024773Z add.s32 %r15122, %r15090, %r14914; 2026-02-21T09:19:14.2024835Z add.s32 %r15123, %r15091, %r14914; 2026-02-21T09:19:14.2024895Z add.s32 %r15124, %r15092, %r14914; 2026-02-21T09:19:14.2024958Z add.s32 %r15125, %r15093, %r14914; 2026-02-21T09:19:14.2025019Z add.s32 %r15126, %r15094, %r14914; 2026-02-21T09:19:14.2025079Z add.s32 %r15127, %r15095, %r14914; 2026-02-21T09:19:14.2025142Z add.s32 %r15128, %r15096, %r14914; 2026-02-21T09:19:14.2025204Z add.s32 %r15129, %r15097, %r14914; 2026-02-21T09:19:14.2025264Z add.s32 %r15130, %r15098, %r14914; 2026-02-21T09:19:14.2025325Z add.s32 %r15131, %r15099, %r14914; 2026-02-21T09:19:14.2025388Z add.s32 %r15132, %r15100, %r14914; 2026-02-21T09:19:14.2025446Z add.s32 %r15133, %r15101, %r14914; 2026-02-21T09:19:14.2025566Z add.s32 %r15134, %r15102, %r14914; 2026-02-21T09:19:14.2025629Z add.s32 %r15135, %r15103, %r14914; 2026-02-21T09:19:14.2025688Z add.s32 %r15136, %r15104, %r14914; 2026-02-21T09:19:14.2025748Z add.s32 %r15137, %r15105, %r14914; 2026-02-21T09:19:14.2025807Z add.s32 %r15138, %r15106, %r14914; 2026-02-21T09:19:14.2026069Z .loc 1 91 22 // 
co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:91:22 2026-02-21T09:19:14.2026143Z mad.wide.s32 %rd661, %r15107, 2, %rd66; 2026-02-21T09:19:14.2026212Z mad.wide.s32 %rd662, %r15108, 2, %rd66; 2026-02-21T09:19:14.2026287Z mad.wide.s32 %rd663, %r15109, 2, %rd66; 2026-02-21T09:19:14.2026355Z mad.wide.s32 %rd664, %r15110, 2, %rd66; 2026-02-21T09:19:14.2026432Z mad.wide.s32 %rd665, %r15111, 2, %rd66; 2026-02-21T09:19:14.2026685Z mad.wide.s32 %rd666, %r15112, 2, %rd66; 2026-02-21T09:19:14.2026759Z mad.wide.s32 %rd667, %r15113, 2, %rd66; 2026-02-21T09:19:14.2026825Z mad.wide.s32 %rd668, %r15114, 2, %rd66; 2026-02-21T09:19:14.2026894Z mad.wide.s32 %rd669, %r15115, 2, %rd66; 2026-02-21T09:19:14.2026964Z mad.wide.s32 %rd670, %r15116, 2, %rd66; 2026-02-21T09:19:14.2027030Z mad.wide.s32 %rd671, %r15117, 2, %rd66; 2026-02-21T09:19:14.2027096Z mad.wide.s32 %rd672, %r15118, 2, %rd66; 2026-02-21T09:19:14.2027167Z mad.wide.s32 %rd673, %r15119, 2, %rd66; 2026-02-21T09:19:14.2027235Z mad.wide.s32 %rd674, %r15120, 2, %rd66; 2026-02-21T09:19:14.2027301Z mad.wide.s32 %rd675, %r15121, 2, %rd66; 2026-02-21T09:19:14.2027370Z mad.wide.s32 %rd676, %r15122, 2, %rd66; 2026-02-21T09:19:14.2027500Z mad.wide.s32 %rd677, %r15123, 2, %rd66; 2026-02-21T09:19:14.2027570Z mad.wide.s32 %rd678, %r15124, 2, %rd66; 2026-02-21T09:19:14.2027637Z mad.wide.s32 %rd679, %r15125, 2, %rd66; 2026-02-21T09:19:14.2027708Z mad.wide.s32 %rd680, %r15126, 2, %rd66; 2026-02-21T09:19:14.2027777Z mad.wide.s32 %rd681, %r15127, 2, %rd66; 2026-02-21T09:19:14.2027842Z mad.wide.s32 %rd682, %r15128, 2, %rd66; 2026-02-21T09:19:14.2027918Z mad.wide.s32 %rd683, %r15129, 2, %rd66; 2026-02-21T09:19:14.2027992Z mad.wide.s32 %rd684, %r15130, 2, %rd66; 2026-02-21T09:19:14.2028061Z mad.wide.s32 %rd685, %r15131, 2, %rd66; 2026-02-21T09:19:14.2028130Z mad.wide.s32 %rd686, %r15132, 2, %rd66; 2026-02-21T09:19:14.2028198Z mad.wide.s32 %rd687, %r15133, 2, %rd66; 2026-02-21T09:19:14.2028265Z mad.wide.s32 %rd688, %r15134, 2, %rd66; 
2026-02-21T09:19:14.2028330Z mad.wide.s32 %rd689, %r15135, 2, %rd66; 2026-02-21T09:19:14.2028399Z mad.wide.s32 %rd690, %r15136, 2, %rd66; 2026-02-21T09:19:14.2028465Z mad.wide.s32 %rd691, %r15137, 2, %rd66; 2026-02-21T09:19:14.2028606Z mad.wide.s32 %rd692, %r15138, 2, %rd66; 2026-02-21T09:19:14.2028823Z .loc 1 91 81 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:91:81 2026-02-21T09:19:14.2028948Z st.shared.v4.b32 [%r165], {%r14947, %r14949, %r14951, %r14953}; 2026-02-21T09:19:14.2029073Z st.shared.v4.b32 [%r165+512], {%r14948, %r14950, %r14952, %r14954}; 2026-02-21T09:19:14.2029196Z st.shared.v4.b32 [%r166], {%r14955, %r14957, %r14959, %r14961}; 2026-02-21T09:19:14.2029315Z st.shared.v4.b32 [%r166+512], {%r14956, %r14958, %r14960, %r14962}; 2026-02-21T09:19:14.2029424Z st.shared.v4.b32 [%r167], {%r14963, %r14965, %r14967, %r14969}; 2026-02-21T09:19:14.2029543Z st.shared.v4.b32 [%r167+512], {%r14964, %r14966, %r14968, %r14970}; 2026-02-21T09:19:14.2029655Z st.shared.v4.b32 [%r168], {%r14971, %r14973, %r14975, %r14977}; 2026-02-21T09:19:14.2029768Z st.shared.v4.b32 [%r168+512], {%r14972, %r14974, %r14976, %r14978}; 2026-02-21T09:19:14.2029825Z bar.sync 0; 2026-02-21T09:19:14.2029890Z // begin inline asm 2026-02-21T09:19:14.2030092Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14603, %r14604, %r14605, %r14606}, [%r6479]; 2026-02-21T09:19:14.2030150Z // end inline asm 2026-02-21T09:19:14.2030213Z // begin inline asm 2026-02-21T09:19:14.2030410Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14607, %r14608, %r14609, %r14610}, [%r6484]; 2026-02-21T09:19:14.2030466Z // end inline asm 2026-02-21T09:19:14.2030608Z // begin inline asm 2026-02-21T09:19:14.2030799Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14611, %r14612, %r14613, %r14614}, [%r6489]; 2026-02-21T09:19:14.2030856Z // end inline asm 2026-02-21T09:19:14.2030914Z // begin inline asm 2026-02-21T09:19:14.2031168Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14615, %r14616, %r14617, %r14618}, [%r6494]; 
2026-02-21T09:19:14.2031224Z // end inline asm 2026-02-21T09:19:14.2031293Z // begin inline asm 2026-02-21T09:19:14.2031913Z [434s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:19:14.2033236Z Config: @helion.kernel(config=helion.Config(block_sizes=[8, 512, 128], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['first', 'first'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=16, num_stages=6, num_warps=8, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[True, False], range_num_stages=[4, 0], range_unroll_factors=[4, 2], range_warp_specializes=[]), static_shapes=True) 2026-02-21T09:19:14.2033384Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:19:14.2033445Z `ptxas` stderr: 2026-02-21T09:19:14.2033909Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 1282 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T09:19:14.2034015Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:19:14.2034022Z 2026-02-21T09:19:14.2034578Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpu2yd3u9a.ptx -o /tmp/tmpu2yd3u9a.ptx.o 2026-02-21T09:19:14.2034584Z 2026-02-21T09:19:14.2034738Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:19:14.2035983Z [434s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 256, 128], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['first', 'last'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=16, num_stages=6, num_warps=8, pid_type='persistent_blocked', range_flattens=[False, True], range_multi_buffers=[True, False], range_num_stages=[4, 0], range_unroll_factors=[4, 2], range_warp_specializes=[]) 2026-02-21T09:19:14.2036059Z Tensor-likes are not close! 2026-02-21T09:19:14.2036064Z 2026-02-21T09:19:14.2036155Z Mismatched elements: 133941113 / 134217728 (99.8%) 2026-02-21T09:19:14.2036331Z Greatest absolute difference: 2432.0 at index (5585, 6555) (up to 0.01 allowed) 2026-02-21T09:19:14.2036624Z Greatest relative difference: inf at index (3448, 3113) (up to 0.01 allowed) 2026-02-21T09:19:14.2036756Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:19:14.2036763Z 2026-02-21T09:19:14.2036967Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14619, %r14620, %r14621, %r14622}, [%r6499]; 2026-02-21T09:19:14.2037025Z // end inline asm 2026-02-21T09:19:14.2037087Z // begin inline asm 2026-02-21T09:19:14.2037282Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14623, %r14624, %r14625, %r14626}, [%r6504]; 2026-02-21T09:19:14.2037339Z // end inline asm 2026-02-21T09:19:14.2037398Z // begin inline asm 2026-02-21T09:19:14.2037588Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14627, %r14628, %r14629, %r14630}, [%r6509]; 2026-02-21T09:19:14.2037648Z // end inline asm 2026-02-21T09:19:14.2037704Z // begin inline asm 2026-02-21T09:19:14.2037889Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14631, %r14632, %r14633, %r14634}, [%r6514]; 2026-02-21T09:19:14.2037951Z // end inline asm 2026-02-21T09:19:14.2038007Z bar.sync 0; 2026-02-21T09:19:14.2038129Z st.shared.v4.b32 [%r165], {%r14979, %r14981, %r14983, %r14985}; 2026-02-21T09:19:14.2038252Z st.shared.v4.b32 [%r165+512], 
{%r14980, %r14982, %r14984, %r14986}; 2026-02-21T09:19:14.2038370Z st.shared.v4.b32 [%r166], {%r14987, %r14989, %r14991, %r14993}; 2026-02-21T09:19:14.2038487Z st.shared.v4.b32 [%r166+512], {%r14988, %r14990, %r14992, %r14994}; 2026-02-21T09:19:14.2038691Z st.shared.v4.b32 [%r167], {%r14995, %r14997, %r14999, %r15001}; 2026-02-21T09:19:14.2038812Z st.shared.v4.b32 [%r167+512], {%r14996, %r14998, %r15000, %r15002}; 2026-02-21T09:19:14.2038919Z st.shared.v4.b32 [%r168], {%r15003, %r15005, %r15007, %r15009}; 2026-02-21T09:19:14.2039095Z st.shared.v4.b32 [%r168+512], {%r15004, %r15006, %r15008, %r15010}; 2026-02-21T09:19:14.2039155Z bar.sync 0; 2026-02-21T09:19:14.2039215Z // begin inline asm 2026-02-21T09:19:14.2039414Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14635, %r14636, %r14637, %r14638}, [%r6479]; 2026-02-21T09:19:14.2039474Z // end inline asm 2026-02-21T09:19:14.2039531Z // begin inline asm 2026-02-21T09:19:14.2039782Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14639, %r14640, %r14641, %r14642}, [%r6484]; 2026-02-21T09:19:14.2039841Z // end inline asm 2026-02-21T09:19:14.2039901Z // begin inline asm 2026-02-21T09:19:14.2040086Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14643, %r14644, %r14645, %r14646}, [%r6489]; 2026-02-21T09:19:14.2040143Z // end inline asm 2026-02-21T09:19:14.2040201Z // begin inline asm 2026-02-21T09:19:14.2040387Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14647, %r14648, %r14649, %r14650}, [%r6494]; 2026-02-21T09:19:14.2040448Z // end inline asm 2026-02-21T09:19:14.2040503Z // begin inline asm 2026-02-21T09:19:14.2040692Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14651, %r14652, %r14653, %r14654}, [%r6499]; 2026-02-21T09:19:14.2040747Z // end inline asm 2026-02-21T09:19:14.2040805Z // begin inline asm 2026-02-21T09:19:14.2041056Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14655, %r14656, %r14657, %r14658}, [%r6504]; 2026-02-21T09:19:14.2041115Z // end inline asm 2026-02-21T09:19:14.2041172Z // begin inline asm 
2026-02-21T09:19:14.2041362Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14659, %r14660, %r14661, %r14662}, [%r6509]; 2026-02-21T09:19:14.2041418Z // end inline asm 2026-02-21T09:19:14.2041475Z // begin inline asm 2026-02-21T09:19:14.2041664Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14663, %r14664, %r14665, %r14666}, [%r6514]; 2026-02-21T09:19:14.2041722Z // end inline asm 2026-02-21T09:19:14.2041776Z bar.sync 0; 2026-02-21T09:19:14.2041886Z st.shared.v4.b32 [%r165], {%r15011, %r15013, %r15015, %r15017}; 2026-02-21T09:19:14.2042007Z st.shared.v4.b32 [%r165+512], {%r15012, %r15014, %r15016, %r15018}; 2026-02-21T09:19:14.2042116Z st.shared.v4.b32 [%r166], {%r15019, %r15021, %r15023, %r15025}; 2026-02-21T09:19:14.2042231Z st.shared.v4.b32 [%r166+512], {%r15020, %r15022, %r15024, %r15026}; 2026-02-21T09:19:14.2042343Z st.shared.v4.b32 [%r167], {%r15027, %r15029, %r15031, %r15033}; 2026-02-21T09:19:14.2042457Z st.shared.v4.b32 [%r167+512], {%r15028, %r15030, %r15032, %r15034}; 2026-02-21T09:19:14.2042565Z st.shared.v4.b32 [%r168], {%r15035, %r15037, %r15039, %r15041}; 2026-02-21T09:19:14.2042678Z st.shared.v4.b32 [%r168+512], {%r15036, %r15038, %r15040, %r15042}; 2026-02-21T09:19:14.2042734Z bar.sync 0; 2026-02-21T09:19:14.2042793Z // begin inline asm 2026-02-21T09:19:14.2042982Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14667, %r14668, %r14669, %r14670}, [%r6479]; 2026-02-21T09:19:14.2043050Z // end inline asm 2026-02-21T09:19:14.2043116Z // begin inline asm 2026-02-21T09:19:14.2043306Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14671, %r14672, %r14673, %r14674}, [%r6484]; 2026-02-21T09:19:14.2043364Z // end inline asm 2026-02-21T09:19:14.2043422Z // begin inline asm 2026-02-21T09:19:14.2043609Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14675, %r14676, %r14677, %r14678}, [%r6489]; 2026-02-21T09:19:14.2043667Z // end inline asm 2026-02-21T09:19:14.2043727Z // begin inline asm 2026-02-21T09:19:14.2043915Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14679, 
%r14680, %r14681, %r14682}, [%r6494]; 2026-02-21T09:19:14.2043971Z // end inline asm 2026-02-21T09:19:14.2044030Z // begin inline asm 2026-02-21T09:19:14.2044217Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14683, %r14684, %r14685, %r14686}, [%r6499]; 2026-02-21T09:19:14.2044354Z // end inline asm 2026-02-21T09:19:14.2044413Z // begin inline asm 2026-02-21T09:19:14.2044601Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14687, %r14688, %r14689, %r14690}, [%r6504]; 2026-02-21T09:19:14.2044657Z // end inline asm 2026-02-21T09:19:14.2044768Z // begin inline asm 2026-02-21T09:19:14.2044958Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14691, %r14692, %r14693, %r14694}, [%r6509]; 2026-02-21T09:19:14.2045015Z // end inline asm 2026-02-21T09:19:14.2045072Z // begin inline asm 2026-02-21T09:19:14.2045265Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14695, %r14696, %r14697, %r14698}, [%r6514]; 2026-02-21T09:19:14.2045322Z // end inline asm 2026-02-21T09:19:14.2045380Z bar.sync 0; 2026-02-21T09:19:14.2045543Z st.shared.v4.b32 [%r165], {%r15043, %r15045, %r15047, %r15049}; 2026-02-21T09:19:14.2045666Z st.shared.v4.b32 [%r165+512], {%r15044, %r15046, %r15048, %r15050}; 2026-02-21T09:19:14.2045777Z st.shared.v4.b32 [%r166], {%r15051, %r15053, %r15055, %r15057}; 2026-02-21T09:19:14.2045895Z st.shared.v4.b32 [%r166+512], {%r15052, %r15054, %r15056, %r15058}; 2026-02-21T09:19:14.2046009Z st.shared.v4.b32 [%r167], {%r15059, %r15061, %r15063, %r15065}; 2026-02-21T09:19:14.2046123Z st.shared.v4.b32 [%r167+512], {%r15060, %r15062, %r15064, %r15066}; 2026-02-21T09:19:14.2046233Z st.shared.v4.b32 [%r168], {%r15067, %r15069, %r15071, %r15073}; 2026-02-21T09:19:14.2046349Z st.shared.v4.b32 [%r168+512], {%r15068, %r15070, %r15072, %r15074}; 2026-02-21T09:19:14.2046404Z bar.sync 0; 2026-02-21T09:19:14.2046591Z // begin inline asm 2026-02-21T09:19:14.2046864Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14699, %r14700, %r14701, %r14702}, [%r6479]; 2026-02-21T09:19:14.2046929Z // end inline asm 
2026-02-21T09:19:14.2046987Z // begin inline asm 2026-02-21T09:19:14.2047177Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14703, %r14704, %r14705, %r14706}, [%r6484]; 2026-02-21T09:19:14.2047239Z // end inline asm 2026-02-21T09:19:14.2047299Z // begin inline asm 2026-02-21T09:19:14.2047488Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14707, %r14708, %r14709, %r14710}, [%r6489]; 2026-02-21T09:19:14.2047549Z // end inline asm 2026-02-21T09:19:14.2047607Z // begin inline asm 2026-02-21T09:19:14.2047795Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14711, %r14712, %r14713, %r14714}, [%r6494]; 2026-02-21T09:19:14.2047853Z // end inline asm 2026-02-21T09:19:14.2047913Z // begin inline asm 2026-02-21T09:19:14.2048100Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14715, %r14716, %r14717, %r14718}, [%r6499]; 2026-02-21T09:19:14.2048158Z // end inline asm 2026-02-21T09:19:14.2048221Z // begin inline asm 2026-02-21T09:19:14.2048407Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14719, %r14720, %r14721, %r14722}, [%r6504]; 2026-02-21T09:19:14.2048465Z // end inline asm 2026-02-21T09:19:14.2048526Z // begin inline asm 2026-02-21T09:19:14.2048714Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14723, %r14724, %r14725, %r14726}, [%r6509]; 2026-02-21T09:19:14.2048773Z // end inline asm 2026-02-21T09:19:14.2048831Z // begin inline asm 2026-02-21T09:19:14.2049023Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14727, %r14728, %r14729, %r14730}, [%r6514]; 2026-02-21T09:19:14.2049080Z // end inline asm 2026-02-21T09:19:14.2049137Z // begin inline asm 2026-02-21T09:19:14.2049270Z st.global.v4.b32 [ %rd661 + 0 ], { %r14603, %r14604, %r14605, %r14606 }; 2026-02-21T09:19:14.2049327Z // end inline asm 2026-02-21T09:19:14.2049384Z // begin inline asm 2026-02-21T09:19:14.2049506Z st.global.v4.b32 [ %rd662 + 0 ], { %r14607, %r14608, %r14609, %r14610 }; 2026-02-21T09:19:14.2049569Z // end inline asm 2026-02-21T09:19:14.2049627Z // begin inline asm 2026-02-21T09:19:14.2049748Z st.global.v4.b32 [ %rd663 + 
0 ], { %r14611, %r14612, %r14613, %r14614 }; 2026-02-21T09:19:14.2049811Z // end inline asm 2026-02-21T09:19:14.2049869Z // begin inline asm 2026-02-21T09:19:14.2049988Z st.global.v4.b32 [ %rd664 + 0 ], { %r14615, %r14616, %r14617, %r14618 }; 2026-02-21T09:19:14.2050137Z // end inline asm 2026-02-21T09:19:14.2050197Z // begin inline asm 2026-02-21T09:19:14.2050316Z st.global.v4.b32 [ %rd665 + 0 ], { %r14619, %r14620, %r14621, %r14622 }; 2026-02-21T09:19:14.2050384Z // end inline asm 2026-02-21T09:19:14.2050447Z // begin inline asm 2026-02-21T09:19:14.2050629Z st.global.v4.b32 [ %rd666 + 0 ], { %r14623, %r14624, %r14625, %r14626 }; 2026-02-21T09:19:14.2050686Z // end inline asm 2026-02-21T09:19:14.2050748Z // begin inline asm 2026-02-21T09:19:14.2050865Z st.global.v4.b32 [ %rd667 + 0 ], { %r14627, %r14628, %r14629, %r14630 }; 2026-02-21T09:19:14.2050923Z // end inline asm 2026-02-21T09:19:14.2050984Z // begin inline asm 2026-02-21T09:19:14.2051106Z st.global.v4.b32 [ %rd668 + 0 ], { %r14631, %r14632, %r14633, %r14634 }; 2026-02-21T09:19:14.2051163Z // end inline asm 2026-02-21T09:19:14.2051296Z // begin inline asm 2026-02-21T09:19:14.2051421Z st.global.v4.b32 [ %rd669 + 0 ], { %r14635, %r14636, %r14637, %r14638 }; 2026-02-21T09:19:14.2051478Z // end inline asm 2026-02-21T09:19:14.2051538Z // begin inline asm 2026-02-21T09:19:14.2051661Z st.global.v4.b32 [ %rd670 + 0 ], { %r14639, %r14640, %r14641, %r14642 }; 2026-02-21T09:19:14.2051717Z // end inline asm 2026-02-21T09:19:14.2051775Z // begin inline asm 2026-02-21T09:19:14.2051896Z st.global.v4.b32 [ %rd671 + 0 ], { %r14643, %r14644, %r14645, %r14646 }; 2026-02-21T09:19:14.2051959Z // end inline asm 2026-02-21T09:19:14.2052017Z // begin inline asm 2026-02-21T09:19:14.2052135Z st.global.v4.b32 [ %rd672 + 0 ], { %r14647, %r14648, %r14649, %r14650 }; 2026-02-21T09:19:14.2052194Z // end inline asm 2026-02-21T09:19:14.2052299Z // begin inline asm 2026-02-21T09:19:14.2052421Z st.global.v4.b32 [ %rd673 + 0 ], { %r14651, 
%r14652, %r14653, %r14654 }; 2026-02-21T09:19:14.2052478Z // end inline asm 2026-02-21T09:19:14.2052545Z // begin inline asm 2026-02-21T09:19:14.2052663Z st.global.v4.b32 [ %rd674 + 0 ], { %r14655, %r14656, %r14657, %r14658 }; 2026-02-21T09:19:14.2052719Z // end inline asm 2026-02-21T09:19:14.2052781Z // begin inline asm 2026-02-21T09:19:14.2052899Z st.global.v4.b32 [ %rd675 + 0 ], { %r14659, %r14660, %r14661, %r14662 }; 2026-02-21T09:19:14.2052954Z // end inline asm 2026-02-21T09:19:14.2053018Z // begin inline asm 2026-02-21T09:19:14.2053136Z st.global.v4.b32 [ %rd676 + 0 ], { %r14663, %r14664, %r14665, %r14666 }; 2026-02-21T09:19:14.2053194Z // end inline asm 2026-02-21T09:19:14.2053252Z // begin inline asm 2026-02-21T09:19:14.2053374Z st.global.v4.b32 [ %rd677 + 0 ], { %r14667, %r14668, %r14669, %r14670 }; 2026-02-21T09:19:14.2053431Z // end inline asm 2026-02-21T09:19:14.2053491Z // begin inline asm 2026-02-21T09:19:14.2053612Z st.global.v4.b32 [ %rd678 + 0 ], { %r14671, %r14672, %r14673, %r14674 }; 2026-02-21T09:19:14.2053668Z // end inline asm 2026-02-21T09:19:14.2053733Z // begin inline asm 2026-02-21T09:19:14.2053862Z st.global.v4.b32 [ %rd679 + 0 ], { %r14675, %r14676, %r14677, %r14678 }; 2026-02-21T09:19:14.2053921Z // end inline asm 2026-02-21T09:19:14.2053980Z // begin inline asm 2026-02-21T09:19:14.2054101Z st.global.v4.b32 [ %rd680 + 0 ], { %r14679, %r14680, %r14681, %r14682 }; 2026-02-21T09:19:14.2054161Z // end inline asm 2026-02-21T09:19:14.2054218Z // begin inline asm 2026-02-21T09:19:14.2054336Z st.global.v4.b32 [ %rd681 + 0 ], { %r14683, %r14684, %r14685, %r14686 }; 2026-02-21T09:19:14.2054397Z // end inline asm 2026-02-21T09:19:14.2054455Z // begin inline asm 2026-02-21T09:19:14.2054574Z st.global.v4.b32 [ %rd682 + 0 ], { %r14687, %r14688, %r14689, %r14690 }; 2026-02-21T09:19:14.2054629Z // end inline asm 2026-02-21T09:19:14.2054693Z // begin inline asm 2026-02-21T09:19:14.2054811Z st.global.v4.b32 [ %rd683 + 0 ], { %r14691, %r14692, %r14693, 
%r14694 }; 2026-02-21T09:19:14.2054867Z // end inline asm 2026-02-21T09:19:14.2054935Z // begin inline asm 2026-02-21T09:19:14.2055055Z st.global.v4.b32 [ %rd684 + 0 ], { %r14695, %r14696, %r14697, %r14698 }; 2026-02-21T09:19:14.2055112Z // end inline asm 2026-02-21T09:19:14.2055171Z // begin inline asm 2026-02-21T09:19:14.2055355Z st.global.v4.b32 [ %rd685 + 0 ], { %r14699, %r14700, %r14701, %r14702 }; 2026-02-21T09:19:14.2055412Z // end inline asm 2026-02-21T09:19:14.2055470Z // begin inline asm 2026-02-21T09:19:14.2055591Z st.global.v4.b32 [ %rd686 + 0 ], { %r14703, %r14704, %r14705, %r14706 }; 2026-02-21T09:19:14.2055710Z // end inline asm 2026-02-21T09:19:14.2055770Z // begin inline asm 2026-02-21T09:19:14.2055892Z st.global.v4.b32 [ %rd687 + 0 ], { %r14707, %r14708, %r14709, %r14710 }; 2026-02-21T09:19:14.2055949Z // end inline asm 2026-02-21T09:19:14.2056008Z // begin inline asm 2026-02-21T09:19:14.2056128Z st.global.v4.b32 [ %rd688 + 0 ], { %r14711, %r14712, %r14713, %r14714 }; 2026-02-21T09:19:14.2056188Z // end inline asm 2026-02-21T09:19:14.2056245Z // begin inline asm 2026-02-21T09:19:14.2056421Z st.global.v4.b32 [ %rd689 + 0 ], { %r14715, %r14716, %r14717, %r14718 }; 2026-02-21T09:19:14.2056616Z // end inline asm 2026-02-21T09:19:14.2056678Z // begin inline asm 2026-02-21T09:19:14.2056800Z st.global.v4.b32 [ %rd690 + 0 ], { %r14719, %r14720, %r14721, %r14722 }; 2026-02-21T09:19:14.2056856Z // end inline asm 2026-02-21T09:19:14.2056918Z // begin inline asm 2026-02-21T09:19:14.2057036Z st.global.v4.b32 [ %rd691 + 0 ], { %r14723, %r14724, %r14725, %r14726 }; 2026-02-21T09:19:14.2057095Z // end inline asm 2026-02-21T09:19:14.2057155Z // begin inline asm 2026-02-21T09:19:14.2057271Z st.global.v4.b32 [ %rd692 + 0 ], { %r14727, %r14728, %r14729, %r14730 }; 2026-02-21T09:19:14.2057327Z // end inline asm 2026-02-21T09:19:14.2057638Z .loc 1 22 121 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:22:121 2026-02-21T09:19:14.2057710Z add.s32 %r15139, 
%r22988, 3; 2026-02-21T09:19:14.2057924Z .loc 1 29 33 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:29:33 2026-02-21T09:19:14.2057987Z shr.u32 %r15140, %r15139, 5; 2026-02-21T09:19:14.2058059Z and.b32 %r15141, %r15140, 67108856; 2026-02-21T09:19:14.2058260Z .loc 1 30 39 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:30:39 2026-02-21T09:19:14.2058325Z sub.s32 %r15142, 64, %r15141; 2026-02-21T09:19:14.2058526Z .loc 1 30 52 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:30:52 2026-02-21T09:19:14.2058590Z min.s32 %r15143, %r15142, 8; 2026-02-21T09:19:14.2058786Z .loc 1 31 45 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:31:45 2026-02-21T09:19:14.2058855Z and.b32 %r15144, %r15139, 255; 2026-02-21T09:19:14.2059052Z .loc 1 32 51 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:32:51 2026-02-21T09:19:14.2059117Z div.s32 %r15145, %r15144, %r15143; 2026-02-21T09:19:14.2059317Z .loc 1 31 64 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:31:64 2026-02-21T09:19:14.2059386Z mul.lo.s32 %r15146, %r15145, %r15143; 2026-02-21T09:19:14.2059449Z sub.s32 %r15147, %r15144, %r15146; 2026-02-21T09:19:14.2059649Z .loc 1 31 30 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:31:30 2026-02-21T09:19:14.2059716Z add.s32 %r15148, %r15147, %r15141; 2026-02-21T09:19:14.2059913Z .loc 1 33 27 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:33:27 2026-02-21T09:19:14.2059978Z shl.b32 %r1899, %r15148, 7; 2026-02-21T09:19:14.2060177Z .loc 1 34 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:34:32 2026-02-21T09:19:14.2060240Z or.b32 %r15149, %r1899, %r7; 2026-02-21T09:19:14.2060437Z .loc 1 35 27 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:35:27 2026-02-21T09:19:14.2060502Z shl.b32 %r1900, %r15145, 9; 2026-02-21T09:19:14.2060701Z .loc 1 36 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:36:32 2026-02-21T09:19:14.2060764Z or.b32 %r15150, %r1900, %r11; 
2026-02-21T09:19:14.2060827Z or.b32 %r15151, %r1900, %r12; 2026-02-21T09:19:14.2060967Z or.b32 %r15152, %r1900, %r13; 2026-02-21T09:19:14.2061027Z or.b32 %r15153, %r1900, %r14; 2026-02-21T09:19:14.2061089Z or.b32 %r15154, %r1900, %r15; 2026-02-21T09:19:14.2061154Z or.b32 %r15155, %r1900, %r16; 2026-02-21T09:19:14.2061213Z or.b32 %r15156, %r1900, %r17; 2026-02-21T09:19:14.2061335Z or.b32 %r15157, %r1900, %r18; 2026-02-21T09:19:14.2061536Z .loc 1 51 53 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:53 2026-02-21T09:19:14.2061596Z shl.b32 %r15158, %r15150, 10; 2026-02-21T09:19:14.2061655Z shl.b32 %r15159, %r15151, 10; 2026-02-21T09:19:14.2061716Z shl.b32 %r15160, %r15152, 10; 2026-02-21T09:19:14.2061780Z shl.b32 %r15161, %r15153, 10; 2026-02-21T09:19:14.2061838Z shl.b32 %r15162, %r15154, 10; 2026-02-21T09:19:14.2061962Z shl.b32 %r15163, %r15155, 10; 2026-02-21T09:19:14.2062029Z shl.b32 %r15164, %r15156, 10; 2026-02-21T09:19:14.2062089Z shl.b32 %r15165, %r15157, 10; 2026-02-21T09:19:14.2062284Z .loc 1 51 60 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:60 2026-02-21T09:19:14.2062353Z or.b32 %r15166, %r15158, %r54; 2026-02-21T09:19:14.2062415Z or.b32 %r15167, %r15159, %r54; 2026-02-21T09:19:14.2062475Z or.b32 %r15168, %r15160, %r54; 2026-02-21T09:19:14.2062537Z or.b32 %r15169, %r15161, %r54; 2026-02-21T09:19:14.2062601Z or.b32 %r15170, %r15162, %r54; 2026-02-21T09:19:14.2062661Z or.b32 %r15171, %r15163, %r54; 2026-02-21T09:19:14.2062722Z or.b32 %r15172, %r15164, %r54; 2026-02-21T09:19:14.2062785Z or.b32 %r15173, %r15165, %r54; 2026-02-21T09:19:14.2063048Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.2063128Z mad.wide.s32 %rd693, %r15166, 2, %rd64; 2026-02-21T09:19:14.2063201Z mad.wide.s32 %rd694, %r15167, 2, %rd64; 2026-02-21T09:19:14.2063275Z mad.wide.s32 %rd695, %r15168, 2, %rd64; 2026-02-21T09:19:14.2063344Z mad.wide.s32 %rd696, %r15169, 2, %rd64; 2026-02-21T09:19:14.2063412Z 
mad.wide.s32 %rd697, %r15170, 2, %rd64; 2026-02-21T09:19:14.2063485Z mad.wide.s32 %rd698, %r15171, 2, %rd64; 2026-02-21T09:19:14.2063551Z mad.wide.s32 %rd699, %r15172, 2, %rd64; 2026-02-21T09:19:14.2063619Z mad.wide.s32 %rd700, %r15173, 2, %rd64; 2026-02-21T09:19:14.2063820Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2063879Z bar.sync 0; 2026-02-21T09:19:14.2063941Z mov.b32 %r14732, 8; 2026-02-21T09:19:14.2063999Z // begin inline asm 2026-02-21T09:19:14.2064147Z cp.async.ca.shared.global [ %r57 + 0 ], [ %rd693 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2064205Z // end inline asm 2026-02-21T09:19:14.2064264Z // begin inline asm 2026-02-21T09:19:14.2064403Z cp.async.ca.shared.global [ %r58 + 0 ], [ %rd694 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2064462Z // end inline asm 2026-02-21T09:19:14.2064522Z // begin inline asm 2026-02-21T09:19:14.2064655Z cp.async.ca.shared.global [ %r59 + 0 ], [ %rd695 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2064715Z // end inline asm 2026-02-21T09:19:14.2064774Z // begin inline asm 2026-02-21T09:19:14.2064904Z cp.async.ca.shared.global [ %r60 + 0 ], [ %rd696 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2064965Z // end inline asm 2026-02-21T09:19:14.2065040Z // begin inline asm 2026-02-21T09:19:14.2065176Z cp.async.ca.shared.global [ %r61 + 0 ], [ %rd697 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2065235Z // end inline asm 2026-02-21T09:19:14.2065296Z // begin inline asm 2026-02-21T09:19:14.2065424Z cp.async.ca.shared.global [ %r62 + 0 ], [ %rd698 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2065483Z // end inline asm 2026-02-21T09:19:14.2065545Z // begin inline asm 2026-02-21T09:19:14.2065672Z cp.async.ca.shared.global [ %r63 + 0 ], [ %rd699 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2065730Z // end inline asm 2026-02-21T09:19:14.2065791Z // begin inline asm 2026-02-21T09:19:14.2065919Z cp.async.ca.shared.global [ %r64 + 0 ], [ %rd700 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2065975Z // end inline 
asm 2026-02-21T09:19:14.2066107Z cp.async.commit_group; 2026-02-21T09:19:14.2066313Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.2066378Z add.s32 %r15174, %r15149, %r22968; 2026-02-21T09:19:14.2066740Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2066814Z cvt.s64.s32 %rd784, %r15174; 2026-02-21T09:19:14.2066879Z add.s64 %rd701, %rd65, %rd784; 2026-02-21T09:19:14.2066936Z mov.b32 %r23772, 4; 2026-02-21T09:19:14.2067139Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2067198Z // begin inline asm 2026-02-21T09:19:14.2067408Z cp.async.ca.shared.global [ %r66 + 0 ], [ %rd701 + 0 ], 0x4, %r23772; 2026-02-21T09:19:14.2067472Z // end inline asm 2026-02-21T09:19:14.2067539Z cp.async.commit_group; 2026-02-21T09:19:14.2067738Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.2067804Z cvt.s64.s32 %rd785, %r15158; 2026-02-21T09:19:14.2067874Z or.b64 %rd786, %rd785, %rd1113; 2026-02-21T09:19:14.2067950Z shl.b64 %rd787, %rd786, 1; 2026-02-21T09:19:14.2068016Z add.s64 %rd788, %rd64, %rd787; 2026-02-21T09:19:14.2068085Z add.s64 %rd702, %rd788, 32; 2026-02-21T09:19:14.2068148Z cvt.s64.s32 %rd789, %r15159; 2026-02-21T09:19:14.2068211Z or.b64 %rd790, %rd789, %rd1113; 2026-02-21T09:19:14.2068273Z shl.b64 %rd791, %rd790, 1; 2026-02-21T09:19:14.2068344Z add.s64 %rd792, %rd64, %rd791; 2026-02-21T09:19:14.2068474Z add.s64 %rd703, %rd792, 32; 2026-02-21T09:19:14.2068617Z cvt.s64.s32 %rd793, %r15160; 2026-02-21T09:19:14.2068685Z or.b64 %rd794, %rd793, %rd1113; 2026-02-21T09:19:14.2068750Z shl.b64 %rd795, %rd794, 1; 2026-02-21T09:19:14.2068814Z add.s64 %rd796, %rd64, %rd795; 2026-02-21T09:19:14.2068875Z add.s64 %rd704, %rd796, 32; 2026-02-21T09:19:14.2068942Z cvt.s64.s32 %rd797, %r15161; 2026-02-21T09:19:14.2069007Z or.b64 %rd798, %rd797, %rd1113; 2026-02-21T09:19:14.2069068Z shl.b64 
%rd799, %rd798, 1; 2026-02-21T09:19:14.2069142Z add.s64 %rd800, %rd64, %rd799; 2026-02-21T09:19:14.2069202Z add.s64 %rd705, %rd800, 32; 2026-02-21T09:19:14.2069265Z cvt.s64.s32 %rd801, %r15162; 2026-02-21T09:19:14.2069330Z or.b64 %rd802, %rd801, %rd1113; 2026-02-21T09:19:14.2069391Z shl.b64 %rd803, %rd802, 1; 2026-02-21T09:19:14.2069454Z add.s64 %rd804, %rd64, %rd803; 2026-02-21T09:19:14.2069514Z add.s64 %rd706, %rd804, 32; 2026-02-21T09:19:14.2069580Z cvt.s64.s32 %rd805, %r15163; 2026-02-21T09:19:14.2069644Z or.b64 %rd806, %rd805, %rd1113; 2026-02-21T09:19:14.2069704Z shl.b64 %rd807, %rd806, 1; 2026-02-21T09:19:14.2069772Z add.s64 %rd808, %rd64, %rd807; 2026-02-21T09:19:14.2069835Z add.s64 %rd707, %rd808, 32; 2026-02-21T09:19:14.2069896Z cvt.s64.s32 %rd809, %r15164; 2026-02-21T09:19:14.2069959Z or.b64 %rd810, %rd809, %rd1113; 2026-02-21T09:19:14.2070025Z shl.b64 %rd811, %rd810, 1; 2026-02-21T09:19:14.2070090Z add.s64 %rd812, %rd64, %rd811; 2026-02-21T09:19:14.2070151Z add.s64 %rd708, %rd812, 32; 2026-02-21T09:19:14.2070216Z cvt.s64.s32 %rd813, %r15165; 2026-02-21T09:19:14.2070278Z or.b64 %rd814, %rd813, %rd1113; 2026-02-21T09:19:14.2070338Z shl.b64 %rd815, %rd814, 1; 2026-02-21T09:19:14.2070407Z add.s64 %rd816, %rd64, %rd815; 2026-02-21T09:19:14.2070472Z add.s64 %rd709, %rd816, 32; 2026-02-21T09:19:14.2070672Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2070733Z // begin inline asm 2026-02-21T09:19:14.2070872Z cp.async.ca.shared.global [ %r67 + 0 ], [ %rd702 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2070933Z // end inline asm 2026-02-21T09:19:14.2070992Z // begin inline asm 2026-02-21T09:19:14.2071126Z cp.async.ca.shared.global [ %r68 + 0 ], [ %rd703 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2071184Z // end inline asm 2026-02-21T09:19:14.2071241Z // begin inline asm 2026-02-21T09:19:14.2071450Z cp.async.ca.shared.global [ %r69 + 0 ], [ %rd704 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2071509Z // end inline asm 
2026-02-21T09:19:14.2071568Z // begin inline asm 2026-02-21T09:19:14.2071698Z cp.async.ca.shared.global [ %r70 + 0 ], [ %rd705 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2071825Z // end inline asm 2026-02-21T09:19:14.2071892Z // begin inline asm 2026-02-21T09:19:14.2072023Z cp.async.ca.shared.global [ %r71 + 0 ], [ %rd706 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2072079Z // end inline asm 2026-02-21T09:19:14.2072143Z // begin inline asm 2026-02-21T09:19:14.2072275Z cp.async.ca.shared.global [ %r72 + 0 ], [ %rd707 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2072333Z // end inline asm 2026-02-21T09:19:14.2072395Z // begin inline asm 2026-02-21T09:19:14.2072573Z cp.async.ca.shared.global [ %r73 + 0 ], [ %rd708 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2072632Z // end inline asm 2026-02-21T09:19:14.2072696Z // begin inline asm 2026-02-21T09:19:14.2072826Z cp.async.ca.shared.global [ %r74 + 0 ], [ %rd709 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2072885Z // end inline asm 2026-02-21T09:19:14.2072950Z cp.async.commit_group; 2026-02-21T09:19:14.2073152Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.2073217Z add.s32 %r15175, %r15149, %r75; 2026-02-21T09:19:14.2073418Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2073482Z cvt.s64.s32 %rd817, %r15175; 2026-02-21T09:19:14.2073546Z add.s64 %rd710, %rd65, %rd817; 2026-02-21T09:19:14.2073795Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2073861Z // begin inline asm 2026-02-21T09:19:14.2073996Z cp.async.ca.shared.global [ %r76 + 0 ], [ %rd710 + 0 ], 0x4, %r23772; 2026-02-21T09:19:14.2074053Z // end inline asm 2026-02-21T09:19:14.2074120Z cp.async.commit_group; 2026-02-21T09:19:14.2074322Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.2074385Z add.s64 %rd711, %rd788, 64; 2026-02-21T09:19:14.2074446Z add.s64 %rd712, %rd792, 64; 
2026-02-21T09:19:14.2074509Z add.s64 %rd713, %rd796, 64; 2026-02-21T09:19:14.2074572Z add.s64 %rd714, %rd800, 64; 2026-02-21T09:19:14.2074632Z add.s64 %rd715, %rd804, 64; 2026-02-21T09:19:14.2074698Z add.s64 %rd716, %rd808, 64; 2026-02-21T09:19:14.2074759Z add.s64 %rd717, %rd812, 64; 2026-02-21T09:19:14.2074820Z add.s64 %rd718, %rd816, 64; 2026-02-21T09:19:14.2075019Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2075082Z bar.sync 0; 2026-02-21T09:19:14.2075144Z // begin inline asm 2026-02-21T09:19:14.2075278Z cp.async.ca.shared.global [ %r77 + 0 ], [ %rd711 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2075339Z // end inline asm 2026-02-21T09:19:14.2075400Z // begin inline asm 2026-02-21T09:19:14.2075530Z cp.async.ca.shared.global [ %r78 + 0 ], [ %rd712 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2075589Z // end inline asm 2026-02-21T09:19:14.2075651Z // begin inline asm 2026-02-21T09:19:14.2075780Z cp.async.ca.shared.global [ %r79 + 0 ], [ %rd713 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2075841Z // end inline asm 2026-02-21T09:19:14.2075901Z // begin inline asm 2026-02-21T09:19:14.2076031Z cp.async.ca.shared.global [ %r80 + 0 ], [ %rd714 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2076088Z // end inline asm 2026-02-21T09:19:14.2076149Z // begin inline asm 2026-02-21T09:19:14.2076278Z cp.async.ca.shared.global [ %r81 + 0 ], [ %rd715 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2076335Z // end inline asm 2026-02-21T09:19:14.2076394Z // begin inline asm 2026-02-21T09:19:14.2076657Z cp.async.ca.shared.global [ %r82 + 0 ], [ %rd716 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2076718Z // end inline asm 2026-02-21T09:19:14.2076776Z // begin inline asm 2026-02-21T09:19:14.2076909Z cp.async.ca.shared.global [ %r83 + 0 ], [ %rd717 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2077047Z // end inline asm 2026-02-21T09:19:14.2077106Z // begin inline asm 2026-02-21T09:19:14.2077233Z cp.async.ca.shared.global [ %r84 + 0 ], [ %rd718 + 0 ], 0x8, 
%r14732; 2026-02-21T09:19:14.2077293Z // end inline asm 2026-02-21T09:19:14.2077424Z cp.async.commit_group; 2026-02-21T09:19:14.2077624Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.2077692Z add.s32 %r15176, %r15149, %r85; 2026-02-21T09:19:14.2077891Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2077955Z cvt.s64.s32 %rd818, %r15176; 2026-02-21T09:19:14.2078020Z add.s64 %rd719, %rd65, %rd818; 2026-02-21T09:19:14.2078295Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2078357Z // begin inline asm 2026-02-21T09:19:14.2078489Z cp.async.ca.shared.global [ %r86 + 0 ], [ %rd719 + 0 ], 0x4, %r23772; 2026-02-21T09:19:14.2078553Z // end inline asm 2026-02-21T09:19:14.2078619Z cp.async.commit_group; 2026-02-21T09:19:14.2078828Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.2078899Z add.s64 %rd720, %rd788, 96; 2026-02-21T09:19:14.2078961Z add.s64 %rd721, %rd792, 96; 2026-02-21T09:19:14.2079021Z add.s64 %rd722, %rd796, 96; 2026-02-21T09:19:14.2079086Z add.s64 %rd723, %rd800, 96; 2026-02-21T09:19:14.2079146Z add.s64 %rd724, %rd804, 96; 2026-02-21T09:19:14.2079278Z add.s64 %rd725, %rd808, 96; 2026-02-21T09:19:14.2079341Z add.s64 %rd726, %rd812, 96; 2026-02-21T09:19:14.2079406Z add.s64 %rd727, %rd816, 96; 2026-02-21T09:19:14.2079606Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2079667Z // begin inline asm 2026-02-21T09:19:14.2079802Z cp.async.ca.shared.global [ %r87 + 0 ], [ %rd720 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2079861Z // end inline asm 2026-02-21T09:19:14.2079918Z // begin inline asm 2026-02-21T09:19:14.2080049Z cp.async.ca.shared.global [ %r88 + 0 ], [ %rd721 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2080108Z // end inline asm 2026-02-21T09:19:14.2080166Z // begin inline asm 2026-02-21T09:19:14.2080298Z 
cp.async.ca.shared.global [ %r89 + 0 ], [ %rd722 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2080356Z // end inline asm 2026-02-21T09:19:14.2080414Z // begin inline asm 2026-02-21T09:19:14.2080544Z cp.async.ca.shared.global [ %r90 + 0 ], [ %rd723 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2080606Z // end inline asm 2026-02-21T09:19:14.2080664Z // begin inline asm 2026-02-21T09:19:14.2080791Z cp.async.ca.shared.global [ %r91 + 0 ], [ %rd724 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2080846Z // end inline asm 2026-02-21T09:19:14.2080909Z // begin inline asm 2026-02-21T09:19:14.2081037Z cp.async.ca.shared.global [ %r92 + 0 ], [ %rd725 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2081093Z // end inline asm 2026-02-21T09:19:14.2081158Z // begin inline asm 2026-02-21T09:19:14.2081287Z cp.async.ca.shared.global [ %r93 + 0 ], [ %rd726 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2081344Z // end inline asm 2026-02-21T09:19:14.2081403Z // begin inline asm 2026-02-21T09:19:14.2081537Z cp.async.ca.shared.global [ %r94 + 0 ], [ %rd727 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2081594Z // end inline asm 2026-02-21T09:19:14.2081659Z cp.async.commit_group; 2026-02-21T09:19:14.2081863Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.2081933Z add.s32 %r15177, %r15149, %r95; 2026-02-21T09:19:14.2082131Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2082198Z cvt.s64.s32 %rd819, %r15177; 2026-02-21T09:19:14.2082260Z add.s64 %rd728, %rd65, %rd819; 2026-02-21T09:19:14.2082455Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2082582Z // begin inline asm 2026-02-21T09:19:14.2082725Z cp.async.ca.shared.global [ %r96 + 0 ], [ %rd728 + 0 ], 0x4, %r23772; 2026-02-21T09:19:14.2082786Z // end inline asm 2026-02-21T09:19:14.2082852Z cp.async.commit_group; 2026-02-21T09:19:14.2083102Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 
2026-02-21T09:19:14.2083165Z add.s64 %rd729, %rd788, 128; 2026-02-21T09:19:14.2083226Z add.s64 %rd730, %rd792, 128; 2026-02-21T09:19:14.2083291Z add.s64 %rd731, %rd796, 128; 2026-02-21T09:19:14.2083354Z add.s64 %rd732, %rd800, 128; 2026-02-21T09:19:14.2083414Z add.s64 %rd733, %rd804, 128; 2026-02-21T09:19:14.2083473Z add.s64 %rd734, %rd808, 128; 2026-02-21T09:19:14.2083535Z add.s64 %rd735, %rd812, 128; 2026-02-21T09:19:14.2083661Z add.s64 %rd736, %rd816, 128; 2026-02-21T09:19:14.2083859Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2083923Z bar.sync 0; 2026-02-21T09:19:14.2083984Z // begin inline asm 2026-02-21T09:19:14.2084115Z cp.async.ca.shared.global [ %r97 + 0 ], [ %rd729 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2084175Z // end inline asm 2026-02-21T09:19:14.2084236Z // begin inline asm 2026-02-21T09:19:14.2084368Z cp.async.ca.shared.global [ %r98 + 0 ], [ %rd730 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2084425Z // end inline asm 2026-02-21T09:19:14.2084485Z // begin inline asm 2026-02-21T09:19:14.2084614Z cp.async.ca.shared.global [ %r99 + 0 ], [ %rd731 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2084718Z // end inline asm 2026-02-21T09:19:14.2084781Z // begin inline asm 2026-02-21T09:19:14.2084920Z cp.async.ca.shared.global [ %r100 + 0 ], [ %rd732 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2084975Z // end inline asm 2026-02-21T09:19:14.2085033Z // begin inline asm 2026-02-21T09:19:14.2085170Z cp.async.ca.shared.global [ %r101 + 0 ], [ %rd733 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2085228Z // end inline asm 2026-02-21T09:19:14.2085287Z // begin inline asm 2026-02-21T09:19:14.2085419Z cp.async.ca.shared.global [ %r102 + 0 ], [ %rd734 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2085489Z // end inline asm 2026-02-21T09:19:14.2085550Z // begin inline asm 2026-02-21T09:19:14.2085681Z cp.async.ca.shared.global [ %r103 + 0 ], [ %rd735 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2085741Z // end inline asm 
2026-02-21T09:19:14.2085799Z // begin inline asm 2026-02-21T09:19:14.2085931Z cp.async.ca.shared.global [ %r104 + 0 ], [ %rd736 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2085990Z // end inline asm 2026-02-21T09:19:14.2086056Z cp.async.commit_group; 2026-02-21T09:19:14.2086255Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.2086327Z add.s32 %r15178, %r15149, %r105; 2026-02-21T09:19:14.2086645Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2086711Z cvt.s64.s32 %rd820, %r15178; 2026-02-21T09:19:14.2086778Z add.s64 %rd737, %rd65, %rd820; 2026-02-21T09:19:14.2086998Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2087061Z // begin inline asm 2026-02-21T09:19:14.2087204Z cp.async.ca.shared.global [ %r106 + 0 ], [ %rd737 + 0 ], 0x4, %r23772; 2026-02-21T09:19:14.2087266Z // end inline asm 2026-02-21T09:19:14.2087332Z cp.async.commit_group; 2026-02-21T09:19:14.2087530Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.2087598Z add.s64 %rd738, %rd788, 160; 2026-02-21T09:19:14.2087659Z add.s64 %rd739, %rd792, 160; 2026-02-21T09:19:14.2087719Z add.s64 %rd740, %rd796, 160; 2026-02-21T09:19:14.2087779Z add.s64 %rd741, %rd800, 160; 2026-02-21T09:19:14.2087845Z add.s64 %rd742, %rd804, 160; 2026-02-21T09:19:14.2087906Z add.s64 %rd743, %rd808, 160; 2026-02-21T09:19:14.2087967Z add.s64 %rd744, %rd812, 160; 2026-02-21T09:19:14.2088120Z add.s64 %rd745, %rd816, 160; 2026-02-21T09:19:14.2088319Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2088378Z // begin inline asm 2026-02-21T09:19:14.2088515Z cp.async.ca.shared.global [ %r107 + 0 ], [ %rd738 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2088641Z // end inline asm 2026-02-21T09:19:14.2088701Z // begin inline asm 2026-02-21T09:19:14.2088831Z cp.async.ca.shared.global [ %r108 + 0 ], [ %rd739 
+ 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2088890Z // end inline asm 2026-02-21T09:19:14.2088949Z // begin inline asm 2026-02-21T09:19:14.2089078Z cp.async.ca.shared.global [ %r109 + 0 ], [ %rd740 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2089146Z // end inline asm 2026-02-21T09:19:14.2089204Z // begin inline asm 2026-02-21T09:19:14.2089399Z cp.async.ca.shared.global [ %r110 + 0 ], [ %rd741 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2089459Z // end inline asm 2026-02-21T09:19:14.2089520Z // begin inline asm 2026-02-21T09:19:14.2089652Z cp.async.ca.shared.global [ %r111 + 0 ], [ %rd742 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2089709Z // end inline asm 2026-02-21T09:19:14.2089772Z // begin inline asm 2026-02-21T09:19:14.2089900Z cp.async.ca.shared.global [ %r112 + 0 ], [ %rd743 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2089958Z // end inline asm 2026-02-21T09:19:14.2090018Z // begin inline asm 2026-02-21T09:19:14.2090146Z cp.async.ca.shared.global [ %r113 + 0 ], [ %rd744 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2090202Z // end inline asm 2026-02-21T09:19:14.2090260Z // begin inline asm 2026-02-21T09:19:14.2090454Z cp.async.ca.shared.global [ %r114 + 0 ], [ %rd745 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2090514Z // end inline asm 2026-02-21T09:19:14.2090579Z cp.async.commit_group; 2026-02-21T09:19:14.2090780Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.2090844Z add.s32 %r15179, %r15149, %r115; 2026-02-21T09:19:14.2091055Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2091126Z cvt.s64.s32 %rd821, %r15179; 2026-02-21T09:19:14.2091190Z add.s64 %rd746, %rd65, %rd821; 2026-02-21T09:19:14.2091387Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2091449Z // begin inline asm 2026-02-21T09:19:14.2091586Z cp.async.ca.shared.global [ %r116 + 0 ], [ %rd746 + 0 ], 0x4, %r23772; 2026-02-21T09:19:14.2091641Z // end inline asm 
2026-02-21T09:19:14.2091708Z cp.async.commit_group; 2026-02-21T09:19:14.2091907Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.2091974Z add.s64 %rd747, %rd788, 192; 2026-02-21T09:19:14.2092037Z add.s64 %rd748, %rd792, 192; 2026-02-21T09:19:14.2092099Z add.s64 %rd749, %rd796, 192; 2026-02-21T09:19:14.2092162Z add.s64 %rd750, %rd800, 192; 2026-02-21T09:19:14.2092222Z add.s64 %rd751, %rd804, 192; 2026-02-21T09:19:14.2092284Z add.s64 %rd752, %rd808, 192; 2026-02-21T09:19:14.2092347Z add.s64 %rd753, %rd812, 192; 2026-02-21T09:19:14.2092406Z add.s64 %rd754, %rd816, 192; 2026-02-21T09:19:14.2092600Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2092660Z bar.sync 0; 2026-02-21T09:19:14.2092719Z // begin inline asm 2026-02-21T09:19:14.2092852Z cp.async.ca.shared.global [ %r117 + 0 ], [ %rd747 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2092908Z // end inline asm 2026-02-21T09:19:14.2092971Z // begin inline asm 2026-02-21T09:19:14.2093102Z cp.async.ca.shared.global [ %r118 + 0 ], [ %rd748 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2093158Z // end inline asm 2026-02-21T09:19:14.2093222Z // begin inline asm 2026-02-21T09:19:14.2093354Z cp.async.ca.shared.global [ %r119 + 0 ], [ %rd749 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2093410Z // end inline asm 2026-02-21T09:19:14.2093468Z // begin inline asm 2026-02-21T09:19:14.2093667Z cp.async.ca.shared.global [ %r120 + 0 ], [ %rd750 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2093723Z // end inline asm 2026-02-21T09:19:14.2093779Z // begin inline asm 2026-02-21T09:19:14.2093911Z cp.async.ca.shared.global [ %r121 + 0 ], [ %rd751 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2094014Z // end inline asm 2026-02-21T09:19:14.2094072Z // begin inline asm 2026-02-21T09:19:14.2094204Z cp.async.ca.shared.global [ %r122 + 0 ], [ %rd752 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2094260Z // end inline asm 2026-02-21T09:19:14.2094318Z // begin inline asm 
2026-02-21T09:19:14.2094449Z cp.async.ca.shared.global [ %r123 + 0 ], [ %rd753 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2094509Z // end inline asm 2026-02-21T09:19:14.2094566Z // begin inline asm 2026-02-21T09:19:14.2094743Z cp.async.ca.shared.global [ %r124 + 0 ], [ %rd754 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2094804Z // end inline asm 2026-02-21T09:19:14.2094881Z cp.async.commit_group; 2026-02-21T09:19:14.2095084Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.2095149Z add.s32 %r15180, %r15149, %r125; 2026-02-21T09:19:14.2095349Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2095412Z cvt.s64.s32 %rd822, %r15180; 2026-02-21T09:19:14.2095475Z add.s64 %rd755, %rd65, %rd822; 2026-02-21T09:19:14.2095678Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2095740Z // begin inline asm 2026-02-21T09:19:14.2095921Z cp.async.ca.shared.global [ %r126 + 0 ], [ %rd755 + 0 ], 0x4, %r23772; 2026-02-21T09:19:14.2095983Z // end inline asm 2026-02-21T09:19:14.2096049Z cp.async.commit_group; 2026-02-21T09:19:14.2096247Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.2096315Z add.s64 %rd756, %rd788, 224; 2026-02-21T09:19:14.2096375Z add.s64 %rd757, %rd792, 224; 2026-02-21T09:19:14.2096437Z add.s64 %rd758, %rd796, 224; 2026-02-21T09:19:14.2096618Z add.s64 %rd759, %rd800, 224; 2026-02-21T09:19:14.2096684Z add.s64 %rd760, %rd804, 224; 2026-02-21T09:19:14.2096745Z add.s64 %rd761, %rd808, 224; 2026-02-21T09:19:14.2096807Z add.s64 %rd762, %rd812, 224; 2026-02-21T09:19:14.2096870Z add.s64 %rd763, %rd816, 224; 2026-02-21T09:19:14.2097066Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2097125Z // begin inline asm 2026-02-21T09:19:14.2097259Z cp.async.ca.shared.global [ %r127 + 0 ], [ %rd756 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2097319Z 
// end inline asm 2026-02-21T09:19:14.2097376Z // begin inline asm 2026-02-21T09:19:14.2097508Z cp.async.ca.shared.global [ %r128 + 0 ], [ %rd757 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2097568Z // end inline asm 2026-02-21T09:19:14.2097627Z // begin inline asm 2026-02-21T09:19:14.2097757Z cp.async.ca.shared.global [ %r129 + 0 ], [ %rd758 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2097824Z // end inline asm 2026-02-21T09:19:14.2097893Z // begin inline asm 2026-02-21T09:19:14.2098025Z cp.async.ca.shared.global [ %r130 + 0 ], [ %rd759 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2098080Z // end inline asm 2026-02-21T09:19:14.2098144Z // begin inline asm 2026-02-21T09:19:14.2098273Z cp.async.ca.shared.global [ %r131 + 0 ], [ %rd760 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2098329Z // end inline asm 2026-02-21T09:19:14.2098391Z // begin inline asm 2026-02-21T09:19:14.2098522Z cp.async.ca.shared.global [ %r132 + 0 ], [ %rd761 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2098577Z // end inline asm 2026-02-21T09:19:14.2098635Z // begin inline asm 2026-02-21T09:19:14.2098771Z cp.async.ca.shared.global [ %r133 + 0 ], [ %rd762 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2098827Z // end inline asm 2026-02-21T09:19:14.2098884Z // begin inline asm 2026-02-21T09:19:14.2099016Z cp.async.ca.shared.global [ %r134 + 0 ], [ %rd763 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2099157Z // end inline asm 2026-02-21T09:19:14.2099224Z cp.async.commit_group; 2026-02-21T09:19:14.2099427Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.2099492Z add.s32 %r15181, %r15149, %r135; 2026-02-21T09:19:14.2099752Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2099813Z cvt.s64.s32 %rd823, %r15181; 2026-02-21T09:19:14.2099892Z add.s64 %rd764, %rd65, %rd823; 2026-02-21T09:19:14.2100093Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2100151Z // begin inline asm 
2026-02-21T09:19:14.2100349Z cp.async.ca.shared.global [ %r136 + 0 ], [ %rd764 + 0 ], 0x4, %r23772; 2026-02-21T09:19:14.2100408Z // end inline asm 2026-02-21T09:19:14.2100474Z cp.async.commit_group; 2026-02-21T09:19:14.2100672Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.2100738Z add.s64 %rd765, %rd788, 256; 2026-02-21T09:19:14.2100798Z add.s64 %rd766, %rd792, 256; 2026-02-21T09:19:14.2100857Z add.s64 %rd767, %rd796, 256; 2026-02-21T09:19:14.2100920Z add.s64 %rd768, %rd800, 256; 2026-02-21T09:19:14.2100983Z add.s64 %rd769, %rd804, 256; 2026-02-21T09:19:14.2101044Z add.s64 %rd770, %rd808, 256; 2026-02-21T09:19:14.2101107Z add.s64 %rd771, %rd812, 256; 2026-02-21T09:19:14.2101177Z add.s64 %rd772, %rd816, 256; 2026-02-21T09:19:14.2101437Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2101498Z bar.sync 0; 2026-02-21T09:19:14.2101560Z // begin inline asm 2026-02-21T09:19:14.2101696Z cp.async.ca.shared.global [ %r137 + 0 ], [ %rd765 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2101753Z // end inline asm 2026-02-21T09:19:14.2101814Z // begin inline asm 2026-02-21T09:19:14.2101945Z cp.async.ca.shared.global [ %r138 + 0 ], [ %rd766 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2102003Z // end inline asm 2026-02-21T09:19:14.2102064Z // begin inline asm 2026-02-21T09:19:14.2102194Z cp.async.ca.shared.global [ %r139 + 0 ], [ %rd767 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2102250Z // end inline asm 2026-02-21T09:19:14.2102309Z // begin inline asm 2026-02-21T09:19:14.2102441Z cp.async.ca.shared.global [ %r140 + 0 ], [ %rd768 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2102496Z // end inline asm 2026-02-21T09:19:14.2102554Z // begin inline asm 2026-02-21T09:19:14.2102690Z cp.async.ca.shared.global [ %r141 + 0 ], [ %rd769 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2102747Z // end inline asm 2026-02-21T09:19:14.2102805Z // begin inline asm 2026-02-21T09:19:14.2102935Z 
cp.async.ca.shared.global [ %r142 + 0 ], [ %rd770 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2102993Z // end inline asm 2026-02-21T09:19:14.2103052Z // begin inline asm 2026-02-21T09:19:14.2103181Z cp.async.ca.shared.global [ %r143 + 0 ], [ %rd771 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2103242Z // end inline asm 2026-02-21T09:19:14.2103300Z // begin inline asm 2026-02-21T09:19:14.2103430Z cp.async.ca.shared.global [ %r144 + 0 ], [ %rd772 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2103488Z // end inline asm 2026-02-21T09:19:14.2103554Z cp.async.commit_group; 2026-02-21T09:19:14.2103751Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.2103814Z add.s32 %r15182, %r15149, %r145; 2026-02-21T09:19:14.2104029Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2104092Z cvt.s64.s32 %rd824, %r15182; 2026-02-21T09:19:14.2104154Z add.s64 %rd773, %rd65, %rd824; 2026-02-21T09:19:14.2104352Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2104411Z // begin inline asm 2026-02-21T09:19:14.2104542Z cp.async.ca.shared.global [ %r146 + 0 ], [ %rd773 + 0 ], 0x4, %r23772; 2026-02-21T09:19:14.2104665Z // end inline asm 2026-02-21T09:19:14.2104730Z cp.async.commit_group; 2026-02-21T09:19:14.2104926Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.2105053Z add.s64 %rd774, %rd788, 288; 2026-02-21T09:19:14.2105117Z add.s64 %rd775, %rd792, 288; 2026-02-21T09:19:14.2105177Z add.s64 %rd776, %rd796, 288; 2026-02-21T09:19:14.2105238Z add.s64 %rd777, %rd800, 288; 2026-02-21T09:19:14.2105300Z add.s64 %rd778, %rd804, 288; 2026-02-21T09:19:14.2105360Z add.s64 %rd779, %rd808, 288; 2026-02-21T09:19:14.2105423Z add.s64 %rd780, %rd812, 288; 2026-02-21T09:19:14.2105484Z add.s64 %rd781, %rd816, 288; 2026-02-21T09:19:14.2105745Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 
2026-02-21T09:19:14.2105806Z // begin inline asm 2026-02-21T09:19:14.2105940Z cp.async.ca.shared.global [ %r147 + 0 ], [ %rd774 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2106002Z // end inline asm 2026-02-21T09:19:14.2106059Z // begin inline asm 2026-02-21T09:19:14.2106188Z cp.async.ca.shared.global [ %r148 + 0 ], [ %rd775 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2106247Z // end inline asm 2026-02-21T09:19:14.2106303Z // begin inline asm 2026-02-21T09:19:14.2106434Z cp.async.ca.shared.global [ %r149 + 0 ], [ %rd776 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2106596Z // end inline asm 2026-02-21T09:19:14.2106658Z // begin inline asm 2026-02-21T09:19:14.2106787Z cp.async.ca.shared.global [ %r150 + 0 ], [ %rd777 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2106925Z // end inline asm 2026-02-21T09:19:14.2106992Z // begin inline asm 2026-02-21T09:19:14.2107124Z cp.async.ca.shared.global [ %r151 + 0 ], [ %rd778 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2107182Z // end inline asm 2026-02-21T09:19:14.2107240Z // begin inline asm 2026-02-21T09:19:14.2107372Z cp.async.ca.shared.global [ %r152 + 0 ], [ %rd779 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2107429Z // end inline asm 2026-02-21T09:19:14.2107489Z // begin inline asm 2026-02-21T09:19:14.2107619Z cp.async.ca.shared.global [ %r153 + 0 ], [ %rd780 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2107676Z // end inline asm 2026-02-21T09:19:14.2107732Z // begin inline asm 2026-02-21T09:19:14.2107866Z cp.async.ca.shared.global [ %r154 + 0 ], [ %rd781 + 0 ], 0x8, %r14732; 2026-02-21T09:19:14.2107922Z // end inline asm 2026-02-21T09:19:14.2107986Z cp.async.commit_group; 2026-02-21T09:19:14.2108187Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.2108255Z add.s32 %r15183, %r15149, %r155; 2026-02-21T09:19:14.2108451Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2108605Z cvt.s64.s32 %rd825, %r15183; 2026-02-21T09:19:14.2108674Z add.s64 %rd782, 
%rd65, %rd825; 2026-02-21T09:19:14.2108871Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2108931Z // begin inline asm 2026-02-21T09:19:14.2109068Z cp.async.ca.shared.global [ %r156 + 0 ], [ %rd782 + 0 ], 0x4, %r23772; 2026-02-21T09:19:14.2109125Z // end inline asm 2026-02-21T09:19:14.2109189Z cp.async.commit_group; 2026-02-21T09:19:14.2109387Z .loc 1 43 78 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:43:78 2026-02-21T09:19:14.2109455Z add.s32 %r15184, %r15144, %r324; 2026-02-21T09:19:14.2109523Z sub.s32 %r15185, %r15184, %r15146; 2026-02-21T09:19:14.2109582Z shl.b32 %r15186, %r15185, 7; 2026-02-21T09:19:14.2109648Z add.s32 %r23770, %r177, %r15186; 2026-02-21T09:19:14.2109710Z or.b32 %r15187, %r17, %r1900; 2026-02-21T09:19:14.2109770Z shl.b32 %r15188, %r15187, 10; 2026-02-21T09:19:14.2109840Z mul.wide.s32 %rd41, %r15188, 2; 2026-02-21T09:19:14.2109901Z or.b32 %r15189, %r16, %r1900; 2026-02-21T09:19:14.2109960Z shl.b32 %r15190, %r15189, 10; 2026-02-21T09:19:14.2110024Z mul.wide.s32 %rd42, %r15190, 2; 2026-02-21T09:19:14.2110166Z or.b32 %r15191, %r15, %r1900; 2026-02-21T09:19:14.2110225Z shl.b32 %r15192, %r15191, 10; 2026-02-21T09:19:14.2110289Z mul.wide.s32 %rd43, %r15192, 2; 2026-02-21T09:19:14.2110353Z or.b32 %r15193, %r14, %r1900; 2026-02-21T09:19:14.2110413Z shl.b32 %r15194, %r15193, 10; 2026-02-21T09:19:14.2110539Z mul.wide.s32 %rd44, %r15194, 2; 2026-02-21T09:19:14.2110600Z or.b32 %r15195, %r13, %r1900; 2026-02-21T09:19:14.2110666Z shl.b32 %r15196, %r15195, 10; 2026-02-21T09:19:14.2110729Z mul.wide.s32 %rd45, %r15196, 2; 2026-02-21T09:19:14.2110790Z or.b32 %r15197, %r12, %r1900; 2026-02-21T09:19:14.2110856Z shl.b32 %r15198, %r15197, 10; 2026-02-21T09:19:14.2110930Z mul.wide.s32 %rd46, %r15198, 2; 2026-02-21T09:19:14.2110990Z shl.b32 %r15199, %r15145, 19; 2026-02-21T09:19:14.2111118Z or.b32 %r15200, %r22984, %r15199; 2026-02-21T09:19:14.2111188Z mul.wide.s32 %rd47, %r15200, 2; 
2026-02-21T09:19:14.2111247Z or.b32 %r23769, %r185, %r15199; 2026-02-21T09:19:14.2111306Z mov.b32 %r23773, 0f00000000; 2026-02-21T09:19:14.2111373Z mov.b32 %r23771, -1; 2026-02-21T09:19:14.2111432Z mov.b64 %rd1121, -16; 2026-02-21T09:19:14.2111493Z mov.b64 %rd1120, %rd3; 2026-02-21T09:19:14.2111553Z mov.b32 %r23774, %r23773; 2026-02-21T09:19:14.2111619Z mov.b32 %r23775, %r23773; 2026-02-21T09:19:14.2111680Z mov.b32 %r23776, %r23773; 2026-02-21T09:19:14.2111739Z mov.b32 %r23777, %r23773; 2026-02-21T09:19:14.2111801Z mov.b32 %r23778, %r23773; 2026-02-21T09:19:14.2111859Z mov.b32 %r23779, %r23773; 2026-02-21T09:19:14.2111917Z mov.b32 %r23780, %r23773; 2026-02-21T09:19:14.2112027Z mov.b32 %r23781, %r23773; 2026-02-21T09:19:14.2112088Z mov.b32 %r23782, %r23773; 2026-02-21T09:19:14.2112145Z mov.b32 %r23783, %r23773; 2026-02-21T09:19:14.2112204Z mov.b32 %r23784, %r23773; 2026-02-21T09:19:14.2112267Z mov.b32 %r23785, %r23773; 2026-02-21T09:19:14.2112326Z mov.b32 %r23786, %r23773; 2026-02-21T09:19:14.2112385Z mov.b32 %r23787, %r23773; 2026-02-21T09:19:14.2112449Z mov.b32 %r23788, %r23773; 2026-02-21T09:19:14.2112508Z mov.b32 %r23789, %r23773; 2026-02-21T09:19:14.2112566Z mov.b32 %r23790, %r23773; 2026-02-21T09:19:14.2112625Z mov.b32 %r23791, %r23773; 2026-02-21T09:19:14.2112688Z mov.b32 %r23792, %r23773; 2026-02-21T09:19:14.2112745Z mov.b32 %r23793, %r23773; 2026-02-21T09:19:14.2112805Z mov.b32 %r23794, %r23773; 2026-02-21T09:19:14.2112865Z mov.b32 %r23795, %r23773; 2026-02-21T09:19:14.2112922Z mov.b32 %r23796, %r23773; 2026-02-21T09:19:14.2112979Z mov.b32 %r23797, %r23773; 2026-02-21T09:19:14.2113037Z mov.b32 %r23798, %r23773; 2026-02-21T09:19:14.2113104Z mov.b32 %r23799, %r23773; 2026-02-21T09:19:14.2113163Z mov.b32 %r23800, %r23773; 2026-02-21T09:19:14.2113223Z mov.b32 %r23801, %r23773; 2026-02-21T09:19:14.2113283Z mov.b32 %r23802, %r23773; 2026-02-21T09:19:14.2113340Z mov.b32 %r23803, %r23773; 2026-02-21T09:19:14.2113398Z mov.b32 %r23804, %r23773; 
2026-02-21T09:19:14.2113456Z mov.b32 %r23805, %r23773; 2026-02-21T09:19:14.2113526Z mov.b32 %r23806, %r23773; 2026-02-21T09:19:14.2113591Z mov.b32 %r23807, %r23773; 2026-02-21T09:19:14.2113650Z mov.b32 %r23808, %r23773; 2026-02-21T09:19:14.2113709Z mov.b32 %r23809, %r23773; 2026-02-21T09:19:14.2113767Z mov.b32 %r23810, %r23773; 2026-02-21T09:19:14.2113824Z mov.b32 %r23811, %r23773; 2026-02-21T09:19:14.2113885Z mov.b32 %r23812, %r23773; 2026-02-21T09:19:14.2113948Z mov.b32 %r23813, %r23773; 2026-02-21T09:19:14.2114008Z mov.b32 %r23814, %r23773; 2026-02-21T09:19:14.2114066Z mov.b32 %r23815, %r23773; 2026-02-21T09:19:14.2114129Z mov.b32 %r23816, %r23773; 2026-02-21T09:19:14.2114189Z mov.b32 %r23817, %r23773; 2026-02-21T09:19:14.2114248Z mov.b32 %r23818, %r23773; 2026-02-21T09:19:14.2114306Z mov.b32 %r23819, %r23773; 2026-02-21T09:19:14.2114366Z mov.b32 %r23820, %r23773; 2026-02-21T09:19:14.2114426Z mov.b32 %r23821, %r23773; 2026-02-21T09:19:14.2114485Z mov.b32 %r23822, %r23773; 2026-02-21T09:19:14.2114544Z mov.b32 %r23823, %r23773; 2026-02-21T09:19:14.2114602Z mov.b32 %r23824, %r23773; 2026-02-21T09:19:14.2114721Z mov.b32 %r23825, %r23773; 2026-02-21T09:19:14.2114782Z mov.b32 %r23826, %r23773; 2026-02-21T09:19:14.2114839Z mov.b32 %r23827, %r23773; 2026-02-21T09:19:14.2114897Z mov.b32 %r23828, %r23773; 2026-02-21T09:19:14.2115000Z mov.b32 %r23829, %r23773; 2026-02-21T09:19:14.2115061Z mov.b32 %r23830, %r23773; 2026-02-21T09:19:14.2115119Z mov.b32 %r23831, %r23773; 2026-02-21T09:19:14.2115177Z mov.b32 %r23832, %r23773; 2026-02-21T09:19:14.2115238Z mov.b32 %r23833, %r23773; 2026-02-21T09:19:14.2115296Z mov.b32 %r23834, %r23773; 2026-02-21T09:19:14.2115355Z mov.b32 %r23835, %r23773; 2026-02-21T09:19:14.2115414Z mov.b32 %r23836, %r23773; 2026-02-21T09:19:14.2115475Z mov.b32 %r23837, %r23773; 2026-02-21T09:19:14.2115532Z mov.b32 %r23838, %r23773; 2026-02-21T09:19:14.2115648Z mov.b32 %r23839, %r23773; 2026-02-21T09:19:14.2115712Z mov.b32 %r23840, %r23773; 
2026-02-21T09:19:14.2115769Z mov.b32 %r23841, %r23773; 2026-02-21T09:19:14.2115827Z mov.b32 %r23842, %r23773; 2026-02-21T09:19:14.2115890Z mov.b32 %r23843, %r23773; 2026-02-21T09:19:14.2115952Z mov.b32 %r23844, %r23773; 2026-02-21T09:19:14.2116010Z mov.b32 %r23845, %r23773; 2026-02-21T09:19:14.2116067Z mov.b32 %r23846, %r23773; 2026-02-21T09:19:14.2116131Z mov.b32 %r23847, %r23773; 2026-02-21T09:19:14.2116190Z mov.b32 %r23848, %r23773; 2026-02-21T09:19:14.2116247Z mov.b32 %r23849, %r23773; 2026-02-21T09:19:14.2116305Z mov.b32 %r23850, %r23773; 2026-02-21T09:19:14.2116367Z mov.b32 %r23851, %r23773; 2026-02-21T09:19:14.2116424Z mov.b32 %r23852, %r23773; 2026-02-21T09:19:14.2116667Z mov.b32 %r23853, %r23773; 2026-02-21T09:19:14.2116742Z mov.b32 %r23854, %r23773; 2026-02-21T09:19:14.2120740Z mov.b32 %r23855, %r23773; 2026-02-21T09:19:14.2120842Z mov.b32 %r23856, %r23773; 2026-02-21T09:19:14.2120913Z mov.b32 %r23857, %r23773; 2026-02-21T09:19:14.2120972Z mov.b32 %r23858, %r23773; 2026-02-21T09:19:14.2121033Z mov.b32 %r23859, %r23773; 2026-02-21T09:19:14.2121091Z mov.b32 %r23860, %r23773; 2026-02-21T09:19:14.2121152Z mov.b32 %r23861, %r23773; 2026-02-21T09:19:14.2121209Z mov.b32 %r23862, %r23773; 2026-02-21T09:19:14.2121275Z mov.b32 %r23863, %r23773; 2026-02-21T09:19:14.2121334Z mov.b32 %r23864, %r23773; 2026-02-21T09:19:14.2121394Z mov.b32 %r23865, %r23773; 2026-02-21T09:19:14.2121458Z mov.b32 %r23866, %r23773; 2026-02-21T09:19:14.2121526Z mov.b32 %r23867, %r23773; 2026-02-21T09:19:14.2121583Z mov.b32 %r23868, %r23773; 2026-02-21T09:19:14.2121643Z mov.b32 %r23869, %r23773; 2026-02-21T09:19:14.2121703Z mov.b32 %r23870, %r23773; 2026-02-21T09:19:14.2121762Z mov.b32 %r23871, %r23773; 2026-02-21T09:19:14.2121819Z mov.b32 %r23872, %r23773; 2026-02-21T09:19:14.2121879Z mov.b32 %r23873, %r23773; 2026-02-21T09:19:14.2121938Z mov.b32 %r23874, %r23773; 2026-02-21T09:19:14.2121997Z mov.b32 %r23875, %r23773; 2026-02-21T09:19:14.2122055Z mov.b32 %r23876, %r23773; 
2026-02-21T09:19:14.2122116Z mov.b32 %r23877, %r23773; 2026-02-21T09:19:14.2122174Z mov.b32 %r23878, %r23773; 2026-02-21T09:19:14.2122232Z mov.b32 %r23879, %r23773; 2026-02-21T09:19:14.2122294Z mov.b32 %r23880, %r23773; 2026-02-21T09:19:14.2122352Z mov.b32 %r23881, %r23773; 2026-02-21T09:19:14.2122411Z mov.b32 %r23882, %r23773; 2026-02-21T09:19:14.2122468Z mov.b32 %r23883, %r23773; 2026-02-21T09:19:14.2122528Z mov.b32 %r23884, %r23773; 2026-02-21T09:19:14.2122588Z mov.b32 %r23885, %r23773; 2026-02-21T09:19:14.2122645Z mov.b32 %r23886, %r23773; 2026-02-21T09:19:14.2122706Z mov.b32 %r23887, %r23773; 2026-02-21T09:19:14.2122763Z mov.b32 %r23888, %r23773; 2026-02-21T09:19:14.2122820Z mov.b32 %r23889, %r23773; 2026-02-21T09:19:14.2122877Z mov.b32 %r23890, %r23773; 2026-02-21T09:19:14.2122939Z mov.b32 %r23891, %r23773; 2026-02-21T09:19:14.2123003Z mov.b32 %r23892, %r23773; 2026-02-21T09:19:14.2123064Z mov.b32 %r23893, %r23773; 2026-02-21T09:19:14.2123125Z mov.b32 %r23894, %r23773; 2026-02-21T09:19:14.2123183Z mov.b32 %r23895, %r23773; 2026-02-21T09:19:14.2123367Z mov.b32 %r23896, %r23773; 2026-02-21T09:19:14.2123424Z mov.b32 %r23897, %r23773; 2026-02-21T09:19:14.2123483Z mov.b32 %r23898, %r23773; 2026-02-21T09:19:14.2123541Z mov.b32 %r23899, %r23773; 2026-02-21T09:19:14.2123597Z mov.b32 %r23900, %r23773; 2026-02-21T09:19:14.2123728Z mov.b32 %r23901, %r23773; 2026-02-21T09:19:14.2123785Z mov.b32 %r23902, %r23773; 2026-02-21T09:19:14.2123841Z mov.b32 %r23903, %r23773; 2026-02-21T09:19:14.2123899Z mov.b32 %r23904, %r23773; 2026-02-21T09:19:14.2123959Z mov.b32 %r23905, %r23773; 2026-02-21T09:19:14.2124017Z mov.b32 %r23906, %r23773; 2026-02-21T09:19:14.2124076Z mov.b32 %r23907, %r23773; 2026-02-21T09:19:14.2124135Z mov.b32 %r23908, %r23773; 2026-02-21T09:19:14.2124191Z mov.b32 %r23909, %r23773; 2026-02-21T09:19:14.2124247Z mov.b32 %r23910, %r23773; 2026-02-21T09:19:14.2124374Z mov.b32 %r23911, %r23773; 2026-02-21T09:19:14.2124433Z mov.b32 %r23912, %r23773; 
2026-02-21T09:19:14.2124501Z mov.b32 %r23913, %r23773; 2026-02-21T09:19:14.2124564Z mov.b32 %r23914, %r23773; 2026-02-21T09:19:14.2124622Z mov.b32 %r23915, %r23773; 2026-02-21T09:19:14.2124682Z mov.b32 %r23916, %r23773; 2026-02-21T09:19:14.2124739Z mov.b32 %r23917, %r23773; 2026-02-21T09:19:14.2124797Z mov.b32 %r23918, %r23773; 2026-02-21T09:19:14.2124856Z mov.b32 %r23919, %r23773; 2026-02-21T09:19:14.2124914Z mov.b32 %r23920, %r23773; 2026-02-21T09:19:14.2124972Z mov.b32 %r23921, %r23773; 2026-02-21T09:19:14.2125033Z mov.b32 %r23922, %r23773; 2026-02-21T09:19:14.2125089Z mov.b32 %r23923, %r23773; 2026-02-21T09:19:14.2125146Z mov.b32 %r23924, %r23773; 2026-02-21T09:19:14.2125256Z mov.b32 %r23925, %r23773; 2026-02-21T09:19:14.2125314Z mov.b32 %r23926, %r23773; 2026-02-21T09:19:14.2125374Z mov.b32 %r23927, %r23773; 2026-02-21T09:19:14.2125433Z mov.b32 %r23928, %r23773; 2026-02-21T09:19:14.2125494Z mov.b32 %r23929, %r23773; 2026-02-21T09:19:14.2125551Z mov.b32 %r23930, %r23773; 2026-02-21T09:19:14.2125608Z mov.b32 %r23931, %r23773; 2026-02-21T09:19:14.2125670Z mov.b32 %r23932, %r23773; 2026-02-21T09:19:14.2125726Z mov.b32 %r23933, %r23773; 2026-02-21T09:19:14.2125784Z mov.b32 %r23934, %r23773; 2026-02-21T09:19:14.2125841Z mov.b32 %r23935, %r23773; 2026-02-21T09:19:14.2125900Z mov.b32 %r23936, %r23773; 2026-02-21T09:19:14.2125959Z mov.b32 %r23937, %r23773; 2026-02-21T09:19:14.2126016Z mov.b32 %r23938, %r23773; 2026-02-21T09:19:14.2126076Z mov.b32 %r23939, %r23773; 2026-02-21T09:19:14.2126133Z mov.b32 %r23940, %r23773; 2026-02-21T09:19:14.2126190Z mov.b32 %r23941, %r23773; 2026-02-21T09:19:14.2126247Z mov.b32 %r23942, %r23773; 2026-02-21T09:19:14.2126312Z mov.b32 %r23943, %r23773; 2026-02-21T09:19:14.2126368Z mov.b32 %r23944, %r23773; 2026-02-21T09:19:14.2126425Z mov.b32 %r23945, %r23773; 2026-02-21T09:19:14.2126658Z mov.b32 %r23946, %r23773; 2026-02-21T09:19:14.2126733Z mov.b32 %r23947, %r23773; 2026-02-21T09:19:14.2126794Z mov.b32 %r23948, %r23773; 
2026-02-21T09:19:14.2126852Z mov.b32 %r23949, %r23773; 2026-02-21T09:19:14.2126914Z mov.b32 %r23950, %r23773; 2026-02-21T09:19:14.2126971Z mov.b32 %r23951, %r23773; 2026-02-21T09:19:14.2127027Z mov.b32 %r23952, %r23773; 2026-02-21T09:19:14.2127087Z mov.b32 %r23953, %r23773; 2026-02-21T09:19:14.2127144Z mov.b32 %r23954, %r23773; 2026-02-21T09:19:14.2127204Z mov.b32 %r23955, %r23773; 2026-02-21T09:19:14.2127264Z mov.b32 %r23956, %r23773; 2026-02-21T09:19:14.2127321Z mov.b32 %r23957, %r23773; 2026-02-21T09:19:14.2127378Z mov.b32 %r23958, %r23773; 2026-02-21T09:19:14.2127435Z mov.b32 %r23959, %r23773; 2026-02-21T09:19:14.2127495Z mov.b32 %r23960, %r23773; 2026-02-21T09:19:14.2127555Z mov.b32 %r23961, %r23773; 2026-02-21T09:19:14.2127614Z mov.b32 %r23962, %r23773; 2026-02-21T09:19:14.2127675Z mov.b32 %r23963, %r23773; 2026-02-21T09:19:14.2127735Z mov.b32 %r23964, %r23773; 2026-02-21T09:19:14.2127792Z mov.b32 %r23965, %r23773; 2026-02-21T09:19:14.2127850Z mov.b32 %r23966, %r23773; 2026-02-21T09:19:14.2127912Z mov.b32 %r23967, %r23773; 2026-02-21T09:19:14.2128060Z mov.b32 %r23968, %r23773; 2026-02-21T09:19:14.2128117Z mov.b32 %r23969, %r23773; 2026-02-21T09:19:14.2128176Z mov.b32 %r23970, %r23773; 2026-02-21T09:19:14.2128234Z mov.b32 %r23971, %r23773; 2026-02-21T09:19:14.2128292Z mov.b32 %r23972, %r23773; 2026-02-21T09:19:14.2128413Z mov.b32 %r23973, %r23773; 2026-02-21T09:19:14.2128476Z mov.b32 %r23974, %r23773; 2026-02-21T09:19:14.2128533Z mov.b32 %r23975, %r23773; 2026-02-21T09:19:14.2128592Z mov.b32 %r23976, %r23773; 2026-02-21T09:19:14.2128655Z mov.b32 %r23977, %r23773; 2026-02-21T09:19:14.2128712Z mov.b32 %r23978, %r23773; 2026-02-21T09:19:14.2128771Z mov.b32 %r23979, %r23773; 2026-02-21T09:19:14.2128830Z mov.b32 %r23980, %r23773; 2026-02-21T09:19:14.2128891Z mov.b32 %r23981, %r23773; 2026-02-21T09:19:14.2129024Z mov.b32 %r23982, %r23773; 2026-02-21T09:19:14.2129086Z mov.b32 %r23983, %r23773; 2026-02-21T09:19:14.2129147Z mov.b32 %r23984, %r23773; 
2026-02-21T09:19:14.2129204Z mov.b32 %r23985, %r23773; 2026-02-21T09:19:14.2129263Z mov.b32 %r23986, %r23773; 2026-02-21T09:19:14.2129321Z mov.b32 %r23987, %r23773; 2026-02-21T09:19:14.2129381Z mov.b32 %r23988, %r23773; 2026-02-21T09:19:14.2129439Z mov.b32 %r23989, %r23773; 2026-02-21T09:19:14.2129495Z mov.b32 %r23990, %r23773; 2026-02-21T09:19:14.2129556Z mov.b32 %r23991, %r23773; 2026-02-21T09:19:14.2129614Z mov.b32 %r23992, %r23773; 2026-02-21T09:19:14.2129671Z mov.b32 %r23993, %r23773; 2026-02-21T09:19:14.2129729Z mov.b32 %r23994, %r23773; 2026-02-21T09:19:14.2129788Z mov.b32 %r23995, %r23773; 2026-02-21T09:19:14.2129846Z mov.b32 %r23996, %r23773; 2026-02-21T09:19:14.2129966Z mov.b32 %r23997, %r23773; 2026-02-21T09:19:14.2130029Z mov.b32 %r23998, %r23773; 2026-02-21T09:19:14.2130088Z mov.b32 %r23999, %r23773; 2026-02-21T09:19:14.2130147Z mov.b32 %r24000, %r23773; 2026-02-21T09:19:14.2130209Z mov.b32 %r24001, %r23773; 2026-02-21T09:19:14.2130266Z mov.b32 %r24002, %r23773; 2026-02-21T09:19:14.2130321Z mov.b32 %r24003, %r23773; 2026-02-21T09:19:14.2130381Z mov.b32 %r24004, %r23773; 2026-02-21T09:19:14.2130447Z mov.b32 %r24005, %r23773; 2026-02-21T09:19:14.2130512Z mov.b32 %r24006, %r23773; 2026-02-21T09:19:14.2130569Z mov.b32 %r24007, %r23773; 2026-02-21T09:19:14.2130628Z mov.b32 %r24008, %r23773; 2026-02-21T09:19:14.2130687Z mov.b32 %r24009, %r23773; 2026-02-21T09:19:14.2130743Z mov.b32 %r24010, %r23773; 2026-02-21T09:19:14.2130800Z mov.b32 %r24011, %r23773; 2026-02-21T09:19:14.2130860Z mov.b32 %r24012, %r23773; 2026-02-21T09:19:14.2130917Z mov.b32 %r24013, %r23773; 2026-02-21T09:19:14.2130976Z mov.b32 %r24014, %r23773; 2026-02-21T09:19:14.2131035Z mov.b32 %r24015, %r23773; 2026-02-21T09:19:14.2131093Z mov.b32 %r24016, %r23773; 2026-02-21T09:19:14.2131150Z mov.b32 %r24017, %r23773; 2026-02-21T09:19:14.2131209Z mov.b32 %r24018, %r23773; 2026-02-21T09:19:14.2131267Z mov.b32 %r24019, %r23773; 2026-02-21T09:19:14.2131327Z mov.b32 %r24020, %r23773; 
2026-02-21T09:19:14.2131382Z mov.b32 %r24021, %r23773; 2026-02-21T09:19:14.2131443Z mov.b32 %r24022, %r23773; 2026-02-21T09:19:14.2131501Z mov.b32 %r24023, %r23773; 2026-02-21T09:19:14.2131558Z mov.b32 %r24024, %r23773; 2026-02-21T09:19:14.2131615Z mov.b32 %r24025, %r23773; 2026-02-21T09:19:14.2131675Z mov.b32 %r24026, %r23773; 2026-02-21T09:19:14.2131734Z mov.b32 %r24027, %r23773; 2026-02-21T09:19:14.2131792Z mov.b32 %r24028, %r23773; 2026-02-21T09:19:14.2131922Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T09:19:14.2132033Z // => This Inner Loop Header: Depth=2 2026-02-21T09:19:14.2132103Z add.s64 %rd1121, %rd1121, 16; 2026-02-21T09:19:14.2132177Z setp.lt.u64 %p82, %rd1121, 432; 2026-02-21T09:19:14.2132240Z add.s32 %r18385, %r23771, 1; 2026-02-21T09:19:14.2132307Z setp.gt.s32 %p83, %r18385, 4; 2026-02-21T09:19:14.2132377Z selp.b32 %r23771, 0, %r18385, %p83; 2026-02-21T09:19:14.2132609Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2132746Z cp.async.wait_group 16; 2026-02-21T09:19:14.2132803Z bar.sync 0; 2026-02-21T09:19:14.2132867Z shl.b32 %r18386, %r23771, 14; 2026-02-21T09:19:14.2132932Z add.s32 %r18388, %r22967, %r18386; 2026-02-21T09:19:14.2133143Z .loc 1 55 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:55:32 2026-02-21T09:19:14.2133260Z add.s32 %r18389, %r18388, %r157; 2026-02-21T09:19:14.2133329Z ld.shared.b16 %rs433, [%r18389]; 2026-02-21T09:19:14.2133401Z ld.shared.b16 %rs434, [%r18389+256]; 2026-02-21T09:19:14.2133469Z ld.shared.b16 %rs435, [%r18389+16]; 2026-02-21T09:19:14.2133539Z ld.shared.b16 %rs436, [%r18389+272]; 2026-02-21T09:19:14.2133608Z ld.shared.b16 %rs437, [%r18389+4096]; 2026-02-21T09:19:14.2133723Z ld.shared.b16 %rs438, [%r18389+4352]; 2026-02-21T09:19:14.2133792Z ld.shared.b16 %rs439, [%r18389+4112]; 2026-02-21T09:19:14.2133857Z ld.shared.b16 %rs440, [%r18389+4368]; 2026-02-21T09:19:14.2133923Z ld.shared.b16 %rs441, [%r18389+8192]; 2026-02-21T09:19:14.2133991Z 
ld.shared.b16 %rs442, [%r18389+8448]; 2026-02-21T09:19:14.2134060Z ld.shared.b16 %rs443, [%r18389+8208]; 2026-02-21T09:19:14.2134126Z ld.shared.b16 %rs444, [%r18389+8464]; 2026-02-21T09:19:14.2134198Z ld.shared.b16 %rs445, [%r18389+12288]; 2026-02-21T09:19:14.2134271Z ld.shared.b16 %rs446, [%r18389+12544]; 2026-02-21T09:19:14.2134337Z ld.shared.b16 %rs447, [%r18389+12304]; 2026-02-21T09:19:14.2134403Z ld.shared.b16 %rs448, [%r18389+12560]; 2026-02-21T09:19:14.2134466Z add.s32 %r18390, %r18388, %r158; 2026-02-21T09:19:14.2134593Z ld.shared.b16 %rs449, [%r18390]; 2026-02-21T09:19:14.2134662Z ld.shared.b16 %rs450, [%r18390+256]; 2026-02-21T09:19:14.2134728Z ld.shared.b16 %rs451, [%r18390+16]; 2026-02-21T09:19:14.2134798Z ld.shared.b16 %rs452, [%r18390+272]; 2026-02-21T09:19:14.2134865Z ld.shared.b16 %rs453, [%r18390+4096]; 2026-02-21T09:19:14.2134933Z ld.shared.b16 %rs454, [%r18390+4352]; 2026-02-21T09:19:14.2134999Z ld.shared.b16 %rs455, [%r18390+4112]; 2026-02-21T09:19:14.2135066Z ld.shared.b16 %rs456, [%r18390+4368]; 2026-02-21T09:19:14.2135131Z ld.shared.b16 %rs457, [%r18390+8192]; 2026-02-21T09:19:14.2135198Z ld.shared.b16 %rs458, [%r18390+8448]; 2026-02-21T09:19:14.2135269Z ld.shared.b16 %rs459, [%r18390+8208]; 2026-02-21T09:19:14.2135336Z ld.shared.b16 %rs460, [%r18390+8464]; 2026-02-21T09:19:14.2135405Z ld.shared.b16 %rs461, [%r18390+12288]; 2026-02-21T09:19:14.2135480Z ld.shared.b16 %rs462, [%r18390+12544]; 2026-02-21T09:19:14.2135554Z ld.shared.b16 %rs463, [%r18390+12304]; 2026-02-21T09:19:14.2135622Z ld.shared.b16 %rs464, [%r18390+12560]; 2026-02-21T09:19:14.2135687Z cvt.f32.bf16 %r15329, %rs433; 2026-02-21T09:19:14.2135754Z cvt.f32.bf16 %r15330, %rs434; 2026-02-21T09:19:14.2135815Z cvt.f32.bf16 %r15331, %rs449; 2026-02-21T09:19:14.2135874Z cvt.f32.bf16 %r15332, %rs450; 2026-02-21T09:19:14.2135937Z cvt.f32.bf16 %r15461, %rs435; 2026-02-21T09:19:14.2135997Z cvt.f32.bf16 %r15462, %rs436; 2026-02-21T09:19:14.2136058Z cvt.f32.bf16 %r15463, %rs451; 
2026-02-21T09:19:14.2136120Z cvt.f32.bf16 %r15464, %rs452; 2026-02-21T09:19:14.2136181Z cvt.f32.bf16 %r15593, %rs437; 2026-02-21T09:19:14.2136240Z cvt.f32.bf16 %r15594, %rs438; 2026-02-21T09:19:14.2136299Z cvt.f32.bf16 %r15595, %rs453; 2026-02-21T09:19:14.2136362Z cvt.f32.bf16 %r15596, %rs454; 2026-02-21T09:19:14.2136426Z cvt.f32.bf16 %r15725, %rs439; 2026-02-21T09:19:14.2136605Z cvt.f32.bf16 %r15726, %rs440; 2026-02-21T09:19:14.2136683Z cvt.f32.bf16 %r15727, %rs455; 2026-02-21T09:19:14.2136745Z cvt.f32.bf16 %r15728, %rs456; 2026-02-21T09:19:14.2136806Z cvt.f32.bf16 %r15857, %rs441; 2026-02-21T09:19:14.2136865Z cvt.f32.bf16 %r15858, %rs442; 2026-02-21T09:19:14.2136927Z cvt.f32.bf16 %r15859, %rs457; 2026-02-21T09:19:14.2136988Z cvt.f32.bf16 %r15860, %rs458; 2026-02-21T09:19:14.2137049Z cvt.f32.bf16 %r15989, %rs443; 2026-02-21T09:19:14.2137111Z cvt.f32.bf16 %r15990, %rs444; 2026-02-21T09:19:14.2137171Z cvt.f32.bf16 %r15991, %rs459; 2026-02-21T09:19:14.2137319Z cvt.f32.bf16 %r15992, %rs460; 2026-02-21T09:19:14.2137380Z cvt.f32.bf16 %r16121, %rs445; 2026-02-21T09:19:14.2137444Z cvt.f32.bf16 %r16122, %rs446; 2026-02-21T09:19:14.2137504Z cvt.f32.bf16 %r16123, %rs461; 2026-02-21T09:19:14.2137562Z cvt.f32.bf16 %r16124, %rs462; 2026-02-21T09:19:14.2137704Z cvt.f32.bf16 %r16253, %rs447; 2026-02-21T09:19:14.2137764Z cvt.f32.bf16 %r16254, %rs448; 2026-02-21T09:19:14.2137823Z cvt.f32.bf16 %r16255, %rs463; 2026-02-21T09:19:14.2137884Z cvt.f32.bf16 %r16256, %rs464; 2026-02-21T09:19:14.2138107Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2138172Z shl.b32 %r18391, %r23771, 10; 2026-02-21T09:19:14.2138237Z add.s32 %r18392, %r22967, %r18391; 2026-02-21T09:19:14.2138371Z add.s32 %r18393, %r18392, 172032; 2026-02-21T09:19:14.2138583Z .loc 1 70 45 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:70:45 2026-02-21T09:19:14.2138648Z add.s32 %r18394, %r18393, %r22973; 2026-02-21T09:19:14.2138719Z ld.shared.b8 %rs465, [%r18394]; 
2026-02-21T09:19:14.2138787Z ld.shared.b8 %rs466, [%r18394+128]; 2026-02-21T09:19:14.2138852Z ld.shared.b8 %rs467, [%r18394+256]; 2026-02-21T09:19:14.2138919Z ld.shared.b8 %rs468, [%r18394+384]; 2026-02-21T09:19:14.2138984Z ld.shared.b8 %rs469, [%r18394+512]; 2026-02-21T09:19:14.2139047Z ld.shared.b8 %rs470, [%r18394+640]; 2026-02-21T09:19:14.2139111Z ld.shared.b8 %rs471, [%r18394+768]; 2026-02-21T09:19:14.2139175Z add.s32 %r18395, %r18393, %r22974; 2026-02-21T09:19:14.2139308Z ld.shared.b8 %rs472, [%r18395]; 2026-02-21T09:19:14.2139512Z .loc 1 60 28 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:60:28 2026-02-21T09:19:14.2139581Z shl.b16 %rs473, %rs465, 4; 2026-02-21T09:19:14.2139643Z shl.b16 %rs474, %rs466, 4; 2026-02-21T09:19:14.2139702Z shl.b16 %rs475, %rs467, 4; 2026-02-21T09:19:14.2139762Z shl.b16 %rs476, %rs468, 4; 2026-02-21T09:19:14.2139829Z shl.b16 %rs477, %rs469, 4; 2026-02-21T09:19:14.2139889Z shl.b16 %rs478, %rs470, 4; 2026-02-21T09:19:14.2139948Z shl.b16 %rs479, %rs471, 4; 2026-02-21T09:19:14.2140009Z shl.b16 %rs480, %rs472, 4; 2026-02-21T09:19:14.2140207Z .loc 1 75 58 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:75:58 2026-02-21T09:19:14.2140284Z selp.b16 %rs481, %rs473, %rs465, %p110; 2026-02-21T09:19:14.2140348Z cvt.s16.s8 %rs482, %rs481; 2026-02-21T09:19:14.2140407Z shr.s16 %rs483, %rs482, 4; 2026-02-21T09:19:14.2140476Z selp.b16 %rs484, %rs474, %rs466, %p110; 2026-02-21T09:19:14.2140538Z cvt.s16.s8 %rs485, %rs484; 2026-02-21T09:19:14.2140602Z shr.s16 %rs486, %rs485, 4; 2026-02-21T09:19:14.2140672Z selp.b16 %rs487, %rs475, %rs467, %p110; 2026-02-21T09:19:14.2140733Z cvt.s16.s8 %rs488, %rs487; 2026-02-21T09:19:14.2140808Z shr.s16 %rs489, %rs488, 4; 2026-02-21T09:19:14.2140878Z selp.b16 %rs490, %rs476, %rs468, %p110; 2026-02-21T09:19:14.2140939Z cvt.s16.s8 %rs491, %rs490; 2026-02-21T09:19:14.2141000Z shr.s16 %rs492, %rs491, 4; 2026-02-21T09:19:14.2141070Z selp.b16 %rs493, %rs477, %rs469, %p110; 2026-02-21T09:19:14.2141128Z 
cvt.s16.s8 %rs494, %rs493; 2026-02-21T09:19:14.2141188Z shr.s16 %rs495, %rs494, 4; 2026-02-21T09:19:14.2141260Z selp.b16 %rs496, %rs478, %rs470, %p110; 2026-02-21T09:19:14.2141320Z cvt.s16.s8 %rs497, %rs496; 2026-02-21T09:19:14.2141379Z shr.s16 %rs498, %rs497, 4; 2026-02-21T09:19:14.2141446Z selp.b16 %rs499, %rs479, %rs471, %p110; 2026-02-21T09:19:14.2141510Z cvt.s16.s8 %rs500, %rs499; 2026-02-21T09:19:14.2141571Z shr.s16 %rs501, %rs500, 4; 2026-02-21T09:19:14.2141638Z selp.b16 %rs502, %rs480, %rs472, %p110; 2026-02-21T09:19:14.2141701Z cvt.s16.s8 %rs503, %rs502; 2026-02-21T09:19:14.2141760Z shr.s16 %rs504, %rs503, 4; 2026-02-21T09:19:14.2141964Z .loc 1 80 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:80:32 2026-02-21T09:19:14.2142034Z cvt.rn.f32.s16 %r18396, %rs483; 2026-02-21T09:19:14.2142158Z cvt.rn.f32.s16 %r18397, %rs486; 2026-02-21T09:19:14.2142221Z cvt.rn.f32.s16 %r18398, %rs489; 2026-02-21T09:19:14.2142285Z cvt.rn.f32.s16 %r18399, %rs492; 2026-02-21T09:19:14.2142346Z cvt.rn.f32.s16 %r18400, %rs495; 2026-02-21T09:19:14.2142406Z cvt.rn.f32.s16 %r18401, %rs498; 2026-02-21T09:19:14.2142515Z cvt.rn.f32.s16 %r18402, %rs501; 2026-02-21T09:19:14.2142579Z cvt.rn.f32.s16 %r18403, %rs504; 2026-02-21T09:19:14.2142646Z st.shared.b32 [%r161], %r18396; 2026-02-21T09:19:14.2142712Z st.shared.b32 [%r161+8], %r18397; 2026-02-21T09:19:14.2142779Z st.shared.b32 [%r162], %r18398; 2026-02-21T09:19:14.2142843Z st.shared.b32 [%r162+8], %r18399; 2026-02-21T09:19:14.2142905Z st.shared.b32 [%r163], %r18400; 2026-02-21T09:19:14.2142969Z st.shared.b32 [%r163+8], %r18401; 2026-02-21T09:19:14.2143099Z st.shared.b32 [%r164], %r18402; 2026-02-21T09:19:14.2143164Z st.shared.b32 [%r164+8], %r18403; 2026-02-21T09:19:14.2143222Z $L__tmp13: 2026-02-21T09:19:14.2143503Z .loc 2 291 36 // standard.py:291:36 @[ co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:87:40 ] 2026-02-21T09:19:14.2143568Z // begin inline asm 2026-02-21T09:19:14.2143652Z fence.proxy.async.shared::cta; 
2026-02-21T09:19:14.2143710Z // end inline asm 2026-02-21T09:19:14.2143767Z bar.sync 0; 2026-02-21T09:19:14.2143852Z shfl.sync.idx.b32 %r18404, %r5, 0, 31, -1; 2026-02-21T09:19:14.2143926Z wgmma.fence.sync.aligned; 2026-02-21T09:19:14.2143994Z mov.pred %p65, -1; 2026-02-21T09:19:14.2144051Z // begin inline asm 2026-02-21T09:19:14.2145572Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23773,%r23774,%r23775,%r23776,%r23777,%r23778,%r23779,%r23780,%r23781,%r23782,%r23783,%r23784,%r23785,%r23786,%r23787,%r23788,%r23789,%r23790,%r23791,%r23792,%r23793,%r23794,%r23795,%r23796,%r23797,%r23798,%r23799,%r23800,%r23801,%r23802,%r23803,%r23804,%r23805,%r23806,%r23807,%r23808,%r23809,%r23810,%r23811,%r23812,%r23813,%r23814,%r23815,%r23816,%r23817,%r23818,%r23819,%r23820,%r23821,%r23822,%r23823,%r23824,%r23825,%r23826,%r23827,%r23828,%r23829,%r23830,%r23831,%r23832,%r23833,%r23834,%r23835,%r23836}, {%r15329,%r15330,%r15331,%r15332}, %rd1, %p65, 1, 1; 2026-02-21T09:19:14.2145639Z // end inline asm 2026-02-21T09:19:14.2145697Z // begin inline asm 2026-02-21T09:19:14.2147286Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23773,%r23774,%r23775,%r23776,%r23777,%r23778,%r23779,%r23780,%r23781,%r23782,%r23783,%r23784,%r23785,%r23786,%r23787,%r23788,%r23789,%r23790,%r23791,%r23792,%r23793,%r23794,%r23795,%r23796,%r23797,%r23798,%r23799,%r23800,%r23801,%r23802,%r23803,%r23804,%r23805,%r23806,%r23807,%r23808,%r23809,%r23810,%r23811,%r23812,%r23813,%r23814,%r23815,%r23816,%r23817,%r23818,%r23819,%r23820,%r23821,%r23822,%r23823,%r23824,%r23825,%r23826,%r23827,%r23828,%r23829,%r23830,%r23831,%r23832,%r23833,%r23834,%r23835,%r23836}, {%r15461,%r15462,%r15463,%r15464}, %rd2, %p65, 1, 1; 2026-02-21T09:19:14.2147353Z // end inline asm 2026-02-21T09:19:14.2147413Z // begin inline asm 2026-02-21T09:19:14.2148966Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r23837,%r23838,%r23839,%r23840,%r23841,%r23842,%r23843,%r23844,%r23845,%r23846,%r23847,%r23848,%r23849,%r23850,%r23851,%r23852,%r23853,%r23854,%r23855,%r23856,%r23857,%r23858,%r23859,%r23860,%r23861,%r23862,%r23863,%r23864,%r23865,%r23866,%r23867,%r23868,%r23869,%r23870,%r23871,%r23872,%r23873,%r23874,%r23875,%r23876,%r23877,%r23878,%r23879,%r23880,%r23881,%r23882,%r23883,%r23884,%r23885,%r23886,%r23887,%r23888,%r23889,%r23890,%r23891,%r23892,%r23893,%r23894,%r23895,%r23896,%r23897,%r23898,%r23899,%r23900}, {%r15593,%r15594,%r15595,%r15596}, %rd1, %p65, 1, 1; 2026-02-21T09:19:14.2149027Z // end inline asm 2026-02-21T09:19:14.2149085Z // begin inline asm 2026-02-21T09:19:14.2150569Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23837,%r23838,%r23839,%r23840,%r23841,%r23842,%r23843,%r23844,%r23845,%r23846,%r23847,%r23848,%r23849,%r23850,%r23851,%r23852,%r23853,%r23854,%r23855,%r23856,%r23857,%r23858,%r23859,%r23860,%r23861,%r23862,%r23863,%r23864,%r23865,%r23866,%r23867,%r23868,%r23869,%r23870,%r23871,%r23872,%r23873,%r23874,%r23875,%r23876,%r23877,%r23878,%r23879,%r23880,%r23881,%r23882,%r23883,%r23884,%r23885,%r23886,%r23887,%r23888,%r23889,%r23890,%r23891,%r23892,%r23893,%r23894,%r23895,%r23896,%r23897,%r23898,%r23899,%r23900}, {%r15725,%r15726,%r15727,%r15728}, %rd2, %p65, 1, 1; 2026-02-21T09:19:14.2150775Z // end inline asm 2026-02-21T09:19:14.2150835Z // begin inline asm 2026-02-21T09:19:14.2152386Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r23901,%r23902,%r23903,%r23904,%r23905,%r23906,%r23907,%r23908,%r23909,%r23910,%r23911,%r23912,%r23913,%r23914,%r23915,%r23916,%r23917,%r23918,%r23919,%r23920,%r23921,%r23922,%r23923,%r23924,%r23925,%r23926,%r23927,%r23928,%r23929,%r23930,%r23931,%r23932,%r23933,%r23934,%r23935,%r23936,%r23937,%r23938,%r23939,%r23940,%r23941,%r23942,%r23943,%r23944,%r23945,%r23946,%r23947,%r23948,%r23949,%r23950,%r23951,%r23952,%r23953,%r23954,%r23955,%r23956,%r23957,%r23958,%r23959,%r23960,%r23961,%r23962,%r23963,%r23964}, {%r15857,%r15858,%r15859,%r15860}, %rd1, %p65, 1, 1; 2026-02-21T09:19:14.2152447Z // end inline asm 2026-02-21T09:19:14.2152506Z // begin inline asm 2026-02-21T09:19:14.2154046Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23901,%r23902,%r23903,%r23904,%r23905,%r23906,%r23907,%r23908,%r23909,%r23910,%r23911,%r23912,%r23913,%r23914,%r23915,%r23916,%r23917,%r23918,%r23919,%r23920,%r23921,%r23922,%r23923,%r23924,%r23925,%r23926,%r23927,%r23928,%r23929,%r23930,%r23931,%r23932,%r23933,%r23934,%r23935,%r23936,%r23937,%r23938,%r23939,%r23940,%r23941,%r23942,%r23943,%r23944,%r23945,%r23946,%r23947,%r23948,%r23949,%r23950,%r23951,%r23952,%r23953,%r23954,%r23955,%r23956,%r23957,%r23958,%r23959,%r23960,%r23961,%r23962,%r23963,%r23964}, {%r15989,%r15990,%r15991,%r15992}, %rd2, %p65, 1, 1; 2026-02-21T09:19:14.2154111Z // end inline asm 2026-02-21T09:19:14.2154167Z // begin inline asm 2026-02-21T09:19:14.2155641Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r23965,%r23966,%r23967,%r23968,%r23969,%r23970,%r23971,%r23972,%r23973,%r23974,%r23975,%r23976,%r23977,%r23978,%r23979,%r23980,%r23981,%r23982,%r23983,%r23984,%r23985,%r23986,%r23987,%r23988,%r23989,%r23990,%r23991,%r23992,%r23993,%r23994,%r23995,%r23996,%r23997,%r23998,%r23999,%r24000,%r24001,%r24002,%r24003,%r24004,%r24005,%r24006,%r24007,%r24008,%r24009,%r24010,%r24011,%r24012,%r24013,%r24014,%r24015,%r24016,%r24017,%r24018,%r24019,%r24020,%r24021,%r24022,%r24023,%r24024,%r24025,%r24026,%r24027,%r24028}, {%r16121,%r16122,%r16123,%r16124}, %rd1, %p65, 1, 1; 2026-02-21T09:19:14.2155704Z // end inline asm 2026-02-21T09:19:14.2155761Z // begin inline asm 2026-02-21T09:19:14.2157353Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23965,%r23966,%r23967,%r23968,%r23969,%r23970,%r23971,%r23972,%r23973,%r23974,%r23975,%r23976,%r23977,%r23978,%r23979,%r23980,%r23981,%r23982,%r23983,%r23984,%r23985,%r23986,%r23987,%r23988,%r23989,%r23990,%r23991,%r23992,%r23993,%r23994,%r23995,%r23996,%r23997,%r23998,%r23999,%r24000,%r24001,%r24002,%r24003,%r24004,%r24005,%r24006,%r24007,%r24008,%r24009,%r24010,%r24011,%r24012,%r24013,%r24014,%r24015,%r24016,%r24017,%r24018,%r24019,%r24020,%r24021,%r24022,%r24023,%r24024,%r24025,%r24026,%r24027,%r24028}, {%r16253,%r16254,%r16255,%r16256}, %rd2, %p65, 1, 1; 2026-02-21T09:19:14.2157419Z // end inline asm 2026-02-21T09:19:14.2157493Z wgmma.commit_group.sync.aligned; 2026-02-21T09:19:14.2157555Z mov.b32 %r18089, 0; 2026-02-21T09:19:14.2157620Z mov.b32 %r16513, %r2973; 2026-02-21T09:19:14.2157681Z mov.b32 %r16514, %r18089; 2026-02-21T09:19:14.2157742Z mov.b32 %r16515, %r18089; 2026-02-21T09:19:14.2157803Z // begin inline asm 2026-02-21T09:19:14.2162962Z // wait for regs: 
%r23773,%r23774,%r23775,%r23776,%r23777,%r23778,%r23779,%r23780,%r23781,%r23782,%r23783,%r23784,%r23785,%r23786,%r23787,%r23788,%r23789,%r23790,%r23791,%r23792,%r23793,%r23794,%r23795,%r23796,%r23797,%r23798,%r23799,%r23800,%r23801,%r23802,%r23803,%r23804,%r23805,%r23806,%r23807,%r23808,%r23809,%r23810,%r23811,%r23812,%r23813,%r23814,%r23815,%r23816,%r23817,%r23818,%r23819,%r23820,%r23821,%r23822,%r23823,%r23824,%r23825,%r23826,%r23827,%r23828,%r23829,%r23830,%r23831,%r23832,%r23833,%r23834,%r23835,%r23836,%r23837,%r23838,%r23839,%r23840,%r23841,%r23842,%r23843,%r23844,%r23845,%r23846,%r23847,%r23848,%r23849,%r23850,%r23851,%r23852,%r23853,%r23854,%r23855,%r23856,%r23857,%r23858,%r23859,%r23860,%r23861,%r23862,%r23863,%r23864,%r23865,%r23866,%r23867,%r23868,%r23869,%r23870,%r23871,%r23872,%r23873,%r23874,%r23875,%r23876,%r23877,%r23878,%r23879,%r23880,%r23881,%r23882,%r23883,%r23884,%r23885,%r23886,%r23887,%r23888,%r23889,%r23890,%r23891,%r23892,%r23893,%r23894,%r23895,%r23896,%r23897,%r23898,%r23899,%r23900,%r23901,%r23902,%r23903,%r23904,%r23905,%r23906,%r23907,%r23908,%r23909,%r23910,%r23911,%r23912,%r23913,%r23914,%r23915,%r23916,%r23917,%r23918,%r23919,%r23920,%r23921,%r23922,%r23923,%r23924,%r23925,%r23926,%r23927,%r23928,%r23929,%r23930,%r23931,%r23932,%r23933,%r23934,%r23935,%r23936,%r23937,%r23938,%r23939,%r23940,%r23941,%r23942,%r23943,%r23944,%r23945,%r23946,%r23947,%r23948,%r23949,%r23950,%r23951,%r23952,%r23953,%r23954,%r23955,%r23956,%r23957,%r23958,%r23959,%r23960,%r23961,%r23962,%r23963,%r23964,%r23965,%r23966,%r23967,%r23968,%r23969,%r23970,%r23971,%r23972,%r23973,%r23974,%r23975,%r23976,%r23977,%r23978,%r23979,%r23980,%r23981,%r23982,%r23983,%r23984,%r23985,%r23986,%r23987,%r23988,%r23989,%r23990,%r23991,%r23992,%r23993,%r23994,%r23995,%r23996,%r23997,%r23998,%r23999,%r24000,%r24001,%r24002,%r24003,%r24004,%r24005,%r24006,%r24007,%r24008,%r24009,%r24010,%r24011,%r24012,%r24013,%r24014,%r24015,%r24016,%r24017,%r24018,%r24019,%r24020,%r24021,%r24022,
%r24023,%r24024,%r24025,%r24026,%r24027,%r24028,%r16513,%r16514,%r16515 2026-02-21T09:19:14.2163164Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:19:14.2163226Z // end inline asm 2026-02-21T09:19:14.2163281Z $L__tmp14: 2026-02-21T09:19:14.2163504Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2163571Z add.s32 %r18406, %r6453, %r18386; 2026-02-21T09:19:14.2163782Z .loc 1 55 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:55:32 2026-02-21T09:19:14.2163847Z add.s32 %r18407, %r18406, %r157; 2026-02-21T09:19:14.2163914Z ld.shared.b16 %rs505, [%r18407]; 2026-02-21T09:19:14.2163987Z ld.shared.b16 %rs506, [%r18407+256]; 2026-02-21T09:19:14.2164054Z ld.shared.b16 %rs507, [%r18407+16]; 2026-02-21T09:19:14.2164120Z ld.shared.b16 %rs508, [%r18407+272]; 2026-02-21T09:19:14.2164192Z ld.shared.b16 %rs509, [%r18407+4096]; 2026-02-21T09:19:14.2164260Z ld.shared.b16 %rs510, [%r18407+4352]; 2026-02-21T09:19:14.2164325Z ld.shared.b16 %rs511, [%r18407+4112]; 2026-02-21T09:19:14.2164391Z ld.shared.b16 %rs512, [%r18407+4368]; 2026-02-21T09:19:14.2164459Z ld.shared.b16 %rs513, [%r18407+8192]; 2026-02-21T09:19:14.2164526Z ld.shared.b16 %rs514, [%r18407+8448]; 2026-02-21T09:19:14.2164590Z ld.shared.b16 %rs515, [%r18407+8208]; 2026-02-21T09:19:14.2164669Z ld.shared.b16 %rs516, [%r18407+8464]; 2026-02-21T09:19:14.2164743Z ld.shared.b16 %rs517, [%r18407+12288]; 2026-02-21T09:19:14.2164810Z ld.shared.b16 %rs518, [%r18407+12544]; 2026-02-21T09:19:14.2164877Z ld.shared.b16 %rs519, [%r18407+12304]; 2026-02-21T09:19:14.2164947Z ld.shared.b16 %rs520, [%r18407+12560]; 2026-02-21T09:19:14.2165009Z add.s32 %r18408, %r18406, %r158; 2026-02-21T09:19:14.2165074Z ld.shared.b16 %rs521, [%r18408]; 2026-02-21T09:19:14.2165141Z ld.shared.b16 %rs522, [%r18408+256]; 2026-02-21T09:19:14.2165207Z ld.shared.b16 %rs523, [%r18408+16]; 2026-02-21T09:19:14.2165273Z ld.shared.b16 %rs524, [%r18408+272]; 2026-02-21T09:19:14.2165341Z ld.shared.b16 %rs525, 
[%r18408+4096]; 2026-02-21T09:19:14.2165406Z ld.shared.b16 %rs526, [%r18408+4352]; 2026-02-21T09:19:14.2165472Z ld.shared.b16 %rs527, [%r18408+4112]; 2026-02-21T09:19:14.2165538Z ld.shared.b16 %rs528, [%r18408+4368]; 2026-02-21T09:19:14.2165604Z ld.shared.b16 %rs529, [%r18408+8192]; 2026-02-21T09:19:14.2165754Z ld.shared.b16 %rs530, [%r18408+8448]; 2026-02-21T09:19:14.2165823Z ld.shared.b16 %rs531, [%r18408+8208]; 2026-02-21T09:19:14.2165888Z ld.shared.b16 %rs532, [%r18408+8464]; 2026-02-21T09:19:14.2165954Z ld.shared.b16 %rs533, [%r18408+12288]; 2026-02-21T09:19:14.2166071Z ld.shared.b16 %rs534, [%r18408+12544]; 2026-02-21T09:19:14.2166136Z ld.shared.b16 %rs535, [%r18408+12304]; 2026-02-21T09:19:14.2166201Z ld.shared.b16 %rs536, [%r18408+12560]; 2026-02-21T09:19:14.2166264Z cvt.f32.bf16 %r16903, %rs505; 2026-02-21T09:19:14.2166327Z cvt.f32.bf16 %r16904, %rs506; 2026-02-21T09:19:14.2166388Z cvt.f32.bf16 %r16905, %rs521; 2026-02-21T09:19:14.2166728Z cvt.f32.bf16 %r16906, %rs522; 2026-02-21T09:19:14.2166799Z cvt.f32.bf16 %r17035, %rs507; 2026-02-21T09:19:14.2166940Z cvt.f32.bf16 %r17036, %rs508; 2026-02-21T09:19:14.2167002Z cvt.f32.bf16 %r17037, %rs523; 2026-02-21T09:19:14.2167062Z cvt.f32.bf16 %r17038, %rs524; 2026-02-21T09:19:14.2167124Z cvt.f32.bf16 %r17167, %rs509; 2026-02-21T09:19:14.2167185Z cvt.f32.bf16 %r17168, %rs510; 2026-02-21T09:19:14.2167243Z cvt.f32.bf16 %r17169, %rs525; 2026-02-21T09:19:14.2167306Z cvt.f32.bf16 %r17170, %rs526; 2026-02-21T09:19:14.2167367Z cvt.f32.bf16 %r17299, %rs511; 2026-02-21T09:19:14.2167428Z cvt.f32.bf16 %r17300, %rs512; 2026-02-21T09:19:14.2167490Z cvt.f32.bf16 %r17301, %rs527; 2026-02-21T09:19:14.2167553Z cvt.f32.bf16 %r17302, %rs528; 2026-02-21T09:19:14.2167613Z cvt.f32.bf16 %r17431, %rs513; 2026-02-21T09:19:14.2167673Z cvt.f32.bf16 %r17432, %rs514; 2026-02-21T09:19:14.2167801Z cvt.f32.bf16 %r17433, %rs529; 2026-02-21T09:19:14.2167863Z cvt.f32.bf16 %r17434, %rs530; 2026-02-21T09:19:14.2167922Z cvt.f32.bf16 %r17563, %rs515; 
2026-02-21T09:19:14.2167983Z cvt.f32.bf16 %r17564, %rs516; 2026-02-21T09:19:14.2168043Z cvt.f32.bf16 %r17565, %rs531; 2026-02-21T09:19:14.2168102Z cvt.f32.bf16 %r17566, %rs532; 2026-02-21T09:19:14.2168161Z cvt.f32.bf16 %r17695, %rs517; 2026-02-21T09:19:14.2168223Z cvt.f32.bf16 %r17696, %rs518; 2026-02-21T09:19:14.2168283Z cvt.f32.bf16 %r17697, %rs533; 2026-02-21T09:19:14.2168341Z cvt.f32.bf16 %r17698, %rs534; 2026-02-21T09:19:14.2168402Z cvt.f32.bf16 %r17827, %rs519; 2026-02-21T09:19:14.2168460Z cvt.f32.bf16 %r17828, %rs520; 2026-02-21T09:19:14.2168519Z cvt.f32.bf16 %r17829, %rs535; 2026-02-21T09:19:14.2168578Z cvt.f32.bf16 %r17830, %rs536; 2026-02-21T09:19:14.2168790Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2168852Z add.s32 %r18409, %r18392, 177152; 2026-02-21T09:19:14.2169053Z .loc 1 70 45 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:70:45 2026-02-21T09:19:14.2169128Z add.s32 %r18410, %r18409, %r22973; 2026-02-21T09:19:14.2169195Z ld.shared.b8 %rs537, [%r18410]; 2026-02-21T09:19:14.2169261Z ld.shared.b8 %rs538, [%r18410+128]; 2026-02-21T09:19:14.2169333Z ld.shared.b8 %rs539, [%r18410+256]; 2026-02-21T09:19:14.2169407Z ld.shared.b8 %rs540, [%r18410+384]; 2026-02-21T09:19:14.2169475Z ld.shared.b8 %rs541, [%r18410+512]; 2026-02-21T09:19:14.2169539Z ld.shared.b8 %rs542, [%r18410+640]; 2026-02-21T09:19:14.2169605Z ld.shared.b8 %rs543, [%r18410+768]; 2026-02-21T09:19:14.2169665Z add.s32 %r18411, %r18409, %r22974; 2026-02-21T09:19:14.2169729Z ld.shared.b8 %rs544, [%r18411]; 2026-02-21T09:19:14.2169931Z .loc 1 60 28 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:60:28 2026-02-21T09:19:14.2169994Z shl.b16 %rs545, %rs537, 4; 2026-02-21T09:19:14.2170053Z shl.b16 %rs546, %rs538, 4; 2026-02-21T09:19:14.2170116Z shl.b16 %rs547, %rs539, 4; 2026-02-21T09:19:14.2170176Z shl.b16 %rs548, %rs540, 4; 2026-02-21T09:19:14.2170236Z shl.b16 %rs549, %rs541, 4; 2026-02-21T09:19:14.2170297Z shl.b16 %rs550, 
%rs542, 4; 2026-02-21T09:19:14.2170359Z shl.b16 %rs551, %rs543, 4; 2026-02-21T09:19:14.2170420Z shl.b16 %rs552, %rs544, 4; 2026-02-21T09:19:14.2170619Z .loc 1 75 58 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:75:58 2026-02-21T09:19:14.2170774Z selp.b16 %rs553, %rs545, %rs537, %p110; 2026-02-21T09:19:14.2170835Z cvt.s16.s8 %rs554, %rs553; 2026-02-21T09:19:14.2170894Z shr.s16 %rs555, %rs554, 4; 2026-02-21T09:19:14.2170964Z selp.b16 %rs556, %rs546, %rs538, %p110; 2026-02-21T09:19:14.2171087Z cvt.s16.s8 %rs557, %rs556; 2026-02-21T09:19:14.2171146Z shr.s16 %rs558, %rs557, 4; 2026-02-21T09:19:14.2171214Z selp.b16 %rs559, %rs547, %rs539, %p110; 2026-02-21T09:19:14.2171276Z cvt.s16.s8 %rs560, %rs559; 2026-02-21T09:19:14.2171337Z shr.s16 %rs561, %rs560, 4; 2026-02-21T09:19:14.2171404Z selp.b16 %rs562, %rs548, %rs540, %p110; 2026-02-21T09:19:14.2171464Z cvt.s16.s8 %rs563, %rs562; 2026-02-21T09:19:14.2171526Z shr.s16 %rs564, %rs563, 4; 2026-02-21T09:19:14.2171645Z selp.b16 %rs565, %rs549, %rs541, %p110; 2026-02-21T09:19:14.2171707Z cvt.s16.s8 %rs566, %rs565; 2026-02-21T09:19:14.2171769Z shr.s16 %rs567, %rs566, 4; 2026-02-21T09:19:14.2171839Z selp.b16 %rs568, %rs550, %rs542, %p110; 2026-02-21T09:19:14.2171898Z cvt.s16.s8 %rs569, %rs568; 2026-02-21T09:19:14.2171960Z shr.s16 %rs570, %rs569, 4; 2026-02-21T09:19:14.2172027Z selp.b16 %rs571, %rs551, %rs543, %p110; 2026-02-21T09:19:14.2172090Z cvt.s16.s8 %rs572, %rs571; 2026-02-21T09:19:14.2172151Z shr.s16 %rs573, %rs572, 4; 2026-02-21T09:19:14.2172221Z selp.b16 %rs574, %rs552, %rs544, %p110; 2026-02-21T09:19:14.2172280Z cvt.s16.s8 %rs575, %rs574; 2026-02-21T09:19:14.2172339Z shr.s16 %rs576, %rs575, 4; 2026-02-21T09:19:14.2172592Z .loc 1 80 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:80:32 2026-02-21T09:19:14.2172671Z cvt.rn.f32.s16 %r18412, %rs555; 2026-02-21T09:19:14.2172736Z cvt.rn.f32.s16 %r18413, %rs558; 2026-02-21T09:19:14.2172800Z cvt.rn.f32.s16 %r18414, %rs561; 2026-02-21T09:19:14.2172864Z 
cvt.rn.f32.s16 %r18415, %rs564; 2026-02-21T09:19:14.2172924Z cvt.rn.f32.s16 %r18416, %rs567; 2026-02-21T09:19:14.2172984Z cvt.rn.f32.s16 %r18417, %rs570; 2026-02-21T09:19:14.2173051Z cvt.rn.f32.s16 %r18418, %rs573; 2026-02-21T09:19:14.2173110Z cvt.rn.f32.s16 %r18419, %rs576; 2026-02-21T09:19:14.2173165Z bar.sync 0; 2026-02-21T09:19:14.2173232Z st.shared.b32 [%r161], %r18412; 2026-02-21T09:19:14.2173297Z st.shared.b32 [%r161+8], %r18413; 2026-02-21T09:19:14.2173359Z st.shared.b32 [%r162], %r18414; 2026-02-21T09:19:14.2173422Z st.shared.b32 [%r162+8], %r18415; 2026-02-21T09:19:14.2173486Z st.shared.b32 [%r163], %r18416; 2026-02-21T09:19:14.2173549Z st.shared.b32 [%r163+8], %r18417; 2026-02-21T09:19:14.2173616Z st.shared.b32 [%r164], %r18418; 2026-02-21T09:19:14.2173681Z st.shared.b32 [%r164+8], %r18419; 2026-02-21T09:19:14.2173736Z $L__tmp15: 2026-02-21T09:19:14.2174014Z .loc 2 291 36 // standard.py:291:36 @[ co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:87:40 ] 2026-02-21T09:19:14.2174075Z // begin inline asm 2026-02-21T09:19:14.2174155Z fence.proxy.async.shared::cta; 2026-02-21T09:19:14.2174213Z // end inline asm 2026-02-21T09:19:14.2174267Z bar.sync 0; 2026-02-21T09:19:14.2174341Z wgmma.fence.sync.aligned; 2026-02-21T09:19:14.2174399Z // begin inline asm 2026-02-21T09:19:14.2175899Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23773,%r23774,%r23775,%r23776,%r23777,%r23778,%r23779,%r23780,%r23781,%r23782,%r23783,%r23784,%r23785,%r23786,%r23787,%r23788,%r23789,%r23790,%r23791,%r23792,%r23793,%r23794,%r23795,%r23796,%r23797,%r23798,%r23799,%r23800,%r23801,%r23802,%r23803,%r23804,%r23805,%r23806,%r23807,%r23808,%r23809,%r23810,%r23811,%r23812,%r23813,%r23814,%r23815,%r23816,%r23817,%r23818,%r23819,%r23820,%r23821,%r23822,%r23823,%r23824,%r23825,%r23826,%r23827,%r23828,%r23829,%r23830,%r23831,%r23832,%r23833,%r23834,%r23835,%r23836}, {%r16903,%r16904,%r16905,%r16906}, %rd1, %p65, 1, 1; 2026-02-21T09:19:14.2175972Z // end inline asm 
2026-02-21T09:19:14.2176033Z // begin inline asm 2026-02-21T09:19:14.2177648Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23773,%r23774,%r23775,%r23776,%r23777,%r23778,%r23779,%r23780,%r23781,%r23782,%r23783,%r23784,%r23785,%r23786,%r23787,%r23788,%r23789,%r23790,%r23791,%r23792,%r23793,%r23794,%r23795,%r23796,%r23797,%r23798,%r23799,%r23800,%r23801,%r23802,%r23803,%r23804,%r23805,%r23806,%r23807,%r23808,%r23809,%r23810,%r23811,%r23812,%r23813,%r23814,%r23815,%r23816,%r23817,%r23818,%r23819,%r23820,%r23821,%r23822,%r23823,%r23824,%r23825,%r23826,%r23827,%r23828,%r23829,%r23830,%r23831,%r23832,%r23833,%r23834,%r23835,%r23836}, {%r17035,%r17036,%r17037,%r17038}, %rd2, %p65, 1, 1; 2026-02-21T09:19:14.2177855Z // end inline asm 2026-02-21T09:19:14.2177916Z // begin inline asm 2026-02-21T09:19:14.2179485Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23837,%r23838,%r23839,%r23840,%r23841,%r23842,%r23843,%r23844,%r23845,%r23846,%r23847,%r23848,%r23849,%r23850,%r23851,%r23852,%r23853,%r23854,%r23855,%r23856,%r23857,%r23858,%r23859,%r23860,%r23861,%r23862,%r23863,%r23864,%r23865,%r23866,%r23867,%r23868,%r23869,%r23870,%r23871,%r23872,%r23873,%r23874,%r23875,%r23876,%r23877,%r23878,%r23879,%r23880,%r23881,%r23882,%r23883,%r23884,%r23885,%r23886,%r23887,%r23888,%r23889,%r23890,%r23891,%r23892,%r23893,%r23894,%r23895,%r23896,%r23897,%r23898,%r23899,%r23900}, {%r17167,%r17168,%r17169,%r17170}, %rd1, %p65, 1, 1; 2026-02-21T09:19:14.2179552Z // end inline asm 2026-02-21T09:19:14.2179611Z // begin inline asm 2026-02-21T09:19:14.2181172Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r23837,%r23838,%r23839,%r23840,%r23841,%r23842,%r23843,%r23844,%r23845,%r23846,%r23847,%r23848,%r23849,%r23850,%r23851,%r23852,%r23853,%r23854,%r23855,%r23856,%r23857,%r23858,%r23859,%r23860,%r23861,%r23862,%r23863,%r23864,%r23865,%r23866,%r23867,%r23868,%r23869,%r23870,%r23871,%r23872,%r23873,%r23874,%r23875,%r23876,%r23877,%r23878,%r23879,%r23880,%r23881,%r23882,%r23883,%r23884,%r23885,%r23886,%r23887,%r23888,%r23889,%r23890,%r23891,%r23892,%r23893,%r23894,%r23895,%r23896,%r23897,%r23898,%r23899,%r23900}, {%r17299,%r17300,%r17301,%r17302}, %rd2, %p65, 1, 1; 2026-02-21T09:19:14.2181234Z // end inline asm 2026-02-21T09:19:14.2181293Z // begin inline asm 2026-02-21T09:19:14.2182773Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23901,%r23902,%r23903,%r23904,%r23905,%r23906,%r23907,%r23908,%r23909,%r23910,%r23911,%r23912,%r23913,%r23914,%r23915,%r23916,%r23917,%r23918,%r23919,%r23920,%r23921,%r23922,%r23923,%r23924,%r23925,%r23926,%r23927,%r23928,%r23929,%r23930,%r23931,%r23932,%r23933,%r23934,%r23935,%r23936,%r23937,%r23938,%r23939,%r23940,%r23941,%r23942,%r23943,%r23944,%r23945,%r23946,%r23947,%r23948,%r23949,%r23950,%r23951,%r23952,%r23953,%r23954,%r23955,%r23956,%r23957,%r23958,%r23959,%r23960,%r23961,%r23962,%r23963,%r23964}, {%r17431,%r17432,%r17433,%r17434}, %rd1, %p65, 1, 1; 2026-02-21T09:19:14.2182831Z // end inline asm 2026-02-21T09:19:14.2182892Z // begin inline asm 2026-02-21T09:19:14.2184373Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r23901,%r23902,%r23903,%r23904,%r23905,%r23906,%r23907,%r23908,%r23909,%r23910,%r23911,%r23912,%r23913,%r23914,%r23915,%r23916,%r23917,%r23918,%r23919,%r23920,%r23921,%r23922,%r23923,%r23924,%r23925,%r23926,%r23927,%r23928,%r23929,%r23930,%r23931,%r23932,%r23933,%r23934,%r23935,%r23936,%r23937,%r23938,%r23939,%r23940,%r23941,%r23942,%r23943,%r23944,%r23945,%r23946,%r23947,%r23948,%r23949,%r23950,%r23951,%r23952,%r23953,%r23954,%r23955,%r23956,%r23957,%r23958,%r23959,%r23960,%r23961,%r23962,%r23963,%r23964}, {%r17563,%r17564,%r17565,%r17566}, %rd2, %p65, 1, 1; 2026-02-21T09:19:14.2184437Z // end inline asm 2026-02-21T09:19:14.2184494Z // begin inline asm 2026-02-21T09:19:14.2185983Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r23965,%r23966,%r23967,%r23968,%r23969,%r23970,%r23971,%r23972,%r23973,%r23974,%r23975,%r23976,%r23977,%r23978,%r23979,%r23980,%r23981,%r23982,%r23983,%r23984,%r23985,%r23986,%r23987,%r23988,%r23989,%r23990,%r23991,%r23992,%r23993,%r23994,%r23995,%r23996,%r23997,%r23998,%r23999,%r24000,%r24001,%r24002,%r24003,%r24004,%r24005,%r24006,%r24007,%r24008,%r24009,%r24010,%r24011,%r24012,%r24013,%r24014,%r24015,%r24016,%r24017,%r24018,%r24019,%r24020,%r24021,%r24022,%r24023,%r24024,%r24025,%r24026,%r24027,%r24028}, {%r17695,%r17696,%r17697,%r17698}, %rd1, %p65, 1, 1; 2026-02-21T09:19:14.2186093Z // end inline asm 2026-02-21T09:19:14.2186193Z // begin inline asm 2026-02-21T09:19:14.2187859Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r23965,%r23966,%r23967,%r23968,%r23969,%r23970,%r23971,%r23972,%r23973,%r23974,%r23975,%r23976,%r23977,%r23978,%r23979,%r23980,%r23981,%r23982,%r23983,%r23984,%r23985,%r23986,%r23987,%r23988,%r23989,%r23990,%r23991,%r23992,%r23993,%r23994,%r23995,%r23996,%r23997,%r23998,%r23999,%r24000,%r24001,%r24002,%r24003,%r24004,%r24005,%r24006,%r24007,%r24008,%r24009,%r24010,%r24011,%r24012,%r24013,%r24014,%r24015,%r24016,%r24017,%r24018,%r24019,%r24020,%r24021,%r24022,%r24023,%r24024,%r24025,%r24026,%r24027,%r24028}, {%r17827,%r17828,%r17829,%r17830}, %rd2, %p65, 1, 1; 2026-02-21T09:19:14.2187924Z // end inline asm 2026-02-21T09:19:14.2188002Z wgmma.commit_group.sync.aligned; 2026-02-21T09:19:14.2188063Z mov.b32 %r18087, %r2973; 2026-02-21T09:19:14.2188125Z mov.b32 %r18088, %r18089; 2026-02-21T09:19:14.2188182Z // begin inline asm 2026-02-21T09:19:14.2193348Z // wait for regs: %r23773,%r23774,%r23775,%r23776,%r23777,%r23778,%r23779,%r23780,%r23781,%r23782,%r23783,%r23784,%r23785,%r23786,%r23787,%r23788,%r23789,%r23790,%r23791,%r23792,%r23793,%r23794,%r23795,%r23796,%r23797,%r23798,%r23799,%r23800,%r23801,%r23802,%r23803,%r23804,%r23805,%r23806,%r23807,%r23808,%r23809,%r23810,%r23811,%r23812,%r23813,%r23814,%r23815,%r23816,%r23817,%r23818,%r23819,%r23820,%r23821,%r23822,%r23823,%r23824,%r23825,%r23826,%r23827,%r23828,%r23829,%r23830,%r23831,%r23832,%r23833,%r23834,%r23835,%r23836,%r23837,%r23838,%r23839,%r23840,%r23841,%r23842,%r23843,%r23844,%r23845,%r23846,%r23847,%r23848,%r23849,%r23850,%r23851,%r23852,%r23853,%r23854,%r23855,%r23856,%r23857,%r23858,%r23859,%r23860,%r23861,%r23862,%r23863,%r23864,%r23865,%r23866,%r23867,%r23868,%r23869,%r23870,%r23871,%r23872,%r23873,%r23874,%r23875,%r23876,%r23877,%r23878,%r23879,%r23880,%r23881,%r23882,%r23883,%r23884,%r23885,%r23886,%r23887,%r23888,%r23889,%r23890,%r23891,%r23892,%r23893,%r23894,%r23895,%r23896,%r23897,%r23898,%r23899,%r23900,%r23901,%r23902,%r23903,%r23904,%r23905,%r23906,%r23907,%r23908,%r23909,%r23910,%r23911,%r2391
2,%r23913,%r23914,%r23915,%r23916,%r23917,%r23918,%r23919,%r23920,%r23921,%r23922,%r23923,%r23924,%r23925,%r23926,%r23927,%r23928,%r23929,%r23930,%r23931,%r23932,%r23933,%r23934,%r23935,%r23936,%r23937,%r23938,%r23939,%r23940,%r23941,%r23942,%r23943,%r23944,%r23945,%r23946,%r23947,%r23948,%r23949,%r23950,%r23951,%r23952,%r23953,%r23954,%r23955,%r23956,%r23957,%r23958,%r23959,%r23960,%r23961,%r23962,%r23963,%r23964,%r23965,%r23966,%r23967,%r23968,%r23969,%r23970,%r23971,%r23972,%r23973,%r23974,%r23975,%r23976,%r23977,%r23978,%r23979,%r23980,%r23981,%r23982,%r23983,%r23984,%r23985,%r23986,%r23987,%r23988,%r23989,%r23990,%r23991,%r23992,%r23993,%r23994,%r23995,%r23996,%r23997,%r23998,%r23999,%r24000,%r24001,%r24002,%r24003,%r24004,%r24005,%r24006,%r24007,%r24008,%r24009,%r24010,%r24011,%r24012,%r24013,%r24014,%r24015,%r24016,%r24017,%r24018,%r24019,%r24020,%r24021,%r24022,%r24023,%r24024,%r24025,%r24026,%r24027,%r24028,%r18087,%r18088,%r18089 2026-02-21T09:19:14.2193441Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:19:14.2193498Z // end inline asm 2026-02-21T09:19:14.2193551Z $L__tmp16: 2026-02-21T09:19:14.2193762Z .loc 1 43 78 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:43:78 2026-02-21T09:19:14.2193831Z add.s32 %r18420, %r23772, 1; 2026-02-21T09:19:14.2193898Z setp.gt.s32 %p84, %r18420, 4; 2026-02-21T09:19:14.2193968Z selp.b32 %r23772, 0, %r18420, %p84; 2026-02-21T09:19:14.2194171Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.2194236Z add.s32 %r18421, %r23769, -16; 2026-02-21T09:19:14.2194300Z add.s64 %rd860, %rd1120, %rd47; 2026-02-21T09:19:14.2194431Z add.s64 %rd842, %rd860, 320; 2026-02-21T09:19:14.2194491Z add.s64 %rd861, %rd1120, %rd46; 2026-02-21T09:19:14.2194552Z add.s64 %rd843, %rd861, 320; 2026-02-21T09:19:14.2194611Z add.s64 %rd862, %rd1120, %rd45; 2026-02-21T09:19:14.2194673Z add.s64 %rd844, %rd862, 320; 2026-02-21T09:19:14.2194796Z add.s64 %rd863, %rd1120, %rd44; 
2026-02-21T09:19:14.2194856Z add.s64 %rd845, %rd863, 320; 2026-02-21T09:19:14.2194919Z add.s64 %rd864, %rd1120, %rd43; 2026-02-21T09:19:14.2194980Z add.s64 %rd846, %rd864, 320; 2026-02-21T09:19:14.2195041Z add.s64 %rd865, %rd1120, %rd42; 2026-02-21T09:19:14.2195103Z add.s64 %rd847, %rd865, 320; 2026-02-21T09:19:14.2195168Z add.s64 %rd866, %rd1120, %rd41; 2026-02-21T09:19:14.2195228Z add.s64 %rd848, %rd866, 320; 2026-02-21T09:19:14.2195347Z mad.wide.s32 %rd849, %r18421, 2, %rd64; 2026-02-21T09:19:14.2195548Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2195620Z shl.b32 %r18422, %r23772, 14; 2026-02-21T09:19:14.2195685Z add.s32 %r18423, %r22967, %r18422; 2026-02-21T09:19:14.2195748Z add.s32 %r18349, %r18423, %r56; 2026-02-21T09:19:14.2195811Z selp.b32 %r18350, 8, 0, %p82; 2026-02-21T09:19:14.2195872Z // begin inline asm 2026-02-21T09:19:14.2196030Z cp.async.ca.shared.global [ %r18349 + 0 ], [ %rd842 + 0 ], 0x8, %r18350; 2026-02-21T09:19:14.2196089Z // end inline asm 2026-02-21T09:19:14.2196150Z add.s32 %r18351, %r18349, 2048; 2026-02-21T09:19:14.2196208Z // begin inline asm 2026-02-21T09:19:14.2196400Z cp.async.ca.shared.global [ %r18351 + 0 ], [ %rd843 + 0 ], 0x8, %r18350; 2026-02-21T09:19:14.2196567Z // end inline asm 2026-02-21T09:19:14.2196631Z add.s32 %r18353, %r18349, 4096; 2026-02-21T09:19:14.2196690Z // begin inline asm 2026-02-21T09:19:14.2196831Z cp.async.ca.shared.global [ %r18353 + 0 ], [ %rd844 + 0 ], 0x8, %r18350; 2026-02-21T09:19:14.2196887Z // end inline asm 2026-02-21T09:19:14.2196945Z add.s32 %r18355, %r18349, 6144; 2026-02-21T09:19:14.2197009Z // begin inline asm 2026-02-21T09:19:14.2197143Z cp.async.ca.shared.global [ %r18355 + 0 ], [ %rd845 + 0 ], 0x8, %r18350; 2026-02-21T09:19:14.2197198Z // end inline asm 2026-02-21T09:19:14.2197259Z add.s32 %r18357, %r18349, 8192; 2026-02-21T09:19:14.2197317Z // begin inline asm 2026-02-21T09:19:14.2197453Z cp.async.ca.shared.global [ %r18357 + 0 ], [ %rd846 
+ 0 ], 0x8, %r18350; 2026-02-21T09:19:14.2197507Z // end inline asm 2026-02-21T09:19:14.2197572Z add.s32 %r18359, %r18349, 10240; 2026-02-21T09:19:14.2197629Z // begin inline asm 2026-02-21T09:19:14.2197764Z cp.async.ca.shared.global [ %r18359 + 0 ], [ %rd847 + 0 ], 0x8, %r18350; 2026-02-21T09:19:14.2197820Z // end inline asm 2026-02-21T09:19:14.2197882Z add.s32 %r18361, %r18349, 12288; 2026-02-21T09:19:14.2197952Z // begin inline asm 2026-02-21T09:19:14.2198090Z cp.async.ca.shared.global [ %r18361 + 0 ], [ %rd848 + 0 ], 0x8, %r18350; 2026-02-21T09:19:14.2198149Z // end inline asm 2026-02-21T09:19:14.2198210Z add.s32 %r18363, %r18349, 14336; 2026-02-21T09:19:14.2198269Z // begin inline asm 2026-02-21T09:19:14.2198408Z cp.async.ca.shared.global [ %r18363 + 0 ], [ %rd849 + 0 ], 0x8, %r18350; 2026-02-21T09:19:14.2198464Z // end inline asm 2026-02-21T09:19:14.2198530Z cp.async.commit_group; 2026-02-21T09:19:14.2198739Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2198805Z add.s32 %r18424, %r23770, -65536; 2026-02-21T09:19:14.2198868Z cvt.s64.s32 %rd867, %r18424; 2026-02-21T09:19:14.2198931Z add.s64 %rd850, %rd65, %rd867; 2026-02-21T09:19:14.2199134Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2199195Z shl.b32 %r18425, %r23772, 10; 2026-02-21T09:19:14.2199257Z add.s32 %r18365, %r66, %r18425; 2026-02-21T09:19:14.2199322Z selp.b32 %r18366, 4, 0, %p82; 2026-02-21T09:19:14.2199379Z // begin inline asm 2026-02-21T09:19:14.2199515Z cp.async.ca.shared.global [ %r18365 + 0 ], [ %rd850 + 0 ], 0x4, %r18366; 2026-02-21T09:19:14.2199654Z // end inline asm 2026-02-21T09:19:14.2199721Z cp.async.commit_group; 2026-02-21T09:19:14.2199917Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.2200054Z add.s64 %rd851, %rd860, 352; 2026-02-21T09:19:14.2200119Z add.s64 %rd852, %rd861, 352; 2026-02-21T09:19:14.2200180Z add.s64 %rd853, %rd862, 
352; 2026-02-21T09:19:14.2200240Z add.s64 %rd854, %rd863, 352; 2026-02-21T09:19:14.2200304Z add.s64 %rd855, %rd864, 352; 2026-02-21T09:19:14.2200364Z add.s64 %rd856, %rd865, 352; 2026-02-21T09:19:14.2200422Z add.s64 %rd857, %rd866, 352; 2026-02-21T09:19:14.2200493Z mad.wide.s32 %rd858, %r23769, 2, %rd64; 2026-02-21T09:19:14.2200755Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2200821Z add.s32 %r18426, %r6453, %r18422; 2026-02-21T09:19:14.2200882Z add.s32 %r18367, %r18426, %r56; 2026-02-21T09:19:14.2200944Z // begin inline asm 2026-02-21T09:19:14.2201081Z cp.async.ca.shared.global [ %r18367 + 0 ], [ %rd851 + 0 ], 0x8, %r18350; 2026-02-21T09:19:14.2201137Z // end inline asm 2026-02-21T09:19:14.2201199Z add.s32 %r18369, %r18367, 2048; 2026-02-21T09:19:14.2201258Z // begin inline asm 2026-02-21T09:19:14.2201391Z cp.async.ca.shared.global [ %r18369 + 0 ], [ %rd852 + 0 ], 0x8, %r18350; 2026-02-21T09:19:14.2201445Z // end inline asm 2026-02-21T09:19:14.2201507Z add.s32 %r18371, %r18367, 4096; 2026-02-21T09:19:14.2201563Z // begin inline asm 2026-02-21T09:19:14.2201761Z cp.async.ca.shared.global [ %r18371 + 0 ], [ %rd853 + 0 ], 0x8, %r18350; 2026-02-21T09:19:14.2201821Z // end inline asm 2026-02-21T09:19:14.2201880Z add.s32 %r18373, %r18367, 6144; 2026-02-21T09:19:14.2201938Z // begin inline asm 2026-02-21T09:19:14.2202072Z cp.async.ca.shared.global [ %r18373 + 0 ], [ %rd854 + 0 ], 0x8, %r18350; 2026-02-21T09:19:14.2202130Z // end inline asm 2026-02-21T09:19:14.2202189Z add.s32 %r18375, %r18367, 8192; 2026-02-21T09:19:14.2202247Z // begin inline asm 2026-02-21T09:19:14.2202389Z cp.async.ca.shared.global [ %r18375 + 0 ], [ %rd855 + 0 ], 0x8, %r18350; 2026-02-21T09:19:14.2202452Z // end inline asm 2026-02-21T09:19:14.2202512Z add.s32 %r18377, %r18367, 10240; 2026-02-21T09:19:14.2202573Z // begin inline asm 2026-02-21T09:19:14.2202708Z cp.async.ca.shared.global [ %r18377 + 0 ], [ %rd856 + 0 ], 0x8, %r18350; 
2026-02-21T09:19:14.2202763Z // end inline asm 2026-02-21T09:19:14.2202824Z add.s32 %r18379, %r18367, 12288; 2026-02-21T09:19:14.2202884Z // begin inline asm 2026-02-21T09:19:14.2203016Z cp.async.ca.shared.global [ %r18379 + 0 ], [ %rd857 + 0 ], 0x8, %r18350; 2026-02-21T09:19:14.2203071Z // end inline asm 2026-02-21T09:19:14.2203132Z add.s32 %r18381, %r18367, 14336; 2026-02-21T09:19:14.2203191Z // begin inline asm 2026-02-21T09:19:14.2203324Z cp.async.ca.shared.global [ %r18381 + 0 ], [ %rd858 + 0 ], 0x8, %r18350; 2026-02-21T09:19:14.2203378Z // end inline asm 2026-02-21T09:19:14.2203448Z cp.async.commit_group; 2026-02-21T09:19:14.2203644Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2203706Z cvt.s64.s32 %rd868, %r23770; 2026-02-21T09:19:14.2203771Z add.s64 %rd859, %rd65, %rd868; 2026-02-21T09:19:14.2203968Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2204029Z add.s32 %r18383, %r76, %r18425; 2026-02-21T09:19:14.2204093Z // begin inline asm 2026-02-21T09:19:14.2204239Z cp.async.ca.shared.global [ %r18383 + 0 ], [ %rd859 + 0 ], 0x4, %r18366; 2026-02-21T09:19:14.2204295Z // end inline asm 2026-02-21T09:19:14.2204359Z cp.async.commit_group; 2026-02-21T09:19:14.2204558Z .loc 1 43 78 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:43:78 2026-02-21T09:19:14.2204619Z add.s32 %r23770, %r23770, 131072; 2026-02-21T09:19:14.2204681Z add.s64 %rd1120, %rd1120, 64; 2026-02-21T09:19:14.2204802Z add.s32 %r23769, %r23769, 32; 2026-02-21T09:19:14.2204867Z setp.lt.u64 %p85, %rd1121, 496; 2026-02-21T09:19:14.2204926Z @%p85 bra $L__BB0_9; 2026-02-21T09:19:14.2205034Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:19:14.2205279Z .loc 1 34 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:34:32 2026-02-21T09:19:14.2205339Z or.b32 %r18715, %r1899, %r9; 2026-02-21T09:19:14.2205533Z .loc 1 36 32 // 
co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:36:32 2026-02-21T09:19:14.2205597Z or.b32 %r18716, %r1900, %r19; 2026-02-21T09:19:14.2205656Z or.b32 %r18717, %r1900, %r20; 2026-02-21T09:19:14.2205715Z or.b32 %r18718, %r1900, %r21; 2026-02-21T09:19:14.2205823Z or.b32 %r18719, %r1900, %r22; 2026-02-21T09:19:14.2205883Z or.b32 %r18720, %r1900, %r23; 2026-02-21T09:19:14.2205941Z or.b32 %r18721, %r1900, %r24; 2026-02-21T09:19:14.2206001Z or.b32 %r18722, %r1900, %r25; 2026-02-21T09:19:14.2206061Z or.b32 %r18723, %r1900, %r26; 2026-02-21T09:19:14.2206119Z or.b32 %r18724, %r1900, %r27; 2026-02-21T09:19:14.2206177Z or.b32 %r18725, %r1900, %r28; 2026-02-21T09:19:14.2206238Z or.b32 %r18726, %r1900, %r29; 2026-02-21T09:19:14.2206296Z or.b32 %r18727, %r1900, %r30; 2026-02-21T09:19:14.2206356Z or.b32 %r18728, %r1900, %r31; 2026-02-21T09:19:14.2206416Z or.b32 %r18729, %r1900, %r32; 2026-02-21T09:19:14.2206587Z or.b32 %r18730, %r1900, %r33; 2026-02-21T09:19:14.2206649Z or.b32 %r18731, %r1900, %r34; 2026-02-21T09:19:14.2206710Z or.b32 %r18732, %r1900, %r35; 2026-02-21T09:19:14.2206845Z or.b32 %r18733, %r1900, %r36; 2026-02-21T09:19:14.2206907Z or.b32 %r18734, %r1900, %r37; 2026-02-21T09:19:14.2206965Z or.b32 %r18735, %r1900, %r38; 2026-02-21T09:19:14.2207027Z or.b32 %r18736, %r1900, %r39; 2026-02-21T09:19:14.2207085Z or.b32 %r18737, %r1900, %r40; 2026-02-21T09:19:14.2207143Z or.b32 %r18738, %r1900, %r41; 2026-02-21T09:19:14.2207201Z or.b32 %r18739, %r1900, %r42; 2026-02-21T09:19:14.2207263Z or.b32 %r18740, %r1900, %r43; 2026-02-21T09:19:14.2207334Z or.b32 %r18741, %r1900, %r44; 2026-02-21T09:19:14.2207392Z or.b32 %r18742, %r1900, %r45; 2026-02-21T09:19:14.2207456Z or.b32 %r18743, %r1900, %r46; 2026-02-21T09:19:14.2207514Z or.b32 %r18744, %r1900, %r47; 2026-02-21T09:19:14.2207574Z or.b32 %r18745, %r1900, %r48; 2026-02-21T09:19:14.2207631Z or.b32 %r18746, %r1900, %r49; 2026-02-21T09:19:14.2207692Z or.b32 %r18747, %r1900, %r50; 2026-02-21T09:19:14.2207888Z .loc 1 43 78 // 
co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:43:78 2026-02-21T09:19:14.2207955Z cp.async.wait_group 0; 2026-02-21T09:19:14.2208012Z bar.sync 0; 2026-02-21T09:19:14.2208208Z .loc 1 90 28 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:90:28 2026-02-21T09:19:14.2208294Z cvt.rn.bf16x2.f32 %r18748, %r23774, %r23773; 2026-02-21T09:19:14.2208376Z cvt.rn.bf16x2.f32 %r18749, %r23776, %r23775; 2026-02-21T09:19:14.2208454Z cvt.rn.bf16x2.f32 %r18750, %r23778, %r23777; 2026-02-21T09:19:14.2208531Z cvt.rn.bf16x2.f32 %r18751, %r23780, %r23779; 2026-02-21T09:19:14.2208605Z cvt.rn.bf16x2.f32 %r18752, %r23782, %r23781; 2026-02-21T09:19:14.2208682Z cvt.rn.bf16x2.f32 %r18753, %r23784, %r23783; 2026-02-21T09:19:14.2208760Z cvt.rn.bf16x2.f32 %r18754, %r23786, %r23785; 2026-02-21T09:19:14.2208835Z cvt.rn.bf16x2.f32 %r18755, %r23788, %r23787; 2026-02-21T09:19:14.2208912Z cvt.rn.bf16x2.f32 %r18756, %r23790, %r23789; 2026-02-21T09:19:14.2208993Z cvt.rn.bf16x2.f32 %r18757, %r23792, %r23791; 2026-02-21T09:19:14.2209068Z cvt.rn.bf16x2.f32 %r18758, %r23794, %r23793; 2026-02-21T09:19:14.2209143Z cvt.rn.bf16x2.f32 %r18759, %r23796, %r23795; 2026-02-21T09:19:14.2209217Z cvt.rn.bf16x2.f32 %r18760, %r23798, %r23797; 2026-02-21T09:19:14.2209294Z cvt.rn.bf16x2.f32 %r18761, %r23800, %r23799; 2026-02-21T09:19:14.2209368Z cvt.rn.bf16x2.f32 %r18762, %r23802, %r23801; 2026-02-21T09:19:14.2209445Z cvt.rn.bf16x2.f32 %r18763, %r23804, %r23803; 2026-02-21T09:19:14.2209596Z cvt.rn.bf16x2.f32 %r18764, %r23806, %r23805; 2026-02-21T09:19:14.2209670Z cvt.rn.bf16x2.f32 %r18765, %r23808, %r23807; 2026-02-21T09:19:14.2209745Z cvt.rn.bf16x2.f32 %r18766, %r23810, %r23809; 2026-02-21T09:19:14.2209817Z cvt.rn.bf16x2.f32 %r18767, %r23812, %r23811; 2026-02-21T09:19:14.2209959Z cvt.rn.bf16x2.f32 %r18768, %r23814, %r23813; 2026-02-21T09:19:14.2210035Z cvt.rn.bf16x2.f32 %r18769, %r23816, %r23815; 2026-02-21T09:19:14.2210109Z cvt.rn.bf16x2.f32 %r18770, %r23818, %r23817; 2026-02-21T09:19:14.2210185Z 
cvt.rn.bf16x2.f32 %r18771, %r23820, %r23819; 2026-02-21T09:19:14.2210266Z cvt.rn.bf16x2.f32 %r18772, %r23822, %r23821; 2026-02-21T09:19:14.2210349Z cvt.rn.bf16x2.f32 %r18773, %r23824, %r23823; 2026-02-21T09:19:14.2210522Z cvt.rn.bf16x2.f32 %r18774, %r23826, %r23825; 2026-02-21T09:19:14.2210599Z cvt.rn.bf16x2.f32 %r18775, %r23828, %r23827; 2026-02-21T09:19:14.2210684Z cvt.rn.bf16x2.f32 %r18776, %r23830, %r23829; 2026-02-21T09:19:14.2210766Z cvt.rn.bf16x2.f32 %r18777, %r23832, %r23831; 2026-02-21T09:19:14.2210841Z cvt.rn.bf16x2.f32 %r18778, %r23834, %r23833; 2026-02-21T09:19:14.2210917Z cvt.rn.bf16x2.f32 %r18779, %r23836, %r23835; 2026-02-21T09:19:14.2210992Z cvt.rn.bf16x2.f32 %r18780, %r23838, %r23837; 2026-02-21T09:19:14.2211067Z cvt.rn.bf16x2.f32 %r18781, %r23840, %r23839; 2026-02-21T09:19:14.2211140Z cvt.rn.bf16x2.f32 %r18782, %r23842, %r23841; 2026-02-21T09:19:14.2211216Z cvt.rn.bf16x2.f32 %r18783, %r23844, %r23843; 2026-02-21T09:19:14.2211290Z cvt.rn.bf16x2.f32 %r18784, %r23846, %r23845; 2026-02-21T09:19:14.2211416Z cvt.rn.bf16x2.f32 %r18785, %r23848, %r23847; 2026-02-21T09:19:14.2211497Z cvt.rn.bf16x2.f32 %r18786, %r23850, %r23849; 2026-02-21T09:19:14.2211572Z cvt.rn.bf16x2.f32 %r18787, %r23852, %r23851; 2026-02-21T09:19:14.2211647Z cvt.rn.bf16x2.f32 %r18788, %r23854, %r23853; 2026-02-21T09:19:14.2211724Z cvt.rn.bf16x2.f32 %r18789, %r23856, %r23855; 2026-02-21T09:19:14.2211799Z cvt.rn.bf16x2.f32 %r18790, %r23858, %r23857; 2026-02-21T09:19:14.2211874Z cvt.rn.bf16x2.f32 %r18791, %r23860, %r23859; 2026-02-21T09:19:14.2211948Z cvt.rn.bf16x2.f32 %r18792, %r23862, %r23861; 2026-02-21T09:19:14.2212026Z cvt.rn.bf16x2.f32 %r18793, %r23864, %r23863; 2026-02-21T09:19:14.2212102Z cvt.rn.bf16x2.f32 %r18794, %r23866, %r23865; 2026-02-21T09:19:14.2212177Z cvt.rn.bf16x2.f32 %r18795, %r23868, %r23867; 2026-02-21T09:19:14.2212253Z cvt.rn.bf16x2.f32 %r18796, %r23870, %r23869; 2026-02-21T09:19:14.2212327Z cvt.rn.bf16x2.f32 %r18797, %r23872, %r23871; 2026-02-21T09:19:14.2212403Z 
cvt.rn.bf16x2.f32 %r18798, %r23874, %r23873; 2026-02-21T09:19:14.2212482Z cvt.rn.bf16x2.f32 %r18799, %r23876, %r23875; 2026-02-21T09:19:14.2212570Z cvt.rn.bf16x2.f32 %r18800, %r23878, %r23877; 2026-02-21T09:19:14.2212646Z cvt.rn.bf16x2.f32 %r18801, %r23880, %r23879; 2026-02-21T09:19:14.2212721Z cvt.rn.bf16x2.f32 %r18802, %r23882, %r23881; 2026-02-21T09:19:14.2212798Z cvt.rn.bf16x2.f32 %r18803, %r23884, %r23883; 2026-02-21T09:19:14.2212875Z cvt.rn.bf16x2.f32 %r18804, %r23886, %r23885; 2026-02-21T09:19:14.2212948Z cvt.rn.bf16x2.f32 %r18805, %r23888, %r23887; 2026-02-21T09:19:14.2213023Z cvt.rn.bf16x2.f32 %r18806, %r23890, %r23889; 2026-02-21T09:19:14.2213097Z cvt.rn.bf16x2.f32 %r18807, %r23892, %r23891; 2026-02-21T09:19:14.2213171Z cvt.rn.bf16x2.f32 %r18808, %r23894, %r23893; 2026-02-21T09:19:14.2213247Z cvt.rn.bf16x2.f32 %r18809, %r23896, %r23895; 2026-02-21T09:19:14.2213323Z cvt.rn.bf16x2.f32 %r18810, %r23898, %r23897; 2026-02-21T09:19:14.2213396Z cvt.rn.bf16x2.f32 %r18811, %r23900, %r23899; 2026-02-21T09:19:14.2213473Z cvt.rn.bf16x2.f32 %r18812, %r23902, %r23901; 2026-02-21T09:19:14.2213550Z cvt.rn.bf16x2.f32 %r18813, %r23904, %r23903; 2026-02-21T09:19:14.2213622Z cvt.rn.bf16x2.f32 %r18814, %r23906, %r23905; 2026-02-21T09:19:14.2213696Z cvt.rn.bf16x2.f32 %r18815, %r23908, %r23907; 2026-02-21T09:19:14.2213774Z cvt.rn.bf16x2.f32 %r18816, %r23910, %r23909; 2026-02-21T09:19:14.2213849Z cvt.rn.bf16x2.f32 %r18817, %r23912, %r23911; 2026-02-21T09:19:14.2213984Z cvt.rn.bf16x2.f32 %r18818, %r23914, %r23913; 2026-02-21T09:19:14.2214057Z cvt.rn.bf16x2.f32 %r18819, %r23916, %r23915; 2026-02-21T09:19:14.2214136Z cvt.rn.bf16x2.f32 %r18820, %r23918, %r23917; 2026-02-21T09:19:14.2214259Z cvt.rn.bf16x2.f32 %r18821, %r23920, %r23919; 2026-02-21T09:19:14.2214332Z cvt.rn.bf16x2.f32 %r18822, %r23922, %r23921; 2026-02-21T09:19:14.2214409Z cvt.rn.bf16x2.f32 %r18823, %r23924, %r23923; 2026-02-21T09:19:14.2214482Z cvt.rn.bf16x2.f32 %r18824, %r23926, %r23925; 2026-02-21T09:19:14.2214557Z 
cvt.rn.bf16x2.f32 %r18825, %r23928, %r23927; 2026-02-21T09:19:14.2214634Z cvt.rn.bf16x2.f32 %r18826, %r23930, %r23929; 2026-02-21T09:19:14.2214712Z cvt.rn.bf16x2.f32 %r18827, %r23932, %r23931; 2026-02-21T09:19:14.2214836Z cvt.rn.bf16x2.f32 %r18828, %r23934, %r23933; 2026-02-21T09:19:14.2214913Z cvt.rn.bf16x2.f32 %r18829, %r23936, %r23935; 2026-02-21T09:19:14.2214993Z cvt.rn.bf16x2.f32 %r18830, %r23938, %r23937; 2026-02-21T09:19:14.2215069Z cvt.rn.bf16x2.f32 %r18831, %r23940, %r23939; 2026-02-21T09:19:14.2215142Z cvt.rn.bf16x2.f32 %r18832, %r23942, %r23941; 2026-02-21T09:19:14.2215219Z cvt.rn.bf16x2.f32 %r18833, %r23944, %r23943; 2026-02-21T09:19:14.2215294Z cvt.rn.bf16x2.f32 %r18834, %r23946, %r23945; 2026-02-21T09:19:14.2215371Z cvt.rn.bf16x2.f32 %r18835, %r23948, %r23947; 2026-02-21T09:19:14.2215448Z cvt.rn.bf16x2.f32 %r18836, %r23950, %r23949; 2026-02-21T09:19:14.2215522Z cvt.rn.bf16x2.f32 %r18837, %r23952, %r23951; 2026-02-21T09:19:14.2215597Z cvt.rn.bf16x2.f32 %r18838, %r23954, %r23953; 2026-02-21T09:19:14.2215717Z cvt.rn.bf16x2.f32 %r18839, %r23956, %r23955; 2026-02-21T09:19:14.2215797Z cvt.rn.bf16x2.f32 %r18840, %r23958, %r23957; 2026-02-21T09:19:14.2215871Z cvt.rn.bf16x2.f32 %r18841, %r23960, %r23959; 2026-02-21T09:19:14.2215948Z cvt.rn.bf16x2.f32 %r18842, %r23962, %r23961; 2026-02-21T09:19:14.2216024Z cvt.rn.bf16x2.f32 %r18843, %r23964, %r23963; 2026-02-21T09:19:14.2216098Z cvt.rn.bf16x2.f32 %r18844, %r23966, %r23965; 2026-02-21T09:19:14.2216177Z cvt.rn.bf16x2.f32 %r18845, %r23968, %r23967; 2026-02-21T09:19:14.2216255Z cvt.rn.bf16x2.f32 %r18846, %r23970, %r23969; 2026-02-21T09:19:14.2216329Z cvt.rn.bf16x2.f32 %r18847, %r23972, %r23971; 2026-02-21T09:19:14.2216404Z cvt.rn.bf16x2.f32 %r18848, %r23974, %r23973; 2026-02-21T09:19:14.2216591Z cvt.rn.bf16x2.f32 %r18849, %r23976, %r23975; 2026-02-21T09:19:14.2216674Z cvt.rn.bf16x2.f32 %r18850, %r23978, %r23977; 2026-02-21T09:19:14.2216748Z cvt.rn.bf16x2.f32 %r18851, %r23980, %r23979; 2026-02-21T09:19:14.2216835Z 
cvt.rn.bf16x2.f32 %r18852, %r23982, %r23981; 2026-02-21T09:19:14.2216918Z cvt.rn.bf16x2.f32 %r18853, %r23984, %r23983; 2026-02-21T09:19:14.2216993Z cvt.rn.bf16x2.f32 %r18854, %r23986, %r23985; 2026-02-21T09:19:14.2217067Z cvt.rn.bf16x2.f32 %r18855, %r23988, %r23987; 2026-02-21T09:19:14.2217145Z cvt.rn.bf16x2.f32 %r18856, %r23990, %r23989; 2026-02-21T09:19:14.2217221Z cvt.rn.bf16x2.f32 %r18857, %r23992, %r23991; 2026-02-21T09:19:14.2217297Z cvt.rn.bf16x2.f32 %r18858, %r23994, %r23993; 2026-02-21T09:19:14.2217372Z cvt.rn.bf16x2.f32 %r18859, %r23996, %r23995; 2026-02-21T09:19:14.2217454Z cvt.rn.bf16x2.f32 %r18860, %r23998, %r23997; 2026-02-21T09:19:14.2217528Z cvt.rn.bf16x2.f32 %r18861, %r24000, %r23999; 2026-02-21T09:19:14.2217604Z cvt.rn.bf16x2.f32 %r18862, %r24002, %r24001; 2026-02-21T09:19:14.2217681Z cvt.rn.bf16x2.f32 %r18863, %r24004, %r24003; 2026-02-21T09:19:14.2217754Z cvt.rn.bf16x2.f32 %r18864, %r24006, %r24005; 2026-02-21T09:19:14.2217830Z cvt.rn.bf16x2.f32 %r18865, %r24008, %r24007; 2026-02-21T09:19:14.2217908Z cvt.rn.bf16x2.f32 %r18866, %r24010, %r24009; 2026-02-21T09:19:14.2217982Z cvt.rn.bf16x2.f32 %r18867, %r24012, %r24011; 2026-02-21T09:19:14.2218059Z cvt.rn.bf16x2.f32 %r18868, %r24014, %r24013; 2026-02-21T09:19:14.2218134Z cvt.rn.bf16x2.f32 %r18869, %r24016, %r24015; 2026-02-21T09:19:14.2218210Z cvt.rn.bf16x2.f32 %r18870, %r24018, %r24017; 2026-02-21T09:19:14.2218285Z cvt.rn.bf16x2.f32 %r18871, %r24020, %r24019; 2026-02-21T09:19:14.2218440Z cvt.rn.bf16x2.f32 %r18872, %r24022, %r24021; 2026-02-21T09:19:14.2218519Z cvt.rn.bf16x2.f32 %r18873, %r24024, %r24023; 2026-02-21T09:19:14.2218593Z cvt.rn.bf16x2.f32 %r18874, %r24026, %r24025; 2026-02-21T09:19:14.2218726Z cvt.rn.bf16x2.f32 %r18875, %r24028, %r24027; 2026-02-21T09:19:14.2218941Z .loc 1 91 43 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:91:43 2026-02-21T09:19:14.2219009Z shl.b32 %r18876, %r18716, 13; 2026-02-21T09:19:14.2219069Z shl.b32 %r18877, %r18717, 13; 
2026-02-21T09:19:14.2219129Z shl.b32 %r18878, %r18718, 13; 2026-02-21T09:19:14.2219191Z shl.b32 %r18879, %r18719, 13; 2026-02-21T09:19:14.2219249Z shl.b32 %r18880, %r18720, 13; 2026-02-21T09:19:14.2219373Z shl.b32 %r18881, %r18721, 13; 2026-02-21T09:19:14.2219438Z shl.b32 %r18882, %r18722, 13; 2026-02-21T09:19:14.2219498Z shl.b32 %r18883, %r18723, 13; 2026-02-21T09:19:14.2219559Z shl.b32 %r18884, %r18724, 13; 2026-02-21T09:19:14.2219620Z shl.b32 %r18885, %r18725, 13; 2026-02-21T09:19:14.2219681Z shl.b32 %r18886, %r18726, 13; 2026-02-21T09:19:14.2219739Z shl.b32 %r18887, %r18727, 13; 2026-02-21T09:19:14.2219810Z shl.b32 %r18888, %r18728, 13; 2026-02-21T09:19:14.2219875Z shl.b32 %r18889, %r18729, 13; 2026-02-21T09:19:14.2219940Z shl.b32 %r18890, %r18730, 13; 2026-02-21T09:19:14.2219998Z shl.b32 %r18891, %r18731, 13; 2026-02-21T09:19:14.2220058Z shl.b32 %r18892, %r18732, 13; 2026-02-21T09:19:14.2220120Z shl.b32 %r18893, %r18733, 13; 2026-02-21T09:19:14.2220177Z shl.b32 %r18894, %r18734, 13; 2026-02-21T09:19:14.2220300Z shl.b32 %r18895, %r18735, 13; 2026-02-21T09:19:14.2220364Z shl.b32 %r18896, %r18736, 13; 2026-02-21T09:19:14.2220423Z shl.b32 %r18897, %r18737, 13; 2026-02-21T09:19:14.2220483Z shl.b32 %r18898, %r18738, 13; 2026-02-21T09:19:14.2220545Z shl.b32 %r18899, %r18739, 13; 2026-02-21T09:19:14.2220603Z shl.b32 %r18900, %r18740, 13; 2026-02-21T09:19:14.2220661Z shl.b32 %r18901, %r18741, 13; 2026-02-21T09:19:14.2220721Z shl.b32 %r18902, %r18742, 13; 2026-02-21T09:19:14.2220781Z shl.b32 %r18903, %r18743, 13; 2026-02-21T09:19:14.2220839Z shl.b32 %r18904, %r18744, 13; 2026-02-21T09:19:14.2220897Z shl.b32 %r18905, %r18745, 13; 2026-02-21T09:19:14.2220959Z shl.b32 %r18906, %r18746, 13; 2026-02-21T09:19:14.2221018Z shl.b32 %r18907, %r18747, 13; 2026-02-21T09:19:14.2221221Z .loc 1 91 50 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:91:50 2026-02-21T09:19:14.2221286Z add.s32 %r18908, %r18876, %r18715; 2026-02-21T09:19:14.2221351Z add.s32 %r18909, %r18877, 
%r18715; 2026-02-21T09:19:14.2221412Z add.s32 %r18910, %r18878, %r18715; 2026-02-21T09:19:14.2221472Z add.s32 %r18911, %r18879, %r18715; 2026-02-21T09:19:14.2221535Z add.s32 %r18912, %r18880, %r18715; 2026-02-21T09:19:14.2221596Z add.s32 %r18913, %r18881, %r18715; 2026-02-21T09:19:14.2221657Z add.s32 %r18914, %r18882, %r18715; 2026-02-21T09:19:14.2221719Z add.s32 %r18915, %r18883, %r18715; 2026-02-21T09:19:14.2221781Z add.s32 %r18916, %r18884, %r18715; 2026-02-21T09:19:14.2221841Z add.s32 %r18917, %r18885, %r18715; 2026-02-21T09:19:14.2221900Z add.s32 %r18918, %r18886, %r18715; 2026-02-21T09:19:14.2221964Z add.s32 %r18919, %r18887, %r18715; 2026-02-21T09:19:14.2222025Z add.s32 %r18920, %r18888, %r18715; 2026-02-21T09:19:14.2222086Z add.s32 %r18921, %r18889, %r18715; 2026-02-21T09:19:14.2222149Z add.s32 %r18922, %r18890, %r18715; 2026-02-21T09:19:14.2222208Z add.s32 %r18923, %r18891, %r18715; 2026-02-21T09:19:14.2222267Z add.s32 %r18924, %r18892, %r18715; 2026-02-21T09:19:14.2222329Z add.s32 %r18925, %r18893, %r18715; 2026-02-21T09:19:14.2222391Z add.s32 %r18926, %r18894, %r18715; 2026-02-21T09:19:14.2222463Z add.s32 %r18927, %r18895, %r18715; 2026-02-21T09:19:14.2222528Z add.s32 %r18928, %r18896, %r18715; 2026-02-21T09:19:14.2222592Z add.s32 %r18929, %r18897, %r18715; 2026-02-21T09:19:14.2222651Z add.s32 %r18930, %r18898, %r18715; 2026-02-21T09:19:14.2222772Z add.s32 %r18931, %r18899, %r18715; 2026-02-21T09:19:14.2222832Z add.s32 %r18932, %r18900, %r18715; 2026-02-21T09:19:14.2222894Z add.s32 %r18933, %r18901, %r18715; 2026-02-21T09:19:14.2222955Z add.s32 %r18934, %r18902, %r18715; 2026-02-21T09:19:14.2223015Z add.s32 %r18935, %r18903, %r18715; 2026-02-21T09:19:14.2223125Z add.s32 %r18936, %r18904, %r18715; 2026-02-21T09:19:14.2223184Z add.s32 %r18937, %r18905, %r18715; 2026-02-21T09:19:14.2223243Z add.s32 %r18938, %r18906, %r18715; 2026-02-21T09:19:14.2223307Z add.s32 %r18939, %r18907, %r18715; 2026-02-21T09:19:14.2223511Z .loc 1 91 22 // 
co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:91:22 2026-02-21T09:19:14.2223585Z mad.wide.s32 %rd869, %r18908, 2, %rd66; 2026-02-21T09:19:14.2223705Z mad.wide.s32 %rd870, %r18909, 2, %rd66; 2026-02-21T09:19:14.2223779Z mad.wide.s32 %rd871, %r18910, 2, %rd66; 2026-02-21T09:19:14.2223845Z mad.wide.s32 %rd872, %r18911, 2, %rd66; 2026-02-21T09:19:14.2223912Z mad.wide.s32 %rd873, %r18912, 2, %rd66; 2026-02-21T09:19:14.2223986Z mad.wide.s32 %rd874, %r18913, 2, %rd66; 2026-02-21T09:19:14.2224053Z mad.wide.s32 %rd875, %r18914, 2, %rd66; 2026-02-21T09:19:14.2224120Z mad.wide.s32 %rd876, %r18915, 2, %rd66; 2026-02-21T09:19:14.2224189Z mad.wide.s32 %rd877, %r18916, 2, %rd66; 2026-02-21T09:19:14.2224258Z mad.wide.s32 %rd878, %r18917, 2, %rd66; 2026-02-21T09:19:14.2224335Z mad.wide.s32 %rd879, %r18918, 2, %rd66; 2026-02-21T09:19:14.2224406Z mad.wide.s32 %rd880, %r18919, 2, %rd66; 2026-02-21T09:19:14.2224476Z mad.wide.s32 %rd881, %r18920, 2, %rd66; 2026-02-21T09:19:14.2224592Z mad.wide.s32 %rd882, %r18921, 2, %rd66; 2026-02-21T09:19:14.2224660Z mad.wide.s32 %rd883, %r18922, 2, %rd66; 2026-02-21T09:19:14.2224729Z mad.wide.s32 %rd884, %r18923, 2, %rd66; 2026-02-21T09:19:14.2224797Z mad.wide.s32 %rd885, %r18924, 2, %rd66; 2026-02-21T09:19:14.2224864Z mad.wide.s32 %rd886, %r18925, 2, %rd66; 2026-02-21T09:19:14.2224931Z mad.wide.s32 %rd887, %r18926, 2, %rd66; 2026-02-21T09:19:14.2225002Z mad.wide.s32 %rd888, %r18927, 2, %rd66; 2026-02-21T09:19:14.2225069Z mad.wide.s32 %rd889, %r18928, 2, %rd66; 2026-02-21T09:19:14.2225135Z mad.wide.s32 %rd890, %r18929, 2, %rd66; 2026-02-21T09:19:14.2225202Z mad.wide.s32 %rd891, %r18930, 2, %rd66; 2026-02-21T09:19:14.2225271Z mad.wide.s32 %rd892, %r18931, 2, %rd66; 2026-02-21T09:19:14.2225339Z mad.wide.s32 %rd893, %r18932, 2, %rd66; 2026-02-21T09:19:14.2225407Z mad.wide.s32 %rd894, %r18933, 2, %rd66; 2026-02-21T09:19:14.2225473Z mad.wide.s32 %rd895, %r18934, 2, %rd66; 2026-02-21T09:19:14.2225540Z mad.wide.s32 %rd896, %r18935, 2, %rd66; 
2026-02-21T09:19:14.2225608Z mad.wide.s32 %rd897, %r18936, 2, %rd66; 2026-02-21T09:19:14.2225679Z mad.wide.s32 %rd898, %r18937, 2, %rd66; 2026-02-21T09:19:14.2225746Z mad.wide.s32 %rd899, %r18938, 2, %rd66; 2026-02-21T09:19:14.2225813Z mad.wide.s32 %rd900, %r18939, 2, %rd66; 2026-02-21T09:19:14.2226016Z .loc 1 91 81 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:91:81 2026-02-21T09:19:14.2226138Z st.shared.v4.b32 [%r165], {%r18748, %r18750, %r18752, %r18754}; 2026-02-21T09:19:14.2226261Z st.shared.v4.b32 [%r165+512], {%r18749, %r18751, %r18753, %r18755}; 2026-02-21T09:19:14.2226375Z st.shared.v4.b32 [%r166], {%r18756, %r18758, %r18760, %r18762}; 2026-02-21T09:19:14.2226607Z st.shared.v4.b32 [%r166+512], {%r18757, %r18759, %r18761, %r18763}; 2026-02-21T09:19:14.2226722Z st.shared.v4.b32 [%r167], {%r18764, %r18766, %r18768, %r18770}; 2026-02-21T09:19:14.2226841Z st.shared.v4.b32 [%r167+512], {%r18765, %r18767, %r18769, %r18771}; 2026-02-21T09:19:14.2226950Z st.shared.v4.b32 [%r168], {%r18772, %r18774, %r18776, %r18778}; 2026-02-21T09:19:14.2227062Z st.shared.v4.b32 [%r168+512], {%r18773, %r18775, %r18777, %r18779}; 2026-02-21T09:19:14.2227121Z bar.sync 0; 2026-02-21T09:19:14.2227184Z // begin inline asm 2026-02-21T09:19:14.2227384Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18427, %r18428, %r18429, %r18430}, [%r6479]; 2026-02-21T09:19:14.2227521Z // end inline asm 2026-02-21T09:19:14.2227581Z // begin inline asm 2026-02-21T09:19:14.2227773Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18432, %r18433, %r18434, %r18435}, [%r6484]; 2026-02-21T09:19:14.2227829Z // end inline asm 2026-02-21T09:19:14.2227983Z // begin inline asm 2026-02-21T09:19:14.2228175Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18437, %r18438, %r18439, %r18440}, [%r6489]; 2026-02-21T09:19:14.2228231Z // end inline asm 2026-02-21T09:19:14.2228288Z // begin inline asm 2026-02-21T09:19:14.2228480Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18442, %r18443, %r18444, %r18445}, [%r6494]; 
2026-02-21T09:19:14.2228621Z // end inline asm 2026-02-21T09:19:14.2228682Z // begin inline asm 2026-02-21T09:19:14.2228943Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18447, %r18448, %r18449, %r18450}, [%r6499]; 2026-02-21T09:19:14.2229002Z // end inline asm 2026-02-21T09:19:14.2229059Z // begin inline asm 2026-02-21T09:19:14.2229255Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18452, %r18453, %r18454, %r18455}, [%r6504]; 2026-02-21T09:19:14.2229314Z // end inline asm 2026-02-21T09:19:14.2229371Z // begin inline asm 2026-02-21T09:19:14.2229557Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18457, %r18458, %r18459, %r18460}, [%r6509]; 2026-02-21T09:19:14.2229617Z // end inline asm 2026-02-21T09:19:14.2229674Z // begin inline asm 2026-02-21T09:19:14.2229859Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18462, %r18463, %r18464, %r18465}, [%r6514]; 2026-02-21T09:19:14.2229915Z // end inline asm 2026-02-21T09:19:14.2229970Z bar.sync 0; 2026-02-21T09:19:14.2230143Z st.shared.v4.b32 [%r165], {%r18780, %r18782, %r18784, %r18786}; 2026-02-21T09:19:14.2230265Z st.shared.v4.b32 [%r165+512], {%r18781, %r18783, %r18785, %r18787}; 2026-02-21T09:19:14.2230378Z st.shared.v4.b32 [%r166], {%r18788, %r18790, %r18792, %r18794}; 2026-02-21T09:19:14.2230493Z st.shared.v4.b32 [%r166+512], {%r18789, %r18791, %r18793, %r18795}; 2026-02-21T09:19:14.2230601Z st.shared.v4.b32 [%r167], {%r18796, %r18798, %r18800, %r18802}; 2026-02-21T09:19:14.2230719Z st.shared.v4.b32 [%r167+512], {%r18797, %r18799, %r18801, %r18803}; 2026-02-21T09:19:14.2230828Z st.shared.v4.b32 [%r168], {%r18804, %r18806, %r18808, %r18810}; 2026-02-21T09:19:14.2230941Z st.shared.v4.b32 [%r168+512], {%r18805, %r18807, %r18809, %r18811}; 2026-02-21T09:19:14.2231001Z bar.sync 0; 2026-02-21T09:19:14.2231058Z // begin inline asm 2026-02-21T09:19:14.2231248Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18467, %r18468, %r18469, %r18470}, [%r6479]; 2026-02-21T09:19:14.2231302Z // end inline asm 2026-02-21T09:19:14.2231364Z // begin 
inline asm 2026-02-21T09:19:14.2231550Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18472, %r18473, %r18474, %r18475}, [%r6484]; 2026-02-21T09:19:14.2231607Z // end inline asm 2026-02-21T09:19:14.2231668Z // begin inline asm 2026-02-21T09:19:14.2231854Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18477, %r18478, %r18479, %r18480}, [%r6489]; 2026-02-21T09:19:14.2231911Z // end inline asm 2026-02-21T09:19:14.2231972Z // begin inline asm 2026-02-21T09:19:14.2232157Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18482, %r18483, %r18484, %r18485}, [%r6494]; 2026-02-21T09:19:14.2232212Z // end inline asm 2026-02-21T09:19:14.2232269Z // begin inline asm 2026-02-21T09:19:14.2232461Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18487, %r18488, %r18489, %r18490}, [%r6499]; 2026-02-21T09:19:14.2232517Z // end inline asm 2026-02-21T09:19:14.2232574Z // begin inline asm 2026-02-21T09:19:14.2232765Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18492, %r18493, %r18494, %r18495}, [%r6504]; 2026-02-21T09:19:14.2232819Z // end inline asm 2026-02-21T09:19:14.2232877Z // begin inline asm 2026-02-21T09:19:14.2233065Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18497, %r18498, %r18499, %r18500}, [%r6509]; 2026-02-21T09:19:14.2233122Z // end inline asm 2026-02-21T09:19:14.2233182Z // begin inline asm 2026-02-21T09:19:14.2233366Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18502, %r18503, %r18504, %r18505}, [%r6514]; 2026-02-21T09:19:14.2233484Z // end inline asm 2026-02-21T09:19:14.2233539Z bar.sync 0; 2026-02-21T09:19:14.2233648Z st.shared.v4.b32 [%r165], {%r18812, %r18814, %r18816, %r18818}; 2026-02-21T09:19:14.2233768Z st.shared.v4.b32 [%r165+512], {%r18813, %r18815, %r18817, %r18819}; 2026-02-21T09:19:14.2233925Z st.shared.v4.b32 [%r166], {%r18820, %r18822, %r18824, %r18826}; 2026-02-21T09:19:14.2234039Z st.shared.v4.b32 [%r166+512], {%r18821, %r18823, %r18825, %r18827}; 2026-02-21T09:19:14.2234151Z st.shared.v4.b32 [%r167], {%r18828, %r18830, %r18832, %r18834}; 
2026-02-21T09:19:14.2234265Z st.shared.v4.b32 [%r167+512], {%r18829, %r18831, %r18833, %r18835}; 2026-02-21T09:19:14.2234372Z st.shared.v4.b32 [%r168], {%r18836, %r18838, %r18840, %r18842}; 2026-02-21T09:19:14.2234545Z st.shared.v4.b32 [%r168+512], {%r18837, %r18839, %r18841, %r18843}; 2026-02-21T09:19:14.2234605Z bar.sync 0; 2026-02-21T09:19:14.2234664Z // begin inline asm 2026-02-21T09:19:14.2234854Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18507, %r18508, %r18509, %r18510}, [%r6479]; 2026-02-21T09:19:14.2234913Z // end inline asm 2026-02-21T09:19:14.2234969Z // begin inline asm 2026-02-21T09:19:14.2235158Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18512, %r18513, %r18514, %r18515}, [%r6484]; 2026-02-21T09:19:14.2235217Z // end inline asm 2026-02-21T09:19:14.2235274Z // begin inline asm 2026-02-21T09:19:14.2235460Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18517, %r18518, %r18519, %r18520}, [%r6489]; 2026-02-21T09:19:14.2235514Z // end inline asm 2026-02-21T09:19:14.2235574Z // begin inline asm 2026-02-21T09:19:14.2235807Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18522, %r18523, %r18524, %r18525}, [%r6494]; 2026-02-21T09:19:14.2235865Z // end inline asm 2026-02-21T09:19:14.2235926Z // begin inline asm 2026-02-21T09:19:14.2236113Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18527, %r18528, %r18529, %r18530}, [%r6499]; 2026-02-21T09:19:14.2236170Z // end inline asm 2026-02-21T09:19:14.2236228Z // begin inline asm 2026-02-21T09:19:14.2236417Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18532, %r18533, %r18534, %r18535}, [%r6504]; 2026-02-21T09:19:14.2236579Z // end inline asm 2026-02-21T09:19:14.2236640Z // begin inline asm 2026-02-21T09:19:14.2236829Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18537, %r18538, %r18539, %r18540}, [%r6509]; 2026-02-21T09:19:14.2236898Z // end inline asm 2026-02-21T09:19:14.2236956Z // begin inline asm 2026-02-21T09:19:14.2237147Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18542, %r18543, %r18544, %r18545}, [%r6514]; 
2026-02-21T09:19:14.2237205Z // end inline asm 2026-02-21T09:19:14.2237259Z bar.sync 0; 2026-02-21T09:19:14.2237369Z st.shared.v4.b32 [%r165], {%r18844, %r18846, %r18848, %r18850}; 2026-02-21T09:19:14.2237487Z st.shared.v4.b32 [%r165+512], {%r18845, %r18847, %r18849, %r18851}; 2026-02-21T09:19:14.2237595Z st.shared.v4.b32 [%r166], {%r18852, %r18854, %r18856, %r18858}; 2026-02-21T09:19:14.2237709Z st.shared.v4.b32 [%r166+512], {%r18853, %r18855, %r18857, %r18859}; 2026-02-21T09:19:14.2237821Z st.shared.v4.b32 [%r167], {%r18860, %r18862, %r18864, %r18866}; 2026-02-21T09:19:14.2237935Z st.shared.v4.b32 [%r167+512], {%r18861, %r18863, %r18865, %r18867}; 2026-02-21T09:19:14.2238043Z st.shared.v4.b32 [%r168], {%r18868, %r18870, %r18872, %r18874}; 2026-02-21T09:19:14.2238159Z st.shared.v4.b32 [%r168+512], {%r18869, %r18871, %r18873, %r18875}; 2026-02-21T09:19:14.2238215Z bar.sync 0; 2026-02-21T09:19:14.2238273Z // begin inline asm 2026-02-21T09:19:14.2238464Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18547, %r18548, %r18549, %r18550}, [%r6479]; 2026-02-21T09:19:14.2238520Z // end inline asm 2026-02-21T09:19:14.2238577Z // begin inline asm 2026-02-21T09:19:14.2238764Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18552, %r18553, %r18554, %r18555}, [%r6484]; 2026-02-21T09:19:14.2238823Z // end inline asm 2026-02-21T09:19:14.2238880Z // begin inline asm 2026-02-21T09:19:14.2239068Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18557, %r18558, %r18559, %r18560}, [%r6489]; 2026-02-21T09:19:14.2239213Z // end inline asm 2026-02-21T09:19:14.2239271Z // begin inline asm 2026-02-21T09:19:14.2239457Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18562, %r18563, %r18564, %r18565}, [%r6494]; 2026-02-21T09:19:14.2239580Z // end inline asm 2026-02-21T09:19:14.2239639Z // begin inline asm 2026-02-21T09:19:14.2239825Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18567, %r18568, %r18569, %r18570}, [%r6499]; 2026-02-21T09:19:14.2239882Z // end inline asm 2026-02-21T09:19:14.2239943Z // begin 
inline asm 2026-02-21T09:19:14.2240134Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18572, %r18573, %r18574, %r18575}, [%r6504]; 2026-02-21T09:19:14.2240191Z // end inline asm 2026-02-21T09:19:14.2240250Z // begin inline asm 2026-02-21T09:19:14.2240501Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18577, %r18578, %r18579, %r18580}, [%r6509]; 2026-02-21T09:19:14.2240559Z // end inline asm 2026-02-21T09:19:14.2240616Z // begin inline asm 2026-02-21T09:19:14.2240807Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r18582, %r18583, %r18584, %r18585}, [%r6514]; 2026-02-21T09:19:14.2240863Z // end inline asm 2026-02-21T09:19:14.2240920Z // begin inline asm 2026-02-21T09:19:14.2241051Z st.global.v4.b32 [ %rd869 + 0 ], { %r18427, %r18428, %r18429, %r18430 }; 2026-02-21T09:19:14.2241110Z // end inline asm 2026-02-21T09:19:14.2241168Z // begin inline asm 2026-02-21T09:19:14.2241289Z st.global.v4.b32 [ %rd870 + 0 ], { %r18432, %r18433, %r18434, %r18435 }; 2026-02-21T09:19:14.2241361Z // end inline asm 2026-02-21T09:19:14.2241419Z // begin inline asm 2026-02-21T09:19:14.2241601Z st.global.v4.b32 [ %rd871 + 0 ], { %r18437, %r18438, %r18439, %r18440 }; 2026-02-21T09:19:14.2241664Z // end inline asm 2026-02-21T09:19:14.2241722Z // begin inline asm 2026-02-21T09:19:14.2241842Z st.global.v4.b32 [ %rd872 + 0 ], { %r18442, %r18443, %r18444, %r18445 }; 2026-02-21T09:19:14.2241900Z // end inline asm 2026-02-21T09:19:14.2241958Z // begin inline asm 2026-02-21T09:19:14.2242078Z st.global.v4.b32 [ %rd873 + 0 ], { %r18447, %r18448, %r18449, %r18450 }; 2026-02-21T09:19:14.2242134Z // end inline asm 2026-02-21T09:19:14.2242194Z // begin inline asm 2026-02-21T09:19:14.2242311Z st.global.v4.b32 [ %rd874 + 0 ], { %r18452, %r18453, %r18454, %r18455 }; 2026-02-21T09:19:14.2242367Z // end inline asm 2026-02-21T09:19:14.2242427Z // begin inline asm 2026-02-21T09:19:14.2242544Z st.global.v4.b32 [ %rd875 + 0 ], { %r18457, %r18458, %r18459, %r18460 }; 2026-02-21T09:19:14.2242601Z // end inline asm 
2026-02-21T09:19:14.2242657Z // begin inline asm 2026-02-21T09:19:14.2242779Z st.global.v4.b32 [ %rd876 + 0 ], { %r18462, %r18463, %r18464, %r18465 }; 2026-02-21T09:19:14.2242834Z // end inline asm 2026-02-21T09:19:14.2242893Z // begin inline asm 2026-02-21T09:19:14.2243014Z st.global.v4.b32 [ %rd877 + 0 ], { %r18467, %r18468, %r18469, %r18470 }; 2026-02-21T09:19:14.2243070Z // end inline asm 2026-02-21T09:19:14.2243127Z // begin inline asm 2026-02-21T09:19:14.2243247Z st.global.v4.b32 [ %rd878 + 0 ], { %r18472, %r18473, %r18474, %r18475 }; 2026-02-21T09:19:14.2243305Z // end inline asm 2026-02-21T09:19:14.2243362Z // begin inline asm 2026-02-21T09:19:14.2243480Z st.global.v4.b32 [ %rd879 + 0 ], { %r18477, %r18478, %r18479, %r18480 }; 2026-02-21T09:19:14.2243540Z // end inline asm 2026-02-21T09:19:14.2243598Z // begin inline asm 2026-02-21T09:19:14.2243716Z st.global.v4.b32 [ %rd880 + 0 ], { %r18482, %r18483, %r18484, %r18485 }; 2026-02-21T09:19:14.2243774Z // end inline asm 2026-02-21T09:19:14.2243831Z // begin inline asm 2026-02-21T09:19:14.2243952Z st.global.v4.b32 [ %rd881 + 0 ], { %r18487, %r18488, %r18489, %r18490 }; 2026-02-21T09:19:14.2244008Z // end inline asm 2026-02-21T09:19:14.2244068Z // begin inline asm 2026-02-21T09:19:14.2244186Z st.global.v4.b32 [ %rd882 + 0 ], { %r18492, %r18493, %r18494, %r18495 }; 2026-02-21T09:19:14.2244242Z // end inline asm 2026-02-21T09:19:14.2244302Z // begin inline asm 2026-02-21T09:19:14.2244420Z st.global.v4.b32 [ %rd883 + 0 ], { %r18497, %r18498, %r18499, %r18500 }; 2026-02-21T09:19:14.2244541Z // end inline asm 2026-02-21T09:19:14.2244602Z // begin inline asm 2026-02-21T09:19:14.2244720Z st.global.v4.b32 [ %rd884 + 0 ], { %r18502, %r18503, %r18504, %r18505 }; 2026-02-21T09:19:14.2244778Z // end inline asm 2026-02-21T09:19:14.2244883Z // begin inline asm 2026-02-21T09:19:14.2245005Z st.global.v4.b32 [ %rd885 + 0 ], { %r18507, %r18508, %r18509, %r18510 }; 2026-02-21T09:19:14.2245063Z // end inline asm 
2026-02-21T09:19:14.2245119Z // begin inline asm 2026-02-21T09:19:14.2245238Z st.global.v4.b32 [ %rd886 + 0 ], { %r18512, %r18513, %r18514, %r18515 }; 2026-02-21T09:19:14.2245294Z // end inline asm 2026-02-21T09:19:14.2245351Z // begin inline asm 2026-02-21T09:19:14.2245515Z st.global.v4.b32 [ %rd887 + 0 ], { %r18517, %r18518, %r18519, %r18520 }; 2026-02-21T09:19:14.2245589Z // end inline asm 2026-02-21T09:19:14.2245648Z // begin inline asm 2026-02-21T09:19:14.2245767Z st.global.v4.b32 [ %rd888 + 0 ], { %r18522, %r18523, %r18524, %r18525 }; 2026-02-21T09:19:14.2245828Z // end inline asm 2026-02-21T09:19:14.2245886Z // begin inline asm 2026-02-21T09:19:14.2246003Z st.global.v4.b32 [ %rd889 + 0 ], { %r18527, %r18528, %r18529, %r18530 }; 2026-02-21T09:19:14.2246062Z // end inline asm 2026-02-21T09:19:14.2246122Z // begin inline asm 2026-02-21T09:19:14.2246239Z st.global.v4.b32 [ %rd890 + 0 ], { %r18532, %r18533, %r18534, %r18535 }; 2026-02-21T09:19:14.2246296Z // end inline asm 2026-02-21T09:19:14.2246356Z // begin inline asm 2026-02-21T09:19:14.2246659Z st.global.v4.b32 [ %rd891 + 0 ], { %r18537, %r18538, %r18539, %r18540 }; 2026-02-21T09:19:14.2246727Z // end inline asm 2026-02-21T09:19:14.2246790Z // begin inline asm 2026-02-21T09:19:14.2246924Z st.global.v4.b32 [ %rd892 + 0 ], { %r18542, %r18543, %r18544, %r18545 }; 2026-02-21T09:19:14.2246985Z // end inline asm 2026-02-21T09:19:14.2247043Z // begin inline asm 2026-02-21T09:19:14.2247170Z st.global.v4.b32 [ %rd893 + 0 ], { %r18547, %r18548, %r18549, %r18550 }; 2026-02-21T09:19:14.2247229Z // end inline asm 2026-02-21T09:19:14.2247288Z // begin inline asm 2026-02-21T09:19:14.2247411Z st.global.v4.b32 [ %rd894 + 0 ], { %r18552, %r18553, %r18554, %r18555 }; 2026-02-21T09:19:14.2247467Z // end inline asm 2026-02-21T09:19:14.2247526Z // begin inline asm 2026-02-21T09:19:14.2247647Z st.global.v4.b32 [ %rd895 + 0 ], { %r18557, %r18558, %r18559, %r18560 }; 2026-02-21T09:19:14.2247702Z // end inline asm 
2026-02-21T09:19:14.2247760Z // begin inline asm 2026-02-21T09:19:14.2247880Z st.global.v4.b32 [ %rd896 + 0 ], { %r18562, %r18563, %r18564, %r18565 }; 2026-02-21T09:19:14.2247940Z // end inline asm 2026-02-21T09:19:14.2248002Z // begin inline asm 2026-02-21T09:19:14.2248122Z st.global.v4.b32 [ %rd897 + 0 ], { %r18567, %r18568, %r18569, %r18570 }; 2026-02-21T09:19:14.2248183Z // end inline asm 2026-02-21T09:19:14.2248240Z // begin inline asm 2026-02-21T09:19:14.2248360Z st.global.v4.b32 [ %rd898 + 0 ], { %r18572, %r18573, %r18574, %r18575 }; 2026-02-21T09:19:14.2248417Z // end inline asm 2026-02-21T09:19:14.2248477Z // begin inline asm 2026-02-21T09:19:14.2248593Z st.global.v4.b32 [ %rd899 + 0 ], { %r18577, %r18578, %r18579, %r18580 }; 2026-02-21T09:19:14.2248649Z // end inline asm 2026-02-21T09:19:14.2248709Z // begin inline asm 2026-02-21T09:19:14.2248829Z st.global.v4.b32 [ %rd900 + 0 ], { %r18582, %r18583, %r18584, %r18585 }; 2026-02-21T09:19:14.2248885Z // end inline asm 2026-02-21T09:19:14.2249107Z .loc 1 22 121 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:22:121 2026-02-21T09:19:14.2249181Z add.s32 %r22988, %r22988, 4; 2026-02-21T09:19:14.2249243Z add.s32 %r22987, %r22987, 4; 2026-02-21T09:19:14.2249304Z add.s32 %r22986, %r22986, 4; 2026-02-21T09:19:14.2249369Z add.s32 %r22985, %r22985, 4; 2026-02-21T09:19:14.2249443Z setp.lt.s32 %p86, %r22988, %r24029; 2026-02-21T09:19:14.2249504Z @%p86 bra $L__BB0_2; 2026-02-21T09:19:14.2249598Z $L__BB0_11: // %.preheader 2026-02-21T09:19:14.2249763Z setp.gt.s32 %p87, %r24029, %r2; 2026-02-21T09:19:14.2249825Z @%p87 bra $L__BB0_16; 2026-02-21T09:19:14.2249909Z // %bb.12: // %.lr.ph114 2026-02-21T09:19:14.2250120Z .loc 1 0 121 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:0:121 2026-02-21T09:19:14.2250258Z and.b32 %r18942, %r22966, 136; 2026-02-21T09:19:14.2250324Z xor.b32 %r189, %r18942, %r22965; 2026-02-21T09:19:14.2250394Z add.s32 %r190, %r22967, %r189; 2026-02-21T09:19:14.2250457Z add.s32 
%r18985, %r190, 2048; 2026-02-21T09:19:14.2250518Z add.s32 %r18987, %r190, 4096; 2026-02-21T09:19:14.2250583Z add.s32 %r18989, %r190, 6144; 2026-02-21T09:19:14.2250643Z add.s32 %r18991, %r190, 8192; 2026-02-21T09:19:14.2250703Z add.s32 %r18993, %r190, 10240; 2026-02-21T09:19:14.2250829Z add.s32 %r18995, %r190, 12288; 2026-02-21T09:19:14.2250896Z add.s32 %r18997, %r190, 14336; 2026-02-21T09:19:14.2250960Z add.s32 %r18945, %r22967, %r22969; 2026-02-21T09:19:14.2251024Z add.s32 %r199, %r18945, 172032; 2026-02-21T09:19:14.2251087Z add.s32 %r19001, %r190, 81920; 2026-02-21T09:19:14.2251146Z add.s32 %r19003, %r190, 83968; 2026-02-21T09:19:14.2251205Z add.s32 %r19005, %r190, 86016; 2026-02-21T09:19:14.2251266Z add.s32 %r19007, %r190, 88064; 2026-02-21T09:19:14.2251333Z add.s32 %r19009, %r190, 90112; 2026-02-21T09:19:14.2251392Z add.s32 %r19011, %r190, 92160; 2026-02-21T09:19:14.2251451Z add.s32 %r19013, %r190, 94208; 2026-02-21T09:19:14.2251513Z add.s32 %r19015, %r190, 96256; 2026-02-21T09:19:14.2251573Z or.b32 %r208, %r22968, 65536; 2026-02-21T09:19:14.2251687Z add.s32 %r209, %r18945, 177152; 2026-02-21T09:19:14.2251751Z add.s32 %r19019, %r190, 16384; 2026-02-21T09:19:14.2251811Z add.s32 %r19021, %r190, 18432; 2026-02-21T09:19:14.2251873Z add.s32 %r19023, %r190, 20480; 2026-02-21T09:19:14.2251933Z add.s32 %r19025, %r190, 22528; 2026-02-21T09:19:14.2251997Z add.s32 %r19027, %r190, 24576; 2026-02-21T09:19:14.2252057Z add.s32 %r19029, %r190, 26624; 2026-02-21T09:19:14.2252117Z add.s32 %r19031, %r190, 28672; 2026-02-21T09:19:14.2252177Z add.s32 %r19033, %r190, 30720; 2026-02-21T09:19:14.2252238Z or.b32 %r218, %r22968, 131072; 2026-02-21T09:19:14.2252302Z add.s32 %r19035, %r18945, 173056; 2026-02-21T09:19:14.2252364Z add.s32 %r19037, %r190, 98304; 2026-02-21T09:19:14.2252430Z add.s32 %r19039, %r190, 100352; 2026-02-21T09:19:14.2252489Z add.s32 %r19041, %r190, 102400; 2026-02-21T09:19:14.2252547Z add.s32 %r19043, %r190, 104448; 2026-02-21T09:19:14.2252611Z add.s32 %r19045, 
%r190, 106496; 2026-02-21T09:19:14.2252671Z add.s32 %r19047, %r190, 108544; 2026-02-21T09:19:14.2252731Z add.s32 %r19049, %r190, 110592; 2026-02-21T09:19:14.2252791Z add.s32 %r19051, %r190, 112640; 2026-02-21T09:19:14.2252854Z or.b32 %r228, %r22968, 196608; 2026-02-21T09:19:14.2252915Z add.s32 %r19053, %r18945, 178176; 2026-02-21T09:19:14.2252975Z add.s32 %r19055, %r190, 32768; 2026-02-21T09:19:14.2253037Z add.s32 %r19057, %r190, 34816; 2026-02-21T09:19:14.2253099Z add.s32 %r19059, %r190, 36864; 2026-02-21T09:19:14.2253157Z add.s32 %r19061, %r190, 38912; 2026-02-21T09:19:14.2253218Z add.s32 %r19063, %r190, 40960; 2026-02-21T09:19:14.2253276Z add.s32 %r19065, %r190, 43008; 2026-02-21T09:19:14.2253334Z add.s32 %r19067, %r190, 45056; 2026-02-21T09:19:14.2253394Z add.s32 %r19069, %r190, 47104; 2026-02-21T09:19:14.2253458Z or.b32 %r238, %r22968, 262144; 2026-02-21T09:19:14.2253519Z add.s32 %r19071, %r18945, 174080; 2026-02-21T09:19:14.2253579Z add.s32 %r19073, %r190, 114688; 2026-02-21T09:19:14.2253641Z add.s32 %r19075, %r190, 116736; 2026-02-21T09:19:14.2253701Z add.s32 %r19077, %r190, 118784; 2026-02-21T09:19:14.2253759Z add.s32 %r19079, %r190, 120832; 2026-02-21T09:19:14.2253817Z add.s32 %r19081, %r190, 122880; 2026-02-21T09:19:14.2253881Z add.s32 %r19083, %r190, 124928; 2026-02-21T09:19:14.2253941Z add.s32 %r19085, %r190, 126976; 2026-02-21T09:19:14.2254003Z add.s32 %r19087, %r190, 129024; 2026-02-21T09:19:14.2254067Z or.b32 %r248, %r22968, 327680; 2026-02-21T09:19:14.2255672Z add.s32 %r19089, %r18945, 179200; 2026-02-21T09:19:14.2255732Z add.s32 %r19091, %r190, 49152; 2026-02-21T09:19:14.2255793Z add.s32 %r19093, %r190, 51200; 2026-02-21T09:19:14.2255856Z add.s32 %r19095, %r190, 53248; 2026-02-21T09:19:14.2255966Z add.s32 %r19097, %r190, 55296; 2026-02-21T09:19:14.2256025Z add.s32 %r19099, %r190, 57344; 2026-02-21T09:19:14.2256089Z add.s32 %r19101, %r190, 59392; 2026-02-21T09:19:14.2256149Z add.s32 %r19103, %r190, 61440; 2026-02-21T09:19:14.2256207Z add.s32 
%r19105, %r190, 63488; 2026-02-21T09:19:14.2256271Z or.b32 %r258, %r22968, 393216; 2026-02-21T09:19:14.2256332Z add.s32 %r19107, %r18945, 175104; 2026-02-21T09:19:14.2256392Z add.s32 %r19109, %r190, 131072; 2026-02-21T09:19:14.2256653Z add.s32 %r19111, %r190, 133120; 2026-02-21T09:19:14.2256724Z add.s32 %r19113, %r190, 135168; 2026-02-21T09:19:14.2256784Z add.s32 %r19115, %r190, 137216; 2026-02-21T09:19:14.2256843Z add.s32 %r19117, %r190, 139264; 2026-02-21T09:19:14.2256906Z add.s32 %r19119, %r190, 141312; 2026-02-21T09:19:14.2256964Z add.s32 %r19121, %r190, 143360; 2026-02-21T09:19:14.2257034Z add.s32 %r19123, %r190, 145408; 2026-02-21T09:19:14.2257097Z or.b32 %r268, %r22968, 458752; 2026-02-21T09:19:14.2257162Z add.s32 %r19125, %r18945, 180224; 2026-02-21T09:19:14.2257222Z add.s32 %r19127, %r190, 65536; 2026-02-21T09:19:14.2257280Z add.s32 %r19129, %r190, 67584; 2026-02-21T09:19:14.2257342Z add.s32 %r19131, %r190, 69632; 2026-02-21T09:19:14.2257404Z add.s32 %r19133, %r190, 71680; 2026-02-21T09:19:14.2257532Z add.s32 %r19135, %r190, 73728; 2026-02-21T09:19:14.2257596Z add.s32 %r19137, %r190, 75776; 2026-02-21T09:19:14.2257659Z add.s32 %r19139, %r190, 77824; 2026-02-21T09:19:14.2257719Z add.s32 %r19141, %r190, 79872; 2026-02-21T09:19:14.2257779Z or.b32 %r278, %r22968, 524288; 2026-02-21T09:19:14.2257843Z add.s32 %r19143, %r18945, 176128; 2026-02-21T09:19:14.2257902Z add.s32 %r19145, %r190, 147456; 2026-02-21T09:19:14.2257964Z add.s32 %r19147, %r190, 149504; 2026-02-21T09:19:14.2258026Z add.s32 %r19149, %r190, 151552; 2026-02-21T09:19:14.2258085Z add.s32 %r19151, %r190, 153600; 2026-02-21T09:19:14.2258144Z add.s32 %r19153, %r190, 155648; 2026-02-21T09:19:14.2258204Z add.s32 %r19155, %r190, 157696; 2026-02-21T09:19:14.2258269Z add.s32 %r19157, %r190, 159744; 2026-02-21T09:19:14.2258327Z add.s32 %r19159, %r190, 161792; 2026-02-21T09:19:14.2258386Z or.b32 %r288, %r22968, 589824; 2026-02-21T09:19:14.2258449Z add.s32 %r19161, %r18945, 181248; 2026-02-21T09:19:14.2258510Z 
or.b32 %r18949, %r22970, %r22971; 2026-02-21T09:19:14.2258570Z or.b32 %r18950, %r18949, %r22972; 2026-02-21T09:19:14.2258632Z or.b32 %r290, %r18950, %r18942; 2026-02-21T09:19:14.2258696Z xor.b32 %r291, %r290, 8; 2026-02-21T09:19:14.2258760Z shl.b32 %r18951, %r22973, 6; 2026-02-21T09:19:14.2258821Z or.b32 %r18954, %r18951, %r22976; 2026-02-21T09:19:14.2258883Z or.b32 %r18955, %r18954, %r22975; 2026-02-21T09:19:14.2258941Z add.s32 %r18956, %r22967, 163840; 2026-02-21T09:19:14.2259006Z add.s32 %r294, %r18956, %r18955; 2026-02-21T09:19:14.2259067Z xor.b32 %r18957, %r18955, 16; 2026-02-21T09:19:14.2259129Z add.s32 %r295, %r18956, %r18957; 2026-02-21T09:19:14.2259188Z xor.b32 %r18958, %r18955, 32; 2026-02-21T09:19:14.2259258Z add.s32 %r296, %r18956, %r18958; 2026-02-21T09:19:14.2259322Z xor.b32 %r18959, %r18955, 48; 2026-02-21T09:19:14.2259383Z add.s32 %r297, %r18956, %r18959; 2026-02-21T09:19:14.2259443Z bfe.u32 %r18960, %r18956, 4, 14; 2026-02-21T09:19:14.2259508Z cvt.u64.u32 %rd901, %r18960; 2026-02-21T09:19:14.2259590Z or.b64 %rd4, %rd901, -9223371899382267904; 2026-02-21T09:19:14.2259654Z add.s32 %r18961, %r22967, 163872; 2026-02-21T09:19:14.2259715Z bfe.u32 %r18962, %r18961, 4, 14; 2026-02-21T09:19:14.2259782Z cvt.u64.u32 %rd902, %r18962; 2026-02-21T09:19:14.2259857Z or.b64 %rd5, %rd902, -9223371899382267904; 2026-02-21T09:19:14.2259920Z and.b32 %r18965, %r22978, 7264; 2026-02-21T09:19:14.2260059Z shl.b32 %r18967, %r22979, 4; 2026-02-21T09:19:14.2260119Z or.b32 %r18969, %r22977, %r22980; 2026-02-21T09:19:14.2260178Z or.b32 %r18970, %r18965, %r18967; 2026-02-21T09:19:14.2260237Z or.b32 %r18971, %r18969, %r18970; 2026-02-21T09:19:14.2260300Z add.s32 %r298, %r22967, %r18971; 2026-02-21T09:19:14.2260430Z xor.b32 %r18972, %r18971, 32; 2026-02-21T09:19:14.2260492Z add.s32 %r299, %r22967, %r18972; 2026-02-21T09:19:14.2260552Z xor.b32 %r18973, %r18971, 64; 2026-02-21T09:19:14.2260612Z add.s32 %r300, %r22967, %r18973; 2026-02-21T09:19:14.2260684Z xor.b32 %r18974, %r18971, 
96; 2026-02-21T09:19:14.2260752Z add.s32 %r301, %r22967, %r18974; 2026-02-21T09:19:14.2260813Z shl.b32 %r18975, %r22979, 10; 2026-02-21T09:19:14.2260872Z or.b32 %r18978, %r18975, %r22981; 2026-02-21T09:19:14.2260983Z xor.b32 %r18979, %r18978, %r22982; 2026-02-21T09:19:14.2261049Z add.s32 %r22456, %r22967, %r18979; 2026-02-21T09:19:14.2261112Z add.s32 %r22461, %r22456, 1024; 2026-02-21T09:19:14.2261173Z add.s32 %r22466, %r22456, 2048; 2026-02-21T09:19:14.2261237Z add.s32 %r22471, %r22456, 3072; 2026-02-21T09:19:14.2261295Z add.s32 %r22476, %r22456, 4096; 2026-02-21T09:19:14.2261353Z add.s32 %r22481, %r22456, 5120; 2026-02-21T09:19:14.2261411Z add.s32 %r22486, %r22456, 6144; 2026-02-21T09:19:14.2261474Z add.s32 %r22491, %r22456, 7168; 2026-02-21T09:19:14.2261681Z .loc 1 43 78 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:43:78 2026-02-21T09:19:14.2261741Z or.b32 %r18981, %r22983, %r7; 2026-02-21T09:19:14.2261805Z or.b32 %r310, %r18981, 720896; 2026-02-21T09:19:14.2262066Z .loc 1 22 121 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:22:121 2026-02-21T09:19:14.2262135Z mad.wide.u32 %rd6, %r53, 8, %rd64; 2026-02-21T09:19:14.2262250Z $L__BB0_13: // =>This Loop Header: Depth=1 2026-02-21T09:19:14.2262347Z // Child Loop BB0_14 Depth 2 2026-02-21T09:19:14.2262548Z .loc 1 28 35 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:28:35 2026-02-21T09:19:14.2262608Z shr.s32 %r19166, %r24029, 31; 2026-02-21T09:19:14.2262669Z shr.u32 %r19167, %r19166, 24; 2026-02-21T09:19:14.2262731Z add.s32 %r19168, %r24029, %r19167; 2026-02-21T09:19:14.2262930Z .loc 1 31 45 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:31:45 2026-02-21T09:19:14.2262994Z and.b32 %r19169, %r19168, 65280; 2026-02-21T09:19:14.2263055Z sub.s32 %r19170, %r24029, %r19169; 2026-02-21T09:19:14.2263250Z .loc 1 31 64 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:31:64 2026-02-21T09:19:14.2263314Z cvt.u16.u32 %rs577, %r19170; 2026-02-21T09:19:14.2263508Z .loc 1 32 
51 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:32:51 2026-02-21T09:19:14.2263573Z shr.s16 %rs578, %rs577, 15; 2026-02-21T09:19:14.2263639Z shr.u16 %rs579, %rs578, 13; 2026-02-21T09:19:14.2263704Z add.s16 %rs580, %rs577, %rs579; 2026-02-21T09:19:14.2263767Z shr.s16 %rs581, %rs580, 3; 2026-02-21T09:19:14.2263963Z .loc 1 31 64 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:31:64 2026-02-21T09:19:14.2264031Z and.b16 %rs582, %rs580, -8; 2026-02-21T09:19:14.2264095Z sub.s16 %rs583, %rs577, %rs582; 2026-02-21T09:19:14.2264292Z .loc 1 32 51 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:32:51 2026-02-21T09:19:14.2264356Z cvt.u32.u16 %r19171, %rs581; 2026-02-21T09:19:14.2264552Z .loc 1 33 27 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:33:27 2026-02-21T09:19:14.2264612Z shl.b32 %r19172, %r19168, 2; 2026-02-21T09:19:14.2264680Z and.b32 %r19173, %r19172, -1024; 2026-02-21T09:19:14.2264747Z mul.wide.s16 %r19174, %rs583, 128; 2026-02-21T09:19:14.2264809Z add.s32 %r2428, %r19174, %r19173; 2026-02-21T09:19:14.2265005Z .loc 1 34 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:34:32 2026-02-21T09:19:14.2265126Z or.b32 %r19175, %r2428, %r7; 2026-02-21T09:19:14.2265322Z .loc 1 35 27 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:35:27 2026-02-21T09:19:14.2265387Z mul.wide.s16 %r2429, %rs581, 512; 2026-02-21T09:19:14.2265630Z .loc 1 36 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:36:32 2026-02-21T09:19:14.2265690Z or.b32 %r19176, %r2429, %r11; 2026-02-21T09:19:14.2265750Z or.b32 %r19177, %r2429, %r12; 2026-02-21T09:19:14.2265813Z or.b32 %r19178, %r2429, %r13; 2026-02-21T09:19:14.2265874Z or.b32 %r19179, %r2429, %r14; 2026-02-21T09:19:14.2265934Z or.b32 %r19180, %r2429, %r15; 2026-02-21T09:19:14.2265994Z or.b32 %r19181, %r2429, %r16; 2026-02-21T09:19:14.2266106Z or.b32 %r19182, %r2429, %r17; 2026-02-21T09:19:14.2266167Z or.b32 %r19183, %r2429, %r18; 2026-02-21T09:19:14.2266363Z .loc 1 51 53 
// co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:53 2026-02-21T09:19:14.2266428Z shl.b32 %r19184, %r19176, 10; 2026-02-21T09:19:14.2266603Z shl.b32 %r19185, %r19177, 10; 2026-02-21T09:19:14.2266666Z shl.b32 %r19186, %r19178, 10; 2026-02-21T09:19:14.2266726Z shl.b32 %r19187, %r19179, 10; 2026-02-21T09:19:14.2266790Z shl.b32 %r19188, %r19180, 10; 2026-02-21T09:19:14.2266849Z shl.b32 %r19189, %r19181, 10; 2026-02-21T09:19:14.2266907Z shl.b32 %r19190, %r19182, 10; 2026-02-21T09:19:14.2266969Z shl.b32 %r19191, %r19183, 10; 2026-02-21T09:19:14.2267261Z .loc 1 51 60 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:60 2026-02-21T09:19:14.2267327Z or.b32 %r19192, %r19184, %r54; 2026-02-21T09:19:14.2267391Z or.b32 %r19193, %r19185, %r54; 2026-02-21T09:19:14.2267451Z or.b32 %r19194, %r19186, %r54; 2026-02-21T09:19:14.2267513Z or.b32 %r19195, %r19187, %r54; 2026-02-21T09:19:14.2267571Z or.b32 %r19196, %r19188, %r54; 2026-02-21T09:19:14.2267649Z or.b32 %r19197, %r19189, %r54; 2026-02-21T09:19:14.2267711Z or.b32 %r19198, %r19190, %r54; 2026-02-21T09:19:14.2267771Z or.b32 %r19199, %r19191, %r54; 2026-02-21T09:19:14.2267972Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.2268049Z mad.wide.s32 %rd903, %r19192, 2, %rd64; 2026-02-21T09:19:14.2268121Z mad.wide.s32 %rd904, %r19193, 2, %rd64; 2026-02-21T09:19:14.2268191Z mad.wide.s32 %rd905, %r19194, 2, %rd64; 2026-02-21T09:19:14.2268259Z mad.wide.s32 %rd906, %r19195, 2, %rd64; 2026-02-21T09:19:14.2268327Z mad.wide.s32 %rd907, %r19196, 2, %rd64; 2026-02-21T09:19:14.2268396Z mad.wide.s32 %rd908, %r19197, 2, %rd64; 2026-02-21T09:19:14.2268472Z mad.wide.s32 %rd909, %r19198, 2, %rd64; 2026-02-21T09:19:14.2268624Z mad.wide.s32 %rd910, %r19199, 2, %rd64; 2026-02-21T09:19:14.2268826Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2268886Z bar.sync 0; 2026-02-21T09:19:14.2268944Z mov.b32 %r18984, 8; 
2026-02-21T09:19:14.2269005Z // begin inline asm 2026-02-21T09:19:14.2269150Z cp.async.ca.shared.global [ %r190 + 0 ], [ %rd903 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2269219Z // end inline asm 2026-02-21T09:19:14.2269278Z // begin inline asm 2026-02-21T09:19:14.2269423Z cp.async.ca.shared.global [ %r18985 + 0 ], [ %rd904 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2269482Z // end inline asm 2026-02-21T09:19:14.2269540Z // begin inline asm 2026-02-21T09:19:14.2269676Z cp.async.ca.shared.global [ %r18987 + 0 ], [ %rd905 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2269735Z // end inline asm 2026-02-21T09:19:14.2269794Z // begin inline asm 2026-02-21T09:19:14.2269929Z cp.async.ca.shared.global [ %r18989 + 0 ], [ %rd906 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2269985Z // end inline asm 2026-02-21T09:19:14.2270048Z // begin inline asm 2026-02-21T09:19:14.2270182Z cp.async.ca.shared.global [ %r18991 + 0 ], [ %rd907 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2270238Z // end inline asm 2026-02-21T09:19:14.2270379Z // begin inline asm 2026-02-21T09:19:14.2270513Z cp.async.ca.shared.global [ %r18993 + 0 ], [ %rd908 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2270569Z // end inline asm 2026-02-21T09:19:14.2270625Z // begin inline asm 2026-02-21T09:19:14.2270763Z cp.async.ca.shared.global [ %r18995 + 0 ], [ %rd909 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2270880Z // end inline asm 2026-02-21T09:19:14.2270939Z // begin inline asm 2026-02-21T09:19:14.2271076Z cp.async.ca.shared.global [ %r18997 + 0 ], [ %rd910 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2271133Z // end inline asm 2026-02-21T09:19:14.2271200Z cp.async.commit_group; 2026-02-21T09:19:14.2271402Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.2271539Z add.s32 %r19200, %r19175, %r22968; 2026-02-21T09:19:14.2271749Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2271812Z cvt.s64.s32 %rd994, %r19200; 2026-02-21T09:19:14.2271885Z 
add.s64 %rd911, %rd65, %rd994; 2026-02-21T09:19:14.2271943Z mov.b32 %r24032, 4; 2026-02-21T09:19:14.2272143Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2272207Z // begin inline asm 2026-02-21T09:19:14.2272347Z cp.async.ca.shared.global [ %r199 + 0 ], [ %rd911 + 0 ], 0x4, %r24032; 2026-02-21T09:19:14.2272402Z // end inline asm 2026-02-21T09:19:14.2272471Z cp.async.commit_group; 2026-02-21T09:19:14.2272718Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.2272783Z cvt.s64.s32 %rd995, %r19184; 2026-02-21T09:19:14.2272847Z or.b64 %rd997, %rd995, %rd1113; 2026-02-21T09:19:14.2272913Z shl.b64 %rd998, %rd997, 1; 2026-02-21T09:19:14.2272977Z add.s64 %rd999, %rd64, %rd998; 2026-02-21T09:19:14.2273040Z add.s64 %rd912, %rd999, 32; 2026-02-21T09:19:14.2273106Z cvt.s64.s32 %rd1000, %r19185; 2026-02-21T09:19:14.2273171Z or.b64 %rd1001, %rd1000, %rd1113; 2026-02-21T09:19:14.2273231Z shl.b64 %rd1002, %rd1001, 1; 2026-02-21T09:19:14.2273294Z add.s64 %rd1003, %rd64, %rd1002; 2026-02-21T09:19:14.2273357Z add.s64 %rd913, %rd1003, 32; 2026-02-21T09:19:14.2273418Z cvt.s64.s32 %rd1004, %r19186; 2026-02-21T09:19:14.2273480Z or.b64 %rd1005, %rd1004, %rd1113; 2026-02-21T09:19:14.2273542Z shl.b64 %rd1006, %rd1005, 1; 2026-02-21T09:19:14.2273605Z add.s64 %rd1007, %rd64, %rd1006; 2026-02-21T09:19:14.2273665Z add.s64 %rd914, %rd1007, 32; 2026-02-21T09:19:14.2273729Z cvt.s64.s32 %rd1008, %r19187; 2026-02-21T09:19:14.2273791Z or.b64 %rd1009, %rd1008, %rd1113; 2026-02-21T09:19:14.2273852Z shl.b64 %rd1010, %rd1009, 1; 2026-02-21T09:19:14.2273913Z add.s64 %rd1011, %rd64, %rd1010; 2026-02-21T09:19:14.2273978Z add.s64 %rd915, %rd1011, 32; 2026-02-21T09:19:14.2274040Z cvt.s64.s32 %rd1012, %r19188; 2026-02-21T09:19:14.2274101Z or.b64 %rd1013, %rd1012, %rd1113; 2026-02-21T09:19:14.2274165Z shl.b64 %rd1014, %rd1013, 1; 2026-02-21T09:19:14.2274228Z add.s64 %rd1015, %rd64, %rd1014; 
2026-02-21T09:19:14.2274293Z add.s64 %rd916, %rd1015, 32; 2026-02-21T09:19:14.2274354Z cvt.s64.s32 %rd1016, %r19189; 2026-02-21T09:19:14.2274419Z or.b64 %rd1017, %rd1016, %rd1113; 2026-02-21T09:19:14.2274480Z shl.b64 %rd1018, %rd1017, 1; 2026-02-21T09:19:14.2274542Z add.s64 %rd1019, %rd64, %rd1018; 2026-02-21T09:19:14.2274605Z add.s64 %rd917, %rd1019, 32; 2026-02-21T09:19:14.2274666Z cvt.s64.s32 %rd1020, %r19190; 2026-02-21T09:19:14.2274728Z or.b64 %rd1021, %rd1020, %rd1113; 2026-02-21T09:19:14.2274790Z shl.b64 %rd1022, %rd1021, 1; 2026-02-21T09:19:14.2274869Z add.s64 %rd1023, %rd64, %rd1022; 2026-02-21T09:19:14.2274931Z add.s64 %rd918, %rd1023, 32; 2026-02-21T09:19:14.2274990Z cvt.s64.s32 %rd1024, %r19191; 2026-02-21T09:19:14.2275056Z or.b64 %rd1025, %rd1024, %rd1113; 2026-02-21T09:19:14.2275117Z shl.b64 %rd1026, %rd1025, 1; 2026-02-21T09:19:14.2275179Z add.s64 %rd1027, %rd64, %rd1026; 2026-02-21T09:19:14.2275307Z add.s64 %rd919, %rd1027, 32; 2026-02-21T09:19:14.2275506Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2275566Z // begin inline asm 2026-02-21T09:19:14.2275710Z cp.async.ca.shared.global [ %r19001 + 0 ], [ %rd912 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2275814Z // end inline asm 2026-02-21T09:19:14.2275871Z // begin inline asm 2026-02-21T09:19:14.2276008Z cp.async.ca.shared.global [ %r19003 + 0 ], [ %rd913 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2276066Z // end inline asm 2026-02-21T09:19:14.2276124Z // begin inline asm 2026-02-21T09:19:14.2276260Z cp.async.ca.shared.global [ %r19005 + 0 ], [ %rd914 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2276316Z // end inline asm 2026-02-21T09:19:14.2276425Z // begin inline asm 2026-02-21T09:19:14.2276691Z cp.async.ca.shared.global [ %r19007 + 0 ], [ %rd915 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2276752Z // end inline asm 2026-02-21T09:19:14.2276813Z // begin inline asm 2026-02-21T09:19:14.2276950Z cp.async.ca.shared.global [ %r19009 + 0 ], [ %rd916 + 0 ], 0x8, 
%r18984; 2026-02-21T09:19:14.2277008Z // end inline asm 2026-02-21T09:19:14.2277069Z // begin inline asm 2026-02-21T09:19:14.2277204Z cp.async.ca.shared.global [ %r19011 + 0 ], [ %rd917 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2277264Z // end inline asm 2026-02-21T09:19:14.2277321Z // begin inline asm 2026-02-21T09:19:14.2277457Z cp.async.ca.shared.global [ %r19013 + 0 ], [ %rd918 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2277514Z // end inline asm 2026-02-21T09:19:14.2277573Z // begin inline asm 2026-02-21T09:19:14.2277782Z cp.async.ca.shared.global [ %r19015 + 0 ], [ %rd919 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2277841Z // end inline asm 2026-02-21T09:19:14.2277905Z cp.async.commit_group; 2026-02-21T09:19:14.2278105Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.2278171Z add.s32 %r19201, %r208, %r19175; 2026-02-21T09:19:14.2278369Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2278429Z cvt.s64.s32 %rd1028, %r19201; 2026-02-21T09:19:14.2278496Z add.s64 %rd920, %rd65, %rd1028; 2026-02-21T09:19:14.2278689Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2278750Z // begin inline asm 2026-02-21T09:19:14.2278887Z cp.async.ca.shared.global [ %r209 + 0 ], [ %rd920 + 0 ], 0x4, %r24032; 2026-02-21T09:19:14.2278943Z // end inline asm 2026-02-21T09:19:14.2279009Z cp.async.commit_group; 2026-02-21T09:19:14.2279204Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.2279277Z add.s64 %rd921, %rd999, 64; 2026-02-21T09:19:14.2279343Z add.s64 %rd922, %rd1003, 64; 2026-02-21T09:19:14.2279404Z add.s64 %rd923, %rd1007, 64; 2026-02-21T09:19:14.2279467Z add.s64 %rd924, %rd1011, 64; 2026-02-21T09:19:14.2279529Z add.s64 %rd925, %rd1015, 64; 2026-02-21T09:19:14.2279588Z add.s64 %rd926, %rd1019, 64; 2026-02-21T09:19:14.2279650Z add.s64 %rd927, %rd1023, 64; 2026-02-21T09:19:14.2279711Z 
add.s64 %rd928, %rd1027, 64; 2026-02-21T09:19:14.2279909Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2279970Z bar.sync 0; 2026-02-21T09:19:14.2280031Z // begin inline asm 2026-02-21T09:19:14.2280168Z cp.async.ca.shared.global [ %r19019 + 0 ], [ %rd921 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2280224Z // end inline asm 2026-02-21T09:19:14.2280287Z // begin inline asm 2026-02-21T09:19:14.2280421Z cp.async.ca.shared.global [ %r19021 + 0 ], [ %rd922 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2280476Z // end inline asm 2026-02-21T09:19:14.2280539Z // begin inline asm 2026-02-21T09:19:14.2280675Z cp.async.ca.shared.global [ %r19023 + 0 ], [ %rd923 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2280729Z // end inline asm 2026-02-21T09:19:14.2280786Z // begin inline asm 2026-02-21T09:19:14.2281005Z cp.async.ca.shared.global [ %r19025 + 0 ], [ %rd924 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2281061Z // end inline asm 2026-02-21T09:19:14.2281117Z // begin inline asm 2026-02-21T09:19:14.2281252Z cp.async.ca.shared.global [ %r19027 + 0 ], [ %rd925 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2281370Z // end inline asm 2026-02-21T09:19:14.2281428Z // begin inline asm 2026-02-21T09:19:14.2281563Z cp.async.ca.shared.global [ %r19029 + 0 ], [ %rd926 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2281621Z // end inline asm 2026-02-21T09:19:14.2281680Z // begin inline asm 2026-02-21T09:19:14.2281814Z cp.async.ca.shared.global [ %r19031 + 0 ], [ %rd927 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2281873Z // end inline asm 2026-02-21T09:19:14.2281930Z // begin inline asm 2026-02-21T09:19:14.2282128Z cp.async.ca.shared.global [ %r19033 + 0 ], [ %rd928 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2282190Z // end inline asm 2026-02-21T09:19:14.2282256Z cp.async.commit_group; 2026-02-21T09:19:14.2282456Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.2282518Z add.s32 %r19202, %r218, %r19175; 
2026-02-21T09:19:14.2282719Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2282784Z cvt.s64.s32 %rd1029, %r19202; 2026-02-21T09:19:14.2282846Z add.s64 %rd929, %rd65, %rd1029; 2026-02-21T09:19:14.2283042Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2283161Z // begin inline asm 2026-02-21T09:19:14.2283301Z cp.async.ca.shared.global [ %r19035 + 0 ], [ %rd929 + 0 ], 0x4, %r24032; 2026-02-21T09:19:14.2283360Z // end inline asm 2026-02-21T09:19:14.2283424Z cp.async.commit_group; 2026-02-21T09:19:14.2283621Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.2283684Z add.s64 %rd930, %rd999, 96; 2026-02-21T09:19:14.2283750Z add.s64 %rd931, %rd1003, 96; 2026-02-21T09:19:14.2283809Z add.s64 %rd932, %rd1007, 96; 2026-02-21T09:19:14.2283869Z add.s64 %rd933, %rd1011, 96; 2026-02-21T09:19:14.2283931Z add.s64 %rd934, %rd1015, 96; 2026-02-21T09:19:14.2283991Z add.s64 %rd935, %rd1019, 96; 2026-02-21T09:19:14.2284052Z add.s64 %rd936, %rd1023, 96; 2026-02-21T09:19:14.2284112Z add.s64 %rd937, %rd1027, 96; 2026-02-21T09:19:14.2284313Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2284374Z // begin inline asm 2026-02-21T09:19:14.2284511Z cp.async.ca.shared.global [ %r19037 + 0 ], [ %rd930 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2284570Z // end inline asm 2026-02-21T09:19:14.2284627Z // begin inline asm 2026-02-21T09:19:14.2284761Z cp.async.ca.shared.global [ %r19039 + 0 ], [ %rd931 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2284821Z // end inline asm 2026-02-21T09:19:14.2287438Z // begin inline asm 2026-02-21T09:19:14.2287655Z cp.async.ca.shared.global [ %r19041 + 0 ], [ %rd932 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2287719Z // end inline asm 2026-02-21T09:19:14.2287782Z // begin inline asm 2026-02-21T09:19:14.2287945Z cp.async.ca.shared.global [ %r19043 + 0 ], [ %rd933 + 0 ], 0x8, 
%r18984; 2026-02-21T09:19:14.2288005Z // end inline asm 2026-02-21T09:19:14.2288063Z // begin inline asm 2026-02-21T09:19:14.2288210Z cp.async.ca.shared.global [ %r19045 + 0 ], [ %rd934 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2288271Z // end inline asm 2026-02-21T09:19:14.2288330Z // begin inline asm 2026-02-21T09:19:14.2288472Z cp.async.ca.shared.global [ %r19047 + 0 ], [ %rd935 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2288531Z // end inline asm 2026-02-21T09:19:14.2288591Z // begin inline asm 2026-02-21T09:19:14.2288729Z cp.async.ca.shared.global [ %r19049 + 0 ], [ %rd936 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2288784Z // end inline asm 2026-02-21T09:19:14.2288844Z // begin inline asm 2026-02-21T09:19:14.2288986Z cp.async.ca.shared.global [ %r19051 + 0 ], [ %rd937 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2289178Z // end inline asm 2026-02-21T09:19:14.2289249Z cp.async.commit_group; 2026-02-21T09:19:14.2289476Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.2289613Z add.s32 %r19203, %r228, %r19175; 2026-02-21T09:19:14.2289837Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2289906Z cvt.s64.s32 %rd1030, %r19203; 2026-02-21T09:19:14.2289973Z add.s64 %rd938, %rd65, %rd1030; 2026-02-21T09:19:14.2290178Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2290244Z // begin inline asm 2026-02-21T09:19:14.2290484Z cp.async.ca.shared.global [ %r19053 + 0 ], [ %rd938 + 0 ], 0x4, %r24032; 2026-02-21T09:19:14.2290544Z // end inline asm 2026-02-21T09:19:14.2290613Z cp.async.commit_group; 2026-02-21T09:19:14.2290815Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.2290881Z add.s64 %rd939, %rd999, 128; 2026-02-21T09:19:14.2290948Z add.s64 %rd940, %rd1003, 128; 2026-02-21T09:19:14.2291014Z add.s64 %rd941, %rd1007, 128; 2026-02-21T09:19:14.2291074Z add.s64 %rd942, %rd1011, 128; 
2026-02-21T09:19:14.2291133Z add.s64 %rd943, %rd1015, 128; 2026-02-21T09:19:14.2291207Z add.s64 %rd944, %rd1019, 128; 2026-02-21T09:19:14.2291269Z add.s64 %rd945, %rd1023, 128; 2026-02-21T09:19:14.2291329Z add.s64 %rd946, %rd1027, 128; 2026-02-21T09:19:14.2291595Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2291656Z bar.sync 0; 2026-02-21T09:19:14.2291716Z // begin inline asm 2026-02-21T09:19:14.2291862Z cp.async.ca.shared.global [ %r19055 + 0 ], [ %rd939 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2291921Z // end inline asm 2026-02-21T09:19:14.2291979Z // begin inline asm 2026-02-21T09:19:14.2292116Z cp.async.ca.shared.global [ %r19057 + 0 ], [ %rd940 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2292174Z // end inline asm 2026-02-21T09:19:14.2292233Z // begin inline asm 2026-02-21T09:19:14.2292368Z cp.async.ca.shared.global [ %r19059 + 0 ], [ %rd941 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2292428Z // end inline asm 2026-02-21T09:19:14.2292485Z // begin inline asm 2026-02-21T09:19:14.2292625Z cp.async.ca.shared.global [ %r19061 + 0 ], [ %rd942 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2292681Z // end inline asm 2026-02-21T09:19:14.2292740Z // begin inline asm 2026-02-21T09:19:14.2292873Z cp.async.ca.shared.global [ %r19063 + 0 ], [ %rd943 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2292929Z // end inline asm 2026-02-21T09:19:14.2292990Z // begin inline asm 2026-02-21T09:19:14.2293124Z cp.async.ca.shared.global [ %r19065 + 0 ], [ %rd944 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2293178Z // end inline asm 2026-02-21T09:19:14.2293235Z // begin inline asm 2026-02-21T09:19:14.2293370Z cp.async.ca.shared.global [ %r19067 + 0 ], [ %rd945 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2293425Z // end inline asm 2026-02-21T09:19:14.2293482Z // begin inline asm 2026-02-21T09:19:14.2293615Z cp.async.ca.shared.global [ %r19069 + 0 ], [ %rd946 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2293672Z // end inline asm 2026-02-21T09:19:14.2293738Z 
cp.async.commit_group; 2026-02-21T09:19:14.2293949Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.2294016Z add.s32 %r19204, %r238, %r19175; 2026-02-21T09:19:14.2294218Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2294282Z cvt.s64.s32 %rd1031, %r19204; 2026-02-21T09:19:14.2294354Z add.s64 %rd947, %rd65, %rd1031; 2026-02-21T09:19:14.2294553Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2294613Z // begin inline asm 2026-02-21T09:19:14.2294819Z cp.async.ca.shared.global [ %r19071 + 0 ], [ %rd947 + 0 ], 0x4, %r24032; 2026-02-21T09:19:14.2294876Z // end inline asm 2026-02-21T09:19:14.2294941Z cp.async.commit_group; 2026-02-21T09:19:14.2295139Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.2295249Z add.s64 %rd948, %rd999, 160; 2026-02-21T09:19:14.2295310Z add.s64 %rd949, %rd1003, 160; 2026-02-21T09:19:14.2295371Z add.s64 %rd950, %rd1007, 160; 2026-02-21T09:19:14.2295433Z add.s64 %rd951, %rd1011, 160; 2026-02-21T09:19:14.2295493Z add.s64 %rd952, %rd1015, 160; 2026-02-21T09:19:14.2295553Z add.s64 %rd953, %rd1019, 160; 2026-02-21T09:19:14.2295615Z add.s64 %rd954, %rd1023, 160; 2026-02-21T09:19:14.2295686Z add.s64 %rd955, %rd1027, 160; 2026-02-21T09:19:14.2295938Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2296003Z // begin inline asm 2026-02-21T09:19:14.2296146Z cp.async.ca.shared.global [ %r19073 + 0 ], [ %rd948 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2296202Z // end inline asm 2026-02-21T09:19:14.2296260Z // begin inline asm 2026-02-21T09:19:14.2296396Z cp.async.ca.shared.global [ %r19075 + 0 ], [ %rd949 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2296588Z // end inline asm 2026-02-21T09:19:14.2296651Z // begin inline asm 2026-02-21T09:19:14.2296788Z cp.async.ca.shared.global [ %r19077 + 0 ], [ %rd950 + 0 ], 0x8, 
%r18984; 2026-02-21T09:19:14.2296843Z // end inline asm 2026-02-21T09:19:14.2296899Z // begin inline asm 2026-02-21T09:19:14.2297105Z cp.async.ca.shared.global [ %r19079 + 0 ], [ %rd951 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2297167Z // end inline asm 2026-02-21T09:19:14.2297224Z // begin inline asm 2026-02-21T09:19:14.2297367Z cp.async.ca.shared.global [ %r19081 + 0 ], [ %rd952 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2297426Z // end inline asm 2026-02-21T09:19:14.2297484Z // begin inline asm 2026-02-21T09:19:14.2297619Z cp.async.ca.shared.global [ %r19083 + 0 ], [ %rd953 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2297679Z // end inline asm 2026-02-21T09:19:14.2297736Z // begin inline asm 2026-02-21T09:19:14.2297869Z cp.async.ca.shared.global [ %r19085 + 0 ], [ %rd954 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2297925Z // end inline asm 2026-02-21T09:19:14.2297984Z // begin inline asm 2026-02-21T09:19:14.2298116Z cp.async.ca.shared.global [ %r19087 + 0 ], [ %rd955 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2298172Z // end inline asm 2026-02-21T09:19:14.2298239Z cp.async.commit_group; 2026-02-21T09:19:14.2298458Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.2298525Z add.s32 %r19205, %r248, %r19175; 2026-02-21T09:19:14.2298729Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2298794Z cvt.s64.s32 %rd1032, %r19205; 2026-02-21T09:19:14.2298858Z add.s64 %rd956, %rd65, %rd1032; 2026-02-21T09:19:14.2299055Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2299125Z // begin inline asm 2026-02-21T09:19:14.2299266Z cp.async.ca.shared.global [ %r19089 + 0 ], [ %rd956 + 0 ], 0x4, %r24032; 2026-02-21T09:19:14.2299327Z // end inline asm 2026-02-21T09:19:14.2299399Z cp.async.commit_group; 2026-02-21T09:19:14.2299606Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.2299671Z add.s64 
%rd957, %rd999, 192; 2026-02-21T09:19:14.2299738Z add.s64 %rd958, %rd1003, 192; 2026-02-21T09:19:14.2299798Z add.s64 %rd959, %rd1007, 192; 2026-02-21T09:19:14.2299857Z add.s64 %rd960, %rd1011, 192; 2026-02-21T09:19:14.2299917Z add.s64 %rd961, %rd1015, 192; 2026-02-21T09:19:14.2299983Z add.s64 %rd962, %rd1019, 192; 2026-02-21T09:19:14.2300043Z add.s64 %rd963, %rd1023, 192; 2026-02-21T09:19:14.2300103Z add.s64 %rd964, %rd1027, 192; 2026-02-21T09:19:14.2300382Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2300439Z bar.sync 0; 2026-02-21T09:19:14.2300498Z // begin inline asm 2026-02-21T09:19:14.2300640Z cp.async.ca.shared.global [ %r19091 + 0 ], [ %rd957 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2300778Z // end inline asm 2026-02-21T09:19:14.2300838Z // begin inline asm 2026-02-21T09:19:14.2300979Z cp.async.ca.shared.global [ %r19093 + 0 ], [ %rd958 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2301036Z // end inline asm 2026-02-21T09:19:14.2301094Z // begin inline asm 2026-02-21T09:19:14.2301231Z cp.async.ca.shared.global [ %r19095 + 0 ], [ %rd959 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2301290Z // end inline asm 2026-02-21T09:19:14.2301350Z // begin inline asm 2026-02-21T09:19:14.2301558Z cp.async.ca.shared.global [ %r19097 + 0 ], [ %rd960 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2301618Z // end inline asm 2026-02-21T09:19:14.2301678Z // begin inline asm 2026-02-21T09:19:14.2301814Z cp.async.ca.shared.global [ %r19099 + 0 ], [ %rd961 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2301870Z // end inline asm 2026-02-21T09:19:14.2301929Z // begin inline asm 2026-02-21T09:19:14.2302063Z cp.async.ca.shared.global [ %r19101 + 0 ], [ %rd962 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2302122Z // end inline asm 2026-02-21T09:19:14.2302179Z // begin inline asm 2026-02-21T09:19:14.2302314Z cp.async.ca.shared.global [ %r19103 + 0 ], [ %rd963 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2302370Z // end inline asm 2026-02-21T09:19:14.2302428Z // 
begin inline asm 2026-02-21T09:19:14.2302616Z cp.async.ca.shared.global [ %r19105 + 0 ], [ %rd964 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2302673Z // end inline asm 2026-02-21T09:19:14.2302736Z cp.async.commit_group; 2026-02-21T09:19:14.2302945Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.2303011Z add.s32 %r19206, %r258, %r19175; 2026-02-21T09:19:14.2303211Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2303276Z cvt.s64.s32 %rd1033, %r19206; 2026-02-21T09:19:14.2303345Z add.s64 %rd965, %rd65, %rd1033; 2026-02-21T09:19:14.2303541Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2303603Z // begin inline asm 2026-02-21T09:19:14.2303740Z cp.async.ca.shared.global [ %r19107 + 0 ], [ %rd965 + 0 ], 0x4, %r24032; 2026-02-21T09:19:14.2303796Z // end inline asm 2026-02-21T09:19:14.2303859Z cp.async.commit_group; 2026-02-21T09:19:14.2304058Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.2304122Z add.s64 %rd966, %rd999, 224; 2026-02-21T09:19:14.2304184Z add.s64 %rd967, %rd1003, 224; 2026-02-21T09:19:14.2304244Z add.s64 %rd968, %rd1007, 224; 2026-02-21T09:19:14.2304307Z add.s64 %rd969, %rd1011, 224; 2026-02-21T09:19:14.2304366Z add.s64 %rd970, %rd1015, 224; 2026-02-21T09:19:14.2304427Z add.s64 %rd971, %rd1019, 224; 2026-02-21T09:19:14.2304487Z add.s64 %rd972, %rd1023, 224; 2026-02-21T09:19:14.2304547Z add.s64 %rd973, %rd1027, 224; 2026-02-21T09:19:14.2304744Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2304804Z // begin inline asm 2026-02-21T09:19:14.2304944Z cp.async.ca.shared.global [ %r19109 + 0 ], [ %rd966 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2305000Z // end inline asm 2026-02-21T09:19:14.2305058Z // begin inline asm 2026-02-21T09:19:14.2305195Z cp.async.ca.shared.global [ %r19111 + 0 ], [ %rd967 + 0 ], 0x8, 
%r18984; 2026-02-21T09:19:14.2305251Z // end inline asm 2026-02-21T09:19:14.2305308Z // begin inline asm 2026-02-21T09:19:14.2305444Z cp.async.ca.shared.global [ %r19113 + 0 ], [ %rd968 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2305501Z // end inline asm 2026-02-21T09:19:14.2305558Z // begin inline asm 2026-02-21T09:19:14.2305691Z cp.async.ca.shared.global [ %r19115 + 0 ], [ %rd969 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2305813Z // end inline asm 2026-02-21T09:19:14.2305869Z // begin inline asm 2026-02-21T09:19:14.2306003Z cp.async.ca.shared.global [ %r19117 + 0 ], [ %rd970 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2306111Z // end inline asm 2026-02-21T09:19:14.2306167Z // begin inline asm 2026-02-21T09:19:14.2306301Z cp.async.ca.shared.global [ %r19119 + 0 ], [ %rd971 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2306356Z // end inline asm 2026-02-21T09:19:14.2306415Z // begin inline asm 2026-02-21T09:19:14.2306661Z cp.async.ca.shared.global [ %r19121 + 0 ], [ %rd972 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2306721Z // end inline asm 2026-02-21T09:19:14.2306784Z // begin inline asm 2026-02-21T09:19:14.2307001Z cp.async.ca.shared.global [ %r19123 + 0 ], [ %rd973 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2307062Z // end inline asm 2026-02-21T09:19:14.2307131Z cp.async.commit_group; 2026-02-21T09:19:14.2307336Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.2307403Z add.s32 %r19207, %r268, %r19175; 2026-02-21T09:19:14.2307605Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2307673Z cvt.s64.s32 %rd1034, %r19207; 2026-02-21T09:19:14.2307736Z add.s64 %rd974, %rd65, %rd1034; 2026-02-21T09:19:14.2307932Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2307995Z // begin inline asm 2026-02-21T09:19:14.2308197Z cp.async.ca.shared.global [ %r19125 + 0 ], [ %rd974 + 0 ], 0x4, %r24032; 2026-02-21T09:19:14.2308257Z // end inline asm 
2026-02-21T09:19:14.2308323Z cp.async.commit_group; 2026-02-21T09:19:14.2308592Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.2308659Z add.s64 %rd975, %rd999, 256; 2026-02-21T09:19:14.2308723Z add.s64 %rd976, %rd1003, 256; 2026-02-21T09:19:14.2308788Z add.s64 %rd977, %rd1007, 256; 2026-02-21T09:19:14.2308848Z add.s64 %rd978, %rd1011, 256; 2026-02-21T09:19:14.2308907Z add.s64 %rd979, %rd1015, 256; 2026-02-21T09:19:14.2308970Z add.s64 %rd980, %rd1019, 256; 2026-02-21T09:19:14.2309031Z add.s64 %rd981, %rd1023, 256; 2026-02-21T09:19:14.2309091Z add.s64 %rd982, %rd1027, 256; 2026-02-21T09:19:14.2309292Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2309349Z bar.sync 0; 2026-02-21T09:19:14.2309406Z // begin inline asm 2026-02-21T09:19:14.2309546Z cp.async.ca.shared.global [ %r19127 + 0 ], [ %rd975 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2309606Z // end inline asm 2026-02-21T09:19:14.2309663Z // begin inline asm 2026-02-21T09:19:14.2309798Z cp.async.ca.shared.global [ %r19129 + 0 ], [ %rd976 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2309856Z // end inline asm 2026-02-21T09:19:14.2309915Z // begin inline asm 2026-02-21T09:19:14.2310054Z cp.async.ca.shared.global [ %r19131 + 0 ], [ %rd977 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2310110Z // end inline asm 2026-02-21T09:19:14.2310172Z // begin inline asm 2026-02-21T09:19:14.2310304Z cp.async.ca.shared.global [ %r19133 + 0 ], [ %rd978 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2310361Z // end inline asm 2026-02-21T09:19:14.2310420Z // begin inline asm 2026-02-21T09:19:14.2310551Z cp.async.ca.shared.global [ %r19135 + 0 ], [ %rd979 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2310607Z // end inline asm 2026-02-21T09:19:14.2310665Z // begin inline asm 2026-02-21T09:19:14.2310799Z cp.async.ca.shared.global [ %r19137 + 0 ], [ %rd980 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2310855Z // end inline asm 2026-02-21T09:19:14.2310916Z // 
begin inline asm 2026-02-21T09:19:14.2311053Z cp.async.ca.shared.global [ %r19139 + 0 ], [ %rd981 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2311110Z // end inline asm 2026-02-21T09:19:14.2311168Z // begin inline asm 2026-02-21T09:19:14.2311303Z cp.async.ca.shared.global [ %r19141 + 0 ], [ %rd982 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2311455Z // end inline asm 2026-02-21T09:19:14.2311521Z cp.async.commit_group; 2026-02-21T09:19:14.2311718Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.2311873Z add.s32 %r19208, %r278, %r19175; 2026-02-21T09:19:14.2312070Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2312132Z cvt.s64.s32 %rd1035, %r19208; 2026-02-21T09:19:14.2312204Z add.s64 %rd983, %rd65, %rd1035; 2026-02-21T09:19:14.2312401Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2312461Z // begin inline asm 2026-02-21T09:19:14.2312650Z cp.async.ca.shared.global [ %r19143 + 0 ], [ %rd983 + 0 ], 0x4, %r24032; 2026-02-21T09:19:14.2312708Z // end inline asm 2026-02-21T09:19:14.2312771Z cp.async.commit_group; 2026-02-21T09:19:14.2312974Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.2313036Z add.s64 %rd984, %rd999, 288; 2026-02-21T09:19:14.2313096Z add.s64 %rd985, %rd1003, 288; 2026-02-21T09:19:14.2313155Z add.s64 %rd986, %rd1007, 288; 2026-02-21T09:19:14.2313219Z add.s64 %rd987, %rd1011, 288; 2026-02-21T09:19:14.2313279Z add.s64 %rd988, %rd1015, 288; 2026-02-21T09:19:14.2313337Z add.s64 %rd989, %rd1019, 288; 2026-02-21T09:19:14.2313400Z add.s64 %rd990, %rd1023, 288; 2026-02-21T09:19:14.2313459Z add.s64 %rd991, %rd1027, 288; 2026-02-21T09:19:14.2313703Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2313765Z // begin inline asm 2026-02-21T09:19:14.2313904Z cp.async.ca.shared.global [ %r19145 + 0 ], [ %rd984 + 0 ], 0x8, 
%r18984; 2026-02-21T09:19:14.2313958Z // end inline asm 2026-02-21T09:19:14.2314015Z // begin inline asm 2026-02-21T09:19:14.2314151Z cp.async.ca.shared.global [ %r19147 + 0 ], [ %rd985 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2314209Z // end inline asm 2026-02-21T09:19:14.2314266Z // begin inline asm 2026-02-21T09:19:14.2314401Z cp.async.ca.shared.global [ %r19149 + 0 ], [ %rd986 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2314458Z // end inline asm 2026-02-21T09:19:14.2314515Z // begin inline asm 2026-02-21T09:19:14.2314648Z cp.async.ca.shared.global [ %r19151 + 0 ], [ %rd987 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2314708Z // end inline asm 2026-02-21T09:19:14.2314764Z // begin inline asm 2026-02-21T09:19:14.2314898Z cp.async.ca.shared.global [ %r19153 + 0 ], [ %rd988 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2314956Z // end inline asm 2026-02-21T09:19:14.2315013Z // begin inline asm 2026-02-21T09:19:14.2315149Z cp.async.ca.shared.global [ %r19155 + 0 ], [ %rd989 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2315207Z // end inline asm 2026-02-21T09:19:14.2315265Z // begin inline asm 2026-02-21T09:19:14.2315397Z cp.async.ca.shared.global [ %r19157 + 0 ], [ %rd990 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2315454Z // end inline asm 2026-02-21T09:19:14.2315512Z // begin inline asm 2026-02-21T09:19:14.2315645Z cp.async.ca.shared.global [ %r19159 + 0 ], [ %rd991 + 0 ], 0x8, %r18984; 2026-02-21T09:19:14.2315699Z // end inline asm 2026-02-21T09:19:14.2315767Z cp.async.commit_group; 2026-02-21T09:19:14.2315964Z .loc 1 57 62 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:62 2026-02-21T09:19:14.2316026Z add.s32 %r19209, %r288, %r19175; 2026-02-21T09:19:14.2316222Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2316295Z cvt.s64.s32 %rd1036, %r19209; 2026-02-21T09:19:14.2316363Z add.s64 %rd992, %rd65, %rd1036; 2026-02-21T09:19:14.2316679Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 
2026-02-21T09:19:14.2316743Z // begin inline asm 2026-02-21T09:19:14.2316879Z cp.async.ca.shared.global [ %r19161 + 0 ], [ %rd992 + 0 ], 0x4, %r24032; 2026-02-21T09:19:14.2317015Z // end inline asm 2026-02-21T09:19:14.2317088Z cp.async.commit_group; 2026-02-21T09:19:14.2317284Z .loc 1 43 78 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:43:78 2026-02-21T09:19:14.2317408Z add.s32 %r19210, %r310, %r19173; 2026-02-21T09:19:14.2317475Z add.s32 %r24030, %r19210, %r19174; 2026-02-21T09:19:14.2317535Z or.b32 %r19211, %r18, %r2429; 2026-02-21T09:19:14.2317593Z shl.b32 %r19212, %r19211, 10; 2026-02-21T09:19:14.2317658Z mul.wide.s32 %rd52, %r19212, 2; 2026-02-21T09:19:14.2317723Z or.b32 %r19213, %r17, %r2429; 2026-02-21T09:19:14.2317782Z shl.b32 %r19214, %r19213, 10; 2026-02-21T09:19:14.2317846Z mul.wide.s32 %rd53, %r19214, 2; 2026-02-21T09:19:14.2317970Z or.b32 %r19215, %r16, %r2429; 2026-02-21T09:19:14.2318033Z shl.b32 %r19216, %r19215, 10; 2026-02-21T09:19:14.2318096Z mul.wide.s32 %rd54, %r19216, 2; 2026-02-21T09:19:14.2318155Z or.b32 %r19217, %r15, %r2429; 2026-02-21T09:19:14.2318218Z shl.b32 %r19218, %r19217, 10; 2026-02-21T09:19:14.2318280Z mul.wide.s32 %rd55, %r19218, 2; 2026-02-21T09:19:14.2318337Z or.b32 %r19219, %r14, %r2429; 2026-02-21T09:19:14.2318399Z shl.b32 %r19220, %r19219, 10; 2026-02-21T09:19:14.2318466Z mul.wide.s32 %rd56, %r19220, 2; 2026-02-21T09:19:14.2318524Z or.b32 %r19221, %r13, %r2429; 2026-02-21T09:19:14.2318585Z shl.b32 %r19222, %r19221, 10; 2026-02-21T09:19:14.2318648Z mul.wide.s32 %rd57, %r19222, 2; 2026-02-21T09:19:14.2318706Z or.b32 %r19223, %r12, %r2429; 2026-02-21T09:19:14.2318831Z shl.b32 %r19224, %r19223, 10; 2026-02-21T09:19:14.2318899Z mul.wide.s32 %rd58, %r19224, 2; 2026-02-21T09:19:14.2318958Z shl.b32 %r19225, %r19171, 19; 2026-02-21T09:19:14.2319021Z or.b32 %r19226, %r22984, %r19225; 2026-02-21T09:19:14.2319090Z mul.wide.s32 %rd59, %r19226, 2; 2026-02-21T09:19:14.2319152Z mov.b32 %r24033, 0f00000000; 
2026-02-21T09:19:14.2319213Z mov.b32 %r24031, -1; 2026-02-21T09:19:14.2319275Z mov.b64 %rd1123, -16; 2026-02-21T09:19:14.2319341Z mov.b64 %rd1122, %rd6; 2026-02-21T09:19:14.2319413Z mov.b32 %r24034, %r24033; 2026-02-21T09:19:14.2319473Z mov.b32 %r24035, %r24033; 2026-02-21T09:19:14.2319534Z mov.b32 %r24036, %r24033; 2026-02-21T09:19:14.2319592Z mov.b32 %r24037, %r24033; 2026-02-21T09:19:14.2319653Z mov.b32 %r24038, %r24033; 2026-02-21T09:19:14.2319710Z mov.b32 %r24039, %r24033; 2026-02-21T09:19:14.2319769Z mov.b32 %r24040, %r24033; 2026-02-21T09:19:14.2319827Z mov.b32 %r24041, %r24033; 2026-02-21T09:19:14.2319886Z mov.b32 %r24042, %r24033; 2026-02-21T09:19:14.2319949Z mov.b32 %r24043, %r24033; 2026-02-21T09:19:14.2320008Z mov.b32 %r24044, %r24033; 2026-02-21T09:19:14.2320065Z mov.b32 %r24045, %r24033; 2026-02-21T09:19:14.2320122Z mov.b32 %r24046, %r24033; 2026-02-21T09:19:14.2320182Z mov.b32 %r24047, %r24033; 2026-02-21T09:19:14.2320239Z mov.b32 %r24048, %r24033; 2026-02-21T09:19:14.2320296Z mov.b32 %r24049, %r24033; 2026-02-21T09:19:14.2320354Z mov.b32 %r24050, %r24033; 2026-02-21T09:19:14.2320414Z mov.b32 %r24051, %r24033; 2026-02-21T09:19:14.2320470Z mov.b32 %r24052, %r24033; 2026-02-21T09:19:14.2320528Z mov.b32 %r24053, %r24033; 2026-02-21T09:19:14.2320585Z mov.b32 %r24054, %r24033; 2026-02-21T09:19:14.2320644Z mov.b32 %r24055, %r24033; 2026-02-21T09:19:14.2320703Z mov.b32 %r24056, %r24033; 2026-02-21T09:19:14.2320763Z mov.b32 %r24057, %r24033; 2026-02-21T09:19:14.2320822Z mov.b32 %r24058, %r24033; 2026-02-21T09:19:14.2320878Z mov.b32 %r24059, %r24033; 2026-02-21T09:19:14.2320937Z mov.b32 %r24060, %r24033; 2026-02-21T09:19:14.2320996Z mov.b32 %r24061, %r24033; 2026-02-21T09:19:14.2321055Z mov.b32 %r24062, %r24033; 2026-02-21T09:19:14.2321117Z mov.b32 %r24063, %r24033; 2026-02-21T09:19:14.2321178Z mov.b32 %r24064, %r24033; 2026-02-21T09:19:14.2321237Z mov.b32 %r24065, %r24033; 2026-02-21T09:19:14.2321296Z mov.b32 %r24066, %r24033; 2026-02-21T09:19:14.2321356Z 
mov.b32 %r24067, %r24033; 2026-02-21T09:19:14.2321413Z mov.b32 %r24068, %r24033; 2026-02-21T09:19:14.2321534Z mov.b32 %r24069, %r24033; 2026-02-21T09:19:14.2321591Z mov.b32 %r24070, %r24033; 2026-02-21T09:19:14.2321651Z mov.b32 %r24071, %r24033; 2026-02-21T09:19:14.2321709Z mov.b32 %r24072, %r24033; 2026-02-21T09:19:14.2321823Z mov.b32 %r24073, %r24033; 2026-02-21T09:19:14.2321885Z mov.b32 %r24074, %r24033; 2026-02-21T09:19:14.2321943Z mov.b32 %r24075, %r24033; 2026-02-21T09:19:14.2322000Z mov.b32 %r24076, %r24033; 2026-02-21T09:19:14.2322057Z mov.b32 %r24077, %r24033; 2026-02-21T09:19:14.2322116Z mov.b32 %r24078, %r24033; 2026-02-21T09:19:14.2322175Z mov.b32 %r24079, %r24033; 2026-02-21T09:19:14.2322234Z mov.b32 %r24080, %r24033; 2026-02-21T09:19:14.2322293Z mov.b32 %r24081, %r24033; 2026-02-21T09:19:14.2322352Z mov.b32 %r24082, %r24033; 2026-02-21T09:19:14.2322459Z mov.b32 %r24083, %r24033; 2026-02-21T09:19:14.2322520Z mov.b32 %r24084, %r24033; 2026-02-21T09:19:14.2322582Z mov.b32 %r24085, %r24033; 2026-02-21T09:19:14.2322639Z mov.b32 %r24086, %r24033; 2026-02-21T09:19:14.2322699Z mov.b32 %r24087, %r24033; 2026-02-21T09:19:14.2322760Z mov.b32 %r24088, %r24033; 2026-02-21T09:19:14.2322819Z mov.b32 %r24089, %r24033; 2026-02-21T09:19:14.2322876Z mov.b32 %r24090, %r24033; 2026-02-21T09:19:14.2322935Z mov.b32 %r24091, %r24033; 2026-02-21T09:19:14.2322996Z mov.b32 %r24092, %r24033; 2026-02-21T09:19:14.2323055Z mov.b32 %r24093, %r24033; 2026-02-21T09:19:14.2323117Z mov.b32 %r24094, %r24033; 2026-02-21T09:19:14.2323175Z mov.b32 %r24095, %r24033; 2026-02-21T09:19:14.2323233Z mov.b32 %r24096, %r24033; 2026-02-21T09:19:14.2323348Z mov.b32 %r24097, %r24033; 2026-02-21T09:19:14.2323408Z mov.b32 %r24098, %r24033; 2026-02-21T09:19:14.2323466Z mov.b32 %r24099, %r24033; 2026-02-21T09:19:14.2323522Z mov.b32 %r24100, %r24033; 2026-02-21T09:19:14.2323585Z mov.b32 %r24101, %r24033; 2026-02-21T09:19:14.2323642Z mov.b32 %r24102, %r24033; 2026-02-21T09:19:14.2323699Z mov.b32 %r24103, 
%r24033; 2026-02-21T09:19:14.2323760Z mov.b32 %r24104, %r24033; 2026-02-21T09:19:14.2323819Z mov.b32 %r24105, %r24033; 2026-02-21T09:19:14.2323877Z mov.b32 %r24106, %r24033; 2026-02-21T09:19:14.2323933Z mov.b32 %r24107, %r24033; 2026-02-21T09:19:14.2323996Z mov.b32 %r24108, %r24033; 2026-02-21T09:19:14.2324056Z mov.b32 %r24109, %r24033; 2026-02-21T09:19:14.2324112Z mov.b32 %r24110, %r24033; 2026-02-21T09:19:14.2324178Z mov.b32 %r24111, %r24033; 2026-02-21T09:19:14.2324243Z mov.b32 %r24112, %r24033; 2026-02-21T09:19:14.2324301Z mov.b32 %r24113, %r24033; 2026-02-21T09:19:14.2324362Z mov.b32 %r24114, %r24033; 2026-02-21T09:19:14.2324421Z mov.b32 %r24115, %r24033; 2026-02-21T09:19:14.2324480Z mov.b32 %r24116, %r24033; 2026-02-21T09:19:14.2324538Z mov.b32 %r24117, %r24033; 2026-02-21T09:19:14.2324599Z mov.b32 %r24118, %r24033; 2026-02-21T09:19:14.2324657Z mov.b32 %r24119, %r24033; 2026-02-21T09:19:14.2324715Z mov.b32 %r24120, %r24033; 2026-02-21T09:19:14.2324773Z mov.b32 %r24121, %r24033; 2026-02-21T09:19:14.2324834Z mov.b32 %r24122, %r24033; 2026-02-21T09:19:14.2324894Z mov.b32 %r24123, %r24033; 2026-02-21T09:19:14.2324952Z mov.b32 %r24124, %r24033; 2026-02-21T09:19:14.2325011Z mov.b32 %r24125, %r24033; 2026-02-21T09:19:14.2325068Z mov.b32 %r24126, %r24033; 2026-02-21T09:19:14.2325126Z mov.b32 %r24127, %r24033; 2026-02-21T09:19:14.2325185Z mov.b32 %r24128, %r24033; 2026-02-21T09:19:14.2325243Z mov.b32 %r24129, %r24033; 2026-02-21T09:19:14.2325300Z mov.b32 %r24130, %r24033; 2026-02-21T09:19:14.2325357Z mov.b32 %r24131, %r24033; 2026-02-21T09:19:14.2325420Z mov.b32 %r24132, %r24033; 2026-02-21T09:19:14.2325480Z mov.b32 %r24133, %r24033; 2026-02-21T09:19:14.2325538Z mov.b32 %r24134, %r24033; 2026-02-21T09:19:14.2325598Z mov.b32 %r24135, %r24033; 2026-02-21T09:19:14.2325655Z mov.b32 %r24136, %r24033; 2026-02-21T09:19:14.2325714Z mov.b32 %r24137, %r24033; 2026-02-21T09:19:14.2325771Z mov.b32 %r24138, %r24033; 2026-02-21T09:19:14.2325831Z mov.b32 %r24139, %r24033; 
2026-02-21T09:19:14.2325950Z mov.b32 %r24140, %r24033; 2026-02-21T09:19:14.2326009Z mov.b32 %r24141, %r24033; 2026-02-21T09:19:14.2326069Z mov.b32 %r24142, %r24033; 2026-02-21T09:19:14.2326125Z mov.b32 %r24143, %r24033; 2026-02-21T09:19:14.2326183Z mov.b32 %r24144, %r24033; 2026-02-21T09:19:14.2326286Z mov.b32 %r24145, %r24033; 2026-02-21T09:19:14.2326346Z mov.b32 %r24146, %r24033; 2026-02-21T09:19:14.2326404Z mov.b32 %r24147, %r24033; 2026-02-21T09:19:14.2326568Z mov.b32 %r24148, %r24033; 2026-02-21T09:19:14.2326632Z mov.b32 %r24149, %r24033; 2026-02-21T09:19:14.2326689Z mov.b32 %r24150, %r24033; 2026-02-21T09:19:14.2326748Z mov.b32 %r24151, %r24033; 2026-02-21T09:19:14.2326806Z mov.b32 %r24152, %r24033; 2026-02-21T09:19:14.2326868Z mov.b32 %r24153, %r24033; 2026-02-21T09:19:14.2326927Z mov.b32 %r24154, %r24033; 2026-02-21T09:19:14.2327066Z mov.b32 %r24155, %r24033; 2026-02-21T09:19:14.2327130Z mov.b32 %r24156, %r24033; 2026-02-21T09:19:14.2327187Z mov.b32 %r24157, %r24033; 2026-02-21T09:19:14.2327246Z mov.b32 %r24158, %r24033; 2026-02-21T09:19:14.2327305Z mov.b32 %r24159, %r24033; 2026-02-21T09:19:14.2327365Z mov.b32 %r24160, %r24033; 2026-02-21T09:19:14.2327424Z mov.b32 %r24161, %r24033; 2026-02-21T09:19:14.2327483Z mov.b32 %r24162, %r24033; 2026-02-21T09:19:14.2327551Z mov.b32 %r24163, %r24033; 2026-02-21T09:19:14.2327616Z mov.b32 %r24164, %r24033; 2026-02-21T09:19:14.2327675Z mov.b32 %r24165, %r24033; 2026-02-21T09:19:14.2327734Z mov.b32 %r24166, %r24033; 2026-02-21T09:19:14.2327792Z mov.b32 %r24167, %r24033; 2026-02-21T09:19:14.2327849Z mov.b32 %r24168, %r24033; 2026-02-21T09:19:14.2327969Z mov.b32 %r24169, %r24033; 2026-02-21T09:19:14.2328032Z mov.b32 %r24170, %r24033; 2026-02-21T09:19:14.2328089Z mov.b32 %r24171, %r24033; 2026-02-21T09:19:14.2328148Z mov.b32 %r24172, %r24033; 2026-02-21T09:19:14.2328208Z mov.b32 %r24173, %r24033; 2026-02-21T09:19:14.2328267Z mov.b32 %r24174, %r24033; 2026-02-21T09:19:14.2328325Z mov.b32 %r24175, %r24033; 
2026-02-21T09:19:14.2328394Z mov.b32 %r24176, %r24033; 2026-02-21T09:19:14.2328457Z mov.b32 %r24177, %r24033; 2026-02-21T09:19:14.2328514Z mov.b32 %r24178, %r24033; 2026-02-21T09:19:14.2328574Z mov.b32 %r24179, %r24033; 2026-02-21T09:19:14.2328634Z mov.b32 %r24180, %r24033; 2026-02-21T09:19:14.2328694Z mov.b32 %r24181, %r24033; 2026-02-21T09:19:14.2328753Z mov.b32 %r24182, %r24033; 2026-02-21T09:19:14.2328810Z mov.b32 %r24183, %r24033; 2026-02-21T09:19:14.2328870Z mov.b32 %r24184, %r24033; 2026-02-21T09:19:14.2328927Z mov.b32 %r24185, %r24033; 2026-02-21T09:19:14.2328984Z mov.b32 %r24186, %r24033; 2026-02-21T09:19:14.2329045Z mov.b32 %r24187, %r24033; 2026-02-21T09:19:14.2329111Z mov.b32 %r24188, %r24033; 2026-02-21T09:19:14.2329168Z mov.b32 %r24189, %r24033; 2026-02-21T09:19:14.2329229Z mov.b32 %r24190, %r24033; 2026-02-21T09:19:14.2329289Z mov.b32 %r24191, %r24033; 2026-02-21T09:19:14.2329347Z mov.b32 %r24192, %r24033; 2026-02-21T09:19:14.2329404Z mov.b32 %r24193, %r24033; 2026-02-21T09:19:14.2329466Z mov.b32 %r24194, %r24033; 2026-02-21T09:19:14.2329523Z mov.b32 %r24195, %r24033; 2026-02-21T09:19:14.2329581Z mov.b32 %r24196, %r24033; 2026-02-21T09:19:14.2329638Z mov.b32 %r24197, %r24033; 2026-02-21T09:19:14.2329699Z mov.b32 %r24198, %r24033; 2026-02-21T09:19:14.2329759Z mov.b32 %r24199, %r24033; 2026-02-21T09:19:14.2329817Z mov.b32 %r24200, %r24033; 2026-02-21T09:19:14.2329877Z mov.b32 %r24201, %r24033; 2026-02-21T09:19:14.2329933Z mov.b32 %r24202, %r24033; 2026-02-21T09:19:14.2329990Z mov.b32 %r24203, %r24033; 2026-02-21T09:19:14.2330047Z mov.b32 %r24204, %r24033; 2026-02-21T09:19:14.2330108Z mov.b32 %r24205, %r24033; 2026-02-21T09:19:14.2330166Z mov.b32 %r24206, %r24033; 2026-02-21T09:19:14.2330223Z mov.b32 %r24207, %r24033; 2026-02-21T09:19:14.2330285Z mov.b32 %r24208, %r24033; 2026-02-21T09:19:14.2330343Z mov.b32 %r24209, %r24033; 2026-02-21T09:19:14.2330401Z mov.b32 %r24210, %r24033; 2026-02-21T09:19:14.2330461Z mov.b32 %r24211, %r24033; 
2026-02-21T09:19:14.2330614Z mov.b32 %r24212, %r24033; 2026-02-21T09:19:14.2330672Z mov.b32 %r24213, %r24033; 2026-02-21T09:19:14.2330728Z mov.b32 %r24214, %r24033; 2026-02-21T09:19:14.2330792Z mov.b32 %r24215, %r24033; 2026-02-21T09:19:14.2330849Z mov.b32 %r24216, %r24033; 2026-02-21T09:19:14.2330969Z mov.b32 %r24217, %r24033; 2026-02-21T09:19:14.2331028Z mov.b32 %r24218, %r24033; 2026-02-21T09:19:14.2331086Z mov.b32 %r24219, %r24033; 2026-02-21T09:19:14.2331143Z mov.b32 %r24220, %r24033; 2026-02-21T09:19:14.2331201Z mov.b32 %r24221, %r24033; 2026-02-21T09:19:14.2331260Z mov.b32 %r24222, %r24033; 2026-02-21T09:19:14.2331319Z mov.b32 %r24223, %r24033; 2026-02-21T09:19:14.2331376Z mov.b32 %r24224, %r24033; 2026-02-21T09:19:14.2331436Z mov.b32 %r24225, %r24033; 2026-02-21T09:19:14.2331553Z mov.b32 %r24226, %r24033; 2026-02-21T09:19:14.2331615Z mov.b32 %r24227, %r24033; 2026-02-21T09:19:14.2331672Z mov.b32 %r24228, %r24033; 2026-02-21T09:19:14.2331733Z mov.b32 %r24229, %r24033; 2026-02-21T09:19:14.2331792Z mov.b32 %r24230, %r24033; 2026-02-21T09:19:14.2331849Z mov.b32 %r24231, %r24033; 2026-02-21T09:19:14.2331908Z mov.b32 %r24232, %r24033; 2026-02-21T09:19:14.2331967Z mov.b32 %r24233, %r24033; 2026-02-21T09:19:14.2332026Z mov.b32 %r24234, %r24033; 2026-02-21T09:19:14.2332087Z mov.b32 %r24235, %r24033; 2026-02-21T09:19:14.2332147Z mov.b32 %r24236, %r24033; 2026-02-21T09:19:14.2332203Z mov.b32 %r24237, %r24033; 2026-02-21T09:19:14.2332261Z mov.b32 %r24238, %r24033; 2026-02-21T09:19:14.2332321Z mov.b32 %r24239, %r24033; 2026-02-21T09:19:14.2332378Z mov.b32 %r24240, %r24033; 2026-02-21T09:19:14.2332482Z mov.b32 %r24241, %r24033; 2026-02-21T09:19:14.2332541Z mov.b32 %r24242, %r24033; 2026-02-21T09:19:14.2332603Z mov.b32 %r24243, %r24033; 2026-02-21T09:19:14.2332662Z mov.b32 %r24244, %r24033; 2026-02-21T09:19:14.2332719Z mov.b32 %r24245, %r24033; 2026-02-21T09:19:14.2332779Z mov.b32 %r24246, %r24033; 2026-02-21T09:19:14.2332837Z mov.b32 %r24247, %r24033; 
2026-02-21T09:19:14.2332896Z mov.b32 %r24248, %r24033; 2026-02-21T09:19:14.2332954Z mov.b32 %r24249, %r24033; 2026-02-21T09:19:14.2333015Z mov.b32 %r24250, %r24033; 2026-02-21T09:19:14.2333071Z mov.b32 %r24251, %r24033; 2026-02-21T09:19:14.2333129Z mov.b32 %r24252, %r24033; 2026-02-21T09:19:14.2333192Z mov.b32 %r24253, %r24033; 2026-02-21T09:19:14.2333250Z mov.b32 %r24254, %r24033; 2026-02-21T09:19:14.2333308Z mov.b32 %r24255, %r24033; 2026-02-21T09:19:14.2333367Z mov.b32 %r24256, %r24033; 2026-02-21T09:19:14.2333426Z mov.b32 %r24257, %r24033; 2026-02-21T09:19:14.2333486Z mov.b32 %r24258, %r24033; 2026-02-21T09:19:14.2333544Z mov.b32 %r24259, %r24033; 2026-02-21T09:19:14.2333605Z mov.b32 %r24260, %r24033; 2026-02-21T09:19:14.2333663Z mov.b32 %r24261, %r24033; 2026-02-21T09:19:14.2333721Z mov.b32 %r24262, %r24033; 2026-02-21T09:19:14.2333782Z mov.b32 %r24263, %r24033; 2026-02-21T09:19:14.2333838Z mov.b32 %r24264, %r24033; 2026-02-21T09:19:14.2333896Z mov.b32 %r24265, %r24033; 2026-02-21T09:19:14.2333954Z mov.b32 %r24266, %r24033; 2026-02-21T09:19:14.2334013Z mov.b32 %r24267, %r24033; 2026-02-21T09:19:14.2334070Z mov.b32 %r24268, %r24033; 2026-02-21T09:19:14.2334128Z mov.b32 %r24269, %r24033; 2026-02-21T09:19:14.2334187Z mov.b32 %r24270, %r24033; 2026-02-21T09:19:14.2334247Z mov.b32 %r24271, %r24033; 2026-02-21T09:19:14.2334303Z mov.b32 %r24272, %r24033; 2026-02-21T09:19:14.2334360Z mov.b32 %r24273, %r24033; 2026-02-21T09:19:14.2334422Z mov.b32 %r24274, %r24033; 2026-02-21T09:19:14.2334479Z mov.b32 %r24275, %r24033; 2026-02-21T09:19:14.2334537Z mov.b32 %r24276, %r24033; 2026-02-21T09:19:14.2334597Z mov.b32 %r24277, %r24033; 2026-02-21T09:19:14.2334655Z mov.b32 %r24278, %r24033; 2026-02-21T09:19:14.2334714Z mov.b32 %r24279, %r24033; 2026-02-21T09:19:14.2334772Z mov.b32 %r24280, %r24033; 2026-02-21T09:19:14.2334833Z mov.b32 %r24281, %r24033; 2026-02-21T09:19:14.2334891Z mov.b32 %r24282, %r24033; 2026-02-21T09:19:14.2334949Z mov.b32 %r24283, %r24033; 
2026-02-21T09:19:14.2335072Z mov.b32 %r24284, %r24033; 2026-02-21T09:19:14.2335130Z mov.b32 %r24285, %r24033; 2026-02-21T09:19:14.2335188Z mov.b32 %r24286, %r24033; 2026-02-21T09:19:14.2335244Z mov.b32 %r24287, %r24033; 2026-02-21T09:19:14.2335316Z mov.b32 %r24288, %r24033; 2026-02-21T09:19:14.2335486Z $L__BB0_14: // Parent Loop BB0_13 Depth=1 2026-02-21T09:19:14.2335594Z // => This Inner Loop Header: Depth=2 2026-02-21T09:19:14.2335662Z add.s64 %rd1123, %rd1123, 16; 2026-02-21T09:19:14.2335734Z setp.lt.u64 %p105, %rd1123, 432; 2026-02-21T09:19:14.2335797Z add.s32 %r22411, %r24031, 1; 2026-02-21T09:19:14.2335867Z setp.gt.s32 %p106, %r22411, 4; 2026-02-21T09:19:14.2335937Z selp.b32 %r24031, 0, %r22411, %p106; 2026-02-21T09:19:14.2336198Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2336270Z cp.async.wait_group 16; 2026-02-21T09:19:14.2336330Z bar.sync 0; 2026-02-21T09:19:14.2336393Z shl.b32 %r22412, %r24031, 14; 2026-02-21T09:19:14.2336574Z add.s32 %r22414, %r22967, %r22412; 2026-02-21T09:19:14.2336785Z .loc 1 55 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:55:32 2026-02-21T09:19:14.2336854Z add.s32 %r22415, %r22414, %r290; 2026-02-21T09:19:14.2336919Z ld.shared.b16 %rs584, [%r22415]; 2026-02-21T09:19:14.2336991Z ld.shared.b16 %rs585, [%r22415+256]; 2026-02-21T09:19:14.2337060Z ld.shared.b16 %rs586, [%r22415+16]; 2026-02-21T09:19:14.2337126Z ld.shared.b16 %rs587, [%r22415+272]; 2026-02-21T09:19:14.2337277Z ld.shared.b16 %rs588, [%r22415+4096]; 2026-02-21T09:19:14.2337353Z ld.shared.b16 %rs589, [%r22415+4352]; 2026-02-21T09:19:14.2337421Z ld.shared.b16 %rs590, [%r22415+4112]; 2026-02-21T09:19:14.2337488Z ld.shared.b16 %rs591, [%r22415+4368]; 2026-02-21T09:19:14.2337555Z ld.shared.b16 %rs592, [%r22415+8192]; 2026-02-21T09:19:14.2337623Z ld.shared.b16 %rs593, [%r22415+8448]; 2026-02-21T09:19:14.2337690Z ld.shared.b16 %rs594, [%r22415+8208]; 2026-02-21T09:19:14.2337756Z ld.shared.b16 %rs595, 
[%r22415+8464]; 2026-02-21T09:19:14.2337830Z ld.shared.b16 %rs596, [%r22415+12288]; 2026-02-21T09:19:14.2337898Z ld.shared.b16 %rs597, [%r22415+12544]; 2026-02-21T09:19:14.2337967Z ld.shared.b16 %rs598, [%r22415+12304]; 2026-02-21T09:19:14.2338036Z ld.shared.b16 %rs599, [%r22415+12560]; 2026-02-21T09:19:14.2338099Z add.s32 %r22416, %r22414, %r291; 2026-02-21T09:19:14.2338161Z ld.shared.b16 %rs600, [%r22416]; 2026-02-21T09:19:14.2338228Z ld.shared.b16 %rs601, [%r22416+256]; 2026-02-21T09:19:14.2338296Z ld.shared.b16 %rs602, [%r22416+16]; 2026-02-21T09:19:14.2338361Z ld.shared.b16 %rs603, [%r22416+272]; 2026-02-21T09:19:14.2338426Z ld.shared.b16 %rs604, [%r22416+4096]; 2026-02-21T09:19:14.2338495Z ld.shared.b16 %rs605, [%r22416+4352]; 2026-02-21T09:19:14.2338560Z ld.shared.b16 %rs606, [%r22416+4112]; 2026-02-21T09:19:14.2338624Z ld.shared.b16 %rs607, [%r22416+4368]; 2026-02-21T09:19:14.2338692Z ld.shared.b16 %rs608, [%r22416+8192]; 2026-02-21T09:19:14.2338757Z ld.shared.b16 %rs609, [%r22416+8448]; 2026-02-21T09:19:14.2338822Z ld.shared.b16 %rs610, [%r22416+8208]; 2026-02-21T09:19:14.2338886Z ld.shared.b16 %rs611, [%r22416+8464]; 2026-02-21T09:19:14.2338958Z ld.shared.b16 %rs612, [%r22416+12288]; 2026-02-21T09:19:14.2339025Z ld.shared.b16 %rs613, [%r22416+12544]; 2026-02-21T09:19:14.2339093Z ld.shared.b16 %rs614, [%r22416+12304]; 2026-02-21T09:19:14.2339163Z ld.shared.b16 %rs615, [%r22416+12560]; 2026-02-21T09:19:14.2339229Z cvt.f32.bf16 %r19355, %rs584; 2026-02-21T09:19:14.2339291Z cvt.f32.bf16 %r19356, %rs585; 2026-02-21T09:19:14.2339354Z cvt.f32.bf16 %r19357, %rs600; 2026-02-21T09:19:14.2339413Z cvt.f32.bf16 %r19358, %rs601; 2026-02-21T09:19:14.2339473Z cvt.f32.bf16 %r19487, %rs586; 2026-02-21T09:19:14.2339531Z cvt.f32.bf16 %r19488, %rs587; 2026-02-21T09:19:14.2339594Z cvt.f32.bf16 %r19489, %rs602; 2026-02-21T09:19:14.2339652Z cvt.f32.bf16 %r19490, %rs603; 2026-02-21T09:19:14.2339790Z cvt.f32.bf16 %r19619, %rs588; 2026-02-21T09:19:14.2339851Z cvt.f32.bf16 %r19620, 
%rs589; 2026-02-21T09:19:14.2339911Z cvt.f32.bf16 %r19621, %rs604; 2026-02-21T09:19:14.2339975Z cvt.f32.bf16 %r19622, %rs605; 2026-02-21T09:19:14.2340095Z cvt.f32.bf16 %r19751, %rs590; 2026-02-21T09:19:14.2340157Z cvt.f32.bf16 %r19752, %rs591; 2026-02-21T09:19:14.2340216Z cvt.f32.bf16 %r19753, %rs606; 2026-02-21T09:19:14.2340278Z cvt.f32.bf16 %r19754, %rs607; 2026-02-21T09:19:14.2340340Z cvt.f32.bf16 %r19883, %rs592; 2026-02-21T09:19:14.2340414Z cvt.f32.bf16 %r19884, %rs593; 2026-02-21T09:19:14.2340476Z cvt.f32.bf16 %r19885, %rs608; 2026-02-21T09:19:14.2340536Z cvt.f32.bf16 %r19886, %rs609; 2026-02-21T09:19:14.2340597Z cvt.f32.bf16 %r20015, %rs594; 2026-02-21T09:19:14.2340721Z cvt.f32.bf16 %r20016, %rs595; 2026-02-21T09:19:14.2340784Z cvt.f32.bf16 %r20017, %rs610; 2026-02-21T09:19:14.2340846Z cvt.f32.bf16 %r20018, %rs611; 2026-02-21T09:19:14.2340904Z cvt.f32.bf16 %r20147, %rs596; 2026-02-21T09:19:14.2340965Z cvt.f32.bf16 %r20148, %rs597; 2026-02-21T09:19:14.2341027Z cvt.f32.bf16 %r20149, %rs612; 2026-02-21T09:19:14.2341086Z cvt.f32.bf16 %r20150, %rs613; 2026-02-21T09:19:14.2341146Z cvt.f32.bf16 %r20279, %rs598; 2026-02-21T09:19:14.2341206Z cvt.f32.bf16 %r20280, %rs599; 2026-02-21T09:19:14.2341268Z cvt.f32.bf16 %r20281, %rs614; 2026-02-21T09:19:14.2341327Z cvt.f32.bf16 %r20282, %rs615; 2026-02-21T09:19:14.2341539Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2341649Z shl.b32 %r22417, %r24031, 10; 2026-02-21T09:19:14.2341715Z add.s32 %r22418, %r22967, %r22417; 2026-02-21T09:19:14.2341776Z add.s32 %r22419, %r22418, 172032; 2026-02-21T09:19:14.2341979Z .loc 1 70 45 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:70:45 2026-02-21T09:19:14.2342044Z add.s32 %r22420, %r22419, %r22973; 2026-02-21T09:19:14.2342111Z ld.shared.b8 %rs616, [%r22420]; 2026-02-21T09:19:14.2342179Z ld.shared.b8 %rs617, [%r22420+128]; 2026-02-21T09:19:14.2342247Z ld.shared.b8 %rs618, [%r22420+256]; 2026-02-21T09:19:14.2342311Z 
ld.shared.b8 %rs619, [%r22420+384]; 2026-02-21T09:19:14.2342375Z ld.shared.b8 %rs620, [%r22420+512]; 2026-02-21T09:19:14.2342458Z ld.shared.b8 %rs621, [%r22420+640]; 2026-02-21T09:19:14.2342523Z ld.shared.b8 %rs622, [%r22420+768]; 2026-02-21T09:19:14.2342585Z add.s32 %r22421, %r22419, %r22974; 2026-02-21T09:19:14.2342648Z ld.shared.b8 %rs623, [%r22421]; 2026-02-21T09:19:14.2342849Z .loc 1 60 28 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:60:28 2026-02-21T09:19:14.2342915Z shl.b16 %rs624, %rs616, 4; 2026-02-21T09:19:14.2342975Z shl.b16 %rs625, %rs617, 4; 2026-02-21T09:19:14.2343040Z shl.b16 %rs626, %rs618, 4; 2026-02-21T09:19:14.2343100Z shl.b16 %rs627, %rs619, 4; 2026-02-21T09:19:14.2343159Z shl.b16 %rs628, %rs620, 4; 2026-02-21T09:19:14.2343219Z shl.b16 %rs629, %rs621, 4; 2026-02-21T09:19:14.2343284Z shl.b16 %rs630, %rs622, 4; 2026-02-21T09:19:14.2343347Z shl.b16 %rs631, %rs623, 4; 2026-02-21T09:19:14.2343548Z .loc 1 75 58 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:75:58 2026-02-21T09:19:14.2343625Z selp.b16 %rs632, %rs624, %rs616, %p110; 2026-02-21T09:19:14.2343701Z cvt.s16.s8 %rs633, %rs632; 2026-02-21T09:19:14.2343762Z shr.s16 %rs634, %rs633, 4; 2026-02-21T09:19:14.2343837Z selp.b16 %rs635, %rs625, %rs617, %p110; 2026-02-21T09:19:14.2343898Z cvt.s16.s8 %rs636, %rs635; 2026-02-21T09:19:14.2343960Z shr.s16 %rs637, %rs636, 4; 2026-02-21T09:19:14.2344028Z selp.b16 %rs638, %rs626, %rs618, %p110; 2026-02-21T09:19:14.2344092Z cvt.s16.s8 %rs639, %rs638; 2026-02-21T09:19:14.2344152Z shr.s16 %rs640, %rs639, 4; 2026-02-21T09:19:14.2344223Z selp.b16 %rs641, %rs627, %rs619, %p110; 2026-02-21T09:19:14.2344285Z cvt.s16.s8 %rs642, %rs641; 2026-02-21T09:19:14.2344344Z shr.s16 %rs643, %rs642, 4; 2026-02-21T09:19:14.2344413Z selp.b16 %rs644, %rs628, %rs620, %p110; 2026-02-21T09:19:14.2344533Z cvt.s16.s8 %rs645, %rs644; 2026-02-21T09:19:14.2344597Z shr.s16 %rs646, %rs645, 4; 2026-02-21T09:19:14.2344663Z selp.b16 %rs647, %rs629, %rs621, %p110; 
2026-02-21T09:19:14.2344724Z cvt.s16.s8 %rs648, %rs647; 2026-02-21T09:19:14.2344833Z shr.s16 %rs649, %rs648, 4; 2026-02-21T09:19:14.2344901Z selp.b16 %rs650, %rs630, %rs622, %p110; 2026-02-21T09:19:14.2344961Z cvt.s16.s8 %rs651, %rs650; 2026-02-21T09:19:14.2345023Z shr.s16 %rs652, %rs651, 4; 2026-02-21T09:19:14.2345092Z selp.b16 %rs653, %rs631, %rs623, %p110; 2026-02-21T09:19:14.2345152Z cvt.s16.s8 %rs654, %rs653; 2026-02-21T09:19:14.2345212Z shr.s16 %rs655, %rs654, 4; 2026-02-21T09:19:14.2345467Z .loc 1 80 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:80:32 2026-02-21T09:19:14.2345535Z cvt.rn.f32.s16 %r22422, %rs634; 2026-02-21T09:19:14.2345598Z cvt.rn.f32.s16 %r22423, %rs637; 2026-02-21T09:19:14.2345661Z cvt.rn.f32.s16 %r22424, %rs640; 2026-02-21T09:19:14.2345725Z cvt.rn.f32.s16 %r22425, %rs643; 2026-02-21T09:19:14.2345786Z cvt.rn.f32.s16 %r22426, %rs646; 2026-02-21T09:19:14.2345846Z cvt.rn.f32.s16 %r22427, %rs649; 2026-02-21T09:19:14.2345909Z cvt.rn.f32.s16 %r22428, %rs652; 2026-02-21T09:19:14.2345972Z cvt.rn.f32.s16 %r22429, %rs655; 2026-02-21T09:19:14.2346036Z st.shared.b32 [%r294], %r22422; 2026-02-21T09:19:14.2346104Z st.shared.b32 [%r294+8], %r22423; 2026-02-21T09:19:14.2346166Z st.shared.b32 [%r295], %r22424; 2026-02-21T09:19:14.2346229Z st.shared.b32 [%r295+8], %r22425; 2026-02-21T09:19:14.2346374Z st.shared.b32 [%r296], %r22426; 2026-02-21T09:19:14.2346442Z st.shared.b32 [%r296+8], %r22427; 2026-02-21T09:19:14.2346615Z st.shared.b32 [%r297], %r22428; 2026-02-21T09:19:14.2346683Z st.shared.b32 [%r297+8], %r22429; 2026-02-21T09:19:14.2346741Z $L__tmp17: 2026-02-21T09:19:14.2347021Z .loc 2 291 36 // standard.py:291:36 @[ co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:87:40 ] 2026-02-21T09:19:14.2347084Z // begin inline asm 2026-02-21T09:19:14.2347170Z fence.proxy.async.shared::cta; 2026-02-21T09:19:14.2347227Z // end inline asm 2026-02-21T09:19:14.2347282Z bar.sync 0; 2026-02-21T09:19:14.2347365Z shfl.sync.idx.b32 %r22430, %r5, 0, 
31, -1; 2026-02-21T09:19:14.2347443Z wgmma.fence.sync.aligned; 2026-02-21T09:19:14.2347508Z mov.pred %p88, -1; 2026-02-21T09:19:14.2347568Z // begin inline asm 2026-02-21T09:19:14.2349142Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r24033,%r24034,%r24035,%r24036,%r24037,%r24038,%r24039,%r24040,%r24041,%r24042,%r24043,%r24044,%r24045,%r24046,%r24047,%r24048,%r24049,%r24050,%r24051,%r24052,%r24053,%r24054,%r24055,%r24056,%r24057,%r24058,%r24059,%r24060,%r24061,%r24062,%r24063,%r24064,%r24065,%r24066,%r24067,%r24068,%r24069,%r24070,%r24071,%r24072,%r24073,%r24074,%r24075,%r24076,%r24077,%r24078,%r24079,%r24080,%r24081,%r24082,%r24083,%r24084,%r24085,%r24086,%r24087,%r24088,%r24089,%r24090,%r24091,%r24092,%r24093,%r24094,%r24095,%r24096}, {%r19355,%r19356,%r19357,%r19358}, %rd4, %p88, 1, 1; 2026-02-21T09:19:14.2349204Z // end inline asm 2026-02-21T09:19:14.2349270Z // begin inline asm 2026-02-21T09:19:14.2350757Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r24033,%r24034,%r24035,%r24036,%r24037,%r24038,%r24039,%r24040,%r24041,%r24042,%r24043,%r24044,%r24045,%r24046,%r24047,%r24048,%r24049,%r24050,%r24051,%r24052,%r24053,%r24054,%r24055,%r24056,%r24057,%r24058,%r24059,%r24060,%r24061,%r24062,%r24063,%r24064,%r24065,%r24066,%r24067,%r24068,%r24069,%r24070,%r24071,%r24072,%r24073,%r24074,%r24075,%r24076,%r24077,%r24078,%r24079,%r24080,%r24081,%r24082,%r24083,%r24084,%r24085,%r24086,%r24087,%r24088,%r24089,%r24090,%r24091,%r24092,%r24093,%r24094,%r24095,%r24096}, {%r19487,%r19488,%r19489,%r19490}, %rd5, %p88, 1, 1; 2026-02-21T09:19:14.2350819Z // end inline asm 2026-02-21T09:19:14.2350879Z // begin inline asm 2026-02-21T09:19:14.2352356Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r24097,%r24098,%r24099,%r24100,%r24101,%r24102,%r24103,%r24104,%r24105,%r24106,%r24107,%r24108,%r24109,%r24110,%r24111,%r24112,%r24113,%r24114,%r24115,%r24116,%r24117,%r24118,%r24119,%r24120,%r24121,%r24122,%r24123,%r24124,%r24125,%r24126,%r24127,%r24128,%r24129,%r24130,%r24131,%r24132,%r24133,%r24134,%r24135,%r24136,%r24137,%r24138,%r24139,%r24140,%r24141,%r24142,%r24143,%r24144,%r24145,%r24146,%r24147,%r24148,%r24149,%r24150,%r24151,%r24152,%r24153,%r24154,%r24155,%r24156,%r24157,%r24158,%r24159,%r24160}, {%r19619,%r19620,%r19621,%r19622}, %rd4, %p88, 1, 1; 2026-02-21T09:19:14.2352563Z // end inline asm 2026-02-21T09:19:14.2352626Z // begin inline asm 2026-02-21T09:19:14.2354161Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r24097,%r24098,%r24099,%r24100,%r24101,%r24102,%r24103,%r24104,%r24105,%r24106,%r24107,%r24108,%r24109,%r24110,%r24111,%r24112,%r24113,%r24114,%r24115,%r24116,%r24117,%r24118,%r24119,%r24120,%r24121,%r24122,%r24123,%r24124,%r24125,%r24126,%r24127,%r24128,%r24129,%r24130,%r24131,%r24132,%r24133,%r24134,%r24135,%r24136,%r24137,%r24138,%r24139,%r24140,%r24141,%r24142,%r24143,%r24144,%r24145,%r24146,%r24147,%r24148,%r24149,%r24150,%r24151,%r24152,%r24153,%r24154,%r24155,%r24156,%r24157,%r24158,%r24159,%r24160}, {%r19751,%r19752,%r19753,%r19754}, %rd5, %p88, 1, 1; 2026-02-21T09:19:14.2354228Z // end inline asm 2026-02-21T09:19:14.2354286Z // begin inline asm 2026-02-21T09:19:14.2355834Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r24161,%r24162,%r24163,%r24164,%r24165,%r24166,%r24167,%r24168,%r24169,%r24170,%r24171,%r24172,%r24173,%r24174,%r24175,%r24176,%r24177,%r24178,%r24179,%r24180,%r24181,%r24182,%r24183,%r24184,%r24185,%r24186,%r24187,%r24188,%r24189,%r24190,%r24191,%r24192,%r24193,%r24194,%r24195,%r24196,%r24197,%r24198,%r24199,%r24200,%r24201,%r24202,%r24203,%r24204,%r24205,%r24206,%r24207,%r24208,%r24209,%r24210,%r24211,%r24212,%r24213,%r24214,%r24215,%r24216,%r24217,%r24218,%r24219,%r24220,%r24221,%r24222,%r24223,%r24224}, {%r19883,%r19884,%r19885,%r19886}, %rd4, %p88, 1, 1; 2026-02-21T09:19:14.2355901Z // end inline asm 2026-02-21T09:19:14.2355960Z // begin inline asm 2026-02-21T09:19:14.2357546Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r24161,%r24162,%r24163,%r24164,%r24165,%r24166,%r24167,%r24168,%r24169,%r24170,%r24171,%r24172,%r24173,%r24174,%r24175,%r24176,%r24177,%r24178,%r24179,%r24180,%r24181,%r24182,%r24183,%r24184,%r24185,%r24186,%r24187,%r24188,%r24189,%r24190,%r24191,%r24192,%r24193,%r24194,%r24195,%r24196,%r24197,%r24198,%r24199,%r24200,%r24201,%r24202,%r24203,%r24204,%r24205,%r24206,%r24207,%r24208,%r24209,%r24210,%r24211,%r24212,%r24213,%r24214,%r24215,%r24216,%r24217,%r24218,%r24219,%r24220,%r24221,%r24222,%r24223,%r24224}, {%r20015,%r20016,%r20017,%r20018}, %rd5, %p88, 1, 1; 2026-02-21T09:19:14.2357611Z // end inline asm 2026-02-21T09:19:14.2357669Z // begin inline asm 2026-02-21T09:19:14.2359149Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r24225,%r24226,%r24227,%r24228,%r24229,%r24230,%r24231,%r24232,%r24233,%r24234,%r24235,%r24236,%r24237,%r24238,%r24239,%r24240,%r24241,%r24242,%r24243,%r24244,%r24245,%r24246,%r24247,%r24248,%r24249,%r24250,%r24251,%r24252,%r24253,%r24254,%r24255,%r24256,%r24257,%r24258,%r24259,%r24260,%r24261,%r24262,%r24263,%r24264,%r24265,%r24266,%r24267,%r24268,%r24269,%r24270,%r24271,%r24272,%r24273,%r24274,%r24275,%r24276,%r24277,%r24278,%r24279,%r24280,%r24281,%r24282,%r24283,%r24284,%r24285,%r24286,%r24287,%r24288}, {%r20147,%r20148,%r20149,%r20150}, %rd4, %p88, 1, 1; 2026-02-21T09:19:14.2359209Z // end inline asm 2026-02-21T09:19:14.2359266Z // begin inline asm 2026-02-21T09:19:14.2360749Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r24225,%r24226,%r24227,%r24228,%r24229,%r24230,%r24231,%r24232,%r24233,%r24234,%r24235,%r24236,%r24237,%r24238,%r24239,%r24240,%r24241,%r24242,%r24243,%r24244,%r24245,%r24246,%r24247,%r24248,%r24249,%r24250,%r24251,%r24252,%r24253,%r24254,%r24255,%r24256,%r24257,%r24258,%r24259,%r24260,%r24261,%r24262,%r24263,%r24264,%r24265,%r24266,%r24267,%r24268,%r24269,%r24270,%r24271,%r24272,%r24273,%r24274,%r24275,%r24276,%r24277,%r24278,%r24279,%r24280,%r24281,%r24282,%r24283,%r24284,%r24285,%r24286,%r24287,%r24288}, {%r20279,%r20280,%r20281,%r20282}, %rd5, %p88, 1, 1; 2026-02-21T09:19:14.2360885Z // end inline asm 2026-02-21T09:19:14.2360963Z wgmma.commit_group.sync.aligned; 2026-02-21T09:19:14.2361080Z mov.b32 %r22115, 0; 2026-02-21T09:19:14.2361140Z mov.b32 %r20539, %r18956; 2026-02-21T09:19:14.2361199Z mov.b32 %r20540, %r22115; 2026-02-21T09:19:14.2361259Z mov.b32 %r20541, %r22115; 2026-02-21T09:19:14.2361318Z // begin inline asm 2026-02-21T09:19:14.2366422Z // wait for regs: 
%r24033,%r24034,%r24035,%r24036,%r24037,%r24038,%r24039,%r24040,%r24041,%r24042,%r24043,%r24044,%r24045,%r24046,%r24047,%r24048,%r24049,%r24050,%r24051,%r24052,%r24053,%r24054,%r24055,%r24056,%r24057,%r24058,%r24059,%r24060,%r24061,%r24062,%r24063,%r24064,%r24065,%r24066,%r24067,%r24068,%r24069,%r24070,%r24071,%r24072,%r24073,%r24074,%r24075,%r24076,%r24077,%r24078,%r24079,%r24080,%r24081,%r24082,%r24083,%r24084,%r24085,%r24086,%r24087,%r24088,%r24089,%r24090,%r24091,%r24092,%r24093,%r24094,%r24095,%r24096,%r24097,%r24098,%r24099,%r24100,%r24101,%r24102,%r24103,%r24104,%r24105,%r24106,%r24107,%r24108,%r24109,%r24110,%r24111,%r24112,%r24113,%r24114,%r24115,%r24116,%r24117,%r24118,%r24119,%r24120,%r24121,%r24122,%r24123,%r24124,%r24125,%r24126,%r24127,%r24128,%r24129,%r24130,%r24131,%r24132,%r24133,%r24134,%r24135,%r24136,%r24137,%r24138,%r24139,%r24140,%r24141,%r24142,%r24143,%r24144,%r24145,%r24146,%r24147,%r24148,%r24149,%r24150,%r24151,%r24152,%r24153,%r24154,%r24155,%r24156,%r24157,%r24158,%r24159,%r24160,%r24161,%r24162,%r24163,%r24164,%r24165,%r24166,%r24167,%r24168,%r24169,%r24170,%r24171,%r24172,%r24173,%r24174,%r24175,%r24176,%r24177,%r24178,%r24179,%r24180,%r24181,%r24182,%r24183,%r24184,%r24185,%r24186,%r24187,%r24188,%r24189,%r24190,%r24191,%r24192,%r24193,%r24194,%r24195,%r24196,%r24197,%r24198,%r24199,%r24200,%r24201,%r24202,%r24203,%r24204,%r24205,%r24206,%r24207,%r24208,%r24209,%r24210,%r24211,%r24212,%r24213,%r24214,%r24215,%r24216,%r24217,%r24218,%r24219,%r24220,%r24221,%r24222,%r24223,%r24224,%r24225,%r24226,%r24227,%r24228,%r24229,%r24230,%r24231,%r24232,%r24233,%r24234,%r24235,%r24236,%r24237,%r24238,%r24239,%r24240,%r24241,%r24242,%r24243,%r24244,%r24245,%r24246,%r24247,%r24248,%r24249,%r24250,%r24251,%r24252,%r24253,%r24254,%r24255,%r24256,%r24257,%r24258,%r24259,%r24260,%r24261,%r24262,%r24263,%r24264,%r24265,%r24266,%r24267,%r24268,%r24269,%r24270,%r24271,%r24272,%r24273,%r24274,%r24275,%r24276,%r24277,%r24278,%r24279,%r24280,%r24281,%r24282,
%r24283,%r24284,%r24285,%r24286,%r24287,%r24288,%r20539,%r20540,%r20541 2026-02-21T09:19:14.2366620Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:19:14.2366679Z // end inline asm 2026-02-21T09:19:14.2366734Z $L__tmp18: 2026-02-21T09:19:14.2366952Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2367020Z add.s32 %r22431, %r22967, 81920; 2026-02-21T09:19:14.2367084Z add.s32 %r22432, %r22431, %r22412; 2026-02-21T09:19:14.2367290Z .loc 1 55 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:55:32 2026-02-21T09:19:14.2367356Z add.s32 %r22433, %r22432, %r290; 2026-02-21T09:19:14.2367421Z ld.shared.b16 %rs656, [%r22433]; 2026-02-21T09:19:14.2367491Z ld.shared.b16 %rs657, [%r22433+256]; 2026-02-21T09:19:14.2367562Z ld.shared.b16 %rs658, [%r22433+16]; 2026-02-21T09:19:14.2367628Z ld.shared.b16 %rs659, [%r22433+272]; 2026-02-21T09:19:14.2367695Z ld.shared.b16 %rs660, [%r22433+4096]; 2026-02-21T09:19:14.2367767Z ld.shared.b16 %rs661, [%r22433+4352]; 2026-02-21T09:19:14.2367835Z ld.shared.b16 %rs662, [%r22433+4112]; 2026-02-21T09:19:14.2367902Z ld.shared.b16 %rs663, [%r22433+4368]; 2026-02-21T09:19:14.2367968Z ld.shared.b16 %rs664, [%r22433+8192]; 2026-02-21T09:19:14.2368037Z ld.shared.b16 %rs665, [%r22433+8448]; 2026-02-21T09:19:14.2368102Z ld.shared.b16 %rs666, [%r22433+8208]; 2026-02-21T09:19:14.2368166Z ld.shared.b16 %rs667, [%r22433+8464]; 2026-02-21T09:19:14.2368325Z ld.shared.b16 %rs668, [%r22433+12288]; 2026-02-21T09:19:14.2368411Z ld.shared.b16 %rs669, [%r22433+12544]; 2026-02-21T09:19:14.2368479Z ld.shared.b16 %rs670, [%r22433+12304]; 2026-02-21T09:19:14.2368545Z ld.shared.b16 %rs671, [%r22433+12560]; 2026-02-21T09:19:14.2368684Z add.s32 %r22434, %r22432, %r291; 2026-02-21T09:19:14.2368749Z ld.shared.b16 %rs672, [%r22434]; 2026-02-21T09:19:14.2368815Z ld.shared.b16 %rs673, [%r22434+256]; 2026-02-21T09:19:14.2368884Z ld.shared.b16 %rs674, [%r22434+16]; 2026-02-21T09:19:14.2368951Z ld.shared.b16 %rs675, 
[%r22434+272]; 2026-02-21T09:19:14.2369017Z ld.shared.b16 %rs676, [%r22434+4096]; 2026-02-21T09:19:14.2369083Z ld.shared.b16 %rs677, [%r22434+4352]; 2026-02-21T09:19:14.2369220Z ld.shared.b16 %rs678, [%r22434+4112]; 2026-02-21T09:19:14.2369287Z ld.shared.b16 %rs679, [%r22434+4368]; 2026-02-21T09:19:14.2369351Z ld.shared.b16 %rs680, [%r22434+8192]; 2026-02-21T09:19:14.2369418Z ld.shared.b16 %rs681, [%r22434+8448]; 2026-02-21T09:19:14.2369485Z ld.shared.b16 %rs682, [%r22434+8208]; 2026-02-21T09:19:14.2369550Z ld.shared.b16 %rs683, [%r22434+8464]; 2026-02-21T09:19:14.2369619Z ld.shared.b16 %rs684, [%r22434+12288]; 2026-02-21T09:19:14.2369689Z ld.shared.b16 %rs685, [%r22434+12544]; 2026-02-21T09:19:14.2369758Z ld.shared.b16 %rs686, [%r22434+12304]; 2026-02-21T09:19:14.2369823Z ld.shared.b16 %rs687, [%r22434+12560]; 2026-02-21T09:19:14.2369890Z cvt.f32.bf16 %r20929, %rs656; 2026-02-21T09:19:14.2369953Z cvt.f32.bf16 %r20930, %rs657; 2026-02-21T09:19:14.2370083Z cvt.f32.bf16 %r20931, %rs672; 2026-02-21T09:19:14.2370150Z cvt.f32.bf16 %r20932, %rs673; 2026-02-21T09:19:14.2370211Z cvt.f32.bf16 %r21061, %rs658; 2026-02-21T09:19:14.2370269Z cvt.f32.bf16 %r21062, %rs659; 2026-02-21T09:19:14.2370330Z cvt.f32.bf16 %r21063, %rs674; 2026-02-21T09:19:14.2370393Z cvt.f32.bf16 %r21064, %rs675; 2026-02-21T09:19:14.2370452Z cvt.f32.bf16 %r21193, %rs660; 2026-02-21T09:19:14.2370513Z cvt.f32.bf16 %r21194, %rs661; 2026-02-21T09:19:14.2370576Z cvt.f32.bf16 %r21195, %rs676; 2026-02-21T09:19:14.2370636Z cvt.f32.bf16 %r21196, %rs677; 2026-02-21T09:19:14.2370695Z cvt.f32.bf16 %r21325, %rs662; 2026-02-21T09:19:14.2370753Z cvt.f32.bf16 %r21326, %rs663; 2026-02-21T09:19:14.2370816Z cvt.f32.bf16 %r21327, %rs678; 2026-02-21T09:19:14.2370874Z cvt.f32.bf16 %r21328, %rs679; 2026-02-21T09:19:14.2370934Z cvt.f32.bf16 %r21457, %rs664; 2026-02-21T09:19:14.2370996Z cvt.f32.bf16 %r21458, %rs665; 2026-02-21T09:19:14.2371056Z cvt.f32.bf16 %r21459, %rs680; 2026-02-21T09:19:14.2371116Z cvt.f32.bf16 %r21460, 
%rs681; 2026-02-21T09:19:14.2371176Z cvt.f32.bf16 %r21589, %rs666; 2026-02-21T09:19:14.2371240Z cvt.f32.bf16 %r21590, %rs667; 2026-02-21T09:19:14.2371300Z cvt.f32.bf16 %r21591, %rs682; 2026-02-21T09:19:14.2371360Z cvt.f32.bf16 %r21592, %rs683; 2026-02-21T09:19:14.2371425Z cvt.f32.bf16 %r21721, %rs668; 2026-02-21T09:19:14.2371484Z cvt.f32.bf16 %r21722, %rs669; 2026-02-21T09:19:14.2371546Z cvt.f32.bf16 %r21723, %rs684; 2026-02-21T09:19:14.2371605Z cvt.f32.bf16 %r21724, %rs685; 2026-02-21T09:19:14.2371667Z cvt.f32.bf16 %r21853, %rs670; 2026-02-21T09:19:14.2371727Z cvt.f32.bf16 %r21854, %rs671; 2026-02-21T09:19:14.2371787Z cvt.f32.bf16 %r21855, %rs686; 2026-02-21T09:19:14.2371864Z cvt.f32.bf16 %r21856, %rs687; 2026-02-21T09:19:14.2372073Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2372138Z add.s32 %r22435, %r22418, 177152; 2026-02-21T09:19:14.2372342Z .loc 1 70 45 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:70:45 2026-02-21T09:19:14.2372407Z add.s32 %r22436, %r22435, %r22973; 2026-02-21T09:19:14.2372477Z ld.shared.b8 %rs688, [%r22436]; 2026-02-21T09:19:14.2372547Z ld.shared.b8 %rs689, [%r22436+128]; 2026-02-21T09:19:14.2372616Z ld.shared.b8 %rs690, [%r22436+256]; 2026-02-21T09:19:14.2372680Z ld.shared.b8 %rs691, [%r22436+384]; 2026-02-21T09:19:14.2372805Z ld.shared.b8 %rs692, [%r22436+512]; 2026-02-21T09:19:14.2372873Z ld.shared.b8 %rs693, [%r22436+640]; 2026-02-21T09:19:14.2372937Z ld.shared.b8 %rs694, [%r22436+768]; 2026-02-21T09:19:14.2373000Z add.s32 %r22437, %r22435, %r22974; 2026-02-21T09:19:14.2373112Z ld.shared.b8 %rs695, [%r22437]; 2026-02-21T09:19:14.2373312Z .loc 1 60 28 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:60:28 2026-02-21T09:19:14.2373376Z shl.b16 %rs696, %rs688, 4; 2026-02-21T09:19:14.2373435Z shl.b16 %rs697, %rs689, 4; 2026-02-21T09:19:14.2373500Z shl.b16 %rs698, %rs690, 4; 2026-02-21T09:19:14.2373560Z shl.b16 %rs699, %rs691, 4; 2026-02-21T09:19:14.2373620Z 
shl.b16 %rs700, %rs692, 4; 2026-02-21T09:19:14.2373681Z shl.b16 %rs701, %rs693, 4; 2026-02-21T09:19:14.2373821Z shl.b16 %rs702, %rs694, 4; 2026-02-21T09:19:14.2373885Z shl.b16 %rs703, %rs695, 4; 2026-02-21T09:19:14.2374083Z .loc 1 75 58 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:75:58 2026-02-21T09:19:14.2374162Z selp.b16 %rs704, %rs696, %rs688, %p110; 2026-02-21T09:19:14.2374225Z cvt.s16.s8 %rs705, %rs704; 2026-02-21T09:19:14.2374284Z shr.s16 %rs706, %rs705, 4; 2026-02-21T09:19:14.2374357Z selp.b16 %rs707, %rs697, %rs689, %p110; 2026-02-21T09:19:14.2374418Z cvt.s16.s8 %rs708, %rs707; 2026-02-21T09:19:14.2374476Z shr.s16 %rs709, %rs708, 4; 2026-02-21T09:19:14.2374546Z selp.b16 %rs710, %rs698, %rs690, %p110; 2026-02-21T09:19:14.2374608Z cvt.s16.s8 %rs711, %rs710; 2026-02-21T09:19:14.2374667Z shr.s16 %rs712, %rs711, 4; 2026-02-21T09:19:14.2374786Z selp.b16 %rs713, %rs699, %rs691, %p110; 2026-02-21T09:19:14.2374852Z cvt.s16.s8 %rs714, %rs713; 2026-02-21T09:19:14.2374913Z shr.s16 %rs715, %rs714, 4; 2026-02-21T09:19:14.2374984Z selp.b16 %rs716, %rs700, %rs692, %p110; 2026-02-21T09:19:14.2375046Z cvt.s16.s8 %rs717, %rs716; 2026-02-21T09:19:14.2375105Z shr.s16 %rs718, %rs717, 4; 2026-02-21T09:19:14.2375172Z selp.b16 %rs719, %rs701, %rs693, %p110; 2026-02-21T09:19:14.2375234Z cvt.s16.s8 %rs720, %rs719; 2026-02-21T09:19:14.2375296Z shr.s16 %rs721, %rs720, 4; 2026-02-21T09:19:14.2375364Z selp.b16 %rs722, %rs702, %rs694, %p110; 2026-02-21T09:19:14.2375423Z cvt.s16.s8 %rs723, %rs722; 2026-02-21T09:19:14.2375486Z shr.s16 %rs724, %rs723, 4; 2026-02-21T09:19:14.2375557Z selp.b16 %rs725, %rs703, %rs695, %p110; 2026-02-21T09:19:14.2375617Z cvt.s16.s8 %rs726, %rs725; 2026-02-21T09:19:14.2375677Z shr.s16 %rs727, %rs726, 4; 2026-02-21T09:19:14.2375878Z .loc 1 80 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:80:32 2026-02-21T09:19:14.2375944Z cvt.rn.f32.s16 %r22438, %rs706; 2026-02-21T09:19:14.2376018Z cvt.rn.f32.s16 %r22439, %rs709; 
2026-02-21T09:19:14.2376086Z cvt.rn.f32.s16 %r22440, %rs712; 2026-02-21T09:19:14.2376151Z cvt.rn.f32.s16 %r22441, %rs715; 2026-02-21T09:19:14.2376212Z cvt.rn.f32.s16 %r22442, %rs718; 2026-02-21T09:19:14.2376276Z cvt.rn.f32.s16 %r22443, %rs721; 2026-02-21T09:19:14.2376339Z cvt.rn.f32.s16 %r22444, %rs724; 2026-02-21T09:19:14.2376400Z cvt.rn.f32.s16 %r22445, %rs727; 2026-02-21T09:19:14.2376577Z bar.sync 0; 2026-02-21T09:19:14.2376651Z st.shared.b32 [%r294], %r22438; 2026-02-21T09:19:14.2376717Z st.shared.b32 [%r294+8], %r22439; 2026-02-21T09:19:14.2376781Z st.shared.b32 [%r295], %r22440; 2026-02-21T09:19:14.2376848Z st.shared.b32 [%r295+8], %r22441; 2026-02-21T09:19:14.2376910Z st.shared.b32 [%r296], %r22442; 2026-02-21T09:19:14.2376971Z st.shared.b32 [%r296+8], %r22443; 2026-02-21T09:19:14.2377033Z st.shared.b32 [%r297], %r22444; 2026-02-21T09:19:14.2377101Z st.shared.b32 [%r297+8], %r22445; 2026-02-21T09:19:14.2377156Z $L__tmp19: 2026-02-21T09:19:14.2377436Z .loc 2 291 36 // standard.py:291:36 @[ co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:87:40 ] 2026-02-21T09:19:14.2377500Z // begin inline asm 2026-02-21T09:19:14.2377578Z fence.proxy.async.shared::cta; 2026-02-21T09:19:14.2377634Z // end inline asm 2026-02-21T09:19:14.2377778Z bar.sync 0; 2026-02-21T09:19:14.2377850Z wgmma.fence.sync.aligned; 2026-02-21T09:19:14.2377909Z // begin inline asm 2026-02-21T09:19:14.2379402Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r24033,%r24034,%r24035,%r24036,%r24037,%r24038,%r24039,%r24040,%r24041,%r24042,%r24043,%r24044,%r24045,%r24046,%r24047,%r24048,%r24049,%r24050,%r24051,%r24052,%r24053,%r24054,%r24055,%r24056,%r24057,%r24058,%r24059,%r24060,%r24061,%r24062,%r24063,%r24064,%r24065,%r24066,%r24067,%r24068,%r24069,%r24070,%r24071,%r24072,%r24073,%r24074,%r24075,%r24076,%r24077,%r24078,%r24079,%r24080,%r24081,%r24082,%r24083,%r24084,%r24085,%r24086,%r24087,%r24088,%r24089,%r24090,%r24091,%r24092,%r24093,%r24094,%r24095,%r24096}, 
{%r20929,%r20930,%r20931,%r20932}, %rd4, %p88, 1, 1; 2026-02-21T09:19:14.2379597Z // end inline asm 2026-02-21T09:19:14.2379659Z // begin inline asm 2026-02-21T09:19:14.2381142Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r24033,%r24034,%r24035,%r24036,%r24037,%r24038,%r24039,%r24040,%r24041,%r24042,%r24043,%r24044,%r24045,%r24046,%r24047,%r24048,%r24049,%r24050,%r24051,%r24052,%r24053,%r24054,%r24055,%r24056,%r24057,%r24058,%r24059,%r24060,%r24061,%r24062,%r24063,%r24064,%r24065,%r24066,%r24067,%r24068,%r24069,%r24070,%r24071,%r24072,%r24073,%r24074,%r24075,%r24076,%r24077,%r24078,%r24079,%r24080,%r24081,%r24082,%r24083,%r24084,%r24085,%r24086,%r24087,%r24088,%r24089,%r24090,%r24091,%r24092,%r24093,%r24094,%r24095,%r24096}, {%r21061,%r21062,%r21063,%r21064}, %rd5, %p88, 1, 1; 2026-02-21T09:19:14.2381203Z // end inline asm 2026-02-21T09:19:14.2381317Z // begin inline asm 2026-02-21T09:19:14.2382798Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r24097,%r24098,%r24099,%r24100,%r24101,%r24102,%r24103,%r24104,%r24105,%r24106,%r24107,%r24108,%r24109,%r24110,%r24111,%r24112,%r24113,%r24114,%r24115,%r24116,%r24117,%r24118,%r24119,%r24120,%r24121,%r24122,%r24123,%r24124,%r24125,%r24126,%r24127,%r24128,%r24129,%r24130,%r24131,%r24132,%r24133,%r24134,%r24135,%r24136,%r24137,%r24138,%r24139,%r24140,%r24141,%r24142,%r24143,%r24144,%r24145,%r24146,%r24147,%r24148,%r24149,%r24150,%r24151,%r24152,%r24153,%r24154,%r24155,%r24156,%r24157,%r24158,%r24159,%r24160}, {%r21193,%r21194,%r21195,%r21196}, %rd4, %p88, 1, 1; 2026-02-21T09:19:14.2382858Z // end inline asm 2026-02-21T09:19:14.2382918Z // begin inline asm 2026-02-21T09:19:14.2384391Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r24097,%r24098,%r24099,%r24100,%r24101,%r24102,%r24103,%r24104,%r24105,%r24106,%r24107,%r24108,%r24109,%r24110,%r24111,%r24112,%r24113,%r24114,%r24115,%r24116,%r24117,%r24118,%r24119,%r24120,%r24121,%r24122,%r24123,%r24124,%r24125,%r24126,%r24127,%r24128,%r24129,%r24130,%r24131,%r24132,%r24133,%r24134,%r24135,%r24136,%r24137,%r24138,%r24139,%r24140,%r24141,%r24142,%r24143,%r24144,%r24145,%r24146,%r24147,%r24148,%r24149,%r24150,%r24151,%r24152,%r24153,%r24154,%r24155,%r24156,%r24157,%r24158,%r24159,%r24160}, {%r21325,%r21326,%r21327,%r21328}, %rd5, %p88, 1, 1; 2026-02-21T09:19:14.2384447Z // end inline asm 2026-02-21T09:19:14.2384511Z // begin inline asm 2026-02-21T09:19:14.2385995Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r24161,%r24162,%r24163,%r24164,%r24165,%r24166,%r24167,%r24168,%r24169,%r24170,%r24171,%r24172,%r24173,%r24174,%r24175,%r24176,%r24177,%r24178,%r24179,%r24180,%r24181,%r24182,%r24183,%r24184,%r24185,%r24186,%r24187,%r24188,%r24189,%r24190,%r24191,%r24192,%r24193,%r24194,%r24195,%r24196,%r24197,%r24198,%r24199,%r24200,%r24201,%r24202,%r24203,%r24204,%r24205,%r24206,%r24207,%r24208,%r24209,%r24210,%r24211,%r24212,%r24213,%r24214,%r24215,%r24216,%r24217,%r24218,%r24219,%r24220,%r24221,%r24222,%r24223,%r24224}, {%r21457,%r21458,%r21459,%r21460}, %rd4, %p88, 1, 1; 2026-02-21T09:19:14.2386055Z // end inline asm 2026-02-21T09:19:14.2386113Z // begin inline asm 2026-02-21T09:19:14.2387739Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r24161,%r24162,%r24163,%r24164,%r24165,%r24166,%r24167,%r24168,%r24169,%r24170,%r24171,%r24172,%r24173,%r24174,%r24175,%r24176,%r24177,%r24178,%r24179,%r24180,%r24181,%r24182,%r24183,%r24184,%r24185,%r24186,%r24187,%r24188,%r24189,%r24190,%r24191,%r24192,%r24193,%r24194,%r24195,%r24196,%r24197,%r24198,%r24199,%r24200,%r24201,%r24202,%r24203,%r24204,%r24205,%r24206,%r24207,%r24208,%r24209,%r24210,%r24211,%r24212,%r24213,%r24214,%r24215,%r24216,%r24217,%r24218,%r24219,%r24220,%r24221,%r24222,%r24223,%r24224}, {%r21589,%r21590,%r21591,%r21592}, %rd5, %p88, 1, 1; 2026-02-21T09:19:14.2387953Z // end inline asm 2026-02-21T09:19:14.2388013Z // begin inline asm 2026-02-21T09:19:14.2389700Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r24225,%r24226,%r24227,%r24228,%r24229,%r24230,%r24231,%r24232,%r24233,%r24234,%r24235,%r24236,%r24237,%r24238,%r24239,%r24240,%r24241,%r24242,%r24243,%r24244,%r24245,%r24246,%r24247,%r24248,%r24249,%r24250,%r24251,%r24252,%r24253,%r24254,%r24255,%r24256,%r24257,%r24258,%r24259,%r24260,%r24261,%r24262,%r24263,%r24264,%r24265,%r24266,%r24267,%r24268,%r24269,%r24270,%r24271,%r24272,%r24273,%r24274,%r24275,%r24276,%r24277,%r24278,%r24279,%r24280,%r24281,%r24282,%r24283,%r24284,%r24285,%r24286,%r24287,%r24288}, {%r21721,%r21722,%r21723,%r21724}, %rd4, %p88, 1, 1; 2026-02-21T09:19:14.2389768Z // end inline asm 2026-02-21T09:19:14.2389826Z // begin inline asm 2026-02-21T09:19:14.2391374Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r24225,%r24226,%r24227,%r24228,%r24229,%r24230,%r24231,%r24232,%r24233,%r24234,%r24235,%r24236,%r24237,%r24238,%r24239,%r24240,%r24241,%r24242,%r24243,%r24244,%r24245,%r24246,%r24247,%r24248,%r24249,%r24250,%r24251,%r24252,%r24253,%r24254,%r24255,%r24256,%r24257,%r24258,%r24259,%r24260,%r24261,%r24262,%r24263,%r24264,%r24265,%r24266,%r24267,%r24268,%r24269,%r24270,%r24271,%r24272,%r24273,%r24274,%r24275,%r24276,%r24277,%r24278,%r24279,%r24280,%r24281,%r24282,%r24283,%r24284,%r24285,%r24286,%r24287,%r24288}, {%r21853,%r21854,%r21855,%r21856}, %rd5, %p88, 1, 1; 2026-02-21T09:19:14.2391434Z // end inline asm 2026-02-21T09:19:14.2391509Z wgmma.commit_group.sync.aligned; 2026-02-21T09:19:14.2391574Z mov.b32 %r22114, %r22115; 2026-02-21T09:19:14.2391635Z mov.b32 %r22113, %r18956; 2026-02-21T09:19:14.2391692Z // begin inline asm 2026-02-21T09:19:14.2396774Z // wait for regs: %r24033,%r24034,%r24035,%r24036,%r24037,%r24038,%r24039,%r24040,%r24041,%r24042,%r24043,%r24044,%r24045,%r24046,%r24047,%r24048,%r24049,%r24050,%r24051,%r24052,%r24053,%r24054,%r24055,%r24056,%r24057,%r24058,%r24059,%r24060,%r24061,%r24062,%r24063,%r24064,%r24065,%r24066,%r24067,%r24068,%r24069,%r24070,%r24071,%r24072,%r24073,%r24074,%r24075,%r24076,%r24077,%r24078,%r24079,%r24080,%r24081,%r24082,%r24083,%r24084,%r24085,%r24086,%r24087,%r24088,%r24089,%r24090,%r24091,%r24092,%r24093,%r24094,%r24095,%r24096,%r24097,%r24098,%r24099,%r24100,%r24101,%r24102,%r24103,%r24104,%r24105,%r24106,%r24107,%r24108,%r24109,%r24110,%r24111,%r24112,%r24113,%r24114,%r24115,%r24116,%r24117,%r24118,%r24119,%r24120,%r24121,%r24122,%r24123,%r24124,%r24125,%r24126,%r24127,%r24128,%r24129,%r24130,%r24131,%r24132,%r24133,%r24134,%r24135,%r24136,%r24137,%r24138,%r24139,%r24140,%r24141,%r24142,%r24143,%r24144,%r24145,%r24146,%r24147,%r24148,%r24149,%r24150,%r24151,%r24152,%r24153,%r24154,%r24155,%r24156,%r24157,%r24158,%r24159,%r24160,%r24161,%r24162,%r24163,%r24164,%r24165,%r24166,%r24167,%r24168,%r24169,%r24170,%r24171,%r241
72,%r24173,%r24174,%r24175,%r24176,%r24177,%r24178,%r24179,%r24180,%r24181,%r24182,%r24183,%r24184,%r24185,%r24186,%r24187,%r24188,%r24189,%r24190,%r24191,%r24192,%r24193,%r24194,%r24195,%r24196,%r24197,%r24198,%r24199,%r24200,%r24201,%r24202,%r24203,%r24204,%r24205,%r24206,%r24207,%r24208,%r24209,%r24210,%r24211,%r24212,%r24213,%r24214,%r24215,%r24216,%r24217,%r24218,%r24219,%r24220,%r24221,%r24222,%r24223,%r24224,%r24225,%r24226,%r24227,%r24228,%r24229,%r24230,%r24231,%r24232,%r24233,%r24234,%r24235,%r24236,%r24237,%r24238,%r24239,%r24240,%r24241,%r24242,%r24243,%r24244,%r24245,%r24246,%r24247,%r24248,%r24249,%r24250,%r24251,%r24252,%r24253,%r24254,%r24255,%r24256,%r24257,%r24258,%r24259,%r24260,%r24261,%r24262,%r24263,%r24264,%r24265,%r24266,%r24267,%r24268,%r24269,%r24270,%r24271,%r24272,%r24273,%r24274,%r24275,%r24276,%r24277,%r24278,%r24279,%r24280,%r24281,%r24282,%r24283,%r24284,%r24285,%r24286,%r24287,%r24288,%r22113,%r22114,%r22115 2026-02-21T09:19:14.2396944Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:19:14.2397067Z // end inline asm 2026-02-21T09:19:14.2397125Z $L__tmp20: 2026-02-21T09:19:14.2397337Z .loc 1 43 78 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:43:78 2026-02-21T09:19:14.2397401Z add.s32 %r22446, %r24032, 1; 2026-02-21T09:19:14.2397472Z setp.gt.s32 %p107, %r22446, 4; 2026-02-21T09:19:14.2397543Z selp.b32 %r24032, 0, %r22446, %p107; 2026-02-21T09:19:14.2397804Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.2397871Z add.s64 %rd1071, %rd1122, %rd59; 2026-02-21T09:19:14.2397934Z add.s64 %rd1053, %rd1071, 320; 2026-02-21T09:19:14.2397999Z add.s64 %rd1072, %rd1122, %rd58; 2026-02-21T09:19:14.2398062Z add.s64 %rd1054, %rd1072, 320; 2026-02-21T09:19:14.2398123Z add.s64 %rd1073, %rd1122, %rd57; 2026-02-21T09:19:14.2398187Z add.s64 %rd1055, %rd1073, 320; 2026-02-21T09:19:14.2398248Z add.s64 %rd1074, %rd1122, %rd56; 2026-02-21T09:19:14.2398310Z add.s64 %rd1056, %rd1074, 320; 
2026-02-21T09:19:14.2398373Z add.s64 %rd1075, %rd1122, %rd55; 2026-02-21T09:19:14.2398433Z add.s64 %rd1057, %rd1075, 320; 2026-02-21T09:19:14.2398495Z add.s64 %rd1076, %rd1122, %rd54; 2026-02-21T09:19:14.2398557Z add.s64 %rd1058, %rd1076, 320; 2026-02-21T09:19:14.2398683Z add.s64 %rd1077, %rd1122, %rd53; 2026-02-21T09:19:14.2398748Z add.s64 %rd1059, %rd1077, 320; 2026-02-21T09:19:14.2398948Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2399014Z add.s64 %rd1078, %rd1122, %rd52; 2026-02-21T09:19:14.2399076Z add.s64 %rd1060, %rd1078, 320; 2026-02-21T09:19:14.2399137Z shl.b32 %r22447, %r24032, 14; 2026-02-21T09:19:14.2399221Z add.s32 %r22448, %r22967, %r22447; 2026-02-21T09:19:14.2399287Z add.s32 %r22375, %r22448, %r189; 2026-02-21T09:19:14.2399352Z selp.b32 %r22376, 8, 0, %p105; 2026-02-21T09:19:14.2399412Z // begin inline asm 2026-02-21T09:19:14.2399568Z cp.async.ca.shared.global [ %r22375 + 0 ], [ %rd1053 + 0 ], 0x8, %r22376; 2026-02-21T09:19:14.2399626Z // end inline asm 2026-02-21T09:19:14.2399687Z add.s32 %r22377, %r22375, 2048; 2026-02-21T09:19:14.2399749Z // begin inline asm 2026-02-21T09:19:14.2399893Z cp.async.ca.shared.global [ %r22377 + 0 ], [ %rd1054 + 0 ], 0x8, %r22376; 2026-02-21T09:19:14.2399951Z // end inline asm 2026-02-21T09:19:14.2400011Z add.s32 %r22379, %r22375, 4096; 2026-02-21T09:19:14.2400071Z // begin inline asm 2026-02-21T09:19:14.2400212Z cp.async.ca.shared.global [ %r22379 + 0 ], [ %rd1055 + 0 ], 0x8, %r22376; 2026-02-21T09:19:14.2400269Z // end inline asm 2026-02-21T09:19:14.2400330Z add.s32 %r22381, %r22375, 6144; 2026-02-21T09:19:14.2400388Z // begin inline asm 2026-02-21T09:19:14.2400527Z cp.async.ca.shared.global [ %r22381 + 0 ], [ %rd1056 + 0 ], 0x8, %r22376; 2026-02-21T09:19:14.2400583Z // end inline asm 2026-02-21T09:19:14.2400645Z add.s32 %r22383, %r22375, 8192; 2026-02-21T09:19:14.2400703Z // begin inline asm 2026-02-21T09:19:14.2400842Z cp.async.ca.shared.global [ %r22383 + 0 
], [ %rd1057 + 0 ], 0x8, %r22376; 2026-02-21T09:19:14.2400909Z // end inline asm 2026-02-21T09:19:14.2400971Z add.s32 %r22385, %r22375, 10240; 2026-02-21T09:19:14.2401028Z // begin inline asm 2026-02-21T09:19:14.2401171Z cp.async.ca.shared.global [ %r22385 + 0 ], [ %rd1058 + 0 ], 0x8, %r22376; 2026-02-21T09:19:14.2401227Z // end inline asm 2026-02-21T09:19:14.2401288Z add.s32 %r22387, %r22375, 12288; 2026-02-21T09:19:14.2401347Z // begin inline asm 2026-02-21T09:19:14.2401488Z cp.async.ca.shared.global [ %r22387 + 0 ], [ %rd1059 + 0 ], 0x8, %r22376; 2026-02-21T09:19:14.2401544Z // end inline asm 2026-02-21T09:19:14.2401605Z add.s32 %r22389, %r22375, 14336; 2026-02-21T09:19:14.2401733Z // begin inline asm 2026-02-21T09:19:14.2401869Z cp.async.ca.shared.global [ %r22389 + 0 ], [ %rd1060 + 0 ], 0x8, %r22376; 2026-02-21T09:19:14.2401924Z // end inline asm 2026-02-21T09:19:14.2401990Z cp.async.commit_group; 2026-02-21T09:19:14.2402271Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2402336Z add.s32 %r22449, %r24030, -65536; 2026-02-21T09:19:14.2402400Z cvt.s64.s32 %rd1079, %r22449; 2026-02-21T09:19:14.2402467Z add.s64 %rd1061, %rd65, %rd1079; 2026-02-21T09:19:14.2402668Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2402729Z shl.b32 %r22450, %r24032, 10; 2026-02-21T09:19:14.2402791Z add.s32 %r22391, %r199, %r22450; 2026-02-21T09:19:14.2402903Z selp.b32 %r22392, 4, 0, %p105; 2026-02-21T09:19:14.2402964Z // begin inline asm 2026-02-21T09:19:14.2403103Z cp.async.ca.shared.global [ %r22391 + 0 ], [ %rd1061 + 0 ], 0x4, %r22392; 2026-02-21T09:19:14.2403164Z // end inline asm 2026-02-21T09:19:14.2403229Z cp.async.commit_group; 2026-02-21T09:19:14.2403425Z .loc 1 51 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:32 2026-02-21T09:19:14.2403492Z add.s64 %rd1062, %rd1071, 352; 2026-02-21T09:19:14.2403555Z add.s64 %rd1063, %rd1072, 352; 
2026-02-21T09:19:14.2403616Z add.s64 %rd1064, %rd1073, 352; 2026-02-21T09:19:14.2403676Z add.s64 %rd1065, %rd1074, 352; 2026-02-21T09:19:14.2403739Z add.s64 %rd1066, %rd1075, 352; 2026-02-21T09:19:14.2403849Z add.s64 %rd1067, %rd1076, 352; 2026-02-21T09:19:14.2403911Z add.s64 %rd1068, %rd1077, 352; 2026-02-21T09:19:14.2404110Z .loc 1 51 80 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:51:80 2026-02-21T09:19:14.2404171Z add.s64 %rd1069, %rd1078, 352; 2026-02-21T09:19:14.2404234Z add.s32 %r22451, %r22431, %r22447; 2026-02-21T09:19:14.2404296Z add.s32 %r22393, %r22451, %r189; 2026-02-21T09:19:14.2404356Z // begin inline asm 2026-02-21T09:19:14.2404497Z cp.async.ca.shared.global [ %r22393 + 0 ], [ %rd1062 + 0 ], 0x8, %r22376; 2026-02-21T09:19:14.2404553Z // end inline asm 2026-02-21T09:19:14.2404615Z add.s32 %r22395, %r22393, 2048; 2026-02-21T09:19:14.2404674Z // begin inline asm 2026-02-21T09:19:14.2404815Z cp.async.ca.shared.global [ %r22395 + 0 ], [ %rd1063 + 0 ], 0x8, %r22376; 2026-02-21T09:19:14.2404876Z // end inline asm 2026-02-21T09:19:14.2404935Z add.s32 %r22397, %r22393, 4096; 2026-02-21T09:19:14.2404993Z // begin inline asm 2026-02-21T09:19:14.2405134Z cp.async.ca.shared.global [ %r22397 + 0 ], [ %rd1064 + 0 ], 0x8, %r22376; 2026-02-21T09:19:14.2405190Z // end inline asm 2026-02-21T09:19:14.2405261Z add.s32 %r22399, %r22393, 6144; 2026-02-21T09:19:14.2405323Z // begin inline asm 2026-02-21T09:19:14.2405463Z cp.async.ca.shared.global [ %r22399 + 0 ], [ %rd1065 + 0 ], 0x8, %r22376; 2026-02-21T09:19:14.2405520Z // end inline asm 2026-02-21T09:19:14.2405580Z add.s32 %r22401, %r22393, 8192; 2026-02-21T09:19:14.2405644Z // begin inline asm 2026-02-21T09:19:14.2405780Z cp.async.ca.shared.global [ %r22401 + 0 ], [ %rd1066 + 0 ], 0x8, %r22376; 2026-02-21T09:19:14.2405837Z // end inline asm 2026-02-21T09:19:14.2405900Z add.s32 %r22403, %r22393, 10240; 2026-02-21T09:19:14.2405963Z // begin inline asm 2026-02-21T09:19:14.2406099Z cp.async.ca.shared.global [ 
%r22403 + 0 ], [ %rd1067 + 0 ], 0x8, %r22376; 2026-02-21T09:19:14.2406154Z // end inline asm 2026-02-21T09:19:14.2406217Z add.s32 %r22405, %r22393, 12288; 2026-02-21T09:19:14.2406274Z // begin inline asm 2026-02-21T09:19:14.2406410Z cp.async.ca.shared.global [ %r22405 + 0 ], [ %rd1068 + 0 ], 0x8, %r22376; 2026-02-21T09:19:14.2406577Z // end inline asm 2026-02-21T09:19:14.2406648Z add.s32 %r22407, %r22393, 14336; 2026-02-21T09:19:14.2406708Z // begin inline asm 2026-02-21T09:19:14.2406846Z cp.async.ca.shared.global [ %r22407 + 0 ], [ %rd1069 + 0 ], 0x8, %r22376; 2026-02-21T09:19:14.2406905Z // end inline asm 2026-02-21T09:19:14.2439700Z cp.async.commit_group; 2026-02-21T09:19:14.2439931Z .loc 1 57 34 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:34 2026-02-21T09:19:14.2440000Z cvt.s64.s32 %rd1080, %r24030; 2026-02-21T09:19:14.2440067Z add.s64 %rd1070, %rd65, %rd1080; 2026-02-21T09:19:14.2440366Z .loc 1 57 87 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:57:87 2026-02-21T09:19:14.2440430Z add.s32 %r22409, %r209, %r22450; 2026-02-21T09:19:14.2440493Z // begin inline asm 2026-02-21T09:19:14.2440647Z cp.async.ca.shared.global [ %r22409 + 0 ], [ %rd1070 + 0 ], 0x4, %r22392; 2026-02-21T09:19:14.2440704Z // end inline asm 2026-02-21T09:19:14.2440772Z cp.async.commit_group; 2026-02-21T09:19:14.2441045Z .loc 1 43 78 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:43:78 2026-02-21T09:19:14.2441111Z add.s32 %r24030, %r24030, 131072; 2026-02-21T09:19:14.2441177Z add.s64 %rd1122, %rd1122, 64; 2026-02-21T09:19:14.2441246Z setp.lt.u64 %p108, %rd1123, 496; 2026-02-21T09:19:14.2441307Z @%p108 bra $L__BB0_14; 2026-02-21T09:19:14.2441417Z // %bb.15: // in Loop: Header=BB0_13 Depth=1 2026-02-21T09:19:14.2441620Z .loc 1 34 32 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:34:32 2026-02-21T09:19:14.2441686Z or.b32 %r22740, %r2428, %r9; 2026-02-21T09:19:14.2441881Z .loc 1 36 32 // 
co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:36:32 2026-02-21T09:19:14.2441946Z or.b32 %r22741, %r2429, %r19; 2026-02-21T09:19:14.2442074Z or.b32 %r22742, %r2429, %r20; 2026-02-21T09:19:14.2442137Z or.b32 %r22743, %r2429, %r21; 2026-02-21T09:19:14.2442197Z or.b32 %r22744, %r2429, %r22; 2026-02-21T09:19:14.2442258Z or.b32 %r22745, %r2429, %r23; 2026-02-21T09:19:14.2442317Z or.b32 %r22746, %r2429, %r24; 2026-02-21T09:19:14.2442376Z or.b32 %r22747, %r2429, %r25; 2026-02-21T09:19:14.2442437Z or.b32 %r22748, %r2429, %r26; 2026-02-21T09:19:14.2442498Z or.b32 %r22749, %r2429, %r27; 2026-02-21T09:19:14.2442555Z or.b32 %r22750, %r2429, %r28; 2026-02-21T09:19:14.2442621Z or.b32 %r22751, %r2429, %r29; 2026-02-21T09:19:14.2442689Z or.b32 %r22752, %r2429, %r30; 2026-02-21T09:19:14.2442751Z or.b32 %r22753, %r2429, %r31; 2026-02-21T09:19:14.2442811Z or.b32 %r22754, %r2429, %r32; 2026-02-21T09:19:14.2442871Z or.b32 %r22755, %r2429, %r33; 2026-02-21T09:19:14.2442930Z or.b32 %r22756, %r2429, %r34; 2026-02-21T09:19:14.2442987Z or.b32 %r22757, %r2429, %r35; 2026-02-21T09:19:14.2443049Z or.b32 %r22758, %r2429, %r36; 2026-02-21T09:19:14.2443107Z or.b32 %r22759, %r2429, %r37; 2026-02-21T09:19:14.2443165Z or.b32 %r22760, %r2429, %r38; 2026-02-21T09:19:14.2443226Z or.b32 %r22761, %r2429, %r39; 2026-02-21T09:19:14.2443285Z or.b32 %r22762, %r2429, %r40; 2026-02-21T09:19:14.2443343Z or.b32 %r22763, %r2429, %r41; 2026-02-21T09:19:14.2443402Z or.b32 %r22764, %r2429, %r42; 2026-02-21T09:19:14.2443464Z or.b32 %r22765, %r2429, %r43; 2026-02-21T09:19:14.2443524Z or.b32 %r22766, %r2429, %r44; 2026-02-21T09:19:14.2443581Z or.b32 %r22767, %r2429, %r45; 2026-02-21T09:19:14.2443641Z or.b32 %r22768, %r2429, %r46; 2026-02-21T09:19:14.2443700Z or.b32 %r22769, %r2429, %r47; 2026-02-21T09:19:14.2443761Z or.b32 %r22770, %r2429, %r48; 2026-02-21T09:19:14.2443820Z or.b32 %r22771, %r2429, %r49; 2026-02-21T09:19:14.2443880Z or.b32 %r22772, %r2429, %r50; 2026-02-21T09:19:14.2444078Z .loc 1 43 78 // 
co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:43:78 2026-02-21T09:19:14.2444148Z cp.async.wait_group 0; 2026-02-21T09:19:14.2444208Z bar.sync 0; 2026-02-21T09:19:14.2444403Z .loc 1 90 28 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:90:28 2026-02-21T09:19:14.2444491Z cvt.rn.bf16x2.f32 %r22773, %r24034, %r24033; 2026-02-21T09:19:14.2444573Z cvt.rn.bf16x2.f32 %r22774, %r24036, %r24035; 2026-02-21T09:19:14.2444650Z cvt.rn.bf16x2.f32 %r22775, %r24038, %r24037; 2026-02-21T09:19:14.2444819Z cvt.rn.bf16x2.f32 %r22776, %r24040, %r24039; 2026-02-21T09:19:14.2444894Z cvt.rn.bf16x2.f32 %r22777, %r24042, %r24041; 2026-02-21T09:19:14.2444979Z cvt.rn.bf16x2.f32 %r22778, %r24044, %r24043; 2026-02-21T09:19:14.2445060Z cvt.rn.bf16x2.f32 %r22779, %r24046, %r24045; 2026-02-21T09:19:14.2445187Z cvt.rn.bf16x2.f32 %r22780, %r24048, %r24047; 2026-02-21T09:19:14.2445265Z cvt.rn.bf16x2.f32 %r22781, %r24050, %r24049; 2026-02-21T09:19:14.2445341Z cvt.rn.bf16x2.f32 %r22782, %r24052, %r24051; 2026-02-21T09:19:14.2445418Z cvt.rn.bf16x2.f32 %r22783, %r24054, %r24053; 2026-02-21T09:19:14.2445495Z cvt.rn.bf16x2.f32 %r22784, %r24056, %r24055; 2026-02-21T09:19:14.2445571Z cvt.rn.bf16x2.f32 %r22785, %r24058, %r24057; 2026-02-21T09:19:14.2445695Z cvt.rn.bf16x2.f32 %r22786, %r24060, %r24059; 2026-02-21T09:19:14.2445771Z cvt.rn.bf16x2.f32 %r22787, %r24062, %r24061; 2026-02-21T09:19:14.2445849Z cvt.rn.bf16x2.f32 %r22788, %r24064, %r24063; 2026-02-21T09:19:14.2445926Z cvt.rn.bf16x2.f32 %r22789, %r24066, %r24065; 2026-02-21T09:19:14.2446000Z cvt.rn.bf16x2.f32 %r22790, %r24068, %r24067; 2026-02-21T09:19:14.2446077Z cvt.rn.bf16x2.f32 %r22791, %r24070, %r24069; 2026-02-21T09:19:14.2446152Z cvt.rn.bf16x2.f32 %r22792, %r24072, %r24071; 2026-02-21T09:19:14.2446229Z cvt.rn.bf16x2.f32 %r22793, %r24074, %r24073; 2026-02-21T09:19:14.2446304Z cvt.rn.bf16x2.f32 %r22794, %r24076, %r24075; 2026-02-21T09:19:14.2446380Z cvt.rn.bf16x2.f32 %r22795, %r24078, %r24077; 2026-02-21T09:19:14.2446587Z 
cvt.rn.bf16x2.f32 %r22796, %r24080, %r24079; 2026-02-21T09:19:14.2446746Z cvt.rn.bf16x2.f32 %r22797, %r24082, %r24081; 2026-02-21T09:19:14.2446830Z cvt.rn.bf16x2.f32 %r22798, %r24084, %r24083; 2026-02-21T09:19:14.2446906Z cvt.rn.bf16x2.f32 %r22799, %r24086, %r24085; 2026-02-21T09:19:14.2446985Z cvt.rn.bf16x2.f32 %r22800, %r24088, %r24087; 2026-02-21T09:19:14.2447061Z cvt.rn.bf16x2.f32 %r22801, %r24090, %r24089; 2026-02-21T09:19:14.2447135Z cvt.rn.bf16x2.f32 %r22802, %r24092, %r24091; 2026-02-21T09:19:14.2447211Z cvt.rn.bf16x2.f32 %r22803, %r24094, %r24093; 2026-02-21T09:19:14.2447287Z cvt.rn.bf16x2.f32 %r22804, %r24096, %r24095; 2026-02-21T09:19:14.2447367Z cvt.rn.bf16x2.f32 %r22805, %r24098, %r24097; 2026-02-21T09:19:14.2447442Z cvt.rn.bf16x2.f32 %r22806, %r24100, %r24099; 2026-02-21T09:19:14.2447522Z cvt.rn.bf16x2.f32 %r22807, %r24102, %r24101; 2026-02-21T09:19:14.2447599Z cvt.rn.bf16x2.f32 %r22808, %r24104, %r24103; 2026-02-21T09:19:14.2447676Z cvt.rn.bf16x2.f32 %r22809, %r24106, %r24105; 2026-02-21T09:19:14.2447752Z cvt.rn.bf16x2.f32 %r22810, %r24108, %r24107; 2026-02-21T09:19:14.2447829Z cvt.rn.bf16x2.f32 %r22811, %r24110, %r24109; 2026-02-21T09:19:14.2447903Z cvt.rn.bf16x2.f32 %r22812, %r24112, %r24111; 2026-02-21T09:19:14.2447980Z cvt.rn.bf16x2.f32 %r22813, %r24114, %r24113; 2026-02-21T09:19:14.2448056Z cvt.rn.bf16x2.f32 %r22814, %r24116, %r24115; 2026-02-21T09:19:14.2448132Z cvt.rn.bf16x2.f32 %r22815, %r24118, %r24117; 2026-02-21T09:19:14.2448209Z cvt.rn.bf16x2.f32 %r22816, %r24120, %r24119; 2026-02-21T09:19:14.2448283Z cvt.rn.bf16x2.f32 %r22817, %r24122, %r24121; 2026-02-21T09:19:14.2448362Z cvt.rn.bf16x2.f32 %r22818, %r24124, %r24123; 2026-02-21T09:19:14.2448436Z cvt.rn.bf16x2.f32 %r22819, %r24126, %r24125; 2026-02-21T09:19:14.2448512Z cvt.rn.bf16x2.f32 %r22820, %r24128, %r24127; 2026-02-21T09:19:14.2448589Z cvt.rn.bf16x2.f32 %r22821, %r24130, %r24129; 2026-02-21T09:19:14.2448667Z cvt.rn.bf16x2.f32 %r22822, %r24132, %r24131; 2026-02-21T09:19:14.2448741Z 
cvt.rn.bf16x2.f32 %r22823, %r24134, %r24133; 2026-02-21T09:19:14.2448815Z cvt.rn.bf16x2.f32 %r22824, %r24136, %r24135; 2026-02-21T09:19:14.2448892Z cvt.rn.bf16x2.f32 %r22825, %r24138, %r24137; 2026-02-21T09:19:14.2448965Z cvt.rn.bf16x2.f32 %r22826, %r24140, %r24139; 2026-02-21T09:19:14.2449041Z cvt.rn.bf16x2.f32 %r22827, %r24142, %r24141; 2026-02-21T09:19:14.2449126Z cvt.rn.bf16x2.f32 %r22828, %r24144, %r24143; 2026-02-21T09:19:14.2449215Z cvt.rn.bf16x2.f32 %r22829, %r24146, %r24145; 2026-02-21T09:19:14.2449370Z cvt.rn.bf16x2.f32 %r22830, %r24148, %r24147; 2026-02-21T09:19:14.2449448Z cvt.rn.bf16x2.f32 %r22831, %r24150, %r24149; 2026-02-21T09:19:14.2449528Z cvt.rn.bf16x2.f32 %r22832, %r24152, %r24151; 2026-02-21T09:19:14.2449606Z cvt.rn.bf16x2.f32 %r22833, %r24154, %r24153; 2026-02-21T09:19:14.2449749Z cvt.rn.bf16x2.f32 %r22834, %r24156, %r24155; 2026-02-21T09:19:14.2449828Z cvt.rn.bf16x2.f32 %r22835, %r24158, %r24157; 2026-02-21T09:19:14.2449902Z cvt.rn.bf16x2.f32 %r22836, %r24160, %r24159; 2026-02-21T09:19:14.2449979Z cvt.rn.bf16x2.f32 %r22837, %r24162, %r24161; 2026-02-21T09:19:14.2450057Z cvt.rn.bf16x2.f32 %r22838, %r24164, %r24163; 2026-02-21T09:19:14.2450130Z cvt.rn.bf16x2.f32 %r22839, %r24166, %r24165; 2026-02-21T09:19:14.2450268Z cvt.rn.bf16x2.f32 %r22840, %r24168, %r24167; 2026-02-21T09:19:14.2450348Z cvt.rn.bf16x2.f32 %r22841, %r24170, %r24169; 2026-02-21T09:19:14.2450423Z cvt.rn.bf16x2.f32 %r22842, %r24172, %r24171; 2026-02-21T09:19:14.2450500Z cvt.rn.bf16x2.f32 %r22843, %r24174, %r24173; 2026-02-21T09:19:14.2450574Z cvt.rn.bf16x2.f32 %r22844, %r24176, %r24175; 2026-02-21T09:19:14.2450653Z cvt.rn.bf16x2.f32 %r22845, %r24178, %r24177; 2026-02-21T09:19:14.2450727Z cvt.rn.bf16x2.f32 %r22846, %r24180, %r24179; 2026-02-21T09:19:14.2450806Z cvt.rn.bf16x2.f32 %r22847, %r24182, %r24181; 2026-02-21T09:19:14.2450881Z cvt.rn.bf16x2.f32 %r22848, %r24184, %r24183; 2026-02-21T09:19:14.2450955Z cvt.rn.bf16x2.f32 %r22849, %r24186, %r24185; 2026-02-21T09:19:14.2451028Z 
cvt.rn.bf16x2.f32 %r22850, %r24188, %r24187; 2026-02-21T09:19:14.2451156Z cvt.rn.bf16x2.f32 %r22851, %r24190, %r24189; 2026-02-21T09:19:14.2451236Z cvt.rn.bf16x2.f32 %r22852, %r24192, %r24191; 2026-02-21T09:19:14.2451312Z cvt.rn.bf16x2.f32 %r22853, %r24194, %r24193; 2026-02-21T09:19:14.2451391Z cvt.rn.bf16x2.f32 %r22854, %r24196, %r24195; 2026-02-21T09:19:14.2451468Z cvt.rn.bf16x2.f32 %r22855, %r24198, %r24197; 2026-02-21T09:19:14.2451544Z cvt.rn.bf16x2.f32 %r22856, %r24200, %r24199; 2026-02-21T09:19:14.2451620Z cvt.rn.bf16x2.f32 %r22857, %r24202, %r24201; 2026-02-21T09:19:14.2451697Z cvt.rn.bf16x2.f32 %r22858, %r24204, %r24203; 2026-02-21T09:19:14.2451772Z cvt.rn.bf16x2.f32 %r22859, %r24206, %r24205; 2026-02-21T09:19:14.2451861Z cvt.rn.bf16x2.f32 %r22860, %r24208, %r24207; 2026-02-21T09:19:14.2451936Z cvt.rn.bf16x2.f32 %r22861, %r24210, %r24209; 2026-02-21T09:19:14.2452014Z cvt.rn.bf16x2.f32 %r22862, %r24212, %r24211; 2026-02-21T09:19:14.2452086Z cvt.rn.bf16x2.f32 %r22863, %r24214, %r24213; 2026-02-21T09:19:14.2452161Z cvt.rn.bf16x2.f32 %r22864, %r24216, %r24215; 2026-02-21T09:19:14.2452238Z cvt.rn.bf16x2.f32 %r22865, %r24218, %r24217; 2026-02-21T09:19:14.2452315Z cvt.rn.bf16x2.f32 %r22866, %r24220, %r24219; 2026-02-21T09:19:14.2452392Z cvt.rn.bf16x2.f32 %r22867, %r24222, %r24221; 2026-02-21T09:19:14.2452468Z cvt.rn.bf16x2.f32 %r22868, %r24224, %r24223; 2026-02-21T09:19:14.2452543Z cvt.rn.bf16x2.f32 %r22869, %r24226, %r24225; 2026-02-21T09:19:14.2452619Z cvt.rn.bf16x2.f32 %r22870, %r24228, %r24227; 2026-02-21T09:19:14.2452693Z cvt.rn.bf16x2.f32 %r22871, %r24230, %r24229; 2026-02-21T09:19:14.2452771Z cvt.rn.bf16x2.f32 %r22872, %r24232, %r24231; 2026-02-21T09:19:14.2452845Z cvt.rn.bf16x2.f32 %r22873, %r24234, %r24233; 2026-02-21T09:19:14.2452921Z cvt.rn.bf16x2.f32 %r22874, %r24236, %r24235; 2026-02-21T09:19:14.2452999Z cvt.rn.bf16x2.f32 %r22875, %r24238, %r24237; 2026-02-21T09:19:14.2453075Z cvt.rn.bf16x2.f32 %r22876, %r24240, %r24239; 2026-02-21T09:19:14.2453151Z 
cvt.rn.bf16x2.f32 %r22877, %r24242, %r24241; 2026-02-21T09:19:14.2453229Z cvt.rn.bf16x2.f32 %r22878, %r24244, %r24243; 2026-02-21T09:19:14.2453306Z cvt.rn.bf16x2.f32 %r22879, %r24246, %r24245; 2026-02-21T09:19:14.2453380Z cvt.rn.bf16x2.f32 %r22880, %r24248, %r24247; 2026-02-21T09:19:14.2453455Z cvt.rn.bf16x2.f32 %r22881, %r24250, %r24249; 2026-02-21T09:19:14.2453532Z cvt.rn.bf16x2.f32 %r22882, %r24252, %r24251; 2026-02-21T09:19:14.2453607Z cvt.rn.bf16x2.f32 %r22883, %r24254, %r24253; 2026-02-21T09:19:14.2453763Z cvt.rn.bf16x2.f32 %r22884, %r24256, %r24255; 2026-02-21T09:19:14.2453841Z cvt.rn.bf16x2.f32 %r22885, %r24258, %r24257; 2026-02-21T09:19:14.2453916Z cvt.rn.bf16x2.f32 %r22886, %r24260, %r24259; 2026-02-21T09:19:14.2454049Z cvt.rn.bf16x2.f32 %r22887, %r24262, %r24261; 2026-02-21T09:19:14.2454126Z cvt.rn.bf16x2.f32 %r22888, %r24264, %r24263; 2026-02-21T09:19:14.2454200Z cvt.rn.bf16x2.f32 %r22889, %r24266, %r24265; 2026-02-21T09:19:14.2454275Z cvt.rn.bf16x2.f32 %r22890, %r24268, %r24267; 2026-02-21T09:19:14.2454351Z cvt.rn.bf16x2.f32 %r22891, %r24270, %r24269; 2026-02-21T09:19:14.2454429Z cvt.rn.bf16x2.f32 %r22892, %r24272, %r24271; 2026-02-21T09:19:14.2454505Z cvt.rn.bf16x2.f32 %r22893, %r24274, %r24273; 2026-02-21T09:19:14.2454634Z cvt.rn.bf16x2.f32 %r22894, %r24276, %r24275; 2026-02-21T09:19:14.2454715Z cvt.rn.bf16x2.f32 %r22895, %r24278, %r24277; 2026-02-21T09:19:14.2454791Z cvt.rn.bf16x2.f32 %r22896, %r24280, %r24279; 2026-02-21T09:19:14.2454869Z cvt.rn.bf16x2.f32 %r22897, %r24282, %r24281; 2026-02-21T09:19:14.2454945Z cvt.rn.bf16x2.f32 %r22898, %r24284, %r24283; 2026-02-21T09:19:14.2455020Z cvt.rn.bf16x2.f32 %r22899, %r24286, %r24285; 2026-02-21T09:19:14.2455095Z cvt.rn.bf16x2.f32 %r22900, %r24288, %r24287; 2026-02-21T09:19:14.2455306Z .loc 1 91 43 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:91:43 2026-02-21T09:19:14.2455372Z shl.b32 %r22901, %r22741, 13; 2026-02-21T09:19:14.2455432Z shl.b32 %r22902, %r22742, 13; 
2026-02-21T09:19:14.2455489Z shl.b32 %r22903, %r22743, 13; 2026-02-21T09:19:14.2455598Z shl.b32 %r22904, %r22744, 13; 2026-02-21T09:19:14.2455659Z shl.b32 %r22905, %r22745, 13; 2026-02-21T09:19:14.2455718Z shl.b32 %r22906, %r22746, 13; 2026-02-21T09:19:14.2455781Z shl.b32 %r22907, %r22747, 13; 2026-02-21T09:19:14.2455838Z shl.b32 %r22908, %r22748, 13; 2026-02-21T09:19:14.2455898Z shl.b32 %r22909, %r22749, 13; 2026-02-21T09:19:14.2455957Z shl.b32 %r22910, %r22750, 13; 2026-02-21T09:19:14.2456020Z shl.b32 %r22911, %r22751, 13; 2026-02-21T09:19:14.2456078Z shl.b32 %r22912, %r22752, 13; 2026-02-21T09:19:14.2456135Z shl.b32 %r22913, %r22753, 13; 2026-02-21T09:19:14.2456196Z shl.b32 %r22914, %r22754, 13; 2026-02-21T09:19:14.2456256Z shl.b32 %r22915, %r22755, 13; 2026-02-21T09:19:14.2456314Z shl.b32 %r22916, %r22756, 13; 2026-02-21T09:19:14.2456373Z shl.b32 %r22917, %r22757, 13; 2026-02-21T09:19:14.2456433Z shl.b32 %r22918, %r22758, 13; 2026-02-21T09:19:14.2456617Z shl.b32 %r22919, %r22759, 13; 2026-02-21T09:19:14.2456683Z shl.b32 %r22920, %r22760, 13; 2026-02-21T09:19:14.2456744Z shl.b32 %r22921, %r22761, 13; 2026-02-21T09:19:14.2456803Z shl.b32 %r22922, %r22762, 13; 2026-02-21T09:19:14.2456861Z shl.b32 %r22923, %r22763, 13; 2026-02-21T09:19:14.2456923Z shl.b32 %r22924, %r22764, 13; 2026-02-21T09:19:14.2456985Z shl.b32 %r22925, %r22765, 13; 2026-02-21T09:19:14.2457043Z shl.b32 %r22926, %r22766, 13; 2026-02-21T09:19:14.2457103Z shl.b32 %r22927, %r22767, 13; 2026-02-21T09:19:14.2457166Z shl.b32 %r22928, %r22768, 13; 2026-02-21T09:19:14.2457224Z shl.b32 %r22929, %r22769, 13; 2026-02-21T09:19:14.2457282Z shl.b32 %r22930, %r22770, 13; 2026-02-21T09:19:14.2457341Z shl.b32 %r22931, %r22771, 13; 2026-02-21T09:19:14.2457405Z shl.b32 %r22932, %r22772, 13; 2026-02-21T09:19:14.2457614Z .loc 1 91 50 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:91:50 2026-02-21T09:19:14.2457679Z add.s32 %r22933, %r22901, %r22740; 2026-02-21T09:19:14.2457746Z add.s32 %r22934, %r22902, 
%r22740; 2026-02-21T09:19:14.2457809Z add.s32 %r22935, %r22903, %r22740; 2026-02-21T09:19:14.2457870Z add.s32 %r22936, %r22904, %r22740; 2026-02-21T09:19:14.2457935Z add.s32 %r22937, %r22905, %r22740; 2026-02-21T09:19:14.2457996Z add.s32 %r22938, %r22906, %r22740; 2026-02-21T09:19:14.2458056Z add.s32 %r22939, %r22907, %r22740; 2026-02-21T09:19:14.2458116Z add.s32 %r22940, %r22908, %r22740; 2026-02-21T09:19:14.2458179Z add.s32 %r22941, %r22909, %r22740; 2026-02-21T09:19:14.2458323Z add.s32 %r22942, %r22910, %r22740; 2026-02-21T09:19:14.2458383Z add.s32 %r22943, %r22911, %r22740; 2026-02-21T09:19:14.2458447Z add.s32 %r22944, %r22912, %r22740; 2026-02-21T09:19:14.2458508Z add.s32 %r22945, %r22913, %r22740; 2026-02-21T09:19:14.2458632Z add.s32 %r22946, %r22914, %r22740; 2026-02-21T09:19:14.2458692Z add.s32 %r22947, %r22915, %r22740; 2026-02-21T09:19:14.2458765Z add.s32 %r22948, %r22916, %r22740; 2026-02-21T09:19:14.2458830Z add.s32 %r22949, %r22917, %r22740; 2026-02-21T09:19:14.2458891Z add.s32 %r22950, %r22918, %r22740; 2026-02-21T09:19:14.2458956Z add.s32 %r22951, %r22919, %r22740; 2026-02-21T09:19:14.2459017Z add.s32 %r22952, %r22920, %r22740; 2026-02-21T09:19:14.2459076Z add.s32 %r22953, %r22921, %r22740; 2026-02-21T09:19:14.2459201Z add.s32 %r22954, %r22922, %r22740; 2026-02-21T09:19:14.2459263Z add.s32 %r22955, %r22923, %r22740; 2026-02-21T09:19:14.2459330Z add.s32 %r22956, %r22924, %r22740; 2026-02-21T09:19:14.2459392Z add.s32 %r22957, %r22925, %r22740; 2026-02-21T09:19:14.2459456Z add.s32 %r22958, %r22926, %r22740; 2026-02-21T09:19:14.2459517Z add.s32 %r22959, %r22927, %r22740; 2026-02-21T09:19:14.2459577Z add.s32 %r22960, %r22928, %r22740; 2026-02-21T09:19:14.2459642Z add.s32 %r22961, %r22929, %r22740; 2026-02-21T09:19:14.2459700Z add.s32 %r22962, %r22930, %r22740; 2026-02-21T09:19:14.2459760Z add.s32 %r22963, %r22931, %r22740; 2026-02-21T09:19:14.2459819Z add.s32 %r22964, %r22932, %r22740; 2026-02-21T09:19:14.2460087Z .loc 1 91 22 // 
co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:91:22 2026-02-21T09:19:14.2460165Z mad.wide.s32 %rd1081, %r22933, 2, %rd66; 2026-02-21T09:19:14.2460238Z mad.wide.s32 %rd1082, %r22934, 2, %rd66; 2026-02-21T09:19:14.2460311Z mad.wide.s32 %rd1083, %r22935, 2, %rd66; 2026-02-21T09:19:14.2460379Z mad.wide.s32 %rd1084, %r22936, 2, %rd66; 2026-02-21T09:19:14.2460446Z mad.wide.s32 %rd1085, %r22937, 2, %rd66; 2026-02-21T09:19:14.2460518Z mad.wide.s32 %rd1086, %r22938, 2, %rd66; 2026-02-21T09:19:14.2460588Z mad.wide.s32 %rd1087, %r22939, 2, %rd66; 2026-02-21T09:19:14.2460655Z mad.wide.s32 %rd1088, %r22940, 2, %rd66; 2026-02-21T09:19:14.2460722Z mad.wide.s32 %rd1089, %r22941, 2, %rd66; 2026-02-21T09:19:14.2460795Z mad.wide.s32 %rd1090, %r22942, 2, %rd66; 2026-02-21T09:19:14.2460865Z mad.wide.s32 %rd1091, %r22943, 2, %rd66; 2026-02-21T09:19:14.2460933Z mad.wide.s32 %rd1092, %r22944, 2, %rd66; 2026-02-21T09:19:14.2461006Z mad.wide.s32 %rd1093, %r22945, 2, %rd66; 2026-02-21T09:19:14.2461074Z mad.wide.s32 %rd1094, %r22946, 2, %rd66; 2026-02-21T09:19:14.2461141Z mad.wide.s32 %rd1095, %r22947, 2, %rd66; 2026-02-21T09:19:14.2461212Z mad.wide.s32 %rd1096, %r22948, 2, %rd66; 2026-02-21T09:19:14.2461280Z mad.wide.s32 %rd1097, %r22949, 2, %rd66; 2026-02-21T09:19:14.2461350Z mad.wide.s32 %rd1098, %r22950, 2, %rd66; 2026-02-21T09:19:14.2461417Z mad.wide.s32 %rd1099, %r22951, 2, %rd66; 2026-02-21T09:19:14.2461491Z mad.wide.s32 %rd1100, %r22952, 2, %rd66; 2026-02-21T09:19:14.2461560Z mad.wide.s32 %rd1101, %r22953, 2, %rd66; 2026-02-21T09:19:14.2461628Z mad.wide.s32 %rd1102, %r22954, 2, %rd66; 2026-02-21T09:19:14.2461698Z mad.wide.s32 %rd1103, %r22955, 2, %rd66; 2026-02-21T09:19:14.2461768Z mad.wide.s32 %rd1104, %r22956, 2, %rd66; 2026-02-21T09:19:14.2461836Z mad.wide.s32 %rd1105, %r22957, 2, %rd66; 2026-02-21T09:19:14.2461919Z mad.wide.s32 %rd1106, %r22958, 2, %rd66; 2026-02-21T09:19:14.2461994Z mad.wide.s32 %rd1107, %r22959, 2, %rd66; 2026-02-21T09:19:14.2462064Z mad.wide.s32 
%rd1108, %r22960, 2, %rd66; 2026-02-21T09:19:14.2462132Z mad.wide.s32 %rd1109, %r22961, 2, %rd66; 2026-02-21T09:19:14.2462202Z mad.wide.s32 %rd1110, %r22962, 2, %rd66; 2026-02-21T09:19:14.2462272Z mad.wide.s32 %rd1111, %r22963, 2, %rd66; 2026-02-21T09:19:14.2462340Z mad.wide.s32 %rd1112, %r22964, 2, %rd66; 2026-02-21T09:19:14.2462553Z .loc 1 91 81 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:91:81 2026-02-21T09:19:14.2462739Z st.shared.v4.b32 [%r298], {%r22773, %r22775, %r22777, %r22779}; 2026-02-21T09:19:14.2462864Z st.shared.v4.b32 [%r298+512], {%r22774, %r22776, %r22778, %r22780}; 2026-02-21T09:19:14.2462980Z st.shared.v4.b32 [%r299], {%r22781, %r22783, %r22785, %r22787}; 2026-02-21T09:19:14.2463146Z st.shared.v4.b32 [%r299+512], {%r22782, %r22784, %r22786, %r22788}; 2026-02-21T09:19:14.2463255Z st.shared.v4.b32 [%r300], {%r22789, %r22791, %r22793, %r22795}; 2026-02-21T09:19:14.2463369Z st.shared.v4.b32 [%r300+512], {%r22790, %r22792, %r22794, %r22796}; 2026-02-21T09:19:14.2463478Z st.shared.v4.b32 [%r301], {%r22797, %r22799, %r22801, %r22803}; 2026-02-21T09:19:14.2463591Z st.shared.v4.b32 [%r301+512], {%r22798, %r22800, %r22802, %r22804}; 2026-02-21T09:19:14.2463695Z bar.sync 0; 2026-02-21T09:19:14.2463761Z // begin inline asm 2026-02-21T09:19:14.2463967Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22452, %r22453, %r22454, %r22455}, [%r22456]; 2026-02-21T09:19:14.2464028Z // end inline asm 2026-02-21T09:19:14.2464089Z // begin inline asm 2026-02-21T09:19:14.2464283Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22457, %r22458, %r22459, %r22460}, [%r22461]; 2026-02-21T09:19:14.2464341Z // end inline asm 2026-02-21T09:19:14.2464400Z // begin inline asm 2026-02-21T09:19:14.2464594Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22462, %r22463, %r22464, %r22465}, [%r22466]; 2026-02-21T09:19:14.2464651Z // end inline asm 2026-02-21T09:19:14.2464709Z // begin inline asm 2026-02-21T09:19:14.2464962Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22467, %r22468, 
%r22469, %r22470}, [%r22471]; 2026-02-21T09:19:14.2465022Z // end inline asm 2026-02-21T09:19:14.2465081Z // begin inline asm 2026-02-21T09:19:14.2465276Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22472, %r22473, %r22474, %r22475}, [%r22476]; 2026-02-21T09:19:14.2465333Z // end inline asm 2026-02-21T09:19:14.2465390Z // begin inline asm 2026-02-21T09:19:14.2465578Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22477, %r22478, %r22479, %r22480}, [%r22481]; 2026-02-21T09:19:14.2465639Z // end inline asm 2026-02-21T09:19:14.2465696Z // begin inline asm 2026-02-21T09:19:14.2465885Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22482, %r22483, %r22484, %r22485}, [%r22486]; 2026-02-21T09:19:14.2465943Z // end inline asm 2026-02-21T09:19:14.2466000Z // begin inline asm 2026-02-21T09:19:14.2466189Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22487, %r22488, %r22489, %r22490}, [%r22491]; 2026-02-21T09:19:14.2466246Z // end inline asm 2026-02-21T09:19:14.2466307Z bar.sync 0; 2026-02-21T09:19:14.2466420Z st.shared.v4.b32 [%r298], {%r22805, %r22807, %r22809, %r22811}; 2026-02-21T09:19:14.2466671Z st.shared.v4.b32 [%r298+512], {%r22806, %r22808, %r22810, %r22812}; 2026-02-21T09:19:14.2466789Z st.shared.v4.b32 [%r299], {%r22813, %r22815, %r22817, %r22819}; 2026-02-21T09:19:14.2466904Z st.shared.v4.b32 [%r299+512], {%r22814, %r22816, %r22818, %r22820}; 2026-02-21T09:19:14.2467011Z st.shared.v4.b32 [%r300], {%r22821, %r22823, %r22825, %r22827}; 2026-02-21T09:19:14.2467128Z st.shared.v4.b32 [%r300+512], {%r22822, %r22824, %r22826, %r22828}; 2026-02-21T09:19:14.2467235Z st.shared.v4.b32 [%r301], {%r22829, %r22831, %r22833, %r22835}; 2026-02-21T09:19:14.2467348Z st.shared.v4.b32 [%r301+512], {%r22830, %r22832, %r22834, %r22836}; 2026-02-21T09:19:14.2467406Z bar.sync 0; 2026-02-21T09:19:14.2467465Z // begin inline asm 2026-02-21T09:19:14.2467656Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22492, %r22493, %r22494, %r22495}, [%r22456]; 2026-02-21T09:19:14.2467712Z // end inline asm 
2026-02-21T09:19:14.2467773Z // begin inline asm 2026-02-21T09:19:14.2467962Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22497, %r22498, %r22499, %r22500}, [%r22461]; 2026-02-21T09:19:14.2468019Z // end inline asm 2026-02-21T09:19:14.2468082Z // begin inline asm 2026-02-21T09:19:14.2468270Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22502, %r22503, %r22504, %r22505}, [%r22466]; 2026-02-21T09:19:14.2468325Z // end inline asm 2026-02-21T09:19:14.2468469Z // begin inline asm 2026-02-21T09:19:14.2468842Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22507, %r22508, %r22509, %r22510}, [%r22471]; 2026-02-21T09:19:14.2468898Z // end inline asm 2026-02-21T09:19:14.2468956Z // begin inline asm 2026-02-21T09:19:14.2469229Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22512, %r22513, %r22514, %r22515}, [%r22476]; 2026-02-21T09:19:14.2469283Z // end inline asm 2026-02-21T09:19:14.2469340Z // begin inline asm 2026-02-21T09:19:14.2469531Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22517, %r22518, %r22519, %r22520}, [%r22481]; 2026-02-21T09:19:14.2469586Z // end inline asm 2026-02-21T09:19:14.2469642Z // begin inline asm 2026-02-21T09:19:14.2469830Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22522, %r22523, %r22524, %r22525}, [%r22486]; 2026-02-21T09:19:14.2469957Z // end inline asm 2026-02-21T09:19:14.2470017Z // begin inline asm 2026-02-21T09:19:14.2470204Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22527, %r22528, %r22529, %r22530}, [%r22491]; 2026-02-21T09:19:14.2470264Z // end inline asm 2026-02-21T09:19:14.2470318Z bar.sync 0; 2026-02-21T09:19:14.2470428Z st.shared.v4.b32 [%r298], {%r22837, %r22839, %r22841, %r22843}; 2026-02-21T09:19:14.2470546Z st.shared.v4.b32 [%r298+512], {%r22838, %r22840, %r22842, %r22844}; 2026-02-21T09:19:14.2470657Z st.shared.v4.b32 [%r299], {%r22845, %r22847, %r22849, %r22851}; 2026-02-21T09:19:14.2470770Z st.shared.v4.b32 [%r299+512], {%r22846, %r22848, %r22850, %r22852}; 2026-02-21T09:19:14.2470877Z st.shared.v4.b32 [%r300], {%r22853, %r22855, 
%r22857, %r22859}; 2026-02-21T09:19:14.2471091Z st.shared.v4.b32 [%r300+512], {%r22854, %r22856, %r22858, %r22860}; 2026-02-21T09:19:14.2471203Z st.shared.v4.b32 [%r301], {%r22861, %r22863, %r22865, %r22867}; 2026-02-21T09:19:14.2471318Z st.shared.v4.b32 [%r301+512], {%r22862, %r22864, %r22866, %r22868}; 2026-02-21T09:19:14.2471376Z bar.sync 0; 2026-02-21T09:19:14.2471435Z // begin inline asm 2026-02-21T09:19:14.2471626Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22532, %r22533, %r22534, %r22535}, [%r22456]; 2026-02-21T09:19:14.2471685Z // end inline asm 2026-02-21T09:19:14.2471753Z // begin inline asm 2026-02-21T09:19:14.2471946Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22537, %r22538, %r22539, %r22540}, [%r22461]; 2026-02-21T09:19:14.2472004Z // end inline asm 2026-02-21T09:19:14.2472065Z // begin inline asm 2026-02-21T09:19:14.2472254Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22542, %r22543, %r22544, %r22545}, [%r22466]; 2026-02-21T09:19:14.2472310Z // end inline asm 2026-02-21T09:19:14.2472372Z // begin inline asm 2026-02-21T09:19:14.2472559Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22547, %r22548, %r22549, %r22550}, [%r22471]; 2026-02-21T09:19:14.2472614Z // end inline asm 2026-02-21T09:19:14.2472675Z // begin inline asm 2026-02-21T09:19:14.2472862Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22552, %r22553, %r22554, %r22555}, [%r22476]; 2026-02-21T09:19:14.2472918Z // end inline asm 2026-02-21T09:19:14.2472977Z // begin inline asm 2026-02-21T09:19:14.2473166Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22557, %r22558, %r22559, %r22560}, [%r22481]; 2026-02-21T09:19:14.2473223Z // end inline asm 2026-02-21T09:19:14.2473281Z // begin inline asm 2026-02-21T09:19:14.2473472Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22562, %r22563, %r22564, %r22565}, [%r22486]; 2026-02-21T09:19:14.2473530Z // end inline asm 2026-02-21T09:19:14.2473589Z // begin inline asm 2026-02-21T09:19:14.2473777Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22567, %r22568, %r22569, 
%r22570}, [%r22491]; 2026-02-21T09:19:14.2473835Z // end inline asm 2026-02-21T09:19:14.2473891Z bar.sync 0; 2026-02-21T09:19:14.2474000Z st.shared.v4.b32 [%r298], {%r22869, %r22871, %r22873, %r22875}; 2026-02-21T09:19:14.2474117Z st.shared.v4.b32 [%r298+512], {%r22870, %r22872, %r22874, %r22876}; 2026-02-21T09:19:14.2474226Z st.shared.v4.b32 [%r299], {%r22877, %r22879, %r22881, %r22883}; 2026-02-21T09:19:14.2474340Z st.shared.v4.b32 [%r299+512], {%r22878, %r22880, %r22882, %r22884}; 2026-02-21T09:19:14.2474514Z st.shared.v4.b32 [%r300], {%r22885, %r22887, %r22889, %r22891}; 2026-02-21T09:19:14.2474628Z st.shared.v4.b32 [%r300+512], {%r22886, %r22888, %r22890, %r22892}; 2026-02-21T09:19:14.2474734Z st.shared.v4.b32 [%r301], {%r22893, %r22895, %r22897, %r22899}; 2026-02-21T09:19:14.2474896Z st.shared.v4.b32 [%r301+512], {%r22894, %r22896, %r22898, %r22900}; 2026-02-21T09:19:14.2474962Z bar.sync 0; 2026-02-21T09:19:14.2475022Z // begin inline asm 2026-02-21T09:19:14.2475215Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22572, %r22573, %r22574, %r22575}, [%r22456]; 2026-02-21T09:19:14.2475276Z // end inline asm 2026-02-21T09:19:14.2475335Z // begin inline asm 2026-02-21T09:19:14.2475575Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22577, %r22578, %r22579, %r22580}, [%r22461]; 2026-02-21T09:19:14.2475637Z // end inline asm 2026-02-21T09:19:14.2475695Z // begin inline asm 2026-02-21T09:19:14.2475882Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22582, %r22583, %r22584, %r22585}, [%r22466]; 2026-02-21T09:19:14.2475942Z // end inline asm 2026-02-21T09:19:14.2476000Z // begin inline asm 2026-02-21T09:19:14.2476190Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22587, %r22588, %r22589, %r22590}, [%r22471]; 2026-02-21T09:19:14.2476247Z // end inline asm 2026-02-21T09:19:14.2476307Z // begin inline asm 2026-02-21T09:19:14.2476605Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22592, %r22593, %r22594, %r22595}, [%r22476]; 2026-02-21T09:19:14.2476664Z // end inline asm 
2026-02-21T09:19:14.2476724Z // begin inline asm 2026-02-21T09:19:14.2476987Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22597, %r22598, %r22599, %r22600}, [%r22481]; 2026-02-21T09:19:14.2477045Z // end inline asm 2026-02-21T09:19:14.2477103Z // begin inline asm 2026-02-21T09:19:14.2477296Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22602, %r22603, %r22604, %r22605}, [%r22486]; 2026-02-21T09:19:14.2477353Z // end inline asm 2026-02-21T09:19:14.2477410Z // begin inline asm 2026-02-21T09:19:14.2477601Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r22607, %r22608, %r22609, %r22610}, [%r22491]; 2026-02-21T09:19:14.2477657Z // end inline asm 2026-02-21T09:19:14.2477714Z // begin inline asm 2026-02-21T09:19:14.2477849Z st.global.v4.b32 [ %rd1081 + 0 ], { %r22452, %r22453, %r22454, %r22455 }; 2026-02-21T09:19:14.2477907Z // end inline asm 2026-02-21T09:19:14.2477964Z // begin inline asm 2026-02-21T09:19:14.2478087Z st.global.v4.b32 [ %rd1082 + 0 ], { %r22457, %r22458, %r22459, %r22460 }; 2026-02-21T09:19:14.2478144Z // end inline asm 2026-02-21T09:19:14.2478202Z // begin inline asm 2026-02-21T09:19:14.2478323Z st.global.v4.b32 [ %rd1083 + 0 ], { %r22462, %r22463, %r22464, %r22465 }; 2026-02-21T09:19:14.2478384Z // end inline asm 2026-02-21T09:19:14.2478441Z // begin inline asm 2026-02-21T09:19:14.2478563Z st.global.v4.b32 [ %rd1084 + 0 ], { %r22467, %r22468, %r22469, %r22470 }; 2026-02-21T09:19:14.2478618Z // end inline asm 2026-02-21T09:19:14.2478679Z // begin inline asm 2026-02-21T09:19:14.2478799Z st.global.v4.b32 [ %rd1085 + 0 ], { %r22472, %r22473, %r22474, %r22475 }; 2026-02-21T09:19:14.2478855Z // end inline asm 2026-02-21T09:19:14.2478919Z // begin inline asm 2026-02-21T09:19:14.2479037Z st.global.v4.b32 [ %rd1086 + 0 ], { %r22477, %r22478, %r22479, %r22480 }; 2026-02-21T09:19:14.2479094Z // end inline asm 2026-02-21T09:19:14.2479152Z // begin inline asm 2026-02-21T09:19:14.2479270Z st.global.v4.b32 [ %rd1087 + 0 ], { %r22482, %r22483, %r22484, %r22485 }; 
2026-02-21T09:19:14.2479326Z // end inline asm 2026-02-21T09:19:14.2479382Z // begin inline asm 2026-02-21T09:19:14.2479506Z st.global.v4.b32 [ %rd1088 + 0 ], { %r22487, %r22488, %r22489, %r22490 }; 2026-02-21T09:19:14.2479561Z // end inline asm 2026-02-21T09:19:14.2479619Z // begin inline asm 2026-02-21T09:19:14.2479741Z st.global.v4.b32 [ %rd1089 + 0 ], { %r22492, %r22493, %r22494, %r22495 }; 2026-02-21T09:19:14.2479797Z // end inline asm 2026-02-21T09:19:14.2479854Z // begin inline asm 2026-02-21T09:19:14.2480074Z st.global.v4.b32 [ %rd1090 + 0 ], { %r22497, %r22498, %r22499, %r22500 }; 2026-02-21T09:19:14.2480132Z // end inline asm 2026-02-21T09:19:14.2480191Z // begin inline asm 2026-02-21T09:19:14.2480310Z st.global.v4.b32 [ %rd1091 + 0 ], { %r22502, %r22503, %r22504, %r22505 }; 2026-02-21T09:19:14.2480433Z // end inline asm 2026-02-21T09:19:14.2480490Z // begin inline asm 2026-02-21T09:19:14.2480608Z st.global.v4.b32 [ %rd1092 + 0 ], { %r22507, %r22508, %r22509, %r22510 }; 2026-02-21T09:19:14.2480667Z // end inline asm 2026-02-21T09:19:14.2480723Z // begin inline asm 2026-02-21T09:19:14.2480840Z st.global.v4.b32 [ %rd1093 + 0 ], { %r22512, %r22513, %r22514, %r22515 }; 2026-02-21T09:19:14.2480895Z // end inline asm 2026-02-21T09:19:14.2480955Z // begin inline asm 2026-02-21T09:19:14.2481132Z st.global.v4.b32 [ %rd1094 + 0 ], { %r22517, %r22518, %r22519, %r22520 }; 2026-02-21T09:19:14.2481191Z // end inline asm 2026-02-21T09:19:14.2481250Z // begin inline asm 2026-02-21T09:19:14.2481369Z st.global.v4.b32 [ %rd1095 + 0 ], { %r22522, %r22523, %r22524, %r22525 }; 2026-02-21T09:19:14.2481425Z // end inline asm 2026-02-21T09:19:14.2481482Z // begin inline asm 2026-02-21T09:19:14.2481612Z st.global.v4.b32 [ %rd1096 + 0 ], { %r22527, %r22528, %r22529, %r22530 }; 2026-02-21T09:19:14.2481670Z // end inline asm 2026-02-21T09:19:14.2481728Z // begin inline asm 2026-02-21T09:19:14.2481852Z st.global.v4.b32 [ %rd1097 + 0 ], { %r22532, %r22533, %r22534, %r22535 }; 
2026-02-21T09:19:14.2481907Z // end inline asm 2026-02-21T09:19:14.2481964Z // begin inline asm 2026-02-21T09:19:14.2482135Z st.global.v4.b32 [ %rd1098 + 0 ], { %r22537, %r22538, %r22539, %r22540 }; 2026-02-21T09:19:14.2482193Z // end inline asm 2026-02-21T09:19:14.2482250Z // begin inline asm 2026-02-21T09:19:14.2482373Z st.global.v4.b32 [ %rd1099 + 0 ], { %r22542, %r22543, %r22544, %r22545 }; 2026-02-21T09:19:14.2482432Z // end inline asm 2026-02-21T09:19:14.2482488Z // begin inline asm 2026-02-21T09:19:14.2482619Z st.global.v4.b32 [ %rd1100 + 0 ], { %r22547, %r22548, %r22549, %r22550 }; 2026-02-21T09:19:14.2482679Z // end inline asm 2026-02-21T09:19:14.2482737Z // begin inline asm 2026-02-21T09:19:14.2482861Z st.global.v4.b32 [ %rd1101 + 0 ], { %r22552, %r22553, %r22554, %r22555 }; 2026-02-21T09:19:14.2482919Z // end inline asm 2026-02-21T09:19:14.2482981Z // begin inline asm 2026-02-21T09:19:14.2483100Z st.global.v4.b32 [ %rd1102 + 0 ], { %r22557, %r22558, %r22559, %r22560 }; 2026-02-21T09:19:14.2483155Z // end inline asm 2026-02-21T09:19:14.2483215Z // begin inline asm 2026-02-21T09:19:14.2483334Z st.global.v4.b32 [ %rd1103 + 0 ], { %r22562, %r22563, %r22564, %r22565 }; 2026-02-21T09:19:14.2483389Z // end inline asm 2026-02-21T09:19:14.2483448Z // begin inline asm 2026-02-21T09:19:14.2483569Z st.global.v4.b32 [ %rd1104 + 0 ], { %r22567, %r22568, %r22569, %r22570 }; 2026-02-21T09:19:14.2483624Z // end inline asm 2026-02-21T09:19:14.2483681Z // begin inline asm 2026-02-21T09:19:14.2483804Z st.global.v4.b32 [ %rd1105 + 0 ], { %r22572, %r22573, %r22574, %r22575 }; 2026-02-21T09:19:14.2483861Z // end inline asm 2026-02-21T09:19:14.2483917Z // begin inline asm 2026-02-21T09:19:14.2484038Z st.global.v4.b32 [ %rd1106 + 0 ], { %r22577, %r22578, %r22579, %r22580 }; 2026-02-21T09:19:14.2484093Z // end inline asm 2026-02-21T09:19:14.2484153Z // begin inline asm 2026-02-21T09:19:14.2484275Z st.global.v4.b32 [ %rd1107 + 0 ], { %r22582, %r22583, %r22584, %r22585 }; 
2026-02-21T09:19:14.2484332Z // end inline asm 2026-02-21T09:19:14.2484389Z // begin inline asm 2026-02-21T09:19:14.2484510Z st.global.v4.b32 [ %rd1108 + 0 ], { %r22587, %r22588, %r22589, %r22590 }; 2026-02-21T09:19:14.2484569Z // end inline asm 2026-02-21T09:19:14.2484626Z // begin inline asm 2026-02-21T09:19:14.2484748Z st.global.v4.b32 [ %rd1109 + 0 ], { %r22592, %r22593, %r22594, %r22595 }; 2026-02-21T09:19:14.2484805Z // end inline asm 2026-02-21T09:19:14.2484862Z // begin inline asm 2026-02-21T09:19:14.2484983Z st.global.v4.b32 [ %rd1110 + 0 ], { %r22597, %r22598, %r22599, %r22600 }; 2026-02-21T09:19:14.2485101Z // end inline asm 2026-02-21T09:19:14.2487580Z // begin inline asm 2026-02-21T09:19:14.2487769Z st.global.v4.b32 [ %rd1111 + 0 ], { %r22602, %r22603, %r22604, %r22605 }; 2026-02-21T09:19:14.2487835Z // end inline asm 2026-02-21T09:19:14.2488032Z // begin inline asm 2026-02-21T09:19:14.2488174Z st.global.v4.b32 [ %rd1112 + 0 ], { %r22607, %r22608, %r22609, %r22610 }; 2026-02-21T09:19:14.2488235Z // end inline asm 2026-02-21T09:19:14.2488472Z .loc 1 22 121 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:22:121 2026-02-21T09:19:14.2488540Z add.s32 %r2949, %r24029, 1; 2026-02-21T09:19:14.2488619Z setp.lt.s32 %p109, %r24029, %r2; 2026-02-21T09:19:14.2488683Z mov.b32 %r24029, %r2949; 2026-02-21T09:19:14.2488818Z @%p109 bra $L__BB0_13; 2026-02-21T09:19:14.2488914Z $L__BB0_16: // %._crit_edge 2026-02-21T09:19:14.2489133Z .loc 1 22 4 // co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py:22:4 2026-02-21T09:19:14.2489190Z ret; 2026-02-21T09:19:14.2489246Z $L__tmp21: 2026-02-21T09:19:14.2489305Z $L__func_end0: 2026-02-21T09:19:14.2489395Z // -- End function 2026-02-21T09:19:14.2489451Z } 2026-02-21T09:19:14.2489708Z .file 1 "/tmp/torchinductor_root/o2/co2kerwah5rroq4bjmbmfx4eeaf7rm2rnizqw5a4jyuoigho7ggg.py" 2026-02-21T09:19:14.2489926Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 
2026-02-21T09:19:14.2489992Z .section .debug_abbrev 2026-02-21T09:19:14.2490043Z { 2026-02-21T09:19:14.2490208Z .b8 1 // Abbreviation Code 2026-02-21T09:19:14.2490305Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:19:14.2490391Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:19:14.2490479Z .b8 37 // DW_AT_producer 2026-02-21T09:19:14.2490561Z .b8 8 // DW_FORM_string 2026-02-21T09:19:14.2490640Z .b8 19 // DW_AT_language 2026-02-21T09:19:14.2490724Z .b8 5 // DW_FORM_data2 2026-02-21T09:19:14.2490801Z .b8 3 // DW_AT_name 2026-02-21T09:19:14.2490880Z .b8 8 // DW_FORM_string 2026-02-21T09:19:14.2490961Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:19:14.2491040Z .b8 6 // DW_FORM_data4 2026-02-21T09:19:14.2491119Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:19:14.2491194Z .b8 8 // DW_FORM_string 2026-02-21T09:19:14.2491270Z .b8 0 // EOM(1) 2026-02-21T09:19:14.2491339Z .b8 0 // EOM(2) 2026-02-21T09:19:14.2491425Z .b8 2 // Abbreviation Code 2026-02-21T09:19:14.2491512Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:19:14.2491592Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:19:14.2491666Z .b8 3 // DW_AT_name 2026-02-21T09:19:14.2491742Z .b8 8 // DW_FORM_string 2026-02-21T09:19:14.2491839Z .b8 32 // DW_AT_inline 2026-02-21T09:19:14.2491919Z .b8 11 // DW_FORM_data1 2026-02-21T09:19:14.2491988Z .b8 0 // EOM(1) 2026-02-21T09:19:14.2492059Z .b8 0 // EOM(2) 2026-02-21T09:19:14.2492142Z .b8 3 // Abbreviation Code 2026-02-21T09:19:14.2492224Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:19:14.2492307Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:19:14.2492384Z .b8 17 // DW_AT_low_pc 2026-02-21T09:19:14.2492457Z .b8 1 // DW_FORM_addr 2026-02-21T09:19:14.2492611Z .b8 18 // DW_AT_high_pc 2026-02-21T09:19:14.2492688Z .b8 1 // DW_FORM_addr 2026-02-21T09:19:14.2492781Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:19:14.2492903Z .b8 19 // DW_FORM_ref4 2026-02-21T09:19:14.2492977Z .b8 0 // EOM(1) 2026-02-21T09:19:14.2493043Z .b8 0 // EOM(2) 2026-02-21T09:19:14.2493125Z .b8 4 // Abbreviation Code 2026-02-21T09:19:14.2493229Z .b8 
29 // DW_TAG_inlined_subroutine 2026-02-21T09:19:14.2493307Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:19:14.2493462Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:19:14.2493541Z .b8 19 // DW_FORM_ref4 2026-02-21T09:19:14.2493618Z .b8 17 // DW_AT_low_pc 2026-02-21T09:19:14.2493689Z .b8 1 // DW_FORM_addr 2026-02-21T09:19:14.2493768Z .b8 18 // DW_AT_high_pc 2026-02-21T09:19:14.2493843Z .b8 1 // DW_FORM_addr 2026-02-21T09:19:14.2493925Z .b8 88 // DW_AT_call_file 2026-02-21T09:19:14.2494002Z .b8 11 // DW_FORM_data1 2026-02-21T09:19:14.2494084Z .b8 89 // DW_AT_call_line 2026-02-21T09:19:14.2494205Z .b8 11 // DW_FORM_data1 2026-02-21T09:19:14.2494291Z .b8 87 // DW_AT_call_column 2026-02-21T09:19:14.2494370Z .b8 11 // DW_FORM_data1 2026-02-21T09:19:14.2494439Z .b8 0 // EOM(1) 2026-02-21T09:19:14.2494507Z .b8 0 // EOM(2) 2026-02-21T09:19:14.2494575Z .b8 0 // EOM(3) 2026-02-21T09:19:14.2494628Z } 2026-02-21T09:19:14.2494690Z .section .debug_info 2026-02-21T09:19:14.2494740Z { 2026-02-21T09:19:14.2494828Z .b32 178 // Length of Unit 2026-02-21T09:19:14.2494919Z .b8 2 // DWARF version number 2026-02-21T09:19:14.2494971Z .b8 0 2026-02-21T09:19:14.2495101Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:19:14.2495199Z .b8 8 // Address Size (in bytes) 2026-02-21T09:19:14.2495323Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T09:19:14.2495410Z .b8 116 // DW_AT_producer 2026-02-21T09:19:14.2495467Z .b8 114 2026-02-21T09:19:14.2495520Z .b8 105 2026-02-21T09:19:14.2495573Z .b8 116 2026-02-21T09:19:14.2495626Z .b8 111 2026-02-21T09:19:14.2495677Z .b8 110 2026-02-21T09:19:14.2495727Z .b8 0 2026-02-21T09:19:14.2495807Z .b8 2 // DW_AT_language 2026-02-21T09:19:14.2495862Z .b8 0 2026-02-21T09:19:14.2495937Z .b8 99 // DW_AT_name 2026-02-21T09:19:14.2495988Z .b8 111 2026-02-21T09:19:14.2496042Z .b8 50 2026-02-21T09:19:14.2496095Z .b8 107 2026-02-21T09:19:14.2496145Z .b8 101 2026-02-21T09:19:14.2496196Z .b8 114 2026-02-21T09:19:14.2496251Z .b8 119 2026-02-21T09:19:14.2496301Z .b8 97 2026-02-21T09:19:14.2496352Z .b8 104 2026-02-21T09:19:14.2496401Z .b8 53 2026-02-21T09:19:14.2496595Z .b8 114 2026-02-21T09:19:14.2496651Z .b8 114 2026-02-21T09:19:14.2496704Z .b8 111 2026-02-21T09:19:14.2496757Z .b8 113 2026-02-21T09:19:14.2496806Z .b8 52 2026-02-21T09:19:14.2496857Z .b8 98 2026-02-21T09:19:14.2496908Z .b8 106 2026-02-21T09:19:14.2496962Z .b8 109 2026-02-21T09:19:14.2497013Z .b8 98 2026-02-21T09:19:14.2497064Z .b8 109 2026-02-21T09:19:14.2497117Z .b8 102 2026-02-21T09:19:14.2497168Z .b8 120 2026-02-21T09:19:14.2497217Z .b8 52 2026-02-21T09:19:14.2497266Z .b8 101 2026-02-21T09:19:14.2497415Z .b8 101 2026-02-21T09:19:14.2497465Z .b8 97 2026-02-21T09:19:14.2497517Z .b8 102 2026-02-21T09:19:14.2497570Z .b8 55 2026-02-21T09:19:14.2497621Z .b8 114 2026-02-21T09:19:14.2497670Z .b8 109 2026-02-21T09:19:14.2497720Z .b8 50 2026-02-21T09:19:14.2497839Z .b8 114 2026-02-21T09:19:14.2497890Z .b8 110 2026-02-21T09:19:14.2497941Z .b8 105 2026-02-21T09:19:14.2497994Z .b8 122 2026-02-21T09:19:14.2498048Z .b8 113 2026-02-21T09:19:14.2498099Z .b8 119 2026-02-21T09:19:14.2498148Z .b8 53 2026-02-21T09:19:14.2498200Z .b8 97 2026-02-21T09:19:14.2498250Z .b8 52 
2026-02-21T09:19:14.2498301Z .b8 106 2026-02-21T09:19:14.2498352Z .b8 121 2026-02-21T09:19:14.2498418Z .b8 117 2026-02-21T09:19:14.2498470Z .b8 111 2026-02-21T09:19:14.2498520Z .b8 105 2026-02-21T09:19:14.2498573Z .b8 103 2026-02-21T09:19:14.2498690Z .b8 104 2026-02-21T09:19:14.2498742Z .b8 111 2026-02-21T09:19:14.2498793Z .b8 55 2026-02-21T09:19:14.2498845Z .b8 103 2026-02-21T09:19:14.2498895Z .b8 103 2026-02-21T09:19:14.2498946Z .b8 103 2026-02-21T09:19:14.2498997Z .b8 46 2026-02-21T09:19:14.2499050Z .b8 112 2026-02-21T09:19:14.2499101Z .b8 121 2026-02-21T09:19:14.2499150Z .b8 0 2026-02-21T09:19:14.2499259Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:19:14.2499345Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:19:14.2499398Z .b8 116 2026-02-21T09:19:14.2499448Z .b8 109 2026-02-21T09:19:14.2499502Z .b8 112 2026-02-21T09:19:14.2499552Z .b8 47 2026-02-21T09:19:14.2499602Z .b8 116 2026-02-21T09:19:14.2499654Z .b8 111 2026-02-21T09:19:14.2499705Z .b8 114 2026-02-21T09:19:14.2499823Z .b8 99 2026-02-21T09:19:14.2499876Z .b8 104 2026-02-21T09:19:14.2499930Z .b8 105 2026-02-21T09:19:14.2499981Z .b8 110 2026-02-21T09:19:14.2500031Z .b8 100 2026-02-21T09:19:14.2500085Z .b8 117 2026-02-21T09:19:14.2500136Z .b8 99 2026-02-21T09:19:14.2500187Z .b8 116 2026-02-21T09:19:14.2500237Z .b8 111 2026-02-21T09:19:14.2500289Z .b8 114 2026-02-21T09:19:14.2500340Z .b8 95 2026-02-21T09:19:14.2500389Z .b8 114 2026-02-21T09:19:14.2500444Z .b8 111 2026-02-21T09:19:14.2500494Z .b8 111 2026-02-21T09:19:14.2500544Z .b8 116 2026-02-21T09:19:14.2500593Z .b8 47 2026-02-21T09:19:14.2500647Z .b8 111 2026-02-21T09:19:14.2500698Z .b8 50 2026-02-21T09:19:14.2500747Z .b8 0 2026-02-21T09:19:14.2500863Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T09:19:14.2500947Z .b8 95 // DW_AT_name 2026-02-21T09:19:14.2501000Z .b8 104 2026-02-21T09:19:14.2501050Z .b8 101 2026-02-21T09:19:14.2501102Z .b8 108 2026-02-21T09:19:14.2501153Z .b8 105 2026-02-21T09:19:14.2501208Z .b8 111 
2026-02-21T09:19:14.2501259Z .b8 110 2026-02-21T09:19:14.2501312Z .b8 95 2026-02-21T09:19:14.2501361Z .b8 109 2026-02-21T09:19:14.2501411Z .b8 97 2026-02-21T09:19:14.2501468Z .b8 116 2026-02-21T09:19:14.2501520Z .b8 109 2026-02-21T09:19:14.2501569Z .b8 117 2026-02-21T09:19:14.2501621Z .b8 108 2026-02-21T09:19:14.2501674Z .b8 95 2026-02-21T09:19:14.2501724Z .b8 98 2026-02-21T09:19:14.2501776Z .b8 102 2026-02-21T09:19:14.2501827Z .b8 49 2026-02-21T09:19:14.2501878Z .b8 54 2026-02-21T09:19:14.2501927Z .b8 95 2026-02-21T09:19:14.2501979Z .b8 105 2026-02-21T09:19:14.2502034Z .b8 110 2026-02-21T09:19:14.2502084Z .b8 116 2026-02-21T09:19:14.2502134Z .b8 52 2026-02-21T09:19:14.2502188Z .b8 0 2026-02-21T09:19:14.2502272Z .b8 1 // DW_AT_inline 2026-02-21T09:19:14.2502378Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T09:19:14.2502471Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T09:19:14.2502574Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T09:19:14.2502674Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:19:14.2502804Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T09:19:14.2502903Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:19:14.2502991Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T09:19:14.2503143Z .b64 $L__tmp20 // DW_AT_high_pc 2026-02-21T09:19:14.2503231Z .b8 1 // DW_AT_call_file 2026-02-21T09:19:14.2503312Z .b8 87 // DW_AT_call_line 2026-02-21T09:19:14.2503444Z .b8 40 // DW_AT_call_column 2026-02-21T09:19:14.2503534Z .b8 0 // End Of Children Mark 2026-02-21T09:19:14.2503632Z .b8 0 // End Of Children Mark 2026-02-21T09:19:14.2503685Z } 2026-02-21T09:19:14.2503757Z .section .debug_macinfo { } 2026-02-21T09:19:14.2503763Z 2026-02-21T09:19:14.2503847Z ================================================================ 2026-02-21T09:19:14.2504014Z please share the reproducer above with Triton project. 
2026-02-21T09:19:22.1324063Z 2026-02-21T09:19:22.1325646Z Generation 2: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 114/114 5.7 configs/s 2026-02-21T09:19:23.8395150Z Generation 2: verifying top configs 100% ━━━━━━━━━━━━━━━━ 128/128 61.2 configs/s 2026-02-21T09:19:24.2703281Z [444s] Generation 2 complete: 2026-02-21T09:19:24.2703563Z error=27 2026-02-21T09:19:24.2703737Z ok=90 2026-02-21T09:19:24.2703910Z min=1.5741 2026-02-21T09:19:24.2704112Z mid=7.2512 2026-02-21T09:19:24.2704288Z max=167.6621 2026-02-21T09:19:24.2704480Z best={'block_sizes': [16, 64, 128], 2026-02-21T09:19:24.2704884Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:19:24.2705282Z 'l2_groupings': [1], 2026-02-21T09:19:24.2705974Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:19:24.2706277Z 'loop_orders': [[0, 1]], 2026-02-21T09:19:24.2706829Z 'maxnreg': 128, 2026-02-21T09:19:24.2707047Z 'num_sm_multiplier': 8, 2026-02-21T09:19:24.2707273Z 'num_stages': 7, 2026-02-21T09:19:24.2707472Z 'num_warps': 4, 2026-02-21T09:19:24.2707682Z 'pid_type': 'persistent_blocked', 2026-02-21T09:19:24.2707953Z 'range_flattens': [None, None], 2026-02-21T09:19:24.2708215Z 'range_multi_buffers': [False, None], 2026-02-21T09:19:24.2708569Z 'range_num_stages': [2, 2], 2026-02-21T09:19:24.2708813Z 'range_unroll_factors': [4, 0], 2026-02-21T09:19:24.2709057Z 'range_warp_specializes': []} 2026-02-21T09:19:24.2739040Z [444s] Fitting surrogate: 336 points, 336 targets 2026-02-21T09:19:26.2335803Z [446s] Generation 3 starting: 114 neighbors, 5 active search path(s) 2026-02-21T09:19:57.4992689Z Generation 3: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 115/115 4.9 configs/s 2026-02-21T09:19:59.1855608Z Generation 3: exploring neighbors 100% ━━━━━━━━━━━━━━━━━ 115/115 152.3 configs/s 2026-02-21T09:19:59.8090435Z Generation 3: verifying top configs 100% ━━━━━━━━━━━━━━━ 128/128 127.1 configs/s 2026-02-21T09:20:00.2284956Z [480s] Generation 3 complete: 
2026-02-21T09:20:00.2285260Z error=112 2026-02-21T09:20:00.2285438Z ok=8 2026-02-21T09:20:00.2285621Z min=1.5374 2026-02-21T09:20:00.2285795Z mid=2.7946 2026-02-21T09:20:00.2285968Z max=4.0129 2026-02-21T09:20:00.2286169Z best={'block_sizes': [16, 64, 128], 2026-02-21T09:20:00.2286783Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:20:00.2287211Z 'l2_groupings': [1], 2026-02-21T09:20:00.2287466Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:20:00.2287759Z 'loop_orders': [[0, 1]], 2026-02-21T09:20:00.2287973Z 'maxnreg': 128, 2026-02-21T09:20:00.2288175Z 'num_sm_multiplier': 8, 2026-02-21T09:20:00.2288390Z 'num_stages': 7, 2026-02-21T09:20:00.2288599Z 'num_warps': 4, 2026-02-21T09:20:00.2288810Z 'pid_type': 'persistent_blocked', 2026-02-21T09:20:00.2289086Z 'range_flattens': [None, None], 2026-02-21T09:20:00.2289339Z 'range_multi_buffers': [False, None], 2026-02-21T09:20:00.2289606Z 'range_num_stages': [2, 2], 2026-02-21T09:20:00.2289843Z 'range_unroll_factors': [4, 0], 2026-02-21T09:20:00.2290095Z 'range_warp_specializes': []} 2026-02-21T09:20:00.2316746Z [480s] Fitting surrogate: 456 points, 456 targets 2026-02-21T09:20:01.7351760Z [482s] Generation 4 starting: 93 neighbors, 5 active search path(s) 2026-02-21T09:20:41.7483067Z [522s] Timeout after 30s compiling Config(block_sizes=[16, 256, 8], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[64], load_eviction_policies=['', 'last'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=8, num_stages=7, num_warps=4, pid_type='persistent_interleaved', range_flattens=[None, None], range_multi_buffers=[True, None], range_num_stages=[4, 4], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T09:20:49.7033187Z [530s] Timeout after 30s compiling Config(block_sizes=[256, 256, 8], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[1], load_eviction_policies=['last', 'last'], loop_orders=[[1, 0]], 
num_sm_multiplier=16, num_stages=7, num_warps=2, pid_type='persistent_blocked', range_flattens=[False, True], range_multi_buffers=[False, False], range_num_stages=[1, 0], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T09:20:50.1384344Z Generation 4: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 94/94 0.8 configs/s 2026-02-21T09:20:54.3540981Z [534s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:20:54.3542922Z Config: @helion.kernel(config=helion.Config(block_sizes=[8, 256, 256], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[4], load_eviction_policies=['first', 'first'], loop_orders=[[1, 0]], num_stages=6, num_warps=32, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 2], range_warp_specializes=[]), static_shapes=True) 2026-02-21T09:20:54.3544466Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:20:54.3544779Z `ptxas` stderr: 2026-02-21T09:20:54.3545396Z ptxas fatal : (C7602) Insufficient registers (64) to compile instruction at line 791 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T09:20:54.3546103Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:20:54.3546302Z 2026-02-21T09:20:54.3547094Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpzb0b00xw.ptx -o /tmp/tmpzb0b00xw.ptx.o 2026-02-21T09:20:54.3547749Z 2026-02-21T09:20:54.3547916Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:20:54.3548180Z 2026-02-21T09:20:54.3548185Z 2026-02-21T09:20:54.3548297Z ================================================================ 2026-02-21T09:20:54.3548678Z Internal Triton PTX codegen error 2026-02-21T09:20:54.3548901Z `ptxas` stderr: 2026-02-21T09:20:54.3549498Z ptxas fatal : (C7602) Insufficient registers (64) to compile instruction at line 791 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T09:20:54.3550198Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:20:54.3550390Z 2026-02-21T09:20:54.3550964Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpzb0b00xw.ptx -o /tmp/tmpzb0b00xw.ptx.o 2026-02-21T09:20:54.3551595Z 2026-02-21T09:20:54.3551599Z 2026-02-21T09:20:54.3551659Z // 2026-02-21T09:20:54.3551823Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:20:54.3552020Z // 2026-02-21T09:20:54.3552097Z 2026-02-21T09:20:54.3552157Z .version 8.7 2026-02-21T09:20:54.3552321Z .target sm_90a 2026-02-21T09:20:54.3552474Z .address_size 64 2026-02-21T09:20:54.3552567Z 2026-02-21T09:20:54.3552752Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T09:20:54.3553100Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:20:54.3553362Z // @_helion_matmul_bf16_int4 2026-02-21T09:20:54.3553626Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T09:20:54.3553925Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T09:20:54.3554486Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T09:20:54.3554873Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T09:20:54.3555227Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T09:20:54.3555701Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T09:20:54.3555975Z ) 
2026-02-21T09:20:54.3556104Z .reqntid 1024 2026-02-21T09:20:54.3556248Z { 2026-02-21T09:20:54.3556383Z .reg .pred %p<12>; 2026-02-21T09:20:54.3556754Z .reg .b16 %rs<61>; 2026-02-21T09:20:54.3556915Z .reg .b32 %r<2035>; 2026-02-21T09:20:54.3557079Z .reg .b64 %rd<52>; 2026-02-21T09:20:54.3557403Z .loc 1 13 0 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:13:0 2026-02-21T09:20:54.3557881Z $L__func_begin0: 2026-02-21T09:20:54.3558175Z .loc 1 13 0 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:13:0 2026-02-21T09:20:54.3558473Z 2026-02-21T09:20:54.3558527Z // %bb.0: 2026-02-21T09:20:54.3558726Z ld.param.b64 %rd9, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T09:20:54.3559028Z ld.param.b64 %rd8, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T09:20:54.3559326Z ld.param.b64 %rd7, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T09:20:54.3559561Z $L__tmp0: 2026-02-21T09:20:54.3559853Z .loc 1 17 33 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:17:33 2026-02-21T09:20:54.3560234Z mov.u32 %r184, %ctaid.x; 2026-02-21T09:20:54.3560637Z .loc 1 20 29 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:20:29 2026-02-21T09:20:54.3560995Z shr.u32 %r185, %r184, 6; 2026-02-21T09:20:54.3561167Z and.b32 %r186, %r185, 33554428; 2026-02-21T09:20:54.3561497Z .loc 1 21 35 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:21:35 2026-02-21T09:20:54.3561845Z sub.s32 %r187, 32, %r186; 2026-02-21T09:20:54.3562158Z .loc 1 21 48 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:21:48 2026-02-21T09:20:54.3562519Z min.s32 %r188, %r187, 4; 2026-02-21T09:20:54.3562824Z .loc 1 22 41 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:22:41 2026-02-21T09:20:54.3563176Z and.b32 %r189, %r184, 255; 2026-02-21T09:20:54.3563484Z .loc 1 23 47 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:23:47 2026-02-21T09:20:54.3563839Z div.s32 %r190, %r189, %r188; 2026-02-21T09:20:54.3564160Z .loc 1 22 60 // 
ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:22:60 2026-02-21T09:20:54.3564535Z mul.lo.s32 %r191, %r190, %r188; 2026-02-21T09:20:54.3564721Z sub.s32 %r192, %r189, %r191; 2026-02-21T09:20:54.3565045Z .loc 1 22 26 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:22:26 2026-02-21T09:20:54.3565396Z add.s32 %r193, %r192, %r186; 2026-02-21T09:20:54.3565704Z .loc 1 24 23 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:24:23 2026-02-21T09:20:54.3566061Z shl.b32 %r1, %r193, 8; 2026-02-21T09:20:54.3566378Z .loc 1 25 41 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:25:41 2026-02-21T09:20:54.3566871Z mov.u32 %r2, %tid.x; 2026-02-21T09:20:54.3567033Z and.b32 %r194, %r2, 31; 2026-02-21T09:20:54.3567205Z shr.u32 %r3, %r2, 5; 2026-02-21T09:20:54.3567367Z shr.u32 %r195, %r2, 2; 2026-02-21T09:20:54.3567529Z shl.b32 %r4, %r2, 1; 2026-02-21T09:20:54.3567692Z and.b32 %r196, %r4, 254; 2026-02-21T09:20:54.3568000Z .loc 1 26 23 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:26:23 2026-02-21T09:20:54.3568351Z shl.b32 %r5, %r190, 8; 2026-02-21T09:20:54.3568826Z .loc 1 27 28 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:27:28 2026-02-21T09:20:54.3569184Z or.b32 %r197, %r5, %r195; 2026-02-21T09:20:54.3569500Z .loc 1 41 34 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:41:34 2026-02-21T09:20:54.3569949Z and.b32 %r6, %r2, 3; 2026-02-21T09:20:54.3570124Z shl.b32 %r198, %r6, 2; 2026-02-21T09:20:54.3570426Z .loc 1 42 49 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:49 2026-02-21T09:20:54.3570843Z shl.b32 %r199, %r197, 10; 2026-02-21T09:20:54.3571150Z .loc 1 59 34 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:59:34 2026-02-21T09:20:54.3571511Z and.b32 %r7, %r2, 256; 2026-02-21T09:20:54.3571816Z .loc 1 42 56 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:56 2026-02-21T09:20:54.3572170Z or.b32 %r200, %r199, %r198; 2026-02-21T09:20:54.3572555Z .loc 1 42 28 // 
ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:28 2026-02-21T09:20:54.3572917Z mad.wide.s32 %rd10, %r200, 2, %rd7; 2026-02-21T09:20:54.3573259Z .loc 1 42 76 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:76 2026-02-21T09:20:54.3573617Z shl.b32 %r201, %r2, 3; 2026-02-21T09:20:54.3573783Z and.b32 %r202, %r201, 8056; 2026-02-21T09:20:54.3573960Z bfe.s32 %r203, %r2, 4, 1; 2026-02-21T09:20:54.3574135Z and.b32 %r204, %r203, 136; 2026-02-21T09:20:54.3574313Z xor.b32 %r205, %r204, %r202; 2026-02-21T09:20:54.3574488Z mov.b32 %r206, global_smem; 2026-02-21T09:20:54.3574665Z add.s32 %r161, %r206, %r205; 2026-02-21T09:20:54.3574830Z mov.b32 %r162, 8; 2026-02-21T09:20:54.3574988Z // begin inline asm 2026-02-21T09:20:54.3575294Z cp.async.ca.shared.global [ %r161 + 0 ], [ %rd10 + 0 ], 0x8, %r162; 2026-02-21T09:20:54.3575576Z // end inline asm 2026-02-21T09:20:54.3575730Z cp.async.commit_group; 2026-02-21T09:20:54.3576060Z .loc 1 42 28 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:28 2026-02-21T09:20:54.3576413Z add.s64 %rd11, %rd10, 32; 2026-02-21T09:20:54.3576848Z .loc 1 42 76 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:76 2026-02-21T09:20:54.3577197Z add.s32 %r163, %r161, 40960; 2026-02-21T09:20:54.3577368Z // begin inline asm 2026-02-21T09:20:54.3577603Z cp.async.ca.shared.global [ %r163 + 0 ], [ %rd11 + 0 ], 0x8, %r162; 2026-02-21T09:20:54.3577871Z // end inline asm 2026-02-21T09:20:54.3578033Z cp.async.commit_group; 2026-02-21T09:20:54.3578350Z .loc 1 42 28 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:28 2026-02-21T09:20:54.3578700Z add.s64 %rd12, %rd10, 64; 2026-02-21T09:20:54.3579010Z .loc 1 42 76 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:76 2026-02-21T09:20:54.3579347Z bar.sync 0; 2026-02-21T09:20:54.3579501Z add.s32 %r165, %r161, 8192; 2026-02-21T09:20:54.3579671Z // begin inline asm 2026-02-21T09:20:54.3579902Z cp.async.ca.shared.global [ %r165 + 0 ], [ %rd12 + 0 ], 0x8, 
%r162; 2026-02-21T09:20:54.3580168Z // end inline asm 2026-02-21T09:20:54.3580335Z cp.async.commit_group; 2026-02-21T09:20:54.3580661Z .loc 1 42 28 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:28 2026-02-21T09:20:54.3581006Z add.s64 %rd13, %rd10, 96; 2026-02-21T09:20:54.3581315Z .loc 1 42 76 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:76 2026-02-21T09:20:54.3581657Z add.s32 %r167, %r161, 49152; 2026-02-21T09:20:54.3581834Z // begin inline asm 2026-02-21T09:20:54.3582055Z cp.async.ca.shared.global [ %r167 + 0 ], [ %rd13 + 0 ], 0x8, %r162; 2026-02-21T09:20:54.3582325Z // end inline asm 2026-02-21T09:20:54.3582480Z cp.async.commit_group; 2026-02-21T09:20:54.3582790Z .loc 1 42 28 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:28 2026-02-21T09:20:54.3583142Z add.s64 %rd14, %rd10, 128; 2026-02-21T09:20:54.3583456Z .loc 1 42 76 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:76 2026-02-21T09:20:54.3583802Z bar.sync 0; 2026-02-21T09:20:54.3583952Z add.s32 %r169, %r161, 16384; 2026-02-21T09:20:54.3584241Z // begin inline asm 2026-02-21T09:20:54.3584470Z cp.async.ca.shared.global [ %r169 + 0 ], [ %rd14 + 0 ], 0x8, %r162; 2026-02-21T09:20:54.3584743Z // end inline asm 2026-02-21T09:20:54.3584900Z cp.async.commit_group; 2026-02-21T09:20:54.3585272Z .loc 1 42 28 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:28 2026-02-21T09:20:54.3585625Z add.s64 %rd15, %rd10, 160; 2026-02-21T09:20:54.3585946Z .loc 1 42 76 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:76 2026-02-21T09:20:54.3586302Z add.s32 %r171, %r161, 57344; 2026-02-21T09:20:54.3586592Z // begin inline asm 2026-02-21T09:20:54.3586834Z cp.async.ca.shared.global [ %r171 + 0 ], [ %rd15 + 0 ], 0x8, %r162; 2026-02-21T09:20:54.3587190Z // end inline asm 2026-02-21T09:20:54.3587359Z cp.async.commit_group; 2026-02-21T09:20:54.3587675Z .loc 1 42 28 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:28 
2026-02-21T09:20:54.3588022Z add.s64 %rd16, %rd10, 192; 2026-02-21T09:20:54.3588440Z .loc 1 42 76 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:76 2026-02-21T09:20:54.3588809Z bar.sync 0; 2026-02-21T09:20:54.3588961Z add.s32 %r173, %r161, 24576; 2026-02-21T09:20:54.3589133Z // begin inline asm 2026-02-21T09:20:54.3589364Z cp.async.ca.shared.global [ %r173 + 0 ], [ %rd16 + 0 ], 0x8, %r162; 2026-02-21T09:20:54.3589626Z // end inline asm 2026-02-21T09:20:54.3589784Z cp.async.commit_group; 2026-02-21T09:20:54.3590184Z .loc 1 42 28 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:28 2026-02-21T09:20:54.3590548Z add.s64 %rd17, %rd10, 224; 2026-02-21T09:20:54.3590862Z .loc 1 42 76 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:76 2026-02-21T09:20:54.3591204Z add.s32 %r175, %r161, 65536; 2026-02-21T09:20:54.3591383Z // begin inline asm 2026-02-21T09:20:54.3591604Z cp.async.ca.shared.global [ %r175 + 0 ], [ %rd17 + 0 ], 0x8, %r162; 2026-02-21T09:20:54.3591891Z // end inline asm 2026-02-21T09:20:54.3592050Z cp.async.commit_group; 2026-02-21T09:20:54.3592353Z .loc 1 42 28 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:28 2026-02-21T09:20:54.3592713Z add.s64 %rd18, %rd10, 256; 2026-02-21T09:20:54.3593025Z .loc 1 42 76 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:76 2026-02-21T09:20:54.3593374Z bar.sync 0; 2026-02-21T09:20:54.3593523Z add.s32 %r177, %r161, 32768; 2026-02-21T09:20:54.3593703Z // begin inline asm 2026-02-21T09:20:54.3593927Z cp.async.ca.shared.global [ %r177 + 0 ], [ %rd18 + 0 ], 0x8, %r162; 2026-02-21T09:20:54.3594206Z // end inline asm 2026-02-21T09:20:54.3594366Z cp.async.commit_group; 2026-02-21T09:20:54.3594667Z .loc 1 42 28 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:28 2026-02-21T09:20:54.3595018Z add.s64 %rd19, %rd10, 288; 2026-02-21T09:20:54.3595340Z .loc 1 42 76 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:76 2026-02-21T09:20:54.3595691Z 
add.s32 %r179, %r161, 73728; 2026-02-21T09:20:54.3595863Z // begin inline asm 2026-02-21T09:20:54.3596094Z cp.async.ca.shared.global [ %r179 + 0 ], [ %rd19 + 0 ], 0x8, %r162; 2026-02-21T09:20:54.3596367Z // end inline asm 2026-02-21T09:20:54.3596676Z cp.async.commit_group; 2026-02-21T09:20:54.3596851Z shl.b32 %r207, %r2, 4; 2026-02-21T09:20:54.3597018Z and.b32 %r208, %r207, 7680; 2026-02-21T09:20:54.3597201Z and.b32 %r209, %r201, 96; 2026-02-21T09:20:54.3597371Z shl.b32 %r210, %r6, 1; 2026-02-21T09:20:54.3597544Z or.b32 %r211, %r208, %r209; 2026-02-21T09:20:54.3597720Z or.b32 %r212, %r211, %r210; 2026-02-21T09:20:54.3597902Z or.b32 %r10, %r212, %r204; 2026-02-21T09:20:54.3598070Z xor.b32 %r11, %r10, 8; 2026-02-21T09:20:54.3598240Z and.b32 %r213, %r201, 768; 2026-02-21T09:20:54.3598419Z shl.b32 %r214, %r194, 2; 2026-02-21T09:20:54.3598713Z shr.u32 %r215, %r2, 8; 2026-02-21T09:20:54.3598880Z and.b32 %r216, %r2, 128; 2026-02-21T09:20:54.3599044Z or.b32 %r217, %r215, %r216; 2026-02-21T09:20:54.3599220Z or.b32 %r218, %r213, %r214; 2026-02-21T09:20:54.3599388Z or.b32 %r219, %r218, %r217; 2026-02-21T09:20:54.3599649Z add.s32 %r1244, %r206, 81920; 2026-02-21T09:20:54.3599829Z add.s32 %r12, %r1244, %r219; 2026-02-21T09:20:54.3600006Z xor.b32 %r221, %r219, 64; 2026-02-21T09:20:54.3600175Z add.s32 %r13, %r1244, %r221; 2026-02-21T09:20:54.3600362Z shl.b32 %r222, %r2, 2; 2026-02-21T09:20:54.3600536Z and.b32 %r223, %r222, 768; 2026-02-21T09:20:54.3600709Z and.b32 %r224, %r4, 124; 2026-02-21T09:20:54.3600894Z and.b32 %r225, %r2, 1; 2026-02-21T09:20:54.3601059Z neg.s32 %r226, %r225; 2026-02-21T09:20:54.3601307Z and.b32 %r227, %r226, 1088; 2026-02-21T09:20:54.3601488Z and.b32 %r228, %r195, 128; 2026-02-21T09:20:54.3601660Z or.b32 %r229, %r223, %r224; 2026-02-21T09:20:54.3601830Z xor.b32 %r230, %r229, %r227; 2026-02-21T09:20:54.3602029Z add.s32 %r231, %r1244, %r228; 2026-02-21T09:20:54.3602204Z add.s32 %r14, %r231, %r230; 2026-02-21T09:20:54.3602379Z shl.b32 %r232, %r2, 6; 
2026-02-21T09:20:54.3602545Z and.b32 %r233, %r232, 16320; 2026-02-21T09:20:54.3602718Z and.b32 %r234, %r201, 48; 2026-02-21T09:20:54.3602889Z shr.u32 %r235, %r2, 6; 2026-02-21T09:20:54.3603048Z and.b32 %r236, %r235, 12; 2026-02-21T09:20:54.3603221Z or.b32 %r237, %r233, %r234; 2026-02-21T09:20:54.3603390Z or.b32 %r238, %r237, %r236; 2026-02-21T09:20:54.3603568Z add.s32 %r15, %r1244, %r238; 2026-02-21T09:20:54.3603835Z xor.b32 %r239, %r238, 16; 2026-02-21T09:20:54.3604019Z add.s32 %r16, %r1244, %r239; 2026-02-21T09:20:54.3604190Z xor.b32 %r240, %r238, 32; 2026-02-21T09:20:54.3604360Z add.s32 %r17, %r1244, %r240; 2026-02-21T09:20:54.3604539Z xor.b32 %r241, %r238, 48; 2026-02-21T09:20:54.3604706Z add.s32 %r18, %r1244, %r241; 2026-02-21T09:20:54.3604882Z shl.b32 %r242, %r3, 7; 2026-02-21T09:20:54.3605054Z shl.b32 %r243, %r194, 4; 2026-02-21T09:20:54.3605232Z or.b32 %r244, %r242, %r243; 2026-02-21T09:20:54.3605402Z add.s32 %r245, %r206, 98304; 2026-02-21T09:20:54.3605596Z add.s32 %r1248, %r245, %r244; 2026-02-21T09:20:54.3605775Z and.b32 %r20, %r207, 112; 2026-02-21T09:20:54.3605951Z shl.b32 %r21, %r194, 3; 2026-02-21T09:20:54.3606121Z or.b32 %r246, %r242, %r21; 2026-02-21T09:20:54.3606297Z and.b32 %r247, %r246, 1920; 2026-02-21T09:20:54.3606609Z shl.b32 %r248, %r2, 8; 2026-02-21T09:20:54.3606779Z and.b32 %r249, %r248, 2048; 2026-02-21T09:20:54.3606963Z add.s32 %r250, %r245, %r20; 2026-02-21T09:20:54.3607139Z add.s32 %r251, %r250, %r249; 2026-02-21T09:20:54.3607321Z add.s32 %r267, %r251, %r247; 2026-02-21T09:20:54.3607494Z bfe.u32 %r252, %r1244, 4, 14; 2026-02-21T09:20:54.3607678Z cvt.u64.u32 %rd21, %r252; 2026-02-21T09:20:54.3607880Z or.b64 %rd28, %rd21, -9223371899348713472; 2026-02-21T09:20:54.3608099Z add.s32 %r253, %r206, 81952; 2026-02-21T09:20:54.3608278Z bfe.u32 %r254, %r253, 4, 14; 2026-02-21T09:20:54.3608453Z cvt.u64.u32 %rd22, %r254; 2026-02-21T09:20:54.3608638Z or.b64 %rd29, %rd22, -9223371899348713472; 2026-02-21T09:20:54.3608848Z add.s32 %r255, %r206, 
90112; 2026-02-21T09:20:54.3609024Z bfe.u32 %r256, %r255, 4, 14; 2026-02-21T09:20:54.3609196Z cvt.u64.u32 %rd23, %r256; 2026-02-21T09:20:54.3609391Z or.b64 %rd30, %rd23, -9223371899348713472; 2026-02-21T09:20:54.3609592Z add.s32 %r257, %r206, 90144; 2026-02-21T09:20:54.3609772Z bfe.u32 %r258, %r257, 4, 14; 2026-02-21T09:20:54.3609953Z cvt.u64.u32 %rd24, %r258; 2026-02-21T09:20:54.3610143Z or.b64 %rd31, %rd24, -9223371899348713472; 2026-02-21T09:20:54.3610497Z .loc 1 34 74 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:34:74 2026-02-21T09:20:54.3610851Z shl.b32 %r259, %r190, 18; 2026-02-21T09:20:54.3611025Z shl.b32 %r260, %r195, 10; 2026-02-21T09:20:54.3611191Z or.b32 %r261, %r259, %r260; 2026-02-21T09:20:54.3611365Z or.b32 %r262, %r261, %r198; 2026-02-21T09:20:54.3611638Z or.b32 %r1968, %r262, 176; 2026-02-21T09:20:54.3611826Z and.b32 %r263, %r232, 57344; 2026-02-21T09:20:54.3612001Z add.s32 %r264, %r263, %r1; 2026-02-21T09:20:54.3612179Z or.b32 %r1967, %r264, %r196; 2026-02-21T09:20:54.3612354Z mov.b32 %r1507, 0f00000000; 2026-02-21T09:20:54.3612593Z mov.b32 %r1970, 4; 2026-02-21T09:20:54.3612755Z mov.b32 %r1969, -1; 2026-02-21T09:20:54.3612910Z mov.b64 %rd51, -16; 2026-02-21T09:20:54.3613077Z setp.eq.b32 %p7, %r7, 0; 2026-02-21T09:20:54.3613252Z mov.b32 %r1508, %r1507; 2026-02-21T09:20:54.3613422Z mov.b32 %r1509, %r1507; 2026-02-21T09:20:54.3613589Z mov.b32 %r1510, %r1507; 2026-02-21T09:20:54.3613756Z mov.b32 %r1511, %r1507; 2026-02-21T09:20:54.3613915Z mov.b32 %r1512, %r1507; 2026-02-21T09:20:54.3614082Z mov.b32 %r1513, %r1507; 2026-02-21T09:20:54.3614319Z mov.b32 %r1514, %r1507; 2026-02-21T09:20:54.3614485Z mov.b32 %r1515, %r1507; 2026-02-21T09:20:54.3614654Z mov.b32 %r1516, %r1507; 2026-02-21T09:20:54.3614829Z mov.b32 %r1517, %r1507; 2026-02-21T09:20:54.3615001Z mov.b32 %r1518, %r1507; 2026-02-21T09:20:54.3615163Z mov.b32 %r1519, %r1507; 2026-02-21T09:20:54.3615330Z mov.b32 %r1520, %r1507; 2026-02-21T09:20:54.3615491Z mov.b32 %r1521, %r1507; 
2026-02-21T09:20:54.3615663Z mov.b32 %r1522, %r1507; 2026-02-21T09:20:54.3615823Z mov.b32 %r1523, %r1507; 2026-02-21T09:20:54.3616004Z mov.b32 %r1524, %r1507; 2026-02-21T09:20:54.3616180Z mov.b32 %r1525, %r1507; 2026-02-21T09:20:54.3616341Z mov.b32 %r1526, %r1507; 2026-02-21T09:20:54.3616639Z mov.b32 %r1527, %r1507; 2026-02-21T09:20:54.3616883Z mov.b32 %r1528, %r1507; 2026-02-21T09:20:54.3617053Z mov.b32 %r1529, %r1507; 2026-02-21T09:20:54.3617215Z mov.b32 %r1530, %r1507; 2026-02-21T09:20:54.3617381Z mov.b32 %r1531, %r1507; 2026-02-21T09:20:54.3617543Z mov.b32 %r1532, %r1507; 2026-02-21T09:20:54.3617705Z mov.b32 %r1533, %r1507; 2026-02-21T09:20:54.3617874Z mov.b32 %r1534, %r1507; 2026-02-21T09:20:54.3618042Z mov.b32 %r1535, %r1507; 2026-02-21T09:20:54.3618207Z mov.b32 %r1536, %r1507; 2026-02-21T09:20:54.3618364Z mov.b32 %r1537, %r1507; 2026-02-21T09:20:54.3618525Z mov.b32 %r1538, %r1507; 2026-02-21T09:20:54.3618681Z mov.b32 %r1539, %r1507; 2026-02-21T09:20:54.3618843Z mov.b32 %r1540, %r1507; 2026-02-21T09:20:54.3619003Z mov.b32 %r1541, %r1507; 2026-02-21T09:20:54.3619166Z mov.b32 %r1542, %r1507; 2026-02-21T09:20:54.3619321Z mov.b32 %r1543, %r1507; 2026-02-21T09:20:54.3619486Z mov.b32 %r1544, %r1507; 2026-02-21T09:20:54.3619643Z mov.b32 %r1545, %r1507; 2026-02-21T09:20:54.3619806Z mov.b32 %r1546, %r1507; 2026-02-21T09:20:54.3619971Z mov.b32 %r1547, %r1507; 2026-02-21T09:20:54.3620142Z mov.b32 %r1548, %r1507; 2026-02-21T09:20:54.3620306Z mov.b32 %r1549, %r1507; 2026-02-21T09:20:54.3620468Z mov.b32 %r1550, %r1507; 2026-02-21T09:20:54.3620634Z mov.b32 %r1551, %r1507; 2026-02-21T09:20:54.3620790Z mov.b32 %r1552, %r1507; 2026-02-21T09:20:54.3620956Z mov.b32 %r1553, %r1507; 2026-02-21T09:20:54.3621115Z mov.b32 %r1554, %r1507; 2026-02-21T09:20:54.3621283Z mov.b32 %r1555, %r1507; 2026-02-21T09:20:54.3621441Z mov.b32 %r1556, %r1507; 2026-02-21T09:20:54.3621607Z mov.b32 %r1557, %r1507; 2026-02-21T09:20:54.3621769Z mov.b32 %r1558, %r1507; 2026-02-21T09:20:54.3621927Z mov.b32 
%r1559, %r1507; 2026-02-21T09:20:54.3622111Z mov.b32 %r1560, %r1507; 2026-02-21T09:20:54.3622274Z mov.b32 %r1561, %r1507; 2026-02-21T09:20:54.3622439Z mov.b32 %r1562, %r1507; 2026-02-21T09:20:54.3622596Z mov.b32 %r1563, %r1507; 2026-02-21T09:20:54.3622760Z mov.b32 %r1564, %r1507; 2026-02-21T09:20:54.3622918Z mov.b32 %r1565, %r1507; 2026-02-21T09:20:54.3623084Z mov.b32 %r1566, %r1507; 2026-02-21T09:20:54.3623241Z mov.b32 %r1567, %r1507; 2026-02-21T09:20:54.3623418Z mov.b32 %r1568, %r1507; 2026-02-21T09:20:54.3623583Z mov.b32 %r1569, %r1507; 2026-02-21T09:20:54.3623744Z mov.b32 %r1570, %r1507; 2026-02-21T09:20:54.3623968Z $L__BB0_1: // =>This Inner Loop Header: Depth=1 2026-02-21T09:20:54.3624236Z add.s64 %rd51, %rd51, 16; 2026-02-21T09:20:54.3624515Z setp.lt.u64 %p8, %rd51, 432; 2026-02-21T09:20:54.3624693Z add.s32 %r1777, %r1969, 1; 2026-02-21T09:20:54.3624872Z setp.gt.s32 %p9, %r1777, 4; 2026-02-21T09:20:54.3625057Z selp.b32 %r1969, 0, %r1777, %p9; 2026-02-21T09:20:54.3625463Z .loc 1 42 76 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:76 2026-02-21T09:20:54.3625827Z cp.async.wait_group 8; 2026-02-21T09:20:54.3626002Z bar.sync 0; 2026-02-21T09:20:54.3626164Z shl.b32 %r1778, %r1969, 13; 2026-02-21T09:20:54.3626346Z add.s32 %r1780, %r206, %r1778; 2026-02-21T09:20:54.3626799Z .loc 1 46 28 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:46:28 2026-02-21T09:20:54.3627148Z add.s32 %r1781, %r1780, %r10; 2026-02-21T09:20:54.3627424Z ld.shared.b16 %rs3, [%r1781]; 2026-02-21T09:20:54.3627619Z ld.shared.b16 %rs4, [%r1781+256]; 2026-02-21T09:20:54.3627823Z ld.shared.b16 %rs5, [%r1781+16]; 2026-02-21T09:20:54.3628018Z ld.shared.b16 %rs6, [%r1781+272]; 2026-02-21T09:20:54.3628221Z add.s32 %r1782, %r1780, %r11; 2026-02-21T09:20:54.3628557Z ld.shared.b16 %rs7, [%r1782]; 2026-02-21T09:20:54.3628739Z ld.shared.b16 %rs8, [%r1782+256]; 2026-02-21T09:20:54.3628933Z ld.shared.b16 %rs9, [%r1782+16]; 2026-02-21T09:20:54.3629130Z ld.shared.b16 %rs10, 
[%r1782+272]; 2026-02-21T09:20:54.3629332Z cvt.f32.bf16 %r849, %rs3; 2026-02-21T09:20:54.3629503Z cvt.f32.bf16 %r850, %rs4; 2026-02-21T09:20:54.3629677Z cvt.f32.bf16 %r851, %rs7; 2026-02-21T09:20:54.3629844Z cvt.f32.bf16 %r852, %rs8; 2026-02-21T09:20:54.3630099Z cvt.f32.bf16 %r981, %rs5; 2026-02-21T09:20:54.3630279Z cvt.f32.bf16 %r982, %rs6; 2026-02-21T09:20:54.3630445Z cvt.f32.bf16 %r983, %rs9; 2026-02-21T09:20:54.3630619Z cvt.f32.bf16 %r984, %rs10; 2026-02-21T09:20:54.3630936Z .loc 1 48 30 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:48:30 2026-02-21T09:20:54.3631290Z cvt.s64.s32 %rd39, %r1967; 2026-02-21T09:20:54.3631466Z add.s64 %rd26, %rd8, %rd39; 2026-02-21T09:20:54.3631786Z .loc 1 48 83 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:48:83 2026-02-21T09:20:54.3632135Z // begin inline asm 2026-02-21T09:20:54.3632293Z mov.u64 %rd25, 0x0; 2026-02-21T09:20:54.3632540Z createpolicy.fractional.L2::evict_first.b64 %rd25, 1.0; 2026-02-21T09:20:54.3632793Z // end inline asm 2026-02-21T09:20:54.3632954Z // begin inline asm 2026-02-21T09:20:54.3633105Z mov.u16 %rs1, 0x0; 2026-02-21T09:20:54.3633359Z ld.global.L1::evict_first.L2::cache_hint.b16 { %rs1 }, [ %rd26 + 0 ], %rd25; 2026-02-21T09:20:54.3633657Z // end inline asm 2026-02-21T09:20:54.3633826Z shr.u16 %rs11, %rs1, 8; 2026-02-21T09:20:54.3634146Z .loc 1 56 24 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:56:24 2026-02-21T09:20:54.3634507Z st.shared.b8 [%r12], %rs1; 2026-02-21T09:20:54.3634693Z st.shared.b8 [%r13+1024], %rs11; 2026-02-21T09:20:54.3634877Z bar.sync 0; 2026-02-21T09:20:54.3635034Z ld.shared.b32 %r1783, [%r14]; 2026-02-21T09:20:54.3635224Z prmt.b32 %r1784, %r1783, 0, 0x7770U; 2026-02-21T09:20:54.3635426Z cvt.u16.u32 %rs12, %r1784; 2026-02-21T09:20:54.3635602Z prmt.b32 %r1785, %r1783, 0, 0x7771U; 2026-02-21T09:20:54.3635806Z cvt.u16.u32 %rs13, %r1785; 2026-02-21T09:20:54.3635982Z prmt.b32 %r1786, %r1783, 0, 0x7772U; 2026-02-21T09:20:54.3636180Z cvt.u16.u32 %rs14, 
%r1786; 2026-02-21T09:20:54.3636377Z prmt.b32 %r1787, %r1783, 0, 0x7773U; 2026-02-21T09:20:54.3636698Z cvt.u16.u32 %rs15, %r1787; 2026-02-21T09:20:54.3637023Z .loc 1 51 24 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:51:24 2026-02-21T09:20:54.3637376Z shl.b16 %rs16, %rs12, 4; 2026-02-21T09:20:54.3637557Z shl.b16 %rs17, %rs13, 4; 2026-02-21T09:20:54.3637729Z shl.b16 %rs18, %rs14, 4; 2026-02-21T09:20:54.3637905Z shl.b16 %rs19, %rs15, 4; 2026-02-21T09:20:54.3638215Z .loc 1 66 54 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:66:54 2026-02-21T09:20:54.3638666Z selp.b16 %rs20, %rs16, %rs12, %p7; 2026-02-21T09:20:54.3638866Z cvt.s16.s8 %rs21, %rs20; 2026-02-21T09:20:54.3639043Z shr.s16 %rs22, %rs21, 4; 2026-02-21T09:20:54.3639224Z selp.b16 %rs23, %rs17, %rs13, %p7; 2026-02-21T09:20:54.3639482Z cvt.s16.s8 %rs24, %rs23; 2026-02-21T09:20:54.3639653Z shr.s16 %rs25, %rs24, 4; 2026-02-21T09:20:54.3639820Z selp.b16 %rs26, %rs18, %rs14, %p7; 2026-02-21T09:20:54.3640014Z cvt.s16.s8 %rs27, %rs26; 2026-02-21T09:20:54.3640176Z shr.s16 %rs28, %rs27, 4; 2026-02-21T09:20:54.3640349Z selp.b16 %rs29, %rs19, %rs15, %p7; 2026-02-21T09:20:54.3640542Z cvt.s16.s8 %rs30, %rs29; 2026-02-21T09:20:54.3640705Z shr.s16 %rs31, %rs30, 4; 2026-02-21T09:20:54.3641092Z .loc 1 71 28 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:71:28 2026-02-21T09:20:54.3641449Z cvt.rn.f32.s16 %r1788, %rs22; 2026-02-21T09:20:54.3641633Z cvt.rn.f32.s16 %r1789, %rs25; 2026-02-21T09:20:54.3641808Z cvt.rn.f32.s16 %r1790, %rs28; 2026-02-21T09:20:54.3642005Z cvt.rn.f32.s16 %r1791, %rs31; 2026-02-21T09:20:54.3642174Z bar.sync 0; 2026-02-21T09:20:54.3642327Z st.shared.b32 [%r15], %r1788; 2026-02-21T09:20:54.3642510Z st.shared.b32 [%r16], %r1789; 2026-02-21T09:20:54.3642687Z st.shared.b32 [%r17], %r1790; 2026-02-21T09:20:54.3642872Z st.shared.b32 [%r18], %r1791; 2026-02-21T09:20:54.3643121Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1507}; 2026-02-21T09:20:54.3643408Z bar.sync 0; 
2026-02-21T09:20:54.3643553Z // begin inline asm 2026-02-21T09:20:54.3643890Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r589, %r853}, [%r267]; 2026-02-21T09:20:54.3644176Z // end inline asm 2026-02-21T09:20:54.3644338Z bar.sync 0; 2026-02-21T09:20:54.3644556Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1509}; 2026-02-21T09:20:54.3644820Z bar.sync 0; 2026-02-21T09:20:54.3644966Z // begin inline asm 2026-02-21T09:20:54.3645200Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r591, %r855}, [%r267]; 2026-02-21T09:20:54.3645488Z // end inline asm 2026-02-21T09:20:54.3645631Z bar.sync 0; 2026-02-21T09:20:54.3645842Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1508}; 2026-02-21T09:20:54.3646104Z bar.sync 0; 2026-02-21T09:20:54.3646248Z // begin inline asm 2026-02-21T09:20:54.3655872Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r590, %r854}, [%r267]; 2026-02-21T09:20:54.3656192Z // end inline asm 2026-02-21T09:20:54.3656361Z bar.sync 0; 2026-02-21T09:20:54.3656783Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1510}; 2026-02-21T09:20:54.3657079Z bar.sync 0; 2026-02-21T09:20:54.3657245Z // begin inline asm 2026-02-21T09:20:54.3657497Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r592, %r856}, [%r267]; 2026-02-21T09:20:54.3657793Z // end inline asm 2026-02-21T09:20:54.3657946Z bar.sync 0; 2026-02-21T09:20:54.3658185Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1511}; 2026-02-21T09:20:54.3658456Z bar.sync 0; 2026-02-21T09:20:54.3658605Z // begin inline asm 2026-02-21T09:20:54.3658846Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r593, %r857}, [%r267]; 2026-02-21T09:20:54.3659145Z // end inline asm 2026-02-21T09:20:54.3659300Z bar.sync 0; 2026-02-21T09:20:54.3659511Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1513}; 2026-02-21T09:20:54.3659781Z bar.sync 0; 2026-02-21T09:20:54.3659925Z // begin inline asm 2026-02-21T09:20:54.3660167Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r595, %r859}, [%r267]; 2026-02-21T09:20:54.3660442Z // end 
inline asm 2026-02-21T09:20:54.3660595Z bar.sync 0; 2026-02-21T09:20:54.3660807Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1512}; 2026-02-21T09:20:54.3661074Z bar.sync 0; 2026-02-21T09:20:54.3661218Z // begin inline asm 2026-02-21T09:20:54.3661458Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r594, %r858}, [%r267]; 2026-02-21T09:20:54.3661745Z // end inline asm 2026-02-21T09:20:54.3661892Z bar.sync 0; 2026-02-21T09:20:54.3662108Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1514}; 2026-02-21T09:20:54.3662549Z bar.sync 0; 2026-02-21T09:20:54.3662696Z // begin inline asm 2026-02-21T09:20:54.3662928Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r596, %r860}, [%r267]; 2026-02-21T09:20:54.3663207Z // end inline asm 2026-02-21T09:20:54.3663351Z bar.sync 0; 2026-02-21T09:20:54.3663660Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1515}; 2026-02-21T09:20:54.3663925Z bar.sync 0; 2026-02-21T09:20:54.3664068Z // begin inline asm 2026-02-21T09:20:54.3664307Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r597, %r861}, [%r267]; 2026-02-21T09:20:54.3664577Z // end inline asm 2026-02-21T09:20:54.3664733Z bar.sync 0; 2026-02-21T09:20:54.3664942Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1517}; 2026-02-21T09:20:54.3665214Z bar.sync 0; 2026-02-21T09:20:54.3665358Z // begin inline asm 2026-02-21T09:20:54.3665704Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r599, %r863}, [%r267]; 2026-02-21T09:20:54.3665997Z // end inline asm 2026-02-21T09:20:54.3666143Z bar.sync 0; 2026-02-21T09:20:54.3666361Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1516}; 2026-02-21T09:20:54.3666765Z bar.sync 0; 2026-02-21T09:20:54.3666916Z // begin inline asm 2026-02-21T09:20:54.3667145Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r598, %r862}, [%r267]; 2026-02-21T09:20:54.3667438Z // end inline asm 2026-02-21T09:20:54.3667603Z bar.sync 0; 2026-02-21T09:20:54.3667833Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1518}; 2026-02-21T09:20:54.3668111Z bar.sync 0; 
2026-02-21T09:20:54.3668266Z // begin inline asm 2026-02-21T09:20:54.3668702Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r600, %r864}, [%r267]; 2026-02-21T09:20:54.3668986Z // end inline asm 2026-02-21T09:20:54.3669142Z bar.sync 0; 2026-02-21T09:20:54.3669367Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1519}; 2026-02-21T09:20:54.3669639Z bar.sync 0; 2026-02-21T09:20:54.3669784Z // begin inline asm 2026-02-21T09:20:54.3670036Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r601, %r865}, [%r267]; 2026-02-21T09:20:54.3670319Z // end inline asm 2026-02-21T09:20:54.3670483Z bar.sync 0; 2026-02-21T09:20:54.3670700Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1521}; 2026-02-21T09:20:54.3670972Z bar.sync 0; 2026-02-21T09:20:54.3671135Z // begin inline asm 2026-02-21T09:20:54.3671377Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r603, %r867}, [%r267]; 2026-02-21T09:20:54.3671664Z // end inline asm 2026-02-21T09:20:54.3671810Z bar.sync 0; 2026-02-21T09:20:54.3672028Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1520}; 2026-02-21T09:20:54.3672300Z bar.sync 0; 2026-02-21T09:20:54.3672458Z // begin inline asm 2026-02-21T09:20:54.3672701Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r602, %r866}, [%r267]; 2026-02-21T09:20:54.3672990Z // end inline asm 2026-02-21T09:20:54.3673141Z bar.sync 0; 2026-02-21T09:20:54.3673354Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1522}; 2026-02-21T09:20:54.3673627Z bar.sync 0; 2026-02-21T09:20:54.3673770Z // begin inline asm 2026-02-21T09:20:54.3674033Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r604, %r868}, [%r267]; 2026-02-21T09:20:54.3674312Z // end inline asm 2026-02-21T09:20:54.3674466Z bar.sync 0; 2026-02-21T09:20:54.3674676Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1523}; 2026-02-21T09:20:54.3674944Z bar.sync 0; 2026-02-21T09:20:54.3675097Z // begin inline asm 2026-02-21T09:20:54.3675341Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r605, %r869}, [%r267]; 2026-02-21T09:20:54.3675622Z // end 
inline asm 2026-02-21T09:20:54.3675765Z bar.sync 0; 2026-02-21T09:20:54.3675987Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1525}; 2026-02-21T09:20:54.3676251Z bar.sync 0; 2026-02-21T09:20:54.3676397Z // begin inline asm 2026-02-21T09:20:54.3676766Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r607, %r871}, [%r267]; 2026-02-21T09:20:54.3677054Z // end inline asm 2026-02-21T09:20:54.3677199Z bar.sync 0; 2026-02-21T09:20:54.3677414Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1524}; 2026-02-21T09:20:54.3677686Z bar.sync 0; 2026-02-21T09:20:54.3677920Z // begin inline asm 2026-02-21T09:20:54.3678167Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r606, %r870}, [%r267]; 2026-02-21T09:20:54.3678453Z // end inline asm 2026-02-21T09:20:54.3678609Z bar.sync 0; 2026-02-21T09:20:54.3678820Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1526}; 2026-02-21T09:20:54.3679171Z bar.sync 0; 2026-02-21T09:20:54.3679317Z // begin inline asm 2026-02-21T09:20:54.3679560Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r608, %r872}, [%r267]; 2026-02-21T09:20:54.3679845Z // end inline asm 2026-02-21T09:20:54.3679994Z bar.sync 0; 2026-02-21T09:20:54.3680215Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1527}; 2026-02-21T09:20:54.3680485Z bar.sync 0; 2026-02-21T09:20:54.3680634Z // begin inline asm 2026-02-21T09:20:54.3680953Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r609, %r873}, [%r267]; 2026-02-21T09:20:54.3681252Z // end inline asm 2026-02-21T09:20:54.3681397Z bar.sync 0; 2026-02-21T09:20:54.3681614Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1529}; 2026-02-21T09:20:54.3681881Z bar.sync 0; 2026-02-21T09:20:54.3682030Z // begin inline asm 2026-02-21T09:20:54.3682269Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r611, %r875}, [%r267]; 2026-02-21T09:20:54.3682545Z // end inline asm 2026-02-21T09:20:54.3682712Z bar.sync 0; 2026-02-21T09:20:54.3682923Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1528}; 2026-02-21T09:20:54.3683190Z bar.sync 0; 
2026-02-21T09:20:54.3683333Z // begin inline asm 2026-02-21T09:20:54.3683571Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r610, %r874}, [%r267]; 2026-02-21T09:20:54.3683921Z // end inline asm 2026-02-21T09:20:54.3684079Z bar.sync 0; 2026-02-21T09:20:54.3684299Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1530}; 2026-02-21T09:20:54.3684560Z bar.sync 0; 2026-02-21T09:20:54.3684705Z // begin inline asm 2026-02-21T09:20:54.3684938Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r612, %r876}, [%r267]; 2026-02-21T09:20:54.3685221Z // end inline asm 2026-02-21T09:20:54.3685369Z bar.sync 0; 2026-02-21T09:20:54.3685585Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1531}; 2026-02-21T09:20:54.3685842Z bar.sync 0; 2026-02-21T09:20:54.3685990Z // begin inline asm 2026-02-21T09:20:54.3686220Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r613, %r877}, [%r267]; 2026-02-21T09:20:54.3686627Z // end inline asm 2026-02-21T09:20:54.3686783Z bar.sync 0; 2026-02-21T09:20:54.3686994Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1533}; 2026-02-21T09:20:54.3687263Z bar.sync 0; 2026-02-21T09:20:54.3687402Z // begin inline asm 2026-02-21T09:20:54.3687640Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r615, %r879}, [%r267]; 2026-02-21T09:20:54.3687914Z // end inline asm 2026-02-21T09:20:54.3688065Z bar.sync 0; 2026-02-21T09:20:54.3688277Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1532}; 2026-02-21T09:20:54.3688540Z bar.sync 0; 2026-02-21T09:20:54.3688687Z // begin inline asm 2026-02-21T09:20:54.3688917Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r614, %r878}, [%r267]; 2026-02-21T09:20:54.3689220Z // end inline asm 2026-02-21T09:20:54.3689363Z bar.sync 0; 2026-02-21T09:20:54.3689575Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1534}; 2026-02-21T09:20:54.3689833Z bar.sync 0; 2026-02-21T09:20:54.3689992Z // begin inline asm 2026-02-21T09:20:54.3690227Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r616, %r880}, [%r267]; 2026-02-21T09:20:54.3690513Z // end 
inline asm 2026-02-21T09:20:54.3690664Z bar.sync 0; 2026-02-21T09:20:54.3690870Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1535}; 2026-02-21T09:20:54.3691137Z bar.sync 0; 2026-02-21T09:20:54.3691278Z // begin inline asm 2026-02-21T09:20:54.3691512Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r617, %r881}, [%r267]; 2026-02-21T09:20:54.3691785Z // end inline asm 2026-02-21T09:20:54.3691933Z bar.sync 0; 2026-02-21T09:20:54.3692141Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1537}; 2026-02-21T09:20:54.3692406Z bar.sync 0; 2026-02-21T09:20:54.3692643Z // begin inline asm 2026-02-21T09:20:54.3692873Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r619, %r883}, [%r267]; 2026-02-21T09:20:54.3693157Z // end inline asm 2026-02-21T09:20:54.3693301Z bar.sync 0; 2026-02-21T09:20:54.3693513Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1536}; 2026-02-21T09:20:54.3693836Z bar.sync 0; 2026-02-21T09:20:54.3693981Z // begin inline asm 2026-02-21T09:20:54.3694209Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r618, %r882}, [%r267]; 2026-02-21T09:20:54.3694488Z // end inline asm 2026-02-21T09:20:54.3694638Z bar.sync 0; 2026-02-21T09:20:54.3694865Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1538}; 2026-02-21T09:20:54.3695133Z bar.sync 0; 2026-02-21T09:20:54.3695272Z // begin inline asm 2026-02-21T09:20:54.3695578Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r620, %r884}, [%r267]; 2026-02-21T09:20:54.3695855Z // end inline asm 2026-02-21T09:20:54.3696002Z bar.sync 0; 2026-02-21T09:20:54.3696208Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1539}; 2026-02-21T09:20:54.3696600Z bar.sync 0; 2026-02-21T09:20:54.3696751Z // begin inline asm 2026-02-21T09:20:54.3696988Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r621, %r885}, [%r267]; 2026-02-21T09:20:54.3697264Z // end inline asm 2026-02-21T09:20:54.3697410Z bar.sync 0; 2026-02-21T09:20:54.3697621Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1541}; 2026-02-21T09:20:54.3697883Z bar.sync 0; 
2026-02-21T09:20:54.3698023Z // begin inline asm 2026-02-21T09:20:54.3698251Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r623, %r887}, [%r267]; 2026-02-21T09:20:54.3698613Z // end inline asm 2026-02-21T09:20:54.3698765Z bar.sync 0; 2026-02-21T09:20:54.3698978Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1540}; 2026-02-21T09:20:54.3699241Z bar.sync 0; 2026-02-21T09:20:54.3699381Z // begin inline asm 2026-02-21T09:20:54.3699615Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r622, %r886}, [%r267]; 2026-02-21T09:20:54.3699887Z // end inline asm 2026-02-21T09:20:54.3700036Z bar.sync 0; 2026-02-21T09:20:54.3700240Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1542}; 2026-02-21T09:20:54.3700503Z bar.sync 0; 2026-02-21T09:20:54.3700642Z // begin inline asm 2026-02-21T09:20:54.3700875Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r624, %r888}, [%r267]; 2026-02-21T09:20:54.3701157Z // end inline asm 2026-02-21T09:20:54.3701299Z bar.sync 0; 2026-02-21T09:20:54.3701510Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1543}; 2026-02-21T09:20:54.3701771Z bar.sync 0; 2026-02-21T09:20:54.3701910Z // begin inline asm 2026-02-21T09:20:54.3702144Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r625, %r889}, [%r267]; 2026-02-21T09:20:54.3702414Z // end inline asm 2026-02-21T09:20:54.3702562Z bar.sync 0; 2026-02-21T09:20:54.3702768Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1545}; 2026-02-21T09:20:54.3703028Z bar.sync 0; 2026-02-21T09:20:54.3703165Z // begin inline asm 2026-02-21T09:20:54.3703408Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r627, %r891}, [%r267]; 2026-02-21T09:20:54.3703684Z // end inline asm 2026-02-21T09:20:54.3703827Z bar.sync 0; 2026-02-21T09:20:54.3704034Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1544}; 2026-02-21T09:20:54.3704286Z bar.sync 0; 2026-02-21T09:20:54.3704426Z // begin inline asm 2026-02-21T09:20:54.3704651Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r626, %r890}, [%r267]; 2026-02-21T09:20:54.3704928Z // end 
inline asm 2026-02-21T09:20:54.3705065Z bar.sync 0; 2026-02-21T09:20:54.3705274Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1546}; 2026-02-21T09:20:54.3705531Z bar.sync 0; 2026-02-21T09:20:54.3705686Z // begin inline asm 2026-02-21T09:20:54.3705921Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r628, %r892}, [%r267]; 2026-02-21T09:20:54.3706189Z // end inline asm 2026-02-21T09:20:54.3706335Z bar.sync 0; 2026-02-21T09:20:54.3706660Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1547}; 2026-02-21T09:20:54.3706924Z bar.sync 0; 2026-02-21T09:20:54.3707157Z // begin inline asm 2026-02-21T09:20:54.3707388Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r629, %r893}, [%r267]; 2026-02-21T09:20:54.3707657Z // end inline asm 2026-02-21T09:20:54.3707800Z bar.sync 0; 2026-02-21T09:20:54.3708007Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1549}; 2026-02-21T09:20:54.3708436Z bar.sync 0; 2026-02-21T09:20:54.3708592Z // begin inline asm 2026-02-21T09:20:54.3708820Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r631, %r895}, [%r267]; 2026-02-21T09:20:54.3709094Z // end inline asm 2026-02-21T09:20:54.3709235Z bar.sync 0; 2026-02-21T09:20:54.3709444Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1548}; 2026-02-21T09:20:54.3709700Z bar.sync 0; 2026-02-21T09:20:54.3709845Z // begin inline asm 2026-02-21T09:20:54.3710168Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r630, %r894}, [%r267]; 2026-02-21T09:20:54.3710454Z // end inline asm 2026-02-21T09:20:54.3710601Z bar.sync 0; 2026-02-21T09:20:54.3710804Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1550}; 2026-02-21T09:20:54.3711068Z bar.sync 0; 2026-02-21T09:20:54.3711206Z // begin inline asm 2026-02-21T09:20:54.3711437Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r632, %r896}, [%r267]; 2026-02-21T09:20:54.3711707Z // end inline asm 2026-02-21T09:20:54.3711854Z bar.sync 0; 2026-02-21T09:20:54.3712063Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1551}; 2026-02-21T09:20:54.3712329Z bar.sync 0; 
2026-02-21T09:20:54.3712469Z // begin inline asm 2026-02-21T09:20:54.3712724Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r633, %r897}, [%r267]; 2026-02-21T09:20:54.3713082Z // end inline asm 2026-02-21T09:20:54.3713232Z bar.sync 0; 2026-02-21T09:20:54.3713455Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1553}; 2026-02-21T09:20:54.3713722Z bar.sync 0; 2026-02-21T09:20:54.3713863Z // begin inline asm 2026-02-21T09:20:54.3714094Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r635, %r899}, [%r267]; 2026-02-21T09:20:54.3714371Z // end inline asm 2026-02-21T09:20:54.3714516Z bar.sync 0; 2026-02-21T09:20:54.3714728Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1552}; 2026-02-21T09:20:54.3714989Z bar.sync 0; 2026-02-21T09:20:54.3715125Z // begin inline asm 2026-02-21T09:20:54.3715356Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r634, %r898}, [%r267]; 2026-02-21T09:20:54.3715630Z // end inline asm 2026-02-21T09:20:54.3715788Z bar.sync 0; 2026-02-21T09:20:54.3715996Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1554}; 2026-02-21T09:20:54.3716257Z bar.sync 0; 2026-02-21T09:20:54.3716394Z // begin inline asm 2026-02-21T09:20:54.3716757Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r636, %r900}, [%r267]; 2026-02-21T09:20:54.3717030Z // end inline asm 2026-02-21T09:20:54.3717175Z bar.sync 0; 2026-02-21T09:20:54.3717382Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1555}; 2026-02-21T09:20:54.3717639Z bar.sync 0; 2026-02-21T09:20:54.3717779Z // begin inline asm 2026-02-21T09:20:54.3718010Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r637, %r901}, [%r267]; 2026-02-21T09:20:54.3718298Z // end inline asm 2026-02-21T09:20:54.3718442Z bar.sync 0; 2026-02-21T09:20:54.3718651Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1557}; 2026-02-21T09:20:54.3718910Z bar.sync 0; 2026-02-21T09:20:54.3719049Z // begin inline asm 2026-02-21T09:20:54.3719288Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r639, %r903}, [%r267]; 2026-02-21T09:20:54.3719561Z // end 
inline asm 2026-02-21T09:20:54.3719706Z bar.sync 0; 2026-02-21T09:20:54.3719911Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1556}; 2026-02-21T09:20:54.3720173Z bar.sync 0; 2026-02-21T09:20:54.3720319Z // begin inline asm 2026-02-21T09:20:54.3720557Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r638, %r902}, [%r267]; 2026-02-21T09:20:54.3720837Z // end inline asm 2026-02-21T09:20:54.3720986Z bar.sync 0; 2026-02-21T09:20:54.3721190Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1558}; 2026-02-21T09:20:54.3721452Z bar.sync 0; 2026-02-21T09:20:54.3721595Z // begin inline asm 2026-02-21T09:20:54.3721918Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r640, %r904}, [%r267]; 2026-02-21T09:20:54.3722193Z // end inline asm 2026-02-21T09:20:54.3722332Z bar.sync 0; 2026-02-21T09:20:54.3722543Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1559}; 2026-02-21T09:20:54.3722878Z bar.sync 0; 2026-02-21T09:20:54.3723021Z // begin inline asm 2026-02-21T09:20:54.3723250Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r641, %r905}, [%r267]; 2026-02-21T09:20:54.3723528Z // end inline asm 2026-02-21T09:20:54.3723672Z bar.sync 0; 2026-02-21T09:20:54.3723881Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1561}; 2026-02-21T09:20:54.3724142Z bar.sync 0; 2026-02-21T09:20:54.3724276Z // begin inline asm 2026-02-21T09:20:54.3724587Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r643, %r907}, [%r267]; 2026-02-21T09:20:54.3724868Z // end inline asm 2026-02-21T09:20:54.3725010Z bar.sync 0; 2026-02-21T09:20:54.3725217Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1560}; 2026-02-21T09:20:54.3725480Z bar.sync 0; 2026-02-21T09:20:54.3725618Z // begin inline asm 2026-02-21T09:20:54.3725861Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r642, %r906}, [%r267]; 2026-02-21T09:20:54.3726143Z // end inline asm 2026-02-21T09:20:54.3726289Z bar.sync 0; 2026-02-21T09:20:54.3726621Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1562}; 2026-02-21T09:20:54.3726889Z bar.sync 0; 
2026-02-21T09:20:54.3727034Z // begin inline asm 2026-02-21T09:20:54.3727269Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r644, %r908}, [%r267]; 2026-02-21T09:20:54.3727633Z // end inline asm 2026-02-21T09:20:54.3727786Z bar.sync 0; 2026-02-21T09:20:54.3727997Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1563}; 2026-02-21T09:20:54.3728261Z bar.sync 0; 2026-02-21T09:20:54.3728397Z // begin inline asm 2026-02-21T09:20:54.3728630Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r645, %r909}, [%r267]; 2026-02-21T09:20:54.3728903Z // end inline asm 2026-02-21T09:20:54.3729048Z bar.sync 0; 2026-02-21T09:20:54.3729253Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1565}; 2026-02-21T09:20:54.3729515Z bar.sync 0; 2026-02-21T09:20:54.3729652Z // begin inline asm 2026-02-21T09:20:54.3729881Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r647, %r911}, [%r267]; 2026-02-21T09:20:54.3730169Z // end inline asm 2026-02-21T09:20:54.3730314Z bar.sync 0; 2026-02-21T09:20:54.3730531Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1564}; 2026-02-21T09:20:54.3730789Z bar.sync 0; 2026-02-21T09:20:54.3730930Z // begin inline asm 2026-02-21T09:20:54.3731160Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r646, %r910}, [%r267]; 2026-02-21T09:20:54.3731435Z // end inline asm 2026-02-21T09:20:54.3731576Z bar.sync 0; 2026-02-21T09:20:54.3731790Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1566}; 2026-02-21T09:20:54.3732049Z bar.sync 0; 2026-02-21T09:20:54.3732190Z // begin inline asm 2026-02-21T09:20:54.3732420Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r648, %r912}, [%r267]; 2026-02-21T09:20:54.3732692Z // end inline asm 2026-02-21T09:20:54.3732837Z bar.sync 0; 2026-02-21T09:20:54.3733047Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1567}; 2026-02-21T09:20:54.3733309Z bar.sync 0; 2026-02-21T09:20:54.3733447Z // begin inline asm 2026-02-21T09:20:54.3733681Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r649, %r913}, [%r267]; 2026-02-21T09:20:54.3733953Z // end 
inline asm 2026-02-21T09:20:54.3734098Z bar.sync 0; 2026-02-21T09:20:54.3734304Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1569}; 2026-02-21T09:20:54.3734564Z bar.sync 0; 2026-02-21T09:20:54.3734704Z // begin inline asm 2026-02-21T09:20:54.3734945Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r651, %r915}, [%r267]; 2026-02-21T09:20:54.3735225Z // end inline asm 2026-02-21T09:20:54.3735367Z bar.sync 0; 2026-02-21T09:20:54.3735574Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1568}; 2026-02-21T09:20:54.3735831Z bar.sync 0; 2026-02-21T09:20:54.3735970Z // begin inline asm 2026-02-21T09:20:54.3736293Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r650, %r914}, [%r267]; 2026-02-21T09:20:54.3736686Z // end inline asm 2026-02-21T09:20:54.3736830Z bar.sync 0; 2026-02-21T09:20:54.3737037Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1248], {%r1570}; 2026-02-21T09:20:54.3737375Z bar.sync 0; 2026-02-21T09:20:54.3737524Z // begin inline asm 2026-02-21T09:20:54.3737757Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r652, %r916}, [%r267]; 2026-02-21T09:20:54.3738026Z // end inline asm 2026-02-21T09:20:54.3738170Z $L__tmp1: 2026-02-21T09:20:54.3738537Z .loc 2 291 36 // standard.py:291:36 @[ ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:78:36 ] 2026-02-21T09:20:54.3738969Z // begin inline asm 2026-02-21T09:20:54.3739155Z fence.proxy.async.shared::cta; 2026-02-21T09:20:54.3739414Z // end inline asm 2026-02-21T09:20:54.3739588Z shfl.sync.idx.b32 %r1792, %r3, 0, 31, -1; 2026-02-21T09:20:54.3739813Z wgmma.fence.sync.aligned; 2026-02-21T09:20:54.3740005Z mov.pred %p1, -1; 2026-02-21T09:20:54.3740182Z // begin inline asm 2026-02-21T09:20:54.3741360Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r589,%r590,%r591,%r592,%r593,%r594,%r595,%r596,%r597,%r598,%r599,%r600,%r601,%r602,%r603,%r604,%r605,%r606,%r607,%r608,%r609,%r610,%r611,%r612,%r613,%r614,%r615,%r616,%r617,%r618,%r619,%r620,%r621,%r622,%r623,%r624,%r625,%r626,%r627,%r628,%r629,%r630,%r631,%r632,%r633,%r634,%r635,%r636,%r637,%r638,%r639,%r640,%r641,%r642,%r643,%r644,%r645,%r646,%r647,%r648,%r649,%r650,%r651,%r652}, {%r849,%r850,%r851,%r852}, %rd28, %p1, 1, 1; 2026-02-21T09:20:54.3742653Z // end inline asm 2026-02-21T09:20:54.3742811Z // begin inline asm 2026-02-21T09:20:54.3743964Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r589,%r590,%r591,%r592,%r593,%r594,%r595,%r596,%r597,%r598,%r599,%r600,%r601,%r602,%r603,%r604,%r605,%r606,%r607,%r608,%r609,%r610,%r611,%r612,%r613,%r614,%r615,%r616,%r617,%r618,%r619,%r620,%r621,%r622,%r623,%r624,%r625,%r626,%r627,%r628,%r629,%r630,%r631,%r632,%r633,%r634,%r635,%r636,%r637,%r638,%r639,%r640,%r641,%r642,%r643,%r644,%r645,%r646,%r647,%r648,%r649,%r650,%r651,%r652}, {%r981,%r982,%r983,%r984}, %rd29, %p1, 1, 1; 2026-02-21T09:20:54.3745158Z // end inline asm 2026-02-21T09:20:54.3745305Z // begin inline asm 2026-02-21T09:20:54.3746579Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r853,%r854,%r855,%r856,%r857,%r858,%r859,%r860,%r861,%r862,%r863,%r864,%r865,%r866,%r867,%r868,%r869,%r870,%r871,%r872,%r873,%r874,%r875,%r876,%r877,%r878,%r879,%r880,%r881,%r882,%r883,%r884,%r885,%r886,%r887,%r888,%r889,%r890,%r891,%r892,%r893,%r894,%r895,%r896,%r897,%r898,%r899,%r900,%r901,%r902,%r903,%r904,%r905,%r906,%r907,%r908,%r909,%r910,%r911,%r912,%r913,%r914,%r915,%r916}, {%r849,%r850,%r851,%r852}, %rd30, %p1, 1, 1; 2026-02-21T09:20:54.3747813Z // end inline asm 2026-02-21T09:20:54.3747966Z // begin inline asm 2026-02-21T09:20:54.3749207Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r853,%r854,%r855,%r856,%r857,%r858,%r859,%r860,%r861,%r862,%r863,%r864,%r865,%r866,%r867,%r868,%r869,%r870,%r871,%r872,%r873,%r874,%r875,%r876,%r877,%r878,%r879,%r880,%r881,%r882,%r883,%r884,%r885,%r886,%r887,%r888,%r889,%r890,%r891,%r892,%r893,%r894,%r895,%r896,%r897,%r898,%r899,%r900,%r901,%r902,%r903,%r904,%r905,%r906,%r907,%r908,%r909,%r910,%r911,%r912,%r913,%r914,%r915,%r916}, {%r981,%r982,%r983,%r984}, %rd31, %p1, 1, 1; 2026-02-21T09:20:54.3750423Z // end inline asm 2026-02-21T09:20:54.3750593Z wgmma.commit_group.sync.aligned; 2026-02-21T09:20:54.3750793Z mov.b32 %r1704, 0; 2026-02-21T09:20:54.3750948Z mov.b32 %r1114, %r1704; 2026-02-21T09:20:54.3751124Z mov.b32 %r1115, %r1704; 2026-02-21T09:20:54.3751289Z mov.b32 %r1113, %r1244; 2026-02-21T09:20:54.3751449Z // begin inline asm 2026-02-21T09:20:54.3753259Z // wait for regs: %r589,%r590,%r591,%r592,%r593,%r594,%r595,%r596,%r597,%r598,%r599,%r600,%r601,%r602,%r603,%r604,%r605,%r606,%r607,%r608,%r609,%r610,%r611,%r612,%r613,%r614,%r615,%r616,%r617,%r618,%r619,%r620,%r621,%r622,%r623,%r624,%r625,%r626,%r627,%r628,%r629,%r630,%r631,%r632,%r633,%r634,%r635,%r636,%r637,%r638,%r639,%r640,%r641,%r642,%r643,%r644,%r645,%r646,%r647,%r648,%r649,%r650,%r651,%r652,%r853,%r854,%r855,%r856,%r857,%r858,%r859,%r860,%r861,%r862,%r863,%r864,%r865,%r866,%r867,%r868,%r869,%r870,%r871,%r872,%r873,%r874,%r875,%r876,%r877,%r878,%r879,%r880,%r881,%r882,%r883,%r884,%r885,%r886,%r887,%r888,%r889,%r890,%r891,%r892,%r893,%r894,%r895,%r896,%r897,%r898,%r899,%r900,%r901,%r902,%r903,%r904,%r905,%r906,%r907,%r908,%r909,%r910,%r911,%r912,%r913,%r914,%r915,%r916,%r1113,%r1114,%r1115 2026-02-21T09:20:54.3755266Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:20:54.3755458Z // end inline asm 2026-02-21T09:20:54.3755604Z $L__tmp2: 2026-02-21T09:20:54.3755969Z .loc 1 42 76 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:76 2026-02-21T09:20:54.3756349Z add.s32 %r1793, %r1780, 40960; 2026-02-21T09:20:54.3756822Z .loc 1 46 28 // 
ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:46:28 2026-02-21T09:20:54.3757184Z add.s32 %r1794, %r1793, %r10; 2026-02-21T09:20:54.3757396Z ld.shared.b16 %rs32, [%r1794]; 2026-02-21T09:20:54.3757593Z ld.shared.b16 %rs33, [%r1794+256]; 2026-02-21T09:20:54.3757811Z ld.shared.b16 %rs34, [%r1794+16]; 2026-02-21T09:20:54.3758014Z ld.shared.b16 %rs35, [%r1794+272]; 2026-02-21T09:20:54.3758211Z add.s32 %r1795, %r1793, %r11; 2026-02-21T09:20:54.3758394Z ld.shared.b16 %rs36, [%r1795]; 2026-02-21T09:20:54.3758587Z ld.shared.b16 %rs37, [%r1795+256]; 2026-02-21T09:20:54.3758787Z ld.shared.b16 %rs38, [%r1795+16]; 2026-02-21T09:20:54.3759064Z ld.shared.b16 %rs39, [%r1795+272]; 2026-02-21T09:20:54.3759275Z cvt.f32.bf16 %r1503, %rs32; 2026-02-21T09:20:54.3759457Z cvt.f32.bf16 %r1504, %rs33; 2026-02-21T09:20:54.3759640Z cvt.f32.bf16 %r1505, %rs36; 2026-02-21T09:20:54.3759814Z cvt.f32.bf16 %r1506, %rs37; 2026-02-21T09:20:54.3759993Z cvt.f32.bf16 %r1635, %rs34; 2026-02-21T09:20:54.3760164Z cvt.f32.bf16 %r1636, %rs35; 2026-02-21T09:20:54.3760344Z cvt.f32.bf16 %r1637, %rs38; 2026-02-21T09:20:54.3760523Z cvt.f32.bf16 %r1638, %rs39; 2026-02-21T09:20:54.3760847Z .loc 1 48 30 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:48:30 2026-02-21T09:20:54.3761208Z add.s32 %r1796, %r1967, 65536; 2026-02-21T09:20:54.3761405Z cvt.s64.s32 %rd40, %r1796; 2026-02-21T09:20:54.3761590Z add.s64 %rd33, %rd8, %rd40; 2026-02-21T09:20:54.3761903Z .loc 1 48 83 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:48:83 2026-02-21T09:20:54.3762256Z // begin inline asm 2026-02-21T09:20:54.3762415Z mov.u64 %rd32, 0x0; 2026-02-21T09:20:54.3762640Z createpolicy.fractional.L2::evict_first.b64 %rd32, 1.0; 2026-02-21T09:20:54.3762901Z // end inline asm 2026-02-21T09:20:54.3763056Z // begin inline asm 2026-02-21T09:20:54.3763214Z mov.u16 %rs2, 0x0; 2026-02-21T09:20:54.3763463Z ld.global.L1::evict_first.L2::cache_hint.b16 { %rs2 }, [ %rd33 + 0 ], %rd32; 2026-02-21T09:20:54.3763768Z // end 
inline asm 2026-02-21T09:20:54.3763925Z shr.u16 %rs40, %rs2, 8; 2026-02-21T09:20:54.3764240Z .loc 1 56 24 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:56:24 2026-02-21T09:20:54.3764596Z bar.sync 0; 2026-02-21T09:20:54.3764755Z st.shared.b8 [%r12], %rs2; 2026-02-21T09:20:54.3764945Z st.shared.b8 [%r13+1024], %rs40; 2026-02-21T09:20:54.3765133Z bar.sync 0; 2026-02-21T09:20:54.3765291Z ld.shared.b32 %r1797, [%r14]; 2026-02-21T09:20:54.3765482Z prmt.b32 %r1798, %r1797, 0, 0x7770U; 2026-02-21T09:20:54.3765686Z cvt.u16.u32 %rs41, %r1798; 2026-02-21T09:20:54.3765866Z prmt.b32 %r1799, %r1797, 0, 0x7771U; 2026-02-21T09:20:54.3766064Z cvt.u16.u32 %rs42, %r1799; 2026-02-21T09:20:54.3766237Z prmt.b32 %r1800, %r1797, 0, 0x7772U; 2026-02-21T09:20:54.3766436Z cvt.u16.u32 %rs43, %r1800; 2026-02-21T09:20:54.3766739Z prmt.b32 %r1801, %r1797, 0, 0x7773U; 2026-02-21T09:20:54.3766933Z cvt.u16.u32 %rs44, %r1801; 2026-02-21T09:20:54.3767245Z .loc 1 51 24 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:51:24 2026-02-21T09:20:54.3767701Z shl.b16 %rs45, %rs41, 4; 2026-02-21T09:20:54.3767879Z shl.b16 %rs46, %rs42, 4; 2026-02-21T09:20:54.3768046Z shl.b16 %rs47, %rs43, 4; 2026-02-21T09:20:54.3768218Z shl.b16 %rs48, %rs44, 4; 2026-02-21T09:20:54.3768623Z .loc 1 66 54 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:66:54 2026-02-21T09:20:54.3768988Z selp.b16 %rs49, %rs45, %rs41, %p7; 2026-02-21T09:20:54.3769194Z cvt.s16.s8 %rs50, %rs49; 2026-02-21T09:20:54.3769363Z shr.s16 %rs51, %rs50, 4; 2026-02-21T09:20:54.3769558Z selp.b16 %rs52, %rs46, %rs42, %p7; 2026-02-21T09:20:54.3769760Z cvt.s16.s8 %rs53, %rs52; 2026-02-21T09:20:54.3769935Z shr.s16 %rs54, %rs53, 4; 2026-02-21T09:20:54.3770185Z selp.b16 %rs55, %rs47, %rs43, %p7; 2026-02-21T09:20:54.3770396Z cvt.s16.s8 %rs56, %rs55; 2026-02-21T09:20:54.3770562Z shr.s16 %rs57, %rs56, 4; 2026-02-21T09:20:54.3770739Z selp.b16 %rs58, %rs48, %rs44, %p7; 2026-02-21T09:20:54.3770930Z cvt.s16.s8 %rs59, %rs58; 
2026-02-21T09:20:54.3771100Z shr.s16 %rs60, %rs59, 4; 2026-02-21T09:20:54.3771407Z .loc 1 71 28 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:71:28 2026-02-21T09:20:54.3771760Z cvt.rn.f32.s16 %r1802, %rs51; 2026-02-21T09:20:54.3771948Z cvt.rn.f32.s16 %r1803, %rs54; 2026-02-21T09:20:54.3772138Z cvt.rn.f32.s16 %r1804, %rs57; 2026-02-21T09:20:54.3772325Z cvt.rn.f32.s16 %r1805, %rs60; 2026-02-21T09:20:54.3772496Z bar.sync 0; 2026-02-21T09:20:54.3772651Z st.shared.b32 [%r15], %r1802; 2026-02-21T09:20:54.3772906Z st.shared.b32 [%r16], %r1803; 2026-02-21T09:20:54.3773091Z st.shared.b32 [%r17], %r1804; 2026-02-21T09:20:54.3773269Z st.shared.b32 [%r18], %r1805; 2026-02-21T09:20:54.3773438Z $L__tmp3: 2026-02-21T09:20:54.3773800Z .loc 2 291 36 // standard.py:291:36 @[ ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:78:36 ] 2026-02-21T09:20:54.3774307Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r589, %r853}; 2026-02-21T09:20:54.3774594Z bar.sync 0; 2026-02-21T09:20:54.3774737Z // begin inline asm 2026-02-21T09:20:54.3774971Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1507}, [%r1248]; 2026-02-21T09:20:54.3775255Z // end inline asm 2026-02-21T09:20:54.3775410Z bar.sync 0; 2026-02-21T09:20:54.3775644Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r591, %r855}; 2026-02-21T09:20:54.3775919Z bar.sync 0; 2026-02-21T09:20:54.3776064Z // begin inline asm 2026-02-21T09:20:54.3776288Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1509}, [%r1248]; 2026-02-21T09:20:54.3776684Z // end inline asm 2026-02-21T09:20:54.3776830Z bar.sync 0; 2026-02-21T09:20:54.3777058Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r590, %r854}; 2026-02-21T09:20:54.3777332Z bar.sync 0; 2026-02-21T09:20:54.3777478Z // begin inline asm 2026-02-21T09:20:54.3777711Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1508}, [%r1248]; 2026-02-21T09:20:54.3777986Z // end inline asm 2026-02-21T09:20:54.3778138Z bar.sync 0; 2026-02-21T09:20:54.3778359Z 
stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r592, %r856}; 2026-02-21T09:20:54.3778639Z bar.sync 0; 2026-02-21T09:20:54.3778778Z // begin inline asm 2026-02-21T09:20:54.3779011Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1510}, [%r1248]; 2026-02-21T09:20:54.3779278Z // end inline asm 2026-02-21T09:20:54.3779426Z bar.sync 0; 2026-02-21T09:20:54.3779647Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r593, %r857}; 2026-02-21T09:20:54.3779932Z bar.sync 0; 2026-02-21T09:20:54.3780079Z // begin inline asm 2026-02-21T09:20:54.3780327Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1511}, [%r1248]; 2026-02-21T09:20:54.3780597Z // end inline asm 2026-02-21T09:20:54.3780737Z bar.sync 0; 2026-02-21T09:20:54.3780963Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r595, %r859}; 2026-02-21T09:20:54.3781237Z bar.sync 0; 2026-02-21T09:20:54.3781382Z // begin inline asm 2026-02-21T09:20:54.3781603Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1513}, [%r1248]; 2026-02-21T09:20:54.3781971Z // end inline asm 2026-02-21T09:20:54.3782118Z bar.sync 0; 2026-02-21T09:20:54.3782336Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r594, %r858}; 2026-02-21T09:20:54.3782613Z bar.sync 0; 2026-02-21T09:20:54.3782838Z // begin inline asm 2026-02-21T09:20:54.3783062Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1512}, [%r1248]; 2026-02-21T09:20:54.3783334Z // end inline asm 2026-02-21T09:20:54.3783483Z bar.sync 0; 2026-02-21T09:20:54.3783702Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r596, %r860}; 2026-02-21T09:20:54.3783983Z bar.sync 0; 2026-02-21T09:20:54.3784126Z // begin inline asm 2026-02-21T09:20:54.3784351Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1514}, [%r1248]; 2026-02-21T09:20:54.3784684Z // end inline asm 2026-02-21T09:20:54.3784831Z bar.sync 0; 2026-02-21T09:20:54.3785053Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r597, %r861}; 2026-02-21T09:20:54.3785326Z bar.sync 0; 2026-02-21T09:20:54.3785472Z // begin inline asm 2026-02-21T09:20:54.3785694Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1515}, [%r1248]; 2026-02-21T09:20:54.3785963Z // end inline asm 2026-02-21T09:20:54.3786118Z bar.sync 0; 2026-02-21T09:20:54.3786347Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r599, %r863}; 2026-02-21T09:20:54.3786732Z bar.sync 0; 2026-02-21T09:20:54.3786885Z // begin inline asm 2026-02-21T09:20:54.3787110Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1517}, [%r1248]; 2026-02-21T09:20:54.3787371Z // end inline asm 2026-02-21T09:20:54.3787518Z bar.sync 0; 2026-02-21T09:20:54.3787822Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r598, %r862}; 2026-02-21T09:20:54.3788117Z bar.sync 0; 2026-02-21T09:20:54.3788261Z // begin inline asm 2026-02-21T09:20:54.3788582Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1516}, [%r1248]; 2026-02-21T09:20:54.3788851Z // end inline asm 2026-02-21T09:20:54.3788996Z bar.sync 0; 2026-02-21T09:20:54.3789219Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r600, %r864}; 2026-02-21T09:20:54.3789492Z bar.sync 0; 2026-02-21T09:20:54.3789636Z // begin inline asm 2026-02-21T09:20:54.3789857Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1518}, [%r1248]; 2026-02-21T09:20:54.3790123Z // end inline asm 2026-02-21T09:20:54.3790271Z bar.sync 0; 2026-02-21T09:20:54.3790506Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r601, %r865}; 2026-02-21T09:20:54.3790778Z bar.sync 0; 2026-02-21T09:20:54.3790925Z // begin inline asm 2026-02-21T09:20:54.3791154Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1519}, [%r1248]; 2026-02-21T09:20:54.3791417Z // end inline asm 2026-02-21T09:20:54.3791571Z bar.sync 0; 2026-02-21T09:20:54.3791806Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r603, %r867}; 2026-02-21T09:20:54.3792087Z bar.sync 0; 2026-02-21T09:20:54.3792228Z // begin inline asm 2026-02-21T09:20:54.3792455Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1521}, [%r1248]; 2026-02-21T09:20:54.3792717Z // end inline asm 2026-02-21T09:20:54.3792868Z bar.sync 0; 2026-02-21T09:20:54.3793100Z 
stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r602, %r866}; 2026-02-21T09:20:54.3793377Z bar.sync 0; 2026-02-21T09:20:54.3793521Z // begin inline asm 2026-02-21T09:20:54.3793742Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1520}, [%r1248]; 2026-02-21T09:20:54.3794009Z // end inline asm 2026-02-21T09:20:54.3794152Z bar.sync 0; 2026-02-21T09:20:54.3794374Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r604, %r868}; 2026-02-21T09:20:54.3794645Z bar.sync 0; 2026-02-21T09:20:54.3794787Z // begin inline asm 2026-02-21T09:20:54.3795012Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1522}, [%r1248]; 2026-02-21T09:20:54.3795285Z // end inline asm 2026-02-21T09:20:54.3795449Z bar.sync 0; 2026-02-21T09:20:54.3795673Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r605, %r869}; 2026-02-21T09:20:54.3795951Z bar.sync 0; 2026-02-21T09:20:54.3796089Z // begin inline asm 2026-02-21T09:20:54.3796315Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1523}, [%r1248]; 2026-02-21T09:20:54.3796803Z // end inline asm 2026-02-21T09:20:54.3796953Z bar.sync 0; 2026-02-21T09:20:54.3797172Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r607, %r871}; 2026-02-21T09:20:54.3797449Z bar.sync 0; 2026-02-21T09:20:54.3797667Z // begin inline asm 2026-02-21T09:20:54.3797888Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1525}, [%r1248]; 2026-02-21T09:20:54.3798154Z // end inline asm 2026-02-21T09:20:54.3798295Z bar.sync 0; 2026-02-21T09:20:54.3798517Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r606, %r870}; 2026-02-21T09:20:54.3798793Z bar.sync 0; 2026-02-21T09:20:54.3798937Z // begin inline asm 2026-02-21T09:20:54.3799157Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1524}, [%r1248]; 2026-02-21T09:20:54.3799424Z // end inline asm 2026-02-21T09:20:54.3799644Z bar.sync 0; 2026-02-21T09:20:54.3799886Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r608, %r872}; 2026-02-21T09:20:54.3800164Z bar.sync 0; 2026-02-21T09:20:54.3800307Z // begin inline asm 2026-02-21T09:20:54.3800531Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1526}, [%r1248]; 2026-02-21T09:20:54.3800801Z // end inline asm 2026-02-21T09:20:54.3800961Z bar.sync 0; 2026-02-21T09:20:54.3801182Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r609, %r873}; 2026-02-21T09:20:54.3801471Z bar.sync 0; 2026-02-21T09:20:54.3801611Z // begin inline asm 2026-02-21T09:20:54.3801841Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1527}, [%r1248]; 2026-02-21T09:20:54.3801902Z // end inline asm 2026-02-21T09:20:54.3801966Z bar.sync 0; 2026-02-21T09:20:54.3802180Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r611, %r875}; 2026-02-21T09:20:54.3802243Z bar.sync 0; 2026-02-21T09:20:54.3802307Z // begin inline asm 2026-02-21T09:20:54.3802454Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1529}, [%r1248]; 2026-02-21T09:20:54.3802514Z // end inline asm 2026-02-21T09:20:54.3802576Z bar.sync 0; 2026-02-21T09:20:54.3802728Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r610, %r874}; 2026-02-21T09:20:54.3802801Z bar.sync 0; 2026-02-21T09:20:54.3802865Z // begin inline asm 2026-02-21T09:20:54.3802997Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1528}, [%r1248]; 2026-02-21T09:20:54.3803063Z // end inline asm 2026-02-21T09:20:54.3803123Z bar.sync 0; 2026-02-21T09:20:54.3803266Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r612, %r876}; 2026-02-21T09:20:54.3803329Z bar.sync 0; 2026-02-21T09:20:54.3803389Z // begin inline asm 2026-02-21T09:20:54.3803519Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1530}, [%r1248]; 2026-02-21T09:20:54.3803585Z // end inline asm 2026-02-21T09:20:54.3803641Z bar.sync 0; 2026-02-21T09:20:54.3803779Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r613, %r877}; 2026-02-21T09:20:54.3803835Z bar.sync 0; 2026-02-21T09:20:54.3803903Z // begin inline asm 2026-02-21T09:20:54.3804032Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1531}, [%r1248]; 2026-02-21T09:20:54.3804090Z // end inline asm 2026-02-21T09:20:54.3804150Z bar.sync 0; 2026-02-21T09:20:54.3804290Z 
stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r615, %r879}; 2026-02-21T09:20:54.3804347Z bar.sync 0; 2026-02-21T09:20:54.3804406Z // begin inline asm 2026-02-21T09:20:54.3804538Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1533}, [%r1248]; 2026-02-21T09:20:54.3804597Z // end inline asm 2026-02-21T09:20:54.3804653Z bar.sync 0; 2026-02-21T09:20:54.3804796Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r614, %r878}; 2026-02-21T09:20:54.3804852Z bar.sync 0; 2026-02-21T09:20:54.3804913Z // begin inline asm 2026-02-21T09:20:54.3805039Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1532}, [%r1248]; 2026-02-21T09:20:54.3805100Z // end inline asm 2026-02-21T09:20:54.3805155Z bar.sync 0; 2026-02-21T09:20:54.3805294Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r616, %r880}; 2026-02-21T09:20:54.3805355Z bar.sync 0; 2026-02-21T09:20:54.3805415Z // begin inline asm 2026-02-21T09:20:54.3805555Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1534}, [%r1248]; 2026-02-21T09:20:54.3805698Z // end inline asm 2026-02-21T09:20:54.3805755Z bar.sync 0; 2026-02-21T09:20:54.3805895Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r617, %r881}; 2026-02-21T09:20:54.3805952Z bar.sync 0; 2026-02-21T09:20:54.3806018Z // begin inline asm 2026-02-21T09:20:54.3806195Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1535}, [%r1248]; 2026-02-21T09:20:54.3806253Z // end inline asm 2026-02-21T09:20:54.3806314Z bar.sync 0; 2026-02-21T09:20:54.3806574Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r619, %r883}; 2026-02-21T09:20:54.3806638Z bar.sync 0; 2026-02-21T09:20:54.3806700Z // begin inline asm 2026-02-21T09:20:54.3806836Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1537}, [%r1248]; 2026-02-21T09:20:54.3806894Z // end inline asm 2026-02-21T09:20:54.3807034Z bar.sync 0; 2026-02-21T09:20:54.3807193Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r618, %r882}; 2026-02-21T09:20:54.3807251Z bar.sync 0; 2026-02-21T09:20:54.3807312Z // begin inline asm 2026-02-21T09:20:54.3807443Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1536}, [%r1248]; 2026-02-21T09:20:54.3807508Z // end inline asm 2026-02-21T09:20:54.3807566Z bar.sync 0; 2026-02-21T09:20:54.3807708Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r620, %r884}; 2026-02-21T09:20:54.3807773Z bar.sync 0; 2026-02-21T09:20:54.3807834Z // begin inline asm 2026-02-21T09:20:54.3807974Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1538}, [%r1248]; 2026-02-21T09:20:54.3808040Z // end inline asm 2026-02-21T09:20:54.3808097Z bar.sync 0; 2026-02-21T09:20:54.3808304Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r621, %r885}; 2026-02-21T09:20:54.3808370Z bar.sync 0; 2026-02-21T09:20:54.3808437Z // begin inline asm 2026-02-21T09:20:54.3808572Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1539}, [%r1248]; 2026-02-21T09:20:54.3808631Z // end inline asm 2026-02-21T09:20:54.3808693Z bar.sync 0; 2026-02-21T09:20:54.3808832Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r623, %r887}; 2026-02-21T09:20:54.3808890Z bar.sync 0; 2026-02-21T09:20:54.3808953Z // begin inline asm 2026-02-21T09:20:54.3809087Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1541}, [%r1248]; 2026-02-21T09:20:54.3809145Z // end inline asm 2026-02-21T09:20:54.3809201Z bar.sync 0; 2026-02-21T09:20:54.3809345Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r622, %r886}; 2026-02-21T09:20:54.3809399Z bar.sync 0; 2026-02-21T09:20:54.3809458Z // begin inline asm 2026-02-21T09:20:54.3809584Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1540}, [%r1248]; 2026-02-21T09:20:54.3809644Z // end inline asm 2026-02-21T09:20:54.3809701Z bar.sync 0; 2026-02-21T09:20:54.3809850Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r624, %r888}; 2026-02-21T09:20:54.3809913Z bar.sync 0; 2026-02-21T09:20:54.3809975Z // begin inline asm 2026-02-21T09:20:54.3810106Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1542}, [%r1248]; 2026-02-21T09:20:54.3810172Z // end inline asm 2026-02-21T09:20:54.3810228Z bar.sync 0; 2026-02-21T09:20:54.3810371Z 
stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r625, %r889}; 2026-02-21T09:20:54.3810438Z bar.sync 0; 2026-02-21T09:20:54.3810507Z // begin inline asm 2026-02-21T09:20:54.3810635Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1543}, [%r1248]; 2026-02-21T09:20:54.3810696Z // end inline asm 2026-02-21T09:20:54.3810760Z bar.sync 0; 2026-02-21T09:20:54.3810898Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r627, %r891}; 2026-02-21T09:20:54.3810956Z bar.sync 0; 2026-02-21T09:20:54.3811017Z // begin inline asm 2026-02-21T09:20:54.3811154Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1545}, [%r1248]; 2026-02-21T09:20:54.3811225Z // end inline asm 2026-02-21T09:20:54.3811286Z bar.sync 0; 2026-02-21T09:20:54.3811434Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r626, %r890}; 2026-02-21T09:20:54.3811492Z bar.sync 0; 2026-02-21T09:20:54.3811554Z // begin inline asm 2026-02-21T09:20:54.3811686Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1544}, [%r1248]; 2026-02-21T09:20:54.3811859Z // end inline asm 2026-02-21T09:20:54.3811917Z bar.sync 0; 2026-02-21T09:20:54.3812058Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r628, %r892}; 2026-02-21T09:20:54.3812122Z bar.sync 0; 2026-02-21T09:20:54.3812183Z // begin inline asm 2026-02-21T09:20:54.3812375Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1546}, [%r1248]; 2026-02-21T09:20:54.3812435Z // end inline asm 2026-02-21T09:20:54.3812491Z bar.sync 0; 2026-02-21T09:20:54.3812630Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r629, %r893}; 2026-02-21T09:20:54.3812685Z bar.sync 0; 2026-02-21T09:20:54.3812751Z // begin inline asm 2026-02-21T09:20:54.3812880Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1547}, [%r1248]; 2026-02-21T09:20:54.3812937Z // end inline asm 2026-02-21T09:20:54.3812999Z bar.sync 0; 2026-02-21T09:20:54.3813189Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r631, %r895}; 2026-02-21T09:20:54.3813249Z bar.sync 0; 2026-02-21T09:20:54.3813308Z // begin inline asm 2026-02-21T09:20:54.3813455Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1549}, [%r1248]; 2026-02-21T09:20:54.3813512Z // end inline asm 2026-02-21T09:20:54.3813568Z bar.sync 0; 2026-02-21T09:20:54.3813712Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r630, %r894}; 2026-02-21T09:20:54.3813770Z bar.sync 0; 2026-02-21T09:20:54.3813829Z // begin inline asm 2026-02-21T09:20:54.3813956Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1548}, [%r1248]; 2026-02-21T09:20:54.3814018Z // end inline asm 2026-02-21T09:20:54.3814074Z bar.sync 0; 2026-02-21T09:20:54.3814260Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r632, %r896}; 2026-02-21T09:20:54.3814327Z bar.sync 0; 2026-02-21T09:20:54.3814387Z // begin inline asm 2026-02-21T09:20:54.3814515Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1550}, [%r1248]; 2026-02-21T09:20:54.3814581Z // end inline asm 2026-02-21T09:20:54.3814637Z bar.sync 0; 2026-02-21T09:20:54.3814777Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r633, %r897}; 2026-02-21T09:20:54.3814847Z bar.sync 0; 2026-02-21T09:20:54.3814914Z // begin inline asm 2026-02-21T09:20:54.3815043Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1551}, [%r1248]; 2026-02-21T09:20:54.3815100Z // end inline asm 2026-02-21T09:20:54.3815162Z bar.sync 0; 2026-02-21T09:20:54.3815303Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r635, %r899}; 2026-02-21T09:20:54.3815359Z bar.sync 0; 2026-02-21T09:20:54.3815419Z // begin inline asm 2026-02-21T09:20:54.3815553Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1553}, [%r1248]; 2026-02-21T09:20:54.3815611Z // end inline asm 2026-02-21T09:20:54.3815668Z bar.sync 0; 2026-02-21T09:20:54.3815813Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r634, %r898}; 2026-02-21T09:20:54.3815870Z bar.sync 0; 2026-02-21T09:20:54.3815930Z // begin inline asm 2026-02-21T09:20:54.3816060Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1552}, [%r1248]; 2026-02-21T09:20:54.3816126Z // end inline asm 2026-02-21T09:20:54.3816183Z bar.sync 0; 2026-02-21T09:20:54.3816325Z 
stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r636, %r900}; 2026-02-21T09:20:54.3816388Z bar.sync 0; 2026-02-21T09:20:54.3816572Z // begin inline asm 2026-02-21T09:20:54.3816711Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1554}, [%r1248]; 2026-02-21T09:20:54.3816790Z // end inline asm 2026-02-21T09:20:54.3816851Z bar.sync 0; 2026-02-21T09:20:54.3816997Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r637, %r901}; 2026-02-21T09:20:54.3817054Z bar.sync 0; 2026-02-21T09:20:54.3817123Z // begin inline asm 2026-02-21T09:20:54.3817255Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1555}, [%r1248]; 2026-02-21T09:20:54.3817314Z // end inline asm 2026-02-21T09:20:54.3817375Z bar.sync 0; 2026-02-21T09:20:54.3817516Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r639, %r903}; 2026-02-21T09:20:54.3817575Z bar.sync 0; 2026-02-21T09:20:54.3817638Z // begin inline asm 2026-02-21T09:20:54.3817776Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1557}, [%r1248]; 2026-02-21T09:20:54.3817927Z // end inline asm 2026-02-21T09:20:54.3817985Z bar.sync 0; 2026-02-21T09:20:54.3818132Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r638, %r902}; 2026-02-21T09:20:54.3818190Z bar.sync 0; 2026-02-21T09:20:54.3818251Z // begin inline asm 2026-02-21T09:20:54.3818446Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1556}, [%r1248]; 2026-02-21T09:20:54.3818511Z // end inline asm 2026-02-21T09:20:54.3818566Z bar.sync 0; 2026-02-21T09:20:54.3818705Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r640, %r904}; 2026-02-21T09:20:54.3818766Z bar.sync 0; 2026-02-21T09:20:54.3818828Z // begin inline asm 2026-02-21T09:20:54.3818956Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1558}, [%r1248]; 2026-02-21T09:20:54.3819019Z // end inline asm 2026-02-21T09:20:54.3819075Z bar.sync 0; 2026-02-21T09:20:54.3819276Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r641, %r905}; 2026-02-21T09:20:54.3819337Z bar.sync 0; 2026-02-21T09:20:54.3819404Z // begin inline asm 2026-02-21T09:20:54.3819534Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1559}, [%r1248]; 2026-02-21T09:20:54.3819594Z // end inline asm 2026-02-21T09:20:54.3819656Z bar.sync 0; 2026-02-21T09:20:54.3819809Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r643, %r907}; 2026-02-21T09:20:54.3819871Z bar.sync 0; 2026-02-21T09:20:54.3819932Z // begin inline asm 2026-02-21T09:20:54.3820069Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1561}, [%r1248]; 2026-02-21T09:20:54.3820129Z // end inline asm 2026-02-21T09:20:54.3820186Z bar.sync 0; 2026-02-21T09:20:54.3820395Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r642, %r906}; 2026-02-21T09:20:54.3820463Z bar.sync 0; 2026-02-21T09:20:54.3820524Z // begin inline asm 2026-02-21T09:20:54.3820655Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1560}, [%r1248]; 2026-02-21T09:20:54.3820725Z // end inline asm 2026-02-21T09:20:54.3820783Z bar.sync 0; 2026-02-21T09:20:54.3820922Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r644, %r908}; 2026-02-21T09:20:54.3820999Z bar.sync 0; 2026-02-21T09:20:54.3821064Z // begin inline asm 2026-02-21T09:20:54.3821197Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1562}, [%r1248]; 2026-02-21T09:20:54.3821264Z // end inline asm 2026-02-21T09:20:54.3821323Z bar.sync 0; 2026-02-21T09:20:54.3821465Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r645, %r909}; 2026-02-21T09:20:54.3821527Z bar.sync 0; 2026-02-21T09:20:54.3821595Z // begin inline asm 2026-02-21T09:20:54.3821722Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1563}, [%r1248]; 2026-02-21T09:20:54.3821781Z // end inline asm 2026-02-21T09:20:54.3821845Z bar.sync 0; 2026-02-21T09:20:54.3821984Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r647, %r911}; 2026-02-21T09:20:54.3822040Z bar.sync 0; 2026-02-21T09:20:54.3822101Z // begin inline asm 2026-02-21T09:20:54.3822235Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1565}, [%r1248]; 2026-02-21T09:20:54.3822296Z // end inline asm 2026-02-21T09:20:54.3822350Z bar.sync 0; 2026-02-21T09:20:54.3822496Z 
stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r646, %r910}; 2026-02-21T09:20:54.3822556Z bar.sync 0; 2026-02-21T09:20:54.3822616Z // begin inline asm 2026-02-21T09:20:54.3822745Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1564}, [%r1248]; 2026-02-21T09:20:54.3822810Z // end inline asm 2026-02-21T09:20:54.3822868Z bar.sync 0; 2026-02-21T09:20:54.3823008Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r648, %r912}; 2026-02-21T09:20:54.3823071Z bar.sync 0; 2026-02-21T09:20:54.3823132Z // begin inline asm 2026-02-21T09:20:54.3823261Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1566}, [%r1248]; 2026-02-21T09:20:54.3823326Z // end inline asm 2026-02-21T09:20:54.3823392Z bar.sync 0; 2026-02-21T09:20:54.3823535Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r649, %r913}; 2026-02-21T09:20:54.3823592Z bar.sync 0; 2026-02-21T09:20:54.3823658Z // begin inline asm 2026-02-21T09:20:54.3823785Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1567}, [%r1248]; 2026-02-21T09:20:54.3823842Z // end inline asm 2026-02-21T09:20:54.3823964Z bar.sync 0; 2026-02-21T09:20:54.3824105Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r651, %r915}; 2026-02-21T09:20:54.3824161Z bar.sync 0; 2026-02-21T09:20:54.3824222Z // begin inline asm 2026-02-21T09:20:54.3824358Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1569}, [%r1248]; 2026-02-21T09:20:54.3824464Z // end inline asm 2026-02-21T09:20:54.3824521Z bar.sync 0; 2026-02-21T09:20:54.3824666Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r650, %r914}; 2026-02-21T09:20:54.3824723Z bar.sync 0; 2026-02-21T09:20:54.3824783Z // begin inline asm 2026-02-21T09:20:54.3824912Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1568}, [%r1248]; 2026-02-21T09:20:54.3824978Z // end inline asm 2026-02-21T09:20:54.3825035Z bar.sync 0; 2026-02-21T09:20:54.3825227Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r267], {%r652, %r916}; 2026-02-21T09:20:54.3825303Z bar.sync 0; 2026-02-21T09:20:54.3825366Z // begin inline asm 2026-02-21T09:20:54.3825496Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1570}, [%r1248]; 2026-02-21T09:20:54.3825564Z // end inline asm 2026-02-21T09:20:54.3825624Z // begin inline asm 2026-02-21T09:20:54.3825710Z fence.proxy.async.shared::cta; 2026-02-21T09:20:54.3825769Z // end inline asm 2026-02-21T09:20:54.3825853Z wgmma.fence.sync.aligned; 2026-02-21T09:20:54.3825922Z shl.b32 %r1806, %r1792, 9; 2026-02-21T09:20:54.3825986Z and.b32 %r1807, %r1806, 8192; 2026-02-21T09:20:54.3826064Z add.s32 %r1808, %r1807, %r1244; 2026-02-21T09:20:54.3826130Z bfe.u32 %r1809, %r1808, 4, 14; 2026-02-21T09:20:54.3826261Z cvt.u64.u32 %rd41, %r1809; 2026-02-21T09:20:54.3826350Z or.b64 %rd35, %rd41, -9223371899348713472; 2026-02-21T09:20:54.3826423Z // begin inline asm 2026-02-21T09:20:54.3827828Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r1507,%r1508,%r1509,%r1510,%r1511,%r1512,%r1513,%r1514,%r1515,%r1516,%r1517,%r1518,%r1519,%r1520,%r1521,%r1522,%r1523,%r1524,%r1525,%r1526,%r1527,%r1528,%r1529,%r1530,%r1531,%r1532,%r1533,%r1534,%r1535,%r1536,%r1537,%r1538,%r1539,%r1540,%r1541,%r1542,%r1543,%r1544,%r1545,%r1546,%r1547,%r1548,%r1549,%r1550,%r1551,%r1552,%r1553,%r1554,%r1555,%r1556,%r1557,%r1558,%r1559,%r1560,%r1561,%r1562,%r1563,%r1564,%r1565,%r1566,%r1567,%r1568,%r1569,%r1570}, {%r1503,%r1504,%r1505,%r1506}, %rd35, %p1, 1, 1; 2026-02-21T09:20:54.3827898Z // end inline asm 2026-02-21T09:20:54.3827968Z add.s32 %r1810, %r1808, 32; 2026-02-21T09:20:54.3828034Z bfe.u32 %r1811, %r1810, 4, 14; 2026-02-21T09:20:54.3828098Z cvt.u64.u32 %rd42, %r1811; 2026-02-21T09:20:54.3828179Z or.b64 %rd36, %rd42, -9223371899348713472; 2026-02-21T09:20:54.3828241Z // begin inline asm 2026-02-21T09:20:54.3829582Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r1507,%r1508,%r1509,%r1510,%r1511,%r1512,%r1513,%r1514,%r1515,%r1516,%r1517,%r1518,%r1519,%r1520,%r1521,%r1522,%r1523,%r1524,%r1525,%r1526,%r1527,%r1528,%r1529,%r1530,%r1531,%r1532,%r1533,%r1534,%r1535,%r1536,%r1537,%r1538,%r1539,%r1540,%r1541,%r1542,%r1543,%r1544,%r1545,%r1546,%r1547,%r1548,%r1549,%r1550,%r1551,%r1552,%r1553,%r1554,%r1555,%r1556,%r1557,%r1558,%r1559,%r1560,%r1561,%r1562,%r1563,%r1564,%r1565,%r1566,%r1567,%r1568,%r1569,%r1570}, {%r1635,%r1636,%r1637,%r1638}, %rd36, %p1, 1, 1; 2026-02-21T09:20:54.3829651Z // end inline asm 2026-02-21T09:20:54.3829733Z wgmma.commit_group.sync.aligned; 2026-02-21T09:20:54.3829798Z mov.b32 %r1705, %r1704; 2026-02-21T09:20:54.3829865Z mov.b32 %r1703, %r1244; 2026-02-21T09:20:54.3829927Z // begin inline asm 2026-02-21T09:20:54.3831002Z // wait for regs: %r1507,%r1508,%r1509,%r1510,%r1511,%r1512,%r1513,%r1514,%r1515,%r1516,%r1517,%r1518,%r1519,%r1520,%r1521,%r1522,%r1523,%r1524,%r1525,%r1526,%r1527,%r1528,%r1529,%r1530,%r1531,%r1532,%r1533,%r1534,%r1535,%r1536,%r1537,%r1538,%r1539,%r1540,%r1541,%r1542,%r1543,%r1544,%r1545,%r1546,%r1547,%r1548,%r1549,%r1550,%r1551,%r1552,%r1553,%r1554,%r1555,%r1556,%r1557,%r1558,%r1559,%r1560,%r1561,%r1562,%r1563,%r1564,%r1565,%r1566,%r1567,%r1568,%r1569,%r1570,%r1703,%r1704,%r1705 2026-02-21T09:20:54.3831170Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:20:54.3831229Z // end inline asm 2026-02-21T09:20:54.3831286Z $L__tmp4: 2026-02-21T09:20:54.3831509Z .loc 1 34 74 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:34:74 2026-02-21T09:20:54.3831638Z add.s32 %r1812, %r1970, 1; 2026-02-21T09:20:54.3831721Z setp.gt.s32 %p10, %r1812, 4; 2026-02-21T09:20:54.3831800Z selp.b32 %r1970, 0, %r1812, %p10; 2026-02-21T09:20:54.3832010Z .loc 1 42 28 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:28 2026-02-21T09:20:54.3832079Z add.s32 %r1813, %r1968, -16; 2026-02-21T09:20:54.3832154Z mad.wide.s32 %rd37, %r1813, 2, %rd7; 2026-02-21T09:20:54.3832427Z .loc 1 42 76 // 
ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:76 2026-02-21T09:20:54.3832507Z shl.b32 %r1814, %r1970, 13; 2026-02-21T09:20:54.3832577Z add.s32 %r1773, %r161, %r1814; 2026-02-21T09:20:54.3832651Z selp.b32 %r1774, 8, 0, %p8; 2026-02-21T09:20:54.3832727Z // begin inline asm 2026-02-21T09:20:54.3832875Z cp.async.ca.shared.global [ %r1773 + 0 ], [ %rd37 + 0 ], 0x8, %r1774; 2026-02-21T09:20:54.3832940Z // end inline asm 2026-02-21T09:20:54.3833010Z cp.async.commit_group; 2026-02-21T09:20:54.3833212Z .loc 1 42 28 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:28 2026-02-21T09:20:54.3833285Z mad.wide.s32 %rd38, %r1968, 2, %rd7; 2026-02-21T09:20:54.3833489Z .loc 1 42 76 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:42:76 2026-02-21T09:20:54.3833640Z add.s32 %r1775, %r163, %r1814; 2026-02-21T09:20:54.3833705Z // begin inline asm 2026-02-21T09:20:54.3833851Z cp.async.ca.shared.global [ %r1775 + 0 ], [ %rd38 + 0 ], 0x8, %r1774; 2026-02-21T09:20:54.3833913Z // end inline asm 2026-02-21T09:20:54.3833983Z cp.async.commit_group; 2026-02-21T09:20:54.3834183Z .loc 1 34 74 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:34:74 2026-02-21T09:20:54.3834245Z add.s32 %r1968, %r1968, 32; 2026-02-21T09:20:54.3834308Z add.s32 %r1967, %r1967, 131072; 2026-02-21T09:20:54.3834374Z setp.lt.u64 %p11, %rd51, 496; 2026-02-21T09:20:54.3834442Z @%p11 bra $L__BB0_1; 2026-02-21T09:20:54.3834498Z // %bb.2: 2026-02-21T09:20:54.3834698Z .loc 1 27 28 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:27:28 2026-02-21T09:20:54.3834766Z or.b32 %r1887, %r5, %r3; 2026-02-21T09:20:54.3834964Z .loc 1 25 28 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:25:28 2026-02-21T09:20:54.3835029Z or.b32 %r1888, %r1, %r21; 2026-02-21T09:20:54.3835233Z .loc 1 34 74 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:34:74 2026-02-21T09:20:54.3835304Z cp.async.wait_group 0; 2026-02-21T09:20:54.3835362Z bar.sync 0; 2026-02-21T09:20:54.3835567Z .loc 
1 81 24 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:81:24 2026-02-21T09:20:54.3835652Z cvt.rn.bf16x2.f32 %r1889, %r1508, %r1507; 2026-02-21T09:20:54.3835726Z cvt.rn.bf16x2.f32 %r1890, %r1510, %r1509; 2026-02-21T09:20:54.3835797Z cvt.rn.bf16x2.f32 %r1891, %r1512, %r1511; 2026-02-21T09:20:54.3835871Z cvt.rn.bf16x2.f32 %r1892, %r1514, %r1513; 2026-02-21T09:20:54.3835951Z cvt.rn.bf16x2.f32 %r1893, %r1516, %r1515; 2026-02-21T09:20:54.3836023Z cvt.rn.bf16x2.f32 %r1894, %r1518, %r1517; 2026-02-21T09:20:54.3836103Z cvt.rn.bf16x2.f32 %r1895, %r1520, %r1519; 2026-02-21T09:20:54.3836174Z cvt.rn.bf16x2.f32 %r1896, %r1522, %r1521; 2026-02-21T09:20:54.3836247Z cvt.rn.bf16x2.f32 %r1897, %r1524, %r1523; 2026-02-21T09:20:54.3836323Z cvt.rn.bf16x2.f32 %r1898, %r1526, %r1525; 2026-02-21T09:20:54.3836412Z cvt.rn.bf16x2.f32 %r1899, %r1528, %r1527; 2026-02-21T09:20:54.3836635Z cvt.rn.bf16x2.f32 %r1900, %r1530, %r1529; 2026-02-21T09:20:54.3836716Z cvt.rn.bf16x2.f32 %r1901, %r1532, %r1531; 2026-02-21T09:20:54.3836798Z cvt.rn.bf16x2.f32 %r1902, %r1534, %r1533; 2026-02-21T09:20:54.3836964Z cvt.rn.bf16x2.f32 %r1903, %r1536, %r1535; 2026-02-21T09:20:54.3837043Z cvt.rn.bf16x2.f32 %r1904, %r1538, %r1537; 2026-02-21T09:20:54.3837123Z cvt.rn.bf16x2.f32 %r1905, %r1540, %r1539; 2026-02-21T09:20:54.3837210Z cvt.rn.bf16x2.f32 %r1906, %r1542, %r1541; 2026-02-21T09:20:54.3837348Z cvt.rn.bf16x2.f32 %r1907, %r1544, %r1543; 2026-02-21T09:20:54.3837423Z cvt.rn.bf16x2.f32 %r1908, %r1546, %r1545; 2026-02-21T09:20:54.3837512Z cvt.rn.bf16x2.f32 %r1909, %r1548, %r1547; 2026-02-21T09:20:54.3837585Z cvt.rn.bf16x2.f32 %r1910, %r1550, %r1549; 2026-02-21T09:20:54.3837663Z cvt.rn.bf16x2.f32 %r1911, %r1552, %r1551; 2026-02-21T09:20:54.3837747Z cvt.rn.bf16x2.f32 %r1912, %r1554, %r1553; 2026-02-21T09:20:54.3837825Z cvt.rn.bf16x2.f32 %r1913, %r1556, %r1555; 2026-02-21T09:20:54.3837963Z cvt.rn.bf16x2.f32 %r1914, %r1558, %r1557; 2026-02-21T09:20:54.3838054Z cvt.rn.bf16x2.f32 %r1915, %r1560, %r1559; 
2026-02-21T09:20:54.3838137Z cvt.rn.bf16x2.f32 %r1916, %r1562, %r1561; 2026-02-21T09:20:54.3838214Z cvt.rn.bf16x2.f32 %r1917, %r1564, %r1563; 2026-02-21T09:20:54.3838297Z cvt.rn.bf16x2.f32 %r1918, %r1566, %r1565; 2026-02-21T09:20:54.3838383Z cvt.rn.bf16x2.f32 %r1919, %r1568, %r1567; 2026-02-21T09:20:54.3838470Z cvt.rn.bf16x2.f32 %r1920, %r1570, %r1569; 2026-02-21T09:20:54.3838684Z .loc 1 27 28 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:27:28 2026-02-21T09:20:54.3838755Z shl.b32 %r1921, %r1887, 13; 2026-02-21T09:20:54.3838958Z .loc 1 82 39 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:82:39 2026-02-21T09:20:54.3839088Z or.b32 %r1922, %r1921, 262144; 2026-02-21T09:20:54.3839158Z or.b32 %r1923, %r1921, 524288; 2026-02-21T09:20:54.3839223Z or.b32 %r1924, %r1921, 786432; 2026-02-21T09:20:54.3839289Z or.b32 %r1925, %r1921, 1048576; 2026-02-21T09:20:54.3839352Z or.b32 %r1926, %r1921, 1310720; 2026-02-21T09:20:54.3839421Z or.b32 %r1927, %r1921, 1572864; 2026-02-21T09:20:54.3839484Z or.b32 %r1928, %r1921, 1835008; 2026-02-21T09:20:54.3839702Z .loc 1 82 46 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:82:46 2026-02-21T09:20:54.3839771Z add.s32 %r1929, %r1921, %r1888; 2026-02-21T09:20:54.3839836Z add.s32 %r1930, %r1922, %r1888; 2026-02-21T09:20:54.3839899Z add.s32 %r1931, %r1923, %r1888; 2026-02-21T09:20:54.3839961Z add.s32 %r1932, %r1924, %r1888; 2026-02-21T09:20:54.3840029Z add.s32 %r1933, %r1925, %r1888; 2026-02-21T09:20:54.3840090Z add.s32 %r1934, %r1926, %r1888; 2026-02-21T09:20:54.3840163Z add.s32 %r1935, %r1927, %r1888; 2026-02-21T09:20:54.3840236Z add.s32 %r1936, %r1928, %r1888; 2026-02-21T09:20:54.3840437Z .loc 1 82 18 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:82:18 2026-02-21T09:20:54.3840511Z mad.wide.s32 %rd43, %r1929, 2, %rd9; 2026-02-21T09:20:54.3840583Z mad.wide.s32 %rd44, %r1930, 2, %rd9; 2026-02-21T09:20:54.3840665Z mad.wide.s32 %rd45, %r1931, 2, %rd9; 2026-02-21T09:20:54.3840733Z mad.wide.s32 %rd46, 
%r1932, 2, %rd9; 2026-02-21T09:20:54.3840801Z mad.wide.s32 %rd47, %r1933, 2, %rd9; 2026-02-21T09:20:54.3840873Z mad.wide.s32 %rd48, %r1934, 2, %rd9; 2026-02-21T09:20:54.3840939Z mad.wide.s32 %rd49, %r1935, 2, %rd9; 2026-02-21T09:20:54.3841007Z mad.wide.s32 %rd50, %r1936, 2, %rd9; 2026-02-21T09:20:54.3841217Z .loc 1 82 77 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:82:77 2026-02-21T09:20:54.3841281Z shl.b32 %r1937, %r6, 15; 2026-02-21T09:20:54.3841344Z shl.b32 %r1938, %r2, 5; 2026-02-21T09:20:54.3841409Z and.b32 %r1939, %r1938, 15456; 2026-02-21T09:20:54.3841477Z and.b32 %r1940, %r2, 24; 2026-02-21T09:20:54.3845455Z shl.b32 %r1941, %r1940, 4; 2026-02-21T09:20:54.3845569Z bfe.s32 %r1942, %r2, 2, 1; 2026-02-21T09:20:54.3845646Z and.b32 %r1943, %r1942, 16400; 2026-02-21T09:20:54.3845709Z shr.u32 %r1944, %r2, 3; 2026-02-21T09:20:54.3845782Z and.b32 %r1945, %r1944, 64; 2026-02-21T09:20:54.3845848Z or.b32 %r1946, %r1943, %r1937; 2026-02-21T09:20:54.3846019Z or.b32 %r1947, %r1939, %r1941; 2026-02-21T09:20:54.3846103Z xor.b32 %r1948, %r1947, %r1945; 2026-02-21T09:20:54.3846167Z or.b32 %r1949, %r1946, %r1948; 2026-02-21T09:20:54.3846233Z add.s32 %r1951, %r206, %r1949; 2026-02-21T09:20:54.3846369Z st.shared.v4.b32 [%r1951], {%r1889, %r1891, %r1893, %r1895}; 2026-02-21T09:20:54.3846737Z st.shared.v4.b32 [%r1951+512], {%r1890, %r1892, %r1894, %r1896}; 2026-02-21T09:20:54.3846811Z xor.b32 %r1952, %r1949, 16; 2026-02-21T09:20:54.3846877Z add.s32 %r1953, %r206, %r1952; 2026-02-21T09:20:54.3847001Z st.shared.v4.b32 [%r1953], {%r1897, %r1899, %r1901, %r1903}; 2026-02-21T09:20:54.3847120Z st.shared.v4.b32 [%r1953+512], {%r1898, %r1900, %r1902, %r1904}; 2026-02-21T09:20:54.3847185Z xor.b32 %r1954, %r1949, 32; 2026-02-21T09:20:54.3847350Z add.s32 %r1955, %r206, %r1954; 2026-02-21T09:20:54.3847473Z st.shared.v4.b32 [%r1955], {%r1905, %r1907, %r1909, %r1911}; 2026-02-21T09:20:54.3847588Z st.shared.v4.b32 [%r1955+512], {%r1906, %r1908, %r1910, %r1912}; 
2026-02-21T09:20:54.3847657Z xor.b32 %r1956, %r1949, 48; 2026-02-21T09:20:54.3847725Z add.s32 %r1957, %r206, %r1956; 2026-02-21T09:20:54.3847838Z st.shared.v4.b32 [%r1957], {%r1913, %r1915, %r1917, %r1919}; 2026-02-21T09:20:54.3847966Z st.shared.v4.b32 [%r1957+512], {%r1914, %r1916, %r1918, %r1920}; 2026-02-21T09:20:54.3848032Z bar.sync 0; 2026-02-21T09:20:54.3848099Z shl.b32 %r1958, %r1940, 12; 2026-02-21T09:20:54.3848164Z shl.b32 %r1959, %r1940, 2; 2026-02-21T09:20:54.3848224Z and.b32 %r1960, %r4, 1920; 2026-02-21T09:20:54.3848290Z bfe.s32 %r1961, %r2, 5, 1; 2026-02-21T09:20:54.3848431Z and.b32 %r1962, %r1961, 16400; 2026-02-21T09:20:54.3848500Z or.b32 %r1963, %r1958, %r20; 2026-02-21T09:20:54.3848569Z or.b32 %r1964, %r1959, %r1960; 2026-02-21T09:20:54.3848637Z xor.b32 %r1965, %r1963, %r1964; 2026-02-21T09:20:54.3848703Z xor.b32 %r1966, %r1965, %r1962; 2026-02-21T09:20:54.3848767Z add.s32 %r1819, %r206, %r1966; 2026-02-21T09:20:54.3848836Z // begin inline asm 2026-02-21T09:20:54.3849039Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1855, %r1856, %r1857, %r1858}, [%r1819]; 2026-02-21T09:20:54.3849101Z // end inline asm 2026-02-21T09:20:54.3849171Z add.s32 %r1824, %r1819, 2048; 2026-02-21T09:20:54.3849232Z // begin inline asm 2026-02-21T09:20:54.3849430Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1859, %r1860, %r1861, %r1862}, [%r1824]; 2026-02-21T09:20:54.3849497Z // end inline asm 2026-02-21T09:20:54.3849562Z add.s32 %r1829, %r1819, 4096; 2026-02-21T09:20:54.3849622Z // begin inline asm 2026-02-21T09:20:54.3849805Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1863, %r1864, %r1865, %r1866}, [%r1829]; 2026-02-21T09:20:54.3849868Z // end inline asm 2026-02-21T09:20:54.3849930Z add.s32 %r1834, %r1819, 6144; 2026-02-21T09:20:54.3849992Z // begin inline asm 2026-02-21T09:20:54.3850175Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1867, %r1868, %r1869, %r1870}, [%r1834]; 2026-02-21T09:20:54.3850232Z // end inline asm 2026-02-21T09:20:54.3850296Z add.s32 %r1839, %r1819, 
8192; 2026-02-21T09:20:54.3850376Z // begin inline asm 2026-02-21T09:20:54.3850564Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1871, %r1872, %r1873, %r1874}, [%r1839]; 2026-02-21T09:20:54.3850622Z // end inline asm 2026-02-21T09:20:54.3850684Z add.s32 %r1844, %r1819, 10240; 2026-02-21T09:20:54.3850754Z // begin inline asm 2026-02-21T09:20:54.3850929Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1875, %r1876, %r1877, %r1878}, [%r1844]; 2026-02-21T09:20:54.3850988Z // end inline asm 2026-02-21T09:20:54.3851056Z add.s32 %r1849, %r1819, 12288; 2026-02-21T09:20:54.3851117Z // begin inline asm 2026-02-21T09:20:54.3851294Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1879, %r1880, %r1881, %r1882}, [%r1849]; 2026-02-21T09:20:54.3851352Z // end inline asm 2026-02-21T09:20:54.3851419Z add.s32 %r1854, %r1819, 14336; 2026-02-21T09:20:54.3851479Z // begin inline asm 2026-02-21T09:20:54.3851657Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1883, %r1884, %r1885, %r1886}, [%r1854]; 2026-02-21T09:20:54.3851808Z // end inline asm 2026-02-21T09:20:54.3851868Z // begin inline asm 2026-02-21T09:20:54.3852002Z st.global.v4.b32 [ %rd43 + 0 ], { %r1855, %r1856, %r1857, %r1858 }; 2026-02-21T09:20:54.3852066Z // end inline asm 2026-02-21T09:20:54.3852126Z // begin inline asm 2026-02-21T09:20:54.3852311Z st.global.v4.b32 [ %rd44 + 0 ], { %r1859, %r1860, %r1861, %r1862 }; 2026-02-21T09:20:54.3852370Z // end inline asm 2026-02-21T09:20:54.3852433Z // begin inline asm 2026-02-21T09:20:54.3852548Z st.global.v4.b32 [ %rd45 + 0 ], { %r1863, %r1864, %r1865, %r1866 }; 2026-02-21T09:20:54.3852607Z // end inline asm 2026-02-21T09:20:54.3852672Z // begin inline asm 2026-02-21T09:20:54.3852786Z st.global.v4.b32 [ %rd46 + 0 ], { %r1867, %r1868, %r1869, %r1870 }; 2026-02-21T09:20:54.3852845Z // end inline asm 2026-02-21T09:20:54.3852954Z // begin inline asm 2026-02-21T09:20:54.3853080Z st.global.v4.b32 [ %rd47 + 0 ], { %r1871, %r1872, %r1873, %r1874 }; 2026-02-21T09:20:54.3853138Z // end inline asm 
2026-02-21T09:20:54.3853202Z // begin inline asm 2026-02-21T09:20:54.3853345Z st.global.v4.b32 [ %rd48 + 0 ], { %r1875, %r1876, %r1877, %r1878 }; 2026-02-21T09:20:54.3853406Z // end inline asm 2026-02-21T09:20:54.3853468Z // begin inline asm 2026-02-21T09:20:54.3853588Z st.global.v4.b32 [ %rd49 + 0 ], { %r1879, %r1880, %r1881, %r1882 }; 2026-02-21T09:20:54.3853656Z // end inline asm 2026-02-21T09:20:54.3853716Z // begin inline asm 2026-02-21T09:20:54.3853830Z st.global.v4.b32 [ %rd50 + 0 ], { %r1883, %r1884, %r1885, %r1886 }; 2026-02-21T09:20:54.3853892Z // end inline asm 2026-02-21T09:20:54.3854161Z .loc 1 82 4 // ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py:82:4 2026-02-21T09:20:54.3854222Z ret; 2026-02-21T09:20:54.3854298Z $L__tmp5: 2026-02-21T09:20:54.3854356Z $L__func_end0: 2026-02-21T09:20:54.3854451Z // -- End function 2026-02-21T09:20:54.3854506Z } 2026-02-21T09:20:54.3854767Z .file 1 "/tmp/torchinductor_root/ei/ceix7ddasvxpn75joahdftryyd5msw355grpcvzqxehe5svs6z3o.py" 2026-02-21T09:20:54.3854986Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T09:20:54.3855053Z .section .debug_abbrev 2026-02-21T09:20:54.3855118Z { 2026-02-21T09:20:54.3855221Z .b8 1 // Abbreviation Code 2026-02-21T09:20:54.3855317Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:20:54.3855407Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:20:54.3855504Z .b8 37 // DW_AT_producer 2026-02-21T09:20:54.3855588Z .b8 8 // DW_FORM_string 2026-02-21T09:20:54.3855673Z .b8 19 // DW_AT_language 2026-02-21T09:20:54.3855764Z .b8 5 // DW_FORM_data2 2026-02-21T09:20:54.3855843Z .b8 3 // DW_AT_name 2026-02-21T09:20:54.3855924Z .b8 8 // DW_FORM_string 2026-02-21T09:20:54.3856018Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:20:54.3856099Z .b8 6 // DW_FORM_data4 2026-02-21T09:20:54.3856180Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:20:54.3856266Z .b8 8 // DW_FORM_string 2026-02-21T09:20:54.3856342Z .b8 0 // EOM(1) 2026-02-21T09:20:54.3856414Z .b8 0 // EOM(2) 
2026-02-21T09:20:54.3856640Z .b8 2 // Abbreviation Code 2026-02-21T09:20:54.3856751Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:20:54.3856833Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:20:54.3856912Z .b8 3 // DW_AT_name 2026-02-21T09:20:54.3856997Z .b8 8 // DW_FORM_string 2026-02-21T09:20:54.3857081Z .b8 32 // DW_AT_inline 2026-02-21T09:20:54.3857253Z .b8 11 // DW_FORM_data1 2026-02-21T09:20:54.3857330Z .b8 0 // EOM(1) 2026-02-21T09:20:54.3857400Z .b8 0 // EOM(2) 2026-02-21T09:20:54.3857548Z .b8 3 // Abbreviation Code 2026-02-21T09:20:54.3857634Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:20:54.3857722Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:20:54.3857806Z .b8 17 // DW_AT_low_pc 2026-02-21T09:20:54.3857885Z .b8 1 // DW_FORM_addr 2026-02-21T09:20:54.3857985Z .b8 18 // DW_AT_high_pc 2026-02-21T09:20:54.3858151Z .b8 1 // DW_FORM_addr 2026-02-21T09:20:54.3858250Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:20:54.3858335Z .b8 19 // DW_FORM_ref4 2026-02-21T09:20:54.3858411Z .b8 0 // EOM(1) 2026-02-21T09:20:54.3858483Z .b8 0 // EOM(2) 2026-02-21T09:20:54.3858577Z .b8 4 // Abbreviation Code 2026-02-21T09:20:54.3858680Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T09:20:54.3858763Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:20:54.3858865Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:20:54.3859038Z .b8 19 // DW_FORM_ref4 2026-02-21T09:20:54.3859122Z .b8 17 // DW_AT_low_pc 2026-02-21T09:20:54.3859200Z .b8 1 // DW_FORM_addr 2026-02-21T09:20:54.3859288Z .b8 18 // DW_AT_high_pc 2026-02-21T09:20:54.3859365Z .b8 1 // DW_FORM_addr 2026-02-21T09:20:54.3859450Z .b8 88 // DW_AT_call_file 2026-02-21T09:20:54.3859536Z .b8 11 // DW_FORM_data1 2026-02-21T09:20:54.3859617Z .b8 89 // DW_AT_call_line 2026-02-21T09:20:54.3859698Z .b8 11 // DW_FORM_data1 2026-02-21T09:20:54.3859785Z .b8 87 // DW_AT_call_column 2026-02-21T09:20:54.3859870Z .b8 11 // DW_FORM_data1 2026-02-21T09:20:54.3859943Z .b8 0 // EOM(1) 2026-02-21T09:20:54.3860016Z .b8 0 // EOM(2) 2026-02-21T09:20:54.3860091Z .b8 0 // EOM(3) 
2026-02-21T09:20:54.3860146Z } 2026-02-21T09:20:54.3860230Z .section .debug_info 2026-02-21T09:20:54.3860289Z { 2026-02-21T09:20:54.3860383Z .b32 178 // Length of Unit 2026-02-21T09:20:54.3860480Z .b8 2 // DWARF version number 2026-02-21T09:20:54.3860536Z .b8 0 2026-02-21T09:20:54.3860675Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:20:54.3860773Z .b8 8 // Address Size (in bytes) 2026-02-21T09:20:54.3860888Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T09:20:54.3860982Z .b8 116 // DW_AT_producer 2026-02-21T09:20:54.3861046Z .b8 114 2026-02-21T09:20:54.3861101Z .b8 105 2026-02-21T09:20:54.3861155Z .b8 116 2026-02-21T09:20:54.3861213Z .b8 111 2026-02-21T09:20:54.3861267Z .b8 110 2026-02-21T09:20:54.3861320Z .b8 0 2026-02-21T09:20:54.3861406Z .b8 2 // DW_AT_language 2026-02-21T09:20:54.3861460Z .b8 0 2026-02-21T09:20:54.3861541Z .b8 99 // DW_AT_name 2026-02-21T09:20:54.3861600Z .b8 101 2026-02-21T09:20:54.3861654Z .b8 105 2026-02-21T09:20:54.3861706Z .b8 120 2026-02-21T09:20:54.3861758Z .b8 55 2026-02-21T09:20:54.3861815Z .b8 100 2026-02-21T09:20:54.3861941Z .b8 100 2026-02-21T09:20:54.3861997Z .b8 97 2026-02-21T09:20:54.3862049Z .b8 115 2026-02-21T09:20:54.3862107Z .b8 118 2026-02-21T09:20:54.3862159Z .b8 120 2026-02-21T09:20:54.3862211Z .b8 112 2026-02-21T09:20:54.3862268Z .b8 110 2026-02-21T09:20:54.3862371Z .b8 55 2026-02-21T09:20:54.3862424Z .b8 53 2026-02-21T09:20:54.3862478Z .b8 106 2026-02-21T09:20:54.3862543Z .b8 111 2026-02-21T09:20:54.3862599Z .b8 97 2026-02-21T09:20:54.3862653Z .b8 104 2026-02-21T09:20:54.3862714Z .b8 100 2026-02-21T09:20:54.3862767Z .b8 102 2026-02-21T09:20:54.3862819Z .b8 116 2026-02-21T09:20:54.3862873Z .b8 114 2026-02-21T09:20:54.3862930Z .b8 121 2026-02-21T09:20:54.3862984Z .b8 121 2026-02-21T09:20:54.3863038Z .b8 100 2026-02-21T09:20:54.3863090Z .b8 53 2026-02-21T09:20:54.3863153Z .b8 109 2026-02-21T09:20:54.3863262Z .b8 115 2026-02-21T09:20:54.3863321Z .b8 119 2026-02-21T09:20:54.3863380Z .b8 51 
2026-02-21T09:20:54.3863433Z .b8 53 2026-02-21T09:20:54.3863484Z .b8 53 2026-02-21T09:20:54.3863538Z .b8 103 2026-02-21T09:20:54.3863599Z .b8 114 2026-02-21T09:20:54.3863652Z .b8 112 2026-02-21T09:20:54.3863705Z .b8 99 2026-02-21T09:20:54.3863764Z .b8 118 2026-02-21T09:20:54.3863820Z .b8 122 2026-02-21T09:20:54.3863874Z .b8 113 2026-02-21T09:20:54.3863925Z .b8 120 2026-02-21T09:20:54.3863991Z .b8 101 2026-02-21T09:20:54.3864044Z .b8 104 2026-02-21T09:20:54.3864097Z .b8 101 2026-02-21T09:20:54.3864153Z .b8 53 2026-02-21T09:20:54.3864209Z .b8 115 2026-02-21T09:20:54.3864262Z .b8 118 2026-02-21T09:20:54.3864327Z .b8 115 2026-02-21T09:20:54.3864386Z .b8 54 2026-02-21T09:20:54.3864440Z .b8 122 2026-02-21T09:20:54.3864545Z .b8 51 2026-02-21T09:20:54.3864600Z .b8 111 2026-02-21T09:20:54.3864657Z .b8 46 2026-02-21T09:20:54.3864710Z .b8 112 2026-02-21T09:20:54.3864763Z .b8 121 2026-02-21T09:20:54.3864819Z .b8 0 2026-02-21T09:20:54.3864925Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:20:54.3865014Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:20:54.3865069Z .b8 116 2026-02-21T09:20:54.3865130Z .b8 109 2026-02-21T09:20:54.3865184Z .b8 112 2026-02-21T09:20:54.3865236Z .b8 47 2026-02-21T09:20:54.3865303Z .b8 116 2026-02-21T09:20:54.3865356Z .b8 111 2026-02-21T09:20:54.3865408Z .b8 114 2026-02-21T09:20:54.3865459Z .b8 99 2026-02-21T09:20:54.3865518Z .b8 104 2026-02-21T09:20:54.3865571Z .b8 105 2026-02-21T09:20:54.3865625Z .b8 110 2026-02-21T09:20:54.3865683Z .b8 100 2026-02-21T09:20:54.3865736Z .b8 117 2026-02-21T09:20:54.3865787Z .b8 99 2026-02-21T09:20:54.3865841Z .b8 116 2026-02-21T09:20:54.3865900Z .b8 111 2026-02-21T09:20:54.3865952Z .b8 114 2026-02-21T09:20:54.3866006Z .b8 95 2026-02-21T09:20:54.3866060Z .b8 114 2026-02-21T09:20:54.3866119Z .b8 111 2026-02-21T09:20:54.3866172Z .b8 111 2026-02-21T09:20:54.3866225Z .b8 116 2026-02-21T09:20:54.3866282Z .b8 47 2026-02-21T09:20:54.3866338Z .b8 101 2026-02-21T09:20:54.3866391Z .b8 105 2026-02-21T09:20:54.3866573Z .b8 0 
2026-02-21T09:20:54.3866708Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T09:20:54.3866791Z .b8 95 // DW_AT_name 2026-02-21T09:20:54.3866844Z .b8 104 2026-02-21T09:20:54.3866901Z .b8 101 2026-02-21T09:20:54.3866968Z .b8 108 2026-02-21T09:20:54.3867021Z .b8 105 2026-02-21T09:20:54.3867074Z .b8 111 2026-02-21T09:20:54.3867131Z .b8 110 2026-02-21T09:20:54.3867182Z .b8 95 2026-02-21T09:20:54.3867235Z .b8 109 2026-02-21T09:20:54.3867290Z .b8 97 2026-02-21T09:20:54.3867343Z .b8 116 2026-02-21T09:20:54.3867395Z .b8 109 2026-02-21T09:20:54.3867447Z .b8 117 2026-02-21T09:20:54.3867507Z .b8 108 2026-02-21T09:20:54.3867560Z .b8 95 2026-02-21T09:20:54.3867615Z .b8 98 2026-02-21T09:20:54.3867672Z .b8 102 2026-02-21T09:20:54.3867725Z .b8 49 2026-02-21T09:20:54.3867777Z .b8 54 2026-02-21T09:20:54.3867831Z .b8 95 2026-02-21T09:20:54.3867887Z .b8 105 2026-02-21T09:20:54.3867940Z .b8 110 2026-02-21T09:20:54.3867992Z .b8 116 2026-02-21T09:20:54.3868045Z .b8 52 2026-02-21T09:20:54.3868103Z .b8 0 2026-02-21T09:20:54.3868184Z .b8 1 // DW_AT_inline 2026-02-21T09:20:54.3868495Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T09:20:54.3868605Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T09:20:54.3868705Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T09:20:54.3868878Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:20:54.3869020Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T09:20:54.3869129Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:20:54.3869235Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T09:20:54.3869330Z .b64 $L__tmp4 // DW_AT_high_pc 2026-02-21T09:20:54.3869486Z .b8 1 // DW_AT_call_file 2026-02-21T09:20:54.3869572Z .b8 78 // DW_AT_call_line 2026-02-21T09:20:54.3869660Z .b8 36 // DW_AT_call_column 2026-02-21T09:20:54.3869760Z .b8 0 // End Of Children Mark 2026-02-21T09:20:54.3869848Z .b8 0 // End Of Children Mark 2026-02-21T09:20:54.3869901Z } 2026-02-21T09:20:54.3869978Z .section .debug_macinfo { } 2026-02-21T09:20:54.3869985Z 
2026-02-21T09:20:54.3870065Z ================================================================ 2026-02-21T09:20:54.3870185Z please share the reproducer above with Triton project. 2026-02-21T09:20:56.0253547Z 2026-02-21T09:20:56.0253563Z 2026-02-21T09:20:56.0253569Z 2026-02-21T09:20:56.0254248Z ================================================================ 2026-02-21T09:20:56.0254662Z Internal Triton PTX codegen error 2026-02-21T09:20:56.0254931Z `ptxas` stderr: 2026-02-21T09:20:56.0255662Z ptxas fatal : (C7602) Insufficient registers (64) to compile instruction at line 856 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T09:20:56.0256852Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:20:56.0257105Z 2026-02-21T09:20:56.0257775Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp39ftwlkr.ptx -o /tmp/tmp39ftwlkr.ptx.o 2026-02-21T09:20:56.0258535Z 2026-02-21T09:20:56.0258548Z 2026-02-21T09:20:56.0258620Z // 2026-02-21T09:20:56.0258821Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:20:56.0259089Z // 2026-02-21T09:20:56.0259183Z 2026-02-21T09:20:56.0259259Z .version 8.7 2026-02-21T09:20:56.0259454Z .target sm_90a 2026-02-21T09:20:56.0259646Z .address_size 64 2026-02-21T09:20:56.0259769Z 2026-02-21T09:20:56.0259998Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T09:20:56.0260444Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:20:56.0260787Z // @_helion_matmul_bf16_int4 2026-02-21T09:20:56.0261133Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T09:20:56.0261466Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T09:20:56.0261847Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T09:20:56.0262239Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T09:20:56.0262612Z 
.param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T09:20:56.0263011Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T09:20:56.0263306Z ) 2026-02-21T09:20:56.0263452Z .reqntid 256 2026-02-21T09:20:56.0263614Z .maxnreg 64 2026-02-21T09:20:56.0263759Z { 2026-02-21T09:20:56.0263912Z .reg .pred %p<71>; 2026-02-21T09:20:56.0264092Z .reg .b16 %rs<568>; 2026-02-21T09:20:56.0264268Z .reg .b32 %r<12548>; 2026-02-21T09:20:56.0264443Z .reg .b64 %rd<644>; 2026-02-21T09:20:56.0264786Z .loc 1 14 0 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:14:0 2026-02-21T09:20:56.0265179Z $L__func_begin0: 2026-02-21T09:20:56.0265673Z .loc 1 14 0 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:14:0 2026-02-21T09:20:56.0265996Z 2026-02-21T09:20:56.0266059Z // %bb.0: 2026-02-21T09:20:56.0266263Z ld.param.b64 %rd46, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T09:20:56.0266876Z ld.param.b64 %rd45, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T09:20:56.0267186Z ld.param.b64 %rd44, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T09:20:56.0267447Z $L__tmp0: 2026-02-21T09:20:56.0267758Z .loc 1 20 30 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:20:30 2026-02-21T09:20:56.0268156Z mov.u32 %r1551, %ctaid.x; 2026-02-21T09:20:56.0268750Z .loc 1 20 35 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:20:35 2026-02-21T09:20:56.0269148Z shl.b32 %r11887, %r1551, 1; 2026-02-21T09:20:56.0269502Z .loc 1 21 37 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:21:37 2026-02-21T09:20:56.0269902Z add.s32 %r1552, %r11887, 2; 2026-02-21T09:20:56.0270253Z .loc 1 21 49 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:21:49 2026-02-21T09:20:56.0270641Z min.s32 %r2, %r1552, 4096; 2026-02-21T09:20:56.0271009Z .loc 1 22 121 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:22:121 2026-02-21T09:20:56.0271404Z sub.s32 %r1553, %r2, %r11887; 2026-02-21T09:20:56.0271586Z shr.s32 %r1554, %r1553, 
31; 2026-02-21T09:20:56.0271763Z shr.u32 %r1555, %r1554, 30; 2026-02-21T09:20:56.0272026Z add.s32 %r1556, %r1553, %r1555; 2026-02-21T09:20:56.0272230Z and.b32 %r1557, %r1556, -4; 2026-02-21T09:20:56.0272409Z add.s32 %r12416, %r1557, %r11887; 2026-02-21T09:20:56.0272749Z .loc 1 34 45 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:34:45 2026-02-21T09:20:56.0273103Z mov.u32 %r4, %tid.x; 2026-02-21T09:20:56.0273281Z shr.u32 %r5, %r4, 5; 2026-02-21T09:20:56.0273442Z shl.b32 %r6, %r4, 2; 2026-02-21T09:20:56.0273599Z and.b32 %r7, %r6, 124; 2026-02-21T09:20:56.0273772Z shl.b32 %r8, %r4, 3; 2026-02-21T09:20:56.0273927Z and.b32 %r9, %r8, 120; 2026-02-21T09:20:56.0274241Z .loc 1 36 45 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:36:45 2026-02-21T09:20:56.0274597Z and.b32 %r10, %r4, 252; 2026-02-21T09:20:56.0274774Z bfe.u32 %r11, %r4, 2, 6; 2026-02-21T09:20:56.0274940Z or.b32 %r12, %r11, 64; 2026-02-21T09:20:56.0275109Z or.b32 %r13, %r11, 128; 2026-02-21T09:20:56.0275270Z or.b32 %r14, %r11, 192; 2026-02-21T09:20:56.0275443Z shr.u32 %r1558, %r4, 4; 2026-02-21T09:20:56.0275612Z bfe.u32 %r15, %r4, 4, 4; 2026-02-21T09:20:56.0275777Z or.b32 %r16, %r15, 16; 2026-02-21T09:20:56.0275958Z or.b32 %r17, %r15, 32; 2026-02-21T09:20:56.0276124Z or.b32 %r18, %r1558, 48; 2026-02-21T09:20:56.0276293Z or.b32 %r19, %r15, 64; 2026-02-21T09:20:56.0276585Z or.b32 %r20, %r15, 80; 2026-02-21T09:20:56.0276756Z or.b32 %r21, %r15, 96; 2026-02-21T09:20:56.0276918Z or.b32 %r22, %r1558, 112; 2026-02-21T09:20:56.0277096Z or.b32 %r23, %r15, 128; 2026-02-21T09:20:56.0277259Z or.b32 %r24, %r15, 144; 2026-02-21T09:20:56.0277427Z or.b32 %r25, %r15, 160; 2026-02-21T09:20:56.0277598Z or.b32 %r26, %r1558, 176; 2026-02-21T09:20:56.0277764Z or.b32 %r27, %r15, 192; 2026-02-21T09:20:56.0277929Z or.b32 %r28, %r15, 208; 2026-02-21T09:20:56.0278087Z or.b32 %r29, %r15, 224; 2026-02-21T09:20:56.0278259Z or.b32 %r30, %r1558, 240; 2026-02-21T09:20:56.0278590Z .loc 1 44 48 // 
czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:44:48 2026-02-21T09:20:56.0278957Z and.b32 %r31, %r4, 224; 2026-02-21T09:20:56.0279120Z bfe.u32 %r32, %r4, 5, 3; 2026-02-21T09:20:56.0279444Z .loc 1 50 38 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:50:38 2026-02-21T09:20:56.0279802Z and.b32 %r33, %r4, 3; 2026-02-21T09:20:56.0279965Z shl.b32 %r34, %r33, 2; 2026-02-21T09:20:56.0280277Z .loc 1 68 38 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:68:38 2026-02-21T09:20:56.0280729Z and.b32 %r35, %r4, 128; 2026-02-21T09:20:56.0281058Z .loc 1 22 121 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:22:121 2026-02-21T09:20:56.0281490Z setp.ge.s32 %p1, %r11887, %r12416; 2026-02-21T09:20:56.0281694Z and.b32 %r11867, %r8, 1912; 2026-02-21T09:20:56.0282624Z [536s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:20:56.0284261Z Config: @helion.kernel(config=helion.Config(block_sizes=[8, 256, 128], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['first', 'first'], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=16, num_stages=6, num_warps=8, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[True, False], range_num_stages=[3, 0], range_unroll_factors=[4, 2], range_warp_specializes=[]), static_shapes=True) 2026-02-21T09:20:56.0285752Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:20:56.0286044Z `ptxas` stderr: 2026-02-21T09:20:56.0286748Z ptxas fatal : (C7602) Insufficient registers (64) to compile instruction at line 856 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 
2026-02-21T09:20:56.0287408Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:20:56.0287591Z 2026-02-21T09:20:56.0288227Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp39ftwlkr.ptx -o /tmp/tmp39ftwlkr.ptx.o 2026-02-21T09:20:56.0288812Z 2026-02-21T09:20:56.0288968Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:20:56.0289272Z bfe.s32 %r11868, %r4, 4, 1; 2026-02-21T09:20:56.0289458Z mov.b32 %r11869, global_smem; 2026-02-21T09:20:56.0289661Z shl.b32 %r11870, %r31, 8; 2026-02-21T09:20:56.0289832Z and.b32 %r11871, %r6, 1020; 2026-02-21T09:20:56.0290010Z shl.b32 %r11872, %r31, 4; 2026-02-21T09:20:56.0290181Z and.b32 %r11873, %r8, 96; 2026-02-21T09:20:56.0290350Z shl.b32 %r11874, %r33, 1; 2026-02-21T09:20:56.0290524Z and.b32 %r11875, %r4, 127; 2026-02-21T09:20:56.0290697Z or.b32 %r11876, %r4, 896; 2026-02-21T09:20:56.0290892Z and.b32 %r11877, %r8, 48; 2026-02-21T09:20:56.0291061Z shr.u32 %r11878, %r35, 5; 2026-02-21T09:20:56.0291235Z shl.b32 %r11879, %r33, 13; 2026-02-21T09:20:56.0291406Z shl.b32 %r11880, %r4, 5; 2026-02-21T09:20:56.0291583Z and.b32 %r11881, %r4, 24; 2026-02-21T09:20:56.0291749Z and.b32 %r11882, %r6, 16; 2026-02-21T09:20:56.0291923Z shl.b32 %r11883, %r33, 5; 2026-02-21T09:20:56.0292090Z shl.b32 %r11884, %r10, 2; 2026-02-21T09:20:56.0292264Z shl.b32 %r11885, %r32, 13; 2026-02-21T09:20:56.0292440Z shl.b32 %r11886, %r11, 10; 2026-02-21T09:20:56.0292613Z setp.eq.b32 %p70, %r35, 0; 2026-02-21T09:20:56.0292795Z cvt.u64.u32 %rd633, %r34; 2026-02-21T09:20:56.0292966Z @%p1 bra $L__BB0_11; 2026-02-21T09:20:56.0293161Z // %bb.1: // %.lr.ph 2026-02-21T09:20:56.0293539Z .loc 1 0 121 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:0:121 2026-02-21T09:20:56.0293924Z and.b32 %r1561, %r11868, 136; 2026-02-21T09:20:56.0294114Z xor.b32 %r36, %r1561, %r11867; 2026-02-21T09:20:56.0294304Z add.s32 %r37, %r11869, %r36; 
2026-02-21T09:20:56.0294481Z add.s32 %r38, %r37, 2048; 2026-02-21T09:20:56.0294645Z add.s32 %r39, %r37, 4096; 2026-02-21T09:20:56.0294815Z add.s32 %r40, %r37, 6144; 2026-02-21T09:20:56.0294983Z add.s32 %r1564, %r11869, %r11871; 2026-02-21T09:20:56.0295177Z add.s32 %r42, %r1564, 90112; 2026-02-21T09:20:56.0295349Z add.s32 %r43, %r37, 40960; 2026-02-21T09:20:56.0295524Z add.s32 %r44, %r37, 43008; 2026-02-21T09:20:56.0295695Z add.s32 %r45, %r37, 45056; 2026-02-21T09:20:56.0295865Z add.s32 %r46, %r37, 47104; 2026-02-21T09:20:56.0296037Z or.b32 %r47, %r11870, 65536; 2026-02-21T09:20:56.0296207Z add.s32 %r48, %r1564, 95232; 2026-02-21T09:20:56.0296608Z add.s32 %r49, %r37, 8192; 2026-02-21T09:20:56.0296780Z add.s32 %r50, %r37, 10240; 2026-02-21T09:20:56.0296953Z add.s32 %r51, %r37, 12288; 2026-02-21T09:20:56.0297120Z add.s32 %r52, %r37, 14336; 2026-02-21T09:20:56.0297376Z or.b32 %r53, %r11870, 131072; 2026-02-21T09:20:56.0297553Z add.s32 %r54, %r1564, 91136; 2026-02-21T09:20:56.0297728Z add.s32 %r55, %r37, 49152; 2026-02-21T09:20:56.0297909Z add.s32 %r56, %r37, 51200; 2026-02-21T09:20:56.0298086Z add.s32 %r57, %r37, 53248; 2026-02-21T09:20:56.0298255Z add.s32 %r58, %r37, 55296; 2026-02-21T09:20:56.0298425Z or.b32 %r59, %r11870, 196608; 2026-02-21T09:20:56.0298604Z add.s32 %r60, %r1564, 96256; 2026-02-21T09:20:56.0298774Z add.s32 %r61, %r37, 16384; 2026-02-21T09:20:56.0299025Z add.s32 %r62, %r37, 18432; 2026-02-21T09:20:56.0299201Z add.s32 %r63, %r37, 20480; 2026-02-21T09:20:56.0299377Z add.s32 %r64, %r37, 22528; 2026-02-21T09:20:56.0299550Z or.b32 %r65, %r11870, 262144; 2026-02-21T09:20:56.0299732Z add.s32 %r66, %r1564, 92160; 2026-02-21T09:20:56.0299903Z add.s32 %r67, %r37, 57344; 2026-02-21T09:20:56.0300075Z add.s32 %r68, %r37, 59392; 2026-02-21T09:20:56.0300248Z add.s32 %r69, %r37, 61440; 2026-02-21T09:20:56.0300416Z add.s32 %r70, %r37, 63488; 2026-02-21T09:20:56.0300616Z or.b32 %r71, %r11870, 327680; 2026-02-21T09:20:56.0300799Z add.s32 %r72, %r1564, 97280; 
2026-02-21T09:20:56.0300979Z add.s32 %r73, %r37, 24576; 2026-02-21T09:20:56.0301150Z add.s32 %r74, %r37, 26624; 2026-02-21T09:20:56.0301325Z add.s32 %r75, %r37, 28672; 2026-02-21T09:20:56.0301569Z add.s32 %r76, %r37, 30720; 2026-02-21T09:20:56.0301760Z or.b32 %r77, %r11870, 393216; 2026-02-21T09:20:56.0301935Z add.s32 %r78, %r1564, 93184; 2026-02-21T09:20:56.0302118Z add.s32 %r79, %r37, 65536; 2026-02-21T09:20:56.0302295Z add.s32 %r80, %r37, 67584; 2026-02-21T09:20:56.0302465Z add.s32 %r81, %r37, 69632; 2026-02-21T09:20:56.0302638Z add.s32 %r82, %r37, 71680; 2026-02-21T09:20:56.0302809Z or.b32 %r83, %r11870, 458752; 2026-02-21T09:20:56.0302991Z add.s32 %r84, %r1564, 98304; 2026-02-21T09:20:56.0303161Z add.s32 %r85, %r37, 32768; 2026-02-21T09:20:56.0303331Z add.s32 %r86, %r37, 34816; 2026-02-21T09:20:56.0303496Z add.s32 %r87, %r37, 36864; 2026-02-21T09:20:56.0303669Z add.s32 %r88, %r37, 38912; 2026-02-21T09:20:56.0303833Z or.b32 %r89, %r11870, 524288; 2026-02-21T09:20:56.0304010Z add.s32 %r90, %r1564, 94208; 2026-02-21T09:20:56.0304188Z add.s32 %r91, %r37, 73728; 2026-02-21T09:20:56.0304352Z add.s32 %r92, %r37, 75776; 2026-02-21T09:20:56.0304527Z add.s32 %r93, %r37, 77824; 2026-02-21T09:20:56.0304691Z add.s32 %r94, %r37, 79872; 2026-02-21T09:20:56.0304864Z or.b32 %r95, %r11870, 589824; 2026-02-21T09:20:56.0305035Z add.s32 %r96, %r1564, 99328; 2026-02-21T09:20:56.0305215Z or.b32 %r1568, %r11872, %r11873; 2026-02-21T09:20:56.0305406Z or.b32 %r1569, %r1568, %r11874; 2026-02-21T09:20:56.0305599Z or.b32 %r97, %r1569, %r1561; 2026-02-21T09:20:56.0305781Z xor.b32 %r98, %r97, 8; 2026-02-21T09:20:56.0305966Z shl.b32 %r1570, %r11875, 6; 2026-02-21T09:20:56.0306149Z or.b32 %r1573, %r1570, %r11878; 2026-02-21T09:20:56.0306330Z or.b32 %r1574, %r1573, %r11877; 2026-02-21T09:20:56.0306645Z add.s32 %r1575, %r11869, 81920; 2026-02-21T09:20:56.0306834Z add.s32 %r101, %r1575, %r1574; 2026-02-21T09:20:56.0307034Z xor.b32 %r1576, %r1574, 16; 2026-02-21T09:20:56.0307216Z add.s32 %r102, 
%r1575, %r1576; 2026-02-21T09:20:56.0307404Z xor.b32 %r1577, %r1574, 32; 2026-02-21T09:20:56.0307580Z add.s32 %r103, %r1575, %r1577; 2026-02-21T09:20:56.0307765Z xor.b32 %r1578, %r1574, 48; 2026-02-21T09:20:56.0307945Z add.s32 %r104, %r1575, %r1578; 2026-02-21T09:20:56.0308127Z bfe.u32 %r1579, %r1575, 4, 14; 2026-02-21T09:20:56.0308405Z cvt.u64.u32 %rd47, %r1579; 2026-02-21T09:20:56.0308609Z or.b64 %rd1, %rd47, -9223371899382267904; 2026-02-21T09:20:56.0308833Z add.s32 %r1580, %r11869, 81952; 2026-02-21T09:20:56.0309020Z bfe.u32 %r1581, %r1580, 4, 14; 2026-02-21T09:20:56.0309312Z cvt.u64.u32 %rd48, %r1581; 2026-02-21T09:20:56.0309500Z or.b64 %rd2, %rd48, -9223371899382267904; 2026-02-21T09:20:56.0309718Z and.b32 %r1584, %r11880, 7264; 2026-02-21T09:20:56.0309906Z shl.b32 %r1586, %r11881, 4; 2026-02-21T09:20:56.0310086Z or.b32 %r1588, %r11879, %r11882; 2026-02-21T09:20:56.0310366Z or.b32 %r1589, %r1584, %r1586; 2026-02-21T09:20:56.0310544Z or.b32 %r1590, %r1588, %r1589; 2026-02-21T09:20:56.0310734Z add.s32 %r105, %r11869, %r1590; 2026-02-21T09:20:56.0310921Z xor.b32 %r1591, %r1590, 32; 2026-02-21T09:20:56.0311103Z add.s32 %r106, %r11869, %r1591; 2026-02-21T09:20:56.0311287Z xor.b32 %r1592, %r1590, 64; 2026-02-21T09:20:56.0311469Z add.s32 %r107, %r11869, %r1592; 2026-02-21T09:20:56.0311650Z xor.b32 %r1593, %r1590, 96; 2026-02-21T09:20:56.0311906Z add.s32 %r108, %r11869, %r1593; 2026-02-21T09:20:56.0312107Z shl.b32 %r1594, %r11881, 10; 2026-02-21T09:20:56.0312288Z or.b32 %r1597, %r1594, %r11883; 2026-02-21T09:20:56.0312483Z xor.b32 %r1598, %r1597, %r11884; 2026-02-21T09:20:56.0312673Z add.s32 %r3394, %r11869, %r1598; 2026-02-21T09:20:56.0312863Z add.s32 %r3399, %r3394, 1024; 2026-02-21T09:20:56.0313037Z add.s32 %r3404, %r3394, 2048; 2026-02-21T09:20:56.0313212Z add.s32 %r3409, %r3394, 3072; 2026-02-21T09:20:56.0313385Z add.s32 %r3414, %r3394, 4096; 2026-02-21T09:20:56.0313561Z add.s32 %r3419, %r3394, 5120; 2026-02-21T09:20:56.0313738Z add.s32 %r3424, %r3394, 6144; 
2026-02-21T09:20:56.0313907Z add.s32 %r3429, %r3394, 7168; 2026-02-21T09:20:56.0314328Z .loc 1 43 78 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:43:78 2026-02-21T09:20:56.0314703Z or.b32 %r1600, %r11885, %r7; 2026-02-21T09:20:56.0314889Z or.b32 %r117, %r1600, 720896; 2026-02-21T09:20:56.0315222Z .loc 1 22 121 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:22:121 2026-02-21T09:20:56.0315600Z mad.wide.u32 %rd3, %r33, 8, %rd44; 2026-02-21T09:20:56.0315940Z .loc 1 43 78 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:43:78 2026-02-21T09:20:56.0316302Z or.b32 %r1602, %r11886, %r34; 2026-02-21T09:20:56.0316614Z or.b32 %r121, %r1602, 196784; 2026-02-21T09:20:56.0316849Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:20:56.0317145Z // Child Loop BB0_3 Depth 2 2026-02-21T09:20:56.0317413Z // Child Loop BB0_5 Depth 2 2026-02-21T09:20:56.0317684Z // Child Loop BB0_7 Depth 2 2026-02-21T09:20:56.0317941Z // Child Loop BB0_9 Depth 2 2026-02-21T09:20:56.0318342Z .loc 1 28 35 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:28:35 2026-02-21T09:20:56.0318709Z shr.s32 %r1706, %r11887, 31; 2026-02-21T09:20:56.0318887Z shr.u32 %r1707, %r1706, 23; 2026-02-21T09:20:56.0319072Z add.s32 %r1708, %r11887, %r1707; 2026-02-21T09:20:56.0319261Z shr.s32 %r1709, %r1708, 9; 2026-02-21T09:20:56.0319580Z .loc 1 29 33 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:29:33 2026-02-21T09:20:56.0319935Z shl.b32 %r1710, %r1709, 3; 2026-02-21T09:20:56.0320265Z .loc 1 30 39 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:30:39 2026-02-21T09:20:56.0320623Z sub.s32 %r1711, 64, %r1710; 2026-02-21T09:20:56.0320936Z .loc 1 30 52 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:30:52 2026-02-21T09:20:56.0321297Z min.s32 %r1712, %r1711, 8; 2026-02-21T09:20:56.0321610Z .loc 1 31 45 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:31:45 2026-02-21T09:20:56.0321967Z and.b32 %r1713, %r1708, -512; 
2026-02-21T09:20:56.0322146Z sub.s32 %r1714, %r11887, %r1713; 2026-02-21T09:20:56.0322475Z .loc 1 32 51 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:32:51 2026-02-21T09:20:56.0322935Z div.s32 %r1715, %r1714, %r1712; 2026-02-21T09:20:56.0323262Z .loc 1 31 64 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:31:64 2026-02-21T09:20:56.0323628Z mul.lo.s32 %r1716, %r1715, %r1712; 2026-02-21T09:20:56.0323822Z sub.s32 %r1717, %r1714, %r1716; 2026-02-21T09:20:56.0324225Z .loc 1 31 30 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:31:30 2026-02-21T09:20:56.0324575Z add.s32 %r1718, %r1717, %r1710; 2026-02-21T09:20:56.0324911Z .loc 1 33 27 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:33:27 2026-02-21T09:20:56.0325271Z shl.b32 %r211, %r1718, 7; 2026-02-21T09:20:56.0325587Z .loc 1 34 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:34:32 2026-02-21T09:20:56.0326020Z or.b32 %r1719, %r211, %r7; 2026-02-21T09:20:56.0326348Z .loc 1 35 27 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:35:27 2026-02-21T09:20:56.0326842Z shl.b32 %r212, %r1715, 8; 2026-02-21T09:20:56.0327156Z .loc 1 36 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:36:32 2026-02-21T09:20:56.0327516Z or.b32 %r1720, %r212, %r11; 2026-02-21T09:20:56.0327695Z or.b32 %r1721, %r212, %r12; 2026-02-21T09:20:56.0327866Z or.b32 %r1722, %r212, %r13; 2026-02-21T09:20:56.0328041Z or.b32 %r1723, %r212, %r14; 2026-02-21T09:20:56.0328352Z .loc 1 51 53 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:53 2026-02-21T09:20:56.0328701Z shl.b32 %r1724, %r1720, 10; 2026-02-21T09:20:56.0328960Z shl.b32 %r1725, %r1721, 10; 2026-02-21T09:20:56.0329147Z shl.b32 %r1726, %r1722, 10; 2026-02-21T09:20:56.0329316Z shl.b32 %r1727, %r1723, 10; 2026-02-21T09:20:56.0329633Z .loc 1 51 60 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:60 2026-02-21T09:20:56.0329991Z or.b32 %r1728, %r1724, %r34; 2026-02-21T09:20:56.0330165Z or.b32 %r1729, 
%r1725, %r34; 2026-02-21T09:20:56.0330344Z or.b32 %r1730, %r1726, %r34; 2026-02-21T09:20:56.0330513Z or.b32 %r1731, %r1727, %r34; 2026-02-21T09:20:56.0330830Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0331204Z mad.wide.s32 %rd49, %r1728, 2, %rd44; 2026-02-21T09:20:56.0331420Z mad.wide.s32 %rd50, %r1729, 2, %rd44; 2026-02-21T09:20:56.0331627Z mad.wide.s32 %rd51, %r1730, 2, %rd44; 2026-02-21T09:20:56.0331823Z mad.wide.s32 %rd52, %r1731, 2, %rd44; 2026-02-21T09:20:56.0332172Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0332518Z bar.sync 0; 2026-02-21T09:20:56.0332671Z mov.b32 %r1604, 8; 2026-02-21T09:20:56.0332830Z // begin inline asm 2026-02-21T09:20:56.0333069Z cp.async.ca.shared.global [ %r37 + 0 ], [ %rd49 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0333344Z // end inline asm 2026-02-21T09:20:56.0333500Z // begin inline asm 2026-02-21T09:20:56.0333731Z cp.async.ca.shared.global [ %r38 + 0 ], [ %rd50 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0333999Z // end inline asm 2026-02-21T09:20:56.0334153Z // begin inline asm 2026-02-21T09:20:56.0334375Z cp.async.ca.shared.global [ %r39 + 0 ], [ %rd51 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0334649Z // end inline asm 2026-02-21T09:20:56.0334796Z // begin inline asm 2026-02-21T09:20:56.0335022Z cp.async.ca.shared.global [ %r40 + 0 ], [ %rd52 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0335293Z // end inline asm 2026-02-21T09:20:56.0335457Z cp.async.commit_group; 2026-02-21T09:20:56.0335777Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0336139Z add.s32 %r1732, %r1719, %r11870; 2026-02-21T09:20:56.0336745Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0337131Z cvt.s64.s32 %rd100, %r1732; 2026-02-21T09:20:56.0337325Z add.s64 %rd53, %rd45, %rd100; 2026-02-21T09:20:56.0337610Z mov.b32 %r11891, 4; 2026-02-21T09:20:56.0337923Z .loc 
1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0338286Z // begin inline asm 2026-02-21T09:20:56.0338522Z cp.async.ca.shared.global [ %r42 + 0 ], [ %rd53 + 0 ], 0x4, %r11891; 2026-02-21T09:20:56.0338873Z // end inline asm 2026-02-21T09:20:56.0339031Z cp.async.commit_group; 2026-02-21T09:20:56.0339349Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0339706Z cvt.s64.s32 %rd101, %r1724; 2026-02-21T09:20:56.0339895Z or.b64 %rd102, %rd101, %rd633; 2026-02-21T09:20:56.0340080Z shl.b64 %rd103, %rd102, 1; 2026-02-21T09:20:56.0340265Z add.s64 %rd104, %rd44, %rd103; 2026-02-21T09:20:56.0340565Z add.s64 %rd54, %rd104, 32; 2026-02-21T09:20:56.0340747Z cvt.s64.s32 %rd105, %r1725; 2026-02-21T09:20:56.0340927Z or.b64 %rd106, %rd105, %rd633; 2026-02-21T09:20:56.0341107Z shl.b64 %rd107, %rd106, 1; 2026-02-21T09:20:56.0341286Z add.s64 %rd108, %rd44, %rd107; 2026-02-21T09:20:56.0341465Z add.s64 %rd55, %rd108, 32; 2026-02-21T09:20:56.0341644Z cvt.s64.s32 %rd109, %r1726; 2026-02-21T09:20:56.0341822Z or.b64 %rd110, %rd109, %rd633; 2026-02-21T09:20:56.0342005Z shl.b64 %rd111, %rd110, 1; 2026-02-21T09:20:56.0342177Z add.s64 %rd112, %rd44, %rd111; 2026-02-21T09:20:56.0342357Z add.s64 %rd56, %rd112, 32; 2026-02-21T09:20:56.0342536Z cvt.s64.s32 %rd113, %r1727; 2026-02-21T09:20:56.0342723Z or.b64 %rd114, %rd113, %rd633; 2026-02-21T09:20:56.0342905Z shl.b64 %rd115, %rd114, 1; 2026-02-21T09:20:56.0343148Z add.s64 %rd116, %rd44, %rd115; 2026-02-21T09:20:56.0343349Z add.s64 %rd57, %rd116, 32; 2026-02-21T09:20:56.0343668Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0344027Z // begin inline asm 2026-02-21T09:20:56.0344254Z cp.async.ca.shared.global [ %r43 + 0 ], [ %rd54 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0344533Z // end inline asm 2026-02-21T09:20:56.0344685Z // begin inline asm 2026-02-21T09:20:56.0344907Z cp.async.ca.shared.global [ %r44 + 0 ], 
[ %rd55 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0345180Z // end inline asm 2026-02-21T09:20:56.0345325Z // begin inline asm 2026-02-21T09:20:56.0345552Z cp.async.ca.shared.global [ %r45 + 0 ], [ %rd56 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0345831Z // end inline asm 2026-02-21T09:20:56.0345983Z // begin inline asm 2026-02-21T09:20:56.0346204Z cp.async.ca.shared.global [ %r46 + 0 ], [ %rd57 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0346608Z // end inline asm 2026-02-21T09:20:56.0346778Z cp.async.commit_group; 2026-02-21T09:20:56.0347102Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0347470Z add.s32 %r1733, %r1719, %r47; 2026-02-21T09:20:56.0347791Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0348152Z cvt.s64.s32 %rd117, %r1733; 2026-02-21T09:20:56.0348418Z add.s64 %rd58, %rd45, %rd117; 2026-02-21T09:20:56.0348755Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0349105Z // begin inline asm 2026-02-21T09:20:56.0349354Z cp.async.ca.shared.global [ %r48 + 0 ], [ %rd58 + 0 ], 0x4, %r11891; 2026-02-21T09:20:56.0349639Z // end inline asm 2026-02-21T09:20:56.0349798Z cp.async.commit_group; 2026-02-21T09:20:56.0350115Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0350472Z add.s64 %rd59, %rd104, 64; 2026-02-21T09:20:56.0350657Z add.s64 %rd60, %rd108, 64; 2026-02-21T09:20:56.0350838Z add.s64 %rd61, %rd112, 64; 2026-02-21T09:20:56.0351019Z add.s64 %rd62, %rd116, 64; 2026-02-21T09:20:56.0351341Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0351688Z bar.sync 0; 2026-02-21T09:20:56.0351966Z // begin inline asm 2026-02-21T09:20:56.0352202Z cp.async.ca.shared.global [ %r49 + 0 ], [ %rd59 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0352478Z // end inline asm 2026-02-21T09:20:56.0352626Z // begin inline asm 
2026-02-21T09:20:56.0352862Z cp.async.ca.shared.global [ %r50 + 0 ], [ %rd60 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0353212Z // end inline asm 2026-02-21T09:20:56.0353362Z // begin inline asm 2026-02-21T09:20:56.0353583Z cp.async.ca.shared.global [ %r51 + 0 ], [ %rd61 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0353855Z // end inline asm 2026-02-21T09:20:56.0354006Z // begin inline asm 2026-02-21T09:20:56.0354223Z cp.async.ca.shared.global [ %r52 + 0 ], [ %rd62 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0354495Z // end inline asm 2026-02-21T09:20:56.0354652Z cp.async.commit_group; 2026-02-21T09:20:56.0355053Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0355417Z add.s32 %r1734, %r1719, %r53; 2026-02-21T09:20:56.0355745Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0356105Z cvt.s64.s32 %rd118, %r1734; 2026-02-21T09:20:56.0356285Z add.s64 %rd63, %rd45, %rd118; 2026-02-21T09:20:56.0356761Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0357116Z // begin inline asm 2026-02-21T09:20:56.0357349Z cp.async.ca.shared.global [ %r54 + 0 ], [ %rd63 + 0 ], 0x4, %r11891; 2026-02-21T09:20:56.0357622Z // end inline asm 2026-02-21T09:20:56.0357858Z cp.async.commit_group; 2026-02-21T09:20:56.0358171Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0358531Z add.s64 %rd64, %rd104, 96; 2026-02-21T09:20:56.0358716Z add.s64 %rd65, %rd108, 96; 2026-02-21T09:20:56.0358891Z add.s64 %rd66, %rd112, 96; 2026-02-21T09:20:56.0359070Z add.s64 %rd67, %rd116, 96; 2026-02-21T09:20:56.0359382Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0359738Z // begin inline asm 2026-02-21T09:20:56.0359964Z cp.async.ca.shared.global [ %r55 + 0 ], [ %rd64 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0360253Z // end inline asm 2026-02-21T09:20:56.0360402Z // begin inline 
asm 2026-02-21T09:20:56.0360633Z cp.async.ca.shared.global [ %r56 + 0 ], [ %rd65 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0360906Z // end inline asm 2026-02-21T09:20:56.0361063Z // begin inline asm 2026-02-21T09:20:56.0361294Z cp.async.ca.shared.global [ %r57 + 0 ], [ %rd66 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0361559Z // end inline asm 2026-02-21T09:20:56.0361713Z // begin inline asm 2026-02-21T09:20:56.0361936Z cp.async.ca.shared.global [ %r58 + 0 ], [ %rd67 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0362205Z // end inline asm 2026-02-21T09:20:56.0362361Z cp.async.commit_group; 2026-02-21T09:20:56.0362681Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0363046Z add.s32 %r1735, %r1719, %r59; 2026-02-21T09:20:56.0363370Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0363741Z cvt.s64.s32 %rd119, %r1735; 2026-02-21T09:20:56.0363929Z add.s64 %rd68, %rd45, %rd119; 2026-02-21T09:20:56.0364253Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0364604Z // begin inline asm 2026-02-21T09:20:56.0364842Z cp.async.ca.shared.global [ %r60 + 0 ], [ %rd68 + 0 ], 0x4, %r11891; 2026-02-21T09:20:56.0365120Z // end inline asm 2026-02-21T09:20:56.0365276Z cp.async.commit_group; 2026-02-21T09:20:56.0365593Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0365944Z add.s64 %rd69, %rd104, 128; 2026-02-21T09:20:56.0366124Z add.s64 %rd70, %rd108, 128; 2026-02-21T09:20:56.0366298Z add.s64 %rd71, %rd112, 128; 2026-02-21T09:20:56.0366685Z add.s64 %rd72, %rd116, 128; 2026-02-21T09:20:56.0367000Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0367346Z bar.sync 0; 2026-02-21T09:20:56.0367569Z // begin inline asm 2026-02-21T09:20:56.0367793Z cp.async.ca.shared.global [ %r61 + 0 ], [ %rd69 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0368071Z // end inline 
asm 2026-02-21T09:20:56.0368219Z // begin inline asm 2026-02-21T09:20:56.0368445Z cp.async.ca.shared.global [ %r62 + 0 ], [ %rd70 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0368711Z // end inline asm 2026-02-21T09:20:56.0368862Z // begin inline asm 2026-02-21T09:20:56.0369080Z cp.async.ca.shared.global [ %r63 + 0 ], [ %rd71 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0369425Z // end inline asm 2026-02-21T09:20:56.0369585Z // begin inline asm 2026-02-21T09:20:56.0369805Z cp.async.ca.shared.global [ %r64 + 0 ], [ %rd72 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0370075Z // end inline asm 2026-02-21T09:20:56.0370228Z cp.async.commit_group; 2026-02-21T09:20:56.0370552Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0370909Z add.s32 %r1736, %r1719, %r65; 2026-02-21T09:20:56.0371237Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0371593Z cvt.s64.s32 %rd120, %r1736; 2026-02-21T09:20:56.0371773Z add.s64 %rd73, %rd45, %rd120; 2026-02-21T09:20:56.0372176Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0372531Z // begin inline asm 2026-02-21T09:20:56.0372764Z cp.async.ca.shared.global [ %r66 + 0 ], [ %rd73 + 0 ], 0x4, %r11891; 2026-02-21T09:20:56.0373035Z // end inline asm 2026-02-21T09:20:56.0373193Z cp.async.commit_group; 2026-02-21T09:20:56.0373499Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0373855Z add.s64 %rd74, %rd104, 160; 2026-02-21T09:20:56.0374037Z add.s64 %rd75, %rd108, 160; 2026-02-21T09:20:56.0374209Z add.s64 %rd76, %rd112, 160; 2026-02-21T09:20:56.0374393Z add.s64 %rd77, %rd116, 160; 2026-02-21T09:20:56.0374712Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0375068Z // begin inline asm 2026-02-21T09:20:56.0375295Z cp.async.ca.shared.global [ %r67 + 0 ], [ %rd74 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0375573Z // 
end inline asm 2026-02-21T09:20:56.0375726Z // begin inline asm 2026-02-21T09:20:56.0375957Z cp.async.ca.shared.global [ %r68 + 0 ], [ %rd75 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0376228Z // end inline asm 2026-02-21T09:20:56.0376379Z // begin inline asm 2026-02-21T09:20:56.0376746Z cp.async.ca.shared.global [ %r69 + 0 ], [ %rd76 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0377013Z // end inline asm 2026-02-21T09:20:56.0377164Z // begin inline asm 2026-02-21T09:20:56.0377385Z cp.async.ca.shared.global [ %r70 + 0 ], [ %rd77 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0377653Z // end inline asm 2026-02-21T09:20:56.0377808Z cp.async.commit_group; 2026-02-21T09:20:56.0378125Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0378483Z add.s32 %r1737, %r1719, %r71; 2026-02-21T09:20:56.0378798Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0379159Z cvt.s64.s32 %rd121, %r1737; 2026-02-21T09:20:56.0379336Z add.s64 %rd78, %rd45, %rd121; 2026-02-21T09:20:56.0379660Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0380007Z // begin inline asm 2026-02-21T09:20:56.0380241Z cp.async.ca.shared.global [ %r72 + 0 ], [ %rd78 + 0 ], 0x4, %r11891; 2026-02-21T09:20:56.0380515Z // end inline asm 2026-02-21T09:20:56.0380666Z cp.async.commit_group; 2026-02-21T09:20:56.0381069Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0381420Z add.s64 %rd79, %rd104, 192; 2026-02-21T09:20:56.0381599Z add.s64 %rd80, %rd108, 192; 2026-02-21T09:20:56.0381774Z add.s64 %rd81, %rd112, 192; 2026-02-21T09:20:56.0382022Z add.s64 %rd82, %rd116, 192; 2026-02-21T09:20:56.0382332Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0382685Z bar.sync 0; 2026-02-21T09:20:56.0382833Z // begin inline asm 2026-02-21T09:20:56.0383057Z cp.async.ca.shared.global [ %r73 + 0 ], [ %rd79 + 
0 ], 0x8, %r1604; 2026-02-21T09:20:56.0383329Z // end inline asm 2026-02-21T09:20:56.0383476Z // begin inline asm 2026-02-21T09:20:56.0383776Z cp.async.ca.shared.global [ %r74 + 0 ], [ %rd80 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0384060Z // end inline asm 2026-02-21T09:20:56.0384216Z // begin inline asm 2026-02-21T09:20:56.0384433Z cp.async.ca.shared.global [ %r75 + 0 ], [ %rd81 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0384703Z // end inline asm 2026-02-21T09:20:56.0384853Z // begin inline asm 2026-02-21T09:20:56.0385073Z cp.async.ca.shared.global [ %r76 + 0 ], [ %rd82 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0385344Z // end inline asm 2026-02-21T09:20:56.0385498Z cp.async.commit_group; 2026-02-21T09:20:56.0385814Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0386167Z add.s32 %r1738, %r1719, %r77; 2026-02-21T09:20:56.0386704Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0387093Z cvt.s64.s32 %rd122, %r1738; 2026-02-21T09:20:56.0387272Z add.s64 %rd83, %rd45, %rd122; 2026-02-21T09:20:56.0387597Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0387961Z // begin inline asm 2026-02-21T09:20:56.0388196Z cp.async.ca.shared.global [ %r78 + 0 ], [ %rd83 + 0 ], 0x4, %r11891; 2026-02-21T09:20:56.0388558Z // end inline asm 2026-02-21T09:20:56.0388720Z cp.async.commit_group; 2026-02-21T09:20:56.0389026Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0389388Z add.s64 %rd84, %rd104, 224; 2026-02-21T09:20:56.0389571Z add.s64 %rd85, %rd108, 224; 2026-02-21T09:20:56.0389742Z add.s64 %rd86, %rd112, 224; 2026-02-21T09:20:56.0389925Z add.s64 %rd87, %rd116, 224; 2026-02-21T09:20:56.0390241Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0390611Z // begin inline asm 2026-02-21T09:20:56.0390850Z cp.async.ca.shared.global [ %r79 + 0 ], [ 
%rd84 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0391140Z // end inline asm 2026-02-21T09:20:56.0391294Z // begin inline asm 2026-02-21T09:20:56.0391529Z cp.async.ca.shared.global [ %r80 + 0 ], [ %rd85 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0391805Z // end inline asm 2026-02-21T09:20:56.0391966Z // begin inline asm 2026-02-21T09:20:56.0392197Z cp.async.ca.shared.global [ %r81 + 0 ], [ %rd86 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0392463Z // end inline asm 2026-02-21T09:20:56.0392614Z // begin inline asm 2026-02-21T09:20:56.0392835Z cp.async.ca.shared.global [ %r82 + 0 ], [ %rd87 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0393106Z // end inline asm 2026-02-21T09:20:56.0393261Z cp.async.commit_group; 2026-02-21T09:20:56.0393581Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0393948Z add.s32 %r1739, %r1719, %r83; 2026-02-21T09:20:56.0394273Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0394636Z cvt.s64.s32 %rd123, %r1739; 2026-02-21T09:20:56.0394822Z add.s64 %rd88, %rd45, %rd123; 2026-02-21T09:20:56.0395162Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0395605Z // begin inline asm 2026-02-21T09:20:56.0395840Z cp.async.ca.shared.global [ %r84 + 0 ], [ %rd88 + 0 ], 0x4, %r11891; 2026-02-21T09:20:56.0396120Z // end inline asm 2026-02-21T09:20:56.0396276Z cp.async.commit_group; 2026-02-21T09:20:56.0396822Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0397180Z add.s64 %rd89, %rd104, 256; 2026-02-21T09:20:56.0397365Z add.s64 %rd90, %rd108, 256; 2026-02-21T09:20:56.0397538Z add.s64 %rd91, %rd112, 256; 2026-02-21T09:20:56.0397718Z add.s64 %rd92, %rd116, 256; 2026-02-21T09:20:56.0398032Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0398383Z bar.sync 0; 2026-02-21T09:20:56.0398621Z // begin inline asm 
2026-02-21T09:20:56.0398867Z cp.async.ca.shared.global [ %r85 + 0 ], [ %rd89 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0399141Z // end inline asm 2026-02-21T09:20:56.0399293Z // begin inline asm 2026-02-21T09:20:56.0399521Z cp.async.ca.shared.global [ %r86 + 0 ], [ %rd90 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0399788Z // end inline asm 2026-02-21T09:20:56.0399940Z // begin inline asm 2026-02-21T09:20:56.0400164Z cp.async.ca.shared.global [ %r87 + 0 ], [ %rd91 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0400441Z // end inline asm 2026-02-21T09:20:56.0400607Z // begin inline asm 2026-02-21T09:20:56.0400829Z cp.async.ca.shared.global [ %r88 + 0 ], [ %rd92 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0401100Z // end inline asm 2026-02-21T09:20:56.0413983Z cp.async.commit_group; 2026-02-21T09:20:56.0414428Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0414840Z add.s32 %r1740, %r1719, %r89; 2026-02-21T09:20:56.0415205Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0415595Z cvt.s64.s32 %rd124, %r1740; 2026-02-21T09:20:56.0415798Z add.s64 %rd93, %rd45, %rd124; 2026-02-21T09:20:56.0416149Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0416694Z // begin inline asm 2026-02-21T09:20:56.0416949Z cp.async.ca.shared.global [ %r90 + 0 ], [ %rd93 + 0 ], 0x4, %r11891; 2026-02-21T09:20:56.0417245Z // end inline asm 2026-02-21T09:20:56.0417412Z cp.async.commit_group; 2026-02-21T09:20:56.0417750Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0418135Z add.s64 %rd94, %rd104, 288; 2026-02-21T09:20:56.0418330Z add.s64 %rd95, %rd108, 288; 2026-02-21T09:20:56.0418523Z add.s64 %rd96, %rd112, 288; 2026-02-21T09:20:56.0418712Z add.s64 %rd97, %rd116, 288; 2026-02-21T09:20:56.0419045Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0419401Z // begin 
inline asm 2026-02-21T09:20:56.0419652Z cp.async.ca.shared.global [ %r91 + 0 ], [ %rd94 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0419931Z // end inline asm 2026-02-21T09:20:56.0420094Z // begin inline asm 2026-02-21T09:20:56.0420327Z cp.async.ca.shared.global [ %r92 + 0 ], [ %rd95 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0420603Z // end inline asm 2026-02-21T09:20:56.0420755Z // begin inline asm 2026-02-21T09:20:56.0420997Z cp.async.ca.shared.global [ %r93 + 0 ], [ %rd96 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0421273Z // end inline asm 2026-02-21T09:20:56.0421422Z // begin inline asm 2026-02-21T09:20:56.0421654Z cp.async.ca.shared.global [ %r94 + 0 ], [ %rd97 + 0 ], 0x8, %r1604; 2026-02-21T09:20:56.0421920Z // end inline asm 2026-02-21T09:20:56.0422084Z cp.async.commit_group; 2026-02-21T09:20:56.0422403Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0422784Z add.s32 %r1741, %r1719, %r95; 2026-02-21T09:20:56.0423123Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0423582Z cvt.s64.s32 %rd125, %r1741; 2026-02-21T09:20:56.0423774Z add.s64 %rd98, %rd45, %rd125; 2026-02-21T09:20:56.0424091Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0424529Z // begin inline asm 2026-02-21T09:20:56.0424762Z cp.async.ca.shared.global [ %r96 + 0 ], [ %rd98 + 0 ], 0x4, %r11891; 2026-02-21T09:20:56.0425044Z // end inline asm 2026-02-21T09:20:56.0425201Z cp.async.commit_group; 2026-02-21T09:20:56.0425518Z .loc 1 43 78 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:43:78 2026-02-21T09:20:56.0425883Z add.s32 %r11889, %r117, %r211; 2026-02-21T09:20:56.0426069Z or.b32 %r1742, %r13, %r212; 2026-02-21T09:20:56.0426349Z shl.b32 %r1743, %r1742, 10; 2026-02-21T09:20:56.0426667Z mul.wide.s32 %rd8, %r1743, 2; 2026-02-21T09:20:56.0426858Z or.b32 %r1744, %r12, %r212; 2026-02-21T09:20:56.0427045Z shl.b32 %r1745, %r1744, 10; 
2026-02-21T09:20:56.0427257Z mul.wide.s32 %rd9, %r1745, 2; 2026-02-21T09:20:56.0427445Z shl.b32 %r1746, %r1715, 18; 2026-02-21T09:20:56.0427631Z or.b32 %r1747, %r11886, %r1746; 2026-02-21T09:20:56.0427836Z mul.wide.s32 %rd10, %r1747, 2; 2026-02-21T09:20:56.0428030Z or.b32 %r11888, %r121, %r1746; 2026-02-21T09:20:56.0428224Z mov.b32 %r11892, 0f00000000; 2026-02-21T09:20:56.0428491Z mov.b32 %r11890, -1; 2026-02-21T09:20:56.0428670Z mov.b64 %rd635, -16; 2026-02-21T09:20:56.0428833Z mov.b64 %rd634, %rd3; 2026-02-21T09:20:56.0429008Z mov.b32 %r11893, %r11892; 2026-02-21T09:20:56.0429264Z mov.b32 %r11894, %r11892; 2026-02-21T09:20:56.0429451Z mov.b32 %r11895, %r11892; 2026-02-21T09:20:56.0429633Z mov.b32 %r11896, %r11892; 2026-02-21T09:20:56.0429813Z mov.b32 %r11897, %r11892; 2026-02-21T09:20:56.0429989Z mov.b32 %r11898, %r11892; 2026-02-21T09:20:56.0430160Z mov.b32 %r11899, %r11892; 2026-02-21T09:20:56.0430335Z mov.b32 %r11900, %r11892; 2026-02-21T09:20:56.0430501Z mov.b32 %r11901, %r11892; 2026-02-21T09:20:56.0430676Z mov.b32 %r11902, %r11892; 2026-02-21T09:20:56.0430852Z mov.b32 %r11903, %r11892; 2026-02-21T09:20:56.0431038Z mov.b32 %r11904, %r11892; 2026-02-21T09:20:56.0431218Z mov.b32 %r11905, %r11892; 2026-02-21T09:20:56.0431397Z mov.b32 %r11906, %r11892; 2026-02-21T09:20:56.0431566Z mov.b32 %r11907, %r11892; 2026-02-21T09:20:56.0431739Z mov.b32 %r11908, %r11892; 2026-02-21T09:20:56.0431916Z mov.b32 %r11909, %r11892; 2026-02-21T09:20:56.0432083Z mov.b32 %r11910, %r11892; 2026-02-21T09:20:56.0432254Z mov.b32 %r11911, %r11892; 2026-02-21T09:20:56.0432421Z mov.b32 %r11912, %r11892; 2026-02-21T09:20:56.0432595Z mov.b32 %r11913, %r11892; 2026-02-21T09:20:56.0432770Z mov.b32 %r11914, %r11892; 2026-02-21T09:20:56.0432944Z mov.b32 %r11915, %r11892; 2026-02-21T09:20:56.0433129Z mov.b32 %r11916, %r11892; 2026-02-21T09:20:56.0433305Z mov.b32 %r11917, %r11892; 2026-02-21T09:20:56.0433473Z mov.b32 %r11918, %r11892; 2026-02-21T09:20:56.0433644Z mov.b32 %r11919, %r11892; 
2026-02-21T09:20:56.0433818Z mov.b32 %r11920, %r11892; 2026-02-21T09:20:56.0433984Z mov.b32 %r11921, %r11892; 2026-02-21T09:20:56.0434156Z mov.b32 %r11922, %r11892; 2026-02-21T09:20:56.0434322Z mov.b32 %r11923, %r11892; 2026-02-21T09:20:56.0434497Z mov.b32 %r11924, %r11892; 2026-02-21T09:20:56.0434662Z mov.b32 %r11925, %r11892; 2026-02-21T09:20:56.0434835Z mov.b32 %r11926, %r11892; 2026-02-21T09:20:56.0435003Z mov.b32 %r11927, %r11892; 2026-02-21T09:20:56.0435178Z mov.b32 %r11928, %r11892; 2026-02-21T09:20:56.0435345Z mov.b32 %r11929, %r11892; 2026-02-21T09:20:56.0435520Z mov.b32 %r11930, %r11892; 2026-02-21T09:20:56.0435697Z mov.b32 %r11931, %r11892; 2026-02-21T09:20:56.0435864Z mov.b32 %r11932, %r11892; 2026-02-21T09:20:56.0436039Z mov.b32 %r11933, %r11892; 2026-02-21T09:20:56.0436207Z mov.b32 %r11934, %r11892; 2026-02-21T09:20:56.0436382Z mov.b32 %r11935, %r11892; 2026-02-21T09:20:56.0436679Z mov.b32 %r11936, %r11892; 2026-02-21T09:20:56.0436871Z mov.b32 %r11937, %r11892; 2026-02-21T09:20:56.0437128Z mov.b32 %r11938, %r11892; 2026-02-21T09:20:56.0437309Z mov.b32 %r11939, %r11892; 2026-02-21T09:20:56.0437490Z mov.b32 %r11940, %r11892; 2026-02-21T09:20:56.0437661Z mov.b32 %r11941, %r11892; 2026-02-21T09:20:56.0437926Z mov.b32 %r11942, %r11892; 2026-02-21T09:20:56.0438099Z mov.b32 %r11943, %r11892; 2026-02-21T09:20:56.0438277Z mov.b32 %r11944, %r11892; 2026-02-21T09:20:56.0438446Z mov.b32 %r11945, %r11892; 2026-02-21T09:20:56.0438624Z mov.b32 %r11946, %r11892; 2026-02-21T09:20:56.0438791Z mov.b32 %r11947, %r11892; 2026-02-21T09:20:56.0438967Z mov.b32 %r11948, %r11892; 2026-02-21T09:20:56.0439137Z mov.b32 %r11949, %r11892; 2026-02-21T09:20:56.0439313Z mov.b32 %r11950, %r11892; 2026-02-21T09:20:56.0439486Z mov.b32 %r11951, %r11892; 2026-02-21T09:20:56.0439726Z mov.b32 %r11952, %r11892; 2026-02-21T09:20:56.0439902Z mov.b32 %r11953, %r11892; 2026-02-21T09:20:56.0440080Z mov.b32 %r11954, %r11892; 2026-02-21T09:20:56.0440265Z mov.b32 %r11955, %r11892; 
2026-02-21T09:20:56.0440432Z mov.b32 %r11956, %r11892; 2026-02-21T09:20:56.0440606Z mov.b32 %r11957, %r11892; 2026-02-21T09:20:56.0440775Z mov.b32 %r11958, %r11892; 2026-02-21T09:20:56.0440948Z mov.b32 %r11959, %r11892; 2026-02-21T09:20:56.0441126Z mov.b32 %r11960, %r11892; 2026-02-21T09:20:56.0441297Z mov.b32 %r11961, %r11892; 2026-02-21T09:20:56.0441469Z mov.b32 %r11962, %r11892; 2026-02-21T09:20:56.0441640Z mov.b32 %r11963, %r11892; 2026-02-21T09:20:56.0441812Z mov.b32 %r11964, %r11892; 2026-02-21T09:20:56.0441979Z mov.b32 %r11965, %r11892; 2026-02-21T09:20:56.0442235Z mov.b32 %r11966, %r11892; 2026-02-21T09:20:56.0442415Z mov.b32 %r11967, %r11892; 2026-02-21T09:20:56.0442588Z mov.b32 %r11968, %r11892; 2026-02-21T09:20:56.0442755Z mov.b32 %r11969, %r11892; 2026-02-21T09:20:56.0442932Z mov.b32 %r11970, %r11892; 2026-02-21T09:20:56.0443100Z mov.b32 %r11971, %r11892; 2026-02-21T09:20:56.0443273Z mov.b32 %r11972, %r11892; 2026-02-21T09:20:56.0443449Z mov.b32 %r11973, %r11892; 2026-02-21T09:20:56.0443618Z mov.b32 %r11974, %r11892; 2026-02-21T09:20:56.0443793Z mov.b32 %r11975, %r11892; 2026-02-21T09:20:56.0443962Z mov.b32 %r11976, %r11892; 2026-02-21T09:20:56.0444136Z mov.b32 %r11977, %r11892; 2026-02-21T09:20:56.0444304Z mov.b32 %r11978, %r11892; 2026-02-21T09:20:56.0444474Z mov.b32 %r11979, %r11892; 2026-02-21T09:20:56.0444642Z mov.b32 %r11980, %r11892; 2026-02-21T09:20:56.0444816Z mov.b32 %r11981, %r11892; 2026-02-21T09:20:56.0444981Z mov.b32 %r11982, %r11892; 2026-02-21T09:20:56.0445154Z mov.b32 %r11983, %r11892; 2026-02-21T09:20:56.0445331Z mov.b32 %r11984, %r11892; 2026-02-21T09:20:56.0445500Z mov.b32 %r11985, %r11892; 2026-02-21T09:20:56.0445671Z mov.b32 %r11986, %r11892; 2026-02-21T09:20:56.0445838Z mov.b32 %r11987, %r11892; 2026-02-21T09:20:56.0446018Z mov.b32 %r11988, %r11892; 2026-02-21T09:20:56.0446188Z mov.b32 %r11989, %r11892; 2026-02-21T09:20:56.0446377Z mov.b32 %r11990, %r11892; 2026-02-21T09:20:56.0446667Z mov.b32 %r11991, %r11892; 
2026-02-21T09:20:56.0446833Z mov.b32 %r11992, %r11892; 2026-02-21T09:20:56.0447005Z mov.b32 %r11993, %r11892; 2026-02-21T09:20:56.0447170Z mov.b32 %r11994, %r11892; 2026-02-21T09:20:56.0447342Z mov.b32 %r11995, %r11892; 2026-02-21T09:20:56.0447526Z mov.b32 %r11996, %r11892; 2026-02-21T09:20:56.0447697Z mov.b32 %r11997, %r11892; 2026-02-21T09:20:56.0447862Z mov.b32 %r11998, %r11892; 2026-02-21T09:20:56.0448034Z mov.b32 %r11999, %r11892; 2026-02-21T09:20:56.0448200Z mov.b32 %r12000, %r11892; 2026-02-21T09:20:56.0448373Z mov.b32 %r12001, %r11892; 2026-02-21T09:20:56.0448546Z mov.b32 %r12002, %r11892; 2026-02-21T09:20:56.0448716Z mov.b32 %r12003, %r11892; 2026-02-21T09:20:56.0448888Z mov.b32 %r12004, %r11892; 2026-02-21T09:20:56.0449056Z mov.b32 %r12005, %r11892; 2026-02-21T09:20:56.0449230Z mov.b32 %r12006, %r11892; 2026-02-21T09:20:56.0449397Z mov.b32 %r12007, %r11892; 2026-02-21T09:20:56.0449575Z mov.b32 %r12008, %r11892; 2026-02-21T09:20:56.0449863Z mov.b32 %r12009, %r11892; 2026-02-21T09:20:56.0450043Z mov.b32 %r12010, %r11892; 2026-02-21T09:20:56.0450212Z mov.b32 %r12011, %r11892; 2026-02-21T09:20:56.0450377Z mov.b32 %r12012, %r11892; 2026-02-21T09:20:56.0450542Z mov.b32 %r12013, %r11892; 2026-02-21T09:20:56.0450803Z mov.b32 %r12014, %r11892; 2026-02-21T09:20:56.0450966Z mov.b32 %r12015, %r11892; 2026-02-21T09:20:56.0451127Z mov.b32 %r12016, %r11892; 2026-02-21T09:20:56.0451295Z mov.b32 %r12017, %r11892; 2026-02-21T09:20:56.0451458Z mov.b32 %r12018, %r11892; 2026-02-21T09:20:56.0451623Z mov.b32 %r12019, %r11892; 2026-02-21T09:20:56.0451848Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T09:20:56.0452158Z // => This Inner Loop Header: Depth=2 2026-02-21T09:20:56.0452493Z add.s64 %rd635, %rd635, 16; 2026-02-21T09:20:56.0452699Z setp.lt.u64 %p11, %rd635, 432; 2026-02-21T09:20:56.0452898Z add.s32 %r3348, %r11890, 1; 2026-02-21T09:20:56.0453085Z setp.gt.s32 %p12, %r3348, 4; 2026-02-21T09:20:56.0453293Z selp.b32 %r11890, 0, %r3348, %p12; 
2026-02-21T09:20:56.0453655Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0454037Z cp.async.wait_group 16; 2026-02-21T09:20:56.0454211Z bar.sync 0; 2026-02-21T09:20:56.0454371Z shl.b32 %r3349, %r11890, 13; 2026-02-21T09:20:56.0454557Z add.s32 %r3351, %r11869, %r3349; 2026-02-21T09:20:56.0454900Z .loc 1 55 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:55:32 2026-02-21T09:20:56.0455342Z add.s32 %r3352, %r3351, %r97; 2026-02-21T09:20:56.0455539Z ld.shared.b16 %rs1, [%r3352]; 2026-02-21T09:20:56.0455745Z ld.shared.b16 %rs2, [%r3352+256]; 2026-02-21T09:20:56.0455952Z ld.shared.b16 %rs3, [%r3352+16]; 2026-02-21T09:20:56.0456154Z ld.shared.b16 %rs4, [%r3352+272]; 2026-02-21T09:20:56.0456352Z ld.shared.b16 %rs5, [%r3352+4096]; 2026-02-21T09:20:56.0456678Z ld.shared.b16 %rs6, [%r3352+4352]; 2026-02-21T09:20:56.0456881Z ld.shared.b16 %rs7, [%r3352+4112]; 2026-02-21T09:20:56.0457073Z ld.shared.b16 %rs8, [%r3352+4368]; 2026-02-21T09:20:56.0457263Z add.s32 %r3353, %r3351, %r98; 2026-02-21T09:20:56.0457445Z ld.shared.b16 %rs9, [%r3353]; 2026-02-21T09:20:56.0457631Z ld.shared.b16 %rs10, [%r3353+256]; 2026-02-21T09:20:56.0457828Z ld.shared.b16 %rs11, [%r3353+16]; 2026-02-21T09:20:56.0458019Z ld.shared.b16 %rs12, [%r3353+272]; 2026-02-21T09:20:56.0458222Z ld.shared.b16 %rs13, [%r3353+4096]; 2026-02-21T09:20:56.0458419Z ld.shared.b16 %rs14, [%r3353+4352]; 2026-02-21T09:20:56.0458619Z ld.shared.b16 %rs15, [%r3353+4112]; 2026-02-21T09:20:56.0458811Z ld.shared.b16 %rs16, [%r3353+4368]; 2026-02-21T09:20:56.0459008Z cvt.f32.bf16 %r1876, %rs1; 2026-02-21T09:20:56.0459189Z cvt.f32.bf16 %r1877, %rs2; 2026-02-21T09:20:56.0459359Z cvt.f32.bf16 %r1878, %rs9; 2026-02-21T09:20:56.0459534Z cvt.f32.bf16 %r1879, %rs10; 2026-02-21T09:20:56.0459707Z cvt.f32.bf16 %r2008, %rs3; 2026-02-21T09:20:56.0459880Z cvt.f32.bf16 %r2009, %rs4; 2026-02-21T09:20:56.0460049Z cvt.f32.bf16 %r2010, %rs11; 2026-02-21T09:20:56.0460230Z cvt.f32.bf16 
%r2011, %rs12; 2026-02-21T09:20:56.0460401Z cvt.f32.bf16 %r2140, %rs5; 2026-02-21T09:20:56.0460586Z cvt.f32.bf16 %r2141, %rs6; 2026-02-21T09:20:56.0460760Z cvt.f32.bf16 %r2142, %rs13; 2026-02-21T09:20:56.0460945Z cvt.f32.bf16 %r2143, %rs14; 2026-02-21T09:20:56.0461120Z cvt.f32.bf16 %r2272, %rs7; 2026-02-21T09:20:56.0461287Z cvt.f32.bf16 %r2273, %rs8; 2026-02-21T09:20:56.0461457Z cvt.f32.bf16 %r2274, %rs15; 2026-02-21T09:20:56.0461629Z cvt.f32.bf16 %r2275, %rs16; 2026-02-21T09:20:56.0461953Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0462312Z shl.b32 %r3354, %r11890, 10; 2026-02-21T09:20:56.0462493Z add.s32 %r3355, %r11869, %r3354; 2026-02-21T09:20:56.0462682Z add.s32 %r3356, %r3355, 90112; 2026-02-21T09:20:56.0463008Z .loc 1 70 45 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:70:45 2026-02-21T09:20:56.0463487Z add.s32 %r3357, %r3356, %r11875; 2026-02-21T09:20:56.0463673Z ld.shared.b8 %rs17, [%r3357]; 2026-02-21T09:20:56.0463861Z ld.shared.b8 %rs18, [%r3357+128]; 2026-02-21T09:20:56.0464118Z ld.shared.b8 %rs19, [%r3357+256]; 2026-02-21T09:20:56.0464309Z ld.shared.b8 %rs20, [%r3357+384]; 2026-02-21T09:20:56.0464494Z ld.shared.b8 %rs21, [%r3357+512]; 2026-02-21T09:20:56.0464684Z ld.shared.b8 %rs22, [%r3357+640]; 2026-02-21T09:20:56.0464880Z ld.shared.b8 %rs23, [%r3357+768]; 2026-02-21T09:20:56.0465076Z add.s32 %r3358, %r3356, %r11876; 2026-02-21T09:20:56.0465267Z ld.shared.b8 %rs24, [%r3358]; 2026-02-21T09:20:56.0465662Z .loc 1 60 28 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:60:28 2026-02-21T09:20:56.0466029Z shl.b16 %rs25, %rs17, 4; 2026-02-21T09:20:56.0466202Z shl.b16 %rs26, %rs18, 4; 2026-02-21T09:20:56.0466369Z shl.b16 %rs27, %rs19, 4; 2026-02-21T09:20:56.0466648Z shl.b16 %rs28, %rs20, 4; 2026-02-21T09:20:56.0466817Z shl.b16 %rs29, %rs21, 4; 2026-02-21T09:20:56.0466978Z shl.b16 %rs30, %rs22, 4; 2026-02-21T09:20:56.0467145Z shl.b16 %rs31, %rs23, 4; 2026-02-21T09:20:56.0467312Z 
shl.b16 %rs32, %rs24, 4; 2026-02-21T09:20:56.0467617Z .loc 1 75 58 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:75:58 2026-02-21T09:20:56.0467980Z selp.b16 %rs33, %rs25, %rs17, %p70; 2026-02-21T09:20:56.0468188Z cvt.s16.s8 %rs34, %rs33; 2026-02-21T09:20:56.0468420Z shr.s16 %rs35, %rs34, 4; 2026-02-21T09:20:56.0468676Z selp.b16 %rs36, %rs26, %rs18, %p70; 2026-02-21T09:20:56.0468878Z cvt.s16.s8 %rs37, %rs36; 2026-02-21T09:20:56.0469043Z shr.s16 %rs38, %rs37, 4; 2026-02-21T09:20:56.0469218Z selp.b16 %rs39, %rs27, %rs19, %p70; 2026-02-21T09:20:56.0469412Z cvt.s16.s8 %rs40, %rs39; 2026-02-21T09:20:56.0469587Z shr.s16 %rs41, %rs40, 4; 2026-02-21T09:20:56.0469758Z selp.b16 %rs42, %rs28, %rs20, %p70; 2026-02-21T09:20:56.0469950Z cvt.s16.s8 %rs43, %rs42; 2026-02-21T09:20:56.0470120Z shr.s16 %rs44, %rs43, 4; 2026-02-21T09:20:56.0470289Z selp.b16 %rs45, %rs29, %rs21, %p70; 2026-02-21T09:20:56.0470480Z cvt.s16.s8 %rs46, %rs45; 2026-02-21T09:20:56.0470642Z shr.s16 %rs47, %rs46, 4; 2026-02-21T09:20:56.0470815Z selp.b16 %rs48, %rs30, %rs22, %p70; 2026-02-21T09:20:56.0471003Z cvt.s16.s8 %rs49, %rs48; 2026-02-21T09:20:56.0471177Z shr.s16 %rs50, %rs49, 4; 2026-02-21T09:20:56.0471349Z selp.b16 %rs51, %rs31, %rs23, %p70; 2026-02-21T09:20:56.0471538Z cvt.s16.s8 %rs52, %rs51; 2026-02-21T09:20:56.0471706Z shr.s16 %rs53, %rs52, 4; 2026-02-21T09:20:56.0471877Z selp.b16 %rs54, %rs32, %rs24, %p70; 2026-02-21T09:20:56.0472070Z cvt.s16.s8 %rs55, %rs54; 2026-02-21T09:20:56.0472237Z shr.s16 %rs56, %rs55, 4; 2026-02-21T09:20:56.0472552Z .loc 1 80 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:80:32 2026-02-21T09:20:56.0472908Z cvt.rn.f32.s16 %r3359, %rs35; 2026-02-21T09:20:56.0473089Z cvt.rn.f32.s16 %r3360, %rs38; 2026-02-21T09:20:56.0473268Z cvt.rn.f32.s16 %r3361, %rs41; 2026-02-21T09:20:56.0473445Z cvt.rn.f32.s16 %r3362, %rs44; 2026-02-21T09:20:56.0473625Z cvt.rn.f32.s16 %r3363, %rs47; 2026-02-21T09:20:56.0473798Z cvt.rn.f32.s16 %r3364, %rs50; 
2026-02-21T09:20:56.0473976Z cvt.rn.f32.s16 %r3365, %rs53; 2026-02-21T09:20:56.0474148Z cvt.rn.f32.s16 %r3366, %rs56; 2026-02-21T09:20:56.0474328Z st.shared.b32 [%r101], %r3359; 2026-02-21T09:20:56.0474516Z st.shared.b32 [%r101+8], %r3360; 2026-02-21T09:20:56.0474708Z st.shared.b32 [%r102], %r3361; 2026-02-21T09:20:56.0474892Z st.shared.b32 [%r102+8], %r3362; 2026-02-21T09:20:56.0475084Z st.shared.b32 [%r103], %r3363; 2026-02-21T09:20:56.0475280Z st.shared.b32 [%r103+8], %r3364; 2026-02-21T09:20:56.0475472Z st.shared.b32 [%r104], %r3365; 2026-02-21T09:20:56.0475655Z st.shared.b32 [%r104+8], %r3366; 2026-02-21T09:20:56.0475833Z $L__tmp1: 2026-02-21T09:20:56.0476196Z .loc 2 291 36 // standard.py:291:36 @[ czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:87:40 ] 2026-02-21T09:20:56.0476915Z // begin inline asm 2026-02-21T09:20:56.0477119Z fence.proxy.async.shared::cta; 2026-02-21T09:20:56.0477302Z // end inline asm 2026-02-21T09:20:56.0477453Z bar.sync 0; 2026-02-21T09:20:56.0477708Z shfl.sync.idx.b32 %r3367, %r5, 0, 31, -1; 2026-02-21T09:20:56.0477932Z wgmma.fence.sync.aligned; 2026-02-21T09:20:56.0478116Z mov.pred %p2, -1; 2026-02-21T09:20:56.0478268Z // begin inline asm 2026-02-21T09:20:56.0479915Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r11892,%r11893,%r11894,%r11895,%r11896,%r11897,%r11898,%r11899,%r11900,%r11901,%r11902,%r11903,%r11904,%r11905,%r11906,%r11907,%r11908,%r11909,%r11910,%r11911,%r11912,%r11913,%r11914,%r11915,%r11916,%r11917,%r11918,%r11919,%r11920,%r11921,%r11922,%r11923,%r11924,%r11925,%r11926,%r11927,%r11928,%r11929,%r11930,%r11931,%r11932,%r11933,%r11934,%r11935,%r11936,%r11937,%r11938,%r11939,%r11940,%r11941,%r11942,%r11943,%r11944,%r11945,%r11946,%r11947,%r11948,%r11949,%r11950,%r11951,%r11952,%r11953,%r11954,%r11955}, {%r1876,%r1877,%r1878,%r1879}, %rd1, %p2, 1, 1; 2026-02-21T09:20:56.0481548Z // end inline asm 2026-02-21T09:20:56.0481694Z // begin inline asm 2026-02-21T09:20:56.0483325Z 
wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r11892,%r11893,%r11894,%r11895,%r11896,%r11897,%r11898,%r11899,%r11900,%r11901,%r11902,%r11903,%r11904,%r11905,%r11906,%r11907,%r11908,%r11909,%r11910,%r11911,%r11912,%r11913,%r11914,%r11915,%r11916,%r11917,%r11918,%r11919,%r11920,%r11921,%r11922,%r11923,%r11924,%r11925,%r11926,%r11927,%r11928,%r11929,%r11930,%r11931,%r11932,%r11933,%r11934,%r11935,%r11936,%r11937,%r11938,%r11939,%r11940,%r11941,%r11942,%r11943,%r11944,%r11945,%r11946,%r11947,%r11948,%r11949,%r11950,%r11951,%r11952,%r11953,%r11954,%r11955}, {%r2008,%r2009,%r2010,%r2011}, %rd2, %p2, 1, 1; 2026-02-21T09:20:56.0484932Z // end inline asm 2026-02-21T09:20:56.0485077Z // begin inline asm 2026-02-21T09:20:56.0486764Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r11956,%r11957,%r11958,%r11959,%r11960,%r11961,%r11962,%r11963,%r11964,%r11965,%r11966,%r11967,%r11968,%r11969,%r11970,%r11971,%r11972,%r11973,%r11974,%r11975,%r11976,%r11977,%r11978,%r11979,%r11980,%r11981,%r11982,%r11983,%r11984,%r11985,%r11986,%r11987,%r11988,%r11989,%r11990,%r11991,%r11992,%r11993,%r11994,%r11995,%r11996,%r11997,%r11998,%r11999,%r12000,%r12001,%r12002,%r12003,%r12004,%r12005,%r12006,%r12007,%r12008,%r12009,%r12010,%r12011,%r12012,%r12013,%r12014,%r12015,%r12016,%r12017,%r12018,%r12019}, {%r2140,%r2141,%r2142,%r2143}, %rd1, %p2, 1, 1; 2026-02-21T09:20:56.0488382Z // end inline asm 2026-02-21T09:20:56.0488530Z // begin inline asm 2026-02-21T09:20:56.0490096Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r11956,%r11957,%r11958,%r11959,%r11960,%r11961,%r11962,%r11963,%r11964,%r11965,%r11966,%r11967,%r11968,%r11969,%r11970,%r11971,%r11972,%r11973,%r11974,%r11975,%r11976,%r11977,%r11978,%r11979,%r11980,%r11981,%r11982,%r11983,%r11984,%r11985,%r11986,%r11987,%r11988,%r11989,%r11990,%r11991,%r11992,%r11993,%r11994,%r11995,%r11996,%r11997,%r11998,%r11999,%r12000,%r12001,%r12002,%r12003,%r12004,%r12005,%r12006,%r12007,%r12008,%r12009,%r12010,%r12011,%r12012,%r12013,%r12014,%r12015,%r12016,%r12017,%r12018,%r12019}, {%r2272,%r2273,%r2274,%r2275}, %rd2, %p2, 1, 1; 2026-02-21T09:20:56.0491709Z // end inline asm 2026-02-21T09:20:56.0491872Z wgmma.commit_group.sync.aligned; 2026-02-21T09:20:56.0492069Z mov.b32 %r3196, 0; 2026-02-21T09:20:56.0492223Z mov.b32 %r2404, %r1575; 2026-02-21T09:20:56.0492391Z mov.b32 %r2405, %r3196; 2026-02-21T09:20:56.0492555Z mov.b32 %r2406, %r3196; 2026-02-21T09:20:56.0492712Z // begin inline asm 2026-02-21T09:20:56.0495338Z // wait for regs: %r11892,%r11893,%r11894,%r11895,%r11896,%r11897,%r11898,%r11899,%r11900,%r11901,%r11902,%r11903,%r11904,%r11905,%r11906,%r11907,%r11908,%r11909,%r11910,%r11911,%r11912,%r11913,%r11914,%r11915,%r11916,%r11917,%r11918,%r11919,%r11920,%r11921,%r11922,%r11923,%r11924,%r11925,%r11926,%r11927,%r11928,%r11929,%r11930,%r11931,%r11932,%r11933,%r11934,%r11935,%r11936,%r11937,%r11938,%r11939,%r11940,%r11941,%r11942,%r11943,%r11944,%r11945,%r11946,%r11947,%r11948,%r11949,%r11950,%r11951,%r11952,%r11953,%r11954,%r11955,%r11956,%r11957,%r11958,%r11959,%r11960,%r11961,%r11962,%r11963,%r11964,%r11965,%r11966,%r11967,%r11968,%r11969,%r11970,%r11971,%r11972,%r11973,%r11974,%r11975,%r11976,%r11977,%r11978,%r11979,%r11980,%r11981,%r11982,%r11983,%r11984,%r11985,%r11986,%r11987,%r11988,%r11989,%r11990,%r11991,%r11992,%r11993,%r11994,%r11995,%r11996,%r11997,%r11998,%r11999,%r12000,%r12001,%r12002,%r12003,%r12004,%r12005,%r12006,%r12007,%r12008,%r12009,%r12010,%r12011,%r12012,%r12013,%r12014,%r12015,%r12016,%r12017,%r12018,%r12019,%
r2404,%r2405,%r2406 2026-02-21T09:20:56.0498460Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:20:56.0498732Z // end inline asm 2026-02-21T09:20:56.0498880Z $L__tmp2: 2026-02-21T09:20:56.0499176Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0499538Z add.s32 %r3368, %r11869, 40960; 2026-02-21T09:20:56.0499720Z add.s32 %r3369, %r3368, %r3349; 2026-02-21T09:20:56.0500041Z .loc 1 55 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:55:32 2026-02-21T09:20:56.0500401Z add.s32 %r3370, %r3369, %r97; 2026-02-21T09:20:56.0500597Z ld.shared.b16 %rs57, [%r3370]; 2026-02-21T09:20:56.0500797Z ld.shared.b16 %rs58, [%r3370+256]; 2026-02-21T09:20:56.0501002Z ld.shared.b16 %rs59, [%r3370+16]; 2026-02-21T09:20:56.0501194Z ld.shared.b16 %rs60, [%r3370+272]; 2026-02-21T09:20:56.0501465Z ld.shared.b16 %rs61, [%r3370+4096]; 2026-02-21T09:20:56.0501666Z ld.shared.b16 %rs62, [%r3370+4352]; 2026-02-21T09:20:56.0501864Z ld.shared.b16 %rs63, [%r3370+4112]; 2026-02-21T09:20:56.0502058Z ld.shared.b16 %rs64, [%r3370+4368]; 2026-02-21T09:20:56.0502262Z add.s32 %r3371, %r3369, %r98; 2026-02-21T09:20:56.0502443Z ld.shared.b16 %rs65, [%r3371]; 2026-02-21T09:20:56.0502630Z ld.shared.b16 %rs66, [%r3371+256]; 2026-02-21T09:20:56.0502825Z ld.shared.b16 %rs67, [%r3371+16]; 2026-02-21T09:20:56.0503016Z ld.shared.b16 %rs68, [%r3371+272]; 2026-02-21T09:20:56.0503214Z ld.shared.b16 %rs69, [%r3371+4096]; 2026-02-21T09:20:56.0503404Z ld.shared.b16 %rs70, [%r3371+4352]; 2026-02-21T09:20:56.0503602Z ld.shared.b16 %rs71, [%r3371+4112]; 2026-02-21T09:20:56.0503791Z ld.shared.b16 %rs72, [%r3371+4368]; 2026-02-21T09:20:56.0503997Z cvt.f32.bf16 %r2666, %rs57; 2026-02-21T09:20:56.0504176Z cvt.f32.bf16 %r2667, %rs58; 2026-02-21T09:20:56.0504353Z cvt.f32.bf16 %r2668, %rs65; 2026-02-21T09:20:56.0504529Z cvt.f32.bf16 %r2669, %rs66; 2026-02-21T09:20:56.0504701Z cvt.f32.bf16 %r2798, %rs59; 2026-02-21T09:20:56.0504874Z cvt.f32.bf16 %r2799, %rs60; 
2026-02-21T09:20:56.0505046Z cvt.f32.bf16 %r2800, %rs67; 2026-02-21T09:20:56.0505220Z cvt.f32.bf16 %r2801, %rs68; 2026-02-21T09:20:56.0505390Z cvt.f32.bf16 %r2930, %rs61; 2026-02-21T09:20:56.0505560Z cvt.f32.bf16 %r2931, %rs62; 2026-02-21T09:20:56.0505729Z cvt.f32.bf16 %r2932, %rs69; 2026-02-21T09:20:56.0505901Z cvt.f32.bf16 %r2933, %rs70; 2026-02-21T09:20:56.0506068Z cvt.f32.bf16 %r3062, %rs63; 2026-02-21T09:20:56.0506257Z cvt.f32.bf16 %r3063, %rs64; 2026-02-21T09:20:56.0506435Z cvt.f32.bf16 %r3064, %rs71; 2026-02-21T09:20:56.0506728Z cvt.f32.bf16 %r3065, %rs72; 2026-02-21T09:20:56.0507048Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0507404Z add.s32 %r3372, %r3355, 95232; 2026-02-21T09:20:56.0507728Z .loc 1 70 45 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:70:45 2026-02-21T09:20:56.0508081Z add.s32 %r3373, %r3372, %r11875; 2026-02-21T09:20:56.0508277Z ld.shared.b8 %rs73, [%r3373]; 2026-02-21T09:20:56.0508551Z ld.shared.b8 %rs74, [%r3373+128]; 2026-02-21T09:20:56.0508747Z ld.shared.b8 %rs75, [%r3373+256]; 2026-02-21T09:20:56.0508938Z ld.shared.b8 %rs76, [%r3373+384]; 2026-02-21T09:20:56.0509126Z ld.shared.b8 %rs77, [%r3373+512]; 2026-02-21T09:20:56.0509415Z ld.shared.b8 %rs78, [%r3373+640]; 2026-02-21T09:20:56.0509602Z ld.shared.b8 %rs79, [%r3373+768]; 2026-02-21T09:20:56.0509788Z add.s32 %r3374, %r3372, %r11876; 2026-02-21T09:20:56.0509973Z ld.shared.b8 %rs80, [%r3374]; 2026-02-21T09:20:56.0510373Z .loc 1 60 28 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:60:28 2026-02-21T09:20:56.0510726Z shl.b16 %rs81, %rs73, 4; 2026-02-21T09:20:56.0510895Z shl.b16 %rs82, %rs74, 4; 2026-02-21T09:20:56.0511063Z shl.b16 %rs83, %rs75, 4; 2026-02-21T09:20:56.0511229Z shl.b16 %rs84, %rs76, 4; 2026-02-21T09:20:56.0511400Z shl.b16 %rs85, %rs77, 4; 2026-02-21T09:20:56.0511564Z shl.b16 %rs86, %rs78, 4; 2026-02-21T09:20:56.0511731Z shl.b16 %rs87, %rs79, 4; 2026-02-21T09:20:56.0511969Z shl.b16 %rs88, %rs80, 
4; 2026-02-21T09:20:56.0512282Z .loc 1 75 58 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:75:58 2026-02-21T09:20:56.0512639Z selp.b16 %rs89, %rs81, %rs73, %p70; 2026-02-21T09:20:56.0512841Z cvt.s16.s8 %rs90, %rs89; 2026-02-21T09:20:56.0513009Z shr.s16 %rs91, %rs90, 4; 2026-02-21T09:20:56.0513181Z selp.b16 %rs92, %rs82, %rs74, %p70; 2026-02-21T09:20:56.0513390Z cvt.s16.s8 %rs93, %rs92; 2026-02-21T09:20:56.0513556Z shr.s16 %rs94, %rs93, 4; 2026-02-21T09:20:56.0513729Z selp.b16 %rs95, %rs83, %rs75, %p70; 2026-02-21T09:20:56.0513916Z cvt.s16.s8 %rs96, %rs95; 2026-02-21T09:20:56.0514083Z shr.s16 %rs97, %rs96, 4; 2026-02-21T09:20:56.0514257Z selp.b16 %rs98, %rs84, %rs76, %p70; 2026-02-21T09:20:56.0514561Z cvt.s16.s8 %rs99, %rs98; 2026-02-21T09:20:56.0514753Z shr.s16 %rs100, %rs99, 4; 2026-02-21T09:20:56.0514944Z selp.b16 %rs101, %rs85, %rs77, %p70; 2026-02-21T09:20:56.0515153Z cvt.s16.s8 %rs102, %rs101; 2026-02-21T09:20:56.0515334Z shr.s16 %rs103, %rs102, 4; 2026-02-21T09:20:56.0515522Z selp.b16 %rs104, %rs86, %rs78, %p70; 2026-02-21T09:20:56.0515723Z cvt.s16.s8 %rs105, %rs104; 2026-02-21T09:20:56.0515896Z shr.s16 %rs106, %rs105, 4; 2026-02-21T09:20:56.0516080Z selp.b16 %rs107, %rs87, %rs79, %p70; 2026-02-21T09:20:56.0516277Z cvt.s16.s8 %rs108, %rs107; 2026-02-21T09:20:56.0516583Z shr.s16 %rs109, %rs108, 4; 2026-02-21T09:20:56.0516771Z selp.b16 %rs110, %rs88, %rs80, %p70; 2026-02-21T09:20:56.0516972Z cvt.s16.s8 %rs111, %rs110; 2026-02-21T09:20:56.0517144Z shr.s16 %rs112, %rs111, 4; 2026-02-21T09:20:56.0517462Z .loc 1 80 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:80:32 2026-02-21T09:20:56.0517824Z cvt.rn.f32.s16 %r3375, %rs91; 2026-02-21T09:20:56.0518011Z cvt.rn.f32.s16 %r3376, %rs94; 2026-02-21T09:20:56.0518194Z cvt.rn.f32.s16 %r3377, %rs97; 2026-02-21T09:20:56.0518371Z cvt.rn.f32.s16 %r3378, %rs100; 2026-02-21T09:20:56.0518559Z cvt.rn.f32.s16 %r3379, %rs103; 2026-02-21T09:20:56.0518742Z cvt.rn.f32.s16 %r3380, %rs106; 
2026-02-21T09:20:56.0518927Z cvt.rn.f32.s16 %r3381, %rs109; 2026-02-21T09:20:56.0519107Z cvt.rn.f32.s16 %r3382, %rs112; 2026-02-21T09:20:56.0519291Z bar.sync 0; 2026-02-21T09:20:56.0519442Z st.shared.b32 [%r101], %r3375; 2026-02-21T09:20:56.0519632Z st.shared.b32 [%r101+8], %r3376; 2026-02-21T09:20:56.0519826Z st.shared.b32 [%r102], %r3377; 2026-02-21T09:20:56.0520010Z st.shared.b32 [%r102+8], %r3378; 2026-02-21T09:20:56.0520219Z st.shared.b32 [%r103], %r3379; 2026-02-21T09:20:56.0520407Z st.shared.b32 [%r103+8], %r3380; 2026-02-21T09:20:56.0520601Z st.shared.b32 [%r104], %r3381; 2026-02-21T09:20:56.0520787Z st.shared.b32 [%r104+8], %r3382; 2026-02-21T09:20:56.0520970Z $L__tmp3: 2026-02-21T09:20:56.0521335Z .loc 2 291 36 // standard.py:291:36 @[ czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:87:40 ] 2026-02-21T09:20:56.0521762Z // begin inline asm 2026-02-21T09:20:56.0521944Z fence.proxy.async.shared::cta; 2026-02-21T09:20:56.0522127Z // end inline asm 2026-02-21T09:20:56.0522282Z bar.sync 0; 2026-02-21T09:20:56.0522450Z wgmma.fence.sync.aligned; 2026-02-21T09:20:56.0522724Z // begin inline asm 2026-02-21T09:20:56.0524303Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r11892,%r11893,%r11894,%r11895,%r11896,%r11897,%r11898,%r11899,%r11900,%r11901,%r11902,%r11903,%r11904,%r11905,%r11906,%r11907,%r11908,%r11909,%r11910,%r11911,%r11912,%r11913,%r11914,%r11915,%r11916,%r11917,%r11918,%r11919,%r11920,%r11921,%r11922,%r11923,%r11924,%r11925,%r11926,%r11927,%r11928,%r11929,%r11930,%r11931,%r11932,%r11933,%r11934,%r11935,%r11936,%r11937,%r11938,%r11939,%r11940,%r11941,%r11942,%r11943,%r11944,%r11945,%r11946,%r11947,%r11948,%r11949,%r11950,%r11951,%r11952,%r11953,%r11954,%r11955}, {%r2666,%r2667,%r2668,%r2669}, %rd1, %p2, 1, 1; 2026-02-21T09:20:56.0525997Z // end inline asm 2026-02-21T09:20:56.0526153Z // begin inline asm 2026-02-21T09:20:56.0527916Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r11892,%r11893,%r11894,%r11895,%r11896,%r11897,%r11898,%r11899,%r11900,%r11901,%r11902,%r11903,%r11904,%r11905,%r11906,%r11907,%r11908,%r11909,%r11910,%r11911,%r11912,%r11913,%r11914,%r11915,%r11916,%r11917,%r11918,%r11919,%r11920,%r11921,%r11922,%r11923,%r11924,%r11925,%r11926,%r11927,%r11928,%r11929,%r11930,%r11931,%r11932,%r11933,%r11934,%r11935,%r11936,%r11937,%r11938,%r11939,%r11940,%r11941,%r11942,%r11943,%r11944,%r11945,%r11946,%r11947,%r11948,%r11949,%r11950,%r11951,%r11952,%r11953,%r11954,%r11955}, {%r2798,%r2799,%r2800,%r2801}, %rd2, %p2, 1, 1; 2026-02-21T09:20:56.0529543Z // end inline asm 2026-02-21T09:20:56.0529695Z // begin inline asm 2026-02-21T09:20:56.0531343Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r11956,%r11957,%r11958,%r11959,%r11960,%r11961,%r11962,%r11963,%r11964,%r11965,%r11966,%r11967,%r11968,%r11969,%r11970,%r11971,%r11972,%r11973,%r11974,%r11975,%r11976,%r11977,%r11978,%r11979,%r11980,%r11981,%r11982,%r11983,%r11984,%r11985,%r11986,%r11987,%r11988,%r11989,%r11990,%r11991,%r11992,%r11993,%r11994,%r11995,%r11996,%r11997,%r11998,%r11999,%r12000,%r12001,%r12002,%r12003,%r12004,%r12005,%r12006,%r12007,%r12008,%r12009,%r12010,%r12011,%r12012,%r12013,%r12014,%r12015,%r12016,%r12017,%r12018,%r12019}, {%r2930,%r2931,%r2932,%r2933}, %rd1, %p2, 1, 1; 2026-02-21T09:20:56.0532954Z // end inline asm 2026-02-21T09:20:56.0533106Z // begin inline asm 2026-02-21T09:20:56.0534686Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r11956,%r11957,%r11958,%r11959,%r11960,%r11961,%r11962,%r11963,%r11964,%r11965,%r11966,%r11967,%r11968,%r11969,%r11970,%r11971,%r11972,%r11973,%r11974,%r11975,%r11976,%r11977,%r11978,%r11979,%r11980,%r11981,%r11982,%r11983,%r11984,%r11985,%r11986,%r11987,%r11988,%r11989,%r11990,%r11991,%r11992,%r11993,%r11994,%r11995,%r11996,%r11997,%r11998,%r11999,%r12000,%r12001,%r12002,%r12003,%r12004,%r12005,%r12006,%r12007,%r12008,%r12009,%r12010,%r12011,%r12012,%r12013,%r12014,%r12015,%r12016,%r12017,%r12018,%r12019}, 
{%r3062,%r3063,%r3064,%r3065}, %rd2, %p2, 1, 1; 2026-02-21T09:20:56.0536290Z // end inline asm 2026-02-21T09:20:56.0536574Z wgmma.commit_group.sync.aligned; 2026-02-21T09:20:56.0536790Z mov.b32 %r3194, %r1575; 2026-02-21T09:20:56.0536962Z mov.b32 %r3195, %r3196; 2026-02-21T09:20:56.0537131Z // begin inline asm 2026-02-21T09:20:56.0539742Z // wait for regs: %r11892,%r11893,%r11894,%r11895,%r11896,%r11897,%r11898,%r11899,%r11900,%r11901,%r11902,%r11903,%r11904,%r11905,%r11906,%r11907,%r11908,%r11909,%r11910,%r11911,%r11912,%r11913,%r11914,%r11915,%r11916,%r11917,%r11918,%r11919,%r11920,%r11921,%r11922,%r11923,%r11924,%r11925,%r11926,%r11927,%r11928,%r11929,%r11930,%r11931,%r11932,%r11933,%r11934,%r11935,%r11936,%r11937,%r11938,%r11939,%r11940,%r11941,%r11942,%r11943,%r11944,%r11945,%r11946,%r11947,%r11948,%r11949,%r11950,%r11951,%r11952,%r11953,%r11954,%r11955,%r11956,%r11957,%r11958,%r11959,%r11960,%r11961,%r11962,%r11963,%r11964,%r11965,%r11966,%r11967,%r11968,%r11969,%r11970,%r11971,%r11972,%r11973,%r11974,%r11975,%r11976,%r11977,%r11978,%r11979,%r11980,%r11981,%r11982,%r11983,%r11984,%r11985,%r11986,%r11987,%r11988,%r11989,%r11990,%r11991,%r11992,%r11993,%r11994,%r11995,%r11996,%r11997,%r11998,%r11999,%r12000,%r12001,%r12002,%r12003,%r12004,%r12005,%r12006,%r12007,%r12008,%r12009,%r12010,%r12011,%r12012,%r12013,%r12014,%r12015,%r12016,%r12017,%r12018,%r12019,%r3194,%r3195,%r3196 2026-02-21T09:20:56.0542623Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:20:56.0542826Z // end inline asm 2026-02-21T09:20:56.0543030Z $L__tmp4: 2026-02-21T09:20:56.0543323Z .loc 1 43 78 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:43:78 2026-02-21T09:20:56.0543684Z add.s32 %r3383, %r11891, 1; 2026-02-21T09:20:56.0543873Z setp.gt.s32 %p13, %r3383, 4; 2026-02-21T09:20:56.0544069Z selp.b32 %r11891, 0, %r3383, %p13; 2026-02-21T09:20:56.0544420Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0544784Z add.s32 %r3384, %r11888, 
-16; 2026-02-21T09:20:56.0545035Z add.s64 %rd144, %rd634, %rd10; 2026-02-21T09:20:56.0545228Z add.s64 %rd134, %rd144, 320; 2026-02-21T09:20:56.0545293Z add.s64 %rd145, %rd634, %rd9; 2026-02-21T09:20:56.0545359Z add.s64 %rd135, %rd145, 320; 2026-02-21T09:20:56.0545434Z add.s64 %rd146, %rd634, %rd8; 2026-02-21T09:20:56.0545502Z add.s64 %rd136, %rd146, 320; 2026-02-21T09:20:56.0545576Z mad.wide.s32 %rd137, %r3384, 2, %rd44; 2026-02-21T09:20:56.0545789Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0545854Z shl.b32 %r3385, %r11891, 13; 2026-02-21T09:20:56.0545919Z add.s32 %r3386, %r11869, %r3385; 2026-02-21T09:20:56.0545982Z add.s32 %r3328, %r3386, %r36; 2026-02-21T09:20:56.0546117Z selp.b32 %r3329, 8, 0, %p11; 2026-02-21T09:20:56.0546183Z // begin inline asm 2026-02-21T09:20:56.0546332Z cp.async.ca.shared.global [ %r3328 + 0 ], [ %rd134 + 0 ], 0x8, %r3329; 2026-02-21T09:20:56.0546398Z // end inline asm 2026-02-21T09:20:56.0546593Z add.s32 %r3330, %r3328, 2048; 2026-02-21T09:20:56.0546661Z // begin inline asm 2026-02-21T09:20:56.0546805Z cp.async.ca.shared.global [ %r3330 + 0 ], [ %rd135 + 0 ], 0x8, %r3329; 2026-02-21T09:20:56.0546870Z // end inline asm 2026-02-21T09:20:56.0546934Z add.s32 %r3332, %r3328, 4096; 2026-02-21T09:20:56.0546994Z // begin inline asm 2026-02-21T09:20:56.0547131Z cp.async.ca.shared.global [ %r3332 + 0 ], [ %rd136 + 0 ], 0x8, %r3329; 2026-02-21T09:20:56.0547193Z // end inline asm 2026-02-21T09:20:56.0547260Z add.s32 %r3334, %r3328, 6144; 2026-02-21T09:20:56.0547325Z // begin inline asm 2026-02-21T09:20:56.0547459Z cp.async.ca.shared.global [ %r3334 + 0 ], [ %rd137 + 0 ], 0x8, %r3329; 2026-02-21T09:20:56.0547516Z // end inline asm 2026-02-21T09:20:56.0547587Z cp.async.commit_group; 2026-02-21T09:20:56.0547806Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0547877Z add.s32 %r3387, %r11889, -65536; 2026-02-21T09:20:56.0547947Z cvt.s64.s32 %rd147, 
%r3387; 2026-02-21T09:20:56.0548019Z add.s64 %rd138, %rd45, %rd147; 2026-02-21T09:20:56.0548228Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0548362Z shl.b32 %r3388, %r11891, 10; 2026-02-21T09:20:56.0548436Z add.s32 %r3336, %r42, %r3388; 2026-02-21T09:20:56.0548501Z selp.b32 %r3337, 4, 0, %p11; 2026-02-21T09:20:56.0548563Z // begin inline asm 2026-02-21T09:20:56.0548703Z cp.async.ca.shared.global [ %r3336 + 0 ], [ %rd138 + 0 ], 0x4, %r3337; 2026-02-21T09:20:56.0548766Z // end inline asm 2026-02-21T09:20:56.0548833Z cp.async.commit_group; 2026-02-21T09:20:56.0549037Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0549109Z add.s64 %rd139, %rd144, 352; 2026-02-21T09:20:56.0549173Z add.s64 %rd140, %rd145, 352; 2026-02-21T09:20:56.0549235Z add.s64 %rd141, %rd146, 352; 2026-02-21T09:20:56.0549312Z mad.wide.s32 %rd142, %r11888, 2, %rd44; 2026-02-21T09:20:56.0549519Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0549683Z add.s32 %r3389, %r3368, %r3385; 2026-02-21T09:20:56.0549745Z add.s32 %r3338, %r3389, %r36; 2026-02-21T09:20:56.0549810Z // begin inline asm 2026-02-21T09:20:56.0549947Z cp.async.ca.shared.global [ %r3338 + 0 ], [ %rd139 + 0 ], 0x8, %r3329; 2026-02-21T09:20:56.0550005Z // end inline asm 2026-02-21T09:20:56.0550135Z add.s32 %r3340, %r3338, 2048; 2026-02-21T09:20:56.0550196Z // begin inline asm 2026-02-21T09:20:56.0550330Z cp.async.ca.shared.global [ %r3340 + 0 ], [ %rd140 + 0 ], 0x8, %r3329; 2026-02-21T09:20:56.0550388Z // end inline asm 2026-02-21T09:20:56.0550457Z add.s32 %r3342, %r3338, 4096; 2026-02-21T09:20:56.0550517Z // begin inline asm 2026-02-21T09:20:56.0550650Z cp.async.ca.shared.global [ %r3342 + 0 ], [ %rd141 + 0 ], 0x8, %r3329; 2026-02-21T09:20:56.0550715Z // end inline asm 2026-02-21T09:20:56.0550842Z add.s32 %r3344, %r3338, 6144; 2026-02-21T09:20:56.0550904Z // begin inline asm 
2026-02-21T09:20:56.0551036Z cp.async.ca.shared.global [ %r3344 + 0 ], [ %rd142 + 0 ], 0x8, %r3329; 2026-02-21T09:20:56.0551108Z // end inline asm 2026-02-21T09:20:56.0551175Z cp.async.commit_group; 2026-02-21T09:20:56.0551380Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0551448Z cvt.s64.s32 %rd148, %r11889; 2026-02-21T09:20:56.0551525Z add.s64 %rd143, %rd45, %rd148; 2026-02-21T09:20:56.0551732Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0551800Z add.s32 %r3346, %r48, %r3388; 2026-02-21T09:20:56.0551862Z // begin inline asm 2026-02-21T09:20:56.0552062Z cp.async.ca.shared.global [ %r3346 + 0 ], [ %rd143 + 0 ], 0x4, %r3337; 2026-02-21T09:20:56.0552122Z // end inline asm 2026-02-21T09:20:56.0552192Z cp.async.commit_group; 2026-02-21T09:20:56.0552395Z .loc 1 43 78 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:43:78 2026-02-21T09:20:56.0552462Z add.s32 %r11889, %r11889, 131072; 2026-02-21T09:20:56.0552537Z add.s64 %rd634, %rd634, 64; 2026-02-21T09:20:56.0552601Z add.s32 %r11888, %r11888, 32; 2026-02-21T09:20:56.0552670Z setp.lt.u64 %p14, %rd635, 496; 2026-02-21T09:20:56.0552740Z @%p14 bra $L__BB0_3; 2026-02-21T09:20:56.0552850Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:20:56.0553062Z .loc 1 34 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:34:32 2026-02-21T09:20:56.0553128Z or.b32 %r3637, %r211, %r9; 2026-02-21T09:20:56.0553340Z .loc 1 36 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:36:32 2026-02-21T09:20:56.0553404Z or.b32 %r3638, %r212, %r15; 2026-02-21T09:20:56.0553466Z or.b32 %r3639, %r212, %r16; 2026-02-21T09:20:56.0553530Z or.b32 %r3640, %r212, %r17; 2026-02-21T09:20:56.0553590Z or.b32 %r3641, %r212, %r18; 2026-02-21T09:20:56.0553653Z or.b32 %r3642, %r212, %r19; 2026-02-21T09:20:56.0553716Z or.b32 %r3643, %r212, %r20; 2026-02-21T09:20:56.0553774Z or.b32 %r3644, %r212, %r21; 
2026-02-21T09:20:56.0553836Z or.b32 %r3645, %r212, %r22; 2026-02-21T09:20:56.0553895Z or.b32 %r3646, %r212, %r23; 2026-02-21T09:20:56.0553960Z or.b32 %r3647, %r212, %r24; 2026-02-21T09:20:56.0554019Z or.b32 %r3648, %r212, %r25; 2026-02-21T09:20:56.0554081Z or.b32 %r3649, %r212, %r26; 2026-02-21T09:20:56.0554143Z or.b32 %r3650, %r212, %r27; 2026-02-21T09:20:56.0554201Z or.b32 %r3651, %r212, %r28; 2026-02-21T09:20:56.0554260Z or.b32 %r3652, %r212, %r29; 2026-02-21T09:20:56.0554322Z or.b32 %r3653, %r212, %r30; 2026-02-21T09:20:56.0554527Z .loc 1 43 78 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:43:78 2026-02-21T09:20:56.0554596Z cp.async.wait_group 0; 2026-02-21T09:20:56.0554667Z bar.sync 0; 2026-02-21T09:20:56.0554876Z .loc 1 90 28 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:90:28 2026-02-21T09:20:56.0554963Z cvt.rn.bf16x2.f32 %r3654, %r11893, %r11892; 2026-02-21T09:20:56.0555044Z cvt.rn.bf16x2.f32 %r3655, %r11895, %r11894; 2026-02-21T09:20:56.0555185Z cvt.rn.bf16x2.f32 %r3656, %r11897, %r11896; 2026-02-21T09:20:56.0555262Z cvt.rn.bf16x2.f32 %r3657, %r11899, %r11898; 2026-02-21T09:20:56.0555338Z cvt.rn.bf16x2.f32 %r3658, %r11901, %r11900; 2026-02-21T09:20:56.0555479Z cvt.rn.bf16x2.f32 %r3659, %r11903, %r11902; 2026-02-21T09:20:56.0555562Z cvt.rn.bf16x2.f32 %r3660, %r11905, %r11904; 2026-02-21T09:20:56.0555641Z cvt.rn.bf16x2.f32 %r3661, %r11907, %r11906; 2026-02-21T09:20:56.0555717Z cvt.rn.bf16x2.f32 %r3662, %r11909, %r11908; 2026-02-21T09:20:56.0555797Z cvt.rn.bf16x2.f32 %r3663, %r11911, %r11910; 2026-02-21T09:20:56.0555872Z cvt.rn.bf16x2.f32 %r3664, %r11913, %r11912; 2026-02-21T09:20:56.0555947Z cvt.rn.bf16x2.f32 %r3665, %r11915, %r11914; 2026-02-21T09:20:56.0556099Z cvt.rn.bf16x2.f32 %r3666, %r11917, %r11916; 2026-02-21T09:20:56.0556176Z cvt.rn.bf16x2.f32 %r3667, %r11919, %r11918; 2026-02-21T09:20:56.0556250Z cvt.rn.bf16x2.f32 %r3668, %r11921, %r11920; 2026-02-21T09:20:56.0556326Z cvt.rn.bf16x2.f32 %r3669, %r11923, %r11922; 
2026-02-21T09:20:56.0556406Z cvt.rn.bf16x2.f32 %r3670, %r11925, %r11924; 2026-02-21T09:20:56.0556609Z cvt.rn.bf16x2.f32 %r3671, %r11927, %r11926; 2026-02-21T09:20:56.0556691Z cvt.rn.bf16x2.f32 %r3672, %r11929, %r11928; 2026-02-21T09:20:56.0556776Z cvt.rn.bf16x2.f32 %r3673, %r11931, %r11930; 2026-02-21T09:20:56.0556851Z cvt.rn.bf16x2.f32 %r3674, %r11933, %r11932; 2026-02-21T09:20:56.0556927Z cvt.rn.bf16x2.f32 %r3675, %r11935, %r11934; 2026-02-21T09:20:56.0557006Z cvt.rn.bf16x2.f32 %r3676, %r11937, %r11936; 2026-02-21T09:20:56.0557156Z cvt.rn.bf16x2.f32 %r3677, %r11939, %r11938; 2026-02-21T09:20:56.0557233Z cvt.rn.bf16x2.f32 %r3678, %r11941, %r11940; 2026-02-21T09:20:56.0557308Z cvt.rn.bf16x2.f32 %r3679, %r11943, %r11942; 2026-02-21T09:20:56.0557392Z cvt.rn.bf16x2.f32 %r3680, %r11945, %r11944; 2026-02-21T09:20:56.0557469Z cvt.rn.bf16x2.f32 %r3681, %r11947, %r11946; 2026-02-21T09:20:56.0557543Z cvt.rn.bf16x2.f32 %r3682, %r11949, %r11948; 2026-02-21T09:20:56.0557624Z cvt.rn.bf16x2.f32 %r3683, %r11951, %r11950; 2026-02-21T09:20:56.0557700Z cvt.rn.bf16x2.f32 %r3684, %r11953, %r11952; 2026-02-21T09:20:56.0557775Z cvt.rn.bf16x2.f32 %r3685, %r11955, %r11954; 2026-02-21T09:20:56.0557856Z cvt.rn.bf16x2.f32 %r3686, %r11957, %r11956; 2026-02-21T09:20:56.0557933Z cvt.rn.bf16x2.f32 %r3687, %r11959, %r11958; 2026-02-21T09:20:56.0558007Z cvt.rn.bf16x2.f32 %r3688, %r11961, %r11960; 2026-02-21T09:20:56.0558081Z cvt.rn.bf16x2.f32 %r3689, %r11963, %r11962; 2026-02-21T09:20:56.0558164Z cvt.rn.bf16x2.f32 %r3690, %r11965, %r11964; 2026-02-21T09:20:56.0558239Z cvt.rn.bf16x2.f32 %r3691, %r11967, %r11966; 2026-02-21T09:20:56.0558327Z cvt.rn.bf16x2.f32 %r3692, %r11969, %r11968; 2026-02-21T09:20:56.0558410Z cvt.rn.bf16x2.f32 %r3693, %r11971, %r11970; 2026-02-21T09:20:56.0558490Z cvt.rn.bf16x2.f32 %r3694, %r11973, %r11972; 2026-02-21T09:20:56.0558567Z cvt.rn.bf16x2.f32 %r3695, %r11975, %r11974; 2026-02-21T09:20:56.0558644Z cvt.rn.bf16x2.f32 %r3696, %r11977, %r11976; 2026-02-21T09:20:56.0558725Z 
cvt.rn.bf16x2.f32 %r3697, %r11979, %r11978; 2026-02-21T09:20:56.0558800Z cvt.rn.bf16x2.f32 %r3698, %r11981, %r11980; 2026-02-21T09:20:56.0558874Z cvt.rn.bf16x2.f32 %r3699, %r11983, %r11982; 2026-02-21T09:20:56.0558956Z cvt.rn.bf16x2.f32 %r3700, %r11985, %r11984; 2026-02-21T09:20:56.0559033Z cvt.rn.bf16x2.f32 %r3701, %r11987, %r11986; 2026-02-21T09:20:56.0559108Z cvt.rn.bf16x2.f32 %r3702, %r11989, %r11988; 2026-02-21T09:20:56.0559191Z cvt.rn.bf16x2.f32 %r3703, %r11991, %r11990; 2026-02-21T09:20:56.0559268Z cvt.rn.bf16x2.f32 %r3704, %r11993, %r11992; 2026-02-21T09:20:56.0559344Z cvt.rn.bf16x2.f32 %r3705, %r11995, %r11994; 2026-02-21T09:20:56.0559417Z cvt.rn.bf16x2.f32 %r3706, %r11997, %r11996; 2026-02-21T09:20:56.0559499Z cvt.rn.bf16x2.f32 %r3707, %r11999, %r11998; 2026-02-21T09:20:56.0559576Z cvt.rn.bf16x2.f32 %r3708, %r12001, %r12000; 2026-02-21T09:20:56.0559653Z cvt.rn.bf16x2.f32 %r3709, %r12003, %r12002; 2026-02-21T09:20:56.0559824Z cvt.rn.bf16x2.f32 %r3710, %r12005, %r12004; 2026-02-21T09:20:56.0559901Z cvt.rn.bf16x2.f32 %r3711, %r12007, %r12006; 2026-02-21T09:20:56.0559977Z cvt.rn.bf16x2.f32 %r3712, %r12009, %r12008; 2026-02-21T09:20:56.0560059Z cvt.rn.bf16x2.f32 %r3713, %r12011, %r12010; 2026-02-21T09:20:56.0560197Z cvt.rn.bf16x2.f32 %r3714, %r12013, %r12012; 2026-02-21T09:20:56.0560273Z cvt.rn.bf16x2.f32 %r3715, %r12015, %r12014; 2026-02-21T09:20:56.0560348Z cvt.rn.bf16x2.f32 %r3716, %r12017, %r12016; 2026-02-21T09:20:56.0560428Z cvt.rn.bf16x2.f32 %r3717, %r12019, %r12018; 2026-02-21T09:20:56.0560653Z .loc 1 91 43 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:91:43 2026-02-21T09:20:56.0560719Z shl.b32 %r3718, %r3638, 13; 2026-02-21T09:20:56.0560856Z shl.b32 %r3719, %r3639, 13; 2026-02-21T09:20:56.0560923Z shl.b32 %r3720, %r3640, 13; 2026-02-21T09:20:56.0560985Z shl.b32 %r3721, %r3641, 13; 2026-02-21T09:20:56.0561057Z shl.b32 %r3722, %r3642, 13; 2026-02-21T09:20:56.0561119Z shl.b32 %r3723, %r3643, 13; 2026-02-21T09:20:56.0561179Z shl.b32 %r3724, 
%r3644, 13; 2026-02-21T09:20:56.0561240Z shl.b32 %r3725, %r3645, 13; 2026-02-21T09:20:56.0561308Z shl.b32 %r3726, %r3646, 13; 2026-02-21T09:20:56.0561369Z shl.b32 %r3727, %r3647, 13; 2026-02-21T09:20:56.0561436Z shl.b32 %r3728, %r3648, 13; 2026-02-21T09:20:56.0561502Z shl.b32 %r3729, %r3649, 13; 2026-02-21T09:20:56.0561563Z shl.b32 %r3730, %r3650, 13; 2026-02-21T09:20:56.0561621Z shl.b32 %r3731, %r3651, 13; 2026-02-21T09:20:56.0561682Z shl.b32 %r3732, %r3652, 13; 2026-02-21T09:20:56.0561799Z shl.b32 %r3733, %r3653, 13; 2026-02-21T09:20:56.0562016Z .loc 1 91 50 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:91:50 2026-02-21T09:20:56.0562085Z add.s32 %r3734, %r3718, %r3637; 2026-02-21T09:20:56.0562157Z add.s32 %r3735, %r3719, %r3637; 2026-02-21T09:20:56.0562219Z add.s32 %r3736, %r3720, %r3637; 2026-02-21T09:20:56.0562283Z add.s32 %r3737, %r3721, %r3637; 2026-02-21T09:20:56.0562366Z add.s32 %r3738, %r3722, %r3637; 2026-02-21T09:20:56.0562429Z add.s32 %r3739, %r3723, %r3637; 2026-02-21T09:20:56.0562491Z add.s32 %r3740, %r3724, %r3637; 2026-02-21T09:20:56.0562552Z add.s32 %r3741, %r3725, %r3637; 2026-02-21T09:20:56.0562620Z add.s32 %r3742, %r3726, %r3637; 2026-02-21T09:20:56.0562680Z add.s32 %r3743, %r3727, %r3637; 2026-02-21T09:20:56.0562743Z add.s32 %r3744, %r3728, %r3637; 2026-02-21T09:20:56.0562808Z add.s32 %r3745, %r3729, %r3637; 2026-02-21T09:20:56.0562869Z add.s32 %r3746, %r3730, %r3637; 2026-02-21T09:20:56.0562932Z add.s32 %r3747, %r3731, %r3637; 2026-02-21T09:20:56.0562993Z add.s32 %r3748, %r3732, %r3637; 2026-02-21T09:20:56.0563059Z add.s32 %r3749, %r3733, %r3637; 2026-02-21T09:20:56.0563270Z .loc 1 91 22 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:91:22 2026-02-21T09:20:56.0563345Z mad.wide.s32 %rd149, %r3734, 2, %rd46; 2026-02-21T09:20:56.0563421Z mad.wide.s32 %rd150, %r3735, 2, %rd46; 2026-02-21T09:20:56.0563489Z mad.wide.s32 %rd151, %r3736, 2, %rd46; 2026-02-21T09:20:56.0563557Z mad.wide.s32 %rd152, %r3737, 2, %rd46; 
2026-02-21T09:20:56.0563627Z mad.wide.s32 %rd153, %r3738, 2, %rd46; 2026-02-21T09:20:56.0563696Z mad.wide.s32 %rd154, %r3739, 2, %rd46; 2026-02-21T09:20:56.0563765Z mad.wide.s32 %rd155, %r3740, 2, %rd46; 2026-02-21T09:20:56.0563832Z mad.wide.s32 %rd156, %r3741, 2, %rd46; 2026-02-21T09:20:56.0563906Z mad.wide.s32 %rd157, %r3742, 2, %rd46; 2026-02-21T09:20:56.0563975Z mad.wide.s32 %rd158, %r3743, 2, %rd46; 2026-02-21T09:20:56.0564043Z mad.wide.s32 %rd159, %r3744, 2, %rd46; 2026-02-21T09:20:56.0564115Z mad.wide.s32 %rd160, %r3745, 2, %rd46; 2026-02-21T09:20:56.0564183Z mad.wide.s32 %rd161, %r3746, 2, %rd46; 2026-02-21T09:20:56.0564252Z mad.wide.s32 %rd162, %r3747, 2, %rd46; 2026-02-21T09:20:56.0564319Z mad.wide.s32 %rd163, %r3748, 2, %rd46; 2026-02-21T09:20:56.0564390Z mad.wide.s32 %rd164, %r3749, 2, %rd46; 2026-02-21T09:20:56.0564594Z .loc 1 91 81 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:91:81 2026-02-21T09:20:56.0564784Z st.shared.v4.b32 [%r105], {%r3654, %r3656, %r3658, %r3660}; 2026-02-21T09:20:56.0564910Z st.shared.v4.b32 [%r105+512], {%r3655, %r3657, %r3659, %r3661}; 2026-02-21T09:20:56.0565064Z st.shared.v4.b32 [%r106], {%r3662, %r3664, %r3666, %r3668}; 2026-02-21T09:20:56.0565178Z st.shared.v4.b32 [%r106+512], {%r3663, %r3665, %r3667, %r3669}; 2026-02-21T09:20:56.0565289Z st.shared.v4.b32 [%r107], {%r3670, %r3672, %r3674, %r3676}; 2026-02-21T09:20:56.0565413Z st.shared.v4.b32 [%r107+512], {%r3671, %r3673, %r3675, %r3677}; 2026-02-21T09:20:56.0565522Z st.shared.v4.b32 [%r108], {%r3678, %r3680, %r3682, %r3684}; 2026-02-21T09:20:56.0565688Z st.shared.v4.b32 [%r108+512], {%r3679, %r3681, %r3683, %r3685}; 2026-02-21T09:20:56.0565760Z bar.sync 0; 2026-02-21T09:20:56.0565826Z // begin inline asm 2026-02-21T09:20:56.0566022Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3470, %r3471, %r3472, %r3473}, [%r3394]; 2026-02-21T09:20:56.0566088Z // end inline asm 2026-02-21T09:20:56.0566149Z // begin inline asm 2026-02-21T09:20:56.0566335Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3474, %r3475, %r3476, %r3477}, [%r3399]; 2026-02-21T09:20:56.0566397Z // end inline asm 2026-02-21T09:20:56.0566572Z // begin inline asm 2026-02-21T09:20:56.0566760Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3478, %r3479, %r3480, %r3481}, [%r3404]; 2026-02-21T09:20:56.0566822Z // end inline asm 2026-02-21T09:20:56.0566881Z // begin inline asm 2026-02-21T09:20:56.0567146Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3482, %r3483, %r3484, %r3485}, [%r3409]; 2026-02-21T09:20:56.0567220Z // end inline asm 2026-02-21T09:20:56.0567286Z // begin inline asm 2026-02-21T09:20:56.0567474Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3486, %r3487, %r3488, %r3489}, [%r3414]; 2026-02-21T09:20:56.0567533Z // end inline asm 2026-02-21T09:20:56.0567595Z // begin inline asm 2026-02-21T09:20:56.0567774Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3490, %r3491, %r3492, %r3493}, [%r3419]; 2026-02-21T09:20:56.0567834Z // end inline asm 2026-02-21T09:20:56.0567893Z // begin inline asm 2026-02-21T09:20:56.0568075Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3494, %r3495, %r3496, %r3497}, [%r3424]; 2026-02-21T09:20:56.0568136Z // end inline asm 2026-02-21T09:20:56.0568196Z // begin inline asm 2026-02-21T09:20:56.0568382Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3498, %r3499, %r3500, %r3501}, [%r3429]; 2026-02-21T09:20:56.0568448Z // end inline asm 2026-02-21T09:20:56.0568508Z bar.sync 0; 2026-02-21T09:20:56.0568622Z st.shared.v4.b32 [%r105], {%r3686, %r3688, %r3690, %r3692}; 2026-02-21T09:20:56.0568739Z st.shared.v4.b32 [%r105+512], {%r3687, %r3689, %r3691, %r3693}; 2026-02-21T09:20:56.0568849Z st.shared.v4.b32 [%r106], {%r3694, %r3696, %r3698, %r3700}; 2026-02-21T09:20:56.0568962Z st.shared.v4.b32 [%r106+512], {%r3695, %r3697, %r3699, %r3701}; 2026-02-21T09:20:56.0569073Z st.shared.v4.b32 [%r107], {%r3702, %r3704, %r3706, %r3708}; 2026-02-21T09:20:56.0569185Z st.shared.v4.b32 [%r107+512], {%r3703, %r3705, %r3707, %r3709}; 
2026-02-21T09:20:56.0569290Z st.shared.v4.b32 [%r108], {%r3710, %r3712, %r3714, %r3716}; 2026-02-21T09:20:56.0569406Z st.shared.v4.b32 [%r108+512], {%r3711, %r3713, %r3715, %r3717}; 2026-02-21T09:20:56.0569467Z bar.sync 0; 2026-02-21T09:20:56.0569529Z // begin inline asm 2026-02-21T09:20:56.0569720Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3502, %r3503, %r3504, %r3505}, [%r3394]; 2026-02-21T09:20:56.0569780Z // end inline asm 2026-02-21T09:20:56.0569844Z // begin inline asm 2026-02-21T09:20:56.0570030Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3506, %r3507, %r3508, %r3509}, [%r3399]; 2026-02-21T09:20:56.0570096Z // end inline asm 2026-02-21T09:20:56.0570158Z // begin inline asm 2026-02-21T09:20:56.0570340Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3510, %r3511, %r3512, %r3513}, [%r3404]; 2026-02-21T09:20:56.0570405Z // end inline asm 2026-02-21T09:20:56.0570465Z // begin inline asm 2026-02-21T09:20:56.0570744Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3514, %r3515, %r3516, %r3517}, [%r3409]; 2026-02-21T09:20:56.0570810Z // end inline asm 2026-02-21T09:20:56.0570871Z // begin inline asm 2026-02-21T09:20:56.0571048Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3518, %r3519, %r3520, %r3521}, [%r3414]; 2026-02-21T09:20:56.0571167Z // end inline asm 2026-02-21T09:20:56.0571236Z // begin inline asm 2026-02-21T09:20:56.0571417Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3522, %r3523, %r3524, %r3525}, [%r3419]; 2026-02-21T09:20:56.0571478Z // end inline asm 2026-02-21T09:20:56.0571550Z // begin inline asm 2026-02-21T09:20:56.0571731Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3526, %r3527, %r3528, %r3529}, [%r3424]; 2026-02-21T09:20:56.0571791Z // end inline asm 2026-02-21T09:20:56.0571921Z // begin inline asm 2026-02-21T09:20:56.0572114Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3530, %r3531, %r3532, %r3533}, [%r3429]; 2026-02-21T09:20:56.0572173Z // end inline asm 2026-02-21T09:20:56.0572234Z // begin inline asm 2026-02-21T09:20:56.0572371Z st.global.v4.b32 [ 
%rd149 + 0 ], { %r3470, %r3471, %r3472, %r3473 }; 2026-02-21T09:20:56.0572431Z // end inline asm 2026-02-21T09:20:56.0572492Z // begin inline asm 2026-02-21T09:20:56.0572624Z st.global.v4.b32 [ %rd150 + 0 ], { %r3474, %r3475, %r3476, %r3477 }; 2026-02-21T09:20:56.0572683Z // end inline asm 2026-02-21T09:20:56.0572745Z // begin inline asm 2026-02-21T09:20:56.0572867Z st.global.v4.b32 [ %rd151 + 0 ], { %r3478, %r3479, %r3480, %r3481 }; 2026-02-21T09:20:56.0572935Z // end inline asm 2026-02-21T09:20:56.0573044Z // begin inline asm 2026-02-21T09:20:56.0573163Z st.global.v4.b32 [ %rd152 + 0 ], { %r3482, %r3483, %r3484, %r3485 }; 2026-02-21T09:20:56.0573225Z // end inline asm 2026-02-21T09:20:56.0573286Z // begin inline asm 2026-02-21T09:20:56.0573402Z st.global.v4.b32 [ %rd153 + 0 ], { %r3486, %r3487, %r3488, %r3489 }; 2026-02-21T09:20:56.0573463Z // end inline asm 2026-02-21T09:20:56.0573547Z // begin inline asm 2026-02-21T09:20:56.0573670Z st.global.v4.b32 [ %rd154 + 0 ], { %r3490, %r3491, %r3492, %r3493 }; 2026-02-21T09:20:56.0573732Z // end inline asm 2026-02-21T09:20:56.0573798Z // begin inline asm 2026-02-21T09:20:56.0573919Z st.global.v4.b32 [ %rd155 + 0 ], { %r3494, %r3495, %r3496, %r3497 }; 2026-02-21T09:20:56.0573978Z // end inline asm 2026-02-21T09:20:56.0574048Z // begin inline asm 2026-02-21T09:20:56.0574163Z st.global.v4.b32 [ %rd156 + 0 ], { %r3498, %r3499, %r3500, %r3501 }; 2026-02-21T09:20:56.0574221Z // end inline asm 2026-02-21T09:20:56.0574280Z // begin inline asm 2026-02-21T09:20:56.0574404Z st.global.v4.b32 [ %rd157 + 0 ], { %r3502, %r3503, %r3504, %r3505 }; 2026-02-21T09:20:56.0574463Z // end inline asm 2026-02-21T09:20:56.0574524Z // begin inline asm 2026-02-21T09:20:56.0574650Z st.global.v4.b32 [ %rd158 + 0 ], { %r3506, %r3507, %r3508, %r3509 }; 2026-02-21T09:20:56.0574708Z // end inline asm 2026-02-21T09:20:56.0574768Z // begin inline asm 2026-02-21T09:20:56.0574888Z st.global.v4.b32 [ %rd159 + 0 ], { %r3510, %r3511, %r3512, %r3513 }; 
2026-02-21T09:20:56.0574954Z // end inline asm 2026-02-21T09:20:56.0575017Z // begin inline asm 2026-02-21T09:20:56.0575133Z st.global.v4.b32 [ %rd160 + 0 ], { %r3514, %r3515, %r3516, %r3517 }; 2026-02-21T09:20:56.0575202Z // end inline asm 2026-02-21T09:20:56.0575262Z // begin inline asm 2026-02-21T09:20:56.0575379Z st.global.v4.b32 [ %rd161 + 0 ], { %r3518, %r3519, %r3520, %r3521 }; 2026-02-21T09:20:56.0575446Z // end inline asm 2026-02-21T09:20:56.0575507Z // begin inline asm 2026-02-21T09:20:56.0575626Z st.global.v4.b32 [ %rd162 + 0 ], { %r3522, %r3523, %r3524, %r3525 }; 2026-02-21T09:20:56.0575686Z // end inline asm 2026-02-21T09:20:56.0575753Z // begin inline asm 2026-02-21T09:20:56.0575872Z st.global.v4.b32 [ %rd163 + 0 ], { %r3526, %r3527, %r3528, %r3529 }; 2026-02-21T09:20:56.0575932Z // end inline asm 2026-02-21T09:20:56.0576013Z // begin inline asm 2026-02-21T09:20:56.0576133Z st.global.v4.b32 [ %rd164 + 0 ], { %r3530, %r3531, %r3532, %r3533 }; 2026-02-21T09:20:56.0576257Z // end inline asm 2026-02-21T09:20:56.0576588Z .loc 1 22 121 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:22:121 2026-02-21T09:20:56.0576670Z or.b32 %r3750, %r11887, 1; 2026-02-21T09:20:56.0576991Z .loc 1 28 35 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:28:35 2026-02-21T09:20:56.0577061Z add.s32 %r3753, %r3750, %r1707; 2026-02-21T09:20:56.0577142Z shr.s32 %r3754, %r3753, 9; 2026-02-21T09:20:56.0577354Z .loc 1 29 33 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:29:33 2026-02-21T09:20:56.0577420Z shl.b32 %r3755, %r3754, 3; 2026-02-21T09:20:56.0577706Z .loc 1 30 39 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:30:39 2026-02-21T09:20:56.0577779Z sub.s32 %r3756, 64, %r3755; 2026-02-21T09:20:56.0577990Z .loc 1 30 52 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:30:52 2026-02-21T09:20:56.0578061Z min.s32 %r3757, %r3756, 8; 2026-02-21T09:20:56.0578273Z .loc 1 31 45 // 
czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:31:45 2026-02-21T09:20:56.0578341Z and.b32 %r3758, %r3753, -512; 2026-02-21T09:20:56.0578410Z sub.s32 %r3759, %r3750, %r3758; 2026-02-21T09:20:56.0578623Z .loc 1 32 51 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:32:51 2026-02-21T09:20:56.0578690Z div.s32 %r3760, %r3759, %r3757; 2026-02-21T09:20:56.0578966Z .loc 1 31 64 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:31:64 2026-02-21T09:20:56.0579053Z mul.lo.s32 %r3761, %r3760, %r3757; 2026-02-21T09:20:56.0579121Z sub.s32 %r3762, %r3759, %r3761; 2026-02-21T09:20:56.0579326Z .loc 1 31 30 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:31:30 2026-02-21T09:20:56.0579399Z add.s32 %r3763, %r3762, %r3755; 2026-02-21T09:20:56.0579602Z .loc 1 33 27 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:33:27 2026-02-21T09:20:56.0579672Z shl.b32 %r479, %r3763, 7; 2026-02-21T09:20:56.0579885Z .loc 1 34 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:34:32 2026-02-21T09:20:56.0579952Z or.b32 %r3764, %r479, %r7; 2026-02-21T09:20:56.0580158Z .loc 1 35 27 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:35:27 2026-02-21T09:20:56.0580222Z shl.b32 %r480, %r3760, 8; 2026-02-21T09:20:56.0580430Z .loc 1 36 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:36:32 2026-02-21T09:20:56.0580498Z or.b32 %r3765, %r480, %r11; 2026-02-21T09:20:56.0580564Z or.b32 %r3766, %r480, %r12; 2026-02-21T09:20:56.0580647Z or.b32 %r3767, %r480, %r13; 2026-02-21T09:20:56.0580713Z or.b32 %r3768, %r480, %r14; 2026-02-21T09:20:56.0580926Z .loc 1 51 53 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:53 2026-02-21T09:20:56.0580996Z shl.b32 %r3769, %r3765, 10; 2026-02-21T09:20:56.0581062Z shl.b32 %r3770, %r3766, 10; 2026-02-21T09:20:56.0581124Z shl.b32 %r3771, %r3767, 10; 2026-02-21T09:20:56.0581186Z shl.b32 %r3772, %r3768, 10; 2026-02-21T09:20:56.0581399Z .loc 1 51 60 // 
czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:60 2026-02-21T09:20:56.0581466Z or.b32 %r3773, %r3769, %r34; 2026-02-21T09:20:56.0581527Z or.b32 %r3774, %r3770, %r34; 2026-02-21T09:20:56.0581595Z or.b32 %r3775, %r3771, %r34; 2026-02-21T09:20:56.0581658Z or.b32 %r3776, %r3772, %r34; 2026-02-21T09:20:56.0581859Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0581943Z mad.wide.s32 %rd165, %r3773, 2, %rd44; 2026-02-21T09:20:56.0582018Z mad.wide.s32 %rd166, %r3774, 2, %rd44; 2026-02-21T09:20:56.0582091Z mad.wide.s32 %rd167, %r3775, 2, %rd44; 2026-02-21T09:20:56.0582161Z mad.wide.s32 %rd168, %r3776, 2, %rd44; 2026-02-21T09:20:56.0582465Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0582525Z bar.sync 0; 2026-02-21T09:20:56.0582584Z mov.b32 %r3535, 8; 2026-02-21T09:20:56.0582651Z // begin inline asm 2026-02-21T09:20:56.0582847Z cp.async.ca.shared.global [ %r37 + 0 ], [ %rd165 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0582907Z // end inline asm 2026-02-21T09:20:56.0582967Z // begin inline asm 2026-02-21T09:20:56.0583108Z cp.async.ca.shared.global [ %r38 + 0 ], [ %rd166 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0583168Z // end inline asm 2026-02-21T09:20:56.0583229Z // begin inline asm 2026-02-21T09:20:56.0583365Z cp.async.ca.shared.global [ %r39 + 0 ], [ %rd167 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0583424Z // end inline asm 2026-02-21T09:20:56.0583539Z // begin inline asm 2026-02-21T09:20:56.0583679Z cp.async.ca.shared.global [ %r40 + 0 ], [ %rd168 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0583738Z // end inline asm 2026-02-21T09:20:56.0583809Z cp.async.commit_group; 2026-02-21T09:20:56.0584016Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0584089Z add.s32 %r3777, %r3764, %r11870; 2026-02-21T09:20:56.0584292Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 
2026-02-21T09:20:56.0584360Z cvt.s64.s32 %rd216, %r3777; 2026-02-21T09:20:56.0584443Z add.s64 %rd169, %rd45, %rd216; 2026-02-21T09:20:56.0584509Z mov.b32 %r12023, 4; 2026-02-21T09:20:56.0584762Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0584831Z // begin inline asm 2026-02-21T09:20:56.0584972Z cp.async.ca.shared.global [ %r42 + 0 ], [ %rd169 + 0 ], 0x4, %r12023; 2026-02-21T09:20:56.0585044Z // end inline asm 2026-02-21T09:20:56.0585115Z cp.async.commit_group; 2026-02-21T09:20:56.0585324Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0585390Z cvt.s64.s32 %rd217, %r3769; 2026-02-21T09:20:56.0585456Z or.b64 %rd218, %rd217, %rd633; 2026-02-21T09:20:56.0585526Z shl.b64 %rd219, %rd218, 1; 2026-02-21T09:20:56.0585592Z add.s64 %rd220, %rd44, %rd219; 2026-02-21T09:20:56.0585658Z add.s64 %rd170, %rd220, 32; 2026-02-21T09:20:56.0585720Z cvt.s64.s32 %rd221, %r3770; 2026-02-21T09:20:56.0585791Z or.b64 %rd222, %rd221, %rd633; 2026-02-21T09:20:56.0585856Z shl.b64 %rd223, %rd222, 1; 2026-02-21T09:20:56.0585919Z add.s64 %rd224, %rd44, %rd223; 2026-02-21T09:20:56.0585989Z add.s64 %rd171, %rd224, 32; 2026-02-21T09:20:56.0586052Z cvt.s64.s32 %rd225, %r3771; 2026-02-21T09:20:56.0586120Z or.b64 %rd226, %rd225, %rd633; 2026-02-21T09:20:56.0586191Z shl.b64 %rd227, %rd226, 1; 2026-02-21T09:20:56.0586258Z add.s64 %rd228, %rd44, %rd227; 2026-02-21T09:20:56.0586320Z add.s64 %rd172, %rd228, 32; 2026-02-21T09:20:56.0586383Z cvt.s64.s32 %rd229, %r3772; 2026-02-21T09:20:56.0586574Z or.b64 %rd230, %rd229, %rd633; 2026-02-21T09:20:56.0586646Z shl.b64 %rd231, %rd230, 1; 2026-02-21T09:20:56.0586709Z add.s64 %rd232, %rd44, %rd231; 2026-02-21T09:20:56.0586778Z add.s64 %rd173, %rd232, 32; 2026-02-21T09:20:56.0586985Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0587051Z // begin inline asm 2026-02-21T09:20:56.0587185Z cp.async.ca.shared.global [ 
%r43 + 0 ], [ %rd170 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0587252Z // end inline asm 2026-02-21T09:20:56.0587313Z // begin inline asm 2026-02-21T09:20:56.0587447Z cp.async.ca.shared.global [ %r44 + 0 ], [ %rd171 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0587511Z // end inline asm 2026-02-21T09:20:56.0587573Z // begin inline asm 2026-02-21T09:20:56.0587704Z cp.async.ca.shared.global [ %r45 + 0 ], [ %rd172 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0587770Z // end inline asm 2026-02-21T09:20:56.0587830Z // begin inline asm 2026-02-21T09:20:56.0587959Z cp.async.ca.shared.global [ %r46 + 0 ], [ %rd173 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0588112Z // end inline asm 2026-02-21T09:20:56.0588186Z cp.async.commit_group; 2026-02-21T09:20:56.0588479Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0588620Z add.s32 %r3778, %r3764, %r47; 2026-02-21T09:20:56.0588831Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0588897Z cvt.s64.s32 %rd233, %r3778; 2026-02-21T09:20:56.0588965Z add.s64 %rd174, %rd45, %rd233; 2026-02-21T09:20:56.0589178Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0589241Z // begin inline asm 2026-02-21T09:20:56.0589441Z cp.async.ca.shared.global [ %r48 + 0 ], [ %rd174 + 0 ], 0x4, %r12023; 2026-02-21T09:20:56.0589503Z // end inline asm 2026-02-21T09:20:56.0589577Z cp.async.commit_group; 2026-02-21T09:20:56.0589781Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0589849Z add.s64 %rd175, %rd220, 64; 2026-02-21T09:20:56.0589919Z add.s64 %rd176, %rd224, 64; 2026-02-21T09:20:56.0589981Z add.s64 %rd177, %rd228, 64; 2026-02-21T09:20:56.0590056Z add.s64 %rd178, %rd232, 64; 2026-02-21T09:20:56.0590263Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0590329Z bar.sync 0; 2026-02-21T09:20:56.0590391Z // begin inline asm 
2026-02-21T09:20:56.0590588Z cp.async.ca.shared.global [ %r49 + 0 ], [ %rd175 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0590658Z // end inline asm 2026-02-21T09:20:56.0590718Z // begin inline asm 2026-02-21T09:20:56.0590852Z cp.async.ca.shared.global [ %r50 + 0 ], [ %rd176 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0590917Z // end inline asm 2026-02-21T09:20:56.0590979Z // begin inline asm 2026-02-21T09:20:56.0591111Z cp.async.ca.shared.global [ %r51 + 0 ], [ %rd177 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0591173Z // end inline asm 2026-02-21T09:20:56.0591239Z // begin inline asm 2026-02-21T09:20:56.0591369Z cp.async.ca.shared.global [ %r52 + 0 ], [ %rd178 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0591428Z // end inline asm 2026-02-21T09:20:56.0591504Z cp.async.commit_group; 2026-02-21T09:20:56.0591710Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0591777Z add.s32 %r3779, %r3764, %r53; 2026-02-21T09:20:56.0591988Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0592055Z cvt.s64.s32 %rd234, %r3779; 2026-02-21T09:20:56.0592122Z add.s64 %rd179, %rd45, %rd234; 2026-02-21T09:20:56.0592326Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0592393Z // begin inline asm 2026-02-21T09:20:56.0592531Z cp.async.ca.shared.global [ %r54 + 0 ], [ %rd179 + 0 ], 0x4, %r12023; 2026-02-21T09:20:56.0592593Z // end inline asm 2026-02-21T09:20:56.0592666Z cp.async.commit_group; 2026-02-21T09:20:56.0592867Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0592932Z add.s64 %rd180, %rd220, 96; 2026-02-21T09:20:56.0593000Z add.s64 %rd181, %rd224, 96; 2026-02-21T09:20:56.0593065Z add.s64 %rd182, %rd228, 96; 2026-02-21T09:20:56.0593127Z add.s64 %rd183, %rd232, 96; 2026-02-21T09:20:56.0593332Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0593398Z // 
begin inline asm 2026-02-21T09:20:56.0593530Z cp.async.ca.shared.global [ %r55 + 0 ], [ %rd180 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0593588Z // end inline asm 2026-02-21T09:20:56.0593654Z // begin inline asm 2026-02-21T09:20:56.0593784Z cp.async.ca.shared.global [ %r56 + 0 ], [ %rd181 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0593842Z // end inline asm 2026-02-21T09:20:56.0593964Z // begin inline asm 2026-02-21T09:20:56.0594100Z cp.async.ca.shared.global [ %r57 + 0 ], [ %rd182 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0594158Z // end inline asm 2026-02-21T09:20:56.0594218Z // begin inline asm 2026-02-21T09:20:56.0594352Z cp.async.ca.shared.global [ %r58 + 0 ], [ %rd183 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0594457Z // end inline asm 2026-02-21T09:20:56.0594524Z cp.async.commit_group; 2026-02-21T09:20:56.0594732Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0594798Z add.s32 %r3780, %r3764, %r59; 2026-02-21T09:20:56.0595000Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0595064Z cvt.s64.s32 %rd235, %r3780; 2026-02-21T09:20:56.0595186Z add.s64 %rd184, %rd45, %rd235; 2026-02-21T09:20:56.0595390Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0595452Z // begin inline asm 2026-02-21T09:20:56.0595591Z cp.async.ca.shared.global [ %r60 + 0 ], [ %rd184 + 0 ], 0x4, %r12023; 2026-02-21T09:20:56.0595649Z // end inline asm 2026-02-21T09:20:56.0595729Z cp.async.commit_group; 2026-02-21T09:20:56.0595942Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0596005Z add.s64 %rd185, %rd220, 128; 2026-02-21T09:20:56.0596066Z add.s64 %rd186, %rd224, 128; 2026-02-21T09:20:56.0596126Z add.s64 %rd187, %rd228, 128; 2026-02-21T09:20:56.0596239Z add.s64 %rd188, %rd232, 128; 2026-02-21T09:20:56.0596442Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 
2026-02-21T09:20:56.0596620Z bar.sync 0; 2026-02-21T09:20:56.0596689Z // begin inline asm 2026-02-21T09:20:56.0596824Z cp.async.ca.shared.global [ %r61 + 0 ], [ %rd185 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0596881Z // end inline asm 2026-02-21T09:20:56.0596947Z // begin inline asm 2026-02-21T09:20:56.0597078Z cp.async.ca.shared.global [ %r62 + 0 ], [ %rd186 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0597136Z // end inline asm 2026-02-21T09:20:56.0597197Z // begin inline asm 2026-02-21T09:20:56.0597331Z cp.async.ca.shared.global [ %r63 + 0 ], [ %rd187 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0597393Z // end inline asm 2026-02-21T09:20:56.0597454Z // begin inline asm 2026-02-21T09:20:56.0597588Z cp.async.ca.shared.global [ %r64 + 0 ], [ %rd188 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0597648Z // end inline asm 2026-02-21T09:20:56.0597719Z cp.async.commit_group; 2026-02-21T09:20:56.0597924Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0597996Z add.s32 %r3781, %r3764, %r65; 2026-02-21T09:20:56.0598198Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0598263Z cvt.s64.s32 %rd236, %r3781; 2026-02-21T09:20:56.0598349Z add.s64 %rd189, %rd45, %rd236; 2026-02-21T09:20:56.0598554Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0598617Z // begin inline asm 2026-02-21T09:20:56.0598757Z cp.async.ca.shared.global [ %r66 + 0 ], [ %rd189 + 0 ], 0x4, %r12023; 2026-02-21T09:20:56.0598819Z // end inline asm 2026-02-21T09:20:56.0598888Z cp.async.commit_group; 2026-02-21T09:20:56.0599091Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0599172Z add.s64 %rd190, %rd220, 160; 2026-02-21T09:20:56.0599238Z add.s64 %rd191, %rd224, 160; 2026-02-21T09:20:56.0599303Z add.s64 %rd192, %rd228, 160; 2026-02-21T09:20:56.0599372Z add.s64 %rd193, %rd232, 160; 2026-02-21T09:20:56.0599578Z .loc 1 51 80 // 
czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0599638Z // begin inline asm 2026-02-21T09:20:56.0599774Z cp.async.ca.shared.global [ %r67 + 0 ], [ %rd190 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0599944Z // end inline asm 2026-02-21T09:20:56.0600004Z // begin inline asm 2026-02-21T09:20:56.0600136Z cp.async.ca.shared.global [ %r68 + 0 ], [ %rd191 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0600198Z // end inline asm 2026-02-21T09:20:56.0600323Z // begin inline asm 2026-02-21T09:20:56.0600464Z cp.async.ca.shared.global [ %r69 + 0 ], [ %rd192 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0600531Z // end inline asm 2026-02-21T09:20:56.0600592Z // begin inline asm 2026-02-21T09:20:56.0600726Z cp.async.ca.shared.global [ %r70 + 0 ], [ %rd193 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0600791Z // end inline asm 2026-02-21T09:20:56.0600860Z cp.async.commit_group; 2026-02-21T09:20:56.0601137Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0601205Z add.s32 %r3782, %r3764, %r71; 2026-02-21T09:20:56.0601414Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0601479Z cvt.s64.s32 %rd237, %r3782; 2026-02-21T09:20:56.0601547Z add.s64 %rd194, %rd45, %rd237; 2026-02-21T09:20:56.0601755Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0601818Z // begin inline asm 2026-02-21T09:20:56.0601951Z cp.async.ca.shared.global [ %r72 + 0 ], [ %rd194 + 0 ], 0x4, %r12023; 2026-02-21T09:20:56.0602015Z // end inline asm 2026-02-21T09:20:56.0602082Z cp.async.commit_group; 2026-02-21T09:20:56.0602346Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0602414Z add.s64 %rd195, %rd220, 192; 2026-02-21T09:20:56.0602487Z add.s64 %rd196, %rd224, 192; 2026-02-21T09:20:56.0602551Z add.s64 %rd197, %rd228, 192; 2026-02-21T09:20:56.0602615Z add.s64 %rd198, %rd232, 192; 
2026-02-21T09:20:56.0602826Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0602886Z bar.sync 0; 2026-02-21T09:20:56.0602947Z // begin inline asm 2026-02-21T09:20:56.0603080Z cp.async.ca.shared.global [ %r73 + 0 ], [ %rd195 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0603145Z // end inline asm 2026-02-21T09:20:56.0603207Z // begin inline asm 2026-02-21T09:20:56.0603337Z cp.async.ca.shared.global [ %r74 + 0 ], [ %rd196 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0603409Z // end inline asm 2026-02-21T09:20:56.0603475Z // begin inline asm 2026-02-21T09:20:56.0603606Z cp.async.ca.shared.global [ %r75 + 0 ], [ %rd197 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0603669Z // end inline asm 2026-02-21T09:20:56.0603731Z // begin inline asm 2026-02-21T09:20:56.0603860Z cp.async.ca.shared.global [ %r76 + 0 ], [ %rd198 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0603919Z // end inline asm 2026-02-21T09:20:56.0603992Z cp.async.commit_group; 2026-02-21T09:20:56.0604194Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0604259Z add.s32 %r3783, %r3764, %r77; 2026-02-21T09:20:56.0604467Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0604531Z cvt.s64.s32 %rd238, %r3783; 2026-02-21T09:20:56.0604601Z add.s64 %rd199, %rd45, %rd238; 2026-02-21T09:20:56.0604805Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0604866Z // begin inline asm 2026-02-21T09:20:56.0605001Z cp.async.ca.shared.global [ %r78 + 0 ], [ %rd199 + 0 ], 0x4, %r12023; 2026-02-21T09:20:56.0605060Z // end inline asm 2026-02-21T09:20:56.0605134Z cp.async.commit_group; 2026-02-21T09:20:56.0605337Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0605404Z add.s64 %rd200, %rd220, 224; 2026-02-21T09:20:56.0605472Z add.s64 %rd201, %rd224, 224; 2026-02-21T09:20:56.0605534Z add.s64 %rd202, %rd228, 
224; 2026-02-21T09:20:56.0605667Z add.s64 %rd203, %rd232, 224; 2026-02-21T09:20:56.0605878Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0605940Z // begin inline asm 2026-02-21T09:20:56.0606072Z cp.async.ca.shared.global [ %r79 + 0 ], [ %rd200 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0606177Z // end inline asm 2026-02-21T09:20:56.0606241Z // begin inline asm 2026-02-21T09:20:56.0606370Z cp.async.ca.shared.global [ %r80 + 0 ], [ %rd201 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0606429Z // end inline asm 2026-02-21T09:20:56.0606609Z // begin inline asm 2026-02-21T09:20:56.0606744Z cp.async.ca.shared.global [ %r81 + 0 ], [ %rd202 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0606804Z // end inline asm 2026-02-21T09:20:56.0606939Z // begin inline asm 2026-02-21T09:20:56.0607083Z cp.async.ca.shared.global [ %r82 + 0 ], [ %rd203 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0607141Z // end inline asm 2026-02-21T09:20:56.0607209Z cp.async.commit_group; 2026-02-21T09:20:56.0607421Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0607485Z add.s32 %r3784, %r3764, %r83; 2026-02-21T09:20:56.0607690Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0607760Z cvt.s64.s32 %rd239, %r3784; 2026-02-21T09:20:56.0607824Z add.s64 %rd204, %rd45, %rd239; 2026-02-21T09:20:56.0608027Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0608329Z // begin inline asm 2026-02-21T09:20:56.0608478Z cp.async.ca.shared.global [ %r84 + 0 ], [ %rd204 + 0 ], 0x4, %r12023; 2026-02-21T09:20:56.0608539Z // end inline asm 2026-02-21T09:20:56.0608608Z cp.async.commit_group; 2026-02-21T09:20:56.0608817Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0608880Z add.s64 %rd205, %rd220, 256; 2026-02-21T09:20:56.0608946Z add.s64 %rd206, %rd224, 256; 2026-02-21T09:20:56.0609013Z 
add.s64 %rd207, %rd228, 256; 2026-02-21T09:20:56.0609075Z add.s64 %rd208, %rd232, 256; 2026-02-21T09:20:56.0609278Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0609338Z bar.sync 0; 2026-02-21T09:20:56.0609403Z // begin inline asm 2026-02-21T09:20:56.0609536Z cp.async.ca.shared.global [ %r85 + 0 ], [ %rd205 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0609598Z // end inline asm 2026-02-21T09:20:56.0609665Z // begin inline asm 2026-02-21T09:20:56.0609797Z cp.async.ca.shared.global [ %r86 + 0 ], [ %rd206 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0609855Z // end inline asm 2026-02-21T09:20:56.0609919Z // begin inline asm 2026-02-21T09:20:56.0610052Z cp.async.ca.shared.global [ %r87 + 0 ], [ %rd207 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0610110Z // end inline asm 2026-02-21T09:20:56.0610169Z // begin inline asm 2026-02-21T09:20:56.0610310Z cp.async.ca.shared.global [ %r88 + 0 ], [ %rd208 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0610369Z // end inline asm 2026-02-21T09:20:56.0610437Z cp.async.commit_group; 2026-02-21T09:20:56.0610646Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0610713Z add.s32 %r3785, %r3764, %r89; 2026-02-21T09:20:56.0610922Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0610994Z cvt.s64.s32 %rd240, %r3785; 2026-02-21T09:20:56.0615163Z add.s64 %rd209, %rd45, %rd240; 2026-02-21T09:20:56.0615449Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0615519Z // begin inline asm 2026-02-21T09:20:56.0615677Z cp.async.ca.shared.global [ %r90 + 0 ], [ %rd209 + 0 ], 0x4, %r12023; 2026-02-21T09:20:56.0615741Z // end inline asm 2026-02-21T09:20:56.0615816Z cp.async.commit_group; 2026-02-21T09:20:56.0616191Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0616263Z add.s64 %rd210, %rd220, 288; 
2026-02-21T09:20:56.0616333Z add.s64 %rd211, %rd224, 288; 2026-02-21T09:20:56.0616396Z add.s64 %rd212, %rd228, 288; 2026-02-21T09:20:56.0616731Z add.s64 %rd213, %rd232, 288; 2026-02-21T09:20:56.0616970Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0617035Z // begin inline asm 2026-02-21T09:20:56.0617188Z cp.async.ca.shared.global [ %r91 + 0 ], [ %rd210 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0617250Z // end inline asm 2026-02-21T09:20:56.0617317Z // begin inline asm 2026-02-21T09:20:56.0617542Z cp.async.ca.shared.global [ %r92 + 0 ], [ %rd211 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0617608Z // end inline asm 2026-02-21T09:20:56.0617680Z // begin inline asm 2026-02-21T09:20:56.0617824Z cp.async.ca.shared.global [ %r93 + 0 ], [ %rd212 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0617887Z // end inline asm 2026-02-21T09:20:56.0617948Z // begin inline asm 2026-02-21T09:20:56.0618080Z cp.async.ca.shared.global [ %r94 + 0 ], [ %rd213 + 0 ], 0x8, %r3535; 2026-02-21T09:20:56.0618139Z // end inline asm 2026-02-21T09:20:56.0618212Z cp.async.commit_group; 2026-02-21T09:20:56.0618439Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0618508Z add.s32 %r3786, %r3764, %r95; 2026-02-21T09:20:56.0618789Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0618864Z cvt.s64.s32 %rd241, %r3786; 2026-02-21T09:20:56.0618936Z add.s64 %rd214, %rd45, %rd241; 2026-02-21T09:20:56.0619150Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0619216Z // begin inline asm 2026-02-21T09:20:56.0619363Z cp.async.ca.shared.global [ %r96 + 0 ], [ %rd214 + 0 ], 0x4, %r12023; 2026-02-21T09:20:56.0619424Z // end inline asm 2026-02-21T09:20:56.0619493Z cp.async.commit_group; 2026-02-21T09:20:56.0619707Z .loc 1 43 78 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:43:78 2026-02-21T09:20:56.0619776Z add.s32 
%r12021, %r117, %r479; 2026-02-21T09:20:56.0619845Z or.b32 %r3787, %r13, %r480; 2026-02-21T09:20:56.0619917Z shl.b32 %r3788, %r3787, 10; 2026-02-21T09:20:56.0619995Z mul.wide.s32 %rd15, %r3788, 2; 2026-02-21T09:20:56.0620059Z or.b32 %r3789, %r12, %r480; 2026-02-21T09:20:56.0620120Z shl.b32 %r3790, %r3789, 10; 2026-02-21T09:20:56.0620193Z mul.wide.s32 %rd16, %r3790, 2; 2026-02-21T09:20:56.0620254Z shl.b32 %r3791, %r3760, 18; 2026-02-21T09:20:56.0620322Z or.b32 %r3792, %r11886, %r3791; 2026-02-21T09:20:56.0620395Z mul.wide.s32 %rd17, %r3792, 2; 2026-02-21T09:20:56.0620461Z or.b32 %r12020, %r121, %r3791; 2026-02-21T09:20:56.0620528Z mov.b32 %r12024, 0f00000000; 2026-02-21T09:20:56.0620591Z mov.b32 %r12022, -1; 2026-02-21T09:20:56.0620662Z mov.b64 %rd637, -16; 2026-02-21T09:20:56.0620724Z mov.b64 %rd636, %rd3; 2026-02-21T09:20:56.0620788Z mov.b32 %r12025, %r12024; 2026-02-21T09:20:56.0620864Z mov.b32 %r12026, %r12024; 2026-02-21T09:20:56.0620924Z mov.b32 %r12027, %r12024; 2026-02-21T09:20:56.0620985Z mov.b32 %r12028, %r12024; 2026-02-21T09:20:56.0621049Z mov.b32 %r12029, %r12024; 2026-02-21T09:20:56.0621109Z mov.b32 %r12030, %r12024; 2026-02-21T09:20:56.0621169Z mov.b32 %r12031, %r12024; 2026-02-21T09:20:56.0621229Z mov.b32 %r12032, %r12024; 2026-02-21T09:20:56.0621295Z mov.b32 %r12033, %r12024; 2026-02-21T09:20:56.0621355Z mov.b32 %r12034, %r12024; 2026-02-21T09:20:56.0621414Z mov.b32 %r12035, %r12024; 2026-02-21T09:20:56.0621482Z mov.b32 %r12036, %r12024; 2026-02-21T09:20:56.0621543Z mov.b32 %r12037, %r12024; 2026-02-21T09:20:56.0621603Z mov.b32 %r12038, %r12024; 2026-02-21T09:20:56.0621661Z mov.b32 %r12039, %r12024; 2026-02-21T09:20:56.0621729Z mov.b32 %r12040, %r12024; 2026-02-21T09:20:56.0621880Z mov.b32 %r12041, %r12024; 2026-02-21T09:20:56.0621940Z mov.b32 %r12042, %r12024; 2026-02-21T09:20:56.0622009Z mov.b32 %r12043, %r12024; 2026-02-21T09:20:56.0622071Z mov.b32 %r12044, %r12024; 2026-02-21T09:20:56.0622130Z mov.b32 %r12045, %r12024; 
2026-02-21T09:20:56.0622250Z mov.b32 %r12046, %r12024; 2026-02-21T09:20:56.0622315Z mov.b32 %r12047, %r12024; 2026-02-21T09:20:56.0622374Z mov.b32 %r12048, %r12024; 2026-02-21T09:20:56.0622433Z mov.b32 %r12049, %r12024; 2026-02-21T09:20:56.0622503Z mov.b32 %r12050, %r12024; 2026-02-21T09:20:56.0622572Z mov.b32 %r12051, %r12024; 2026-02-21T09:20:56.0622633Z mov.b32 %r12052, %r12024; 2026-02-21T09:20:56.0622692Z mov.b32 %r12053, %r12024; 2026-02-21T09:20:56.0622756Z mov.b32 %r12054, %r12024; 2026-02-21T09:20:56.0622869Z mov.b32 %r12055, %r12024; 2026-02-21T09:20:56.0622932Z mov.b32 %r12056, %r12024; 2026-02-21T09:20:56.0622996Z mov.b32 %r12057, %r12024; 2026-02-21T09:20:56.0623057Z mov.b32 %r12058, %r12024; 2026-02-21T09:20:56.0623118Z mov.b32 %r12059, %r12024; 2026-02-21T09:20:56.0623178Z mov.b32 %r12060, %r12024; 2026-02-21T09:20:56.0623242Z mov.b32 %r12061, %r12024; 2026-02-21T09:20:56.0623311Z mov.b32 %r12062, %r12024; 2026-02-21T09:20:56.0623374Z mov.b32 %r12063, %r12024; 2026-02-21T09:20:56.0623441Z mov.b32 %r12064, %r12024; 2026-02-21T09:20:56.0623500Z mov.b32 %r12065, %r12024; 2026-02-21T09:20:56.0623561Z mov.b32 %r12066, %r12024; 2026-02-21T09:20:56.0623622Z mov.b32 %r12067, %r12024; 2026-02-21T09:20:56.0623685Z mov.b32 %r12068, %r12024; 2026-02-21T09:20:56.0623818Z mov.b32 %r12069, %r12024; 2026-02-21T09:20:56.0623880Z mov.b32 %r12070, %r12024; 2026-02-21T09:20:56.0623945Z mov.b32 %r12071, %r12024; 2026-02-21T09:20:56.0624004Z mov.b32 %r12072, %r12024; 2026-02-21T09:20:56.0624066Z mov.b32 %r12073, %r12024; 2026-02-21T09:20:56.0624134Z mov.b32 %r12074, %r12024; 2026-02-21T09:20:56.0624194Z mov.b32 %r12075, %r12024; 2026-02-21T09:20:56.0624253Z mov.b32 %r12076, %r12024; 2026-02-21T09:20:56.0624315Z mov.b32 %r12077, %r12024; 2026-02-21T09:20:56.0624390Z mov.b32 %r12078, %r12024; 2026-02-21T09:20:56.0624452Z mov.b32 %r12079, %r12024; 2026-02-21T09:20:56.0624514Z mov.b32 %r12080, %r12024; 2026-02-21T09:20:56.0624582Z mov.b32 %r12081, %r12024; 
2026-02-21T09:20:56.0624643Z mov.b32 %r12082, %r12024; 2026-02-21T09:20:56.0624702Z mov.b32 %r12083, %r12024; 2026-02-21T09:20:56.0624762Z mov.b32 %r12084, %r12024; 2026-02-21T09:20:56.0624828Z mov.b32 %r12085, %r12024; 2026-02-21T09:20:56.0624886Z mov.b32 %r12086, %r12024; 2026-02-21T09:20:56.0624948Z mov.b32 %r12087, %r12024; 2026-02-21T09:20:56.0625012Z mov.b32 %r12088, %r12024; 2026-02-21T09:20:56.0625073Z mov.b32 %r12089, %r12024; 2026-02-21T09:20:56.0625131Z mov.b32 %r12090, %r12024; 2026-02-21T09:20:56.0625196Z mov.b32 %r12091, %r12024; 2026-02-21T09:20:56.0625261Z mov.b32 %r12092, %r12024; 2026-02-21T09:20:56.0625319Z mov.b32 %r12093, %r12024; 2026-02-21T09:20:56.0625378Z mov.b32 %r12094, %r12024; 2026-02-21T09:20:56.0625444Z mov.b32 %r12095, %r12024; 2026-02-21T09:20:56.0625508Z mov.b32 %r12096, %r12024; 2026-02-21T09:20:56.0625567Z mov.b32 %r12097, %r12024; 2026-02-21T09:20:56.0625627Z mov.b32 %r12098, %r12024; 2026-02-21T09:20:56.0625693Z mov.b32 %r12099, %r12024; 2026-02-21T09:20:56.0625754Z mov.b32 %r12100, %r12024; 2026-02-21T09:20:56.0625813Z mov.b32 %r12101, %r12024; 2026-02-21T09:20:56.0625877Z mov.b32 %r12102, %r12024; 2026-02-21T09:20:56.0625937Z mov.b32 %r12103, %r12024; 2026-02-21T09:20:56.0625995Z mov.b32 %r12104, %r12024; 2026-02-21T09:20:56.0626057Z mov.b32 %r12105, %r12024; 2026-02-21T09:20:56.0626124Z mov.b32 %r12106, %r12024; 2026-02-21T09:20:56.0626184Z mov.b32 %r12107, %r12024; 2026-02-21T09:20:56.0626247Z mov.b32 %r12108, %r12024; 2026-02-21T09:20:56.0626315Z mov.b32 %r12109, %r12024; 2026-02-21T09:20:56.0626376Z mov.b32 %r12110, %r12024; 2026-02-21T09:20:56.0626438Z mov.b32 %r12111, %r12024; 2026-02-21T09:20:56.0626632Z mov.b32 %r12112, %r12024; 2026-02-21T09:20:56.0626792Z mov.b32 %r12113, %r12024; 2026-02-21T09:20:56.0626854Z mov.b32 %r12114, %r12024; 2026-02-21T09:20:56.0626914Z mov.b32 %r12115, %r12024; 2026-02-21T09:20:56.0626977Z mov.b32 %r12116, %r12024; 2026-02-21T09:20:56.0627038Z mov.b32 %r12117, %r12024; 
2026-02-21T09:20:56.0627161Z mov.b32 %r12118, %r12024; 2026-02-21T09:20:56.0627231Z mov.b32 %r12119, %r12024; 2026-02-21T09:20:56.0627292Z mov.b32 %r12120, %r12024; 2026-02-21T09:20:56.0627351Z mov.b32 %r12121, %r12024; 2026-02-21T09:20:56.0627411Z mov.b32 %r12122, %r12024; 2026-02-21T09:20:56.0627480Z mov.b32 %r12123, %r12024; 2026-02-21T09:20:56.0627540Z mov.b32 %r12124, %r12024; 2026-02-21T09:20:56.0627600Z mov.b32 %r12125, %r12024; 2026-02-21T09:20:56.0627665Z mov.b32 %r12126, %r12024; 2026-02-21T09:20:56.0627791Z mov.b32 %r12127, %r12024; 2026-02-21T09:20:56.0627863Z mov.b32 %r12128, %r12024; 2026-02-21T09:20:56.0627924Z mov.b32 %r12129, %r12024; 2026-02-21T09:20:56.0627990Z mov.b32 %r12130, %r12024; 2026-02-21T09:20:56.0628054Z mov.b32 %r12131, %r12024; 2026-02-21T09:20:56.0628118Z mov.b32 %r12132, %r12024; 2026-02-21T09:20:56.0628181Z mov.b32 %r12133, %r12024; 2026-02-21T09:20:56.0628239Z mov.b32 %r12134, %r12024; 2026-02-21T09:20:56.0628374Z mov.b32 %r12135, %r12024; 2026-02-21T09:20:56.0628442Z mov.b32 %r12136, %r12024; 2026-02-21T09:20:56.0628512Z mov.b32 %r12137, %r12024; 2026-02-21T09:20:56.0628575Z mov.b32 %r12138, %r12024; 2026-02-21T09:20:56.0628637Z mov.b32 %r12139, %r12024; 2026-02-21T09:20:56.0628703Z mov.b32 %r12140, %r12024; 2026-02-21T09:20:56.0628842Z mov.b32 %r12141, %r12024; 2026-02-21T09:20:56.0628907Z mov.b32 %r12142, %r12024; 2026-02-21T09:20:56.0628967Z mov.b32 %r12143, %r12024; 2026-02-21T09:20:56.0629034Z mov.b32 %r12144, %r12024; 2026-02-21T09:20:56.0629096Z mov.b32 %r12145, %r12024; 2026-02-21T09:20:56.0629157Z mov.b32 %r12146, %r12024; 2026-02-21T09:20:56.0629224Z mov.b32 %r12147, %r12024; 2026-02-21T09:20:56.0629283Z mov.b32 %r12148, %r12024; 2026-02-21T09:20:56.0629344Z mov.b32 %r12149, %r12024; 2026-02-21T09:20:56.0629403Z mov.b32 %r12150, %r12024; 2026-02-21T09:20:56.0629467Z mov.b32 %r12151, %r12024; 2026-02-21T09:20:56.0629588Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T09:20:56.0629700Z // => This Inner Loop Header: 
Depth=2 2026-02-21T09:20:56.0629783Z add.s64 %rd637, %rd637, 16; 2026-02-21T09:20:56.0629856Z setp.lt.u64 %p24, %rd637, 432; 2026-02-21T09:20:56.0629920Z add.s32 %r5393, %r12022, 1; 2026-02-21T09:20:56.0629996Z setp.gt.s32 %p25, %r5393, 4; 2026-02-21T09:20:56.0630065Z selp.b32 %r12022, 0, %r5393, %p25; 2026-02-21T09:20:56.0630282Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0630356Z cp.async.wait_group 16; 2026-02-21T09:20:56.0630420Z bar.sync 0; 2026-02-21T09:20:56.0630484Z shl.b32 %r5394, %r12022, 13; 2026-02-21T09:20:56.0630550Z add.s32 %r5396, %r11869, %r5394; 2026-02-21T09:20:56.0630764Z .loc 1 55 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:55:32 2026-02-21T09:20:56.0630831Z add.s32 %r5397, %r5396, %r97; 2026-02-21T09:20:56.0630908Z ld.shared.b16 %rs113, [%r5397]; 2026-02-21T09:20:56.0630984Z ld.shared.b16 %rs114, [%r5397+256]; 2026-02-21T09:20:56.0631056Z ld.shared.b16 %rs115, [%r5397+16]; 2026-02-21T09:20:56.0631125Z ld.shared.b16 %rs116, [%r5397+272]; 2026-02-21T09:20:56.0631196Z ld.shared.b16 %rs117, [%r5397+4096]; 2026-02-21T09:20:56.0631269Z ld.shared.b16 %rs118, [%r5397+4352]; 2026-02-21T09:20:56.0631338Z ld.shared.b16 %rs119, [%r5397+4112]; 2026-02-21T09:20:56.0631404Z ld.shared.b16 %rs120, [%r5397+4368]; 2026-02-21T09:20:56.0631472Z add.s32 %r5398, %r5396, %r98; 2026-02-21T09:20:56.0631539Z ld.shared.b16 %rs121, [%r5398]; 2026-02-21T09:20:56.0631606Z ld.shared.b16 %rs122, [%r5398+256]; 2026-02-21T09:20:56.0631672Z ld.shared.b16 %rs123, [%r5398+16]; 2026-02-21T09:20:56.0631818Z ld.shared.b16 %rs124, [%r5398+272]; 2026-02-21T09:20:56.0631886Z ld.shared.b16 %rs125, [%r5398+4096]; 2026-02-21T09:20:56.0631954Z ld.shared.b16 %rs126, [%r5398+4352]; 2026-02-21T09:20:56.0632026Z ld.shared.b16 %rs127, [%r5398+4112]; 2026-02-21T09:20:56.0632152Z ld.shared.b16 %rs128, [%r5398+4368]; 2026-02-21T09:20:56.0632222Z cvt.f32.bf16 %r3921, %rs113; 2026-02-21T09:20:56.0632288Z cvt.f32.bf16 %r3922, 
%rs114; 2026-02-21T09:20:56.0632354Z cvt.f32.bf16 %r3923, %rs121; 2026-02-21T09:20:56.0632416Z cvt.f32.bf16 %r3924, %rs122; 2026-02-21T09:20:56.0632480Z cvt.f32.bf16 %r4053, %rs115; 2026-02-21T09:20:56.0632547Z cvt.f32.bf16 %r4054, %rs116; 2026-02-21T09:20:56.0632609Z cvt.f32.bf16 %r4055, %rs123; 2026-02-21T09:20:56.0632670Z cvt.f32.bf16 %r4056, %rs124; 2026-02-21T09:20:56.0632786Z cvt.f32.bf16 %r4185, %rs117; 2026-02-21T09:20:56.0632851Z cvt.f32.bf16 %r4186, %rs118; 2026-02-21T09:20:56.0632913Z cvt.f32.bf16 %r4187, %rs125; 2026-02-21T09:20:56.0632975Z cvt.f32.bf16 %r4188, %rs126; 2026-02-21T09:20:56.0633043Z cvt.f32.bf16 %r4317, %rs119; 2026-02-21T09:20:56.0633105Z cvt.f32.bf16 %r4318, %rs120; 2026-02-21T09:20:56.0633167Z cvt.f32.bf16 %r4319, %rs127; 2026-02-21T09:20:56.0633232Z cvt.f32.bf16 %r4320, %rs128; 2026-02-21T09:20:56.0633443Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0633505Z shl.b32 %r5399, %r12022, 10; 2026-02-21T09:20:56.0633570Z add.s32 %r5400, %r11869, %r5399; 2026-02-21T09:20:56.0633638Z add.s32 %r5401, %r5400, 90112; 2026-02-21T09:20:56.0633888Z .loc 1 70 45 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:70:45 2026-02-21T09:20:56.0633965Z add.s32 %r5402, %r5401, %r11875; 2026-02-21T09:20:56.0634046Z ld.shared.b8 %rs129, [%r5402]; 2026-02-21T09:20:56.0634113Z ld.shared.b8 %rs130, [%r5402+128]; 2026-02-21T09:20:56.0634178Z ld.shared.b8 %rs131, [%r5402+256]; 2026-02-21T09:20:56.0634247Z ld.shared.b8 %rs132, [%r5402+384]; 2026-02-21T09:20:56.0634314Z ld.shared.b8 %rs133, [%r5402+512]; 2026-02-21T09:20:56.0634379Z ld.shared.b8 %rs134, [%r5402+640]; 2026-02-21T09:20:56.0634442Z ld.shared.b8 %rs135, [%r5402+768]; 2026-02-21T09:20:56.0634510Z add.s32 %r5403, %r5401, %r11876; 2026-02-21T09:20:56.0634581Z ld.shared.b8 %rs136, [%r5403]; 2026-02-21T09:20:56.0634787Z .loc 1 60 28 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:60:28 2026-02-21T09:20:56.0634861Z shl.b16 %rs137, 
%rs129, 4; 2026-02-21T09:20:56.0634923Z shl.b16 %rs138, %rs130, 4; 2026-02-21T09:20:56.0634987Z shl.b16 %rs139, %rs131, 4; 2026-02-21T09:20:56.0635053Z shl.b16 %rs140, %rs132, 4; 2026-02-21T09:20:56.0635117Z shl.b16 %rs141, %rs133, 4; 2026-02-21T09:20:56.0635180Z shl.b16 %rs142, %rs134, 4; 2026-02-21T09:20:56.0635242Z shl.b16 %rs143, %rs135, 4; 2026-02-21T09:20:56.0635310Z shl.b16 %rs144, %rs136, 4; 2026-02-21T09:20:56.0635515Z .loc 1 75 58 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:75:58 2026-02-21T09:20:56.0635595Z selp.b16 %rs145, %rs137, %rs129, %p70; 2026-02-21T09:20:56.0635664Z cvt.s16.s8 %rs146, %rs145; 2026-02-21T09:20:56.0635726Z shr.s16 %rs147, %rs146, 4; 2026-02-21T09:20:56.0635800Z selp.b16 %rs148, %rs138, %rs130, %p70; 2026-02-21T09:20:56.0635865Z cvt.s16.s8 %rs149, %rs148; 2026-02-21T09:20:56.0635935Z shr.s16 %rs150, %rs149, 4; 2026-02-21T09:20:56.0636004Z selp.b16 %rs151, %rs139, %rs131, %p70; 2026-02-21T09:20:56.0636065Z cvt.s16.s8 %rs152, %rs151; 2026-02-21T09:20:56.0636132Z shr.s16 %rs153, %rs152, 4; 2026-02-21T09:20:56.0636203Z selp.b16 %rs154, %rs140, %rs132, %p70; 2026-02-21T09:20:56.0636264Z cvt.s16.s8 %rs155, %rs154; 2026-02-21T09:20:56.0636326Z shr.s16 %rs156, %rs155, 4; 2026-02-21T09:20:56.0636405Z selp.b16 %rs157, %rs141, %rs133, %p70; 2026-02-21T09:20:56.0636595Z cvt.s16.s8 %rs158, %rs157; 2026-02-21T09:20:56.0636664Z shr.s16 %rs159, %rs158, 4; 2026-02-21T09:20:56.0636740Z selp.b16 %rs160, %rs142, %rs134, %p70; 2026-02-21T09:20:56.0636896Z cvt.s16.s8 %rs161, %rs160; 2026-02-21T09:20:56.0636959Z shr.s16 %rs162, %rs161, 4; 2026-02-21T09:20:56.0637033Z selp.b16 %rs163, %rs143, %rs135, %p70; 2026-02-21T09:20:56.0637096Z cvt.s16.s8 %rs164, %rs163; 2026-02-21T09:20:56.0637221Z shr.s16 %rs165, %rs164, 4; 2026-02-21T09:20:56.0637291Z selp.b16 %rs166, %rs144, %rs136, %p70; 2026-02-21T09:20:56.0637359Z cvt.s16.s8 %rs167, %rs166; 2026-02-21T09:20:56.0637419Z shr.s16 %rs168, %rs167, 4; 2026-02-21T09:20:56.0637628Z .loc 1 80 32 // 
czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:80:32 2026-02-21T09:20:56.0637701Z cvt.rn.f32.s16 %r5404, %rs147; 2026-02-21T09:20:56.0637765Z cvt.rn.f32.s16 %r5405, %rs150; 2026-02-21T09:20:56.0637828Z cvt.rn.f32.s16 %r5406, %rs153; 2026-02-21T09:20:56.0637955Z cvt.rn.f32.s16 %r5407, %rs156; 2026-02-21T09:20:56.0638029Z cvt.rn.f32.s16 %r5408, %rs159; 2026-02-21T09:20:56.0638092Z cvt.rn.f32.s16 %r5409, %rs162; 2026-02-21T09:20:56.0638158Z cvt.rn.f32.s16 %r5410, %rs165; 2026-02-21T09:20:56.0638225Z cvt.rn.f32.s16 %r5411, %rs168; 2026-02-21T09:20:56.0638290Z st.shared.b32 [%r101], %r5404; 2026-02-21T09:20:56.0638356Z st.shared.b32 [%r101+8], %r5405; 2026-02-21T09:20:56.0638420Z st.shared.b32 [%r102], %r5406; 2026-02-21T09:20:56.0638494Z st.shared.b32 [%r102+8], %r5407; 2026-02-21T09:20:56.0638560Z st.shared.b32 [%r103], %r5408; 2026-02-21T09:20:56.0638625Z st.shared.b32 [%r103+8], %r5409; 2026-02-21T09:20:56.0638695Z st.shared.b32 [%r104], %r5410; 2026-02-21T09:20:56.0638758Z st.shared.b32 [%r104+8], %r5411; 2026-02-21T09:20:56.0638882Z $L__tmp5: 2026-02-21T09:20:56.0639185Z .loc 2 291 36 // standard.py:291:36 @[ czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:87:40 ] 2026-02-21T09:20:56.0639256Z // begin inline asm 2026-02-21T09:20:56.0639344Z fence.proxy.async.shared::cta; 2026-02-21T09:20:56.0639402Z // end inline asm 2026-02-21T09:20:56.0639467Z bar.sync 0; 2026-02-21T09:20:56.0639550Z shfl.sync.idx.b32 %r5412, %r5, 0, 31, -1; 2026-02-21T09:20:56.0639628Z wgmma.fence.sync.aligned; 2026-02-21T09:20:56.0639693Z mov.pred %p15, -1; 2026-02-21T09:20:56.0639753Z // begin inline asm 2026-02-21T09:20:56.0641235Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12024,%r12025,%r12026,%r12027,%r12028,%r12029,%r12030,%r12031,%r12032,%r12033,%r12034,%r12035,%r12036,%r12037,%r12038,%r12039,%r12040,%r12041,%r12042,%r12043,%r12044,%r12045,%r12046,%r12047,%r12048,%r12049,%r12050,%r12051,%r12052,%r12053,%r12054,%r12055,%r12056,%r12057,%r12058,%r12059,%r12060,%r12061,%r12062,%r12063,%r12064,%r12065,%r12066,%r12067,%r12068,%r12069,%r12070,%r12071,%r12072,%r12073,%r12074,%r12075,%r12076,%r12077,%r12078,%r12079,%r12080,%r12081,%r12082,%r12083,%r12084,%r12085,%r12086,%r12087}, {%r3921,%r3922,%r3923,%r3924}, %rd1, %p15, 1, 1; 2026-02-21T09:20:56.0641298Z // end inline asm 2026-02-21T09:20:56.0641357Z // begin inline asm 2026-02-21T09:20:56.0642811Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12024,%r12025,%r12026,%r12027,%r12028,%r12029,%r12030,%r12031,%r12032,%r12033,%r12034,%r12035,%r12036,%r12037,%r12038,%r12039,%r12040,%r12041,%r12042,%r12043,%r12044,%r12045,%r12046,%r12047,%r12048,%r12049,%r12050,%r12051,%r12052,%r12053,%r12054,%r12055,%r12056,%r12057,%r12058,%r12059,%r12060,%r12061,%r12062,%r12063,%r12064,%r12065,%r12066,%r12067,%r12068,%r12069,%r12070,%r12071,%r12072,%r12073,%r12074,%r12075,%r12076,%r12077,%r12078,%r12079,%r12080,%r12081,%r12082,%r12083,%r12084,%r12085,%r12086,%r12087}, {%r4053,%r4054,%r4055,%r4056}, %rd2, %p15, 1, 1; 2026-02-21T09:20:56.0642875Z // end inline asm 2026-02-21T09:20:56.0642939Z // begin inline asm 2026-02-21T09:20:56.0644383Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12088,%r12089,%r12090,%r12091,%r12092,%r12093,%r12094,%r12095,%r12096,%r12097,%r12098,%r12099,%r12100,%r12101,%r12102,%r12103,%r12104,%r12105,%r12106,%r12107,%r12108,%r12109,%r12110,%r12111,%r12112,%r12113,%r12114,%r12115,%r12116,%r12117,%r12118,%r12119,%r12120,%r12121,%r12122,%r12123,%r12124,%r12125,%r12126,%r12127,%r12128,%r12129,%r12130,%r12131,%r12132,%r12133,%r12134,%r12135,%r12136,%r12137,%r12138,%r12139,%r12140,%r12141,%r12142,%r12143,%r12144,%r12145,%r12146,%r12147,%r12148,%r12149,%r12150,%r12151}, 
{%r4185,%r4186,%r4187,%r4188}, %rd1, %p15, 1, 1; 2026-02-21T09:20:56.0644549Z // end inline asm 2026-02-21T09:20:56.0644618Z // begin inline asm 2026-02-21T09:20:56.0646134Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12088,%r12089,%r12090,%r12091,%r12092,%r12093,%r12094,%r12095,%r12096,%r12097,%r12098,%r12099,%r12100,%r12101,%r12102,%r12103,%r12104,%r12105,%r12106,%r12107,%r12108,%r12109,%r12110,%r12111,%r12112,%r12113,%r12114,%r12115,%r12116,%r12117,%r12118,%r12119,%r12120,%r12121,%r12122,%r12123,%r12124,%r12125,%r12126,%r12127,%r12128,%r12129,%r12130,%r12131,%r12132,%r12133,%r12134,%r12135,%r12136,%r12137,%r12138,%r12139,%r12140,%r12141,%r12142,%r12143,%r12144,%r12145,%r12146,%r12147,%r12148,%r12149,%r12150,%r12151}, {%r4317,%r4318,%r4319,%r4320}, %rd2, %p15, 1, 1; 2026-02-21T09:20:56.0646202Z // end inline asm 2026-02-21T09:20:56.0646281Z wgmma.commit_group.sync.aligned; 2026-02-21T09:20:56.0646341Z mov.b32 %r5240, 0; 2026-02-21T09:20:56.0646405Z mov.b32 %r4449, %r1575; 2026-02-21T09:20:56.0646593Z mov.b32 %r4450, %r5240; 2026-02-21T09:20:56.0646661Z mov.b32 %r4451, %r5240; 2026-02-21T09:20:56.0646723Z // begin inline asm 2026-02-21T09:20:56.0649274Z // wait for regs: 
%r12024,%r12025,%r12026,%r12027,%r12028,%r12029,%r12030,%r12031,%r12032,%r12033,%r12034,%r12035,%r12036,%r12037,%r12038,%r12039,%r12040,%r12041,%r12042,%r12043,%r12044,%r12045,%r12046,%r12047,%r12048,%r12049,%r12050,%r12051,%r12052,%r12053,%r12054,%r12055,%r12056,%r12057,%r12058,%r12059,%r12060,%r12061,%r12062,%r12063,%r12064,%r12065,%r12066,%r12067,%r12068,%r12069,%r12070,%r12071,%r12072,%r12073,%r12074,%r12075,%r12076,%r12077,%r12078,%r12079,%r12080,%r12081,%r12082,%r12083,%r12084,%r12085,%r12086,%r12087,%r12088,%r12089,%r12090,%r12091,%r12092,%r12093,%r12094,%r12095,%r12096,%r12097,%r12098,%r12099,%r12100,%r12101,%r12102,%r12103,%r12104,%r12105,%r12106,%r12107,%r12108,%r12109,%r12110,%r12111,%r12112,%r12113,%r12114,%r12115,%r12116,%r12117,%r12118,%r12119,%r12120,%r12121,%r12122,%r12123,%r12124,%r12125,%r12126,%r12127,%r12128,%r12129,%r12130,%r12131,%r12132,%r12133,%r12134,%r12135,%r12136,%r12137,%r12138,%r12139,%r12140,%r12141,%r12142,%r12143,%r12144,%r12145,%r12146,%r12147,%r12148,%r12149,%r12150,%r12151,%r4449,%r4450,%r4451 2026-02-21T09:20:56.0649359Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:20:56.0649424Z // end inline asm 2026-02-21T09:20:56.0649484Z $L__tmp6: 2026-02-21T09:20:56.0649700Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0649775Z add.s32 %r5414, %r3368, %r5394; 2026-02-21T09:20:56.0649981Z .loc 1 55 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:55:32 2026-02-21T09:20:56.0650048Z add.s32 %r5415, %r5414, %r97; 2026-02-21T09:20:56.0650117Z ld.shared.b16 %rs169, [%r5415]; 2026-02-21T09:20:56.0650191Z ld.shared.b16 %rs170, [%r5415+256]; 2026-02-21T09:20:56.0650258Z ld.shared.b16 %rs171, [%r5415+16]; 2026-02-21T09:20:56.0650325Z ld.shared.b16 %rs172, [%r5415+272]; 2026-02-21T09:20:56.0650396Z ld.shared.b16 %rs173, [%r5415+4096]; 2026-02-21T09:20:56.0650466Z ld.shared.b16 %rs174, [%r5415+4352]; 2026-02-21T09:20:56.0650533Z ld.shared.b16 %rs175, [%r5415+4112]; 
2026-02-21T09:20:56.0650602Z ld.shared.b16 %rs176, [%r5415+4368]; 2026-02-21T09:20:56.0650666Z add.s32 %r5416, %r5414, %r98; 2026-02-21T09:20:56.0650733Z ld.shared.b16 %rs177, [%r5416]; 2026-02-21T09:20:56.0650801Z ld.shared.b16 %rs178, [%r5416+256]; 2026-02-21T09:20:56.0650894Z ld.shared.b16 %rs179, [%r5416+16]; 2026-02-21T09:20:56.0650963Z ld.shared.b16 %rs180, [%r5416+272]; 2026-02-21T09:20:56.0651033Z ld.shared.b16 %rs181, [%r5416+4096]; 2026-02-21T09:20:56.0651102Z ld.shared.b16 %rs182, [%r5416+4352]; 2026-02-21T09:20:56.0651167Z ld.shared.b16 %rs183, [%r5416+4112]; 2026-02-21T09:20:56.0651316Z ld.shared.b16 %rs184, [%r5416+4368]; 2026-02-21T09:20:56.0651380Z cvt.f32.bf16 %r4711, %rs169; 2026-02-21T09:20:56.0651443Z cvt.f32.bf16 %r4712, %rs170; 2026-02-21T09:20:56.0651504Z cvt.f32.bf16 %r4713, %rs177; 2026-02-21T09:20:56.0651632Z cvt.f32.bf16 %r4714, %rs178; 2026-02-21T09:20:56.0651694Z cvt.f32.bf16 %r4843, %rs171; 2026-02-21T09:20:56.0651757Z cvt.f32.bf16 %r4844, %rs172; 2026-02-21T09:20:56.0651816Z cvt.f32.bf16 %r4845, %rs179; 2026-02-21T09:20:56.0651882Z cvt.f32.bf16 %r4846, %rs180; 2026-02-21T09:20:56.0651946Z cvt.f32.bf16 %r4975, %rs173; 2026-02-21T09:20:56.0652011Z cvt.f32.bf16 %r4976, %rs174; 2026-02-21T09:20:56.0652073Z cvt.f32.bf16 %r4977, %rs181; 2026-02-21T09:20:56.0652146Z cvt.f32.bf16 %r4978, %rs182; 2026-02-21T09:20:56.0652279Z cvt.f32.bf16 %r5107, %rs175; 2026-02-21T09:20:56.0652343Z cvt.f32.bf16 %r5108, %rs176; 2026-02-21T09:20:56.0652408Z cvt.f32.bf16 %r5109, %rs183; 2026-02-21T09:20:56.0652470Z cvt.f32.bf16 %r5110, %rs184; 2026-02-21T09:20:56.0652683Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0652746Z add.s32 %r5417, %r5400, 95232; 2026-02-21T09:20:56.0652947Z .loc 1 70 45 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:70:45 2026-02-21T09:20:56.0653014Z add.s32 %r5418, %r5417, %r11875; 2026-02-21T09:20:56.0653081Z ld.shared.b8 %rs185, [%r5418]; 2026-02-21T09:20:56.0653151Z 
ld.shared.b8 %rs186, [%r5418+128]; 2026-02-21T09:20:56.0653216Z ld.shared.b8 %rs187, [%r5418+256]; 2026-02-21T09:20:56.0653337Z ld.shared.b8 %rs188, [%r5418+384]; 2026-02-21T09:20:56.0653414Z ld.shared.b8 %rs189, [%r5418+512]; 2026-02-21T09:20:56.0653480Z ld.shared.b8 %rs190, [%r5418+640]; 2026-02-21T09:20:56.0653545Z ld.shared.b8 %rs191, [%r5418+768]; 2026-02-21T09:20:56.0653607Z add.s32 %r5419, %r5417, %r11876; 2026-02-21T09:20:56.0653674Z ld.shared.b8 %rs192, [%r5419]; 2026-02-21T09:20:56.0653876Z .loc 1 60 28 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:60:28 2026-02-21T09:20:56.0653943Z shl.b16 %rs193, %rs185, 4; 2026-02-21T09:20:56.0654007Z shl.b16 %rs194, %rs186, 4; 2026-02-21T09:20:56.0654068Z shl.b16 %rs195, %rs187, 4; 2026-02-21T09:20:56.0654130Z shl.b16 %rs196, %rs188, 4; 2026-02-21T09:20:56.0654193Z shl.b16 %rs197, %rs189, 4; 2026-02-21T09:20:56.0654256Z shl.b16 %rs198, %rs190, 4; 2026-02-21T09:20:56.0654318Z shl.b16 %rs199, %rs191, 4; 2026-02-21T09:20:56.0654380Z shl.b16 %rs200, %rs192, 4; 2026-02-21T09:20:56.0654595Z .loc 1 75 58 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:75:58 2026-02-21T09:20:56.0654670Z selp.b16 %rs201, %rs193, %rs185, %p70; 2026-02-21T09:20:56.0654732Z cvt.s16.s8 %rs202, %rs201; 2026-02-21T09:20:56.0654799Z shr.s16 %rs203, %rs202, 4; 2026-02-21T09:20:56.0654870Z selp.b16 %rs204, %rs194, %rs186, %p70; 2026-02-21T09:20:56.0654930Z cvt.s16.s8 %rs205, %rs204; 2026-02-21T09:20:56.0654993Z shr.s16 %rs206, %rs205, 4; 2026-02-21T09:20:56.0655066Z selp.b16 %rs207, %rs195, %rs187, %p70; 2026-02-21T09:20:56.0655128Z cvt.s16.s8 %rs208, %rs207; 2026-02-21T09:20:56.0655197Z shr.s16 %rs209, %rs208, 4; 2026-02-21T09:20:56.0655266Z selp.b16 %rs210, %rs196, %rs188, %p70; 2026-02-21T09:20:56.0655329Z cvt.s16.s8 %rs211, %rs210; 2026-02-21T09:20:56.0655392Z shr.s16 %rs212, %rs211, 4; 2026-02-21T09:20:56.0655459Z selp.b16 %rs213, %rs197, %rs189, %p70; 2026-02-21T09:20:56.0655521Z cvt.s16.s8 %rs214, %rs213; 
2026-02-21T09:20:56.0655580Z shr.s16 %rs215, %rs214, 4; 2026-02-21T09:20:56.0655652Z selp.b16 %rs216, %rs198, %rs190, %p70; 2026-02-21T09:20:56.0655714Z cvt.s16.s8 %rs217, %rs216; 2026-02-21T09:20:56.0655773Z shr.s16 %rs218, %rs217, 4; 2026-02-21T09:20:56.0655845Z selp.b16 %rs219, %rs199, %rs191, %p70; 2026-02-21T09:20:56.0655908Z cvt.s16.s8 %rs220, %rs219; 2026-02-21T09:20:56.0655968Z shr.s16 %rs221, %rs220, 4; 2026-02-21T09:20:56.0656034Z selp.b16 %rs222, %rs200, %rs192, %p70; 2026-02-21T09:20:56.0656168Z cvt.s16.s8 %rs223, %rs222; 2026-02-21T09:20:56.0656229Z shr.s16 %rs224, %rs223, 4; 2026-02-21T09:20:56.0656432Z .loc 1 80 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:80:32 2026-02-21T09:20:56.0656676Z cvt.rn.f32.s16 %r5420, %rs203; 2026-02-21T09:20:56.0656742Z cvt.rn.f32.s16 %r5421, %rs206; 2026-02-21T09:20:56.0656805Z cvt.rn.f32.s16 %r5422, %rs209; 2026-02-21T09:20:56.0656870Z cvt.rn.f32.s16 %r5423, %rs212; 2026-02-21T09:20:56.0656932Z cvt.rn.f32.s16 %r5424, %rs215; 2026-02-21T09:20:56.0656996Z cvt.rn.f32.s16 %r5425, %rs218; 2026-02-21T09:20:56.0657057Z cvt.rn.f32.s16 %r5426, %rs221; 2026-02-21T09:20:56.0657122Z cvt.rn.f32.s16 %r5427, %rs224; 2026-02-21T09:20:56.0657178Z bar.sync 0; 2026-02-21T09:20:56.0657312Z st.shared.b32 [%r101], %r5420; 2026-02-21T09:20:56.0657388Z st.shared.b32 [%r101+8], %r5421; 2026-02-21T09:20:56.0657460Z st.shared.b32 [%r102], %r5422; 2026-02-21T09:20:56.0657527Z st.shared.b32 [%r102+8], %r5423; 2026-02-21T09:20:56.0657590Z st.shared.b32 [%r103], %r5424; 2026-02-21T09:20:56.0657657Z st.shared.b32 [%r103+8], %r5425; 2026-02-21T09:20:56.0657719Z st.shared.b32 [%r104], %r5426; 2026-02-21T09:20:56.0657784Z st.shared.b32 [%r104+8], %r5427; 2026-02-21T09:20:56.0657845Z $L__tmp7: 2026-02-21T09:20:56.0658122Z .loc 2 291 36 // standard.py:291:36 @[ czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:87:40 ] 2026-02-21T09:20:56.0658183Z // begin inline asm 2026-02-21T09:20:56.0658259Z fence.proxy.async.shared::cta; 
2026-02-21T09:20:56.0658385Z // end inline asm 2026-02-21T09:20:56.0658443Z bar.sync 0; 2026-02-21T09:20:56.0658526Z wgmma.fence.sync.aligned; 2026-02-21T09:20:56.0658591Z // begin inline asm 2026-02-21T09:20:56.0660040Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12024,%r12025,%r12026,%r12027,%r12028,%r12029,%r12030,%r12031,%r12032,%r12033,%r12034,%r12035,%r12036,%r12037,%r12038,%r12039,%r12040,%r12041,%r12042,%r12043,%r12044,%r12045,%r12046,%r12047,%r12048,%r12049,%r12050,%r12051,%r12052,%r12053,%r12054,%r12055,%r12056,%r12057,%r12058,%r12059,%r12060,%r12061,%r12062,%r12063,%r12064,%r12065,%r12066,%r12067,%r12068,%r12069,%r12070,%r12071,%r12072,%r12073,%r12074,%r12075,%r12076,%r12077,%r12078,%r12079,%r12080,%r12081,%r12082,%r12083,%r12084,%r12085,%r12086,%r12087}, {%r4711,%r4712,%r4713,%r4714}, %rd1, %p15, 1, 1; 2026-02-21T09:20:56.0660101Z // end inline asm 2026-02-21T09:20:56.0660161Z // begin inline asm 2026-02-21T09:20:56.0661623Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12024,%r12025,%r12026,%r12027,%r12028,%r12029,%r12030,%r12031,%r12032,%r12033,%r12034,%r12035,%r12036,%r12037,%r12038,%r12039,%r12040,%r12041,%r12042,%r12043,%r12044,%r12045,%r12046,%r12047,%r12048,%r12049,%r12050,%r12051,%r12052,%r12053,%r12054,%r12055,%r12056,%r12057,%r12058,%r12059,%r12060,%r12061,%r12062,%r12063,%r12064,%r12065,%r12066,%r12067,%r12068,%r12069,%r12070,%r12071,%r12072,%r12073,%r12074,%r12075,%r12076,%r12077,%r12078,%r12079,%r12080,%r12081,%r12082,%r12083,%r12084,%r12085,%r12086,%r12087}, {%r4843,%r4844,%r4845,%r4846}, %rd2, %p15, 1, 1; 2026-02-21T09:20:56.0661685Z // end inline asm 2026-02-21T09:20:56.0661744Z // begin inline asm 2026-02-21T09:20:56.0663195Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12088,%r12089,%r12090,%r12091,%r12092,%r12093,%r12094,%r12095,%r12096,%r12097,%r12098,%r12099,%r12100,%r12101,%r12102,%r12103,%r12104,%r12105,%r12106,%r12107,%r12108,%r12109,%r12110,%r12111,%r12112,%r12113,%r12114,%r12115,%r12116,%r12117,%r12118,%r12119,%r12120,%r12121,%r12122,%r12123,%r12124,%r12125,%r12126,%r12127,%r12128,%r12129,%r12130,%r12131,%r12132,%r12133,%r12134,%r12135,%r12136,%r12137,%r12138,%r12139,%r12140,%r12141,%r12142,%r12143,%r12144,%r12145,%r12146,%r12147,%r12148,%r12149,%r12150,%r12151}, {%r4975,%r4976,%r4977,%r4978}, %rd1, %p15, 1, 1; 2026-02-21T09:20:56.0663258Z // end inline asm 2026-02-21T09:20:56.0663317Z // begin inline asm 2026-02-21T09:20:56.0664761Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12088,%r12089,%r12090,%r12091,%r12092,%r12093,%r12094,%r12095,%r12096,%r12097,%r12098,%r12099,%r12100,%r12101,%r12102,%r12103,%r12104,%r12105,%r12106,%r12107,%r12108,%r12109,%r12110,%r12111,%r12112,%r12113,%r12114,%r12115,%r12116,%r12117,%r12118,%r12119,%r12120,%r12121,%r12122,%r12123,%r12124,%r12125,%r12126,%r12127,%r12128,%r12129,%r12130,%r12131,%r12132,%r12133,%r12134,%r12135,%r12136,%r12137,%r12138,%r12139,%r12140,%r12141,%r12142,%r12143,%r12144,%r12145,%r12146,%r12147,%r12148,%r12149,%r12150,%r12151}, {%r5107,%r5108,%r5109,%r5110}, %rd2, %p15, 1, 1; 2026-02-21T09:20:56.0664958Z // end inline asm 2026-02-21T09:20:56.0665038Z wgmma.commit_group.sync.aligned; 2026-02-21T09:20:56.0665102Z mov.b32 %r5239, %r1575; 2026-02-21T09:20:56.0665162Z mov.b32 %r5241, %r5240; 2026-02-21T09:20:56.0665267Z // begin inline asm 2026-02-21T09:20:56.0667918Z // wait for regs: 
%r12024,%r12025,%r12026,%r12027,%r12028,%r12029,%r12030,%r12031,%r12032,%r12033,%r12034,%r12035,%r12036,%r12037,%r12038,%r12039,%r12040,%r12041,%r12042,%r12043,%r12044,%r12045,%r12046,%r12047,%r12048,%r12049,%r12050,%r12051,%r12052,%r12053,%r12054,%r12055,%r12056,%r12057,%r12058,%r12059,%r12060,%r12061,%r12062,%r12063,%r12064,%r12065,%r12066,%r12067,%r12068,%r12069,%r12070,%r12071,%r12072,%r12073,%r12074,%r12075,%r12076,%r12077,%r12078,%r12079,%r12080,%r12081,%r12082,%r12083,%r12084,%r12085,%r12086,%r12087,%r12088,%r12089,%r12090,%r12091,%r12092,%r12093,%r12094,%r12095,%r12096,%r12097,%r12098,%r12099,%r12100,%r12101,%r12102,%r12103,%r12104,%r12105,%r12106,%r12107,%r12108,%r12109,%r12110,%r12111,%r12112,%r12113,%r12114,%r12115,%r12116,%r12117,%r12118,%r12119,%r12120,%r12121,%r12122,%r12123,%r12124,%r12125,%r12126,%r12127,%r12128,%r12129,%r12130,%r12131,%r12132,%r12133,%r12134,%r12135,%r12136,%r12137,%r12138,%r12139,%r12140,%r12141,%r12142,%r12143,%r12144,%r12145,%r12146,%r12147,%r12148,%r12149,%r12150,%r12151,%r5239,%r5240,%r5241 2026-02-21T09:20:56.0668008Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:20:56.0668068Z // end inline asm 2026-02-21T09:20:56.0668126Z $L__tmp8: 2026-02-21T09:20:56.0668400Z .loc 1 43 78 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:43:78 2026-02-21T09:20:56.0668467Z add.s32 %r5428, %r12023, 1; 2026-02-21T09:20:56.0668539Z setp.gt.s32 %p26, %r5428, 4; 2026-02-21T09:20:56.0668606Z selp.b32 %r12023, 0, %r5428, %p26; 2026-02-21T09:20:56.0668813Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0668879Z add.s32 %r5429, %r12020, -16; 2026-02-21T09:20:56.0668945Z add.s64 %rd260, %rd636, %rd17; 2026-02-21T09:20:56.0669009Z add.s64 %rd250, %rd260, 320; 2026-02-21T09:20:56.0669072Z add.s64 %rd261, %rd636, %rd16; 2026-02-21T09:20:56.0669137Z add.s64 %rd251, %rd261, 320; 2026-02-21T09:20:56.0669199Z add.s64 %rd262, %rd636, %rd15; 2026-02-21T09:20:56.0669262Z add.s64 %rd252, %rd262, 320; 
2026-02-21T09:20:56.0669340Z mad.wide.s32 %rd253, %r5429, 2, %rd44; 2026-02-21T09:20:56.0669546Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0669607Z shl.b32 %r5430, %r12023, 13; 2026-02-21T09:20:56.0669674Z add.s32 %r5431, %r11869, %r5430; 2026-02-21T09:20:56.0669734Z add.s32 %r5373, %r5431, %r36; 2026-02-21T09:20:56.0669800Z selp.b32 %r5374, 8, 0, %p24; 2026-02-21T09:20:56.0669864Z // begin inline asm 2026-02-21T09:20:56.0670023Z cp.async.ca.shared.global [ %r5373 + 0 ], [ %rd250 + 0 ], 0x8, %r5374; 2026-02-21T09:20:56.0670082Z // end inline asm 2026-02-21T09:20:56.0670145Z add.s32 %r5375, %r5373, 2048; 2026-02-21T09:20:56.0670207Z // begin inline asm 2026-02-21T09:20:56.0670346Z cp.async.ca.shared.global [ %r5375 + 0 ], [ %rd251 + 0 ], 0x8, %r5374; 2026-02-21T09:20:56.0670402Z // end inline asm 2026-02-21T09:20:56.0670465Z add.s32 %r5377, %r5373, 4096; 2026-02-21T09:20:56.0670528Z // begin inline asm 2026-02-21T09:20:56.0670660Z cp.async.ca.shared.global [ %r5377 + 0 ], [ %rd252 + 0 ], 0x8, %r5374; 2026-02-21T09:20:56.0670808Z // end inline asm 2026-02-21T09:20:56.0670872Z add.s32 %r5379, %r5373, 6144; 2026-02-21T09:20:56.0670932Z // begin inline asm 2026-02-21T09:20:56.0671070Z cp.async.ca.shared.global [ %r5379 + 0 ], [ %rd253 + 0 ], 0x8, %r5374; 2026-02-21T09:20:56.0671216Z // end inline asm 2026-02-21T09:20:56.0671288Z cp.async.commit_group; 2026-02-21T09:20:56.0671496Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0671562Z add.s32 %r5432, %r12021, -65536; 2026-02-21T09:20:56.0671634Z cvt.s64.s32 %rd263, %r5432; 2026-02-21T09:20:56.0671699Z add.s64 %rd254, %rd45, %rd263; 2026-02-21T09:20:56.0671902Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0672033Z shl.b32 %r5433, %r12023, 10; 2026-02-21T09:20:56.0672096Z add.s32 %r5381, %r42, %r5433; 2026-02-21T09:20:56.0672160Z selp.b32 %r5382, 4, 0, %p24; 
2026-02-21T09:20:56.0672223Z // begin inline asm 2026-02-21T09:20:56.0672360Z cp.async.ca.shared.global [ %r5381 + 0 ], [ %rd254 + 0 ], 0x4, %r5382; 2026-02-21T09:20:56.0672416Z // end inline asm 2026-02-21T09:20:56.0672483Z cp.async.commit_group; 2026-02-21T09:20:56.0672687Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0672751Z add.s64 %rd255, %rd260, 352; 2026-02-21T09:20:56.0672813Z add.s64 %rd256, %rd261, 352; 2026-02-21T09:20:56.0672877Z add.s64 %rd257, %rd262, 352; 2026-02-21T09:20:56.0673011Z mad.wide.s32 %rd258, %r12020, 2, %rd44; 2026-02-21T09:20:56.0673219Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0673284Z add.s32 %r5434, %r3368, %r5430; 2026-02-21T09:20:56.0673354Z add.s32 %r5383, %r5434, %r36; 2026-02-21T09:20:56.0673418Z // begin inline asm 2026-02-21T09:20:56.0673561Z cp.async.ca.shared.global [ %r5383 + 0 ], [ %rd255 + 0 ], 0x8, %r5374; 2026-02-21T09:20:56.0673624Z // end inline asm 2026-02-21T09:20:56.0673685Z add.s32 %r5385, %r5383, 2048; 2026-02-21T09:20:56.0673744Z // begin inline asm 2026-02-21T09:20:56.0673882Z cp.async.ca.shared.global [ %r5385 + 0 ], [ %rd256 + 0 ], 0x8, %r5374; 2026-02-21T09:20:56.0673945Z // end inline asm 2026-02-21T09:20:56.0674006Z add.s32 %r5387, %r5383, 4096; 2026-02-21T09:20:56.0674065Z // begin inline asm 2026-02-21T09:20:56.0674201Z cp.async.ca.shared.global [ %r5387 + 0 ], [ %rd257 + 0 ], 0x8, %r5374; 2026-02-21T09:20:56.0674258Z // end inline asm 2026-02-21T09:20:56.0674320Z add.s32 %r5389, %r5383, 6144; 2026-02-21T09:20:56.0674382Z // begin inline asm 2026-02-21T09:20:56.0674515Z cp.async.ca.shared.global [ %r5389 + 0 ], [ %rd258 + 0 ], 0x8, %r5374; 2026-02-21T09:20:56.0674572Z // end inline asm 2026-02-21T09:20:56.0674641Z cp.async.commit_group; 2026-02-21T09:20:56.0674853Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0674919Z cvt.s64.s32 %rd264, 
%r12021; 2026-02-21T09:20:56.0674984Z add.s64 %rd259, %rd45, %rd264; 2026-02-21T09:20:56.0675188Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0675251Z add.s32 %r5391, %r48, %r5433; 2026-02-21T09:20:56.0675325Z // begin inline asm 2026-02-21T09:20:56.0675466Z cp.async.ca.shared.global [ %r5391 + 0 ], [ %rd259 + 0 ], 0x4, %r5382; 2026-02-21T09:20:56.0675522Z // end inline asm 2026-02-21T09:20:56.0675587Z cp.async.commit_group; 2026-02-21T09:20:56.0675790Z .loc 1 43 78 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:43:78 2026-02-21T09:20:56.0675860Z add.s32 %r12021, %r12021, 131072; 2026-02-21T09:20:56.0675924Z add.s64 %rd636, %rd636, 64; 2026-02-21T09:20:56.0675986Z add.s32 %r12020, %r12020, 32; 2026-02-21T09:20:56.0676057Z setp.lt.u64 %p27, %rd637, 496; 2026-02-21T09:20:56.0676118Z @%p27 bra $L__BB0_5; 2026-02-21T09:20:56.0676231Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:20:56.0676632Z .loc 1 34 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:34:32 2026-02-21T09:20:56.0676701Z or.b32 %r5682, %r479, %r9; 2026-02-21T09:20:56.0676978Z .loc 1 36 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:36:32 2026-02-21T09:20:56.0677043Z or.b32 %r5683, %r480, %r15; 2026-02-21T09:20:56.0677105Z or.b32 %r5684, %r480, %r16; 2026-02-21T09:20:56.0677164Z or.b32 %r5685, %r480, %r17; 2026-02-21T09:20:56.0677224Z or.b32 %r5686, %r480, %r18; 2026-02-21T09:20:56.0677287Z or.b32 %r5687, %r480, %r19; 2026-02-21T09:20:56.0677358Z or.b32 %r5688, %r480, %r20; 2026-02-21T09:20:56.0677418Z or.b32 %r5689, %r480, %r21; 2026-02-21T09:20:56.0677543Z or.b32 %r5690, %r480, %r22; 2026-02-21T09:20:56.0677609Z or.b32 %r5691, %r480, %r23; 2026-02-21T09:20:56.0677670Z or.b32 %r5692, %r480, %r24; 2026-02-21T09:20:56.0677730Z or.b32 %r5693, %r480, %r25; 2026-02-21T09:20:56.0677795Z or.b32 %r5694, %r480, %r26; 2026-02-21T09:20:56.0677854Z or.b32 %r5695, %r480, %r27; 2026-02-21T09:20:56.0677913Z or.b32 
%r5696, %r480, %r28; 2026-02-21T09:20:56.0677972Z or.b32 %r5697, %r480, %r29; 2026-02-21T09:20:56.0678034Z or.b32 %r5698, %r480, %r30; 2026-02-21T09:20:56.0678238Z .loc 1 43 78 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:43:78 2026-02-21T09:20:56.0678306Z cp.async.wait_group 0; 2026-02-21T09:20:56.0678374Z bar.sync 0; 2026-02-21T09:20:56.0678647Z .loc 1 90 28 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:90:28 2026-02-21T09:20:56.0678743Z cvt.rn.bf16x2.f32 %r5699, %r12025, %r12024; 2026-02-21T09:20:56.0678829Z cvt.rn.bf16x2.f32 %r5700, %r12027, %r12026; 2026-02-21T09:20:56.0678908Z cvt.rn.bf16x2.f32 %r5701, %r12029, %r12028; 2026-02-21T09:20:56.0678984Z cvt.rn.bf16x2.f32 %r5702, %r12031, %r12030; 2026-02-21T09:20:56.0679065Z cvt.rn.bf16x2.f32 %r5703, %r12033, %r12032; 2026-02-21T09:20:56.0679143Z cvt.rn.bf16x2.f32 %r5704, %r12035, %r12034; 2026-02-21T09:20:56.0679218Z cvt.rn.bf16x2.f32 %r5705, %r12037, %r12036; 2026-02-21T09:20:56.0679292Z cvt.rn.bf16x2.f32 %r5706, %r12039, %r12038; 2026-02-21T09:20:56.0679369Z cvt.rn.bf16x2.f32 %r5707, %r12041, %r12040; 2026-02-21T09:20:56.0679445Z cvt.rn.bf16x2.f32 %r5708, %r12043, %r12042; 2026-02-21T09:20:56.0679519Z cvt.rn.bf16x2.f32 %r5709, %r12045, %r12044; 2026-02-21T09:20:56.0679596Z cvt.rn.bf16x2.f32 %r5710, %r12047, %r12046; 2026-02-21T09:20:56.0679675Z cvt.rn.bf16x2.f32 %r5711, %r12049, %r12048; 2026-02-21T09:20:56.0679749Z cvt.rn.bf16x2.f32 %r5712, %r12051, %r12050; 2026-02-21T09:20:56.0679825Z cvt.rn.bf16x2.f32 %r5713, %r12053, %r12052; 2026-02-21T09:20:56.0679903Z cvt.rn.bf16x2.f32 %r5714, %r12055, %r12054; 2026-02-21T09:20:56.0679977Z cvt.rn.bf16x2.f32 %r5715, %r12057, %r12056; 2026-02-21T09:20:56.0680051Z cvt.rn.bf16x2.f32 %r5716, %r12059, %r12058; 2026-02-21T09:20:56.0680129Z cvt.rn.bf16x2.f32 %r5717, %r12061, %r12060; 2026-02-21T09:20:56.0680207Z cvt.rn.bf16x2.f32 %r5718, %r12063, %r12062; 2026-02-21T09:20:56.0680284Z cvt.rn.bf16x2.f32 %r5719, %r12065, %r12064; 
2026-02-21T09:20:56.0680364Z cvt.rn.bf16x2.f32 %r5720, %r12067, %r12066; 2026-02-21T09:20:56.0680441Z cvt.rn.bf16x2.f32 %r5721, %r12069, %r12068; 2026-02-21T09:20:56.0680516Z cvt.rn.bf16x2.f32 %r5722, %r12071, %r12070; 2026-02-21T09:20:56.0680590Z cvt.rn.bf16x2.f32 %r5723, %r12073, %r12072; 2026-02-21T09:20:56.0680670Z cvt.rn.bf16x2.f32 %r5724, %r12075, %r12074; 2026-02-21T09:20:56.0680744Z cvt.rn.bf16x2.f32 %r5725, %r12077, %r12076; 2026-02-21T09:20:56.0680818Z cvt.rn.bf16x2.f32 %r5726, %r12079, %r12078; 2026-02-21T09:20:56.0680894Z cvt.rn.bf16x2.f32 %r5727, %r12081, %r12080; 2026-02-21T09:20:56.0680977Z cvt.rn.bf16x2.f32 %r5728, %r12083, %r12082; 2026-02-21T09:20:56.0681053Z cvt.rn.bf16x2.f32 %r5729, %r12085, %r12084; 2026-02-21T09:20:56.0681131Z cvt.rn.bf16x2.f32 %r5730, %r12087, %r12086; 2026-02-21T09:20:56.0681294Z cvt.rn.bf16x2.f32 %r5731, %r12089, %r12088; 2026-02-21T09:20:56.0681370Z cvt.rn.bf16x2.f32 %r5732, %r12091, %r12090; 2026-02-21T09:20:56.0681447Z cvt.rn.bf16x2.f32 %r5733, %r12093, %r12092; 2026-02-21T09:20:56.0681524Z cvt.rn.bf16x2.f32 %r5734, %r12095, %r12094; 2026-02-21T09:20:56.0681650Z cvt.rn.bf16x2.f32 %r5735, %r12097, %r12096; 2026-02-21T09:20:56.0681724Z cvt.rn.bf16x2.f32 %r5736, %r12099, %r12098; 2026-02-21T09:20:56.0681801Z cvt.rn.bf16x2.f32 %r5737, %r12101, %r12100; 2026-02-21T09:20:56.0681879Z cvt.rn.bf16x2.f32 %r5738, %r12103, %r12102; 2026-02-21T09:20:56.0681955Z cvt.rn.bf16x2.f32 %r5739, %r12105, %r12104; 2026-02-21T09:20:56.0682034Z cvt.rn.bf16x2.f32 %r5740, %r12107, %r12106; 2026-02-21T09:20:56.0682110Z cvt.rn.bf16x2.f32 %r5741, %r12109, %r12108; 2026-02-21T09:20:56.0682239Z cvt.rn.bf16x2.f32 %r5742, %r12111, %r12110; 2026-02-21T09:20:56.0682321Z cvt.rn.bf16x2.f32 %r5743, %r12113, %r12112; 2026-02-21T09:20:56.0682401Z cvt.rn.bf16x2.f32 %r5744, %r12115, %r12114; 2026-02-21T09:20:56.0682479Z cvt.rn.bf16x2.f32 %r5745, %r12117, %r12116; 2026-02-21T09:20:56.0682556Z cvt.rn.bf16x2.f32 %r5746, %r12119, %r12118; 2026-02-21T09:20:56.0682631Z 
cvt.rn.bf16x2.f32 %r5747, %r12121, %r12120; 2026-02-21T09:20:56.0682704Z cvt.rn.bf16x2.f32 %r5748, %r12123, %r12122; 2026-02-21T09:20:56.0682781Z cvt.rn.bf16x2.f32 %r5749, %r12125, %r12124; 2026-02-21T09:20:56.0682858Z cvt.rn.bf16x2.f32 %r5750, %r12127, %r12126; 2026-02-21T09:20:56.0682931Z cvt.rn.bf16x2.f32 %r5751, %r12129, %r12128; 2026-02-21T09:20:56.0683055Z cvt.rn.bf16x2.f32 %r5752, %r12131, %r12130; 2026-02-21T09:20:56.0683132Z cvt.rn.bf16x2.f32 %r5753, %r12133, %r12132; 2026-02-21T09:20:56.0683209Z cvt.rn.bf16x2.f32 %r5754, %r12135, %r12134; 2026-02-21T09:20:56.0683285Z cvt.rn.bf16x2.f32 %r5755, %r12137, %r12136; 2026-02-21T09:20:56.0683359Z cvt.rn.bf16x2.f32 %r5756, %r12139, %r12138; 2026-02-21T09:20:56.0683437Z cvt.rn.bf16x2.f32 %r5757, %r12141, %r12140; 2026-02-21T09:20:56.0683513Z cvt.rn.bf16x2.f32 %r5758, %r12143, %r12142; 2026-02-21T09:20:56.0683588Z cvt.rn.bf16x2.f32 %r5759, %r12145, %r12144; 2026-02-21T09:20:56.0683665Z cvt.rn.bf16x2.f32 %r5760, %r12147, %r12146; 2026-02-21T09:20:56.0683740Z cvt.rn.bf16x2.f32 %r5761, %r12149, %r12148; 2026-02-21T09:20:56.0683816Z cvt.rn.bf16x2.f32 %r5762, %r12151, %r12150; 2026-02-21T09:20:56.0684031Z .loc 1 91 43 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:91:43 2026-02-21T09:20:56.0684099Z shl.b32 %r5763, %r5683, 13; 2026-02-21T09:20:56.0684160Z shl.b32 %r5764, %r5684, 13; 2026-02-21T09:20:56.0684220Z shl.b32 %r5765, %r5685, 13; 2026-02-21T09:20:56.0684282Z shl.b32 %r5766, %r5686, 13; 2026-02-21T09:20:56.0684342Z shl.b32 %r5767, %r5687, 13; 2026-02-21T09:20:56.0684403Z shl.b32 %r5768, %r5688, 13; 2026-02-21T09:20:56.0684463Z shl.b32 %r5769, %r5689, 13; 2026-02-21T09:20:56.0684524Z shl.b32 %r5770, %r5690, 13; 2026-02-21T09:20:56.0684582Z shl.b32 %r5771, %r5691, 13; 2026-02-21T09:20:56.0684643Z shl.b32 %r5772, %r5692, 13; 2026-02-21T09:20:56.0684706Z shl.b32 %r5773, %r5693, 13; 2026-02-21T09:20:56.0684764Z shl.b32 %r5774, %r5694, 13; 2026-02-21T09:20:56.0684824Z shl.b32 %r5775, %r5695, 13; 
2026-02-21T09:20:56.0684887Z shl.b32 %r5776, %r5696, 13; 2026-02-21T09:20:56.0684958Z shl.b32 %r5777, %r5697, 13; 2026-02-21T09:20:56.0685021Z shl.b32 %r5778, %r5698, 13; 2026-02-21T09:20:56.0685231Z .loc 1 91 50 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:91:50 2026-02-21T09:20:56.0685299Z add.s32 %r5779, %r5763, %r5682; 2026-02-21T09:20:56.0685363Z add.s32 %r5780, %r5764, %r5682; 2026-02-21T09:20:56.0685425Z add.s32 %r5781, %r5765, %r5682; 2026-02-21T09:20:56.0685485Z add.s32 %r5782, %r5766, %r5682; 2026-02-21T09:20:56.0685546Z add.s32 %r5783, %r5767, %r5682; 2026-02-21T09:20:56.0685606Z add.s32 %r5784, %r5768, %r5682; 2026-02-21T09:20:56.0685666Z add.s32 %r5785, %r5769, %r5682; 2026-02-21T09:20:56.0685730Z add.s32 %r5786, %r5770, %r5682; 2026-02-21T09:20:56.0685861Z add.s32 %r5787, %r5771, %r5682; 2026-02-21T09:20:56.0685923Z add.s32 %r5788, %r5772, %r5682; 2026-02-21T09:20:56.0685986Z add.s32 %r5789, %r5773, %r5682; 2026-02-21T09:20:56.0686046Z add.s32 %r5790, %r5774, %r5682; 2026-02-21T09:20:56.0686154Z add.s32 %r5791, %r5775, %r5682; 2026-02-21T09:20:56.0686219Z add.s32 %r5792, %r5776, %r5682; 2026-02-21T09:20:56.0686280Z add.s32 %r5793, %r5777, %r5682; 2026-02-21T09:20:56.0686340Z add.s32 %r5794, %r5778, %r5682; 2026-02-21T09:20:56.0686666Z .loc 1 91 22 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:91:22 2026-02-21T09:20:56.0686746Z mad.wide.s32 %rd265, %r5779, 2, %rd46; 2026-02-21T09:20:56.0686817Z mad.wide.s32 %rd266, %r5780, 2, %rd46; 2026-02-21T09:20:56.0686964Z mad.wide.s32 %rd267, %r5781, 2, %rd46; 2026-02-21T09:20:56.0687041Z mad.wide.s32 %rd268, %r5782, 2, %rd46; 2026-02-21T09:20:56.0687107Z mad.wide.s32 %rd269, %r5783, 2, %rd46; 2026-02-21T09:20:56.0687175Z mad.wide.s32 %rd270, %r5784, 2, %rd46; 2026-02-21T09:20:56.0687243Z mad.wide.s32 %rd271, %r5785, 2, %rd46; 2026-02-21T09:20:56.0687315Z mad.wide.s32 %rd272, %r5786, 2, %rd46; 2026-02-21T09:20:56.0687391Z mad.wide.s32 %rd273, %r5787, 2, %rd46; 
2026-02-21T09:20:56.0687463Z mad.wide.s32 %rd274, %r5788, 2, %rd46; 2026-02-21T09:20:56.0687534Z mad.wide.s32 %rd275, %r5789, 2, %rd46; 2026-02-21T09:20:56.0687600Z mad.wide.s32 %rd276, %r5790, 2, %rd46; 2026-02-21T09:20:56.0687666Z mad.wide.s32 %rd277, %r5791, 2, %rd46; 2026-02-21T09:20:56.0687735Z mad.wide.s32 %rd278, %r5792, 2, %rd46; 2026-02-21T09:20:56.0687873Z mad.wide.s32 %rd279, %r5793, 2, %rd46; 2026-02-21T09:20:56.0687948Z mad.wide.s32 %rd280, %r5794, 2, %rd46; 2026-02-21T09:20:56.0688154Z .loc 1 91 81 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:91:81 2026-02-21T09:20:56.0688282Z st.shared.v4.b32 [%r105], {%r5699, %r5701, %r5703, %r5705}; 2026-02-21T09:20:56.0688405Z st.shared.v4.b32 [%r105+512], {%r5700, %r5702, %r5704, %r5706}; 2026-02-21T09:20:56.0688514Z st.shared.v4.b32 [%r106], {%r5707, %r5709, %r5711, %r5713}; 2026-02-21T09:20:56.0688630Z st.shared.v4.b32 [%r106+512], {%r5708, %r5710, %r5712, %r5714}; 2026-02-21T09:20:56.0688735Z st.shared.v4.b32 [%r107], {%r5715, %r5717, %r5719, %r5721}; 2026-02-21T09:20:56.0688849Z st.shared.v4.b32 [%r107+512], {%r5716, %r5718, %r5720, %r5722}; 2026-02-21T09:20:56.0688955Z st.shared.v4.b32 [%r108], {%r5723, %r5725, %r5727, %r5729}; 2026-02-21T09:20:56.0689065Z st.shared.v4.b32 [%r108+512], {%r5724, %r5726, %r5728, %r5730}; 2026-02-21T09:20:56.0689125Z bar.sync 0; 2026-02-21T09:20:56.0689189Z // begin inline asm 2026-02-21T09:20:56.0689386Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5515, %r5516, %r5517, %r5518}, [%r3394]; 2026-02-21T09:20:56.0689448Z // end inline asm 2026-02-21T09:20:56.0689507Z // begin inline asm 2026-02-21T09:20:56.0689697Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5519, %r5520, %r5521, %r5522}, [%r3399]; 2026-02-21T09:20:56.0689756Z // end inline asm 2026-02-21T09:20:56.0689813Z // begin inline asm 2026-02-21T09:20:56.0689994Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5523, %r5524, %r5525, %r5526}, [%r3404]; 2026-02-21T09:20:56.0690051Z // end inline asm 
2026-02-21T09:20:56.0690111Z // begin inline asm 2026-02-21T09:20:56.0690291Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5527, %r5528, %r5529, %r5530}, [%r3409]; 2026-02-21T09:20:56.0690350Z // end inline asm 2026-02-21T09:20:56.0690408Z // begin inline asm 2026-02-21T09:20:56.0690585Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5531, %r5532, %r5533, %r5534}, [%r3414]; 2026-02-21T09:20:56.0690646Z // end inline asm 2026-02-21T09:20:56.0690704Z // begin inline asm 2026-02-21T09:20:56.0690890Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5535, %r5536, %r5537, %r5538}, [%r3419]; 2026-02-21T09:20:56.0690951Z // end inline asm 2026-02-21T09:20:56.0691009Z // begin inline asm 2026-02-21T09:20:56.0691187Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5539, %r5540, %r5541, %r5542}, [%r3424]; 2026-02-21T09:20:56.0691356Z // end inline asm 2026-02-21T09:20:56.0691423Z // begin inline asm 2026-02-21T09:20:56.0691599Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5543, %r5544, %r5545, %r5546}, [%r3429]; 2026-02-21T09:20:56.0691656Z // end inline asm 2026-02-21T09:20:56.0691775Z bar.sync 0; 2026-02-21T09:20:56.0691884Z st.shared.v4.b32 [%r105], {%r5731, %r5733, %r5735, %r5737}; 2026-02-21T09:20:56.0691997Z st.shared.v4.b32 [%r105+512], {%r5732, %r5734, %r5736, %r5738}; 2026-02-21T09:20:56.0692111Z st.shared.v4.b32 [%r106], {%r5739, %r5741, %r5743, %r5745}; 2026-02-21T09:20:56.0692232Z st.shared.v4.b32 [%r106+512], {%r5740, %r5742, %r5744, %r5746}; 2026-02-21T09:20:56.0692334Z st.shared.v4.b32 [%r107], {%r5747, %r5749, %r5751, %r5753}; 2026-02-21T09:20:56.0692499Z st.shared.v4.b32 [%r107+512], {%r5748, %r5750, %r5752, %r5754}; 2026-02-21T09:20:56.0692614Z st.shared.v4.b32 [%r108], {%r5755, %r5757, %r5759, %r5761}; 2026-02-21T09:20:56.0692723Z st.shared.v4.b32 [%r108+512], {%r5756, %r5758, %r5760, %r5762}; 2026-02-21T09:20:56.0692779Z bar.sync 0; 2026-02-21T09:20:56.0692841Z // begin inline asm 2026-02-21T09:20:56.0693020Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5547, %r5548, 
%r5549, %r5550}, [%r3394]; 2026-02-21T09:20:56.0693079Z // end inline asm 2026-02-21T09:20:56.0693141Z // begin inline asm 2026-02-21T09:20:56.0693319Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5551, %r5552, %r5553, %r5554}, [%r3399]; 2026-02-21T09:20:56.0693376Z // end inline asm 2026-02-21T09:20:56.0693433Z // begin inline asm 2026-02-21T09:20:56.0693665Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5555, %r5556, %r5557, %r5558}, [%r3404]; 2026-02-21T09:20:56.0693731Z // end inline asm 2026-02-21T09:20:56.0693789Z // begin inline asm 2026-02-21T09:20:56.0693975Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5559, %r5560, %r5561, %r5562}, [%r3409]; 2026-02-21T09:20:56.0694030Z // end inline asm 2026-02-21T09:20:56.0694088Z // begin inline asm 2026-02-21T09:20:56.0694266Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5563, %r5564, %r5565, %r5566}, [%r3414]; 2026-02-21T09:20:56.0694326Z // end inline asm 2026-02-21T09:20:56.0694384Z // begin inline asm 2026-02-21T09:20:56.0694560Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5567, %r5568, %r5569, %r5570}, [%r3419]; 2026-02-21T09:20:56.0694621Z // end inline asm 2026-02-21T09:20:56.0694680Z // begin inline asm 2026-02-21T09:20:56.0694854Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5571, %r5572, %r5573, %r5574}, [%r3424]; 2026-02-21T09:20:56.0694912Z // end inline asm 2026-02-21T09:20:56.0694971Z // begin inline asm 2026-02-21T09:20:56.0695145Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5575, %r5576, %r5577, %r5578}, [%r3429]; 2026-02-21T09:20:56.0695200Z // end inline asm 2026-02-21T09:20:56.0695264Z // begin inline asm 2026-02-21T09:20:56.0695391Z st.global.v4.b32 [ %rd265 + 0 ], { %r5515, %r5516, %r5517, %r5518 }; 2026-02-21T09:20:56.0695448Z // end inline asm 2026-02-21T09:20:56.0695508Z // begin inline asm 2026-02-21T09:20:56.0695627Z st.global.v4.b32 [ %rd266 + 0 ], { %r5519, %r5520, %r5521, %r5522 }; 2026-02-21T09:20:56.0695683Z // end inline asm 2026-02-21T09:20:56.0695741Z // begin inline asm 
2026-02-21T09:20:56.0695860Z st.global.v4.b32 [ %rd267 + 0 ], { %r5523, %r5524, %r5525, %r5526 }; 2026-02-21T09:20:56.0695918Z // end inline asm 2026-02-21T09:20:56.0695977Z // begin inline asm 2026-02-21T09:20:56.0696096Z st.global.v4.b32 [ %rd268 + 0 ], { %r5527, %r5528, %r5529, %r5530 }; 2026-02-21T09:20:56.0696153Z // end inline asm 2026-02-21T09:20:56.0696210Z // begin inline asm 2026-02-21T09:20:56.0696328Z st.global.v4.b32 [ %rd269 + 0 ], { %r5531, %r5532, %r5533, %r5534 }; 2026-02-21T09:20:56.0696386Z // end inline asm 2026-02-21T09:20:56.0696444Z // begin inline asm 2026-02-21T09:20:56.0696693Z st.global.v4.b32 [ %rd270 + 0 ], { %r5535, %r5536, %r5537, %r5538 }; 2026-02-21T09:20:56.0696753Z // end inline asm 2026-02-21T09:20:56.0696812Z // begin inline asm 2026-02-21T09:20:56.0696925Z st.global.v4.b32 [ %rd271 + 0 ], { %r5539, %r5540, %r5541, %r5542 }; 2026-02-21T09:20:56.0697076Z // end inline asm 2026-02-21T09:20:56.0697135Z // begin inline asm 2026-02-21T09:20:56.0697249Z st.global.v4.b32 [ %rd272 + 0 ], { %r5543, %r5544, %r5545, %r5546 }; 2026-02-21T09:20:56.0697368Z // end inline asm 2026-02-21T09:20:56.0697432Z // begin inline asm 2026-02-21T09:20:56.0697546Z st.global.v4.b32 [ %rd273 + 0 ], { %r5547, %r5548, %r5549, %r5550 }; 2026-02-21T09:20:56.0697603Z // end inline asm 2026-02-21T09:20:56.0697664Z // begin inline asm 2026-02-21T09:20:56.0697779Z st.global.v4.b32 [ %rd274 + 0 ], { %r5551, %r5552, %r5553, %r5554 }; 2026-02-21T09:20:56.0697837Z // end inline asm 2026-02-21T09:20:56.0697896Z // begin inline asm 2026-02-21T09:20:56.0698074Z st.global.v4.b32 [ %rd275 + 0 ], { %r5555, %r5556, %r5557, %r5558 }; 2026-02-21T09:20:56.0698133Z // end inline asm 2026-02-21T09:20:56.0698201Z // begin inline asm 2026-02-21T09:20:56.0698320Z st.global.v4.b32 [ %rd276 + 0 ], { %r5559, %r5560, %r5561, %r5562 }; 2026-02-21T09:20:56.0698377Z // end inline asm 2026-02-21T09:20:56.0698435Z // begin inline asm 2026-02-21T09:20:56.0698549Z st.global.v4.b32 [ %rd277 + 0 
], { %r5563, %r5564, %r5565, %r5566 }; 2026-02-21T09:20:56.0698605Z // end inline asm 2026-02-21T09:20:56.0698670Z // begin inline asm 2026-02-21T09:20:56.0698784Z st.global.v4.b32 [ %rd278 + 0 ], { %r5567, %r5568, %r5569, %r5570 }; 2026-02-21T09:20:56.0698847Z // end inline asm 2026-02-21T09:20:56.0698905Z // begin inline asm 2026-02-21T09:20:56.0699083Z st.global.v4.b32 [ %rd279 + 0 ], { %r5571, %r5572, %r5573, %r5574 }; 2026-02-21T09:20:56.0699149Z // end inline asm 2026-02-21T09:20:56.0699210Z // begin inline asm 2026-02-21T09:20:56.0699323Z st.global.v4.b32 [ %rd280 + 0 ], { %r5575, %r5576, %r5577, %r5578 }; 2026-02-21T09:20:56.0699385Z // end inline asm 2026-02-21T09:20:56.0699609Z .loc 1 22 121 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:22:121 2026-02-21T09:20:56.0699685Z add.s32 %r5795, %r11887, 2; 2026-02-21T09:20:56.0699897Z .loc 1 28 35 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:28:35 2026-02-21T09:20:56.0699967Z shr.s32 %r5796, %r5795, 31; 2026-02-21T09:20:56.0700029Z shr.u32 %r5797, %r5796, 23; 2026-02-21T09:20:56.0700095Z add.s32 %r5798, %r5795, %r5797; 2026-02-21T09:20:56.0700162Z shr.s32 %r5799, %r5798, 9; 2026-02-21T09:20:56.0700366Z .loc 1 29 33 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:29:33 2026-02-21T09:20:56.0700430Z shl.b32 %r5800, %r5799, 3; 2026-02-21T09:20:56.0700636Z .loc 1 30 39 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:30:39 2026-02-21T09:20:56.0700697Z sub.s32 %r5801, 64, %r5800; 2026-02-21T09:20:56.0700900Z .loc 1 30 52 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:30:52 2026-02-21T09:20:56.0700963Z min.s32 %r5802, %r5801, 8; 2026-02-21T09:20:56.0701168Z .loc 1 31 45 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:31:45 2026-02-21T09:20:56.0701235Z and.b32 %r5803, %r5798, -512; 2026-02-21T09:20:56.0701300Z sub.s32 %r5804, %r5795, %r5803; 2026-02-21T09:20:56.0701507Z .loc 1 32 51 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:32:51 
2026-02-21T09:20:56.0701573Z div.s32 %r5805, %r5804, %r5802; 2026-02-21T09:20:56.0701773Z .loc 1 31 64 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:31:64 2026-02-21T09:20:56.0701845Z mul.lo.s32 %r5806, %r5805, %r5802; 2026-02-21T09:20:56.0701910Z sub.s32 %r5807, %r5804, %r5806; 2026-02-21T09:20:56.0702110Z .loc 1 31 30 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:31:30 2026-02-21T09:20:56.0702182Z add.s32 %r5808, %r5807, %r5800; 2026-02-21T09:20:56.0702381Z .loc 1 33 27 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:33:27 2026-02-21T09:20:56.0702444Z shl.b32 %r747, %r5808, 7; 2026-02-21T09:20:56.0702716Z .loc 1 34 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:34:32 2026-02-21T09:20:56.0702785Z or.b32 %r5809, %r747, %r7; 2026-02-21T09:20:56.0702984Z .loc 1 35 27 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:35:27 2026-02-21T09:20:56.0703094Z shl.b32 %r748, %r5805, 8; 2026-02-21T09:20:56.0703299Z .loc 1 36 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:36:32 2026-02-21T09:20:56.0703364Z or.b32 %r5810, %r748, %r11; 2026-02-21T09:20:56.0703426Z or.b32 %r5811, %r748, %r12; 2026-02-21T09:20:56.0703490Z or.b32 %r5812, %r748, %r13; 2026-02-21T09:20:56.0703551Z or.b32 %r5813, %r748, %r14; 2026-02-21T09:20:56.0703797Z .loc 1 51 53 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:53 2026-02-21T09:20:56.0703861Z shl.b32 %r5814, %r5810, 10; 2026-02-21T09:20:56.0703924Z shl.b32 %r5815, %r5811, 10; 2026-02-21T09:20:56.0703989Z shl.b32 %r5816, %r5812, 10; 2026-02-21T09:20:56.0704048Z shl.b32 %r5817, %r5813, 10; 2026-02-21T09:20:56.0704253Z .loc 1 51 60 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:60 2026-02-21T09:20:56.0704317Z or.b32 %r5818, %r5814, %r34; 2026-02-21T09:20:56.0704379Z or.b32 %r5819, %r5815, %r34; 2026-02-21T09:20:56.0704442Z or.b32 %r5820, %r5816, %r34; 2026-02-21T09:20:56.0704503Z or.b32 %r5821, %r5817, %r34; 2026-02-21T09:20:56.0704752Z .loc 1 51 32 
// czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0704825Z mad.wide.s32 %rd281, %r5818, 2, %rd44; 2026-02-21T09:20:56.0704909Z mad.wide.s32 %rd282, %r5819, 2, %rd44; 2026-02-21T09:20:56.0704980Z mad.wide.s32 %rd283, %r5820, 2, %rd44; 2026-02-21T09:20:56.0705050Z mad.wide.s32 %rd284, %r5821, 2, %rd44; 2026-02-21T09:20:56.0705257Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0705315Z bar.sync 0; 2026-02-21T09:20:56.0705373Z mov.b32 %r5580, 8; 2026-02-21T09:20:56.0705437Z // begin inline asm 2026-02-21T09:20:56.0705574Z cp.async.ca.shared.global [ %r37 + 0 ], [ %rd281 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0705632Z // end inline asm 2026-02-21T09:20:56.0705694Z // begin inline asm 2026-02-21T09:20:56.0705828Z cp.async.ca.shared.global [ %r38 + 0 ], [ %rd282 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0705885Z // end inline asm 2026-02-21T09:20:56.0705944Z // begin inline asm 2026-02-21T09:20:56.0706077Z cp.async.ca.shared.global [ %r39 + 0 ], [ %rd283 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0706134Z // end inline asm 2026-02-21T09:20:56.0706195Z // begin inline asm 2026-02-21T09:20:56.0706326Z cp.async.ca.shared.global [ %r40 + 0 ], [ %rd284 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0706399Z // end inline asm 2026-02-21T09:20:56.0706589Z cp.async.commit_group; 2026-02-21T09:20:56.0706800Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0706872Z add.s32 %r5822, %r5809, %r11870; 2026-02-21T09:20:56.0707075Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0707139Z cvt.s64.s32 %rd332, %r5822; 2026-02-21T09:20:56.0707211Z add.s64 %rd285, %rd45, %rd332; 2026-02-21T09:20:56.0707272Z mov.b32 %r12155, 4; 2026-02-21T09:20:56.0707472Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0707532Z // begin inline asm 2026-02-21T09:20:56.0707677Z 
cp.async.ca.shared.global [ %r42 + 0 ], [ %rd285 + 0 ], 0x4, %r12155; 2026-02-21T09:20:56.0707735Z // end inline asm 2026-02-21T09:20:56.0707802Z cp.async.commit_group; 2026-02-21T09:20:56.0708010Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0708075Z cvt.s64.s32 %rd333, %r5814; 2026-02-21T09:20:56.0708138Z or.b64 %rd334, %rd333, %rd633; 2026-02-21T09:20:56.0708357Z shl.b64 %rd335, %rd334, 1; 2026-02-21T09:20:56.0708438Z add.s64 %rd336, %rd44, %rd335; 2026-02-21T09:20:56.0708504Z add.s64 %rd286, %rd336, 32; 2026-02-21T09:20:56.0708566Z cvt.s64.s32 %rd337, %r5815; 2026-02-21T09:20:56.0708708Z or.b64 %rd338, %rd337, %rd633; 2026-02-21T09:20:56.0708772Z shl.b64 %rd339, %rd338, 1; 2026-02-21T09:20:56.0708835Z add.s64 %rd340, %rd44, %rd339; 2026-02-21T09:20:56.0708900Z add.s64 %rd287, %rd340, 32; 2026-02-21T09:20:56.0708962Z cvt.s64.s32 %rd341, %r5816; 2026-02-21T09:20:56.0709027Z or.b64 %rd342, %rd341, %rd633; 2026-02-21T09:20:56.0709089Z shl.b64 %rd343, %rd342, 1; 2026-02-21T09:20:56.0709155Z add.s64 %rd344, %rd44, %rd343; 2026-02-21T09:20:56.0709216Z add.s64 %rd288, %rd344, 32; 2026-02-21T09:20:56.0709343Z cvt.s64.s32 %rd345, %r5817; 2026-02-21T09:20:56.0709412Z or.b64 %rd346, %rd345, %rd633; 2026-02-21T09:20:56.0709473Z shl.b64 %rd347, %rd346, 1; 2026-02-21T09:20:56.0709538Z add.s64 %rd348, %rd44, %rd347; 2026-02-21T09:20:56.0709601Z add.s64 %rd289, %rd348, 32; 2026-02-21T09:20:56.0709823Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0709884Z // begin inline asm 2026-02-21T09:20:56.0710017Z cp.async.ca.shared.global [ %r43 + 0 ], [ %rd286 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0710080Z // end inline asm 2026-02-21T09:20:56.0710140Z // begin inline asm 2026-02-21T09:20:56.0710272Z cp.async.ca.shared.global [ %r44 + 0 ], [ %rd287 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0710332Z // end inline asm 2026-02-21T09:20:56.0710472Z // begin inline asm 
2026-02-21T09:20:56.0710604Z cp.async.ca.shared.global [ %r45 + 0 ], [ %rd288 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0710661Z // end inline asm 2026-02-21T09:20:56.0710727Z // begin inline asm 2026-02-21T09:20:56.0710856Z cp.async.ca.shared.global [ %r46 + 0 ], [ %rd289 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0710914Z // end inline asm 2026-02-21T09:20:56.0710997Z cp.async.commit_group; 2026-02-21T09:20:56.0711204Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0711269Z add.s32 %r5823, %r5809, %r47; 2026-02-21T09:20:56.0711473Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0711539Z cvt.s64.s32 %rd349, %r5823; 2026-02-21T09:20:56.0711603Z add.s64 %rd290, %rd45, %rd349; 2026-02-21T09:20:56.0711808Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0711875Z // begin inline asm 2026-02-21T09:20:56.0712011Z cp.async.ca.shared.global [ %r48 + 0 ], [ %rd290 + 0 ], 0x4, %r12155; 2026-02-21T09:20:56.0712069Z // end inline asm 2026-02-21T09:20:56.0712140Z cp.async.commit_group; 2026-02-21T09:20:56.0712339Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0712400Z add.s64 %rd291, %rd336, 64; 2026-02-21T09:20:56.0712465Z add.s64 %rd292, %rd340, 64; 2026-02-21T09:20:56.0712526Z add.s64 %rd293, %rd344, 64; 2026-02-21T09:20:56.0712590Z add.s64 %rd294, %rd348, 64; 2026-02-21T09:20:56.0712789Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0712851Z bar.sync 0; 2026-02-21T09:20:56.0712911Z // begin inline asm 2026-02-21T09:20:56.0713040Z cp.async.ca.shared.global [ %r49 + 0 ], [ %rd291 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0713101Z // end inline asm 2026-02-21T09:20:56.0713164Z // begin inline asm 2026-02-21T09:20:56.0713292Z cp.async.ca.shared.global [ %r50 + 0 ], [ %rd292 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0713350Z // end 
inline asm 2026-02-21T09:20:56.0713412Z // begin inline asm 2026-02-21T09:20:56.0713540Z cp.async.ca.shared.global [ %r51 + 0 ], [ %rd293 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0713597Z // end inline asm 2026-02-21T09:20:56.0713658Z // begin inline asm 2026-02-21T09:20:56.0713860Z cp.async.ca.shared.global [ %r52 + 0 ], [ %rd294 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0713919Z // end inline asm 2026-02-21T09:20:56.0713989Z cp.async.commit_group; 2026-02-21T09:20:56.0714192Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0714312Z add.s32 %r5824, %r5809, %r53; 2026-02-21T09:20:56.0714514Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0714580Z cvt.s64.s32 %rd350, %r5824; 2026-02-21T09:20:56.0714645Z add.s64 %rd295, %rd45, %rd350; 2026-02-21T09:20:56.0714845Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0714907Z // begin inline asm 2026-02-21T09:20:56.0715094Z cp.async.ca.shared.global [ %r54 + 0 ], [ %rd295 + 0 ], 0x4, %r12155; 2026-02-21T09:20:56.0715158Z // end inline asm 2026-02-21T09:20:56.0715227Z cp.async.commit_group; 2026-02-21T09:20:56.0715429Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0715491Z add.s64 %rd296, %rd336, 96; 2026-02-21T09:20:56.0715554Z add.s64 %rd297, %rd340, 96; 2026-02-21T09:20:56.0715619Z add.s64 %rd298, %rd344, 96; 2026-02-21T09:20:56.0715683Z add.s64 %rd299, %rd348, 96; 2026-02-21T09:20:56.0715883Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0715946Z // begin inline asm 2026-02-21T09:20:56.0716135Z cp.async.ca.shared.global [ %r55 + 0 ], [ %rd296 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0716197Z // end inline asm 2026-02-21T09:20:56.0716260Z // begin inline asm 2026-02-21T09:20:56.0716390Z cp.async.ca.shared.global [ %r56 + 0 ], [ %rd297 + 0 ], 0x8, %r5580; 
2026-02-21T09:20:56.0716564Z // end inline asm 2026-02-21T09:20:56.0716631Z // begin inline asm 2026-02-21T09:20:56.0716763Z cp.async.ca.shared.global [ %r57 + 0 ], [ %rd298 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0716822Z // end inline asm 2026-02-21T09:20:56.0716881Z // begin inline asm 2026-02-21T09:20:56.0717009Z cp.async.ca.shared.global [ %r58 + 0 ], [ %rd299 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0717066Z // end inline asm 2026-02-21T09:20:56.0717134Z cp.async.commit_group; 2026-02-21T09:20:56.0717337Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0717404Z add.s32 %r5825, %r5809, %r59; 2026-02-21T09:20:56.0717607Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0717670Z cvt.s64.s32 %rd351, %r5825; 2026-02-21T09:20:56.0717737Z add.s64 %rd300, %rd45, %rd351; 2026-02-21T09:20:56.0717940Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0718001Z // begin inline asm 2026-02-21T09:20:56.0718139Z cp.async.ca.shared.global [ %r60 + 0 ], [ %rd300 + 0 ], 0x4, %r12155; 2026-02-21T09:20:56.0718199Z // end inline asm 2026-02-21T09:20:56.0718265Z cp.async.commit_group; 2026-02-21T09:20:56.0718468Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0718536Z add.s64 %rd301, %rd336, 128; 2026-02-21T09:20:56.0718599Z add.s64 %rd302, %rd340, 128; 2026-02-21T09:20:56.0718661Z add.s64 %rd303, %rd344, 128; 2026-02-21T09:20:56.0718726Z add.s64 %rd304, %rd348, 128; 2026-02-21T09:20:56.0718927Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0718983Z bar.sync 0; 2026-02-21T09:20:56.0719048Z // begin inline asm 2026-02-21T09:20:56.0719180Z cp.async.ca.shared.global [ %r61 + 0 ], [ %rd301 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0719241Z // end inline asm 2026-02-21T09:20:56.0719299Z // begin inline asm 2026-02-21T09:20:56.0719430Z 
cp.async.ca.shared.global [ %r62 + 0 ], [ %rd302 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0719590Z // end inline asm 2026-02-21T09:20:56.0719656Z // begin inline asm 2026-02-21T09:20:56.0719796Z cp.async.ca.shared.global [ %r63 + 0 ], [ %rd303 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0719856Z // end inline asm 2026-02-21T09:20:56.0719914Z // begin inline asm 2026-02-21T09:20:56.0720110Z cp.async.ca.shared.global [ %r64 + 0 ], [ %rd304 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0720168Z // end inline asm 2026-02-21T09:20:56.0720233Z cp.async.commit_group; 2026-02-21T09:20:56.0720445Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0720515Z add.s32 %r5826, %r5809, %r65; 2026-02-21T09:20:56.0720720Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0720849Z cvt.s64.s32 %rd352, %r5826; 2026-02-21T09:20:56.0720921Z add.s64 %rd305, %rd45, %rd352; 2026-02-21T09:20:56.0721130Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0721192Z // begin inline asm 2026-02-21T09:20:56.0721344Z cp.async.ca.shared.global [ %r66 + 0 ], [ %rd305 + 0 ], 0x4, %r12155; 2026-02-21T09:20:56.0721404Z // end inline asm 2026-02-21T09:20:56.0721471Z cp.async.commit_group; 2026-02-21T09:20:56.0721673Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0721739Z add.s64 %rd306, %rd336, 160; 2026-02-21T09:20:56.0721800Z add.s64 %rd307, %rd340, 160; 2026-02-21T09:20:56.0721862Z add.s64 %rd308, %rd344, 160; 2026-02-21T09:20:56.0722007Z add.s64 %rd309, %rd348, 160; 2026-02-21T09:20:56.0722212Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0722274Z // begin inline asm 2026-02-21T09:20:56.0722408Z cp.async.ca.shared.global [ %r67 + 0 ], [ %rd306 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0722465Z // end inline asm 2026-02-21T09:20:56.0722522Z // begin inline asm 
2026-02-21T09:20:56.0722652Z cp.async.ca.shared.global [ %r68 + 0 ], [ %rd307 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0722711Z // end inline asm 2026-02-21T09:20:56.0722771Z // begin inline asm 2026-02-21T09:20:56.0722898Z cp.async.ca.shared.global [ %r69 + 0 ], [ %rd308 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0722964Z // end inline asm 2026-02-21T09:20:56.0723022Z // begin inline asm 2026-02-21T09:20:56.0723149Z cp.async.ca.shared.global [ %r70 + 0 ], [ %rd309 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0723204Z // end inline asm 2026-02-21T09:20:56.0723275Z cp.async.commit_group; 2026-02-21T09:20:56.0723478Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0723541Z add.s32 %r5827, %r5809, %r71; 2026-02-21T09:20:56.0723749Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0723813Z cvt.s64.s32 %rd353, %r5827; 2026-02-21T09:20:56.0723876Z add.s64 %rd310, %rd45, %rd353; 2026-02-21T09:20:56.0724079Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0724139Z // begin inline asm 2026-02-21T09:20:56.0724272Z cp.async.ca.shared.global [ %r72 + 0 ], [ %rd310 + 0 ], 0x4, %r12155; 2026-02-21T09:20:56.0724330Z // end inline asm 2026-02-21T09:20:56.0724399Z cp.async.commit_group; 2026-02-21T09:20:56.0724598Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0724660Z add.s64 %rd311, %rd336, 192; 2026-02-21T09:20:56.0724737Z add.s64 %rd312, %rd340, 192; 2026-02-21T09:20:56.0724801Z add.s64 %rd313, %rd344, 192; 2026-02-21T09:20:56.0724863Z add.s64 %rd314, %rd348, 192; 2026-02-21T09:20:56.0725071Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0725125Z bar.sync 0; 2026-02-21T09:20:56.0725186Z // begin inline asm 2026-02-21T09:20:56.0725390Z cp.async.ca.shared.global [ %r73 + 0 ], [ %rd311 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0725451Z // end 
inline asm 2026-02-21T09:20:56.0725510Z // begin inline asm 2026-02-21T09:20:56.0725638Z cp.async.ca.shared.global [ %r74 + 0 ], [ %rd312 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0725747Z // end inline asm 2026-02-21T09:20:56.0725806Z // begin inline asm 2026-02-21T09:20:56.0725933Z cp.async.ca.shared.global [ %r75 + 0 ], [ %rd313 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0725988Z // end inline asm 2026-02-21T09:20:56.0726052Z // begin inline asm 2026-02-21T09:20:56.0726181Z cp.async.ca.shared.global [ %r76 + 0 ], [ %rd314 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0726238Z // end inline asm 2026-02-21T09:20:56.0726309Z cp.async.commit_group; 2026-02-21T09:20:56.0726705Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0726776Z add.s32 %r5828, %r5809, %r77; 2026-02-21T09:20:56.0726981Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0727046Z cvt.s64.s32 %rd354, %r5828; 2026-02-21T09:20:56.0727109Z add.s64 %rd315, %rd45, %rd354; 2026-02-21T09:20:56.0727310Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0727388Z // begin inline asm 2026-02-21T09:20:56.0727525Z cp.async.ca.shared.global [ %r78 + 0 ], [ %rd315 + 0 ], 0x4, %r12155; 2026-02-21T09:20:56.0727584Z // end inline asm 2026-02-21T09:20:56.0727654Z cp.async.commit_group; 2026-02-21T09:20:56.0727917Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0727982Z add.s64 %rd316, %rd336, 224; 2026-02-21T09:20:56.0728049Z add.s64 %rd317, %rd340, 224; 2026-02-21T09:20:56.0728109Z add.s64 %rd318, %rd344, 224; 2026-02-21T09:20:56.0728171Z add.s64 %rd319, %rd348, 224; 2026-02-21T09:20:56.0728372Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0728437Z // begin inline asm 2026-02-21T09:20:56.0728566Z cp.async.ca.shared.global [ %r79 + 0 ], [ %rd316 + 0 ], 0x8, %r5580; 
2026-02-21T09:20:56.0728624Z // end inline asm 2026-02-21T09:20:56.0728688Z // begin inline asm 2026-02-21T09:20:56.0728817Z cp.async.ca.shared.global [ %r80 + 0 ], [ %rd317 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0728875Z // end inline asm 2026-02-21T09:20:56.0728935Z // begin inline asm 2026-02-21T09:20:56.0729064Z cp.async.ca.shared.global [ %r81 + 0 ], [ %rd318 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0729123Z // end inline asm 2026-02-21T09:20:56.0729181Z // begin inline asm 2026-02-21T09:20:56.0729311Z cp.async.ca.shared.global [ %r82 + 0 ], [ %rd319 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0729374Z // end inline asm 2026-02-21T09:20:56.0729440Z cp.async.commit_group; 2026-02-21T09:20:56.0729645Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0729710Z add.s32 %r5829, %r5809, %r83; 2026-02-21T09:20:56.0729913Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0729982Z cvt.s64.s32 %rd355, %r5829; 2026-02-21T09:20:56.0730047Z add.s64 %rd320, %rd45, %rd355; 2026-02-21T09:20:56.0730245Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0730305Z // begin inline asm 2026-02-21T09:20:56.0730440Z cp.async.ca.shared.global [ %r84 + 0 ], [ %rd320 + 0 ], 0x4, %r12155; 2026-02-21T09:20:56.0730498Z // end inline asm 2026-02-21T09:20:56.0730565Z cp.async.commit_group; 2026-02-21T09:20:56.0730769Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0730832Z add.s64 %rd321, %rd336, 256; 2026-02-21T09:20:56.0730893Z add.s64 %rd322, %rd340, 256; 2026-02-21T09:20:56.0730956Z add.s64 %rd323, %rd344, 256; 2026-02-21T09:20:56.0731106Z add.s64 %rd324, %rd348, 256; 2026-02-21T09:20:56.0731307Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0731363Z bar.sync 0; 2026-02-21T09:20:56.0731426Z // begin inline asm 2026-02-21T09:20:56.0731626Z 
cp.async.ca.shared.global [ %r85 + 0 ], [ %rd321 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0731684Z // end inline asm 2026-02-21T09:20:56.0731746Z // begin inline asm 2026-02-21T09:20:56.0731873Z cp.async.ca.shared.global [ %r86 + 0 ], [ %rd322 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0731933Z // end inline asm 2026-02-21T09:20:56.0731990Z // begin inline asm 2026-02-21T09:20:56.0732120Z cp.async.ca.shared.global [ %r87 + 0 ], [ %rd323 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0732176Z // end inline asm 2026-02-21T09:20:56.0732306Z // begin inline asm 2026-02-21T09:20:56.0732445Z cp.async.ca.shared.global [ %r88 + 0 ], [ %rd324 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0732501Z // end inline asm 2026-02-21T09:20:56.0732569Z cp.async.commit_group; 2026-02-21T09:20:56.0732772Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0732834Z add.s32 %r5830, %r5809, %r89; 2026-02-21T09:20:56.0733035Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0733101Z cvt.s64.s32 %rd356, %r5830; 2026-02-21T09:20:56.0733169Z add.s64 %rd325, %rd45, %rd356; 2026-02-21T09:20:56.0733418Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0733485Z // begin inline asm 2026-02-21T09:20:56.0733622Z cp.async.ca.shared.global [ %r90 + 0 ], [ %rd325 + 0 ], 0x4, %r12155; 2026-02-21T09:20:56.0733681Z // end inline asm 2026-02-21T09:20:56.0733747Z cp.async.commit_group; 2026-02-21T09:20:56.0733950Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0734014Z add.s64 %rd326, %rd336, 288; 2026-02-21T09:20:56.0734074Z add.s64 %rd327, %rd340, 288; 2026-02-21T09:20:56.0734134Z add.s64 %rd328, %rd344, 288; 2026-02-21T09:20:56.0734198Z add.s64 %rd329, %rd348, 288; 2026-02-21T09:20:56.0734396Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0734459Z // begin inline asm 
2026-02-21T09:20:56.0734593Z cp.async.ca.shared.global [ %r91 + 0 ], [ %rd326 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0734649Z // end inline asm 2026-02-21T09:20:56.0734708Z // begin inline asm 2026-02-21T09:20:56.0734841Z cp.async.ca.shared.global [ %r92 + 0 ], [ %rd327 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0734898Z // end inline asm 2026-02-21T09:20:56.0734957Z // begin inline asm 2026-02-21T09:20:56.0735087Z cp.async.ca.shared.global [ %r93 + 0 ], [ %rd328 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0735146Z // end inline asm 2026-02-21T09:20:56.0735204Z // begin inline asm 2026-02-21T09:20:56.0735333Z cp.async.ca.shared.global [ %r94 + 0 ], [ %rd329 + 0 ], 0x8, %r5580; 2026-02-21T09:20:56.0735392Z // end inline asm 2026-02-21T09:20:56.0735458Z cp.async.commit_group; 2026-02-21T09:20:56.0735659Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0735723Z add.s32 %r5831, %r5809, %r95; 2026-02-21T09:20:56.0735928Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0735991Z cvt.s64.s32 %rd357, %r5831; 2026-02-21T09:20:56.0736056Z add.s64 %rd330, %rd45, %rd357; 2026-02-21T09:20:56.0736274Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0736336Z // begin inline asm 2026-02-21T09:20:56.0736585Z cp.async.ca.shared.global [ %r96 + 0 ], [ %rd330 + 0 ], 0x4, %r12155; 2026-02-21T09:20:56.0736648Z // end inline asm 2026-02-21T09:20:56.0736713Z cp.async.commit_group; 2026-02-21T09:20:56.0737009Z .loc 1 43 78 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:43:78 2026-02-21T09:20:56.0737073Z add.s32 %r12153, %r117, %r747; 2026-02-21T09:20:56.0737141Z or.b32 %r5832, %r13, %r748; 2026-02-21T09:20:56.0737201Z shl.b32 %r5833, %r5832, 10; 2026-02-21T09:20:56.0737339Z mul.wide.s32 %rd22, %r5833, 2; 2026-02-21T09:20:56.0737404Z or.b32 %r5834, %r12, %r748; 2026-02-21T09:20:56.0737465Z shl.b32 %r5835, %r5834, 10; 
2026-02-21T09:20:56.0737531Z mul.wide.s32 %rd23, %r5835, 2; 2026-02-21T09:20:56.0737595Z shl.b32 %r5836, %r5805, 18; 2026-02-21T09:20:56.0737662Z or.b32 %r5837, %r11886, %r5836; 2026-02-21T09:20:56.0737726Z mul.wide.s32 %rd24, %r5837, 2; 2026-02-21T09:20:56.0737789Z or.b32 %r12152, %r121, %r5836; 2026-02-21T09:20:56.0737926Z mov.b32 %r12156, 0f00000000; 2026-02-21T09:20:56.0737993Z mov.b32 %r12154, -1; 2026-02-21T09:20:56.0738054Z mov.b64 %rd639, -16; 2026-02-21T09:20:56.0738116Z mov.b64 %rd638, %rd3; 2026-02-21T09:20:56.0738179Z mov.b32 %r12157, %r12156; 2026-02-21T09:20:56.0738241Z mov.b32 %r12158, %r12156; 2026-02-21T09:20:56.0738300Z mov.b32 %r12159, %r12156; 2026-02-21T09:20:56.0738362Z mov.b32 %r12160, %r12156; 2026-02-21T09:20:56.0738421Z mov.b32 %r12161, %r12156; 2026-02-21T09:20:56.0738482Z mov.b32 %r12162, %r12156; 2026-02-21T09:20:56.0738545Z mov.b32 %r12163, %r12156; 2026-02-21T09:20:56.0738605Z mov.b32 %r12164, %r12156; 2026-02-21T09:20:56.0738665Z mov.b32 %r12165, %r12156; 2026-02-21T09:20:56.0738723Z mov.b32 %r12166, %r12156; 2026-02-21T09:20:56.0738785Z mov.b32 %r12167, %r12156; 2026-02-21T09:20:56.0738917Z mov.b32 %r12168, %r12156; 2026-02-21T09:20:56.0738985Z mov.b32 %r12169, %r12156; 2026-02-21T09:20:56.0739047Z mov.b32 %r12170, %r12156; 2026-02-21T09:20:56.0739106Z mov.b32 %r12171, %r12156; 2026-02-21T09:20:56.0739169Z mov.b32 %r12172, %r12156; 2026-02-21T09:20:56.0739227Z mov.b32 %r12173, %r12156; 2026-02-21T09:20:56.0739291Z mov.b32 %r12174, %r12156; 2026-02-21T09:20:56.0739351Z mov.b32 %r12175, %r12156; 2026-02-21T09:20:56.0739410Z mov.b32 %r12176, %r12156; 2026-02-21T09:20:56.0739471Z mov.b32 %r12177, %r12156; 2026-02-21T09:20:56.0739529Z mov.b32 %r12178, %r12156; 2026-02-21T09:20:56.0739587Z mov.b32 %r12179, %r12156; 2026-02-21T09:20:56.0739646Z mov.b32 %r12180, %r12156; 2026-02-21T09:20:56.0739710Z mov.b32 %r12181, %r12156; 2026-02-21T09:20:56.0739767Z mov.b32 %r12182, %r12156; 2026-02-21T09:20:56.0739827Z mov.b32 %r12183, %r12156; 
2026-02-21T09:20:56.0739891Z mov.b32 %r12184, %r12156; 2026-02-21T09:20:56.0739948Z mov.b32 %r12185, %r12156; 2026-02-21T09:20:56.0740008Z mov.b32 %r12186, %r12156; 2026-02-21T09:20:56.0740071Z mov.b32 %r12187, %r12156; 2026-02-21T09:20:56.0740131Z mov.b32 %r12188, %r12156; 2026-02-21T09:20:56.0740190Z mov.b32 %r12189, %r12156; 2026-02-21T09:20:56.0740249Z mov.b32 %r12190, %r12156; 2026-02-21T09:20:56.0740309Z mov.b32 %r12191, %r12156; 2026-02-21T09:20:56.0740369Z mov.b32 %r12192, %r12156; 2026-02-21T09:20:56.0740430Z mov.b32 %r12193, %r12156; 2026-02-21T09:20:56.0740491Z mov.b32 %r12194, %r12156; 2026-02-21T09:20:56.0740549Z mov.b32 %r12195, %r12156; 2026-02-21T09:20:56.0740606Z mov.b32 %r12196, %r12156; 2026-02-21T09:20:56.0740665Z mov.b32 %r12197, %r12156; 2026-02-21T09:20:56.0740729Z mov.b32 %r12198, %r12156; 2026-02-21T09:20:56.0740788Z mov.b32 %r12199, %r12156; 2026-02-21T09:20:56.0740853Z mov.b32 %r12200, %r12156; 2026-02-21T09:20:56.0740913Z mov.b32 %r12201, %r12156; 2026-02-21T09:20:56.0740971Z mov.b32 %r12202, %r12156; 2026-02-21T09:20:56.0741028Z mov.b32 %r12203, %r12156; 2026-02-21T09:20:56.0741087Z mov.b32 %r12204, %r12156; 2026-02-21T09:20:56.0741148Z mov.b32 %r12205, %r12156; 2026-02-21T09:20:56.0741208Z mov.b32 %r12206, %r12156; 2026-02-21T09:20:56.0741268Z mov.b32 %r12207, %r12156; 2026-02-21T09:20:56.0741329Z mov.b32 %r12208, %r12156; 2026-02-21T09:20:56.0741387Z mov.b32 %r12209, %r12156; 2026-02-21T09:20:56.0741444Z mov.b32 %r12210, %r12156; 2026-02-21T09:20:56.0741575Z mov.b32 %r12211, %r12156; 2026-02-21T09:20:56.0741640Z mov.b32 %r12212, %r12156; 2026-02-21T09:20:56.0741699Z mov.b32 %r12213, %r12156; 2026-02-21T09:20:56.0741759Z mov.b32 %r12214, %r12156; 2026-02-21T09:20:56.0741822Z mov.b32 %r12215, %r12156; 2026-02-21T09:20:56.0741928Z mov.b32 %r12216, %r12156; 2026-02-21T09:20:56.0741986Z mov.b32 %r12217, %r12156; 2026-02-21T09:20:56.0742045Z mov.b32 %r12218, %r12156; 2026-02-21T09:20:56.0742107Z mov.b32 %r12219, %r12156; 
2026-02-21T09:20:56.0742165Z mov.b32 %r12220, %r12156; 2026-02-21T09:20:56.0742223Z mov.b32 %r12221, %r12156; 2026-02-21T09:20:56.0742286Z mov.b32 %r12222, %r12156; 2026-02-21T09:20:56.0742345Z mov.b32 %r12223, %r12156; 2026-02-21T09:20:56.0742403Z mov.b32 %r12224, %r12156; 2026-02-21T09:20:56.0742516Z mov.b32 %r12225, %r12156; 2026-02-21T09:20:56.0742583Z mov.b32 %r12226, %r12156; 2026-02-21T09:20:56.0742642Z mov.b32 %r12227, %r12156; 2026-02-21T09:20:56.0742701Z mov.b32 %r12228, %r12156; 2026-02-21T09:20:56.0742765Z mov.b32 %r12229, %r12156; 2026-02-21T09:20:56.0742824Z mov.b32 %r12230, %r12156; 2026-02-21T09:20:56.0742882Z mov.b32 %r12231, %r12156; 2026-02-21T09:20:56.0742941Z mov.b32 %r12232, %r12156; 2026-02-21T09:20:56.0743004Z mov.b32 %r12233, %r12156; 2026-02-21T09:20:56.0743066Z mov.b32 %r12234, %r12156; 2026-02-21T09:20:56.0743125Z mov.b32 %r12235, %r12156; 2026-02-21T09:20:56.0743187Z mov.b32 %r12236, %r12156; 2026-02-21T09:20:56.0743244Z mov.b32 %r12237, %r12156; 2026-02-21T09:20:56.0743303Z mov.b32 %r12238, %r12156; 2026-02-21T09:20:56.0743364Z mov.b32 %r12239, %r12156; 2026-02-21T09:20:56.0743472Z mov.b32 %r12240, %r12156; 2026-02-21T09:20:56.0743532Z mov.b32 %r12241, %r12156; 2026-02-21T09:20:56.0743591Z mov.b32 %r12242, %r12156; 2026-02-21T09:20:56.0743656Z mov.b32 %r12243, %r12156; 2026-02-21T09:20:56.0743714Z mov.b32 %r12244, %r12156; 2026-02-21T09:20:56.0743772Z mov.b32 %r12245, %r12156; 2026-02-21T09:20:56.0743833Z mov.b32 %r12246, %r12156; 2026-02-21T09:20:56.0743893Z mov.b32 %r12247, %r12156; 2026-02-21T09:20:56.0743951Z mov.b32 %r12248, %r12156; 2026-02-21T09:20:56.0744009Z mov.b32 %r12249, %r12156; 2026-02-21T09:20:56.0744070Z mov.b32 %r12250, %r12156; 2026-02-21T09:20:56.0744126Z mov.b32 %r12251, %r12156; 2026-02-21T09:20:56.0744198Z mov.b32 %r12252, %r12156; 2026-02-21T09:20:56.0744261Z mov.b32 %r12253, %r12156; 2026-02-21T09:20:56.0744320Z mov.b32 %r12254, %r12156; 2026-02-21T09:20:56.0744380Z mov.b32 %r12255, %r12156; 
2026-02-21T09:20:56.0744440Z mov.b32 %r12256, %r12156; 2026-02-21T09:20:56.0744502Z mov.b32 %r12257, %r12156; 2026-02-21T09:20:56.0744563Z mov.b32 %r12258, %r12156; 2026-02-21T09:20:56.0744623Z mov.b32 %r12259, %r12156; 2026-02-21T09:20:56.0744685Z mov.b32 %r12260, %r12156; 2026-02-21T09:20:56.0744746Z mov.b32 %r12261, %r12156; 2026-02-21T09:20:56.0744806Z mov.b32 %r12262, %r12156; 2026-02-21T09:20:56.0744864Z mov.b32 %r12263, %r12156; 2026-02-21T09:20:56.0744935Z mov.b32 %r12264, %r12156; 2026-02-21T09:20:56.0744998Z mov.b32 %r12265, %r12156; 2026-02-21T09:20:56.0745057Z mov.b32 %r12266, %r12156; 2026-02-21T09:20:56.0745118Z mov.b32 %r12267, %r12156; 2026-02-21T09:20:56.0745176Z mov.b32 %r12268, %r12156; 2026-02-21T09:20:56.0745233Z mov.b32 %r12269, %r12156; 2026-02-21T09:20:56.0745295Z mov.b32 %r12270, %r12156; 2026-02-21T09:20:56.0745356Z mov.b32 %r12271, %r12156; 2026-02-21T09:20:56.0745414Z mov.b32 %r12272, %r12156; 2026-02-21T09:20:56.0745473Z mov.b32 %r12273, %r12156; 2026-02-21T09:20:56.0745535Z mov.b32 %r12274, %r12156; 2026-02-21T09:20:56.0745595Z mov.b32 %r12275, %r12156; 2026-02-21T09:20:56.0745653Z mov.b32 %r12276, %r12156; 2026-02-21T09:20:56.0745712Z mov.b32 %r12277, %r12156; 2026-02-21T09:20:56.0745772Z mov.b32 %r12278, %r12156; 2026-02-21T09:20:56.0745833Z mov.b32 %r12279, %r12156; 2026-02-21T09:20:56.0745891Z mov.b32 %r12280, %r12156; 2026-02-21T09:20:56.0745952Z mov.b32 %r12281, %r12156; 2026-02-21T09:20:56.0746013Z mov.b32 %r12282, %r12156; 2026-02-21T09:20:56.0746141Z mov.b32 %r12283, %r12156; 2026-02-21T09:20:56.0746260Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:20:56.0746373Z // => This Inner Loop Header: Depth=2 2026-02-21T09:20:56.0746598Z add.s64 %rd639, %rd639, 16; 2026-02-21T09:20:56.0746669Z setp.lt.u64 %p37, %rd639, 432; 2026-02-21T09:20:56.0746746Z add.s32 %r7438, %r12154, 1; 2026-02-21T09:20:56.0746811Z setp.gt.s32 %p38, %r7438, 4; 2026-02-21T09:20:56.0746880Z selp.b32 %r12154, 0, %r7438, %p38; 
2026-02-21T09:20:56.0747095Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0747165Z cp.async.wait_group 16; 2026-02-21T09:20:56.0747221Z bar.sync 0; 2026-02-21T09:20:56.0747356Z shl.b32 %r7439, %r12154, 13; 2026-02-21T09:20:56.0747428Z add.s32 %r7441, %r11869, %r7439; 2026-02-21T09:20:56.0747634Z .loc 1 55 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:55:32 2026-02-21T09:20:56.0747699Z add.s32 %r7442, %r7441, %r97; 2026-02-21T09:20:56.0747769Z ld.shared.b16 %rs225, [%r7442]; 2026-02-21T09:20:56.0747840Z ld.shared.b16 %rs226, [%r7442+256]; 2026-02-21T09:20:56.0747906Z ld.shared.b16 %rs227, [%r7442+16]; 2026-02-21T09:20:56.0747978Z ld.shared.b16 %rs228, [%r7442+272]; 2026-02-21T09:20:56.0748046Z ld.shared.b16 %rs229, [%r7442+4096]; 2026-02-21T09:20:56.0748124Z ld.shared.b16 %rs230, [%r7442+4352]; 2026-02-21T09:20:56.0748193Z ld.shared.b16 %rs231, [%r7442+4112]; 2026-02-21T09:20:56.0748406Z ld.shared.b16 %rs232, [%r7442+4368]; 2026-02-21T09:20:56.0748474Z add.s32 %r7443, %r7441, %r98; 2026-02-21T09:20:56.0748540Z ld.shared.b16 %rs233, [%r7443]; 2026-02-21T09:20:56.0748610Z ld.shared.b16 %rs234, [%r7443+256]; 2026-02-21T09:20:56.0748678Z ld.shared.b16 %rs235, [%r7443+16]; 2026-02-21T09:20:56.0748744Z ld.shared.b16 %rs236, [%r7443+272]; 2026-02-21T09:20:56.0748813Z ld.shared.b16 %rs237, [%r7443+4096]; 2026-02-21T09:20:56.0748881Z ld.shared.b16 %rs238, [%r7443+4352]; 2026-02-21T09:20:56.0748946Z ld.shared.b16 %rs239, [%r7443+4112]; 2026-02-21T09:20:56.0749010Z ld.shared.b16 %rs240, [%r7443+4368]; 2026-02-21T09:20:56.0749078Z cvt.f32.bf16 %r5966, %rs225; 2026-02-21T09:20:56.0749138Z cvt.f32.bf16 %r5967, %rs226; 2026-02-21T09:20:56.0749201Z cvt.f32.bf16 %r5968, %rs233; 2026-02-21T09:20:56.0749266Z cvt.f32.bf16 %r5969, %rs234; 2026-02-21T09:20:56.0749327Z cvt.f32.bf16 %r6098, %rs227; 2026-02-21T09:20:56.0749389Z cvt.f32.bf16 %r6099, %rs228; 2026-02-21T09:20:56.0749450Z cvt.f32.bf16 %r6100, %rs235; 
2026-02-21T09:20:56.0749514Z cvt.f32.bf16 %r6101, %rs236; 2026-02-21T09:20:56.0749576Z cvt.f32.bf16 %r6230, %rs229; 2026-02-21T09:20:56.0749638Z cvt.f32.bf16 %r6231, %rs230; 2026-02-21T09:20:56.0749702Z cvt.f32.bf16 %r6232, %rs237; 2026-02-21T09:20:56.0749764Z cvt.f32.bf16 %r6233, %rs238; 2026-02-21T09:20:56.0749825Z cvt.f32.bf16 %r6362, %rs231; 2026-02-21T09:20:56.0749886Z cvt.f32.bf16 %r6363, %rs232; 2026-02-21T09:20:56.0749951Z cvt.f32.bf16 %r6364, %rs239; 2026-02-21T09:20:56.0750010Z cvt.f32.bf16 %r6365, %rs240; 2026-02-21T09:20:56.0750213Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0750285Z shl.b32 %r7444, %r12154, 10; 2026-02-21T09:20:56.0750351Z add.s32 %r7445, %r11869, %r7444; 2026-02-21T09:20:56.0750413Z add.s32 %r7446, %r7445, 90112; 2026-02-21T09:20:56.0750620Z .loc 1 70 45 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:70:45 2026-02-21T09:20:56.0750685Z add.s32 %r7447, %r7446, %r11875; 2026-02-21T09:20:56.0750752Z ld.shared.b8 %rs241, [%r7447]; 2026-02-21T09:20:56.0750828Z ld.shared.b8 %rs242, [%r7447+128]; 2026-02-21T09:20:56.0750901Z ld.shared.b8 %rs243, [%r7447+256]; 2026-02-21T09:20:56.0750967Z ld.shared.b8 %rs244, [%r7447+384]; 2026-02-21T09:20:56.0751030Z ld.shared.b8 %rs245, [%r7447+512]; 2026-02-21T09:20:56.0751097Z ld.shared.b8 %rs246, [%r7447+640]; 2026-02-21T09:20:56.0751247Z ld.shared.b8 %rs247, [%r7447+768]; 2026-02-21T09:20:56.0751308Z add.s32 %r7448, %r7446, %r11876; 2026-02-21T09:20:56.0751374Z ld.shared.b8 %rs248, [%r7448]; 2026-02-21T09:20:56.0751576Z .loc 1 60 28 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:60:28 2026-02-21T09:20:56.0751717Z shl.b16 %rs249, %rs241, 4; 2026-02-21T09:20:56.0751781Z shl.b16 %rs250, %rs242, 4; 2026-02-21T09:20:56.0751845Z shl.b16 %rs251, %rs243, 4; 2026-02-21T09:20:56.0751916Z shl.b16 %rs252, %rs244, 4; 2026-02-21T09:20:56.0751980Z shl.b16 %rs253, %rs245, 4; 2026-02-21T09:20:56.0752043Z shl.b16 %rs254, %rs246, 4; 
2026-02-21T09:20:56.0752103Z shl.b16 %rs255, %rs247, 4; 2026-02-21T09:20:56.0752163Z shl.b16 %rs256, %rs248, 4; 2026-02-21T09:20:56.0752419Z .loc 1 75 58 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:75:58 2026-02-21T09:20:56.0752503Z selp.b16 %rs257, %rs249, %rs241, %p70; 2026-02-21T09:20:56.0752568Z cvt.s16.s8 %rs258, %rs257; 2026-02-21T09:20:56.0752629Z shr.s16 %rs259, %rs258, 4; 2026-02-21T09:20:56.0752702Z selp.b16 %rs260, %rs250, %rs242, %p70; 2026-02-21T09:20:56.0752762Z cvt.s16.s8 %rs261, %rs260; 2026-02-21T09:20:56.0752821Z shr.s16 %rs262, %rs261, 4; 2026-02-21T09:20:56.0752895Z selp.b16 %rs263, %rs251, %rs243, %p70; 2026-02-21T09:20:56.0752957Z cvt.s16.s8 %rs264, %rs263; 2026-02-21T09:20:56.0753017Z shr.s16 %rs265, %rs264, 4; 2026-02-21T09:20:56.0753086Z selp.b16 %rs266, %rs252, %rs244, %p70; 2026-02-21T09:20:56.0753153Z cvt.s16.s8 %rs267, %rs266; 2026-02-21T09:20:56.0753274Z shr.s16 %rs268, %rs267, 4; 2026-02-21T09:20:56.0753351Z selp.b16 %rs269, %rs253, %rs245, %p70; 2026-02-21T09:20:56.0753416Z cvt.s16.s8 %rs270, %rs269; 2026-02-21T09:20:56.0753479Z shr.s16 %rs271, %rs270, 4; 2026-02-21T09:20:56.0753548Z selp.b16 %rs272, %rs254, %rs246, %p70; 2026-02-21T09:20:56.0753609Z cvt.s16.s8 %rs273, %rs272; 2026-02-21T09:20:56.0753678Z shr.s16 %rs274, %rs273, 4; 2026-02-21T09:20:56.0753748Z selp.b16 %rs275, %rs255, %rs247, %p70; 2026-02-21T09:20:56.0753810Z cvt.s16.s8 %rs276, %rs275; 2026-02-21T09:20:56.0753873Z shr.s16 %rs277, %rs276, 4; 2026-02-21T09:20:56.0753939Z selp.b16 %rs278, %rs256, %rs248, %p70; 2026-02-21T09:20:56.0754002Z cvt.s16.s8 %rs279, %rs278; 2026-02-21T09:20:56.0754062Z shr.s16 %rs280, %rs279, 4; 2026-02-21T09:20:56.0754269Z .loc 1 80 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:80:32 2026-02-21T09:20:56.0754335Z cvt.rn.f32.s16 %r7449, %rs259; 2026-02-21T09:20:56.0754400Z cvt.rn.f32.s16 %r7450, %rs262; 2026-02-21T09:20:56.0754466Z cvt.rn.f32.s16 %r7451, %rs265; 2026-02-21T09:20:56.0754529Z cvt.rn.f32.s16 %r7452, 
%rs268; 2026-02-21T09:20:56.0754591Z cvt.rn.f32.s16 %r7453, %rs271; 2026-02-21T09:20:56.0754659Z cvt.rn.f32.s16 %r7454, %rs274; 2026-02-21T09:20:56.0754722Z cvt.rn.f32.s16 %r7455, %rs277; 2026-02-21T09:20:56.0754784Z cvt.rn.f32.s16 %r7456, %rs280; 2026-02-21T09:20:56.0754850Z st.shared.b32 [%r101], %r7449; 2026-02-21T09:20:56.0754923Z st.shared.b32 [%r101+8], %r7450; 2026-02-21T09:20:56.0754985Z st.shared.b32 [%r102], %r7451; 2026-02-21T09:20:56.0755050Z st.shared.b32 [%r102+8], %r7452; 2026-02-21T09:20:56.0755117Z st.shared.b32 [%r103], %r7453; 2026-02-21T09:20:56.0755182Z st.shared.b32 [%r103+8], %r7454; 2026-02-21T09:20:56.0755250Z st.shared.b32 [%r104], %r7455; 2026-02-21T09:20:56.0755312Z st.shared.b32 [%r104+8], %r7456; 2026-02-21T09:20:56.0755374Z $L__tmp9: 2026-02-21T09:20:56.0755655Z .loc 2 291 36 // standard.py:291:36 @[ czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:87:40 ] 2026-02-21T09:20:56.0755716Z // begin inline asm 2026-02-21T09:20:56.0755801Z fence.proxy.async.shared::cta; 2026-02-21T09:20:56.0755857Z // end inline asm 2026-02-21T09:20:56.0755914Z bar.sync 0; 2026-02-21T09:20:56.0755994Z shfl.sync.idx.b32 %r7457, %r5, 0, 31, -1; 2026-02-21T09:20:56.0756073Z wgmma.fence.sync.aligned; 2026-02-21T09:20:56.0756206Z mov.pred %p28, -1; 2026-02-21T09:20:56.0756266Z // begin inline asm 2026-02-21T09:20:56.0757854Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12156,%r12157,%r12158,%r12159,%r12160,%r12161,%r12162,%r12163,%r12164,%r12165,%r12166,%r12167,%r12168,%r12169,%r12170,%r12171,%r12172,%r12173,%r12174,%r12175,%r12176,%r12177,%r12178,%r12179,%r12180,%r12181,%r12182,%r12183,%r12184,%r12185,%r12186,%r12187,%r12188,%r12189,%r12190,%r12191,%r12192,%r12193,%r12194,%r12195,%r12196,%r12197,%r12198,%r12199,%r12200,%r12201,%r12202,%r12203,%r12204,%r12205,%r12206,%r12207,%r12208,%r12209,%r12210,%r12211,%r12212,%r12213,%r12214,%r12215,%r12216,%r12217,%r12218,%r12219}, {%r5966,%r5967,%r5968,%r5969}, %rd1, %p28, 1, 1; 
2026-02-21T09:20:56.0758000Z // end inline asm 2026-02-21T09:20:56.0758064Z // begin inline asm 2026-02-21T09:20:56.0759574Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12156,%r12157,%r12158,%r12159,%r12160,%r12161,%r12162,%r12163,%r12164,%r12165,%r12166,%r12167,%r12168,%r12169,%r12170,%r12171,%r12172,%r12173,%r12174,%r12175,%r12176,%r12177,%r12178,%r12179,%r12180,%r12181,%r12182,%r12183,%r12184,%r12185,%r12186,%r12187,%r12188,%r12189,%r12190,%r12191,%r12192,%r12193,%r12194,%r12195,%r12196,%r12197,%r12198,%r12199,%r12200,%r12201,%r12202,%r12203,%r12204,%r12205,%r12206,%r12207,%r12208,%r12209,%r12210,%r12211,%r12212,%r12213,%r12214,%r12215,%r12216,%r12217,%r12218,%r12219}, {%r6098,%r6099,%r6100,%r6101}, %rd2, %p28, 1, 1; 2026-02-21T09:20:56.0759637Z // end inline asm 2026-02-21T09:20:56.0759698Z // begin inline asm 2026-02-21T09:20:56.0761206Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12220,%r12221,%r12222,%r12223,%r12224,%r12225,%r12226,%r12227,%r12228,%r12229,%r12230,%r12231,%r12232,%r12233,%r12234,%r12235,%r12236,%r12237,%r12238,%r12239,%r12240,%r12241,%r12242,%r12243,%r12244,%r12245,%r12246,%r12247,%r12248,%r12249,%r12250,%r12251,%r12252,%r12253,%r12254,%r12255,%r12256,%r12257,%r12258,%r12259,%r12260,%r12261,%r12262,%r12263,%r12264,%r12265,%r12266,%r12267,%r12268,%r12269,%r12270,%r12271,%r12272,%r12273,%r12274,%r12275,%r12276,%r12277,%r12278,%r12279,%r12280,%r12281,%r12282,%r12283}, {%r6230,%r6231,%r6232,%r6233}, %rd1, %p28, 1, 1; 2026-02-21T09:20:56.0761272Z // end inline asm 2026-02-21T09:20:56.0761330Z // begin inline asm 2026-02-21T09:20:56.0762776Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12220,%r12221,%r12222,%r12223,%r12224,%r12225,%r12226,%r12227,%r12228,%r12229,%r12230,%r12231,%r12232,%r12233,%r12234,%r12235,%r12236,%r12237,%r12238,%r12239,%r12240,%r12241,%r12242,%r12243,%r12244,%r12245,%r12246,%r12247,%r12248,%r12249,%r12250,%r12251,%r12252,%r12253,%r12254,%r12255,%r12256,%r12257,%r12258,%r12259,%r12260,%r12261,%r12262,%r12263,%r12264,%r12265,%r12266,%r12267,%r12268,%r12269,%r12270,%r12271,%r12272,%r12273,%r12274,%r12275,%r12276,%r12277,%r12278,%r12279,%r12280,%r12281,%r12282,%r12283}, {%r6362,%r6363,%r6364,%r6365}, %rd2, %p28, 1, 1; 2026-02-21T09:20:56.0762836Z // end inline asm 2026-02-21T09:20:56.0762914Z wgmma.commit_group.sync.aligned; 2026-02-21T09:20:56.0762973Z mov.b32 %r7286, 0; 2026-02-21T09:20:56.0763037Z mov.b32 %r6494, %r1575; 2026-02-21T09:20:56.0763098Z mov.b32 %r6495, %r7286; 2026-02-21T09:20:56.0763169Z mov.b32 %r6496, %r7286; 2026-02-21T09:20:56.0763231Z // begin inline asm 2026-02-21T09:20:56.0765692Z // wait for regs: %r12156,%r12157,%r12158,%r12159,%r12160,%r12161,%r12162,%r12163,%r12164,%r12165,%r12166,%r12167,%r12168,%r12169,%r12170,%r12171,%r12172,%r12173,%r12174,%r12175,%r12176,%r12177,%r12178,%r12179,%r12180,%r12181,%r12182,%r12183,%r12184,%r12185,%r12186,%r12187,%r12188,%r12189,%r12190,%r12191,%r12192,%r12193,%r12194,%r12195,%r12196,%r12197,%r12198,%r12199,%r12200,%r12201,%r12202,%r12203,%r12204,%r12205,%r12206,%r12207,%r12208,%r12209,%r12210,%r12211,%r12212,%r12213,%r12214,%r12215,%r12216,%r12217,%r12218,%r12219,%r12220,%r12221,%r12222,%r12223,%r12224,%r12225,%r12226,%r12227,%r12228,%r12229,%r12230,%r12231,%r12232,%r12233,%r12234,%r12235,%r12236,%r12237,%r12238,%r12239,%r12240,%r12241,%r12242,%r12243,%r12244,%r12245,%r12246,%r12247,%r12248,%r12249,%r12250,%r12251,%r12252,%r12253,%r12254,%r12255,%r12256,%r12257,%r12258,%r12259,%r12260,%r12261,%r12262,%r12263,%r12264,%r12265,%r12266,%r12267,%r12268,%r12269,%r12270,%r12271,%r12272,%r12273,%r12274,%r12275,%r12276,%r12277,%r12278,%r12279,%r12280,%r12281,%r12282,%r12283,
%r6494,%r6495,%r6496 2026-02-21T09:20:56.0765889Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:20:56.0765949Z // end inline asm 2026-02-21T09:20:56.0766005Z $L__tmp10: 2026-02-21T09:20:56.0766218Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0766284Z add.s32 %r7459, %r3368, %r7439; 2026-02-21T09:20:56.0766604Z .loc 1 55 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:55:32 2026-02-21T09:20:56.0766745Z add.s32 %r7460, %r7459, %r97; 2026-02-21T09:20:56.0766822Z ld.shared.b16 %rs281, [%r7460]; 2026-02-21T09:20:56.0766895Z ld.shared.b16 %rs282, [%r7460+256]; 2026-02-21T09:20:56.0766964Z ld.shared.b16 %rs283, [%r7460+16]; 2026-02-21T09:20:56.0767031Z ld.shared.b16 %rs284, [%r7460+272]; 2026-02-21T09:20:56.0767102Z ld.shared.b16 %rs285, [%r7460+4096]; 2026-02-21T09:20:56.0767168Z ld.shared.b16 %rs286, [%r7460+4352]; 2026-02-21T09:20:56.0767234Z ld.shared.b16 %rs287, [%r7460+4112]; 2026-02-21T09:20:56.0767306Z ld.shared.b16 %rs288, [%r7460+4368]; 2026-02-21T09:20:56.0767368Z add.s32 %r7461, %r7459, %r98; 2026-02-21T09:20:56.0767432Z ld.shared.b16 %rs289, [%r7461]; 2026-02-21T09:20:56.0767503Z ld.shared.b16 %rs290, [%r7461+256]; 2026-02-21T09:20:56.0767632Z ld.shared.b16 %rs291, [%r7461+16]; 2026-02-21T09:20:56.0767701Z ld.shared.b16 %rs292, [%r7461+272]; 2026-02-21T09:20:56.0767767Z ld.shared.b16 %rs293, [%r7461+4096]; 2026-02-21T09:20:56.0767839Z ld.shared.b16 %rs294, [%r7461+4352]; 2026-02-21T09:20:56.0767917Z ld.shared.b16 %rs295, [%r7461+4112]; 2026-02-21T09:20:56.0767985Z ld.shared.b16 %rs296, [%r7461+4368]; 2026-02-21T09:20:56.0768053Z cvt.f32.bf16 %r6756, %rs281; 2026-02-21T09:20:56.0768117Z cvt.f32.bf16 %r6757, %rs282; 2026-02-21T09:20:56.0768178Z cvt.f32.bf16 %r6758, %rs289; 2026-02-21T09:20:56.0768238Z cvt.f32.bf16 %r6759, %rs290; 2026-02-21T09:20:56.0768301Z cvt.f32.bf16 %r6888, %rs283; 2026-02-21T09:20:56.0768363Z cvt.f32.bf16 %r6889, %rs284; 2026-02-21T09:20:56.0768424Z cvt.f32.bf16 %r6890, 
%rs291; 2026-02-21T09:20:56.0768488Z cvt.f32.bf16 %r6891, %rs292; 2026-02-21T09:20:56.0768549Z cvt.f32.bf16 %r7020, %rs285; 2026-02-21T09:20:56.0768610Z cvt.f32.bf16 %r7021, %rs286; 2026-02-21T09:20:56.0768672Z cvt.f32.bf16 %r7022, %rs293; 2026-02-21T09:20:56.0768734Z cvt.f32.bf16 %r7023, %rs294; 2026-02-21T09:20:56.0768798Z cvt.f32.bf16 %r7152, %rs287; 2026-02-21T09:20:56.0768859Z cvt.f32.bf16 %r7153, %rs288; 2026-02-21T09:20:56.0768926Z cvt.f32.bf16 %r7154, %rs295; 2026-02-21T09:20:56.0768988Z cvt.f32.bf16 %r7155, %rs296; 2026-02-21T09:20:56.0769202Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0769270Z add.s32 %r7462, %r7445, 95232; 2026-02-21T09:20:56.0769472Z .loc 1 70 45 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:70:45 2026-02-21T09:20:56.0769535Z add.s32 %r7463, %r7462, %r11875; 2026-02-21T09:20:56.0769603Z ld.shared.b8 %rs297, [%r7463]; 2026-02-21T09:20:56.0769672Z ld.shared.b8 %rs298, [%r7463+128]; 2026-02-21T09:20:56.0769736Z ld.shared.b8 %rs299, [%r7463+256]; 2026-02-21T09:20:56.0769800Z ld.shared.b8 %rs300, [%r7463+384]; 2026-02-21T09:20:56.0769867Z ld.shared.b8 %rs301, [%r7463+512]; 2026-02-21T09:20:56.0769933Z ld.shared.b8 %rs302, [%r7463+640]; 2026-02-21T09:20:56.0769997Z ld.shared.b8 %rs303, [%r7463+768]; 2026-02-21T09:20:56.0770060Z add.s32 %r7464, %r7462, %r11876; 2026-02-21T09:20:56.0770131Z ld.shared.b8 %rs304, [%r7464]; 2026-02-21T09:20:56.0770330Z .loc 1 60 28 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:60:28 2026-02-21T09:20:56.0770480Z shl.b16 %rs305, %rs297, 4; 2026-02-21T09:20:56.0770548Z shl.b16 %rs306, %rs298, 4; 2026-02-21T09:20:56.0770609Z shl.b16 %rs307, %rs299, 4; 2026-02-21T09:20:56.0770670Z shl.b16 %rs308, %rs300, 4; 2026-02-21T09:20:56.0770732Z shl.b16 %rs309, %rs301, 4; 2026-02-21T09:20:56.0770865Z shl.b16 %rs310, %rs302, 4; 2026-02-21T09:20:56.0770935Z shl.b16 %rs311, %rs303, 4; 2026-02-21T09:20:56.0770996Z shl.b16 %rs312, %rs304, 4; 
2026-02-21T09:20:56.0771200Z .loc 1 75 58 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:75:58 2026-02-21T09:20:56.0771274Z selp.b16 %rs313, %rs305, %rs297, %p70; 2026-02-21T09:20:56.0771336Z cvt.s16.s8 %rs314, %rs313; 2026-02-21T09:20:56.0771399Z shr.s16 %rs315, %rs314, 4; 2026-02-21T09:20:56.0771468Z selp.b16 %rs316, %rs306, %rs298, %p70; 2026-02-21T09:20:56.0771583Z cvt.s16.s8 %rs317, %rs316; 2026-02-21T09:20:56.0771650Z shr.s16 %rs318, %rs317, 4; 2026-02-21T09:20:56.0771725Z selp.b16 %rs319, %rs307, %rs299, %p70; 2026-02-21T09:20:56.0771789Z cvt.s16.s8 %rs320, %rs319; 2026-02-21T09:20:56.0771849Z shr.s16 %rs321, %rs320, 4; 2026-02-21T09:20:56.0771920Z selp.b16 %rs322, %rs308, %rs300, %p70; 2026-02-21T09:20:56.0771980Z cvt.s16.s8 %rs323, %rs322; 2026-02-21T09:20:56.0772039Z shr.s16 %rs324, %rs323, 4; 2026-02-21T09:20:56.0772111Z selp.b16 %rs325, %rs309, %rs301, %p70; 2026-02-21T09:20:56.0772172Z cvt.s16.s8 %rs326, %rs325; 2026-02-21T09:20:56.0772231Z shr.s16 %rs327, %rs326, 4; 2026-02-21T09:20:56.0772299Z selp.b16 %rs328, %rs310, %rs302, %p70; 2026-02-21T09:20:56.0772362Z cvt.s16.s8 %rs329, %rs328; 2026-02-21T09:20:56.0772473Z shr.s16 %rs330, %rs329, 4; 2026-02-21T09:20:56.0772543Z selp.b16 %rs331, %rs311, %rs303, %p70; 2026-02-21T09:20:56.0772606Z cvt.s16.s8 %rs332, %rs331; 2026-02-21T09:20:56.0772668Z shr.s16 %rs333, %rs332, 4; 2026-02-21T09:20:56.0772733Z selp.b16 %rs334, %rs312, %rs304, %p70; 2026-02-21T09:20:56.0772792Z cvt.s16.s8 %rs335, %rs334; 2026-02-21T09:20:56.0772855Z shr.s16 %rs336, %rs335, 4; 2026-02-21T09:20:56.0773059Z .loc 1 80 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:80:32 2026-02-21T09:20:56.0773124Z cvt.rn.f32.s16 %r7465, %rs315; 2026-02-21T09:20:56.0773197Z cvt.rn.f32.s16 %r7466, %rs318; 2026-02-21T09:20:56.0773267Z cvt.rn.f32.s16 %r7467, %rs321; 2026-02-21T09:20:56.0773330Z cvt.rn.f32.s16 %r7468, %rs324; 2026-02-21T09:20:56.0773400Z cvt.rn.f32.s16 %r7469, %rs327; 2026-02-21T09:20:56.0773462Z cvt.rn.f32.s16 
%r7470, %rs330; 2026-02-21T09:20:56.0773522Z cvt.rn.f32.s16 %r7471, %rs333; 2026-02-21T09:20:56.0773585Z cvt.rn.f32.s16 %r7472, %rs336; 2026-02-21T09:20:56.0773645Z bar.sync 0; 2026-02-21T09:20:56.0773708Z st.shared.b32 [%r101], %r7465; 2026-02-21T09:20:56.0773774Z st.shared.b32 [%r101+8], %r7466; 2026-02-21T09:20:56.0773841Z st.shared.b32 [%r102], %r7467; 2026-02-21T09:20:56.0773906Z st.shared.b32 [%r102+8], %r7468; 2026-02-21T09:20:56.0773968Z st.shared.b32 [%r103], %r7469; 2026-02-21T09:20:56.0774031Z st.shared.b32 [%r103+8], %r7470; 2026-02-21T09:20:56.0774098Z st.shared.b32 [%r104], %r7471; 2026-02-21T09:20:56.0774161Z st.shared.b32 [%r104+8], %r7472; 2026-02-21T09:20:56.0774216Z $L__tmp11: 2026-02-21T09:20:56.0774498Z .loc 2 291 36 // standard.py:291:36 @[ czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:87:40 ] 2026-02-21T09:20:56.0774560Z // begin inline asm 2026-02-21T09:20:56.0774639Z fence.proxy.async.shared::cta; 2026-02-21T09:20:56.0774696Z // end inline asm 2026-02-21T09:20:56.0774759Z bar.sync 0; 2026-02-21T09:20:56.0774834Z wgmma.fence.sync.aligned; 2026-02-21T09:20:56.0774893Z // begin inline asm 2026-02-21T09:20:56.0776344Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12156,%r12157,%r12158,%r12159,%r12160,%r12161,%r12162,%r12163,%r12164,%r12165,%r12166,%r12167,%r12168,%r12169,%r12170,%r12171,%r12172,%r12173,%r12174,%r12175,%r12176,%r12177,%r12178,%r12179,%r12180,%r12181,%r12182,%r12183,%r12184,%r12185,%r12186,%r12187,%r12188,%r12189,%r12190,%r12191,%r12192,%r12193,%r12194,%r12195,%r12196,%r12197,%r12198,%r12199,%r12200,%r12201,%r12202,%r12203,%r12204,%r12205,%r12206,%r12207,%r12208,%r12209,%r12210,%r12211,%r12212,%r12213,%r12214,%r12215,%r12216,%r12217,%r12218,%r12219}, {%r6756,%r6757,%r6758,%r6759}, %rd1, %p28, 1, 1; 2026-02-21T09:20:56.0776664Z // end inline asm 2026-02-21T09:20:56.0776723Z // begin inline asm 2026-02-21T09:20:56.0778252Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12156,%r12157,%r12158,%r12159,%r12160,%r12161,%r12162,%r12163,%r12164,%r12165,%r12166,%r12167,%r12168,%r12169,%r12170,%r12171,%r12172,%r12173,%r12174,%r12175,%r12176,%r12177,%r12178,%r12179,%r12180,%r12181,%r12182,%r12183,%r12184,%r12185,%r12186,%r12187,%r12188,%r12189,%r12190,%r12191,%r12192,%r12193,%r12194,%r12195,%r12196,%r12197,%r12198,%r12199,%r12200,%r12201,%r12202,%r12203,%r12204,%r12205,%r12206,%r12207,%r12208,%r12209,%r12210,%r12211,%r12212,%r12213,%r12214,%r12215,%r12216,%r12217,%r12218,%r12219}, {%r6888,%r6889,%r6890,%r6891}, %rd2, %p28, 1, 1; 2026-02-21T09:20:56.0778317Z // end inline asm 2026-02-21T09:20:56.0778378Z // begin inline asm 2026-02-21T09:20:56.0779881Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12220,%r12221,%r12222,%r12223,%r12224,%r12225,%r12226,%r12227,%r12228,%r12229,%r12230,%r12231,%r12232,%r12233,%r12234,%r12235,%r12236,%r12237,%r12238,%r12239,%r12240,%r12241,%r12242,%r12243,%r12244,%r12245,%r12246,%r12247,%r12248,%r12249,%r12250,%r12251,%r12252,%r12253,%r12254,%r12255,%r12256,%r12257,%r12258,%r12259,%r12260,%r12261,%r12262,%r12263,%r12264,%r12265,%r12266,%r12267,%r12268,%r12269,%r12270,%r12271,%r12272,%r12273,%r12274,%r12275,%r12276,%r12277,%r12278,%r12279,%r12280,%r12281,%r12282,%r12283}, {%r7020,%r7021,%r7022,%r7023}, %rd1, %p28, 1, 1; 2026-02-21T09:20:56.0779945Z // end inline asm 2026-02-21T09:20:56.0780007Z // begin inline asm 2026-02-21T09:20:56.0781459Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12220,%r12221,%r12222,%r12223,%r12224,%r12225,%r12226,%r12227,%r12228,%r12229,%r12230,%r12231,%r12232,%r12233,%r12234,%r12235,%r12236,%r12237,%r12238,%r12239,%r12240,%r12241,%r12242,%r12243,%r12244,%r12245,%r12246,%r12247,%r12248,%r12249,%r12250,%r12251,%r12252,%r12253,%r12254,%r12255,%r12256,%r12257,%r12258,%r12259,%r12260,%r12261,%r12262,%r12263,%r12264,%r12265,%r12266,%r12267,%r12268,%r12269,%r12270,%r12271,%r12272,%r12273,%r12274,%r12275,%r12276,%r12277,%r12278,%r12279,%r12280,%r12281,%r12282,%r12283}, 
{%r7152,%r7153,%r7154,%r7155}, %rd2, %p28, 1, 1; 2026-02-21T09:20:56.0781522Z // end inline asm 2026-02-21T09:20:56.0781595Z wgmma.commit_group.sync.aligned; 2026-02-21T09:20:56.0781667Z mov.b32 %r7284, %r1575; 2026-02-21T09:20:56.0781731Z mov.b32 %r7285, %r7286; 2026-02-21T09:20:56.0781790Z // begin inline asm 2026-02-21T09:20:56.0784237Z // wait for regs: %r12156,%r12157,%r12158,%r12159,%r12160,%r12161,%r12162,%r12163,%r12164,%r12165,%r12166,%r12167,%r12168,%r12169,%r12170,%r12171,%r12172,%r12173,%r12174,%r12175,%r12176,%r12177,%r12178,%r12179,%r12180,%r12181,%r12182,%r12183,%r12184,%r12185,%r12186,%r12187,%r12188,%r12189,%r12190,%r12191,%r12192,%r12193,%r12194,%r12195,%r12196,%r12197,%r12198,%r12199,%r12200,%r12201,%r12202,%r12203,%r12204,%r12205,%r12206,%r12207,%r12208,%r12209,%r12210,%r12211,%r12212,%r12213,%r12214,%r12215,%r12216,%r12217,%r12218,%r12219,%r12220,%r12221,%r12222,%r12223,%r12224,%r12225,%r12226,%r12227,%r12228,%r12229,%r12230,%r12231,%r12232,%r12233,%r12234,%r12235,%r12236,%r12237,%r12238,%r12239,%r12240,%r12241,%r12242,%r12243,%r12244,%r12245,%r12246,%r12247,%r12248,%r12249,%r12250,%r12251,%r12252,%r12253,%r12254,%r12255,%r12256,%r12257,%r12258,%r12259,%r12260,%r12261,%r12262,%r12263,%r12264,%r12265,%r12266,%r12267,%r12268,%r12269,%r12270,%r12271,%r12272,%r12273,%r12274,%r12275,%r12276,%r12277,%r12278,%r12279,%r12280,%r12281,%r12282,%r12283,%r7284,%r7285,%r7286 2026-02-21T09:20:56.0784320Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:20:56.0784377Z // end inline asm 2026-02-21T09:20:56.0784432Z $L__tmp12: 2026-02-21T09:20:56.0784642Z .loc 1 43 78 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:43:78 2026-02-21T09:20:56.0784785Z add.s32 %r7473, %r12155, 1; 2026-02-21T09:20:56.0784854Z setp.gt.s32 %p39, %r7473, 4; 2026-02-21T09:20:56.0784927Z selp.b32 %r12155, 0, %r7473, %p39; 2026-02-21T09:20:56.0785196Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0785264Z add.s32 %r7474, 
%r12152, -16; 2026-02-21T09:20:56.0785337Z add.s64 %rd376, %rd638, %rd24; 2026-02-21T09:20:56.0788425Z add.s64 %rd366, %rd376, 320; 2026-02-21T09:20:56.0788539Z add.s64 %rd377, %rd638, %rd23; 2026-02-21T09:20:56.0788617Z add.s64 %rd367, %rd377, 320; 2026-02-21T09:20:56.0788686Z add.s64 %rd378, %rd638, %rd22; 2026-02-21T09:20:56.0788752Z add.s64 %rd368, %rd378, 320; 2026-02-21T09:20:56.0788961Z mad.wide.s32 %rd369, %r7474, 2, %rd44; 2026-02-21T09:20:56.0789207Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0789278Z shl.b32 %r7475, %r12155, 13; 2026-02-21T09:20:56.0789345Z add.s32 %r7476, %r11869, %r7475; 2026-02-21T09:20:56.0789414Z add.s32 %r7418, %r7476, %r36; 2026-02-21T09:20:56.0789483Z selp.b32 %r7419, 8, 0, %p37; 2026-02-21T09:20:56.0789548Z // begin inline asm 2026-02-21T09:20:56.0789710Z cp.async.ca.shared.global [ %r7418 + 0 ], [ %rd366 + 0 ], 0x8, %r7419; 2026-02-21T09:20:56.0789768Z // end inline asm 2026-02-21T09:20:56.0789832Z add.s32 %r7420, %r7418, 2048; 2026-02-21T09:20:56.0789893Z // begin inline asm 2026-02-21T09:20:56.0790113Z cp.async.ca.shared.global [ %r7420 + 0 ], [ %rd367 + 0 ], 0x8, %r7419; 2026-02-21T09:20:56.0790181Z // end inline asm 2026-02-21T09:20:56.0790247Z add.s32 %r7422, %r7418, 4096; 2026-02-21T09:20:56.0790309Z // begin inline asm 2026-02-21T09:20:56.0790450Z cp.async.ca.shared.global [ %r7422 + 0 ], [ %rd368 + 0 ], 0x8, %r7419; 2026-02-21T09:20:56.0790507Z // end inline asm 2026-02-21T09:20:56.0790572Z add.s32 %r7424, %r7418, 6144; 2026-02-21T09:20:56.0790634Z // begin inline asm 2026-02-21T09:20:56.0790767Z cp.async.ca.shared.global [ %r7424 + 0 ], [ %rd369 + 0 ], 0x8, %r7419; 2026-02-21T09:20:56.0790823Z // end inline asm 2026-02-21T09:20:56.0790901Z cp.async.commit_group; 2026-02-21T09:20:56.0791127Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0791198Z add.s32 %r7477, %r12153, -65536; 2026-02-21T09:20:56.0791270Z 
cvt.s64.s32 %rd379, %r7477; 2026-02-21T09:20:56.0791337Z add.s64 %rd370, %rd45, %rd379; 2026-02-21T09:20:56.0791550Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0791615Z shl.b32 %r7478, %r12155, 10; 2026-02-21T09:20:56.0791676Z add.s32 %r7426, %r42, %r7478; 2026-02-21T09:20:56.0791742Z selp.b32 %r7427, 4, 0, %p37; 2026-02-21T09:20:56.0791801Z // begin inline asm 2026-02-21T09:20:56.0791943Z cp.async.ca.shared.global [ %r7426 + 0 ], [ %rd370 + 0 ], 0x4, %r7427; 2026-02-21T09:20:56.0792004Z // end inline asm 2026-02-21T09:20:56.0792070Z cp.async.commit_group; 2026-02-21T09:20:56.0792277Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0792341Z add.s64 %rd371, %rd376, 352; 2026-02-21T09:20:56.0792404Z add.s64 %rd372, %rd377, 352; 2026-02-21T09:20:56.0792466Z add.s64 %rd373, %rd378, 352; 2026-02-21T09:20:56.0792546Z mad.wide.s32 %rd374, %r12152, 2, %rd44; 2026-02-21T09:20:56.0792751Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0792816Z add.s32 %r7479, %r3368, %r7475; 2026-02-21T09:20:56.0792883Z add.s32 %r7428, %r7479, %r36; 2026-02-21T09:20:56.0792944Z // begin inline asm 2026-02-21T09:20:56.0793081Z cp.async.ca.shared.global [ %r7428 + 0 ], [ %rd371 + 0 ], 0x8, %r7419; 2026-02-21T09:20:56.0793143Z // end inline asm 2026-02-21T09:20:56.0793203Z add.s32 %r7430, %r7428, 2048; 2026-02-21T09:20:56.0793354Z // begin inline asm 2026-02-21T09:20:56.0793487Z cp.async.ca.shared.global [ %r7430 + 0 ], [ %rd372 + 0 ], 0x8, %r7419; 2026-02-21T09:20:56.0793546Z // end inline asm 2026-02-21T09:20:56.0793604Z add.s32 %r7432, %r7428, 4096; 2026-02-21T09:20:56.0793664Z // begin inline asm 2026-02-21T09:20:56.0793857Z cp.async.ca.shared.global [ %r7432 + 0 ], [ %rd373 + 0 ], 0x8, %r7419; 2026-02-21T09:20:56.0793912Z // end inline asm 2026-02-21T09:20:56.0793972Z add.s32 %r7434, %r7428, 6144; 2026-02-21T09:20:56.0794040Z // begin 
inline asm 2026-02-21T09:20:56.0794178Z cp.async.ca.shared.global [ %r7434 + 0 ], [ %rd374 + 0 ], 0x8, %r7419; 2026-02-21T09:20:56.0794235Z // end inline asm 2026-02-21T09:20:56.0794299Z cp.async.commit_group; 2026-02-21T09:20:56.0794557Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0794623Z cvt.s64.s32 %rd380, %r12153; 2026-02-21T09:20:56.0794689Z add.s64 %rd375, %rd45, %rd380; 2026-02-21T09:20:56.0794897Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0794958Z add.s32 %r7436, %r48, %r7478; 2026-02-21T09:20:56.0795017Z // begin inline asm 2026-02-21T09:20:56.0795147Z cp.async.ca.shared.global [ %r7436 + 0 ], [ %rd375 + 0 ], 0x4, %r7427; 2026-02-21T09:20:56.0795208Z // end inline asm 2026-02-21T09:20:56.0795272Z cp.async.commit_group; 2026-02-21T09:20:56.0795471Z .loc 1 43 78 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:43:78 2026-02-21T09:20:56.0795586Z add.s32 %r12153, %r12153, 131072; 2026-02-21T09:20:56.0795651Z add.s64 %rd638, %rd638, 64; 2026-02-21T09:20:56.0795711Z add.s32 %r12152, %r12152, 32; 2026-02-21T09:20:56.0795784Z setp.lt.u64 %p40, %rd639, 496; 2026-02-21T09:20:56.0795856Z @%p40 bra $L__BB0_7; 2026-02-21T09:20:56.0795971Z // %bb.8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:20:56.0796184Z .loc 1 34 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:34:32 2026-02-21T09:20:56.0796255Z or.b32 %r7727, %r747, %r9; 2026-02-21T09:20:56.0796613Z .loc 1 36 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:36:32 2026-02-21T09:20:56.0796685Z or.b32 %r7728, %r748, %r15; 2026-02-21T09:20:56.0796750Z or.b32 %r7729, %r748, %r16; 2026-02-21T09:20:56.0796809Z or.b32 %r7730, %r748, %r17; 2026-02-21T09:20:56.0796867Z or.b32 %r7731, %r748, %r18; 2026-02-21T09:20:56.0796929Z or.b32 %r7732, %r748, %r19; 2026-02-21T09:20:56.0796989Z or.b32 %r7733, %r748, %r20; 2026-02-21T09:20:56.0797047Z or.b32 %r7734, %r748, %r21; 
2026-02-21T09:20:56.0797106Z or.b32 %r7735, %r748, %r22; 2026-02-21T09:20:56.0797170Z or.b32 %r7736, %r748, %r23; 2026-02-21T09:20:56.0797230Z or.b32 %r7737, %r748, %r24; 2026-02-21T09:20:56.0797290Z or.b32 %r7738, %r748, %r25; 2026-02-21T09:20:56.0797352Z or.b32 %r7739, %r748, %r26; 2026-02-21T09:20:56.0797410Z or.b32 %r7740, %r748, %r27; 2026-02-21T09:20:56.0797469Z or.b32 %r7741, %r748, %r28; 2026-02-21T09:20:56.0797531Z or.b32 %r7742, %r748, %r29; 2026-02-21T09:20:56.0797593Z or.b32 %r7743, %r748, %r30; 2026-02-21T09:20:56.0797801Z .loc 1 43 78 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:43:78 2026-02-21T09:20:56.0797873Z cp.async.wait_group 0; 2026-02-21T09:20:56.0797932Z bar.sync 0; 2026-02-21T09:20:56.0798135Z .loc 1 90 28 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:90:28 2026-02-21T09:20:56.0798223Z cvt.rn.bf16x2.f32 %r7744, %r12157, %r12156; 2026-02-21T09:20:56.0798306Z cvt.rn.bf16x2.f32 %r7745, %r12159, %r12158; 2026-02-21T09:20:56.0798384Z cvt.rn.bf16x2.f32 %r7746, %r12161, %r12160; 2026-02-21T09:20:56.0798461Z cvt.rn.bf16x2.f32 %r7747, %r12163, %r12162; 2026-02-21T09:20:56.0798536Z cvt.rn.bf16x2.f32 %r7748, %r12165, %r12164; 2026-02-21T09:20:56.0798614Z cvt.rn.bf16x2.f32 %r7749, %r12167, %r12166; 2026-02-21T09:20:56.0798687Z cvt.rn.bf16x2.f32 %r7750, %r12169, %r12168; 2026-02-21T09:20:56.0798858Z cvt.rn.bf16x2.f32 %r7751, %r12171, %r12170; 2026-02-21T09:20:56.0798935Z cvt.rn.bf16x2.f32 %r7752, %r12173, %r12172; 2026-02-21T09:20:56.0799009Z cvt.rn.bf16x2.f32 %r7753, %r12175, %r12174; 2026-02-21T09:20:56.0799485Z cvt.rn.bf16x2.f32 %r7754, %r12177, %r12176; 2026-02-21T09:20:56.0799575Z cvt.rn.bf16x2.f32 %r7755, %r12179, %r12178; 2026-02-21T09:20:56.0799652Z cvt.rn.bf16x2.f32 %r7756, %r12181, %r12180; 2026-02-21T09:20:56.0799729Z cvt.rn.bf16x2.f32 %r7757, %r12183, %r12182; 2026-02-21T09:20:56.0799804Z cvt.rn.bf16x2.f32 %r7758, %r12185, %r12184; 2026-02-21T09:20:56.0799880Z cvt.rn.bf16x2.f32 %r7759, %r12187, %r12186; 
2026-02-21T09:20:56.0799953Z cvt.rn.bf16x2.f32 %r7760, %r12189, %r12188; 2026-02-21T09:20:56.0800098Z cvt.rn.bf16x2.f32 %r7761, %r12191, %r12190; 2026-02-21T09:20:56.0800177Z cvt.rn.bf16x2.f32 %r7762, %r12193, %r12192; 2026-02-21T09:20:56.0800252Z cvt.rn.bf16x2.f32 %r7763, %r12195, %r12194; 2026-02-21T09:20:56.0800327Z cvt.rn.bf16x2.f32 %r7764, %r12197, %r12196; 2026-02-21T09:20:56.0800411Z cvt.rn.bf16x2.f32 %r7765, %r12199, %r12198; 2026-02-21T09:20:56.0800494Z cvt.rn.bf16x2.f32 %r7766, %r12201, %r12200; 2026-02-21T09:20:56.0800570Z cvt.rn.bf16x2.f32 %r7767, %r12203, %r12202; 2026-02-21T09:20:56.0800644Z cvt.rn.bf16x2.f32 %r7768, %r12205, %r12204; 2026-02-21T09:20:56.0800721Z cvt.rn.bf16x2.f32 %r7769, %r12207, %r12206; 2026-02-21T09:20:56.0800794Z cvt.rn.bf16x2.f32 %r7770, %r12209, %r12208; 2026-02-21T09:20:56.0800868Z cvt.rn.bf16x2.f32 %r7771, %r12211, %r12210; 2026-02-21T09:20:56.0801029Z cvt.rn.bf16x2.f32 %r7772, %r12213, %r12212; 2026-02-21T09:20:56.0801119Z cvt.rn.bf16x2.f32 %r7773, %r12215, %r12214; 2026-02-21T09:20:56.0801193Z cvt.rn.bf16x2.f32 %r7774, %r12217, %r12216; 2026-02-21T09:20:56.0801267Z cvt.rn.bf16x2.f32 %r7775, %r12219, %r12218; 2026-02-21T09:20:56.0801347Z cvt.rn.bf16x2.f32 %r7776, %r12221, %r12220; 2026-02-21T09:20:56.0801419Z cvt.rn.bf16x2.f32 %r7777, %r12223, %r12222; 2026-02-21T09:20:56.0801494Z cvt.rn.bf16x2.f32 %r7778, %r12225, %r12224; 2026-02-21T09:20:56.0801571Z cvt.rn.bf16x2.f32 %r7779, %r12227, %r12226; 2026-02-21T09:20:56.0801642Z cvt.rn.bf16x2.f32 %r7780, %r12229, %r12228; 2026-02-21T09:20:56.0801718Z cvt.rn.bf16x2.f32 %r7781, %r12231, %r12230; 2026-02-21T09:20:56.0801795Z cvt.rn.bf16x2.f32 %r7782, %r12233, %r12232; 2026-02-21T09:20:56.0801869Z cvt.rn.bf16x2.f32 %r7783, %r12235, %r12234; 2026-02-21T09:20:56.0801940Z cvt.rn.bf16x2.f32 %r7784, %r12237, %r12236; 2026-02-21T09:20:56.0802026Z cvt.rn.bf16x2.f32 %r7785, %r12239, %r12238; 2026-02-21T09:20:56.0802110Z cvt.rn.bf16x2.f32 %r7786, %r12241, %r12240; 2026-02-21T09:20:56.0802183Z 
cvt.rn.bf16x2.f32 %r7787, %r12243, %r12242; 2026-02-21T09:20:56.0802259Z cvt.rn.bf16x2.f32 %r7788, %r12245, %r12244; 2026-02-21T09:20:56.0802340Z cvt.rn.bf16x2.f32 %r7789, %r12247, %r12246; 2026-02-21T09:20:56.0802416Z cvt.rn.bf16x2.f32 %r7790, %r12249, %r12248; 2026-02-21T09:20:56.0802491Z cvt.rn.bf16x2.f32 %r7791, %r12251, %r12250; 2026-02-21T09:20:56.0802570Z cvt.rn.bf16x2.f32 %r7792, %r12253, %r12252; 2026-02-21T09:20:56.0802643Z cvt.rn.bf16x2.f32 %r7793, %r12255, %r12254; 2026-02-21T09:20:56.0802715Z cvt.rn.bf16x2.f32 %r7794, %r12257, %r12256; 2026-02-21T09:20:56.0802795Z cvt.rn.bf16x2.f32 %r7795, %r12259, %r12258; 2026-02-21T09:20:56.0802874Z cvt.rn.bf16x2.f32 %r7796, %r12261, %r12260; 2026-02-21T09:20:56.0802949Z cvt.rn.bf16x2.f32 %r7797, %r12263, %r12262; 2026-02-21T09:20:56.0803023Z cvt.rn.bf16x2.f32 %r7798, %r12265, %r12264; 2026-02-21T09:20:56.0803103Z cvt.rn.bf16x2.f32 %r7799, %r12267, %r12266; 2026-02-21T09:20:56.0803176Z cvt.rn.bf16x2.f32 %r7800, %r12269, %r12268; 2026-02-21T09:20:56.0803250Z cvt.rn.bf16x2.f32 %r7801, %r12271, %r12270; 2026-02-21T09:20:56.0803333Z cvt.rn.bf16x2.f32 %r7802, %r12273, %r12272; 2026-02-21T09:20:56.0803408Z cvt.rn.bf16x2.f32 %r7803, %r12275, %r12274; 2026-02-21T09:20:56.0803480Z cvt.rn.bf16x2.f32 %r7804, %r12277, %r12276; 2026-02-21T09:20:56.0803627Z cvt.rn.bf16x2.f32 %r7805, %r12279, %r12278; 2026-02-21T09:20:56.0803704Z cvt.rn.bf16x2.f32 %r7806, %r12281, %r12280; 2026-02-21T09:20:56.0803778Z cvt.rn.bf16x2.f32 %r7807, %r12283, %r12282; 2026-02-21T09:20:56.0803992Z .loc 1 91 43 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:91:43 2026-02-21T09:20:56.0804114Z shl.b32 %r7808, %r7728, 13; 2026-02-21T09:20:56.0804178Z shl.b32 %r7809, %r7729, 13; 2026-02-21T09:20:56.0804238Z shl.b32 %r7810, %r7730, 13; 2026-02-21T09:20:56.0804303Z shl.b32 %r7811, %r7731, 13; 2026-02-21T09:20:56.0804365Z shl.b32 %r7812, %r7732, 13; 2026-02-21T09:20:56.0804424Z shl.b32 %r7813, %r7733, 13; 2026-02-21T09:20:56.0804481Z shl.b32 %r7814, 
%r7734, 13; 2026-02-21T09:20:56.0804543Z shl.b32 %r7815, %r7735, 13; 2026-02-21T09:20:56.0804657Z shl.b32 %r7816, %r7736, 13; 2026-02-21T09:20:56.0804724Z shl.b32 %r7817, %r7737, 13; 2026-02-21T09:20:56.0804788Z shl.b32 %r7818, %r7738, 13; 2026-02-21T09:20:56.0804847Z shl.b32 %r7819, %r7739, 13; 2026-02-21T09:20:56.0804904Z shl.b32 %r7820, %r7740, 13; 2026-02-21T09:20:56.0804964Z shl.b32 %r7821, %r7741, 13; 2026-02-21T09:20:56.0805025Z shl.b32 %r7822, %r7742, 13; 2026-02-21T09:20:56.0805083Z shl.b32 %r7823, %r7743, 13; 2026-02-21T09:20:56.0805296Z .loc 1 91 50 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:91:50 2026-02-21T09:20:56.0805366Z add.s32 %r7824, %r7808, %r7727; 2026-02-21T09:20:56.0805429Z add.s32 %r7825, %r7809, %r7727; 2026-02-21T09:20:56.0805491Z add.s32 %r7826, %r7810, %r7727; 2026-02-21T09:20:56.0805619Z add.s32 %r7827, %r7811, %r7727; 2026-02-21T09:20:56.0805687Z add.s32 %r7828, %r7812, %r7727; 2026-02-21T09:20:56.0805746Z add.s32 %r7829, %r7813, %r7727; 2026-02-21T09:20:56.0805806Z add.s32 %r7830, %r7814, %r7727; 2026-02-21T09:20:56.0805871Z add.s32 %r7831, %r7815, %r7727; 2026-02-21T09:20:56.0805932Z add.s32 %r7832, %r7816, %r7727; 2026-02-21T09:20:56.0805992Z add.s32 %r7833, %r7817, %r7727; 2026-02-21T09:20:56.0806061Z add.s32 %r7834, %r7818, %r7727; 2026-02-21T09:20:56.0806121Z add.s32 %r7835, %r7819, %r7727; 2026-02-21T09:20:56.0806182Z add.s32 %r7836, %r7820, %r7727; 2026-02-21T09:20:56.0806244Z add.s32 %r7837, %r7821, %r7727; 2026-02-21T09:20:56.0806309Z add.s32 %r7838, %r7822, %r7727; 2026-02-21T09:20:56.0806368Z add.s32 %r7839, %r7823, %r7727; 2026-02-21T09:20:56.0806715Z .loc 1 91 22 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:91:22 2026-02-21T09:20:56.0806796Z mad.wide.s32 %rd381, %r7824, 2, %rd46; 2026-02-21T09:20:56.0806867Z mad.wide.s32 %rd382, %r7825, 2, %rd46; 2026-02-21T09:20:56.0806936Z mad.wide.s32 %rd383, %r7826, 2, %rd46; 2026-02-21T09:20:56.0807006Z mad.wide.s32 %rd384, %r7827, 2, %rd46; 
2026-02-21T09:20:56.0807073Z mad.wide.s32 %rd385, %r7828, 2, %rd46; 2026-02-21T09:20:56.0807138Z mad.wide.s32 %rd386, %r7829, 2, %rd46; 2026-02-21T09:20:56.0807204Z mad.wide.s32 %rd387, %r7830, 2, %rd46; 2026-02-21T09:20:56.0807277Z mad.wide.s32 %rd388, %r7831, 2, %rd46; 2026-02-21T09:20:56.0807343Z mad.wide.s32 %rd389, %r7832, 2, %rd46; 2026-02-21T09:20:56.0807407Z mad.wide.s32 %rd390, %r7833, 2, %rd46; 2026-02-21T09:20:56.0807475Z mad.wide.s32 %rd391, %r7834, 2, %rd46; 2026-02-21T09:20:56.0807543Z mad.wide.s32 %rd392, %r7835, 2, %rd46; 2026-02-21T09:20:56.0807606Z mad.wide.s32 %rd393, %r7836, 2, %rd46; 2026-02-21T09:20:56.0807672Z mad.wide.s32 %rd394, %r7837, 2, %rd46; 2026-02-21T09:20:56.0807744Z mad.wide.s32 %rd395, %r7838, 2, %rd46; 2026-02-21T09:20:56.0807813Z mad.wide.s32 %rd396, %r7839, 2, %rd46; 2026-02-21T09:20:56.0808020Z .loc 1 91 81 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:91:81 2026-02-21T09:20:56.0808145Z st.shared.v4.b32 [%r105], {%r7744, %r7746, %r7748, %r7750}; 2026-02-21T09:20:56.0808265Z st.shared.v4.b32 [%r105+512], {%r7745, %r7747, %r7749, %r7751}; 2026-02-21T09:20:56.0808372Z st.shared.v4.b32 [%r106], {%r7752, %r7754, %r7756, %r7758}; 2026-02-21T09:20:56.0808583Z st.shared.v4.b32 [%r106+512], {%r7753, %r7755, %r7757, %r7759}; 2026-02-21T09:20:56.0808689Z st.shared.v4.b32 [%r107], {%r7760, %r7762, %r7764, %r7766}; 2026-02-21T09:20:56.0808798Z st.shared.v4.b32 [%r107+512], {%r7761, %r7763, %r7765, %r7767}; 2026-02-21T09:20:56.0808975Z st.shared.v4.b32 [%r108], {%r7768, %r7770, %r7772, %r7774}; 2026-02-21T09:20:56.0809084Z st.shared.v4.b32 [%r108+512], {%r7769, %r7771, %r7773, %r7775}; 2026-02-21T09:20:56.0809142Z bar.sync 0; 2026-02-21T09:20:56.0809203Z // begin inline asm 2026-02-21T09:20:56.0809397Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7560, %r7561, %r7562, %r7563}, [%r3394]; 2026-02-21T09:20:56.0809456Z // end inline asm 2026-02-21T09:20:56.0809514Z // begin inline asm 2026-02-21T09:20:56.0809766Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7564, %r7565, %r7566, %r7567}, [%r3399]; 2026-02-21T09:20:56.0809829Z // end inline asm 2026-02-21T09:20:56.0809889Z // begin inline asm 2026-02-21T09:20:56.0810071Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7568, %r7569, %r7570, %r7571}, [%r3404]; 2026-02-21T09:20:56.0810135Z // end inline asm 2026-02-21T09:20:56.0810194Z // begin inline asm 2026-02-21T09:20:56.0810372Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7572, %r7573, %r7574, %r7575}, [%r3409]; 2026-02-21T09:20:56.0810432Z // end inline asm 2026-02-21T09:20:56.0810492Z // begin inline asm 2026-02-21T09:20:56.0810671Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7576, %r7577, %r7578, %r7579}, [%r3414]; 2026-02-21T09:20:56.0810730Z // end inline asm 2026-02-21T09:20:56.0810787Z // begin inline asm 2026-02-21T09:20:56.0811035Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7580, %r7581, %r7582, %r7583}, [%r3419]; 2026-02-21T09:20:56.0811095Z // end inline asm 2026-02-21T09:20:56.0811171Z // begin inline asm 2026-02-21T09:20:56.0811354Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7584, %r7585, %r7586, %r7587}, [%r3424]; 2026-02-21T09:20:56.0811410Z // end inline asm 2026-02-21T09:20:56.0811473Z // begin inline asm 2026-02-21T09:20:56.0811654Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7588, %r7589, %r7590, %r7591}, [%r3429]; 2026-02-21T09:20:56.0811714Z // end inline asm 2026-02-21T09:20:56.0811771Z bar.sync 0; 2026-02-21T09:20:56.0811877Z st.shared.v4.b32 [%r105], {%r7776, %r7778, %r7780, %r7782}; 2026-02-21T09:20:56.0811988Z st.shared.v4.b32 [%r105+512], {%r7777, %r7779, %r7781, %r7783}; 2026-02-21T09:20:56.0812096Z st.shared.v4.b32 [%r106], {%r7784, %r7786, %r7788, %r7790}; 2026-02-21T09:20:56.0812208Z st.shared.v4.b32 [%r106+512], {%r7785, %r7787, %r7789, %r7791}; 2026-02-21T09:20:56.0812310Z st.shared.v4.b32 [%r107], {%r7792, %r7794, %r7796, %r7798}; 2026-02-21T09:20:56.0812420Z st.shared.v4.b32 [%r107+512], {%r7793, %r7795, %r7797, %r7799}; 
2026-02-21T09:20:56.0812526Z st.shared.v4.b32 [%r108], {%r7800, %r7802, %r7804, %r7806}; 2026-02-21T09:20:56.0812633Z st.shared.v4.b32 [%r108+512], {%r7801, %r7803, %r7805, %r7807}; 2026-02-21T09:20:56.0812687Z bar.sync 0; 2026-02-21T09:20:56.0812750Z // begin inline asm 2026-02-21T09:20:56.0812930Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7592, %r7593, %r7594, %r7595}, [%r3394]; 2026-02-21T09:20:56.0812988Z // end inline asm 2026-02-21T09:20:56.0813049Z // begin inline asm 2026-02-21T09:20:56.0813230Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7596, %r7597, %r7598, %r7599}, [%r3399]; 2026-02-21T09:20:56.0813286Z // end inline asm 2026-02-21T09:20:56.0813343Z // begin inline asm 2026-02-21T09:20:56.0813521Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7600, %r7601, %r7602, %r7603}, [%r3404]; 2026-02-21T09:20:56.0813580Z // end inline asm 2026-02-21T09:20:56.0813639Z // begin inline asm 2026-02-21T09:20:56.0813819Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7604, %r7605, %r7606, %r7607}, [%r3409]; 2026-02-21T09:20:56.0813875Z // end inline asm 2026-02-21T09:20:56.0813934Z // begin inline asm 2026-02-21T09:20:56.0814122Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7608, %r7609, %r7610, %r7611}, [%r3414]; 2026-02-21T09:20:56.0814183Z // end inline asm 2026-02-21T09:20:56.0814309Z // begin inline asm 2026-02-21T09:20:56.0814487Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7612, %r7613, %r7614, %r7615}, [%r3419]; 2026-02-21T09:20:56.0814547Z // end inline asm 2026-02-21T09:20:56.0814605Z // begin inline asm 2026-02-21T09:20:56.0814829Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7616, %r7617, %r7618, %r7619}, [%r3424]; 2026-02-21T09:20:56.0814886Z // end inline asm 2026-02-21T09:20:56.0814946Z // begin inline asm 2026-02-21T09:20:56.0815119Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7620, %r7621, %r7622, %r7623}, [%r3429]; 2026-02-21T09:20:56.0815177Z // end inline asm 2026-02-21T09:20:56.0815237Z // begin inline asm 2026-02-21T09:20:56.0815361Z st.global.v4.b32 [ 
%rd381 + 0 ], { %r7560, %r7561, %r7562, %r7563 }; 2026-02-21T09:20:56.0815474Z // end inline asm 2026-02-21T09:20:56.0815540Z // begin inline asm 2026-02-21T09:20:56.0815659Z st.global.v4.b32 [ %rd382 + 0 ], { %r7564, %r7565, %r7566, %r7567 }; 2026-02-21T09:20:56.0815716Z // end inline asm 2026-02-21T09:20:56.0815773Z // begin inline asm 2026-02-21T09:20:56.0815891Z st.global.v4.b32 [ %rd383 + 0 ], { %r7568, %r7569, %r7570, %r7571 }; 2026-02-21T09:20:56.0815956Z // end inline asm 2026-02-21T09:20:56.0816015Z // begin inline asm 2026-02-21T09:20:56.0816135Z st.global.v4.b32 [ %rd384 + 0 ], { %r7572, %r7573, %r7574, %r7575 }; 2026-02-21T09:20:56.0816189Z // end inline asm 2026-02-21T09:20:56.0816251Z // begin inline asm 2026-02-21T09:20:56.0816363Z st.global.v4.b32 [ %rd385 + 0 ], { %r7576, %r7577, %r7578, %r7579 }; 2026-02-21T09:20:56.0816596Z // end inline asm 2026-02-21T09:20:56.0816667Z // begin inline asm 2026-02-21T09:20:56.0816785Z st.global.v4.b32 [ %rd386 + 0 ], { %r7580, %r7581, %r7582, %r7583 }; 2026-02-21T09:20:56.0816840Z // end inline asm 2026-02-21T09:20:56.0816899Z // begin inline asm 2026-02-21T09:20:56.0817015Z st.global.v4.b32 [ %rd387 + 0 ], { %r7584, %r7585, %r7586, %r7587 }; 2026-02-21T09:20:56.0817072Z // end inline asm 2026-02-21T09:20:56.0817130Z // begin inline asm 2026-02-21T09:20:56.0817246Z st.global.v4.b32 [ %rd388 + 0 ], { %r7588, %r7589, %r7590, %r7591 }; 2026-02-21T09:20:56.0817304Z // end inline asm 2026-02-21T09:20:56.0817364Z // begin inline asm 2026-02-21T09:20:56.0817474Z st.global.v4.b32 [ %rd389 + 0 ], { %r7592, %r7593, %r7594, %r7595 }; 2026-02-21T09:20:56.0817535Z // end inline asm 2026-02-21T09:20:56.0817591Z // begin inline asm 2026-02-21T09:20:56.0817704Z st.global.v4.b32 [ %rd390 + 0 ], { %r7596, %r7597, %r7598, %r7599 }; 2026-02-21T09:20:56.0817763Z // end inline asm 2026-02-21T09:20:56.0817822Z // begin inline asm 2026-02-21T09:20:56.0817934Z st.global.v4.b32 [ %rd391 + 0 ], { %r7600, %r7601, %r7602, %r7603 }; 
2026-02-21T09:20:56.0817989Z // end inline asm 2026-02-21T09:20:56.0818049Z // begin inline asm 2026-02-21T09:20:56.0818161Z st.global.v4.b32 [ %rd392 + 0 ], { %r7604, %r7605, %r7606, %r7607 }; 2026-02-21T09:20:56.0818215Z // end inline asm 2026-02-21T09:20:56.0818274Z // begin inline asm 2026-02-21T09:20:56.0818387Z st.global.v4.b32 [ %rd393 + 0 ], { %r7608, %r7609, %r7610, %r7611 }; 2026-02-21T09:20:56.0818443Z // end inline asm 2026-02-21T09:20:56.0818500Z // begin inline asm 2026-02-21T09:20:56.0818615Z st.global.v4.b32 [ %rd394 + 0 ], { %r7612, %r7613, %r7614, %r7615 }; 2026-02-21T09:20:56.0818671Z // end inline asm 2026-02-21T09:20:56.0818728Z // begin inline asm 2026-02-21T09:20:56.0818844Z st.global.v4.b32 [ %rd395 + 0 ], { %r7616, %r7617, %r7618, %r7619 }; 2026-02-21T09:20:56.0818898Z // end inline asm 2026-02-21T09:20:56.0818955Z // begin inline asm 2026-02-21T09:20:56.0819071Z st.global.v4.b32 [ %rd396 + 0 ], { %r7620, %r7621, %r7622, %r7623 }; 2026-02-21T09:20:56.0819126Z // end inline asm 2026-02-21T09:20:56.0819349Z .loc 1 22 121 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:22:121 2026-02-21T09:20:56.0819414Z add.s32 %r7840, %r11887, 3; 2026-02-21T09:20:56.0819626Z .loc 1 28 35 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:28:35 2026-02-21T09:20:56.0819783Z shr.s32 %r7841, %r7840, 31; 2026-02-21T09:20:56.0819844Z shr.u32 %r7842, %r7841, 23; 2026-02-21T09:20:56.0819911Z add.s32 %r7843, %r7840, %r7842; 2026-02-21T09:20:56.0819971Z shr.s32 %r7844, %r7843, 9; 2026-02-21T09:20:56.0820239Z .loc 1 29 33 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:29:33 2026-02-21T09:20:56.0820300Z shl.b32 %r7845, %r7844, 3; 2026-02-21T09:20:56.0820499Z .loc 1 30 39 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:30:39 2026-02-21T09:20:56.0820560Z sub.s32 %r7846, 64, %r7845; 2026-02-21T09:20:56.0820758Z .loc 1 30 52 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:30:52 2026-02-21T09:20:56.0820906Z min.s32 %r7847, 
%r7846, 8; 2026-02-21T09:20:56.0821112Z .loc 1 31 45 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:31:45 2026-02-21T09:20:56.0821177Z and.b32 %r7848, %r7843, -512; 2026-02-21T09:20:56.0821244Z sub.s32 %r7849, %r7840, %r7848; 2026-02-21T09:20:56.0821441Z .loc 1 32 51 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:32:51 2026-02-21T09:20:56.0821503Z div.s32 %r7850, %r7849, %r7847; 2026-02-21T09:20:56.0821707Z .loc 1 31 64 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:31:64 2026-02-21T09:20:56.0821771Z mul.lo.s32 %r7851, %r7850, %r7847; 2026-02-21T09:20:56.0821833Z sub.s32 %r7852, %r7849, %r7851; 2026-02-21T09:20:56.0822106Z .loc 1 31 30 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:31:30 2026-02-21T09:20:56.0822174Z add.s32 %r7853, %r7852, %r7845; 2026-02-21T09:20:56.0822375Z .loc 1 33 27 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:33:27 2026-02-21T09:20:56.0822437Z shl.b32 %r1015, %r7853, 7; 2026-02-21T09:20:56.0822641Z .loc 1 34 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:34:32 2026-02-21T09:20:56.0822707Z or.b32 %r7854, %r1015, %r7; 2026-02-21T09:20:56.0822905Z .loc 1 35 27 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:35:27 2026-02-21T09:20:56.0822968Z shl.b32 %r1016, %r7850, 8; 2026-02-21T09:20:56.0823168Z .loc 1 36 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:36:32 2026-02-21T09:20:56.0823231Z or.b32 %r7855, %r1016, %r11; 2026-02-21T09:20:56.0823293Z or.b32 %r7856, %r1016, %r12; 2026-02-21T09:20:56.0823351Z or.b32 %r7857, %r1016, %r13; 2026-02-21T09:20:56.0823411Z or.b32 %r7858, %r1016, %r14; 2026-02-21T09:20:56.0823608Z .loc 1 51 53 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:53 2026-02-21T09:20:56.0823674Z shl.b32 %r7859, %r7855, 10; 2026-02-21T09:20:56.0823732Z shl.b32 %r7860, %r7856, 10; 2026-02-21T09:20:56.0823791Z shl.b32 %r7861, %r7857, 10; 2026-02-21T09:20:56.0823851Z shl.b32 %r7862, %r7858, 10; 2026-02-21T09:20:56.0824052Z 
.loc 1 51 60 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:60 2026-02-21T09:20:56.0824111Z or.b32 %r7863, %r7859, %r34; 2026-02-21T09:20:56.0824171Z or.b32 %r7864, %r7860, %r34; 2026-02-21T09:20:56.0824229Z or.b32 %r7865, %r7861, %r34; 2026-02-21T09:20:56.0824289Z or.b32 %r7866, %r7862, %r34; 2026-02-21T09:20:56.0824488Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0824562Z mad.wide.s32 %rd397, %r7863, 2, %rd44; 2026-02-21T09:20:56.0824633Z mad.wide.s32 %rd398, %r7864, 2, %rd44; 2026-02-21T09:20:56.0824700Z mad.wide.s32 %rd399, %r7865, 2, %rd44; 2026-02-21T09:20:56.0824779Z mad.wide.s32 %rd400, %r7866, 2, %rd44; 2026-02-21T09:20:56.0824985Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0825040Z bar.sync 0; 2026-02-21T09:20:56.0825102Z mov.b32 %r7625, 8; 2026-02-21T09:20:56.0825231Z // begin inline asm 2026-02-21T09:20:56.0825372Z cp.async.ca.shared.global [ %r37 + 0 ], [ %rd397 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0825430Z // end inline asm 2026-02-21T09:20:56.0825491Z // begin inline asm 2026-02-21T09:20:56.0825621Z cp.async.ca.shared.global [ %r38 + 0 ], [ %rd398 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0825733Z // end inline asm 2026-02-21T09:20:56.0825790Z // begin inline asm 2026-02-21T09:20:56.0825917Z cp.async.ca.shared.global [ %r39 + 0 ], [ %rd399 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0825973Z // end inline asm 2026-02-21T09:20:56.0826030Z // begin inline asm 2026-02-21T09:20:56.0826156Z cp.async.ca.shared.global [ %r40 + 0 ], [ %rd400 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0826211Z // end inline asm 2026-02-21T09:20:56.0826281Z cp.async.commit_group; 2026-02-21T09:20:56.0826697Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0826771Z add.s32 %r7867, %r7854, %r11870; 2026-02-21T09:20:56.0826981Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 
2026-02-21T09:20:56.0827045Z cvt.s64.s32 %rd448, %r7867; 2026-02-21T09:20:56.0827111Z add.s64 %rd401, %rd45, %rd448; 2026-02-21T09:20:56.0827172Z mov.b32 %r12287, 4; 2026-02-21T09:20:56.0827372Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0827442Z // begin inline asm 2026-02-21T09:20:56.0827583Z cp.async.ca.shared.global [ %r42 + 0 ], [ %rd401 + 0 ], 0x4, %r12287; 2026-02-21T09:20:56.0827709Z // end inline asm 2026-02-21T09:20:56.0827777Z cp.async.commit_group; 2026-02-21T09:20:56.0827977Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0828044Z cvt.s64.s32 %rd449, %r7859; 2026-02-21T09:20:56.0828108Z or.b64 %rd450, %rd449, %rd633; 2026-02-21T09:20:56.0828170Z shl.b64 %rd451, %rd450, 1; 2026-02-21T09:20:56.0828235Z add.s64 %rd452, %rd44, %rd451; 2026-02-21T09:20:56.0828352Z add.s64 %rd402, %rd452, 32; 2026-02-21T09:20:56.0828425Z cvt.s64.s32 %rd453, %r7860; 2026-02-21T09:20:56.0828487Z or.b64 %rd454, %rd453, %rd633; 2026-02-21T09:20:56.0828551Z shl.b64 %rd455, %rd454, 1; 2026-02-21T09:20:56.0828615Z add.s64 %rd456, %rd44, %rd455; 2026-02-21T09:20:56.0828674Z add.s64 %rd403, %rd456, 32; 2026-02-21T09:20:56.0828735Z cvt.s64.s32 %rd457, %r7861; 2026-02-21T09:20:56.0828794Z or.b64 %rd458, %rd457, %rd633; 2026-02-21T09:20:56.0828855Z shl.b64 %rd459, %rd458, 1; 2026-02-21T09:20:56.0828916Z add.s64 %rd460, %rd44, %rd459; 2026-02-21T09:20:56.0828979Z add.s64 %rd404, %rd460, 32; 2026-02-21T09:20:56.0829036Z cvt.s64.s32 %rd461, %r7862; 2026-02-21T09:20:56.0829095Z or.b64 %rd462, %rd461, %rd633; 2026-02-21T09:20:56.0829157Z shl.b64 %rd463, %rd462, 1; 2026-02-21T09:20:56.0829219Z add.s64 %rd464, %rd44, %rd463; 2026-02-21T09:20:56.0829279Z add.s64 %rd405, %rd464, 32; 2026-02-21T09:20:56.0829480Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0829544Z // begin inline asm 2026-02-21T09:20:56.0829677Z cp.async.ca.shared.global [ 
%r43 + 0 ], [ %rd402 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0829734Z // end inline asm 2026-02-21T09:20:56.0829795Z // begin inline asm 2026-02-21T09:20:56.0829924Z cp.async.ca.shared.global [ %r44 + 0 ], [ %rd403 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0829979Z // end inline asm 2026-02-21T09:20:56.0830038Z // begin inline asm 2026-02-21T09:20:56.0830168Z cp.async.ca.shared.global [ %r45 + 0 ], [ %rd404 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0830224Z // end inline asm 2026-02-21T09:20:56.0830280Z // begin inline asm 2026-02-21T09:20:56.0830410Z cp.async.ca.shared.global [ %r46 + 0 ], [ %rd405 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0830465Z // end inline asm 2026-02-21T09:20:56.0830540Z cp.async.commit_group; 2026-02-21T09:20:56.0830749Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0830894Z add.s32 %r7868, %r7854, %r47; 2026-02-21T09:20:56.0831102Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0831165Z cvt.s64.s32 %rd465, %r7868; 2026-02-21T09:20:56.0831300Z add.s64 %rd406, %rd45, %rd465; 2026-02-21T09:20:56.0831500Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0831559Z // begin inline asm 2026-02-21T09:20:56.0831698Z cp.async.ca.shared.global [ %r48 + 0 ], [ %rd406 + 0 ], 0x4, %r12287; 2026-02-21T09:20:56.0831755Z // end inline asm 2026-02-21T09:20:56.0831819Z cp.async.commit_group; 2026-02-21T09:20:56.0832068Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0832133Z add.s64 %rd407, %rd452, 64; 2026-02-21T09:20:56.0832192Z add.s64 %rd408, %rd456, 64; 2026-02-21T09:20:56.0832253Z add.s64 %rd409, %rd460, 64; 2026-02-21T09:20:56.0832314Z add.s64 %rd410, %rd464, 64; 2026-02-21T09:20:56.0832515Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0832571Z bar.sync 0; 2026-02-21T09:20:56.0832633Z // begin inline asm 
2026-02-21T09:20:56.0832762Z cp.async.ca.shared.global [ %r49 + 0 ], [ %rd407 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0832817Z // end inline asm 2026-02-21T09:20:56.0832876Z // begin inline asm 2026-02-21T09:20:56.0833002Z cp.async.ca.shared.global [ %r50 + 0 ], [ %rd408 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0833105Z // end inline asm 2026-02-21T09:20:56.0833164Z // begin inline asm 2026-02-21T09:20:56.0833293Z cp.async.ca.shared.global [ %r51 + 0 ], [ %rd409 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0833350Z // end inline asm 2026-02-21T09:20:56.0833407Z // begin inline asm 2026-02-21T09:20:56.0833535Z cp.async.ca.shared.global [ %r52 + 0 ], [ %rd410 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0833591Z // end inline asm 2026-02-21T09:20:56.0833654Z cp.async.commit_group; 2026-02-21T09:20:56.0833862Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0833934Z add.s32 %r7869, %r7854, %r53; 2026-02-21T09:20:56.0834137Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0834204Z cvt.s64.s32 %rd466, %r7869; 2026-02-21T09:20:56.0834270Z add.s64 %rd411, %rd45, %rd466; 2026-02-21T09:20:56.0834471Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0834530Z // begin inline asm 2026-02-21T09:20:56.0834665Z cp.async.ca.shared.global [ %r54 + 0 ], [ %rd411 + 0 ], 0x4, %r12287; 2026-02-21T09:20:56.0834724Z // end inline asm 2026-02-21T09:20:56.0834789Z cp.async.commit_group; 2026-02-21T09:20:56.0834994Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0835067Z add.s64 %rd412, %rd452, 96; 2026-02-21T09:20:56.0835129Z add.s64 %rd413, %rd456, 96; 2026-02-21T09:20:56.0835187Z add.s64 %rd414, %rd460, 96; 2026-02-21T09:20:56.0835249Z add.s64 %rd415, %rd464, 96; 2026-02-21T09:20:56.0835447Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0835506Z // 
begin inline asm 2026-02-21T09:20:56.0835636Z cp.async.ca.shared.global [ %r55 + 0 ], [ %rd412 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0835691Z // end inline asm 2026-02-21T09:20:56.0835749Z // begin inline asm 2026-02-21T09:20:56.0835880Z cp.async.ca.shared.global [ %r56 + 0 ], [ %rd413 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0835937Z // end inline asm 2026-02-21T09:20:56.0835993Z // begin inline asm 2026-02-21T09:20:56.0836121Z cp.async.ca.shared.global [ %r57 + 0 ], [ %rd414 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0836179Z // end inline asm 2026-02-21T09:20:56.0836237Z // begin inline asm 2026-02-21T09:20:56.0836432Z cp.async.ca.shared.global [ %r58 + 0 ], [ %rd415 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0836611Z // end inline asm 2026-02-21T09:20:56.0836679Z cp.async.commit_group; 2026-02-21T09:20:56.0836881Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0837026Z add.s32 %r7870, %r7854, %r59; 2026-02-21T09:20:56.0837231Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0837294Z cvt.s64.s32 %rd467, %r7870; 2026-02-21T09:20:56.0837356Z add.s64 %rd416, %rd45, %rd467; 2026-02-21T09:20:56.0837560Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0837619Z // begin inline asm 2026-02-21T09:20:56.0837824Z cp.async.ca.shared.global [ %r60 + 0 ], [ %rd416 + 0 ], 0x4, %r12287; 2026-02-21T09:20:56.0837888Z // end inline asm 2026-02-21T09:20:56.0837951Z cp.async.commit_group; 2026-02-21T09:20:56.0838161Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0838228Z add.s64 %rd417, %rd452, 128; 2026-02-21T09:20:56.0838290Z add.s64 %rd418, %rd456, 128; 2026-02-21T09:20:56.0838351Z add.s64 %rd419, %rd460, 128; 2026-02-21T09:20:56.0838409Z add.s64 %rd420, %rd464, 128; 2026-02-21T09:20:56.0838613Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 
2026-02-21T09:20:56.0838668Z bar.sync 0; 2026-02-21T09:20:56.0838726Z // begin inline asm 2026-02-21T09:20:56.0838928Z cp.async.ca.shared.global [ %r61 + 0 ], [ %rd417 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0838989Z // end inline asm 2026-02-21T09:20:56.0839046Z // begin inline asm 2026-02-21T09:20:56.0839187Z cp.async.ca.shared.global [ %r62 + 0 ], [ %rd418 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0839247Z // end inline asm 2026-02-21T09:20:56.0839304Z // begin inline asm 2026-02-21T09:20:56.0839432Z cp.async.ca.shared.global [ %r63 + 0 ], [ %rd419 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0839492Z // end inline asm 2026-02-21T09:20:56.0839549Z // begin inline asm 2026-02-21T09:20:56.0839674Z cp.async.ca.shared.global [ %r64 + 0 ], [ %rd420 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0839733Z // end inline asm 2026-02-21T09:20:56.0839797Z cp.async.commit_group; 2026-02-21T09:20:56.0840000Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0840061Z add.s32 %r7871, %r7854, %r65; 2026-02-21T09:20:56.0840264Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0840326Z cvt.s64.s32 %rd468, %r7871; 2026-02-21T09:20:56.0840389Z add.s64 %rd421, %rd45, %rd468; 2026-02-21T09:20:56.0840590Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0840649Z // begin inline asm 2026-02-21T09:20:56.0840782Z cp.async.ca.shared.global [ %r66 + 0 ], [ %rd421 + 0 ], 0x4, %r12287; 2026-02-21T09:20:56.0840841Z // end inline asm 2026-02-21T09:20:56.0840905Z cp.async.commit_group; 2026-02-21T09:20:56.0841104Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0841168Z add.s64 %rd422, %rd452, 160; 2026-02-21T09:20:56.0841233Z add.s64 %rd423, %rd456, 160; 2026-02-21T09:20:56.0841291Z add.s64 %rd424, %rd460, 160; 2026-02-21T09:20:56.0841350Z add.s64 %rd425, %rd464, 160; 2026-02-21T09:20:56.0841552Z .loc 1 51 80 // 
czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0841611Z // begin inline asm 2026-02-21T09:20:56.0841742Z cp.async.ca.shared.global [ %r67 + 0 ], [ %rd422 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0841801Z // end inline asm 2026-02-21T09:20:56.0841858Z // begin inline asm 2026-02-21T09:20:56.0841986Z cp.async.ca.shared.global [ %r68 + 0 ], [ %rd423 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0842130Z // end inline asm 2026-02-21T09:20:56.0842191Z // begin inline asm 2026-02-21T09:20:56.0842320Z cp.async.ca.shared.global [ %r69 + 0 ], [ %rd424 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0842376Z // end inline asm 2026-02-21T09:20:56.0842434Z // begin inline asm 2026-02-21T09:20:56.0842626Z cp.async.ca.shared.global [ %r70 + 0 ], [ %rd425 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0842682Z // end inline asm 2026-02-21T09:20:56.0842747Z cp.async.commit_group; 2026-02-21T09:20:56.0842951Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0843012Z add.s32 %r7872, %r7854, %r71; 2026-02-21T09:20:56.0843210Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0843332Z cvt.s64.s32 %rd469, %r7872; 2026-02-21T09:20:56.0843398Z add.s64 %rd426, %rd45, %rd469; 2026-02-21T09:20:56.0843598Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0843661Z // begin inline asm 2026-02-21T09:20:56.0843794Z cp.async.ca.shared.global [ %r72 + 0 ], [ %rd426 + 0 ], 0x4, %r12287; 2026-02-21T09:20:56.0843849Z // end inline asm 2026-02-21T09:20:56.0843913Z cp.async.commit_group; 2026-02-21T09:20:56.0844115Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0844176Z add.s64 %rd427, %rd452, 192; 2026-02-21T09:20:56.0844235Z add.s64 %rd428, %rd456, 192; 2026-02-21T09:20:56.0844299Z add.s64 %rd429, %rd460, 192; 2026-02-21T09:20:56.0844416Z add.s64 %rd430, %rd464, 192; 
2026-02-21T09:20:56.0844620Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0844679Z bar.sync 0; 2026-02-21T09:20:56.0844738Z // begin inline asm 2026-02-21T09:20:56.0844867Z cp.async.ca.shared.global [ %r73 + 0 ], [ %rd427 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0844922Z // end inline asm 2026-02-21T09:20:56.0844984Z // begin inline asm 2026-02-21T09:20:56.0845110Z cp.async.ca.shared.global [ %r74 + 0 ], [ %rd428 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0845165Z // end inline asm 2026-02-21T09:20:56.0845224Z // begin inline asm 2026-02-21T09:20:56.0845353Z cp.async.ca.shared.global [ %r75 + 0 ], [ %rd429 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0845407Z // end inline asm 2026-02-21T09:20:56.0845464Z // begin inline asm 2026-02-21T09:20:56.0845595Z cp.async.ca.shared.global [ %r76 + 0 ], [ %rd430 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0845650Z // end inline asm 2026-02-21T09:20:56.0845716Z cp.async.commit_group; 2026-02-21T09:20:56.0845918Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0845982Z add.s32 %r7873, %r7854, %r77; 2026-02-21T09:20:56.0846182Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0846244Z cvt.s64.s32 %rd470, %r7873; 2026-02-21T09:20:56.0846307Z add.s64 %rd431, %rd45, %rd470; 2026-02-21T09:20:56.0846629Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0846692Z // begin inline asm 2026-02-21T09:20:56.0846826Z cp.async.ca.shared.global [ %r78 + 0 ], [ %rd431 + 0 ], 0x4, %r12287; 2026-02-21T09:20:56.0846881Z // end inline asm 2026-02-21T09:20:56.0846946Z cp.async.commit_group; 2026-02-21T09:20:56.0847145Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0847208Z add.s64 %rd432, %rd452, 224; 2026-02-21T09:20:56.0847267Z add.s64 %rd433, %rd456, 224; 2026-02-21T09:20:56.0847328Z add.s64 %rd434, %rd460, 
224; 2026-02-21T09:20:56.0847390Z add.s64 %rd435, %rd464, 224; 2026-02-21T09:20:56.0847589Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0847647Z // begin inline asm 2026-02-21T09:20:56.0847878Z cp.async.ca.shared.global [ %r79 + 0 ], [ %rd432 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0847935Z // end inline asm 2026-02-21T09:20:56.0847992Z // begin inline asm 2026-02-21T09:20:56.0848122Z cp.async.ca.shared.global [ %r80 + 0 ], [ %rd433 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0848240Z // end inline asm 2026-02-21T09:20:56.0848297Z // begin inline asm 2026-02-21T09:20:56.0848425Z cp.async.ca.shared.global [ %r81 + 0 ], [ %rd434 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0848481Z // end inline asm 2026-02-21T09:20:56.0848537Z // begin inline asm 2026-02-21T09:20:56.0848664Z cp.async.ca.shared.global [ %r82 + 0 ], [ %rd435 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0848721Z // end inline asm 2026-02-21T09:20:56.0848783Z cp.async.commit_group; 2026-02-21T09:20:56.0849046Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0849111Z add.s32 %r7874, %r7854, %r83; 2026-02-21T09:20:56.0849312Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0849375Z cvt.s64.s32 %rd471, %r7874; 2026-02-21T09:20:56.0849442Z add.s64 %rd436, %rd45, %rd471; 2026-02-21T09:20:56.0849651Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0849712Z // begin inline asm 2026-02-21T09:20:56.0849843Z cp.async.ca.shared.global [ %r84 + 0 ], [ %rd436 + 0 ], 0x4, %r12287; 2026-02-21T09:20:56.0849900Z // end inline asm 2026-02-21T09:20:56.0849964Z cp.async.commit_group; 2026-02-21T09:20:56.0850222Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0850288Z add.s64 %rd437, %rd452, 256; 2026-02-21T09:20:56.0850350Z add.s64 %rd438, %rd456, 256; 2026-02-21T09:20:56.0850409Z 
add.s64 %rd439, %rd460, 256; 2026-02-21T09:20:56.0850467Z add.s64 %rd440, %rd464, 256; 2026-02-21T09:20:56.0850670Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0850726Z bar.sync 0; 2026-02-21T09:20:56.0850784Z // begin inline asm 2026-02-21T09:20:56.0850921Z cp.async.ca.shared.global [ %r85 + 0 ], [ %rd437 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0850977Z // end inline asm 2026-02-21T09:20:56.0851034Z // begin inline asm 2026-02-21T09:20:56.0851165Z cp.async.ca.shared.global [ %r86 + 0 ], [ %rd438 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0851232Z // end inline asm 2026-02-21T09:20:56.0851291Z // begin inline asm 2026-02-21T09:20:56.0851422Z cp.async.ca.shared.global [ %r87 + 0 ], [ %rd439 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0851480Z // end inline asm 2026-02-21T09:20:56.0851537Z // begin inline asm 2026-02-21T09:20:56.0851662Z cp.async.ca.shared.global [ %r88 + 0 ], [ %rd440 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0851719Z // end inline asm 2026-02-21T09:20:56.0851782Z cp.async.commit_group; 2026-02-21T09:20:56.0851980Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0852045Z add.s32 %r7875, %r7854, %r89; 2026-02-21T09:20:56.0852243Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0852305Z cvt.s64.s32 %rd472, %r7875; 2026-02-21T09:20:56.0852369Z add.s64 %rd441, %rd45, %rd472; 2026-02-21T09:20:56.0852569Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0852627Z // begin inline asm 2026-02-21T09:20:56.0852757Z cp.async.ca.shared.global [ %r90 + 0 ], [ %rd441 + 0 ], 0x4, %r12287; 2026-02-21T09:20:56.0852816Z // end inline asm 2026-02-21T09:20:56.0852884Z cp.async.commit_group; 2026-02-21T09:20:56.0853084Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0853148Z add.s64 %rd442, %rd452, 288; 
2026-02-21T09:20:56.0853207Z add.s64 %rd443, %rd456, 288; 2026-02-21T09:20:56.0853339Z add.s64 %rd444, %rd460, 288; 2026-02-21T09:20:56.0853399Z add.s64 %rd445, %rd464, 288; 2026-02-21T09:20:56.0853602Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0853661Z // begin inline asm 2026-02-21T09:20:56.0853836Z cp.async.ca.shared.global [ %r91 + 0 ], [ %rd442 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0853894Z // end inline asm 2026-02-21T09:20:56.0853956Z // begin inline asm 2026-02-21T09:20:56.0854081Z cp.async.ca.shared.global [ %r92 + 0 ], [ %rd443 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0854140Z // end inline asm 2026-02-21T09:20:56.0854207Z // begin inline asm 2026-02-21T09:20:56.0854335Z cp.async.ca.shared.global [ %r93 + 0 ], [ %rd444 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0854390Z // end inline asm 2026-02-21T09:20:56.0854499Z // begin inline asm 2026-02-21T09:20:56.0854628Z cp.async.ca.shared.global [ %r94 + 0 ], [ %rd445 + 0 ], 0x8, %r7625; 2026-02-21T09:20:56.0854683Z // end inline asm 2026-02-21T09:20:56.0854751Z cp.async.commit_group; 2026-02-21T09:20:56.0854960Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0855024Z add.s32 %r7876, %r7854, %r95; 2026-02-21T09:20:56.0855223Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0855291Z cvt.s64.s32 %rd473, %r7876; 2026-02-21T09:20:56.0855352Z add.s64 %rd446, %rd45, %rd473; 2026-02-21T09:20:56.0855595Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0855668Z // begin inline asm 2026-02-21T09:20:56.0855804Z cp.async.ca.shared.global [ %r96 + 0 ], [ %rd446 + 0 ], 0x4, %r12287; 2026-02-21T09:20:56.0855860Z // end inline asm 2026-02-21T09:20:56.0855926Z cp.async.commit_group; 2026-02-21T09:20:56.0856125Z .loc 1 43 78 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:43:78 2026-02-21T09:20:56.0856191Z add.s32 
%r12285, %r117, %r1015; 2026-02-21T09:20:56.0856251Z or.b32 %r7877, %r13, %r1016; 2026-02-21T09:20:56.0856312Z shl.b32 %r7878, %r7877, 10; 2026-02-21T09:20:56.0856376Z mul.wide.s32 %rd29, %r7878, 2; 2026-02-21T09:20:56.0856435Z or.b32 %r7879, %r12, %r1016; 2026-02-21T09:20:56.0856622Z shl.b32 %r7880, %r7879, 10; 2026-02-21T09:20:56.0856688Z mul.wide.s32 %rd30, %r7880, 2; 2026-02-21T09:20:56.0856746Z shl.b32 %r7881, %r7850, 18; 2026-02-21T09:20:56.0856811Z or.b32 %r7882, %r11886, %r7881; 2026-02-21T09:20:56.0856872Z mul.wide.s32 %rd31, %r7882, 2; 2026-02-21T09:20:56.0856934Z or.b32 %r12284, %r121, %r7881; 2026-02-21T09:20:56.0856994Z mov.b32 %r12288, 0f00000000; 2026-02-21T09:20:56.0857057Z mov.b32 %r12286, -1; 2026-02-21T09:20:56.0857124Z mov.b64 %rd641, -16; 2026-02-21T09:20:56.0857188Z mov.b64 %rd640, %rd3; 2026-02-21T09:20:56.0857252Z mov.b32 %r12289, %r12288; 2026-02-21T09:20:56.0857311Z mov.b32 %r12290, %r12288; 2026-02-21T09:20:56.0857371Z mov.b32 %r12291, %r12288; 2026-02-21T09:20:56.0857429Z mov.b32 %r12292, %r12288; 2026-02-21T09:20:56.0857489Z mov.b32 %r12293, %r12288; 2026-02-21T09:20:56.0857546Z mov.b32 %r12294, %r12288; 2026-02-21T09:20:56.0857602Z mov.b32 %r12295, %r12288; 2026-02-21T09:20:56.0857664Z mov.b32 %r12296, %r12288; 2026-02-21T09:20:56.0857721Z mov.b32 %r12297, %r12288; 2026-02-21T09:20:56.0857776Z mov.b32 %r12298, %r12288; 2026-02-21T09:20:56.0857832Z mov.b32 %r12299, %r12288; 2026-02-21T09:20:56.0857890Z mov.b32 %r12300, %r12288; 2026-02-21T09:20:56.0857947Z mov.b32 %r12301, %r12288; 2026-02-21T09:20:56.0858005Z mov.b32 %r12302, %r12288; 2026-02-21T09:20:56.0858065Z mov.b32 %r12303, %r12288; 2026-02-21T09:20:56.0858122Z mov.b32 %r12304, %r12288; 2026-02-21T09:20:56.0858178Z mov.b32 %r12305, %r12288; 2026-02-21T09:20:56.0858236Z mov.b32 %r12306, %r12288; 2026-02-21T09:20:56.0858296Z mov.b32 %r12307, %r12288; 2026-02-21T09:20:56.0858353Z mov.b32 %r12308, %r12288; 2026-02-21T09:20:56.0858502Z mov.b32 %r12309, %r12288; 
2026-02-21T09:20:56.0858562Z mov.b32 %r12310, %r12288; 2026-02-21T09:20:56.0858619Z mov.b32 %r12311, %r12288; 2026-02-21T09:20:56.0858675Z mov.b32 %r12312, %r12288; 2026-02-21T09:20:56.0858731Z mov.b32 %r12313, %r12288; 2026-02-21T09:20:56.0858853Z mov.b32 %r12314, %r12288; 2026-02-21T09:20:56.0858910Z mov.b32 %r12315, %r12288; 2026-02-21T09:20:56.0858966Z mov.b32 %r12316, %r12288; 2026-02-21T09:20:56.0859025Z mov.b32 %r12317, %r12288; 2026-02-21T09:20:56.0859092Z mov.b32 %r12318, %r12288; 2026-02-21T09:20:56.0859151Z mov.b32 %r12319, %r12288; 2026-02-21T09:20:56.0859210Z mov.b32 %r12320, %r12288; 2026-02-21T09:20:56.0859272Z mov.b32 %r12321, %r12288; 2026-02-21T09:20:56.0859328Z mov.b32 %r12322, %r12288; 2026-02-21T09:20:56.0859385Z mov.b32 %r12323, %r12288; 2026-02-21T09:20:56.0859528Z mov.b32 %r12324, %r12288; 2026-02-21T09:20:56.0859591Z mov.b32 %r12325, %r12288; 2026-02-21T09:20:56.0859647Z mov.b32 %r12326, %r12288; 2026-02-21T09:20:56.0859708Z mov.b32 %r12327, %r12288; 2026-02-21T09:20:56.0859765Z mov.b32 %r12328, %r12288; 2026-02-21T09:20:56.0859823Z mov.b32 %r12329, %r12288; 2026-02-21T09:20:56.0859881Z mov.b32 %r12330, %r12288; 2026-02-21T09:20:56.0859940Z mov.b32 %r12331, %r12288; 2026-02-21T09:20:56.0859998Z mov.b32 %r12332, %r12288; 2026-02-21T09:20:56.0860055Z mov.b32 %r12333, %r12288; 2026-02-21T09:20:56.0860114Z mov.b32 %r12334, %r12288; 2026-02-21T09:20:56.0860170Z mov.b32 %r12335, %r12288; 2026-02-21T09:20:56.0860228Z mov.b32 %r12336, %r12288; 2026-02-21T09:20:56.0860284Z mov.b32 %r12337, %r12288; 2026-02-21T09:20:56.0860413Z mov.b32 %r12338, %r12288; 2026-02-21T09:20:56.0860483Z mov.b32 %r12339, %r12288; 2026-02-21T09:20:56.0860541Z mov.b32 %r12340, %r12288; 2026-02-21T09:20:56.0860600Z mov.b32 %r12341, %r12288; 2026-02-21T09:20:56.0860659Z mov.b32 %r12342, %r12288; 2026-02-21T09:20:56.0860717Z mov.b32 %r12343, %r12288; 2026-02-21T09:20:56.0860774Z mov.b32 %r12344, %r12288; 2026-02-21T09:20:56.0860837Z mov.b32 %r12345, %r12288; 
2026-02-21T09:20:56.0860894Z mov.b32 %r12346, %r12288; 2026-02-21T09:20:56.0860951Z mov.b32 %r12347, %r12288; 2026-02-21T09:20:56.0861009Z mov.b32 %r12348, %r12288; 2026-02-21T09:20:56.0861065Z mov.b32 %r12349, %r12288; 2026-02-21T09:20:56.0861131Z mov.b32 %r12350, %r12288; 2026-02-21T09:20:56.0861187Z mov.b32 %r12351, %r12288; 2026-02-21T09:20:56.0861248Z mov.b32 %r12352, %r12288; 2026-02-21T09:20:56.0861305Z mov.b32 %r12353, %r12288; 2026-02-21T09:20:56.0861362Z mov.b32 %r12354, %r12288; 2026-02-21T09:20:56.0861422Z mov.b32 %r12355, %r12288; 2026-02-21T09:20:56.0861481Z mov.b32 %r12356, %r12288; 2026-02-21T09:20:56.0861537Z mov.b32 %r12357, %r12288; 2026-02-21T09:20:56.0861595Z mov.b32 %r12358, %r12288; 2026-02-21T09:20:56.0861656Z mov.b32 %r12359, %r12288; 2026-02-21T09:20:56.0861724Z mov.b32 %r12360, %r12288; 2026-02-21T09:20:56.0861784Z mov.b32 %r12361, %r12288; 2026-02-21T09:20:56.0861844Z mov.b32 %r12362, %r12288; 2026-02-21T09:20:56.0861902Z mov.b32 %r12363, %r12288; 2026-02-21T09:20:56.0861959Z mov.b32 %r12364, %r12288; 2026-02-21T09:20:56.0862016Z mov.b32 %r12365, %r12288; 2026-02-21T09:20:56.0862079Z mov.b32 %r12366, %r12288; 2026-02-21T09:20:56.0862138Z mov.b32 %r12367, %r12288; 2026-02-21T09:20:56.0862197Z mov.b32 %r12368, %r12288; 2026-02-21T09:20:56.0862257Z mov.b32 %r12369, %r12288; 2026-02-21T09:20:56.0862314Z mov.b32 %r12370, %r12288; 2026-02-21T09:20:56.0862372Z mov.b32 %r12371, %r12288; 2026-02-21T09:20:56.0862432Z mov.b32 %r12372, %r12288; 2026-02-21T09:20:56.0862489Z mov.b32 %r12373, %r12288; 2026-02-21T09:20:56.0862547Z mov.b32 %r12374, %r12288; 2026-02-21T09:20:56.0862604Z mov.b32 %r12375, %r12288; 2026-02-21T09:20:56.0862666Z mov.b32 %r12376, %r12288; 2026-02-21T09:20:56.0862725Z mov.b32 %r12377, %r12288; 2026-02-21T09:20:56.0862793Z mov.b32 %r12378, %r12288; 2026-02-21T09:20:56.0862855Z mov.b32 %r12379, %r12288; 2026-02-21T09:20:56.0862913Z mov.b32 %r12380, %r12288; 2026-02-21T09:20:56.0863048Z mov.b32 %r12381, %r12288; 
2026-02-21T09:20:56.0863104Z mov.b32 %r12382, %r12288; 2026-02-21T09:20:56.0863164Z mov.b32 %r12383, %r12288; 2026-02-21T09:20:56.0863222Z mov.b32 %r12384, %r12288; 2026-02-21T09:20:56.0863280Z mov.b32 %r12385, %r12288; 2026-02-21T09:20:56.0863397Z mov.b32 %r12386, %r12288; 2026-02-21T09:20:56.0863456Z mov.b32 %r12387, %r12288; 2026-02-21T09:20:56.0863514Z mov.b32 %r12388, %r12288; 2026-02-21T09:20:56.0863571Z mov.b32 %r12389, %r12288; 2026-02-21T09:20:56.0863634Z mov.b32 %r12390, %r12288; 2026-02-21T09:20:56.0863691Z mov.b32 %r12391, %r12288; 2026-02-21T09:20:56.0863750Z mov.b32 %r12392, %r12288; 2026-02-21T09:20:56.0863810Z mov.b32 %r12393, %r12288; 2026-02-21T09:20:56.0863867Z mov.b32 %r12394, %r12288; 2026-02-21T09:20:56.0863974Z mov.b32 %r12395, %r12288; 2026-02-21T09:20:56.0864033Z mov.b32 %r12396, %r12288; 2026-02-21T09:20:56.0864093Z mov.b32 %r12397, %r12288; 2026-02-21T09:20:56.0864150Z mov.b32 %r12398, %r12288; 2026-02-21T09:20:56.0864209Z mov.b32 %r12399, %r12288; 2026-02-21T09:20:56.0864269Z mov.b32 %r12400, %r12288; 2026-02-21T09:20:56.0864324Z mov.b32 %r12401, %r12288; 2026-02-21T09:20:56.0864380Z mov.b32 %r12402, %r12288; 2026-02-21T09:20:56.0864436Z mov.b32 %r12403, %r12288; 2026-02-21T09:20:56.0864499Z mov.b32 %r12404, %r12288; 2026-02-21T09:20:56.0864559Z mov.b32 %r12405, %r12288; 2026-02-21T09:20:56.0864617Z mov.b32 %r12406, %r12288; 2026-02-21T09:20:56.0864677Z mov.b32 %r12407, %r12288; 2026-02-21T09:20:56.0864734Z mov.b32 %r12408, %r12288; 2026-02-21T09:20:56.0864791Z mov.b32 %r12409, %r12288; 2026-02-21T09:20:56.0864896Z mov.b32 %r12410, %r12288; 2026-02-21T09:20:56.0864959Z mov.b32 %r12411, %r12288; 2026-02-21T09:20:56.0865016Z mov.b32 %r12412, %r12288; 2026-02-21T09:20:56.0865075Z mov.b32 %r12413, %r12288; 2026-02-21T09:20:56.0865133Z mov.b32 %r12414, %r12288; 2026-02-21T09:20:56.0865190Z mov.b32 %r12415, %r12288; 2026-02-21T09:20:56.0865302Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T09:20:56.0865412Z // => This Inner Loop Header: 
Depth=2 2026-02-21T09:20:56.0865474Z add.s64 %rd641, %rd641, 16; 2026-02-21T09:20:56.0865542Z setp.lt.u64 %p50, %rd641, 432; 2026-02-21T09:20:56.0865603Z add.s32 %r9483, %r12286, 1; 2026-02-21T09:20:56.0865670Z setp.gt.s32 %p51, %r9483, 4; 2026-02-21T09:20:56.0865750Z selp.b32 %r12286, 0, %r9483, %p51; 2026-02-21T09:20:56.0865962Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0866038Z cp.async.wait_group 16; 2026-02-21T09:20:56.0866095Z bar.sync 0; 2026-02-21T09:20:56.0866155Z shl.b32 %r9484, %r12286, 13; 2026-02-21T09:20:56.0866221Z add.s32 %r9486, %r11869, %r9484; 2026-02-21T09:20:56.0866439Z .loc 1 55 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:55:32 2026-02-21T09:20:56.0866628Z add.s32 %r9487, %r9486, %r97; 2026-02-21T09:20:56.0866700Z ld.shared.b16 %rs337, [%r9487]; 2026-02-21T09:20:56.0866776Z ld.shared.b16 %rs338, [%r9487+256]; 2026-02-21T09:20:56.0866841Z ld.shared.b16 %rs339, [%r9487+16]; 2026-02-21T09:20:56.0866920Z ld.shared.b16 %rs340, [%r9487+272]; 2026-02-21T09:20:56.0866993Z ld.shared.b16 %rs341, [%r9487+4096]; 2026-02-21T09:20:56.0867064Z ld.shared.b16 %rs342, [%r9487+4352]; 2026-02-21T09:20:56.0867128Z ld.shared.b16 %rs343, [%r9487+4112]; 2026-02-21T09:20:56.0867192Z ld.shared.b16 %rs344, [%r9487+4368]; 2026-02-21T09:20:56.0867255Z add.s32 %r9488, %r9486, %r98; 2026-02-21T09:20:56.0867320Z ld.shared.b16 %rs345, [%r9488]; 2026-02-21T09:20:56.0867385Z ld.shared.b16 %rs346, [%r9488+256]; 2026-02-21T09:20:56.0867452Z ld.shared.b16 %rs347, [%r9488+16]; 2026-02-21T09:20:56.0867515Z ld.shared.b16 %rs348, [%r9488+272]; 2026-02-21T09:20:56.0867581Z ld.shared.b16 %rs349, [%r9488+4096]; 2026-02-21T09:20:56.0867648Z ld.shared.b16 %rs350, [%r9488+4352]; 2026-02-21T09:20:56.0867711Z ld.shared.b16 %rs351, [%r9488+4112]; 2026-02-21T09:20:56.0867876Z ld.shared.b16 %rs352, [%r9488+4368]; 2026-02-21T09:20:56.0867941Z cvt.f32.bf16 %r8011, %rs337; 2026-02-21T09:20:56.0868006Z cvt.f32.bf16 %r8012, 
%rs338; 2026-02-21T09:20:56.0868066Z cvt.f32.bf16 %r8013, %rs345; 2026-02-21T09:20:56.0868197Z cvt.f32.bf16 %r8014, %rs346; 2026-02-21T09:20:56.0868261Z cvt.f32.bf16 %r8143, %rs339; 2026-02-21T09:20:56.0868398Z cvt.f32.bf16 %r8144, %rs340; 2026-02-21T09:20:56.0868461Z cvt.f32.bf16 %r8145, %rs347; 2026-02-21T09:20:56.0868520Z cvt.f32.bf16 %r8146, %rs348; 2026-02-21T09:20:56.0868585Z cvt.f32.bf16 %r8275, %rs341; 2026-02-21T09:20:56.0868644Z cvt.f32.bf16 %r8276, %rs342; 2026-02-21T09:20:56.0868704Z cvt.f32.bf16 %r8277, %rs349; 2026-02-21T09:20:56.0868766Z cvt.f32.bf16 %r8278, %rs350; 2026-02-21T09:20:56.0868894Z cvt.f32.bf16 %r8407, %rs343; 2026-02-21T09:20:56.0868957Z cvt.f32.bf16 %r8408, %rs344; 2026-02-21T09:20:56.0869016Z cvt.f32.bf16 %r8409, %rs351; 2026-02-21T09:20:56.0869077Z cvt.f32.bf16 %r8410, %rs352; 2026-02-21T09:20:56.0869292Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0869351Z shl.b32 %r9489, %r12286, 10; 2026-02-21T09:20:56.0869416Z add.s32 %r9490, %r11869, %r9489; 2026-02-21T09:20:56.0869479Z add.s32 %r9491, %r9490, 90112; 2026-02-21T09:20:56.0869680Z .loc 1 70 45 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:70:45 2026-02-21T09:20:56.0869745Z add.s32 %r9492, %r9491, %r11875; 2026-02-21T09:20:56.0869812Z ld.shared.b8 %rs353, [%r9492]; 2026-02-21T09:20:56.0869939Z ld.shared.b8 %rs354, [%r9492+128]; 2026-02-21T09:20:56.0870007Z ld.shared.b8 %rs355, [%r9492+256]; 2026-02-21T09:20:56.0870086Z ld.shared.b8 %rs356, [%r9492+384]; 2026-02-21T09:20:56.0870153Z ld.shared.b8 %rs357, [%r9492+512]; 2026-02-21T09:20:56.0870216Z ld.shared.b8 %rs358, [%r9492+640]; 2026-02-21T09:20:56.0870282Z ld.shared.b8 %rs359, [%r9492+768]; 2026-02-21T09:20:56.0870343Z add.s32 %r9493, %r9491, %r11876; 2026-02-21T09:20:56.0870407Z ld.shared.b8 %rs360, [%r9493]; 2026-02-21T09:20:56.0870609Z .loc 1 60 28 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:60:28 2026-02-21T09:20:56.0870671Z shl.b16 %rs361, 
%rs353, 4; 2026-02-21T09:20:56.0870734Z shl.b16 %rs362, %rs354, 4; 2026-02-21T09:20:56.0870793Z shl.b16 %rs363, %rs355, 4; 2026-02-21T09:20:56.0870856Z shl.b16 %rs364, %rs356, 4; 2026-02-21T09:20:56.0870917Z shl.b16 %rs365, %rs357, 4; 2026-02-21T09:20:56.0870976Z shl.b16 %rs366, %rs358, 4; 2026-02-21T09:20:56.0871038Z shl.b16 %rs367, %rs359, 4; 2026-02-21T09:20:56.0871098Z shl.b16 %rs368, %rs360, 4; 2026-02-21T09:20:56.0871297Z .loc 1 75 58 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:75:58 2026-02-21T09:20:56.0871372Z selp.b16 %rs369, %rs361, %rs353, %p70; 2026-02-21T09:20:56.0871438Z cvt.s16.s8 %rs370, %rs369; 2026-02-21T09:20:56.0871497Z shr.s16 %rs371, %rs370, 4; 2026-02-21T09:20:56.0871569Z selp.b16 %rs372, %rs362, %rs354, %p70; 2026-02-21T09:20:56.0871630Z cvt.s16.s8 %rs373, %rs372; 2026-02-21T09:20:56.0871690Z shr.s16 %rs374, %rs373, 4; 2026-02-21T09:20:56.0871757Z selp.b16 %rs375, %rs363, %rs355, %p70; 2026-02-21T09:20:56.0871816Z cvt.s16.s8 %rs376, %rs375; 2026-02-21T09:20:56.0871880Z shr.s16 %rs377, %rs376, 4; 2026-02-21T09:20:56.0871957Z selp.b16 %rs378, %rs364, %rs356, %p70; 2026-02-21T09:20:56.0872018Z cvt.s16.s8 %rs379, %rs378; 2026-02-21T09:20:56.0872081Z shr.s16 %rs380, %rs379, 4; 2026-02-21T09:20:56.0872149Z selp.b16 %rs381, %rs365, %rs357, %p70; 2026-02-21T09:20:56.0872210Z cvt.s16.s8 %rs382, %rs381; 2026-02-21T09:20:56.0872272Z shr.s16 %rs383, %rs382, 4; 2026-02-21T09:20:56.0872338Z selp.b16 %rs384, %rs366, %rs358, %p70; 2026-02-21T09:20:56.0872398Z cvt.s16.s8 %rs385, %rs384; 2026-02-21T09:20:56.0872457Z shr.s16 %rs386, %rs385, 4; 2026-02-21T09:20:56.0872528Z selp.b16 %rs387, %rs367, %rs359, %p70; 2026-02-21T09:20:56.0872591Z cvt.s16.s8 %rs388, %rs387; 2026-02-21T09:20:56.0872710Z shr.s16 %rs389, %rs388, 4; 2026-02-21T09:20:56.0872778Z selp.b16 %rs390, %rs368, %rs360, %p70; 2026-02-21T09:20:56.0872838Z cvt.s16.s8 %rs391, %rs390; 2026-02-21T09:20:56.0872897Z shr.s16 %rs392, %rs391, 4; 2026-02-21T09:20:56.0873146Z .loc 1 80 32 // 
czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:80:32 2026-02-21T09:20:56.0873214Z cvt.rn.f32.s16 %r9494, %rs371; 2026-02-21T09:20:56.0873278Z cvt.rn.f32.s16 %r9495, %rs374; 2026-02-21T09:20:56.0873340Z cvt.rn.f32.s16 %r9496, %rs377; 2026-02-21T09:20:56.0873405Z cvt.rn.f32.s16 %r9497, %rs380; 2026-02-21T09:20:56.0873467Z cvt.rn.f32.s16 %r9498, %rs383; 2026-02-21T09:20:56.0873527Z cvt.rn.f32.s16 %r9499, %rs386; 2026-02-21T09:20:56.0873588Z cvt.rn.f32.s16 %r9500, %rs389; 2026-02-21T09:20:56.0873701Z cvt.rn.f32.s16 %r9501, %rs392; 2026-02-21T09:20:56.0873764Z st.shared.b32 [%r101], %r9494; 2026-02-21T09:20:56.0873829Z st.shared.b32 [%r101+8], %r9495; 2026-02-21T09:20:56.0873896Z st.shared.b32 [%r102], %r9496; 2026-02-21T09:20:56.0873972Z st.shared.b32 [%r102+8], %r9497; 2026-02-21T09:20:56.0874036Z st.shared.b32 [%r103], %r9498; 2026-02-21T09:20:56.0874104Z st.shared.b32 [%r103+8], %r9499; 2026-02-21T09:20:56.0874166Z st.shared.b32 [%r104], %r9500; 2026-02-21T09:20:56.0874229Z st.shared.b32 [%r104+8], %r9501; 2026-02-21T09:20:56.0874283Z $L__tmp13: 2026-02-21T09:20:56.0874566Z .loc 2 291 36 // standard.py:291:36 @[ czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:87:40 ] 2026-02-21T09:20:56.0874627Z // begin inline asm 2026-02-21T09:20:56.0874758Z fence.proxy.async.shared::cta; 2026-02-21T09:20:56.0874819Z // end inline asm 2026-02-21T09:20:56.0874884Z bar.sync 0; 2026-02-21T09:20:56.0874967Z shfl.sync.idx.b32 %r9502, %r5, 0, 31, -1; 2026-02-21T09:20:56.0875041Z wgmma.fence.sync.aligned; 2026-02-21T09:20:56.0875106Z mov.pred %p41, -1; 2026-02-21T09:20:56.0875164Z // begin inline asm 2026-02-21T09:20:56.0876812Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12288,%r12289,%r12290,%r12291,%r12292,%r12293,%r12294,%r12295,%r12296,%r12297,%r12298,%r12299,%r12300,%r12301,%r12302,%r12303,%r12304,%r12305,%r12306,%r12307,%r12308,%r12309,%r12310,%r12311,%r12312,%r12313,%r12314,%r12315,%r12316,%r12317,%r12318,%r12319,%r12320,%r12321,%r12322,%r12323,%r12324,%r12325,%r12326,%r12327,%r12328,%r12329,%r12330,%r12331,%r12332,%r12333,%r12334,%r12335,%r12336,%r12337,%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351}, {%r8011,%r8012,%r8013,%r8014}, %rd1, %p41, 1, 1; 2026-02-21T09:20:56.0876882Z // end inline asm 2026-02-21T09:20:56.0876940Z // begin inline asm 2026-02-21T09:20:56.0878415Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12288,%r12289,%r12290,%r12291,%r12292,%r12293,%r12294,%r12295,%r12296,%r12297,%r12298,%r12299,%r12300,%r12301,%r12302,%r12303,%r12304,%r12305,%r12306,%r12307,%r12308,%r12309,%r12310,%r12311,%r12312,%r12313,%r12314,%r12315,%r12316,%r12317,%r12318,%r12319,%r12320,%r12321,%r12322,%r12323,%r12324,%r12325,%r12326,%r12327,%r12328,%r12329,%r12330,%r12331,%r12332,%r12333,%r12334,%r12335,%r12336,%r12337,%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351}, {%r8143,%r8144,%r8145,%r8146}, %rd2, %p41, 1, 1; 2026-02-21T09:20:56.0878475Z // end inline asm 2026-02-21T09:20:56.0878534Z // begin inline asm 2026-02-21T09:20:56.0880010Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393,%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401,%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415}, 
{%r8275,%r8276,%r8277,%r8278}, %rd1, %p41, 1, 1; 2026-02-21T09:20:56.0880155Z // end inline asm 2026-02-21T09:20:56.0880212Z // begin inline asm 2026-02-21T09:20:56.0881691Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393,%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401,%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415}, {%r8407,%r8408,%r8409,%r8410}, %rd2, %p41, 1, 1; 2026-02-21T09:20:56.0881867Z // end inline asm 2026-02-21T09:20:56.0881950Z wgmma.commit_group.sync.aligned; 2026-02-21T09:20:56.0882008Z mov.b32 %r9331, 0; 2026-02-21T09:20:56.0882069Z mov.b32 %r8539, %r1575; 2026-02-21T09:20:56.0882133Z mov.b32 %r8540, %r9331; 2026-02-21T09:20:56.0882191Z mov.b32 %r8541, %r9331; 2026-02-21T09:20:56.0882260Z // begin inline asm 2026-02-21T09:20:56.0884862Z // wait for regs: 
%r12288,%r12289,%r12290,%r12291,%r12292,%r12293,%r12294,%r12295,%r12296,%r12297,%r12298,%r12299,%r12300,%r12301,%r12302,%r12303,%r12304,%r12305,%r12306,%r12307,%r12308,%r12309,%r12310,%r12311,%r12312,%r12313,%r12314,%r12315,%r12316,%r12317,%r12318,%r12319,%r12320,%r12321,%r12322,%r12323,%r12324,%r12325,%r12326,%r12327,%r12328,%r12329,%r12330,%r12331,%r12332,%r12333,%r12334,%r12335,%r12336,%r12337,%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351,%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393,%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401,%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415,%r8539,%r8540,%r8541 2026-02-21T09:20:56.0884945Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:20:56.0885003Z // end inline asm 2026-02-21T09:20:56.0885059Z $L__tmp14: 2026-02-21T09:20:56.0885274Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0885336Z add.s32 %r9504, %r3368, %r9484; 2026-02-21T09:20:56.0885544Z .loc 1 55 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:55:32 2026-02-21T09:20:56.0885605Z add.s32 %r9505, %r9504, %r97; 2026-02-21T09:20:56.0885670Z ld.shared.b16 %rs393, [%r9505]; 2026-02-21T09:20:56.0885742Z ld.shared.b16 %rs394, [%r9505+256]; 2026-02-21T09:20:56.0885809Z ld.shared.b16 %rs395, [%r9505+16]; 2026-02-21T09:20:56.0885873Z ld.shared.b16 %rs396, [%r9505+272]; 2026-02-21T09:20:56.0885953Z ld.shared.b16 %rs397, [%r9505+4096]; 2026-02-21T09:20:56.0886023Z ld.shared.b16 %rs398, [%r9505+4352]; 2026-02-21T09:20:56.0886088Z ld.shared.b16 %rs399, [%r9505+4112]; 
2026-02-21T09:20:56.0886151Z ld.shared.b16 %rs400, [%r9505+4368]; 2026-02-21T09:20:56.0886216Z add.s32 %r9506, %r9504, %r98; 2026-02-21T09:20:56.0886280Z ld.shared.b16 %rs401, [%r9506]; 2026-02-21T09:20:56.0886350Z ld.shared.b16 %rs402, [%r9506+256]; 2026-02-21T09:20:56.0886415Z ld.shared.b16 %rs403, [%r9506+16]; 2026-02-21T09:20:56.0886606Z ld.shared.b16 %rs404, [%r9506+272]; 2026-02-21T09:20:56.0886678Z ld.shared.b16 %rs405, [%r9506+4096]; 2026-02-21T09:20:56.0886741Z ld.shared.b16 %rs406, [%r9506+4352]; 2026-02-21T09:20:56.0886810Z ld.shared.b16 %rs407, [%r9506+4112]; 2026-02-21T09:20:56.0886876Z ld.shared.b16 %rs408, [%r9506+4368]; 2026-02-21T09:20:56.0886939Z cvt.f32.bf16 %r8801, %rs393; 2026-02-21T09:20:56.0887003Z cvt.f32.bf16 %r8802, %rs394; 2026-02-21T09:20:56.0887064Z cvt.f32.bf16 %r8803, %rs401; 2026-02-21T09:20:56.0887215Z cvt.f32.bf16 %r8804, %rs402; 2026-02-21T09:20:56.0887275Z cvt.f32.bf16 %r8933, %rs395; 2026-02-21T09:20:56.0887336Z cvt.f32.bf16 %r8934, %rs396; 2026-02-21T09:20:56.0887396Z cvt.f32.bf16 %r8935, %rs403; 2026-02-21T09:20:56.0887518Z cvt.f32.bf16 %r8936, %rs404; 2026-02-21T09:20:56.0887582Z cvt.f32.bf16 %r9065, %rs397; 2026-02-21T09:20:56.0887640Z cvt.f32.bf16 %r9066, %rs398; 2026-02-21T09:20:56.0887699Z cvt.f32.bf16 %r9067, %rs405; 2026-02-21T09:20:56.0887760Z cvt.f32.bf16 %r9068, %rs406; 2026-02-21T09:20:56.0887824Z cvt.f32.bf16 %r9197, %rs399; 2026-02-21T09:20:56.0887883Z cvt.f32.bf16 %r9198, %rs400; 2026-02-21T09:20:56.0887943Z cvt.f32.bf16 %r9199, %rs407; 2026-02-21T09:20:56.0888004Z cvt.f32.bf16 %r9200, %rs408; 2026-02-21T09:20:56.0888288Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0888363Z add.s32 %r9507, %r9490, 95232; 2026-02-21T09:20:56.0888568Z .loc 1 70 45 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:70:45 2026-02-21T09:20:56.0888633Z add.s32 %r9508, %r9507, %r11875; 2026-02-21T09:20:56.0888697Z ld.shared.b8 %rs409, [%r9508]; 2026-02-21T09:20:56.0888760Z 
ld.shared.b8 %rs410, [%r9508+128]; 2026-02-21T09:20:56.0888831Z ld.shared.b8 %rs411, [%r9508+256]; 2026-02-21T09:20:56.0888893Z ld.shared.b8 %rs412, [%r9508+384]; 2026-02-21T09:20:56.0888956Z ld.shared.b8 %rs413, [%r9508+512]; 2026-02-21T09:20:56.0889021Z ld.shared.b8 %rs414, [%r9508+640]; 2026-02-21T09:20:56.0889152Z ld.shared.b8 %rs415, [%r9508+768]; 2026-02-21T09:20:56.0889220Z add.s32 %r9509, %r9507, %r11876; 2026-02-21T09:20:56.0889283Z ld.shared.b8 %rs416, [%r9509]; 2026-02-21T09:20:56.0889489Z .loc 1 60 28 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:60:28 2026-02-21T09:20:56.0889552Z shl.b16 %rs417, %rs409, 4; 2026-02-21T09:20:56.0889611Z shl.b16 %rs418, %rs410, 4; 2026-02-21T09:20:56.0889675Z shl.b16 %rs419, %rs411, 4; 2026-02-21T09:20:56.0889734Z shl.b16 %rs420, %rs412, 4; 2026-02-21T09:20:56.0889794Z shl.b16 %rs421, %rs413, 4; 2026-02-21T09:20:56.0889856Z shl.b16 %rs422, %rs414, 4; 2026-02-21T09:20:56.0889916Z shl.b16 %rs423, %rs415, 4; 2026-02-21T09:20:56.0889976Z shl.b16 %rs424, %rs416, 4; 2026-02-21T09:20:56.0890174Z .loc 1 75 58 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:75:58 2026-02-21T09:20:56.0890248Z selp.b16 %rs425, %rs417, %rs409, %p70; 2026-02-21T09:20:56.0890307Z cvt.s16.s8 %rs426, %rs425; 2026-02-21T09:20:56.0890367Z shr.s16 %rs427, %rs426, 4; 2026-02-21T09:20:56.0890437Z selp.b16 %rs428, %rs418, %rs410, %p70; 2026-02-21T09:20:56.0890496Z cvt.s16.s8 %rs429, %rs428; 2026-02-21T09:20:56.0890555Z shr.s16 %rs430, %rs429, 4; 2026-02-21T09:20:56.0890623Z selp.b16 %rs431, %rs419, %rs411, %p70; 2026-02-21T09:20:56.0890687Z cvt.s16.s8 %rs432, %rs431; 2026-02-21T09:20:56.0890747Z shr.s16 %rs433, %rs432, 4; 2026-02-21T09:20:56.0890814Z selp.b16 %rs434, %rs420, %rs412, %p70; 2026-02-21T09:20:56.0890877Z cvt.s16.s8 %rs435, %rs434; 2026-02-21T09:20:56.0890944Z shr.s16 %rs436, %rs435, 4; 2026-02-21T09:20:56.0891008Z selp.b16 %rs437, %rs421, %rs413, %p70; 2026-02-21T09:20:56.0891071Z cvt.s16.s8 %rs438, %rs437; 
2026-02-21T09:20:56.0891132Z shr.s16 %rs439, %rs438, 4; 2026-02-21T09:20:56.0891198Z selp.b16 %rs440, %rs422, %rs414, %p70; 2026-02-21T09:20:56.0891257Z cvt.s16.s8 %rs441, %rs440; 2026-02-21T09:20:56.0891319Z shr.s16 %rs442, %rs441, 4; 2026-02-21T09:20:56.0891387Z selp.b16 %rs443, %rs423, %rs415, %p70; 2026-02-21T09:20:56.0891447Z cvt.s16.s8 %rs444, %rs443; 2026-02-21T09:20:56.0891508Z shr.s16 %rs445, %rs444, 4; 2026-02-21T09:20:56.0891577Z selp.b16 %rs446, %rs424, %rs416, %p70; 2026-02-21T09:20:56.0891637Z cvt.s16.s8 %rs447, %rs446; 2026-02-21T09:20:56.0891699Z shr.s16 %rs448, %rs447, 4; 2026-02-21T09:20:56.0891902Z .loc 1 80 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:80:32 2026-02-21T09:20:56.0892039Z cvt.rn.f32.s16 %r9510, %rs427; 2026-02-21T09:20:56.0892101Z cvt.rn.f32.s16 %r9511, %rs430; 2026-02-21T09:20:56.0892165Z cvt.rn.f32.s16 %r9512, %rs433; 2026-02-21T09:20:56.0892227Z cvt.rn.f32.s16 %r9513, %rs436; 2026-02-21T09:20:56.0892336Z cvt.rn.f32.s16 %r9514, %rs439; 2026-02-21T09:20:56.0892398Z cvt.rn.f32.s16 %r9515, %rs442; 2026-02-21T09:20:56.0892461Z cvt.rn.f32.s16 %r9516, %rs445; 2026-02-21T09:20:56.0892521Z cvt.rn.f32.s16 %r9517, %rs448; 2026-02-21T09:20:56.0892577Z bar.sync 0; 2026-02-21T09:20:56.0892644Z st.shared.b32 [%r101], %r9510; 2026-02-21T09:20:56.0892708Z st.shared.b32 [%r101+8], %r9511; 2026-02-21T09:20:56.0892770Z st.shared.b32 [%r102], %r9512; 2026-02-21T09:20:56.0892835Z st.shared.b32 [%r102+8], %r9513; 2026-02-21T09:20:56.0892944Z st.shared.b32 [%r103], %r9514; 2026-02-21T09:20:56.0893009Z st.shared.b32 [%r103+8], %r9515; 2026-02-21T09:20:56.0893070Z st.shared.b32 [%r104], %r9516; 2026-02-21T09:20:56.0893139Z st.shared.b32 [%r104+8], %r9517; 2026-02-21T09:20:56.0893191Z $L__tmp15: 2026-02-21T09:20:56.0893468Z .loc 2 291 36 // standard.py:291:36 @[ czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:87:40 ] 2026-02-21T09:20:56.0893541Z // begin inline asm 2026-02-21T09:20:56.0893622Z fence.proxy.async.shared::cta; 
2026-02-21T09:20:56.0893679Z // end inline asm 2026-02-21T09:20:56.0893733Z bar.sync 0; 2026-02-21T09:20:56.0893808Z wgmma.fence.sync.aligned; 2026-02-21T09:20:56.0893866Z // begin inline asm 2026-02-21T09:20:56.0895393Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12288,%r12289,%r12290,%r12291,%r12292,%r12293,%r12294,%r12295,%r12296,%r12297,%r12298,%r12299,%r12300,%r12301,%r12302,%r12303,%r12304,%r12305,%r12306,%r12307,%r12308,%r12309,%r12310,%r12311,%r12312,%r12313,%r12314,%r12315,%r12316,%r12317,%r12318,%r12319,%r12320,%r12321,%r12322,%r12323,%r12324,%r12325,%r12326,%r12327,%r12328,%r12329,%r12330,%r12331,%r12332,%r12333,%r12334,%r12335,%r12336,%r12337,%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351}, {%r8801,%r8802,%r8803,%r8804}, %rd1, %p41, 1, 1; 2026-02-21T09:20:56.0895457Z // end inline asm 2026-02-21T09:20:56.0895515Z // begin inline asm 2026-02-21T09:20:56.0897129Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12288,%r12289,%r12290,%r12291,%r12292,%r12293,%r12294,%r12295,%r12296,%r12297,%r12298,%r12299,%r12300,%r12301,%r12302,%r12303,%r12304,%r12305,%r12306,%r12307,%r12308,%r12309,%r12310,%r12311,%r12312,%r12313,%r12314,%r12315,%r12316,%r12317,%r12318,%r12319,%r12320,%r12321,%r12322,%r12323,%r12324,%r12325,%r12326,%r12327,%r12328,%r12329,%r12330,%r12331,%r12332,%r12333,%r12334,%r12335,%r12336,%r12337,%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351}, {%r8933,%r8934,%r8935,%r8936}, %rd2, %p41, 1, 1; 2026-02-21T09:20:56.0897192Z // end inline asm 2026-02-21T09:20:56.0897250Z // begin inline asm 2026-02-21T09:20:56.0898719Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393,%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401,%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415}, {%r9065,%r9066,%r9067,%r9068}, %rd1, %p41, 1, 1; 2026-02-21T09:20:56.0898777Z // end inline asm 2026-02-21T09:20:56.0898833Z // begin inline asm 2026-02-21T09:20:56.0900313Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393,%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401,%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415}, {%r9197,%r9198,%r9199,%r9200}, %rd2, %p41, 1, 1; 2026-02-21T09:20:56.0900509Z // end inline asm 2026-02-21T09:20:56.0900588Z wgmma.commit_group.sync.aligned; 2026-02-21T09:20:56.0900646Z mov.b32 %r9329, %r1575; 2026-02-21T09:20:56.0900705Z mov.b32 %r9330, %r9331; 2026-02-21T09:20:56.0900765Z // begin inline asm 2026-02-21T09:20:56.0903406Z // wait for regs: 
%r12288,%r12289,%r12290,%r12291,%r12292,%r12293,%r12294,%r12295,%r12296,%r12297,%r12298,%r12299,%r12300,%r12301,%r12302,%r12303,%r12304,%r12305,%r12306,%r12307,%r12308,%r12309,%r12310,%r12311,%r12312,%r12313,%r12314,%r12315,%r12316,%r12317,%r12318,%r12319,%r12320,%r12321,%r12322,%r12323,%r12324,%r12325,%r12326,%r12327,%r12328,%r12329,%r12330,%r12331,%r12332,%r12333,%r12334,%r12335,%r12336,%r12337,%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351,%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393,%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401,%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415,%r9329,%r9330,%r9331 2026-02-21T09:20:56.0903499Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:20:56.0903560Z // end inline asm 2026-02-21T09:20:56.0903614Z $L__tmp16: 2026-02-21T09:20:56.0903822Z .loc 1 43 78 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:43:78 2026-02-21T09:20:56.0903885Z add.s32 %r9518, %r12287, 1; 2026-02-21T09:20:56.0903952Z setp.gt.s32 %p52, %r9518, 4; 2026-02-21T09:20:56.0904017Z selp.b32 %r12287, 0, %r9518, %p52; 2026-02-21T09:20:56.0904220Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0904286Z add.s32 %r9519, %r12284, -16; 2026-02-21T09:20:56.0904348Z add.s64 %rd492, %rd640, %rd31; 2026-02-21T09:20:56.0904413Z add.s64 %rd482, %rd492, 320; 2026-02-21T09:20:56.0904474Z add.s64 %rd493, %rd640, %rd30; 2026-02-21T09:20:56.0904535Z add.s64 %rd483, %rd493, 320; 2026-02-21T09:20:56.0904598Z add.s64 %rd494, %rd640, %rd29; 2026-02-21T09:20:56.0904660Z add.s64 %rd484, %rd494, 320; 
2026-02-21T09:20:56.0904729Z mad.wide.s32 %rd485, %r9519, 2, %rd44; 2026-02-21T09:20:56.0904930Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0904992Z shl.b32 %r9520, %r12287, 13; 2026-02-21T09:20:56.0905054Z add.s32 %r9521, %r11869, %r9520; 2026-02-21T09:20:56.0905113Z add.s32 %r9463, %r9521, %r36; 2026-02-21T09:20:56.0905175Z selp.b32 %r9464, 8, 0, %p50; 2026-02-21T09:20:56.0905236Z // begin inline asm 2026-02-21T09:20:56.0905381Z cp.async.ca.shared.global [ %r9463 + 0 ], [ %rd482 + 0 ], 0x8, %r9464; 2026-02-21T09:20:56.0905440Z // end inline asm 2026-02-21T09:20:56.0905502Z add.s32 %r9465, %r9463, 2048; 2026-02-21T09:20:56.0905559Z // begin inline asm 2026-02-21T09:20:56.0905696Z cp.async.ca.shared.global [ %r9465 + 0 ], [ %rd483 + 0 ], 0x8, %r9464; 2026-02-21T09:20:56.0905759Z // end inline asm 2026-02-21T09:20:56.0905832Z add.s32 %r9467, %r9463, 4096; 2026-02-21T09:20:56.0905892Z // begin inline asm 2026-02-21T09:20:56.0906027Z cp.async.ca.shared.global [ %r9467 + 0 ], [ %rd484 + 0 ], 0x8, %r9464; 2026-02-21T09:20:56.0906089Z // end inline asm 2026-02-21T09:20:56.0906149Z add.s32 %r9469, %r9463, 6144; 2026-02-21T09:20:56.0906208Z // begin inline asm 2026-02-21T09:20:56.0906340Z cp.async.ca.shared.global [ %r9469 + 0 ], [ %rd485 + 0 ], 0x8, %r9464; 2026-02-21T09:20:56.0906581Z // end inline asm 2026-02-21T09:20:56.0906653Z cp.async.commit_group; 2026-02-21T09:20:56.0906865Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0907016Z add.s32 %r9522, %r12285, -65536; 2026-02-21T09:20:56.0907079Z cvt.s64.s32 %rd495, %r9522; 2026-02-21T09:20:56.0907142Z add.s64 %rd486, %rd45, %rd495; 2026-02-21T09:20:56.0907349Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0907411Z shl.b32 %r9523, %r12287, 10; 2026-02-21T09:20:56.0907472Z add.s32 %r9471, %r42, %r9523; 2026-02-21T09:20:56.0907541Z selp.b32 %r9472, 4, 0, %p50; 
2026-02-21T09:20:56.0907607Z // begin inline asm 2026-02-21T09:20:56.0907829Z cp.async.ca.shared.global [ %r9471 + 0 ], [ %rd486 + 0 ], 0x4, %r9472; 2026-02-21T09:20:56.0907890Z // end inline asm 2026-02-21T09:20:56.0907958Z cp.async.commit_group; 2026-02-21T09:20:56.0908162Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0908225Z add.s64 %rd487, %rd492, 352; 2026-02-21T09:20:56.0908287Z add.s64 %rd488, %rd493, 352; 2026-02-21T09:20:56.0908436Z add.s64 %rd489, %rd494, 352; 2026-02-21T09:20:56.0908511Z mad.wide.s32 %rd490, %r12284, 2, %rd44; 2026-02-21T09:20:56.0908715Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0908779Z add.s32 %r9524, %r3368, %r9520; 2026-02-21T09:20:56.0908906Z add.s32 %r9473, %r9524, %r36; 2026-02-21T09:20:56.0908972Z // begin inline asm 2026-02-21T09:20:56.0909112Z cp.async.ca.shared.global [ %r9473 + 0 ], [ %rd487 + 0 ], 0x8, %r9464; 2026-02-21T09:20:56.0909178Z // end inline asm 2026-02-21T09:20:56.0909241Z add.s32 %r9475, %r9473, 2048; 2026-02-21T09:20:56.0909305Z // begin inline asm 2026-02-21T09:20:56.0909438Z cp.async.ca.shared.global [ %r9475 + 0 ], [ %rd488 + 0 ], 0x8, %r9464; 2026-02-21T09:20:56.0909496Z // end inline asm 2026-02-21T09:20:56.0909556Z add.s32 %r9477, %r9473, 4096; 2026-02-21T09:20:56.0909617Z // begin inline asm 2026-02-21T09:20:56.0909756Z cp.async.ca.shared.global [ %r9477 + 0 ], [ %rd489 + 0 ], 0x8, %r9464; 2026-02-21T09:20:56.0909816Z // end inline asm 2026-02-21T09:20:56.0909883Z add.s32 %r9479, %r9473, 6144; 2026-02-21T09:20:56.0909941Z // begin inline asm 2026-02-21T09:20:56.0910076Z cp.async.ca.shared.global [ %r9479 + 0 ], [ %rd490 + 0 ], 0x8, %r9464; 2026-02-21T09:20:56.0910133Z // end inline asm 2026-02-21T09:20:56.0910204Z cp.async.commit_group; 2026-02-21T09:20:56.0910412Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0910475Z cvt.s64.s32 %rd496, 
%r12285; 2026-02-21T09:20:56.0910541Z add.s64 %rd491, %rd45, %rd496; 2026-02-21T09:20:56.0910742Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0910806Z add.s32 %r9481, %r48, %r9523; 2026-02-21T09:20:56.0910868Z // begin inline asm 2026-02-21T09:20:56.0911009Z cp.async.ca.shared.global [ %r9481 + 0 ], [ %rd491 + 0 ], 0x4, %r9472; 2026-02-21T09:20:56.0911067Z // end inline asm 2026-02-21T09:20:56.0911138Z cp.async.commit_group; 2026-02-21T09:20:56.0911341Z .loc 1 43 78 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:43:78 2026-02-21T09:20:56.0911406Z add.s32 %r12285, %r12285, 131072; 2026-02-21T09:20:56.0911468Z add.s64 %rd640, %rd640, 64; 2026-02-21T09:20:56.0911533Z add.s32 %r12284, %r12284, 32; 2026-02-21T09:20:56.0911599Z setp.lt.u64 %p53, %rd641, 496; 2026-02-21T09:20:56.0911669Z @%p53 bra $L__BB0_9; 2026-02-21T09:20:56.0911786Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:20:56.0911989Z .loc 1 34 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:34:32 2026-02-21T09:20:56.0912049Z or.b32 %r9669, %r1015, %r9; 2026-02-21T09:20:56.0912325Z .loc 1 36 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:36:32 2026-02-21T09:20:56.0912387Z or.b32 %r9670, %r1016, %r15; 2026-02-21T09:20:56.0912447Z or.b32 %r9671, %r1016, %r16; 2026-02-21T09:20:56.0912553Z or.b32 %r9672, %r1016, %r17; 2026-02-21T09:20:56.0912614Z or.b32 %r9673, %r1016, %r18; 2026-02-21T09:20:56.0912672Z or.b32 %r9674, %r1016, %r19; 2026-02-21T09:20:56.0912729Z or.b32 %r9675, %r1016, %r20; 2026-02-21T09:20:56.0912789Z or.b32 %r9676, %r1016, %r21; 2026-02-21T09:20:56.0912848Z or.b32 %r9677, %r1016, %r22; 2026-02-21T09:20:56.0912917Z or.b32 %r9678, %r1016, %r23; 2026-02-21T09:20:56.0912978Z or.b32 %r9679, %r1016, %r24; 2026-02-21T09:20:56.0913037Z or.b32 %r9680, %r1016, %r25; 2026-02-21T09:20:56.0913149Z or.b32 %r9681, %r1016, %r26; 2026-02-21T09:20:56.0913213Z or.b32 %r9682, %r1016, %r27; 
2026-02-21T09:20:56.0913272Z or.b32 %r9683, %r1016, %r28; 2026-02-21T09:20:56.0913328Z or.b32 %r9684, %r1016, %r29; 2026-02-21T09:20:56.0913387Z or.b32 %r9685, %r1016, %r30; 2026-02-21T09:20:56.0913590Z .loc 1 43 78 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:43:78 2026-02-21T09:20:56.0913658Z cp.async.wait_group 0; 2026-02-21T09:20:56.0913715Z bar.sync 0; 2026-02-21T09:20:56.0913914Z .loc 1 90 28 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:90:28 2026-02-21T09:20:56.0914001Z cvt.rn.bf16x2.f32 %r9686, %r12289, %r12288; 2026-02-21T09:20:56.0914081Z cvt.rn.bf16x2.f32 %r9687, %r12291, %r12290; 2026-02-21T09:20:56.0914216Z cvt.rn.bf16x2.f32 %r9688, %r12293, %r12292; 2026-02-21T09:20:56.0914295Z cvt.rn.bf16x2.f32 %r9689, %r12295, %r12294; 2026-02-21T09:20:56.0914368Z cvt.rn.bf16x2.f32 %r9690, %r12297, %r12296; 2026-02-21T09:20:56.0914442Z cvt.rn.bf16x2.f32 %r9691, %r12299, %r12298; 2026-02-21T09:20:56.0914518Z cvt.rn.bf16x2.f32 %r9692, %r12301, %r12300; 2026-02-21T09:20:56.0914593Z cvt.rn.bf16x2.f32 %r9693, %r12303, %r12302; 2026-02-21T09:20:56.0914668Z cvt.rn.bf16x2.f32 %r9694, %r12305, %r12304; 2026-02-21T09:20:56.0914742Z cvt.rn.bf16x2.f32 %r9695, %r12307, %r12306; 2026-02-21T09:20:56.0914826Z cvt.rn.bf16x2.f32 %r9696, %r12309, %r12308; 2026-02-21T09:20:56.0914903Z cvt.rn.bf16x2.f32 %r9697, %r12311, %r12310; 2026-02-21T09:20:56.0914979Z cvt.rn.bf16x2.f32 %r9698, %r12313, %r12312; 2026-02-21T09:20:56.0915055Z cvt.rn.bf16x2.f32 %r9699, %r12315, %r12314; 2026-02-21T09:20:56.0915128Z cvt.rn.bf16x2.f32 %r9700, %r12317, %r12316; 2026-02-21T09:20:56.0915202Z cvt.rn.bf16x2.f32 %r9701, %r12319, %r12318; 2026-02-21T09:20:56.0915275Z cvt.rn.bf16x2.f32 %r9702, %r12321, %r12320; 2026-02-21T09:20:56.0915351Z cvt.rn.bf16x2.f32 %r9703, %r12323, %r12322; 2026-02-21T09:20:56.0915425Z cvt.rn.bf16x2.f32 %r9704, %r12325, %r12324; 2026-02-21T09:20:56.0915497Z cvt.rn.bf16x2.f32 %r9705, %r12327, %r12326; 2026-02-21T09:20:56.0915572Z cvt.rn.bf16x2.f32 %r9706, 
%r12329, %r12328; 2026-02-21T09:20:56.0915643Z cvt.rn.bf16x2.f32 %r9707, %r12331, %r12330; 2026-02-21T09:20:56.0915718Z cvt.rn.bf16x2.f32 %r9708, %r12333, %r12332; 2026-02-21T09:20:56.0915794Z cvt.rn.bf16x2.f32 %r9709, %r12335, %r12334; 2026-02-21T09:20:56.0915867Z cvt.rn.bf16x2.f32 %r9710, %r12337, %r12336; 2026-02-21T09:20:56.0915941Z cvt.rn.bf16x2.f32 %r9711, %r12339, %r12338; 2026-02-21T09:20:56.0916013Z cvt.rn.bf16x2.f32 %r9712, %r12341, %r12340; 2026-02-21T09:20:56.0916088Z cvt.rn.bf16x2.f32 %r9713, %r12343, %r12342; 2026-02-21T09:20:56.0916160Z cvt.rn.bf16x2.f32 %r9714, %r12345, %r12344; 2026-02-21T09:20:56.0916234Z cvt.rn.bf16x2.f32 %r9715, %r12347, %r12346; 2026-02-21T09:20:56.0916311Z cvt.rn.bf16x2.f32 %r9716, %r12349, %r12348; 2026-02-21T09:20:56.0916383Z cvt.rn.bf16x2.f32 %r9717, %r12351, %r12350; 2026-02-21T09:20:56.0916591Z cvt.rn.bf16x2.f32 %r9718, %r12353, %r12352; 2026-02-21T09:20:56.0916675Z cvt.rn.bf16x2.f32 %r9719, %r12355, %r12354; 2026-02-21T09:20:56.0916749Z cvt.rn.bf16x2.f32 %r9720, %r12357, %r12356; 2026-02-21T09:20:56.0916905Z cvt.rn.bf16x2.f32 %r9721, %r12359, %r12358; 2026-02-21T09:20:56.0916981Z cvt.rn.bf16x2.f32 %r9722, %r12361, %r12360; 2026-02-21T09:20:56.0917057Z cvt.rn.bf16x2.f32 %r9723, %r12363, %r12362; 2026-02-21T09:20:56.0917129Z cvt.rn.bf16x2.f32 %r9724, %r12365, %r12364; 2026-02-21T09:20:56.0917274Z cvt.rn.bf16x2.f32 %r9725, %r12367, %r12366; 2026-02-21T09:20:56.0917352Z cvt.rn.bf16x2.f32 %r9726, %r12369, %r12368; 2026-02-21T09:20:56.0917425Z cvt.rn.bf16x2.f32 %r9727, %r12371, %r12370; 2026-02-21T09:20:56.0917498Z cvt.rn.bf16x2.f32 %r9728, %r12373, %r12372; 2026-02-21T09:20:56.0917575Z cvt.rn.bf16x2.f32 %r9729, %r12375, %r12374; 2026-02-21T09:20:56.0917648Z cvt.rn.bf16x2.f32 %r9730, %r12377, %r12376; 2026-02-21T09:20:56.0917722Z cvt.rn.bf16x2.f32 %r9731, %r12379, %r12378; 2026-02-21T09:20:56.0917863Z cvt.rn.bf16x2.f32 %r9732, %r12381, %r12380; 2026-02-21T09:20:56.0917944Z cvt.rn.bf16x2.f32 %r9733, %r12383, %r12382; 
2026-02-21T09:20:56.0918018Z cvt.rn.bf16x2.f32 %r9734, %r12385, %r12384; 2026-02-21T09:20:56.0918093Z cvt.rn.bf16x2.f32 %r9735, %r12387, %r12386; 2026-02-21T09:20:56.0918171Z cvt.rn.bf16x2.f32 %r9736, %r12389, %r12388; 2026-02-21T09:20:56.0918243Z cvt.rn.bf16x2.f32 %r9737, %r12391, %r12390; 2026-02-21T09:20:56.0918316Z cvt.rn.bf16x2.f32 %r9738, %r12393, %r12392; 2026-02-21T09:20:56.0918397Z cvt.rn.bf16x2.f32 %r9739, %r12395, %r12394; 2026-02-21T09:20:56.0918477Z cvt.rn.bf16x2.f32 %r9740, %r12397, %r12396; 2026-02-21T09:20:56.0918549Z cvt.rn.bf16x2.f32 %r9741, %r12399, %r12398; 2026-02-21T09:20:56.0918692Z cvt.rn.bf16x2.f32 %r9742, %r12401, %r12400; 2026-02-21T09:20:56.0918775Z cvt.rn.bf16x2.f32 %r9743, %r12403, %r12402; 2026-02-21T09:20:56.0918849Z cvt.rn.bf16x2.f32 %r9744, %r12405, %r12404; 2026-02-21T09:20:56.0918923Z cvt.rn.bf16x2.f32 %r9745, %r12407, %r12406; 2026-02-21T09:20:56.0919002Z cvt.rn.bf16x2.f32 %r9746, %r12409, %r12408; 2026-02-21T09:20:56.0919074Z cvt.rn.bf16x2.f32 %r9747, %r12411, %r12410; 2026-02-21T09:20:56.0919148Z cvt.rn.bf16x2.f32 %r9748, %r12413, %r12412; 2026-02-21T09:20:56.0919226Z cvt.rn.bf16x2.f32 %r9749, %r12415, %r12414; 2026-02-21T09:20:56.0919438Z .loc 1 91 43 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:91:43 2026-02-21T09:20:56.0919504Z shl.b32 %r9750, %r9670, 13; 2026-02-21T09:20:56.0919563Z shl.b32 %r9751, %r9671, 13; 2026-02-21T09:20:56.0919625Z shl.b32 %r9752, %r9672, 13; 2026-02-21T09:20:56.0919683Z shl.b32 %r9753, %r9673, 13; 2026-02-21T09:20:56.0919741Z shl.b32 %r9754, %r9674, 13; 2026-02-21T09:20:56.0919802Z shl.b32 %r9755, %r9675, 13; 2026-02-21T09:20:56.0919862Z shl.b32 %r9756, %r9676, 13; 2026-02-21T09:20:56.0919921Z shl.b32 %r9757, %r9677, 13; 2026-02-21T09:20:56.0919979Z shl.b32 %r9758, %r9678, 13; 2026-02-21T09:20:56.0920043Z shl.b32 %r9759, %r9679, 13; 2026-02-21T09:20:56.0920100Z shl.b32 %r9760, %r9680, 13; 2026-02-21T09:20:56.0920158Z shl.b32 %r9761, %r9681, 13; 2026-02-21T09:20:56.0920220Z shl.b32 
%r9762, %r9682, 13; 2026-02-21T09:20:56.0920279Z shl.b32 %r9763, %r9683, 13; 2026-02-21T09:20:56.0920336Z shl.b32 %r9764, %r9684, 13; 2026-02-21T09:20:56.0920393Z shl.b32 %r9765, %r9685, 13; 2026-02-21T09:20:56.0920604Z .loc 1 91 50 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:91:50 2026-02-21T09:20:56.0920669Z add.s32 %r9766, %r9750, %r9669; 2026-02-21T09:20:56.0920730Z add.s32 %r9767, %r9751, %r9669; 2026-02-21T09:20:56.0920793Z add.s32 %r9768, %r9752, %r9669; 2026-02-21T09:20:56.0920852Z add.s32 %r9769, %r9753, %r9669; 2026-02-21T09:20:56.0920913Z add.s32 %r9770, %r9754, %r9669; 2026-02-21T09:20:56.0920976Z add.s32 %r9771, %r9755, %r9669; 2026-02-21T09:20:56.0921048Z add.s32 %r9772, %r9756, %r9669; 2026-02-21T09:20:56.0921116Z add.s32 %r9773, %r9757, %r9669; 2026-02-21T09:20:56.0921178Z add.s32 %r9774, %r9758, %r9669; 2026-02-21T09:20:56.0921240Z add.s32 %r9775, %r9759, %r9669; 2026-02-21T09:20:56.0921298Z add.s32 %r9776, %r9760, %r9669; 2026-02-21T09:20:56.0921427Z add.s32 %r9777, %r9761, %r9669; 2026-02-21T09:20:56.0921492Z add.s32 %r9778, %r9762, %r9669; 2026-02-21T09:20:56.0921551Z add.s32 %r9779, %r9763, %r9669; 2026-02-21T09:20:56.0921610Z add.s32 %r9780, %r9764, %r9669; 2026-02-21T09:20:56.0921671Z add.s32 %r9781, %r9765, %r9669; 2026-02-21T09:20:56.0921924Z .loc 1 91 22 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:91:22 2026-02-21T09:20:56.0921997Z mad.wide.s32 %rd497, %r9766, 2, %rd46; 2026-02-21T09:20:56.0922065Z mad.wide.s32 %rd498, %r9767, 2, %rd46; 2026-02-21T09:20:56.0922136Z mad.wide.s32 %rd499, %r9768, 2, %rd46; 2026-02-21T09:20:56.0922202Z mad.wide.s32 %rd500, %r9769, 2, %rd46; 2026-02-21T09:20:56.0922266Z mad.wide.s32 %rd501, %r9770, 2, %rd46; 2026-02-21T09:20:56.0922385Z mad.wide.s32 %rd502, %r9771, 2, %rd46; 2026-02-21T09:20:56.0922455Z mad.wide.s32 %rd503, %r9772, 2, %rd46; 2026-02-21T09:20:56.0922521Z mad.wide.s32 %rd504, %r9773, 2, %rd46; 2026-02-21T09:20:56.0922586Z mad.wide.s32 %rd505, %r9774, 2, %rd46; 
2026-02-21T09:20:56.0922659Z mad.wide.s32 %rd506, %r9775, 2, %rd46; 2026-02-21T09:20:56.0922723Z mad.wide.s32 %rd507, %r9776, 2, %rd46; 2026-02-21T09:20:56.0922788Z mad.wide.s32 %rd508, %r9777, 2, %rd46; 2026-02-21T09:20:56.0922856Z mad.wide.s32 %rd509, %r9778, 2, %rd46; 2026-02-21T09:20:56.0922922Z mad.wide.s32 %rd510, %r9779, 2, %rd46; 2026-02-21T09:20:56.0922987Z mad.wide.s32 %rd511, %r9780, 2, %rd46; 2026-02-21T09:20:56.0923053Z mad.wide.s32 %rd512, %r9781, 2, %rd46; 2026-02-21T09:20:56.0923310Z .loc 1 91 81 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:91:81 2026-02-21T09:20:56.0923429Z st.shared.v4.b32 [%r105], {%r9686, %r9688, %r9690, %r9692}; 2026-02-21T09:20:56.0923550Z st.shared.v4.b32 [%r105+512], {%r9687, %r9689, %r9691, %r9693}; 2026-02-21T09:20:56.0923660Z st.shared.v4.b32 [%r106], {%r9694, %r9696, %r9698, %r9700}; 2026-02-21T09:20:56.0923773Z st.shared.v4.b32 [%r106+512], {%r9695, %r9697, %r9699, %r9701}; 2026-02-21T09:20:56.0923888Z st.shared.v4.b32 [%r107], {%r9702, %r9704, %r9706, %r9708}; 2026-02-21T09:20:56.0924003Z st.shared.v4.b32 [%r107+512], {%r9703, %r9705, %r9707, %r9709}; 2026-02-21T09:20:56.0924108Z st.shared.v4.b32 [%r108], {%r9710, %r9712, %r9714, %r9716}; 2026-02-21T09:20:56.0924218Z st.shared.v4.b32 [%r108+512], {%r9711, %r9713, %r9715, %r9717}; 2026-02-21T09:20:56.0924277Z bar.sync 0; 2026-02-21T09:20:56.0924337Z // begin inline asm 2026-02-21T09:20:56.0924529Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9525, %r9526, %r9527, %r9528}, [%r3394]; 2026-02-21T09:20:56.0924589Z // end inline asm 2026-02-21T09:20:56.0924647Z // begin inline asm 2026-02-21T09:20:56.0924829Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9530, %r9531, %r9532, %r9533}, [%r3399]; 2026-02-21T09:20:56.0924896Z // end inline asm 2026-02-21T09:20:56.0924959Z // begin inline asm 2026-02-21T09:20:56.0925140Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9535, %r9536, %r9537, %r9538}, [%r3404]; 2026-02-21T09:20:56.0925198Z // end inline asm 
2026-02-21T09:20:56.0925257Z // begin inline asm 2026-02-21T09:20:56.0925435Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9540, %r9541, %r9542, %r9543}, [%r3409]; 2026-02-21T09:20:56.0925493Z // end inline asm 2026-02-21T09:20:56.0925550Z // begin inline asm 2026-02-21T09:20:56.0925730Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9545, %r9546, %r9547, %r9548}, [%r3414]; 2026-02-21T09:20:56.0925786Z // end inline asm 2026-02-21T09:20:56.0925842Z // begin inline asm 2026-02-21T09:20:56.0926021Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9550, %r9551, %r9552, %r9553}, [%r3419]; 2026-02-21T09:20:56.0926076Z // end inline asm 2026-02-21T09:20:56.0926135Z // begin inline asm 2026-02-21T09:20:56.0926316Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9555, %r9556, %r9557, %r9558}, [%r3424]; 2026-02-21T09:20:56.0926371Z // end inline asm 2026-02-21T09:20:56.0926428Z // begin inline asm 2026-02-21T09:20:56.0926721Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9560, %r9561, %r9562, %r9563}, [%r3429]; 2026-02-21T09:20:56.0926871Z // end inline asm 2026-02-21T09:20:56.0926927Z bar.sync 0; 2026-02-21T09:20:56.0927034Z st.shared.v4.b32 [%r105], {%r9718, %r9720, %r9722, %r9724}; 2026-02-21T09:20:56.0927150Z st.shared.v4.b32 [%r105+512], {%r9719, %r9721, %r9723, %r9725}; 2026-02-21T09:20:56.0927349Z st.shared.v4.b32 [%r106], {%r9726, %r9728, %r9730, %r9732}; 2026-02-21T09:20:56.0927460Z st.shared.v4.b32 [%r106+512], {%r9727, %r9729, %r9731, %r9733}; 2026-02-21T09:20:56.0927565Z st.shared.v4.b32 [%r107], {%r9734, %r9736, %r9738, %r9740}; 2026-02-21T09:20:56.0927675Z st.shared.v4.b32 [%r107+512], {%r9735, %r9737, %r9739, %r9741}; 2026-02-21T09:20:56.0927777Z st.shared.v4.b32 [%r108], {%r9742, %r9744, %r9746, %r9748}; 2026-02-21T09:20:56.0927952Z st.shared.v4.b32 [%r108+512], {%r9743, %r9745, %r9747, %r9749}; 2026-02-21T09:20:56.0928018Z bar.sync 0; 2026-02-21T09:20:56.0928077Z // begin inline asm 2026-02-21T09:20:56.0928255Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9565, %r9566, 
%r9567, %r9568}, [%r3394]; 2026-02-21T09:20:56.0928317Z // end inline asm 2026-02-21T09:20:56.0928376Z // begin inline asm 2026-02-21T09:20:56.0928552Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9570, %r9571, %r9572, %r9573}, [%r3399]; 2026-02-21T09:20:56.0928614Z // end inline asm 2026-02-21T09:20:56.0928670Z // begin inline asm 2026-02-21T09:20:56.0928846Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9575, %r9576, %r9577, %r9578}, [%r3404]; 2026-02-21T09:20:56.0928902Z // end inline asm 2026-02-21T09:20:56.0928961Z // begin inline asm 2026-02-21T09:20:56.0929206Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9580, %r9581, %r9582, %r9583}, [%r3409]; 2026-02-21T09:20:56.0929266Z // end inline asm 2026-02-21T09:20:56.0929327Z // begin inline asm 2026-02-21T09:20:56.0929511Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9585, %r9586, %r9587, %r9588}, [%r3414]; 2026-02-21T09:20:56.0929567Z // end inline asm 2026-02-21T09:20:56.0929624Z // begin inline asm 2026-02-21T09:20:56.0929807Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9590, %r9591, %r9592, %r9593}, [%r3419]; 2026-02-21T09:20:56.0929864Z // end inline asm 2026-02-21T09:20:56.0929921Z // begin inline asm 2026-02-21T09:20:56.0930101Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9595, %r9596, %r9597, %r9598}, [%r3424]; 2026-02-21T09:20:56.0930158Z // end inline asm 2026-02-21T09:20:56.0930215Z // begin inline asm 2026-02-21T09:20:56.0930392Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9600, %r9601, %r9602, %r9603}, [%r3429]; 2026-02-21T09:20:56.0930447Z // end inline asm 2026-02-21T09:20:56.0930504Z // begin inline asm 2026-02-21T09:20:56.0930629Z st.global.v4.b32 [ %rd497 + 0 ], { %r9525, %r9526, %r9527, %r9528 }; 2026-02-21T09:20:56.0930690Z // end inline asm 2026-02-21T09:20:56.0930746Z // begin inline asm 2026-02-21T09:20:56.0930864Z st.global.v4.b32 [ %rd498 + 0 ], { %r9530, %r9531, %r9532, %r9533 }; 2026-02-21T09:20:56.0930921Z // end inline asm 2026-02-21T09:20:56.0930979Z // begin inline asm 
2026-02-21T09:20:56.0931094Z st.global.v4.b32 [ %rd499 + 0 ], { %r9535, %r9536, %r9537, %r9538 }; 2026-02-21T09:20:56.0931150Z // end inline asm 2026-02-21T09:20:56.0931208Z // begin inline asm 2026-02-21T09:20:56.0931322Z st.global.v4.b32 [ %rd500 + 0 ], { %r9540, %r9541, %r9542, %r9543 }; 2026-02-21T09:20:56.0931378Z // end inline asm 2026-02-21T09:20:56.0931436Z // begin inline asm 2026-02-21T09:20:56.0931547Z st.global.v4.b32 [ %rd501 + 0 ], { %r9545, %r9546, %r9547, %r9548 }; 2026-02-21T09:20:56.0931603Z // end inline asm 2026-02-21T09:20:56.0931662Z // begin inline asm 2026-02-21T09:20:56.0931774Z st.global.v4.b32 [ %rd502 + 0 ], { %r9550, %r9551, %r9552, %r9553 }; 2026-02-21T09:20:56.0931829Z // end inline asm 2026-02-21T09:20:56.0931886Z // begin inline asm 2026-02-21T09:20:56.0932002Z st.global.v4.b32 [ %rd503 + 0 ], { %r9555, %r9556, %r9557, %r9558 }; 2026-02-21T09:20:56.0932058Z // end inline asm 2026-02-21T09:20:56.0932115Z // begin inline asm 2026-02-21T09:20:56.0932230Z st.global.v4.b32 [ %rd504 + 0 ], { %r9560, %r9561, %r9562, %r9563 }; 2026-02-21T09:20:56.0932345Z // end inline asm 2026-02-21T09:20:56.0932401Z // begin inline asm 2026-02-21T09:20:56.0932512Z st.global.v4.b32 [ %rd505 + 0 ], { %r9565, %r9566, %r9567, %r9568 }; 2026-02-21T09:20:56.0932570Z // end inline asm 2026-02-21T09:20:56.0932677Z // begin inline asm 2026-02-21T09:20:56.0932801Z st.global.v4.b32 [ %rd506 + 0 ], { %r9570, %r9571, %r9572, %r9573 }; 2026-02-21T09:20:56.0932861Z // end inline asm 2026-02-21T09:20:56.0932917Z // begin inline asm 2026-02-21T09:20:56.0933032Z st.global.v4.b32 [ %rd507 + 0 ], { %r9575, %r9576, %r9577, %r9578 }; 2026-02-21T09:20:56.0933089Z // end inline asm 2026-02-21T09:20:56.0933146Z // begin inline asm 2026-02-21T09:20:56.0933258Z st.global.v4.b32 [ %rd508 + 0 ], { %r9580, %r9581, %r9582, %r9583 }; 2026-02-21T09:20:56.0933359Z // end inline asm 2026-02-21T09:20:56.0933420Z // begin inline asm 2026-02-21T09:20:56.0933532Z st.global.v4.b32 [ %rd509 + 0 
], { %r9585, %r9586, %r9587, %r9588 }; 2026-02-21T09:20:56.0933589Z // end inline asm 2026-02-21T09:20:56.0933647Z // begin inline asm 2026-02-21T09:20:56.0933759Z st.global.v4.b32 [ %rd510 + 0 ], { %r9590, %r9591, %r9592, %r9593 }; 2026-02-21T09:20:56.0933814Z // end inline asm 2026-02-21T09:20:56.0933872Z // begin inline asm 2026-02-21T09:20:56.0933988Z st.global.v4.b32 [ %rd511 + 0 ], { %r9595, %r9596, %r9597, %r9598 }; 2026-02-21T09:20:56.0934044Z // end inline asm 2026-02-21T09:20:56.0934100Z // begin inline asm 2026-02-21T09:20:56.0934213Z st.global.v4.b32 [ %rd512 + 0 ], { %r9600, %r9601, %r9602, %r9603 }; 2026-02-21T09:20:56.0934321Z // end inline asm 2026-02-21T09:20:56.0934549Z .loc 1 22 121 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:22:121 2026-02-21T09:20:56.0934615Z add.s32 %r11887, %r11887, 4; 2026-02-21T09:20:56.0934688Z setp.lt.s32 %p54, %r11887, %r12416; 2026-02-21T09:20:56.0934749Z @%p54 bra $L__BB0_2; 2026-02-21T09:20:56.0934848Z $L__BB0_11: // %.preheader 2026-02-21T09:20:56.0934921Z setp.ge.s32 %p55, %r12416, %r2; 2026-02-21T09:20:56.0934981Z @%p55 bra $L__BB0_16; 2026-02-21T09:20:56.0935066Z // %bb.12: // %.lr.ph112 2026-02-21T09:20:56.0935289Z .loc 1 0 121 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:0:121 2026-02-21T09:20:56.0935357Z and.b32 %r9784, %r11868, 136; 2026-02-21T09:20:56.0935420Z xor.b32 %r123, %r9784, %r11867; 2026-02-21T09:20:56.0935485Z add.s32 %r124, %r11869, %r123; 2026-02-21T09:20:56.0935546Z add.s32 %r9827, %r124, 2048; 2026-02-21T09:20:56.0935607Z add.s32 %r9829, %r124, 4096; 2026-02-21T09:20:56.0935666Z add.s32 %r9831, %r124, 6144; 2026-02-21T09:20:56.0935732Z add.s32 %r9787, %r11869, %r11871; 2026-02-21T09:20:56.0935794Z add.s32 %r129, %r9787, 90112; 2026-02-21T09:20:56.0935854Z add.s32 %r9835, %r124, 40960; 2026-02-21T09:20:56.0935915Z add.s32 %r9837, %r124, 43008; 2026-02-21T09:20:56.0935974Z add.s32 %r9839, %r124, 45056; 2026-02-21T09:20:56.0936033Z add.s32 %r9841, %r124, 47104; 
2026-02-21T09:20:56.0936091Z or.b32 %r134, %r11870, 65536; 2026-02-21T09:20:56.0936152Z add.s32 %r135, %r9787, 95232; 2026-02-21T09:20:56.0936212Z add.s32 %r9845, %r124, 8192; 2026-02-21T09:20:56.0936281Z add.s32 %r9847, %r124, 10240; 2026-02-21T09:20:56.0936346Z add.s32 %r9849, %r124, 12288; 2026-02-21T09:20:56.0936405Z add.s32 %r9851, %r124, 14336; 2026-02-21T09:20:56.0936588Z or.b32 %r140, %r11870, 131072; 2026-02-21T09:20:56.0936653Z add.s32 %r9853, %r9787, 91136; 2026-02-21T09:20:56.0936714Z add.s32 %r9855, %r124, 49152; 2026-02-21T09:20:56.0936773Z add.s32 %r9857, %r124, 51200; 2026-02-21T09:20:56.0936831Z add.s32 %r9859, %r124, 53248; 2026-02-21T09:20:56.0936892Z add.s32 %r9861, %r124, 55296; 2026-02-21T09:20:56.0936952Z or.b32 %r146, %r11870, 196608; 2026-02-21T09:20:56.0937012Z add.s32 %r9863, %r9787, 96256; 2026-02-21T09:20:56.0937072Z add.s32 %r9865, %r124, 16384; 2026-02-21T09:20:56.0937133Z add.s32 %r9867, %r124, 18432; 2026-02-21T09:20:56.0937282Z add.s32 %r9869, %r124, 20480; 2026-02-21T09:20:56.0937342Z add.s32 %r9871, %r124, 22528; 2026-02-21T09:20:56.0937405Z or.b32 %r152, %r11870, 262144; 2026-02-21T09:20:56.0937464Z add.s32 %r9873, %r9787, 92160; 2026-02-21T09:20:56.0937584Z add.s32 %r9875, %r124, 57344; 2026-02-21T09:20:56.0937645Z add.s32 %r9877, %r124, 59392; 2026-02-21T09:20:56.0937703Z add.s32 %r9879, %r124, 61440; 2026-02-21T09:20:56.0937762Z add.s32 %r9881, %r124, 63488; 2026-02-21T09:20:56.0937821Z or.b32 %r158, %r11870, 327680; 2026-02-21T09:20:56.0937888Z add.s32 %r9883, %r9787, 97280; 2026-02-21T09:20:56.0937946Z add.s32 %r9885, %r124, 24576; 2026-02-21T09:20:56.0938004Z add.s32 %r9887, %r124, 26624; 2026-02-21T09:20:56.0938064Z add.s32 %r9889, %r124, 28672; 2026-02-21T09:20:56.0938192Z add.s32 %r9891, %r124, 30720; 2026-02-21T09:20:56.0938256Z or.b32 %r164, %r11870, 393216; 2026-02-21T09:20:56.0938325Z add.s32 %r9893, %r9787, 93184; 2026-02-21T09:20:56.0938388Z add.s32 %r9895, %r124, 65536; 2026-02-21T09:20:56.0938449Z add.s32 
%r9897, %r124, 67584; 2026-02-21T09:20:56.0938507Z add.s32 %r9899, %r124, 69632; 2026-02-21T09:20:56.0938568Z add.s32 %r9901, %r124, 71680; 2026-02-21T09:20:56.0938626Z or.b32 %r170, %r11870, 458752; 2026-02-21T09:20:56.0938685Z add.s32 %r9903, %r9787, 98304; 2026-02-21T09:20:56.0938743Z add.s32 %r9905, %r124, 32768; 2026-02-21T09:20:56.0938803Z add.s32 %r9907, %r124, 34816; 2026-02-21T09:20:56.0938860Z add.s32 %r9909, %r124, 36864; 2026-02-21T09:20:56.0938918Z add.s32 %r9911, %r124, 38912; 2026-02-21T09:20:56.0939047Z or.b32 %r176, %r11870, 524288; 2026-02-21T09:20:56.0939111Z add.s32 %r9913, %r9787, 94208; 2026-02-21T09:20:56.0939169Z add.s32 %r9915, %r124, 73728; 2026-02-21T09:20:56.0939227Z add.s32 %r9917, %r124, 75776; 2026-02-21T09:20:56.0939289Z add.s32 %r9919, %r124, 77824; 2026-02-21T09:20:56.0939347Z add.s32 %r9921, %r124, 79872; 2026-02-21T09:20:56.0939404Z or.b32 %r182, %r11870, 589824; 2026-02-21T09:20:56.0939469Z add.s32 %r9923, %r9787, 99328; 2026-02-21T09:20:56.0939531Z or.b32 %r9791, %r11872, %r11873; 2026-02-21T09:20:56.0939592Z or.b32 %r9792, %r9791, %r11874; 2026-02-21T09:20:56.0939654Z or.b32 %r184, %r9792, %r9784; 2026-02-21T09:20:56.0939715Z xor.b32 %r185, %r184, 8; 2026-02-21T09:20:56.0939778Z shl.b32 %r9793, %r11875, 6; 2026-02-21T09:20:56.0939838Z or.b32 %r9796, %r9793, %r11878; 2026-02-21T09:20:56.0939901Z or.b32 %r9797, %r9796, %r11877; 2026-02-21T09:20:56.0939960Z add.s32 %r9798, %r11869, 81920; 2026-02-21T09:20:56.0940020Z add.s32 %r188, %r9798, %r9797; 2026-02-21T09:20:56.0940082Z xor.b32 %r9799, %r9797, 16; 2026-02-21T09:20:56.0940143Z add.s32 %r189, %r9798, %r9799; 2026-02-21T09:20:56.0940202Z xor.b32 %r9800, %r9797, 32; 2026-02-21T09:20:56.0940263Z add.s32 %r190, %r9798, %r9800; 2026-02-21T09:20:56.0940324Z xor.b32 %r9801, %r9797, 48; 2026-02-21T09:20:56.0940385Z add.s32 %r191, %r9798, %r9801; 2026-02-21T09:20:56.0940445Z bfe.u32 %r9802, %r9798, 4, 14; 2026-02-21T09:20:56.0940510Z cvt.u64.u32 %rd513, %r9802; 
2026-02-21T09:20:56.0940588Z or.b64 %rd4, %rd513, -9223371899382267904; 2026-02-21T09:20:56.0940651Z add.s32 %r9803, %r11869, 81952; 2026-02-21T09:20:56.0940710Z bfe.u32 %r9804, %r9803, 4, 14; 2026-02-21T09:20:56.0940784Z cvt.u64.u32 %rd514, %r9804; 2026-02-21T09:20:56.0940867Z or.b64 %rd5, %rd514, -9223371899382267904; 2026-02-21T09:20:56.0940927Z and.b32 %r9807, %r11880, 7264; 2026-02-21T09:20:56.0940988Z shl.b32 %r9809, %r11881, 4; 2026-02-21T09:20:56.0941049Z or.b32 %r9811, %r11879, %r11882; 2026-02-21T09:20:56.0941109Z or.b32 %r9812, %r9807, %r9809; 2026-02-21T09:20:56.0941171Z or.b32 %r9813, %r9811, %r9812; 2026-02-21T09:20:56.0941231Z add.s32 %r192, %r11869, %r9813; 2026-02-21T09:20:56.0941290Z xor.b32 %r9814, %r9813, 32; 2026-02-21T09:20:56.0941360Z add.s32 %r193, %r11869, %r9814; 2026-02-21T09:20:56.0941424Z xor.b32 %r9815, %r9813, 64; 2026-02-21T09:20:56.0941485Z add.s32 %r194, %r11869, %r9815; 2026-02-21T09:20:56.0941600Z xor.b32 %r9816, %r9813, 96; 2026-02-21T09:20:56.0941663Z add.s32 %r195, %r11869, %r9816; 2026-02-21T09:20:56.0941732Z shl.b32 %r9817, %r11881, 10; 2026-02-21T09:20:56.0941794Z or.b32 %r9820, %r9817, %r11883; 2026-02-21T09:20:56.0941855Z xor.b32 %r9821, %r9820, %r11884; 2026-02-21T09:20:56.0941969Z add.s32 %r11614, %r11869, %r9821; 2026-02-21T09:20:56.0942029Z add.s32 %r11619, %r11614, 1024; 2026-02-21T09:20:56.0942087Z add.s32 %r11624, %r11614, 2048; 2026-02-21T09:20:56.0942148Z add.s32 %r11629, %r11614, 3072; 2026-02-21T09:20:56.0942207Z add.s32 %r11634, %r11614, 4096; 2026-02-21T09:20:56.0942268Z add.s32 %r11639, %r11614, 5120; 2026-02-21T09:20:56.0942330Z add.s32 %r11644, %r11614, 6144; 2026-02-21T09:20:56.0942389Z add.s32 %r11649, %r11614, 7168; 2026-02-21T09:20:56.0942665Z .loc 1 43 78 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:43:78 2026-02-21T09:20:56.0942728Z or.b32 %r9823, %r11885, %r7; 2026-02-21T09:20:56.0942791Z or.b32 %r204, %r9823, 720896; 2026-02-21T09:20:56.0943010Z .loc 1 22 121 // 
czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:22:121 2026-02-21T09:20:56.0943078Z mad.wide.u32 %rd6, %r33, 8, %rd44; 2026-02-21T09:20:56.0943192Z $L__BB0_13: // =>This Loop Header: Depth=1 2026-02-21T09:20:56.0943289Z // Child Loop BB0_14 Depth 2 2026-02-21T09:20:56.0943495Z .loc 1 28 35 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:28:35 2026-02-21T09:20:56.0943605Z shr.s32 %r9928, %r12416, 31; 2026-02-21T09:20:56.0943668Z shr.u32 %r9929, %r9928, 23; 2026-02-21T09:20:56.0943731Z add.s32 %r9930, %r12416, %r9929; 2026-02-21T09:20:56.0946886Z .loc 1 31 45 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:31:45 2026-02-21T09:20:56.0946983Z and.b32 %r9931, %r9930, 65024; 2026-02-21T09:20:56.0947052Z sub.s32 %r9932, %r12416, %r9931; 2026-02-21T09:20:56.0947296Z .loc 1 31 64 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:31:64 2026-02-21T09:20:56.0947365Z cvt.u16.u32 %rs449, %r9932; 2026-02-21T09:20:56.0947591Z .loc 1 32 51 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:32:51 2026-02-21T09:20:56.0947660Z shr.s16 %rs450, %rs449, 15; 2026-02-21T09:20:56.0947724Z shr.u16 %rs451, %rs450, 13; 2026-02-21T09:20:56.0947790Z add.s16 %rs452, %rs449, %rs451; 2026-02-21T09:20:56.0947852Z shr.s16 %rs453, %rs452, 3; 2026-02-21T09:20:56.0948071Z .loc 1 31 64 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:31:64 2026-02-21T09:20:56.0948141Z and.b16 %rs454, %rs452, -8; 2026-02-21T09:20:56.0948205Z sub.s16 %rs455, %rs449, %rs454; 2026-02-21T09:20:56.0948484Z .loc 1 32 51 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:32:51 2026-02-21T09:20:56.0948549Z cvt.u32.u16 %r9933, %rs453; 2026-02-21T09:20:56.0948753Z .loc 1 33 27 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:33:27 2026-02-21T09:20:56.0948819Z shl.b32 %r9934, %r9930, 1; 2026-02-21T09:20:56.0948885Z and.b32 %r9935, %r9934, -1024; 2026-02-21T09:20:56.0948953Z mul.wide.s16 %r9936, %rs455, 128; 2026-02-21T09:20:56.0949016Z add.s32 %r1285, %r9936, 
%r9935; 2026-02-21T09:20:56.0949223Z .loc 1 34 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:34:32 2026-02-21T09:20:56.0949283Z or.b32 %r9937, %r1285, %r7; 2026-02-21T09:20:56.0949484Z .loc 1 35 27 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:35:27 2026-02-21T09:20:56.0949558Z mul.wide.s16 %r1286, %rs453, 256; 2026-02-21T09:20:56.0949758Z .loc 1 36 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:36:32 2026-02-21T09:20:56.0949819Z or.b32 %r9938, %r1286, %r11; 2026-02-21T09:20:56.0949879Z or.b32 %r9939, %r1286, %r12; 2026-02-21T09:20:56.0949936Z or.b32 %r9940, %r1286, %r13; 2026-02-21T09:20:56.0950160Z or.b32 %r9941, %r1286, %r14; 2026-02-21T09:20:56.0950364Z .loc 1 51 53 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:53 2026-02-21T09:20:56.0950428Z shl.b32 %r9942, %r9938, 10; 2026-02-21T09:20:56.0950553Z shl.b32 %r9943, %r9939, 10; 2026-02-21T09:20:56.0950611Z shl.b32 %r9944, %r9940, 10; 2026-02-21T09:20:56.0950671Z shl.b32 %r9945, %r9941, 10; 2026-02-21T09:20:56.0950871Z .loc 1 51 60 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:60 2026-02-21T09:20:56.0950935Z or.b32 %r9946, %r9942, %r34; 2026-02-21T09:20:56.0950997Z or.b32 %r9947, %r9943, %r34; 2026-02-21T09:20:56.0951054Z or.b32 %r9948, %r9944, %r34; 2026-02-21T09:20:56.0951113Z or.b32 %r9949, %r9945, %r34; 2026-02-21T09:20:56.0951386Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0951474Z mad.wide.s32 %rd515, %r9946, 2, %rd44; 2026-02-21T09:20:56.0951548Z mad.wide.s32 %rd516, %r9947, 2, %rd44; 2026-02-21T09:20:56.0951615Z mad.wide.s32 %rd517, %r9948, 2, %rd44; 2026-02-21T09:20:56.0951683Z mad.wide.s32 %rd518, %r9949, 2, %rd44; 2026-02-21T09:20:56.0951887Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0951947Z bar.sync 0; 2026-02-21T09:20:56.0952005Z mov.b32 %r9826, 8; 2026-02-21T09:20:56.0952079Z // begin inline asm 
2026-02-21T09:20:56.0952226Z cp.async.ca.shared.global [ %r124 + 0 ], [ %rd515 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0952349Z // end inline asm 2026-02-21T09:20:56.0952416Z // begin inline asm 2026-02-21T09:20:56.0952559Z cp.async.ca.shared.global [ %r9827 + 0 ], [ %rd516 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0952615Z // end inline asm 2026-02-21T09:20:56.0952675Z // begin inline asm 2026-02-21T09:20:56.0952809Z cp.async.ca.shared.global [ %r9829 + 0 ], [ %rd517 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0952866Z // end inline asm 2026-02-21T09:20:56.0952924Z // begin inline asm 2026-02-21T09:20:56.0953059Z cp.async.ca.shared.global [ %r9831 + 0 ], [ %rd518 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0953123Z // end inline asm 2026-02-21T09:20:56.0953191Z cp.async.commit_group; 2026-02-21T09:20:56.0953407Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0953472Z add.s32 %r9950, %r9937, %r11870; 2026-02-21T09:20:56.0953673Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0953741Z cvt.s64.s32 %rd566, %r9950; 2026-02-21T09:20:56.0953808Z add.s64 %rd519, %rd45, %rd566; 2026-02-21T09:20:56.0953868Z mov.b32 %r12419, 4; 2026-02-21T09:20:56.0954069Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0954134Z // begin inline asm 2026-02-21T09:20:56.0954268Z cp.async.ca.shared.global [ %r129 + 0 ], [ %rd519 + 0 ], 0x4, %r12419; 2026-02-21T09:20:56.0954324Z // end inline asm 2026-02-21T09:20:56.0954390Z cp.async.commit_group; 2026-02-21T09:20:56.0954588Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0954650Z cvt.s64.s32 %rd567, %r9942; 2026-02-21T09:20:56.0954717Z or.b64 %rd569, %rd567, %rd633; 2026-02-21T09:20:56.0954780Z shl.b64 %rd570, %rd569, 1; 2026-02-21T09:20:56.0954842Z add.s64 %rd571, %rd44, %rd570; 2026-02-21T09:20:56.0954903Z add.s64 %rd520, %rd571, 32; 
2026-02-21T09:20:56.0954965Z cvt.s64.s32 %rd572, %r9943; 2026-02-21T09:20:56.0955026Z or.b64 %rd573, %rd572, %rd633; 2026-02-21T09:20:56.0955086Z shl.b64 %rd574, %rd573, 1; 2026-02-21T09:20:56.0955148Z add.s64 %rd575, %rd44, %rd574; 2026-02-21T09:20:56.0955209Z add.s64 %rd521, %rd575, 32; 2026-02-21T09:20:56.0955268Z cvt.s64.s32 %rd576, %r9944; 2026-02-21T09:20:56.0955328Z or.b64 %rd577, %rd576, %rd633; 2026-02-21T09:20:56.0955390Z shl.b64 %rd578, %rd577, 1; 2026-02-21T09:20:56.0955524Z add.s64 %rd579, %rd44, %rd578; 2026-02-21T09:20:56.0955583Z add.s64 %rd522, %rd579, 32; 2026-02-21T09:20:56.0955644Z cvt.s64.s32 %rd580, %r9945; 2026-02-21T09:20:56.0955704Z or.b64 %rd581, %rd580, %rd633; 2026-02-21T09:20:56.0955764Z shl.b64 %rd582, %rd581, 1; 2026-02-21T09:20:56.0955878Z add.s64 %rd583, %rd44, %rd582; 2026-02-21T09:20:56.0955940Z add.s64 %rd523, %rd583, 32; 2026-02-21T09:20:56.0956140Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0956196Z // begin inline asm 2026-02-21T09:20:56.0956332Z cp.async.ca.shared.global [ %r9835 + 0 ], [ %rd520 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0956387Z // end inline asm 2026-02-21T09:20:56.0956443Z // begin inline asm 2026-02-21T09:20:56.0956821Z cp.async.ca.shared.global [ %r9837 + 0 ], [ %rd521 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0956886Z // end inline asm 2026-02-21T09:20:56.0956945Z // begin inline asm 2026-02-21T09:20:56.0957079Z cp.async.ca.shared.global [ %r9839 + 0 ], [ %rd522 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0957135Z // end inline asm 2026-02-21T09:20:56.0957190Z // begin inline asm 2026-02-21T09:20:56.0957320Z cp.async.ca.shared.global [ %r9841 + 0 ], [ %rd523 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0957378Z // end inline asm 2026-02-21T09:20:56.0957441Z cp.async.commit_group; 2026-02-21T09:20:56.0957644Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0957706Z add.s32 %r9951, %r134, %r9937; 
2026-02-21T09:20:56.0957970Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0958032Z cvt.s64.s32 %rd584, %r9951; 2026-02-21T09:20:56.0958092Z add.s64 %rd524, %rd45, %rd584; 2026-02-21T09:20:56.0958295Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0958364Z // begin inline asm 2026-02-21T09:20:56.0958501Z cp.async.ca.shared.global [ %r135 + 0 ], [ %rd524 + 0 ], 0x4, %r12419; 2026-02-21T09:20:56.0958559Z // end inline asm 2026-02-21T09:20:56.0958622Z cp.async.commit_group; 2026-02-21T09:20:56.0958819Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0958880Z add.s64 %rd525, %rd571, 64; 2026-02-21T09:20:56.0958940Z add.s64 %rd526, %rd575, 64; 2026-02-21T09:20:56.0958998Z add.s64 %rd527, %rd579, 64; 2026-02-21T09:20:56.0959058Z add.s64 %rd528, %rd583, 64; 2026-02-21T09:20:56.0959262Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0959316Z bar.sync 0; 2026-02-21T09:20:56.0959374Z // begin inline asm 2026-02-21T09:20:56.0959513Z cp.async.ca.shared.global [ %r9845 + 0 ], [ %rd525 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0959574Z // end inline asm 2026-02-21T09:20:56.0959632Z // begin inline asm 2026-02-21T09:20:56.0959763Z cp.async.ca.shared.global [ %r9847 + 0 ], [ %rd526 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0959821Z // end inline asm 2026-02-21T09:20:56.0959879Z // begin inline asm 2026-02-21T09:20:56.0960007Z cp.async.ca.shared.global [ %r9849 + 0 ], [ %rd527 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0960065Z // end inline asm 2026-02-21T09:20:56.0960121Z // begin inline asm 2026-02-21T09:20:56.0960249Z cp.async.ca.shared.global [ %r9851 + 0 ], [ %rd528 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0960315Z // end inline asm 2026-02-21T09:20:56.0960383Z cp.async.commit_group; 2026-02-21T09:20:56.0960586Z .loc 1 57 62 // 
czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0960647Z add.s32 %r9952, %r140, %r9937; 2026-02-21T09:20:56.0960849Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0960909Z cvt.s64.s32 %rd585, %r9952; 2026-02-21T09:20:56.0960970Z add.s64 %rd529, %rd45, %rd585; 2026-02-21T09:20:56.0961268Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0961329Z // begin inline asm 2026-02-21T09:20:56.0961471Z cp.async.ca.shared.global [ %r9853 + 0 ], [ %rd529 + 0 ], 0x4, %r12419; 2026-02-21T09:20:56.0961590Z // end inline asm 2026-02-21T09:20:56.0961656Z cp.async.commit_group; 2026-02-21T09:20:56.0961862Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0961925Z add.s64 %rd530, %rd571, 96; 2026-02-21T09:20:56.0961990Z add.s64 %rd531, %rd575, 96; 2026-02-21T09:20:56.0962051Z add.s64 %rd532, %rd579, 96; 2026-02-21T09:20:56.0962110Z add.s64 %rd533, %rd583, 96; 2026-02-21T09:20:56.0962361Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0962421Z // begin inline asm 2026-02-21T09:20:56.0962558Z cp.async.ca.shared.global [ %r9855 + 0 ], [ %rd530 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0962615Z // end inline asm 2026-02-21T09:20:56.0962675Z // begin inline asm 2026-02-21T09:20:56.0962806Z cp.async.ca.shared.global [ %r9857 + 0 ], [ %rd531 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0962861Z // end inline asm 2026-02-21T09:20:56.0962921Z // begin inline asm 2026-02-21T09:20:56.0963049Z cp.async.ca.shared.global [ %r9859 + 0 ], [ %rd532 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0963106Z // end inline asm 2026-02-21T09:20:56.0963165Z // begin inline asm 2026-02-21T09:20:56.0963292Z cp.async.ca.shared.global [ %r9861 + 0 ], [ %rd533 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0963391Z // end inline asm 2026-02-21T09:20:56.0963457Z cp.async.commit_group; 
2026-02-21T09:20:56.0963680Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0963747Z add.s32 %r9953, %r146, %r9937; 2026-02-21T09:20:56.0963949Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0964017Z cvt.s64.s32 %rd586, %r9953; 2026-02-21T09:20:56.0964080Z add.s64 %rd534, %rd45, %rd586; 2026-02-21T09:20:56.0964280Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0964342Z // begin inline asm 2026-02-21T09:20:56.0964484Z cp.async.ca.shared.global [ %r9863 + 0 ], [ %rd534 + 0 ], 0x4, %r12419; 2026-02-21T09:20:56.0964538Z // end inline asm 2026-02-21T09:20:56.0964603Z cp.async.commit_group; 2026-02-21T09:20:56.0964808Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0964872Z add.s64 %rd535, %rd571, 128; 2026-02-21T09:20:56.0964931Z add.s64 %rd536, %rd575, 128; 2026-02-21T09:20:56.0964993Z add.s64 %rd537, %rd579, 128; 2026-02-21T09:20:56.0965054Z add.s64 %rd538, %rd583, 128; 2026-02-21T09:20:56.0965252Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0965310Z bar.sync 0; 2026-02-21T09:20:56.0965369Z // begin inline asm 2026-02-21T09:20:56.0965502Z cp.async.ca.shared.global [ %r9865 + 0 ], [ %rd535 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0965555Z // end inline asm 2026-02-21T09:20:56.0965614Z // begin inline asm 2026-02-21T09:20:56.0965745Z cp.async.ca.shared.global [ %r9867 + 0 ], [ %rd536 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0965800Z // end inline asm 2026-02-21T09:20:56.0965857Z // begin inline asm 2026-02-21T09:20:56.0965984Z cp.async.ca.shared.global [ %r9869 + 0 ], [ %rd537 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0966041Z // end inline asm 2026-02-21T09:20:56.0966097Z // begin inline asm 2026-02-21T09:20:56.0966230Z cp.async.ca.shared.global [ %r9871 + 0 ], [ %rd538 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0966285Z 
// end inline asm 2026-02-21T09:20:56.0966353Z cp.async.commit_group; 2026-02-21T09:20:56.0966690Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0966846Z add.s32 %r9954, %r152, %r9937; 2026-02-21T09:20:56.0967050Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0967114Z cvt.s64.s32 %rd587, %r9954; 2026-02-21T09:20:56.0967176Z add.s64 %rd539, %rd45, %rd587; 2026-02-21T09:20:56.0967434Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0967493Z // begin inline asm 2026-02-21T09:20:56.0967632Z cp.async.ca.shared.global [ %r9873 + 0 ], [ %rd539 + 0 ], 0x4, %r12419; 2026-02-21T09:20:56.0967689Z // end inline asm 2026-02-21T09:20:56.0967754Z cp.async.commit_group; 2026-02-21T09:20:56.0967966Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0968090Z add.s64 %rd540, %rd571, 160; 2026-02-21T09:20:56.0968165Z add.s64 %rd541, %rd575, 160; 2026-02-21T09:20:56.0968231Z add.s64 %rd542, %rd579, 160; 2026-02-21T09:20:56.0968291Z add.s64 %rd543, %rd583, 160; 2026-02-21T09:20:56.0968506Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0968565Z // begin inline asm 2026-02-21T09:20:56.0968709Z cp.async.ca.shared.global [ %r9875 + 0 ], [ %rd540 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0968766Z // end inline asm 2026-02-21T09:20:56.0968823Z // begin inline asm 2026-02-21T09:20:56.0968956Z cp.async.ca.shared.global [ %r9877 + 0 ], [ %rd541 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0969010Z // end inline asm 2026-02-21T09:20:56.0969067Z // begin inline asm 2026-02-21T09:20:56.0969278Z cp.async.ca.shared.global [ %r9879 + 0 ], [ %rd542 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0969336Z // end inline asm 2026-02-21T09:20:56.0969393Z // begin inline asm 2026-02-21T09:20:56.0969524Z cp.async.ca.shared.global [ %r9881 + 0 ], [ %rd543 + 0 ], 0x8, %r9826; 
2026-02-21T09:20:56.0969582Z // end inline asm 2026-02-21T09:20:56.0969648Z cp.async.commit_group; 2026-02-21T09:20:56.0969854Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0969918Z add.s32 %r9955, %r158, %r9937; 2026-02-21T09:20:56.0970116Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0970181Z cvt.s64.s32 %rd588, %r9955; 2026-02-21T09:20:56.0970250Z add.s64 %rd544, %rd45, %rd588; 2026-02-21T09:20:56.0970461Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0970518Z // begin inline asm 2026-02-21T09:20:56.0970658Z cp.async.ca.shared.global [ %r9883 + 0 ], [ %rd544 + 0 ], 0x4, %r12419; 2026-02-21T09:20:56.0970717Z // end inline asm 2026-02-21T09:20:56.0970781Z cp.async.commit_group; 2026-02-21T09:20:56.0970998Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0971066Z add.s64 %rd545, %rd571, 192; 2026-02-21T09:20:56.0971129Z add.s64 %rd546, %rd575, 192; 2026-02-21T09:20:56.0971188Z add.s64 %rd547, %rd579, 192; 2026-02-21T09:20:56.0971249Z add.s64 %rd548, %rd583, 192; 2026-02-21T09:20:56.0971458Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0971516Z bar.sync 0; 2026-02-21T09:20:56.0971574Z // begin inline asm 2026-02-21T09:20:56.0971715Z cp.async.ca.shared.global [ %r9885 + 0 ], [ %rd545 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0971770Z // end inline asm 2026-02-21T09:20:56.0971827Z // begin inline asm 2026-02-21T09:20:56.0971967Z cp.async.ca.shared.global [ %r9887 + 0 ], [ %rd546 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0972022Z // end inline asm 2026-02-21T09:20:56.0972090Z // begin inline asm 2026-02-21T09:20:56.0972228Z cp.async.ca.shared.global [ %r9889 + 0 ], [ %rd547 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0972292Z // end inline asm 2026-02-21T09:20:56.0972350Z // begin inline asm 2026-02-21T09:20:56.0972478Z 
cp.async.ca.shared.global [ %r9891 + 0 ], [ %rd548 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0972597Z // end inline asm 2026-02-21T09:20:56.0972664Z cp.async.commit_group; 2026-02-21T09:20:56.0972871Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0972987Z add.s32 %r9956, %r164, %r9937; 2026-02-21T09:20:56.0973189Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0973265Z cvt.s64.s32 %rd589, %r9956; 2026-02-21T09:20:56.0973329Z add.s64 %rd549, %rd45, %rd589; 2026-02-21T09:20:56.0973534Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0973594Z // begin inline asm 2026-02-21T09:20:56.0973781Z cp.async.ca.shared.global [ %r9893 + 0 ], [ %rd549 + 0 ], 0x4, %r12419; 2026-02-21T09:20:56.0973852Z // end inline asm 2026-02-21T09:20:56.0973922Z cp.async.commit_group; 2026-02-21T09:20:56.0974123Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0974188Z add.s64 %rd550, %rd571, 224; 2026-02-21T09:20:56.0974247Z add.s64 %rd551, %rd575, 224; 2026-02-21T09:20:56.0974306Z add.s64 %rd552, %rd579, 224; 2026-02-21T09:20:56.0974367Z add.s64 %rd553, %rd583, 224; 2026-02-21T09:20:56.0974567Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0974626Z // begin inline asm 2026-02-21T09:20:56.0974804Z cp.async.ca.shared.global [ %r9895 + 0 ], [ %rd550 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0974863Z // end inline asm 2026-02-21T09:20:56.0974919Z // begin inline asm 2026-02-21T09:20:56.0975048Z cp.async.ca.shared.global [ %r9897 + 0 ], [ %rd551 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0975104Z // end inline asm 2026-02-21T09:20:56.0975163Z // begin inline asm 2026-02-21T09:20:56.0975291Z cp.async.ca.shared.global [ %r9899 + 0 ], [ %rd552 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0975346Z // end inline asm 2026-02-21T09:20:56.0975406Z // begin inline asm 
2026-02-21T09:20:56.0975536Z cp.async.ca.shared.global [ %r9901 + 0 ], [ %rd553 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0975590Z // end inline asm 2026-02-21T09:20:56.0975656Z cp.async.commit_group; 2026-02-21T09:20:56.0975857Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0975918Z add.s32 %r9957, %r170, %r9937; 2026-02-21T09:20:56.0976119Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0976183Z cvt.s64.s32 %rd590, %r9957; 2026-02-21T09:20:56.0976244Z add.s64 %rd554, %rd45, %rd590; 2026-02-21T09:20:56.0976443Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0976629Z // begin inline asm 2026-02-21T09:20:56.0976772Z cp.async.ca.shared.global [ %r9903 + 0 ], [ %rd554 + 0 ], 0x4, %r12419; 2026-02-21T09:20:56.0976830Z // end inline asm 2026-02-21T09:20:56.0976897Z cp.async.commit_group; 2026-02-21T09:20:56.0977096Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0977157Z add.s64 %rd555, %rd571, 256; 2026-02-21T09:20:56.0977218Z add.s64 %rd556, %rd575, 256; 2026-02-21T09:20:56.0977279Z add.s64 %rd557, %rd579, 256; 2026-02-21T09:20:56.0977351Z add.s64 %rd558, %rd583, 256; 2026-02-21T09:20:56.0977557Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0977614Z bar.sync 0; 2026-02-21T09:20:56.0977672Z // begin inline asm 2026-02-21T09:20:56.0977806Z cp.async.ca.shared.global [ %r9905 + 0 ], [ %rd555 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0977862Z // end inline asm 2026-02-21T09:20:56.0977919Z // begin inline asm 2026-02-21T09:20:56.0978049Z cp.async.ca.shared.global [ %r9907 + 0 ], [ %rd556 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0978200Z // end inline asm 2026-02-21T09:20:56.0978257Z // begin inline asm 2026-02-21T09:20:56.0978386Z cp.async.ca.shared.global [ %r9909 + 0 ], [ %rd557 + 0 ], 0x8, %r9826; 
2026-02-21T09:20:56.0978444Z // end inline asm 2026-02-21T09:20:56.0978500Z // begin inline asm 2026-02-21T09:20:56.0978692Z cp.async.ca.shared.global [ %r9911 + 0 ], [ %rd558 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0978750Z // end inline asm 2026-02-21T09:20:56.0978813Z cp.async.commit_group; 2026-02-21T09:20:56.0979014Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0979079Z add.s32 %r9958, %r176, %r9937; 2026-02-21T09:20:56.0979284Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0979407Z cvt.s64.s32 %rd591, %r9958; 2026-02-21T09:20:56.0979472Z add.s64 %rd559, %rd45, %rd591; 2026-02-21T09:20:56.0979672Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0979731Z // begin inline asm 2026-02-21T09:20:56.0979868Z cp.async.ca.shared.global [ %r9913 + 0 ], [ %rd559 + 0 ], 0x4, %r12419; 2026-02-21T09:20:56.0979927Z // end inline asm 2026-02-21T09:20:56.0980002Z cp.async.commit_group; 2026-02-21T09:20:56.0980210Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.0980271Z add.s64 %rd560, %rd571, 288; 2026-02-21T09:20:56.0980334Z add.s64 %rd561, %rd575, 288; 2026-02-21T09:20:56.0980394Z add.s64 %rd562, %rd579, 288; 2026-02-21T09:20:56.0980519Z add.s64 %rd563, %rd583, 288; 2026-02-21T09:20:56.0980724Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0980783Z // begin inline asm 2026-02-21T09:20:56.0980914Z cp.async.ca.shared.global [ %r9915 + 0 ], [ %rd560 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0980971Z // end inline asm 2026-02-21T09:20:56.0981027Z // begin inline asm 2026-02-21T09:20:56.0981167Z cp.async.ca.shared.global [ %r9917 + 0 ], [ %rd561 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0981222Z // end inline asm 2026-02-21T09:20:56.0981282Z // begin inline asm 2026-02-21T09:20:56.0981420Z cp.async.ca.shared.global [ %r9919 + 0 ], 
[ %rd562 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0981478Z // end inline asm 2026-02-21T09:20:56.0981538Z // begin inline asm 2026-02-21T09:20:56.0981668Z cp.async.ca.shared.global [ %r9921 + 0 ], [ %rd563 + 0 ], 0x8, %r9826; 2026-02-21T09:20:56.0981722Z // end inline asm 2026-02-21T09:20:56.0981785Z cp.async.commit_group; 2026-02-21T09:20:56.0981989Z .loc 1 57 62 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:62 2026-02-21T09:20:56.0982049Z add.s32 %r9959, %r182, %r9937; 2026-02-21T09:20:56.0982248Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.0982313Z cvt.s64.s32 %rd592, %r9959; 2026-02-21T09:20:56.0982375Z add.s64 %rd564, %rd45, %rd592; 2026-02-21T09:20:56.0982573Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0982632Z // begin inline asm 2026-02-21T09:20:56.0982764Z cp.async.ca.shared.global [ %r9923 + 0 ], [ %rd564 + 0 ], 0x4, %r12419; 2026-02-21T09:20:56.0982823Z // end inline asm 2026-02-21T09:20:56.0982888Z cp.async.commit_group; 2026-02-21T09:20:56.0983090Z .loc 1 43 78 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:43:78 2026-02-21T09:20:56.0983152Z add.s32 %r9960, %r204, %r9935; 2026-02-21T09:20:56.0983215Z add.s32 %r12417, %r9960, %r9936; 2026-02-21T09:20:56.0983278Z or.b32 %r9961, %r14, %r1286; 2026-02-21T09:20:56.0983338Z shl.b32 %r9962, %r9961, 10; 2026-02-21T09:20:56.0983404Z mul.wide.s32 %rd36, %r9962, 2; 2026-02-21T09:20:56.0983467Z or.b32 %r9963, %r13, %r1286; 2026-02-21T09:20:56.0983525Z shl.b32 %r9964, %r9963, 10; 2026-02-21T09:20:56.0983589Z mul.wide.s32 %rd37, %r9964, 2; 2026-02-21T09:20:56.0983719Z or.b32 %r9965, %r12, %r1286; 2026-02-21T09:20:56.0983779Z shl.b32 %r9966, %r9965, 10; 2026-02-21T09:20:56.0983841Z mul.wide.s32 %rd38, %r9966, 2; 2026-02-21T09:20:56.0983900Z shl.b32 %r9967, %r9933, 18; 2026-02-21T09:20:56.0984008Z or.b32 %r9968, %r11886, %r9967; 2026-02-21T09:20:56.0984070Z mul.wide.s32 %rd39, %r9968, 
2; 2026-02-21T09:20:56.0984133Z mov.b32 %r12420, 0f00000000; 2026-02-21T09:20:56.0984192Z mov.b32 %r12418, -1; 2026-02-21T09:20:56.0984252Z mov.b64 %rd643, -16; 2026-02-21T09:20:56.0984310Z mov.b64 %rd642, %rd6; 2026-02-21T09:20:56.0984371Z mov.b32 %r12421, %r12420; 2026-02-21T09:20:56.0984431Z mov.b32 %r12422, %r12420; 2026-02-21T09:20:56.0984487Z mov.b32 %r12423, %r12420; 2026-02-21T09:20:56.0984543Z mov.b32 %r12424, %r12420; 2026-02-21T09:20:56.0984650Z mov.b32 %r12425, %r12420; 2026-02-21T09:20:56.0984717Z mov.b32 %r12426, %r12420; 2026-02-21T09:20:56.0984773Z mov.b32 %r12427, %r12420; 2026-02-21T09:20:56.0984830Z mov.b32 %r12428, %r12420; 2026-02-21T09:20:56.0984890Z mov.b32 %r12429, %r12420; 2026-02-21T09:20:56.0984948Z mov.b32 %r12430, %r12420; 2026-02-21T09:20:56.0985003Z mov.b32 %r12431, %r12420; 2026-02-21T09:20:56.0985061Z mov.b32 %r12432, %r12420; 2026-02-21T09:20:56.0985120Z mov.b32 %r12433, %r12420; 2026-02-21T09:20:56.0985177Z mov.b32 %r12434, %r12420; 2026-02-21T09:20:56.0985233Z mov.b32 %r12435, %r12420; 2026-02-21T09:20:56.0985300Z mov.b32 %r12436, %r12420; 2026-02-21T09:20:56.0985361Z mov.b32 %r12437, %r12420; 2026-02-21T09:20:56.0985418Z mov.b32 %r12438, %r12420; 2026-02-21T09:20:56.0985528Z mov.b32 %r12439, %r12420; 2026-02-21T09:20:56.0985588Z mov.b32 %r12440, %r12420; 2026-02-21T09:20:56.0985646Z mov.b32 %r12441, %r12420; 2026-02-21T09:20:56.0985703Z mov.b32 %r12442, %r12420; 2026-02-21T09:20:56.0985761Z mov.b32 %r12443, %r12420; 2026-02-21T09:20:56.0985817Z mov.b32 %r12444, %r12420; 2026-02-21T09:20:56.0985877Z mov.b32 %r12445, %r12420; 2026-02-21T09:20:56.0985934Z mov.b32 %r12446, %r12420; 2026-02-21T09:20:56.0985990Z mov.b32 %r12447, %r12420; 2026-02-21T09:20:56.0986048Z mov.b32 %r12448, %r12420; 2026-02-21T09:20:56.0986107Z mov.b32 %r12449, %r12420; 2026-02-21T09:20:56.0986164Z mov.b32 %r12450, %r12420; 2026-02-21T09:20:56.0986221Z mov.b32 %r12451, %r12420; 2026-02-21T09:20:56.0986281Z mov.b32 %r12452, %r12420; 
2026-02-21T09:20:56.0986337Z mov.b32 %r12453, %r12420; 2026-02-21T09:20:56.0986404Z mov.b32 %r12454, %r12420; 2026-02-21T09:20:56.0986589Z mov.b32 %r12455, %r12420; 2026-02-21T09:20:56.0986651Z mov.b32 %r12456, %r12420; 2026-02-21T09:20:56.0986711Z mov.b32 %r12457, %r12420; 2026-02-21T09:20:56.0986767Z mov.b32 %r12458, %r12420; 2026-02-21T09:20:56.0986826Z mov.b32 %r12459, %r12420; 2026-02-21T09:20:56.0986883Z mov.b32 %r12460, %r12420; 2026-02-21T09:20:56.0986941Z mov.b32 %r12461, %r12420; 2026-02-21T09:20:56.0987001Z mov.b32 %r12462, %r12420; 2026-02-21T09:20:56.0987057Z mov.b32 %r12463, %r12420; 2026-02-21T09:20:56.0987114Z mov.b32 %r12464, %r12420; 2026-02-21T09:20:56.0987170Z mov.b32 %r12465, %r12420; 2026-02-21T09:20:56.0987228Z mov.b32 %r12466, %r12420; 2026-02-21T09:20:56.0987284Z mov.b32 %r12467, %r12420; 2026-02-21T09:20:56.0987340Z mov.b32 %r12468, %r12420; 2026-02-21T09:20:56.0987400Z mov.b32 %r12469, %r12420; 2026-02-21T09:20:56.0987457Z mov.b32 %r12470, %r12420; 2026-02-21T09:20:56.0987512Z mov.b32 %r12471, %r12420; 2026-02-21T09:20:56.0987567Z mov.b32 %r12472, %r12420; 2026-02-21T09:20:56.0987624Z mov.b32 %r12473, %r12420; 2026-02-21T09:20:56.0987681Z mov.b32 %r12474, %r12420; 2026-02-21T09:20:56.0987738Z mov.b32 %r12475, %r12420; 2026-02-21T09:20:56.0987796Z mov.b32 %r12476, %r12420; 2026-02-21T09:20:56.0987852Z mov.b32 %r12477, %r12420; 2026-02-21T09:20:56.0987910Z mov.b32 %r12478, %r12420; 2026-02-21T09:20:56.0987966Z mov.b32 %r12479, %r12420; 2026-02-21T09:20:56.0988025Z mov.b32 %r12480, %r12420; 2026-02-21T09:20:56.0988081Z mov.b32 %r12481, %r12420; 2026-02-21T09:20:56.0988230Z mov.b32 %r12482, %r12420; 2026-02-21T09:20:56.0988288Z mov.b32 %r12483, %r12420; 2026-02-21T09:20:56.0988445Z mov.b32 %r12484, %r12420; 2026-02-21T09:20:56.0988503Z mov.b32 %r12485, %r12420; 2026-02-21T09:20:56.0988558Z mov.b32 %r12486, %r12420; 2026-02-21T09:20:56.0988697Z mov.b32 %r12487, %r12420; 2026-02-21T09:20:56.0988754Z mov.b32 %r12488, %r12420; 
2026-02-21T09:20:56.0988811Z mov.b32 %r12489, %r12420; 2026-02-21T09:20:56.0988880Z mov.b32 %r12490, %r12420; 2026-02-21T09:20:56.0988937Z mov.b32 %r12491, %r12420; 2026-02-21T09:20:56.0988993Z mov.b32 %r12492, %r12420; 2026-02-21T09:20:56.0989053Z mov.b32 %r12493, %r12420; 2026-02-21T09:20:56.0989110Z mov.b32 %r12494, %r12420; 2026-02-21T09:20:56.0989166Z mov.b32 %r12495, %r12420; 2026-02-21T09:20:56.0989308Z mov.b32 %r12496, %r12420; 2026-02-21T09:20:56.0989370Z mov.b32 %r12497, %r12420; 2026-02-21T09:20:56.0989426Z mov.b32 %r12498, %r12420; 2026-02-21T09:20:56.0989482Z mov.b32 %r12499, %r12420; 2026-02-21T09:20:56.0989542Z mov.b32 %r12500, %r12420; 2026-02-21T09:20:56.0989598Z mov.b32 %r12501, %r12420; 2026-02-21T09:20:56.0989665Z mov.b32 %r12502, %r12420; 2026-02-21T09:20:56.0989723Z mov.b32 %r12503, %r12420; 2026-02-21T09:20:56.0989783Z mov.b32 %r12504, %r12420; 2026-02-21T09:20:56.0989841Z mov.b32 %r12505, %r12420; 2026-02-21T09:20:56.0989896Z mov.b32 %r12506, %r12420; 2026-02-21T09:20:56.0989955Z mov.b32 %r12507, %r12420; 2026-02-21T09:20:56.0990010Z mov.b32 %r12508, %r12420; 2026-02-21T09:20:56.0990067Z mov.b32 %r12509, %r12420; 2026-02-21T09:20:56.0990122Z mov.b32 %r12510, %r12420; 2026-02-21T09:20:56.0990244Z mov.b32 %r12511, %r12420; 2026-02-21T09:20:56.0990303Z mov.b32 %r12512, %r12420; 2026-02-21T09:20:56.0990359Z mov.b32 %r12513, %r12420; 2026-02-21T09:20:56.0990421Z mov.b32 %r12514, %r12420; 2026-02-21T09:20:56.0990490Z mov.b32 %r12515, %r12420; 2026-02-21T09:20:56.0990548Z mov.b32 %r12516, %r12420; 2026-02-21T09:20:56.0990605Z mov.b32 %r12517, %r12420; 2026-02-21T09:20:56.0990665Z mov.b32 %r12518, %r12420; 2026-02-21T09:20:56.0990722Z mov.b32 %r12519, %r12420; 2026-02-21T09:20:56.0990778Z mov.b32 %r12520, %r12420; 2026-02-21T09:20:56.0990838Z mov.b32 %r12521, %r12420; 2026-02-21T09:20:56.0990894Z mov.b32 %r12522, %r12420; 2026-02-21T09:20:56.0990952Z mov.b32 %r12523, %r12420; 2026-02-21T09:20:56.0991008Z mov.b32 %r12524, %r12420; 
2026-02-21T09:20:56.0991067Z mov.b32 %r12525, %r12420; 2026-02-21T09:20:56.0991124Z mov.b32 %r12526, %r12420; 2026-02-21T09:20:56.0991180Z mov.b32 %r12527, %r12420; 2026-02-21T09:20:56.0991238Z mov.b32 %r12528, %r12420; 2026-02-21T09:20:56.0991297Z mov.b32 %r12529, %r12420; 2026-02-21T09:20:56.0991354Z mov.b32 %r12530, %r12420; 2026-02-21T09:20:56.0991410Z mov.b32 %r12531, %r12420; 2026-02-21T09:20:56.0991469Z mov.b32 %r12532, %r12420; 2026-02-21T09:20:56.0991536Z mov.b32 %r12533, %r12420; 2026-02-21T09:20:56.0991595Z mov.b32 %r12534, %r12420; 2026-02-21T09:20:56.0991653Z mov.b32 %r12535, %r12420; 2026-02-21T09:20:56.0991711Z mov.b32 %r12536, %r12420; 2026-02-21T09:20:56.0991769Z mov.b32 %r12537, %r12420; 2026-02-21T09:20:56.0991827Z mov.b32 %r12538, %r12420; 2026-02-21T09:20:56.0991885Z mov.b32 %r12539, %r12420; 2026-02-21T09:20:56.0991940Z mov.b32 %r12540, %r12420; 2026-02-21T09:20:56.0991998Z mov.b32 %r12541, %r12420; 2026-02-21T09:20:56.0992057Z mov.b32 %r12542, %r12420; 2026-02-21T09:20:56.0992113Z mov.b32 %r12543, %r12420; 2026-02-21T09:20:56.0992169Z mov.b32 %r12544, %r12420; 2026-02-21T09:20:56.0992227Z mov.b32 %r12545, %r12420; 2026-02-21T09:20:56.0992283Z mov.b32 %r12546, %r12420; 2026-02-21T09:20:56.0992339Z mov.b32 %r12547, %r12420; 2026-02-21T09:20:56.0992453Z $L__BB0_14: // Parent Loop BB0_13 Depth=1 2026-02-21T09:20:56.0992565Z // => This Inner Loop Header: Depth=2 2026-02-21T09:20:56.0992628Z add.s64 %rd643, %rd643, 16; 2026-02-21T09:20:56.0992695Z setp.lt.u64 %p65, %rd643, 432; 2026-02-21T09:20:56.0992820Z add.s32 %r11569, %r12418, 1; 2026-02-21T09:20:56.0992885Z setp.gt.s32 %p66, %r11569, 4; 2026-02-21T09:20:56.0992954Z selp.b32 %r12418, 0, %r11569, %p66; 2026-02-21T09:20:56.0993165Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.0993281Z cp.async.wait_group 16; 2026-02-21T09:20:56.0993336Z bar.sync 0; 2026-02-21T09:20:56.0993395Z shl.b32 %r11570, %r12418, 13; 2026-02-21T09:20:56.0993460Z add.s32 
%r11572, %r11869, %r11570; 2026-02-21T09:20:56.0993663Z .loc 1 55 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:55:32 2026-02-21T09:20:56.0993731Z add.s32 %r11573, %r11572, %r184; 2026-02-21T09:20:56.0993800Z ld.shared.b16 %rs456, [%r11573]; 2026-02-21T09:20:56.0993918Z ld.shared.b16 %rs457, [%r11573+256]; 2026-02-21T09:20:56.0993987Z ld.shared.b16 %rs458, [%r11573+16]; 2026-02-21T09:20:56.0994050Z ld.shared.b16 %rs459, [%r11573+272]; 2026-02-21T09:20:56.0994121Z ld.shared.b16 %rs460, [%r11573+4096]; 2026-02-21T09:20:56.0994189Z ld.shared.b16 %rs461, [%r11573+4352]; 2026-02-21T09:20:56.0994268Z ld.shared.b16 %rs462, [%r11573+4112]; 2026-02-21T09:20:56.0994339Z ld.shared.b16 %rs463, [%r11573+4368]; 2026-02-21T09:20:56.0994400Z add.s32 %r11574, %r11572, %r185; 2026-02-21T09:20:56.0994464Z ld.shared.b16 %rs464, [%r11574]; 2026-02-21T09:20:56.0994533Z ld.shared.b16 %rs465, [%r11574+256]; 2026-02-21T09:20:56.0994598Z ld.shared.b16 %rs466, [%r11574+16]; 2026-02-21T09:20:56.0994662Z ld.shared.b16 %rs467, [%r11574+272]; 2026-02-21T09:20:56.0994776Z ld.shared.b16 %rs468, [%r11574+4096]; 2026-02-21T09:20:56.0994845Z ld.shared.b16 %rs469, [%r11574+4352]; 2026-02-21T09:20:56.0994909Z ld.shared.b16 %rs470, [%r11574+4112]; 2026-02-21T09:20:56.0994975Z ld.shared.b16 %rs471, [%r11574+4368]; 2026-02-21T09:20:56.0995044Z cvt.f32.bf16 %r10097, %rs456; 2026-02-21T09:20:56.0995111Z cvt.f32.bf16 %r10098, %rs457; 2026-02-21T09:20:56.0995172Z cvt.f32.bf16 %r10099, %rs464; 2026-02-21T09:20:56.0995233Z cvt.f32.bf16 %r10100, %rs465; 2026-02-21T09:20:56.0995295Z cvt.f32.bf16 %r10229, %rs458; 2026-02-21T09:20:56.0995354Z cvt.f32.bf16 %r10230, %rs459; 2026-02-21T09:20:56.0995413Z cvt.f32.bf16 %r10231, %rs466; 2026-02-21T09:20:56.0995474Z cvt.f32.bf16 %r10232, %rs467; 2026-02-21T09:20:56.0995534Z cvt.f32.bf16 %r10361, %rs460; 2026-02-21T09:20:56.0995593Z cvt.f32.bf16 %r10362, %rs461; 2026-02-21T09:20:56.0995652Z cvt.f32.bf16 %r10363, %rs468; 2026-02-21T09:20:56.0995712Z 
cvt.f32.bf16 %r10364, %rs469; 2026-02-21T09:20:56.0995769Z cvt.f32.bf16 %r10493, %rs462; 2026-02-21T09:20:56.0995827Z cvt.f32.bf16 %r10494, %rs463; 2026-02-21T09:20:56.0995889Z cvt.f32.bf16 %r10495, %rs470; 2026-02-21T09:20:56.0995948Z cvt.f32.bf16 %r10496, %rs471; 2026-02-21T09:20:56.0996164Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.0996224Z shl.b32 %r11575, %r12418, 10; 2026-02-21T09:20:56.0996286Z add.s32 %r11576, %r11869, %r11575; 2026-02-21T09:20:56.0996348Z add.s32 %r11577, %r11576, 90112; 2026-02-21T09:20:56.0996686Z .loc 1 70 45 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:70:45 2026-02-21T09:20:56.0996755Z add.s32 %r11578, %r11577, %r11875; 2026-02-21T09:20:56.0996824Z ld.shared.b8 %rs472, [%r11578]; 2026-02-21T09:20:56.0996891Z ld.shared.b8 %rs473, [%r11578+128]; 2026-02-21T09:20:56.0996957Z ld.shared.b8 %rs474, [%r11578+256]; 2026-02-21T09:20:56.0997020Z ld.shared.b8 %rs475, [%r11578+384]; 2026-02-21T09:20:56.0997085Z ld.shared.b8 %rs476, [%r11578+512]; 2026-02-21T09:20:56.0997150Z ld.shared.b8 %rs477, [%r11578+640]; 2026-02-21T09:20:56.0997213Z ld.shared.b8 %rs478, [%r11578+768]; 2026-02-21T09:20:56.0997271Z add.s32 %r11579, %r11577, %r11876; 2026-02-21T09:20:56.0997335Z ld.shared.b8 %rs479, [%r11579]; 2026-02-21T09:20:56.0997540Z .loc 1 60 28 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:60:28 2026-02-21T09:20:56.0997694Z shl.b16 %rs480, %rs472, 4; 2026-02-21T09:20:56.0997754Z shl.b16 %rs481, %rs473, 4; 2026-02-21T09:20:56.0997817Z shl.b16 %rs482, %rs474, 4; 2026-02-21T09:20:56.0997876Z shl.b16 %rs483, %rs475, 4; 2026-02-21T09:20:56.0997934Z shl.b16 %rs484, %rs476, 4; 2026-02-21T09:20:56.0998062Z shl.b16 %rs485, %rs477, 4; 2026-02-21T09:20:56.0998124Z shl.b16 %rs486, %rs478, 4; 2026-02-21T09:20:56.0998182Z shl.b16 %rs487, %rs479, 4; 2026-02-21T09:20:56.0998388Z .loc 1 75 58 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:75:58 2026-02-21T09:20:56.0998465Z 
selp.b16 %rs488, %rs480, %rs472, %p70; 2026-02-21T09:20:56.0998527Z cvt.s16.s8 %rs489, %rs488; 2026-02-21T09:20:56.0998585Z shr.s16 %rs490, %rs489, 4; 2026-02-21T09:20:56.0998719Z selp.b16 %rs491, %rs481, %rs473, %p70; 2026-02-21T09:20:56.0998783Z cvt.s16.s8 %rs492, %rs491; 2026-02-21T09:20:56.0998842Z shr.s16 %rs493, %rs492, 4; 2026-02-21T09:20:56.0998907Z selp.b16 %rs494, %rs482, %rs474, %p70; 2026-02-21T09:20:56.0998971Z cvt.s16.s8 %rs495, %rs494; 2026-02-21T09:20:56.0999030Z shr.s16 %rs496, %rs495, 4; 2026-02-21T09:20:56.0999096Z selp.b16 %rs497, %rs483, %rs475, %p70; 2026-02-21T09:20:56.0999160Z cvt.s16.s8 %rs498, %rs497; 2026-02-21T09:20:56.0999220Z shr.s16 %rs499, %rs498, 4; 2026-02-21T09:20:56.0999286Z selp.b16 %rs500, %rs484, %rs476, %p70; 2026-02-21T09:20:56.0999345Z cvt.s16.s8 %rs501, %rs500; 2026-02-21T09:20:56.0999405Z shr.s16 %rs502, %rs501, 4; 2026-02-21T09:20:56.0999470Z selp.b16 %rs503, %rs485, %rs477, %p70; 2026-02-21T09:20:56.0999588Z cvt.s16.s8 %rs504, %rs503; 2026-02-21T09:20:56.0999662Z shr.s16 %rs505, %rs504, 4; 2026-02-21T09:20:56.0999732Z selp.b16 %rs506, %rs486, %rs478, %p70; 2026-02-21T09:20:56.0999791Z cvt.s16.s8 %rs507, %rs506; 2026-02-21T09:20:56.0999851Z shr.s16 %rs508, %rs507, 4; 2026-02-21T09:20:56.0999921Z selp.b16 %rs509, %rs487, %rs479, %p70; 2026-02-21T09:20:56.0999979Z cvt.s16.s8 %rs510, %rs509; 2026-02-21T09:20:56.1000037Z shr.s16 %rs511, %rs510, 4; 2026-02-21T09:20:56.1000246Z .loc 1 80 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:80:32 2026-02-21T09:20:56.1000311Z cvt.rn.f32.s16 %r11580, %rs490; 2026-02-21T09:20:56.1000375Z cvt.rn.f32.s16 %r11581, %rs493; 2026-02-21T09:20:56.1000440Z cvt.rn.f32.s16 %r11582, %rs496; 2026-02-21T09:20:56.1000500Z cvt.rn.f32.s16 %r11583, %rs499; 2026-02-21T09:20:56.1000559Z cvt.rn.f32.s16 %r11584, %rs502; 2026-02-21T09:20:56.1000618Z cvt.rn.f32.s16 %r11585, %rs505; 2026-02-21T09:20:56.1000681Z cvt.rn.f32.s16 %r11586, %rs508; 2026-02-21T09:20:56.1000742Z cvt.rn.f32.s16 %r11587, 
%rs511; 2026-02-21T09:20:56.1000806Z st.shared.b32 [%r188], %r11580; 2026-02-21T09:20:56.1000874Z st.shared.b32 [%r188+8], %r11581; 2026-02-21T09:20:56.1000936Z st.shared.b32 [%r189], %r11582; 2026-02-21T09:20:56.1000999Z st.shared.b32 [%r189+8], %r11583; 2026-02-21T09:20:56.1001060Z st.shared.b32 [%r190], %r11584; 2026-02-21T09:20:56.1001125Z st.shared.b32 [%r190+8], %r11585; 2026-02-21T09:20:56.1001187Z st.shared.b32 [%r191], %r11586; 2026-02-21T09:20:56.1001248Z st.shared.b32 [%r191+8], %r11587; 2026-02-21T09:20:56.1001307Z $L__tmp17: 2026-02-21T09:20:56.1001589Z .loc 2 291 36 // standard.py:291:36 @[ czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:87:40 ] 2026-02-21T09:20:56.1001649Z // begin inline asm 2026-02-21T09:20:56.1001733Z fence.proxy.async.shared::cta; 2026-02-21T09:20:56.1001788Z // end inline asm 2026-02-21T09:20:56.1001841Z bar.sync 0; 2026-02-21T09:20:56.1001923Z shfl.sync.idx.b32 %r11588, %r5, 0, 31, -1; 2026-02-21T09:20:56.1001998Z wgmma.fence.sync.aligned; 2026-02-21T09:20:56.1002060Z mov.pred %p56, -1; 2026-02-21T09:20:56.1002117Z // begin inline asm 2026-02-21T09:20:56.1003617Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457,%r12458,%r12459,%r12460,%r12461,%r12462,%r12463,%r12464,%r12465,%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483}, {%r10097,%r10098,%r10099,%r10100}, %rd4, %p56, 1, 1; 2026-02-21T09:20:56.1003783Z // end inline asm 2026-02-21T09:20:56.1003843Z // begin inline asm 2026-02-21T09:20:56.1005371Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457,%r12458,%r12459,%r12460,%r12461,%r12462,%r12463,%r12464,%r12465,%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483}, {%r10229,%r10230,%r10231,%r10232}, %rd5, %p56, 1, 1; 2026-02-21T09:20:56.1005431Z // end inline asm 2026-02-21T09:20:56.1005491Z // begin inline asm 2026-02-21T09:20:56.1007155Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547}, {%r10361,%r10362,%r10363,%r10364}, %rd4, %p56, 1, 1; 2026-02-21T09:20:56.1007225Z // end inline asm 2026-02-21T09:20:56.1007286Z // begin inline asm 2026-02-21T09:20:56.1008761Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547}, {%r10493,%r10494,%r10495,%r10496}, %rd5, %p56, 1, 1; 2026-02-21T09:20:56.1008822Z // end inline asm 2026-02-21T09:20:56.1008897Z wgmma.commit_group.sync.aligned; 2026-02-21T09:20:56.1008955Z mov.b32 %r11416, 0; 2026-02-21T09:20:56.1009013Z mov.b32 %r10626, %r11416; 2026-02-21T09:20:56.1009076Z mov.b32 %r10627, %r11416; 2026-02-21T09:20:56.1009136Z mov.b32 %r10625, %r9798; 2026-02-21T09:20:56.1009192Z // begin inline asm 2026-02-21T09:20:56.1011729Z // wait for regs: %r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457,%r12458,%r12459,%r12460,%r12461,%r12462,%r12463,%r12464,%r12465,%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483,%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r1254
6,%r12547,%r10625,%r10626,%r10627 2026-02-21T09:20:56.1011872Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:20:56.1011931Z // end inline asm 2026-02-21T09:20:56.1011996Z $L__tmp18: 2026-02-21T09:20:56.1012213Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.1012363Z add.s32 %r11589, %r11869, 40960; 2026-02-21T09:20:56.1012424Z add.s32 %r11590, %r11589, %r11570; 2026-02-21T09:20:56.1012629Z .loc 1 55 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:55:32 2026-02-21T09:20:56.1012695Z add.s32 %r11591, %r11590, %r184; 2026-02-21T09:20:56.1012758Z ld.shared.b16 %rs512, [%r11591]; 2026-02-21T09:20:56.1012826Z ld.shared.b16 %rs513, [%r11591+256]; 2026-02-21T09:20:56.1012953Z ld.shared.b16 %rs514, [%r11591+16]; 2026-02-21T09:20:56.1013024Z ld.shared.b16 %rs515, [%r11591+272]; 2026-02-21T09:20:56.1013092Z ld.shared.b16 %rs516, [%r11591+4096]; 2026-02-21T09:20:56.1013157Z ld.shared.b16 %rs517, [%r11591+4352]; 2026-02-21T09:20:56.1013225Z ld.shared.b16 %rs518, [%r11591+4112]; 2026-02-21T09:20:56.1013289Z ld.shared.b16 %rs519, [%r11591+4368]; 2026-02-21T09:20:56.1013349Z add.s32 %r11592, %r11590, %r185; 2026-02-21T09:20:56.1013413Z ld.shared.b16 %rs520, [%r11592]; 2026-02-21T09:20:56.1013482Z ld.shared.b16 %rs521, [%r11592+256]; 2026-02-21T09:20:56.1013547Z ld.shared.b16 %rs522, [%r11592+16]; 2026-02-21T09:20:56.1013611Z ld.shared.b16 %rs523, [%r11592+272]; 2026-02-21T09:20:56.1013677Z ld.shared.b16 %rs524, [%r11592+4096]; 2026-02-21T09:20:56.1013797Z ld.shared.b16 %rs525, [%r11592+4352]; 2026-02-21T09:20:56.1013865Z ld.shared.b16 %rs526, [%r11592+4112]; 2026-02-21T09:20:56.1013932Z ld.shared.b16 %rs527, [%r11592+4368]; 2026-02-21T09:20:56.1014005Z cvt.f32.bf16 %r10887, %rs512; 2026-02-21T09:20:56.1014067Z cvt.f32.bf16 %r10888, %rs513; 2026-02-21T09:20:56.1014128Z cvt.f32.bf16 %r10889, %rs520; 2026-02-21T09:20:56.1014199Z cvt.f32.bf16 %r10890, %rs521; 2026-02-21T09:20:56.1014261Z cvt.f32.bf16 %r11019, %rs514; 
2026-02-21T09:20:56.1014320Z cvt.f32.bf16 %r11020, %rs515; 2026-02-21T09:20:56.1014382Z cvt.f32.bf16 %r11021, %rs522; 2026-02-21T09:20:56.1014441Z cvt.f32.bf16 %r11022, %rs523; 2026-02-21T09:20:56.1014500Z cvt.f32.bf16 %r11151, %rs516; 2026-02-21T09:20:56.1014560Z cvt.f32.bf16 %r11152, %rs517; 2026-02-21T09:20:56.1014620Z cvt.f32.bf16 %r11153, %rs524; 2026-02-21T09:20:56.1014679Z cvt.f32.bf16 %r11154, %rs525; 2026-02-21T09:20:56.1014738Z cvt.f32.bf16 %r11283, %rs518; 2026-02-21T09:20:56.1014798Z cvt.f32.bf16 %r11284, %rs519; 2026-02-21T09:20:56.1014858Z cvt.f32.bf16 %r11285, %rs526; 2026-02-21T09:20:56.1014916Z cvt.f32.bf16 %r11286, %rs527; 2026-02-21T09:20:56.1015122Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.1015185Z add.s32 %r11593, %r11576, 95232; 2026-02-21T09:20:56.1015384Z .loc 1 70 45 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:70:45 2026-02-21T09:20:56.1015448Z add.s32 %r11594, %r11593, %r11875; 2026-02-21T09:20:56.1015526Z ld.shared.b8 %rs528, [%r11594]; 2026-02-21T09:20:56.1015593Z ld.shared.b8 %rs529, [%r11594+128]; 2026-02-21T09:20:56.1015659Z ld.shared.b8 %rs530, [%r11594+256]; 2026-02-21T09:20:56.1015729Z ld.shared.b8 %rs531, [%r11594+384]; 2026-02-21T09:20:56.1015791Z ld.shared.b8 %rs532, [%r11594+512]; 2026-02-21T09:20:56.1015853Z ld.shared.b8 %rs533, [%r11594+640]; 2026-02-21T09:20:56.1015917Z ld.shared.b8 %rs534, [%r11594+768]; 2026-02-21T09:20:56.1015981Z add.s32 %r11595, %r11593, %r11876; 2026-02-21T09:20:56.1016044Z ld.shared.b8 %rs535, [%r11595]; 2026-02-21T09:20:56.1016244Z .loc 1 60 28 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:60:28 2026-02-21T09:20:56.1016311Z shl.b16 %rs536, %rs528, 4; 2026-02-21T09:20:56.1016371Z shl.b16 %rs537, %rs529, 4; 2026-02-21T09:20:56.1016429Z shl.b16 %rs538, %rs530, 4; 2026-02-21T09:20:56.1016610Z shl.b16 %rs539, %rs531, 4; 2026-02-21T09:20:56.1016766Z shl.b16 %rs540, %rs532, 4; 2026-02-21T09:20:56.1016827Z shl.b16 %rs541, 
%rs533, 4; 2026-02-21T09:20:56.1016886Z shl.b16 %rs542, %rs534, 4; 2026-02-21T09:20:56.1016947Z shl.b16 %rs543, %rs535, 4; 2026-02-21T09:20:56.1017215Z .loc 1 75 58 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:75:58 2026-02-21T09:20:56.1017287Z selp.b16 %rs544, %rs536, %rs528, %p70; 2026-02-21T09:20:56.1017351Z cvt.s16.s8 %rs545, %rs544; 2026-02-21T09:20:56.1017409Z shr.s16 %rs546, %rs545, 4; 2026-02-21T09:20:56.1017480Z selp.b16 %rs547, %rs537, %rs529, %p70; 2026-02-21T09:20:56.1017541Z cvt.s16.s8 %rs548, %rs547; 2026-02-21T09:20:56.1017601Z shr.s16 %rs549, %rs548, 4; 2026-02-21T09:20:56.1017730Z selp.b16 %rs550, %rs538, %rs530, %p70; 2026-02-21T09:20:56.1017791Z cvt.s16.s8 %rs551, %rs550; 2026-02-21T09:20:56.1017851Z shr.s16 %rs552, %rs551, 4; 2026-02-21T09:20:56.1017929Z selp.b16 %rs553, %rs539, %rs531, %p70; 2026-02-21T09:20:56.1017992Z cvt.s16.s8 %rs554, %rs553; 2026-02-21T09:20:56.1018053Z shr.s16 %rs555, %rs554, 4; 2026-02-21T09:20:56.1018118Z selp.b16 %rs556, %rs540, %rs532, %p70; 2026-02-21T09:20:56.1018177Z cvt.s16.s8 %rs557, %rs556; 2026-02-21T09:20:56.1018235Z shr.s16 %rs558, %rs557, 4; 2026-02-21T09:20:56.1018304Z selp.b16 %rs559, %rs541, %rs533, %p70; 2026-02-21T09:20:56.1018363Z cvt.s16.s8 %rs560, %rs559; 2026-02-21T09:20:56.1018420Z shr.s16 %rs561, %rs560, 4; 2026-02-21T09:20:56.1018487Z selp.b16 %rs562, %rs542, %rs534, %p70; 2026-02-21T09:20:56.1018546Z cvt.s16.s8 %rs563, %rs562; 2026-02-21T09:20:56.1018671Z shr.s16 %rs564, %rs563, 4; 2026-02-21T09:20:56.1018751Z selp.b16 %rs565, %rs543, %rs535, %p70; 2026-02-21T09:20:56.1018814Z cvt.s16.s8 %rs566, %rs565; 2026-02-21T09:20:56.1018873Z shr.s16 %rs567, %rs566, 4; 2026-02-21T09:20:56.1019075Z .loc 1 80 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:80:32 2026-02-21T09:20:56.1019145Z cvt.rn.f32.s16 %r11596, %rs546; 2026-02-21T09:20:56.1019209Z cvt.rn.f32.s16 %r11597, %rs549; 2026-02-21T09:20:56.1019269Z cvt.rn.f32.s16 %r11598, %rs552; 2026-02-21T09:20:56.1019331Z 
cvt.rn.f32.s16 %r11599, %rs555; 2026-02-21T09:20:56.1019393Z cvt.rn.f32.s16 %r11600, %rs558; 2026-02-21T09:20:56.1019456Z cvt.rn.f32.s16 %r11601, %rs561; 2026-02-21T09:20:56.1019518Z cvt.rn.f32.s16 %r11602, %rs564; 2026-02-21T09:20:56.1019580Z cvt.rn.f32.s16 %r11603, %rs567; 2026-02-21T09:20:56.1019635Z bar.sync 0; 2026-02-21T09:20:56.1019697Z st.shared.b32 [%r188], %r11596; 2026-02-21T09:20:56.1019764Z st.shared.b32 [%r188+8], %r11597; 2026-02-21T09:20:56.1019826Z st.shared.b32 [%r189], %r11598; 2026-02-21T09:20:56.1019889Z st.shared.b32 [%r189+8], %r11599; 2026-02-21T09:20:56.1019955Z st.shared.b32 [%r190], %r11600; 2026-02-21T09:20:56.1020025Z st.shared.b32 [%r190+8], %r11601; 2026-02-21T09:20:56.1020090Z st.shared.b32 [%r191], %r11602; 2026-02-21T09:20:56.1020153Z st.shared.b32 [%r191+8], %r11603; 2026-02-21T09:20:56.1020212Z $L__tmp19: 2026-02-21T09:20:56.1020497Z .loc 2 291 36 // standard.py:291:36 @[ czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:87:40 ] 2026-02-21T09:20:56.1020557Z // begin inline asm 2026-02-21T09:20:56.1020637Z fence.proxy.async.shared::cta; 2026-02-21T09:20:56.1020698Z // end inline asm 2026-02-21T09:20:56.1020752Z bar.sync 0; 2026-02-21T09:20:56.1020824Z wgmma.fence.sync.aligned; 2026-02-21T09:20:56.1020884Z // begin inline asm 2026-02-21T09:20:56.1022374Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457,%r12458,%r12459,%r12460,%r12461,%r12462,%r12463,%r12464,%r12465,%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483}, {%r10887,%r10888,%r10889,%r10890}, %rd4, %p56, 1, 1; 2026-02-21T09:20:56.1022504Z // end inline asm 
2026-02-21T09:20:56.1022567Z // begin inline asm 2026-02-21T09:20:56.1024095Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457,%r12458,%r12459,%r12460,%r12461,%r12462,%r12463,%r12464,%r12465,%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483}, {%r11019,%r11020,%r11021,%r11022}, %rd5, %p56, 1, 1; 2026-02-21T09:20:56.1024198Z // end inline asm 2026-02-21T09:20:56.1024255Z // begin inline asm 2026-02-21T09:20:56.1025743Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547}, {%r11151,%r11152,%r11153,%r11154}, %rd4, %p56, 1, 1; 2026-02-21T09:20:56.1025854Z // end inline asm 2026-02-21T09:20:56.1025913Z // begin inline asm 2026-02-21T09:20:56.1027522Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547}, {%r11283,%r11284,%r11285,%r11286}, %rd5, %p56, 1, 1; 2026-02-21T09:20:56.1027586Z // end inline asm 2026-02-21T09:20:56.1027662Z wgmma.commit_group.sync.aligned; 2026-02-21T09:20:56.1027726Z mov.b32 %r11417, %r11416; 2026-02-21T09:20:56.1027786Z mov.b32 %r11415, %r9798; 2026-02-21T09:20:56.1027845Z // begin inline asm 2026-02-21T09:20:56.1030452Z // wait for regs: %r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457,%r12458,%r12459,%r12460,%r12461,%r12462,%r12463,%r12464,%r12465,%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483,%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r11415,%r11416,%r11417 2026-02-21T09:20:56.1030531Z wgmma.wait_group.sync.aligned 0; 
2026-02-21T09:20:56.1030587Z // end inline asm 2026-02-21T09:20:56.1030644Z $L__tmp20: 2026-02-21T09:20:56.1030860Z .loc 1 43 78 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:43:78 2026-02-21T09:20:56.1030923Z add.s32 %r11604, %r12419, 1; 2026-02-21T09:20:56.1031091Z setp.gt.s32 %p67, %r11604, 4; 2026-02-21T09:20:56.1031162Z selp.b32 %r12419, 0, %r11604, %p67; 2026-02-21T09:20:56.1031369Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.1031498Z add.s64 %rd611, %rd642, %rd39; 2026-02-21T09:20:56.1031559Z add.s64 %rd601, %rd611, 320; 2026-02-21T09:20:56.1031620Z add.s64 %rd612, %rd642, %rd38; 2026-02-21T09:20:56.1031680Z add.s64 %rd602, %rd612, 320; 2026-02-21T09:20:56.1031746Z add.s64 %rd613, %rd642, %rd37; 2026-02-21T09:20:56.1031808Z add.s64 %rd603, %rd613, 320; 2026-02-21T09:20:56.1032009Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.1032074Z add.s64 %rd614, %rd642, %rd36; 2026-02-21T09:20:56.1032202Z add.s64 %rd604, %rd614, 320; 2026-02-21T09:20:56.1032266Z shl.b32 %r11605, %r12419, 13; 2026-02-21T09:20:56.1032334Z add.s32 %r11606, %r11869, %r11605; 2026-02-21T09:20:56.1032399Z add.s32 %r11549, %r11606, %r123; 2026-02-21T09:20:56.1032463Z selp.b32 %r11550, 8, 0, %p65; 2026-02-21T09:20:56.1032521Z // begin inline asm 2026-02-21T09:20:56.1032690Z cp.async.ca.shared.global [ %r11549 + 0 ], [ %rd601 + 0 ], 0x8, %r11550; 2026-02-21T09:20:56.1032749Z // end inline asm 2026-02-21T09:20:56.1032814Z add.s32 %r11551, %r11549, 2048; 2026-02-21T09:20:56.1032874Z // begin inline asm 2026-02-21T09:20:56.1033016Z cp.async.ca.shared.global [ %r11551 + 0 ], [ %rd602 + 0 ], 0x8, %r11550; 2026-02-21T09:20:56.1033070Z // end inline asm 2026-02-21T09:20:56.1033198Z add.s32 %r11553, %r11549, 4096; 2026-02-21T09:20:56.1033264Z // begin inline asm 2026-02-21T09:20:56.1033404Z cp.async.ca.shared.global [ %r11553 + 0 ], [ %rd603 + 0 ], 0x8, %r11550; 
2026-02-21T09:20:56.1033461Z // end inline asm 2026-02-21T09:20:56.1033524Z add.s32 %r11555, %r11549, 6144; 2026-02-21T09:20:56.1033583Z // begin inline asm 2026-02-21T09:20:56.1033720Z cp.async.ca.shared.global [ %r11555 + 0 ], [ %rd604 + 0 ], 0x8, %r11550; 2026-02-21T09:20:56.1033784Z // end inline asm 2026-02-21T09:20:56.1033859Z cp.async.commit_group; 2026-02-21T09:20:56.1034066Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.1034133Z add.s32 %r11607, %r12417, -65536; 2026-02-21T09:20:56.1034197Z cvt.s64.s32 %rd615, %r11607; 2026-02-21T09:20:56.1034259Z add.s64 %rd605, %rd45, %rd615; 2026-02-21T09:20:56.1034461Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.1034525Z shl.b32 %r11608, %r12419, 10; 2026-02-21T09:20:56.1034588Z add.s32 %r11557, %r129, %r11608; 2026-02-21T09:20:56.1034650Z selp.b32 %r11558, 4, 0, %p65; 2026-02-21T09:20:56.1034707Z // begin inline asm 2026-02-21T09:20:56.1034851Z cp.async.ca.shared.global [ %r11557 + 0 ], [ %rd605 + 0 ], 0x4, %r11558; 2026-02-21T09:20:56.1034906Z // end inline asm 2026-02-21T09:20:56.1034970Z cp.async.commit_group; 2026-02-21T09:20:56.1035176Z .loc 1 51 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:32 2026-02-21T09:20:56.1035238Z add.s64 %rd606, %rd611, 352; 2026-02-21T09:20:56.1035300Z add.s64 %rd607, %rd612, 352; 2026-02-21T09:20:56.1035362Z add.s64 %rd608, %rd613, 352; 2026-02-21T09:20:56.1035568Z .loc 1 51 80 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:51:80 2026-02-21T09:20:56.1035629Z add.s64 %rd609, %rd614, 352; 2026-02-21T09:20:56.1035694Z add.s32 %r11609, %r11589, %r11605; 2026-02-21T09:20:56.1035761Z add.s32 %r11559, %r11609, %r123; 2026-02-21T09:20:56.1035820Z // begin inline asm 2026-02-21T09:20:56.1035962Z cp.async.ca.shared.global [ %r11559 + 0 ], [ %rd606 + 0 ], 0x8, %r11550; 2026-02-21T09:20:56.1036021Z // end inline asm 2026-02-21T09:20:56.1036082Z add.s32 %r11561, 
%r11559, 2048; 2026-02-21T09:20:56.1036140Z // begin inline asm 2026-02-21T09:20:56.1036277Z cp.async.ca.shared.global [ %r11561 + 0 ], [ %rd607 + 0 ], 0x8, %r11550; 2026-02-21T09:20:56.1036419Z // end inline asm 2026-02-21T09:20:56.1036599Z add.s32 %r11563, %r11559, 4096; 2026-02-21T09:20:56.1036663Z // begin inline asm 2026-02-21T09:20:56.1036802Z cp.async.ca.shared.global [ %r11563 + 0 ], [ %rd608 + 0 ], 0x8, %r11550; 2026-02-21T09:20:56.1036941Z // end inline asm 2026-02-21T09:20:56.1037011Z add.s32 %r11565, %r11559, 6144; 2026-02-21T09:20:56.1037074Z // begin inline asm 2026-02-21T09:20:56.1037212Z cp.async.ca.shared.global [ %r11565 + 0 ], [ %rd609 + 0 ], 0x8, %r11550; 2026-02-21T09:20:56.1037270Z // end inline asm 2026-02-21T09:20:56.1037336Z cp.async.commit_group; 2026-02-21T09:20:56.1037541Z .loc 1 57 34 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:34 2026-02-21T09:20:56.1037668Z cvt.s64.s32 %rd616, %r12417; 2026-02-21T09:20:56.1037737Z add.s64 %rd610, %rd45, %rd616; 2026-02-21T09:20:56.1037959Z .loc 1 57 87 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:57:87 2026-02-21T09:20:56.1038024Z add.s32 %r11567, %r135, %r11608; 2026-02-21T09:20:56.1038084Z // begin inline asm 2026-02-21T09:20:56.1038245Z cp.async.ca.shared.global [ %r11567 + 0 ], [ %rd610 + 0 ], 0x4, %r11558; 2026-02-21T09:20:56.1038304Z // end inline asm 2026-02-21T09:20:56.1038373Z cp.async.commit_group; 2026-02-21T09:20:56.1038577Z .loc 1 43 78 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:43:78 2026-02-21T09:20:56.1038644Z add.s32 %r12417, %r12417, 131072; 2026-02-21T09:20:56.1038706Z add.s64 %rd642, %rd642, 64; 2026-02-21T09:20:56.1038853Z setp.lt.u64 %p68, %rd643, 496; 2026-02-21T09:20:56.1038922Z @%p68 bra $L__BB0_14; 2026-02-21T09:20:56.1039044Z // %bb.15: // in Loop: Header=BB0_13 Depth=1 2026-02-21T09:20:56.1039253Z .loc 1 34 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:34:32 2026-02-21T09:20:56.1039318Z or.b32 %r11754, %r1285, 
%r9; 2026-02-21T09:20:56.1039517Z .loc 1 36 32 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:36:32 2026-02-21T09:20:56.1039580Z or.b32 %r11755, %r1286, %r15; 2026-02-21T09:20:56.1039639Z or.b32 %r11756, %r1286, %r16; 2026-02-21T09:20:56.1039701Z or.b32 %r11757, %r1286, %r17; 2026-02-21T09:20:56.1039765Z or.b32 %r11758, %r1286, %r18; 2026-02-21T09:20:56.1039823Z or.b32 %r11759, %r1286, %r19; 2026-02-21T09:20:56.1039884Z or.b32 %r11760, %r1286, %r20; 2026-02-21T09:20:56.1039941Z or.b32 %r11761, %r1286, %r21; 2026-02-21T09:20:56.1039998Z or.b32 %r11762, %r1286, %r22; 2026-02-21T09:20:56.1040059Z or.b32 %r11763, %r1286, %r23; 2026-02-21T09:20:56.1040118Z or.b32 %r11764, %r1286, %r24; 2026-02-21T09:20:56.1040174Z or.b32 %r11765, %r1286, %r25; 2026-02-21T09:20:56.1040234Z or.b32 %r11766, %r1286, %r26; 2026-02-21T09:20:56.1040296Z or.b32 %r11767, %r1286, %r27; 2026-02-21T09:20:56.1040353Z or.b32 %r11768, %r1286, %r28; 2026-02-21T09:20:56.1040409Z or.b32 %r11769, %r1286, %r29; 2026-02-21T09:20:56.1040472Z or.b32 %r11770, %r1286, %r30; 2026-02-21T09:20:56.1040672Z .loc 1 43 78 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:43:78 2026-02-21T09:20:56.1040739Z cp.async.wait_group 0; 2026-02-21T09:20:56.1040794Z bar.sync 0; 2026-02-21T09:20:56.1040999Z .loc 1 90 28 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:90:28 2026-02-21T09:20:56.1041086Z cvt.rn.bf16x2.f32 %r11771, %r12421, %r12420; 2026-02-21T09:20:56.1041165Z cvt.rn.bf16x2.f32 %r11772, %r12423, %r12422; 2026-02-21T09:20:56.1041246Z cvt.rn.bf16x2.f32 %r11773, %r12425, %r12424; 2026-02-21T09:20:56.1041322Z cvt.rn.bf16x2.f32 %r11774, %r12427, %r12426; 2026-02-21T09:20:56.1041397Z cvt.rn.bf16x2.f32 %r11775, %r12429, %r12428; 2026-02-21T09:20:56.1041477Z cvt.rn.bf16x2.f32 %r11776, %r12431, %r12430; 2026-02-21T09:20:56.1041552Z cvt.rn.bf16x2.f32 %r11777, %r12433, %r12432; 2026-02-21T09:20:56.1041640Z cvt.rn.bf16x2.f32 %r11778, %r12435, %r12434; 2026-02-21T09:20:56.1041795Z 
cvt.rn.bf16x2.f32 %r11779, %r12437, %r12436; 2026-02-21T09:20:56.1041873Z cvt.rn.bf16x2.f32 %r11780, %r12439, %r12438; 2026-02-21T09:20:56.1041958Z cvt.rn.bf16x2.f32 %r11781, %r12441, %r12440; 2026-02-21T09:20:56.1042038Z cvt.rn.bf16x2.f32 %r11782, %r12443, %r12442; 2026-02-21T09:20:56.1042171Z cvt.rn.bf16x2.f32 %r11783, %r12445, %r12444; 2026-02-21T09:20:56.1042247Z cvt.rn.bf16x2.f32 %r11784, %r12447, %r12446; 2026-02-21T09:20:56.1042321Z cvt.rn.bf16x2.f32 %r11785, %r12449, %r12448; 2026-02-21T09:20:56.1042398Z cvt.rn.bf16x2.f32 %r11786, %r12451, %r12450; 2026-02-21T09:20:56.1042474Z cvt.rn.bf16x2.f32 %r11787, %r12453, %r12452; 2026-02-21T09:20:56.1042549Z cvt.rn.bf16x2.f32 %r11788, %r12455, %r12454; 2026-02-21T09:20:56.1042673Z cvt.rn.bf16x2.f32 %r11789, %r12457, %r12456; 2026-02-21T09:20:56.1042752Z cvt.rn.bf16x2.f32 %r11790, %r12459, %r12458; 2026-02-21T09:20:56.1042829Z cvt.rn.bf16x2.f32 %r11791, %r12461, %r12460; 2026-02-21T09:20:56.1042905Z cvt.rn.bf16x2.f32 %r11792, %r12463, %r12462; 2026-02-21T09:20:56.1042985Z cvt.rn.bf16x2.f32 %r11793, %r12465, %r12464; 2026-02-21T09:20:56.1043059Z cvt.rn.bf16x2.f32 %r11794, %r12467, %r12466; 2026-02-21T09:20:56.1043133Z cvt.rn.bf16x2.f32 %r11795, %r12469, %r12468; 2026-02-21T09:20:56.1043212Z cvt.rn.bf16x2.f32 %r11796, %r12471, %r12470; 2026-02-21T09:20:56.1043285Z cvt.rn.bf16x2.f32 %r11797, %r12473, %r12472; 2026-02-21T09:20:56.1043359Z cvt.rn.bf16x2.f32 %r11798, %r12475, %r12474; 2026-02-21T09:20:56.1043435Z cvt.rn.bf16x2.f32 %r11799, %r12477, %r12476; 2026-02-21T09:20:56.1043562Z cvt.rn.bf16x2.f32 %r11800, %r12479, %r12478; 2026-02-21T09:20:56.1043644Z cvt.rn.bf16x2.f32 %r11801, %r12481, %r12480; 2026-02-21T09:20:56.1043719Z cvt.rn.bf16x2.f32 %r11802, %r12483, %r12482; 2026-02-21T09:20:56.1043798Z cvt.rn.bf16x2.f32 %r11803, %r12485, %r12484; 2026-02-21T09:20:56.1043874Z cvt.rn.bf16x2.f32 %r11804, %r12487, %r12486; 2026-02-21T09:20:56.1043947Z cvt.rn.bf16x2.f32 %r11805, %r12489, %r12488; 2026-02-21T09:20:56.1044026Z 
cvt.rn.bf16x2.f32 %r11806, %r12491, %r12490; 2026-02-21T09:20:56.1044101Z cvt.rn.bf16x2.f32 %r11807, %r12493, %r12492; 2026-02-21T09:20:56.1044175Z cvt.rn.bf16x2.f32 %r11808, %r12495, %r12494; 2026-02-21T09:20:56.1044249Z cvt.rn.bf16x2.f32 %r11809, %r12497, %r12496; 2026-02-21T09:20:56.1044326Z cvt.rn.bf16x2.f32 %r11810, %r12499, %r12498; 2026-02-21T09:20:56.1044399Z cvt.rn.bf16x2.f32 %r11811, %r12501, %r12500; 2026-02-21T09:20:56.1044474Z cvt.rn.bf16x2.f32 %r11812, %r12503, %r12502; 2026-02-21T09:20:56.1044558Z cvt.rn.bf16x2.f32 %r11813, %r12505, %r12504; 2026-02-21T09:20:56.1044633Z cvt.rn.bf16x2.f32 %r11814, %r12507, %r12506; 2026-02-21T09:20:56.1044708Z cvt.rn.bf16x2.f32 %r11815, %r12509, %r12508; 2026-02-21T09:20:56.1044783Z cvt.rn.bf16x2.f32 %r11816, %r12511, %r12510; 2026-02-21T09:20:56.1044861Z cvt.rn.bf16x2.f32 %r11817, %r12513, %r12512; 2026-02-21T09:20:56.1044936Z cvt.rn.bf16x2.f32 %r11818, %r12515, %r12514; 2026-02-21T09:20:56.1045010Z cvt.rn.bf16x2.f32 %r11819, %r12517, %r12516; 2026-02-21T09:20:56.1045090Z cvt.rn.bf16x2.f32 %r11820, %r12519, %r12518; 2026-02-21T09:20:56.1045164Z cvt.rn.bf16x2.f32 %r11821, %r12521, %r12520; 2026-02-21T09:20:56.1045238Z cvt.rn.bf16x2.f32 %r11822, %r12523, %r12522; 2026-02-21T09:20:56.1045318Z cvt.rn.bf16x2.f32 %r11823, %r12525, %r12524; 2026-02-21T09:20:56.1045392Z cvt.rn.bf16x2.f32 %r11824, %r12527, %r12526; 2026-02-21T09:20:56.1045466Z cvt.rn.bf16x2.f32 %r11825, %r12529, %r12528; 2026-02-21T09:20:56.1045539Z cvt.rn.bf16x2.f32 %r11826, %r12531, %r12530; 2026-02-21T09:20:56.1045616Z cvt.rn.bf16x2.f32 %r11827, %r12533, %r12532; 2026-02-21T09:20:56.1045690Z cvt.rn.bf16x2.f32 %r11828, %r12535, %r12534; 2026-02-21T09:20:56.1045765Z cvt.rn.bf16x2.f32 %r11829, %r12537, %r12536; 2026-02-21T09:20:56.1045854Z cvt.rn.bf16x2.f32 %r11830, %r12539, %r12538; 2026-02-21T09:20:56.1045932Z cvt.rn.bf16x2.f32 %r11831, %r12541, %r12540; 2026-02-21T09:20:56.1046009Z cvt.rn.bf16x2.f32 %r11832, %r12543, %r12542; 2026-02-21T09:20:56.1046157Z 
cvt.rn.bf16x2.f32 %r11833, %r12545, %r12544; 2026-02-21T09:20:56.1046232Z cvt.rn.bf16x2.f32 %r11834, %r12547, %r12546; 2026-02-21T09:20:56.1046441Z .loc 1 91 43 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:91:43 2026-02-21T09:20:56.1046696Z shl.b32 %r11835, %r11755, 13; 2026-02-21T09:20:56.1046760Z shl.b32 %r11836, %r11756, 13; 2026-02-21T09:20:56.1046818Z shl.b32 %r11837, %r11757, 13; 2026-02-21T09:20:56.1046877Z shl.b32 %r11838, %r11758, 13; 2026-02-21T09:20:56.1046937Z shl.b32 %r11839, %r11759, 13; 2026-02-21T09:20:56.1047008Z shl.b32 %r11840, %r11760, 13; 2026-02-21T09:20:56.1047069Z shl.b32 %r11841, %r11761, 13; 2026-02-21T09:20:56.1047129Z shl.b32 %r11842, %r11762, 13; 2026-02-21T09:20:56.1047252Z shl.b32 %r11843, %r11763, 13; 2026-02-21T09:20:56.1047325Z shl.b32 %r11844, %r11764, 13; 2026-02-21T09:20:56.1047383Z shl.b32 %r11845, %r11765, 13; 2026-02-21T09:20:56.1047443Z shl.b32 %r11846, %r11766, 13; 2026-02-21T09:20:56.1047504Z shl.b32 %r11847, %r11767, 13; 2026-02-21T09:20:56.1047561Z shl.b32 %r11848, %r11768, 13; 2026-02-21T09:20:56.1047622Z shl.b32 %r11849, %r11769, 13; 2026-02-21T09:20:56.1047681Z shl.b32 %r11850, %r11770, 13; 2026-02-21T09:20:56.1047888Z .loc 1 91 50 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:91:50 2026-02-21T09:20:56.1047953Z add.s32 %r11851, %r11835, %r11754; 2026-02-21T09:20:56.1048018Z add.s32 %r11852, %r11836, %r11754; 2026-02-21T09:20:56.1048080Z add.s32 %r11853, %r11837, %r11754; 2026-02-21T09:20:56.1048202Z add.s32 %r11854, %r11838, %r11754; 2026-02-21T09:20:56.1048271Z add.s32 %r11855, %r11839, %r11754; 2026-02-21T09:20:56.1048329Z add.s32 %r11856, %r11840, %r11754; 2026-02-21T09:20:56.1048389Z add.s32 %r11857, %r11841, %r11754; 2026-02-21T09:20:56.1048453Z add.s32 %r11858, %r11842, %r11754; 2026-02-21T09:20:56.1048512Z add.s32 %r11859, %r11843, %r11754; 2026-02-21T09:20:56.1048569Z add.s32 %r11860, %r11844, %r11754; 2026-02-21T09:20:56.1048631Z add.s32 %r11861, %r11845, %r11754; 
2026-02-21T09:20:56.1048693Z add.s32 %r11862, %r11846, %r11754; 2026-02-21T09:20:56.1048752Z add.s32 %r11863, %r11847, %r11754; 2026-02-21T09:20:56.1048811Z add.s32 %r11864, %r11848, %r11754; 2026-02-21T09:20:56.1048873Z add.s32 %r11865, %r11849, %r11754; 2026-02-21T09:20:56.1048932Z add.s32 %r11866, %r11850, %r11754; 2026-02-21T09:20:56.1049136Z .loc 1 91 22 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:91:22 2026-02-21T09:20:56.1049210Z mad.wide.s32 %rd617, %r11851, 2, %rd46; 2026-02-21T09:20:56.1049296Z mad.wide.s32 %rd618, %r11852, 2, %rd46; 2026-02-21T09:20:56.1049366Z mad.wide.s32 %rd619, %r11853, 2, %rd46; 2026-02-21T09:20:56.1049433Z mad.wide.s32 %rd620, %r11854, 2, %rd46; 2026-02-21T09:20:56.1049504Z mad.wide.s32 %rd621, %r11855, 2, %rd46; 2026-02-21T09:20:56.1049569Z mad.wide.s32 %rd622, %r11856, 2, %rd46; 2026-02-21T09:20:56.1049636Z mad.wide.s32 %rd623, %r11857, 2, %rd46; 2026-02-21T09:20:56.1049706Z mad.wide.s32 %rd624, %r11858, 2, %rd46; 2026-02-21T09:20:56.1049772Z mad.wide.s32 %rd625, %r11859, 2, %rd46; 2026-02-21T09:20:56.1049837Z mad.wide.s32 %rd626, %r11860, 2, %rd46; 2026-02-21T09:20:56.1049904Z mad.wide.s32 %rd627, %r11861, 2, %rd46; 2026-02-21T09:20:56.1049977Z mad.wide.s32 %rd628, %r11862, 2, %rd46; 2026-02-21T09:20:56.1050043Z mad.wide.s32 %rd629, %r11863, 2, %rd46; 2026-02-21T09:20:56.1050110Z mad.wide.s32 %rd630, %r11864, 2, %rd46; 2026-02-21T09:20:56.1050181Z mad.wide.s32 %rd631, %r11865, 2, %rd46; 2026-02-21T09:20:56.1050247Z mad.wide.s32 %rd632, %r11866, 2, %rd46; 2026-02-21T09:20:56.1050451Z .loc 1 91 81 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:91:81 2026-02-21T09:20:56.1050576Z st.shared.v4.b32 [%r192], {%r11771, %r11773, %r11775, %r11777}; 2026-02-21T09:20:56.1050701Z st.shared.v4.b32 [%r192+512], {%r11772, %r11774, %r11776, %r11778}; 2026-02-21T09:20:56.1050812Z st.shared.v4.b32 [%r193], {%r11779, %r11781, %r11783, %r11785}; 2026-02-21T09:20:56.1051016Z st.shared.v4.b32 [%r193+512], {%r11780, %r11782, 
%r11784, %r11786}; 2026-02-21T09:20:56.1051131Z st.shared.v4.b32 [%r194], {%r11787, %r11789, %r11791, %r11793}; 2026-02-21T09:20:56.1051245Z st.shared.v4.b32 [%r194+512], {%r11788, %r11790, %r11792, %r11794}; 2026-02-21T09:20:56.1051397Z st.shared.v4.b32 [%r195], {%r11795, %r11797, %r11799, %r11801}; 2026-02-21T09:20:56.1051513Z st.shared.v4.b32 [%r195+512], {%r11796, %r11798, %r11800, %r11802}; 2026-02-21T09:20:56.1051570Z bar.sync 0; 2026-02-21T09:20:56.1051632Z // begin inline asm 2026-02-21T09:20:56.1051837Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11610, %r11611, %r11612, %r11613}, [%r11614]; 2026-02-21T09:20:56.1051894Z // end inline asm 2026-02-21T09:20:56.1052001Z // begin inline asm 2026-02-21T09:20:56.1052197Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11615, %r11616, %r11617, %r11618}, [%r11619]; 2026-02-21T09:20:56.1052257Z // end inline asm 2026-02-21T09:20:56.1052316Z // begin inline asm 2026-02-21T09:20:56.1052506Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11620, %r11621, %r11622, %r11623}, [%r11624]; 2026-02-21T09:20:56.1052562Z // end inline asm 2026-02-21T09:20:56.1052619Z // begin inline asm 2026-02-21T09:20:56.1052808Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11625, %r11626, %r11627, %r11628}, [%r11629]; 2026-02-21T09:20:56.1052865Z // end inline asm 2026-02-21T09:20:56.1052922Z // begin inline asm 2026-02-21T09:20:56.1053109Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11630, %r11631, %r11632, %r11633}, [%r11634]; 2026-02-21T09:20:56.1053229Z // end inline asm 2026-02-21T09:20:56.1053294Z // begin inline asm 2026-02-21T09:20:56.1053483Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11635, %r11636, %r11637, %r11638}, [%r11639]; 2026-02-21T09:20:56.1053540Z // end inline asm 2026-02-21T09:20:56.1053602Z // begin inline asm 2026-02-21T09:20:56.1053790Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11640, %r11641, %r11642, %r11643}, [%r11644]; 2026-02-21T09:20:56.1053847Z // end inline asm 2026-02-21T09:20:56.1053906Z // begin inline asm 
2026-02-21T09:20:56.1054093Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11645, %r11646, %r11647, %r11648}, [%r11649]; 2026-02-21T09:20:56.1054148Z // end inline asm 2026-02-21T09:20:56.1054204Z bar.sync 0; 2026-02-21T09:20:56.1054317Z st.shared.v4.b32 [%r192], {%r11803, %r11805, %r11807, %r11809}; 2026-02-21T09:20:56.1054446Z st.shared.v4.b32 [%r192+512], {%r11804, %r11806, %r11808, %r11810}; 2026-02-21T09:20:56.1054559Z st.shared.v4.b32 [%r193], {%r11811, %r11813, %r11815, %r11817}; 2026-02-21T09:20:56.1054680Z st.shared.v4.b32 [%r193+512], {%r11812, %r11814, %r11816, %r11818}; 2026-02-21T09:20:56.1054789Z st.shared.v4.b32 [%r194], {%r11819, %r11821, %r11823, %r11825}; 2026-02-21T09:20:56.1054907Z st.shared.v4.b32 [%r194+512], {%r11820, %r11822, %r11824, %r11826}; 2026-02-21T09:20:56.1055016Z st.shared.v4.b32 [%r195], {%r11827, %r11829, %r11831, %r11833}; 2026-02-21T09:20:56.1055130Z st.shared.v4.b32 [%r195+512], {%r11828, %r11830, %r11832, %r11834}; 2026-02-21T09:20:56.1055185Z bar.sync 0; 2026-02-21T09:20:56.1055243Z // begin inline asm 2026-02-21T09:20:56.1055440Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11650, %r11651, %r11652, %r11653}, [%r11614]; 2026-02-21T09:20:56.1055498Z // end inline asm 2026-02-21T09:20:56.1055555Z // begin inline asm 2026-02-21T09:20:56.1055750Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11655, %r11656, %r11657, %r11658}, [%r11619]; 2026-02-21T09:20:56.1055804Z // end inline asm 2026-02-21T09:20:56.1055860Z // begin inline asm 2026-02-21T09:20:56.1056054Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11660, %r11661, %r11662, %r11663}, [%r11624]; 2026-02-21T09:20:56.1056109Z // end inline asm 2026-02-21T09:20:56.1056166Z // begin inline asm 2026-02-21T09:20:56.1056355Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11665, %r11666, %r11667, %r11668}, [%r11629]; 2026-02-21T09:20:56.1056414Z // end inline asm 2026-02-21T09:20:56.1056589Z // begin inline asm 2026-02-21T09:20:56.1056879Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11670, 
%r11671, %r11672, %r11673}, [%r11634]; 2026-02-21T09:20:56.1056939Z // end inline asm 2026-02-21T09:20:56.1056997Z // begin inline asm 2026-02-21T09:20:56.1057192Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11675, %r11676, %r11677, %r11678}, [%r11639]; 2026-02-21T09:20:56.1057322Z // end inline asm 2026-02-21T09:20:56.1057382Z // begin inline asm 2026-02-21T09:20:56.1057576Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11680, %r11681, %r11682, %r11683}, [%r11644]; 2026-02-21T09:20:56.1057634Z // end inline asm 2026-02-21T09:20:56.1057695Z // begin inline asm 2026-02-21T09:20:56.1057885Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11685, %r11686, %r11687, %r11688}, [%r11649]; 2026-02-21T09:20:56.1058003Z // end inline asm 2026-02-21T09:20:56.1058065Z // begin inline asm 2026-02-21T09:20:56.1058196Z st.global.v4.b32 [ %rd617 + 0 ], { %r11610, %r11611, %r11612, %r11613 }; 2026-02-21T09:20:56.1058251Z // end inline asm 2026-02-21T09:20:56.1058310Z // begin inline asm 2026-02-21T09:20:56.1058435Z st.global.v4.b32 [ %rd618 + 0 ], { %r11615, %r11616, %r11617, %r11618 }; 2026-02-21T09:20:56.1058491Z // end inline asm 2026-02-21T09:20:56.1058548Z // begin inline asm 2026-02-21T09:20:56.1058672Z st.global.v4.b32 [ %rd619 + 0 ], { %r11620, %r11621, %r11622, %r11623 }; 2026-02-21T09:20:56.1058728Z // end inline asm 2026-02-21T09:20:56.1058785Z // begin inline asm 2026-02-21T09:20:56.1058910Z st.global.v4.b32 [ %rd620 + 0 ], { %r11625, %r11626, %r11627, %r11628 }; 2026-02-21T09:20:56.1059027Z // end inline asm 2026-02-21T09:20:56.1059087Z // begin inline asm 2026-02-21T09:20:56.1059204Z st.global.v4.b32 [ %rd621 + 0 ], { %r11630, %r11631, %r11632, %r11633 }; 2026-02-21T09:20:56.1059264Z // end inline asm 2026-02-21T09:20:56.1059322Z // begin inline asm 2026-02-21T09:20:56.1059439Z st.global.v4.b32 [ %rd622 + 0 ], { %r11635, %r11636, %r11637, %r11638 }; 2026-02-21T09:20:56.1059498Z // end inline asm 2026-02-21T09:20:56.1059558Z // begin inline asm 2026-02-21T09:20:56.1059674Z 
st.global.v4.b32 [ %rd623 + 0 ], { %r11640, %r11641, %r11642, %r11643 }; 2026-02-21T09:20:56.1059729Z // end inline asm 2026-02-21T09:20:56.1059789Z // begin inline asm 2026-02-21T09:20:56.1059909Z st.global.v4.b32 [ %rd624 + 0 ], { %r11645, %r11646, %r11647, %r11648 }; 2026-02-21T09:20:56.1059965Z // end inline asm 2026-02-21T09:20:56.1060024Z // begin inline asm 2026-02-21T09:20:56.1060141Z st.global.v4.b32 [ %rd625 + 0 ], { %r11650, %r11651, %r11652, %r11653 }; 2026-02-21T09:20:56.1060197Z // end inline asm 2026-02-21T09:20:56.1060259Z // begin inline asm 2026-02-21T09:20:56.1060376Z st.global.v4.b32 [ %rd626 + 0 ], { %r11655, %r11656, %r11657, %r11658 }; 2026-02-21T09:20:56.1060430Z // end inline asm 2026-02-21T09:20:56.1060489Z // begin inline asm 2026-02-21T09:20:56.1060607Z st.global.v4.b32 [ %rd627 + 0 ], { %r11660, %r11661, %r11662, %r11663 }; 2026-02-21T09:20:56.1060661Z // end inline asm 2026-02-21T09:20:56.1060722Z // begin inline asm 2026-02-21T09:20:56.1060842Z st.global.v4.b32 [ %rd628 + 0 ], { %r11665, %r11666, %r11667, %r11668 }; 2026-02-21T09:20:56.1060897Z // end inline asm 2026-02-21T09:20:56.1060954Z // begin inline asm 2026-02-21T09:20:56.1061071Z st.global.v4.b32 [ %rd629 + 0 ], { %r11670, %r11671, %r11672, %r11673 }; 2026-02-21T09:20:56.1061131Z // end inline asm 2026-02-21T09:20:56.1061192Z // begin inline asm 2026-02-21T09:20:56.1061321Z st.global.v4.b32 [ %rd630 + 0 ], { %r11675, %r11676, %r11677, %r11678 }; 2026-02-21T09:20:56.1061380Z // end inline asm 2026-02-21T09:20:56.1061440Z // begin inline asm 2026-02-21T09:20:56.1061559Z st.global.v4.b32 [ %rd631 + 0 ], { %r11680, %r11681, %r11682, %r11683 }; 2026-02-21T09:20:56.1061617Z // end inline asm 2026-02-21T09:20:56.1061675Z // begin inline asm 2026-02-21T09:20:56.1061792Z st.global.v4.b32 [ %rd632 + 0 ], { %r11685, %r11686, %r11687, %r11688 }; 2026-02-21T09:20:56.1061846Z // end inline asm 2026-02-21T09:20:56.1062070Z .loc 1 22 121 // 
czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:22:121 2026-02-21T09:20:56.1062200Z add.s32 %r12416, %r12416, 1; 2026-02-21T09:20:56.1062270Z setp.ne.b32 %p69, %r12416, %r2; 2026-02-21T09:20:56.1062335Z @%p69 bra $L__BB0_13; 2026-02-21T09:20:56.1062470Z $L__BB0_16: // %._crit_edge 2026-02-21T09:20:56.1062675Z .loc 1 22 4 // czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py:22:4 2026-02-21T09:20:56.1062730Z ret; 2026-02-21T09:20:56.1062784Z $L__tmp21: 2026-02-21T09:20:56.1062840Z $L__func_end0: 2026-02-21T09:20:56.1062930Z // -- End function 2026-02-21T09:20:56.1062988Z } 2026-02-21T09:20:56.1063297Z .file 1 "/tmp/torchinductor_root/zx/czxhdcweo4rhyvelg2mubnzsjrxb5lwia7gfcczmhsworx47mpcw.py" 2026-02-21T09:20:56.1063513Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T09:20:56.1063583Z .section .debug_abbrev 2026-02-21T09:20:56.1063636Z { 2026-02-21T09:20:56.1063732Z .b8 1 // Abbreviation Code 2026-02-21T09:20:56.1063825Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:20:56.1063924Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:20:56.1064014Z .b8 37 // DW_AT_producer 2026-02-21T09:20:56.1064097Z .b8 8 // DW_FORM_string 2026-02-21T09:20:56.1064179Z .b8 19 // DW_AT_language 2026-02-21T09:20:56.1064307Z .b8 5 // DW_FORM_data2 2026-02-21T09:20:56.1064388Z .b8 3 // DW_AT_name 2026-02-21T09:20:56.1064469Z .b8 8 // DW_FORM_string 2026-02-21T09:20:56.1064551Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:20:56.1064630Z .b8 6 // DW_FORM_data4 2026-02-21T09:20:56.1064708Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:20:56.1064790Z .b8 8 // DW_FORM_string 2026-02-21T09:20:56.1064863Z .b8 0 // EOM(1) 2026-02-21T09:20:56.1064932Z .b8 0 // EOM(2) 2026-02-21T09:20:56.1065038Z .b8 2 // Abbreviation Code 2026-02-21T09:20:56.1065127Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:20:56.1065208Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:20:56.1065287Z .b8 3 // DW_AT_name 2026-02-21T09:20:56.1065365Z .b8 8 // DW_FORM_string 
2026-02-21T09:20:56.1065444Z .b8 32 // DW_AT_inline 2026-02-21T09:20:56.1065526Z .b8 11 // DW_FORM_data1 2026-02-21T09:20:56.1065597Z .b8 0 // EOM(1) 2026-02-21T09:20:56.1065665Z .b8 0 // EOM(2) 2026-02-21T09:20:56.1065752Z .b8 3 // Abbreviation Code 2026-02-21T09:20:56.1065840Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:20:56.1065920Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:20:56.1066011Z .b8 17 // DW_AT_low_pc 2026-02-21T09:20:56.1066092Z .b8 1 // DW_FORM_addr 2026-02-21T09:20:56.1066172Z .b8 18 // DW_AT_high_pc 2026-02-21T09:20:56.1066249Z .b8 1 // DW_FORM_addr 2026-02-21T09:20:56.1066349Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:20:56.1066426Z .b8 19 // DW_FORM_ref4 2026-02-21T09:20:56.1066617Z .b8 0 // EOM(1) 2026-02-21T09:20:56.1066690Z .b8 0 // EOM(2) 2026-02-21T09:20:56.1066788Z .b8 4 // Abbreviation Code 2026-02-21T09:20:56.1066970Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T09:20:56.1067051Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:20:56.1067147Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:20:56.1067285Z .b8 19 // DW_FORM_ref4 2026-02-21T09:20:56.1067373Z .b8 17 // DW_AT_low_pc 2026-02-21T09:20:56.1067453Z .b8 1 // DW_FORM_addr 2026-02-21T09:20:56.1067536Z .b8 18 // DW_AT_high_pc 2026-02-21T09:20:56.1067610Z .b8 1 // DW_FORM_addr 2026-02-21T09:20:56.1067758Z .b8 88 // DW_AT_call_file 2026-02-21T09:20:56.1067838Z .b8 11 // DW_FORM_data1 2026-02-21T09:20:56.1067918Z .b8 89 // DW_AT_call_line 2026-02-21T09:20:56.1067997Z .b8 11 // DW_FORM_data1 2026-02-21T09:20:56.1068089Z .b8 87 // DW_AT_call_column 2026-02-21T09:20:56.1068165Z .b8 11 // DW_FORM_data1 2026-02-21T09:20:56.1068235Z .b8 0 // EOM(1) 2026-02-21T09:20:56.1068385Z .b8 0 // EOM(2) 2026-02-21T09:20:56.1068461Z .b8 0 // EOM(3) 2026-02-21T09:20:56.1068513Z } 2026-02-21T09:20:56.1068577Z .section .debug_info 2026-02-21T09:20:56.1068635Z { 2026-02-21T09:20:56.1068790Z .b32 178 // Length of Unit 2026-02-21T09:20:56.1068887Z .b8 2 // DWARF version number 2026-02-21T09:20:56.1068942Z .b8 0 
2026-02-21T09:20:56.1069075Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:20:56.1069180Z .b8 8 // Address Size (in bytes) 2026-02-21T09:20:56.1069304Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T09:20:56.1069391Z .b8 116 // DW_AT_producer 2026-02-21T09:20:56.1069444Z .b8 114 2026-02-21T09:20:56.1069495Z .b8 105 2026-02-21T09:20:56.1069552Z .b8 116 2026-02-21T09:20:56.1069602Z .b8 111 2026-02-21T09:20:56.1069652Z .b8 110 2026-02-21T09:20:56.1069706Z .b8 0 2026-02-21T09:20:56.1069784Z .b8 2 // DW_AT_language 2026-02-21T09:20:56.1069835Z .b8 0 2026-02-21T09:20:56.1069913Z .b8 99 // DW_AT_name 2026-02-21T09:20:56.1069968Z .b8 122 2026-02-21T09:20:56.1070018Z .b8 120 2026-02-21T09:20:56.1070067Z .b8 104 2026-02-21T09:20:56.1070119Z .b8 100 2026-02-21T09:20:56.1070168Z .b8 99 2026-02-21T09:20:56.1070221Z .b8 119 2026-02-21T09:20:56.1070271Z .b8 101 2026-02-21T09:20:56.1070323Z .b8 111 2026-02-21T09:20:56.1070373Z .b8 52 2026-02-21T09:20:56.1070422Z .b8 114 2026-02-21T09:20:56.1070473Z .b8 104 2026-02-21T09:20:56.1070526Z .b8 121 2026-02-21T09:20:56.1070576Z .b8 118 2026-02-21T09:20:56.1070625Z .b8 101 2026-02-21T09:20:56.1070678Z .b8 108 2026-02-21T09:20:56.1070728Z .b8 103 2026-02-21T09:20:56.1070776Z .b8 50 2026-02-21T09:20:56.1070826Z .b8 109 2026-02-21T09:20:56.1070889Z .b8 117 2026-02-21T09:20:56.1070942Z .b8 98 2026-02-21T09:20:56.1070993Z .b8 110 2026-02-21T09:20:56.1071047Z .b8 122 2026-02-21T09:20:56.1071108Z .b8 115 2026-02-21T09:20:56.1071161Z .b8 106 2026-02-21T09:20:56.1071210Z .b8 114 2026-02-21T09:20:56.1071265Z .b8 120 2026-02-21T09:20:56.1071315Z .b8 98 2026-02-21T09:20:56.1071365Z .b8 53 2026-02-21T09:20:56.1071419Z .b8 108 2026-02-21T09:20:56.1071469Z .b8 119 2026-02-21T09:20:56.1071529Z .b8 105 2026-02-21T09:20:56.1071580Z .b8 97 2026-02-21T09:20:56.1071634Z .b8 55 2026-02-21T09:20:56.1071686Z .b8 103 2026-02-21T09:20:56.1071736Z .b8 102 2026-02-21T09:20:56.1071790Z .b8 99 2026-02-21T09:20:56.1071839Z .b8 99 
2026-02-21T09:20:56.1071890Z .b8 122 2026-02-21T09:20:56.1071939Z .b8 109 2026-02-21T09:20:56.1072067Z .b8 104 2026-02-21T09:20:56.1072118Z .b8 115 2026-02-21T09:20:56.1072169Z .b8 119 2026-02-21T09:20:56.1072224Z .b8 111 2026-02-21T09:20:56.1072278Z .b8 114 2026-02-21T09:20:56.1072329Z .b8 120 2026-02-21T09:20:56.1072379Z .b8 52 2026-02-21T09:20:56.1072480Z .b8 55 2026-02-21T09:20:56.1072530Z .b8 109 2026-02-21T09:20:56.1072578Z .b8 112 2026-02-21T09:20:56.1072627Z .b8 99 2026-02-21T09:20:56.1072681Z .b8 119 2026-02-21T09:20:56.1072734Z .b8 46 2026-02-21T09:20:56.1072783Z .b8 112 2026-02-21T09:20:56.1072838Z .b8 121 2026-02-21T09:20:56.1072888Z .b8 0 2026-02-21T09:20:56.1072991Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:20:56.1073073Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:20:56.1073127Z .b8 116 2026-02-21T09:20:56.1073253Z .b8 109 2026-02-21T09:20:56.1073309Z .b8 112 2026-02-21T09:20:56.1073362Z .b8 47 2026-02-21T09:20:56.1073413Z .b8 116 2026-02-21T09:20:56.1073464Z .b8 111 2026-02-21T09:20:56.1073525Z .b8 114 2026-02-21T09:20:56.1073583Z .b8 99 2026-02-21T09:20:56.1073636Z .b8 104 2026-02-21T09:20:56.1073688Z .b8 105 2026-02-21T09:20:56.1073739Z .b8 110 2026-02-21T09:20:56.1073793Z .b8 100 2026-02-21T09:20:56.1073843Z .b8 117 2026-02-21T09:20:56.1073894Z .b8 99 2026-02-21T09:20:56.1073948Z .b8 116 2026-02-21T09:20:56.1074000Z .b8 111 2026-02-21T09:20:56.1074051Z .b8 114 2026-02-21T09:20:56.1074100Z .b8 95 2026-02-21T09:20:56.1074152Z .b8 114 2026-02-21T09:20:56.1074202Z .b8 111 2026-02-21T09:20:56.1074251Z .b8 111 2026-02-21T09:20:56.1074305Z .b8 116 2026-02-21T09:20:56.1074354Z .b8 47 2026-02-21T09:20:56.1074458Z .b8 122 2026-02-21T09:20:56.1074515Z .b8 120 2026-02-21T09:20:56.1074568Z .b8 0 2026-02-21T09:20:56.1074681Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T09:20:56.1074759Z .b8 95 // DW_AT_name 2026-02-21T09:20:56.1074811Z .b8 104 2026-02-21T09:20:56.1074862Z .b8 101 2026-02-21T09:20:56.1074910Z .b8 108 
2026-02-21T09:20:56.1074960Z .b8 105 2026-02-21T09:20:56.1075017Z .b8 111 2026-02-21T09:20:56.1075067Z .b8 110 2026-02-21T09:20:56.1075117Z .b8 95 2026-02-21T09:20:56.1075171Z .b8 109 2026-02-21T09:20:56.1075223Z .b8 97 2026-02-21T09:20:56.1075274Z .b8 116 2026-02-21T09:20:56.1075323Z .b8 109 2026-02-21T09:20:56.1075380Z .b8 117 2026-02-21T09:20:56.1075430Z .b8 108 2026-02-21T09:20:56.1075479Z .b8 95 2026-02-21T09:20:56.1075529Z .b8 98 2026-02-21T09:20:56.1075585Z .b8 102 2026-02-21T09:20:56.1075633Z .b8 49 2026-02-21T09:20:56.1075682Z .b8 54 2026-02-21T09:20:56.1075736Z .b8 95 2026-02-21T09:20:56.1075786Z .b8 105 2026-02-21T09:20:56.1075838Z .b8 110 2026-02-21T09:20:56.1075890Z .b8 116 2026-02-21T09:20:56.1075942Z .b8 52 2026-02-21T09:20:56.1075992Z .b8 0 2026-02-21T09:20:56.1076069Z .b8 1 // DW_AT_inline 2026-02-21T09:20:56.1076178Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T09:20:56.1076272Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T09:20:56.1076367Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T09:20:56.1076587Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:20:56.1076722Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T09:20:56.1076821Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:20:56.1076909Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T09:20:56.1076999Z .b64 $L__tmp20 // DW_AT_high_pc 2026-02-21T09:20:56.1077081Z .b8 1 // DW_AT_call_file 2026-02-21T09:20:56.1077163Z .b8 87 // DW_AT_call_line 2026-02-21T09:20:56.1077247Z .b8 40 // DW_AT_call_column 2026-02-21T09:20:56.1077338Z .b8 0 // End Of Children Mark 2026-02-21T09:20:56.1077425Z .b8 0 // End Of Children Mark 2026-02-21T09:20:56.1077566Z } 2026-02-21T09:20:56.1077638Z .section .debug_macinfo { } 2026-02-21T09:20:56.1077644Z 2026-02-21T09:20:56.1077726Z ================================================================ 2026-02-21T09:20:56.1077846Z please share the reproducer above with Triton project. 
2026-02-21T09:20:57.6254866Z 2026-02-21T09:20:57.6254883Z 2026-02-21T09:20:57.6254888Z 2026-02-21T09:20:57.6255149Z ================================================================ 2026-02-21T09:20:57.6255516Z Internal Triton PTX codegen error 2026-02-21T09:20:57.6255818Z `ptxas` stderr: 2026-02-21T09:20:57.6256843Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_helion_matmul_bf16_int4' 2026-02-21T09:20:57.6258427Z ptxas fatal : (C7600) Register allocation failed with register count of '64'. Compile the program with a higher register target 2026-02-21T09:20:57.6259113Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:20:57.6259352Z 2026-02-21T09:20:57.6260003Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpza8qfwif.ptx -o /tmp/tmpza8qfwif.ptx.o 2026-02-21T09:20:57.6260766Z 2026-02-21T09:20:57.6260770Z 2026-02-21T09:20:57.6260845Z // 2026-02-21T09:20:57.6261053Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:20:57.6261357Z // 2026-02-21T09:20:57.6261461Z 2026-02-21T09:20:57.6261564Z .version 8.7 2026-02-21T09:20:57.6261732Z .target sm_90a 2026-02-21T09:20:57.6261943Z .address_size 64 2026-02-21T09:20:57.6262513Z 2026-02-21T09:20:57.6262740Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T09:20:57.6263147Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:20:57.6263445Z // @_helion_matmul_bf16_int4 2026-02-21T09:20:57.6263746Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T09:20:57.6264097Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T09:20:57.6264521Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T09:20:57.6264920Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T09:20:57.6265322Z 
.param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T09:20:57.6265711Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T09:20:57.6266022Z ) 2026-02-21T09:20:57.6266174Z .reqntid 1024 2026-02-21T09:20:57.6266333Z { 2026-02-21T09:20:57.6266686Z .reg .pred %p<64>; 2026-02-21T09:20:57.6266902Z .reg .b16 %rs<198>; 2026-02-21T09:20:57.6267090Z .reg .b32 %r<5340>; 2026-02-21T09:20:57.6267280Z .reg .b64 %rd<226>; 2026-02-21T09:20:57.6267643Z .loc 1 14 0 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:14:0 2026-02-21T09:20:57.6268066Z $L__func_begin0: 2026-02-21T09:20:57.6268518Z .loc 1 14 0 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:14:0 2026-02-21T09:20:57.6268868Z 2026-02-21T09:20:57.6268955Z // %bb.0: 2026-02-21T09:20:57.6269190Z ld.param.b64 %rd29, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T09:20:57.6269573Z ld.param.b64 %rd28, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T09:20:57.6269926Z ld.param.b64 %rd27, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T09:20:57.6270230Z $L__tmp0: 2026-02-21T09:20:57.6270570Z .loc 1 20 30 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:20:30 2026-02-21T09:20:57.6271014Z mov.u32 %r457, %ctaid.x; 2026-02-21T09:20:57.6271401Z .loc 1 20 35 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:20:35 2026-02-21T09:20:57.6271813Z shl.b32 %r5159, %r457, 1; 2026-02-21T09:20:57.6272181Z .loc 1 21 37 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:21:37 2026-02-21T09:20:57.6272569Z add.s32 %r458, %r5159, 2; 2026-02-21T09:20:57.6272910Z .loc 1 21 49 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:21:49 2026-02-21T09:20:57.6273411Z min.s32 %r2, %r458, 4096; 2026-02-21T09:20:57.6273754Z .loc 1 22 121 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:22:121 2026-02-21T09:20:57.6274206Z sub.s32 %r459, %r2, %r5159; 2026-02-21T09:20:57.6274391Z shr.s32 %r460, %r459, 31; 2026-02-21T09:20:57.6274566Z shr.u32 %r461, 
%r460, 30; 2026-02-21T09:20:57.6274737Z add.s32 %r462, %r459, %r461; 2026-02-21T09:20:57.6274927Z and.b32 %r463, %r462, -4; 2026-02-21T09:20:57.6275109Z add.s32 %r5304, %r463, %r5159; 2026-02-21T09:20:57.6275445Z .loc 1 34 45 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:34:45 2026-02-21T09:20:57.6275842Z mov.u32 %r4, %tid.x; 2026-02-21T09:20:57.6276077Z and.b32 %r5, %r4, 31; 2026-02-21T09:20:57.6287464Z shr.u32 %r6, %r4, 5; 2026-02-21T09:20:57.6287671Z shl.b32 %r7, %r4, 3; 2026-02-21T09:20:57.6287859Z and.b32 %r8, %r7, 120; 2026-02-21T09:20:57.6288044Z and.b32 %r9, %r4, 127; 2026-02-21T09:20:57.6288388Z .loc 1 36 45 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:36:45 2026-02-21T09:20:57.6288793Z and.b32 %r10, %r4, 1020; 2026-02-21T09:20:57.6288978Z shr.u32 %r11, %r4, 2; 2026-02-21T09:20:57.6289161Z shr.u32 %r12, %r4, 4; 2026-02-21T09:20:57.6289331Z or.b32 %r13, %r12, 64; 2026-02-21T09:20:57.6289506Z or.b32 %r14, %r12, 128; 2026-02-21T09:20:57.6289678Z or.b32 %r15, %r12, 192; 2026-02-21T09:20:57.6290137Z .loc 1 44 48 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:44:48 2026-02-21T09:20:57.6290508Z and.b32 %r16, %r4, 896; 2026-02-21T09:20:57.6290682Z shr.u32 %r17, %r4, 7; 2026-02-21T09:20:57.6291009Z .loc 1 50 38 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:50:38 2026-02-21T09:20:57.6291366Z and.b32 %r18, %r4, 3; 2026-02-21T09:20:57.6291547Z shl.b32 %r19, %r18, 2; 2026-02-21T09:20:57.6291882Z .loc 1 68 38 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:68:38 2026-02-21T09:20:57.6292257Z and.b32 %r20, %r4, 128; 2026-02-21T09:20:57.6292588Z .loc 1 22 121 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:22:121 2026-02-21T09:20:57.6293009Z setp.ge.s32 %p1, %r5159, %r5304; 2026-02-21T09:20:57.6293224Z and.b32 %r5134, %r7, 8056; 2026-02-21T09:20:57.6293422Z bfe.s32 %r5135, %r4, 4, 1; 2026-02-21T09:20:57.6293609Z mov.b32 %r5136, global_smem; 2026-02-21T09:20:57.6293799Z shl.b32 %r5137, %r4, 4; 
2026-02-21T09:20:57.6293985Z and.b32 %r5138, %r7, 96; 2026-02-21T09:20:57.6294161Z shl.b32 %r5139, %r18, 1; 2026-02-21T09:20:57.6294337Z shl.b32 %r5140, %r4, 2; 2026-02-21T09:20:57.6294505Z and.b32 %r5141, %r4, 384; 2026-02-21T09:20:57.6294689Z and.b32 %r5142, %r12, 2; 2026-02-21T09:20:57.6294860Z and.b32 %r5143, %r7, 512; 2026-02-21T09:20:57.6295057Z setp.gt.u32 %p62, %r4, 511; 2026-02-21T09:20:57.6295253Z shr.u32 %r5144, %r4, 1; 2026-02-21T09:20:57.6295425Z shl.b32 %r5145, %r9, 6; 2026-02-21T09:20:57.6295597Z and.b32 %r5146, %r7, 48; 2026-02-21T09:20:57.6295762Z shr.u32 %r5147, %r16, 5; 2026-02-21T09:20:57.6295935Z shl.b32 %r5148, %r6, 7; 2026-02-21T09:20:57.6296107Z shl.b32 %r5149, %r5, 4; 2026-02-21T09:20:57.6296285Z shl.b32 %r5150, %r5, 3; 2026-02-21T09:20:57.6296603Z shl.b32 %r5151, %r4, 8; 2026-02-21T09:20:57.6296785Z shl.b32 %r5152, %r18, 14; 2026-02-21T09:20:57.6296964Z shl.b32 %r5153, %r4, 5; 2026-02-21T09:20:57.6297134Z and.b32 %r5154, %r4, 24; 2026-02-21T09:20:57.6297308Z shl.b32 %r5155, %r18, 5; 2026-02-21T09:20:57.6297477Z shl.b32 %r5156, %r10, 2; 2026-02-21T09:20:57.6297649Z shl.b32 %r5157, %r11, 10; 2026-02-21T09:20:57.6297831Z shl.b32 %r5158, %r17, 13; 2026-02-21T09:20:57.6298023Z setp.eq.b32 %p63, %r20, 0; 2026-02-21T09:20:57.6298216Z @%p1 bra $L__BB0_11; 2026-02-21T09:20:57.6298420Z // %bb.1: // %.lr.ph 2026-02-21T09:20:57.6298819Z .loc 1 0 121 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:0:121 2026-02-21T09:20:57.6299303Z and.b32 %r466, %r5135, 136; 2026-02-21T09:20:57.6299494Z xor.b32 %r467, %r466, %r5134; 2026-02-21T09:20:57.6299691Z add.s32 %r21, %r5136, %r467; 2026-02-21T09:20:57.6299949Z add.s32 %r22, %r21, 40960; 2026-02-21T09:20:57.6300127Z add.s32 %r23, %r21, 8192; 2026-02-21T09:20:57.6300308Z add.s32 %r24, %r21, 49152; 2026-02-21T09:20:57.6300480Z add.s32 %r25, %r21, 16384; 2026-02-21T09:20:57.6300659Z add.s32 %r26, %r21, 57344; 2026-02-21T09:20:57.6300830Z add.s32 %r27, %r21, 24576; 2026-02-21T09:20:57.6301009Z add.s32 
%r28, %r21, 65536; 2026-02-21T09:20:57.6301179Z add.s32 %r29, %r21, 32768; 2026-02-21T09:20:57.6301365Z add.s32 %r30, %r21, 73728; 2026-02-21T09:20:57.6301617Z and.b32 %r470, %r5137, 7680; 2026-02-21T09:20:57.6301800Z or.b32 %r473, %r470, %r5138; 2026-02-21T09:20:57.6301980Z or.b32 %r474, %r473, %r5139; 2026-02-21T09:20:57.6302168Z or.b32 %r31, %r474, %r466; 2026-02-21T09:20:57.6302358Z xor.b32 %r32, %r31, 8; 2026-02-21T09:20:57.6302537Z and.b32 %r476, %r5140, 124; 2026-02-21T09:20:57.6302730Z selp.b32 %r480, 1, 0, %p62; 2026-02-21T09:20:57.6302919Z add.s32 %r481, %r5136, 81920; 2026-02-21T09:20:57.6303111Z add.s32 %r482, %r481, %r5141; 2026-02-21T09:20:57.6303295Z add.s32 %r483, %r482, %r480; 2026-02-21T09:20:57.6303485Z add.s32 %r484, %r483, %r5143; 2026-02-21T09:20:57.6303668Z add.s32 %r485, %r484, %r5142; 2026-02-21T09:20:57.6303848Z add.s32 %r33, %r485, %r476; 2026-02-21T09:20:57.6304039Z and.b32 %r487, %r5144, 384; 2026-02-21T09:20:57.6304288Z add.s32 %r488, %r481, %r5142; 2026-02-21T09:20:57.6304481Z add.s32 %r489, %r488, %r487; 2026-02-21T09:20:57.6304659Z add.s32 %r490, %r489, %r476; 2026-02-21T09:20:57.6304845Z add.s32 %r34, %r490, %r5143; 2026-02-21T09:20:57.6305025Z xor.b32 %r494, %r5146, %r5147; 2026-02-21T09:20:57.6305217Z or.b32 %r495, %r494, %r5145; 2026-02-21T09:20:57.6305393Z add.s32 %r35, %r481, %r495; 2026-02-21T09:20:57.6305581Z xor.b32 %r496, %r495, 32; 2026-02-21T09:20:57.6305764Z add.s32 %r36, %r481, %r496; 2026-02-21T09:20:57.6305940Z or.b32 %r499, %r5148, %r5149; 2026-02-21T09:20:57.6306135Z add.s32 %r500, %r5136, 90112; 2026-02-21T09:20:57.6306316Z add.s32 %r1079, %r500, %r499; 2026-02-21T09:20:57.6306642Z and.b32 %r501, %r5137, 112; 2026-02-21T09:20:57.6306828Z or.b32 %r503, %r5148, %r5150; 2026-02-21T09:20:57.6307012Z and.b32 %r504, %r503, 1920; 2026-02-21T09:20:57.6307187Z and.b32 %r506, %r5151, 2048; 2026-02-21T09:20:57.6307367Z add.s32 %r507, %r500, %r501; 2026-02-21T09:20:57.6307551Z add.s32 %r508, %r507, %r506; 
2026-02-21T09:20:57.6307726Z add.s32 %r578, %r508, %r504; 2026-02-21T09:20:57.6307908Z bfe.u32 %r509, %r481, 4, 14; 2026-02-21T09:20:57.6308088Z cvt.u64.u32 %rd30, %r509; 2026-02-21T09:20:57.6308369Z or.b64 %rd1, %rd30, -9223371899382267904; 2026-02-21T09:20:57.6308608Z add.s32 %r510, %r5136, 81952; 2026-02-21T09:20:57.6308795Z bfe.u32 %r511, %r510, 4, 14; 2026-02-21T09:20:57.6308975Z cvt.u64.u32 %rd31, %r511; 2026-02-21T09:20:57.6309170Z or.b64 %rd2, %rd31, -9223371899382267904; 2026-02-21T09:20:57.6309379Z add.s32 %r512, %r5136, 86016; 2026-02-21T09:20:57.6309563Z bfe.u32 %r513, %r512, 4, 14; 2026-02-21T09:20:57.6309749Z cvt.u64.u32 %rd32, %r513; 2026-02-21T09:20:57.6309930Z or.b64 %rd3, %rd32, -9223371899382267904; 2026-02-21T09:20:57.6310140Z add.s32 %r514, %r5136, 86048; 2026-02-21T09:20:57.6310314Z bfe.u32 %r515, %r514, 4, 14; 2026-02-21T09:20:57.6310496Z cvt.u64.u32 %rd33, %r515; 2026-02-21T09:20:57.6310674Z or.b64 %rd4, %rd33, -9223371899382267904; 2026-02-21T09:20:57.6310896Z and.b32 %r518, %r5153, 15456; 2026-02-21T09:20:57.6311079Z shl.b32 %r520, %r5154, 4; 2026-02-21T09:20:57.6311264Z and.b32 %r521, %r5140, 16; 2026-02-21T09:20:57.6311449Z selp.b32 %r522, 64, 0, %p62; 2026-02-21T09:20:57.6311633Z or.b32 %r523, %r5152, %r521; 2026-02-21T09:20:57.6311818Z or.b32 %r524, %r518, %r520; 2026-02-21T09:20:57.6312088Z xor.b32 %r525, %r524, %r522; 2026-02-21T09:20:57.6312273Z or.b32 %r526, %r525, %r523; 2026-02-21T09:20:57.6312450Z add.s32 %r39, %r5136, %r526; 2026-02-21T09:20:57.6312632Z xor.b32 %r527, %r526, 32; 2026-02-21T09:20:57.6312805Z add.s32 %r40, %r5136, %r527; 2026-02-21T09:20:57.6313058Z shl.b32 %r528, %r5154, 11; 2026-02-21T09:20:57.6313234Z or.b32 %r531, %r528, %r5155; 2026-02-21T09:20:57.6313419Z xor.b32 %r532, %r531, %r5156; 2026-02-21T09:20:57.6313612Z add.s32 %r1380, %r5136, %r532; 2026-02-21T09:20:57.6313791Z add.s32 %r1385, %r1380, 4096; 2026-02-21T09:20:57.6313974Z add.s32 %r1390, %r1380, 8192; 2026-02-21T09:20:57.6314159Z add.s32 %r1395, 
%r1380, 12288; 2026-02-21T09:20:57.6314497Z .loc 1 43 78 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:43:78 2026-02-21T09:20:57.6314948Z or.b32 %r534, %r5157, %r19; 2026-02-21T09:20:57.6315139Z or.b32 %r45, %r534, 176; 2026-02-21T09:20:57.6315311Z or.b32 %r46, %r5158, %r9; 2026-02-21T09:20:57.6315542Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:20:57.6315831Z // Child Loop BB0_3 Depth 2 2026-02-21T09:20:57.6316107Z // Child Loop BB0_5 Depth 2 2026-02-21T09:20:57.6316372Z // Child Loop BB0_7 Depth 2 2026-02-21T09:20:57.6316771Z // Child Loop BB0_9 Depth 2 2026-02-21T09:20:57.6317170Z .loc 1 28 35 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:28:35 2026-02-21T09:20:57.6317608Z shr.s32 %r559, %r5159, 31; 2026-02-21T09:20:57.6317796Z shr.u32 %r560, %r559, 22; 2026-02-21T09:20:57.6317964Z add.s32 %r561, %r5159, %r560; 2026-02-21T09:20:57.6318156Z shr.s32 %r562, %r561, 10; 2026-02-21T09:20:57.6318467Z .loc 1 29 33 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:29:33 2026-02-21T09:20:57.6318825Z shl.b32 %r563, %r562, 4; 2026-02-21T09:20:57.6319142Z .loc 1 30 39 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:30:39 2026-02-21T09:20:57.6319489Z sub.s32 %r564, 64, %r563; 2026-02-21T09:20:57.6319811Z .loc 1 30 52 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:30:52 2026-02-21T09:20:57.6320159Z min.s32 %r565, %r564, 16; 2026-02-21T09:20:57.6320485Z .loc 1 31 45 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:31:45 2026-02-21T09:20:57.6320831Z and.b32 %r566, %r561, -1024; 2026-02-21T09:20:57.6321017Z sub.s32 %r567, %r5159, %r566; 2026-02-21T09:20:57.6321336Z .loc 1 32 51 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:32:51 2026-02-21T09:20:57.6321684Z div.s32 %r568, %r567, %r565; 2026-02-21T09:20:57.6322004Z .loc 1 31 64 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:31:64 2026-02-21T09:20:57.6322356Z mul.lo.s32 %r569, %r568, %r565; 2026-02-21T09:20:57.6322553Z sub.s32 
%r570, %r567, %r569; 2026-02-21T09:20:57.6322860Z .loc 1 31 30 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:31:30 2026-02-21T09:20:57.6323227Z add.s32 %r571, %r570, %r563; 2026-02-21T09:20:57.6323543Z .loc 1 33 27 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:33:27 2026-02-21T09:20:57.6323888Z shl.b32 %r77, %r571, 7; 2026-02-21T09:20:57.6324199Z .loc 1 35 27 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:35:27 2026-02-21T09:20:57.6324548Z shl.b32 %r78, %r568, 8; 2026-02-21T09:20:57.6324854Z .loc 1 36 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:36:32 2026-02-21T09:20:57.6325203Z or.b32 %r572, %r78, %r11; 2026-02-21T09:20:57.6325518Z .loc 1 51 53 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:53 2026-02-21T09:20:57.6325869Z shl.b32 %r573, %r572, 10; 2026-02-21T09:20:57.6326270Z .loc 1 51 60 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:60 2026-02-21T09:20:57.6326744Z or.b32 %r574, %r573, %r19; 2026-02-21T09:20:57.6327052Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6327508Z mad.wide.s32 %rd34, %r574, 2, %rd27; 2026-02-21T09:20:57.6327847Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6328200Z bar.sync 0; 2026-02-21T09:20:57.6328360Z mov.b32 %r537, 8; 2026-02-21T09:20:57.6328519Z // begin inline asm 2026-02-21T09:20:57.6328765Z cp.async.ca.shared.global [ %r21 + 0 ], [ %rd34 + 0 ], 0x8, %r537; 2026-02-21T09:20:57.6329045Z // end inline asm 2026-02-21T09:20:57.6329299Z cp.async.commit_group; 2026-02-21T09:20:57.6329623Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6329987Z cvt.s64.s32 %rd45, %r573; 2026-02-21T09:20:57.6330174Z cvt.u64.u32 %rd10, %r19; 2026-02-21T09:20:57.6330349Z or.b64 %rd46, %rd45, %rd10; 2026-02-21T09:20:57.6330535Z shl.b64 %rd47, %rd46, 1; 2026-02-21T09:20:57.6330711Z add.s64 %rd48, %rd27, %rd47; 
2026-02-21T09:20:57.6330898Z add.s64 %rd35, %rd48, 32; 2026-02-21T09:20:57.6331217Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6331570Z // begin inline asm 2026-02-21T09:20:57.6331807Z cp.async.ca.shared.global [ %r22 + 0 ], [ %rd35 + 0 ], 0x8, %r537; 2026-02-21T09:20:57.6332147Z // end inline asm 2026-02-21T09:20:57.6332317Z cp.async.commit_group; 2026-02-21T09:20:57.6332628Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6332979Z add.s64 %rd36, %rd48, 64; 2026-02-21T09:20:57.6333296Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6333636Z bar.sync 0; 2026-02-21T09:20:57.6333789Z // begin inline asm 2026-02-21T09:20:57.6334014Z cp.async.ca.shared.global [ %r23 + 0 ], [ %rd36 + 0 ], 0x8, %r537; 2026-02-21T09:20:57.6334302Z // end inline asm 2026-02-21T09:20:57.6334460Z cp.async.commit_group; 2026-02-21T09:20:57.6334772Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6335120Z add.s64 %rd37, %rd48, 96; 2026-02-21T09:20:57.6335436Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6335784Z // begin inline asm 2026-02-21T09:20:57.6336006Z cp.async.ca.shared.global [ %r24 + 0 ], [ %rd37 + 0 ], 0x8, %r537; 2026-02-21T09:20:57.6336283Z // end inline asm 2026-02-21T09:20:57.6336443Z cp.async.commit_group; 2026-02-21T09:20:57.6336881Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6337232Z add.s64 %rd38, %rd48, 128; 2026-02-21T09:20:57.6337552Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6337894Z bar.sync 0; 2026-02-21T09:20:57.6338039Z // begin inline asm 2026-02-21T09:20:57.6338267Z cp.async.ca.shared.global [ %r25 + 0 ], [ %rd38 + 0 ], 0x8, %r537; 2026-02-21T09:20:57.6338537Z // end inline asm 
2026-02-21T09:20:57.6338702Z cp.async.commit_group; 2026-02-21T09:20:57.6339004Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6339355Z add.s64 %rd39, %rd48, 160; 2026-02-21T09:20:57.6339687Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6340038Z // begin inline asm 2026-02-21T09:20:57.6340271Z cp.async.ca.shared.global [ %r26 + 0 ], [ %rd39 + 0 ], 0x8, %r537; 2026-02-21T09:20:57.6340540Z // end inline asm 2026-02-21T09:20:57.6340701Z cp.async.commit_group; 2026-02-21T09:20:57.6341006Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6341458Z add.s64 %rd40, %rd48, 192; 2026-02-21T09:20:57.6341767Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6342183Z bar.sync 0; 2026-02-21T09:20:57.6342334Z // begin inline asm 2026-02-21T09:20:57.6342576Z cp.async.ca.shared.global [ %r27 + 0 ], [ %rd40 + 0 ], 0x8, %r537; 2026-02-21T09:20:57.6342850Z // end inline asm 2026-02-21T09:20:57.6343008Z cp.async.commit_group; 2026-02-21T09:20:57.6343322Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6343669Z add.s64 %rd41, %rd48, 224; 2026-02-21T09:20:57.6344051Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6344394Z // begin inline asm 2026-02-21T09:20:57.6344625Z cp.async.ca.shared.global [ %r28 + 0 ], [ %rd41 + 0 ], 0x8, %r537; 2026-02-21T09:20:57.6344898Z // end inline asm 2026-02-21T09:20:57.6345055Z cp.async.commit_group; 2026-02-21T09:20:57.6345366Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6345716Z add.s64 %rd42, %rd48, 256; 2026-02-21T09:20:57.6346029Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6346367Z bar.sync 0; 2026-02-21T09:20:57.6346643Z // begin inline asm 
2026-02-21T09:20:57.6346963Z cp.async.ca.shared.global [ %r29 + 0 ], [ %rd42 + 0 ], 0x8, %r537; 2026-02-21T09:20:57.6347243Z // end inline asm 2026-02-21T09:20:57.6347405Z cp.async.commit_group; 2026-02-21T09:20:57.6347723Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6348084Z add.s64 %rd43, %rd48, 288; 2026-02-21T09:20:57.6348490Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6348848Z // begin inline asm 2026-02-21T09:20:57.6349079Z cp.async.ca.shared.global [ %r30 + 0 ], [ %rd43 + 0 ], 0x8, %r537; 2026-02-21T09:20:57.6349359Z // end inline asm 2026-02-21T09:20:57.6349518Z cp.async.commit_group; 2026-02-21T09:20:57.6349828Z .loc 1 43 78 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:43:78 2026-02-21T09:20:57.6350179Z shl.b32 %r575, %r568, 18; 2026-02-21T09:20:57.6350359Z or.b32 %r5161, %r45, %r575; 2026-02-21T09:20:57.6350542Z add.s32 %r5160, %r46, %r77; 2026-02-21T09:20:57.6350718Z mov.b32 %r1210, 0f00000000; 2026-02-21T09:20:57.6350891Z mov.b32 %r5163, 4; 2026-02-21T09:20:57.6351048Z mov.b32 %r5162, -1; 2026-02-21T09:20:57.6351214Z mov.b64 %rd219, -16; 2026-02-21T09:20:57.6351393Z mov.b32 %r1211, %r1210; 2026-02-21T09:20:57.6351562Z mov.b32 %r1212, %r1210; 2026-02-21T09:20:57.6351732Z mov.b32 %r1213, %r1210; 2026-02-21T09:20:57.6351892Z mov.b32 %r1214, %r1210; 2026-02-21T09:20:57.6352058Z mov.b32 %r1215, %r1210; 2026-02-21T09:20:57.6352218Z mov.b32 %r1216, %r1210; 2026-02-21T09:20:57.6352381Z mov.b32 %r1217, %r1210; 2026-02-21T09:20:57.6352540Z mov.b32 %r1218, %r1210; 2026-02-21T09:20:57.6352720Z mov.b32 %r1219, %r1210; 2026-02-21T09:20:57.6352886Z mov.b32 %r1220, %r1210; 2026-02-21T09:20:57.6353055Z mov.b32 %r1221, %r1210; 2026-02-21T09:20:57.6353221Z mov.b32 %r1222, %r1210; 2026-02-21T09:20:57.6353381Z mov.b32 %r1223, %r1210; 2026-02-21T09:20:57.6353550Z mov.b32 %r1224, %r1210; 2026-02-21T09:20:57.6353707Z mov.b32 %r1225, %r1210; 
2026-02-21T09:20:57.6353874Z mov.b32 %r1226, %r1210; 2026-02-21T09:20:57.6354036Z mov.b32 %r1227, %r1210; 2026-02-21T09:20:57.6354201Z mov.b32 %r1228, %r1210; 2026-02-21T09:20:57.6354359Z mov.b32 %r1229, %r1210; 2026-02-21T09:20:57.6354524Z mov.b32 %r1230, %r1210; 2026-02-21T09:20:57.6354684Z mov.b32 %r1231, %r1210; 2026-02-21T09:20:57.6354851Z mov.b32 %r1232, %r1210; 2026-02-21T09:20:57.6355026Z mov.b32 %r1233, %r1210; 2026-02-21T09:20:57.6355284Z mov.b32 %r1234, %r1210; 2026-02-21T09:20:57.6355450Z mov.b32 %r1235, %r1210; 2026-02-21T09:20:57.6355610Z mov.b32 %r1236, %r1210; 2026-02-21T09:20:57.6355779Z mov.b32 %r1237, %r1210; 2026-02-21T09:20:57.6355943Z mov.b32 %r1238, %r1210; 2026-02-21T09:20:57.6356179Z mov.b32 %r1239, %r1210; 2026-02-21T09:20:57.6356337Z mov.b32 %r1240, %r1210; 2026-02-21T09:20:57.6356612Z mov.b32 %r1241, %r1210; 2026-02-21T09:20:57.6356833Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T09:20:57.6357138Z // => This Inner Loop Header: Depth=2 2026-02-21T09:20:57.6357406Z add.s64 %rd219, %rd219, 16; 2026-02-21T09:20:57.6357599Z setp.lt.u64 %p10, %rd219, 432; 2026-02-21T09:20:57.6357884Z add.s32 %r1352, %r5162, 1; 2026-02-21T09:20:57.6358070Z setp.gt.s32 %p11, %r1352, 4; 2026-02-21T09:20:57.6358270Z selp.b32 %r5162, 0, %r1352, %p11; 2026-02-21T09:20:57.6358609Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6358976Z cp.async.wait_group 8; 2026-02-21T09:20:57.6359152Z bar.sync 0; 2026-02-21T09:20:57.6359310Z shl.b32 %r1353, %r5162, 13; 2026-02-21T09:20:57.6359495Z add.s32 %r1355, %r5136, %r1353; 2026-02-21T09:20:57.6359821Z .loc 1 55 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:55:32 2026-02-21T09:20:57.6360177Z add.s32 %r1356, %r1355, %r31; 2026-02-21T09:20:57.6360362Z ld.shared.b16 %rs3, [%r1356]; 2026-02-21T09:20:57.6360621Z ld.shared.b16 %rs4, [%r1356+256]; 2026-02-21T09:20:57.6360827Z ld.shared.b16 %rs5, [%r1356+16]; 2026-02-21T09:20:57.6361039Z ld.shared.b16 %rs6, 
[%r1356+272]; 2026-02-21T09:20:57.6361231Z add.s32 %r1357, %r1355, %r32; 2026-02-21T09:20:57.6361416Z ld.shared.b16 %rs7, [%r1357]; 2026-02-21T09:20:57.6361607Z ld.shared.b16 %rs8, [%r1357+256]; 2026-02-21T09:20:57.6361795Z ld.shared.b16 %rs9, [%r1357+16]; 2026-02-21T09:20:57.6361992Z ld.shared.b16 %rs10, [%r1357+272]; 2026-02-21T09:20:57.6362191Z cvt.f32.bf16 %r872, %rs3; 2026-02-21T09:20:57.6362370Z cvt.f32.bf16 %r873, %rs4; 2026-02-21T09:20:57.6362549Z cvt.f32.bf16 %r874, %rs7; 2026-02-21T09:20:57.6362723Z cvt.f32.bf16 %r875, %rs8; 2026-02-21T09:20:57.6362892Z cvt.f32.bf16 %r940, %rs5; 2026-02-21T09:20:57.6363067Z cvt.f32.bf16 %r941, %rs6; 2026-02-21T09:20:57.6363243Z cvt.f32.bf16 %r942, %rs9; 2026-02-21T09:20:57.6363413Z cvt.f32.bf16 %r943, %rs10; 2026-02-21T09:20:57.6363733Z .loc 1 57 34 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:57:34 2026-02-21T09:20:57.6364083Z cvt.s64.s32 %rd63, %r5160; 2026-02-21T09:20:57.6364264Z add.s64 %rd50, %rd28, %rd63; 2026-02-21T09:20:57.6364583Z .loc 1 57 87 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:57:87 2026-02-21T09:20:57.6364933Z // begin inline asm 2026-02-21T09:20:57.6365086Z mov.u64 %rd49, 0x0; 2026-02-21T09:20:57.6365335Z createpolicy.fractional.L2::evict_first.b64 %rd49, 1.0; 2026-02-21T09:20:57.6365593Z // end inline asm 2026-02-21T09:20:57.6365748Z // begin inline asm 2026-02-21T09:20:57.6365924Z mov.u16 %rs1, 0x0; 2026-02-21T09:20:57.6366179Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs1 }, [ %rd50 + 0 ], %rd49; 2026-02-21T09:20:57.6366602Z // end inline asm 2026-02-21T09:20:57.6366906Z .loc 1 65 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:65:28 2026-02-21T09:20:57.6367269Z st.shared.b8 [%r33], %rs1; 2026-02-21T09:20:57.6367449Z bar.sync 0; 2026-02-21T09:20:57.6367627Z ld.shared.v2.b8 {%rs11, %rs12}, [%r34]; 2026-02-21T09:20:57.6367983Z .loc 1 60 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:60:28 2026-02-21T09:20:57.6368340Z shl.b16 %rs13, %rs11, 
4; 2026-02-21T09:20:57.6368526Z shl.b16 %rs14, %rs12, 4; 2026-02-21T09:20:57.6368840Z .loc 1 75 58 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:75:58 2026-02-21T09:20:57.6369313Z selp.b16 %rs15, %rs13, %rs11, %p63; 2026-02-21T09:20:57.6369517Z cvt.s16.s8 %rs16, %rs15; 2026-02-21T09:20:57.6369698Z shr.s16 %rs17, %rs16, 4; 2026-02-21T09:20:57.6369883Z selp.b16 %rs18, %rs14, %rs12, %p63; 2026-02-21T09:20:57.6370079Z cvt.s16.s8 %rs19, %rs18; 2026-02-21T09:20:57.6370321Z shr.s16 %rs20, %rs19, 4; 2026-02-21T09:20:57.6370632Z .loc 1 80 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:80:32 2026-02-21T09:20:57.6370986Z cvt.rn.f32.s16 %r1358, %rs17; 2026-02-21T09:20:57.6371172Z cvt.rn.f32.s16 %r1359, %rs20; 2026-02-21T09:20:57.6371358Z bar.sync 0; 2026-02-21T09:20:57.6371509Z st.shared.b32 [%r35], %r1358; 2026-02-21T09:20:57.6371691Z st.shared.b32 [%r36], %r1359; 2026-02-21T09:20:57.6372020Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1210}; 2026-02-21T09:20:57.6372312Z bar.sync 0; 2026-02-21T09:20:57.6372463Z // begin inline asm 2026-02-21T09:20:57.6372708Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r740, %r876}, [%r578]; 2026-02-21T09:20:57.6372997Z // end inline asm 2026-02-21T09:20:57.6373145Z bar.sync 0; 2026-02-21T09:20:57.6373359Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1212}; 2026-02-21T09:20:57.6373623Z bar.sync 0; 2026-02-21T09:20:57.6373766Z // begin inline asm 2026-02-21T09:20:57.6373997Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r742, %r878}, [%r578]; 2026-02-21T09:20:57.6374275Z // end inline asm 2026-02-21T09:20:57.6374422Z bar.sync 0; 2026-02-21T09:20:57.6374629Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1211}; 2026-02-21T09:20:57.6374968Z bar.sync 0; 2026-02-21T09:20:57.6375111Z // begin inline asm 2026-02-21T09:20:57.6375344Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r741, %r877}, [%r578]; 2026-02-21T09:20:57.6375633Z // end inline asm 2026-02-21T09:20:57.6375783Z bar.sync 0; 
2026-02-21T09:20:57.6375989Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1213}; 2026-02-21T09:20:57.6376251Z bar.sync 0; 2026-02-21T09:20:57.6376392Z // begin inline asm 2026-02-21T09:20:57.6376742Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r743, %r879}, [%r578]; 2026-02-21T09:20:57.6377036Z // end inline asm 2026-02-21T09:20:57.6377184Z bar.sync 0; 2026-02-21T09:20:57.6377400Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1214}; 2026-02-21T09:20:57.6377666Z bar.sync 0; 2026-02-21T09:20:57.6377813Z // begin inline asm 2026-02-21T09:20:57.6378045Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r744, %r880}, [%r578]; 2026-02-21T09:20:57.6378331Z // end inline asm 2026-02-21T09:20:57.6378475Z bar.sync 0; 2026-02-21T09:20:57.6378695Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1216}; 2026-02-21T09:20:57.6378966Z bar.sync 0; 2026-02-21T09:20:57.6379113Z // begin inline asm 2026-02-21T09:20:57.6379354Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r746, %r882}, [%r578]; 2026-02-21T09:20:57.6379627Z // end inline asm 2026-02-21T09:20:57.6379791Z bar.sync 0; 2026-02-21T09:20:57.6380003Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1215}; 2026-02-21T09:20:57.6380269Z bar.sync 0; 2026-02-21T09:20:57.6380409Z // begin inline asm 2026-02-21T09:20:57.6380645Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r745, %r881}, [%r578]; 2026-02-21T09:20:57.6380928Z // end inline asm 2026-02-21T09:20:57.6381073Z bar.sync 0; 2026-02-21T09:20:57.6381286Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1217}; 2026-02-21T09:20:57.6381554Z bar.sync 0; 2026-02-21T09:20:57.6381698Z // begin inline asm 2026-02-21T09:20:57.6381930Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r747, %r883}, [%r578]; 2026-02-21T09:20:57.6382214Z // end inline asm 2026-02-21T09:20:57.6382358Z bar.sync 0; 2026-02-21T09:20:57.6382572Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1218}; 2026-02-21T09:20:57.6382839Z bar.sync 0; 2026-02-21T09:20:57.6382980Z // begin inline asm 
2026-02-21T09:20:57.6383217Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r748, %r884}, [%r578]; 2026-02-21T09:20:57.6383494Z // end inline asm 2026-02-21T09:20:57.6383746Z bar.sync 0; 2026-02-21T09:20:57.6383956Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1220}; 2026-02-21T09:20:57.6384223Z bar.sync 0; 2026-02-21T09:20:57.6384363Z // begin inline asm 2026-02-21T09:20:57.6384602Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r750, %r886}, [%r578]; 2026-02-21T09:20:57.6384963Z // end inline asm 2026-02-21T09:20:57.6385110Z bar.sync 0; 2026-02-21T09:20:57.6385324Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1219}; 2026-02-21T09:20:57.6385586Z bar.sync 0; 2026-02-21T09:20:57.6385731Z // begin inline asm 2026-02-21T09:20:57.6385964Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r749, %r885}, [%r578]; 2026-02-21T09:20:57.6386246Z // end inline asm 2026-02-21T09:20:57.6386388Z bar.sync 0; 2026-02-21T09:20:57.6386817Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1221}; 2026-02-21T09:20:57.6387083Z bar.sync 0; 2026-02-21T09:20:57.6387230Z // begin inline asm 2026-02-21T09:20:57.6387463Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r751, %r887}, [%r578]; 2026-02-21T09:20:57.6387751Z // end inline asm 2026-02-21T09:20:57.6387903Z bar.sync 0; 2026-02-21T09:20:57.6388113Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1222}; 2026-02-21T09:20:57.6388452Z bar.sync 0; 2026-02-21T09:20:57.6388594Z // begin inline asm 2026-02-21T09:20:57.6388836Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r752, %r888}, [%r578]; 2026-02-21T09:20:57.6389112Z // end inline asm 2026-02-21T09:20:57.6389264Z bar.sync 0; 2026-02-21T09:20:57.6389473Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1224}; 2026-02-21T09:20:57.6389821Z bar.sync 0; 2026-02-21T09:20:57.6389976Z // begin inline asm 2026-02-21T09:20:57.6390209Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r754, %r890}, [%r578]; 2026-02-21T09:20:57.6390493Z // end inline asm 2026-02-21T09:20:57.6390638Z bar.sync 0; 
2026-02-21T09:20:57.6390877Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1223}; 2026-02-21T09:20:57.6391150Z bar.sync 0; 2026-02-21T09:20:57.6391305Z // begin inline asm 2026-02-21T09:20:57.6391552Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r753, %r889}, [%r578]; 2026-02-21T09:20:57.6391858Z // end inline asm 2026-02-21T09:20:57.6392019Z bar.sync 0; 2026-02-21T09:20:57.6392233Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1225}; 2026-02-21T09:20:57.6392511Z bar.sync 0; 2026-02-21T09:20:57.6392653Z // begin inline asm 2026-02-21T09:20:57.6392899Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r755, %r891}, [%r578]; 2026-02-21T09:20:57.6393185Z // end inline asm 2026-02-21T09:20:57.6393337Z bar.sync 0; 2026-02-21T09:20:57.6393556Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1226}; 2026-02-21T09:20:57.6393828Z bar.sync 0; 2026-02-21T09:20:57.6393968Z // begin inline asm 2026-02-21T09:20:57.6394218Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r756, %r892}, [%r578]; 2026-02-21T09:20:57.6394506Z // end inline asm 2026-02-21T09:20:57.6394652Z bar.sync 0; 2026-02-21T09:20:57.6394872Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1228}; 2026-02-21T09:20:57.6395136Z bar.sync 0; 2026-02-21T09:20:57.6395292Z // begin inline asm 2026-02-21T09:20:57.6395535Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r758, %r894}, [%r578]; 2026-02-21T09:20:57.6395816Z // end inline asm 2026-02-21T09:20:57.6395963Z bar.sync 0; 2026-02-21T09:20:57.6396180Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1227}; 2026-02-21T09:20:57.6396579Z bar.sync 0; 2026-02-21T09:20:57.6396726Z // begin inline asm 2026-02-21T09:20:57.6396982Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r757, %r893}, [%r578]; 2026-02-21T09:20:57.6397265Z // end inline asm 2026-02-21T09:20:57.6397416Z bar.sync 0; 2026-02-21T09:20:57.6397626Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1229}; 2026-02-21T09:20:57.6397900Z bar.sync 0; 2026-02-21T09:20:57.6398043Z // begin inline asm 
2026-02-21T09:20:57.6398282Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r759, %r895}, [%r578]; 2026-02-21T09:20:57.6398560Z // end inline asm 2026-02-21T09:20:57.6398710Z bar.sync 0; 2026-02-21T09:20:57.6399030Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1230}; 2026-02-21T09:20:57.6399295Z bar.sync 0; 2026-02-21T09:20:57.6399443Z // begin inline asm 2026-02-21T09:20:57.6399679Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r760, %r896}, [%r578]; 2026-02-21T09:20:57.6400037Z // end inline asm 2026-02-21T09:20:57.6400185Z bar.sync 0; 2026-02-21T09:20:57.6400405Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1232}; 2026-02-21T09:20:57.6400673Z bar.sync 0; 2026-02-21T09:20:57.6400825Z // begin inline asm 2026-02-21T09:20:57.6401069Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r762, %r898}, [%r578]; 2026-02-21T09:20:57.6401345Z // end inline asm 2026-02-21T09:20:57.6401501Z bar.sync 0; 2026-02-21T09:20:57.6401782Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1231}; 2026-02-21T09:20:57.6402057Z bar.sync 0; 2026-02-21T09:20:57.6402199Z // begin inline asm 2026-02-21T09:20:57.6402437Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r761, %r897}, [%r578]; 2026-02-21T09:20:57.6402716Z // end inline asm 2026-02-21T09:20:57.6402864Z bar.sync 0; 2026-02-21T09:20:57.6403072Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1233}; 2026-02-21T09:20:57.6403339Z bar.sync 0; 2026-02-21T09:20:57.6403489Z // begin inline asm 2026-02-21T09:20:57.6403735Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r763, %r899}, [%r578]; 2026-02-21T09:20:57.6404018Z // end inline asm 2026-02-21T09:20:57.6404161Z bar.sync 0; 2026-02-21T09:20:57.6404374Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1234}; 2026-02-21T09:20:57.6404640Z bar.sync 0; 2026-02-21T09:20:57.6404861Z // begin inline asm 2026-02-21T09:20:57.6405098Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r764, %r900}, [%r578]; 2026-02-21T09:20:57.6405376Z // end inline asm 2026-02-21T09:20:57.6405525Z bar.sync 0; 
2026-02-21T09:20:57.6405734Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1236}; 2026-02-21T09:20:57.6406019Z bar.sync 0; 2026-02-21T09:20:57.6406159Z // begin inline asm 2026-02-21T09:20:57.6406399Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r766, %r902}, [%r578]; 2026-02-21T09:20:57.6406798Z // end inline asm 2026-02-21T09:20:57.6406948Z bar.sync 0; 2026-02-21T09:20:57.6407156Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1235}; 2026-02-21T09:20:57.6407426Z bar.sync 0; 2026-02-21T09:20:57.6407568Z // begin inline asm 2026-02-21T09:20:57.6407809Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r765, %r901}, [%r578]; 2026-02-21T09:20:57.6408107Z // end inline asm 2026-02-21T09:20:57.6408253Z bar.sync 0; 2026-02-21T09:20:57.6408468Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1237}; 2026-02-21T09:20:57.6408746Z bar.sync 0; 2026-02-21T09:20:57.6408895Z // begin inline asm 2026-02-21T09:20:57.6409133Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r767, %r903}, [%r578]; 2026-02-21T09:20:57.6409417Z // end inline asm 2026-02-21T09:20:57.6409559Z bar.sync 0; 2026-02-21T09:20:57.6409774Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1238}; 2026-02-21T09:20:57.6410046Z bar.sync 0; 2026-02-21T09:20:57.6410185Z // begin inline asm 2026-02-21T09:20:57.6410428Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r768, %r904}, [%r578]; 2026-02-21T09:20:57.6410704Z // end inline asm 2026-02-21T09:20:57.6410854Z bar.sync 0; 2026-02-21T09:20:57.6411065Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1240}; 2026-02-21T09:20:57.6411341Z bar.sync 0; 2026-02-21T09:20:57.6411482Z // begin inline asm 2026-02-21T09:20:57.6411723Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r770, %r906}, [%r578]; 2026-02-21T09:20:57.6412000Z // end inline asm 2026-02-21T09:20:57.6412155Z bar.sync 0; 2026-02-21T09:20:57.6412376Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1239}; 2026-02-21T09:20:57.6412639Z bar.sync 0; 2026-02-21T09:20:57.6412791Z // begin inline asm 
2026-02-21T09:20:57.6413030Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r769, %r905}, [%r578]; 2026-02-21T09:20:57.6413314Z // end inline asm 2026-02-21T09:20:57.6413457Z bar.sync 0; 2026-02-21T09:20:57.6413775Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r1241}; 2026-02-21T09:20:57.6414036Z bar.sync 0; 2026-02-21T09:20:57.6414182Z // begin inline asm 2026-02-21T09:20:57.6414421Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r771, %r907}, [%r578]; 2026-02-21T09:20:57.6414765Z // end inline asm 2026-02-21T09:20:57.6414915Z $L__tmp1: 2026-02-21T09:20:57.6415275Z .loc 2 291 36 // standard.py:291:36 @[ ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:87:40 ] 2026-02-21T09:20:57.6415702Z // begin inline asm 2026-02-21T09:20:57.6415879Z fence.proxy.async.shared::cta; 2026-02-21T09:20:57.6416077Z // end inline asm 2026-02-21T09:20:57.6416247Z shfl.sync.idx.b32 %r1360, %r6, 0, 31, -1; 2026-02-21T09:20:57.6416611Z wgmma.fence.sync.aligned; 2026-02-21T09:20:57.6416887Z mov.pred %p3, -1; 2026-02-21T09:20:57.6417048Z // begin inline asm 2026-02-21T09:20:57.6417807Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r740,%r741,%r742,%r743,%r744,%r745,%r746,%r747,%r748,%r749,%r750,%r751,%r752,%r753,%r754,%r755,%r756,%r757,%r758,%r759,%r760,%r761,%r762,%r763,%r764,%r765,%r766,%r767,%r768,%r769,%r770,%r771}, {%r872,%r873,%r874,%r875}, %rd1, %p3, 1, 1; 2026-02-21T09:20:57.6418625Z // end inline asm 2026-02-21T09:20:57.6418785Z // begin inline asm 2026-02-21T09:20:57.6419595Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r740,%r741,%r742,%r743,%r744,%r745,%r746,%r747,%r748,%r749,%r750,%r751,%r752,%r753,%r754,%r755,%r756,%r757,%r758,%r759,%r760,%r761,%r762,%r763,%r764,%r765,%r766,%r767,%r768,%r769,%r770,%r771}, {%r940,%r941,%r942,%r943}, %rd2, %p3, 1, 1; 2026-02-21T09:20:57.6420390Z // end inline asm 2026-02-21T09:20:57.6420544Z // begin inline asm 2026-02-21T09:20:57.6421286Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r876,%r877,%r878,%r879,%r880,%r881,%r882,%r883,%r884,%r885,%r886,%r887,%r888,%r889,%r890,%r891,%r892,%r893,%r894,%r895,%r896,%r897,%r898,%r899,%r900,%r901,%r902,%r903,%r904,%r905,%r906,%r907}, {%r872,%r873,%r874,%r875}, %rd3, %p3, 1, 1; 2026-02-21T09:20:57.6422069Z // end inline asm 2026-02-21T09:20:57.6422224Z // begin inline asm 2026-02-21T09:20:57.6422959Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r876,%r877,%r878,%r879,%r880,%r881,%r882,%r883,%r884,%r885,%r886,%r887,%r888,%r889,%r890,%r891,%r892,%r893,%r894,%r895,%r896,%r897,%r898,%r899,%r900,%r901,%r902,%r903,%r904,%r905,%r906,%r907}, {%r940,%r941,%r942,%r943}, %rd4, %p3, 1, 1; 2026-02-21T09:20:57.6423750Z // end inline asm 2026-02-21T09:20:57.6423919Z wgmma.commit_group.sync.aligned; 2026-02-21T09:20:57.6424121Z mov.b32 %r1312, 0; 2026-02-21T09:20:57.6424284Z mov.b32 %r1008, %r481; 2026-02-21T09:20:57.6424463Z mov.b32 %r1009, %r1312; 2026-02-21T09:20:57.6424641Z mov.b32 %r1010, %r1312; 2026-02-21T09:20:57.6424804Z // begin inline asm 2026-02-21T09:20:57.6425787Z // wait for regs: %r740,%r741,%r742,%r743,%r744,%r745,%r746,%r747,%r748,%r749,%r750,%r751,%r752,%r753,%r754,%r755,%r756,%r757,%r758,%r759,%r760,%r761,%r762,%r763,%r764,%r765,%r766,%r767,%r768,%r769,%r770,%r771,%r876,%r877,%r878,%r879,%r880,%r881,%r882,%r883,%r884,%r885,%r886,%r887,%r888,%r889,%r890,%r891,%r892,%r893,%r894,%r895,%r896,%r897,%r898,%r899,%r900,%r901,%r902,%r903,%r904,%r905,%r906,%r907,%r1008,%r1009,%r1010 2026-02-21T09:20:57.6426969Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:20:57.6427174Z // end inline asm 2026-02-21T09:20:57.6427331Z $L__tmp2: 2026-02-21T09:20:57.6427627Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6427998Z add.s32 %r1361, %r1355, 40960; 2026-02-21T09:20:57.6428378Z .loc 1 55 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:55:32 2026-02-21T09:20:57.6428742Z add.s32 %r1362, %r1361, %r31; 2026-02-21T09:20:57.6428932Z ld.shared.b16 
%rs21, [%r1362]; 2026-02-21T09:20:57.6429139Z ld.shared.b16 %rs22, [%r1362+256]; 2026-02-21T09:20:57.6429345Z ld.shared.b16 %rs23, [%r1362+16]; 2026-02-21T09:20:57.6429555Z ld.shared.b16 %rs24, [%r1362+272]; 2026-02-21T09:20:57.6429844Z add.s32 %r1363, %r1361, %r32; 2026-02-21T09:20:57.6430028Z ld.shared.b16 %rs25, [%r1363]; 2026-02-21T09:20:57.6430223Z ld.shared.b16 %rs26, [%r1363+256]; 2026-02-21T09:20:57.6430431Z ld.shared.b16 %rs27, [%r1363+16]; 2026-02-21T09:20:57.6430700Z ld.shared.b16 %rs28, [%r1363+272]; 2026-02-21T09:20:57.6430896Z cvt.f32.bf16 %r1206, %rs21; 2026-02-21T09:20:57.6431088Z cvt.f32.bf16 %r1207, %rs22; 2026-02-21T09:20:57.6431269Z cvt.f32.bf16 %r1208, %rs25; 2026-02-21T09:20:57.6431459Z cvt.f32.bf16 %r1209, %rs26; 2026-02-21T09:20:57.6431640Z cvt.f32.bf16 %r1274, %rs23; 2026-02-21T09:20:57.6431815Z cvt.f32.bf16 %r1275, %rs24; 2026-02-21T09:20:57.6431996Z cvt.f32.bf16 %r1276, %rs27; 2026-02-21T09:20:57.6432170Z cvt.f32.bf16 %r1277, %rs28; 2026-02-21T09:20:57.6432566Z .loc 1 57 34 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:57:34 2026-02-21T09:20:57.6432924Z add.s32 %r1364, %r5160, 65536; 2026-02-21T09:20:57.6433118Z cvt.s64.s32 %rd64, %r1364; 2026-02-21T09:20:57.6433301Z add.s64 %rd57, %rd28, %rd64; 2026-02-21T09:20:57.6433622Z .loc 1 57 87 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:57:87 2026-02-21T09:20:57.6433973Z // begin inline asm 2026-02-21T09:20:57.6434137Z mov.u64 %rd56, 0x0; 2026-02-21T09:20:57.6434368Z createpolicy.fractional.L2::evict_first.b64 %rd56, 1.0; 2026-02-21T09:20:57.6434627Z // end inline asm 2026-02-21T09:20:57.6434785Z // begin inline asm 2026-02-21T09:20:57.6434940Z mov.u16 %rs2, 0x0; 2026-02-21T09:20:57.6435267Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs2 }, [ %rd57 + 0 ], %rd56; 2026-02-21T09:20:57.6435581Z // end inline asm 2026-02-21T09:20:57.6435890Z .loc 1 65 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:65:28 2026-02-21T09:20:57.6436244Z bar.sync 0; 
2026-02-21T09:20:57.6436398Z st.shared.b8 [%r33], %rs2; 2026-02-21T09:20:57.6436708Z bar.sync 0; 2026-02-21T09:20:57.6436869Z ld.shared.v2.b8 {%rs29, %rs30}, [%r34]; 2026-02-21T09:20:57.6437229Z .loc 1 60 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:60:28 2026-02-21T09:20:57.6437595Z shl.b16 %rs31, %rs29, 4; 2026-02-21T09:20:57.6437780Z shl.b16 %rs32, %rs30, 4; 2026-02-21T09:20:57.6438097Z .loc 1 75 58 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:75:58 2026-02-21T09:20:57.6438456Z selp.b16 %rs33, %rs31, %rs29, %p63; 2026-02-21T09:20:57.6438665Z cvt.s16.s8 %rs34, %rs33; 2026-02-21T09:20:57.6438835Z shr.s16 %rs35, %rs34, 4; 2026-02-21T09:20:57.6439019Z selp.b16 %rs36, %rs32, %rs30, %p63; 2026-02-21T09:20:57.6439215Z cvt.s16.s8 %rs37, %rs36; 2026-02-21T09:20:57.6439390Z shr.s16 %rs38, %rs37, 4; 2026-02-21T09:20:57.6439701Z .loc 1 80 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:80:32 2026-02-21T09:20:57.6440063Z cvt.rn.f32.s16 %r1365, %rs35; 2026-02-21T09:20:57.6440256Z cvt.rn.f32.s16 %r1366, %rs38; 2026-02-21T09:20:57.6440435Z bar.sync 0; 2026-02-21T09:20:57.6440595Z st.shared.b32 [%r35], %r1365; 2026-02-21T09:20:57.6440778Z st.shared.b32 [%r36], %r1366; 2026-02-21T09:20:57.6440961Z $L__tmp3: 2026-02-21T09:20:57.6441314Z .loc 2 291 36 // standard.py:291:36 @[ ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:87:40 ] 2026-02-21T09:20:57.6441834Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r740, %r876}; 2026-02-21T09:20:57.6442122Z bar.sync 0; 2026-02-21T09:20:57.6442274Z // begin inline asm 2026-02-21T09:20:57.6442533Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1210}, [%r1079]; 2026-02-21T09:20:57.6442804Z // end inline asm 2026-02-21T09:20:57.6442956Z bar.sync 0; 2026-02-21T09:20:57.6443182Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r742, %r878}; 2026-02-21T09:20:57.6443465Z bar.sync 0; 2026-02-21T09:20:57.6443607Z // begin inline asm 2026-02-21T09:20:57.6443841Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1212}, [%r1079]; 2026-02-21T09:20:57.6444216Z // end inline asm 2026-02-21T09:20:57.6444370Z bar.sync 0; 2026-02-21T09:20:57.6444593Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r741, %r877}; 2026-02-21T09:20:57.6444876Z bar.sync 0; 2026-02-21T09:20:57.6445021Z // begin inline asm 2026-02-21T09:20:57.6445333Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1211}, [%r1079]; 2026-02-21T09:20:57.6445602Z // end inline asm 2026-02-21T09:20:57.6445744Z bar.sync 0; 2026-02-21T09:20:57.6445972Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r743, %r879}; 2026-02-21T09:20:57.6446249Z bar.sync 0; 2026-02-21T09:20:57.6446397Z // begin inline asm 2026-02-21T09:20:57.6446756Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1213}, [%r1079]; 2026-02-21T09:20:57.6447029Z // end inline asm 2026-02-21T09:20:57.6447182Z bar.sync 0; 2026-02-21T09:20:57.6447484Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r744, %r880}; 2026-02-21T09:20:57.6447767Z bar.sync 0; 2026-02-21T09:20:57.6447907Z // begin inline asm 2026-02-21T09:20:57.6448138Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1214}, [%r1079]; 2026-02-21T09:20:57.6448403Z // end inline asm 2026-02-21T09:20:57.6448552Z bar.sync 0; 2026-02-21T09:20:57.6448776Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r746, %r882}; 2026-02-21T09:20:57.6449057Z bar.sync 0; 2026-02-21T09:20:57.6449199Z // begin inline asm 2026-02-21T09:20:57.6449429Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1216}, [%r1079]; 2026-02-21T09:20:57.6449699Z // end inline asm 2026-02-21T09:20:57.6449843Z bar.sync 0; 2026-02-21T09:20:57.6450157Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r745, %r881}; 2026-02-21T09:20:57.6450439Z bar.sync 0; 2026-02-21T09:20:57.6450586Z // begin inline asm 2026-02-21T09:20:57.6450809Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1215}, [%r1079]; 2026-02-21T09:20:57.6451083Z // end inline asm 2026-02-21T09:20:57.6451227Z bar.sync 0; 2026-02-21T09:20:57.6451462Z 
stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r747, %r883}; 2026-02-21T09:20:57.6451745Z bar.sync 0; 2026-02-21T09:20:57.6451890Z // begin inline asm 2026-02-21T09:20:57.6452122Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1217}, [%r1079]; 2026-02-21T09:20:57.6452391Z // end inline asm 2026-02-21T09:20:57.6452541Z bar.sync 0; 2026-02-21T09:20:57.6452765Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r748, %r884}; 2026-02-21T09:20:57.6453049Z bar.sync 0; 2026-02-21T09:20:57.6453192Z // begin inline asm 2026-02-21T09:20:57.6453425Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1218}, [%r1079]; 2026-02-21T09:20:57.6453715Z // end inline asm 2026-02-21T09:20:57.6453864Z bar.sync 0; 2026-02-21T09:20:57.6454093Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r750, %r886}; 2026-02-21T09:20:57.6454367Z bar.sync 0; 2026-02-21T09:20:57.6454515Z // begin inline asm 2026-02-21T09:20:57.6454739Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1220}, [%r1079]; 2026-02-21T09:20:57.6455008Z // end inline asm 2026-02-21T09:20:57.6455152Z bar.sync 0; 2026-02-21T09:20:57.6455375Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r749, %r885}; 2026-02-21T09:20:57.6455652Z bar.sync 0; 2026-02-21T09:20:57.6455796Z // begin inline asm 2026-02-21T09:20:57.6456025Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1219}, [%r1079]; 2026-02-21T09:20:57.6456291Z // end inline asm 2026-02-21T09:20:57.6456445Z bar.sync 0; 2026-02-21T09:20:57.6456797Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r751, %r887}; 2026-02-21T09:20:57.6457077Z bar.sync 0; 2026-02-21T09:20:57.6457219Z // begin inline asm 2026-02-21T09:20:57.6457451Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1221}, [%r1079]; 2026-02-21T09:20:57.6457718Z // end inline asm 2026-02-21T09:20:57.6457868Z bar.sync 0; 2026-02-21T09:20:57.6458090Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r752, %r888}; 2026-02-21T09:20:57.6458371Z bar.sync 0; 2026-02-21T09:20:57.6458520Z // begin inline asm 2026-02-21T09:20:57.6458743Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1222}, [%r1079]; 2026-02-21T09:20:57.6459019Z // end inline asm 2026-02-21T09:20:57.6459343Z bar.sync 0; 2026-02-21T09:20:57.6459571Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r754, %r890}; 2026-02-21T09:20:57.6459844Z bar.sync 0; 2026-02-21T09:20:57.6459993Z // begin inline asm 2026-02-21T09:20:57.6460219Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1224}, [%r1079]; 2026-02-21T09:20:57.6460554Z // end inline asm 2026-02-21T09:20:57.6460706Z bar.sync 0; 2026-02-21T09:20:57.6460928Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r753, %r889}; 2026-02-21T09:20:57.6461207Z bar.sync 0; 2026-02-21T09:20:57.6461352Z // begin inline asm 2026-02-21T09:20:57.6461583Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1223}, [%r1079]; 2026-02-21T09:20:57.6461848Z // end inline asm 2026-02-21T09:20:57.6462011Z bar.sync 0; 2026-02-21T09:20:57.6462306Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r755, %r891}; 2026-02-21T09:20:57.6462590Z bar.sync 0; 2026-02-21T09:20:57.6462737Z // begin inline asm 2026-02-21T09:20:57.6462962Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1225}, [%r1079]; 2026-02-21T09:20:57.6463238Z // end inline asm 2026-02-21T09:20:57.6463385Z bar.sync 0; 2026-02-21T09:20:57.6463614Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r756, %r892}; 2026-02-21T09:20:57.6463889Z bar.sync 0; 2026-02-21T09:20:57.6464038Z // begin inline asm 2026-02-21T09:20:57.6464262Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1226}, [%r1079]; 2026-02-21T09:20:57.6464534Z // end inline asm 2026-02-21T09:20:57.6464678Z bar.sync 0; 2026-02-21T09:20:57.6464907Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r758, %r894}; 2026-02-21T09:20:57.6465265Z bar.sync 0; 2026-02-21T09:20:57.6465410Z // begin inline asm 2026-02-21T09:20:57.6465656Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1228}, [%r1079]; 2026-02-21T09:20:57.6465927Z // end inline asm 2026-02-21T09:20:57.6466074Z bar.sync 0; 2026-02-21T09:20:57.6466291Z 
stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r757, %r893}; 2026-02-21T09:20:57.6466687Z bar.sync 0; 2026-02-21T09:20:57.6466833Z // begin inline asm 2026-02-21T09:20:57.6467076Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1227}, [%r1079]; 2026-02-21T09:20:57.6467345Z // end inline asm 2026-02-21T09:20:57.6467489Z bar.sync 0; 2026-02-21T09:20:57.6467715Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r759, %r895}; 2026-02-21T09:20:57.6467992Z bar.sync 0; 2026-02-21T09:20:57.6468139Z // begin inline asm 2026-02-21T09:20:57.6468449Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1229}, [%r1079]; 2026-02-21T09:20:57.6468723Z // end inline asm 2026-02-21T09:20:57.6468866Z bar.sync 0; 2026-02-21T09:20:57.6469093Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r760, %r896}; 2026-02-21T09:20:57.6469371Z bar.sync 0; 2026-02-21T09:20:57.6469518Z // begin inline asm 2026-02-21T09:20:57.6469745Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1230}, [%r1079]; 2026-02-21T09:20:57.6470012Z // end inline asm 2026-02-21T09:20:57.6470162Z bar.sync 0; 2026-02-21T09:20:57.6470385Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r762, %r898}; 2026-02-21T09:20:57.6470667Z bar.sync 0; 2026-02-21T09:20:57.6470808Z // begin inline asm 2026-02-21T09:20:57.6471037Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1232}, [%r1079]; 2026-02-21T09:20:57.6471317Z // end inline asm 2026-02-21T09:20:57.6471477Z bar.sync 0; 2026-02-21T09:20:57.6471714Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r761, %r897}; 2026-02-21T09:20:57.6471989Z bar.sync 0; 2026-02-21T09:20:57.6472137Z // begin inline asm 2026-02-21T09:20:57.6472360Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1231}, [%r1079]; 2026-02-21T09:20:57.6472633Z // end inline asm 2026-02-21T09:20:57.6472778Z bar.sync 0; 2026-02-21T09:20:57.6473004Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r763, %r899}; 2026-02-21T09:20:57.6473280Z bar.sync 0; 2026-02-21T09:20:57.6473429Z // begin inline asm 2026-02-21T09:20:57.6473654Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1233}, [%r1079]; 2026-02-21T09:20:57.6473945Z // end inline asm 2026-02-21T09:20:57.6474182Z bar.sync 0; 2026-02-21T09:20:57.6474402Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r764, %r900}; 2026-02-21T09:20:57.6474688Z bar.sync 0; 2026-02-21T09:20:57.6474829Z // begin inline asm 2026-02-21T09:20:57.6475062Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1234}, [%r1079]; 2026-02-21T09:20:57.6475400Z // end inline asm 2026-02-21T09:20:57.6475557Z bar.sync 0; 2026-02-21T09:20:57.6475778Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r766, %r902}; 2026-02-21T09:20:57.6476062Z bar.sync 0; 2026-02-21T09:20:57.6476210Z // begin inline asm 2026-02-21T09:20:57.6476587Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1236}, [%r1079]; 2026-02-21T09:20:57.6476880Z // end inline asm 2026-02-21T09:20:57.6477026Z bar.sync 0; 2026-02-21T09:20:57.6477328Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r765, %r901}; 2026-02-21T09:20:57.6477606Z bar.sync 0; 2026-02-21T09:20:57.6477753Z // begin inline asm 2026-02-21T09:20:57.6477978Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1235}, [%r1079]; 2026-02-21T09:20:57.6478264Z // end inline asm 2026-02-21T09:20:57.6478409Z bar.sync 0; 2026-02-21T09:20:57.6478638Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r767, %r903}; 2026-02-21T09:20:57.6478916Z bar.sync 0; 2026-02-21T09:20:57.6479058Z // begin inline asm 2026-02-21T09:20:57.6479286Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1237}, [%r1079]; 2026-02-21T09:20:57.6479552Z // end inline asm 2026-02-21T09:20:57.6479702Z bar.sync 0; 2026-02-21T09:20:57.6487089Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r768, %r904}; 2026-02-21T09:20:57.6487618Z bar.sync 0; 2026-02-21T09:20:57.6487797Z // begin inline asm 2026-02-21T09:20:57.6488060Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1238}, [%r1079]; 2026-02-21T09:20:57.6488346Z // end inline asm 2026-02-21T09:20:57.6488512Z bar.sync 0; 2026-02-21T09:20:57.6488760Z 
stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r770, %r906}; 2026-02-21T09:20:57.6489051Z bar.sync 0; 2026-02-21T09:20:57.6489211Z // begin inline asm 2026-02-21T09:20:57.6489448Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1240}, [%r1079]; 2026-02-21T09:20:57.6489726Z // end inline asm 2026-02-21T09:20:57.6489876Z bar.sync 0; 2026-02-21T09:20:57.6490109Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r769, %r905}; 2026-02-21T09:20:57.6490392Z bar.sync 0; 2026-02-21T09:20:57.6490549Z // begin inline asm 2026-02-21T09:20:57.6490786Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1239}, [%r1079]; 2026-02-21T09:20:57.6491076Z // end inline asm 2026-02-21T09:20:57.6491237Z bar.sync 0; 2026-02-21T09:20:57.6491484Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r771, %r907}; 2026-02-21T09:20:57.6491782Z bar.sync 0; 2026-02-21T09:20:57.6491927Z // begin inline asm 2026-02-21T09:20:57.6492171Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1241}, [%r1079]; 2026-02-21T09:20:57.6492444Z // end inline asm 2026-02-21T09:20:57.6492602Z // begin inline asm 2026-02-21T09:20:57.6492787Z fence.proxy.async.shared::cta; 2026-02-21T09:20:57.6492992Z // end inline asm 2026-02-21T09:20:57.6493160Z wgmma.fence.sync.aligned; 2026-02-21T09:20:57.6493350Z shl.b32 %r1367, %r1360, 8; 2026-02-21T09:20:57.6493537Z and.b32 %r1368, %r1367, 4096; 2026-02-21T09:20:57.6493723Z add.s32 %r1369, %r1368, %r481; 2026-02-21T09:20:57.6493923Z bfe.u32 %r1370, %r1369, 4, 14; 2026-02-21T09:20:57.6494108Z cvt.u64.u32 %rd65, %r1370; 2026-02-21T09:20:57.6494331Z or.b64 %rd59, %rd65, -9223371899382267904; 2026-02-21T09:20:57.6494553Z // begin inline asm 2026-02-21T09:20:57.6495423Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1210,%r1211,%r1212,%r1213,%r1214,%r1215,%r1216,%r1217,%r1218,%r1219,%r1220,%r1221,%r1222,%r1223,%r1224,%r1225,%r1226,%r1227,%r1228,%r1229,%r1230,%r1231,%r1232,%r1233,%r1234,%r1235,%r1236,%r1237,%r1238,%r1239,%r1240,%r1241}, {%r1206,%r1207,%r1208,%r1209}, %rd59, %p3, 1, 1; 
2026-02-21T09:20:57.6496326Z // end inline asm 2026-02-21T09:20:57.6496629Z add.s32 %r1371, %r1369, 32; 2026-02-21T09:20:57.6496947Z bfe.u32 %r1372, %r1371, 4, 14; 2026-02-21T09:20:57.6497136Z cvt.u64.u32 %rd66, %r1372; 2026-02-21T09:20:57.6497342Z or.b64 %rd60, %rd66, -9223371899382267904; 2026-02-21T09:20:57.6497559Z // begin inline asm 2026-02-21T09:20:57.6498426Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1210,%r1211,%r1212,%r1213,%r1214,%r1215,%r1216,%r1217,%r1218,%r1219,%r1220,%r1221,%r1222,%r1223,%r1224,%r1225,%r1226,%r1227,%r1228,%r1229,%r1230,%r1231,%r1232,%r1233,%r1234,%r1235,%r1236,%r1237,%r1238,%r1239,%r1240,%r1241}, {%r1274,%r1275,%r1276,%r1277}, %rd60, %p3, 1, 1; 2026-02-21T09:20:57.6499406Z // end inline asm 2026-02-21T09:20:57.6499582Z wgmma.commit_group.sync.aligned; 2026-02-21T09:20:57.6499800Z mov.b32 %r1311, %r1312; 2026-02-21T09:20:57.6499983Z mov.b32 %r1310, %r481; 2026-02-21T09:20:57.6500243Z // begin inline asm 2026-02-21T09:20:57.6500914Z // wait for regs: %r1210,%r1211,%r1212,%r1213,%r1214,%r1215,%r1216,%r1217,%r1218,%r1219,%r1220,%r1221,%r1222,%r1223,%r1224,%r1225,%r1226,%r1227,%r1228,%r1229,%r1230,%r1231,%r1232,%r1233,%r1234,%r1235,%r1236,%r1237,%r1238,%r1239,%r1240,%r1241,%r1310,%r1311,%r1312 2026-02-21T09:20:57.6501647Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:20:57.6501859Z // end inline asm 2026-02-21T09:20:57.6502007Z $L__tmp4: 2026-02-21T09:20:57.6502317Z .loc 1 43 78 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:43:78 2026-02-21T09:20:57.6502688Z add.s32 %r1373, %r5163, 1; 2026-02-21T09:20:57.6502881Z setp.gt.s32 %p12, %r1373, 4; 2026-02-21T09:20:57.6503077Z selp.b32 %r5163, 0, %r1373, %p12; 2026-02-21T09:20:57.6503522Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6503915Z add.s32 %r1374, %r5161, -16; 2026-02-21T09:20:57.6504126Z mad.wide.s32 %rd61, %r1374, 2, %rd27; 2026-02-21T09:20:57.6504491Z .loc 1 51 80 // 
ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6504851Z shl.b32 %r1375, %r5163, 13; 2026-02-21T09:20:57.6505044Z add.s32 %r1348, %r21, %r1375; 2026-02-21T09:20:57.6505236Z selp.b32 %r1349, 8, 0, %p10; 2026-02-21T09:20:57.6505419Z // begin inline asm 2026-02-21T09:20:57.6505670Z cp.async.ca.shared.global [ %r1348 + 0 ], [ %rd61 + 0 ], 0x8, %r1349; 2026-02-21T09:20:57.6505953Z // end inline asm 2026-02-21T09:20:57.6506125Z cp.async.commit_group; 2026-02-21T09:20:57.6506439Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6506956Z mad.wide.s32 %rd62, %r5161, 2, %rd27; 2026-02-21T09:20:57.6507303Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6507662Z add.s32 %r1350, %r22, %r1375; 2026-02-21T09:20:57.6507844Z // begin inline asm 2026-02-21T09:20:57.6508092Z cp.async.ca.shared.global [ %r1350 + 0 ], [ %rd62 + 0 ], 0x8, %r1349; 2026-02-21T09:20:57.6508444Z // end inline asm 2026-02-21T09:20:57.6508604Z cp.async.commit_group; 2026-02-21T09:20:57.6508922Z .loc 1 43 78 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:43:78 2026-02-21T09:20:57.6509276Z add.s32 %r5161, %r5161, 32; 2026-02-21T09:20:57.6509465Z add.s32 %r5160, %r5160, 131072; 2026-02-21T09:20:57.6509663Z setp.lt.u64 %p13, %rd219, 496; 2026-02-21T09:20:57.6509859Z @%p13 bra $L__BB0_3; 2026-02-21T09:20:57.6510082Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:20:57.6510483Z .loc 1 34 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:34:32 2026-02-21T09:20:57.6510851Z or.b32 %r1435, %r77, %r8; 2026-02-21T09:20:57.6511166Z .loc 1 36 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:36:32 2026-02-21T09:20:57.6511533Z or.b32 %r1436, %r78, %r12; 2026-02-21T09:20:57.6511711Z or.b32 %r1437, %r78, %r13; 2026-02-21T09:20:57.6511887Z or.b32 %r1438, %r78, %r14; 2026-02-21T09:20:57.6512056Z or.b32 %r1439, %r78, %r15; 2026-02-21T09:20:57.6512464Z 
.loc 1 43 78 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:43:78 2026-02-21T09:20:57.6512826Z cp.async.wait_group 0; 2026-02-21T09:20:57.6512999Z bar.sync 0; 2026-02-21T09:20:57.6513289Z .loc 1 90 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:90:28 2026-02-21T09:20:57.6513718Z cvt.rn.bf16x2.f32 %r1440, %r1211, %r1210; 2026-02-21T09:20:57.6513956Z cvt.rn.bf16x2.f32 %r1441, %r1213, %r1212; 2026-02-21T09:20:57.6514175Z cvt.rn.bf16x2.f32 %r1442, %r1215, %r1214; 2026-02-21T09:20:57.6514396Z cvt.rn.bf16x2.f32 %r1443, %r1217, %r1216; 2026-02-21T09:20:57.6514618Z cvt.rn.bf16x2.f32 %r1444, %r1219, %r1218; 2026-02-21T09:20:57.6514831Z cvt.rn.bf16x2.f32 %r1445, %r1221, %r1220; 2026-02-21T09:20:57.6515137Z cvt.rn.bf16x2.f32 %r1446, %r1223, %r1222; 2026-02-21T09:20:57.6515361Z cvt.rn.bf16x2.f32 %r1447, %r1225, %r1224; 2026-02-21T09:20:57.6515578Z cvt.rn.bf16x2.f32 %r1448, %r1227, %r1226; 2026-02-21T09:20:57.6515797Z cvt.rn.bf16x2.f32 %r1449, %r1229, %r1228; 2026-02-21T09:20:57.6516017Z cvt.rn.bf16x2.f32 %r1450, %r1231, %r1230; 2026-02-21T09:20:57.6516232Z cvt.rn.bf16x2.f32 %r1451, %r1233, %r1232; 2026-02-21T09:20:57.6516577Z cvt.rn.bf16x2.f32 %r1452, %r1235, %r1234; 2026-02-21T09:20:57.6516817Z cvt.rn.bf16x2.f32 %r1453, %r1237, %r1236; 2026-02-21T09:20:57.6517030Z cvt.rn.bf16x2.f32 %r1454, %r1239, %r1238; 2026-02-21T09:20:57.6517248Z cvt.rn.bf16x2.f32 %r1455, %r1241, %r1240; 2026-02-21T09:20:57.6517681Z .loc 1 91 43 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:91:43 2026-02-21T09:20:57.6518056Z shl.b32 %r1456, %r1436, 13; 2026-02-21T09:20:57.6518239Z shl.b32 %r1457, %r1437, 13; 2026-02-21T09:20:57.6518417Z shl.b32 %r1458, %r1438, 13; 2026-02-21T09:20:57.6518594Z shl.b32 %r1459, %r1439, 13; 2026-02-21T09:20:57.6518920Z .loc 1 91 50 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:91:50 2026-02-21T09:20:57.6519303Z add.s32 %r1460, %r1456, %r1435; 2026-02-21T09:20:57.6519495Z add.s32 %r1461, %r1457, %r1435; 
2026-02-21T09:20:57.6519690Z add.s32 %r1462, %r1458, %r1435; 2026-02-21T09:20:57.6519872Z add.s32 %r1463, %r1459, %r1435; 2026-02-21T09:20:57.6520199Z .loc 1 91 22 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:91:22 2026-02-21T09:20:57.6520563Z mad.wide.s32 %rd67, %r1460, 2, %rd29; 2026-02-21T09:20:57.6520780Z mad.wide.s32 %rd68, %r1461, 2, %rd29; 2026-02-21T09:20:57.6520989Z mad.wide.s32 %rd69, %r1462, 2, %rd29; 2026-02-21T09:20:57.6521193Z mad.wide.s32 %rd70, %r1463, 2, %rd29; 2026-02-21T09:20:57.6521536Z .loc 1 91 81 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:91:81 2026-02-21T09:20:57.6521945Z st.shared.v4.b32 [%r39], {%r1440, %r1442, %r1444, %r1446}; 2026-02-21T09:20:57.6522256Z st.shared.v4.b32 [%r39+512], {%r1441, %r1443, %r1445, %r1447}; 2026-02-21T09:20:57.6522565Z st.shared.v4.b32 [%r40], {%r1448, %r1450, %r1452, %r1454}; 2026-02-21T09:20:57.6522862Z st.shared.v4.b32 [%r40+512], {%r1449, %r1451, %r1453, %r1455}; 2026-02-21T09:20:57.6523115Z bar.sync 0; 2026-02-21T09:20:57.6523263Z // begin inline asm 2026-02-21T09:20:57.6523558Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1396, %r1397, %r1398, %r1399}, [%r1380]; 2026-02-21T09:20:57.6523898Z // end inline asm 2026-02-21T09:20:57.6524054Z // begin inline asm 2026-02-21T09:20:57.6524350Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1400, %r1401, %r1402, %r1403}, [%r1385]; 2026-02-21T09:20:57.6524683Z // end inline asm 2026-02-21T09:20:57.6524836Z // begin inline asm 2026-02-21T09:20:57.6525114Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1404, %r1405, %r1406, %r1407}, [%r1390]; 2026-02-21T09:20:57.6525451Z // end inline asm 2026-02-21T09:20:57.6525601Z // begin inline asm 2026-02-21T09:20:57.6525877Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1408, %r1409, %r1410, %r1411}, [%r1395]; 2026-02-21T09:20:57.6526202Z // end inline asm 2026-02-21T09:20:57.6526437Z // begin inline asm 2026-02-21T09:20:57.6526798Z st.global.v4.b32 [ %rd67 + 0 ], { %r1396, %r1397, %r1398, %r1399 }; 
2026-02-21T09:20:57.6527067Z // end inline asm 2026-02-21T09:20:57.6527220Z // begin inline asm 2026-02-21T09:20:57.6527435Z st.global.v4.b32 [ %rd68 + 0 ], { %r1400, %r1401, %r1402, %r1403 }; 2026-02-21T09:20:57.6527778Z // end inline asm 2026-02-21T09:20:57.6527924Z // begin inline asm 2026-02-21T09:20:57.6528151Z st.global.v4.b32 [ %rd69 + 0 ], { %r1404, %r1405, %r1406, %r1407 }; 2026-02-21T09:20:57.6528403Z // end inline asm 2026-02-21T09:20:57.6528555Z // begin inline asm 2026-02-21T09:20:57.6528765Z st.global.v4.b32 [ %rd70 + 0 ], { %r1408, %r1409, %r1410, %r1411 }; 2026-02-21T09:20:57.6529016Z // end inline asm 2026-02-21T09:20:57.6529393Z .loc 1 22 121 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:22:121 2026-02-21T09:20:57.6529760Z or.b32 %r1464, %r5159, 1; 2026-02-21T09:20:57.6530076Z .loc 1 28 35 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:28:35 2026-02-21T09:20:57.6530433Z add.s32 %r1467, %r1464, %r560; 2026-02-21T09:20:57.6530624Z shr.s32 %r1468, %r1467, 10; 2026-02-21T09:20:57.6530954Z .loc 1 29 33 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:29:33 2026-02-21T09:20:57.6531317Z shl.b32 %r1469, %r1468, 4; 2026-02-21T09:20:57.6531633Z .loc 1 30 39 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:30:39 2026-02-21T09:20:57.6531979Z sub.s32 %r1470, 64, %r1469; 2026-02-21T09:20:57.6532364Z .loc 1 30 52 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:30:52 2026-02-21T09:20:57.6532715Z min.s32 %r1471, %r1470, 16; 2026-02-21T09:20:57.6533030Z .loc 1 31 45 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:31:45 2026-02-21T09:20:57.6533389Z and.b32 %r1472, %r1467, -1024; 2026-02-21T09:20:57.6533580Z sub.s32 %r1473, %r1464, %r1472; 2026-02-21T09:20:57.6533907Z .loc 1 32 51 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:32:51 2026-02-21T09:20:57.6534257Z div.s32 %r1474, %r1473, %r1471; 2026-02-21T09:20:57.6534601Z .loc 1 31 64 // 
ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:31:64 2026-02-21T09:20:57.6534953Z mul.lo.s32 %r1475, %r1474, %r1471; 2026-02-21T09:20:57.6535150Z sub.s32 %r1476, %r1473, %r1475; 2026-02-21T09:20:57.6535471Z .loc 1 31 30 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:31:30 2026-02-21T09:20:57.6535537Z add.s32 %r1477, %r1476, %r1469; 2026-02-21T09:20:57.6535736Z .loc 1 33 27 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:33:27 2026-02-21T09:20:57.6535801Z shl.b32 %r153, %r1477, 7; 2026-02-21T09:20:57.6535997Z .loc 1 35 27 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:35:27 2026-02-21T09:20:57.6536061Z shl.b32 %r154, %r1474, 8; 2026-02-21T09:20:57.6536268Z .loc 1 36 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:36:32 2026-02-21T09:20:57.6536336Z or.b32 %r1478, %r154, %r11; 2026-02-21T09:20:57.6536669Z .loc 1 51 53 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:53 2026-02-21T09:20:57.6536742Z shl.b32 %r1479, %r1478, 10; 2026-02-21T09:20:57.6536936Z .loc 1 51 60 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:60 2026-02-21T09:20:57.6537000Z or.b32 %r1480, %r1479, %r19; 2026-02-21T09:20:57.6537203Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6537286Z mad.wide.s32 %rd71, %r1480, 2, %rd27; 2026-02-21T09:20:57.6537487Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6537549Z bar.sync 0; 2026-02-21T09:20:57.6537608Z mov.b32 %r1413, 8; 2026-02-21T09:20:57.6537761Z // begin inline asm 2026-02-21T09:20:57.6537901Z cp.async.ca.shared.global [ %r21 + 0 ], [ %rd71 + 0 ], 0x8, %r1413; 2026-02-21T09:20:57.6537963Z // end inline asm 2026-02-21T09:20:57.6538032Z cp.async.commit_group; 2026-02-21T09:20:57.6538233Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6538367Z cvt.s64.s32 %rd82, %r1479; 2026-02-21T09:20:57.6538433Z or.b64 %rd83, %rd82, 
%rd10; 2026-02-21T09:20:57.6538497Z shl.b64 %rd84, %rd83, 1; 2026-02-21T09:20:57.6538574Z add.s64 %rd85, %rd27, %rd84; 2026-02-21T09:20:57.6538641Z add.s64 %rd72, %rd85, 32; 2026-02-21T09:20:57.6538840Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6538900Z // begin inline asm 2026-02-21T09:20:57.6539100Z cp.async.ca.shared.global [ %r22 + 0 ], [ %rd72 + 0 ], 0x8, %r1413; 2026-02-21T09:20:57.6539161Z // end inline asm 2026-02-21T09:20:57.6539227Z cp.async.commit_group; 2026-02-21T09:20:57.6539430Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6539493Z add.s64 %rd73, %rd85, 64; 2026-02-21T09:20:57.6539688Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6539748Z bar.sync 0; 2026-02-21T09:20:57.6539810Z // begin inline asm 2026-02-21T09:20:57.6539939Z cp.async.ca.shared.global [ %r23 + 0 ], [ %rd73 + 0 ], 0x8, %r1413; 2026-02-21T09:20:57.6539996Z // end inline asm 2026-02-21T09:20:57.6540127Z cp.async.commit_group; 2026-02-21T09:20:57.6540330Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6540392Z add.s64 %rd74, %rd85, 96; 2026-02-21T09:20:57.6540594Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6540655Z // begin inline asm 2026-02-21T09:20:57.6540784Z cp.async.ca.shared.global [ %r24 + 0 ], [ %rd74 + 0 ], 0x8, %r1413; 2026-02-21T09:20:57.6540844Z // end inline asm 2026-02-21T09:20:57.6540911Z cp.async.commit_group; 2026-02-21T09:20:57.6541107Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6541174Z add.s64 %rd75, %rd85, 128; 2026-02-21T09:20:57.6541371Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6541426Z bar.sync 0; 2026-02-21T09:20:57.6541487Z // begin inline asm 2026-02-21T09:20:57.6541629Z 
cp.async.ca.shared.global [ %r25 + 0 ], [ %rd75 + 0 ], 0x8, %r1413; 2026-02-21T09:20:57.6541688Z // end inline asm 2026-02-21T09:20:57.6541754Z cp.async.commit_group; 2026-02-21T09:20:57.6541952Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6542018Z add.s64 %rd76, %rd85, 160; 2026-02-21T09:20:57.6542213Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6542276Z // begin inline asm 2026-02-21T09:20:57.6542410Z cp.async.ca.shared.global [ %r26 + 0 ], [ %rd76 + 0 ], 0x8, %r1413; 2026-02-21T09:20:57.6542467Z // end inline asm 2026-02-21T09:20:57.6542535Z cp.async.commit_group; 2026-02-21T09:20:57.6542740Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6542805Z add.s64 %rd77, %rd85, 192; 2026-02-21T09:20:57.6543013Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6543070Z bar.sync 0; 2026-02-21T09:20:57.6543143Z // begin inline asm 2026-02-21T09:20:57.6543288Z cp.async.ca.shared.global [ %r27 + 0 ], [ %rd77 + 0 ], 0x8, %r1413; 2026-02-21T09:20:57.6543347Z // end inline asm 2026-02-21T09:20:57.6543416Z cp.async.commit_group; 2026-02-21T09:20:57.6543616Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6543739Z add.s64 %rd78, %rd85, 224; 2026-02-21T09:20:57.6543940Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6544002Z // begin inline asm 2026-02-21T09:20:57.6544176Z cp.async.ca.shared.global [ %r28 + 0 ], [ %rd78 + 0 ], 0x8, %r1413; 2026-02-21T09:20:57.6544234Z // end inline asm 2026-02-21T09:20:57.6544304Z cp.async.commit_group; 2026-02-21T09:20:57.6544502Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6544565Z add.s64 %rd79, %rd85, 256; 2026-02-21T09:20:57.6544764Z .loc 1 51 80 // 
ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6544885Z bar.sync 0; 2026-02-21T09:20:57.6544947Z // begin inline asm 2026-02-21T09:20:57.6545078Z cp.async.ca.shared.global [ %r29 + 0 ], [ %rd79 + 0 ], 0x8, %r1413; 2026-02-21T09:20:57.6545138Z // end inline asm 2026-02-21T09:20:57.6545216Z cp.async.commit_group; 2026-02-21T09:20:57.6545415Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6545482Z add.s64 %rd80, %rd85, 288; 2026-02-21T09:20:57.6545680Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6545740Z // begin inline asm 2026-02-21T09:20:57.6545871Z cp.async.ca.shared.global [ %r30 + 0 ], [ %rd80 + 0 ], 0x8, %r1413; 2026-02-21T09:20:57.6545927Z // end inline asm 2026-02-21T09:20:57.6546042Z cp.async.commit_group; 2026-02-21T09:20:57.6546242Z .loc 1 43 78 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:43:78 2026-02-21T09:20:57.6546307Z shl.b32 %r1481, %r1474, 18; 2026-02-21T09:20:57.6546371Z or.b32 %r5197, %r45, %r1481; 2026-02-21T09:20:57.6546432Z add.s32 %r5196, %r46, %r153; 2026-02-21T09:20:57.6546614Z mov.b32 %r2116, 0f00000000; 2026-02-21T09:20:57.6546678Z mov.b32 %r5199, 4; 2026-02-21T09:20:57.6546741Z mov.b32 %r5198, -1; 2026-02-21T09:20:57.6546807Z mov.b64 %rd220, -16; 2026-02-21T09:20:57.6546868Z mov.b32 %r2117, %r2116; 2026-02-21T09:20:57.6546929Z mov.b32 %r2118, %r2116; 2026-02-21T09:20:57.6546991Z mov.b32 %r2119, %r2116; 2026-02-21T09:20:57.6547052Z mov.b32 %r2120, %r2116; 2026-02-21T09:20:57.6547114Z mov.b32 %r2121, %r2116; 2026-02-21T09:20:57.6547171Z mov.b32 %r2122, %r2116; 2026-02-21T09:20:57.6547233Z mov.b32 %r2123, %r2116; 2026-02-21T09:20:57.6547291Z mov.b32 %r2124, %r2116; 2026-02-21T09:20:57.6547350Z mov.b32 %r2125, %r2116; 2026-02-21T09:20:57.6547408Z mov.b32 %r2126, %r2116; 2026-02-21T09:20:57.6547470Z mov.b32 %r2127, %r2116; 2026-02-21T09:20:57.6547528Z mov.b32 %r2128, %r2116; 
2026-02-21T09:20:57.6547587Z mov.b32 %r2129, %r2116; 2026-02-21T09:20:57.6547649Z mov.b32 %r2130, %r2116; 2026-02-21T09:20:57.6547707Z mov.b32 %r2131, %r2116; 2026-02-21T09:20:57.6547764Z mov.b32 %r2132, %r2116; 2026-02-21T09:20:57.6547824Z mov.b32 %r2133, %r2116; 2026-02-21T09:20:57.6547887Z mov.b32 %r2134, %r2116; 2026-02-21T09:20:57.6547946Z mov.b32 %r2135, %r2116; 2026-02-21T09:20:57.6548016Z mov.b32 %r2136, %r2116; 2026-02-21T09:20:57.6548078Z mov.b32 %r2137, %r2116; 2026-02-21T09:20:57.6548139Z mov.b32 %r2138, %r2116; 2026-02-21T09:20:57.6548196Z mov.b32 %r2139, %r2116; 2026-02-21T09:20:57.6548254Z mov.b32 %r2140, %r2116; 2026-02-21T09:20:57.6548378Z mov.b32 %r2141, %r2116; 2026-02-21T09:20:57.6548442Z mov.b32 %r2142, %r2116; 2026-02-21T09:20:57.6548501Z mov.b32 %r2143, %r2116; 2026-02-21T09:20:57.6548568Z mov.b32 %r2144, %r2116; 2026-02-21T09:20:57.6548626Z mov.b32 %r2145, %r2116; 2026-02-21T09:20:57.6548683Z mov.b32 %r2146, %r2116; 2026-02-21T09:20:57.6548745Z mov.b32 %r2147, %r2116; 2026-02-21T09:20:57.6548862Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T09:20:57.6548971Z // => This Inner Loop Header: Depth=2 2026-02-21T09:20:57.6549124Z add.s64 %rd220, %rd220, 16; 2026-02-21T09:20:57.6549198Z setp.lt.u64 %p21, %rd220, 432; 2026-02-21T09:20:57.6549258Z add.s32 %r2258, %r5198, 1; 2026-02-21T09:20:57.6549322Z setp.gt.s32 %p22, %r2258, 4; 2026-02-21T09:20:57.6549455Z selp.b32 %r5198, 0, %r2258, %p22; 2026-02-21T09:20:57.6549657Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6549726Z cp.async.wait_group 8; 2026-02-21T09:20:57.6549780Z bar.sync 0; 2026-02-21T09:20:57.6549844Z shl.b32 %r2259, %r5198, 13; 2026-02-21T09:20:57.6549911Z add.s32 %r2261, %r5136, %r2259; 2026-02-21T09:20:57.6550109Z .loc 1 55 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:55:32 2026-02-21T09:20:57.6550239Z add.s32 %r2262, %r2261, %r31; 2026-02-21T09:20:57.6550308Z ld.shared.b16 %rs41, [%r2262]; 
2026-02-21T09:20:57.6550376Z ld.shared.b16 %rs42, [%r2262+256]; 2026-02-21T09:20:57.6550446Z ld.shared.b16 %rs43, [%r2262+16]; 2026-02-21T09:20:57.6550513Z ld.shared.b16 %rs44, [%r2262+272]; 2026-02-21T09:20:57.6550574Z add.s32 %r2263, %r2261, %r32; 2026-02-21T09:20:57.6550650Z ld.shared.b16 %rs45, [%r2263]; 2026-02-21T09:20:57.6550719Z ld.shared.b16 %rs46, [%r2263+256]; 2026-02-21T09:20:57.6550786Z ld.shared.b16 %rs47, [%r2263+16]; 2026-02-21T09:20:57.6550848Z ld.shared.b16 %rs48, [%r2263+272]; 2026-02-21T09:20:57.6550916Z cvt.f32.bf16 %r1778, %rs41; 2026-02-21T09:20:57.6550978Z cvt.f32.bf16 %r1779, %rs42; 2026-02-21T09:20:57.6551040Z cvt.f32.bf16 %r1780, %rs45; 2026-02-21T09:20:57.6551163Z cvt.f32.bf16 %r1781, %rs46; 2026-02-21T09:20:57.6551231Z cvt.f32.bf16 %r1846, %rs43; 2026-02-21T09:20:57.6551293Z cvt.f32.bf16 %r1847, %rs44; 2026-02-21T09:20:57.6551362Z cvt.f32.bf16 %r1848, %rs47; 2026-02-21T09:20:57.6551427Z cvt.f32.bf16 %r1849, %rs48; 2026-02-21T09:20:57.6551631Z .loc 1 57 34 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:57:34 2026-02-21T09:20:57.6551697Z cvt.s64.s32 %rd100, %r5196; 2026-02-21T09:20:57.6551764Z add.s64 %rd87, %rd28, %rd100; 2026-02-21T09:20:57.6551959Z .loc 1 57 87 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:57:87 2026-02-21T09:20:57.6552019Z // begin inline asm 2026-02-21T09:20:57.6552079Z mov.u64 %rd86, 0x0; 2026-02-21T09:20:57.6552212Z createpolicy.fractional.L2::evict_first.b64 %rd86, 1.0; 2026-02-21T09:20:57.6552271Z // end inline asm 2026-02-21T09:20:57.6552329Z // begin inline asm 2026-02-21T09:20:57.6552389Z mov.u16 %rs39, 0x0; 2026-02-21T09:20:57.6552548Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs39 }, [ %rd87 + 0 ], %rd86; 2026-02-21T09:20:57.6552608Z // end inline asm 2026-02-21T09:20:57.6552813Z .loc 1 65 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:65:28 2026-02-21T09:20:57.6552881Z st.shared.b8 [%r33], %rs39; 2026-02-21T09:20:57.6552936Z bar.sync 0; 
2026-02-21T09:20:57.6553013Z ld.shared.v2.b8 {%rs49, %rs50}, [%r34]; 2026-02-21T09:20:57.6553217Z .loc 1 60 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:60:28 2026-02-21T09:20:57.6553280Z shl.b16 %rs51, %rs49, 4; 2026-02-21T09:20:57.6553341Z shl.b16 %rs52, %rs50, 4; 2026-02-21T09:20:57.6553544Z .loc 1 75 58 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:75:58 2026-02-21T09:20:57.6553615Z selp.b16 %rs53, %rs51, %rs49, %p63; 2026-02-21T09:20:57.6553677Z cvt.s16.s8 %rs54, %rs53; 2026-02-21T09:20:57.6553740Z shr.s16 %rs55, %rs54, 4; 2026-02-21T09:20:57.6553809Z selp.b16 %rs56, %rs52, %rs50, %p63; 2026-02-21T09:20:57.6553869Z cvt.s16.s8 %rs57, %rs56; 2026-02-21T09:20:57.6553929Z shr.s16 %rs58, %rs57, 4; 2026-02-21T09:20:57.6554130Z .loc 1 80 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:80:32 2026-02-21T09:20:57.6554195Z cvt.rn.f32.s16 %r2264, %rs55; 2026-02-21T09:20:57.6554259Z cvt.rn.f32.s16 %r2265, %rs58; 2026-02-21T09:20:57.6554380Z bar.sync 0; 2026-02-21T09:20:57.6554445Z st.shared.b32 [%r35], %r2264; 2026-02-21T09:20:57.6554509Z st.shared.b32 [%r36], %r2265; 2026-02-21T09:20:57.6554649Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2116}; 2026-02-21T09:20:57.6554710Z bar.sync 0; 2026-02-21T09:20:57.6554817Z // begin inline asm 2026-02-21T09:20:57.6554972Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1646, %r1782}, [%r578]; 2026-02-21T09:20:57.6555034Z // end inline asm 2026-02-21T09:20:57.6555089Z bar.sync 0; 2026-02-21T09:20:57.6555219Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2118}; 2026-02-21T09:20:57.6555292Z bar.sync 0; 2026-02-21T09:20:57.6555356Z // begin inline asm 2026-02-21T09:20:57.6555506Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1648, %r1784}, [%r578]; 2026-02-21T09:20:57.6555611Z // end inline asm 2026-02-21T09:20:57.6555670Z bar.sync 0; 2026-02-21T09:20:57.6555798Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2117}; 2026-02-21T09:20:57.6555853Z bar.sync 0; 
2026-02-21T09:20:57.6555915Z // begin inline asm 2026-02-21T09:20:57.6556061Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1647, %r1783}, [%r578]; 2026-02-21T09:20:57.6556118Z // end inline asm 2026-02-21T09:20:57.6556173Z bar.sync 0; 2026-02-21T09:20:57.6556305Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2119}; 2026-02-21T09:20:57.6556362Z bar.sync 0; 2026-02-21T09:20:57.6556421Z // begin inline asm 2026-02-21T09:20:57.6556696Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1649, %r1785}, [%r578]; 2026-02-21T09:20:57.6556755Z // end inline asm 2026-02-21T09:20:57.6556885Z bar.sync 0; 2026-02-21T09:20:57.6557017Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2120}; 2026-02-21T09:20:57.6557077Z bar.sync 0; 2026-02-21T09:20:57.6557135Z // begin inline asm 2026-02-21T09:20:57.6557281Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1650, %r1786}, [%r578]; 2026-02-21T09:20:57.6557340Z // end inline asm 2026-02-21T09:20:57.6557396Z bar.sync 0; 2026-02-21T09:20:57.6557524Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2122}; 2026-02-21T09:20:57.6557589Z bar.sync 0; 2026-02-21T09:20:57.6557658Z // begin inline asm 2026-02-21T09:20:57.6557806Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1652, %r1788}, [%r578]; 2026-02-21T09:20:57.6557863Z // end inline asm 2026-02-21T09:20:57.6557923Z bar.sync 0; 2026-02-21T09:20:57.6558049Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2121}; 2026-02-21T09:20:57.6558105Z bar.sync 0; 2026-02-21T09:20:57.6558167Z // begin inline asm 2026-02-21T09:20:57.6558314Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1651, %r1787}, [%r578]; 2026-02-21T09:20:57.6558369Z // end inline asm 2026-02-21T09:20:57.6558425Z bar.sync 0; 2026-02-21T09:20:57.6558555Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2123}; 2026-02-21T09:20:57.6558611Z bar.sync 0; 2026-02-21T09:20:57.6558670Z // begin inline asm 2026-02-21T09:20:57.6558819Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1653, %r1789}, [%r578]; 2026-02-21T09:20:57.6558874Z 
// end inline asm 2026-02-21T09:20:57.6558932Z bar.sync 0; 2026-02-21T09:20:57.6559060Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2124}; 2026-02-21T09:20:57.6559117Z bar.sync 0; 2026-02-21T09:20:57.6559174Z // begin inline asm 2026-02-21T09:20:57.6559316Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1654, %r1790}, [%r578]; 2026-02-21T09:20:57.6559378Z // end inline asm 2026-02-21T09:20:57.6559434Z bar.sync 0; 2026-02-21T09:20:57.6559561Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2126}; 2026-02-21T09:20:57.6559622Z bar.sync 0; 2026-02-21T09:20:57.6559681Z // begin inline asm 2026-02-21T09:20:57.6559835Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1656, %r1792}, [%r578]; 2026-02-21T09:20:57.6559891Z // end inline asm 2026-02-21T09:20:57.6559953Z bar.sync 0; 2026-02-21T09:20:57.6560082Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2125}; 2026-02-21T09:20:57.6560136Z bar.sync 0; 2026-02-21T09:20:57.6560198Z // begin inline asm 2026-02-21T09:20:57.6560342Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1655, %r1791}, [%r578]; 2026-02-21T09:20:57.6560479Z // end inline asm 2026-02-21T09:20:57.6560533Z bar.sync 0; 2026-02-21T09:20:57.6560665Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2127}; 2026-02-21T09:20:57.6560720Z bar.sync 0; 2026-02-21T09:20:57.6560841Z // begin inline asm 2026-02-21T09:20:57.6560988Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1657, %r1793}, [%r578]; 2026-02-21T09:20:57.6561045Z // end inline asm 2026-02-21T09:20:57.6561100Z bar.sync 0; 2026-02-21T09:20:57.6561233Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2128}; 2026-02-21T09:20:57.6561289Z bar.sync 0; 2026-02-21T09:20:57.6561350Z // begin inline asm 2026-02-21T09:20:57.6561494Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1658, %r1794}, [%r578]; 2026-02-21T09:20:57.6561616Z // end inline asm 2026-02-21T09:20:57.6561673Z bar.sync 0; 2026-02-21T09:20:57.6561806Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2130}; 
2026-02-21T09:20:57.6561864Z bar.sync 0; 2026-02-21T09:20:57.6561926Z // begin inline asm 2026-02-21T09:20:57.6562069Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1660, %r1796}, [%r578]; 2026-02-21T09:20:57.6562125Z // end inline asm 2026-02-21T09:20:57.6562184Z bar.sync 0; 2026-02-21T09:20:57.6562310Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2129}; 2026-02-21T09:20:57.6562366Z bar.sync 0; 2026-02-21T09:20:57.6562426Z // begin inline asm 2026-02-21T09:20:57.6562572Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1659, %r1795}, [%r578]; 2026-02-21T09:20:57.6562628Z // end inline asm 2026-02-21T09:20:57.6562728Z bar.sync 0; 2026-02-21T09:20:57.6562864Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2131}; 2026-02-21T09:20:57.6562919Z bar.sync 0; 2026-02-21T09:20:57.6562976Z // begin inline asm 2026-02-21T09:20:57.6563125Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1661, %r1797}, [%r578]; 2026-02-21T09:20:57.6563195Z // end inline asm 2026-02-21T09:20:57.6563252Z bar.sync 0; 2026-02-21T09:20:57.6563381Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2132}; 2026-02-21T09:20:57.6563440Z bar.sync 0; 2026-02-21T09:20:57.6563499Z // begin inline asm 2026-02-21T09:20:57.6563643Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1662, %r1798}, [%r578]; 2026-02-21T09:20:57.6563704Z // end inline asm 2026-02-21T09:20:57.6563757Z bar.sync 0; 2026-02-21T09:20:57.6563884Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2134}; 2026-02-21T09:20:57.6563943Z bar.sync 0; 2026-02-21T09:20:57.6564001Z // begin inline asm 2026-02-21T09:20:57.6564146Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1664, %r1800}, [%r578]; 2026-02-21T09:20:57.6564202Z // end inline asm 2026-02-21T09:20:57.6564261Z bar.sync 0; 2026-02-21T09:20:57.6564390Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2133}; 2026-02-21T09:20:57.6564445Z bar.sync 0; 2026-02-21T09:20:57.6564511Z // begin inline asm 2026-02-21T09:20:57.6564655Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1663, 
%r1799}, [%r578]; 2026-02-21T09:20:57.6564712Z // end inline asm 2026-02-21T09:20:57.6564767Z bar.sync 0; 2026-02-21T09:20:57.6564898Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2135}; 2026-02-21T09:20:57.6564954Z bar.sync 0; 2026-02-21T09:20:57.6565013Z // begin inline asm 2026-02-21T09:20:57.6565159Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1665, %r1801}, [%r578]; 2026-02-21T09:20:57.6565218Z // end inline asm 2026-02-21T09:20:57.6565273Z bar.sync 0; 2026-02-21T09:20:57.6565402Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2136}; 2026-02-21T09:20:57.6565459Z bar.sync 0; 2026-02-21T09:20:57.6565522Z // begin inline asm 2026-02-21T09:20:57.6565666Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1666, %r1802}, [%r578]; 2026-02-21T09:20:57.6565726Z // end inline asm 2026-02-21T09:20:57.6565780Z bar.sync 0; 2026-02-21T09:20:57.6565919Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2138}; 2026-02-21T09:20:57.6565979Z bar.sync 0; 2026-02-21T09:20:57.6566038Z // begin inline asm 2026-02-21T09:20:57.6566183Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1668, %r1804}, [%r578]; 2026-02-21T09:20:57.6566325Z // end inline asm 2026-02-21T09:20:57.6566382Z bar.sync 0; 2026-02-21T09:20:57.6566625Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2137}; 2026-02-21T09:20:57.6566788Z bar.sync 0; 2026-02-21T09:20:57.6566850Z // begin inline asm 2026-02-21T09:20:57.6567003Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1667, %r1803}, [%r578]; 2026-02-21T09:20:57.6567061Z // end inline asm 2026-02-21T09:20:57.6567115Z bar.sync 0; 2026-02-21T09:20:57.6567252Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2139}; 2026-02-21T09:20:57.6567309Z bar.sync 0; 2026-02-21T09:20:57.6567367Z // begin inline asm 2026-02-21T09:20:57.6567520Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1669, %r1805}, [%r578]; 2026-02-21T09:20:57.6567652Z // end inline asm 2026-02-21T09:20:57.6567712Z bar.sync 0; 2026-02-21T09:20:57.6567844Z 
stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2140}; 2026-02-21T09:20:57.6567903Z bar.sync 0; 2026-02-21T09:20:57.6567962Z // begin inline asm 2026-02-21T09:20:57.6568109Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1670, %r1806}, [%r578]; 2026-02-21T09:20:57.6568168Z // end inline asm 2026-02-21T09:20:57.6568221Z bar.sync 0; 2026-02-21T09:20:57.6568354Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2142}; 2026-02-21T09:20:57.6568412Z bar.sync 0; 2026-02-21T09:20:57.6568471Z // begin inline asm 2026-02-21T09:20:57.6568615Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1672, %r1808}, [%r578]; 2026-02-21T09:20:57.6568670Z // end inline asm 2026-02-21T09:20:57.6568794Z bar.sync 0; 2026-02-21T09:20:57.6568929Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2141}; 2026-02-21T09:20:57.6568984Z bar.sync 0; 2026-02-21T09:20:57.6569045Z // begin inline asm 2026-02-21T09:20:57.6569191Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1671, %r1807}, [%r578]; 2026-02-21T09:20:57.6569247Z // end inline asm 2026-02-21T09:20:57.6569302Z bar.sync 0; 2026-02-21T09:20:57.6569435Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2143}; 2026-02-21T09:20:57.6569489Z bar.sync 0; 2026-02-21T09:20:57.6569547Z // begin inline asm 2026-02-21T09:20:57.6569693Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1673, %r1809}, [%r578]; 2026-02-21T09:20:57.6569751Z // end inline asm 2026-02-21T09:20:57.6569805Z bar.sync 0; 2026-02-21T09:20:57.6569932Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2144}; 2026-02-21T09:20:57.6569990Z bar.sync 0; 2026-02-21T09:20:57.6570049Z // begin inline asm 2026-02-21T09:20:57.6570194Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1674, %r1810}, [%r578]; 2026-02-21T09:20:57.6570261Z // end inline asm 2026-02-21T09:20:57.6570324Z bar.sync 0; 2026-02-21T09:20:57.6570453Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2146}; 2026-02-21T09:20:57.6570513Z bar.sync 0; 2026-02-21T09:20:57.6570572Z // begin inline asm 
2026-02-21T09:20:57.6570715Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1676, %r1812}, [%r578]; 2026-02-21T09:20:57.6570772Z // end inline asm 2026-02-21T09:20:57.6570828Z bar.sync 0; 2026-02-21T09:20:57.6570955Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2145}; 2026-02-21T09:20:57.6571008Z bar.sync 0; 2026-02-21T09:20:57.6571069Z // begin inline asm 2026-02-21T09:20:57.6571215Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1675, %r1811}, [%r578]; 2026-02-21T09:20:57.6571272Z // end inline asm 2026-02-21T09:20:57.6571336Z bar.sync 0; 2026-02-21T09:20:57.6571466Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r2147}; 2026-02-21T09:20:57.6571523Z bar.sync 0; 2026-02-21T09:20:57.6571585Z // begin inline asm 2026-02-21T09:20:57.6571733Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1677, %r1813}, [%r578]; 2026-02-21T09:20:57.6571788Z // end inline asm 2026-02-21T09:20:57.6571854Z $L__tmp5: 2026-02-21T09:20:57.6572142Z .loc 2 291 36 // standard.py:291:36 @[ ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:87:40 ] 2026-02-21T09:20:57.6572205Z // begin inline asm 2026-02-21T09:20:57.6572365Z fence.proxy.async.shared::cta; 2026-02-21T09:20:57.6572421Z // end inline asm 2026-02-21T09:20:57.6572507Z shfl.sync.idx.b32 %r2266, %r6, 0, 31, -1; 2026-02-21T09:20:57.6572581Z wgmma.fence.sync.aligned; 2026-02-21T09:20:57.6572645Z mov.pred %p14, -1; 2026-02-21T09:20:57.6572754Z // begin inline asm 2026-02-21T09:20:57.6573520Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1646,%r1647,%r1648,%r1649,%r1650,%r1651,%r1652,%r1653,%r1654,%r1655,%r1656,%r1657,%r1658,%r1659,%r1660,%r1661,%r1662,%r1663,%r1664,%r1665,%r1666,%r1667,%r1668,%r1669,%r1670,%r1671,%r1672,%r1673,%r1674,%r1675,%r1676,%r1677}, {%r1778,%r1779,%r1780,%r1781}, %rd1, %p14, 1, 1; 2026-02-21T09:20:57.6573578Z // end inline asm 2026-02-21T09:20:57.6573641Z // begin inline asm 2026-02-21T09:20:57.6574452Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r1646,%r1647,%r1648,%r1649,%r1650,%r1651,%r1652,%r1653,%r1654,%r1655,%r1656,%r1657,%r1658,%r1659,%r1660,%r1661,%r1662,%r1663,%r1664,%r1665,%r1666,%r1667,%r1668,%r1669,%r1670,%r1671,%r1672,%r1673,%r1674,%r1675,%r1676,%r1677}, {%r1846,%r1847,%r1848,%r1849}, %rd2, %p14, 1, 1; 2026-02-21T09:20:57.6574516Z // end inline asm 2026-02-21T09:20:57.6574581Z // begin inline asm 2026-02-21T09:20:57.6575333Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1782,%r1783,%r1784,%r1785,%r1786,%r1787,%r1788,%r1789,%r1790,%r1791,%r1792,%r1793,%r1794,%r1795,%r1796,%r1797,%r1798,%r1799,%r1800,%r1801,%r1802,%r1803,%r1804,%r1805,%r1806,%r1807,%r1808,%r1809,%r1810,%r1811,%r1812,%r1813}, {%r1778,%r1779,%r1780,%r1781}, %rd3, %p14, 1, 1; 2026-02-21T09:20:57.6575441Z // end inline asm 2026-02-21T09:20:57.6575507Z // begin inline asm 2026-02-21T09:20:57.6576252Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1782,%r1783,%r1784,%r1785,%r1786,%r1787,%r1788,%r1789,%r1790,%r1791,%r1792,%r1793,%r1794,%r1795,%r1796,%r1797,%r1798,%r1799,%r1800,%r1801,%r1802,%r1803,%r1804,%r1805,%r1806,%r1807,%r1808,%r1809,%r1810,%r1811,%r1812,%r1813}, {%r1846,%r1847,%r1848,%r1849}, %rd4, %p14, 1, 1; 2026-02-21T09:20:57.6576313Z // end inline asm 2026-02-21T09:20:57.6576398Z wgmma.commit_group.sync.aligned; 2026-02-21T09:20:57.6576573Z mov.b32 %r2218, 0; 2026-02-21T09:20:57.6576640Z mov.b32 %r1914, %r481; 2026-02-21T09:20:57.6576702Z mov.b32 %r1915, %r2218; 2026-02-21T09:20:57.6576768Z mov.b32 %r1916, %r2218; 2026-02-21T09:20:57.6576827Z // begin inline asm 2026-02-21T09:20:57.6577896Z // wait for regs: 
%r1646,%r1647,%r1648,%r1649,%r1650,%r1651,%r1652,%r1653,%r1654,%r1655,%r1656,%r1657,%r1658,%r1659,%r1660,%r1661,%r1662,%r1663,%r1664,%r1665,%r1666,%r1667,%r1668,%r1669,%r1670,%r1671,%r1672,%r1673,%r1674,%r1675,%r1676,%r1677,%r1782,%r1783,%r1784,%r1785,%r1786,%r1787,%r1788,%r1789,%r1790,%r1791,%r1792,%r1793,%r1794,%r1795,%r1796,%r1797,%r1798,%r1799,%r1800,%r1801,%r1802,%r1803,%r1804,%r1805,%r1806,%r1807,%r1808,%r1809,%r1810,%r1811,%r1812,%r1813,%r1914,%r1915,%r1916 2026-02-21T09:20:57.6577981Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:20:57.6578041Z // end inline asm 2026-02-21T09:20:57.6578097Z $L__tmp6: 2026-02-21T09:20:57.6578318Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6578385Z add.s32 %r2267, %r2261, 40960; 2026-02-21T09:20:57.6578587Z .loc 1 55 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:55:32 2026-02-21T09:20:57.6578669Z add.s32 %r2268, %r2267, %r31; 2026-02-21T09:20:57.6578739Z ld.shared.b16 %rs59, [%r2268]; 2026-02-21T09:20:57.6578809Z ld.shared.b16 %rs60, [%r2268+256]; 2026-02-21T09:20:57.6578877Z ld.shared.b16 %rs61, [%r2268+16]; 2026-02-21T09:20:57.6578949Z ld.shared.b16 %rs62, [%r2268+272]; 2026-02-21T09:20:57.6579014Z add.s32 %r2269, %r2267, %r32; 2026-02-21T09:20:57.6579080Z ld.shared.b16 %rs63, [%r2269]; 2026-02-21T09:20:57.6579152Z ld.shared.b16 %rs64, [%r2269+256]; 2026-02-21T09:20:57.6579227Z ld.shared.b16 %rs65, [%r2269+16]; 2026-02-21T09:20:57.6579295Z ld.shared.b16 %rs66, [%r2269+272]; 2026-02-21T09:20:57.6579448Z cvt.f32.bf16 %r2112, %rs59; 2026-02-21T09:20:57.6579513Z cvt.f32.bf16 %r2113, %rs60; 2026-02-21T09:20:57.6579574Z cvt.f32.bf16 %r2114, %rs63; 2026-02-21T09:20:57.6579636Z cvt.f32.bf16 %r2115, %rs64; 2026-02-21T09:20:57.6579706Z cvt.f32.bf16 %r2180, %rs61; 2026-02-21T09:20:57.6579830Z cvt.f32.bf16 %r2181, %rs62; 2026-02-21T09:20:57.6579892Z cvt.f32.bf16 %r2182, %rs65; 2026-02-21T09:20:57.6579957Z cvt.f32.bf16 %r2183, %rs66; 2026-02-21T09:20:57.6580159Z 
.loc 1 57 34 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:57:34 2026-02-21T09:20:57.6580224Z add.s32 %r2270, %r5196, 65536; 2026-02-21T09:20:57.6580288Z cvt.s64.s32 %rd101, %r2270; 2026-02-21T09:20:57.6580356Z add.s64 %rd94, %rd28, %rd101; 2026-02-21T09:20:57.6580612Z .loc 1 57 87 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:57:87 2026-02-21T09:20:57.6580675Z // begin inline asm 2026-02-21T09:20:57.6580740Z mov.u64 %rd93, 0x0; 2026-02-21T09:20:57.6580869Z createpolicy.fractional.L2::evict_first.b64 %rd93, 1.0; 2026-02-21T09:20:57.6580931Z // end inline asm 2026-02-21T09:20:57.6580994Z // begin inline asm 2026-02-21T09:20:57.6581054Z mov.u16 %rs40, 0x0; 2026-02-21T09:20:57.6581214Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs40 }, [ %rd94 + 0 ], %rd93; 2026-02-21T09:20:57.6581274Z // end inline asm 2026-02-21T09:20:57.6581479Z .loc 1 65 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:65:28 2026-02-21T09:20:57.6581536Z bar.sync 0; 2026-02-21T09:20:57.6581600Z st.shared.b8 [%r33], %rs40; 2026-02-21T09:20:57.6581724Z bar.sync 0; 2026-02-21T09:20:57.6581804Z ld.shared.v2.b8 {%rs67, %rs68}, [%r34]; 2026-02-21T09:20:57.6582003Z .loc 1 60 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:60:28 2026-02-21T09:20:57.6582082Z shl.b16 %rs69, %rs67, 4; 2026-02-21T09:20:57.6582146Z shl.b16 %rs70, %rs68, 4; 2026-02-21T09:20:57.6582343Z .loc 1 75 58 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:75:58 2026-02-21T09:20:57.6582417Z selp.b16 %rs71, %rs69, %rs67, %p63; 2026-02-21T09:20:57.6582483Z cvt.s16.s8 %rs72, %rs71; 2026-02-21T09:20:57.6582543Z shr.s16 %rs73, %rs72, 4; 2026-02-21T09:20:57.6582611Z selp.b16 %rs74, %rs70, %rs68, %p63; 2026-02-21T09:20:57.6582679Z cvt.s16.s8 %rs75, %rs74; 2026-02-21T09:20:57.6582739Z shr.s16 %rs76, %rs75, 4; 2026-02-21T09:20:57.6582936Z .loc 1 80 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:80:32 2026-02-21T09:20:57.6583008Z cvt.rn.f32.s16 %r2271, %rs73; 
2026-02-21T09:20:57.6583074Z cvt.rn.f32.s16 %r2272, %rs76; 2026-02-21T09:20:57.6583142Z bar.sync 0; 2026-02-21T09:20:57.6583209Z st.shared.b32 [%r35], %r2271; 2026-02-21T09:20:57.6583280Z st.shared.b32 [%r36], %r2272; 2026-02-21T09:20:57.6583338Z $L__tmp7: 2026-02-21T09:20:57.6583611Z .loc 2 291 36 // standard.py:291:36 @[ ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:87:40 ] 2026-02-21T09:20:57.6583771Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1646, %r1782}; 2026-02-21T09:20:57.6583828Z bar.sync 0; 2026-02-21T09:20:57.6583888Z // begin inline asm 2026-02-21T09:20:57.6584025Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2116}, [%r1079]; 2026-02-21T09:20:57.6584087Z // end inline asm 2026-02-21T09:20:57.6584143Z bar.sync 0; 2026-02-21T09:20:57.6584293Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1648, %r1784}; 2026-02-21T09:20:57.6584352Z bar.sync 0; 2026-02-21T09:20:57.6584413Z // begin inline asm 2026-02-21T09:20:57.6584545Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2118}, [%r1079]; 2026-02-21T09:20:57.6584606Z // end inline asm 2026-02-21T09:20:57.6584661Z bar.sync 0; 2026-02-21T09:20:57.6584811Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1647, %r1783}; 2026-02-21T09:20:57.6584869Z bar.sync 0; 2026-02-21T09:20:57.6584933Z // begin inline asm 2026-02-21T09:20:57.6585062Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2117}, [%r1079]; 2026-02-21T09:20:57.6585181Z // end inline asm 2026-02-21T09:20:57.6585241Z bar.sync 0; 2026-02-21T09:20:57.6585385Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1649, %r1785}; 2026-02-21T09:20:57.6585442Z bar.sync 0; 2026-02-21T09:20:57.6585502Z // begin inline asm 2026-02-21T09:20:57.6585681Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2119}, [%r1079]; 2026-02-21T09:20:57.6585737Z // end inline asm 2026-02-21T09:20:57.6585792Z bar.sync 0; 2026-02-21T09:20:57.6585940Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1650, %r1786}; 2026-02-21T09:20:57.6585995Z bar.sync 0; 
2026-02-21T09:20:57.6586056Z // begin inline asm 2026-02-21T09:20:57.6586186Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2120}, [%r1079]; 2026-02-21T09:20:57.6586247Z // end inline asm 2026-02-21T09:20:57.6586352Z bar.sync 0; 2026-02-21T09:20:57.6586629Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1652, %r1788}; 2026-02-21T09:20:57.6586694Z bar.sync 0; 2026-02-21T09:20:57.6586754Z // begin inline asm 2026-02-21T09:20:57.6586887Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2122}, [%r1079]; 2026-02-21T09:20:57.6586951Z // end inline asm 2026-02-21T09:20:57.6587006Z bar.sync 0; 2026-02-21T09:20:57.6587151Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1651, %r1787}; 2026-02-21T09:20:57.6587208Z bar.sync 0; 2026-02-21T09:20:57.6587273Z // begin inline asm 2026-02-21T09:20:57.6587400Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2121}, [%r1079]; 2026-02-21T09:20:57.6587458Z // end inline asm 2026-02-21T09:20:57.6587519Z bar.sync 0; 2026-02-21T09:20:57.6587742Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1653, %r1789}; 2026-02-21T09:20:57.6587803Z bar.sync 0; 2026-02-21T09:20:57.6587863Z // begin inline asm 2026-02-21T09:20:57.6588000Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2123}, [%r1079]; 2026-02-21T09:20:57.6588057Z // end inline asm 2026-02-21T09:20:57.6588112Z bar.sync 0; 2026-02-21T09:20:57.6588263Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1654, %r1790}; 2026-02-21T09:20:57.6588411Z bar.sync 0; 2026-02-21T09:20:57.6588475Z // begin inline asm 2026-02-21T09:20:57.6588605Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2124}, [%r1079]; 2026-02-21T09:20:57.6588667Z // end inline asm 2026-02-21T09:20:57.6588722Z bar.sync 0; 2026-02-21T09:20:57.6588868Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1656, %r1792}; 2026-02-21T09:20:57.6588928Z bar.sync 0; 2026-02-21T09:20:57.6588990Z // begin inline asm 2026-02-21T09:20:57.6589120Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2126}, [%r1079]; 2026-02-21T09:20:57.6589183Z // end 
inline asm 2026-02-21T09:20:57.6589238Z bar.sync 0; 2026-02-21T09:20:57.6589381Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1655, %r1791}; 2026-02-21T09:20:57.6589436Z bar.sync 0; 2026-02-21T09:20:57.6589500Z // begin inline asm 2026-02-21T09:20:57.6589629Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2125}, [%r1079]; 2026-02-21T09:20:57.6589686Z // end inline asm 2026-02-21T09:20:57.6589746Z bar.sync 0; 2026-02-21T09:20:57.6589893Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1657, %r1793}; 2026-02-21T09:20:57.6589949Z bar.sync 0; 2026-02-21T09:20:57.6590007Z // begin inline asm 2026-02-21T09:20:57.6590139Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2127}, [%r1079]; 2026-02-21T09:20:57.6590202Z // end inline asm 2026-02-21T09:20:57.6590257Z bar.sync 0; 2026-02-21T09:20:57.6590404Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1658, %r1794}; 2026-02-21T09:20:57.6590460Z bar.sync 0; 2026-02-21T09:20:57.6590519Z // begin inline asm 2026-02-21T09:20:57.6590648Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2128}, [%r1079]; 2026-02-21T09:20:57.6590708Z // end inline asm 2026-02-21T09:20:57.6590763Z bar.sync 0; 2026-02-21T09:20:57.6590908Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1660, %r1796}; 2026-02-21T09:20:57.6590980Z bar.sync 0; 2026-02-21T09:20:57.6591041Z // begin inline asm 2026-02-21T09:20:57.6591171Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2130}, [%r1079]; 2026-02-21T09:20:57.6591353Z // end inline asm 2026-02-21T09:20:57.6591408Z bar.sync 0; 2026-02-21T09:20:57.6591553Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1659, %r1795}; 2026-02-21T09:20:57.6591608Z bar.sync 0; 2026-02-21T09:20:57.6591672Z // begin inline asm 2026-02-21T09:20:57.6591865Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2129}, [%r1079]; 2026-02-21T09:20:57.6591923Z // end inline asm 2026-02-21T09:20:57.6591981Z bar.sync 0; 2026-02-21T09:20:57.6592124Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1661, %r1797}; 
2026-02-21T09:20:57.6592182Z bar.sync 0; 2026-02-21T09:20:57.6592244Z // begin inline asm 2026-02-21T09:20:57.6592377Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2131}, [%r1079]; 2026-02-21T09:20:57.6592434Z // end inline asm 2026-02-21T09:20:57.6592553Z bar.sync 0; 2026-02-21T09:20:57.6592707Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1662, %r1798}; 2026-02-21T09:20:57.6592763Z bar.sync 0; 2026-02-21T09:20:57.6592821Z // begin inline asm 2026-02-21T09:20:57.6592954Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2132}, [%r1079]; 2026-02-21T09:20:57.6593028Z // end inline asm 2026-02-21T09:20:57.6593086Z bar.sync 0; 2026-02-21T09:20:57.6593232Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1664, %r1800}; 2026-02-21T09:20:57.6593293Z bar.sync 0; 2026-02-21T09:20:57.6593353Z // begin inline asm 2026-02-21T09:20:57.6593481Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2134}, [%r1079]; 2026-02-21T09:20:57.6593541Z // end inline asm 2026-02-21T09:20:57.6593598Z bar.sync 0; 2026-02-21T09:20:57.6593801Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1663, %r1799}; 2026-02-21T09:20:57.6593859Z bar.sync 0; 2026-02-21T09:20:57.6593923Z // begin inline asm 2026-02-21T09:20:57.6594053Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2133}, [%r1079]; 2026-02-21T09:20:57.6594111Z // end inline asm 2026-02-21T09:20:57.6594170Z bar.sync 0; 2026-02-21T09:20:57.6594315Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1665, %r1801}; 2026-02-21T09:20:57.6594373Z bar.sync 0; 2026-02-21T09:20:57.6594433Z // begin inline asm 2026-02-21T09:20:57.6594565Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2135}, [%r1079]; 2026-02-21T09:20:57.6594623Z // end inline asm 2026-02-21T09:20:57.6594680Z bar.sync 0; 2026-02-21T09:20:57.6594830Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1666, %r1802}; 2026-02-21T09:20:57.6594884Z bar.sync 0; 2026-02-21T09:20:57.6594946Z // begin inline asm 2026-02-21T09:20:57.6595076Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2136}, 
[%r1079]; 2026-02-21T09:20:57.6595140Z // end inline asm 2026-02-21T09:20:57.6595197Z bar.sync 0; 2026-02-21T09:20:57.6595339Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1668, %r1804}; 2026-02-21T09:20:57.6595399Z bar.sync 0; 2026-02-21T09:20:57.6595459Z // begin inline asm 2026-02-21T09:20:57.6595588Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2138}, [%r1079]; 2026-02-21T09:20:57.6595648Z // end inline asm 2026-02-21T09:20:57.6595706Z bar.sync 0; 2026-02-21T09:20:57.6595851Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1667, %r1803}; 2026-02-21T09:20:57.6595907Z bar.sync 0; 2026-02-21T09:20:57.6595970Z // begin inline asm 2026-02-21T09:20:57.6596099Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2137}, [%r1079]; 2026-02-21T09:20:57.6596157Z // end inline asm 2026-02-21T09:20:57.6596218Z bar.sync 0; 2026-02-21T09:20:57.6596363Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1669, %r1805}; 2026-02-21T09:20:57.6596418Z bar.sync 0; 2026-02-21T09:20:57.6596606Z // begin inline asm 2026-02-21T09:20:57.6596749Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2139}, [%r1079]; 2026-02-21T09:20:57.6596808Z // end inline asm 2026-02-21T09:20:57.6596863Z bar.sync 0; 2026-02-21T09:20:57.6597012Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1670, %r1806}; 2026-02-21T09:20:57.6597068Z bar.sync 0; 2026-02-21T09:20:57.6597129Z // begin inline asm 2026-02-21T09:20:57.6597260Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2140}, [%r1079]; 2026-02-21T09:20:57.6597410Z // end inline asm 2026-02-21T09:20:57.6597466Z bar.sync 0; 2026-02-21T09:20:57.6597610Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1672, %r1808}; 2026-02-21T09:20:57.6597668Z bar.sync 0; 2026-02-21T09:20:57.6597794Z // begin inline asm 2026-02-21T09:20:57.6597921Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2142}, [%r1079]; 2026-02-21T09:20:57.6597982Z // end inline asm 2026-02-21T09:20:57.6598037Z bar.sync 0; 2026-02-21T09:20:57.6598183Z stmatrix.sync.aligned.m8n8.x2.shared.b16 
[%r578], {%r1671, %r1807}; 2026-02-21T09:20:57.6598240Z bar.sync 0; 2026-02-21T09:20:57.6598301Z // begin inline asm 2026-02-21T09:20:57.6598445Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2141}, [%r1079]; 2026-02-21T09:20:57.6598503Z // end inline asm 2026-02-21T09:20:57.6598627Z bar.sync 0; 2026-02-21T09:20:57.6598774Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1673, %r1809}; 2026-02-21T09:20:57.6598828Z bar.sync 0; 2026-02-21T09:20:57.6598889Z // begin inline asm 2026-02-21T09:20:57.6599022Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2143}, [%r1079]; 2026-02-21T09:20:57.6599078Z // end inline asm 2026-02-21T09:20:57.6599133Z bar.sync 0; 2026-02-21T09:20:57.6599281Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1674, %r1810}; 2026-02-21T09:20:57.6599337Z bar.sync 0; 2026-02-21T09:20:57.6599396Z // begin inline asm 2026-02-21T09:20:57.6599522Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2144}, [%r1079]; 2026-02-21T09:20:57.6599582Z // end inline asm 2026-02-21T09:20:57.6599640Z bar.sync 0; 2026-02-21T09:20:57.6600009Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1676, %r1812}; 2026-02-21T09:20:57.6600076Z bar.sync 0; 2026-02-21T09:20:57.6600136Z // begin inline asm 2026-02-21T09:20:57.6600269Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2146}, [%r1079]; 2026-02-21T09:20:57.6600331Z // end inline asm 2026-02-21T09:20:57.6600386Z bar.sync 0; 2026-02-21T09:20:57.6600530Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1675, %r1811}; 2026-02-21T09:20:57.6600588Z bar.sync 0; 2026-02-21T09:20:57.6600650Z // begin inline asm 2026-02-21T09:20:57.6600777Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2145}, [%r1079]; 2026-02-21T09:20:57.6600833Z // end inline asm 2026-02-21T09:20:57.6600893Z bar.sync 0; 2026-02-21T09:20:57.6601040Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r1677, %r1813}; 2026-02-21T09:20:57.6601097Z bar.sync 0; 2026-02-21T09:20:57.6601155Z // begin inline asm 2026-02-21T09:20:57.6601289Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2147}, [%r1079]; 2026-02-21T09:20:57.6601347Z // end inline asm 2026-02-21T09:20:57.6601406Z // begin inline asm 2026-02-21T09:20:57.6601490Z fence.proxy.async.shared::cta; 2026-02-21T09:20:57.6601547Z // end inline asm 2026-02-21T09:20:57.6601622Z wgmma.fence.sync.aligned; 2026-02-21T09:20:57.6601686Z shl.b32 %r2273, %r2266, 8; 2026-02-21T09:20:57.6601759Z and.b32 %r2274, %r2273, 4096; 2026-02-21T09:20:57.6601822Z add.s32 %r2275, %r2274, %r481; 2026-02-21T09:20:57.6601886Z bfe.u32 %r2276, %r2275, 4, 14; 2026-02-21T09:20:57.6601956Z cvt.u64.u32 %rd102, %r2276; 2026-02-21T09:20:57.6602038Z or.b64 %rd96, %rd102, -9223371899382267904; 2026-02-21T09:20:57.6602098Z // begin inline asm 2026-02-21T09:20:57.6602861Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2116,%r2117,%r2118,%r2119,%r2120,%r2121,%r2122,%r2123,%r2124,%r2125,%r2126,%r2127,%r2128,%r2129,%r2130,%r2131,%r2132,%r2133,%r2134,%r2135,%r2136,%r2137,%r2138,%r2139,%r2140,%r2141,%r2142,%r2143,%r2144,%r2145,%r2146,%r2147}, {%r2112,%r2113,%r2114,%r2115}, %rd96, %p14, 1, 1; 2026-02-21T09:20:57.6602923Z // end inline asm 2026-02-21T09:20:57.6602986Z add.s32 %r2277, %r2275, 32; 2026-02-21T09:20:57.6603054Z bfe.u32 %r2278, %r2277, 4, 14; 2026-02-21T09:20:57.6603118Z cvt.u64.u32 %rd103, %r2278; 2026-02-21T09:20:57.6603195Z or.b64 %rd97, %rd103, -9223371899382267904; 2026-02-21T09:20:57.6603257Z // begin inline asm 2026-02-21T09:20:57.6604012Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2116,%r2117,%r2118,%r2119,%r2120,%r2121,%r2122,%r2123,%r2124,%r2125,%r2126,%r2127,%r2128,%r2129,%r2130,%r2131,%r2132,%r2133,%r2134,%r2135,%r2136,%r2137,%r2138,%r2139,%r2140,%r2141,%r2142,%r2143,%r2144,%r2145,%r2146,%r2147}, {%r2180,%r2181,%r2182,%r2183}, %rd97, %p14, 1, 1; 2026-02-21T09:20:57.6604172Z // end inline asm 2026-02-21T09:20:57.6604249Z wgmma.commit_group.sync.aligned; 2026-02-21T09:20:57.6604316Z mov.b32 %r2217, %r2218; 2026-02-21T09:20:57.6604377Z mov.b32 %r2216, 
%r481; 2026-02-21T09:20:57.6604437Z // begin inline asm 2026-02-21T09:20:57.6605042Z // wait for regs: %r2116,%r2117,%r2118,%r2119,%r2120,%r2121,%r2122,%r2123,%r2124,%r2125,%r2126,%r2127,%r2128,%r2129,%r2130,%r2131,%r2132,%r2133,%r2134,%r2135,%r2136,%r2137,%r2138,%r2139,%r2140,%r2141,%r2142,%r2143,%r2144,%r2145,%r2146,%r2147,%r2216,%r2217,%r2218 2026-02-21T09:20:57.6605120Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:20:57.6605177Z // end inline asm 2026-02-21T09:20:57.6605236Z $L__tmp8: 2026-02-21T09:20:57.6605448Z .loc 1 43 78 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:43:78 2026-02-21T09:20:57.6605512Z add.s32 %r2279, %r5199, 1; 2026-02-21T09:20:57.6605579Z setp.gt.s32 %p23, %r2279, 4; 2026-02-21T09:20:57.6605659Z selp.b32 %r5199, 0, %r2279, %p23; 2026-02-21T09:20:57.6605867Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6605931Z add.s32 %r2280, %r5197, -16; 2026-02-21T09:20:57.6606008Z mad.wide.s32 %rd98, %r2280, 2, %rd27; 2026-02-21T09:20:57.6606258Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6606322Z shl.b32 %r2281, %r5199, 13; 2026-02-21T09:20:57.6606390Z add.s32 %r2254, %r21, %r2281; 2026-02-21T09:20:57.6606569Z selp.b32 %r2255, 8, 0, %p21; 2026-02-21T09:20:57.6606633Z // begin inline asm 2026-02-21T09:20:57.6606779Z cp.async.ca.shared.global [ %r2254 + 0 ], [ %rd98 + 0 ], 0x8, %r2255; 2026-02-21T09:20:57.6606843Z // end inline asm 2026-02-21T09:20:57.6606912Z cp.async.commit_group; 2026-02-21T09:20:57.6607110Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6607188Z mad.wide.s32 %rd99, %r5197, 2, %rd27; 2026-02-21T09:20:57.6607385Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6607448Z add.s32 %r2256, %r22, %r2281; 2026-02-21T09:20:57.6607511Z // begin inline asm 2026-02-21T09:20:57.6607651Z cp.async.ca.shared.global [ %r2256 + 0 ], 
[ %rd99 + 0 ], 0x8, %r2255; 2026-02-21T09:20:57.6607722Z // end inline asm 2026-02-21T09:20:57.6607790Z cp.async.commit_group; 2026-02-21T09:20:57.6607997Z .loc 1 43 78 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:43:78 2026-02-21T09:20:57.6608059Z add.s32 %r5197, %r5197, 32; 2026-02-21T09:20:57.6608123Z add.s32 %r5196, %r5196, 131072; 2026-02-21T09:20:57.6608194Z setp.lt.u64 %p24, %rd220, 496; 2026-02-21T09:20:57.6608256Z @%p24 bra $L__BB0_5; 2026-02-21T09:20:57.6608368Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:20:57.6608573Z .loc 1 34 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:34:32 2026-02-21T09:20:57.6608643Z or.b32 %r2341, %r153, %r8; 2026-02-21T09:20:57.6608839Z .loc 1 36 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:36:32 2026-02-21T09:20:57.6608900Z or.b32 %r2342, %r154, %r12; 2026-02-21T09:20:57.6608966Z or.b32 %r2343, %r154, %r13; 2026-02-21T09:20:57.6609026Z or.b32 %r2344, %r154, %r14; 2026-02-21T09:20:57.6609086Z or.b32 %r2345, %r154, %r15; 2026-02-21T09:20:57.6609288Z .loc 1 43 78 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:43:78 2026-02-21T09:20:57.6609356Z cp.async.wait_group 0; 2026-02-21T09:20:57.6609410Z bar.sync 0; 2026-02-21T09:20:57.6609608Z .loc 1 90 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:90:28 2026-02-21T09:20:57.6609772Z cvt.rn.bf16x2.f32 %r2346, %r2117, %r2116; 2026-02-21T09:20:57.6609847Z cvt.rn.bf16x2.f32 %r2347, %r2119, %r2118; 2026-02-21T09:20:57.6609919Z cvt.rn.bf16x2.f32 %r2348, %r2121, %r2120; 2026-02-21T09:20:57.6610059Z cvt.rn.bf16x2.f32 %r2349, %r2123, %r2122; 2026-02-21T09:20:57.6610131Z cvt.rn.bf16x2.f32 %r2350, %r2125, %r2124; 2026-02-21T09:20:57.6610201Z cvt.rn.bf16x2.f32 %r2351, %r2127, %r2126; 2026-02-21T09:20:57.6610275Z cvt.rn.bf16x2.f32 %r2352, %r2129, %r2128; 2026-02-21T09:20:57.6610347Z cvt.rn.bf16x2.f32 %r2353, %r2131, %r2130; 2026-02-21T09:20:57.6610418Z cvt.rn.bf16x2.f32 %r2354, %r2133, %r2132; 
2026-02-21T09:20:57.6610490Z cvt.rn.bf16x2.f32 %r2355, %r2135, %r2134; 2026-02-21T09:20:57.6610631Z cvt.rn.bf16x2.f32 %r2356, %r2137, %r2136; 2026-02-21T09:20:57.6610706Z cvt.rn.bf16x2.f32 %r2357, %r2139, %r2138; 2026-02-21T09:20:57.6610777Z cvt.rn.bf16x2.f32 %r2358, %r2141, %r2140; 2026-02-21T09:20:57.6610855Z cvt.rn.bf16x2.f32 %r2359, %r2143, %r2142; 2026-02-21T09:20:57.6610928Z cvt.rn.bf16x2.f32 %r2360, %r2145, %r2144; 2026-02-21T09:20:57.6610999Z cvt.rn.bf16x2.f32 %r2361, %r2147, %r2146; 2026-02-21T09:20:57.6611207Z .loc 1 91 43 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:91:43 2026-02-21T09:20:57.6611270Z shl.b32 %r2362, %r2342, 13; 2026-02-21T09:20:57.6611338Z shl.b32 %r2363, %r2343, 13; 2026-02-21T09:20:57.6611399Z shl.b32 %r2364, %r2344, 13; 2026-02-21T09:20:57.6611461Z shl.b32 %r2365, %r2345, 13; 2026-02-21T09:20:57.6611741Z .loc 1 91 50 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:91:50 2026-02-21T09:20:57.6611808Z add.s32 %r2366, %r2362, %r2341; 2026-02-21T09:20:57.6611876Z add.s32 %r2367, %r2363, %r2341; 2026-02-21T09:20:57.6611938Z add.s32 %r2368, %r2364, %r2341; 2026-02-21T09:20:57.6611998Z add.s32 %r2369, %r2365, %r2341; 2026-02-21T09:20:57.6612217Z .loc 1 91 22 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:91:22 2026-02-21T09:20:57.6612294Z mad.wide.s32 %rd104, %r2366, 2, %rd29; 2026-02-21T09:20:57.6612364Z mad.wide.s32 %rd105, %r2367, 2, %rd29; 2026-02-21T09:20:57.6612431Z mad.wide.s32 %rd106, %r2368, 2, %rd29; 2026-02-21T09:20:57.6612503Z mad.wide.s32 %rd107, %r2369, 2, %rd29; 2026-02-21T09:20:57.6612701Z .loc 1 91 81 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:91:81 2026-02-21T09:20:57.6612814Z st.shared.v4.b32 [%r39], {%r2346, %r2348, %r2350, %r2352}; 2026-02-21T09:20:57.6612937Z st.shared.v4.b32 [%r39+512], {%r2347, %r2349, %r2351, %r2353}; 2026-02-21T09:20:57.6613043Z st.shared.v4.b32 [%r40], {%r2354, %r2356, %r2358, %r2360}; 2026-02-21T09:20:57.6613156Z st.shared.v4.b32 [%r40+512], 
{%r2355, %r2357, %r2359, %r2361}; 2026-02-21T09:20:57.6613216Z bar.sync 0; 2026-02-21T09:20:57.6613275Z // begin inline asm 2026-02-21T09:20:57.6613469Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2302, %r2303, %r2304, %r2305}, [%r1380]; 2026-02-21T09:20:57.6613529Z // end inline asm 2026-02-21T09:20:57.6613593Z // begin inline asm 2026-02-21T09:20:57.6613777Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2306, %r2307, %r2308, %r2309}, [%r1385]; 2026-02-21T09:20:57.6613836Z // end inline asm 2026-02-21T09:20:57.6613900Z // begin inline asm 2026-02-21T09:20:57.6614078Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2310, %r2311, %r2312, %r2313}, [%r1390]; 2026-02-21T09:20:57.6614136Z // end inline asm 2026-02-21T09:20:57.6614199Z // begin inline asm 2026-02-21T09:20:57.6614379Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2314, %r2315, %r2316, %r2317}, [%r1395]; 2026-02-21T09:20:57.6614437Z // end inline asm 2026-02-21T09:20:57.6614495Z // begin inline asm 2026-02-21T09:20:57.6614624Z st.global.v4.b32 [ %rd104 + 0 ], { %r2302, %r2303, %r2304, %r2305 }; 2026-02-21T09:20:57.6614681Z // end inline asm 2026-02-21T09:20:57.6614740Z // begin inline asm 2026-02-21T09:20:57.6614865Z st.global.v4.b32 [ %rd105 + 0 ], { %r2306, %r2307, %r2308, %r2309 }; 2026-02-21T09:20:57.6614986Z // end inline asm 2026-02-21T09:20:57.6615048Z // begin inline asm 2026-02-21T09:20:57.6615165Z st.global.v4.b32 [ %rd106 + 0 ], { %r2310, %r2311, %r2312, %r2313 }; 2026-02-21T09:20:57.6615276Z // end inline asm 2026-02-21T09:20:57.6615334Z // begin inline asm 2026-02-21T09:20:57.6615462Z st.global.v4.b32 [ %rd107 + 0 ], { %r2314, %r2315, %r2316, %r2317 }; 2026-02-21T09:20:57.6615526Z // end inline asm 2026-02-21T09:20:57.6615746Z .loc 1 22 121 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:22:121 2026-02-21T09:20:57.6615810Z add.s32 %r2370, %r5159, 2; 2026-02-21T09:20:57.6616017Z .loc 1 28 35 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:28:35 2026-02-21T09:20:57.6616128Z shr.s32 
%r2371, %r2370, 31; 2026-02-21T09:20:57.6616192Z shr.u32 %r2372, %r2371, 22; 2026-02-21T09:20:57.6616257Z add.s32 %r2373, %r2370, %r2372; 2026-02-21T09:20:57.6616326Z shr.s32 %r2374, %r2373, 10; 2026-02-21T09:20:57.6616662Z .loc 1 29 33 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:29:33 2026-02-21T09:20:57.6616739Z shl.b32 %r2375, %r2374, 4; 2026-02-21T09:20:57.6616944Z .loc 1 30 39 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:30:39 2026-02-21T09:20:57.6617008Z sub.s32 %r2376, 64, %r2375; 2026-02-21T09:20:57.6617204Z .loc 1 30 52 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:30:52 2026-02-21T09:20:57.6617348Z min.s32 %r2377, %r2376, 16; 2026-02-21T09:20:57.6617546Z .loc 1 31 45 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:31:45 2026-02-21T09:20:57.6617613Z and.b32 %r2378, %r2373, -1024; 2026-02-21T09:20:57.6617684Z sub.s32 %r2379, %r2370, %r2378; 2026-02-21T09:20:57.6617880Z .loc 1 32 51 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:32:51 2026-02-21T09:20:57.6617946Z div.s32 %r2380, %r2379, %r2377; 2026-02-21T09:20:57.6618139Z .loc 1 31 64 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:31:64 2026-02-21T09:20:57.6618211Z mul.lo.s32 %r2381, %r2380, %r2377; 2026-02-21T09:20:57.6618275Z sub.s32 %r2382, %r2379, %r2381; 2026-02-21T09:20:57.6618475Z .loc 1 31 30 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:31:30 2026-02-21T09:20:57.6618545Z add.s32 %r2383, %r2382, %r2375; 2026-02-21T09:20:57.6618744Z .loc 1 33 27 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:33:27 2026-02-21T09:20:57.6618807Z shl.b32 %r229, %r2383, 7; 2026-02-21T09:20:57.6619012Z .loc 1 35 27 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:35:27 2026-02-21T09:20:57.6619078Z shl.b32 %r230, %r2380, 8; 2026-02-21T09:20:57.6619273Z .loc 1 36 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:36:32 2026-02-21T09:20:57.6619341Z or.b32 %r2384, %r230, %r11; 2026-02-21T09:20:57.6619536Z 
.loc 1 51 53 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:53 2026-02-21T09:20:57.6619596Z shl.b32 %r2385, %r2384, 10; 2026-02-21T09:20:57.6619792Z .loc 1 51 60 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:60 2026-02-21T09:20:57.6619861Z or.b32 %r2386, %r2385, %r19; 2026-02-21T09:20:57.6620057Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6620129Z mad.wide.s32 %rd108, %r2386, 2, %rd27; 2026-02-21T09:20:57.6620330Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6620387Z bar.sync 0; 2026-02-21T09:20:57.6620445Z mov.b32 %r2319, 8; 2026-02-21T09:20:57.6620511Z // begin inline asm 2026-02-21T09:20:57.6620648Z cp.async.ca.shared.global [ %r21 + 0 ], [ %rd108 + 0 ], 0x8, %r2319; 2026-02-21T09:20:57.6620785Z // end inline asm 2026-02-21T09:20:57.6620853Z cp.async.commit_group; 2026-02-21T09:20:57.6621053Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6621118Z cvt.s64.s32 %rd119, %r2385; 2026-02-21T09:20:57.6621243Z or.b64 %rd120, %rd119, %rd10; 2026-02-21T09:20:57.6621318Z shl.b64 %rd121, %rd120, 1; 2026-02-21T09:20:57.6621390Z add.s64 %rd122, %rd27, %rd121; 2026-02-21T09:20:57.6621452Z add.s64 %rd109, %rd122, 32; 2026-02-21T09:20:57.6621665Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6621725Z // begin inline asm 2026-02-21T09:20:57.6621857Z cp.async.ca.shared.global [ %r22 + 0 ], [ %rd109 + 0 ], 0x8, %r2319; 2026-02-21T09:20:57.6621976Z // end inline asm 2026-02-21T09:20:57.6622050Z cp.async.commit_group; 2026-02-21T09:20:57.6622249Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6622315Z add.s64 %rd110, %rd122, 64; 2026-02-21T09:20:57.6622518Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6622575Z bar.sync 0; 
2026-02-21T09:20:57.6622635Z // begin inline asm 2026-02-21T09:20:57.6622773Z cp.async.ca.shared.global [ %r23 + 0 ], [ %rd110 + 0 ], 0x8, %r2319; 2026-02-21T09:20:57.6622831Z // end inline asm 2026-02-21T09:20:57.6622897Z cp.async.commit_group; 2026-02-21T09:20:57.6623145Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6623227Z add.s64 %rd111, %rd122, 96; 2026-02-21T09:20:57.6623425Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6623488Z // begin inline asm 2026-02-21T09:20:57.6623624Z cp.async.ca.shared.global [ %r24 + 0 ], [ %rd111 + 0 ], 0x8, %r2319; 2026-02-21T09:20:57.6623680Z // end inline asm 2026-02-21T09:20:57.6623747Z cp.async.commit_group; 2026-02-21T09:20:57.6623950Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6624016Z add.s64 %rd112, %rd122, 128; 2026-02-21T09:20:57.6624212Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6624272Z bar.sync 0; 2026-02-21T09:20:57.6624341Z // begin inline asm 2026-02-21T09:20:57.6624471Z cp.async.ca.shared.global [ %r25 + 0 ], [ %rd112 + 0 ], 0x8, %r2319; 2026-02-21T09:20:57.6624529Z // end inline asm 2026-02-21T09:20:57.6624601Z cp.async.commit_group; 2026-02-21T09:20:57.6624799Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6624864Z add.s64 %rd113, %rd122, 160; 2026-02-21T09:20:57.6625068Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6625130Z // begin inline asm 2026-02-21T09:20:57.6625261Z cp.async.ca.shared.global [ %r26 + 0 ], [ %rd113 + 0 ], 0x8, %r2319; 2026-02-21T09:20:57.6625317Z // end inline asm 2026-02-21T09:20:57.6625388Z cp.async.commit_group; 2026-02-21T09:20:57.6625583Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6625648Z add.s64 %rd114, %rd122, 
192; 2026-02-21T09:20:57.6625849Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6625905Z bar.sync 0; 2026-02-21T09:20:57.6625967Z // begin inline asm 2026-02-21T09:20:57.6626108Z cp.async.ca.shared.global [ %r27 + 0 ], [ %rd114 + 0 ], 0x8, %r2319; 2026-02-21T09:20:57.6626176Z // end inline asm 2026-02-21T09:20:57.6626248Z cp.async.commit_group; 2026-02-21T09:20:57.6626571Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6626652Z add.s64 %rd115, %rd122, 224; 2026-02-21T09:20:57.6626931Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6626992Z // begin inline asm 2026-02-21T09:20:57.6627126Z cp.async.ca.shared.global [ %r28 + 0 ], [ %rd115 + 0 ], 0x8, %r2319; 2026-02-21T09:20:57.6627244Z // end inline asm 2026-02-21T09:20:57.6627309Z cp.async.commit_group; 2026-02-21T09:20:57.6627510Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6627577Z add.s64 %rd116, %rd122, 256; 2026-02-21T09:20:57.6627775Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6627831Z bar.sync 0; 2026-02-21T09:20:57.6627895Z // begin inline asm 2026-02-21T09:20:57.6628086Z cp.async.ca.shared.global [ %r29 + 0 ], [ %rd116 + 0 ], 0x8, %r2319; 2026-02-21T09:20:57.6628147Z // end inline asm 2026-02-21T09:20:57.6628218Z cp.async.commit_group; 2026-02-21T09:20:57.6628483Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6628550Z add.s64 %rd117, %rd122, 288; 2026-02-21T09:20:57.6628750Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6628812Z // begin inline asm 2026-02-21T09:20:57.6628943Z cp.async.ca.shared.global [ %r30 + 0 ], [ %rd117 + 0 ], 0x8, %r2319; 2026-02-21T09:20:57.6629002Z // end inline asm 2026-02-21T09:20:57.6629073Z cp.async.commit_group; 
2026-02-21T09:20:57.6629338Z .loc 1 43 78 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:43:78 2026-02-21T09:20:57.6629404Z shl.b32 %r2387, %r2380, 18; 2026-02-21T09:20:57.6629470Z or.b32 %r5233, %r45, %r2387; 2026-02-21T09:20:57.6629534Z add.s32 %r5232, %r46, %r229; 2026-02-21T09:20:57.6629598Z mov.b32 %r3022, 0f00000000; 2026-02-21T09:20:57.6629655Z mov.b32 %r5235, 4; 2026-02-21T09:20:57.6629720Z mov.b32 %r5234, -1; 2026-02-21T09:20:57.6629783Z mov.b64 %rd221, -16; 2026-02-21T09:20:57.6629844Z mov.b32 %r3023, %r3022; 2026-02-21T09:20:57.6629908Z mov.b32 %r3024, %r3022; 2026-02-21T09:20:57.6629968Z mov.b32 %r3025, %r3022; 2026-02-21T09:20:57.6630026Z mov.b32 %r3026, %r3022; 2026-02-21T09:20:57.6630090Z mov.b32 %r3027, %r3022; 2026-02-21T09:20:57.6630149Z mov.b32 %r3028, %r3022; 2026-02-21T09:20:57.6630207Z mov.b32 %r3029, %r3022; 2026-02-21T09:20:57.6630265Z mov.b32 %r3030, %r3022; 2026-02-21T09:20:57.6630329Z mov.b32 %r3031, %r3022; 2026-02-21T09:20:57.6630387Z mov.b32 %r3032, %r3022; 2026-02-21T09:20:57.6630445Z mov.b32 %r3033, %r3022; 2026-02-21T09:20:57.6630511Z mov.b32 %r3034, %r3022; 2026-02-21T09:20:57.6630570Z mov.b32 %r3035, %r3022; 2026-02-21T09:20:57.6630629Z mov.b32 %r3036, %r3022; 2026-02-21T09:20:57.6630692Z mov.b32 %r3037, %r3022; 2026-02-21T09:20:57.6630756Z mov.b32 %r3038, %r3022; 2026-02-21T09:20:57.6630814Z mov.b32 %r3039, %r3022; 2026-02-21T09:20:57.6630871Z mov.b32 %r3040, %r3022; 2026-02-21T09:20:57.6630935Z mov.b32 %r3041, %r3022; 2026-02-21T09:20:57.6630994Z mov.b32 %r3042, %r3022; 2026-02-21T09:20:57.6631053Z mov.b32 %r3043, %r3022; 2026-02-21T09:20:57.6631126Z mov.b32 %r3044, %r3022; 2026-02-21T09:20:57.6631191Z mov.b32 %r3045, %r3022; 2026-02-21T09:20:57.6631253Z mov.b32 %r3046, %r3022; 2026-02-21T09:20:57.6631311Z mov.b32 %r3047, %r3022; 2026-02-21T09:20:57.6631383Z mov.b32 %r3048, %r3022; 2026-02-21T09:20:57.6631443Z mov.b32 %r3049, %r3022; 2026-02-21T09:20:57.6631500Z mov.b32 %r3050, %r3022; 2026-02-21T09:20:57.6631559Z 
mov.b32 %r3051, %r3022; 2026-02-21T09:20:57.6631623Z mov.b32 %r3052, %r3022; 2026-02-21T09:20:57.6631683Z mov.b32 %r3053, %r3022; 2026-02-21T09:20:57.6631795Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:20:57.6631910Z // => This Inner Loop Header: Depth=2 2026-02-21T09:20:57.6631974Z add.s64 %rd221, %rd221, 16; 2026-02-21T09:20:57.6632043Z setp.lt.u64 %p32, %rd221, 432; 2026-02-21T09:20:57.6632171Z add.s32 %r3164, %r5234, 1; 2026-02-21T09:20:57.6632237Z setp.gt.s32 %p33, %r3164, 4; 2026-02-21T09:20:57.6632306Z selp.b32 %r5234, 0, %r3164, %p33; 2026-02-21T09:20:57.6632506Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6632623Z cp.async.wait_group 8; 2026-02-21T09:20:57.6632680Z bar.sync 0; 2026-02-21T09:20:57.6632741Z shl.b32 %r3165, %r5234, 13; 2026-02-21T09:20:57.6632810Z add.s32 %r3167, %r5136, %r3165; 2026-02-21T09:20:57.6633010Z .loc 1 55 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:55:32 2026-02-21T09:20:57.6633074Z add.s32 %r3168, %r3167, %r31; 2026-02-21T09:20:57.6633142Z ld.shared.b16 %rs79, [%r3168]; 2026-02-21T09:20:57.6633283Z ld.shared.b16 %rs80, [%r3168+256]; 2026-02-21T09:20:57.6633352Z ld.shared.b16 %rs81, [%r3168+16]; 2026-02-21T09:20:57.6633416Z ld.shared.b16 %rs82, [%r3168+272]; 2026-02-21T09:20:57.6633484Z add.s32 %r3169, %r3167, %r32; 2026-02-21T09:20:57.6633548Z ld.shared.b16 %rs83, [%r3169]; 2026-02-21T09:20:57.6633611Z ld.shared.b16 %rs84, [%r3169+256]; 2026-02-21T09:20:57.6633680Z ld.shared.b16 %rs85, [%r3169+16]; 2026-02-21T09:20:57.6633745Z ld.shared.b16 %rs86, [%r3169+272]; 2026-02-21T09:20:57.6633812Z cvt.f32.bf16 %r2684, %rs79; 2026-02-21T09:20:57.6633872Z cvt.f32.bf16 %r2685, %rs80; 2026-02-21T09:20:57.6633937Z cvt.f32.bf16 %r2686, %rs83; 2026-02-21T09:20:57.6634010Z cvt.f32.bf16 %r2687, %rs84; 2026-02-21T09:20:57.6634075Z cvt.f32.bf16 %r2752, %rs81; 2026-02-21T09:20:57.6634193Z cvt.f32.bf16 %r2753, %rs82; 2026-02-21T09:20:57.6634263Z cvt.f32.bf16 %r2754, 
%rs85; 2026-02-21T09:20:57.6634323Z cvt.f32.bf16 %r2755, %rs86; 2026-02-21T09:20:57.6634526Z .loc 1 57 34 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:57:34 2026-02-21T09:20:57.6634594Z cvt.s64.s32 %rd137, %r5232; 2026-02-21T09:20:57.6634659Z add.s64 %rd124, %rd28, %rd137; 2026-02-21T09:20:57.6634855Z .loc 1 57 87 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:57:87 2026-02-21T09:20:57.6634921Z // begin inline asm 2026-02-21T09:20:57.6634982Z mov.u64 %rd123, 0x0; 2026-02-21T09:20:57.6635114Z createpolicy.fractional.L2::evict_first.b64 %rd123, 1.0; 2026-02-21T09:20:57.6635175Z // end inline asm 2026-02-21T09:20:57.6635237Z // begin inline asm 2026-02-21T09:20:57.6635298Z mov.u16 %rs77, 0x0; 2026-02-21T09:20:57.6635461Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs77 }, [ %rd124 + 0 ], %rd123; 2026-02-21T09:20:57.6635525Z // end inline asm 2026-02-21T09:20:57.6635725Z .loc 1 65 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:65:28 2026-02-21T09:20:57.6635789Z st.shared.b8 [%r33], %rs77; 2026-02-21T09:20:57.6635855Z bar.sync 0; 2026-02-21T09:20:57.6635932Z ld.shared.v2.b8 {%rs87, %rs88}, [%r34]; 2026-02-21T09:20:57.6636131Z .loc 1 60 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:60:28 2026-02-21T09:20:57.6636203Z shl.b16 %rs89, %rs87, 4; 2026-02-21T09:20:57.6636270Z shl.b16 %rs90, %rs88, 4; 2026-02-21T09:20:57.6636593Z .loc 1 75 58 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:75:58 2026-02-21T09:20:57.6636672Z selp.b16 %rs91, %rs89, %rs87, %p63; 2026-02-21T09:20:57.6636740Z cvt.s16.s8 %rs92, %rs91; 2026-02-21T09:20:57.6636801Z shr.s16 %rs93, %rs92, 4; 2026-02-21T09:20:57.6636869Z selp.b16 %rs94, %rs90, %rs88, %p63; 2026-02-21T09:20:57.6636933Z cvt.s16.s8 %rs95, %rs94; 2026-02-21T09:20:57.6636995Z shr.s16 %rs96, %rs95, 4; 2026-02-21T09:20:57.6637193Z .loc 1 80 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:80:32 2026-02-21T09:20:57.6637264Z cvt.rn.f32.s16 %r3170, %rs93; 
2026-02-21T09:20:57.6637326Z cvt.rn.f32.s16 %r3171, %rs96; 2026-02-21T09:20:57.6637382Z bar.sync 0; 2026-02-21T09:20:57.6637445Z st.shared.b32 [%r35], %r3170; 2026-02-21T09:20:57.6637595Z st.shared.b32 [%r36], %r3171; 2026-02-21T09:20:57.6637735Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3022}; 2026-02-21T09:20:57.6637793Z bar.sync 0; 2026-02-21T09:20:57.6637857Z // begin inline asm 2026-02-21T09:20:57.6638010Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2552, %r2688}, [%r578]; 2026-02-21T09:20:57.6638131Z // end inline asm 2026-02-21T09:20:57.6638185Z bar.sync 0; 2026-02-21T09:20:57.6638320Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3024}; 2026-02-21T09:20:57.6638376Z bar.sync 0; 2026-02-21T09:20:57.6638435Z // begin inline asm 2026-02-21T09:20:57.6638589Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2554, %r2690}, [%r578]; 2026-02-21T09:20:57.6638649Z // end inline asm 2026-02-21T09:20:57.6638704Z bar.sync 0; 2026-02-21T09:20:57.6638897Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3023}; 2026-02-21T09:20:57.6638960Z bar.sync 0; 2026-02-21T09:20:57.6639017Z // begin inline asm 2026-02-21T09:20:57.6639178Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2553, %r2689}, [%r578]; 2026-02-21T09:20:57.6639242Z // end inline asm 2026-02-21T09:20:57.6639297Z bar.sync 0; 2026-02-21T09:20:57.6639427Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3025}; 2026-02-21T09:20:57.6639484Z bar.sync 0; 2026-02-21T09:20:57.6639545Z // begin inline asm 2026-02-21T09:20:57.6639689Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2555, %r2691}, [%r578]; 2026-02-21T09:20:57.6639745Z // end inline asm 2026-02-21T09:20:57.6639804Z bar.sync 0; 2026-02-21T09:20:57.6639932Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3026}; 2026-02-21T09:20:57.6640052Z bar.sync 0; 2026-02-21T09:20:57.6640118Z // begin inline asm 2026-02-21T09:20:57.6640264Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2556, %r2692}, [%r578]; 2026-02-21T09:20:57.6640323Z // end inline 
asm 2026-02-21T09:20:57.6640379Z bar.sync 0; 2026-02-21T09:20:57.6640510Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3028}; 2026-02-21T09:20:57.6640565Z bar.sync 0; 2026-02-21T09:20:57.6640625Z // begin inline asm 2026-02-21T09:20:57.6640773Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2558, %r2694}, [%r578]; 2026-02-21T09:20:57.6640828Z // end inline asm 2026-02-21T09:20:57.6640884Z bar.sync 0; 2026-02-21T09:20:57.6641014Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3027}; 2026-02-21T09:20:57.6641086Z bar.sync 0; 2026-02-21T09:20:57.6641146Z // begin inline asm 2026-02-21T09:20:57.6641293Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2557, %r2693}, [%r578]; 2026-02-21T09:20:57.6641353Z // end inline asm 2026-02-21T09:20:57.6641408Z bar.sync 0; 2026-02-21T09:20:57.6641549Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3029}; 2026-02-21T09:20:57.6641609Z bar.sync 0; 2026-02-21T09:20:57.6641670Z // begin inline asm 2026-02-21T09:20:57.6641825Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2559, %r2695}, [%r578]; 2026-02-21T09:20:57.6641882Z // end inline asm 2026-02-21T09:20:57.6641940Z bar.sync 0; 2026-02-21T09:20:57.6642066Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3030}; 2026-02-21T09:20:57.6642123Z bar.sync 0; 2026-02-21T09:20:57.6642183Z // begin inline asm 2026-02-21T09:20:57.6642326Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2560, %r2696}, [%r578]; 2026-02-21T09:20:57.6642381Z // end inline asm 2026-02-21T09:20:57.6642439Z bar.sync 0; 2026-02-21T09:20:57.6642570Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3032}; 2026-02-21T09:20:57.6642625Z bar.sync 0; 2026-02-21T09:20:57.6642684Z // begin inline asm 2026-02-21T09:20:57.6642846Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2562, %r2698}, [%r578]; 2026-02-21T09:20:57.6642905Z // end inline asm 2026-02-21T09:20:57.6642962Z bar.sync 0; 2026-02-21T09:20:57.6643092Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3031}; 2026-02-21T09:20:57.6643156Z bar.sync 
0; 2026-02-21T09:20:57.6643213Z // begin inline asm 2026-02-21T09:20:57.6643358Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2561, %r2697}, [%r578]; 2026-02-21T09:20:57.6643418Z // end inline asm 2026-02-21T09:20:57.6643536Z bar.sync 0; 2026-02-21T09:20:57.6643665Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3033}; 2026-02-21T09:20:57.6643725Z bar.sync 0; 2026-02-21T09:20:57.6643783Z // begin inline asm 2026-02-21T09:20:57.6643929Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2563, %r2699}, [%r578]; 2026-02-21T09:20:57.6644047Z // end inline asm 2026-02-21T09:20:57.6644106Z bar.sync 0; 2026-02-21T09:20:57.6644233Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3034}; 2026-02-21T09:20:57.6644289Z bar.sync 0; 2026-02-21T09:20:57.6644348Z // begin inline asm 2026-02-21T09:20:57.6644495Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2564, %r2700}, [%r578]; 2026-02-21T09:20:57.6644552Z // end inline asm 2026-02-21T09:20:57.6644606Z bar.sync 0; 2026-02-21T09:20:57.6644787Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3036}; 2026-02-21T09:20:57.6644846Z bar.sync 0; 2026-02-21T09:20:57.6644904Z // begin inline asm 2026-02-21T09:20:57.6645055Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2566, %r2702}, [%r578]; 2026-02-21T09:20:57.6645111Z // end inline asm 2026-02-21T09:20:57.6645165Z bar.sync 0; 2026-02-21T09:20:57.6645293Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3035}; 2026-02-21T09:20:57.6645351Z bar.sync 0; 2026-02-21T09:20:57.6645421Z // begin inline asm 2026-02-21T09:20:57.6645570Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2565, %r2701}, [%r578]; 2026-02-21T09:20:57.6645630Z // end inline asm 2026-02-21T09:20:57.6645685Z bar.sync 0; 2026-02-21T09:20:57.6645814Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3037}; 2026-02-21T09:20:57.6645926Z bar.sync 0; 2026-02-21T09:20:57.6645989Z // begin inline asm 2026-02-21T09:20:57.6646134Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2567, %r2703}, [%r578]; 
2026-02-21T09:20:57.6646193Z // end inline asm 2026-02-21T09:20:57.6646251Z bar.sync 0; 2026-02-21T09:20:57.6646378Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3038}; 2026-02-21T09:20:57.6646434Z bar.sync 0; 2026-02-21T09:20:57.6646610Z // begin inline asm 2026-02-21T09:20:57.6646762Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2568, %r2704}, [%r578]; 2026-02-21T09:20:57.6646818Z // end inline asm 2026-02-21T09:20:57.6646873Z bar.sync 0; 2026-02-21T09:20:57.6647004Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3040}; 2026-02-21T09:20:57.6647061Z bar.sync 0; 2026-02-21T09:20:57.6647120Z // begin inline asm 2026-02-21T09:20:57.6647283Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2570, %r2706}, [%r578]; 2026-02-21T09:20:57.6647341Z // end inline asm 2026-02-21T09:20:57.6647396Z bar.sync 0; 2026-02-21T09:20:57.6647527Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3039}; 2026-02-21T09:20:57.6647585Z bar.sync 0; 2026-02-21T09:20:57.6647645Z // begin inline asm 2026-02-21T09:20:57.6647795Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2569, %r2705}, [%r578]; 2026-02-21T09:20:57.6647855Z // end inline asm 2026-02-21T09:20:57.6647912Z bar.sync 0; 2026-02-21T09:20:57.6648039Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3041}; 2026-02-21T09:20:57.6648099Z bar.sync 0; 2026-02-21T09:20:57.6648157Z // begin inline asm 2026-02-21T09:20:57.6648301Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2571, %r2707}, [%r578]; 2026-02-21T09:20:57.6648358Z // end inline asm 2026-02-21T09:20:57.6648416Z bar.sync 0; 2026-02-21T09:20:57.6648543Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3042}; 2026-02-21T09:20:57.6648598Z bar.sync 0; 2026-02-21T09:20:57.6648660Z // begin inline asm 2026-02-21T09:20:57.6648803Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2572, %r2708}, [%r578]; 2026-02-21T09:20:57.6648859Z // end inline asm 2026-02-21T09:20:57.6648914Z bar.sync 0; 2026-02-21T09:20:57.6649043Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], 
{%r3044}; 2026-02-21T09:20:57.6649099Z bar.sync 0; 2026-02-21T09:20:57.6649158Z // begin inline asm 2026-02-21T09:20:57.6649306Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2574, %r2710}, [%r578]; 2026-02-21T09:20:57.6649363Z // end inline asm 2026-02-21T09:20:57.6649503Z bar.sync 0; 2026-02-21T09:20:57.6649631Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3043}; 2026-02-21T09:20:57.6649689Z bar.sync 0; 2026-02-21T09:20:57.6649749Z // begin inline asm 2026-02-21T09:20:57.6649892Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2573, %r2709}, [%r578]; 2026-02-21T09:20:57.6650022Z // end inline asm 2026-02-21T09:20:57.6650077Z bar.sync 0; 2026-02-21T09:20:57.6650202Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3045}; 2026-02-21T09:20:57.6650261Z bar.sync 0; 2026-02-21T09:20:57.6650322Z // begin inline asm 2026-02-21T09:20:57.6650466Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2575, %r2711}, [%r578]; 2026-02-21T09:20:57.6650522Z // end inline asm 2026-02-21T09:20:57.6650590Z bar.sync 0; 2026-02-21T09:20:57.6650788Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3046}; 2026-02-21T09:20:57.6650847Z bar.sync 0; 2026-02-21T09:20:57.6650909Z // begin inline asm 2026-02-21T09:20:57.6651054Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2576, %r2712}, [%r578]; 2026-02-21T09:20:57.6651113Z // end inline asm 2026-02-21T09:20:57.6651168Z bar.sync 0; 2026-02-21T09:20:57.6651299Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3048}; 2026-02-21T09:20:57.6651354Z bar.sync 0; 2026-02-21T09:20:57.6651414Z // begin inline asm 2026-02-21T09:20:57.6651560Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2578, %r2714}, [%r578]; 2026-02-21T09:20:57.6651616Z // end inline asm 2026-02-21T09:20:57.6651673Z bar.sync 0; 2026-02-21T09:20:57.6651860Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3047}; 2026-02-21T09:20:57.6651923Z bar.sync 0; 2026-02-21T09:20:57.6651982Z // begin inline asm 2026-02-21T09:20:57.6652126Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 
{%r2577, %r2713}, [%r578]; 2026-02-21T09:20:57.6652188Z // end inline asm 2026-02-21T09:20:57.6652243Z bar.sync 0; 2026-02-21T09:20:57.6652370Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3049}; 2026-02-21T09:20:57.6652428Z bar.sync 0; 2026-02-21T09:20:57.6652488Z // begin inline asm 2026-02-21T09:20:57.6652632Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2579, %r2715}, [%r578]; 2026-02-21T09:20:57.6652688Z // end inline asm 2026-02-21T09:20:57.6652749Z bar.sync 0; 2026-02-21T09:20:57.6652878Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3050}; 2026-02-21T09:20:57.6652935Z bar.sync 0; 2026-02-21T09:20:57.6652996Z // begin inline asm 2026-02-21T09:20:57.6653140Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2580, %r2716}, [%r578]; 2026-02-21T09:20:57.6653196Z // end inline asm 2026-02-21T09:20:57.6653251Z bar.sync 0; 2026-02-21T09:20:57.6653387Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3052}; 2026-02-21T09:20:57.6653443Z bar.sync 0; 2026-02-21T09:20:57.6653503Z // begin inline asm 2026-02-21T09:20:57.6653656Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2582, %r2718}, [%r578]; 2026-02-21T09:20:57.6653713Z // end inline asm 2026-02-21T09:20:57.6653769Z bar.sync 0; 2026-02-21T09:20:57.6653900Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3051}; 2026-02-21T09:20:57.6653958Z bar.sync 0; 2026-02-21T09:20:57.6654018Z // begin inline asm 2026-02-21T09:20:57.6654163Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2581, %r2717}, [%r578]; 2026-02-21T09:20:57.6654223Z // end inline asm 2026-02-21T09:20:57.6654281Z bar.sync 0; 2026-02-21T09:20:57.6654409Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3053}; 2026-02-21T09:20:57.6654467Z bar.sync 0; 2026-02-21T09:20:57.6654525Z // begin inline asm 2026-02-21T09:20:57.6654684Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2583, %r2719}, [%r578]; 2026-02-21T09:20:57.6654743Z // end inline asm 2026-02-21T09:20:57.6654802Z $L__tmp9: 2026-02-21T09:20:57.6655083Z .loc 2 291 36 // 
standard.py:291:36 @[ ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:87:40 ] 2026-02-21T09:20:57.6655142Z // begin inline asm 2026-02-21T09:20:57.6655227Z fence.proxy.async.shared::cta; 2026-02-21T09:20:57.6655283Z // end inline asm 2026-02-21T09:20:57.6655426Z shfl.sync.idx.b32 %r3172, %r6, 0, 31, -1; 2026-02-21T09:20:57.6655500Z wgmma.fence.sync.aligned; 2026-02-21T09:20:57.6655568Z mov.pred %p25, -1; 2026-02-21T09:20:57.6655626Z // begin inline asm 2026-02-21T09:20:57.6656384Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2552,%r2553,%r2554,%r2555,%r2556,%r2557,%r2558,%r2559,%r2560,%r2561,%r2562,%r2563,%r2564,%r2565,%r2566,%r2567,%r2568,%r2569,%r2570,%r2571,%r2572,%r2573,%r2574,%r2575,%r2576,%r2577,%r2578,%r2579,%r2580,%r2581,%r2582,%r2583}, {%r2684,%r2685,%r2686,%r2687}, %rd1, %p25, 1, 1; 2026-02-21T09:20:57.6656647Z // end inline asm 2026-02-21T09:20:57.6656720Z // begin inline asm 2026-02-21T09:20:57.6657561Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2552,%r2553,%r2554,%r2555,%r2556,%r2557,%r2558,%r2559,%r2560,%r2561,%r2562,%r2563,%r2564,%r2565,%r2566,%r2567,%r2568,%r2569,%r2570,%r2571,%r2572,%r2573,%r2574,%r2575,%r2576,%r2577,%r2578,%r2579,%r2580,%r2581,%r2582,%r2583}, {%r2752,%r2753,%r2754,%r2755}, %rd2, %p25, 1, 1; 2026-02-21T09:20:57.6658039Z [538s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:20:57.6659345Z Config: @helion.kernel(config=helion.Config(block_sizes=[8, 256, 128], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[16], load_eviction_policies=['first', 'first'], loop_orders=[[1, 0]], num_sm_multiplier=16, num_stages=6, num_warps=32, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[True, False], range_num_stages=[3, 0], range_unroll_factors=[4, 2], range_warp_specializes=[]), static_shapes=True)
2026-02-21T09:20:57.6659502Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error
2026-02-21T09:20:57.6659564Z `ptxas` stderr:
2026-02-21T09:20:57.6660106Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_helion_matmul_bf16_int4'
2026-02-21T09:20:57.6660430Z ptxas fatal : (C7600) Register allocation failed with register count of '64'. Compile the program with a higher register target
2026-02-21T09:20:57.6660532Z ptxas fatal : Ptx assembly aborted due to errors
2026-02-21T09:20:57.6660540Z
2026-02-21T09:20:57.6661044Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpza8qfwif.ptx -o /tmp/tmpza8qfwif.ptx.o
2026-02-21T09:20:57.6661051Z
2026-02-21T09:20:57.6661207Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code.
2026-02-21T09:20:57.6661269Z // end inline asm 2026-02-21T09:20:57.6661329Z // begin inline asm 2026-02-21T09:20:57.6662101Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2688,%r2689,%r2690,%r2691,%r2692,%r2693,%r2694,%r2695,%r2696,%r2697,%r2698,%r2699,%r2700,%r2701,%r2702,%r2703,%r2704,%r2705,%r2706,%r2707,%r2708,%r2709,%r2710,%r2711,%r2712,%r2713,%r2714,%r2715,%r2716,%r2717,%r2718,%r2719}, {%r2684,%r2685,%r2686,%r2687}, %rd3, %p25, 1, 1; 2026-02-21T09:20:57.6662162Z // end inline asm 2026-02-21T09:20:57.6662222Z // begin inline asm 2026-02-21T09:20:57.6662972Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2688,%r2689,%r2690,%r2691,%r2692,%r2693,%r2694,%r2695,%r2696,%r2697,%r2698,%r2699,%r2700,%r2701,%r2702,%r2703,%r2704,%r2705,%r2706,%r2707,%r2708,%r2709,%r2710,%r2711,%r2712,%r2713,%r2714,%r2715,%r2716,%r2717,%r2718,%r2719}, {%r2752,%r2753,%r2754,%r2755}, %rd4, %p25, 1, 1; 2026-02-21T09:20:57.6663033Z // end inline asm 2026-02-21T09:20:57.6663116Z wgmma.commit_group.sync.aligned; 2026-02-21T09:20:57.6663181Z mov.b32 %r3123, 0; 2026-02-21T09:20:57.6669071Z mov.b32 %r2821, %r3123; 2026-02-21T09:20:57.6669185Z mov.b32 %r2822, %r3123; 2026-02-21T09:20:57.6669253Z mov.b32 %r2820, %r481; 2026-02-21T09:20:57.6669321Z // begin inline asm 2026-02-21T09:20:57.6670421Z // wait for regs: %r2552,%r2553,%r2554,%r2555,%r2556,%r2557,%r2558,%r2559,%r2560,%r2561,%r2562,%r2563,%r2564,%r2565,%r2566,%r2567,%r2568,%r2569,%r2570,%r2571,%r2572,%r2573,%r2574,%r2575,%r2576,%r2577,%r2578,%r2579,%r2580,%r2581,%r2582,%r2583,%r2688,%r2689,%r2690,%r2691,%r2692,%r2693,%r2694,%r2695,%r2696,%r2697,%r2698,%r2699,%r2700,%r2701,%r2702,%r2703,%r2704,%r2705,%r2706,%r2707,%r2708,%r2709,%r2710,%r2711,%r2712,%r2713,%r2714,%r2715,%r2716,%r2717,%r2718,%r2719,%r2820,%r2821,%r2822 2026-02-21T09:20:57.6670703Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:20:57.6670763Z // end inline asm 2026-02-21T09:20:57.6670819Z $L__tmp10: 2026-02-21T09:20:57.6671058Z .loc 1 51 80 // 
ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6671128Z add.s32 %r3173, %r3167, 40960; 2026-02-21T09:20:57.6671346Z .loc 1 55 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:55:32 2026-02-21T09:20:57.6671485Z add.s32 %r3174, %r3173, %r31; 2026-02-21T09:20:57.6671571Z ld.shared.b16 %rs97, [%r3174]; 2026-02-21T09:20:57.6671645Z ld.shared.b16 %rs98, [%r3174+256]; 2026-02-21T09:20:57.6671730Z ld.shared.b16 %rs99, [%r3174+16]; 2026-02-21T09:20:57.6671806Z ld.shared.b16 %rs100, [%r3174+272]; 2026-02-21T09:20:57.6671870Z add.s32 %r3175, %r3173, %r32; 2026-02-21T09:20:57.6671936Z ld.shared.b16 %rs101, [%r3175]; 2026-02-21T09:20:57.6672008Z ld.shared.b16 %rs102, [%r3175+256]; 2026-02-21T09:20:57.6672077Z ld.shared.b16 %rs103, [%r3175+16]; 2026-02-21T09:20:57.6672142Z ld.shared.b16 %rs104, [%r3175+272]; 2026-02-21T09:20:57.6672213Z cvt.f32.bf16 %r3018, %rs97; 2026-02-21T09:20:57.6672276Z cvt.f32.bf16 %r3019, %rs98; 2026-02-21T09:20:57.6672401Z cvt.f32.bf16 %r3020, %rs101; 2026-02-21T09:20:57.6672471Z cvt.f32.bf16 %r3021, %rs102; 2026-02-21T09:20:57.6672532Z cvt.f32.bf16 %r3086, %rs99; 2026-02-21T09:20:57.6672592Z cvt.f32.bf16 %r3087, %rs100; 2026-02-21T09:20:57.6672655Z cvt.f32.bf16 %r3088, %rs103; 2026-02-21T09:20:57.6672718Z cvt.f32.bf16 %r3089, %rs104; 2026-02-21T09:20:57.6672946Z .loc 1 57 34 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:57:34 2026-02-21T09:20:57.6673013Z add.s32 %r3176, %r5232, 65536; 2026-02-21T09:20:57.6673082Z cvt.s64.s32 %rd138, %r3176; 2026-02-21T09:20:57.6673147Z add.s64 %rd131, %rd28, %rd138; 2026-02-21T09:20:57.6673351Z .loc 1 57 87 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:57:87 2026-02-21T09:20:57.6673413Z // begin inline asm 2026-02-21T09:20:57.6673478Z mov.u64 %rd130, 0x0; 2026-02-21T09:20:57.6673618Z createpolicy.fractional.L2::evict_first.b64 %rd130, 1.0; 2026-02-21T09:20:57.6673679Z // end inline asm 2026-02-21T09:20:57.6673747Z // begin inline asm 
2026-02-21T09:20:57.6673808Z mov.u16 %rs78, 0x0; 2026-02-21T09:20:57.6673983Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs78 }, [ %rd131 + 0 ], %rd130; 2026-02-21T09:20:57.6674045Z // end inline asm 2026-02-21T09:20:57.6674261Z .loc 1 65 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:65:28 2026-02-21T09:20:57.6674320Z bar.sync 0; 2026-02-21T09:20:57.6674391Z st.shared.b8 [%r33], %rs78; 2026-02-21T09:20:57.6674450Z bar.sync 0; 2026-02-21T09:20:57.6674537Z ld.shared.v2.b8 {%rs105, %rs106}, [%r34]; 2026-02-21T09:20:57.6674748Z .loc 1 60 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:60:28 2026-02-21T09:20:57.6674819Z shl.b16 %rs107, %rs105, 4; 2026-02-21T09:20:57.6674882Z shl.b16 %rs108, %rs106, 4; 2026-02-21T09:20:57.6675100Z .loc 1 75 58 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:75:58 2026-02-21T09:20:57.6675182Z selp.b16 %rs109, %rs107, %rs105, %p63; 2026-02-21T09:20:57.6675248Z cvt.s16.s8 %rs110, %rs109; 2026-02-21T09:20:57.6675310Z shr.s16 %rs111, %rs110, 4; 2026-02-21T09:20:57.6675380Z selp.b16 %rs112, %rs108, %rs106, %p63; 2026-02-21T09:20:57.6675449Z cvt.s16.s8 %rs113, %rs112; 2026-02-21T09:20:57.6675510Z shr.s16 %rs114, %rs113, 4; 2026-02-21T09:20:57.6675716Z .loc 1 80 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:80:32 2026-02-21T09:20:57.6675850Z cvt.rn.f32.s16 %r3177, %rs111; 2026-02-21T09:20:57.6675913Z cvt.rn.f32.s16 %r3178, %rs114; 2026-02-21T09:20:57.6675971Z bar.sync 0; 2026-02-21T09:20:57.6676042Z st.shared.b32 [%r35], %r3177; 2026-02-21T09:20:57.6676106Z st.shared.b32 [%r36], %r3178; 2026-02-21T09:20:57.6676208Z $L__tmp11: 2026-02-21T09:20:57.6676632Z .loc 2 291 36 // standard.py:291:36 @[ ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:87:40 ] 2026-02-21T09:20:57.6676808Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2552, %r2688}; 2026-02-21T09:20:57.6676867Z bar.sync 0; 2026-02-21T09:20:57.6676929Z // begin inline asm 2026-02-21T09:20:57.6677077Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3022}, [%r1079]; 2026-02-21T09:20:57.6677136Z // end inline asm 2026-02-21T09:20:57.6677267Z bar.sync 0; 2026-02-21T09:20:57.6677424Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2554, %r2690}; 2026-02-21T09:20:57.6677487Z bar.sync 0; 2026-02-21T09:20:57.6677549Z // begin inline asm 2026-02-21T09:20:57.6677690Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3024}, [%r1079]; 2026-02-21T09:20:57.6677752Z // end inline asm 2026-02-21T09:20:57.6677805Z bar.sync 0; 2026-02-21T09:20:57.6677959Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2553, %r2689}; 2026-02-21T09:20:57.6678020Z bar.sync 0; 2026-02-21T09:20:57.6678080Z // begin inline asm 2026-02-21T09:20:57.6678212Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3023}, [%r1079]; 2026-02-21T09:20:57.6678268Z // end inline asm 2026-02-21T09:20:57.6678341Z bar.sync 0; 2026-02-21T09:20:57.6678557Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2555, %r2691}; 2026-02-21T09:20:57.6678616Z bar.sync 0; 2026-02-21T09:20:57.6678678Z // begin inline asm 2026-02-21T09:20:57.6678807Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3025}, [%r1079]; 2026-02-21T09:20:57.6678864Z // end inline asm 2026-02-21T09:20:57.6678919Z bar.sync 0; 2026-02-21T09:20:57.6679068Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2556, %r2692}; 2026-02-21T09:20:57.6679128Z bar.sync 0; 2026-02-21T09:20:57.6679186Z // begin inline asm 2026-02-21T09:20:57.6679316Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3026}, [%r1079]; 2026-02-21T09:20:57.6679371Z // end inline asm 2026-02-21T09:20:57.6679428Z bar.sync 0; 2026-02-21T09:20:57.6679571Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2558, %r2694}; 2026-02-21T09:20:57.6679630Z bar.sync 0; 2026-02-21T09:20:57.6679689Z // begin inline asm 2026-02-21T09:20:57.6679823Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3028}, [%r1079]; 2026-02-21T09:20:57.6679886Z // end inline asm 2026-02-21T09:20:57.6679940Z bar.sync 0; 
2026-02-21T09:20:57.6680092Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2557, %r2693}; 2026-02-21T09:20:57.6680151Z bar.sync 0; 2026-02-21T09:20:57.6680211Z // begin inline asm 2026-02-21T09:20:57.6680341Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3027}, [%r1079]; 2026-02-21T09:20:57.6680397Z // end inline asm 2026-02-21T09:20:57.6680457Z bar.sync 0; 2026-02-21T09:20:57.6680605Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2559, %r2695}; 2026-02-21T09:20:57.6680673Z bar.sync 0; 2026-02-21T09:20:57.6680740Z // begin inline asm 2026-02-21T09:20:57.6680879Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3029}, [%r1079]; 2026-02-21T09:20:57.6680938Z // end inline asm 2026-02-21T09:20:57.6680993Z bar.sync 0; 2026-02-21T09:20:57.6681149Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2560, %r2696}; 2026-02-21T09:20:57.6681205Z bar.sync 0; 2026-02-21T09:20:57.6681263Z // begin inline asm 2026-02-21T09:20:57.6681401Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3030}, [%r1079]; 2026-02-21T09:20:57.6681460Z // end inline asm 2026-02-21T09:20:57.6681515Z bar.sync 0; 2026-02-21T09:20:57.6681663Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2562, %r2698}; 2026-02-21T09:20:57.6681721Z bar.sync 0; 2026-02-21T09:20:57.6681781Z // begin inline asm 2026-02-21T09:20:57.6681912Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3032}, [%r1079]; 2026-02-21T09:20:57.6682052Z // end inline asm 2026-02-21T09:20:57.6682107Z bar.sync 0; 2026-02-21T09:20:57.6682253Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2561, %r2697}; 2026-02-21T09:20:57.6682312Z bar.sync 0; 2026-02-21T09:20:57.6682432Z // begin inline asm 2026-02-21T09:20:57.6682560Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3031}, [%r1079]; 2026-02-21T09:20:57.6682618Z // end inline asm 2026-02-21T09:20:57.6682676Z bar.sync 0; 2026-02-21T09:20:57.6682821Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2563, %r2699}; 2026-02-21T09:20:57.6682880Z bar.sync 0; 2026-02-21T09:20:57.6682940Z // 
begin inline asm 2026-02-21T09:20:57.6683068Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3033}, [%r1079]; 2026-02-21T09:20:57.6683175Z // end inline asm 2026-02-21T09:20:57.6683235Z bar.sync 0; 2026-02-21T09:20:57.6683381Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2564, %r2700}; 2026-02-21T09:20:57.6683435Z bar.sync 0; 2026-02-21T09:20:57.6683496Z // begin inline asm 2026-02-21T09:20:57.6683629Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3034}, [%r1079]; 2026-02-21T09:20:57.6683687Z // end inline asm 2026-02-21T09:20:57.6683756Z bar.sync 0; 2026-02-21T09:20:57.6683903Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2566, %r2702}; 2026-02-21T09:20:57.6683964Z bar.sync 0; 2026-02-21T09:20:57.6684022Z // begin inline asm 2026-02-21T09:20:57.6684149Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3036}, [%r1079]; 2026-02-21T09:20:57.6684210Z // end inline asm 2026-02-21T09:20:57.6684263Z bar.sync 0; 2026-02-21T09:20:57.6684453Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2565, %r2701}; 2026-02-21T09:20:57.6684514Z bar.sync 0; 2026-02-21T09:20:57.6684573Z // begin inline asm 2026-02-21T09:20:57.6684701Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3035}, [%r1079]; 2026-02-21T09:20:57.6684758Z // end inline asm 2026-02-21T09:20:57.6684819Z bar.sync 0; 2026-02-21T09:20:57.6684961Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2567, %r2703}; 2026-02-21T09:20:57.6685018Z bar.sync 0; 2026-02-21T09:20:57.6685080Z // begin inline asm 2026-02-21T09:20:57.6685206Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3037}, [%r1079]; 2026-02-21T09:20:57.6685262Z // end inline asm 2026-02-21T09:20:57.6685321Z bar.sync 0; 2026-02-21T09:20:57.6685467Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2568, %r2704}; 2026-02-21T09:20:57.6685522Z bar.sync 0; 2026-02-21T09:20:57.6685581Z // begin inline asm 2026-02-21T09:20:57.6685711Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3038}, [%r1079]; 2026-02-21T09:20:57.6685768Z // end inline asm 
2026-02-21T09:20:57.6685823Z bar.sync 0; 2026-02-21T09:20:57.6685965Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2570, %r2706}; 2026-02-21T09:20:57.6686025Z bar.sync 0; 2026-02-21T09:20:57.6686084Z // begin inline asm 2026-02-21T09:20:57.6686212Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3040}, [%r1079]; 2026-02-21T09:20:57.6686274Z // end inline asm 2026-02-21T09:20:57.6686330Z bar.sync 0; 2026-02-21T09:20:57.6686594Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2569, %r2705}; 2026-02-21T09:20:57.6686657Z bar.sync 0; 2026-02-21T09:20:57.6686717Z // begin inline asm 2026-02-21T09:20:57.6686843Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3039}, [%r1079]; 2026-02-21T09:20:57.6686902Z // end inline asm 2026-02-21T09:20:57.6686959Z bar.sync 0; 2026-02-21T09:20:57.6687115Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2571, %r2707}; 2026-02-21T09:20:57.6687172Z bar.sync 0; 2026-02-21T09:20:57.6687235Z // begin inline asm 2026-02-21T09:20:57.6687365Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3041}, [%r1079]; 2026-02-21T09:20:57.6687420Z // end inline asm 2026-02-21T09:20:57.6687474Z bar.sync 0; 2026-02-21T09:20:57.6687622Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2572, %r2708}; 2026-02-21T09:20:57.6687676Z bar.sync 0; 2026-02-21T09:20:57.6687735Z // begin inline asm 2026-02-21T09:20:57.6687872Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3042}, [%r1079]; 2026-02-21T09:20:57.6688036Z // end inline asm 2026-02-21T09:20:57.6688092Z bar.sync 0; 2026-02-21T09:20:57.6688236Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2574, %r2710}; 2026-02-21T09:20:57.6688293Z bar.sync 0; 2026-02-21T09:20:57.6688416Z // begin inline asm 2026-02-21T09:20:57.6688542Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3044}, [%r1079]; 2026-02-21T09:20:57.6688605Z // end inline asm 2026-02-21T09:20:57.6688661Z bar.sync 0; 2026-02-21T09:20:57.6688807Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2573, %r2709}; 2026-02-21T09:20:57.6688866Z 
bar.sync 0; 2026-02-21T09:20:57.6688929Z // begin inline asm 2026-02-21T09:20:57.6689060Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3043}, [%r1079]; 2026-02-21T09:20:57.6689195Z // end inline asm 2026-02-21T09:20:57.6689258Z bar.sync 0; 2026-02-21T09:20:57.6689403Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2575, %r2711}; 2026-02-21T09:20:57.6689459Z bar.sync 0; 2026-02-21T09:20:57.6689524Z // begin inline asm 2026-02-21T09:20:57.6689651Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3045}, [%r1079]; 2026-02-21T09:20:57.6689709Z // end inline asm 2026-02-21T09:20:57.6689764Z bar.sync 0; 2026-02-21T09:20:57.6689914Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2576, %r2712}; 2026-02-21T09:20:57.6689971Z bar.sync 0; 2026-02-21T09:20:57.6690029Z // begin inline asm 2026-02-21T09:20:57.6690163Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3046}, [%r1079]; 2026-02-21T09:20:57.6690227Z // end inline asm 2026-02-21T09:20:57.6690282Z bar.sync 0; 2026-02-21T09:20:57.6690490Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2578, %r2714}; 2026-02-21T09:20:57.6690554Z bar.sync 0; 2026-02-21T09:20:57.6690612Z // begin inline asm 2026-02-21T09:20:57.6690741Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3048}, [%r1079]; 2026-02-21T09:20:57.6690801Z // end inline asm 2026-02-21T09:20:57.6690854Z bar.sync 0; 2026-02-21T09:20:57.6690997Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2577, %r2713}; 2026-02-21T09:20:57.6691056Z bar.sync 0; 2026-02-21T09:20:57.6691114Z // begin inline asm 2026-02-21T09:20:57.6691241Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3047}, [%r1079]; 2026-02-21T09:20:57.6691296Z // end inline asm 2026-02-21T09:20:57.6691356Z bar.sync 0; 2026-02-21T09:20:57.6691501Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2579, %r2715}; 2026-02-21T09:20:57.6691555Z bar.sync 0; 2026-02-21T09:20:57.6691619Z // begin inline asm 2026-02-21T09:20:57.6691758Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3049}, [%r1079]; 
2026-02-21T09:20:57.6691816Z // end inline asm 2026-02-21T09:20:57.6691872Z bar.sync 0; 2026-02-21T09:20:57.6692019Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2580, %r2716}; 2026-02-21T09:20:57.6692075Z bar.sync 0; 2026-02-21T09:20:57.6692146Z // begin inline asm 2026-02-21T09:20:57.6692281Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3050}, [%r1079]; 2026-02-21T09:20:57.6692337Z // end inline asm 2026-02-21T09:20:57.6692394Z bar.sync 0; 2026-02-21T09:20:57.6692537Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2582, %r2718}; 2026-02-21T09:20:57.6692599Z bar.sync 0; 2026-02-21T09:20:57.6692658Z // begin inline asm 2026-02-21T09:20:57.6692791Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3052}, [%r1079]; 2026-02-21T09:20:57.6692851Z // end inline asm 2026-02-21T09:20:57.6692904Z bar.sync 0; 2026-02-21T09:20:57.6693046Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2581, %r2717}; 2026-02-21T09:20:57.6693103Z bar.sync 0; 2026-02-21T09:20:57.6693166Z // begin inline asm 2026-02-21T09:20:57.6693293Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3051}, [%r1079]; 2026-02-21T09:20:57.6693353Z // end inline asm 2026-02-21T09:20:57.6693413Z bar.sync 0; 2026-02-21T09:20:57.6693557Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r2583, %r2719}; 2026-02-21T09:20:57.6693611Z bar.sync 0; 2026-02-21T09:20:57.6693672Z // begin inline asm 2026-02-21T09:20:57.6693800Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3053}, [%r1079]; 2026-02-21T09:20:57.6693922Z // end inline asm 2026-02-21T09:20:57.6693980Z // begin inline asm 2026-02-21T09:20:57.6694067Z fence.proxy.async.shared::cta; 2026-02-21T09:20:57.6694126Z // end inline asm 2026-02-21T09:20:57.6694274Z wgmma.fence.sync.aligned; 2026-02-21T09:20:57.6694338Z shl.b32 %r3179, %r3172, 8; 2026-02-21T09:20:57.6694401Z and.b32 %r3180, %r3179, 4096; 2026-02-21T09:20:57.6694469Z add.s32 %r3181, %r3180, %r481; 2026-02-21T09:20:57.6694531Z bfe.u32 %r3182, %r3181, 4, 14; 2026-02-21T09:20:57.6694599Z cvt.u64.u32 
%rd139, %r3182; 2026-02-21T09:20:57.6694685Z or.b64 %rd133, %rd139, -9223371899382267904; 2026-02-21T09:20:57.6694745Z // begin inline asm 2026-02-21T09:20:57.6695569Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3022,%r3023,%r3024,%r3025,%r3026,%r3027,%r3028,%r3029,%r3030,%r3031,%r3032,%r3033,%r3034,%r3035,%r3036,%r3037,%r3038,%r3039,%r3040,%r3041,%r3042,%r3043,%r3044,%r3045,%r3046,%r3047,%r3048,%r3049,%r3050,%r3051,%r3052,%r3053}, {%r3018,%r3019,%r3020,%r3021}, %rd133, %p25, 1, 1; 2026-02-21T09:20:57.6695634Z // end inline asm 2026-02-21T09:20:57.6695697Z add.s32 %r3183, %r3181, 32; 2026-02-21T09:20:57.6695757Z bfe.u32 %r3184, %r3183, 4, 14; 2026-02-21T09:20:57.6695821Z cvt.u64.u32 %rd140, %r3184; 2026-02-21T09:20:57.6695912Z or.b64 %rd134, %rd140, -9223371899382267904; 2026-02-21T09:20:57.6695975Z // begin inline asm 2026-02-21T09:20:57.6696932Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3022,%r3023,%r3024,%r3025,%r3026,%r3027,%r3028,%r3029,%r3030,%r3031,%r3032,%r3033,%r3034,%r3035,%r3036,%r3037,%r3038,%r3039,%r3040,%r3041,%r3042,%r3043,%r3044,%r3045,%r3046,%r3047,%r3048,%r3049,%r3050,%r3051,%r3052,%r3053}, {%r3086,%r3087,%r3088,%r3089}, %rd134, %p25, 1, 1; 2026-02-21T09:20:57.6697003Z // end inline asm 2026-02-21T09:20:57.6697081Z wgmma.commit_group.sync.aligned; 2026-02-21T09:20:57.6697143Z mov.b32 %r3122, %r481; 2026-02-21T09:20:57.6697206Z mov.b32 %r3124, %r3123; 2026-02-21T09:20:57.6697267Z // begin inline asm 2026-02-21T09:20:57.6697827Z // wait for regs: %r3022,%r3023,%r3024,%r3025,%r3026,%r3027,%r3028,%r3029,%r3030,%r3031,%r3032,%r3033,%r3034,%r3035,%r3036,%r3037,%r3038,%r3039,%r3040,%r3041,%r3042,%r3043,%r3044,%r3045,%r3046,%r3047,%r3048,%r3049,%r3050,%r3051,%r3052,%r3053,%r3122,%r3123,%r3124 2026-02-21T09:20:57.6697909Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:20:57.6697966Z // end inline asm 2026-02-21T09:20:57.6698021Z $L__tmp12: 2026-02-21T09:20:57.6698243Z .loc 1 43 78 // 
ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:43:78 2026-02-21T09:20:57.6698323Z add.s32 %r3185, %r5235, 1; 2026-02-21T09:20:57.6698395Z setp.gt.s32 %p34, %r3185, 4; 2026-02-21T09:20:57.6698466Z selp.b32 %r5235, 0, %r3185, %p34; 2026-02-21T09:20:57.6698675Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6698740Z add.s32 %r3186, %r5233, -16; 2026-02-21T09:20:57.6698816Z mad.wide.s32 %rd135, %r3186, 2, %rd27; 2026-02-21T09:20:57.6699025Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6699089Z shl.b32 %r3187, %r5235, 13; 2026-02-21T09:20:57.6699153Z add.s32 %r3160, %r21, %r3187; 2026-02-21T09:20:57.6699221Z selp.b32 %r3161, 8, 0, %p32; 2026-02-21T09:20:57.6699285Z // begin inline asm 2026-02-21T09:20:57.6699435Z cp.async.ca.shared.global [ %r3160 + 0 ], [ %rd135 + 0 ], 0x8, %r3161; 2026-02-21T09:20:57.6699494Z // end inline asm 2026-02-21T09:20:57.6699566Z cp.async.commit_group; 2026-02-21T09:20:57.6699769Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6699840Z mad.wide.s32 %rd136, %r5233, 2, %rd27; 2026-02-21T09:20:57.6700042Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6700105Z add.s32 %r3162, %r22, %r3187; 2026-02-21T09:20:57.6700246Z // begin inline asm 2026-02-21T09:20:57.6700385Z cp.async.ca.shared.global [ %r3162 + 0 ], [ %rd136 + 0 ], 0x8, %r3161; 2026-02-21T09:20:57.6700445Z // end inline asm 2026-02-21T09:20:57.6700510Z cp.async.commit_group; 2026-02-21T09:20:57.6700708Z .loc 1 43 78 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:43:78 2026-02-21T09:20:57.6700844Z add.s32 %r5233, %r5233, 32; 2026-02-21T09:20:57.6700913Z add.s32 %r5232, %r5232, 131072; 2026-02-21T09:20:57.6700983Z setp.lt.u64 %p35, %rd221, 496; 2026-02-21T09:20:57.6701047Z @%p35 bra $L__BB0_7; 2026-02-21T09:20:57.6701164Z // %bb.8: // in Loop: Header=BB0_2 Depth=1 
2026-02-21T09:20:57.6701368Z .loc 1 34 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:34:32 2026-02-21T09:20:57.6701492Z or.b32 %r3247, %r229, %r8; 2026-02-21T09:20:57.6701694Z .loc 1 36 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:36:32 2026-02-21T09:20:57.6701759Z or.b32 %r3248, %r230, %r12; 2026-02-21T09:20:57.6701818Z or.b32 %r3249, %r230, %r13; 2026-02-21T09:20:57.6701880Z or.b32 %r3250, %r230, %r14; 2026-02-21T09:20:57.6701940Z or.b32 %r3251, %r230, %r15; 2026-02-21T09:20:57.6702145Z .loc 1 43 78 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:43:78 2026-02-21T09:20:57.6702222Z cp.async.wait_group 0; 2026-02-21T09:20:57.6702279Z bar.sync 0; 2026-02-21T09:20:57.6702477Z .loc 1 90 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:90:28 2026-02-21T09:20:57.6702604Z cvt.rn.bf16x2.f32 %r3252, %r3023, %r3022; 2026-02-21T09:20:57.6702685Z cvt.rn.bf16x2.f32 %r3253, %r3025, %r3024; 2026-02-21T09:20:57.6702757Z cvt.rn.bf16x2.f32 %r3254, %r3027, %r3026; 2026-02-21T09:20:57.6702826Z cvt.rn.bf16x2.f32 %r3255, %r3029, %r3028; 2026-02-21T09:20:57.6702901Z cvt.rn.bf16x2.f32 %r3256, %r3031, %r3030; 2026-02-21T09:20:57.6702972Z cvt.rn.bf16x2.f32 %r3257, %r3033, %r3032; 2026-02-21T09:20:57.6703046Z cvt.rn.bf16x2.f32 %r3258, %r3035, %r3034; 2026-02-21T09:20:57.6703118Z cvt.rn.bf16x2.f32 %r3259, %r3037, %r3036; 2026-02-21T09:20:57.6703188Z cvt.rn.bf16x2.f32 %r3260, %r3039, %r3038; 2026-02-21T09:20:57.6703268Z cvt.rn.bf16x2.f32 %r3261, %r3041, %r3040; 2026-02-21T09:20:57.6703343Z cvt.rn.bf16x2.f32 %r3262, %r3043, %r3042; 2026-02-21T09:20:57.6703419Z cvt.rn.bf16x2.f32 %r3263, %r3045, %r3044; 2026-02-21T09:20:57.6703490Z cvt.rn.bf16x2.f32 %r3264, %r3047, %r3046; 2026-02-21T09:20:57.6703558Z cvt.rn.bf16x2.f32 %r3265, %r3049, %r3048; 2026-02-21T09:20:57.6703631Z cvt.rn.bf16x2.f32 %r3266, %r3051, %r3050; 2026-02-21T09:20:57.6703701Z cvt.rn.bf16x2.f32 %r3267, %r3053, %r3052; 2026-02-21T09:20:57.6703902Z .loc 1 91 43 // 
ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:91:43 2026-02-21T09:20:57.6703969Z shl.b32 %r3268, %r3248, 13; 2026-02-21T09:20:57.6704030Z shl.b32 %r3269, %r3249, 13; 2026-02-21T09:20:57.6704090Z shl.b32 %r3270, %r3250, 13; 2026-02-21T09:20:57.6704152Z shl.b32 %r3271, %r3251, 13; 2026-02-21T09:20:57.6704349Z .loc 1 91 50 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:91:50 2026-02-21T09:20:57.6704413Z add.s32 %r3272, %r3268, %r3247; 2026-02-21T09:20:57.6704477Z add.s32 %r3273, %r3269, %r3247; 2026-02-21T09:20:57.6704540Z add.s32 %r3274, %r3270, %r3247; 2026-02-21T09:20:57.6704604Z add.s32 %r3275, %r3271, %r3247; 2026-02-21T09:20:57.6704801Z .loc 1 91 22 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:91:22 2026-02-21T09:20:57.6704884Z mad.wide.s32 %rd141, %r3272, 2, %rd29; 2026-02-21T09:20:57.6704954Z mad.wide.s32 %rd142, %r3273, 2, %rd29; 2026-02-21T09:20:57.6705025Z mad.wide.s32 %rd143, %r3274, 2, %rd29; 2026-02-21T09:20:57.6705092Z mad.wide.s32 %rd144, %r3275, 2, %rd29; 2026-02-21T09:20:57.6705289Z .loc 1 91 81 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:91:81 2026-02-21T09:20:57.6705463Z st.shared.v4.b32 [%r39], {%r3252, %r3254, %r3256, %r3258}; 2026-02-21T09:20:57.6705582Z st.shared.v4.b32 [%r39+512], {%r3253, %r3255, %r3257, %r3259}; 2026-02-21T09:20:57.6705689Z st.shared.v4.b32 [%r40], {%r3260, %r3262, %r3264, %r3266}; 2026-02-21T09:20:57.6705844Z st.shared.v4.b32 [%r40+512], {%r3261, %r3263, %r3265, %r3267}; 2026-02-21T09:20:57.6705902Z bar.sync 0; 2026-02-21T09:20:57.6705961Z // begin inline asm 2026-02-21T09:20:57.6706159Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3208, %r3209, %r3210, %r3211}, [%r1380]; 2026-02-21T09:20:57.6706222Z // end inline asm 2026-02-21T09:20:57.6706284Z // begin inline asm 2026-02-21T09:20:57.6706596Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3212, %r3213, %r3214, %r3215}, [%r1385]; 2026-02-21T09:20:57.6706661Z // end inline asm 2026-02-21T09:20:57.6706796Z // begin inline asm 
2026-02-21T09:20:57.6706983Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3216, %r3217, %r3218, %r3219}, [%r1390]; 2026-02-21T09:20:57.6707038Z // end inline asm 2026-02-21T09:20:57.6707101Z // begin inline asm 2026-02-21T09:20:57.6707282Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3220, %r3221, %r3222, %r3223}, [%r1395]; 2026-02-21T09:20:57.6707338Z // end inline asm 2026-02-21T09:20:57.6707398Z // begin inline asm 2026-02-21T09:20:57.6707530Z st.global.v4.b32 [ %rd141 + 0 ], { %r3208, %r3209, %r3210, %r3211 }; 2026-02-21T09:20:57.6707586Z // end inline asm 2026-02-21T09:20:57.6707643Z // begin inline asm 2026-02-21T09:20:57.6707764Z st.global.v4.b32 [ %rd142 + 0 ], { %r3212, %r3213, %r3214, %r3215 }; 2026-02-21T09:20:57.6707901Z // end inline asm 2026-02-21T09:20:57.6707961Z // begin inline asm 2026-02-21T09:20:57.6708081Z st.global.v4.b32 [ %rd143 + 0 ], { %r3216, %r3217, %r3218, %r3219 }; 2026-02-21T09:20:57.6708138Z // end inline asm 2026-02-21T09:20:57.6708197Z // begin inline asm 2026-02-21T09:20:57.6708368Z st.global.v4.b32 [ %rd144 + 0 ], { %r3220, %r3221, %r3222, %r3223 }; 2026-02-21T09:20:57.6708430Z // end inline asm 2026-02-21T09:20:57.6708655Z .loc 1 22 121 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:22:121 2026-02-21T09:20:57.6708718Z add.s32 %r3276, %r5159, 3; 2026-02-21T09:20:57.6708922Z .loc 1 28 35 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:28:35 2026-02-21T09:20:57.6708987Z shr.s32 %r3277, %r3276, 31; 2026-02-21T09:20:57.6709046Z shr.u32 %r3278, %r3277, 22; 2026-02-21T09:20:57.6709112Z add.s32 %r3279, %r3276, %r3278; 2026-02-21T09:20:57.6709172Z shr.s32 %r3280, %r3279, 10; 2026-02-21T09:20:57.6709372Z .loc 1 29 33 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:29:33 2026-02-21T09:20:57.6709433Z shl.b32 %r3281, %r3280, 4; 2026-02-21T09:20:57.6709631Z .loc 1 30 39 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:30:39 2026-02-21T09:20:57.6709692Z sub.s32 %r3282, 64, %r3281; 
2026-02-21T09:20:57.6709889Z .loc 1 30 52 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:30:52 2026-02-21T09:20:57.6709954Z min.s32 %r3283, %r3282, 16; 2026-02-21T09:20:57.6710149Z .loc 1 31 45 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:31:45 2026-02-21T09:20:57.6710215Z and.b32 %r3284, %r3279, -1024; 2026-02-21T09:20:57.6710287Z sub.s32 %r3285, %r3276, %r3284; 2026-02-21T09:20:57.6710492Z .loc 1 32 51 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:32:51 2026-02-21T09:20:57.6710555Z div.s32 %r3286, %r3285, %r3283; 2026-02-21T09:20:57.6710749Z .loc 1 31 64 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:31:64 2026-02-21T09:20:57.6710817Z mul.lo.s32 %r3287, %r3286, %r3283; 2026-02-21T09:20:57.6710879Z sub.s32 %r3288, %r3285, %r3287; 2026-02-21T09:20:57.6711076Z .loc 1 31 30 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:31:30 2026-02-21T09:20:57.6711139Z add.s32 %r3289, %r3288, %r3281; 2026-02-21T09:20:57.6711412Z .loc 1 33 27 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:33:27 2026-02-21T09:20:57.6711475Z shl.b32 %r305, %r3289, 7; 2026-02-21T09:20:57.6711673Z .loc 1 35 27 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:35:27 2026-02-21T09:20:57.6711794Z shl.b32 %r306, %r3286, 8; 2026-02-21T09:20:57.6711989Z .loc 1 36 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:36:32 2026-02-21T09:20:57.6712054Z or.b32 %r3290, %r306, %r11; 2026-02-21T09:20:57.6712251Z .loc 1 51 53 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:53 2026-02-21T09:20:57.6712312Z shl.b32 %r3291, %r3290, 10; 2026-02-21T09:20:57.6712556Z .loc 1 51 60 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:60 2026-02-21T09:20:57.6712624Z or.b32 %r3292, %r3291, %r19; 2026-02-21T09:20:57.6712818Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6712891Z mad.wide.s32 %rd145, %r3292, 2, %rd27; 2026-02-21T09:20:57.6713085Z .loc 1 51 80 // 
ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6713141Z bar.sync 0; 2026-02-21T09:20:57.6713200Z mov.b32 %r3225, 8; 2026-02-21T09:20:57.6713261Z // begin inline asm 2026-02-21T09:20:57.6713399Z cp.async.ca.shared.global [ %r21 + 0 ], [ %rd145 + 0 ], 0x8, %r3225; 2026-02-21T09:20:57.6713456Z // end inline asm 2026-02-21T09:20:57.6713522Z cp.async.commit_group; 2026-02-21T09:20:57.6713770Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6713835Z cvt.s64.s32 %rd156, %r3291; 2026-02-21T09:20:57.6713901Z or.b64 %rd157, %rd156, %rd10; 2026-02-21T09:20:57.6713973Z shl.b64 %rd158, %rd157, 1; 2026-02-21T09:20:57.6714043Z add.s64 %rd159, %rd27, %rd158; 2026-02-21T09:20:57.6714104Z add.s64 %rd146, %rd159, 32; 2026-02-21T09:20:57.6714304Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6714363Z // begin inline asm 2026-02-21T09:20:57.6714498Z cp.async.ca.shared.global [ %r22 + 0 ], [ %rd146 + 0 ], 0x8, %r3225; 2026-02-21T09:20:57.6714558Z // end inline asm 2026-02-21T09:20:57.6714627Z cp.async.commit_group; 2026-02-21T09:20:57.6714823Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6714885Z add.s64 %rd147, %rd159, 64; 2026-02-21T09:20:57.6715084Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6715140Z bar.sync 0; 2026-02-21T09:20:57.6715200Z // begin inline asm 2026-02-21T09:20:57.6715337Z cp.async.ca.shared.global [ %r23 + 0 ], [ %rd147 + 0 ], 0x8, %r3225; 2026-02-21T09:20:57.6715395Z // end inline asm 2026-02-21T09:20:57.6715461Z cp.async.commit_group; 2026-02-21T09:20:57.6715657Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6715725Z add.s64 %rd148, %rd159, 96; 2026-02-21T09:20:57.6715920Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 
2026-02-21T09:20:57.6715979Z // begin inline asm 2026-02-21T09:20:57.6716110Z cp.async.ca.shared.global [ %r24 + 0 ], [ %rd148 + 0 ], 0x8, %r3225; 2026-02-21T09:20:57.6716166Z // end inline asm 2026-02-21T09:20:57.6716241Z cp.async.commit_group; 2026-02-21T09:20:57.6716444Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6716639Z add.s64 %rd149, %rd159, 128; 2026-02-21T09:20:57.6716836Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6716892Z bar.sync 0; 2026-02-21T09:20:57.6716954Z // begin inline asm 2026-02-21T09:20:57.6717082Z cp.async.ca.shared.global [ %r25 + 0 ], [ %rd149 + 0 ], 0x8, %r3225; 2026-02-21T09:20:57.6717218Z // end inline asm 2026-02-21T09:20:57.6717286Z cp.async.commit_group; 2026-02-21T09:20:57.6717482Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6717607Z add.s64 %rd150, %rd159, 160; 2026-02-21T09:20:57.6717805Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6717863Z // begin inline asm 2026-02-21T09:20:57.6717991Z cp.async.ca.shared.global [ %r26 + 0 ], [ %rd150 + 0 ], 0x8, %r3225; 2026-02-21T09:20:57.6718047Z // end inline asm 2026-02-21T09:20:57.6718114Z cp.async.commit_group; 2026-02-21T09:20:57.6718370Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6718433Z add.s64 %rd151, %rd159, 192; 2026-02-21T09:20:57.6718631Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6718687Z bar.sync 0; 2026-02-21T09:20:57.6718744Z // begin inline asm 2026-02-21T09:20:57.6718876Z cp.async.ca.shared.global [ %r27 + 0 ], [ %rd151 + 0 ], 0x8, %r3225; 2026-02-21T09:20:57.6718931Z // end inline asm 2026-02-21T09:20:57.6718995Z cp.async.commit_group; 2026-02-21T09:20:57.6719193Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 
2026-02-21T09:20:57.6719257Z add.s64 %rd152, %rd159, 224; 2026-02-21T09:20:57.6719513Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6719575Z // begin inline asm 2026-02-21T09:20:57.6719705Z cp.async.ca.shared.global [ %r28 + 0 ], [ %rd152 + 0 ], 0x8, %r3225; 2026-02-21T09:20:57.6719760Z // end inline asm 2026-02-21T09:20:57.6719825Z cp.async.commit_group; 2026-02-21T09:20:57.6720021Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6720082Z add.s64 %rd153, %rd159, 256; 2026-02-21T09:20:57.6720275Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6720330Z bar.sync 0; 2026-02-21T09:20:57.6720391Z // begin inline asm 2026-02-21T09:20:57.6720522Z cp.async.ca.shared.global [ %r29 + 0 ], [ %rd153 + 0 ], 0x8, %r3225; 2026-02-21T09:20:57.6720578Z // end inline asm 2026-02-21T09:20:57.6720643Z cp.async.commit_group; 2026-02-21T09:20:57.6720838Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6720902Z add.s64 %rd154, %rd159, 288; 2026-02-21T09:20:57.6721104Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6721174Z // begin inline asm 2026-02-21T09:20:57.6721304Z cp.async.ca.shared.global [ %r30 + 0 ], [ %rd154 + 0 ], 0x8, %r3225; 2026-02-21T09:20:57.6721360Z // end inline asm 2026-02-21T09:20:57.6721428Z cp.async.commit_group; 2026-02-21T09:20:57.6721634Z .loc 1 43 78 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:43:78 2026-02-21T09:20:57.6721695Z shl.b32 %r3293, %r3286, 18; 2026-02-21T09:20:57.6721758Z or.b32 %r5269, %r45, %r3293; 2026-02-21T09:20:57.6721820Z add.s32 %r5268, %r46, %r305; 2026-02-21T09:20:57.6721880Z mov.b32 %r3928, 0f00000000; 2026-02-21T09:20:57.6721939Z mov.b32 %r5271, 4; 2026-02-21T09:20:57.6722000Z mov.b32 %r5270, -1; 2026-02-21T09:20:57.6722058Z mov.b64 %rd222, -16; 
2026-02-21T09:20:57.6722118Z mov.b32 %r3929, %r3928; 2026-02-21T09:20:57.6722181Z mov.b32 %r3930, %r3928; 2026-02-21T09:20:57.6722238Z mov.b32 %r3931, %r3928; 2026-02-21T09:20:57.6722295Z mov.b32 %r3932, %r3928; 2026-02-21T09:20:57.6722354Z mov.b32 %r3933, %r3928; 2026-02-21T09:20:57.6722414Z mov.b32 %r3934, %r3928; 2026-02-21T09:20:57.6722471Z mov.b32 %r3935, %r3928; 2026-02-21T09:20:57.6722528Z mov.b32 %r3936, %r3928; 2026-02-21T09:20:57.6722587Z mov.b32 %r3937, %r3928; 2026-02-21T09:20:57.6722706Z mov.b32 %r3938, %r3928; 2026-02-21T09:20:57.6722763Z mov.b32 %r3939, %r3928; 2026-02-21T09:20:57.6722821Z mov.b32 %r3940, %r3928; 2026-02-21T09:20:57.6722879Z mov.b32 %r3941, %r3928; 2026-02-21T09:20:57.6722935Z mov.b32 %r3942, %r3928; 2026-02-21T09:20:57.6723039Z mov.b32 %r3943, %r3928; 2026-02-21T09:20:57.6723099Z mov.b32 %r3944, %r3928; 2026-02-21T09:20:57.6723155Z mov.b32 %r3945, %r3928; 2026-02-21T09:20:57.6723212Z mov.b32 %r3946, %r3928; 2026-02-21T09:20:57.6723271Z mov.b32 %r3947, %r3928; 2026-02-21T09:20:57.6723330Z mov.b32 %r3948, %r3928; 2026-02-21T09:20:57.6723387Z mov.b32 %r3949, %r3928; 2026-02-21T09:20:57.6723444Z mov.b32 %r3950, %r3928; 2026-02-21T09:20:57.6723503Z mov.b32 %r3951, %r3928; 2026-02-21T09:20:57.6723611Z mov.b32 %r3952, %r3928; 2026-02-21T09:20:57.6723670Z mov.b32 %r3953, %r3928; 2026-02-21T09:20:57.6723729Z mov.b32 %r3954, %r3928; 2026-02-21T09:20:57.6723785Z mov.b32 %r3955, %r3928; 2026-02-21T09:20:57.6723843Z mov.b32 %r3956, %r3928; 2026-02-21T09:20:57.6723901Z mov.b32 %r3957, %r3928; 2026-02-21T09:20:57.6723962Z mov.b32 %r3958, %r3928; 2026-02-21T09:20:57.6724020Z mov.b32 %r3959, %r3928; 2026-02-21T09:20:57.6724133Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T09:20:57.6724246Z // => This Inner Loop Header: Depth=2 2026-02-21T09:20:57.6724308Z add.s64 %rd222, %rd222, 16; 2026-02-21T09:20:57.6724377Z setp.lt.u64 %p43, %rd222, 432; 2026-02-21T09:20:57.6724442Z add.s32 %r4070, %r5270, 1; 2026-02-21T09:20:57.6724565Z setp.gt.s32 %p44, 
%r4070, 4; 2026-02-21T09:20:57.6724636Z selp.b32 %r5270, 0, %r4070, %p44; 2026-02-21T09:20:57.6724839Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6724911Z cp.async.wait_group 8; 2026-02-21T09:20:57.6724966Z bar.sync 0; 2026-02-21T09:20:57.6725025Z shl.b32 %r4071, %r5270, 13; 2026-02-21T09:20:57.6725092Z add.s32 %r4073, %r5136, %r4071; 2026-02-21T09:20:57.6725289Z .loc 1 55 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:55:32 2026-02-21T09:20:57.6725352Z add.s32 %r4074, %r4073, %r31; 2026-02-21T09:20:57.6725421Z ld.shared.b16 %rs117, [%r4074]; 2026-02-21T09:20:57.6725492Z ld.shared.b16 %rs118, [%r4074+256]; 2026-02-21T09:20:57.6725559Z ld.shared.b16 %rs119, [%r4074+16]; 2026-02-21T09:20:57.6725625Z ld.shared.b16 %rs120, [%r4074+272]; 2026-02-21T09:20:57.6725688Z add.s32 %r4075, %r4073, %r32; 2026-02-21T09:20:57.6725752Z ld.shared.b16 %rs121, [%r4075]; 2026-02-21T09:20:57.6725817Z ld.shared.b16 %rs122, [%r4075+256]; 2026-02-21T09:20:57.6725884Z ld.shared.b16 %rs123, [%r4075+16]; 2026-02-21T09:20:57.6725948Z ld.shared.b16 %rs124, [%r4075+272]; 2026-02-21T09:20:57.6726012Z cvt.f32.bf16 %r3590, %rs117; 2026-02-21T09:20:57.6726072Z cvt.f32.bf16 %r3591, %rs118; 2026-02-21T09:20:57.6726135Z cvt.f32.bf16 %r3592, %rs121; 2026-02-21T09:20:57.6726197Z cvt.f32.bf16 %r3593, %rs122; 2026-02-21T09:20:57.6726258Z cvt.f32.bf16 %r3658, %rs119; 2026-02-21T09:20:57.6726320Z cvt.f32.bf16 %r3659, %rs120; 2026-02-21T09:20:57.6726379Z cvt.f32.bf16 %r3660, %rs123; 2026-02-21T09:20:57.6726437Z cvt.f32.bf16 %r3661, %rs124; 2026-02-21T09:20:57.6726769Z .loc 1 57 34 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:57:34 2026-02-21T09:20:57.6726838Z cvt.s64.s32 %rd174, %r5268; 2026-02-21T09:20:57.6726900Z add.s64 %rd161, %rd28, %rd174; 2026-02-21T09:20:57.6727099Z .loc 1 57 87 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:57:87 2026-02-21T09:20:57.6727160Z // begin inline asm 2026-02-21T09:20:57.6727219Z 
mov.u64 %rd160, 0x0; 2026-02-21T09:20:57.6727352Z createpolicy.fractional.L2::evict_first.b64 %rd160, 1.0; 2026-02-21T09:20:57.6727411Z // end inline asm 2026-02-21T09:20:57.6727469Z // begin inline asm 2026-02-21T09:20:57.6727526Z mov.u16 %rs115, 0x0; 2026-02-21T09:20:57.6727770Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs115 }, [ %rd161 + 0 ], %rd160; 2026-02-21T09:20:57.6727828Z // end inline asm 2026-02-21T09:20:57.6728026Z .loc 1 65 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:65:28 2026-02-21T09:20:57.6728155Z st.shared.b8 [%r33], %rs115; 2026-02-21T09:20:57.6728212Z bar.sync 0; 2026-02-21T09:20:57.6728293Z ld.shared.v2.b8 {%rs125, %rs126}, [%r34]; 2026-02-21T09:20:57.6728490Z .loc 1 60 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:60:28 2026-02-21T09:20:57.6728557Z shl.b16 %rs127, %rs125, 4; 2026-02-21T09:20:57.6728618Z shl.b16 %rs128, %rs126, 4; 2026-02-21T09:20:57.6728811Z .loc 1 75 58 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:75:58 2026-02-21T09:20:57.6728965Z selp.b16 %rs129, %rs127, %rs125, %p63; 2026-02-21T09:20:57.6729032Z cvt.s16.s8 %rs130, %rs129; 2026-02-21T09:20:57.6729092Z shr.s16 %rs131, %rs130, 4; 2026-02-21T09:20:57.6729165Z selp.b16 %rs132, %rs128, %rs126, %p63; 2026-02-21T09:20:57.6729228Z cvt.s16.s8 %rs133, %rs132; 2026-02-21T09:20:57.6729286Z shr.s16 %rs134, %rs133, 4; 2026-02-21T09:20:57.6729483Z .loc 1 80 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:80:32 2026-02-21T09:20:57.6729551Z cvt.rn.f32.s16 %r4076, %rs131; 2026-02-21T09:20:57.6729612Z cvt.rn.f32.s16 %r4077, %rs134; 2026-02-21T09:20:57.6729665Z bar.sync 0; 2026-02-21T09:20:57.6729729Z st.shared.b32 [%r35], %r4076; 2026-02-21T09:20:57.6729810Z st.shared.b32 [%r36], %r4077; 2026-02-21T09:20:57.6730021Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3928}; 2026-02-21T09:20:57.6730078Z bar.sync 0; 2026-02-21T09:20:57.6730140Z // begin inline asm 2026-02-21T09:20:57.6730294Z 
ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3458, %r3594}, [%r578]; 2026-02-21T09:20:57.6730351Z // end inline asm 2026-02-21T09:20:57.6730405Z bar.sync 0; 2026-02-21T09:20:57.6730544Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3930}; 2026-02-21T09:20:57.6730599Z bar.sync 0; 2026-02-21T09:20:57.6730658Z // begin inline asm 2026-02-21T09:20:57.6730807Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3460, %r3596}, [%r578]; 2026-02-21T09:20:57.6730864Z // end inline asm 2026-02-21T09:20:57.6730919Z bar.sync 0; 2026-02-21T09:20:57.6731049Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3929}; 2026-02-21T09:20:57.6731102Z bar.sync 0; 2026-02-21T09:20:57.6731159Z // begin inline asm 2026-02-21T09:20:57.6731301Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3459, %r3595}, [%r578]; 2026-02-21T09:20:57.6731369Z // end inline asm 2026-02-21T09:20:57.6731434Z bar.sync 0; 2026-02-21T09:20:57.6731564Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3931}; 2026-02-21T09:20:57.6731619Z bar.sync 0; 2026-02-21T09:20:57.6731679Z // begin inline asm 2026-02-21T09:20:57.6731822Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3461, %r3597}, [%r578]; 2026-02-21T09:20:57.6731879Z // end inline asm 2026-02-21T09:20:57.6731937Z bar.sync 0; 2026-02-21T09:20:57.6732063Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3932}; 2026-02-21T09:20:57.6732117Z bar.sync 0; 2026-02-21T09:20:57.6732177Z // begin inline asm 2026-02-21T09:20:57.6732321Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3462, %r3598}, [%r578]; 2026-02-21T09:20:57.6732378Z // end inline asm 2026-02-21T09:20:57.6732433Z bar.sync 0; 2026-02-21T09:20:57.6732562Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3934}; 2026-02-21T09:20:57.6732615Z bar.sync 0; 2026-02-21T09:20:57.6732673Z // begin inline asm 2026-02-21T09:20:57.6732823Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3464, %r3600}, [%r578]; 2026-02-21T09:20:57.6732881Z // end inline asm 2026-02-21T09:20:57.6732935Z bar.sync 0; 
2026-02-21T09:20:57.6733067Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3933}; 2026-02-21T09:20:57.6733121Z bar.sync 0; 2026-02-21T09:20:57.6733178Z // begin inline asm 2026-02-21T09:20:57.6733322Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3463, %r3599}, [%r578]; 2026-02-21T09:20:57.6733443Z // end inline asm 2026-02-21T09:20:57.6733507Z bar.sync 0; 2026-02-21T09:20:57.6733640Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3935}; 2026-02-21T09:20:57.6733695Z bar.sync 0; 2026-02-21T09:20:57.6733801Z // begin inline asm 2026-02-21T09:20:57.6733942Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3465, %r3601}, [%r578]; 2026-02-21T09:20:57.6733997Z // end inline asm 2026-02-21T09:20:57.6734054Z bar.sync 0; 2026-02-21T09:20:57.6734180Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3936}; 2026-02-21T09:20:57.6734236Z bar.sync 0; 2026-02-21T09:20:57.6734297Z // begin inline asm 2026-02-21T09:20:57.6734439Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3466, %r3602}, [%r578]; 2026-02-21T09:20:57.6734495Z // end inline asm 2026-02-21T09:20:57.6734596Z bar.sync 0; 2026-02-21T09:20:57.6734727Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3938}; 2026-02-21T09:20:57.6734781Z bar.sync 0; 2026-02-21T09:20:57.6734841Z // begin inline asm 2026-02-21T09:20:57.6734986Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3468, %r3604}, [%r578]; 2026-02-21T09:20:57.6735041Z // end inline asm 2026-02-21T09:20:57.6735094Z bar.sync 0; 2026-02-21T09:20:57.6735222Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3937}; 2026-02-21T09:20:57.6735279Z bar.sync 0; 2026-02-21T09:20:57.6735337Z // begin inline asm 2026-02-21T09:20:57.6735478Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3467, %r3603}, [%r578]; 2026-02-21T09:20:57.6735535Z // end inline asm 2026-02-21T09:20:57.6735589Z bar.sync 0; 2026-02-21T09:20:57.6735774Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3939}; 2026-02-21T09:20:57.6735833Z bar.sync 0; 2026-02-21T09:20:57.6735890Z // begin 
inline asm 2026-02-21T09:20:57.6736034Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3469, %r3605}, [%r578]; 2026-02-21T09:20:57.6736089Z // end inline asm 2026-02-21T09:20:57.6736144Z bar.sync 0; 2026-02-21T09:20:57.6736269Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3940}; 2026-02-21T09:20:57.6736325Z bar.sync 0; 2026-02-21T09:20:57.6736398Z // begin inline asm 2026-02-21T09:20:57.6736664Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3470, %r3606}, [%r578]; 2026-02-21T09:20:57.6736724Z // end inline asm 2026-02-21T09:20:57.6736779Z bar.sync 0; 2026-02-21T09:20:57.6736908Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3942}; 2026-02-21T09:20:57.6736961Z bar.sync 0; 2026-02-21T09:20:57.6737018Z // begin inline asm 2026-02-21T09:20:57.6737161Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3472, %r3608}, [%r578]; 2026-02-21T09:20:57.6737218Z // end inline asm 2026-02-21T09:20:57.6737270Z bar.sync 0; 2026-02-21T09:20:57.6737397Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3941}; 2026-02-21T09:20:57.6737450Z bar.sync 0; 2026-02-21T09:20:57.6737507Z // begin inline asm 2026-02-21T09:20:57.6737648Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3471, %r3607}, [%r578]; 2026-02-21T09:20:57.6737705Z // end inline asm 2026-02-21T09:20:57.6737760Z bar.sync 0; 2026-02-21T09:20:57.6737887Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3943}; 2026-02-21T09:20:57.6737942Z bar.sync 0; 2026-02-21T09:20:57.6738000Z // begin inline asm 2026-02-21T09:20:57.6738140Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3473, %r3609}, [%r578]; 2026-02-21T09:20:57.6738196Z // end inline asm 2026-02-21T09:20:57.6738252Z bar.sync 0; 2026-02-21T09:20:57.6738377Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3944}; 2026-02-21T09:20:57.6738430Z bar.sync 0; 2026-02-21T09:20:57.6738493Z // begin inline asm 2026-02-21T09:20:57.6738647Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3474, %r3610}, [%r578]; 2026-02-21T09:20:57.6738704Z // end inline asm 
2026-02-21T09:20:57.6738758Z bar.sync 0; 2026-02-21T09:20:57.6738889Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3946}; 2026-02-21T09:20:57.6738942Z bar.sync 0; 2026-02-21T09:20:57.6739000Z // begin inline asm 2026-02-21T09:20:57.6739145Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3476, %r3612}, [%r578]; 2026-02-21T09:20:57.6739287Z // end inline asm 2026-02-21T09:20:57.6739341Z bar.sync 0; 2026-02-21T09:20:57.6739471Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3945}; 2026-02-21T09:20:57.6739524Z bar.sync 0; 2026-02-21T09:20:57.6739642Z // begin inline asm 2026-02-21T09:20:57.6739784Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3475, %r3611}, [%r578]; 2026-02-21T09:20:57.6739841Z // end inline asm 2026-02-21T09:20:57.6739895Z bar.sync 0; 2026-02-21T09:20:57.6740020Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3947}; 2026-02-21T09:20:57.6740077Z bar.sync 0; 2026-02-21T09:20:57.6740135Z // begin inline asm 2026-02-21T09:20:57.6740276Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3477, %r3613}, [%r578]; 2026-02-21T09:20:57.6740395Z // end inline asm 2026-02-21T09:20:57.6740457Z bar.sync 0; 2026-02-21T09:20:57.6740583Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3948}; 2026-02-21T09:20:57.6740637Z bar.sync 0; 2026-02-21T09:20:57.6740698Z // begin inline asm 2026-02-21T09:20:57.6740839Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3478, %r3614}, [%r578]; 2026-02-21T09:20:57.6740893Z // end inline asm 2026-02-21T09:20:57.6740946Z bar.sync 0; 2026-02-21T09:20:57.6741075Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3950}; 2026-02-21T09:20:57.6741132Z bar.sync 0; 2026-02-21T09:20:57.6741190Z // begin inline asm 2026-02-21T09:20:57.6741334Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3480, %r3616}, [%r578]; 2026-02-21T09:20:57.6741388Z // end inline asm 2026-02-21T09:20:57.6741442Z bar.sync 0; 2026-02-21T09:20:57.6741633Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3949}; 2026-02-21T09:20:57.6741691Z bar.sync 0; 
2026-02-21T09:20:57.6741748Z // begin inline asm 2026-02-21T09:20:57.6741893Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3479, %r3615}, [%r578]; 2026-02-21T09:20:57.6741953Z // end inline asm 2026-02-21T09:20:57.6742006Z bar.sync 0; 2026-02-21T09:20:57.6742132Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3951}; 2026-02-21T09:20:57.6742188Z bar.sync 0; 2026-02-21T09:20:57.6742244Z // begin inline asm 2026-02-21T09:20:57.6742389Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3481, %r3617}, [%r578]; 2026-02-21T09:20:57.6742454Z // end inline asm 2026-02-21T09:20:57.6742518Z bar.sync 0; 2026-02-21T09:20:57.6742645Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3952}; 2026-02-21T09:20:57.6742700Z bar.sync 0; 2026-02-21T09:20:57.6742760Z // begin inline asm 2026-02-21T09:20:57.6742904Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3482, %r3618}, [%r578]; 2026-02-21T09:20:57.6742962Z // end inline asm 2026-02-21T09:20:57.6743016Z bar.sync 0; 2026-02-21T09:20:57.6743145Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3954}; 2026-02-21T09:20:57.6743198Z bar.sync 0; 2026-02-21T09:20:57.6743257Z // begin inline asm 2026-02-21T09:20:57.6743403Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3484, %r3620}, [%r578]; 2026-02-21T09:20:57.6743459Z // end inline asm 2026-02-21T09:20:57.6743514Z bar.sync 0; 2026-02-21T09:20:57.6743641Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3953}; 2026-02-21T09:20:57.6743694Z bar.sync 0; 2026-02-21T09:20:57.6743751Z // begin inline asm 2026-02-21T09:20:57.6743895Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3483, %r3619}, [%r578]; 2026-02-21T09:20:57.6743957Z // end inline asm 2026-02-21T09:20:57.6744013Z bar.sync 0; 2026-02-21T09:20:57.6744139Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3955}; 2026-02-21T09:20:57.6744196Z bar.sync 0; 2026-02-21T09:20:57.6744267Z // begin inline asm 2026-02-21T09:20:57.6744414Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3485, %r3621}, [%r578]; 2026-02-21T09:20:57.6744470Z 
// end inline asm 2026-02-21T09:20:57.6744528Z bar.sync 0; 2026-02-21T09:20:57.6744656Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3956}; 2026-02-21T09:20:57.6744709Z bar.sync 0; 2026-02-21T09:20:57.6744768Z // begin inline asm 2026-02-21T09:20:57.6744912Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3486, %r3622}, [%r578]; 2026-02-21T09:20:57.6745029Z // end inline asm 2026-02-21T09:20:57.6745082Z bar.sync 0; 2026-02-21T09:20:57.6745213Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3958}; 2026-02-21T09:20:57.6745277Z bar.sync 0; 2026-02-21T09:20:57.6745386Z // begin inline asm 2026-02-21T09:20:57.6745533Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3488, %r3624}, [%r578]; 2026-02-21T09:20:57.6745589Z // end inline asm 2026-02-21T09:20:57.6745643Z bar.sync 0; 2026-02-21T09:20:57.6745775Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3957}; 2026-02-21T09:20:57.6745831Z bar.sync 0; 2026-02-21T09:20:57.6745889Z // begin inline asm 2026-02-21T09:20:57.6746031Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3487, %r3623}, [%r578]; 2026-02-21T09:20:57.6746142Z // end inline asm 2026-02-21T09:20:57.6746199Z bar.sync 0; 2026-02-21T09:20:57.6746327Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1079], {%r3959}; 2026-02-21T09:20:57.6746383Z bar.sync 0; 2026-02-21T09:20:57.6746442Z // begin inline asm 2026-02-21T09:20:57.6746700Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3489, %r3625}, [%r578]; 2026-02-21T09:20:57.6746755Z // end inline asm 2026-02-21T09:20:57.6746812Z $L__tmp13: 2026-02-21T09:20:57.6747106Z .loc 2 291 36 // standard.py:291:36 @[ ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:87:40 ] 2026-02-21T09:20:57.6747168Z // begin inline asm 2026-02-21T09:20:57.6747252Z fence.proxy.async.shared::cta; 2026-02-21T09:20:57.6747309Z // end inline asm 2026-02-21T09:20:57.6747467Z shfl.sync.idx.b32 %r4078, %r6, 0, 31, -1; 2026-02-21T09:20:57.6747544Z wgmma.fence.sync.aligned; 2026-02-21T09:20:57.6747607Z mov.pred %p36, -1; 
2026-02-21T09:20:57.6747665Z // begin inline asm 2026-02-21T09:20:57.6748494Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3458,%r3459,%r3460,%r3461,%r3462,%r3463,%r3464,%r3465,%r3466,%r3467,%r3468,%r3469,%r3470,%r3471,%r3472,%r3473,%r3474,%r3475,%r3476,%r3477,%r3478,%r3479,%r3480,%r3481,%r3482,%r3483,%r3484,%r3485,%r3486,%r3487,%r3488,%r3489}, {%r3590,%r3591,%r3592,%r3593}, %rd1, %p36, 1, 1; 2026-02-21T09:20:57.6748561Z // end inline asm 2026-02-21T09:20:57.6748622Z // begin inline asm 2026-02-21T09:20:57.6749368Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3458,%r3459,%r3460,%r3461,%r3462,%r3463,%r3464,%r3465,%r3466,%r3467,%r3468,%r3469,%r3470,%r3471,%r3472,%r3473,%r3474,%r3475,%r3476,%r3477,%r3478,%r3479,%r3480,%r3481,%r3482,%r3483,%r3484,%r3485,%r3486,%r3487,%r3488,%r3489}, {%r3658,%r3659,%r3660,%r3661}, %rd2, %p36, 1, 1; 2026-02-21T09:20:57.6749431Z // end inline asm 2026-02-21T09:20:57.6749489Z // begin inline asm 2026-02-21T09:20:57.6750234Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3594,%r3595,%r3596,%r3597,%r3598,%r3599,%r3600,%r3601,%r3602,%r3603,%r3604,%r3605,%r3606,%r3607,%r3608,%r3609,%r3610,%r3611,%r3612,%r3613,%r3614,%r3615,%r3616,%r3617,%r3618,%r3619,%r3620,%r3621,%r3622,%r3623,%r3624,%r3625}, {%r3590,%r3591,%r3592,%r3593}, %rd3, %p36, 1, 1; 2026-02-21T09:20:57.6750294Z // end inline asm 2026-02-21T09:20:57.6750351Z // begin inline asm 2026-02-21T09:20:57.6751092Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3594,%r3595,%r3596,%r3597,%r3598,%r3599,%r3600,%r3601,%r3602,%r3603,%r3604,%r3605,%r3606,%r3607,%r3608,%r3609,%r3610,%r3611,%r3612,%r3613,%r3614,%r3615,%r3616,%r3617,%r3618,%r3619,%r3620,%r3621,%r3622,%r3623,%r3624,%r3625}, {%r3658,%r3659,%r3660,%r3661}, %rd4, %p36, 1, 1; 2026-02-21T09:20:57.6751151Z // end inline asm 2026-02-21T09:20:57.6751229Z wgmma.commit_group.sync.aligned; 2026-02-21T09:20:57.6751287Z mov.b32 %r4030, 0; 2026-02-21T09:20:57.6751352Z mov.b32 %r3726, %r481; 
2026-02-21T09:20:57.6751413Z mov.b32 %r3727, %r4030; 2026-02-21T09:20:57.6751471Z mov.b32 %r3728, %r4030; 2026-02-21T09:20:57.6751544Z // begin inline asm 2026-02-21T09:20:57.6752623Z // wait for regs: %r3458,%r3459,%r3460,%r3461,%r3462,%r3463,%r3464,%r3465,%r3466,%r3467,%r3468,%r3469,%r3470,%r3471,%r3472,%r3473,%r3474,%r3475,%r3476,%r3477,%r3478,%r3479,%r3480,%r3481,%r3482,%r3483,%r3484,%r3485,%r3486,%r3487,%r3488,%r3489,%r3594,%r3595,%r3596,%r3597,%r3598,%r3599,%r3600,%r3601,%r3602,%r3603,%r3604,%r3605,%r3606,%r3607,%r3608,%r3609,%r3610,%r3611,%r3612,%r3613,%r3614,%r3615,%r3616,%r3617,%r3618,%r3619,%r3620,%r3621,%r3622,%r3623,%r3624,%r3625,%r3726,%r3727,%r3728 2026-02-21T09:20:57.6752852Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:20:57.6752911Z // end inline asm 2026-02-21T09:20:57.6752965Z $L__tmp14: 2026-02-21T09:20:57.6753175Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6753245Z add.s32 %r4079, %r4073, 40960; 2026-02-21T09:20:57.6753446Z .loc 1 55 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:55:32 2026-02-21T09:20:57.6753569Z add.s32 %r4080, %r4079, %r31; 2026-02-21T09:20:57.6753638Z ld.shared.b16 %rs135, [%r4080]; 2026-02-21T09:20:57.6753710Z ld.shared.b16 %rs136, [%r4080+256]; 2026-02-21T09:20:57.6753778Z ld.shared.b16 %rs137, [%r4080+16]; 2026-02-21T09:20:57.6753845Z ld.shared.b16 %rs138, [%r4080+272]; 2026-02-21T09:20:57.6753907Z add.s32 %r4081, %r4079, %r32; 2026-02-21T09:20:57.6753970Z ld.shared.b16 %rs139, [%r4081]; 2026-02-21T09:20:57.6754035Z ld.shared.b16 %rs140, [%r4081+256]; 2026-02-21T09:20:57.6754107Z ld.shared.b16 %rs141, [%r4081+16]; 2026-02-21T09:20:57.6754172Z ld.shared.b16 %rs142, [%r4081+272]; 2026-02-21T09:20:57.6754237Z cvt.f32.bf16 %r3924, %rs135; 2026-02-21T09:20:57.6754298Z cvt.f32.bf16 %r3925, %rs136; 2026-02-21T09:20:57.6754364Z cvt.f32.bf16 %r3926, %rs139; 2026-02-21T09:20:57.6754473Z cvt.f32.bf16 %r3927, %rs140; 2026-02-21T09:20:57.6754536Z cvt.f32.bf16 
%r3992, %rs137; 2026-02-21T09:20:57.6754598Z cvt.f32.bf16 %r3993, %rs138; 2026-02-21T09:20:57.6754659Z cvt.f32.bf16 %r3994, %rs141; 2026-02-21T09:20:57.6754719Z cvt.f32.bf16 %r3995, %rs142; 2026-02-21T09:20:57.6754918Z .loc 1 57 34 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:57:34 2026-02-21T09:20:57.6754984Z add.s32 %r4082, %r5268, 65536; 2026-02-21T09:20:57.6755048Z cvt.s64.s32 %rd175, %r4082; 2026-02-21T09:20:57.6755123Z add.s64 %rd168, %rd28, %rd175; 2026-02-21T09:20:57.6755325Z .loc 1 57 87 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:57:87 2026-02-21T09:20:57.6755386Z // begin inline asm 2026-02-21T09:20:57.6755446Z mov.u64 %rd167, 0x0; 2026-02-21T09:20:57.6755578Z createpolicy.fractional.L2::evict_first.b64 %rd167, 1.0; 2026-02-21T09:20:57.6755634Z // end inline asm 2026-02-21T09:20:57.6755693Z // begin inline asm 2026-02-21T09:20:57.6755752Z mov.u16 %rs116, 0x0; 2026-02-21T09:20:57.6755921Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs116 }, [ %rd168 + 0 ], %rd167; 2026-02-21T09:20:57.6755977Z // end inline asm 2026-02-21T09:20:57.6756177Z .loc 1 65 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:65:28 2026-02-21T09:20:57.6756235Z bar.sync 0; 2026-02-21T09:20:57.6756301Z st.shared.b8 [%r33], %rs116; 2026-02-21T09:20:57.6756356Z bar.sync 0; 2026-02-21T09:20:57.6756438Z ld.shared.v2.b8 {%rs143, %rs144}, [%r34]; 2026-02-21T09:20:57.6756757Z .loc 1 60 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:60:28 2026-02-21T09:20:57.6756825Z shl.b16 %rs145, %rs143, 4; 2026-02-21T09:20:57.6756887Z shl.b16 %rs146, %rs144, 4; 2026-02-21T09:20:57.6757089Z .loc 1 75 58 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:75:58 2026-02-21T09:20:57.6757162Z selp.b16 %rs147, %rs145, %rs143, %p63; 2026-02-21T09:20:57.6757225Z cvt.s16.s8 %rs148, %rs147; 2026-02-21T09:20:57.6757288Z shr.s16 %rs149, %rs148, 4; 2026-02-21T09:20:57.6757356Z selp.b16 %rs150, %rs146, %rs144, %p63; 2026-02-21T09:20:57.6757418Z cvt.s16.s8 
%rs151, %rs150; 2026-02-21T09:20:57.6757478Z shr.s16 %rs152, %rs151, 4; 2026-02-21T09:20:57.6757675Z .loc 1 80 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:80:32 2026-02-21T09:20:57.6757820Z cvt.rn.f32.s16 %r4083, %rs149; 2026-02-21T09:20:57.6757883Z cvt.rn.f32.s16 %r4084, %rs152; 2026-02-21T09:20:57.6757948Z bar.sync 0; 2026-02-21T09:20:57.6758020Z st.shared.b32 [%r35], %r4083; 2026-02-21T09:20:57.6758083Z st.shared.b32 [%r36], %r4084; 2026-02-21T09:20:57.6758201Z $L__tmp15: 2026-02-21T09:20:57.6758475Z .loc 2 291 36 // standard.py:291:36 @[ ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:87:40 ] 2026-02-21T09:20:57.6758629Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3458, %r3594}; 2026-02-21T09:20:57.6758684Z bar.sync 0; 2026-02-21T09:20:57.6758745Z // begin inline asm 2026-02-21T09:20:57.6758879Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3928}, [%r1079]; 2026-02-21T09:20:57.6758937Z // end inline asm 2026-02-21T09:20:57.6759059Z bar.sync 0; 2026-02-21T09:20:57.6759209Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3460, %r3596}; 2026-02-21T09:20:57.6759262Z bar.sync 0; 2026-02-21T09:20:57.6759322Z // begin inline asm 2026-02-21T09:20:57.6759455Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3930}, [%r1079]; 2026-02-21T09:20:57.6759510Z // end inline asm 2026-02-21T09:20:57.6759562Z bar.sync 0; 2026-02-21T09:20:57.6759708Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3459, %r3595}; 2026-02-21T09:20:57.6759762Z bar.sync 0; 2026-02-21T09:20:57.6759820Z // begin inline asm 2026-02-21T09:20:57.6759949Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3929}, [%r1079]; 2026-02-21T09:20:57.6760003Z // end inline asm 2026-02-21T09:20:57.6760057Z bar.sync 0; 2026-02-21T09:20:57.6760263Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3461, %r3597}; 2026-02-21T09:20:57.6760323Z bar.sync 0; 2026-02-21T09:20:57.6760380Z // begin inline asm 2026-02-21T09:20:57.6760509Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3931}, 
[%r1079]; 2026-02-21T09:20:57.6760567Z // end inline asm 2026-02-21T09:20:57.6760620Z bar.sync 0; 2026-02-21T09:20:57.6760762Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3462, %r3598}; 2026-02-21T09:20:57.6760817Z bar.sync 0; 2026-02-21T09:20:57.6760877Z // begin inline asm 2026-02-21T09:20:57.6761002Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3932}, [%r1079]; 2026-02-21T09:20:57.6761057Z // end inline asm 2026-02-21T09:20:57.6761122Z bar.sync 0; 2026-02-21T09:20:57.6761274Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3464, %r3600}; 2026-02-21T09:20:57.6761328Z bar.sync 0; 2026-02-21T09:20:57.6761388Z // begin inline asm 2026-02-21T09:20:57.6761516Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3934}, [%r1079]; 2026-02-21T09:20:57.6761580Z // end inline asm 2026-02-21T09:20:57.6761634Z bar.sync 0; 2026-02-21T09:20:57.6761780Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3463, %r3599}; 2026-02-21T09:20:57.6761834Z bar.sync 0; 2026-02-21T09:20:57.6761893Z // begin inline asm 2026-02-21T09:20:57.6762024Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3933}, [%r1079]; 2026-02-21T09:20:57.6762081Z // end inline asm 2026-02-21T09:20:57.6762137Z bar.sync 0; 2026-02-21T09:20:57.6762281Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3465, %r3601}; 2026-02-21T09:20:57.6762338Z bar.sync 0; 2026-02-21T09:20:57.6762395Z // begin inline asm 2026-02-21T09:20:57.6762521Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3935}, [%r1079]; 2026-02-21T09:20:57.6762582Z // end inline asm 2026-02-21T09:20:57.6762635Z bar.sync 0; 2026-02-21T09:20:57.6762778Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3466, %r3602}; 2026-02-21T09:20:57.6762832Z bar.sync 0; 2026-02-21T09:20:57.6762894Z // begin inline asm 2026-02-21T09:20:57.6763021Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3936}, [%r1079]; 2026-02-21T09:20:57.6763078Z // end inline asm 2026-02-21T09:20:57.6763135Z bar.sync 0; 2026-02-21T09:20:57.6763280Z stmatrix.sync.aligned.m8n8.x2.shared.b16 
[%r578], {%r3468, %r3604}; 2026-02-21T09:20:57.6763335Z bar.sync 0; 2026-02-21T09:20:57.6763396Z // begin inline asm 2026-02-21T09:20:57.6763524Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3938}, [%r1079]; 2026-02-21T09:20:57.6763645Z // end inline asm 2026-02-21T09:20:57.6763698Z bar.sync 0; 2026-02-21T09:20:57.6763843Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3467, %r3603}; 2026-02-21T09:20:57.6763896Z bar.sync 0; 2026-02-21T09:20:57.6764001Z // begin inline asm 2026-02-21T09:20:57.6764129Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3937}, [%r1079]; 2026-02-21T09:20:57.6764185Z // end inline asm 2026-02-21T09:20:57.6764238Z bar.sync 0; 2026-02-21T09:20:57.6764381Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3469, %r3605}; 2026-02-21T09:20:57.6764440Z bar.sync 0; 2026-02-21T09:20:57.6764500Z // begin inline asm 2026-02-21T09:20:57.6764626Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3939}, [%r1079]; 2026-02-21T09:20:57.6764686Z // end inline asm 2026-02-21T09:20:57.6764787Z bar.sync 0; 2026-02-21T09:20:57.6764933Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3470, %r3606}; 2026-02-21T09:20:57.6764987Z bar.sync 0; 2026-02-21T09:20:57.6765051Z // begin inline asm 2026-02-21T09:20:57.6765178Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3940}, [%r1079]; 2026-02-21T09:20:57.6765234Z // end inline asm 2026-02-21T09:20:57.6765288Z bar.sync 0; 2026-02-21T09:20:57.6765430Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3472, %r3608}; 2026-02-21T09:20:57.6765485Z bar.sync 0; 2026-02-21T09:20:57.6765546Z // begin inline asm 2026-02-21T09:20:57.6765673Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3942}, [%r1079]; 2026-02-21T09:20:57.6765729Z // end inline asm 2026-02-21T09:20:57.6765782Z bar.sync 0; 2026-02-21T09:20:57.6765975Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3471, %r3607}; 2026-02-21T09:20:57.6766031Z bar.sync 0; 2026-02-21T09:20:57.6766088Z // begin inline asm 2026-02-21T09:20:57.6766218Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3941}, [%r1079]; 2026-02-21T09:20:57.6766276Z // end inline asm 2026-02-21T09:20:57.6766328Z bar.sync 0; 2026-02-21T09:20:57.6766588Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3473, %r3609}; 2026-02-21T09:20:57.6766651Z bar.sync 0; 2026-02-21T09:20:57.6766709Z // begin inline asm 2026-02-21T09:20:57.6766838Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3943}, [%r1079]; 2026-02-21T09:20:57.6766896Z // end inline asm 2026-02-21T09:20:57.6766950Z bar.sync 0; 2026-02-21T09:20:57.6767093Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3474, %r3610}; 2026-02-21T09:20:57.6767150Z bar.sync 0; 2026-02-21T09:20:57.6767211Z // begin inline asm 2026-02-21T09:20:57.6767337Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3944}, [%r1079]; 2026-02-21T09:20:57.6767393Z // end inline asm 2026-02-21T09:20:57.6767448Z bar.sync 0; 2026-02-21T09:20:57.6767592Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3476, %r3612}; 2026-02-21T09:20:57.6767650Z bar.sync 0; 2026-02-21T09:20:57.6767710Z // begin inline asm 2026-02-21T09:20:57.6767839Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3946}, [%r1079]; 2026-02-21T09:20:57.6767895Z // end inline asm 2026-02-21T09:20:57.6767950Z bar.sync 0; 2026-02-21T09:20:57.6768094Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3475, %r3611}; 2026-02-21T09:20:57.6768147Z bar.sync 0; 2026-02-21T09:20:57.6768204Z // begin inline asm 2026-02-21T09:20:57.6768346Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3945}, [%r1079]; 2026-02-21T09:20:57.6768405Z // end inline asm 2026-02-21T09:20:57.6768459Z bar.sync 0; 2026-02-21T09:20:57.6768602Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3477, %r3613}; 2026-02-21T09:20:57.6768659Z bar.sync 0; 2026-02-21T09:20:57.6768717Z // begin inline asm 2026-02-21T09:20:57.6768844Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3947}, [%r1079]; 2026-02-21T09:20:57.6768902Z // end inline asm 2026-02-21T09:20:57.6768955Z bar.sync 0; 
2026-02-21T09:20:57.6769098Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3478, %r3614}; 2026-02-21T09:20:57.6769153Z bar.sync 0; 2026-02-21T09:20:57.6769214Z // begin inline asm 2026-02-21T09:20:57.6769341Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3948}, [%r1079]; 2026-02-21T09:20:57.6769484Z // end inline asm 2026-02-21T09:20:57.6769551Z bar.sync 0; 2026-02-21T09:20:57.6769698Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3480, %r3616}; 2026-02-21T09:20:57.6769753Z bar.sync 0; 2026-02-21T09:20:57.6769878Z // begin inline asm 2026-02-21T09:20:57.6770006Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3950}, [%r1079]; 2026-02-21T09:20:57.6770061Z // end inline asm 2026-02-21T09:20:57.6770115Z bar.sync 0; 2026-02-21T09:20:57.6770260Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3479, %r3615}; 2026-02-21T09:20:57.6770315Z bar.sync 0; 2026-02-21T09:20:57.6770373Z // begin inline asm 2026-02-21T09:20:57.6770502Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3949}, [%r1079]; 2026-02-21T09:20:57.6770619Z // end inline asm 2026-02-21T09:20:57.6770676Z bar.sync 0; 2026-02-21T09:20:57.6770818Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3481, %r3617}; 2026-02-21T09:20:57.6770874Z bar.sync 0; 2026-02-21T09:20:57.6770934Z // begin inline asm 2026-02-21T09:20:57.6771061Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3951}, [%r1079]; 2026-02-21T09:20:57.6771120Z // end inline asm 2026-02-21T09:20:57.6771173Z bar.sync 0; 2026-02-21T09:20:57.6771315Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3482, %r3618}; 2026-02-21T09:20:57.6771377Z bar.sync 0; 2026-02-21T09:20:57.6771438Z // begin inline asm 2026-02-21T09:20:57.6771565Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3952}, [%r1079]; 2026-02-21T09:20:57.6771622Z // end inline asm 2026-02-21T09:20:57.6771678Z bar.sync 0; 2026-02-21T09:20:57.6771891Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3484, %r3620}; 2026-02-21T09:20:57.6771949Z bar.sync 0; 2026-02-21T09:20:57.6772008Z // 
begin inline asm 2026-02-21T09:20:57.6772137Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3954}, [%r1079]; 2026-02-21T09:20:57.6772193Z // end inline asm 2026-02-21T09:20:57.6772246Z bar.sync 0; 2026-02-21T09:20:57.6772392Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3483, %r3619}; 2026-02-21T09:20:57.6772447Z bar.sync 0; 2026-02-21T09:20:57.6772503Z // begin inline asm 2026-02-21T09:20:57.6772631Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3953}, [%r1079]; 2026-02-21T09:20:57.6772686Z // end inline asm 2026-02-21T09:20:57.6772741Z bar.sync 0; 2026-02-21T09:20:57.6772883Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3485, %r3621}; 2026-02-21T09:20:57.6772941Z bar.sync 0; 2026-02-21T09:20:57.6772999Z // begin inline asm 2026-02-21T09:20:57.6773126Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3955}, [%r1079]; 2026-02-21T09:20:57.6773186Z // end inline asm 2026-02-21T09:20:57.6773241Z bar.sync 0; 2026-02-21T09:20:57.6773384Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3486, %r3622}; 2026-02-21T09:20:57.6773440Z bar.sync 0; 2026-02-21T09:20:57.6773500Z // begin inline asm 2026-02-21T09:20:57.6773626Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3956}, [%r1079]; 2026-02-21T09:20:57.6773682Z // end inline asm 2026-02-21T09:20:57.6773739Z bar.sync 0; 2026-02-21T09:20:57.6773882Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3488, %r3624}; 2026-02-21T09:20:57.6773936Z bar.sync 0; 2026-02-21T09:20:57.6773999Z // begin inline asm 2026-02-21T09:20:57.6774126Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3958}, [%r1079]; 2026-02-21T09:20:57.6774183Z // end inline asm 2026-02-21T09:20:57.6774236Z bar.sync 0; 2026-02-21T09:20:57.6774379Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3487, %r3623}; 2026-02-21T09:20:57.6774434Z bar.sync 0; 2026-02-21T09:20:57.6774495Z // begin inline asm 2026-02-21T09:20:57.6774626Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3957}, [%r1079]; 2026-02-21T09:20:57.6774682Z // end inline asm 
2026-02-21T09:20:57.6774735Z bar.sync 0; 2026-02-21T09:20:57.6774879Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r578], {%r3489, %r3625}; 2026-02-21T09:20:57.6774936Z bar.sync 0; 2026-02-21T09:20:57.6774994Z // begin inline asm 2026-02-21T09:20:57.6775121Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3959}, [%r1079]; 2026-02-21T09:20:57.6775264Z // end inline asm 2026-02-21T09:20:57.6775330Z // begin inline asm 2026-02-21T09:20:57.6775411Z fence.proxy.async.shared::cta; 2026-02-21T09:20:57.6775468Z // end inline asm 2026-02-21T09:20:57.6775594Z wgmma.fence.sync.aligned; 2026-02-21T09:20:57.6775656Z shl.b32 %r4085, %r4078, 8; 2026-02-21T09:20:57.6775717Z and.b32 %r4086, %r4085, 4096; 2026-02-21T09:20:57.6775782Z add.s32 %r4087, %r4086, %r481; 2026-02-21T09:20:57.6775843Z bfe.u32 %r4088, %r4087, 4, 14; 2026-02-21T09:20:57.6775910Z cvt.u64.u32 %rd176, %r4088; 2026-02-21T09:20:57.6775994Z or.b64 %rd170, %rd176, -9223371899382267904; 2026-02-21T09:20:57.6776053Z // begin inline asm 2026-02-21T09:20:57.6777024Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3928,%r3929,%r3930,%r3931,%r3932,%r3933,%r3934,%r3935,%r3936,%r3937,%r3938,%r3939,%r3940,%r3941,%r3942,%r3943,%r3944,%r3945,%r3946,%r3947,%r3948,%r3949,%r3950,%r3951,%r3952,%r3953,%r3954,%r3955,%r3956,%r3957,%r3958,%r3959}, {%r3924,%r3925,%r3926,%r3927}, %rd170, %p36, 1, 1; 2026-02-21T09:20:57.6777093Z // end inline asm 2026-02-21T09:20:57.6777154Z add.s32 %r4089, %r4087, 32; 2026-02-21T09:20:57.6777214Z bfe.u32 %r4090, %r4089, 4, 14; 2026-02-21T09:20:57.6777276Z cvt.u64.u32 %rd177, %r4090; 2026-02-21T09:20:57.6777360Z or.b64 %rd171, %rd177, -9223371899382267904; 2026-02-21T09:20:57.6777418Z // begin inline asm 2026-02-21T09:20:57.6778231Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r3928,%r3929,%r3930,%r3931,%r3932,%r3933,%r3934,%r3935,%r3936,%r3937,%r3938,%r3939,%r3940,%r3941,%r3942,%r3943,%r3944,%r3945,%r3946,%r3947,%r3948,%r3949,%r3950,%r3951,%r3952,%r3953,%r3954,%r3955,%r3956,%r3957,%r3958,%r3959}, {%r3992,%r3993,%r3994,%r3995}, %rd171, %p36, 1, 1; 2026-02-21T09:20:57.6778296Z // end inline asm 2026-02-21T09:20:57.6778373Z wgmma.commit_group.sync.aligned; 2026-02-21T09:20:57.6778432Z mov.b32 %r4028, %r481; 2026-02-21T09:20:57.6778493Z mov.b32 %r4029, %r4030; 2026-02-21T09:20:57.6778553Z // begin inline asm 2026-02-21T09:20:57.6779113Z // wait for regs: %r3928,%r3929,%r3930,%r3931,%r3932,%r3933,%r3934,%r3935,%r3936,%r3937,%r3938,%r3939,%r3940,%r3941,%r3942,%r3943,%r3944,%r3945,%r3946,%r3947,%r3948,%r3949,%r3950,%r3951,%r3952,%r3953,%r3954,%r3955,%r3956,%r3957,%r3958,%r3959,%r4028,%r4029,%r4030 2026-02-21T09:20:57.6779192Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:20:57.6779247Z // end inline asm 2026-02-21T09:20:57.6779301Z $L__tmp16: 2026-02-21T09:20:57.6779513Z .loc 1 43 78 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:43:78 2026-02-21T09:20:57.6779577Z add.s32 %r4091, %r5271, 1; 2026-02-21T09:20:57.6779642Z setp.gt.s32 %p45, %r4091, 4; 2026-02-21T09:20:57.6779710Z selp.b32 %r5271, 0, %r4091, %p45; 2026-02-21T09:20:57.6779915Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6779979Z add.s32 %r4092, %r5269, -16; 2026-02-21T09:20:57.6780052Z mad.wide.s32 %rd172, %r4092, 2, %rd27; 2026-02-21T09:20:57.6780253Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6780313Z shl.b32 %r4093, %r5271, 13; 2026-02-21T09:20:57.6780374Z add.s32 %r4066, %r21, %r4093; 2026-02-21T09:20:57.6780439Z selp.b32 %r4067, 8, 0, %p43; 2026-02-21T09:20:57.6780500Z // begin inline asm 2026-02-21T09:20:57.6780644Z cp.async.ca.shared.global [ %r4066 + 0 ], [ %rd172 + 0 ], 0x8, %r4067; 2026-02-21T09:20:57.6780701Z // end inline asm 
2026-02-21T09:20:57.6780772Z cp.async.commit_group; 2026-02-21T09:20:57.6780971Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6781041Z mad.wide.s32 %rd173, %r5269, 2, %rd27; 2026-02-21T09:20:57.6781244Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6781310Z add.s32 %r4068, %r22, %r4093; 2026-02-21T09:20:57.6781368Z // begin inline asm 2026-02-21T09:20:57.6781581Z cp.async.ca.shared.global [ %r4068 + 0 ], [ %rd173 + 0 ], 0x8, %r4067; 2026-02-21T09:20:57.6781641Z // end inline asm 2026-02-21T09:20:57.6781705Z cp.async.commit_group; 2026-02-21T09:20:57.6781900Z .loc 1 43 78 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:43:78 2026-02-21T09:20:57.6782024Z add.s32 %r5269, %r5269, 32; 2026-02-21T09:20:57.6782085Z add.s32 %r5268, %r5268, 131072; 2026-02-21T09:20:57.6782150Z setp.lt.u64 %p46, %rd222, 496; 2026-02-21T09:20:57.6782211Z @%p46 bra $L__BB0_9; 2026-02-21T09:20:57.6782325Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:20:57.6782522Z .loc 1 34 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:34:32 2026-02-21T09:20:57.6782644Z or.b32 %r4130, %r305, %r8; 2026-02-21T09:20:57.6782845Z .loc 1 36 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:36:32 2026-02-21T09:20:57.6782908Z or.b32 %r4131, %r306, %r12; 2026-02-21T09:20:57.6782967Z or.b32 %r4132, %r306, %r13; 2026-02-21T09:20:57.6783029Z or.b32 %r4133, %r306, %r14; 2026-02-21T09:20:57.6783087Z or.b32 %r4134, %r306, %r15; 2026-02-21T09:20:57.6783283Z .loc 1 43 78 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:43:78 2026-02-21T09:20:57.6783354Z cp.async.wait_group 0; 2026-02-21T09:20:57.6783408Z bar.sync 0; 2026-02-21T09:20:57.6783609Z .loc 1 90 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:90:28 2026-02-21T09:20:57.6783736Z cvt.rn.bf16x2.f32 %r4135, %r3929, %r3928; 2026-02-21T09:20:57.6783817Z cvt.rn.bf16x2.f32 %r4136, %r3931, %r3930; 
2026-02-21T09:20:57.6783889Z cvt.rn.bf16x2.f32 %r4137, %r3933, %r3932; 2026-02-21T09:20:57.6783961Z cvt.rn.bf16x2.f32 %r4138, %r3935, %r3934; 2026-02-21T09:20:57.6784034Z cvt.rn.bf16x2.f32 %r4139, %r3937, %r3936; 2026-02-21T09:20:57.6784103Z cvt.rn.bf16x2.f32 %r4140, %r3939, %r3938; 2026-02-21T09:20:57.6784174Z cvt.rn.bf16x2.f32 %r4141, %r3941, %r3940; 2026-02-21T09:20:57.6784244Z cvt.rn.bf16x2.f32 %r4142, %r3943, %r3942; 2026-02-21T09:20:57.6784317Z cvt.rn.bf16x2.f32 %r4143, %r3945, %r3944; 2026-02-21T09:20:57.6784386Z cvt.rn.bf16x2.f32 %r4144, %r3947, %r3946; 2026-02-21T09:20:57.6784457Z cvt.rn.bf16x2.f32 %r4145, %r3949, %r3948; 2026-02-21T09:20:57.6784531Z cvt.rn.bf16x2.f32 %r4146, %r3951, %r3950; 2026-02-21T09:20:57.6784611Z cvt.rn.bf16x2.f32 %r4147, %r3953, %r3952; 2026-02-21T09:20:57.6784683Z cvt.rn.bf16x2.f32 %r4148, %r3955, %r3954; 2026-02-21T09:20:57.6784757Z cvt.rn.bf16x2.f32 %r4149, %r3957, %r3956; 2026-02-21T09:20:57.6784829Z cvt.rn.bf16x2.f32 %r4150, %r3959, %r3958; 2026-02-21T09:20:57.6785029Z .loc 1 91 43 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:91:43 2026-02-21T09:20:57.6785089Z shl.b32 %r4151, %r4131, 13; 2026-02-21T09:20:57.6785151Z shl.b32 %r4152, %r4132, 13; 2026-02-21T09:20:57.6785209Z shl.b32 %r4153, %r4133, 13; 2026-02-21T09:20:57.6785269Z shl.b32 %r4154, %r4134, 13; 2026-02-21T09:20:57.6785470Z .loc 1 91 50 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:91:50 2026-02-21T09:20:57.6785533Z add.s32 %r4155, %r4151, %r4130; 2026-02-21T09:20:57.6785598Z add.s32 %r4156, %r4152, %r4130; 2026-02-21T09:20:57.6785660Z add.s32 %r4157, %r4153, %r4130; 2026-02-21T09:20:57.6785719Z add.s32 %r4158, %r4154, %r4130; 2026-02-21T09:20:57.6785915Z .loc 1 91 22 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:91:22 2026-02-21T09:20:57.6785986Z mad.wide.s32 %rd178, %r4155, 2, %rd29; 2026-02-21T09:20:57.6786058Z mad.wide.s32 %rd179, %r4156, 2, %rd29; 2026-02-21T09:20:57.6786124Z mad.wide.s32 %rd180, %r4157, 2, %rd29; 
2026-02-21T09:20:57.6786190Z mad.wide.s32 %rd181, %r4158, 2, %rd29; 2026-02-21T09:20:57.6786386Z .loc 1 91 81 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:91:81 2026-02-21T09:20:57.6786680Z st.shared.v4.b32 [%r39], {%r4135, %r4137, %r4139, %r4141}; 2026-02-21T09:20:57.6786802Z st.shared.v4.b32 [%r39+512], {%r4136, %r4138, %r4140, %r4142}; 2026-02-21T09:20:57.6786908Z st.shared.v4.b32 [%r40], {%r4143, %r4145, %r4147, %r4149}; 2026-02-21T09:20:57.6787092Z st.shared.v4.b32 [%r40+512], {%r4144, %r4146, %r4148, %r4150}; 2026-02-21T09:20:57.6787148Z bar.sync 0; 2026-02-21T09:20:57.6787207Z // begin inline asm 2026-02-21T09:20:57.6787402Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4094, %r4095, %r4096, %r4097}, [%r1380]; 2026-02-21T09:20:57.6787458Z // end inline asm 2026-02-21T09:20:57.6787519Z // begin inline asm 2026-02-21T09:20:57.6787703Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4099, %r4100, %r4101, %r4102}, [%r1385]; 2026-02-21T09:20:57.6787757Z // end inline asm 2026-02-21T09:20:57.6787891Z // begin inline asm 2026-02-21T09:20:57.6788077Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4104, %r4105, %r4106, %r4107}, [%r1390]; 2026-02-21T09:20:57.6788134Z // end inline asm 2026-02-21T09:20:57.6788193Z // begin inline asm 2026-02-21T09:20:57.6788428Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4109, %r4110, %r4111, %r4112}, [%r1395]; 2026-02-21T09:20:57.6788488Z // end inline asm 2026-02-21T09:20:57.6788545Z // begin inline asm 2026-02-21T09:20:57.6788674Z st.global.v4.b32 [ %rd178 + 0 ], { %r4094, %r4095, %r4096, %r4097 }; 2026-02-21T09:20:57.6788733Z // end inline asm 2026-02-21T09:20:57.6788791Z // begin inline asm 2026-02-21T09:20:57.6788910Z st.global.v4.b32 [ %rd179 + 0 ], { %r4099, %r4100, %r4101, %r4102 }; 2026-02-21T09:20:57.6789032Z // end inline asm 2026-02-21T09:20:57.6789094Z // begin inline asm 2026-02-21T09:20:57.6789209Z st.global.v4.b32 [ %rd180 + 0 ], { %r4104, %r4105, %r4106, %r4107 }; 2026-02-21T09:20:57.6789267Z // end inline asm 
2026-02-21T09:20:57.6789330Z // begin inline asm 2026-02-21T09:20:57.6789445Z st.global.v4.b32 [ %rd181 + 0 ], { %r4109, %r4110, %r4111, %r4112 }; 2026-02-21T09:20:57.6789501Z // end inline asm 2026-02-21T09:20:57.6789719Z .loc 1 22 121 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:22:121 2026-02-21T09:20:57.6789781Z add.s32 %r5159, %r5159, 4; 2026-02-21T09:20:57.6789851Z setp.lt.s32 %p47, %r5159, %r5304; 2026-02-21T09:20:57.6789912Z @%p47 bra $L__BB0_2; 2026-02-21T09:20:57.6790007Z $L__BB0_11: // %.preheader 2026-02-21T09:20:57.6790076Z setp.ge.s32 %p48, %r5304, %r2; 2026-02-21T09:20:57.6790136Z @%p48 bra $L__BB0_16; 2026-02-21T09:20:57.6790224Z // %bb.12: // %.lr.ph52 2026-02-21T09:20:57.6790429Z .loc 1 0 121 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:0:121 2026-02-21T09:20:57.6790493Z and.b32 %r4161, %r5135, 136; 2026-02-21T09:20:57.6790557Z xor.b32 %r4162, %r4161, %r5134; 2026-02-21T09:20:57.6790621Z add.s32 %r48, %r5136, %r4162; 2026-02-21T09:20:57.6790682Z add.s32 %r49, %r48, 40960; 2026-02-21T09:20:57.6790743Z add.s32 %r4235, %r48, 8192; 2026-02-21T09:20:57.6790809Z add.s32 %r4237, %r48, 49152; 2026-02-21T09:20:57.6790868Z add.s32 %r4239, %r48, 16384; 2026-02-21T09:20:57.6790928Z add.s32 %r4241, %r48, 57344; 2026-02-21T09:20:57.6790988Z add.s32 %r4243, %r48, 24576; 2026-02-21T09:20:57.6791058Z add.s32 %r4245, %r48, 65536; 2026-02-21T09:20:57.6791122Z add.s32 %r4247, %r48, 32768; 2026-02-21T09:20:57.6791180Z add.s32 %r4249, %r48, 73728; 2026-02-21T09:20:57.6791242Z and.b32 %r4165, %r5137, 7680; 2026-02-21T09:20:57.6791304Z or.b32 %r4168, %r4165, %r5138; 2026-02-21T09:20:57.6791363Z or.b32 %r4169, %r4168, %r5139; 2026-02-21T09:20:57.6791427Z or.b32 %r58, %r4169, %r4161; 2026-02-21T09:20:57.6791487Z xor.b32 %r59, %r58, 8; 2026-02-21T09:20:57.6791546Z and.b32 %r4171, %r5140, 124; 2026-02-21T09:20:57.6791608Z selp.b32 %r4175, 1, 0, %p62; 2026-02-21T09:20:57.6791674Z add.s32 %r4176, %r5136, 81920; 2026-02-21T09:20:57.6791734Z add.s32 
%r4177, %r4176, %r5141; 2026-02-21T09:20:57.6791795Z add.s32 %r4178, %r4177, %r4175; 2026-02-21T09:20:57.6791944Z add.s32 %r4179, %r4178, %r5143; 2026-02-21T09:20:57.6792005Z add.s32 %r4180, %r4179, %r5142; 2026-02-21T09:20:57.6792066Z add.s32 %r60, %r4180, %r4171; 2026-02-21T09:20:57.6792129Z and.b32 %r4182, %r5144, 384; 2026-02-21T09:20:57.6792191Z add.s32 %r4183, %r4176, %r5142; 2026-02-21T09:20:57.6792666Z add.s32 %r4184, %r4183, %r4182; 2026-02-21T09:20:57.6792726Z add.s32 %r4185, %r4184, %r4171; 2026-02-21T09:20:57.6792789Z add.s32 %r61, %r4185, %r5143; 2026-02-21T09:20:57.6792849Z xor.b32 %r4189, %r5146, %r5147; 2026-02-21T09:20:57.6792910Z or.b32 %r4190, %r4189, %r5145; 2026-02-21T09:20:57.6792973Z add.s32 %r62, %r4176, %r4190; 2026-02-21T09:20:57.6793033Z xor.b32 %r4191, %r4190, 32; 2026-02-21T09:20:57.6793092Z add.s32 %r63, %r4176, %r4191; 2026-02-21T09:20:57.6793199Z or.b32 %r4194, %r5148, %r5149; 2026-02-21T09:20:57.6793263Z add.s32 %r4195, %r5136, 90112; 2026-02-21T09:20:57.6793322Z add.s32 %r4773, %r4195, %r4194; 2026-02-21T09:20:57.6793381Z and.b32 %r4196, %r5137, 112; 2026-02-21T09:20:57.6793447Z or.b32 %r4198, %r5148, %r5150; 2026-02-21T09:20:57.6793506Z and.b32 %r4199, %r4198, 1920; 2026-02-21T09:20:57.6793564Z and.b32 %r4201, %r5151, 2048; 2026-02-21T09:20:57.6793625Z add.s32 %r4202, %r4195, %r4196; 2026-02-21T09:20:57.6793691Z add.s32 %r4203, %r4202, %r4201; 2026-02-21T09:20:57.6793750Z add.s32 %r4272, %r4203, %r4199; 2026-02-21T09:20:57.6793810Z bfe.u32 %r4204, %r4176, 4, 14; 2026-02-21T09:20:57.6793876Z cvt.u64.u32 %rd182, %r4204; 2026-02-21T09:20:57.6793956Z or.b64 %rd200, %rd182, -9223371899382267904; 2026-02-21T09:20:57.6794078Z add.s32 %r4205, %r5136, 81952; 2026-02-21T09:20:57.6794142Z bfe.u32 %r4206, %r4205, 4, 14; 2026-02-21T09:20:57.6794204Z cvt.u64.u32 %rd183, %r4206; 2026-02-21T09:20:57.6794281Z or.b64 %rd201, %rd183, -9223371899382267904; 2026-02-21T09:20:57.6794342Z add.s32 %r4207, %r5136, 86016; 2026-02-21T09:20:57.6794404Z 
bfe.u32 %r4208, %r4207, 4, 14; 2026-02-21T09:20:57.6794464Z cvt.u64.u32 %rd184, %r4208; 2026-02-21T09:20:57.6794537Z or.b64 %rd202, %rd184, -9223371899382267904; 2026-02-21T09:20:57.6794600Z add.s32 %r4209, %r5136, 86048; 2026-02-21T09:20:57.6794659Z bfe.u32 %r4210, %r4209, 4, 14; 2026-02-21T09:20:57.6794720Z cvt.u64.u32 %rd185, %r4210; 2026-02-21T09:20:57.6794792Z or.b64 %rd203, %rd185, -9223371899382267904; 2026-02-21T09:20:57.6794859Z and.b32 %r4213, %r5153, 15456; 2026-02-21T09:20:57.6794919Z shl.b32 %r4215, %r5154, 4; 2026-02-21T09:20:57.6794978Z and.b32 %r4216, %r5140, 16; 2026-02-21T09:20:57.6795042Z shr.u32 %r4217, %r4, 3; 2026-02-21T09:20:57.6795103Z and.b32 %r4218, %r4217, 64; 2026-02-21T09:20:57.6795162Z or.b32 %r4219, %r5152, %r4216; 2026-02-21T09:20:57.6795222Z or.b32 %r4220, %r4213, %r4215; 2026-02-21T09:20:57.6795284Z xor.b32 %r4221, %r4220, %r4218; 2026-02-21T09:20:57.6795346Z or.b32 %r4222, %r4221, %r4219; 2026-02-21T09:20:57.6795406Z add.s32 %r66, %r5136, %r4222; 2026-02-21T09:20:57.6795468Z xor.b32 %r4223, %r4222, 32; 2026-02-21T09:20:57.6795529Z add.s32 %r67, %r5136, %r4223; 2026-02-21T09:20:57.6795588Z shl.b32 %r4224, %r5154, 11; 2026-02-21T09:20:57.6795649Z or.b32 %r4227, %r4224, %r5155; 2026-02-21T09:20:57.6795709Z xor.b32 %r4228, %r4227, %r5156; 2026-02-21T09:20:57.6795768Z add.s32 %r5073, %r5136, %r4228; 2026-02-21T09:20:57.6795830Z add.s32 %r5078, %r5073, 4096; 2026-02-21T09:20:57.6795893Z add.s32 %r5083, %r5073, 8192; 2026-02-21T09:20:57.6795951Z add.s32 %r5088, %r5073, 12288; 2026-02-21T09:20:57.6796176Z .loc 1 22 121 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:22:121 2026-02-21T09:20:57.6796246Z add.s64 %rd9, %rd27, 320; 2026-02-21T09:20:57.6796563Z .loc 1 43 78 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:43:78 2026-02-21T09:20:57.6796630Z or.b32 %r72, %r5157, %r19; 2026-02-21T09:20:57.6796689Z or.b32 %r73, %r72, 176; 2026-02-21T09:20:57.6796751Z or.b32 %r74, %r5158, %r9; 2026-02-21T09:20:57.6796863Z 
$L__BB0_13: // =>This Loop Header: Depth=1 2026-02-21T09:20:57.6797044Z // Child Loop BB0_14 Depth 2 2026-02-21T09:20:57.6797245Z .loc 1 28 35 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:28:35 2026-02-21T09:20:57.6797367Z shr.s32 %r4254, %r5304, 31; 2026-02-21T09:20:57.6797427Z shr.u32 %r4255, %r4254, 22; 2026-02-21T09:20:57.6797491Z add.s32 %r4256, %r5304, %r4255; 2026-02-21T09:20:57.6797689Z .loc 1 31 45 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:31:45 2026-02-21T09:20:57.6797752Z and.b32 %r4257, %r4256, 64512; 2026-02-21T09:20:57.6797816Z sub.s32 %r4258, %r5304, %r4257; 2026-02-21T09:20:57.6798071Z .loc 1 31 64 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:31:64 2026-02-21T09:20:57.6798136Z cvt.u16.u32 %rs153, %r4258; 2026-02-21T09:20:57.6798342Z .loc 1 32 51 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:32:51 2026-02-21T09:20:57.6798409Z shr.s16 %rs154, %rs153, 15; 2026-02-21T09:20:57.6798471Z shr.u16 %rs155, %rs154, 12; 2026-02-21T09:20:57.6798532Z add.s16 %rs156, %rs153, %rs155; 2026-02-21T09:20:57.6798595Z shr.s16 %rs157, %rs156, 4; 2026-02-21T09:20:57.6798792Z .loc 1 31 64 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:31:64 2026-02-21T09:20:57.6798858Z and.b16 %rs158, %rs156, -16; 2026-02-21T09:20:57.6798922Z sub.s16 %rs159, %rs153, %rs158; 2026-02-21T09:20:57.6799177Z .loc 1 32 51 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:32:51 2026-02-21T09:20:57.6799239Z cvt.u32.u16 %r4259, %rs157; 2026-02-21T09:20:57.6799433Z .loc 1 33 27 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:33:27 2026-02-21T09:20:57.6799497Z shl.b32 %r4260, %r4256, 1; 2026-02-21T09:20:57.6799558Z and.b32 %r4261, %r4260, -2048; 2026-02-21T09:20:57.6799625Z mul.wide.s16 %r4262, %rs159, 128; 2026-02-21T09:20:57.6799691Z add.s32 %r383, %r4262, %r4261; 2026-02-21T09:20:57.6799885Z .loc 1 35 27 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:35:27 2026-02-21T09:20:57.6799953Z mul.wide.s16 
%r384, %rs157, 256; 2026-02-21T09:20:57.6800150Z .loc 1 36 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:36:32 2026-02-21T09:20:57.6800212Z or.b32 %r4263, %r384, %r11; 2026-02-21T09:20:57.6800406Z .loc 1 51 53 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:53 2026-02-21T09:20:57.6800468Z shl.b32 %r4264, %r4263, 10; 2026-02-21T09:20:57.6800664Z .loc 1 51 60 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:60 2026-02-21T09:20:57.6800723Z or.b32 %r4265, %r4264, %r19; 2026-02-21T09:20:57.6800917Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6800991Z mad.wide.s32 %rd186, %r4265, 2, %rd27; 2026-02-21T09:20:57.6801188Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6801244Z bar.sync 0; 2026-02-21T09:20:57.6801302Z mov.b32 %r4232, 8; 2026-02-21T09:20:57.6801360Z // begin inline asm 2026-02-21T09:20:57.6801501Z cp.async.ca.shared.global [ %r48 + 0 ], [ %rd186 + 0 ], 0x8, %r4232; 2026-02-21T09:20:57.6801559Z // end inline asm 2026-02-21T09:20:57.6801641Z cp.async.commit_group; 2026-02-21T09:20:57.6801847Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6801910Z add.s64 %rd187, %rd186, 32; 2026-02-21T09:20:57.6802108Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6802169Z // begin inline asm 2026-02-21T09:20:57.6802305Z cp.async.ca.shared.global [ %r49 + 0 ], [ %rd187 + 0 ], 0x8, %r4232; 2026-02-21T09:20:57.6802365Z // end inline asm 2026-02-21T09:20:57.6802430Z cp.async.commit_group; 2026-02-21T09:20:57.6802681Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6802742Z add.s64 %rd188, %rd186, 64; 2026-02-21T09:20:57.6802939Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6803039Z bar.sync 0; 2026-02-21T09:20:57.6803098Z // begin 
inline asm 2026-02-21T09:20:57.6803239Z cp.async.ca.shared.global [ %r4235 + 0 ], [ %rd188 + 0 ], 0x8, %r4232; 2026-02-21T09:20:57.6803295Z // end inline asm 2026-02-21T09:20:57.6803361Z cp.async.commit_group; 2026-02-21T09:20:57.6803559Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6803620Z add.s64 %rd189, %rd186, 96; 2026-02-21T09:20:57.6803862Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6803926Z // begin inline asm 2026-02-21T09:20:57.6804065Z cp.async.ca.shared.global [ %r4237 + 0 ], [ %rd189 + 0 ], 0x8, %r4232; 2026-02-21T09:20:57.6804121Z // end inline asm 2026-02-21T09:20:57.6804184Z cp.async.commit_group; 2026-02-21T09:20:57.6804384Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6804447Z add.s64 %rd190, %rd186, 128; 2026-02-21T09:20:57.6804643Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6804701Z bar.sync 0; 2026-02-21T09:20:57.6804770Z // begin inline asm 2026-02-21T09:20:57.6804961Z cp.async.ca.shared.global [ %r4239 + 0 ], [ %rd190 + 0 ], 0x8, %r4232; 2026-02-21T09:20:57.6805019Z // end inline asm 2026-02-21T09:20:57.6805088Z cp.async.commit_group; 2026-02-21T09:20:57.6805285Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6805348Z add.s64 %rd191, %rd186, 160; 2026-02-21T09:20:57.6805545Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6805605Z // begin inline asm 2026-02-21T09:20:57.6805738Z cp.async.ca.shared.global [ %r4241 + 0 ], [ %rd191 + 0 ], 0x8, %r4232; 2026-02-21T09:20:57.6805800Z // end inline asm 2026-02-21T09:20:57.6805865Z cp.async.commit_group; 2026-02-21T09:20:57.6806059Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6806123Z add.s64 %rd192, %rd186, 192; 
2026-02-21T09:20:57.6806317Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6806372Z bar.sync 0; 2026-02-21T09:20:57.6806431Z // begin inline asm 2026-02-21T09:20:57.6806682Z cp.async.ca.shared.global [ %r4243 + 0 ], [ %rd192 + 0 ], 0x8, %r4232; 2026-02-21T09:20:57.6806740Z // end inline asm 2026-02-21T09:20:57.6806804Z cp.async.commit_group; 2026-02-21T09:20:57.6807001Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6807066Z add.s64 %rd193, %rd186, 224; 2026-02-21T09:20:57.6807261Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6807324Z // begin inline asm 2026-02-21T09:20:57.6807470Z cp.async.ca.shared.global [ %r4245 + 0 ], [ %rd193 + 0 ], 0x8, %r4232; 2026-02-21T09:20:57.6807527Z // end inline asm 2026-02-21T09:20:57.6807591Z cp.async.commit_group; 2026-02-21T09:20:57.6807791Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6807855Z add.s64 %rd194, %rd186, 256; 2026-02-21T09:20:57.6808050Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6808108Z bar.sync 0; 2026-02-21T09:20:57.6808166Z // begin inline asm 2026-02-21T09:20:57.6808298Z cp.async.ca.shared.global [ %r4247 + 0 ], [ %rd194 + 0 ], 0x8, %r4232; 2026-02-21T09:20:57.6808432Z // end inline asm 2026-02-21T09:20:57.6808500Z cp.async.commit_group; 2026-02-21T09:20:57.6808693Z .loc 1 51 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:32 2026-02-21T09:20:57.6808822Z add.s64 %rd195, %rd186, 288; 2026-02-21T09:20:57.6809021Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6809080Z // begin inline asm 2026-02-21T09:20:57.6809212Z cp.async.ca.shared.global [ %r4249 + 0 ], [ %rd195 + 0 ], 0x8, %r4232; 2026-02-21T09:20:57.6809272Z // end inline asm 2026-02-21T09:20:57.6809337Z cp.async.commit_group; 
2026-02-21T09:20:57.6809533Z .loc 1 43 78 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:43:78 2026-02-21T09:20:57.6809658Z shl.b32 %r4266, %r4259, 18; 2026-02-21T09:20:57.6809721Z or.b32 %r4267, %r72, %r4266; 2026-02-21T09:20:57.6809793Z mad.wide.s32 %rd224, %r4267, 2, %rd9; 2026-02-21T09:20:57.6809854Z or.b32 %r4268, %r73, %r4266; 2026-02-21T09:20:57.6809926Z mad.wide.s32 %rd223, %r4268, 2, %rd27; 2026-02-21T09:20:57.6809988Z add.s32 %r4269, %r74, %r4261; 2026-02-21T09:20:57.6810050Z add.s32 %r5305, %r4269, %r4262; 2026-02-21T09:20:57.6810111Z mov.b32 %r4904, 0f00000000; 2026-02-21T09:20:57.6810168Z mov.b32 %r5307, 4; 2026-02-21T09:20:57.6810227Z mov.b32 %r5306, -1; 2026-02-21T09:20:57.6810297Z mov.b64 %rd225, -16; 2026-02-21T09:20:57.6810362Z mov.b32 %r4905, %r4904; 2026-02-21T09:20:57.6810420Z mov.b32 %r4906, %r4904; 2026-02-21T09:20:57.6810537Z mov.b32 %r4907, %r4904; 2026-02-21T09:20:57.6810602Z mov.b32 %r4908, %r4904; 2026-02-21T09:20:57.6810660Z mov.b32 %r4909, %r4904; 2026-02-21T09:20:57.6810717Z mov.b32 %r4910, %r4904; 2026-02-21T09:20:57.6810776Z mov.b32 %r4911, %r4904; 2026-02-21T09:20:57.6810835Z mov.b32 %r4912, %r4904; 2026-02-21T09:20:57.6810901Z mov.b32 %r4913, %r4904; 2026-02-21T09:20:57.6810962Z mov.b32 %r4914, %r4904; 2026-02-21T09:20:57.6811025Z mov.b32 %r4915, %r4904; 2026-02-21T09:20:57.6811082Z mov.b32 %r4916, %r4904; 2026-02-21T09:20:57.6811139Z mov.b32 %r4917, %r4904; 2026-02-21T09:20:57.6811196Z mov.b32 %r4918, %r4904; 2026-02-21T09:20:57.6811255Z mov.b32 %r4919, %r4904; 2026-02-21T09:20:57.6811315Z mov.b32 %r4920, %r4904; 2026-02-21T09:20:57.6811379Z mov.b32 %r4921, %r4904; 2026-02-21T09:20:57.6811439Z mov.b32 %r4922, %r4904; 2026-02-21T09:20:57.6811496Z mov.b32 %r4923, %r4904; 2026-02-21T09:20:57.6811553Z mov.b32 %r4924, %r4904; 2026-02-21T09:20:57.6811611Z mov.b32 %r4925, %r4904; 2026-02-21T09:20:57.6811671Z mov.b32 %r4926, %r4904; 2026-02-21T09:20:57.6811729Z mov.b32 %r4927, %r4904; 2026-02-21T09:20:57.6811786Z mov.b32 
%r4928, %r4904; 2026-02-21T09:20:57.6811846Z mov.b32 %r4929, %r4904; 2026-02-21T09:20:57.6811905Z mov.b32 %r4930, %r4904; 2026-02-21T09:20:57.6811963Z mov.b32 %r4931, %r4904; 2026-02-21T09:20:57.6812020Z mov.b32 %r4932, %r4904; 2026-02-21T09:20:57.6812079Z mov.b32 %r4933, %r4904; 2026-02-21T09:20:57.6812138Z mov.b32 %r4934, %r4904; 2026-02-21T09:20:57.6812194Z mov.b32 %r4935, %r4904; 2026-02-21T09:20:57.6812307Z $L__BB0_14: // Parent Loop BB0_13 Depth=1 2026-02-21T09:20:57.6812414Z // => This Inner Loop Header: Depth=2 2026-02-21T09:20:57.6812478Z add.s64 %rd225, %rd225, 16; 2026-02-21T09:20:57.6812548Z setp.lt.u64 %p57, %rd225, 432; 2026-02-21T09:20:57.6812608Z add.s32 %r5046, %r5306, 1; 2026-02-21T09:20:57.6812670Z setp.gt.s32 %p58, %r5046, 4; 2026-02-21T09:20:57.6812737Z selp.b32 %r5306, 0, %r5046, %p58; 2026-02-21T09:20:57.6812941Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6813007Z cp.async.wait_group 8; 2026-02-21T09:20:57.6813063Z bar.sync 0; 2026-02-21T09:20:57.6813132Z shl.b32 %r5047, %r5306, 13; 2026-02-21T09:20:57.6813201Z add.s32 %r5049, %r5136, %r5047; 2026-02-21T09:20:57.6813398Z .loc 1 55 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:55:32 2026-02-21T09:20:57.6813520Z add.s32 %r5050, %r5049, %r58; 2026-02-21T09:20:57.6813585Z ld.shared.b16 %rs162, [%r5050]; 2026-02-21T09:20:57.6813654Z ld.shared.b16 %rs163, [%r5050+256]; 2026-02-21T09:20:57.6813765Z ld.shared.b16 %rs164, [%r5050+16]; 2026-02-21T09:20:57.6813833Z ld.shared.b16 %rs165, [%r5050+272]; 2026-02-21T09:20:57.6813893Z add.s32 %r5051, %r5049, %r59; 2026-02-21T09:20:57.6813956Z ld.shared.b16 %rs166, [%r5051]; 2026-02-21T09:20:57.6814022Z ld.shared.b16 %rs167, [%r5051+256]; 2026-02-21T09:20:57.6814089Z ld.shared.b16 %rs168, [%r5051+16]; 2026-02-21T09:20:57.6814153Z ld.shared.b16 %rs169, [%r5051+272]; 2026-02-21T09:20:57.6814219Z cvt.f32.bf16 %r4566, %rs162; 2026-02-21T09:20:57.6814370Z cvt.f32.bf16 %r4567, %rs163; 
2026-02-21T09:20:57.6814445Z cvt.f32.bf16 %r4568, %rs166; 2026-02-21T09:20:57.6814508Z cvt.f32.bf16 %r4569, %rs167; 2026-02-21T09:20:57.6814568Z cvt.f32.bf16 %r4634, %rs164; 2026-02-21T09:20:57.6814629Z cvt.f32.bf16 %r4635, %rs165; 2026-02-21T09:20:57.6814687Z cvt.f32.bf16 %r4636, %rs168; 2026-02-21T09:20:57.6814748Z cvt.f32.bf16 %r4637, %rs169; 2026-02-21T09:20:57.6814951Z .loc 1 57 34 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:57:34 2026-02-21T09:20:57.6815015Z cvt.s64.s32 %rd211, %r5305; 2026-02-21T09:20:57.6815078Z add.s64 %rd198, %rd28, %rd211; 2026-02-21T09:20:57.6815277Z .loc 1 57 87 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:57:87 2026-02-21T09:20:57.6815381Z // begin inline asm 2026-02-21T09:20:57.6815442Z mov.u64 %rd197, 0x0; 2026-02-21T09:20:57.6815574Z createpolicy.fractional.L2::evict_first.b64 %rd197, 1.0; 2026-02-21T09:20:57.6815632Z // end inline asm 2026-02-21T09:20:57.6815693Z // begin inline asm 2026-02-21T09:20:57.6815753Z mov.u16 %rs160, 0x0; 2026-02-21T09:20:57.6815927Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs160 }, [ %rd198 + 0 ], %rd197; 2026-02-21T09:20:57.6815985Z // end inline asm 2026-02-21T09:20:57.6816194Z .loc 1 65 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:65:28 2026-02-21T09:20:57.6816263Z st.shared.b8 [%r60], %rs160; 2026-02-21T09:20:57.6816319Z bar.sync 0; 2026-02-21T09:20:57.6816402Z ld.shared.v2.b8 {%rs170, %rs171}, [%r61]; 2026-02-21T09:20:57.6816728Z .loc 1 60 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:60:28 2026-02-21T09:20:57.6816795Z shl.b16 %rs172, %rs170, 4; 2026-02-21T09:20:57.6816856Z shl.b16 %rs173, %rs171, 4; 2026-02-21T09:20:57.6817058Z .loc 1 75 58 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:75:58 2026-02-21T09:20:57.6817132Z selp.b16 %rs174, %rs172, %rs170, %p63; 2026-02-21T09:20:57.6817195Z cvt.s16.s8 %rs175, %rs174; 2026-02-21T09:20:57.6817255Z shr.s16 %rs176, %rs175, 4; 2026-02-21T09:20:57.6817327Z selp.b16 %rs177, 
%rs173, %rs171, %p63; 2026-02-21T09:20:57.6817387Z cvt.s16.s8 %rs178, %rs177; 2026-02-21T09:20:57.6817449Z shr.s16 %rs179, %rs178, 4; 2026-02-21T09:20:57.6817650Z .loc 1 80 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:80:32 2026-02-21T09:20:57.6817715Z cvt.rn.f32.s16 %r5052, %rs176; 2026-02-21T09:20:57.6817779Z cvt.rn.f32.s16 %r5053, %rs179; 2026-02-21T09:20:57.6817834Z bar.sync 0; 2026-02-21T09:20:57.6817901Z st.shared.b32 [%r62], %r5052; 2026-02-21T09:20:57.6817962Z st.shared.b32 [%r63], %r5053; 2026-02-21T09:20:57.6818105Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4904}; 2026-02-21T09:20:57.6818163Z bar.sync 0; 2026-02-21T09:20:57.6818232Z // begin inline asm 2026-02-21T09:20:57.6818391Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4434, %r4570}, [%r4272]; 2026-02-21T09:20:57.6818452Z // end inline asm 2026-02-21T09:20:57.6818507Z bar.sync 0; 2026-02-21T09:20:57.6818637Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4906}; 2026-02-21T09:20:57.6818691Z bar.sync 0; 2026-02-21T09:20:57.6818833Z // begin inline asm 2026-02-21T09:20:57.6818983Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4436, %r4572}, [%r4272]; 2026-02-21T09:20:57.6819038Z // end inline asm 2026-02-21T09:20:57.6819097Z bar.sync 0; 2026-02-21T09:20:57.6819226Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4905}; 2026-02-21T09:20:57.6819342Z bar.sync 0; 2026-02-21T09:20:57.6819401Z // begin inline asm 2026-02-21T09:20:57.6819552Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4435, %r4571}, [%r4272]; 2026-02-21T09:20:57.6819609Z // end inline asm 2026-02-21T09:20:57.6819662Z bar.sync 0; 2026-02-21T09:20:57.6819794Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4907}; 2026-02-21T09:20:57.6819849Z bar.sync 0; 2026-02-21T09:20:57.6819906Z // begin inline asm 2026-02-21T09:20:57.6820114Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4437, %r4573}, [%r4272]; 2026-02-21T09:20:57.6820180Z // end inline asm 2026-02-21T09:20:57.6820235Z bar.sync 0; 
2026-02-21T09:20:57.6820362Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4908}; 2026-02-21T09:20:57.6820421Z bar.sync 0; 2026-02-21T09:20:57.6820478Z // begin inline asm 2026-02-21T09:20:57.6820623Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4438, %r4574}, [%r4272]; 2026-02-21T09:20:57.6820685Z // end inline asm 2026-02-21T09:20:57.6820743Z bar.sync 0; 2026-02-21T09:20:57.6820869Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4910}; 2026-02-21T09:20:57.6820923Z bar.sync 0; 2026-02-21T09:20:57.6820984Z // begin inline asm 2026-02-21T09:20:57.6821130Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4440, %r4576}, [%r4272]; 2026-02-21T09:20:57.6821248Z // end inline asm 2026-02-21T09:20:57.6821307Z bar.sync 0; 2026-02-21T09:20:57.6821435Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4909}; 2026-02-21T09:20:57.6821505Z bar.sync 0; 2026-02-21T09:20:57.6821566Z // begin inline asm 2026-02-21T09:20:57.6821717Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4439, %r4575}, [%r4272]; 2026-02-21T09:20:57.6821780Z // end inline asm 2026-02-21T09:20:57.6821836Z bar.sync 0; 2026-02-21T09:20:57.6821964Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4911}; 2026-02-21T09:20:57.6822018Z bar.sync 0; 2026-02-21T09:20:57.6822075Z // begin inline asm 2026-02-21T09:20:57.6822221Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4441, %r4577}, [%r4272]; 2026-02-21T09:20:57.6822280Z // end inline asm 2026-02-21T09:20:57.6822334Z bar.sync 0; 2026-02-21T09:20:57.6822459Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4912}; 2026-02-21T09:20:57.6822516Z bar.sync 0; 2026-02-21T09:20:57.6822573Z // begin inline asm 2026-02-21T09:20:57.6822719Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4442, %r4578}, [%r4272]; 2026-02-21T09:20:57.6822782Z // end inline asm 2026-02-21T09:20:57.6822836Z bar.sync 0; 2026-02-21T09:20:57.6822963Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4914}; 2026-02-21T09:20:57.6823017Z bar.sync 0; 2026-02-21T09:20:57.6823076Z // begin 
inline asm 2026-02-21T09:20:57.6823220Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4444, %r4580}, [%r4272]; 2026-02-21T09:20:57.6823277Z // end inline asm 2026-02-21T09:20:57.6823333Z bar.sync 0; 2026-02-21T09:20:57.6823459Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4913}; 2026-02-21T09:20:57.6823514Z bar.sync 0; 2026-02-21T09:20:57.6823574Z // begin inline asm 2026-02-21T09:20:57.6823725Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4443, %r4579}, [%r4272]; 2026-02-21T09:20:57.6823781Z // end inline asm 2026-02-21T09:20:57.6823836Z bar.sync 0; 2026-02-21T09:20:57.6829774Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4915}; 2026-02-21T09:20:57.6829858Z bar.sync 0; 2026-02-21T09:20:57.6829924Z // begin inline asm 2026-02-21T09:20:57.6830101Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4445, %r4581}, [%r4272]; 2026-02-21T09:20:57.6830162Z // end inline asm 2026-02-21T09:20:57.6830216Z bar.sync 0; 2026-02-21T09:20:57.6830356Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4916}; 2026-02-21T09:20:57.6830413Z bar.sync 0; 2026-02-21T09:20:57.6830609Z // begin inline asm 2026-02-21T09:20:57.6830766Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4446, %r4582}, [%r4272]; 2026-02-21T09:20:57.6830825Z // end inline asm 2026-02-21T09:20:57.6830878Z bar.sync 0; 2026-02-21T09:20:57.6831010Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4918}; 2026-02-21T09:20:57.6831135Z bar.sync 0; 2026-02-21T09:20:57.6831198Z // begin inline asm 2026-02-21T09:20:57.6831346Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4448, %r4584}, [%r4272]; 2026-02-21T09:20:57.6831400Z // end inline asm 2026-02-21T09:20:57.6831457Z bar.sync 0; 2026-02-21T09:20:57.6831596Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4917}; 2026-02-21T09:20:57.6831652Z bar.sync 0; 2026-02-21T09:20:57.6831714Z // begin inline asm 2026-02-21T09:20:57.6831942Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4447, %r4583}, [%r4272]; 2026-02-21T09:20:57.6832001Z // end inline asm 
2026-02-21T09:20:57.6832055Z bar.sync 0; 2026-02-21T09:20:57.6832193Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4919}; 2026-02-21T09:20:57.6832250Z bar.sync 0; 2026-02-21T09:20:57.6832308Z // begin inline asm 2026-02-21T09:20:57.6832458Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4449, %r4585}, [%r4272]; 2026-02-21T09:20:57.6832514Z // end inline asm 2026-02-21T09:20:57.6832568Z bar.sync 0; 2026-02-21T09:20:57.6832692Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4920}; 2026-02-21T09:20:57.6832748Z bar.sync 0; 2026-02-21T09:20:57.6832804Z // begin inline asm 2026-02-21T09:20:57.6833006Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4450, %r4586}, [%r4272]; 2026-02-21T09:20:57.6833067Z // end inline asm 2026-02-21T09:20:57.6833119Z bar.sync 0; 2026-02-21T09:20:57.6833243Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4922}; 2026-02-21T09:20:57.6833302Z bar.sync 0; 2026-02-21T09:20:57.6833362Z // begin inline asm 2026-02-21T09:20:57.6833505Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4452, %r4588}, [%r4272]; 2026-02-21T09:20:57.6833562Z // end inline asm 2026-02-21T09:20:57.6833619Z bar.sync 0; 2026-02-21T09:20:57.6833743Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4921}; 2026-02-21T09:20:57.6833795Z bar.sync 0; 2026-02-21T09:20:57.6833854Z // begin inline asm 2026-02-21T09:20:57.6833999Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4451, %r4587}, [%r4272]; 2026-02-21T09:20:57.6834054Z // end inline asm 2026-02-21T09:20:57.6834119Z bar.sync 0; 2026-02-21T09:20:57.6834249Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4923}; 2026-02-21T09:20:57.6834303Z bar.sync 0; 2026-02-21T09:20:57.6834361Z // begin inline asm 2026-02-21T09:20:57.6834505Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4453, %r4589}, [%r4272]; 2026-02-21T09:20:57.6834560Z // end inline asm 2026-02-21T09:20:57.6834612Z bar.sync 0; 2026-02-21T09:20:57.6834736Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4924}; 2026-02-21T09:20:57.6834792Z bar.sync 
0; 2026-02-21T09:20:57.6834848Z // begin inline asm 2026-02-21T09:20:57.6834990Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4454, %r4590}, [%r4272]; 2026-02-21T09:20:57.6835047Z // end inline asm 2026-02-21T09:20:57.6835099Z bar.sync 0; 2026-02-21T09:20:57.6835221Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4926}; 2026-02-21T09:20:57.6835278Z bar.sync 0; 2026-02-21T09:20:57.6835336Z // begin inline asm 2026-02-21T09:20:57.6835477Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4456, %r4592}, [%r4272]; 2026-02-21T09:20:57.6835532Z // end inline asm 2026-02-21T09:20:57.6835587Z bar.sync 0; 2026-02-21T09:20:57.6835710Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4925}; 2026-02-21T09:20:57.6835764Z bar.sync 0; 2026-02-21T09:20:57.6835822Z // begin inline asm 2026-02-21T09:20:57.6835964Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4455, %r4591}, [%r4272]; 2026-02-21T09:20:57.6836020Z // end inline asm 2026-02-21T09:20:57.6836072Z bar.sync 0; 2026-02-21T09:20:57.6836197Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4927}; 2026-02-21T09:20:57.6836311Z bar.sync 0; 2026-02-21T09:20:57.6836368Z // begin inline asm 2026-02-21T09:20:57.6836654Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4457, %r4593}, [%r4272]; 2026-02-21T09:20:57.6836713Z // end inline asm 2026-02-21T09:20:57.6836766Z bar.sync 0; 2026-02-21T09:20:57.6836965Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4928}; 2026-02-21T09:20:57.6837025Z bar.sync 0; 2026-02-21T09:20:57.6837084Z // begin inline asm 2026-02-21T09:20:57.6837229Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4458, %r4594}, [%r4272]; 2026-02-21T09:20:57.6837287Z // end inline asm 2026-02-21T09:20:57.6837342Z bar.sync 0; 2026-02-21T09:20:57.6837468Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4930}; 2026-02-21T09:20:57.6837523Z bar.sync 0; 2026-02-21T09:20:57.6837579Z // begin inline asm 2026-02-21T09:20:57.6837792Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4460, %r4596}, [%r4272]; 
2026-02-21T09:20:57.6837852Z // end inline asm 2026-02-21T09:20:57.6837908Z bar.sync 0; 2026-02-21T09:20:57.6838035Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4929}; 2026-02-21T09:20:57.6838087Z bar.sync 0; 2026-02-21T09:20:57.6838145Z // begin inline asm 2026-02-21T09:20:57.6838288Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4459, %r4595}, [%r4272]; 2026-02-21T09:20:57.6838345Z // end inline asm 2026-02-21T09:20:57.6838398Z bar.sync 0; 2026-02-21T09:20:57.6838523Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4931}; 2026-02-21T09:20:57.6838576Z bar.sync 0; 2026-02-21T09:20:57.6838632Z // begin inline asm 2026-02-21T09:20:57.6838836Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4461, %r4597}, [%r4272]; 2026-02-21T09:20:57.6838894Z // end inline asm 2026-02-21T09:20:57.6838946Z bar.sync 0; 2026-02-21T09:20:57.6839070Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4932}; 2026-02-21T09:20:57.6839126Z bar.sync 0; 2026-02-21T09:20:57.6839182Z // begin inline asm 2026-02-21T09:20:57.6839325Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4462, %r4598}, [%r4272]; 2026-02-21T09:20:57.6839383Z // end inline asm 2026-02-21T09:20:57.6839436Z bar.sync 0; 2026-02-21T09:20:57.6839559Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4934}; 2026-02-21T09:20:57.6839613Z bar.sync 0; 2026-02-21T09:20:57.6839670Z // begin inline asm 2026-02-21T09:20:57.6839813Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4464, %r4600}, [%r4272]; 2026-02-21T09:20:57.6839868Z // end inline asm 2026-02-21T09:20:57.6839923Z bar.sync 0; 2026-02-21T09:20:57.6840047Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], {%r4933}; 2026-02-21T09:20:57.6840099Z bar.sync 0; 2026-02-21T09:20:57.6840160Z // begin inline asm 2026-02-21T09:20:57.6840302Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4463, %r4599}, [%r4272]; 2026-02-21T09:20:57.6840356Z // end inline asm 2026-02-21T09:20:57.6840412Z bar.sync 0; 2026-02-21T09:20:57.6840537Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4773], 
{%r4935}; 2026-02-21T09:20:57.6840589Z bar.sync 0; 2026-02-21T09:20:57.6840645Z // begin inline asm 2026-02-21T09:20:57.6840789Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4465, %r4601}, [%r4272]; 2026-02-21T09:20:57.6840846Z // end inline asm 2026-02-21T09:20:57.6840902Z $L__tmp17: 2026-02-21T09:20:57.6841196Z .loc 2 291 36 // standard.py:291:36 @[ ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:87:40 ] 2026-02-21T09:20:57.6841258Z // begin inline asm 2026-02-21T09:20:57.6841342Z fence.proxy.async.shared::cta; 2026-02-21T09:20:57.6841397Z // end inline asm 2026-02-21T09:20:57.6841482Z shfl.sync.idx.b32 %r5054, %r6, 0, 31, -1; 2026-02-21T09:20:57.6841556Z wgmma.fence.sync.aligned; 2026-02-21T09:20:57.6841628Z mov.pred %p50, -1; 2026-02-21T09:20:57.6841687Z // begin inline asm 2026-02-21T09:20:57.6842450Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r4434,%r4435,%r4436,%r4437,%r4438,%r4439,%r4440,%r4441,%r4442,%r4443,%r4444,%r4445,%r4446,%r4447,%r4448,%r4449,%r4450,%r4451,%r4452,%r4453,%r4454,%r4455,%r4456,%r4457,%r4458,%r4459,%r4460,%r4461,%r4462,%r4463,%r4464,%r4465}, {%r4566,%r4567,%r4568,%r4569}, %rd200, %p50, 1, 1; 2026-02-21T09:20:57.6842584Z // end inline asm 2026-02-21T09:20:57.6842651Z // begin inline asm 2026-02-21T09:20:57.6843411Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r4434,%r4435,%r4436,%r4437,%r4438,%r4439,%r4440,%r4441,%r4442,%r4443,%r4444,%r4445,%r4446,%r4447,%r4448,%r4449,%r4450,%r4451,%r4452,%r4453,%r4454,%r4455,%r4456,%r4457,%r4458,%r4459,%r4460,%r4461,%r4462,%r4463,%r4464,%r4465}, {%r4634,%r4635,%r4636,%r4637}, %rd201, %p50, 1, 1; 2026-02-21T09:20:57.6843535Z // end inline asm 2026-02-21T09:20:57.6843598Z // begin inline asm 2026-02-21T09:20:57.6844409Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r4570,%r4571,%r4572,%r4573,%r4574,%r4575,%r4576,%r4577,%r4578,%r4579,%r4580,%r4581,%r4582,%r4583,%r4584,%r4585,%r4586,%r4587,%r4588,%r4589,%r4590,%r4591,%r4592,%r4593,%r4594,%r4595,%r4596,%r4597,%r4598,%r4599,%r4600,%r4601}, {%r4566,%r4567,%r4568,%r4569}, %rd202, %p50, 1, 1; 2026-02-21T09:20:57.6844472Z // end inline asm 2026-02-21T09:20:57.6844538Z // begin inline asm 2026-02-21T09:20:57.6845291Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r4570,%r4571,%r4572,%r4573,%r4574,%r4575,%r4576,%r4577,%r4578,%r4579,%r4580,%r4581,%r4582,%r4583,%r4584,%r4585,%r4586,%r4587,%r4588,%r4589,%r4590,%r4591,%r4592,%r4593,%r4594,%r4595,%r4596,%r4597,%r4598,%r4599,%r4600,%r4601}, {%r4634,%r4635,%r4636,%r4637}, %rd203, %p50, 1, 1; 2026-02-21T09:20:57.6845350Z // end inline asm 2026-02-21T09:20:57.6845432Z wgmma.commit_group.sync.aligned; 2026-02-21T09:20:57.6845490Z mov.b32 %r5006, 0; 2026-02-21T09:20:57.6845596Z mov.b32 %r4702, %r4176; 2026-02-21T09:20:57.6845655Z mov.b32 %r4703, %r5006; 2026-02-21T09:20:57.6845716Z mov.b32 %r4704, %r5006; 2026-02-21T09:20:57.6845775Z // begin inline asm 2026-02-21T09:20:57.6846971Z // wait for regs: %r4434,%r4435,%r4436,%r4437,%r4438,%r4439,%r4440,%r4441,%r4442,%r4443,%r4444,%r4445,%r4446,%r4447,%r4448,%r4449,%r4450,%r4451,%r4452,%r4453,%r4454,%r4455,%r4456,%r4457,%r4458,%r4459,%r4460,%r4461,%r4462,%r4463,%r4464,%r4465,%r4570,%r4571,%r4572,%r4573,%r4574,%r4575,%r4576,%r4577,%r4578,%r4579,%r4580,%r4581,%r4582,%r4583,%r4584,%r4585,%r4586,%r4587,%r4588,%r4589,%r4590,%r4591,%r4592,%r4593,%r4594,%r4595,%r4596,%r4597,%r4598,%r4599,%r4600,%r4601,%r4702,%r4703,%r4704 2026-02-21T09:20:57.6847060Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:20:57.6847117Z // end inline asm 2026-02-21T09:20:57.6847173Z $L__tmp18: 2026-02-21T09:20:57.6847396Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6847463Z add.s32 %r5055, %r5049, 40960; 2026-02-21T09:20:57.6847667Z .loc 1 55 32 // 
ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:55:32 2026-02-21T09:20:57.6847732Z add.s32 %r5056, %r5055, %r58; 2026-02-21T09:20:57.6847800Z ld.shared.b16 %rs180, [%r5056]; 2026-02-21T09:20:57.6847869Z ld.shared.b16 %rs181, [%r5056+256]; 2026-02-21T09:20:57.6847937Z ld.shared.b16 %rs182, [%r5056+16]; 2026-02-21T09:20:57.6848003Z ld.shared.b16 %rs183, [%r5056+272]; 2026-02-21T09:20:57.6848063Z add.s32 %r5057, %r5055, %r59; 2026-02-21T09:20:57.6848126Z ld.shared.b16 %rs184, [%r5057]; 2026-02-21T09:20:57.6848192Z ld.shared.b16 %rs185, [%r5057+256]; 2026-02-21T09:20:57.6848256Z ld.shared.b16 %rs186, [%r5057+16]; 2026-02-21T09:20:57.6848321Z ld.shared.b16 %rs187, [%r5057+272]; 2026-02-21T09:20:57.6848392Z cvt.f32.bf16 %r4900, %rs180; 2026-02-21T09:20:57.6848452Z cvt.f32.bf16 %r4901, %rs181; 2026-02-21T09:20:57.6848510Z cvt.f32.bf16 %r4902, %rs184; 2026-02-21T09:20:57.6848570Z cvt.f32.bf16 %r4903, %rs185; 2026-02-21T09:20:57.6848631Z cvt.f32.bf16 %r4968, %rs182; 2026-02-21T09:20:57.6848690Z cvt.f32.bf16 %r4969, %rs183; 2026-02-21T09:20:57.6848748Z cvt.f32.bf16 %r4970, %rs186; 2026-02-21T09:20:57.6848809Z cvt.f32.bf16 %r4971, %rs187; 2026-02-21T09:20:57.6849008Z .loc 1 57 34 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:57:34 2026-02-21T09:20:57.6849071Z add.s32 %r5058, %r5305, 65536; 2026-02-21T09:20:57.6849232Z cvt.s64.s32 %rd212, %r5058; 2026-02-21T09:20:57.6849295Z add.s64 %rd205, %rd28, %rd212; 2026-02-21T09:20:57.6849492Z .loc 1 57 87 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:57:87 2026-02-21T09:20:57.6849620Z // begin inline asm 2026-02-21T09:20:57.6849681Z mov.u64 %rd204, 0x0; 2026-02-21T09:20:57.6849811Z createpolicy.fractional.L2::evict_first.b64 %rd204, 1.0; 2026-02-21T09:20:57.6849867Z // end inline asm 2026-02-21T09:20:57.6849925Z // begin inline asm 2026-02-21T09:20:57.6849984Z mov.u16 %rs161, 0x0; 2026-02-21T09:20:57.6850149Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs161 }, [ %rd205 + 0 ], %rd204; 
2026-02-21T09:20:57.6850205Z // end inline asm 2026-02-21T09:20:57.6850472Z .loc 1 65 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:65:28 2026-02-21T09:20:57.6850530Z bar.sync 0; 2026-02-21T09:20:57.6850595Z st.shared.b8 [%r60], %rs161; 2026-02-21T09:20:57.6850652Z bar.sync 0; 2026-02-21T09:20:57.6850732Z ld.shared.v2.b8 {%rs188, %rs189}, [%r61]; 2026-02-21T09:20:57.6850936Z .loc 1 60 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:60:28 2026-02-21T09:20:57.6851004Z shl.b16 %rs190, %rs188, 4; 2026-02-21T09:20:57.6851066Z shl.b16 %rs191, %rs189, 4; 2026-02-21T09:20:57.6851263Z .loc 1 75 58 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:75:58 2026-02-21T09:20:57.6851343Z selp.b16 %rs192, %rs190, %rs188, %p63; 2026-02-21T09:20:57.6851474Z cvt.s16.s8 %rs193, %rs192; 2026-02-21T09:20:57.6851540Z shr.s16 %rs194, %rs193, 4; 2026-02-21T09:20:57.6851608Z selp.b16 %rs195, %rs191, %rs189, %p63; 2026-02-21T09:20:57.6851671Z cvt.s16.s8 %rs196, %rs195; 2026-02-21T09:20:57.6851731Z shr.s16 %rs197, %rs196, 4; 2026-02-21T09:20:57.6851927Z .loc 1 80 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:80:32 2026-02-21T09:20:57.6851996Z cvt.rn.f32.s16 %r5059, %rs194; 2026-02-21T09:20:57.6852058Z cvt.rn.f32.s16 %r5060, %rs197; 2026-02-21T09:20:57.6852112Z bar.sync 0; 2026-02-21T09:20:57.6852175Z st.shared.b32 [%r62], %r5059; 2026-02-21T09:20:57.6852240Z st.shared.b32 [%r63], %r5060; 2026-02-21T09:20:57.6852296Z $L__tmp19: 2026-02-21T09:20:57.6852570Z .loc 2 291 36 // standard.py:291:36 @[ ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:87:40 ] 2026-02-21T09:20:57.6852733Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4434, %r4570}; 2026-02-21T09:20:57.6852788Z bar.sync 0; 2026-02-21T09:20:57.6852861Z // begin inline asm 2026-02-21T09:20:57.6853002Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4904}, [%r4773]; 2026-02-21T09:20:57.6853060Z // end inline asm 2026-02-21T09:20:57.6853112Z bar.sync 0; 
2026-02-21T09:20:57.6853261Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4436, %r4572}; 2026-02-21T09:20:57.6853315Z bar.sync 0; 2026-02-21T09:20:57.6853371Z // begin inline asm 2026-02-21T09:20:57.6853501Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4906}, [%r4773]; 2026-02-21T09:20:57.6853558Z // end inline asm 2026-02-21T09:20:57.6853612Z bar.sync 0; 2026-02-21T09:20:57.6853754Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4435, %r4571}; 2026-02-21T09:20:57.6853809Z bar.sync 0; 2026-02-21T09:20:57.6853868Z // begin inline asm 2026-02-21T09:20:57.6853994Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4905}, [%r4773]; 2026-02-21T09:20:57.6854048Z // end inline asm 2026-02-21T09:20:57.6854105Z bar.sync 0; 2026-02-21T09:20:57.6854248Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4437, %r4573}; 2026-02-21T09:20:57.6854301Z bar.sync 0; 2026-02-21T09:20:57.6854357Z // begin inline asm 2026-02-21T09:20:57.6854486Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4907}, [%r4773]; 2026-02-21T09:20:57.6854540Z // end inline asm 2026-02-21T09:20:57.6854593Z bar.sync 0; 2026-02-21T09:20:57.6854737Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4438, %r4574}; 2026-02-21T09:20:57.6854853Z bar.sync 0; 2026-02-21T09:20:57.6854910Z // begin inline asm 2026-02-21T09:20:57.6855052Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4908}, [%r4773]; 2026-02-21T09:20:57.6855109Z // end inline asm 2026-02-21T09:20:57.6855164Z bar.sync 0; 2026-02-21T09:20:57.6855354Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4440, %r4576}; 2026-02-21T09:20:57.6855410Z bar.sync 0; 2026-02-21T09:20:57.6855466Z // begin inline asm 2026-02-21T09:20:57.6855588Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4910}, [%r4773]; 2026-02-21T09:20:57.6855645Z // end inline asm 2026-02-21T09:20:57.6855700Z bar.sync 0; 2026-02-21T09:20:57.6855844Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4439, %r4575}; 2026-02-21T09:20:57.6855897Z bar.sync 0; 2026-02-21T09:20:57.6856006Z 
// begin inline asm 2026-02-21T09:20:57.6856135Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4909}, [%r4773]; 2026-02-21T09:20:57.6856190Z // end inline asm 2026-02-21T09:20:57.6856245Z bar.sync 0; 2026-02-21T09:20:57.6856388Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4441, %r4577}; 2026-02-21T09:20:57.6856440Z bar.sync 0; 2026-02-21T09:20:57.6856610Z // begin inline asm 2026-02-21T09:20:57.6856741Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4911}, [%r4773]; 2026-02-21T09:20:57.6856801Z // end inline asm 2026-02-21T09:20:57.6856853Z bar.sync 0; 2026-02-21T09:20:57.6856995Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4442, %r4578}; 2026-02-21T09:20:57.6857052Z bar.sync 0; 2026-02-21T09:20:57.6857109Z // begin inline asm 2026-02-21T09:20:57.6857316Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4912}, [%r4773]; 2026-02-21T09:20:57.6857378Z // end inline asm 2026-02-21T09:20:57.6857431Z bar.sync 0; 2026-02-21T09:20:57.6857576Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4444, %r4580}; 2026-02-21T09:20:57.6857629Z bar.sync 0; 2026-02-21T09:20:57.6857687Z // begin inline asm 2026-02-21T09:20:57.6857810Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4914}, [%r4773]; 2026-02-21T09:20:57.6857868Z // end inline asm 2026-02-21T09:20:57.6857923Z bar.sync 0; 2026-02-21T09:20:57.6858066Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4443, %r4579}; 2026-02-21T09:20:57.6858119Z bar.sync 0; 2026-02-21T09:20:57.6858178Z // begin inline asm 2026-02-21T09:20:57.6858307Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4913}, [%r4773]; 2026-02-21T09:20:57.6858362Z // end inline asm 2026-02-21T09:20:57.6858414Z bar.sync 0; 2026-02-21T09:20:57.6858559Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4445, %r4581}; 2026-02-21T09:20:57.6858611Z bar.sync 0; 2026-02-21T09:20:57.6858668Z // begin inline asm 2026-02-21T09:20:57.6858794Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4915}, [%r4773]; 2026-02-21T09:20:57.6858849Z // end inline asm 
2026-02-21T09:20:57.6858903Z bar.sync 0; 2026-02-21T09:20:57.6859045Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4446, %r4582}; 2026-02-21T09:20:57.6859101Z bar.sync 0; 2026-02-21T09:20:57.6859157Z // begin inline asm 2026-02-21T09:20:57.6859283Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4916}, [%r4773]; 2026-02-21T09:20:57.6859340Z // end inline asm 2026-02-21T09:20:57.6859393Z bar.sync 0; 2026-02-21T09:20:57.6859533Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4448, %r4584}; 2026-02-21T09:20:57.6859588Z bar.sync 0; 2026-02-21T09:20:57.6859647Z // begin inline asm 2026-02-21T09:20:57.6859771Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4918}, [%r4773]; 2026-02-21T09:20:57.6859827Z // end inline asm 2026-02-21T09:20:57.6859884Z bar.sync 0; 2026-02-21T09:20:57.6860026Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4447, %r4583}; 2026-02-21T09:20:57.6860078Z bar.sync 0; 2026-02-21T09:20:57.6860135Z // begin inline asm 2026-02-21T09:20:57.6860279Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4917}, [%r4773]; 2026-02-21T09:20:57.6860335Z // end inline asm 2026-02-21T09:20:57.6860388Z bar.sync 0; 2026-02-21T09:20:57.6860529Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4449, %r4585}; 2026-02-21T09:20:57.6860659Z bar.sync 0; 2026-02-21T09:20:57.6860715Z // begin inline asm 2026-02-21T09:20:57.6860841Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4919}, [%r4773]; 2026-02-21T09:20:57.6860896Z // end inline asm 2026-02-21T09:20:57.6860947Z bar.sync 0; 2026-02-21T09:20:57.6861146Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4450, %r4586}; 2026-02-21T09:20:57.6861203Z bar.sync 0; 2026-02-21T09:20:57.6861261Z // begin inline asm 2026-02-21T09:20:57.6861392Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4920}, [%r4773]; 2026-02-21T09:20:57.6861450Z // end inline asm 2026-02-21T09:20:57.6861515Z bar.sync 0; 2026-02-21T09:20:57.6861659Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4452, %r4588}; 2026-02-21T09:20:57.6861712Z 
bar.sync 0; 2026-02-21T09:20:57.6861832Z // begin inline asm 2026-02-21T09:20:57.6861964Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4922}, [%r4773]; 2026-02-21T09:20:57.6862020Z // end inline asm 2026-02-21T09:20:57.6862079Z bar.sync 0; 2026-02-21T09:20:57.6862228Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4451, %r4587}; 2026-02-21T09:20:57.6862292Z bar.sync 0; 2026-02-21T09:20:57.6862353Z // begin inline asm 2026-02-21T09:20:57.6862486Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4921}, [%r4773]; 2026-02-21T09:20:57.6862543Z // end inline asm 2026-02-21T09:20:57.6862596Z bar.sync 0; 2026-02-21T09:20:57.6862744Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4453, %r4589}; 2026-02-21T09:20:57.6862796Z bar.sync 0; 2026-02-21T09:20:57.6862853Z // begin inline asm 2026-02-21T09:20:57.6863028Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4923}, [%r4773]; 2026-02-21T09:20:57.6863089Z // end inline asm 2026-02-21T09:20:57.6863142Z bar.sync 0; 2026-02-21T09:20:57.6863286Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4454, %r4590}; 2026-02-21T09:20:57.6863342Z bar.sync 0; 2026-02-21T09:20:57.6863398Z // begin inline asm 2026-02-21T09:20:57.6863522Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4924}, [%r4773]; 2026-02-21T09:20:57.6863580Z // end inline asm 2026-02-21T09:20:57.6863633Z bar.sync 0; 2026-02-21T09:20:57.6863775Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4456, %r4592}; 2026-02-21T09:20:57.6863826Z bar.sync 0; 2026-02-21T09:20:57.6863888Z // begin inline asm 2026-02-21T09:20:57.6864011Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4926}, [%r4773]; 2026-02-21T09:20:57.6864065Z // end inline asm 2026-02-21T09:20:57.6864121Z bar.sync 0; 2026-02-21T09:20:57.6864263Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4455, %r4591}; 2026-02-21T09:20:57.6864317Z bar.sync 0; 2026-02-21T09:20:57.6864374Z // begin inline asm 2026-02-21T09:20:57.6864499Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4925}, [%r4773]; 
2026-02-21T09:20:57.6864555Z // end inline asm 2026-02-21T09:20:57.6864615Z bar.sync 0; 2026-02-21T09:20:57.6864758Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4457, %r4593}; 2026-02-21T09:20:57.6864812Z bar.sync 0; 2026-02-21T09:20:57.6864873Z // begin inline asm 2026-02-21T09:20:57.6864998Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4927}, [%r4773]; 2026-02-21T09:20:57.6865052Z // end inline asm 2026-02-21T09:20:57.6865117Z bar.sync 0; 2026-02-21T09:20:57.6865266Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4458, %r4594}; 2026-02-21T09:20:57.6865321Z bar.sync 0; 2026-02-21T09:20:57.6865378Z // begin inline asm 2026-02-21T09:20:57.6865503Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4928}, [%r4773]; 2026-02-21T09:20:57.6865557Z // end inline asm 2026-02-21T09:20:57.6865610Z bar.sync 0; 2026-02-21T09:20:57.6865756Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4460, %r4596}; 2026-02-21T09:20:57.6865809Z bar.sync 0; 2026-02-21T09:20:57.6865866Z // begin inline asm 2026-02-21T09:20:57.6865991Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4930}, [%r4773]; 2026-02-21T09:20:57.6866047Z // end inline asm 2026-02-21T09:20:57.6866099Z bar.sync 0; 2026-02-21T09:20:57.6866241Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4459, %r4595}; 2026-02-21T09:20:57.6866374Z bar.sync 0; 2026-02-21T09:20:57.6866431Z // begin inline asm 2026-02-21T09:20:57.6866667Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4929}, [%r4773]; 2026-02-21T09:20:57.6866724Z // end inline asm 2026-02-21T09:20:57.6866865Z bar.sync 0; 2026-02-21T09:20:57.6867013Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4461, %r4597}; 2026-02-21T09:20:57.6867065Z bar.sync 0; 2026-02-21T09:20:57.6867123Z // begin inline asm 2026-02-21T09:20:57.6867246Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4931}, [%r4773]; 2026-02-21T09:20:57.6867302Z // end inline asm 2026-02-21T09:20:57.6867358Z bar.sync 0; 2026-02-21T09:20:57.6867502Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], 
{%r4462, %r4598}; 2026-02-21T09:20:57.6867554Z bar.sync 0; 2026-02-21T09:20:57.6867675Z // begin inline asm 2026-02-21T09:20:57.6867806Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4932}, [%r4773]; 2026-02-21T09:20:57.6867859Z // end inline asm 2026-02-21T09:20:57.6867914Z bar.sync 0; 2026-02-21T09:20:57.6868058Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4464, %r4600}; 2026-02-21T09:20:57.6868110Z bar.sync 0; 2026-02-21T09:20:57.6868167Z // begin inline asm 2026-02-21T09:20:57.6868357Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4934}, [%r4773]; 2026-02-21T09:20:57.6868433Z // end inline asm 2026-02-21T09:20:57.6868489Z bar.sync 0; 2026-02-21T09:20:57.6868631Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4463, %r4599}; 2026-02-21T09:20:57.6868685Z bar.sync 0; 2026-02-21T09:20:57.6868743Z // begin inline asm 2026-02-21T09:20:57.6868936Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4933}, [%r4773]; 2026-02-21T09:20:57.6868994Z // end inline asm 2026-02-21T09:20:57.6869049Z bar.sync 0; 2026-02-21T09:20:57.6869194Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4272], {%r4465, %r4601}; 2026-02-21T09:20:57.6869247Z bar.sync 0; 2026-02-21T09:20:57.6869306Z // begin inline asm 2026-02-21T09:20:57.6869429Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4935}, [%r4773]; 2026-02-21T09:20:57.6869486Z // end inline asm 2026-02-21T09:20:57.6869545Z // begin inline asm 2026-02-21T09:20:57.6869624Z fence.proxy.async.shared::cta; 2026-02-21T09:20:57.6869678Z // end inline asm 2026-02-21T09:20:57.6869748Z wgmma.fence.sync.aligned; 2026-02-21T09:20:57.6869820Z shl.b32 %r5061, %r5054, 8; 2026-02-21T09:20:57.6869881Z and.b32 %r5062, %r5061, 4096; 2026-02-21T09:20:57.6869944Z add.s32 %r5063, %r5062, %r4176; 2026-02-21T09:20:57.6870008Z bfe.u32 %r5064, %r5063, 4, 14; 2026-02-21T09:20:57.6870071Z cvt.u64.u32 %rd213, %r5064; 2026-02-21T09:20:57.6870153Z or.b64 %rd207, %rd213, -9223371899382267904; 2026-02-21T09:20:57.6870211Z // begin inline asm 2026-02-21T09:20:57.6870988Z 
wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r4904,%r4905,%r4906,%r4907,%r4908,%r4909,%r4910,%r4911,%r4912,%r4913,%r4914,%r4915,%r4916,%r4917,%r4918,%r4919,%r4920,%r4921,%r4922,%r4923,%r4924,%r4925,%r4926,%r4927,%r4928,%r4929,%r4930,%r4931,%r4932,%r4933,%r4934,%r4935}, {%r4900,%r4901,%r4902,%r4903}, %rd207, %p50, 1, 1; 2026-02-21T09:20:57.6871048Z // end inline asm 2026-02-21T09:20:57.6871121Z add.s32 %r5065, %r5063, 32; 2026-02-21T09:20:57.6871186Z bfe.u32 %r5066, %r5065, 4, 14; 2026-02-21T09:20:57.6871249Z cvt.u64.u32 %rd214, %r5066; 2026-02-21T09:20:57.6871332Z or.b64 %rd208, %rd214, -9223371899382267904; 2026-02-21T09:20:57.6871392Z // begin inline asm 2026-02-21T09:20:57.6872160Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r4904,%r4905,%r4906,%r4907,%r4908,%r4909,%r4910,%r4911,%r4912,%r4913,%r4914,%r4915,%r4916,%r4917,%r4918,%r4919,%r4920,%r4921,%r4922,%r4923,%r4924,%r4925,%r4926,%r4927,%r4928,%r4929,%r4930,%r4931,%r4932,%r4933,%r4934,%r4935}, {%r4968,%r4969,%r4970,%r4971}, %rd208, %p50, 1, 1; 2026-02-21T09:20:57.6872216Z // end inline asm 2026-02-21T09:20:57.6872296Z wgmma.commit_group.sync.aligned; 2026-02-21T09:20:57.6872357Z mov.b32 %r5004, %r4176; 2026-02-21T09:20:57.6872415Z mov.b32 %r5005, %r5006; 2026-02-21T09:20:57.6872548Z // begin inline asm 2026-02-21T09:20:57.6873107Z // wait for regs: %r4904,%r4905,%r4906,%r4907,%r4908,%r4909,%r4910,%r4911,%r4912,%r4913,%r4914,%r4915,%r4916,%r4917,%r4918,%r4919,%r4920,%r4921,%r4922,%r4923,%r4924,%r4925,%r4926,%r4927,%r4928,%r4929,%r4930,%r4931,%r4932,%r4933,%r4934,%r4935,%r5004,%r5005,%r5006 2026-02-21T09:20:57.6873229Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:20:57.6873283Z // end inline asm 2026-02-21T09:20:57.6873340Z $L__tmp20: 2026-02-21T09:20:57.6873557Z .loc 1 43 78 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:43:78 2026-02-21T09:20:57.6873620Z add.s32 %r5067, %r5307, 1; 2026-02-21T09:20:57.6873695Z setp.gt.s32 %p59, %r5067, 4; 2026-02-21T09:20:57.6873768Z selp.b32 
%r5307, 0, %r5067, %p59; 2026-02-21T09:20:57.6874024Z .loc 1 51 80 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:51:80 2026-02-21T09:20:57.6874089Z shl.b32 %r5068, %r5307, 13; 2026-02-21T09:20:57.6874148Z add.s32 %r5042, %r48, %r5068; 2026-02-21T09:20:57.6874214Z selp.b32 %r5043, 8, 0, %p57; 2026-02-21T09:20:57.6874271Z // begin inline asm 2026-02-21T09:20:57.6874421Z cp.async.ca.shared.global [ %r5042 + 0 ], [ %rd224 + 0 ], 0x8, %r5043; 2026-02-21T09:20:57.6874476Z // end inline asm 2026-02-21T09:20:57.6874543Z cp.async.commit_group; 2026-02-21T09:20:57.6874604Z add.s32 %r5044, %r49, %r5068; 2026-02-21T09:20:57.6874661Z // begin inline asm 2026-02-21T09:20:57.6874796Z cp.async.ca.shared.global [ %r5044 + 0 ], [ %rd223 + 0 ], 0x8, %r5043; 2026-02-21T09:20:57.6874851Z // end inline asm 2026-02-21T09:20:57.6874966Z cp.async.commit_group; 2026-02-21T09:20:57.6875172Z .loc 1 43 78 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:43:78 2026-02-21T09:20:57.6875236Z add.s64 %rd224, %rd224, 64; 2026-02-21T09:20:57.6875298Z add.s64 %rd223, %rd223, 64; 2026-02-21T09:20:57.6875358Z add.s32 %r5305, %r5305, 131072; 2026-02-21T09:20:57.6875424Z setp.lt.u64 %p60, %rd225, 496; 2026-02-21T09:20:57.6875489Z @%p60 bra $L__BB0_14; 2026-02-21T09:20:57.6875600Z // %bb.15: // in Loop: Header=BB0_13 Depth=1 2026-02-21T09:20:57.6875800Z .loc 1 34 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:34:32 2026-02-21T09:20:57.6875862Z or.b32 %r5105, %r383, %r8; 2026-02-21T09:20:57.6876060Z .loc 1 36 32 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:36:32 2026-02-21T09:20:57.6876119Z or.b32 %r5106, %r384, %r12; 2026-02-21T09:20:57.6876176Z or.b32 %r5107, %r384, %r13; 2026-02-21T09:20:57.6876236Z or.b32 %r5108, %r384, %r14; 2026-02-21T09:20:57.6876293Z or.b32 %r5109, %r384, %r15; 2026-02-21T09:20:57.6876657Z .loc 1 43 78 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:43:78 2026-02-21T09:20:57.6876737Z cp.async.wait_group 0; 
2026-02-21T09:20:57.6876792Z bar.sync 0; 2026-02-21T09:20:57.6876988Z .loc 1 90 28 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:90:28 2026-02-21T09:20:57.6877070Z cvt.rn.bf16x2.f32 %r5110, %r4905, %r4904; 2026-02-21T09:20:57.6877145Z cvt.rn.bf16x2.f32 %r5111, %r4907, %r4906; 2026-02-21T09:20:57.6877215Z cvt.rn.bf16x2.f32 %r5112, %r4909, %r4908; 2026-02-21T09:20:57.6877285Z cvt.rn.bf16x2.f32 %r5113, %r4911, %r4910; 2026-02-21T09:20:57.6877358Z cvt.rn.bf16x2.f32 %r5114, %r4913, %r4912; 2026-02-21T09:20:57.6877427Z cvt.rn.bf16x2.f32 %r5115, %r4915, %r4914; 2026-02-21T09:20:57.6877494Z cvt.rn.bf16x2.f32 %r5116, %r4917, %r4916; 2026-02-21T09:20:57.6877565Z cvt.rn.bf16x2.f32 %r5117, %r4919, %r4918; 2026-02-21T09:20:57.6877633Z cvt.rn.bf16x2.f32 %r5118, %r4921, %r4920; 2026-02-21T09:20:57.6877700Z cvt.rn.bf16x2.f32 %r5119, %r4923, %r4922; 2026-02-21T09:20:57.6877769Z cvt.rn.bf16x2.f32 %r5120, %r4925, %r4924; 2026-02-21T09:20:57.6877841Z cvt.rn.bf16x2.f32 %r5121, %r4927, %r4926; 2026-02-21T09:20:57.6877910Z cvt.rn.bf16x2.f32 %r5122, %r4929, %r4928; 2026-02-21T09:20:57.6877977Z cvt.rn.bf16x2.f32 %r5123, %r4931, %r4930; 2026-02-21T09:20:57.6878133Z cvt.rn.bf16x2.f32 %r5124, %r4933, %r4932; 2026-02-21T09:20:57.6878201Z cvt.rn.bf16x2.f32 %r5125, %r4935, %r4934; 2026-02-21T09:20:57.6878412Z .loc 1 91 43 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:91:43 2026-02-21T09:20:57.6878538Z shl.b32 %r5126, %r5106, 13; 2026-02-21T09:20:57.6878598Z shl.b32 %r5127, %r5107, 13; 2026-02-21T09:20:57.6878654Z shl.b32 %r5128, %r5108, 13; 2026-02-21T09:20:57.6878711Z shl.b32 %r5129, %r5109, 13; 2026-02-21T09:20:57.6878912Z .loc 1 91 50 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:91:50 2026-02-21T09:20:57.6878974Z add.s32 %r5130, %r5126, %r5105; 2026-02-21T09:20:57.6879034Z add.s32 %r5131, %r5127, %r5105; 2026-02-21T09:20:57.6879167Z add.s32 %r5132, %r5128, %r5105; 2026-02-21T09:20:57.6879229Z add.s32 %r5133, %r5129, %r5105; 2026-02-21T09:20:57.6879423Z 
.loc 1 91 22 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:91:22 2026-02-21T09:20:57.6879499Z mad.wide.s32 %rd215, %r5130, 2, %rd29; 2026-02-21T09:20:57.6879568Z mad.wide.s32 %rd216, %r5131, 2, %rd29; 2026-02-21T09:20:57.6879632Z mad.wide.s32 %rd217, %r5132, 2, %rd29; 2026-02-21T09:20:57.6879697Z mad.wide.s32 %rd218, %r5133, 2, %rd29; 2026-02-21T09:20:57.6879895Z .loc 1 91 81 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:91:81 2026-02-21T09:20:57.6880007Z st.shared.v4.b32 [%r66], {%r5110, %r5112, %r5114, %r5116}; 2026-02-21T09:20:57.6880182Z st.shared.v4.b32 [%r66+512], {%r5111, %r5113, %r5115, %r5117}; 2026-02-21T09:20:57.6880291Z st.shared.v4.b32 [%r67], {%r5118, %r5120, %r5122, %r5124}; 2026-02-21T09:20:57.6880400Z st.shared.v4.b32 [%r67+512], {%r5119, %r5121, %r5123, %r5125}; 2026-02-21T09:20:57.6880456Z bar.sync 0; 2026-02-21T09:20:57.6880517Z // begin inline asm 2026-02-21T09:20:57.6880710Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5069, %r5070, %r5071, %r5072}, [%r5073]; 2026-02-21T09:20:57.6880767Z // end inline asm 2026-02-21T09:20:57.6880824Z // begin inline asm 2026-02-21T09:20:57.6881007Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5074, %r5075, %r5076, %r5077}, [%r5078]; 2026-02-21T09:20:57.6881065Z // end inline asm 2026-02-21T09:20:57.6881124Z // begin inline asm 2026-02-21T09:20:57.6881302Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5079, %r5080, %r5081, %r5082}, [%r5083]; 2026-02-21T09:20:57.6881357Z // end inline asm 2026-02-21T09:20:57.6881413Z // begin inline asm 2026-02-21T09:20:57.6881590Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5084, %r5085, %r5086, %r5087}, [%r5088]; 2026-02-21T09:20:57.6881646Z // end inline asm 2026-02-21T09:20:57.6881703Z // begin inline asm 2026-02-21T09:20:57.6881829Z st.global.v4.b32 [ %rd215 + 0 ], { %r5069, %r5070, %r5071, %r5072 }; 2026-02-21T09:20:57.6881887Z // end inline asm 2026-02-21T09:20:57.6881947Z // begin inline asm 2026-02-21T09:20:57.6882064Z st.global.v4.b32 [ %rd216 + 0 
], { %r5074, %r5075, %r5076, %r5077 }; 2026-02-21T09:20:57.6882126Z // end inline asm 2026-02-21T09:20:57.6882190Z // begin inline asm 2026-02-21T09:20:57.6882306Z st.global.v4.b32 [ %rd217 + 0 ], { %r5079, %r5080, %r5081, %r5082 }; 2026-02-21T09:20:57.6882362Z // end inline asm 2026-02-21T09:20:57.6882423Z // begin inline asm 2026-02-21T09:20:57.6882535Z st.global.v4.b32 [ %rd218 + 0 ], { %r5084, %r5085, %r5086, %r5087 }; 2026-02-21T09:20:57.6882590Z // end inline asm 2026-02-21T09:20:57.6882808Z .loc 1 22 121 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:22:121 2026-02-21T09:20:57.6882871Z add.s32 %r5304, %r5304, 1; 2026-02-21T09:20:57.6882939Z setp.ne.b32 %p61, %r5304, %r2; 2026-02-21T09:20:57.6883002Z @%p61 bra $L__BB0_13; 2026-02-21T09:20:57.6883091Z $L__BB0_16: // %._crit_edge 2026-02-21T09:20:57.6883295Z .loc 1 22 4 // ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py:22:4 2026-02-21T09:20:57.6883347Z ret; 2026-02-21T09:20:57.6883460Z $L__tmp21: 2026-02-21T09:20:57.6883515Z $L__func_end0: 2026-02-21T09:20:57.6883601Z // -- End function 2026-02-21T09:20:57.6883655Z } 2026-02-21T09:20:57.6883901Z .file 1 "/tmp/torchinductor_root/kj/ckjxelsk547kjxhd2wa7zptbe4k4vhzejeizol4sswrxl2kt4xv6.py" 2026-02-21T09:20:57.6884166Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T09:20:57.6884233Z .section .debug_abbrev 2026-02-21T09:20:57.6884285Z { 2026-02-21T09:20:57.6884382Z .b8 1 // Abbreviation Code 2026-02-21T09:20:57.6884476Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:20:57.6884561Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:20:57.6884690Z .b8 37 // DW_AT_producer 2026-02-21T09:20:57.6884770Z .b8 8 // DW_FORM_string 2026-02-21T09:20:57.6884848Z .b8 19 // DW_AT_language 2026-02-21T09:20:57.6884930Z .b8 5 // DW_FORM_data2 2026-02-21T09:20:57.6885006Z .b8 3 // DW_AT_name 2026-02-21T09:20:57.6885082Z .b8 8 // DW_FORM_string 2026-02-21T09:20:57.6885168Z .b8 16 // DW_AT_stmt_list 
2026-02-21T09:20:57.6885246Z .b8 6 // DW_FORM_data4 2026-02-21T09:20:57.6885324Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:20:57.6885465Z .b8 8 // DW_FORM_string 2026-02-21T09:20:57.6885538Z .b8 0 // EOM(1) 2026-02-21T09:20:57.6885607Z .b8 0 // EOM(2) 2026-02-21T09:20:57.6885697Z .b8 2 // Abbreviation Code 2026-02-21T09:20:57.6885783Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:20:57.6885872Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:20:57.6885950Z .b8 3 // DW_AT_name 2026-02-21T09:20:57.6886028Z .b8 8 // DW_FORM_string 2026-02-21T09:20:57.6886107Z .b8 32 // DW_AT_inline 2026-02-21T09:20:57.6886188Z .b8 11 // DW_FORM_data1 2026-02-21T09:20:57.6886259Z .b8 0 // EOM(1) 2026-02-21T09:20:57.6886324Z .b8 0 // EOM(2) 2026-02-21T09:20:57.6886408Z .b8 3 // Abbreviation Code 2026-02-21T09:20:57.6886613Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:20:57.6886698Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:20:57.6886777Z .b8 17 // DW_AT_low_pc 2026-02-21T09:20:57.6886853Z .b8 1 // DW_FORM_addr 2026-02-21T09:20:57.6886943Z .b8 18 // DW_AT_high_pc 2026-02-21T09:20:57.6887022Z .b8 1 // DW_FORM_addr 2026-02-21T09:20:57.6887115Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:20:57.6887193Z .b8 19 // DW_FORM_ref4 2026-02-21T09:20:57.6887264Z .b8 0 // EOM(1) 2026-02-21T09:20:57.6887331Z .b8 0 // EOM(2) 2026-02-21T09:20:57.6887415Z .b8 4 // Abbreviation Code 2026-02-21T09:20:57.6887513Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T09:20:57.6887592Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:20:57.6887682Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:20:57.6887756Z .b8 19 // DW_FORM_ref4 2026-02-21T09:20:57.6887830Z .b8 17 // DW_AT_low_pc 2026-02-21T09:20:57.6887985Z .b8 1 // DW_FORM_addr 2026-02-21T09:20:57.6888068Z .b8 18 // DW_AT_high_pc 2026-02-21T09:20:57.6888141Z .b8 1 // DW_FORM_addr 2026-02-21T09:20:57.6888220Z .b8 88 // DW_AT_call_file 2026-02-21T09:20:57.6888370Z .b8 11 // DW_FORM_data1 2026-02-21T09:20:57.6888448Z .b8 89 // DW_AT_call_line 2026-02-21T09:20:57.6888525Z .b8 11 // DW_FORM_data1 
2026-02-21T09:20:57.6888609Z .b8 87 // DW_AT_call_column 2026-02-21T09:20:57.6888685Z .b8 11 // DW_FORM_data1 2026-02-21T09:20:57.6888812Z .b8 0 // EOM(1) 2026-02-21T09:20:57.6888883Z .b8 0 // EOM(2) 2026-02-21T09:20:57.6888949Z .b8 0 // EOM(3) 2026-02-21T09:20:57.6889001Z } 2026-02-21T09:20:57.6889063Z .section .debug_info 2026-02-21T09:20:57.6889117Z { 2026-02-21T09:20:57.6889204Z .b32 178 // Length of Unit 2026-02-21T09:20:57.6889295Z .b8 2 // DWARF version number 2026-02-21T09:20:57.6889349Z .b8 0 2026-02-21T09:20:57.6889478Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:20:57.6889570Z .b8 8 // Address Size (in bytes) 2026-02-21T09:20:57.6889684Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T09:20:57.6889831Z .b8 116 // DW_AT_producer 2026-02-21T09:20:57.6889885Z .b8 114 2026-02-21T09:20:57.6889936Z .b8 105 2026-02-21T09:20:57.6889987Z .b8 116 2026-02-21T09:20:57.6890050Z .b8 111 2026-02-21T09:20:57.6890102Z .b8 110 2026-02-21T09:20:57.6890151Z .b8 0 2026-02-21T09:20:57.6890233Z .b8 2 // DW_AT_language 2026-02-21T09:20:57.6890285Z .b8 0 2026-02-21T09:20:57.6890361Z .b8 99 // DW_AT_name 2026-02-21T09:20:57.6890415Z .b8 107 2026-02-21T09:20:57.6890464Z .b8 106 2026-02-21T09:20:57.6890512Z .b8 120 2026-02-21T09:20:57.6890562Z .b8 101 2026-02-21T09:20:57.6890613Z .b8 108 2026-02-21T09:20:57.6890664Z .b8 115 2026-02-21T09:20:57.6890712Z .b8 107 2026-02-21T09:20:57.6890764Z .b8 53 2026-02-21T09:20:57.6890813Z .b8 52 2026-02-21T09:20:57.6890863Z .b8 55 2026-02-21T09:20:57.6890911Z .b8 107 2026-02-21T09:20:57.6890963Z .b8 106 2026-02-21T09:20:57.6891012Z .b8 120 2026-02-21T09:20:57.6891062Z .b8 104 2026-02-21T09:20:57.6891114Z .b8 100 2026-02-21T09:20:57.6891162Z .b8 50 2026-02-21T09:20:57.6891211Z .b8 119 2026-02-21T09:20:57.6891262Z .b8 97 2026-02-21T09:20:57.6891314Z .b8 55 2026-02-21T09:20:57.6891375Z .b8 122 2026-02-21T09:20:57.6891424Z .b8 112 2026-02-21T09:20:57.6891476Z .b8 116 2026-02-21T09:20:57.6891525Z .b8 98 
2026-02-21T09:20:57.6891575Z .b8 101 2026-02-21T09:20:57.6891623Z .b8 52 2026-02-21T09:20:57.6891677Z .b8 107 2026-02-21T09:20:57.6891726Z .b8 52 2026-02-21T09:20:57.6891776Z .b8 118 2026-02-21T09:20:57.6891825Z .b8 104 2026-02-21T09:20:57.6891877Z .b8 122 2026-02-21T09:20:57.6891926Z .b8 101 2026-02-21T09:20:57.6891975Z .b8 106 2026-02-21T09:20:57.6892027Z .b8 101 2026-02-21T09:20:57.6892078Z .b8 105 2026-02-21T09:20:57.6892128Z .b8 122 2026-02-21T09:20:57.6892176Z .b8 111 2026-02-21T09:20:57.6892229Z .b8 108 2026-02-21T09:20:57.6892277Z .b8 52 2026-02-21T09:20:57.6892325Z .b8 115 2026-02-21T09:20:57.6892377Z .b8 115 2026-02-21T09:20:57.6892427Z .b8 119 2026-02-21T09:20:57.6892477Z .b8 114 2026-02-21T09:20:57.6892527Z .b8 120 2026-02-21T09:20:57.6892585Z .b8 108 2026-02-21T09:20:57.6892640Z .b8 50 2026-02-21T09:20:57.6892691Z .b8 107 2026-02-21T09:20:57.6892742Z .b8 116 2026-02-21T09:20:57.6892795Z .b8 52 2026-02-21T09:20:57.6892846Z .b8 120 2026-02-21T09:20:57.6892895Z .b8 118 2026-02-21T09:20:57.6892945Z .b8 54 2026-02-21T09:20:57.6892994Z .b8 46 2026-02-21T09:20:57.6893044Z .b8 112 2026-02-21T09:20:57.6893158Z .b8 121 2026-02-21T09:20:57.6893209Z .b8 0 2026-02-21T09:20:57.6893319Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:20:57.6893406Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:20:57.6893459Z .b8 116 2026-02-21T09:20:57.6893558Z .b8 109 2026-02-21T09:20:57.6893607Z .b8 112 2026-02-21T09:20:57.6893655Z .b8 47 2026-02-21T09:20:57.6893706Z .b8 116 2026-02-21T09:20:57.6893756Z .b8 111 2026-02-21T09:20:57.6893805Z .b8 114 2026-02-21T09:20:57.6893855Z .b8 99 2026-02-21T09:20:57.6893904Z .b8 104 2026-02-21T09:20:57.6893954Z .b8 105 2026-02-21T09:20:57.6894006Z .b8 110 2026-02-21T09:20:57.6894059Z .b8 100 2026-02-21T09:20:57.6894107Z .b8 117 2026-02-21T09:20:57.6894167Z .b8 99 2026-02-21T09:20:57.6894221Z .b8 116 2026-02-21T09:20:57.6894320Z .b8 111 2026-02-21T09:20:57.6894372Z .b8 114 2026-02-21T09:20:57.6894421Z .b8 95 2026-02-21T09:20:57.6894472Z .b8 114 
2026-02-21T09:20:57.6894521Z .b8 111 2026-02-21T09:20:57.6894570Z .b8 111 2026-02-21T09:20:57.6894621Z .b8 116 2026-02-21T09:20:57.6894673Z .b8 47 2026-02-21T09:20:57.6894722Z .b8 107 2026-02-21T09:20:57.6894771Z .b8 106 2026-02-21T09:20:57.6894822Z .b8 0 2026-02-21T09:20:57.6894935Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T09:20:57.6895018Z .b8 95 // DW_AT_name 2026-02-21T09:20:57.6895068Z .b8 104 2026-02-21T09:20:57.6895121Z .b8 101 2026-02-21T09:20:57.6895170Z .b8 108 2026-02-21T09:20:57.6895219Z .b8 105 2026-02-21T09:20:57.6895270Z .b8 111 2026-02-21T09:20:57.6895319Z .b8 110 2026-02-21T09:20:57.6895417Z .b8 95 2026-02-21T09:20:57.6895470Z .b8 109 2026-02-21T09:20:57.6895521Z .b8 97 2026-02-21T09:20:57.6895569Z .b8 116 2026-02-21T09:20:57.6895618Z .b8 109 2026-02-21T09:20:57.6895670Z .b8 117 2026-02-21T09:20:57.6895720Z .b8 108 2026-02-21T09:20:57.6895768Z .b8 95 2026-02-21T09:20:57.6895816Z .b8 98 2026-02-21T09:20:57.6895867Z .b8 102 2026-02-21T09:20:57.6895915Z .b8 49 2026-02-21T09:20:57.6895975Z .b8 54 2026-02-21T09:20:57.6896027Z .b8 95 2026-02-21T09:20:57.6896079Z .b8 105 2026-02-21T09:20:57.6896129Z .b8 110 2026-02-21T09:20:57.6896177Z .b8 116 2026-02-21T09:20:57.6896228Z .b8 52 2026-02-21T09:20:57.6896278Z .b8 0 2026-02-21T09:20:57.6896361Z .b8 1 // DW_AT_inline 2026-02-21T09:20:57.6896581Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T09:20:57.6896686Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T09:20:57.6896785Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T09:20:57.6896887Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:20:57.6897020Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T09:20:57.6897118Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:20:57.6897207Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T09:20:57.6897300Z .b64 $L__tmp20 // DW_AT_high_pc 2026-02-21T09:20:57.6897386Z .b8 1 // DW_AT_call_file 2026-02-21T09:20:57.6897467Z .b8 87 // DW_AT_call_line 2026-02-21T09:20:57.6897567Z .b8 40 // 
DW_AT_call_column 2026-02-21T09:20:57.6897663Z .b8 0 // End Of Children Mark 2026-02-21T09:20:57.6897750Z .b8 0 // End Of Children Mark 2026-02-21T09:20:57.6897800Z } 2026-02-21T09:20:57.6897872Z .section .debug_macinfo { } 2026-02-21T09:20:57.6897880Z 2026-02-21T09:20:57.6897959Z ================================================================ 2026-02-21T09:20:57.6898076Z please share the reproducer above with Triton project. 2026-02-21T09:20:57.6899216Z [538s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 256, 128], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['first', 'first'], loop_orders=[[1, 0]], num_stages=6, num_warps=2, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 2], range_warp_specializes=[]) 2026-02-21T09:20:57.6899378Z Tensor-likes are not close! 2026-02-21T09:20:57.6899443Z 2026-02-21T09:20:57.6899545Z Mismatched elements: 133804599 / 134217728 (99.7%) 2026-02-21T09:20:57.6899719Z Greatest absolute difference: 1392.0 at index (6231, 2515) (up to 0.01 allowed) 2026-02-21T09:20:57.6899881Z Greatest relative difference: inf at index (3448, 3113) (up to 0.01 allowed) 2026-02-21T09:20:57.6900006Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:20:57.6900018Z 2026-02-21T09:21:00.3442738Z 2026-02-21T09:21:00.3442756Z 2026-02-21T09:21:00.3442764Z 2026-02-21T09:21:00.3443723Z ================================================================ 2026-02-21T09:21:00.3444373Z Internal Triton PTX codegen error 2026-02-21T09:21:00.3444972Z `ptxas` stderr: 2026-02-21T09:21:00.3446143Z ptxas fatal : (C7602) Insufficient registers (64) to compile instruction at line 1035 in function _helion_matmul_bf16_int4. Try to compile with register target of 158 or higher. 
2026-02-21T09:21:00.3453245Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:21:00.3453529Z 2026-02-21T09:21:00.3454127Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp74pdajdr.ptx -o /tmp/tmp74pdajdr.ptx.o 2026-02-21T09:21:00.3454815Z 2026-02-21T09:21:00.3454820Z 2026-02-21T09:21:00.3454901Z // 2026-02-21T09:21:00.3455438Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:21:00.3455693Z // 2026-02-21T09:21:00.3455780Z 2026-02-21T09:21:00.3455847Z .version 8.7 2026-02-21T09:21:00.3456021Z .target sm_90a 2026-02-21T09:21:00.3456186Z .address_size 64 2026-02-21T09:21:00.3456303Z 2026-02-21T09:21:00.3456665Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T09:21:00.3457080Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:21:00.3457362Z // @_helion_matmul_bf16_int4 2026-02-21T09:21:00.3457661Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T09:21:00.3457990Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T09:21:00.3458401Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T09:21:00.3458804Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T09:21:00.3459209Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T09:21:00.3459620Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T09:21:00.3459920Z ) 2026-02-21T09:21:00.3460071Z .reqntid 256 2026-02-21T09:21:00.3460228Z .maxnreg 64 2026-02-21T09:21:00.3460391Z { 2026-02-21T09:21:00.3460537Z .reg .pred %p<65>; 2026-02-21T09:21:00.3460717Z .reg .b16 %rs<568>; 2026-02-21T09:21:00.3460919Z .reg .b32 %r<23558>; 2026-02-21T09:21:00.3461090Z .reg .b64 %rd<724>; 2026-02-21T09:21:00.3461432Z .loc 1 14 0 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:14:0 2026-02-21T09:21:00.3461824Z $L__func_begin0: 
2026-02-21T09:21:00.3462159Z .loc 1 14 0 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:14:0 2026-02-21T09:21:00.3462521Z 2026-02-21T09:21:00.3462588Z // %bb.0: 2026-02-21T09:21:00.3462807Z ld.param.b64 %rd46, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T09:21:00.3463134Z ld.param.b64 %rd45, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T09:21:00.3463431Z ld.param.b64 %rd44, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T09:21:00.3463690Z $L__tmp0: 2026-02-21T09:21:00.3464009Z .loc 1 20 30 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:20:30 2026-02-21T09:21:00.3464396Z mov.u32 %r22257, %ctaid.x; 2026-02-21T09:21:00.3464751Z .loc 1 21 49 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:21:49 2026-02-21T09:21:00.3465382Z min.u32 %r2, %r22257, 2047; 2026-02-21T09:21:00.3465734Z .loc 1 22 121 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:22:121 2026-02-21T09:21:00.3466097Z sub.s32 %r2863, %r2, %r22257; 2026-02-21T09:21:00.3466376Z add.s32 %r2864, %r2863, 1; 2026-02-21T09:21:00.3466712Z shr.s32 %r2865, %r2864, 31; 2026-02-21T09:21:00.3466895Z shr.u32 %r2866, %r2865, 30; 2026-02-21T09:21:00.3467074Z add.s32 %r2867, %r2864, %r2866; 2026-02-21T09:21:00.3467272Z and.b32 %r2868, %r2867, -4; 2026-02-21T09:21:00.3467460Z add.s32 %r23298, %r2868, %r22257; 2026-02-21T09:21:00.3467813Z .loc 1 34 45 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:34:45 2026-02-21T09:21:00.3468190Z mov.u32 %r4, %tid.x; 2026-02-21T09:21:00.3468579Z shr.u32 %r5, %r4, 5; 2026-02-21T09:21:00.3468761Z shr.u32 %r2869, %r4, 2; 2026-02-21T09:21:00.3468930Z bfe.u32 %r6, %r4, 2, 6; 2026-02-21T09:21:00.3469103Z or.b32 %r7, %r6, 64; 2026-02-21T09:21:00.3469258Z or.b32 %r8, %r6, 128; 2026-02-21T09:21:00.3469866Z or.b32 %r9, %r2869, 192; 2026-02-21T09:21:00.3470045Z and.b32 %r10, %r4, 224; 2026-02-21T09:21:00.3470226Z bfe.u32 %r11, %r4, 5, 3; 2026-02-21T09:21:00.3470402Z or.b32 %r12, %r11, 8; 2026-02-21T09:21:00.3470574Z or.b32 %r13, %r11, 16; 
2026-02-21T09:21:00.3470752Z or.b32 %r14, %r11, 24; 2026-02-21T09:21:00.3470918Z or.b32 %r15, %r11, 32; 2026-02-21T09:21:00.3471124Z or.b32 %r16, %r11, 40; 2026-02-21T09:21:00.3471291Z or.b32 %r17, %r11, 48; 2026-02-21T09:21:00.3471467Z or.b32 %r18, %r11, 56; 2026-02-21T09:21:00.3471716Z or.b32 %r19, %r11, 64; 2026-02-21T09:21:00.3471899Z or.b32 %r20, %r11, 72; 2026-02-21T09:21:00.3472056Z or.b32 %r21, %r11, 80; 2026-02-21T09:21:00.3472283Z or.b32 %r22, %r11, 88; 2026-02-21T09:21:00.3472457Z or.b32 %r23, %r11, 96; 2026-02-21T09:21:00.3472656Z or.b32 %r24, %r11, 104; 2026-02-21T09:21:00.3472831Z or.b32 %r25, %r11, 112; 2026-02-21T09:21:00.3472993Z or.b32 %r26, %r11, 120; 2026-02-21T09:21:00.3473167Z or.b32 %r27, %r11, 128; 2026-02-21T09:21:00.3473334Z or.b32 %r28, %r11, 136; 2026-02-21T09:21:00.3473501Z or.b32 %r29, %r11, 144; 2026-02-21T09:21:00.3473670Z or.b32 %r30, %r11, 152; 2026-02-21T09:21:00.3473841Z or.b32 %r31, %r11, 160; 2026-02-21T09:21:00.3474001Z or.b32 %r32, %r11, 168; 2026-02-21T09:21:00.3474172Z or.b32 %r33, %r11, 176; 2026-02-21T09:21:00.3474337Z or.b32 %r34, %r11, 184; 2026-02-21T09:21:00.3474503Z or.b32 %r35, %r11, 192; 2026-02-21T09:21:00.3474673Z or.b32 %r36, %r11, 200; 2026-02-21T09:21:00.3474834Z or.b32 %r37, %r11, 208; 2026-02-21T09:21:00.3475009Z or.b32 %r38, %r11, 216; 2026-02-21T09:21:00.3475174Z or.b32 %r39, %r11, 224; 2026-02-21T09:21:00.3475343Z or.b32 %r40, %r11, 232; 2026-02-21T09:21:00.3475520Z or.b32 %r41, %r11, 240; 2026-02-21T09:21:00.3475716Z or.b32 %r42, %r11, 248; 2026-02-21T09:21:00.3475884Z shl.b32 %r43, %r4, 3; 2026-02-21T09:21:00.3476065Z and.b32 %r44, %r43, 248; 2026-02-21T09:21:00.3476388Z .loc 1 50 38 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:50:38 2026-02-21T09:21:00.3476909Z and.b32 %r45, %r4, 3; 2026-02-21T09:21:00.3477089Z shl.b32 %r46, %r45, 2; 2026-02-21T09:21:00.3477416Z .loc 1 22 121 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:22:121 2026-02-21T09:21:00.3477818Z setp.ge.s32 %p1, 
%r22257, %r23298; 2026-02-21T09:21:00.3478026Z and.b32 %r22235, %r43, 1912; 2026-02-21T09:21:00.3478227Z bfe.s32 %r22236, %r4, 4, 1; 2026-02-21T09:21:00.3478412Z mov.b32 %r22237, global_smem; 2026-02-21T09:21:00.3478611Z shl.b32 %r22238, %r10, 8; 2026-02-21T09:21:00.3478796Z and.b32 %r22239, %r4, 255; 2026-02-21T09:21:00.3478974Z shl.b32 %r22240, %r10, 4; 2026-02-21T09:21:00.3479156Z and.b32 %r22241, %r43, 96; 2026-02-21T09:21:00.3479329Z shl.b32 %r22242, %r45, 1; 2026-02-21T09:21:00.3479505Z or.b32 %r22243, %r4, 768; 2026-02-21T09:21:00.3479671Z or.b32 %r22244, %r4, 1792; 2026-02-21T09:21:00.3479858Z and.b32 %r22245, %r43, 48; 2026-02-21T09:21:00.3480154Z shl.b32 %r22246, %r4, 4; 2026-02-21T09:21:00.3480343Z bfe.s32 %r22247, %r4, 2, 1; 2026-02-21T09:21:00.3480522Z and.b32 %r22248, %r4, 24; 2026-02-21T09:21:00.3480713Z shl.b32 %r22249, %r4, 1; 2026-02-21T09:21:00.3480975Z bfe.s32 %r22250, %r4, 5, 1; 2026-02-21T09:21:00.3481158Z shl.b32 %r22251, %r11, 13; 2026-02-21T09:21:00.3481354Z shl.b32 %r22252, %r6, 10; 2026-02-21T09:21:00.3481549Z shl.b32 %r22253, %r9, 10; 2026-02-21T09:21:00.3481735Z cvt.u64.u32 %rd713, %r46; 2026-02-21T09:21:00.3481909Z @%p1 bra $L__BB0_11; 2026-02-21T09:21:00.3482107Z // %bb.1: // %.lr.ph 2026-02-21T09:21:00.3482506Z .loc 1 0 121 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:0:121 2026-02-21T09:21:00.3482978Z and.b32 %r2872, %r22236, 136; 2026-02-21T09:21:00.3483180Z xor.b32 %r47, %r2872, %r22235; 2026-02-21T09:21:00.3483381Z add.s32 %r48, %r22237, %r47; 2026-02-21T09:21:00.3483566Z add.s32 %r49, %r48, 2048; 2026-02-21T09:21:00.3483748Z add.s32 %r50, %r48, 4096; 2026-02-21T09:21:00.3483925Z add.s32 %r51, %r48, 6144; 2026-02-21T09:21:00.3484092Z shl.b32 %r2874, %r22239, 3; 2026-02-21T09:21:00.3484280Z add.s32 %r2875, %r22237, %r2874; 2026-02-21T09:21:00.3484477Z add.s32 %r54, %r2875, 98304; 2026-02-21T09:21:00.3484660Z add.s32 %r55, %r48, 40960; 2026-02-21T09:21:00.3484841Z add.s32 %r56, %r48, 43008; 
2026-02-21T09:21:00.3485021Z add.s32 %r57, %r48, 45056; 2026-02-21T09:21:00.3485197Z add.s32 %r58, %r48, 47104; 2026-02-21T09:21:00.3485370Z or.b32 %r59, %r22238, 65536; 2026-02-21T09:21:00.3485633Z add.s32 %r60, %r2875, 108544; 2026-02-21T09:21:00.3485860Z add.s32 %r61, %r48, 8192; 2026-02-21T09:21:00.3486041Z add.s32 %r62, %r48, 10240; 2026-02-21T09:21:00.3486216Z add.s32 %r63, %r48, 12288; 2026-02-21T09:21:00.3486419Z add.s32 %r64, %r48, 14336; 2026-02-21T09:21:00.3486747Z or.b32 %r65, %r22238, 131072; 2026-02-21T09:21:00.3486939Z add.s32 %r66, %r2875, 100352; 2026-02-21T09:21:00.3487114Z add.s32 %r67, %r48, 49152; 2026-02-21T09:21:00.3487316Z add.s32 %r68, %r48, 51200; 2026-02-21T09:21:00.3487486Z add.s32 %r69, %r48, 53248; 2026-02-21T09:21:00.3487665Z add.s32 %r70, %r48, 55296; 2026-02-21T09:21:00.3487849Z or.b32 %r71, %r22238, 196608; 2026-02-21T09:21:00.3488024Z add.s32 %r72, %r2875, 110592; 2026-02-21T09:21:00.3488221Z add.s32 %r73, %r48, 16384; 2026-02-21T09:21:00.3488392Z add.s32 %r74, %r48, 18432; 2026-02-21T09:21:00.3488572Z add.s32 %r75, %r48, 20480; 2026-02-21T09:21:00.3488743Z add.s32 %r76, %r48, 22528; 2026-02-21T09:21:00.3488928Z or.b32 %r77, %r22238, 262144; 2026-02-21T09:21:00.3489103Z add.s32 %r78, %r2875, 102400; 2026-02-21T09:21:00.3489292Z add.s32 %r79, %r48, 57344; 2026-02-21T09:21:00.3489486Z add.s32 %r80, %r48, 59392; 2026-02-21T09:21:00.3489668Z add.s32 %r81, %r48, 61440; 2026-02-21T09:21:00.3489851Z add.s32 %r82, %r48, 63488; 2026-02-21T09:21:00.3490023Z or.b32 %r83, %r22238, 327680; 2026-02-21T09:21:00.3490218Z add.s32 %r84, %r2875, 112640; 2026-02-21T09:21:00.3490402Z add.s32 %r85, %r48, 24576; 2026-02-21T09:21:00.3490589Z add.s32 %r86, %r48, 26624; 2026-02-21T09:21:00.3490756Z add.s32 %r87, %r48, 28672; 2026-02-21T09:21:00.3490936Z add.s32 %r88, %r48, 30720; 2026-02-21T09:21:00.3491116Z or.b32 %r89, %r22238, 393216; 2026-02-21T09:21:00.3491319Z add.s32 %r90, %r2875, 104448; 2026-02-21T09:21:00.3491501Z add.s32 %r91, %r48, 65536; 
2026-02-21T09:21:00.3491664Z add.s32 %r92, %r48, 67584; 2026-02-21T09:21:00.3491845Z add.s32 %r93, %r48, 69632; 2026-02-21T09:21:00.3492010Z add.s32 %r94, %r48, 71680; 2026-02-21T09:21:00.3492192Z or.b32 %r95, %r22238, 458752; 2026-02-21T09:21:00.3492363Z add.s32 %r96, %r2875, 114688; 2026-02-21T09:21:00.3492548Z add.s32 %r97, %r48, 32768; 2026-02-21T09:21:00.3492748Z add.s32 %r98, %r48, 34816; 2026-02-21T09:21:00.3492929Z add.s32 %r99, %r48, 36864; 2026-02-21T09:21:00.3493108Z add.s32 %r100, %r48, 38912; 2026-02-21T09:21:00.3493291Z or.b32 %r101, %r22238, 524288; 2026-02-21T09:21:00.3493594Z add.s32 %r102, %r2875, 106496; 2026-02-21T09:21:00.3493784Z add.s32 %r103, %r48, 73728; 2026-02-21T09:21:00.3493987Z add.s32 %r104, %r48, 75776; 2026-02-21T09:21:00.3494168Z add.s32 %r105, %r48, 77824; 2026-02-21T09:21:00.3494425Z add.s32 %r106, %r48, 79872; 2026-02-21T09:21:00.3494603Z or.b32 %r107, %r22238, 589824; 2026-02-21T09:21:00.3494796Z add.s32 %r108, %r2875, 116736; 2026-02-21T09:21:00.3494981Z or.b32 %r2879, %r22240, %r22241; 2026-02-21T09:21:00.3495194Z or.b32 %r2880, %r2879, %r22242; 2026-02-21T09:21:00.3495394Z or.b32 %r109, %r2880, %r2872; 2026-02-21T09:21:00.3495590Z xor.b32 %r110, %r109, 8; 2026-02-21T09:21:00.3495766Z shl.b32 %r2881, %r22239, 6; 2026-02-21T09:21:00.3495945Z or.b32 %r2883, %r2881, %r22245; 2026-02-21T09:21:00.3496217Z add.s32 %r2884, %r22237, 81920; 2026-02-21T09:21:00.3496404Z add.s32 %r113, %r2884, %r2883; 2026-02-21T09:21:00.3496736Z xor.b32 %r2885, %r2883, 16; 2026-02-21T09:21:00.3496920Z add.s32 %r114, %r2884, %r2885; 2026-02-21T09:21:00.3497114Z xor.b32 %r2886, %r2883, 32; 2026-02-21T09:21:00.3497298Z add.s32 %r115, %r2884, %r2886; 2026-02-21T09:21:00.3497476Z xor.b32 %r2887, %r2883, 48; 2026-02-21T09:21:00.3497667Z add.s32 %r116, %r2884, %r2887; 2026-02-21T09:21:00.3497855Z bfe.u32 %r2888, %r2884, 4, 14; 2026-02-21T09:21:00.3498048Z cvt.u64.u32 %rd47, %r2888; 2026-02-21T09:21:00.3498243Z or.b64 %rd1, %rd47, -9223371899348713472; 
2026-02-21T09:21:00.3498464Z add.s32 %r2889, %r22237, 81952; 2026-02-21T09:21:00.3498650Z bfe.u32 %r2890, %r2889, 4, 14; 2026-02-21T09:21:00.3498914Z cvt.u64.u32 %rd48, %r2890; 2026-02-21T09:21:00.3499105Z or.b64 %rd2, %rd48, -9223371899348713472; 2026-02-21T09:21:00.3499326Z and.b32 %r2892, %r22246, 3968; 2026-02-21T09:21:00.3499513Z and.b32 %r2894, %r22247, 4112; 2026-02-21T09:21:00.3499688Z or.b32 %r2895, %r2892, %r2894; 2026-02-21T09:21:00.3499885Z mad.lo.s32 %r2896, %r45, 8224, %r2895; 2026-02-21T09:21:00.3500091Z add.s32 %r117, %r22237, %r2896; 2026-02-21T09:21:00.3500283Z xor.b32 %r2897, %r2896, 16; 2026-02-21T09:21:00.3500457Z add.s32 %r118, %r22237, %r2897; 2026-02-21T09:21:00.3500645Z xor.b32 %r2898, %r2896, 32; 2026-02-21T09:21:00.3500816Z add.s32 %r119, %r22237, %r2898; 2026-02-21T09:21:00.3500999Z xor.b32 %r2899, %r2896, 48; 2026-02-21T09:21:00.3501175Z add.s32 %r120, %r22237, %r2899; 2026-02-21T09:21:00.3501354Z xor.b32 %r2900, %r2896, 64; 2026-02-21T09:21:00.3501532Z add.s32 %r121, %r22237, %r2900; 2026-02-21T09:21:00.3501711Z xor.b32 %r2901, %r2896, 80; 2026-02-21T09:21:00.3501890Z add.s32 %r122, %r22237, %r2901; 2026-02-21T09:21:00.3502082Z xor.b32 %r2902, %r2896, 96; 2026-02-21T09:21:00.3502260Z add.s32 %r123, %r22237, %r2902; 2026-02-21T09:21:00.3502448Z xor.b32 %r2903, %r2896, 112; 2026-02-21T09:21:00.3502634Z add.s32 %r124, %r22237, %r2903; 2026-02-21T09:21:00.3502813Z shl.b32 %r2905, %r22248, 10; 2026-02-21T09:21:00.3502997Z and.b32 %r2906, %r22246, 112; 2026-02-21T09:21:00.3503180Z shl.b32 %r2907, %r22248, 2; 2026-02-21T09:21:00.3503351Z and.b32 %r2909, %r22249, 384; 2026-02-21T09:21:00.3503532Z and.b32 %r2911, %r22250, 4112; 2026-02-21T09:21:00.3503710Z or.b32 %r2912, %r2905, %r2906; 2026-02-21T09:21:00.3503898Z or.b32 %r2913, %r2907, %r2909; 2026-02-21T09:21:00.3504084Z xor.b32 %r2914, %r2912, %r2913; 2026-02-21T09:21:00.3504272Z xor.b32 %r2915, %r2914, %r2911; 2026-02-21T09:21:00.3504455Z add.s32 %r6269, %r22237, %r2915; 
2026-02-21T09:21:00.3504668Z add.s32 %r6274, %r6269, 512; 2026-02-21T09:21:00.3504848Z add.s32 %r6279, %r6269, 1024; 2026-02-21T09:21:00.3505025Z add.s32 %r6284, %r6269, 1536; 2026-02-21T09:21:00.3505211Z add.s32 %r6289, %r6269, 2048; 2026-02-21T09:21:00.3505388Z add.s32 %r6294, %r6269, 2560; 2026-02-21T09:21:00.3505571Z add.s32 %r6299, %r6269, 3072; 2026-02-21T09:21:00.3505747Z add.s32 %r6304, %r6269, 3584; 2026-02-21T09:21:00.3506085Z .loc 1 43 78 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:43:78 2026-02-21T09:21:00.3506656Z or.b32 %r2917, %r22251, %r44; 2026-02-21T09:21:00.3506854Z or.b32 %r133, %r2917, 720896; 2026-02-21T09:21:00.3507192Z .loc 1 22 121 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:22:121 2026-02-21T09:21:00.3507680Z mad.wide.u32 %rd3, %r45, 8, %rd44; 2026-02-21T09:21:00.3508031Z .loc 1 43 78 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:43:78 2026-02-21T09:21:00.3508496Z or.b32 %r2920, %r22253, %r46; 2026-02-21T09:21:00.3508687Z or.b32 %r137, %r2920, 176; 2026-02-21T09:21:00.3509016Z .loc 1 22 121 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:22:121 2026-02-21T09:21:00.3509388Z add.s32 %r22256, %r22257, 1; 2026-02-21T09:21:00.3509575Z add.s32 %r22255, %r22257, 2; 2026-02-21T09:21:00.3509832Z add.s32 %r22254, %r22257, 3; 2026-02-21T09:21:00.3510070Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:21:00.3510357Z // Child Loop BB0_3 Depth 2 2026-02-21T09:21:00.3510635Z // Child Loop BB0_5 Depth 2 2026-02-21T09:21:00.3510894Z // Child Loop BB0_7 Depth 2 2026-02-21T09:21:00.3511163Z // Child Loop BB0_9 Depth 2 2026-02-21T09:21:00.3511556Z .loc 1 29 33 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:29:33 2026-02-21T09:21:00.3511907Z shr.u32 %r3024, %r22254, 6; 2026-02-21T09:21:00.3512170Z and.b32 %r237, %r3024, 33554424; 2026-02-21T09:21:00.3512360Z shr.u32 %r3025, %r22255, 6; 2026-02-21T09:21:00.3512540Z and.b32 %r238, %r3025, 33554424; 2026-02-21T09:21:00.3512720Z shr.u32 
%r3026, %r22256, 6; 2026-02-21T09:21:00.3512901Z and.b32 %r239, %r3026, 33554424; 2026-02-21T09:21:00.3513083Z shr.u32 %r3027, %r22257, 6; 2026-02-21T09:21:00.3513276Z and.b32 %r3028, %r3027, 16777208; 2026-02-21T09:21:00.3513474Z and.b32 %r3029, %r3027, 33554424; 2026-02-21T09:21:00.3513800Z .loc 1 30 39 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:30:39 2026-02-21T09:21:00.3514151Z sub.s32 %r3030, 32, %r3029; 2026-02-21T09:21:00.3514463Z .loc 1 30 52 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:30:52 2026-02-21T09:21:00.3514819Z min.s32 %r3031, %r3030, 8; 2026-02-21T09:21:00.3515125Z .loc 1 31 45 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:31:45 2026-02-21T09:21:00.3515478Z and.b32 %r3032, %r22257, 511; 2026-02-21T09:21:00.3515797Z .loc 1 32 51 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:32:51 2026-02-21T09:21:00.3516147Z div.s32 %r3033, %r3032, %r3031; 2026-02-21T09:21:00.3516604Z .loc 1 31 64 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:31:64 2026-02-21T09:21:00.3516968Z mul.lo.s32 %r3034, %r3033, %r3031; 2026-02-21T09:21:00.3517172Z sub.s32 %r3035, %r3032, %r3034; 2026-02-21T09:21:00.3517500Z .loc 1 31 30 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:31:30 2026-02-21T09:21:00.3517856Z add.s32 %r3036, %r3035, %r3029; 2026-02-21T09:21:00.3518183Z .loc 1 33 27 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:33:27 2026-02-21T09:21:00.3518530Z shl.b32 %r3037, %r3036, 8; 2026-02-21T09:21:00.3518847Z .loc 1 34 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:34:32 2026-02-21T09:21:00.3519193Z or.b32 %r240, %r3037, %r44; 2026-02-21T09:21:00.3519512Z .loc 1 35 27 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:35:27 2026-02-21T09:21:00.3519859Z shl.b32 %r241, %r3033, 8; 2026-02-21T09:21:00.3520187Z .loc 1 36 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:36:32 2026-02-21T09:21:00.3520547Z or.b32 %r3038, %r241, %r6; 
2026-02-21T09:21:00.3520719Z or.b32 %r3039, %r241, %r7; 2026-02-21T09:21:00.3520987Z or.b32 %r3040, %r241, %r8; 2026-02-21T09:21:00.3521157Z or.b32 %r3041, %r241, %r9; 2026-02-21T09:21:00.3521483Z .loc 1 51 53 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:53 2026-02-21T09:21:00.3521902Z shl.b32 %r3042, %r3038, 10; 2026-02-21T09:21:00.3522081Z shl.b32 %r3043, %r3039, 10; 2026-02-21T09:21:00.3522263Z shl.b32 %r3044, %r3040, 10; 2026-02-21T09:21:00.3522443Z shl.b32 %r3045, %r3041, 10; 2026-02-21T09:21:00.3522763Z .loc 1 51 60 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:60 2026-02-21T09:21:00.3523116Z or.b32 %r3046, %r3042, %r46; 2026-02-21T09:21:00.3523302Z or.b32 %r3047, %r3043, %r46; 2026-02-21T09:21:00.3523479Z or.b32 %r3048, %r3044, %r46; 2026-02-21T09:21:00.3523734Z or.b32 %r3049, %r3045, %r46; 2026-02-21T09:21:00.3524048Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.3524437Z mad.wide.s32 %rd49, %r3046, 2, %rd44; 2026-02-21T09:21:00.3524661Z mad.wide.s32 %rd50, %r3047, 2, %rd44; 2026-02-21T09:21:00.3524865Z mad.wide.s32 %rd51, %r3048, 2, %rd44; 2026-02-21T09:21:00.3525070Z mad.wide.s32 %rd52, %r3049, 2, %rd44; 2026-02-21T09:21:00.3525403Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3525751Z bar.sync 0; 2026-02-21T09:21:00.3525897Z mov.b32 %r2922, 8; 2026-02-21T09:21:00.3526058Z // begin inline asm 2026-02-21T09:21:00.3526358Z cp.async.ca.shared.global [ %r48 + 0 ], [ %rd49 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3526799Z // end inline asm 2026-02-21T09:21:00.3526954Z // begin inline asm 2026-02-21T09:21:00.3527191Z cp.async.ca.shared.global [ %r49 + 0 ], [ %rd50 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3547009Z // end inline asm 2026-02-21T09:21:00.3547239Z // begin inline asm 2026-02-21T09:21:00.3547541Z cp.async.ca.shared.global [ %r50 + 0 ], [ %rd51 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3547864Z // end inline asm 
2026-02-21T09:21:00.3548064Z // begin inline asm 2026-02-21T09:21:00.3548411Z cp.async.ca.shared.global [ %r51 + 0 ], [ %rd52 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3548716Z // end inline asm 2026-02-21T09:21:00.3548893Z cp.async.commit_group; 2026-02-21T09:21:00.3549255Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.3549654Z add.s32 %r3050, %r240, %r22238; 2026-02-21T09:21:00.3550013Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.3550391Z cvt.s64.s32 %rd100, %r3050; 2026-02-21T09:21:00.3550584Z add.s64 %rd53, %rd45, %rd100; 2026-02-21T09:21:00.3550934Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3551296Z // begin inline asm 2026-02-21T09:21:00.3551568Z cp.async.ca.shared.global [ %r54 + 0 ], [ %rd53 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3551869Z // end inline asm 2026-02-21T09:21:00.3552061Z cp.async.commit_group; 2026-02-21T09:21:00.3552410Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.3552776Z cvt.s64.s32 %rd101, %r3042; 2026-02-21T09:21:00.3552983Z or.b64 %rd102, %rd101, %rd713; 2026-02-21T09:21:00.3553180Z shl.b64 %rd103, %rd102, 1; 2026-02-21T09:21:00.3553386Z add.s64 %rd104, %rd44, %rd103; 2026-02-21T09:21:00.3553574Z add.s64 %rd54, %rd104, 32; 2026-02-21T09:21:00.3553762Z cvt.s64.s32 %rd105, %r3043; 2026-02-21T09:21:00.3553946Z or.b64 %rd106, %rd105, %rd713; 2026-02-21T09:21:00.3554135Z shl.b64 %rd107, %rd106, 1; 2026-02-21T09:21:00.3554312Z add.s64 %rd108, %rd44, %rd107; 2026-02-21T09:21:00.3554521Z add.s64 %rd55, %rd108, 32; 2026-02-21T09:21:00.3554711Z cvt.s64.s32 %rd109, %r3044; 2026-02-21T09:21:00.3554900Z or.b64 %rd110, %rd109, %rd713; 2026-02-21T09:21:00.3555097Z shl.b64 %rd111, %rd110, 1; 2026-02-21T09:21:00.3555461Z add.s64 %rd112, %rd44, %rd111; 2026-02-21T09:21:00.3555654Z add.s64 %rd56, %rd112, 32; 2026-02-21T09:21:00.3555835Z 
cvt.s64.s32 %rd113, %r3045; 2026-02-21T09:21:00.3556022Z or.b64 %rd114, %rd113, %rd713; 2026-02-21T09:21:00.3556206Z shl.b64 %rd115, %rd114, 1; 2026-02-21T09:21:00.3556634Z add.s64 %rd116, %rd44, %rd115; 2026-02-21T09:21:00.3556839Z add.s64 %rd57, %rd116, 32; 2026-02-21T09:21:00.3557180Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3557553Z // begin inline asm 2026-02-21T09:21:00.3557802Z cp.async.ca.shared.global [ %r55 + 0 ], [ %rd54 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3558093Z // end inline asm 2026-02-21T09:21:00.3558249Z // begin inline asm 2026-02-21T09:21:00.3558589Z cp.async.ca.shared.global [ %r56 + 0 ], [ %rd55 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3558892Z // end inline asm 2026-02-21T09:21:00.3559055Z // begin inline asm 2026-02-21T09:21:00.3559289Z cp.async.ca.shared.global [ %r57 + 0 ], [ %rd56 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3559570Z // end inline asm 2026-02-21T09:21:00.3559735Z // begin inline asm 2026-02-21T09:21:00.3559961Z cp.async.ca.shared.global [ %r58 + 0 ], [ %rd57 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3560247Z // end inline asm 2026-02-21T09:21:00.3560415Z cp.async.commit_group; 2026-02-21T09:21:00.3560750Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.3561122Z add.s32 %r3051, %r240, %r59; 2026-02-21T09:21:00.3561524Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.3561894Z cvt.s64.s32 %rd117, %r3051; 2026-02-21T09:21:00.3562083Z add.s64 %rd58, %rd45, %rd117; 2026-02-21T09:21:00.3562430Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3562792Z // begin inline asm 2026-02-21T09:21:00.3563036Z cp.async.ca.shared.global [ %r60 + 0 ], [ %rd58 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3563311Z // end inline asm 2026-02-21T09:21:00.3563480Z cp.async.commit_group; 2026-02-21T09:21:00.3563804Z .loc 1 51 32 // 
cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.3564162Z add.s64 %rd59, %rd104, 64; 2026-02-21T09:21:00.3564350Z add.s64 %rd60, %rd108, 64; 2026-02-21T09:21:00.3564527Z add.s64 %rd61, %rd112, 64; 2026-02-21T09:21:00.3564712Z add.s64 %rd62, %rd116, 64; 2026-02-21T09:21:00.3565039Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3565396Z bar.sync 0; 2026-02-21T09:21:00.3565545Z // begin inline asm 2026-02-21T09:21:00.3565786Z cp.async.ca.shared.global [ %r61 + 0 ], [ %rd59 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3566071Z // end inline asm 2026-02-21T09:21:00.3566225Z // begin inline asm 2026-02-21T09:21:00.3566584Z cp.async.ca.shared.global [ %r62 + 0 ], [ %rd60 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3566868Z // end inline asm 2026-02-21T09:21:00.3567024Z // begin inline asm 2026-02-21T09:21:00.3567246Z cp.async.ca.shared.global [ %r63 + 0 ], [ %rd61 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3567523Z // end inline asm 2026-02-21T09:21:00.3567673Z // begin inline asm 2026-02-21T09:21:00.3567904Z cp.async.ca.shared.global [ %r64 + 0 ], [ %rd62 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3568182Z // end inline asm 2026-02-21T09:21:00.3568344Z cp.async.commit_group; 2026-02-21T09:21:00.3568670Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.3569030Z add.s32 %r3052, %r240, %r65; 2026-02-21T09:21:00.3569353Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.3569711Z cvt.s64.s32 %rd118, %r3052; 2026-02-21T09:21:00.3569904Z add.s64 %rd63, %rd45, %rd118; 2026-02-21T09:21:00.3570223Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3570683Z // begin inline asm 2026-02-21T09:21:00.3570923Z cp.async.ca.shared.global [ %r66 + 0 ], [ %rd63 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3571196Z // end inline asm 2026-02-21T09:21:00.3571442Z cp.async.commit_group; 
2026-02-21T09:21:00.3571753Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.3572113Z add.s64 %rd64, %rd104, 96; 2026-02-21T09:21:00.3572294Z add.s64 %rd65, %rd108, 96; 2026-02-21T09:21:00.3572484Z add.s64 %rd66, %rd112, 96; 2026-02-21T09:21:00.3572657Z add.s64 %rd67, %rd116, 96; 2026-02-21T09:21:00.3572974Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3573405Z // begin inline asm 2026-02-21T09:21:00.3573636Z cp.async.ca.shared.global [ %r67 + 0 ], [ %rd64 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3573919Z // end inline asm 2026-02-21T09:21:00.3574075Z // begin inline asm 2026-02-21T09:21:00.3574307Z cp.async.ca.shared.global [ %r68 + 0 ], [ %rd65 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3574578Z // end inline asm 2026-02-21T09:21:00.3574740Z // begin inline asm 2026-02-21T09:21:00.3574961Z cp.async.ca.shared.global [ %r69 + 0 ], [ %rd66 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3575243Z // end inline asm 2026-02-21T09:21:00.3575403Z // begin inline asm 2026-02-21T09:21:00.3575623Z cp.async.ca.shared.global [ %r70 + 0 ], [ %rd67 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3575900Z // end inline asm 2026-02-21T09:21:00.3576127Z cp.async.commit_group; 2026-02-21T09:21:00.3576571Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.3576948Z add.s32 %r3053, %r240, %r71; 2026-02-21T09:21:00.3577277Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.3577634Z cvt.s64.s32 %rd119, %r3053; 2026-02-21T09:21:00.3577824Z add.s64 %rd68, %rd45, %rd119; 2026-02-21T09:21:00.3578149Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3578499Z // begin inline asm 2026-02-21T09:21:00.3578736Z cp.async.ca.shared.global [ %r72 + 0 ], [ %rd68 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3579008Z // end inline asm 2026-02-21T09:21:00.3579173Z 
cp.async.commit_group; 2026-02-21T09:21:00.3579478Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.3579834Z add.s64 %rd69, %rd104, 128; 2026-02-21T09:21:00.3580024Z add.s64 %rd70, %rd108, 128; 2026-02-21T09:21:00.3580201Z add.s64 %rd71, %rd112, 128; 2026-02-21T09:21:00.3580384Z add.s64 %rd72, %rd116, 128; 2026-02-21T09:21:00.3580700Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3581054Z bar.sync 0; 2026-02-21T09:21:00.3581205Z // begin inline asm 2026-02-21T09:21:00.3581457Z cp.async.ca.shared.global [ %r73 + 0 ], [ %rd69 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3581732Z // end inline asm 2026-02-21T09:21:00.3581892Z // begin inline asm 2026-02-21T09:21:00.3582124Z cp.async.ca.shared.global [ %r74 + 0 ], [ %rd70 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3582403Z // end inline asm 2026-02-21T09:21:00.3582560Z // begin inline asm 2026-02-21T09:21:00.3582781Z cp.async.ca.shared.global [ %r75 + 0 ], [ %rd71 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3583057Z // end inline asm 2026-02-21T09:21:00.3583208Z // begin inline asm 2026-02-21T09:21:00.3583441Z cp.async.ca.shared.global [ %r76 + 0 ], [ %rd72 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3583717Z // end inline asm 2026-02-21T09:21:00.3583885Z cp.async.commit_group; 2026-02-21T09:21:00.3584206Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.3584556Z add.s32 %r3054, %r240, %r77; 2026-02-21T09:21:00.3584879Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.3585315Z cvt.s64.s32 %rd120, %r3054; 2026-02-21T09:21:00.3585503Z add.s64 %rd73, %rd45, %rd120; 2026-02-21T09:21:00.3585824Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3586247Z // begin inline asm 2026-02-21T09:21:00.3586616Z cp.async.ca.shared.global [ %r78 + 0 ], [ %rd73 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3586899Z 
// end inline asm 2026-02-21T09:21:00.3587072Z cp.async.commit_group; 2026-02-21T09:21:00.3587383Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.3587741Z add.s64 %rd74, %rd104, 160; 2026-02-21T09:21:00.3587918Z add.s64 %rd75, %rd108, 160; 2026-02-21T09:21:00.3588185Z add.s64 %rd76, %rd112, 160; 2026-02-21T09:21:00.3588458Z add.s64 %rd77, %rd116, 160; 2026-02-21T09:21:00.3588773Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3589129Z // begin inline asm 2026-02-21T09:21:00.3589354Z cp.async.ca.shared.global [ %r79 + 0 ], [ %rd74 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3589631Z // end inline asm 2026-02-21T09:21:00.3589783Z // begin inline asm 2026-02-21T09:21:00.3590014Z cp.async.ca.shared.global [ %r80 + 0 ], [ %rd75 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3590290Z // end inline asm 2026-02-21T09:21:00.3590445Z // begin inline asm 2026-02-21T09:21:00.3590679Z cp.async.ca.shared.global [ %r81 + 0 ], [ %rd76 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3591020Z // end inline asm 2026-02-21T09:21:00.3591201Z // begin inline asm 2026-02-21T09:21:00.3591427Z cp.async.ca.shared.global [ %r82 + 0 ], [ %rd77 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3591705Z // end inline asm 2026-02-21T09:21:00.3591865Z cp.async.commit_group; 2026-02-21T09:21:00.3592180Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.3592553Z add.s32 %r3055, %r240, %r83; 2026-02-21T09:21:00.3592867Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.3593218Z cvt.s64.s32 %rd121, %r3055; 2026-02-21T09:21:00.3593396Z add.s64 %rd78, %rd45, %rd121; 2026-02-21T09:21:00.3593721Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3594069Z // begin inline asm 2026-02-21T09:21:00.3594309Z cp.async.ca.shared.global [ %r84 + 0 ], [ %rd78 + 0 ], 0x8, %r2922; 
2026-02-21T09:21:00.3594584Z // end inline asm 2026-02-21T09:21:00.3594744Z cp.async.commit_group; 2026-02-21T09:21:00.3595051Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.3595397Z add.s64 %rd79, %rd104, 192; 2026-02-21T09:21:00.3595585Z add.s64 %rd80, %rd108, 192; 2026-02-21T09:21:00.3595760Z add.s64 %rd81, %rd112, 192; 2026-02-21T09:21:00.3595953Z add.s64 %rd82, %rd116, 192; 2026-02-21T09:21:00.3596264Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3596753Z bar.sync 0; 2026-02-21T09:21:00.3596901Z // begin inline asm 2026-02-21T09:21:00.3597141Z cp.async.ca.shared.global [ %r85 + 0 ], [ %rd79 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3597416Z // end inline asm 2026-02-21T09:21:00.3597565Z // begin inline asm 2026-02-21T09:21:00.3597794Z cp.async.ca.shared.global [ %r86 + 0 ], [ %rd80 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3598061Z // end inline asm 2026-02-21T09:21:00.3598217Z // begin inline asm 2026-02-21T09:21:00.3598438Z cp.async.ca.shared.global [ %r87 + 0 ], [ %rd81 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3598727Z // end inline asm 2026-02-21T09:21:00.3598877Z // begin inline asm 2026-02-21T09:21:00.3599107Z cp.async.ca.shared.global [ %r88 + 0 ], [ %rd82 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3599383Z // end inline asm 2026-02-21T09:21:00.3599554Z cp.async.commit_group; 2026-02-21T09:21:00.3599978Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.3600355Z add.s32 %r3056, %r240, %r89; 2026-02-21T09:21:00.3600682Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.3601114Z cvt.s64.s32 %rd122, %r3056; 2026-02-21T09:21:00.3601304Z add.s64 %rd83, %rd45, %rd122; 2026-02-21T09:21:00.3601653Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3602009Z // begin inline asm 2026-02-21T09:21:00.3602268Z 
cp.async.ca.shared.global [ %r90 + 0 ], [ %rd83 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3602553Z // end inline asm 2026-02-21T09:21:00.3602812Z cp.async.commit_group; 2026-02-21T09:21:00.3603133Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.3603483Z add.s64 %rd84, %rd104, 224; 2026-02-21T09:21:00.3603678Z add.s64 %rd85, %rd108, 224; 2026-02-21T09:21:00.3603858Z add.s64 %rd86, %rd112, 224; 2026-02-21T09:21:00.3604047Z add.s64 %rd87, %rd116, 224; 2026-02-21T09:21:00.3604361Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3604716Z // begin inline asm 2026-02-21T09:21:00.3604953Z cp.async.ca.shared.global [ %r91 + 0 ], [ %rd84 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3605226Z // end inline asm 2026-02-21T09:21:00.3605400Z // begin inline asm 2026-02-21T09:21:00.3605697Z cp.async.ca.shared.global [ %r92 + 0 ], [ %rd85 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3605981Z // end inline asm 2026-02-21T09:21:00.3606130Z // begin inline asm 2026-02-21T09:21:00.3606364Z cp.async.ca.shared.global [ %r93 + 0 ], [ %rd86 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3606778Z // end inline asm 2026-02-21T09:21:00.3606935Z // begin inline asm 2026-02-21T09:21:00.3607171Z cp.async.ca.shared.global [ %r94 + 0 ], [ %rd87 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3607442Z // end inline asm 2026-02-21T09:21:00.3607605Z cp.async.commit_group; 2026-02-21T09:21:00.3607916Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.3608276Z add.s32 %r3057, %r240, %r95; 2026-02-21T09:21:00.3608605Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.3608973Z cvt.s64.s32 %rd123, %r3057; 2026-02-21T09:21:00.3609165Z add.s64 %rd88, %rd45, %rd123; 2026-02-21T09:21:00.3609487Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3609841Z // begin inline asm 
2026-02-21T09:21:00.3610078Z cp.async.ca.shared.global [ %r96 + 0 ], [ %rd88 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3610358Z // end inline asm 2026-02-21T09:21:00.3610515Z cp.async.commit_group; 2026-02-21T09:21:00.3610827Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.3611191Z add.s64 %rd89, %rd104, 256; 2026-02-21T09:21:00.3611381Z add.s64 %rd90, %rd108, 256; 2026-02-21T09:21:00.3611574Z add.s64 %rd91, %rd112, 256; 2026-02-21T09:21:00.3611752Z add.s64 %rd92, %rd116, 256; 2026-02-21T09:21:00.3612077Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3612433Z bar.sync 0; 2026-02-21T09:21:00.3612594Z // begin inline asm 2026-02-21T09:21:00.3612835Z cp.async.ca.shared.global [ %r97 + 0 ], [ %rd89 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3613114Z // end inline asm 2026-02-21T09:21:00.3613273Z // begin inline asm 2026-02-21T09:21:00.3613501Z cp.async.ca.shared.global [ %r98 + 0 ], [ %rd90 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3613783Z // end inline asm 2026-02-21T09:21:00.3613936Z // begin inline asm 2026-02-21T09:21:00.3614165Z cp.async.ca.shared.global [ %r99 + 0 ], [ %rd91 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3614578Z // end inline asm 2026-02-21T09:21:00.3614734Z // begin inline asm 2026-02-21T09:21:00.3614969Z cp.async.ca.shared.global [ %r100 + 0 ], [ %rd92 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3615260Z // end inline asm 2026-02-21T09:21:00.3615424Z cp.async.commit_group; 2026-02-21T09:21:00.3615807Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.3616165Z add.s32 %r3058, %r240, %r101; 2026-02-21T09:21:00.3616620Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.3617004Z cvt.s64.s32 %rd124, %r3058; 2026-02-21T09:21:00.3617197Z add.s64 %rd93, %rd45, %rd124; 2026-02-21T09:21:00.3617599Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 
2026-02-21T09:21:00.3617950Z // begin inline asm 2026-02-21T09:21:00.3618190Z cp.async.ca.shared.global [ %r102 + 0 ], [ %rd93 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3618476Z // end inline asm 2026-02-21T09:21:00.3618635Z cp.async.commit_group; 2026-02-21T09:21:00.3618956Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.3619314Z add.s64 %rd94, %rd104, 288; 2026-02-21T09:21:00.3619506Z add.s64 %rd95, %rd108, 288; 2026-02-21T09:21:00.3619680Z add.s64 %rd96, %rd112, 288; 2026-02-21T09:21:00.3619863Z add.s64 %rd97, %rd116, 288; 2026-02-21T09:21:00.3620173Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3620597Z // begin inline asm 2026-02-21T09:21:00.3620846Z cp.async.ca.shared.global [ %r103 + 0 ], [ %rd94 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3621126Z // end inline asm 2026-02-21T09:21:00.3621296Z // begin inline asm 2026-02-21T09:21:00.3621527Z cp.async.ca.shared.global [ %r104 + 0 ], [ %rd95 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3621807Z // end inline asm 2026-02-21T09:21:00.3621957Z // begin inline asm 2026-02-21T09:21:00.3622194Z cp.async.ca.shared.global [ %r105 + 0 ], [ %rd96 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3622475Z // end inline asm 2026-02-21T09:21:00.3622624Z // begin inline asm 2026-02-21T09:21:00.3622854Z cp.async.ca.shared.global [ %r106 + 0 ], [ %rd97 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3623125Z // end inline asm 2026-02-21T09:21:00.3623294Z cp.async.commit_group; 2026-02-21T09:21:00.3623602Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.3623961Z add.s32 %r3059, %r240, %r107; 2026-02-21T09:21:00.3624279Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.3624654Z cvt.s64.s32 %rd125, %r3059; 2026-02-21T09:21:00.3624843Z add.s64 %rd98, %rd45, %rd125; 2026-02-21T09:21:00.3625157Z .loc 1 57 87 // 
cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3625510Z // begin inline asm 2026-02-21T09:21:00.3625741Z cp.async.ca.shared.global [ %r108 + 0 ], [ %rd98 + 0 ], 0x8, %r2922; 2026-02-21T09:21:00.3626021Z // end inline asm 2026-02-21T09:21:00.3626180Z cp.async.commit_group; 2026-02-21T09:21:00.3626630Z .loc 1 43 78 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:43:78 2026-02-21T09:21:00.3627014Z add.s32 %r3060, %r3032, %r3028; 2026-02-21T09:21:00.3627209Z sub.s32 %r3061, %r3060, %r3034; 2026-02-21T09:21:00.3627408Z shl.b32 %r3062, %r3061, 8; 2026-02-21T09:21:00.3627604Z add.s32 %r22259, %r133, %r3062; 2026-02-21T09:21:00.3627807Z or.b32 %r3063, %r8, %r241; 2026-02-21T09:21:00.3627992Z shl.b32 %r3064, %r3063, 10; 2026-02-21T09:21:00.3628183Z mul.wide.s32 %rd8, %r3064, 2; 2026-02-21T09:21:00.3628436Z or.b32 %r3065, %r7, %r241; 2026-02-21T09:21:00.3628625Z shl.b32 %r3066, %r3065, 10; 2026-02-21T09:21:00.3628804Z mul.wide.s32 %rd9, %r3066, 2; 2026-02-21T09:21:00.3628990Z shl.b32 %r3067, %r3033, 18; 2026-02-21T09:21:00.3629261Z or.b32 %r3068, %r22252, %r3067; 2026-02-21T09:21:00.3629455Z mul.wide.s32 %rd10, %r3068, 2; 2026-02-21T09:21:00.3629645Z or.b32 %r22258, %r137, %r3067; 2026-02-21T09:21:00.3629829Z mov.b32 %r22262, 0f00000000; 2026-02-21T09:21:00.3630010Z mov.b32 %r22261, 4; 2026-02-21T09:21:00.3630252Z mov.b32 %r22260, -1; 2026-02-21T09:21:00.3630421Z mov.b64 %rd715, -16; 2026-02-21T09:21:00.3630582Z mov.b64 %rd714, %rd3; 2026-02-21T09:21:00.3630770Z mov.b32 %r22263, %r22262; 2026-02-21T09:21:00.3630945Z mov.b32 %r22264, %r22262; 2026-02-21T09:21:00.3631122Z mov.b32 %r22265, %r22262; 2026-02-21T09:21:00.3631298Z mov.b32 %r22266, %r22262; 2026-02-21T09:21:00.3631468Z mov.b32 %r22267, %r22262; 2026-02-21T09:21:00.3631639Z mov.b32 %r22268, %r22262; 2026-02-21T09:21:00.3631875Z mov.b32 %r22269, %r22262; 2026-02-21T09:21:00.3632050Z mov.b32 %r22270, %r22262; 2026-02-21T09:21:00.3632221Z mov.b32 %r22271, %r22262; 
2026-02-21T09:21:00.3632391Z mov.b32 %r22272, %r22262; 2026-02-21T09:21:00.3632559Z mov.b32 %r22273, %r22262; 2026-02-21T09:21:00.3632730Z mov.b32 %r22274, %r22262; 2026-02-21T09:21:00.3632897Z mov.b32 %r22275, %r22262; 2026-02-21T09:21:00.3633068Z mov.b32 %r22276, %r22262; 2026-02-21T09:21:00.3633239Z mov.b32 %r22277, %r22262; 2026-02-21T09:21:00.3633408Z mov.b32 %r22278, %r22262; 2026-02-21T09:21:00.3633597Z mov.b32 %r22279, %r22262; 2026-02-21T09:21:00.3633771Z mov.b32 %r22280, %r22262; 2026-02-21T09:21:00.3633944Z mov.b32 %r22281, %r22262; 2026-02-21T09:21:00.3634115Z mov.b32 %r22282, %r22262; 2026-02-21T09:21:00.3634363Z mov.b32 %r22283, %r22262; 2026-02-21T09:21:00.3634534Z mov.b32 %r22284, %r22262; 2026-02-21T09:21:00.3634707Z mov.b32 %r22285, %r22262; 2026-02-21T09:21:00.3634876Z mov.b32 %r22286, %r22262; 2026-02-21T09:21:00.3635062Z mov.b32 %r22287, %r22262; 2026-02-21T09:21:00.3635248Z mov.b32 %r22288, %r22262; 2026-02-21T09:21:00.3635415Z mov.b32 %r22289, %r22262; 2026-02-21T09:21:00.3635593Z mov.b32 %r22290, %r22262; 2026-02-21T09:21:00.3635771Z mov.b32 %r22291, %r22262; 2026-02-21T09:21:00.3635949Z mov.b32 %r22292, %r22262; 2026-02-21T09:21:00.3636117Z mov.b32 %r22293, %r22262; 2026-02-21T09:21:00.3636289Z mov.b32 %r22294, %r22262; 2026-02-21T09:21:00.3636579Z mov.b32 %r22295, %r22262; 2026-02-21T09:21:00.3636763Z mov.b32 %r22296, %r22262; 2026-02-21T09:21:00.3636932Z mov.b32 %r22297, %r22262; 2026-02-21T09:21:00.3637120Z mov.b32 %r22298, %r22262; 2026-02-21T09:21:00.3637299Z mov.b32 %r22299, %r22262; 2026-02-21T09:21:00.3637467Z mov.b32 %r22300, %r22262; 2026-02-21T09:21:00.3637645Z mov.b32 %r22301, %r22262; 2026-02-21T09:21:00.3637815Z mov.b32 %r22302, %r22262; 2026-02-21T09:21:00.3637988Z mov.b32 %r22303, %r22262; 2026-02-21T09:21:00.3638156Z mov.b32 %r22304, %r22262; 2026-02-21T09:21:00.3638333Z mov.b32 %r22305, %r22262; 2026-02-21T09:21:00.3638499Z mov.b32 %r22306, %r22262; 2026-02-21T09:21:00.3638675Z mov.b32 %r22307, %r22262; 
2026-02-21T09:21:00.3638849Z mov.b32 %r22308, %r22262; 2026-02-21T09:21:00.3639019Z mov.b32 %r22309, %r22262; 2026-02-21T09:21:00.3639188Z mov.b32 %r22310, %r22262; 2026-02-21T09:21:00.3639354Z mov.b32 %r22311, %r22262; 2026-02-21T09:21:00.3639528Z mov.b32 %r22312, %r22262; 2026-02-21T09:21:00.3639694Z mov.b32 %r22313, %r22262; 2026-02-21T09:21:00.3639869Z mov.b32 %r22314, %r22262; 2026-02-21T09:21:00.3640045Z mov.b32 %r22315, %r22262; 2026-02-21T09:21:00.3640223Z mov.b32 %r22316, %r22262; 2026-02-21T09:21:00.3640388Z mov.b32 %r22317, %r22262; 2026-02-21T09:21:00.3640561Z mov.b32 %r22318, %r22262; 2026-02-21T09:21:00.3640732Z mov.b32 %r22319, %r22262; 2026-02-21T09:21:00.3640896Z mov.b32 %r22320, %r22262; 2026-02-21T09:21:00.3641067Z mov.b32 %r22321, %r22262; 2026-02-21T09:21:00.3641231Z mov.b32 %r22322, %r22262; 2026-02-21T09:21:00.3641403Z mov.b32 %r22323, %r22262; 2026-02-21T09:21:00.3641572Z mov.b32 %r22324, %r22262; 2026-02-21T09:21:00.3641745Z mov.b32 %r22325, %r22262; 2026-02-21T09:21:00.3641915Z mov.b32 %r22326, %r22262; 2026-02-21T09:21:00.3642183Z mov.b32 %r22327, %r22262; 2026-02-21T09:21:00.3642348Z mov.b32 %r22328, %r22262; 2026-02-21T09:21:00.3642524Z mov.b32 %r22329, %r22262; 2026-02-21T09:21:00.3642699Z mov.b32 %r22330, %r22262; 2026-02-21T09:21:00.3642866Z mov.b32 %r22331, %r22262; 2026-02-21T09:21:00.3643102Z mov.b32 %r22332, %r22262; 2026-02-21T09:21:00.3643268Z mov.b32 %r22333, %r22262; 2026-02-21T09:21:00.3643440Z mov.b32 %r22334, %r22262; 2026-02-21T09:21:00.3643605Z mov.b32 %r22335, %r22262; 2026-02-21T09:21:00.3643778Z mov.b32 %r22336, %r22262; 2026-02-21T09:21:00.3643948Z mov.b32 %r22337, %r22262; 2026-02-21T09:21:00.3644120Z mov.b32 %r22338, %r22262; 2026-02-21T09:21:00.3644292Z mov.b32 %r22339, %r22262; 2026-02-21T09:21:00.3644468Z mov.b32 %r22340, %r22262; 2026-02-21T09:21:00.3644709Z mov.b32 %r22341, %r22262; 2026-02-21T09:21:00.3644879Z mov.b32 %r22342, %r22262; 2026-02-21T09:21:00.3645065Z mov.b32 %r22343, %r22262; 
2026-02-21T09:21:00.3645248Z mov.b32 %r22344, %r22262; 2026-02-21T09:21:00.3645446Z mov.b32 %r22345, %r22262; 2026-02-21T09:21:00.3645622Z mov.b32 %r22346, %r22262; 2026-02-21T09:21:00.3645804Z mov.b32 %r22347, %r22262; 2026-02-21T09:21:00.3645974Z mov.b32 %r22348, %r22262; 2026-02-21T09:21:00.3646153Z mov.b32 %r22349, %r22262; 2026-02-21T09:21:00.3646328Z mov.b32 %r22350, %r22262; 2026-02-21T09:21:00.3646631Z mov.b32 %r22351, %r22262; 2026-02-21T09:21:00.3646816Z mov.b32 %r22352, %r22262; 2026-02-21T09:21:00.3646991Z mov.b32 %r22353, %r22262; 2026-02-21T09:21:00.3647167Z mov.b32 %r22354, %r22262; 2026-02-21T09:21:00.3647424Z mov.b32 %r22355, %r22262; 2026-02-21T09:21:00.3647623Z mov.b32 %r22356, %r22262; 2026-02-21T09:21:00.3647803Z mov.b32 %r22357, %r22262; 2026-02-21T09:21:00.3647976Z mov.b32 %r22358, %r22262; 2026-02-21T09:21:00.3648148Z mov.b32 %r22359, %r22262; 2026-02-21T09:21:00.3648323Z mov.b32 %r22360, %r22262; 2026-02-21T09:21:00.3648494Z mov.b32 %r22361, %r22262; 2026-02-21T09:21:00.3648672Z mov.b32 %r22362, %r22262; 2026-02-21T09:21:00.3648852Z mov.b32 %r22363, %r22262; 2026-02-21T09:21:00.3649024Z mov.b32 %r22364, %r22262; 2026-02-21T09:21:00.3649209Z mov.b32 %r22365, %r22262; 2026-02-21T09:21:00.3649382Z mov.b32 %r22366, %r22262; 2026-02-21T09:21:00.3649558Z mov.b32 %r22367, %r22262; 2026-02-21T09:21:00.3649737Z mov.b32 %r22368, %r22262; 2026-02-21T09:21:00.3649918Z mov.b32 %r22369, %r22262; 2026-02-21T09:21:00.3650086Z mov.b32 %r22370, %r22262; 2026-02-21T09:21:00.3650264Z mov.b32 %r22371, %r22262; 2026-02-21T09:21:00.3650436Z mov.b32 %r22372, %r22262; 2026-02-21T09:21:00.3650614Z mov.b32 %r22373, %r22262; 2026-02-21T09:21:00.3650794Z mov.b32 %r22374, %r22262; 2026-02-21T09:21:00.3650972Z mov.b32 %r22375, %r22262; 2026-02-21T09:21:00.3651151Z mov.b32 %r22376, %r22262; 2026-02-21T09:21:00.3651324Z mov.b32 %r22377, %r22262; 2026-02-21T09:21:00.3651505Z mov.b32 %r22378, %r22262; 2026-02-21T09:21:00.3651677Z mov.b32 %r22379, %r22262; 
2026-02-21T09:21:00.3651862Z mov.b32 %r22380, %r22262; 2026-02-21T09:21:00.3652036Z mov.b32 %r22381, %r22262; 2026-02-21T09:21:00.3652208Z mov.b32 %r22382, %r22262; 2026-02-21T09:21:00.3652385Z mov.b32 %r22383, %r22262; 2026-02-21T09:21:00.3652567Z mov.b32 %r22384, %r22262; 2026-02-21T09:21:00.3652747Z mov.b32 %r22385, %r22262; 2026-02-21T09:21:00.3652918Z mov.b32 %r22386, %r22262; 2026-02-21T09:21:00.3653093Z mov.b32 %r22387, %r22262; 2026-02-21T09:21:00.3653258Z mov.b32 %r22388, %r22262; 2026-02-21T09:21:00.3653449Z mov.b32 %r22389, %r22262; 2026-02-21T09:21:00.3653625Z mov.b32 %r22390, %r22262; 2026-02-21T09:21:00.3653800Z mov.b32 %r22391, %r22262; 2026-02-21T09:21:00.3653970Z mov.b32 %r22392, %r22262; 2026-02-21T09:21:00.3654145Z mov.b32 %r22393, %r22262; 2026-02-21T09:21:00.3654312Z mov.b32 %r22394, %r22262; 2026-02-21T09:21:00.3654488Z mov.b32 %r22395, %r22262; 2026-02-21T09:21:00.3654666Z mov.b32 %r22396, %r22262; 2026-02-21T09:21:00.3654836Z mov.b32 %r22397, %r22262; 2026-02-21T09:21:00.3655010Z mov.b32 %r22398, %r22262; 2026-02-21T09:21:00.3655260Z mov.b32 %r22399, %r22262; 2026-02-21T09:21:00.3655439Z mov.b32 %r22400, %r22262; 2026-02-21T09:21:00.3655606Z mov.b32 %r22401, %r22262; 2026-02-21T09:21:00.3655786Z mov.b32 %r22402, %r22262; 2026-02-21T09:21:00.3655954Z mov.b32 %r22403, %r22262; 2026-02-21T09:21:00.3656231Z mov.b32 %r22404, %r22262; 2026-02-21T09:21:00.3656411Z mov.b32 %r22405, %r22262; 2026-02-21T09:21:00.3656706Z mov.b32 %r22406, %r22262; 2026-02-21T09:21:00.3656887Z mov.b32 %r22407, %r22262; 2026-02-21T09:21:00.3657054Z mov.b32 %r22408, %r22262; 2026-02-21T09:21:00.3657243Z mov.b32 %r22409, %r22262; 2026-02-21T09:21:00.3657420Z mov.b32 %r22410, %r22262; 2026-02-21T09:21:00.3657592Z mov.b32 %r22411, %r22262; 2026-02-21T09:21:00.3657765Z mov.b32 %r22412, %r22262; 2026-02-21T09:21:00.3658012Z mov.b32 %r22413, %r22262; 2026-02-21T09:21:00.3658184Z mov.b32 %r22414, %r22262; 2026-02-21T09:21:00.3658355Z mov.b32 %r22415, %r22262; 
2026-02-21T09:21:00.3658528Z mov.b32 %r22416, %r22262; 2026-02-21T09:21:00.3658696Z mov.b32 %r22417, %r22262; 2026-02-21T09:21:00.3658884Z mov.b32 %r22418, %r22262; 2026-02-21T09:21:00.3659051Z mov.b32 %r22419, %r22262; 2026-02-21T09:21:00.3659222Z mov.b32 %r22420, %r22262; 2026-02-21T09:21:00.3659387Z mov.b32 %r22421, %r22262; 2026-02-21T09:21:00.3659560Z mov.b32 %r22422, %r22262; 2026-02-21T09:21:00.3659726Z mov.b32 %r22423, %r22262; 2026-02-21T09:21:00.3659899Z mov.b32 %r22424, %r22262; 2026-02-21T09:21:00.3660065Z mov.b32 %r22425, %r22262; 2026-02-21T09:21:00.3660236Z mov.b32 %r22426, %r22262; 2026-02-21T09:21:00.3660492Z mov.b32 %r22427, %r22262; 2026-02-21T09:21:00.3660667Z mov.b32 %r22428, %r22262; 2026-02-21T09:21:00.3660848Z mov.b32 %r22429, %r22262; 2026-02-21T09:21:00.3661019Z mov.b32 %r22430, %r22262; 2026-02-21T09:21:00.3661199Z mov.b32 %r22431, %r22262; 2026-02-21T09:21:00.3661370Z mov.b32 %r22432, %r22262; 2026-02-21T09:21:00.3661544Z mov.b32 %r22433, %r22262; 2026-02-21T09:21:00.3661712Z mov.b32 %r22434, %r22262; 2026-02-21T09:21:00.3661892Z mov.b32 %r22435, %r22262; 2026-02-21T09:21:00.3662060Z mov.b32 %r22436, %r22262; 2026-02-21T09:21:00.3662237Z mov.b32 %r22437, %r22262; 2026-02-21T09:21:00.3662415Z mov.b32 %r22438, %r22262; 2026-02-21T09:21:00.3662582Z mov.b32 %r22439, %r22262; 2026-02-21T09:21:00.3662762Z mov.b32 %r22440, %r22262; 2026-02-21T09:21:00.3662929Z mov.b32 %r22441, %r22262; 2026-02-21T09:21:00.3663101Z mov.b32 %r22442, %r22262; 2026-02-21T09:21:00.3663270Z mov.b32 %r22443, %r22262; 2026-02-21T09:21:00.3663442Z mov.b32 %r22444, %r22262; 2026-02-21T09:21:00.3663609Z mov.b32 %r22445, %r22262; 2026-02-21T09:21:00.3663780Z mov.b32 %r22446, %r22262; 2026-02-21T09:21:00.3663948Z mov.b32 %r22447, %r22262; 2026-02-21T09:21:00.3664124Z mov.b32 %r22448, %r22262; 2026-02-21T09:21:00.3664301Z mov.b32 %r22449, %r22262; 2026-02-21T09:21:00.3664471Z mov.b32 %r22450, %r22262; 2026-02-21T09:21:00.3664644Z mov.b32 %r22451, %r22262; 
2026-02-21T09:21:00.3664809Z mov.b32 %r22452, %r22262; 2026-02-21T09:21:00.3664989Z mov.b32 %r22453, %r22262; 2026-02-21T09:21:00.3665159Z mov.b32 %r22454, %r22262; 2026-02-21T09:21:00.3665349Z mov.b32 %r22455, %r22262; 2026-02-21T09:21:00.3665521Z mov.b32 %r22456, %r22262; 2026-02-21T09:21:00.3665693Z mov.b32 %r22457, %r22262; 2026-02-21T09:21:00.3665860Z mov.b32 %r22458, %r22262; 2026-02-21T09:21:00.3666029Z mov.b32 %r22459, %r22262; 2026-02-21T09:21:00.3666207Z mov.b32 %r22460, %r22262; 2026-02-21T09:21:00.3666373Z mov.b32 %r22461, %r22262; 2026-02-21T09:21:00.3666674Z mov.b32 %r22462, %r22262; 2026-02-21T09:21:00.3666843Z mov.b32 %r22463, %r22262; 2026-02-21T09:21:00.3667013Z mov.b32 %r22464, %r22262; 2026-02-21T09:21:00.3667186Z mov.b32 %r22465, %r22262; 2026-02-21T09:21:00.3667357Z mov.b32 %r22466, %r22262; 2026-02-21T09:21:00.3667525Z mov.b32 %r22467, %r22262; 2026-02-21T09:21:00.3667711Z mov.b32 %r22468, %r22262; 2026-02-21T09:21:00.3667882Z mov.b32 %r22469, %r22262; 2026-02-21T09:21:00.3668052Z mov.b32 %r22470, %r22262; 2026-02-21T09:21:00.3668384Z mov.b32 %r22471, %r22262; 2026-02-21T09:21:00.3668563Z mov.b32 %r22472, %r22262; 2026-02-21T09:21:00.3668741Z mov.b32 %r22473, %r22262; 2026-02-21T09:21:00.3668910Z mov.b32 %r22474, %r22262; 2026-02-21T09:21:00.3669087Z mov.b32 %r22475, %r22262; 2026-02-21T09:21:00.3669340Z mov.b32 %r22476, %r22262; 2026-02-21T09:21:00.3669515Z mov.b32 %r22477, %r22262; 2026-02-21T09:21:00.3669680Z mov.b32 %r22478, %r22262; 2026-02-21T09:21:00.3669851Z mov.b32 %r22479, %r22262; 2026-02-21T09:21:00.3670019Z mov.b32 %r22480, %r22262; 2026-02-21T09:21:00.3670205Z mov.b32 %r22481, %r22262; 2026-02-21T09:21:00.3670381Z mov.b32 %r22482, %r22262; 2026-02-21T09:21:00.3670551Z mov.b32 %r22483, %r22262; 2026-02-21T09:21:00.3670727Z mov.b32 %r22484, %r22262; 2026-02-21T09:21:00.3670959Z mov.b32 %r22485, %r22262; 2026-02-21T09:21:00.3671141Z mov.b32 %r22486, %r22262; 2026-02-21T09:21:00.3671314Z mov.b32 %r22487, %r22262; 
2026-02-21T09:21:00.3671487Z mov.b32 %r22488, %r22262; 2026-02-21T09:21:00.3671662Z mov.b32 %r22489, %r22262; 2026-02-21T09:21:00.3671834Z mov.b32 %r22490, %r22262; 2026-02-21T09:21:00.3672005Z mov.b32 %r22491, %r22262; 2026-02-21T09:21:00.3672177Z mov.b32 %r22492, %r22262; 2026-02-21T09:21:00.3672353Z mov.b32 %r22493, %r22262; 2026-02-21T09:21:00.3672525Z mov.b32 %r22494, %r22262; 2026-02-21T09:21:00.3672702Z mov.b32 %r22495, %r22262; 2026-02-21T09:21:00.3672868Z mov.b32 %r22496, %r22262; 2026-02-21T09:21:00.3673042Z mov.b32 %r22497, %r22262; 2026-02-21T09:21:00.3673207Z mov.b32 %r22498, %r22262; 2026-02-21T09:21:00.3673453Z mov.b32 %r22499, %r22262; 2026-02-21T09:21:00.3673625Z mov.b32 %r22500, %r22262; 2026-02-21T09:21:00.3673800Z mov.b32 %r22501, %r22262; 2026-02-21T09:21:00.3673967Z mov.b32 %r22502, %r22262; 2026-02-21T09:21:00.3674144Z mov.b32 %r22503, %r22262; 2026-02-21T09:21:00.3674317Z mov.b32 %r22504, %r22262; 2026-02-21T09:21:00.3674488Z mov.b32 %r22505, %r22262; 2026-02-21T09:21:00.3674661Z mov.b32 %r22506, %r22262; 2026-02-21T09:21:00.3674832Z mov.b32 %r22507, %r22262; 2026-02-21T09:21:00.3675004Z mov.b32 %r22508, %r22262; 2026-02-21T09:21:00.3675166Z mov.b32 %r22509, %r22262; 2026-02-21T09:21:00.3675337Z mov.b32 %r22510, %r22262; 2026-02-21T09:21:00.3675504Z mov.b32 %r22511, %r22262; 2026-02-21T09:21:00.3675684Z mov.b32 %r22512, %r22262; 2026-02-21T09:21:00.3675864Z mov.b32 %r22513, %r22262; 2026-02-21T09:21:00.3676049Z mov.b32 %r22514, %r22262; 2026-02-21T09:21:00.3676222Z mov.b32 %r22515, %r22262; 2026-02-21T09:21:00.3676392Z mov.b32 %r22516, %r22262; 2026-02-21T09:21:00.3676697Z mov.b32 %r22517, %r22262; 2026-02-21T09:21:00.3676925Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T09:21:00.3677237Z // => This Inner Loop Header: Depth=2 2026-02-21T09:21:00.3677498Z add.s64 %rd715, %rd715, 16; 2026-02-21T09:21:00.3677699Z setp.lt.u64 %p10, %rd715, 432; 2026-02-21T09:21:00.3677891Z add.s32 %r6205, %r22260, 1; 2026-02-21T09:21:00.3678081Z 
setp.gt.s32 %p11, %r6205, 4; 2026-02-21T09:21:00.3678276Z selp.b32 %r22260, 0, %r6205, %p11; 2026-02-21T09:21:00.3678626Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3679005Z cp.async.wait_group 16; 2026-02-21T09:21:00.3679181Z bar.sync 0; 2026-02-21T09:21:00.3679338Z shl.b32 %r6206, %r22260, 13; 2026-02-21T09:21:00.3679526Z add.s32 %r6208, %r22237, %r6206; 2026-02-21T09:21:00.3679866Z .loc 1 55 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:55:32 2026-02-21T09:21:00.3680230Z add.s32 %r6209, %r6208, %r109; 2026-02-21T09:21:00.3680419Z ld.shared.b16 %rs1, [%r6209]; 2026-02-21T09:21:00.3680619Z ld.shared.b16 %rs2, [%r6209+256]; 2026-02-21T09:21:00.3680841Z ld.shared.b16 %rs3, [%r6209+16]; 2026-02-21T09:21:00.3681046Z ld.shared.b16 %rs4, [%r6209+272]; 2026-02-21T09:21:00.3681247Z ld.shared.b16 %rs5, [%r6209+4096]; 2026-02-21T09:21:00.3681542Z ld.shared.b16 %rs6, [%r6209+4352]; 2026-02-21T09:21:00.3681736Z ld.shared.b16 %rs7, [%r6209+4112]; 2026-02-21T09:21:00.3681937Z ld.shared.b16 %rs8, [%r6209+4368]; 2026-02-21T09:21:00.3682137Z add.s32 %r6210, %r6208, %r110; 2026-02-21T09:21:00.3682336Z ld.shared.b16 %rs9, [%r6210]; 2026-02-21T09:21:00.3682600Z ld.shared.b16 %rs10, [%r6210+256]; 2026-02-21T09:21:00.3682810Z ld.shared.b16 %rs11, [%r6210+16]; 2026-02-21T09:21:00.3683015Z ld.shared.b16 %rs12, [%r6210+272]; 2026-02-21T09:21:00.3683214Z ld.shared.b16 %rs13, [%r6210+4096]; 2026-02-21T09:21:00.3683430Z ld.shared.b16 %rs14, [%r6210+4352]; 2026-02-21T09:21:00.3683636Z ld.shared.b16 %rs15, [%r6210+4112]; 2026-02-21T09:21:00.3683845Z ld.shared.b16 %rs16, [%r6210+4368]; 2026-02-21T09:21:00.3684046Z cvt.f32.bf16 %r3325, %rs1; 2026-02-21T09:21:00.3684323Z cvt.f32.bf16 %r3326, %rs2; 2026-02-21T09:21:00.3684516Z cvt.f32.bf16 %r3327, %rs9; 2026-02-21T09:21:00.3684699Z cvt.f32.bf16 %r3328, %rs10; 2026-02-21T09:21:00.3684893Z cvt.f32.bf16 %r3585, %rs3; 2026-02-21T09:21:00.3685072Z cvt.f32.bf16 %r3586, %rs4; 
2026-02-21T09:21:00.3685271Z cvt.f32.bf16 %r3587, %rs11; 2026-02-21T09:21:00.3685453Z cvt.f32.bf16 %r3588, %rs12; 2026-02-21T09:21:00.3685637Z cvt.f32.bf16 %r3845, %rs5; 2026-02-21T09:21:00.3685814Z cvt.f32.bf16 %r3846, %rs6; 2026-02-21T09:21:00.3685996Z cvt.f32.bf16 %r3847, %rs13; 2026-02-21T09:21:00.3686173Z cvt.f32.bf16 %r3848, %rs14; 2026-02-21T09:21:00.3686356Z cvt.f32.bf16 %r4105, %rs7; 2026-02-21T09:21:00.3686653Z cvt.f32.bf16 %r4106, %rs8; 2026-02-21T09:21:00.3686910Z cvt.f32.bf16 %r4107, %rs15; 2026-02-21T09:21:00.3687112Z cvt.f32.bf16 %r4108, %rs16; 2026-02-21T09:21:00.3687437Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3687806Z shl.b32 %r6211, %r22260, 11; 2026-02-21T09:21:00.3687992Z add.s32 %r6212, %r22237, %r6211; 2026-02-21T09:21:00.3688189Z add.s32 %r6213, %r6212, 98304; 2026-02-21T09:21:00.3688514Z .loc 1 70 45 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:70:45 2026-02-21T09:21:00.3688876Z add.s32 %r6214, %r6213, %r22239; 2026-02-21T09:21:00.3689081Z add.s32 %r6215, %r6213, %r22243; 2026-02-21T09:21:00.3689279Z add.s32 %r6216, %r6213, %r22244; 2026-02-21T09:21:00.3689613Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3689975Z ld.shared.s8 %rs17, [%r6214]; 2026-02-21T09:21:00.3690302Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3690652Z shl.b16 %rs18, %rs17, 4; 2026-02-21T09:21:00.3690969Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3691335Z ld.shared.s8 %rs19, [%r6214+256]; 2026-02-21T09:21:00.3691668Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3692032Z shl.b16 %rs20, %rs19, 4; 2026-02-21T09:21:00.3692346Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3692721Z ld.shared.s8 %rs21, [%r6214+512]; 
2026-02-21T09:21:00.3693056Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3693408Z shl.b16 %rs22, %rs21, 4; 2026-02-21T09:21:00.3693733Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3694093Z ld.shared.s8 %rs23, [%r6215]; 2026-02-21T09:21:00.3694427Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3694777Z shl.b16 %rs24, %rs23, 4; 2026-02-21T09:21:00.3695094Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3695457Z ld.shared.s8 %rs25, [%r6214+1024]; 2026-02-21T09:21:00.3695883Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3696243Z shl.b16 %rs26, %rs25, 4; 2026-02-21T09:21:00.3696778Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3697222Z ld.shared.s8 %rs27, [%r6214+1280]; 2026-02-21T09:21:00.3697559Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3697917Z shl.b16 %rs28, %rs27, 4; 2026-02-21T09:21:00.3698236Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3698590Z ld.shared.s8 %rs29, [%r6214+1536]; 2026-02-21T09:21:00.3699010Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3699369Z shl.b16 %rs30, %rs29, 4; 2026-02-21T09:21:00.3699681Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3700038Z ld.shared.s8 %rs31, [%r6216]; 2026-02-21T09:21:00.3700369Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3700725Z shl.b16 %rs32, %rs31, 4; 2026-02-21T09:21:00.3701031Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3701384Z cvt.s16.s8 %rs33, %rs18; 
2026-02-21T09:21:00.3701556Z shr.s16 %rs34, %rs33, 4; 2026-02-21T09:21:00.3701735Z cvt.s16.s8 %rs35, %rs20; 2026-02-21T09:21:00.3701974Z shr.s16 %rs36, %rs35, 4; 2026-02-21T09:21:00.3702170Z shr.s16 %rs37, %rs17, 4; 2026-02-21T09:21:00.3702348Z shr.s16 %rs38, %rs19, 4; 2026-02-21T09:21:00.3702667Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.3703024Z cvt.rn.f32.s16 %r6217, %rs38; 2026-02-21T09:21:00.3703207Z cvt.rn.f32.s16 %r6218, %rs37; 2026-02-21T09:21:00.3703394Z cvt.rn.f32.s16 %r6219, %rs36; 2026-02-21T09:21:00.3703576Z cvt.rn.f32.s16 %r6220, %rs34; 2026-02-21T09:21:00.3703895Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3704246Z cvt.s16.s8 %rs39, %rs22; 2026-02-21T09:21:00.3704425Z shr.s16 %rs40, %rs39, 4; 2026-02-21T09:21:00.3704603Z cvt.s16.s8 %rs41, %rs24; 2026-02-21T09:21:00.3704775Z shr.s16 %rs42, %rs41, 4; 2026-02-21T09:21:00.3704950Z shr.s16 %rs43, %rs21, 4; 2026-02-21T09:21:00.3705122Z shr.s16 %rs44, %rs23, 4; 2026-02-21T09:21:00.3705437Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.3705790Z cvt.rn.f32.s16 %r6221, %rs44; 2026-02-21T09:21:00.3705980Z cvt.rn.f32.s16 %r6222, %rs43; 2026-02-21T09:21:00.3706162Z cvt.rn.f32.s16 %r6223, %rs42; 2026-02-21T09:21:00.3706350Z cvt.rn.f32.s16 %r6224, %rs40; 2026-02-21T09:21:00.3706814Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3707167Z cvt.s16.s8 %rs45, %rs26; 2026-02-21T09:21:00.3707348Z shr.s16 %rs46, %rs45, 4; 2026-02-21T09:21:00.3707520Z cvt.s16.s8 %rs47, %rs28; 2026-02-21T09:21:00.3707701Z shr.s16 %rs48, %rs47, 4; 2026-02-21T09:21:00.3707874Z shr.s16 %rs49, %rs25, 4; 2026-02-21T09:21:00.3708054Z shr.s16 %rs50, %rs27, 4; 2026-02-21T09:21:00.3708434Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.3708802Z cvt.rn.f32.s16 %r6225, %rs50; 
2026-02-21T09:21:00.3708998Z cvt.rn.f32.s16 %r6226, %rs49; 2026-02-21T09:21:00.3709180Z cvt.rn.f32.s16 %r6227, %rs48; 2026-02-21T09:21:00.3709370Z cvt.rn.f32.s16 %r6228, %rs46; 2026-02-21T09:21:00.3709688Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3710046Z cvt.s16.s8 %rs51, %rs30; 2026-02-21T09:21:00.3710219Z shr.s16 %rs52, %rs51, 4; 2026-02-21T09:21:00.3710512Z cvt.s16.s8 %rs53, %rs32; 2026-02-21T09:21:00.3710688Z shr.s16 %rs54, %rs53, 4; 2026-02-21T09:21:00.3710866Z shr.s16 %rs55, %rs29, 4; 2026-02-21T09:21:00.3711045Z shr.s16 %rs56, %rs31, 4; 2026-02-21T09:21:00.3711359Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.3711787Z cvt.rn.f32.s16 %r6229, %rs56; 2026-02-21T09:21:00.3711969Z cvt.rn.f32.s16 %r6230, %rs55; 2026-02-21T09:21:00.3712163Z cvt.rn.f32.s16 %r6231, %rs54; 2026-02-21T09:21:00.3712351Z cvt.rn.f32.s16 %r6232, %rs52; 2026-02-21T09:21:00.3712606Z st.shared.v4.b32 [%r113], {%r6220, %r6218, %r6219, %r6217}; 2026-02-21T09:21:00.3712920Z st.shared.v4.b32 [%r114], {%r6224, %r6222, %r6223, %r6221}; 2026-02-21T09:21:00.3713101Z st.shared.v4.b32 [%r115], {%r6228, %r6226, %r6227, %r6225}; 2026-02-21T09:21:00.3713222Z st.shared.v4.b32 [%r116], {%r6232, %r6230, %r6231, %r6229}; 2026-02-21T09:21:00.3713283Z $L__tmp1: 2026-02-21T09:21:00.3713574Z .loc 2 291 36 // standard.py:291:36 @[ cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:87:40 ] 2026-02-21T09:21:00.3713650Z // begin inline asm 2026-02-21T09:21:00.3713764Z fence.proxy.async.shared::cta; 2026-02-21T09:21:00.3713833Z // end inline asm 2026-02-21T09:21:00.3713894Z bar.sync 0; 2026-02-21T09:21:00.3713981Z shfl.sync.idx.b32 %r6233, %r5, 0, 31, -1; 2026-02-21T09:21:00.3714060Z wgmma.fence.sync.aligned; 2026-02-21T09:21:00.3714136Z mov.pred %p2, -1; 2026-02-21T09:21:00.3714198Z // begin inline asm 2026-02-21T09:21:00.3717133Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22262,%r22263,%r22264,%r22265,%r22266,%r22267,%r22268,%r22269,%r22270,%r22271,%r22272,%r22273,%r22274,%r22275,%r22276,%r22277,%r22278,%r22279,%r22280,%r22281,%r22282,%r22283,%r22284,%r22285,%r22286,%r22287,%r22288,%r22289,%r22290,%r22291,%r22292,%r22293,%r22294,%r22295,%r22296,%r22297,%r22298,%r22299,%r22300,%r22301,%r22302,%r22303,%r22304,%r22305,%r22306,%r22307,%r22308,%r22309,%r22310,%r22311,%r22312,%r22313,%r22314,%r22315,%r22316,%r22317,%r22318,%r22319,%r22320,%r22321,%r22322,%r22323,%r22324,%r22325,%r22326,%r22327,%r22328,%r22329,%r22330,%r22331,%r22332,%r22333,%r22334,%r22335,%r22336,%r22337,%r22338,%r22339,%r22340,%r22341,%r22342,%r22343,%r22344,%r22345,%r22346,%r22347,%r22348,%r22349,%r22350,%r22351,%r22352,%r22353,%r22354,%r22355,%r22356,%r22357,%r22358,%r22359,%r22360,%r22361,%r22362,%r22363,%r22364,%r22365,%r22366,%r22367,%r22368,%r22369,%r22370,%r22371,%r22372,%r22373,%r22374,%r22375,%r22376,%r22377,%r22378,%r22379,%r22380,%r22381,%r22382,%r22383,%r22384,%r22385,%r22386,%r22387,%r22388,%r22389}, {%r3325,%r3326,%r3327,%r3328}, %rd1, %p2, 1, 1; 2026-02-21T09:21:00.3717224Z // end inline asm 2026-02-21T09:21:00.3717289Z // begin inline asm 2026-02-21T09:21:00.3720016Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22262,%r22263,%r22264,%r22265,%r22266,%r22267,%r22268,%r22269,%r22270,%r22271,%r22272,%r22273,%r22274,%r22275,%r22276,%r22277,%r22278,%r22279,%r22280,%r22281,%r22282,%r22283,%r22284,%r22285,%r22286,%r22287,%r22288,%r22289,%r22290,%r22291,%r22292,%r22293,%r22294,%r22295,%r22296,%r22297,%r22298,%r22299,%r22300,%r22301,%r22302,%r22303,%r22304,%r22305,%r22306,%r22307,%r22308,%r22309,%r22310,%r22311,%r22312,%r22313,%r22314,%r22315,%r22316,%r22317,%r22318,%r22319,%r22320,%r22321,%r22322,%r22323,%r22324,%r22325,%r22326,%r22327,%r22328,%r22329,%r22330,%r22331,%r22332,%r22333,%r22334,%r22335,%r22336,%r22337,%r22338,%r22339,%r22340,%r22341,%r22342,%r22343,%r22344,%r22345,%r22346,%r22347,%r22348,%r22349,%r22350,%r22351,%r22352,%r22353,%r22354,%r22355,%r22356,%r22357,%r22358,%r22359,%r22360,%r22361,%r22362,%r22363,%r22364,%r22365,%r22366,%r22367,%r22368,%r22369,%r22370,%r22371,%r22372,%r22373,%r22374,%r22375,%r22376,%r22377,%r22378,%r22379,%r22380,%r22381,%r22382,%r22383,%r22384,%r22385,%r22386,%r22387,%r22388,%r22389}, {%r3585,%r3586,%r3587,%r3588}, %rd2, %p2, 1, 1; 2026-02-21T09:21:00.3720091Z // end inline asm 2026-02-21T09:21:00.3720154Z // begin inline asm 2026-02-21T09:21:00.3722985Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22390,%r22391,%r22392,%r22393,%r22394,%r22395,%r22396,%r22397,%r22398,%r22399,%r22400,%r22401,%r22402,%r22403,%r22404,%r22405,%r22406,%r22407,%r22408,%r22409,%r22410,%r22411,%r22412,%r22413,%r22414,%r22415,%r22416,%r22417,%r22418,%r22419,%r22420,%r22421,%r22422,%r22423,%r22424,%r22425,%r22426,%r22427,%r22428,%r22429,%r22430,%r22431,%r22432,%r22433,%r22434,%r22435,%r22436,%r22437,%r22438,%r22439,%r22440,%r22441,%r22442,%r22443,%r22444,%r22445,%r22446,%r22447,%r22448,%r22449,%r22450,%r22451,%r22452,%r22453,%r22454,%r22455,%r22456,%r22457,%r22458,%r22459,%r22460,%r22461,%r22462,%r22463,%r22464,%r22465,%r22466,%r22467,%r22468,%r22469,%r22470,%r22471,%r22472,%r22473,%r22474,%r22475,%r22476,%r22477,%r22478,%r22479,%r22480,%r22481,%r22482,%r22483,%r22484,%r22485,%r22486,%r22487,%r22488,%r22489,%r22490,%r22491,%r22492,%r22493,%r22494,%r22495,%r22496,%r22497,%r22498,%r22499,%r22500,%r22501,%r22502,%r22503,%r22504,%r22505,%r22506,%r22507,%r22508,%r22509,%r22510,%r22511,%r22512,%r22513,%r22514,%r22515,%r22516,%r22517}, {%r3845,%r3846,%r3847,%r3848}, %rd1, %p2, 1, 1; 2026-02-21T09:21:00.3723112Z // end inline asm 2026-02-21T09:21:00.3723175Z // begin inline asm 2026-02-21T09:21:00.3725930Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22390,%r22391,%r22392,%r22393,%r22394,%r22395,%r22396,%r22397,%r22398,%r22399,%r22400,%r22401,%r22402,%r22403,%r22404,%r22405,%r22406,%r22407,%r22408,%r22409,%r22410,%r22411,%r22412,%r22413,%r22414,%r22415,%r22416,%r22417,%r22418,%r22419,%r22420,%r22421,%r22422,%r22423,%r22424,%r22425,%r22426,%r22427,%r22428,%r22429,%r22430,%r22431,%r22432,%r22433,%r22434,%r22435,%r22436,%r22437,%r22438,%r22439,%r22440,%r22441,%r22442,%r22443,%r22444,%r22445,%r22446,%r22447,%r22448,%r22449,%r22450,%r22451,%r22452,%r22453,%r22454,%r22455,%r22456,%r22457,%r22458,%r22459,%r22460,%r22461,%r22462,%r22463,%r22464,%r22465,%r22466,%r22467,%r22468,%r22469,%r22470,%r22471,%r22472,%r22473,%r22474,%r22475,%r22476,%r22477,%r22478,%r22479,%r22480,%r22481,%r22482,%r22483,%r22484,%r22485,%r22486,%r22487,%r22488,%r22489,%r22490,%r22491,%r22492,%r22493,%r22494,%r22495,%r22496,%r22497,%r22498,%r22499,%r22500,%r22501,%r22502,%r22503,%r22504,%r22505,%r22506,%r22507,%r22508,%r22509,%r22510,%r22511,%r22512,%r22513,%r22514,%r22515,%r22516,%r22517}, {%r4105,%r4106,%r4107,%r4108}, %rd2, %p2, 1, 1; 2026-02-21T09:21:00.3726000Z // end inline asm 2026-02-21T09:21:00.3726092Z wgmma.commit_group.sync.aligned; 2026-02-21T09:21:00.3726152Z mov.b32 %r5925, 0; 2026-02-21T09:21:00.3726218Z mov.b32 %r4365, %r2884; 2026-02-21T09:21:00.3726286Z mov.b32 %r4366, %r5925; 2026-02-21T09:21:00.3726354Z mov.b32 %r4367, %r5925; 2026-02-21T09:21:00.3726416Z // begin inline asm 2026-02-21T09:21:00.3731526Z // wait for regs: 
%r22262,%r22263,%r22264,%r22265,%r22266,%r22267,%r22268,%r22269,%r22270,%r22271,%r22272,%r22273,%r22274,%r22275,%r22276,%r22277,%r22278,%r22279,%r22280,%r22281,%r22282,%r22283,%r22284,%r22285,%r22286,%r22287,%r22288,%r22289,%r22290,%r22291,%r22292,%r22293,%r22294,%r22295,%r22296,%r22297,%r22298,%r22299,%r22300,%r22301,%r22302,%r22303,%r22304,%r22305,%r22306,%r22307,%r22308,%r22309,%r22310,%r22311,%r22312,%r22313,%r22314,%r22315,%r22316,%r22317,%r22318,%r22319,%r22320,%r22321,%r22322,%r22323,%r22324,%r22325,%r22326,%r22327,%r22328,%r22329,%r22330,%r22331,%r22332,%r22333,%r22334,%r22335,%r22336,%r22337,%r22338,%r22339,%r22340,%r22341,%r22342,%r22343,%r22344,%r22345,%r22346,%r22347,%r22348,%r22349,%r22350,%r22351,%r22352,%r22353,%r22354,%r22355,%r22356,%r22357,%r22358,%r22359,%r22360,%r22361,%r22362,%r22363,%r22364,%r22365,%r22366,%r22367,%r22368,%r22369,%r22370,%r22371,%r22372,%r22373,%r22374,%r22375,%r22376,%r22377,%r22378,%r22379,%r22380,%r22381,%r22382,%r22383,%r22384,%r22385,%r22386,%r22387,%r22388,%r22389,%r22390,%r22391,%r22392,%r22393,%r22394,%r22395,%r22396,%r22397,%r22398,%r22399,%r22400,%r22401,%r22402,%r22403,%r22404,%r22405,%r22406,%r22407,%r22408,%r22409,%r22410,%r22411,%r22412,%r22413,%r22414,%r22415,%r22416,%r22417,%r22418,%r22419,%r22420,%r22421,%r22422,%r22423,%r22424,%r22425,%r22426,%r22427,%r22428,%r22429,%r22430,%r22431,%r22432,%r22433,%r22434,%r22435,%r22436,%r22437,%r22438,%r22439,%r22440,%r22441,%r22442,%r22443,%r22444,%r22445,%r22446,%r22447,%r22448,%r22449,%r22450,%r22451,%r22452,%r22453,%r22454,%r22455,%r22456,%r22457,%r22458,%r22459,%r22460,%r22461,%r22462,%r22463,%r22464,%r22465,%r22466,%r22467,%r22468,%r22469,%r22470,%r22471,%r22472,%r22473,%r22474,%r22475,%r22476,%r22477,%r22478,%r22479,%r22480,%r22481,%r22482,%r22483,%r22484,%r22485,%r22486,%r22487,%r22488,%r22489,%r22490,%r22491,%r22492,%r22493,%r22494,%r22495,%r22496,%r22497,%r22498,%r22499,%r22500,%r22501,%r22502,%r22503,%r22504,%r22505,%r22506,%r22507,%r22508,%r22509,%r22510,%r22511,
%r22512,%r22513,%r22514,%r22515,%r22516,%r22517,%r4365,%r4366,%r4367 2026-02-21T09:21:00.3731816Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:21:00.3731881Z // end inline asm 2026-02-21T09:21:00.3731938Z $L__tmp2: 2026-02-21T09:21:00.3732172Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3732251Z add.s32 %r6234, %r22237, 40960; 2026-02-21T09:21:00.3732317Z add.s32 %r6235, %r6234, %r6206; 2026-02-21T09:21:00.3732523Z .loc 1 55 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:55:32 2026-02-21T09:21:00.3732599Z add.s32 %r6236, %r6235, %r109; 2026-02-21T09:21:00.3732668Z ld.shared.b16 %rs57, [%r6236]; 2026-02-21T09:21:00.3732740Z ld.shared.b16 %rs58, [%r6236+256]; 2026-02-21T09:21:00.3732817Z ld.shared.b16 %rs59, [%r6236+16]; 2026-02-21T09:21:00.3732951Z ld.shared.b16 %rs60, [%r6236+272]; 2026-02-21T09:21:00.3733024Z ld.shared.b16 %rs61, [%r6236+4096]; 2026-02-21T09:21:00.3733094Z ld.shared.b16 %rs62, [%r6236+4352]; 2026-02-21T09:21:00.3733174Z ld.shared.b16 %rs63, [%r6236+4112]; 2026-02-21T09:21:00.3733254Z ld.shared.b16 %rs64, [%r6236+4368]; 2026-02-21T09:21:00.3733322Z add.s32 %r6237, %r6235, %r110; 2026-02-21T09:21:00.3733398Z ld.shared.b16 %rs65, [%r6237]; 2026-02-21T09:21:00.3733469Z ld.shared.b16 %rs66, [%r6237+256]; 2026-02-21T09:21:00.3733537Z ld.shared.b16 %rs67, [%r6237+16]; 2026-02-21T09:21:00.3733612Z ld.shared.b16 %rs68, [%r6237+272]; 2026-02-21T09:21:00.3733681Z ld.shared.b16 %rs69, [%r6237+4096]; 2026-02-21T09:21:00.3733752Z ld.shared.b16 %rs70, [%r6237+4352]; 2026-02-21T09:21:00.3733819Z ld.shared.b16 %rs71, [%r6237+4112]; 2026-02-21T09:21:00.3733895Z ld.shared.b16 %rs72, [%r6237+4368]; 2026-02-21T09:21:00.3733963Z cvt.f32.bf16 %r4883, %rs57; 2026-02-21T09:21:00.3734029Z cvt.f32.bf16 %r4884, %rs58; 2026-02-21T09:21:00.3734100Z cvt.f32.bf16 %r4885, %rs65; 2026-02-21T09:21:00.3734170Z cvt.f32.bf16 %r4886, %rs66; 2026-02-21T09:21:00.3734237Z cvt.f32.bf16 %r5143, %rs59; 
2026-02-21T09:21:00.3734303Z cvt.f32.bf16 %r5144, %rs60; 2026-02-21T09:21:00.3734374Z cvt.f32.bf16 %r5145, %rs67; 2026-02-21T09:21:00.3734441Z cvt.f32.bf16 %r5146, %rs68; 2026-02-21T09:21:00.3734505Z cvt.f32.bf16 %r5403, %rs61; 2026-02-21T09:21:00.3734578Z cvt.f32.bf16 %r5404, %rs62; 2026-02-21T09:21:00.3734645Z cvt.f32.bf16 %r5405, %rs69; 2026-02-21T09:21:00.3734708Z cvt.f32.bf16 %r5406, %rs70; 2026-02-21T09:21:00.3734773Z cvt.f32.bf16 %r5663, %rs63; 2026-02-21T09:21:00.3734848Z cvt.f32.bf16 %r5664, %rs64; 2026-02-21T09:21:00.3734915Z cvt.f32.bf16 %r5665, %rs71; 2026-02-21T09:21:00.3734978Z cvt.f32.bf16 %r5666, %rs72; 2026-02-21T09:21:00.3735190Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3735257Z add.s32 %r6238, %r6212, 108544; 2026-02-21T09:21:00.3735459Z .loc 1 70 45 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:70:45 2026-02-21T09:21:00.3735538Z add.s32 %r6239, %r6238, %r22239; 2026-02-21T09:21:00.3735605Z add.s32 %r6240, %r6238, %r22243; 2026-02-21T09:21:00.3735670Z add.s32 %r6241, %r6238, %r22244; 2026-02-21T09:21:00.3735869Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3736010Z ld.shared.s8 %rs73, [%r6239]; 2026-02-21T09:21:00.3736212Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3736279Z shl.b16 %rs74, %rs73, 4; 2026-02-21T09:21:00.3736676Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3736754Z ld.shared.s8 %rs75, [%r6239+256]; 2026-02-21T09:21:00.3736960Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3737036Z shl.b16 %rs76, %rs75, 4; 2026-02-21T09:21:00.3737239Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3737386Z ld.shared.s8 %rs77, [%r6239+512]; 2026-02-21T09:21:00.3737597Z .loc 1 60 28 // 
cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3737664Z shl.b16 %rs78, %rs77, 4; 2026-02-21T09:21:00.3737866Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3737935Z ld.shared.s8 %rs79, [%r6240]; 2026-02-21T09:21:00.3738141Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3738208Z shl.b16 %rs80, %rs79, 4; 2026-02-21T09:21:00.3738406Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3738485Z ld.shared.s8 %rs81, [%r6239+1024]; 2026-02-21T09:21:00.3738771Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3738843Z shl.b16 %rs82, %rs81, 4; 2026-02-21T09:21:00.3739063Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3739136Z ld.shared.s8 %rs83, [%r6239+1280]; 2026-02-21T09:21:00.3739334Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3739405Z shl.b16 %rs84, %rs83, 4; 2026-02-21T09:21:00.3739602Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3739673Z ld.shared.s8 %rs85, [%r6239+1536]; 2026-02-21T09:21:00.3739870Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3739941Z shl.b16 %rs86, %rs85, 4; 2026-02-21T09:21:00.3740143Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3740213Z ld.shared.s8 %rs87, [%r6241]; 2026-02-21T09:21:00.3740418Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3740481Z shl.b16 %rs88, %rs87, 4; 2026-02-21T09:21:00.3740679Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3740753Z cvt.s16.s8 %rs89, %rs74; 2026-02-21T09:21:00.3740816Z shr.s16 %rs90, %rs89, 4; 
2026-02-21T09:21:00.3740881Z cvt.s16.s8 %rs91, %rs76; 2026-02-21T09:21:00.3740942Z shr.s16 %rs92, %rs91, 4; 2026-02-21T09:21:00.3741014Z shr.s16 %rs93, %rs73, 4; 2026-02-21T09:21:00.3741078Z shr.s16 %rs94, %rs75, 4; 2026-02-21T09:21:00.3741280Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.3741353Z cvt.rn.f32.s16 %r6242, %rs94; 2026-02-21T09:21:00.3741421Z cvt.rn.f32.s16 %r6243, %rs93; 2026-02-21T09:21:00.3741487Z cvt.rn.f32.s16 %r6244, %rs92; 2026-02-21T09:21:00.3741566Z cvt.rn.f32.s16 %r6245, %rs90; 2026-02-21T09:21:00.3741770Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3741840Z cvt.s16.s8 %rs95, %rs78; 2026-02-21T09:21:00.3741903Z shr.s16 %rs96, %rs95, 4; 2026-02-21T09:21:00.3741972Z cvt.s16.s8 %rs97, %rs80; 2026-02-21T09:21:00.3742118Z shr.s16 %rs98, %rs97, 4; 2026-02-21T09:21:00.3742187Z shr.s16 %rs99, %rs77, 4; 2026-02-21T09:21:00.3742262Z shr.s16 %rs100, %rs79, 4; 2026-02-21T09:21:00.3742463Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.3742593Z cvt.rn.f32.s16 %r6246, %rs100; 2026-02-21T09:21:00.3742670Z cvt.rn.f32.s16 %r6247, %rs99; 2026-02-21T09:21:00.3742739Z cvt.rn.f32.s16 %r6248, %rs98; 2026-02-21T09:21:00.3742803Z cvt.rn.f32.s16 %r6249, %rs96; 2026-02-21T09:21:00.3743005Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3743077Z cvt.s16.s8 %rs101, %rs82; 2026-02-21T09:21:00.3743144Z shr.s16 %rs102, %rs101, 4; 2026-02-21T09:21:00.3743258Z cvt.s16.s8 %rs103, %rs84; 2026-02-21T09:21:00.3743333Z shr.s16 %rs104, %rs103, 4; 2026-02-21T09:21:00.3743398Z shr.s16 %rs105, %rs81, 4; 2026-02-21T09:21:00.3743462Z shr.s16 %rs106, %rs83, 4; 2026-02-21T09:21:00.3743679Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.3743756Z cvt.rn.f32.s16 %r6250, %rs106; 2026-02-21T09:21:00.3743824Z cvt.rn.f32.s16 %r6251, 
%rs105; 2026-02-21T09:21:00.3743892Z cvt.rn.f32.s16 %r6252, %rs104; 2026-02-21T09:21:00.3743966Z cvt.rn.f32.s16 %r6253, %rs102; 2026-02-21T09:21:00.3744168Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3744235Z cvt.s16.s8 %rs107, %rs86; 2026-02-21T09:21:00.3744355Z shr.s16 %rs108, %rs107, 4; 2026-02-21T09:21:00.3744422Z cvt.s16.s8 %rs109, %rs88; 2026-02-21T09:21:00.3744486Z shr.s16 %rs110, %rs109, 4; 2026-02-21T09:21:00.3744551Z shr.s16 %rs111, %rs85, 4; 2026-02-21T09:21:00.3744625Z shr.s16 %rs112, %rs87, 4; 2026-02-21T09:21:00.3744823Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.3744892Z cvt.rn.f32.s16 %r6254, %rs112; 2026-02-21T09:21:00.3744967Z cvt.rn.f32.s16 %r6255, %rs111; 2026-02-21T09:21:00.3745034Z cvt.rn.f32.s16 %r6256, %rs110; 2026-02-21T09:21:00.3745100Z cvt.rn.f32.s16 %r6257, %rs108; 2026-02-21T09:21:00.3745162Z bar.sync 0; 2026-02-21T09:21:00.3745291Z st.shared.v4.b32 [%r113], {%r6245, %r6243, %r6244, %r6242}; 2026-02-21T09:21:00.3745402Z st.shared.v4.b32 [%r114], {%r6249, %r6247, %r6248, %r6246}; 2026-02-21T09:21:00.3745512Z st.shared.v4.b32 [%r115], {%r6253, %r6251, %r6252, %r6250}; 2026-02-21T09:21:00.3745645Z st.shared.v4.b32 [%r116], {%r6257, %r6255, %r6256, %r6254}; 2026-02-21T09:21:00.3745706Z $L__tmp3: 2026-02-21T09:21:00.3745989Z .loc 2 291 36 // standard.py:291:36 @[ cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:87:40 ] 2026-02-21T09:21:00.3746065Z // begin inline asm 2026-02-21T09:21:00.3746150Z fence.proxy.async.shared::cta; 2026-02-21T09:21:00.3746213Z // end inline asm 2026-02-21T09:21:00.3746273Z bar.sync 0; 2026-02-21T09:21:00.3746357Z wgmma.fence.sync.aligned; 2026-02-21T09:21:00.3746422Z // begin inline asm 2026-02-21T09:21:00.3756306Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22262,%r22263,%r22264,%r22265,%r22266,%r22267,%r22268,%r22269,%r22270,%r22271,%r22272,%r22273,%r22274,%r22275,%r22276,%r22277,%r22278,%r22279,%r22280,%r22281,%r22282,%r22283,%r22284,%r22285,%r22286,%r22287,%r22288,%r22289,%r22290,%r22291,%r22292,%r22293,%r22294,%r22295,%r22296,%r22297,%r22298,%r22299,%r22300,%r22301,%r22302,%r22303,%r22304,%r22305,%r22306,%r22307,%r22308,%r22309,%r22310,%r22311,%r22312,%r22313,%r22314,%r22315,%r22316,%r22317,%r22318,%r22319,%r22320,%r22321,%r22322,%r22323,%r22324,%r22325,%r22326,%r22327,%r22328,%r22329,%r22330,%r22331,%r22332,%r22333,%r22334,%r22335,%r22336,%r22337,%r22338,%r22339,%r22340,%r22341,%r22342,%r22343,%r22344,%r22345,%r22346,%r22347,%r22348,%r22349,%r22350,%r22351,%r22352,%r22353,%r22354,%r22355,%r22356,%r22357,%r22358,%r22359,%r22360,%r22361,%r22362,%r22363,%r22364,%r22365,%r22366,%r22367,%r22368,%r22369,%r22370,%r22371,%r22372,%r22373,%r22374,%r22375,%r22376,%r22377,%r22378,%r22379,%r22380,%r22381,%r22382,%r22383,%r22384,%r22385,%r22386,%r22387,%r22388,%r22389}, {%r4883,%r4884,%r4885,%r4886}, %rd1, %p2, 1, 1; 2026-02-21T09:21:00.3756801Z // end inline asm 2026-02-21T09:21:00.3756965Z // begin inline asm 2026-02-21T09:21:00.3759757Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22262,%r22263,%r22264,%r22265,%r22266,%r22267,%r22268,%r22269,%r22270,%r22271,%r22272,%r22273,%r22274,%r22275,%r22276,%r22277,%r22278,%r22279,%r22280,%r22281,%r22282,%r22283,%r22284,%r22285,%r22286,%r22287,%r22288,%r22289,%r22290,%r22291,%r22292,%r22293,%r22294,%r22295,%r22296,%r22297,%r22298,%r22299,%r22300,%r22301,%r22302,%r22303,%r22304,%r22305,%r22306,%r22307,%r22308,%r22309,%r22310,%r22311,%r22312,%r22313,%r22314,%r22315,%r22316,%r22317,%r22318,%r22319,%r22320,%r22321,%r22322,%r22323,%r22324,%r22325,%r22326,%r22327,%r22328,%r22329,%r22330,%r22331,%r22332,%r22333,%r22334,%r22335,%r22336,%r22337,%r22338,%r22339,%r22340,%r22341,%r22342,%r22343,%r22344,%r22345,%r22346,%r22347,%r22348,%r22349,%r22350,%r22351,%r22352,%r22353,%r22354,%r22355,%r22356,%r22357,%r22358,%r22359,%r22360,%r22361,%r22362,%r22363,%r22364,%r22365,%r22366,%r22367,%r22368,%r22369,%r22370,%r22371,%r22372,%r22373,%r22374,%r22375,%r22376,%r22377,%r22378,%r22379,%r22380,%r22381,%r22382,%r22383,%r22384,%r22385,%r22386,%r22387,%r22388,%r22389}, {%r5143,%r5144,%r5145,%r5146}, %rd2, %p2, 1, 1; 2026-02-21T09:21:00.3759842Z // end inline asm 2026-02-21T09:21:00.3759912Z // begin inline asm 2026-02-21T09:21:00.3762660Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22390,%r22391,%r22392,%r22393,%r22394,%r22395,%r22396,%r22397,%r22398,%r22399,%r22400,%r22401,%r22402,%r22403,%r22404,%r22405,%r22406,%r22407,%r22408,%r22409,%r22410,%r22411,%r22412,%r22413,%r22414,%r22415,%r22416,%r22417,%r22418,%r22419,%r22420,%r22421,%r22422,%r22423,%r22424,%r22425,%r22426,%r22427,%r22428,%r22429,%r22430,%r22431,%r22432,%r22433,%r22434,%r22435,%r22436,%r22437,%r22438,%r22439,%r22440,%r22441,%r22442,%r22443,%r22444,%r22445,%r22446,%r22447,%r22448,%r22449,%r22450,%r22451,%r22452,%r22453,%r22454,%r22455,%r22456,%r22457,%r22458,%r22459,%r22460,%r22461,%r22462,%r22463,%r22464,%r22465,%r22466,%r22467,%r22468,%r22469,%r22470,%r22471,%r22472,%r22473,%r22474,%r22475,%r22476,%r22477,%r22478,%r22479,%r22480,%r22481,%r22482,%r22483,%r22484,%r22485,%r22486,%r22487,%r22488,%r22489,%r22490,%r22491,%r22492,%r22493,%r22494,%r22495,%r22496,%r22497,%r22498,%r22499,%r22500,%r22501,%r22502,%r22503,%r22504,%r22505,%r22506,%r22507,%r22508,%r22509,%r22510,%r22511,%r22512,%r22513,%r22514,%r22515,%r22516,%r22517}, {%r5403,%r5404,%r5405,%r5406}, %rd1, %p2, 1, 1; 2026-02-21T09:21:00.3762731Z // end inline asm 2026-02-21T09:21:00.3762804Z // begin inline asm 2026-02-21T09:21:00.3765483Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22390,%r22391,%r22392,%r22393,%r22394,%r22395,%r22396,%r22397,%r22398,%r22399,%r22400,%r22401,%r22402,%r22403,%r22404,%r22405,%r22406,%r22407,%r22408,%r22409,%r22410,%r22411,%r22412,%r22413,%r22414,%r22415,%r22416,%r22417,%r22418,%r22419,%r22420,%r22421,%r22422,%r22423,%r22424,%r22425,%r22426,%r22427,%r22428,%r22429,%r22430,%r22431,%r22432,%r22433,%r22434,%r22435,%r22436,%r22437,%r22438,%r22439,%r22440,%r22441,%r22442,%r22443,%r22444,%r22445,%r22446,%r22447,%r22448,%r22449,%r22450,%r22451,%r22452,%r22453,%r22454,%r22455,%r22456,%r22457,%r22458,%r22459,%r22460,%r22461,%r22462,%r22463,%r22464,%r22465,%r22466,%r22467,%r22468,%r22469,%r22470,%r22471,%r22472,%r22473,%r22474,%r22475,%r22476,%r22477,%r22478,%r22479,%r22480,%r22481,%r22482,%r22483,%r22484,%r22485,%r22486,%r22487,%r22488,%r22489,%r22490,%r22491,%r22492,%r22493,%r22494,%r22495,%r22496,%r22497,%r22498,%r22499,%r22500,%r22501,%r22502,%r22503,%r22504,%r22505,%r22506,%r22507,%r22508,%r22509,%r22510,%r22511,%r22512,%r22513,%r22514,%r22515,%r22516,%r22517}, {%r5663,%r5664,%r5665,%r5666}, %rd2, %p2, 1, 1; 2026-02-21T09:21:00.3765561Z // end inline asm 2026-02-21T09:21:00.3765647Z wgmma.commit_group.sync.aligned; 2026-02-21T09:21:00.3765782Z mov.b32 %r5923, %r2884; 2026-02-21T09:21:00.3765857Z mov.b32 %r5924, %r5925; 2026-02-21T09:21:00.3765931Z // begin inline asm 2026-02-21T09:21:00.3771066Z // wait for regs: 
%r22262,%r22263,%r22264,%r22265,%r22266,%r22267,%r22268,%r22269,%r22270,%r22271,%r22272,%r22273,%r22274,%r22275,%r22276,%r22277,%r22278,%r22279,%r22280,%r22281,%r22282,%r22283,%r22284,%r22285,%r22286,%r22287,%r22288,%r22289,%r22290,%r22291,%r22292,%r22293,%r22294,%r22295,%r22296,%r22297,%r22298,%r22299,%r22300,%r22301,%r22302,%r22303,%r22304,%r22305,%r22306,%r22307,%r22308,%r22309,%r22310,%r22311,%r22312,%r22313,%r22314,%r22315,%r22316,%r22317,%r22318,%r22319,%r22320,%r22321,%r22322,%r22323,%r22324,%r22325,%r22326,%r22327,%r22328,%r22329,%r22330,%r22331,%r22332,%r22333,%r22334,%r22335,%r22336,%r22337,%r22338,%r22339,%r22340,%r22341,%r22342,%r22343,%r22344,%r22345,%r22346,%r22347,%r22348,%r22349,%r22350,%r22351,%r22352,%r22353,%r22354,%r22355,%r22356,%r22357,%r22358,%r22359,%r22360,%r22361,%r22362,%r22363,%r22364,%r22365,%r22366,%r22367,%r22368,%r22369,%r22370,%r22371,%r22372,%r22373,%r22374,%r22375,%r22376,%r22377,%r22378,%r22379,%r22380,%r22381,%r22382,%r22383,%r22384,%r22385,%r22386,%r22387,%r22388,%r22389,%r22390,%r22391,%r22392,%r22393,%r22394,%r22395,%r22396,%r22397,%r22398,%r22399,%r22400,%r22401,%r22402,%r22403,%r22404,%r22405,%r22406,%r22407,%r22408,%r22409,%r22410,%r22411,%r22412,%r22413,%r22414,%r22415,%r22416,%r22417,%r22418,%r22419,%r22420,%r22421,%r22422,%r22423,%r22424,%r22425,%r22426,%r22427,%r22428,%r22429,%r22430,%r22431,%r22432,%r22433,%r22434,%r22435,%r22436,%r22437,%r22438,%r22439,%r22440,%r22441,%r22442,%r22443,%r22444,%r22445,%r22446,%r22447,%r22448,%r22449,%r22450,%r22451,%r22452,%r22453,%r22454,%r22455,%r22456,%r22457,%r22458,%r22459,%r22460,%r22461,%r22462,%r22463,%r22464,%r22465,%r22466,%r22467,%r22468,%r22469,%r22470,%r22471,%r22472,%r22473,%r22474,%r22475,%r22476,%r22477,%r22478,%r22479,%r22480,%r22481,%r22482,%r22483,%r22484,%r22485,%r22486,%r22487,%r22488,%r22489,%r22490,%r22491,%r22492,%r22493,%r22494,%r22495,%r22496,%r22497,%r22498,%r22499,%r22500,%r22501,%r22502,%r22503,%r22504,%r22505,%r22506,%r22507,%r22508,%r22509,%r22510,%r22511,
%r22512,%r22513,%r22514,%r22515,%r22516,%r22517,%r5923,%r5924,%r5925 2026-02-21T09:21:00.3771256Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:21:00.3771322Z // end inline asm 2026-02-21T09:21:00.3771383Z $L__tmp4: 2026-02-21T09:21:00.3771610Z .loc 1 43 78 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:43:78 2026-02-21T09:21:00.3771688Z add.s32 %r6258, %r22261, 1; 2026-02-21T09:21:00.3771764Z setp.gt.s32 %p12, %r6258, 4; 2026-02-21T09:21:00.3771842Z selp.b32 %r22261, 0, %r6258, %p12; 2026-02-21T09:21:00.3772062Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.3772135Z add.s32 %r6259, %r22258, -16; 2026-02-21T09:21:00.3772206Z add.s64 %rd144, %rd714, %rd10; 2026-02-21T09:21:00.3772289Z add.s64 %rd134, %rd144, 320; 2026-02-21T09:21:00.3772359Z add.s64 %rd145, %rd714, %rd9; 2026-02-21T09:21:00.3772425Z add.s64 %rd135, %rd145, 320; 2026-02-21T09:21:00.3772489Z add.s64 %rd146, %rd714, %rd8; 2026-02-21T09:21:00.3772561Z add.s64 %rd136, %rd146, 320; 2026-02-21T09:21:00.3772641Z mad.wide.s32 %rd137, %r6259, 2, %rd44; 2026-02-21T09:21:00.3772848Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3772920Z shl.b32 %r6260, %r22261, 13; 2026-02-21T09:21:00.3772989Z add.s32 %r6261, %r22237, %r6260; 2026-02-21T09:21:00.3773058Z add.s32 %r6185, %r6261, %r47; 2026-02-21T09:21:00.3773125Z selp.b32 %r6186, 8, 0, %p10; 2026-02-21T09:21:00.3773195Z // begin inline asm 2026-02-21T09:21:00.3773360Z cp.async.ca.shared.global [ %r6185 + 0 ], [ %rd134 + 0 ], 0x8, %r6186; 2026-02-21T09:21:00.3773424Z // end inline asm 2026-02-21T09:21:00.3773499Z add.s32 %r6187, %r6185, 2048; 2026-02-21T09:21:00.3773564Z // begin inline asm 2026-02-21T09:21:00.3773778Z cp.async.ca.shared.global [ %r6187 + 0 ], [ %rd135 + 0 ], 0x8, %r6186; 2026-02-21T09:21:00.3773845Z // end inline asm 2026-02-21T09:21:00.3773909Z add.s32 %r6189, %r6185, 4096; 2026-02-21T09:21:00.3773971Z // begin inline asm 
2026-02-21T09:21:00.3774109Z cp.async.ca.shared.global [ %r6189 + 0 ], [ %rd136 + 0 ], 0x8, %r6186; 2026-02-21T09:21:00.3774226Z // end inline asm 2026-02-21T09:21:00.3774290Z add.s32 %r6191, %r6185, 6144; 2026-02-21T09:21:00.3774352Z // begin inline asm 2026-02-21T09:21:00.3774490Z cp.async.ca.shared.global [ %r6191 + 0 ], [ %rd137 + 0 ], 0x8, %r6186; 2026-02-21T09:21:00.3774553Z // end inline asm 2026-02-21T09:21:00.3774637Z cp.async.commit_group; 2026-02-21T09:21:00.3774918Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.3774999Z add.s32 %r6262, %r22259, -65536; 2026-02-21T09:21:00.3775070Z cvt.s64.s32 %rd147, %r6262; 2026-02-21T09:21:00.3775141Z add.s64 %rd138, %rd45, %rd147; 2026-02-21T09:21:00.3775352Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3775420Z shl.b32 %r6263, %r22261, 11; 2026-02-21T09:21:00.3775483Z add.s32 %r6193, %r54, %r6263; 2026-02-21T09:21:00.3775552Z // begin inline asm 2026-02-21T09:21:00.3775689Z cp.async.ca.shared.global [ %r6193 + 0 ], [ %rd138 + 0 ], 0x8, %r6186; 2026-02-21T09:21:00.3775748Z // end inline asm 2026-02-21T09:21:00.3775821Z cp.async.commit_group; 2026-02-21T09:21:00.3776078Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.3776146Z add.s64 %rd139, %rd144, 352; 2026-02-21T09:21:00.3776212Z add.s64 %rd140, %rd145, 352; 2026-02-21T09:21:00.3776295Z add.s64 %rd141, %rd146, 352; 2026-02-21T09:21:00.3776377Z mad.wide.s32 %rd142, %r22258, 2, %rd44; 2026-02-21T09:21:00.3776707Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3776788Z add.s32 %r6264, %r6234, %r6260; 2026-02-21T09:21:00.3776856Z add.s32 %r6195, %r6264, %r47; 2026-02-21T09:21:00.3776923Z // begin inline asm 2026-02-21T09:21:00.3777076Z cp.async.ca.shared.global [ %r6195 + 0 ], [ %rd139 + 0 ], 0x8, %r6186; 2026-02-21T09:21:00.3777147Z // end inline asm 
2026-02-21T09:21:00.3777212Z add.s32 %r6197, %r6195, 2048; 2026-02-21T09:21:00.3777275Z // begin inline asm 2026-02-21T09:21:00.3777419Z cp.async.ca.shared.global [ %r6197 + 0 ], [ %rd140 + 0 ], 0x8, %r6186; 2026-02-21T09:21:00.3777480Z // end inline asm 2026-02-21T09:21:00.3777543Z add.s32 %r6199, %r6195, 4096; 2026-02-21T09:21:00.3777604Z // begin inline asm 2026-02-21T09:21:00.3777746Z cp.async.ca.shared.global [ %r6199 + 0 ], [ %rd141 + 0 ], 0x8, %r6186; 2026-02-21T09:21:00.3777806Z // end inline asm 2026-02-21T09:21:00.3777870Z add.s32 %r6201, %r6195, 6144; 2026-02-21T09:21:00.3777940Z // begin inline asm 2026-02-21T09:21:00.3778072Z cp.async.ca.shared.global [ %r6201 + 0 ], [ %rd142 + 0 ], 0x8, %r6186; 2026-02-21T09:21:00.3778133Z // end inline asm 2026-02-21T09:21:00.3778212Z cp.async.commit_group; 2026-02-21T09:21:00.3778424Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.3778496Z cvt.s64.s32 %rd148, %r22259; 2026-02-21T09:21:00.3778563Z add.s64 %rd143, %rd45, %rd148; 2026-02-21T09:21:00.3778772Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3778838Z add.s32 %r6203, %r60, %r6263; 2026-02-21T09:21:00.3778902Z // begin inline asm 2026-02-21T09:21:00.3779042Z cp.async.ca.shared.global [ %r6203 + 0 ], [ %rd143 + 0 ], 0x8, %r6186; 2026-02-21T09:21:00.3779102Z // end inline asm 2026-02-21T09:21:00.3779169Z cp.async.commit_group; 2026-02-21T09:21:00.3779373Z .loc 1 43 78 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:43:78 2026-02-21T09:21:00.3779443Z add.s32 %r22259, %r22259, 131072; 2026-02-21T09:21:00.3779606Z add.s64 %rd714, %rd714, 64; 2026-02-21T09:21:00.3779671Z add.s32 %r22258, %r22258, 32; 2026-02-21T09:21:00.3779752Z setp.lt.u64 %p13, %rd715, 496; 2026-02-21T09:21:00.3779817Z @%p13 bra $L__BB0_3; 2026-02-21T09:21:00.3779932Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:21:00.3780211Z .loc 1 36 32 // 
cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:36:32 2026-02-21T09:21:00.3780279Z or.b32 %r6656, %r241, %r11; 2026-02-21T09:21:00.3780342Z or.b32 %r6657, %r241, %r12; 2026-02-21T09:21:00.3780407Z or.b32 %r6658, %r241, %r13; 2026-02-21T09:21:00.3780479Z or.b32 %r6659, %r241, %r14; 2026-02-21T09:21:00.3780545Z or.b32 %r6660, %r241, %r15; 2026-02-21T09:21:00.3780608Z or.b32 %r6661, %r241, %r16; 2026-02-21T09:21:00.3780755Z or.b32 %r6662, %r241, %r17; 2026-02-21T09:21:00.3780822Z or.b32 %r6663, %r241, %r18; 2026-02-21T09:21:00.3780885Z or.b32 %r6664, %r241, %r19; 2026-02-21T09:21:00.3780958Z or.b32 %r6665, %r241, %r20; 2026-02-21T09:21:00.3781022Z or.b32 %r6666, %r241, %r21; 2026-02-21T09:21:00.3781084Z or.b32 %r6667, %r241, %r22; 2026-02-21T09:21:00.3781144Z or.b32 %r6668, %r241, %r23; 2026-02-21T09:21:00.3781212Z or.b32 %r6669, %r241, %r24; 2026-02-21T09:21:00.3781274Z or.b32 %r6670, %r241, %r25; 2026-02-21T09:21:00.3781336Z or.b32 %r6671, %r241, %r26; 2026-02-21T09:21:00.3781404Z or.b32 %r6672, %r241, %r27; 2026-02-21T09:21:00.3781464Z or.b32 %r6673, %r241, %r28; 2026-02-21T09:21:00.3781527Z or.b32 %r6674, %r241, %r29; 2026-02-21T09:21:00.3781749Z or.b32 %r6675, %r241, %r30; 2026-02-21T09:21:00.3781835Z or.b32 %r6676, %r241, %r31; 2026-02-21T09:21:00.3781898Z or.b32 %r6677, %r241, %r32; 2026-02-21T09:21:00.3781959Z or.b32 %r6678, %r241, %r33; 2026-02-21T09:21:00.3782029Z or.b32 %r6679, %r241, %r34; 2026-02-21T09:21:00.3782093Z or.b32 %r6680, %r241, %r35; 2026-02-21T09:21:00.3782153Z or.b32 %r6681, %r241, %r36; 2026-02-21T09:21:00.3782221Z or.b32 %r6682, %r241, %r37; 2026-02-21T09:21:00.3782293Z or.b32 %r6683, %r241, %r38; 2026-02-21T09:21:00.3782355Z or.b32 %r6684, %r241, %r39; 2026-02-21T09:21:00.3782415Z or.b32 %r6685, %r241, %r40; 2026-02-21T09:21:00.3782485Z or.b32 %r6686, %r241, %r41; 2026-02-21T09:21:00.3782549Z or.b32 %r6687, %r241, %r42; 2026-02-21T09:21:00.3782751Z .loc 1 43 78 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:43:78 
2026-02-21T09:21:00.3782828Z cp.async.wait_group 0; 2026-02-21T09:21:00.3782889Z bar.sync 0; 2026-02-21T09:21:00.3783092Z .loc 1 90 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:90:28 2026-02-21T09:21:00.3783183Z cvt.rn.bf16x2.f32 %r6688, %r22263, %r22262; 2026-02-21T09:21:00.3783273Z cvt.rn.bf16x2.f32 %r6689, %r22265, %r22264; 2026-02-21T09:21:00.3783354Z cvt.rn.bf16x2.f32 %r6690, %r22267, %r22266; 2026-02-21T09:21:00.3783430Z cvt.rn.bf16x2.f32 %r6691, %r22269, %r22268; 2026-02-21T09:21:00.3783515Z cvt.rn.bf16x2.f32 %r6692, %r22271, %r22270; 2026-02-21T09:21:00.3783595Z cvt.rn.bf16x2.f32 %r6693, %r22273, %r22272; 2026-02-21T09:21:00.3783675Z cvt.rn.bf16x2.f32 %r6694, %r22275, %r22274; 2026-02-21T09:21:00.3783761Z cvt.rn.bf16x2.f32 %r6695, %r22277, %r22276; 2026-02-21T09:21:00.3783840Z cvt.rn.bf16x2.f32 %r6696, %r22279, %r22278; 2026-02-21T09:21:00.3783922Z cvt.rn.bf16x2.f32 %r6697, %r22281, %r22280; 2026-02-21T09:21:00.3784015Z cvt.rn.bf16x2.f32 %r6698, %r22283, %r22282; 2026-02-21T09:21:00.3784103Z cvt.rn.bf16x2.f32 %r6699, %r22285, %r22284; 2026-02-21T09:21:00.3784185Z cvt.rn.bf16x2.f32 %r6700, %r22287, %r22286; 2026-02-21T09:21:00.3784262Z cvt.rn.bf16x2.f32 %r6701, %r22289, %r22288; 2026-02-21T09:21:00.3784352Z cvt.rn.bf16x2.f32 %r6702, %r22291, %r22290; 2026-02-21T09:21:00.3784430Z cvt.rn.bf16x2.f32 %r6703, %r22293, %r22292; 2026-02-21T09:21:00.3784507Z cvt.rn.bf16x2.f32 %r6704, %r22295, %r22294; 2026-02-21T09:21:00.3784592Z cvt.rn.bf16x2.f32 %r6705, %r22297, %r22296; 2026-02-21T09:21:00.3784725Z cvt.rn.bf16x2.f32 %r6706, %r22299, %r22298; 2026-02-21T09:21:00.3784801Z cvt.rn.bf16x2.f32 %r6707, %r22301, %r22300; 2026-02-21T09:21:00.3784878Z cvt.rn.bf16x2.f32 %r6708, %r22303, %r22302; 2026-02-21T09:21:00.3784961Z cvt.rn.bf16x2.f32 %r6709, %r22305, %r22304; 2026-02-21T09:21:00.3785082Z cvt.rn.bf16x2.f32 %r6710, %r22307, %r22306; 2026-02-21T09:21:00.3785158Z cvt.rn.bf16x2.f32 %r6711, %r22309, %r22308; 2026-02-21T09:21:00.3785243Z cvt.rn.bf16x2.f32 
%r6712, %r22311, %r22310; 2026-02-21T09:21:00.3785319Z cvt.rn.bf16x2.f32 %r6713, %r22313, %r22312; 2026-02-21T09:21:00.3785396Z cvt.rn.bf16x2.f32 %r6714, %r22315, %r22314; 2026-02-21T09:21:00.3785472Z cvt.rn.bf16x2.f32 %r6715, %r22317, %r22316; 2026-02-21T09:21:00.3785559Z cvt.rn.bf16x2.f32 %r6716, %r22319, %r22318; 2026-02-21T09:21:00.3785687Z cvt.rn.bf16x2.f32 %r6717, %r22321, %r22320; 2026-02-21T09:21:00.3785768Z cvt.rn.bf16x2.f32 %r6718, %r22323, %r22322; 2026-02-21T09:21:00.3785850Z cvt.rn.bf16x2.f32 %r6719, %r22325, %r22324; 2026-02-21T09:21:00.3785930Z cvt.rn.bf16x2.f32 %r6720, %r22327, %r22326; 2026-02-21T09:21:00.3786008Z cvt.rn.bf16x2.f32 %r6721, %r22329, %r22328; 2026-02-21T09:21:00.3786091Z cvt.rn.bf16x2.f32 %r6722, %r22331, %r22330; 2026-02-21T09:21:00.3786169Z cvt.rn.bf16x2.f32 %r6723, %r22333, %r22332; 2026-02-21T09:21:00.3786248Z cvt.rn.bf16x2.f32 %r6724, %r22335, %r22334; 2026-02-21T09:21:00.3786326Z cvt.rn.bf16x2.f32 %r6725, %r22337, %r22336; 2026-02-21T09:21:00.3786412Z cvt.rn.bf16x2.f32 %r6726, %r22339, %r22338; 2026-02-21T09:21:00.3786614Z cvt.rn.bf16x2.f32 %r6727, %r22341, %r22340; 2026-02-21T09:21:00.3786773Z cvt.rn.bf16x2.f32 %r6728, %r22343, %r22342; 2026-02-21T09:21:00.3786863Z cvt.rn.bf16x2.f32 %r6729, %r22345, %r22344; 2026-02-21T09:21:00.3786942Z cvt.rn.bf16x2.f32 %r6730, %r22347, %r22346; 2026-02-21T09:21:00.3787021Z cvt.rn.bf16x2.f32 %r6731, %r22349, %r22348; 2026-02-21T09:21:00.3787104Z cvt.rn.bf16x2.f32 %r6732, %r22351, %r22350; 2026-02-21T09:21:00.3787181Z cvt.rn.bf16x2.f32 %r6733, %r22353, %r22352; 2026-02-21T09:21:00.3787259Z cvt.rn.bf16x2.f32 %r6734, %r22355, %r22354; 2026-02-21T09:21:00.3787337Z cvt.rn.bf16x2.f32 %r6735, %r22357, %r22356; 2026-02-21T09:21:00.3787421Z cvt.rn.bf16x2.f32 %r6736, %r22359, %r22358; 2026-02-21T09:21:00.3787501Z cvt.rn.bf16x2.f32 %r6737, %r22361, %r22360; 2026-02-21T09:21:00.3787594Z cvt.rn.bf16x2.f32 %r6738, %r22363, %r22362; 2026-02-21T09:21:00.3787681Z cvt.rn.bf16x2.f32 %r6739, %r22365, %r22364; 
2026-02-21T09:21:00.3787758Z cvt.rn.bf16x2.f32 %r6740, %r22367, %r22366; 2026-02-21T09:21:00.3787836Z cvt.rn.bf16x2.f32 %r6741, %r22369, %r22368; 2026-02-21T09:21:00.3787919Z cvt.rn.bf16x2.f32 %r6742, %r22371, %r22370; 2026-02-21T09:21:00.3787996Z cvt.rn.bf16x2.f32 %r6743, %r22373, %r22372; 2026-02-21T09:21:00.3788074Z cvt.rn.bf16x2.f32 %r6744, %r22375, %r22374; 2026-02-21T09:21:00.3788151Z cvt.rn.bf16x2.f32 %r6745, %r22377, %r22376; 2026-02-21T09:21:00.3788238Z cvt.rn.bf16x2.f32 %r6746, %r22379, %r22378; 2026-02-21T09:21:00.3788402Z cvt.rn.bf16x2.f32 %r6747, %r22381, %r22380; 2026-02-21T09:21:00.3788484Z cvt.rn.bf16x2.f32 %r6748, %r22383, %r22382; 2026-02-21T09:21:00.3788574Z cvt.rn.bf16x2.f32 %r6749, %r22385, %r22384; 2026-02-21T09:21:00.3788655Z cvt.rn.bf16x2.f32 %r6750, %r22387, %r22386; 2026-02-21T09:21:00.3788736Z cvt.rn.bf16x2.f32 %r6751, %r22389, %r22388; 2026-02-21T09:21:00.3788821Z cvt.rn.bf16x2.f32 %r6752, %r22391, %r22390; 2026-02-21T09:21:00.3788899Z cvt.rn.bf16x2.f32 %r6753, %r22393, %r22392; 2026-02-21T09:21:00.3788977Z cvt.rn.bf16x2.f32 %r6754, %r22395, %r22394; 2026-02-21T09:21:00.3789057Z cvt.rn.bf16x2.f32 %r6755, %r22397, %r22396; 2026-02-21T09:21:00.3789142Z cvt.rn.bf16x2.f32 %r6756, %r22399, %r22398; 2026-02-21T09:21:00.3789217Z cvt.rn.bf16x2.f32 %r6757, %r22401, %r22400; 2026-02-21T09:21:00.3789294Z cvt.rn.bf16x2.f32 %r6758, %r22403, %r22402; 2026-02-21T09:21:00.3789380Z cvt.rn.bf16x2.f32 %r6759, %r22405, %r22404; 2026-02-21T09:21:00.3789456Z cvt.rn.bf16x2.f32 %r6760, %r22407, %r22406; 2026-02-21T09:21:00.3789613Z cvt.rn.bf16x2.f32 %r6761, %r22409, %r22408; 2026-02-21T09:21:00.3789699Z cvt.rn.bf16x2.f32 %r6762, %r22411, %r22410; 2026-02-21T09:21:00.3789776Z cvt.rn.bf16x2.f32 %r6763, %r22413, %r22412; 2026-02-21T09:21:00.3789856Z cvt.rn.bf16x2.f32 %r6764, %r22415, %r22414; 2026-02-21T09:21:00.3789997Z cvt.rn.bf16x2.f32 %r6765, %r22417, %r22416; 2026-02-21T09:21:00.3790082Z cvt.rn.bf16x2.f32 %r6766, %r22419, %r22418; 2026-02-21T09:21:00.3790160Z 
cvt.rn.bf16x2.f32 %r6767, %r22421, %r22420; 2026-02-21T09:21:00.3790235Z cvt.rn.bf16x2.f32 %r6768, %r22423, %r22422; 2026-02-21T09:21:00.3790320Z cvt.rn.bf16x2.f32 %r6769, %r22425, %r22424; 2026-02-21T09:21:00.3790398Z cvt.rn.bf16x2.f32 %r6770, %r22427, %r22426; 2026-02-21T09:21:00.3790475Z cvt.rn.bf16x2.f32 %r6771, %r22429, %r22428; 2026-02-21T09:21:00.3790638Z cvt.rn.bf16x2.f32 %r6772, %r22431, %r22430; 2026-02-21T09:21:00.3790719Z cvt.rn.bf16x2.f32 %r6773, %r22433, %r22432; 2026-02-21T09:21:00.3790797Z cvt.rn.bf16x2.f32 %r6774, %r22435, %r22434; 2026-02-21T09:21:00.3790876Z cvt.rn.bf16x2.f32 %r6775, %r22437, %r22436; 2026-02-21T09:21:00.3790960Z cvt.rn.bf16x2.f32 %r6776, %r22439, %r22438; 2026-02-21T09:21:00.3791039Z cvt.rn.bf16x2.f32 %r6777, %r22441, %r22440; 2026-02-21T09:21:00.3791117Z cvt.rn.bf16x2.f32 %r6778, %r22443, %r22442; 2026-02-21T09:21:00.3791199Z cvt.rn.bf16x2.f32 %r6779, %r22445, %r22444; 2026-02-21T09:21:00.3791275Z cvt.rn.bf16x2.f32 %r6780, %r22447, %r22446; 2026-02-21T09:21:00.3791352Z cvt.rn.bf16x2.f32 %r6781, %r22449, %r22448; 2026-02-21T09:21:00.3791476Z cvt.rn.bf16x2.f32 %r6782, %r22451, %r22450; 2026-02-21T09:21:00.3791563Z cvt.rn.bf16x2.f32 %r6783, %r22453, %r22452; 2026-02-21T09:21:00.3791640Z cvt.rn.bf16x2.f32 %r6784, %r22455, %r22454; 2026-02-21T09:21:00.3791717Z cvt.rn.bf16x2.f32 %r6785, %r22457, %r22456; 2026-02-21T09:21:00.3791799Z cvt.rn.bf16x2.f32 %r6786, %r22459, %r22458; 2026-02-21T09:21:00.3791887Z cvt.rn.bf16x2.f32 %r6787, %r22461, %r22460; 2026-02-21T09:21:00.3791968Z cvt.rn.bf16x2.f32 %r6788, %r22463, %r22462; 2026-02-21T09:21:00.3792051Z cvt.rn.bf16x2.f32 %r6789, %r22465, %r22464; 2026-02-21T09:21:00.3792127Z cvt.rn.bf16x2.f32 %r6790, %r22467, %r22466; 2026-02-21T09:21:00.3792203Z cvt.rn.bf16x2.f32 %r6791, %r22469, %r22468; 2026-02-21T09:21:00.3792281Z cvt.rn.bf16x2.f32 %r6792, %r22471, %r22470; 2026-02-21T09:21:00.3792366Z cvt.rn.bf16x2.f32 %r6793, %r22473, %r22472; 2026-02-21T09:21:00.3792445Z cvt.rn.bf16x2.f32 %r6794, 
%r22475, %r22474; 2026-02-21T09:21:00.3792522Z cvt.rn.bf16x2.f32 %r6795, %r22477, %r22476; 2026-02-21T09:21:00.3792614Z cvt.rn.bf16x2.f32 %r6796, %r22479, %r22478; 2026-02-21T09:21:00.3792690Z cvt.rn.bf16x2.f32 %r6797, %r22481, %r22480; 2026-02-21T09:21:00.3792767Z cvt.rn.bf16x2.f32 %r6798, %r22483, %r22482; 2026-02-21T09:21:00.3792857Z cvt.rn.bf16x2.f32 %r6799, %r22485, %r22484; 2026-02-21T09:21:00.3792933Z cvt.rn.bf16x2.f32 %r6800, %r22487, %r22486; 2026-02-21T09:21:00.3793008Z cvt.rn.bf16x2.f32 %r6801, %r22489, %r22488; 2026-02-21T09:21:00.3793085Z cvt.rn.bf16x2.f32 %r6802, %r22491, %r22490; 2026-02-21T09:21:00.3793169Z cvt.rn.bf16x2.f32 %r6803, %r22493, %r22492; 2026-02-21T09:21:00.3793247Z cvt.rn.bf16x2.f32 %r6804, %r22495, %r22494; 2026-02-21T09:21:00.3793327Z cvt.rn.bf16x2.f32 %r6805, %r22497, %r22496; 2026-02-21T09:21:00.3793414Z cvt.rn.bf16x2.f32 %r6806, %r22499, %r22498; 2026-02-21T09:21:00.3793491Z cvt.rn.bf16x2.f32 %r6807, %r22501, %r22500; 2026-02-21T09:21:00.3793569Z cvt.rn.bf16x2.f32 %r6808, %r22503, %r22502; 2026-02-21T09:21:00.3793655Z cvt.rn.bf16x2.f32 %r6809, %r22505, %r22504; 2026-02-21T09:21:00.3793734Z cvt.rn.bf16x2.f32 %r6810, %r22507, %r22506; 2026-02-21T09:21:00.3793810Z cvt.rn.bf16x2.f32 %r6811, %r22509, %r22508; 2026-02-21T09:21:00.3793890Z cvt.rn.bf16x2.f32 %r6812, %r22511, %r22510; 2026-02-21T09:21:00.3793973Z cvt.rn.bf16x2.f32 %r6813, %r22513, %r22512; 2026-02-21T09:21:00.3794053Z cvt.rn.bf16x2.f32 %r6814, %r22515, %r22514; 2026-02-21T09:21:00.3794130Z cvt.rn.bf16x2.f32 %r6815, %r22517, %r22516; 2026-02-21T09:21:00.3794421Z .loc 1 91 43 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:91:43 2026-02-21T09:21:00.3794490Z shl.b32 %r6816, %r6656, 13; 2026-02-21T09:21:00.3794554Z shl.b32 %r6817, %r6657, 13; 2026-02-21T09:21:00.3794692Z shl.b32 %r6818, %r6658, 13; 2026-02-21T09:21:00.3794754Z shl.b32 %r6819, %r6659, 13; 2026-02-21T09:21:00.3794815Z shl.b32 %r6820, %r6660, 13; 2026-02-21T09:21:00.3794877Z shl.b32 %r6821, %r6661, 13; 
2026-02-21T09:21:00.3794946Z shl.b32 %r6822, %r6662, 13; 2026-02-21T09:21:00.3795011Z shl.b32 %r6823, %r6663, 13; 2026-02-21T09:21:00.3795074Z shl.b32 %r6824, %r6664, 13; 2026-02-21T09:21:00.3795140Z shl.b32 %r6825, %r6665, 13; 2026-02-21T09:21:00.3795202Z shl.b32 %r6826, %r6666, 13; 2026-02-21T09:21:00.3795320Z shl.b32 %r6827, %r6667, 13; 2026-02-21T09:21:00.3795386Z shl.b32 %r6828, %r6668, 13; 2026-02-21T09:21:00.3795454Z shl.b32 %r6829, %r6669, 13; 2026-02-21T09:21:00.3795519Z shl.b32 %r6830, %r6670, 13; 2026-02-21T09:21:00.3795585Z shl.b32 %r6831, %r6671, 13; 2026-02-21T09:21:00.3795653Z shl.b32 %r6832, %r6672, 13; 2026-02-21T09:21:00.3795714Z shl.b32 %r6833, %r6673, 13; 2026-02-21T09:21:00.3795776Z shl.b32 %r6834, %r6674, 13; 2026-02-21T09:21:00.3795840Z shl.b32 %r6835, %r6675, 13; 2026-02-21T09:21:00.3795909Z shl.b32 %r6836, %r6676, 13; 2026-02-21T09:21:00.3795972Z shl.b32 %r6837, %r6677, 13; 2026-02-21T09:21:00.3796033Z shl.b32 %r6838, %r6678, 13; 2026-02-21T09:21:00.3796100Z shl.b32 %r6839, %r6679, 13; 2026-02-21T09:21:00.3796206Z shl.b32 %r6840, %r6680, 13; 2026-02-21T09:21:00.3796272Z shl.b32 %r6841, %r6681, 13; 2026-02-21T09:21:00.3796339Z shl.b32 %r6842, %r6682, 13; 2026-02-21T09:21:00.3796400Z shl.b32 %r6843, %r6683, 13; 2026-02-21T09:21:00.3796589Z shl.b32 %r6844, %r6684, 13; 2026-02-21T09:21:00.3796657Z shl.b32 %r6845, %r6685, 13; 2026-02-21T09:21:00.3796723Z shl.b32 %r6846, %r6686, 13; 2026-02-21T09:21:00.3796784Z shl.b32 %r6847, %r6687, 13; 2026-02-21T09:21:00.3797000Z .loc 1 91 50 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:91:50 2026-02-21T09:21:00.3797077Z add.s32 %r6848, %r6816, %r240; 2026-02-21T09:21:00.3797142Z add.s32 %r6849, %r6817, %r240; 2026-02-21T09:21:00.3797208Z add.s32 %r6850, %r6818, %r240; 2026-02-21T09:21:00.3797275Z add.s32 %r6851, %r6819, %r240; 2026-02-21T09:21:00.3797337Z add.s32 %r6852, %r6820, %r240; 2026-02-21T09:21:00.3797399Z add.s32 %r6853, %r6821, %r240; 2026-02-21T09:21:00.3797466Z add.s32 %r6854, 
%r6822, %r240; 2026-02-21T09:21:00.3797547Z add.s32 %r6855, %r6823, %r240; 2026-02-21T09:21:00.3797611Z add.s32 %r6856, %r6824, %r240; 2026-02-21T09:21:00.3797675Z add.s32 %r6857, %r6825, %r240; 2026-02-21T09:21:00.3797744Z add.s32 %r6858, %r6826, %r240; 2026-02-21T09:21:00.3797808Z add.s32 %r6859, %r6827, %r240; 2026-02-21T09:21:00.3797871Z add.s32 %r6860, %r6828, %r240; 2026-02-21T09:21:00.3797935Z add.s32 %r6861, %r6829, %r240; 2026-02-21T09:21:00.3798006Z add.s32 %r6862, %r6830, %r240; 2026-02-21T09:21:00.3798070Z add.s32 %r6863, %r6831, %r240; 2026-02-21T09:21:00.3798132Z add.s32 %r6864, %r6832, %r240; 2026-02-21T09:21:00.3798199Z add.s32 %r6865, %r6833, %r240; 2026-02-21T09:21:00.3798264Z add.s32 %r6866, %r6834, %r240; 2026-02-21T09:21:00.3798328Z add.s32 %r6867, %r6835, %r240; 2026-02-21T09:21:00.3798392Z add.s32 %r6868, %r6836, %r240; 2026-02-21T09:21:00.3798461Z add.s32 %r6869, %r6837, %r240; 2026-02-21T09:21:00.3798523Z add.s32 %r6870, %r6838, %r240; 2026-02-21T09:21:00.3798586Z add.s32 %r6871, %r6839, %r240; 2026-02-21T09:21:00.3798657Z add.s32 %r6872, %r6840, %r240; 2026-02-21T09:21:00.3798720Z add.s32 %r6873, %r6841, %r240; 2026-02-21T09:21:00.3798783Z add.s32 %r6874, %r6842, %r240; 2026-02-21T09:21:00.3798851Z add.s32 %r6875, %r6843, %r240; 2026-02-21T09:21:00.3798913Z add.s32 %r6876, %r6844, %r240; 2026-02-21T09:21:00.3798976Z add.s32 %r6877, %r6845, %r240; 2026-02-21T09:21:00.3799039Z add.s32 %r6878, %r6846, %r240; 2026-02-21T09:21:00.3799199Z add.s32 %r6879, %r6847, %r240; 2026-02-21T09:21:00.3799412Z .loc 1 91 22 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:91:22 2026-02-21T09:21:00.3799492Z mad.wide.s32 %rd149, %r6848, 2, %rd46; 2026-02-21T09:21:00.3799631Z mad.wide.s32 %rd150, %r6849, 2, %rd46; 2026-02-21T09:21:00.3799702Z mad.wide.s32 %rd151, %r6850, 2, %rd46; 2026-02-21T09:21:00.3799771Z mad.wide.s32 %rd152, %r6851, 2, %rd46; 2026-02-21T09:21:00.3799838Z mad.wide.s32 %rd153, %r6852, 2, %rd46; 2026-02-21T09:21:00.3799913Z 
mad.wide.s32 %rd154, %r6853, 2, %rd46; 2026-02-21T09:21:00.3799984Z mad.wide.s32 %rd155, %r6854, 2, %rd46; 2026-02-21T09:21:00.3800052Z mad.wide.s32 %rd156, %r6855, 2, %rd46; 2026-02-21T09:21:00.3800186Z mad.wide.s32 %rd157, %r6856, 2, %rd46; 2026-02-21T09:21:00.3800257Z mad.wide.s32 %rd158, %r6857, 2, %rd46; 2026-02-21T09:21:00.3800328Z mad.wide.s32 %rd159, %r6858, 2, %rd46; 2026-02-21T09:21:00.3800401Z mad.wide.s32 %rd160, %r6859, 2, %rd46; 2026-02-21T09:21:00.3800484Z mad.wide.s32 %rd161, %r6860, 2, %rd46; 2026-02-21T09:21:00.3800557Z mad.wide.s32 %rd162, %r6861, 2, %rd46; 2026-02-21T09:21:00.3800624Z mad.wide.s32 %rd163, %r6862, 2, %rd46; 2026-02-21T09:21:00.3800696Z mad.wide.s32 %rd164, %r6863, 2, %rd46; 2026-02-21T09:21:00.3800766Z mad.wide.s32 %rd165, %r6864, 2, %rd46; 2026-02-21T09:21:00.3800833Z mad.wide.s32 %rd166, %r6865, 2, %rd46; 2026-02-21T09:21:00.3800909Z mad.wide.s32 %rd167, %r6866, 2, %rd46; 2026-02-21T09:21:00.3800976Z mad.wide.s32 %rd168, %r6867, 2, %rd46; 2026-02-21T09:21:00.3801105Z mad.wide.s32 %rd169, %r6868, 2, %rd46; 2026-02-21T09:21:00.3801182Z mad.wide.s32 %rd170, %r6869, 2, %rd46; 2026-02-21T09:21:00.3801252Z mad.wide.s32 %rd171, %r6870, 2, %rd46; 2026-02-21T09:21:00.3801319Z mad.wide.s32 %rd172, %r6871, 2, %rd46; 2026-02-21T09:21:00.3801387Z mad.wide.s32 %rd173, %r6872, 2, %rd46; 2026-02-21T09:21:00.3801460Z mad.wide.s32 %rd174, %r6873, 2, %rd46; 2026-02-21T09:21:00.3801530Z mad.wide.s32 %rd175, %r6874, 2, %rd46; 2026-02-21T09:21:00.3801597Z mad.wide.s32 %rd176, %r6875, 2, %rd46; 2026-02-21T09:21:00.3801671Z mad.wide.s32 %rd177, %r6876, 2, %rd46; 2026-02-21T09:21:00.3801738Z mad.wide.s32 %rd178, %r6877, 2, %rd46; 2026-02-21T09:21:00.3801805Z mad.wide.s32 %rd179, %r6878, 2, %rd46; 2026-02-21T09:21:00.3801873Z mad.wide.s32 %rd180, %r6879, 2, %rd46; 2026-02-21T09:21:00.3802095Z .loc 1 91 81 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:91:81 2026-02-21T09:21:00.3802227Z st.shared.v4.b32 [%r117], {%r6688, %r6690, %r6692, 
%r6694}; 2026-02-21T09:21:00.3802340Z st.shared.v4.b32 [%r118], {%r6696, %r6698, %r6700, %r6702}; 2026-02-21T09:21:00.3802452Z st.shared.v4.b32 [%r119], {%r6704, %r6706, %r6708, %r6710}; 2026-02-21T09:21:00.3802559Z st.shared.v4.b32 [%r120], {%r6712, %r6714, %r6716, %r6718}; 2026-02-21T09:21:00.3802663Z st.shared.v4.b32 [%r121], {%r6720, %r6722, %r6724, %r6726}; 2026-02-21T09:21:00.3802774Z st.shared.v4.b32 [%r122], {%r6728, %r6730, %r6732, %r6734}; 2026-02-21T09:21:00.3802880Z st.shared.v4.b32 [%r123], {%r6736, %r6738, %r6740, %r6742}; 2026-02-21T09:21:00.3802984Z st.shared.v4.b32 [%r124], {%r6744, %r6746, %r6748, %r6750}; 2026-02-21T09:21:00.3803049Z bar.sync 0; 2026-02-21T09:21:00.3803115Z // begin inline asm 2026-02-21T09:21:00.3803310Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6425, %r6426, %r6427, %r6428}, [%r6269]; 2026-02-21T09:21:00.3803370Z // end inline asm 2026-02-21T09:21:00.3803437Z // begin inline asm 2026-02-21T09:21:00.3803624Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6433, %r6434, %r6435, %r6436}, [%r6274]; 2026-02-21T09:21:00.3803684Z // end inline asm 2026-02-21T09:21:00.3803754Z // begin inline asm 2026-02-21T09:21:00.3803935Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6441, %r6442, %r6443, %r6444}, [%r6279]; 2026-02-21T09:21:00.3803992Z // end inline asm 2026-02-21T09:21:00.3804062Z // begin inline asm 2026-02-21T09:21:00.3804244Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6449, %r6450, %r6451, %r6452}, [%r6284]; 2026-02-21T09:21:00.3804362Z // end inline asm 2026-02-21T09:21:00.3804435Z // begin inline asm 2026-02-21T09:21:00.3804619Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6457, %r6458, %r6459, %r6460}, [%r6289]; 2026-02-21T09:21:00.3804725Z // end inline asm 2026-02-21T09:21:00.3804794Z // begin inline asm 2026-02-21T09:21:00.3804973Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6465, %r6466, %r6467, %r6468}, [%r6294]; 2026-02-21T09:21:00.3805032Z // end inline asm 2026-02-21T09:21:00.3805098Z // begin inline asm 
2026-02-21T09:21:00.3805278Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6473, %r6474, %r6475, %r6476}, [%r6299]; 2026-02-21T09:21:00.3805338Z // end inline asm 2026-02-21T09:21:00.3805397Z // begin inline asm 2026-02-21T09:21:00.3805641Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6481, %r6482, %r6483, %r6484}, [%r6304]; 2026-02-21T09:21:00.3805703Z // end inline asm 2026-02-21T09:21:00.3805762Z bar.sync 0; 2026-02-21T09:21:00.3805882Z st.shared.v4.b32 [%r117], {%r6689, %r6691, %r6693, %r6695}; 2026-02-21T09:21:00.3805989Z st.shared.v4.b32 [%r118], {%r6697, %r6699, %r6701, %r6703}; 2026-02-21T09:21:00.3806094Z st.shared.v4.b32 [%r119], {%r6705, %r6707, %r6709, %r6711}; 2026-02-21T09:21:00.3806208Z st.shared.v4.b32 [%r120], {%r6713, %r6715, %r6717, %r6719}; 2026-02-21T09:21:00.3806315Z st.shared.v4.b32 [%r121], {%r6721, %r6723, %r6725, %r6727}; 2026-02-21T09:21:00.3806425Z st.shared.v4.b32 [%r122], {%r6729, %r6731, %r6733, %r6735}; 2026-02-21T09:21:00.3806760Z st.shared.v4.b32 [%r123], {%r6737, %r6739, %r6741, %r6743}; 2026-02-21T09:21:00.3806885Z st.shared.v4.b32 [%r124], {%r6745, %r6747, %r6749, %r6751}; 2026-02-21T09:21:00.3806946Z bar.sync 0; 2026-02-21T09:21:00.3807007Z // begin inline asm 2026-02-21T09:21:00.3807197Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6429, %r6430, %r6431, %r6432}, [%r6269]; 2026-02-21T09:21:00.3807255Z // end inline asm 2026-02-21T09:21:00.3807313Z // begin inline asm 2026-02-21T09:21:00.3807497Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6437, %r6438, %r6439, %r6440}, [%r6274]; 2026-02-21T09:21:00.3807563Z // end inline asm 2026-02-21T09:21:00.3807622Z // begin inline asm 2026-02-21T09:21:00.3807802Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6445, %r6446, %r6447, %r6448}, [%r6279]; 2026-02-21T09:21:00.3807873Z // end inline asm 2026-02-21T09:21:00.3807937Z // begin inline asm 2026-02-21T09:21:00.3808116Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6453, %r6454, %r6455, %r6456}, [%r6284]; 2026-02-21T09:21:00.3808179Z // end 
inline asm 2026-02-21T09:21:00.3808239Z // begin inline asm 2026-02-21T09:21:00.3808417Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6461, %r6462, %r6463, %r6464}, [%r6289]; 2026-02-21T09:21:00.3808476Z // end inline asm 2026-02-21T09:21:00.3808544Z // begin inline asm 2026-02-21T09:21:00.3808722Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6469, %r6470, %r6471, %r6472}, [%r6294]; 2026-02-21T09:21:00.3808780Z // end inline asm 2026-02-21T09:21:00.3808845Z // begin inline asm 2026-02-21T09:21:00.3809023Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6477, %r6478, %r6479, %r6480}, [%r6299]; 2026-02-21T09:21:00.3809080Z // end inline asm 2026-02-21T09:21:00.3809145Z // begin inline asm 2026-02-21T09:21:00.3809325Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6485, %r6486, %r6487, %r6488}, [%r6304]; 2026-02-21T09:21:00.3809383Z // end inline asm 2026-02-21T09:21:00.3809439Z bar.sync 0; 2026-02-21T09:21:00.3809551Z st.shared.v4.b32 [%r117], {%r6752, %r6754, %r6756, %r6758}; 2026-02-21T09:21:00.3809657Z st.shared.v4.b32 [%r118], {%r6760, %r6762, %r6764, %r6766}; 2026-02-21T09:21:00.3809760Z st.shared.v4.b32 [%r119], {%r6768, %r6770, %r6772, %r6774}; 2026-02-21T09:21:00.3809869Z st.shared.v4.b32 [%r120], {%r6776, %r6778, %r6780, %r6782}; 2026-02-21T09:21:00.3809972Z st.shared.v4.b32 [%r121], {%r6784, %r6786, %r6788, %r6790}; 2026-02-21T09:21:00.3810075Z st.shared.v4.b32 [%r122], {%r6792, %r6794, %r6796, %r6798}; 2026-02-21T09:21:00.3810185Z st.shared.v4.b32 [%r123], {%r6800, %r6802, %r6804, %r6806}; 2026-02-21T09:21:00.3810380Z st.shared.v4.b32 [%r124], {%r6808, %r6810, %r6812, %r6814}; 2026-02-21T09:21:00.3810439Z bar.sync 0; 2026-02-21T09:21:00.3810500Z // begin inline asm 2026-02-21T09:21:00.3810689Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6489, %r6490, %r6491, %r6492}, [%r6269]; 2026-02-21T09:21:00.3810808Z // end inline asm 2026-02-21T09:21:00.3810869Z // begin inline asm 2026-02-21T09:21:00.3811055Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6497, %r6498, %r6499, 
%r6500}, [%r6274]; 2026-02-21T09:21:00.3811118Z // end inline asm 2026-02-21T09:21:00.3811179Z // begin inline asm 2026-02-21T09:21:00.3811363Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6505, %r6506, %r6507, %r6508}, [%r6279]; 2026-02-21T09:21:00.3811482Z // end inline asm 2026-02-21T09:21:00.3811545Z // begin inline asm 2026-02-21T09:21:00.3811726Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6513, %r6514, %r6515, %r6516}, [%r6284]; 2026-02-21T09:21:00.3811792Z // end inline asm 2026-02-21T09:21:00.3811851Z // begin inline asm 2026-02-21T09:21:00.3812030Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6521, %r6522, %r6523, %r6524}, [%r6289]; 2026-02-21T09:21:00.3812093Z // end inline asm 2026-02-21T09:21:00.3812154Z // begin inline asm 2026-02-21T09:21:00.3812335Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6529, %r6530, %r6531, %r6532}, [%r6294]; 2026-02-21T09:21:00.3812393Z // end inline asm 2026-02-21T09:21:00.3812460Z // begin inline asm 2026-02-21T09:21:00.3812648Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6537, %r6538, %r6539, %r6540}, [%r6299]; 2026-02-21T09:21:00.3812759Z // end inline asm 2026-02-21T09:21:00.3812835Z // begin inline asm 2026-02-21T09:21:00.3813022Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6545, %r6546, %r6547, %r6548}, [%r6304]; 2026-02-21T09:21:00.3813082Z // end inline asm 2026-02-21T09:21:00.3813145Z bar.sync 0; 2026-02-21T09:21:00.3813252Z st.shared.v4.b32 [%r117], {%r6753, %r6755, %r6757, %r6759}; 2026-02-21T09:21:00.3813361Z st.shared.v4.b32 [%r118], {%r6761, %r6763, %r6765, %r6767}; 2026-02-21T09:21:00.3813464Z st.shared.v4.b32 [%r119], {%r6769, %r6771, %r6773, %r6775}; 2026-02-21T09:21:00.3813575Z st.shared.v4.b32 [%r120], {%r6777, %r6779, %r6781, %r6783}; 2026-02-21T09:21:00.3813678Z st.shared.v4.b32 [%r121], {%r6785, %r6787, %r6789, %r6791}; 2026-02-21T09:21:00.3813781Z st.shared.v4.b32 [%r122], {%r6793, %r6795, %r6797, %r6799}; 2026-02-21T09:21:00.3813891Z st.shared.v4.b32 [%r123], {%r6801, %r6803, %r6805, %r6807}; 
2026-02-21T09:21:00.3813994Z st.shared.v4.b32 [%r124], {%r6809, %r6811, %r6813, %r6815}; 2026-02-21T09:21:00.3814052Z bar.sync 0; 2026-02-21T09:21:00.3814116Z // begin inline asm 2026-02-21T09:21:00.3814297Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6493, %r6494, %r6495, %r6496}, [%r6269]; 2026-02-21T09:21:00.3814356Z // end inline asm 2026-02-21T09:21:00.3814416Z // begin inline asm 2026-02-21T09:21:00.3814602Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6501, %r6502, %r6503, %r6504}, [%r6274]; 2026-02-21T09:21:00.3814662Z // end inline asm 2026-02-21T09:21:00.3814722Z // begin inline asm 2026-02-21T09:21:00.3814909Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6509, %r6510, %r6511, %r6512}, [%r6279]; 2026-02-21T09:21:00.3814967Z // end inline asm 2026-02-21T09:21:00.3815029Z // begin inline asm 2026-02-21T09:21:00.3815217Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6517, %r6518, %r6519, %r6520}, [%r6284]; 2026-02-21T09:21:00.3815276Z // end inline asm 2026-02-21T09:21:00.3815336Z // begin inline asm 2026-02-21T09:21:00.3815516Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6525, %r6526, %r6527, %r6528}, [%r6289]; 2026-02-21T09:21:00.3815581Z // end inline asm 2026-02-21T09:21:00.3815641Z // begin inline asm 2026-02-21T09:21:00.3815819Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6533, %r6534, %r6535, %r6536}, [%r6294]; 2026-02-21T09:21:00.3815881Z // end inline asm 2026-02-21T09:21:00.3815945Z // begin inline asm 2026-02-21T09:21:00.3816123Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6541, %r6542, %r6543, %r6544}, [%r6299]; 2026-02-21T09:21:00.3816263Z // end inline asm 2026-02-21T09:21:00.3816332Z // begin inline asm 2026-02-21T09:21:00.3816634Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6549, %r6550, %r6551, %r6552}, [%r6304]; 2026-02-21T09:21:00.3816775Z // end inline asm 2026-02-21T09:21:00.3816846Z // begin inline asm 2026-02-21T09:21:00.3816977Z st.global.v4.b32 [ %rd149 + 0 ], { %r6425, %r6426, %r6427, %r6428 }; 2026-02-21T09:21:00.3817036Z // end inline 
asm 2026-02-21T09:21:00.3817106Z // begin inline asm 2026-02-21T09:21:00.3817229Z st.global.v4.b32 [ %rd150 + 0 ], { %r6429, %r6430, %r6431, %r6432 }; 2026-02-21T09:21:00.3817289Z // end inline asm 2026-02-21T09:21:00.3817349Z // begin inline asm 2026-02-21T09:21:00.3817537Z st.global.v4.b32 [ %rd151 + 0 ], { %r6433, %r6434, %r6435, %r6436 }; 2026-02-21T09:21:00.3817597Z // end inline asm 2026-02-21T09:21:00.3817657Z // begin inline asm 2026-02-21T09:21:00.3817777Z st.global.v4.b32 [ %rd152 + 0 ], { %r6437, %r6438, %r6439, %r6440 }; 2026-02-21T09:21:00.3817837Z // end inline asm 2026-02-21T09:21:00.3817896Z // begin inline asm 2026-02-21T09:21:00.3818018Z st.global.v4.b32 [ %rd153 + 0 ], { %r6441, %r6442, %r6443, %r6444 }; 2026-02-21T09:21:00.3818082Z // end inline asm 2026-02-21T09:21:00.3818144Z // begin inline asm 2026-02-21T09:21:00.3818269Z st.global.v4.b32 [ %rd154 + 0 ], { %r6445, %r6446, %r6447, %r6448 }; 2026-02-21T09:21:00.3818336Z // end inline asm 2026-02-21T09:21:00.3818396Z // begin inline asm 2026-02-21T09:21:00.3818577Z st.global.v4.b32 [ %rd155 + 0 ], { %r6449, %r6450, %r6451, %r6452 }; 2026-02-21T09:21:00.3818648Z // end inline asm 2026-02-21T09:21:00.3818710Z // begin inline asm 2026-02-21T09:21:00.3818824Z st.global.v4.b32 [ %rd156 + 0 ], { %r6453, %r6454, %r6455, %r6456 }; 2026-02-21T09:21:00.3818882Z // end inline asm 2026-02-21T09:21:00.3818950Z // begin inline asm 2026-02-21T09:21:00.3819064Z st.global.v4.b32 [ %rd157 + 0 ], { %r6457, %r6458, %r6459, %r6460 }; 2026-02-21T09:21:00.3819124Z // end inline asm 2026-02-21T09:21:00.3819187Z // begin inline asm 2026-02-21T09:21:00.3819302Z st.global.v4.b32 [ %rd158 + 0 ], { %r6461, %r6462, %r6463, %r6464 }; 2026-02-21T09:21:00.3819358Z // end inline asm 2026-02-21T09:21:00.3819418Z // begin inline asm 2026-02-21T09:21:00.3819539Z st.global.v4.b32 [ %rd159 + 0 ], { %r6465, %r6466, %r6467, %r6468 }; 2026-02-21T09:21:00.3819597Z // end inline asm 2026-02-21T09:21:00.3819655Z // begin inline asm 
2026-02-21T09:21:00.3819774Z st.global.v4.b32 [ %rd160 + 0 ], { %r6469, %r6470, %r6471, %r6472 }; 2026-02-21T09:21:00.3819835Z // end inline asm 2026-02-21T09:21:00.3819895Z // begin inline asm 2026-02-21T09:21:00.3820015Z st.global.v4.b32 [ %rd161 + 0 ], { %r6473, %r6474, %r6475, %r6476 }; 2026-02-21T09:21:00.3820075Z // end inline asm 2026-02-21T09:21:00.3820133Z // begin inline asm 2026-02-21T09:21:00.3820249Z st.global.v4.b32 [ %rd162 + 0 ], { %r6477, %r6478, %r6479, %r6480 }; 2026-02-21T09:21:00.3820311Z // end inline asm 2026-02-21T09:21:00.3820376Z // begin inline asm 2026-02-21T09:21:00.3820491Z st.global.v4.b32 [ %rd163 + 0 ], { %r6481, %r6482, %r6483, %r6484 }; 2026-02-21T09:21:00.3820558Z // end inline asm 2026-02-21T09:21:00.3820620Z // begin inline asm 2026-02-21T09:21:00.3820738Z st.global.v4.b32 [ %rd164 + 0 ], { %r6485, %r6486, %r6487, %r6488 }; 2026-02-21T09:21:00.3820796Z // end inline asm 2026-02-21T09:21:00.3820862Z // begin inline asm 2026-02-21T09:21:00.3820975Z st.global.v4.b32 [ %rd165 + 0 ], { %r6489, %r6490, %r6491, %r6492 }; 2026-02-21T09:21:00.3821034Z // end inline asm 2026-02-21T09:21:00.3821100Z // begin inline asm 2026-02-21T09:21:00.3821215Z st.global.v4.b32 [ %rd166 + 0 ], { %r6493, %r6494, %r6495, %r6496 }; 2026-02-21T09:21:00.3821273Z // end inline asm 2026-02-21T09:21:00.3821349Z // begin inline asm 2026-02-21T09:21:00.3821474Z st.global.v4.b32 [ %rd167 + 0 ], { %r6497, %r6498, %r6499, %r6500 }; 2026-02-21T09:21:00.3821533Z // end inline asm 2026-02-21T09:21:00.3821593Z // begin inline asm 2026-02-21T09:21:00.3821793Z st.global.v4.b32 [ %rd168 + 0 ], { %r6501, %r6502, %r6503, %r6504 }; 2026-02-21T09:21:00.3821850Z // end inline asm 2026-02-21T09:21:00.3821910Z // begin inline asm 2026-02-21T09:21:00.3822030Z st.global.v4.b32 [ %rd169 + 0 ], { %r6505, %r6506, %r6507, %r6508 }; 2026-02-21T09:21:00.3822138Z // end inline asm 2026-02-21T09:21:00.3822199Z // begin inline asm 2026-02-21T09:21:00.3822312Z st.global.v4.b32 [ %rd170 + 0 
], { %r6509, %r6510, %r6511, %r6512 }; 2026-02-21T09:21:00.3822378Z // end inline asm 2026-02-21T09:21:00.3822445Z // begin inline asm 2026-02-21T09:21:00.3822560Z st.global.v4.b32 [ %rd171 + 0 ], { %r6513, %r6514, %r6515, %r6516 }; 2026-02-21T09:21:00.3822628Z // end inline asm 2026-02-21T09:21:00.3822688Z // begin inline asm 2026-02-21T09:21:00.3822850Z st.global.v4.b32 [ %rd172 + 0 ], { %r6517, %r6518, %r6519, %r6520 }; 2026-02-21T09:21:00.3822911Z // end inline asm 2026-02-21T09:21:00.3822979Z // begin inline asm 2026-02-21T09:21:00.3823100Z st.global.v4.b32 [ %rd173 + 0 ], { %r6521, %r6522, %r6523, %r6524 }; 2026-02-21T09:21:00.3823158Z // end inline asm 2026-02-21T09:21:00.3823226Z // begin inline asm 2026-02-21T09:21:00.3823343Z st.global.v4.b32 [ %rd174 + 0 ], { %r6525, %r6526, %r6527, %r6528 }; 2026-02-21T09:21:00.3823403Z // end inline asm 2026-02-21T09:21:00.3823469Z // begin inline asm 2026-02-21T09:21:00.3823584Z st.global.v4.b32 [ %rd175 + 0 ], { %r6529, %r6530, %r6531, %r6532 }; 2026-02-21T09:21:00.3823644Z // end inline asm 2026-02-21T09:21:00.3823703Z // begin inline asm 2026-02-21T09:21:00.3823873Z st.global.v4.b32 [ %rd176 + 0 ], { %r6533, %r6534, %r6535, %r6536 }; 2026-02-21T09:21:00.3823933Z // end inline asm 2026-02-21T09:21:00.3823995Z // begin inline asm 2026-02-21T09:21:00.3824118Z st.global.v4.b32 [ %rd177 + 0 ], { %r6537, %r6538, %r6539, %r6540 }; 2026-02-21T09:21:00.3824176Z // end inline asm 2026-02-21T09:21:00.3824238Z // begin inline asm 2026-02-21T09:21:00.3824352Z st.global.v4.b32 [ %rd178 + 0 ], { %r6541, %r6542, %r6543, %r6544 }; 2026-02-21T09:21:00.3824434Z // end inline asm 2026-02-21T09:21:00.3824495Z // begin inline asm 2026-02-21T09:21:00.3824611Z st.global.v4.b32 [ %rd179 + 0 ], { %r6545, %r6546, %r6547, %r6548 }; 2026-02-21T09:21:00.3824677Z // end inline asm 2026-02-21T09:21:00.3824738Z // begin inline asm 2026-02-21T09:21:00.3824855Z st.global.v4.b32 [ %rd180 + 0 ], { %r6549, %r6550, %r6551, %r6552 }; 
2026-02-21T09:21:00.3824924Z // end inline asm 2026-02-21T09:21:00.3825148Z .loc 1 22 121 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:22:121 2026-02-21T09:21:00.3825218Z add.s32 %r6880, %r22257, 1; 2026-02-21T09:21:00.3825424Z .loc 1 29 33 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:29:33 2026-02-21T09:21:00.3825499Z shr.u32 %r6881, %r6880, 6; 2026-02-21T09:21:00.3825569Z and.b32 %r6882, %r6881, 33554424; 2026-02-21T09:21:00.3825767Z .loc 1 30 39 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:30:39 2026-02-21T09:21:00.3825841Z sub.s32 %r6883, 32, %r6882; 2026-02-21T09:21:00.3826039Z .loc 1 30 52 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:30:52 2026-02-21T09:21:00.3826103Z min.s32 %r6884, %r6883, 8; 2026-02-21T09:21:00.3826305Z .loc 1 31 45 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:31:45 2026-02-21T09:21:00.3826369Z and.b32 %r6885, %r6880, 511; 2026-02-21T09:21:00.3826695Z .loc 1 32 51 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:32:51 2026-02-21T09:21:00.3826775Z div.s32 %r6886, %r6885, %r6884; 2026-02-21T09:21:00.3826975Z .loc 1 31 64 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:31:64 2026-02-21T09:21:00.3827046Z mul.lo.s32 %r6887, %r6886, %r6884; 2026-02-21T09:21:00.3827113Z sub.s32 %r6888, %r6885, %r6887; 2026-02-21T09:21:00.3827316Z .loc 1 31 30 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:31:30 2026-02-21T09:21:00.3827459Z add.s32 %r6889, %r6888, %r6882; 2026-02-21T09:21:00.3827656Z .loc 1 33 27 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:33:27 2026-02-21T09:21:00.3827726Z shl.b32 %r6890, %r6889, 8; 2026-02-21T09:21:00.3827984Z .loc 1 34 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:34:32 2026-02-21T09:21:00.3828048Z or.b32 %r764, %r6890, %r44; 2026-02-21T09:21:00.3828250Z .loc 1 35 27 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:35:27 2026-02-21T09:21:00.3828380Z shl.b32 %r765, %r6886, 8; 
2026-02-21T09:21:00.3828603Z .loc 1 36 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:36:32 2026-02-21T09:21:00.3828745Z or.b32 %r6891, %r765, %r6; 2026-02-21T09:21:00.3828812Z or.b32 %r6892, %r765, %r7; 2026-02-21T09:21:00.3828874Z or.b32 %r6893, %r765, %r8; 2026-02-21T09:21:00.3828933Z or.b32 %r6894, %r765, %r9; 2026-02-21T09:21:00.3829142Z .loc 1 51 53 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:53 2026-02-21T09:21:00.3829206Z shl.b32 %r6895, %r6891, 10; 2026-02-21T09:21:00.3829266Z shl.b32 %r6896, %r6892, 10; 2026-02-21T09:21:00.3829334Z shl.b32 %r6897, %r6893, 10; 2026-02-21T09:21:00.3829396Z shl.b32 %r6898, %r6894, 10; 2026-02-21T09:21:00.3829593Z .loc 1 51 60 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:60 2026-02-21T09:21:00.3829661Z or.b32 %r6899, %r6895, %r46; 2026-02-21T09:21:00.3829793Z or.b32 %r6900, %r6896, %r46; 2026-02-21T09:21:00.3829858Z or.b32 %r6901, %r6897, %r46; 2026-02-21T09:21:00.3829921Z or.b32 %r6902, %r6898, %r46; 2026-02-21T09:21:00.3830123Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.3830203Z mad.wide.s32 %rd181, %r6899, 2, %rd44; 2026-02-21T09:21:00.3830276Z mad.wide.s32 %rd182, %r6900, 2, %rd44; 2026-02-21T09:21:00.3830352Z mad.wide.s32 %rd183, %r6901, 2, %rd44; 2026-02-21T09:21:00.3830421Z mad.wide.s32 %rd184, %r6902, 2, %rd44; 2026-02-21T09:21:00.3830618Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3830678Z bar.sync 0; 2026-02-21T09:21:00.3830755Z mov.b32 %r6554, 8; 2026-02-21T09:21:00.3830822Z // begin inline asm 2026-02-21T09:21:00.3830964Z cp.async.ca.shared.global [ %r48 + 0 ], [ %rd181 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3831030Z // end inline asm 2026-02-21T09:21:00.3831091Z // begin inline asm 2026-02-21T09:21:00.3831230Z cp.async.ca.shared.global [ %r49 + 0 ], [ %rd182 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3831294Z // end inline asm 2026-02-21T09:21:00.3831356Z // 
begin inline asm 2026-02-21T09:21:00.3831487Z cp.async.ca.shared.global [ %r50 + 0 ], [ %rd183 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3831547Z // end inline asm 2026-02-21T09:21:00.3831617Z // begin inline asm 2026-02-21T09:21:00.3831749Z cp.async.ca.shared.global [ %r51 + 0 ], [ %rd184 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3831807Z // end inline asm 2026-02-21T09:21:00.3831884Z cp.async.commit_group; 2026-02-21T09:21:00.3832086Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.3832155Z add.s32 %r6903, %r764, %r22238; 2026-02-21T09:21:00.3832353Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.3832427Z cvt.s64.s32 %rd232, %r6903; 2026-02-21T09:21:00.3832498Z add.s64 %rd185, %rd45, %rd232; 2026-02-21T09:21:00.3832698Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3832769Z // begin inline asm 2026-02-21T09:21:00.3832903Z cp.async.ca.shared.global [ %r54 + 0 ], [ %rd185 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3832963Z // end inline asm 2026-02-21T09:21:00.3833038Z cp.async.commit_group; 2026-02-21T09:21:00.3833294Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.3833359Z cvt.s64.s32 %rd233, %r6895; 2026-02-21T09:21:00.3833427Z or.b64 %rd234, %rd233, %rd713; 2026-02-21T09:21:00.3833501Z shl.b64 %rd235, %rd234, 1; 2026-02-21T09:21:00.3833613Z add.s64 %rd236, %rd44, %rd235; 2026-02-21T09:21:00.3833679Z add.s64 %rd186, %rd236, 32; 2026-02-21T09:21:00.3833761Z cvt.s64.s32 %rd237, %r6896; 2026-02-21T09:21:00.3833827Z or.b64 %rd238, %rd237, %rd713; 2026-02-21T09:21:00.3833892Z shl.b64 %rd239, %rd238, 1; 2026-02-21T09:21:00.3833964Z add.s64 %rd240, %rd44, %rd239; 2026-02-21T09:21:00.3834028Z add.s64 %rd187, %rd240, 32; 2026-02-21T09:21:00.3834090Z cvt.s64.s32 %rd241, %r6897; 2026-02-21T09:21:00.3834198Z or.b64 %rd242, %rd241, %rd713; 2026-02-21T09:21:00.3834274Z shl.b64 
%rd243, %rd242, 1; 2026-02-21T09:21:00.3834339Z add.s64 %rd244, %rd44, %rd243; 2026-02-21T09:21:00.3834407Z add.s64 %rd188, %rd244, 32; 2026-02-21T09:21:00.3834479Z cvt.s64.s32 %rd245, %r6898; 2026-02-21T09:21:00.3834545Z or.b64 %rd246, %rd245, %rd713; 2026-02-21T09:21:00.3834610Z shl.b64 %rd247, %rd246, 1; 2026-02-21T09:21:00.3834677Z add.s64 %rd248, %rd44, %rd247; 2026-02-21T09:21:00.3834748Z add.s64 %rd189, %rd248, 32; 2026-02-21T09:21:00.3834968Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3835032Z // begin inline asm 2026-02-21T09:21:00.3835180Z cp.async.ca.shared.global [ %r55 + 0 ], [ %rd186 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3835307Z // end inline asm 2026-02-21T09:21:00.3835368Z // begin inline asm 2026-02-21T09:21:00.3835501Z cp.async.ca.shared.global [ %r56 + 0 ], [ %rd187 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3835570Z // end inline asm 2026-02-21T09:21:00.3835633Z // begin inline asm 2026-02-21T09:21:00.3835765Z cp.async.ca.shared.global [ %r57 + 0 ], [ %rd188 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3835832Z // end inline asm 2026-02-21T09:21:00.3835895Z // begin inline asm 2026-02-21T09:21:00.3836027Z cp.async.ca.shared.global [ %r58 + 0 ], [ %rd189 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3836092Z // end inline asm 2026-02-21T09:21:00.3836162Z cp.async.commit_group; 2026-02-21T09:21:00.3836363Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.3836434Z add.s32 %r6904, %r764, %r59; 2026-02-21T09:21:00.3836773Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.3836852Z cvt.s64.s32 %rd249, %r6904; 2026-02-21T09:21:00.3836920Z add.s64 %rd190, %rd45, %rd249; 2026-02-21T09:21:00.3837125Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3837188Z // begin inline asm 2026-02-21T09:21:00.3837322Z cp.async.ca.shared.global [ %r60 + 0 ], [ %rd190 + 0 ], 0x8, 
%r6554; 2026-02-21T09:21:00.3837384Z // end inline asm 2026-02-21T09:21:00.3837453Z cp.async.commit_group; 2026-02-21T09:21:00.3837649Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.3837713Z add.s64 %rd191, %rd236, 64; 2026-02-21T09:21:00.3837782Z add.s64 %rd192, %rd240, 64; 2026-02-21T09:21:00.3837846Z add.s64 %rd193, %rd244, 64; 2026-02-21T09:21:00.3837908Z add.s64 %rd194, %rd248, 64; 2026-02-21T09:21:00.3838110Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3838168Z bar.sync 0; 2026-02-21T09:21:00.3838230Z // begin inline asm 2026-02-21T09:21:00.3838368Z cp.async.ca.shared.global [ %r61 + 0 ], [ %rd191 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3838425Z // end inline asm 2026-02-21T09:21:00.3838486Z // begin inline asm 2026-02-21T09:21:00.3838617Z cp.async.ca.shared.global [ %r62 + 0 ], [ %rd192 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3838682Z // end inline asm 2026-02-21T09:21:00.3838754Z // begin inline asm 2026-02-21T09:21:00.3838964Z cp.async.ca.shared.global [ %r63 + 0 ], [ %rd193 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3839029Z // end inline asm 2026-02-21T09:21:00.3839088Z // begin inline asm 2026-02-21T09:21:00.3839218Z cp.async.ca.shared.global [ %r64 + 0 ], [ %rd194 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3839341Z // end inline asm 2026-02-21T09:21:00.3839414Z cp.async.commit_group; 2026-02-21T09:21:00.3839617Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.3839683Z add.s32 %r6905, %r764, %r65; 2026-02-21T09:21:00.3839893Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.3839958Z cvt.s64.s32 %rd250, %r6905; 2026-02-21T09:21:00.3840082Z add.s64 %rd195, %rd45, %rd250; 2026-02-21T09:21:00.3840286Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3840360Z // begin inline asm 2026-02-21T09:21:00.3840499Z 
cp.async.ca.shared.global [ %r66 + 0 ], [ %rd195 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3840558Z // end inline asm 2026-02-21T09:21:00.3840630Z cp.async.commit_group; 2026-02-21T09:21:00.3840829Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.3840895Z add.s64 %rd196, %rd236, 96; 2026-02-21T09:21:00.3840963Z add.s64 %rd197, %rd240, 96; 2026-02-21T09:21:00.3841025Z add.s64 %rd198, %rd244, 96; 2026-02-21T09:21:00.3841087Z add.s64 %rd199, %rd248, 96; 2026-02-21T09:21:00.3841349Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3841414Z // begin inline asm 2026-02-21T09:21:00.3841548Z cp.async.ca.shared.global [ %r67 + 0 ], [ %rd196 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3841606Z // end inline asm 2026-02-21T09:21:00.3841674Z // begin inline asm 2026-02-21T09:21:00.3841816Z cp.async.ca.shared.global [ %r68 + 0 ], [ %rd197 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3841879Z // end inline asm 2026-02-21T09:21:00.3841950Z // begin inline asm 2026-02-21T09:21:00.3842081Z cp.async.ca.shared.global [ %r69 + 0 ], [ %rd198 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3842139Z // end inline asm 2026-02-21T09:21:00.3842215Z // begin inline asm 2026-02-21T09:21:00.3842348Z cp.async.ca.shared.global [ %r70 + 0 ], [ %rd199 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3842406Z // end inline asm 2026-02-21T09:21:00.3842474Z cp.async.commit_group; 2026-02-21T09:21:00.3842682Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.3842747Z add.s32 %r6906, %r764, %r71; 2026-02-21T09:21:00.3842950Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.3843030Z cvt.s64.s32 %rd251, %r6906; 2026-02-21T09:21:00.3843095Z add.s64 %rd200, %rd45, %rd251; 2026-02-21T09:21:00.3843298Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3843368Z // begin inline asm 
2026-02-21T09:21:00.3843499Z cp.async.ca.shared.global [ %r72 + 0 ], [ %rd200 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3843559Z // end inline asm 2026-02-21T09:21:00.3843629Z cp.async.commit_group; 2026-02-21T09:21:00.3843835Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.3843899Z add.s64 %rd201, %rd236, 128; 2026-02-21T09:21:00.3843962Z add.s64 %rd202, %rd240, 128; 2026-02-21T09:21:00.3844032Z add.s64 %rd203, %rd244, 128; 2026-02-21T09:21:00.3844095Z add.s64 %rd204, %rd248, 128; 2026-02-21T09:21:00.3844294Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3844362Z bar.sync 0; 2026-02-21T09:21:00.3844425Z // begin inline asm 2026-02-21T09:21:00.3844558Z cp.async.ca.shared.global [ %r73 + 0 ], [ %rd201 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3844686Z // end inline asm 2026-02-21T09:21:00.3844761Z // begin inline asm 2026-02-21T09:21:00.3844895Z cp.async.ca.shared.global [ %r74 + 0 ], [ %rd202 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3844955Z // end inline asm 2026-02-21T09:21:00.3845024Z // begin inline asm 2026-02-21T09:21:00.3845203Z cp.async.ca.shared.global [ %r75 + 0 ], [ %rd203 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3845262Z // end inline asm 2026-02-21T09:21:00.3845325Z // begin inline asm 2026-02-21T09:21:00.3845464Z cp.async.ca.shared.global [ %r76 + 0 ], [ %rd204 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3845525Z // end inline asm 2026-02-21T09:21:00.3845593Z cp.async.commit_group; 2026-02-21T09:21:00.3845803Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.3845916Z add.s32 %r6907, %r764, %r77; 2026-02-21T09:21:00.3846118Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.3846193Z cvt.s64.s32 %rd252, %r6907; 2026-02-21T09:21:00.3846260Z add.s64 %rd205, %rd45, %rd252; 2026-02-21T09:21:00.3846574Z .loc 1 57 87 // 
cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3846642Z // begin inline asm 2026-02-21T09:21:00.3846785Z cp.async.ca.shared.global [ %r78 + 0 ], [ %rd205 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3846845Z // end inline asm 2026-02-21T09:21:00.3846913Z cp.async.commit_group; 2026-02-21T09:21:00.3847191Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.3847259Z add.s64 %rd206, %rd236, 160; 2026-02-21T09:21:00.3847322Z add.s64 %rd207, %rd240, 160; 2026-02-21T09:21:00.3847406Z add.s64 %rd208, %rd244, 160; 2026-02-21T09:21:00.3847473Z add.s64 %rd209, %rd248, 160; 2026-02-21T09:21:00.3847676Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3847752Z // begin inline asm 2026-02-21T09:21:00.3847892Z cp.async.ca.shared.global [ %r79 + 0 ], [ %rd206 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3847950Z // end inline asm 2026-02-21T09:21:00.3848009Z // begin inline asm 2026-02-21T09:21:00.3848149Z cp.async.ca.shared.global [ %r80 + 0 ], [ %rd207 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3848209Z // end inline asm 2026-02-21T09:21:00.3848268Z // begin inline asm 2026-02-21T09:21:00.3848399Z cp.async.ca.shared.global [ %r81 + 0 ], [ %rd208 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3848461Z // end inline asm 2026-02-21T09:21:00.3848524Z // begin inline asm 2026-02-21T09:21:00.3848653Z cp.async.ca.shared.global [ %r82 + 0 ], [ %rd209 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3848717Z // end inline asm 2026-02-21T09:21:00.3848783Z cp.async.commit_group; 2026-02-21T09:21:00.3848982Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.3849051Z add.s32 %r6908, %r764, %r83; 2026-02-21T09:21:00.3849251Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.3849318Z cvt.s64.s32 %rd253, %r6908; 2026-02-21T09:21:00.3849385Z add.s64 %rd210, %rd45, %rd253; 2026-02-21T09:21:00.3849589Z 
.loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3849653Z // begin inline asm 2026-02-21T09:21:00.3849783Z cp.async.ca.shared.global [ %r84 + 0 ], [ %rd210 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3849847Z // end inline asm 2026-02-21T09:21:00.3849916Z cp.async.commit_group; 2026-02-21T09:21:00.3850112Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.3850180Z add.s64 %rd211, %rd236, 192; 2026-02-21T09:21:00.3850243Z add.s64 %rd212, %rd240, 192; 2026-02-21T09:21:00.3850304Z add.s64 %rd213, %rd244, 192; 2026-02-21T09:21:00.3850364Z add.s64 %rd214, %rd248, 192; 2026-02-21T09:21:00.3850564Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3850698Z bar.sync 0; 2026-02-21T09:21:00.3850759Z // begin inline asm 2026-02-21T09:21:00.3850896Z cp.async.ca.shared.global [ %r85 + 0 ], [ %rd211 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3851019Z // end inline asm 2026-02-21T09:21:00.3851080Z // begin inline asm 2026-02-21T09:21:00.3851226Z cp.async.ca.shared.global [ %r86 + 0 ], [ %rd212 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3851288Z // end inline asm 2026-02-21T09:21:00.3851349Z // begin inline asm 2026-02-21T09:21:00.3851482Z cp.async.ca.shared.global [ %r87 + 0 ], [ %rd213 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3851547Z // end inline asm 2026-02-21T09:21:00.3851606Z // begin inline asm 2026-02-21T09:21:00.3851799Z cp.async.ca.shared.global [ %r88 + 0 ], [ %rd214 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3851863Z // end inline asm 2026-02-21T09:21:00.3851930Z cp.async.commit_group; 2026-02-21T09:21:00.3852142Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.3852210Z add.s32 %r6909, %r764, %r89; 2026-02-21T09:21:00.3852419Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.3852487Z cvt.s64.s32 %rd254, %r6909; 2026-02-21T09:21:00.3852553Z 
add.s64 %rd215, %rd45, %rd254; 2026-02-21T09:21:00.3852756Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3852819Z // begin inline asm 2026-02-21T09:21:00.3853003Z cp.async.ca.shared.global [ %r90 + 0 ], [ %rd215 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3853069Z // end inline asm 2026-02-21T09:21:00.3853136Z cp.async.commit_group; 2026-02-21T09:21:00.3853337Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.3853406Z add.s64 %rd216, %rd236, 224; 2026-02-21T09:21:00.3853469Z add.s64 %rd217, %rd240, 224; 2026-02-21T09:21:00.3853531Z add.s64 %rd218, %rd244, 224; 2026-02-21T09:21:00.3853591Z add.s64 %rd219, %rd248, 224; 2026-02-21T09:21:00.3853794Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3853858Z // begin inline asm 2026-02-21T09:21:00.3853993Z cp.async.ca.shared.global [ %r91 + 0 ], [ %rd216 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3854056Z // end inline asm 2026-02-21T09:21:00.3854115Z // begin inline asm 2026-02-21T09:21:00.3854248Z cp.async.ca.shared.global [ %r92 + 0 ], [ %rd217 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3854305Z // end inline asm 2026-02-21T09:21:00.3854371Z // begin inline asm 2026-02-21T09:21:00.3854501Z cp.async.ca.shared.global [ %r93 + 0 ], [ %rd218 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3854560Z // end inline asm 2026-02-21T09:21:00.3854629Z // begin inline asm 2026-02-21T09:21:00.3854758Z cp.async.ca.shared.global [ %r94 + 0 ], [ %rd219 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3854817Z // end inline asm 2026-02-21T09:21:00.3854893Z cp.async.commit_group; 2026-02-21T09:21:00.3855097Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.3855163Z add.s32 %r6910, %r764, %r95; 2026-02-21T09:21:00.3855364Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.3855444Z cvt.s64.s32 %rd255, %r6910; 
2026-02-21T09:21:00.3855510Z add.s64 %rd220, %rd45, %rd255; 2026-02-21T09:21:00.3855711Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3855776Z // begin inline asm 2026-02-21T09:21:00.3855910Z cp.async.ca.shared.global [ %r96 + 0 ], [ %rd220 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3855969Z // end inline asm 2026-02-21T09:21:00.3856044Z cp.async.commit_group; 2026-02-21T09:21:00.3856242Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.3856363Z add.s64 %rd221, %rd236, 256; 2026-02-21T09:21:00.3856425Z add.s64 %rd222, %rd240, 256; 2026-02-21T09:21:00.3856611Z add.s64 %rd223, %rd244, 256; 2026-02-21T09:21:00.3856678Z add.s64 %rd224, %rd248, 256; 2026-02-21T09:21:00.3856953Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3857018Z bar.sync 0; 2026-02-21T09:21:00.3857080Z // begin inline asm 2026-02-21T09:21:00.3857211Z cp.async.ca.shared.global [ %r97 + 0 ], [ %rd221 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3857279Z // end inline asm 2026-02-21T09:21:00.3857339Z // begin inline asm 2026-02-21T09:21:00.3857470Z cp.async.ca.shared.global [ %r98 + 0 ], [ %rd222 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3857531Z // end inline asm 2026-02-21T09:21:00.3857678Z // begin inline asm 2026-02-21T09:21:00.3857810Z cp.async.ca.shared.global [ %r99 + 0 ], [ %rd223 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3857867Z // end inline asm 2026-02-21T09:21:00.3857933Z // begin inline asm 2026-02-21T09:21:00.3858071Z cp.async.ca.shared.global [ %r100 + 0 ], [ %rd224 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3858128Z // end inline asm 2026-02-21T09:21:00.3858196Z cp.async.commit_group; 2026-02-21T09:21:00.3858413Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.3858480Z add.s32 %r6911, %r764, %r101; 2026-02-21T09:21:00.3858679Z .loc 1 57 34 // 
cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.3858809Z cvt.s64.s32 %rd256, %r6911; 2026-02-21T09:21:00.3858876Z add.s64 %rd225, %rd45, %rd256; 2026-02-21T09:21:00.3859076Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3859144Z // begin inline asm 2026-02-21T09:21:00.3859278Z cp.async.ca.shared.global [ %r102 + 0 ], [ %rd225 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3859336Z // end inline asm 2026-02-21T09:21:00.3859403Z cp.async.commit_group; 2026-02-21T09:21:00.3859606Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.3859671Z add.s64 %rd226, %rd236, 288; 2026-02-21T09:21:00.3859735Z add.s64 %rd227, %rd240, 288; 2026-02-21T09:21:00.3859803Z add.s64 %rd228, %rd244, 288; 2026-02-21T09:21:00.3859864Z add.s64 %rd229, %rd248, 288; 2026-02-21T09:21:00.3860059Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3860125Z // begin inline asm 2026-02-21T09:21:00.3860261Z cp.async.ca.shared.global [ %r103 + 0 ], [ %rd226 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3860318Z // end inline asm 2026-02-21T09:21:00.3860379Z // begin inline asm 2026-02-21T09:21:00.3860518Z cp.async.ca.shared.global [ %r104 + 0 ], [ %rd227 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3860575Z // end inline asm 2026-02-21T09:21:00.3860634Z // begin inline asm 2026-02-21T09:21:00.3860775Z cp.async.ca.shared.global [ %r105 + 0 ], [ %rd228 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3860833Z // end inline asm 2026-02-21T09:21:00.3860903Z // begin inline asm 2026-02-21T09:21:00.3861036Z cp.async.ca.shared.global [ %r106 + 0 ], [ %rd229 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3861102Z // end inline asm 2026-02-21T09:21:00.3861168Z cp.async.commit_group; 2026-02-21T09:21:00.3861367Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.3861438Z add.s32 %r6912, %r764, %r107; 
2026-02-21T09:21:00.3861638Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.3861703Z cvt.s64.s32 %rd257, %r6912; 2026-02-21T09:21:00.3861772Z add.s64 %rd230, %rd45, %rd257; 2026-02-21T09:21:00.3861970Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3862032Z // begin inline asm 2026-02-21T09:21:00.3862248Z cp.async.ca.shared.global [ %r108 + 0 ], [ %rd230 + 0 ], 0x8, %r6554; 2026-02-21T09:21:00.3862314Z // end inline asm 2026-02-21T09:21:00.3862383Z cp.async.commit_group; 2026-02-21T09:21:00.3862588Z .loc 1 43 78 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:43:78 2026-02-21T09:21:00.3862707Z add.s32 %r6913, %r6885, %r239; 2026-02-21T09:21:00.3862773Z sub.s32 %r6914, %r6913, %r6887; 2026-02-21T09:21:00.3862835Z shl.b32 %r6915, %r6914, 8; 2026-02-21T09:21:00.3862909Z add.s32 %r22519, %r133, %r6915; 2026-02-21T09:21:00.3862971Z or.b32 %r6916, %r8, %r765; 2026-02-21T09:21:00.3863033Z shl.b32 %r6917, %r6916, 10; 2026-02-21T09:21:00.3863102Z mul.wide.s32 %rd15, %r6917, 2; 2026-02-21T09:21:00.3863167Z or.b32 %r6918, %r7, %r765; 2026-02-21T09:21:00.3863273Z shl.b32 %r6919, %r6918, 10; 2026-02-21T09:21:00.3863341Z mul.wide.s32 %rd16, %r6919, 2; 2026-02-21T09:21:00.3863410Z shl.b32 %r6920, %r6886, 18; 2026-02-21T09:21:00.3863474Z or.b32 %r6921, %r22252, %r6920; 2026-02-21T09:21:00.3863542Z mul.wide.s32 %rd17, %r6921, 2; 2026-02-21T09:21:00.3863605Z or.b32 %r22518, %r137, %r6920; 2026-02-21T09:21:00.3863671Z mov.b32 %r22522, 0f00000000; 2026-02-21T09:21:00.3863730Z mov.b32 %r22521, 4; 2026-02-21T09:21:00.3863795Z mov.b32 %r22520, -1; 2026-02-21T09:21:00.3863863Z mov.b64 %rd717, -16; 2026-02-21T09:21:00.3863926Z mov.b64 %rd716, %rd3; 2026-02-21T09:21:00.3863988Z mov.b32 %r22523, %r22522; 2026-02-21T09:21:00.3864055Z mov.b32 %r22524, %r22522; 2026-02-21T09:21:00.3864116Z mov.b32 %r22525, %r22522; 2026-02-21T09:21:00.3864222Z mov.b32 %r22526, %r22522; 
2026-02-21T09:21:00.3864284Z mov.b32 %r22527, %r22522; 2026-02-21T09:21:00.3864351Z mov.b32 %r22528, %r22522; 2026-02-21T09:21:00.3864413Z mov.b32 %r22529, %r22522; 2026-02-21T09:21:00.3864474Z mov.b32 %r22530, %r22522; 2026-02-21T09:21:00.3864541Z mov.b32 %r22531, %r22522; 2026-02-21T09:21:00.3864600Z mov.b32 %r22532, %r22522; 2026-02-21T09:21:00.3864659Z mov.b32 %r22533, %r22522; 2026-02-21T09:21:00.3864724Z mov.b32 %r22534, %r22522; 2026-02-21T09:21:00.3864788Z mov.b32 %r22535, %r22522; 2026-02-21T09:21:00.3864849Z mov.b32 %r22536, %r22522; 2026-02-21T09:21:00.3864910Z mov.b32 %r22537, %r22522; 2026-02-21T09:21:00.3864979Z mov.b32 %r22538, %r22522; 2026-02-21T09:21:00.3865040Z mov.b32 %r22539, %r22522; 2026-02-21T09:21:00.3865102Z mov.b32 %r22540, %r22522; 2026-02-21T09:21:00.3865162Z mov.b32 %r22541, %r22522; 2026-02-21T09:21:00.3865232Z mov.b32 %r22542, %r22522; 2026-02-21T09:21:00.3865294Z mov.b32 %r22543, %r22522; 2026-02-21T09:21:00.3865356Z mov.b32 %r22544, %r22522; 2026-02-21T09:21:00.3865422Z mov.b32 %r22545, %r22522; 2026-02-21T09:21:00.3865479Z mov.b32 %r22546, %r22522; 2026-02-21T09:21:00.3865539Z mov.b32 %r22547, %r22522; 2026-02-21T09:21:00.3865599Z mov.b32 %r22548, %r22522; 2026-02-21T09:21:00.3865665Z mov.b32 %r22549, %r22522; 2026-02-21T09:21:00.3865724Z mov.b32 %r22550, %r22522; 2026-02-21T09:21:00.3865784Z mov.b32 %r22551, %r22522; 2026-02-21T09:21:00.3865858Z mov.b32 %r22552, %r22522; 2026-02-21T09:21:00.3865927Z mov.b32 %r22553, %r22522; 2026-02-21T09:21:00.3865989Z mov.b32 %r22554, %r22522; 2026-02-21T09:21:00.3866048Z mov.b32 %r22555, %r22522; 2026-02-21T09:21:00.3866114Z mov.b32 %r22556, %r22522; 2026-02-21T09:21:00.3866178Z mov.b32 %r22557, %r22522; 2026-02-21T09:21:00.3866238Z mov.b32 %r22558, %r22522; 2026-02-21T09:21:00.3866302Z mov.b32 %r22559, %r22522; 2026-02-21T09:21:00.3866361Z mov.b32 %r22560, %r22522; 2026-02-21T09:21:00.3866421Z mov.b32 %r22561, %r22522; 2026-02-21T09:21:00.3866593Z mov.b32 %r22562, %r22522; 
2026-02-21T09:21:00.3866664Z mov.b32 %r22563, %r22522; 2026-02-21T09:21:00.3866724Z mov.b32 %r22564, %r22522; 2026-02-21T09:21:00.3866783Z mov.b32 %r22565, %r22522; 2026-02-21T09:21:00.3866849Z mov.b32 %r22566, %r22522; 2026-02-21T09:21:00.3866908Z mov.b32 %r22567, %r22522; 2026-02-21T09:21:00.3866967Z mov.b32 %r22568, %r22522; 2026-02-21T09:21:00.3867027Z mov.b32 %r22569, %r22522; 2026-02-21T09:21:00.3867175Z mov.b32 %r22570, %r22522; 2026-02-21T09:21:00.3867233Z mov.b32 %r22571, %r22522; 2026-02-21T09:21:00.3867291Z mov.b32 %r22572, %r22522; 2026-02-21T09:21:00.3867355Z mov.b32 %r22573, %r22522; 2026-02-21T09:21:00.3867415Z mov.b32 %r22574, %r22522; 2026-02-21T09:21:00.3867548Z mov.b32 %r22575, %r22522; 2026-02-21T09:21:00.3867616Z mov.b32 %r22576, %r22522; 2026-02-21T09:21:00.3867678Z mov.b32 %r22577, %r22522; 2026-02-21T09:21:00.3867738Z mov.b32 %r22578, %r22522; 2026-02-21T09:21:00.3867799Z mov.b32 %r22579, %r22522; 2026-02-21T09:21:00.3867864Z mov.b32 %r22580, %r22522; 2026-02-21T09:21:00.3867924Z mov.b32 %r22581, %r22522; 2026-02-21T09:21:00.3867983Z mov.b32 %r22582, %r22522; 2026-02-21T09:21:00.3868049Z mov.b32 %r22583, %r22522; 2026-02-21T09:21:00.3868170Z mov.b32 %r22584, %r22522; 2026-02-21T09:21:00.3868231Z mov.b32 %r22585, %r22522; 2026-02-21T09:21:00.3868349Z mov.b32 %r22586, %r22522; 2026-02-21T09:21:00.3868421Z mov.b32 %r22587, %r22522; 2026-02-21T09:21:00.3868483Z mov.b32 %r22588, %r22522; 2026-02-21T09:21:00.3868545Z mov.b32 %r22589, %r22522; 2026-02-21T09:21:00.3868608Z mov.b32 %r22590, %r22522; 2026-02-21T09:21:00.3868667Z mov.b32 %r22591, %r22522; 2026-02-21T09:21:00.3868729Z mov.b32 %r22592, %r22522; 2026-02-21T09:21:00.3868787Z mov.b32 %r22593, %r22522; 2026-02-21T09:21:00.3868853Z mov.b32 %r22594, %r22522; 2026-02-21T09:21:00.3868913Z mov.b32 %r22595, %r22522; 2026-02-21T09:21:00.3868971Z mov.b32 %r22596, %r22522; 2026-02-21T09:21:00.3869035Z mov.b32 %r22597, %r22522; 2026-02-21T09:21:00.3869161Z mov.b32 %r22598, %r22522; 
2026-02-21T09:21:00.3869224Z mov.b32 %r22599, %r22522; 2026-02-21T09:21:00.3869283Z mov.b32 %r22600, %r22522; 2026-02-21T09:21:00.3869348Z mov.b32 %r22601, %r22522; 2026-02-21T09:21:00.3869409Z mov.b32 %r22602, %r22522; 2026-02-21T09:21:00.3869467Z mov.b32 %r22603, %r22522; 2026-02-21T09:21:00.3869535Z mov.b32 %r22604, %r22522; 2026-02-21T09:21:00.3869596Z mov.b32 %r22605, %r22522; 2026-02-21T09:21:00.3869656Z mov.b32 %r22606, %r22522; 2026-02-21T09:21:00.3869718Z mov.b32 %r22607, %r22522; 2026-02-21T09:21:00.3869783Z mov.b32 %r22608, %r22522; 2026-02-21T09:21:00.3869844Z mov.b32 %r22609, %r22522; 2026-02-21T09:21:00.3869905Z mov.b32 %r22610, %r22522; 2026-02-21T09:21:00.3869969Z mov.b32 %r22611, %r22522; 2026-02-21T09:21:00.3870030Z mov.b32 %r22612, %r22522; 2026-02-21T09:21:00.3870089Z mov.b32 %r22613, %r22522; 2026-02-21T09:21:00.3870149Z mov.b32 %r22614, %r22522; 2026-02-21T09:21:00.3870215Z mov.b32 %r22615, %r22522; 2026-02-21T09:21:00.3870277Z mov.b32 %r22616, %r22522; 2026-02-21T09:21:00.3870338Z mov.b32 %r22617, %r22522; 2026-02-21T09:21:00.3870404Z mov.b32 %r22618, %r22522; 2026-02-21T09:21:00.3870464Z mov.b32 %r22619, %r22522; 2026-02-21T09:21:00.3870525Z mov.b32 %r22620, %r22522; 2026-02-21T09:21:00.3870590Z mov.b32 %r22621, %r22522; 2026-02-21T09:21:00.3870650Z mov.b32 %r22622, %r22522; 2026-02-21T09:21:00.3870710Z mov.b32 %r22623, %r22522; 2026-02-21T09:21:00.3870773Z mov.b32 %r22624, %r22522; 2026-02-21T09:21:00.3870839Z mov.b32 %r22625, %r22522; 2026-02-21T09:21:00.3870899Z mov.b32 %r22626, %r22522; 2026-02-21T09:21:00.3870960Z mov.b32 %r22627, %r22522; 2026-02-21T09:21:00.3871025Z mov.b32 %r22628, %r22522; 2026-02-21T09:21:00.3871086Z mov.b32 %r22629, %r22522; 2026-02-21T09:21:00.3871147Z mov.b32 %r22630, %r22522; 2026-02-21T09:21:00.3871207Z mov.b32 %r22631, %r22522; 2026-02-21T09:21:00.3871278Z mov.b32 %r22632, %r22522; 2026-02-21T09:21:00.3871346Z mov.b32 %r22633, %r22522; 2026-02-21T09:21:00.3871411Z mov.b32 %r22634, %r22522; 
2026-02-21T09:21:00.3871475Z mov.b32 %r22635, %r22522; 2026-02-21T09:21:00.3871536Z mov.b32 %r22636, %r22522; 2026-02-21T09:21:00.3871598Z mov.b32 %r22637, %r22522; 2026-02-21T09:21:00.3871659Z mov.b32 %r22638, %r22522; 2026-02-21T09:21:00.3871725Z mov.b32 %r22639, %r22522; 2026-02-21T09:21:00.3871786Z mov.b32 %r22640, %r22522; 2026-02-21T09:21:00.3871907Z mov.b32 %r22641, %r22522; 2026-02-21T09:21:00.3871976Z mov.b32 %r22642, %r22522; 2026-02-21T09:21:00.3872035Z mov.b32 %r22643, %r22522; 2026-02-21T09:21:00.3872097Z mov.b32 %r22644, %r22522; 2026-02-21T09:21:00.3872158Z mov.b32 %r22645, %r22522; 2026-02-21T09:21:00.3872270Z mov.b32 %r22646, %r22522; 2026-02-21T09:21:00.3872330Z mov.b32 %r22647, %r22522; 2026-02-21T09:21:00.3872388Z mov.b32 %r22648, %r22522; 2026-02-21T09:21:00.3872457Z mov.b32 %r22649, %r22522; 2026-02-21T09:21:00.3872518Z mov.b32 %r22650, %r22522; 2026-02-21T09:21:00.3872577Z mov.b32 %r22651, %r22522; 2026-02-21T09:21:00.3872638Z mov.b32 %r22652, %r22522; 2026-02-21T09:21:00.3872704Z mov.b32 %r22653, %r22522; 2026-02-21T09:21:00.3872764Z mov.b32 %r22654, %r22522; 2026-02-21T09:21:00.3872824Z mov.b32 %r22655, %r22522; 2026-02-21T09:21:00.3872939Z mov.b32 %r22656, %r22522; 2026-02-21T09:21:00.3873002Z mov.b32 %r22657, %r22522; 2026-02-21T09:21:00.3873061Z mov.b32 %r22658, %r22522; 2026-02-21T09:21:00.3873122Z mov.b32 %r22659, %r22522; 2026-02-21T09:21:00.3873187Z mov.b32 %r22660, %r22522; 2026-02-21T09:21:00.3873245Z mov.b32 %r22661, %r22522; 2026-02-21T09:21:00.3873303Z mov.b32 %r22662, %r22522; 2026-02-21T09:21:00.3873366Z mov.b32 %r22663, %r22522; 2026-02-21T09:21:00.3873426Z mov.b32 %r22664, %r22522; 2026-02-21T09:21:00.3873485Z mov.b32 %r22665, %r22522; 2026-02-21T09:21:00.3873549Z mov.b32 %r22666, %r22522; 2026-02-21T09:21:00.3873608Z mov.b32 %r22667, %r22522; 2026-02-21T09:21:00.3873667Z mov.b32 %r22668, %r22522; 2026-02-21T09:21:00.3873737Z mov.b32 %r22669, %r22522; 2026-02-21T09:21:00.3873858Z mov.b32 %r22670, %r22522; 
2026-02-21T09:21:00.3873920Z mov.b32 %r22671, %r22522; 2026-02-21T09:21:00.3873983Z mov.b32 %r22672, %r22522; 2026-02-21T09:21:00.3874045Z mov.b32 %r22673, %r22522; 2026-02-21T09:21:00.3874107Z mov.b32 %r22674, %r22522; 2026-02-21T09:21:00.3874168Z mov.b32 %r22675, %r22522; 2026-02-21T09:21:00.3874226Z mov.b32 %r22676, %r22522; 2026-02-21T09:21:00.3874291Z mov.b32 %r22677, %r22522; 2026-02-21T09:21:00.3874349Z mov.b32 %r22678, %r22522; 2026-02-21T09:21:00.3874409Z mov.b32 %r22679, %r22522; 2026-02-21T09:21:00.3874473Z mov.b32 %r22680, %r22522; 2026-02-21T09:21:00.3874533Z mov.b32 %r22681, %r22522; 2026-02-21T09:21:00.3874594Z mov.b32 %r22682, %r22522; 2026-02-21T09:21:00.3874652Z mov.b32 %r22683, %r22522; 2026-02-21T09:21:00.3874716Z mov.b32 %r22684, %r22522; 2026-02-21T09:21:00.3874775Z mov.b32 %r22685, %r22522; 2026-02-21T09:21:00.3874833Z mov.b32 %r22686, %r22522; 2026-02-21T09:21:00.3874897Z mov.b32 %r22687, %r22522; 2026-02-21T09:21:00.3874959Z mov.b32 %r22688, %r22522; 2026-02-21T09:21:00.3875018Z mov.b32 %r22689, %r22522; 2026-02-21T09:21:00.3875079Z mov.b32 %r22690, %r22522; 2026-02-21T09:21:00.3875143Z mov.b32 %r22691, %r22522; 2026-02-21T09:21:00.3875205Z mov.b32 %r22692, %r22522; 2026-02-21T09:21:00.3875264Z mov.b32 %r22693, %r22522; 2026-02-21T09:21:00.3875332Z mov.b32 %r22694, %r22522; 2026-02-21T09:21:00.3875391Z mov.b32 %r22695, %r22522; 2026-02-21T09:21:00.3875449Z mov.b32 %r22696, %r22522; 2026-02-21T09:21:00.3875511Z mov.b32 %r22697, %r22522; 2026-02-21T09:21:00.3875581Z mov.b32 %r22698, %r22522; 2026-02-21T09:21:00.3875641Z mov.b32 %r22699, %r22522; 2026-02-21T09:21:00.3875701Z mov.b32 %r22700, %r22522; 2026-02-21T09:21:00.3875766Z mov.b32 %r22701, %r22522; 2026-02-21T09:21:00.3875828Z mov.b32 %r22702, %r22522; 2026-02-21T09:21:00.3875887Z mov.b32 %r22703, %r22522; 2026-02-21T09:21:00.3875947Z mov.b32 %r22704, %r22522; 2026-02-21T09:21:00.3876012Z mov.b32 %r22705, %r22522; 2026-02-21T09:21:00.3876074Z mov.b32 %r22706, %r22522; 
2026-02-21T09:21:00.3876134Z mov.b32 %r22707, %r22522; 2026-02-21T09:21:00.3876197Z mov.b32 %r22708, %r22522; 2026-02-21T09:21:00.3876257Z mov.b32 %r22709, %r22522; 2026-02-21T09:21:00.3876317Z mov.b32 %r22710, %r22522; 2026-02-21T09:21:00.3876383Z mov.b32 %r22711, %r22522; 2026-02-21T09:21:00.3876445Z mov.b32 %r22712, %r22522; 2026-02-21T09:21:00.3876729Z mov.b32 %r22713, %r22522; 2026-02-21T09:21:00.3876790Z mov.b32 %r22714, %r22522; 2026-02-21T09:21:00.3876857Z mov.b32 %r22715, %r22522; 2026-02-21T09:21:00.3876918Z mov.b32 %r22716, %r22522; 2026-02-21T09:21:00.3876978Z mov.b32 %r22717, %r22522; 2026-02-21T09:21:00.3877126Z mov.b32 %r22718, %r22522; 2026-02-21T09:21:00.3877189Z mov.b32 %r22719, %r22522; 2026-02-21T09:21:00.3877248Z mov.b32 %r22720, %r22522; 2026-02-21T09:21:00.3877306Z mov.b32 %r22721, %r22522; 2026-02-21T09:21:00.3877373Z mov.b32 %r22722, %r22522; 2026-02-21T09:21:00.3877432Z mov.b32 %r22723, %r22522; 2026-02-21T09:21:00.3877494Z mov.b32 %r22724, %r22522; 2026-02-21T09:21:00.3877559Z mov.b32 %r22725, %r22522; 2026-02-21T09:21:00.3877618Z mov.b32 %r22726, %r22522; 2026-02-21T09:21:00.3877749Z mov.b32 %r22727, %r22522; 2026-02-21T09:21:00.3877812Z mov.b32 %r22728, %r22522; 2026-02-21T09:21:00.3877877Z mov.b32 %r22729, %r22522; 2026-02-21T09:21:00.3877936Z mov.b32 %r22730, %r22522; 2026-02-21T09:21:00.3877999Z mov.b32 %r22731, %r22522; 2026-02-21T09:21:00.3878070Z mov.b32 %r22732, %r22522; 2026-02-21T09:21:00.3878136Z mov.b32 %r22733, %r22522; 2026-02-21T09:21:00.3878198Z mov.b32 %r22734, %r22522; 2026-02-21T09:21:00.3878260Z mov.b32 %r22735, %r22522; 2026-02-21T09:21:00.3878330Z mov.b32 %r22736, %r22522; 2026-02-21T09:21:00.3878392Z mov.b32 %r22737, %r22522; 2026-02-21T09:21:00.3878451Z mov.b32 %r22738, %r22522; 2026-02-21T09:21:00.3878515Z mov.b32 %r22739, %r22522; 2026-02-21T09:21:00.3878573Z mov.b32 %r22740, %r22522; 2026-02-21T09:21:00.3878633Z mov.b32 %r22741, %r22522; 2026-02-21T09:21:00.3878756Z mov.b32 %r22742, %r22522; 
2026-02-21T09:21:00.3878824Z mov.b32 %r22743, %r22522; 2026-02-21T09:21:00.3878884Z mov.b32 %r22744, %r22522; 2026-02-21T09:21:00.3878943Z mov.b32 %r22745, %r22522; 2026-02-21T09:21:00.3879022Z mov.b32 %r22746, %r22522; 2026-02-21T09:21:00.3879083Z mov.b32 %r22747, %r22522; 2026-02-21T09:21:00.3879142Z mov.b32 %r22748, %r22522; 2026-02-21T09:21:00.3879203Z mov.b32 %r22749, %r22522; 2026-02-21T09:21:00.3879272Z mov.b32 %r22750, %r22522; 2026-02-21T09:21:00.3879330Z mov.b32 %r22751, %r22522; 2026-02-21T09:21:00.3879390Z mov.b32 %r22752, %r22522; 2026-02-21T09:21:00.3879460Z mov.b32 %r22753, %r22522; 2026-02-21T09:21:00.3879521Z mov.b32 %r22754, %r22522; 2026-02-21T09:21:00.3879579Z mov.b32 %r22755, %r22522; 2026-02-21T09:21:00.3879646Z mov.b32 %r22756, %r22522; 2026-02-21T09:21:00.3879705Z mov.b32 %r22757, %r22522; 2026-02-21T09:21:00.3879764Z mov.b32 %r22758, %r22522; 2026-02-21T09:21:00.3879822Z mov.b32 %r22759, %r22522; 2026-02-21T09:21:00.3879887Z mov.b32 %r22760, %r22522; 2026-02-21T09:21:00.3879946Z mov.b32 %r22761, %r22522; 2026-02-21T09:21:00.3880007Z mov.b32 %r22762, %r22522; 2026-02-21T09:21:00.3880071Z mov.b32 %r22763, %r22522; 2026-02-21T09:21:00.3880132Z mov.b32 %r22764, %r22522; 2026-02-21T09:21:00.3880191Z mov.b32 %r22765, %r22522; 2026-02-21T09:21:00.3880250Z mov.b32 %r22766, %r22522; 2026-02-21T09:21:00.3880315Z mov.b32 %r22767, %r22522; 2026-02-21T09:21:00.3880375Z mov.b32 %r22768, %r22522; 2026-02-21T09:21:00.3880437Z mov.b32 %r22769, %r22522; 2026-02-21T09:21:00.3880503Z mov.b32 %r22770, %r22522; 2026-02-21T09:21:00.3880562Z mov.b32 %r22771, %r22522; 2026-02-21T09:21:00.3880622Z mov.b32 %r22772, %r22522; 2026-02-21T09:21:00.3880680Z mov.b32 %r22773, %r22522; 2026-02-21T09:21:00.3880744Z mov.b32 %r22774, %r22522; 2026-02-21T09:21:00.3880804Z mov.b32 %r22775, %r22522; 2026-02-21T09:21:00.3880863Z mov.b32 %r22776, %r22522; 2026-02-21T09:21:00.3880927Z mov.b32 %r22777, %r22522; 2026-02-21T09:21:00.3881049Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 
2026-02-21T09:21:00.3881174Z // => This Inner Loop Header: Depth=2 2026-02-21T09:21:00.3881245Z add.s64 %rd717, %rd717, 16; 2026-02-21T09:21:00.3881323Z setp.lt.u64 %p22, %rd717, 432; 2026-02-21T09:21:00.3881387Z add.s32 %r10058, %r22520, 1; 2026-02-21T09:21:00.3881515Z setp.gt.s32 %p23, %r10058, 4; 2026-02-21T09:21:00.3881594Z selp.b32 %r22520, 0, %r10058, %p23; 2026-02-21T09:21:00.3881807Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3881881Z cp.async.wait_group 16; 2026-02-21T09:21:00.3881991Z bar.sync 0; 2026-02-21T09:21:00.3882052Z shl.b32 %r10059, %r22520, 13; 2026-02-21T09:21:00.3882117Z add.s32 %r10061, %r22237, %r10059; 2026-02-21T09:21:00.3882322Z .loc 1 55 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:55:32 2026-02-21T09:21:00.3882396Z add.s32 %r10062, %r10061, %r109; 2026-02-21T09:21:00.3882464Z ld.shared.b16 %rs113, [%r10062]; 2026-02-21T09:21:00.3882536Z ld.shared.b16 %rs114, [%r10062+256]; 2026-02-21T09:21:00.3882666Z ld.shared.b16 %rs115, [%r10062+16]; 2026-02-21T09:21:00.3882737Z ld.shared.b16 %rs116, [%r10062+272]; 2026-02-21T09:21:00.3882809Z ld.shared.b16 %rs117, [%r10062+4096]; 2026-02-21T09:21:00.3882884Z ld.shared.b16 %rs118, [%r10062+4352]; 2026-02-21T09:21:00.3882951Z ld.shared.b16 %rs119, [%r10062+4112]; 2026-02-21T09:21:00.3883016Z ld.shared.b16 %rs120, [%r10062+4368]; 2026-02-21T09:21:00.3883081Z add.s32 %r10063, %r10061, %r110; 2026-02-21T09:21:00.3883154Z ld.shared.b16 %rs121, [%r10063]; 2026-02-21T09:21:00.3883223Z ld.shared.b16 %rs122, [%r10063+256]; 2026-02-21T09:21:00.3883291Z ld.shared.b16 %rs123, [%r10063+16]; 2026-02-21T09:21:00.3883363Z ld.shared.b16 %rs124, [%r10063+272]; 2026-02-21T09:21:00.3883430Z ld.shared.b16 %rs125, [%r10063+4096]; 2026-02-21T09:21:00.3883545Z ld.shared.b16 %rs126, [%r10063+4352]; 2026-02-21T09:21:00.3883614Z ld.shared.b16 %rs127, [%r10063+4112]; 2026-02-21T09:21:00.3883690Z ld.shared.b16 %rs128, [%r10063+4368]; 
2026-02-21T09:21:00.3883775Z cvt.f32.bf16 %r7178, %rs113; 2026-02-21T09:21:00.3883839Z cvt.f32.bf16 %r7179, %rs114; 2026-02-21T09:21:00.3883910Z cvt.f32.bf16 %r7180, %rs121; 2026-02-21T09:21:00.3883976Z cvt.f32.bf16 %r7181, %rs122; 2026-02-21T09:21:00.3884039Z cvt.f32.bf16 %r7438, %rs115; 2026-02-21T09:21:00.3884103Z cvt.f32.bf16 %r7439, %rs116; 2026-02-21T09:21:00.3884170Z cvt.f32.bf16 %r7440, %rs123; 2026-02-21T09:21:00.3884232Z cvt.f32.bf16 %r7441, %rs124; 2026-02-21T09:21:00.3884294Z cvt.f32.bf16 %r7698, %rs117; 2026-02-21T09:21:00.3884366Z cvt.f32.bf16 %r7699, %rs118; 2026-02-21T09:21:00.3884426Z cvt.f32.bf16 %r7700, %rs125; 2026-02-21T09:21:00.3884490Z cvt.f32.bf16 %r7701, %rs126; 2026-02-21T09:21:00.3884561Z cvt.f32.bf16 %r7958, %rs119; 2026-02-21T09:21:00.3884625Z cvt.f32.bf16 %r7959, %rs120; 2026-02-21T09:21:00.3884690Z cvt.f32.bf16 %r7960, %rs127; 2026-02-21T09:21:00.3884754Z cvt.f32.bf16 %r7961, %rs128; 2026-02-21T09:21:00.3884967Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3885034Z shl.b32 %r10064, %r22520, 11; 2026-02-21T09:21:00.3885101Z add.s32 %r10065, %r22237, %r10064; 2026-02-21T09:21:00.3885169Z add.s32 %r10066, %r10065, 98304; 2026-02-21T09:21:00.3885368Z .loc 1 70 45 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:70:45 2026-02-21T09:21:00.3885432Z add.s32 %r10067, %r10066, %r22239; 2026-02-21T09:21:00.3885501Z add.s32 %r10068, %r10066, %r22243; 2026-02-21T09:21:00.3885567Z add.s32 %r10069, %r10066, %r22244; 2026-02-21T09:21:00.3885763Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3885834Z ld.shared.s8 %rs129, [%r10067]; 2026-02-21T09:21:00.3886041Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3886107Z shl.b16 %rs130, %rs129, 4; 2026-02-21T09:21:00.3886319Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3886398Z ld.shared.s8 
%rs131, [%r10067+256]; 2026-02-21T09:21:00.3886720Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3886880Z shl.b16 %rs132, %rs131, 4; 2026-02-21T09:21:00.3887087Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3887158Z ld.shared.s8 %rs133, [%r10067+512]; 2026-02-21T09:21:00.3887420Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3887491Z shl.b16 %rs134, %rs133, 4; 2026-02-21T09:21:00.3887690Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3887761Z ld.shared.s8 %rs135, [%r10068]; 2026-02-21T09:21:00.3887957Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3888088Z shl.b16 %rs136, %rs135, 4; 2026-02-21T09:21:00.3888291Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3888365Z ld.shared.s8 %rs137, [%r10067+1024]; 2026-02-21T09:21:00.3888562Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3888626Z shl.b16 %rs138, %rs137, 4; 2026-02-21T09:21:00.3888820Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3888896Z ld.shared.s8 %rs139, [%r10067+1280]; 2026-02-21T09:21:00.3889091Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3889216Z shl.b16 %rs140, %rs139, 4; 2026-02-21T09:21:00.3889417Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3889486Z ld.shared.s8 %rs141, [%r10067+1536]; 2026-02-21T09:21:00.3889684Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3889746Z shl.b16 %rs142, %rs141, 4; 2026-02-21T09:21:00.3889947Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3890014Z 
ld.shared.s8 %rs143, [%r10069]; 2026-02-21T09:21:00.3890210Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3890283Z shl.b16 %rs144, %rs143, 4; 2026-02-21T09:21:00.3890477Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3890541Z cvt.s16.s8 %rs145, %rs130; 2026-02-21T09:21:00.3890611Z shr.s16 %rs146, %rs145, 4; 2026-02-21T09:21:00.3890672Z cvt.s16.s8 %rs147, %rs132; 2026-02-21T09:21:00.3890733Z shr.s16 %rs148, %rs147, 4; 2026-02-21T09:21:00.3890794Z shr.s16 %rs149, %rs129, 4; 2026-02-21T09:21:00.3890860Z shr.s16 %rs150, %rs131, 4; 2026-02-21T09:21:00.3891055Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.3891123Z cvt.rn.f32.s16 %r10070, %rs150; 2026-02-21T09:21:00.3891194Z cvt.rn.f32.s16 %r10071, %rs149; 2026-02-21T09:21:00.3891257Z cvt.rn.f32.s16 %r10072, %rs148; 2026-02-21T09:21:00.3891319Z cvt.rn.f32.s16 %r10073, %rs146; 2026-02-21T09:21:00.3891521Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3891586Z cvt.s16.s8 %rs151, %rs134; 2026-02-21T09:21:00.3891648Z shr.s16 %rs152, %rs151, 4; 2026-02-21T09:21:00.3891708Z cvt.s16.s8 %rs153, %rs136; 2026-02-21T09:21:00.3891774Z shr.s16 %rs154, %rs153, 4; 2026-02-21T09:21:00.3891837Z shr.s16 %rs155, %rs133, 4; 2026-02-21T09:21:00.3891899Z shr.s16 %rs156, %rs135, 4; 2026-02-21T09:21:00.3892108Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.3892173Z cvt.rn.f32.s16 %r10074, %rs156; 2026-02-21T09:21:00.3892237Z cvt.rn.f32.s16 %r10075, %rs155; 2026-02-21T09:21:00.3892302Z cvt.rn.f32.s16 %r10076, %rs154; 2026-02-21T09:21:00.3892427Z cvt.rn.f32.s16 %r10077, %rs152; 2026-02-21T09:21:00.3892622Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3892686Z cvt.s16.s8 %rs157, %rs138; 2026-02-21T09:21:00.3892803Z shr.s16 %rs158, %rs157, 
4; 2026-02-21T09:21:00.3892866Z cvt.s16.s8 %rs159, %rs140; 2026-02-21T09:21:00.3892929Z shr.s16 %rs160, %rs159, 4; 2026-02-21T09:21:00.3892997Z shr.s16 %rs161, %rs137, 4; 2026-02-21T09:21:00.3893058Z shr.s16 %rs162, %rs139, 4; 2026-02-21T09:21:00.3893258Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.3893325Z cvt.rn.f32.s16 %r10078, %rs162; 2026-02-21T09:21:00.3893394Z cvt.rn.f32.s16 %r10079, %rs161; 2026-02-21T09:21:00.3893503Z cvt.rn.f32.s16 %r10080, %rs160; 2026-02-21T09:21:00.3893570Z cvt.rn.f32.s16 %r10081, %rs158; 2026-02-21T09:21:00.3893772Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3893837Z cvt.s16.s8 %rs163, %rs142; 2026-02-21T09:21:00.3893900Z shr.s16 %rs164, %rs163, 4; 2026-02-21T09:21:00.3893969Z cvt.s16.s8 %rs165, %rs144; 2026-02-21T09:21:00.3894034Z shr.s16 %rs166, %rs165, 4; 2026-02-21T09:21:00.3894096Z shr.s16 %rs167, %rs141, 4; 2026-02-21T09:21:00.3894156Z shr.s16 %rs168, %rs143, 4; 2026-02-21T09:21:00.3894360Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.3894425Z cvt.rn.f32.s16 %r10082, %rs168; 2026-02-21T09:21:00.3894535Z cvt.rn.f32.s16 %r10083, %rs167; 2026-02-21T09:21:00.3894605Z cvt.rn.f32.s16 %r10084, %rs166; 2026-02-21T09:21:00.3894683Z cvt.rn.f32.s16 %r10085, %rs164; 2026-02-21T09:21:00.3894815Z st.shared.v4.b32 [%r113], {%r10073, %r10071, %r10072, %r10070}; 2026-02-21T09:21:00.3894940Z st.shared.v4.b32 [%r114], {%r10077, %r10075, %r10076, %r10074}; 2026-02-21T09:21:00.3895055Z st.shared.v4.b32 [%r115], {%r10081, %r10079, %r10080, %r10078}; 2026-02-21T09:21:00.3895170Z st.shared.v4.b32 [%r116], {%r10085, %r10083, %r10084, %r10082}; 2026-02-21T09:21:00.3895230Z $L__tmp5: 2026-02-21T09:21:00.3895516Z .loc 2 291 36 // standard.py:291:36 @[ cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:87:40 ] 2026-02-21T09:21:00.3895583Z // begin inline asm 2026-02-21T09:21:00.3895673Z 
fence.proxy.async.shared::cta; 2026-02-21T09:21:00.3895740Z // end inline asm 2026-02-21T09:21:00.3895797Z bar.sync 0; 2026-02-21T09:21:00.3895887Z shfl.sync.idx.b32 %r10086, %r5, 0, 31, -1; 2026-02-21T09:21:00.3895966Z wgmma.fence.sync.aligned; 2026-02-21T09:21:00.3896038Z mov.pred %p14, -1; 2026-02-21T09:21:00.3896100Z // begin inline asm 2026-02-21T09:21:00.3898954Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r22522,%r22523,%r22524,%r22525,%r22526,%r22527,%r22528,%r22529,%r22530,%r22531,%r22532,%r22533,%r22534,%r22535,%r22536,%r22537,%r22538,%r22539,%r22540,%r22541,%r22542,%r22543,%r22544,%r22545,%r22546,%r22547,%r22548,%r22549,%r22550,%r22551,%r22552,%r22553,%r22554,%r22555,%r22556,%r22557,%r22558,%r22559,%r22560,%r22561,%r22562,%r22563,%r22564,%r22565,%r22566,%r22567,%r22568,%r22569,%r22570,%r22571,%r22572,%r22573,%r22574,%r22575,%r22576,%r22577,%r22578,%r22579,%r22580,%r22581,%r22582,%r22583,%r22584,%r22585,%r22586,%r22587,%r22588,%r22589,%r22590,%r22591,%r22592,%r22593,%r22594,%r22595,%r22596,%r22597,%r22598,%r22599,%r22600,%r22601,%r22602,%r22603,%r22604,%r22605,%r22606,%r22607,%r22608,%r22609,%r22610,%r22611,%r22612,%r22613,%r22614,%r22615,%r22616,%r22617,%r22618,%r22619,%r22620,%r22621,%r22622,%r22623,%r22624,%r22625,%r22626,%r22627,%r22628,%r22629,%r22630,%r22631,%r22632,%r22633,%r22634,%r22635,%r22636,%r22637,%r22638,%r22639,%r22640,%r22641,%r22642,%r22643,%r22644,%r22645,%r22646,%r22647,%r22648,%r22649}, {%r7178,%r7179,%r7180,%r7181}, %rd1, %p14, 1, 1; 2026-02-21T09:21:00.3899030Z // end inline asm 2026-02-21T09:21:00.3899093Z // begin inline asm 2026-02-21T09:21:00.3901904Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22522,%r22523,%r22524,%r22525,%r22526,%r22527,%r22528,%r22529,%r22530,%r22531,%r22532,%r22533,%r22534,%r22535,%r22536,%r22537,%r22538,%r22539,%r22540,%r22541,%r22542,%r22543,%r22544,%r22545,%r22546,%r22547,%r22548,%r22549,%r22550,%r22551,%r22552,%r22553,%r22554,%r22555,%r22556,%r22557,%r22558,%r22559,%r22560,%r22561,%r22562,%r22563,%r22564,%r22565,%r22566,%r22567,%r22568,%r22569,%r22570,%r22571,%r22572,%r22573,%r22574,%r22575,%r22576,%r22577,%r22578,%r22579,%r22580,%r22581,%r22582,%r22583,%r22584,%r22585,%r22586,%r22587,%r22588,%r22589,%r22590,%r22591,%r22592,%r22593,%r22594,%r22595,%r22596,%r22597,%r22598,%r22599,%r22600,%r22601,%r22602,%r22603,%r22604,%r22605,%r22606,%r22607,%r22608,%r22609,%r22610,%r22611,%r22612,%r22613,%r22614,%r22615,%r22616,%r22617,%r22618,%r22619,%r22620,%r22621,%r22622,%r22623,%r22624,%r22625,%r22626,%r22627,%r22628,%r22629,%r22630,%r22631,%r22632,%r22633,%r22634,%r22635,%r22636,%r22637,%r22638,%r22639,%r22640,%r22641,%r22642,%r22643,%r22644,%r22645,%r22646,%r22647,%r22648,%r22649}, {%r7438,%r7439,%r7440,%r7441}, %rd2, %p14, 1, 1; 2026-02-21T09:21:00.3902094Z // end inline asm 2026-02-21T09:21:00.3902156Z // begin inline asm 2026-02-21T09:21:00.3904924Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22650,%r22651,%r22652,%r22653,%r22654,%r22655,%r22656,%r22657,%r22658,%r22659,%r22660,%r22661,%r22662,%r22663,%r22664,%r22665,%r22666,%r22667,%r22668,%r22669,%r22670,%r22671,%r22672,%r22673,%r22674,%r22675,%r22676,%r22677,%r22678,%r22679,%r22680,%r22681,%r22682,%r22683,%r22684,%r22685,%r22686,%r22687,%r22688,%r22689,%r22690,%r22691,%r22692,%r22693,%r22694,%r22695,%r22696,%r22697,%r22698,%r22699,%r22700,%r22701,%r22702,%r22703,%r22704,%r22705,%r22706,%r22707,%r22708,%r22709,%r22710,%r22711,%r22712,%r22713,%r22714,%r22715,%r22716,%r22717,%r22718,%r22719,%r22720,%r22721,%r22722,%r22723,%r22724,%r22725,%r22726,%r22727,%r22728,%r22729,%r22730,%r22731,%r22732,%r22733,%r22734,%r22735,%r22736,%r22737,%r22738,%r22739,%r22740,%r22741,%r22742,%r22743,%r22744,%r22745,%r22746,%r22747,%r22748,%r22749,%r22750,%r22751,%r22752,%r22753,%r22754,%r22755,%r22756,%r22757,%r22758,%r22759,%r22760,%r22761,%r22762,%r22763,%r22764,%r22765,%r22766,%r22767,%r22768,%r22769,%r22770,%r22771,%r22772,%r22773,%r22774,%r22775,%r22776,%r22777}, {%r7698,%r7699,%r7700,%r7701}, %rd1, %p14, 1, 1; 2026-02-21T09:21:00.3904999Z // end inline asm 2026-02-21T09:21:00.3905061Z // begin inline asm 2026-02-21T09:21:00.3907900Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22650,%r22651,%r22652,%r22653,%r22654,%r22655,%r22656,%r22657,%r22658,%r22659,%r22660,%r22661,%r22662,%r22663,%r22664,%r22665,%r22666,%r22667,%r22668,%r22669,%r22670,%r22671,%r22672,%r22673,%r22674,%r22675,%r22676,%r22677,%r22678,%r22679,%r22680,%r22681,%r22682,%r22683,%r22684,%r22685,%r22686,%r22687,%r22688,%r22689,%r22690,%r22691,%r22692,%r22693,%r22694,%r22695,%r22696,%r22697,%r22698,%r22699,%r22700,%r22701,%r22702,%r22703,%r22704,%r22705,%r22706,%r22707,%r22708,%r22709,%r22710,%r22711,%r22712,%r22713,%r22714,%r22715,%r22716,%r22717,%r22718,%r22719,%r22720,%r22721,%r22722,%r22723,%r22724,%r22725,%r22726,%r22727,%r22728,%r22729,%r22730,%r22731,%r22732,%r22733,%r22734,%r22735,%r22736,%r22737,%r22738,%r22739,%r22740,%r22741,%r22742,%r22743,%r22744,%r22745,%r22746,%r22747,%r22748,%r22749,%r22750,%r22751,%r22752,%r22753,%r22754,%r22755,%r22756,%r22757,%r22758,%r22759,%r22760,%r22761,%r22762,%r22763,%r22764,%r22765,%r22766,%r22767,%r22768,%r22769,%r22770,%r22771,%r22772,%r22773,%r22774,%r22775,%r22776,%r22777}, {%r7958,%r7959,%r7960,%r7961}, %rd2, %p14, 1, 1; 2026-02-21T09:21:00.3907972Z // end inline asm 2026-02-21T09:21:00.3908059Z wgmma.commit_group.sync.aligned; 2026-02-21T09:21:00.3908120Z mov.b32 %r9778, 0; 2026-02-21T09:21:00.3908184Z mov.b32 %r8218, %r2884; 2026-02-21T09:21:00.3908257Z mov.b32 %r8219, %r9778; 2026-02-21T09:21:00.3908373Z mov.b32 %r8220, %r9778; 2026-02-21T09:21:00.3908438Z // begin inline asm 2026-02-21T09:21:00.3913561Z // wait for regs: 
%r22522,%r22523,%r22524,%r22525,%r22526,%r22527,%r22528,%r22529,%r22530,%r22531,%r22532,%r22533,%r22534,%r22535,%r22536,%r22537,%r22538,%r22539,%r22540,%r22541,%r22542,%r22543,%r22544,%r22545,%r22546,%r22547,%r22548,%r22549,%r22550,%r22551,%r22552,%r22553,%r22554,%r22555,%r22556,%r22557,%r22558,%r22559,%r22560,%r22561,%r22562,%r22563,%r22564,%r22565,%r22566,%r22567,%r22568,%r22569,%r22570,%r22571,%r22572,%r22573,%r22574,%r22575,%r22576,%r22577,%r22578,%r22579,%r22580,%r22581,%r22582,%r22583,%r22584,%r22585,%r22586,%r22587,%r22588,%r22589,%r22590,%r22591,%r22592,%r22593,%r22594,%r22595,%r22596,%r22597,%r22598,%r22599,%r22600,%r22601,%r22602,%r22603,%r22604,%r22605,%r22606,%r22607,%r22608,%r22609,%r22610,%r22611,%r22612,%r22613,%r22614,%r22615,%r22616,%r22617,%r22618,%r22619,%r22620,%r22621,%r22622,%r22623,%r22624,%r22625,%r22626,%r22627,%r22628,%r22629,%r22630,%r22631,%r22632,%r22633,%r22634,%r22635,%r22636,%r22637,%r22638,%r22639,%r22640,%r22641,%r22642,%r22643,%r22644,%r22645,%r22646,%r22647,%r22648,%r22649,%r22650,%r22651,%r22652,%r22653,%r22654,%r22655,%r22656,%r22657,%r22658,%r22659,%r22660,%r22661,%r22662,%r22663,%r22664,%r22665,%r22666,%r22667,%r22668,%r22669,%r22670,%r22671,%r22672,%r22673,%r22674,%r22675,%r22676,%r22677,%r22678,%r22679,%r22680,%r22681,%r22682,%r22683,%r22684,%r22685,%r22686,%r22687,%r22688,%r22689,%r22690,%r22691,%r22692,%r22693,%r22694,%r22695,%r22696,%r22697,%r22698,%r22699,%r22700,%r22701,%r22702,%r22703,%r22704,%r22705,%r22706,%r22707,%r22708,%r22709,%r22710,%r22711,%r22712,%r22713,%r22714,%r22715,%r22716,%r22717,%r22718,%r22719,%r22720,%r22721,%r22722,%r22723,%r22724,%r22725,%r22726,%r22727,%r22728,%r22729,%r22730,%r22731,%r22732,%r22733,%r22734,%r22735,%r22736,%r22737,%r22738,%r22739,%r22740,%r22741,%r22742,%r22743,%r22744,%r22745,%r22746,%r22747,%r22748,%r22749,%r22750,%r22751,%r22752,%r22753,%r22754,%r22755,%r22756,%r22757,%r22758,%r22759,%r22760,%r22761,%r22762,%r22763,%r22764,%r22765,%r22766,%r22767,%r22768,%r22769,%r22770,%r22771,
%r22772,%r22773,%r22774,%r22775,%r22776,%r22777,%r8218,%r8219,%r8220 2026-02-21T09:21:00.3913765Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:21:00.3913825Z // end inline asm 2026-02-21T09:21:00.3913882Z $L__tmp6: 2026-02-21T09:21:00.3914101Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3914174Z add.s32 %r10088, %r6234, %r10059; 2026-02-21T09:21:00.3914377Z .loc 1 55 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:55:32 2026-02-21T09:21:00.3914442Z add.s32 %r10089, %r10088, %r109; 2026-02-21T09:21:00.3914523Z ld.shared.b16 %rs169, [%r10089]; 2026-02-21T09:21:00.3914596Z ld.shared.b16 %rs170, [%r10089+256]; 2026-02-21T09:21:00.3914667Z ld.shared.b16 %rs171, [%r10089+16]; 2026-02-21T09:21:00.3914738Z ld.shared.b16 %rs172, [%r10089+272]; 2026-02-21T09:21:00.3914809Z ld.shared.b16 %rs173, [%r10089+4096]; 2026-02-21T09:21:00.3914878Z ld.shared.b16 %rs174, [%r10089+4352]; 2026-02-21T09:21:00.3914950Z ld.shared.b16 %rs175, [%r10089+4112]; 2026-02-21T09:21:00.3915018Z ld.shared.b16 %rs176, [%r10089+4368]; 2026-02-21T09:21:00.3915081Z add.s32 %r10090, %r10088, %r110; 2026-02-21T09:21:00.3915148Z ld.shared.b16 %rs177, [%r10090]; 2026-02-21T09:21:00.3915220Z ld.shared.b16 %rs178, [%r10090+256]; 2026-02-21T09:21:00.3915289Z ld.shared.b16 %rs179, [%r10090+16]; 2026-02-21T09:21:00.3915358Z ld.shared.b16 %rs180, [%r10090+272]; 2026-02-21T09:21:00.3915431Z ld.shared.b16 %rs181, [%r10090+4096]; 2026-02-21T09:21:00.3915498Z ld.shared.b16 %rs182, [%r10090+4352]; 2026-02-21T09:21:00.3915567Z ld.shared.b16 %rs183, [%r10090+4112]; 2026-02-21T09:21:00.3915651Z ld.shared.b16 %rs184, [%r10090+4368]; 2026-02-21T09:21:00.3915724Z cvt.f32.bf16 %r8736, %rs169; 2026-02-21T09:21:00.3915788Z cvt.f32.bf16 %r8737, %rs170; 2026-02-21T09:21:00.3915851Z cvt.f32.bf16 %r8738, %rs177; 2026-02-21T09:21:00.3915921Z cvt.f32.bf16 %r8739, %rs178; 2026-02-21T09:21:00.3915984Z cvt.f32.bf16 %r8996, %rs171; 2026-02-21T09:21:00.3916047Z 
cvt.f32.bf16 %r8997, %rs172; 2026-02-21T09:21:00.3916173Z cvt.f32.bf16 %r8998, %rs179; 2026-02-21T09:21:00.3916236Z cvt.f32.bf16 %r8999, %rs180; 2026-02-21T09:21:00.3916299Z cvt.f32.bf16 %r9256, %rs173; 2026-02-21T09:21:00.3916361Z cvt.f32.bf16 %r9257, %rs174; 2026-02-21T09:21:00.3916428Z cvt.f32.bf16 %r9258, %rs181; 2026-02-21T09:21:00.3916678Z cvt.f32.bf16 %r9259, %rs182; 2026-02-21T09:21:00.3916742Z cvt.f32.bf16 %r9516, %rs175; 2026-02-21T09:21:00.3916808Z cvt.f32.bf16 %r9517, %rs176; 2026-02-21T09:21:00.3916870Z cvt.f32.bf16 %r9518, %rs183; 2026-02-21T09:21:00.3916931Z cvt.f32.bf16 %r9519, %rs184; 2026-02-21T09:21:00.3917144Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3917217Z add.s32 %r10091, %r10065, 108544; 2026-02-21T09:21:00.3917505Z .loc 1 70 45 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:70:45 2026-02-21T09:21:00.3917576Z add.s32 %r10092, %r10091, %r22239; 2026-02-21T09:21:00.3917647Z add.s32 %r10093, %r10091, %r22243; 2026-02-21T09:21:00.3917714Z add.s32 %r10094, %r10091, %r22244; 2026-02-21T09:21:00.3917914Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3917990Z ld.shared.s8 %rs185, [%r10092]; 2026-02-21T09:21:00.3918190Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3918260Z shl.b16 %rs186, %rs185, 4; 2026-02-21T09:21:00.3918464Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3918595Z ld.shared.s8 %rs187, [%r10092+256]; 2026-02-21T09:21:00.3918795Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3918862Z shl.b16 %rs188, %rs187, 4; 2026-02-21T09:21:00.3919067Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3919136Z ld.shared.s8 %rs189, [%r10092+512]; 2026-02-21T09:21:00.3919336Z .loc 1 60 28 // 
cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3919417Z shl.b16 %rs190, %rs189, 4; 2026-02-21T09:21:00.3919617Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3919688Z ld.shared.s8 %rs191, [%r10093]; 2026-02-21T09:21:00.3919895Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3919958Z shl.b16 %rs192, %rs191, 4; 2026-02-21T09:21:00.3920156Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3920233Z ld.shared.s8 %rs193, [%r10092+1024]; 2026-02-21T09:21:00.3920433Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3920502Z shl.b16 %rs194, %rs193, 4; 2026-02-21T09:21:00.3920701Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3920777Z ld.shared.s8 %rs195, [%r10092+1280]; 2026-02-21T09:21:00.3920975Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3921042Z shl.b16 %rs196, %rs195, 4; 2026-02-21T09:21:00.3921245Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3921314Z ld.shared.s8 %rs197, [%r10092+1536]; 2026-02-21T09:21:00.3921511Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3921582Z shl.b16 %rs198, %rs197, 4; 2026-02-21T09:21:00.3921789Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3921858Z ld.shared.s8 %rs199, [%r10094]; 2026-02-21T09:21:00.3922062Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.3922205Z shl.b16 %rs200, %rs199, 4; 2026-02-21T09:21:00.3922401Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3922527Z cvt.s16.s8 %rs201, %rs186; 2026-02-21T09:21:00.3922607Z shr.s16 %rs202, 
%rs201, 4; 2026-02-21T09:21:00.3922671Z cvt.s16.s8 %rs203, %rs188; 2026-02-21T09:21:00.3922734Z shr.s16 %rs204, %rs203, 4; 2026-02-21T09:21:00.3922801Z shr.s16 %rs205, %rs185, 4; 2026-02-21T09:21:00.3922865Z shr.s16 %rs206, %rs187, 4; 2026-02-21T09:21:00.3923066Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.3923138Z cvt.rn.f32.s16 %r10095, %rs206; 2026-02-21T09:21:00.3923254Z cvt.rn.f32.s16 %r10096, %rs205; 2026-02-21T09:21:00.3923319Z cvt.rn.f32.s16 %r10097, %rs204; 2026-02-21T09:21:00.3923383Z cvt.rn.f32.s16 %r10098, %rs202; 2026-02-21T09:21:00.3923587Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3923654Z cvt.s16.s8 %rs207, %rs190; 2026-02-21T09:21:00.3923717Z shr.s16 %rs208, %rs207, 4; 2026-02-21T09:21:00.3923787Z cvt.s16.s8 %rs209, %rs192; 2026-02-21T09:21:00.3923852Z shr.s16 %rs210, %rs209, 4; 2026-02-21T09:21:00.3923915Z shr.s16 %rs211, %rs189, 4; 2026-02-21T09:21:00.3923982Z shr.s16 %rs212, %rs191, 4; 2026-02-21T09:21:00.3924185Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.3924314Z cvt.rn.f32.s16 %r10099, %rs212; 2026-02-21T09:21:00.3924383Z cvt.rn.f32.s16 %r10100, %rs211; 2026-02-21T09:21:00.3924454Z cvt.rn.f32.s16 %r10101, %rs210; 2026-02-21T09:21:00.3924518Z cvt.rn.f32.s16 %r10102, %rs208; 2026-02-21T09:21:00.3924719Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3924789Z cvt.s16.s8 %rs213, %rs194; 2026-02-21T09:21:00.3924855Z shr.s16 %rs214, %rs213, 4; 2026-02-21T09:21:00.3924917Z cvt.s16.s8 %rs215, %rs196; 2026-02-21T09:21:00.3924980Z shr.s16 %rs216, %rs215, 4; 2026-02-21T09:21:00.3925046Z shr.s16 %rs217, %rs193, 4; 2026-02-21T09:21:00.3925106Z shr.s16 %rs218, %rs195, 4; 2026-02-21T09:21:00.3925306Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.3925377Z cvt.rn.f32.s16 %r10103, %rs218; 
2026-02-21T09:21:00.3925445Z cvt.rn.f32.s16 %r10104, %rs217; 2026-02-21T09:21:00.3925512Z cvt.rn.f32.s16 %r10105, %rs216; 2026-02-21T09:21:00.3925583Z cvt.rn.f32.s16 %r10106, %rs214; 2026-02-21T09:21:00.3925791Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.3925862Z cvt.s16.s8 %rs219, %rs198; 2026-02-21T09:21:00.3925927Z shr.s16 %rs220, %rs219, 4; 2026-02-21T09:21:00.3925995Z cvt.s16.s8 %rs221, %rs200; 2026-02-21T09:21:00.3926057Z shr.s16 %rs222, %rs221, 4; 2026-02-21T09:21:00.3926132Z shr.s16 %rs223, %rs197, 4; 2026-02-21T09:21:00.3926202Z shr.s16 %rs224, %rs199, 4; 2026-02-21T09:21:00.3926409Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.3926592Z cvt.rn.f32.s16 %r10107, %rs224; 2026-02-21T09:21:00.3926661Z cvt.rn.f32.s16 %r10108, %rs223; 2026-02-21T09:21:00.3926743Z cvt.rn.f32.s16 %r10109, %rs222; 2026-02-21T09:21:00.3926808Z cvt.rn.f32.s16 %r10110, %rs220; 2026-02-21T09:21:00.3926868Z bar.sync 0; 2026-02-21T09:21:00.3927005Z st.shared.v4.b32 [%r113], {%r10098, %r10096, %r10097, %r10095}; 2026-02-21T09:21:00.3927126Z st.shared.v4.b32 [%r114], {%r10102, %r10100, %r10101, %r10099}; 2026-02-21T09:21:00.3927239Z st.shared.v4.b32 [%r115], {%r10106, %r10104, %r10105, %r10103}; 2026-02-21T09:21:00.3927356Z st.shared.v4.b32 [%r116], {%r10110, %r10108, %r10109, %r10107}; 2026-02-21T09:21:00.3927414Z $L__tmp7: 2026-02-21T09:21:00.3927693Z .loc 2 291 36 // standard.py:291:36 @[ cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:87:40 ] 2026-02-21T09:21:00.3927848Z // begin inline asm 2026-02-21T09:21:00.3927939Z fence.proxy.async.shared::cta; 2026-02-21T09:21:00.3927998Z // end inline asm 2026-02-21T09:21:00.3928120Z bar.sync 0; 2026-02-21T09:21:00.3928204Z wgmma.fence.sync.aligned; 2026-02-21T09:21:00.3928265Z // begin inline asm 2026-02-21T09:21:00.3931045Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22522,%r22523,%r22524,%r22525,%r22526,%r22527,%r22528,%r22529,%r22530,%r22531,%r22532,%r22533,%r22534,%r22535,%r22536,%r22537,%r22538,%r22539,%r22540,%r22541,%r22542,%r22543,%r22544,%r22545,%r22546,%r22547,%r22548,%r22549,%r22550,%r22551,%r22552,%r22553,%r22554,%r22555,%r22556,%r22557,%r22558,%r22559,%r22560,%r22561,%r22562,%r22563,%r22564,%r22565,%r22566,%r22567,%r22568,%r22569,%r22570,%r22571,%r22572,%r22573,%r22574,%r22575,%r22576,%r22577,%r22578,%r22579,%r22580,%r22581,%r22582,%r22583,%r22584,%r22585,%r22586,%r22587,%r22588,%r22589,%r22590,%r22591,%r22592,%r22593,%r22594,%r22595,%r22596,%r22597,%r22598,%r22599,%r22600,%r22601,%r22602,%r22603,%r22604,%r22605,%r22606,%r22607,%r22608,%r22609,%r22610,%r22611,%r22612,%r22613,%r22614,%r22615,%r22616,%r22617,%r22618,%r22619,%r22620,%r22621,%r22622,%r22623,%r22624,%r22625,%r22626,%r22627,%r22628,%r22629,%r22630,%r22631,%r22632,%r22633,%r22634,%r22635,%r22636,%r22637,%r22638,%r22639,%r22640,%r22641,%r22642,%r22643,%r22644,%r22645,%r22646,%r22647,%r22648,%r22649}, {%r8736,%r8737,%r8738,%r8739}, %rd1, %p14, 1, 1; 2026-02-21T09:21:00.3931118Z // end inline asm 2026-02-21T09:21:00.3931248Z // begin inline asm 2026-02-21T09:21:00.3933975Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22522,%r22523,%r22524,%r22525,%r22526,%r22527,%r22528,%r22529,%r22530,%r22531,%r22532,%r22533,%r22534,%r22535,%r22536,%r22537,%r22538,%r22539,%r22540,%r22541,%r22542,%r22543,%r22544,%r22545,%r22546,%r22547,%r22548,%r22549,%r22550,%r22551,%r22552,%r22553,%r22554,%r22555,%r22556,%r22557,%r22558,%r22559,%r22560,%r22561,%r22562,%r22563,%r22564,%r22565,%r22566,%r22567,%r22568,%r22569,%r22570,%r22571,%r22572,%r22573,%r22574,%r22575,%r22576,%r22577,%r22578,%r22579,%r22580,%r22581,%r22582,%r22583,%r22584,%r22585,%r22586,%r22587,%r22588,%r22589,%r22590,%r22591,%r22592,%r22593,%r22594,%r22595,%r22596,%r22597,%r22598,%r22599,%r22600,%r22601,%r22602,%r22603,%r22604,%r22605,%r22606,%r22607,%r22608,%r22609,%r22610,%r22611,%r22612,%r22613,%r22614,%r22615,%r22616,%r22617,%r22618,%r22619,%r22620,%r22621,%r22622,%r22623,%r22624,%r22625,%r22626,%r22627,%r22628,%r22629,%r22630,%r22631,%r22632,%r22633,%r22634,%r22635,%r22636,%r22637,%r22638,%r22639,%r22640,%r22641,%r22642,%r22643,%r22644,%r22645,%r22646,%r22647,%r22648,%r22649}, {%r8996,%r8997,%r8998,%r8999}, %rd2, %p14, 1, 1; 2026-02-21T09:21:00.3934041Z // end inline asm 2026-02-21T09:21:00.3934110Z // begin inline asm 2026-02-21T09:21:00.3936941Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22650,%r22651,%r22652,%r22653,%r22654,%r22655,%r22656,%r22657,%r22658,%r22659,%r22660,%r22661,%r22662,%r22663,%r22664,%r22665,%r22666,%r22667,%r22668,%r22669,%r22670,%r22671,%r22672,%r22673,%r22674,%r22675,%r22676,%r22677,%r22678,%r22679,%r22680,%r22681,%r22682,%r22683,%r22684,%r22685,%r22686,%r22687,%r22688,%r22689,%r22690,%r22691,%r22692,%r22693,%r22694,%r22695,%r22696,%r22697,%r22698,%r22699,%r22700,%r22701,%r22702,%r22703,%r22704,%r22705,%r22706,%r22707,%r22708,%r22709,%r22710,%r22711,%r22712,%r22713,%r22714,%r22715,%r22716,%r22717,%r22718,%r22719,%r22720,%r22721,%r22722,%r22723,%r22724,%r22725,%r22726,%r22727,%r22728,%r22729,%r22730,%r22731,%r22732,%r22733,%r22734,%r22735,%r22736,%r22737,%r22738,%r22739,%r22740,%r22741,%r22742,%r22743,%r22744,%r22745,%r22746,%r22747,%r22748,%r22749,%r22750,%r22751,%r22752,%r22753,%r22754,%r22755,%r22756,%r22757,%r22758,%r22759,%r22760,%r22761,%r22762,%r22763,%r22764,%r22765,%r22766,%r22767,%r22768,%r22769,%r22770,%r22771,%r22772,%r22773,%r22774,%r22775,%r22776,%r22777}, {%r9256,%r9257,%r9258,%r9259}, %rd1, %p14, 1, 1; 2026-02-21T09:21:00.3937085Z // end inline asm 2026-02-21T09:21:00.3937146Z // begin inline asm 2026-02-21T09:21:00.3939925Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22650,%r22651,%r22652,%r22653,%r22654,%r22655,%r22656,%r22657,%r22658,%r22659,%r22660,%r22661,%r22662,%r22663,%r22664,%r22665,%r22666,%r22667,%r22668,%r22669,%r22670,%r22671,%r22672,%r22673,%r22674,%r22675,%r22676,%r22677,%r22678,%r22679,%r22680,%r22681,%r22682,%r22683,%r22684,%r22685,%r22686,%r22687,%r22688,%r22689,%r22690,%r22691,%r22692,%r22693,%r22694,%r22695,%r22696,%r22697,%r22698,%r22699,%r22700,%r22701,%r22702,%r22703,%r22704,%r22705,%r22706,%r22707,%r22708,%r22709,%r22710,%r22711,%r22712,%r22713,%r22714,%r22715,%r22716,%r22717,%r22718,%r22719,%r22720,%r22721,%r22722,%r22723,%r22724,%r22725,%r22726,%r22727,%r22728,%r22729,%r22730,%r22731,%r22732,%r22733,%r22734,%r22735,%r22736,%r22737,%r22738,%r22739,%r22740,%r22741,%r22742,%r22743,%r22744,%r22745,%r22746,%r22747,%r22748,%r22749,%r22750,%r22751,%r22752,%r22753,%r22754,%r22755,%r22756,%r22757,%r22758,%r22759,%r22760,%r22761,%r22762,%r22763,%r22764,%r22765,%r22766,%r22767,%r22768,%r22769,%r22770,%r22771,%r22772,%r22773,%r22774,%r22775,%r22776,%r22777}, {%r9516,%r9517,%r9518,%r9519}, %rd2, %p14, 1, 1; 2026-02-21T09:21:00.3940044Z // end inline asm 2026-02-21T09:21:00.3940130Z wgmma.commit_group.sync.aligned; 2026-02-21T09:21:00.3940195Z mov.b32 %r9776, %r2884; 2026-02-21T09:21:00.3940258Z mov.b32 %r9777, %r9778; 2026-02-21T09:21:00.3940320Z // begin inline asm 2026-02-21T09:21:00.3945377Z // wait for regs: 
%r22522,%r22523,%r22524,%r22525,%r22526,%r22527,%r22528,%r22529,%r22530,%r22531,%r22532,%r22533,%r22534,%r22535,%r22536,%r22537,%r22538,%r22539,%r22540,%r22541,%r22542,%r22543,%r22544,%r22545,%r22546,%r22547,%r22548,%r22549,%r22550,%r22551,%r22552,%r22553,%r22554,%r22555,%r22556,%r22557,%r22558,%r22559,%r22560,%r22561,%r22562,%r22563,%r22564,%r22565,%r22566,%r22567,%r22568,%r22569,%r22570,%r22571,%r22572,%r22573,%r22574,%r22575,%r22576,%r22577,%r22578,%r22579,%r22580,%r22581,%r22582,%r22583,%r22584,%r22585,%r22586,%r22587,%r22588,%r22589,%r22590,%r22591,%r22592,%r22593,%r22594,%r22595,%r22596,%r22597,%r22598,%r22599,%r22600,%r22601,%r22602,%r22603,%r22604,%r22605,%r22606,%r22607,%r22608,%r22609,%r22610,%r22611,%r22612,%r22613,%r22614,%r22615,%r22616,%r22617,%r22618,%r22619,%r22620,%r22621,%r22622,%r22623,%r22624,%r22625,%r22626,%r22627,%r22628,%r22629,%r22630,%r22631,%r22632,%r22633,%r22634,%r22635,%r22636,%r22637,%r22638,%r22639,%r22640,%r22641,%r22642,%r22643,%r22644,%r22645,%r22646,%r22647,%r22648,%r22649,%r22650,%r22651,%r22652,%r22653,%r22654,%r22655,%r22656,%r22657,%r22658,%r22659,%r22660,%r22661,%r22662,%r22663,%r22664,%r22665,%r22666,%r22667,%r22668,%r22669,%r22670,%r22671,%r22672,%r22673,%r22674,%r22675,%r22676,%r22677,%r22678,%r22679,%r22680,%r22681,%r22682,%r22683,%r22684,%r22685,%r22686,%r22687,%r22688,%r22689,%r22690,%r22691,%r22692,%r22693,%r22694,%r22695,%r22696,%r22697,%r22698,%r22699,%r22700,%r22701,%r22702,%r22703,%r22704,%r22705,%r22706,%r22707,%r22708,%r22709,%r22710,%r22711,%r22712,%r22713,%r22714,%r22715,%r22716,%r22717,%r22718,%r22719,%r22720,%r22721,%r22722,%r22723,%r22724,%r22725,%r22726,%r22727,%r22728,%r22729,%r22730,%r22731,%r22732,%r22733,%r22734,%r22735,%r22736,%r22737,%r22738,%r22739,%r22740,%r22741,%r22742,%r22743,%r22744,%r22745,%r22746,%r22747,%r22748,%r22749,%r22750,%r22751,%r22752,%r22753,%r22754,%r22755,%r22756,%r22757,%r22758,%r22759,%r22760,%r22761,%r22762,%r22763,%r22764,%r22765,%r22766,%r22767,%r22768,%r22769,%r22770,%r22771,
%r22772,%r22773,%r22774,%r22775,%r22776,%r22777,%r9776,%r9777,%r9778 2026-02-21T09:21:00.3945471Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:21:00.3945530Z // end inline asm 2026-02-21T09:21:00.3945592Z $L__tmp8: 2026-02-21T09:21:00.3945807Z .loc 1 43 78 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:43:78 2026-02-21T09:21:00.3945875Z add.s32 %r10111, %r22521, 1; 2026-02-21T09:21:00.3945944Z setp.gt.s32 %p24, %r10111, 4; 2026-02-21T09:21:00.3946022Z selp.b32 %r22521, 0, %r10111, %p24; 2026-02-21T09:21:00.3946277Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.3946348Z add.s32 %r10112, %r22518, -16; 2026-02-21T09:21:00.3946421Z add.s64 %rd276, %rd716, %rd17; 2026-02-21T09:21:00.3946670Z add.s64 %rd266, %rd276, 320; 2026-02-21T09:21:00.3946736Z add.s64 %rd277, %rd716, %rd16; 2026-02-21T09:21:00.3946807Z add.s64 %rd267, %rd277, 320; 2026-02-21T09:21:00.3946871Z add.s64 %rd278, %rd716, %rd15; 2026-02-21T09:21:00.3946935Z add.s64 %rd268, %rd278, 320; 2026-02-21T09:21:00.3947015Z mad.wide.s32 %rd269, %r10112, 2, %rd44; 2026-02-21T09:21:00.3947235Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3947299Z shl.b32 %r10113, %r22521, 13; 2026-02-21T09:21:00.3947442Z add.s32 %r10114, %r22237, %r10113; 2026-02-21T09:21:00.3947518Z add.s32 %r10038, %r10114, %r47; 2026-02-21T09:21:00.3965219Z selp.b32 %r10039, 8, 0, %p22; 2026-02-21T09:21:00.3965347Z // begin inline asm 2026-02-21T09:21:00.3965532Z cp.async.ca.shared.global [ %r10038 + 0 ], [ %rd266 + 0 ], 0x8, %r10039; 2026-02-21T09:21:00.3965594Z // end inline asm 2026-02-21T09:21:00.3965676Z add.s32 %r10040, %r10038, 2048; 2026-02-21T09:21:00.3965744Z // begin inline asm 2026-02-21T09:21:00.3965909Z cp.async.ca.shared.global [ %r10040 + 0 ], [ %rd267 + 0 ], 0x8, %r10039; 2026-02-21T09:21:00.3965977Z // end inline asm 2026-02-21T09:21:00.3966055Z add.s32 %r10042, %r10038, 4096; 2026-02-21T09:21:00.3966121Z // 
begin inline asm 2026-02-21T09:21:00.3966411Z cp.async.ca.shared.global [ %r10042 + 0 ], [ %rd268 + 0 ], 0x8, %r10039; 2026-02-21T09:21:00.3966654Z // end inline asm 2026-02-21T09:21:00.3966743Z add.s32 %r10044, %r10038, 6144; 2026-02-21T09:21:00.3966812Z // begin inline asm 2026-02-21T09:21:00.3966962Z cp.async.ca.shared.global [ %r10044 + 0 ], [ %rd269 + 0 ], 0x8, %r10039; 2026-02-21T09:21:00.3967031Z // end inline asm 2026-02-21T09:21:00.3967105Z cp.async.commit_group; 2026-02-21T09:21:00.3967335Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.3967419Z add.s32 %r10115, %r22519, -65536; 2026-02-21T09:21:00.3967490Z cvt.s64.s32 %rd279, %r10115; 2026-02-21T09:21:00.3967561Z add.s64 %rd270, %rd45, %rd279; 2026-02-21T09:21:00.3967780Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3967847Z shl.b32 %r10116, %r22521, 11; 2026-02-21T09:21:00.3967912Z add.s32 %r10046, %r54, %r10116; 2026-02-21T09:21:00.3967976Z // begin inline asm 2026-02-21T09:21:00.3968139Z cp.async.ca.shared.global [ %r10046 + 0 ], [ %rd270 + 0 ], 0x8, %r10039; 2026-02-21T09:21:00.3968199Z // end inline asm 2026-02-21T09:21:00.3968270Z cp.async.commit_group; 2026-02-21T09:21:00.3968485Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.3968552Z add.s64 %rd271, %rd276, 352; 2026-02-21T09:21:00.3968622Z add.s64 %rd272, %rd277, 352; 2026-02-21T09:21:00.3968692Z add.s64 %rd273, %rd278, 352; 2026-02-21T09:21:00.3968772Z mad.wide.s32 %rd274, %r22518, 2, %rd44; 2026-02-21T09:21:00.3968976Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.3969047Z add.s32 %r10117, %r6234, %r10113; 2026-02-21T09:21:00.3969122Z add.s32 %r10048, %r10117, %r47; 2026-02-21T09:21:00.3969185Z // begin inline asm 2026-02-21T09:21:00.3969333Z cp.async.ca.shared.global [ %r10048 + 0 ], [ %rd271 + 0 ], 0x8, %r10039; 
2026-02-21T09:21:00.3969403Z // end inline asm 2026-02-21T09:21:00.3969467Z add.s32 %r10050, %r10048, 2048; 2026-02-21T09:21:00.3969528Z // begin inline asm 2026-02-21T09:21:00.3969686Z cp.async.ca.shared.global [ %r10050 + 0 ], [ %rd272 + 0 ], 0x8, %r10039; 2026-02-21T09:21:00.3969760Z // end inline asm 2026-02-21T09:21:00.3969828Z add.s32 %r10052, %r10048, 4096; 2026-02-21T09:21:00.3969891Z // begin inline asm 2026-02-21T09:21:00.3970142Z cp.async.ca.shared.global [ %r10052 + 0 ], [ %rd273 + 0 ], 0x8, %r10039; 2026-02-21T09:21:00.3970205Z // end inline asm 2026-02-21T09:21:00.3970273Z add.s32 %r10054, %r10048, 6144; 2026-02-21T09:21:00.3970342Z // begin inline asm 2026-02-21T09:21:00.3970552Z cp.async.ca.shared.global [ %r10054 + 0 ], [ %rd274 + 0 ], 0x8, %r10039; 2026-02-21T09:21:00.3970612Z // end inline asm 2026-02-21T09:21:00.3970684Z cp.async.commit_group; 2026-02-21T09:21:00.3970915Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.3970987Z cvt.s64.s32 %rd280, %r22519; 2026-02-21T09:21:00.3971059Z add.s64 %rd275, %rd45, %rd280; 2026-02-21T09:21:00.3971355Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.3971429Z add.s32 %r10056, %r60, %r10116; 2026-02-21T09:21:00.3971494Z // begin inline asm 2026-02-21T09:21:00.3971654Z cp.async.ca.shared.global [ %r10056 + 0 ], [ %rd275 + 0 ], 0x8, %r10039; 2026-02-21T09:21:00.3971717Z // end inline asm 2026-02-21T09:21:00.3971789Z cp.async.commit_group; 2026-02-21T09:21:00.3972003Z .loc 1 43 78 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:43:78 2026-02-21T09:21:00.3972078Z add.s32 %r22519, %r22519, 131072; 2026-02-21T09:21:00.3972153Z add.s64 %rd716, %rd716, 64; 2026-02-21T09:21:00.3972220Z add.s32 %r22518, %r22518, 32; 2026-02-21T09:21:00.3972295Z setp.lt.u64 %p25, %rd717, 496; 2026-02-21T09:21:00.3972360Z @%p25 bra $L__BB0_5; 2026-02-21T09:21:00.3972549Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 
2026-02-21T09:21:00.3972770Z .loc 1 36 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:36:32 2026-02-21T09:21:00.3972840Z or.b32 %r10509, %r765, %r11; 2026-02-21T09:21:00.3972903Z or.b32 %r10510, %r765, %r12; 2026-02-21T09:21:00.3972966Z or.b32 %r10511, %r765, %r13; 2026-02-21T09:21:00.3973036Z or.b32 %r10512, %r765, %r14; 2026-02-21T09:21:00.3973097Z or.b32 %r10513, %r765, %r15; 2026-02-21T09:21:00.3973158Z or.b32 %r10514, %r765, %r16; 2026-02-21T09:21:00.3973223Z or.b32 %r10515, %r765, %r17; 2026-02-21T09:21:00.3973285Z or.b32 %r10516, %r765, %r18; 2026-02-21T09:21:00.3973348Z or.b32 %r10517, %r765, %r19; 2026-02-21T09:21:00.3973410Z or.b32 %r10518, %r765, %r20; 2026-02-21T09:21:00.3973476Z or.b32 %r10519, %r765, %r21; 2026-02-21T09:21:00.3973538Z or.b32 %r10520, %r765, %r22; 2026-02-21T09:21:00.3973600Z or.b32 %r10521, %r765, %r23; 2026-02-21T09:21:00.3973676Z or.b32 %r10522, %r765, %r24; 2026-02-21T09:21:00.3973745Z or.b32 %r10523, %r765, %r25; 2026-02-21T09:21:00.3973810Z or.b32 %r10524, %r765, %r26; 2026-02-21T09:21:00.3973871Z or.b32 %r10525, %r765, %r27; 2026-02-21T09:21:00.3973943Z or.b32 %r10526, %r765, %r28; 2026-02-21T09:21:00.3974008Z or.b32 %r10527, %r765, %r29; 2026-02-21T09:21:00.3974068Z or.b32 %r10528, %r765, %r30; 2026-02-21T09:21:00.3974147Z or.b32 %r10529, %r765, %r31; 2026-02-21T09:21:00.3974216Z or.b32 %r10530, %r765, %r32; 2026-02-21T09:21:00.3974278Z or.b32 %r10531, %r765, %r33; 2026-02-21T09:21:00.3974346Z or.b32 %r10532, %r765, %r34; 2026-02-21T09:21:00.3974410Z or.b32 %r10533, %r765, %r35; 2026-02-21T09:21:00.3974474Z or.b32 %r10534, %r765, %r36; 2026-02-21T09:21:00.3974539Z or.b32 %r10535, %r765, %r37; 2026-02-21T09:21:00.3974608Z or.b32 %r10536, %r765, %r38; 2026-02-21T09:21:00.3974668Z or.b32 %r10537, %r765, %r39; 2026-02-21T09:21:00.3974729Z or.b32 %r10538, %r765, %r40; 2026-02-21T09:21:00.3974800Z or.b32 %r10539, %r765, %r41; 2026-02-21T09:21:00.3974863Z or.b32 %r10540, %r765, %r42; 2026-02-21T09:21:00.3975089Z .loc 1 
43 78 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:43:78 2026-02-21T09:21:00.3975165Z cp.async.wait_group 0; 2026-02-21T09:21:00.3975235Z bar.sync 0; 2026-02-21T09:21:00.3975442Z .loc 1 90 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:90:28 2026-02-21T09:21:00.3975599Z cvt.rn.bf16x2.f32 %r10541, %r22523, %r22522; 2026-02-21T09:21:00.3975697Z cvt.rn.bf16x2.f32 %r10542, %r22525, %r22524; 2026-02-21T09:21:00.3975780Z cvt.rn.bf16x2.f32 %r10543, %r22527, %r22526; 2026-02-21T09:21:00.3975923Z cvt.rn.bf16x2.f32 %r10544, %r22529, %r22528; 2026-02-21T09:21:00.3976009Z cvt.rn.bf16x2.f32 %r10545, %r22531, %r22530; 2026-02-21T09:21:00.3976089Z cvt.rn.bf16x2.f32 %r10546, %r22533, %r22532; 2026-02-21T09:21:00.3976169Z cvt.rn.bf16x2.f32 %r10547, %r22535, %r22534; 2026-02-21T09:21:00.3976250Z cvt.rn.bf16x2.f32 %r10548, %r22537, %r22536; 2026-02-21T09:21:00.3976339Z cvt.rn.bf16x2.f32 %r10549, %r22539, %r22538; 2026-02-21T09:21:00.3976418Z cvt.rn.bf16x2.f32 %r10550, %r22541, %r22540; 2026-02-21T09:21:00.3976699Z cvt.rn.bf16x2.f32 %r10551, %r22543, %r22542; 2026-02-21T09:21:00.3976800Z cvt.rn.bf16x2.f32 %r10552, %r22545, %r22544; 2026-02-21T09:21:00.3976881Z cvt.rn.bf16x2.f32 %r10553, %r22547, %r22546; 2026-02-21T09:21:00.3976966Z cvt.rn.bf16x2.f32 %r10554, %r22549, %r22548; 2026-02-21T09:21:00.3977050Z cvt.rn.bf16x2.f32 %r10555, %r22551, %r22550; 2026-02-21T09:21:00.3977131Z cvt.rn.bf16x2.f32 %r10556, %r22553, %r22552; 2026-02-21T09:21:00.3977209Z cvt.rn.bf16x2.f32 %r10557, %r22555, %r22554; 2026-02-21T09:21:00.3977290Z cvt.rn.bf16x2.f32 %r10558, %r22557, %r22556; 2026-02-21T09:21:00.3977389Z cvt.rn.bf16x2.f32 %r10559, %r22559, %r22558; 2026-02-21T09:21:00.3977470Z cvt.rn.bf16x2.f32 %r10560, %r22561, %r22560; 2026-02-21T09:21:00.3977550Z cvt.rn.bf16x2.f32 %r10561, %r22563, %r22562; 2026-02-21T09:21:00.3977701Z cvt.rn.bf16x2.f32 %r10562, %r22565, %r22564; 2026-02-21T09:21:00.3977784Z cvt.rn.bf16x2.f32 %r10563, %r22567, %r22566; 
2026-02-21T09:21:00.3977865Z cvt.rn.bf16x2.f32 %r10564, %r22569, %r22568; 2026-02-21T09:21:00.3977949Z cvt.rn.bf16x2.f32 %r10565, %r22571, %r22570; 2026-02-21T09:21:00.3978028Z cvt.rn.bf16x2.f32 %r10566, %r22573, %r22572; 2026-02-21T09:21:00.3978105Z cvt.rn.bf16x2.f32 %r10567, %r22575, %r22574; 2026-02-21T09:21:00.3978185Z cvt.rn.bf16x2.f32 %r10568, %r22577, %r22576; 2026-02-21T09:21:00.3978272Z cvt.rn.bf16x2.f32 %r10569, %r22579, %r22578; 2026-02-21T09:21:00.3978350Z cvt.rn.bf16x2.f32 %r10570, %r22581, %r22580; 2026-02-21T09:21:00.3978430Z cvt.rn.bf16x2.f32 %r10571, %r22583, %r22582; 2026-02-21T09:21:00.3978521Z cvt.rn.bf16x2.f32 %r10572, %r22585, %r22584; 2026-02-21T09:21:00.3978609Z cvt.rn.bf16x2.f32 %r10573, %r22587, %r22586; 2026-02-21T09:21:00.3978690Z cvt.rn.bf16x2.f32 %r10574, %r22589, %r22588; 2026-02-21T09:21:00.3978778Z cvt.rn.bf16x2.f32 %r10575, %r22591, %r22590; 2026-02-21T09:21:00.3978856Z cvt.rn.bf16x2.f32 %r10576, %r22593, %r22592; 2026-02-21T09:21:00.3978935Z cvt.rn.bf16x2.f32 %r10577, %r22595, %r22594; 2026-02-21T09:21:00.3979015Z cvt.rn.bf16x2.f32 %r10578, %r22597, %r22596; 2026-02-21T09:21:00.3979101Z cvt.rn.bf16x2.f32 %r10579, %r22599, %r22598; 2026-02-21T09:21:00.3979181Z cvt.rn.bf16x2.f32 %r10580, %r22601, %r22600; 2026-02-21T09:21:00.3979261Z cvt.rn.bf16x2.f32 %r10581, %r22603, %r22602; 2026-02-21T09:21:00.3979345Z cvt.rn.bf16x2.f32 %r10582, %r22605, %r22604; 2026-02-21T09:21:00.3979424Z cvt.rn.bf16x2.f32 %r10583, %r22607, %r22606; 2026-02-21T09:21:00.3979501Z cvt.rn.bf16x2.f32 %r10584, %r22609, %r22608; 2026-02-21T09:21:00.3979588Z cvt.rn.bf16x2.f32 %r10585, %r22611, %r22610; 2026-02-21T09:21:00.3979666Z cvt.rn.bf16x2.f32 %r10586, %r22613, %r22612; 2026-02-21T09:21:00.3979744Z cvt.rn.bf16x2.f32 %r10587, %r22615, %r22614; 2026-02-21T09:21:00.3979825Z cvt.rn.bf16x2.f32 %r10588, %r22617, %r22616; 2026-02-21T09:21:00.3979910Z cvt.rn.bf16x2.f32 %r10589, %r22619, %r22618; 2026-02-21T09:21:00.3979990Z cvt.rn.bf16x2.f32 %r10590, %r22621, %r22620; 
2026-02-21T09:21:00.3980070Z cvt.rn.bf16x2.f32 %r10591, %r22623, %r22622; 2026-02-21T09:21:00.3980154Z cvt.rn.bf16x2.f32 %r10592, %r22625, %r22624; 2026-02-21T09:21:00.3980235Z cvt.rn.bf16x2.f32 %r10593, %r22627, %r22626; 2026-02-21T09:21:00.3980313Z cvt.rn.bf16x2.f32 %r10594, %r22629, %r22628; 2026-02-21T09:21:00.3980466Z cvt.rn.bf16x2.f32 %r10595, %r22631, %r22630; 2026-02-21T09:21:00.3980554Z cvt.rn.bf16x2.f32 %r10596, %r22633, %r22632; 2026-02-21T09:21:00.3980634Z cvt.rn.bf16x2.f32 %r10597, %r22635, %r22634; 2026-02-21T09:21:00.3980775Z cvt.rn.bf16x2.f32 %r10598, %r22637, %r22636; 2026-02-21T09:21:00.3980862Z cvt.rn.bf16x2.f32 %r10599, %r22639, %r22638; 2026-02-21T09:21:00.3980942Z cvt.rn.bf16x2.f32 %r10600, %r22641, %r22640; 2026-02-21T09:21:00.3981025Z cvt.rn.bf16x2.f32 %r10601, %r22643, %r22642; 2026-02-21T09:21:00.3981111Z cvt.rn.bf16x2.f32 %r10602, %r22645, %r22644; 2026-02-21T09:21:00.3981190Z cvt.rn.bf16x2.f32 %r10603, %r22647, %r22646; 2026-02-21T09:21:00.3981268Z cvt.rn.bf16x2.f32 %r10604, %r22649, %r22648; 2026-02-21T09:21:00.3981393Z cvt.rn.bf16x2.f32 %r10605, %r22651, %r22650; 2026-02-21T09:21:00.3981480Z cvt.rn.bf16x2.f32 %r10606, %r22653, %r22652; 2026-02-21T09:21:00.3981556Z cvt.rn.bf16x2.f32 %r10607, %r22655, %r22654; 2026-02-21T09:21:00.3981636Z cvt.rn.bf16x2.f32 %r10608, %r22657, %r22656; 2026-02-21T09:21:00.3981725Z cvt.rn.bf16x2.f32 %r10609, %r22659, %r22658; 2026-02-21T09:21:00.3981803Z cvt.rn.bf16x2.f32 %r10610, %r22661, %r22660; 2026-02-21T09:21:00.3981884Z cvt.rn.bf16x2.f32 %r10611, %r22663, %r22662; 2026-02-21T09:21:00.3981969Z cvt.rn.bf16x2.f32 %r10612, %r22665, %r22664; 2026-02-21T09:21:00.3982046Z cvt.rn.bf16x2.f32 %r10613, %r22667, %r22666; 2026-02-21T09:21:00.3982125Z cvt.rn.bf16x2.f32 %r10614, %r22669, %r22668; 2026-02-21T09:21:00.3982250Z cvt.rn.bf16x2.f32 %r10615, %r22671, %r22670; 2026-02-21T09:21:00.3982340Z cvt.rn.bf16x2.f32 %r10616, %r22673, %r22672; 2026-02-21T09:21:00.3982423Z cvt.rn.bf16x2.f32 %r10617, %r22675, %r22674; 
2026-02-21T09:21:00.3982502Z cvt.rn.bf16x2.f32 %r10618, %r22677, %r22676; 2026-02-21T09:21:00.3982600Z cvt.rn.bf16x2.f32 %r10619, %r22679, %r22678; 2026-02-21T09:21:00.3982679Z cvt.rn.bf16x2.f32 %r10620, %r22681, %r22680; 2026-02-21T09:21:00.3982763Z cvt.rn.bf16x2.f32 %r10621, %r22683, %r22682; 2026-02-21T09:21:00.3982861Z cvt.rn.bf16x2.f32 %r10622, %r22685, %r22684; 2026-02-21T09:21:00.3982944Z cvt.rn.bf16x2.f32 %r10623, %r22687, %r22686; 2026-02-21T09:21:00.3983022Z cvt.rn.bf16x2.f32 %r10624, %r22689, %r22688; 2026-02-21T09:21:00.3983103Z cvt.rn.bf16x2.f32 %r10625, %r22691, %r22690; 2026-02-21T09:21:00.3983191Z cvt.rn.bf16x2.f32 %r10626, %r22693, %r22692; 2026-02-21T09:21:00.3983271Z cvt.rn.bf16x2.f32 %r10627, %r22695, %r22694; 2026-02-21T09:21:00.3983348Z cvt.rn.bf16x2.f32 %r10628, %r22697, %r22696; 2026-02-21T09:21:00.3983434Z cvt.rn.bf16x2.f32 %r10629, %r22699, %r22698; 2026-02-21T09:21:00.3983513Z cvt.rn.bf16x2.f32 %r10630, %r22701, %r22700; 2026-02-21T09:21:00.3983595Z cvt.rn.bf16x2.f32 %r10631, %r22703, %r22702; 2026-02-21T09:21:00.3983684Z cvt.rn.bf16x2.f32 %r10632, %r22705, %r22704; 2026-02-21T09:21:00.3983762Z cvt.rn.bf16x2.f32 %r10633, %r22707, %r22706; 2026-02-21T09:21:00.3983842Z cvt.rn.bf16x2.f32 %r10634, %r22709, %r22708; 2026-02-21T09:21:00.3983923Z cvt.rn.bf16x2.f32 %r10635, %r22711, %r22710; 2026-02-21T09:21:00.3984009Z cvt.rn.bf16x2.f32 %r10636, %r22713, %r22712; 2026-02-21T09:21:00.3984086Z cvt.rn.bf16x2.f32 %r10637, %r22715, %r22714; 2026-02-21T09:21:00.3984165Z cvt.rn.bf16x2.f32 %r10638, %r22717, %r22716; 2026-02-21T09:21:00.3984254Z cvt.rn.bf16x2.f32 %r10639, %r22719, %r22718; 2026-02-21T09:21:00.3984332Z cvt.rn.bf16x2.f32 %r10640, %r22721, %r22720; 2026-02-21T09:21:00.3984413Z cvt.rn.bf16x2.f32 %r10641, %r22723, %r22722; 2026-02-21T09:21:00.3984500Z cvt.rn.bf16x2.f32 %r10642, %r22725, %r22724; 2026-02-21T09:21:00.3984579Z cvt.rn.bf16x2.f32 %r10643, %r22727, %r22726; 2026-02-21T09:21:00.3984658Z cvt.rn.bf16x2.f32 %r10644, %r22729, %r22728; 
2026-02-21T09:21:00.3984736Z cvt.rn.bf16x2.f32 %r10645, %r22731, %r22730; 2026-02-21T09:21:00.3984821Z cvt.rn.bf16x2.f32 %r10646, %r22733, %r22732; 2026-02-21T09:21:00.3984900Z cvt.rn.bf16x2.f32 %r10647, %r22735, %r22734; 2026-02-21T09:21:00.3985040Z cvt.rn.bf16x2.f32 %r10648, %r22737, %r22736; 2026-02-21T09:21:00.3985126Z cvt.rn.bf16x2.f32 %r10649, %r22739, %r22738; 2026-02-21T09:21:00.3985206Z cvt.rn.bf16x2.f32 %r10650, %r22741, %r22740; 2026-02-21T09:21:00.3985283Z cvt.rn.bf16x2.f32 %r10651, %r22743, %r22742; 2026-02-21T09:21:00.3985409Z cvt.rn.bf16x2.f32 %r10652, %r22745, %r22744; 2026-02-21T09:21:00.3985488Z cvt.rn.bf16x2.f32 %r10653, %r22747, %r22746; 2026-02-21T09:21:00.3985564Z cvt.rn.bf16x2.f32 %r10654, %r22749, %r22748; 2026-02-21T09:21:00.3985640Z cvt.rn.bf16x2.f32 %r10655, %r22751, %r22750; 2026-02-21T09:21:00.3985725Z cvt.rn.bf16x2.f32 %r10656, %r22753, %r22752; 2026-02-21T09:21:00.3985801Z cvt.rn.bf16x2.f32 %r10657, %r22755, %r22754; 2026-02-21T09:21:00.3985878Z cvt.rn.bf16x2.f32 %r10658, %r22757, %r22756; 2026-02-21T09:21:00.3986006Z cvt.rn.bf16x2.f32 %r10659, %r22759, %r22758; 2026-02-21T09:21:00.3986086Z cvt.rn.bf16x2.f32 %r10660, %r22761, %r22760; 2026-02-21T09:21:00.3986162Z cvt.rn.bf16x2.f32 %r10661, %r22763, %r22762; 2026-02-21T09:21:00.3986240Z cvt.rn.bf16x2.f32 %r10662, %r22765, %r22764; 2026-02-21T09:21:00.3986321Z cvt.rn.bf16x2.f32 %r10663, %r22767, %r22766; 2026-02-21T09:21:00.3986397Z cvt.rn.bf16x2.f32 %r10664, %r22769, %r22768; 2026-02-21T09:21:00.3986592Z cvt.rn.bf16x2.f32 %r10665, %r22771, %r22770; 2026-02-21T09:21:00.3986684Z cvt.rn.bf16x2.f32 %r10666, %r22773, %r22772; 2026-02-21T09:21:00.3986765Z cvt.rn.bf16x2.f32 %r10667, %r22775, %r22774; 2026-02-21T09:21:00.3986842Z cvt.rn.bf16x2.f32 %r10668, %r22777, %r22776; 2026-02-21T09:21:00.3987134Z .loc 1 91 43 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:91:43 2026-02-21T09:21:00.3987205Z shl.b32 %r10669, %r10509, 13; 2026-02-21T09:21:00.3987268Z shl.b32 %r10670, %r10510, 
13; 2026-02-21T09:21:00.3987331Z shl.b32 %r10671, %r10511, 13; 2026-02-21T09:21:00.3987397Z shl.b32 %r10672, %r10512, 13; 2026-02-21T09:21:00.3987457Z shl.b32 %r10673, %r10513, 13; 2026-02-21T09:21:00.3987517Z shl.b32 %r10674, %r10514, 13; 2026-02-21T09:21:00.3987585Z shl.b32 %r10675, %r10515, 13; 2026-02-21T09:21:00.3987644Z shl.b32 %r10676, %r10516, 13; 2026-02-21T09:21:00.3987707Z shl.b32 %r10677, %r10517, 13; 2026-02-21T09:21:00.3987773Z shl.b32 %r10678, %r10518, 13; 2026-02-21T09:21:00.3987834Z shl.b32 %r10679, %r10519, 13; 2026-02-21T09:21:00.3987898Z shl.b32 %r10680, %r10520, 13; 2026-02-21T09:21:00.3987957Z shl.b32 %r10681, %r10521, 13; 2026-02-21T09:21:00.3988025Z shl.b32 %r10682, %r10522, 13; 2026-02-21T09:21:00.3988086Z shl.b32 %r10683, %r10523, 13; 2026-02-21T09:21:00.3988147Z shl.b32 %r10684, %r10524, 13; 2026-02-21T09:21:00.3988213Z shl.b32 %r10685, %r10525, 13; 2026-02-21T09:21:00.3988275Z shl.b32 %r10686, %r10526, 13; 2026-02-21T09:21:00.3988402Z shl.b32 %r10687, %r10527, 13; 2026-02-21T09:21:00.3988467Z shl.b32 %r10688, %r10528, 13; 2026-02-21T09:21:00.3988534Z shl.b32 %r10689, %r10529, 13; 2026-02-21T09:21:00.3988597Z shl.b32 %r10690, %r10530, 13; 2026-02-21T09:21:00.3988659Z shl.b32 %r10691, %r10531, 13; 2026-02-21T09:21:00.3988726Z shl.b32 %r10692, %r10532, 13; 2026-02-21T09:21:00.3988791Z shl.b32 %r10693, %r10533, 13; 2026-02-21T09:21:00.3988851Z shl.b32 %r10694, %r10534, 13; 2026-02-21T09:21:00.3988912Z shl.b32 %r10695, %r10535, 13; 2026-02-21T09:21:00.3988982Z shl.b32 %r10696, %r10536, 13; 2026-02-21T09:21:00.3989040Z shl.b32 %r10697, %r10537, 13; 2026-02-21T09:21:00.3989100Z shl.b32 %r10698, %r10538, 13; 2026-02-21T09:21:00.3989164Z shl.b32 %r10699, %r10539, 13; 2026-02-21T09:21:00.3989225Z shl.b32 %r10700, %r10540, 13; 2026-02-21T09:21:00.3989434Z .loc 1 91 50 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:91:50 2026-02-21T09:21:00.3989512Z add.s32 %r10701, %r10669, %r764; 2026-02-21T09:21:00.3989576Z add.s32 %r10702, %r10670, 
%r764; 2026-02-21T09:21:00.3989641Z add.s32 %r10703, %r10671, %r764; 2026-02-21T09:21:00.3989713Z add.s32 %r10704, %r10672, %r764; 2026-02-21T09:21:00.3989783Z add.s32 %r10705, %r10673, %r764; 2026-02-21T09:21:00.3989926Z add.s32 %r10706, %r10674, %r764; 2026-02-21T09:21:00.3989992Z add.s32 %r10707, %r10675, %r764; 2026-02-21T09:21:00.3990058Z add.s32 %r10708, %r10676, %r764; 2026-02-21T09:21:00.3990119Z add.s32 %r10709, %r10677, %r764; 2026-02-21T09:21:00.3990243Z add.s32 %r10710, %r10678, %r764; 2026-02-21T09:21:00.3990309Z add.s32 %r10711, %r10679, %r764; 2026-02-21T09:21:00.3990376Z add.s32 %r10712, %r10680, %r764; 2026-02-21T09:21:00.3990438Z add.s32 %r10713, %r10681, %r764; 2026-02-21T09:21:00.3990502Z add.s32 %r10714, %r10682, %r764; 2026-02-21T09:21:00.3990571Z add.s32 %r10715, %r10683, %r764; 2026-02-21T09:21:00.3990633Z add.s32 %r10716, %r10684, %r764; 2026-02-21T09:21:00.3990695Z add.s32 %r10717, %r10685, %r764; 2026-02-21T09:21:00.3990826Z add.s32 %r10718, %r10686, %r764; 2026-02-21T09:21:00.3990891Z add.s32 %r10719, %r10687, %r764; 2026-02-21T09:21:00.3990953Z add.s32 %r10720, %r10688, %r764; 2026-02-21T09:21:00.3991014Z add.s32 %r10721, %r10689, %r764; 2026-02-21T09:21:00.3991081Z add.s32 %r10722, %r10690, %r764; 2026-02-21T09:21:00.3991143Z add.s32 %r10723, %r10691, %r764; 2026-02-21T09:21:00.3991205Z add.s32 %r10724, %r10692, %r764; 2026-02-21T09:21:00.3991276Z add.s32 %r10725, %r10693, %r764; 2026-02-21T09:21:00.3991339Z add.s32 %r10726, %r10694, %r764; 2026-02-21T09:21:00.3991401Z add.s32 %r10727, %r10695, %r764; 2026-02-21T09:21:00.3991464Z add.s32 %r10728, %r10696, %r764; 2026-02-21T09:21:00.3991531Z add.s32 %r10729, %r10697, %r764; 2026-02-21T09:21:00.3991593Z add.s32 %r10730, %r10698, %r764; 2026-02-21T09:21:00.3991704Z add.s32 %r10731, %r10699, %r764; 2026-02-21T09:21:00.3991776Z add.s32 %r10732, %r10700, %r764; 2026-02-21T09:21:00.3991980Z .loc 1 91 22 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:91:22 
2026-02-21T09:21:00.3992056Z mad.wide.s32 %rd281, %r10701, 2, %rd46; 2026-02-21T09:21:00.3992138Z mad.wide.s32 %rd282, %r10702, 2, %rd46; 2026-02-21T09:21:00.3992207Z mad.wide.s32 %rd283, %r10703, 2, %rd46; 2026-02-21T09:21:00.3992278Z mad.wide.s32 %rd284, %r10704, 2, %rd46; 2026-02-21T09:21:00.3992351Z mad.wide.s32 %rd285, %r10705, 2, %rd46; 2026-02-21T09:21:00.3992420Z mad.wide.s32 %rd286, %r10706, 2, %rd46; 2026-02-21T09:21:00.3992489Z mad.wide.s32 %rd287, %r10707, 2, %rd46; 2026-02-21T09:21:00.3992570Z mad.wide.s32 %rd288, %r10708, 2, %rd46; 2026-02-21T09:21:00.3992646Z mad.wide.s32 %rd289, %r10709, 2, %rd46; 2026-02-21T09:21:00.3992715Z mad.wide.s32 %rd290, %r10710, 2, %rd46; 2026-02-21T09:21:00.3992785Z mad.wide.s32 %rd291, %r10711, 2, %rd46; 2026-02-21T09:21:00.3992858Z mad.wide.s32 %rd292, %r10712, 2, %rd46; 2026-02-21T09:21:00.3992928Z mad.wide.s32 %rd293, %r10713, 2, %rd46; 2026-02-21T09:21:00.3992996Z mad.wide.s32 %rd294, %r10714, 2, %rd46; 2026-02-21T09:21:00.3993071Z mad.wide.s32 %rd295, %r10715, 2, %rd46; 2026-02-21T09:21:00.3993140Z mad.wide.s32 %rd296, %r10716, 2, %rd46; 2026-02-21T09:21:00.3993207Z mad.wide.s32 %rd297, %r10717, 2, %rd46; 2026-02-21T09:21:00.3993277Z mad.wide.s32 %rd298, %r10718, 2, %rd46; 2026-02-21T09:21:00.3993352Z mad.wide.s32 %rd299, %r10719, 2, %rd46; 2026-02-21T09:21:00.3993420Z mad.wide.s32 %rd300, %r10720, 2, %rd46; 2026-02-21T09:21:00.3993487Z mad.wide.s32 %rd301, %r10721, 2, %rd46; 2026-02-21T09:21:00.3993560Z mad.wide.s32 %rd302, %r10722, 2, %rd46; 2026-02-21T09:21:00.3993628Z mad.wide.s32 %rd303, %r10723, 2, %rd46; 2026-02-21T09:21:00.3993697Z mad.wide.s32 %rd304, %r10724, 2, %rd46; 2026-02-21T09:21:00.3993767Z mad.wide.s32 %rd305, %r10725, 2, %rd46; 2026-02-21T09:21:00.3993842Z mad.wide.s32 %rd306, %r10726, 2, %rd46; 2026-02-21T09:21:00.3993911Z mad.wide.s32 %rd307, %r10727, 2, %rd46; 2026-02-21T09:21:00.3993978Z mad.wide.s32 %rd308, %r10728, 2, %rd46; 2026-02-21T09:21:00.3994054Z mad.wide.s32 %rd309, %r10729, 2, 
%rd46; 2026-02-21T09:21:00.3994122Z mad.wide.s32 %rd310, %r10730, 2, %rd46; 2026-02-21T09:21:00.3994189Z mad.wide.s32 %rd311, %r10731, 2, %rd46; 2026-02-21T09:21:00.3994342Z mad.wide.s32 %rd312, %r10732, 2, %rd46; 2026-02-21T09:21:00.3994543Z .loc 1 91 81 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:91:81 2026-02-21T09:21:00.3994671Z st.shared.v4.b32 [%r117], {%r10541, %r10543, %r10545, %r10547}; 2026-02-21T09:21:00.3994833Z st.shared.v4.b32 [%r118], {%r10549, %r10551, %r10553, %r10555}; 2026-02-21T09:21:00.3994948Z st.shared.v4.b32 [%r119], {%r10557, %r10559, %r10561, %r10563}; 2026-02-21T09:21:00.3995070Z st.shared.v4.b32 [%r120], {%r10565, %r10567, %r10569, %r10571}; 2026-02-21T09:21:00.3995184Z st.shared.v4.b32 [%r121], {%r10573, %r10575, %r10577, %r10579}; 2026-02-21T09:21:00.3995298Z st.shared.v4.b32 [%r122], {%r10581, %r10583, %r10585, %r10587}; 2026-02-21T09:21:00.3995457Z st.shared.v4.b32 [%r123], {%r10589, %r10591, %r10593, %r10595}; 2026-02-21T09:21:00.3995569Z st.shared.v4.b32 [%r124], {%r10597, %r10599, %r10601, %r10603}; 2026-02-21T09:21:00.3995634Z bar.sync 0; 2026-02-21T09:21:00.3995698Z // begin inline asm 2026-02-21T09:21:00.3995908Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10278, %r10279, %r10280, %r10281}, [%r6269]; 2026-02-21T09:21:00.3995972Z // end inline asm 2026-02-21T09:21:00.3996033Z // begin inline asm 2026-02-21T09:21:00.3996228Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10286, %r10287, %r10288, %r10289}, [%r6274]; 2026-02-21T09:21:00.3996288Z // end inline asm 2026-02-21T09:21:00.3996351Z // begin inline asm 2026-02-21T09:21:00.3996676Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10294, %r10295, %r10296, %r10297}, [%r6279]; 2026-02-21T09:21:00.3996739Z // end inline asm 2026-02-21T09:21:00.3996873Z // begin inline asm 2026-02-21T09:21:00.3997067Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10302, %r10303, %r10304, %r10305}, [%r6284]; 2026-02-21T09:21:00.3997125Z // end inline asm 2026-02-21T09:21:00.3997186Z // begin 
inline asm 2026-02-21T09:21:00.3997379Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10310, %r10311, %r10312, %r10313}, [%r6289]; 2026-02-21T09:21:00.3997439Z // end inline asm 2026-02-21T09:21:00.3997510Z // begin inline asm 2026-02-21T09:21:00.3997706Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10318, %r10319, %r10320, %r10321}, [%r6294]; 2026-02-21T09:21:00.3997764Z // end inline asm 2026-02-21T09:21:00.3997823Z // begin inline asm 2026-02-21T09:21:00.3998016Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10326, %r10327, %r10328, %r10329}, [%r6299]; 2026-02-21T09:21:00.3998084Z // end inline asm 2026-02-21T09:21:00.3998145Z // begin inline asm 2026-02-21T09:21:00.3998336Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10334, %r10335, %r10336, %r10337}, [%r6304]; 2026-02-21T09:21:00.3998397Z // end inline asm 2026-02-21T09:21:00.3998454Z bar.sync 0; 2026-02-21T09:21:00.3998568Z st.shared.v4.b32 [%r117], {%r10542, %r10544, %r10546, %r10548}; 2026-02-21T09:21:00.3998687Z st.shared.v4.b32 [%r118], {%r10550, %r10552, %r10554, %r10556}; 2026-02-21T09:21:00.3998797Z st.shared.v4.b32 [%r119], {%r10558, %r10560, %r10562, %r10564}; 2026-02-21T09:21:00.3998907Z st.shared.v4.b32 [%r120], {%r10566, %r10568, %r10570, %r10572}; 2026-02-21T09:21:00.3999019Z st.shared.v4.b32 [%r121], {%r10574, %r10576, %r10578, %r10580}; 2026-02-21T09:21:00.3999128Z st.shared.v4.b32 [%r122], {%r10582, %r10584, %r10586, %r10588}; 2026-02-21T09:21:00.3999238Z st.shared.v4.b32 [%r123], {%r10590, %r10592, %r10594, %r10596}; 2026-02-21T09:21:00.3999347Z st.shared.v4.b32 [%r124], {%r10598, %r10600, %r10602, %r10604}; 2026-02-21T09:21:00.3999413Z bar.sync 0; 2026-02-21T09:21:00.3999472Z // begin inline asm 2026-02-21T09:21:00.3999664Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10282, %r10283, %r10284, %r10285}, [%r6269]; 2026-02-21T09:21:00.3999727Z // end inline asm 2026-02-21T09:21:00.3999786Z // begin inline asm 2026-02-21T09:21:00.3999974Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10290, %r10291, 
%r10292, %r10293}, [%r6274]; 2026-02-21T09:21:00.4000037Z // end inline asm 2026-02-21T09:21:00.4000096Z // begin inline asm 2026-02-21T09:21:00.4000281Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10298, %r10299, %r10300, %r10301}, [%r6279]; 2026-02-21T09:21:00.4000426Z // end inline asm 2026-02-21T09:21:00.4000490Z // begin inline asm 2026-02-21T09:21:00.4000679Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10306, %r10307, %r10308, %r10309}, [%r6284]; 2026-02-21T09:21:00.4000796Z // end inline asm 2026-02-21T09:21:00.4000860Z // begin inline asm 2026-02-21T09:21:00.4001046Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10314, %r10315, %r10316, %r10317}, [%r6289]; 2026-02-21T09:21:00.4001104Z // end inline asm 2026-02-21T09:21:00.4001167Z // begin inline asm 2026-02-21T09:21:00.4001356Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10322, %r10323, %r10324, %r10325}, [%r6294]; 2026-02-21T09:21:00.4001421Z // end inline asm 2026-02-21T09:21:00.4001481Z // begin inline asm 2026-02-21T09:21:00.4001737Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10330, %r10331, %r10332, %r10333}, [%r6299]; 2026-02-21T09:21:00.4001797Z // end inline asm 2026-02-21T09:21:00.4001860Z // begin inline asm 2026-02-21T09:21:00.4002052Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10338, %r10339, %r10340, %r10341}, [%r6304]; 2026-02-21T09:21:00.4002107Z // end inline asm 2026-02-21T09:21:00.4002165Z bar.sync 0; 2026-02-21T09:21:00.4002283Z st.shared.v4.b32 [%r117], {%r10605, %r10607, %r10609, %r10611}; 2026-02-21T09:21:00.4002394Z st.shared.v4.b32 [%r118], {%r10613, %r10615, %r10617, %r10619}; 2026-02-21T09:21:00.4002504Z st.shared.v4.b32 [%r119], {%r10621, %r10623, %r10625, %r10627}; 2026-02-21T09:21:00.4002613Z st.shared.v4.b32 [%r120], {%r10629, %r10631, %r10633, %r10635}; 2026-02-21T09:21:00.4002773Z st.shared.v4.b32 [%r121], {%r10637, %r10639, %r10641, %r10643}; 2026-02-21T09:21:00.4002884Z st.shared.v4.b32 [%r122], {%r10645, %r10647, %r10649, %r10651}; 2026-02-21T09:21:00.4003008Z 
st.shared.v4.b32 [%r123], {%r10653, %r10655, %r10657, %r10659}; 2026-02-21T09:21:00.4003119Z st.shared.v4.b32 [%r124], {%r10661, %r10663, %r10665, %r10667}; 2026-02-21T09:21:00.4003174Z bar.sync 0; 2026-02-21T09:21:00.4003236Z // begin inline asm 2026-02-21T09:21:00.4003424Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10342, %r10343, %r10344, %r10345}, [%r6269]; 2026-02-21T09:21:00.4003481Z // end inline asm 2026-02-21T09:21:00.4003542Z // begin inline asm 2026-02-21T09:21:00.4003730Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10350, %r10351, %r10352, %r10353}, [%r6274]; 2026-02-21T09:21:00.4003786Z // end inline asm 2026-02-21T09:21:00.4003843Z // begin inline asm 2026-02-21T09:21:00.4004031Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10358, %r10359, %r10360, %r10361}, [%r6279]; 2026-02-21T09:21:00.4004088Z // end inline asm 2026-02-21T09:21:00.4004146Z // begin inline asm 2026-02-21T09:21:00.4004334Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10366, %r10367, %r10368, %r10369}, [%r6284]; 2026-02-21T09:21:00.4004393Z // end inline asm 2026-02-21T09:21:00.4004452Z // begin inline asm 2026-02-21T09:21:00.4004637Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10374, %r10375, %r10376, %r10377}, [%r6289]; 2026-02-21T09:21:00.4004699Z // end inline asm 2026-02-21T09:21:00.4004757Z // begin inline asm 2026-02-21T09:21:00.4004942Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10382, %r10383, %r10384, %r10385}, [%r6294]; 2026-02-21T09:21:00.4005004Z // end inline asm 2026-02-21T09:21:00.4005064Z // begin inline asm 2026-02-21T09:21:00.4005249Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10390, %r10391, %r10392, %r10393}, [%r6299]; 2026-02-21T09:21:00.4005308Z // end inline asm 2026-02-21T09:21:00.4005365Z // begin inline asm 2026-02-21T09:21:00.4005550Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10398, %r10399, %r10400, %r10401}, [%r6304]; 2026-02-21T09:21:00.4005604Z // end inline asm 2026-02-21T09:21:00.4005664Z bar.sync 0; 2026-02-21T09:21:00.4005774Z st.shared.v4.b32 
[%r117], {%r10606, %r10608, %r10610, %r10612}; 2026-02-21T09:21:00.4005883Z st.shared.v4.b32 [%r118], {%r10614, %r10616, %r10618, %r10620}; 2026-02-21T09:21:00.4005993Z st.shared.v4.b32 [%r119], {%r10622, %r10624, %r10626, %r10628}; 2026-02-21T09:21:00.4006157Z st.shared.v4.b32 [%r120], {%r10630, %r10632, %r10634, %r10636}; 2026-02-21T09:21:00.4006265Z st.shared.v4.b32 [%r121], {%r10638, %r10640, %r10642, %r10644}; 2026-02-21T09:21:00.4006373Z st.shared.v4.b32 [%r122], {%r10646, %r10648, %r10650, %r10652}; 2026-02-21T09:21:00.4006781Z st.shared.v4.b32 [%r123], {%r10654, %r10656, %r10658, %r10660}; 2026-02-21T09:21:00.4006894Z st.shared.v4.b32 [%r124], {%r10662, %r10664, %r10666, %r10668}; 2026-02-21T09:21:00.4006949Z bar.sync 0; 2026-02-21T09:21:00.4007015Z // begin inline asm 2026-02-21T09:21:00.4007208Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10346, %r10347, %r10348, %r10349}, [%r6269]; 2026-02-21T09:21:00.4007264Z // end inline asm 2026-02-21T09:21:00.4007325Z // begin inline asm 2026-02-21T09:21:00.4007594Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10354, %r10355, %r10356, %r10357}, [%r6274]; 2026-02-21T09:21:00.4007657Z // end inline asm 2026-02-21T09:21:00.4007719Z // begin inline asm 2026-02-21T09:21:00.4007911Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10362, %r10363, %r10364, %r10365}, [%r6279]; 2026-02-21T09:21:00.4007967Z // end inline asm 2026-02-21T09:21:00.4008028Z // begin inline asm 2026-02-21T09:21:00.4008216Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10370, %r10371, %r10372, %r10373}, [%r6284]; 2026-02-21T09:21:00.4008274Z // end inline asm 2026-02-21T09:21:00.4008332Z // begin inline asm 2026-02-21T09:21:00.4008518Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10378, %r10379, %r10380, %r10381}, [%r6289]; 2026-02-21T09:21:00.4008574Z // end inline asm 2026-02-21T09:21:00.4008714Z // begin inline asm 2026-02-21T09:21:00.4008906Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10386, %r10387, %r10388, %r10389}, [%r6294]; 
2026-02-21T09:21:00.4008962Z // end inline asm 2026-02-21T09:21:00.4009020Z // begin inline asm 2026-02-21T09:21:00.4009207Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10394, %r10395, %r10396, %r10397}, [%r6299]; 2026-02-21T09:21:00.4009265Z // end inline asm 2026-02-21T09:21:00.4009324Z // begin inline asm 2026-02-21T09:21:00.4009508Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10402, %r10403, %r10404, %r10405}, [%r6304]; 2026-02-21T09:21:00.4009578Z // end inline asm 2026-02-21T09:21:00.4009638Z // begin inline asm 2026-02-21T09:21:00.4009773Z st.global.v4.b32 [ %rd281 + 0 ], { %r10278, %r10279, %r10280, %r10281 }; 2026-02-21T09:21:00.4009830Z // end inline asm 2026-02-21T09:21:00.4009890Z // begin inline asm 2026-02-21T09:21:00.4010013Z st.global.v4.b32 [ %rd282 + 0 ], { %r10282, %r10283, %r10284, %r10285 }; 2026-02-21T09:21:00.4010075Z // end inline asm 2026-02-21T09:21:00.4010136Z // begin inline asm 2026-02-21T09:21:00.4010255Z st.global.v4.b32 [ %rd283 + 0 ], { %r10286, %r10287, %r10288, %r10289 }; 2026-02-21T09:21:00.4010311Z // end inline asm 2026-02-21T09:21:00.4010378Z // begin inline asm 2026-02-21T09:21:00.4010496Z st.global.v4.b32 [ %rd284 + 0 ], { %r10290, %r10291, %r10292, %r10293 }; 2026-02-21T09:21:00.4010551Z // end inline asm 2026-02-21T09:21:00.4010610Z // begin inline asm 2026-02-21T09:21:00.4010734Z st.global.v4.b32 [ %rd285 + 0 ], { %r10294, %r10295, %r10296, %r10297 }; 2026-02-21T09:21:00.4010790Z // end inline asm 2026-02-21T09:21:00.4010848Z // begin inline asm 2026-02-21T09:21:00.4010972Z st.global.v4.b32 [ %rd286 + 0 ], { %r10298, %r10299, %r10300, %r10301 }; 2026-02-21T09:21:00.4011028Z // end inline asm 2026-02-21T09:21:00.4011095Z // begin inline asm 2026-02-21T09:21:00.4011216Z st.global.v4.b32 [ %rd287 + 0 ], { %r10302, %r10303, %r10304, %r10305 }; 2026-02-21T09:21:00.4011278Z // end inline asm 2026-02-21T09:21:00.4011338Z // begin inline asm 2026-02-21T09:21:00.4011456Z st.global.v4.b32 [ %rd288 + 0 ], { %r10306, %r10307, 
%r10308, %r10309 }; 2026-02-21T09:21:00.4011515Z // end inline asm 2026-02-21T09:21:00.4011575Z // begin inline asm 2026-02-21T09:21:00.4011692Z st.global.v4.b32 [ %rd289 + 0 ], { %r10310, %r10311, %r10312, %r10313 }; 2026-02-21T09:21:00.4011752Z // end inline asm 2026-02-21T09:21:00.4011885Z // begin inline asm 2026-02-21T09:21:00.4012002Z st.global.v4.b32 [ %rd290 + 0 ], { %r10314, %r10315, %r10316, %r10317 }; 2026-02-21T09:21:00.4012058Z // end inline asm 2026-02-21T09:21:00.4012125Z // begin inline asm 2026-02-21T09:21:00.4012241Z st.global.v4.b32 [ %rd291 + 0 ], { %r10318, %r10319, %r10320, %r10321 }; 2026-02-21T09:21:00.4012356Z // end inline asm 2026-02-21T09:21:00.4012417Z // begin inline asm 2026-02-21T09:21:00.4012543Z st.global.v4.b32 [ %rd292 + 0 ], { %r10322, %r10323, %r10324, %r10325 }; 2026-02-21T09:21:00.4012602Z // end inline asm 2026-02-21T09:21:00.4012661Z // begin inline asm 2026-02-21T09:21:00.4012794Z st.global.v4.b32 [ %rd293 + 0 ], { %r10326, %r10327, %r10328, %r10329 }; 2026-02-21T09:21:00.4012850Z // end inline asm 2026-02-21T09:21:00.4012954Z // begin inline asm 2026-02-21T09:21:00.4013074Z st.global.v4.b32 [ %rd294 + 0 ], { %r10330, %r10331, %r10332, %r10333 }; 2026-02-21T09:21:00.4013131Z // end inline asm 2026-02-21T09:21:00.4013189Z // begin inline asm 2026-02-21T09:21:00.4013310Z st.global.v4.b32 [ %rd295 + 0 ], { %r10334, %r10335, %r10336, %r10337 }; 2026-02-21T09:21:00.4013365Z // end inline asm 2026-02-21T09:21:00.4013423Z // begin inline asm 2026-02-21T09:21:00.4013539Z st.global.v4.b32 [ %rd296 + 0 ], { %r10338, %r10339, %r10340, %r10341 }; 2026-02-21T09:21:00.4013601Z // end inline asm 2026-02-21T09:21:00.4013658Z // begin inline asm 2026-02-21T09:21:00.4013774Z st.global.v4.b32 [ %rd297 + 0 ], { %r10342, %r10343, %r10344, %r10345 }; 2026-02-21T09:21:00.4013831Z // end inline asm 2026-02-21T09:21:00.4013956Z // begin inline asm 2026-02-21T09:21:00.4014077Z st.global.v4.b32 [ %rd298 + 0 ], { %r10346, %r10347, %r10348, %r10349 }; 
2026-02-21T09:21:00.4014132Z // end inline asm 2026-02-21T09:21:00.4014193Z // begin inline asm 2026-02-21T09:21:00.4014311Z st.global.v4.b32 [ %rd299 + 0 ], { %r10350, %r10351, %r10352, %r10353 }; 2026-02-21T09:21:00.4014367Z // end inline asm 2026-02-21T09:21:00.4014425Z // begin inline asm 2026-02-21T09:21:00.4014543Z st.global.v4.b32 [ %rd300 + 0 ], { %r10354, %r10355, %r10356, %r10357 }; 2026-02-21T09:21:00.4014599Z // end inline asm 2026-02-21T09:21:00.4014657Z // begin inline asm 2026-02-21T09:21:00.4014773Z st.global.v4.b32 [ %rd301 + 0 ], { %r10358, %r10359, %r10360, %r10361 }; 2026-02-21T09:21:00.4014831Z // end inline asm 2026-02-21T09:21:00.4014890Z // begin inline asm 2026-02-21T09:21:00.4015009Z st.global.v4.b32 [ %rd302 + 0 ], { %r10362, %r10363, %r10364, %r10365 }; 2026-02-21T09:21:00.4015064Z // end inline asm 2026-02-21T09:21:00.4015123Z // begin inline asm 2026-02-21T09:21:00.4015245Z st.global.v4.b32 [ %rd303 + 0 ], { %r10366, %r10367, %r10368, %r10369 }; 2026-02-21T09:21:00.4015302Z // end inline asm 2026-02-21T09:21:00.4015359Z // begin inline asm 2026-02-21T09:21:00.4015477Z st.global.v4.b32 [ %rd304 + 0 ], { %r10370, %r10371, %r10372, %r10373 }; 2026-02-21T09:21:00.4015537Z // end inline asm 2026-02-21T09:21:00.4015594Z // begin inline asm 2026-02-21T09:21:00.4015714Z st.global.v4.b32 [ %rd305 + 0 ], { %r10374, %r10375, %r10376, %r10377 }; 2026-02-21T09:21:00.4015774Z // end inline asm 2026-02-21T09:21:00.4015831Z // begin inline asm 2026-02-21T09:21:00.4015951Z st.global.v4.b32 [ %rd306 + 0 ], { %r10378, %r10379, %r10380, %r10381 }; 2026-02-21T09:21:00.4016014Z // end inline asm 2026-02-21T09:21:00.4016072Z // begin inline asm 2026-02-21T09:21:00.4016188Z st.global.v4.b32 [ %rd307 + 0 ], { %r10382, %r10383, %r10384, %r10385 }; 2026-02-21T09:21:00.4016243Z // end inline asm 2026-02-21T09:21:00.4016304Z // begin inline asm 2026-02-21T09:21:00.4016422Z st.global.v4.b32 [ %rd308 + 0 ], { %r10386, %r10387, %r10388, %r10389 }; 
2026-02-21T09:21:00.4016611Z // end inline asm 2026-02-21T09:21:00.4016678Z // begin inline asm 2026-02-21T09:21:00.4016800Z st.global.v4.b32 [ %rd309 + 0 ], { %r10390, %r10391, %r10392, %r10393 }; 2026-02-21T09:21:00.4016855Z // end inline asm 2026-02-21T09:21:00.4016911Z // begin inline asm 2026-02-21T09:21:00.4017036Z st.global.v4.b32 [ %rd310 + 0 ], { %r10394, %r10395, %r10396, %r10397 }; 2026-02-21T09:21:00.4017174Z // end inline asm 2026-02-21T09:21:00.4017230Z // begin inline asm 2026-02-21T09:21:00.4017352Z st.global.v4.b32 [ %rd311 + 0 ], { %r10398, %r10399, %r10400, %r10401 }; 2026-02-21T09:21:00.4017471Z // end inline asm 2026-02-21T09:21:00.4017528Z // begin inline asm 2026-02-21T09:21:00.4017652Z st.global.v4.b32 [ %rd312 + 0 ], { %r10402, %r10403, %r10404, %r10405 }; 2026-02-21T09:21:00.4017708Z // end inline asm 2026-02-21T09:21:00.4017928Z .loc 1 22 121 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:22:121 2026-02-21T09:21:00.4017992Z add.s32 %r10733, %r22257, 2; 2026-02-21T09:21:00.4018260Z .loc 1 29 33 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:29:33 2026-02-21T09:21:00.4018323Z shr.u32 %r10734, %r10733, 6; 2026-02-21T09:21:00.4018390Z and.b32 %r10735, %r10734, 33554424; 2026-02-21T09:21:00.4018591Z .loc 1 30 39 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:30:39 2026-02-21T09:21:00.4018657Z sub.s32 %r10736, 32, %r10735; 2026-02-21T09:21:00.4018855Z .loc 1 30 52 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:30:52 2026-02-21T09:21:00.4018934Z min.s32 %r10737, %r10736, 8; 2026-02-21T09:21:00.4019131Z .loc 1 31 45 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:31:45 2026-02-21T09:21:00.4019195Z and.b32 %r10738, %r10733, 511; 2026-02-21T09:21:00.4019448Z .loc 1 32 51 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:32:51 2026-02-21T09:21:00.4019515Z div.s32 %r10739, %r10738, %r10737; 2026-02-21T09:21:00.4019710Z .loc 1 31 64 // 
cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:31:64 2026-02-21T09:21:00.4019778Z mul.lo.s32 %r10740, %r10739, %r10737; 2026-02-21T09:21:00.4019844Z sub.s32 %r10741, %r10738, %r10740; 2026-02-21T09:21:00.4020039Z .loc 1 31 30 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:31:30 2026-02-21T09:21:00.4020101Z add.s32 %r10742, %r10741, %r10735; 2026-02-21T09:21:00.4020309Z .loc 1 33 27 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:33:27 2026-02-21T09:21:00.4020375Z shl.b32 %r10743, %r10742, 8; 2026-02-21T09:21:00.4020571Z .loc 1 34 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:34:32 2026-02-21T09:21:00.4020638Z or.b32 %r1288, %r10743, %r44; 2026-02-21T09:21:00.4020833Z .loc 1 35 27 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:35:27 2026-02-21T09:21:00.4020896Z shl.b32 %r1289, %r10739, 8; 2026-02-21T09:21:00.4021090Z .loc 1 36 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:36:32 2026-02-21T09:21:00.4021157Z or.b32 %r10744, %r1289, %r6; 2026-02-21T09:21:00.4021217Z or.b32 %r10745, %r1289, %r7; 2026-02-21T09:21:00.4021277Z or.b32 %r10746, %r1289, %r8; 2026-02-21T09:21:00.4021341Z or.b32 %r10747, %r1289, %r9; 2026-02-21T09:21:00.4021535Z .loc 1 51 53 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:53 2026-02-21T09:21:00.4021595Z shl.b32 %r10748, %r10744, 10; 2026-02-21T09:21:00.4021660Z shl.b32 %r10749, %r10745, 10; 2026-02-21T09:21:00.4021718Z shl.b32 %r10750, %r10746, 10; 2026-02-21T09:21:00.4021778Z shl.b32 %r10751, %r10747, 10; 2026-02-21T09:21:00.4021971Z .loc 1 51 60 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:60 2026-02-21T09:21:00.4022042Z or.b32 %r10752, %r10748, %r46; 2026-02-21T09:21:00.4022104Z or.b32 %r10753, %r10749, %r46; 2026-02-21T09:21:00.4022163Z or.b32 %r10754, %r10750, %r46; 2026-02-21T09:21:00.4022242Z or.b32 %r10755, %r10751, %r46; 2026-02-21T09:21:00.4022435Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 
2026-02-21T09:21:00.4022510Z mad.wide.s32 %rd313, %r10752, 2, %rd44; 2026-02-21T09:21:00.4022645Z mad.wide.s32 %rd314, %r10753, 2, %rd44; 2026-02-21T09:21:00.4022715Z mad.wide.s32 %rd315, %r10754, 2, %rd44; 2026-02-21T09:21:00.4022782Z mad.wide.s32 %rd316, %r10755, 2, %rd44; 2026-02-21T09:21:00.4022979Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4023087Z bar.sync 0; 2026-02-21T09:21:00.4023145Z mov.b32 %r10407, 8; 2026-02-21T09:21:00.4023204Z // begin inline asm 2026-02-21T09:21:00.4023355Z cp.async.ca.shared.global [ %r48 + 0 ], [ %rd313 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4023414Z // end inline asm 2026-02-21T09:21:00.4023472Z // begin inline asm 2026-02-21T09:21:00.4023611Z cp.async.ca.shared.global [ %r49 + 0 ], [ %rd314 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4023715Z // end inline asm 2026-02-21T09:21:00.4023775Z // begin inline asm 2026-02-21T09:21:00.4023904Z cp.async.ca.shared.global [ %r50 + 0 ], [ %rd315 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4023967Z // end inline asm 2026-02-21T09:21:00.4024025Z // begin inline asm 2026-02-21T09:21:00.4024155Z cp.async.ca.shared.global [ %r51 + 0 ], [ %rd316 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4024214Z // end inline asm 2026-02-21T09:21:00.4024283Z cp.async.commit_group; 2026-02-21T09:21:00.4024488Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4024555Z add.s32 %r10756, %r1288, %r22238; 2026-02-21T09:21:00.4024753Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4024867Z cvt.s64.s32 %rd364, %r10756; 2026-02-21T09:21:00.4024935Z add.s64 %rd317, %rd45, %rd364; 2026-02-21T09:21:00.4025151Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4025213Z // begin inline asm 2026-02-21T09:21:00.4025347Z cp.async.ca.shared.global [ %r54 + 0 ], [ %rd317 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4025408Z // end inline 
asm 2026-02-21T09:21:00.4025473Z cp.async.commit_group; 2026-02-21T09:21:00.4025669Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4025737Z cvt.s64.s32 %rd365, %r10748; 2026-02-21T09:21:00.4025802Z or.b64 %rd366, %rd365, %rd713; 2026-02-21T09:21:00.4025865Z shl.b64 %rd367, %rd366, 1; 2026-02-21T09:21:00.4025927Z add.s64 %rd368, %rd44, %rd367; 2026-02-21T09:21:00.4025993Z add.s64 %rd318, %rd368, 32; 2026-02-21T09:21:00.4026054Z cvt.s64.s32 %rd369, %r10749; 2026-02-21T09:21:00.4026121Z or.b64 %rd370, %rd369, %rd713; 2026-02-21T09:21:00.4026187Z shl.b64 %rd371, %rd370, 1; 2026-02-21T09:21:00.4026249Z add.s64 %rd372, %rd44, %rd371; 2026-02-21T09:21:00.4026311Z add.s64 %rd319, %rd372, 32; 2026-02-21T09:21:00.4026373Z cvt.s64.s32 %rd373, %r10750; 2026-02-21T09:21:00.4026438Z or.b64 %rd374, %rd373, %rd713; 2026-02-21T09:21:00.4026626Z shl.b64 %rd375, %rd374, 1; 2026-02-21T09:21:00.4026693Z add.s64 %rd376, %rd44, %rd375; 2026-02-21T09:21:00.4026758Z add.s64 %rd320, %rd376, 32; 2026-02-21T09:21:00.4026818Z cvt.s64.s32 %rd377, %r10751; 2026-02-21T09:21:00.4026880Z or.b64 %rd378, %rd377, %rd713; 2026-02-21T09:21:00.4026938Z shl.b64 %rd379, %rd378, 1; 2026-02-21T09:21:00.4027004Z add.s64 %rd380, %rd44, %rd379; 2026-02-21T09:21:00.4027065Z add.s64 %rd321, %rd380, 32; 2026-02-21T09:21:00.4027272Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4027337Z // begin inline asm 2026-02-21T09:21:00.4027475Z cp.async.ca.shared.global [ %r55 + 0 ], [ %rd318 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4027532Z // end inline asm 2026-02-21T09:21:00.4027594Z // begin inline asm 2026-02-21T09:21:00.4027729Z cp.async.ca.shared.global [ %r56 + 0 ], [ %rd319 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4027789Z // end inline asm 2026-02-21T09:21:00.4027847Z // begin inline asm 2026-02-21T09:21:00.4027982Z cp.async.ca.shared.global [ %r57 + 0 ], [ %rd320 + 0 ], 0x8, %r10407; 
2026-02-21T09:21:00.4028118Z // end inline asm 2026-02-21T09:21:00.4028176Z // begin inline asm 2026-02-21T09:21:00.4028371Z cp.async.ca.shared.global [ %r58 + 0 ], [ %rd321 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4028497Z // end inline asm 2026-02-21T09:21:00.4028564Z cp.async.commit_group; 2026-02-21T09:21:00.4028767Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4028834Z add.s32 %r10757, %r1288, %r59; 2026-02-21T09:21:00.4029030Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4029090Z cvt.s64.s32 %rd381, %r10757; 2026-02-21T09:21:00.4029156Z add.s64 %rd322, %rd45, %rd381; 2026-02-21T09:21:00.4029415Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4029477Z // begin inline asm 2026-02-21T09:21:00.4029612Z cp.async.ca.shared.global [ %r60 + 0 ], [ %rd322 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4029674Z // end inline asm 2026-02-21T09:21:00.4029739Z cp.async.commit_group; 2026-02-21T09:21:00.4029935Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4030003Z add.s64 %rd323, %rd368, 64; 2026-02-21T09:21:00.4030062Z add.s64 %rd324, %rd372, 64; 2026-02-21T09:21:00.4030135Z add.s64 %rd325, %rd376, 64; 2026-02-21T09:21:00.4030200Z add.s64 %rd326, %rd380, 64; 2026-02-21T09:21:00.4030462Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4030520Z bar.sync 0; 2026-02-21T09:21:00.4030585Z // begin inline asm 2026-02-21T09:21:00.4030719Z cp.async.ca.shared.global [ %r61 + 0 ], [ %rd323 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4030778Z // end inline asm 2026-02-21T09:21:00.4030836Z // begin inline asm 2026-02-21T09:21:00.4030971Z cp.async.ca.shared.global [ %r62 + 0 ], [ %rd324 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4031029Z // end inline asm 2026-02-21T09:21:00.4031087Z // begin inline asm 2026-02-21T09:21:00.4031220Z 
cp.async.ca.shared.global [ %r63 + 0 ], [ %rd325 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4031276Z // end inline asm 2026-02-21T09:21:00.4031335Z // begin inline asm 2026-02-21T09:21:00.4031463Z cp.async.ca.shared.global [ %r64 + 0 ], [ %rd326 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4031523Z // end inline asm 2026-02-21T09:21:00.4031587Z cp.async.commit_group; 2026-02-21T09:21:00.4031800Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4031867Z add.s32 %r10758, %r1288, %r65; 2026-02-21T09:21:00.4032064Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4032126Z cvt.s64.s32 %rd382, %r10758; 2026-02-21T09:21:00.4032191Z add.s64 %rd327, %rd45, %rd382; 2026-02-21T09:21:00.4032393Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4032454Z // begin inline asm 2026-02-21T09:21:00.4032590Z cp.async.ca.shared.global [ %r66 + 0 ], [ %rd327 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4032647Z // end inline asm 2026-02-21T09:21:00.4032713Z cp.async.commit_group; 2026-02-21T09:21:00.4032908Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4032973Z add.s64 %rd328, %rd368, 96; 2026-02-21T09:21:00.4033035Z add.s64 %rd329, %rd372, 96; 2026-02-21T09:21:00.4033097Z add.s64 %rd330, %rd376, 96; 2026-02-21T09:21:00.4033161Z add.s64 %rd331, %rd380, 96; 2026-02-21T09:21:00.4033355Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4033413Z // begin inline asm 2026-02-21T09:21:00.4033545Z cp.async.ca.shared.global [ %r67 + 0 ], [ %rd328 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4033603Z // end inline asm 2026-02-21T09:21:00.4033719Z // begin inline asm 2026-02-21T09:21:00.4033850Z cp.async.ca.shared.global [ %r68 + 0 ], [ %rd329 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4033908Z // end inline asm 2026-02-21T09:21:00.4033967Z // begin inline asm 
2026-02-21T09:21:00.4034139Z cp.async.ca.shared.global [ %r69 + 0 ], [ %rd330 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4034199Z // end inline asm 2026-02-21T09:21:00.4034256Z // begin inline asm 2026-02-21T09:21:00.4034385Z cp.async.ca.shared.global [ %r70 + 0 ], [ %rd331 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4034440Z // end inline asm 2026-02-21T09:21:00.4034511Z cp.async.commit_group; 2026-02-21T09:21:00.4034708Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4034835Z add.s32 %r10759, %r1288, %r71; 2026-02-21T09:21:00.4035036Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4035097Z cvt.s64.s32 %rd383, %r10759; 2026-02-21T09:21:00.4035161Z add.s64 %rd332, %rd45, %rd383; 2026-02-21T09:21:00.4035372Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4035432Z // begin inline asm 2026-02-21T09:21:00.4035565Z cp.async.ca.shared.global [ %r72 + 0 ], [ %rd332 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4035623Z // end inline asm 2026-02-21T09:21:00.4035692Z cp.async.commit_group; 2026-02-21T09:21:00.4035885Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4035992Z add.s64 %rd333, %rd368, 128; 2026-02-21T09:21:00.4036057Z add.s64 %rd334, %rd372, 128; 2026-02-21T09:21:00.4036117Z add.s64 %rd335, %rd376, 128; 2026-02-21T09:21:00.4036179Z add.s64 %rd336, %rd380, 128; 2026-02-21T09:21:00.4036375Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4036431Z bar.sync 0; 2026-02-21T09:21:00.4036608Z // begin inline asm 2026-02-21T09:21:00.4036743Z cp.async.ca.shared.global [ %r73 + 0 ], [ %rd333 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4036802Z // end inline asm 2026-02-21T09:21:00.4036860Z // begin inline asm 2026-02-21T09:21:00.4036989Z cp.async.ca.shared.global [ %r74 + 0 ], [ %rd334 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4037050Z 
// end inline asm 2026-02-21T09:21:00.4037107Z // begin inline asm 2026-02-21T09:21:00.4037236Z cp.async.ca.shared.global [ %r75 + 0 ], [ %rd335 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4037291Z // end inline asm 2026-02-21T09:21:00.4037354Z // begin inline asm 2026-02-21T09:21:00.4037481Z cp.async.ca.shared.global [ %r76 + 0 ], [ %rd336 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4037552Z // end inline asm 2026-02-21T09:21:00.4037625Z cp.async.commit_group; 2026-02-21T09:21:00.4037824Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4037887Z add.s32 %r10760, %r1288, %r77; 2026-02-21T09:21:00.4038089Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4038151Z cvt.s64.s32 %rd384, %r10760; 2026-02-21T09:21:00.4038212Z add.s64 %rd337, %rd45, %rd384; 2026-02-21T09:21:00.4038406Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4038472Z // begin inline asm 2026-02-21T09:21:00.4038603Z cp.async.ca.shared.global [ %r78 + 0 ], [ %rd337 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4038659Z // end inline asm 2026-02-21T09:21:00.4038728Z cp.async.commit_group; 2026-02-21T09:21:00.4038924Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4038985Z add.s64 %rd338, %rd368, 160; 2026-02-21T09:21:00.4039049Z add.s64 %rd339, %rd372, 160; 2026-02-21T09:21:00.4039109Z add.s64 %rd340, %rd376, 160; 2026-02-21T09:21:00.4039170Z add.s64 %rd341, %rd380, 160; 2026-02-21T09:21:00.4039445Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4039508Z // begin inline asm 2026-02-21T09:21:00.4039640Z cp.async.ca.shared.global [ %r79 + 0 ], [ %rd338 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4039757Z // end inline asm 2026-02-21T09:21:00.4039819Z // begin inline asm 2026-02-21T09:21:00.4039952Z cp.async.ca.shared.global [ %r80 + 0 ], [ %rd339 + 0 ], 0x8, %r10407; 
2026-02-21T09:21:00.4040010Z // end inline asm 2026-02-21T09:21:00.4040073Z // begin inline asm 2026-02-21T09:21:00.4040205Z cp.async.ca.shared.global [ %r81 + 0 ], [ %rd340 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4040261Z // end inline asm 2026-02-21T09:21:00.4040318Z // begin inline asm 2026-02-21T09:21:00.4040510Z cp.async.ca.shared.global [ %r82 + 0 ], [ %rd341 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4040569Z // end inline asm 2026-02-21T09:21:00.4040638Z cp.async.commit_group; 2026-02-21T09:21:00.4040843Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4040910Z add.s32 %r10761, %r1288, %r83; 2026-02-21T09:21:00.4041118Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4041188Z cvt.s64.s32 %rd385, %r10761; 2026-02-21T09:21:00.4041251Z add.s64 %rd342, %rd45, %rd385; 2026-02-21T09:21:00.4041447Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4041508Z // begin inline asm 2026-02-21T09:21:00.4041706Z cp.async.ca.shared.global [ %r84 + 0 ], [ %rd342 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4041766Z // end inline asm 2026-02-21T09:21:00.4041835Z cp.async.commit_group; 2026-02-21T09:21:00.4042039Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4042103Z add.s64 %rd343, %rd368, 192; 2026-02-21T09:21:00.4042162Z add.s64 %rd344, %rd372, 192; 2026-02-21T09:21:00.4042232Z add.s64 %rd345, %rd376, 192; 2026-02-21T09:21:00.4042297Z add.s64 %rd346, %rd380, 192; 2026-02-21T09:21:00.4042493Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4042551Z bar.sync 0; 2026-02-21T09:21:00.4042616Z // begin inline asm 2026-02-21T09:21:00.4042750Z cp.async.ca.shared.global [ %r85 + 0 ], [ %rd343 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4042808Z // end inline asm 2026-02-21T09:21:00.4042871Z // begin inline asm 2026-02-21T09:21:00.4043004Z 
cp.async.ca.shared.global [ %r86 + 0 ], [ %rd344 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4043061Z // end inline asm 2026-02-21T09:21:00.4043119Z // begin inline asm 2026-02-21T09:21:00.4043252Z cp.async.ca.shared.global [ %r87 + 0 ], [ %rd345 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4043309Z // end inline asm 2026-02-21T09:21:00.4043367Z // begin inline asm 2026-02-21T09:21:00.4043502Z cp.async.ca.shared.global [ %r88 + 0 ], [ %rd346 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4043573Z // end inline asm 2026-02-21T09:21:00.4043642Z cp.async.commit_group; 2026-02-21T09:21:00.4043846Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4043909Z add.s32 %r10762, %r1288, %r89; 2026-02-21T09:21:00.4044105Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4044165Z cvt.s64.s32 %rd386, %r10762; 2026-02-21T09:21:00.4044229Z add.s64 %rd347, %rd45, %rd386; 2026-02-21T09:21:00.4044424Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4044487Z // begin inline asm 2026-02-21T09:21:00.4044624Z cp.async.ca.shared.global [ %r90 + 0 ], [ %rd347 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4044681Z // end inline asm 2026-02-21T09:21:00.4044747Z cp.async.commit_group; 2026-02-21T09:21:00.4044947Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4045068Z add.s64 %rd348, %rd368, 224; 2026-02-21T09:21:00.4045128Z add.s64 %rd349, %rd372, 224; 2026-02-21T09:21:00.4045188Z add.s64 %rd350, %rd376, 224; 2026-02-21T09:21:00.4045254Z add.s64 %rd351, %rd380, 224; 2026-02-21T09:21:00.4045496Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4045557Z // begin inline asm 2026-02-21T09:21:00.4045694Z cp.async.ca.shared.global [ %r91 + 0 ], [ %rd348 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4045752Z // end inline asm 2026-02-21T09:21:00.4045811Z // begin inline asm 
2026-02-21T09:21:00.4045942Z cp.async.ca.shared.global [ %r92 + 0 ], [ %rd349 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4046003Z // end inline asm 2026-02-21T09:21:00.4046115Z // begin inline asm 2026-02-21T09:21:00.4046248Z cp.async.ca.shared.global [ %r93 + 0 ], [ %rd350 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4046308Z // end inline asm 2026-02-21T09:21:00.4046369Z // begin inline asm 2026-02-21T09:21:00.4046614Z cp.async.ca.shared.global [ %r94 + 0 ], [ %rd351 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4046679Z // end inline asm 2026-02-21T09:21:00.4046746Z cp.async.commit_group; 2026-02-21T09:21:00.4046942Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4047008Z add.s32 %r10763, %r1288, %r95; 2026-02-21T09:21:00.4047208Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4047348Z cvt.s64.s32 %rd387, %r10763; 2026-02-21T09:21:00.4047418Z add.s64 %rd352, %rd45, %rd387; 2026-02-21T09:21:00.4047619Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4047682Z // begin inline asm 2026-02-21T09:21:00.4047814Z cp.async.ca.shared.global [ %r96 + 0 ], [ %rd352 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4047874Z // end inline asm 2026-02-21T09:21:00.4047942Z cp.async.commit_group; 2026-02-21T09:21:00.4048135Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4048198Z add.s64 %rd353, %rd368, 256; 2026-02-21T09:21:00.4048263Z add.s64 %rd354, %rd372, 256; 2026-02-21T09:21:00.4048325Z add.s64 %rd355, %rd376, 256; 2026-02-21T09:21:00.4048385Z add.s64 %rd356, %rd380, 256; 2026-02-21T09:21:00.4048584Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4048641Z bar.sync 0; 2026-02-21T09:21:00.4048702Z // begin inline asm 2026-02-21T09:21:00.4048839Z cp.async.ca.shared.global [ %r97 + 0 ], [ %rd353 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4048895Z 
// end inline asm 2026-02-21T09:21:00.4048954Z // begin inline asm 2026-02-21T09:21:00.4049086Z cp.async.ca.shared.global [ %r98 + 0 ], [ %rd354 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4049146Z // end inline asm 2026-02-21T09:21:00.4049205Z // begin inline asm 2026-02-21T09:21:00.4049337Z cp.async.ca.shared.global [ %r99 + 0 ], [ %rd355 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4049397Z // end inline asm 2026-02-21T09:21:00.4049455Z // begin inline asm 2026-02-21T09:21:00.4049593Z cp.async.ca.shared.global [ %r100 + 0 ], [ %rd356 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4049651Z // end inline asm 2026-02-21T09:21:00.4049720Z cp.async.commit_group; 2026-02-21T09:21:00.4049918Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4049985Z add.s32 %r10764, %r1288, %r101; 2026-02-21T09:21:00.4050184Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4050246Z cvt.s64.s32 %rd388, %r10764; 2026-02-21T09:21:00.4050310Z add.s64 %rd357, %rd45, %rd388; 2026-02-21T09:21:00.4050509Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4050642Z // begin inline asm 2026-02-21T09:21:00.4050781Z cp.async.ca.shared.global [ %r102 + 0 ], [ %rd357 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4050839Z // end inline asm 2026-02-21T09:21:00.4050907Z cp.async.commit_group; 2026-02-21T09:21:00.4051101Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4051227Z add.s64 %rd358, %rd368, 288; 2026-02-21T09:21:00.4051292Z add.s64 %rd359, %rd372, 288; 2026-02-21T09:21:00.4051356Z add.s64 %rd360, %rd376, 288; 2026-02-21T09:21:00.4051417Z add.s64 %rd361, %rd380, 288; 2026-02-21T09:21:00.4051615Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4051676Z // begin inline asm 2026-02-21T09:21:00.4051872Z cp.async.ca.shared.global [ %r103 + 0 ], [ %rd358 + 0 ], 0x8, %r10407; 
2026-02-21T09:21:00.4051932Z // end inline asm 2026-02-21T09:21:00.4051995Z // begin inline asm 2026-02-21T09:21:00.4052128Z cp.async.ca.shared.global [ %r104 + 0 ], [ %rd359 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4052194Z // end inline asm 2026-02-21T09:21:00.4052270Z // begin inline asm 2026-02-21T09:21:00.4052405Z cp.async.ca.shared.global [ %r105 + 0 ], [ %rd360 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4052465Z // end inline asm 2026-02-21T09:21:00.4052526Z // begin inline asm 2026-02-21T09:21:00.4052658Z cp.async.ca.shared.global [ %r106 + 0 ], [ %rd361 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4052714Z // end inline asm 2026-02-21T09:21:00.4052778Z cp.async.commit_group; 2026-02-21T09:21:00.4053029Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4053098Z add.s32 %r10765, %r1288, %r107; 2026-02-21T09:21:00.4053298Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4053365Z cvt.s64.s32 %rd389, %r10765; 2026-02-21T09:21:00.4053429Z add.s64 %rd362, %rd45, %rd389; 2026-02-21T09:21:00.4053626Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4053689Z // begin inline asm 2026-02-21T09:21:00.4053822Z cp.async.ca.shared.global [ %r108 + 0 ], [ %rd362 + 0 ], 0x8, %r10407; 2026-02-21T09:21:00.4053881Z // end inline asm 2026-02-21T09:21:00.4053945Z cp.async.commit_group; 2026-02-21T09:21:00.4054145Z .loc 1 43 78 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:43:78 2026-02-21T09:21:00.4054211Z add.s32 %r10766, %r10738, %r238; 2026-02-21T09:21:00.4054278Z sub.s32 %r10767, %r10766, %r10740; 2026-02-21T09:21:00.4054340Z shl.b32 %r10768, %r10767, 8; 2026-02-21T09:21:00.4054401Z add.s32 %r22779, %r133, %r10768; 2026-02-21T09:21:00.4054460Z or.b32 %r10769, %r8, %r1289; 2026-02-21T09:21:00.4054523Z shl.b32 %r10770, %r10769, 10; 2026-02-21T09:21:00.4054594Z mul.wide.s32 %rd22, %r10770, 2; 2026-02-21T09:21:00.4054652Z 
or.b32 %r10771, %r7, %r1289; 2026-02-21T09:21:00.4054714Z shl.b32 %r10772, %r10771, 10; 2026-02-21T09:21:00.4054784Z mul.wide.s32 %rd23, %r10772, 2; 2026-02-21T09:21:00.4054843Z shl.b32 %r10773, %r10739, 18; 2026-02-21T09:21:00.4054907Z or.b32 %r10774, %r22252, %r10773; 2026-02-21T09:21:00.4054975Z mul.wide.s32 %rd24, %r10774, 2; 2026-02-21T09:21:00.4055038Z or.b32 %r22778, %r137, %r10773; 2026-02-21T09:21:00.4055097Z mov.b32 %r22782, 0f00000000; 2026-02-21T09:21:00.4055156Z mov.b32 %r22781, 4; 2026-02-21T09:21:00.4055221Z mov.b32 %r22780, -1; 2026-02-21T09:21:00.4055281Z mov.b64 %rd719, -16; 2026-02-21T09:21:00.4055341Z mov.b64 %rd718, %rd3; 2026-02-21T09:21:00.4055406Z mov.b32 %r22783, %r22782; 2026-02-21T09:21:00.4055465Z mov.b32 %r22784, %r22782; 2026-02-21T09:21:00.4055523Z mov.b32 %r22785, %r22782; 2026-02-21T09:21:00.4055583Z mov.b32 %r22786, %r22782; 2026-02-21T09:21:00.4055647Z mov.b32 %r22787, %r22782; 2026-02-21T09:21:00.4055705Z mov.b32 %r22788, %r22782; 2026-02-21T09:21:00.4055764Z mov.b32 %r22789, %r22782; 2026-02-21T09:21:00.4055890Z mov.b32 %r22790, %r22782; 2026-02-21T09:21:00.4055959Z mov.b32 %r22791, %r22782; 2026-02-21T09:21:00.4056017Z mov.b32 %r22792, %r22782; 2026-02-21T09:21:00.4056075Z mov.b32 %r22793, %r22782; 2026-02-21T09:21:00.4056139Z mov.b32 %r22794, %r22782; 2026-02-21T09:21:00.4056261Z mov.b32 %r22795, %r22782; 2026-02-21T09:21:00.4056320Z mov.b32 %r22796, %r22782; 2026-02-21T09:21:00.4056382Z mov.b32 %r22797, %r22782; 2026-02-21T09:21:00.4056442Z mov.b32 %r22798, %r22782; 2026-02-21T09:21:00.4056624Z mov.b32 %r22799, %r22782; 2026-02-21T09:21:00.4056689Z mov.b32 %r22800, %r22782; 2026-02-21T09:21:00.4056750Z mov.b32 %r22801, %r22782; 2026-02-21T09:21:00.4056808Z mov.b32 %r22802, %r22782; 2026-02-21T09:21:00.4056866Z mov.b32 %r22803, %r22782; 2026-02-21T09:21:00.4057002Z mov.b32 %r22804, %r22782; 2026-02-21T09:21:00.4057065Z mov.b32 %r22805, %r22782; 2026-02-21T09:21:00.4057123Z mov.b32 %r22806, %r22782; 2026-02-21T09:21:00.4057182Z 
mov.b32 %r22807, %r22782; 2026-02-21T09:21:00.4057248Z mov.b32 %r22808, %r22782; 2026-02-21T09:21:00.4057305Z mov.b32 %r22809, %r22782; 2026-02-21T09:21:00.4057364Z mov.b32 %r22810, %r22782; 2026-02-21T09:21:00.4057430Z mov.b32 %r22811, %r22782; 2026-02-21T09:21:00.4057489Z mov.b32 %r22812, %r22782; 2026-02-21T09:21:00.4057549Z mov.b32 %r22813, %r22782; 2026-02-21T09:21:00.4057612Z mov.b32 %r22814, %r22782; 2026-02-21T09:21:00.4057671Z mov.b32 %r22815, %r22782; 2026-02-21T09:21:00.4057729Z mov.b32 %r22816, %r22782; 2026-02-21T09:21:00.4057786Z mov.b32 %r22817, %r22782; 2026-02-21T09:21:00.4057914Z mov.b32 %r22818, %r22782; 2026-02-21T09:21:00.4057978Z mov.b32 %r22819, %r22782; 2026-02-21T09:21:00.4058039Z mov.b32 %r22820, %r22782; 2026-02-21T09:21:00.4058100Z mov.b32 %r22821, %r22782; 2026-02-21T09:21:00.4058160Z mov.b32 %r22822, %r22782; 2026-02-21T09:21:00.4058219Z mov.b32 %r22823, %r22782; 2026-02-21T09:21:00.4058278Z mov.b32 %r22824, %r22782; 2026-02-21T09:21:00.4058342Z mov.b32 %r22825, %r22782; 2026-02-21T09:21:00.4058402Z mov.b32 %r22826, %r22782; 2026-02-21T09:21:00.4058472Z mov.b32 %r22827, %r22782; 2026-02-21T09:21:00.4058539Z mov.b32 %r22828, %r22782; 2026-02-21T09:21:00.4058597Z mov.b32 %r22829, %r22782; 2026-02-21T09:21:00.4058653Z mov.b32 %r22830, %r22782; 2026-02-21T09:21:00.4058715Z mov.b32 %r22831, %r22782; 2026-02-21T09:21:00.4058777Z mov.b32 %r22832, %r22782; 2026-02-21T09:21:00.4058835Z mov.b32 %r22833, %r22782; 2026-02-21T09:21:00.4058894Z mov.b32 %r22834, %r22782; 2026-02-21T09:21:00.4058956Z mov.b32 %r22835, %r22782; 2026-02-21T09:21:00.4059018Z mov.b32 %r22836, %r22782; 2026-02-21T09:21:00.4059076Z mov.b32 %r22837, %r22782; 2026-02-21T09:21:00.4059133Z mov.b32 %r22838, %r22782; 2026-02-21T09:21:00.4059196Z mov.b32 %r22839, %r22782; 2026-02-21T09:21:00.4059254Z mov.b32 %r22840, %r22782; 2026-02-21T09:21:00.4059312Z mov.b32 %r22841, %r22782; 2026-02-21T09:21:00.4059378Z mov.b32 %r22842, %r22782; 2026-02-21T09:21:00.4059436Z mov.b32 %r22843, 
%r22782; 2026-02-21T09:21:00.4059497Z mov.b32 %r22844, %r22782; 2026-02-21T09:21:00.4059560Z mov.b32 %r22845, %r22782; 2026-02-21T09:21:00.4059623Z mov.b32 %r22846, %r22782; 2026-02-21T09:21:00.4059685Z mov.b32 %r22847, %r22782; 2026-02-21T09:21:00.4059743Z mov.b32 %r22848, %r22782; 2026-02-21T09:21:00.4059808Z mov.b32 %r22849, %r22782; 2026-02-21T09:21:00.4059866Z mov.b32 %r22850, %r22782; 2026-02-21T09:21:00.4059924Z mov.b32 %r22851, %r22782; 2026-02-21T09:21:00.4059983Z mov.b32 %r22852, %r22782; 2026-02-21T09:21:00.4060047Z mov.b32 %r22853, %r22782; 2026-02-21T09:21:00.4060106Z mov.b32 %r22854, %r22782; 2026-02-21T09:21:00.4060164Z mov.b32 %r22855, %r22782; 2026-02-21T09:21:00.4060226Z mov.b32 %r22856, %r22782; 2026-02-21T09:21:00.4060282Z mov.b32 %r22857, %r22782; 2026-02-21T09:21:00.4060343Z mov.b32 %r22858, %r22782; 2026-02-21T09:21:00.4060408Z mov.b32 %r22859, %r22782; 2026-02-21T09:21:00.4060466Z mov.b32 %r22860, %r22782; 2026-02-21T09:21:00.4060522Z mov.b32 %r22861, %r22782; 2026-02-21T09:21:00.4060658Z mov.b32 %r22862, %r22782; 2026-02-21T09:21:00.4060722Z mov.b32 %r22863, %r22782; 2026-02-21T09:21:00.4060779Z mov.b32 %r22864, %r22782; 2026-02-21T09:21:00.4060837Z mov.b32 %r22865, %r22782; 2026-02-21T09:21:00.4060898Z mov.b32 %r22866, %r22782; 2026-02-21T09:21:00.4061017Z mov.b32 %r22867, %r22782; 2026-02-21T09:21:00.4061074Z mov.b32 %r22868, %r22782; 2026-02-21T09:21:00.4061131Z mov.b32 %r22869, %r22782; 2026-02-21T09:21:00.4061203Z mov.b32 %r22870, %r22782; 2026-02-21T09:21:00.4061263Z mov.b32 %r22871, %r22782; 2026-02-21T09:21:00.4061323Z mov.b32 %r22872, %r22782; 2026-02-21T09:21:00.4061383Z mov.b32 %r22873, %r22782; 2026-02-21T09:21:00.4061441Z mov.b32 %r22874, %r22782; 2026-02-21T09:21:00.4061499Z mov.b32 %r22875, %r22782; 2026-02-21T09:21:00.4061604Z mov.b32 %r22876, %r22782; 2026-02-21T09:21:00.4061669Z mov.b32 %r22877, %r22782; 2026-02-21T09:21:00.4061729Z mov.b32 %r22878, %r22782; 2026-02-21T09:21:00.4061786Z mov.b32 %r22879, %r22782; 
2026-02-21T09:21:00.4061849Z mov.b32 %r22880, %r22782; 2026-02-21T09:21:00.4061906Z mov.b32 %r22881, %r22782; 2026-02-21T09:21:00.4061976Z mov.b32 %r22882, %r22782; 2026-02-21T09:21:00.4062035Z mov.b32 %r22883, %r22782; 2026-02-21T09:21:00.4062098Z mov.b32 %r22884, %r22782; 2026-02-21T09:21:00.4062158Z mov.b32 %r22885, %r22782; 2026-02-21T09:21:00.4062216Z mov.b32 %r22886, %r22782; 2026-02-21T09:21:00.4062276Z mov.b32 %r22887, %r22782; 2026-02-21T09:21:00.4062335Z mov.b32 %r22888, %r22782; 2026-02-21T09:21:00.4062391Z mov.b32 %r22889, %r22782; 2026-02-21T09:21:00.4062496Z mov.b32 %r22890, %r22782; 2026-02-21T09:21:00.4062566Z mov.b32 %r22891, %r22782; 2026-02-21T09:21:00.4062625Z mov.b32 %r22892, %r22782; 2026-02-21T09:21:00.4062683Z mov.b32 %r22893, %r22782; 2026-02-21T09:21:00.4062746Z mov.b32 %r22894, %r22782; 2026-02-21T09:21:00.4062804Z mov.b32 %r22895, %r22782; 2026-02-21T09:21:00.4062862Z mov.b32 %r22896, %r22782; 2026-02-21T09:21:00.4062919Z mov.b32 %r22897, %r22782; 2026-02-21T09:21:00.4062983Z mov.b32 %r22898, %r22782; 2026-02-21T09:21:00.4063040Z mov.b32 %r22899, %r22782; 2026-02-21T09:21:00.4063097Z mov.b32 %r22900, %r22782; 2026-02-21T09:21:00.4063159Z mov.b32 %r22901, %r22782; 2026-02-21T09:21:00.4063216Z mov.b32 %r22902, %r22782; 2026-02-21T09:21:00.4063275Z mov.b32 %r22903, %r22782; 2026-02-21T09:21:00.4063337Z mov.b32 %r22904, %r22782; 2026-02-21T09:21:00.4063395Z mov.b32 %r22905, %r22782; 2026-02-21T09:21:00.4063452Z mov.b32 %r22906, %r22782; 2026-02-21T09:21:00.4063509Z mov.b32 %r22907, %r22782; 2026-02-21T09:21:00.4063574Z mov.b32 %r22908, %r22782; 2026-02-21T09:21:00.4063631Z mov.b32 %r22909, %r22782; 2026-02-21T09:21:00.4063689Z mov.b32 %r22910, %r22782; 2026-02-21T09:21:00.4063751Z mov.b32 %r22911, %r22782; 2026-02-21T09:21:00.4063809Z mov.b32 %r22912, %r22782; 2026-02-21T09:21:00.4063868Z mov.b32 %r22913, %r22782; 2026-02-21T09:21:00.4063926Z mov.b32 %r22914, %r22782; 2026-02-21T09:21:00.4063988Z mov.b32 %r22915, %r22782; 
2026-02-21T09:21:00.4064046Z mov.b32 %r22916, %r22782; 2026-02-21T09:21:00.4064105Z mov.b32 %r22917, %r22782; 2026-02-21T09:21:00.4064166Z mov.b32 %r22918, %r22782; 2026-02-21T09:21:00.4064225Z mov.b32 %r22919, %r22782; 2026-02-21T09:21:00.4064282Z mov.b32 %r22920, %r22782; 2026-02-21T09:21:00.4064341Z mov.b32 %r22921, %r22782; 2026-02-21T09:21:00.4064402Z mov.b32 %r22922, %r22782; 2026-02-21T09:21:00.4064461Z mov.b32 %r22923, %r22782; 2026-02-21T09:21:00.4064520Z mov.b32 %r22924, %r22782; 2026-02-21T09:21:00.4064580Z mov.b32 %r22925, %r22782; 2026-02-21T09:21:00.4064640Z mov.b32 %r22926, %r22782; 2026-02-21T09:21:00.4064698Z mov.b32 %r22927, %r22782; 2026-02-21T09:21:00.4064757Z mov.b32 %r22928, %r22782; 2026-02-21T09:21:00.4064822Z mov.b32 %r22929, %r22782; 2026-02-21T09:21:00.4064881Z mov.b32 %r22930, %r22782; 2026-02-21T09:21:00.4064941Z mov.b32 %r22931, %r22782; 2026-02-21T09:21:00.4065004Z mov.b32 %r22932, %r22782; 2026-02-21T09:21:00.4065062Z mov.b32 %r22933, %r22782; 2026-02-21T09:21:00.4065178Z mov.b32 %r22934, %r22782; 2026-02-21T09:21:00.4065236Z mov.b32 %r22935, %r22782; 2026-02-21T09:21:00.4065298Z mov.b32 %r22936, %r22782; 2026-02-21T09:21:00.4065356Z mov.b32 %r22937, %r22782; 2026-02-21T09:21:00.4065414Z mov.b32 %r22938, %r22782; 2026-02-21T09:21:00.4065522Z mov.b32 %r22939, %r22782; 2026-02-21T09:21:00.4065580Z mov.b32 %r22940, %r22782; 2026-02-21T09:21:00.4065639Z mov.b32 %r22941, %r22782; 2026-02-21T09:21:00.4065697Z mov.b32 %r22942, %r22782; 2026-02-21T09:21:00.4065762Z mov.b32 %r22943, %r22782; 2026-02-21T09:21:00.4065821Z mov.b32 %r22944, %r22782; 2026-02-21T09:21:00.4065880Z mov.b32 %r22945, %r22782; 2026-02-21T09:21:00.4065943Z mov.b32 %r22946, %r22782; 2026-02-21T09:21:00.4065999Z mov.b32 %r22947, %r22782; 2026-02-21T09:21:00.4066104Z mov.b32 %r22948, %r22782; 2026-02-21T09:21:00.4066163Z mov.b32 %r22949, %r22782; 2026-02-21T09:21:00.4066227Z mov.b32 %r22950, %r22782; 2026-02-21T09:21:00.4066286Z mov.b32 %r22951, %r22782; 
2026-02-21T09:21:00.4066346Z mov.b32 %r22952, %r22782; 2026-02-21T09:21:00.4066408Z mov.b32 %r22953, %r22782; 2026-02-21T09:21:00.4066585Z mov.b32 %r22954, %r22782; 2026-02-21T09:21:00.4066650Z mov.b32 %r22955, %r22782; 2026-02-21T09:21:00.4066713Z mov.b32 %r22956, %r22782; 2026-02-21T09:21:00.4066773Z mov.b32 %r22957, %r22782; 2026-02-21T09:21:00.4066831Z mov.b32 %r22958, %r22782; 2026-02-21T09:21:00.4066889Z mov.b32 %r22959, %r22782; 2026-02-21T09:21:00.4066949Z mov.b32 %r22960, %r22782; 2026-02-21T09:21:00.4067009Z mov.b32 %r22961, %r22782; 2026-02-21T09:21:00.4067148Z mov.b32 %r22962, %r22782; 2026-02-21T09:21:00.4067216Z mov.b32 %r22963, %r22782; 2026-02-21T09:21:00.4067277Z mov.b32 %r22964, %r22782; 2026-02-21T09:21:00.4067333Z mov.b32 %r22965, %r22782; 2026-02-21T09:21:00.4067394Z mov.b32 %r22966, %r22782; 2026-02-21T09:21:00.4067458Z mov.b32 %r22967, %r22782; 2026-02-21T09:21:00.4067516Z mov.b32 %r22968, %r22782; 2026-02-21T09:21:00.4067574Z mov.b32 %r22969, %r22782; 2026-02-21T09:21:00.4067636Z mov.b32 %r22970, %r22782; 2026-02-21T09:21:00.4067691Z mov.b32 %r22971, %r22782; 2026-02-21T09:21:00.4067748Z mov.b32 %r22972, %r22782; 2026-02-21T09:21:00.4067806Z mov.b32 %r22973, %r22782; 2026-02-21T09:21:00.4067867Z mov.b32 %r22974, %r22782; 2026-02-21T09:21:00.4067927Z mov.b32 %r22975, %r22782; 2026-02-21T09:21:00.4067985Z mov.b32 %r22976, %r22782; 2026-02-21T09:21:00.4068048Z mov.b32 %r22977, %r22782; 2026-02-21T09:21:00.4068107Z mov.b32 %r22978, %r22782; 2026-02-21T09:21:00.4068164Z mov.b32 %r22979, %r22782; 2026-02-21T09:21:00.4068223Z mov.b32 %r22980, %r22782; 2026-02-21T09:21:00.4068336Z mov.b32 %r22981, %r22782; 2026-02-21T09:21:00.4068402Z mov.b32 %r22982, %r22782; 2026-02-21T09:21:00.4068461Z mov.b32 %r22983, %r22782; 2026-02-21T09:21:00.4068525Z mov.b32 %r22984, %r22782; 2026-02-21T09:21:00.4068586Z mov.b32 %r22985, %r22782; 2026-02-21T09:21:00.4068645Z mov.b32 %r22986, %r22782; 2026-02-21T09:21:00.4068705Z mov.b32 %r22987, %r22782; 
2026-02-21T09:21:00.4068767Z mov.b32 %r22988, %r22782; 2026-02-21T09:21:00.4068825Z mov.b32 %r22989, %r22782; 2026-02-21T09:21:00.4068881Z mov.b32 %r22990, %r22782; 2026-02-21T09:21:00.4068943Z mov.b32 %r22991, %r22782; 2026-02-21T09:21:00.4069000Z mov.b32 %r22992, %r22782; 2026-02-21T09:21:00.4069060Z mov.b32 %r22993, %r22782; 2026-02-21T09:21:00.4069117Z mov.b32 %r22994, %r22782; 2026-02-21T09:21:00.4069179Z mov.b32 %r22995, %r22782; 2026-02-21T09:21:00.4069238Z mov.b32 %r22996, %r22782; 2026-02-21T09:21:00.4069297Z mov.b32 %r22997, %r22782; 2026-02-21T09:21:00.4069363Z mov.b32 %r22998, %r22782; 2026-02-21T09:21:00.4069421Z mov.b32 %r22999, %r22782; 2026-02-21T09:21:00.4069479Z mov.b32 %r23000, %r22782; 2026-02-21T09:21:00.4069540Z mov.b32 %r23001, %r22782; 2026-02-21T09:21:00.4069602Z mov.b32 %r23002, %r22782; 2026-02-21T09:21:00.4069659Z mov.b32 %r23003, %r22782; 2026-02-21T09:21:00.4069718Z mov.b32 %r23004, %r22782; 2026-02-21T09:21:00.4069780Z mov.b32 %r23005, %r22782; 2026-02-21T09:21:00.4069924Z mov.b32 %r23006, %r22782; 2026-02-21T09:21:00.4069981Z mov.b32 %r23007, %r22782; 2026-02-21T09:21:00.4070043Z mov.b32 %r23008, %r22782; 2026-02-21T09:21:00.4070101Z mov.b32 %r23009, %r22782; 2026-02-21T09:21:00.4070158Z mov.b32 %r23010, %r22782; 2026-02-21T09:21:00.4070282Z mov.b32 %r23011, %r22782; 2026-02-21T09:21:00.4070344Z mov.b32 %r23012, %r22782; 2026-02-21T09:21:00.4070400Z mov.b32 %r23013, %r22782; 2026-02-21T09:21:00.4070461Z mov.b32 %r23014, %r22782; 2026-02-21T09:21:00.4070521Z mov.b32 %r23015, %r22782; 2026-02-21T09:21:00.4070580Z mov.b32 %r23016, %r22782; 2026-02-21T09:21:00.4070638Z mov.b32 %r23017, %r22782; 2026-02-21T09:21:00.4070695Z mov.b32 %r23018, %r22782; 2026-02-21T09:21:00.4070760Z mov.b32 %r23019, %r22782; 2026-02-21T09:21:00.4070883Z mov.b32 %r23020, %r22782; 2026-02-21T09:21:00.4070945Z mov.b32 %r23021, %r22782; 2026-02-21T09:21:00.4071007Z mov.b32 %r23022, %r22782; 2026-02-21T09:21:00.4071066Z mov.b32 %r23023, %r22782; 
2026-02-21T09:21:00.4071127Z mov.b32 %r23024, %r22782; 2026-02-21T09:21:00.4071186Z mov.b32 %r23025, %r22782; 2026-02-21T09:21:00.4071247Z mov.b32 %r23026, %r22782; 2026-02-21T09:21:00.4071305Z mov.b32 %r23027, %r22782; 2026-02-21T09:21:00.4071376Z mov.b32 %r23028, %r22782; 2026-02-21T09:21:00.4071448Z mov.b32 %r23029, %r22782; 2026-02-21T09:21:00.4071509Z mov.b32 %r23030, %r22782; 2026-02-21T09:21:00.4072191Z [540s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:21:00.4073512Z Config: @helion.kernel(config=helion.Config(block_sizes=[8, 256, 256], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['first', 'first'], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=16, num_stages=6, num_warps=8, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[True, False], range_num_stages=[3, 0], range_unroll_factors=[4, 2], range_warp_specializes=[]), static_shapes=True) 2026-02-21T09:21:00.4073668Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:21:00.4073728Z `ptxas` stderr: 2026-02-21T09:21:00.4074199Z ptxas fatal : (C7602) Insufficient registers (64) to compile instruction at line 1035 in function _helion_matmul_bf16_int4. Try to compile with register target of 158 or higher. 2026-02-21T09:21:00.4074305Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:21:00.4074315Z 2026-02-21T09:21:00.4074831Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp74pdajdr.ptx -o /tmp/tmp74pdajdr.ptx.o 2026-02-21T09:21:00.4074841Z 2026-02-21T09:21:00.4074996Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:21:00.4075063Z mov.b32 %r23031, %r22782; 2026-02-21T09:21:00.4075125Z mov.b32 %r23032, %r22782; 2026-02-21T09:21:00.4075189Z mov.b32 %r23033, %r22782; 2026-02-21T09:21:00.4075249Z mov.b32 %r23034, %r22782; 2026-02-21T09:21:00.4075308Z mov.b32 %r23035, %r22782; 2026-02-21T09:21:00.4075372Z mov.b32 %r23036, %r22782; 2026-02-21T09:21:00.4075434Z mov.b32 %r23037, %r22782; 2026-02-21T09:21:00.4075548Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:21:00.4075657Z // => This Inner Loop Header: Depth=2 2026-02-21T09:21:00.4075728Z add.s64 %rd719, %rd719, 16; 2026-02-21T09:21:00.4075799Z setp.lt.u64 %p34, %rd719, 432; 2026-02-21T09:21:00.4075862Z add.s32 %r13911, %r22780, 1; 2026-02-21T09:21:00.4075935Z setp.gt.s32 %p35, %r13911, 4; 2026-02-21T09:21:00.4076008Z selp.b32 %r22780, 0, %r13911, %p35; 2026-02-21T09:21:00.4076222Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4076298Z cp.async.wait_group 16; 2026-02-21T09:21:00.4076356Z bar.sync 0; 2026-02-21T09:21:00.4076419Z shl.b32 %r13912, %r22780, 13; 2026-02-21T09:21:00.4076608Z add.s32 %r13914, %r22237, %r13912; 2026-02-21T09:21:00.4076929Z .loc 1 55 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:55:32 2026-02-21T09:21:00.4077000Z add.s32 %r13915, %r13914, %r109; 2026-02-21T09:21:00.4077069Z ld.shared.b16 %rs225, [%r13915]; 2026-02-21T09:21:00.4077208Z ld.shared.b16 %rs226, [%r13915+256]; 2026-02-21T09:21:00.4077278Z ld.shared.b16 %rs227, [%r13915+16]; 2026-02-21T09:21:00.4077345Z ld.shared.b16 %rs228, [%r13915+272]; 2026-02-21T09:21:00.4077415Z ld.shared.b16 %rs229, [%r13915+4096]; 2026-02-21T09:21:00.4077487Z ld.shared.b16 %rs230, [%r13915+4352]; 2026-02-21T09:21:00.4077554Z ld.shared.b16 %rs231, [%r13915+4112]; 2026-02-21T09:21:00.4077620Z ld.shared.b16 %rs232, [%r13915+4368]; 2026-02-21T09:21:00.4077764Z add.s32 %r13916, %r13914, %r110; 2026-02-21T09:21:00.4077834Z ld.shared.b16 %rs233, [%r13916]; 2026-02-21T09:21:00.4077901Z 
ld.shared.b16 %rs234, [%r13916+256]; 2026-02-21T09:21:00.4077977Z ld.shared.b16 %rs235, [%r13916+16]; 2026-02-21T09:21:00.4078045Z ld.shared.b16 %rs236, [%r13916+272]; 2026-02-21T09:21:00.4078114Z ld.shared.b16 %rs237, [%r13916+4096]; 2026-02-21T09:21:00.4078184Z ld.shared.b16 %rs238, [%r13916+4352]; 2026-02-21T09:21:00.4078257Z ld.shared.b16 %rs239, [%r13916+4112]; 2026-02-21T09:21:00.4078326Z ld.shared.b16 %rs240, [%r13916+4368]; 2026-02-21T09:21:00.4078393Z cvt.f32.bf16 %r11031, %rs225; 2026-02-21T09:21:00.4078458Z cvt.f32.bf16 %r11032, %rs226; 2026-02-21T09:21:00.4078520Z cvt.f32.bf16 %r11033, %rs233; 2026-02-21T09:21:00.4078580Z cvt.f32.bf16 %r11034, %rs234; 2026-02-21T09:21:00.4078705Z cvt.f32.bf16 %r11291, %rs227; 2026-02-21T09:21:00.4078772Z cvt.f32.bf16 %r11292, %rs228; 2026-02-21T09:21:00.4078833Z cvt.f32.bf16 %r11293, %rs235; 2026-02-21T09:21:00.4078895Z cvt.f32.bf16 %r11294, %rs236; 2026-02-21T09:21:00.4078962Z cvt.f32.bf16 %r11551, %rs229; 2026-02-21T09:21:00.4079023Z cvt.f32.bf16 %r11552, %rs230; 2026-02-21T09:21:00.4079083Z cvt.f32.bf16 %r11553, %rs237; 2026-02-21T09:21:00.4079150Z cvt.f32.bf16 %r11554, %rs238; 2026-02-21T09:21:00.4079211Z cvt.f32.bf16 %r11811, %rs231; 2026-02-21T09:21:00.4079272Z cvt.f32.bf16 %r11812, %rs232; 2026-02-21T09:21:00.4079334Z cvt.f32.bf16 %r11813, %rs239; 2026-02-21T09:21:00.4079405Z cvt.f32.bf16 %r11814, %rs240; 2026-02-21T09:21:00.4079625Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4079689Z shl.b32 %r13917, %r22780, 11; 2026-02-21T09:21:00.4079758Z add.s32 %r13918, %r22237, %r13917; 2026-02-21T09:21:00.4079823Z add.s32 %r13919, %r13918, 98304; 2026-02-21T09:21:00.4080022Z .loc 1 70 45 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:70:45 2026-02-21T09:21:00.4080085Z add.s32 %r13920, %r13919, %r22239; 2026-02-21T09:21:00.4080153Z add.s32 %r13921, %r13919, %r22243; 2026-02-21T09:21:00.4080215Z add.s32 %r13922, %r13919, %r22244; 2026-02-21T09:21:00.4080413Z 
.loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4080488Z ld.shared.s8 %rs241, [%r13920]; 2026-02-21T09:21:00.4080685Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4080752Z shl.b16 %rs242, %rs241, 4; 2026-02-21T09:21:00.4080949Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4081018Z ld.shared.s8 %rs243, [%r13920+256]; 2026-02-21T09:21:00.4081214Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4081283Z shl.b16 %rs244, %rs243, 4; 2026-02-21T09:21:00.4081479Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4081548Z ld.shared.s8 %rs245, [%r13920+512]; 2026-02-21T09:21:00.4081742Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4081870Z shl.b16 %rs246, %rs245, 4; 2026-02-21T09:21:00.4082073Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4082141Z ld.shared.s8 %rs247, [%r13921]; 2026-02-21T09:21:00.4082388Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4082450Z shl.b16 %rs248, %rs247, 4; 2026-02-21T09:21:00.4082644Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4082720Z ld.shared.s8 %rs249, [%r13920+1024]; 2026-02-21T09:21:00.4082915Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4083025Z shl.b16 %rs250, %rs249, 4; 2026-02-21T09:21:00.4083231Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4083298Z ld.shared.s8 %rs251, [%r13920+1280]; 2026-02-21T09:21:00.4083494Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4083556Z shl.b16 %rs252, %rs251, 4; 
2026-02-21T09:21:00.4083756Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4083826Z ld.shared.s8 %rs253, [%r13920+1536]; 2026-02-21T09:21:00.4084022Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4084091Z shl.b16 %rs254, %rs253, 4; 2026-02-21T09:21:00.4084332Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4084402Z ld.shared.s8 %rs255, [%r13922]; 2026-02-21T09:21:00.4084603Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4084667Z shl.b16 %rs256, %rs255, 4; 2026-02-21T09:21:00.4084862Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4084933Z cvt.s16.s8 %rs257, %rs242; 2026-02-21T09:21:00.4084994Z shr.s16 %rs258, %rs257, 4; 2026-02-21T09:21:00.4085055Z cvt.s16.s8 %rs259, %rs244; 2026-02-21T09:21:00.4085121Z shr.s16 %rs260, %rs259, 4; 2026-02-21T09:21:00.4085186Z shr.s16 %rs261, %rs241, 4; 2026-02-21T09:21:00.4085247Z shr.s16 %rs262, %rs243, 4; 2026-02-21T09:21:00.4085441Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.4085513Z cvt.rn.f32.s16 %r13923, %rs262; 2026-02-21T09:21:00.4085577Z cvt.rn.f32.s16 %r13924, %rs261; 2026-02-21T09:21:00.4085641Z cvt.rn.f32.s16 %r13925, %rs260; 2026-02-21T09:21:00.4085711Z cvt.rn.f32.s16 %r13926, %rs258; 2026-02-21T09:21:00.4085908Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4085971Z cvt.s16.s8 %rs263, %rs246; 2026-02-21T09:21:00.4086037Z shr.s16 %rs264, %rs263, 4; 2026-02-21T09:21:00.4086112Z cvt.s16.s8 %rs265, %rs248; 2026-02-21T09:21:00.4086180Z shr.s16 %rs266, %rs265, 4; 2026-02-21T09:21:00.4086242Z shr.s16 %rs267, %rs245, 4; 2026-02-21T09:21:00.4086306Z shr.s16 %rs268, %rs247, 4; 2026-02-21T09:21:00.4086622Z .loc 1 80 32 // 
cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.4086701Z cvt.rn.f32.s16 %r13927, %rs268; 2026-02-21T09:21:00.4086767Z cvt.rn.f32.s16 %r13928, %rs267; 2026-02-21T09:21:00.4086838Z cvt.rn.f32.s16 %r13929, %rs266; 2026-02-21T09:21:00.4086904Z cvt.rn.f32.s16 %r13930, %rs264; 2026-02-21T09:21:00.4087102Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4087171Z cvt.s16.s8 %rs269, %rs250; 2026-02-21T09:21:00.4087232Z shr.s16 %rs270, %rs269, 4; 2026-02-21T09:21:00.4087292Z cvt.s16.s8 %rs271, %rs252; 2026-02-21T09:21:00.4087357Z shr.s16 %rs272, %rs271, 4; 2026-02-21T09:21:00.4087502Z shr.s16 %rs273, %rs249, 4; 2026-02-21T09:21:00.4087563Z shr.s16 %rs274, %rs251, 4; 2026-02-21T09:21:00.4087759Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.4087891Z cvt.rn.f32.s16 %r13931, %rs274; 2026-02-21T09:21:00.4087953Z cvt.rn.f32.s16 %r13932, %rs273; 2026-02-21T09:21:00.4088015Z cvt.rn.f32.s16 %r13933, %rs272; 2026-02-21T09:21:00.4088081Z cvt.rn.f32.s16 %r13934, %rs270; 2026-02-21T09:21:00.4088279Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4088342Z cvt.s16.s8 %rs275, %rs254; 2026-02-21T09:21:00.4088406Z shr.s16 %rs276, %rs275, 4; 2026-02-21T09:21:00.4088466Z cvt.s16.s8 %rs277, %rs256; 2026-02-21T09:21:00.4088595Z shr.s16 %rs278, %rs277, 4; 2026-02-21T09:21:00.4088659Z shr.s16 %rs279, %rs253, 4; 2026-02-21T09:21:00.4088725Z shr.s16 %rs280, %rs255, 4; 2026-02-21T09:21:00.4088923Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.4088990Z cvt.rn.f32.s16 %r13935, %rs280; 2026-02-21T09:21:00.4089057Z cvt.rn.f32.s16 %r13936, %rs279; 2026-02-21T09:21:00.4089121Z cvt.rn.f32.s16 %r13937, %rs278; 2026-02-21T09:21:00.4089187Z cvt.rn.f32.s16 %r13938, %rs276; 2026-02-21T09:21:00.4089312Z st.shared.v4.b32 [%r113], {%r13926, %r13924, %r13925, %r13923}; 
2026-02-21T09:21:00.4089431Z st.shared.v4.b32 [%r114], {%r13930, %r13928, %r13929, %r13927}; 2026-02-21T09:21:00.4089604Z st.shared.v4.b32 [%r115], {%r13934, %r13932, %r13933, %r13931}; 2026-02-21T09:21:00.4089717Z st.shared.v4.b32 [%r116], {%r13938, %r13936, %r13937, %r13935}; 2026-02-21T09:21:00.4089780Z $L__tmp9: 2026-02-21T09:21:00.4090062Z .loc 2 291 36 // standard.py:291:36 @[ cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:87:40 ] 2026-02-21T09:21:00.4090124Z // begin inline asm 2026-02-21T09:21:00.4090213Z fence.proxy.async.shared::cta; 2026-02-21T09:21:00.4090276Z // end inline asm 2026-02-21T09:21:00.4090334Z bar.sync 0; 2026-02-21T09:21:00.4090421Z shfl.sync.idx.b32 %r13939, %r5, 0, 31, -1; 2026-02-21T09:21:00.4090499Z wgmma.fence.sync.aligned; 2026-02-21T09:21:00.4090567Z mov.pred %p26, -1; 2026-02-21T09:21:00.4090628Z // begin inline asm 2026-02-21T09:21:00.4093314Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r22782,%r22783,%r22784,%r22785,%r22786,%r22787,%r22788,%r22789,%r22790,%r22791,%r22792,%r22793,%r22794,%r22795,%r22796,%r22797,%r22798,%r22799,%r22800,%r22801,%r22802,%r22803,%r22804,%r22805,%r22806,%r22807,%r22808,%r22809,%r22810,%r22811,%r22812,%r22813,%r22814,%r22815,%r22816,%r22817,%r22818,%r22819,%r22820,%r22821,%r22822,%r22823,%r22824,%r22825,%r22826,%r22827,%r22828,%r22829,%r22830,%r22831,%r22832,%r22833,%r22834,%r22835,%r22836,%r22837,%r22838,%r22839,%r22840,%r22841,%r22842,%r22843,%r22844,%r22845,%r22846,%r22847,%r22848,%r22849,%r22850,%r22851,%r22852,%r22853,%r22854,%r22855,%r22856,%r22857,%r22858,%r22859,%r22860,%r22861,%r22862,%r22863,%r22864,%r22865,%r22866,%r22867,%r22868,%r22869,%r22870,%r22871,%r22872,%r22873,%r22874,%r22875,%r22876,%r22877,%r22878,%r22879,%r22880,%r22881,%r22882,%r22883,%r22884,%r22885,%r22886,%r22887,%r22888,%r22889,%r22890,%r22891,%r22892,%r22893,%r22894,%r22895,%r22896,%r22897,%r22898,%r22899,%r22900,%r22901,%r22902,%r22903,%r22904,%r22905,%r22906,%r22907,%r22908,%r22909}, 
{%r11031,%r11032,%r11033,%r11034}, %rd1, %p26, 1, 1; 2026-02-21T09:21:00.4093378Z // end inline asm 2026-02-21T09:21:00.4093446Z // begin inline asm 2026-02-21T09:21:00.4096102Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r22782,%r22783,%r22784,%r22785,%r22786,%r22787,%r22788,%r22789,%r22790,%r22791,%r22792,%r22793,%r22794,%r22795,%r22796,%r22797,%r22798,%r22799,%r22800,%r22801,%r22802,%r22803,%r22804,%r22805,%r22806,%r22807,%r22808,%r22809,%r22810,%r22811,%r22812,%r22813,%r22814,%r22815,%r22816,%r22817,%r22818,%r22819,%r22820,%r22821,%r22822,%r22823,%r22824,%r22825,%r22826,%r22827,%r22828,%r22829,%r22830,%r22831,%r22832,%r22833,%r22834,%r22835,%r22836,%r22837,%r22838,%r22839,%r22840,%r22841,%r22842,%r22843,%r22844,%r22845,%r22846,%r22847,%r22848,%r22849,%r22850,%r22851,%r22852,%r22853,%r22854,%r22855,%r22856,%r22857,%r22858,%r22859,%r22860,%r22861,%r22862,%r22863,%r22864,%r22865,%r22866,%r22867,%r22868,%r22869,%r22870,%r22871,%r22872,%r22873,%r22874,%r22875,%r22876,%r22877,%r22878,%r22879,%r22880,%r22881,%r22882,%r22883,%r22884,%r22885,%r22886,%r22887,%r22888,%r22889,%r22890,%r22891,%r22892,%r22893,%r22894,%r22895,%r22896,%r22897,%r22898,%r22899,%r22900,%r22901,%r22902,%r22903,%r22904,%r22905,%r22906,%r22907,%r22908,%r22909}, {%r11291,%r11292,%r11293,%r11294}, %rd2, %p26, 1, 1; 2026-02-21T09:21:00.4096262Z // end inline asm 2026-02-21T09:21:00.4096323Z // begin inline asm 2026-02-21T09:21:00.4099250Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22910,%r22911,%r22912,%r22913,%r22914,%r22915,%r22916,%r22917,%r22918,%r22919,%r22920,%r22921,%r22922,%r22923,%r22924,%r22925,%r22926,%r22927,%r22928,%r22929,%r22930,%r22931,%r22932,%r22933,%r22934,%r22935,%r22936,%r22937,%r22938,%r22939,%r22940,%r22941,%r22942,%r22943,%r22944,%r22945,%r22946,%r22947,%r22948,%r22949,%r22950,%r22951,%r22952,%r22953,%r22954,%r22955,%r22956,%r22957,%r22958,%r22959,%r22960,%r22961,%r22962,%r22963,%r22964,%r22965,%r22966,%r22967,%r22968,%r22969,%r22970,%r22971,%r22972,%r22973,%r22974,%r22975,%r22976,%r22977,%r22978,%r22979,%r22980,%r22981,%r22982,%r22983,%r22984,%r22985,%r22986,%r22987,%r22988,%r22989,%r22990,%r22991,%r22992,%r22993,%r22994,%r22995,%r22996,%r22997,%r22998,%r22999,%r23000,%r23001,%r23002,%r23003,%r23004,%r23005,%r23006,%r23007,%r23008,%r23009,%r23010,%r23011,%r23012,%r23013,%r23014,%r23015,%r23016,%r23017,%r23018,%r23019,%r23020,%r23021,%r23022,%r23023,%r23024,%r23025,%r23026,%r23027,%r23028,%r23029,%r23030,%r23031,%r23032,%r23033,%r23034,%r23035,%r23036,%r23037}, {%r11551,%r11552,%r11553,%r11554}, %rd1, %p26, 1, 1; 2026-02-21T09:21:00.4099322Z // end inline asm 2026-02-21T09:21:00.4099389Z // begin inline asm 2026-02-21T09:21:00.4102050Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22910,%r22911,%r22912,%r22913,%r22914,%r22915,%r22916,%r22917,%r22918,%r22919,%r22920,%r22921,%r22922,%r22923,%r22924,%r22925,%r22926,%r22927,%r22928,%r22929,%r22930,%r22931,%r22932,%r22933,%r22934,%r22935,%r22936,%r22937,%r22938,%r22939,%r22940,%r22941,%r22942,%r22943,%r22944,%r22945,%r22946,%r22947,%r22948,%r22949,%r22950,%r22951,%r22952,%r22953,%r22954,%r22955,%r22956,%r22957,%r22958,%r22959,%r22960,%r22961,%r22962,%r22963,%r22964,%r22965,%r22966,%r22967,%r22968,%r22969,%r22970,%r22971,%r22972,%r22973,%r22974,%r22975,%r22976,%r22977,%r22978,%r22979,%r22980,%r22981,%r22982,%r22983,%r22984,%r22985,%r22986,%r22987,%r22988,%r22989,%r22990,%r22991,%r22992,%r22993,%r22994,%r22995,%r22996,%r22997,%r22998,%r22999,%r23000,%r23001,%r23002,%r23003,%r23004,%r23005,%r23006,%r23007,%r23008,%r23009,%r23010,%r23011,%r23012,%r23013,%r23014,%r23015,%r23016,%r23017,%r23018,%r23019,%r23020,%r23021,%r23022,%r23023,%r23024,%r23025,%r23026,%r23027,%r23028,%r23029,%r23030,%r23031,%r23032,%r23033,%r23034,%r23035,%r23036,%r23037}, {%r11811,%r11812,%r11813,%r11814}, %rd2, %p26, 1, 1; 2026-02-21T09:21:00.4102118Z // end inline asm 2026-02-21T09:21:00.4102196Z wgmma.commit_group.sync.aligned; 2026-02-21T09:21:00.4102258Z mov.b32 %r13630, 0; 2026-02-21T09:21:00.4102329Z mov.b32 %r12071, %r2884; 2026-02-21T09:21:00.4102392Z mov.b32 %r12072, %r13630; 2026-02-21T09:21:00.4102453Z mov.b32 %r12073, %r13630; 2026-02-21T09:21:00.4102525Z // begin inline asm 2026-02-21T09:21:00.4107635Z // wait for regs: 
%r22782,%r22783,%r22784,%r22785,%r22786,%r22787,%r22788,%r22789,%r22790,%r22791,%r22792,%r22793,%r22794,%r22795,%r22796,%r22797,%r22798,%r22799,%r22800,%r22801,%r22802,%r22803,%r22804,%r22805,%r22806,%r22807,%r22808,%r22809,%r22810,%r22811,%r22812,%r22813,%r22814,%r22815,%r22816,%r22817,%r22818,%r22819,%r22820,%r22821,%r22822,%r22823,%r22824,%r22825,%r22826,%r22827,%r22828,%r22829,%r22830,%r22831,%r22832,%r22833,%r22834,%r22835,%r22836,%r22837,%r22838,%r22839,%r22840,%r22841,%r22842,%r22843,%r22844,%r22845,%r22846,%r22847,%r22848,%r22849,%r22850,%r22851,%r22852,%r22853,%r22854,%r22855,%r22856,%r22857,%r22858,%r22859,%r22860,%r22861,%r22862,%r22863,%r22864,%r22865,%r22866,%r22867,%r22868,%r22869,%r22870,%r22871,%r22872,%r22873,%r22874,%r22875,%r22876,%r22877,%r22878,%r22879,%r22880,%r22881,%r22882,%r22883,%r22884,%r22885,%r22886,%r22887,%r22888,%r22889,%r22890,%r22891,%r22892,%r22893,%r22894,%r22895,%r22896,%r22897,%r22898,%r22899,%r22900,%r22901,%r22902,%r22903,%r22904,%r22905,%r22906,%r22907,%r22908,%r22909,%r22910,%r22911,%r22912,%r22913,%r22914,%r22915,%r22916,%r22917,%r22918,%r22919,%r22920,%r22921,%r22922,%r22923,%r22924,%r22925,%r22926,%r22927,%r22928,%r22929,%r22930,%r22931,%r22932,%r22933,%r22934,%r22935,%r22936,%r22937,%r22938,%r22939,%r22940,%r22941,%r22942,%r22943,%r22944,%r22945,%r22946,%r22947,%r22948,%r22949,%r22950,%r22951,%r22952,%r22953,%r22954,%r22955,%r22956,%r22957,%r22958,%r22959,%r22960,%r22961,%r22962,%r22963,%r22964,%r22965,%r22966,%r22967,%r22968,%r22969,%r22970,%r22971,%r22972,%r22973,%r22974,%r22975,%r22976,%r22977,%r22978,%r22979,%r22980,%r22981,%r22982,%r22983,%r22984,%r22985,%r22986,%r22987,%r22988,%r22989,%r22990,%r22991,%r22992,%r22993,%r22994,%r22995,%r22996,%r22997,%r22998,%r22999,%r23000,%r23001,%r23002,%r23003,%r23004,%r23005,%r23006,%r23007,%r23008,%r23009,%r23010,%r23011,%r23012,%r23013,%r23014,%r23015,%r23016,%r23017,%r23018,%r23019,%r23020,%r23021,%r23022,%r23023,%r23024,%r23025,%r23026,%r23027,%r23028,%r23029,%r23030,%r23031,
%r23032,%r23033,%r23034,%r23035,%r23036,%r23037,%r12071,%r12072,%r12073 2026-02-21T09:21:00.4107843Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:21:00.4107904Z // end inline asm 2026-02-21T09:21:00.4107985Z $L__tmp10: 2026-02-21T09:21:00.4108204Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4108273Z add.s32 %r13941, %r6234, %r13912; 2026-02-21T09:21:00.4108534Z .loc 1 55 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:55:32 2026-02-21T09:21:00.4108608Z add.s32 %r13942, %r13941, %r109; 2026-02-21T09:21:00.4108677Z ld.shared.b16 %rs281, [%r13942]; 2026-02-21T09:21:00.4108750Z ld.shared.b16 %rs282, [%r13942+256]; 2026-02-21T09:21:00.4108825Z ld.shared.b16 %rs283, [%r13942+16]; 2026-02-21T09:21:00.4108893Z ld.shared.b16 %rs284, [%r13942+272]; 2026-02-21T09:21:00.4108962Z ld.shared.b16 %rs285, [%r13942+4096]; 2026-02-21T09:21:00.4109035Z ld.shared.b16 %rs286, [%r13942+4352]; 2026-02-21T09:21:00.4109101Z ld.shared.b16 %rs287, [%r13942+4112]; 2026-02-21T09:21:00.4109171Z ld.shared.b16 %rs288, [%r13942+4368]; 2026-02-21T09:21:00.4109236Z add.s32 %r13943, %r13941, %r110; 2026-02-21T09:21:00.4109307Z ld.shared.b16 %rs289, [%r13943]; 2026-02-21T09:21:00.4109375Z ld.shared.b16 %rs290, [%r13943+256]; 2026-02-21T09:21:00.4109444Z ld.shared.b16 %rs291, [%r13943+16]; 2026-02-21T09:21:00.4109515Z ld.shared.b16 %rs292, [%r13943+272]; 2026-02-21T09:21:00.4109584Z ld.shared.b16 %rs293, [%r13943+4096]; 2026-02-21T09:21:00.4109652Z ld.shared.b16 %rs294, [%r13943+4352]; 2026-02-21T09:21:00.4109719Z ld.shared.b16 %rs295, [%r13943+4112]; 2026-02-21T09:21:00.4109804Z ld.shared.b16 %rs296, [%r13943+4368]; 2026-02-21T09:21:00.4109873Z cvt.f32.bf16 %r12589, %rs281; 2026-02-21T09:21:00.4109936Z cvt.f32.bf16 %r12590, %rs282; 2026-02-21T09:21:00.4110003Z cvt.f32.bf16 %r12591, %rs289; 2026-02-21T09:21:00.4110064Z cvt.f32.bf16 %r12592, %rs290; 2026-02-21T09:21:00.4110124Z cvt.f32.bf16 %r12849, %rs283; 2026-02-21T09:21:00.4110191Z 
cvt.f32.bf16 %r12850, %rs284; 2026-02-21T09:21:00.4110253Z cvt.f32.bf16 %r12851, %rs291; 2026-02-21T09:21:00.4110313Z cvt.f32.bf16 %r12852, %rs292; 2026-02-21T09:21:00.4110373Z cvt.f32.bf16 %r13109, %rs285; 2026-02-21T09:21:00.4110439Z cvt.f32.bf16 %r13110, %rs286; 2026-02-21T09:21:00.4110500Z cvt.f32.bf16 %r13111, %rs293; 2026-02-21T09:21:00.4110562Z cvt.f32.bf16 %r13112, %rs294; 2026-02-21T09:21:00.4110627Z cvt.f32.bf16 %r13369, %rs287; 2026-02-21T09:21:00.4110750Z cvt.f32.bf16 %r13370, %rs288; 2026-02-21T09:21:00.4110810Z cvt.f32.bf16 %r13371, %rs295; 2026-02-21T09:21:00.4110872Z cvt.f32.bf16 %r13372, %rs296; 2026-02-21T09:21:00.4111082Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4111193Z add.s32 %r13944, %r13918, 108544; 2026-02-21T09:21:00.4111391Z .loc 1 70 45 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:70:45 2026-02-21T09:21:00.4111461Z add.s32 %r13945, %r13944, %r22239; 2026-02-21T09:21:00.4111525Z add.s32 %r13946, %r13944, %r22243; 2026-02-21T09:21:00.4111587Z add.s32 %r13947, %r13944, %r22244; 2026-02-21T09:21:00.4111849Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4111921Z ld.shared.s8 %rs297, [%r13945]; 2026-02-21T09:21:00.4112118Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4112188Z shl.b16 %rs298, %rs297, 4; 2026-02-21T09:21:00.4112389Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4112458Z ld.shared.s8 %rs299, [%r13945+256]; 2026-02-21T09:21:00.4112663Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4112730Z shl.b16 %rs300, %rs299, 4; 2026-02-21T09:21:00.4112969Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4113039Z ld.shared.s8 %rs301, [%r13945+512]; 2026-02-21T09:21:00.4113236Z .loc 1 60 28 // 
cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4113300Z shl.b16 %rs302, %rs301, 4; 2026-02-21T09:21:00.4113495Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4113568Z ld.shared.s8 %rs303, [%r13946]; 2026-02-21T09:21:00.4113763Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4113824Z shl.b16 %rs304, %rs303, 4; 2026-02-21T09:21:00.4114017Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4114096Z ld.shared.s8 %rs305, [%r13945+1024]; 2026-02-21T09:21:00.4114290Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4114355Z shl.b16 %rs306, %rs305, 4; 2026-02-21T09:21:00.4114553Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4114621Z ld.shared.s8 %rs307, [%r13945+1280]; 2026-02-21T09:21:00.4114818Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4114884Z shl.b16 %rs308, %rs307, 4; 2026-02-21T09:21:00.4115078Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4115147Z ld.shared.s8 %rs309, [%r13945+1536]; 2026-02-21T09:21:00.4115361Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4115429Z shl.b16 %rs310, %rs309, 4; 2026-02-21T09:21:00.4115623Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4115694Z ld.shared.s8 %rs311, [%r13947]; 2026-02-21T09:21:00.4115890Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4115952Z shl.b16 %rs312, %rs311, 4; 2026-02-21T09:21:00.4116148Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4116216Z cvt.s16.s8 %rs313, %rs298; 2026-02-21T09:21:00.4116276Z shr.s16 %rs314, 
%rs313, 4; 2026-02-21T09:21:00.4116394Z cvt.s16.s8 %rs315, %rs300; 2026-02-21T09:21:00.4116580Z shr.s16 %rs316, %rs315, 4; 2026-02-21T09:21:00.4116646Z shr.s16 %rs317, %rs297, 4; 2026-02-21T09:21:00.4116708Z shr.s16 %rs318, %rs299, 4; 2026-02-21T09:21:00.4116906Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.4117049Z cvt.rn.f32.s16 %r13948, %rs318; 2026-02-21T09:21:00.4117114Z cvt.rn.f32.s16 %r13949, %rs317; 2026-02-21T09:21:00.4117176Z cvt.rn.f32.s16 %r13950, %rs316; 2026-02-21T09:21:00.4117244Z cvt.rn.f32.s16 %r13951, %rs314; 2026-02-21T09:21:00.4117443Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4117504Z cvt.s16.s8 %rs319, %rs302; 2026-02-21T09:21:00.4117634Z shr.s16 %rs320, %rs319, 4; 2026-02-21T09:21:00.4117697Z cvt.s16.s8 %rs321, %rs304; 2026-02-21T09:21:00.4117768Z shr.s16 %rs322, %rs321, 4; 2026-02-21T09:21:00.4117829Z shr.s16 %rs323, %rs301, 4; 2026-02-21T09:21:00.4117898Z shr.s16 %rs324, %rs303, 4; 2026-02-21T09:21:00.4118095Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.4118161Z cvt.rn.f32.s16 %r13952, %rs324; 2026-02-21T09:21:00.4118231Z cvt.rn.f32.s16 %r13953, %rs323; 2026-02-21T09:21:00.4118293Z cvt.rn.f32.s16 %r13954, %rs322; 2026-02-21T09:21:00.4118358Z cvt.rn.f32.s16 %r13955, %rs320; 2026-02-21T09:21:00.4118558Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4118680Z cvt.s16.s8 %rs325, %rs306; 2026-02-21T09:21:00.4118744Z shr.s16 %rs326, %rs325, 4; 2026-02-21T09:21:00.4118805Z cvt.s16.s8 %rs327, %rs308; 2026-02-21T09:21:00.4118872Z shr.s16 %rs328, %rs327, 4; 2026-02-21T09:21:00.4118935Z shr.s16 %rs329, %rs305, 4; 2026-02-21T09:21:00.4118995Z shr.s16 %rs330, %rs307, 4; 2026-02-21T09:21:00.4119193Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.4119261Z cvt.rn.f32.s16 %r13956, %rs330; 
2026-02-21T09:21:00.4119324Z cvt.rn.f32.s16 %r13957, %rs329; 2026-02-21T09:21:00.4119386Z cvt.rn.f32.s16 %r13958, %rs328; 2026-02-21T09:21:00.4119454Z cvt.rn.f32.s16 %r13959, %rs326; 2026-02-21T09:21:00.4119650Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4119715Z cvt.s16.s8 %rs331, %rs310; 2026-02-21T09:21:00.4119782Z shr.s16 %rs332, %rs331, 4; 2026-02-21T09:21:00.4119843Z cvt.s16.s8 %rs333, %rs312; 2026-02-21T09:21:00.4119906Z shr.s16 %rs334, %rs333, 4; 2026-02-21T09:21:00.4119970Z shr.s16 %rs335, %rs309, 4; 2026-02-21T09:21:00.4120042Z shr.s16 %rs336, %rs311, 4; 2026-02-21T09:21:00.4120242Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.4120308Z cvt.rn.f32.s16 %r13960, %rs336; 2026-02-21T09:21:00.4120377Z cvt.rn.f32.s16 %r13961, %rs335; 2026-02-21T09:21:00.4120445Z cvt.rn.f32.s16 %r13962, %rs334; 2026-02-21T09:21:00.4120508Z cvt.rn.f32.s16 %r13963, %rs332; 2026-02-21T09:21:00.4120569Z bar.sync 0; 2026-02-21T09:21:00.4120692Z st.shared.v4.b32 [%r113], {%r13951, %r13949, %r13950, %r13948}; 2026-02-21T09:21:00.4120808Z st.shared.v4.b32 [%r114], {%r13955, %r13953, %r13954, %r13952}; 2026-02-21T09:21:00.4120923Z st.shared.v4.b32 [%r115], {%r13959, %r13957, %r13958, %r13956}; 2026-02-21T09:21:00.4121034Z st.shared.v4.b32 [%r116], {%r13963, %r13961, %r13962, %r13960}; 2026-02-21T09:21:00.4121090Z $L__tmp11: 2026-02-21T09:21:00.4121365Z .loc 2 291 36 // standard.py:291:36 @[ cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:87:40 ] 2026-02-21T09:21:00.4121431Z // begin inline asm 2026-02-21T09:21:00.4121513Z fence.proxy.async.shared::cta; 2026-02-21T09:21:00.4121571Z // end inline asm 2026-02-21T09:21:00.4121629Z bar.sync 0; 2026-02-21T09:21:00.4121702Z wgmma.fence.sync.aligned; 2026-02-21T09:21:00.4121840Z // begin inline asm 2026-02-21T09:21:00.4124577Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22782,%r22783,%r22784,%r22785,%r22786,%r22787,%r22788,%r22789,%r22790,%r22791,%r22792,%r22793,%r22794,%r22795,%r22796,%r22797,%r22798,%r22799,%r22800,%r22801,%r22802,%r22803,%r22804,%r22805,%r22806,%r22807,%r22808,%r22809,%r22810,%r22811,%r22812,%r22813,%r22814,%r22815,%r22816,%r22817,%r22818,%r22819,%r22820,%r22821,%r22822,%r22823,%r22824,%r22825,%r22826,%r22827,%r22828,%r22829,%r22830,%r22831,%r22832,%r22833,%r22834,%r22835,%r22836,%r22837,%r22838,%r22839,%r22840,%r22841,%r22842,%r22843,%r22844,%r22845,%r22846,%r22847,%r22848,%r22849,%r22850,%r22851,%r22852,%r22853,%r22854,%r22855,%r22856,%r22857,%r22858,%r22859,%r22860,%r22861,%r22862,%r22863,%r22864,%r22865,%r22866,%r22867,%r22868,%r22869,%r22870,%r22871,%r22872,%r22873,%r22874,%r22875,%r22876,%r22877,%r22878,%r22879,%r22880,%r22881,%r22882,%r22883,%r22884,%r22885,%r22886,%r22887,%r22888,%r22889,%r22890,%r22891,%r22892,%r22893,%r22894,%r22895,%r22896,%r22897,%r22898,%r22899,%r22900,%r22901,%r22902,%r22903,%r22904,%r22905,%r22906,%r22907,%r22908,%r22909}, {%r12589,%r12590,%r12591,%r12592}, %rd1, %p26, 1, 1; 2026-02-21T09:21:00.4124702Z // end inline asm 2026-02-21T09:21:00.4124770Z // begin inline asm 2026-02-21T09:21:00.4127625Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22782,%r22783,%r22784,%r22785,%r22786,%r22787,%r22788,%r22789,%r22790,%r22791,%r22792,%r22793,%r22794,%r22795,%r22796,%r22797,%r22798,%r22799,%r22800,%r22801,%r22802,%r22803,%r22804,%r22805,%r22806,%r22807,%r22808,%r22809,%r22810,%r22811,%r22812,%r22813,%r22814,%r22815,%r22816,%r22817,%r22818,%r22819,%r22820,%r22821,%r22822,%r22823,%r22824,%r22825,%r22826,%r22827,%r22828,%r22829,%r22830,%r22831,%r22832,%r22833,%r22834,%r22835,%r22836,%r22837,%r22838,%r22839,%r22840,%r22841,%r22842,%r22843,%r22844,%r22845,%r22846,%r22847,%r22848,%r22849,%r22850,%r22851,%r22852,%r22853,%r22854,%r22855,%r22856,%r22857,%r22858,%r22859,%r22860,%r22861,%r22862,%r22863,%r22864,%r22865,%r22866,%r22867,%r22868,%r22869,%r22870,%r22871,%r22872,%r22873,%r22874,%r22875,%r22876,%r22877,%r22878,%r22879,%r22880,%r22881,%r22882,%r22883,%r22884,%r22885,%r22886,%r22887,%r22888,%r22889,%r22890,%r22891,%r22892,%r22893,%r22894,%r22895,%r22896,%r22897,%r22898,%r22899,%r22900,%r22901,%r22902,%r22903,%r22904,%r22905,%r22906,%r22907,%r22908,%r22909}, {%r12849,%r12850,%r12851,%r12852}, %rd2, %p26, 1, 1; 2026-02-21T09:21:00.4127701Z // end inline asm 2026-02-21T09:21:00.4127772Z // begin inline asm 2026-02-21T09:21:00.4130432Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22910,%r22911,%r22912,%r22913,%r22914,%r22915,%r22916,%r22917,%r22918,%r22919,%r22920,%r22921,%r22922,%r22923,%r22924,%r22925,%r22926,%r22927,%r22928,%r22929,%r22930,%r22931,%r22932,%r22933,%r22934,%r22935,%r22936,%r22937,%r22938,%r22939,%r22940,%r22941,%r22942,%r22943,%r22944,%r22945,%r22946,%r22947,%r22948,%r22949,%r22950,%r22951,%r22952,%r22953,%r22954,%r22955,%r22956,%r22957,%r22958,%r22959,%r22960,%r22961,%r22962,%r22963,%r22964,%r22965,%r22966,%r22967,%r22968,%r22969,%r22970,%r22971,%r22972,%r22973,%r22974,%r22975,%r22976,%r22977,%r22978,%r22979,%r22980,%r22981,%r22982,%r22983,%r22984,%r22985,%r22986,%r22987,%r22988,%r22989,%r22990,%r22991,%r22992,%r22993,%r22994,%r22995,%r22996,%r22997,%r22998,%r22999,%r23000,%r23001,%r23002,%r23003,%r23004,%r23005,%r23006,%r23007,%r23008,%r23009,%r23010,%r23011,%r23012,%r23013,%r23014,%r23015,%r23016,%r23017,%r23018,%r23019,%r23020,%r23021,%r23022,%r23023,%r23024,%r23025,%r23026,%r23027,%r23028,%r23029,%r23030,%r23031,%r23032,%r23033,%r23034,%r23035,%r23036,%r23037}, {%r13109,%r13110,%r13111,%r13112}, %rd1, %p26, 1, 1; 2026-02-21T09:21:00.4130496Z // end inline asm 2026-02-21T09:21:00.4130554Z // begin inline asm 2026-02-21T09:21:00.4133282Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22910,%r22911,%r22912,%r22913,%r22914,%r22915,%r22916,%r22917,%r22918,%r22919,%r22920,%r22921,%r22922,%r22923,%r22924,%r22925,%r22926,%r22927,%r22928,%r22929,%r22930,%r22931,%r22932,%r22933,%r22934,%r22935,%r22936,%r22937,%r22938,%r22939,%r22940,%r22941,%r22942,%r22943,%r22944,%r22945,%r22946,%r22947,%r22948,%r22949,%r22950,%r22951,%r22952,%r22953,%r22954,%r22955,%r22956,%r22957,%r22958,%r22959,%r22960,%r22961,%r22962,%r22963,%r22964,%r22965,%r22966,%r22967,%r22968,%r22969,%r22970,%r22971,%r22972,%r22973,%r22974,%r22975,%r22976,%r22977,%r22978,%r22979,%r22980,%r22981,%r22982,%r22983,%r22984,%r22985,%r22986,%r22987,%r22988,%r22989,%r22990,%r22991,%r22992,%r22993,%r22994,%r22995,%r22996,%r22997,%r22998,%r22999,%r23000,%r23001,%r23002,%r23003,%r23004,%r23005,%r23006,%r23007,%r23008,%r23009,%r23010,%r23011,%r23012,%r23013,%r23014,%r23015,%r23016,%r23017,%r23018,%r23019,%r23020,%r23021,%r23022,%r23023,%r23024,%r23025,%r23026,%r23027,%r23028,%r23029,%r23030,%r23031,%r23032,%r23033,%r23034,%r23035,%r23036,%r23037}, {%r13369,%r13370,%r13371,%r13372}, %rd2, %p26, 1, 1; 2026-02-21T09:21:00.4133455Z // end inline asm 2026-02-21T09:21:00.4133545Z wgmma.commit_group.sync.aligned; 2026-02-21T09:21:00.4133608Z mov.b32 %r13629, %r2884; 2026-02-21T09:21:00.4133673Z mov.b32 %r13631, %r13630; 2026-02-21T09:21:00.4133742Z // begin inline asm 2026-02-21T09:21:00.4138802Z // wait for regs: 
%r22782,%r22783,%r22784,%r22785,%r22786,%r22787,%r22788,%r22789,%r22790,%r22791,%r22792,%r22793,%r22794,%r22795,%r22796,%r22797,%r22798,%r22799,%r22800,%r22801,%r22802,%r22803,%r22804,%r22805,%r22806,%r22807,%r22808,%r22809,%r22810,%r22811,%r22812,%r22813,%r22814,%r22815,%r22816,%r22817,%r22818,%r22819,%r22820,%r22821,%r22822,%r22823,%r22824,%r22825,%r22826,%r22827,%r22828,%r22829,%r22830,%r22831,%r22832,%r22833,%r22834,%r22835,%r22836,%r22837,%r22838,%r22839,%r22840,%r22841,%r22842,%r22843,%r22844,%r22845,%r22846,%r22847,%r22848,%r22849,%r22850,%r22851,%r22852,%r22853,%r22854,%r22855,%r22856,%r22857,%r22858,%r22859,%r22860,%r22861,%r22862,%r22863,%r22864,%r22865,%r22866,%r22867,%r22868,%r22869,%r22870,%r22871,%r22872,%r22873,%r22874,%r22875,%r22876,%r22877,%r22878,%r22879,%r22880,%r22881,%r22882,%r22883,%r22884,%r22885,%r22886,%r22887,%r22888,%r22889,%r22890,%r22891,%r22892,%r22893,%r22894,%r22895,%r22896,%r22897,%r22898,%r22899,%r22900,%r22901,%r22902,%r22903,%r22904,%r22905,%r22906,%r22907,%r22908,%r22909,%r22910,%r22911,%r22912,%r22913,%r22914,%r22915,%r22916,%r22917,%r22918,%r22919,%r22920,%r22921,%r22922,%r22923,%r22924,%r22925,%r22926,%r22927,%r22928,%r22929,%r22930,%r22931,%r22932,%r22933,%r22934,%r22935,%r22936,%r22937,%r22938,%r22939,%r22940,%r22941,%r22942,%r22943,%r22944,%r22945,%r22946,%r22947,%r22948,%r22949,%r22950,%r22951,%r22952,%r22953,%r22954,%r22955,%r22956,%r22957,%r22958,%r22959,%r22960,%r22961,%r22962,%r22963,%r22964,%r22965,%r22966,%r22967,%r22968,%r22969,%r22970,%r22971,%r22972,%r22973,%r22974,%r22975,%r22976,%r22977,%r22978,%r22979,%r22980,%r22981,%r22982,%r22983,%r22984,%r22985,%r22986,%r22987,%r22988,%r22989,%r22990,%r22991,%r22992,%r22993,%r22994,%r22995,%r22996,%r22997,%r22998,%r22999,%r23000,%r23001,%r23002,%r23003,%r23004,%r23005,%r23006,%r23007,%r23008,%r23009,%r23010,%r23011,%r23012,%r23013,%r23014,%r23015,%r23016,%r23017,%r23018,%r23019,%r23020,%r23021,%r23022,%r23023,%r23024,%r23025,%r23026,%r23027,%r23028,%r23029,%r23030,%r23031,
%r23032,%r23033,%r23034,%r23035,%r23036,%r23037,%r13629,%r13630,%r13631 2026-02-21T09:21:00.4138901Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:21:00.4138964Z // end inline asm 2026-02-21T09:21:00.4139022Z $L__tmp12: 2026-02-21T09:21:00.4139245Z .loc 1 43 78 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:43:78 2026-02-21T09:21:00.4139311Z add.s32 %r13964, %r22781, 1; 2026-02-21T09:21:00.4139383Z setp.gt.s32 %p36, %r13964, 4; 2026-02-21T09:21:00.4139455Z selp.b32 %r22781, 0, %r13964, %p36; 2026-02-21T09:21:00.4139661Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4139736Z add.s32 %r13965, %r22778, -16; 2026-02-21T09:21:00.4139800Z add.s64 %rd408, %rd718, %rd24; 2026-02-21T09:21:00.4139864Z add.s64 %rd398, %rd408, 320; 2026-02-21T09:21:00.4139930Z add.s64 %rd409, %rd718, %rd23; 2026-02-21T09:21:00.4140070Z add.s64 %rd399, %rd409, 320; 2026-02-21T09:21:00.4140132Z add.s64 %rd410, %rd718, %rd22; 2026-02-21T09:21:00.4140195Z add.s64 %rd400, %rd410, 320; 2026-02-21T09:21:00.4140277Z mad.wide.s32 %rd401, %r13965, 2, %rd44; 2026-02-21T09:21:00.4140538Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4140599Z shl.b32 %r13966, %r22781, 13; 2026-02-21T09:21:00.4140669Z add.s32 %r13967, %r22237, %r13966; 2026-02-21T09:21:00.4140732Z add.s32 %r13891, %r13967, %r47; 2026-02-21T09:21:00.4140797Z selp.b32 %r13892, 8, 0, %p34; 2026-02-21T09:21:00.4140859Z // begin inline asm 2026-02-21T09:21:00.4141020Z cp.async.ca.shared.global [ %r13891 + 0 ], [ %rd398 + 0 ], 0x8, %r13892; 2026-02-21T09:21:00.4141139Z // end inline asm 2026-02-21T09:21:00.4141204Z add.s32 %r13893, %r13891, 2048; 2026-02-21T09:21:00.4141267Z // begin inline asm 2026-02-21T09:21:00.4141410Z cp.async.ca.shared.global [ %r13893 + 0 ], [ %rd399 + 0 ], 0x8, %r13892; 2026-02-21T09:21:00.4141481Z // end inline asm 2026-02-21T09:21:00.4141552Z add.s32 %r13895, %r13891, 4096; 2026-02-21T09:21:00.4141614Z // 
begin inline asm 2026-02-21T09:21:00.4141754Z cp.async.ca.shared.global [ %r13895 + 0 ], [ %rd400 + 0 ], 0x8, %r13892; 2026-02-21T09:21:00.4141814Z // end inline asm 2026-02-21T09:21:00.4141883Z add.s32 %r13897, %r13891, 6144; 2026-02-21T09:21:00.4141942Z // begin inline asm 2026-02-21T09:21:00.4142093Z cp.async.ca.shared.global [ %r13897 + 0 ], [ %rd401 + 0 ], 0x8, %r13892; 2026-02-21T09:21:00.4142215Z // end inline asm 2026-02-21T09:21:00.4142287Z cp.async.commit_group; 2026-02-21T09:21:00.4145865Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4145972Z add.s32 %r13968, %r22779, -65536; 2026-02-21T09:21:00.4146043Z cvt.s64.s32 %rd411, %r13968; 2026-02-21T09:21:00.4146112Z add.s64 %rd402, %rd45, %rd411; 2026-02-21T09:21:00.4146351Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4146433Z shl.b32 %r13969, %r22781, 11; 2026-02-21T09:21:00.4146664Z add.s32 %r13899, %r54, %r13969; 2026-02-21T09:21:00.4146749Z // begin inline asm 2026-02-21T09:21:00.4146918Z cp.async.ca.shared.global [ %r13899 + 0 ], [ %rd402 + 0 ], 0x8, %r13892; 2026-02-21T09:21:00.4146979Z // end inline asm 2026-02-21T09:21:00.4147050Z cp.async.commit_group; 2026-02-21T09:21:00.4147266Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4147335Z add.s64 %rd403, %rd408, 352; 2026-02-21T09:21:00.4147401Z add.s64 %rd404, %rd409, 352; 2026-02-21T09:21:00.4147467Z add.s64 %rd405, %rd410, 352; 2026-02-21T09:21:00.4147546Z mad.wide.s32 %rd406, %r22778, 2, %rd44; 2026-02-21T09:21:00.4147754Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4147828Z add.s32 %r13970, %r6234, %r13966; 2026-02-21T09:21:00.4147893Z add.s32 %r13901, %r13970, %r47; 2026-02-21T09:21:00.4147954Z // begin inline asm 2026-02-21T09:21:00.4148104Z cp.async.ca.shared.global [ %r13901 + 0 ], [ %rd403 + 0 ], 0x8, %r13892; 
2026-02-21T09:21:00.4148174Z // end inline asm 2026-02-21T09:21:00.4148240Z add.s32 %r13903, %r13901, 2048; 2026-02-21T09:21:00.4148373Z // begin inline asm 2026-02-21T09:21:00.4148520Z cp.async.ca.shared.global [ %r13903 + 0 ], [ %rd404 + 0 ], 0x8, %r13892; 2026-02-21T09:21:00.4148580Z // end inline asm 2026-02-21T09:21:00.4148644Z add.s32 %r13905, %r13901, 4096; 2026-02-21T09:21:00.4148710Z // begin inline asm 2026-02-21T09:21:00.4148845Z cp.async.ca.shared.global [ %r13905 + 0 ], [ %rd405 + 0 ], 0x8, %r13892; 2026-02-21T09:21:00.4148905Z // end inline asm 2026-02-21T09:21:00.4148969Z add.s32 %r13907, %r13901, 6144; 2026-02-21T09:21:00.4149029Z // begin inline asm 2026-02-21T09:21:00.4149164Z cp.async.ca.shared.global [ %r13907 + 0 ], [ %rd406 + 0 ], 0x8, %r13892; 2026-02-21T09:21:00.4149362Z // end inline asm 2026-02-21T09:21:00.4149438Z cp.async.commit_group; 2026-02-21T09:21:00.4149644Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4149781Z cvt.s64.s32 %rd412, %r22779; 2026-02-21T09:21:00.4149852Z add.s64 %rd407, %rd45, %rd412; 2026-02-21T09:21:00.4150056Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4150124Z add.s32 %r13909, %r60, %r13969; 2026-02-21T09:21:00.4150189Z // begin inline asm 2026-02-21T09:21:00.4150340Z cp.async.ca.shared.global [ %r13909 + 0 ], [ %rd407 + 0 ], 0x8, %r13892; 2026-02-21T09:21:00.4150410Z // end inline asm 2026-02-21T09:21:00.4150549Z cp.async.commit_group; 2026-02-21T09:21:00.4150762Z .loc 1 43 78 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:43:78 2026-02-21T09:21:00.4150839Z add.s32 %r22779, %r22779, 131072; 2026-02-21T09:21:00.4150907Z add.s64 %rd718, %rd718, 64; 2026-02-21T09:21:00.4150975Z add.s32 %r22778, %r22778, 32; 2026-02-21T09:21:00.4151044Z setp.lt.u64 %p37, %rd719, 496; 2026-02-21T09:21:00.4151105Z @%p37 bra $L__BB0_7; 2026-02-21T09:21:00.4151217Z // %bb.8: // in Loop: Header=BB0_2 Depth=1 
2026-02-21T09:21:00.4151424Z .loc 1 36 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:36:32 2026-02-21T09:21:00.4151486Z or.b32 %r14362, %r1289, %r11; 2026-02-21T09:21:00.4151545Z or.b32 %r14363, %r1289, %r12; 2026-02-21T09:21:00.4151672Z or.b32 %r14364, %r1289, %r13; 2026-02-21T09:21:00.4151734Z or.b32 %r14365, %r1289, %r14; 2026-02-21T09:21:00.4151792Z or.b32 %r14366, %r1289, %r15; 2026-02-21T09:21:00.4151857Z or.b32 %r14367, %r1289, %r16; 2026-02-21T09:21:00.4151917Z or.b32 %r14368, %r1289, %r17; 2026-02-21T09:21:00.4151975Z or.b32 %r14369, %r1289, %r18; 2026-02-21T09:21:00.4152033Z or.b32 %r14370, %r1289, %r19; 2026-02-21T09:21:00.4152097Z or.b32 %r14371, %r1289, %r20; 2026-02-21T09:21:00.4152157Z or.b32 %r14372, %r1289, %r21; 2026-02-21T09:21:00.4152215Z or.b32 %r14373, %r1289, %r22; 2026-02-21T09:21:00.4152277Z or.b32 %r14374, %r1289, %r23; 2026-02-21T09:21:00.4152334Z or.b32 %r14375, %r1289, %r24; 2026-02-21T09:21:00.4152396Z or.b32 %r14376, %r1289, %r25; 2026-02-21T09:21:00.4152454Z or.b32 %r14377, %r1289, %r26; 2026-02-21T09:21:00.4152516Z or.b32 %r14378, %r1289, %r27; 2026-02-21T09:21:00.4152579Z or.b32 %r14379, %r1289, %r28; 2026-02-21T09:21:00.4152645Z or.b32 %r14380, %r1289, %r29; 2026-02-21T09:21:00.4152708Z or.b32 %r14381, %r1289, %r30; 2026-02-21T09:21:00.4152765Z or.b32 %r14382, %r1289, %r31; 2026-02-21T09:21:00.4152825Z or.b32 %r14383, %r1289, %r32; 2026-02-21T09:21:00.4152885Z or.b32 %r14384, %r1289, %r33; 2026-02-21T09:21:00.4152946Z or.b32 %r14385, %r1289, %r34; 2026-02-21T09:21:00.4153005Z or.b32 %r14386, %r1289, %r35; 2026-02-21T09:21:00.4153065Z or.b32 %r14387, %r1289, %r36; 2026-02-21T09:21:00.4153141Z or.b32 %r14388, %r1289, %r37; 2026-02-21T09:21:00.4153201Z or.b32 %r14389, %r1289, %r38; 2026-02-21T09:21:00.4153261Z or.b32 %r14390, %r1289, %r39; 2026-02-21T09:21:00.4153324Z or.b32 %r14391, %r1289, %r40; 2026-02-21T09:21:00.4153386Z or.b32 %r14392, %r1289, %r41; 2026-02-21T09:21:00.4153443Z or.b32 %r14393, %r1289, %r42; 
2026-02-21T09:21:00.4153641Z .loc 1 43 78 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:43:78 2026-02-21T09:21:00.4153714Z cp.async.wait_group 0; 2026-02-21T09:21:00.4153774Z bar.sync 0; 2026-02-21T09:21:00.4153969Z .loc 1 90 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:90:28 2026-02-21T09:21:00.4154060Z cvt.rn.bf16x2.f32 %r14394, %r22783, %r22782; 2026-02-21T09:21:00.4154142Z cvt.rn.bf16x2.f32 %r14395, %r22785, %r22784; 2026-02-21T09:21:00.4154220Z cvt.rn.bf16x2.f32 %r14396, %r22787, %r22786; 2026-02-21T09:21:00.4154296Z cvt.rn.bf16x2.f32 %r14397, %r22789, %r22788; 2026-02-21T09:21:00.4154437Z cvt.rn.bf16x2.f32 %r14398, %r22791, %r22790; 2026-02-21T09:21:00.4154512Z cvt.rn.bf16x2.f32 %r14399, %r22793, %r22792; 2026-02-21T09:21:00.4154585Z cvt.rn.bf16x2.f32 %r14400, %r22795, %r22794; 2026-02-21T09:21:00.4154734Z cvt.rn.bf16x2.f32 %r14401, %r22797, %r22796; 2026-02-21T09:21:00.4154810Z cvt.rn.bf16x2.f32 %r14402, %r22799, %r22798; 2026-02-21T09:21:00.4154886Z cvt.rn.bf16x2.f32 %r14403, %r22801, %r22800; 2026-02-21T09:21:00.4154966Z cvt.rn.bf16x2.f32 %r14404, %r22803, %r22802; 2026-02-21T09:21:00.4155055Z cvt.rn.bf16x2.f32 %r14405, %r22805, %r22804; 2026-02-21T09:21:00.4155133Z cvt.rn.bf16x2.f32 %r14406, %r22807, %r22806; 2026-02-21T09:21:00.4155209Z cvt.rn.bf16x2.f32 %r14407, %r22809, %r22808; 2026-02-21T09:21:00.4155343Z cvt.rn.bf16x2.f32 %r14408, %r22811, %r22810; 2026-02-21T09:21:00.4155419Z cvt.rn.bf16x2.f32 %r14409, %r22813, %r22812; 2026-02-21T09:21:00.4155495Z cvt.rn.bf16x2.f32 %r14410, %r22815, %r22814; 2026-02-21T09:21:00.4155576Z cvt.rn.bf16x2.f32 %r14411, %r22817, %r22816; 2026-02-21T09:21:00.4155652Z cvt.rn.bf16x2.f32 %r14412, %r22819, %r22818; 2026-02-21T09:21:00.4155727Z cvt.rn.bf16x2.f32 %r14413, %r22821, %r22820; 2026-02-21T09:21:00.4155807Z cvt.rn.bf16x2.f32 %r14414, %r22823, %r22822; 2026-02-21T09:21:00.4155885Z cvt.rn.bf16x2.f32 %r14415, %r22825, %r22824; 2026-02-21T09:21:00.4155959Z cvt.rn.bf16x2.f32 %r14416, %r22827, 
%r22826; 2026-02-21T09:21:00.4156036Z cvt.rn.bf16x2.f32 %r14417, %r22829, %r22828; 2026-02-21T09:21:00.4156116Z cvt.rn.bf16x2.f32 %r14418, %r22831, %r22830; 2026-02-21T09:21:00.4156235Z cvt.rn.bf16x2.f32 %r14419, %r22833, %r22832; 2026-02-21T09:21:00.4156314Z cvt.rn.bf16x2.f32 %r14420, %r22835, %r22834; 2026-02-21T09:21:00.4156395Z cvt.rn.bf16x2.f32 %r14421, %r22837, %r22836; 2026-02-21T09:21:00.4156597Z cvt.rn.bf16x2.f32 %r14422, %r22839, %r22838; 2026-02-21T09:21:00.4156678Z cvt.rn.bf16x2.f32 %r14423, %r22841, %r22840; 2026-02-21T09:21:00.4156764Z cvt.rn.bf16x2.f32 %r14424, %r22843, %r22842; 2026-02-21T09:21:00.4156843Z cvt.rn.bf16x2.f32 %r14425, %r22845, %r22844; 2026-02-21T09:21:00.4156918Z cvt.rn.bf16x2.f32 %r14426, %r22847, %r22846; 2026-02-21T09:21:00.4156993Z cvt.rn.bf16x2.f32 %r14427, %r22849, %r22848; 2026-02-21T09:21:00.4157076Z cvt.rn.bf16x2.f32 %r14428, %r22851, %r22850; 2026-02-21T09:21:00.4157152Z cvt.rn.bf16x2.f32 %r14429, %r22853, %r22852; 2026-02-21T09:21:00.4157226Z cvt.rn.bf16x2.f32 %r14430, %r22855, %r22854; 2026-02-21T09:21:00.4157304Z cvt.rn.bf16x2.f32 %r14431, %r22857, %r22856; 2026-02-21T09:21:00.4157384Z cvt.rn.bf16x2.f32 %r14432, %r22859, %r22858; 2026-02-21T09:21:00.4157459Z cvt.rn.bf16x2.f32 %r14433, %r22861, %r22860; 2026-02-21T09:21:00.4157537Z cvt.rn.bf16x2.f32 %r14434, %r22863, %r22862; 2026-02-21T09:21:00.4157611Z cvt.rn.bf16x2.f32 %r14435, %r22865, %r22864; 2026-02-21T09:21:00.4157687Z cvt.rn.bf16x2.f32 %r14436, %r22867, %r22866; 2026-02-21T09:21:00.4157762Z cvt.rn.bf16x2.f32 %r14437, %r22869, %r22868; 2026-02-21T09:21:00.4157842Z cvt.rn.bf16x2.f32 %r14438, %r22871, %r22870; 2026-02-21T09:21:00.4157915Z cvt.rn.bf16x2.f32 %r14439, %r22873, %r22872; 2026-02-21T09:21:00.4157990Z cvt.rn.bf16x2.f32 %r14440, %r22875, %r22874; 2026-02-21T09:21:00.4158068Z cvt.rn.bf16x2.f32 %r14441, %r22877, %r22876; 2026-02-21T09:21:00.4158157Z cvt.rn.bf16x2.f32 %r14442, %r22879, %r22878; 2026-02-21T09:21:00.4158237Z cvt.rn.bf16x2.f32 %r14443, %r22881, 
%r22880; 2026-02-21T09:21:00.4158316Z cvt.rn.bf16x2.f32 %r14444, %r22883, %r22882; 2026-02-21T09:21:00.4158391Z cvt.rn.bf16x2.f32 %r14445, %r22885, %r22884; 2026-02-21T09:21:00.4158468Z cvt.rn.bf16x2.f32 %r14446, %r22887, %r22886; 2026-02-21T09:21:00.4158546Z cvt.rn.bf16x2.f32 %r14447, %r22889, %r22888; 2026-02-21T09:21:00.4158626Z cvt.rn.bf16x2.f32 %r14448, %r22891, %r22890; 2026-02-21T09:21:00.4158703Z cvt.rn.bf16x2.f32 %r14449, %r22893, %r22892; 2026-02-21T09:21:00.4158778Z cvt.rn.bf16x2.f32 %r14450, %r22895, %r22894; 2026-02-21T09:21:00.4158856Z cvt.rn.bf16x2.f32 %r14451, %r22897, %r22896; 2026-02-21T09:21:00.4159013Z cvt.rn.bf16x2.f32 %r14452, %r22899, %r22898; 2026-02-21T09:21:00.4159088Z cvt.rn.bf16x2.f32 %r14453, %r22901, %r22900; 2026-02-21T09:21:00.4159165Z cvt.rn.bf16x2.f32 %r14454, %r22903, %r22902; 2026-02-21T09:21:00.4159301Z cvt.rn.bf16x2.f32 %r14455, %r22905, %r22904; 2026-02-21T09:21:00.4159376Z cvt.rn.bf16x2.f32 %r14456, %r22907, %r22906; 2026-02-21T09:21:00.4159451Z cvt.rn.bf16x2.f32 %r14457, %r22909, %r22908; 2026-02-21T09:21:00.4159530Z cvt.rn.bf16x2.f32 %r14458, %r22911, %r22910; 2026-02-21T09:21:00.4159605Z cvt.rn.bf16x2.f32 %r14459, %r22913, %r22912; 2026-02-21T09:21:00.4159680Z cvt.rn.bf16x2.f32 %r14460, %r22915, %r22914; 2026-02-21T09:21:00.4159760Z cvt.rn.bf16x2.f32 %r14461, %r22917, %r22916; 2026-02-21T09:21:00.4159897Z cvt.rn.bf16x2.f32 %r14462, %r22919, %r22918; 2026-02-21T09:21:00.4159975Z cvt.rn.bf16x2.f32 %r14463, %r22921, %r22920; 2026-02-21T09:21:00.4160053Z cvt.rn.bf16x2.f32 %r14464, %r22923, %r22922; 2026-02-21T09:21:00.4160132Z cvt.rn.bf16x2.f32 %r14465, %r22925, %r22924; 2026-02-21T09:21:00.4160207Z cvt.rn.bf16x2.f32 %r14466, %r22927, %r22926; 2026-02-21T09:21:00.4160283Z cvt.rn.bf16x2.f32 %r14467, %r22929, %r22928; 2026-02-21T09:21:00.4160361Z cvt.rn.bf16x2.f32 %r14468, %r22931, %r22930; 2026-02-21T09:21:00.4160439Z cvt.rn.bf16x2.f32 %r14469, %r22933, %r22932; 2026-02-21T09:21:00.4160515Z cvt.rn.bf16x2.f32 %r14470, %r22935, 
%r22934; 2026-02-21T09:21:00.4160592Z cvt.rn.bf16x2.f32 %r14471, %r22937, %r22936; 2026-02-21T09:21:00.4160666Z cvt.rn.bf16x2.f32 %r14472, %r22939, %r22938; 2026-02-21T09:21:00.4160805Z cvt.rn.bf16x2.f32 %r14473, %r22941, %r22940; 2026-02-21T09:21:00.4160884Z cvt.rn.bf16x2.f32 %r14474, %r22943, %r22942; 2026-02-21T09:21:00.4160966Z cvt.rn.bf16x2.f32 %r14475, %r22945, %r22944; 2026-02-21T09:21:00.4161045Z cvt.rn.bf16x2.f32 %r14476, %r22947, %r22946; 2026-02-21T09:21:00.4161123Z cvt.rn.bf16x2.f32 %r14477, %r22949, %r22948; 2026-02-21T09:21:00.4161202Z cvt.rn.bf16x2.f32 %r14478, %r22951, %r22950; 2026-02-21T09:21:00.4161279Z cvt.rn.bf16x2.f32 %r14479, %r22953, %r22952; 2026-02-21T09:21:00.4161353Z cvt.rn.bf16x2.f32 %r14480, %r22955, %r22954; 2026-02-21T09:21:00.4161431Z cvt.rn.bf16x2.f32 %r14481, %r22957, %r22956; 2026-02-21T09:21:00.4161511Z cvt.rn.bf16x2.f32 %r14482, %r22959, %r22958; 2026-02-21T09:21:00.4161589Z cvt.rn.bf16x2.f32 %r14483, %r22961, %r22960; 2026-02-21T09:21:00.4161663Z cvt.rn.bf16x2.f32 %r14484, %r22963, %r22962; 2026-02-21T09:21:00.4161741Z cvt.rn.bf16x2.f32 %r14485, %r22965, %r22964; 2026-02-21T09:21:00.4161817Z cvt.rn.bf16x2.f32 %r14486, %r22967, %r22966; 2026-02-21T09:21:00.4161893Z cvt.rn.bf16x2.f32 %r14487, %r22969, %r22968; 2026-02-21T09:21:00.4161983Z cvt.rn.bf16x2.f32 %r14488, %r22971, %r22970; 2026-02-21T09:21:00.4162065Z cvt.rn.bf16x2.f32 %r14489, %r22973, %r22972; 2026-02-21T09:21:00.4162145Z cvt.rn.bf16x2.f32 %r14490, %r22975, %r22974; 2026-02-21T09:21:00.4162225Z cvt.rn.bf16x2.f32 %r14491, %r22977, %r22976; 2026-02-21T09:21:00.4162304Z cvt.rn.bf16x2.f32 %r14492, %r22979, %r22978; 2026-02-21T09:21:00.4162381Z cvt.rn.bf16x2.f32 %r14493, %r22981, %r22980; 2026-02-21T09:21:00.4162458Z cvt.rn.bf16x2.f32 %r14494, %r22983, %r22982; 2026-02-21T09:21:00.4162536Z cvt.rn.bf16x2.f32 %r14495, %r22985, %r22984; 2026-02-21T09:21:00.4162613Z cvt.rn.bf16x2.f32 %r14496, %r22987, %r22986; 2026-02-21T09:21:00.4162688Z cvt.rn.bf16x2.f32 %r14497, %r22989, 
%r22988; 2026-02-21T09:21:00.4162766Z cvt.rn.bf16x2.f32 %r14498, %r22991, %r22990; 2026-02-21T09:21:00.4162843Z cvt.rn.bf16x2.f32 %r14499, %r22993, %r22992; 2026-02-21T09:21:00.4162919Z cvt.rn.bf16x2.f32 %r14500, %r22995, %r22994; 2026-02-21T09:21:00.4163000Z cvt.rn.bf16x2.f32 %r14501, %r22997, %r22996; 2026-02-21T09:21:00.4163077Z cvt.rn.bf16x2.f32 %r14502, %r22999, %r22998; 2026-02-21T09:21:00.4163153Z cvt.rn.bf16x2.f32 %r14503, %r23001, %r23000; 2026-02-21T09:21:00.4163229Z cvt.rn.bf16x2.f32 %r14504, %r23003, %r23002; 2026-02-21T09:21:00.4163310Z cvt.rn.bf16x2.f32 %r14505, %r23005, %r23004; 2026-02-21T09:21:00.4163448Z cvt.rn.bf16x2.f32 %r14506, %r23007, %r23006; 2026-02-21T09:21:00.4163524Z cvt.rn.bf16x2.f32 %r14507, %r23009, %r23008; 2026-02-21T09:21:00.4163602Z cvt.rn.bf16x2.f32 %r14508, %r23011, %r23010; 2026-02-21T09:21:00.4163737Z cvt.rn.bf16x2.f32 %r14509, %r23013, %r23012; 2026-02-21T09:21:00.4163814Z cvt.rn.bf16x2.f32 %r14510, %r23015, %r23014; 2026-02-21T09:21:00.4163891Z cvt.rn.bf16x2.f32 %r14511, %r23017, %r23016; 2026-02-21T09:21:00.4163965Z cvt.rn.bf16x2.f32 %r14512, %r23019, %r23018; 2026-02-21T09:21:00.4164042Z cvt.rn.bf16x2.f32 %r14513, %r23021, %r23020; 2026-02-21T09:21:00.4164120Z cvt.rn.bf16x2.f32 %r14514, %r23023, %r23022; 2026-02-21T09:21:00.4164198Z cvt.rn.bf16x2.f32 %r14515, %r23025, %r23024; 2026-02-21T09:21:00.4164322Z cvt.rn.bf16x2.f32 %r14516, %r23027, %r23026; 2026-02-21T09:21:00.4164399Z cvt.rn.bf16x2.f32 %r14517, %r23029, %r23028; 2026-02-21T09:21:00.4164478Z cvt.rn.bf16x2.f32 %r14518, %r23031, %r23030; 2026-02-21T09:21:00.4164554Z cvt.rn.bf16x2.f32 %r14519, %r23033, %r23032; 2026-02-21T09:21:00.4164627Z cvt.rn.bf16x2.f32 %r14520, %r23035, %r23034; 2026-02-21T09:21:00.4164702Z cvt.rn.bf16x2.f32 %r14521, %r23037, %r23036; 2026-02-21T09:21:00.4164910Z .loc 1 91 43 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:91:43 2026-02-21T09:21:00.4164979Z shl.b32 %r14522, %r14362, 13; 2026-02-21T09:21:00.4165040Z shl.b32 %r14523, 
%r14363, 13; 2026-02-21T09:21:00.4165103Z shl.b32 %r14524, %r14364, 13; 2026-02-21T09:21:00.4165162Z shl.b32 %r14525, %r14365, 13; 2026-02-21T09:21:00.4165267Z shl.b32 %r14526, %r14366, 13; 2026-02-21T09:21:00.4165339Z shl.b32 %r14527, %r14367, 13; 2026-02-21T09:21:00.4165405Z shl.b32 %r14528, %r14368, 13; 2026-02-21T09:21:00.4165466Z shl.b32 %r14529, %r14369, 13; 2026-02-21T09:21:00.4165525Z shl.b32 %r14530, %r14370, 13; 2026-02-21T09:21:00.4165585Z shl.b32 %r14531, %r14371, 13; 2026-02-21T09:21:00.4165642Z shl.b32 %r14532, %r14372, 13; 2026-02-21T09:21:00.4165703Z shl.b32 %r14533, %r14373, 13; 2026-02-21T09:21:00.4165764Z shl.b32 %r14534, %r14374, 13; 2026-02-21T09:21:00.4165822Z shl.b32 %r14535, %r14375, 13; 2026-02-21T09:21:00.4165879Z shl.b32 %r14536, %r14376, 13; 2026-02-21T09:21:00.4165939Z shl.b32 %r14537, %r14377, 13; 2026-02-21T09:21:00.4166002Z shl.b32 %r14538, %r14378, 13; 2026-02-21T09:21:00.4166062Z shl.b32 %r14539, %r14379, 13; 2026-02-21T09:21:00.4166120Z shl.b32 %r14540, %r14380, 13; 2026-02-21T09:21:00.4166181Z shl.b32 %r14541, %r14381, 13; 2026-02-21T09:21:00.4166241Z shl.b32 %r14542, %r14382, 13; 2026-02-21T09:21:00.4166300Z shl.b32 %r14543, %r14383, 13; 2026-02-21T09:21:00.4166362Z shl.b32 %r14544, %r14384, 13; 2026-02-21T09:21:00.4166419Z shl.b32 %r14545, %r14385, 13; 2026-02-21T09:21:00.4166596Z shl.b32 %r14546, %r14386, 13; 2026-02-21T09:21:00.4166661Z shl.b32 %r14547, %r14387, 13; 2026-02-21T09:21:00.4166725Z shl.b32 %r14548, %r14388, 13; 2026-02-21T09:21:00.4166782Z shl.b32 %r14549, %r14389, 13; 2026-02-21T09:21:00.4166844Z shl.b32 %r14550, %r14390, 13; 2026-02-21T09:21:00.4166915Z shl.b32 %r14551, %r14391, 13; 2026-02-21T09:21:00.4166978Z shl.b32 %r14552, %r14392, 13; 2026-02-21T09:21:00.4167038Z shl.b32 %r14553, %r14393, 13; 2026-02-21T09:21:00.4167255Z .loc 1 91 50 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:91:50 2026-02-21T09:21:00.4167327Z add.s32 %r14554, %r14522, %r1288; 2026-02-21T09:21:00.4167389Z add.s32 %r14555, 
%r14523, %r1288; 2026-02-21T09:21:00.4167450Z add.s32 %r14556, %r14524, %r1288; 2026-02-21T09:21:00.4167516Z add.s32 %r14557, %r14525, %r1288; 2026-02-21T09:21:00.4167577Z add.s32 %r14558, %r14526, %r1288; 2026-02-21T09:21:00.4167637Z add.s32 %r14559, %r14527, %r1288; 2026-02-21T09:21:00.4167706Z add.s32 %r14560, %r14528, %r1288; 2026-02-21T09:21:00.4167767Z add.s32 %r14561, %r14529, %r1288; 2026-02-21T09:21:00.4167825Z add.s32 %r14562, %r14530, %r1288; 2026-02-21T09:21:00.4167883Z add.s32 %r14563, %r14531, %r1288; 2026-02-21T09:21:00.4168031Z add.s32 %r14564, %r14532, %r1288; 2026-02-21T09:21:00.4168091Z add.s32 %r14565, %r14533, %r1288; 2026-02-21T09:21:00.4168150Z add.s32 %r14566, %r14534, %r1288; 2026-02-21T09:21:00.4168213Z add.s32 %r14567, %r14535, %r1288; 2026-02-21T09:21:00.4168336Z add.s32 %r14568, %r14536, %r1288; 2026-02-21T09:21:00.4168397Z add.s32 %r14569, %r14537, %r1288; 2026-02-21T09:21:00.4168456Z add.s32 %r14570, %r14538, %r1288; 2026-02-21T09:21:00.4168521Z add.s32 %r14571, %r14539, %r1288; 2026-02-21T09:21:00.4168579Z add.s32 %r14572, %r14540, %r1288; 2026-02-21T09:21:00.4168639Z add.s32 %r14573, %r14541, %r1288; 2026-02-21T09:21:00.4168715Z add.s32 %r14574, %r14542, %r1288; 2026-02-21T09:21:00.4168779Z add.s32 %r14575, %r14543, %r1288; 2026-02-21T09:21:00.4168907Z add.s32 %r14576, %r14544, %r1288; 2026-02-21T09:21:00.4168977Z add.s32 %r14577, %r14545, %r1288; 2026-02-21T09:21:00.4169037Z add.s32 %r14578, %r14546, %r1288; 2026-02-21T09:21:00.4169098Z add.s32 %r14579, %r14547, %r1288; 2026-02-21T09:21:00.4169158Z add.s32 %r14580, %r14548, %r1288; 2026-02-21T09:21:00.4169223Z add.s32 %r14581, %r14549, %r1288; 2026-02-21T09:21:00.4169282Z add.s32 %r14582, %r14550, %r1288; 2026-02-21T09:21:00.4169341Z add.s32 %r14583, %r14551, %r1288; 2026-02-21T09:21:00.4169406Z add.s32 %r14584, %r14552, %r1288; 2026-02-21T09:21:00.4169466Z add.s32 %r14585, %r14553, %r1288; 2026-02-21T09:21:00.4169686Z .loc 1 91 22 // 
cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:91:22 2026-02-21T09:21:00.4169825Z mad.wide.s32 %rd413, %r14554, 2, %rd46; 2026-02-21T09:21:00.4169903Z mad.wide.s32 %rd414, %r14555, 2, %rd46; 2026-02-21T09:21:00.4169972Z mad.wide.s32 %rd415, %r14556, 2, %rd46; 2026-02-21T09:21:00.4170053Z mad.wide.s32 %rd416, %r14557, 2, %rd46; 2026-02-21T09:21:00.4170125Z mad.wide.s32 %rd417, %r14558, 2, %rd46; 2026-02-21T09:21:00.4170191Z mad.wide.s32 %rd418, %r14559, 2, %rd46; 2026-02-21T09:21:00.4170257Z mad.wide.s32 %rd419, %r14560, 2, %rd46; 2026-02-21T09:21:00.4170333Z mad.wide.s32 %rd420, %r14561, 2, %rd46; 2026-02-21T09:21:00.4170401Z mad.wide.s32 %rd421, %r14562, 2, %rd46; 2026-02-21T09:21:00.4170467Z mad.wide.s32 %rd422, %r14563, 2, %rd46; 2026-02-21T09:21:00.4170535Z mad.wide.s32 %rd423, %r14564, 2, %rd46; 2026-02-21T09:21:00.4170607Z mad.wide.s32 %rd424, %r14565, 2, %rd46; 2026-02-21T09:21:00.4170675Z mad.wide.s32 %rd425, %r14566, 2, %rd46; 2026-02-21T09:21:00.4170743Z mad.wide.s32 %rd426, %r14567, 2, %rd46; 2026-02-21T09:21:00.4170813Z mad.wide.s32 %rd427, %r14568, 2, %rd46; 2026-02-21T09:21:00.4170882Z mad.wide.s32 %rd428, %r14569, 2, %rd46; 2026-02-21T09:21:00.4170951Z mad.wide.s32 %rd429, %r14570, 2, %rd46; 2026-02-21T09:21:00.4171023Z mad.wide.s32 %rd430, %r14571, 2, %rd46; 2026-02-21T09:21:00.4171093Z mad.wide.s32 %rd431, %r14572, 2, %rd46; 2026-02-21T09:21:00.4171159Z mad.wide.s32 %rd432, %r14573, 2, %rd46; 2026-02-21T09:21:00.4171228Z mad.wide.s32 %rd433, %r14574, 2, %rd46; 2026-02-21T09:21:00.4171297Z mad.wide.s32 %rd434, %r14575, 2, %rd46; 2026-02-21T09:21:00.4171366Z mad.wide.s32 %rd435, %r14576, 2, %rd46; 2026-02-21T09:21:00.4171433Z mad.wide.s32 %rd436, %r14577, 2, %rd46; 2026-02-21T09:21:00.4171503Z mad.wide.s32 %rd437, %r14578, 2, %rd46; 2026-02-21T09:21:00.4171572Z mad.wide.s32 %rd438, %r14579, 2, %rd46; 2026-02-21T09:21:00.4171638Z mad.wide.s32 %rd439, %r14580, 2, %rd46; 2026-02-21T09:21:00.4171707Z mad.wide.s32 %rd440, %r14581, 2, %rd46; 
2026-02-21T09:21:00.4171773Z mad.wide.s32 %rd441, %r14582, 2, %rd46; 2026-02-21T09:21:00.4171840Z mad.wide.s32 %rd442, %r14583, 2, %rd46; 2026-02-21T09:21:00.4171907Z mad.wide.s32 %rd443, %r14584, 2, %rd46; 2026-02-21T09:21:00.4171978Z mad.wide.s32 %rd444, %r14585, 2, %rd46; 2026-02-21T09:21:00.4172199Z .loc 1 91 81 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:91:81 2026-02-21T09:21:00.4172327Z st.shared.v4.b32 [%r117], {%r14394, %r14396, %r14398, %r14400}; 2026-02-21T09:21:00.4172526Z st.shared.v4.b32 [%r118], {%r14402, %r14404, %r14406, %r14408}; 2026-02-21T09:21:00.4172636Z st.shared.v4.b32 [%r119], {%r14410, %r14412, %r14414, %r14416}; 2026-02-21T09:21:00.4172744Z st.shared.v4.b32 [%r120], {%r14418, %r14420, %r14422, %r14424}; 2026-02-21T09:21:00.4172901Z st.shared.v4.b32 [%r121], {%r14426, %r14428, %r14430, %r14432}; 2026-02-21T09:21:00.4173009Z st.shared.v4.b32 [%r122], {%r14434, %r14436, %r14438, %r14440}; 2026-02-21T09:21:00.4173114Z st.shared.v4.b32 [%r123], {%r14442, %r14444, %r14446, %r14448}; 2026-02-21T09:21:00.4173227Z st.shared.v4.b32 [%r124], {%r14450, %r14452, %r14454, %r14456}; 2026-02-21T09:21:00.4173284Z bar.sync 0; 2026-02-21T09:21:00.4173346Z // begin inline asm 2026-02-21T09:21:00.4173597Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14131, %r14132, %r14133, %r14134}, [%r6269]; 2026-02-21T09:21:00.4173665Z // end inline asm 2026-02-21T09:21:00.4173723Z // begin inline asm 2026-02-21T09:21:00.4173915Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14139, %r14140, %r14141, %r14142}, [%r6274]; 2026-02-21T09:21:00.4173980Z // end inline asm 2026-02-21T09:21:00.4174049Z // begin inline asm 2026-02-21T09:21:00.4174245Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14147, %r14148, %r14149, %r14150}, [%r6279]; 2026-02-21T09:21:00.4174307Z // end inline asm 2026-02-21T09:21:00.4174366Z // begin inline asm 2026-02-21T09:21:00.4174551Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14155, %r14156, %r14157, %r14158}, [%r6284]; 
2026-02-21T09:21:00.4174611Z // end inline asm 2026-02-21T09:21:00.4174676Z // begin inline asm 2026-02-21T09:21:00.4174909Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14163, %r14164, %r14165, %r14166}, [%r6289]; 2026-02-21T09:21:00.4174968Z // end inline asm 2026-02-21T09:21:00.4175029Z // begin inline asm 2026-02-21T09:21:00.4175216Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14171, %r14172, %r14173, %r14174}, [%r6294]; 2026-02-21T09:21:00.4175274Z // end inline asm 2026-02-21T09:21:00.4175332Z // begin inline asm 2026-02-21T09:21:00.4175524Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14179, %r14180, %r14181, %r14182}, [%r6299]; 2026-02-21T09:21:00.4175583Z // end inline asm 2026-02-21T09:21:00.4175641Z // begin inline asm 2026-02-21T09:21:00.4175835Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14187, %r14188, %r14189, %r14190}, [%r6304]; 2026-02-21T09:21:00.4175891Z // end inline asm 2026-02-21T09:21:00.4175946Z bar.sync 0; 2026-02-21T09:21:00.4176067Z st.shared.v4.b32 [%r117], {%r14395, %r14397, %r14399, %r14401}; 2026-02-21T09:21:00.4176179Z st.shared.v4.b32 [%r118], {%r14403, %r14405, %r14407, %r14409}; 2026-02-21T09:21:00.4176288Z st.shared.v4.b32 [%r119], {%r14411, %r14413, %r14415, %r14417}; 2026-02-21T09:21:00.4176393Z st.shared.v4.b32 [%r120], {%r14419, %r14421, %r14423, %r14425}; 2026-02-21T09:21:00.4176629Z st.shared.v4.b32 [%r121], {%r14427, %r14429, %r14431, %r14433}; 2026-02-21T09:21:00.4176742Z st.shared.v4.b32 [%r122], {%r14435, %r14437, %r14439, %r14441}; 2026-02-21T09:21:00.4176853Z st.shared.v4.b32 [%r123], {%r14443, %r14445, %r14447, %r14449}; 2026-02-21T09:21:00.4176965Z st.shared.v4.b32 [%r124], {%r14451, %r14453, %r14455, %r14457}; 2026-02-21T09:21:00.4177020Z bar.sync 0; 2026-02-21T09:21:00.4177078Z // begin inline asm 2026-02-21T09:21:00.4177274Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14135, %r14136, %r14137, %r14138}, [%r6269]; 2026-02-21T09:21:00.4177343Z // end inline asm 2026-02-21T09:21:00.4177406Z // begin inline asm 
2026-02-21T09:21:00.4177601Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14143, %r14144, %r14145, %r14146}, [%r6274]; 2026-02-21T09:21:00.4177664Z // end inline asm 2026-02-21T09:21:00.4177721Z // begin inline asm 2026-02-21T09:21:00.4177913Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14151, %r14152, %r14153, %r14154}, [%r6279]; 2026-02-21T09:21:00.4177981Z // end inline asm 2026-02-21T09:21:00.4178044Z // begin inline asm 2026-02-21T09:21:00.4178232Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14159, %r14160, %r14161, %r14162}, [%r6284]; 2026-02-21T09:21:00.4178372Z // end inline asm 2026-02-21T09:21:00.4178432Z // begin inline asm 2026-02-21T09:21:00.4178618Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14167, %r14168, %r14169, %r14170}, [%r6289]; 2026-02-21T09:21:00.4178675Z // end inline asm 2026-02-21T09:21:00.4178799Z // begin inline asm 2026-02-21T09:21:00.4178985Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14175, %r14176, %r14177, %r14178}, [%r6294]; 2026-02-21T09:21:00.4179042Z // end inline asm 2026-02-21T09:21:00.4179117Z // begin inline asm 2026-02-21T09:21:00.4179309Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14183, %r14184, %r14185, %r14186}, [%r6299]; 2026-02-21T09:21:00.4179367Z // end inline asm 2026-02-21T09:21:00.4179428Z // begin inline asm 2026-02-21T09:21:00.4179676Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14191, %r14192, %r14193, %r14194}, [%r6304]; 2026-02-21T09:21:00.4179736Z // end inline asm 2026-02-21T09:21:00.4179791Z bar.sync 0; 2026-02-21T09:21:00.4179911Z st.shared.v4.b32 [%r117], {%r14458, %r14460, %r14462, %r14464}; 2026-02-21T09:21:00.4180025Z st.shared.v4.b32 [%r118], {%r14466, %r14468, %r14470, %r14472}; 2026-02-21T09:21:00.4180133Z st.shared.v4.b32 [%r119], {%r14474, %r14476, %r14478, %r14480}; 2026-02-21T09:21:00.4180244Z st.shared.v4.b32 [%r120], {%r14482, %r14484, %r14486, %r14488}; 2026-02-21T09:21:00.4180353Z st.shared.v4.b32 [%r121], {%r14490, %r14492, %r14494, %r14496}; 2026-02-21T09:21:00.4180459Z 
st.shared.v4.b32 [%r122], {%r14498, %r14500, %r14502, %r14504}; 2026-02-21T09:21:00.4180566Z st.shared.v4.b32 [%r123], {%r14506, %r14508, %r14510, %r14512}; 2026-02-21T09:21:00.4180736Z st.shared.v4.b32 [%r124], {%r14514, %r14516, %r14518, %r14520}; 2026-02-21T09:21:00.4180804Z bar.sync 0; 2026-02-21T09:21:00.4180875Z // begin inline asm 2026-02-21T09:21:00.4181070Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14195, %r14196, %r14197, %r14198}, [%r6269]; 2026-02-21T09:21:00.4181129Z // end inline asm 2026-02-21T09:21:00.4181191Z // begin inline asm 2026-02-21T09:21:00.4181380Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14203, %r14204, %r14205, %r14206}, [%r6274]; 2026-02-21T09:21:00.4181435Z // end inline asm 2026-02-21T09:21:00.4181495Z // begin inline asm 2026-02-21T09:21:00.4181679Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14211, %r14212, %r14213, %r14214}, [%r6279]; 2026-02-21T09:21:00.4181736Z // end inline asm 2026-02-21T09:21:00.4181796Z // begin inline asm 2026-02-21T09:21:00.4181980Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14219, %r14220, %r14221, %r14222}, [%r6284]; 2026-02-21T09:21:00.4182036Z // end inline asm 2026-02-21T09:21:00.4182095Z // begin inline asm 2026-02-21T09:21:00.4182281Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14227, %r14228, %r14229, %r14230}, [%r6289]; 2026-02-21T09:21:00.4182336Z // end inline asm 2026-02-21T09:21:00.4182394Z // begin inline asm 2026-02-21T09:21:00.4182586Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14235, %r14236, %r14237, %r14238}, [%r6294]; 2026-02-21T09:21:00.4182647Z // end inline asm 2026-02-21T09:21:00.4182707Z // begin inline asm 2026-02-21T09:21:00.4182890Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14243, %r14244, %r14245, %r14246}, [%r6299]; 2026-02-21T09:21:00.4182951Z // end inline asm 2026-02-21T09:21:00.4183008Z // begin inline asm 2026-02-21T09:21:00.4183194Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14251, %r14252, %r14253, %r14254}, [%r6304]; 2026-02-21T09:21:00.4183255Z // end inline 
asm 2026-02-21T09:21:00.4183309Z bar.sync 0; 2026-02-21T09:21:00.4183418Z st.shared.v4.b32 [%r117], {%r14459, %r14461, %r14463, %r14465}; 2026-02-21T09:21:00.4183529Z st.shared.v4.b32 [%r118], {%r14467, %r14469, %r14471, %r14473}; 2026-02-21T09:21:00.4183637Z st.shared.v4.b32 [%r119], {%r14475, %r14477, %r14479, %r14481}; 2026-02-21T09:21:00.4183744Z st.shared.v4.b32 [%r120], {%r14483, %r14485, %r14487, %r14489}; 2026-02-21T09:21:00.4183850Z st.shared.v4.b32 [%r121], {%r14491, %r14493, %r14495, %r14497}; 2026-02-21T09:21:00.4183959Z st.shared.v4.b32 [%r122], {%r14499, %r14501, %r14503, %r14505}; 2026-02-21T09:21:00.4184128Z st.shared.v4.b32 [%r123], {%r14507, %r14509, %r14511, %r14513}; 2026-02-21T09:21:00.4184233Z st.shared.v4.b32 [%r124], {%r14515, %r14517, %r14519, %r14521}; 2026-02-21T09:21:00.4184289Z bar.sync 0; 2026-02-21T09:21:00.4184394Z // begin inline asm 2026-02-21T09:21:00.4184583Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14199, %r14200, %r14201, %r14202}, [%r6269]; 2026-02-21T09:21:00.4184640Z // end inline asm 2026-02-21T09:21:00.4184699Z // begin inline asm 2026-02-21T09:21:00.4184888Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14207, %r14208, %r14209, %r14210}, [%r6274]; 2026-02-21T09:21:00.4184944Z // end inline asm 2026-02-21T09:21:00.4185004Z // begin inline asm 2026-02-21T09:21:00.4185246Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14215, %r14216, %r14217, %r14218}, [%r6279]; 2026-02-21T09:21:00.4185309Z // end inline asm 2026-02-21T09:21:00.4185370Z // begin inline asm 2026-02-21T09:21:00.4185557Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14223, %r14224, %r14225, %r14226}, [%r6284]; 2026-02-21T09:21:00.4185615Z // end inline asm 2026-02-21T09:21:00.4185676Z // begin inline asm 2026-02-21T09:21:00.4185860Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14231, %r14232, %r14233, %r14234}, [%r6289]; 2026-02-21T09:21:00.4185918Z // end inline asm 2026-02-21T09:21:00.4185974Z // begin inline asm 2026-02-21T09:21:00.4186161Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14239, %r14240, %r14241, %r14242}, [%r6294]; 2026-02-21T09:21:00.4186217Z // end inline asm 2026-02-21T09:21:00.4186275Z // begin inline asm 2026-02-21T09:21:00.4186642Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14247, %r14248, %r14249, %r14250}, [%r6299]; 2026-02-21T09:21:00.4186710Z // end inline asm 2026-02-21T09:21:00.4186768Z // begin inline asm 2026-02-21T09:21:00.4186971Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14255, %r14256, %r14257, %r14258}, [%r6304]; 2026-02-21T09:21:00.4187031Z // end inline asm 2026-02-21T09:21:00.4187090Z // begin inline asm 2026-02-21T09:21:00.4187224Z st.global.v4.b32 [ %rd413 + 0 ], { %r14131, %r14132, %r14133, %r14134 }; 2026-02-21T09:21:00.4187285Z // end inline asm 2026-02-21T09:21:00.4187343Z // begin inline asm 2026-02-21T09:21:00.4187467Z st.global.v4.b32 [ %rd414 + 0 ], { %r14135, %r14136, %r14137, %r14138 }; 2026-02-21T09:21:00.4187529Z // end inline asm 2026-02-21T09:21:00.4187586Z // begin inline asm 2026-02-21T09:21:00.4187706Z st.global.v4.b32 [ %rd415 + 0 ], { %r14139, %r14140, %r14141, %r14142 }; 2026-02-21T09:21:00.4187764Z // end inline asm 2026-02-21T09:21:00.4187825Z // begin inline asm 2026-02-21T09:21:00.4187944Z st.global.v4.b32 [ %rd416 + 0 ], { %r14143, %r14144, %r14145, %r14146 }; 2026-02-21T09:21:00.4188002Z // end inline asm 2026-02-21T09:21:00.4188063Z // begin inline asm 2026-02-21T09:21:00.4188183Z st.global.v4.b32 [ %rd417 + 0 ], { %r14147, %r14148, %r14149, %r14150 }; 2026-02-21T09:21:00.4188239Z // end inline asm 2026-02-21T09:21:00.4188384Z // begin inline asm 2026-02-21T09:21:00.4188508Z st.global.v4.b32 [ %rd418 + 0 ], { %r14151, %r14152, %r14153, %r14154 }; 2026-02-21T09:21:00.4188565Z // end inline asm 2026-02-21T09:21:00.4188623Z // begin inline asm 2026-02-21T09:21:00.4188742Z st.global.v4.b32 [ %rd419 + 0 ], { %r14155, %r14156, %r14157, %r14158 }; 2026-02-21T09:21:00.4188799Z // end inline asm 2026-02-21T09:21:00.4188859Z // begin inline asm 
2026-02-21T09:21:00.4188976Z st.global.v4.b32 [ %rd420 + 0 ], { %r14159, %r14160, %r14161, %r14162 }; 2026-02-21T09:21:00.4189032Z // end inline asm 2026-02-21T09:21:00.4189090Z // begin inline asm 2026-02-21T09:21:00.4189208Z st.global.v4.b32 [ %rd421 + 0 ], { %r14163, %r14164, %r14165, %r14166 }; 2026-02-21T09:21:00.4189267Z // end inline asm 2026-02-21T09:21:00.4189324Z // begin inline asm 2026-02-21T09:21:00.4189442Z st.global.v4.b32 [ %rd422 + 0 ], { %r14167, %r14168, %r14169, %r14170 }; 2026-02-21T09:21:00.4189500Z // end inline asm 2026-02-21T09:21:00.4189558Z // begin inline asm 2026-02-21T09:21:00.4189675Z st.global.v4.b32 [ %rd423 + 0 ], { %r14171, %r14172, %r14173, %r14174 }; 2026-02-21T09:21:00.4189815Z // end inline asm 2026-02-21T09:21:00.4189873Z // begin inline asm 2026-02-21T09:21:00.4190001Z st.global.v4.b32 [ %rd424 + 0 ], { %r14175, %r14176, %r14177, %r14178 }; 2026-02-21T09:21:00.4190125Z // end inline asm 2026-02-21T09:21:00.4190186Z // begin inline asm 2026-02-21T09:21:00.4190305Z st.global.v4.b32 [ %rd425 + 0 ], { %r14179, %r14180, %r14181, %r14182 }; 2026-02-21T09:21:00.4190364Z // end inline asm 2026-02-21T09:21:00.4190424Z // begin inline asm 2026-02-21T09:21:00.4190545Z st.global.v4.b32 [ %rd426 + 0 ], { %r14183, %r14184, %r14185, %r14186 }; 2026-02-21T09:21:00.4190603Z // end inline asm 2026-02-21T09:21:00.4190659Z // begin inline asm 2026-02-21T09:21:00.4190849Z st.global.v4.b32 [ %rd427 + 0 ], { %r14187, %r14188, %r14189, %r14190 }; 2026-02-21T09:21:00.4190908Z // end inline asm 2026-02-21T09:21:00.4190965Z // begin inline asm 2026-02-21T09:21:00.4191087Z st.global.v4.b32 [ %rd428 + 0 ], { %r14191, %r14192, %r14193, %r14194 }; 2026-02-21T09:21:00.4191148Z // end inline asm 2026-02-21T09:21:00.4191205Z // begin inline asm 2026-02-21T09:21:00.4191324Z st.global.v4.b32 [ %rd429 + 0 ], { %r14195, %r14196, %r14197, %r14198 }; 2026-02-21T09:21:00.4191380Z // end inline asm 2026-02-21T09:21:00.4191439Z // begin inline asm 
2026-02-21T09:21:00.4191556Z st.global.v4.b32 [ %rd430 + 0 ], { %r14199, %r14200, %r14201, %r14202 }; 2026-02-21T09:21:00.4191617Z // end inline asm 2026-02-21T09:21:00.4191677Z // begin inline asm 2026-02-21T09:21:00.4191863Z st.global.v4.b32 [ %rd431 + 0 ], { %r14203, %r14204, %r14205, %r14206 }; 2026-02-21T09:21:00.4191925Z // end inline asm 2026-02-21T09:21:00.4191983Z // begin inline asm 2026-02-21T09:21:00.4192101Z st.global.v4.b32 [ %rd432 + 0 ], { %r14207, %r14208, %r14209, %r14210 }; 2026-02-21T09:21:00.4192157Z // end inline asm 2026-02-21T09:21:00.4192218Z // begin inline asm 2026-02-21T09:21:00.4192334Z st.global.v4.b32 [ %rd433 + 0 ], { %r14211, %r14212, %r14213, %r14214 }; 2026-02-21T09:21:00.4192392Z // end inline asm 2026-02-21T09:21:00.4192452Z // begin inline asm 2026-02-21T09:21:00.4192568Z st.global.v4.b32 [ %rd434 + 0 ], { %r14215, %r14216, %r14217, %r14218 }; 2026-02-21T09:21:00.4192622Z // end inline asm 2026-02-21T09:21:00.4192683Z // begin inline asm 2026-02-21T09:21:00.4192801Z st.global.v4.b32 [ %rd435 + 0 ], { %r14219, %r14220, %r14221, %r14222 }; 2026-02-21T09:21:00.4192857Z // end inline asm 2026-02-21T09:21:00.4192914Z // begin inline asm 2026-02-21T09:21:00.4193037Z st.global.v4.b32 [ %rd436 + 0 ], { %r14223, %r14224, %r14225, %r14226 }; 2026-02-21T09:21:00.4193093Z // end inline asm 2026-02-21T09:21:00.4193150Z // begin inline asm 2026-02-21T09:21:00.4193271Z st.global.v4.b32 [ %rd437 + 0 ], { %r14227, %r14228, %r14229, %r14230 }; 2026-02-21T09:21:00.4193328Z // end inline asm 2026-02-21T09:21:00.4193387Z // begin inline asm 2026-02-21T09:21:00.4193506Z st.global.v4.b32 [ %rd438 + 0 ], { %r14231, %r14232, %r14233, %r14234 }; 2026-02-21T09:21:00.4193567Z // end inline asm 2026-02-21T09:21:00.4193626Z // begin inline asm 2026-02-21T09:21:00.4193740Z st.global.v4.b32 [ %rd439 + 0 ], { %r14235, %r14236, %r14237, %r14238 }; 2026-02-21T09:21:00.4193799Z // end inline asm 2026-02-21T09:21:00.4193856Z // begin inline asm 
2026-02-21T09:21:00.4193975Z st.global.v4.b32 [ %rd440 + 0 ], { %r14239, %r14240, %r14241, %r14242 }; 2026-02-21T09:21:00.4194033Z // end inline asm 2026-02-21T09:21:00.4194090Z // begin inline asm 2026-02-21T09:21:00.4194208Z st.global.v4.b32 [ %rd441 + 0 ], { %r14243, %r14244, %r14245, %r14246 }; 2026-02-21T09:21:00.4194264Z // end inline asm 2026-02-21T09:21:00.4194324Z // begin inline asm 2026-02-21T09:21:00.4194440Z st.global.v4.b32 [ %rd442 + 0 ], { %r14247, %r14248, %r14249, %r14250 }; 2026-02-21T09:21:00.4194509Z // end inline asm 2026-02-21T09:21:00.4194573Z // begin inline asm 2026-02-21T09:21:00.4194692Z st.global.v4.b32 [ %rd443 + 0 ], { %r14251, %r14252, %r14253, %r14254 }; 2026-02-21T09:21:00.4194809Z // end inline asm 2026-02-21T09:21:00.4194867Z // begin inline asm 2026-02-21T09:21:00.4194985Z st.global.v4.b32 [ %rd444 + 0 ], { %r14255, %r14256, %r14257, %r14258 }; 2026-02-21T09:21:00.4195041Z // end inline asm 2026-02-21T09:21:00.4195263Z .loc 1 22 121 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:22:121 2026-02-21T09:21:00.4195378Z add.s32 %r14586, %r22257, 3; 2026-02-21T09:21:00.4195587Z .loc 1 29 33 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:29:33 2026-02-21T09:21:00.4195652Z shr.u32 %r14587, %r14586, 6; 2026-02-21T09:21:00.4195722Z and.b32 %r14588, %r14587, 33554424; 2026-02-21T09:21:00.4195923Z .loc 1 30 39 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:30:39 2026-02-21T09:21:00.4196053Z sub.s32 %r14589, 32, %r14588; 2026-02-21T09:21:00.4196256Z .loc 1 30 52 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:30:52 2026-02-21T09:21:00.4196321Z min.s32 %r14590, %r14589, 8; 2026-02-21T09:21:00.4196634Z .loc 1 31 45 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:31:45 2026-02-21T09:21:00.4196701Z and.b32 %r14591, %r14586, 511; 2026-02-21T09:21:00.4196898Z .loc 1 32 51 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:32:51 2026-02-21T09:21:00.4196964Z div.s32 %r14592, %r14591, 
%r14590; 2026-02-21T09:21:00.4197157Z .loc 1 31 64 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:31:64 2026-02-21T09:21:00.4197313Z mul.lo.s32 %r14593, %r14592, %r14590; 2026-02-21T09:21:00.4197379Z sub.s32 %r14594, %r14591, %r14593; 2026-02-21T09:21:00.4197577Z .loc 1 31 30 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:31:30 2026-02-21T09:21:00.4197640Z add.s32 %r14595, %r14594, %r14588; 2026-02-21T09:21:00.4197837Z .loc 1 33 27 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:33:27 2026-02-21T09:21:00.4197901Z shl.b32 %r14596, %r14595, 8; 2026-02-21T09:21:00.4198098Z .loc 1 34 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:34:32 2026-02-21T09:21:00.4198161Z or.b32 %r1812, %r14596, %r44; 2026-02-21T09:21:00.4198357Z .loc 1 35 27 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:35:27 2026-02-21T09:21:00.4198419Z shl.b32 %r1813, %r14592, 8; 2026-02-21T09:21:00.4198614Z .loc 1 36 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:36:32 2026-02-21T09:21:00.4198676Z or.b32 %r14597, %r1813, %r6; 2026-02-21T09:21:00.4198735Z or.b32 %r14598, %r1813, %r7; 2026-02-21T09:21:00.4198795Z or.b32 %r14599, %r1813, %r8; 2026-02-21T09:21:00.4198856Z or.b32 %r14600, %r1813, %r9; 2026-02-21T09:21:00.4199049Z .loc 1 51 53 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:53 2026-02-21T09:21:00.4199112Z shl.b32 %r14601, %r14597, 10; 2026-02-21T09:21:00.4199172Z shl.b32 %r14602, %r14598, 10; 2026-02-21T09:21:00.4199232Z shl.b32 %r14603, %r14599, 10; 2026-02-21T09:21:00.4199290Z shl.b32 %r14604, %r14600, 10; 2026-02-21T09:21:00.4199484Z .loc 1 51 60 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:60 2026-02-21T09:21:00.4199547Z or.b32 %r14605, %r14601, %r46; 2026-02-21T09:21:00.4199609Z or.b32 %r14606, %r14602, %r46; 2026-02-21T09:21:00.4199671Z or.b32 %r14607, %r14603, %r46; 2026-02-21T09:21:00.4199731Z or.b32 %r14608, %r14604, %r46; 2026-02-21T09:21:00.4199924Z .loc 1 51 32 // 
cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4200002Z mad.wide.s32 %rd445, %r14605, 2, %rd44; 2026-02-21T09:21:00.4200073Z mad.wide.s32 %rd446, %r14606, 2, %rd44; 2026-02-21T09:21:00.4200144Z mad.wide.s32 %rd447, %r14607, 2, %rd44; 2026-02-21T09:21:00.4200211Z mad.wide.s32 %rd448, %r14608, 2, %rd44; 2026-02-21T09:21:00.4200407Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4200550Z bar.sync 0; 2026-02-21T09:21:00.4200609Z mov.b32 %r14260, 8; 2026-02-21T09:21:00.4200674Z // begin inline asm 2026-02-21T09:21:00.4200819Z cp.async.ca.shared.global [ %r48 + 0 ], [ %rd445 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4200939Z // end inline asm 2026-02-21T09:21:00.4200998Z // begin inline asm 2026-02-21T09:21:00.4201134Z cp.async.ca.shared.global [ %r49 + 0 ], [ %rd446 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4201190Z // end inline asm 2026-02-21T09:21:00.4201250Z // begin inline asm 2026-02-21T09:21:00.4201384Z cp.async.ca.shared.global [ %r50 + 0 ], [ %rd447 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4201441Z // end inline asm 2026-02-21T09:21:00.4201500Z // begin inline asm 2026-02-21T09:21:00.4201694Z cp.async.ca.shared.global [ %r51 + 0 ], [ %rd448 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4201753Z // end inline asm 2026-02-21T09:21:00.4201821Z cp.async.commit_group; 2026-02-21T09:21:00.4202020Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4202091Z add.s32 %r14609, %r1812, %r22238; 2026-02-21T09:21:00.4202294Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4202361Z cvt.s64.s32 %rd496, %r14609; 2026-02-21T09:21:00.4202441Z add.s64 %rd449, %rd45, %rd496; 2026-02-21T09:21:00.4202637Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4202750Z // begin inline asm 2026-02-21T09:21:00.4202887Z cp.async.ca.shared.global [ %r54 + 0 ], [ %rd449 + 0 
], 0x8, %r14260; 2026-02-21T09:21:00.4202943Z // end inline asm 2026-02-21T09:21:00.4203009Z cp.async.commit_group; 2026-02-21T09:21:00.4203205Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4203271Z cvt.s64.s32 %rd497, %r14601; 2026-02-21T09:21:00.4203335Z or.b64 %rd498, %rd497, %rd713; 2026-02-21T09:21:00.4203397Z shl.b64 %rd499, %rd498, 1; 2026-02-21T09:21:00.4203460Z add.s64 %rd500, %rd44, %rd499; 2026-02-21T09:21:00.4203523Z add.s64 %rd450, %rd500, 32; 2026-02-21T09:21:00.4203585Z cvt.s64.s32 %rd501, %r14602; 2026-02-21T09:21:00.4203648Z or.b64 %rd502, %rd501, %rd713; 2026-02-21T09:21:00.4203711Z shl.b64 %rd503, %rd502, 1; 2026-02-21T09:21:00.4203771Z add.s64 %rd504, %rd44, %rd503; 2026-02-21T09:21:00.4203832Z add.s64 %rd451, %rd504, 32; 2026-02-21T09:21:00.4203894Z cvt.s64.s32 %rd505, %r14603; 2026-02-21T09:21:00.4203960Z or.b64 %rd506, %rd505, %rd713; 2026-02-21T09:21:00.4204020Z shl.b64 %rd507, %rd506, 1; 2026-02-21T09:21:00.4204084Z add.s64 %rd508, %rd44, %rd507; 2026-02-21T09:21:00.4204146Z add.s64 %rd452, %rd508, 32; 2026-02-21T09:21:00.4204205Z cvt.s64.s32 %rd509, %r14604; 2026-02-21T09:21:00.4204264Z or.b64 %rd510, %rd509, %rd713; 2026-02-21T09:21:00.4204327Z shl.b64 %rd511, %rd510, 1; 2026-02-21T09:21:00.4204390Z add.s64 %rd512, %rd44, %rd511; 2026-02-21T09:21:00.4204449Z add.s64 %rd453, %rd512, 32; 2026-02-21T09:21:00.4204648Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4204708Z // begin inline asm 2026-02-21T09:21:00.4204842Z cp.async.ca.shared.global [ %r55 + 0 ], [ %rd450 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4204901Z // end inline asm 2026-02-21T09:21:00.4204965Z // begin inline asm 2026-02-21T09:21:00.4205095Z cp.async.ca.shared.global [ %r56 + 0 ], [ %rd451 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4205165Z // end inline asm 2026-02-21T09:21:00.4205228Z // begin inline asm 2026-02-21T09:21:00.4205357Z cp.async.ca.shared.global [ %r57 + 0 
], [ %rd452 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4205416Z // end inline asm 2026-02-21T09:21:00.4205477Z // begin inline asm 2026-02-21T09:21:00.4205606Z cp.async.ca.shared.global [ %r58 + 0 ], [ %rd453 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4205662Z // end inline asm 2026-02-21T09:21:00.4205788Z cp.async.commit_group; 2026-02-21T09:21:00.4205988Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4206051Z add.s32 %r14610, %r1812, %r59; 2026-02-21T09:21:00.4206294Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4206361Z cvt.s64.s32 %rd513, %r14610; 2026-02-21T09:21:00.4206423Z add.s64 %rd454, %rd45, %rd513; 2026-02-21T09:21:00.4206749Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4206815Z // begin inline asm 2026-02-21T09:21:00.4206949Z cp.async.ca.shared.global [ %r60 + 0 ], [ %rd454 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4207078Z // end inline asm 2026-02-21T09:21:00.4207146Z cp.async.commit_group; 2026-02-21T09:21:00.4207342Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4207406Z add.s64 %rd455, %rd500, 64; 2026-02-21T09:21:00.4207467Z add.s64 %rd456, %rd504, 64; 2026-02-21T09:21:00.4207529Z add.s64 %rd457, %rd508, 64; 2026-02-21T09:21:00.4207589Z add.s64 %rd458, %rd512, 64; 2026-02-21T09:21:00.4207784Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4207845Z bar.sync 0; 2026-02-21T09:21:00.4207904Z // begin inline asm 2026-02-21T09:21:00.4208037Z cp.async.ca.shared.global [ %r61 + 0 ], [ %rd455 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4208158Z // end inline asm 2026-02-21T09:21:00.4208221Z // begin inline asm 2026-02-21T09:21:00.4208352Z cp.async.ca.shared.global [ %r62 + 0 ], [ %rd456 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4208408Z // end inline asm 2026-02-21T09:21:00.4208470Z // begin inline asm 
2026-02-21T09:21:00.4208599Z cp.async.ca.shared.global [ %r63 + 0 ], [ %rd457 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4208656Z // end inline asm 2026-02-21T09:21:00.4208715Z // begin inline asm 2026-02-21T09:21:00.4208846Z cp.async.ca.shared.global [ %r64 + 0 ], [ %rd458 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4208901Z // end inline asm 2026-02-21T09:21:00.4208964Z cp.async.commit_group; 2026-02-21T09:21:00.4209168Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4209239Z add.s32 %r14611, %r1812, %r65; 2026-02-21T09:21:00.4209437Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4209503Z cvt.s64.s32 %rd514, %r14611; 2026-02-21T09:21:00.4209565Z add.s64 %rd459, %rd45, %rd514; 2026-02-21T09:21:00.4209759Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4209819Z // begin inline asm 2026-02-21T09:21:00.4209956Z cp.async.ca.shared.global [ %r66 + 0 ], [ %rd459 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4210013Z // end inline asm 2026-02-21T09:21:00.4210079Z cp.async.commit_group; 2026-02-21T09:21:00.4210274Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4210336Z add.s64 %rd460, %rd500, 96; 2026-02-21T09:21:00.4210396Z add.s64 %rd461, %rd504, 96; 2026-02-21T09:21:00.4210460Z add.s64 %rd462, %rd508, 96; 2026-02-21T09:21:00.4210521Z add.s64 %rd463, %rd512, 96; 2026-02-21T09:21:00.4210715Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4210774Z // begin inline asm 2026-02-21T09:21:00.4210906Z cp.async.ca.shared.global [ %r67 + 0 ], [ %rd460 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4210965Z // end inline asm 2026-02-21T09:21:00.4211022Z // begin inline asm 2026-02-21T09:21:00.4211155Z cp.async.ca.shared.global [ %r68 + 0 ], [ %rd461 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4211212Z // end inline asm 2026-02-21T09:21:00.4211269Z 
// begin inline asm 2026-02-21T09:21:00.4211397Z cp.async.ca.shared.global [ %r69 + 0 ], [ %rd462 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4211548Z // end inline asm 2026-02-21T09:21:00.4211607Z // begin inline asm 2026-02-21T09:21:00.4211746Z cp.async.ca.shared.global [ %r70 + 0 ], [ %rd463 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4211865Z // end inline asm 2026-02-21T09:21:00.4211931Z cp.async.commit_group; 2026-02-21T09:21:00.4212133Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4212197Z add.s32 %r14612, %r1812, %r71; 2026-02-21T09:21:00.4212396Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4212458Z cvt.s64.s32 %rd515, %r14612; 2026-02-21T09:21:00.4212528Z add.s64 %rd464, %rd45, %rd515; 2026-02-21T09:21:00.4212792Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4212855Z // begin inline asm 2026-02-21T09:21:00.4212990Z cp.async.ca.shared.global [ %r72 + 0 ], [ %rd464 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4213051Z // end inline asm 2026-02-21T09:21:00.4213114Z cp.async.commit_group; 2026-02-21T09:21:00.4213307Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4213373Z add.s64 %rd465, %rd500, 128; 2026-02-21T09:21:00.4213433Z add.s64 %rd466, %rd504, 128; 2026-02-21T09:21:00.4213493Z add.s64 %rd467, %rd508, 128; 2026-02-21T09:21:00.4213563Z add.s64 %rd468, %rd512, 128; 2026-02-21T09:21:00.4213810Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4213868Z bar.sync 0; 2026-02-21T09:21:00.4213926Z // begin inline asm 2026-02-21T09:21:00.4214061Z cp.async.ca.shared.global [ %r73 + 0 ], [ %rd465 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4214117Z // end inline asm 2026-02-21T09:21:00.4214174Z // begin inline asm 2026-02-21T09:21:00.4214307Z cp.async.ca.shared.global [ %r74 + 0 ], [ %rd466 + 0 ], 0x8, %r14260; 
2026-02-21T09:21:00.4214364Z // end inline asm 2026-02-21T09:21:00.4214421Z // begin inline asm 2026-02-21T09:21:00.4214548Z cp.async.ca.shared.global [ %r75 + 0 ], [ %rd467 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4214609Z // end inline asm 2026-02-21T09:21:00.4214679Z // begin inline asm 2026-02-21T09:21:00.4214811Z cp.async.ca.shared.global [ %r76 + 0 ], [ %rd468 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4214868Z // end inline asm 2026-02-21T09:21:00.4214933Z cp.async.commit_group; 2026-02-21T09:21:00.4215128Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4215193Z add.s32 %r14613, %r1812, %r77; 2026-02-21T09:21:00.4215394Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4215458Z cvt.s64.s32 %rd516, %r14613; 2026-02-21T09:21:00.4215523Z add.s64 %rd469, %rd45, %rd516; 2026-02-21T09:21:00.4215728Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4215792Z // begin inline asm 2026-02-21T09:21:00.4215925Z cp.async.ca.shared.global [ %r78 + 0 ], [ %rd469 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4215986Z // end inline asm 2026-02-21T09:21:00.4216053Z cp.async.commit_group; 2026-02-21T09:21:00.4216248Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4216314Z add.s64 %rd470, %rd500, 160; 2026-02-21T09:21:00.4216376Z add.s64 %rd471, %rd504, 160; 2026-02-21T09:21:00.4216438Z add.s64 %rd472, %rd508, 160; 2026-02-21T09:21:00.4216612Z add.s64 %rd473, %rd512, 160; 2026-02-21T09:21:00.4216829Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4216890Z // begin inline asm 2026-02-21T09:21:00.4217022Z cp.async.ca.shared.global [ %r79 + 0 ], [ %rd470 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4217163Z // end inline asm 2026-02-21T09:21:00.4217222Z // begin inline asm 2026-02-21T09:21:00.4217351Z cp.async.ca.shared.global [ %r80 + 0 ], [ 
%rd471 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4217407Z // end inline asm 2026-02-21T09:21:00.4217469Z // begin inline asm 2026-02-21T09:21:00.4217660Z cp.async.ca.shared.global [ %r81 + 0 ], [ %rd472 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4217716Z // end inline asm 2026-02-21T09:21:00.4217787Z // begin inline asm 2026-02-21T09:21:00.4217921Z cp.async.ca.shared.global [ %r82 + 0 ], [ %rd473 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4217978Z // end inline asm 2026-02-21T09:21:00.4218045Z cp.async.commit_group; 2026-02-21T09:21:00.4218240Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4218365Z add.s32 %r14614, %r1812, %r83; 2026-02-21T09:21:00.4218561Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4218628Z cvt.s64.s32 %rd517, %r14614; 2026-02-21T09:21:00.4218690Z add.s64 %rd474, %rd45, %rd517; 2026-02-21T09:21:00.4218884Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4218951Z // begin inline asm 2026-02-21T09:21:00.4219093Z cp.async.ca.shared.global [ %r84 + 0 ], [ %rd474 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4219149Z // end inline asm 2026-02-21T09:21:00.4219217Z cp.async.commit_group; 2026-02-21T09:21:00.4219474Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4219540Z add.s64 %rd475, %rd500, 192; 2026-02-21T09:21:00.4219601Z add.s64 %rd476, %rd504, 192; 2026-02-21T09:21:00.4219663Z add.s64 %rd477, %rd508, 192; 2026-02-21T09:21:00.4219724Z add.s64 %rd478, %rd512, 192; 2026-02-21T09:21:00.4219919Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4219982Z bar.sync 0; 2026-02-21T09:21:00.4220044Z // begin inline asm 2026-02-21T09:21:00.4220176Z cp.async.ca.shared.global [ %r85 + 0 ], [ %rd475 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4220236Z // end inline asm 2026-02-21T09:21:00.4220294Z // begin inline asm 
2026-02-21T09:21:00.4220425Z cp.async.ca.shared.global [ %r86 + 0 ], [ %rd476 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4220481Z // end inline asm 2026-02-21T09:21:00.4220541Z // begin inline asm 2026-02-21T09:21:00.4220668Z cp.async.ca.shared.global [ %r87 + 0 ], [ %rd477 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4220724Z // end inline asm 2026-02-21T09:21:00.4220788Z // begin inline asm 2026-02-21T09:21:00.4220920Z cp.async.ca.shared.global [ %r88 + 0 ], [ %rd478 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4220975Z // end inline asm 2026-02-21T09:21:00.4221040Z cp.async.commit_group; 2026-02-21T09:21:00.4221237Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4221310Z add.s32 %r14615, %r1812, %r89; 2026-02-21T09:21:00.4221509Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4221576Z cvt.s64.s32 %rd518, %r14615; 2026-02-21T09:21:00.4221639Z add.s64 %rd479, %rd45, %rd518; 2026-02-21T09:21:00.4221837Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4221899Z // begin inline asm 2026-02-21T09:21:00.4222031Z cp.async.ca.shared.global [ %r90 + 0 ], [ %rd479 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4222089Z // end inline asm 2026-02-21T09:21:00.4222154Z cp.async.commit_group; 2026-02-21T09:21:00.4222360Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4222422Z add.s64 %rd480, %rd500, 224; 2026-02-21T09:21:00.4222481Z add.s64 %rd481, %rd504, 224; 2026-02-21T09:21:00.4222544Z add.s64 %rd482, %rd508, 224; 2026-02-21T09:21:00.4222604Z add.s64 %rd483, %rd512, 224; 2026-02-21T09:21:00.4222859Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4222920Z // begin inline asm 2026-02-21T09:21:00.4223052Z cp.async.ca.shared.global [ %r91 + 0 ], [ %rd480 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4223156Z // end inline asm 
2026-02-21T09:21:00.4223215Z // begin inline asm 2026-02-21T09:21:00.4223347Z cp.async.ca.shared.global [ %r92 + 0 ], [ %rd481 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4223402Z // end inline asm 2026-02-21T09:21:00.4223472Z // begin inline asm 2026-02-21T09:21:00.4223610Z cp.async.ca.shared.global [ %r93 + 0 ], [ %rd482 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4223667Z // end inline asm 2026-02-21T09:21:00.4223725Z // begin inline asm 2026-02-21T09:21:00.4223904Z cp.async.ca.shared.global [ %r94 + 0 ], [ %rd483 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4223965Z // end inline asm 2026-02-21T09:21:00.4224030Z cp.async.commit_group; 2026-02-21T09:21:00.4224226Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4224295Z add.s32 %r14616, %r1812, %r95; 2026-02-21T09:21:00.4224489Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4224554Z cvt.s64.s32 %rd519, %r14616; 2026-02-21T09:21:00.4224617Z add.s64 %rd484, %rd45, %rd519; 2026-02-21T09:21:00.4224810Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4224868Z // begin inline asm 2026-02-21T09:21:00.4225049Z cp.async.ca.shared.global [ %r96 + 0 ], [ %rd484 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4225108Z // end inline asm 2026-02-21T09:21:00.4225172Z cp.async.commit_group; 2026-02-21T09:21:00.4225369Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4225433Z add.s64 %rd485, %rd500, 256; 2026-02-21T09:21:00.4225492Z add.s64 %rd486, %rd504, 256; 2026-02-21T09:21:00.4225553Z add.s64 %rd487, %rd508, 256; 2026-02-21T09:21:00.4225615Z add.s64 %rd488, %rd512, 256; 2026-02-21T09:21:00.4225810Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4225869Z bar.sync 0; 2026-02-21T09:21:00.4225926Z // begin inline asm 2026-02-21T09:21:00.4226059Z cp.async.ca.shared.global [ %r97 + 0 ], [ %rd485 + 
0 ], 0x8, %r14260; 2026-02-21T09:21:00.4226117Z // end inline asm 2026-02-21T09:21:00.4226174Z // begin inline asm 2026-02-21T09:21:00.4226309Z cp.async.ca.shared.global [ %r98 + 0 ], [ %rd486 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4226367Z // end inline asm 2026-02-21T09:21:00.4226425Z // begin inline asm 2026-02-21T09:21:00.4226687Z cp.async.ca.shared.global [ %r99 + 0 ], [ %rd487 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4226748Z // end inline asm 2026-02-21T09:21:00.4226806Z // begin inline asm 2026-02-21T09:21:00.4226942Z cp.async.ca.shared.global [ %r100 + 0 ], [ %rd488 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4227003Z // end inline asm 2026-02-21T09:21:00.4227067Z cp.async.commit_group; 2026-02-21T09:21:00.4227262Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4227333Z add.s32 %r14617, %r1812, %r101; 2026-02-21T09:21:00.4227528Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4227590Z cvt.s64.s32 %rd520, %r14617; 2026-02-21T09:21:00.4227653Z add.s64 %rd489, %rd45, %rd520; 2026-02-21T09:21:00.4227847Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4227907Z // begin inline asm 2026-02-21T09:21:00.4228045Z cp.async.ca.shared.global [ %r102 + 0 ], [ %rd489 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4228104Z // end inline asm 2026-02-21T09:21:00.4228168Z cp.async.commit_group; 2026-02-21T09:21:00.4228419Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4228574Z add.s64 %rd490, %rd500, 288; 2026-02-21T09:21:00.4228636Z add.s64 %rd491, %rd504, 288; 2026-02-21T09:21:00.4228695Z add.s64 %rd492, %rd508, 288; 2026-02-21T09:21:00.4228819Z add.s64 %rd493, %rd512, 288; 2026-02-21T09:21:00.4229014Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4229072Z // begin inline asm 2026-02-21T09:21:00.4229208Z 
cp.async.ca.shared.global [ %r103 + 0 ], [ %rd490 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4229269Z // end inline asm 2026-02-21T09:21:00.4229327Z // begin inline asm 2026-02-21T09:21:00.4229459Z cp.async.ca.shared.global [ %r104 + 0 ], [ %rd491 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4229583Z // end inline asm 2026-02-21T09:21:00.4229657Z // begin inline asm 2026-02-21T09:21:00.4229793Z cp.async.ca.shared.global [ %r105 + 0 ], [ %rd492 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4229852Z // end inline asm 2026-02-21T09:21:00.4229913Z // begin inline asm 2026-02-21T09:21:00.4230042Z cp.async.ca.shared.global [ %r106 + 0 ], [ %rd493 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4230098Z // end inline asm 2026-02-21T09:21:00.4230167Z cp.async.commit_group; 2026-02-21T09:21:00.4230364Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4230428Z add.s32 %r14618, %r1812, %r107; 2026-02-21T09:21:00.4230626Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4230749Z cvt.s64.s32 %rd521, %r14618; 2026-02-21T09:21:00.4230815Z add.s64 %rd494, %rd45, %rd521; 2026-02-21T09:21:00.4231010Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4231072Z // begin inline asm 2026-02-21T09:21:00.4231205Z cp.async.ca.shared.global [ %r108 + 0 ], [ %rd494 + 0 ], 0x8, %r14260; 2026-02-21T09:21:00.4231263Z // end inline asm 2026-02-21T09:21:00.4231330Z cp.async.commit_group; 2026-02-21T09:21:00.4231525Z .loc 1 43 78 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:43:78 2026-02-21T09:21:00.4231592Z add.s32 %r14619, %r14591, %r237; 2026-02-21T09:21:00.4231660Z sub.s32 %r14620, %r14619, %r14593; 2026-02-21T09:21:00.4231722Z shl.b32 %r14621, %r14620, 8; 2026-02-21T09:21:00.4231785Z add.s32 %r23039, %r133, %r14621; 2026-02-21T09:21:00.4231846Z or.b32 %r14622, %r8, %r1813; 2026-02-21T09:21:00.4231912Z shl.b32 %r14623, %r14622, 10; 
2026-02-21T09:21:00.4231981Z mul.wide.s32 %rd29, %r14623, 2; 2026-02-21T09:21:00.4232044Z or.b32 %r14624, %r7, %r1813; 2026-02-21T09:21:00.4232107Z shl.b32 %r14625, %r14624, 10; 2026-02-21T09:21:00.4232174Z mul.wide.s32 %rd30, %r14625, 2; 2026-02-21T09:21:00.4232234Z shl.b32 %r14626, %r14592, 18; 2026-02-21T09:21:00.4232296Z or.b32 %r14627, %r22252, %r14626; 2026-02-21T09:21:00.4232364Z mul.wide.s32 %rd31, %r14627, 2; 2026-02-21T09:21:00.4232430Z or.b32 %r23038, %r137, %r14626; 2026-02-21T09:21:00.4232490Z mov.b32 %r23042, 0f00000000; 2026-02-21T09:21:00.4232548Z mov.b32 %r23041, 4; 2026-02-21T09:21:00.4232616Z mov.b32 %r23040, -1; 2026-02-21T09:21:00.4232676Z mov.b64 %rd721, -16; 2026-02-21T09:21:00.4232751Z mov.b64 %rd720, %rd3; 2026-02-21T09:21:00.4232815Z mov.b32 %r23043, %r23042; 2026-02-21T09:21:00.4232875Z mov.b32 %r23044, %r23042; 2026-02-21T09:21:00.4232933Z mov.b32 %r23045, %r23042; 2026-02-21T09:21:00.4232993Z mov.b32 %r23046, %r23042; 2026-02-21T09:21:00.4233053Z mov.b32 %r23047, %r23042; 2026-02-21T09:21:00.4233110Z mov.b32 %r23048, %r23042; 2026-02-21T09:21:00.4233169Z mov.b32 %r23049, %r23042; 2026-02-21T09:21:00.4233227Z mov.b32 %r23050, %r23042; 2026-02-21T09:21:00.4233286Z mov.b32 %r23051, %r23042; 2026-02-21T09:21:00.4233344Z mov.b32 %r23052, %r23042; 2026-02-21T09:21:00.4233405Z mov.b32 %r23053, %r23042; 2026-02-21T09:21:00.4233464Z mov.b32 %r23054, %r23042; 2026-02-21T09:21:00.4233583Z mov.b32 %r23055, %r23042; 2026-02-21T09:21:00.4233644Z mov.b32 %r23056, %r23042; 2026-02-21T09:21:00.4233701Z mov.b32 %r23057, %r23042; 2026-02-21T09:21:00.4233759Z mov.b32 %r23058, %r23042; 2026-02-21T09:21:00.4233817Z mov.b32 %r23059, %r23042; 2026-02-21T09:21:00.4233943Z mov.b32 %r23060, %r23042; 2026-02-21T09:21:00.4234002Z mov.b32 %r23061, %r23042; 2026-02-21T09:21:00.4234059Z mov.b32 %r23062, %r23042; 2026-02-21T09:21:00.4234119Z mov.b32 %r23063, %r23042; 2026-02-21T09:21:00.4234176Z mov.b32 %r23064, %r23042; 2026-02-21T09:21:00.4234234Z mov.b32 %r23065, 
%r23042; 2026-02-21T09:21:00.4234292Z mov.b32 %r23066, %r23042; 2026-02-21T09:21:00.4234352Z mov.b32 %r23067, %r23042; 2026-02-21T09:21:00.4234408Z mov.b32 %r23068, %r23042; 2026-02-21T09:21:00.4234518Z mov.b32 %r23069, %r23042; 2026-02-21T09:21:00.4234580Z mov.b32 %r23070, %r23042; 2026-02-21T09:21:00.4234638Z mov.b32 %r23071, %r23042; 2026-02-21T09:21:00.4234695Z mov.b32 %r23072, %r23042; 2026-02-21T09:21:00.4234755Z mov.b32 %r23073, %r23042; 2026-02-21T09:21:00.4234814Z mov.b32 %r23074, %r23042; 2026-02-21T09:21:00.4234884Z mov.b32 %r23075, %r23042; 2026-02-21T09:21:00.4234944Z mov.b32 %r23076, %r23042; 2026-02-21T09:21:00.4235005Z mov.b32 %r23077, %r23042; 2026-02-21T09:21:00.4235064Z mov.b32 %r23078, %r23042; 2026-02-21T09:21:00.4235123Z mov.b32 %r23079, %r23042; 2026-02-21T09:21:00.4235179Z mov.b32 %r23080, %r23042; 2026-02-21T09:21:00.4235239Z mov.b32 %r23081, %r23042; 2026-02-21T09:21:00.4235296Z mov.b32 %r23082, %r23042; 2026-02-21T09:21:00.4235402Z mov.b32 %r23083, %r23042; 2026-02-21T09:21:00.4235465Z mov.b32 %r23084, %r23042; 2026-02-21T09:21:00.4235523Z mov.b32 %r23085, %r23042; 2026-02-21T09:21:00.4235579Z mov.b32 %r23086, %r23042; 2026-02-21T09:21:00.4235639Z mov.b32 %r23087, %r23042; 2026-02-21T09:21:00.4235701Z mov.b32 %r23088, %r23042; 2026-02-21T09:21:00.4235759Z mov.b32 %r23089, %r23042; 2026-02-21T09:21:00.4235817Z mov.b32 %r23090, %r23042; 2026-02-21T09:21:00.4235878Z mov.b32 %r23091, %r23042; 2026-02-21T09:21:00.4235936Z mov.b32 %r23092, %r23042; 2026-02-21T09:21:00.4235994Z mov.b32 %r23093, %r23042; 2026-02-21T09:21:00.4236055Z mov.b32 %r23094, %r23042; 2026-02-21T09:21:00.4236113Z mov.b32 %r23095, %r23042; 2026-02-21T09:21:00.4236173Z mov.b32 %r23096, %r23042; 2026-02-21T09:21:00.4236231Z mov.b32 %r23097, %r23042; 2026-02-21T09:21:00.4236292Z mov.b32 %r23098, %r23042; 2026-02-21T09:21:00.4236351Z mov.b32 %r23099, %r23042; 2026-02-21T09:21:00.4236421Z mov.b32 %r23100, %r23042; 2026-02-21T09:21:00.4236611Z mov.b32 %r23101, %r23042; 
2026-02-21T09:21:00.4236675Z mov.b32 %r23102, %r23042; 2026-02-21T09:21:00.4236732Z mov.b32 %r23103, %r23042; 2026-02-21T09:21:00.4236789Z mov.b32 %r23104, %r23042; 2026-02-21T09:21:00.4236851Z mov.b32 %r23105, %r23042; 2026-02-21T09:21:00.4236908Z mov.b32 %r23106, %r23042; 2026-02-21T09:21:00.4236965Z mov.b32 %r23107, %r23042; 2026-02-21T09:21:00.4237024Z mov.b32 %r23108, %r23042; 2026-02-21T09:21:00.4237085Z mov.b32 %r23109, %r23042; 2026-02-21T09:21:00.4237142Z mov.b32 %r23110, %r23042; 2026-02-21T09:21:00.4237200Z mov.b32 %r23111, %r23042; 2026-02-21T09:21:00.4237263Z mov.b32 %r23112, %r23042; 2026-02-21T09:21:00.4237321Z mov.b32 %r23113, %r23042; 2026-02-21T09:21:00.4237381Z mov.b32 %r23114, %r23042; 2026-02-21T09:21:00.4237442Z mov.b32 %r23115, %r23042; 2026-02-21T09:21:00.4237500Z mov.b32 %r23116, %r23042; 2026-02-21T09:21:00.4237558Z mov.b32 %r23117, %r23042; 2026-02-21T09:21:00.4237615Z mov.b32 %r23118, %r23042; 2026-02-21T09:21:00.4237678Z mov.b32 %r23119, %r23042; 2026-02-21T09:21:00.4237741Z mov.b32 %r23120, %r23042; 2026-02-21T09:21:00.4237798Z mov.b32 %r23121, %r23042; 2026-02-21T09:21:00.4237861Z mov.b32 %r23122, %r23042; 2026-02-21T09:21:00.4237921Z mov.b32 %r23123, %r23042; 2026-02-21T09:21:00.4237979Z mov.b32 %r23124, %r23042; 2026-02-21T09:21:00.4238038Z mov.b32 %r23125, %r23042; 2026-02-21T09:21:00.4238097Z mov.b32 %r23126, %r23042; 2026-02-21T09:21:00.4238242Z mov.b32 %r23127, %r23042; 2026-02-21T09:21:00.4238298Z mov.b32 %r23128, %r23042; 2026-02-21T09:21:00.4238359Z mov.b32 %r23129, %r23042; 2026-02-21T09:21:00.4238417Z mov.b32 %r23130, %r23042; 2026-02-21T09:21:00.4238485Z mov.b32 %r23131, %r23042; 2026-02-21T09:21:00.4238609Z mov.b32 %r23132, %r23042; 2026-02-21T09:21:00.4238670Z mov.b32 %r23133, %r23042; 2026-02-21T09:21:00.4238727Z mov.b32 %r23134, %r23042; 2026-02-21T09:21:00.4238784Z mov.b32 %r23135, %r23042; 2026-02-21T09:21:00.4238844Z mov.b32 %r23136, %r23042; 2026-02-21T09:21:00.4238903Z mov.b32 %r23137, %r23042; 
2026-02-21T09:21:00.4238960Z mov.b32 %r23138, %r23042; 2026-02-21T09:21:00.4239019Z mov.b32 %r23139, %r23042; 2026-02-21T09:21:00.4239077Z mov.b32 %r23140, %r23042; 2026-02-21T09:21:00.4239197Z mov.b32 %r23141, %r23042; 2026-02-21T09:21:00.4239258Z mov.b32 %r23142, %r23042; 2026-02-21T09:21:00.4239317Z mov.b32 %r23143, %r23042; 2026-02-21T09:21:00.4239374Z mov.b32 %r23144, %r23042; 2026-02-21T09:21:00.4239434Z mov.b32 %r23145, %r23042; 2026-02-21T09:21:00.4239494Z mov.b32 %r23146, %r23042; 2026-02-21T09:21:00.4239550Z mov.b32 %r23147, %r23042; 2026-02-21T09:21:00.4239607Z mov.b32 %r23148, %r23042; 2026-02-21T09:21:00.4239665Z mov.b32 %r23149, %r23042; 2026-02-21T09:21:00.4239728Z mov.b32 %r23150, %r23042; 2026-02-21T09:21:00.4239785Z mov.b32 %r23151, %r23042; 2026-02-21T09:21:00.4239844Z mov.b32 %r23152, %r23042; 2026-02-21T09:21:00.4239904Z mov.b32 %r23153, %r23042; 2026-02-21T09:21:00.4239962Z mov.b32 %r23154, %r23042; 2026-02-21T09:21:00.4240085Z mov.b32 %r23155, %r23042; 2026-02-21T09:21:00.4240158Z mov.b32 %r23156, %r23042; 2026-02-21T09:21:00.4240220Z mov.b32 %r23157, %r23042; 2026-02-21T09:21:00.4240278Z mov.b32 %r23158, %r23042; 2026-02-21T09:21:00.4240337Z mov.b32 %r23159, %r23042; 2026-02-21T09:21:00.4240397Z mov.b32 %r23160, %r23042; 2026-02-21T09:21:00.4240454Z mov.b32 %r23161, %r23042; 2026-02-21T09:21:00.4240509Z mov.b32 %r23162, %r23042; 2026-02-21T09:21:00.4240569Z mov.b32 %r23163, %r23042; 2026-02-21T09:21:00.4240629Z mov.b32 %r23164, %r23042; 2026-02-21T09:21:00.4240688Z mov.b32 %r23165, %r23042; 2026-02-21T09:21:00.4240745Z mov.b32 %r23166, %r23042; 2026-02-21T09:21:00.4240805Z mov.b32 %r23167, %r23042; 2026-02-21T09:21:00.4240864Z mov.b32 %r23168, %r23042; 2026-02-21T09:21:00.4240922Z mov.b32 %r23169, %r23042; 2026-02-21T09:21:00.4240980Z mov.b32 %r23170, %r23042; 2026-02-21T09:21:00.4241040Z mov.b32 %r23171, %r23042; 2026-02-21T09:21:00.4241098Z mov.b32 %r23172, %r23042; 2026-02-21T09:21:00.4241157Z mov.b32 %r23173, %r23042; 
2026-02-21T09:21:00.4241217Z mov.b32 %r23174, %r23042; 2026-02-21T09:21:00.4241274Z mov.b32 %r23175, %r23042; 2026-02-21T09:21:00.4241331Z mov.b32 %r23176, %r23042; 2026-02-21T09:21:00.4241390Z mov.b32 %r23177, %r23042; 2026-02-21T09:21:00.4241452Z mov.b32 %r23178, %r23042; 2026-02-21T09:21:00.4241510Z mov.b32 %r23179, %r23042; 2026-02-21T09:21:00.4241568Z mov.b32 %r23180, %r23042; 2026-02-21T09:21:00.4241630Z mov.b32 %r23181, %r23042; 2026-02-21T09:21:00.4241687Z mov.b32 %r23182, %r23042; 2026-02-21T09:21:00.4241744Z mov.b32 %r23183, %r23042; 2026-02-21T09:21:00.4241808Z mov.b32 %r23184, %r23042; 2026-02-21T09:21:00.4241875Z mov.b32 %r23185, %r23042; 2026-02-21T09:21:00.4241935Z mov.b32 %r23186, %r23042; 2026-02-21T09:21:00.4241995Z mov.b32 %r23187, %r23042; 2026-02-21T09:21:00.4242056Z mov.b32 %r23188, %r23042; 2026-02-21T09:21:00.4242114Z mov.b32 %r23189, %r23042; 2026-02-21T09:21:00.4242171Z mov.b32 %r23190, %r23042; 2026-02-21T09:21:00.4242239Z mov.b32 %r23191, %r23042; 2026-02-21T09:21:00.4242296Z mov.b32 %r23192, %r23042; 2026-02-21T09:21:00.4242352Z mov.b32 %r23193, %r23042; 2026-02-21T09:21:00.4242411Z mov.b32 %r23194, %r23042; 2026-02-21T09:21:00.4242476Z mov.b32 %r23195, %r23042; 2026-02-21T09:21:00.4242534Z mov.b32 %r23196, %r23042; 2026-02-21T09:21:00.4242591Z mov.b32 %r23197, %r23042; 2026-02-21T09:21:00.4242652Z mov.b32 %r23198, %r23042; 2026-02-21T09:21:00.4242773Z mov.b32 %r23199, %r23042; 2026-02-21T09:21:00.4242830Z mov.b32 %r23200, %r23042; 2026-02-21T09:21:00.4242889Z mov.b32 %r23201, %r23042; 2026-02-21T09:21:00.4242950Z mov.b32 %r23202, %r23042; 2026-02-21T09:21:00.4243008Z mov.b32 %r23203, %r23042; 2026-02-21T09:21:00.4243113Z mov.b32 %r23204, %r23042; 2026-02-21T09:21:00.4243174Z mov.b32 %r23205, %r23042; 2026-02-21T09:21:00.4243231Z mov.b32 %r23206, %r23042; 2026-02-21T09:21:00.4243288Z mov.b32 %r23207, %r23042; 2026-02-21T09:21:00.4243346Z mov.b32 %r23208, %r23042; 2026-02-21T09:21:00.4243407Z mov.b32 %r23209, %r23042; 
2026-02-21T09:21:00.4243465Z mov.b32 %r23210, %r23042; 2026-02-21T09:21:00.4243522Z mov.b32 %r23211, %r23042; 2026-02-21T09:21:00.4243584Z mov.b32 %r23212, %r23042; 2026-02-21T09:21:00.4243689Z mov.b32 %r23213, %r23042; 2026-02-21T09:21:00.4243749Z mov.b32 %r23214, %r23042; 2026-02-21T09:21:00.4243806Z mov.b32 %r23215, %r23042; 2026-02-21T09:21:00.4243869Z mov.b32 %r23216, %r23042; 2026-02-21T09:21:00.4243928Z mov.b32 %r23217, %r23042; 2026-02-21T09:21:00.4243986Z mov.b32 %r23218, %r23042; 2026-02-21T09:21:00.4244048Z mov.b32 %r23219, %r23042; 2026-02-21T09:21:00.4244106Z mov.b32 %r23220, %r23042; 2026-02-21T09:21:00.4244162Z mov.b32 %r23221, %r23042; 2026-02-21T09:21:00.4244222Z mov.b32 %r23222, %r23042; 2026-02-21T09:21:00.4244283Z mov.b32 %r23223, %r23042; 2026-02-21T09:21:00.4244340Z mov.b32 %r23224, %r23042; 2026-02-21T09:21:00.4244396Z mov.b32 %r23225, %r23042; 2026-02-21T09:21:00.4244455Z mov.b32 %r23226, %r23042; 2026-02-21T09:21:00.4244561Z mov.b32 %r23227, %r23042; 2026-02-21T09:21:00.4244622Z mov.b32 %r23228, %r23042; 2026-02-21T09:21:00.4244683Z mov.b32 %r23229, %r23042; 2026-02-21T09:21:00.4244740Z mov.b32 %r23230, %r23042; 2026-02-21T09:21:00.4244799Z mov.b32 %r23231, %r23042; 2026-02-21T09:21:00.4244856Z mov.b32 %r23232, %r23042; 2026-02-21T09:21:00.4244918Z mov.b32 %r23233, %r23042; 2026-02-21T09:21:00.4244974Z mov.b32 %r23234, %r23042; 2026-02-21T09:21:00.4245033Z mov.b32 %r23235, %r23042; 2026-02-21T09:21:00.4245093Z mov.b32 %r23236, %r23042; 2026-02-21T09:21:00.4245150Z mov.b32 %r23237, %r23042; 2026-02-21T09:21:00.4245207Z mov.b32 %r23238, %r23042; 2026-02-21T09:21:00.4245264Z mov.b32 %r23239, %r23042; 2026-02-21T09:21:00.4245325Z mov.b32 %r23240, %r23042; 2026-02-21T09:21:00.4245383Z mov.b32 %r23241, %r23042; 2026-02-21T09:21:00.4245441Z mov.b32 %r23242, %r23042; 2026-02-21T09:21:00.4245501Z mov.b32 %r23243, %r23042; 2026-02-21T09:21:00.4245559Z mov.b32 %r23244, %r23042; 2026-02-21T09:21:00.4245617Z mov.b32 %r23245, %r23042; 
2026-02-21T09:21:00.4245676Z mov.b32 %r23246, %r23042; 2026-02-21T09:21:00.4245735Z mov.b32 %r23247, %r23042; 2026-02-21T09:21:00.4245792Z mov.b32 %r23248, %r23042; 2026-02-21T09:21:00.4245851Z mov.b32 %r23249, %r23042; 2026-02-21T09:21:00.4245910Z mov.b32 %r23250, %r23042; 2026-02-21T09:21:00.4245967Z mov.b32 %r23251, %r23042; 2026-02-21T09:21:00.4246024Z mov.b32 %r23252, %r23042; 2026-02-21T09:21:00.4246083Z mov.b32 %r23253, %r23042; 2026-02-21T09:21:00.4246142Z mov.b32 %r23254, %r23042; 2026-02-21T09:21:00.4246200Z mov.b32 %r23255, %r23042; 2026-02-21T09:21:00.4246256Z mov.b32 %r23256, %r23042; 2026-02-21T09:21:00.4246317Z mov.b32 %r23257, %r23042; 2026-02-21T09:21:00.4246375Z mov.b32 %r23258, %r23042; 2026-02-21T09:21:00.4246433Z mov.b32 %r23259, %r23042; 2026-02-21T09:21:00.4246615Z mov.b32 %r23260, %r23042; 2026-02-21T09:21:00.4246680Z mov.b32 %r23261, %r23042; 2026-02-21T09:21:00.4246739Z mov.b32 %r23262, %r23042; 2026-02-21T09:21:00.4246799Z mov.b32 %r23263, %r23042; 2026-02-21T09:21:00.4246860Z mov.b32 %r23264, %r23042; 2026-02-21T09:21:00.4246918Z mov.b32 %r23265, %r23042; 2026-02-21T09:21:00.4246974Z mov.b32 %r23266, %r23042; 2026-02-21T09:21:00.4247033Z mov.b32 %r23267, %r23042; 2026-02-21T09:21:00.4247093Z mov.b32 %r23268, %r23042; 2026-02-21T09:21:00.4247150Z mov.b32 %r23269, %r23042; 2026-02-21T09:21:00.4247208Z mov.b32 %r23270, %r23042; 2026-02-21T09:21:00.4247351Z mov.b32 %r23271, %r23042; 2026-02-21T09:21:00.4247409Z mov.b32 %r23272, %r23042; 2026-02-21T09:21:00.4247467Z mov.b32 %r23273, %r23042; 2026-02-21T09:21:00.4247524Z mov.b32 %r23274, %r23042; 2026-02-21T09:21:00.4247584Z mov.b32 %r23275, %r23042; 2026-02-21T09:21:00.4247704Z mov.b32 %r23276, %r23042; 2026-02-21T09:21:00.4247763Z mov.b32 %r23277, %r23042; 2026-02-21T09:21:00.4247823Z mov.b32 %r23278, %r23042; 2026-02-21T09:21:00.4247881Z mov.b32 %r23279, %r23042; 2026-02-21T09:21:00.4247938Z mov.b32 %r23280, %r23042; 2026-02-21T09:21:00.4248000Z mov.b32 %r23281, %r23042; 
2026-02-21T09:21:00.4248058Z mov.b32 %r23282, %r23042; 2026-02-21T09:21:00.4248115Z mov.b32 %r23283, %r23042; 2026-02-21T09:21:00.4248173Z mov.b32 %r23284, %r23042; 2026-02-21T09:21:00.4248293Z mov.b32 %r23285, %r23042; 2026-02-21T09:21:00.4248356Z mov.b32 %r23286, %r23042; 2026-02-21T09:21:00.4248414Z mov.b32 %r23287, %r23042; 2026-02-21T09:21:00.4248474Z mov.b32 %r23288, %r23042; 2026-02-21T09:21:00.4248535Z mov.b32 %r23289, %r23042; 2026-02-21T09:21:00.4248592Z mov.b32 %r23290, %r23042; 2026-02-21T09:21:00.4248651Z mov.b32 %r23291, %r23042; 2026-02-21T09:21:00.4248710Z mov.b32 %r23292, %r23042; 2026-02-21T09:21:00.4248768Z mov.b32 %r23293, %r23042; 2026-02-21T09:21:00.4248827Z mov.b32 %r23294, %r23042; 2026-02-21T09:21:00.4248888Z mov.b32 %r23295, %r23042; 2026-02-21T09:21:00.4248946Z mov.b32 %r23296, %r23042; 2026-02-21T09:21:00.4249003Z mov.b32 %r23297, %r23042; 2026-02-21T09:21:00.4249186Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T09:21:00.4249316Z // => This Inner Loop Header: Depth=2 2026-02-21T09:21:00.4249381Z add.s64 %rd721, %rd721, 16; 2026-02-21T09:21:00.4249452Z setp.lt.u64 %p46, %rd721, 432; 2026-02-21T09:21:00.4249517Z add.s32 %r17764, %r23040, 1; 2026-02-21T09:21:00.4249582Z setp.gt.s32 %p47, %r17764, 4; 2026-02-21T09:21:00.4249653Z selp.b32 %r23040, 0, %r17764, %p47; 2026-02-21T09:21:00.4249865Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4249936Z cp.async.wait_group 16; 2026-02-21T09:21:00.4249993Z bar.sync 0; 2026-02-21T09:21:00.4250055Z shl.b32 %r17765, %r23040, 13; 2026-02-21T09:21:00.4250123Z add.s32 %r17767, %r22237, %r17765; 2026-02-21T09:21:00.4250322Z .loc 1 55 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:55:32 2026-02-21T09:21:00.4250386Z add.s32 %r17768, %r17767, %r109; 2026-02-21T09:21:00.4250458Z ld.shared.b16 %rs337, [%r17768]; 2026-02-21T09:21:00.4250528Z ld.shared.b16 %rs338, [%r17768+256]; 2026-02-21T09:21:00.4250596Z ld.shared.b16 %rs339, 
[%r17768+16]; 2026-02-21T09:21:00.4250668Z ld.shared.b16 %rs340, [%r17768+272]; 2026-02-21T09:21:00.4250748Z ld.shared.b16 %rs341, [%r17768+4096]; 2026-02-21T09:21:00.4250817Z ld.shared.b16 %rs342, [%r17768+4352]; 2026-02-21T09:21:00.4250883Z ld.shared.b16 %rs343, [%r17768+4112]; 2026-02-21T09:21:00.4250956Z ld.shared.b16 %rs344, [%r17768+4368]; 2026-02-21T09:21:00.4251016Z add.s32 %r17769, %r17767, %r110; 2026-02-21T09:21:00.4251080Z ld.shared.b16 %rs345, [%r17769]; 2026-02-21T09:21:00.4251148Z ld.shared.b16 %rs346, [%r17769+256]; 2026-02-21T09:21:00.4251217Z ld.shared.b16 %rs347, [%r17769+16]; 2026-02-21T09:21:00.4251286Z ld.shared.b16 %rs348, [%r17769+272]; 2026-02-21T09:21:00.4251354Z ld.shared.b16 %rs349, [%r17769+4096]; 2026-02-21T09:21:00.4251422Z ld.shared.b16 %rs350, [%r17769+4352]; 2026-02-21T09:21:00.4251490Z ld.shared.b16 %rs351, [%r17769+4112]; 2026-02-21T09:21:00.4251556Z ld.shared.b16 %rs352, [%r17769+4368]; 2026-02-21T09:21:00.4251621Z cvt.f32.bf16 %r14884, %rs337; 2026-02-21T09:21:00.4251683Z cvt.f32.bf16 %r14885, %rs338; 2026-02-21T09:21:00.4251745Z cvt.f32.bf16 %r14886, %rs345; 2026-02-21T09:21:00.4251805Z cvt.f32.bf16 %r14887, %rs346; 2026-02-21T09:21:00.4251868Z cvt.f32.bf16 %r15144, %rs339; 2026-02-21T09:21:00.4252012Z cvt.f32.bf16 %r15145, %rs340; 2026-02-21T09:21:00.4252072Z cvt.f32.bf16 %r15146, %rs347; 2026-02-21T09:21:00.4252135Z cvt.f32.bf16 %r15147, %rs348; 2026-02-21T09:21:00.4252194Z cvt.f32.bf16 %r15404, %rs341; 2026-02-21T09:21:00.4252253Z cvt.f32.bf16 %r15405, %rs342; 2026-02-21T09:21:00.4252363Z cvt.f32.bf16 %r15406, %rs349; 2026-02-21T09:21:00.4252421Z cvt.f32.bf16 %r15407, %rs350; 2026-02-21T09:21:00.4252481Z cvt.f32.bf16 %r15664, %rs343; 2026-02-21T09:21:00.4252539Z cvt.f32.bf16 %r15665, %rs344; 2026-02-21T09:21:00.4252602Z cvt.f32.bf16 %r15666, %rs351; 2026-02-21T09:21:00.4252663Z cvt.f32.bf16 %r15667, %rs352; 2026-02-21T09:21:00.4252864Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 
2026-02-21T09:21:00.4252973Z shl.b32 %r17770, %r23040, 11; 2026-02-21T09:21:00.4253040Z add.s32 %r17771, %r22237, %r17770; 2026-02-21T09:21:00.4253103Z add.s32 %r17772, %r17771, 98304; 2026-02-21T09:21:00.4253300Z .loc 1 70 45 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:70:45 2026-02-21T09:21:00.4253366Z add.s32 %r17773, %r17772, %r22239; 2026-02-21T09:21:00.4253427Z add.s32 %r17774, %r17772, %r22243; 2026-02-21T09:21:00.4253500Z add.s32 %r17775, %r17772, %r22244; 2026-02-21T09:21:00.4253705Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4253772Z ld.shared.s8 %rs353, [%r17773]; 2026-02-21T09:21:00.4253966Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4254080Z shl.b16 %rs354, %rs353, 4; 2026-02-21T09:21:00.4254277Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4254347Z ld.shared.s8 %rs355, [%r17773+256]; 2026-02-21T09:21:00.4254543Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4254607Z shl.b16 %rs356, %rs355, 4; 2026-02-21T09:21:00.4254801Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4254870Z ld.shared.s8 %rs357, [%r17773+512]; 2026-02-21T09:21:00.4255068Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4255132Z shl.b16 %rs358, %rs357, 4; 2026-02-21T09:21:00.4255324Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4255393Z ld.shared.s8 %rs359, [%r17774]; 2026-02-21T09:21:00.4255585Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4255646Z shl.b16 %rs360, %rs359, 4; 2026-02-21T09:21:00.4255841Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4255910Z ld.shared.s8 %rs361, [%r17773+1024]; 
2026-02-21T09:21:00.4256110Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4256173Z shl.b16 %rs362, %rs361, 4; 2026-02-21T09:21:00.4256367Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4256435Z ld.shared.s8 %rs363, [%r17773+1280]; 2026-02-21T09:21:00.4256750Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4256817Z shl.b16 %rs364, %rs363, 4; 2026-02-21T09:21:00.4257009Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4257077Z ld.shared.s8 %rs365, [%r17773+1536]; 2026-02-21T09:21:00.4257275Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4257336Z shl.b16 %rs366, %rs365, 4; 2026-02-21T09:21:00.4257528Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4257679Z ld.shared.s8 %rs367, [%r17775]; 2026-02-21T09:21:00.4257873Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4257996Z shl.b16 %rs368, %rs367, 4; 2026-02-21T09:21:00.4258192Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4258252Z cvt.s16.s8 %rs369, %rs354; 2026-02-21T09:21:00.4258311Z shr.s16 %rs370, %rs369, 4; 2026-02-21T09:21:00.4258383Z cvt.s16.s8 %rs371, %rs356; 2026-02-21T09:21:00.4258452Z shr.s16 %rs372, %rs371, 4; 2026-02-21T09:21:00.4258512Z shr.s16 %rs373, %rs353, 4; 2026-02-21T09:21:00.4258573Z shr.s16 %rs374, %rs355, 4; 2026-02-21T09:21:00.4258840Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.4258907Z cvt.rn.f32.s16 %r17776, %rs374; 2026-02-21T09:21:00.4258970Z cvt.rn.f32.s16 %r17777, %rs373; 2026-02-21T09:21:00.4259043Z cvt.rn.f32.s16 %r17778, %rs372; 2026-02-21T09:21:00.4259107Z cvt.rn.f32.s16 %r17779, %rs370; 2026-02-21T09:21:00.4259302Z .loc 1 62 
25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4259366Z cvt.s16.s8 %rs375, %rs358; 2026-02-21T09:21:00.4259431Z shr.s16 %rs376, %rs375, 4; 2026-02-21T09:21:00.4259494Z cvt.s16.s8 %rs377, %rs360; 2026-02-21T09:21:00.4259555Z shr.s16 %rs378, %rs377, 4; 2026-02-21T09:21:00.4259616Z shr.s16 %rs379, %rs357, 4; 2026-02-21T09:21:00.4259742Z shr.s16 %rs380, %rs359, 4; 2026-02-21T09:21:00.4259941Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.4260004Z cvt.rn.f32.s16 %r17780, %rs380; 2026-02-21T09:21:00.4260072Z cvt.rn.f32.s16 %r17781, %rs379; 2026-02-21T09:21:00.4260136Z cvt.rn.f32.s16 %r17782, %rs378; 2026-02-21T09:21:00.4260197Z cvt.rn.f32.s16 %r17783, %rs376; 2026-02-21T09:21:00.4260410Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4260473Z cvt.s16.s8 %rs381, %rs362; 2026-02-21T09:21:00.4260533Z shr.s16 %rs382, %rs381, 4; 2026-02-21T09:21:00.4260594Z cvt.s16.s8 %rs383, %rs364; 2026-02-21T09:21:00.4260655Z shr.s16 %rs384, %rs383, 4; 2026-02-21T09:21:00.4260715Z shr.s16 %rs385, %rs361, 4; 2026-02-21T09:21:00.4260775Z shr.s16 %rs386, %rs363, 4; 2026-02-21T09:21:00.4260973Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.4261038Z cvt.rn.f32.s16 %r17784, %rs386; 2026-02-21T09:21:00.4261100Z cvt.rn.f32.s16 %r17785, %rs385; 2026-02-21T09:21:00.4261164Z cvt.rn.f32.s16 %r17786, %rs384; 2026-02-21T09:21:00.4261227Z cvt.rn.f32.s16 %r17787, %rs382; 2026-02-21T09:21:00.4261420Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4261485Z cvt.s16.s8 %rs387, %rs366; 2026-02-21T09:21:00.4261547Z shr.s16 %rs388, %rs387, 4; 2026-02-21T09:21:00.4261607Z cvt.s16.s8 %rs389, %rs368; 2026-02-21T09:21:00.4261667Z shr.s16 %rs390, %rs389, 4; 2026-02-21T09:21:00.4261731Z shr.s16 %rs391, %rs365, 4; 2026-02-21T09:21:00.4261792Z shr.s16 %rs392, %rs367, 4; 
2026-02-21T09:21:00.4261985Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.4262049Z cvt.rn.f32.s16 %r17788, %rs392; 2026-02-21T09:21:00.4262112Z cvt.rn.f32.s16 %r17789, %rs391; 2026-02-21T09:21:00.4262175Z cvt.rn.f32.s16 %r17790, %rs390; 2026-02-21T09:21:00.4262237Z cvt.rn.f32.s16 %r17791, %rs388; 2026-02-21T09:21:00.4262363Z st.shared.v4.b32 [%r113], {%r17779, %r17777, %r17778, %r17776}; 2026-02-21T09:21:00.4262480Z st.shared.v4.b32 [%r114], {%r17783, %r17781, %r17782, %r17780}; 2026-02-21T09:21:00.4262597Z st.shared.v4.b32 [%r115], {%r17787, %r17785, %r17786, %r17784}; 2026-02-21T09:21:00.4262709Z st.shared.v4.b32 [%r116], {%r17791, %r17789, %r17790, %r17788}; 2026-02-21T09:21:00.4262826Z $L__tmp13: 2026-02-21T09:21:00.4263104Z .loc 2 291 36 // standard.py:291:36 @[ cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:87:40 ] 2026-02-21T09:21:00.4263214Z // begin inline asm 2026-02-21T09:21:00.4263298Z fence.proxy.async.shared::cta; 2026-02-21T09:21:00.4263355Z // end inline asm 2026-02-21T09:21:00.4263409Z bar.sync 0; 2026-02-21T09:21:00.4263498Z shfl.sync.idx.b32 %r17792, %r5, 0, 31, -1; 2026-02-21T09:21:00.4263572Z wgmma.fence.sync.aligned; 2026-02-21T09:21:00.4263636Z mov.pred %p38, -1; 2026-02-21T09:21:00.4263698Z // begin inline asm 2026-02-21T09:21:00.4266688Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r23042,%r23043,%r23044,%r23045,%r23046,%r23047,%r23048,%r23049,%r23050,%r23051,%r23052,%r23053,%r23054,%r23055,%r23056,%r23057,%r23058,%r23059,%r23060,%r23061,%r23062,%r23063,%r23064,%r23065,%r23066,%r23067,%r23068,%r23069,%r23070,%r23071,%r23072,%r23073,%r23074,%r23075,%r23076,%r23077,%r23078,%r23079,%r23080,%r23081,%r23082,%r23083,%r23084,%r23085,%r23086,%r23087,%r23088,%r23089,%r23090,%r23091,%r23092,%r23093,%r23094,%r23095,%r23096,%r23097,%r23098,%r23099,%r23100,%r23101,%r23102,%r23103,%r23104,%r23105,%r23106,%r23107,%r23108,%r23109,%r23110,%r23111,%r23112,%r23113,%r23114,%r23115,%r23116,%r23117,%r23118,%r23119,%r23120,%r23121,%r23122,%r23123,%r23124,%r23125,%r23126,%r23127,%r23128,%r23129,%r23130,%r23131,%r23132,%r23133,%r23134,%r23135,%r23136,%r23137,%r23138,%r23139,%r23140,%r23141,%r23142,%r23143,%r23144,%r23145,%r23146,%r23147,%r23148,%r23149,%r23150,%r23151,%r23152,%r23153,%r23154,%r23155,%r23156,%r23157,%r23158,%r23159,%r23160,%r23161,%r23162,%r23163,%r23164,%r23165,%r23166,%r23167,%r23168,%r23169}, {%r14884,%r14885,%r14886,%r14887}, %rd1, %p38, 1, 1; 2026-02-21T09:21:00.4266773Z // end inline asm 2026-02-21T09:21:00.4266835Z // begin inline asm 2026-02-21T09:21:00.4269636Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r23042,%r23043,%r23044,%r23045,%r23046,%r23047,%r23048,%r23049,%r23050,%r23051,%r23052,%r23053,%r23054,%r23055,%r23056,%r23057,%r23058,%r23059,%r23060,%r23061,%r23062,%r23063,%r23064,%r23065,%r23066,%r23067,%r23068,%r23069,%r23070,%r23071,%r23072,%r23073,%r23074,%r23075,%r23076,%r23077,%r23078,%r23079,%r23080,%r23081,%r23082,%r23083,%r23084,%r23085,%r23086,%r23087,%r23088,%r23089,%r23090,%r23091,%r23092,%r23093,%r23094,%r23095,%r23096,%r23097,%r23098,%r23099,%r23100,%r23101,%r23102,%r23103,%r23104,%r23105,%r23106,%r23107,%r23108,%r23109,%r23110,%r23111,%r23112,%r23113,%r23114,%r23115,%r23116,%r23117,%r23118,%r23119,%r23120,%r23121,%r23122,%r23123,%r23124,%r23125,%r23126,%r23127,%r23128,%r23129,%r23130,%r23131,%r23132,%r23133,%r23134,%r23135,%r23136,%r23137,%r23138,%r23139,%r23140,%r23141,%r23142,%r23143,%r23144,%r23145,%r23146,%r23147,%r23148,%r23149,%r23150,%r23151,%r23152,%r23153,%r23154,%r23155,%r23156,%r23157,%r23158,%r23159,%r23160,%r23161,%r23162,%r23163,%r23164,%r23165,%r23166,%r23167,%r23168,%r23169}, {%r15144,%r15145,%r15146,%r15147}, %rd2, %p38, 1, 1; 2026-02-21T09:21:00.4269702Z // end inline asm 2026-02-21T09:21:00.4269760Z // begin inline asm 2026-02-21T09:21:00.4272484Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r23170,%r23171,%r23172,%r23173,%r23174,%r23175,%r23176,%r23177,%r23178,%r23179,%r23180,%r23181,%r23182,%r23183,%r23184,%r23185,%r23186,%r23187,%r23188,%r23189,%r23190,%r23191,%r23192,%r23193,%r23194,%r23195,%r23196,%r23197,%r23198,%r23199,%r23200,%r23201,%r23202,%r23203,%r23204,%r23205,%r23206,%r23207,%r23208,%r23209,%r23210,%r23211,%r23212,%r23213,%r23214,%r23215,%r23216,%r23217,%r23218,%r23219,%r23220,%r23221,%r23222,%r23223,%r23224,%r23225,%r23226,%r23227,%r23228,%r23229,%r23230,%r23231,%r23232,%r23233,%r23234,%r23235,%r23236,%r23237,%r23238,%r23239,%r23240,%r23241,%r23242,%r23243,%r23244,%r23245,%r23246,%r23247,%r23248,%r23249,%r23250,%r23251,%r23252,%r23253,%r23254,%r23255,%r23256,%r23257,%r23258,%r23259,%r23260,%r23261,%r23262,%r23263,%r23264,%r23265,%r23266,%r23267,%r23268,%r23269,%r23270,%r23271,%r23272,%r23273,%r23274,%r23275,%r23276,%r23277,%r23278,%r23279,%r23280,%r23281,%r23282,%r23283,%r23284,%r23285,%r23286,%r23287,%r23288,%r23289,%r23290,%r23291,%r23292,%r23293,%r23294,%r23295,%r23296,%r23297}, {%r15404,%r15405,%r15406,%r15407}, %rd1, %p38, 1, 1; 2026-02-21T09:21:00.4272672Z // end inline asm 2026-02-21T09:21:00.4272729Z // begin inline asm 2026-02-21T09:21:00.4275491Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r23170,%r23171,%r23172,%r23173,%r23174,%r23175,%r23176,%r23177,%r23178,%r23179,%r23180,%r23181,%r23182,%r23183,%r23184,%r23185,%r23186,%r23187,%r23188,%r23189,%r23190,%r23191,%r23192,%r23193,%r23194,%r23195,%r23196,%r23197,%r23198,%r23199,%r23200,%r23201,%r23202,%r23203,%r23204,%r23205,%r23206,%r23207,%r23208,%r23209,%r23210,%r23211,%r23212,%r23213,%r23214,%r23215,%r23216,%r23217,%r23218,%r23219,%r23220,%r23221,%r23222,%r23223,%r23224,%r23225,%r23226,%r23227,%r23228,%r23229,%r23230,%r23231,%r23232,%r23233,%r23234,%r23235,%r23236,%r23237,%r23238,%r23239,%r23240,%r23241,%r23242,%r23243,%r23244,%r23245,%r23246,%r23247,%r23248,%r23249,%r23250,%r23251,%r23252,%r23253,%r23254,%r23255,%r23256,%r23257,%r23258,%r23259,%r23260,%r23261,%r23262,%r23263,%r23264,%r23265,%r23266,%r23267,%r23268,%r23269,%r23270,%r23271,%r23272,%r23273,%r23274,%r23275,%r23276,%r23277,%r23278,%r23279,%r23280,%r23281,%r23282,%r23283,%r23284,%r23285,%r23286,%r23287,%r23288,%r23289,%r23290,%r23291,%r23292,%r23293,%r23294,%r23295,%r23296,%r23297}, {%r15664,%r15665,%r15666,%r15667}, %rd2, %p38, 1, 1; 2026-02-21T09:21:00.4275558Z // end inline asm 2026-02-21T09:21:00.4275700Z wgmma.commit_group.sync.aligned; 2026-02-21T09:21:00.4275767Z mov.b32 %r17483, 0; 2026-02-21T09:21:00.4275829Z mov.b32 %r15924, %r2884; 2026-02-21T09:21:00.4275890Z mov.b32 %r15925, %r17483; 2026-02-21T09:21:00.4275950Z mov.b32 %r15926, %r17483; 2026-02-21T09:21:00.4276012Z // begin inline asm 2026-02-21T09:21:00.4281138Z // wait for regs: 
%r23042,%r23043,%r23044,%r23045,%r23046,%r23047,%r23048,%r23049,%r23050,%r23051,%r23052,%r23053,%r23054,%r23055,%r23056,%r23057,%r23058,%r23059,%r23060,%r23061,%r23062,%r23063,%r23064,%r23065,%r23066,%r23067,%r23068,%r23069,%r23070,%r23071,%r23072,%r23073,%r23074,%r23075,%r23076,%r23077,%r23078,%r23079,%r23080,%r23081,%r23082,%r23083,%r23084,%r23085,%r23086,%r23087,%r23088,%r23089,%r23090,%r23091,%r23092,%r23093,%r23094,%r23095,%r23096,%r23097,%r23098,%r23099,%r23100,%r23101,%r23102,%r23103,%r23104,%r23105,%r23106,%r23107,%r23108,%r23109,%r23110,%r23111,%r23112,%r23113,%r23114,%r23115,%r23116,%r23117,%r23118,%r23119,%r23120,%r23121,%r23122,%r23123,%r23124,%r23125,%r23126,%r23127,%r23128,%r23129,%r23130,%r23131,%r23132,%r23133,%r23134,%r23135,%r23136,%r23137,%r23138,%r23139,%r23140,%r23141,%r23142,%r23143,%r23144,%r23145,%r23146,%r23147,%r23148,%r23149,%r23150,%r23151,%r23152,%r23153,%r23154,%r23155,%r23156,%r23157,%r23158,%r23159,%r23160,%r23161,%r23162,%r23163,%r23164,%r23165,%r23166,%r23167,%r23168,%r23169,%r23170,%r23171,%r23172,%r23173,%r23174,%r23175,%r23176,%r23177,%r23178,%r23179,%r23180,%r23181,%r23182,%r23183,%r23184,%r23185,%r23186,%r23187,%r23188,%r23189,%r23190,%r23191,%r23192,%r23193,%r23194,%r23195,%r23196,%r23197,%r23198,%r23199,%r23200,%r23201,%r23202,%r23203,%r23204,%r23205,%r23206,%r23207,%r23208,%r23209,%r23210,%r23211,%r23212,%r23213,%r23214,%r23215,%r23216,%r23217,%r23218,%r23219,%r23220,%r23221,%r23222,%r23223,%r23224,%r23225,%r23226,%r23227,%r23228,%r23229,%r23230,%r23231,%r23232,%r23233,%r23234,%r23235,%r23236,%r23237,%r23238,%r23239,%r23240,%r23241,%r23242,%r23243,%r23244,%r23245,%r23246,%r23247,%r23248,%r23249,%r23250,%r23251,%r23252,%r23253,%r23254,%r23255,%r23256,%r23257,%r23258,%r23259,%r23260,%r23261,%r23262,%r23263,%r23264,%r23265,%r23266,%r23267,%r23268,%r23269,%r23270,%r23271,%r23272,%r23273,%r23274,%r23275,%r23276,%r23277,%r23278,%r23279,%r23280,%r23281,%r23282,%r23283,%r23284,%r23285,%r23286,%r23287,%r23288,%r23289,%r23290,%r23291,
%r23292,%r23293,%r23294,%r23295,%r23296,%r23297,%r15924,%r15925,%r15926 2026-02-21T09:21:00.4281231Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:21:00.4281371Z // end inline asm 2026-02-21T09:21:00.4281425Z $L__tmp14: 2026-02-21T09:21:00.4281642Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4281710Z add.s32 %r17794, %r6234, %r17765; 2026-02-21T09:21:00.4281987Z .loc 1 55 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:55:32 2026-02-21T09:21:00.4282055Z add.s32 %r17795, %r17794, %r109; 2026-02-21T09:21:00.4282123Z ld.shared.b16 %rs393, [%r17795]; 2026-02-21T09:21:00.4282198Z ld.shared.b16 %rs394, [%r17795+256]; 2026-02-21T09:21:00.4282267Z ld.shared.b16 %rs395, [%r17795+16]; 2026-02-21T09:21:00.4282333Z ld.shared.b16 %rs396, [%r17795+272]; 2026-02-21T09:21:00.4282402Z ld.shared.b16 %rs397, [%r17795+4096]; 2026-02-21T09:21:00.4282532Z ld.shared.b16 %rs398, [%r17795+4352]; 2026-02-21T09:21:00.4282601Z ld.shared.b16 %rs399, [%r17795+4112]; 2026-02-21T09:21:00.4282667Z ld.shared.b16 %rs400, [%r17795+4368]; 2026-02-21T09:21:00.4282735Z add.s32 %r17796, %r17794, %r110; 2026-02-21T09:21:00.4282802Z ld.shared.b16 %rs401, [%r17796]; 2026-02-21T09:21:00.4282867Z ld.shared.b16 %rs402, [%r17796+256]; 2026-02-21T09:21:00.4282936Z ld.shared.b16 %rs403, [%r17796+16]; 2026-02-21T09:21:00.4283002Z ld.shared.b16 %rs404, [%r17796+272]; 2026-02-21T09:21:00.4283066Z ld.shared.b16 %rs405, [%r17796+4096]; 2026-02-21T09:21:00.4283131Z ld.shared.b16 %rs406, [%r17796+4352]; 2026-02-21T09:21:00.4283199Z ld.shared.b16 %rs407, [%r17796+4112]; 2026-02-21T09:21:00.4283276Z ld.shared.b16 %rs408, [%r17796+4368]; 2026-02-21T09:21:00.4283409Z cvt.f32.bf16 %r16442, %rs393; 2026-02-21T09:21:00.4283477Z cvt.f32.bf16 %r16443, %rs394; 2026-02-21T09:21:00.4283538Z cvt.f32.bf16 %r16444, %rs401; 2026-02-21T09:21:00.4283599Z cvt.f32.bf16 %r16445, %rs402; 2026-02-21T09:21:00.4283661Z cvt.f32.bf16 %r16702, %rs395; 2026-02-21T09:21:00.4283722Z 
cvt.f32.bf16 %r16703, %rs396; 2026-02-21T09:21:00.4283782Z cvt.f32.bf16 %r16704, %rs403; 2026-02-21T09:21:00.4283844Z cvt.f32.bf16 %r16705, %rs404; 2026-02-21T09:21:00.4283909Z cvt.f32.bf16 %r16962, %rs397; 2026-02-21T09:21:00.4283969Z cvt.f32.bf16 %r16963, %rs398; 2026-02-21T09:21:00.4284031Z cvt.f32.bf16 %r16964, %rs405; 2026-02-21T09:21:00.4284096Z cvt.f32.bf16 %r16965, %rs406; 2026-02-21T09:21:00.4284156Z cvt.f32.bf16 %r17222, %rs399; 2026-02-21T09:21:00.4284215Z cvt.f32.bf16 %r17223, %rs400; 2026-02-21T09:21:00.4284275Z cvt.f32.bf16 %r17224, %rs407; 2026-02-21T09:21:00.4284337Z cvt.f32.bf16 %r17225, %rs408; 2026-02-21T09:21:00.4284544Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4284608Z add.s32 %r17797, %r17771, 108544; 2026-02-21T09:21:00.4284811Z .loc 1 70 45 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:70:45 2026-02-21T09:21:00.4284877Z add.s32 %r17798, %r17797, %r22239; 2026-02-21T09:21:00.4284939Z add.s32 %r17799, %r17797, %r22243; 2026-02-21T09:21:00.4285003Z add.s32 %r17800, %r17797, %r22244; 2026-02-21T09:21:00.4285198Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4285265Z ld.shared.s8 %rs409, [%r17798]; 2026-02-21T09:21:00.4285463Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4285531Z shl.b16 %rs410, %rs409, 4; 2026-02-21T09:21:00.4285726Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4285796Z ld.shared.s8 %rs411, [%r17798+256]; 2026-02-21T09:21:00.4285994Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4286057Z shl.b16 %rs412, %rs411, 4; 2026-02-21T09:21:00.4286251Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4286325Z ld.shared.s8 %rs413, [%r17798+512]; 2026-02-21T09:21:00.4286701Z .loc 1 60 28 // 
cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4286767Z shl.b16 %rs414, %rs413, 4; 2026-02-21T09:21:00.4286965Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4287103Z ld.shared.s8 %rs415, [%r17799]; 2026-02-21T09:21:00.4287299Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4287359Z shl.b16 %rs416, %rs415, 4; 2026-02-21T09:21:00.4287556Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4287623Z ld.shared.s8 %rs417, [%r17798+1024]; 2026-02-21T09:21:00.4287888Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4287955Z shl.b16 %rs418, %rs417, 4; 2026-02-21T09:21:00.4288149Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4288216Z ld.shared.s8 %rs419, [%r17798+1280]; 2026-02-21T09:21:00.4288411Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4288475Z shl.b16 %rs420, %rs419, 4; 2026-02-21T09:21:00.4288666Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4288735Z ld.shared.s8 %rs421, [%r17798+1536]; 2026-02-21T09:21:00.4288992Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4289056Z shl.b16 %rs422, %rs421, 4; 2026-02-21T09:21:00.4289253Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4289319Z ld.shared.s8 %rs423, [%r17800]; 2026-02-21T09:21:00.4289512Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4289574Z shl.b16 %rs424, %rs423, 4; 2026-02-21T09:21:00.4289769Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4289830Z cvt.s16.s8 %rs425, %rs410; 2026-02-21T09:21:00.4289891Z shr.s16 %rs426, 
%rs425, 4; 2026-02-21T09:21:00.4289956Z cvt.s16.s8 %rs427, %rs412; 2026-02-21T09:21:00.4290016Z shr.s16 %rs428, %rs427, 4; 2026-02-21T09:21:00.4290075Z shr.s16 %rs429, %rs409, 4; 2026-02-21T09:21:00.4290134Z shr.s16 %rs430, %rs411, 4; 2026-02-21T09:21:00.4290333Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.4290399Z cvt.rn.f32.s16 %r17801, %rs430; 2026-02-21T09:21:00.4290462Z cvt.rn.f32.s16 %r17802, %rs429; 2026-02-21T09:21:00.4290529Z cvt.rn.f32.s16 %r17803, %rs428; 2026-02-21T09:21:00.4290590Z cvt.rn.f32.s16 %r17804, %rs426; 2026-02-21T09:21:00.4290796Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4290864Z cvt.s16.s8 %rs431, %rs414; 2026-02-21T09:21:00.4290925Z shr.s16 %rs432, %rs431, 4; 2026-02-21T09:21:00.4290986Z cvt.s16.s8 %rs433, %rs416; 2026-02-21T09:21:00.4291046Z shr.s16 %rs434, %rs433, 4; 2026-02-21T09:21:00.4291112Z shr.s16 %rs435, %rs413, 4; 2026-02-21T09:21:00.4291171Z shr.s16 %rs436, %rs415, 4; 2026-02-21T09:21:00.4291367Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.4291434Z cvt.rn.f32.s16 %r17805, %rs436; 2026-02-21T09:21:00.4291498Z cvt.rn.f32.s16 %r17806, %rs435; 2026-02-21T09:21:00.4291558Z cvt.rn.f32.s16 %r17807, %rs434; 2026-02-21T09:21:00.4291623Z cvt.rn.f32.s16 %r17808, %rs432; 2026-02-21T09:21:00.4291818Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4291879Z cvt.s16.s8 %rs437, %rs418; 2026-02-21T09:21:00.4291939Z shr.s16 %rs438, %rs437, 4; 2026-02-21T09:21:00.4292077Z cvt.s16.s8 %rs439, %rs420; 2026-02-21T09:21:00.4292144Z shr.s16 %rs440, %rs439, 4; 2026-02-21T09:21:00.4292203Z shr.s16 %rs441, %rs417, 4; 2026-02-21T09:21:00.4292266Z shr.s16 %rs442, %rs419, 4; 2026-02-21T09:21:00.4292506Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.4292569Z cvt.rn.f32.s16 %r17809, %rs442; 
2026-02-21T09:21:00.4292636Z cvt.rn.f32.s16 %r17810, %rs441; 2026-02-21T09:21:00.4292709Z cvt.rn.f32.s16 %r17811, %rs440; 2026-02-21T09:21:00.4292774Z cvt.rn.f32.s16 %r17812, %rs438; 2026-02-21T09:21:00.4292970Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4293034Z cvt.s16.s8 %rs443, %rs422; 2026-02-21T09:21:00.4293145Z shr.s16 %rs444, %rs443, 4; 2026-02-21T09:21:00.4293207Z cvt.s16.s8 %rs445, %rs424; 2026-02-21T09:21:00.4293270Z shr.s16 %rs446, %rs445, 4; 2026-02-21T09:21:00.4293331Z shr.s16 %rs447, %rs421, 4; 2026-02-21T09:21:00.4293390Z shr.s16 %rs448, %rs423, 4; 2026-02-21T09:21:00.4293584Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.4293654Z cvt.rn.f32.s16 %r17813, %rs448; 2026-02-21T09:21:00.4293719Z cvt.rn.f32.s16 %r17814, %rs447; 2026-02-21T09:21:00.4293781Z cvt.rn.f32.s16 %r17815, %rs446; 2026-02-21T09:21:00.4293845Z cvt.rn.f32.s16 %r17816, %rs444; 2026-02-21T09:21:00.4293901Z bar.sync 0; 2026-02-21T09:21:00.4294074Z st.shared.v4.b32 [%r113], {%r17804, %r17802, %r17803, %r17801}; 2026-02-21T09:21:00.4294194Z st.shared.v4.b32 [%r114], {%r17808, %r17806, %r17807, %r17805}; 2026-02-21T09:21:00.4294303Z st.shared.v4.b32 [%r115], {%r17812, %r17810, %r17811, %r17809}; 2026-02-21T09:21:00.4294412Z st.shared.v4.b32 [%r116], {%r17816, %r17814, %r17815, %r17813}; 2026-02-21T09:21:00.4294468Z $L__tmp15: 2026-02-21T09:21:00.4294749Z .loc 2 291 36 // standard.py:291:36 @[ cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:87:40 ] 2026-02-21T09:21:00.4294822Z // begin inline asm 2026-02-21T09:21:00.4294902Z fence.proxy.async.shared::cta; 2026-02-21T09:21:00.4294961Z // end inline asm 2026-02-21T09:21:00.4295016Z bar.sync 0; 2026-02-21T09:21:00.4295091Z wgmma.fence.sync.aligned; 2026-02-21T09:21:00.4295153Z // begin inline asm 2026-02-21T09:21:00.4297986Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r23042,%r23043,%r23044,%r23045,%r23046,%r23047,%r23048,%r23049,%r23050,%r23051,%r23052,%r23053,%r23054,%r23055,%r23056,%r23057,%r23058,%r23059,%r23060,%r23061,%r23062,%r23063,%r23064,%r23065,%r23066,%r23067,%r23068,%r23069,%r23070,%r23071,%r23072,%r23073,%r23074,%r23075,%r23076,%r23077,%r23078,%r23079,%r23080,%r23081,%r23082,%r23083,%r23084,%r23085,%r23086,%r23087,%r23088,%r23089,%r23090,%r23091,%r23092,%r23093,%r23094,%r23095,%r23096,%r23097,%r23098,%r23099,%r23100,%r23101,%r23102,%r23103,%r23104,%r23105,%r23106,%r23107,%r23108,%r23109,%r23110,%r23111,%r23112,%r23113,%r23114,%r23115,%r23116,%r23117,%r23118,%r23119,%r23120,%r23121,%r23122,%r23123,%r23124,%r23125,%r23126,%r23127,%r23128,%r23129,%r23130,%r23131,%r23132,%r23133,%r23134,%r23135,%r23136,%r23137,%r23138,%r23139,%r23140,%r23141,%r23142,%r23143,%r23144,%r23145,%r23146,%r23147,%r23148,%r23149,%r23150,%r23151,%r23152,%r23153,%r23154,%r23155,%r23156,%r23157,%r23158,%r23159,%r23160,%r23161,%r23162,%r23163,%r23164,%r23165,%r23166,%r23167,%r23168,%r23169}, {%r16442,%r16443,%r16444,%r16445}, %rd1, %p38, 1, 1; 2026-02-21T09:21:00.4298054Z // end inline asm 2026-02-21T09:21:00.4298114Z // begin inline asm 2026-02-21T09:21:00.4300832Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r23042,%r23043,%r23044,%r23045,%r23046,%r23047,%r23048,%r23049,%r23050,%r23051,%r23052,%r23053,%r23054,%r23055,%r23056,%r23057,%r23058,%r23059,%r23060,%r23061,%r23062,%r23063,%r23064,%r23065,%r23066,%r23067,%r23068,%r23069,%r23070,%r23071,%r23072,%r23073,%r23074,%r23075,%r23076,%r23077,%r23078,%r23079,%r23080,%r23081,%r23082,%r23083,%r23084,%r23085,%r23086,%r23087,%r23088,%r23089,%r23090,%r23091,%r23092,%r23093,%r23094,%r23095,%r23096,%r23097,%r23098,%r23099,%r23100,%r23101,%r23102,%r23103,%r23104,%r23105,%r23106,%r23107,%r23108,%r23109,%r23110,%r23111,%r23112,%r23113,%r23114,%r23115,%r23116,%r23117,%r23118,%r23119,%r23120,%r23121,%r23122,%r23123,%r23124,%r23125,%r23126,%r23127,%r23128,%r23129,%r23130,%r23131,%r23132,%r23133,%r23134,%r23135,%r23136,%r23137,%r23138,%r23139,%r23140,%r23141,%r23142,%r23143,%r23144,%r23145,%r23146,%r23147,%r23148,%r23149,%r23150,%r23151,%r23152,%r23153,%r23154,%r23155,%r23156,%r23157,%r23158,%r23159,%r23160,%r23161,%r23162,%r23163,%r23164,%r23165,%r23166,%r23167,%r23168,%r23169}, {%r16702,%r16703,%r16704,%r16705}, %rd2, %p38, 1, 1; 2026-02-21T09:21:00.4301036Z // end inline asm 2026-02-21T09:21:00.4301171Z // begin inline asm 2026-02-21T09:21:00.4303975Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r23170,%r23171,%r23172,%r23173,%r23174,%r23175,%r23176,%r23177,%r23178,%r23179,%r23180,%r23181,%r23182,%r23183,%r23184,%r23185,%r23186,%r23187,%r23188,%r23189,%r23190,%r23191,%r23192,%r23193,%r23194,%r23195,%r23196,%r23197,%r23198,%r23199,%r23200,%r23201,%r23202,%r23203,%r23204,%r23205,%r23206,%r23207,%r23208,%r23209,%r23210,%r23211,%r23212,%r23213,%r23214,%r23215,%r23216,%r23217,%r23218,%r23219,%r23220,%r23221,%r23222,%r23223,%r23224,%r23225,%r23226,%r23227,%r23228,%r23229,%r23230,%r23231,%r23232,%r23233,%r23234,%r23235,%r23236,%r23237,%r23238,%r23239,%r23240,%r23241,%r23242,%r23243,%r23244,%r23245,%r23246,%r23247,%r23248,%r23249,%r23250,%r23251,%r23252,%r23253,%r23254,%r23255,%r23256,%r23257,%r23258,%r23259,%r23260,%r23261,%r23262,%r23263,%r23264,%r23265,%r23266,%r23267,%r23268,%r23269,%r23270,%r23271,%r23272,%r23273,%r23274,%r23275,%r23276,%r23277,%r23278,%r23279,%r23280,%r23281,%r23282,%r23283,%r23284,%r23285,%r23286,%r23287,%r23288,%r23289,%r23290,%r23291,%r23292,%r23293,%r23294,%r23295,%r23296,%r23297}, {%r16962,%r16963,%r16964,%r16965}, %rd1, %p38, 1, 1; 2026-02-21T09:21:00.4304043Z // end inline asm 2026-02-21T09:21:00.4304109Z // begin inline asm 2026-02-21T09:21:00.4306936Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r23170,%r23171,%r23172,%r23173,%r23174,%r23175,%r23176,%r23177,%r23178,%r23179,%r23180,%r23181,%r23182,%r23183,%r23184,%r23185,%r23186,%r23187,%r23188,%r23189,%r23190,%r23191,%r23192,%r23193,%r23194,%r23195,%r23196,%r23197,%r23198,%r23199,%r23200,%r23201,%r23202,%r23203,%r23204,%r23205,%r23206,%r23207,%r23208,%r23209,%r23210,%r23211,%r23212,%r23213,%r23214,%r23215,%r23216,%r23217,%r23218,%r23219,%r23220,%r23221,%r23222,%r23223,%r23224,%r23225,%r23226,%r23227,%r23228,%r23229,%r23230,%r23231,%r23232,%r23233,%r23234,%r23235,%r23236,%r23237,%r23238,%r23239,%r23240,%r23241,%r23242,%r23243,%r23244,%r23245,%r23246,%r23247,%r23248,%r23249,%r23250,%r23251,%r23252,%r23253,%r23254,%r23255,%r23256,%r23257,%r23258,%r23259,%r23260,%r23261,%r23262,%r23263,%r23264,%r23265,%r23266,%r23267,%r23268,%r23269,%r23270,%r23271,%r23272,%r23273,%r23274,%r23275,%r23276,%r23277,%r23278,%r23279,%r23280,%r23281,%r23282,%r23283,%r23284,%r23285,%r23286,%r23287,%r23288,%r23289,%r23290,%r23291,%r23292,%r23293,%r23294,%r23295,%r23296,%r23297}, {%r17222,%r17223,%r17224,%r17225}, %rd2, %p38, 1, 1; 2026-02-21T09:21:00.4307009Z // end inline asm 2026-02-21T09:21:00.4307101Z wgmma.commit_group.sync.aligned; 2026-02-21T09:21:00.4307165Z mov.b32 %r17482, %r2884; 2026-02-21T09:21:00.4307230Z mov.b32 %r17484, %r17483; 2026-02-21T09:21:00.4307291Z // begin inline asm 2026-02-21T09:21:00.4312407Z // wait for regs: 
%r23042,%r23043,%r23044,%r23045,%r23046,%r23047,%r23048,%r23049,%r23050,%r23051,%r23052,%r23053,%r23054,%r23055,%r23056,%r23057,%r23058,%r23059,%r23060,%r23061,%r23062,%r23063,%r23064,%r23065,%r23066,%r23067,%r23068,%r23069,%r23070,%r23071,%r23072,%r23073,%r23074,%r23075,%r23076,%r23077,%r23078,%r23079,%r23080,%r23081,%r23082,%r23083,%r23084,%r23085,%r23086,%r23087,%r23088,%r23089,%r23090,%r23091,%r23092,%r23093,%r23094,%r23095,%r23096,%r23097,%r23098,%r23099,%r23100,%r23101,%r23102,%r23103,%r23104,%r23105,%r23106,%r23107,%r23108,%r23109,%r23110,%r23111,%r23112,%r23113,%r23114,%r23115,%r23116,%r23117,%r23118,%r23119,%r23120,%r23121,%r23122,%r23123,%r23124,%r23125,%r23126,%r23127,%r23128,%r23129,%r23130,%r23131,%r23132,%r23133,%r23134,%r23135,%r23136,%r23137,%r23138,%r23139,%r23140,%r23141,%r23142,%r23143,%r23144,%r23145,%r23146,%r23147,%r23148,%r23149,%r23150,%r23151,%r23152,%r23153,%r23154,%r23155,%r23156,%r23157,%r23158,%r23159,%r23160,%r23161,%r23162,%r23163,%r23164,%r23165,%r23166,%r23167,%r23168,%r23169,%r23170,%r23171,%r23172,%r23173,%r23174,%r23175,%r23176,%r23177,%r23178,%r23179,%r23180,%r23181,%r23182,%r23183,%r23184,%r23185,%r23186,%r23187,%r23188,%r23189,%r23190,%r23191,%r23192,%r23193,%r23194,%r23195,%r23196,%r23197,%r23198,%r23199,%r23200,%r23201,%r23202,%r23203,%r23204,%r23205,%r23206,%r23207,%r23208,%r23209,%r23210,%r23211,%r23212,%r23213,%r23214,%r23215,%r23216,%r23217,%r23218,%r23219,%r23220,%r23221,%r23222,%r23223,%r23224,%r23225,%r23226,%r23227,%r23228,%r23229,%r23230,%r23231,%r23232,%r23233,%r23234,%r23235,%r23236,%r23237,%r23238,%r23239,%r23240,%r23241,%r23242,%r23243,%r23244,%r23245,%r23246,%r23247,%r23248,%r23249,%r23250,%r23251,%r23252,%r23253,%r23254,%r23255,%r23256,%r23257,%r23258,%r23259,%r23260,%r23261,%r23262,%r23263,%r23264,%r23265,%r23266,%r23267,%r23268,%r23269,%r23270,%r23271,%r23272,%r23273,%r23274,%r23275,%r23276,%r23277,%r23278,%r23279,%r23280,%r23281,%r23282,%r23283,%r23284,%r23285,%r23286,%r23287,%r23288,%r23289,%r23290,%r23291,
%r23292,%r23293,%r23294,%r23295,%r23296,%r23297,%r17482,%r17483,%r17484 2026-02-21T09:21:00.4312672Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:21:00.4312732Z // end inline asm 2026-02-21T09:21:00.4312786Z $L__tmp16: 2026-02-21T09:21:00.4312998Z .loc 1 43 78 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:43:78 2026-02-21T09:21:00.4313065Z add.s32 %r17817, %r23041, 1; 2026-02-21T09:21:00.4313130Z setp.gt.s32 %p48, %r17817, 4; 2026-02-21T09:21:00.4313199Z selp.b32 %r23041, 0, %r17817, %p48; 2026-02-21T09:21:00.4313403Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4313470Z add.s32 %r17818, %r23038, -16; 2026-02-21T09:21:00.4313533Z add.s64 %rd540, %rd720, %rd31; 2026-02-21T09:21:00.4313601Z add.s64 %rd530, %rd540, 320; 2026-02-21T09:21:00.4313663Z add.s64 %rd541, %rd720, %rd30; 2026-02-21T09:21:00.4313725Z add.s64 %rd531, %rd541, 320; 2026-02-21T09:21:00.4313784Z add.s64 %rd542, %rd720, %rd29; 2026-02-21T09:21:00.4313848Z add.s64 %rd532, %rd542, 320; 2026-02-21T09:21:00.4313928Z mad.wide.s32 %rd533, %r17818, 2, %rd44; 2026-02-21T09:21:00.4314125Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4314187Z shl.b32 %r17819, %r23041, 13; 2026-02-21T09:21:00.4314252Z add.s32 %r17820, %r22237, %r17819; 2026-02-21T09:21:00.4314314Z add.s32 %r17744, %r17820, %r47; 2026-02-21T09:21:00.4314388Z selp.b32 %r17745, 8, 0, %p46; 2026-02-21T09:21:00.4314454Z // begin inline asm 2026-02-21T09:21:00.4314607Z cp.async.ca.shared.global [ %r17744 + 0 ], [ %rd530 + 0 ], 0x8, %r17745; 2026-02-21T09:21:00.4314663Z // end inline asm 2026-02-21T09:21:00.4314726Z add.s32 %r17746, %r17744, 2048; 2026-02-21T09:21:00.4314787Z // begin inline asm 2026-02-21T09:21:00.4314928Z cp.async.ca.shared.global [ %r17746 + 0 ], [ %rd531 + 0 ], 0x8, %r17745; 2026-02-21T09:21:00.4314986Z // end inline asm 2026-02-21T09:21:00.4315047Z add.s32 %r17748, %r17744, 4096; 2026-02-21T09:21:00.4315106Z // 
begin inline asm 2026-02-21T09:21:00.4315245Z cp.async.ca.shared.global [ %r17748 + 0 ], [ %rd532 + 0 ], 0x8, %r17745; 2026-02-21T09:21:00.4315304Z // end inline asm 2026-02-21T09:21:00.4315365Z add.s32 %r17750, %r17744, 6144; 2026-02-21T09:21:00.4315426Z // begin inline asm 2026-02-21T09:21:00.4315566Z cp.async.ca.shared.global [ %r17750 + 0 ], [ %rd533 + 0 ], 0x8, %r17745; 2026-02-21T09:21:00.4315622Z // end inline asm 2026-02-21T09:21:00.4315688Z cp.async.commit_group; 2026-02-21T09:21:00.4315952Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4316020Z add.s32 %r17821, %r23039, -65536; 2026-02-21T09:21:00.4316084Z cvt.s64.s32 %rd543, %r17821; 2026-02-21T09:21:00.4316199Z add.s64 %rd534, %rd45, %rd543; 2026-02-21T09:21:00.4316400Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4316580Z shl.b32 %r17822, %r23041, 11; 2026-02-21T09:21:00.4316647Z add.s32 %r17752, %r54, %r17822; 2026-02-21T09:21:00.4316712Z // begin inline asm 2026-02-21T09:21:00.4316856Z cp.async.ca.shared.global [ %r17752 + 0 ], [ %rd534 + 0 ], 0x8, %r17745; 2026-02-21T09:21:00.4316914Z // end inline asm 2026-02-21T09:21:00.4317061Z cp.async.commit_group; 2026-02-21T09:21:00.4317269Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4317330Z add.s64 %rd535, %rd540, 352; 2026-02-21T09:21:00.4317394Z add.s64 %rd536, %rd541, 352; 2026-02-21T09:21:00.4317457Z add.s64 %rd537, %rd542, 352; 2026-02-21T09:21:00.4317529Z mad.wide.s32 %rd538, %r23038, 2, %rd44; 2026-02-21T09:21:00.4317725Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4317793Z add.s32 %r17823, %r6234, %r17819; 2026-02-21T09:21:00.4317854Z add.s32 %r17754, %r17823, %r47; 2026-02-21T09:21:00.4317912Z // begin inline asm 2026-02-21T09:21:00.4318117Z cp.async.ca.shared.global [ %r17754 + 0 ], [ %rd535 + 0 ], 0x8, %r17745; 
2026-02-21T09:21:00.4318180Z // end inline asm 2026-02-21T09:21:00.4318243Z add.s32 %r17756, %r17754, 2048; 2026-02-21T09:21:00.4318302Z // begin inline asm 2026-02-21T09:21:00.4318442Z cp.async.ca.shared.global [ %r17756 + 0 ], [ %rd536 + 0 ], 0x8, %r17745; 2026-02-21T09:21:00.4318498Z // end inline asm 2026-02-21T09:21:00.4318557Z add.s32 %r17758, %r17754, 4096; 2026-02-21T09:21:00.4318616Z // begin inline asm 2026-02-21T09:21:00.4318760Z cp.async.ca.shared.global [ %r17758 + 0 ], [ %rd537 + 0 ], 0x8, %r17745; 2026-02-21T09:21:00.4318816Z // end inline asm 2026-02-21T09:21:00.4318878Z add.s32 %r17760, %r17754, 6144; 2026-02-21T09:21:00.4318939Z // begin inline asm 2026-02-21T09:21:00.4319076Z cp.async.ca.shared.global [ %r17760 + 0 ], [ %rd538 + 0 ], 0x8, %r17745; 2026-02-21T09:21:00.4319133Z // end inline asm 2026-02-21T09:21:00.4319198Z cp.async.commit_group; 2026-02-21T09:21:00.4319396Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4319459Z cvt.s64.s32 %rd544, %r23039; 2026-02-21T09:21:00.4327591Z add.s64 %rd539, %rd45, %rd544; 2026-02-21T09:21:00.4327988Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4328073Z add.s32 %r17762, %r60, %r17822; 2026-02-21T09:21:00.4328140Z // begin inline asm 2026-02-21T09:21:00.4328332Z cp.async.ca.shared.global [ %r17762 + 0 ], [ %rd539 + 0 ], 0x8, %r17745; 2026-02-21T09:21:00.4328399Z // end inline asm 2026-02-21T09:21:00.4328479Z cp.async.commit_group; 2026-02-21T09:21:00.4328712Z .loc 1 43 78 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:43:78 2026-02-21T09:21:00.4328797Z add.s32 %r23039, %r23039, 131072; 2026-02-21T09:21:00.4328869Z add.s64 %rd720, %rd720, 64; 2026-02-21T09:21:00.4328934Z add.s32 %r23038, %r23038, 32; 2026-02-21T09:21:00.4329005Z setp.lt.u64 %p49, %rd721, 496; 2026-02-21T09:21:00.4329067Z @%p49 bra $L__BB0_9; 2026-02-21T09:21:00.4329192Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 
2026-02-21T09:21:00.4329411Z .loc 1 36 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:36:32 2026-02-21T09:21:00.4329481Z or.b32 %r18112, %r1813, %r11; 2026-02-21T09:21:00.4329552Z or.b32 %r18113, %r1813, %r12; 2026-02-21T09:21:00.4329611Z or.b32 %r18114, %r1813, %r13; 2026-02-21T09:21:00.4329818Z or.b32 %r18115, %r1813, %r14; 2026-02-21T09:21:00.4329881Z or.b32 %r18116, %r1813, %r15; 2026-02-21T09:21:00.4329942Z or.b32 %r18117, %r1813, %r16; 2026-02-21T09:21:00.4330002Z or.b32 %r18118, %r1813, %r17; 2026-02-21T09:21:00.4330063Z or.b32 %r18119, %r1813, %r18; 2026-02-21T09:21:00.4330200Z or.b32 %r18120, %r1813, %r19; 2026-02-21T09:21:00.4330262Z or.b32 %r18121, %r1813, %r20; 2026-02-21T09:21:00.4330321Z or.b32 %r18122, %r1813, %r21; 2026-02-21T09:21:00.4330385Z or.b32 %r18123, %r1813, %r22; 2026-02-21T09:21:00.4330444Z or.b32 %r18124, %r1813, %r23; 2026-02-21T09:21:00.4330505Z or.b32 %r18125, %r1813, %r24; 2026-02-21T09:21:00.4330566Z or.b32 %r18126, %r1813, %r25; 2026-02-21T09:21:00.4330627Z or.b32 %r18127, %r1813, %r26; 2026-02-21T09:21:00.4330766Z or.b32 %r18128, %r1813, %r27; 2026-02-21T09:21:00.4330833Z or.b32 %r18129, %r1813, %r28; 2026-02-21T09:21:00.4330896Z or.b32 %r18130, %r1813, %r29; 2026-02-21T09:21:00.4330958Z or.b32 %r18131, %r1813, %r30; 2026-02-21T09:21:00.4331020Z or.b32 %r18132, %r1813, %r31; 2026-02-21T09:21:00.4331083Z or.b32 %r18133, %r1813, %r32; 2026-02-21T09:21:00.4331143Z or.b32 %r18134, %r1813, %r33; 2026-02-21T09:21:00.4331203Z or.b32 %r18135, %r1813, %r34; 2026-02-21T09:21:00.4331262Z or.b32 %r18136, %r1813, %r35; 2026-02-21T09:21:00.4331332Z or.b32 %r18137, %r1813, %r36; 2026-02-21T09:21:00.4331392Z or.b32 %r18138, %r1813, %r37; 2026-02-21T09:21:00.4331451Z or.b32 %r18139, %r1813, %r38; 2026-02-21T09:21:00.4331513Z or.b32 %r18140, %r1813, %r39; 2026-02-21T09:21:00.4331574Z or.b32 %r18141, %r1813, %r40; 2026-02-21T09:21:00.4331696Z or.b32 %r18142, %r1813, %r41; 2026-02-21T09:21:00.4331760Z or.b32 %r18143, %r1813, %r42; 
2026-02-21T09:21:00.4331976Z .loc 1 43 78 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:43:78 2026-02-21T09:21:00.4332060Z cp.async.wait_group 0; 2026-02-21T09:21:00.4332156Z bar.sync 0; 2026-02-21T09:21:00.4332373Z .loc 1 90 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:90:28 2026-02-21T09:21:00.4332470Z cvt.rn.bf16x2.f32 %r18144, %r23043, %r23042; 2026-02-21T09:21:00.4332554Z cvt.rn.bf16x2.f32 %r18145, %r23045, %r23044; 2026-02-21T09:21:00.4332643Z cvt.rn.bf16x2.f32 %r18146, %r23047, %r23046; 2026-02-21T09:21:00.4332723Z cvt.rn.bf16x2.f32 %r18147, %r23049, %r23048; 2026-02-21T09:21:00.4332801Z cvt.rn.bf16x2.f32 %r18148, %r23051, %r23050; 2026-02-21T09:21:00.4332879Z cvt.rn.bf16x2.f32 %r18149, %r23053, %r23052; 2026-02-21T09:21:00.4332957Z cvt.rn.bf16x2.f32 %r18150, %r23055, %r23054; 2026-02-21T09:21:00.4333035Z cvt.rn.bf16x2.f32 %r18151, %r23057, %r23056; 2026-02-21T09:21:00.4333114Z cvt.rn.bf16x2.f32 %r18152, %r23059, %r23058; 2026-02-21T09:21:00.4333195Z cvt.rn.bf16x2.f32 %r18153, %r23061, %r23060; 2026-02-21T09:21:00.4333276Z cvt.rn.bf16x2.f32 %r18154, %r23063, %r23062; 2026-02-21T09:21:00.4333352Z cvt.rn.bf16x2.f32 %r18155, %r23065, %r23064; 2026-02-21T09:21:00.4333433Z cvt.rn.bf16x2.f32 %r18156, %r23067, %r23066; 2026-02-21T09:21:00.4333514Z cvt.rn.bf16x2.f32 %r18157, %r23069, %r23068; 2026-02-21T09:21:00.4333590Z cvt.rn.bf16x2.f32 %r18158, %r23071, %r23070; 2026-02-21T09:21:00.4333665Z cvt.rn.bf16x2.f32 %r18159, %r23073, %r23072; 2026-02-21T09:21:00.4333745Z cvt.rn.bf16x2.f32 %r18160, %r23075, %r23074; 2026-02-21T09:21:00.4333822Z cvt.rn.bf16x2.f32 %r18161, %r23077, %r23076; 2026-02-21T09:21:00.4333902Z cvt.rn.bf16x2.f32 %r18162, %r23079, %r23078; 2026-02-21T09:21:00.4333984Z cvt.rn.bf16x2.f32 %r18163, %r23081, %r23080; 2026-02-21T09:21:00.4334062Z cvt.rn.bf16x2.f32 %r18164, %r23083, %r23082; 2026-02-21T09:21:00.4334137Z cvt.rn.bf16x2.f32 %r18165, %r23085, %r23084; 2026-02-21T09:21:00.4334214Z cvt.rn.bf16x2.f32 %r18166, %r23087, 
%r23086; 2026-02-21T09:21:00.4334292Z cvt.rn.bf16x2.f32 %r18167, %r23089, %r23088; 2026-02-21T09:21:00.4334368Z cvt.rn.bf16x2.f32 %r18168, %r23091, %r23090; 2026-02-21T09:21:00.4334444Z cvt.rn.bf16x2.f32 %r18169, %r23093, %r23092; 2026-02-21T09:21:00.4334589Z cvt.rn.bf16x2.f32 %r18170, %r23095, %r23094; 2026-02-21T09:21:00.4334663Z cvt.rn.bf16x2.f32 %r18171, %r23097, %r23096; 2026-02-21T09:21:00.4334740Z cvt.rn.bf16x2.f32 %r18172, %r23099, %r23098; 2026-02-21T09:21:00.4334820Z cvt.rn.bf16x2.f32 %r18173, %r23101, %r23100; 2026-02-21T09:21:00.4334945Z cvt.rn.bf16x2.f32 %r18174, %r23103, %r23102; 2026-02-21T09:21:00.4335022Z cvt.rn.bf16x2.f32 %r18175, %r23105, %r23104; 2026-02-21T09:21:00.4335098Z cvt.rn.bf16x2.f32 %r18176, %r23107, %r23106; 2026-02-21T09:21:00.4335176Z cvt.rn.bf16x2.f32 %r18177, %r23109, %r23108; 2026-02-21T09:21:00.4335254Z cvt.rn.bf16x2.f32 %r18178, %r23111, %r23110; 2026-02-21T09:21:00.4335335Z cvt.rn.bf16x2.f32 %r18179, %r23113, %r23112; 2026-02-21T09:21:00.4335477Z cvt.rn.bf16x2.f32 %r18180, %r23115, %r23114; 2026-02-21T09:21:00.4335558Z cvt.rn.bf16x2.f32 %r18181, %r23117, %r23116; 2026-02-21T09:21:00.4335635Z cvt.rn.bf16x2.f32 %r18182, %r23119, %r23118; 2026-02-21T09:21:00.4335714Z cvt.rn.bf16x2.f32 %r18183, %r23121, %r23120; 2026-02-21T09:21:00.4335794Z cvt.rn.bf16x2.f32 %r18184, %r23123, %r23122; 2026-02-21T09:21:00.4335870Z cvt.rn.bf16x2.f32 %r18185, %r23125, %r23124; 2026-02-21T09:21:00.4335947Z cvt.rn.bf16x2.f32 %r18186, %r23127, %r23126; 2026-02-21T09:21:00.4336029Z cvt.rn.bf16x2.f32 %r18187, %r23129, %r23128; 2026-02-21T09:21:00.4336104Z cvt.rn.bf16x2.f32 %r18188, %r23131, %r23130; 2026-02-21T09:21:00.4336182Z cvt.rn.bf16x2.f32 %r18189, %r23133, %r23132; 2026-02-21T09:21:00.4336260Z cvt.rn.bf16x2.f32 %r18190, %r23135, %r23134; 2026-02-21T09:21:00.4336384Z cvt.rn.bf16x2.f32 %r18191, %r23137, %r23136; 2026-02-21T09:21:00.4336606Z cvt.rn.bf16x2.f32 %r18192, %r23139, %r23138; 2026-02-21T09:21:00.4336707Z cvt.rn.bf16x2.f32 %r18193, %r23141, 
%r23140; 2026-02-21T09:21:00.4336788Z cvt.rn.bf16x2.f32 %r18194, %r23143, %r23142; 2026-02-21T09:21:00.4336867Z cvt.rn.bf16x2.f32 %r18195, %r23145, %r23144; 2026-02-21T09:21:00.4336949Z cvt.rn.bf16x2.f32 %r18196, %r23147, %r23146; 2026-02-21T09:21:00.4337031Z cvt.rn.bf16x2.f32 %r18197, %r23149, %r23148; 2026-02-21T09:21:00.4337106Z cvt.rn.bf16x2.f32 %r18198, %r23151, %r23150; 2026-02-21T09:21:00.4337180Z cvt.rn.bf16x2.f32 %r18199, %r23153, %r23152; 2026-02-21T09:21:00.4357908Z cvt.rn.bf16x2.f32 %r18200, %r23155, %r23154; 2026-02-21T09:21:00.4358018Z cvt.rn.bf16x2.f32 %r18201, %r23157, %r23156; 2026-02-21T09:21:00.4358099Z cvt.rn.bf16x2.f32 %r18202, %r23159, %r23158; 2026-02-21T09:21:00.4358176Z cvt.rn.bf16x2.f32 %r18203, %r23161, %r23160; 2026-02-21T09:21:00.4358251Z cvt.rn.bf16x2.f32 %r18204, %r23163, %r23162; 2026-02-21T09:21:00.4358326Z cvt.rn.bf16x2.f32 %r18205, %r23165, %r23164; 2026-02-21T09:21:00.4358400Z cvt.rn.bf16x2.f32 %r18206, %r23167, %r23166; 2026-02-21T09:21:00.4358479Z cvt.rn.bf16x2.f32 %r18207, %r23169, %r23168; 2026-02-21T09:21:00.4358552Z cvt.rn.bf16x2.f32 %r18208, %r23171, %r23170; 2026-02-21T09:21:00.4358624Z cvt.rn.bf16x2.f32 %r18209, %r23173, %r23172; 2026-02-21T09:21:00.4358702Z cvt.rn.bf16x2.f32 %r18210, %r23175, %r23174; 2026-02-21T09:21:00.4358778Z cvt.rn.bf16x2.f32 %r18211, %r23177, %r23176; 2026-02-21T09:21:00.4358864Z cvt.rn.bf16x2.f32 %r18212, %r23179, %r23178; 2026-02-21T09:21:00.4358942Z cvt.rn.bf16x2.f32 %r18213, %r23181, %r23180; 2026-02-21T09:21:00.4359020Z cvt.rn.bf16x2.f32 %r18214, %r23183, %r23182; 2026-02-21T09:21:00.4359093Z cvt.rn.bf16x2.f32 %r18215, %r23185, %r23184; 2026-02-21T09:21:00.4359168Z cvt.rn.bf16x2.f32 %r18216, %r23187, %r23186; 2026-02-21T09:21:00.4359243Z cvt.rn.bf16x2.f32 %r18217, %r23189, %r23188; 2026-02-21T09:21:00.4359317Z cvt.rn.bf16x2.f32 %r18218, %r23191, %r23190; 2026-02-21T09:21:00.4359390Z cvt.rn.bf16x2.f32 %r18219, %r23193, %r23192; 2026-02-21T09:21:00.4359465Z cvt.rn.bf16x2.f32 %r18220, %r23195, 
%r23194; 2026-02-21T09:21:00.4359540Z cvt.rn.bf16x2.f32 %r18221, %r23197, %r23196; 2026-02-21T09:21:00.4359614Z cvt.rn.bf16x2.f32 %r18222, %r23199, %r23198; 2026-02-21T09:21:00.4359689Z cvt.rn.bf16x2.f32 %r18223, %r23201, %r23200; 2026-02-21T09:21:00.4359870Z cvt.rn.bf16x2.f32 %r18224, %r23203, %r23202; 2026-02-21T09:21:00.4359945Z cvt.rn.bf16x2.f32 %r18225, %r23205, %r23204; 2026-02-21T09:21:00.4360019Z cvt.rn.bf16x2.f32 %r18226, %r23207, %r23206; 2026-02-21T09:21:00.4360099Z cvt.rn.bf16x2.f32 %r18227, %r23209, %r23208; 2026-02-21T09:21:00.4360237Z cvt.rn.bf16x2.f32 %r18228, %r23211, %r23210; 2026-02-21T09:21:00.4360311Z cvt.rn.bf16x2.f32 %r18229, %r23213, %r23212; 2026-02-21T09:21:00.4360386Z cvt.rn.bf16x2.f32 %r18230, %r23215, %r23214; 2026-02-21T09:21:00.4360461Z cvt.rn.bf16x2.f32 %r18231, %r23217, %r23216; 2026-02-21T09:21:00.4360534Z cvt.rn.bf16x2.f32 %r18232, %r23219, %r23218; 2026-02-21T09:21:00.4360609Z cvt.rn.bf16x2.f32 %r18233, %r23221, %r23220; 2026-02-21T09:21:00.4360744Z cvt.rn.bf16x2.f32 %r18234, %r23223, %r23222; 2026-02-21T09:21:00.4360821Z cvt.rn.bf16x2.f32 %r18235, %r23225, %r23224; 2026-02-21T09:21:00.4360894Z cvt.rn.bf16x2.f32 %r18236, %r23227, %r23226; 2026-02-21T09:21:00.4360969Z cvt.rn.bf16x2.f32 %r18237, %r23229, %r23228; 2026-02-21T09:21:00.4361044Z cvt.rn.bf16x2.f32 %r18238, %r23231, %r23230; 2026-02-21T09:21:00.4361116Z cvt.rn.bf16x2.f32 %r18239, %r23233, %r23232; 2026-02-21T09:21:00.4361192Z cvt.rn.bf16x2.f32 %r18240, %r23235, %r23234; 2026-02-21T09:21:00.4361266Z cvt.rn.bf16x2.f32 %r18241, %r23237, %r23236; 2026-02-21T09:21:00.4361340Z cvt.rn.bf16x2.f32 %r18242, %r23239, %r23238; 2026-02-21T09:21:00.4361415Z cvt.rn.bf16x2.f32 %r18243, %r23241, %r23240; 2026-02-21T09:21:00.4361488Z cvt.rn.bf16x2.f32 %r18244, %r23243, %r23242; 2026-02-21T09:21:00.4361619Z cvt.rn.bf16x2.f32 %r18245, %r23245, %r23244; 2026-02-21T09:21:00.4361695Z cvt.rn.bf16x2.f32 %r18246, %r23247, %r23246; 2026-02-21T09:21:00.4361773Z cvt.rn.bf16x2.f32 %r18247, %r23249, 
%r23248; 2026-02-21T09:21:00.4361849Z cvt.rn.bf16x2.f32 %r18248, %r23251, %r23250; 2026-02-21T09:21:00.4361921Z cvt.rn.bf16x2.f32 %r18249, %r23253, %r23252; 2026-02-21T09:21:00.4362004Z cvt.rn.bf16x2.f32 %r18250, %r23255, %r23254; 2026-02-21T09:21:00.4362080Z cvt.rn.bf16x2.f32 %r18251, %r23257, %r23256; 2026-02-21T09:21:00.4362153Z cvt.rn.bf16x2.f32 %r18252, %r23259, %r23258; 2026-02-21T09:21:00.4362225Z cvt.rn.bf16x2.f32 %r18253, %r23261, %r23260; 2026-02-21T09:21:00.4362301Z cvt.rn.bf16x2.f32 %r18254, %r23263, %r23262; 2026-02-21T09:21:00.4362376Z cvt.rn.bf16x2.f32 %r18255, %r23265, %r23264; 2026-02-21T09:21:00.4362450Z cvt.rn.bf16x2.f32 %r18256, %r23267, %r23266; 2026-02-21T09:21:00.4362538Z cvt.rn.bf16x2.f32 %r18257, %r23269, %r23268; 2026-02-21T09:21:00.4362615Z cvt.rn.bf16x2.f32 %r18258, %r23271, %r23270; 2026-02-21T09:21:00.4362689Z cvt.rn.bf16x2.f32 %r18259, %r23273, %r23272; 2026-02-21T09:21:00.4362764Z cvt.rn.bf16x2.f32 %r18260, %r23275, %r23274; 2026-02-21T09:21:00.4362838Z cvt.rn.bf16x2.f32 %r18261, %r23277, %r23276; 2026-02-21T09:21:00.4362912Z cvt.rn.bf16x2.f32 %r18262, %r23279, %r23278; 2026-02-21T09:21:00.4362987Z cvt.rn.bf16x2.f32 %r18263, %r23281, %r23280; 2026-02-21T09:21:00.4363065Z cvt.rn.bf16x2.f32 %r18264, %r23283, %r23282; 2026-02-21T09:21:00.4363138Z cvt.rn.bf16x2.f32 %r18265, %r23285, %r23284; 2026-02-21T09:21:00.4363212Z cvt.rn.bf16x2.f32 %r18266, %r23287, %r23286; 2026-02-21T09:21:00.4363288Z cvt.rn.bf16x2.f32 %r18267, %r23289, %r23288; 2026-02-21T09:21:00.4363363Z cvt.rn.bf16x2.f32 %r18268, %r23291, %r23290; 2026-02-21T09:21:00.4363436Z cvt.rn.bf16x2.f32 %r18269, %r23293, %r23292; 2026-02-21T09:21:00.4363512Z cvt.rn.bf16x2.f32 %r18270, %r23295, %r23294; 2026-02-21T09:21:00.4363585Z cvt.rn.bf16x2.f32 %r18271, %r23297, %r23296; 2026-02-21T09:21:00.4363809Z .loc 1 91 43 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:91:43 2026-02-21T09:21:00.4363878Z shl.b32 %r18272, %r18112, 13; 2026-02-21T09:21:00.4363942Z shl.b32 %r18273, 
%r18113, 13; 2026-02-21T09:21:00.4363999Z shl.b32 %r18274, %r18114, 13; 2026-02-21T09:21:00.4364057Z shl.b32 %r18275, %r18115, 13; 2026-02-21T09:21:00.4364121Z shl.b32 %r18276, %r18116, 13; 2026-02-21T09:21:00.4364254Z shl.b32 %r18277, %r18117, 13; 2026-02-21T09:21:00.4364314Z shl.b32 %r18278, %r18118, 13; 2026-02-21T09:21:00.4364374Z shl.b32 %r18279, %r18119, 13; 2026-02-21T09:21:00.4364433Z shl.b32 %r18280, %r18120, 13; 2026-02-21T09:21:00.4364490Z shl.b32 %r18281, %r18121, 13; 2026-02-21T09:21:00.4364605Z shl.b32 %r18282, %r18122, 13; 2026-02-21T09:21:00.4364667Z shl.b32 %r18283, %r18123, 13; 2026-02-21T09:21:00.4364726Z shl.b32 %r18284, %r18124, 13; 2026-02-21T09:21:00.4364783Z shl.b32 %r18285, %r18125, 13; 2026-02-21T09:21:00.4364843Z shl.b32 %r18286, %r18126, 13; 2026-02-21T09:21:00.4364901Z shl.b32 %r18287, %r18127, 13; 2026-02-21T09:21:00.4364959Z shl.b32 %r18288, %r18128, 13; 2026-02-21T09:21:00.4365016Z shl.b32 %r18289, %r18129, 13; 2026-02-21T09:21:00.4365144Z shl.b32 %r18290, %r18130, 13; 2026-02-21T09:21:00.4365203Z shl.b32 %r18291, %r18131, 13; 2026-02-21T09:21:00.4365261Z shl.b32 %r18292, %r18132, 13; 2026-02-21T09:21:00.4365321Z shl.b32 %r18293, %r18133, 13; 2026-02-21T09:21:00.4365380Z shl.b32 %r18294, %r18134, 13; 2026-02-21T09:21:00.4365437Z shl.b32 %r18295, %r18135, 13; 2026-02-21T09:21:00.4365495Z shl.b32 %r18296, %r18136, 13; 2026-02-21T09:21:00.4365556Z shl.b32 %r18297, %r18137, 13; 2026-02-21T09:21:00.4365614Z shl.b32 %r18298, %r18138, 13; 2026-02-21T09:21:00.4365671Z shl.b32 %r18299, %r18139, 13; 2026-02-21T09:21:00.4365732Z shl.b32 %r18300, %r18140, 13; 2026-02-21T09:21:00.4365789Z shl.b32 %r18301, %r18141, 13; 2026-02-21T09:21:00.4365845Z shl.b32 %r18302, %r18142, 13; 2026-02-21T09:21:00.4365950Z shl.b32 %r18303, %r18143, 13; 2026-02-21T09:21:00.4366171Z .loc 1 91 50 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:91:50 2026-02-21T09:21:00.4366239Z add.s32 %r18304, %r18272, %r1812; 2026-02-21T09:21:00.4366306Z add.s32 %r18305, 
%r18273, %r1812; 2026-02-21T09:21:00.4366366Z add.s32 %r18306, %r18274, %r1812; 2026-02-21T09:21:00.4366427Z add.s32 %r18307, %r18275, %r1812; 2026-02-21T09:21:00.4366622Z add.s32 %r18308, %r18276, %r1812; 2026-02-21T09:21:00.4366691Z add.s32 %r18309, %r18277, %r1812; 2026-02-21T09:21:00.4366750Z add.s32 %r18310, %r18278, %r1812; 2026-02-21T09:21:00.4366810Z add.s32 %r18311, %r18279, %r1812; 2026-02-21T09:21:00.4366873Z add.s32 %r18312, %r18280, %r1812; 2026-02-21T09:21:00.4366935Z add.s32 %r18313, %r18281, %r1812; 2026-02-21T09:21:00.4366995Z add.s32 %r18314, %r18282, %r1812; 2026-02-21T09:21:00.4367057Z add.s32 %r18315, %r18283, %r1812; 2026-02-21T09:21:00.4367123Z add.s32 %r18316, %r18284, %r1812; 2026-02-21T09:21:00.4367184Z add.s32 %r18317, %r18285, %r1812; 2026-02-21T09:21:00.4367245Z add.s32 %r18318, %r18286, %r1812; 2026-02-21T09:21:00.4367307Z add.s32 %r18319, %r18287, %r1812; 2026-02-21T09:21:00.4367365Z add.s32 %r18320, %r18288, %r1812; 2026-02-21T09:21:00.4367427Z add.s32 %r18321, %r18289, %r1812; 2026-02-21T09:21:00.4367498Z add.s32 %r18322, %r18290, %r1812; 2026-02-21T09:21:00.4367565Z add.s32 %r18323, %r18291, %r1812; 2026-02-21T09:21:00.4367628Z add.s32 %r18324, %r18292, %r1812; 2026-02-21T09:21:00.4367688Z add.s32 %r18325, %r18293, %r1812; 2026-02-21T09:21:00.4367752Z add.s32 %r18326, %r18294, %r1812; 2026-02-21T09:21:00.4367810Z add.s32 %r18327, %r18295, %r1812; 2026-02-21T09:21:00.4367869Z add.s32 %r18328, %r18296, %r1812; 2026-02-21T09:21:00.4367931Z add.s32 %r18329, %r18297, %r1812; 2026-02-21T09:21:00.4367990Z add.s32 %r18330, %r18298, %r1812; 2026-02-21T09:21:00.4368050Z add.s32 %r18331, %r18299, %r1812; 2026-02-21T09:21:00.4368110Z add.s32 %r18332, %r18300, %r1812; 2026-02-21T09:21:00.4368178Z add.s32 %r18333, %r18301, %r1812; 2026-02-21T09:21:00.4368238Z add.s32 %r18334, %r18302, %r1812; 2026-02-21T09:21:00.4368299Z add.s32 %r18335, %r18303, %r1812; 2026-02-21T09:21:00.4368525Z .loc 1 91 22 // 
cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:91:22 2026-02-21T09:21:00.4368606Z mad.wide.s32 %rd545, %r18304, 2, %rd46; 2026-02-21T09:21:00.4368677Z mad.wide.s32 %rd546, %r18305, 2, %rd46; 2026-02-21T09:21:00.4368832Z mad.wide.s32 %rd547, %r18306, 2, %rd46; 2026-02-21T09:21:00.4368900Z mad.wide.s32 %rd548, %r18307, 2, %rd46; 2026-02-21T09:21:00.4368968Z mad.wide.s32 %rd549, %r18308, 2, %rd46; 2026-02-21T09:21:00.4369033Z mad.wide.s32 %rd550, %r18309, 2, %rd46; 2026-02-21T09:21:00.4369176Z mad.wide.s32 %rd551, %r18310, 2, %rd46; 2026-02-21T09:21:00.4369242Z mad.wide.s32 %rd552, %r18311, 2, %rd46; 2026-02-21T09:21:00.4369309Z mad.wide.s32 %rd553, %r18312, 2, %rd46; 2026-02-21T09:21:00.4369380Z mad.wide.s32 %rd554, %r18313, 2, %rd46; 2026-02-21T09:21:00.4369448Z mad.wide.s32 %rd555, %r18314, 2, %rd46; 2026-02-21T09:21:00.4369514Z mad.wide.s32 %rd556, %r18315, 2, %rd46; 2026-02-21T09:21:00.4369579Z mad.wide.s32 %rd557, %r18316, 2, %rd46; 2026-02-21T09:21:00.4369712Z mad.wide.s32 %rd558, %r18317, 2, %rd46; 2026-02-21T09:21:00.4369781Z mad.wide.s32 %rd559, %r18318, 2, %rd46; 2026-02-21T09:21:00.4369848Z mad.wide.s32 %rd560, %r18319, 2, %rd46; 2026-02-21T09:21:00.4369921Z mad.wide.s32 %rd561, %r18320, 2, %rd46; 2026-02-21T09:21:00.4369988Z mad.wide.s32 %rd562, %r18321, 2, %rd46; 2026-02-21T09:21:00.4370055Z mad.wide.s32 %rd563, %r18322, 2, %rd46; 2026-02-21T09:21:00.4370127Z mad.wide.s32 %rd564, %r18323, 2, %rd46; 2026-02-21T09:21:00.4370199Z mad.wide.s32 %rd565, %r18324, 2, %rd46; 2026-02-21T09:21:00.4370266Z mad.wide.s32 %rd566, %r18325, 2, %rd46; 2026-02-21T09:21:00.4370332Z mad.wide.s32 %rd567, %r18326, 2, %rd46; 2026-02-21T09:21:00.4370405Z mad.wide.s32 %rd568, %r18327, 2, %rd46; 2026-02-21T09:21:00.4370531Z mad.wide.s32 %rd569, %r18328, 2, %rd46; 2026-02-21T09:21:00.4370599Z mad.wide.s32 %rd570, %r18329, 2, %rd46; 2026-02-21T09:21:00.4370670Z mad.wide.s32 %rd571, %r18330, 2, %rd46; 2026-02-21T09:21:00.4370736Z mad.wide.s32 %rd572, %r18331, 2, %rd46; 
2026-02-21T09:21:00.4370802Z mad.wide.s32 %rd573, %r18332, 2, %rd46; 2026-02-21T09:21:00.4370875Z mad.wide.s32 %rd574, %r18333, 2, %rd46; 2026-02-21T09:21:00.4370947Z mad.wide.s32 %rd575, %r18334, 2, %rd46; 2026-02-21T09:21:00.4371016Z mad.wide.s32 %rd576, %r18335, 2, %rd46; 2026-02-21T09:21:00.4371216Z .loc 1 91 81 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:91:81 2026-02-21T09:21:00.4371344Z st.shared.v4.b32 [%r117], {%r18144, %r18146, %r18148, %r18150}; 2026-02-21T09:21:00.4371457Z st.shared.v4.b32 [%r118], {%r18152, %r18154, %r18156, %r18158}; 2026-02-21T09:21:00.4371564Z st.shared.v4.b32 [%r119], {%r18160, %r18162, %r18164, %r18166}; 2026-02-21T09:21:00.4371677Z st.shared.v4.b32 [%r120], {%r18168, %r18170, %r18172, %r18174}; 2026-02-21T09:21:00.4371783Z st.shared.v4.b32 [%r121], {%r18176, %r18178, %r18180, %r18182}; 2026-02-21T09:21:00.4371889Z st.shared.v4.b32 [%r122], {%r18184, %r18186, %r18188, %r18190}; 2026-02-21T09:21:00.4371999Z st.shared.v4.b32 [%r123], {%r18192, %r18194, %r18196, %r18198}; 2026-02-21T09:21:00.4372105Z st.shared.v4.b32 [%r124], {%r18200, %r18202, %r18204, %r18206}; 2026-02-21T09:21:00.4372166Z bar.sync 0; 2026-02-21T09:21:00.4372247Z // begin inline asm 2026-02-21T09:21:00.4372456Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17824, %r17825, %r17826, %r17827}, [%r6269]; 2026-02-21T09:21:00.4372518Z // end inline asm 2026-02-21T09:21:00.4372580Z // begin inline asm 2026-02-21T09:21:00.4372781Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17829, %r17830, %r17831, %r17832}, [%r6274]; 2026-02-21T09:21:00.4372838Z // end inline asm 2026-02-21T09:21:00.4372896Z // begin inline asm 2026-02-21T09:21:00.4373089Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17834, %r17835, %r17836, %r17837}, [%r6279]; 2026-02-21T09:21:00.4373145Z // end inline asm 2026-02-21T09:21:00.4373213Z // begin inline asm 2026-02-21T09:21:00.4373404Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17839, %r17840, %r17841, %r17842}, [%r6284]; 
2026-02-21T09:21:00.4373463Z // end inline asm 2026-02-21T09:21:00.4373520Z // begin inline asm 2026-02-21T09:21:00.4373706Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17844, %r17845, %r17846, %r17847}, [%r6289]; 2026-02-21T09:21:00.4373826Z // end inline asm 2026-02-21T09:21:00.4373883Z // begin inline asm 2026-02-21T09:21:00.4374071Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17849, %r17850, %r17851, %r17852}, [%r6294]; 2026-02-21T09:21:00.4374176Z // end inline asm 2026-02-21T09:21:00.4374235Z // begin inline asm 2026-02-21T09:21:00.4374418Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17854, %r17855, %r17856, %r17857}, [%r6299]; 2026-02-21T09:21:00.4374474Z // end inline asm 2026-02-21T09:21:00.4374535Z // begin inline asm 2026-02-21T09:21:00.4374719Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17859, %r17860, %r17861, %r17862}, [%r6304]; 2026-02-21T09:21:00.4374774Z // end inline asm 2026-02-21T09:21:00.4374832Z bar.sync 0; 2026-02-21T09:21:00.4374991Z st.shared.v4.b32 [%r117], {%r18145, %r18147, %r18149, %r18151}; 2026-02-21T09:21:00.4375103Z st.shared.v4.b32 [%r118], {%r18153, %r18155, %r18157, %r18159}; 2026-02-21T09:21:00.4375216Z st.shared.v4.b32 [%r119], {%r18161, %r18163, %r18165, %r18167}; 2026-02-21T09:21:00.4375326Z st.shared.v4.b32 [%r120], {%r18169, %r18171, %r18173, %r18175}; 2026-02-21T09:21:00.4375433Z st.shared.v4.b32 [%r121], {%r18177, %r18179, %r18181, %r18183}; 2026-02-21T09:21:00.4375539Z st.shared.v4.b32 [%r122], {%r18185, %r18187, %r18189, %r18191}; 2026-02-21T09:21:00.4375650Z st.shared.v4.b32 [%r123], {%r18193, %r18195, %r18197, %r18199}; 2026-02-21T09:21:00.4375753Z st.shared.v4.b32 [%r124], {%r18201, %r18203, %r18205, %r18207}; 2026-02-21T09:21:00.4375807Z bar.sync 0; 2026-02-21T09:21:00.4375869Z // begin inline asm 2026-02-21T09:21:00.4376104Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17864, %r17865, %r17866, %r17867}, [%r6269]; 2026-02-21T09:21:00.4376166Z // end inline asm 2026-02-21T09:21:00.4376228Z // begin inline asm 
2026-02-21T09:21:00.4376416Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17869, %r17870, %r17871, %r17872}, [%r6274]; 2026-02-21T09:21:00.4376602Z // end inline asm 2026-02-21T09:21:00.4376667Z // begin inline asm 2026-02-21T09:21:00.4376871Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17874, %r17875, %r17876, %r17877}, [%r6279]; 2026-02-21T09:21:00.4376930Z // end inline asm 2026-02-21T09:21:00.4376987Z // begin inline asm 2026-02-21T09:21:00.4377182Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17879, %r17880, %r17881, %r17882}, [%r6284]; 2026-02-21T09:21:00.4377239Z // end inline asm 2026-02-21T09:21:00.4377297Z // begin inline asm 2026-02-21T09:21:00.4377484Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17884, %r17885, %r17886, %r17887}, [%r6289]; 2026-02-21T09:21:00.4377544Z // end inline asm 2026-02-21T09:21:00.4377600Z // begin inline asm 2026-02-21T09:21:00.4377784Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17889, %r17890, %r17891, %r17892}, [%r6294]; 2026-02-21T09:21:00.4377846Z // end inline asm 2026-02-21T09:21:00.4377903Z // begin inline asm 2026-02-21T09:21:00.4378087Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17894, %r17895, %r17896, %r17897}, [%r6299]; 2026-02-21T09:21:00.4378148Z // end inline asm 2026-02-21T09:21:00.4378205Z // begin inline asm 2026-02-21T09:21:00.4378387Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17899, %r17900, %r17901, %r17902}, [%r6304]; 2026-02-21T09:21:00.4378443Z // end inline asm 2026-02-21T09:21:00.4378503Z bar.sync 0; 2026-02-21T09:21:00.4378620Z st.shared.v4.b32 [%r117], {%r18208, %r18210, %r18212, %r18214}; 2026-02-21T09:21:00.4378728Z st.shared.v4.b32 [%r118], {%r18216, %r18218, %r18220, %r18222}; 2026-02-21T09:21:00.4378839Z st.shared.v4.b32 [%r119], {%r18224, %r18226, %r18228, %r18230}; 2026-02-21T09:21:00.4378949Z st.shared.v4.b32 [%r120], {%r18232, %r18234, %r18236, %r18238}; 2026-02-21T09:21:00.4379056Z st.shared.v4.b32 [%r121], {%r18240, %r18242, %r18244, %r18246}; 2026-02-21T09:21:00.4379166Z 
st.shared.v4.b32 [%r122], {%r18248, %r18250, %r18252, %r18254}; 2026-02-21T09:21:00.4379277Z st.shared.v4.b32 [%r123], {%r18256, %r18258, %r18260, %r18262}; 2026-02-21T09:21:00.4379382Z st.shared.v4.b32 [%r124], {%r18264, %r18266, %r18268, %r18270}; 2026-02-21T09:21:00.4379529Z bar.sync 0; 2026-02-21T09:21:00.4379593Z // begin inline asm 2026-02-21T09:21:00.4379791Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17904, %r17905, %r17906, %r17907}, [%r6269]; 2026-02-21T09:21:00.4379847Z // end inline asm 2026-02-21T09:21:00.4379971Z // begin inline asm 2026-02-21T09:21:00.4380165Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17909, %r17910, %r17911, %r17912}, [%r6274]; 2026-02-21T09:21:00.4380220Z // end inline asm 2026-02-21T09:21:00.4380280Z // begin inline asm 2026-02-21T09:21:00.4380469Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17914, %r17915, %r17916, %r17917}, [%r6279]; 2026-02-21T09:21:00.4380528Z // end inline asm 2026-02-21T09:21:00.4380587Z // begin inline asm 2026-02-21T09:21:00.4380833Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17919, %r17920, %r17921, %r17922}, [%r6284]; 2026-02-21T09:21:00.4380892Z // end inline asm 2026-02-21T09:21:00.4380950Z // begin inline asm 2026-02-21T09:21:00.4381134Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17924, %r17925, %r17926, %r17927}, [%r6289]; 2026-02-21T09:21:00.4381191Z // end inline asm 2026-02-21T09:21:00.4381250Z // begin inline asm 2026-02-21T09:21:00.4381436Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17929, %r17930, %r17931, %r17932}, [%r6294]; 2026-02-21T09:21:00.4381496Z // end inline asm 2026-02-21T09:21:00.4381553Z // begin inline asm 2026-02-21T09:21:00.4381735Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17934, %r17935, %r17936, %r17937}, [%r6299]; 2026-02-21T09:21:00.4381792Z // end inline asm 2026-02-21T09:21:00.4381914Z // begin inline asm 2026-02-21T09:21:00.4382101Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17939, %r17940, %r17941, %r17942}, [%r6304]; 2026-02-21T09:21:00.4382159Z // end inline 
asm 2026-02-21T09:21:00.4382225Z bar.sync 0; 2026-02-21T09:21:00.4382338Z st.shared.v4.b32 [%r117], {%r18209, %r18211, %r18213, %r18215}; 2026-02-21T09:21:00.4382446Z st.shared.v4.b32 [%r118], {%r18217, %r18219, %r18221, %r18223}; 2026-02-21T09:21:00.4382562Z st.shared.v4.b32 [%r119], {%r18225, %r18227, %r18229, %r18231}; 2026-02-21T09:21:00.4382668Z st.shared.v4.b32 [%r120], {%r18233, %r18235, %r18237, %r18239}; 2026-02-21T09:21:00.4382774Z st.shared.v4.b32 [%r121], {%r18241, %r18243, %r18245, %r18247}; 2026-02-21T09:21:00.4382898Z st.shared.v4.b32 [%r122], {%r18249, %r18251, %r18253, %r18255}; 2026-02-21T09:21:00.4383006Z st.shared.v4.b32 [%r123], {%r18257, %r18259, %r18261, %r18263}; 2026-02-21T09:21:00.4383114Z st.shared.v4.b32 [%r124], {%r18265, %r18267, %r18269, %r18271}; 2026-02-21T09:21:00.4383171Z bar.sync 0; 2026-02-21T09:21:00.4383231Z // begin inline asm 2026-02-21T09:21:00.4383418Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17944, %r17945, %r17946, %r17947}, [%r6269]; 2026-02-21T09:21:00.4383477Z // end inline asm 2026-02-21T09:21:00.4383535Z // begin inline asm 2026-02-21T09:21:00.4383719Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17949, %r17950, %r17951, %r17952}, [%r6274]; 2026-02-21T09:21:00.4383774Z // end inline asm 2026-02-21T09:21:00.4383835Z // begin inline asm 2026-02-21T09:21:00.4384018Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17954, %r17955, %r17956, %r17957}, [%r6279]; 2026-02-21T09:21:00.4384074Z // end inline asm 2026-02-21T09:21:00.4384134Z // begin inline asm 2026-02-21T09:21:00.4384319Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17959, %r17960, %r17961, %r17962}, [%r6284]; 2026-02-21T09:21:00.4384375Z // end inline asm 2026-02-21T09:21:00.4384431Z // begin inline asm 2026-02-21T09:21:00.4384622Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17964, %r17965, %r17966, %r17967}, [%r6289]; 2026-02-21T09:21:00.4384677Z // end inline asm 2026-02-21T09:21:00.4384733Z // begin inline asm 2026-02-21T09:21:00.4384924Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17969, %r17970, %r17971, %r17972}, [%r6294]; 2026-02-21T09:21:00.4384978Z // end inline asm 2026-02-21T09:21:00.4385041Z // begin inline asm 2026-02-21T09:21:00.4385227Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17974, %r17975, %r17976, %r17977}, [%r6299]; 2026-02-21T09:21:00.4385343Z // end inline asm 2026-02-21T09:21:00.4385398Z // begin inline asm 2026-02-21T09:21:00.4385587Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17979, %r17980, %r17981, %r17982}, [%r6304]; 2026-02-21T09:21:00.4385646Z // end inline asm 2026-02-21T09:21:00.4385772Z // begin inline asm 2026-02-21T09:21:00.4385903Z st.global.v4.b32 [ %rd545 + 0 ], { %r17824, %r17825, %r17826, %r17827 }; 2026-02-21T09:21:00.4385963Z // end inline asm 2026-02-21T09:21:00.4386018Z // begin inline asm 2026-02-21T09:21:00.4386142Z st.global.v4.b32 [ %rd546 + 0 ], { %r17864, %r17865, %r17866, %r17867 }; 2026-02-21T09:21:00.4386200Z // end inline asm 2026-02-21T09:21:00.4386258Z // begin inline asm 2026-02-21T09:21:00.4386424Z st.global.v4.b32 [ %rd547 + 0 ], { %r17829, %r17830, %r17831, %r17832 }; 2026-02-21T09:21:00.4386605Z // end inline asm 2026-02-21T09:21:00.4386675Z // begin inline asm 2026-02-21T09:21:00.4386797Z st.global.v4.b32 [ %rd548 + 0 ], { %r17869, %r17870, %r17871, %r17872 }; 2026-02-21T09:21:00.4386855Z // end inline asm 2026-02-21T09:21:00.4386915Z // begin inline asm 2026-02-21T09:21:00.4387032Z st.global.v4.b32 [ %rd549 + 0 ], { %r17834, %r17835, %r17836, %r17837 }; 2026-02-21T09:21:00.4387089Z // end inline asm 2026-02-21T09:21:00.4387148Z // begin inline asm 2026-02-21T09:21:00.4387267Z st.global.v4.b32 [ %rd550 + 0 ], { %r17874, %r17875, %r17876, %r17877 }; 2026-02-21T09:21:00.4387325Z // end inline asm 2026-02-21T09:21:00.4387382Z // begin inline asm 2026-02-21T09:21:00.4387594Z st.global.v4.b32 [ %rd551 + 0 ], { %r17839, %r17840, %r17841, %r17842 }; 2026-02-21T09:21:00.4387654Z // end inline asm 2026-02-21T09:21:00.4387711Z // begin inline asm 
2026-02-21T09:21:00.4387836Z st.global.v4.b32 [ %rd552 + 0 ], { %r17879, %r17880, %r17881, %r17882 }; 2026-02-21T09:21:00.4387893Z // end inline asm 2026-02-21T09:21:00.4387951Z // begin inline asm 2026-02-21T09:21:00.4388066Z st.global.v4.b32 [ %rd553 + 0 ], { %r17844, %r17845, %r17846, %r17847 }; 2026-02-21T09:21:00.4388127Z // end inline asm 2026-02-21T09:21:00.4388183Z // begin inline asm 2026-02-21T09:21:00.4388401Z st.global.v4.b32 [ %rd554 + 0 ], { %r17884, %r17885, %r17886, %r17887 }; 2026-02-21T09:21:00.4388469Z // end inline asm 2026-02-21T09:21:00.4388529Z // begin inline asm 2026-02-21T09:21:00.4388648Z st.global.v4.b32 [ %rd555 + 0 ], { %r17849, %r17850, %r17851, %r17852 }; 2026-02-21T09:21:00.4388704Z // end inline asm 2026-02-21T09:21:00.4388766Z // begin inline asm 2026-02-21T09:21:00.4388882Z st.global.v4.b32 [ %rd556 + 0 ], { %r17889, %r17890, %r17891, %r17892 }; 2026-02-21T09:21:00.4388938Z // end inline asm 2026-02-21T09:21:00.4388997Z // begin inline asm 2026-02-21T09:21:00.4389112Z st.global.v4.b32 [ %rd557 + 0 ], { %r17854, %r17855, %r17856, %r17857 }; 2026-02-21T09:21:00.4389168Z // end inline asm 2026-02-21T09:21:00.4389225Z // begin inline asm 2026-02-21T09:21:00.4389341Z st.global.v4.b32 [ %rd558 + 0 ], { %r17894, %r17895, %r17896, %r17897 }; 2026-02-21T09:21:00.4389400Z // end inline asm 2026-02-21T09:21:00.4389457Z // begin inline asm 2026-02-21T09:21:00.4389575Z st.global.v4.b32 [ %rd559 + 0 ], { %r17859, %r17860, %r17861, %r17862 }; 2026-02-21T09:21:00.4389630Z // end inline asm 2026-02-21T09:21:00.4389689Z // begin inline asm 2026-02-21T09:21:00.4389814Z st.global.v4.b32 [ %rd560 + 0 ], { %r17899, %r17900, %r17901, %r17902 }; 2026-02-21T09:21:00.4389871Z // end inline asm 2026-02-21T09:21:00.4389927Z // begin inline asm 2026-02-21T09:21:00.4390044Z st.global.v4.b32 [ %rd561 + 0 ], { %r17904, %r17905, %r17906, %r17907 }; 2026-02-21T09:21:00.4390104Z // end inline asm 2026-02-21T09:21:00.4390161Z // begin inline asm 
2026-02-21T09:21:00.4390276Z st.global.v4.b32 [ %rd562 + 0 ], { %r17944, %r17945, %r17946, %r17947 }; 2026-02-21T09:21:00.4390344Z // end inline asm 2026-02-21T09:21:00.4390404Z // begin inline asm 2026-02-21T09:21:00.4390521Z st.global.v4.b32 [ %rd563 + 0 ], { %r17909, %r17910, %r17911, %r17912 }; 2026-02-21T09:21:00.4390580Z // end inline asm 2026-02-21T09:21:00.4390718Z // begin inline asm 2026-02-21T09:21:00.4390835Z st.global.v4.b32 [ %rd564 + 0 ], { %r17949, %r17950, %r17951, %r17952 }; 2026-02-21T09:21:00.4390890Z // end inline asm 2026-02-21T09:21:00.4390952Z // begin inline asm 2026-02-21T09:21:00.4391129Z st.global.v4.b32 [ %rd565 + 0 ], { %r17914, %r17915, %r17916, %r17917 }; 2026-02-21T09:21:00.4391187Z // end inline asm 2026-02-21T09:21:00.4391249Z // begin inline asm 2026-02-21T09:21:00.4391371Z st.global.v4.b32 [ %rd566 + 0 ], { %r17954, %r17955, %r17956, %r17957 }; 2026-02-21T09:21:00.4391428Z // end inline asm 2026-02-21T09:21:00.4391488Z // begin inline asm 2026-02-21T09:21:00.4391609Z st.global.v4.b32 [ %rd567 + 0 ], { %r17919, %r17920, %r17921, %r17922 }; 2026-02-21T09:21:00.4391664Z // end inline asm 2026-02-21T09:21:00.4391788Z // begin inline asm 2026-02-21T09:21:00.4391912Z st.global.v4.b32 [ %rd568 + 0 ], { %r17959, %r17960, %r17961, %r17962 }; 2026-02-21T09:21:00.4391967Z // end inline asm 2026-02-21T09:21:00.4392026Z // begin inline asm 2026-02-21T09:21:00.4392144Z st.global.v4.b32 [ %rd569 + 0 ], { %r17924, %r17925, %r17926, %r17927 }; 2026-02-21T09:21:00.4392200Z // end inline asm 2026-02-21T09:21:00.4392257Z // begin inline asm 2026-02-21T09:21:00.4392374Z st.global.v4.b32 [ %rd570 + 0 ], { %r17964, %r17965, %r17966, %r17967 }; 2026-02-21T09:21:00.4392435Z // end inline asm 2026-02-21T09:21:00.4392503Z // begin inline asm 2026-02-21T09:21:00.4392627Z st.global.v4.b32 [ %rd571 + 0 ], { %r17929, %r17930, %r17931, %r17932 }; 2026-02-21T09:21:00.4392684Z // end inline asm 2026-02-21T09:21:00.4392791Z // begin inline asm 
2026-02-21T09:21:00.4392910Z st.global.v4.b32 [ %rd572 + 0 ], { %r17969, %r17970, %r17971, %r17972 }; 2026-02-21T09:21:00.4392964Z // end inline asm 2026-02-21T09:21:00.4393025Z // begin inline asm 2026-02-21T09:21:00.4393146Z st.global.v4.b32 [ %rd573 + 0 ], { %r17934, %r17935, %r17936, %r17937 }; 2026-02-21T09:21:00.4393205Z // end inline asm 2026-02-21T09:21:00.4393264Z // begin inline asm 2026-02-21T09:21:00.4393380Z st.global.v4.b32 [ %rd574 + 0 ], { %r17974, %r17975, %r17976, %r17977 }; 2026-02-21T09:21:00.4393439Z // end inline asm 2026-02-21T09:21:00.4393496Z // begin inline asm 2026-02-21T09:21:00.4393623Z st.global.v4.b32 [ %rd575 + 0 ], { %r17939, %r17940, %r17941, %r17942 }; 2026-02-21T09:21:00.4393685Z // end inline asm 2026-02-21T09:21:00.4393743Z // begin inline asm 2026-02-21T09:21:00.4393860Z st.global.v4.b32 [ %rd576 + 0 ], { %r17979, %r17980, %r17981, %r17982 }; 2026-02-21T09:21:00.4393916Z // end inline asm 2026-02-21T09:21:00.4394141Z .loc 1 22 121 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:22:121 2026-02-21T09:21:00.4394209Z add.s32 %r22257, %r22257, 4; 2026-02-21T09:21:00.4394272Z add.s32 %r22256, %r22256, 4; 2026-02-21T09:21:00.4394334Z add.s32 %r22255, %r22255, 4; 2026-02-21T09:21:00.4394393Z add.s32 %r22254, %r22254, 4; 2026-02-21T09:21:00.4394468Z setp.lt.s32 %p50, %r22257, %r23298; 2026-02-21T09:21:00.4394529Z @%p50 bra $L__BB0_2; 2026-02-21T09:21:00.4394623Z $L__BB0_11: // %.preheader 2026-02-21T09:21:00.4394692Z setp.gt.s32 %p51, %r23298, %r2; 2026-02-21T09:21:00.4394752Z @%p51 bra $L__BB0_16; 2026-02-21T09:21:00.4394839Z // %bb.12: // %.lr.ph33 2026-02-21T09:21:00.4395052Z .loc 1 0 121 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:0:121 2026-02-21T09:21:00.4395117Z and.b32 %r18338, %r22236, 136; 2026-02-21T09:21:00.4395185Z xor.b32 %r141, %r18338, %r22235; 2026-02-21T09:21:00.4395248Z add.s32 %r142, %r22237, %r141; 2026-02-21T09:21:00.4395311Z add.s32 %r18387, %r142, 2048; 2026-02-21T09:21:00.4395370Z 
add.s32 %r18389, %r142, 4096; 2026-02-21T09:21:00.4395432Z add.s32 %r18391, %r142, 6144; 2026-02-21T09:21:00.4395493Z shl.b32 %r18340, %r22239, 3; 2026-02-21T09:21:00.4395556Z add.s32 %r18341, %r22237, %r18340; 2026-02-21T09:21:00.4395617Z add.s32 %r148, %r18341, 98304; 2026-02-21T09:21:00.4395737Z add.s32 %r18395, %r142, 40960; 2026-02-21T09:21:00.4395797Z add.s32 %r18397, %r142, 43008; 2026-02-21T09:21:00.4395855Z add.s32 %r18399, %r142, 45056; 2026-02-21T09:21:00.4395920Z add.s32 %r18401, %r142, 47104; 2026-02-21T09:21:00.4395979Z or.b32 %r153, %r22238, 65536; 2026-02-21T09:21:00.4396088Z add.s32 %r154, %r18341, 108544; 2026-02-21T09:21:00.4396151Z add.s32 %r18405, %r142, 8192; 2026-02-21T09:21:00.4396209Z add.s32 %r18407, %r142, 10240; 2026-02-21T09:21:00.4396267Z add.s32 %r18409, %r142, 12288; 2026-02-21T09:21:00.4396331Z add.s32 %r18411, %r142, 14336; 2026-02-21T09:21:00.4396391Z or.b32 %r159, %r22238, 131072; 2026-02-21T09:21:00.4396574Z add.s32 %r18413, %r18341, 100352; 2026-02-21T09:21:00.4396640Z add.s32 %r18415, %r142, 49152; 2026-02-21T09:21:00.4396779Z add.s32 %r18417, %r142, 51200; 2026-02-21T09:21:00.4396840Z add.s32 %r18419, %r142, 53248; 2026-02-21T09:21:00.4396899Z add.s32 %r18421, %r142, 55296; 2026-02-21T09:21:00.4396962Z or.b32 %r165, %r22238, 196608; 2026-02-21T09:21:00.4397026Z add.s32 %r18423, %r18341, 110592; 2026-02-21T09:21:00.4397084Z add.s32 %r18425, %r142, 16384; 2026-02-21T09:21:00.4397142Z add.s32 %r18427, %r142, 18432; 2026-02-21T09:21:00.4397211Z add.s32 %r18429, %r142, 20480; 2026-02-21T09:21:00.4397275Z add.s32 %r18431, %r142, 22528; 2026-02-21T09:21:00.4397335Z or.b32 %r171, %r22238, 262144; 2026-02-21T09:21:00.4397398Z add.s32 %r18433, %r18341, 102400; 2026-02-21T09:21:00.4397456Z add.s32 %r18435, %r142, 57344; 2026-02-21T09:21:00.4397515Z add.s32 %r18437, %r142, 59392; 2026-02-21T09:21:00.4397640Z add.s32 %r18439, %r142, 61440; 2026-02-21T09:21:00.4397703Z add.s32 %r18441, %r142, 63488; 2026-02-21T09:21:00.4397761Z or.b32 
%r177, %r22238, 327680; 2026-02-21T09:21:00.4397821Z add.s32 %r18443, %r18341, 112640; 2026-02-21T09:21:00.4397882Z add.s32 %r18445, %r142, 24576; 2026-02-21T09:21:00.4397941Z add.s32 %r18447, %r142, 26624; 2026-02-21T09:21:00.4397999Z add.s32 %r18449, %r142, 28672; 2026-02-21T09:21:00.4398060Z add.s32 %r18451, %r142, 30720; 2026-02-21T09:21:00.4398122Z or.b32 %r183, %r22238, 393216; 2026-02-21T09:21:00.4398180Z add.s32 %r18453, %r18341, 104448; 2026-02-21T09:21:00.4398240Z add.s32 %r18455, %r142, 65536; 2026-02-21T09:21:00.4398303Z add.s32 %r18457, %r142, 67584; 2026-02-21T09:21:00.4398360Z add.s32 %r18459, %r142, 69632; 2026-02-21T09:21:00.4398417Z add.s32 %r18461, %r142, 71680; 2026-02-21T09:21:00.4398477Z or.b32 %r189, %r22238, 458752; 2026-02-21T09:21:00.4398537Z add.s32 %r18463, %r18341, 114688; 2026-02-21T09:21:00.4398596Z add.s32 %r18465, %r142, 32768; 2026-02-21T09:21:00.4398655Z add.s32 %r18467, %r142, 34816; 2026-02-21T09:21:00.4398716Z add.s32 %r18469, %r142, 36864; 2026-02-21T09:21:00.4398773Z add.s32 %r18471, %r142, 38912; 2026-02-21T09:21:00.4398831Z or.b32 %r195, %r22238, 524288; 2026-02-21T09:21:00.4398893Z add.s32 %r18473, %r18341, 106496; 2026-02-21T09:21:00.4398951Z add.s32 %r18475, %r142, 73728; 2026-02-21T09:21:00.4399011Z add.s32 %r18477, %r142, 75776; 2026-02-21T09:21:00.4399069Z add.s32 %r18479, %r142, 77824; 2026-02-21T09:21:00.4399146Z add.s32 %r18481, %r142, 79872; 2026-02-21T09:21:00.4399207Z or.b32 %r201, %r22238, 589824; 2026-02-21T09:21:00.4399267Z add.s32 %r18483, %r18341, 116736; 2026-02-21T09:21:00.4399333Z or.b32 %r18345, %r22240, %r22241; 2026-02-21T09:21:00.4399392Z or.b32 %r18346, %r18345, %r22242; 2026-02-21T09:21:00.4399454Z or.b32 %r203, %r18346, %r18338; 2026-02-21T09:21:00.4399516Z xor.b32 %r204, %r203, 8; 2026-02-21T09:21:00.4399580Z shl.b32 %r18347, %r22239, 6; 2026-02-21T09:21:00.4399640Z or.b32 %r18349, %r18347, %r22245; 2026-02-21T09:21:00.4399704Z add.s32 %r18350, %r22237, 81920; 2026-02-21T09:21:00.4399768Z add.s32 
%r207, %r18350, %r18349; 2026-02-21T09:21:00.4399828Z xor.b32 %r18351, %r18349, 16; 2026-02-21T09:21:00.4399895Z add.s32 %r208, %r18350, %r18351; 2026-02-21T09:21:00.4399957Z xor.b32 %r18352, %r18349, 32; 2026-02-21T09:21:00.4400018Z add.s32 %r209, %r18350, %r18352; 2026-02-21T09:21:00.4400155Z xor.b32 %r18353, %r18349, 48; 2026-02-21T09:21:00.4400216Z add.s32 %r210, %r18350, %r18353; 2026-02-21T09:21:00.4400277Z bfe.u32 %r18354, %r18350, 4, 14; 2026-02-21T09:21:00.4400340Z cvt.u64.u32 %rd577, %r18354; 2026-02-21T09:21:00.4400486Z or.b64 %rd4, %rd577, -9223371899348713472; 2026-02-21T09:21:00.4400549Z add.s32 %r18355, %r22237, 81952; 2026-02-21T09:21:00.4400608Z bfe.u32 %r18356, %r18355, 4, 14; 2026-02-21T09:21:00.4400668Z cvt.u64.u32 %rd578, %r18356; 2026-02-21T09:21:00.4400744Z or.b64 %rd5, %rd578, -9223371899348713472; 2026-02-21T09:21:00.4400813Z and.b32 %r18358, %r22246, 3968; 2026-02-21T09:21:00.4400872Z and.b32 %r18360, %r22247, 4112; 2026-02-21T09:21:00.4400933Z or.b32 %r18361, %r18358, %r18360; 2026-02-21T09:21:00.4401056Z mad.lo.s32 %r18362, %r45, 8224, %r18361; 2026-02-21T09:21:00.4401118Z add.s32 %r211, %r22237, %r18362; 2026-02-21T09:21:00.4401176Z xor.b32 %r18363, %r18362, 16; 2026-02-21T09:21:00.4401239Z add.s32 %r212, %r22237, %r18363; 2026-02-21T09:21:00.4401303Z xor.b32 %r18364, %r18362, 32; 2026-02-21T09:21:00.4401363Z add.s32 %r213, %r22237, %r18364; 2026-02-21T09:21:00.4401422Z xor.b32 %r18365, %r18362, 48; 2026-02-21T09:21:00.4401484Z add.s32 %r214, %r22237, %r18365; 2026-02-21T09:21:00.4401545Z xor.b32 %r18366, %r18362, 64; 2026-02-21T09:21:00.4401603Z add.s32 %r215, %r22237, %r18366; 2026-02-21T09:21:00.4401664Z xor.b32 %r18367, %r18362, 80; 2026-02-21T09:21:00.4401723Z add.s32 %r216, %r22237, %r18367; 2026-02-21T09:21:00.4401783Z xor.b32 %r18368, %r18362, 96; 2026-02-21T09:21:00.4401890Z add.s32 %r217, %r22237, %r18368; 2026-02-21T09:21:00.4401956Z xor.b32 %r18369, %r18362, 112; 2026-02-21T09:21:00.4402016Z add.s32 %r218, %r22237, %r18369; 
2026-02-21T09:21:00.4402076Z shl.b32 %r18371, %r22248, 10; 2026-02-21T09:21:00.4402137Z and.b32 %r18372, %r22246, 112; 2026-02-21T09:21:00.4402197Z shl.b32 %r18373, %r22248, 2; 2026-02-21T09:21:00.4402256Z and.b32 %r18375, %r22249, 384; 2026-02-21T09:21:00.4402318Z and.b32 %r18377, %r22250, 4112; 2026-02-21T09:21:00.4402379Z or.b32 %r18378, %r18371, %r18372; 2026-02-21T09:21:00.4402439Z or.b32 %r18379, %r18373, %r18375; 2026-02-21T09:21:00.4402502Z xor.b32 %r18380, %r18378, %r18379; 2026-02-21T09:21:00.4402567Z xor.b32 %r18381, %r18380, %r18377; 2026-02-21T09:21:00.4402627Z add.s32 %r21727, %r22237, %r18381; 2026-02-21T09:21:00.4402686Z add.s32 %r21732, %r21727, 512; 2026-02-21T09:21:00.4402746Z add.s32 %r21737, %r21727, 1024; 2026-02-21T09:21:00.4402810Z add.s32 %r21742, %r21727, 1536; 2026-02-21T09:21:00.4402870Z add.s32 %r21747, %r21727, 2048; 2026-02-21T09:21:00.4402928Z add.s32 %r21752, %r21727, 2560; 2026-02-21T09:21:00.4402996Z add.s32 %r21757, %r21727, 3072; 2026-02-21T09:21:00.4403063Z add.s32 %r21762, %r21727, 3584; 2026-02-21T09:21:00.4403272Z .loc 1 43 78 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:43:78 2026-02-21T09:21:00.4403336Z or.b32 %r18383, %r22251, %r44; 2026-02-21T09:21:00.4403396Z or.b32 %r227, %r18383, 720896; 2026-02-21T09:21:00.4403603Z .loc 1 22 121 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:22:121 2026-02-21T09:21:00.4403670Z mad.wide.u32 %rd6, %r45, 8, %rd44; 2026-02-21T09:21:00.4403788Z $L__BB0_13: // =>This Loop Header: Depth=1 2026-02-21T09:21:00.4403885Z // Child Loop BB0_14 Depth 2 2026-02-21T09:21:00.4404086Z .loc 1 28 35 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:28:35 2026-02-21T09:21:00.4404150Z shr.s32 %r18488, %r23298, 31; 2026-02-21T09:21:00.4404208Z shr.u32 %r18489, %r18488, 23; 2026-02-21T09:21:00.4404269Z add.s32 %r18490, %r23298, %r18489; 2026-02-21T09:21:00.4404467Z .loc 1 31 45 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:31:45 2026-02-21T09:21:00.4404530Z 
and.b32 %r18491, %r18490, 65024; 2026-02-21T09:21:00.4404590Z sub.s32 %r18492, %r23298, %r18491; 2026-02-21T09:21:00.4404869Z .loc 1 31 64 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:31:64 2026-02-21T09:21:00.4404931Z cvt.u16.u32 %rs449, %r18492; 2026-02-21T09:21:00.4405123Z .loc 1 32 51 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:32:51 2026-02-21T09:21:00.4405234Z shr.s16 %rs450, %rs449, 15; 2026-02-21T09:21:00.4405296Z shr.u16 %rs451, %rs450, 13; 2026-02-21T09:21:00.4405359Z add.s16 %rs452, %rs449, %rs451; 2026-02-21T09:21:00.4405421Z shr.s16 %rs453, %rs452, 3; 2026-02-21T09:21:00.4405619Z .loc 1 31 64 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:31:64 2026-02-21T09:21:00.4405681Z and.b16 %rs454, %rs452, -8; 2026-02-21T09:21:00.4405791Z sub.s16 %rs455, %rs449, %rs454; 2026-02-21T09:21:00.4405990Z .loc 1 32 51 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:32:51 2026-02-21T09:21:00.4406053Z cvt.u32.u16 %r18493, %rs453; 2026-02-21T09:21:00.4406248Z .loc 1 33 27 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:33:27 2026-02-21T09:21:00.4406307Z shl.b32 %r18494, %r18490, 2; 2026-02-21T09:21:00.4406371Z and.b32 %r18495, %r18494, -2048; 2026-02-21T09:21:00.4406438Z mul.wide.s16 %r18496, %rs455, 256; 2026-02-21T09:21:00.4406641Z add.s32 %r18497, %r18496, %r18495; 2026-02-21T09:21:00.4406841Z .loc 1 34 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:34:32 2026-02-21T09:21:00.4406902Z or.b32 %r2341, %r18497, %r44; 2026-02-21T09:21:00.4407174Z .loc 1 35 27 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:35:27 2026-02-21T09:21:00.4407244Z mul.wide.s16 %r2342, %rs453, 256; 2026-02-21T09:21:00.4407439Z .loc 1 36 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:36:32 2026-02-21T09:21:00.4407500Z or.b32 %r18498, %r2342, %r6; 2026-02-21T09:21:00.4407559Z or.b32 %r18499, %r2342, %r7; 2026-02-21T09:21:00.4407624Z or.b32 %r18500, %r2342, %r8; 2026-02-21T09:21:00.4407682Z or.b32 
%r18501, %r2342, %r9; 2026-02-21T09:21:00.4407886Z .loc 1 51 53 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:53 2026-02-21T09:21:00.4407954Z shl.b32 %r18502, %r18498, 10; 2026-02-21T09:21:00.4408013Z shl.b32 %r18503, %r18499, 10; 2026-02-21T09:21:00.4408073Z shl.b32 %r18504, %r18500, 10; 2026-02-21T09:21:00.4408134Z shl.b32 %r18505, %r18501, 10; 2026-02-21T09:21:00.4408327Z .loc 1 51 60 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:60 2026-02-21T09:21:00.4408388Z or.b32 %r18506, %r18502, %r46; 2026-02-21T09:21:00.4408448Z or.b32 %r18507, %r18503, %r46; 2026-02-21T09:21:00.4408511Z or.b32 %r18508, %r18504, %r46; 2026-02-21T09:21:00.4408570Z or.b32 %r18509, %r18505, %r46; 2026-02-21T09:21:00.4408764Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4408846Z mad.wide.s32 %rd579, %r18506, 2, %rd44; 2026-02-21T09:21:00.4408915Z mad.wide.s32 %rd580, %r18507, 2, %rd44; 2026-02-21T09:21:00.4408984Z mad.wide.s32 %rd581, %r18508, 2, %rd44; 2026-02-21T09:21:00.4409056Z mad.wide.s32 %rd582, %r18509, 2, %rd44; 2026-02-21T09:21:00.4409252Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4409309Z bar.sync 0; 2026-02-21T09:21:00.4409368Z mov.b32 %r18386, 8; 2026-02-21T09:21:00.4409429Z // begin inline asm 2026-02-21T09:21:00.4409579Z cp.async.ca.shared.global [ %r142 + 0 ], [ %rd579 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4409636Z // end inline asm 2026-02-21T09:21:00.4409697Z // begin inline asm 2026-02-21T09:21:00.4409853Z cp.async.ca.shared.global [ %r18387 + 0 ], [ %rd580 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4409911Z // end inline asm 2026-02-21T09:21:00.4409972Z // begin inline asm 2026-02-21T09:21:00.4410112Z cp.async.ca.shared.global [ %r18389 + 0 ], [ %rd581 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4410244Z // end inline asm 2026-02-21T09:21:00.4410304Z // begin inline asm 2026-02-21T09:21:00.4410444Z cp.async.ca.shared.global [ %r18391 + 
0 ], [ %rd582 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4410499Z // end inline asm 2026-02-21T09:21:00.4410629Z cp.async.commit_group; 2026-02-21T09:21:00.4410834Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4410895Z add.s32 %r18510, %r2341, %r22238; 2026-02-21T09:21:00.4411091Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4411155Z cvt.s64.s32 %rd630, %r18510; 2026-02-21T09:21:00.4411219Z add.s64 %rd583, %rd45, %rd630; 2026-02-21T09:21:00.4411475Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4411537Z // begin inline asm 2026-02-21T09:21:00.4411679Z cp.async.ca.shared.global [ %r148 + 0 ], [ %rd583 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4411737Z // end inline asm 2026-02-21T09:21:00.4411802Z cp.async.commit_group; 2026-02-21T09:21:00.4412003Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4412066Z cvt.s64.s32 %rd631, %r18502; 2026-02-21T09:21:00.4412129Z or.b64 %rd633, %rd631, %rd713; 2026-02-21T09:21:00.4412190Z shl.b64 %rd634, %rd633, 1; 2026-02-21T09:21:00.4412255Z add.s64 %rd635, %rd44, %rd634; 2026-02-21T09:21:00.4412315Z add.s64 %rd584, %rd635, 32; 2026-02-21T09:21:00.4412425Z cvt.s64.s32 %rd636, %r18503; 2026-02-21T09:21:00.4412490Z or.b64 %rd637, %rd636, %rd713; 2026-02-21T09:21:00.4412550Z shl.b64 %rd638, %rd637, 1; 2026-02-21T09:21:00.4412612Z add.s64 %rd639, %rd44, %rd638; 2026-02-21T09:21:00.4412675Z add.s64 %rd585, %rd639, 32; 2026-02-21T09:21:00.4412735Z cvt.s64.s32 %rd640, %r18504; 2026-02-21T09:21:00.4412794Z or.b64 %rd641, %rd640, %rd713; 2026-02-21T09:21:00.4412866Z shl.b64 %rd642, %rd641, 1; 2026-02-21T09:21:00.4412930Z add.s64 %rd643, %rd44, %rd642; 2026-02-21T09:21:00.4412989Z add.s64 %rd586, %rd643, 32; 2026-02-21T09:21:00.4413049Z cvt.s64.s32 %rd644, %r18505; 2026-02-21T09:21:00.4413113Z or.b64 %rd645, %rd644, %rd713; 
2026-02-21T09:21:00.4413174Z shl.b64 %rd646, %rd645, 1; 2026-02-21T09:21:00.4413234Z add.s64 %rd647, %rd44, %rd646; 2026-02-21T09:21:00.4413293Z add.s64 %rd587, %rd647, 32; 2026-02-21T09:21:00.4413492Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4413554Z // begin inline asm 2026-02-21T09:21:00.4413694Z cp.async.ca.shared.global [ %r18395 + 0 ], [ %rd584 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4413754Z // end inline asm 2026-02-21T09:21:00.4413812Z // begin inline asm 2026-02-21T09:21:00.4413947Z cp.async.ca.shared.global [ %r18397 + 0 ], [ %rd585 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4414004Z // end inline asm 2026-02-21T09:21:00.4414062Z // begin inline asm 2026-02-21T09:21:00.4414197Z cp.async.ca.shared.global [ %r18399 + 0 ], [ %rd586 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4414252Z // end inline asm 2026-02-21T09:21:00.4414312Z // begin inline asm 2026-02-21T09:21:00.4414445Z cp.async.ca.shared.global [ %r18401 + 0 ], [ %rd587 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4414503Z // end inline asm 2026-02-21T09:21:00.4414571Z cp.async.commit_group; 2026-02-21T09:21:00.4414767Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4414832Z add.s32 %r18511, %r153, %r2341; 2026-02-21T09:21:00.4415026Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4415101Z cvt.s64.s32 %rd648, %r18511; 2026-02-21T09:21:00.4415167Z add.s64 %rd588, %rd45, %rd648; 2026-02-21T09:21:00.4415361Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4415481Z // begin inline asm 2026-02-21T09:21:00.4415615Z cp.async.ca.shared.global [ %r154 + 0 ], [ %rd588 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4415670Z // end inline asm 2026-02-21T09:21:00.4415734Z cp.async.commit_group; 2026-02-21T09:21:00.4415930Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 
2026-02-21T09:21:00.4416041Z add.s64 %rd589, %rd635, 64; 2026-02-21T09:21:00.4416102Z add.s64 %rd590, %rd639, 64; 2026-02-21T09:21:00.4416164Z add.s64 %rd591, %rd643, 64; 2026-02-21T09:21:00.4416222Z add.s64 %rd592, %rd647, 64; 2026-02-21T09:21:00.4416419Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4416647Z bar.sync 0; 2026-02-21T09:21:00.4416789Z // begin inline asm 2026-02-21T09:21:00.4416935Z cp.async.ca.shared.global [ %r18405 + 0 ], [ %rd589 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4416993Z // end inline asm 2026-02-21T09:21:00.4417051Z // begin inline asm 2026-02-21T09:21:00.4417192Z cp.async.ca.shared.global [ %r18407 + 0 ], [ %rd590 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4417249Z // end inline asm 2026-02-21T09:21:00.4417316Z // begin inline asm 2026-02-21T09:21:00.4417453Z cp.async.ca.shared.global [ %r18409 + 0 ], [ %rd591 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4417510Z // end inline asm 2026-02-21T09:21:00.4417571Z // begin inline asm 2026-02-21T09:21:00.4417706Z cp.async.ca.shared.global [ %r18411 + 0 ], [ %rd592 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4417760Z // end inline asm 2026-02-21T09:21:00.4417908Z cp.async.commit_group; 2026-02-21T09:21:00.4418109Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4418174Z add.s32 %r18512, %r159, %r2341; 2026-02-21T09:21:00.4418372Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4418436Z cvt.s64.s32 %rd649, %r18512; 2026-02-21T09:21:00.4418500Z add.s64 %rd593, %rd45, %rd649; 2026-02-21T09:21:00.4418691Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4418751Z // begin inline asm 2026-02-21T09:21:00.4418888Z cp.async.ca.shared.global [ %r18413 + 0 ], [ %rd593 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4418945Z // end inline asm 2026-02-21T09:21:00.4419011Z cp.async.commit_group; 2026-02-21T09:21:00.4419217Z 
.loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4419282Z add.s64 %rd594, %rd635, 96; 2026-02-21T09:21:00.4419343Z add.s64 %rd595, %rd639, 96; 2026-02-21T09:21:00.4419406Z add.s64 %rd596, %rd643, 96; 2026-02-21T09:21:00.4419466Z add.s64 %rd597, %rd647, 96; 2026-02-21T09:21:00.4419663Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4419724Z // begin inline asm 2026-02-21T09:21:00.4419863Z cp.async.ca.shared.global [ %r18415 + 0 ], [ %rd594 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4419919Z // end inline asm 2026-02-21T09:21:00.4419976Z // begin inline asm 2026-02-21T09:21:00.4420115Z cp.async.ca.shared.global [ %r18417 + 0 ], [ %rd595 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4420172Z // end inline asm 2026-02-21T09:21:00.4420233Z // begin inline asm 2026-02-21T09:21:00.4420372Z cp.async.ca.shared.global [ %r18419 + 0 ], [ %rd596 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4420429Z // end inline asm 2026-02-21T09:21:00.4420487Z // begin inline asm 2026-02-21T09:21:00.4420621Z cp.async.ca.shared.global [ %r18421 + 0 ], [ %rd597 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4420678Z // end inline asm 2026-02-21T09:21:00.4420742Z cp.async.commit_group; 2026-02-21T09:21:00.4420939Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4421005Z add.s32 %r18513, %r165, %r2341; 2026-02-21T09:21:00.4421202Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4421344Z cvt.s64.s32 %rd650, %r18513; 2026-02-21T09:21:00.4421409Z add.s64 %rd598, %rd45, %rd650; 2026-02-21T09:21:00.4421604Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4421723Z // begin inline asm 2026-02-21T09:21:00.4421873Z cp.async.ca.shared.global [ %r18423 + 0 ], [ %rd598 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4421931Z // end inline asm 2026-02-21T09:21:00.4421995Z 
cp.async.commit_group; 2026-02-21T09:21:00.4422192Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4422257Z add.s64 %rd599, %rd635, 128; 2026-02-21T09:21:00.4422318Z add.s64 %rd600, %rd639, 128; 2026-02-21T09:21:00.4422427Z add.s64 %rd601, %rd643, 128; 2026-02-21T09:21:00.4422491Z add.s64 %rd602, %rd647, 128; 2026-02-21T09:21:00.4422686Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4422743Z bar.sync 0; 2026-02-21T09:21:00.4422805Z // begin inline asm 2026-02-21T09:21:00.4422941Z cp.async.ca.shared.global [ %r18425 + 0 ], [ %rd599 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4422996Z // end inline asm 2026-02-21T09:21:00.4423055Z // begin inline asm 2026-02-21T09:21:00.4423194Z cp.async.ca.shared.global [ %r18427 + 0 ], [ %rd600 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4423248Z // end inline asm 2026-02-21T09:21:00.4423305Z // begin inline asm 2026-02-21T09:21:00.4423509Z cp.async.ca.shared.global [ %r18429 + 0 ], [ %rd601 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4423569Z // end inline asm 2026-02-21T09:21:00.4423624Z // begin inline asm 2026-02-21T09:21:00.4423758Z cp.async.ca.shared.global [ %r18431 + 0 ], [ %rd602 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4423816Z // end inline asm 2026-02-21T09:21:00.4423880Z cp.async.commit_group; 2026-02-21T09:21:00.4424073Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4424142Z add.s32 %r18514, %r171, %r2341; 2026-02-21T09:21:00.4424335Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4424397Z cvt.s64.s32 %rd651, %r18514; 2026-02-21T09:21:00.4424462Z add.s64 %rd603, %rd45, %rd651; 2026-02-21T09:21:00.4424653Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4424710Z // begin inline asm 2026-02-21T09:21:00.4424850Z cp.async.ca.shared.global [ %r18433 + 0 ], [ %rd603 + 0 ], 0x8, 
%r18386; 2026-02-21T09:21:00.4424907Z // end inline asm 2026-02-21T09:21:00.4424972Z cp.async.commit_group; 2026-02-21T09:21:00.4425165Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4425230Z add.s64 %rd604, %rd635, 160; 2026-02-21T09:21:00.4425289Z add.s64 %rd605, %rd639, 160; 2026-02-21T09:21:00.4425363Z add.s64 %rd606, %rd643, 160; 2026-02-21T09:21:00.4425426Z add.s64 %rd607, %rd647, 160; 2026-02-21T09:21:00.4425622Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4425682Z // begin inline asm 2026-02-21T09:21:00.4425820Z cp.async.ca.shared.global [ %r18435 + 0 ], [ %rd604 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4425880Z // end inline asm 2026-02-21T09:21:00.4425938Z // begin inline asm 2026-02-21T09:21:00.4426073Z cp.async.ca.shared.global [ %r18437 + 0 ], [ %rd605 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4426133Z // end inline asm 2026-02-21T09:21:00.4426190Z // begin inline asm 2026-02-21T09:21:00.4426323Z cp.async.ca.shared.global [ %r18439 + 0 ], [ %rd606 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4426382Z // end inline asm 2026-02-21T09:21:00.4426439Z // begin inline asm 2026-02-21T09:21:00.4426702Z cp.async.ca.shared.global [ %r18441 + 0 ], [ %rd607 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4426844Z // end inline asm 2026-02-21T09:21:00.4426911Z cp.async.commit_group; 2026-02-21T09:21:00.4427105Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4427166Z add.s32 %r18515, %r177, %r2341; 2026-02-21T09:21:00.4427425Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4427486Z cvt.s64.s32 %rd652, %r18515; 2026-02-21T09:21:00.4427547Z add.s64 %rd608, %rd45, %rd652; 2026-02-21T09:21:00.4427743Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4427816Z // begin inline asm 2026-02-21T09:21:00.4427955Z 
cp.async.ca.shared.global [ %r18443 + 0 ], [ %rd608 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4428072Z // end inline asm 2026-02-21T09:21:00.4428145Z cp.async.commit_group; 2026-02-21T09:21:00.4428434Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4428502Z add.s64 %rd609, %rd635, 192; 2026-02-21T09:21:00.4428564Z add.s64 %rd610, %rd639, 192; 2026-02-21T09:21:00.4428623Z add.s64 %rd611, %rd643, 192; 2026-02-21T09:21:00.4428682Z add.s64 %rd612, %rd647, 192; 2026-02-21T09:21:00.4428881Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4428941Z bar.sync 0; 2026-02-21T09:21:00.4428999Z // begin inline asm 2026-02-21T09:21:00.4429136Z cp.async.ca.shared.global [ %r18445 + 0 ], [ %rd609 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4429277Z // end inline asm 2026-02-21T09:21:00.4429339Z // begin inline asm 2026-02-21T09:21:00.4429476Z cp.async.ca.shared.global [ %r18447 + 0 ], [ %rd610 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4429536Z // end inline asm 2026-02-21T09:21:00.4429594Z // begin inline asm 2026-02-21T09:21:00.4429728Z cp.async.ca.shared.global [ %r18449 + 0 ], [ %rd611 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4429782Z // end inline asm 2026-02-21T09:21:00.4429847Z // begin inline asm 2026-02-21T09:21:00.4429980Z cp.async.ca.shared.global [ %r18451 + 0 ], [ %rd612 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4430036Z // end inline asm 2026-02-21T09:21:00.4430102Z cp.async.commit_group; 2026-02-21T09:21:00.4430296Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4430358Z add.s32 %r18516, %r183, %r2341; 2026-02-21T09:21:00.4430554Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4430617Z cvt.s64.s32 %rd653, %r18516; 2026-02-21T09:21:00.4430680Z add.s64 %rd613, %rd45, %rd653; 2026-02-21T09:21:00.4430877Z .loc 1 57 87 // 
cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4430940Z // begin inline asm 2026-02-21T09:21:00.4431075Z cp.async.ca.shared.global [ %r18453 + 0 ], [ %rd613 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4431132Z // end inline asm 2026-02-21T09:21:00.4431199Z cp.async.commit_group; 2026-02-21T09:21:00.4431392Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4431452Z add.s64 %rd614, %rd635, 224; 2026-02-21T09:21:00.4431518Z add.s64 %rd615, %rd639, 224; 2026-02-21T09:21:00.4431588Z add.s64 %rd616, %rd643, 224; 2026-02-21T09:21:00.4431649Z add.s64 %rd617, %rd647, 224; 2026-02-21T09:21:00.4431844Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4431909Z // begin inline asm 2026-02-21T09:21:00.4432044Z cp.async.ca.shared.global [ %r18455 + 0 ], [ %rd614 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4432100Z // end inline asm 2026-02-21T09:21:00.4432163Z // begin inline asm 2026-02-21T09:21:00.4432297Z cp.async.ca.shared.global [ %r18457 + 0 ], [ %rd615 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4432351Z // end inline asm 2026-02-21T09:21:00.4432409Z // begin inline asm 2026-02-21T09:21:00.4432606Z cp.async.ca.shared.global [ %r18459 + 0 ], [ %rd616 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4432664Z // end inline asm 2026-02-21T09:21:00.4432721Z // begin inline asm 2026-02-21T09:21:00.4432857Z cp.async.ca.shared.global [ %r18461 + 0 ], [ %rd617 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4432957Z // end inline asm 2026-02-21T09:21:00.4433019Z cp.async.commit_group; 2026-02-21T09:21:00.4433215Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4433277Z add.s32 %r18517, %r189, %r2341; 2026-02-21T09:21:00.4433473Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4433537Z cvt.s64.s32 %rd654, %r18517; 2026-02-21T09:21:00.4433643Z add.s64 %rd618, %rd45, %rd654; 
2026-02-21T09:21:00.4433838Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4433898Z // begin inline asm 2026-02-21T09:21:00.4434038Z cp.async.ca.shared.global [ %r18463 + 0 ], [ %rd618 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4434092Z // end inline asm 2026-02-21T09:21:00.4434155Z cp.async.commit_group; 2026-02-21T09:21:00.4434351Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4434414Z add.s64 %rd619, %rd635, 256; 2026-02-21T09:21:00.4434473Z add.s64 %rd620, %rd639, 256; 2026-02-21T09:21:00.4434534Z add.s64 %rd621, %rd643, 256; 2026-02-21T09:21:00.4434593Z add.s64 %rd622, %rd647, 256; 2026-02-21T09:21:00.4434847Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4434906Z bar.sync 0; 2026-02-21T09:21:00.4434967Z // begin inline asm 2026-02-21T09:21:00.4435105Z cp.async.ca.shared.global [ %r18465 + 0 ], [ %rd619 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4435161Z // end inline asm 2026-02-21T09:21:00.4435219Z // begin inline asm 2026-02-21T09:21:00.4435357Z cp.async.ca.shared.global [ %r18467 + 0 ], [ %rd620 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4435416Z // end inline asm 2026-02-21T09:21:00.4435475Z // begin inline asm 2026-02-21T09:21:00.4435610Z cp.async.ca.shared.global [ %r18469 + 0 ], [ %rd621 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4435667Z // end inline asm 2026-02-21T09:21:00.4435725Z // begin inline asm 2026-02-21T09:21:00.4435858Z cp.async.ca.shared.global [ %r18471 + 0 ], [ %rd622 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4435914Z // end inline asm 2026-02-21T09:21:00.4435977Z cp.async.commit_group; 2026-02-21T09:21:00.4436176Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4436237Z add.s32 %r18518, %r195, %r2341; 2026-02-21T09:21:00.4436432Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4436622Z 
cvt.s64.s32 %rd655, %r18518; 2026-02-21T09:21:00.4436691Z add.s64 %rd623, %rd45, %rd655; 2026-02-21T09:21:00.4436894Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4436952Z // begin inline asm 2026-02-21T09:21:00.4437090Z cp.async.ca.shared.global [ %r18473 + 0 ], [ %rd623 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4437148Z // end inline asm 2026-02-21T09:21:00.4437211Z cp.async.commit_group; 2026-02-21T09:21:00.4437408Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4437469Z add.s64 %rd624, %rd635, 288; 2026-02-21T09:21:00.4437530Z add.s64 %rd625, %rd639, 288; 2026-02-21T09:21:00.4437589Z add.s64 %rd626, %rd643, 288; 2026-02-21T09:21:00.4437651Z add.s64 %rd627, %rd647, 288; 2026-02-21T09:21:00.4437846Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4437903Z // begin inline asm 2026-02-21T09:21:00.4438042Z cp.async.ca.shared.global [ %r18475 + 0 ], [ %rd624 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4438184Z // end inline asm 2026-02-21T09:21:00.4438241Z // begin inline asm 2026-02-21T09:21:00.4438378Z cp.async.ca.shared.global [ %r18477 + 0 ], [ %rd625 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4438497Z // end inline asm 2026-02-21T09:21:00.4438554Z // begin inline asm 2026-02-21T09:21:00.4438689Z cp.async.ca.shared.global [ %r18479 + 0 ], [ %rd626 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4438748Z // end inline asm 2026-02-21T09:21:00.4438806Z // begin inline asm 2026-02-21T09:21:00.4438940Z cp.async.ca.shared.global [ %r18481 + 0 ], [ %rd627 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4438999Z // end inline asm 2026-02-21T09:21:00.4439063Z cp.async.commit_group; 2026-02-21T09:21:00.4439318Z .loc 1 57 62 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:62 2026-02-21T09:21:00.4439400Z add.s32 %r18519, %r201, %r2341; 2026-02-21T09:21:00.4439598Z .loc 1 57 34 // 
cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4439661Z cvt.s64.s32 %rd656, %r18519; 2026-02-21T09:21:00.4439722Z add.s64 %rd628, %rd45, %rd656; 2026-02-21T09:21:00.4439916Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4439976Z // begin inline asm 2026-02-21T09:21:00.4440113Z cp.async.ca.shared.global [ %r18483 + 0 ], [ %rd628 + 0 ], 0x8, %r18386; 2026-02-21T09:21:00.4440174Z // end inline asm 2026-02-21T09:21:00.4440236Z cp.async.commit_group; 2026-02-21T09:21:00.4440490Z .loc 1 43 78 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:43:78 2026-02-21T09:21:00.4440558Z add.s32 %r18520, %r227, %r18495; 2026-02-21T09:21:00.4440620Z add.s32 %r23299, %r18520, %r18496; 2026-02-21T09:21:00.4440682Z shl.b32 %r18521, %r18493, 18; 2026-02-21T09:21:00.4440741Z or.b32 %r18522, %r22253, %r18521; 2026-02-21T09:21:00.4440810Z mul.wide.s32 %rd36, %r18522, 2; 2026-02-21T09:21:00.4440873Z or.b32 %r18523, %r8, %r2342; 2026-02-21T09:21:00.4440932Z shl.b32 %r18524, %r18523, 10; 2026-02-21T09:21:00.4441001Z mul.wide.s32 %rd37, %r18524, 2; 2026-02-21T09:21:00.4441059Z or.b32 %r18525, %r7, %r2342; 2026-02-21T09:21:00.4441118Z shl.b32 %r18526, %r18525, 10; 2026-02-21T09:21:00.4441181Z mul.wide.s32 %rd38, %r18526, 2; 2026-02-21T09:21:00.4441245Z or.b32 %r18527, %r22252, %r18521; 2026-02-21T09:21:00.4441306Z mul.wide.s32 %rd39, %r18527, 2; 2026-02-21T09:21:00.4441366Z mov.b32 %r23302, 0f00000000; 2026-02-21T09:21:00.4441424Z mov.b32 %r23301, 4; 2026-02-21T09:21:00.4441484Z mov.b32 %r23300, -1; 2026-02-21T09:21:00.4441543Z mov.b64 %rd723, -16; 2026-02-21T09:21:00.4441602Z mov.b64 %rd722, %rd6; 2026-02-21T09:21:00.4441667Z mov.b32 %r23303, %r23302; 2026-02-21T09:21:00.4441727Z mov.b32 %r23304, %r23302; 2026-02-21T09:21:00.4441784Z mov.b32 %r23305, %r23302; 2026-02-21T09:21:00.4441845Z mov.b32 %r23306, %r23302; 2026-02-21T09:21:00.4441903Z mov.b32 %r23307, %r23302; 2026-02-21T09:21:00.4441961Z 
mov.b32 %r23308, %r23302; 2026-02-21T09:21:00.4442020Z mov.b32 %r23309, %r23302; 2026-02-21T09:21:00.4442081Z mov.b32 %r23310, %r23302; 2026-02-21T09:21:00.4442138Z mov.b32 %r23311, %r23302; 2026-02-21T09:21:00.4442196Z mov.b32 %r23312, %r23302; 2026-02-21T09:21:00.4442260Z mov.b32 %r23313, %r23302; 2026-02-21T09:21:00.4442318Z mov.b32 %r23314, %r23302; 2026-02-21T09:21:00.4442374Z mov.b32 %r23315, %r23302; 2026-02-21T09:21:00.4442434Z mov.b32 %r23316, %r23302; 2026-02-21T09:21:00.4442492Z mov.b32 %r23317, %r23302; 2026-02-21T09:21:00.4442551Z mov.b32 %r23318, %r23302; 2026-02-21T09:21:00.4442607Z mov.b32 %r23319, %r23302; 2026-02-21T09:21:00.4442675Z mov.b32 %r23320, %r23302; 2026-02-21T09:21:00.4442738Z mov.b32 %r23321, %r23302; 2026-02-21T09:21:00.4442796Z mov.b32 %r23322, %r23302; 2026-02-21T09:21:00.4442857Z mov.b32 %r23323, %r23302; 2026-02-21T09:21:00.4442913Z mov.b32 %r23324, %r23302; 2026-02-21T09:21:00.4442969Z mov.b32 %r23325, %r23302; 2026-02-21T09:21:00.4443087Z mov.b32 %r23326, %r23302; 2026-02-21T09:21:00.4443148Z mov.b32 %r23327, %r23302; 2026-02-21T09:21:00.4443205Z mov.b32 %r23328, %r23302; 2026-02-21T09:21:00.4443261Z mov.b32 %r23329, %r23302; 2026-02-21T09:21:00.4443319Z mov.b32 %r23330, %r23302; 2026-02-21T09:21:00.4443423Z mov.b32 %r23331, %r23302; 2026-02-21T09:21:00.4443480Z mov.b32 %r23332, %r23302; 2026-02-21T09:21:00.4443536Z mov.b32 %r23333, %r23302; 2026-02-21T09:21:00.4443594Z mov.b32 %r23334, %r23302; 2026-02-21T09:21:00.4443652Z mov.b32 %r23335, %r23302; 2026-02-21T09:21:00.4443711Z mov.b32 %r23336, %r23302; 2026-02-21T09:21:00.4443770Z mov.b32 %r23337, %r23302; 2026-02-21T09:21:00.4443826Z mov.b32 %r23338, %r23302; 2026-02-21T09:21:00.4443883Z mov.b32 %r23339, %r23302; 2026-02-21T09:21:00.4444008Z mov.b32 %r23340, %r23302; 2026-02-21T09:21:00.4444071Z mov.b32 %r23341, %r23302; 2026-02-21T09:21:00.4444128Z mov.b32 %r23342, %r23302; 2026-02-21T09:21:00.4444186Z mov.b32 %r23343, %r23302; 2026-02-21T09:21:00.4444248Z mov.b32 %r23344, 
%r23302; 2026-02-21T09:21:00.4444304Z mov.b32 %r23345, %r23302; 2026-02-21T09:21:00.4444362Z mov.b32 %r23346, %r23302; 2026-02-21T09:21:00.4444419Z mov.b32 %r23347, %r23302; 2026-02-21T09:21:00.4444483Z mov.b32 %r23348, %r23302; 2026-02-21T09:21:00.4444539Z mov.b32 %r23349, %r23302; 2026-02-21T09:21:00.4444607Z mov.b32 %r23350, %r23302; 2026-02-21T09:21:00.4444670Z mov.b32 %r23351, %r23302; 2026-02-21T09:21:00.4444728Z mov.b32 %r23352, %r23302; 2026-02-21T09:21:00.4444785Z mov.b32 %r23353, %r23302; 2026-02-21T09:21:00.4444891Z mov.b32 %r23354, %r23302; 2026-02-21T09:21:00.4444954Z mov.b32 %r23355, %r23302; 2026-02-21T09:21:00.4445011Z mov.b32 %r23356, %r23302; 2026-02-21T09:21:00.4445070Z mov.b32 %r23357, %r23302; 2026-02-21T09:21:00.4445132Z mov.b32 %r23358, %r23302; 2026-02-21T09:21:00.4445189Z mov.b32 %r23359, %r23302; 2026-02-21T09:21:00.4445248Z mov.b32 %r23360, %r23302; 2026-02-21T09:21:00.4445308Z mov.b32 %r23361, %r23302; 2026-02-21T09:21:00.4445368Z mov.b32 %r23362, %r23302; 2026-02-21T09:21:00.4445426Z mov.b32 %r23363, %r23302; 2026-02-21T09:21:00.4445483Z mov.b32 %r23364, %r23302; 2026-02-21T09:21:00.4445543Z mov.b32 %r23365, %r23302; 2026-02-21T09:21:00.4445604Z mov.b32 %r23366, %r23302; 2026-02-21T09:21:00.4445661Z mov.b32 %r23367, %r23302; 2026-02-21T09:21:00.4445721Z mov.b32 %r23368, %r23302; 2026-02-21T09:21:00.4445777Z mov.b32 %r23369, %r23302; 2026-02-21T09:21:00.4445832Z mov.b32 %r23370, %r23302; 2026-02-21T09:21:00.4445891Z mov.b32 %r23371, %r23302; 2026-02-21T09:21:00.4445952Z mov.b32 %r23372, %r23302; 2026-02-21T09:21:00.4446010Z mov.b32 %r23373, %r23302; 2026-02-21T09:21:00.4446068Z mov.b32 %r23374, %r23302; 2026-02-21T09:21:00.4446128Z mov.b32 %r23375, %r23302; 2026-02-21T09:21:00.4446187Z mov.b32 %r23376, %r23302; 2026-02-21T09:21:00.4446245Z mov.b32 %r23377, %r23302; 2026-02-21T09:21:00.4446303Z mov.b32 %r23378, %r23302; 2026-02-21T09:21:00.4446363Z mov.b32 %r23379, %r23302; 2026-02-21T09:21:00.4446422Z mov.b32 %r23380, %r23302; 
2026-02-21T09:21:00.4446777Z mov.b32 %r23381, %r23302; 2026-02-21T09:21:00.4446846Z mov.b32 %r23382, %r23302; 2026-02-21T09:21:00.4446904Z mov.b32 %r23383, %r23302; 2026-02-21T09:21:00.4446964Z mov.b32 %r23384, %r23302; 2026-02-21T09:21:00.4447020Z mov.b32 %r23385, %r23302; 2026-02-21T09:21:00.4447081Z mov.b32 %r23386, %r23302; 2026-02-21T09:21:00.4447139Z mov.b32 %r23387, %r23302; 2026-02-21T09:21:00.4447198Z mov.b32 %r23388, %r23302; 2026-02-21T09:21:00.4447258Z mov.b32 %r23389, %r23302; 2026-02-21T09:21:00.4447320Z mov.b32 %r23390, %r23302; 2026-02-21T09:21:00.4447377Z mov.b32 %r23391, %r23302; 2026-02-21T09:21:00.4447433Z mov.b32 %r23392, %r23302; 2026-02-21T09:21:00.4447492Z mov.b32 %r23393, %r23302; 2026-02-21T09:21:00.4447551Z mov.b32 %r23394, %r23302; 2026-02-21T09:21:00.4447608Z mov.b32 %r23395, %r23302; 2026-02-21T09:21:00.4447670Z mov.b32 %r23396, %r23302; 2026-02-21T09:21:00.4447821Z mov.b32 %r23397, %r23302; 2026-02-21T09:21:00.4447878Z mov.b32 %r23398, %r23302; 2026-02-21T09:21:00.4447936Z mov.b32 %r23399, %r23302; 2026-02-21T09:21:00.4447997Z mov.b32 %r23400, %r23302; 2026-02-21T09:21:00.4448053Z mov.b32 %r23401, %r23302; 2026-02-21T09:21:00.4448174Z mov.b32 %r23402, %r23302; 2026-02-21T09:21:00.4448233Z mov.b32 %r23403, %r23302; 2026-02-21T09:21:00.4448290Z mov.b32 %r23404, %r23302; 2026-02-21T09:21:00.4448346Z mov.b32 %r23405, %r23302; 2026-02-21T09:21:00.4448406Z mov.b32 %r23406, %r23302; 2026-02-21T09:21:00.4448463Z mov.b32 %r23407, %r23302; 2026-02-21T09:21:00.4448522Z mov.b32 %r23408, %r23302; 2026-02-21T09:21:00.4448579Z mov.b32 %r23409, %r23302; 2026-02-21T09:21:00.4448639Z mov.b32 %r23410, %r23302; 2026-02-21T09:21:00.4448696Z mov.b32 %r23411, %r23302; 2026-02-21T09:21:00.4448816Z mov.b32 %r23412, %r23302; 2026-02-21T09:21:00.4448882Z mov.b32 %r23413, %r23302; 2026-02-21T09:21:00.4448939Z mov.b32 %r23414, %r23302; 2026-02-21T09:21:00.4448998Z mov.b32 %r23415, %r23302; 2026-02-21T09:21:00.4449054Z mov.b32 %r23416, %r23302; 
2026-02-21T09:21:00.4449115Z mov.b32 %r23417, %r23302; 2026-02-21T09:21:00.4449172Z mov.b32 %r23418, %r23302; 2026-02-21T09:21:00.4449229Z mov.b32 %r23419, %r23302; 2026-02-21T09:21:00.4449290Z mov.b32 %r23420, %r23302; 2026-02-21T09:21:00.4449347Z mov.b32 %r23421, %r23302; 2026-02-21T09:21:00.4449404Z mov.b32 %r23422, %r23302; 2026-02-21T09:21:00.4449462Z mov.b32 %r23423, %r23302; 2026-02-21T09:21:00.4449521Z mov.b32 %r23424, %r23302; 2026-02-21T09:21:00.4449580Z mov.b32 %r23425, %r23302; 2026-02-21T09:21:00.4449700Z mov.b32 %r23426, %r23302; 2026-02-21T09:21:00.4449763Z mov.b32 %r23427, %r23302; 2026-02-21T09:21:00.4449832Z mov.b32 %r23428, %r23302; 2026-02-21T09:21:00.4449891Z mov.b32 %r23429, %r23302; 2026-02-21T09:21:00.4449951Z mov.b32 %r23430, %r23302; 2026-02-21T09:21:00.4450013Z mov.b32 %r23431, %r23302; 2026-02-21T09:21:00.4450071Z mov.b32 %r23432, %r23302; 2026-02-21T09:21:00.4450130Z mov.b32 %r23433, %r23302; 2026-02-21T09:21:00.4450189Z mov.b32 %r23434, %r23302; 2026-02-21T09:21:00.4450248Z mov.b32 %r23435, %r23302; 2026-02-21T09:21:00.4450305Z mov.b32 %r23436, %r23302; 2026-02-21T09:21:00.4450363Z mov.b32 %r23437, %r23302; 2026-02-21T09:21:00.4450426Z mov.b32 %r23438, %r23302; 2026-02-21T09:21:00.4450484Z mov.b32 %r23439, %r23302; 2026-02-21T09:21:00.4450543Z mov.b32 %r23440, %r23302; 2026-02-21T09:21:00.4450603Z mov.b32 %r23441, %r23302; 2026-02-21T09:21:00.4450661Z mov.b32 %r23442, %r23302; 2026-02-21T09:21:00.4450719Z mov.b32 %r23443, %r23302; 2026-02-21T09:21:00.4450777Z mov.b32 %r23444, %r23302; 2026-02-21T09:21:00.4450838Z mov.b32 %r23445, %r23302; 2026-02-21T09:21:00.4450896Z mov.b32 %r23446, %r23302; 2026-02-21T09:21:00.4450955Z mov.b32 %r23447, %r23302; 2026-02-21T09:21:00.4451017Z mov.b32 %r23448, %r23302; 2026-02-21T09:21:00.4451074Z mov.b32 %r23449, %r23302; 2026-02-21T09:21:00.4451132Z mov.b32 %r23450, %r23302; 2026-02-21T09:21:00.4451193Z mov.b32 %r23451, %r23302; 2026-02-21T09:21:00.4451251Z mov.b32 %r23452, %r23302; 
2026-02-21T09:21:00.4451307Z mov.b32 %r23453, %r23302; 2026-02-21T09:21:00.4451365Z mov.b32 %r23454, %r23302; 2026-02-21T09:21:00.4451427Z mov.b32 %r23455, %r23302; 2026-02-21T09:21:00.4451486Z mov.b32 %r23456, %r23302; 2026-02-21T09:21:00.4451543Z mov.b32 %r23457, %r23302; 2026-02-21T09:21:00.4451603Z mov.b32 %r23458, %r23302; 2026-02-21T09:21:00.4451660Z mov.b32 %r23459, %r23302; 2026-02-21T09:21:00.4451717Z mov.b32 %r23460, %r23302; 2026-02-21T09:21:00.4451773Z mov.b32 %r23461, %r23302; 2026-02-21T09:21:00.4451835Z mov.b32 %r23462, %r23302; 2026-02-21T09:21:00.4451892Z mov.b32 %r23463, %r23302; 2026-02-21T09:21:00.4451950Z mov.b32 %r23464, %r23302; 2026-02-21T09:21:00.4452011Z mov.b32 %r23465, %r23302; 2026-02-21T09:21:00.4452068Z mov.b32 %r23466, %r23302; 2026-02-21T09:21:00.4452125Z mov.b32 %r23467, %r23302; 2026-02-21T09:21:00.4452182Z mov.b32 %r23468, %r23302; 2026-02-21T09:21:00.4452308Z mov.b32 %r23469, %r23302; 2026-02-21T09:21:00.4452366Z mov.b32 %r23470, %r23302; 2026-02-21T09:21:00.4452423Z mov.b32 %r23471, %r23302; 2026-02-21T09:21:00.4452482Z mov.b32 %r23472, %r23302; 2026-02-21T09:21:00.4452539Z mov.b32 %r23473, %r23302; 2026-02-21T09:21:00.4452643Z mov.b32 %r23474, %r23302; 2026-02-21T09:21:00.4452699Z mov.b32 %r23475, %r23302; 2026-02-21T09:21:00.4452760Z mov.b32 %r23476, %r23302; 2026-02-21T09:21:00.4452821Z mov.b32 %r23477, %r23302; 2026-02-21T09:21:00.4452877Z mov.b32 %r23478, %r23302; 2026-02-21T09:21:00.4452936Z mov.b32 %r23479, %r23302; 2026-02-21T09:21:00.4452995Z mov.b32 %r23480, %r23302; 2026-02-21T09:21:00.4453052Z mov.b32 %r23481, %r23302; 2026-02-21T09:21:00.4453108Z mov.b32 %r23482, %r23302; 2026-02-21T09:21:00.4453217Z mov.b32 %r23483, %r23302; 2026-02-21T09:21:00.4453277Z mov.b32 %r23484, %r23302; 2026-02-21T09:21:00.4453335Z mov.b32 %r23485, %r23302; 2026-02-21T09:21:00.4453394Z mov.b32 %r23486, %r23302; 2026-02-21T09:21:00.4453452Z mov.b32 %r23487, %r23302; 2026-02-21T09:21:00.4453509Z mov.b32 %r23488, %r23302; 
2026-02-21T09:21:00.4453566Z mov.b32 %r23489, %r23302; 2026-02-21T09:21:00.4453627Z mov.b32 %r23490, %r23302; 2026-02-21T09:21:00.4453696Z mov.b32 %r23491, %r23302; 2026-02-21T09:21:00.4453757Z mov.b32 %r23492, %r23302; 2026-02-21T09:21:00.4453817Z mov.b32 %r23493, %r23302; 2026-02-21T09:21:00.4453875Z mov.b32 %r23494, %r23302; 2026-02-21T09:21:00.4453933Z mov.b32 %r23495, %r23302; 2026-02-21T09:21:00.4453990Z mov.b32 %r23496, %r23302; 2026-02-21T09:21:00.4454049Z mov.b32 %r23497, %r23302; 2026-02-21T09:21:00.4454159Z mov.b32 %r23498, %r23302; 2026-02-21T09:21:00.4454218Z mov.b32 %r23499, %r23302; 2026-02-21T09:21:00.4454279Z mov.b32 %r23500, %r23302; 2026-02-21T09:21:00.4454337Z mov.b32 %r23501, %r23302; 2026-02-21T09:21:00.4454393Z mov.b32 %r23502, %r23302; 2026-02-21T09:21:00.4454454Z mov.b32 %r23503, %r23302; 2026-02-21T09:21:00.4454510Z mov.b32 %r23504, %r23302; 2026-02-21T09:21:00.4454569Z mov.b32 %r23505, %r23302; 2026-02-21T09:21:00.4454625Z mov.b32 %r23506, %r23302; 2026-02-21T09:21:00.4454687Z mov.b32 %r23507, %r23302; 2026-02-21T09:21:00.4454743Z mov.b32 %r23508, %r23302; 2026-02-21T09:21:00.4454800Z mov.b32 %r23509, %r23302; 2026-02-21T09:21:00.4454861Z mov.b32 %r23510, %r23302; 2026-02-21T09:21:00.4454918Z mov.b32 %r23511, %r23302; 2026-02-21T09:21:00.4454988Z mov.b32 %r23512, %r23302; 2026-02-21T09:21:00.4455046Z mov.b32 %r23513, %r23302; 2026-02-21T09:21:00.4455111Z mov.b32 %r23514, %r23302; 2026-02-21T09:21:00.4455167Z mov.b32 %r23515, %r23302; 2026-02-21T09:21:00.4455228Z mov.b32 %r23516, %r23302; 2026-02-21T09:21:00.4455290Z mov.b32 %r23517, %r23302; 2026-02-21T09:21:00.4455348Z mov.b32 %r23518, %r23302; 2026-02-21T09:21:00.4455406Z mov.b32 %r23519, %r23302; 2026-02-21T09:21:00.4455466Z mov.b32 %r23520, %r23302; 2026-02-21T09:21:00.4455527Z mov.b32 %r23521, %r23302; 2026-02-21T09:21:00.4455584Z mov.b32 %r23522, %r23302; 2026-02-21T09:21:00.4455642Z mov.b32 %r23523, %r23302; 2026-02-21T09:21:00.4455703Z mov.b32 %r23524, %r23302; 
2026-02-21T09:21:00.4455760Z mov.b32 %r23525, %r23302; 2026-02-21T09:21:00.4455817Z mov.b32 %r23526, %r23302; 2026-02-21T09:21:00.4455876Z mov.b32 %r23527, %r23302; 2026-02-21T09:21:00.4455939Z mov.b32 %r23528, %r23302; 2026-02-21T09:21:00.4455994Z mov.b32 %r23529, %r23302; 2026-02-21T09:21:00.4456052Z mov.b32 %r23530, %r23302; 2026-02-21T09:21:00.4456112Z mov.b32 %r23531, %r23302; 2026-02-21T09:21:00.4456170Z mov.b32 %r23532, %r23302; 2026-02-21T09:21:00.4456228Z mov.b32 %r23533, %r23302; 2026-02-21T09:21:00.4456286Z mov.b32 %r23534, %r23302; 2026-02-21T09:21:00.4456347Z mov.b32 %r23535, %r23302; 2026-02-21T09:21:00.4456404Z mov.b32 %r23536, %r23302; 2026-02-21T09:21:00.4456593Z mov.b32 %r23537, %r23302; 2026-02-21T09:21:00.4456660Z mov.b32 %r23538, %r23302; 2026-02-21T09:21:00.4456719Z mov.b32 %r23539, %r23302; 2026-02-21T09:21:00.4456775Z mov.b32 %r23540, %r23302; 2026-02-21T09:21:00.4456913Z mov.b32 %r23541, %r23302; 2026-02-21T09:21:00.4456975Z mov.b32 %r23542, %r23302; 2026-02-21T09:21:00.4457032Z mov.b32 %r23543, %r23302; 2026-02-21T09:21:00.4457089Z mov.b32 %r23544, %r23302; 2026-02-21T09:21:00.4457150Z mov.b32 %r23545, %r23302; 2026-02-21T09:21:00.4457269Z mov.b32 %r23546, %r23302; 2026-02-21T09:21:00.4457326Z mov.b32 %r23547, %r23302; 2026-02-21T09:21:00.4457388Z mov.b32 %r23548, %r23302; 2026-02-21T09:21:00.4457445Z mov.b32 %r23549, %r23302; 2026-02-21T09:21:00.4457502Z mov.b32 %r23550, %r23302; 2026-02-21T09:21:00.4457561Z mov.b32 %r23551, %r23302; 2026-02-21T09:21:00.4457623Z mov.b32 %r23552, %r23302; 2026-02-21T09:21:00.4457681Z mov.b32 %r23553, %r23302; 2026-02-21T09:21:00.4457736Z mov.b32 %r23554, %r23302; 2026-02-21T09:21:00.4457858Z mov.b32 %r23555, %r23302; 2026-02-21T09:21:00.4457918Z mov.b32 %r23556, %r23302; 2026-02-21T09:21:00.4457976Z mov.b32 %r23557, %r23302; 2026-02-21T09:21:00.4458094Z $L__BB0_14: // Parent Loop BB0_13 Depth=1 2026-02-21T09:21:00.4458207Z // => This Inner Loop Header: Depth=2 2026-02-21T09:21:00.4458269Z add.s64 %rd723, 
%rd723, 16; 2026-02-21T09:21:00.4458337Z setp.lt.u64 %p60, %rd723, 432; 2026-02-21T09:21:00.4458404Z add.s32 %r21664, %r23300, 1; 2026-02-21T09:21:00.4458467Z setp.gt.s32 %p61, %r21664, 4; 2026-02-21T09:21:00.4458535Z selp.b32 %r23300, 0, %r21664, %p61; 2026-02-21T09:21:00.4458741Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4458877Z cp.async.wait_group 16; 2026-02-21T09:21:00.4458940Z bar.sync 0; 2026-02-21T09:21:00.4459001Z shl.b32 %r21665, %r23300, 13; 2026-02-21T09:21:00.4459067Z add.s32 %r21667, %r22237, %r21665; 2026-02-21T09:21:00.4459266Z .loc 1 55 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:55:32 2026-02-21T09:21:00.4459330Z add.s32 %r21668, %r21667, %r203; 2026-02-21T09:21:00.4459403Z ld.shared.b16 %rs456, [%r21668]; 2026-02-21T09:21:00.4459473Z ld.shared.b16 %rs457, [%r21668+256]; 2026-02-21T09:21:00.4459540Z ld.shared.b16 %rs458, [%r21668+16]; 2026-02-21T09:21:00.4459607Z ld.shared.b16 %rs459, [%r21668+272]; 2026-02-21T09:21:00.4459678Z ld.shared.b16 %rs460, [%r21668+4096]; 2026-02-21T09:21:00.4459746Z ld.shared.b16 %rs461, [%r21668+4352]; 2026-02-21T09:21:00.4459811Z ld.shared.b16 %rs462, [%r21668+4112]; 2026-02-21T09:21:00.4459881Z ld.shared.b16 %rs463, [%r21668+4368]; 2026-02-21T09:21:00.4459942Z add.s32 %r21669, %r21667, %r204; 2026-02-21T09:21:00.4460008Z ld.shared.b16 %rs464, [%r21669]; 2026-02-21T09:21:00.4460075Z ld.shared.b16 %rs465, [%r21669+256]; 2026-02-21T09:21:00.4460140Z ld.shared.b16 %rs466, [%r21669+16]; 2026-02-21T09:21:00.4460205Z ld.shared.b16 %rs467, [%r21669+272]; 2026-02-21T09:21:00.4460269Z ld.shared.b16 %rs468, [%r21669+4096]; 2026-02-21T09:21:00.4460338Z ld.shared.b16 %rs469, [%r21669+4352]; 2026-02-21T09:21:00.4460402Z ld.shared.b16 %rs470, [%r21669+4112]; 2026-02-21T09:21:00.4460470Z ld.shared.b16 %rs471, [%r21669+4368]; 2026-02-21T09:21:00.4460549Z cvt.f32.bf16 %r18784, %rs456; 2026-02-21T09:21:00.4460612Z cvt.f32.bf16 %r18785, %rs457; 
2026-02-21T09:21:00.4460674Z cvt.f32.bf16 %r18786, %rs464; 2026-02-21T09:21:00.4460735Z cvt.f32.bf16 %r18787, %rs465; 2026-02-21T09:21:00.4460799Z cvt.f32.bf16 %r19044, %rs458; 2026-02-21T09:21:00.4460857Z cvt.f32.bf16 %r19045, %rs459; 2026-02-21T09:21:00.4460917Z cvt.f32.bf16 %r19046, %rs466; 2026-02-21T09:21:00.4460979Z cvt.f32.bf16 %r19047, %rs467; 2026-02-21T09:21:00.4461040Z cvt.f32.bf16 %r19304, %rs460; 2026-02-21T09:21:00.4461100Z cvt.f32.bf16 %r19305, %rs461; 2026-02-21T09:21:00.4461158Z cvt.f32.bf16 %r19306, %rs468; 2026-02-21T09:21:00.4461220Z cvt.f32.bf16 %r19307, %rs469; 2026-02-21T09:21:00.4461279Z cvt.f32.bf16 %r19564, %rs462; 2026-02-21T09:21:00.4461338Z cvt.f32.bf16 %r19565, %rs463; 2026-02-21T09:21:00.4461401Z cvt.f32.bf16 %r19566, %rs470; 2026-02-21T09:21:00.4461522Z cvt.f32.bf16 %r19567, %rs471; 2026-02-21T09:21:00.4461726Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4461789Z shl.b32 %r21670, %r23300, 11; 2026-02-21T09:21:00.4461923Z add.s32 %r21671, %r22237, %r21670; 2026-02-21T09:21:00.4461984Z add.s32 %r21672, %r21671, 98304; 2026-02-21T09:21:00.4462177Z .loc 1 70 45 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:70:45 2026-02-21T09:21:00.4462240Z add.s32 %r21673, %r21672, %r22239; 2026-02-21T09:21:00.4462301Z add.s32 %r21674, %r21672, %r22243; 2026-02-21T09:21:00.4462361Z add.s32 %r21675, %r21672, %r22244; 2026-02-21T09:21:00.4462603Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4462672Z ld.shared.s8 %rs472, [%r21673]; 2026-02-21T09:21:00.4462867Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4462938Z shl.b16 %rs473, %rs472, 4; 2026-02-21T09:21:00.4463139Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4463205Z ld.shared.s8 %rs474, [%r21673+256]; 2026-02-21T09:21:00.4463403Z .loc 1 60 28 // 
cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4463463Z shl.b16 %rs475, %rs474, 4; 2026-02-21T09:21:00.4463656Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4463775Z ld.shared.s8 %rs476, [%r21673+512]; 2026-02-21T09:21:00.4463973Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4464035Z shl.b16 %rs477, %rs476, 4; 2026-02-21T09:21:00.4464230Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4464295Z ld.shared.s8 %rs478, [%r21674]; 2026-02-21T09:21:00.4464490Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4464552Z shl.b16 %rs479, %rs478, 4; 2026-02-21T09:21:00.4464746Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4464815Z ld.shared.s8 %rs480, [%r21673+1024]; 2026-02-21T09:21:00.4465008Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4465072Z shl.b16 %rs481, %rs480, 4; 2026-02-21T09:21:00.4465267Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4465334Z ld.shared.s8 %rs482, [%r21673+1280]; 2026-02-21T09:21:00.4465528Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4465594Z shl.b16 %rs483, %rs482, 4; 2026-02-21T09:21:00.4465785Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4465854Z ld.shared.s8 %rs484, [%r21673+1536]; 2026-02-21T09:21:00.4466049Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4466111Z shl.b16 %rs485, %rs484, 4; 2026-02-21T09:21:00.4466306Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4466375Z ld.shared.s8 %rs486, [%r21675]; 2026-02-21T09:21:00.4466701Z .loc 1 60 
28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4466765Z shl.b16 %rs487, %rs486, 4; 2026-02-21T09:21:00.4466962Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4467025Z cvt.s16.s8 %rs488, %rs473; 2026-02-21T09:21:00.4467085Z shr.s16 %rs489, %rs488, 4; 2026-02-21T09:21:00.4467144Z cvt.s16.s8 %rs490, %rs475; 2026-02-21T09:21:00.4467296Z shr.s16 %rs491, %rs490, 4; 2026-02-21T09:21:00.4467355Z shr.s16 %rs492, %rs472, 4; 2026-02-21T09:21:00.4467415Z shr.s16 %rs493, %rs474, 4; 2026-02-21T09:21:00.4467612Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.4467740Z cvt.rn.f32.s16 %r21676, %rs493; 2026-02-21T09:21:00.4467803Z cvt.rn.f32.s16 %r21677, %rs492; 2026-02-21T09:21:00.4467863Z cvt.rn.f32.s16 %r21678, %rs491; 2026-02-21T09:21:00.4467927Z cvt.rn.f32.s16 %r21679, %rs489; 2026-02-21T09:21:00.4468121Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4468182Z cvt.s16.s8 %rs494, %rs477; 2026-02-21T09:21:00.4468254Z shr.s16 %rs495, %rs494, 4; 2026-02-21T09:21:00.4468444Z cvt.s16.s8 %rs496, %rs479; 2026-02-21T09:21:00.4468512Z shr.s16 %rs497, %rs496, 4; 2026-02-21T09:21:00.4468574Z shr.s16 %rs498, %rs476, 4; 2026-02-21T09:21:00.4468639Z shr.s16 %rs499, %rs478, 4; 2026-02-21T09:21:00.4468842Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.4468906Z cvt.rn.f32.s16 %r21680, %rs499; 2026-02-21T09:21:00.4468972Z cvt.rn.f32.s16 %r21681, %rs498; 2026-02-21T09:21:00.4469035Z cvt.rn.f32.s16 %r21682, %rs497; 2026-02-21T09:21:00.4469097Z cvt.rn.f32.s16 %r21683, %rs495; 2026-02-21T09:21:00.4469299Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4469360Z cvt.s16.s8 %rs500, %rs481; 2026-02-21T09:21:00.4469483Z shr.s16 %rs501, %rs500, 4; 2026-02-21T09:21:00.4469550Z cvt.s16.s8 %rs502, %rs483; 
2026-02-21T09:21:00.4469608Z shr.s16 %rs503, %rs502, 4; 2026-02-21T09:21:00.4469668Z shr.s16 %rs504, %rs480, 4; 2026-02-21T09:21:00.4469728Z shr.s16 %rs505, %rs482, 4; 2026-02-21T09:21:00.4469923Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.4469986Z cvt.rn.f32.s16 %r21684, %rs505; 2026-02-21T09:21:00.4470047Z cvt.rn.f32.s16 %r21685, %rs504; 2026-02-21T09:21:00.4470113Z cvt.rn.f32.s16 %r21686, %rs503; 2026-02-21T09:21:00.4470176Z cvt.rn.f32.s16 %r21687, %rs501; 2026-02-21T09:21:00.4470370Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4470435Z cvt.s16.s8 %rs506, %rs485; 2026-02-21T09:21:00.4470495Z shr.s16 %rs507, %rs506, 4; 2026-02-21T09:21:00.4470556Z cvt.s16.s8 %rs508, %rs487; 2026-02-21T09:21:00.4470616Z shr.s16 %rs509, %rs508, 4; 2026-02-21T09:21:00.4470678Z shr.s16 %rs510, %rs484, 4; 2026-02-21T09:21:00.4470737Z shr.s16 %rs511, %rs486, 4; 2026-02-21T09:21:00.4470929Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.4470994Z cvt.rn.f32.s16 %r21688, %rs511; 2026-02-21T09:21:00.4471055Z cvt.rn.f32.s16 %r21689, %rs510; 2026-02-21T09:21:00.4471117Z cvt.rn.f32.s16 %r21690, %rs509; 2026-02-21T09:21:00.4471178Z cvt.rn.f32.s16 %r21691, %rs507; 2026-02-21T09:21:00.4471303Z st.shared.v4.b32 [%r207], {%r21679, %r21677, %r21678, %r21676}; 2026-02-21T09:21:00.4471421Z st.shared.v4.b32 [%r208], {%r21683, %r21681, %r21682, %r21680}; 2026-02-21T09:21:00.4471531Z st.shared.v4.b32 [%r209], {%r21687, %r21685, %r21686, %r21684}; 2026-02-21T09:21:00.4471642Z st.shared.v4.b32 [%r210], {%r21691, %r21689, %r21690, %r21688}; 2026-02-21T09:21:00.4471697Z $L__tmp17: 2026-02-21T09:21:00.4471973Z .loc 2 291 36 // standard.py:291:36 @[ cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:87:40 ] 2026-02-21T09:21:00.4472041Z // begin inline asm 2026-02-21T09:21:00.4472133Z fence.proxy.async.shared::cta; 2026-02-21T09:21:00.4472194Z // 
end inline asm 2026-02-21T09:21:00.4472249Z bar.sync 0; 2026-02-21T09:21:00.4472336Z shfl.sync.idx.b32 %r21692, %r5, 0, 31, -1; 2026-02-21T09:21:00.4472409Z wgmma.fence.sync.aligned; 2026-02-21T09:21:00.4472531Z mov.pred %p52, -1; 2026-02-21T09:21:00.4472593Z // begin inline asm 2026-02-21T09:21:00.4475369Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r23302,%r23303,%r23304,%r23305,%r23306,%r23307,%r23308,%r23309,%r23310,%r23311,%r23312,%r23313,%r23314,%r23315,%r23316,%r23317,%r23318,%r23319,%r23320,%r23321,%r23322,%r23323,%r23324,%r23325,%r23326,%r23327,%r23328,%r23329,%r23330,%r23331,%r23332,%r23333,%r23334,%r23335,%r23336,%r23337,%r23338,%r23339,%r23340,%r23341,%r23342,%r23343,%r23344,%r23345,%r23346,%r23347,%r23348,%r23349,%r23350,%r23351,%r23352,%r23353,%r23354,%r23355,%r23356,%r23357,%r23358,%r23359,%r23360,%r23361,%r23362,%r23363,%r23364,%r23365,%r23366,%r23367,%r23368,%r23369,%r23370,%r23371,%r23372,%r23373,%r23374,%r23375,%r23376,%r23377,%r23378,%r23379,%r23380,%r23381,%r23382,%r23383,%r23384,%r23385,%r23386,%r23387,%r23388,%r23389,%r23390,%r23391,%r23392,%r23393,%r23394,%r23395,%r23396,%r23397,%r23398,%r23399,%r23400,%r23401,%r23402,%r23403,%r23404,%r23405,%r23406,%r23407,%r23408,%r23409,%r23410,%r23411,%r23412,%r23413,%r23414,%r23415,%r23416,%r23417,%r23418,%r23419,%r23420,%r23421,%r23422,%r23423,%r23424,%r23425,%r23426,%r23427,%r23428,%r23429}, {%r18784,%r18785,%r18786,%r18787}, %rd4, %p52, 1, 1; 2026-02-21T09:21:00.4475476Z // end inline asm 2026-02-21T09:21:00.4475537Z // begin inline asm 2026-02-21T09:21:00.4478457Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r23302,%r23303,%r23304,%r23305,%r23306,%r23307,%r23308,%r23309,%r23310,%r23311,%r23312,%r23313,%r23314,%r23315,%r23316,%r23317,%r23318,%r23319,%r23320,%r23321,%r23322,%r23323,%r23324,%r23325,%r23326,%r23327,%r23328,%r23329,%r23330,%r23331,%r23332,%r23333,%r23334,%r23335,%r23336,%r23337,%r23338,%r23339,%r23340,%r23341,%r23342,%r23343,%r23344,%r23345,%r23346,%r23347,%r23348,%r23349,%r23350,%r23351,%r23352,%r23353,%r23354,%r23355,%r23356,%r23357,%r23358,%r23359,%r23360,%r23361,%r23362,%r23363,%r23364,%r23365,%r23366,%r23367,%r23368,%r23369,%r23370,%r23371,%r23372,%r23373,%r23374,%r23375,%r23376,%r23377,%r23378,%r23379,%r23380,%r23381,%r23382,%r23383,%r23384,%r23385,%r23386,%r23387,%r23388,%r23389,%r23390,%r23391,%r23392,%r23393,%r23394,%r23395,%r23396,%r23397,%r23398,%r23399,%r23400,%r23401,%r23402,%r23403,%r23404,%r23405,%r23406,%r23407,%r23408,%r23409,%r23410,%r23411,%r23412,%r23413,%r23414,%r23415,%r23416,%r23417,%r23418,%r23419,%r23420,%r23421,%r23422,%r23423,%r23424,%r23425,%r23426,%r23427,%r23428,%r23429}, {%r19044,%r19045,%r19046,%r19047}, %rd5, %p52, 1, 1; 2026-02-21T09:21:00.4478525Z // end inline asm 2026-02-21T09:21:00.4478585Z // begin inline asm 2026-02-21T09:21:00.4481297Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r23430,%r23431,%r23432,%r23433,%r23434,%r23435,%r23436,%r23437,%r23438,%r23439,%r23440,%r23441,%r23442,%r23443,%r23444,%r23445,%r23446,%r23447,%r23448,%r23449,%r23450,%r23451,%r23452,%r23453,%r23454,%r23455,%r23456,%r23457,%r23458,%r23459,%r23460,%r23461,%r23462,%r23463,%r23464,%r23465,%r23466,%r23467,%r23468,%r23469,%r23470,%r23471,%r23472,%r23473,%r23474,%r23475,%r23476,%r23477,%r23478,%r23479,%r23480,%r23481,%r23482,%r23483,%r23484,%r23485,%r23486,%r23487,%r23488,%r23489,%r23490,%r23491,%r23492,%r23493,%r23494,%r23495,%r23496,%r23497,%r23498,%r23499,%r23500,%r23501,%r23502,%r23503,%r23504,%r23505,%r23506,%r23507,%r23508,%r23509,%r23510,%r23511,%r23512,%r23513,%r23514,%r23515,%r23516,%r23517,%r23518,%r23519,%r23520,%r23521,%r23522,%r23523,%r23524,%r23525,%r23526,%r23527,%r23528,%r23529,%r23530,%r23531,%r23532,%r23533,%r23534,%r23535,%r23536,%r23537,%r23538,%r23539,%r23540,%r23541,%r23542,%r23543,%r23544,%r23545,%r23546,%r23547,%r23548,%r23549,%r23550,%r23551,%r23552,%r23553,%r23554,%r23555,%r23556,%r23557}, {%r19304,%r19305,%r19306,%r19307}, %rd4, %p52, 1, 1; 2026-02-21T09:21:00.4481361Z // end inline asm 2026-02-21T09:21:00.4481429Z // begin inline asm 2026-02-21T09:21:00.4484210Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r23430,%r23431,%r23432,%r23433,%r23434,%r23435,%r23436,%r23437,%r23438,%r23439,%r23440,%r23441,%r23442,%r23443,%r23444,%r23445,%r23446,%r23447,%r23448,%r23449,%r23450,%r23451,%r23452,%r23453,%r23454,%r23455,%r23456,%r23457,%r23458,%r23459,%r23460,%r23461,%r23462,%r23463,%r23464,%r23465,%r23466,%r23467,%r23468,%r23469,%r23470,%r23471,%r23472,%r23473,%r23474,%r23475,%r23476,%r23477,%r23478,%r23479,%r23480,%r23481,%r23482,%r23483,%r23484,%r23485,%r23486,%r23487,%r23488,%r23489,%r23490,%r23491,%r23492,%r23493,%r23494,%r23495,%r23496,%r23497,%r23498,%r23499,%r23500,%r23501,%r23502,%r23503,%r23504,%r23505,%r23506,%r23507,%r23508,%r23509,%r23510,%r23511,%r23512,%r23513,%r23514,%r23515,%r23516,%r23517,%r23518,%r23519,%r23520,%r23521,%r23522,%r23523,%r23524,%r23525,%r23526,%r23527,%r23528,%r23529,%r23530,%r23531,%r23532,%r23533,%r23534,%r23535,%r23536,%r23537,%r23538,%r23539,%r23540,%r23541,%r23542,%r23543,%r23544,%r23545,%r23546,%r23547,%r23548,%r23549,%r23550,%r23551,%r23552,%r23553,%r23554,%r23555,%r23556,%r23557}, {%r19564,%r19565,%r19566,%r19567}, %rd5, %p52, 1, 1; 2026-02-21T09:21:00.4484384Z // end inline asm 2026-02-21T09:21:00.4484460Z wgmma.commit_group.sync.aligned; 2026-02-21T09:21:00.4484525Z mov.b32 %r21384, 0; 2026-02-21T09:21:00.4484587Z mov.b32 %r19824, %r18350; 2026-02-21T09:21:00.4484645Z mov.b32 %r19825, %r21384; 2026-02-21T09:21:00.4484702Z mov.b32 %r19826, %r21384; 2026-02-21T09:21:00.4484764Z // begin inline asm 2026-02-21T09:21:00.4489942Z // wait for regs: 
%r23302,%r23303,%r23304,%r23305,%r23306,%r23307,%r23308,%r23309,%r23310,%r23311,%r23312,%r23313,%r23314,%r23315,%r23316,%r23317,%r23318,%r23319,%r23320,%r23321,%r23322,%r23323,%r23324,%r23325,%r23326,%r23327,%r23328,%r23329,%r23330,%r23331,%r23332,%r23333,%r23334,%r23335,%r23336,%r23337,%r23338,%r23339,%r23340,%r23341,%r23342,%r23343,%r23344,%r23345,%r23346,%r23347,%r23348,%r23349,%r23350,%r23351,%r23352,%r23353,%r23354,%r23355,%r23356,%r23357,%r23358,%r23359,%r23360,%r23361,%r23362,%r23363,%r23364,%r23365,%r23366,%r23367,%r23368,%r23369,%r23370,%r23371,%r23372,%r23373,%r23374,%r23375,%r23376,%r23377,%r23378,%r23379,%r23380,%r23381,%r23382,%r23383,%r23384,%r23385,%r23386,%r23387,%r23388,%r23389,%r23390,%r23391,%r23392,%r23393,%r23394,%r23395,%r23396,%r23397,%r23398,%r23399,%r23400,%r23401,%r23402,%r23403,%r23404,%r23405,%r23406,%r23407,%r23408,%r23409,%r23410,%r23411,%r23412,%r23413,%r23414,%r23415,%r23416,%r23417,%r23418,%r23419,%r23420,%r23421,%r23422,%r23423,%r23424,%r23425,%r23426,%r23427,%r23428,%r23429,%r23430,%r23431,%r23432,%r23433,%r23434,%r23435,%r23436,%r23437,%r23438,%r23439,%r23440,%r23441,%r23442,%r23443,%r23444,%r23445,%r23446,%r23447,%r23448,%r23449,%r23450,%r23451,%r23452,%r23453,%r23454,%r23455,%r23456,%r23457,%r23458,%r23459,%r23460,%r23461,%r23462,%r23463,%r23464,%r23465,%r23466,%r23467,%r23468,%r23469,%r23470,%r23471,%r23472,%r23473,%r23474,%r23475,%r23476,%r23477,%r23478,%r23479,%r23480,%r23481,%r23482,%r23483,%r23484,%r23485,%r23486,%r23487,%r23488,%r23489,%r23490,%r23491,%r23492,%r23493,%r23494,%r23495,%r23496,%r23497,%r23498,%r23499,%r23500,%r23501,%r23502,%r23503,%r23504,%r23505,%r23506,%r23507,%r23508,%r23509,%r23510,%r23511,%r23512,%r23513,%r23514,%r23515,%r23516,%r23517,%r23518,%r23519,%r23520,%r23521,%r23522,%r23523,%r23524,%r23525,%r23526,%r23527,%r23528,%r23529,%r23530,%r23531,%r23532,%r23533,%r23534,%r23535,%r23536,%r23537,%r23538,%r23539,%r23540,%r23541,%r23542,%r23543,%r23544,%r23545,%r23546,%r23547,%r23548,%r23549,%r23550,%r23551,
%r23552,%r23553,%r23554,%r23555,%r23556,%r23557,%r19824,%r19825,%r19826 2026-02-21T09:21:00.4490037Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:21:00.4490100Z // end inline asm 2026-02-21T09:21:00.4490154Z $L__tmp18: 2026-02-21T09:21:00.4490364Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4490428Z add.s32 %r21693, %r22237, 40960; 2026-02-21T09:21:00.4490496Z add.s32 %r21694, %r21693, %r21665; 2026-02-21T09:21:00.4490696Z .loc 1 55 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:55:32 2026-02-21T09:21:00.4490759Z add.s32 %r21695, %r21694, %r203; 2026-02-21T09:21:00.4490827Z ld.shared.b16 %rs512, [%r21695]; 2026-02-21T09:21:00.4490982Z ld.shared.b16 %rs513, [%r21695+256]; 2026-02-21T09:21:00.4491048Z ld.shared.b16 %rs514, [%r21695+16]; 2026-02-21T09:21:00.4491118Z ld.shared.b16 %rs515, [%r21695+272]; 2026-02-21T09:21:00.4491187Z ld.shared.b16 %rs516, [%r21695+4096]; 2026-02-21T09:21:00.4491326Z ld.shared.b16 %rs517, [%r21695+4352]; 2026-02-21T09:21:00.4491392Z ld.shared.b16 %rs518, [%r21695+4112]; 2026-02-21T09:21:00.4491461Z ld.shared.b16 %rs519, [%r21695+4368]; 2026-02-21T09:21:00.4491522Z add.s32 %r21696, %r21694, %r204; 2026-02-21T09:21:00.4491589Z ld.shared.b16 %rs520, [%r21696]; 2026-02-21T09:21:00.4491657Z ld.shared.b16 %rs521, [%r21696+256]; 2026-02-21T09:21:00.4491723Z ld.shared.b16 %rs522, [%r21696+16]; 2026-02-21T09:21:00.4491788Z ld.shared.b16 %rs523, [%r21696+272]; 2026-02-21T09:21:00.4491916Z ld.shared.b16 %rs524, [%r21696+4096]; 2026-02-21T09:21:00.4491987Z ld.shared.b16 %rs525, [%r21696+4352]; 2026-02-21T09:21:00.4492051Z ld.shared.b16 %rs526, [%r21696+4112]; 2026-02-21T09:21:00.4492118Z ld.shared.b16 %rs527, [%r21696+4368]; 2026-02-21T09:21:00.4492194Z cvt.f32.bf16 %r20342, %rs512; 2026-02-21T09:21:00.4492255Z cvt.f32.bf16 %r20343, %rs513; 2026-02-21T09:21:00.4492314Z cvt.f32.bf16 %r20344, %rs520; 2026-02-21T09:21:00.4492377Z cvt.f32.bf16 %r20345, %rs521; 
2026-02-21T09:21:00.4492437Z cvt.f32.bf16 %r20602, %rs514; 2026-02-21T09:21:00.4492496Z cvt.f32.bf16 %r20603, %rs515; 2026-02-21T09:21:00.4492555Z cvt.f32.bf16 %r20604, %rs522; 2026-02-21T09:21:00.4492616Z cvt.f32.bf16 %r20605, %rs523; 2026-02-21T09:21:00.4492724Z cvt.f32.bf16 %r20862, %rs516; 2026-02-21T09:21:00.4492787Z cvt.f32.bf16 %r20863, %rs517; 2026-02-21T09:21:00.4492849Z cvt.f32.bf16 %r20864, %rs524; 2026-02-21T09:21:00.4492908Z cvt.f32.bf16 %r20865, %rs525; 2026-02-21T09:21:00.4492968Z cvt.f32.bf16 %r21122, %rs518; 2026-02-21T09:21:00.4493029Z cvt.f32.bf16 %r21123, %rs519; 2026-02-21T09:21:00.4493090Z cvt.f32.bf16 %r21124, %rs526; 2026-02-21T09:21:00.4493149Z cvt.f32.bf16 %r21125, %rs527; 2026-02-21T09:21:00.4493353Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4493418Z add.s32 %r21697, %r21671, 108544; 2026-02-21T09:21:00.4493613Z .loc 1 70 45 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:70:45 2026-02-21T09:21:00.4493678Z add.s32 %r21698, %r21697, %r22239; 2026-02-21T09:21:00.4493741Z add.s32 %r21699, %r21697, %r22243; 2026-02-21T09:21:00.4493812Z add.s32 %r21700, %r21697, %r22244; 2026-02-21T09:21:00.4494011Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4494079Z ld.shared.s8 %rs528, [%r21698]; 2026-02-21T09:21:00.4494276Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4494338Z shl.b16 %rs529, %rs528, 4; 2026-02-21T09:21:00.4494529Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4494600Z ld.shared.s8 %rs530, [%r21698+256]; 2026-02-21T09:21:00.4494793Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4494855Z shl.b16 %rs531, %rs530, 4; 2026-02-21T09:21:00.4495052Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4495118Z ld.shared.s8 %rs532, 
[%r21698+512]; 2026-02-21T09:21:00.4495314Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4495378Z shl.b16 %rs533, %rs532, 4; 2026-02-21T09:21:00.4495571Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4495639Z ld.shared.s8 %rs534, [%r21699]; 2026-02-21T09:21:00.4495832Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4495954Z shl.b16 %rs535, %rs534, 4; 2026-02-21T09:21:00.4496146Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4496213Z ld.shared.s8 %rs536, [%r21698+1024]; 2026-02-21T09:21:00.4496603Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4496669Z shl.b16 %rs537, %rs536, 4; 2026-02-21T09:21:00.4496862Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4496934Z ld.shared.s8 %rs538, [%r21698+1280]; 2026-02-21T09:21:00.4497128Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4497263Z shl.b16 %rs539, %rs538, 4; 2026-02-21T09:21:00.4497461Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4497527Z ld.shared.s8 %rs540, [%r21698+1536]; 2026-02-21T09:21:00.4497719Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4497780Z shl.b16 %rs541, %rs540, 4; 2026-02-21T09:21:00.4497975Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4498042Z ld.shared.s8 %rs542, [%r21700]; 2026-02-21T09:21:00.4498234Z .loc 1 60 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:60:28 2026-02-21T09:21:00.4498298Z shl.b16 %rs543, %rs542, 4; 2026-02-21T09:21:00.4498552Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4498616Z cvt.s16.s8 
%rs544, %rs529; 2026-02-21T09:21:00.4498680Z shr.s16 %rs545, %rs544, 4; 2026-02-21T09:21:00.4498739Z cvt.s16.s8 %rs546, %rs531; 2026-02-21T09:21:00.4498797Z shr.s16 %rs547, %rs546, 4; 2026-02-21T09:21:00.4498857Z shr.s16 %rs548, %rs528, 4; 2026-02-21T09:21:00.4498924Z shr.s16 %rs549, %rs530, 4; 2026-02-21T09:21:00.4499116Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.4499181Z cvt.rn.f32.s16 %r21701, %rs549; 2026-02-21T09:21:00.4499247Z cvt.rn.f32.s16 %r21702, %rs548; 2026-02-21T09:21:00.4499313Z cvt.rn.f32.s16 %r21703, %rs547; 2026-02-21T09:21:00.4499374Z cvt.rn.f32.s16 %r21704, %rs545; 2026-02-21T09:21:00.4499571Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4499633Z cvt.s16.s8 %rs550, %rs533; 2026-02-21T09:21:00.4499693Z shr.s16 %rs551, %rs550, 4; 2026-02-21T09:21:00.4499754Z cvt.s16.s8 %rs552, %rs535; 2026-02-21T09:21:00.4499817Z shr.s16 %rs553, %rs552, 4; 2026-02-21T09:21:00.4499878Z shr.s16 %rs554, %rs532, 4; 2026-02-21T09:21:00.4499937Z shr.s16 %rs555, %rs534, 4; 2026-02-21T09:21:00.4500137Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.4500209Z cvt.rn.f32.s16 %r21705, %rs555; 2026-02-21T09:21:00.4500271Z cvt.rn.f32.s16 %r21706, %rs554; 2026-02-21T09:21:00.4500336Z cvt.rn.f32.s16 %r21707, %rs553; 2026-02-21T09:21:00.4500398Z cvt.rn.f32.s16 %r21708, %rs551; 2026-02-21T09:21:00.4500595Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4500660Z cvt.s16.s8 %rs556, %rs537; 2026-02-21T09:21:00.4500723Z shr.s16 %rs557, %rs556, 4; 2026-02-21T09:21:00.4500783Z cvt.s16.s8 %rs558, %rs539; 2026-02-21T09:21:00.4500845Z shr.s16 %rs559, %rs558, 4; 2026-02-21T09:21:00.4500905Z shr.s16 %rs560, %rs536, 4; 2026-02-21T09:21:00.4500963Z shr.s16 %rs561, %rs538, 4; 2026-02-21T09:21:00.4501156Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 
2026-02-21T09:21:00.4501220Z cvt.rn.f32.s16 %r21709, %rs561; 2026-02-21T09:21:00.4501285Z cvt.rn.f32.s16 %r21710, %rs560; 2026-02-21T09:21:00.4501424Z cvt.rn.f32.s16 %r21711, %rs559; 2026-02-21T09:21:00.4501486Z cvt.rn.f32.s16 %r21712, %rs557; 2026-02-21T09:21:00.4501680Z .loc 1 62 25 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:62:25 2026-02-21T09:21:00.4501740Z cvt.s16.s8 %rs562, %rs541; 2026-02-21T09:21:00.4501859Z shr.s16 %rs563, %rs562, 4; 2026-02-21T09:21:00.4501922Z cvt.s16.s8 %rs564, %rs543; 2026-02-21T09:21:00.4501980Z shr.s16 %rs565, %rs564, 4; 2026-02-21T09:21:00.4502041Z shr.s16 %rs566, %rs540, 4; 2026-02-21T09:21:00.4502103Z shr.s16 %rs567, %rs542, 4; 2026-02-21T09:21:00.4502317Z .loc 1 80 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:80:32 2026-02-21T09:21:00.4502383Z cvt.rn.f32.s16 %r21713, %rs567; 2026-02-21T09:21:00.4502498Z cvt.rn.f32.s16 %r21714, %rs566; 2026-02-21T09:21:00.4502563Z cvt.rn.f32.s16 %r21715, %rs565; 2026-02-21T09:21:00.4505742Z cvt.rn.f32.s16 %r21716, %rs563; 2026-02-21T09:21:00.4505834Z bar.sync 0; 2026-02-21T09:21:00.4505983Z st.shared.v4.b32 [%r207], {%r21704, %r21702, %r21703, %r21701}; 2026-02-21T09:21:00.4506108Z st.shared.v4.b32 [%r208], {%r21708, %r21706, %r21707, %r21705}; 2026-02-21T09:21:00.4506225Z st.shared.v4.b32 [%r209], {%r21712, %r21710, %r21711, %r21709}; 2026-02-21T09:21:00.4506336Z st.shared.v4.b32 [%r210], {%r21716, %r21714, %r21715, %r21713}; 2026-02-21T09:21:00.4506393Z $L__tmp19: 2026-02-21T09:21:00.4506848Z .loc 2 291 36 // standard.py:291:36 @[ cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:87:40 ] 2026-02-21T09:21:00.4507047Z // begin inline asm 2026-02-21T09:21:00.4507144Z fence.proxy.async.shared::cta; 2026-02-21T09:21:00.4507202Z // end inline asm 2026-02-21T09:21:00.4507259Z bar.sync 0; 2026-02-21T09:21:00.4507332Z wgmma.fence.sync.aligned; 2026-02-21T09:21:00.4507395Z // begin inline asm 2026-02-21T09:21:00.4510161Z 
wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r23302,%r23303,%r23304,%r23305,%r23306,%r23307,%r23308,%r23309,%r23310,%r23311,%r23312,%r23313,%r23314,%r23315,%r23316,%r23317,%r23318,%r23319,%r23320,%r23321,%r23322,%r23323,%r23324,%r23325,%r23326,%r23327,%r23328,%r23329,%r23330,%r23331,%r23332,%r23333,%r23334,%r23335,%r23336,%r23337,%r23338,%r23339,%r23340,%r23341,%r23342,%r23343,%r23344,%r23345,%r23346,%r23347,%r23348,%r23349,%r23350,%r23351,%r23352,%r23353,%r23354,%r23355,%r23356,%r23357,%r23358,%r23359,%r23360,%r23361,%r23362,%r23363,%r23364,%r23365,%r23366,%r23367,%r23368,%r23369,%r23370,%r23371,%r23372,%r23373,%r23374,%r23375,%r23376,%r23377,%r23378,%r23379,%r23380,%r23381,%r23382,%r23383,%r23384,%r23385,%r23386,%r23387,%r23388,%r23389,%r23390,%r23391,%r23392,%r23393,%r23394,%r23395,%r23396,%r23397,%r23398,%r23399,%r23400,%r23401,%r23402,%r23403,%r23404,%r23405,%r23406,%r23407,%r23408,%r23409,%r23410,%r23411,%r23412,%r23413,%r23414,%r23415,%r23416,%r23417,%r23418,%r23419,%r23420,%r23421,%r23422,%r23423,%r23424,%r23425,%r23426,%r23427,%r23428,%r23429}, {%r20342,%r20343,%r20344,%r20345}, %rd4, %p52, 1, 1; 2026-02-21T09:21:00.4510230Z // end inline asm 2026-02-21T09:21:00.4510291Z // begin inline asm 2026-02-21T09:21:00.4512960Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r23302,%r23303,%r23304,%r23305,%r23306,%r23307,%r23308,%r23309,%r23310,%r23311,%r23312,%r23313,%r23314,%r23315,%r23316,%r23317,%r23318,%r23319,%r23320,%r23321,%r23322,%r23323,%r23324,%r23325,%r23326,%r23327,%r23328,%r23329,%r23330,%r23331,%r23332,%r23333,%r23334,%r23335,%r23336,%r23337,%r23338,%r23339,%r23340,%r23341,%r23342,%r23343,%r23344,%r23345,%r23346,%r23347,%r23348,%r23349,%r23350,%r23351,%r23352,%r23353,%r23354,%r23355,%r23356,%r23357,%r23358,%r23359,%r23360,%r23361,%r23362,%r23363,%r23364,%r23365,%r23366,%r23367,%r23368,%r23369,%r23370,%r23371,%r23372,%r23373,%r23374,%r23375,%r23376,%r23377,%r23378,%r23379,%r23380,%r23381,%r23382,%r23383,%r23384,%r23385,%r23386,%r23387,%r23388,%r23389,%r23390,%r23391,%r23392,%r23393,%r23394,%r23395,%r23396,%r23397,%r23398,%r23399,%r23400,%r23401,%r23402,%r23403,%r23404,%r23405,%r23406,%r23407,%r23408,%r23409,%r23410,%r23411,%r23412,%r23413,%r23414,%r23415,%r23416,%r23417,%r23418,%r23419,%r23420,%r23421,%r23422,%r23423,%r23424,%r23425,%r23426,%r23427,%r23428,%r23429}, {%r20602,%r20603,%r20604,%r20605}, %rd5, %p52, 1, 1; 2026-02-21T09:21:00.4513102Z // end inline asm 2026-02-21T09:21:00.4513234Z // begin inline asm 2026-02-21T09:21:00.4515959Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r23430,%r23431,%r23432,%r23433,%r23434,%r23435,%r23436,%r23437,%r23438,%r23439,%r23440,%r23441,%r23442,%r23443,%r23444,%r23445,%r23446,%r23447,%r23448,%r23449,%r23450,%r23451,%r23452,%r23453,%r23454,%r23455,%r23456,%r23457,%r23458,%r23459,%r23460,%r23461,%r23462,%r23463,%r23464,%r23465,%r23466,%r23467,%r23468,%r23469,%r23470,%r23471,%r23472,%r23473,%r23474,%r23475,%r23476,%r23477,%r23478,%r23479,%r23480,%r23481,%r23482,%r23483,%r23484,%r23485,%r23486,%r23487,%r23488,%r23489,%r23490,%r23491,%r23492,%r23493,%r23494,%r23495,%r23496,%r23497,%r23498,%r23499,%r23500,%r23501,%r23502,%r23503,%r23504,%r23505,%r23506,%r23507,%r23508,%r23509,%r23510,%r23511,%r23512,%r23513,%r23514,%r23515,%r23516,%r23517,%r23518,%r23519,%r23520,%r23521,%r23522,%r23523,%r23524,%r23525,%r23526,%r23527,%r23528,%r23529,%r23530,%r23531,%r23532,%r23533,%r23534,%r23535,%r23536,%r23537,%r23538,%r23539,%r23540,%r23541,%r23542,%r23543,%r23544,%r23545,%r23546,%r23547,%r23548,%r23549,%r23550,%r23551,%r23552,%r23553,%r23554,%r23555,%r23556,%r23557}, {%r20862,%r20863,%r20864,%r20865}, %rd4, %p52, 1, 1; 2026-02-21T09:21:00.4516025Z // end inline asm 2026-02-21T09:21:00.4516082Z // begin inline asm 2026-02-21T09:21:00.4518958Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r23430,%r23431,%r23432,%r23433,%r23434,%r23435,%r23436,%r23437,%r23438,%r23439,%r23440,%r23441,%r23442,%r23443,%r23444,%r23445,%r23446,%r23447,%r23448,%r23449,%r23450,%r23451,%r23452,%r23453,%r23454,%r23455,%r23456,%r23457,%r23458,%r23459,%r23460,%r23461,%r23462,%r23463,%r23464,%r23465,%r23466,%r23467,%r23468,%r23469,%r23470,%r23471,%r23472,%r23473,%r23474,%r23475,%r23476,%r23477,%r23478,%r23479,%r23480,%r23481,%r23482,%r23483,%r23484,%r23485,%r23486,%r23487,%r23488,%r23489,%r23490,%r23491,%r23492,%r23493,%r23494,%r23495,%r23496,%r23497,%r23498,%r23499,%r23500,%r23501,%r23502,%r23503,%r23504,%r23505,%r23506,%r23507,%r23508,%r23509,%r23510,%r23511,%r23512,%r23513,%r23514,%r23515,%r23516,%r23517,%r23518,%r23519,%r23520,%r23521,%r23522,%r23523,%r23524,%r23525,%r23526,%r23527,%r23528,%r23529,%r23530,%r23531,%r23532,%r23533,%r23534,%r23535,%r23536,%r23537,%r23538,%r23539,%r23540,%r23541,%r23542,%r23543,%r23544,%r23545,%r23546,%r23547,%r23548,%r23549,%r23550,%r23551,%r23552,%r23553,%r23554,%r23555,%r23556,%r23557}, {%r21122,%r21123,%r21124,%r21125}, %rd5, %p52, 1, 1; 2026-02-21T09:21:00.4519032Z // end inline asm 2026-02-21T09:21:00.4519111Z wgmma.commit_group.sync.aligned; 2026-02-21T09:21:00.4519174Z mov.b32 %r21382, %r18350; 2026-02-21T09:21:00.4519232Z mov.b32 %r21383, %r21384; 2026-02-21T09:21:00.4519292Z // begin inline asm 2026-02-21T09:21:00.4524215Z // wait for regs: 
%r23302,%r23303,%r23304,%r23305,%r23306,%r23307,%r23308,%r23309,%r23310,%r23311,%r23312,%r23313,%r23314,%r23315,%r23316,%r23317,%r23318,%r23319,%r23320,%r23321,%r23322,%r23323,%r23324,%r23325,%r23326,%r23327,%r23328,%r23329,%r23330,%r23331,%r23332,%r23333,%r23334,%r23335,%r23336,%r23337,%r23338,%r23339,%r23340,%r23341,%r23342,%r23343,%r23344,%r23345,%r23346,%r23347,%r23348,%r23349,%r23350,%r23351,%r23352,%r23353,%r23354,%r23355,%r23356,%r23357,%r23358,%r23359,%r23360,%r23361,%r23362,%r23363,%r23364,%r23365,%r23366,%r23367,%r23368,%r23369,%r23370,%r23371,%r23372,%r23373,%r23374,%r23375,%r23376,%r23377,%r23378,%r23379,%r23380,%r23381,%r23382,%r23383,%r23384,%r23385,%r23386,%r23387,%r23388,%r23389,%r23390,%r23391,%r23392,%r23393,%r23394,%r23395,%r23396,%r23397,%r23398,%r23399,%r23400,%r23401,%r23402,%r23403,%r23404,%r23405,%r23406,%r23407,%r23408,%r23409,%r23410,%r23411,%r23412,%r23413,%r23414,%r23415,%r23416,%r23417,%r23418,%r23419,%r23420,%r23421,%r23422,%r23423,%r23424,%r23425,%r23426,%r23427,%r23428,%r23429,%r23430,%r23431,%r23432,%r23433,%r23434,%r23435,%r23436,%r23437,%r23438,%r23439,%r23440,%r23441,%r23442,%r23443,%r23444,%r23445,%r23446,%r23447,%r23448,%r23449,%r23450,%r23451,%r23452,%r23453,%r23454,%r23455,%r23456,%r23457,%r23458,%r23459,%r23460,%r23461,%r23462,%r23463,%r23464,%r23465,%r23466,%r23467,%r23468,%r23469,%r23470,%r23471,%r23472,%r23473,%r23474,%r23475,%r23476,%r23477,%r23478,%r23479,%r23480,%r23481,%r23482,%r23483,%r23484,%r23485,%r23486,%r23487,%r23488,%r23489,%r23490,%r23491,%r23492,%r23493,%r23494,%r23495,%r23496,%r23497,%r23498,%r23499,%r23500,%r23501,%r23502,%r23503,%r23504,%r23505,%r23506,%r23507,%r23508,%r23509,%r23510,%r23511,%r23512,%r23513,%r23514,%r23515,%r23516,%r23517,%r23518,%r23519,%r23520,%r23521,%r23522,%r23523,%r23524,%r23525,%r23526,%r23527,%r23528,%r23529,%r23530,%r23531,%r23532,%r23533,%r23534,%r23535,%r23536,%r23537,%r23538,%r23539,%r23540,%r23541,%r23542,%r23543,%r23544,%r23545,%r23546,%r23547,%r23548,%r23549,%r23550,%r23551,
%r23552,%r23553,%r23554,%r23555,%r23556,%r23557,%r21382,%r21383,%r21384 2026-02-21T09:21:00.4524416Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:21:00.4524477Z // end inline asm 2026-02-21T09:21:00.4524531Z $L__tmp20: 2026-02-21T09:21:00.4524750Z .loc 1 43 78 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:43:78 2026-02-21T09:21:00.4524816Z add.s32 %r21717, %r23301, 1; 2026-02-21T09:21:00.4524886Z setp.gt.s32 %p62, %r21717, 4; 2026-02-21T09:21:00.4524955Z selp.b32 %r23301, 0, %r21717, %p62; 2026-02-21T09:21:00.4525205Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4525277Z add.s64 %rd675, %rd722, %rd39; 2026-02-21T09:21:00.4525339Z add.s64 %rd665, %rd675, 320; 2026-02-21T09:21:00.4525401Z add.s64 %rd676, %rd722, %rd38; 2026-02-21T09:21:00.4525469Z add.s64 %rd666, %rd676, 320; 2026-02-21T09:21:00.4525536Z add.s64 %rd677, %rd722, %rd37; 2026-02-21T09:21:00.4525596Z add.s64 %rd667, %rd677, 320; 2026-02-21T09:21:00.4525793Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4525859Z add.s64 %rd678, %rd722, %rd36; 2026-02-21T09:21:00.4525918Z add.s64 %rd668, %rd678, 320; 2026-02-21T09:21:00.4525978Z shl.b32 %r21718, %r23301, 13; 2026-02-21T09:21:00.4526046Z add.s32 %r21719, %r22237, %r21718; 2026-02-21T09:21:00.4526108Z add.s32 %r21644, %r21719, %r141; 2026-02-21T09:21:00.4526170Z selp.b32 %r21645, 8, 0, %p60; 2026-02-21T09:21:00.4526229Z // begin inline asm 2026-02-21T09:21:00.4526385Z cp.async.ca.shared.global [ %r21644 + 0 ], [ %rd665 + 0 ], 0x8, %r21645; 2026-02-21T09:21:00.4526442Z // end inline asm 2026-02-21T09:21:00.4526627Z add.s32 %r21646, %r21644, 2048; 2026-02-21T09:21:00.4526691Z // begin inline asm 2026-02-21T09:21:00.4526834Z cp.async.ca.shared.global [ %r21646 + 0 ], [ %rd666 + 0 ], 0x8, %r21645; 2026-02-21T09:21:00.4526890Z // end inline asm 2026-02-21T09:21:00.4526952Z add.s32 %r21648, %r21644, 4096; 2026-02-21T09:21:00.4527010Z // begin 
inline asm 2026-02-21T09:21:00.4527146Z cp.async.ca.shared.global [ %r21648 + 0 ], [ %rd667 + 0 ], 0x8, %r21645; 2026-02-21T09:21:00.4527201Z // end inline asm 2026-02-21T09:21:00.4527262Z add.s32 %r21650, %r21644, 6144; 2026-02-21T09:21:00.4527320Z // begin inline asm 2026-02-21T09:21:00.4527455Z cp.async.ca.shared.global [ %r21650 + 0 ], [ %rd668 + 0 ], 0x8, %r21645; 2026-02-21T09:21:00.4527512Z // end inline asm 2026-02-21T09:21:00.4527577Z cp.async.commit_group; 2026-02-21T09:21:00.4527777Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4527845Z add.s32 %r21720, %r23299, -65536; 2026-02-21T09:21:00.4527918Z cvt.s64.s32 %rd679, %r21720; 2026-02-21T09:21:00.4527984Z add.s64 %rd669, %rd45, %rd679; 2026-02-21T09:21:00.4528183Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4528246Z shl.b32 %r21721, %r23301, 11; 2026-02-21T09:21:00.4528391Z add.s32 %r21652, %r148, %r21721; 2026-02-21T09:21:00.4528449Z // begin inline asm 2026-02-21T09:21:00.4528587Z cp.async.ca.shared.global [ %r21652 + 0 ], [ %rd669 + 0 ], 0x8, %r21645; 2026-02-21T09:21:00.4528642Z // end inline asm 2026-02-21T09:21:00.4528705Z cp.async.commit_group; 2026-02-21T09:21:00.4528962Z .loc 1 51 32 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:32 2026-02-21T09:21:00.4529029Z add.s64 %rd670, %rd675, 352; 2026-02-21T09:21:00.4529090Z add.s64 %rd671, %rd676, 352; 2026-02-21T09:21:00.4529150Z add.s64 %rd672, %rd677, 352; 2026-02-21T09:21:00.4529350Z .loc 1 51 80 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:51:80 2026-02-21T09:21:00.4529409Z add.s64 %rd673, %rd678, 352; 2026-02-21T09:21:00.4529535Z add.s32 %r21722, %r21693, %r21718; 2026-02-21T09:21:00.4529601Z add.s32 %r21654, %r21722, %r141; 2026-02-21T09:21:00.4529658Z // begin inline asm 2026-02-21T09:21:00.4529813Z cp.async.ca.shared.global [ %r21654 + 0 ], [ %rd670 + 0 ], 0x8, %r21645; 2026-02-21T09:21:00.4529873Z // end 
inline asm 2026-02-21T09:21:00.4529940Z add.s32 %r21656, %r21654, 2048; 2026-02-21T09:21:00.4529998Z // begin inline asm 2026-02-21T09:21:00.4530137Z cp.async.ca.shared.global [ %r21656 + 0 ], [ %rd671 + 0 ], 0x8, %r21645; 2026-02-21T09:21:00.4530197Z // end inline asm 2026-02-21T09:21:00.4530256Z add.s32 %r21658, %r21654, 4096; 2026-02-21T09:21:00.4530313Z // begin inline asm 2026-02-21T09:21:00.4530448Z cp.async.ca.shared.global [ %r21658 + 0 ], [ %rd672 + 0 ], 0x8, %r21645; 2026-02-21T09:21:00.4530570Z // end inline asm 2026-02-21T09:21:00.4530633Z add.s32 %r21660, %r21654, 6144; 2026-02-21T09:21:00.4530689Z // begin inline asm 2026-02-21T09:21:00.4530827Z cp.async.ca.shared.global [ %r21660 + 0 ], [ %rd673 + 0 ], 0x8, %r21645; 2026-02-21T09:21:00.4530882Z // end inline asm 2026-02-21T09:21:00.4530946Z cp.async.commit_group; 2026-02-21T09:21:00.4531150Z .loc 1 57 34 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:34 2026-02-21T09:21:00.4531216Z cvt.s64.s32 %rd680, %r23299; 2026-02-21T09:21:00.4531280Z add.s64 %rd674, %rd45, %rd680; 2026-02-21T09:21:00.4531478Z .loc 1 57 87 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:57:87 2026-02-21T09:21:00.4531547Z add.s32 %r21662, %r154, %r21721; 2026-02-21T09:21:00.4531605Z // begin inline asm 2026-02-21T09:21:00.4531741Z cp.async.ca.shared.global [ %r21662 + 0 ], [ %rd674 + 0 ], 0x8, %r21645; 2026-02-21T09:21:00.4531799Z // end inline asm 2026-02-21T09:21:00.4531865Z cp.async.commit_group; 2026-02-21T09:21:00.4532066Z .loc 1 43 78 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:43:78 2026-02-21T09:21:00.4532137Z add.s32 %r23299, %r23299, 131072; 2026-02-21T09:21:00.4532202Z add.s64 %rd722, %rd722, 64; 2026-02-21T09:21:00.4532270Z setp.lt.u64 %p63, %rd723, 496; 2026-02-21T09:21:00.4532332Z @%p63 bra $L__BB0_14; 2026-02-21T09:21:00.4532452Z // %bb.15: // in Loop: Header=BB0_13 Depth=1 2026-02-21T09:21:00.4532668Z .loc 1 36 32 // 
cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:36:32 2026-02-21T09:21:00.4532732Z or.b32 %r22011, %r2342, %r11; 2026-02-21T09:21:00.4532796Z or.b32 %r22012, %r2342, %r12; 2026-02-21T09:21:00.4532853Z or.b32 %r22013, %r2342, %r13; 2026-02-21T09:21:00.4532923Z or.b32 %r22014, %r2342, %r14; 2026-02-21T09:21:00.4532987Z or.b32 %r22015, %r2342, %r15; 2026-02-21T09:21:00.4533045Z or.b32 %r22016, %r2342, %r16; 2026-02-21T09:21:00.4533104Z or.b32 %r22017, %r2342, %r17; 2026-02-21T09:21:00.4533162Z or.b32 %r22018, %r2342, %r18; 2026-02-21T09:21:00.4533223Z or.b32 %r22019, %r2342, %r19; 2026-02-21T09:21:00.4533282Z or.b32 %r22020, %r2342, %r20; 2026-02-21T09:21:00.4533340Z or.b32 %r22021, %r2342, %r21; 2026-02-21T09:21:00.4533400Z or.b32 %r22022, %r2342, %r22; 2026-02-21T09:21:00.4533457Z or.b32 %r22023, %r2342, %r23; 2026-02-21T09:21:00.4533586Z or.b32 %r22024, %r2342, %r24; 2026-02-21T09:21:00.4533642Z or.b32 %r22025, %r2342, %r25; 2026-02-21T09:21:00.4533704Z or.b32 %r22026, %r2342, %r26; 2026-02-21T09:21:00.4533761Z or.b32 %r22027, %r2342, %r27; 2026-02-21T09:21:00.4533821Z or.b32 %r22028, %r2342, %r28; 2026-02-21T09:21:00.4533931Z or.b32 %r22029, %r2342, %r29; 2026-02-21T09:21:00.4533988Z or.b32 %r22030, %r2342, %r30; 2026-02-21T09:21:00.4534044Z or.b32 %r22031, %r2342, %r31; 2026-02-21T09:21:00.4534100Z or.b32 %r22032, %r2342, %r32; 2026-02-21T09:21:00.4534159Z or.b32 %r22033, %r2342, %r33; 2026-02-21T09:21:00.4534217Z or.b32 %r22034, %r2342, %r34; 2026-02-21T09:21:00.4534273Z or.b32 %r22035, %r2342, %r35; 2026-02-21T09:21:00.4534334Z or.b32 %r22036, %r2342, %r36; 2026-02-21T09:21:00.4534441Z or.b32 %r22037, %r2342, %r37; 2026-02-21T09:21:00.4534499Z or.b32 %r22038, %r2342, %r38; 2026-02-21T09:21:00.4534559Z or.b32 %r22039, %r2342, %r39; 2026-02-21T09:21:00.4534616Z or.b32 %r22040, %r2342, %r40; 2026-02-21T09:21:00.4534676Z or.b32 %r22041, %r2342, %r41; 2026-02-21T09:21:00.4534744Z or.b32 %r22042, %r2342, %r42; 2026-02-21T09:21:00.4534954Z .loc 1 43 78 // 
cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:43:78 2026-02-21T09:21:00.4535024Z cp.async.wait_group 0; 2026-02-21T09:21:00.4535081Z bar.sync 0; 2026-02-21T09:21:00.4535279Z .loc 1 90 28 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:90:28 2026-02-21T09:21:00.4535369Z cvt.rn.bf16x2.f32 %r22043, %r23303, %r23302; 2026-02-21T09:21:00.4535496Z cvt.rn.bf16x2.f32 %r22044, %r23305, %r23304; 2026-02-21T09:21:00.4535574Z cvt.rn.bf16x2.f32 %r22045, %r23307, %r23306; 2026-02-21T09:21:00.4535653Z cvt.rn.bf16x2.f32 %r22046, %r23309, %r23308; 2026-02-21T09:21:00.4535729Z cvt.rn.bf16x2.f32 %r22047, %r23311, %r23310; 2026-02-21T09:21:00.4535806Z cvt.rn.bf16x2.f32 %r22048, %r23313, %r23312; 2026-02-21T09:21:00.4535882Z cvt.rn.bf16x2.f32 %r22049, %r23315, %r23314; 2026-02-21T09:21:00.4535958Z cvt.rn.bf16x2.f32 %r22050, %r23317, %r23316; 2026-02-21T09:21:00.4536033Z cvt.rn.bf16x2.f32 %r22051, %r23319, %r23318; 2026-02-21T09:21:00.4536108Z cvt.rn.bf16x2.f32 %r22052, %r23321, %r23320; 2026-02-21T09:21:00.4536181Z cvt.rn.bf16x2.f32 %r22053, %r23323, %r23322; 2026-02-21T09:21:00.4536260Z cvt.rn.bf16x2.f32 %r22054, %r23325, %r23324; 2026-02-21T09:21:00.4536334Z cvt.rn.bf16x2.f32 %r22055, %r23327, %r23326; 2026-02-21T09:21:00.4536412Z cvt.rn.bf16x2.f32 %r22056, %r23329, %r23328; 2026-02-21T09:21:00.4536603Z cvt.rn.bf16x2.f32 %r22057, %r23331, %r23330; 2026-02-21T09:21:00.4536684Z cvt.rn.bf16x2.f32 %r22058, %r23333, %r23332; 2026-02-21T09:21:00.4536763Z cvt.rn.bf16x2.f32 %r22059, %r23335, %r23334; 2026-02-21T09:21:00.4536838Z cvt.rn.bf16x2.f32 %r22060, %r23337, %r23336; 2026-02-21T09:21:00.4536911Z cvt.rn.bf16x2.f32 %r22061, %r23339, %r23338; 2026-02-21T09:21:00.4536986Z cvt.rn.bf16x2.f32 %r22062, %r23341, %r23340; 2026-02-21T09:21:00.4537060Z cvt.rn.bf16x2.f32 %r22063, %r23343, %r23342; 2026-02-21T09:21:00.4537134Z cvt.rn.bf16x2.f32 %r22064, %r23345, %r23344; 2026-02-21T09:21:00.4537209Z cvt.rn.bf16x2.f32 %r22065, %r23347, %r23346; 2026-02-21T09:21:00.4537294Z 
cvt.rn.bf16x2.f32 %r22066, %r23349, %r23348; 2026-02-21T09:21:00.4537377Z cvt.rn.bf16x2.f32 %r22067, %r23351, %r23350; 2026-02-21T09:21:00.4537451Z cvt.rn.bf16x2.f32 %r22068, %r23353, %r23352; 2026-02-21T09:21:00.4537536Z cvt.rn.bf16x2.f32 %r22069, %r23355, %r23354; 2026-02-21T09:21:00.4537615Z cvt.rn.bf16x2.f32 %r22070, %r23357, %r23356; 2026-02-21T09:21:00.4537693Z cvt.rn.bf16x2.f32 %r22071, %r23359, %r23358; 2026-02-21T09:21:00.4537774Z cvt.rn.bf16x2.f32 %r22072, %r23361, %r23360; 2026-02-21T09:21:00.4537849Z cvt.rn.bf16x2.f32 %r22073, %r23363, %r23362; 2026-02-21T09:21:00.4537924Z cvt.rn.bf16x2.f32 %r22074, %r23365, %r23364; 2026-02-21T09:21:00.4537999Z cvt.rn.bf16x2.f32 %r22075, %r23367, %r23366; 2026-02-21T09:21:00.4538085Z cvt.rn.bf16x2.f32 %r22076, %r23369, %r23368; 2026-02-21T09:21:00.4538243Z cvt.rn.bf16x2.f32 %r22077, %r23371, %r23370; 2026-02-21T09:21:00.4538318Z cvt.rn.bf16x2.f32 %r22078, %r23373, %r23372; 2026-02-21T09:21:00.4538393Z cvt.rn.bf16x2.f32 %r22079, %r23375, %r23374; 2026-02-21T09:21:00.4538468Z cvt.rn.bf16x2.f32 %r22080, %r23377, %r23376; 2026-02-21T09:21:00.4538603Z cvt.rn.bf16x2.f32 %r22081, %r23379, %r23378; 2026-02-21T09:21:00.4538680Z cvt.rn.bf16x2.f32 %r22082, %r23381, %r23380; 2026-02-21T09:21:00.4538756Z cvt.rn.bf16x2.f32 %r22083, %r23383, %r23382; 2026-02-21T09:21:00.4538835Z cvt.rn.bf16x2.f32 %r22084, %r23385, %r23384; 2026-02-21T09:21:00.4538911Z cvt.rn.bf16x2.f32 %r22085, %r23387, %r23386; 2026-02-21T09:21:00.4538988Z cvt.rn.bf16x2.f32 %r22086, %r23389, %r23388; 2026-02-21T09:21:00.4539141Z cvt.rn.bf16x2.f32 %r22087, %r23391, %r23390; 2026-02-21T09:21:00.4539219Z cvt.rn.bf16x2.f32 %r22088, %r23393, %r23392; 2026-02-21T09:21:00.4539297Z cvt.rn.bf16x2.f32 %r22089, %r23395, %r23394; 2026-02-21T09:21:00.4539371Z cvt.rn.bf16x2.f32 %r22090, %r23397, %r23396; 2026-02-21T09:21:00.4539450Z cvt.rn.bf16x2.f32 %r22091, %r23399, %r23398; 2026-02-21T09:21:00.4539528Z cvt.rn.bf16x2.f32 %r22092, %r23401, %r23400; 2026-02-21T09:21:00.4539603Z 
cvt.rn.bf16x2.f32 %r22093, %r23403, %r23402; 2026-02-21T09:21:00.4539680Z cvt.rn.bf16x2.f32 %r22094, %r23405, %r23404; 2026-02-21T09:21:00.4539755Z cvt.rn.bf16x2.f32 %r22095, %r23407, %r23406; 2026-02-21T09:21:00.4539834Z cvt.rn.bf16x2.f32 %r22096, %r23409, %r23408; 2026-02-21T09:21:00.4539909Z cvt.rn.bf16x2.f32 %r22097, %r23411, %r23410; 2026-02-21T09:21:00.4540045Z cvt.rn.bf16x2.f32 %r22098, %r23413, %r23412; 2026-02-21T09:21:00.4540127Z cvt.rn.bf16x2.f32 %r22099, %r23415, %r23414; 2026-02-21T09:21:00.4540201Z cvt.rn.bf16x2.f32 %r22100, %r23417, %r23416; 2026-02-21T09:21:00.4540277Z cvt.rn.bf16x2.f32 %r22101, %r23419, %r23418; 2026-02-21T09:21:00.4540353Z cvt.rn.bf16x2.f32 %r22102, %r23421, %r23420; 2026-02-21T09:21:00.4540428Z cvt.rn.bf16x2.f32 %r22103, %r23423, %r23422; 2026-02-21T09:21:00.4540504Z cvt.rn.bf16x2.f32 %r22104, %r23425, %r23424; 2026-02-21T09:21:00.4540576Z cvt.rn.bf16x2.f32 %r22105, %r23427, %r23426; 2026-02-21T09:21:00.4540652Z cvt.rn.bf16x2.f32 %r22106, %r23429, %r23428; 2026-02-21T09:21:00.4540726Z cvt.rn.bf16x2.f32 %r22107, %r23431, %r23430; 2026-02-21T09:21:00.4540803Z cvt.rn.bf16x2.f32 %r22108, %r23433, %r23432; 2026-02-21T09:21:00.4540883Z cvt.rn.bf16x2.f32 %r22109, %r23435, %r23434; 2026-02-21T09:21:00.4540967Z cvt.rn.bf16x2.f32 %r22110, %r23437, %r23436; 2026-02-21T09:21:00.4541043Z cvt.rn.bf16x2.f32 %r22111, %r23439, %r23438; 2026-02-21T09:21:00.4541119Z cvt.rn.bf16x2.f32 %r22112, %r23441, %r23440; 2026-02-21T09:21:00.4541196Z cvt.rn.bf16x2.f32 %r22113, %r23443, %r23442; 2026-02-21T09:21:00.4541270Z cvt.rn.bf16x2.f32 %r22114, %r23445, %r23444; 2026-02-21T09:21:00.4541345Z cvt.rn.bf16x2.f32 %r22115, %r23447, %r23446; 2026-02-21T09:21:00.4541420Z cvt.rn.bf16x2.f32 %r22116, %r23449, %r23448; 2026-02-21T09:21:00.4541493Z cvt.rn.bf16x2.f32 %r22117, %r23451, %r23450; 2026-02-21T09:21:00.4541569Z cvt.rn.bf16x2.f32 %r22118, %r23453, %r23452; 2026-02-21T09:21:00.4541646Z cvt.rn.bf16x2.f32 %r22119, %r23455, %r23454; 2026-02-21T09:21:00.4541720Z 
cvt.rn.bf16x2.f32 %r22120, %r23457, %r23456; 2026-02-21T09:21:00.4541796Z cvt.rn.bf16x2.f32 %r22121, %r23459, %r23458; 2026-02-21T09:21:00.4541871Z cvt.rn.bf16x2.f32 %r22122, %r23461, %r23460; 2026-02-21T09:21:00.4541948Z cvt.rn.bf16x2.f32 %r22123, %r23463, %r23462; 2026-02-21T09:21:00.4542021Z cvt.rn.bf16x2.f32 %r22124, %r23465, %r23464; 2026-02-21T09:21:00.4542096Z cvt.rn.bf16x2.f32 %r22125, %r23467, %r23466; 2026-02-21T09:21:00.4542174Z cvt.rn.bf16x2.f32 %r22126, %r23469, %r23468; 2026-02-21T09:21:00.4542247Z cvt.rn.bf16x2.f32 %r22127, %r23471, %r23470; 2026-02-21T09:21:00.4542321Z cvt.rn.bf16x2.f32 %r22128, %r23473, %r23472; 2026-02-21T09:21:00.4542400Z cvt.rn.bf16x2.f32 %r22129, %r23475, %r23474; 2026-02-21T09:21:00.4542473Z cvt.rn.bf16x2.f32 %r22130, %r23477, %r23476; 2026-02-21T09:21:00.4542608Z cvt.rn.bf16x2.f32 %r22131, %r23479, %r23478; 2026-02-21T09:21:00.4542684Z cvt.rn.bf16x2.f32 %r22132, %r23481, %r23480; 2026-02-21T09:21:00.4542759Z cvt.rn.bf16x2.f32 %r22133, %r23483, %r23482; 2026-02-21T09:21:00.4542832Z cvt.rn.bf16x2.f32 %r22134, %r23485, %r23484; 2026-02-21T09:21:00.4542954Z cvt.rn.bf16x2.f32 %r22135, %r23487, %r23486; 2026-02-21T09:21:00.4543031Z cvt.rn.bf16x2.f32 %r22136, %r23489, %r23488; 2026-02-21T09:21:00.4543104Z cvt.rn.bf16x2.f32 %r22137, %r23491, %r23490; 2026-02-21T09:21:00.4543180Z cvt.rn.bf16x2.f32 %r22138, %r23493, %r23492; 2026-02-21T09:21:00.4543258Z cvt.rn.bf16x2.f32 %r22139, %r23495, %r23494; 2026-02-21T09:21:00.4543332Z cvt.rn.bf16x2.f32 %r22140, %r23497, %r23496; 2026-02-21T09:21:00.4543455Z cvt.rn.bf16x2.f32 %r22141, %r23499, %r23498; 2026-02-21T09:21:00.4543530Z cvt.rn.bf16x2.f32 %r22142, %r23501, %r23500; 2026-02-21T09:21:00.4543612Z cvt.rn.bf16x2.f32 %r22143, %r23503, %r23502; 2026-02-21T09:21:00.4543689Z cvt.rn.bf16x2.f32 %r22144, %r23505, %r23504; 2026-02-21T09:21:00.4543762Z cvt.rn.bf16x2.f32 %r22145, %r23507, %r23506; 2026-02-21T09:21:00.4543838Z cvt.rn.bf16x2.f32 %r22146, %r23509, %r23508; 2026-02-21T09:21:00.4543911Z 
cvt.rn.bf16x2.f32 %r22147, %r23511, %r23510; 2026-02-21T09:21:00.4543986Z cvt.rn.bf16x2.f32 %r22148, %r23513, %r23512; 2026-02-21T09:21:00.4544062Z cvt.rn.bf16x2.f32 %r22149, %r23515, %r23514; 2026-02-21T09:21:00.4544135Z cvt.rn.bf16x2.f32 %r22150, %r23517, %r23516; 2026-02-21T09:21:00.4544207Z cvt.rn.bf16x2.f32 %r22151, %r23519, %r23518; 2026-02-21T09:21:00.4544331Z cvt.rn.bf16x2.f32 %r22152, %r23521, %r23520; 2026-02-21T09:21:00.4544412Z cvt.rn.bf16x2.f32 %r22153, %r23523, %r23522; 2026-02-21T09:21:00.4544486Z cvt.rn.bf16x2.f32 %r22154, %r23525, %r23524; 2026-02-21T09:21:00.4544560Z cvt.rn.bf16x2.f32 %r22155, %r23527, %r23526; 2026-02-21T09:21:00.4544637Z cvt.rn.bf16x2.f32 %r22156, %r23529, %r23528; 2026-02-21T09:21:00.4544721Z cvt.rn.bf16x2.f32 %r22157, %r23531, %r23530; 2026-02-21T09:21:00.4544802Z cvt.rn.bf16x2.f32 %r22158, %r23533, %r23532; 2026-02-21T09:21:00.4544884Z cvt.rn.bf16x2.f32 %r22159, %r23535, %r23534; 2026-02-21T09:21:00.4544959Z cvt.rn.bf16x2.f32 %r22160, %r23537, %r23536; 2026-02-21T09:21:00.4545032Z cvt.rn.bf16x2.f32 %r22161, %r23539, %r23538; 2026-02-21T09:21:00.4545107Z cvt.rn.bf16x2.f32 %r22162, %r23541, %r23540; 2026-02-21T09:21:00.4545184Z cvt.rn.bf16x2.f32 %r22163, %r23543, %r23542; 2026-02-21T09:21:00.4545258Z cvt.rn.bf16x2.f32 %r22164, %r23545, %r23544; 2026-02-21T09:21:00.4545332Z cvt.rn.bf16x2.f32 %r22165, %r23547, %r23546; 2026-02-21T09:21:00.4545410Z cvt.rn.bf16x2.f32 %r22166, %r23549, %r23548; 2026-02-21T09:21:00.4545483Z cvt.rn.bf16x2.f32 %r22167, %r23551, %r23550; 2026-02-21T09:21:00.4545557Z cvt.rn.bf16x2.f32 %r22168, %r23553, %r23552; 2026-02-21T09:21:00.4545634Z cvt.rn.bf16x2.f32 %r22169, %r23555, %r23554; 2026-02-21T09:21:00.4545709Z cvt.rn.bf16x2.f32 %r22170, %r23557, %r23556; 2026-02-21T09:21:00.4545920Z .loc 1 91 43 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:91:43 2026-02-21T09:21:00.4545983Z shl.b32 %r22171, %r22011, 13; 2026-02-21T09:21:00.4546045Z shl.b32 %r22172, %r22012, 13; 
2026-02-21T09:21:00.4546104Z shl.b32 %r22173, %r22013, 13; 2026-02-21T09:21:00.4546164Z shl.b32 %r22174, %r22014, 13; 2026-02-21T09:21:00.4546224Z shl.b32 %r22175, %r22015, 13; 2026-02-21T09:21:00.4546283Z shl.b32 %r22176, %r22016, 13; 2026-02-21T09:21:00.4546342Z shl.b32 %r22177, %r22017, 13; 2026-02-21T09:21:00.4546400Z shl.b32 %r22178, %r22018, 13; 2026-02-21T09:21:00.4546590Z shl.b32 %r22179, %r22019, 13; 2026-02-21T09:21:00.4546658Z shl.b32 %r22180, %r22020, 13; 2026-02-21T09:21:00.4546716Z shl.b32 %r22181, %r22021, 13; 2026-02-21T09:21:00.4546779Z shl.b32 %r22182, %r22022, 13; 2026-02-21T09:21:00.4546838Z shl.b32 %r22183, %r22023, 13; 2026-02-21T09:21:00.4546896Z shl.b32 %r22184, %r22024, 13; 2026-02-21T09:21:00.4546955Z shl.b32 %r22185, %r22025, 13; 2026-02-21T09:21:00.4547099Z shl.b32 %r22186, %r22026, 13; 2026-02-21T09:21:00.4547161Z shl.b32 %r22187, %r22027, 13; 2026-02-21T09:21:00.4547222Z shl.b32 %r22188, %r22028, 13; 2026-02-21T09:21:00.4547279Z shl.b32 %r22189, %r22029, 13; 2026-02-21T09:21:00.4547337Z shl.b32 %r22190, %r22030, 13; 2026-02-21T09:21:00.4547458Z shl.b32 %r22191, %r22031, 13; 2026-02-21T09:21:00.4547518Z shl.b32 %r22192, %r22032, 13; 2026-02-21T09:21:00.4547575Z shl.b32 %r22193, %r22033, 13; 2026-02-21T09:21:00.4547633Z shl.b32 %r22194, %r22034, 13; 2026-02-21T09:21:00.4547693Z shl.b32 %r22195, %r22035, 13; 2026-02-21T09:21:00.4547752Z shl.b32 %r22196, %r22036, 13; 2026-02-21T09:21:00.4547810Z shl.b32 %r22197, %r22037, 13; 2026-02-21T09:21:00.4547868Z shl.b32 %r22198, %r22038, 13; 2026-02-21T09:21:00.4547989Z shl.b32 %r22199, %r22039, 13; 2026-02-21T09:21:00.4548050Z shl.b32 %r22200, %r22040, 13; 2026-02-21T09:21:00.4548107Z shl.b32 %r22201, %r22041, 13; 2026-02-21T09:21:00.4548179Z shl.b32 %r22202, %r22042, 13; 2026-02-21T09:21:00.4548471Z .loc 1 91 50 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:91:50 2026-02-21T09:21:00.4548537Z add.s32 %r22203, %r22171, %r2341; 2026-02-21T09:21:00.4548599Z add.s32 %r22204, %r22172, 
%r2341; 2026-02-21T09:21:00.4548664Z add.s32 %r22205, %r22173, %r2341; 2026-02-21T09:21:00.4548723Z add.s32 %r22206, %r22174, %r2341; 2026-02-21T09:21:00.4548782Z add.s32 %r22207, %r22175, %r2341; 2026-02-21T09:21:00.4548844Z add.s32 %r22208, %r22176, %r2341; 2026-02-21T09:21:00.4548901Z add.s32 %r22209, %r22177, %r2341; 2026-02-21T09:21:00.4549030Z add.s32 %r22210, %r22178, %r2341; 2026-02-21T09:21:00.4549096Z add.s32 %r22211, %r22179, %r2341; 2026-02-21T09:21:00.4549156Z add.s32 %r22212, %r22180, %r2341; 2026-02-21T09:21:00.4549216Z add.s32 %r22213, %r22181, %r2341; 2026-02-21T09:21:00.4549275Z add.s32 %r22214, %r22182, %r2341; 2026-02-21T09:21:00.4549339Z add.s32 %r22215, %r22183, %r2341; 2026-02-21T09:21:00.4549398Z add.s32 %r22216, %r22184, %r2341; 2026-02-21T09:21:00.4549470Z add.s32 %r22217, %r22185, %r2341; 2026-02-21T09:21:00.4549533Z add.s32 %r22218, %r22186, %r2341; 2026-02-21T09:21:00.4549594Z add.s32 %r22219, %r22187, %r2341; 2026-02-21T09:21:00.4549652Z add.s32 %r22220, %r22188, %r2341; 2026-02-21T09:21:00.4549713Z add.s32 %r22221, %r22189, %r2341; 2026-02-21T09:21:00.4549775Z add.s32 %r22222, %r22190, %r2341; 2026-02-21T09:21:00.4549834Z add.s32 %r22223, %r22191, %r2341; 2026-02-21T09:21:00.4549893Z add.s32 %r22224, %r22192, %r2341; 2026-02-21T09:21:00.4549956Z add.s32 %r22225, %r22193, %r2341; 2026-02-21T09:21:00.4550018Z add.s32 %r22226, %r22194, %r2341; 2026-02-21T09:21:00.4550079Z add.s32 %r22227, %r22195, %r2341; 2026-02-21T09:21:00.4550141Z add.s32 %r22228, %r22196, %r2341; 2026-02-21T09:21:00.4550202Z add.s32 %r22229, %r22197, %r2341; 2026-02-21T09:21:00.4550261Z add.s32 %r22230, %r22198, %r2341; 2026-02-21T09:21:00.4550319Z add.s32 %r22231, %r22199, %r2341; 2026-02-21T09:21:00.4550382Z add.s32 %r22232, %r22200, %r2341; 2026-02-21T09:21:00.4550440Z add.s32 %r22233, %r22201, %r2341; 2026-02-21T09:21:00.4550499Z add.s32 %r22234, %r22202, %r2341; 2026-02-21T09:21:00.4550700Z .loc 1 91 22 // 
cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:91:22 2026-02-21T09:21:00.4550776Z mad.wide.s32 %rd681, %r22203, 2, %rd46; 2026-02-21T09:21:00.4550845Z mad.wide.s32 %rd682, %r22204, 2, %rd46; 2026-02-21T09:21:00.4550911Z mad.wide.s32 %rd683, %r22205, 2, %rd46; 2026-02-21T09:21:00.4550982Z mad.wide.s32 %rd684, %r22206, 2, %rd46; 2026-02-21T09:21:00.4551050Z mad.wide.s32 %rd685, %r22207, 2, %rd46; 2026-02-21T09:21:00.4551117Z mad.wide.s32 %rd686, %r22208, 2, %rd46; 2026-02-21T09:21:00.4551185Z mad.wide.s32 %rd687, %r22209, 2, %rd46; 2026-02-21T09:21:00.4551257Z mad.wide.s32 %rd688, %r22210, 2, %rd46; 2026-02-21T09:21:00.4551334Z mad.wide.s32 %rd689, %r22211, 2, %rd46; 2026-02-21T09:21:00.4551404Z mad.wide.s32 %rd690, %r22212, 2, %rd46; 2026-02-21T09:21:00.4551533Z mad.wide.s32 %rd691, %r22213, 2, %rd46; 2026-02-21T09:21:00.4551597Z mad.wide.s32 %rd692, %r22214, 2, %rd46; 2026-02-21T09:21:00.4551663Z mad.wide.s32 %rd693, %r22215, 2, %rd46; 2026-02-21T09:21:00.4551732Z mad.wide.s32 %rd694, %r22216, 2, %rd46; 2026-02-21T09:21:00.4551845Z mad.wide.s32 %rd695, %r22217, 2, %rd46; 2026-02-21T09:21:00.4551911Z mad.wide.s32 %rd696, %r22218, 2, %rd46; 2026-02-21T09:21:00.4551978Z mad.wide.s32 %rd697, %r22219, 2, %rd46; 2026-02-21T09:21:00.4552044Z mad.wide.s32 %rd698, %r22220, 2, %rd46; 2026-02-21T09:21:00.4552124Z mad.wide.s32 %rd699, %r22221, 2, %rd46; 2026-02-21T09:21:00.4552196Z mad.wide.s32 %rd700, %r22222, 2, %rd46; 2026-02-21T09:21:00.4552263Z mad.wide.s32 %rd701, %r22223, 2, %rd46; 2026-02-21T09:21:00.4552386Z mad.wide.s32 %rd702, %r22224, 2, %rd46; 2026-02-21T09:21:00.4552455Z mad.wide.s32 %rd703, %r22225, 2, %rd46; 2026-02-21T09:21:00.4552525Z mad.wide.s32 %rd704, %r22226, 2, %rd46; 2026-02-21T09:21:00.4552590Z mad.wide.s32 %rd705, %r22227, 2, %rd46; 2026-02-21T09:21:00.4552658Z mad.wide.s32 %rd706, %r22228, 2, %rd46; 2026-02-21T09:21:00.4552726Z mad.wide.s32 %rd707, %r22229, 2, %rd46; 2026-02-21T09:21:00.4552791Z mad.wide.s32 %rd708, %r22230, 2, %rd46; 
2026-02-21T09:21:00.4552855Z mad.wide.s32 %rd709, %r22231, 2, %rd46; 2026-02-21T09:21:00.4552924Z mad.wide.s32 %rd710, %r22232, 2, %rd46; 2026-02-21T09:21:00.4552992Z mad.wide.s32 %rd711, %r22233, 2, %rd46; 2026-02-21T09:21:00.4553058Z mad.wide.s32 %rd712, %r22234, 2, %rd46; 2026-02-21T09:21:00.4553303Z .loc 1 91 81 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:91:81 2026-02-21T09:21:00.4553432Z st.shared.v4.b32 [%r211], {%r22043, %r22045, %r22047, %r22049}; 2026-02-21T09:21:00.4553545Z st.shared.v4.b32 [%r212], {%r22051, %r22053, %r22055, %r22057}; 2026-02-21T09:21:00.4553653Z st.shared.v4.b32 [%r213], {%r22059, %r22061, %r22063, %r22065}; 2026-02-21T09:21:00.4553772Z st.shared.v4.b32 [%r214], {%r22067, %r22069, %r22071, %r22073}; 2026-02-21T09:21:00.4553886Z st.shared.v4.b32 [%r215], {%r22075, %r22077, %r22079, %r22081}; 2026-02-21T09:21:00.4553993Z st.shared.v4.b32 [%r216], {%r22083, %r22085, %r22087, %r22089}; 2026-02-21T09:21:00.4554104Z st.shared.v4.b32 [%r217], {%r22091, %r22093, %r22095, %r22097}; 2026-02-21T09:21:00.4554211Z st.shared.v4.b32 [%r218], {%r22099, %r22101, %r22103, %r22105}; 2026-02-21T09:21:00.4554273Z bar.sync 0; 2026-02-21T09:21:00.4554336Z // begin inline asm 2026-02-21T09:21:00.4554544Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21723, %r21724, %r21725, %r21726}, [%r21727]; 2026-02-21T09:21:00.4554603Z // end inline asm 2026-02-21T09:21:00.4554664Z // begin inline asm 2026-02-21T09:21:00.4554857Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21728, %r21729, %r21730, %r21731}, [%r21732]; 2026-02-21T09:21:00.4554917Z // end inline asm 2026-02-21T09:21:00.4554977Z // begin inline asm 2026-02-21T09:21:00.4555167Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21733, %r21734, %r21735, %r21736}, [%r21737]; 2026-02-21T09:21:00.4555223Z // end inline asm 2026-02-21T09:21:00.4555279Z // begin inline asm 2026-02-21T09:21:00.4555470Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21738, %r21739, %r21740, %r21741}, [%r21742]; 
2026-02-21T09:21:00.4555525Z // end inline asm 2026-02-21T09:21:00.4555584Z // begin inline asm 2026-02-21T09:21:00.4555774Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21743, %r21744, %r21745, %r21746}, [%r21747]; 2026-02-21T09:21:00.4555829Z // end inline asm 2026-02-21T09:21:00.4555885Z // begin inline asm 2026-02-21T09:21:00.4556076Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21748, %r21749, %r21750, %r21751}, [%r21752]; 2026-02-21T09:21:00.4556131Z // end inline asm 2026-02-21T09:21:00.4556187Z // begin inline asm 2026-02-21T09:21:00.4556376Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21753, %r21754, %r21755, %r21756}, [%r21757]; 2026-02-21T09:21:00.4556435Z // end inline asm 2026-02-21T09:21:00.4556621Z // begin inline asm 2026-02-21T09:21:00.4556814Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21758, %r21759, %r21760, %r21761}, [%r21762]; 2026-02-21T09:21:00.4556955Z // end inline asm 2026-02-21T09:21:00.4557009Z bar.sync 0; 2026-02-21T09:21:00.4557119Z st.shared.v4.b32 [%r211], {%r22044, %r22046, %r22048, %r22050}; 2026-02-21T09:21:00.4557319Z st.shared.v4.b32 [%r212], {%r22052, %r22054, %r22056, %r22058}; 2026-02-21T09:21:00.4557427Z st.shared.v4.b32 [%r213], {%r22060, %r22062, %r22064, %r22066}; 2026-02-21T09:21:00.4557533Z st.shared.v4.b32 [%r214], {%r22068, %r22070, %r22072, %r22074}; 2026-02-21T09:21:00.4557639Z st.shared.v4.b32 [%r215], {%r22076, %r22078, %r22080, %r22082}; 2026-02-21T09:21:00.4557748Z st.shared.v4.b32 [%r216], {%r22084, %r22086, %r22088, %r22090}; 2026-02-21T09:21:00.4557915Z st.shared.v4.b32 [%r217], {%r22092, %r22094, %r22096, %r22098}; 2026-02-21T09:21:00.4558023Z st.shared.v4.b32 [%r218], {%r22100, %r22102, %r22104, %r22106}; 2026-02-21T09:21:00.4558080Z bar.sync 0; 2026-02-21T09:21:00.4558136Z // begin inline asm 2026-02-21T09:21:00.4558343Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21763, %r21764, %r21765, %r21766}, [%r21727]; 2026-02-21T09:21:00.4558404Z // end inline asm 2026-02-21T09:21:00.4558463Z // begin inline asm 
2026-02-21T09:21:00.4558654Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21768, %r21769, %r21770, %r21771}, [%r21732]; 2026-02-21T09:21:00.4558709Z // end inline asm 2026-02-21T09:21:00.4558767Z // begin inline asm 2026-02-21T09:21:00.4558954Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21773, %r21774, %r21775, %r21776}, [%r21737]; 2026-02-21T09:21:00.4559080Z // end inline asm 2026-02-21T09:21:00.4559141Z // begin inline asm 2026-02-21T09:21:00.4559330Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21778, %r21779, %r21780, %r21781}, [%r21742]; 2026-02-21T09:21:00.4559386Z // end inline asm 2026-02-21T09:21:00.4559445Z // begin inline asm 2026-02-21T09:21:00.4559639Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21783, %r21784, %r21785, %r21786}, [%r21747]; 2026-02-21T09:21:00.4559696Z // end inline asm 2026-02-21T09:21:00.4559751Z // begin inline asm 2026-02-21T09:21:00.4559941Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21788, %r21789, %r21790, %r21791}, [%r21752]; 2026-02-21T09:21:00.4559995Z // end inline asm 2026-02-21T09:21:00.4560051Z // begin inline asm 2026-02-21T09:21:00.4560242Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21793, %r21794, %r21795, %r21796}, [%r21757]; 2026-02-21T09:21:00.4560297Z // end inline asm 2026-02-21T09:21:00.4560352Z // begin inline asm 2026-02-21T09:21:00.4560544Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21798, %r21799, %r21800, %r21801}, [%r21762]; 2026-02-21T09:21:00.4560601Z // end inline asm 2026-02-21T09:21:00.4560655Z bar.sync 0; 2026-02-21T09:21:00.4560764Z st.shared.v4.b32 [%r211], {%r22107, %r22109, %r22111, %r22113}; 2026-02-21T09:21:00.4560877Z st.shared.v4.b32 [%r212], {%r22115, %r22117, %r22119, %r22121}; 2026-02-21T09:21:00.4560987Z st.shared.v4.b32 [%r213], {%r22123, %r22125, %r22127, %r22129}; 2026-02-21T09:21:00.4561095Z st.shared.v4.b32 [%r214], {%r22131, %r22133, %r22135, %r22137}; 2026-02-21T09:21:00.4561204Z st.shared.v4.b32 [%r215], {%r22139, %r22141, %r22143, %r22145}; 2026-02-21T09:21:00.4561310Z 
st.shared.v4.b32 [%r216], {%r22147, %r22149, %r22151, %r22153}; 2026-02-21T09:21:00.4561417Z st.shared.v4.b32 [%r217], {%r22155, %r22157, %r22159, %r22161}; 2026-02-21T09:21:00.4561536Z st.shared.v4.b32 [%r218], {%r22163, %r22165, %r22167, %r22169}; 2026-02-21T09:21:00.4561593Z bar.sync 0; 2026-02-21T09:21:00.4561651Z // begin inline asm 2026-02-21T09:21:00.4561849Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21803, %r21804, %r21805, %r21806}, [%r21727]; 2026-02-21T09:21:00.4561908Z // end inline asm 2026-02-21T09:21:00.4561965Z // begin inline asm 2026-02-21T09:21:00.4562155Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21808, %r21809, %r21810, %r21811}, [%r21732]; 2026-02-21T09:21:00.4562220Z // end inline asm 2026-02-21T09:21:00.4562277Z // begin inline asm 2026-02-21T09:21:00.4562465Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21813, %r21814, %r21815, %r21816}, [%r21737]; 2026-02-21T09:21:00.4562586Z // end inline asm 2026-02-21T09:21:00.4562643Z // begin inline asm 2026-02-21T09:21:00.4562831Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21818, %r21819, %r21820, %r21821}, [%r21742]; 2026-02-21T09:21:00.4562931Z // end inline asm 2026-02-21T09:21:00.4562991Z // begin inline asm 2026-02-21T09:21:00.4563178Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21823, %r21824, %r21825, %r21826}, [%r21747]; 2026-02-21T09:21:00.4563233Z // end inline asm 2026-02-21T09:21:00.4563295Z // begin inline asm 2026-02-21T09:21:00.4563483Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21828, %r21829, %r21830, %r21831}, [%r21752]; 2026-02-21T09:21:00.4563538Z // end inline asm 2026-02-21T09:21:00.4563641Z // begin inline asm 2026-02-21T09:21:00.4563833Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21833, %r21834, %r21835, %r21836}, [%r21757]; 2026-02-21T09:21:00.4563888Z // end inline asm 2026-02-21T09:21:00.4563946Z // begin inline asm 2026-02-21T09:21:00.4564135Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21838, %r21839, %r21840, %r21841}, [%r21762]; 2026-02-21T09:21:00.4564190Z // end 
inline asm 2026-02-21T09:21:00.4564256Z bar.sync 0; 2026-02-21T09:21:00.4564370Z st.shared.v4.b32 [%r211], {%r22108, %r22110, %r22112, %r22114}; 2026-02-21T09:21:00.4564480Z st.shared.v4.b32 [%r212], {%r22116, %r22118, %r22120, %r22122}; 2026-02-21T09:21:00.4564586Z st.shared.v4.b32 [%r213], {%r22124, %r22126, %r22128, %r22130}; 2026-02-21T09:21:00.4564740Z st.shared.v4.b32 [%r214], {%r22132, %r22134, %r22136, %r22138}; 2026-02-21T09:21:00.4564852Z st.shared.v4.b32 [%r215], {%r22140, %r22142, %r22144, %r22146}; 2026-02-21T09:21:00.4564958Z st.shared.v4.b32 [%r216], {%r22148, %r22150, %r22152, %r22154}; 2026-02-21T09:21:00.4565065Z st.shared.v4.b32 [%r217], {%r22156, %r22158, %r22160, %r22162}; 2026-02-21T09:21:00.4565174Z st.shared.v4.b32 [%r218], {%r22164, %r22166, %r22168, %r22170}; 2026-02-21T09:21:00.4565229Z bar.sync 0; 2026-02-21T09:21:00.4565287Z // begin inline asm 2026-02-21T09:21:00.4565479Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21843, %r21844, %r21845, %r21846}, [%r21727]; 2026-02-21T09:21:00.4565534Z // end inline asm 2026-02-21T09:21:00.4565589Z // begin inline asm 2026-02-21T09:21:00.4565778Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21848, %r21849, %r21850, %r21851}, [%r21732]; 2026-02-21T09:21:00.4565836Z // end inline asm 2026-02-21T09:21:00.4565892Z // begin inline asm 2026-02-21T09:21:00.4566082Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21853, %r21854, %r21855, %r21856}, [%r21737]; 2026-02-21T09:21:00.4566139Z // end inline asm 2026-02-21T09:21:00.4566196Z // begin inline asm 2026-02-21T09:21:00.4566384Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21858, %r21859, %r21860, %r21861}, [%r21742]; 2026-02-21T09:21:00.4566443Z // end inline asm 2026-02-21T09:21:00.4566617Z // begin inline asm 2026-02-21T09:21:00.4566808Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21863, %r21864, %r21865, %r21866}, [%r21747]; 2026-02-21T09:21:00.4566866Z // end inline asm 2026-02-21T09:21:00.4566925Z // begin inline asm 2026-02-21T09:21:00.4567112Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21868, %r21869, %r21870, %r21871}, [%r21752]; 2026-02-21T09:21:00.4567168Z // end inline asm 2026-02-21T09:21:00.4567228Z // begin inline asm 2026-02-21T09:21:00.4567414Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21873, %r21874, %r21875, %r21876}, [%r21757]; 2026-02-21T09:21:00.4567469Z // end inline asm 2026-02-21T09:21:00.4567539Z // begin inline asm 2026-02-21T09:21:00.4567733Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21878, %r21879, %r21880, %r21881}, [%r21762]; 2026-02-21T09:21:00.4567789Z // end inline asm 2026-02-21T09:21:00.4567846Z // begin inline asm 2026-02-21T09:21:00.4567980Z st.global.v4.b32 [ %rd681 + 0 ], { %r21723, %r21724, %r21725, %r21726 }; 2026-02-21T09:21:00.4568037Z // end inline asm 2026-02-21T09:21:00.4568094Z // begin inline asm 2026-02-21T09:21:00.4568218Z st.global.v4.b32 [ %rd682 + 0 ], { %r21763, %r21764, %r21765, %r21766 }; 2026-02-21T09:21:00.4568358Z // end inline asm 2026-02-21T09:21:00.4568416Z // begin inline asm 2026-02-21T09:21:00.4568535Z st.global.v4.b32 [ %rd683 + 0 ], { %r21728, %r21729, %r21730, %r21731 }; 2026-02-21T09:21:00.4568653Z // end inline asm 2026-02-21T09:21:00.4568709Z // begin inline asm 2026-02-21T09:21:00.4568828Z st.global.v4.b32 [ %rd684 + 0 ], { %r21768, %r21769, %r21770, %r21771 }; 2026-02-21T09:21:00.4568885Z // end inline asm 2026-02-21T09:21:00.4568940Z // begin inline asm 2026-02-21T09:21:00.4569058Z st.global.v4.b32 [ %rd685 + 0 ], { %r21733, %r21734, %r21735, %r21736 }; 2026-02-21T09:21:00.4569115Z // end inline asm 2026-02-21T09:21:00.4569170Z // begin inline asm 2026-02-21T09:21:00.4569347Z st.global.v4.b32 [ %rd686 + 0 ], { %r21773, %r21774, %r21775, %r21776 }; 2026-02-21T09:21:00.4569405Z // end inline asm 2026-02-21T09:21:00.4569465Z // begin inline asm 2026-02-21T09:21:00.4569580Z st.global.v4.b32 [ %rd687 + 0 ], { %r21738, %r21739, %r21740, %r21741 }; 2026-02-21T09:21:00.4569636Z // end inline asm 2026-02-21T09:21:00.4569695Z // begin inline 
asm 2026-02-21T09:21:00.4569811Z st.global.v4.b32 [ %rd688 + 0 ], { %r21778, %r21779, %r21780, %r21781 }; 2026-02-21T09:21:00.4569868Z // end inline asm 2026-02-21T09:21:00.4569923Z // begin inline asm 2026-02-21T09:21:00.4570042Z st.global.v4.b32 [ %rd689 + 0 ], { %r21743, %r21744, %r21745, %r21746 }; 2026-02-21T09:21:00.4570096Z // end inline asm 2026-02-21T09:21:00.4570152Z // begin inline asm 2026-02-21T09:21:00.4570334Z st.global.v4.b32 [ %rd690 + 0 ], { %r21783, %r21784, %r21785, %r21786 }; 2026-02-21T09:21:00.4570392Z // end inline asm 2026-02-21T09:21:00.4570462Z // begin inline asm 2026-02-21T09:21:00.4570589Z st.global.v4.b32 [ %rd691 + 0 ], { %r21748, %r21749, %r21750, %r21751 }; 2026-02-21T09:21:00.4570645Z // end inline asm 2026-02-21T09:21:00.4570701Z // begin inline asm 2026-02-21T09:21:00.4570819Z st.global.v4.b32 [ %rd692 + 0 ], { %r21788, %r21789, %r21790, %r21791 }; 2026-02-21T09:21:00.4570879Z // end inline asm 2026-02-21T09:21:00.4570934Z // begin inline asm 2026-02-21T09:21:00.4571050Z st.global.v4.b32 [ %rd693 + 0 ], { %r21753, %r21754, %r21755, %r21756 }; 2026-02-21T09:21:00.4571107Z // end inline asm 2026-02-21T09:21:00.4571165Z // begin inline asm 2026-02-21T09:21:00.4571281Z st.global.v4.b32 [ %rd694 + 0 ], { %r21793, %r21794, %r21795, %r21796 }; 2026-02-21T09:21:00.4571336Z // end inline asm 2026-02-21T09:21:00.4571395Z // begin inline asm 2026-02-21T09:21:00.4571512Z st.global.v4.b32 [ %rd695 + 0 ], { %r21758, %r21759, %r21760, %r21761 }; 2026-02-21T09:21:00.4571567Z // end inline asm 2026-02-21T09:21:00.4571626Z // begin inline asm 2026-02-21T09:21:00.4571745Z st.global.v4.b32 [ %rd696 + 0 ], { %r21798, %r21799, %r21800, %r21801 }; 2026-02-21T09:21:00.4571802Z // end inline asm 2026-02-21T09:21:00.4571861Z // begin inline asm 2026-02-21T09:21:00.4571976Z st.global.v4.b32 [ %rd697 + 0 ], { %r21803, %r21804, %r21805, %r21806 }; 2026-02-21T09:21:00.4572032Z // end inline asm 2026-02-21T09:21:00.4572087Z // begin inline asm 
2026-02-21T09:21:00.4572206Z st.global.v4.b32 [ %rd698 + 0 ], { %r21843, %r21844, %r21845, %r21846 }; 2026-02-21T09:21:00.4572262Z // end inline asm 2026-02-21T09:21:00.4572319Z // begin inline asm 2026-02-21T09:21:00.4572438Z st.global.v4.b32 [ %rd699 + 0 ], { %r21808, %r21809, %r21810, %r21811 }; 2026-02-21T09:21:00.4572494Z // end inline asm 2026-02-21T09:21:00.4572552Z // begin inline asm 2026-02-21T09:21:00.4572677Z st.global.v4.b32 [ %rd700 + 0 ], { %r21848, %r21849, %r21850, %r21851 }; 2026-02-21T09:21:00.4572733Z // end inline asm 2026-02-21T09:21:00.4572790Z // begin inline asm 2026-02-21T09:21:00.4572906Z st.global.v4.b32 [ %rd701 + 0 ], { %r21813, %r21814, %r21815, %r21816 }; 2026-02-21T09:21:00.4572963Z // end inline asm 2026-02-21T09:21:00.4573019Z // begin inline asm 2026-02-21T09:21:00.4573136Z st.global.v4.b32 [ %rd702 + 0 ], { %r21853, %r21854, %r21855, %r21856 }; 2026-02-21T09:21:00.4573254Z // end inline asm 2026-02-21T09:21:00.4573311Z // begin inline asm 2026-02-21T09:21:00.4573429Z st.global.v4.b32 [ %rd703 + 0 ], { %r21818, %r21819, %r21820, %r21821 }; 2026-02-21T09:21:00.4573484Z // end inline asm 2026-02-21T09:21:00.4573545Z // begin inline asm 2026-02-21T09:21:00.4573710Z st.global.v4.b32 [ %rd704 + 0 ], { %r21858, %r21859, %r21860, %r21861 }; 2026-02-21T09:21:00.4573766Z // end inline asm 2026-02-21T09:21:00.4573825Z // begin inline asm 2026-02-21T09:21:00.4573941Z st.global.v4.b32 [ %rd705 + 0 ], { %r21823, %r21824, %r21825, %r21826 }; 2026-02-21T09:21:00.4573997Z // end inline asm 2026-02-21T09:21:00.4574068Z // begin inline asm 2026-02-21T09:21:00.4574192Z st.global.v4.b32 [ %rd706 + 0 ], { %r21863, %r21864, %r21865, %r21866 }; 2026-02-21T09:21:00.4574294Z // end inline asm 2026-02-21T09:21:00.4574352Z // begin inline asm 2026-02-21T09:21:00.4574471Z st.global.v4.b32 [ %rd707 + 0 ], { %r21828, %r21829, %r21830, %r21831 }; 2026-02-21T09:21:00.4574528Z // end inline asm 2026-02-21T09:21:00.4574585Z // begin inline asm 
2026-02-21T09:21:00.4574704Z st.global.v4.b32 [ %rd708 + 0 ], { %r21868, %r21869, %r21870, %r21871 }; 2026-02-21T09:21:00.4574759Z // end inline asm 2026-02-21T09:21:00.4574816Z // begin inline asm 2026-02-21T09:21:00.4574933Z st.global.v4.b32 [ %rd709 + 0 ], { %r21833, %r21834, %r21835, %r21836 }; 2026-02-21T09:21:00.4574991Z // end inline asm 2026-02-21T09:21:00.4575047Z // begin inline asm 2026-02-21T09:21:00.4575166Z st.global.v4.b32 [ %rd710 + 0 ], { %r21873, %r21874, %r21875, %r21876 }; 2026-02-21T09:21:00.4575270Z // end inline asm 2026-02-21T09:21:00.4575329Z // begin inline asm 2026-02-21T09:21:00.4575446Z st.global.v4.b32 [ %rd711 + 0 ], { %r21838, %r21839, %r21840, %r21841 }; 2026-02-21T09:21:00.4575501Z // end inline asm 2026-02-21T09:21:00.4575562Z // begin inline asm 2026-02-21T09:21:00.4575679Z st.global.v4.b32 [ %rd712 + 0 ], { %r21878, %r21879, %r21880, %r21881 }; 2026-02-21T09:21:00.4575733Z // end inline asm 2026-02-21T09:21:00.4575954Z .loc 1 22 121 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:22:121 2026-02-21T09:21:00.4576017Z add.s32 %r2862, %r23298, 1; 2026-02-21T09:21:00.4576086Z setp.lt.s32 %p64, %r23298, %r2; 2026-02-21T09:21:00.4576152Z mov.b32 %r23298, %r2862; 2026-02-21T09:21:00.4576212Z @%p64 bra $L__BB0_13; 2026-02-21T09:21:00.4576313Z $L__BB0_16: // %._crit_edge 2026-02-21T09:21:00.4576629Z .loc 1 22 4 // cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py:22:4 2026-02-21T09:21:00.4576689Z ret; 2026-02-21T09:21:00.4576744Z $L__tmp21: 2026-02-21T09:21:00.4576798Z $L__func_end0: 2026-02-21T09:21:00.4576886Z // -- End function 2026-02-21T09:21:00.4576940Z } 2026-02-21T09:21:00.4577202Z .file 1 "/tmp/torchinductor_root/ab/cab4led5lliclg6xeel4mm2rfi3vzkm5zae5a34iteecaapqg7hg.py" 2026-02-21T09:21:00.4577414Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T09:21:00.4577483Z .section .debug_abbrev 2026-02-21T09:21:00.4577533Z { 2026-02-21T09:21:00.4577627Z .b8 1 // 
Abbreviation Code 2026-02-21T09:21:00.4577726Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:21:00.4577813Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:21:00.4577895Z .b8 37 // DW_AT_producer 2026-02-21T09:21:00.4577975Z .b8 8 // DW_FORM_string 2026-02-21T09:21:00.4578052Z .b8 19 // DW_AT_language 2026-02-21T09:21:00.4578133Z .b8 5 // DW_FORM_data2 2026-02-21T09:21:00.4578213Z .b8 3 // DW_AT_name 2026-02-21T09:21:00.4578292Z .b8 8 // DW_FORM_string 2026-02-21T09:21:00.4578372Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:21:00.4578552Z .b8 6 // DW_FORM_data4 2026-02-21T09:21:00.4578635Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:21:00.4578712Z .b8 8 // DW_FORM_string 2026-02-21T09:21:00.4578785Z .b8 0 // EOM(1) 2026-02-21T09:21:00.4578923Z .b8 0 // EOM(2) 2026-02-21T09:21:00.4579010Z .b8 2 // Abbreviation Code 2026-02-21T09:21:00.4579094Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:21:00.4579175Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:21:00.4579248Z .b8 3 // DW_AT_name 2026-02-21T09:21:00.4579388Z .b8 8 // DW_FORM_string 2026-02-21T09:21:00.4579470Z .b8 32 // DW_AT_inline 2026-02-21T09:21:00.4579552Z .b8 11 // DW_FORM_data1 2026-02-21T09:21:00.4579622Z .b8 0 // EOM(1) 2026-02-21T09:21:00.4579689Z .b8 0 // EOM(2) 2026-02-21T09:21:00.4579775Z .b8 3 // Abbreviation Code 2026-02-21T09:21:00.4579860Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:21:00.4579943Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:21:00.4580023Z .b8 17 // DW_AT_low_pc 2026-02-21T09:21:00.4580099Z .b8 1 // DW_FORM_addr 2026-02-21T09:21:00.4580240Z .b8 18 // DW_AT_high_pc 2026-02-21T09:21:00.4580317Z .b8 1 // DW_FORM_addr 2026-02-21T09:21:00.4580413Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:21:00.4580488Z .b8 19 // DW_FORM_ref4 2026-02-21T09:21:00.4580555Z .b8 0 // EOM(1) 2026-02-21T09:21:00.4580626Z .b8 0 // EOM(2) 2026-02-21T09:21:00.4580709Z .b8 4 // Abbreviation Code 2026-02-21T09:21:00.4580807Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T09:21:00.4580888Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:21:00.4580989Z .b8 49 
// DW_AT_abstract_origin 2026-02-21T09:21:00.4581067Z .b8 19 // DW_FORM_ref4 2026-02-21T09:21:00.4581147Z .b8 17 // DW_AT_low_pc 2026-02-21T09:21:00.4581222Z .b8 1 // DW_FORM_addr 2026-02-21T09:21:00.4581301Z .b8 18 // DW_AT_high_pc 2026-02-21T09:21:00.4581376Z .b8 1 // DW_FORM_addr 2026-02-21T09:21:00.4581460Z .b8 88 // DW_AT_call_file 2026-02-21T09:21:00.4581538Z .b8 11 // DW_FORM_data1 2026-02-21T09:21:00.4581617Z .b8 89 // DW_AT_call_line 2026-02-21T09:21:00.4581695Z .b8 11 // DW_FORM_data1 2026-02-21T09:21:00.4581776Z .b8 87 // DW_AT_call_column 2026-02-21T09:21:00.4581854Z .b8 11 // DW_FORM_data1 2026-02-21T09:21:00.4581926Z .b8 0 // EOM(1) 2026-02-21T09:21:00.4581995Z .b8 0 // EOM(2) 2026-02-21T09:21:00.4582062Z .b8 0 // EOM(3) 2026-02-21T09:21:00.4582120Z } 2026-02-21T09:21:00.4582185Z .section .debug_info 2026-02-21T09:21:00.4582235Z { 2026-02-21T09:21:00.4582321Z .b32 178 // Length of Unit 2026-02-21T09:21:00.4582421Z .b8 2 // DWARF version number 2026-02-21T09:21:00.4582484Z .b8 0 2026-02-21T09:21:00.4582619Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:21:00.4582769Z .b8 8 // Address Size (in bytes) 2026-02-21T09:21:00.4582886Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T09:21:00.4582971Z .b8 116 // DW_AT_producer 2026-02-21T09:21:00.4583069Z .b8 114 2026-02-21T09:21:00.4583124Z .b8 105 2026-02-21T09:21:00.4583174Z .b8 116 2026-02-21T09:21:00.4583226Z .b8 111 2026-02-21T09:21:00.4583278Z .b8 110 2026-02-21T09:21:00.4583329Z .b8 0 2026-02-21T09:21:00.4583409Z .b8 2 // DW_AT_language 2026-02-21T09:21:00.4583460Z .b8 0 2026-02-21T09:21:00.4583538Z .b8 99 // DW_AT_name 2026-02-21T09:21:00.4583589Z .b8 97 2026-02-21T09:21:00.4583638Z .b8 98 2026-02-21T09:21:00.4583690Z .b8 52 2026-02-21T09:21:00.4583798Z .b8 108 2026-02-21T09:21:00.4583852Z .b8 101 2026-02-21T09:21:00.4583903Z .b8 100 2026-02-21T09:21:00.4583956Z .b8 53 2026-02-21T09:21:00.4584007Z .b8 108 2026-02-21T09:21:00.4584061Z .b8 108 2026-02-21T09:21:00.4584113Z .b8 105 2026-02-21T09:21:00.4584168Z .b8 99 2026-02-21T09:21:00.4584219Z .b8 108 2026-02-21T09:21:00.4584267Z .b8 103 2026-02-21T09:21:00.4584321Z .b8 54 2026-02-21T09:21:00.4584371Z .b8 120 2026-02-21T09:21:00.4584420Z .b8 101 2026-02-21T09:21:00.4584472Z .b8 101 2026-02-21T09:21:00.4584527Z .b8 108 2026-02-21T09:21:00.4584576Z .b8 52 2026-02-21T09:21:00.4584626Z .b8 109 2026-02-21T09:21:00.4584680Z .b8 109 2026-02-21T09:21:00.4584730Z .b8 50 2026-02-21T09:21:00.4584782Z .b8 114 2026-02-21T09:21:00.4584832Z .b8 102 2026-02-21T09:21:00.4584955Z .b8 105 2026-02-21T09:21:00.4585009Z .b8 51 2026-02-21T09:21:00.4585059Z .b8 118 2026-02-21T09:21:00.4585110Z .b8 122 2026-02-21T09:21:00.4585164Z .b8 107 2026-02-21T09:21:00.4585215Z .b8 109 2026-02-21T09:21:00.4585267Z .b8 53 2026-02-21T09:21:00.4585321Z .b8 122 2026-02-21T09:21:00.4585370Z .b8 97 2026-02-21T09:21:00.4585421Z .b8 101 2026-02-21T09:21:00.4585470Z .b8 53 2026-02-21T09:21:00.4585523Z .b8 97 2026-02-21T09:21:00.4585577Z .b8 51 2026-02-21T09:21:00.4585626Z .b8 52 2026-02-21T09:21:00.4585678Z .b8 105 
2026-02-21T09:21:00.4585731Z .b8 116 2026-02-21T09:21:00.4585781Z .b8 101 2026-02-21T09:21:00.4585832Z .b8 101 2026-02-21T09:21:00.4585887Z .b8 99 2026-02-21T09:21:00.4585939Z .b8 97 2026-02-21T09:21:00.4585992Z .b8 97 2026-02-21T09:21:00.4586048Z .b8 112 2026-02-21T09:21:00.4586100Z .b8 113 2026-02-21T09:21:00.4586152Z .b8 103 2026-02-21T09:21:00.4586202Z .b8 55 2026-02-21T09:21:00.4586256Z .b8 104 2026-02-21T09:21:00.4586308Z .b8 103 2026-02-21T09:21:00.4586358Z .b8 46 2026-02-21T09:21:00.4586412Z .b8 112 2026-02-21T09:21:00.4586587Z .b8 121 2026-02-21T09:21:00.4586641Z .b8 0 2026-02-21T09:21:00.4586750Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:21:00.4586843Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:21:00.4586898Z .b8 116 2026-02-21T09:21:00.4586950Z .b8 109 2026-02-21T09:21:00.4587000Z .b8 112 2026-02-21T09:21:00.4587053Z .b8 47 2026-02-21T09:21:00.4587107Z .b8 116 2026-02-21T09:21:00.4587158Z .b8 111 2026-02-21T09:21:00.4587218Z .b8 114 2026-02-21T09:21:00.4587274Z .b8 99 2026-02-21T09:21:00.4587327Z .b8 104 2026-02-21T09:21:00.4587378Z .b8 105 2026-02-21T09:21:00.4587431Z .b8 110 2026-02-21T09:21:00.4587481Z .b8 100 2026-02-21T09:21:00.4587533Z .b8 117 2026-02-21T09:21:00.4587586Z .b8 99 2026-02-21T09:21:00.4587636Z .b8 116 2026-02-21T09:21:00.4587687Z .b8 111 2026-02-21T09:21:00.4587739Z .b8 114 2026-02-21T09:21:00.4587793Z .b8 95 2026-02-21T09:21:00.4587844Z .b8 114 2026-02-21T09:21:00.4587894Z .b8 111 2026-02-21T09:21:00.4587948Z .b8 111 2026-02-21T09:21:00.4588004Z .b8 116 2026-02-21T09:21:00.4588053Z .b8 47 2026-02-21T09:21:00.4588105Z .b8 97 2026-02-21T09:21:00.4588157Z .b8 98 2026-02-21T09:21:00.4588207Z .b8 0 2026-02-21T09:21:00.4588383Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T09:21:00.4588471Z .b8 95 // DW_AT_name 2026-02-21T09:21:00.4592880Z .b8 104 2026-02-21T09:21:00.4593043Z .b8 101 2026-02-21T09:21:00.4593093Z .b8 108 2026-02-21T09:21:00.4593142Z .b8 105 2026-02-21T09:21:00.4593192Z .b8 111 2026-02-21T09:21:00.4593240Z 
.b8 110 2026-02-21T09:21:00.4593290Z .b8 95 2026-02-21T09:21:00.4593338Z .b8 109 2026-02-21T09:21:00.4593392Z .b8 97 2026-02-21T09:21:00.4593442Z .b8 116 2026-02-21T09:21:00.4593490Z .b8 109 2026-02-21T09:21:00.4593541Z .b8 117 2026-02-21T09:21:00.4593590Z .b8 108 2026-02-21T09:21:00.4593637Z .b8 95 2026-02-21T09:21:00.4593685Z .b8 98 2026-02-21T09:21:00.4593738Z .b8 102 2026-02-21T09:21:00.4593787Z .b8 49 2026-02-21T09:21:00.4593837Z .b8 54 2026-02-21T09:21:00.4593889Z .b8 95 2026-02-21T09:21:00.4593938Z .b8 105 2026-02-21T09:21:00.4593987Z .b8 110 2026-02-21T09:21:00.4594035Z .b8 116 2026-02-21T09:21:00.4594087Z .b8 52 2026-02-21T09:21:00.4594213Z .b8 0 2026-02-21T09:21:00.4594320Z .b8 1 // DW_AT_inline 2026-02-21T09:21:00.4594458Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T09:21:00.4594589Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T09:21:00.4594692Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T09:21:00.4594797Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:21:00.4594938Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T09:21:00.4595044Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:21:00.4595143Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T09:21:00.4595313Z .b64 $L__tmp20 // DW_AT_high_pc 2026-02-21T09:21:00.4595407Z .b8 1 // DW_AT_call_file 2026-02-21T09:21:00.4595495Z .b8 87 // DW_AT_call_line 2026-02-21T09:21:00.4595588Z .b8 40 // DW_AT_call_column 2026-02-21T09:21:00.4595686Z .b8 0 // End Of Children Mark 2026-02-21T09:21:00.4595776Z .b8 0 // End Of Children Mark 2026-02-21T09:21:00.4595828Z } 2026-02-21T09:21:00.4595902Z .section .debug_macinfo { } 2026-02-21T09:21:00.4595908Z 2026-02-21T09:21:00.4595991Z ================================================================ 2026-02-21T09:21:00.4596111Z please share the reproducer above with Triton project. 
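Note on the `.debug_info` dump above: the runs of `.b8` values under `DW_AT_producer` and `DW_AT_name` are NUL-terminated ASCII strings emitted one byte per line. The following is a minimal sketch for reading them back; `decode_b8_string` is a hypothetical helper, not part of Triton or Helion, and it assumes each log line carries at most one `.b8` directive.

```python
import re

def decode_b8_string(ptx_lines):
    """Collect `.b8 N` values up to the first 0 terminator and decode them as ASCII."""
    values = []
    for line in ptx_lines:
        match = re.search(r"\.b8\s+(\d+)", line)
        if match is None:
            continue  # not a .b8 directive (e.g. a .b32 or .b64 line)
        value = int(match.group(1))
        if value == 0:
            break  # NUL terminator ends the string
        values.append(value)
    return bytes(values).decode("ascii")

# Bytes copied from the DW_AT_producer entry in the dump above.
producer = [".b8 116 // DW_AT_producer", ".b8 114", ".b8 105",
            ".b8 116", ".b8 111", ".b8 110", ".b8 0"]

# Bytes copied from the DW_AT_name entry of the DW_TAG_subprogram abbrev [2].
kernel = [f".b8 {b}" for b in (95, 104, 101, 108, 105, 111, 110, 95, 109, 97,
                               116, 109, 117, 108, 95, 98, 102, 49, 54, 95,
                               105, 110, 116, 52, 0)]

print(decode_b8_string(producer))
print(decode_b8_string(kernel))
```

The regular expression uses `search` rather than `match`, so lines that still carry the runner timestamp prefix can be passed in unchanged.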
2026-02-21T09:21:01.7463410Z 2026-02-21T09:21:01.7463429Z 2026-02-21T09:21:01.7463437Z 2026-02-21T09:21:01.7463719Z ================================================================ 2026-02-21T09:21:01.7464325Z Internal Triton PTX codegen error 2026-02-21T09:21:01.7464753Z `ptxas` stderr: 2026-02-21T09:21:01.7466105Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_helion_matmul_bf16_int4' 2026-02-21T09:21:01.7468273Z ptxas fatal : (C7600) Register allocation failed with register count of '64'. Compile the program with a higher register target 2026-02-21T09:21:01.7469480Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:21:01.7469867Z 2026-02-21T09:21:01.7470891Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpsnki9y_h.ptx -o /tmp/tmpsnki9y_h.ptx.o 2026-02-21T09:21:01.7472109Z 2026-02-21T09:21:01.7472117Z 2026-02-21T09:21:01.7472246Z // 2026-02-21T09:21:01.7472611Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:21:01.7473125Z // 2026-02-21T09:21:01.7473275Z 2026-02-21T09:21:01.7473351Z .version 8.7 2026-02-21T09:21:01.7473536Z .target sm_90a 2026-02-21T09:21:01.7473710Z .address_size 64 2026-02-21T09:21:01.7473834Z 2026-02-21T09:21:01.7474057Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T09:21:01.7474486Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:21:01.7474812Z // @_helion_matmul_bf16_int4 2026-02-21T09:21:01.7475531Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T09:21:01.7475909Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T09:21:01.7476409Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T09:21:01.7476988Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T09:21:01.7477422Z 
.param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T09:21:01.7477845Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T09:21:01.7478186Z ) 2026-02-21T09:21:01.7478345Z .reqntid 1024 2026-02-21T09:21:01.7478527Z { 2026-02-21T09:21:01.7478712Z .reg .pred %p<64>; 2026-02-21T09:21:01.7478930Z .reg .b16 %rs<198>; 2026-02-21T09:21:01.7479211Z .reg .b32 %r<5352>; 2026-02-21T09:21:01.7479547Z .reg .b64 %rd<226>; 2026-02-21T09:21:01.7479963Z .loc 1 14 0 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:14:0 2026-02-21T09:21:01.7480459Z $L__func_begin0: 2026-02-21T09:21:01.7480852Z .loc 1 14 0 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:14:0 2026-02-21T09:21:01.7481226Z 2026-02-21T09:21:01.7481299Z // %bb.0: 2026-02-21T09:21:01.7482215Z [542s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:21:01.7484172Z Config: @helion.kernel(config=helion.Config(block_sizes=[8, 256, 128], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['first', 'first'], loop_orders=[[1, 0]], num_sm_multiplier=32, num_stages=6, num_warps=32, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[True, False], range_num_stages=[3, 0], range_unroll_factors=[4, 2], range_warp_specializes=[]), static_shapes=True) 2026-02-21T09:21:01.7485623Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:21:01.7485905Z `ptxas` stderr: 2026-02-21T09:21:01.7486698Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_helion_matmul_bf16_int4' 2026-02-21T09:21:01.7487652Z ptxas fatal : (C7600) Register allocation failed with register count of '64'. 
Compile the program with a higher register target 2026-02-21T09:21:01.7488154Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:21:01.7488345Z 2026-02-21T09:21:01.7488845Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpsnki9y_h.ptx -o /tmp/tmpsnki9y_h.ptx.o 2026-02-21T09:21:01.7489421Z 2026-02-21T09:21:01.7489576Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:21:01.7489946Z ld.param.b64 %rd29, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T09:21:01.7490253Z ld.param.b64 %rd28, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T09:21:01.7490539Z ld.param.b64 %rd27, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T09:21:01.7490784Z $L__tmp0: 2026-02-21T09:21:01.7491078Z .loc 1 20 30 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:20:30 2026-02-21T09:21:01.7491452Z mov.u32 %r5171, %ctaid.x; 2026-02-21T09:21:01.7491768Z .loc 1 21 49 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:21:49 2026-02-21T09:21:01.7492126Z min.u32 %r2, %r5171, 4095; 2026-02-21T09:21:01.7492467Z .loc 1 22 121 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:22:121 2026-02-21T09:21:01.7492842Z sub.s32 %r468, %r2, %r5171; 2026-02-21T09:21:01.7493023Z add.s32 %r469, %r468, 1; 2026-02-21T09:21:01.7493193Z shr.s32 %r470, %r469, 31; 2026-02-21T09:21:01.7493366Z shr.u32 %r471, %r470, 30; 2026-02-21T09:21:01.7493535Z add.s32 %r472, %r469, %r471; 2026-02-21T09:21:01.7493719Z and.b32 %r473, %r472, -4; 2026-02-21T09:21:01.7493894Z add.s32 %r5316, %r473, %r5171; 2026-02-21T09:21:01.7494222Z .loc 1 34 45 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:34:45 2026-02-21T09:21:01.7494750Z mov.u32 %r4, %tid.x; 2026-02-21T09:21:01.7494913Z and.b32 %r5, %r4, 31; 2026-02-21T09:21:01.7495079Z shr.u32 %r6, %r4, 5; 2026-02-21T09:21:01.7495230Z shl.b32 %r7, %r4, 3; 2026-02-21T09:21:01.7495389Z and.b32 %r8, %r7, 120; 
2026-02-21T09:21:01.7495554Z and.b32 %r9, %r4, 127; 2026-02-21T09:21:01.7495866Z .loc 1 36 45 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:36:45 2026-02-21T09:21:01.7496222Z and.b32 %r10, %r4, 1020; 2026-02-21T09:21:01.7496398Z shr.u32 %r11, %r4, 2; 2026-02-21T09:21:01.7496694Z shr.u32 %r12, %r4, 4; 2026-02-21T09:21:01.7496853Z or.b32 %r13, %r12, 64; 2026-02-21T09:21:01.7497021Z or.b32 %r14, %r12, 128; 2026-02-21T09:21:01.7497270Z or.b32 %r15, %r12, 192; 2026-02-21T09:21:01.7497587Z .loc 1 44 48 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:44:48 2026-02-21T09:21:01.7497937Z and.b32 %r16, %r4, 896; 2026-02-21T09:21:01.7498107Z shr.u32 %r17, %r4, 7; 2026-02-21T09:21:01.7498406Z .loc 1 50 38 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:50:38 2026-02-21T09:21:01.7498757Z and.b32 %r18, %r4, 3; 2026-02-21T09:21:01.7498919Z shl.b32 %r19, %r18, 2; 2026-02-21T09:21:01.7499217Z .loc 1 68 38 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:68:38 2026-02-21T09:21:01.7499573Z and.b32 %r20, %r4, 128; 2026-02-21T09:21:01.7499956Z .loc 1 22 121 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:22:121 2026-02-21T09:21:01.7500347Z setp.ge.s32 %p1, %r5171, %r5316; 2026-02-21T09:21:01.7500545Z and.b32 %r5143, %r7, 8056; 2026-02-21T09:21:01.7500730Z bfe.s32 %r5144, %r4, 4, 1; 2026-02-21T09:21:01.7500925Z mov.b32 %r5145, global_smem; 2026-02-21T09:21:01.7501107Z shl.b32 %r5146, %r4, 4; 2026-02-21T09:21:01.7501269Z and.b32 %r5147, %r7, 96; 2026-02-21T09:21:01.7501455Z shl.b32 %r5148, %r18, 1; 2026-02-21T09:21:01.7501626Z shl.b32 %r5149, %r4, 2; 2026-02-21T09:21:01.7501798Z and.b32 %r5150, %r4, 384; 2026-02-21T09:21:01.7501969Z and.b32 %r5151, %r12, 2; 2026-02-21T09:21:01.7502139Z and.b32 %r5152, %r7, 512; 2026-02-21T09:21:01.7502315Z setp.gt.u32 %p62, %r4, 511; 2026-02-21T09:21:01.7502499Z shr.u32 %r5153, %r4, 1; 2026-02-21T09:21:01.7502668Z shl.b32 %r5154, %r9, 6; 2026-02-21T09:21:01.7502834Z and.b32 %r5155, %r7, 48; 
2026-02-21T09:21:01.7503010Z shr.u32 %r5156, %r16, 5; 2026-02-21T09:21:01.7503169Z shl.b32 %r5157, %r6, 7; 2026-02-21T09:21:01.7503338Z shl.b32 %r5158, %r5, 4; 2026-02-21T09:21:01.7503500Z shl.b32 %r5159, %r5, 3; 2026-02-21T09:21:01.7503665Z shl.b32 %r5160, %r4, 8; 2026-02-21T09:21:01.7503825Z shl.b32 %r5161, %r18, 14; 2026-02-21T09:21:01.7503998Z shl.b32 %r5162, %r4, 5; 2026-02-21T09:21:01.7504158Z and.b32 %r5163, %r4, 24; 2026-02-21T09:21:01.7504322Z shl.b32 %r5164, %r18, 5; 2026-02-21T09:21:01.7504488Z shl.b32 %r5165, %r10, 2; 2026-02-21T09:21:01.7504655Z shl.b32 %r5166, %r11, 10; 2026-02-21T09:21:01.7504824Z shl.b32 %r5167, %r17, 13; 2026-02-21T09:21:01.7504993Z setp.eq.b32 %p63, %r20, 0; 2026-02-21T09:21:01.7505174Z @%p1 bra $L__BB0_11; 2026-02-21T09:21:01.7505371Z // %bb.1: // %.lr.ph 2026-02-21T09:21:01.7505750Z .loc 1 0 121 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:0:121 2026-02-21T09:21:01.7506105Z and.b32 %r476, %r5144, 136; 2026-02-21T09:21:01.7506287Z xor.b32 %r477, %r476, %r5143; 2026-02-21T09:21:01.7506608Z add.s32 %r21, %r5145, %r477; 2026-02-21T09:21:01.7506803Z add.s32 %r22, %r21, 40960; 2026-02-21T09:21:01.7506979Z add.s32 %r23, %r21, 8192; 2026-02-21T09:21:01.7507146Z add.s32 %r24, %r21, 49152; 2026-02-21T09:21:01.7507321Z add.s32 %r25, %r21, 16384; 2026-02-21T09:21:01.7507486Z add.s32 %r26, %r21, 57344; 2026-02-21T09:21:01.7507660Z add.s32 %r27, %r21, 24576; 2026-02-21T09:21:01.7507824Z add.s32 %r28, %r21, 65536; 2026-02-21T09:21:01.7508157Z add.s32 %r29, %r21, 32768; 2026-02-21T09:21:01.7508410Z add.s32 %r30, %r21, 73728; 2026-02-21T09:21:01.7508589Z and.b32 %r480, %r5146, 7680; 2026-02-21T09:21:01.7508765Z or.b32 %r483, %r480, %r5147; 2026-02-21T09:21:01.7508936Z or.b32 %r484, %r483, %r5148; 2026-02-21T09:21:01.7509114Z or.b32 %r31, %r484, %r476; 2026-02-21T09:21:01.7509282Z xor.b32 %r32, %r31, 8; 2026-02-21T09:21:01.7509453Z and.b32 %r486, %r5149, 124; 2026-02-21T09:21:01.7509628Z selp.b32 %r490, 1, 0, %p62; 
2026-02-21T09:21:01.7509811Z add.s32 %r491, %r5145, 81920; 2026-02-21T09:21:01.7509990Z add.s32 %r492, %r491, %r5150; 2026-02-21T09:21:01.7510172Z add.s32 %r493, %r492, %r490; 2026-02-21T09:21:01.7510346Z add.s32 %r494, %r493, %r5152; 2026-02-21T09:21:01.7510528Z add.s32 %r495, %r494, %r5151; 2026-02-21T09:21:01.7510794Z add.s32 %r33, %r495, %r486; 2026-02-21T09:21:01.7510983Z and.b32 %r497, %r5153, 384; 2026-02-21T09:21:01.7511161Z add.s32 %r498, %r491, %r5151; 2026-02-21T09:21:01.7511337Z add.s32 %r499, %r498, %r497; 2026-02-21T09:21:01.7511510Z add.s32 %r500, %r499, %r486; 2026-02-21T09:21:01.7511684Z add.s32 %r34, %r500, %r5152; 2026-02-21T09:21:01.7511865Z xor.b32 %r504, %r5155, %r5156; 2026-02-21T09:21:01.7512046Z or.b32 %r505, %r504, %r5154; 2026-02-21T09:21:01.7512223Z add.s32 %r35, %r491, %r505; 2026-02-21T09:21:01.7512399Z xor.b32 %r506, %r505, 32; 2026-02-21T09:21:01.7512566Z add.s32 %r36, %r491, %r506; 2026-02-21T09:21:01.7512746Z or.b32 %r509, %r5157, %r5158; 2026-02-21T09:21:01.7512918Z add.s32 %r510, %r5145, 90112; 2026-02-21T09:21:01.7513168Z add.s32 %r1091, %r510, %r509; 2026-02-21T09:21:01.7513345Z and.b32 %r511, %r5146, 112; 2026-02-21T09:21:01.7513519Z or.b32 %r513, %r5157, %r5159; 2026-02-21T09:21:01.7513688Z and.b32 %r514, %r513, 1920; 2026-02-21T09:21:01.7513864Z and.b32 %r516, %r5160, 2048; 2026-02-21T09:21:01.7514035Z add.s32 %r517, %r510, %r511; 2026-02-21T09:21:01.7514210Z add.s32 %r518, %r517, %r516; 2026-02-21T09:21:01.7514399Z add.s32 %r590, %r518, %r514; 2026-02-21T09:21:01.7514574Z bfe.u32 %r519, %r491, 4, 14; 2026-02-21T09:21:01.7514753Z cvt.u64.u32 %rd30, %r519; 2026-02-21T09:21:01.7514942Z or.b64 %rd1, %rd30, -9223371899382267904; 2026-02-21T09:21:01.7515158Z add.s32 %r520, %r5145, 81952; 2026-02-21T09:21:01.7515330Z bfe.u32 %r521, %r520, 4, 14; 2026-02-21T09:21:01.7515503Z cvt.u64.u32 %rd31, %r521; 2026-02-21T09:21:01.7515682Z or.b64 %rd2, %rd31, -9223371899382267904; 2026-02-21T09:21:01.7515891Z add.s32 %r522, %r5145, 86016; 
2026-02-21T09:21:01.7516063Z bfe.u32 %r523, %r522, 4, 14; 2026-02-21T09:21:01.7516241Z cvt.u64.u32 %rd32, %r523; 2026-02-21T09:21:01.7516426Z or.b64 %rd3, %rd32, -9223371899382267904; 2026-02-21T09:21:01.7516771Z add.s32 %r524, %r5145, 86048; 2026-02-21T09:21:01.7516955Z bfe.u32 %r525, %r524, 4, 14; 2026-02-21T09:21:01.7517129Z cvt.u64.u32 %rd33, %r525; 2026-02-21T09:21:01.7517312Z or.b64 %rd4, %rd33, -9223371899382267904; 2026-02-21T09:21:01.7517511Z and.b32 %r528, %r5162, 15456; 2026-02-21T09:21:01.7517689Z shl.b32 %r530, %r5163, 4; 2026-02-21T09:21:01.7517859Z and.b32 %r531, %r5149, 16; 2026-02-21T09:21:01.7518039Z selp.b32 %r532, 64, 0, %p62; 2026-02-21T09:21:01.7518218Z or.b32 %r533, %r5161, %r531; 2026-02-21T09:21:01.7518390Z or.b32 %r534, %r528, %r530; 2026-02-21T09:21:01.7518572Z xor.b32 %r535, %r534, %r532; 2026-02-21T09:21:01.7518741Z or.b32 %r536, %r535, %r533; 2026-02-21T09:21:01.7518933Z add.s32 %r39, %r5145, %r536; 2026-02-21T09:21:01.7519111Z xor.b32 %r537, %r536, 32; 2026-02-21T09:21:01.7519280Z add.s32 %r40, %r5145, %r537; 2026-02-21T09:21:01.7519455Z shl.b32 %r538, %r5163, 11; 2026-02-21T09:21:01.7519632Z or.b32 %r541, %r538, %r5164; 2026-02-21T09:21:01.7519806Z xor.b32 %r542, %r541, %r5165; 2026-02-21T09:21:01.7519990Z add.s32 %r1392, %r5145, %r542; 2026-02-21T09:21:01.7520176Z add.s32 %r1397, %r1392, 4096; 2026-02-21T09:21:01.7520351Z add.s32 %r1402, %r1392, 8192; 2026-02-21T09:21:01.7520532Z add.s32 %r1407, %r1392, 12288; 2026-02-21T09:21:01.7521020Z .loc 1 43 78 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:43:78 2026-02-21T09:21:01.7521385Z or.b32 %r544, %r5166, %r19; 2026-02-21T09:21:01.7521558Z or.b32 %r45, %r544, 176; 2026-02-21T09:21:01.7521732Z or.b32 %r46, %r5167, %r9; 2026-02-21T09:21:01.7522050Z .loc 1 22 121 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:22:121 2026-02-21T09:21:01.7522423Z add.s32 %r5170, %r5171, 1; 2026-02-21T09:21:01.7522600Z add.s32 %r5169, %r5171, 2; 2026-02-21T09:21:01.7522774Z add.s32 
%r5168, %r5171, 3; 2026-02-21T09:21:01.7523000Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:21:01.7523286Z // Child Loop BB0_3 Depth 2 2026-02-21T09:21:01.7523652Z // Child Loop BB0_5 Depth 2 2026-02-21T09:21:01.7523915Z // Child Loop BB0_7 Depth 2 2026-02-21T09:21:01.7524198Z // Child Loop BB0_9 Depth 2 2026-02-21T09:21:01.7524577Z .loc 1 29 33 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:29:33 2026-02-21T09:21:01.7524927Z shr.u32 %r569, %r5168, 6; 2026-02-21T09:21:01.7525103Z and.b32 %r82, %r569, 33554424; 2026-02-21T09:21:01.7525280Z shr.u32 %r570, %r5169, 6; 2026-02-21T09:21:01.7525452Z and.b32 %r83, %r570, 33554424; 2026-02-21T09:21:01.7525627Z shr.u32 %r571, %r5170, 6; 2026-02-21T09:21:01.7525797Z and.b32 %r84, %r571, 33554424; 2026-02-21T09:21:01.7526043Z shr.u32 %r572, %r5171, 6; 2026-02-21T09:21:01.7526230Z and.b32 %r573, %r572, 33554424; 2026-02-21T09:21:01.7526701Z .loc 1 30 39 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:30:39 2026-02-21T09:21:01.7527060Z sub.s32 %r574, 64, %r573; 2026-02-21T09:21:01.7527378Z .loc 1 30 52 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:30:52 2026-02-21T09:21:01.7527731Z min.s32 %r575, %r574, 8; 2026-02-21T09:21:01.7528047Z .loc 1 31 45 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:31:45 2026-02-21T09:21:01.7528406Z and.b32 %r576, %r5171, 511; 2026-02-21T09:21:01.7528728Z .loc 1 32 51 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:32:51 2026-02-21T09:21:01.7529081Z div.s32 %r577, %r576, %r575; 2026-02-21T09:21:01.7529393Z .loc 1 31 64 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:31:64 2026-02-21T09:21:01.7529768Z mul.lo.s32 %r578, %r577, %r575; 2026-02-21T09:21:01.7529956Z sub.s32 %r579, %r576, %r578; 2026-02-21T09:21:01.7530274Z .loc 1 31 30 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:31:30 2026-02-21T09:21:01.7530627Z add.s32 %r580, %r579, %r573; 2026-02-21T09:21:01.7530942Z .loc 1 33 27 // 
ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:33:27 2026-02-21T09:21:01.7531312Z shl.b32 %r85, %r580, 7; 2026-02-21T09:21:01.7531620Z .loc 1 35 27 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:35:27 2026-02-21T09:21:01.7531972Z shl.b32 %r86, %r577, 8; 2026-02-21T09:21:01.7532275Z .loc 1 36 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:36:32 2026-02-21T09:21:01.7532635Z or.b32 %r581, %r86, %r11; 2026-02-21T09:21:01.7532951Z .loc 1 51 53 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:53 2026-02-21T09:21:01.7533300Z shl.b32 %r582, %r581, 10; 2026-02-21T09:21:01.7533610Z .loc 1 51 60 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:60 2026-02-21T09:21:01.7533963Z or.b32 %r583, %r582, %r19; 2026-02-21T09:21:01.7534289Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7534642Z mad.wide.s32 %rd34, %r583, 2, %rd27; 2026-02-21T09:21:01.7535140Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7535487Z bar.sync 0; 2026-02-21T09:21:01.7535630Z mov.b32 %r547, 8; 2026-02-21T09:21:01.7535789Z // begin inline asm 2026-02-21T09:21:01.7536029Z cp.async.ca.shared.global [ %r21 + 0 ], [ %rd34 + 0 ], 0x8, %r547; 2026-02-21T09:21:01.7536313Z // end inline asm 2026-02-21T09:21:01.7536605Z cp.async.commit_group; 2026-02-21T09:21:01.7536947Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7537314Z cvt.s64.s32 %rd45, %r582; 2026-02-21T09:21:01.7537496Z cvt.u64.u32 %rd10, %r19; 2026-02-21T09:21:01.7537673Z or.b64 %rd46, %rd45, %rd10; 2026-02-21T09:21:01.7537851Z shl.b64 %rd47, %rd46, 1; 2026-02-21T09:21:01.7538123Z add.s64 %rd48, %rd27, %rd47; 2026-02-21T09:21:01.7538305Z add.s64 %rd35, %rd48, 32; 2026-02-21T09:21:01.7538618Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7538977Z // begin inline asm 2026-02-21T09:21:01.7539212Z 
cp.async.ca.shared.global [ %r22 + 0 ], [ %rd35 + 0 ], 0x8, %r547; 2026-02-21T09:21:01.7539478Z // end inline asm 2026-02-21T09:21:01.7539636Z cp.async.commit_group; 2026-02-21T09:21:01.7539950Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7540308Z add.s64 %rd36, %rd48, 64; 2026-02-21T09:21:01.7540622Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7541047Z bar.sync 0; 2026-02-21T09:21:01.7541199Z // begin inline asm 2026-02-21T09:21:01.7541419Z cp.async.ca.shared.global [ %r23 + 0 ], [ %rd36 + 0 ], 0x8, %r547; 2026-02-21T09:21:01.7541691Z // end inline asm 2026-02-21T09:21:01.7541845Z cp.async.commit_group; 2026-02-21T09:21:01.7542154Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7542512Z add.s64 %rd37, %rd48, 96; 2026-02-21T09:21:01.7542831Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7543183Z // begin inline asm 2026-02-21T09:21:01.7543401Z cp.async.ca.shared.global [ %r24 + 0 ], [ %rd37 + 0 ], 0x8, %r547; 2026-02-21T09:21:01.7543669Z // end inline asm 2026-02-21T09:21:01.7543825Z cp.async.commit_group; 2026-02-21T09:21:01.7544131Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7544489Z add.s64 %rd38, %rd48, 128; 2026-02-21T09:21:01.7544800Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7545150Z bar.sync 0; 2026-02-21T09:21:01.7545309Z // begin inline asm 2026-02-21T09:21:01.7545538Z cp.async.ca.shared.global [ %r25 + 0 ], [ %rd38 + 0 ], 0x8, %r547; 2026-02-21T09:21:01.7545801Z // end inline asm 2026-02-21T09:21:01.7545967Z cp.async.commit_group; 2026-02-21T09:21:01.7546269Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7546761Z add.s64 %rd39, %rd48, 160; 2026-02-21T09:21:01.7547079Z .loc 1 51 80 // 
ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7547423Z // begin inline asm 2026-02-21T09:21:01.7547647Z cp.async.ca.shared.global [ %r26 + 0 ], [ %rd39 + 0 ], 0x8, %r547; 2026-02-21T09:21:01.7559415Z // end inline asm 2026-02-21T09:21:01.7559632Z cp.async.commit_group; 2026-02-21T09:21:01.7560006Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7560396Z add.s64 %rd40, %rd48, 192; 2026-02-21T09:21:01.7560751Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7561120Z bar.sync 0; 2026-02-21T09:21:01.7561272Z // begin inline asm 2026-02-21T09:21:01.7561733Z cp.async.ca.shared.global [ %r27 + 0 ], [ %rd40 + 0 ], 0x8, %r547; 2026-02-21T09:21:01.7562015Z // end inline asm 2026-02-21T09:21:01.7562187Z cp.async.commit_group; 2026-02-21T09:21:01.7562522Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7562895Z add.s64 %rd41, %rd48, 224; 2026-02-21T09:21:01.7563225Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7563615Z // begin inline asm 2026-02-21T09:21:01.7563865Z cp.async.ca.shared.global [ %r28 + 0 ], [ %rd41 + 0 ], 0x8, %r547; 2026-02-21T09:21:01.7564139Z // end inline asm 2026-02-21T09:21:01.7564309Z cp.async.commit_group; 2026-02-21T09:21:01.7564714Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7565096Z add.s64 %rd42, %rd48, 256; 2026-02-21T09:21:01.7565413Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7565766Z bar.sync 0; 2026-02-21T09:21:01.7565918Z // begin inline asm 2026-02-21T09:21:01.7566150Z cp.async.ca.shared.global [ %r29 + 0 ], [ %rd42 + 0 ], 0x8, %r547; 2026-02-21T09:21:01.7566424Z // end inline asm 2026-02-21T09:21:01.7566725Z cp.async.commit_group; 2026-02-21T09:21:01.7567043Z .loc 1 51 32 // 
ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7567394Z add.s64 %rd43, %rd48, 288; 2026-02-21T09:21:01.7567791Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7568139Z // begin inline asm 2026-02-21T09:21:01.7568386Z cp.async.ca.shared.global [ %r30 + 0 ], [ %rd43 + 0 ], 0x8, %r547; 2026-02-21T09:21:01.7568660Z // end inline asm 2026-02-21T09:21:01.7568815Z cp.async.commit_group; 2026-02-21T09:21:01.7569128Z .loc 1 43 78 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:43:78 2026-02-21T09:21:01.7569480Z shl.b32 %r584, %r577, 18; 2026-02-21T09:21:01.7569662Z or.b32 %r5173, %r45, %r584; 2026-02-21T09:21:01.7569845Z add.s32 %r585, %r576, %r573; 2026-02-21T09:21:01.7570039Z sub.s32 %r586, %r585, %r578; 2026-02-21T09:21:01.7570215Z shl.b32 %r587, %r586, 7; 2026-02-21T09:21:01.7570393Z add.s32 %r5172, %r46, %r587; 2026-02-21T09:21:01.7570571Z mov.b32 %r1222, 0f00000000; 2026-02-21T09:21:01.7570741Z mov.b32 %r5175, 4; 2026-02-21T09:21:01.7570904Z mov.b32 %r5174, -1; 2026-02-21T09:21:01.7571066Z mov.b64 %rd219, -16; 2026-02-21T09:21:01.7571238Z mov.b32 %r1223, %r1222; 2026-02-21T09:21:01.7571406Z mov.b32 %r1224, %r1222; 2026-02-21T09:21:01.7571589Z mov.b32 %r1225, %r1222; 2026-02-21T09:21:01.7571756Z mov.b32 %r1226, %r1222; 2026-02-21T09:21:01.7571925Z mov.b32 %r1227, %r1222; 2026-02-21T09:21:01.7572086Z mov.b32 %r1228, %r1222; 2026-02-21T09:21:01.7572254Z mov.b32 %r1229, %r1222; 2026-02-21T09:21:01.7572421Z mov.b32 %r1230, %r1222; 2026-02-21T09:21:01.7572595Z mov.b32 %r1231, %r1222; 2026-02-21T09:21:01.7572759Z mov.b32 %r1232, %r1222; 2026-02-21T09:21:01.7572923Z mov.b32 %r1233, %r1222; 2026-02-21T09:21:01.7573089Z mov.b32 %r1234, %r1222; 2026-02-21T09:21:01.7573247Z mov.b32 %r1235, %r1222; 2026-02-21T09:21:01.7573416Z mov.b32 %r1236, %r1222; 2026-02-21T09:21:01.7573576Z mov.b32 %r1237, %r1222; 2026-02-21T09:21:01.7573747Z mov.b32 %r1238, %r1222; 2026-02-21T09:21:01.7573908Z 
mov.b32 %r1239, %r1222; 2026-02-21T09:21:01.7574077Z mov.b32 %r1240, %r1222; 2026-02-21T09:21:01.7574246Z mov.b32 %r1241, %r1222; 2026-02-21T09:21:01.7574407Z mov.b32 %r1242, %r1222; 2026-02-21T09:21:01.7574571Z mov.b32 %r1243, %r1222; 2026-02-21T09:21:01.7574742Z mov.b32 %r1244, %r1222; 2026-02-21T09:21:01.7574912Z mov.b32 %r1245, %r1222; 2026-02-21T09:21:01.7575072Z mov.b32 %r1246, %r1222; 2026-02-21T09:21:01.7575236Z mov.b32 %r1247, %r1222; 2026-02-21T09:21:01.7575396Z mov.b32 %r1248, %r1222; 2026-02-21T09:21:01.7575707Z mov.b32 %r1249, %r1222; 2026-02-21T09:21:01.7575868Z mov.b32 %r1250, %r1222; 2026-02-21T09:21:01.7576030Z mov.b32 %r1251, %r1222; 2026-02-21T09:21:01.7576192Z mov.b32 %r1252, %r1222; 2026-02-21T09:21:01.7576352Z mov.b32 %r1253, %r1222; 2026-02-21T09:21:01.7576711Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T09:21:01.7577009Z // => This Inner Loop Header: Depth=2 2026-02-21T09:21:01.7577266Z add.s64 %rd219, %rd219, 16; 2026-02-21T09:21:01.7577452Z setp.lt.u64 %p10, %rd219, 432; 2026-02-21T09:21:01.7577651Z add.s32 %r1364, %r5174, 1; 2026-02-21T09:21:01.7577829Z setp.gt.s32 %p11, %r1364, 4; 2026-02-21T09:21:01.7578022Z selp.b32 %r5174, 0, %r1364, %p11; 2026-02-21T09:21:01.7578457Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7578831Z cp.async.wait_group 8; 2026-02-21T09:21:01.7579011Z bar.sync 0; 2026-02-21T09:21:01.7579165Z shl.b32 %r1365, %r5174, 13; 2026-02-21T09:21:01.7579354Z add.s32 %r1367, %r5145, %r1365; 2026-02-21T09:21:01.7579682Z .loc 1 55 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:55:32 2026-02-21T09:21:01.7580043Z add.s32 %r1368, %r1367, %r31; 2026-02-21T09:21:01.7580230Z ld.shared.b16 %rs3, [%r1368]; 2026-02-21T09:21:01.7580425Z ld.shared.b16 %rs4, [%r1368+256]; 2026-02-21T09:21:01.7580628Z ld.shared.b16 %rs5, [%r1368+16]; 2026-02-21T09:21:01.7580825Z ld.shared.b16 %rs6, [%r1368+272]; 2026-02-21T09:21:01.7581022Z add.s32 %r1369, %r1367, %r32; 
2026-02-21T09:21:01.7581275Z ld.shared.b16 %rs7, [%r1369]; 2026-02-21T09:21:01.7581478Z ld.shared.b16 %rs8, [%r1369+256]; 2026-02-21T09:21:01.7581673Z ld.shared.b16 %rs9, [%r1369+16]; 2026-02-21T09:21:01.7581875Z ld.shared.b16 %rs10, [%r1369+272]; 2026-02-21T09:21:01.7582072Z cvt.f32.bf16 %r884, %rs3; 2026-02-21T09:21:01.7582253Z cvt.f32.bf16 %r885, %rs4; 2026-02-21T09:21:01.7582433Z cvt.f32.bf16 %r886, %rs7; 2026-02-21T09:21:01.7582625Z cvt.f32.bf16 %r887, %rs8; 2026-02-21T09:21:01.7582813Z cvt.f32.bf16 %r952, %rs5; 2026-02-21T09:21:01.7582987Z cvt.f32.bf16 %r953, %rs6; 2026-02-21T09:21:01.7583162Z cvt.f32.bf16 %r954, %rs9; 2026-02-21T09:21:01.7583334Z cvt.f32.bf16 %r955, %rs10; 2026-02-21T09:21:01.7583678Z .loc 1 57 34 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:57:34 2026-02-21T09:21:01.7584046Z cvt.s64.s32 %rd63, %r5172; 2026-02-21T09:21:01.7584234Z add.s64 %rd50, %rd28, %rd63; 2026-02-21T09:21:01.7584570Z .loc 1 57 87 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:57:87 2026-02-21T09:21:01.7584918Z // begin inline asm 2026-02-21T09:21:01.7585085Z mov.u64 %rd49, 0x0; 2026-02-21T09:21:01.7585322Z createpolicy.fractional.L2::evict_first.b64 %rd49, 1.0; 2026-02-21T09:21:01.7585585Z // end inline asm 2026-02-21T09:21:01.7585737Z // begin inline asm 2026-02-21T09:21:01.7585896Z mov.u16 %rs1, 0x0; 2026-02-21T09:21:01.7586147Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs1 }, [ %rd50 + 0 ], %rd49; 2026-02-21T09:21:01.7586570Z // end inline asm 2026-02-21T09:21:01.7586889Z .loc 1 65 28 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:65:28 2026-02-21T09:21:01.7587245Z st.shared.b8 [%r33], %rs1; 2026-02-21T09:21:01.7587424Z bar.sync 0; 2026-02-21T09:21:01.7587589Z ld.shared.v2.b8 {%rs11, %rs12}, [%r34]; 2026-02-21T09:21:01.7587949Z .loc 1 60 28 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:60:28 2026-02-21T09:21:01.7588366Z shl.b16 %rs13, %rs11, 4; 2026-02-21T09:21:01.7588557Z shl.b16 %rs14, %rs12, 4; 
2026-02-21T09:21:01.7588863Z .loc 1 75 58 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:75:58 2026-02-21T09:21:01.7589232Z selp.b16 %rs15, %rs13, %rs11, %p63; 2026-02-21T09:21:01.7589444Z cvt.s16.s8 %rs16, %rs15; 2026-02-21T09:21:01.7589612Z shr.s16 %rs17, %rs16, 4; 2026-02-21T09:21:01.7589880Z selp.b16 %rs18, %rs14, %rs12, %p63; 2026-02-21T09:21:01.7590163Z cvt.s16.s8 %rs19, %rs18; 2026-02-21T09:21:01.7590352Z shr.s16 %rs20, %rs19, 4; 2026-02-21T09:21:01.7590663Z .loc 1 80 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:80:32 2026-02-21T09:21:01.7591020Z cvt.rn.f32.s16 %r1370, %rs17; 2026-02-21T09:21:01.7591207Z cvt.rn.f32.s16 %r1371, %rs20; 2026-02-21T09:21:01.7591378Z bar.sync 0; 2026-02-21T09:21:01.7591532Z st.shared.b32 [%r35], %r1370; 2026-02-21T09:21:01.7591717Z st.shared.b32 [%r36], %r1371; 2026-02-21T09:21:01.7591976Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1222}; 2026-02-21T09:21:01.7592246Z bar.sync 0; 2026-02-21T09:21:01.7592394Z // begin inline asm 2026-02-21T09:21:01.7592715Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r752, %r888}, [%r590]; 2026-02-21T09:21:01.7593019Z // end inline asm 2026-02-21T09:21:01.7593171Z bar.sync 0; 2026-02-21T09:21:01.7593396Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1224}; 2026-02-21T09:21:01.7593670Z bar.sync 0; 2026-02-21T09:21:01.7593810Z // begin inline asm 2026-02-21T09:21:01.7594054Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r754, %r890}, [%r590]; 2026-02-21T09:21:01.7594335Z // end inline asm 2026-02-21T09:21:01.7594489Z bar.sync 0; 2026-02-21T09:21:01.7594699Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1223}; 2026-02-21T09:21:01.7594969Z bar.sync 0; 2026-02-21T09:21:01.7595112Z // begin inline asm 2026-02-21T09:21:01.7595350Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r753, %r889}, [%r590]; 2026-02-21T09:21:01.7595706Z // end inline asm 2026-02-21T09:21:01.7595852Z bar.sync 0; 2026-02-21T09:21:01.7596065Z stmatrix.sync.aligned.m8n8.x1.shared.b16 
[%r1091], {%r1225}; 2026-02-21T09:21:01.7596337Z bar.sync 0; 2026-02-21T09:21:01.7596610Z // begin inline asm 2026-02-21T09:21:01.7596847Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r755, %r891}, [%r590]; 2026-02-21T09:21:01.7597126Z // end inline asm 2026-02-21T09:21:01.7597279Z bar.sync 0; 2026-02-21T09:21:01.7597490Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1226}; 2026-02-21T09:21:01.7597755Z bar.sync 0; 2026-02-21T09:21:01.7597895Z // begin inline asm 2026-02-21T09:21:01.7598131Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r756, %r892}, [%r590]; 2026-02-21T09:21:01.7598403Z // end inline asm 2026-02-21T09:21:01.7598562Z bar.sync 0; 2026-02-21T09:21:01.7598771Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1228}; 2026-02-21T09:21:01.7599042Z bar.sync 0; 2026-02-21T09:21:01.7599180Z // begin inline asm 2026-02-21T09:21:01.7599417Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r758, %r894}, [%r590]; 2026-02-21T09:21:01.7599696Z // end inline asm 2026-02-21T09:21:01.7599836Z bar.sync 0; 2026-02-21T09:21:01.7600052Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1227}; 2026-02-21T09:21:01.7600310Z bar.sync 0; 2026-02-21T09:21:01.7600465Z // begin inline asm 2026-02-21T09:21:01.7600695Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r757, %r893}, [%r590]; 2026-02-21T09:21:01.7600974Z // end inline asm 2026-02-21T09:21:01.7601113Z bar.sync 0; 2026-02-21T09:21:01.7601328Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1229}; 2026-02-21T09:21:01.7601591Z bar.sync 0; 2026-02-21T09:21:01.7601727Z // begin inline asm 2026-02-21T09:21:01.7601960Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r759, %r895}, [%r590]; 2026-02-21T09:21:01.7602231Z // end inline asm 2026-02-21T09:21:01.7602379Z bar.sync 0; 2026-02-21T09:21:01.7602584Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1230}; 2026-02-21T09:21:01.7602844Z bar.sync 0; 2026-02-21T09:21:01.7602996Z // begin inline asm 2026-02-21T09:21:01.7603231Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 
{%r760, %r896}, [%r590]; 2026-02-21T09:21:01.7603505Z // end inline asm 2026-02-21T09:21:01.7603655Z bar.sync 0; 2026-02-21T09:21:01.7603869Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1232}; 2026-02-21T09:21:01.7604129Z bar.sync 0; 2026-02-21T09:21:01.7604272Z // begin inline asm 2026-02-21T09:21:01.7604651Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r762, %r898}, [%r590]; 2026-02-21T09:21:01.7604926Z // end inline asm 2026-02-21T09:21:01.7605078Z bar.sync 0; 2026-02-21T09:21:01.7605294Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1231}; 2026-02-21T09:21:01.7605550Z bar.sync 0; 2026-02-21T09:21:01.7605690Z // begin inline asm 2026-02-21T09:21:01.7605924Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r761, %r897}, [%r590]; 2026-02-21T09:21:01.7606192Z // end inline asm 2026-02-21T09:21:01.7606335Z bar.sync 0; 2026-02-21T09:21:01.7606688Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1233}; 2026-02-21T09:21:01.7606950Z bar.sync 0; 2026-02-21T09:21:01.7607088Z // begin inline asm 2026-02-21T09:21:01.7607393Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r763, %r899}, [%r590]; 2026-02-21T09:21:01.7607668Z // end inline asm 2026-02-21T09:21:01.7607806Z bar.sync 0; 2026-02-21T09:21:01.7608012Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1234}; 2026-02-21T09:21:01.7608272Z bar.sync 0; 2026-02-21T09:21:01.7608410Z // begin inline asm 2026-02-21T09:21:01.7608637Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r764, %r900}, [%r590]; 2026-02-21T09:21:01.7608908Z // end inline asm 2026-02-21T09:21:01.7609052Z bar.sync 0; 2026-02-21T09:21:01.7609252Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1236}; 2026-02-21T09:21:01.7609509Z bar.sync 0; 2026-02-21T09:21:01.7609641Z // begin inline asm 2026-02-21T09:21:01.7609869Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r766, %r902}, [%r590]; 2026-02-21T09:21:01.7610138Z // end inline asm 2026-02-21T09:21:01.7610355Z bar.sync 0; 2026-02-21T09:21:01.7610579Z stmatrix.sync.aligned.m8n8.x1.shared.b16 
[%r1091], {%r1235}; 2026-02-21T09:21:01.7610838Z bar.sync 0; 2026-02-21T09:21:01.7610978Z // begin inline asm 2026-02-21T09:21:01.7611205Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r765, %r901}, [%r590]; 2026-02-21T09:21:01.7611477Z // end inline asm 2026-02-21T09:21:01.7611613Z bar.sync 0; 2026-02-21T09:21:01.7611820Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1237}; 2026-02-21T09:21:01.7612075Z bar.sync 0; 2026-02-21T09:21:01.7612211Z // begin inline asm 2026-02-21T09:21:01.7612436Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r767, %r903}, [%r590]; 2026-02-21T09:21:01.7612709Z // end inline asm 2026-02-21T09:21:01.7612855Z bar.sync 0; 2026-02-21T09:21:01.7613074Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1238}; 2026-02-21T09:21:01.7613335Z bar.sync 0; 2026-02-21T09:21:01.7613470Z // begin inline asm 2026-02-21T09:21:01.7613700Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r768, %r904}, [%r590]; 2026-02-21T09:21:01.7613968Z // end inline asm 2026-02-21T09:21:01.7614110Z bar.sync 0; 2026-02-21T09:21:01.7614312Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1240}; 2026-02-21T09:21:01.7614571Z bar.sync 0; 2026-02-21T09:21:01.7614703Z // begin inline asm 2026-02-21T09:21:01.7614933Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r770, %r906}, [%r590]; 2026-02-21T09:21:01.7615213Z // end inline asm 2026-02-21T09:21:01.7615353Z bar.sync 0; 2026-02-21T09:21:01.7615561Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1239}; 2026-02-21T09:21:01.7615815Z bar.sync 0; 2026-02-21T09:21:01.7615966Z // begin inline asm 2026-02-21T09:21:01.7616199Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r769, %r905}, [%r590]; 2026-02-21T09:21:01.7616583Z // end inline asm 2026-02-21T09:21:01.7616733Z bar.sync 0; 2026-02-21T09:21:01.7616944Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1241}; 2026-02-21T09:21:01.7617204Z bar.sync 0; 2026-02-21T09:21:01.7617341Z // begin inline asm 2026-02-21T09:21:01.7617584Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 
{%r771, %r907}, [%r590]; 2026-02-21T09:21:01.7617852Z // end inline asm 2026-02-21T09:21:01.7617997Z bar.sync 0; 2026-02-21T09:21:01.7618196Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1242}; 2026-02-21T09:21:01.7618456Z bar.sync 0; 2026-02-21T09:21:01.7618589Z // begin inline asm 2026-02-21T09:21:01.7618964Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r772, %r908}, [%r590]; 2026-02-21T09:21:01.7619231Z // end inline asm 2026-02-21T09:21:01.7619371Z bar.sync 0; 2026-02-21T09:21:01.7619577Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1244}; 2026-02-21T09:21:01.7619831Z bar.sync 0; 2026-02-21T09:21:01.7619967Z // begin inline asm 2026-02-21T09:21:01.7620198Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r774, %r910}, [%r590]; 2026-02-21T09:21:01.7620469Z // end inline asm 2026-02-21T09:21:01.7620605Z bar.sync 0; 2026-02-21T09:21:01.7620812Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1243}; 2026-02-21T09:21:01.7621066Z bar.sync 0; 2026-02-21T09:21:01.7621202Z // begin inline asm 2026-02-21T09:21:01.7621432Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r773, %r909}, [%r590]; 2026-02-21T09:21:01.7621769Z // end inline asm 2026-02-21T09:21:01.7621928Z bar.sync 0; 2026-02-21T09:21:01.7622134Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1245}; 2026-02-21T09:21:01.7622398Z bar.sync 0; 2026-02-21T09:21:01.7622530Z // begin inline asm 2026-02-21T09:21:01.7622757Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r775, %r911}, [%r590]; 2026-02-21T09:21:01.7623026Z // end inline asm 2026-02-21T09:21:01.7623166Z bar.sync 0; 2026-02-21T09:21:01.7623365Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1246}; 2026-02-21T09:21:01.7623623Z bar.sync 0; 2026-02-21T09:21:01.7623761Z // begin inline asm 2026-02-21T09:21:01.7623987Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r776, %r912}, [%r590]; 2026-02-21T09:21:01.7624263Z // end inline asm 2026-02-21T09:21:01.7624470Z bar.sync 0; 2026-02-21T09:21:01.7624680Z stmatrix.sync.aligned.m8n8.x1.shared.b16 
[%r1091], {%r1248}; 2026-02-21T09:21:01.7624933Z bar.sync 0; 2026-02-21T09:21:01.7625068Z // begin inline asm 2026-02-21T09:21:01.7625296Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r778, %r914}, [%r590]; 2026-02-21T09:21:01.7625567Z // end inline asm 2026-02-21T09:21:01.7625705Z bar.sync 0; 2026-02-21T09:21:01.7625914Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1247}; 2026-02-21T09:21:01.7626170Z bar.sync 0; 2026-02-21T09:21:01.7626304Z // begin inline asm 2026-02-21T09:21:01.7626672Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r777, %r913}, [%r590]; 2026-02-21T09:21:01.7626947Z // end inline asm 2026-02-21T09:21:01.7627089Z bar.sync 0; 2026-02-21T09:21:01.7627289Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1249}; 2026-02-21T09:21:01.7627548Z bar.sync 0; 2026-02-21T09:21:01.7627679Z // begin inline asm 2026-02-21T09:21:01.7627915Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r779, %r915}, [%r590]; 2026-02-21T09:21:01.7628188Z // end inline asm 2026-02-21T09:21:01.7628401Z bar.sync 0; 2026-02-21T09:21:01.7628612Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1250}; 2026-02-21T09:21:01.7628868Z bar.sync 0; 2026-02-21T09:21:01.7629006Z // begin inline asm 2026-02-21T09:21:01.7629233Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r780, %r916}, [%r590]; 2026-02-21T09:21:01.7629507Z // end inline asm 2026-02-21T09:21:01.7629643Z bar.sync 0; 2026-02-21T09:21:01.7629850Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1252}; 2026-02-21T09:21:01.7630105Z bar.sync 0; 2026-02-21T09:21:01.7630241Z // begin inline asm 2026-02-21T09:21:01.7630471Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r782, %r918}, [%r590]; 2026-02-21T09:21:01.7630741Z // end inline asm 2026-02-21T09:21:01.7630882Z bar.sync 0; 2026-02-21T09:21:01.7631083Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1251}; 2026-02-21T09:21:01.7631343Z bar.sync 0; 2026-02-21T09:21:01.7631477Z // begin inline asm 2026-02-21T09:21:01.7631725Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 
{%r781, %r917}, [%r590]; 2026-02-21T09:21:01.7631996Z // end inline asm 2026-02-21T09:21:01.7632134Z bar.sync 0; 2026-02-21T09:21:01.7632340Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r1253}; 2026-02-21T09:21:01.7632592Z bar.sync 0; 2026-02-21T09:21:01.7632728Z // begin inline asm 2026-02-21T09:21:01.7633041Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r783, %r919}, [%r590]; 2026-02-21T09:21:01.7633391Z // end inline asm 2026-02-21T09:21:01.7633529Z $L__tmp1: 2026-02-21T09:21:01.7633911Z .loc 2 291 36 // standard.py:291:36 @[ ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:87:40 ] 2026-02-21T09:21:01.7634335Z // begin inline asm 2026-02-21T09:21:01.7634510Z fence.proxy.async.shared::cta; 2026-02-21T09:21:01.7634698Z // end inline asm 2026-02-21T09:21:01.7634865Z shfl.sync.idx.b32 %r1372, %r6, 0, 31, -1; 2026-02-21T09:21:01.7635093Z wgmma.fence.sync.aligned; 2026-02-21T09:21:01.7635288Z mov.pred %p3, -1; 2026-02-21T09:21:01.7635447Z // begin inline asm 2026-02-21T09:21:01.7636272Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r752,%r753,%r754,%r755,%r756,%r757,%r758,%r759,%r760,%r761,%r762,%r763,%r764,%r765,%r766,%r767,%r768,%r769,%r770,%r771,%r772,%r773,%r774,%r775,%r776,%r777,%r778,%r779,%r780,%r781,%r782,%r783}, {%r884,%r885,%r886,%r887}, %rd1, %p3, 1, 1; 2026-02-21T09:21:01.7637224Z // end inline asm 2026-02-21T09:21:01.7637374Z // begin inline asm 2026-02-21T09:21:01.7638113Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r752,%r753,%r754,%r755,%r756,%r757,%r758,%r759,%r760,%r761,%r762,%r763,%r764,%r765,%r766,%r767,%r768,%r769,%r770,%r771,%r772,%r773,%r774,%r775,%r776,%r777,%r778,%r779,%r780,%r781,%r782,%r783}, {%r952,%r953,%r954,%r955}, %rd2, %p3, 1, 1; 2026-02-21T09:21:01.7638899Z // end inline asm 2026-02-21T09:21:01.7639043Z // begin inline asm 2026-02-21T09:21:01.7639854Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r888,%r889,%r890,%r891,%r892,%r893,%r894,%r895,%r896,%r897,%r898,%r899,%r900,%r901,%r902,%r903,%r904,%r905,%r906,%r907,%r908,%r909,%r910,%r911,%r912,%r913,%r914,%r915,%r916,%r917,%r918,%r919}, {%r884,%r885,%r886,%r887}, %rd3, %p3, 1, 1; 2026-02-21T09:21:01.7640641Z // end inline asm 2026-02-21T09:21:01.7640783Z // begin inline asm 2026-02-21T09:21:01.7641511Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r888,%r889,%r890,%r891,%r892,%r893,%r894,%r895,%r896,%r897,%r898,%r899,%r900,%r901,%r902,%r903,%r904,%r905,%r906,%r907,%r908,%r909,%r910,%r911,%r912,%r913,%r914,%r915,%r916,%r917,%r918,%r919}, {%r952,%r953,%r954,%r955}, %rd4, %p3, 1, 1; 2026-02-21T09:21:01.7642315Z // end inline asm 2026-02-21T09:21:01.7642477Z wgmma.commit_group.sync.aligned; 2026-02-21T09:21:01.7642675Z mov.b32 %r1323, 0; 2026-02-21T09:21:01.7642825Z mov.b32 %r1021, %r1323; 2026-02-21T09:21:01.7642994Z mov.b32 %r1022, %r1323; 2026-02-21T09:21:01.7643155Z mov.b32 %r1020, %r491; 2026-02-21T09:21:01.7643317Z // begin inline asm 2026-02-21T09:21:01.7644292Z // wait for regs: %r752,%r753,%r754,%r755,%r756,%r757,%r758,%r759,%r760,%r761,%r762,%r763,%r764,%r765,%r766,%r767,%r768,%r769,%r770,%r771,%r772,%r773,%r774,%r775,%r776,%r777,%r778,%r779,%r780,%r781,%r782,%r783,%r888,%r889,%r890,%r891,%r892,%r893,%r894,%r895,%r896,%r897,%r898,%r899,%r900,%r901,%r902,%r903,%r904,%r905,%r906,%r907,%r908,%r909,%r910,%r911,%r912,%r913,%r914,%r915,%r916,%r917,%r918,%r919,%r1020,%r1021,%r1022 2026-02-21T09:21:01.7645345Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:21:01.7645538Z // end inline asm 2026-02-21T09:21:01.7645678Z $L__tmp2: 2026-02-21T09:21:01.7645980Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7646354Z add.s32 %r1373, %r1367, 40960; 2026-02-21T09:21:01.7646814Z .loc 1 55 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:55:32 2026-02-21T09:21:01.7647173Z add.s32 %r1374, %r1373, %r31; 2026-02-21T09:21:01.7647362Z ld.shared.b16 
%rs21, [%r1374]; 2026-02-21T09:21:01.7647552Z ld.shared.b16 %rs22, [%r1374+256]; 2026-02-21T09:21:01.7647756Z ld.shared.b16 %rs23, [%r1374+16]; 2026-02-21T09:21:01.7647954Z ld.shared.b16 %rs24, [%r1374+272]; 2026-02-21T09:21:01.7648141Z add.s32 %r1375, %r1373, %r32; 2026-02-21T09:21:01.7648327Z ld.shared.b16 %rs25, [%r1375]; 2026-02-21T09:21:01.7648509Z ld.shared.b16 %rs26, [%r1375+256]; 2026-02-21T09:21:01.7648861Z ld.shared.b16 %rs27, [%r1375+16]; 2026-02-21T09:21:01.7649047Z ld.shared.b16 %rs28, [%r1375+272]; 2026-02-21T09:21:01.7649242Z cvt.f32.bf16 %r1218, %rs21; 2026-02-21T09:21:01.7649438Z cvt.f32.bf16 %r1219, %rs22; 2026-02-21T09:21:01.7649609Z cvt.f32.bf16 %r1220, %rs25; 2026-02-21T09:21:01.7649780Z cvt.f32.bf16 %r1221, %rs26; 2026-02-21T09:21:01.7649948Z cvt.f32.bf16 %r1286, %rs23; 2026-02-21T09:21:01.7650117Z cvt.f32.bf16 %r1287, %rs24; 2026-02-21T09:21:01.7650282Z cvt.f32.bf16 %r1288, %rs27; 2026-02-21T09:21:01.7650450Z cvt.f32.bf16 %r1289, %rs28; 2026-02-21T09:21:01.7650762Z .loc 1 57 34 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:57:34 2026-02-21T09:21:01.7651117Z add.s32 %r1376, %r5172, 65536; 2026-02-21T09:21:01.7651374Z cvt.s64.s32 %rd64, %r1376; 2026-02-21T09:21:01.7651551Z add.s64 %rd57, %rd28, %rd64; 2026-02-21T09:21:01.7651874Z .loc 1 57 87 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:57:87 2026-02-21T09:21:01.7652224Z // begin inline asm 2026-02-21T09:21:01.7652383Z mov.u64 %rd56, 0x0; 2026-02-21T09:21:01.7652598Z createpolicy.fractional.L2::evict_first.b64 %rd56, 1.0; 2026-02-21T09:21:01.7652852Z // end inline asm 2026-02-21T09:21:01.7652999Z // begin inline asm 2026-02-21T09:21:01.7653151Z mov.u16 %rs2, 0x0; 2026-02-21T09:21:01.7653396Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs2 }, [ %rd57 + 0 ], %rd56; 2026-02-21T09:21:01.7653703Z // end inline asm 2026-02-21T09:21:01.7654069Z .loc 1 65 28 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:65:28 2026-02-21T09:21:01.7654424Z bar.sync 0; 
2026-02-21T09:21:01.7654574Z st.shared.b8 [%r33], %rs2; 2026-02-21T09:21:01.7654743Z bar.sync 0; 2026-02-21T09:21:01.7654912Z ld.shared.v2.b8 {%rs29, %rs30}, [%r34]; 2026-02-21T09:21:01.7655266Z .loc 1 60 28 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:60:28 2026-02-21T09:21:01.7655614Z shl.b16 %rs31, %rs29, 4; 2026-02-21T09:21:01.7655798Z shl.b16 %rs32, %rs30, 4; 2026-02-21T09:21:01.7656107Z .loc 1 75 58 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:75:58 2026-02-21T09:21:01.7656598Z selp.b16 %rs33, %rs31, %rs29, %p63; 2026-02-21T09:21:01.7656808Z cvt.s16.s8 %rs34, %rs33; 2026-02-21T09:21:01.7656986Z shr.s16 %rs35, %rs34, 4; 2026-02-21T09:21:01.7657161Z selp.b16 %rs36, %rs32, %rs30, %p63; 2026-02-21T09:21:01.7657359Z cvt.s16.s8 %rs37, %rs36; 2026-02-21T09:21:01.7657532Z shr.s16 %rs38, %rs37, 4; 2026-02-21T09:21:01.7657844Z .loc 1 80 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:80:32 2026-02-21T09:21:01.7658212Z cvt.rn.f32.s16 %r1377, %rs35; 2026-02-21T09:21:01.7658395Z cvt.rn.f32.s16 %r1378, %rs38; 2026-02-21T09:21:01.7658576Z bar.sync 0; 2026-02-21T09:21:01.7658723Z st.shared.b32 [%r35], %r1377; 2026-02-21T09:21:01.7658915Z st.shared.b32 [%r36], %r1378; 2026-02-21T09:21:01.7659086Z $L__tmp3: 2026-02-21T09:21:01.7659453Z .loc 2 291 36 // standard.py:291:36 @[ ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:87:40 ] 2026-02-21T09:21:01.7659984Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r752, %r888}; 2026-02-21T09:21:01.7660266Z bar.sync 0; 2026-02-21T09:21:01.7660412Z // begin inline asm 2026-02-21T09:21:01.7660640Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1222}, [%r1091]; 2026-02-21T09:21:01.7660913Z // end inline asm 2026-02-21T09:21:01.7661054Z bar.sync 0; 2026-02-21T09:21:01.7661281Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r754, %r890}; 2026-02-21T09:21:01.7661560Z bar.sync 0; 2026-02-21T09:21:01.7661704Z // begin inline asm 2026-02-21T09:21:01.7661928Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1224}, [%r1091]; 2026-02-21T09:21:01.7662193Z // end inline asm 2026-02-21T09:21:01.7662341Z bar.sync 0; 2026-02-21T09:21:01.7662560Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r753, %r889}; 2026-02-21T09:21:01.7662836Z bar.sync 0; 2026-02-21T09:21:01.7663125Z // begin inline asm 2026-02-21T09:21:01.7663354Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1223}, [%r1091]; 2026-02-21T09:21:01.7663627Z // end inline asm 2026-02-21T09:21:01.7663783Z bar.sync 0; 2026-02-21T09:21:01.7664013Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r755, %r891}; 2026-02-21T09:21:01.7664298Z bar.sync 0; 2026-02-21T09:21:01.7664444Z // begin inline asm 2026-02-21T09:21:01.7664680Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1225}, [%r1091]; 2026-02-21T09:21:01.7664953Z // end inline asm 2026-02-21T09:21:01.7665092Z bar.sync 0; 2026-02-21T09:21:01.7665313Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r756, %r892}; 2026-02-21T09:21:01.7665581Z bar.sync 0; 2026-02-21T09:21:01.7665720Z // begin inline asm 2026-02-21T09:21:01.7666016Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1226}, [%r1091]; 2026-02-21T09:21:01.7666288Z // end inline asm 2026-02-21T09:21:01.7666430Z bar.sync 0; 2026-02-21T09:21:01.7666778Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r758, %r894}; 2026-02-21T09:21:01.7667063Z bar.sync 0; 2026-02-21T09:21:01.7667200Z // begin inline asm 2026-02-21T09:21:01.7667422Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1228}, [%r1091]; 2026-02-21T09:21:01.7667685Z // end inline asm 2026-02-21T09:21:01.7667830Z bar.sync 0; 2026-02-21T09:21:01.7668039Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r757, %r893}; 2026-02-21T09:21:01.7668410Z bar.sync 0; 2026-02-21T09:21:01.7668555Z // begin inline asm 2026-02-21T09:21:01.7668774Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1227}, [%r1091]; 2026-02-21T09:21:01.7669152Z // end inline asm 2026-02-21T09:21:01.7669295Z bar.sync 0; 2026-02-21T09:21:01.7669516Z 
stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r759, %r895}; 2026-02-21T09:21:01.7669788Z bar.sync 0; 2026-02-21T09:21:01.7669931Z // begin inline asm 2026-02-21T09:21:01.7670153Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1229}, [%r1091]; 2026-02-21T09:21:01.7670419Z // end inline asm 2026-02-21T09:21:01.7670561Z bar.sync 0; 2026-02-21T09:21:01.7670782Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r760, %r896}; 2026-02-21T09:21:01.7671053Z bar.sync 0; 2026-02-21T09:21:01.7671186Z // begin inline asm 2026-02-21T09:21:01.7671407Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1230}, [%r1091]; 2026-02-21T09:21:01.7671668Z // end inline asm 2026-02-21T09:21:01.7671825Z bar.sync 0; 2026-02-21T09:21:01.7672045Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r762, %r898}; 2026-02-21T09:21:01.7672325Z bar.sync 0; 2026-02-21T09:21:01.7672469Z // begin inline asm 2026-02-21T09:21:01.7672711Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1232}, [%r1091]; 2026-02-21T09:21:01.7672981Z // end inline asm 2026-02-21T09:21:01.7673122Z bar.sync 0; 2026-02-21T09:21:01.7673342Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r761, %r897}; 2026-02-21T09:21:01.7673620Z bar.sync 0; 2026-02-21T09:21:01.7673765Z // begin inline asm 2026-02-21T09:21:01.7673989Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1231}, [%r1091]; 2026-02-21T09:21:01.7674262Z // end inline asm 2026-02-21T09:21:01.7674400Z bar.sync 0; 2026-02-21T09:21:01.7674619Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r763, %r899}; 2026-02-21T09:21:01.7674893Z bar.sync 0; 2026-02-21T09:21:01.7675042Z // begin inline asm 2026-02-21T09:21:01.7675267Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1233}, [%r1091]; 2026-02-21T09:21:01.7675534Z // end inline asm 2026-02-21T09:21:01.7675682Z bar.sync 0; 2026-02-21T09:21:01.7675905Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r764, %r900}; 2026-02-21T09:21:01.7676182Z bar.sync 0; 2026-02-21T09:21:01.7676319Z // begin inline asm 2026-02-21T09:21:01.7676694Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1234}, [%r1091]; 2026-02-21T09:21:01.7676976Z // end inline asm 2026-02-21T09:21:01.7677135Z bar.sync 0; 2026-02-21T09:21:01.7677361Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r766, %r902}; 2026-02-21T09:21:01.7677632Z bar.sync 0; 2026-02-21T09:21:01.7677917Z // begin inline asm 2026-02-21T09:21:01.7678135Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1236}, [%r1091]; 2026-02-21T09:21:01.7678400Z // end inline asm 2026-02-21T09:21:01.7678541Z bar.sync 0; 2026-02-21T09:21:01.7678763Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r765, %r901}; 2026-02-21T09:21:01.7679046Z bar.sync 0; 2026-02-21T09:21:01.7679200Z // begin inline asm 2026-02-21T09:21:01.7679426Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1235}, [%r1091]; 2026-02-21T09:21:01.7679695Z // end inline asm 2026-02-21T09:21:01.7679843Z bar.sync 0; 2026-02-21T09:21:01.7680066Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r767, %r903}; 2026-02-21T09:21:01.7680354Z bar.sync 0; 2026-02-21T09:21:01.7680491Z // begin inline asm 2026-02-21T09:21:01.7680792Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1237}, [%r1091]; 2026-02-21T09:21:01.7681057Z // end inline asm 2026-02-21T09:21:01.7681204Z bar.sync 0; 2026-02-21T09:21:01.7681425Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r768, %r904}; 2026-02-21T09:21:01.7681702Z bar.sync 0; 2026-02-21T09:21:01.7681841Z // begin inline asm 2026-02-21T09:21:01.7682057Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1238}, [%r1091]; 2026-02-21T09:21:01.7682321Z // end inline asm 2026-02-21T09:21:01.7682458Z bar.sync 0; 2026-02-21T09:21:01.7682684Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r770, %r906}; 2026-02-21T09:21:01.7682954Z bar.sync 0; 2026-02-21T09:21:01.7683098Z // begin inline asm 2026-02-21T09:21:01.7683323Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1240}, [%r1091]; 2026-02-21T09:21:01.7683676Z // end inline asm 2026-02-21T09:21:01.7683827Z bar.sync 0; 2026-02-21T09:21:01.7684045Z 
stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r769, %r905}; 2026-02-21T09:21:01.7684320Z bar.sync 0; 2026-02-21T09:21:01.7684459Z // begin inline asm 2026-02-21T09:21:01.7684693Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1239}, [%r1091]; 2026-02-21T09:21:01.7684959Z // end inline asm 2026-02-21T09:21:01.7685114Z bar.sync 0; 2026-02-21T09:21:01.7685329Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r771, %r907}; 2026-02-21T09:21:01.7685601Z bar.sync 0; 2026-02-21T09:21:01.7685739Z // begin inline asm 2026-02-21T09:21:01.7685966Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1241}, [%r1091]; 2026-02-21T09:21:01.7686231Z // end inline asm 2026-02-21T09:21:01.7686371Z bar.sync 0; 2026-02-21T09:21:01.7686732Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r772, %r908}; 2026-02-21T09:21:01.7687004Z bar.sync 0; 2026-02-21T09:21:01.7687149Z // begin inline asm 2026-02-21T09:21:01.7687367Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1242}, [%r1091]; 2026-02-21T09:21:01.7687634Z // end inline asm 2026-02-21T09:21:01.7687773Z bar.sync 0; 2026-02-21T09:21:01.7687991Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r774, %r910}; 2026-02-21T09:21:01.7688270Z bar.sync 0; 2026-02-21T09:21:01.7688419Z // begin inline asm 2026-02-21T09:21:01.7688640Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1244}, [%r1091]; 2026-02-21T09:21:01.7688902Z // end inline asm 2026-02-21T09:21:01.7689039Z bar.sync 0; 2026-02-21T09:21:01.7689250Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r773, %r909}; 2026-02-21T09:21:01.7689524Z bar.sync 0; 2026-02-21T09:21:01.7689659Z // begin inline asm 2026-02-21T09:21:01.7689882Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1243}, [%r1091]; 2026-02-21T09:21:01.7690142Z // end inline asm 2026-02-21T09:21:01.7690289Z bar.sync 0; 2026-02-21T09:21:01.7690513Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r775, %r911}; 2026-02-21T09:21:01.7690784Z bar.sync 0; 2026-02-21T09:21:01.7690927Z // begin inline asm 2026-02-21T09:21:01.7691141Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1245}, [%r1091]; 2026-02-21T09:21:01.7691413Z // end inline asm 2026-02-21T09:21:01.7691553Z bar.sync 0; 2026-02-21T09:21:01.7691768Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r776, %r912}; 2026-02-21T09:21:01.7692037Z bar.sync 0; 2026-02-21T09:21:01.7692279Z // begin inline asm 2026-02-21T09:21:01.7692561Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1246}, [%r1091]; 2026-02-21T09:21:01.7692827Z // end inline asm 2026-02-21T09:21:01.7692969Z bar.sync 0; 2026-02-21T09:21:01.7693182Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r778, %r914}; 2026-02-21T09:21:01.7693459Z bar.sync 0; 2026-02-21T09:21:01.7693594Z // begin inline asm 2026-02-21T09:21:01.7693816Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1248}, [%r1091]; 2026-02-21T09:21:01.7694086Z // end inline asm 2026-02-21T09:21:01.7694234Z bar.sync 0; 2026-02-21T09:21:01.7694449Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r777, %r913}; 2026-02-21T09:21:01.7694723Z bar.sync 0; 2026-02-21T09:21:01.7694862Z // begin inline asm 2026-02-21T09:21:01.7695150Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1247}, [%r1091]; 2026-02-21T09:21:01.7695423Z // end inline asm 2026-02-21T09:21:01.7695562Z bar.sync 0; 2026-02-21T09:21:01.7695782Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r779, %r915}; 2026-02-21T09:21:01.7696054Z bar.sync 0; 2026-02-21T09:21:01.7696208Z // begin inline asm 2026-02-21T09:21:01.7696427Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1249}, [%r1091]; 2026-02-21T09:21:01.7696818Z // end inline asm 2026-02-21T09:21:01.7696960Z bar.sync 0; 2026-02-21T09:21:01.7697180Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r780, %r916}; 2026-02-21T09:21:01.7697455Z bar.sync 0; 2026-02-21T09:21:01.7697590Z // begin inline asm 2026-02-21T09:21:01.7697816Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1250}, [%r1091]; 2026-02-21T09:21:01.7698177Z // end inline asm 2026-02-21T09:21:01.7698324Z bar.sync 0; 2026-02-21T09:21:01.7698537Z 
stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r782, %r918}; 2026-02-21T09:21:01.7698825Z bar.sync 0; 2026-02-21T09:21:01.7698966Z // begin inline asm 2026-02-21T09:21:01.7699184Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1252}, [%r1091]; 2026-02-21T09:21:01.7699447Z // end inline asm 2026-02-21T09:21:01.7699588Z bar.sync 0; 2026-02-21T09:21:01.7699813Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r781, %r917}; 2026-02-21T09:21:01.7700082Z bar.sync 0; 2026-02-21T09:21:01.7700223Z // begin inline asm 2026-02-21T09:21:01.7700439Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1251}, [%r1091]; 2026-02-21T09:21:01.7700708Z // end inline asm 2026-02-21T09:21:01.7700852Z bar.sync 0; 2026-02-21T09:21:01.7701076Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r783, %r919}; 2026-02-21T09:21:01.7701351Z bar.sync 0; 2026-02-21T09:21:01.7701499Z // begin inline asm 2026-02-21T09:21:01.7701728Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1253}, [%r1091]; 2026-02-21T09:21:01.7701993Z // end inline asm 2026-02-21T09:21:01.7702148Z // begin inline asm 2026-02-21T09:21:01.7702326Z fence.proxy.async.shared::cta; 2026-02-21T09:21:01.7702522Z // end inline asm 2026-02-21T09:21:01.7702683Z wgmma.fence.sync.aligned; 2026-02-21T09:21:01.7702882Z shl.b32 %r1379, %r1372, 8; 2026-02-21T09:21:01.7703063Z and.b32 %r1380, %r1379, 4096; 2026-02-21T09:21:01.7703253Z add.s32 %r1381, %r1380, %r491; 2026-02-21T09:21:01.7703441Z bfe.u32 %r1382, %r1381, 4, 14; 2026-02-21T09:21:01.7703620Z cvt.u64.u32 %rd65, %r1382; 2026-02-21T09:21:01.7703815Z or.b64 %rd59, %rd65, -9223371899382267904; 2026-02-21T09:21:01.7704023Z // begin inline asm 2026-02-21T09:21:01.7704879Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1222,%r1223,%r1224,%r1225,%r1226,%r1227,%r1228,%r1229,%r1230,%r1231,%r1232,%r1233,%r1234,%r1235,%r1236,%r1237,%r1238,%r1239,%r1240,%r1241,%r1242,%r1243,%r1244,%r1245,%r1246,%r1247,%r1248,%r1249,%r1250,%r1251,%r1252,%r1253}, {%r1218,%r1219,%r1220,%r1221}, %rd59, %p3, 1, 1; 
2026-02-21T09:21:01.7705776Z // end inline asm 2026-02-21T09:21:01.7705932Z add.s32 %r1383, %r1381, 32; 2026-02-21T09:21:01.7706112Z bfe.u32 %r1384, %r1383, 4, 14; 2026-02-21T09:21:01.7706289Z cvt.u64.u32 %rd66, %r1384; 2026-02-21T09:21:01.7706600Z or.b64 %rd60, %rd66, -9223371899382267904; 2026-02-21T09:21:01.7706820Z // begin inline asm 2026-02-21T09:21:01.7707810Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1222,%r1223,%r1224,%r1225,%r1226,%r1227,%r1228,%r1229,%r1230,%r1231,%r1232,%r1233,%r1234,%r1235,%r1236,%r1237,%r1238,%r1239,%r1240,%r1241,%r1242,%r1243,%r1244,%r1245,%r1246,%r1247,%r1248,%r1249,%r1250,%r1251,%r1252,%r1253}, {%r1286,%r1287,%r1288,%r1289}, %rd60, %p3, 1, 1; 2026-02-21T09:21:01.7708772Z // end inline asm 2026-02-21T09:21:01.7708939Z wgmma.commit_group.sync.aligned; 2026-02-21T09:21:01.7709145Z mov.b32 %r1322, %r491; 2026-02-21T09:21:01.7709315Z mov.b32 %r1324, %r1323; 2026-02-21T09:21:01.7709488Z // begin inline asm 2026-02-21T09:21:01.7710212Z // wait for regs: %r1222,%r1223,%r1224,%r1225,%r1226,%r1227,%r1228,%r1229,%r1230,%r1231,%r1232,%r1233,%r1234,%r1235,%r1236,%r1237,%r1238,%r1239,%r1240,%r1241,%r1242,%r1243,%r1244,%r1245,%r1246,%r1247,%r1248,%r1249,%r1250,%r1251,%r1252,%r1253,%r1322,%r1323,%r1324 2026-02-21T09:21:01.7710939Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:21:01.7711137Z // end inline asm 2026-02-21T09:21:01.7711283Z $L__tmp4: 2026-02-21T09:21:01.7711584Z .loc 1 43 78 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:43:78 2026-02-21T09:21:01.7711947Z add.s32 %r1385, %r5175, 1; 2026-02-21T09:21:01.7712135Z setp.gt.s32 %p12, %r1385, 4; 2026-02-21T09:21:01.7712325Z selp.b32 %r5175, 0, %r1385, %p12; 2026-02-21T09:21:01.7712675Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7713033Z add.s32 %r1386, %r5173, -16; 2026-02-21T09:21:01.7713313Z mad.wide.s32 %rd61, %r1386, 2, %rd27; 2026-02-21T09:21:01.7713667Z .loc 1 51 80 // 
ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7714024Z shl.b32 %r1387, %r5175, 13; 2026-02-21T09:21:01.7714209Z add.s32 %r1360, %r21, %r1387; 2026-02-21T09:21:01.7714388Z selp.b32 %r1361, 8, 0, %p10; 2026-02-21T09:21:01.7714566Z // begin inline asm 2026-02-21T09:21:01.7714800Z cp.async.ca.shared.global [ %r1360 + 0 ], [ %rd61 + 0 ], 0x8, %r1361; 2026-02-21T09:21:01.7715080Z // end inline asm 2026-02-21T09:21:01.7715238Z cp.async.commit_group; 2026-02-21T09:21:01.7715558Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7715922Z mad.wide.s32 %rd62, %r5173, 2, %rd27; 2026-02-21T09:21:01.7716263Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7716762Z add.s32 %r1362, %r22, %r1387; 2026-02-21T09:21:01.7716942Z // begin inline asm 2026-02-21T09:21:01.7717183Z cp.async.ca.shared.global [ %r1362 + 0 ], [ %rd62 + 0 ], 0x8, %r1361; 2026-02-21T09:21:01.7717460Z // end inline asm 2026-02-21T09:21:01.7717625Z cp.async.commit_group; 2026-02-21T09:21:01.7717938Z .loc 1 43 78 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:43:78 2026-02-21T09:21:01.7718306Z add.s32 %r5173, %r5173, 32; 2026-02-21T09:21:01.7718498Z add.s32 %r5172, %r5172, 131072; 2026-02-21T09:21:01.7718691Z setp.lt.u64 %p13, %rd219, 496; 2026-02-21T09:21:01.7718882Z @%p13 bra $L__BB0_3; 2026-02-21T09:21:01.7719091Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:21:01.7719501Z .loc 1 34 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:34:32 2026-02-21T09:21:01.7719857Z or.b32 %r1447, %r85, %r8; 2026-02-21T09:21:01.7720180Z .loc 1 36 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:36:32 2026-02-21T09:21:01.7720541Z or.b32 %r1448, %r86, %r12; 2026-02-21T09:21:01.7720717Z or.b32 %r1449, %r86, %r13; 2026-02-21T09:21:01.7720889Z or.b32 %r1450, %r86, %r14; 2026-02-21T09:21:01.7721054Z or.b32 %r1451, %r86, %r15; 2026-02-21T09:21:01.7721368Z 
.loc 1 43 78 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:43:78 2026-02-21T09:21:01.7721721Z cp.async.wait_group 0; 2026-02-21T09:21:01.7722036Z bar.sync 0; 2026-02-21T09:21:01.7722325Z .loc 1 90 28 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:90:28 2026-02-21T09:21:01.7722696Z cvt.rn.bf16x2.f32 %r1452, %r1223, %r1222; 2026-02-21T09:21:01.7722930Z cvt.rn.bf16x2.f32 %r1453, %r1225, %r1224; 2026-02-21T09:21:01.7723149Z cvt.rn.bf16x2.f32 %r1454, %r1227, %r1226; 2026-02-21T09:21:01.7723367Z cvt.rn.bf16x2.f32 %r1455, %r1229, %r1228; 2026-02-21T09:21:01.7723577Z cvt.rn.bf16x2.f32 %r1456, %r1231, %r1230; 2026-02-21T09:21:01.7723797Z cvt.rn.bf16x2.f32 %r1457, %r1233, %r1232; 2026-02-21T09:21:01.7724009Z cvt.rn.bf16x2.f32 %r1458, %r1235, %r1234; 2026-02-21T09:21:01.7724226Z cvt.rn.bf16x2.f32 %r1459, %r1237, %r1236; 2026-02-21T09:21:01.7724441Z cvt.rn.bf16x2.f32 %r1460, %r1239, %r1238; 2026-02-21T09:21:01.7724738Z cvt.rn.bf16x2.f32 %r1461, %r1241, %r1240; 2026-02-21T09:21:01.7724967Z cvt.rn.bf16x2.f32 %r1462, %r1243, %r1242; 2026-02-21T09:21:01.7725180Z cvt.rn.bf16x2.f32 %r1463, %r1245, %r1244; 2026-02-21T09:21:01.7725408Z cvt.rn.bf16x2.f32 %r1464, %r1247, %r1246; 2026-02-21T09:21:01.7725634Z cvt.rn.bf16x2.f32 %r1465, %r1249, %r1248; 2026-02-21T09:21:01.7725859Z cvt.rn.bf16x2.f32 %r1466, %r1251, %r1250; 2026-02-21T09:21:01.7726071Z cvt.rn.bf16x2.f32 %r1467, %r1253, %r1252; 2026-02-21T09:21:01.7726436Z .loc 1 91 43 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:91:43 2026-02-21T09:21:01.7726932Z shl.b32 %r1468, %r1448, 13; 2026-02-21T09:21:01.7727118Z shl.b32 %r1469, %r1449, 13; 2026-02-21T09:21:01.7727299Z shl.b32 %r1470, %r1450, 13; 2026-02-21T09:21:01.7727549Z shl.b32 %r1471, %r1451, 13; 2026-02-21T09:21:01.7727890Z .loc 1 91 50 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:91:50 2026-02-21T09:21:01.7728262Z add.s32 %r1472, %r1468, %r1447; 2026-02-21T09:21:01.7728458Z add.s32 %r1473, %r1469, %r1447; 
2026-02-21T09:21:01.7728651Z add.s32 %r1474, %r1470, %r1447; 2026-02-21T09:21:01.7728833Z add.s32 %r1475, %r1471, %r1447; 2026-02-21T09:21:01.7729169Z .loc 1 91 22 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:91:22 2026-02-21T09:21:01.7729536Z mad.wide.s32 %rd67, %r1472, 2, %rd29; 2026-02-21T09:21:01.7729752Z mad.wide.s32 %rd68, %r1473, 2, %rd29; 2026-02-21T09:21:01.7729952Z mad.wide.s32 %rd69, %r1474, 2, %rd29; 2026-02-21T09:21:01.7730158Z mad.wide.s32 %rd70, %r1475, 2, %rd29; 2026-02-21T09:21:01.7730495Z .loc 1 91 81 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:91:81 2026-02-21T09:21:01.7730909Z st.shared.v4.b32 [%r39], {%r1452, %r1454, %r1456, %r1458}; 2026-02-21T09:21:01.7731223Z st.shared.v4.b32 [%r39+512], {%r1453, %r1455, %r1457, %r1459}; 2026-02-21T09:21:01.7731525Z st.shared.v4.b32 [%r40], {%r1460, %r1462, %r1464, %r1466}; 2026-02-21T09:21:01.7731835Z st.shared.v4.b32 [%r40+512], {%r1461, %r1463, %r1465, %r1467}; 2026-02-21T09:21:01.7732089Z bar.sync 0; 2026-02-21T09:21:01.7732243Z // begin inline asm 2026-02-21T09:21:01.7732538Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1408, %r1409, %r1410, %r1411}, [%r1392]; 2026-02-21T09:21:01.7732880Z // end inline asm 2026-02-21T09:21:01.7733037Z // begin inline asm 2026-02-21T09:21:01.7733314Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1412, %r1413, %r1414, %r1415}, [%r1397]; 2026-02-21T09:21:01.7733643Z // end inline asm 2026-02-21T09:21:01.7733789Z // begin inline asm 2026-02-21T09:21:01.7734067Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1416, %r1417, %r1418, %r1419}, [%r1402]; 2026-02-21T09:21:01.7734393Z // end inline asm 2026-02-21T09:21:01.7734546Z // begin inline asm 2026-02-21T09:21:01.7734815Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1420, %r1421, %r1422, %r1423}, [%r1407]; 2026-02-21T09:21:01.7735147Z // end inline asm 2026-02-21T09:21:01.7735302Z // begin inline asm 2026-02-21T09:21:01.7735528Z st.global.v4.b32 [ %rd67 + 0 ], { %r1408, %r1409, %r1410, %r1411 }; 
2026-02-21T09:21:01.7735799Z // end inline asm 2026-02-21T09:21:01.7736092Z // begin inline asm 2026-02-21T09:21:01.7736326Z st.global.v4.b32 [ %rd68 + 0 ], { %r1412, %r1413, %r1414, %r1415 }; 2026-02-21T09:21:01.7736700Z // end inline asm 2026-02-21T09:21:01.7736851Z // begin inline asm 2026-02-21T09:21:01.7737064Z st.global.v4.b32 [ %rd69 + 0 ], { %r1416, %r1417, %r1418, %r1419 }; 2026-02-21T09:21:01.7737324Z // end inline asm 2026-02-21T09:21:01.7737477Z // begin inline asm 2026-02-21T09:21:01.7737688Z st.global.v4.b32 [ %rd70 + 0 ], { %r1420, %r1421, %r1422, %r1423 }; 2026-02-21T09:21:01.7737951Z // end inline asm 2026-02-21T09:21:01.7738280Z .loc 1 22 121 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:22:121 2026-02-21T09:21:01.7738656Z add.s32 %r1476, %r5171, 1; 2026-02-21T09:21:01.7739057Z .loc 1 29 33 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:29:33 2026-02-21T09:21:01.7739420Z shr.u32 %r1477, %r1476, 6; 2026-02-21T09:21:01.7739598Z and.b32 %r1478, %r1477, 33554424; 2026-02-21T09:21:01.7739940Z .loc 1 30 39 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:30:39 2026-02-21T09:21:01.7740006Z sub.s32 %r1479, 64, %r1478; 2026-02-21T09:21:01.7740214Z .loc 1 30 52 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:30:52 2026-02-21T09:21:01.7740280Z min.s32 %r1480, %r1479, 8; 2026-02-21T09:21:01.7740481Z .loc 1 31 45 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:31:45 2026-02-21T09:21:01.7740551Z and.b32 %r1481, %r1476, 511; 2026-02-21T09:21:01.7740817Z .loc 1 32 51 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:32:51 2026-02-21T09:21:01.7740889Z div.s32 %r1482, %r1481, %r1480; 2026-02-21T09:21:01.7741106Z .loc 1 31 64 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:31:64 2026-02-21T09:21:01.7741177Z mul.lo.s32 %r1483, %r1482, %r1480; 2026-02-21T09:21:01.7741242Z sub.s32 %r1484, %r1481, %r1483; 2026-02-21T09:21:01.7741449Z .loc 1 31 30 // 
ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:31:30 2026-02-21T09:21:01.7741518Z add.s32 %r1485, %r1484, %r1478; 2026-02-21T09:21:01.7741717Z .loc 1 33 27 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:33:27 2026-02-21T09:21:01.7741782Z shl.b32 %r161, %r1485, 7; 2026-02-21T09:21:01.7742003Z .loc 1 35 27 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:35:27 2026-02-21T09:21:01.7742068Z shl.b32 %r162, %r1482, 8; 2026-02-21T09:21:01.7742271Z .loc 1 36 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:36:32 2026-02-21T09:21:01.7742342Z or.b32 %r1486, %r162, %r11; 2026-02-21T09:21:01.7742543Z .loc 1 51 53 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:53 2026-02-21T09:21:01.7742608Z shl.b32 %r1487, %r1486, 10; 2026-02-21T09:21:01.7742815Z .loc 1 51 60 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:60 2026-02-21T09:21:01.7742883Z or.b32 %r1488, %r1487, %r19; 2026-02-21T09:21:01.7743080Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7743156Z mad.wide.s32 %rd71, %r1488, 2, %rd27; 2026-02-21T09:21:01.7743365Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7743423Z bar.sync 0; 2026-02-21T09:21:01.7743486Z mov.b32 %r1425, 8; 2026-02-21T09:21:01.7743552Z // begin inline asm 2026-02-21T09:21:01.7743695Z cp.async.ca.shared.global [ %r21 + 0 ], [ %rd71 + 0 ], 0x8, %r1425; 2026-02-21T09:21:01.7743758Z // end inline asm 2026-02-21T09:21:01.7743835Z cp.async.commit_group; 2026-02-21T09:21:01.7744039Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7744109Z cvt.s64.s32 %rd82, %r1487; 2026-02-21T09:21:01.7744175Z or.b64 %rd83, %rd82, %rd10; 2026-02-21T09:21:01.7744378Z shl.b64 %rd84, %rd83, 1; 2026-02-21T09:21:01.7744444Z add.s64 %rd85, %rd27, %rd84; 2026-02-21T09:21:01.7744506Z add.s64 %rd72, %rd85, 32; 2026-02-21T09:21:01.7744716Z .loc 1 51 80 // 
ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7744778Z // begin inline asm 2026-02-21T09:21:01.7744928Z cp.async.ca.shared.global [ %r22 + 0 ], [ %rd72 + 0 ], 0x8, %r1425; 2026-02-21T09:21:01.7744990Z // end inline asm 2026-02-21T09:21:01.7745064Z cp.async.commit_group; 2026-02-21T09:21:01.7745267Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7745332Z add.s64 %rd73, %rd85, 64; 2026-02-21T09:21:01.7745610Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7745670Z bar.sync 0; 2026-02-21T09:21:01.7745731Z // begin inline asm 2026-02-21T09:21:01.7745868Z cp.async.ca.shared.global [ %r23 + 0 ], [ %rd73 + 0 ], 0x8, %r1425; 2026-02-21T09:21:01.7745930Z // end inline asm 2026-02-21T09:21:01.7745997Z cp.async.commit_group; 2026-02-21T09:21:01.7748197Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7748388Z add.s64 %rd74, %rd85, 96; 2026-02-21T09:21:01.7748639Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7748708Z // begin inline asm 2026-02-21T09:21:01.7748861Z cp.async.ca.shared.global [ %r24 + 0 ], [ %rd74 + 0 ], 0x8, %r1425; 2026-02-21T09:21:01.7748924Z // end inline asm 2026-02-21T09:21:01.7748996Z cp.async.commit_group; 2026-02-21T09:21:01.7749224Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7749299Z add.s64 %rd75, %rd85, 128; 2026-02-21T09:21:01.7749518Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7749582Z bar.sync 0; 2026-02-21T09:21:01.7749645Z // begin inline asm 2026-02-21T09:21:01.7749795Z cp.async.ca.shared.global [ %r25 + 0 ], [ %rd75 + 0 ], 0x8, %r1425; 2026-02-21T09:21:01.7749854Z // end inline asm 2026-02-21T09:21:01.7749945Z cp.async.commit_group; 2026-02-21T09:21:01.7750168Z .loc 1 51 32 // 
ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7750235Z add.s64 %rd76, %rd85, 160; 2026-02-21T09:21:01.7750448Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7750511Z // begin inline asm 2026-02-21T09:21:01.7750645Z cp.async.ca.shared.global [ %r26 + 0 ], [ %rd76 + 0 ], 0x8, %r1425; 2026-02-21T09:21:01.7750713Z // end inline asm 2026-02-21T09:21:01.7750778Z cp.async.commit_group; 2026-02-21T09:21:01.7750981Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7751054Z add.s64 %rd77, %rd85, 192; 2026-02-21T09:21:01.7751256Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7751313Z bar.sync 0; 2026-02-21T09:21:01.7751375Z // begin inline asm 2026-02-21T09:21:01.7751513Z cp.async.ca.shared.global [ %r27 + 0 ], [ %rd77 + 0 ], 0x8, %r1425; 2026-02-21T09:21:01.7751572Z // end inline asm 2026-02-21T09:21:01.7751637Z cp.async.commit_group; 2026-02-21T09:21:01.7751842Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7751904Z add.s64 %rd78, %rd85, 224; 2026-02-21T09:21:01.7752105Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7752172Z // begin inline asm 2026-02-21T09:21:01.7752302Z cp.async.ca.shared.global [ %r28 + 0 ], [ %rd78 + 0 ], 0x8, %r1425; 2026-02-21T09:21:01.7752360Z // end inline asm 2026-02-21T09:21:01.7752590Z cp.async.commit_group; 2026-02-21T09:21:01.7752813Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7752877Z add.s64 %rd79, %rd85, 256; 2026-02-21T09:21:01.7753083Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7753146Z bar.sync 0; 2026-02-21T09:21:01.7753206Z // begin inline asm 2026-02-21T09:21:01.7753335Z cp.async.ca.shared.global [ %r29 + 0 ], [ %rd79 + 0 ], 0x8, %r1425; 
2026-02-21T09:21:01.7753397Z // end inline asm 2026-02-21T09:21:01.7753461Z cp.async.commit_group; 2026-02-21T09:21:01.7753660Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7753723Z add.s64 %rd80, %rd85, 288; 2026-02-21T09:21:01.7753993Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7754056Z // begin inline asm 2026-02-21T09:21:01.7754186Z cp.async.ca.shared.global [ %r30 + 0 ], [ %rd80 + 0 ], 0x8, %r1425; 2026-02-21T09:21:01.7754248Z // end inline asm 2026-02-21T09:21:01.7754325Z cp.async.commit_group; 2026-02-21T09:21:01.7754626Z .loc 1 43 78 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:43:78 2026-02-21T09:21:01.7754700Z shl.b32 %r1489, %r1482, 18; 2026-02-21T09:21:01.7754765Z or.b32 %r5209, %r45, %r1489; 2026-02-21T09:21:01.7754831Z add.s32 %r1490, %r1481, %r84; 2026-02-21T09:21:01.7754897Z sub.s32 %r1491, %r1490, %r1483; 2026-02-21T09:21:01.7754967Z shl.b32 %r1492, %r1491, 7; 2026-02-21T09:21:01.7755029Z add.s32 %r5208, %r46, %r1492; 2026-02-21T09:21:01.7755092Z mov.b32 %r2127, 0f00000000; 2026-02-21T09:21:01.7755154Z mov.b32 %r5211, 4; 2026-02-21T09:21:01.7760256Z mov.b32 %r5210, -1; 2026-02-21T09:21:01.7760365Z mov.b64 %rd220, -16; 2026-02-21T09:21:01.7760435Z mov.b32 %r2128, %r2127; 2026-02-21T09:21:01.7760502Z mov.b32 %r2129, %r2127; 2026-02-21T09:21:01.7760571Z mov.b32 %r2130, %r2127; 2026-02-21T09:21:01.7760631Z mov.b32 %r2131, %r2127; 2026-02-21T09:21:01.7760690Z mov.b32 %r2132, %r2127; 2026-02-21T09:21:01.7760756Z mov.b32 %r2133, %r2127; 2026-02-21T09:21:01.7760816Z mov.b32 %r2134, %r2127; 2026-02-21T09:21:01.7760876Z mov.b32 %r2135, %r2127; 2026-02-21T09:21:01.7760940Z mov.b32 %r2136, %r2127; 2026-02-21T09:21:01.7761003Z mov.b32 %r2137, %r2127; 2026-02-21T09:21:01.7761060Z mov.b32 %r2138, %r2127; 2026-02-21T09:21:01.7761119Z mov.b32 %r2139, %r2127; 2026-02-21T09:21:01.7761184Z mov.b32 %r2140, %r2127; 2026-02-21T09:21:01.7761244Z 
mov.b32 %r2141, %r2127; 2026-02-21T09:21:01.7761302Z mov.b32 %r2142, %r2127; 2026-02-21T09:21:01.7761367Z mov.b32 %r2143, %r2127; 2026-02-21T09:21:01.7761426Z mov.b32 %r2144, %r2127; 2026-02-21T09:21:01.7761487Z mov.b32 %r2145, %r2127; 2026-02-21T09:21:01.7761555Z mov.b32 %r2146, %r2127; 2026-02-21T09:21:01.7761616Z mov.b32 %r2147, %r2127; 2026-02-21T09:21:01.7761676Z mov.b32 %r2148, %r2127; 2026-02-21T09:21:01.7761739Z mov.b32 %r2149, %r2127; 2026-02-21T09:21:01.7761803Z mov.b32 %r2150, %r2127; 2026-02-21T09:21:01.7761864Z mov.b32 %r2151, %r2127; 2026-02-21T09:21:01.7761923Z mov.b32 %r2152, %r2127; 2026-02-21T09:21:01.7762002Z mov.b32 %r2153, %r2127; 2026-02-21T09:21:01.7762064Z mov.b32 %r2154, %r2127; 2026-02-21T09:21:01.7762124Z mov.b32 %r2155, %r2127; 2026-02-21T09:21:01.7762183Z mov.b32 %r2156, %r2127; 2026-02-21T09:21:01.7762247Z mov.b32 %r2157, %r2127; 2026-02-21T09:21:01.7762306Z mov.b32 %r2158, %r2127; 2026-02-21T09:21:01.7762437Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T09:21:01.7762557Z // => This Inner Loop Header: Depth=2 2026-02-21T09:21:01.7762628Z add.s64 %rd220, %rd220, 16; 2026-02-21T09:21:01.7762703Z setp.lt.u64 %p21, %rd220, 432; 2026-02-21T09:21:01.7762768Z add.s32 %r2269, %r5210, 1; 2026-02-21T09:21:01.7762841Z setp.gt.s32 %p22, %r2269, 4; 2026-02-21T09:21:01.7763121Z selp.b32 %r5210, 0, %r2269, %p22; 2026-02-21T09:21:01.7763359Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7763441Z cp.async.wait_group 8; 2026-02-21T09:21:01.7763498Z bar.sync 0; 2026-02-21T09:21:01.7763566Z shl.b32 %r2270, %r5210, 13; 2026-02-21T09:21:01.7763639Z add.s32 %r2272, %r5145, %r2270; 2026-02-21T09:21:01.7763856Z .loc 1 55 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:55:32 2026-02-21T09:21:01.7763924Z add.s32 %r2273, %r2272, %r31; 2026-02-21T09:21:01.7763995Z ld.shared.b16 %rs41, [%r2273]; 2026-02-21T09:21:01.7764075Z ld.shared.b16 %rs42, [%r2273+256]; 2026-02-21T09:21:01.7764144Z 
ld.shared.b16 %rs43, [%r2273+16]; 2026-02-21T09:21:01.7764293Z ld.shared.b16 %rs44, [%r2273+272]; 2026-02-21T09:21:01.7764369Z add.s32 %r2274, %r2272, %r32; 2026-02-21T09:21:01.7764437Z ld.shared.b16 %rs45, [%r2274]; 2026-02-21T09:21:01.7764503Z ld.shared.b16 %rs46, [%r2274+256]; 2026-02-21T09:21:01.7764572Z ld.shared.b16 %rs47, [%r2274+16]; 2026-02-21T09:21:01.7764642Z ld.shared.b16 %rs48, [%r2274+272]; 2026-02-21T09:21:01.7764714Z cvt.f32.bf16 %r1789, %rs41; 2026-02-21T09:21:01.7764775Z cvt.f32.bf16 %r1790, %rs42; 2026-02-21T09:21:01.7764913Z cvt.f32.bf16 %r1791, %rs45; 2026-02-21T09:21:01.7764977Z cvt.f32.bf16 %r1792, %rs46; 2026-02-21T09:21:01.7765036Z cvt.f32.bf16 %r1857, %rs43; 2026-02-21T09:21:01.7765103Z cvt.f32.bf16 %r1858, %rs44; 2026-02-21T09:21:01.7765163Z cvt.f32.bf16 %r1859, %rs47; 2026-02-21T09:21:01.7765222Z cvt.f32.bf16 %r1860, %rs48; 2026-02-21T09:21:01.7765441Z .loc 1 57 34 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:57:34 2026-02-21T09:21:01.7765514Z cvt.s64.s32 %rd100, %r5208; 2026-02-21T09:21:01.7765580Z add.s64 %rd87, %rd28, %rd100; 2026-02-21T09:21:01.7765797Z .loc 1 57 87 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:57:87 2026-02-21T09:21:01.7765868Z // begin inline asm 2026-02-21T09:21:01.7765931Z mov.u64 %rd86, 0x0; 2026-02-21T09:21:01.7766073Z createpolicy.fractional.L2::evict_first.b64 %rd86, 1.0; 2026-02-21T09:21:01.7766141Z // end inline asm 2026-02-21T09:21:01.7766206Z // begin inline asm 2026-02-21T09:21:01.7766268Z mov.u16 %rs39, 0x0; 2026-02-21T09:21:01.7766435Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs39 }, [ %rd87 + 0 ], %rd86; 2026-02-21T09:21:01.7766671Z // end inline asm 2026-02-21T09:21:01.7766894Z .loc 1 65 28 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:65:28 2026-02-21T09:21:01.7766964Z st.shared.b8 [%r33], %rs39; 2026-02-21T09:21:01.7767024Z bar.sync 0; 2026-02-21T09:21:01.7767103Z ld.shared.v2.b8 {%rs49, %rs50}, [%r34]; 2026-02-21T09:21:01.7767316Z .loc 1 60 28 // 
ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:60:28 2026-02-21T09:21:01.7767388Z shl.b16 %rs51, %rs49, 4; 2026-02-21T09:21:01.7767450Z shl.b16 %rs52, %rs50, 4; 2026-02-21T09:21:01.7767660Z .loc 1 75 58 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:75:58 2026-02-21T09:21:01.7767736Z selp.b16 %rs53, %rs51, %rs49, %p63; 2026-02-21T09:21:01.7767807Z cvt.s16.s8 %rs54, %rs53; 2026-02-21T09:21:01.7767870Z shr.s16 %rs55, %rs54, 4; 2026-02-21T09:21:01.7767941Z selp.b16 %rs56, %rs52, %rs50, %p63; 2026-02-21T09:21:01.7768010Z cvt.s16.s8 %rs57, %rs56; 2026-02-21T09:21:01.7768070Z shr.s16 %rs58, %rs57, 4; 2026-02-21T09:21:01.7768274Z .loc 1 80 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:80:32 2026-02-21T09:21:01.7768355Z cvt.rn.f32.s16 %r2275, %rs55; 2026-02-21T09:21:01.7768427Z cvt.rn.f32.s16 %r2276, %rs58; 2026-02-21T09:21:01.7768486Z bar.sync 0; 2026-02-21T09:21:01.7768551Z st.shared.b32 [%r35], %r2275; 2026-02-21T09:21:01.7768623Z st.shared.b32 [%r36], %r2276; 2026-02-21T09:21:01.7768769Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2127}; 2026-02-21T09:21:01.7768826Z bar.sync 0; 2026-02-21T09:21:01.7769043Z // begin inline asm 2026-02-21T09:21:01.7769202Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1657, %r1793}, [%r590]; 2026-02-21T09:21:01.7769262Z // end inline asm 2026-02-21T09:21:01.7769318Z bar.sync 0; 2026-02-21T09:21:01.7769460Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2129}; 2026-02-21T09:21:01.7769517Z bar.sync 0; 2026-02-21T09:21:01.7769576Z // begin inline asm 2026-02-21T09:21:01.7769733Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1659, %r1795}, [%r590]; 2026-02-21T09:21:01.7769790Z // end inline asm 2026-02-21T09:21:01.7769846Z bar.sync 0; 2026-02-21T09:21:01.7769976Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2128}; 2026-02-21T09:21:01.7770037Z bar.sync 0; 2026-02-21T09:21:01.7770096Z // begin inline asm 2026-02-21T09:21:01.7770310Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1658, 
%r1794}, [%r590]; 2026-02-21T09:21:01.7770378Z // end inline asm 2026-02-21T09:21:01.7770446Z bar.sync 0; 2026-02-21T09:21:01.7770582Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2130}; 2026-02-21T09:21:01.7770641Z bar.sync 0; 2026-02-21T09:21:01.7770710Z // begin inline asm 2026-02-21T09:21:01.7770856Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1660, %r1796}, [%r590]; 2026-02-21T09:21:01.7770975Z // end inline asm 2026-02-21T09:21:01.7771040Z bar.sync 0; 2026-02-21T09:21:01.7771167Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2131}; 2026-02-21T09:21:01.7771222Z bar.sync 0; 2026-02-21T09:21:01.7771287Z // begin inline asm 2026-02-21T09:21:01.7771432Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1661, %r1797}, [%r590]; 2026-02-21T09:21:01.7771489Z // end inline asm 2026-02-21T09:21:01.7771545Z bar.sync 0; 2026-02-21T09:21:01.7771689Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2133}; 2026-02-21T09:21:01.7771749Z bar.sync 0; 2026-02-21T09:21:01.7771810Z // begin inline asm 2026-02-21T09:21:01.7771958Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1663, %r1799}, [%r590]; 2026-02-21T09:21:01.7772015Z // end inline asm 2026-02-21T09:21:01.7772072Z bar.sync 0; 2026-02-21T09:21:01.7772201Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2132}; 2026-02-21T09:21:01.7772265Z bar.sync 0; 2026-02-21T09:21:01.7772323Z // begin inline asm 2026-02-21T09:21:01.7772472Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1662, %r1798}, [%r590]; 2026-02-21T09:21:01.7772542Z // end inline asm 2026-02-21T09:21:01.7772601Z bar.sync 0; 2026-02-21T09:21:01.7772733Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2134}; 2026-02-21T09:21:01.7772788Z bar.sync 0; 2026-02-21T09:21:01.7772853Z // begin inline asm 2026-02-21T09:21:01.7772994Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1664, %r1800}, [%r590]; 2026-02-21T09:21:01.7773052Z // end inline asm 2026-02-21T09:21:01.7773110Z bar.sync 0; 2026-02-21T09:21:01.7773238Z 
stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2135}; 2026-02-21T09:21:01.7773295Z bar.sync 0; 2026-02-21T09:21:01.7773359Z // begin inline asm 2026-02-21T09:21:01.7773504Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1665, %r1801}, [%r590]; 2026-02-21T09:21:01.7773564Z // end inline asm 2026-02-21T09:21:01.7773619Z bar.sync 0; 2026-02-21T09:21:01.7773750Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2137}; 2026-02-21T09:21:01.7773806Z bar.sync 0; 2026-02-21T09:21:01.7773876Z // begin inline asm 2026-02-21T09:21:01.7774027Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1667, %r1803}, [%r590]; 2026-02-21T09:21:01.7774085Z // end inline asm 2026-02-21T09:21:01.7774141Z bar.sync 0; 2026-02-21T09:21:01.7774269Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2136}; 2026-02-21T09:21:01.7774332Z bar.sync 0; 2026-02-21T09:21:01.7774391Z // begin inline asm 2026-02-21T09:21:01.7774535Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1666, %r1802}, [%r590]; 2026-02-21T09:21:01.7774598Z // end inline asm 2026-02-21T09:21:01.7774655Z bar.sync 0; 2026-02-21T09:21:01.7774784Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2138}; 2026-02-21T09:21:01.7774902Z bar.sync 0; 2026-02-21T09:21:01.7775039Z // begin inline asm 2026-02-21T09:21:01.7775182Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1668, %r1804}, [%r590]; 2026-02-21T09:21:01.7775240Z // end inline asm 2026-02-21T09:21:01.7775305Z bar.sync 0; 2026-02-21T09:21:01.7775433Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2139}; 2026-02-21T09:21:01.7775490Z bar.sync 0; 2026-02-21T09:21:01.7775555Z // begin inline asm 2026-02-21T09:21:01.7775698Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1669, %r1805}, [%r590]; 2026-02-21T09:21:01.7775754Z // end inline asm 2026-02-21T09:21:01.7775810Z bar.sync 0; 2026-02-21T09:21:01.7775944Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2141}; 2026-02-21T09:21:01.7776000Z bar.sync 0; 2026-02-21T09:21:01.7776060Z // begin inline asm 
2026-02-21T09:21:01.7776260Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1671, %r1807}, [%r590]; 2026-02-21T09:21:01.7776320Z // end inline asm 2026-02-21T09:21:01.7776375Z bar.sync 0; 2026-02-21T09:21:01.7776631Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2140}; 2026-02-21T09:21:01.7776707Z bar.sync 0; 2026-02-21T09:21:01.7776772Z // begin inline asm 2026-02-21T09:21:01.7776915Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1670, %r1806}, [%r590]; 2026-02-21T09:21:01.7777060Z // end inline asm 2026-02-21T09:21:01.7777119Z bar.sync 0; 2026-02-21T09:21:01.7777247Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2142}; 2026-02-21T09:21:01.7777303Z bar.sync 0; 2026-02-21T09:21:01.7777369Z // begin inline asm 2026-02-21T09:21:01.7777511Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1672, %r1808}, [%r590]; 2026-02-21T09:21:01.7777571Z // end inline asm 2026-02-21T09:21:01.7777644Z bar.sync 0; 2026-02-21T09:21:01.7777778Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2143}; 2026-02-21T09:21:01.7777835Z bar.sync 0; 2026-02-21T09:21:01.7777899Z // begin inline asm 2026-02-21T09:21:01.7778043Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1673, %r1809}, [%r590]; 2026-02-21T09:21:01.7778103Z // end inline asm 2026-02-21T09:21:01.7778159Z bar.sync 0; 2026-02-21T09:21:01.7778296Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2145}; 2026-02-21T09:21:01.7778352Z bar.sync 0; 2026-02-21T09:21:01.7778413Z // begin inline asm 2026-02-21T09:21:01.7778566Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1675, %r1811}, [%r590]; 2026-02-21T09:21:01.7778624Z // end inline asm 2026-02-21T09:21:01.7778679Z bar.sync 0; 2026-02-21T09:21:01.7778805Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2144}; 2026-02-21T09:21:01.7778866Z bar.sync 0; 2026-02-21T09:21:01.7778923Z // begin inline asm 2026-02-21T09:21:01.7779065Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1674, %r1810}, [%r590]; 2026-02-21T09:21:01.7779128Z // end inline asm 2026-02-21T09:21:01.7779184Z 
bar.sync 0; 2026-02-21T09:21:01.7779312Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2146}; 2026-02-21T09:21:01.7779369Z bar.sync 0; 2026-02-21T09:21:01.7779434Z // begin inline asm 2026-02-21T09:21:01.7779579Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1676, %r1812}, [%r590]; 2026-02-21T09:21:01.7779639Z // end inline asm 2026-02-21T09:21:01.7779698Z bar.sync 0; 2026-02-21T09:21:01.7779825Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2147}; 2026-02-21T09:21:01.7779882Z bar.sync 0; 2026-02-21T09:21:01.7779948Z // begin inline asm 2026-02-21T09:21:01.7780092Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1677, %r1813}, [%r590]; 2026-02-21T09:21:01.7780153Z // end inline asm 2026-02-21T09:21:01.7780209Z bar.sync 0; 2026-02-21T09:21:01.7780343Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2149}; 2026-02-21T09:21:01.7780403Z bar.sync 0; 2026-02-21T09:21:01.7780462Z // begin inline asm 2026-02-21T09:21:01.7780612Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1679, %r1815}, [%r590]; 2026-02-21T09:21:01.7780675Z // end inline asm 2026-02-21T09:21:01.7780729Z bar.sync 0; 2026-02-21T09:21:01.7780857Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2148}; 2026-02-21T09:21:01.7781053Z bar.sync 0; 2026-02-21T09:21:01.7781113Z // begin inline asm 2026-02-21T09:21:01.7781258Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1678, %r1814}, [%r590]; 2026-02-21T09:21:01.7781319Z // end inline asm 2026-02-21T09:21:01.7781375Z bar.sync 0; 2026-02-21T09:21:01.7781507Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2150}; 2026-02-21T09:21:01.7781562Z bar.sync 0; 2026-02-21T09:21:01.7781627Z // begin inline asm 2026-02-21T09:21:01.7781773Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1680, %r1816}, [%r590]; 2026-02-21T09:21:01.7781830Z // end inline asm 2026-02-21T09:21:01.7781892Z bar.sync 0; 2026-02-21T09:21:01.7782018Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2151}; 2026-02-21T09:21:01.7782075Z bar.sync 0; 2026-02-21T09:21:01.7782143Z // 
begin inline asm 2026-02-21T09:21:01.7782354Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1681, %r1817}, [%r590]; 2026-02-21T09:21:01.7782415Z // end inline asm 2026-02-21T09:21:01.7782470Z bar.sync 0; 2026-02-21T09:21:01.7782605Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2153}; 2026-02-21T09:21:01.7782670Z bar.sync 0; 2026-02-21T09:21:01.7782731Z // begin inline asm 2026-02-21T09:21:01.7782880Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1683, %r1819}, [%r590]; 2026-02-21T09:21:01.7782985Z // end inline asm 2026-02-21T09:21:01.7783042Z bar.sync 0; 2026-02-21T09:21:01.7783171Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2152}; 2026-02-21T09:21:01.7783233Z bar.sync 0; 2026-02-21T09:21:01.7783292Z // begin inline asm 2026-02-21T09:21:01.7783439Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1682, %r1818}, [%r590]; 2026-02-21T09:21:01.7783501Z // end inline asm 2026-02-21T09:21:01.7783557Z bar.sync 0; 2026-02-21T09:21:01.7783689Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2154}; 2026-02-21T09:21:01.7783744Z bar.sync 0; 2026-02-21T09:21:01.7783806Z // begin inline asm 2026-02-21T09:21:01.7783948Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1684, %r1820}, [%r590]; 2026-02-21T09:21:01.7784006Z // end inline asm 2026-02-21T09:21:01.7784062Z bar.sync 0; 2026-02-21T09:21:01.7784190Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2155}; 2026-02-21T09:21:01.7784245Z bar.sync 0; 2026-02-21T09:21:01.7784309Z // begin inline asm 2026-02-21T09:21:01.7784457Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1685, %r1821}, [%r590]; 2026-02-21T09:21:01.7784515Z // end inline asm 2026-02-21T09:21:01.7784571Z bar.sync 0; 2026-02-21T09:21:01.7784705Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2157}; 2026-02-21T09:21:01.7784761Z bar.sync 0; 2026-02-21T09:21:01.7784831Z // begin inline asm 2026-02-21T09:21:01.7784982Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1687, %r1823}, [%r590]; 2026-02-21T09:21:01.7785042Z // end inline asm 
2026-02-21T09:21:01.7785097Z bar.sync 0; 2026-02-21T09:21:01.7785227Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2156}; 2026-02-21T09:21:01.7785290Z bar.sync 0; 2026-02-21T09:21:01.7785351Z // begin inline asm 2026-02-21T09:21:01.7785498Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1686, %r1822}, [%r590]; 2026-02-21T09:21:01.7785559Z // end inline asm 2026-02-21T09:21:01.7785613Z bar.sync 0; 2026-02-21T09:21:01.7785739Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r2158}; 2026-02-21T09:21:01.7785795Z bar.sync 0; 2026-02-21T09:21:01.7785861Z // begin inline asm 2026-02-21T09:21:01.7786003Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r1688, %r1824}, [%r590]; 2026-02-21T09:21:01.7786060Z // end inline asm 2026-02-21T09:21:01.7786120Z $L__tmp5: 2026-02-21T09:21:01.7786408Z .loc 2 291 36 // standard.py:291:36 @[ ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:87:40 ] 2026-02-21T09:21:01.7786590Z // begin inline asm 2026-02-21T09:21:01.7786677Z fence.proxy.async.shared::cta; 2026-02-21T09:21:01.7786736Z // end inline asm 2026-02-21T09:21:01.7786820Z shfl.sync.idx.b32 %r2277, %r6, 0, 31, -1; 2026-02-21T09:21:01.7786893Z wgmma.fence.sync.aligned; 2026-02-21T09:21:01.7786966Z mov.pred %p14, -1; 2026-02-21T09:21:01.7787166Z // begin inline asm 2026-02-21T09:21:01.7787949Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1657,%r1658,%r1659,%r1660,%r1661,%r1662,%r1663,%r1664,%r1665,%r1666,%r1667,%r1668,%r1669,%r1670,%r1671,%r1672,%r1673,%r1674,%r1675,%r1676,%r1677,%r1678,%r1679,%r1680,%r1681,%r1682,%r1683,%r1684,%r1685,%r1686,%r1687,%r1688}, {%r1789,%r1790,%r1791,%r1792}, %rd1, %p14, 1, 1; 2026-02-21T09:21:01.7788018Z // end inline asm 2026-02-21T09:21:01.7788079Z // begin inline asm 2026-02-21T09:21:01.7788966Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r1657,%r1658,%r1659,%r1660,%r1661,%r1662,%r1663,%r1664,%r1665,%r1666,%r1667,%r1668,%r1669,%r1670,%r1671,%r1672,%r1673,%r1674,%r1675,%r1676,%r1677,%r1678,%r1679,%r1680,%r1681,%r1682,%r1683,%r1684,%r1685,%r1686,%r1687,%r1688}, {%r1857,%r1858,%r1859,%r1860}, %rd2, %p14, 1, 1; 2026-02-21T09:21:01.7789035Z // end inline asm 2026-02-21T09:21:01.7789093Z // begin inline asm 2026-02-21T09:21:01.7790554Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1793,%r1794,%r1795,%r1796,%r1797,%r1798,%r1799,%r1800,%r1801,%r1802,%r1803,%r1804,%r1805,%r1806,%r1807,%r1808,%r1809,%r1810,%r1811,%r1812,%r1813,%r1814,%r1815,%r1816,%r1817,%r1818,%r1819,%r1820,%r1821,%r1822,%r1823,%r1824}, {%r1789,%r1790,%r1791,%r1792}, %rd3, %p14, 1, 1; 2026-02-21T09:21:01.7790627Z // end inline asm 2026-02-21T09:21:01.7790687Z // begin inline asm 2026-02-21T09:21:01.7791436Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1793,%r1794,%r1795,%r1796,%r1797,%r1798,%r1799,%r1800,%r1801,%r1802,%r1803,%r1804,%r1805,%r1806,%r1807,%r1808,%r1809,%r1810,%r1811,%r1812,%r1813,%r1814,%r1815,%r1816,%r1817,%r1818,%r1819,%r1820,%r1821,%r1822,%r1823,%r1824}, {%r1857,%r1858,%r1859,%r1860}, %rd4, %p14, 1, 1; 2026-02-21T09:21:01.7791495Z // end inline asm 2026-02-21T09:21:01.7791576Z wgmma.commit_group.sync.aligned; 2026-02-21T09:21:01.7791634Z mov.b32 %r2229, 0; 2026-02-21T09:21:01.7791701Z mov.b32 %r1925, %r491; 2026-02-21T09:21:01.7791762Z mov.b32 %r1926, %r2229; 2026-02-21T09:21:01.7791824Z mov.b32 %r1927, %r2229; 2026-02-21T09:21:01.7791887Z // begin inline asm 2026-02-21T09:21:01.7792961Z // wait for regs: 
%r1657,%r1658,%r1659,%r1660,%r1661,%r1662,%r1663,%r1664,%r1665,%r1666,%r1667,%r1668,%r1669,%r1670,%r1671,%r1672,%r1673,%r1674,%r1675,%r1676,%r1677,%r1678,%r1679,%r1680,%r1681,%r1682,%r1683,%r1684,%r1685,%r1686,%r1687,%r1688,%r1793,%r1794,%r1795,%r1796,%r1797,%r1798,%r1799,%r1800,%r1801,%r1802,%r1803,%r1804,%r1805,%r1806,%r1807,%r1808,%r1809,%r1810,%r1811,%r1812,%r1813,%r1814,%r1815,%r1816,%r1817,%r1818,%r1819,%r1820,%r1821,%r1822,%r1823,%r1824,%r1925,%r1926,%r1927 2026-02-21T09:21:01.7793048Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:21:01.7793111Z // end inline asm 2026-02-21T09:21:01.7793165Z $L__tmp6: 2026-02-21T09:21:01.7793399Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7793473Z add.s32 %r2278, %r2272, 40960; 2026-02-21T09:21:01.7793688Z .loc 1 55 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:55:32 2026-02-21T09:21:01.7793756Z add.s32 %r2279, %r2278, %r31; 2026-02-21T09:21:01.7793823Z ld.shared.b16 %rs59, [%r2279]; 2026-02-21T09:21:01.7793898Z ld.shared.b16 %rs60, [%r2279+256]; 2026-02-21T09:21:01.7793966Z ld.shared.b16 %rs61, [%r2279+16]; 2026-02-21T09:21:01.7794032Z ld.shared.b16 %rs62, [%r2279+272]; 2026-02-21T09:21:01.7794101Z add.s32 %r2280, %r2278, %r32; 2026-02-21T09:21:01.7794169Z ld.shared.b16 %rs63, [%r2280]; 2026-02-21T09:21:01.7794233Z ld.shared.b16 %rs64, [%r2280+256]; 2026-02-21T09:21:01.7794302Z ld.shared.b16 %rs65, [%r2280+16]; 2026-02-21T09:21:01.7794366Z ld.shared.b16 %rs66, [%r2280+272]; 2026-02-21T09:21:01.7794433Z cvt.f32.bf16 %r2123, %rs59; 2026-02-21T09:21:01.7794498Z cvt.f32.bf16 %r2124, %rs60; 2026-02-21T09:21:01.7794564Z cvt.f32.bf16 %r2125, %rs63; 2026-02-21T09:21:01.7794626Z cvt.f32.bf16 %r2126, %rs64; 2026-02-21T09:21:01.7794687Z cvt.f32.bf16 %r2191, %rs61; 2026-02-21T09:21:01.7794860Z cvt.f32.bf16 %r2192, %rs62; 2026-02-21T09:21:01.7794920Z cvt.f32.bf16 %r2193, %rs65; 2026-02-21T09:21:01.7794981Z cvt.f32.bf16 %r2194, %rs66; 2026-02-21T09:21:01.7795189Z 
.loc 1 57 34 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:57:34 2026-02-21T09:21:01.7795257Z add.s32 %r2281, %r5208, 65536; 2026-02-21T09:21:01.7795332Z cvt.s64.s32 %rd101, %r2281; 2026-02-21T09:21:01.7795397Z add.s64 %rd94, %rd28, %rd101; 2026-02-21T09:21:01.7795604Z .loc 1 57 87 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:57:87 2026-02-21T09:21:01.7795666Z // begin inline asm 2026-02-21T09:21:01.7795725Z mov.u64 %rd93, 0x0; 2026-02-21T09:21:01.7795859Z createpolicy.fractional.L2::evict_first.b64 %rd93, 1.0; 2026-02-21T09:21:01.7795972Z // end inline asm 2026-02-21T09:21:01.7796033Z // begin inline asm 2026-02-21T09:21:01.7796092Z mov.u16 %rs40, 0x0; 2026-02-21T09:21:01.7796257Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs40 }, [ %rd94 + 0 ], %rd93; 2026-02-21T09:21:01.7796320Z // end inline asm 2026-02-21T09:21:01.7796657Z .loc 1 65 28 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:65:28 2026-02-21T09:21:01.7796722Z bar.sync 0; 2026-02-21T09:21:01.7796871Z st.shared.b8 [%r33], %rs40; 2026-02-21T09:21:01.7796932Z bar.sync 0; 2026-02-21T09:21:01.7797016Z ld.shared.v2.b8 {%rs67, %rs68}, [%r34]; 2026-02-21T09:21:01.7797219Z .loc 1 60 28 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:60:28 2026-02-21T09:21:01.7797282Z shl.b16 %rs69, %rs67, 4; 2026-02-21T09:21:01.7797344Z shl.b16 %rs70, %rs68, 4; 2026-02-21T09:21:01.7797546Z .loc 1 75 58 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:75:58 2026-02-21T09:21:01.7797618Z selp.b16 %rs71, %rs69, %rs67, %p63; 2026-02-21T09:21:01.7797680Z cvt.s16.s8 %rs72, %rs71; 2026-02-21T09:21:01.7797745Z shr.s16 %rs73, %rs72, 4; 2026-02-21T09:21:01.7797823Z selp.b16 %rs74, %rs70, %rs68, %p63; 2026-02-21T09:21:01.7797890Z cvt.s16.s8 %rs75, %rs74; 2026-02-21T09:21:01.7797955Z shr.s16 %rs76, %rs75, 4; 2026-02-21T09:21:01.7798156Z .loc 1 80 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:80:32 2026-02-21T09:21:01.7798223Z cvt.rn.f32.s16 %r2282, %rs73; 
2026-02-21T09:21:01.7798288Z cvt.rn.f32.s16 %r2283, %rs76; 2026-02-21T09:21:01.7798352Z bar.sync 0; 2026-02-21T09:21:01.7798416Z st.shared.b32 [%r35], %r2282; 2026-02-21T09:21:01.7798479Z st.shared.b32 [%r36], %r2283; 2026-02-21T09:21:01.7798537Z $L__tmp7: 2026-02-21T09:21:01.7798826Z .loc 2 291 36 // standard.py:291:36 @[ ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:87:40 ] 2026-02-21T09:21:01.7798987Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1657, %r1793}; 2026-02-21T09:21:01.7799045Z bar.sync 0; 2026-02-21T09:21:01.7799108Z // begin inline asm 2026-02-21T09:21:01.7799247Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2127}, [%r1091]; 2026-02-21T09:21:01.7799308Z // end inline asm 2026-02-21T09:21:01.7799367Z bar.sync 0; 2026-02-21T09:21:01.7799516Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1659, %r1795}; 2026-02-21T09:21:01.7799573Z bar.sync 0; 2026-02-21T09:21:01.7799636Z // begin inline asm 2026-02-21T09:21:01.7799767Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2129}, [%r1091]; 2026-02-21T09:21:01.7799826Z // end inline asm 2026-02-21T09:21:01.7799881Z bar.sync 0; 2026-02-21T09:21:01.7800030Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1658, %r1794}; 2026-02-21T09:21:01.7800086Z bar.sync 0; 2026-02-21T09:21:01.7800146Z // begin inline asm 2026-02-21T09:21:01.7800279Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2128}, [%r1091]; 2026-02-21T09:21:01.7800335Z // end inline asm 2026-02-21T09:21:01.7800392Z bar.sync 0; 2026-02-21T09:21:01.7800534Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1660, %r1796}; 2026-02-21T09:21:01.7800595Z bar.sync 0; 2026-02-21T09:21:01.7800771Z // begin inline asm 2026-02-21T09:21:01.7800967Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2130}, [%r1091]; 2026-02-21T09:21:01.7801023Z // end inline asm 2026-02-21T09:21:01.7801078Z bar.sync 0; 2026-02-21T09:21:01.7801227Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1661, %r1797}; 2026-02-21T09:21:01.7801282Z bar.sync 0; 
2026-02-21T09:21:01.7801340Z // begin inline asm 2026-02-21T09:21:01.7801468Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2131}, [%r1091]; 2026-02-21T09:21:01.7801527Z // end inline asm 2026-02-21T09:21:01.7801581Z bar.sync 0; 2026-02-21T09:21:01.7801723Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1663, %r1799}; 2026-02-21T09:21:01.7801780Z bar.sync 0; 2026-02-21T09:21:01.7801839Z // begin inline asm 2026-02-21T09:21:01.7802030Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2133}, [%r1091]; 2026-02-21T09:21:01.7802089Z // end inline asm 2026-02-21T09:21:01.7802148Z bar.sync 0; 2026-02-21T09:21:01.7802289Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1662, %r1798}; 2026-02-21T09:21:01.7802349Z bar.sync 0; 2026-02-21T09:21:01.7802410Z // begin inline asm 2026-02-21T09:21:01.7802535Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2132}, [%r1091]; 2026-02-21T09:21:01.7802599Z // end inline asm 2026-02-21T09:21:01.7802656Z bar.sync 0; 2026-02-21T09:21:01.7802859Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1664, %r1800}; 2026-02-21T09:21:01.7802917Z bar.sync 0; 2026-02-21T09:21:01.7802976Z // begin inline asm 2026-02-21T09:21:01.7803106Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2134}, [%r1091]; 2026-02-21T09:21:01.7803162Z // end inline asm 2026-02-21T09:21:01.7803217Z bar.sync 0; 2026-02-21T09:21:01.7803361Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1665, %r1801}; 2026-02-21T09:21:01.7803415Z bar.sync 0; 2026-02-21T09:21:01.7803474Z // begin inline asm 2026-02-21T09:21:01.7803600Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2135}, [%r1091]; 2026-02-21T09:21:01.7803661Z // end inline asm 2026-02-21T09:21:01.7803718Z bar.sync 0; 2026-02-21T09:21:01.7803861Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1667, %r1803}; 2026-02-21T09:21:01.7803920Z bar.sync 0; 2026-02-21T09:21:01.7803978Z // begin inline asm 2026-02-21T09:21:01.7804104Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2137}, [%r1091]; 2026-02-21T09:21:01.7804161Z // end 
inline asm 2026-02-21T09:21:01.7804218Z bar.sync 0; 2026-02-21T09:21:01.7804360Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1666, %r1802}; 2026-02-21T09:21:01.7804415Z bar.sync 0; 2026-02-21T09:21:01.7804476Z // begin inline asm 2026-02-21T09:21:01.7804601Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2136}, [%r1091]; 2026-02-21T09:21:01.7804657Z // end inline asm 2026-02-21T09:21:01.7804714Z bar.sync 0; 2026-02-21T09:21:01.7804856Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1668, %r1804}; 2026-02-21T09:21:01.7804910Z bar.sync 0; 2026-02-21T09:21:01.7804967Z // begin inline asm 2026-02-21T09:21:01.7805097Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2138}, [%r1091]; 2026-02-21T09:21:01.7805156Z // end inline asm 2026-02-21T09:21:01.7805210Z bar.sync 0; 2026-02-21T09:21:01.7805356Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1669, %r1805}; 2026-02-21T09:21:01.7805410Z bar.sync 0; 2026-02-21T09:21:01.7805469Z // begin inline asm 2026-02-21T09:21:01.7805594Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2139}, [%r1091]; 2026-02-21T09:21:01.7805653Z // end inline asm 2026-02-21T09:21:01.7805708Z bar.sync 0; 2026-02-21T09:21:01.7805852Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1671, %r1807}; 2026-02-21T09:21:01.7805912Z bar.sync 0; 2026-02-21T09:21:01.7805971Z // begin inline asm 2026-02-21T09:21:01.7806097Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2141}, [%r1091]; 2026-02-21T09:21:01.7806156Z // end inline asm 2026-02-21T09:21:01.7806215Z bar.sync 0; 2026-02-21T09:21:01.7806358Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1670, %r1806}; 2026-02-21T09:21:01.7806414Z bar.sync 0; 2026-02-21T09:21:01.7806696Z // begin inline asm 2026-02-21T09:21:01.7806921Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2140}, [%r1091]; 2026-02-21T09:21:01.7806978Z // end inline asm 2026-02-21T09:21:01.7807036Z bar.sync 0; 2026-02-21T09:21:01.7807189Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1672, %r1808}; 
2026-02-21T09:21:01.7807244Z bar.sync 0; 2026-02-21T09:21:01.7807301Z // begin inline asm 2026-02-21T09:21:01.7807435Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2142}, [%r1091]; 2026-02-21T09:21:01.7807490Z // end inline asm 2026-02-21T09:21:01.7807544Z bar.sync 0; 2026-02-21T09:21:01.7807691Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1673, %r1809}; 2026-02-21T09:21:01.7807744Z bar.sync 0; 2026-02-21T09:21:01.7807803Z // begin inline asm 2026-02-21T09:21:01.7807997Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2143}, [%r1091]; 2026-02-21T09:21:01.7808059Z // end inline asm 2026-02-21T09:21:01.7808112Z bar.sync 0; 2026-02-21T09:21:01.7808252Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1675, %r1811}; 2026-02-21T09:21:01.7808313Z bar.sync 0; 2026-02-21T09:21:01.7808371Z // begin inline asm 2026-02-21T09:21:01.7808495Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2145}, [%r1091]; 2026-02-21T09:21:01.7808551Z // end inline asm 2026-02-21T09:21:01.7808673Z bar.sync 0; 2026-02-21T09:21:01.7808817Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1674, %r1810}; 2026-02-21T09:21:01.7808870Z bar.sync 0; 2026-02-21T09:21:01.7808929Z // begin inline asm 2026-02-21T09:21:01.7809054Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2144}, [%r1091]; 2026-02-21T09:21:01.7809111Z // end inline asm 2026-02-21T09:21:01.7809179Z bar.sync 0; 2026-02-21T09:21:01.7809323Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1676, %r1812}; 2026-02-21T09:21:01.7809379Z bar.sync 0; 2026-02-21T09:21:01.7809439Z // begin inline asm 2026-02-21T09:21:01.7809569Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2146}, [%r1091]; 2026-02-21T09:21:01.7809624Z // end inline asm 2026-02-21T09:21:01.7809680Z bar.sync 0; 2026-02-21T09:21:01.7809824Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1677, %r1813}; 2026-02-21T09:21:01.7809876Z bar.sync 0; 2026-02-21T09:21:01.7809933Z // begin inline asm 2026-02-21T09:21:01.7810059Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2147}, 
[%r1091]; 2026-02-21T09:21:01.7810118Z // end inline asm 2026-02-21T09:21:01.7810170Z bar.sync 0; 2026-02-21T09:21:01.7810311Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1679, %r1815}; 2026-02-21T09:21:01.7810368Z bar.sync 0; 2026-02-21T09:21:01.7810425Z // begin inline asm 2026-02-21T09:21:01.7810549Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2149}, [%r1091]; 2026-02-21T09:21:01.7810604Z // end inline asm 2026-02-21T09:21:01.7810661Z bar.sync 0; 2026-02-21T09:21:01.7810804Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1678, %r1814}; 2026-02-21T09:21:01.7810856Z bar.sync 0; 2026-02-21T09:21:01.7810916Z // begin inline asm 2026-02-21T09:21:01.7811039Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2148}, [%r1091]; 2026-02-21T09:21:01.7811098Z // end inline asm 2026-02-21T09:21:01.7811154Z bar.sync 0; 2026-02-21T09:21:01.7811298Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1680, %r1816}; 2026-02-21T09:21:01.7811352Z bar.sync 0; 2026-02-21T09:21:01.7811412Z // begin inline asm 2026-02-21T09:21:01.7811546Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2150}, [%r1091]; 2026-02-21T09:21:01.7811601Z // end inline asm 2026-02-21T09:21:01.7811655Z bar.sync 0; 2026-02-21T09:21:01.7811804Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1681, %r1817}; 2026-02-21T09:21:01.7811858Z bar.sync 0; 2026-02-21T09:21:01.7811916Z // begin inline asm 2026-02-21T09:21:01.7812045Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2151}, [%r1091]; 2026-02-21T09:21:01.7812102Z // end inline asm 2026-02-21T09:21:01.7812156Z bar.sync 0; 2026-02-21T09:21:01.7812300Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1683, %r1819}; 2026-02-21T09:21:01.7812357Z bar.sync 0; 2026-02-21T09:21:01.7812535Z // begin inline asm 2026-02-21T09:21:01.7812663Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2153}, [%r1091]; 2026-02-21T09:21:01.7812718Z // end inline asm 2026-02-21T09:21:01.7812776Z bar.sync 0; 2026-02-21T09:21:01.7812918Z stmatrix.sync.aligned.m8n8.x2.shared.b16 
[%r590], {%r1682, %r1818}; 2026-02-21T09:21:01.7812979Z bar.sync 0; 2026-02-21T09:21:01.7813040Z // begin inline asm 2026-02-21T09:21:01.7813165Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2152}, [%r1091]; 2026-02-21T09:21:01.7813220Z // end inline asm 2026-02-21T09:21:01.7813275Z bar.sync 0; 2026-02-21T09:21:01.7813415Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1684, %r1820}; 2026-02-21T09:21:01.7813467Z bar.sync 0; 2026-02-21T09:21:01.7813524Z // begin inline asm 2026-02-21T09:21:01.7813702Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2154}, [%r1091]; 2026-02-21T09:21:01.7813760Z // end inline asm 2026-02-21T09:21:01.7813812Z bar.sync 0; 2026-02-21T09:21:01.7813956Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1685, %r1821}; 2026-02-21T09:21:01.7814013Z bar.sync 0; 2026-02-21T09:21:01.7814081Z // begin inline asm 2026-02-21T09:21:01.7814209Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2155}, [%r1091]; 2026-02-21T09:21:01.7814267Z // end inline asm 2026-02-21T09:21:01.7814369Z bar.sync 0; 2026-02-21T09:21:01.7814517Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1687, %r1823}; 2026-02-21T09:21:01.7814572Z bar.sync 0; 2026-02-21T09:21:01.7814629Z // begin inline asm 2026-02-21T09:21:01.7814753Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2157}, [%r1091]; 2026-02-21T09:21:01.7814808Z // end inline asm 2026-02-21T09:21:01.7814869Z bar.sync 0; 2026-02-21T09:21:01.7815013Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1686, %r1822}; 2026-02-21T09:21:01.7815066Z bar.sync 0; 2026-02-21T09:21:01.7815129Z // begin inline asm 2026-02-21T09:21:01.7815256Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2156}, [%r1091]; 2026-02-21T09:21:01.7815310Z // end inline asm 2026-02-21T09:21:01.7815370Z bar.sync 0; 2026-02-21T09:21:01.7815511Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r1688, %r1824}; 2026-02-21T09:21:01.7815564Z bar.sync 0; 2026-02-21T09:21:01.7815621Z // begin inline asm 2026-02-21T09:21:01.7815751Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2158}, [%r1091]; 2026-02-21T09:21:01.7815805Z // end inline asm 2026-02-21T09:21:01.7815863Z // begin inline asm 2026-02-21T09:21:01.7815945Z fence.proxy.async.shared::cta; 2026-02-21T09:21:01.7816000Z // end inline asm 2026-02-21T09:21:01.7816073Z wgmma.fence.sync.aligned; 2026-02-21T09:21:01.7816135Z shl.b32 %r2284, %r2277, 8; 2026-02-21T09:21:01.7816202Z and.b32 %r2285, %r2284, 4096; 2026-02-21T09:21:01.7816264Z add.s32 %r2286, %r2285, %r491; 2026-02-21T09:21:01.7816324Z bfe.u32 %r2287, %r2286, 4, 14; 2026-02-21T09:21:01.7816397Z cvt.u64.u32 %rd102, %r2287; 2026-02-21T09:21:01.7816601Z or.b64 %rd96, %rd102, -9223371899382267904; 2026-02-21T09:21:01.7816667Z // begin inline asm 2026-02-21T09:21:01.7817442Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2127,%r2128,%r2129,%r2130,%r2131,%r2132,%r2133,%r2134,%r2135,%r2136,%r2137,%r2138,%r2139,%r2140,%r2141,%r2142,%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150,%r2151,%r2152,%r2153,%r2154,%r2155,%r2156,%r2157,%r2158}, {%r2123,%r2124,%r2125,%r2126}, %rd96, %p14, 1, 1; 2026-02-21T09:21:01.7817497Z // end inline asm 2026-02-21T09:21:01.7817556Z add.s32 %r2288, %r2286, 32; 2026-02-21T09:21:01.7817616Z bfe.u32 %r2289, %r2288, 4, 14; 2026-02-21T09:21:01.7817681Z cvt.u64.u32 %rd103, %r2289; 2026-02-21T09:21:01.7817755Z or.b64 %rd97, %rd103, -9223371899382267904; 2026-02-21T09:21:01.7817812Z // begin inline asm 2026-02-21T09:21:01.7818574Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2127,%r2128,%r2129,%r2130,%r2131,%r2132,%r2133,%r2134,%r2135,%r2136,%r2137,%r2138,%r2139,%r2140,%r2141,%r2142,%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150,%r2151,%r2152,%r2153,%r2154,%r2155,%r2156,%r2157,%r2158}, {%r2191,%r2192,%r2193,%r2194}, %rd97, %p14, 1, 1; 2026-02-21T09:21:01.7818770Z // end inline asm 2026-02-21T09:21:01.7818847Z wgmma.commit_group.sync.aligned; 2026-02-21T09:21:01.7818911Z mov.b32 %r2228, %r2229; 2026-02-21T09:21:01.7818970Z mov.b32 %r2227, 
%r491; 2026-02-21T09:21:01.7819028Z // begin inline asm 2026-02-21T09:21:01.7819590Z // wait for regs: %r2127,%r2128,%r2129,%r2130,%r2131,%r2132,%r2133,%r2134,%r2135,%r2136,%r2137,%r2138,%r2139,%r2140,%r2141,%r2142,%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150,%r2151,%r2152,%r2153,%r2154,%r2155,%r2156,%r2157,%r2158,%r2227,%r2228,%r2229 2026-02-21T09:21:01.7819666Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:21:01.7819721Z // end inline asm 2026-02-21T09:21:01.7819790Z $L__tmp8: 2026-02-21T09:21:01.7820073Z .loc 1 43 78 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:43:78 2026-02-21T09:21:01.7820136Z add.s32 %r2290, %r5211, 1; 2026-02-21T09:21:01.7820201Z setp.gt.s32 %p23, %r2290, 4; 2026-02-21T09:21:01.7820274Z selp.b32 %r5211, 0, %r2290, %p23; 2026-02-21T09:21:01.7820481Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7820544Z add.s32 %r2291, %r5209, -16; 2026-02-21T09:21:01.7820617Z mad.wide.s32 %rd98, %r2291, 2, %rd27; 2026-02-21T09:21:01.7820894Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7820958Z shl.b32 %r2292, %r5211, 13; 2026-02-21T09:21:01.7821022Z add.s32 %r2265, %r21, %r2292; 2026-02-21T09:21:01.7821089Z selp.b32 %r2266, 8, 0, %p21; 2026-02-21T09:21:01.7821148Z // begin inline asm 2026-02-21T09:21:01.7821292Z cp.async.ca.shared.global [ %r2265 + 0 ], [ %rd98 + 0 ], 0x8, %r2266; 2026-02-21T09:21:01.7821365Z // end inline asm 2026-02-21T09:21:01.7821435Z cp.async.commit_group; 2026-02-21T09:21:01.7821635Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7821708Z mad.wide.s32 %rd99, %r5209, 2, %rd27; 2026-02-21T09:21:01.7821914Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7821974Z add.s32 %r2267, %r22, %r2292; 2026-02-21T09:21:01.7822035Z // begin inline asm 2026-02-21T09:21:01.7822173Z cp.async.ca.shared.global [ %r2267 + 0 ], 
[ %rd99 + 0 ], 0x8, %r2266; 2026-02-21T09:21:01.7822229Z // end inline asm 2026-02-21T09:21:01.7822295Z cp.async.commit_group; 2026-02-21T09:21:01.7822495Z .loc 1 43 78 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:43:78 2026-02-21T09:21:01.7822556Z add.s32 %r5209, %r5209, 32; 2026-02-21T09:21:01.7822624Z add.s32 %r5208, %r5208, 131072; 2026-02-21T09:21:01.7822698Z setp.lt.u64 %p24, %rd220, 496; 2026-02-21T09:21:01.7822765Z @%p24 bra $L__BB0_5; 2026-02-21T09:21:01.7822877Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:21:01.7823083Z .loc 1 34 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:34:32 2026-02-21T09:21:01.7823153Z or.b32 %r2352, %r161, %r8; 2026-02-21T09:21:01.7823349Z .loc 1 36 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:36:32 2026-02-21T09:21:01.7823411Z or.b32 %r2353, %r162, %r12; 2026-02-21T09:21:01.7823474Z or.b32 %r2354, %r162, %r13; 2026-02-21T09:21:01.7823530Z or.b32 %r2355, %r162, %r14; 2026-02-21T09:21:01.7823587Z or.b32 %r2356, %r162, %r15; 2026-02-21T09:21:01.7823786Z .loc 1 43 78 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:43:78 2026-02-21T09:21:01.7823851Z cp.async.wait_group 0; 2026-02-21T09:21:01.7823907Z bar.sync 0; 2026-02-21T09:21:01.7824103Z .loc 1 90 28 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:90:28 2026-02-21T09:21:01.7824185Z cvt.rn.bf16x2.f32 %r2357, %r2128, %r2127; 2026-02-21T09:21:01.7824259Z cvt.rn.bf16x2.f32 %r2358, %r2130, %r2129; 2026-02-21T09:21:01.7824329Z cvt.rn.bf16x2.f32 %r2359, %r2132, %r2131; 2026-02-21T09:21:01.7824504Z cvt.rn.bf16x2.f32 %r2360, %r2134, %r2133; 2026-02-21T09:21:01.7824573Z cvt.rn.bf16x2.f32 %r2361, %r2136, %r2135; 2026-02-21T09:21:01.7824641Z cvt.rn.bf16x2.f32 %r2362, %r2138, %r2137; 2026-02-21T09:21:01.7824714Z cvt.rn.bf16x2.f32 %r2363, %r2140, %r2139; 2026-02-21T09:21:01.7824782Z cvt.rn.bf16x2.f32 %r2364, %r2142, %r2141; 2026-02-21T09:21:01.7824850Z cvt.rn.bf16x2.f32 %r2365, %r2144, %r2143; 
2026-02-21T09:21:01.7824919Z cvt.rn.bf16x2.f32 %r2366, %r2146, %r2145; 2026-02-21T09:21:01.7824990Z cvt.rn.bf16x2.f32 %r2367, %r2148, %r2147; 2026-02-21T09:21:01.7825059Z cvt.rn.bf16x2.f32 %r2368, %r2150, %r2149; 2026-02-21T09:21:01.7825127Z cvt.rn.bf16x2.f32 %r2369, %r2152, %r2151; 2026-02-21T09:21:01.7825249Z cvt.rn.bf16x2.f32 %r2370, %r2154, %r2153; 2026-02-21T09:21:01.7825320Z cvt.rn.bf16x2.f32 %r2371, %r2156, %r2155; 2026-02-21T09:21:01.7825388Z cvt.rn.bf16x2.f32 %r2372, %r2158, %r2157; 2026-02-21T09:21:01.7825596Z .loc 1 91 43 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:91:43 2026-02-21T09:21:01.7825659Z shl.b32 %r2373, %r2353, 13; 2026-02-21T09:21:01.7825720Z shl.b32 %r2374, %r2354, 13; 2026-02-21T09:21:01.7825779Z shl.b32 %r2375, %r2355, 13; 2026-02-21T09:21:01.7825898Z shl.b32 %r2376, %r2356, 13; 2026-02-21T09:21:01.7826105Z .loc 1 91 50 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:91:50 2026-02-21T09:21:01.7826166Z add.s32 %r2377, %r2373, %r2352; 2026-02-21T09:21:01.7826229Z add.s32 %r2378, %r2374, %r2352; 2026-02-21T09:21:01.7826289Z add.s32 %r2379, %r2375, %r2352; 2026-02-21T09:21:01.7826347Z add.s32 %r2380, %r2376, %r2352; 2026-02-21T09:21:01.7826664Z .loc 1 91 22 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:91:22 2026-02-21T09:21:01.7826740Z mad.wide.s32 %rd104, %r2377, 2, %rd29; 2026-02-21T09:21:01.7826806Z mad.wide.s32 %rd105, %r2378, 2, %rd29; 2026-02-21T09:21:01.7826873Z mad.wide.s32 %rd106, %r2379, 2, %rd29; 2026-02-21T09:21:01.7826961Z mad.wide.s32 %rd107, %r2380, 2, %rd29; 2026-02-21T09:21:01.7827160Z .loc 1 91 81 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:91:81 2026-02-21T09:21:01.7827275Z st.shared.v4.b32 [%r39], {%r2357, %r2359, %r2361, %r2363}; 2026-02-21T09:21:01.7827395Z st.shared.v4.b32 [%r39+512], {%r2358, %r2360, %r2362, %r2364}; 2026-02-21T09:21:01.7827500Z st.shared.v4.b32 [%r40], {%r2365, %r2367, %r2369, %r2371}; 2026-02-21T09:21:01.7827608Z st.shared.v4.b32 [%r40+512], 
{%r2366, %r2368, %r2370, %r2372}; 2026-02-21T09:21:01.7827665Z bar.sync 0; 2026-02-21T09:21:01.7827724Z // begin inline asm 2026-02-21T09:21:01.7827915Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2313, %r2314, %r2315, %r2316}, [%r1392]; 2026-02-21T09:21:01.7827972Z // end inline asm 2026-02-21T09:21:01.7828032Z // begin inline asm 2026-02-21T09:21:01.7828211Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2317, %r2318, %r2319, %r2320}, [%r1397]; 2026-02-21T09:21:01.7828269Z // end inline asm 2026-02-21T09:21:01.7828388Z // begin inline asm 2026-02-21T09:21:01.7828571Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2321, %r2322, %r2323, %r2324}, [%r1402]; 2026-02-21T09:21:01.7828628Z // end inline asm 2026-02-21T09:21:01.7828691Z // begin inline asm 2026-02-21T09:21:01.7828864Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2325, %r2326, %r2327, %r2328}, [%r1407]; 2026-02-21T09:21:01.7828920Z // end inline asm 2026-02-21T09:21:01.7828977Z // begin inline asm 2026-02-21T09:21:01.7829105Z st.global.v4.b32 [ %rd104 + 0 ], { %r2313, %r2314, %r2315, %r2316 }; 2026-02-21T09:21:01.7829159Z // end inline asm 2026-02-21T09:21:01.7829215Z // begin inline asm 2026-02-21T09:21:01.7829333Z st.global.v4.b32 [ %rd105 + 0 ], { %r2317, %r2318, %r2319, %r2320 }; 2026-02-21T09:21:01.7829388Z // end inline asm 2026-02-21T09:21:01.7829444Z // begin inline asm 2026-02-21T09:21:01.7829556Z st.global.v4.b32 [ %rd106 + 0 ], { %r2321, %r2322, %r2323, %r2324 }; 2026-02-21T09:21:01.7829769Z // end inline asm 2026-02-21T09:21:01.7829829Z // begin inline asm 2026-02-21T09:21:01.7829944Z st.global.v4.b32 [ %rd107 + 0 ], { %r2325, %r2326, %r2327, %r2328 }; 2026-02-21T09:21:01.7830002Z // end inline asm 2026-02-21T09:21:01.7830218Z .loc 1 22 121 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:22:121 2026-02-21T09:21:01.7830280Z add.s32 %r2381, %r5171, 2; 2026-02-21T09:21:01.7830482Z .loc 1 29 33 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:29:33 2026-02-21T09:21:01.7830541Z shr.u32 
%r2382, %r2381, 6; 2026-02-21T09:21:01.7830604Z and.b32 %r2383, %r2382, 33554424; 2026-02-21T09:21:01.7830802Z .loc 1 30 39 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:30:39 2026-02-21T09:21:01.7830929Z sub.s32 %r2384, 64, %r2383; 2026-02-21T09:21:01.7831129Z .loc 1 30 52 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:30:52 2026-02-21T09:21:01.7831190Z min.s32 %r2385, %r2384, 8; 2026-02-21T09:21:01.7831390Z .loc 1 31 45 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:31:45 2026-02-21T09:21:01.7831450Z and.b32 %r2386, %r2381, 511; 2026-02-21T09:21:01.7831706Z .loc 1 32 51 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:32:51 2026-02-21T09:21:01.7831774Z div.s32 %r2387, %r2386, %r2385; 2026-02-21T09:21:01.7831972Z .loc 1 31 64 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:31:64 2026-02-21T09:21:01.7832037Z mul.lo.s32 %r2388, %r2387, %r2385; 2026-02-21T09:21:01.7832101Z sub.s32 %r2389, %r2386, %r2388; 2026-02-21T09:21:01.7832295Z .loc 1 31 30 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:31:30 2026-02-21T09:21:01.7832356Z add.s32 %r2390, %r2389, %r2383; 2026-02-21T09:21:01.7832552Z .loc 1 33 27 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:33:27 2026-02-21T09:21:01.7832629Z shl.b32 %r237, %r2390, 7; 2026-02-21T09:21:01.7832830Z .loc 1 35 27 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:35:27 2026-02-21T09:21:01.7832888Z shl.b32 %r238, %r2387, 8; 2026-02-21T09:21:01.7833096Z .loc 1 36 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:36:32 2026-02-21T09:21:01.7833156Z or.b32 %r2391, %r238, %r11; 2026-02-21T09:21:01.7833352Z .loc 1 51 53 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:53 2026-02-21T09:21:01.7833414Z shl.b32 %r2392, %r2391, 10; 2026-02-21T09:21:01.7833608Z .loc 1 51 60 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:60 2026-02-21T09:21:01.7833668Z or.b32 %r2393, %r2392, %r19; 2026-02-21T09:21:01.7833866Z .loc 1 51 
32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7833934Z mad.wide.s32 %rd108, %r2393, 2, %rd27; 2026-02-21T09:21:01.7834133Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7834189Z bar.sync 0; 2026-02-21T09:21:01.7834248Z mov.b32 %r2330, 8; 2026-02-21T09:21:01.7834304Z // begin inline asm 2026-02-21T09:21:01.7834441Z cp.async.ca.shared.global [ %r21 + 0 ], [ %rd108 + 0 ], 0x8, %r2330; 2026-02-21T09:21:01.7834502Z // end inline asm 2026-02-21T09:21:01.7834566Z cp.async.commit_group; 2026-02-21T09:21:01.7834774Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7834840Z cvt.s64.s32 %rd119, %r2392; 2026-02-21T09:21:01.7834901Z or.b64 %rd120, %rd119, %rd10; 2026-02-21T09:21:01.7834962Z shl.b64 %rd121, %rd120, 1; 2026-02-21T09:21:01.7835025Z add.s64 %rd122, %rd27, %rd121; 2026-02-21T09:21:01.7835089Z add.s64 %rd109, %rd122, 32; 2026-02-21T09:21:01.7835285Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7835445Z // begin inline asm 2026-02-21T09:21:01.7835578Z cp.async.ca.shared.global [ %r22 + 0 ], [ %rd109 + 0 ], 0x8, %r2330; 2026-02-21T09:21:01.7835632Z // end inline asm 2026-02-21T09:21:01.7835695Z cp.async.commit_group; 2026-02-21T09:21:01.7835893Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7835956Z add.s64 %rd110, %rd122, 64; 2026-02-21T09:21:01.7836150Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7836205Z bar.sync 0; 2026-02-21T09:21:01.7836266Z // begin inline asm 2026-02-21T09:21:01.7836393Z cp.async.ca.shared.global [ %r23 + 0 ], [ %rd110 + 0 ], 0x8, %r2330; 2026-02-21T09:21:01.7836571Z // end inline asm 2026-02-21T09:21:01.7836720Z cp.async.commit_group; 2026-02-21T09:21:01.7836921Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 
2026-02-21T09:21:01.7836983Z add.s64 %rd111, %rd122, 96; 2026-02-21T09:21:01.7837184Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7837243Z // begin inline asm 2026-02-21T09:21:01.7837432Z cp.async.ca.shared.global [ %r24 + 0 ], [ %rd111 + 0 ], 0x8, %r2330; 2026-02-21T09:21:01.7837490Z // end inline asm 2026-02-21T09:21:01.7837557Z cp.async.commit_group; 2026-02-21T09:21:01.7837754Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7837817Z add.s64 %rd112, %rd122, 128; 2026-02-21T09:21:01.7838016Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7838070Z bar.sync 0; 2026-02-21T09:21:01.7838127Z // begin inline asm 2026-02-21T09:21:01.7838261Z cp.async.ca.shared.global [ %r25 + 0 ], [ %rd112 + 0 ], 0x8, %r2330; 2026-02-21T09:21:01.7838325Z // end inline asm 2026-02-21T09:21:01.7838391Z cp.async.commit_group; 2026-02-21T09:21:01.7838592Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7838657Z add.s64 %rd113, %rd122, 160; 2026-02-21T09:21:01.7838853Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7838910Z // begin inline asm 2026-02-21T09:21:01.7839038Z cp.async.ca.shared.global [ %r26 + 0 ], [ %rd113 + 0 ], 0x8, %r2330; 2026-02-21T09:21:01.7839093Z // end inline asm 2026-02-21T09:21:01.7839157Z cp.async.commit_group; 2026-02-21T09:21:01.7839354Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7839416Z add.s64 %rd114, %rd122, 192; 2026-02-21T09:21:01.7839612Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7839666Z bar.sync 0; 2026-02-21T09:21:01.7839726Z // begin inline asm 2026-02-21T09:21:01.7839853Z cp.async.ca.shared.global [ %r27 + 0 ], [ %rd114 + 0 ], 0x8, %r2330; 2026-02-21T09:21:01.7839920Z // end inline asm 
2026-02-21T09:21:01.7839988Z cp.async.commit_group; 2026-02-21T09:21:01.7840186Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7840247Z add.s64 %rd115, %rd122, 224; 2026-02-21T09:21:01.7840444Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7840501Z // begin inline asm 2026-02-21T09:21:01.7840628Z cp.async.ca.shared.global [ %r28 + 0 ], [ %rd115 + 0 ], 0x8, %r2330; 2026-02-21T09:21:01.7840681Z // end inline asm 2026-02-21T09:21:01.7840747Z cp.async.commit_group; 2026-02-21T09:21:01.7840947Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7841008Z add.s64 %rd116, %rd122, 256; 2026-02-21T09:21:01.7841208Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7841398Z bar.sync 0; 2026-02-21T09:21:01.7841457Z // begin inline asm 2026-02-21T09:21:01.7841588Z cp.async.ca.shared.global [ %r29 + 0 ], [ %rd116 + 0 ], 0x8, %r2330; 2026-02-21T09:21:01.7841644Z // end inline asm 2026-02-21T09:21:01.7841711Z cp.async.commit_group; 2026-02-21T09:21:01.7841918Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7841984Z add.s64 %rd117, %rd122, 288; 2026-02-21T09:21:01.7842180Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7842238Z // begin inline asm 2026-02-21T09:21:01.7842371Z cp.async.ca.shared.global [ %r30 + 0 ], [ %rd117 + 0 ], 0x8, %r2330; 2026-02-21T09:21:01.7842501Z // end inline asm 2026-02-21T09:21:01.7842567Z cp.async.commit_group; 2026-02-21T09:21:01.7842774Z .loc 1 43 78 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:43:78 2026-02-21T09:21:01.7842839Z shl.b32 %r2394, %r2387, 18; 2026-02-21T09:21:01.7842899Z or.b32 %r5245, %r45, %r2394; 2026-02-21T09:21:01.7842962Z add.s32 %r2395, %r2386, %r83; 2026-02-21T09:21:01.7843030Z sub.s32 %r2396, %r2395, %r2388; 
2026-02-21T09:21:01.7843136Z shl.b32 %r2397, %r2396, 7; 2026-02-21T09:21:01.7843199Z add.s32 %r5244, %r46, %r2397; 2026-02-21T09:21:01.7843261Z mov.b32 %r3032, 0f00000000; 2026-02-21T09:21:01.7843318Z mov.b32 %r5247, 4; 2026-02-21T09:21:01.7843377Z mov.b32 %r5246, -1; 2026-02-21T09:21:01.7843436Z mov.b64 %rd221, -16; 2026-02-21T09:21:01.7843500Z mov.b32 %r3033, %r3032; 2026-02-21T09:21:01.7843570Z mov.b32 %r3034, %r3032; 2026-02-21T09:21:01.7843629Z mov.b32 %r3035, %r3032; 2026-02-21T09:21:01.7843688Z mov.b32 %r3036, %r3032; 2026-02-21T09:21:01.7843747Z mov.b32 %r3037, %r3032; 2026-02-21T09:21:01.7843807Z mov.b32 %r3038, %r3032; 2026-02-21T09:21:01.7843863Z mov.b32 %r3039, %r3032; 2026-02-21T09:21:01.7843923Z mov.b32 %r3040, %r3032; 2026-02-21T09:21:01.7843983Z mov.b32 %r3041, %r3032; 2026-02-21T09:21:01.7844040Z mov.b32 %r3042, %r3032; 2026-02-21T09:21:01.7844100Z mov.b32 %r3043, %r3032; 2026-02-21T09:21:01.7844155Z mov.b32 %r3044, %r3032; 2026-02-21T09:21:01.7844212Z mov.b32 %r3045, %r3032; 2026-02-21T09:21:01.7844272Z mov.b32 %r3046, %r3032; 2026-02-21T09:21:01.7844335Z mov.b32 %r3047, %r3032; 2026-02-21T09:21:01.7844392Z mov.b32 %r3048, %r3032; 2026-02-21T09:21:01.7844448Z mov.b32 %r3049, %r3032; 2026-02-21T09:21:01.7844521Z mov.b32 %r3050, %r3032; 2026-02-21T09:21:01.7844588Z mov.b32 %r3051, %r3032; 2026-02-21T09:21:01.7844648Z mov.b32 %r3052, %r3032; 2026-02-21T09:21:01.7844704Z mov.b32 %r3053, %r3032; 2026-02-21T09:21:01.7844766Z mov.b32 %r3054, %r3032; 2026-02-21T09:21:01.7844826Z mov.b32 %r3055, %r3032; 2026-02-21T09:21:01.7844884Z mov.b32 %r3056, %r3032; 2026-02-21T09:21:01.7844946Z mov.b32 %r3057, %r3032; 2026-02-21T09:21:01.7845004Z mov.b32 %r3058, %r3032; 2026-02-21T09:21:01.7845066Z mov.b32 %r3059, %r3032; 2026-02-21T09:21:01.7845128Z mov.b32 %r3060, %r3032; 2026-02-21T09:21:01.7845188Z mov.b32 %r3061, %r3032; 2026-02-21T09:21:01.7845247Z mov.b32 %r3062, %r3032; 2026-02-21T09:21:01.7845304Z mov.b32 %r3063, %r3032; 2026-02-21T09:21:01.7845425Z 
$L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:21:01.7845533Z // => This Inner Loop Header: Depth=2 2026-02-21T09:21:01.7845598Z add.s64 %rd221, %rd221, 16; 2026-02-21T09:21:01.7845670Z setp.lt.u64 %p32, %rd221, 432; 2026-02-21T09:21:01.7845731Z add.s32 %r3174, %r5246, 1; 2026-02-21T09:21:01.7845796Z setp.gt.s32 %p33, %r3174, 4; 2026-02-21T09:21:01.7845865Z selp.b32 %r5246, 0, %r3174, %p33; 2026-02-21T09:21:01.7846082Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7846151Z cp.async.wait_group 8; 2026-02-21T09:21:01.7846205Z bar.sync 0; 2026-02-21T09:21:01.7846331Z shl.b32 %r3175, %r5246, 13; 2026-02-21T09:21:01.7846444Z add.s32 %r3177, %r5145, %r3175; 2026-02-21T09:21:01.7846815Z .loc 1 55 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:55:32 2026-02-21T09:21:01.7846883Z add.s32 %r3178, %r3177, %r31; 2026-02-21T09:21:01.7846954Z ld.shared.b16 %rs79, [%r3178]; 2026-02-21T09:21:01.7847022Z ld.shared.b16 %rs80, [%r3178+256]; 2026-02-21T09:21:01.7847088Z ld.shared.b16 %rs81, [%r3178+16]; 2026-02-21T09:21:01.7847159Z ld.shared.b16 %rs82, [%r3178+272]; 2026-02-21T09:21:01.7847221Z add.s32 %r3179, %r3177, %r32; 2026-02-21T09:21:01.7847284Z ld.shared.b16 %rs83, [%r3179]; 2026-02-21T09:21:01.7847350Z ld.shared.b16 %rs84, [%r3179+256]; 2026-02-21T09:21:01.7847414Z ld.shared.b16 %rs85, [%r3179+16]; 2026-02-21T09:21:01.7847555Z ld.shared.b16 %rs86, [%r3179+272]; 2026-02-21T09:21:01.7847622Z cvt.f32.bf16 %r2694, %rs79; 2026-02-21T09:21:01.7847698Z cvt.f32.bf16 %r2695, %rs80; 2026-02-21T09:21:01.7847761Z cvt.f32.bf16 %r2696, %rs83; 2026-02-21T09:21:01.7847825Z cvt.f32.bf16 %r2697, %rs84; 2026-02-21T09:21:01.7847891Z cvt.f32.bf16 %r2762, %rs81; 2026-02-21T09:21:01.7847951Z cvt.f32.bf16 %r2763, %rs82; 2026-02-21T09:21:01.7848009Z cvt.f32.bf16 %r2764, %rs85; 2026-02-21T09:21:01.7848135Z cvt.f32.bf16 %r2765, %rs86; 2026-02-21T09:21:01.7848349Z .loc 1 57 34 // 
ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:57:34 2026-02-21T09:21:01.7848409Z cvt.s64.s32 %rd137, %r5244; 2026-02-21T09:21:01.7848473Z add.s64 %rd124, %rd28, %rd137; 2026-02-21T09:21:01.7848677Z .loc 1 57 87 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:57:87 2026-02-21T09:21:01.7848736Z // begin inline asm 2026-02-21T09:21:01.7848795Z mov.u64 %rd123, 0x0; 2026-02-21T09:21:01.7848931Z createpolicy.fractional.L2::evict_first.b64 %rd123, 1.0; 2026-02-21T09:21:01.7848991Z // end inline asm 2026-02-21T09:21:01.7849051Z // begin inline asm 2026-02-21T09:21:01.7849110Z mov.u16 %rs77, 0x0; 2026-02-21T09:21:01.7849279Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs77 }, [ %rd124 + 0 ], %rd123; 2026-02-21T09:21:01.7849335Z // end inline asm 2026-02-21T09:21:01.7849539Z .loc 1 65 28 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:65:28 2026-02-21T09:21:01.7849608Z st.shared.b8 [%r33], %rs77; 2026-02-21T09:21:01.7849661Z bar.sync 0; 2026-02-21T09:21:01.7849736Z ld.shared.v2.b8 {%rs87, %rs88}, [%r34]; 2026-02-21T09:21:01.7849937Z .loc 1 60 28 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:60:28 2026-02-21T09:21:01.7849999Z shl.b16 %rs89, %rs87, 4; 2026-02-21T09:21:01.7850060Z shl.b16 %rs90, %rs88, 4; 2026-02-21T09:21:01.7850260Z .loc 1 75 58 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:75:58 2026-02-21T09:21:01.7850329Z selp.b16 %rs91, %rs89, %rs87, %p63; 2026-02-21T09:21:01.7850390Z cvt.s16.s8 %rs92, %rs91; 2026-02-21T09:21:01.7850448Z shr.s16 %rs93, %rs92, 4; 2026-02-21T09:21:01.7850526Z selp.b16 %rs94, %rs90, %rs88, %p63; 2026-02-21T09:21:01.7850586Z cvt.s16.s8 %rs95, %rs94; 2026-02-21T09:21:01.7850646Z shr.s16 %rs96, %rs95, 4; 2026-02-21T09:21:01.7850849Z .loc 1 80 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:80:32 2026-02-21T09:21:01.7850915Z cvt.rn.f32.s16 %r3180, %rs93; 2026-02-21T09:21:01.7850976Z cvt.rn.f32.s16 %r3181, %rs96; 2026-02-21T09:21:01.7851035Z bar.sync 0; 
2026-02-21T09:21:01.7851101Z st.shared.b32 [%r35], %r3180; 2026-02-21T09:21:01.7851164Z st.shared.b32 [%r36], %r3181; 2026-02-21T09:21:01.7851304Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3032}; 2026-02-21T09:21:01.7851365Z bar.sync 0; 2026-02-21T09:21:01.7851428Z // begin inline asm 2026-02-21T09:21:01.7851584Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2562, %r2698}, [%r590]; 2026-02-21T09:21:01.7851644Z // end inline asm 2026-02-21T09:21:01.7851698Z bar.sync 0; 2026-02-21T09:21:01.7851831Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3034}; 2026-02-21T09:21:01.7852030Z bar.sync 0; 2026-02-21T09:21:01.7852094Z // begin inline asm 2026-02-21T09:21:01.7852242Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2564, %r2700}, [%r590]; 2026-02-21T09:21:01.7852297Z // end inline asm 2026-02-21T09:21:01.7852355Z bar.sync 0; 2026-02-21T09:21:01.7852482Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3033}; 2026-02-21T09:21:01.7852536Z bar.sync 0; 2026-02-21T09:21:01.7852601Z // begin inline asm 2026-02-21T09:21:01.7852747Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2563, %r2699}, [%r590]; 2026-02-21T09:21:01.7852802Z // end inline asm 2026-02-21T09:21:01.7852855Z bar.sync 0; 2026-02-21T09:21:01.7852983Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3035}; 2026-02-21T09:21:01.7853035Z bar.sync 0; 2026-02-21T09:21:01.7853150Z // begin inline asm 2026-02-21T09:21:01.7853294Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2565, %r2701}, [%r590]; 2026-02-21T09:21:01.7853354Z // end inline asm 2026-02-21T09:21:01.7853410Z bar.sync 0; 2026-02-21T09:21:01.7853536Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3036}; 2026-02-21T09:21:01.7853594Z bar.sync 0; 2026-02-21T09:21:01.7853649Z // begin inline asm 2026-02-21T09:21:01.7853840Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2566, %r2702}, [%r590]; 2026-02-21T09:21:01.7853907Z // end inline asm 2026-02-21T09:21:01.7853960Z bar.sync 0; 2026-02-21T09:21:01.7854089Z 
stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3038}; 2026-02-21T09:21:01.7854142Z bar.sync 0; 2026-02-21T09:21:01.7854203Z // begin inline asm 2026-02-21T09:21:01.7854343Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2568, %r2704}, [%r590]; 2026-02-21T09:21:01.7854397Z // end inline asm 2026-02-21T09:21:01.7854466Z bar.sync 0; 2026-02-21T09:21:01.7854596Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3037}; 2026-02-21T09:21:01.7854651Z bar.sync 0; 2026-02-21T09:21:01.7854707Z // begin inline asm 2026-02-21T09:21:01.7854855Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2567, %r2703}, [%r590]; 2026-02-21T09:21:01.7854914Z // end inline asm 2026-02-21T09:21:01.7854966Z bar.sync 0; 2026-02-21T09:21:01.7855096Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3039}; 2026-02-21T09:21:01.7855148Z bar.sync 0; 2026-02-21T09:21:01.7855207Z // begin inline asm 2026-02-21T09:21:01.7855348Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2569, %r2705}, [%r590]; 2026-02-21T09:21:01.7855416Z // end inline asm 2026-02-21T09:21:01.7855471Z bar.sync 0; 2026-02-21T09:21:01.7855598Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3040}; 2026-02-21T09:21:01.7855655Z bar.sync 0; 2026-02-21T09:21:01.7855713Z // begin inline asm 2026-02-21T09:21:01.7855854Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2570, %r2706}, [%r590]; 2026-02-21T09:21:01.7855915Z // end inline asm 2026-02-21T09:21:01.7855968Z bar.sync 0; 2026-02-21T09:21:01.7856096Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3042}; 2026-02-21T09:21:01.7856149Z bar.sync 0; 2026-02-21T09:21:01.7856214Z // begin inline asm 2026-02-21T09:21:01.7856357Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2572, %r2708}, [%r590]; 2026-02-21T09:21:01.7856413Z // end inline asm 2026-02-21T09:21:01.7856585Z bar.sync 0; 2026-02-21T09:21:01.7856717Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3041}; 2026-02-21T09:21:01.7856769Z bar.sync 0; 2026-02-21T09:21:01.7856825Z // begin inline asm 
2026-02-21T09:21:01.7856970Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2571, %r2707}, [%r590]; 2026-02-21T09:21:01.7857028Z // end inline asm 2026-02-21T09:21:01.7857082Z bar.sync 0; 2026-02-21T09:21:01.7857210Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3043}; 2026-02-21T09:21:01.7857264Z bar.sync 0; 2026-02-21T09:21:01.7857322Z // begin inline asm 2026-02-21T09:21:01.7857465Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2573, %r2709}, [%r590]; 2026-02-21T09:21:01.7857525Z // end inline asm 2026-02-21T09:21:01.7857579Z bar.sync 0; 2026-02-21T09:21:01.7857704Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3044}; 2026-02-21T09:21:01.7857913Z bar.sync 0; 2026-02-21T09:21:01.7857972Z // begin inline asm 2026-02-21T09:21:01.7858121Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2574, %r2710}, [%r590]; 2026-02-21T09:21:01.7858182Z // end inline asm 2026-02-21T09:21:01.7858239Z bar.sync 0; 2026-02-21T09:21:01.7858370Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3046}; 2026-02-21T09:21:01.7858425Z bar.sync 0; 2026-02-21T09:21:01.7858486Z // begin inline asm 2026-02-21T09:21:01.7858630Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2576, %r2712}, [%r590]; 2026-02-21T09:21:01.7858686Z // end inline asm 2026-02-21T09:21:01.7858744Z bar.sync 0; 2026-02-21T09:21:01.7858871Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3045}; 2026-02-21T09:21:01.7858992Z bar.sync 0; 2026-02-21T09:21:01.7859053Z // begin inline asm 2026-02-21T09:21:01.7859199Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2575, %r2711}, [%r590]; 2026-02-21T09:21:01.7859255Z // end inline asm 2026-02-21T09:21:01.7859312Z bar.sync 0; 2026-02-21T09:21:01.7859442Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3047}; 2026-02-21T09:21:01.7859499Z bar.sync 0; 2026-02-21T09:21:01.7859557Z // begin inline asm 2026-02-21T09:21:01.7859761Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2577, %r2713}, [%r590]; 2026-02-21T09:21:01.7859822Z // end inline asm 2026-02-21T09:21:01.7859876Z 
bar.sync 0; 2026-02-21T09:21:01.7860001Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3048}; 2026-02-21T09:21:01.7860059Z bar.sync 0; 2026-02-21T09:21:01.7860115Z // begin inline asm 2026-02-21T09:21:01.7860256Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2578, %r2714}, [%r590]; 2026-02-21T09:21:01.7860313Z // end inline asm 2026-02-21T09:21:01.7860367Z bar.sync 0; 2026-02-21T09:21:01.7860493Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3050}; 2026-02-21T09:21:01.7860548Z bar.sync 0; 2026-02-21T09:21:01.7860608Z // begin inline asm 2026-02-21T09:21:01.7860749Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2580, %r2716}, [%r590]; 2026-02-21T09:21:01.7860808Z // end inline asm 2026-02-21T09:21:01.7860864Z bar.sync 0; 2026-02-21T09:21:01.7860989Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3049}; 2026-02-21T09:21:01.7861042Z bar.sync 0; 2026-02-21T09:21:01.7861100Z // begin inline asm 2026-02-21T09:21:01.7861245Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2579, %r2715}, [%r590]; 2026-02-21T09:21:01.7861301Z // end inline asm 2026-02-21T09:21:01.7861356Z bar.sync 0; 2026-02-21T09:21:01.7861484Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3051}; 2026-02-21T09:21:01.7861538Z bar.sync 0; 2026-02-21T09:21:01.7861595Z // begin inline asm 2026-02-21T09:21:01.7861736Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2581, %r2717}, [%r590]; 2026-02-21T09:21:01.7861797Z // end inline asm 2026-02-21T09:21:01.7861851Z bar.sync 0; 2026-02-21T09:21:01.7861976Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3052}; 2026-02-21T09:21:01.7862032Z bar.sync 0; 2026-02-21T09:21:01.7862093Z // begin inline asm 2026-02-21T09:21:01.7862248Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2582, %r2718}, [%r590]; 2026-02-21T09:21:01.7862308Z // end inline asm 2026-02-21T09:21:01.7862362Z bar.sync 0; 2026-02-21T09:21:01.7862489Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3054}; 2026-02-21T09:21:01.7862542Z bar.sync 0; 2026-02-21T09:21:01.7862602Z // 
begin inline asm 2026-02-21T09:21:01.7862743Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2584, %r2720}, [%r590]; 2026-02-21T09:21:01.7862798Z // end inline asm 2026-02-21T09:21:01.7862854Z bar.sync 0; 2026-02-21T09:21:01.7862982Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3053}; 2026-02-21T09:21:01.7863040Z bar.sync 0; 2026-02-21T09:21:01.7863097Z // begin inline asm 2026-02-21T09:21:01.7863242Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2583, %r2719}, [%r590]; 2026-02-21T09:21:01.7863297Z // end inline asm 2026-02-21T09:21:01.7863350Z bar.sync 0; 2026-02-21T09:21:01.7863538Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3055}; 2026-02-21T09:21:01.7863657Z bar.sync 0; 2026-02-21T09:21:01.7863714Z // begin inline asm 2026-02-21T09:21:01.7863855Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2585, %r2721}, [%r590]; 2026-02-21T09:21:01.7863913Z // end inline asm 2026-02-21T09:21:01.7863966Z bar.sync 0; 2026-02-21T09:21:01.7864091Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3056}; 2026-02-21T09:21:01.7864146Z bar.sync 0; 2026-02-21T09:21:01.7864203Z // begin inline asm 2026-02-21T09:21:01.7864342Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2586, %r2722}, [%r590]; 2026-02-21T09:21:01.7864400Z // end inline asm 2026-02-21T09:21:01.7864453Z bar.sync 0; 2026-02-21T09:21:01.7864578Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3058}; 2026-02-21T09:21:01.7864684Z bar.sync 0; 2026-02-21T09:21:01.7864747Z // begin inline asm 2026-02-21T09:21:01.7864888Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2588, %r2724}, [%r590]; 2026-02-21T09:21:01.7864957Z // end inline asm 2026-02-21T09:21:01.7865019Z bar.sync 0; 2026-02-21T09:21:01.7865146Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3057}; 2026-02-21T09:21:01.7865201Z bar.sync 0; 2026-02-21T09:21:01.7865259Z // begin inline asm 2026-02-21T09:21:01.7865447Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2587, %r2723}, [%r590]; 2026-02-21T09:21:01.7865505Z // end inline asm 
2026-02-21T09:21:01.7865559Z bar.sync 0; 2026-02-21T09:21:01.7865685Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3059}; 2026-02-21T09:21:01.7865740Z bar.sync 0; 2026-02-21T09:21:01.7865798Z // begin inline asm 2026-02-21T09:21:01.7865941Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2589, %r2725}, [%r590]; 2026-02-21T09:21:01.7865997Z // end inline asm 2026-02-21T09:21:01.7866051Z bar.sync 0; 2026-02-21T09:21:01.7866175Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3060}; 2026-02-21T09:21:01.7866230Z bar.sync 0; 2026-02-21T09:21:01.7866285Z // begin inline asm 2026-02-21T09:21:01.7866427Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2590, %r2726}, [%r590]; 2026-02-21T09:21:01.7866600Z // end inline asm 2026-02-21T09:21:01.7866668Z bar.sync 0; 2026-02-21T09:21:01.7866799Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3062}; 2026-02-21T09:21:01.7866854Z bar.sync 0; 2026-02-21T09:21:01.7866916Z // begin inline asm 2026-02-21T09:21:01.7867060Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2592, %r2728}, [%r590]; 2026-02-21T09:21:01.7867114Z // end inline asm 2026-02-21T09:21:01.7867171Z bar.sync 0; 2026-02-21T09:21:01.7867298Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3061}; 2026-02-21T09:21:01.7867352Z bar.sync 0; 2026-02-21T09:21:01.7867410Z // begin inline asm 2026-02-21T09:21:01.7867555Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2591, %r2727}, [%r590]; 2026-02-21T09:21:01.7867611Z // end inline asm 2026-02-21T09:21:01.7867665Z bar.sync 0; 2026-02-21T09:21:01.7867792Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3063}; 2026-02-21T09:21:01.7867848Z bar.sync 0; 2026-02-21T09:21:01.7867912Z // begin inline asm 2026-02-21T09:21:01.7868056Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2593, %r2729}, [%r590]; 2026-02-21T09:21:01.7868115Z // end inline asm 2026-02-21T09:21:01.7868169Z $L__tmp9: 2026-02-21T09:21:01.7868520Z .loc 2 291 36 // standard.py:291:36 @[ ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:87:40 ] 
2026-02-21T09:21:01.7868585Z // begin inline asm 2026-02-21T09:21:01.7868664Z fence.proxy.async.shared::cta; 2026-02-21T09:21:01.7868719Z // end inline asm 2026-02-21T09:21:01.7868800Z shfl.sync.idx.b32 %r3182, %r6, 0, 31, -1; 2026-02-21T09:21:01.7868873Z wgmma.fence.sync.aligned; 2026-02-21T09:21:01.7868936Z mov.pred %p25, -1; 2026-02-21T09:21:01.7868994Z // begin inline asm 2026-02-21T09:21:01.7869759Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2562,%r2563,%r2564,%r2565,%r2566,%r2567,%r2568,%r2569,%r2570,%r2571,%r2572,%r2573,%r2574,%r2575,%r2576,%r2577,%r2578,%r2579,%r2580,%r2581,%r2582,%r2583,%r2584,%r2585,%r2586,%r2587,%r2588,%r2589,%r2590,%r2591,%r2592,%r2593}, {%r2694,%r2695,%r2696,%r2697}, %rd1, %p25, 1, 1; 2026-02-21T09:21:01.7869950Z // end inline asm 2026-02-21T09:21:01.7870007Z // begin inline asm 2026-02-21T09:21:01.7870760Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2562,%r2563,%r2564,%r2565,%r2566,%r2567,%r2568,%r2569,%r2570,%r2571,%r2572,%r2573,%r2574,%r2575,%r2576,%r2577,%r2578,%r2579,%r2580,%r2581,%r2582,%r2583,%r2584,%r2585,%r2586,%r2587,%r2588,%r2589,%r2590,%r2591,%r2592,%r2593}, {%r2762,%r2763,%r2764,%r2765}, %rd2, %p25, 1, 1; 2026-02-21T09:21:01.7870817Z // end inline asm 2026-02-21T09:21:01.7870873Z // begin inline asm 2026-02-21T09:21:01.7871682Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2698,%r2699,%r2700,%r2701,%r2702,%r2703,%r2704,%r2705,%r2706,%r2707,%r2708,%r2709,%r2710,%r2711,%r2712,%r2713,%r2714,%r2715,%r2716,%r2717,%r2718,%r2719,%r2720,%r2721,%r2722,%r2723,%r2724,%r2725,%r2726,%r2727,%r2728,%r2729}, {%r2694,%r2695,%r2696,%r2697}, %rd3, %p25, 1, 1; 2026-02-21T09:21:01.7871756Z // end inline asm 2026-02-21T09:21:01.7871818Z // begin inline asm 2026-02-21T09:21:01.7872625Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r2698,%r2699,%r2700,%r2701,%r2702,%r2703,%r2704,%r2705,%r2706,%r2707,%r2708,%r2709,%r2710,%r2711,%r2712,%r2713,%r2714,%r2715,%r2716,%r2717,%r2718,%r2719,%r2720,%r2721,%r2722,%r2723,%r2724,%r2725,%r2726,%r2727,%r2728,%r2729}, {%r2762,%r2763,%r2764,%r2765}, %rd4, %p25, 1, 1; 2026-02-21T09:21:01.7872683Z // end inline asm 2026-02-21T09:21:01.7872761Z wgmma.commit_group.sync.aligned; 2026-02-21T09:21:01.7872821Z mov.b32 %r3133, 0; 2026-02-21T09:21:01.7872881Z mov.b32 %r2831, %r3133; 2026-02-21T09:21:01.7872939Z mov.b32 %r2832, %r3133; 2026-02-21T09:21:01.7872999Z mov.b32 %r2830, %r491; 2026-02-21T09:21:01.7873058Z // begin inline asm 2026-02-21T09:21:01.7874122Z // wait for regs: %r2562,%r2563,%r2564,%r2565,%r2566,%r2567,%r2568,%r2569,%r2570,%r2571,%r2572,%r2573,%r2574,%r2575,%r2576,%r2577,%r2578,%r2579,%r2580,%r2581,%r2582,%r2583,%r2584,%r2585,%r2586,%r2587,%r2588,%r2589,%r2590,%r2591,%r2592,%r2593,%r2698,%r2699,%r2700,%r2701,%r2702,%r2703,%r2704,%r2705,%r2706,%r2707,%r2708,%r2709,%r2710,%r2711,%r2712,%r2713,%r2714,%r2715,%r2716,%r2717,%r2718,%r2719,%r2720,%r2721,%r2722,%r2723,%r2724,%r2725,%r2726,%r2727,%r2728,%r2729,%r2830,%r2831,%r2832 2026-02-21T09:21:01.7874205Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:21:01.7874262Z // end inline asm 2026-02-21T09:21:01.7874316Z $L__tmp10: 2026-02-21T09:21:01.7874530Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7874593Z add.s32 %r3183, %r3177, 40960; 2026-02-21T09:21:01.7874794Z .loc 1 55 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:55:32 2026-02-21T09:21:01.7874857Z add.s32 %r3184, %r3183, %r31; 2026-02-21T09:21:01.7874927Z ld.shared.b16 %rs97, [%r3184]; 2026-02-21T09:21:01.7874995Z ld.shared.b16 %rs98, [%r3184+256]; 2026-02-21T09:21:01.7875065Z ld.shared.b16 %rs99, [%r3184+16]; 2026-02-21T09:21:01.7875137Z ld.shared.b16 %rs100, [%r3184+272]; 2026-02-21T09:21:01.7875198Z add.s32 %r3185, %r3183, %r32; 2026-02-21T09:21:01.7875262Z ld.shared.b16 %rs101, 
[%r3185]; 2026-02-21T09:21:01.7875331Z ld.shared.b16 %rs102, [%r3185+256]; 2026-02-21T09:21:01.7875395Z ld.shared.b16 %rs103, [%r3185+16]; 2026-02-21T09:21:01.7875460Z ld.shared.b16 %rs104, [%r3185+272]; 2026-02-21T09:21:01.7875525Z cvt.f32.bf16 %r3028, %rs97; 2026-02-21T09:21:01.7875588Z cvt.f32.bf16 %r3029, %rs98; 2026-02-21T09:21:01.7875651Z cvt.f32.bf16 %r3030, %rs101; 2026-02-21T09:21:01.7875711Z cvt.f32.bf16 %r3031, %rs102; 2026-02-21T09:21:01.7875774Z cvt.f32.bf16 %r3096, %rs99; 2026-02-21T09:21:01.7875837Z cvt.f32.bf16 %r3097, %rs100; 2026-02-21T09:21:01.7875897Z cvt.f32.bf16 %r3098, %rs103; 2026-02-21T09:21:01.7875957Z cvt.f32.bf16 %r3099, %rs104; 2026-02-21T09:21:01.7876161Z .loc 1 57 34 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:57:34 2026-02-21T09:21:01.7876332Z add.s32 %r3186, %r5244, 65536; 2026-02-21T09:21:01.7876393Z cvt.s64.s32 %rd138, %r3186; 2026-02-21T09:21:01.7876583Z add.s64 %rd131, %rd28, %rd138; 2026-02-21T09:21:01.7876791Z .loc 1 57 87 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:57:87 2026-02-21T09:21:01.7876850Z // begin inline asm 2026-02-21T09:21:01.7876911Z mov.u64 %rd130, 0x0; 2026-02-21T09:21:01.7877039Z createpolicy.fractional.L2::evict_first.b64 %rd130, 1.0; 2026-02-21T09:21:01.7877095Z // end inline asm 2026-02-21T09:21:01.7877153Z // begin inline asm 2026-02-21T09:21:01.7877213Z mov.u16 %rs78, 0x0; 2026-02-21T09:21:01.7877373Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs78 }, [ %rd131 + 0 ], %rd130; 2026-02-21T09:21:01.7877429Z // end inline asm 2026-02-21T09:21:01.7877707Z .loc 1 65 28 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:65:28 2026-02-21T09:21:01.7877778Z bar.sync 0; 2026-02-21T09:21:01.7877845Z st.shared.b8 [%r33], %rs78; 2026-02-21T09:21:01.7877905Z bar.sync 0; 2026-02-21T09:21:01.7877985Z ld.shared.v2.b8 {%rs105, %rs106}, [%r34]; 2026-02-21T09:21:01.7878188Z .loc 1 60 28 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:60:28 2026-02-21T09:21:01.7878315Z 
shl.b16 %rs107, %rs105, 4; 2026-02-21T09:21:01.7878382Z shl.b16 %rs108, %rs106, 4; 2026-02-21T09:21:01.7878582Z .loc 1 75 58 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:75:58 2026-02-21T09:21:01.7878656Z selp.b16 %rs109, %rs107, %rs105, %p63; 2026-02-21T09:21:01.7878721Z cvt.s16.s8 %rs110, %rs109; 2026-02-21T09:21:01.7878783Z shr.s16 %rs111, %rs110, 4; 2026-02-21T09:21:01.7878853Z selp.b16 %rs112, %rs108, %rs106, %p63; 2026-02-21T09:21:01.7878914Z cvt.s16.s8 %rs113, %rs112; 2026-02-21T09:21:01.7878976Z shr.s16 %rs114, %rs113, 4; 2026-02-21T09:21:01.7879175Z .loc 1 80 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:80:32 2026-02-21T09:21:01.7879241Z cvt.rn.f32.s16 %r3187, %rs111; 2026-02-21T09:21:01.7879306Z cvt.rn.f32.s16 %r3188, %rs114; 2026-02-21T09:21:01.7879360Z bar.sync 0; 2026-02-21T09:21:01.7879427Z st.shared.b32 [%r35], %r3187; 2026-02-21T09:21:01.7879495Z st.shared.b32 [%r36], %r3188; 2026-02-21T09:21:01.7879560Z $L__tmp11: 2026-02-21T09:21:01.7879837Z .loc 2 291 36 // standard.py:291:36 @[ ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:87:40 ] 2026-02-21T09:21:01.7879993Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2562, %r2698}; 2026-02-21T09:21:01.7880051Z bar.sync 0; 2026-02-21T09:21:01.7880110Z // begin inline asm 2026-02-21T09:21:01.7880244Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3032}, [%r1091]; 2026-02-21T09:21:01.7880304Z // end inline asm 2026-02-21T09:21:01.7880360Z bar.sync 0; 2026-02-21T09:21:01.7880507Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2564, %r2700}; 2026-02-21T09:21:01.7880563Z bar.sync 0; 2026-02-21T09:21:01.7880624Z // begin inline asm 2026-02-21T09:21:01.7880754Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3034}, [%r1091]; 2026-02-21T09:21:01.7880809Z // end inline asm 2026-02-21T09:21:01.7880868Z bar.sync 0; 2026-02-21T09:21:01.7881012Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2563, %r2699}; 2026-02-21T09:21:01.7881066Z bar.sync 0; 
2026-02-21T09:21:01.7881127Z // begin inline asm 2026-02-21T09:21:01.7881254Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3033}, [%r1091]; 2026-02-21T09:21:01.7881309Z // end inline asm 2026-02-21T09:21:01.7881363Z bar.sync 0; 2026-02-21T09:21:01.7881511Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2565, %r2701}; 2026-02-21T09:21:01.7881565Z bar.sync 0; 2026-02-21T09:21:01.7881624Z // begin inline asm 2026-02-21T09:21:01.7881755Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3035}, [%r1091]; 2026-02-21T09:21:01.7881811Z // end inline asm 2026-02-21T09:21:01.7881864Z bar.sync 0; 2026-02-21T09:21:01.7882006Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2566, %r2702}; 2026-02-21T09:21:01.7882210Z bar.sync 0; 2026-02-21T09:21:01.7882270Z // begin inline asm 2026-02-21T09:21:01.7882397Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3036}, [%r1091]; 2026-02-21T09:21:01.7882454Z // end inline asm 2026-02-21T09:21:01.7882510Z bar.sync 0; 2026-02-21T09:21:01.7882659Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2568, %r2704}; 2026-02-21T09:21:01.7882716Z bar.sync 0; 2026-02-21T09:21:01.7882773Z // begin inline asm 2026-02-21T09:21:01.7882899Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3038}, [%r1091]; 2026-02-21T09:21:01.7882954Z // end inline asm 2026-02-21T09:21:01.7883011Z bar.sync 0; 2026-02-21T09:21:01.7883153Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2567, %r2703}; 2026-02-21T09:21:01.7883207Z bar.sync 0; 2026-02-21T09:21:01.7883321Z // begin inline asm 2026-02-21T09:21:01.7883449Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3037}, [%r1091]; 2026-02-21T09:21:01.7883504Z // end inline asm 2026-02-21T09:21:01.7883561Z bar.sync 0; 2026-02-21T09:21:01.7883706Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2569, %r2705}; 2026-02-21T09:21:01.7883759Z bar.sync 0; 2026-02-21T09:21:01.7883817Z // begin inline asm 2026-02-21T09:21:01.7883990Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3039}, [%r1091]; 2026-02-21T09:21:01.7884049Z // end 
inline asm 2026-02-21T09:21:01.7884102Z bar.sync 0; 2026-02-21T09:21:01.7884245Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2570, %r2706}; 2026-02-21T09:21:01.7884302Z bar.sync 0; 2026-02-21T09:21:01.7884371Z // begin inline asm 2026-02-21T09:21:01.7884499Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3040}, [%r1091]; 2026-02-21T09:21:01.7884561Z // end inline asm 2026-02-21T09:21:01.7884614Z bar.sync 0; 2026-02-21T09:21:01.7884759Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2572, %r2708}; 2026-02-21T09:21:01.7884816Z bar.sync 0; 2026-02-21T09:21:01.7884873Z // begin inline asm 2026-02-21T09:21:01.7884998Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3042}, [%r1091]; 2026-02-21T09:21:01.7885057Z // end inline asm 2026-02-21T09:21:01.7885115Z bar.sync 0; 2026-02-21T09:21:01.7885257Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2571, %r2707}; 2026-02-21T09:21:01.7885312Z bar.sync 0; 2026-02-21T09:21:01.7885378Z // begin inline asm 2026-02-21T09:21:01.7885504Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3041}, [%r1091]; 2026-02-21T09:21:01.7885559Z // end inline asm 2026-02-21T09:21:01.7885612Z bar.sync 0; 2026-02-21T09:21:01.7885760Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2573, %r2709}; 2026-02-21T09:21:01.7885811Z bar.sync 0; 2026-02-21T09:21:01.7885869Z // begin inline asm 2026-02-21T09:21:01.7885999Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3043}, [%r1091]; 2026-02-21T09:21:01.7886067Z // end inline asm 2026-02-21T09:21:01.7886123Z bar.sync 0; 2026-02-21T09:21:01.7886270Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2574, %r2710}; 2026-02-21T09:21:01.7886329Z bar.sync 0; 2026-02-21T09:21:01.7886389Z // begin inline asm 2026-02-21T09:21:01.7886631Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3044}, [%r1091]; 2026-02-21T09:21:01.7886694Z // end inline asm 2026-02-21T09:21:01.7886746Z bar.sync 0; 2026-02-21T09:21:01.7886893Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2576, %r2712}; 
2026-02-21T09:21:01.7886962Z bar.sync 0; 2026-02-21T09:21:01.7887020Z // begin inline asm 2026-02-21T09:21:01.7887146Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3046}, [%r1091]; 2026-02-21T09:21:01.7887202Z // end inline asm 2026-02-21T09:21:01.7887259Z bar.sync 0; 2026-02-21T09:21:01.7887406Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2575, %r2711}; 2026-02-21T09:21:01.7887460Z bar.sync 0; 2026-02-21T09:21:01.7887522Z // begin inline asm 2026-02-21T09:21:01.7887649Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3045}, [%r1091]; 2026-02-21T09:21:01.7887705Z // end inline asm 2026-02-21T09:21:01.7887758Z bar.sync 0; 2026-02-21T09:21:01.7887904Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2577, %r2713}; 2026-02-21T09:21:01.7888114Z bar.sync 0; 2026-02-21T09:21:01.7888171Z // begin inline asm 2026-02-21T09:21:01.7888302Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3047}, [%r1091]; 2026-02-21T09:21:01.7888357Z // end inline asm 2026-02-21T09:21:01.7888414Z bar.sync 0; 2026-02-21T09:21:01.7888556Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2578, %r2714}; 2026-02-21T09:21:01.7888613Z bar.sync 0; 2026-02-21T09:21:01.7888683Z // begin inline asm 2026-02-21T09:21:01.7888812Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3048}, [%r1091]; 2026-02-21T09:21:01.7888871Z // end inline asm 2026-02-21T09:21:01.7888924Z bar.sync 0; 2026-02-21T09:21:01.7889066Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2580, %r2716}; 2026-02-21T09:21:01.7889194Z bar.sync 0; 2026-02-21T09:21:01.7889253Z // begin inline asm 2026-02-21T09:21:01.7889377Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3050}, [%r1091]; 2026-02-21T09:21:01.7889431Z // end inline asm 2026-02-21T09:21:01.7889495Z bar.sync 0; 2026-02-21T09:21:01.7889636Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2579, %r2715}; 2026-02-21T09:21:01.7889690Z bar.sync 0; 2026-02-21T09:21:01.7889749Z // begin inline asm 2026-02-21T09:21:01.7889937Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3049}, 
[%r1091]; 2026-02-21T09:21:01.7889997Z // end inline asm 2026-02-21T09:21:01.7890051Z bar.sync 0; 2026-02-21T09:21:01.7890195Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2581, %r2717}; 2026-02-21T09:21:01.7890249Z bar.sync 0; 2026-02-21T09:21:01.7890306Z // begin inline asm 2026-02-21T09:21:01.7890434Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3051}, [%r1091]; 2026-02-21T09:21:01.7890489Z // end inline asm 2026-02-21T09:21:01.7890543Z bar.sync 0; 2026-02-21T09:21:01.7890685Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2582, %r2718}; 2026-02-21T09:21:01.7890742Z bar.sync 0; 2026-02-21T09:21:01.7890799Z // begin inline asm 2026-02-21T09:21:01.7890926Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3052}, [%r1091]; 2026-02-21T09:21:01.7890989Z // end inline asm 2026-02-21T09:21:01.7891042Z bar.sync 0; 2026-02-21T09:21:01.7891183Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2584, %r2720}; 2026-02-21T09:21:01.7891242Z bar.sync 0; 2026-02-21T09:21:01.7891302Z // begin inline asm 2026-02-21T09:21:01.7891427Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3054}, [%r1091]; 2026-02-21T09:21:01.7891482Z // end inline asm 2026-02-21T09:21:01.7891538Z bar.sync 0; 2026-02-21T09:21:01.7891678Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2583, %r2719}; 2026-02-21T09:21:01.7891732Z bar.sync 0; 2026-02-21T09:21:01.7891792Z // begin inline asm 2026-02-21T09:21:01.7891918Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3053}, [%r1091]; 2026-02-21T09:21:01.7891973Z // end inline asm 2026-02-21T09:21:01.7892026Z bar.sync 0; 2026-02-21T09:21:01.7892185Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2585, %r2721}; 2026-02-21T09:21:01.7892240Z bar.sync 0; 2026-02-21T09:21:01.7892301Z // begin inline asm 2026-02-21T09:21:01.7892429Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3055}, [%r1091]; 2026-02-21T09:21:01.7892484Z // end inline asm 2026-02-21T09:21:01.7892537Z bar.sync 0; 2026-02-21T09:21:01.7892681Z stmatrix.sync.aligned.m8n8.x2.shared.b16 
[%r590], {%r2586, %r2722}; 2026-02-21T09:21:01.7892740Z bar.sync 0; 2026-02-21T09:21:01.7892797Z // begin inline asm 2026-02-21T09:21:01.7892922Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3056}, [%r1091]; 2026-02-21T09:21:01.7892985Z // end inline asm 2026-02-21T09:21:01.7893038Z bar.sync 0; 2026-02-21T09:21:01.7893178Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2588, %r2724}; 2026-02-21T09:21:01.7893236Z bar.sync 0; 2026-02-21T09:21:01.7893293Z // begin inline asm 2026-02-21T09:21:01.7893419Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3058}, [%r1091]; 2026-02-21T09:21:01.7893474Z // end inline asm 2026-02-21T09:21:01.7893532Z bar.sync 0; 2026-02-21T09:21:01.7893676Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2587, %r2723}; 2026-02-21T09:21:01.7893836Z bar.sync 0; 2026-02-21T09:21:01.7893896Z // begin inline asm 2026-02-21T09:21:01.7894022Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3057}, [%r1091]; 2026-02-21T09:21:01.7894076Z // end inline asm 2026-02-21T09:21:01.7894131Z bar.sync 0; 2026-02-21T09:21:01.7894278Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2589, %r2725}; 2026-02-21T09:21:01.7894332Z bar.sync 0; 2026-02-21T09:21:01.7894390Z // begin inline asm 2026-02-21T09:21:01.7894520Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3059}, [%r1091]; 2026-02-21T09:21:01.7894576Z // end inline asm 2026-02-21T09:21:01.7894630Z bar.sync 0; 2026-02-21T09:21:01.7894771Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2590, %r2726}; 2026-02-21T09:21:01.7894878Z bar.sync 0; 2026-02-21T09:21:01.7894938Z // begin inline asm 2026-02-21T09:21:01.7895063Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3060}, [%r1091]; 2026-02-21T09:21:01.7895121Z // end inline asm 2026-02-21T09:21:01.7895179Z bar.sync 0; 2026-02-21T09:21:01.7895323Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2592, %r2728}; 2026-02-21T09:21:01.7895383Z bar.sync 0; 2026-02-21T09:21:01.7895441Z // begin inline asm 2026-02-21T09:21:01.7895610Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3062}, [%r1091]; 2026-02-21T09:21:01.7895668Z // end inline asm 2026-02-21T09:21:01.7895731Z bar.sync 0; 2026-02-21T09:21:01.7895881Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2591, %r2727}; 2026-02-21T09:21:01.7895935Z bar.sync 0; 2026-02-21T09:21:01.7895997Z // begin inline asm 2026-02-21T09:21:01.7896123Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3061}, [%r1091]; 2026-02-21T09:21:01.7896177Z // end inline asm 2026-02-21T09:21:01.7896230Z bar.sync 0; 2026-02-21T09:21:01.7896379Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r2593, %r2729}; 2026-02-21T09:21:01.7896432Z bar.sync 0; 2026-02-21T09:21:01.7896611Z // begin inline asm 2026-02-21T09:21:01.7896752Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3063}, [%r1091]; 2026-02-21T09:21:01.7896810Z // end inline asm 2026-02-21T09:21:01.7896867Z // begin inline asm 2026-02-21T09:21:01.7896945Z fence.proxy.async.shared::cta; 2026-02-21T09:21:01.7897008Z // end inline asm 2026-02-21T09:21:01.7897081Z wgmma.fence.sync.aligned; 2026-02-21T09:21:01.7897145Z shl.b32 %r3189, %r3182, 8; 2026-02-21T09:21:01.7897209Z and.b32 %r3190, %r3189, 4096; 2026-02-21T09:21:01.7897271Z add.s32 %r3191, %r3190, %r491; 2026-02-21T09:21:01.7897332Z bfe.u32 %r3192, %r3191, 4, 14; 2026-02-21T09:21:01.7897399Z cvt.u64.u32 %rd139, %r3192; 2026-02-21T09:21:01.7897478Z or.b64 %rd133, %rd139, -9223371899382267904; 2026-02-21T09:21:01.7897538Z // begin inline asm 2026-02-21T09:21:01.7898299Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3032,%r3033,%r3034,%r3035,%r3036,%r3037,%r3038,%r3039,%r3040,%r3041,%r3042,%r3043,%r3044,%r3045,%r3046,%r3047,%r3048,%r3049,%r3050,%r3051,%r3052,%r3053,%r3054,%r3055,%r3056,%r3057,%r3058,%r3059,%r3060,%r3061,%r3062,%r3063}, {%r3028,%r3029,%r3030,%r3031}, %rd133, %p25, 1, 1; 2026-02-21T09:21:01.7898364Z // end inline asm 2026-02-21T09:21:01.7898425Z add.s32 %r3193, %r3191, 32; 2026-02-21T09:21:01.7898485Z bfe.u32 %r3194, %r3193, 4, 14; 
2026-02-21T09:21:01.7898565Z cvt.u64.u32 %rd140, %r3194; 2026-02-21T09:21:01.7898643Z or.b64 %rd134, %rd140, -9223371899382267904; 2026-02-21T09:21:01.7898702Z // begin inline asm 2026-02-21T09:21:01.7899460Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3032,%r3033,%r3034,%r3035,%r3036,%r3037,%r3038,%r3039,%r3040,%r3041,%r3042,%r3043,%r3044,%r3045,%r3046,%r3047,%r3048,%r3049,%r3050,%r3051,%r3052,%r3053,%r3054,%r3055,%r3056,%r3057,%r3058,%r3059,%r3060,%r3061,%r3062,%r3063}, {%r3096,%r3097,%r3098,%r3099}, %rd134, %p25, 1, 1; 2026-02-21T09:21:01.7899519Z // end inline asm 2026-02-21T09:21:01.7899595Z wgmma.commit_group.sync.aligned; 2026-02-21T09:21:01.7899655Z mov.b32 %r3132, %r491; 2026-02-21T09:21:01.7899716Z mov.b32 %r3134, %r3133; 2026-02-21T09:21:01.7899919Z // begin inline asm 2026-02-21T09:21:01.7900474Z // wait for regs: %r3032,%r3033,%r3034,%r3035,%r3036,%r3037,%r3038,%r3039,%r3040,%r3041,%r3042,%r3043,%r3044,%r3045,%r3046,%r3047,%r3048,%r3049,%r3050,%r3051,%r3052,%r3053,%r3054,%r3055,%r3056,%r3057,%r3058,%r3059,%r3060,%r3061,%r3062,%r3063,%r3132,%r3133,%r3134 2026-02-21T09:21:01.7900553Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:21:01.7900607Z // end inline asm 2026-02-21T09:21:01.7900660Z $L__tmp12: 2026-02-21T09:21:01.7900873Z .loc 1 43 78 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:43:78 2026-02-21T09:21:01.7900935Z add.s32 %r3195, %r5247, 1; 2026-02-21T09:21:01.7901000Z setp.gt.s32 %p34, %r3195, 4; 2026-02-21T09:21:01.7901069Z selp.b32 %r5247, 0, %r3195, %p34; 2026-02-21T09:21:01.7901336Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7901401Z add.s32 %r3196, %r5245, -16; 2026-02-21T09:21:01.7901478Z mad.wide.s32 %rd135, %r3196, 2, %rd27; 2026-02-21T09:21:01.7901678Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7901738Z shl.b32 %r3197, %r5247, 13; 2026-02-21T09:21:01.7901797Z add.s32 %r3170, %r21, %r3197; 
2026-02-21T09:21:01.7901922Z selp.b32 %r3171, 8, 0, %p32; 2026-02-21T09:21:01.7901987Z // begin inline asm 2026-02-21T09:21:01.7902132Z cp.async.ca.shared.global [ %r3170 + 0 ], [ %rd135 + 0 ], 0x8, %r3171; 2026-02-21T09:21:01.7902197Z // end inline asm 2026-02-21T09:21:01.7902271Z cp.async.commit_group; 2026-02-21T09:21:01.7902472Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7902546Z mad.wide.s32 %rd136, %r5245, 2, %rd27; 2026-02-21T09:21:01.7902743Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7902805Z add.s32 %r3172, %r22, %r3197; 2026-02-21T09:21:01.7902864Z // begin inline asm 2026-02-21T09:21:01.7903004Z cp.async.ca.shared.global [ %r3172 + 0 ], [ %rd136 + 0 ], 0x8, %r3171; 2026-02-21T09:21:01.7903060Z // end inline asm 2026-02-21T09:21:01.7903124Z cp.async.commit_group; 2026-02-21T09:21:01.7903328Z .loc 1 43 78 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:43:78 2026-02-21T09:21:01.7903388Z add.s32 %r5245, %r5245, 32; 2026-02-21T09:21:01.7903449Z add.s32 %r5244, %r5244, 131072; 2026-02-21T09:21:01.7903516Z setp.lt.u64 %p35, %rd221, 496; 2026-02-21T09:21:01.7903579Z @%p35 bra $L__BB0_7; 2026-02-21T09:21:01.7903687Z // %bb.8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:21:01.7903885Z .loc 1 34 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:34:32 2026-02-21T09:21:01.7903951Z or.b32 %r3257, %r237, %r8; 2026-02-21T09:21:01.7904149Z .loc 1 36 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:36:32 2026-02-21T09:21:01.7904213Z or.b32 %r3258, %r238, %r12; 2026-02-21T09:21:01.7904275Z or.b32 %r3259, %r238, %r13; 2026-02-21T09:21:01.7904334Z or.b32 %r3260, %r238, %r14; 2026-02-21T09:21:01.7904392Z or.b32 %r3261, %r238, %r15; 2026-02-21T09:21:01.7904592Z .loc 1 43 78 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:43:78 2026-02-21T09:21:01.7904660Z cp.async.wait_group 0; 2026-02-21T09:21:01.7904715Z bar.sync 0; 
2026-02-21T09:21:01.7904910Z .loc 1 90 28 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:90:28 2026-02-21T09:21:01.7904991Z cvt.rn.bf16x2.f32 %r3262, %r3033, %r3032; 2026-02-21T09:21:01.7905063Z cvt.rn.bf16x2.f32 %r3263, %r3035, %r3034; 2026-02-21T09:21:01.7905132Z cvt.rn.bf16x2.f32 %r3264, %r3037, %r3036; 2026-02-21T09:21:01.7905205Z cvt.rn.bf16x2.f32 %r3265, %r3039, %r3038; 2026-02-21T09:21:01.7905274Z cvt.rn.bf16x2.f32 %r3266, %r3041, %r3040; 2026-02-21T09:21:01.7905342Z cvt.rn.bf16x2.f32 %r3267, %r3043, %r3042; 2026-02-21T09:21:01.7905523Z cvt.rn.bf16x2.f32 %r3268, %r3045, %r3044; 2026-02-21T09:21:01.7905597Z cvt.rn.bf16x2.f32 %r3269, %r3047, %r3046; 2026-02-21T09:21:01.7905666Z cvt.rn.bf16x2.f32 %r3270, %r3049, %r3048; 2026-02-21T09:21:01.7905739Z cvt.rn.bf16x2.f32 %r3271, %r3051, %r3050; 2026-02-21T09:21:01.7905812Z cvt.rn.bf16x2.f32 %r3272, %r3053, %r3052; 2026-02-21T09:21:01.7905881Z cvt.rn.bf16x2.f32 %r3273, %r3055, %r3054; 2026-02-21T09:21:01.7905950Z cvt.rn.bf16x2.f32 %r3274, %r3057, %r3056; 2026-02-21T09:21:01.7906021Z cvt.rn.bf16x2.f32 %r3275, %r3059, %r3058; 2026-02-21T09:21:01.7906089Z cvt.rn.bf16x2.f32 %r3276, %r3061, %r3060; 2026-02-21T09:21:01.7906159Z cvt.rn.bf16x2.f32 %r3277, %r3063, %r3062; 2026-02-21T09:21:01.7906408Z .loc 1 91 43 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:91:43 2026-02-21T09:21:01.7906591Z shl.b32 %r3278, %r3258, 13; 2026-02-21T09:21:01.7906656Z shl.b32 %r3279, %r3259, 13; 2026-02-21T09:21:01.7906715Z shl.b32 %r3280, %r3260, 13; 2026-02-21T09:21:01.7906781Z shl.b32 %r3281, %r3261, 13; 2026-02-21T09:21:01.7906980Z .loc 1 91 50 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:91:50 2026-02-21T09:21:01.7907042Z add.s32 %r3282, %r3278, %r3257; 2026-02-21T09:21:01.7907195Z add.s32 %r3283, %r3279, %r3257; 2026-02-21T09:21:01.7907259Z add.s32 %r3284, %r3280, %r3257; 2026-02-21T09:21:01.7907318Z add.s32 %r3285, %r3281, %r3257; 2026-02-21T09:21:01.7907515Z .loc 1 91 22 // 
ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:91:22 2026-02-21T09:21:01.7907588Z mad.wide.s32 %rd141, %r3282, 2, %rd29; 2026-02-21T09:21:01.7907657Z mad.wide.s32 %rd142, %r3283, 2, %rd29; 2026-02-21T09:21:01.7907723Z mad.wide.s32 %rd143, %r3284, 2, %rd29; 2026-02-21T09:21:01.7907794Z mad.wide.s32 %rd144, %r3285, 2, %rd29; 2026-02-21T09:21:01.7907990Z .loc 1 91 81 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:91:81 2026-02-21T09:21:01.7908116Z st.shared.v4.b32 [%r39], {%r3262, %r3264, %r3266, %r3268}; 2026-02-21T09:21:01.7908238Z st.shared.v4.b32 [%r39+512], {%r3263, %r3265, %r3267, %r3269}; 2026-02-21T09:21:01.7908400Z st.shared.v4.b32 [%r40], {%r3270, %r3272, %r3274, %r3276}; 2026-02-21T09:21:01.7908513Z st.shared.v4.b32 [%r40+512], {%r3271, %r3273, %r3275, %r3277}; 2026-02-21T09:21:01.7908567Z bar.sync 0; 2026-02-21T09:21:01.7908629Z // begin inline asm 2026-02-21T09:21:01.7908819Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3218, %r3219, %r3220, %r3221}, [%r1392]; 2026-02-21T09:21:01.7908876Z // end inline asm 2026-02-21T09:21:01.7908937Z // begin inline asm 2026-02-21T09:21:01.7909119Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3222, %r3223, %r3224, %r3225}, [%r1397]; 2026-02-21T09:21:01.7909175Z // end inline asm 2026-02-21T09:21:01.7909238Z // begin inline asm 2026-02-21T09:21:01.7909414Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3226, %r3227, %r3228, %r3229}, [%r1402]; 2026-02-21T09:21:01.7909469Z // end inline asm 2026-02-21T09:21:01.7909528Z // begin inline asm 2026-02-21T09:21:01.7909708Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3230, %r3231, %r3232, %r3233}, [%r1407]; 2026-02-21T09:21:01.7909765Z // end inline asm 2026-02-21T09:21:01.7909820Z // begin inline asm 2026-02-21T09:21:01.7909952Z st.global.v4.b32 [ %rd141 + 0 ], { %r3218, %r3219, %r3220, %r3221 }; 2026-02-21T09:21:01.7910008Z // end inline asm 2026-02-21T09:21:01.7910065Z // begin inline asm 2026-02-21T09:21:01.7910182Z st.global.v4.b32 [ %rd142 + 0 ], { %r3222, 
%r3223, %r3224, %r3225 }; 2026-02-21T09:21:01.7910242Z // end inline asm 2026-02-21T09:21:01.7910300Z // begin inline asm 2026-02-21T09:21:01.7910414Z st.global.v4.b32 [ %rd143 + 0 ], { %r3226, %r3227, %r3228, %r3229 }; 2026-02-21T09:21:01.7910472Z // end inline asm 2026-02-21T09:21:01.7910530Z // begin inline asm 2026-02-21T09:21:01.7910642Z st.global.v4.b32 [ %rd144 + 0 ], { %r3230, %r3231, %r3232, %r3233 }; 2026-02-21T09:21:01.7910701Z // end inline asm 2026-02-21T09:21:01.7911065Z .loc 1 22 121 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:22:121 2026-02-21T09:21:01.7911127Z add.s32 %r3286, %r5171, 3; 2026-02-21T09:21:01.7911332Z .loc 1 29 33 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:29:33 2026-02-21T09:21:01.7911396Z shr.u32 %r3287, %r3286, 6; 2026-02-21T09:21:01.7911459Z and.b32 %r3288, %r3287, 33554424; 2026-02-21T09:21:01.7911659Z .loc 1 30 39 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:30:39 2026-02-21T09:21:01.7911723Z sub.s32 %r3289, 64, %r3288; 2026-02-21T09:21:01.7911920Z .loc 1 30 52 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:30:52 2026-02-21T09:21:01.7911979Z min.s32 %r3290, %r3289, 8; 2026-02-21T09:21:01.7912243Z .loc 1 31 45 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:31:45 2026-02-21T09:21:01.7912306Z and.b32 %r3291, %r3286, 511; 2026-02-21T09:21:01.7912507Z .loc 1 32 51 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:32:51 2026-02-21T09:21:01.7912573Z div.s32 %r3292, %r3291, %r3290; 2026-02-21T09:21:01.7912812Z .loc 1 31 64 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:31:64 2026-02-21T09:21:01.7912879Z mul.lo.s32 %r3293, %r3292, %r3290; 2026-02-21T09:21:01.7912940Z sub.s32 %r3294, %r3291, %r3293; 2026-02-21T09:21:01.7913163Z .loc 1 31 30 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:31:30 2026-02-21T09:21:01.7913224Z add.s32 %r3295, %r3294, %r3288; 2026-02-21T09:21:01.7913422Z .loc 1 33 27 // 
ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:33:27 2026-02-21T09:21:01.7913486Z shl.b32 %r313, %r3295, 7; 2026-02-21T09:21:01.7913682Z .loc 1 35 27 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:35:27 2026-02-21T09:21:01.7913741Z shl.b32 %r314, %r3292, 8; 2026-02-21T09:21:01.7913947Z .loc 1 36 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:36:32 2026-02-21T09:21:01.7914005Z or.b32 %r3296, %r314, %r11; 2026-02-21T09:21:01.7914201Z .loc 1 51 53 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:53 2026-02-21T09:21:01.7914263Z shl.b32 %r3297, %r3296, 10; 2026-02-21T09:21:01.7914457Z .loc 1 51 60 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:60 2026-02-21T09:21:01.7914516Z or.b32 %r3298, %r3297, %r19; 2026-02-21T09:21:01.7914711Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7914785Z mad.wide.s32 %rd145, %r3298, 2, %rd27; 2026-02-21T09:21:01.7914982Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7915037Z bar.sync 0; 2026-02-21T09:21:01.7915096Z mov.b32 %r3235, 8; 2026-02-21T09:21:01.7915155Z // begin inline asm 2026-02-21T09:21:01.7915299Z cp.async.ca.shared.global [ %r21 + 0 ], [ %rd145 + 0 ], 0x8, %r3235; 2026-02-21T09:21:01.7915360Z // end inline asm 2026-02-21T09:21:01.7915424Z cp.async.commit_group; 2026-02-21T09:21:01.7915622Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7915683Z cvt.s64.s32 %rd156, %r3297; 2026-02-21T09:21:01.7915761Z or.b64 %rd157, %rd156, %rd10; 2026-02-21T09:21:01.7915825Z shl.b64 %rd158, %rd157, 1; 2026-02-21T09:21:01.7915889Z add.s64 %rd159, %rd27, %rd158; 2026-02-21T09:21:01.7915954Z add.s64 %rd146, %rd159, 32; 2026-02-21T09:21:01.7916151Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7916210Z // begin inline asm 2026-02-21T09:21:01.7916348Z cp.async.ca.shared.global [ 
%r22 + 0 ], [ %rd146 + 0 ], 0x8, %r3235; 2026-02-21T09:21:01.7916404Z // end inline asm 2026-02-21T09:21:01.7916593Z cp.async.commit_group; 2026-02-21T09:21:01.7916958Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7917022Z add.s64 %rd147, %rd159, 64; 2026-02-21T09:21:01.7917221Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7917277Z bar.sync 0; 2026-02-21T09:21:01.7917339Z // begin inline asm 2026-02-21T09:21:01.7917471Z cp.async.ca.shared.global [ %r23 + 0 ], [ %rd147 + 0 ], 0x8, %r3235; 2026-02-21T09:21:01.7917526Z // end inline asm 2026-02-21T09:21:01.7917591Z cp.async.commit_group; 2026-02-21T09:21:01.7917791Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7917852Z add.s64 %rd148, %rd159, 96; 2026-02-21T09:21:01.7918118Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7918193Z // begin inline asm 2026-02-21T09:21:01.7918329Z cp.async.ca.shared.global [ %r24 + 0 ], [ %rd148 + 0 ], 0x8, %r3235; 2026-02-21T09:21:01.7918388Z // end inline asm 2026-02-21T09:21:01.7918455Z cp.async.commit_group; 2026-02-21T09:21:01.7918714Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7918781Z add.s64 %rd149, %rd159, 128; 2026-02-21T09:21:01.7918980Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7919035Z bar.sync 0; 2026-02-21T09:21:01.7919093Z // begin inline asm 2026-02-21T09:21:01.7919220Z cp.async.ca.shared.global [ %r25 + 0 ], [ %rd149 + 0 ], 0x8, %r3235; 2026-02-21T09:21:01.7919281Z // end inline asm 2026-02-21T09:21:01.7922329Z cp.async.commit_group; 2026-02-21T09:21:01.7922607Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7922681Z add.s64 %rd150, %rd159, 160; 2026-02-21T09:21:01.7922904Z .loc 1 51 80 // 
ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7922978Z // begin inline asm 2026-02-21T09:21:01.7923126Z cp.async.ca.shared.global [ %r26 + 0 ], [ %rd150 + 0 ], 0x8, %r3235; 2026-02-21T09:21:01.7923186Z // end inline asm 2026-02-21T09:21:01.7923261Z cp.async.commit_group; 2026-02-21T09:21:01.7923476Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7923542Z add.s64 %rd151, %rd159, 192; 2026-02-21T09:21:01.7923744Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7923805Z bar.sync 0; 2026-02-21T09:21:01.7923867Z // begin inline asm 2026-02-21T09:21:01.7924006Z cp.async.ca.shared.global [ %r27 + 0 ], [ %rd151 + 0 ], 0x8, %r3235; 2026-02-21T09:21:01.7924068Z // end inline asm 2026-02-21T09:21:01.7924135Z cp.async.commit_group; 2026-02-21T09:21:01.7924338Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7924412Z add.s64 %rd152, %rd159, 224; 2026-02-21T09:21:01.7924614Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7924675Z // begin inline asm 2026-02-21T09:21:01.7924811Z cp.async.ca.shared.global [ %r28 + 0 ], [ %rd152 + 0 ], 0x8, %r3235; 2026-02-21T09:21:01.7924872Z // end inline asm 2026-02-21T09:21:01.7924939Z cp.async.commit_group; 2026-02-21T09:21:01.7925147Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7925214Z add.s64 %rd153, %rd159, 256; 2026-02-21T09:21:01.7925419Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7925477Z bar.sync 0; 2026-02-21T09:21:01.7925539Z // begin inline asm 2026-02-21T09:21:01.7925677Z cp.async.ca.shared.global [ %r29 + 0 ], [ %rd153 + 0 ], 0x8, %r3235; 2026-02-21T09:21:01.7925885Z // end inline asm 2026-02-21T09:21:01.7925955Z cp.async.commit_group; 2026-02-21T09:21:01.7926160Z .loc 1 51 32 // 
ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7926225Z add.s64 %rd154, %rd159, 288; 2026-02-21T09:21:01.7926423Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7926655Z // begin inline asm 2026-02-21T09:21:01.7926794Z cp.async.ca.shared.global [ %r30 + 0 ], [ %rd154 + 0 ], 0x8, %r3235; 2026-02-21T09:21:01.7926853Z // end inline asm 2026-02-21T09:21:01.7926919Z cp.async.commit_group; 2026-02-21T09:21:01.7927124Z .loc 1 43 78 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:43:78 2026-02-21T09:21:01.7927277Z shl.b32 %r3299, %r3292, 18; 2026-02-21T09:21:01.7927346Z or.b32 %r5281, %r45, %r3299; 2026-02-21T09:21:01.7927414Z add.s32 %r3300, %r3291, %r82; 2026-02-21T09:21:01.7927478Z sub.s32 %r3301, %r3300, %r3293; 2026-02-21T09:21:01.7927545Z shl.b32 %r3302, %r3301, 7; 2026-02-21T09:21:01.7927608Z add.s32 %r5280, %r46, %r3302; 2026-02-21T09:21:01.7927669Z mov.b32 %r3937, 0f00000000; 2026-02-21T09:21:01.7927730Z mov.b32 %r5283, 4; 2026-02-21T09:21:01.7927869Z mov.b32 %r5282, -1; 2026-02-21T09:21:01.7927936Z mov.b64 %rd222, -16; 2026-02-21T09:21:01.7927998Z mov.b32 %r3938, %r3937; 2026-02-21T09:21:01.7928058Z mov.b32 %r3939, %r3937; 2026-02-21T09:21:01.7928119Z mov.b32 %r3940, %r3937; 2026-02-21T09:21:01.7928176Z mov.b32 %r3941, %r3937; 2026-02-21T09:21:01.7928233Z mov.b32 %r3942, %r3937; 2026-02-21T09:21:01.7928294Z mov.b32 %r3943, %r3937; 2026-02-21T09:21:01.7928351Z mov.b32 %r3944, %r3937; 2026-02-21T09:21:01.7928408Z mov.b32 %r3945, %r3937; 2026-02-21T09:21:01.7928467Z mov.b32 %r3946, %r3937; 2026-02-21T09:21:01.7928542Z mov.b32 %r3947, %r3937; 2026-02-21T09:21:01.7928603Z mov.b32 %r3948, %r3937; 2026-02-21T09:21:01.7928661Z mov.b32 %r3949, %r3937; 2026-02-21T09:21:01.7928724Z mov.b32 %r3950, %r3937; 2026-02-21T09:21:01.7928781Z mov.b32 %r3951, %r3937; 2026-02-21T09:21:01.7928839Z mov.b32 %r3952, %r3937; 2026-02-21T09:21:01.7928896Z mov.b32 %r3953, %r3937; 
2026-02-21T09:21:01.7928957Z mov.b32 %r3954, %r3937; 2026-02-21T09:21:01.7929016Z mov.b32 %r3955, %r3937; 2026-02-21T09:21:01.7929073Z mov.b32 %r3956, %r3937; 2026-02-21T09:21:01.7929134Z mov.b32 %r3957, %r3937; 2026-02-21T09:21:01.7929192Z mov.b32 %r3958, %r3937; 2026-02-21T09:21:01.7929253Z mov.b32 %r3959, %r3937; 2026-02-21T09:21:01.7929310Z mov.b32 %r3960, %r3937; 2026-02-21T09:21:01.7929371Z mov.b32 %r3961, %r3937; 2026-02-21T09:21:01.7929430Z mov.b32 %r3962, %r3937; 2026-02-21T09:21:01.7929487Z mov.b32 %r3963, %r3937; 2026-02-21T09:21:01.7929547Z mov.b32 %r3964, %r3937; 2026-02-21T09:21:01.7929609Z mov.b32 %r3965, %r3937; 2026-02-21T09:21:01.7929667Z mov.b32 %r3966, %r3937; 2026-02-21T09:21:01.7929725Z mov.b32 %r3967, %r3937; 2026-02-21T09:21:01.7929784Z mov.b32 %r3968, %r3937; 2026-02-21T09:21:01.7929904Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T09:21:01.7930012Z // => This Inner Loop Header: Depth=2 2026-02-21T09:21:01.7930081Z add.s64 %rd222, %rd222, 16; 2026-02-21T09:21:01.7930152Z setp.lt.u64 %p43, %rd222, 432; 2026-02-21T09:21:01.7930212Z add.s32 %r4079, %r5282, 1; 2026-02-21T09:21:01.7930284Z setp.gt.s32 %p44, %r4079, 4; 2026-02-21T09:21:01.7930352Z selp.b32 %r5282, 0, %r4079, %p44; 2026-02-21T09:21:01.7930562Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7930631Z cp.async.wait_group 8; 2026-02-21T09:21:01.7930690Z bar.sync 0; 2026-02-21T09:21:01.7930762Z shl.b32 %r4080, %r5282, 13; 2026-02-21T09:21:01.7930829Z add.s32 %r4082, %r5145, %r4080; 2026-02-21T09:21:01.7931034Z .loc 1 55 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:55:32 2026-02-21T09:21:01.7931247Z add.s32 %r4083, %r4082, %r31; 2026-02-21T09:21:01.7931313Z ld.shared.b16 %rs117, [%r4083]; 2026-02-21T09:21:01.7931387Z ld.shared.b16 %rs118, [%r4083+256]; 2026-02-21T09:21:01.7931454Z ld.shared.b16 %rs119, [%r4083+16]; 2026-02-21T09:21:01.7931521Z ld.shared.b16 %rs120, [%r4083+272]; 2026-02-21T09:21:01.7931583Z 
add.s32 %r4084, %r4082, %r32; 2026-02-21T09:21:01.7931661Z ld.shared.b16 %rs121, [%r4084]; 2026-02-21T09:21:01.7931731Z ld.shared.b16 %rs122, [%r4084+256]; 2026-02-21T09:21:01.7931796Z ld.shared.b16 %rs123, [%r4084+16]; 2026-02-21T09:21:01.7931864Z ld.shared.b16 %rs124, [%r4084+272]; 2026-02-21T09:21:01.7931928Z cvt.f32.bf16 %r3599, %rs117; 2026-02-21T09:21:01.7931991Z cvt.f32.bf16 %r3600, %rs118; 2026-02-21T09:21:01.7932050Z cvt.f32.bf16 %r3601, %rs121; 2026-02-21T09:21:01.7932187Z cvt.f32.bf16 %r3602, %rs122; 2026-02-21T09:21:01.7932250Z cvt.f32.bf16 %r3667, %rs119; 2026-02-21T09:21:01.7932311Z cvt.f32.bf16 %r3668, %rs120; 2026-02-21T09:21:01.7932378Z cvt.f32.bf16 %r3669, %rs123; 2026-02-21T09:21:01.7932440Z cvt.f32.bf16 %r3670, %rs124; 2026-02-21T09:21:01.7932650Z .loc 1 57 34 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:57:34 2026-02-21T09:21:01.7932712Z cvt.s64.s32 %rd174, %r5280; 2026-02-21T09:21:01.7932827Z add.s64 %rd161, %rd28, %rd174; 2026-02-21T09:21:01.7933027Z .loc 1 57 87 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:57:87 2026-02-21T09:21:01.7933087Z // begin inline asm 2026-02-21T09:21:01.7933157Z mov.u64 %rd160, 0x0; 2026-02-21T09:21:01.7933298Z createpolicy.fractional.L2::evict_first.b64 %rd160, 1.0; 2026-02-21T09:21:01.7933356Z // end inline asm 2026-02-21T09:21:01.7933418Z // begin inline asm 2026-02-21T09:21:01.7933476Z mov.u16 %rs115, 0x0; 2026-02-21T09:21:01.7933642Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs115 }, [ %rd161 + 0 ], %rd160; 2026-02-21T09:21:01.7933699Z // end inline asm 2026-02-21T09:21:01.7933905Z .loc 1 65 28 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:65:28 2026-02-21T09:21:01.7933975Z st.shared.b8 [%r33], %rs115; 2026-02-21T09:21:01.7934032Z bar.sync 0; 2026-02-21T09:21:01.7934116Z ld.shared.v2.b8 {%rs125, %rs126}, [%r34]; 2026-02-21T09:21:01.7934322Z .loc 1 60 28 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:60:28 2026-02-21T09:21:01.7934386Z shl.b16 %rs127, 
%rs125, 4; 2026-02-21T09:21:01.7934449Z shl.b16 %rs128, %rs126, 4; 2026-02-21T09:21:01.7934650Z .loc 1 75 58 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:75:58 2026-02-21T09:21:01.7934723Z selp.b16 %rs129, %rs127, %rs125, %p63; 2026-02-21T09:21:01.7934786Z cvt.s16.s8 %rs130, %rs129; 2026-02-21T09:21:01.7934848Z shr.s16 %rs131, %rs130, 4; 2026-02-21T09:21:01.7934917Z selp.b16 %rs132, %rs128, %rs126, %p63; 2026-02-21T09:21:01.7934978Z cvt.s16.s8 %rs133, %rs132; 2026-02-21T09:21:01.7935043Z shr.s16 %rs134, %rs133, 4; 2026-02-21T09:21:01.7935258Z .loc 1 80 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:80:32 2026-02-21T09:21:01.7935325Z cvt.rn.f32.s16 %r4085, %rs131; 2026-02-21T09:21:01.7935391Z cvt.rn.f32.s16 %r4086, %rs134; 2026-02-21T09:21:01.7935446Z bar.sync 0; 2026-02-21T09:21:01.7935512Z st.shared.b32 [%r35], %r4085; 2026-02-21T09:21:01.7935573Z st.shared.b32 [%r36], %r4086; 2026-02-21T09:21:01.7935716Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3937}; 2026-02-21T09:21:01.7935773Z bar.sync 0; 2026-02-21T09:21:01.7935833Z // begin inline asm 2026-02-21T09:21:01.7935990Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3467, %r3603}, [%r590]; 2026-02-21T09:21:01.7936048Z // end inline asm 2026-02-21T09:21:01.7936102Z bar.sync 0; 2026-02-21T09:21:01.7936233Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3939}; 2026-02-21T09:21:01.7936291Z bar.sync 0; 2026-02-21T09:21:01.7936349Z // begin inline asm 2026-02-21T09:21:01.7936620Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3469, %r3605}, [%r590]; 2026-02-21T09:21:01.7936828Z // end inline asm 2026-02-21T09:21:01.7936884Z bar.sync 0; 2026-02-21T09:21:01.7937025Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3938}; 2026-02-21T09:21:01.7937085Z bar.sync 0; 2026-02-21T09:21:01.7937146Z // begin inline asm 2026-02-21T09:21:01.7937297Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3468, %r3604}, [%r590]; 2026-02-21T09:21:01.7937355Z // end inline asm 
2026-02-21T09:21:01.7937413Z bar.sync 0; 2026-02-21T09:21:01.7937541Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3940}; 2026-02-21T09:21:01.7937596Z bar.sync 0; 2026-02-21T09:21:01.7937657Z // begin inline asm 2026-02-21T09:21:01.7937802Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3470, %r3606}, [%r590]; 2026-02-21T09:21:01.7937861Z // end inline asm 2026-02-21T09:21:01.7937979Z bar.sync 0; 2026-02-21T09:21:01.7938113Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3941}; 2026-02-21T09:21:01.7938169Z bar.sync 0; 2026-02-21T09:21:01.7938229Z // begin inline asm 2026-02-21T09:21:01.7938379Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3471, %r3607}, [%r590]; 2026-02-21T09:21:01.7938435Z // end inline asm 2026-02-21T09:21:01.7938490Z bar.sync 0; 2026-02-21T09:21:01.7938677Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3943}; 2026-02-21T09:21:01.7938740Z bar.sync 0; 2026-02-21T09:21:01.7938798Z // begin inline asm 2026-02-21T09:21:01.7938941Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3473, %r3609}, [%r590]; 2026-02-21T09:21:01.7939000Z // end inline asm 2026-02-21T09:21:01.7939054Z bar.sync 0; 2026-02-21T09:21:01.7939180Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3942}; 2026-02-21T09:21:01.7939237Z bar.sync 0; 2026-02-21T09:21:01.7939294Z // begin inline asm 2026-02-21T09:21:01.7939449Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3472, %r3608}, [%r590]; 2026-02-21T09:21:01.7939509Z // end inline asm 2026-02-21T09:21:01.7939568Z bar.sync 0; 2026-02-21T09:21:01.7939695Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3944}; 2026-02-21T09:21:01.7939755Z bar.sync 0; 2026-02-21T09:21:01.7939818Z // begin inline asm 2026-02-21T09:21:01.7939962Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3474, %r3610}, [%r590]; 2026-02-21T09:21:01.7940018Z // end inline asm 2026-02-21T09:21:01.7940073Z bar.sync 0; 2026-02-21T09:21:01.7940204Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3945}; 2026-02-21T09:21:01.7940258Z bar.sync 0; 
2026-02-21T09:21:01.7940316Z // begin inline asm 2026-02-21T09:21:01.7940462Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3475, %r3611}, [%r590]; 2026-02-21T09:21:01.7940518Z // end inline asm 2026-02-21T09:21:01.7940573Z bar.sync 0; 2026-02-21T09:21:01.7940701Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3947}; 2026-02-21T09:21:01.7940761Z bar.sync 0; 2026-02-21T09:21:01.7940820Z // begin inline asm 2026-02-21T09:21:01.7940963Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3477, %r3613}, [%r590]; 2026-02-21T09:21:01.7941023Z // end inline asm 2026-02-21T09:21:01.7941080Z bar.sync 0; 2026-02-21T09:21:01.7941205Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3946}; 2026-02-21T09:21:01.7941263Z bar.sync 0; 2026-02-21T09:21:01.7941320Z // begin inline asm 2026-02-21T09:21:01.7941462Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3476, %r3612}, [%r590]; 2026-02-21T09:21:01.7941518Z // end inline asm 2026-02-21T09:21:01.7941578Z bar.sync 0; 2026-02-21T09:21:01.7941707Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3948}; 2026-02-21T09:21:01.7941761Z bar.sync 0; 2026-02-21T09:21:01.7941823Z // begin inline asm 2026-02-21T09:21:01.7941970Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3478, %r3614}, [%r590]; 2026-02-21T09:21:01.7942027Z // end inline asm 2026-02-21T09:21:01.7942082Z bar.sync 0; 2026-02-21T09:21:01.7942213Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3949}; 2026-02-21T09:21:01.7942268Z bar.sync 0; 2026-02-21T09:21:01.7942327Z // begin inline asm 2026-02-21T09:21:01.7942473Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3479, %r3615}, [%r590]; 2026-02-21T09:21:01.7942633Z // end inline asm 2026-02-21T09:21:01.7942687Z bar.sync 0; 2026-02-21T09:21:01.7942815Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3951}; 2026-02-21T09:21:01.7942873Z bar.sync 0; 2026-02-21T09:21:01.7942932Z // begin inline asm 2026-02-21T09:21:01.7943098Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3481, %r3617}, [%r590]; 2026-02-21T09:21:01.7943157Z 
// end inline asm 2026-02-21T09:21:01.7943210Z bar.sync 0; 2026-02-21T09:21:01.7943335Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3950}; 2026-02-21T09:21:01.7943391Z bar.sync 0; 2026-02-21T09:21:01.7943451Z // begin inline asm 2026-02-21T09:21:01.7943595Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3480, %r3616}, [%r590]; 2026-02-21T09:21:01.7943652Z // end inline asm 2026-02-21T09:21:01.7943763Z bar.sync 0; 2026-02-21T09:21:01.7943890Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3952}; 2026-02-21T09:21:01.7943944Z bar.sync 0; 2026-02-21T09:21:01.7944008Z // begin inline asm 2026-02-21T09:21:01.7944150Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3482, %r3618}, [%r590]; 2026-02-21T09:21:01.7944206Z // end inline asm 2026-02-21T09:21:01.7944260Z bar.sync 0; 2026-02-21T09:21:01.7944432Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3953}; 2026-02-21T09:21:01.7944492Z bar.sync 0; 2026-02-21T09:21:01.7944549Z // begin inline asm 2026-02-21T09:21:01.7944695Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3483, %r3619}, [%r590]; 2026-02-21T09:21:01.7944751Z // end inline asm 2026-02-21T09:21:01.7944806Z bar.sync 0; 2026-02-21T09:21:01.7944930Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3955}; 2026-02-21T09:21:01.7944987Z bar.sync 0; 2026-02-21T09:21:01.7945045Z // begin inline asm 2026-02-21T09:21:01.7945189Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3485, %r3621}, [%r590]; 2026-02-21T09:21:01.7945250Z // end inline asm 2026-02-21T09:21:01.7945303Z bar.sync 0; 2026-02-21T09:21:01.7945429Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3954}; 2026-02-21T09:21:01.7945491Z bar.sync 0; 2026-02-21T09:21:01.7945550Z // begin inline asm 2026-02-21T09:21:01.7945692Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3484, %r3620}, [%r590]; 2026-02-21T09:21:01.7945749Z // end inline asm 2026-02-21T09:21:01.7945809Z bar.sync 0; 2026-02-21T09:21:01.7945935Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3956}; 
2026-02-21T09:21:01.7945989Z bar.sync 0; 2026-02-21T09:21:01.7946049Z // begin inline asm 2026-02-21T09:21:01.7946190Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3486, %r3622}, [%r590]; 2026-02-21T09:21:01.7946244Z // end inline asm 2026-02-21T09:21:01.7946297Z bar.sync 0; 2026-02-21T09:21:01.7946425Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3957}; 2026-02-21T09:21:01.7946604Z bar.sync 0; 2026-02-21T09:21:01.7946671Z // begin inline asm 2026-02-21T09:21:01.7946820Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3487, %r3623}, [%r590]; 2026-02-21T09:21:01.7946876Z // end inline asm 2026-02-21T09:21:01.7946933Z bar.sync 0; 2026-02-21T09:21:01.7947060Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3959}; 2026-02-21T09:21:01.7947121Z bar.sync 0; 2026-02-21T09:21:01.7947179Z // begin inline asm 2026-02-21T09:21:01.7947321Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3489, %r3625}, [%r590]; 2026-02-21T09:21:01.7947387Z // end inline asm 2026-02-21T09:21:01.7947441Z bar.sync 0; 2026-02-21T09:21:01.7947567Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3958}; 2026-02-21T09:21:01.7947623Z bar.sync 0; 2026-02-21T09:21:01.7947680Z // begin inline asm 2026-02-21T09:21:01.7947820Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3488, %r3624}, [%r590]; 2026-02-21T09:21:01.7947887Z // end inline asm 2026-02-21T09:21:01.7947944Z bar.sync 0; 2026-02-21T09:21:01.7948074Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3960}; 2026-02-21T09:21:01.7948129Z bar.sync 0; 2026-02-21T09:21:01.7948188Z // begin inline asm 2026-02-21T09:21:01.7948398Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3490, %r3626}, [%r590]; 2026-02-21T09:21:01.7948596Z // end inline asm 2026-02-21T09:21:01.7948653Z bar.sync 0; 2026-02-21T09:21:01.7948784Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3961}; 2026-02-21T09:21:01.7948838Z bar.sync 0; 2026-02-21T09:21:01.7948897Z // begin inline asm 2026-02-21T09:21:01.7949045Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3491, 
%r3627}, [%r590]; 2026-02-21T09:21:01.7949101Z // end inline asm 2026-02-21T09:21:01.7949155Z bar.sync 0; 2026-02-21T09:21:01.7949282Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3963}; 2026-02-21T09:21:01.7949341Z bar.sync 0; 2026-02-21T09:21:01.7949400Z // begin inline asm 2026-02-21T09:21:01.7949543Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3493, %r3629}, [%r590]; 2026-02-21T09:21:01.7949667Z // end inline asm 2026-02-21T09:21:01.7949725Z bar.sync 0; 2026-02-21T09:21:01.7949852Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3962}; 2026-02-21T09:21:01.7949908Z bar.sync 0; 2026-02-21T09:21:01.7949973Z // begin inline asm 2026-02-21T09:21:01.7950116Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3492, %r3628}, [%r590]; 2026-02-21T09:21:01.7950173Z // end inline asm 2026-02-21T09:21:01.7950227Z bar.sync 0; 2026-02-21T09:21:01.7950413Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3964}; 2026-02-21T09:21:01.7950471Z bar.sync 0; 2026-02-21T09:21:01.7950540Z // begin inline asm 2026-02-21T09:21:01.7950690Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3494, %r3630}, [%r590]; 2026-02-21T09:21:01.7950747Z // end inline asm 2026-02-21T09:21:01.7950802Z bar.sync 0; 2026-02-21T09:21:01.7950929Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3965}; 2026-02-21T09:21:01.7950982Z bar.sync 0; 2026-02-21T09:21:01.7951040Z // begin inline asm 2026-02-21T09:21:01.7951186Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3495, %r3631}, [%r590]; 2026-02-21T09:21:01.7951242Z // end inline asm 2026-02-21T09:21:01.7951296Z bar.sync 0; 2026-02-21T09:21:01.7951425Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3967}; 2026-02-21T09:21:01.7951485Z bar.sync 0; 2026-02-21T09:21:01.7951544Z // begin inline asm 2026-02-21T09:21:01.7951686Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3497, %r3633}, [%r590]; 2026-02-21T09:21:01.7951745Z // end inline asm 2026-02-21T09:21:01.7951799Z bar.sync 0; 2026-02-21T09:21:01.7951926Z 
stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3966}; 2026-02-21T09:21:01.7951983Z bar.sync 0; 2026-02-21T09:21:01.7952040Z // begin inline asm 2026-02-21T09:21:01.7952180Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3496, %r3632}, [%r590]; 2026-02-21T09:21:01.7952238Z // end inline asm 2026-02-21T09:21:01.7952292Z bar.sync 0; 2026-02-21T09:21:01.7952418Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1091], {%r3968}; 2026-02-21T09:21:01.7952472Z bar.sync 0; 2026-02-21T09:21:01.7952535Z // begin inline asm 2026-02-21T09:21:01.7952676Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r3498, %r3634}, [%r590]; 2026-02-21T09:21:01.7952732Z // end inline asm 2026-02-21T09:21:01.7952802Z $L__tmp13: 2026-02-21T09:21:01.7953105Z .loc 2 291 36 // standard.py:291:36 @[ ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:87:40 ] 2026-02-21T09:21:01.7953164Z // begin inline asm 2026-02-21T09:21:01.7953244Z fence.proxy.async.shared::cta; 2026-02-21T09:21:01.7953305Z // end inline asm 2026-02-21T09:21:01.7953388Z shfl.sync.idx.b32 %r4087, %r6, 0, 31, -1; 2026-02-21T09:21:01.7953460Z wgmma.fence.sync.aligned; 2026-02-21T09:21:01.7953525Z mov.pred %p36, -1; 2026-02-21T09:21:01.7953583Z // begin inline asm 2026-02-21T09:21:01.7954344Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3467,%r3468,%r3469,%r3470,%r3471,%r3472,%r3473,%r3474,%r3475,%r3476,%r3477,%r3478,%r3479,%r3480,%r3481,%r3482,%r3483,%r3484,%r3485,%r3486,%r3487,%r3488,%r3489,%r3490,%r3491,%r3492,%r3493,%r3494,%r3495,%r3496,%r3497,%r3498}, {%r3599,%r3600,%r3601,%r3602}, %rd1, %p36, 1, 1; 2026-02-21T09:21:01.7954404Z // end inline asm 2026-02-21T09:21:01.7954525Z // begin inline asm 2026-02-21T09:21:01.7955343Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3467,%r3468,%r3469,%r3470,%r3471,%r3472,%r3473,%r3474,%r3475,%r3476,%r3477,%r3478,%r3479,%r3480,%r3481,%r3482,%r3483,%r3484,%r3485,%r3486,%r3487,%r3488,%r3489,%r3490,%r3491,%r3492,%r3493,%r3494,%r3495,%r3496,%r3497,%r3498}, 
{%r3667,%r3668,%r3669,%r3670}, %rd2, %p36, 1, 1; 2026-02-21T09:21:01.7955402Z // end inline asm 2026-02-21T09:21:01.7955459Z // begin inline asm 2026-02-21T09:21:01.7956210Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3603,%r3604,%r3605,%r3606,%r3607,%r3608,%r3609,%r3610,%r3611,%r3612,%r3613,%r3614,%r3615,%r3616,%r3617,%r3618,%r3619,%r3620,%r3621,%r3622,%r3623,%r3624,%r3625,%r3626,%r3627,%r3628,%r3629,%r3630,%r3631,%r3632,%r3633,%r3634}, {%r3599,%r3600,%r3601,%r3602}, %rd3, %p36, 1, 1; 2026-02-21T09:21:01.7956318Z // end inline asm 2026-02-21T09:21:01.7956378Z // begin inline asm 2026-02-21T09:21:01.7957418Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3603,%r3604,%r3605,%r3606,%r3607,%r3608,%r3609,%r3610,%r3611,%r3612,%r3613,%r3614,%r3615,%r3616,%r3617,%r3618,%r3619,%r3620,%r3621,%r3622,%r3623,%r3624,%r3625,%r3626,%r3627,%r3628,%r3629,%r3630,%r3631,%r3632,%r3633,%r3634}, {%r3667,%r3668,%r3669,%r3670}, %rd4, %p36, 1, 1; 2026-02-21T09:21:01.7957562Z // end inline asm 2026-02-21T09:21:01.7957646Z wgmma.commit_group.sync.aligned; 2026-02-21T09:21:01.7957704Z mov.b32 %r4039, 0; 2026-02-21T09:21:01.7957770Z mov.b32 %r3735, %r491; 2026-02-21T09:21:01.7957830Z mov.b32 %r3736, %r4039; 2026-02-21T09:21:01.7957888Z mov.b32 %r3737, %r4039; 2026-02-21T09:21:01.7957957Z // begin inline asm 2026-02-21T09:21:01.7959035Z // wait for regs: %r3467,%r3468,%r3469,%r3470,%r3471,%r3472,%r3473,%r3474,%r3475,%r3476,%r3477,%r3478,%r3479,%r3480,%r3481,%r3482,%r3483,%r3484,%r3485,%r3486,%r3487,%r3488,%r3489,%r3490,%r3491,%r3492,%r3493,%r3494,%r3495,%r3496,%r3497,%r3498,%r3603,%r3604,%r3605,%r3606,%r3607,%r3608,%r3609,%r3610,%r3611,%r3612,%r3613,%r3614,%r3615,%r3616,%r3617,%r3618,%r3619,%r3620,%r3621,%r3622,%r3623,%r3624,%r3625,%r3626,%r3627,%r3628,%r3629,%r3630,%r3631,%r3632,%r3633,%r3634,%r3735,%r3736,%r3737 2026-02-21T09:21:01.7959115Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:21:01.7959171Z // end inline asm 2026-02-21T09:21:01.7959228Z $L__tmp14: 
2026-02-21T09:21:01.7959447Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7959513Z add.s32 %r4088, %r4082, 40960; 2026-02-21T09:21:01.7959720Z .loc 1 55 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:55:32 2026-02-21T09:21:01.7959793Z add.s32 %r4089, %r4088, %r31; 2026-02-21T09:21:01.7959859Z ld.shared.b16 %rs135, [%r4089]; 2026-02-21T09:21:01.7959931Z ld.shared.b16 %rs136, [%r4089+256]; 2026-02-21T09:21:01.7959998Z ld.shared.b16 %rs137, [%r4089+16]; 2026-02-21T09:21:01.7960064Z ld.shared.b16 %rs138, [%r4089+272]; 2026-02-21T09:21:01.7960126Z add.s32 %r4090, %r4088, %r32; 2026-02-21T09:21:01.7960196Z ld.shared.b16 %rs139, [%r4090]; 2026-02-21T09:21:01.7960263Z ld.shared.b16 %rs140, [%r4090+256]; 2026-02-21T09:21:01.7960328Z ld.shared.b16 %rs141, [%r4090+16]; 2026-02-21T09:21:01.7960396Z ld.shared.b16 %rs142, [%r4090+272]; 2026-02-21T09:21:01.7960462Z cvt.f32.bf16 %r3933, %rs135; 2026-02-21T09:21:01.7960523Z cvt.f32.bf16 %r3934, %rs136; 2026-02-21T09:21:01.7960586Z cvt.f32.bf16 %r3935, %rs139; 2026-02-21T09:21:01.7960646Z cvt.f32.bf16 %r3936, %rs140; 2026-02-21T09:21:01.7960706Z cvt.f32.bf16 %r4001, %rs137; 2026-02-21T09:21:01.7960766Z cvt.f32.bf16 %r4002, %rs138; 2026-02-21T09:21:01.7960829Z cvt.f32.bf16 %r4003, %rs141; 2026-02-21T09:21:01.7960890Z cvt.f32.bf16 %r4004, %rs142; 2026-02-21T09:21:01.7961098Z .loc 1 57 34 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:57:34 2026-02-21T09:21:01.7961163Z add.s32 %r4091, %r5280, 65536; 2026-02-21T09:21:01.7961227Z cvt.s64.s32 %rd175, %r4091; 2026-02-21T09:21:01.7961290Z add.s64 %rd168, %rd28, %rd175; 2026-02-21T09:21:01.7961641Z .loc 1 57 87 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:57:87 2026-02-21T09:21:01.7961704Z // begin inline asm 2026-02-21T09:21:01.7961762Z mov.u64 %rd167, 0x0; 2026-02-21T09:21:01.7961895Z createpolicy.fractional.L2::evict_first.b64 %rd167, 1.0; 2026-02-21T09:21:01.7961957Z // end inline asm 
2026-02-21T09:21:01.7962016Z // begin inline asm 2026-02-21T09:21:01.7962085Z mov.u16 %rs116, 0x0; 2026-02-21T09:21:01.7962251Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs116 }, [ %rd168 + 0 ], %rd167; 2026-02-21T09:21:01.7962308Z // end inline asm 2026-02-21T09:21:01.7962515Z .loc 1 65 28 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:65:28 2026-02-21T09:21:01.7962572Z bar.sync 0; 2026-02-21T09:21:01.7962702Z st.shared.b8 [%r33], %rs116; 2026-02-21T09:21:01.7962760Z bar.sync 0; 2026-02-21T09:21:01.7962843Z ld.shared.v2.b8 {%rs143, %rs144}, [%r34]; 2026-02-21T09:21:01.7963044Z .loc 1 60 28 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:60:28 2026-02-21T09:21:01.7963112Z shl.b16 %rs145, %rs143, 4; 2026-02-21T09:21:01.7963175Z shl.b16 %rs146, %rs144, 4; 2026-02-21T09:21:01.7963425Z .loc 1 75 58 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:75:58 2026-02-21T09:21:01.7963501Z selp.b16 %rs147, %rs145, %rs143, %p63; 2026-02-21T09:21:01.7963577Z cvt.s16.s8 %rs148, %rs147; 2026-02-21T09:21:01.7963640Z shr.s16 %rs149, %rs148, 4; 2026-02-21T09:21:01.7963709Z selp.b16 %rs150, %rs146, %rs144, %p63; 2026-02-21T09:21:01.7963770Z cvt.s16.s8 %rs151, %rs150; 2026-02-21T09:21:01.7963833Z shr.s16 %rs152, %rs151, 4; 2026-02-21T09:21:01.7964033Z .loc 1 80 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:80:32 2026-02-21T09:21:01.7964097Z cvt.rn.f32.s16 %r4092, %rs149; 2026-02-21T09:21:01.7964162Z cvt.rn.f32.s16 %r4093, %rs152; 2026-02-21T09:21:01.7964216Z bar.sync 0; 2026-02-21T09:21:01.7964282Z st.shared.b32 [%r35], %r4092; 2026-02-21T09:21:01.7964345Z st.shared.b32 [%r36], %r4093; 2026-02-21T09:21:01.7964399Z $L__tmp15: 2026-02-21T09:21:01.7964675Z .loc 2 291 36 // standard.py:291:36 @[ ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:87:40 ] 2026-02-21T09:21:01.7964834Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3467, %r3603}; 2026-02-21T09:21:01.7964893Z bar.sync 0; 2026-02-21T09:21:01.7964950Z // begin inline 
asm 2026-02-21T09:21:01.7965086Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3937}, [%r1091]; 2026-02-21T09:21:01.7965144Z // end inline asm 2026-02-21T09:21:01.7965197Z bar.sync 0; 2026-02-21T09:21:01.7965343Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3469, %r3605}; 2026-02-21T09:21:01.7965397Z bar.sync 0; 2026-02-21T09:21:01.7965459Z // begin inline asm 2026-02-21T09:21:01.7965588Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3939}, [%r1091]; 2026-02-21T09:21:01.7965642Z // end inline asm 2026-02-21T09:21:01.7965701Z bar.sync 0; 2026-02-21T09:21:01.7965844Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3468, %r3604}; 2026-02-21T09:21:01.7965897Z bar.sync 0; 2026-02-21T09:21:01.7965954Z // begin inline asm 2026-02-21T09:21:01.7966087Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3938}, [%r1091]; 2026-02-21T09:21:01.7966142Z // end inline asm 2026-02-21T09:21:01.7966195Z bar.sync 0; 2026-02-21T09:21:01.7966339Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3470, %r3606}; 2026-02-21T09:21:01.7966393Z bar.sync 0; 2026-02-21T09:21:01.7966567Z // begin inline asm 2026-02-21T09:21:01.7966705Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3940}, [%r1091]; 2026-02-21T09:21:01.7966760Z // end inline asm 2026-02-21T09:21:01.7966813Z bar.sync 0; 2026-02-21T09:21:01.7966957Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3471, %r3607}; 2026-02-21T09:21:01.7967012Z bar.sync 0; 2026-02-21T09:21:01.7967069Z // begin inline asm 2026-02-21T09:21:01.7967194Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3941}, [%r1091]; 2026-02-21T09:21:01.7967389Z // end inline asm 2026-02-21T09:21:01.7967443Z bar.sync 0; 2026-02-21T09:21:01.7967585Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3473, %r3609}; 2026-02-21T09:21:01.7967640Z bar.sync 0; 2026-02-21T09:21:01.7967711Z // begin inline asm 2026-02-21T09:21:01.7967839Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3943}, [%r1091]; 2026-02-21T09:21:01.7967894Z // end inline asm 2026-02-21T09:21:01.7967948Z 
bar.sync 0; 2026-02-21T09:21:01.7968090Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3472, %r3608}; 2026-02-21T09:21:01.7968144Z bar.sync 0; 2026-02-21T09:21:01.7968202Z // begin inline asm 2026-02-21T09:21:01.7968329Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3942}, [%r1091]; 2026-02-21T09:21:01.7968384Z // end inline asm 2026-02-21T09:21:01.7968506Z bar.sync 0; 2026-02-21T09:21:01.7968656Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3474, %r3610}; 2026-02-21T09:21:01.7968710Z bar.sync 0; 2026-02-21T09:21:01.7968771Z // begin inline asm 2026-02-21T09:21:01.7968905Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3944}, [%r1091]; 2026-02-21T09:21:01.7968961Z // end inline asm 2026-02-21T09:21:01.7969013Z bar.sync 0; 2026-02-21T09:21:01.7969216Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3475, %r3611}; 2026-02-21T09:21:01.7969275Z bar.sync 0; 2026-02-21T09:21:01.7969332Z // begin inline asm 2026-02-21T09:21:01.7969455Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3945}, [%r1091]; 2026-02-21T09:21:01.7969513Z // end inline asm 2026-02-21T09:21:01.7969565Z bar.sync 0; 2026-02-21T09:21:01.7969718Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3477, %r3613}; 2026-02-21T09:21:01.7969775Z bar.sync 0; 2026-02-21T09:21:01.7969836Z // begin inline asm 2026-02-21T09:21:01.7969961Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3947}, [%r1091]; 2026-02-21T09:21:01.7970016Z // end inline asm 2026-02-21T09:21:01.7970073Z bar.sync 0; 2026-02-21T09:21:01.7970215Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3476, %r3612}; 2026-02-21T09:21:01.7970272Z bar.sync 0; 2026-02-21T09:21:01.7970329Z // begin inline asm 2026-02-21T09:21:01.7970456Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3946}, [%r1091]; 2026-02-21T09:21:01.7970510Z // end inline asm 2026-02-21T09:21:01.7970567Z bar.sync 0; 2026-02-21T09:21:01.7970712Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3478, %r3614}; 2026-02-21T09:21:01.7970767Z bar.sync 0; 
2026-02-21T09:21:01.7970824Z // begin inline asm 2026-02-21T09:21:01.7970951Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3948}, [%r1091]; 2026-02-21T09:21:01.7971006Z // end inline asm 2026-02-21T09:21:01.7971059Z bar.sync 0; 2026-02-21T09:21:01.7971199Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3479, %r3615}; 2026-02-21T09:21:01.7971255Z bar.sync 0; 2026-02-21T09:21:01.7971313Z // begin inline asm 2026-02-21T09:21:01.7971437Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3949}, [%r1091]; 2026-02-21T09:21:01.7971494Z // end inline asm 2026-02-21T09:21:01.7971550Z bar.sync 0; 2026-02-21T09:21:01.7971693Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3481, %r3617}; 2026-02-21T09:21:01.7971746Z bar.sync 0; 2026-02-21T09:21:01.7971805Z // begin inline asm 2026-02-21T09:21:01.7971930Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3951}, [%r1091]; 2026-02-21T09:21:01.7971985Z // end inline asm 2026-02-21T09:21:01.7972040Z bar.sync 0; 2026-02-21T09:21:01.7972181Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3480, %r3616}; 2026-02-21T09:21:01.7972235Z bar.sync 0; 2026-02-21T09:21:01.7972291Z // begin inline asm 2026-02-21T09:21:01.7972419Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3950}, [%r1091]; 2026-02-21T09:21:01.7972474Z // end inline asm 2026-02-21T09:21:01.7972528Z bar.sync 0; 2026-02-21T09:21:01.7972671Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3482, %r3618}; 2026-02-21T09:21:01.7972725Z bar.sync 0; 2026-02-21T09:21:01.7972782Z // begin inline asm 2026-02-21T09:21:01.7972907Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3952}, [%r1091]; 2026-02-21T09:21:01.7973068Z // end inline asm 2026-02-21T09:21:01.7973120Z bar.sync 0; 2026-02-21T09:21:01.7973259Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3483, %r3619}; 2026-02-21T09:21:01.7973315Z bar.sync 0; 2026-02-21T09:21:01.7973372Z // begin inline asm 2026-02-21T09:21:01.7973496Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3953}, [%r1091]; 2026-02-21T09:21:01.7973553Z // end 
inline asm 2026-02-21T09:21:01.7973607Z bar.sync 0; 2026-02-21T09:21:01.7973746Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3485, %r3621}; 2026-02-21T09:21:01.7973798Z bar.sync 0; 2026-02-21T09:21:01.7973857Z // begin inline asm 2026-02-21T09:21:01.7973981Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3955}, [%r1091]; 2026-02-21T09:21:01.7974035Z // end inline asm 2026-02-21T09:21:01.7974140Z bar.sync 0; 2026-02-21T09:21:01.7974283Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3484, %r3620}; 2026-02-21T09:21:01.7974336Z bar.sync 0; 2026-02-21T09:21:01.7974395Z // begin inline asm 2026-02-21T09:21:01.7974524Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3954}, [%r1091]; 2026-02-21T09:21:01.7974579Z // end inline asm 2026-02-21T09:21:01.7974631Z bar.sync 0; 2026-02-21T09:21:01.7974832Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3486, %r3622}; 2026-02-21T09:21:01.7974889Z bar.sync 0; 2026-02-21T09:21:01.7974947Z // begin inline asm 2026-02-21T09:21:01.7975074Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3956}, [%r1091]; 2026-02-21T09:21:01.7975129Z // end inline asm 2026-02-21T09:21:01.7975182Z bar.sync 0; 2026-02-21T09:21:01.7975325Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3487, %r3623}; 2026-02-21T09:21:01.7975381Z bar.sync 0; 2026-02-21T09:21:01.7975437Z // begin inline asm 2026-02-21T09:21:01.7975563Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3957}, [%r1091]; 2026-02-21T09:21:01.7975620Z // end inline asm 2026-02-21T09:21:01.7975673Z bar.sync 0; 2026-02-21T09:21:01.7975814Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3489, %r3625}; 2026-02-21T09:21:01.7975871Z bar.sync 0; 2026-02-21T09:21:01.7975930Z // begin inline asm 2026-02-21T09:21:01.7976054Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3959}, [%r1091]; 2026-02-21T09:21:01.7976109Z // end inline asm 2026-02-21T09:21:01.7976166Z bar.sync 0; 2026-02-21T09:21:01.7976308Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3488, %r3624}; 
2026-02-21T09:21:01.7976362Z bar.sync 0; 2026-02-21T09:21:01.7976418Z // begin inline asm 2026-02-21T09:21:01.7976673Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3958}, [%r1091]; 2026-02-21T09:21:01.7976732Z // end inline asm 2026-02-21T09:21:01.7976785Z bar.sync 0; 2026-02-21T09:21:01.7976930Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3490, %r3626}; 2026-02-21T09:21:01.7976984Z bar.sync 0; 2026-02-21T09:21:01.7977044Z // begin inline asm 2026-02-21T09:21:01.7977172Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3960}, [%r1091]; 2026-02-21T09:21:01.7977228Z // end inline asm 2026-02-21T09:21:01.7977284Z bar.sync 0; 2026-02-21T09:21:01.7977425Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3491, %r3627}; 2026-02-21T09:21:01.7977481Z bar.sync 0; 2026-02-21T09:21:01.7977537Z // begin inline asm 2026-02-21T09:21:01.7977664Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3961}, [%r1091]; 2026-02-21T09:21:01.7977721Z // end inline asm 2026-02-21T09:21:01.7977773Z bar.sync 0; 2026-02-21T09:21:01.7977914Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3493, %r3629}; 2026-02-21T09:21:01.7977968Z bar.sync 0; 2026-02-21T09:21:01.7978028Z // begin inline asm 2026-02-21T09:21:01.7978152Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3963}, [%r1091]; 2026-02-21T09:21:01.7978206Z // end inline asm 2026-02-21T09:21:01.7978261Z bar.sync 0; 2026-02-21T09:21:01.7978402Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3492, %r3628}; 2026-02-21T09:21:01.7978455Z bar.sync 0; 2026-02-21T09:21:01.7978512Z // begin inline asm 2026-02-21T09:21:01.7978640Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3962}, [%r1091]; 2026-02-21T09:21:01.7978850Z // end inline asm 2026-02-21T09:21:01.7978904Z bar.sync 0; 2026-02-21T09:21:01.7979047Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3494, %r3630}; 2026-02-21T09:21:01.7979100Z bar.sync 0; 2026-02-21T09:21:01.7979159Z // begin inline asm 2026-02-21T09:21:01.7979285Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3964}, 
[%r1091]; 2026-02-21T09:21:01.7979341Z // end inline asm 2026-02-21T09:21:01.7979395Z bar.sync 0; 2026-02-21T09:21:01.7979534Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3495, %r3631}; 2026-02-21T09:21:01.7979593Z bar.sync 0; 2026-02-21T09:21:01.7979650Z // begin inline asm 2026-02-21T09:21:01.7979774Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3965}, [%r1091]; 2026-02-21T09:21:01.7979898Z // end inline asm 2026-02-21T09:21:01.7979965Z bar.sync 0; 2026-02-21T09:21:01.7980111Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3497, %r3633}; 2026-02-21T09:21:01.7980166Z bar.sync 0; 2026-02-21T09:21:01.7980230Z // begin inline asm 2026-02-21T09:21:01.7980355Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3967}, [%r1091]; 2026-02-21T09:21:01.7980410Z // end inline asm 2026-02-21T09:21:01.7980464Z bar.sync 0; 2026-02-21T09:21:01.7980668Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3496, %r3632}; 2026-02-21T09:21:01.7980725Z bar.sync 0; 2026-02-21T09:21:01.7980781Z // begin inline asm 2026-02-21T09:21:01.7980909Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3966}, [%r1091]; 2026-02-21T09:21:01.7980965Z // end inline asm 2026-02-21T09:21:01.7981018Z bar.sync 0; 2026-02-21T09:21:01.7981162Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r590], {%r3498, %r3634}; 2026-02-21T09:21:01.7981216Z bar.sync 0; 2026-02-21T09:21:01.7981273Z // begin inline asm 2026-02-21T09:21:01.7981402Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3968}, [%r1091]; 2026-02-21T09:21:01.7981457Z // end inline asm 2026-02-21T09:21:01.7981513Z // begin inline asm 2026-02-21T09:21:01.7981590Z fence.proxy.async.shared::cta; 2026-02-21T09:21:01.7981653Z // end inline asm 2026-02-21T09:21:01.7981726Z wgmma.fence.sync.aligned; 2026-02-21T09:21:01.7981787Z shl.b32 %r4094, %r4087, 8; 2026-02-21T09:21:01.7981849Z and.b32 %r4095, %r4094, 4096; 2026-02-21T09:21:01.7981913Z add.s32 %r4096, %r4095, %r491; 2026-02-21T09:21:01.7981973Z bfe.u32 %r4097, %r4096, 4, 14; 2026-02-21T09:21:01.7982037Z 
cvt.u64.u32 %rd176, %r4097; 2026-02-21T09:21:01.7982119Z or.b64 %rd170, %rd176, -9223371899382267904; 2026-02-21T09:21:01.7982176Z // begin inline asm 2026-02-21T09:21:01.7982960Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3937,%r3938,%r3939,%r3940,%r3941,%r3942,%r3943,%r3944,%r3945,%r3946,%r3947,%r3948,%r3949,%r3950,%r3951,%r3952,%r3953,%r3954,%r3955,%r3956,%r3957,%r3958,%r3959,%r3960,%r3961,%r3962,%r3963,%r3964,%r3965,%r3966,%r3967,%r3968}, {%r3933,%r3934,%r3935,%r3936}, %rd170, %p36, 1, 1; 2026-02-21T09:21:01.7983021Z // end inline asm 2026-02-21T09:21:01.7983082Z add.s32 %r4098, %r4096, 32; 2026-02-21T09:21:01.7983145Z bfe.u32 %r4099, %r4098, 4, 14; 2026-02-21T09:21:01.7983208Z cvt.u64.u32 %rd177, %r4099; 2026-02-21T09:21:01.7983285Z or.b64 %rd171, %rd177, -9223371899382267904; 2026-02-21T09:21:01.7983343Z // begin inline asm 2026-02-21T09:21:01.7984099Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3937,%r3938,%r3939,%r3940,%r3941,%r3942,%r3943,%r3944,%r3945,%r3946,%r3947,%r3948,%r3949,%r3950,%r3951,%r3952,%r3953,%r3954,%r3955,%r3956,%r3957,%r3958,%r3959,%r3960,%r3961,%r3962,%r3963,%r3964,%r3965,%r3966,%r3967,%r3968}, {%r4001,%r4002,%r4003,%r4004}, %rd171, %p36, 1, 1; 2026-02-21T09:21:01.7984155Z // end inline asm 2026-02-21T09:21:01.7984230Z wgmma.commit_group.sync.aligned; 2026-02-21T09:21:01.7984290Z mov.b32 %r4038, %r4039; 2026-02-21T09:21:01.7984350Z mov.b32 %r4037, %r491; 2026-02-21T09:21:01.7984408Z // begin inline asm 2026-02-21T09:21:01.7984967Z // wait for regs: %r3937,%r3938,%r3939,%r3940,%r3941,%r3942,%r3943,%r3944,%r3945,%r3946,%r3947,%r3948,%r3949,%r3950,%r3951,%r3952,%r3953,%r3954,%r3955,%r3956,%r3957,%r3958,%r3959,%r3960,%r3961,%r3962,%r3963,%r3964,%r3965,%r3966,%r3967,%r3968,%r4037,%r4038,%r4039 2026-02-21T09:21:01.7985146Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:21:01.7985201Z // end inline asm 2026-02-21T09:21:01.7985254Z $L__tmp16: 2026-02-21T09:21:01.7985468Z .loc 1 43 78 // 
ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:43:78 2026-02-21T09:21:01.7985529Z add.s32 %r4100, %r5283, 1; 2026-02-21T09:21:01.7985598Z setp.gt.s32 %p45, %r4100, 4; 2026-02-21T09:21:01.7985667Z selp.b32 %r5283, 0, %r4100, %p45; 2026-02-21T09:21:01.7985868Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7985930Z add.s32 %r4101, %r5281, -16; 2026-02-21T09:21:01.7986051Z mad.wide.s32 %rd172, %r4101, 2, %rd27; 2026-02-21T09:21:01.7986255Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7986319Z shl.b32 %r4102, %r5283, 13; 2026-02-21T09:21:01.7986379Z add.s32 %r4075, %r21, %r4102; 2026-02-21T09:21:01.7986446Z selp.b32 %r4076, 8, 0, %p43; 2026-02-21T09:21:01.7986623Z // begin inline asm 2026-02-21T09:21:01.7986841Z cp.async.ca.shared.global [ %r4075 + 0 ], [ %rd172 + 0 ], 0x8, %r4076; 2026-02-21T09:21:01.7986903Z // end inline asm 2026-02-21T09:21:01.7986968Z cp.async.commit_group; 2026-02-21T09:21:01.7987168Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.7987248Z mad.wide.s32 %rd173, %r5281, 2, %rd27; 2026-02-21T09:21:01.7987451Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.7987513Z add.s32 %r4077, %r22, %r4102; 2026-02-21T09:21:01.7987573Z // begin inline asm 2026-02-21T09:21:01.7987711Z cp.async.ca.shared.global [ %r4077 + 0 ], [ %rd173 + 0 ], 0x8, %r4076; 2026-02-21T09:21:01.7987766Z // end inline asm 2026-02-21T09:21:01.7987835Z cp.async.commit_group; 2026-02-21T09:21:01.7988038Z .loc 1 43 78 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:43:78 2026-02-21T09:21:01.7988099Z add.s32 %r5281, %r5281, 32; 2026-02-21T09:21:01.7988159Z add.s32 %r5280, %r5280, 131072; 2026-02-21T09:21:01.7988236Z setp.lt.u64 %p46, %rd222, 496; 2026-02-21T09:21:01.7988357Z @%p46 bra $L__BB0_9; 2026-02-21T09:21:01.7988472Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 
2026-02-21T09:21:01.7988673Z .loc 1 34 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:34:32 2026-02-21T09:21:01.7988737Z or.b32 %r4139, %r313, %r8; 2026-02-21T09:21:01.7988934Z .loc 1 36 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:36:32 2026-02-21T09:21:01.7988995Z or.b32 %r4140, %r314, %r12; 2026-02-21T09:21:01.7989058Z or.b32 %r4141, %r314, %r13; 2026-02-21T09:21:01.7989117Z or.b32 %r4142, %r314, %r14; 2026-02-21T09:21:01.7989177Z or.b32 %r4143, %r314, %r15; 2026-02-21T09:21:01.7989374Z .loc 1 43 78 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:43:78 2026-02-21T09:21:01.7989444Z cp.async.wait_group 0; 2026-02-21T09:21:01.7989499Z bar.sync 0; 2026-02-21T09:21:01.7989698Z .loc 1 90 28 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:90:28 2026-02-21T09:21:01.7989779Z cvt.rn.bf16x2.f32 %r4144, %r3938, %r3937; 2026-02-21T09:21:01.7989853Z cvt.rn.bf16x2.f32 %r4145, %r3940, %r3939; 2026-02-21T09:21:01.7989924Z cvt.rn.bf16x2.f32 %r4146, %r3942, %r3941; 2026-02-21T09:21:01.7989995Z cvt.rn.bf16x2.f32 %r4147, %r3944, %r3943; 2026-02-21T09:21:01.7990063Z cvt.rn.bf16x2.f32 %r4148, %r3946, %r3945; 2026-02-21T09:21:01.7990134Z cvt.rn.bf16x2.f32 %r4149, %r3948, %r3947; 2026-02-21T09:21:01.7990203Z cvt.rn.bf16x2.f32 %r4150, %r3950, %r3949; 2026-02-21T09:21:01.7990275Z cvt.rn.bf16x2.f32 %r4151, %r3952, %r3951; 2026-02-21T09:21:01.7990354Z cvt.rn.bf16x2.f32 %r4152, %r3954, %r3953; 2026-02-21T09:21:01.7990561Z cvt.rn.bf16x2.f32 %r4153, %r3956, %r3955; 2026-02-21T09:21:01.7990635Z cvt.rn.bf16x2.f32 %r4154, %r3958, %r3957; 2026-02-21T09:21:01.7990704Z cvt.rn.bf16x2.f32 %r4155, %r3960, %r3959; 2026-02-21T09:21:01.7990774Z cvt.rn.bf16x2.f32 %r4156, %r3962, %r3961; 2026-02-21T09:21:01.7990842Z cvt.rn.bf16x2.f32 %r4157, %r3964, %r3963; 2026-02-21T09:21:01.7990916Z cvt.rn.bf16x2.f32 %r4158, %r3966, %r3965; 2026-02-21T09:21:01.7990985Z cvt.rn.bf16x2.f32 %r4159, %r3968, %r3967; 2026-02-21T09:21:01.7991183Z .loc 1 91 43 // 
ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:91:43 2026-02-21T09:21:01.7991246Z shl.b32 %r4160, %r4140, 13; 2026-02-21T09:21:01.7991306Z shl.b32 %r4161, %r4141, 13; 2026-02-21T09:21:01.7991425Z shl.b32 %r4162, %r4142, 13; 2026-02-21T09:21:01.7991500Z shl.b32 %r4163, %r4143, 13; 2026-02-21T09:21:01.7991703Z .loc 1 91 50 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:91:50 2026-02-21T09:21:01.7991770Z add.s32 %r4164, %r4160, %r4139; 2026-02-21T09:21:01.7991831Z add.s32 %r4165, %r4161, %r4139; 2026-02-21T09:21:01.7991894Z add.s32 %r4166, %r4162, %r4139; 2026-02-21T09:21:01.7991953Z add.s32 %r4167, %r4163, %r4139; 2026-02-21T09:21:01.7992196Z .loc 1 91 22 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:91:22 2026-02-21T09:21:01.7992272Z mad.wide.s32 %rd178, %r4164, 2, %rd29; 2026-02-21T09:21:01.7992339Z mad.wide.s32 %rd179, %r4165, 2, %rd29; 2026-02-21T09:21:01.7992404Z mad.wide.s32 %rd180, %r4166, 2, %rd29; 2026-02-21T09:21:01.7992473Z mad.wide.s32 %rd181, %r4167, 2, %rd29; 2026-02-21T09:21:01.7992669Z .loc 1 91 81 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:91:81 2026-02-21T09:21:01.7992781Z st.shared.v4.b32 [%r39], {%r4144, %r4146, %r4148, %r4150}; 2026-02-21T09:21:01.7992897Z st.shared.v4.b32 [%r39+512], {%r4145, %r4147, %r4149, %r4151}; 2026-02-21T09:21:01.7993006Z st.shared.v4.b32 [%r40], {%r4152, %r4154, %r4156, %r4158}; 2026-02-21T09:21:01.7993122Z st.shared.v4.b32 [%r40+512], {%r4153, %r4155, %r4157, %r4159}; 2026-02-21T09:21:01.7993178Z bar.sync 0; 2026-02-21T09:21:01.7993238Z // begin inline asm 2026-02-21T09:21:01.7993434Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4103, %r4104, %r4105, %r4106}, [%r1392]; 2026-02-21T09:21:01.7993491Z // end inline asm 2026-02-21T09:21:01.7993551Z // begin inline asm 2026-02-21T09:21:01.7993733Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4108, %r4109, %r4110, %r4111}, [%r1397]; 2026-02-21T09:21:01.7993788Z // end inline asm 2026-02-21T09:21:01.7993845Z // begin inline asm 
2026-02-21T09:21:01.7994025Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4113, %r4114, %r4115, %r4116}, [%r1402]; 2026-02-21T09:21:01.7994081Z // end inline asm 2026-02-21T09:21:01.7994137Z // begin inline asm 2026-02-21T09:21:01.7994314Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4118, %r4119, %r4120, %r4121}, [%r1407]; 2026-02-21T09:21:01.7994373Z // end inline asm 2026-02-21T09:21:01.7994430Z // begin inline asm 2026-02-21T09:21:01.7994556Z st.global.v4.b32 [ %rd178 + 0 ], { %r4103, %r4104, %r4105, %r4106 }; 2026-02-21T09:21:01.7994611Z // end inline asm 2026-02-21T09:21:01.7994667Z // begin inline asm 2026-02-21T09:21:01.7994786Z st.global.v4.b32 [ %rd179 + 0 ], { %r4108, %r4109, %r4110, %r4111 }; 2026-02-21T09:21:01.7994856Z // end inline asm 2026-02-21T09:21:01.7994914Z // begin inline asm 2026-02-21T09:21:01.7995030Z st.global.v4.b32 [ %rd180 + 0 ], { %r4113, %r4114, %r4115, %r4116 }; 2026-02-21T09:21:01.7995087Z // end inline asm 2026-02-21T09:21:01.7995145Z // begin inline asm 2026-02-21T09:21:01.7995257Z st.global.v4.b32 [ %rd181 + 0 ], { %r4118, %r4119, %r4120, %r4121 }; 2026-02-21T09:21:01.7995312Z // end inline asm 2026-02-21T09:21:01.7995532Z .loc 1 22 121 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:22:121 2026-02-21T09:21:01.7995593Z add.s32 %r5171, %r5171, 4; 2026-02-21T09:21:01.7995757Z add.s32 %r5170, %r5170, 4; 2026-02-21T09:21:01.7995817Z add.s32 %r5169, %r5169, 4; 2026-02-21T09:21:01.7995875Z add.s32 %r5168, %r5168, 4; 2026-02-21T09:21:01.7995945Z setp.lt.s32 %p47, %r5171, %r5316; 2026-02-21T09:21:01.7996006Z @%p47 bra $L__BB0_2; 2026-02-21T09:21:01.7996098Z $L__BB0_11: // %.preheader 2026-02-21T09:21:01.7996166Z setp.gt.s32 %p48, %r5316, %r2; 2026-02-21T09:21:01.7996226Z @%p48 bra $L__BB0_16; 2026-02-21T09:21:01.7996309Z // %bb.12: // %.lr.ph54 2026-02-21T09:21:01.7996647Z .loc 1 0 121 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:0:121 2026-02-21T09:21:01.7996715Z and.b32 %r4170, %r5144, 136; 
2026-02-21T09:21:01.7996781Z xor.b32 %r4171, %r4170, %r5143; 2026-02-21T09:21:01.7996916Z add.s32 %r50, %r5145, %r4171; 2026-02-21T09:21:01.7996978Z add.s32 %r51, %r50, 40960; 2026-02-21T09:21:01.7997039Z add.s32 %r4244, %r50, 8192; 2026-02-21T09:21:01.7997104Z add.s32 %r4246, %r50, 49152; 2026-02-21T09:21:01.7997165Z add.s32 %r4248, %r50, 16384; 2026-02-21T09:21:01.7997223Z add.s32 %r4250, %r50, 57344; 2026-02-21T09:21:01.7997284Z add.s32 %r4252, %r50, 24576; 2026-02-21T09:21:01.7997341Z add.s32 %r4254, %r50, 65536; 2026-02-21T09:21:01.7997488Z add.s32 %r4256, %r50, 32768; 2026-02-21T09:21:01.7997554Z add.s32 %r4258, %r50, 73728; 2026-02-21T09:21:01.7997615Z and.b32 %r4174, %r5146, 7680; 2026-02-21T09:21:01.7997675Z or.b32 %r4177, %r4174, %r5147; 2026-02-21T09:21:01.7997733Z or.b32 %r4178, %r4177, %r5148; 2026-02-21T09:21:01.7997795Z or.b32 %r60, %r4178, %r4170; 2026-02-21T09:21:01.7997854Z xor.b32 %r61, %r60, 8; 2026-02-21T09:21:01.7997912Z and.b32 %r4180, %r5149, 124; 2026-02-21T09:21:01.7997978Z selp.b32 %r4184, 1, 0, %p62; 2026-02-21T09:21:01.7998039Z add.s32 %r4185, %r5145, 81920; 2026-02-21T09:21:01.7998100Z add.s32 %r4186, %r4185, %r5150; 2026-02-21T09:21:01.7998158Z add.s32 %r4187, %r4186, %r4184; 2026-02-21T09:21:01.7998221Z add.s32 %r4188, %r4187, %r5152; 2026-02-21T09:21:01.7998281Z add.s32 %r4189, %r4188, %r5151; 2026-02-21T09:21:01.7998342Z add.s32 %r62, %r4189, %r4180; 2026-02-21T09:21:01.7998403Z and.b32 %r4191, %r5153, 384; 2026-02-21T09:21:01.7998461Z add.s32 %r4192, %r4185, %r5151; 2026-02-21T09:21:01.7998525Z add.s32 %r4193, %r4192, %r4191; 2026-02-21T09:21:01.7998584Z add.s32 %r4194, %r4193, %r4180; 2026-02-21T09:21:01.7998645Z add.s32 %r63, %r4194, %r5152; 2026-02-21T09:21:01.7998704Z xor.b32 %r4198, %r5155, %r5156; 2026-02-21T09:21:01.7998762Z or.b32 %r4199, %r4198, %r5154; 2026-02-21T09:21:01.7998825Z add.s32 %r64, %r4185, %r4199; 2026-02-21T09:21:01.7998884Z xor.b32 %r4200, %r4199, 32; 2026-02-21T09:21:01.7998946Z add.s32 %r65, %r4185, 
%r4200; 2026-02-21T09:21:01.7999011Z or.b32 %r4203, %r5157, %r5158; 2026-02-21T09:21:01.7999083Z add.s32 %r4204, %r5145, 90112; 2026-02-21T09:21:01.7999144Z add.s32 %r4782, %r4204, %r4203; 2026-02-21T09:21:01.7999203Z and.b32 %r4205, %r5146, 112; 2026-02-21T09:21:01.7999268Z or.b32 %r4207, %r5157, %r5159; 2026-02-21T09:21:01.7999327Z and.b32 %r4208, %r4207, 1920; 2026-02-21T09:21:01.7999385Z and.b32 %r4210, %r5160, 2048; 2026-02-21T09:21:01.7999448Z add.s32 %r4211, %r4204, %r4205; 2026-02-21T09:21:01.7999508Z add.s32 %r4212, %r4211, %r4210; 2026-02-21T09:21:01.7999569Z add.s32 %r4281, %r4212, %r4208; 2026-02-21T09:21:01.7999630Z bfe.u32 %r4213, %r4185, 4, 14; 2026-02-21T09:21:01.7999696Z cvt.u64.u32 %rd182, %r4213; 2026-02-21T09:21:01.7999775Z or.b64 %rd200, %rd182, -9223371899382267904; 2026-02-21T09:21:01.7999835Z add.s32 %r4214, %r5145, 81952; 2026-02-21T09:21:01.7999898Z bfe.u32 %r4215, %r4214, 4, 14; 2026-02-21T09:21:01.7999959Z cvt.u64.u32 %rd183, %r4215; 2026-02-21T09:21:01.8000035Z or.b64 %rd201, %rd183, -9223371899382267904; 2026-02-21T09:21:01.8000095Z add.s32 %r4216, %r5145, 86016; 2026-02-21T09:21:01.8000160Z bfe.u32 %r4217, %r4216, 4, 14; 2026-02-21T09:21:01.8000223Z cvt.u64.u32 %rd184, %r4217; 2026-02-21T09:21:01.8000437Z or.b64 %rd202, %rd184, -9223371899382267904; 2026-02-21T09:21:01.8000503Z add.s32 %r4218, %r5145, 86048; 2026-02-21T09:21:01.8000562Z bfe.u32 %r4219, %r4218, 4, 14; 2026-02-21T09:21:01.8000624Z cvt.u64.u32 %rd185, %r4219; 2026-02-21T09:21:01.8000700Z or.b64 %rd203, %rd185, -9223371899382267904; 2026-02-21T09:21:01.8000760Z and.b32 %r4222, %r5162, 15456; 2026-02-21T09:21:01.8000821Z shl.b32 %r4224, %r5163, 4; 2026-02-21T09:21:01.8000878Z and.b32 %r4225, %r5149, 16; 2026-02-21T09:21:01.8000942Z shr.u32 %r4226, %r4, 3; 2026-02-21T09:21:01.8001000Z and.b32 %r4227, %r4226, 64; 2026-02-21T09:21:01.8001059Z or.b32 %r4228, %r5161, %r4225; 2026-02-21T09:21:01.8001119Z or.b32 %r4229, %r4222, %r4224; 2026-02-21T09:21:01.8001178Z xor.b32 %r4230, 
%r4229, %r4227; 2026-02-21T09:21:01.8001290Z or.b32 %r4231, %r4230, %r4228; 2026-02-21T09:21:01.8001351Z add.s32 %r68, %r5145, %r4231; 2026-02-21T09:21:01.8001410Z xor.b32 %r4232, %r4231, 32; 2026-02-21T09:21:01.8001469Z add.s32 %r69, %r5145, %r4232; 2026-02-21T09:21:01.8001531Z shl.b32 %r4233, %r5163, 11; 2026-02-21T09:21:01.8001592Z or.b32 %r4236, %r4233, %r5164; 2026-02-21T09:21:01.8001651Z xor.b32 %r4237, %r4236, %r5165; 2026-02-21T09:21:01.8001710Z add.s32 %r5082, %r5145, %r4237; 2026-02-21T09:21:01.8001826Z add.s32 %r5087, %r5082, 4096; 2026-02-21T09:21:01.8001892Z add.s32 %r5092, %r5082, 8192; 2026-02-21T09:21:01.8001952Z add.s32 %r5097, %r5082, 12288; 2026-02-21T09:21:01.8002172Z .loc 1 22 121 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:22:121 2026-02-21T09:21:01.8002238Z add.s64 %rd9, %rd27, 320; 2026-02-21T09:21:01.8002443Z .loc 1 43 78 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:43:78 2026-02-21T09:21:01.8002503Z or.b32 %r74, %r5166, %r19; 2026-02-21T09:21:01.8002568Z or.b32 %r75, %r74, 176; 2026-02-21T09:21:01.8002635Z or.b32 %r76, %r5167, %r9; 2026-02-21T09:21:01.8002747Z $L__BB0_13: // =>This Loop Header: Depth=1 2026-02-21T09:21:01.8002847Z // Child Loop BB0_14 Depth 2 2026-02-21T09:21:01.8003052Z .loc 1 28 35 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:28:35 2026-02-21T09:21:01.8003113Z shr.s32 %r4263, %r5316, 31; 2026-02-21T09:21:01.8003172Z shr.u32 %r4264, %r4263, 23; 2026-02-21T09:21:01.8003235Z add.s32 %r4265, %r5316, %r4264; 2026-02-21T09:21:01.8003433Z .loc 1 31 45 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:31:45 2026-02-21T09:21:01.8003495Z and.b32 %r4266, %r4265, 65024; 2026-02-21T09:21:01.8003558Z sub.s32 %r4267, %r5316, %r4266; 2026-02-21T09:21:01.8003754Z .loc 1 31 64 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:31:64 2026-02-21T09:21:01.8003817Z cvt.u16.u32 %rs153, %r4267; 2026-02-21T09:21:01.8004022Z .loc 1 32 51 // 
ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:32:51 2026-02-21T09:21:01.8004086Z shr.s16 %rs154, %rs153, 15; 2026-02-21T09:21:01.8004146Z shr.u16 %rs155, %rs154, 13; 2026-02-21T09:21:01.8004207Z add.s16 %rs156, %rs153, %rs155; 2026-02-21T09:21:01.8004270Z shr.s16 %rs157, %rs156, 3; 2026-02-21T09:21:01.8004467Z .loc 1 31 64 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:31:64 2026-02-21T09:21:01.8004530Z and.b16 %rs158, %rs156, -8; 2026-02-21T09:21:01.8004592Z sub.s16 %rs159, %rs153, %rs158; 2026-02-21T09:21:01.8004798Z .loc 1 32 51 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:32:51 2026-02-21T09:21:01.8004859Z cvt.u32.u16 %r4268, %rs157; 2026-02-21T09:21:01.8005058Z .loc 1 33 27 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:33:27 2026-02-21T09:21:01.8005119Z shl.b32 %r4269, %r4265, 1; 2026-02-21T09:21:01.8005180Z and.b32 %r4270, %r4269, -1024; 2026-02-21T09:21:01.8005246Z mul.wide.s16 %r4271, %rs159, 128; 2026-02-21T09:21:01.8005310Z add.s32 %r394, %r4271, %r4270; 2026-02-21T09:21:01.8005616Z .loc 1 35 27 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:35:27 2026-02-21T09:21:01.8005685Z mul.wide.s16 %r395, %rs157, 256; 2026-02-21T09:21:01.8005885Z .loc 1 36 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:36:32 2026-02-21T09:21:01.8005945Z or.b32 %r4272, %r395, %r11; 2026-02-21T09:21:01.8006140Z .loc 1 51 53 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:53 2026-02-21T09:21:01.8006202Z shl.b32 %r4273, %r4272, 10; 2026-02-21T09:21:01.8006396Z .loc 1 51 60 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:60 2026-02-21T09:21:01.8006577Z or.b32 %r4274, %r4273, %r19; 2026-02-21T09:21:01.8006857Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.8006935Z mad.wide.s32 %rd186, %r4274, 2, %rd27; 2026-02-21T09:21:01.8007133Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 
2026-02-21T09:21:01.8007191Z bar.sync 0; 2026-02-21T09:21:01.8007250Z mov.b32 %r4241, 8; 2026-02-21T09:21:01.8007309Z // begin inline asm 2026-02-21T09:21:01.8007508Z cp.async.ca.shared.global [ %r50 + 0 ], [ %rd186 + 0 ], 0x8, %r4241; 2026-02-21T09:21:01.8007571Z // end inline asm 2026-02-21T09:21:01.8007637Z cp.async.commit_group; 2026-02-21T09:21:01.8007838Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.8007899Z add.s64 %rd187, %rd186, 32; 2026-02-21T09:21:01.8008098Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.8008157Z // begin inline asm 2026-02-21T09:21:01.8008290Z cp.async.ca.shared.global [ %r51 + 0 ], [ %rd187 + 0 ], 0x8, %r4241; 2026-02-21T09:21:01.8008347Z // end inline asm 2026-02-21T09:21:01.8008411Z cp.async.commit_group; 2026-02-21T09:21:01.8008612Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.8008675Z add.s64 %rd188, %rd186, 64; 2026-02-21T09:21:01.8008872Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.8008927Z bar.sync 0; 2026-02-21T09:21:01.8008996Z // begin inline asm 2026-02-21T09:21:01.8009142Z cp.async.ca.shared.global [ %r4244 + 0 ], [ %rd188 + 0 ], 0x8, %r4241; 2026-02-21T09:21:01.8009199Z // end inline asm 2026-02-21T09:21:01.8009262Z cp.async.commit_group; 2026-02-21T09:21:01.8009462Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.8009524Z add.s64 %rd189, %rd186, 96; 2026-02-21T09:21:01.8009721Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.8009782Z // begin inline asm 2026-02-21T09:21:01.8009919Z cp.async.ca.shared.global [ %r4246 + 0 ], [ %rd189 + 0 ], 0x8, %r4241; 2026-02-21T09:21:01.8009978Z // end inline asm 2026-02-21T09:21:01.8010042Z cp.async.commit_group; 2026-02-21T09:21:01.8010242Z .loc 1 51 32 // 
ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.8010305Z add.s64 %rd190, %rd186, 128; 2026-02-21T09:21:01.8010513Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.8010572Z bar.sync 0; 2026-02-21T09:21:01.8010634Z // begin inline asm 2026-02-21T09:21:01.8010768Z cp.async.ca.shared.global [ %r4248 + 0 ], [ %rd190 + 0 ], 0x8, %r4241; 2026-02-21T09:21:01.8010826Z // end inline asm 2026-02-21T09:21:01.8010889Z cp.async.commit_group; 2026-02-21T09:21:01.8011087Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.8011150Z add.s64 %rd191, %rd186, 160; 2026-02-21T09:21:01.8011349Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.8011536Z // begin inline asm 2026-02-21T09:21:01.8011669Z cp.async.ca.shared.global [ %r4250 + 0 ], [ %rd191 + 0 ], 0x8, %r4241; 2026-02-21T09:21:01.8011729Z // end inline asm 2026-02-21T09:21:01.8011794Z cp.async.commit_group; 2026-02-21T09:21:01.8011992Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.8012054Z add.s64 %rd192, %rd186, 192; 2026-02-21T09:21:01.8012250Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.8012304Z bar.sync 0; 2026-02-21T09:21:01.8012360Z // begin inline asm 2026-02-21T09:21:01.8012551Z cp.async.ca.shared.global [ %r4252 + 0 ], [ %rd192 + 0 ], 0x8, %r4241; 2026-02-21T09:21:01.8012609Z // end inline asm 2026-02-21T09:21:01.8012673Z cp.async.commit_group; 2026-02-21T09:21:01.8012872Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.8012938Z add.s64 %rd193, %rd186, 224; 2026-02-21T09:21:01.8013140Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.8013250Z // begin inline asm 2026-02-21T09:21:01.8013384Z cp.async.ca.shared.global [ %r4254 + 0 ], [ %rd193 + 
0 ], 0x8, %r4241; 2026-02-21T09:21:01.8013439Z // end inline asm 2026-02-21T09:21:01.8013505Z cp.async.commit_group; 2026-02-21T09:21:01.8013703Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.8013767Z add.s64 %rd194, %rd186, 256; 2026-02-21T09:21:01.8013973Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.8014034Z bar.sync 0; 2026-02-21T09:21:01.8014092Z // begin inline asm 2026-02-21T09:21:01.8014226Z cp.async.ca.shared.global [ %r4256 + 0 ], [ %rd194 + 0 ], 0x8, %r4241; 2026-02-21T09:21:01.8014287Z // end inline asm 2026-02-21T09:21:01.8014351Z cp.async.commit_group; 2026-02-21T09:21:01.8014547Z .loc 1 51 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:32 2026-02-21T09:21:01.8014610Z add.s64 %rd195, %rd186, 288; 2026-02-21T09:21:01.8014807Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.8014864Z // begin inline asm 2026-02-21T09:21:01.8014995Z cp.async.ca.shared.global [ %r4258 + 0 ], [ %rd195 + 0 ], 0x8, %r4241; 2026-02-21T09:21:01.8015053Z // end inline asm 2026-02-21T09:21:01.8015116Z cp.async.commit_group; 2026-02-21T09:21:01.8015312Z .loc 1 43 78 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:43:78 2026-02-21T09:21:01.8015375Z shl.b32 %r4275, %r4268, 18; 2026-02-21T09:21:01.8015433Z or.b32 %r4276, %r74, %r4275; 2026-02-21T09:21:01.8015503Z mad.wide.s32 %rd224, %r4276, 2, %rd9; 2026-02-21T09:21:01.8015575Z or.b32 %r4277, %r75, %r4275; 2026-02-21T09:21:01.8015650Z mad.wide.s32 %rd223, %r4277, 2, %rd27; 2026-02-21T09:21:01.8015712Z add.s32 %r4278, %r76, %r4270; 2026-02-21T09:21:01.8015774Z add.s32 %r5317, %r4278, %r4271; 2026-02-21T09:21:01.8015835Z mov.b32 %r4913, 0f00000000; 2026-02-21T09:21:01.8015892Z mov.b32 %r5319, 4; 2026-02-21T09:21:01.8015951Z mov.b32 %r5318, -1; 2026-02-21T09:21:01.8016010Z mov.b64 %rd225, -16; 2026-02-21T09:21:01.8016071Z mov.b32 %r4914, %r4913; 
2026-02-21T09:21:01.8016129Z mov.b32 %r4915, %r4913; 2026-02-21T09:21:01.8016186Z mov.b32 %r4916, %r4913; 2026-02-21T09:21:01.8016245Z mov.b32 %r4917, %r4913; 2026-02-21T09:21:01.8016302Z mov.b32 %r4918, %r4913; 2026-02-21T09:21:01.8016358Z mov.b32 %r4919, %r4913; 2026-02-21T09:21:01.8016417Z mov.b32 %r4920, %r4913; 2026-02-21T09:21:01.8016600Z mov.b32 %r4921, %r4913; 2026-02-21T09:21:01.8016663Z mov.b32 %r4922, %r4913; 2026-02-21T09:21:01.8016722Z mov.b32 %r4923, %r4913; 2026-02-21T09:21:01.8016863Z mov.b32 %r4924, %r4913; 2026-02-21T09:21:01.8016982Z mov.b32 %r4925, %r4913; 2026-02-21T09:21:01.8017040Z mov.b32 %r4926, %r4913; 2026-02-21T09:21:01.8017099Z mov.b32 %r4927, %r4913; 2026-02-21T09:21:01.8017156Z mov.b32 %r4928, %r4913; 2026-02-21T09:21:01.8017215Z mov.b32 %r4929, %r4913; 2026-02-21T09:21:01.8017272Z mov.b32 %r4930, %r4913; 2026-02-21T09:21:01.8017332Z mov.b32 %r4931, %r4913; 2026-02-21T09:21:01.8017388Z mov.b32 %r4932, %r4913; 2026-02-21T09:21:01.8017446Z mov.b32 %r4933, %r4913; 2026-02-21T09:21:01.8017505Z mov.b32 %r4934, %r4913; 2026-02-21T09:21:01.8017563Z mov.b32 %r4935, %r4913; 2026-02-21T09:21:01.8017620Z mov.b32 %r4936, %r4913; 2026-02-21T09:21:01.8017675Z mov.b32 %r4937, %r4913; 2026-02-21T09:21:01.8017734Z mov.b32 %r4938, %r4913; 2026-02-21T09:21:01.8017876Z mov.b32 %r4939, %r4913; 2026-02-21T09:21:01.8017938Z mov.b32 %r4940, %r4913; 2026-02-21T09:21:01.8017995Z mov.b32 %r4941, %r4913; 2026-02-21T09:21:01.8018052Z mov.b32 %r4942, %r4913; 2026-02-21T09:21:01.8018113Z mov.b32 %r4943, %r4913; 2026-02-21T09:21:01.8018169Z mov.b32 %r4944, %r4913; 2026-02-21T09:21:01.8018280Z $L__BB0_14: // Parent Loop BB0_13 Depth=1 2026-02-21T09:21:01.8018457Z // => This Inner Loop Header: Depth=2 2026-02-21T09:21:01.8018522Z add.s64 %rd225, %rd225, 16; 2026-02-21T09:21:01.8018593Z setp.lt.u64 %p57, %rd225, 432; 2026-02-21T09:21:01.8018652Z add.s32 %r5055, %r5318, 1; 2026-02-21T09:21:01.8018716Z setp.gt.s32 %p58, %r5055, 4; 2026-02-21T09:21:01.8018787Z selp.b32 
%r5318, 0, %r5055, %p58; 2026-02-21T09:21:01.8018990Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.8019057Z cp.async.wait_group 8; 2026-02-21T09:21:01.8019115Z bar.sync 0; 2026-02-21T09:21:01.8019177Z shl.b32 %r5056, %r5318, 13; 2026-02-21T09:21:01.8019237Z add.s32 %r5058, %r5145, %r5056; 2026-02-21T09:21:01.8019436Z .loc 1 55 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:55:32 2026-02-21T09:21:01.8019502Z add.s32 %r5059, %r5058, %r60; 2026-02-21T09:21:01.8019567Z ld.shared.b16 %rs162, [%r5059]; 2026-02-21T09:21:01.8019635Z ld.shared.b16 %rs163, [%r5059+256]; 2026-02-21T09:21:01.8019702Z ld.shared.b16 %rs164, [%r5059+16]; 2026-02-21T09:21:01.8019771Z ld.shared.b16 %rs165, [%r5059+272]; 2026-02-21T09:21:01.8019833Z add.s32 %r5060, %r5058, %r61; 2026-02-21T09:21:01.8019896Z ld.shared.b16 %rs166, [%r5060]; 2026-02-21T09:21:01.8019962Z ld.shared.b16 %rs167, [%r5060+256]; 2026-02-21T09:21:01.8020026Z ld.shared.b16 %rs168, [%r5060+16]; 2026-02-21T09:21:01.8020091Z ld.shared.b16 %rs169, [%r5060+272]; 2026-02-21T09:21:01.8020156Z cvt.f32.bf16 %r4575, %rs162; 2026-02-21T09:21:01.8020217Z cvt.f32.bf16 %r4576, %rs163; 2026-02-21T09:21:01.8020277Z cvt.f32.bf16 %r4577, %rs166; 2026-02-21T09:21:01.8020336Z cvt.f32.bf16 %r4578, %rs167; 2026-02-21T09:21:01.8020399Z cvt.f32.bf16 %r4643, %rs164; 2026-02-21T09:21:01.8020461Z cvt.f32.bf16 %r4644, %rs165; 2026-02-21T09:21:01.8020521Z cvt.f32.bf16 %r4645, %rs168; 2026-02-21T09:21:01.8020582Z cvt.f32.bf16 %r4646, %rs169; 2026-02-21T09:21:01.8020794Z .loc 1 57 34 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:57:34 2026-02-21T09:21:01.8020858Z cvt.s64.s32 %rd211, %r5317; 2026-02-21T09:21:01.8020921Z add.s64 %rd198, %rd28, %rd211; 2026-02-21T09:21:01.8021125Z .loc 1 57 87 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:57:87 2026-02-21T09:21:01.8021183Z // begin inline asm 2026-02-21T09:21:01.8021241Z mov.u64 %rd197, 0x0; 
2026-02-21T09:21:01.8021371Z createpolicy.fractional.L2::evict_first.b64 %rd197, 1.0; 2026-02-21T09:21:01.8021427Z // end inline asm 2026-02-21T09:21:01.8021486Z // begin inline asm 2026-02-21T09:21:01.8021547Z mov.u16 %rs160, 0x0; 2026-02-21T09:21:01.8021710Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs160 }, [ %rd198 + 0 ], %rd197; 2026-02-21T09:21:01.8021878Z // end inline asm 2026-02-21T09:21:01.8022077Z .loc 1 65 28 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:65:28 2026-02-21T09:21:01.8022142Z st.shared.b8 [%r62], %rs160; 2026-02-21T09:21:01.8022197Z bar.sync 0; 2026-02-21T09:21:01.8022279Z ld.shared.v2.b8 {%rs170, %rs171}, [%r63]; 2026-02-21T09:21:01.8022495Z .loc 1 60 28 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:60:28 2026-02-21T09:21:01.8022556Z shl.b16 %rs172, %rs170, 4; 2026-02-21T09:21:01.8022616Z shl.b16 %rs173, %rs171, 4; 2026-02-21T09:21:01.8022824Z .loc 1 75 58 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:75:58 2026-02-21T09:21:01.8022896Z selp.b16 %rs174, %rs172, %rs170, %p63; 2026-02-21T09:21:01.8023007Z cvt.s16.s8 %rs175, %rs174; 2026-02-21T09:21:01.8023068Z shr.s16 %rs176, %rs175, 4; 2026-02-21T09:21:01.8023139Z selp.b16 %rs177, %rs173, %rs171, %p63; 2026-02-21T09:21:01.8023198Z cvt.s16.s8 %rs178, %rs177; 2026-02-21T09:21:01.8023261Z shr.s16 %rs179, %rs178, 4; 2026-02-21T09:21:01.8023474Z .loc 1 80 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:80:32 2026-02-21T09:21:01.8023542Z cvt.rn.f32.s16 %r5061, %rs176; 2026-02-21T09:21:01.8023649Z cvt.rn.f32.s16 %r5062, %rs179; 2026-02-21T09:21:01.8023709Z bar.sync 0; 2026-02-21T09:21:01.8023770Z st.shared.b32 [%r64], %r5061; 2026-02-21T09:21:01.8023832Z st.shared.b32 [%r65], %r5062; 2026-02-21T09:21:01.8023970Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4913}; 2026-02-21T09:21:01.8024027Z bar.sync 0; 2026-02-21T09:21:01.8024085Z // begin inline asm 2026-02-21T09:21:01.8024240Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4443, 
%r4579}, [%r4281]; 2026-02-21T09:21:01.8024298Z // end inline asm 2026-02-21T09:21:01.8024354Z bar.sync 0; 2026-02-21T09:21:01.8024485Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4915}; 2026-02-21T09:21:01.8024541Z bar.sync 0; 2026-02-21T09:21:01.8024604Z // begin inline asm 2026-02-21T09:21:01.8024753Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4445, %r4581}, [%r4281]; 2026-02-21T09:21:01.8024808Z // end inline asm 2026-02-21T09:21:01.8024864Z bar.sync 0; 2026-02-21T09:21:01.8024992Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4914}; 2026-02-21T09:21:01.8025047Z bar.sync 0; 2026-02-21T09:21:01.8025107Z // begin inline asm 2026-02-21T09:21:01.8025252Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4444, %r4580}, [%r4281]; 2026-02-21T09:21:01.8025307Z // end inline asm 2026-02-21T09:21:01.8025361Z bar.sync 0; 2026-02-21T09:21:01.8025491Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4916}; 2026-02-21T09:21:01.8025546Z bar.sync 0; 2026-02-21T09:21:01.8025605Z // begin inline asm 2026-02-21T09:21:01.8025751Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4446, %r4582}, [%r4281]; 2026-02-21T09:21:01.8025806Z // end inline asm 2026-02-21T09:21:01.8025859Z bar.sync 0; 2026-02-21T09:21:01.8025984Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4917}; 2026-02-21T09:21:01.8026045Z bar.sync 0; 2026-02-21T09:21:01.8026103Z // begin inline asm 2026-02-21T09:21:01.8026246Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4447, %r4583}, [%r4281]; 2026-02-21T09:21:01.8026308Z // end inline asm 2026-02-21T09:21:01.8026364Z bar.sync 0; 2026-02-21T09:21:01.8026613Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4919}; 2026-02-21T09:21:01.8026674Z bar.sync 0; 2026-02-21T09:21:01.8026737Z // begin inline asm 2026-02-21T09:21:01.8026881Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4449, %r4585}, [%r4281]; 2026-02-21T09:21:01.8026936Z // end inline asm 2026-02-21T09:21:01.8026992Z bar.sync 0; 2026-02-21T09:21:01.8027116Z 
stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4918}; 2026-02-21T09:21:01.8027169Z bar.sync 0; 2026-02-21T09:21:01.8027229Z // begin inline asm 2026-02-21T09:21:01.8027373Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4448, %r4584}, [%r4281]; 2026-02-21T09:21:01.8027429Z // end inline asm 2026-02-21T09:21:01.8027621Z bar.sync 0; 2026-02-21T09:21:01.8027750Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4920}; 2026-02-21T09:21:01.8027803Z bar.sync 0; 2026-02-21T09:21:01.8027861Z // begin inline asm 2026-02-21T09:21:01.8028009Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4450, %r4586}, [%r4281]; 2026-02-21T09:21:01.8028065Z // end inline asm 2026-02-21T09:21:01.8028120Z bar.sync 0; 2026-02-21T09:21:01.8028245Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4921}; 2026-02-21T09:21:01.8028371Z bar.sync 0; 2026-02-21T09:21:01.8028436Z // begin inline asm 2026-02-21T09:21:01.8028581Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4451, %r4587}, [%r4281]; 2026-02-21T09:21:01.8028638Z // end inline asm 2026-02-21T09:21:01.8028691Z bar.sync 0; 2026-02-21T09:21:01.8028886Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4923}; 2026-02-21T09:21:01.8028947Z bar.sync 0; 2026-02-21T09:21:01.8029010Z // begin inline asm 2026-02-21T09:21:01.8029155Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4453, %r4589}, [%r4281]; 2026-02-21T09:21:01.8029217Z // end inline asm 2026-02-21T09:21:01.8029275Z bar.sync 0; 2026-02-21T09:21:01.8029402Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4922}; 2026-02-21T09:21:01.8029456Z bar.sync 0; 2026-02-21T09:21:01.8029580Z // begin inline asm 2026-02-21T09:21:01.8029730Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4452, %r4588}, [%r4281]; 2026-02-21T09:21:01.8029787Z // end inline asm 2026-02-21T09:21:01.8029841Z bar.sync 0; 2026-02-21T09:21:01.8029971Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4924}; 2026-02-21T09:21:01.8030025Z bar.sync 0; 2026-02-21T09:21:01.8030082Z // begin inline asm 
2026-02-21T09:21:01.8030239Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4454, %r4590}, [%r4281]; 2026-02-21T09:21:01.8030299Z // end inline asm 2026-02-21T09:21:01.8030352Z bar.sync 0; 2026-02-21T09:21:01.8030478Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4925}; 2026-02-21T09:21:01.8030535Z bar.sync 0; 2026-02-21T09:21:01.8030594Z // begin inline asm 2026-02-21T09:21:01.8030738Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4455, %r4591}, [%r4281]; 2026-02-21T09:21:01.8030795Z // end inline asm 2026-02-21T09:21:01.8030849Z bar.sync 0; 2026-02-21T09:21:01.8030973Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4927}; 2026-02-21T09:21:01.8031026Z bar.sync 0; 2026-02-21T09:21:01.8031086Z // begin inline asm 2026-02-21T09:21:01.8031231Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4457, %r4593}, [%r4281]; 2026-02-21T09:21:01.8031286Z // end inline asm 2026-02-21T09:21:01.8031343Z bar.sync 0; 2026-02-21T09:21:01.8031468Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4926}; 2026-02-21T09:21:01.8031521Z bar.sync 0; 2026-02-21T09:21:01.8031581Z // begin inline asm 2026-02-21T09:21:01.8031727Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4456, %r4592}, [%r4281]; 2026-02-21T09:21:01.8031783Z // end inline asm 2026-02-21T09:21:01.8031836Z bar.sync 0; 2026-02-21T09:21:01.8031964Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4928}; 2026-02-21T09:21:01.8032022Z bar.sync 0; 2026-02-21T09:21:01.8032079Z // begin inline asm 2026-02-21T09:21:01.8032224Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4458, %r4594}, [%r4281]; 2026-02-21T09:21:01.8032281Z // end inline asm 2026-02-21T09:21:01.8032336Z bar.sync 0; 2026-02-21T09:21:01.8032460Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4929}; 2026-02-21T09:21:01.8032518Z bar.sync 0; 2026-02-21T09:21:01.8032576Z // begin inline asm 2026-02-21T09:21:01.8032719Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4459, %r4595}, [%r4281]; 2026-02-21T09:21:01.8032776Z // end inline asm 
2026-02-21T09:21:01.8032830Z bar.sync 0; 2026-02-21T09:21:01.8032956Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4931}; 2026-02-21T09:21:01.8033011Z bar.sync 0; 2026-02-21T09:21:01.8033070Z // begin inline asm 2026-02-21T09:21:01.8033214Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4461, %r4597}, [%r4281]; 2026-02-21T09:21:01.8033269Z // end inline asm 2026-02-21T09:21:01.8033427Z bar.sync 0; 2026-02-21T09:21:01.8033553Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4930}; 2026-02-21T09:21:01.8033606Z bar.sync 0; 2026-02-21T09:21:01.8033665Z // begin inline asm 2026-02-21T09:21:01.8033811Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4460, %r4596}, [%r4281]; 2026-02-21T09:21:01.8033866Z // end inline asm 2026-02-21T09:21:01.8033919Z bar.sync 0; 2026-02-21T09:21:01.8034047Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4932}; 2026-02-21T09:21:01.8034101Z bar.sync 0; 2026-02-21T09:21:01.8034157Z // begin inline asm 2026-02-21T09:21:01.8034303Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4462, %r4598}, [%r4281]; 2026-02-21T09:21:01.8034357Z // end inline asm 2026-02-21T09:21:01.8034412Z bar.sync 0; 2026-02-21T09:21:01.8034611Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4933}; 2026-02-21T09:21:01.8034674Z bar.sync 0; 2026-02-21T09:21:01.8034731Z // begin inline asm 2026-02-21T09:21:01.8034882Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4463, %r4599}, [%r4281]; 2026-02-21T09:21:01.8034945Z // end inline asm 2026-02-21T09:21:01.8034998Z bar.sync 0; 2026-02-21T09:21:01.8035130Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4935}; 2026-02-21T09:21:01.8035194Z bar.sync 0; 2026-02-21T09:21:01.8035301Z // begin inline asm 2026-02-21T09:21:01.8035454Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4465, %r4601}, [%r4281]; 2026-02-21T09:21:01.8035515Z // end inline asm 2026-02-21T09:21:01.8035573Z bar.sync 0; 2026-02-21T09:21:01.8035705Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4934}; 2026-02-21T09:21:01.8035760Z bar.sync 
0; 2026-02-21T09:21:01.8035822Z // begin inline asm 2026-02-21T09:21:01.8035969Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4464, %r4600}, [%r4281]; 2026-02-21T09:21:01.8036026Z // end inline asm 2026-02-21T09:21:01.8036080Z bar.sync 0; 2026-02-21T09:21:01.8036210Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4936}; 2026-02-21T09:21:01.8036267Z bar.sync 0; 2026-02-21T09:21:01.8036327Z // begin inline asm 2026-02-21T09:21:01.8036599Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4466, %r4602}, [%r4281]; 2026-02-21T09:21:01.8036660Z // end inline asm 2026-02-21T09:21:01.8036713Z bar.sync 0; 2026-02-21T09:21:01.8036844Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4937}; 2026-02-21T09:21:01.8036901Z bar.sync 0; 2026-02-21T09:21:01.8036960Z // begin inline asm 2026-02-21T09:21:01.8037105Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4467, %r4603}, [%r4281]; 2026-02-21T09:21:01.8037166Z // end inline asm 2026-02-21T09:21:01.8037219Z bar.sync 0; 2026-02-21T09:21:01.8037345Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4939}; 2026-02-21T09:21:01.8037400Z bar.sync 0; 2026-02-21T09:21:01.8037458Z // begin inline asm 2026-02-21T09:21:01.8037602Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4469, %r4605}, [%r4281]; 2026-02-21T09:21:01.8037657Z // end inline asm 2026-02-21T09:21:01.8037713Z bar.sync 0; 2026-02-21T09:21:01.8037842Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4938}; 2026-02-21T09:21:01.8037898Z bar.sync 0; 2026-02-21T09:21:01.8037957Z // begin inline asm 2026-02-21T09:21:01.8038100Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4468, %r4604}, [%r4281]; 2026-02-21T09:21:01.8038157Z // end inline asm 2026-02-21T09:21:01.8038211Z bar.sync 0; 2026-02-21T09:21:01.8038351Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4940}; 2026-02-21T09:21:01.8038407Z bar.sync 0; 2026-02-21T09:21:01.8038466Z // begin inline asm 2026-02-21T09:21:01.8038614Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4470, %r4606}, [%r4281]; 
2026-02-21T09:21:01.8038669Z // end inline asm 2026-02-21T09:21:01.8038722Z bar.sync 0; 2026-02-21T09:21:01.8038848Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4941}; 2026-02-21T09:21:01.8038906Z bar.sync 0; 2026-02-21T09:21:01.8038963Z // begin inline asm 2026-02-21T09:21:01.8039108Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4471, %r4607}, [%r4281]; 2026-02-21T09:21:01.8039323Z // end inline asm 2026-02-21T09:21:01.8039377Z bar.sync 0; 2026-02-21T09:21:01.8039504Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4943}; 2026-02-21T09:21:01.8039559Z bar.sync 0; 2026-02-21T09:21:01.8039616Z // begin inline asm 2026-02-21T09:21:01.8039761Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4473, %r4609}, [%r4281]; 2026-02-21T09:21:01.8039817Z // end inline asm 2026-02-21T09:21:01.8039886Z bar.sync 0; 2026-02-21T09:21:01.8040014Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4942}; 2026-02-21T09:21:01.8040067Z bar.sync 0; 2026-02-21T09:21:01.8040125Z // begin inline asm 2026-02-21T09:21:01.8040268Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4472, %r4608}, [%r4281]; 2026-02-21T09:21:01.8040325Z // end inline asm 2026-02-21T09:21:01.8040381Z bar.sync 0; 2026-02-21T09:21:01.8040579Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4782], {%r4944}; 2026-02-21T09:21:01.8040636Z bar.sync 0; 2026-02-21T09:21:01.8040694Z // begin inline asm 2026-02-21T09:21:01.8040844Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r4474, %r4610}, [%r4281]; 2026-02-21T09:21:01.8040901Z // end inline asm 2026-02-21T09:21:01.8040955Z $L__tmp17: 2026-02-21T09:21:01.8041307Z .loc 2 291 36 // standard.py:291:36 @[ ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:87:40 ] 2026-02-21T09:21:01.8041369Z // begin inline asm 2026-02-21T09:21:01.8041448Z fence.proxy.async.shared::cta; 2026-02-21T09:21:01.8041504Z // end inline asm 2026-02-21T09:21:01.8041589Z shfl.sync.idx.b32 %r5063, %r6, 0, 31, -1; 2026-02-21T09:21:01.8041663Z wgmma.fence.sync.aligned; 2026-02-21T09:21:01.8041728Z 
mov.pred %p50, -1; 2026-02-21T09:21:01.8041791Z // begin inline asm 2026-02-21T09:21:01.8042577Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r4443,%r4444,%r4445,%r4446,%r4447,%r4448,%r4449,%r4450,%r4451,%r4452,%r4453,%r4454,%r4455,%r4456,%r4457,%r4458,%r4459,%r4460,%r4461,%r4462,%r4463,%r4464,%r4465,%r4466,%r4467,%r4468,%r4469,%r4470,%r4471,%r4472,%r4473,%r4474}, {%r4575,%r4576,%r4577,%r4578}, %rd200, %p50, 1, 1; 2026-02-21T09:21:01.8042649Z // end inline asm 2026-02-21T09:21:01.8042710Z // begin inline asm 2026-02-21T09:21:01.8043468Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r4443,%r4444,%r4445,%r4446,%r4447,%r4448,%r4449,%r4450,%r4451,%r4452,%r4453,%r4454,%r4455,%r4456,%r4457,%r4458,%r4459,%r4460,%r4461,%r4462,%r4463,%r4464,%r4465,%r4466,%r4467,%r4468,%r4469,%r4470,%r4471,%r4472,%r4473,%r4474}, {%r4643,%r4644,%r4645,%r4646}, %rd201, %p50, 1, 1; 2026-02-21T09:21:01.8043525Z // end inline asm 2026-02-21T09:21:01.8043587Z // begin inline asm 2026-02-21T09:21:01.8044341Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r4579,%r4580,%r4581,%r4582,%r4583,%r4584,%r4585,%r4586,%r4587,%r4588,%r4589,%r4590,%r4591,%r4592,%r4593,%r4594,%r4595,%r4596,%r4597,%r4598,%r4599,%r4600,%r4601,%r4602,%r4603,%r4604,%r4605,%r4606,%r4607,%r4608,%r4609,%r4610}, {%r4575,%r4576,%r4577,%r4578}, %rd202, %p50, 1, 1; 2026-02-21T09:21:01.8044398Z // end inline asm 2026-02-21T09:21:01.8044461Z // begin inline asm 2026-02-21T09:21:01.8045206Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r4579,%r4580,%r4581,%r4582,%r4583,%r4584,%r4585,%r4586,%r4587,%r4588,%r4589,%r4590,%r4591,%r4592,%r4593,%r4594,%r4595,%r4596,%r4597,%r4598,%r4599,%r4600,%r4601,%r4602,%r4603,%r4604,%r4605,%r4606,%r4607,%r4608,%r4609,%r4610}, {%r4643,%r4644,%r4645,%r4646}, %rd203, %p50, 1, 1; 2026-02-21T09:21:01.8045261Z // end inline asm 2026-02-21T09:21:01.8045344Z wgmma.commit_group.sync.aligned; 2026-02-21T09:21:01.8045405Z mov.b32 %r5015, 0; 2026-02-21T09:21:01.8045466Z mov.b32 %r4711, 
%r4185; 2026-02-21T09:21:01.8045525Z mov.b32 %r4712, %r5015; 2026-02-21T09:21:01.8045585Z mov.b32 %r4713, %r5015; 2026-02-21T09:21:01.8045643Z // begin inline asm 2026-02-21T09:21:01.8046831Z // wait for regs: %r4443,%r4444,%r4445,%r4446,%r4447,%r4448,%r4449,%r4450,%r4451,%r4452,%r4453,%r4454,%r4455,%r4456,%r4457,%r4458,%r4459,%r4460,%r4461,%r4462,%r4463,%r4464,%r4465,%r4466,%r4467,%r4468,%r4469,%r4470,%r4471,%r4472,%r4473,%r4474,%r4579,%r4580,%r4581,%r4582,%r4583,%r4584,%r4585,%r4586,%r4587,%r4588,%r4589,%r4590,%r4591,%r4592,%r4593,%r4594,%r4595,%r4596,%r4597,%r4598,%r4599,%r4600,%r4601,%r4602,%r4603,%r4604,%r4605,%r4606,%r4607,%r4608,%r4609,%r4610,%r4711,%r4712,%r4713 2026-02-21T09:21:01.8047055Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:21:01.8047116Z // end inline asm 2026-02-21T09:21:01.8047170Z $L__tmp18: 2026-02-21T09:21:01.8047406Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.8047472Z add.s32 %r5064, %r5058, 40960; 2026-02-21T09:21:01.8047681Z .loc 1 55 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:55:32 2026-02-21T09:21:01.8047749Z add.s32 %r5065, %r5064, %r60; 2026-02-21T09:21:01.8047878Z ld.shared.b16 %rs180, [%r5065]; 2026-02-21T09:21:01.8047953Z ld.shared.b16 %rs181, [%r5065+256]; 2026-02-21T09:21:01.8048023Z ld.shared.b16 %rs182, [%r5065+16]; 2026-02-21T09:21:01.8048091Z ld.shared.b16 %rs183, [%r5065+272]; 2026-02-21T09:21:01.8048155Z add.s32 %r5066, %r5064, %r61; 2026-02-21T09:21:01.8048220Z ld.shared.b16 %rs184, [%r5066]; 2026-02-21T09:21:01.8048289Z ld.shared.b16 %rs185, [%r5066+256]; 2026-02-21T09:21:01.8048412Z ld.shared.b16 %rs186, [%r5066+16]; 2026-02-21T09:21:01.8048480Z ld.shared.b16 %rs187, [%r5066+272]; 2026-02-21T09:21:01.8048546Z cvt.f32.bf16 %r4909, %rs180; 2026-02-21T09:21:01.8048608Z cvt.f32.bf16 %r4910, %rs181; 2026-02-21T09:21:01.8048669Z cvt.f32.bf16 %r4911, %rs184; 2026-02-21T09:21:01.8048729Z cvt.f32.bf16 %r4912, %rs185; 2026-02-21T09:21:01.8048791Z 
cvt.f32.bf16 %r4977, %rs182; 2026-02-21T09:21:01.8048851Z cvt.f32.bf16 %r4978, %rs183; 2026-02-21T09:21:01.8048912Z cvt.f32.bf16 %r4979, %rs186; 2026-02-21T09:21:01.8048975Z cvt.f32.bf16 %r4980, %rs187; 2026-02-21T09:21:01.8049181Z .loc 1 57 34 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:57:34 2026-02-21T09:21:01.8049245Z add.s32 %r5067, %r5317, 65536; 2026-02-21T09:21:01.8049316Z cvt.s64.s32 %rd212, %r5067; 2026-02-21T09:21:01.8049379Z add.s64 %rd205, %rd28, %rd212; 2026-02-21T09:21:01.8049580Z .loc 1 57 87 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:57:87 2026-02-21T09:21:01.8049642Z // begin inline asm 2026-02-21T09:21:01.8049705Z mov.u64 %rd204, 0x0; 2026-02-21T09:21:01.8049836Z createpolicy.fractional.L2::evict_first.b64 %rd204, 1.0; 2026-02-21T09:21:01.8049893Z // end inline asm 2026-02-21T09:21:01.8049952Z // begin inline asm 2026-02-21T09:21:01.8050011Z mov.u16 %rs161, 0x0; 2026-02-21T09:21:01.8050176Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs161 }, [ %rd205 + 0 ], %rd204; 2026-02-21T09:21:01.8050233Z // end inline asm 2026-02-21T09:21:01.8050457Z .loc 1 65 28 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:65:28 2026-02-21T09:21:01.8050514Z bar.sync 0; 2026-02-21T09:21:01.8050580Z st.shared.b8 [%r62], %rs161; 2026-02-21T09:21:01.8050639Z bar.sync 0; 2026-02-21T09:21:01.8050722Z ld.shared.v2.b8 {%rs188, %rs189}, [%r63]; 2026-02-21T09:21:01.8050921Z .loc 1 60 28 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:60:28 2026-02-21T09:21:01.8050988Z shl.b16 %rs190, %rs188, 4; 2026-02-21T09:21:01.8051051Z shl.b16 %rs191, %rs189, 4; 2026-02-21T09:21:01.8051249Z .loc 1 75 58 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:75:58 2026-02-21T09:21:01.8051321Z selp.b16 %rs192, %rs190, %rs188, %p63; 2026-02-21T09:21:01.8051386Z cvt.s16.s8 %rs193, %rs192; 2026-02-21T09:21:01.8051446Z shr.s16 %rs194, %rs193, 4; 2026-02-21T09:21:01.8051514Z selp.b16 %rs195, %rs191, %rs189, %p63; 
2026-02-21T09:21:01.8051577Z cvt.s16.s8 %rs196, %rs195; 2026-02-21T09:21:01.8051640Z shr.s16 %rs197, %rs196, 4; 2026-02-21T09:21:01.8051837Z .loc 1 80 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:80:32 2026-02-21T09:21:01.8051904Z cvt.rn.f32.s16 %r5068, %rs194; 2026-02-21T09:21:01.8052072Z cvt.rn.f32.s16 %r5069, %rs197; 2026-02-21T09:21:01.8052126Z bar.sync 0; 2026-02-21T09:21:01.8052190Z st.shared.b32 [%r64], %r5068; 2026-02-21T09:21:01.8052256Z st.shared.b32 [%r65], %r5069; 2026-02-21T09:21:01.8052311Z $L__tmp19: 2026-02-21T09:21:01.8052588Z .loc 2 291 36 // standard.py:291:36 @[ ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:87:40 ] 2026-02-21T09:21:01.8052749Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4443, %r4579}; 2026-02-21T09:21:01.8052804Z bar.sync 0; 2026-02-21T09:21:01.8052870Z // begin inline asm 2026-02-21T09:21:01.8053008Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4913}, [%r4782]; 2026-02-21T09:21:01.8053064Z // end inline asm 2026-02-21T09:21:01.8053118Z bar.sync 0; 2026-02-21T09:21:01.8053316Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4445, %r4581}; 2026-02-21T09:21:01.8053378Z bar.sync 0; 2026-02-21T09:21:01.8053441Z // begin inline asm 2026-02-21T09:21:01.8053576Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4915}, [%r4782]; 2026-02-21T09:21:01.8053635Z // end inline asm 2026-02-21T09:21:01.8053691Z bar.sync 0; 2026-02-21T09:21:01.8053837Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4444, %r4580}; 2026-02-21T09:21:01.8053937Z bar.sync 0; 2026-02-21T09:21:01.8054001Z // begin inline asm 2026-02-21T09:21:01.8054128Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4914}, [%r4782]; 2026-02-21T09:21:01.8054183Z // end inline asm 2026-02-21T09:21:01.8054240Z bar.sync 0; 2026-02-21T09:21:01.8054385Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4446, %r4582}; 2026-02-21T09:21:01.8054452Z bar.sync 0; 2026-02-21T09:21:01.8054513Z // begin inline asm 2026-02-21T09:21:01.8054646Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4916}, [%r4782]; 2026-02-21T09:21:01.8054703Z // end inline asm 2026-02-21T09:21:01.8054758Z bar.sync 0; 2026-02-21T09:21:01.8054906Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4447, %r4583}; 2026-02-21T09:21:01.8054963Z bar.sync 0; 2026-02-21T09:21:01.8055022Z // begin inline asm 2026-02-21T09:21:01.8055151Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4917}, [%r4782]; 2026-02-21T09:21:01.8055208Z // end inline asm 2026-02-21T09:21:01.8055263Z bar.sync 0; 2026-02-21T09:21:01.8055409Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4449, %r4585}; 2026-02-21T09:21:01.8055469Z bar.sync 0; 2026-02-21T09:21:01.8055527Z // begin inline asm 2026-02-21T09:21:01.8055654Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4919}, [%r4782]; 2026-02-21T09:21:01.8055712Z // end inline asm 2026-02-21T09:21:01.8055766Z bar.sync 0; 2026-02-21T09:21:01.8055910Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4448, %r4584}; 2026-02-21T09:21:01.8055965Z bar.sync 0; 2026-02-21T09:21:01.8056024Z // begin inline asm 2026-02-21T09:21:01.8056154Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4918}, [%r4782]; 2026-02-21T09:21:01.8056210Z // end inline asm 2026-02-21T09:21:01.8056269Z bar.sync 0; 2026-02-21T09:21:01.8056418Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4450, %r4586}; 2026-02-21T09:21:01.8056595Z bar.sync 0; 2026-02-21T09:21:01.8056660Z // begin inline asm 2026-02-21T09:21:01.8056793Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4920}, [%r4782]; 2026-02-21T09:21:01.8056850Z // end inline asm 2026-02-21T09:21:01.8056904Z bar.sync 0; 2026-02-21T09:21:01.8057052Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4451, %r4587}; 2026-02-21T09:21:01.8057106Z bar.sync 0; 2026-02-21T09:21:01.8057166Z // begin inline asm 2026-02-21T09:21:01.8057295Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4921}, [%r4782]; 2026-02-21T09:21:01.8057351Z // end inline asm 2026-02-21T09:21:01.8057405Z bar.sync 0; 
2026-02-21T09:21:01.8057549Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4453, %r4589}; 2026-02-21T09:21:01.8057608Z bar.sync 0; 2026-02-21T09:21:01.8057665Z // begin inline asm 2026-02-21T09:21:01.8057792Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4923}, [%r4782]; 2026-02-21T09:21:01.8057992Z // end inline asm 2026-02-21T09:21:01.8058048Z bar.sync 0; 2026-02-21T09:21:01.8058193Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4452, %r4588}; 2026-02-21T09:21:01.8058248Z bar.sync 0; 2026-02-21T09:21:01.8058311Z // begin inline asm 2026-02-21T09:21:01.8058441Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4922}, [%r4782]; 2026-02-21T09:21:01.8058497Z // end inline asm 2026-02-21T09:21:01.8058554Z bar.sync 0; 2026-02-21T09:21:01.8058700Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4454, %r4590}; 2026-02-21T09:21:01.8058765Z bar.sync 0; 2026-02-21T09:21:01.8058825Z // begin inline asm 2026-02-21T09:21:01.8058959Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4924}, [%r4782]; 2026-02-21T09:21:01.8059015Z // end inline asm 2026-02-21T09:21:01.8059068Z bar.sync 0; 2026-02-21T09:21:01.8059283Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4455, %r4591}; 2026-02-21T09:21:01.8059342Z bar.sync 0; 2026-02-21T09:21:01.8059401Z // begin inline asm 2026-02-21T09:21:01.8059534Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4925}, [%r4782]; 2026-02-21T09:21:01.8059589Z // end inline asm 2026-02-21T09:21:01.8059644Z bar.sync 0; 2026-02-21T09:21:01.8059790Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4457, %r4593}; 2026-02-21T09:21:01.8059915Z bar.sync 0; 2026-02-21T09:21:01.8059977Z // begin inline asm 2026-02-21T09:21:01.8060108Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4927}, [%r4782]; 2026-02-21T09:21:01.8060164Z // end inline asm 2026-02-21T09:21:01.8060221Z bar.sync 0; 2026-02-21T09:21:01.8060365Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4456, %r4592}; 2026-02-21T09:21:01.8060418Z bar.sync 0; 2026-02-21T09:21:01.8060479Z 
// begin inline asm 2026-02-21T09:21:01.8060610Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4926}, [%r4782]; 2026-02-21T09:21:01.8060668Z // end inline asm 2026-02-21T09:21:01.8060724Z bar.sync 0; 2026-02-21T09:21:01.8060873Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4458, %r4594}; 2026-02-21T09:21:01.8060933Z bar.sync 0; 2026-02-21T09:21:01.8060994Z // begin inline asm 2026-02-21T09:21:01.8061125Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4928}, [%r4782]; 2026-02-21T09:21:01.8061182Z // end inline asm 2026-02-21T09:21:01.8061236Z bar.sync 0; 2026-02-21T09:21:01.8061388Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4459, %r4595}; 2026-02-21T09:21:01.8061445Z bar.sync 0; 2026-02-21T09:21:01.8061505Z // begin inline asm 2026-02-21T09:21:01.8061638Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4929}, [%r4782]; 2026-02-21T09:21:01.8061695Z // end inline asm 2026-02-21T09:21:01.8061748Z bar.sync 0; 2026-02-21T09:21:01.8061894Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4461, %r4597}; 2026-02-21T09:21:01.8061950Z bar.sync 0; 2026-02-21T09:21:01.8062010Z // begin inline asm 2026-02-21T09:21:01.8062138Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4931}, [%r4782]; 2026-02-21T09:21:01.8062196Z // end inline asm 2026-02-21T09:21:01.8062251Z bar.sync 0; 2026-02-21T09:21:01.8062399Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4460, %r4596}; 2026-02-21T09:21:01.8062453Z bar.sync 0; 2026-02-21T09:21:01.8062513Z // begin inline asm 2026-02-21T09:21:01.8062640Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4930}, [%r4782]; 2026-02-21T09:21:01.8062699Z // end inline asm 2026-02-21T09:21:01.8062762Z bar.sync 0; 2026-02-21T09:21:01.8062907Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4462, %r4598}; 2026-02-21T09:21:01.8062961Z bar.sync 0; 2026-02-21T09:21:01.8063023Z // begin inline asm 2026-02-21T09:21:01.8063161Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4932}, [%r4782]; 2026-02-21T09:21:01.8063219Z // end inline asm 
2026-02-21T09:21:01.8063274Z bar.sync 0; 2026-02-21T09:21:01.8063422Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4463, %r4599}; 2026-02-21T09:21:01.8063478Z bar.sync 0; 2026-02-21T09:21:01.8063536Z // begin inline asm 2026-02-21T09:21:01.8063664Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4933}, [%r4782]; 2026-02-21T09:21:01.8063840Z // end inline asm 2026-02-21T09:21:01.8063893Z bar.sync 0; 2026-02-21T09:21:01.8064040Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4465, %r4601}; 2026-02-21T09:21:01.8064095Z bar.sync 0; 2026-02-21T09:21:01.8064153Z // begin inline asm 2026-02-21T09:21:01.8064283Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4935}, [%r4782]; 2026-02-21T09:21:01.8064341Z // end inline asm 2026-02-21T09:21:01.8064395Z bar.sync 0; 2026-02-21T09:21:01.8064540Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4464, %r4600}; 2026-02-21T09:21:01.8064594Z bar.sync 0; 2026-02-21T09:21:01.8064654Z // begin inline asm 2026-02-21T09:21:01.8064780Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4934}, [%r4782]; 2026-02-21T09:21:01.8064836Z // end inline asm 2026-02-21T09:21:01.8064944Z bar.sync 0; 2026-02-21T09:21:01.8065095Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4466, %r4602}; 2026-02-21T09:21:01.8065150Z bar.sync 0; 2026-02-21T09:21:01.8065209Z // begin inline asm 2026-02-21T09:21:01.8065339Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4936}, [%r4782]; 2026-02-21T09:21:01.8065394Z // end inline asm 2026-02-21T09:21:01.8065447Z bar.sync 0; 2026-02-21T09:21:01.8065595Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4467, %r4603}; 2026-02-21T09:21:01.8065697Z bar.sync 0; 2026-02-21T09:21:01.8065769Z // begin inline asm 2026-02-21T09:21:01.8065902Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4937}, [%r4782]; 2026-02-21T09:21:01.8065959Z // end inline asm 2026-02-21T09:21:01.8066013Z bar.sync 0; 2026-02-21T09:21:01.8066159Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4469, %r4605}; 2026-02-21T09:21:01.8066215Z 
bar.sync 0; 2026-02-21T09:21:01.8066273Z // begin inline asm 2026-02-21T09:21:01.8066402Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4939}, [%r4782]; 2026-02-21T09:21:01.8066573Z // end inline asm 2026-02-21T09:21:01.8066632Z bar.sync 0; 2026-02-21T09:21:01.8066778Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4468, %r4604}; 2026-02-21T09:21:01.8066835Z bar.sync 0; 2026-02-21T09:21:01.8066895Z // begin inline asm 2026-02-21T09:21:01.8067022Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4938}, [%r4782]; 2026-02-21T09:21:01.8067079Z // end inline asm 2026-02-21T09:21:01.8067146Z bar.sync 0; 2026-02-21T09:21:01.8067299Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4470, %r4606}; 2026-02-21T09:21:01.8067355Z bar.sync 0; 2026-02-21T09:21:01.8067416Z // begin inline asm 2026-02-21T09:21:01.8067544Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4940}, [%r4782]; 2026-02-21T09:21:01.8067602Z // end inline asm 2026-02-21T09:21:01.8067657Z bar.sync 0; 2026-02-21T09:21:01.8067805Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4471, %r4607}; 2026-02-21T09:21:01.8067859Z bar.sync 0; 2026-02-21T09:21:01.8067918Z // begin inline asm 2026-02-21T09:21:01.8068046Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4941}, [%r4782]; 2026-02-21T09:21:01.8068102Z // end inline asm 2026-02-21T09:21:01.8068158Z bar.sync 0; 2026-02-21T09:21:01.8068359Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4473, %r4609}; 2026-02-21T09:21:01.8068419Z bar.sync 0; 2026-02-21T09:21:01.8068478Z // begin inline asm 2026-02-21T09:21:01.8068608Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4943}, [%r4782]; 2026-02-21T09:21:01.8068666Z // end inline asm 2026-02-21T09:21:01.8068722Z bar.sync 0; 2026-02-21T09:21:01.8068869Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4472, %r4608}; 2026-02-21T09:21:01.8068922Z bar.sync 0; 2026-02-21T09:21:01.8068982Z // begin inline asm 2026-02-21T09:21:01.8069109Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4942}, [%r4782]; 
2026-02-21T09:21:01.8069163Z // end inline asm 2026-02-21T09:21:01.8069218Z bar.sync 0; 2026-02-21T09:21:01.8069364Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r4281], {%r4474, %r4610}; 2026-02-21T09:21:01.8069418Z bar.sync 0; 2026-02-21T09:21:01.8069481Z // begin inline asm 2026-02-21T09:21:01.8069609Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4944}, [%r4782]; 2026-02-21T09:21:01.8069819Z // end inline asm 2026-02-21T09:21:01.8069879Z // begin inline asm 2026-02-21T09:21:01.8069962Z fence.proxy.async.shared::cta; 2026-02-21T09:21:01.8070018Z // end inline asm 2026-02-21T09:21:01.8070092Z wgmma.fence.sync.aligned; 2026-02-21T09:21:01.8070159Z shl.b32 %r5070, %r5063, 8; 2026-02-21T09:21:01.8070221Z and.b32 %r5071, %r5070, 4096; 2026-02-21T09:21:01.8070286Z add.s32 %r5072, %r5071, %r4185; 2026-02-21T09:21:01.8070350Z bfe.u32 %r5073, %r5072, 4, 14; 2026-02-21T09:21:01.8070420Z cvt.u64.u32 %rd213, %r5073; 2026-02-21T09:21:01.8070504Z or.b64 %rd207, %rd213, -9223371899382267904; 2026-02-21T09:21:01.8070564Z // begin inline asm 2026-02-21T09:21:01.8071398Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r4913,%r4914,%r4915,%r4916,%r4917,%r4918,%r4919,%r4920,%r4921,%r4922,%r4923,%r4924,%r4925,%r4926,%r4927,%r4928,%r4929,%r4930,%r4931,%r4932,%r4933,%r4934,%r4935,%r4936,%r4937,%r4938,%r4939,%r4940,%r4941,%r4942,%r4943,%r4944}, {%r4909,%r4910,%r4911,%r4912}, %rd207, %p50, 1, 1; 2026-02-21T09:21:01.8071462Z // end inline asm 2026-02-21T09:21:01.8071523Z add.s32 %r5074, %r5072, 32; 2026-02-21T09:21:01.8071588Z bfe.u32 %r5075, %r5074, 4, 14; 2026-02-21T09:21:01.8071649Z cvt.u64.u32 %rd214, %r5075; 2026-02-21T09:21:01.8071785Z or.b64 %rd208, %rd214, -9223371899382267904; 2026-02-21T09:21:01.8071850Z // begin inline asm 2026-02-21T09:21:01.8072604Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r4913,%r4914,%r4915,%r4916,%r4917,%r4918,%r4919,%r4920,%r4921,%r4922,%r4923,%r4924,%r4925,%r4926,%r4927,%r4928,%r4929,%r4930,%r4931,%r4932,%r4933,%r4934,%r4935,%r4936,%r4937,%r4938,%r4939,%r4940,%r4941,%r4942,%r4943,%r4944}, {%r4977,%r4978,%r4979,%r4980}, %rd208, %p50, 1, 1; 2026-02-21T09:21:01.8072661Z // end inline asm 2026-02-21T09:21:01.8072739Z wgmma.commit_group.sync.aligned; 2026-02-21T09:21:01.8072800Z mov.b32 %r5014, %r5015; 2026-02-21T09:21:01.8072858Z mov.b32 %r5013, %r4185; 2026-02-21T09:21:01.8072916Z // begin inline asm 2026-02-21T09:21:01.8073497Z // wait for regs: %r4913,%r4914,%r4915,%r4916,%r4917,%r4918,%r4919,%r4920,%r4921,%r4922,%r4923,%r4924,%r4925,%r4926,%r4927,%r4928,%r4929,%r4930,%r4931,%r4932,%r4933,%r4934,%r4935,%r4936,%r4937,%r4938,%r4939,%r4940,%r4941,%r4942,%r4943,%r4944,%r5013,%r5014,%r5015 2026-02-21T09:21:01.8073574Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:21:01.8073631Z // end inline asm 2026-02-21T09:21:01.8073686Z $L__tmp20: 2026-02-21T09:21:01.8073900Z .loc 1 43 78 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:43:78 2026-02-21T09:21:01.8073962Z add.s32 %r5076, %r5319, 1; 2026-02-21T09:21:01.8074034Z setp.gt.s32 %p59, %r5076, 4; 2026-02-21T09:21:01.8074102Z selp.b32 %r5319, 0, %r5076, %p59; 2026-02-21T09:21:01.8074307Z .loc 1 51 80 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:51:80 2026-02-21T09:21:01.8074367Z shl.b32 %r5077, %r5319, 13; 2026-02-21T09:21:01.8074431Z add.s32 %r5051, %r50, %r5077; 2026-02-21T09:21:01.8074496Z selp.b32 %r5052, 8, 0, %p57; 2026-02-21T09:21:01.8074556Z // begin inline asm 2026-02-21T09:21:01.8074705Z cp.async.ca.shared.global [ %r5051 + 0 ], [ %rd224 + 0 ], 0x8, %r5052; 2026-02-21T09:21:01.8074762Z // end inline asm 2026-02-21T09:21:01.8074829Z cp.async.commit_group; 2026-02-21T09:21:01.8074890Z add.s32 %r5053, %r51, %r5077; 2026-02-21T09:21:01.8074962Z // begin inline asm 2026-02-21T09:21:01.8075100Z cp.async.ca.shared.global [ %r5053 + 0 ], [ %rd223 + 0 ], 0x8, 
%r5052; 2026-02-21T09:21:01.8075159Z // end inline asm 2026-02-21T09:21:01.8075228Z cp.async.commit_group; 2026-02-21T09:21:01.8075443Z .loc 1 43 78 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:43:78 2026-02-21T09:21:01.8075509Z add.s64 %rd224, %rd224, 64; 2026-02-21T09:21:01.8075577Z add.s64 %rd223, %rd223, 64; 2026-02-21T09:21:01.8075641Z add.s32 %r5317, %r5317, 131072; 2026-02-21T09:21:01.8075711Z setp.lt.u64 %p60, %rd225, 496; 2026-02-21T09:21:01.8075830Z @%p60 bra $L__BB0_14; 2026-02-21T09:21:01.8075995Z // %bb.15: // in Loop: Header=BB0_13 Depth=1 2026-02-21T09:21:01.8082926Z .loc 1 34 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:34:32 2026-02-21T09:21:01.8083045Z or.b32 %r5114, %r394, %r8; 2026-02-21T09:21:01.8083292Z .loc 1 36 32 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:36:32 2026-02-21T09:21:01.8083359Z or.b32 %r5115, %r395, %r12; 2026-02-21T09:21:01.8083424Z or.b32 %r5116, %r395, %r13; 2026-02-21T09:21:01.8083482Z or.b32 %r5117, %r395, %r14; 2026-02-21T09:21:01.8083540Z or.b32 %r5118, %r395, %r15; 2026-02-21T09:21:01.8083770Z .loc 1 43 78 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:43:78 2026-02-21T09:21:01.8083959Z cp.async.wait_group 0; 2026-02-21T09:21:01.8084023Z bar.sync 0; 2026-02-21T09:21:01.8084240Z .loc 1 90 28 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:90:28 2026-02-21T09:21:01.8084329Z cvt.rn.bf16x2.f32 %r5119, %r4914, %r4913; 2026-02-21T09:21:01.8084404Z cvt.rn.bf16x2.f32 %r5120, %r4916, %r4915; 2026-02-21T09:21:01.8084475Z cvt.rn.bf16x2.f32 %r5121, %r4918, %r4917; 2026-02-21T09:21:01.8084626Z cvt.rn.bf16x2.f32 %r5122, %r4920, %r4919; 2026-02-21T09:21:01.8084704Z cvt.rn.bf16x2.f32 %r5123, %r4922, %r4921; 2026-02-21T09:21:01.8084776Z cvt.rn.bf16x2.f32 %r5124, %r4924, %r4923; 2026-02-21T09:21:01.8084848Z cvt.rn.bf16x2.f32 %r5125, %r4926, %r4925; 2026-02-21T09:21:01.8084916Z cvt.rn.bf16x2.f32 %r5126, %r4928, %r4927; 2026-02-21T09:21:01.8084986Z cvt.rn.bf16x2.f32 
%r5127, %r4930, %r4929; 2026-02-21T09:21:01.8085059Z cvt.rn.bf16x2.f32 %r5128, %r4932, %r4931; 2026-02-21T09:21:01.8085129Z cvt.rn.bf16x2.f32 %r5129, %r4934, %r4933; 2026-02-21T09:21:01.8085198Z cvt.rn.bf16x2.f32 %r5130, %r4936, %r4935; 2026-02-21T09:21:01.8085267Z cvt.rn.bf16x2.f32 %r5131, %r4938, %r4937; 2026-02-21T09:21:01.8085337Z cvt.rn.bf16x2.f32 %r5132, %r4940, %r4939; 2026-02-21T09:21:01.8085410Z cvt.rn.bf16x2.f32 %r5133, %r4942, %r4941; 2026-02-21T09:21:01.8085479Z cvt.rn.bf16x2.f32 %r5134, %r4944, %r4943; 2026-02-21T09:21:01.8085694Z .loc 1 91 43 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:91:43 2026-02-21T09:21:01.8085758Z shl.b32 %r5135, %r5115, 13; 2026-02-21T09:21:01.8085816Z shl.b32 %r5136, %r5116, 13; 2026-02-21T09:21:01.8085873Z shl.b32 %r5137, %r5117, 13; 2026-02-21T09:21:01.8085934Z shl.b32 %r5138, %r5118, 13; 2026-02-21T09:21:01.8086133Z .loc 1 91 50 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:91:50 2026-02-21T09:21:01.8086200Z add.s32 %r5139, %r5135, %r5114; 2026-02-21T09:21:01.8086264Z add.s32 %r5140, %r5136, %r5114; 2026-02-21T09:21:01.8086325Z add.s32 %r5141, %r5137, %r5114; 2026-02-21T09:21:01.8086385Z add.s32 %r5142, %r5138, %r5114; 2026-02-21T09:21:01.8086765Z .loc 1 91 22 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:91:22 2026-02-21T09:21:01.8086848Z mad.wide.s32 %rd215, %r5139, 2, %rd29; 2026-02-21T09:21:01.8086915Z mad.wide.s32 %rd216, %r5140, 2, %rd29; 2026-02-21T09:21:01.8086981Z mad.wide.s32 %rd217, %r5141, 2, %rd29; 2026-02-21T09:21:01.8087050Z mad.wide.s32 %rd218, %r5142, 2, %rd29; 2026-02-21T09:21:01.8087250Z .loc 1 91 81 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:91:81 2026-02-21T09:21:01.8087366Z st.shared.v4.b32 [%r68], {%r5119, %r5121, %r5123, %r5125}; 2026-02-21T09:21:01.8087495Z st.shared.v4.b32 [%r68+512], {%r5120, %r5122, %r5124, %r5126}; 2026-02-21T09:21:01.8087607Z st.shared.v4.b32 [%r69], {%r5127, %r5129, %r5131, %r5133}; 2026-02-21T09:21:01.8087719Z 
st.shared.v4.b32 [%r69+512], {%r5128, %r5130, %r5132, %r5134}; 2026-02-21T09:21:01.8087781Z bar.sync 0; 2026-02-21T09:21:01.8087842Z // begin inline asm 2026-02-21T09:21:01.8088036Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5078, %r5079, %r5080, %r5081}, [%r5082]; 2026-02-21T09:21:01.8088254Z // end inline asm 2026-02-21T09:21:01.8088314Z // begin inline asm 2026-02-21T09:21:01.8088499Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5083, %r5084, %r5085, %r5086}, [%r5087]; 2026-02-21T09:21:01.8088555Z // end inline asm 2026-02-21T09:21:01.8088620Z // begin inline asm 2026-02-21T09:21:01.8088797Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5088, %r5089, %r5090, %r5091}, [%r5092]; 2026-02-21T09:21:01.8088852Z // end inline asm 2026-02-21T09:21:01.8088914Z // begin inline asm 2026-02-21T09:21:01.8089088Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5093, %r5094, %r5095, %r5096}, [%r5097]; 2026-02-21T09:21:01.8089143Z // end inline asm 2026-02-21T09:21:01.8089199Z // begin inline asm 2026-02-21T09:21:01.8089329Z st.global.v4.b32 [ %rd215 + 0 ], { %r5078, %r5079, %r5080, %r5081 }; 2026-02-21T09:21:01.8089460Z // end inline asm 2026-02-21T09:21:01.8089521Z // begin inline asm 2026-02-21T09:21:01.8089645Z st.global.v4.b32 [ %rd216 + 0 ], { %r5083, %r5084, %r5085, %r5086 }; 2026-02-21T09:21:01.8089704Z // end inline asm 2026-02-21T09:21:01.8089761Z // begin inline asm 2026-02-21T09:21:01.8089879Z st.global.v4.b32 [ %rd217 + 0 ], { %r5088, %r5089, %r5090, %r5091 }; 2026-02-21T09:21:01.8089937Z // end inline asm 2026-02-21T09:21:01.8090074Z // begin inline asm 2026-02-21T09:21:01.8090193Z st.global.v4.b32 [ %rd218 + 0 ], { %r5093, %r5094, %r5095, %r5096 }; 2026-02-21T09:21:01.8090252Z // end inline asm 2026-02-21T09:21:01.8090474Z .loc 1 22 121 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:22:121 2026-02-21T09:21:01.8090538Z add.s32 %r467, %r5316, 1; 2026-02-21T09:21:01.8090621Z setp.lt.s32 %p61, %r5316, %r2; 2026-02-21T09:21:01.8090684Z mov.b32 %r5316, %r467; 
2026-02-21T09:21:01.8090747Z @%p61 bra $L__BB0_13; 2026-02-21T09:21:01.8090839Z $L__BB0_16: // %._crit_edge 2026-02-21T09:21:01.8091048Z .loc 1 22 4 // ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py:22:4 2026-02-21T09:21:01.8091103Z ret; 2026-02-21T09:21:01.8091157Z $L__tmp21: 2026-02-21T09:21:01.8091216Z $L__func_end0: 2026-02-21T09:21:01.8091305Z // -- End function 2026-02-21T09:21:01.8091358Z } 2026-02-21T09:21:01.8091612Z .file 1 "/tmp/torchinductor_root/io/ciozzmanyytlv7o6km4tqpi45x4rikx4scflkjfamfsnnpkiyk7w.py" 2026-02-21T09:21:01.8091830Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T09:21:01.8091897Z .section .debug_abbrev 2026-02-21T09:21:01.8091949Z { 2026-02-21T09:21:01.8092052Z .b8 1 // Abbreviation Code 2026-02-21T09:21:01.8092147Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:21:01.8092244Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:21:01.8092336Z .b8 37 // DW_AT_producer 2026-02-21T09:21:01.8092415Z .b8 8 // DW_FORM_string 2026-02-21T09:21:01.8092494Z .b8 19 // DW_AT_language 2026-02-21T09:21:01.8092576Z .b8 5 // DW_FORM_data2 2026-02-21T09:21:01.8092654Z .b8 3 // DW_AT_name 2026-02-21T09:21:01.8092732Z .b8 8 // DW_FORM_string 2026-02-21T09:21:01.8092814Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:21:01.8092893Z .b8 6 // DW_FORM_data4 2026-02-21T09:21:01.8092971Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:21:01.8093049Z .b8 8 // DW_FORM_string 2026-02-21T09:21:01.8093129Z .b8 0 // EOM(1) 2026-02-21T09:21:01.8093201Z .b8 0 // EOM(2) 2026-02-21T09:21:01.8093290Z .b8 2 // Abbreviation Code 2026-02-21T09:21:01.8093377Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:21:01.8093564Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:21:01.8093640Z .b8 3 // DW_AT_name 2026-02-21T09:21:01.8093718Z .b8 8 // DW_FORM_string 2026-02-21T09:21:01.8093803Z .b8 32 // DW_AT_inline 2026-02-21T09:21:01.8093884Z .b8 11 // DW_FORM_data1 2026-02-21T09:21:01.8093954Z .b8 0 // EOM(1) 2026-02-21T09:21:01.8094024Z .b8 0 // EOM(2) 
2026-02-21T09:21:01.8094107Z .b8 3 // Abbreviation Code 2026-02-21T09:21:01.8094190Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:21:01.8094321Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:21:01.8094399Z .b8 17 // DW_AT_low_pc 2026-02-21T09:21:01.8094474Z .b8 1 // DW_FORM_addr 2026-02-21T09:21:01.8094563Z .b8 18 // DW_AT_high_pc 2026-02-21T09:21:01.8094642Z .b8 1 // DW_FORM_addr 2026-02-21T09:21:01.8094734Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:21:01.8094855Z .b8 19 // DW_FORM_ref4 2026-02-21T09:21:01.8094929Z .b8 0 // EOM(1) 2026-02-21T09:21:01.8094997Z .b8 0 // EOM(2) 2026-02-21T09:21:01.8095082Z .b8 4 // Abbreviation Code 2026-02-21T09:21:01.8095185Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T09:21:01.8095278Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:21:01.8095371Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:21:01.8095449Z .b8 19 // DW_FORM_ref4 2026-02-21T09:21:01.8095525Z .b8 17 // DW_AT_low_pc 2026-02-21T09:21:01.8095603Z .b8 1 // DW_FORM_addr 2026-02-21T09:21:01.8095682Z .b8 18 // DW_AT_high_pc 2026-02-21T09:21:01.8095759Z .b8 1 // DW_FORM_addr 2026-02-21T09:21:01.8095841Z .b8 88 // DW_AT_call_file 2026-02-21T09:21:01.8095918Z .b8 11 // DW_FORM_data1 2026-02-21T09:21:01.8095997Z .b8 89 // DW_AT_call_line 2026-02-21T09:21:01.8096074Z .b8 11 // DW_FORM_data1 2026-02-21T09:21:01.8096156Z .b8 87 // DW_AT_call_column 2026-02-21T09:21:01.8096236Z .b8 11 // DW_FORM_data1 2026-02-21T09:21:01.8096307Z .b8 0 // EOM(1) 2026-02-21T09:21:01.8096373Z .b8 0 // EOM(2) 2026-02-21T09:21:01.8096439Z .b8 0 // EOM(3) 2026-02-21T09:21:01.8096632Z } 2026-02-21T09:21:01.8096698Z .section .debug_info 2026-02-21T09:21:01.8096748Z { 2026-02-21T09:21:01.8096840Z .b32 178 // Length of Unit 2026-02-21T09:21:01.8096936Z .b8 2 // DWARF version number 2026-02-21T09:21:01.8096989Z .b8 0 2026-02-21T09:21:01.8097123Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:21:01.8097222Z .b8 8 // Address Size (in bytes) 2026-02-21T09:21:01.8097337Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T09:21:01.8097429Z .b8 116 // DW_AT_producer 2026-02-21T09:21:01.8097488Z .b8 114 2026-02-21T09:21:01.8097539Z .b8 105 2026-02-21T09:21:01.8097591Z .b8 116 2026-02-21T09:21:01.8097643Z .b8 111 2026-02-21T09:21:01.8097694Z .b8 110 2026-02-21T09:21:01.8097745Z .b8 0 2026-02-21T09:21:01.8097826Z .b8 2 // DW_AT_language 2026-02-21T09:21:01.8098027Z .b8 0 2026-02-21T09:21:01.8098112Z .b8 99 // DW_AT_name 2026-02-21T09:21:01.8098163Z .b8 105 2026-02-21T09:21:01.8098218Z .b8 111 2026-02-21T09:21:01.8098269Z .b8 122 2026-02-21T09:21:01.8098319Z .b8 122 2026-02-21T09:21:01.8098373Z .b8 109 2026-02-21T09:21:01.8098426Z .b8 97 2026-02-21T09:21:01.8098476Z .b8 110 2026-02-21T09:21:01.8098528Z .b8 121 2026-02-21T09:21:01.8098583Z .b8 121 2026-02-21T09:21:01.8098633Z .b8 116 2026-02-21T09:21:01.8098683Z .b8 108 2026-02-21T09:21:01.8098734Z .b8 118 2026-02-21T09:21:01.8098785Z .b8 55 2026-02-21T09:21:01.8098835Z .b8 111 2026-02-21T09:21:01.8098886Z .b8 54 2026-02-21T09:21:01.8098936Z .b8 107 2026-02-21T09:21:01.8098988Z .b8 109 2026-02-21T09:21:01.8099037Z .b8 52 2026-02-21T09:21:01.8099087Z .b8 116 2026-02-21T09:21:01.8099212Z .b8 113 2026-02-21T09:21:01.8099266Z .b8 112 2026-02-21T09:21:01.8099316Z .b8 105 2026-02-21T09:21:01.8099365Z .b8 52 2026-02-21T09:21:01.8099420Z .b8 53 2026-02-21T09:21:01.8099475Z .b8 120 2026-02-21T09:21:01.8099526Z .b8 52 2026-02-21T09:21:01.8099596Z .b8 114 2026-02-21T09:21:01.8099648Z .b8 105 2026-02-21T09:21:01.8099698Z .b8 107 2026-02-21T09:21:01.8099748Z .b8 120 2026-02-21T09:21:01.8099802Z .b8 52 2026-02-21T09:21:01.8099852Z .b8 115 2026-02-21T09:21:01.8099964Z .b8 99 2026-02-21T09:21:01.8100023Z .b8 102 2026-02-21T09:21:01.8100073Z .b8 108 2026-02-21T09:21:01.8100123Z .b8 107 2026-02-21T09:21:01.8100175Z .b8 106 2026-02-21T09:21:01.8100229Z .b8 102 2026-02-21T09:21:01.8100280Z .b8 97 
2026-02-21T09:21:01.8100329Z .b8 109 2026-02-21T09:21:01.8100381Z .b8 102 2026-02-21T09:21:01.8100436Z .b8 115 2026-02-21T09:21:01.8100486Z .b8 110 2026-02-21T09:21:01.8100536Z .b8 110 2026-02-21T09:21:01.8100590Z .b8 112 2026-02-21T09:21:01.8100640Z .b8 107 2026-02-21T09:21:01.8100692Z .b8 105 2026-02-21T09:21:01.8100742Z .b8 121 2026-02-21T09:21:01.8100797Z .b8 107 2026-02-21T09:21:01.8100847Z .b8 55 2026-02-21T09:21:01.8100897Z .b8 119 2026-02-21T09:21:01.8100949Z .b8 46 2026-02-21T09:21:01.8101007Z .b8 112 2026-02-21T09:21:01.8101056Z .b8 121 2026-02-21T09:21:01.8101105Z .b8 0 2026-02-21T09:21:01.8101214Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:21:01.8101298Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:21:01.8101352Z .b8 116 2026-02-21T09:21:01.8101405Z .b8 109 2026-02-21T09:21:01.8101456Z .b8 112 2026-02-21T09:21:01.8101506Z .b8 47 2026-02-21T09:21:01.8101557Z .b8 116 2026-02-21T09:21:01.8101616Z .b8 111 2026-02-21T09:21:01.8101673Z .b8 114 2026-02-21T09:21:01.8101724Z .b8 99 2026-02-21T09:21:01.8101775Z .b8 104 2026-02-21T09:21:01.8101829Z .b8 105 2026-02-21T09:21:01.8101878Z .b8 110 2026-02-21T09:21:01.8101928Z .b8 100 2026-02-21T09:21:01.8101980Z .b8 117 2026-02-21T09:21:01.8102030Z .b8 99 2026-02-21T09:21:01.8102079Z .b8 116 2026-02-21T09:21:01.8102131Z .b8 111 2026-02-21T09:21:01.8102185Z .b8 114 2026-02-21T09:21:01.8102234Z .b8 95 2026-02-21T09:21:01.8102285Z .b8 114 2026-02-21T09:21:01.8102338Z .b8 111 2026-02-21T09:21:01.8102390Z .b8 111 2026-02-21T09:21:01.8102440Z .b8 116 2026-02-21T09:21:01.8102490Z .b8 47 2026-02-21T09:21:01.8102543Z .b8 105 2026-02-21T09:21:01.8102593Z .b8 111 2026-02-21T09:21:01.8102644Z .b8 0 2026-02-21T09:21:01.8102764Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T09:21:01.8102845Z .b8 95 // DW_AT_name 2026-02-21T09:21:01.8102895Z .b8 104 2026-02-21T09:21:01.8102945Z .b8 101 2026-02-21T09:21:01.8102998Z .b8 108 2026-02-21T09:21:01.8103047Z .b8 105 2026-02-21T09:21:01.8103098Z .b8 111 
2026-02-21T09:21:01.8103149Z .b8 110 2026-02-21T09:21:01.8103199Z .b8 95 2026-02-21T09:21:01.8103250Z .b8 109 2026-02-21T09:21:01.8103300Z .b8 97 2026-02-21T09:21:01.8103354Z .b8 116 2026-02-21T09:21:01.8103404Z .b8 109 2026-02-21T09:21:01.8103455Z .b8 117 2026-02-21T09:21:01.8103505Z .b8 108 2026-02-21T09:21:01.8103558Z .b8 95 2026-02-21T09:21:01.8103607Z .b8 98 2026-02-21T09:21:01.8103657Z .b8 102 2026-02-21T09:21:01.8103772Z .b8 49 2026-02-21T09:21:01.8103868Z .b8 54 2026-02-21T09:21:01.8103917Z .b8 95 2026-02-21T09:21:01.8103968Z .b8 105 2026-02-21T09:21:01.8104021Z .b8 110 2026-02-21T09:21:01.8104071Z .b8 116 2026-02-21T09:21:01.8104120Z .b8 52 2026-02-21T09:21:01.8104174Z .b8 0 2026-02-21T09:21:01.8104258Z .b8 1 // DW_AT_inline 2026-02-21T09:21:01.8104377Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T09:21:01.8104472Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T09:21:01.8104572Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T09:21:01.8104675Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:21:01.8104859Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T09:21:01.8104963Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:21:01.8105049Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T09:21:01.8105142Z .b64 $L__tmp20 // DW_AT_high_pc 2026-02-21T09:21:01.8105228Z .b8 1 // DW_AT_call_file 2026-02-21T09:21:01.8105312Z .b8 87 // DW_AT_call_line 2026-02-21T09:21:01.8105444Z .b8 40 // DW_AT_call_column 2026-02-21T09:21:01.8105540Z .b8 0 // End Of Children Mark 2026-02-21T09:21:01.8105626Z .b8 0 // End Of Children Mark 2026-02-21T09:21:01.8105677Z } 2026-02-21T09:21:01.8105746Z .section .debug_macinfo { } 2026-02-21T09:21:01.8105756Z 2026-02-21T09:21:01.8105847Z ================================================================ 2026-02-21T09:21:01.8105965Z please share the reproducer above with Triton project. 
2026-02-21T09:21:07.3080293Z 2026-02-21T09:21:07.3080582Z 2026-02-21T09:21:07.3080597Z 2026-02-21T09:21:07.3080992Z ================================================================ 2026-02-21T09:21:07.3081410Z Internal Triton PTX codegen error 2026-02-21T09:21:07.3081665Z `ptxas` stderr: 2026-02-21T09:21:07.3082402Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 1078 in function _helion_matmul_bf16_int4. Try to compile with register target of 38 or higher. 2026-02-21T09:21:07.3083284Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:21:07.3083517Z 2026-02-21T09:21:07.3084172Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp75vnvhg2.ptx -o /tmp/tmp75vnvhg2.ptx.o 2026-02-21T09:21:07.3084930Z 2026-02-21T09:21:07.3084935Z 2026-02-21T09:21:07.3085004Z // 2026-02-21T09:21:07.3085183Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:21:07.3085447Z // 2026-02-21T09:21:07.3085555Z 2026-02-21T09:21:07.3085644Z .version 8.7 2026-02-21T09:21:07.3085803Z .target sm_90a 2026-02-21T09:21:07.3085985Z .address_size 64 2026-02-21T09:21:07.3086096Z 2026-02-21T09:21:07.3086319Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T09:21:07.3089356Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:21:07.3089681Z // @_helion_matmul_bf16_int4 2026-02-21T09:21:07.3090004Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T09:21:07.3090367Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T09:21:07.3090807Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T09:21:07.3091233Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T09:21:07.3091652Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T09:21:07.3092046Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 
2026-02-21T09:21:07.3092360Z ) 2026-02-21T09:21:07.3092504Z .reqntid 256 2026-02-21T09:21:07.3092664Z .maxnreg 32 2026-02-21T09:21:07.3092810Z { 2026-02-21T09:21:07.3092963Z .reg .pred %p<138>; 2026-02-21T09:21:07.3093842Z .reg .b16 %rs<577>; 2026-02-21T09:21:07.3094020Z .reg .b32 %r<2162>; 2026-02-21T09:21:07.3094199Z .reg .b64 %rd<313>; 2026-02-21T09:21:07.3094576Z .loc 1 19 0 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:19:0 2026-02-21T09:21:07.3094975Z $L__func_begin0: 2026-02-21T09:21:07.3095270Z .loc 1 19 0 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:19:0 2026-02-21T09:21:07.3095568Z 2026-02-21T09:21:07.3095626Z // %bb.0: 2026-02-21T09:21:07.3095824Z ld.param.b64 %rd44, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T09:21:07.3096080Z $L__tmp0: 2026-02-21T09:21:07.3096377Z .loc 1 21 66 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:21:66 2026-02-21T09:21:07.3097044Z mov.u32 %r147, %ctaid.x; 2026-02-21T09:21:07.3097287Z ld.param.b64 %rd62, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T09:21:07.3097549Z mov.u32 %r148, %ctaid.y; 2026-02-21T09:21:07.3098178Z [547s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:21:07.3099895Z Config: @helion.kernel(config=helion.Config(block_sizes=[128, 64, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['first', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=16, num_stages=5, num_warps=8, pid_type='persistent_blocked', range_flattens=[None, False], range_multi_buffers=[False, True], range_num_stages=[0, 2], range_unroll_factors=[2, 1], range_warp_specializes=[]), static_shapes=True) 2026-02-21T09:21:07.3101450Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:21:07.3101749Z `ptxas` stderr: 2026-02-21T09:21:07.3102323Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 1078 in function _helion_matmul_bf16_int4. Try to compile with register target of 38 or higher. 2026-02-21T09:21:07.3102967Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:21:07.3103165Z 2026-02-21T09:21:07.3103683Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp75vnvhg2.ptx -o /tmp/tmp75vnvhg2.ptx.o 2026-02-21T09:21:07.3104265Z 2026-02-21T09:21:07.3104438Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:21:07.3104801Z ld.param.b64 %rd79, [_helion_matmul_bf16_int4_param_3]; 2026-02-21T09:21:07.3105072Z mov.u32 %r149, %ctaid.z; 2026-02-21T09:21:07.3105248Z mov.u32 %r150, %nctaid.x; 2026-02-21T09:21:07.3105432Z mov.u32 %r151, %nctaid.y; 2026-02-21T09:21:07.3105620Z mad.lo.s32 %r152, %r149, %r151, %r148; 2026-02-21T09:21:07.3105832Z mad.lo.s32 %r153, %r152, %r150, %r147; 2026-02-21T09:21:07.3106038Z shl.b32 %r154, %r153, 8; 2026-02-21T09:21:07.3106214Z cvt.s64.s32 %rd80, %r154; 2026-02-21T09:21:07.3106394Z add.s64 %rd58, %rd79, %rd80; 2026-02-21T09:21:07.3106722Z mov.u32 %r1, %tid.x; 2026-02-21T09:21:07.3106897Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:21:07.3107069Z shl.b32 %r155, %r1, 2; 2026-02-21T09:21:07.3107261Z mov.b32 %r156, global_smem; 2026-02-21T09:21:07.3107442Z add.s32 %r131, %r156, %r155; 2026-02-21T09:21:07.3107620Z mov.b32 %r140, 0; 2026-02-21T09:21:07.3107775Z // begin inline asm 2026-02-21T09:21:07.3107952Z @%p1 st.shared.b32 [ %r131 + 0 ], %r140; 2026-02-21T09:21:07.3108165Z // end inline asm 2026-02-21T09:21:07.3108404Z bar.warp.sync -1; 2026-02-21T09:21:07.3108575Z setp.eq.b32 %p2, %r1, 0; 2026-02-21T09:21:07.3108749Z cvt.u64.u32 %rd43, %r156; 2026-02-21T09:21:07.3108924Z // begin inline asm 2026-02-21T09:21:07.3109258Z @%p2 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd43 + 0 ], %rd44; 2026-02-21T09:21:07.3109615Z // end inline asm 2026-02-21T09:21:07.3109772Z // begin inline asm 2026-02-21T09:21:07.3110044Z @%p2 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x1; 2026-02-21T09:21:07.3110359Z // end inline asm 2026-02-21T09:21:07.3110610Z mov.b32 %r133, 32; 2026-02-21T09:21:07.3110848Z // begin inline asm 2026-02-21T09:21:07.3111144Z @%p2 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x0, %r133; 2026-02-21T09:21:07.3111480Z // end inline asm 2026-02-21T09:21:07.3111629Z mov.b32 %r134, 128; 2026-02-21T09:21:07.3111794Z // begin inline asm 2026-02-21T09:21:07.3112076Z @%p2 
tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x1, %r134; 2026-02-21T09:21:07.3112400Z // end inline asm 2026-02-21T09:21:07.3112555Z mov.b32 %r135, 8192; 2026-02-21T09:21:07.3112715Z // begin inline asm 2026-02-21T09:21:07.3113017Z @%p2 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x0, %r135; 2026-02-21T09:21:07.3113360Z // end inline asm 2026-02-21T09:21:07.3113529Z mov.b32 %r136, 512; 2026-02-21T09:21:07.3113762Z // begin inline asm 2026-02-21T09:21:07.3114063Z @%p2 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x1, %r136; 2026-02-21T09:21:07.3114410Z // end inline asm 2026-02-21T09:21:07.3114564Z mov.b64 %rd51, 8192; 2026-02-21T09:21:07.3114744Z // begin inline asm 2026-02-21T09:21:07.3115059Z @%p2 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd43 + 0 ], 0x0, %rd51; 2026-02-21T09:21:07.3115419Z // end inline asm 2026-02-21T09:21:07.3115647Z mov.b32 %r137, 1; 2026-02-21T09:21:07.3115811Z // begin inline asm 2026-02-21T09:21:07.3116119Z @%p2 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x0, %r137; 2026-02-21T09:21:07.3116630Z // end inline asm 2026-02-21T09:21:07.3116791Z // begin inline asm 2026-02-21T09:21:07.3117097Z @%p2 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x1, %r137; 2026-02-21T09:21:07.3117452Z // end inline asm 2026-02-21T09:21:07.3117607Z // begin inline asm 2026-02-21T09:21:07.3117901Z @%p2 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x0; 2026-02-21T09:21:07.3118224Z // end inline asm 2026-02-21T09:21:07.3118377Z // begin inline asm 2026-02-21T09:21:07.3118691Z @%p2 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x0; 2026-02-21T09:21:07.3119042Z // end inline asm 2026-02-21T09:21:07.3119197Z // begin inline asm 2026-02-21T09:21:07.3119490Z @%p2 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x1; 
2026-02-21T09:21:07.3119821Z // end inline asm 2026-02-21T09:21:07.3119968Z // begin inline asm 2026-02-21T09:21:07.3120240Z @%p2 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x0; 2026-02-21T09:21:07.3120560Z // end inline asm 2026-02-21T09:21:07.3120705Z // begin inline asm 2026-02-21T09:21:07.3121134Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd58 + 0 ], [ %rd43 + 0 ], 0x80; 2026-02-21T09:21:07.3121619Z // end inline asm 2026-02-21T09:21:07.3121777Z // begin inline asm 2026-02-21T09:21:07.3122020Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd58 + 0 ], 0x80; 2026-02-21T09:21:07.3122341Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:21:07.3122562Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:21:07.3122778Z // end inline asm 2026-02-21T09:21:07.3122931Z bar.sync 0; 2026-02-21T09:21:07.3123086Z cvta.global.u64 %rd211, %rd58; 2026-02-21T09:21:07.3123437Z .loc 1 23 68 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:23:68 2026-02-21T09:21:07.3123809Z add.s64 %rd76, %rd58, 128; 2026-02-21T09:21:07.3123993Z bar.sync 0; 2026-02-21T09:21:07.3124138Z // begin inline asm 2026-02-21T09:21:07.3124315Z @%p1 st.shared.b32 [ %r131 + 0 ], %r140; 2026-02-21T09:21:07.3124533Z // end inline asm 2026-02-21T09:21:07.3124695Z bar.warp.sync -1; 2026-02-21T09:21:07.3124858Z // begin inline asm 2026-02-21T09:21:07.3125151Z @%p2 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd43 + 0 ], %rd62; 2026-02-21T09:21:07.3125496Z // end inline asm 2026-02-21T09:21:07.3125644Z // begin inline asm 2026-02-21T09:21:07.3125915Z @%p2 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x1; 2026-02-21T09:21:07.3126380Z // end inline asm 2026-02-21T09:21:07.3126676Z // begin inline asm 2026-02-21T09:21:07.3126959Z @%p2 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x0, %r133; 2026-02-21T09:21:07.3127299Z // end inline asm 2026-02-21T09:21:07.3127451Z 
mov.b32 %r142, 64; 2026-02-21T09:21:07.3127619Z // begin inline asm 2026-02-21T09:21:07.3127904Z @%p2 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x1, %r142; 2026-02-21T09:21:07.3128227Z // end inline asm 2026-02-21T09:21:07.3128384Z // begin inline asm 2026-02-21T09:21:07.3128674Z @%p2 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x0, %r135; 2026-02-21T09:21:07.3129034Z // end inline asm 2026-02-21T09:21:07.3129272Z mov.b32 %r144, 16384; 2026-02-21T09:21:07.3129443Z // begin inline asm 2026-02-21T09:21:07.3129743Z @%p2 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x1, %r144; 2026-02-21T09:21:07.3130089Z // end inline asm 2026-02-21T09:21:07.3130248Z mov.b64 %rd69, 16384; 2026-02-21T09:21:07.3130409Z // begin inline asm 2026-02-21T09:21:07.3130719Z @%p2 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd43 + 0 ], 0x0, %rd69; 2026-02-21T09:21:07.3131155Z // end inline asm 2026-02-21T09:21:07.3131310Z // begin inline asm 2026-02-21T09:21:07.3131613Z @%p2 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x0, %r137; 2026-02-21T09:21:07.3131979Z // end inline asm 2026-02-21T09:21:07.3132136Z // begin inline asm 2026-02-21T09:21:07.3132437Z @%p2 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x1, %r137; 2026-02-21T09:21:07.3132788Z // end inline asm 2026-02-21T09:21:07.3132935Z // begin inline asm 2026-02-21T09:21:07.3133213Z @%p2 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd43 + 0 ], 0xa; 2026-02-21T09:21:07.3133535Z // end inline asm 2026-02-21T09:21:07.3133693Z // begin inline asm 2026-02-21T09:21:07.3134000Z @%p2 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x0; 2026-02-21T09:21:07.3134350Z // end inline asm 2026-02-21T09:21:07.3134502Z // begin inline asm 2026-02-21T09:21:07.3134778Z @%p2 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x2; 
2026-02-21T09:21:07.3135109Z // end inline asm 2026-02-21T09:21:07.3135265Z // begin inline asm 2026-02-21T09:21:07.3135541Z @%p2 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd43 + 0 ], 0x0; 2026-02-21T09:21:07.3135859Z // end inline asm 2026-02-21T09:21:07.3136005Z // begin inline asm 2026-02-21T09:21:07.3136435Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd76 + 0 ], [ %rd43 + 0 ], 0x80; 2026-02-21T09:21:07.3137037Z // end inline asm 2026-02-21T09:21:07.3137191Z // begin inline asm 2026-02-21T09:21:07.3137433Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd76 + 0 ], 0x80; 2026-02-21T09:21:07.3137746Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:21:07.3137986Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:21:07.3138192Z // end inline asm 2026-02-21T09:21:07.3138341Z bar.sync 0; 2026-02-21T09:21:07.3138494Z cvta.global.u64 %rd194, %rd76; 2026-02-21T09:21:07.3138856Z .loc 1 29 35 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:29:35 2026-02-21T09:21:07.3139217Z shl.b32 %r2139, %r147, 5; 2026-02-21T09:21:07.3139539Z .loc 1 30 37 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:30:37 2026-02-21T09:21:07.3139897Z add.s32 %r157, %r2139, 32; 2026-02-21T09:21:07.3140211Z .loc 1 30 49 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:30:49 2026-02-21T09:21:07.3140565Z min.s32 %r3, %r157, 65536; 2026-02-21T09:21:07.3140877Z .loc 1 31 74 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:31:74 2026-02-21T09:21:07.3141245Z setp.ge.s32 %p37, %r2139, %r3; 2026-02-21T09:21:07.3141529Z @%p37 bra $L__BB0_7; 2026-02-21T09:21:07.3141803Z // %bb.1: // %.lr.ph 2026-02-21T09:21:07.3142188Z .loc 1 0 74 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:0:74 2026-02-21T09:21:07.3142617Z ld.param.b64 %rd42, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T09:21:07.3142881Z shr.u32 %r4, %r1, 5; 2026-02-21T09:21:07.3143046Z and.b32 %r5, %r1, 192; 
2026-02-21T09:21:07.3143232Z bfe.u32 %r6, %r1, 6, 2; 2026-02-21T09:21:07.3143406Z or.b32 %r7, %r6, 4; 2026-02-21T09:21:07.3143571Z or.b32 %r8, %r6, 8; 2026-02-21T09:21:07.3143730Z or.b32 %r9, %r6, 12; 2026-02-21T09:21:07.3143898Z or.b32 %r10, %r6, 16; 2026-02-21T09:21:07.3144062Z or.b32 %r11, %r6, 20; 2026-02-21T09:21:07.3144227Z or.b32 %r12, %r6, 24; 2026-02-21T09:21:07.3144494Z or.b32 %r13, %r6, 28; 2026-02-21T09:21:07.3144658Z or.b32 %r14, %r6, 32; 2026-02-21T09:21:07.3144819Z or.b32 %r15, %r6, 36; 2026-02-21T09:21:07.3144970Z or.b32 %r16, %r6, 40; 2026-02-21T09:21:07.3145131Z or.b32 %r17, %r6, 44; 2026-02-21T09:21:07.3145285Z or.b32 %r18, %r6, 48; 2026-02-21T09:21:07.3145453Z or.b32 %r19, %r6, 52; 2026-02-21T09:21:07.3145611Z or.b32 %r20, %r6, 56; 2026-02-21T09:21:07.3145770Z or.b32 %r21, %r6, 60; 2026-02-21T09:21:07.3145998Z and.b32 %r22, %r155, 252; 2026-02-21T09:21:07.3146180Z and.b32 %r23, %r1, 32; 2026-02-21T09:21:07.3146349Z shl.b32 %r158, %r1, 3; 2026-02-21T09:21:07.3146657Z and.b32 %r159, %r158, 2040; 2026-02-21T09:21:07.3146844Z shr.u32 %r160, %r5, 3; 2026-02-21T09:21:07.3147008Z xor.b32 %r161, %r159, %r160; 2026-02-21T09:21:07.3147215Z add.s32 %r1165, %r156, %r161; 2026-02-21T09:21:07.3147397Z add.s32 %r1167, %r1165, 4096; 2026-02-21T09:21:07.3147578Z add.s32 %r1169, %r1165, 8192; 2026-02-21T09:21:07.3147756Z add.s32 %r1171, %r1165, 12288; 2026-02-21T09:21:07.3147948Z add.s32 %r1173, %r1165, 16384; 2026-02-21T09:21:07.3148126Z add.s32 %r1175, %r1165, 20480; 2026-02-21T09:21:07.3148401Z add.s32 %r1177, %r1165, 24576; 2026-02-21T09:21:07.3148606Z add.s32 %r1179, %r1165, 28672; 2026-02-21T09:21:07.3148787Z xor.b32 %r163, %r161, 32; 2026-02-21T09:21:07.3148968Z add.s32 %r164, %r156, %r163; 2026-02-21T09:21:07.3149158Z add.s32 %r1181, %r164, 2048; 2026-02-21T09:21:07.3149339Z add.s32 %r1183, %r164, 6144; 2026-02-21T09:21:07.3149514Z add.s32 %r1185, %r164, 10240; 2026-02-21T09:21:07.3149696Z add.s32 %r1187, %r164, 14336; 2026-02-21T09:21:07.3149868Z add.s32 
%r1189, %r164, 18432; 2026-02-21T09:21:07.3150054Z add.s32 %r1191, %r164, 22528; 2026-02-21T09:21:07.3150236Z add.s32 %r1193, %r164, 26624; 2026-02-21T09:21:07.3150411Z add.s32 %r1195, %r164, 30720; 2026-02-21T09:21:07.3150595Z shl.b32 %r165, %r1, 8; 2026-02-21T09:21:07.3150761Z and.b32 %r166, %r165, 24576; 2026-02-21T09:21:07.3150941Z shl.b32 %r167, %r1, 7; 2026-02-21T09:21:07.3151109Z and.b32 %r168, %r167, 3584; 2026-02-21T09:21:07.3151307Z and.b32 %r169, %r1, 31; 2026-02-21T09:21:07.3151479Z shl.b32 %r170, %r169, 1; 2026-02-21T09:21:07.3151659Z or.b32 %r171, %r166, %r168; 2026-02-21T09:21:07.3151837Z or.b32 %r40, %r171, %r170; 2026-02-21T09:21:07.3152020Z xor.b32 %r41, %r40, 8; 2026-02-21T09:21:07.3152193Z xor.b32 %r42, %r40, 16; 2026-02-21T09:21:07.3152367Z xor.b32 %r43, %r40, 24; 2026-02-21T09:21:07.3152542Z xor.b32 %r44, %r40, 32; 2026-02-21T09:21:07.3152709Z xor.b32 %r45, %r40, 40; 2026-02-21T09:21:07.3152876Z xor.b32 %r46, %r40, 48; 2026-02-21T09:21:07.3153039Z xor.b32 %r47, %r40, 56; 2026-02-21T09:21:07.3153208Z shr.u32 %r172, %r5, 1; 2026-02-21T09:21:07.3153370Z or.b32 %r48, %r172, %r169; 2026-02-21T09:21:07.3153560Z shl.b32 %r173, %r169, 7; 2026-02-21T09:21:07.3153725Z shl.b32 %r174, %r1, 4; 2026-02-21T09:21:07.3153893Z and.b32 %r175, %r174, 112; 2026-02-21T09:21:07.3154067Z shr.u32 %r176, %r1, 3; 2026-02-21T09:21:07.3154232Z and.b32 %r177, %r176, 28; 2026-02-21T09:21:07.3154405Z or.b32 %r178, %r173, %r175; 2026-02-21T09:21:07.3154577Z xor.b32 %r179, %r178, %r177; 2026-02-21T09:21:07.3154761Z add.s32 %r180, %r156, 32768; 2026-02-21T09:21:07.3155110Z add.s32 %r49, %r180, %r179; 2026-02-21T09:21:07.3155287Z xor.b32 %r181, %r179, 32; 2026-02-21T09:21:07.3155456Z add.s32 %r50, %r180, %r181; 2026-02-21T09:21:07.3155630Z xor.b32 %r182, %r179, 64; 2026-02-21T09:21:07.3155797Z add.s32 %r51, %r180, %r182; 2026-02-21T09:21:07.3155975Z xor.b32 %r183, %r179, 96; 2026-02-21T09:21:07.3156153Z add.s32 %r52, %r180, %r183; 2026-02-21T09:21:07.3156331Z shl.b32 %r184, %r1, 
6; 2026-02-21T09:21:07.3156642Z and.b32 %r185, %r184, 960; 2026-02-21T09:21:07.3156824Z and.b32 %r186, %r1, 96; 2026-02-21T09:21:07.3157001Z shl.b32 %r187, %r186, 5; 2026-02-21T09:21:07.3157170Z and.b32 %r188, %r158, 48; 2026-02-21T09:21:07.3157342Z and.b32 %r189, %r1, 16; 2026-02-21T09:21:07.3157517Z shr.u32 %r190, %r1, 2; 2026-02-21T09:21:07.3157769Z and.b32 %r191, %r190, 32; 2026-02-21T09:21:07.3157942Z or.b32 %r192, %r187, %r188; 2026-02-21T09:21:07.3158122Z or.b32 %r193, %r191, %r189; 2026-02-21T09:21:07.3158311Z xor.b32 %r194, %r192, %r193; 2026-02-21T09:21:07.3158511Z add.s32 %r195, %r156, %r185; 2026-02-21T09:21:07.3158701Z add.s32 %r196, %r195, %r194; 2026-02-21T09:21:07.3158879Z add.s32 %r53, %r196, 70144; 2026-02-21T09:21:07.3159064Z shl.b32 %r197, %r186, 8; 2026-02-21T09:21:07.3159305Z or.b32 %r198, %r197, %r168; 2026-02-21T09:21:07.3159494Z or.b32 %r54, %r198, %r170; 2026-02-21T09:21:07.3159668Z xor.b32 %r55, %r54, 8; 2026-02-21T09:21:07.3159846Z xor.b32 %r56, %r54, 16; 2026-02-21T09:21:07.3160016Z xor.b32 %r57, %r54, 24; 2026-02-21T09:21:07.3160187Z xor.b32 %r58, %r54, 32; 2026-02-21T09:21:07.3160361Z xor.b32 %r59, %r54, 40; 2026-02-21T09:21:07.3160542Z xor.b32 %r60, %r54, 48; 2026-02-21T09:21:07.3160714Z xor.b32 %r61, %r54, 56; 2026-02-21T09:21:07.3161037Z .loc 1 31 74 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:31:74 2026-02-21T09:21:07.3161405Z and.b32 %r199, %r1, 63; 2026-02-21T09:21:07.3161585Z mad.wide.u32 %rd3, %r199, 8, %rd42; 2026-02-21T09:21:07.3161810Z shl.b32 %r76, %r6, 10; 2026-02-21T09:21:07.3162136Z .loc 1 51 125 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:51:125 2026-02-21T09:21:07.3162507Z or.b32 %r200, %r76, %r22; 2026-02-21T09:21:07.3162689Z or.b32 %r77, %r200, 61696; 2026-02-21T09:21:07.3162919Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:21:07.3163216Z // Child Loop BB0_3 Depth 2 2026-02-21T09:21:07.3163486Z // Child Loop BB0_5 Depth 2 2026-02-21T09:21:07.3163865Z .loc 1 37 35 // 
ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:37:35 2026-02-21T09:21:07.3164221Z shr.s32 %r242, %r2139, 31; 2026-02-21T09:21:07.3164414Z shr.u32 %r243, %r242, 18; 2026-02-21T09:21:07.3164599Z add.s32 %r244, %r2139, %r243; 2026-02-21T09:21:07.3164786Z shr.s32 %r245, %r244, 14; 2026-02-21T09:21:07.3165101Z .loc 1 38 33 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:38:33 2026-02-21T09:21:07.3165454Z shl.b32 %r246, %r245, 6; 2026-02-21T09:21:07.3165764Z .loc 1 39 39 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:39:39 2026-02-21T09:21:07.3166110Z sub.s32 %r247, 256, %r246; 2026-02-21T09:21:07.3166425Z .loc 1 39 52 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:39:52 2026-02-21T09:21:07.3166914Z min.s32 %r248, %r247, 64; 2026-02-21T09:21:07.3167220Z .loc 1 40 45 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:40:45 2026-02-21T09:21:07.3167574Z and.b32 %r249, %r244, -16384; 2026-02-21T09:21:07.3167756Z sub.s32 %r250, %r2139, %r249; 2026-02-21T09:21:07.3168077Z .loc 1 41 51 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:41:51 2026-02-21T09:21:07.3168429Z div.s32 %r251, %r250, %r248; 2026-02-21T09:21:07.3168747Z .loc 1 40 64 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:40:64 2026-02-21T09:21:07.3169273Z mul.lo.s32 %r252, %r251, %r248; 2026-02-21T09:21:07.3169466Z sub.s32 %r253, %r250, %r252; 2026-02-21T09:21:07.3169787Z .loc 1 40 30 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:40:30 2026-02-21T09:21:07.3170135Z add.s32 %r254, %r253, %r246; 2026-02-21T09:21:07.3170452Z .loc 1 42 27 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:42:27 2026-02-21T09:21:07.3170799Z shl.b32 %r1162, %r254, 6; 2026-02-21T09:21:07.3171114Z .loc 1 43 32 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:43:32 2026-02-21T09:21:07.3171468Z or.b32 %r255, %r1162, %r6; 2026-02-21T09:21:07.3171644Z or.b32 %r256, %r1162, %r7; 2026-02-21T09:21:07.3171901Z or.b32 %r257, %r1162, %r8; 
2026-02-21T09:21:07.3172079Z or.b32 %r258, %r1162, %r9; 2026-02-21T09:21:07.3172258Z or.b32 %r259, %r1162, %r10; 2026-02-21T09:21:07.3172435Z or.b32 %r260, %r1162, %r11; 2026-02-21T09:21:07.3172619Z or.b32 %r261, %r1162, %r12; 2026-02-21T09:21:07.3172790Z or.b32 %r262, %r1162, %r13; 2026-02-21T09:21:07.3172969Z or.b32 %r263, %r1162, %r14; 2026-02-21T09:21:07.3173155Z or.b32 %r264, %r1162, %r15; 2026-02-21T09:21:07.3173438Z or.b32 %r265, %r1162, %r16; 2026-02-21T09:21:07.3173621Z or.b32 %r266, %r1162, %r17; 2026-02-21T09:21:07.3173794Z or.b32 %r267, %r1162, %r18; 2026-02-21T09:21:07.3173985Z or.b32 %r268, %r1162, %r19; 2026-02-21T09:21:07.3174162Z or.b32 %r269, %r1162, %r20; 2026-02-21T09:21:07.3174346Z or.b32 %r270, %r1162, %r21; 2026-02-21T09:21:07.3174657Z .loc 1 44 27 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:44:27 2026-02-21T09:21:07.3175013Z shl.b32 %r1161, %r251, 5; 2026-02-21T09:21:07.3175322Z .loc 1 58 53 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:58:53 2026-02-21T09:21:07.3175682Z shl.b32 %r271, %r255, 10; 2026-02-21T09:21:07.3175861Z shl.b32 %r272, %r256, 10; 2026-02-21T09:21:07.3176048Z shl.b32 %r273, %r257, 10; 2026-02-21T09:21:07.3176227Z shl.b32 %r274, %r258, 10; 2026-02-21T09:21:07.3176392Z shl.b32 %r275, %r259, 10; 2026-02-21T09:21:07.3176685Z shl.b32 %r276, %r260, 10; 2026-02-21T09:21:07.3176869Z shl.b32 %r277, %r261, 10; 2026-02-21T09:21:07.3177049Z shl.b32 %r278, %r262, 10; 2026-02-21T09:21:07.3177212Z shl.b32 %r279, %r263, 10; 2026-02-21T09:21:07.3177380Z shl.b32 %r280, %r264, 10; 2026-02-21T09:21:07.3177550Z shl.b32 %r281, %r265, 10; 2026-02-21T09:21:07.3177712Z shl.b32 %r282, %r266, 10; 2026-02-21T09:21:07.3177880Z shl.b32 %r283, %r267, 10; 2026-02-21T09:21:07.3178054Z shl.b32 %r284, %r268, 10; 2026-02-21T09:21:07.3178225Z shl.b32 %r285, %r269, 10; 2026-02-21T09:21:07.3178391Z shl.b32 %r286, %r270, 10; 2026-02-21T09:21:07.3178708Z .loc 1 51 125 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:51:125 
2026-02-21T09:21:07.3179062Z add.s32 %r1160, %r156, 69632; 2026-02-21T09:21:07.3179262Z // begin inline asm 2026-02-21T09:21:07.3179470Z @%p2 mbarrier.init.shared::cta.b64 [%r1160], 1; 2026-02-21T09:21:07.3179698Z // end inline asm 2026-02-21T09:21:07.3180000Z .loc 1 58 60 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:58:60 2026-02-21T09:21:07.3180350Z or.b32 %r288, %r271, %r22; 2026-02-21T09:21:07.3180530Z or.b32 %r289, %r272, %r22; 2026-02-21T09:21:07.3180702Z or.b32 %r290, %r273, %r22; 2026-02-21T09:21:07.3180890Z or.b32 %r291, %r274, %r22; 2026-02-21T09:21:07.3181063Z or.b32 %r292, %r275, %r22; 2026-02-21T09:21:07.3181239Z or.b32 %r293, %r276, %r22; 2026-02-21T09:21:07.3181413Z or.b32 %r294, %r277, %r22; 2026-02-21T09:21:07.3181584Z or.b32 %r295, %r278, %r22; 2026-02-21T09:21:07.3181763Z or.b32 %r296, %r279, %r22; 2026-02-21T09:21:07.3181938Z or.b32 %r297, %r280, %r22; 2026-02-21T09:21:07.3182115Z or.b32 %r298, %r281, %r22; 2026-02-21T09:21:07.3182286Z or.b32 %r299, %r282, %r22; 2026-02-21T09:21:07.3182556Z or.b32 %r300, %r283, %r22; 2026-02-21T09:21:07.3182787Z or.b32 %r301, %r284, %r22; 2026-02-21T09:21:07.3182958Z or.b32 %r302, %r285, %r22; 2026-02-21T09:21:07.3183143Z or.b32 %r303, %r286, %r22; 2026-02-21T09:21:07.3183466Z .loc 1 58 32 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:58:32 2026-02-21T09:21:07.3183833Z mad.wide.s32 %rd81, %r288, 2, %rd42; 2026-02-21T09:21:07.3184042Z mad.wide.s32 %rd89, %r289, 2, %rd42; 2026-02-21T09:21:07.3184250Z mad.wide.s32 %rd82, %r290, 2, %rd42; 2026-02-21T09:21:07.3184451Z mad.wide.s32 %rd90, %r291, 2, %rd42; 2026-02-21T09:21:07.3184672Z mad.wide.s32 %rd83, %r292, 2, %rd42; 2026-02-21T09:21:07.3184880Z mad.wide.s32 %rd91, %r293, 2, %rd42; 2026-02-21T09:21:07.3185091Z mad.wide.s32 %rd84, %r294, 2, %rd42; 2026-02-21T09:21:07.3185371Z mad.wide.s32 %rd92, %r295, 2, %rd42; 2026-02-21T09:21:07.3185592Z mad.wide.s32 %rd85, %r296, 2, %rd42; 2026-02-21T09:21:07.3185803Z mad.wide.s32 %rd93, %r297, 2, 
%rd42; 2026-02-21T09:21:07.3186012Z mad.wide.s32 %rd86, %r298, 2, %rd42; 2026-02-21T09:21:07.3186219Z mad.wide.s32 %rd94, %r299, 2, %rd42; 2026-02-21T09:21:07.3186423Z mad.wide.s32 %rd87, %r300, 2, %rd42; 2026-02-21T09:21:07.3186768Z mad.wide.s32 %rd95, %r301, 2, %rd42; 2026-02-21T09:21:07.3187057Z mad.wide.s32 %rd88, %r302, 2, %rd42; 2026-02-21T09:21:07.3187267Z mad.wide.s32 %rd96, %r303, 2, %rd42; 2026-02-21T09:21:07.3187604Z .loc 1 58 80 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:58:80 2026-02-21T09:21:07.3187966Z bar.sync 0; 2026-02-21T09:21:07.3188114Z mov.b32 %r203, 8; 2026-02-21T09:21:07.3188268Z // begin inline asm 2026-02-21T09:21:07.3188596Z cp.async.ca.shared.global [ %r1165 + 0 ], [ %rd81 + 0 ], 0x8, %r203; 2026-02-21T09:21:07.3188877Z // end inline asm 2026-02-21T09:21:07.3189036Z // begin inline asm 2026-02-21T09:21:07.3189265Z cp.async.ca.shared.global [ %r1167 + 0 ], [ %rd82 + 0 ], 0x8, %r203; 2026-02-21T09:21:07.3189539Z // end inline asm 2026-02-21T09:21:07.3189706Z // begin inline asm 2026-02-21T09:21:07.3189938Z cp.async.ca.shared.global [ %r1169 + 0 ], [ %rd83 + 0 ], 0x8, %r203; 2026-02-21T09:21:07.3190228Z // end inline asm 2026-02-21T09:21:07.3190379Z // begin inline asm 2026-02-21T09:21:07.3190622Z cp.async.ca.shared.global [ %r1171 + 0 ], [ %rd84 + 0 ], 0x8, %r203; 2026-02-21T09:21:07.3190895Z // end inline asm 2026-02-21T09:21:07.3191050Z // begin inline asm 2026-02-21T09:21:07.3191288Z cp.async.ca.shared.global [ %r1173 + 0 ], [ %rd85 + 0 ], 0x8, %r203; 2026-02-21T09:21:07.3191573Z // end inline asm 2026-02-21T09:21:07.3191720Z // begin inline asm 2026-02-21T09:21:07.3191949Z cp.async.ca.shared.global [ %r1175 + 0 ], [ %rd86 + 0 ], 0x8, %r203; 2026-02-21T09:21:07.3192226Z // end inline asm 2026-02-21T09:21:07.3192373Z // begin inline asm 2026-02-21T09:21:07.3192603Z cp.async.ca.shared.global [ %r1177 + 0 ], [ %rd87 + 0 ], 0x8, %r203; 2026-02-21T09:21:07.3192871Z // end inline asm 2026-02-21T09:21:07.3193025Z // begin 
inline asm 2026-02-21T09:21:07.3193266Z cp.async.ca.shared.global [ %r1179 + 0 ], [ %rd88 + 0 ], 0x8, %r203; 2026-02-21T09:21:07.3193540Z // end inline asm 2026-02-21T09:21:07.3193687Z // begin inline asm 2026-02-21T09:21:07.3193919Z cp.async.ca.shared.global [ %r1181 + 0 ], [ %rd89 + 0 ], 0x8, %r203; 2026-02-21T09:21:07.3194198Z // end inline asm 2026-02-21T09:21:07.3194345Z // begin inline asm 2026-02-21T09:21:07.3194576Z cp.async.ca.shared.global [ %r1183 + 0 ], [ %rd90 + 0 ], 0x8, %r203; 2026-02-21T09:21:07.3194851Z // end inline asm 2026-02-21T09:21:07.3195010Z // begin inline asm 2026-02-21T09:21:07.3195233Z cp.async.ca.shared.global [ %r1185 + 0 ], [ %rd91 + 0 ], 0x8, %r203; 2026-02-21T09:21:07.3195510Z // end inline asm 2026-02-21T09:21:07.3195658Z // begin inline asm 2026-02-21T09:21:07.3195894Z cp.async.ca.shared.global [ %r1187 + 0 ], [ %rd92 + 0 ], 0x8, %r203; 2026-02-21T09:21:07.3196162Z // end inline asm 2026-02-21T09:21:07.3196318Z // begin inline asm 2026-02-21T09:21:07.3196680Z cp.async.ca.shared.global [ %r1189 + 0 ], [ %rd93 + 0 ], 0x8, %r203; 2026-02-21T09:21:07.3197162Z // end inline asm 2026-02-21T09:21:07.3197322Z // begin inline asm 2026-02-21T09:21:07.3197559Z cp.async.ca.shared.global [ %r1191 + 0 ], [ %rd94 + 0 ], 0x8, %r203; 2026-02-21T09:21:07.3197860Z // end inline asm 2026-02-21T09:21:07.3198014Z // begin inline asm 2026-02-21T09:21:07.3198252Z cp.async.ca.shared.global [ %r1193 + 0 ], [ %rd95 + 0 ], 0x8, %r203; 2026-02-21T09:21:07.3198528Z // end inline asm 2026-02-21T09:21:07.3198687Z // begin inline asm 2026-02-21T09:21:07.3198934Z cp.async.ca.shared.global [ %r1195 + 0 ], [ %rd96 + 0 ], 0x8, %r203; 2026-02-21T09:21:07.3199209Z // end inline asm 2026-02-21T09:21:07.3199376Z cp.async.commit_group; 2026-02-21T09:21:07.3199795Z .loc 1 51 125 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:51:125 2026-02-21T09:21:07.3200168Z // begin inline asm 2026-02-21T09:21:07.3200392Z @%p2 mbarrier.arrive.expect_tx.shared.b64 _, 
[%r1160], 4096; 2026-02-21T09:21:07.3200663Z // end inline asm 2026-02-21T09:21:07.3200973Z .loc 1 64 33 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:64:33 2026-02-21T09:21:07.3201329Z bar.sync 0; 2026-02-21T09:21:07.3201494Z elect.sync %r304|%p42, -1; 2026-02-21T09:21:07.3201759Z and.pred %p40, %p1, %p42; 2026-02-21T09:21:07.3201956Z add.s32 %r235, %r156, 65536; 2026-02-21T09:21:07.3202128Z mov.b32 %r237, 0; 2026-02-21T09:21:07.3202284Z // begin inline asm 2026-02-21T09:21:07.3202701Z @%p40 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r235], [%rd211, {%r1161, %r237}], [%r1160]; 2026-02-21T09:21:07.3203171Z // end inline asm 2026-02-21T09:21:07.3203484Z .loc 1 51 125 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:51:125 2026-02-21T09:21:07.3203848Z or.b32 %r305, %r20, %r1162; 2026-02-21T09:21:07.3204032Z shl.b32 %r306, %r305, 10; 2026-02-21T09:21:07.3204207Z mul.wide.s32 %rd99, %r306, 2; 2026-02-21T09:21:07.3204402Z or.b64 %rd4, %rd99, 512; 2026-02-21T09:21:07.3204580Z or.b32 %r307, %r19, %r1162; 2026-02-21T09:21:07.3204759Z shl.b32 %r308, %r307, 10; 2026-02-21T09:21:07.3204934Z mul.wide.s32 %rd100, %r308, 2; 2026-02-21T09:21:07.3205127Z or.b64 %rd5, %rd100, 512; 2026-02-21T09:21:07.3205306Z or.b32 %r309, %r18, %r1162; 2026-02-21T09:21:07.3205481Z shl.b32 %r310, %r309, 10; 2026-02-21T09:21:07.3205661Z mul.wide.s32 %rd101, %r310, 2; 2026-02-21T09:21:07.3205856Z or.b64 %rd6, %rd101, 512; 2026-02-21T09:21:07.3206038Z or.b32 %r311, %r17, %r1162; 2026-02-21T09:21:07.3206211Z shl.b32 %r312, %r311, 10; 2026-02-21T09:21:07.3206388Z mul.wide.s32 %rd102, %r312, 2; 2026-02-21T09:21:07.3206702Z or.b64 %rd7, %rd102, 512; 2026-02-21T09:21:07.3206879Z or.b32 %r313, %r16, %r1162; 2026-02-21T09:21:07.3207053Z shl.b32 %r314, %r313, 10; 2026-02-21T09:21:07.3207230Z mul.wide.s32 %rd103, %r314, 2; 2026-02-21T09:21:07.3207419Z or.b64 %rd8, %rd103, 512; 2026-02-21T09:21:07.3207604Z or.b32 %r315, %r15, %r1162; 
2026-02-21T09:21:07.3207786Z shl.b32 %r316, %r315, 10; 2026-02-21T09:21:07.3207955Z mul.wide.s32 %rd104, %r316, 2; 2026-02-21T09:21:07.3208144Z or.b64 %rd9, %rd104, 512; 2026-02-21T09:21:07.3208314Z or.b32 %r317, %r14, %r1162; 2026-02-21T09:21:07.3208497Z shl.b32 %r318, %r317, 10; 2026-02-21T09:21:07.3208682Z mul.wide.s32 %rd105, %r318, 2; 2026-02-21T09:21:07.3208877Z or.b64 %rd10, %rd105, 512; 2026-02-21T09:21:07.3209053Z or.b32 %r319, %r13, %r1162; 2026-02-21T09:21:07.3209235Z shl.b32 %r320, %r319, 10; 2026-02-21T09:21:07.3209418Z mul.wide.s32 %rd106, %r320, 2; 2026-02-21T09:21:07.3209605Z or.b64 %rd11, %rd106, 512; 2026-02-21T09:21:07.3209788Z or.b32 %r321, %r12, %r1162; 2026-02-21T09:21:07.3209963Z shl.b32 %r322, %r321, 10; 2026-02-21T09:21:07.3210146Z mul.wide.s32 %rd107, %r322, 2; 2026-02-21T09:21:07.3210334Z or.b64 %rd12, %rd107, 512; 2026-02-21T09:21:07.3210529Z or.b32 %r323, %r11, %r1162; 2026-02-21T09:21:07.3210704Z shl.b32 %r324, %r323, 10; 2026-02-21T09:21:07.3211061Z mul.wide.s32 %rd108, %r324, 2; 2026-02-21T09:21:07.3211251Z or.b64 %rd13, %rd108, 512; 2026-02-21T09:21:07.3211434Z or.b32 %r325, %r10, %r1162; 2026-02-21T09:21:07.3211613Z shl.b32 %r326, %r325, 10; 2026-02-21T09:21:07.3211787Z mul.wide.s32 %rd109, %r326, 2; 2026-02-21T09:21:07.3211976Z or.b64 %rd14, %rd109, 512; 2026-02-21T09:21:07.3212143Z or.b32 %r327, %r9, %r1162; 2026-02-21T09:21:07.3212320Z shl.b32 %r328, %r327, 10; 2026-02-21T09:21:07.3212491Z mul.wide.s32 %rd110, %r328, 2; 2026-02-21T09:21:07.3212679Z or.b64 %rd15, %rd110, 512; 2026-02-21T09:21:07.3212849Z or.b32 %r329, %r8, %r1162; 2026-02-21T09:21:07.3213038Z shl.b32 %r330, %r329, 10; 2026-02-21T09:21:07.3213215Z mul.wide.s32 %rd111, %r330, 2; 2026-02-21T09:21:07.3213398Z or.b64 %rd16, %rd111, 512; 2026-02-21T09:21:07.3213653Z or.b32 %r331, %r7, %r1162; 2026-02-21T09:21:07.3213826Z shl.b32 %r332, %r331, 10; 2026-02-21T09:21:07.3214006Z mul.wide.s32 %rd112, %r332, 2; 2026-02-21T09:21:07.3214203Z or.b64 %rd17, %rd112, 512; 
2026-02-21T09:21:07.3214383Z shl.b32 %r333, %r254, 16; 2026-02-21T09:21:07.3214562Z or.b32 %r334, %r76, %r333; 2026-02-21T09:21:07.3214745Z mad.wide.s32 %rd18, %r334, 2, 512; 2026-02-21T09:21:07.3214944Z shl.b32 %r335, %r2139, 16; 2026-02-21T09:21:07.3215189Z or.b32 %r336, %r77, %r335; 2026-02-21T09:21:07.3215375Z shl.b32 %r337, %r252, 16; 2026-02-21T09:21:07.3215551Z sub.s32 %r338, %r336, %r337; 2026-02-21T09:21:07.3215744Z mul.lo.s32 %r339, %r245, 1069547520; 2026-02-21T09:21:07.3215954Z sub.s32 %r2140, %r338, %r339; 2026-02-21T09:21:07.3216153Z mov.b32 %r2143, 0f00000000; 2026-02-21T09:21:07.3216332Z mov.b32 %r2142, -1; 2026-02-21T09:21:07.3216629Z mov.b64 %rd310, -128; 2026-02-21T09:21:07.3216804Z mov.b64 %rd309, %rd3; 2026-02-21T09:21:07.3216981Z mov.b32 %r2141, %r237; 2026-02-21T09:21:07.3217150Z mov.b32 %r2144, %r2143; 2026-02-21T09:21:07.3217327Z mov.b32 %r2145, %r2143; 2026-02-21T09:21:07.3217497Z mov.b32 %r2146, %r2143; 2026-02-21T09:21:07.3217678Z mov.b32 %r2147, %r2143; 2026-02-21T09:21:07.3217857Z mov.b32 %r2148, %r2143; 2026-02-21T09:21:07.3218025Z mov.b32 %r2149, %r2143; 2026-02-21T09:21:07.3218196Z mov.b32 %r2150, %r2143; 2026-02-21T09:21:07.3218413Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T09:21:07.3218726Z // => This Inner Loop Header: Depth=2 2026-02-21T09:21:07.3219133Z .loc 1 81 38 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:81:38 2026-02-21T09:21:07.3219507Z setp.eq.b32 %p77, %r23, 0; 2026-02-21T09:21:07.3219853Z .loc 1 51 125 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:51:125 2026-02-21T09:21:07.3220222Z add.s64 %rd21, %rd310, 128; 2026-02-21T09:21:07.3220418Z setp.lt.u64 %p80, %rd21, 384; 2026-02-21T09:21:07.3220607Z add.s32 %r1041, %r2142, 1; 2026-02-21T09:21:07.3220805Z setp.lt.u32 %p81, %r2142, 2147483647; 2026-02-21T09:21:07.3221015Z selp.b32 %r2142, 0, %r1041, %p81; 2026-02-21T09:21:07.3221218Z selp.b32 %r1042, 1, 0, %p81; 2026-02-21T09:21:07.3221401Z xor.b32 %r2141, %r2141, %r1042; 
2026-02-21T09:21:07.3221749Z .loc 1 58 80 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:58:80 2026-02-21T09:21:07.3222114Z cp.async.wait_group 0; 2026-02-21T09:21:07.3222285Z bar.sync 0; 2026-02-21T09:21:07.3222441Z shl.b32 %r1043, %r2142, 15; 2026-02-21T09:21:07.3222618Z add.s32 %r1045, %r156, %r1043; 2026-02-21T09:21:07.3222943Z .loc 1 62 32 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:62:32 2026-02-21T09:21:07.3223304Z add.s32 %r1046, %r1045, %r40; 2026-02-21T09:21:07.3223491Z ld.shared.b16 %rs1, [%r1046]; 2026-02-21T09:21:07.3223682Z ld.shared.b16 %rs2, [%r1046+4096]; 2026-02-21T09:21:07.3223887Z ld.shared.b16 %rs3, [%r1046+64]; 2026-02-21T09:21:07.3224094Z ld.shared.b16 %rs4, [%r1046+4160]; 2026-02-21T09:21:07.3224291Z ld.shared.b16 %rs5, [%r1046+128]; 2026-02-21T09:21:07.3224664Z ld.shared.b16 %rs6, [%r1046+4224]; 2026-02-21T09:21:07.3224860Z ld.shared.b16 %rs7, [%r1046+192]; 2026-02-21T09:21:07.3225056Z ld.shared.b16 %rs8, [%r1046+4288]; 2026-02-21T09:21:07.3225248Z ld.shared.b16 %rs9, [%r1046+256]; 2026-02-21T09:21:07.3225453Z ld.shared.b16 %rs10, [%r1046+4352]; 2026-02-21T09:21:07.3225654Z ld.shared.b16 %rs11, [%r1046+320]; 2026-02-21T09:21:07.3225860Z ld.shared.b16 %rs12, [%r1046+4416]; 2026-02-21T09:21:07.3226062Z ld.shared.b16 %rs13, [%r1046+384]; 2026-02-21T09:21:07.3226259Z ld.shared.b16 %rs14, [%r1046+4480]; 2026-02-21T09:21:07.3226578Z ld.shared.b16 %rs15, [%r1046+448]; 2026-02-21T09:21:07.3226792Z ld.shared.b16 %rs16, [%r1046+4544]; 2026-02-21T09:21:07.3227008Z add.s32 %r1047, %r1045, %r41; 2026-02-21T09:21:07.3227278Z ld.shared.b16 %rs17, [%r1047]; 2026-02-21T09:21:07.3227478Z ld.shared.b16 %rs18, [%r1047+4096]; 2026-02-21T09:21:07.3227676Z ld.shared.b16 %rs19, [%r1047+64]; 2026-02-21T09:21:07.3227892Z ld.shared.b16 %rs20, [%r1047+4160]; 2026-02-21T09:21:07.3228103Z ld.shared.b16 %rs21, [%r1047+128]; 2026-02-21T09:21:07.3228382Z ld.shared.b16 %rs22, [%r1047+4224]; 2026-02-21T09:21:07.3228599Z ld.shared.b16 %rs23, 
[%r1047+192]; 2026-02-21T09:21:07.3228793Z ld.shared.b16 %rs24, [%r1047+4288]; 2026-02-21T09:21:07.3229072Z ld.shared.b16 %rs25, [%r1047+256]; 2026-02-21T09:21:07.3229271Z ld.shared.b16 %rs26, [%r1047+4352]; 2026-02-21T09:21:07.3239673Z ld.shared.b16 %rs27, [%r1047+320]; 2026-02-21T09:21:07.3239974Z ld.shared.b16 %rs28, [%r1047+4416]; 2026-02-21T09:21:07.3240231Z ld.shared.b16 %rs29, [%r1047+384]; 2026-02-21T09:21:07.3240457Z ld.shared.b16 %rs30, [%r1047+4480]; 2026-02-21T09:21:07.3240683Z ld.shared.b16 %rs31, [%r1047+448]; 2026-02-21T09:21:07.3240906Z ld.shared.b16 %rs32, [%r1047+4544]; 2026-02-21T09:21:07.3241122Z add.s32 %r1048, %r1045, %r42; 2026-02-21T09:21:07.3241329Z ld.shared.b16 %rs33, [%r1048]; 2026-02-21T09:21:07.3241530Z ld.shared.b16 %rs34, [%r1048+4096]; 2026-02-21T09:21:07.3241763Z ld.shared.b16 %rs35, [%r1048+64]; 2026-02-21T09:21:07.3241979Z ld.shared.b16 %rs36, [%r1048+4160]; 2026-02-21T09:21:07.3242194Z ld.shared.b16 %rs37, [%r1048+128]; 2026-02-21T09:21:07.3242395Z ld.shared.b16 %rs38, [%r1048+4224]; 2026-02-21T09:21:07.3242607Z ld.shared.b16 %rs39, [%r1048+192]; 2026-02-21T09:21:07.3242818Z ld.shared.b16 %rs40, [%r1048+4288]; 2026-02-21T09:21:07.3243019Z ld.shared.b16 %rs41, [%r1048+256]; 2026-02-21T09:21:07.3243237Z ld.shared.b16 %rs42, [%r1048+4352]; 2026-02-21T09:21:07.3243436Z ld.shared.b16 %rs43, [%r1048+320]; 2026-02-21T09:21:07.3243651Z ld.shared.b16 %rs44, [%r1048+4416]; 2026-02-21T09:21:07.3243857Z ld.shared.b16 %rs45, [%r1048+384]; 2026-02-21T09:21:07.3244065Z ld.shared.b16 %rs46, [%r1048+4480]; 2026-02-21T09:21:07.3244264Z ld.shared.b16 %rs47, [%r1048+448]; 2026-02-21T09:21:07.3244484Z ld.shared.b16 %rs48, [%r1048+4544]; 2026-02-21T09:21:07.3244686Z add.s32 %r1049, %r1045, %r43; 2026-02-21T09:21:07.3244887Z ld.shared.b16 %rs49, [%r1049]; 2026-02-21T09:21:07.3245095Z ld.shared.b16 %rs50, [%r1049+4096]; 2026-02-21T09:21:07.3245297Z ld.shared.b16 %rs51, [%r1049+64]; 2026-02-21T09:21:07.3245505Z ld.shared.b16 %rs52, [%r1049+4160]; 
2026-02-21T09:21:07.3245718Z ld.shared.b16 %rs53, [%r1049+128]; 2026-02-21T09:21:07.3245925Z ld.shared.b16 %rs54, [%r1049+4224]; 2026-02-21T09:21:07.3246122Z ld.shared.b16 %rs55, [%r1049+192]; 2026-02-21T09:21:07.3246327Z ld.shared.b16 %rs56, [%r1049+4288]; 2026-02-21T09:21:07.3246668Z ld.shared.b16 %rs57, [%r1049+256]; 2026-02-21T09:21:07.3246873Z ld.shared.b16 %rs58, [%r1049+4352]; 2026-02-21T09:21:07.3247088Z ld.shared.b16 %rs59, [%r1049+320]; 2026-02-21T09:21:07.3247293Z ld.shared.b16 %rs60, [%r1049+4416]; 2026-02-21T09:21:07.3247498Z ld.shared.b16 %rs61, [%r1049+384]; 2026-02-21T09:21:07.3247706Z ld.shared.b16 %rs62, [%r1049+4480]; 2026-02-21T09:21:07.3247917Z ld.shared.b16 %rs63, [%r1049+448]; 2026-02-21T09:21:07.3248120Z ld.shared.b16 %rs64, [%r1049+4544]; 2026-02-21T09:21:07.3248569Z add.s32 %r1050, %r1045, %r44; 2026-02-21T09:21:07.3248768Z ld.shared.b16 %rs65, [%r1050]; 2026-02-21T09:21:07.3248972Z ld.shared.b16 %rs66, [%r1050+4096]; 2026-02-21T09:21:07.3249178Z ld.shared.b16 %rs67, [%r1050+64]; 2026-02-21T09:21:07.3249381Z ld.shared.b16 %rs68, [%r1050+4160]; 2026-02-21T09:21:07.3249589Z ld.shared.b16 %rs69, [%r1050+128]; 2026-02-21T09:21:07.3249785Z ld.shared.b16 %rs70, [%r1050+4224]; 2026-02-21T09:21:07.3249999Z ld.shared.b16 %rs71, [%r1050+192]; 2026-02-21T09:21:07.3250200Z ld.shared.b16 %rs72, [%r1050+4288]; 2026-02-21T09:21:07.3250407Z ld.shared.b16 %rs73, [%r1050+256]; 2026-02-21T09:21:07.3250605Z ld.shared.b16 %rs74, [%r1050+4352]; 2026-02-21T09:21:07.3250811Z ld.shared.b16 %rs75, [%r1050+320]; 2026-02-21T09:21:07.3251112Z ld.shared.b16 %rs76, [%r1050+4416]; 2026-02-21T09:21:07.3251319Z ld.shared.b16 %rs77, [%r1050+384]; 2026-02-21T09:21:07.3251529Z ld.shared.b16 %rs78, [%r1050+4480]; 2026-02-21T09:21:07.3251728Z ld.shared.b16 %rs79, [%r1050+448]; 2026-02-21T09:21:07.3251947Z ld.shared.b16 %rs80, [%r1050+4544]; 2026-02-21T09:21:07.3252152Z add.s32 %r1051, %r1045, %r45; 2026-02-21T09:21:07.3252344Z ld.shared.b16 %rs81, [%r1051]; 
2026-02-21T09:21:07.3252535Z ld.shared.b16 %rs82, [%r1051+4096]; 2026-02-21T09:21:07.3252844Z ld.shared.b16 %rs83, [%r1051+64]; 2026-02-21T09:21:07.3253053Z ld.shared.b16 %rs84, [%r1051+4160]; 2026-02-21T09:21:07.3253270Z ld.shared.b16 %rs85, [%r1051+128]; 2026-02-21T09:21:07.3253477Z ld.shared.b16 %rs86, [%r1051+4224]; 2026-02-21T09:21:07.3253672Z ld.shared.b16 %rs87, [%r1051+192]; 2026-02-21T09:21:07.3253876Z ld.shared.b16 %rs88, [%r1051+4288]; 2026-02-21T09:21:07.3254073Z ld.shared.b16 %rs89, [%r1051+256]; 2026-02-21T09:21:07.3254275Z ld.shared.b16 %rs90, [%r1051+4352]; 2026-02-21T09:21:07.3254483Z ld.shared.b16 %rs91, [%r1051+320]; 2026-02-21T09:21:07.3254692Z ld.shared.b16 %rs92, [%r1051+4416]; 2026-02-21T09:21:07.3254898Z ld.shared.b16 %rs93, [%r1051+384]; 2026-02-21T09:21:07.3255127Z ld.shared.b16 %rs94, [%r1051+4480]; 2026-02-21T09:21:07.3255334Z ld.shared.b16 %rs95, [%r1051+448]; 2026-02-21T09:21:07.3255539Z ld.shared.b16 %rs96, [%r1051+4544]; 2026-02-21T09:21:07.3255740Z add.s32 %r1052, %r1045, %r46; 2026-02-21T09:21:07.3255924Z ld.shared.b16 %rs97, [%r1052]; 2026-02-21T09:21:07.3256120Z ld.shared.b16 %rs98, [%r1052+4096]; 2026-02-21T09:21:07.3256319Z ld.shared.b16 %rs99, [%r1052+64]; 2026-02-21T09:21:07.3256665Z ld.shared.b16 %rs100, [%r1052+4160]; 2026-02-21T09:21:07.3256890Z ld.shared.b16 %rs101, [%r1052+128]; 2026-02-21T09:21:07.3257104Z ld.shared.b16 %rs102, [%r1052+4224]; 2026-02-21T09:21:07.3257315Z ld.shared.b16 %rs103, [%r1052+192]; 2026-02-21T09:21:07.3257513Z ld.shared.b16 %rs104, [%r1052+4288]; 2026-02-21T09:21:07.3257730Z ld.shared.b16 %rs105, [%r1052+256]; 2026-02-21T09:21:07.3257930Z ld.shared.b16 %rs106, [%r1052+4352]; 2026-02-21T09:21:07.3258143Z ld.shared.b16 %rs107, [%r1052+320]; 2026-02-21T09:21:07.3258337Z ld.shared.b16 %rs108, [%r1052+4416]; 2026-02-21T09:21:07.3258549Z ld.shared.b16 %rs109, [%r1052+384]; 2026-02-21T09:21:07.3258759Z ld.shared.b16 %rs110, [%r1052+4480]; 2026-02-21T09:21:07.3258971Z ld.shared.b16 %rs111, [%r1052+448]; 
2026-02-21T09:21:07.3259180Z ld.shared.b16 %rs112, [%r1052+4544]; 2026-02-21T09:21:07.3259382Z add.s32 %r1053, %r1045, %r47; 2026-02-21T09:21:07.3259586Z ld.shared.b16 %rs113, [%r1053]; 2026-02-21T09:21:07.3259790Z ld.shared.b16 %rs114, [%r1053+4096]; 2026-02-21T09:21:07.3259998Z ld.shared.b16 %rs115, [%r1053+64]; 2026-02-21T09:21:07.3260195Z ld.shared.b16 %rs116, [%r1053+4160]; 2026-02-21T09:21:07.3260420Z ld.shared.b16 %rs117, [%r1053+128]; 2026-02-21T09:21:07.3260621Z ld.shared.b16 %rs118, [%r1053+4224]; 2026-02-21T09:21:07.3260827Z ld.shared.b16 %rs119, [%r1053+192]; 2026-02-21T09:21:07.3261035Z ld.shared.b16 %rs120, [%r1053+4288]; 2026-02-21T09:21:07.3261237Z ld.shared.b16 %rs121, [%r1053+256]; 2026-02-21T09:21:07.3261460Z ld.shared.b16 %rs122, [%r1053+4352]; 2026-02-21T09:21:07.3261822Z ld.shared.b16 %rs123, [%r1053+320]; 2026-02-21T09:21:07.3262035Z ld.shared.b16 %rs124, [%r1053+4416]; 2026-02-21T09:21:07.3262237Z ld.shared.b16 %rs125, [%r1053+384]; 2026-02-21T09:21:07.3262446Z ld.shared.b16 %rs126, [%r1053+4480]; 2026-02-21T09:21:07.3262655Z ld.shared.b16 %rs127, [%r1053+448]; 2026-02-21T09:21:07.3262867Z ld.shared.b16 %rs128, [%r1053+4544]; 2026-02-21T09:21:07.3263077Z cvt.f32.bf16 %r358, %rs1; 2026-02-21T09:21:07.3263280Z cvt.f32.bf16 %r359, %rs2; 2026-02-21T09:21:07.3263469Z cvt.f32.bf16 %r360, %rs17; 2026-02-21T09:21:07.3263654Z cvt.f32.bf16 %r361, %rs18; 2026-02-21T09:21:07.3263839Z cvt.f32.bf16 %r378, %rs33; 2026-02-21T09:21:07.3264018Z cvt.f32.bf16 %r379, %rs34; 2026-02-21T09:21:07.3264199Z cvt.f32.bf16 %r380, %rs49; 2026-02-21T09:21:07.3264450Z cvt.f32.bf16 %r381, %rs50; 2026-02-21T09:21:07.3264653Z cvt.f32.bf16 %r398, %rs65; 2026-02-21T09:21:07.3264829Z cvt.f32.bf16 %r399, %rs66; 2026-02-21T09:21:07.3265009Z cvt.f32.bf16 %r400, %rs81; 2026-02-21T09:21:07.3265206Z cvt.f32.bf16 %r401, %rs82; 2026-02-21T09:21:07.3265383Z cvt.f32.bf16 %r418, %rs97; 2026-02-21T09:21:07.3265563Z cvt.f32.bf16 %r419, %rs98; 2026-02-21T09:21:07.3265739Z cvt.f32.bf16 %r420, 
%rs113; 2026-02-21T09:21:07.3265994Z cvt.f32.bf16 %r421, %rs114; 2026-02-21T09:21:07.3266179Z cvt.f32.bf16 %r438, %rs3; 2026-02-21T09:21:07.3266379Z cvt.f32.bf16 %r439, %rs4; 2026-02-21T09:21:07.3266668Z cvt.f32.bf16 %r440, %rs19; 2026-02-21T09:21:07.3266852Z cvt.f32.bf16 %r441, %rs20; 2026-02-21T09:21:07.3267040Z cvt.f32.bf16 %r458, %rs35; 2026-02-21T09:21:07.3267221Z cvt.f32.bf16 %r459, %rs36; 2026-02-21T09:21:07.3267401Z cvt.f32.bf16 %r460, %rs51; 2026-02-21T09:21:07.3267583Z cvt.f32.bf16 %r461, %rs52; 2026-02-21T09:21:07.3267763Z cvt.f32.bf16 %r478, %rs67; 2026-02-21T09:21:07.3267948Z cvt.f32.bf16 %r479, %rs68; 2026-02-21T09:21:07.3268136Z cvt.f32.bf16 %r480, %rs83; 2026-02-21T09:21:07.3268370Z cvt.f32.bf16 %r481, %rs84; 2026-02-21T09:21:07.3268562Z cvt.f32.bf16 %r498, %rs99; 2026-02-21T09:21:07.3268739Z cvt.f32.bf16 %r499, %rs100; 2026-02-21T09:21:07.3268929Z cvt.f32.bf16 %r500, %rs115; 2026-02-21T09:21:07.3269105Z cvt.f32.bf16 %r501, %rs116; 2026-02-21T09:21:07.3269288Z cvt.f32.bf16 %r518, %rs5; 2026-02-21T09:21:07.3269468Z cvt.f32.bf16 %r519, %rs6; 2026-02-21T09:21:07.3269655Z cvt.f32.bf16 %r520, %rs21; 2026-02-21T09:21:07.3269839Z cvt.f32.bf16 %r521, %rs22; 2026-02-21T09:21:07.3270011Z cvt.f32.bf16 %r538, %rs37; 2026-02-21T09:21:07.3270192Z cvt.f32.bf16 %r539, %rs38; 2026-02-21T09:21:07.3270363Z cvt.f32.bf16 %r540, %rs53; 2026-02-21T09:21:07.3270539Z cvt.f32.bf16 %r541, %rs54; 2026-02-21T09:21:07.3270724Z cvt.f32.bf16 %r558, %rs69; 2026-02-21T09:21:07.3270903Z cvt.f32.bf16 %r559, %rs70; 2026-02-21T09:21:07.3271080Z cvt.f32.bf16 %r560, %rs85; 2026-02-21T09:21:07.3271262Z cvt.f32.bf16 %r561, %rs86; 2026-02-21T09:21:07.3271440Z cvt.f32.bf16 %r578, %rs101; 2026-02-21T09:21:07.3271616Z cvt.f32.bf16 %r579, %rs102; 2026-02-21T09:21:07.3271812Z cvt.f32.bf16 %r580, %rs117; 2026-02-21T09:21:07.3271988Z cvt.f32.bf16 %r581, %rs118; 2026-02-21T09:21:07.3272169Z cvt.f32.bf16 %r598, %rs7; 2026-02-21T09:21:07.3272339Z cvt.f32.bf16 %r599, %rs8; 2026-02-21T09:21:07.3272515Z 
cvt.f32.bf16 %r600, %rs23; 2026-02-21T09:21:07.3272694Z cvt.f32.bf16 %r601, %rs24; 2026-02-21T09:21:07.3272860Z cvt.f32.bf16 %r618, %rs39; 2026-02-21T09:21:07.3273035Z cvt.f32.bf16 %r619, %rs40; 2026-02-21T09:21:07.3273212Z cvt.f32.bf16 %r620, %rs55; 2026-02-21T09:21:07.3273390Z cvt.f32.bf16 %r621, %rs56; 2026-02-21T09:21:07.3273561Z cvt.f32.bf16 %r638, %rs71; 2026-02-21T09:21:07.3273741Z cvt.f32.bf16 %r639, %rs72; 2026-02-21T09:21:07.3273911Z cvt.f32.bf16 %r640, %rs87; 2026-02-21T09:21:07.3274088Z cvt.f32.bf16 %r641, %rs88; 2026-02-21T09:21:07.3274260Z cvt.f32.bf16 %r658, %rs103; 2026-02-21T09:21:07.3274444Z cvt.f32.bf16 %r659, %rs104; 2026-02-21T09:21:07.3274636Z cvt.f32.bf16 %r660, %rs119; 2026-02-21T09:21:07.3274815Z cvt.f32.bf16 %r661, %rs120; 2026-02-21T09:21:07.3275154Z cvt.f32.bf16 %r678, %rs9; 2026-02-21T09:21:07.3275331Z cvt.f32.bf16 %r679, %rs10; 2026-02-21T09:21:07.3275525Z cvt.f32.bf16 %r680, %rs25; 2026-02-21T09:21:07.3275701Z cvt.f32.bf16 %r681, %rs26; 2026-02-21T09:21:07.3275878Z cvt.f32.bf16 %r698, %rs41; 2026-02-21T09:21:07.3276048Z cvt.f32.bf16 %r699, %rs42; 2026-02-21T09:21:07.3276223Z cvt.f32.bf16 %r700, %rs57; 2026-02-21T09:21:07.3276392Z cvt.f32.bf16 %r701, %rs58; 2026-02-21T09:21:07.3276716Z cvt.f32.bf16 %r718, %rs73; 2026-02-21T09:21:07.3276890Z cvt.f32.bf16 %r719, %rs74; 2026-02-21T09:21:07.3277061Z cvt.f32.bf16 %r720, %rs89; 2026-02-21T09:21:07.3277251Z cvt.f32.bf16 %r721, %rs90; 2026-02-21T09:21:07.3277425Z cvt.f32.bf16 %r738, %rs105; 2026-02-21T09:21:07.3277604Z cvt.f32.bf16 %r739, %rs106; 2026-02-21T09:21:07.3277860Z cvt.f32.bf16 %r740, %rs121; 2026-02-21T09:21:07.3278045Z cvt.f32.bf16 %r741, %rs122; 2026-02-21T09:21:07.3278215Z cvt.f32.bf16 %r758, %rs11; 2026-02-21T09:21:07.3278420Z cvt.f32.bf16 %r759, %rs12; 2026-02-21T09:21:07.3278594Z cvt.f32.bf16 %r760, %rs27; 2026-02-21T09:21:07.3278777Z cvt.f32.bf16 %r761, %rs28; 2026-02-21T09:21:07.3278956Z cvt.f32.bf16 %r778, %rs43; 2026-02-21T09:21:07.3279127Z cvt.f32.bf16 %r779, %rs44; 
2026-02-21T09:21:07.3279387Z cvt.f32.bf16 %r780, %rs59; 2026-02-21T09:21:07.3279567Z cvt.f32.bf16 %r781, %rs60; 2026-02-21T09:21:07.3279744Z cvt.f32.bf16 %r798, %rs75; 2026-02-21T09:21:07.3279913Z cvt.f32.bf16 %r799, %rs76; 2026-02-21T09:21:07.3280100Z cvt.f32.bf16 %r800, %rs91; 2026-02-21T09:21:07.3280271Z cvt.f32.bf16 %r801, %rs92; 2026-02-21T09:21:07.3280452Z cvt.f32.bf16 %r818, %rs107; 2026-02-21T09:21:07.3280628Z cvt.f32.bf16 %r819, %rs108; 2026-02-21T09:21:07.3280797Z cvt.f32.bf16 %r820, %rs123; 2026-02-21T09:21:07.3280970Z cvt.f32.bf16 %r821, %rs124; 2026-02-21T09:21:07.3281143Z cvt.f32.bf16 %r838, %rs13; 2026-02-21T09:21:07.3281313Z cvt.f32.bf16 %r839, %rs14; 2026-02-21T09:21:07.3281485Z cvt.f32.bf16 %r840, %rs29; 2026-02-21T09:21:07.3281659Z cvt.f32.bf16 %r841, %rs30; 2026-02-21T09:21:07.3281828Z cvt.f32.bf16 %r858, %rs45; 2026-02-21T09:21:07.3281993Z cvt.f32.bf16 %r859, %rs46; 2026-02-21T09:21:07.3282165Z cvt.f32.bf16 %r860, %rs61; 2026-02-21T09:21:07.3282333Z cvt.f32.bf16 %r861, %rs62; 2026-02-21T09:21:07.3282515Z cvt.f32.bf16 %r878, %rs77; 2026-02-21T09:21:07.3282682Z cvt.f32.bf16 %r879, %rs78; 2026-02-21T09:21:07.3282851Z cvt.f32.bf16 %r880, %rs93; 2026-02-21T09:21:07.3283018Z cvt.f32.bf16 %r881, %rs94; 2026-02-21T09:21:07.3283185Z cvt.f32.bf16 %r898, %rs109; 2026-02-21T09:21:07.3283361Z cvt.f32.bf16 %r899, %rs110; 2026-02-21T09:21:07.3283531Z cvt.f32.bf16 %r900, %rs125; 2026-02-21T09:21:07.3283711Z cvt.f32.bf16 %r901, %rs126; 2026-02-21T09:21:07.3283886Z cvt.f32.bf16 %r918, %rs15; 2026-02-21T09:21:07.3284058Z cvt.f32.bf16 %r919, %rs16; 2026-02-21T09:21:07.3284233Z cvt.f32.bf16 %r920, %rs31; 2026-02-21T09:21:07.3284406Z cvt.f32.bf16 %r921, %rs32; 2026-02-21T09:21:07.3284582Z cvt.f32.bf16 %r938, %rs47; 2026-02-21T09:21:07.3284759Z cvt.f32.bf16 %r939, %rs48; 2026-02-21T09:21:07.3284933Z cvt.f32.bf16 %r940, %rs63; 2026-02-21T09:21:07.3285109Z cvt.f32.bf16 %r941, %rs64; 2026-02-21T09:21:07.3285280Z cvt.f32.bf16 %r958, %rs79; 2026-02-21T09:21:07.3285448Z 
cvt.f32.bf16 %r959, %rs80; 2026-02-21T09:21:07.3285620Z cvt.f32.bf16 %r960, %rs95; 2026-02-21T09:21:07.3285802Z cvt.f32.bf16 %r961, %rs96; 2026-02-21T09:21:07.3285976Z cvt.f32.bf16 %r978, %rs111; 2026-02-21T09:21:07.3286145Z cvt.f32.bf16 %r979, %rs112; 2026-02-21T09:21:07.3286318Z cvt.f32.bf16 %r980, %rs127; 2026-02-21T09:21:07.3286617Z cvt.f32.bf16 %r981, %rs128; 2026-02-21T09:21:07.3286970Z .loc 1 51 125 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:51:125 2026-02-21T09:21:07.3287348Z shl.b32 %r1054, %r2142, 3; 2026-02-21T09:21:07.3287528Z add.s32 %r340, %r1160, %r1054; 2026-02-21T09:21:07.3287714Z // begin inline asm 2026-02-21T09:21:07.3287864Z 2026-02-21T09:21:07.3288154Z { 2026-02-21T09:21:07.3288285Z .reg .pred complete; 2026-02-21T09:21:07.3288449Z waitLoop: 2026-02-21T09:21:07.3288672Z mbarrier.try_wait.parity.shared.b64 complete, [%r340], %r2141; 2026-02-21T09:21:07.3288971Z @!complete bra.uni waitLoop; 2026-02-21T09:21:07.3289149Z } 2026-02-21T09:21:07.3289227Z 2026-02-21T09:21:07.3289289Z // end inline asm 2026-02-21T09:21:07.3289601Z .loc 1 64 33 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:64:33 2026-02-21T09:21:07.3289963Z shl.b32 %r1055, %r2142, 12; 2026-02-21T09:21:07.3290143Z add.s32 %r1056, %r235, %r1055; 2026-02-21T09:21:07.3290465Z .loc 1 82 58 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:82:58 2026-02-21T09:21:07.3290820Z add.s32 %r1057, %r1056, %r48; 2026-02-21T09:21:07.3291085Z ld.shared.b8 %rs129, [%r1057]; 2026-02-21T09:21:07.3291288Z ld.shared.b8 %rs130, [%r1057+256]; 2026-02-21T09:21:07.3291489Z ld.shared.b8 %rs131, [%r1057+512]; 2026-02-21T09:21:07.3291698Z ld.shared.b8 %rs132, [%r1057+768]; 2026-02-21T09:21:07.3291896Z ld.shared.b8 %rs133, [%r1057+1024]; 2026-02-21T09:21:07.3292099Z ld.shared.b8 %rs134, [%r1057+1280]; 2026-02-21T09:21:07.3292298Z ld.shared.b8 %rs135, [%r1057+1536]; 2026-02-21T09:21:07.3292574Z ld.shared.b8 %rs136, [%r1057+1792]; 2026-02-21T09:21:07.3292776Z ld.shared.b8 %rs137, 
[%r1057+2048]; 2026-02-21T09:21:07.3292970Z ld.shared.b8 %rs138, [%r1057+2304]; 2026-02-21T09:21:07.3293163Z ld.shared.b8 %rs139, [%r1057+2560]; 2026-02-21T09:21:07.3293363Z ld.shared.b8 %rs140, [%r1057+2816]; 2026-02-21T09:21:07.3293554Z ld.shared.b8 %rs141, [%r1057+3072]; 2026-02-21T09:21:07.3293743Z ld.shared.b8 %rs142, [%r1057+3328]; 2026-02-21T09:21:07.3293932Z ld.shared.b8 %rs143, [%r1057+3584]; 2026-02-21T09:21:07.3294125Z ld.shared.b8 %rs144, [%r1057+3840]; 2026-02-21T09:21:07.3294315Z xor.b32 %r95, %r48, 16; 2026-02-21T09:21:07.3294506Z add.s32 %r1058, %r1056, %r95; 2026-02-21T09:21:07.3294685Z ld.shared.b8 %rs145, [%r1058+128]; 2026-02-21T09:21:07.3294882Z ld.shared.b8 %rs146, [%r1058+384]; 2026-02-21T09:21:07.3295075Z ld.shared.b8 %rs147, [%r1058+640]; 2026-02-21T09:21:07.3295267Z ld.shared.b8 %rs148, [%r1058+896]; 2026-02-21T09:21:07.3295460Z ld.shared.b8 %rs149, [%r1058+1152]; 2026-02-21T09:21:07.3295653Z ld.shared.b8 %rs150, [%r1058+1408]; 2026-02-21T09:21:07.3295847Z ld.shared.b8 %rs151, [%r1058+1664]; 2026-02-21T09:21:07.3296039Z ld.shared.b8 %rs152, [%r1058+1920]; 2026-02-21T09:21:07.3296249Z ld.shared.b8 %rs153, [%r1058+2176]; 2026-02-21T09:21:07.3296439Z ld.shared.b8 %rs154, [%r1058+2432]; 2026-02-21T09:21:07.3296762Z ld.shared.b8 %rs155, [%r1058+2688]; 2026-02-21T09:21:07.3296954Z ld.shared.b8 %rs156, [%r1058+2944]; 2026-02-21T09:21:07.3297143Z ld.shared.b8 %rs157, [%r1058+3200]; 2026-02-21T09:21:07.3297341Z ld.shared.b8 %rs158, [%r1058+3456]; 2026-02-21T09:21:07.3297542Z ld.shared.b8 %rs159, [%r1058+3712]; 2026-02-21T09:21:07.3297738Z ld.shared.b8 %rs160, [%r1058+3968]; 2026-02-21T09:21:07.3298069Z .loc 1 67 28 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:67:28 2026-02-21T09:21:07.3298423Z shl.b16 %rs161, %rs129, 4; 2026-02-21T09:21:07.3298600Z shl.b16 %rs162, %rs145, 4; 2026-02-21T09:21:07.3298773Z shl.b16 %rs163, %rs130, 4; 2026-02-21T09:21:07.3298944Z shl.b16 %rs164, %rs146, 4; 2026-02-21T09:21:07.3299111Z shl.b16 %rs165, 
%rs131, 4; 2026-02-21T09:21:07.3299281Z shl.b16 %rs166, %rs147, 4; 2026-02-21T09:21:07.3299447Z shl.b16 %rs167, %rs132, 4; 2026-02-21T09:21:07.3299618Z shl.b16 %rs168, %rs148, 4; 2026-02-21T09:21:07.3299786Z shl.b16 %rs169, %rs133, 4; 2026-02-21T09:21:07.3299970Z shl.b16 %rs170, %rs149, 4; 2026-02-21T09:21:07.3300137Z shl.b16 %rs171, %rs134, 4; 2026-02-21T09:21:07.3300306Z shl.b16 %rs172, %rs150, 4; 2026-02-21T09:21:07.3300476Z shl.b16 %rs173, %rs135, 4; 2026-02-21T09:21:07.3300647Z shl.b16 %rs174, %rs151, 4; 2026-02-21T09:21:07.3300819Z shl.b16 %rs175, %rs136, 4; 2026-02-21T09:21:07.3301083Z shl.b16 %rs176, %rs152, 4; 2026-02-21T09:21:07.3301323Z shl.b16 %rs177, %rs137, 4; 2026-02-21T09:21:07.3301502Z shl.b16 %rs178, %rs153, 4; 2026-02-21T09:21:07.3301676Z shl.b16 %rs179, %rs138, 4; 2026-02-21T09:21:07.3301844Z shl.b16 %rs180, %rs154, 4; 2026-02-21T09:21:07.3302015Z shl.b16 %rs181, %rs139, 4; 2026-02-21T09:21:07.3302181Z shl.b16 %rs182, %rs155, 4; 2026-02-21T09:21:07.3302351Z shl.b16 %rs183, %rs140, 4; 2026-02-21T09:21:07.3302518Z shl.b16 %rs184, %rs156, 4; 2026-02-21T09:21:07.3302690Z shl.b16 %rs185, %rs141, 4; 2026-02-21T09:21:07.3302859Z shl.b16 %rs186, %rs157, 4; 2026-02-21T09:21:07.3303024Z shl.b16 %rs187, %rs142, 4; 2026-02-21T09:21:07.3303210Z shl.b16 %rs188, %rs158, 4; 2026-02-21T09:21:07.3303377Z shl.b16 %rs189, %rs143, 4; 2026-02-21T09:21:07.3303635Z shl.b16 %rs190, %rs159, 4; 2026-02-21T09:21:07.3303806Z shl.b16 %rs191, %rs144, 4; 2026-02-21T09:21:07.3303974Z shl.b16 %rs192, %rs160, 4; 2026-02-21T09:21:07.3304283Z .loc 1 82 58 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:82:58 2026-02-21T09:21:07.3304665Z selp.b16 %rs193, %rs161, %rs129, %p77; 2026-02-21T09:21:07.3304870Z cvt.s16.s8 %rs194, %rs193; 2026-02-21T09:21:07.3305037Z shr.s16 %rs195, %rs194, 4; 2026-02-21T09:21:07.3305284Z selp.b16 %rs196, %rs162, %rs145, %p77; 2026-02-21T09:21:07.3305481Z cvt.s16.s8 %rs197, %rs196; 2026-02-21T09:21:07.3305652Z shr.s16 %rs198, %rs197, 4; 
2026-02-21T09:21:07.3305841Z selp.b16 %rs199, %rs163, %rs130, %p77; 2026-02-21T09:21:07.3306039Z cvt.s16.s8 %rs200, %rs199; 2026-02-21T09:21:07.3306205Z shr.s16 %rs201, %rs200, 4; 2026-02-21T09:21:07.3306382Z selp.b16 %rs202, %rs164, %rs146, %p77; 2026-02-21T09:21:07.3306695Z cvt.s16.s8 %rs203, %rs202; 2026-02-21T09:21:07.3306881Z shr.s16 %rs204, %rs203, 4; 2026-02-21T09:21:07.3307064Z selp.b16 %rs205, %rs165, %rs131, %p77; 2026-02-21T09:21:07.3307258Z cvt.s16.s8 %rs206, %rs205; 2026-02-21T09:21:07.3307426Z shr.s16 %rs207, %rs206, 4; 2026-02-21T09:21:07.3307603Z selp.b16 %rs208, %rs166, %rs147, %p77; 2026-02-21T09:21:07.3307814Z cvt.s16.s8 %rs209, %rs208; 2026-02-21T09:21:07.3307984Z shr.s16 %rs210, %rs209, 4; 2026-02-21T09:21:07.3308161Z selp.b16 %rs211, %rs167, %rs132, %p77; 2026-02-21T09:21:07.3308416Z cvt.s16.s8 %rs212, %rs211; 2026-02-21T09:21:07.3308591Z shr.s16 %rs213, %rs212, 4; 2026-02-21T09:21:07.3308770Z selp.b16 %rs214, %rs168, %rs148, %p77; 2026-02-21T09:21:07.3308962Z cvt.s16.s8 %rs215, %rs214; 2026-02-21T09:21:07.3309131Z shr.s16 %rs216, %rs215, 4; 2026-02-21T09:21:07.3309305Z selp.b16 %rs217, %rs169, %rs133, %p77; 2026-02-21T09:21:07.3309503Z cvt.s16.s8 %rs218, %rs217; 2026-02-21T09:21:07.3309670Z shr.s16 %rs219, %rs218, 4; 2026-02-21T09:21:07.3309849Z selp.b16 %rs220, %rs170, %rs149, %p77; 2026-02-21T09:21:07.3310043Z cvt.s16.s8 %rs221, %rs220; 2026-02-21T09:21:07.3310215Z shr.s16 %rs222, %rs221, 4; 2026-02-21T09:21:07.3310391Z selp.b16 %rs223, %rs171, %rs134, %p77; 2026-02-21T09:21:07.3310593Z cvt.s16.s8 %rs224, %rs223; 2026-02-21T09:21:07.3310780Z shr.s16 %rs225, %rs224, 4; 2026-02-21T09:21:07.3310959Z selp.b16 %rs226, %rs172, %rs150, %p77; 2026-02-21T09:21:07.3311159Z cvt.s16.s8 %rs227, %rs226; 2026-02-21T09:21:07.3311338Z shr.s16 %rs228, %rs227, 4; 2026-02-21T09:21:07.3311524Z selp.b16 %rs229, %rs173, %rs135, %p77; 2026-02-21T09:21:07.3311721Z cvt.s16.s8 %rs230, %rs229; 2026-02-21T09:21:07.3311891Z shr.s16 %rs231, %rs230, 4; 
2026-02-21T09:21:07.3312064Z selp.b16 %rs232, %rs174, %rs151, %p77; 2026-02-21T09:21:07.3312265Z cvt.s16.s8 %rs233, %rs232; 2026-02-21T09:21:07.3312433Z shr.s16 %rs234, %rs233, 4; 2026-02-21T09:21:07.3312625Z selp.b16 %rs235, %rs175, %rs136, %p77; 2026-02-21T09:21:07.3312826Z cvt.s16.s8 %rs236, %rs235; 2026-02-21T09:21:07.3312995Z shr.s16 %rs237, %rs236, 4; 2026-02-21T09:21:07.3313178Z selp.b16 %rs238, %rs176, %rs152, %p77; 2026-02-21T09:21:07.3313373Z cvt.s16.s8 %rs239, %rs238; 2026-02-21T09:21:07.3313540Z shr.s16 %rs240, %rs239, 4; 2026-02-21T09:21:07.3313822Z selp.b16 %rs241, %rs177, %rs137, %p77; 2026-02-21T09:21:07.3314100Z cvt.s16.s8 %rs242, %rs241; 2026-02-21T09:21:07.3314267Z shr.s16 %rs243, %rs242, 4; 2026-02-21T09:21:07.3314445Z selp.b16 %rs244, %rs178, %rs153, %p77; 2026-02-21T09:21:07.3314638Z cvt.s16.s8 %rs245, %rs244; 2026-02-21T09:21:07.3314812Z shr.s16 %rs246, %rs245, 4; 2026-02-21T09:21:07.3314990Z selp.b16 %rs247, %rs179, %rs138, %p77; 2026-02-21T09:21:07.3315181Z cvt.s16.s8 %rs248, %rs247; 2026-02-21T09:21:07.3315367Z shr.s16 %rs249, %rs248, 4; 2026-02-21T09:21:07.3315542Z selp.b16 %rs250, %rs180, %rs154, %p77; 2026-02-21T09:21:07.3315737Z cvt.s16.s8 %rs251, %rs250; 2026-02-21T09:21:07.3315901Z shr.s16 %rs252, %rs251, 4; 2026-02-21T09:21:07.3316082Z selp.b16 %rs253, %rs181, %rs139, %p77; 2026-02-21T09:21:07.3316358Z cvt.s16.s8 %rs254, %rs253; 2026-02-21T09:21:07.3316677Z shr.s16 %rs255, %rs254, 4; 2026-02-21T09:21:07.3316857Z selp.b16 %rs256, %rs182, %rs155, %p77; 2026-02-21T09:21:07.3317051Z cvt.s16.s8 %rs257, %rs256; 2026-02-21T09:21:07.3317240Z shr.s16 %rs258, %rs257, 4; 2026-02-21T09:21:07.3317413Z selp.b16 %rs259, %rs183, %rs140, %p77; 2026-02-21T09:21:07.3317609Z cvt.s16.s8 %rs260, %rs259; 2026-02-21T09:21:07.3317784Z shr.s16 %rs261, %rs260, 4; 2026-02-21T09:21:07.3318041Z selp.b16 %rs262, %rs184, %rs156, %p77; 2026-02-21T09:21:07.3318239Z cvt.s16.s8 %rs263, %rs262; 2026-02-21T09:21:07.3318410Z shr.s16 %rs264, %rs263, 4; 
2026-02-21T09:21:07.3318589Z selp.b16 %rs265, %rs185, %rs141, %p77; 2026-02-21T09:21:07.3318782Z cvt.s16.s8 %rs266, %rs265; 2026-02-21T09:21:07.3318968Z shr.s16 %rs267, %rs266, 4; 2026-02-21T09:21:07.3319143Z selp.b16 %rs268, %rs186, %rs157, %p77; 2026-02-21T09:21:07.3319339Z cvt.s16.s8 %rs269, %rs268; 2026-02-21T09:21:07.3319505Z shr.s16 %rs270, %rs269, 4; 2026-02-21T09:21:07.3319682Z selp.b16 %rs271, %rs187, %rs142, %p77; 2026-02-21T09:21:07.3319882Z cvt.s16.s8 %rs272, %rs271; 2026-02-21T09:21:07.3320057Z shr.s16 %rs273, %rs272, 4; 2026-02-21T09:21:07.3320233Z selp.b16 %rs274, %rs188, %rs158, %p77; 2026-02-21T09:21:07.3320451Z cvt.s16.s8 %rs275, %rs274; 2026-02-21T09:21:07.3320624Z shr.s16 %rs276, %rs275, 4; 2026-02-21T09:21:07.3320801Z selp.b16 %rs277, %rs189, %rs143, %p77; 2026-02-21T09:21:07.3321009Z cvt.s16.s8 %rs278, %rs277; 2026-02-21T09:21:07.3321183Z shr.s16 %rs279, %rs278, 4; 2026-02-21T09:21:07.3321367Z selp.b16 %rs280, %rs190, %rs159, %p77; 2026-02-21T09:21:07.3321561Z cvt.s16.s8 %rs281, %rs280; 2026-02-21T09:21:07.3321735Z shr.s16 %rs282, %rs281, 4; 2026-02-21T09:21:07.3321913Z selp.b16 %rs283, %rs191, %rs144, %p77; 2026-02-21T09:21:07.3322115Z cvt.s16.s8 %rs284, %rs283; 2026-02-21T09:21:07.3322294Z shr.s16 %rs285, %rs284, 4; 2026-02-21T09:21:07.3322475Z selp.b16 %rs286, %rs192, %rs160, %p77; 2026-02-21T09:21:07.3322673Z cvt.s16.s8 %rs287, %rs286; 2026-02-21T09:21:07.3322844Z shr.s16 %rs288, %rs287, 4; 2026-02-21T09:21:07.3323165Z .loc 1 87 32 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:87:32 2026-02-21T09:21:07.3323531Z cvt.rn.f32.s16 %r1059, %rs195; 2026-02-21T09:21:07.3323729Z cvt.rn.f32.s16 %r1060, %rs198; 2026-02-21T09:21:07.3323913Z cvt.rn.f32.s16 %r1061, %rs201; 2026-02-21T09:21:07.3324094Z cvt.rn.f32.s16 %r1062, %rs204; 2026-02-21T09:21:07.3324279Z cvt.rn.f32.s16 %r1063, %rs207; 2026-02-21T09:21:07.3324464Z cvt.rn.f32.s16 %r1064, %rs210; 2026-02-21T09:21:07.3324643Z cvt.rn.f32.s16 %r1065, %rs213; 2026-02-21T09:21:07.3324819Z 
cvt.rn.f32.s16 %r1066, %rs216; 2026-02-21T09:21:07.3325001Z cvt.rn.f32.s16 %r1067, %rs219; 2026-02-21T09:21:07.3325179Z cvt.rn.f32.s16 %r1068, %rs222; 2026-02-21T09:21:07.3325362Z cvt.rn.f32.s16 %r1069, %rs225; 2026-02-21T09:21:07.3325540Z cvt.rn.f32.s16 %r1070, %rs228; 2026-02-21T09:21:07.3325721Z cvt.rn.f32.s16 %r1071, %rs231; 2026-02-21T09:21:07.3325899Z cvt.rn.f32.s16 %r1072, %rs234; 2026-02-21T09:21:07.3326082Z cvt.rn.f32.s16 %r1073, %rs237; 2026-02-21T09:21:07.3326264Z cvt.rn.f32.s16 %r1074, %rs240; 2026-02-21T09:21:07.3326645Z cvt.rn.f32.s16 %r1075, %rs243; 2026-02-21T09:21:07.3326918Z cvt.rn.f32.s16 %r1076, %rs246; 2026-02-21T09:21:07.3326981Z cvt.rn.f32.s16 %r1077, %rs249; 2026-02-21T09:21:07.3327043Z cvt.rn.f32.s16 %r1078, %rs252; 2026-02-21T09:21:07.3327110Z cvt.rn.f32.s16 %r1079, %rs255; 2026-02-21T09:21:07.3327185Z cvt.rn.f32.s16 %r1080, %rs258; 2026-02-21T09:21:07.3327248Z cvt.rn.f32.s16 %r1081, %rs261; 2026-02-21T09:21:07.3327310Z cvt.rn.f32.s16 %r1082, %rs264; 2026-02-21T09:21:07.3327376Z cvt.rn.f32.s16 %r1083, %rs267; 2026-02-21T09:21:07.3327436Z cvt.rn.f32.s16 %r1084, %rs270; 2026-02-21T09:21:07.3327498Z cvt.rn.f32.s16 %r1085, %rs273; 2026-02-21T09:21:07.3327565Z cvt.rn.f32.s16 %r1086, %rs276; 2026-02-21T09:21:07.3327631Z cvt.rn.f32.s16 %r1087, %rs279; 2026-02-21T09:21:07.3327777Z cvt.rn.f32.s16 %r1088, %rs282; 2026-02-21T09:21:07.3327849Z cvt.rn.f32.s16 %r1089, %rs285; 2026-02-21T09:21:07.3327919Z cvt.rn.f32.s16 %r1090, %rs288; 2026-02-21T09:21:07.3327988Z st.shared.b32 [%r49], %r1059; 2026-02-21T09:21:07.3328064Z st.shared.b32 [%r49+4096], %r1063; 2026-02-21T09:21:07.3328134Z st.shared.b32 [%r49+8192], %r1067; 2026-02-21T09:21:07.3328205Z st.shared.b32 [%r49+12288], %r1071; 2026-02-21T09:21:07.3328272Z st.shared.b32 [%r49+16384], %r1075; 2026-02-21T09:21:07.3328412Z st.shared.b32 [%r49+20480], %r1079; 2026-02-21T09:21:07.3328488Z st.shared.b32 [%r49+24576], %r1083; 2026-02-21T09:21:07.3328553Z st.shared.b32 [%r49+28672], %r1087; 
2026-02-21T09:21:07.3328622Z st.shared.b32 [%r50], %r1060; 2026-02-21T09:21:07.3328690Z st.shared.b32 [%r50+4096], %r1064; 2026-02-21T09:21:07.3328754Z st.shared.b32 [%r50+8192], %r1068; 2026-02-21T09:21:07.3328820Z st.shared.b32 [%r50+12288], %r1072; 2026-02-21T09:21:07.3328899Z st.shared.b32 [%r50+16384], %r1076; 2026-02-21T09:21:07.3328969Z st.shared.b32 [%r50+20480], %r1080; 2026-02-21T09:21:07.3329037Z st.shared.b32 [%r50+24576], %r1084; 2026-02-21T09:21:07.3329102Z st.shared.b32 [%r50+28672], %r1088; 2026-02-21T09:21:07.3329174Z st.shared.b32 [%r51], %r1061; 2026-02-21T09:21:07.3329240Z st.shared.b32 [%r51+4096], %r1065; 2026-02-21T09:21:07.3329305Z st.shared.b32 [%r51+8192], %r1069; 2026-02-21T09:21:07.3329376Z st.shared.b32 [%r51+12288], %r1073; 2026-02-21T09:21:07.3329443Z st.shared.b32 [%r51+16384], %r1077; 2026-02-21T09:21:07.3329509Z st.shared.b32 [%r51+20480], %r1081; 2026-02-21T09:21:07.3329574Z st.shared.b32 [%r51+24576], %r1085; 2026-02-21T09:21:07.3329644Z st.shared.b32 [%r51+28672], %r1089; 2026-02-21T09:21:07.3329718Z st.shared.b32 [%r52], %r1062; 2026-02-21T09:21:07.3329786Z st.shared.b32 [%r52+4096], %r1066; 2026-02-21T09:21:07.3329857Z st.shared.b32 [%r52+8192], %r1070; 2026-02-21T09:21:07.3329921Z st.shared.b32 [%r52+12288], %r1074; 2026-02-21T09:21:07.3329986Z st.shared.b32 [%r52+16384], %r1078; 2026-02-21T09:21:07.3330056Z st.shared.b32 [%r52+20480], %r1082; 2026-02-21T09:21:07.3330132Z st.shared.b32 [%r52+24576], %r1086; 2026-02-21T09:21:07.3330198Z st.shared.b32 [%r52+28672], %r1090; 2026-02-21T09:21:07.3330258Z $L__tmp1: 2026-02-21T09:21:07.3330552Z .loc 2 291 36 // standard.py:291:36 @[ ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:94:40 ] 2026-02-21T09:21:07.3330617Z // begin inline asm 2026-02-21T09:21:07.3330705Z fence.proxy.async.shared::cta; 2026-02-21T09:21:07.3330768Z // end inline asm 2026-02-21T09:21:07.3330826Z bar.sync 0; 2026-02-21T09:21:07.3330915Z shfl.sync.idx.b32 %r1091, %r4, 0, 31, -1; 
2026-02-21T09:21:07.3330988Z wgmma.fence.sync.aligned; 2026-02-21T09:21:07.3331060Z shl.b32 %r1092, %r1091, 9; 2026-02-21T09:21:07.3331131Z and.b32 %r1093, %r1092, 2048; 2026-02-21T09:21:07.3331197Z add.s32 %r1094, %r1093, %r180; 2026-02-21T09:21:07.3331268Z bfe.u32 %r1095, %r1094, 4, 14; 2026-02-21T09:21:07.3331335Z cvt.u64.u32 %rd162, %r1095; 2026-02-21T09:21:07.3331414Z or.b64 %rd113, %rd162, 4611686293322072064; 2026-02-21T09:21:07.3331488Z mov.pred %p43, -1; 2026-02-21T09:21:07.3331549Z // begin inline asm 2026-02-21T09:21:07.3332061Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r358,%r359,%r360,%r361}, %rd113, %p43, 1, 1; 2026-02-21T09:21:07.3332120Z // end inline asm 2026-02-21T09:21:07.3332186Z add.s32 %r1096, %r1094, 32; 2026-02-21T09:21:07.3332260Z bfe.u32 %r1097, %r1096, 4, 14; 2026-02-21T09:21:07.3332326Z cvt.u64.u32 %rd163, %r1097; 2026-02-21T09:21:07.3332406Z or.b64 %rd114, %rd163, 4611686293322072064; 2026-02-21T09:21:07.3332466Z // begin inline asm 2026-02-21T09:21:07.3332838Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r378,%r379,%r380,%r381}, %rd114, %p43, 1, 1; 2026-02-21T09:21:07.3332902Z // end inline asm 2026-02-21T09:21:07.3332964Z add.s32 %r1098, %r1094, 64; 2026-02-21T09:21:07.3333082Z bfe.u32 %r1099, %r1098, 4, 14; 2026-02-21T09:21:07.3333147Z cvt.u64.u32 %rd164, %r1099; 2026-02-21T09:21:07.3333226Z or.b64 %rd115, %rd164, 4611686293322072064; 2026-02-21T09:21:07.3333291Z // begin inline asm 2026-02-21T09:21:07.3333655Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r398,%r399,%r400,%r401}, %rd115, %p43, 1, 1; 2026-02-21T09:21:07.3333716Z // end inline asm 2026-02-21T09:21:07.3333829Z add.s32 %r1100, %r1094, 96; 2026-02-21T09:21:07.3333895Z bfe.u32 %r1101, %r1100, 4, 14; 2026-02-21T09:21:07.3333957Z cvt.u64.u32 %rd165, %r1101; 
2026-02-21T09:21:07.3334034Z or.b64 %rd116, %rd165, 4611686293322072064; 2026-02-21T09:21:07.3334094Z // begin inline asm 2026-02-21T09:21:07.3334465Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r418,%r419,%r420,%r421}, %rd116, %p43, 1, 1; 2026-02-21T09:21:07.3334538Z // end inline asm 2026-02-21T09:21:07.3334604Z add.s32 %r1102, %r1094, 4096; 2026-02-21T09:21:07.3334669Z bfe.u32 %r1103, %r1102, 4, 14; 2026-02-21T09:21:07.3334740Z cvt.u64.u32 %rd166, %r1103; 2026-02-21T09:21:07.3334820Z or.b64 %rd117, %rd166, 4611686293322072064; 2026-02-21T09:21:07.3334884Z // begin inline asm 2026-02-21T09:21:07.3335250Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r438,%r439,%r440,%r441}, %rd117, %p43, 1, 1; 2026-02-21T09:21:07.3335319Z // end inline asm 2026-02-21T09:21:07.3335383Z add.s32 %r1104, %r1094, 4128; 2026-02-21T09:21:07.3335445Z bfe.u32 %r1105, %r1104, 4, 14; 2026-02-21T09:21:07.3335513Z cvt.u64.u32 %rd167, %r1105; 2026-02-21T09:21:07.3335590Z or.b64 %rd118, %rd167, 4611686293322072064; 2026-02-21T09:21:07.3335650Z // begin inline asm 2026-02-21T09:21:07.3336027Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r458,%r459,%r460,%r461}, %rd118, %p43, 1, 1; 2026-02-21T09:21:07.3336087Z // end inline asm 2026-02-21T09:21:07.3336149Z add.s32 %r1106, %r1094, 4160; 2026-02-21T09:21:07.3336210Z bfe.u32 %r1107, %r1106, 4, 14; 2026-02-21T09:21:07.3336284Z cvt.u64.u32 %rd168, %r1107; 2026-02-21T09:21:07.3336361Z or.b64 %rd119, %rd168, 4611686293322072064; 2026-02-21T09:21:07.3336420Z // begin inline asm 2026-02-21T09:21:07.3336920Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r478,%r479,%r480,%r481}, %rd119, %p43, 1, 1; 2026-02-21T09:21:07.3336982Z // end inline asm 2026-02-21T09:21:07.3337044Z add.s32 %r1108, 
%r1094, 4192; 2026-02-21T09:21:07.3337113Z bfe.u32 %r1109, %r1108, 4, 14; 2026-02-21T09:21:07.3337175Z cvt.u64.u32 %rd169, %r1109; 2026-02-21T09:21:07.3337247Z or.b64 %rd120, %rd169, 4611686293322072064; 2026-02-21T09:21:07.3337319Z // begin inline asm 2026-02-21T09:21:07.3337684Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r498,%r499,%r500,%r501}, %rd120, %p43, 1, 1; 2026-02-21T09:21:07.3337742Z // end inline asm 2026-02-21T09:21:07.3337804Z add.s32 %r1110, %r1094, 8192; 2026-02-21T09:21:07.3338039Z bfe.u32 %r1111, %r1110, 4, 14; 2026-02-21T09:21:07.3338104Z cvt.u64.u32 %rd170, %r1111; 2026-02-21T09:21:07.3338178Z or.b64 %rd121, %rd170, 4611686293322072064; 2026-02-21T09:21:07.3338244Z // begin inline asm 2026-02-21T09:21:07.3338605Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r518,%r519,%r520,%r521}, %rd121, %p43, 1, 1; 2026-02-21T09:21:07.3338665Z // end inline asm 2026-02-21T09:21:07.3338726Z add.s32 %r1112, %r1094, 8224; 2026-02-21T09:21:07.3338794Z bfe.u32 %r1113, %r1112, 4, 14; 2026-02-21T09:21:07.3338856Z cvt.u64.u32 %rd171, %r1113; 2026-02-21T09:21:07.3338928Z or.b64 %rd122, %rd171, 4611686293322072064; 2026-02-21T09:21:07.3338994Z // begin inline asm 2026-02-21T09:21:07.3339422Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r538,%r539,%r540,%r541}, %rd122, %p43, 1, 1; 2026-02-21T09:21:07.3339485Z // end inline asm 2026-02-21T09:21:07.3339557Z add.s32 %r1114, %r1094, 8256; 2026-02-21T09:21:07.3339621Z bfe.u32 %r1115, %r1114, 4, 14; 2026-02-21T09:21:07.3339683Z cvt.u64.u32 %rd172, %r1115; 2026-02-21T09:21:07.3339758Z or.b64 %rd123, %rd172, 4611686293322072064; 2026-02-21T09:21:07.3339824Z // begin inline asm 2026-02-21T09:21:07.3340254Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, 
{%r558,%r559,%r560,%r561}, %rd123, %p43, 1, 1; 2026-02-21T09:21:07.3340318Z // end inline asm 2026-02-21T09:21:07.3340384Z add.s32 %r1116, %r1094, 8288; 2026-02-21T09:21:07.3340446Z bfe.u32 %r1117, %r1116, 4, 14; 2026-02-21T09:21:07.3340508Z cvt.u64.u32 %rd173, %r1117; 2026-02-21T09:21:07.3340596Z or.b64 %rd124, %rd173, 4611686293322072064; 2026-02-21T09:21:07.3340659Z // begin inline asm 2026-02-21T09:21:07.3341022Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r578,%r579,%r580,%r581}, %rd124, %p43, 1, 1; 2026-02-21T09:21:07.3341083Z // end inline asm 2026-02-21T09:21:07.3341153Z add.s32 %r1118, %r1094, 12288; 2026-02-21T09:21:07.3341213Z bfe.u32 %r1119, %r1118, 4, 14; 2026-02-21T09:21:07.3341285Z cvt.u64.u32 %rd174, %r1119; 2026-02-21T09:21:07.3341369Z or.b64 %rd125, %rd174, 4611686293322072064; 2026-02-21T09:21:07.3341430Z // begin inline asm 2026-02-21T09:21:07.3341791Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r598,%r599,%r600,%r601}, %rd125, %p43, 1, 1; 2026-02-21T09:21:07.3341853Z // end inline asm 2026-02-21T09:21:07.3341914Z add.s32 %r1120, %r1094, 12320; 2026-02-21T09:21:07.3341974Z bfe.u32 %r1121, %r1120, 4, 14; 2026-02-21T09:21:07.3342036Z cvt.u64.u32 %rd175, %r1121; 2026-02-21T09:21:07.3342113Z or.b64 %rd126, %rd175, 4611686293322072064; 2026-02-21T09:21:07.3342176Z // begin inline asm 2026-02-21T09:21:07.3342536Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r618,%r619,%r620,%r621}, %rd126, %p43, 1, 1; 2026-02-21T09:21:07.3342602Z // end inline asm 2026-02-21T09:21:07.3342663Z add.s32 %r1122, %r1094, 12352; 2026-02-21T09:21:07.3342722Z bfe.u32 %r1123, %r1122, 4, 14; 2026-02-21T09:21:07.3342788Z cvt.u64.u32 %rd176, %r1123; 2026-02-21T09:21:07.3342861Z or.b64 %rd127, %rd176, 4611686293322072064; 2026-02-21T09:21:07.3342918Z // begin inline asm 
2026-02-21T09:21:07.3343285Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r638,%r639,%r640,%r641}, %rd127, %p43, 1, 1; 2026-02-21T09:21:07.3343349Z // end inline asm 2026-02-21T09:21:07.3343410Z add.s32 %r1124, %r1094, 12384; 2026-02-21T09:21:07.3343470Z bfe.u32 %r1125, %r1124, 4, 14; 2026-02-21T09:21:07.3343536Z cvt.u64.u32 %rd177, %r1125; 2026-02-21T09:21:07.3343619Z or.b64 %rd128, %rd177, 4611686293322072064; 2026-02-21T09:21:07.3343684Z // begin inline asm 2026-02-21T09:21:07.3344046Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r658,%r659,%r660,%r661}, %rd128, %p43, 1, 1; 2026-02-21T09:21:07.3344214Z // end inline asm 2026-02-21T09:21:07.3344276Z add.s32 %r1126, %r1094, 16384; 2026-02-21T09:21:07.3344336Z bfe.u32 %r1127, %r1126, 4, 14; 2026-02-21T09:21:07.3344406Z cvt.u64.u32 %rd178, %r1127; 2026-02-21T09:21:07.3344478Z or.b64 %rd129, %rd178, 4611686293322072064; 2026-02-21T09:21:07.3344538Z // begin inline asm 2026-02-21T09:21:07.3344902Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r678,%r679,%r680,%r681}, %rd129, %p43, 1, 1; 2026-02-21T09:21:07.3344958Z // end inline asm 2026-02-21T09:21:07.3345020Z add.s32 %r1128, %r1094, 16416; 2026-02-21T09:21:07.3345085Z bfe.u32 %r1129, %r1128, 4, 14; 2026-02-21T09:21:07.3345202Z cvt.u64.u32 %rd179, %r1129; 2026-02-21T09:21:07.3345280Z or.b64 %rd130, %rd179, 4611686293322072064; 2026-02-21T09:21:07.3345340Z // begin inline asm 2026-02-21T09:21:07.3345707Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r698,%r699,%r700,%r701}, %rd130, %p43, 1, 1; 2026-02-21T09:21:07.3345765Z // end inline asm 2026-02-21T09:21:07.3345826Z add.s32 %r1130, %r1094, 16448; 2026-02-21T09:21:07.3345944Z bfe.u32 %r1131, %r1130, 4, 14; 2026-02-21T09:21:07.3346010Z cvt.u64.u32 
%rd180, %r1131; 2026-02-21T09:21:07.3346082Z or.b64 %rd131, %rd180, 4611686293322072064; 2026-02-21T09:21:07.3346141Z // begin inline asm 2026-02-21T09:21:07.3346623Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r718,%r719,%r720,%r721}, %rd131, %p43, 1, 1; 2026-02-21T09:21:07.3346685Z // end inline asm 2026-02-21T09:21:07.3346748Z add.s32 %r1132, %r1094, 16480; 2026-02-21T09:21:07.3346814Z bfe.u32 %r1133, %r1132, 4, 14; 2026-02-21T09:21:07.3346877Z cvt.u64.u32 %rd181, %r1133; 2026-02-21T09:21:07.3346949Z or.b64 %rd132, %rd181, 4611686293322072064; 2026-02-21T09:21:07.3347028Z // begin inline asm 2026-02-21T09:21:07.3347389Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r738,%r739,%r740,%r741}, %rd132, %p43, 1, 1; 2026-02-21T09:21:07.3347448Z // end inline asm 2026-02-21T09:21:07.3347510Z add.s32 %r1134, %r1094, 20480; 2026-02-21T09:21:07.3347576Z bfe.u32 %r1135, %r1134, 4, 14; 2026-02-21T09:21:07.3347637Z cvt.u64.u32 %rd182, %r1135; 2026-02-21T09:21:07.3347711Z or.b64 %rd133, %rd182, 4611686293322072064; 2026-02-21T09:21:07.3347775Z // begin inline asm 2026-02-21T09:21:07.3348133Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r758,%r759,%r760,%r761}, %rd133, %p43, 1, 1; 2026-02-21T09:21:07.3348194Z // end inline asm 2026-02-21T09:21:07.3348259Z add.s32 %r1136, %r1094, 20512; 2026-02-21T09:21:07.3348376Z bfe.u32 %r1137, %r1136, 4, 14; 2026-02-21T09:21:07.3348441Z cvt.u64.u32 %rd183, %r1137; 2026-02-21T09:21:07.3348517Z or.b64 %rd134, %rd183, 4611686293322072064; 2026-02-21T09:21:07.3348585Z // begin inline asm 2026-02-21T09:21:07.3348943Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r778,%r779,%r780,%r781}, %rd134, %p43, 1, 1; 2026-02-21T09:21:07.3349001Z // end inline asm 2026-02-21T09:21:07.3349066Z 
add.s32 %r1138, %r1094, 20544; 2026-02-21T09:21:07.3349129Z bfe.u32 %r1139, %r1138, 4, 14; 2026-02-21T09:21:07.3349191Z cvt.u64.u32 %rd184, %r1139; 2026-02-21T09:21:07.3349268Z or.b64 %rd135, %rd184, 4611686293322072064; 2026-02-21T09:21:07.3349326Z // begin inline asm 2026-02-21T09:21:07.3349694Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r798,%r799,%r800,%r801}, %rd135, %p43, 1, 1; 2026-02-21T09:21:07.3349753Z // end inline asm 2026-02-21T09:21:07.3349819Z add.s32 %r1140, %r1094, 20576; 2026-02-21T09:21:07.3349880Z bfe.u32 %r1141, %r1140, 4, 14; 2026-02-21T09:21:07.3350097Z cvt.u64.u32 %rd185, %r1141; 2026-02-21T09:21:07.3350184Z or.b64 %rd136, %rd185, 4611686293322072064; 2026-02-21T09:21:07.3350245Z // begin inline asm 2026-02-21T09:21:07.3350613Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r818,%r819,%r820,%r821}, %rd136, %p43, 1, 1; 2026-02-21T09:21:07.3350675Z // end inline asm 2026-02-21T09:21:07.3350741Z add.s32 %r1142, %r1094, 24576; 2026-02-21T09:21:07.3350804Z bfe.u32 %r1143, %r1142, 4, 14; 2026-02-21T09:21:07.3350870Z cvt.u64.u32 %rd186, %r1143; 2026-02-21T09:21:07.3350952Z or.b64 %rd137, %rd186, 4611686293322072064; 2026-02-21T09:21:07.3351013Z // begin inline asm 2026-02-21T09:21:07.3351442Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r838,%r839,%r840,%r841}, %rd137, %p43, 1, 1; 2026-02-21T09:21:07.3351508Z // end inline asm 2026-02-21T09:21:07.3351575Z add.s32 %r1144, %r1094, 24608; 2026-02-21T09:21:07.3351641Z bfe.u32 %r1145, %r1144, 4, 14; 2026-02-21T09:21:07.3351711Z cvt.u64.u32 %rd187, %r1145; 2026-02-21T09:21:07.3351786Z or.b64 %rd138, %rd187, 4611686293322072064; 2026-02-21T09:21:07.3351851Z // begin inline asm 2026-02-21T09:21:07.3352280Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 
{%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r858,%r859,%r860,%r861}, %rd138, %p43, 1, 1; 2026-02-21T09:21:07.3352350Z // end inline asm 2026-02-21T09:21:07.3352412Z add.s32 %r1146, %r1094, 24640; 2026-02-21T09:21:07.3352473Z bfe.u32 %r1147, %r1146, 4, 14; 2026-02-21T09:21:07.3352542Z cvt.u64.u32 %rd188, %r1147; 2026-02-21T09:21:07.3352615Z or.b64 %rd139, %rd188, 4611686293322072064; 2026-02-21T09:21:07.3352676Z // begin inline asm 2026-02-21T09:21:07.3353052Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r878,%r879,%r880,%r881}, %rd139, %p43, 1, 1; 2026-02-21T09:21:07.3353112Z // end inline asm 2026-02-21T09:21:07.3353180Z add.s32 %r1148, %r1094, 24672; 2026-02-21T09:21:07.3353242Z bfe.u32 %r1149, %r1148, 4, 14; 2026-02-21T09:21:07.3353312Z cvt.u64.u32 %rd189, %r1149; 2026-02-21T09:21:07.3353393Z or.b64 %rd140, %rd189, 4611686293322072064; 2026-02-21T09:21:07.3353460Z // begin inline asm 2026-02-21T09:21:07.3353827Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r898,%r899,%r900,%r901}, %rd140, %p43, 1, 1; 2026-02-21T09:21:07.3353885Z // end inline asm 2026-02-21T09:21:07.3353946Z add.s32 %r1150, %r1094, 28672; 2026-02-21T09:21:07.3354011Z bfe.u32 %r1151, %r1150, 4, 14; 2026-02-21T09:21:07.3354074Z cvt.u64.u32 %rd190, %r1151; 2026-02-21T09:21:07.3354147Z or.b64 %rd141, %rd190, 4611686293322072064; 2026-02-21T09:21:07.3354208Z // begin inline asm 2026-02-21T09:21:07.3354573Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r918,%r919,%r920,%r921}, %rd141, %p43, 1, 1; 2026-02-21T09:21:07.3354645Z // end inline asm 2026-02-21T09:21:07.3354715Z add.s32 %r1152, %r1094, 28704; 2026-02-21T09:21:07.3354781Z bfe.u32 %r1153, %r1152, 4, 14; 2026-02-21T09:21:07.3354842Z cvt.u64.u32 %rd191, %r1153; 2026-02-21T09:21:07.3354916Z or.b64 %rd142, %rd191, 
4611686293322072064; 2026-02-21T09:21:07.3354983Z // begin inline asm 2026-02-21T09:21:07.3355343Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r938,%r939,%r940,%r941}, %rd142, %p43, 1, 1; 2026-02-21T09:21:07.3355401Z // end inline asm 2026-02-21T09:21:07.3355461Z add.s32 %r1154, %r1094, 28736; 2026-02-21T09:21:07.3355527Z bfe.u32 %r1155, %r1154, 4, 14; 2026-02-21T09:21:07.3355589Z cvt.u64.u32 %rd192, %r1155; 2026-02-21T09:21:07.3355664Z or.b64 %rd143, %rd192, 4611686293322072064; 2026-02-21T09:21:07.3355737Z // begin inline asm 2026-02-21T09:21:07.3356100Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r958,%r959,%r960,%r961}, %rd143, %p43, 1, 1; 2026-02-21T09:21:07.3356275Z // end inline asm 2026-02-21T09:21:07.3356341Z add.s32 %r1156, %r1094, 28768; 2026-02-21T09:21:07.3356403Z bfe.u32 %r1157, %r1156, 4, 14; 2026-02-21T09:21:07.3356592Z cvt.u64.u32 %rd193, %r1157; 2026-02-21T09:21:07.3356683Z or.b64 %rd144, %rd193, 4611686293322072064; 2026-02-21T09:21:07.3356759Z // begin inline asm 2026-02-21T09:21:07.3357120Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150}, {%r978,%r979,%r980,%r981}, %rd144, %p43, 1, 1; 2026-02-21T09:21:07.3357177Z // end inline asm 2026-02-21T09:21:07.3357260Z wgmma.commit_group.sync.aligned; 2026-02-21T09:21:07.3357323Z mov.b32 %r990, %r180; 2026-02-21T09:21:07.3357464Z mov.b32 %r992, %r237; 2026-02-21T09:21:07.3357529Z mov.b32 %r991, %r237; 2026-02-21T09:21:07.3357594Z // begin inline asm 2026-02-21T09:21:07.3357771Z // wait for regs: %r2143,%r2144,%r2145,%r2146,%r2147,%r2148,%r2149,%r2150,%r990,%r991,%r992 2026-02-21T09:21:07.3357868Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:21:07.3357933Z // end inline asm 2026-02-21T09:21:07.3357989Z $L__tmp2: 2026-02-21T09:21:07.3358268Z .loc 1 58 32 // 
ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:58:32 2026-02-21T09:21:07.3358351Z add.s64 %rd145, %rd309, %rd18; 2026-02-21T09:21:07.3358419Z add.s64 %rd153, %rd309, %rd17; 2026-02-21T09:21:07.3358481Z add.s64 %rd146, %rd309, %rd16; 2026-02-21T09:21:07.3358545Z add.s64 %rd154, %rd309, %rd15; 2026-02-21T09:21:07.3358612Z add.s64 %rd147, %rd309, %rd14; 2026-02-21T09:21:07.3358674Z add.s64 %rd155, %rd309, %rd13; 2026-02-21T09:21:07.3358737Z add.s64 %rd148, %rd309, %rd12; 2026-02-21T09:21:07.3358804Z add.s64 %rd156, %rd309, %rd11; 2026-02-21T09:21:07.3358869Z add.s64 %rd149, %rd309, %rd10; 2026-02-21T09:21:07.3358935Z add.s64 %rd157, %rd309, %rd9; 2026-02-21T09:21:07.3358997Z add.s64 %rd150, %rd309, %rd8; 2026-02-21T09:21:07.3359069Z add.s64 %rd158, %rd309, %rd7; 2026-02-21T09:21:07.3359131Z add.s64 %rd151, %rd309, %rd6; 2026-02-21T09:21:07.3359194Z add.s64 %rd159, %rd309, %rd5; 2026-02-21T09:21:07.3359261Z add.s64 %rd152, %rd309, %rd4; 2026-02-21T09:21:07.3359337Z mad.wide.s32 %rd160, %r2140, 2, %rd42; 2026-02-21T09:21:07.3359545Z .loc 1 58 80 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:58:80 2026-02-21T09:21:07.3359619Z selp.b32 %r1005, 8, 0, %p80; 2026-02-21T09:21:07.3359679Z // begin inline asm 2026-02-21T09:21:07.3359828Z cp.async.ca.shared.global [ %r1165 + 0 ], [ %rd145 + 0 ], 0x8, %r1005; 2026-02-21T09:21:07.3359887Z // end inline asm 2026-02-21T09:21:07.3359958Z // begin inline asm 2026-02-21T09:21:07.3360104Z cp.async.ca.shared.global [ %r1167 + 0 ], [ %rd146 + 0 ], 0x8, %r1005; 2026-02-21T09:21:07.3360164Z // end inline asm 2026-02-21T09:21:07.3360229Z // begin inline asm 2026-02-21T09:21:07.3360362Z cp.async.ca.shared.global [ %r1169 + 0 ], [ %rd147 + 0 ], 0x8, %r1005; 2026-02-21T09:21:07.3360422Z // end inline asm 2026-02-21T09:21:07.3360484Z // begin inline asm 2026-02-21T09:21:07.3360622Z cp.async.ca.shared.global [ %r1171 + 0 ], [ %rd148 + 0 ], 0x8, %r1005; 2026-02-21T09:21:07.3360685Z // end inline asm 
2026-02-21T09:21:07.3360746Z // begin inline asm 2026-02-21T09:21:07.3360892Z cp.async.ca.shared.global [ %r1173 + 0 ], [ %rd149 + 0 ], 0x8, %r1005; 2026-02-21T09:21:07.3360956Z // end inline asm 2026-02-21T09:21:07.3361018Z // begin inline asm 2026-02-21T09:21:07.3361159Z cp.async.ca.shared.global [ %r1175 + 0 ], [ %rd150 + 0 ], 0x8, %r1005; 2026-02-21T09:21:07.3361220Z // end inline asm 2026-02-21T09:21:07.3361281Z // begin inline asm 2026-02-21T09:21:07.3361412Z cp.async.ca.shared.global [ %r1177 + 0 ], [ %rd151 + 0 ], 0x8, %r1005; 2026-02-21T09:21:07.3361483Z // end inline asm 2026-02-21T09:21:07.3361545Z // begin inline asm 2026-02-21T09:21:07.3361678Z cp.async.ca.shared.global [ %r1179 + 0 ], [ %rd152 + 0 ], 0x8, %r1005; 2026-02-21T09:21:07.3361898Z // end inline asm 2026-02-21T09:21:07.3361960Z // begin inline asm 2026-02-21T09:21:07.3362092Z cp.async.ca.shared.global [ %r1181 + 0 ], [ %rd153 + 0 ], 0x8, %r1005; 2026-02-21T09:21:07.3362149Z // end inline asm 2026-02-21T09:21:07.3362217Z // begin inline asm 2026-02-21T09:21:07.3362350Z cp.async.ca.shared.global [ %r1183 + 0 ], [ %rd154 + 0 ], 0x8, %r1005; 2026-02-21T09:21:07.3362408Z // end inline asm 2026-02-21T09:21:07.3362473Z // begin inline asm 2026-02-21T09:21:07.3362607Z cp.async.ca.shared.global [ %r1185 + 0 ], [ %rd155 + 0 ], 0x8, %r1005; 2026-02-21T09:21:07.3362666Z // end inline asm 2026-02-21T09:21:07.3362734Z // begin inline asm 2026-02-21T09:21:07.3362870Z cp.async.ca.shared.global [ %r1187 + 0 ], [ %rd156 + 0 ], 0x8, %r1005; 2026-02-21T09:21:07.3362930Z // end inline asm 2026-02-21T09:21:07.3363041Z // begin inline asm 2026-02-21T09:21:07.3363186Z cp.async.ca.shared.global [ %r1189 + 0 ], [ %rd157 + 0 ], 0x8, %r1005; 2026-02-21T09:21:07.3363247Z // end inline asm 2026-02-21T09:21:07.3363323Z // begin inline asm 2026-02-21T09:21:07.3363468Z cp.async.ca.shared.global [ %r1191 + 0 ], [ %rd158 + 0 ], 0x8, %r1005; 2026-02-21T09:21:07.3363529Z // end inline asm 2026-02-21T09:21:07.3363595Z // begin 
inline asm 2026-02-21T09:21:07.3363772Z cp.async.ca.shared.global [ %r1193 + 0 ], [ %rd159 + 0 ], 0x8, %r1005; 2026-02-21T09:21:07.3363845Z // end inline asm 2026-02-21T09:21:07.3363907Z // begin inline asm 2026-02-21T09:21:07.3364041Z cp.async.ca.shared.global [ %r1195 + 0 ], [ %rd160 + 0 ], 0x8, %r1005; 2026-02-21T09:21:07.3364110Z // end inline asm 2026-02-21T09:21:07.3364193Z cp.async.commit_group; 2026-02-21T09:21:07.3364425Z .loc 1 51 125 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:51:125 2026-02-21T09:21:07.3364515Z and.pred %p75, %p2, %p80; 2026-02-21T09:21:07.3364579Z // begin inline asm 2026-02-21T09:21:07.3364718Z @%p75 mbarrier.arrive.expect_tx.shared.b64 _, [%r1160], 4096; 2026-02-21T09:21:07.3364782Z // end inline asm 2026-02-21T09:21:07.3365000Z .loc 1 64 33 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:64:33 2026-02-21T09:21:07.3365060Z bar.sync 0; 2026-02-21T09:21:07.3365136Z elect.sync %r1158|%p82, -1; 2026-02-21T09:21:07.3365215Z and.pred %p83, %p80, %p82; 2026-02-21T09:21:07.3365285Z and.pred %p76, %p1, %p83; 2026-02-21T09:21:07.3365351Z cvt.u32.u64 %r1159, %rd310; 2026-02-21T09:21:07.3365419Z add.s32 %r1039, %r1159, 256; 2026-02-21T09:21:07.3365496Z // begin inline asm 2026-02-21T09:21:07.3365828Z @%p76 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r235], [%rd211, {%r1161, %r1039}], [%r1160]; 2026-02-21T09:21:07.3365891Z // end inline asm 2026-02-21T09:21:07.3366110Z .loc 1 51 125 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:51:125 2026-02-21T09:21:07.3366177Z add.s64 %rd309, %rd309, 512; 2026-02-21T09:21:07.3366239Z add.s32 %r2140, %r2140, 256; 2026-02-21T09:21:07.3366308Z mov.b64 %rd310, %rd21; 2026-02-21T09:21:07.3366385Z @%p80 bra $L__BB0_3; 2026-02-21T09:21:07.3366622Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:21:07.3366696Z cp.async.wait_group 0; 2026-02-21T09:21:07.3366759Z bar.sync 0; 2026-02-21T09:21:07.3366824Z // begin inline asm 
2026-02-21T09:21:07.3366937Z @%p2 mbarrier.inval.shared::cta.b64 [%r1160]; 2026-02-21T09:21:07.3367002Z // end inline asm 2026-02-21T09:21:07.3367210Z .loc 1 97 28 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:97:28 2026-02-21T09:21:07.3367290Z cvt.rn.bf16x2.f32 %r1206, %r2144, %r2143; 2026-02-21T09:21:07.3367375Z cvt.rn.bf16x2.f32 %r1207, %r2146, %r2145; 2026-02-21T09:21:07.3367448Z cvt.rn.bf16x2.f32 %r1208, %r2148, %r2147; 2026-02-21T09:21:07.3367533Z cvt.rn.bf16x2.f32 %r1209, %r2150, %r2149; 2026-02-21T09:21:07.3367745Z .loc 1 98 43 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:98:43 2026-02-21T09:21:07.3367975Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:21:07.3368034Z bar.sync 0; 2026-02-21T09:21:07.3368224Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r53], {%r1206, %r1207, %r1208, %r1209}; 2026-02-21T09:21:07.3368291Z // begin inline asm 2026-02-21T09:21:07.3368374Z fence.proxy.async.shared::cta; 2026-02-21T09:21:07.3368436Z // end inline asm 2026-02-21T09:21:07.3368497Z bar.sync 0; 2026-02-21T09:21:07.3368569Z elect.sync %r1210|%p90, -1; 2026-02-21T09:21:07.3368644Z and.pred %p85, %p1, %p90; 2026-02-21T09:21:07.3368715Z add.s32 %r1163, %r156, 70144; 2026-02-21T09:21:07.3368785Z // begin inline asm 2026-02-21T09:21:07.3369017Z @%p85 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd194, {%r1161, %r1162}], [%r1163]; 2026-02-21T09:21:07.3369077Z // end inline asm 2026-02-21T09:21:07.3369231Z cp.async.bulk.commit_group; 2026-02-21T09:21:07.3369441Z .loc 1 31 74 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:31:74 2026-02-21T09:21:07.3369510Z or.b32 %r1211, %r2139, 1; 2026-02-21T09:21:07.3369720Z .loc 1 37 35 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:37:35 2026-02-21T09:21:07.3369787Z add.s32 %r1214, %r1211, %r243; 2026-02-21T09:21:07.3369853Z shr.s32 %r1215, %r1214, 14; 2026-02-21T09:21:07.3370118Z .loc 1 38 33 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:38:33 
2026-02-21T09:21:07.3370194Z shl.b32 %r1216, %r1215, 6; 2026-02-21T09:21:07.3370401Z .loc 1 39 39 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:39:39 2026-02-21T09:21:07.3370465Z sub.s32 %r1217, 256, %r1216; 2026-02-21T09:21:07.3370687Z .loc 1 39 52 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:39:52 2026-02-21T09:21:07.3370753Z min.s32 %r1218, %r1217, 64; 2026-02-21T09:21:07.3370951Z .loc 1 40 45 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:40:45 2026-02-21T09:21:07.3371019Z and.b32 %r1219, %r1214, -16384; 2026-02-21T09:21:07.3371096Z sub.s32 %r1220, %r1211, %r1219; 2026-02-21T09:21:07.3371297Z .loc 1 41 51 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:41:51 2026-02-21T09:21:07.3371369Z div.s32 %r1221, %r1220, %r1218; 2026-02-21T09:21:07.3371569Z .loc 1 40 64 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:40:64 2026-02-21T09:21:07.3371639Z mul.lo.s32 %r1222, %r1221, %r1218; 2026-02-21T09:21:07.3371703Z sub.s32 %r1223, %r1220, %r1222; 2026-02-21T09:21:07.3371906Z .loc 1 40 30 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:40:30 2026-02-21T09:21:07.3371970Z add.s32 %r1224, %r1223, %r1216; 2026-02-21T09:21:07.3372168Z .loc 1 42 27 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:42:27 2026-02-21T09:21:07.3372238Z shl.b32 %r2131, %r1224, 6; 2026-02-21T09:21:07.3372436Z .loc 1 43 32 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:43:32 2026-02-21T09:21:07.3372507Z or.b32 %r1225, %r2131, %r6; 2026-02-21T09:21:07.3372579Z or.b32 %r1226, %r2131, %r7; 2026-02-21T09:21:07.3372652Z or.b32 %r1227, %r2131, %r8; 2026-02-21T09:21:07.3372716Z or.b32 %r1228, %r2131, %r9; 2026-02-21T09:21:07.3372784Z or.b32 %r1229, %r2131, %r10; 2026-02-21T09:21:07.3372852Z or.b32 %r1230, %r2131, %r11; 2026-02-21T09:21:07.3372914Z or.b32 %r1231, %r2131, %r12; 2026-02-21T09:21:07.3372976Z or.b32 %r1232, %r2131, %r13; 2026-02-21T09:21:07.3373043Z or.b32 %r1233, %r2131, %r14; 2026-02-21T09:21:07.3373105Z 
or.b32 %r1234, %r2131, %r15; 2026-02-21T09:21:07.3373166Z or.b32 %r1235, %r2131, %r16; 2026-02-21T09:21:07.3373227Z or.b32 %r1236, %r2131, %r17; 2026-02-21T09:21:07.3373296Z or.b32 %r1237, %r2131, %r18; 2026-02-21T09:21:07.3373358Z or.b32 %r1238, %r2131, %r19; 2026-02-21T09:21:07.3373422Z or.b32 %r1239, %r2131, %r20; 2026-02-21T09:21:07.3373488Z or.b32 %r1240, %r2131, %r21; 2026-02-21T09:21:07.3373754Z .loc 1 44 27 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:44:27 2026-02-21T09:21:07.3373864Z shl.b32 %r2130, %r1221, 5; 2026-02-21T09:21:07.3374068Z .loc 1 58 53 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:58:53 2026-02-21T09:21:07.3374133Z shl.b32 %r1241, %r1225, 10; 2026-02-21T09:21:07.3374195Z shl.b32 %r1242, %r1226, 10; 2026-02-21T09:21:07.3374256Z shl.b32 %r1243, %r1227, 10; 2026-02-21T09:21:07.3374324Z shl.b32 %r1244, %r1228, 10; 2026-02-21T09:21:07.3374392Z shl.b32 %r1245, %r1229, 10; 2026-02-21T09:21:07.3374464Z shl.b32 %r1246, %r1230, 10; 2026-02-21T09:21:07.3374530Z shl.b32 %r1247, %r1231, 10; 2026-02-21T09:21:07.3374591Z shl.b32 %r1248, %r1232, 10; 2026-02-21T09:21:07.3374650Z shl.b32 %r1249, %r1233, 10; 2026-02-21T09:21:07.3374759Z shl.b32 %r1250, %r1234, 10; 2026-02-21T09:21:07.3374828Z shl.b32 %r1251, %r1235, 10; 2026-02-21T09:21:07.3374887Z shl.b32 %r1252, %r1236, 10; 2026-02-21T09:21:07.3374961Z shl.b32 %r1253, %r1237, 10; 2026-02-21T09:21:07.3375029Z shl.b32 %r1254, %r1238, 10; 2026-02-21T09:21:07.3375090Z shl.b32 %r1255, %r1239, 10; 2026-02-21T09:21:07.3375152Z shl.b32 %r1256, %r1240, 10; 2026-02-21T09:21:07.3375414Z .loc 1 51 125 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:51:125 2026-02-21T09:21:07.3375489Z // begin inline asm 2026-02-21T09:21:07.3375590Z @%p2 mbarrier.init.shared::cta.b64 [%r1160], 1; 2026-02-21T09:21:07.3375651Z // end inline asm 2026-02-21T09:21:07.3375856Z .loc 1 58 60 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:58:60 2026-02-21T09:21:07.3375931Z or.b32 %r1257, %r1241, 
%r22; 2026-02-21T09:21:07.3375993Z or.b32 %r1258, %r1242, %r22; 2026-02-21T09:21:07.3376060Z or.b32 %r1259, %r1243, %r22; 2026-02-21T09:21:07.3376123Z or.b32 %r1260, %r1244, %r22; 2026-02-21T09:21:07.3376184Z or.b32 %r1261, %r1245, %r22; 2026-02-21T09:21:07.3376248Z or.b32 %r1262, %r1246, %r22; 2026-02-21T09:21:07.3376311Z or.b32 %r1263, %r1247, %r22; 2026-02-21T09:21:07.3376374Z or.b32 %r1264, %r1248, %r22; 2026-02-21T09:21:07.3376435Z or.b32 %r1265, %r1249, %r22; 2026-02-21T09:21:07.3376641Z or.b32 %r1266, %r1250, %r22; 2026-02-21T09:21:07.3376707Z or.b32 %r1267, %r1251, %r22; 2026-02-21T09:21:07.3376771Z or.b32 %r1268, %r1252, %r22; 2026-02-21T09:21:07.3376836Z or.b32 %r1269, %r1253, %r22; 2026-02-21T09:21:07.3376895Z or.b32 %r1270, %r1254, %r22; 2026-02-21T09:21:07.3376966Z or.b32 %r1271, %r1255, %r22; 2026-02-21T09:21:07.3377028Z or.b32 %r1272, %r1256, %r22; 2026-02-21T09:21:07.3377234Z .loc 1 58 32 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:58:32 2026-02-21T09:21:07.3377322Z mad.wide.s32 %rd195, %r1257, 2, %rd42; 2026-02-21T09:21:07.3377394Z mad.wide.s32 %rd203, %r1258, 2, %rd42; 2026-02-21T09:21:07.3377470Z mad.wide.s32 %rd196, %r1259, 2, %rd42; 2026-02-21T09:21:07.3377537Z mad.wide.s32 %rd204, %r1260, 2, %rd42; 2026-02-21T09:21:07.3377608Z mad.wide.s32 %rd197, %r1261, 2, %rd42; 2026-02-21T09:21:07.3377681Z mad.wide.s32 %rd205, %r1262, 2, %rd42; 2026-02-21T09:21:07.3377749Z mad.wide.s32 %rd198, %r1263, 2, %rd42; 2026-02-21T09:21:07.3377819Z mad.wide.s32 %rd206, %r1264, 2, %rd42; 2026-02-21T09:21:07.3377888Z mad.wide.s32 %rd199, %r1265, 2, %rd42; 2026-02-21T09:21:07.3377967Z mad.wide.s32 %rd207, %r1266, 2, %rd42; 2026-02-21T09:21:07.3378034Z mad.wide.s32 %rd200, %r1267, 2, %rd42; 2026-02-21T09:21:07.3378102Z mad.wide.s32 %rd208, %r1268, 2, %rd42; 2026-02-21T09:21:07.3378175Z mad.wide.s32 %rd201, %r1269, 2, %rd42; 2026-02-21T09:21:07.3378245Z mad.wide.s32 %rd209, %r1270, 2, %rd42; 2026-02-21T09:21:07.3378313Z mad.wide.s32 %rd202, %r1271, 2, 
%rd42; 2026-02-21T09:21:07.3378380Z mad.wide.s32 %rd210, %r1272, 2, %rd42; 2026-02-21T09:21:07.3378458Z mov.b32 %r1166, 8; 2026-02-21T09:21:07.3378662Z .loc 1 58 80 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:58:80 2026-02-21T09:21:07.3378825Z // begin inline asm 2026-02-21T09:21:07.3379042Z cp.async.ca.shared.global [ %r1165 + 0 ], [ %rd195 + 0 ], 0x8, %r1166; 2026-02-21T09:21:07.3379103Z // end inline asm 2026-02-21T09:21:07.3379164Z // begin inline asm 2026-02-21T09:21:07.3379308Z cp.async.ca.shared.global [ %r1167 + 0 ], [ %rd196 + 0 ], 0x8, %r1166; 2026-02-21T09:21:07.3379368Z // end inline asm 2026-02-21T09:21:07.3379429Z // begin inline asm 2026-02-21T09:21:07.3379563Z cp.async.ca.shared.global [ %r1169 + 0 ], [ %rd197 + 0 ], 0x8, %r1166; 2026-02-21T09:21:07.3379630Z // end inline asm 2026-02-21T09:21:07.3379690Z // begin inline asm 2026-02-21T09:21:07.3379833Z cp.async.ca.shared.global [ %r1171 + 0 ], [ %rd198 + 0 ], 0x8, %r1166; 2026-02-21T09:21:07.3379897Z // end inline asm 2026-02-21T09:21:07.3379959Z // begin inline asm 2026-02-21T09:21:07.3380157Z cp.async.ca.shared.global [ %r1173 + 0 ], [ %rd199 + 0 ], 0x8, %r1166; 2026-02-21T09:21:07.3380219Z // end inline asm 2026-02-21T09:21:07.3380286Z // begin inline asm 2026-02-21T09:21:07.3380435Z cp.async.ca.shared.global [ %r1175 + 0 ], [ %rd200 + 0 ], 0x8, %r1166; 2026-02-21T09:21:07.3380495Z // end inline asm 2026-02-21T09:21:07.3380561Z // begin inline asm 2026-02-21T09:21:07.3380778Z cp.async.ca.shared.global [ %r1177 + 0 ], [ %rd201 + 0 ], 0x8, %r1166; 2026-02-21T09:21:07.3380851Z // end inline asm 2026-02-21T09:21:07.3380920Z // begin inline asm 2026-02-21T09:21:07.3381056Z cp.async.ca.shared.global [ %r1179 + 0 ], [ %rd202 + 0 ], 0x8, %r1166; 2026-02-21T09:21:07.3381115Z // end inline asm 2026-02-21T09:21:07.3381176Z // begin inline asm 2026-02-21T09:21:07.3381316Z cp.async.ca.shared.global [ %r1181 + 0 ], [ %rd203 + 0 ], 0x8, %r1166; 2026-02-21T09:21:07.3381374Z // end inline asm 
2026-02-21T09:21:07.3381435Z // begin inline asm 2026-02-21T09:21:07.3381575Z cp.async.ca.shared.global [ %r1183 + 0 ], [ %rd204 + 0 ], 0x8, %r1166; 2026-02-21T09:21:07.3381637Z // end inline asm 2026-02-21T09:21:07.3381698Z // begin inline asm 2026-02-21T09:21:07.3381832Z cp.async.ca.shared.global [ %r1185 + 0 ], [ %rd205 + 0 ], 0x8, %r1166; 2026-02-21T09:21:07.3381898Z // end inline asm 2026-02-21T09:21:07.3381957Z // begin inline asm 2026-02-21T09:21:07.3382089Z cp.async.ca.shared.global [ %r1187 + 0 ], [ %rd206 + 0 ], 0x8, %r1166; 2026-02-21T09:21:07.3382157Z // end inline asm 2026-02-21T09:21:07.3382217Z // begin inline asm 2026-02-21T09:21:07.3382351Z cp.async.ca.shared.global [ %r1189 + 0 ], [ %rd207 + 0 ], 0x8, %r1166; 2026-02-21T09:21:07.3382416Z // end inline asm 2026-02-21T09:21:07.3382487Z // begin inline asm 2026-02-21T09:21:07.3382620Z cp.async.ca.shared.global [ %r1191 + 0 ], [ %rd208 + 0 ], 0x8, %r1166; 2026-02-21T09:21:07.3382680Z // end inline asm 2026-02-21T09:21:07.3382743Z // begin inline asm 2026-02-21T09:21:07.3382875Z cp.async.ca.shared.global [ %r1193 + 0 ], [ %rd209 + 0 ], 0x8, %r1166; 2026-02-21T09:21:07.3382935Z // end inline asm 2026-02-21T09:21:07.3383001Z // begin inline asm 2026-02-21T09:21:07.3383132Z cp.async.ca.shared.global [ %r1195 + 0 ], [ %rd210 + 0 ], 0x8, %r1166; 2026-02-21T09:21:07.3383194Z // end inline asm 2026-02-21T09:21:07.3383265Z cp.async.commit_group; 2026-02-21T09:21:07.3383484Z .loc 1 51 125 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:51:125 2026-02-21T09:21:07.3383544Z bar.sync 0; 2026-02-21T09:21:07.3383616Z // begin inline asm 2026-02-21T09:21:07.3383753Z @%p2 mbarrier.arrive.expect_tx.shared.b64 _, [%r1160], 4096; 2026-02-21T09:21:07.3383811Z // end inline asm 2026-02-21T09:21:07.3384012Z .loc 1 64 33 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:64:33 2026-02-21T09:21:07.3384078Z bar.sync 0; 2026-02-21T09:21:07.3384147Z elect.sync %r1273|%p91, -1; 2026-02-21T09:21:07.3384217Z 
and.pred %p88, %p1, %p91; 2026-02-21T09:21:07.3384276Z mov.b32 %r1200, 0; 2026-02-21T09:21:07.3384343Z // begin inline asm 2026-02-21T09:21:07.3384684Z @%p88 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r235], [%rd211, {%r2130, %r1200}], [%r1160]; 2026-02-21T09:21:07.3384861Z // end inline asm 2026-02-21T09:21:07.3385079Z .loc 1 51 125 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:51:125 2026-02-21T09:21:07.3385144Z or.b32 %r1274, %r20, %r2131; 2026-02-21T09:21:07.3385207Z shl.b32 %r1275, %r1274, 10; 2026-02-21T09:21:07.3385280Z mul.wide.s32 %rd213, %r1275, 2; 2026-02-21T09:21:07.3385343Z or.b64 %rd23, %rd213, 512; 2026-02-21T09:21:07.3385406Z or.b32 %r1276, %r19, %r2131; 2026-02-21T09:21:07.3385468Z shl.b32 %r1277, %r1276, 10; 2026-02-21T09:21:07.3385543Z mul.wide.s32 %rd214, %r1277, 2; 2026-02-21T09:21:07.3385605Z or.b64 %rd24, %rd214, 512; 2026-02-21T09:21:07.3385667Z or.b32 %r1278, %r18, %r2131; 2026-02-21T09:21:07.3385732Z shl.b32 %r1279, %r1278, 10; 2026-02-21T09:21:07.3385844Z mul.wide.s32 %rd215, %r1279, 2; 2026-02-21T09:21:07.3385909Z or.b64 %rd25, %rd215, 512; 2026-02-21T09:21:07.3385971Z or.b32 %r1280, %r17, %r2131; 2026-02-21T09:21:07.3386040Z shl.b32 %r1281, %r1280, 10; 2026-02-21T09:21:07.3386105Z mul.wide.s32 %rd216, %r1281, 2; 2026-02-21T09:21:07.3386168Z or.b64 %rd26, %rd216, 512; 2026-02-21T09:21:07.3386240Z or.b32 %r1282, %r16, %r2131; 2026-02-21T09:21:07.3386307Z shl.b32 %r1283, %r1282, 10; 2026-02-21T09:21:07.3386415Z mul.wide.s32 %rd217, %r1283, 2; 2026-02-21T09:21:07.3386599Z or.b64 %rd27, %rd217, 512; 2026-02-21T09:21:07.3386682Z or.b32 %r1284, %r15, %r2131; 2026-02-21T09:21:07.3386744Z shl.b32 %r1285, %r1284, 10; 2026-02-21T09:21:07.3386810Z mul.wide.s32 %rd218, %r1285, 2; 2026-02-21T09:21:07.3386879Z or.b64 %rd28, %rd218, 512; 2026-02-21T09:21:07.3386954Z or.b32 %r1286, %r14, %r2131; 2026-02-21T09:21:07.3387016Z shl.b32 %r1287, %r1286, 10; 2026-02-21T09:21:07.3387087Z mul.wide.s32 %rd219, 
%r1287, 2; 2026-02-21T09:21:07.3387153Z or.b64 %rd29, %rd219, 512; 2026-02-21T09:21:07.3387215Z or.b32 %r1288, %r13, %r2131; 2026-02-21T09:21:07.3387276Z shl.b32 %r1289, %r1288, 10; 2026-02-21T09:21:07.3387361Z mul.wide.s32 %rd220, %r1289, 2; 2026-02-21T09:21:07.3387428Z or.b64 %rd30, %rd220, 512; 2026-02-21T09:21:07.3387490Z or.b32 %r1290, %r12, %r2131; 2026-02-21T09:21:07.3387556Z shl.b32 %r1291, %r1290, 10; 2026-02-21T09:21:07.3387622Z mul.wide.s32 %rd221, %r1291, 2; 2026-02-21T09:21:07.3387685Z or.b64 %rd31, %rd221, 512; 2026-02-21T09:21:07.3387747Z or.b32 %r1292, %r11, %r2131; 2026-02-21T09:21:07.3387818Z shl.b32 %r1293, %r1292, 10; 2026-02-21T09:21:07.3387884Z mul.wide.s32 %rd222, %r1293, 2; 2026-02-21T09:21:07.3387948Z or.b64 %rd32, %rd222, 512; 2026-02-21T09:21:07.3388014Z or.b32 %r1294, %r10, %r2131; 2026-02-21T09:21:07.3388076Z shl.b32 %r1295, %r1294, 10; 2026-02-21T09:21:07.3388143Z mul.wide.s32 %rd223, %r1295, 2; 2026-02-21T09:21:07.3388206Z or.b64 %rd33, %rd223, 512; 2026-02-21T09:21:07.3388332Z or.b32 %r1296, %r9, %r2131; 2026-02-21T09:21:07.3388411Z shl.b32 %r1297, %r1296, 10; 2026-02-21T09:21:07.3388477Z mul.wide.s32 %rd224, %r1297, 2; 2026-02-21T09:21:07.3388545Z or.b64 %rd34, %rd224, 512; 2026-02-21T09:21:07.3388608Z or.b32 %r1298, %r8, %r2131; 2026-02-21T09:21:07.3388669Z shl.b32 %r1299, %r1298, 10; 2026-02-21T09:21:07.3388735Z mul.wide.s32 %rd225, %r1299, 2; 2026-02-21T09:21:07.3388802Z or.b64 %rd35, %rd225, 512; 2026-02-21T09:21:07.3388865Z or.b32 %r1300, %r7, %r2131; 2026-02-21T09:21:07.3388926Z shl.b32 %r1301, %r1300, 10; 2026-02-21T09:21:07.3388999Z mul.wide.s32 %rd226, %r1301, 2; 2026-02-21T09:21:07.3389062Z or.b64 %rd36, %rd226, 512; 2026-02-21T09:21:07.3389121Z shl.b32 %r1302, %r1224, 16; 2026-02-21T09:21:07.3389192Z or.b32 %r1303, %r76, %r1302; 2026-02-21T09:21:07.3389277Z mad.wide.s32 %rd37, %r1303, 2, 512; 2026-02-21T09:21:07.3389340Z shl.b32 %r1304, %r1211, 16; 2026-02-21T09:21:07.3389400Z or.b32 %r1305, %r77, %r1304; 
2026-02-21T09:21:07.3389473Z shl.b32 %r1306, %r1222, 16; 2026-02-21T09:21:07.3389537Z sub.s32 %r1307, %r1305, %r1306; 2026-02-21T09:21:07.3389608Z mul.lo.s32 %r1308, %r1215, 1069547520; 2026-02-21T09:21:07.3389773Z sub.s32 %r2151, %r1307, %r1308; 2026-02-21T09:21:07.3389909Z mov.b32 %r2154, 0f00000000; 2026-02-21T09:21:07.3389972Z mov.b32 %r2153, -1; 2026-02-21T09:21:07.3390036Z mov.b64 %rd312, -128; 2026-02-21T09:21:07.3390104Z mov.b64 %rd311, %rd3; 2026-02-21T09:21:07.3390171Z mov.b32 %r2152, %r1200; 2026-02-21T09:21:07.3390235Z mov.b32 %r2155, %r2154; 2026-02-21T09:21:07.3390302Z mov.b32 %r2156, %r2154; 2026-02-21T09:21:07.3390362Z mov.b32 %r2157, %r2154; 2026-02-21T09:21:07.3390422Z mov.b32 %r2158, %r2154; 2026-02-21T09:21:07.3390482Z mov.b32 %r2159, %r2154; 2026-02-21T09:21:07.3390548Z mov.b32 %r2160, %r2154; 2026-02-21T09:21:07.3390610Z mov.b32 %r2161, %r2154; 2026-02-21T09:21:07.3390726Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T09:21:07.3390917Z // => This Inner Loop Header: Depth=2 2026-02-21T09:21:07.3390986Z add.s64 %rd40, %rd312, 128; 2026-02-21T09:21:07.3391057Z setp.lt.u64 %p129, %rd40, 384; 2026-02-21T09:21:07.3391128Z add.s32 %r2010, %r2153, 1; 2026-02-21T09:21:07.3391202Z setp.lt.u32 %p130, %r2153, 2147483647; 2026-02-21T09:21:07.3391270Z selp.b32 %r2153, 0, %r2010, %p130; 2026-02-21T09:21:07.3391336Z selp.b32 %r2011, 1, 0, %p130; 2026-02-21T09:21:07.3391473Z xor.b32 %r2152, %r2152, %r2011; 2026-02-21T09:21:07.3391700Z .loc 1 58 80 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:58:80 2026-02-21T09:21:07.3391776Z cp.async.wait_group 0; 2026-02-21T09:21:07.3391840Z bar.sync 0; 2026-02-21T09:21:07.3391904Z shl.b32 %r2012, %r2153, 15; 2026-02-21T09:21:07.3391970Z add.s32 %r2014, %r156, %r2012; 2026-02-21T09:21:07.3392184Z .loc 1 62 32 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:62:32 2026-02-21T09:21:07.3392257Z add.s32 %r2015, %r2014, %r54; 2026-02-21T09:21:07.3392329Z ld.shared.b16 %rs289, [%r2015]; 
2026-02-21T09:21:07.3392401Z ld.shared.b16 %rs290, [%r2015+4096]; 2026-02-21T09:21:07.3392475Z ld.shared.b16 %rs291, [%r2015+64]; 2026-02-21T09:21:07.3392546Z ld.shared.b16 %rs292, [%r2015+4160]; 2026-02-21T09:21:07.3392615Z ld.shared.b16 %rs293, [%r2015+128]; 2026-02-21T09:21:07.3392691Z ld.shared.b16 %rs294, [%r2015+4224]; 2026-02-21T09:21:07.3392766Z ld.shared.b16 %rs295, [%r2015+192]; 2026-02-21T09:21:07.3392837Z ld.shared.b16 %rs296, [%r2015+4288]; 2026-02-21T09:21:07.3392906Z ld.shared.b16 %rs297, [%r2015+256]; 2026-02-21T09:21:07.3392979Z ld.shared.b16 %rs298, [%r2015+4352]; 2026-02-21T09:21:07.3393047Z ld.shared.b16 %rs299, [%r2015+320]; 2026-02-21T09:21:07.3393114Z ld.shared.b16 %rs300, [%r2015+4416]; 2026-02-21T09:21:07.3393187Z ld.shared.b16 %rs301, [%r2015+384]; 2026-02-21T09:21:07.3393257Z ld.shared.b16 %rs302, [%r2015+4480]; 2026-02-21T09:21:07.3393325Z ld.shared.b16 %rs303, [%r2015+448]; 2026-02-21T09:21:07.3393395Z ld.shared.b16 %rs304, [%r2015+4544]; 2026-02-21T09:21:07.3393464Z add.s32 %r2016, %r2014, %r55; 2026-02-21T09:21:07.3393531Z ld.shared.b16 %rs305, [%r2016]; 2026-02-21T09:21:07.3393614Z ld.shared.b16 %rs306, [%r2016+4096]; 2026-02-21T09:21:07.3393690Z ld.shared.b16 %rs307, [%r2016+64]; 2026-02-21T09:21:07.3393757Z ld.shared.b16 %rs308, [%r2016+4160]; 2026-02-21T09:21:07.3393823Z ld.shared.b16 %rs309, [%r2016+128]; 2026-02-21T09:21:07.3393897Z ld.shared.b16 %rs310, [%r2016+4224]; 2026-02-21T09:21:07.3393963Z ld.shared.b16 %rs311, [%r2016+192]; 2026-02-21T09:21:07.3394031Z ld.shared.b16 %rs312, [%r2016+4288]; 2026-02-21T09:21:07.3394098Z ld.shared.b16 %rs313, [%r2016+256]; 2026-02-21T09:21:07.3394169Z ld.shared.b16 %rs314, [%r2016+4352]; 2026-02-21T09:21:07.3394235Z ld.shared.b16 %rs315, [%r2016+320]; 2026-02-21T09:21:07.3394301Z ld.shared.b16 %rs316, [%r2016+4416]; 2026-02-21T09:21:07.3394374Z ld.shared.b16 %rs317, [%r2016+384]; 2026-02-21T09:21:07.3394442Z ld.shared.b16 %rs318, [%r2016+4480]; 2026-02-21T09:21:07.3394510Z ld.shared.b16 
%rs319, [%r2016+448]; 2026-02-21T09:21:07.3394578Z ld.shared.b16 %rs320, [%r2016+4544]; 2026-02-21T09:21:07.3394775Z add.s32 %r2017, %r2014, %r56; 2026-02-21T09:21:07.3394845Z ld.shared.b16 %rs321, [%r2017]; 2026-02-21T09:21:07.3394913Z ld.shared.b16 %rs322, [%r2017+4096]; 2026-02-21T09:21:07.3394987Z ld.shared.b16 %rs323, [%r2017+64]; 2026-02-21T09:21:07.3395056Z ld.shared.b16 %rs324, [%r2017+4160]; 2026-02-21T09:21:07.3395123Z ld.shared.b16 %rs325, [%r2017+128]; 2026-02-21T09:21:07.3395196Z ld.shared.b16 %rs326, [%r2017+4224]; 2026-02-21T09:21:07.3395264Z ld.shared.b16 %rs327, [%r2017+192]; 2026-02-21T09:21:07.3395331Z ld.shared.b16 %rs328, [%r2017+4288]; 2026-02-21T09:21:07.3395398Z ld.shared.b16 %rs329, [%r2017+256]; 2026-02-21T09:21:07.3395481Z ld.shared.b16 %rs330, [%r2017+4352]; 2026-02-21T09:21:07.3395549Z ld.shared.b16 %rs331, [%r2017+320]; 2026-02-21T09:21:07.3395667Z ld.shared.b16 %rs332, [%r2017+4416]; 2026-02-21T09:21:07.3395742Z ld.shared.b16 %rs333, [%r2017+384]; 2026-02-21T09:21:07.3395811Z ld.shared.b16 %rs334, [%r2017+4480]; 2026-02-21T09:21:07.3395880Z ld.shared.b16 %rs335, [%r2017+448]; 2026-02-21T09:21:07.3395962Z ld.shared.b16 %rs336, [%r2017+4544]; 2026-02-21T09:21:07.3396032Z add.s32 %r2018, %r2014, %r57; 2026-02-21T09:21:07.3396101Z ld.shared.b16 %rs337, [%r2018]; 2026-02-21T09:21:07.3396221Z ld.shared.b16 %rs338, [%r2018+4096]; 2026-02-21T09:21:07.3396298Z ld.shared.b16 %rs339, [%r2018+64]; 2026-02-21T09:21:07.3396367Z ld.shared.b16 %rs340, [%r2018+4160]; 2026-02-21T09:21:07.3396436Z ld.shared.b16 %rs341, [%r2018+128]; 2026-02-21T09:21:07.3396632Z ld.shared.b16 %rs342, [%r2018+4224]; 2026-02-21T09:21:07.3396714Z ld.shared.b16 %rs343, [%r2018+192]; 2026-02-21T09:21:07.3400790Z ld.shared.b16 %rs344, [%r2018+4288]; 2026-02-21T09:21:07.3400913Z ld.shared.b16 %rs345, [%r2018+256]; 2026-02-21T09:21:07.3400998Z ld.shared.b16 %rs346, [%r2018+4352]; 2026-02-21T09:21:07.3401072Z ld.shared.b16 %rs347, [%r2018+320]; 2026-02-21T09:21:07.3401146Z 
ld.shared.b16 %rs348, [%r2018+4416]; 2026-02-21T09:21:07.3401214Z ld.shared.b16 %rs349, [%r2018+384]; 2026-02-21T09:21:07.3401287Z ld.shared.b16 %rs350, [%r2018+4480]; 2026-02-21T09:21:07.3401362Z ld.shared.b16 %rs351, [%r2018+448]; 2026-02-21T09:21:07.3401428Z ld.shared.b16 %rs352, [%r2018+4544]; 2026-02-21T09:21:07.3401498Z add.s32 %r2019, %r2014, %r58; 2026-02-21T09:21:07.3401575Z ld.shared.b16 %rs353, [%r2019]; 2026-02-21T09:21:07.3401643Z ld.shared.b16 %rs354, [%r2019+4096]; 2026-02-21T09:21:07.3401711Z ld.shared.b16 %rs355, [%r2019+64]; 2026-02-21T09:21:07.3401783Z ld.shared.b16 %rs356, [%r2019+4160]; 2026-02-21T09:21:07.3401860Z ld.shared.b16 %rs357, [%r2019+128]; 2026-02-21T09:21:07.3401928Z ld.shared.b16 %rs358, [%r2019+4224]; 2026-02-21T09:21:07.3401996Z ld.shared.b16 %rs359, [%r2019+192]; 2026-02-21T09:21:07.3402069Z ld.shared.b16 %rs360, [%r2019+4288]; 2026-02-21T09:21:07.3402137Z ld.shared.b16 %rs361, [%r2019+256]; 2026-02-21T09:21:07.3402202Z ld.shared.b16 %rs362, [%r2019+4352]; 2026-02-21T09:21:07.3402278Z ld.shared.b16 %rs363, [%r2019+320]; 2026-02-21T09:21:07.3402362Z ld.shared.b16 %rs364, [%r2019+4416]; 2026-02-21T09:21:07.3402430Z ld.shared.b16 %rs365, [%r2019+384]; 2026-02-21T09:21:07.3402497Z ld.shared.b16 %rs366, [%r2019+4480]; 2026-02-21T09:21:07.3402571Z ld.shared.b16 %rs367, [%r2019+448]; 2026-02-21T09:21:07.3402638Z ld.shared.b16 %rs368, [%r2019+4544]; 2026-02-21T09:21:07.3402705Z add.s32 %r2020, %r2014, %r59; 2026-02-21T09:21:07.3402780Z ld.shared.b16 %rs369, [%r2020]; 2026-02-21T09:21:07.3402848Z ld.shared.b16 %rs370, [%r2020+4096]; 2026-02-21T09:21:07.3402916Z ld.shared.b16 %rs371, [%r2020+64]; 2026-02-21T09:21:07.3402983Z ld.shared.b16 %rs372, [%r2020+4160]; 2026-02-21T09:21:07.3403058Z ld.shared.b16 %rs373, [%r2020+128]; 2026-02-21T09:21:07.3403126Z ld.shared.b16 %rs374, [%r2020+4224]; 2026-02-21T09:21:07.3403196Z ld.shared.b16 %rs375, [%r2020+192]; 2026-02-21T09:21:07.3403265Z ld.shared.b16 %rs376, [%r2020+4288]; 
2026-02-21T09:21:07.3403332Z ld.shared.b16 %rs377, [%r2020+256]; 2026-02-21T09:21:07.3403543Z ld.shared.b16 %rs378, [%r2020+4352]; 2026-02-21T09:21:07.3403715Z ld.shared.b16 %rs379, [%r2020+320]; 2026-02-21T09:21:07.3403787Z ld.shared.b16 %rs380, [%r2020+4416]; 2026-02-21T09:21:07.3403851Z ld.shared.b16 %rs381, [%r2020+384]; 2026-02-21T09:21:07.3403920Z ld.shared.b16 %rs382, [%r2020+4480]; 2026-02-21T09:21:07.3403994Z ld.shared.b16 %rs383, [%r2020+448]; 2026-02-21T09:21:07.3404059Z ld.shared.b16 %rs384, [%r2020+4544]; 2026-02-21T09:21:07.3404131Z add.s32 %r2021, %r2014, %r60; 2026-02-21T09:21:07.3404205Z ld.shared.b16 %rs385, [%r2021]; 2026-02-21T09:21:07.3404272Z ld.shared.b16 %rs386, [%r2021+4096]; 2026-02-21T09:21:07.3404337Z ld.shared.b16 %rs387, [%r2021+64]; 2026-02-21T09:21:07.3404402Z ld.shared.b16 %rs388, [%r2021+4160]; 2026-02-21T09:21:07.3404481Z ld.shared.b16 %rs389, [%r2021+128]; 2026-02-21T09:21:07.3404645Z ld.shared.b16 %rs390, [%r2021+4224]; 2026-02-21T09:21:07.3404717Z ld.shared.b16 %rs391, [%r2021+192]; 2026-02-21T09:21:07.3404792Z ld.shared.b16 %rs392, [%r2021+4288]; 2026-02-21T09:21:07.3404863Z ld.shared.b16 %rs393, [%r2021+256]; 2026-02-21T09:21:07.3404930Z ld.shared.b16 %rs394, [%r2021+4352]; 2026-02-21T09:21:07.3404998Z ld.shared.b16 %rs395, [%r2021+320]; 2026-02-21T09:21:07.3405069Z ld.shared.b16 %rs396, [%r2021+4416]; 2026-02-21T09:21:07.3405204Z ld.shared.b16 %rs397, [%r2021+384]; 2026-02-21T09:21:07.3405275Z ld.shared.b16 %rs398, [%r2021+4480]; 2026-02-21T09:21:07.3405347Z ld.shared.b16 %rs399, [%r2021+448]; 2026-02-21T09:21:07.3405413Z ld.shared.b16 %rs400, [%r2021+4544]; 2026-02-21T09:21:07.3405475Z add.s32 %r2022, %r2014, %r61; 2026-02-21T09:21:07.3405546Z ld.shared.b16 %rs401, [%r2022]; 2026-02-21T09:21:07.3405613Z ld.shared.b16 %rs402, [%r2022+4096]; 2026-02-21T09:21:07.3405677Z ld.shared.b16 %rs403, [%r2022+64]; 2026-02-21T09:21:07.3405754Z ld.shared.b16 %rs404, [%r2022+4160]; 2026-02-21T09:21:07.3405830Z ld.shared.b16 %rs405, 
[%r2022+128]; 2026-02-21T09:21:07.3405896Z ld.shared.b16 %rs406, [%r2022+4224]; 2026-02-21T09:21:07.3405965Z ld.shared.b16 %rs407, [%r2022+192]; 2026-02-21T09:21:07.3406037Z ld.shared.b16 %rs408, [%r2022+4288]; 2026-02-21T09:21:07.3406105Z ld.shared.b16 %rs409, [%r2022+256]; 2026-02-21T09:21:07.3406171Z ld.shared.b16 %rs410, [%r2022+4352]; 2026-02-21T09:21:07.3406239Z ld.shared.b16 %rs411, [%r2022+320]; 2026-02-21T09:21:07.3406310Z ld.shared.b16 %rs412, [%r2022+4416]; 2026-02-21T09:21:07.3406375Z ld.shared.b16 %rs413, [%r2022+384]; 2026-02-21T09:21:07.3406440Z ld.shared.b16 %rs414, [%r2022+4480]; 2026-02-21T09:21:07.3406687Z ld.shared.b16 %rs415, [%r2022+448]; 2026-02-21T09:21:07.3406762Z ld.shared.b16 %rs416, [%r2022+4544]; 2026-02-21T09:21:07.3406830Z cvt.f32.bf16 %r1327, %rs289; 2026-02-21T09:21:07.3406899Z cvt.f32.bf16 %r1328, %rs290; 2026-02-21T09:21:07.3406972Z cvt.f32.bf16 %r1329, %rs305; 2026-02-21T09:21:07.3407043Z cvt.f32.bf16 %r1330, %rs306; 2026-02-21T09:21:07.3407106Z cvt.f32.bf16 %r1347, %rs321; 2026-02-21T09:21:07.3407174Z cvt.f32.bf16 %r1348, %rs322; 2026-02-21T09:21:07.3407240Z cvt.f32.bf16 %r1349, %rs337; 2026-02-21T09:21:07.3407300Z cvt.f32.bf16 %r1350, %rs338; 2026-02-21T09:21:07.3407365Z cvt.f32.bf16 %r1367, %rs353; 2026-02-21T09:21:07.3407430Z cvt.f32.bf16 %r1368, %rs354; 2026-02-21T09:21:07.3407493Z cvt.f32.bf16 %r1369, %rs369; 2026-02-21T09:21:07.3407554Z cvt.f32.bf16 %r1370, %rs370; 2026-02-21T09:21:07.3407628Z cvt.f32.bf16 %r1387, %rs385; 2026-02-21T09:21:07.3407690Z cvt.f32.bf16 %r1388, %rs386; 2026-02-21T09:21:07.3407752Z cvt.f32.bf16 %r1389, %rs401; 2026-02-21T09:21:07.3407817Z cvt.f32.bf16 %r1390, %rs402; 2026-02-21T09:21:07.3407877Z cvt.f32.bf16 %r1407, %rs291; 2026-02-21T09:21:07.3407938Z cvt.f32.bf16 %r1408, %rs292; 2026-02-21T09:21:07.3407999Z cvt.f32.bf16 %r1409, %rs307; 2026-02-21T09:21:07.3408067Z cvt.f32.bf16 %r1410, %rs308; 2026-02-21T09:21:07.3408128Z cvt.f32.bf16 %r1427, %rs323; 2026-02-21T09:21:07.3408189Z cvt.f32.bf16 
%r1428, %rs324; 2026-02-21T09:21:07.3408256Z cvt.f32.bf16 %r1429, %rs339; 2026-02-21T09:21:07.3408476Z cvt.f32.bf16 %r1430, %rs340; 2026-02-21T09:21:07.3408539Z cvt.f32.bf16 %r1447, %rs355; 2026-02-21T09:21:07.3408609Z cvt.f32.bf16 %r1448, %rs356; 2026-02-21T09:21:07.3408671Z cvt.f32.bf16 %r1449, %rs371; 2026-02-21T09:21:07.3408734Z cvt.f32.bf16 %r1450, %rs372; 2026-02-21T09:21:07.3408795Z cvt.f32.bf16 %r1467, %rs387; 2026-02-21T09:21:07.3408864Z cvt.f32.bf16 %r1468, %rs388; 2026-02-21T09:21:07.3408924Z cvt.f32.bf16 %r1469, %rs403; 2026-02-21T09:21:07.3408985Z cvt.f32.bf16 %r1470, %rs404; 2026-02-21T09:21:07.3409051Z cvt.f32.bf16 %r1487, %rs293; 2026-02-21T09:21:07.3409111Z cvt.f32.bf16 %r1488, %rs294; 2026-02-21T09:21:07.3409181Z cvt.f32.bf16 %r1489, %rs309; 2026-02-21T09:21:07.3409245Z cvt.f32.bf16 %r1490, %rs310; 2026-02-21T09:21:07.3409373Z cvt.f32.bf16 %r1507, %rs325; 2026-02-21T09:21:07.3409437Z cvt.f32.bf16 %r1508, %rs326; 2026-02-21T09:21:07.3409498Z cvt.f32.bf16 %r1509, %rs341; 2026-02-21T09:21:07.3409566Z cvt.f32.bf16 %r1510, %rs342; 2026-02-21T09:21:07.3409633Z cvt.f32.bf16 %r1527, %rs357; 2026-02-21T09:21:07.3409704Z cvt.f32.bf16 %r1528, %rs358; 2026-02-21T09:21:07.3409770Z cvt.f32.bf16 %r1529, %rs373; 2026-02-21T09:21:07.3409836Z cvt.f32.bf16 %r1530, %rs374; 2026-02-21T09:21:07.3409896Z cvt.f32.bf16 %r1547, %rs389; 2026-02-21T09:21:07.3410042Z cvt.f32.bf16 %r1548, %rs390; 2026-02-21T09:21:07.3410115Z cvt.f32.bf16 %r1549, %rs405; 2026-02-21T09:21:07.3410176Z cvt.f32.bf16 %r1550, %rs406; 2026-02-21T09:21:07.3410237Z cvt.f32.bf16 %r1567, %rs295; 2026-02-21T09:21:07.3410312Z cvt.f32.bf16 %r1568, %rs296; 2026-02-21T09:21:07.3410380Z cvt.f32.bf16 %r1569, %rs311; 2026-02-21T09:21:07.3410442Z cvt.f32.bf16 %r1570, %rs312; 2026-02-21T09:21:07.3410505Z cvt.f32.bf16 %r1587, %rs327; 2026-02-21T09:21:07.3410570Z cvt.f32.bf16 %r1588, %rs328; 2026-02-21T09:21:07.3410636Z cvt.f32.bf16 %r1589, %rs343; 2026-02-21T09:21:07.3410698Z cvt.f32.bf16 %r1590, %rs344; 
2026-02-21T09:21:07.3410763Z cvt.f32.bf16 %r1607, %rs359; 2026-02-21T09:21:07.3410828Z cvt.f32.bf16 %r1608, %rs360; 2026-02-21T09:21:07.3410889Z cvt.f32.bf16 %r1609, %rs375; 2026-02-21T09:21:07.3410951Z cvt.f32.bf16 %r1610, %rs376; 2026-02-21T09:21:07.3411016Z cvt.f32.bf16 %r1627, %rs391; 2026-02-21T09:21:07.3411080Z cvt.f32.bf16 %r1628, %rs392; 2026-02-21T09:21:07.3411143Z cvt.f32.bf16 %r1629, %rs407; 2026-02-21T09:21:07.3411207Z cvt.f32.bf16 %r1630, %rs408; 2026-02-21T09:21:07.3411268Z cvt.f32.bf16 %r1647, %rs297; 2026-02-21T09:21:07.3411330Z cvt.f32.bf16 %r1648, %rs298; 2026-02-21T09:21:07.3411390Z cvt.f32.bf16 %r1649, %rs313; 2026-02-21T09:21:07.3411456Z cvt.f32.bf16 %r1650, %rs314; 2026-02-21T09:21:07.3411517Z cvt.f32.bf16 %r1667, %rs329; 2026-02-21T09:21:07.3411577Z cvt.f32.bf16 %r1668, %rs330; 2026-02-21T09:21:07.3411650Z cvt.f32.bf16 %r1669, %rs345; 2026-02-21T09:21:07.3411722Z cvt.f32.bf16 %r1670, %rs346; 2026-02-21T09:21:07.3411783Z cvt.f32.bf16 %r1687, %rs361; 2026-02-21T09:21:07.3411845Z cvt.f32.bf16 %r1688, %rs362; 2026-02-21T09:21:07.3411915Z cvt.f32.bf16 %r1689, %rs377; 2026-02-21T09:21:07.3411978Z cvt.f32.bf16 %r1690, %rs378; 2026-02-21T09:21:07.3412039Z cvt.f32.bf16 %r1707, %rs393; 2026-02-21T09:21:07.3412104Z cvt.f32.bf16 %r1708, %rs394; 2026-02-21T09:21:07.3412166Z cvt.f32.bf16 %r1709, %rs409; 2026-02-21T09:21:07.3412228Z cvt.f32.bf16 %r1710, %rs410; 2026-02-21T09:21:07.3412294Z cvt.f32.bf16 %r1727, %rs299; 2026-02-21T09:21:07.3412356Z cvt.f32.bf16 %r1728, %rs300; 2026-02-21T09:21:07.3412416Z cvt.f32.bf16 %r1729, %rs315; 2026-02-21T09:21:07.3412476Z cvt.f32.bf16 %r1730, %rs316; 2026-02-21T09:21:07.3412544Z cvt.f32.bf16 %r1747, %rs331; 2026-02-21T09:21:07.3412606Z cvt.f32.bf16 %r1748, %rs332; 2026-02-21T09:21:07.3412667Z cvt.f32.bf16 %r1749, %rs347; 2026-02-21T09:21:07.3412736Z cvt.f32.bf16 %r1750, %rs348; 2026-02-21T09:21:07.3412807Z cvt.f32.bf16 %r1767, %rs363; 2026-02-21T09:21:07.3412868Z cvt.f32.bf16 %r1768, %rs364; 
2026-02-21T09:21:07.3412930Z cvt.f32.bf16 %r1769, %rs379; 2026-02-21T09:21:07.3413110Z cvt.f32.bf16 %r1770, %rs380; 2026-02-21T09:21:07.3413172Z cvt.f32.bf16 %r1787, %rs395; 2026-02-21T09:21:07.3413231Z cvt.f32.bf16 %r1788, %rs396; 2026-02-21T09:21:07.3413296Z cvt.f32.bf16 %r1789, %rs411; 2026-02-21T09:21:07.3413357Z cvt.f32.bf16 %r1790, %rs412; 2026-02-21T09:21:07.3413419Z cvt.f32.bf16 %r1807, %rs301; 2026-02-21T09:21:07.3413480Z cvt.f32.bf16 %r1808, %rs302; 2026-02-21T09:21:07.3413545Z cvt.f32.bf16 %r1809, %rs317; 2026-02-21T09:21:07.3413606Z cvt.f32.bf16 %r1810, %rs318; 2026-02-21T09:21:07.3413675Z cvt.f32.bf16 %r1827, %rs333; 2026-02-21T09:21:07.3413742Z cvt.f32.bf16 %r1828, %rs334; 2026-02-21T09:21:07.3413805Z cvt.f32.bf16 %r1829, %rs349; 2026-02-21T09:21:07.3413866Z cvt.f32.bf16 %r1830, %rs350; 2026-02-21T09:21:07.3413929Z cvt.f32.bf16 %r1847, %rs365; 2026-02-21T09:21:07.3414046Z cvt.f32.bf16 %r1848, %rs366; 2026-02-21T09:21:07.3414109Z cvt.f32.bf16 %r1849, %rs381; 2026-02-21T09:21:07.3414170Z cvt.f32.bf16 %r1850, %rs382; 2026-02-21T09:21:07.3414249Z cvt.f32.bf16 %r1867, %rs397; 2026-02-21T09:21:07.3414315Z cvt.f32.bf16 %r1868, %rs398; 2026-02-21T09:21:07.3414376Z cvt.f32.bf16 %r1869, %rs413; 2026-02-21T09:21:07.3414442Z cvt.f32.bf16 %r1870, %rs414; 2026-02-21T09:21:07.3414515Z cvt.f32.bf16 %r1887, %rs303; 2026-02-21T09:21:07.3414631Z cvt.f32.bf16 %r1888, %rs304; 2026-02-21T09:21:07.3414707Z cvt.f32.bf16 %r1889, %rs319; 2026-02-21T09:21:07.3414775Z cvt.f32.bf16 %r1890, %rs320; 2026-02-21T09:21:07.3414836Z cvt.f32.bf16 %r1907, %rs335; 2026-02-21T09:21:07.3414897Z cvt.f32.bf16 %r1908, %rs336; 2026-02-21T09:21:07.3414964Z cvt.f32.bf16 %r1909, %rs351; 2026-02-21T09:21:07.3415025Z cvt.f32.bf16 %r1910, %rs352; 2026-02-21T09:21:07.3415088Z cvt.f32.bf16 %r1927, %rs367; 2026-02-21T09:21:07.3415150Z cvt.f32.bf16 %r1928, %rs368; 2026-02-21T09:21:07.3415220Z cvt.f32.bf16 %r1929, %rs383; 2026-02-21T09:21:07.3415282Z cvt.f32.bf16 %r1930, %rs384; 
2026-02-21T09:21:07.3415342Z cvt.f32.bf16 %r1947, %rs399; 2026-02-21T09:21:07.3415415Z cvt.f32.bf16 %r1948, %rs400; 2026-02-21T09:21:07.3415489Z cvt.f32.bf16 %r1949, %rs415; 2026-02-21T09:21:07.3415552Z cvt.f32.bf16 %r1950, %rs416; 2026-02-21T09:21:07.3415797Z .loc 1 51 125 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:51:125 2026-02-21T09:21:07.3415872Z shl.b32 %r2023, %r2153, 3; 2026-02-21T09:21:07.3415941Z add.s32 %r1309, %r1160, %r2023; 2026-02-21T09:21:07.3416004Z // begin inline asm 2026-02-21T09:21:07.3416066Z 2026-02-21T09:21:07.3416119Z { 2026-02-21T09:21:07.3416187Z .reg .pred complete; 2026-02-21T09:21:07.3416250Z waitLoop: 2026-02-21T09:21:07.3416408Z mbarrier.try_wait.parity.shared.b64 complete, [%r1309], %r2152; 2026-02-21T09:21:07.3416601Z @!complete bra.uni waitLoop; 2026-02-21T09:21:07.3416668Z } 2026-02-21T09:21:07.3416674Z 2026-02-21T09:21:07.3416744Z // end inline asm 2026-02-21T09:21:07.3416963Z .loc 1 64 33 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:64:33 2026-02-21T09:21:07.3417042Z shl.b32 %r2024, %r2153, 12; 2026-02-21T09:21:07.3417114Z add.s32 %r2025, %r235, %r2024; 2026-02-21T09:21:07.3417320Z .loc 1 82 58 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:82:58 2026-02-21T09:21:07.3417389Z add.s32 %r2026, %r2025, %r48; 2026-02-21T09:21:07.3417464Z ld.shared.b8 %rs417, [%r2026]; 2026-02-21T09:21:07.3417534Z ld.shared.b8 %rs418, [%r2026+256]; 2026-02-21T09:21:07.3417600Z ld.shared.b8 %rs419, [%r2026+512]; 2026-02-21T09:21:07.3417665Z ld.shared.b8 %rs420, [%r2026+768]; 2026-02-21T09:21:07.3417742Z ld.shared.b8 %rs421, [%r2026+1024]; 2026-02-21T09:21:07.3417808Z ld.shared.b8 %rs422, [%r2026+1280]; 2026-02-21T09:21:07.3417874Z ld.shared.b8 %rs423, [%r2026+1536]; 2026-02-21T09:21:07.3417942Z ld.shared.b8 %rs424, [%r2026+1792]; 2026-02-21T09:21:07.3418008Z ld.shared.b8 %rs425, [%r2026+2048]; 2026-02-21T09:21:07.3418074Z ld.shared.b8 %rs426, [%r2026+2304]; 2026-02-21T09:21:07.3418138Z ld.shared.b8 %rs427, 
[%r2026+2560]; 2026-02-21T09:21:07.3418365Z ld.shared.b8 %rs428, [%r2026+2816]; 2026-02-21T09:21:07.3418431Z ld.shared.b8 %rs429, [%r2026+3072]; 2026-02-21T09:21:07.3418495Z ld.shared.b8 %rs430, [%r2026+3328]; 2026-02-21T09:21:07.3418566Z ld.shared.b8 %rs431, [%r2026+3584]; 2026-02-21T09:21:07.3418633Z ld.shared.b8 %rs432, [%r2026+3840]; 2026-02-21T09:21:07.3418695Z add.s32 %r2027, %r2025, %r95; 2026-02-21T09:21:07.3418765Z ld.shared.b8 %rs433, [%r2027+128]; 2026-02-21T09:21:07.3418830Z ld.shared.b8 %rs434, [%r2027+384]; 2026-02-21T09:21:07.3418900Z ld.shared.b8 %rs435, [%r2027+640]; 2026-02-21T09:21:07.3418967Z ld.shared.b8 %rs436, [%r2027+896]; 2026-02-21T09:21:07.3419042Z ld.shared.b8 %rs437, [%r2027+1152]; 2026-02-21T09:21:07.3419109Z ld.shared.b8 %rs438, [%r2027+1408]; 2026-02-21T09:21:07.3419250Z ld.shared.b8 %rs439, [%r2027+1664]; 2026-02-21T09:21:07.3419329Z ld.shared.b8 %rs440, [%r2027+1920]; 2026-02-21T09:21:07.3419395Z ld.shared.b8 %rs441, [%r2027+2176]; 2026-02-21T09:21:07.3419464Z ld.shared.b8 %rs442, [%r2027+2432]; 2026-02-21T09:21:07.3419532Z ld.shared.b8 %rs443, [%r2027+2688]; 2026-02-21T09:21:07.3419605Z ld.shared.b8 %rs444, [%r2027+2944]; 2026-02-21T09:21:07.3419670Z ld.shared.b8 %rs445, [%r2027+3200]; 2026-02-21T09:21:07.3419795Z ld.shared.b8 %rs446, [%r2027+3456]; 2026-02-21T09:21:07.3419867Z ld.shared.b8 %rs447, [%r2027+3712]; 2026-02-21T09:21:07.3419934Z ld.shared.b8 %rs448, [%r2027+3968]; 2026-02-21T09:21:07.3420150Z .loc 1 67 28 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:67:28 2026-02-21T09:21:07.3420223Z shl.b16 %rs449, %rs417, 4; 2026-02-21T09:21:07.3420286Z shl.b16 %rs450, %rs433, 4; 2026-02-21T09:21:07.3420349Z shl.b16 %rs451, %rs418, 4; 2026-02-21T09:21:07.3420409Z shl.b16 %rs452, %rs434, 4; 2026-02-21T09:21:07.3420490Z shl.b16 %rs453, %rs419, 4; 2026-02-21T09:21:07.3420555Z shl.b16 %rs454, %rs435, 4; 2026-02-21T09:21:07.3420616Z shl.b16 %rs455, %rs420, 4; 2026-02-21T09:21:07.3420680Z shl.b16 %rs456, %rs436, 4; 
2026-02-21T09:21:07.3420746Z shl.b16 %rs457, %rs421, 4; 2026-02-21T09:21:07.3420807Z shl.b16 %rs458, %rs437, 4; 2026-02-21T09:21:07.3420867Z shl.b16 %rs459, %rs422, 4; 2026-02-21T09:21:07.3420931Z shl.b16 %rs460, %rs438, 4; 2026-02-21T09:21:07.3420994Z shl.b16 %rs461, %rs423, 4; 2026-02-21T09:21:07.3421056Z shl.b16 %rs462, %rs439, 4; 2026-02-21T09:21:07.3421122Z shl.b16 %rs463, %rs424, 4; 2026-02-21T09:21:07.3421183Z shl.b16 %rs464, %rs440, 4; 2026-02-21T09:21:07.3421244Z shl.b16 %rs465, %rs425, 4; 2026-02-21T09:21:07.3421306Z shl.b16 %rs466, %rs441, 4; 2026-02-21T09:21:07.3421376Z shl.b16 %rs467, %rs426, 4; 2026-02-21T09:21:07.3421437Z shl.b16 %rs468, %rs442, 4; 2026-02-21T09:21:07.3421501Z shl.b16 %rs469, %rs427, 4; 2026-02-21T09:21:07.3421565Z shl.b16 %rs470, %rs443, 4; 2026-02-21T09:21:07.3421629Z shl.b16 %rs471, %rs428, 4; 2026-02-21T09:21:07.3421691Z shl.b16 %rs472, %rs444, 4; 2026-02-21T09:21:07.3421763Z shl.b16 %rs473, %rs429, 4; 2026-02-21T09:21:07.3421833Z shl.b16 %rs474, %rs445, 4; 2026-02-21T09:21:07.3421896Z shl.b16 %rs475, %rs430, 4; 2026-02-21T09:21:07.3421958Z shl.b16 %rs476, %rs446, 4; 2026-02-21T09:21:07.3422023Z shl.b16 %rs477, %rs431, 4; 2026-02-21T09:21:07.3422086Z shl.b16 %rs478, %rs447, 4; 2026-02-21T09:21:07.3422149Z shl.b16 %rs479, %rs432, 4; 2026-02-21T09:21:07.3422211Z shl.b16 %rs480, %rs448, 4; 2026-02-21T09:21:07.3422425Z .loc 1 82 58 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:82:58 2026-02-21T09:21:07.3422500Z selp.b16 %rs481, %rs449, %rs417, %p77; 2026-02-21T09:21:07.3422564Z cvt.s16.s8 %rs482, %rs481; 2026-02-21T09:21:07.3422630Z shr.s16 %rs483, %rs482, 4; 2026-02-21T09:21:07.3422701Z selp.b16 %rs484, %rs450, %rs433, %p77; 2026-02-21T09:21:07.3422763Z cvt.s16.s8 %rs485, %rs484; 2026-02-21T09:21:07.3422830Z shr.s16 %rs486, %rs485, 4; 2026-02-21T09:21:07.3422899Z selp.b16 %rs487, %rs451, %rs418, %p77; 2026-02-21T09:21:07.3422974Z cvt.s16.s8 %rs488, %rs487; 2026-02-21T09:21:07.3423165Z shr.s16 %rs489, %rs488, 4; 
2026-02-21T09:21:07.3423239Z selp.b16 %rs490, %rs452, %rs434, %p77; 2026-02-21T09:21:07.3423301Z cvt.s16.s8 %rs491, %rs490; 2026-02-21T09:21:07.3423363Z shr.s16 %rs492, %rs491, 4; 2026-02-21T09:21:07.3423437Z selp.b16 %rs493, %rs453, %rs419, %p77; 2026-02-21T09:21:07.3423500Z cvt.s16.s8 %rs494, %rs493; 2026-02-21T09:21:07.3423562Z shr.s16 %rs495, %rs494, 4; 2026-02-21T09:21:07.3423632Z selp.b16 %rs496, %rs454, %rs435, %p77; 2026-02-21T09:21:07.3423692Z cvt.s16.s8 %rs497, %rs496; 2026-02-21T09:21:07.3423756Z shr.s16 %rs498, %rs497, 4; 2026-02-21T09:21:07.3423824Z selp.b16 %rs499, %rs455, %rs420, %p77; 2026-02-21T09:21:07.3423889Z cvt.s16.s8 %rs500, %rs499; 2026-02-21T09:21:07.3423951Z shr.s16 %rs501, %rs500, 4; 2026-02-21T09:21:07.3424078Z selp.b16 %rs502, %rs456, %rs436, %p77; 2026-02-21T09:21:07.3424150Z cvt.s16.s8 %rs503, %rs502; 2026-02-21T09:21:07.3424211Z shr.s16 %rs504, %rs503, 4; 2026-02-21T09:21:07.3424280Z selp.b16 %rs505, %rs457, %rs421, %p77; 2026-02-21T09:21:07.3424349Z cvt.s16.s8 %rs506, %rs505; 2026-02-21T09:21:07.3424410Z shr.s16 %rs507, %rs506, 4; 2026-02-21T09:21:07.3424483Z selp.b16 %rs508, %rs458, %rs437, %p77; 2026-02-21T09:21:07.3424543Z cvt.s16.s8 %rs509, %rs508; 2026-02-21T09:21:07.3424658Z shr.s16 %rs510, %rs509, 4; 2026-02-21T09:21:07.3424730Z selp.b16 %rs511, %rs459, %rs422, %p77; 2026-02-21T09:21:07.3424793Z cvt.s16.s8 %rs512, %rs511; 2026-02-21T09:21:07.3424857Z shr.s16 %rs513, %rs512, 4; 2026-02-21T09:21:07.3424925Z selp.b16 %rs514, %rs460, %rs438, %p77; 2026-02-21T09:21:07.3424986Z cvt.s16.s8 %rs515, %rs514; 2026-02-21T09:21:07.3425047Z shr.s16 %rs516, %rs515, 4; 2026-02-21T09:21:07.3425120Z selp.b16 %rs517, %rs461, %rs423, %p77; 2026-02-21T09:21:07.3425181Z cvt.s16.s8 %rs518, %rs517; 2026-02-21T09:21:07.3425242Z shr.s16 %rs519, %rs518, 4; 2026-02-21T09:21:07.3425318Z selp.b16 %rs520, %rs462, %rs439, %p77; 2026-02-21T09:21:07.3425379Z cvt.s16.s8 %rs521, %rs520; 2026-02-21T09:21:07.3425443Z shr.s16 %rs522, %rs521, 4; 
2026-02-21T09:21:07.3425512Z selp.b16 %rs523, %rs463, %rs424, %p77; 2026-02-21T09:21:07.3425579Z cvt.s16.s8 %rs524, %rs523; 2026-02-21T09:21:07.3425639Z shr.s16 %rs525, %rs524, 4; 2026-02-21T09:21:07.3425708Z selp.b16 %rs526, %rs464, %rs440, %p77; 2026-02-21T09:21:07.3425773Z cvt.s16.s8 %rs527, %rs526; 2026-02-21T09:21:07.3425835Z shr.s16 %rs528, %rs527, 4; 2026-02-21T09:21:07.3425901Z selp.b16 %rs529, %rs465, %rs425, %p77; 2026-02-21T09:21:07.3425963Z cvt.s16.s8 %rs530, %rs529; 2026-02-21T09:21:07.3426029Z shr.s16 %rs531, %rs530, 4; 2026-02-21T09:21:07.3426097Z selp.b16 %rs532, %rs466, %rs441, %p77; 2026-02-21T09:21:07.3426158Z cvt.s16.s8 %rs533, %rs532; 2026-02-21T09:21:07.3426230Z shr.s16 %rs534, %rs533, 4; 2026-02-21T09:21:07.3426305Z selp.b16 %rs535, %rs467, %rs426, %p77; 2026-02-21T09:21:07.3426366Z cvt.s16.s8 %rs536, %rs535; 2026-02-21T09:21:07.3426431Z shr.s16 %rs537, %rs536, 4; 2026-02-21T09:21:07.3426629Z selp.b16 %rs538, %rs468, %rs442, %p77; 2026-02-21T09:21:07.3426700Z cvt.s16.s8 %rs539, %rs538; 2026-02-21T09:21:07.3426762Z shr.s16 %rs540, %rs539, 4; 2026-02-21T09:21:07.3426835Z selp.b16 %rs541, %rs469, %rs427, %p77; 2026-02-21T09:21:07.3426896Z cvt.s16.s8 %rs542, %rs541; 2026-02-21T09:21:07.3426958Z shr.s16 %rs543, %rs542, 4; 2026-02-21T09:21:07.3427030Z selp.b16 %rs544, %rs470, %rs443, %p77; 2026-02-21T09:21:07.3427092Z cvt.s16.s8 %rs545, %rs544; 2026-02-21T09:21:07.3427154Z shr.s16 %rs546, %rs545, 4; 2026-02-21T09:21:07.3427222Z selp.b16 %rs547, %rs471, %rs428, %p77; 2026-02-21T09:21:07.3427287Z cvt.s16.s8 %rs548, %rs547; 2026-02-21T09:21:07.3427360Z shr.s16 %rs549, %rs548, 4; 2026-02-21T09:21:07.3427430Z selp.b16 %rs550, %rs472, %rs444, %p77; 2026-02-21T09:21:07.3427493Z cvt.s16.s8 %rs551, %rs550; 2026-02-21T09:21:07.3427555Z shr.s16 %rs552, %rs551, 4; 2026-02-21T09:21:07.3427620Z selp.b16 %rs553, %rs473, %rs429, %p77; 2026-02-21T09:21:07.3427680Z cvt.s16.s8 %rs554, %rs553; 2026-02-21T09:21:07.3427894Z shr.s16 %rs555, %rs554, 4; 
2026-02-21T09:21:07.3427963Z selp.b16 %rs556, %rs474, %rs445, %p77; 2026-02-21T09:21:07.3428023Z cvt.s16.s8 %rs557, %rs556; 2026-02-21T09:21:07.3428096Z shr.s16 %rs558, %rs557, 4; 2026-02-21T09:21:07.3428178Z selp.b16 %rs559, %rs475, %rs430, %p77; 2026-02-21T09:21:07.3428243Z cvt.s16.s8 %rs560, %rs559; 2026-02-21T09:21:07.3428385Z shr.s16 %rs561, %rs560, 4; 2026-02-21T09:21:07.3428463Z selp.b16 %rs562, %rs476, %rs446, %p77; 2026-02-21T09:21:07.3428526Z cvt.s16.s8 %rs563, %rs562; 2026-02-21T09:21:07.3428586Z shr.s16 %rs564, %rs563, 4; 2026-02-21T09:21:07.3428660Z selp.b16 %rs565, %rs477, %rs431, %p77; 2026-02-21T09:21:07.3428724Z cvt.s16.s8 %rs566, %rs565; 2026-02-21T09:21:07.3428785Z shr.s16 %rs567, %rs566, 4; 2026-02-21T09:21:07.3428926Z selp.b16 %rs568, %rs478, %rs447, %p77; 2026-02-21T09:21:07.3429002Z cvt.s16.s8 %rs569, %rs568; 2026-02-21T09:21:07.3429065Z shr.s16 %rs570, %rs569, 4; 2026-02-21T09:21:07.3429135Z selp.b16 %rs571, %rs479, %rs432, %p77; 2026-02-21T09:21:07.3429204Z cvt.s16.s8 %rs572, %rs571; 2026-02-21T09:21:07.3429264Z shr.s16 %rs573, %rs572, 4; 2026-02-21T09:21:07.3429331Z selp.b16 %rs574, %rs480, %rs448, %p77; 2026-02-21T09:21:07.3429393Z cvt.s16.s8 %rs575, %rs574; 2026-02-21T09:21:07.3429515Z shr.s16 %rs576, %rs575, 4; 2026-02-21T09:21:07.3429746Z .loc 1 87 32 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:87:32 2026-02-21T09:21:07.3429818Z cvt.rn.f32.s16 %r2028, %rs483; 2026-02-21T09:21:07.3429887Z cvt.rn.f32.s16 %r2029, %rs486; 2026-02-21T09:21:07.3429951Z cvt.rn.f32.s16 %r2030, %rs489; 2026-02-21T09:21:07.3430014Z cvt.rn.f32.s16 %r2031, %rs492; 2026-02-21T09:21:07.3430081Z cvt.rn.f32.s16 %r2032, %rs495; 2026-02-21T09:21:07.3430143Z cvt.rn.f32.s16 %r2033, %rs498; 2026-02-21T09:21:07.3430207Z cvt.rn.f32.s16 %r2034, %rs501; 2026-02-21T09:21:07.3430272Z cvt.rn.f32.s16 %r2035, %rs504; 2026-02-21T09:21:07.3430333Z cvt.rn.f32.s16 %r2036, %rs507; 2026-02-21T09:21:07.3430401Z cvt.rn.f32.s16 %r2037, %rs510; 2026-02-21T09:21:07.3430468Z 
cvt.rn.f32.s16 %r2038, %rs513; 2026-02-21T09:21:07.3430531Z cvt.rn.f32.s16 %r2039, %rs516; 2026-02-21T09:21:07.3430592Z cvt.rn.f32.s16 %r2040, %rs519; 2026-02-21T09:21:07.3430658Z cvt.rn.f32.s16 %r2041, %rs522; 2026-02-21T09:21:07.3430720Z cvt.rn.f32.s16 %r2042, %rs525; 2026-02-21T09:21:07.3430791Z cvt.rn.f32.s16 %r2043, %rs528; 2026-02-21T09:21:07.3430858Z cvt.rn.f32.s16 %r2044, %rs531; 2026-02-21T09:21:07.3430920Z cvt.rn.f32.s16 %r2045, %rs534; 2026-02-21T09:21:07.3430980Z cvt.rn.f32.s16 %r2046, %rs537; 2026-02-21T09:21:07.3431040Z cvt.rn.f32.s16 %r2047, %rs540; 2026-02-21T09:21:07.3431104Z cvt.rn.f32.s16 %r2048, %rs543; 2026-02-21T09:21:07.3431166Z cvt.rn.f32.s16 %r2049, %rs546; 2026-02-21T09:21:07.3431231Z cvt.rn.f32.s16 %r2050, %rs549; 2026-02-21T09:21:07.3431295Z cvt.rn.f32.s16 %r2051, %rs552; 2026-02-21T09:21:07.3431357Z cvt.rn.f32.s16 %r2052, %rs555; 2026-02-21T09:21:07.3431419Z cvt.rn.f32.s16 %r2053, %rs558; 2026-02-21T09:21:07.3431482Z cvt.rn.f32.s16 %r2054, %rs561; 2026-02-21T09:21:07.3431546Z cvt.rn.f32.s16 %r2055, %rs564; 2026-02-21T09:21:07.3431607Z cvt.rn.f32.s16 %r2056, %rs567; 2026-02-21T09:21:07.3431669Z cvt.rn.f32.s16 %r2057, %rs570; 2026-02-21T09:21:07.3431734Z cvt.rn.f32.s16 %r2058, %rs573; 2026-02-21T09:21:07.3431795Z cvt.rn.f32.s16 %r2059, %rs576; 2026-02-21T09:21:07.3431861Z st.shared.b32 [%r49], %r2028; 2026-02-21T09:21:07.3431927Z st.shared.b32 [%r49+4096], %r2032; 2026-02-21T09:21:07.3431995Z st.shared.b32 [%r49+8192], %r2036; 2026-02-21T09:21:07.3432062Z st.shared.b32 [%r49+12288], %r2040; 2026-02-21T09:21:07.3432129Z st.shared.b32 [%r49+16384], %r2044; 2026-02-21T09:21:07.3432197Z st.shared.b32 [%r49+20480], %r2048; 2026-02-21T09:21:07.3432262Z st.shared.b32 [%r49+24576], %r2052; 2026-02-21T09:21:07.3432326Z st.shared.b32 [%r49+28672], %r2056; 2026-02-21T09:21:07.3432391Z st.shared.b32 [%r50], %r2029; 2026-02-21T09:21:07.3432455Z st.shared.b32 [%r50+4096], %r2033; 2026-02-21T09:21:07.3432644Z st.shared.b32 [%r50+8192], %r2037; 
2026-02-21T09:21:07.3432709Z st.shared.b32 [%r50+12288], %r2041; 2026-02-21T09:21:07.3432777Z st.shared.b32 [%r50+16384], %r2045; 2026-02-21T09:21:07.3432839Z st.shared.b32 [%r50+20480], %r2049; 2026-02-21T09:21:07.3432905Z st.shared.b32 [%r50+24576], %r2053; 2026-02-21T09:21:07.3432970Z st.shared.b32 [%r50+28672], %r2057; 2026-02-21T09:21:07.3433032Z st.shared.b32 [%r51], %r2030; 2026-02-21T09:21:07.3433094Z st.shared.b32 [%r51+4096], %r2034; 2026-02-21T09:21:07.3433156Z st.shared.b32 [%r51+8192], %r2038; 2026-02-21T09:21:07.3433235Z st.shared.b32 [%r51+12288], %r2042; 2026-02-21T09:21:07.3433300Z st.shared.b32 [%r51+16384], %r2046; 2026-02-21T09:21:07.3433364Z st.shared.b32 [%r51+20480], %r2050; 2026-02-21T09:21:07.3433480Z st.shared.b32 [%r51+24576], %r2054; 2026-02-21T09:21:07.3433545Z st.shared.b32 [%r51+28672], %r2058; 2026-02-21T09:21:07.3433609Z st.shared.b32 [%r52], %r2031; 2026-02-21T09:21:07.3433686Z st.shared.b32 [%r52+4096], %r2035; 2026-02-21T09:21:07.3433751Z st.shared.b32 [%r52+8192], %r2039; 2026-02-21T09:21:07.3433815Z st.shared.b32 [%r52+12288], %r2043; 2026-02-21T09:21:07.3433878Z st.shared.b32 [%r52+16384], %r2047; 2026-02-21T09:21:07.3433991Z st.shared.b32 [%r52+20480], %r2051; 2026-02-21T09:21:07.3434057Z st.shared.b32 [%r52+24576], %r2055; 2026-02-21T09:21:07.3434121Z st.shared.b32 [%r52+28672], %r2059; 2026-02-21T09:21:07.3434179Z $L__tmp3: 2026-02-21T09:21:07.3434469Z .loc 2 291 36 // standard.py:291:36 @[ ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:94:40 ] 2026-02-21T09:21:07.3434532Z // begin inline asm 2026-02-21T09:21:07.3434616Z fence.proxy.async.shared::cta; 2026-02-21T09:21:07.3434694Z // end inline asm 2026-02-21T09:21:07.3434755Z bar.sync 0; 2026-02-21T09:21:07.3434842Z shfl.sync.idx.b32 %r2060, %r4, 0, 31, -1; 2026-02-21T09:21:07.3434920Z wgmma.fence.sync.aligned; 2026-02-21T09:21:07.3434984Z shl.b32 %r2061, %r2060, 9; 2026-02-21T09:21:07.3435049Z and.b32 %r2062, %r2061, 2048; 2026-02-21T09:21:07.3435114Z add.s32 
%r2063, %r2062, %r180; 2026-02-21T09:21:07.3435179Z bfe.u32 %r2064, %r2063, 4, 14; 2026-02-21T09:21:07.3435245Z cvt.u64.u32 %rd276, %r2064; 2026-02-21T09:21:07.3435325Z or.b64 %rd227, %rd276, 4611686293322072064; 2026-02-21T09:21:07.3435394Z mov.pred %p92, -1; 2026-02-21T09:21:07.3435464Z // begin inline asm 2026-02-21T09:21:07.3435851Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1327,%r1328,%r1329,%r1330}, %rd227, %p92, 1, 1; 2026-02-21T09:21:07.3435913Z // end inline asm 2026-02-21T09:21:07.3435975Z add.s32 %r2065, %r2063, 32; 2026-02-21T09:21:07.3436037Z bfe.u32 %r2066, %r2065, 4, 14; 2026-02-21T09:21:07.3436101Z cvt.u64.u32 %rd277, %r2066; 2026-02-21T09:21:07.3436183Z or.b64 %rd228, %rd277, 4611686293322072064; 2026-02-21T09:21:07.3436242Z // begin inline asm 2026-02-21T09:21:07.3436740Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1347,%r1348,%r1349,%r1350}, %rd228, %p92, 1, 1; 2026-02-21T09:21:07.3436818Z // end inline asm 2026-02-21T09:21:07.3436880Z add.s32 %r2067, %r2063, 64; 2026-02-21T09:21:07.3436943Z bfe.u32 %r2068, %r2067, 4, 14; 2026-02-21T09:21:07.3437007Z cvt.u64.u32 %rd278, %r2068; 2026-02-21T09:21:07.3437080Z or.b64 %rd229, %rd278, 4611686293322072064; 2026-02-21T09:21:07.3437139Z // begin inline asm 2026-02-21T09:21:07.3437511Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1367,%r1368,%r1369,%r1370}, %rd229, %p92, 1, 1; 2026-02-21T09:21:07.3437571Z // end inline asm 2026-02-21T09:21:07.3437631Z add.s32 %r2069, %r2063, 96; 2026-02-21T09:21:07.3437693Z bfe.u32 %r2070, %r2069, 4, 14; 2026-02-21T09:21:07.3437757Z cvt.u64.u32 %rd279, %r2070; 2026-02-21T09:21:07.3437832Z or.b64 %rd230, %rd279, 4611686293322072064; 2026-02-21T09:21:07.3437981Z // begin inline asm 2026-02-21T09:21:07.3438418Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 
{%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1387,%r1388,%r1389,%r1390}, %rd230, %p92, 1, 1; 2026-02-21T09:21:07.3438475Z // end inline asm 2026-02-21T09:21:07.3438548Z add.s32 %r2071, %r2063, 4096; 2026-02-21T09:21:07.3438611Z bfe.u32 %r2072, %r2071, 4, 14; 2026-02-21T09:21:07.3438680Z cvt.u64.u32 %rd280, %r2072; 2026-02-21T09:21:07.3438753Z or.b64 %rd231, %rd280, 4611686293322072064; 2026-02-21T09:21:07.3438811Z // begin inline asm 2026-02-21T09:21:07.3439180Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1407,%r1408,%r1409,%r1410}, %rd231, %p92, 1, 1; 2026-02-21T09:21:07.3439238Z // end inline asm 2026-02-21T09:21:07.3439363Z add.s32 %r2073, %r2063, 4128; 2026-02-21T09:21:07.3439429Z bfe.u32 %r2074, %r2073, 4, 14; 2026-02-21T09:21:07.3439491Z cvt.u64.u32 %rd281, %r2074; 2026-02-21T09:21:07.3439563Z or.b64 %rd232, %rd281, 4611686293322072064; 2026-02-21T09:21:07.3439637Z // begin inline asm 2026-02-21T09:21:07.3440013Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1427,%r1428,%r1429,%r1430}, %rd232, %p92, 1, 1; 2026-02-21T09:21:07.3440135Z // end inline asm 2026-02-21T09:21:07.3440200Z add.s32 %r2075, %r2063, 4160; 2026-02-21T09:21:07.3440262Z bfe.u32 %r2076, %r2075, 4, 14; 2026-02-21T09:21:07.3440324Z cvt.u64.u32 %rd282, %r2076; 2026-02-21T09:21:07.3440398Z or.b64 %rd233, %rd282, 4611686293322072064; 2026-02-21T09:21:07.3440470Z // begin inline asm 2026-02-21T09:21:07.3440839Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1447,%r1448,%r1449,%r1450}, %rd233, %p92, 1, 1; 2026-02-21T09:21:07.3440899Z // end inline asm 2026-02-21T09:21:07.3440960Z add.s32 %r2077, %r2063, 4192; 2026-02-21T09:21:07.3441025Z bfe.u32 %r2078, %r2077, 4, 14; 2026-02-21T09:21:07.3441087Z cvt.u64.u32 %rd283, %r2078; 2026-02-21T09:21:07.3441173Z or.b64 %rd234, %rd283, 
4611686293322072064; 2026-02-21T09:21:07.3441234Z // begin inline asm 2026-02-21T09:21:07.3441603Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1467,%r1468,%r1469,%r1470}, %rd234, %p92, 1, 1; 2026-02-21T09:21:07.3441661Z // end inline asm 2026-02-21T09:21:07.3441725Z add.s32 %r2079, %r2063, 8192; 2026-02-21T09:21:07.3441784Z bfe.u32 %r2080, %r2079, 4, 14; 2026-02-21T09:21:07.3441844Z cvt.u64.u32 %rd284, %r2080; 2026-02-21T09:21:07.3441915Z or.b64 %rd235, %rd284, 4611686293322072064; 2026-02-21T09:21:07.3441977Z // begin inline asm 2026-02-21T09:21:07.3442344Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1487,%r1488,%r1489,%r1490}, %rd235, %p92, 1, 1; 2026-02-21T09:21:07.3442402Z // end inline asm 2026-02-21T09:21:07.3442464Z add.s32 %r2081, %r2063, 8224; 2026-02-21T09:21:07.3442528Z bfe.u32 %r2082, %r2081, 4, 14; 2026-02-21T09:21:07.3442588Z cvt.u64.u32 %rd285, %r2082; 2026-02-21T09:21:07.3442661Z or.b64 %rd236, %rd285, 4611686293322072064; 2026-02-21T09:21:07.3442720Z // begin inline asm 2026-02-21T09:21:07.3443086Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1507,%r1508,%r1509,%r1510}, %rd236, %p92, 1, 1; 2026-02-21T09:21:07.3443146Z // end inline asm 2026-02-21T09:21:07.3443209Z add.s32 %r2083, %r2063, 8256; 2026-02-21T09:21:07.3443268Z bfe.u32 %r2084, %r2083, 4, 14; 2026-02-21T09:21:07.3443329Z cvt.u64.u32 %rd286, %r2084; 2026-02-21T09:21:07.3443402Z or.b64 %rd237, %rd286, 4611686293322072064; 2026-02-21T09:21:07.3443460Z // begin inline asm 2026-02-21T09:21:07.3443824Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1527,%r1528,%r1529,%r1530}, %rd237, %p92, 1, 1; 2026-02-21T09:21:07.3443883Z // end inline asm 2026-02-21T09:21:07.3444069Z add.s32 %r2085, %r2063, 8288; 
2026-02-21T09:21:07.3444140Z bfe.u32 %r2086, %r2085, 4, 14; 2026-02-21T09:21:07.3444203Z cvt.u64.u32 %rd287, %r2086; 2026-02-21T09:21:07.3444278Z or.b64 %rd238, %rd287, 4611686293322072064; 2026-02-21T09:21:07.3444338Z // begin inline asm 2026-02-21T09:21:07.3444711Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1547,%r1548,%r1549,%r1550}, %rd238, %p92, 1, 1; 2026-02-21T09:21:07.3444770Z // end inline asm 2026-02-21T09:21:07.3444831Z add.s32 %r2087, %r2063, 12288; 2026-02-21T09:21:07.3444890Z bfe.u32 %r2088, %r2087, 4, 14; 2026-02-21T09:21:07.3444954Z cvt.u64.u32 %rd288, %r2088; 2026-02-21T09:21:07.3445025Z or.b64 %rd239, %rd288, 4611686293322072064; 2026-02-21T09:21:07.3445144Z // begin inline asm 2026-02-21T09:21:07.3445526Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1567,%r1568,%r1569,%r1570}, %rd239, %p92, 1, 1; 2026-02-21T09:21:07.3445592Z // end inline asm 2026-02-21T09:21:07.3445653Z add.s32 %r2089, %r2063, 12320; 2026-02-21T09:21:07.3445713Z bfe.u32 %r2090, %r2089, 4, 14; 2026-02-21T09:21:07.3445777Z cvt.u64.u32 %rd289, %r2090; 2026-02-21T09:21:07.3445893Z or.b64 %rd240, %rd289, 4611686293322072064; 2026-02-21T09:21:07.3445954Z // begin inline asm 2026-02-21T09:21:07.3446333Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1587,%r1588,%r1589,%r1590}, %rd240, %p92, 1, 1; 2026-02-21T09:21:07.3446390Z // end inline asm 2026-02-21T09:21:07.3446568Z add.s32 %r2091, %r2063, 12352; 2026-02-21T09:21:07.3446642Z bfe.u32 %r2092, %r2091, 4, 14; 2026-02-21T09:21:07.3446711Z cvt.u64.u32 %rd290, %r2092; 2026-02-21T09:21:07.3446796Z or.b64 %rd241, %rd290, 4611686293322072064; 2026-02-21T09:21:07.3446856Z // begin inline asm 2026-02-21T09:21:07.3447234Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, 
{%r1607,%r1608,%r1609,%r1610}, %rd241, %p92, 1, 1; 2026-02-21T09:21:07.3447296Z // end inline asm 2026-02-21T09:21:07.3447357Z add.s32 %r2093, %r2063, 12384; 2026-02-21T09:21:07.3447419Z bfe.u32 %r2094, %r2093, 4, 14; 2026-02-21T09:21:07.3447482Z cvt.u64.u32 %rd291, %r2094; 2026-02-21T09:21:07.3447553Z or.b64 %rd242, %rd291, 4611686293322072064; 2026-02-21T09:21:07.3447612Z // begin inline asm 2026-02-21T09:21:07.3447982Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1627,%r1628,%r1629,%r1630}, %rd242, %p92, 1, 1; 2026-02-21T09:21:07.3448038Z // end inline asm 2026-02-21T09:21:07.3448098Z add.s32 %r2095, %r2063, 16384; 2026-02-21T09:21:07.3448160Z bfe.u32 %r2096, %r2095, 4, 14; 2026-02-21T09:21:07.3448220Z cvt.u64.u32 %rd292, %r2096; 2026-02-21T09:21:07.3448293Z or.b64 %rd243, %rd292, 4611686293322072064; 2026-02-21T09:21:07.3448350Z // begin inline asm 2026-02-21T09:21:07.3448719Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1647,%r1648,%r1649,%r1650}, %rd243, %p92, 1, 1; 2026-02-21T09:21:07.3448788Z // end inline asm 2026-02-21T09:21:07.3448850Z add.s32 %r2097, %r2063, 16416; 2026-02-21T09:21:07.3448916Z bfe.u32 %r2098, %r2097, 4, 14; 2026-02-21T09:21:07.3448978Z cvt.u64.u32 %rd293, %r2098; 2026-02-21T09:21:07.3449050Z or.b64 %rd244, %rd293, 4611686293322072064; 2026-02-21T09:21:07.3449111Z // begin inline asm 2026-02-21T09:21:07.3449475Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1667,%r1668,%r1669,%r1670}, %rd244, %p92, 1, 1; 2026-02-21T09:21:07.3449532Z // end inline asm 2026-02-21T09:21:07.3449592Z add.s32 %r2099, %r2063, 16448; 2026-02-21T09:21:07.3449658Z bfe.u32 %r2100, %r2099, 4, 14; 2026-02-21T09:21:07.3449719Z cvt.u64.u32 %rd294, %r2100; 2026-02-21T09:21:07.3449790Z or.b64 %rd245, %rd294, 4611686293322072064; 2026-02-21T09:21:07.3450001Z // begin 
inline asm 2026-02-21T09:21:07.3450374Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1687,%r1688,%r1689,%r1690}, %rd245, %p92, 1, 1; 2026-02-21T09:21:07.3450431Z // end inline asm 2026-02-21T09:21:07.3450495Z add.s32 %r2101, %r2063, 16480; 2026-02-21T09:21:07.3450554Z bfe.u32 %r2102, %r2101, 4, 14; 2026-02-21T09:21:07.3450616Z cvt.u64.u32 %rd295, %r2102; 2026-02-21T09:21:07.3450687Z or.b64 %rd246, %rd295, 4611686293322072064; 2026-02-21T09:21:07.3450750Z // begin inline asm 2026-02-21T09:21:07.3451130Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1707,%r1708,%r1709,%r1710}, %rd246, %p92, 1, 1; 2026-02-21T09:21:07.3451185Z // end inline asm 2026-02-21T09:21:07.3451319Z add.s32 %r2103, %r2063, 20480; 2026-02-21T09:21:07.3451382Z bfe.u32 %r2104, %r2103, 4, 14; 2026-02-21T09:21:07.3451446Z cvt.u64.u32 %rd296, %r2104; 2026-02-21T09:21:07.3451522Z or.b64 %rd247, %rd296, 4611686293322072064; 2026-02-21T09:21:07.3451580Z // begin inline asm 2026-02-21T09:21:07.3452032Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1727,%r1728,%r1729,%r1730}, %rd247, %p92, 1, 1; 2026-02-21T09:21:07.3452095Z // end inline asm 2026-02-21T09:21:07.3452159Z add.s32 %r2105, %r2063, 20512; 2026-02-21T09:21:07.3452219Z bfe.u32 %r2106, %r2105, 4, 14; 2026-02-21T09:21:07.3452280Z cvt.u64.u32 %rd297, %r2106; 2026-02-21T09:21:07.3452356Z or.b64 %rd248, %rd297, 4611686293322072064; 2026-02-21T09:21:07.3452414Z // begin inline asm 2026-02-21T09:21:07.3452791Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1747,%r1748,%r1749,%r1750}, %rd248, %p92, 1, 1; 2026-02-21T09:21:07.3452852Z // end inline asm 2026-02-21T09:21:07.3452913Z add.s32 %r2107, %r2063, 20544; 2026-02-21T09:21:07.3452972Z bfe.u32 %r2108, %r2107, 4, 14; 
2026-02-21T09:21:07.3453036Z cvt.u64.u32 %rd298, %r2108; 2026-02-21T09:21:07.3453110Z or.b64 %rd249, %rd298, 4611686293322072064; 2026-02-21T09:21:07.3453168Z // begin inline asm 2026-02-21T09:21:07.3453543Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1767,%r1768,%r1769,%r1770}, %rd249, %p92, 1, 1; 2026-02-21T09:21:07.3453605Z // end inline asm 2026-02-21T09:21:07.3453666Z add.s32 %r2109, %r2063, 20576; 2026-02-21T09:21:07.3453728Z bfe.u32 %r2110, %r2109, 4, 14; 2026-02-21T09:21:07.3453792Z cvt.u64.u32 %rd299, %r2110; 2026-02-21T09:21:07.3453864Z or.b64 %rd250, %rd299, 4611686293322072064; 2026-02-21T09:21:07.3453923Z // begin inline asm 2026-02-21T09:21:07.3454287Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1787,%r1788,%r1789,%r1790}, %rd250, %p92, 1, 1; 2026-02-21T09:21:07.3454345Z // end inline asm 2026-02-21T09:21:07.3454406Z add.s32 %r2111, %r2063, 24576; 2026-02-21T09:21:07.3454479Z bfe.u32 %r2112, %r2111, 4, 14; 2026-02-21T09:21:07.3454545Z cvt.u64.u32 %rd300, %r2112; 2026-02-21T09:21:07.3454616Z or.b64 %rd251, %rd300, 4611686293322072064; 2026-02-21T09:21:07.3454674Z // begin inline asm 2026-02-21T09:21:07.3455041Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1807,%r1808,%r1809,%r1810}, %rd251, %p92, 1, 1; 2026-02-21T09:21:07.3455097Z // end inline asm 2026-02-21T09:21:07.3455158Z add.s32 %r2113, %r2063, 24608; 2026-02-21T09:21:07.3455216Z bfe.u32 %r2114, %r2113, 4, 14; 2026-02-21T09:21:07.3455280Z cvt.u64.u32 %rd301, %r2114; 2026-02-21T09:21:07.3455349Z or.b64 %rd252, %rd301, 4611686293322072064; 2026-02-21T09:21:07.3455406Z // begin inline asm 2026-02-21T09:21:07.3455775Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1827,%r1828,%r1829,%r1830}, %rd252, %p92, 1, 1; 
2026-02-21T09:21:07.3455902Z // end inline asm 2026-02-21T09:21:07.3456007Z add.s32 %r2115, %r2063, 24640; 2026-02-21T09:21:07.3456073Z bfe.u32 %r2116, %r2115, 4, 14; 2026-02-21T09:21:07.3456133Z cvt.u64.u32 %rd302, %r2116; 2026-02-21T09:21:07.3456204Z or.b64 %rd253, %rd302, 4611686293322072064; 2026-02-21T09:21:07.3456264Z // begin inline asm 2026-02-21T09:21:07.3456759Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1847,%r1848,%r1849,%r1850}, %rd253, %p92, 1, 1; 2026-02-21T09:21:07.3456830Z // end inline asm 2026-02-21T09:21:07.3456891Z add.s32 %r2117, %r2063, 24672; 2026-02-21T09:21:07.3456954Z bfe.u32 %r2118, %r2117, 4, 14; 2026-02-21T09:21:07.3457015Z cvt.u64.u32 %rd303, %r2118; 2026-02-21T09:21:07.3457086Z or.b64 %rd254, %rd303, 4611686293322072064; 2026-02-21T09:21:07.3457227Z // begin inline asm 2026-02-21T09:21:07.3457600Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1867,%r1868,%r1869,%r1870}, %rd254, %p92, 1, 1; 2026-02-21T09:21:07.3457673Z // end inline asm 2026-02-21T09:21:07.3457734Z add.s32 %r2119, %r2063, 28672; 2026-02-21T09:21:07.3457798Z bfe.u32 %r2120, %r2119, 4, 14; 2026-02-21T09:21:07.3457858Z cvt.u64.u32 %rd304, %r2120; 2026-02-21T09:21:07.3457988Z or.b64 %rd255, %rd304, 4611686293322072064; 2026-02-21T09:21:07.3458052Z // begin inline asm 2026-02-21T09:21:07.3458425Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1887,%r1888,%r1889,%r1890}, %rd255, %p92, 1, 1; 2026-02-21T09:21:07.3458483Z // end inline asm 2026-02-21T09:21:07.3458546Z add.s32 %r2121, %r2063, 28704; 2026-02-21T09:21:07.3458604Z bfe.u32 %r2122, %r2121, 4, 14; 2026-02-21T09:21:07.3458673Z cvt.u64.u32 %rd305, %r2122; 2026-02-21T09:21:07.3458747Z or.b64 %rd256, %rd305, 4611686293322072064; 2026-02-21T09:21:07.3458811Z // begin inline asm 2026-02-21T09:21:07.3459175Z 
wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1907,%r1908,%r1909,%r1910}, %rd256, %p92, 1, 1; 2026-02-21T09:21:07.3459235Z // end inline asm 2026-02-21T09:21:07.3459298Z add.s32 %r2123, %r2063, 28736; 2026-02-21T09:21:07.3459358Z bfe.u32 %r2124, %r2123, 4, 14; 2026-02-21T09:21:07.3459419Z cvt.u64.u32 %rd306, %r2124; 2026-02-21T09:21:07.3459494Z or.b64 %rd257, %rd306, 4611686293322072064; 2026-02-21T09:21:07.3459553Z // begin inline asm 2026-02-21T09:21:07.3459929Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1927,%r1928,%r1929,%r1930}, %rd257, %p92, 1, 1; 2026-02-21T09:21:07.3459988Z // end inline asm 2026-02-21T09:21:07.3460054Z add.s32 %r2125, %r2063, 28768; 2026-02-21T09:21:07.3460114Z bfe.u32 %r2126, %r2125, 4, 14; 2026-02-21T09:21:07.3460177Z cvt.u64.u32 %rd307, %r2126; 2026-02-21T09:21:07.3460253Z or.b64 %rd258, %rd307, 4611686293322072064; 2026-02-21T09:21:07.3460312Z // begin inline asm 2026-02-21T09:21:07.3460677Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161}, {%r1947,%r1948,%r1949,%r1950}, %rd258, %p92, 1, 1; 2026-02-21T09:21:07.3460737Z // end inline asm 2026-02-21T09:21:07.3460816Z wgmma.commit_group.sync.aligned; 2026-02-21T09:21:07.3460880Z mov.b32 %r1959, %r180; 2026-02-21T09:21:07.3460940Z mov.b32 %r1961, %r1200; 2026-02-21T09:21:07.3461001Z mov.b32 %r1960, %r1200; 2026-02-21T09:21:07.3461061Z // begin inline asm 2026-02-21T09:21:07.3461242Z // wait for regs: %r2154,%r2155,%r2156,%r2157,%r2158,%r2159,%r2160,%r2161,%r1959,%r1960,%r1961 2026-02-21T09:21:07.3461321Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:21:07.3461379Z // end inline asm 2026-02-21T09:21:07.3461434Z $L__tmp4: 2026-02-21T09:21:07.3461653Z .loc 1 58 32 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:58:32 2026-02-21T09:21:07.3461733Z add.s64 %rd259, %rd311, %rd37; 
2026-02-21T09:21:07.3461796Z add.s64 %rd267, %rd311, %rd36; 2026-02-21T09:21:07.3461997Z add.s64 %rd260, %rd311, %rd35; 2026-02-21T09:21:07.3462063Z add.s64 %rd268, %rd311, %rd34; 2026-02-21T09:21:07.3462126Z add.s64 %rd261, %rd311, %rd33; 2026-02-21T09:21:07.3462189Z add.s64 %rd269, %rd311, %rd32; 2026-02-21T09:21:07.3462254Z add.s64 %rd262, %rd311, %rd31; 2026-02-21T09:21:07.3462314Z add.s64 %rd270, %rd311, %rd30; 2026-02-21T09:21:07.3462375Z add.s64 %rd263, %rd311, %rd29; 2026-02-21T09:21:07.3462436Z add.s64 %rd271, %rd311, %rd28; 2026-02-21T09:21:07.3462501Z add.s64 %rd264, %rd311, %rd27; 2026-02-21T09:21:07.3462561Z add.s64 %rd272, %rd311, %rd26; 2026-02-21T09:21:07.3462623Z add.s64 %rd265, %rd311, %rd25; 2026-02-21T09:21:07.3462690Z add.s64 %rd273, %rd311, %rd24; 2026-02-21T09:21:07.3462758Z add.s64 %rd266, %rd311, %rd23; 2026-02-21T09:21:07.3462887Z mad.wide.s32 %rd274, %r2151, 2, %rd42; 2026-02-21T09:21:07.3463108Z .loc 1 58 80 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:58:80 2026-02-21T09:21:07.3463192Z selp.b32 %r1974, 8, 0, %p129; 2026-02-21T09:21:07.3463252Z // begin inline asm 2026-02-21T09:21:07.3463403Z cp.async.ca.shared.global [ %r1165 + 0 ], [ %rd259 + 0 ], 0x8, %r1974; 2026-02-21T09:21:07.3463463Z // end inline asm 2026-02-21T09:21:07.3463587Z // begin inline asm 2026-02-21T09:21:07.3463726Z cp.async.ca.shared.global [ %r1167 + 0 ], [ %rd260 + 0 ], 0x8, %r1974; 2026-02-21T09:21:07.3463785Z // end inline asm 2026-02-21T09:21:07.3463842Z // begin inline asm 2026-02-21T09:21:07.3463973Z cp.async.ca.shared.global [ %r1169 + 0 ], [ %rd261 + 0 ], 0x8, %r1974; 2026-02-21T09:21:07.3464030Z // end inline asm 2026-02-21T09:21:07.3464098Z // begin inline asm 2026-02-21T09:21:07.3464234Z cp.async.ca.shared.global [ %r1171 + 0 ], [ %rd262 + 0 ], 0x8, %r1974; 2026-02-21T09:21:07.3464292Z // end inline asm 2026-02-21T09:21:07.3464353Z // begin inline asm 2026-02-21T09:21:07.3464483Z cp.async.ca.shared.global [ %r1173 + 0 ], [ %rd263 + 0 ], 0x8, 
%r1974; 2026-02-21T09:21:07.3464543Z // end inline asm 2026-02-21T09:21:07.3464606Z // begin inline asm 2026-02-21T09:21:07.3464745Z cp.async.ca.shared.global [ %r1175 + 0 ], [ %rd264 + 0 ], 0x8, %r1974; 2026-02-21T09:21:07.3464802Z // end inline asm 2026-02-21T09:21:07.3464861Z // begin inline asm 2026-02-21T09:21:07.3465005Z cp.async.ca.shared.global [ %r1177 + 0 ], [ %rd265 + 0 ], 0x8, %r1974; 2026-02-21T09:21:07.3465062Z // end inline asm 2026-02-21T09:21:07.3465121Z // begin inline asm 2026-02-21T09:21:07.3465255Z cp.async.ca.shared.global [ %r1179 + 0 ], [ %rd266 + 0 ], 0x8, %r1974; 2026-02-21T09:21:07.3465312Z // end inline asm 2026-02-21T09:21:07.3465369Z // begin inline asm 2026-02-21T09:21:07.3465501Z cp.async.ca.shared.global [ %r1181 + 0 ], [ %rd267 + 0 ], 0x8, %r1974; 2026-02-21T09:21:07.3465560Z // end inline asm 2026-02-21T09:21:07.3465619Z // begin inline asm 2026-02-21T09:21:07.3465748Z cp.async.ca.shared.global [ %r1183 + 0 ], [ %rd268 + 0 ], 0x8, %r1974; 2026-02-21T09:21:07.3465817Z // end inline asm 2026-02-21T09:21:07.3465879Z // begin inline asm 2026-02-21T09:21:07.3466009Z cp.async.ca.shared.global [ %r1185 + 0 ], [ %rd269 + 0 ], 0x8, %r1974; 2026-02-21T09:21:07.3466065Z // end inline asm 2026-02-21T09:21:07.3466125Z // begin inline asm 2026-02-21T09:21:07.3466255Z cp.async.ca.shared.global [ %r1187 + 0 ], [ %rd270 + 0 ], 0x8, %r1974; 2026-02-21T09:21:07.3466310Z // end inline asm 2026-02-21T09:21:07.3466370Z // begin inline asm 2026-02-21T09:21:07.3466627Z cp.async.ca.shared.global [ %r1189 + 0 ], [ %rd271 + 0 ], 0x8, %r1974; 2026-02-21T09:21:07.3466697Z // end inline asm 2026-02-21T09:21:07.3466758Z // begin inline asm 2026-02-21T09:21:07.3466893Z cp.async.ca.shared.global [ %r1191 + 0 ], [ %rd272 + 0 ], 0x8, %r1974; 2026-02-21T09:21:07.3466949Z // end inline asm 2026-02-21T09:21:07.3467008Z // begin inline asm 2026-02-21T09:21:07.3467139Z cp.async.ca.shared.global [ %r1193 + 0 ], [ %rd273 + 0 ], 0x8, %r1974; 2026-02-21T09:21:07.3467194Z // 
end inline asm 2026-02-21T09:21:07.3467411Z // begin inline asm 2026-02-21T09:21:07.3467547Z cp.async.ca.shared.global [ %r1195 + 0 ], [ %rd274 + 0 ], 0x8, %r1974; 2026-02-21T09:21:07.3467604Z // end inline asm 2026-02-21T09:21:07.3467670Z cp.async.commit_group; 2026-02-21T09:21:07.3467895Z .loc 1 51 125 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:51:125 2026-02-21T09:21:07.3467969Z and.pred %p124, %p2, %p129; 2026-02-21T09:21:07.3468027Z // begin inline asm 2026-02-21T09:21:07.3468163Z @%p124 mbarrier.arrive.expect_tx.shared.b64 _, [%r1160], 4096; 2026-02-21T09:21:07.3468224Z // end inline asm 2026-02-21T09:21:07.3468508Z .loc 1 64 33 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:64:33 2026-02-21T09:21:07.3468567Z bar.sync 0; 2026-02-21T09:21:07.3468712Z elect.sync %r2127|%p131, -1; 2026-02-21T09:21:07.3468785Z and.pred %p132, %p129, %p131; 2026-02-21T09:21:07.3468854Z and.pred %p125, %p1, %p132; 2026-02-21T09:21:07.3468918Z cvt.u32.u64 %r2128, %rd312; 2026-02-21T09:21:07.3468997Z add.s32 %r2008, %r2128, 256; 2026-02-21T09:21:07.3469061Z // begin inline asm 2026-02-21T09:21:07.3469404Z @%p125 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r235], [%rd211, {%r2130, %r2008}], [%r1160]; 2026-02-21T09:21:07.3469538Z // end inline asm 2026-02-21T09:21:07.3469765Z .loc 1 51 125 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:51:125 2026-02-21T09:21:07.3469830Z add.s64 %rd311, %rd311, 512; 2026-02-21T09:21:07.3469894Z add.s32 %r2151, %r2151, 256; 2026-02-21T09:21:07.3469955Z mov.b64 %rd312, %rd40; 2026-02-21T09:21:07.3470018Z @%p129 bra $L__BB0_5; 2026-02-21T09:21:07.3470132Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:21:07.3470203Z cp.async.wait_group 0; 2026-02-21T09:21:07.3470260Z bar.sync 0; 2026-02-21T09:21:07.3470320Z // begin inline asm 2026-02-21T09:21:07.3470418Z @%p2 mbarrier.inval.shared::cta.b64 [%r1160]; 2026-02-21T09:21:07.3470490Z // end inline asm 
2026-02-21T09:21:07.3470701Z .loc 1 97 28 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:97:28 2026-02-21T09:21:07.3470787Z cvt.rn.bf16x2.f32 %r2134, %r2155, %r2154; 2026-02-21T09:21:07.3470866Z cvt.rn.bf16x2.f32 %r2135, %r2157, %r2156; 2026-02-21T09:21:07.3470939Z cvt.rn.bf16x2.f32 %r2136, %r2159, %r2158; 2026-02-21T09:21:07.3471009Z cvt.rn.bf16x2.f32 %r2137, %r2161, %r2160; 2026-02-21T09:21:07.3471222Z .loc 1 98 43 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:98:43 2026-02-21T09:21:07.3471303Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:21:07.3471369Z bar.sync 0; 2026-02-21T09:21:07.3471566Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r53], {%r2134, %r2135, %r2136, %r2137}; 2026-02-21T09:21:07.3471630Z // begin inline asm 2026-02-21T09:21:07.3471710Z fence.proxy.async.shared::cta; 2026-02-21T09:21:07.3471769Z // end inline asm 2026-02-21T09:21:07.3471840Z bar.sync 0; 2026-02-21T09:21:07.3471925Z elect.sync %r2138|%p136, -1; 2026-02-21T09:21:07.3471996Z and.pred %p134, %p1, %p136; 2026-02-21T09:21:07.3472059Z // begin inline asm 2026-02-21T09:21:07.3472295Z @%p134 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd194, {%r2130, %r2131}], [%r1163]; 2026-02-21T09:21:07.3472355Z // end inline asm 2026-02-21T09:21:07.3472433Z cp.async.bulk.commit_group; 2026-02-21T09:21:07.3472639Z .loc 1 31 74 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:31:74 2026-02-21T09:21:07.3472701Z add.s32 %r2139, %r2139, 2; 2026-02-21T09:21:07.3472773Z setp.lt.s32 %p137, %r2139, %r3; 2026-02-21T09:21:07.3472839Z @%p137 bra $L__BB0_2; 2026-02-21T09:21:07.3472927Z $L__BB0_7: // %._crit_edge 2026-02-21T09:21:07.3473004Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:21:07.3473070Z bar.sync 0; 2026-02-21T09:21:07.3473276Z .loc 1 31 4 // ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py:31:4 2026-02-21T09:21:07.3473440Z ret; 2026-02-21T09:21:07.3473500Z $L__tmp5: 2026-02-21T09:21:07.3473560Z $L__func_end0: 2026-02-21T09:21:07.3473653Z // -- 
End function 2026-02-21T09:21:07.3473706Z } 2026-02-21T09:21:07.3473962Z .file 1 "/tmp/torchinductor_root/kc/ckcyf5ioegy7sg4bglobsk4ybpxy4f22skggcpij2tsoag3nj6jo.py" 2026-02-21T09:21:07.3474176Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T09:21:07.3474241Z .section .debug_abbrev 2026-02-21T09:21:07.3474294Z { 2026-02-21T09:21:07.3474391Z .b8 1 // Abbreviation Code 2026-02-21T09:21:07.3474488Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:21:07.3474573Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:21:07.3474720Z .b8 37 // DW_AT_producer 2026-02-21T09:21:07.3474805Z .b8 8 // DW_FORM_string 2026-02-21T09:21:07.3474889Z .b8 19 // DW_AT_language 2026-02-21T09:21:07.3474974Z .b8 5 // DW_FORM_data2 2026-02-21T09:21:07.3475057Z .b8 3 // DW_AT_name 2026-02-21T09:21:07.3475191Z .b8 8 // DW_FORM_string 2026-02-21T09:21:07.3475282Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:21:07.3475363Z .b8 6 // DW_FORM_data4 2026-02-21T09:21:07.3475444Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:21:07.3475540Z .b8 8 // DW_FORM_string 2026-02-21T09:21:07.3475615Z .b8 0 // EOM(1) 2026-02-21T09:21:07.3475684Z .b8 0 // EOM(2) 2026-02-21T09:21:07.3475774Z .b8 2 // Abbreviation Code 2026-02-21T09:21:07.3475866Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:21:07.3475948Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:21:07.3476023Z .b8 3 // DW_AT_name 2026-02-21T09:21:07.3476103Z .b8 8 // DW_FORM_string 2026-02-21T09:21:07.3476185Z .b8 32 // DW_AT_inline 2026-02-21T09:21:07.3476266Z .b8 11 // DW_FORM_data1 2026-02-21T09:21:07.3476342Z .b8 0 // EOM(1) 2026-02-21T09:21:07.3476411Z .b8 0 // EOM(2) 2026-02-21T09:21:07.3476622Z .b8 3 // Abbreviation Code 2026-02-21T09:21:07.3476726Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:21:07.3476815Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:21:07.3476895Z .b8 17 // DW_AT_low_pc 2026-02-21T09:21:07.3476972Z .b8 1 // DW_FORM_addr 2026-02-21T09:21:07.3477060Z .b8 18 // DW_AT_high_pc 2026-02-21T09:21:07.3477147Z .b8 1 // DW_FORM_addr 
2026-02-21T09:21:07.3477244Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:21:07.3477326Z .b8 19 // DW_FORM_ref4 2026-02-21T09:21:07.3477398Z .b8 0 // EOM(1) 2026-02-21T09:21:07.3477466Z .b8 0 // EOM(2) 2026-02-21T09:21:07.3477552Z .b8 4 // Abbreviation Code 2026-02-21T09:21:07.3477659Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T09:21:07.3477741Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:21:07.3477833Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:21:07.3477922Z .b8 19 // DW_FORM_ref4 2026-02-21T09:21:07.3478002Z .b8 17 // DW_AT_low_pc 2026-02-21T09:21:07.3478224Z .b8 1 // DW_FORM_addr 2026-02-21T09:21:07.3478323Z .b8 18 // DW_AT_high_pc 2026-02-21T09:21:07.3478401Z .b8 1 // DW_FORM_addr 2026-02-21T09:21:07.3478488Z .b8 88 // DW_AT_call_file 2026-02-21T09:21:07.3478571Z .b8 11 // DW_FORM_data1 2026-02-21T09:21:07.3478653Z .b8 89 // DW_AT_call_line 2026-02-21T09:21:07.3478731Z .b8 11 // DW_FORM_data1 2026-02-21T09:21:07.3478817Z .b8 87 // DW_AT_call_column 2026-02-21T09:21:07.3478898Z .b8 11 // DW_FORM_data1 2026-02-21T09:21:07.3479033Z .b8 0 // EOM(1) 2026-02-21T09:21:07.3479107Z .b8 0 // EOM(2) 2026-02-21T09:21:07.3479182Z .b8 0 // EOM(3) 2026-02-21T09:21:07.3479237Z } 2026-02-21T09:21:07.3479302Z .section .debug_info 2026-02-21T09:21:07.3479356Z { 2026-02-21T09:21:07.3479447Z .b32 178 // Length of Unit 2026-02-21T09:21:07.3479608Z .b8 2 // DWARF version number 2026-02-21T09:21:07.3479665Z .b8 0 2026-02-21T09:21:07.3479803Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:21:07.3479899Z .b8 8 // Address Size (in bytes) 2026-02-21T09:21:07.3480016Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T09:21:07.3480108Z .b8 116 // DW_AT_producer 2026-02-21T09:21:07.3480164Z .b8 114 2026-02-21T09:21:07.3480221Z .b8 105 2026-02-21T09:21:07.3480274Z .b8 116 2026-02-21T09:21:07.3480328Z .b8 111 2026-02-21T09:21:07.3480380Z .b8 110 2026-02-21T09:21:07.3480432Z .b8 0 2026-02-21T09:21:07.3480516Z .b8 2 // DW_AT_language 2026-02-21T09:21:07.3480581Z .b8 0 2026-02-21T09:21:07.3480662Z .b8 99 // DW_AT_name 2026-02-21T09:21:07.3480715Z .b8 107 2026-02-21T09:21:07.3480769Z .b8 99 2026-02-21T09:21:07.3480820Z .b8 121 2026-02-21T09:21:07.3480871Z .b8 102 2026-02-21T09:21:07.3480927Z .b8 53 2026-02-21T09:21:07.3480988Z .b8 105 2026-02-21T09:21:07.3481042Z .b8 111 2026-02-21T09:21:07.3481094Z .b8 101 2026-02-21T09:21:07.3481150Z .b8 103 2026-02-21T09:21:07.3481200Z .b8 121 2026-02-21T09:21:07.3481256Z .b8 55 2026-02-21T09:21:07.3481312Z .b8 115 2026-02-21T09:21:07.3481363Z .b8 103 2026-02-21T09:21:07.3481414Z .b8 52 2026-02-21T09:21:07.3481463Z .b8 98 2026-02-21T09:21:07.3481519Z .b8 103 2026-02-21T09:21:07.3481571Z .b8 108 2026-02-21T09:21:07.3481620Z .b8 111 2026-02-21T09:21:07.3481670Z .b8 98 2026-02-21T09:21:07.3481727Z .b8 115 2026-02-21T09:21:07.3481788Z .b8 107 2026-02-21T09:21:07.3481845Z .b8 52 2026-02-21T09:21:07.3481900Z .b8 121 2026-02-21T09:21:07.3481952Z .b8 98 2026-02-21T09:21:07.3482007Z .b8 112 2026-02-21T09:21:07.3482057Z .b8 120 2026-02-21T09:21:07.3482111Z .b8 121 2026-02-21T09:21:07.3482161Z .b8 52 2026-02-21T09:21:07.3482211Z .b8 102 2026-02-21T09:21:07.3482265Z .b8 50 2026-02-21T09:21:07.3482315Z .b8 50 2026-02-21T09:21:07.3482366Z .b8 115 2026-02-21T09:21:07.3482419Z .b8 107 2026-02-21T09:21:07.3482475Z .b8 103 2026-02-21T09:21:07.3482525Z .b8 103 2026-02-21T09:21:07.3482576Z .b8 99 2026-02-21T09:21:07.3482633Z .b8 112 2026-02-21T09:21:07.3482685Z .b8 105 2026-02-21T09:21:07.3482736Z .b8 106 
2026-02-21T09:21:07.3482785Z .b8 50 2026-02-21T09:21:07.3482846Z .b8 116 2026-02-21T09:21:07.3482902Z .b8 115 2026-02-21T09:21:07.3482954Z .b8 111 2026-02-21T09:21:07.3483005Z .b8 97 2026-02-21T09:21:07.3483059Z .b8 103 2026-02-21T09:21:07.3483110Z .b8 51 2026-02-21T09:21:07.3483161Z .b8 110 2026-02-21T09:21:07.3483217Z .b8 106 2026-02-21T09:21:07.3483267Z .b8 54 2026-02-21T09:21:07.3483319Z .b8 106 2026-02-21T09:21:07.3483370Z .b8 111 2026-02-21T09:21:07.3483422Z .b8 46 2026-02-21T09:21:07.3483588Z .b8 112 2026-02-21T09:21:07.3483639Z .b8 121 2026-02-21T09:21:07.3483693Z .b8 0 2026-02-21T09:21:07.3483795Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:21:07.3483887Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:21:07.3483942Z .b8 116 2026-02-21T09:21:07.3483997Z .b8 109 2026-02-21T09:21:07.3484048Z .b8 112 2026-02-21T09:21:07.3484099Z .b8 47 2026-02-21T09:21:07.3484153Z .b8 116 2026-02-21T09:21:07.3484206Z .b8 111 2026-02-21T09:21:07.3484257Z .b8 114 2026-02-21T09:21:07.3484308Z .b8 99 2026-02-21T09:21:07.3484362Z .b8 104 2026-02-21T09:21:07.3484412Z .b8 105 2026-02-21T09:21:07.3484471Z .b8 110 2026-02-21T09:21:07.3484523Z .b8 100 2026-02-21T09:21:07.3484578Z .b8 117 2026-02-21T09:21:07.3484629Z .b8 99 2026-02-21T09:21:07.3484680Z .b8 116 2026-02-21T09:21:07.3484814Z .b8 111 2026-02-21T09:21:07.3484870Z .b8 114 2026-02-21T09:21:07.3484922Z .b8 95 2026-02-21T09:21:07.3484974Z .b8 114 2026-02-21T09:21:07.3485030Z .b8 111 2026-02-21T09:21:07.3485085Z .b8 111 2026-02-21T09:21:07.3485138Z .b8 116 2026-02-21T09:21:07.3485191Z .b8 47 2026-02-21T09:21:07.3485243Z .b8 107 2026-02-21T09:21:07.3485293Z .b8 99 2026-02-21T09:21:07.3485346Z .b8 0 2026-02-21T09:21:07.3485520Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T09:21:07.3485604Z .b8 95 // DW_AT_name 2026-02-21T09:21:07.3485657Z .b8 104 2026-02-21T09:21:07.3485715Z .b8 101 2026-02-21T09:21:07.3485766Z .b8 108 2026-02-21T09:21:07.3485817Z .b8 105 2026-02-21T09:21:07.3485869Z .b8 111 2026-02-21T09:21:07.3485924Z 
.b8 110 2026-02-21T09:21:07.3485975Z .b8 95 2026-02-21T09:21:07.3486025Z .b8 109 2026-02-21T09:21:07.3486077Z .b8 97 2026-02-21T09:21:07.3486128Z .b8 116 2026-02-21T09:21:07.3486179Z .b8 109 2026-02-21T09:21:07.3486231Z .b8 117 2026-02-21T09:21:07.3486287Z .b8 108 2026-02-21T09:21:07.3486337Z .b8 95 2026-02-21T09:21:07.3486389Z .b8 98 2026-02-21T09:21:07.3486441Z .b8 102 2026-02-21T09:21:07.3486632Z .b8 49 2026-02-21T09:21:07.3486689Z .b8 54 2026-02-21T09:21:07.3486740Z .b8 95 2026-02-21T09:21:07.3486798Z .b8 105 2026-02-21T09:21:07.3486848Z .b8 110 2026-02-21T09:21:07.3486909Z .b8 116 2026-02-21T09:21:07.3486962Z .b8 52 2026-02-21T09:21:07.3487017Z .b8 0 2026-02-21T09:21:07.3487098Z .b8 1 // DW_AT_inline 2026-02-21T09:21:07.3487205Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T09:21:07.3487304Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T09:21:07.3487398Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T09:21:07.3487499Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:21:07.3487630Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T09:21:07.3487727Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:21:07.3487813Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T09:21:07.3487906Z .b64 $L__tmp4 // DW_AT_high_pc 2026-02-21T09:21:07.3487993Z .b8 1 // DW_AT_call_file 2026-02-21T09:21:07.3488077Z .b8 94 // DW_AT_call_line 2026-02-21T09:21:07.3488164Z .b8 40 // DW_AT_call_column 2026-02-21T09:21:07.3488268Z .b8 0 // End Of Children Mark 2026-02-21T09:21:07.3488356Z .b8 0 // End Of Children Mark 2026-02-21T09:21:07.3488407Z } 2026-02-21T09:21:07.3488482Z .section .debug_macinfo { } 2026-02-21T09:21:07.3488487Z 2026-02-21T09:21:07.3488566Z ================================================================ 2026-02-21T09:21:07.3488683Z please share the reproducer above with Triton project. 
2026-02-21T09:21:07.4495330Z 2026-02-21T09:21:07.4497363Z Generation 4: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━━ 94/94 5.4 configs/s 2026-02-21T09:21:10.0842159Z Generation 4: verifying top configs 100% ━━━━━━━━━━━━━━━━ 140/140 47.4 configs/s 2026-02-21T09:21:10.4707304Z [550s] Generation 4 complete: 2026-02-21T09:21:10.4707582Z error=25 2026-02-21T09:21:10.4707760Z timeout=2 2026-02-21T09:21:10.4707922Z ok=72 2026-02-21T09:21:10.4708083Z min=1.4480 2026-02-21T09:21:10.4708254Z mid=3.5167 2026-02-21T09:21:10.4708499Z max=95.5290 2026-02-21T09:21:10.4708685Z best={'block_sizes': [32, 128, 128], 2026-02-21T09:21:10.4709087Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:21:10.4709491Z 'l2_groupings': [1], 2026-02-21T09:21:10.4709735Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:21:10.4710027Z 'loop_orders': [[0, 1]], 2026-02-21T09:21:10.4710238Z 'maxnreg': 256, 2026-02-21T09:21:10.4710439Z 'num_sm_multiplier': 8, 2026-02-21T09:21:10.4710937Z 'num_stages': 7, 2026-02-21T09:21:10.4711142Z 'num_warps': 4, 2026-02-21T09:21:10.4711345Z 'pid_type': 'persistent_blocked', 2026-02-21T09:21:10.4711609Z 'range_flattens': [None, None], 2026-02-21T09:21:10.4711896Z 'range_multi_buffers': [False, None], 2026-02-21T09:21:10.4712161Z 'range_num_stages': [2, 2], 2026-02-21T09:21:10.4712398Z 'range_unroll_factors': [4, 1], 2026-02-21T09:21:10.4712647Z 'range_warp_specializes': []} 2026-02-21T09:21:10.4753358Z [550s] Fitting surrogate: 555 points, 555 targets 2026-02-21T09:21:12.2115652Z [552s] Generation 5 starting: 102 neighbors, 5 active search path(s) 2026-02-21T09:21:50.9404998Z [591s] Timeout after 30s compiling Config(block_sizes=[16, 256, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[32], load_eviction_policies=['', 'last'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=8, num_stages=7, num_warps=1, pid_type='persistent_interleaved', range_flattens=[None, None], 
range_multi_buffers=[True, None], range_num_stages=[4, 4], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T09:21:54.0570437Z [594s] Timeout after 30s compiling Config(block_sizes=[16, 512, 64], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[64], load_eviction_policies=['', 'first'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=8, num_stages=7, num_warps=1, pid_type='persistent_interleaved', range_flattens=[None, None], range_multi_buffers=[True, None], range_num_stages=[4, 4], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T09:21:54.3473423Z [594s] Timeout after 30s compiling Config(block_sizes=[16, 256, 64], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[64], load_eviction_policies=['', 'last'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=8, num_stages=7, num_warps=1, pid_type='persistent_interleaved', range_flattens=[None, None], range_multi_buffers=[True, False], range_num_stages=[3, 4], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T09:21:59.7411492Z Generation 5: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 104/104 0.6 configs/s 2026-02-21T09:22:02.8304870Z 2026-02-21T09:22:02.8304884Z 2026-02-21T09:22:02.8305267Z ================================================================ 2026-02-21T09:22:02.8305655Z Internal Triton PTX codegen error 2026-02-21T09:22:02.8305867Z `ptxas` stderr: 2026-02-21T09:22:02.8306755Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_helion_matmul_bf16_int4' 2026-02-21T09:22:02.8307764Z ptxas fatal : (C7600) Register allocation failed with register count of '64'. 
Compile the program with a higher register target 2026-02-21T09:22:02.8308394Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:22:02.8308591Z 2026-02-21T09:22:02.8309118Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpx3qykswl.ptx -o /tmp/tmpx3qykswl.ptx.o 2026-02-21T09:22:02.8309699Z 2026-02-21T09:22:02.8309703Z 2026-02-21T09:22:02.8309770Z // 2026-02-21T09:22:02.8309927Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:22:02.8310666Z // 2026-02-21T09:22:02.8310748Z 2026-02-21T09:22:02.8310806Z .version 8.7 2026-02-21T09:22:02.8310965Z .target sm_90a 2026-02-21T09:22:02.8311111Z .address_size 64 2026-02-21T09:22:02.8311211Z 2026-02-21T09:22:02.8311399Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T09:22:02.8311757Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:22:02.8312051Z // @_helion_matmul_bf16_int4 2026-02-21T09:22:02.8312323Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T09:22:02.8312621Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T09:22:02.8312978Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T09:22:02.8313454Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T09:22:02.8313819Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T09:22:02.8314186Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T09:22:02.8314462Z ) 2026-02-21T09:22:02.8314595Z .reqntid 1024 2026-02-21T09:22:02.8314732Z { 2026-02-21T09:22:02.8314865Z .reg .pred %p<13>; 2026-02-21T09:22:02.8315034Z .reg .b16 %rs<39>; 2026-02-21T09:22:02.8315315Z .reg .b32 %r<1112>; 2026-02-21T09:22:02.8315487Z .reg .b64 %rd<48>; 2026-02-21T09:22:02.8315812Z .loc 1 13 0 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:13:0 2026-02-21T09:22:02.8316178Z $L__func_begin0: 
2026-02-21T09:22:02.8316679Z .loc 1 13 0 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:13:0 2026-02-21T09:22:02.8317011Z 2026-02-21T09:22:02.8317082Z // %bb.0: 2026-02-21T09:22:02.8317292Z ld.param.b64 %rd9, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T09:22:02.8317602Z ld.param.b64 %rd8, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T09:22:02.8317888Z ld.param.b64 %rd7, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T09:22:02.8318133Z $L__tmp0: 2026-02-21T09:22:02.8318432Z .loc 1 17 33 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:17:33 2026-02-21T09:22:02.8318806Z mov.u32 %r117, %ctaid.x; 2026-02-21T09:22:02.8319134Z .loc 1 20 29 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:20:29 2026-02-21T09:22:02.8319486Z shr.u32 %r118, %r117, 6; 2026-02-21T09:22:02.8319665Z and.b32 %r119, %r118, 33554424; 2026-02-21T09:22:02.8319999Z .loc 1 21 35 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:21:35 2026-02-21T09:22:02.8320356Z sub.s32 %r120, 64, %r119; 2026-02-21T09:22:02.8320668Z .loc 1 21 48 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:21:48 2026-02-21T09:22:02.8321021Z min.s32 %r121, %r120, 8; 2026-02-21T09:22:02.8321337Z .loc 1 22 41 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:22:41 2026-02-21T09:22:02.8321717Z and.b32 %r122, %r117, 511; 2026-02-21T09:22:02.8322058Z .loc 1 23 47 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:23:47 2026-02-21T09:22:02.8322415Z div.s32 %r123, %r122, %r121; 2026-02-21T09:22:02.8322744Z .loc 1 22 60 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:22:60 2026-02-21T09:22:02.8323102Z mul.lo.s32 %r124, %r123, %r121; 2026-02-21T09:22:02.8323299Z sub.s32 %r125, %r122, %r124; 2026-02-21T09:22:02.8323788Z [603s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:22:02.8325200Z Config: @helion.kernel(config=helion.Config(block_sizes=[8, 256, 128], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['first', 'first'], loop_orders=[[1, 0]], num_stages=6, num_warps=32, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 2], range_warp_specializes=[]), static_shapes=True) 2026-02-21T09:22:02.8326827Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:22:02.8327214Z `ptxas` stderr: 2026-02-21T09:22:02.8327848Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_helion_matmul_bf16_int4' 2026-02-21T09:22:02.8328784Z ptxas fatal : (C7600) Register allocation failed with register count of '64'. Compile the program with a higher register target 2026-02-21T09:22:02.8329286Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:22:02.8329466Z 2026-02-21T09:22:02.8329971Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpx3qykswl.ptx -o /tmp/tmpx3qykswl.ptx.o 2026-02-21T09:22:02.8330643Z 2026-02-21T09:22:02.8330803Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:22:02.8331242Z .loc 1 22 26 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:22:26 2026-02-21T09:22:02.8331624Z add.s32 %r126, %r125, %r119; 2026-02-21T09:22:02.8331947Z .loc 1 24 23 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:24:23 2026-02-21T09:22:02.8332372Z shl.b32 %r1, %r126, 7; 2026-02-21T09:22:02.8332691Z .loc 1 25 41 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:25:41 2026-02-21T09:22:02.8333036Z mov.u32 %r2, %tid.x; 2026-02-21T09:22:02.8333205Z and.b32 %r127, %r2, 31; 2026-02-21T09:22:02.8333370Z shr.u32 %r3, %r2, 5; 2026-02-21T09:22:02.8333533Z and.b32 %r128, %r2, 127; 2026-02-21T09:22:02.8333839Z .loc 1 26 23 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:26:23 2026-02-21T09:22:02.8334199Z shl.b32 %r4, %r123, 8; 2026-02-21T09:22:02.8334504Z .loc 1 27 41 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:27:41 2026-02-21T09:22:02.8334850Z shr.u32 %r129, %r2, 2; 2026-02-21T09:22:02.8335176Z .loc 1 27 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:27:28 2026-02-21T09:22:02.8335525Z or.b32 %r130, %r4, %r129; 2026-02-21T09:22:02.8335841Z .loc 1 41 34 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:41:34 2026-02-21T09:22:02.8336194Z and.b32 %r5, %r2, 3; 2026-02-21T09:22:02.8336350Z shl.b32 %r131, %r5, 2; 2026-02-21T09:22:02.8336783Z .loc 1 42 49 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:49 2026-02-21T09:22:02.8337131Z shl.b32 %r132, %r130, 10; 2026-02-21T09:22:02.8337444Z .loc 1 59 34 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:59:34 2026-02-21T09:22:02.8337807Z and.b32 %r6, %r2, 128; 2026-02-21T09:22:02.8338117Z .loc 1 42 56 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:56 2026-02-21T09:22:02.8338466Z or.b32 %r133, %r132, %r131; 2026-02-21T09:22:02.8338792Z .loc 1 42 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:28 2026-02-21T09:22:02.8339156Z mad.wide.s32 %rd10, %r133, 2, %rd7; 
2026-02-21T09:22:02.8339494Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:02.8339846Z shl.b32 %r7, %r2, 3; 2026-02-21T09:22:02.8340002Z and.b32 %r134, %r7, 8056; 2026-02-21T09:22:02.8340179Z bfe.s32 %r135, %r2, 4, 1; 2026-02-21T09:22:02.8340352Z and.b32 %r136, %r135, 136; 2026-02-21T09:22:02.8340536Z xor.b32 %r137, %r136, %r134; 2026-02-21T09:22:02.8340723Z mov.b32 %r138, global_smem; 2026-02-21T09:22:02.8340902Z add.s32 %r94, %r138, %r137; 2026-02-21T09:22:02.8341077Z mov.b32 %r95, 8; 2026-02-21T09:22:02.8341230Z // begin inline asm 2026-02-21T09:22:02.8341472Z cp.async.ca.shared.global [ %r94 + 0 ], [ %rd10 + 0 ], 0x8, %r95; 2026-02-21T09:22:02.8341745Z // end inline asm 2026-02-21T09:22:02.8341912Z cp.async.commit_group; 2026-02-21T09:22:02.8342328Z .loc 1 42 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:28 2026-02-21T09:22:02.8342766Z add.s64 %rd11, %rd10, 32; 2026-02-21T09:22:02.8343085Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:02.8343439Z add.s32 %r96, %r94, 40960; 2026-02-21T09:22:02.8343617Z // begin inline asm 2026-02-21T09:22:02.8343842Z cp.async.ca.shared.global [ %r96 + 0 ], [ %rd11 + 0 ], 0x8, %r95; 2026-02-21T09:22:02.8344112Z // end inline asm 2026-02-21T09:22:02.8344266Z cp.async.commit_group; 2026-02-21T09:22:02.8344597Z .loc 1 42 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:28 2026-02-21T09:22:02.8344946Z add.s64 %rd12, %rd10, 64; 2026-02-21T09:22:02.8345353Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:02.8345705Z bar.sync 0; 2026-02-21T09:22:02.8345848Z add.s32 %r98, %r94, 8192; 2026-02-21T09:22:02.8346026Z // begin inline asm 2026-02-21T09:22:02.8346247Z cp.async.ca.shared.global [ %r98 + 0 ], [ %rd12 + 0 ], 0x8, %r95; 2026-02-21T09:22:02.8346638Z // end inline asm 2026-02-21T09:22:02.8346801Z cp.async.commit_group; 2026-02-21T09:22:02.8347191Z .loc 1 42 
28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:28 2026-02-21T09:22:02.8347550Z add.s64 %rd13, %rd10, 96; 2026-02-21T09:22:02.8347856Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:02.8348332Z add.s32 %r100, %r94, 49152; 2026-02-21T09:22:02.8348511Z // begin inline asm 2026-02-21T09:22:02.8348744Z cp.async.ca.shared.global [ %r100 + 0 ], [ %rd13 + 0 ], 0x8, %r95; 2026-02-21T09:22:02.8349010Z // end inline asm 2026-02-21T09:22:02.8349172Z cp.async.commit_group; 2026-02-21T09:22:02.8349473Z .loc 1 42 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:28 2026-02-21T09:22:02.8349836Z add.s64 %rd14, %rd10, 128; 2026-02-21T09:22:02.8350157Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:02.8350497Z bar.sync 0; 2026-02-21T09:22:02.8350647Z add.s32 %r102, %r94, 16384; 2026-02-21T09:22:02.8350823Z // begin inline asm 2026-02-21T09:22:02.8351051Z cp.async.ca.shared.global [ %r102 + 0 ], [ %rd14 + 0 ], 0x8, %r95; 2026-02-21T09:22:02.8351318Z // end inline asm 2026-02-21T09:22:02.8351480Z cp.async.commit_group; 2026-02-21T09:22:02.8351788Z .loc 1 42 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:28 2026-02-21T09:22:02.8352143Z add.s64 %rd15, %rd10, 160; 2026-02-21T09:22:02.8352466Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:02.8352819Z add.s32 %r104, %r94, 57344; 2026-02-21T09:22:02.8352997Z // begin inline asm 2026-02-21T09:22:02.8353218Z cp.async.ca.shared.global [ %r104 + 0 ], [ %rd15 + 0 ], 0x8, %r95; 2026-02-21T09:22:02.8353494Z // end inline asm 2026-02-21T09:22:02.8353649Z cp.async.commit_group; 2026-02-21T09:22:02.8353959Z .loc 1 42 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:28 2026-02-21T09:22:02.8354315Z add.s64 %rd16, %rd10, 192; 2026-02-21T09:22:02.8354637Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 
2026-02-21T09:22:02.8354994Z bar.sync 0; 2026-02-21T09:22:02.8355138Z add.s32 %r106, %r94, 24576; 2026-02-21T09:22:02.8355320Z // begin inline asm 2026-02-21T09:22:02.8355546Z cp.async.ca.shared.global [ %r106 + 0 ], [ %rd16 + 0 ], 0x8, %r95; 2026-02-21T09:22:02.8355821Z // end inline asm 2026-02-21T09:22:02.8355976Z cp.async.commit_group; 2026-02-21T09:22:02.8356293Z .loc 1 42 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:28 2026-02-21T09:22:02.8356796Z add.s64 %rd17, %rd10, 224; 2026-02-21T09:22:02.8357108Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:02.8357640Z add.s32 %r108, %r94, 65536; 2026-02-21T09:22:02.8357819Z // begin inline asm 2026-02-21T09:22:02.8358058Z cp.async.ca.shared.global [ %r108 + 0 ], [ %rd17 + 0 ], 0x8, %r95; 2026-02-21T09:22:02.8358327Z // end inline asm 2026-02-21T09:22:02.8358488Z cp.async.commit_group; 2026-02-21T09:22:02.8358798Z .loc 1 42 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:28 2026-02-21T09:22:02.8359161Z add.s64 %rd18, %rd10, 256; 2026-02-21T09:22:02.8359479Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:02.8359824Z bar.sync 0; 2026-02-21T09:22:02.8359977Z add.s32 %r110, %r94, 32768; 2026-02-21T09:22:02.8360254Z // begin inline asm 2026-02-21T09:22:02.8360488Z cp.async.ca.shared.global [ %r110 + 0 ], [ %rd18 + 0 ], 0x8, %r95; 2026-02-21T09:22:02.8360754Z // end inline asm 2026-02-21T09:22:02.8360918Z cp.async.commit_group; 2026-02-21T09:22:02.8361233Z .loc 1 42 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:28 2026-02-21T09:22:02.8361596Z add.s64 %rd19, %rd10, 288; 2026-02-21T09:22:02.8361983Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:02.8362334Z add.s32 %r112, %r94, 73728; 2026-02-21T09:22:02.8362514Z // begin inline asm 2026-02-21T09:22:02.8362737Z cp.async.ca.shared.global [ %r112 + 0 ], [ %rd19 + 0 ], 0x8, %r95; 
2026-02-21T09:22:02.8363010Z // end inline asm 2026-02-21T09:22:02.8363165Z cp.async.commit_group; 2026-02-21T09:22:02.8363333Z shl.b32 %r139, %r2, 4; 2026-02-21T09:22:02.8363503Z and.b32 %r140, %r139, 7680; 2026-02-21T09:22:02.8363676Z and.b32 %r141, %r7, 96; 2026-02-21T09:22:02.8363845Z shl.b32 %r142, %r5, 1; 2026-02-21T09:22:02.8364008Z or.b32 %r143, %r140, %r141; 2026-02-21T09:22:02.8364196Z or.b32 %r144, %r143, %r142; 2026-02-21T09:22:02.8364378Z or.b32 %r10, %r144, %r136; 2026-02-21T09:22:02.8364553Z xor.b32 %r11, %r10, 8; 2026-02-21T09:22:02.8364713Z shl.b32 %r12, %r2, 2; 2026-02-21T09:22:02.8364878Z and.b32 %r145, %r12, 124; 2026-02-21T09:22:02.8365046Z and.b32 %r146, %r2, 384; 2026-02-21T09:22:02.8365221Z shr.u32 %r13, %r2, 4; 2026-02-21T09:22:02.8365383Z and.b32 %r147, %r13, 2; 2026-02-21T09:22:02.8365552Z and.b32 %r148, %r7, 512; 2026-02-21T09:22:02.8372742Z setp.gt.u32 %p1, %r2, 511; 2026-02-21T09:22:02.8372960Z selp.b32 %r149, 1, 0, %p1; 2026-02-21T09:22:02.8373150Z add.s32 %r691, %r138, 81920; 2026-02-21T09:22:02.8373361Z add.s32 %r151, %r691, %r146; 2026-02-21T09:22:02.8373553Z add.s32 %r152, %r151, %r149; 2026-02-21T09:22:02.8373738Z add.s32 %r153, %r152, %r148; 2026-02-21T09:22:02.8373917Z add.s32 %r154, %r153, %r147; 2026-02-21T09:22:02.8374105Z add.s32 %r14, %r154, %r145; 2026-02-21T09:22:02.8374284Z shr.u32 %r155, %r2, 1; 2026-02-21T09:22:02.8374462Z and.b32 %r156, %r155, 384; 2026-02-21T09:22:02.8374654Z add.s32 %r157, %r691, %r147; 2026-02-21T09:22:02.8374830Z add.s32 %r158, %r157, %r156; 2026-02-21T09:22:02.8375010Z add.s32 %r159, %r158, %r145; 2026-02-21T09:22:02.8375184Z add.s32 %r15, %r159, %r148; 2026-02-21T09:22:02.8375365Z shl.b32 %r160, %r128, 6; 2026-02-21T09:22:02.8375538Z and.b32 %r161, %r7, 48; 2026-02-21T09:22:02.8375713Z and.b32 %r162, %r3, 28; 2026-02-21T09:22:02.8375875Z xor.b32 %r163, %r161, %r162; 2026-02-21T09:22:02.8376060Z or.b32 %r164, %r163, %r160; 2026-02-21T09:22:02.8376240Z add.s32 %r16, %r691, %r164; 
2026-02-21T09:22:02.8376423Z xor.b32 %r165, %r164, 32; 2026-02-21T09:22:02.8376770Z add.s32 %r17, %r691, %r165; 2026-02-21T09:22:02.8376946Z shl.b32 %r166, %r3, 7; 2026-02-21T09:22:02.8377123Z shl.b32 %r167, %r127, 4; 2026-02-21T09:22:02.8377293Z or.b32 %r168, %r166, %r167; 2026-02-21T09:22:02.8377486Z add.s32 %r169, %r138, 90112; 2026-02-21T09:22:02.8377665Z add.s32 %r695, %r169, %r168; 2026-02-21T09:22:02.8377847Z and.b32 %r170, %r139, 112; 2026-02-21T09:22:02.8378273Z shl.b32 %r171, %r127, 3; 2026-02-21T09:22:02.8378445Z or.b32 %r172, %r166, %r171; 2026-02-21T09:22:02.8378616Z and.b32 %r173, %r172, 1920; 2026-02-21T09:22:02.8378798Z shl.b32 %r174, %r2, 8; 2026-02-21T09:22:02.8378971Z and.b32 %r175, %r174, 2048; 2026-02-21T09:22:02.8379144Z add.s32 %r176, %r169, %r170; 2026-02-21T09:22:02.8379323Z add.s32 %r177, %r176, %r175; 2026-02-21T09:22:02.8379494Z add.s32 %r194, %r177, %r173; 2026-02-21T09:22:02.8379690Z bfe.u32 %r178, %r691, 4, 14; 2026-02-21T09:22:02.8379875Z cvt.u64.u32 %rd21, %r178; 2026-02-21T09:22:02.8380083Z or.b64 %rd28, %rd21, -9223371899382267904; 2026-02-21T09:22:02.8380302Z add.s32 %r179, %r138, 81952; 2026-02-21T09:22:02.8380501Z bfe.u32 %r180, %r179, 4, 14; 2026-02-21T09:22:02.8380776Z cvt.u64.u32 %rd22, %r180; 2026-02-21T09:22:02.8380978Z or.b64 %rd29, %rd22, -9223371899382267904; 2026-02-21T09:22:02.8381203Z add.s32 %r181, %r138, 86016; 2026-02-21T09:22:02.8381385Z bfe.u32 %r182, %r181, 4, 14; 2026-02-21T09:22:02.8381576Z cvt.u64.u32 %rd23, %r182; 2026-02-21T09:22:02.8381760Z or.b64 %rd30, %rd23, -9223371899382267904; 2026-02-21T09:22:02.8381971Z add.s32 %r183, %r138, 86048; 2026-02-21T09:22:02.8382142Z bfe.u32 %r184, %r183, 4, 14; 2026-02-21T09:22:02.8382397Z cvt.u64.u32 %rd24, %r184; 2026-02-21T09:22:02.8382586Z or.b64 %rd31, %rd24, -9223371899382267904; 2026-02-21T09:22:02.8382964Z .loc 1 34 74 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:34:74 2026-02-21T09:22:02.8383358Z shl.b32 %r185, %r123, 18; 
2026-02-21T09:22:02.8383531Z shl.b32 %r186, %r129, 10; 2026-02-21T09:22:02.8383714Z or.b32 %r187, %r185, %r186; 2026-02-21T09:22:02.8383894Z or.b32 %r188, %r187, %r131; 2026-02-21T09:22:02.8384076Z or.b32 %r1077, %r188, 176; 2026-02-21T09:22:02.8384253Z shl.b32 %r189, %r2, 6; 2026-02-21T09:22:02.8384431Z and.b32 %r190, %r189, 57344; 2026-02-21T09:22:02.8384608Z add.s32 %r191, %r190, %r1; 2026-02-21T09:22:02.8384792Z or.b32 %r1076, %r191, %r128; 2026-02-21T09:22:02.8384973Z mov.b32 %r826, 0f00000000; 2026-02-21T09:22:02.8385144Z mov.b32 %r1079, 4; 2026-02-21T09:22:02.8385318Z mov.b32 %r1078, -1; 2026-02-21T09:22:02.8385484Z mov.b64 %rd47, -16; 2026-02-21T09:22:02.8385660Z setp.eq.b32 %p8, %r6, 0; 2026-02-21T09:22:02.8385837Z mov.b32 %r827, %r826; 2026-02-21T09:22:02.8386005Z mov.b32 %r828, %r826; 2026-02-21T09:22:02.8386165Z mov.b32 %r829, %r826; 2026-02-21T09:22:02.8386328Z mov.b32 %r830, %r826; 2026-02-21T09:22:02.8386616Z mov.b32 %r831, %r826; 2026-02-21T09:22:02.8386797Z mov.b32 %r832, %r826; 2026-02-21T09:22:02.8386961Z mov.b32 %r833, %r826; 2026-02-21T09:22:02.8387116Z mov.b32 %r834, %r826; 2026-02-21T09:22:02.8387277Z mov.b32 %r835, %r826; 2026-02-21T09:22:02.8387443Z mov.b32 %r836, %r826; 2026-02-21T09:22:02.8387612Z mov.b32 %r837, %r826; 2026-02-21T09:22:02.8387767Z mov.b32 %r838, %r826; 2026-02-21T09:22:02.8387927Z mov.b32 %r839, %r826; 2026-02-21T09:22:02.8388087Z mov.b32 %r840, %r826; 2026-02-21T09:22:02.8388349Z mov.b32 %r841, %r826; 2026-02-21T09:22:02.8388511Z mov.b32 %r842, %r826; 2026-02-21T09:22:02.8388671Z mov.b32 %r843, %r826; 2026-02-21T09:22:02.8388830Z mov.b32 %r844, %r826; 2026-02-21T09:22:02.8388986Z mov.b32 %r845, %r826; 2026-02-21T09:22:02.8389151Z mov.b32 %r846, %r826; 2026-02-21T09:22:02.8389304Z mov.b32 %r847, %r826; 2026-02-21T09:22:02.8389470Z mov.b32 %r848, %r826; 2026-02-21T09:22:02.8389623Z mov.b32 %r849, %r826; 2026-02-21T09:22:02.8389783Z mov.b32 %r850, %r826; 2026-02-21T09:22:02.8389942Z mov.b32 %r851, %r826; 
2026-02-21T09:22:02.8390104Z mov.b32 %r852, %r826; 2026-02-21T09:22:02.8390257Z mov.b32 %r853, %r826; 2026-02-21T09:22:02.8390418Z mov.b32 %r854, %r826; 2026-02-21T09:22:02.8390571Z mov.b32 %r855, %r826; 2026-02-21T09:22:02.8390737Z mov.b32 %r856, %r826; 2026-02-21T09:22:02.8390897Z mov.b32 %r857, %r826; 2026-02-21T09:22:02.8391111Z $L__BB0_1: // =>This Inner Loop Header: Depth=1 2026-02-21T09:22:02.8391541Z add.s64 %rd47, %rd47, 16; 2026-02-21T09:22:02.8391732Z setp.lt.u64 %p9, %rd47, 432; 2026-02-21T09:22:02.8391919Z add.s32 %r968, %r1078, 1; 2026-02-21T09:22:02.8392094Z setp.gt.s32 %p10, %r968, 4; 2026-02-21T09:22:02.8392293Z selp.b32 %r1078, 0, %r968, %p10; 2026-02-21T09:22:02.8392639Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:02.8393014Z cp.async.wait_group 8; 2026-02-21T09:22:02.8393197Z bar.sync 0; 2026-02-21T09:22:02.8393350Z shl.b32 %r969, %r1078, 13; 2026-02-21T09:22:02.8393536Z add.s32 %r971, %r138, %r969; 2026-02-21T09:22:02.8393852Z .loc 1 46 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:46:28 2026-02-21T09:22:02.8394302Z add.s32 %r972, %r971, %r10; 2026-02-21T09:22:02.8394494Z ld.shared.b16 %rs3, [%r972]; 2026-02-21T09:22:02.8394702Z ld.shared.b16 %rs4, [%r972+256]; 2026-02-21T09:22:02.8394902Z ld.shared.b16 %rs5, [%r972+16]; 2026-02-21T09:22:02.8395106Z ld.shared.b16 %rs6, [%r972+272]; 2026-02-21T09:22:02.8395301Z add.s32 %r973, %r971, %r11; 2026-02-21T09:22:02.8395483Z ld.shared.b16 %rs7, [%r973]; 2026-02-21T09:22:02.8395680Z ld.shared.b16 %rs8, [%r973+256]; 2026-02-21T09:22:02.8395945Z ld.shared.b16 %rs9, [%r973+16]; 2026-02-21T09:22:02.8396156Z ld.shared.b16 %rs10, [%r973+272]; 2026-02-21T09:22:02.8396356Z cvt.f32.bf16 %r488, %rs3; 2026-02-21T09:22:02.8396661Z cvt.f32.bf16 %r489, %rs4; 2026-02-21T09:22:02.8396838Z cvt.f32.bf16 %r490, %rs7; 2026-02-21T09:22:02.8397025Z cvt.f32.bf16 %r491, %rs8; 2026-02-21T09:22:02.8397206Z cvt.f32.bf16 %r556, %rs5; 2026-02-21T09:22:02.8397377Z 
cvt.f32.bf16 %r557, %rs6; 2026-02-21T09:22:02.8397550Z cvt.f32.bf16 %r558, %rs9; 2026-02-21T09:22:02.8397720Z cvt.f32.bf16 %r559, %rs10; 2026-02-21T09:22:02.8398065Z .loc 1 48 30 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:48:30 2026-02-21T09:22:02.8398428Z cvt.s64.s32 %rd39, %r1076; 2026-02-21T09:22:02.8398619Z add.s64 %rd26, %rd8, %rd39; 2026-02-21T09:22:02.8398947Z .loc 1 48 83 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:48:83 2026-02-21T09:22:02.8399307Z // begin inline asm 2026-02-21T09:22:02.8399476Z mov.u64 %rd25, 0x0; 2026-02-21T09:22:02.8399738Z createpolicy.fractional.L2::evict_first.b64 %rd25, 1.0; 2026-02-21T09:22:02.8400007Z // end inline asm 2026-02-21T09:22:02.8400161Z // begin inline asm 2026-02-21T09:22:02.8400322Z mov.u16 %rs1, 0x0; 2026-02-21T09:22:02.8400573Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs1 }, [ %rd26 + 0 ], %rd25; 2026-02-21T09:22:02.8400886Z // end inline asm 2026-02-21T09:22:02.8401196Z .loc 1 56 24 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:56:24 2026-02-21T09:22:02.8401570Z st.shared.b8 [%r14], %rs1; 2026-02-21T09:22:02.8401755Z bar.sync 0; 2026-02-21T09:22:02.8401923Z ld.shared.v2.b8 {%rs11, %rs12}, [%r15]; 2026-02-21T09:22:02.8402291Z .loc 1 51 24 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:51:24 2026-02-21T09:22:02.8402650Z shl.b16 %rs13, %rs11, 4; 2026-02-21T09:22:02.8402832Z shl.b16 %rs14, %rs12, 4; 2026-02-21T09:22:02.8403141Z .loc 1 66 54 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:66:54 2026-02-21T09:22:02.8403509Z selp.b16 %rs15, %rs13, %rs11, %p8; 2026-02-21T09:22:02.8403717Z cvt.s16.s8 %rs16, %rs15; 2026-02-21T09:22:02.8403887Z shr.s16 %rs17, %rs16, 4; 2026-02-21T09:22:02.8404066Z selp.b16 %rs18, %rs14, %rs12, %p8; 2026-02-21T09:22:02.8404262Z cvt.s16.s8 %rs19, %rs18; 2026-02-21T09:22:02.8404439Z shr.s16 %rs20, %rs19, 4; 2026-02-21T09:22:02.8404749Z .loc 1 71 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:71:28 
2026-02-21T09:22:02.8405113Z cvt.rn.f32.s16 %r974, %rs17; 2026-02-21T09:22:02.8405306Z cvt.rn.f32.s16 %r975, %rs20; 2026-02-21T09:22:02.8405479Z bar.sync 0; 2026-02-21T09:22:02.8405634Z st.shared.b32 [%r16], %r974; 2026-02-21T09:22:02.8405968Z st.shared.b32 [%r17], %r975; 2026-02-21T09:22:02.8406227Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r826}; 2026-02-21T09:22:02.8406656Z bar.sync 0; 2026-02-21T09:22:02.8406826Z // begin inline asm 2026-02-21T09:22:02.8407079Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r356, %r492}, [%r194]; 2026-02-21T09:22:02.8407372Z // end inline asm 2026-02-21T09:22:02.8407520Z bar.sync 0; 2026-02-21T09:22:02.8407738Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r828}; 2026-02-21T09:22:02.8408004Z bar.sync 0; 2026-02-21T09:22:02.8408147Z // begin inline asm 2026-02-21T09:22:02.8408393Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r358, %r494}, [%r194]; 2026-02-21T09:22:02.8408674Z // end inline asm 2026-02-21T09:22:02.8408825Z bar.sync 0; 2026-02-21T09:22:02.8409121Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r827}; 2026-02-21T09:22:02.8409402Z bar.sync 0; 2026-02-21T09:22:02.8409544Z // begin inline asm 2026-02-21T09:22:02.8409788Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r357, %r493}, [%r194]; 2026-02-21T09:22:02.8410073Z // end inline asm 2026-02-21T09:22:02.8410214Z bar.sync 0; 2026-02-21T09:22:02.8410425Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r829}; 2026-02-21T09:22:02.8410742Z bar.sync 0; 2026-02-21T09:22:02.8410891Z // begin inline asm 2026-02-21T09:22:02.8411248Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r359, %r495}, [%r194]; 2026-02-21T09:22:02.8411560Z // end inline asm 2026-02-21T09:22:02.8411707Z bar.sync 0; 2026-02-21T09:22:02.8411915Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r830}; 2026-02-21T09:22:02.8412171Z bar.sync 0; 2026-02-21T09:22:02.8412313Z // begin inline asm 2026-02-21T09:22:02.8412547Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r360, %r496}, [%r194]; 
2026-02-21T09:22:02.8412827Z // end inline asm 2026-02-21T09:22:02.8412984Z bar.sync 0; 2026-02-21T09:22:02.8413193Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r832}; 2026-02-21T09:22:02.8413457Z bar.sync 0; 2026-02-21T09:22:02.8413596Z // begin inline asm 2026-02-21T09:22:02.8413832Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r362, %r498}, [%r194]; 2026-02-21T09:22:02.8414106Z // end inline asm 2026-02-21T09:22:02.8414254Z bar.sync 0; 2026-02-21T09:22:02.8414463Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r831}; 2026-02-21T09:22:02.8414719Z bar.sync 0; 2026-02-21T09:22:02.8414864Z // begin inline asm 2026-02-21T09:22:02.8415095Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r361, %r497}, [%r194]; 2026-02-21T09:22:02.8415376Z // end inline asm 2026-02-21T09:22:02.8415521Z bar.sync 0; 2026-02-21T09:22:02.8415721Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r833}; 2026-02-21T09:22:02.8415979Z bar.sync 0; 2026-02-21T09:22:02.8416116Z // begin inline asm 2026-02-21T09:22:02.8416351Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r363, %r499}, [%r194]; 2026-02-21T09:22:02.8416789Z // end inline asm 2026-02-21T09:22:02.8416941Z bar.sync 0; 2026-02-21T09:22:02.8417152Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r834}; 2026-02-21T09:22:02.8417410Z bar.sync 0; 2026-02-21T09:22:02.8417556Z // begin inline asm 2026-02-21T09:22:02.8417804Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r364, %r500}, [%r194]; 2026-02-21T09:22:02.8418081Z // end inline asm 2026-02-21T09:22:02.8418223Z bar.sync 0; 2026-02-21T09:22:02.8418425Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r836}; 2026-02-21T09:22:02.8418679Z bar.sync 0; 2026-02-21T09:22:02.8418820Z // begin inline asm 2026-02-21T09:22:02.8419051Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r366, %r502}, [%r194]; 2026-02-21T09:22:02.8419327Z // end inline asm 2026-02-21T09:22:02.8419466Z bar.sync 0; 2026-02-21T09:22:02.8419663Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r835}; 
2026-02-21T09:22:02.8419921Z bar.sync 0; 2026-02-21T09:22:02.8420063Z // begin inline asm 2026-02-21T09:22:02.8420298Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r365, %r501}, [%r194]; 2026-02-21T09:22:02.8420570Z // end inline asm 2026-02-21T09:22:02.8420907Z bar.sync 0; 2026-02-21T09:22:02.8421110Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r837}; 2026-02-21T09:22:02.8421366Z bar.sync 0; 2026-02-21T09:22:02.8421501Z // begin inline asm 2026-02-21T09:22:02.8421737Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r367, %r503}, [%r194]; 2026-02-21T09:22:02.8422019Z // end inline asm 2026-02-21T09:22:02.8422159Z bar.sync 0; 2026-02-21T09:22:02.8422366Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r838}; 2026-02-21T09:22:02.8422615Z bar.sync 0; 2026-02-21T09:22:02.8422750Z // begin inline asm 2026-02-21T09:22:02.8422976Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r368, %r504}, [%r194]; 2026-02-21T09:22:02.8423253Z // end inline asm 2026-02-21T09:22:02.8423396Z bar.sync 0; 2026-02-21T09:22:02.8423674Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r840}; 2026-02-21T09:22:02.8423940Z bar.sync 0; 2026-02-21T09:22:02.8424080Z // begin inline asm 2026-02-21T09:22:02.8424313Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r370, %r506}, [%r194]; 2026-02-21T09:22:02.8424589Z // end inline asm 2026-02-21T09:22:02.8424730Z bar.sync 0; 2026-02-21T09:22:02.8424931Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r839}; 2026-02-21T09:22:02.8425189Z bar.sync 0; 2026-02-21T09:22:02.8425390Z // begin inline asm 2026-02-21T09:22:02.8425633Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r369, %r505}, [%r194]; 2026-02-21T09:22:02.8425912Z // end inline asm 2026-02-21T09:22:02.8426055Z bar.sync 0; 2026-02-21T09:22:02.8426266Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r841}; 2026-02-21T09:22:02.8426685Z bar.sync 0; 2026-02-21T09:22:02.8426842Z // begin inline asm 2026-02-21T09:22:02.8427076Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r371, %r507}, [%r194]; 
2026-02-21T09:22:02.8427356Z // end inline asm 2026-02-21T09:22:02.8427501Z bar.sync 0; 2026-02-21T09:22:02.8427716Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r842}; 2026-02-21T09:22:02.8427973Z bar.sync 0; 2026-02-21T09:22:02.8428121Z // begin inline asm 2026-02-21T09:22:02.8428442Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r372, %r508}, [%r194]; 2026-02-21T09:22:02.8428719Z // end inline asm 2026-02-21T09:22:02.8428865Z bar.sync 0; 2026-02-21T09:22:02.8429068Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r844}; 2026-02-21T09:22:02.8429331Z bar.sync 0; 2026-02-21T09:22:02.8429468Z // begin inline asm 2026-02-21T09:22:02.8429702Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r374, %r510}, [%r194]; 2026-02-21T09:22:02.8429975Z // end inline asm 2026-02-21T09:22:02.8430122Z bar.sync 0; 2026-02-21T09:22:02.8430326Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r843}; 2026-02-21T09:22:02.8430587Z bar.sync 0; 2026-02-21T09:22:02.8430727Z // begin inline asm 2026-02-21T09:22:02.8430959Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r373, %r509}, [%r194]; 2026-02-21T09:22:02.8431234Z // end inline asm 2026-02-21T09:22:02.8431376Z bar.sync 0; 2026-02-21T09:22:02.8431580Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r845}; 2026-02-21T09:22:02.8431839Z bar.sync 0; 2026-02-21T09:22:02.8431980Z // begin inline asm 2026-02-21T09:22:02.8432208Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r375, %r511}, [%r194]; 2026-02-21T09:22:02.8432484Z // end inline asm 2026-02-21T09:22:02.8432631Z bar.sync 0; 2026-02-21T09:22:02.8432835Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r846}; 2026-02-21T09:22:02.8433094Z bar.sync 0; 2026-02-21T09:22:02.8433231Z // begin inline asm 2026-02-21T09:22:02.8433467Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r376, %r512}, [%r194]; 2026-02-21T09:22:02.8433737Z // end inline asm 2026-02-21T09:22:02.8433886Z bar.sync 0; 2026-02-21T09:22:02.8434088Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r848}; 
2026-02-21T09:22:02.8434349Z bar.sync 0; 2026-02-21T09:22:02.8434501Z // begin inline asm 2026-02-21T09:22:02.8434742Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r378, %r514}, [%r194]; 2026-02-21T09:22:02.8435018Z // end inline asm 2026-02-21T09:22:02.8435249Z bar.sync 0; 2026-02-21T09:22:02.8435522Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r847}; 2026-02-21T09:22:02.8435774Z bar.sync 0; 2026-02-21T09:22:02.8435919Z // begin inline asm 2026-02-21T09:22:02.8436149Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r377, %r513}, [%r194]; 2026-02-21T09:22:02.8436430Z // end inline asm 2026-02-21T09:22:02.8436764Z bar.sync 0; 2026-02-21T09:22:02.8436973Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r849}; 2026-02-21T09:22:02.8437251Z bar.sync 0; 2026-02-21T09:22:02.8437391Z // begin inline asm 2026-02-21T09:22:02.8437630Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r379, %r515}, [%r194]; 2026-02-21T09:22:02.8437904Z // end inline asm 2026-02-21T09:22:02.8438052Z bar.sync 0; 2026-02-21T09:22:02.8438270Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r850}; 2026-02-21T09:22:02.8438632Z bar.sync 0; 2026-02-21T09:22:02.8438782Z // begin inline asm 2026-02-21T09:22:02.8439020Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r380, %r516}, [%r194]; 2026-02-21T09:22:02.8439307Z // end inline asm 2026-02-21T09:22:02.8439450Z bar.sync 0; 2026-02-21T09:22:02.8439665Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r852}; 2026-02-21T09:22:02.8439931Z bar.sync 0; 2026-02-21T09:22:02.8440077Z // begin inline asm 2026-02-21T09:22:02.8440374Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r382, %r518}, [%r194]; 2026-02-21T09:22:02.8440657Z // end inline asm 2026-02-21T09:22:02.8440798Z bar.sync 0; 2026-02-21T09:22:02.8441006Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r851}; 2026-02-21T09:22:02.8441259Z bar.sync 0; 2026-02-21T09:22:02.8441406Z // begin inline asm 2026-02-21T09:22:02.8441656Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r381, %r517}, [%r194]; 
2026-02-21T09:22:02.8441944Z // end inline asm 2026-02-21T09:22:02.8442096Z bar.sync 0; 2026-02-21T09:22:02.8442310Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r853}; 2026-02-21T09:22:02.8442573Z bar.sync 0; 2026-02-21T09:22:02.8442714Z // begin inline asm 2026-02-21T09:22:02.8442974Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r383, %r519}, [%r194]; 2026-02-21T09:22:02.8443247Z // end inline asm 2026-02-21T09:22:02.8443395Z bar.sync 0; 2026-02-21T09:22:02.8443600Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r854}; 2026-02-21T09:22:02.8443866Z bar.sync 0; 2026-02-21T09:22:02.8444013Z // begin inline asm 2026-02-21T09:22:02.8444252Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r384, %r520}, [%r194]; 2026-02-21T09:22:02.8444538Z // end inline asm 2026-02-21T09:22:02.8444685Z bar.sync 0; 2026-02-21T09:22:02.8444899Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r856}; 2026-02-21T09:22:02.8445161Z bar.sync 0; 2026-02-21T09:22:02.8445306Z // begin inline asm 2026-02-21T09:22:02.8445543Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r386, %r522}, [%r194]; 2026-02-21T09:22:02.8445826Z // end inline asm 2026-02-21T09:22:02.8445973Z bar.sync 0; 2026-02-21T09:22:02.8446180Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r855}; 2026-02-21T09:22:02.8446574Z bar.sync 0; 2026-02-21T09:22:02.8446727Z // begin inline asm 2026-02-21T09:22:02.8446968Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r385, %r521}, [%r194]; 2026-02-21T09:22:02.8447243Z // end inline asm 2026-02-21T09:22:02.8447390Z bar.sync 0; 2026-02-21T09:22:02.8447598Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r857}; 2026-02-21T09:22:02.8447861Z bar.sync 0; 2026-02-21T09:22:02.8448000Z // begin inline asm 2026-02-21T09:22:02.8448238Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r387, %r523}, [%r194]; 2026-02-21T09:22:02.8448518Z // end inline asm 2026-02-21T09:22:02.8448674Z $L__tmp1: 2026-02-21T09:22:02.8449047Z .loc 2 291 36 // standard.py:291:36 @[ 
cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:78:36 ] 2026-02-21T09:22:02.8449467Z // begin inline asm 2026-02-21T09:22:02.8449656Z fence.proxy.async.shared::cta; 2026-02-21T09:22:02.8449848Z // end inline asm 2026-02-21T09:22:02.8450028Z shfl.sync.idx.b32 %r976, %r3, 0, 31, -1; 2026-02-21T09:22:02.8450408Z wgmma.fence.sync.aligned; 2026-02-21T09:22:02.8450605Z mov.pred %p2, -1; 2026-02-21T09:22:02.8450770Z // begin inline asm 2026-02-21T09:22:02.8451538Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r356,%r357,%r358,%r359,%r360,%r361,%r362,%r363,%r364,%r365,%r366,%r367,%r368,%r369,%r370,%r371,%r372,%r373,%r374,%r375,%r376,%r377,%r378,%r379,%r380,%r381,%r382,%r383,%r384,%r385,%r386,%r387}, {%r488,%r489,%r490,%r491}, %rd28, %p2, 1, 1; 2026-02-21T09:22:02.8452348Z // end inline asm 2026-02-21T09:22:02.8452502Z // begin inline asm 2026-02-21T09:22:02.8453334Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r356,%r357,%r358,%r359,%r360,%r361,%r362,%r363,%r364,%r365,%r366,%r367,%r368,%r369,%r370,%r371,%r372,%r373,%r374,%r375,%r376,%r377,%r378,%r379,%r380,%r381,%r382,%r383,%r384,%r385,%r386,%r387}, {%r556,%r557,%r558,%r559}, %rd29, %p2, 1, 1; 2026-02-21T09:22:02.8454131Z // end inline asm 2026-02-21T09:22:02.8454278Z // begin inline asm 2026-02-21T09:22:02.8455019Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r492,%r493,%r494,%r495,%r496,%r497,%r498,%r499,%r500,%r501,%r502,%r503,%r504,%r505,%r506,%r507,%r508,%r509,%r510,%r511,%r512,%r513,%r514,%r515,%r516,%r517,%r518,%r519,%r520,%r521,%r522,%r523}, {%r488,%r489,%r490,%r491}, %rd30, %p2, 1, 1; 2026-02-21T09:22:02.8455872Z // end inline asm 2026-02-21T09:22:02.8456021Z // begin inline asm 2026-02-21T09:22:02.8456879Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r492,%r493,%r494,%r495,%r496,%r497,%r498,%r499,%r500,%r501,%r502,%r503,%r504,%r505,%r506,%r507,%r508,%r509,%r510,%r511,%r512,%r513,%r514,%r515,%r516,%r517,%r518,%r519,%r520,%r521,%r522,%r523}, {%r556,%r557,%r558,%r559}, 
%rd31, %p2, 1, 1; 2026-02-21T09:22:02.8457673Z // end inline asm 2026-02-21T09:22:02.8457844Z wgmma.commit_group.sync.aligned; 2026-02-21T09:22:02.8458041Z mov.b32 %r928, 0; 2026-02-21T09:22:02.8458199Z mov.b32 %r624, %r691; 2026-02-21T09:22:02.8458367Z mov.b32 %r625, %r928; 2026-02-21T09:22:02.8458521Z mov.b32 %r626, %r928; 2026-02-21T09:22:02.8458688Z // begin inline asm 2026-02-21T09:22:02.8459664Z // wait for regs: %r356,%r357,%r358,%r359,%r360,%r361,%r362,%r363,%r364,%r365,%r366,%r367,%r368,%r369,%r370,%r371,%r372,%r373,%r374,%r375,%r376,%r377,%r378,%r379,%r380,%r381,%r382,%r383,%r384,%r385,%r386,%r387,%r492,%r493,%r494,%r495,%r496,%r497,%r498,%r499,%r500,%r501,%r502,%r503,%r504,%r505,%r506,%r507,%r508,%r509,%r510,%r511,%r512,%r513,%r514,%r515,%r516,%r517,%r518,%r519,%r520,%r521,%r522,%r523,%r624,%r625,%r626 2026-02-21T09:22:02.8460697Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:22:02.8460904Z // end inline asm 2026-02-21T09:22:02.8461052Z $L__tmp2: 2026-02-21T09:22:02.8461353Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:02.8461726Z add.s32 %r977, %r971, 40960; 2026-02-21T09:22:02.8462053Z .loc 1 46 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:46:28 2026-02-21T09:22:02.8462406Z add.s32 %r978, %r977, %r10; 2026-02-21T09:22:02.8462602Z ld.shared.b16 %rs21, [%r978]; 2026-02-21T09:22:02.8462798Z ld.shared.b16 %rs22, [%r978+256]; 2026-02-21T09:22:02.8463005Z ld.shared.b16 %rs23, [%r978+16]; 2026-02-21T09:22:02.8463206Z ld.shared.b16 %rs24, [%r978+272]; 2026-02-21T09:22:02.8463396Z add.s32 %r979, %r977, %r11; 2026-02-21T09:22:02.8463580Z ld.shared.b16 %rs25, [%r979]; 2026-02-21T09:22:02.8463762Z ld.shared.b16 %rs26, [%r979+256]; 2026-02-21T09:22:02.8463964Z ld.shared.b16 %rs27, [%r979+16]; 2026-02-21T09:22:02.8464154Z ld.shared.b16 %rs28, [%r979+272]; 2026-02-21T09:22:02.8464351Z cvt.f32.bf16 %r822, %rs21; 2026-02-21T09:22:02.8464527Z cvt.f32.bf16 %r823, %rs22; 2026-02-21T09:22:02.8464704Z 
cvt.f32.bf16 %r824, %rs25; 2026-02-21T09:22:02.8464874Z cvt.f32.bf16 %r825, %rs26; 2026-02-21T09:22:02.8465048Z cvt.f32.bf16 %r890, %rs23; 2026-02-21T09:22:02.8465223Z cvt.f32.bf16 %r891, %rs24; 2026-02-21T09:22:02.8465400Z cvt.f32.bf16 %r892, %rs27; 2026-02-21T09:22:02.8465579Z cvt.f32.bf16 %r893, %rs28; 2026-02-21T09:22:02.8466055Z .loc 1 48 30 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:48:30 2026-02-21T09:22:02.8466423Z add.s32 %r980, %r1076, 65536; 2026-02-21T09:22:02.8466721Z cvt.s64.s32 %rd40, %r980; 2026-02-21T09:22:02.8466930Z add.s64 %rd33, %rd8, %rd40; 2026-02-21T09:22:02.8467265Z .loc 1 48 83 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:48:83 2026-02-21T09:22:02.8467644Z // begin inline asm 2026-02-21T09:22:02.8467815Z mov.u64 %rd32, 0x0; 2026-02-21T09:22:02.8468044Z createpolicy.fractional.L2::evict_first.b64 %rd32, 1.0; 2026-02-21T09:22:02.8468386Z // end inline asm 2026-02-21T09:22:02.8468548Z // begin inline asm 2026-02-21T09:22:02.8468711Z mov.u16 %rs2, 0x0; 2026-02-21T09:22:02.8469044Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs2 }, [ %rd33 + 0 ], %rd32; 2026-02-21T09:22:02.8469354Z // end inline asm 2026-02-21T09:22:02.8469659Z .loc 1 56 24 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:56:24 2026-02-21T09:22:02.8470013Z bar.sync 0; 2026-02-21T09:22:02.8470175Z st.shared.b8 [%r14], %rs2; 2026-02-21T09:22:02.8470356Z bar.sync 0; 2026-02-21T09:22:02.8470522Z ld.shared.v2.b8 {%rs29, %rs30}, [%r15]; 2026-02-21T09:22:02.8470952Z .loc 1 51 24 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:51:24 2026-02-21T09:22:02.8471340Z shl.b16 %rs31, %rs29, 4; 2026-02-21T09:22:02.8471517Z shl.b16 %rs32, %rs30, 4; 2026-02-21T09:22:02.8471847Z .loc 1 66 54 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:66:54 2026-02-21T09:22:02.8472217Z selp.b16 %rs33, %rs31, %rs29, %p8; 2026-02-21T09:22:02.8472417Z cvt.s16.s8 %rs34, %rs33; 2026-02-21T09:22:02.8472596Z shr.s16 %rs35, %rs34, 4; 
2026-02-21T09:22:02.8472772Z selp.b16 %rs36, %rs32, %rs30, %p8; 2026-02-21T09:22:02.8472973Z cvt.s16.s8 %rs37, %rs36; 2026-02-21T09:22:02.8473141Z shr.s16 %rs38, %rs37, 4; 2026-02-21T09:22:02.8473459Z .loc 1 71 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:71:28 2026-02-21T09:22:02.8473830Z cvt.rn.f32.s16 %r981, %rs35; 2026-02-21T09:22:02.8474030Z cvt.rn.f32.s16 %r982, %rs38; 2026-02-21T09:22:02.8474205Z bar.sync 0; 2026-02-21T09:22:02.8474357Z st.shared.b32 [%r16], %r981; 2026-02-21T09:22:02.8474544Z st.shared.b32 [%r17], %r982; 2026-02-21T09:22:02.8474714Z $L__tmp3: 2026-02-21T09:22:02.8475079Z .loc 2 291 36 // standard.py:291:36 @[ cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:78:36 ] 2026-02-21T09:22:02.8475587Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r356, %r492}; 2026-02-21T09:22:02.8475878Z bar.sync 0; 2026-02-21T09:22:02.8476023Z // begin inline asm 2026-02-21T09:22:02.8476259Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r826}, [%r695]; 2026-02-21T09:22:02.8476650Z // end inline asm 2026-02-21T09:22:02.8476798Z bar.sync 0; 2026-02-21T09:22:02.8477034Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r358, %r494}; 2026-02-21T09:22:02.8477316Z bar.sync 0; 2026-02-21T09:22:02.8477467Z // begin inline asm 2026-02-21T09:22:02.8477689Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r828}, [%r695]; 2026-02-21T09:22:02.8477959Z // end inline asm 2026-02-21T09:22:02.8478105Z bar.sync 0; 2026-02-21T09:22:02.8478359Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r357, %r493}; 2026-02-21T09:22:02.8478654Z bar.sync 0; 2026-02-21T09:22:02.8478795Z // begin inline asm 2026-02-21T09:22:02.8479026Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r827}, [%r695]; 2026-02-21T09:22:02.8479294Z // end inline asm 2026-02-21T09:22:02.8479447Z bar.sync 0; 2026-02-21T09:22:02.8479673Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r359, %r495}; 2026-02-21T09:22:02.8479960Z bar.sync 0; 2026-02-21T09:22:02.8480106Z // begin inline asm 
2026-02-21T09:22:02.8480332Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r829}, [%r695]; 2026-02-21T09:22:02.8480595Z // end inline asm 2026-02-21T09:22:02.8480866Z bar.sync 0; 2026-02-21T09:22:02.8481159Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r360, %r496}; 2026-02-21T09:22:02.8481435Z bar.sync 0; 2026-02-21T09:22:02.8481587Z // begin inline asm 2026-02-21T09:22:02.8481804Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r830}, [%r695]; 2026-02-21T09:22:02.8482071Z // end inline asm 2026-02-21T09:22:02.8482213Z bar.sync 0; 2026-02-21T09:22:02.8482440Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r362, %r498}; 2026-02-21T09:22:02.8482733Z bar.sync 0; 2026-02-21T09:22:02.8482878Z // begin inline asm 2026-02-21T09:22:02.8483101Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r832}, [%r695]; 2026-02-21T09:22:02.8483361Z // end inline asm 2026-02-21T09:22:02.8483508Z bar.sync 0; 2026-02-21T09:22:02.8483800Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r361, %r497}; 2026-02-21T09:22:02.8484084Z bar.sync 0; 2026-02-21T09:22:02.8484223Z // begin inline asm 2026-02-21T09:22:02.8484445Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r831}, [%r695]; 2026-02-21T09:22:02.8484707Z // end inline asm 2026-02-21T09:22:02.8484856Z bar.sync 0; 2026-02-21T09:22:02.8485076Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r363, %r499}; 2026-02-21T09:22:02.8485356Z bar.sync 0; 2026-02-21T09:22:02.8485501Z // begin inline asm 2026-02-21T09:22:02.8485783Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r833}, [%r695]; 2026-02-21T09:22:02.8486053Z // end inline asm 2026-02-21T09:22:02.8486195Z bar.sync 0; 2026-02-21T09:22:02.8486423Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r364, %r500}; 2026-02-21T09:22:02.8486816Z bar.sync 0; 2026-02-21T09:22:02.8486961Z // begin inline asm 2026-02-21T09:22:02.8487197Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r834}, [%r695]; 2026-02-21T09:22:02.8487466Z // end inline asm 2026-02-21T09:22:02.8487613Z bar.sync 0; 
2026-02-21T09:22:02.8487838Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r366, %r502}; 2026-02-21T09:22:02.8488119Z bar.sync 0; 2026-02-21T09:22:02.8488259Z // begin inline asm 2026-02-21T09:22:02.8488488Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r836}, [%r695]; 2026-02-21T09:22:02.8488751Z // end inline asm 2026-02-21T09:22:02.8488899Z bar.sync 0; 2026-02-21T09:22:02.8489121Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r365, %r501}; 2026-02-21T09:22:02.8489400Z bar.sync 0; 2026-02-21T09:22:02.8489544Z // begin inline asm 2026-02-21T09:22:02.8489770Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r835}, [%r695]; 2026-02-21T09:22:02.8490035Z // end inline asm 2026-02-21T09:22:02.8490180Z bar.sync 0; 2026-02-21T09:22:02.8490409Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r367, %r503}; 2026-02-21T09:22:02.8490687Z bar.sync 0; 2026-02-21T09:22:02.8490839Z // begin inline asm 2026-02-21T09:22:02.8491061Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r837}, [%r695]; 2026-02-21T09:22:02.8491331Z // end inline asm 2026-02-21T09:22:02.8491474Z bar.sync 0; 2026-02-21T09:22:02.8491711Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r368, %r504}; 2026-02-21T09:22:02.8491998Z bar.sync 0; 2026-02-21T09:22:02.8492142Z // begin inline asm 2026-02-21T09:22:02.8492366Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r838}, [%r695]; 2026-02-21T09:22:02.8492623Z // end inline asm 2026-02-21T09:22:02.8492771Z bar.sync 0; 2026-02-21T09:22:02.8492991Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r370, %r506}; 2026-02-21T09:22:02.8493284Z bar.sync 0; 2026-02-21T09:22:02.8493424Z // begin inline asm 2026-02-21T09:22:02.8493642Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r840}, [%r695]; 2026-02-21T09:22:02.8493899Z // end inline asm 2026-02-21T09:22:02.8494047Z bar.sync 0; 2026-02-21T09:22:02.8494272Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r369, %r505}; 2026-02-21T09:22:02.8494546Z bar.sync 0; 2026-02-21T09:22:02.8494689Z // begin inline asm 
2026-02-21T09:22:02.8494908Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r839}, [%r695]; 2026-02-21T09:22:02.8495171Z // end inline asm 2026-02-21T09:22:02.8495314Z bar.sync 0; 2026-02-21T09:22:02.8495629Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r371, %r507}; 2026-02-21T09:22:02.8495968Z bar.sync 0; 2026-02-21T09:22:02.8496117Z // begin inline asm 2026-02-21T09:22:02.8496341Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r841}, [%r695]; 2026-02-21T09:22:02.8496731Z // end inline asm 2026-02-21T09:22:02.8496884Z bar.sync 0; 2026-02-21T09:22:02.8497121Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r372, %r508}; 2026-02-21T09:22:02.8497401Z bar.sync 0; 2026-02-21T09:22:02.8497542Z // begin inline asm 2026-02-21T09:22:02.8497770Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r842}, [%r695]; 2026-02-21T09:22:02.8498028Z // end inline asm 2026-02-21T09:22:02.8498183Z bar.sync 0; 2026-02-21T09:22:02.8498406Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r374, %r510}; 2026-02-21T09:22:02.8498767Z bar.sync 0; 2026-02-21T09:22:02.8498916Z // begin inline asm 2026-02-21T09:22:02.8499134Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r844}, [%r695]; 2026-02-21T09:22:02.8499397Z // end inline asm 2026-02-21T09:22:02.8499547Z bar.sync 0; 2026-02-21T09:22:02.8499773Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r373, %r509}; 2026-02-21T09:22:02.8500047Z bar.sync 0; 2026-02-21T09:22:02.8500193Z // begin inline asm 2026-02-21T09:22:02.8500473Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r843}, [%r695]; 2026-02-21T09:22:02.8500746Z // end inline asm 2026-02-21T09:22:02.8500896Z bar.sync 0; 2026-02-21T09:22:02.8501118Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r375, %r511}; 2026-02-21T09:22:02.8501403Z bar.sync 0; 2026-02-21T09:22:02.8501545Z // begin inline asm 2026-02-21T09:22:02.8501769Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r845}, [%r695]; 2026-02-21T09:22:02.8502031Z // end inline asm 2026-02-21T09:22:02.8502196Z bar.sync 0; 
2026-02-21T09:22:02.8502425Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r376, %r512}; 2026-02-21T09:22:02.8502712Z bar.sync 0; 2026-02-21T09:22:02.8502855Z // begin inline asm 2026-02-21T09:22:02.8503088Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r846}, [%r695]; 2026-02-21T09:22:02.8503363Z // end inline asm 2026-02-21T09:22:02.8503506Z bar.sync 0; 2026-02-21T09:22:02.8503732Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r378, %r514}; 2026-02-21T09:22:02.8504005Z bar.sync 0; 2026-02-21T09:22:02.8504152Z // begin inline asm 2026-02-21T09:22:02.8504368Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r848}, [%r695]; 2026-02-21T09:22:02.8504634Z // end inline asm 2026-02-21T09:22:02.8504774Z bar.sync 0; 2026-02-21T09:22:02.8504999Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r377, %r513}; 2026-02-21T09:22:02.8505276Z bar.sync 0; 2026-02-21T09:22:02.8505418Z // begin inline asm 2026-02-21T09:22:02.8505644Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r847}, [%r695]; 2026-02-21T09:22:02.8505903Z // end inline asm 2026-02-21T09:22:02.8506052Z bar.sync 0; 2026-02-21T09:22:02.8506272Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r379, %r515}; 2026-02-21T09:22:02.8506680Z bar.sync 0; 2026-02-21T09:22:02.8506836Z // begin inline asm 2026-02-21T09:22:02.8507061Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r849}, [%r695]; 2026-02-21T09:22:02.8507321Z // end inline asm 2026-02-21T09:22:02.8507469Z bar.sync 0; 2026-02-21T09:22:02.8507698Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r380, %r516}; 2026-02-21T09:22:02.8507974Z bar.sync 0; 2026-02-21T09:22:02.8508120Z // begin inline asm 2026-02-21T09:22:02.8508417Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r850}, [%r695]; 2026-02-21T09:22:02.8508681Z // end inline asm 2026-02-21T09:22:02.8508824Z bar.sync 0; 2026-02-21T09:22:02.8509050Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r382, %r518}; 2026-02-21T09:22:02.8509324Z bar.sync 0; 2026-02-21T09:22:02.8509469Z // begin inline asm 
2026-02-21T09:22:02.8509694Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r852}, [%r695]; 2026-02-21T09:22:02.8509955Z // end inline asm 2026-02-21T09:22:02.8510102Z bar.sync 0; 2026-02-21T09:22:02.8510320Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r381, %r517}; 2026-02-21T09:22:02.8510766Z bar.sync 0; 2026-02-21T09:22:02.8510911Z // begin inline asm 2026-02-21T09:22:02.8511140Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r851}, [%r695]; 2026-02-21T09:22:02.8511399Z // end inline asm 2026-02-21T09:22:02.8511552Z bar.sync 0; 2026-02-21T09:22:02.8511813Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r383, %r519}; 2026-02-21T09:22:02.8512213Z bar.sync 0; 2026-02-21T09:22:02.8512364Z // begin inline asm 2026-02-21T09:22:02.8512592Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r853}, [%r695]; 2026-02-21T09:22:02.8512867Z // end inline asm 2026-02-21T09:22:02.8513013Z bar.sync 0; 2026-02-21T09:22:02.8513266Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r384, %r520}; 2026-02-21T09:22:02.8513547Z bar.sync 0; 2026-02-21T09:22:02.8513786Z // begin inline asm 2026-02-21T09:22:02.8514015Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r854}, [%r695]; 2026-02-21T09:22:02.8514289Z // end inline asm 2026-02-21T09:22:02.8514443Z bar.sync 0; 2026-02-21T09:22:02.8514668Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r386, %r522}; 2026-02-21T09:22:02.8514950Z bar.sync 0; 2026-02-21T09:22:02.8515088Z // begin inline asm 2026-02-21T09:22:02.8515313Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r856}, [%r695]; 2026-02-21T09:22:02.8515636Z // end inline asm 2026-02-21T09:22:02.8515792Z bar.sync 0; 2026-02-21T09:22:02.8516010Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r385, %r521}; 2026-02-21T09:22:02.8516298Z bar.sync 0; 2026-02-21T09:22:02.8516440Z // begin inline asm 2026-02-21T09:22:02.8516787Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r855}, [%r695]; 2026-02-21T09:22:02.8517049Z // end inline asm 2026-02-21T09:22:02.8517190Z bar.sync 0; 
2026-02-21T09:22:02.8517419Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r387, %r523}; 2026-02-21T09:22:02.8517693Z bar.sync 0; 2026-02-21T09:22:02.8517851Z // begin inline asm 2026-02-21T09:22:02.8518070Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r857}, [%r695]; 2026-02-21T09:22:02.8518340Z // end inline asm 2026-02-21T09:22:02.8518485Z // begin inline asm 2026-02-21T09:22:02.8518668Z fence.proxy.async.shared::cta; 2026-02-21T09:22:02.8518864Z // end inline asm 2026-02-21T09:22:02.8519024Z wgmma.fence.sync.aligned; 2026-02-21T09:22:02.8519212Z shl.b32 %r983, %r976, 8; 2026-02-21T09:22:02.8519384Z and.b32 %r984, %r983, 4096; 2026-02-21T09:22:02.8519575Z add.s32 %r985, %r984, %r691; 2026-02-21T09:22:02.8519759Z bfe.u32 %r986, %r985, 4, 14; 2026-02-21T09:22:02.8519945Z cvt.u64.u32 %rd41, %r986; 2026-02-21T09:22:02.8520139Z or.b64 %rd35, %rd41, -9223371899382267904; 2026-02-21T09:22:02.8520355Z // begin inline asm 2026-02-21T09:22:02.8521108Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r826,%r827,%r828,%r829,%r830,%r831,%r832,%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848,%r849,%r850,%r851,%r852,%r853,%r854,%r855,%r856,%r857}, {%r822,%r823,%r824,%r825}, %rd35, %p2, 1, 1; 2026-02-21T09:22:02.8521916Z // end inline asm 2026-02-21T09:22:02.8522073Z add.s32 %r987, %r985, 32; 2026-02-21T09:22:02.8522245Z bfe.u32 %r988, %r987, 4, 14; 2026-02-21T09:22:02.8522440Z cvt.u64.u32 %rd42, %r988; 2026-02-21T09:22:02.8522630Z or.b64 %rd36, %rd42, -9223371899382267904; 2026-02-21T09:22:02.8522848Z // begin inline asm 2026-02-21T09:22:02.8523593Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r826,%r827,%r828,%r829,%r830,%r831,%r832,%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848,%r849,%r850,%r851,%r852,%r853,%r854,%r855,%r856,%r857}, {%r890,%r891,%r892,%r893}, %rd36, %p2, 1, 1; 2026-02-21T09:22:02.8524379Z // end inline asm 2026-02-21T09:22:02.8524554Z 
wgmma.commit_group.sync.aligned; 2026-02-21T09:22:02.8524752Z mov.b32 %r926, %r691; 2026-02-21T09:22:02.8524924Z mov.b32 %r927, %r928; 2026-02-21T09:22:02.8525084Z // begin inline asm 2026-02-21T09:22:02.8525650Z // wait for regs: %r826,%r827,%r828,%r829,%r830,%r831,%r832,%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848,%r849,%r850,%r851,%r852,%r853,%r854,%r855,%r856,%r857,%r926,%r927,%r928 2026-02-21T09:22:02.8526437Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:22:02.8526751Z // end inline asm 2026-02-21T09:22:02.8526910Z $L__tmp4: 2026-02-21T09:22:02.8527205Z .loc 1 34 74 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:34:74 2026-02-21T09:22:02.8527574Z add.s32 %r989, %r1079, 1; 2026-02-21T09:22:02.8527756Z setp.gt.s32 %p11, %r989, 4; 2026-02-21T09:22:02.8527957Z selp.b32 %r1079, 0, %r989, %p11; 2026-02-21T09:22:02.8528296Z .loc 1 42 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:28 2026-02-21T09:22:02.8528659Z add.s32 %r990, %r1077, -16; 2026-02-21T09:22:02.8528928Z mad.wide.s32 %rd37, %r990, 2, %rd7; 2026-02-21T09:22:02.8529276Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:02.8529637Z shl.b32 %r991, %r1079, 13; 2026-02-21T09:22:02.8529815Z add.s32 %r964, %r94, %r991; 2026-02-21T09:22:02.8530002Z selp.b32 %r965, 8, 0, %p9; 2026-02-21T09:22:02.8530176Z // begin inline asm 2026-02-21T09:22:02.8530514Z cp.async.ca.shared.global [ %r964 + 0 ], [ %rd37 + 0 ], 0x8, %r965; 2026-02-21T09:22:02.8530806Z // end inline asm 2026-02-21T09:22:02.8530963Z cp.async.commit_group; 2026-02-21T09:22:02.8531278Z .loc 1 42 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:28 2026-02-21T09:22:02.8531644Z mad.wide.s32 %rd38, %r1077, 2, %rd7; 2026-02-21T09:22:02.8531984Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:02.8532332Z add.s32 %r966, %r96, %r991; 2026-02-21T09:22:02.8532512Z // begin 
inline asm 2026-02-21T09:22:02.8532742Z cp.async.ca.shared.global [ %r966 + 0 ], [ %rd38 + 0 ], 0x8, %r965; 2026-02-21T09:22:02.8533018Z // end inline asm 2026-02-21T09:22:02.8533179Z cp.async.commit_group; 2026-02-21T09:22:02.8533485Z .loc 1 34 74 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:34:74 2026-02-21T09:22:02.8533842Z add.s32 %r1077, %r1077, 32; 2026-02-21T09:22:02.8534021Z add.s32 %r1076, %r1076, 131072; 2026-02-21T09:22:02.8534229Z setp.lt.u64 %p12, %rd47, 496; 2026-02-21T09:22:02.8534420Z @%p12 bra $L__BB0_1; 2026-02-21T09:22:02.8534583Z // %bb.2: 2026-02-21T09:22:02.8534867Z .loc 1 27 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:27:28 2026-02-21T09:22:02.8535231Z or.b32 %r1028, %r4, %r13; 2026-02-21T09:22:02.8535549Z .loc 1 25 41 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:25:41 2026-02-21T09:22:02.8535899Z and.b32 %r1029, %r7, 120; 2026-02-21T09:22:02.8536212Z .loc 1 25 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:25:28 2026-02-21T09:22:02.8536988Z or.b32 %r1030, %r1, %r1029; 2026-02-21T09:22:02.8537340Z .loc 1 34 74 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:34:74 2026-02-21T09:22:02.8537706Z cp.async.wait_group 0; 2026-02-21T09:22:02.8537880Z bar.sync 0; 2026-02-21T09:22:02.8538176Z .loc 1 81 24 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:81:24 2026-02-21T09:22:02.8538537Z cvt.rn.bf16x2.f32 %r1031, %r827, %r826; 2026-02-21T09:22:02.8538762Z cvt.rn.bf16x2.f32 %r1032, %r829, %r828; 2026-02-21T09:22:02.8538975Z cvt.rn.bf16x2.f32 %r1033, %r831, %r830; 2026-02-21T09:22:02.8539209Z cvt.rn.bf16x2.f32 %r1034, %r833, %r832; 2026-02-21T09:22:02.8539418Z cvt.rn.bf16x2.f32 %r1035, %r835, %r834; 2026-02-21T09:22:02.8539634Z cvt.rn.bf16x2.f32 %r1036, %r837, %r836; 2026-02-21T09:22:02.8539848Z cvt.rn.bf16x2.f32 %r1037, %r839, %r838; 2026-02-21T09:22:02.8540055Z cvt.rn.bf16x2.f32 %r1038, %r841, %r840; 2026-02-21T09:22:02.8540271Z cvt.rn.bf16x2.f32 %r1039, %r843, %r842; 
2026-02-21T09:22:02.8540480Z cvt.rn.bf16x2.f32 %r1040, %r845, %r844; 2026-02-21T09:22:02.8540856Z cvt.rn.bf16x2.f32 %r1041, %r847, %r846; 2026-02-21T09:22:02.8541068Z cvt.rn.bf16x2.f32 %r1042, %r849, %r848; 2026-02-21T09:22:02.8541283Z cvt.rn.bf16x2.f32 %r1043, %r851, %r850; 2026-02-21T09:22:02.8541494Z cvt.rn.bf16x2.f32 %r1044, %r853, %r852; 2026-02-21T09:22:02.8541714Z cvt.rn.bf16x2.f32 %r1045, %r855, %r854; 2026-02-21T09:22:02.8541924Z cvt.rn.bf16x2.f32 %r1046, %r857, %r856; 2026-02-21T09:22:02.8542273Z .loc 1 27 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:27:28 2026-02-21T09:22:02.8542627Z shl.b32 %r1047, %r1028, 13; 2026-02-21T09:22:02.8542943Z .loc 1 82 39 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:82:39 2026-02-21T09:22:02.8543300Z or.b32 %r1048, %r1047, 524288; 2026-02-21T09:22:02.8543557Z or.b32 %r1049, %r1047, 1048576; 2026-02-21T09:22:02.8543749Z or.b32 %r1050, %r1047, 1572864; 2026-02-21T09:22:02.8544073Z .loc 1 82 46 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:82:46 2026-02-21T09:22:02.8544424Z add.s32 %r1051, %r1047, %r1030; 2026-02-21T09:22:02.8544611Z add.s32 %r1052, %r1048, %r1030; 2026-02-21T09:22:02.8544810Z add.s32 %r1053, %r1049, %r1030; 2026-02-21T09:22:02.8545068Z add.s32 %r1054, %r1050, %r1030; 2026-02-21T09:22:02.8545396Z .loc 1 82 18 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:82:18 2026-02-21T09:22:02.8545762Z mad.wide.s32 %rd43, %r1051, 2, %rd9; 2026-02-21T09:22:02.8545970Z mad.wide.s32 %rd44, %r1052, 2, %rd9; 2026-02-21T09:22:02.8546177Z mad.wide.s32 %rd45, %r1053, 2, %rd9; 2026-02-21T09:22:02.8546380Z mad.wide.s32 %rd46, %r1054, 2, %rd9; 2026-02-21T09:22:02.8546878Z .loc 1 82 77 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:82:77 2026-02-21T09:22:02.8547242Z shl.b32 %r1055, %r2, 5; 2026-02-21T09:22:02.8547415Z and.b32 %r1056, %r2, 24; 2026-02-21T09:22:02.8547593Z shr.u32 %r1057, %r2, 3; 2026-02-21T09:22:02.8547765Z and.b32 %r1058, %r1057, 64; 
2026-02-21T09:22:02.8547950Z shl.b32 %r1059, %r1056, 4; 2026-02-21T09:22:02.8548125Z shl.b32 %r1060, %r5, 14; 2026-02-21T09:22:02.8548380Z and.b32 %r1061, %r1055, 15456; 2026-02-21T09:22:02.8548571Z and.b32 %r1062, %r12, 16; 2026-02-21T09:22:02.8548748Z or.b32 %r1063, %r1062, %r1060; 2026-02-21T09:22:02.8548938Z or.b32 %r1064, %r1061, %r1059; 2026-02-21T09:22:02.8549119Z xor.b32 %r1065, %r1064, %r1058; 2026-02-21T09:22:02.8549308Z or.b32 %r1066, %r1065, %r1063; 2026-02-21T09:22:02.8549486Z add.s32 %r1068, %r138, %r1066; 2026-02-21T09:22:02.8549730Z st.shared.v4.b32 [%r1068], {%r1031, %r1033, %r1035, %r1037}; 2026-02-21T09:22:02.8550049Z st.shared.v4.b32 [%r1068+512], {%r1032, %r1034, %r1036, %r1038}; 2026-02-21T09:22:02.8550323Z xor.b32 %r1069, %r1066, 32; 2026-02-21T09:22:02.8550510Z add.s32 %r1070, %r138, %r1069; 2026-02-21T09:22:02.8550740Z st.shared.v4.b32 [%r1070], {%r1039, %r1041, %r1043, %r1045}; 2026-02-21T09:22:02.8551054Z st.shared.v4.b32 [%r1070+512], {%r1040, %r1042, %r1044, %r1046}; 2026-02-21T09:22:02.8551310Z bar.sync 0; 2026-02-21T09:22:02.8551464Z shl.b32 %r1071, %r1056, 11; 2026-02-21T09:22:02.8551642Z shl.b32 %r1072, %r5, 5; 2026-02-21T09:22:02.8551823Z and.b32 %r1073, %r12, 4080; 2026-02-21T09:22:02.8552001Z or.b32 %r1074, %r1071, %r1072; 2026-02-21T09:22:02.8552204Z xor.b32 %r1075, %r1074, %r1073; 2026-02-21T09:22:02.8552399Z add.s32 %r996, %r138, %r1075; 2026-02-21T09:22:02.8552579Z // begin inline asm 2026-02-21T09:22:02.8552875Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1012, %r1013, %r1014, %r1015}, [%r996]; 2026-02-21T09:22:02.8553212Z // end inline asm 2026-02-21T09:22:02.8553377Z add.s32 %r1001, %r996, 4096; 2026-02-21T09:22:02.8553554Z // begin inline asm 2026-02-21T09:22:02.8553843Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1016, %r1017, %r1018, %r1019}, [%r1001]; 2026-02-21T09:22:02.8554173Z // end inline asm 2026-02-21T09:22:02.8554333Z add.s32 %r1006, %r996, 8192; 2026-02-21T09:22:02.8554656Z // begin inline asm 
2026-02-21T09:22:02.8554933Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1020, %r1021, %r1022, %r1023}, [%r1006]; 2026-02-21T09:22:02.8555266Z // end inline asm 2026-02-21T09:22:02.8555417Z add.s32 %r1011, %r996, 12288; 2026-02-21T09:22:02.8555604Z // begin inline asm 2026-02-21T09:22:02.8555880Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1024, %r1025, %r1026, %r1027}, [%r1011]; 2026-02-21T09:22:02.8556207Z // end inline asm 2026-02-21T09:22:02.8556358Z // begin inline asm 2026-02-21T09:22:02.8556731Z st.global.v4.b32 [ %rd43 + 0 ], { %r1012, %r1013, %r1014, %r1015 }; 2026-02-21T09:22:02.8557019Z // end inline asm 2026-02-21T09:22:02.8557166Z // begin inline asm 2026-02-21T09:22:02.8557383Z st.global.v4.b32 [ %rd44 + 0 ], { %r1016, %r1017, %r1018, %r1019 }; 2026-02-21T09:22:02.8557719Z // end inline asm 2026-02-21T09:22:02.8557877Z // begin inline asm 2026-02-21T09:22:02.8558084Z st.global.v4.b32 [ %rd45 + 0 ], { %r1020, %r1021, %r1022, %r1023 }; 2026-02-21T09:22:02.8558348Z // end inline asm 2026-02-21T09:22:02.8558494Z // begin inline asm 2026-02-21T09:22:02.8558707Z st.global.v4.b32 [ %rd46 + 0 ], { %r1024, %r1025, %r1026, %r1027 }; 2026-02-21T09:22:02.8558961Z // end inline asm 2026-02-21T09:22:02.8559325Z .loc 1 82 4 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:82:4 2026-02-21T09:22:02.8559684Z ret; 2026-02-21T09:22:02.8559817Z $L__tmp5: 2026-02-21T09:22:02.8559973Z $L__func_end0: 2026-02-21T09:22:02.8560153Z // -- End function 2026-02-21T09:22:02.8560374Z } 2026-02-21T09:22:02.8560694Z .file 1 "/tmp/torchinductor_root/fi/cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py" 2026-02-21T09:22:02.8561248Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T09:22:02.8561617Z .section .debug_abbrev 2026-02-21T09:22:02.8561777Z { 2026-02-21T09:22:02.8561949Z .b8 1 // Abbreviation Code 2026-02-21T09:22:02.8562216Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:22:02.8562481Z .b8 1 // DW_CHILDREN_yes 
2026-02-21T09:22:02.8562729Z .b8 37 // DW_AT_producer 2026-02-21T09:22:02.8562982Z .b8 8 // DW_FORM_string 2026-02-21T09:22:02.8563221Z .b8 19 // DW_AT_language 2026-02-21T09:22:02.8563470Z .b8 5 // DW_FORM_data2 2026-02-21T09:22:02.8568627Z .b8 3 // DW_AT_name 2026-02-21T09:22:02.8568989Z .b8 8 // DW_FORM_string 2026-02-21T09:22:02.8569268Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:22:02.8569552Z .b8 6 // DW_FORM_data4 2026-02-21T09:22:02.8569818Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:22:02.8570072Z .b8 8 // DW_FORM_string 2026-02-21T09:22:02.8570313Z .b8 0 // EOM(1) 2026-02-21T09:22:02.8570544Z .b8 0 // EOM(2) 2026-02-21T09:22:02.8570794Z .b8 2 // Abbreviation Code 2026-02-21T09:22:02.8571068Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:22:02.8571331Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:22:02.8571570Z .b8 3 // DW_AT_name 2026-02-21T09:22:02.8571810Z .b8 8 // DW_FORM_string 2026-02-21T09:22:02.8572049Z .b8 32 // DW_AT_inline 2026-02-21T09:22:02.8572298Z .b8 11 // DW_FORM_data1 2026-02-21T09:22:02.8572541Z .b8 0 // EOM(1) 2026-02-21T09:22:02.8572773Z .b8 0 // EOM(2) 2026-02-21T09:22:02.8573020Z .b8 3 // Abbreviation Code 2026-02-21T09:22:02.8573524Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:22:02.8573804Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:22:02.8574055Z .b8 17 // DW_AT_low_pc 2026-02-21T09:22:02.8574302Z .b8 1 // DW_FORM_addr 2026-02-21T09:22:02.8574553Z .b8 18 // DW_AT_high_pc 2026-02-21T09:22:02.8574794Z .b8 1 // DW_FORM_addr 2026-02-21T09:22:02.8575052Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:22:02.8575315Z .b8 19 // DW_FORM_ref4 2026-02-21T09:22:02.8575557Z .b8 0 // EOM(1) 2026-02-21T09:22:02.8575855Z .b8 0 // EOM(2) 2026-02-21T09:22:02.8576102Z .b8 4 // Abbreviation Code 2026-02-21T09:22:02.8576389Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T09:22:02.8576835Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:22:02.8577098Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:22:02.8577441Z .b8 19 // DW_FORM_ref4 2026-02-21T09:22:02.8577692Z .b8 17 // DW_AT_low_pc 
2026-02-21T09:22:02.8577928Z .b8 1 // DW_FORM_addr 2026-02-21T09:22:02.8578174Z .b8 18 // DW_AT_high_pc 2026-02-21T09:22:02.8578252Z .b8 1 // DW_FORM_addr 2026-02-21T09:22:02.8578355Z .b8 88 // DW_AT_call_file 2026-02-21T09:22:02.8578440Z .b8 11 // DW_FORM_data1 2026-02-21T09:22:02.8578526Z .b8 89 // DW_AT_call_line 2026-02-21T09:22:02.8578608Z .b8 11 // DW_FORM_data1 2026-02-21T09:22:02.8578704Z .b8 87 // DW_AT_call_column 2026-02-21T09:22:02.8578790Z .b8 11 // DW_FORM_data1 2026-02-21T09:22:02.8578864Z .b8 0 // EOM(1) 2026-02-21T09:22:02.8578944Z .b8 0 // EOM(2) 2026-02-21T09:22:02.8579013Z .b8 0 // EOM(3) 2026-02-21T09:22:02.8579070Z } 2026-02-21T09:22:02.8579143Z .section .debug_info 2026-02-21T09:22:02.8579196Z { 2026-02-21T09:22:02.8579292Z .b32 178 // Length of Unit 2026-02-21T09:22:02.8579388Z .b8 2 // DWARF version number 2026-02-21T09:22:02.8579449Z .b8 0 2026-02-21T09:22:02.8579585Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:22:02.8579683Z .b8 8 // Address Size (in bytes) 2026-02-21T09:22:02.8579806Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T09:22:02.8579901Z .b8 116 // DW_AT_producer 2026-02-21T09:22:02.8579958Z .b8 114 2026-02-21T09:22:02.8580012Z .b8 105 2026-02-21T09:22:02.8580083Z .b8 116 2026-02-21T09:22:02.8580139Z .b8 111 2026-02-21T09:22:02.8580192Z .b8 110 2026-02-21T09:22:02.8580253Z .b8 0 2026-02-21T09:22:02.8580336Z .b8 2 // DW_AT_language 2026-02-21T09:22:02.8580391Z .b8 0 2026-02-21T09:22:02.8580473Z .b8 99 // DW_AT_name 2026-02-21T09:22:02.8580539Z .b8 102 2026-02-21T09:22:02.8580597Z .b8 105 2026-02-21T09:22:02.8580650Z .b8 51 2026-02-21T09:22:02.8580712Z .b8 54 2026-02-21T09:22:02.8580769Z .b8 98 2026-02-21T09:22:02.8580823Z .b8 108 2026-02-21T09:22:02.8580877Z .b8 114 2026-02-21T09:22:02.8580938Z .b8 106 2026-02-21T09:22:02.8580993Z .b8 110 2026-02-21T09:22:02.8581047Z .b8 104 2026-02-21T09:22:02.8581108Z .b8 111 2026-02-21T09:22:02.8581162Z .b8 114 2026-02-21T09:22:02.8581216Z .b8 107 
2026-02-21T09:22:02.8581272Z .b8 120 2026-02-21T09:22:02.8581467Z .b8 99 2026-02-21T09:22:02.8581533Z .b8 120 2026-02-21T09:22:02.8581588Z .b8 115 2026-02-21T09:22:02.8581649Z .b8 111 2026-02-21T09:22:02.8581702Z .b8 54 2026-02-21T09:22:02.8581757Z .b8 55 2026-02-21T09:22:02.8581811Z .b8 114 2026-02-21T09:22:02.8581872Z .b8 101 2026-02-21T09:22:02.8581928Z .b8 102 2026-02-21T09:22:02.8581981Z .b8 101 2026-02-21T09:22:02.8582038Z .b8 107 2026-02-21T09:22:02.8582096Z .b8 120 2026-02-21T09:22:02.8582148Z .b8 120 2026-02-21T09:22:02.8582201Z .b8 104 2026-02-21T09:22:02.8582261Z .b8 53 2026-02-21T09:22:02.8582328Z .b8 117 2026-02-21T09:22:02.8582384Z .b8 110 2026-02-21T09:22:02.8582440Z .b8 97 2026-02-21T09:22:02.8582500Z .b8 117 2026-02-21T09:22:02.8582552Z .b8 122 2026-02-21T09:22:02.8582604Z .b8 55 2026-02-21T09:22:02.8582663Z .b8 110 2026-02-21T09:22:02.8582826Z .b8 110 2026-02-21T09:22:02.8582886Z .b8 109 2026-02-21T09:22:02.8582939Z .b8 54 2026-02-21T09:22:02.8583002Z .b8 115 2026-02-21T09:22:02.8583056Z .b8 118 2026-02-21T09:22:02.8583112Z .b8 122 2026-02-21T09:22:02.8583168Z .b8 108 2026-02-21T09:22:02.8583227Z .b8 112 2026-02-21T09:22:02.8583280Z .b8 104 2026-02-21T09:22:02.8583333Z .b8 104 2026-02-21T09:22:02.8583394Z .b8 107 2026-02-21T09:22:02.8583447Z .b8 53 2026-02-21T09:22:02.8583500Z .b8 50 2026-02-21T09:22:02.8583603Z .b8 111 2026-02-21T09:22:02.8583665Z .b8 113 2026-02-21T09:22:02.8583717Z .b8 46 2026-02-21T09:22:02.8583770Z .b8 112 2026-02-21T09:22:02.8583830Z .b8 121 2026-02-21T09:22:02.8583883Z .b8 0 2026-02-21T09:22:02.8583993Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:22:02.8584083Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:22:02.8584143Z .b8 116 2026-02-21T09:22:02.8584198Z .b8 109 2026-02-21T09:22:02.8584252Z .b8 112 2026-02-21T09:22:02.8584314Z .b8 47 2026-02-21T09:22:02.8584378Z .b8 116 2026-02-21T09:22:02.8584433Z .b8 111 2026-02-21T09:22:02.8584491Z .b8 114 2026-02-21T09:22:02.8584549Z .b8 99 2026-02-21T09:22:02.8584603Z .b8 104 
2026-02-21T09:22:02.8584656Z .b8 105 2026-02-21T09:22:02.8584717Z .b8 110 2026-02-21T09:22:02.8584770Z .b8 100 2026-02-21T09:22:02.8584823Z .b8 117 2026-02-21T09:22:02.8584875Z .b8 99 2026-02-21T09:22:02.8584933Z .b8 116 2026-02-21T09:22:02.8584987Z .b8 111 2026-02-21T09:22:02.8585038Z .b8 114 2026-02-21T09:22:02.8585089Z .b8 95 2026-02-21T09:22:02.8585154Z .b8 114 2026-02-21T09:22:02.8585207Z .b8 111 2026-02-21T09:22:02.8585260Z .b8 111 2026-02-21T09:22:02.8585318Z .b8 116 2026-02-21T09:22:02.8585371Z .b8 47 2026-02-21T09:22:02.8585424Z .b8 102 2026-02-21T09:22:02.8585476Z .b8 105 2026-02-21T09:22:02.8585536Z .b8 0 2026-02-21T09:22:02.8585655Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T09:22:02.8585740Z .b8 95 // DW_AT_name 2026-02-21T09:22:02.8585801Z .b8 104 2026-02-21T09:22:02.8585857Z .b8 101 2026-02-21T09:22:02.8585911Z .b8 108 2026-02-21T09:22:02.8585964Z .b8 105 2026-02-21T09:22:02.8586023Z .b8 111 2026-02-21T09:22:02.8586075Z .b8 110 2026-02-21T09:22:02.8586133Z .b8 95 2026-02-21T09:22:02.8586192Z .b8 109 2026-02-21T09:22:02.8586244Z .b8 97 2026-02-21T09:22:02.8586297Z .b8 116 2026-02-21T09:22:02.8586351Z .b8 109 2026-02-21T09:22:02.8586410Z .b8 117 2026-02-21T09:22:02.8586590Z .b8 108 2026-02-21T09:22:02.8586648Z .b8 95 2026-02-21T09:22:02.8586709Z .b8 98 2026-02-21T09:22:02.8586764Z .b8 102 2026-02-21T09:22:02.8586819Z .b8 49 2026-02-21T09:22:02.8586871Z .b8 54 2026-02-21T09:22:02.8586932Z .b8 95 2026-02-21T09:22:02.8586989Z .b8 105 2026-02-21T09:22:02.8587042Z .b8 110 2026-02-21T09:22:02.8587097Z .b8 116 2026-02-21T09:22:02.8587159Z .b8 52 2026-02-21T09:22:02.8587213Z .b8 0 2026-02-21T09:22:02.8587298Z .b8 1 // DW_AT_inline 2026-02-21T09:22:02.8587418Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T09:22:02.8587519Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T09:22:02.8587622Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T09:22:02.8587815Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:22:02.8588010Z .b8 4 // Abbrev [4] 
0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T09:22:02.8588124Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:22:02.8588298Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T09:22:02.8588406Z .b64 $L__tmp4 // DW_AT_high_pc 2026-02-21T09:22:02.8588499Z .b8 1 // DW_AT_call_file 2026-02-21T09:22:02.8588591Z .b8 78 // DW_AT_call_line 2026-02-21T09:22:02.8588696Z .b8 36 // DW_AT_call_column 2026-02-21T09:22:02.8588796Z .b8 0 // End Of Children Mark 2026-02-21T09:22:02.8588952Z .b8 0 // End Of Children Mark 2026-02-21T09:22:02.8589018Z } 2026-02-21T09:22:02.8589094Z .section .debug_macinfo { } 2026-02-21T09:22:02.8589102Z 2026-02-21T09:22:02.8589188Z ================================================================ 2026-02-21T09:22:02.8589317Z please share the reproducer above with Triton project. 2026-02-21T09:22:03.2473304Z [603s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:22:03.2475253Z Config: @helion.kernel(config=helion.Config(block_sizes=[8, 256, 128], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[8], load_eviction_policies=['first', 'first'], loop_orders=[[1, 0]], num_stages=6, num_warps=32, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 2], range_warp_specializes=[]), static_shapes=True) 2026-02-21T09:22:03.2477322Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:22:03.2477692Z `ptxas` stderr: 2026-02-21T09:22:03.2478525Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_helion_matmul_bf16_int4' 2026-02-21T09:22:03.2479717Z ptxas fatal : (C7600) Register allocation failed with register count of '64'. 
Compile the program with a higher register target 2026-02-21T09:22:03.2480365Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:22:03.2480593Z 2026-02-21T09:22:03.2481247Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp1h1rk7i9.ptx -o /tmp/tmp1h1rk7i9.ptx.o 2026-02-21T09:22:03.2482123Z 2026-02-21T09:22:03.2482315Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:22:03.2482633Z 2026-02-21T09:22:03.2482637Z 2026-02-21T09:22:03.2482641Z 2026-02-21T09:22:03.2482749Z ================================================================ 2026-02-21T09:22:03.2483063Z Internal Triton PTX codegen error 2026-02-21T09:22:03.2483304Z `ptxas` stderr: 2026-02-21T09:22:03.2484112Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_helion_matmul_bf16_int4' 2026-02-21T09:22:03.2485300Z ptxas fatal : (C7600) Register allocation failed with register count of '64'. 
Compile the program with a higher register target 2026-02-21T09:22:03.2485939Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:22:03.2486167Z 2026-02-21T09:22:03.2486970Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp1h1rk7i9.ptx -o /tmp/tmp1h1rk7i9.ptx.o 2026-02-21T09:22:03.2487700Z 2026-02-21T09:22:03.2487704Z 2026-02-21T09:22:03.2487783Z // 2026-02-21T09:22:03.2487965Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:22:03.2488207Z // 2026-02-21T09:22:03.2488293Z 2026-02-21T09:22:03.2488362Z .version 8.7 2026-02-21T09:22:03.2488538Z .target sm_90a 2026-02-21T09:22:03.2488873Z .address_size 64 2026-02-21T09:22:03.2489109Z 2026-02-21T09:22:03.2489327Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T09:22:03.2489755Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:22:03.2490078Z // @_helion_matmul_bf16_int4 2026-02-21T09:22:03.2490413Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T09:22:03.2490777Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T09:22:03.2491220Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T09:22:03.2491656Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T09:22:03.2492088Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T09:22:03.2492584Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T09:22:03.2492889Z ) 2026-02-21T09:22:03.2493030Z .reqntid 1024 2026-02-21T09:22:03.2493173Z { 2026-02-21T09:22:03.2493317Z .reg .pred %p<34>; 2026-02-21T09:22:03.2493483Z .reg .b16 %rs<39>; 2026-02-21T09:22:03.2493645Z .reg .b32 %r<1083>; 2026-02-21T09:22:03.2493802Z .reg .b64 %rd<65>; 2026-02-21T09:22:03.2494187Z .loc 1 19 0 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:19:0 2026-02-21T09:22:03.2494548Z $L__func_begin0: 
2026-02-21T09:22:03.2494844Z .loc 1 19 0 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:19:0 2026-02-21T09:22:03.2495132Z 2026-02-21T09:22:03.2495206Z // %bb.0: 2026-02-21T09:22:03.2495405Z ld.param.b64 %rd9, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T09:22:03.2495704Z ld.param.b64 %rd8, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T09:22:03.2495940Z $L__tmp0: 2026-02-21T09:22:03.2496230Z .loc 1 21 68 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:21:68 2026-02-21T09:22:03.2496745Z mov.u32 %r123, %ctaid.x; 2026-02-21T09:22:03.2496981Z ld.param.b64 %rd11, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T09:22:03.2497238Z mov.u32 %r124, %ctaid.y; 2026-02-21T09:22:03.2497450Z ld.param.b64 %rd39, [_helion_matmul_bf16_int4_param_3]; 2026-02-21T09:22:03.2497698Z mov.u32 %r125, %ctaid.z; 2026-02-21T09:22:03.2497869Z mov.u32 %r126, %nctaid.x; 2026-02-21T09:22:03.2498049Z mov.u32 %r127, %nctaid.y; 2026-02-21T09:22:03.2498227Z mad.lo.s32 %r128, %r125, %r127, %r124; 2026-02-21T09:22:03.2498442Z mad.lo.s32 %r129, %r128, %r126, %r123; 2026-02-21T09:22:03.2498638Z shl.b32 %r130, %r129, 7; 2026-02-21T09:22:03.2498813Z cvt.s64.s32 %rd40, %r130; 2026-02-21T09:22:03.2498986Z add.s64 %rd1, %rd39, %rd40; 2026-02-21T09:22:03.2499172Z mov.u32 %r1, %tid.x; 2026-02-21T09:22:03.2499345Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:22:03.2499519Z shl.b32 %r131, %r1, 2; 2026-02-21T09:22:03.2499717Z mov.b32 %r132, global_smem; 2026-02-21T09:22:03.2499900Z add.s32 %r92, %r132, %r131; 2026-02-21T09:22:03.2500070Z mov.b32 %r708, 0; 2026-02-21T09:22:03.2500230Z // begin inline asm 2026-02-21T09:22:03.2500407Z @%p1 st.shared.b32 [ %r92 + 0 ], %r708; 2026-02-21T09:22:03.2500611Z // end inline asm 2026-02-21T09:22:03.2500773Z bar.warp.sync -1; 2026-02-21T09:22:03.2500947Z setp.eq.b32 %p2, %r1, 0; 2026-02-21T09:22:03.2501138Z cvt.u64.u32 %rd10, %r132; 2026-02-21T09:22:03.2501321Z // begin inline asm 2026-02-21T09:22:03.2501646Z @%p2 
tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd10 + 0 ], %rd11; 2026-02-21T09:22:03.2501998Z // end inline asm 2026-02-21T09:22:03.2502153Z // begin inline asm 2026-02-21T09:22:03.2502414Z @%p2 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1; 2026-02-21T09:22:03.2502725Z // end inline asm 2026-02-21T09:22:03.2502871Z mov.b32 %r94, 64; 2026-02-21T09:22:03.2503022Z // begin inline asm 2026-02-21T09:22:03.2503298Z @%p2 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0, %r94; 2026-02-21T09:22:03.2503627Z // end inline asm 2026-02-21T09:22:03.2503776Z mov.b32 %r95, 256; 2026-02-21T09:22:03.2504050Z // begin inline asm 2026-02-21T09:22:03.2504426Z @%p2 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1, %r95; 2026-02-21T09:22:03.2504745Z // end inline asm 2026-02-21T09:22:03.2504899Z mov.b32 %r96, 8192; 2026-02-21T09:22:03.2505059Z // begin inline asm 2026-02-21T09:22:03.2505348Z @%p2 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0, %r96; 2026-02-21T09:22:03.2505679Z // end inline asm 2026-02-21T09:22:03.2505833Z mov.b32 %r97, 16384; 2026-02-21T09:22:03.2505995Z // begin inline asm 2026-02-21T09:22:03.2506276Z @%p2 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1, %r97; 2026-02-21T09:22:03.2506758Z // end inline asm 2026-02-21T09:22:03.2506907Z mov.b64 %rd18, 16384; 2026-02-21T09:22:03.2507074Z // begin inline asm 2026-02-21T09:22:03.2507481Z @%p2 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd10 + 0 ], 0x0, %rd18; 2026-02-21T09:22:03.2507850Z // end inline asm 2026-02-21T09:22:03.2508009Z mov.b32 %r98, 1; 2026-02-21T09:22:03.2508165Z // begin inline asm 2026-02-21T09:22:03.2508597Z @%p2 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0, %r98; 2026-02-21T09:22:03.2508953Z // end inline asm 2026-02-21T09:22:03.2509105Z // begin inline asm 2026-02-21T09:22:03.2509490Z @%p2 
tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x1, %r98; 2026-02-21T09:22:03.2509998Z // end inline asm 2026-02-21T09:22:03.2510152Z // begin inline asm 2026-02-21T09:22:03.2510432Z @%p2 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd10 + 0 ], 0xa; 2026-02-21T09:22:03.2510758Z // end inline asm 2026-02-21T09:22:03.2510905Z // begin inline asm 2026-02-21T09:22:03.2511207Z @%p2 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0; 2026-02-21T09:22:03.2511555Z // end inline asm 2026-02-21T09:22:03.2511720Z // begin inline asm 2026-02-21T09:22:03.2512000Z @%p2 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x3; 2026-02-21T09:22:03.2512338Z // end inline asm 2026-02-21T09:22:03.2512492Z // begin inline asm 2026-02-21T09:22:03.2512762Z @%p2 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd10 + 0 ], 0x0; 2026-02-21T09:22:03.2513083Z // end inline asm 2026-02-21T09:22:03.2513233Z // begin inline asm 2026-02-21T09:22:03.2513670Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd1 + 0 ], [ %rd10 + 0 ], 0x80; 2026-02-21T09:22:03.2514141Z // end inline asm 2026-02-21T09:22:03.2514295Z // begin inline asm 2026-02-21T09:22:03.2514542Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd1 + 0 ], 0x80; 2026-02-21T09:22:03.2514847Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:22:03.2515072Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:22:03.2515278Z // end inline asm 2026-02-21T09:22:03.2515427Z bar.sync 0; 2026-02-21T09:22:03.2515724Z .loc 1 28 29 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:28:29 2026-02-21T09:22:03.2516104Z shr.u32 %r133, %r123, 6; 2026-02-21T09:22:03.2516283Z and.b32 %r134, %r133, 33554424; 2026-02-21T09:22:03.2516762Z .loc 1 29 35 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:29:35 2026-02-21T09:22:03.2517134Z sub.s32 %r135, 64, %r134; 2026-02-21T09:22:03.2517454Z .loc 1 
29 48 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:29:48 2026-02-21T09:22:03.2517809Z min.s32 %r136, %r135, 8; 2026-02-21T09:22:03.2518118Z .loc 1 30 41 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:30:41 2026-02-21T09:22:03.2518473Z and.b32 %r137, %r123, 511; 2026-02-21T09:22:03.2518784Z .loc 1 31 47 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:31:47 2026-02-21T09:22:03.2519143Z div.s32 %r138, %r137, %r136; 2026-02-21T09:22:03.2519463Z .loc 1 30 60 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:30:60 2026-02-21T09:22:03.2519985Z mul.lo.s32 %r139, %r138, %r136; 2026-02-21T09:22:03.2520194Z sub.s32 %r140, %r137, %r139; 2026-02-21T09:22:03.2520509Z .loc 1 30 26 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:30:26 2026-02-21T09:22:03.2520864Z add.s32 %r141, %r140, %r134; 2026-02-21T09:22:03.2521174Z .loc 1 32 23 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:32:23 2026-02-21T09:22:03.2521528Z shl.b32 %r2, %r141, 7; 2026-02-21T09:22:03.2521851Z .loc 1 33 41 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:33:41 2026-02-21T09:22:03.2522201Z and.b32 %r142, %r1, 31; 2026-02-21T09:22:03.2522376Z shr.u32 %r3, %r1, 5; 2026-02-21T09:22:03.2522537Z and.b32 %r143, %r1, 127; 2026-02-21T09:22:03.2522947Z .loc 1 34 23 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:34:23 2026-02-21T09:22:03.2523303Z shl.b32 %r1009, %r138, 8; 2026-02-21T09:22:03.2523618Z .loc 1 35 41 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:35:41 2026-02-21T09:22:03.2523973Z shr.u32 %r144, %r1, 2; 2026-02-21T09:22:03.2524277Z .loc 1 35 28 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:35:28 2026-02-21T09:22:03.2524712Z or.b32 %r145, %r1009, %r144; 2026-02-21T09:22:03.2525031Z .loc 1 49 34 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:49:34 2026-02-21T09:22:03.2525388Z and.b32 %r146, %r1, 3; 2026-02-21T09:22:03.2525550Z shl.b32 %r147, %r146, 2; 2026-02-21T09:22:03.2525860Z .loc 
1 50 49 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:49 2026-02-21T09:22:03.2526213Z shl.b32 %r148, %r145, 10; 2026-02-21T09:22:03.2526644Z .loc 1 67 34 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:67:34 2026-02-21T09:22:03.2527014Z and.b32 %r5, %r1, 128; 2026-02-21T09:22:03.2527317Z .loc 1 50 56 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:56 2026-02-21T09:22:03.2527679Z or.b32 %r149, %r148, %r147; 2026-02-21T09:22:03.2527992Z .loc 1 50 28 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:28 2026-02-21T09:22:03.2528360Z mad.wide.s32 %rd28, %r149, 2, %rd8; 2026-02-21T09:22:03.2528707Z .loc 1 50 76 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:76 2026-02-21T09:22:03.2529049Z bar.sync 0; 2026-02-21T09:22:03.2529202Z shl.b32 %r150, %r1, 3; 2026-02-21T09:22:03.2529367Z and.b32 %r151, %r150, 8056; 2026-02-21T09:22:03.2529551Z and.b32 %r6, %r1, 16; 2026-02-21T09:22:03.2529719Z bfe.s32 %r152, %r1, 4, 1; 2026-02-21T09:22:03.2529897Z and.b32 %r153, %r152, 136; 2026-02-21T09:22:03.2530070Z xor.b32 %r154, %r153, %r151; 2026-02-21T09:22:03.2530254Z add.s32 %r100, %r132, %r154; 2026-02-21T09:22:03.2530430Z mov.b32 %r101, 8; 2026-02-21T09:22:03.2530581Z // begin inline asm 2026-02-21T09:22:03.2530831Z cp.async.ca.shared.global [ %r100 + 0 ], [ %rd28 + 0 ], 0x8, %r101; 2026-02-21T09:22:03.2531114Z // end inline asm 2026-02-21T09:22:03.2531277Z cp.async.commit_group; 2026-02-21T09:22:03.2531587Z .loc 1 50 28 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:28 2026-02-21T09:22:03.2531941Z add.s64 %rd29, %rd28, 32; 2026-02-21T09:22:03.2532249Z .loc 1 50 76 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:76 2026-02-21T09:22:03.2532600Z add.s32 %r102, %r100, 40960; 2026-02-21T09:22:03.2532778Z // begin inline asm 2026-02-21T09:22:03.2533007Z cp.async.ca.shared.global [ %r102 + 0 ], [ %rd29 + 0 ], 0x8, %r101; 2026-02-21T09:22:03.2533278Z // end inline asm 2026-02-21T09:22:03.2533430Z 
cp.async.commit_group; 2026-02-21T09:22:03.2533742Z .loc 1 50 28 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:28 2026-02-21T09:22:03.2534094Z add.s64 %rd30, %rd28, 64; 2026-02-21T09:22:03.2534406Z .loc 1 50 76 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:76 2026-02-21T09:22:03.2534974Z bar.sync 0; 2026-02-21T09:22:03.2535119Z add.s32 %r104, %r100, 8192; 2026-02-21T09:22:03.2535297Z // begin inline asm 2026-02-21T09:22:03.2535523Z cp.async.ca.shared.global [ %r104 + 0 ], [ %rd30 + 0 ], 0x8, %r101; 2026-02-21T09:22:03.2535797Z // end inline asm 2026-02-21T09:22:03.2535953Z cp.async.commit_group; 2026-02-21T09:22:03.2536264Z .loc 1 50 28 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:28 2026-02-21T09:22:03.2536772Z add.s64 %rd31, %rd28, 96; 2026-02-21T09:22:03.2537088Z .loc 1 50 76 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:76 2026-02-21T09:22:03.2537441Z add.s32 %r106, %r100, 49152; 2026-02-21T09:22:03.2537714Z // begin inline asm 2026-02-21T09:22:03.2537958Z cp.async.ca.shared.global [ %r106 + 0 ], [ %rd31 + 0 ], 0x8, %r101; 2026-02-21T09:22:03.2538223Z // end inline asm 2026-02-21T09:22:03.2538390Z cp.async.commit_group; 2026-02-21T09:22:03.2538699Z .loc 1 50 28 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:28 2026-02-21T09:22:03.2539057Z add.s64 %rd32, %rd28, 128; 2026-02-21T09:22:03.2539471Z .loc 1 50 76 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:76 2026-02-21T09:22:03.2539837Z bar.sync 0; 2026-02-21T09:22:03.2539986Z add.s32 %r108, %r100, 16384; 2026-02-21T09:22:03.2540157Z // begin inline asm 2026-02-21T09:22:03.2540388Z cp.async.ca.shared.global [ %r108 + 0 ], [ %rd32 + 0 ], 0x8, %r101; 2026-02-21T09:22:03.2540655Z // end inline asm 2026-02-21T09:22:03.2540815Z cp.async.commit_group; 2026-02-21T09:22:03.2541121Z .loc 1 50 28 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:28 2026-02-21T09:22:03.2541479Z add.s64 %rd33, %rd28, 160; 
2026-02-21T09:22:03.2541797Z .loc 1 50 76 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:76 2026-02-21T09:22:03.2542148Z add.s32 %r110, %r100, 57344; 2026-02-21T09:22:03.2542327Z // begin inline asm 2026-02-21T09:22:03.2542554Z cp.async.ca.shared.global [ %r110 + 0 ], [ %rd33 + 0 ], 0x8, %r101; 2026-02-21T09:22:03.2542826Z // end inline asm 2026-02-21T09:22:03.2542985Z cp.async.commit_group; 2026-02-21T09:22:03.2543296Z .loc 1 50 28 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:28 2026-02-21T09:22:03.2543647Z add.s64 %rd34, %rd28, 192; 2026-02-21T09:22:03.2543968Z .loc 1 50 76 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:76 2026-02-21T09:22:03.2544323Z bar.sync 0; 2026-02-21T09:22:03.2544481Z add.s32 %r112, %r100, 24576; 2026-02-21T09:22:03.2544660Z // begin inline asm 2026-02-21T09:22:03.2544885Z cp.async.ca.shared.global [ %r112 + 0 ], [ %rd34 + 0 ], 0x8, %r101; 2026-02-21T09:22:03.2545158Z // end inline asm 2026-02-21T09:22:03.2545309Z cp.async.commit_group; 2026-02-21T09:22:03.2545626Z .loc 1 50 28 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:28 2026-02-21T09:22:03.2545979Z add.s64 %rd35, %rd28, 224; 2026-02-21T09:22:03.2546295Z .loc 1 50 76 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:76 2026-02-21T09:22:03.2546780Z add.s32 %r114, %r100, 65536; 2026-02-21T09:22:03.2546950Z // begin inline asm 2026-02-21T09:22:03.2547176Z cp.async.ca.shared.global [ %r114 + 0 ], [ %rd35 + 0 ], 0x8, %r101; 2026-02-21T09:22:03.2547441Z // end inline asm 2026-02-21T09:22:03.2547596Z cp.async.commit_group; 2026-02-21T09:22:03.2547900Z .loc 1 50 28 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:28 2026-02-21T09:22:03.2548314Z add.s64 %rd36, %rd28, 256; 2026-02-21T09:22:03.2548644Z .loc 1 50 76 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:76 2026-02-21T09:22:03.2548988Z bar.sync 0; 2026-02-21T09:22:03.2549140Z add.s32 %r116, %r100, 32768; 2026-02-21T09:22:03.2549483Z // 
begin inline asm 2026-02-21T09:22:03.2549714Z cp.async.ca.shared.global [ %r116 + 0 ], [ %rd36 + 0 ], 0x8, %r101; 2026-02-21T09:22:03.2549979Z // end inline asm 2026-02-21T09:22:03.2550140Z cp.async.commit_group; 2026-02-21T09:22:03.2550444Z .loc 1 50 28 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:28 2026-02-21T09:22:03.2550804Z add.s64 %rd37, %rd28, 288; 2026-02-21T09:22:03.2551119Z .loc 1 50 76 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:76 2026-02-21T09:22:03.2551471Z add.s32 %r118, %r100, 73728; 2026-02-21T09:22:03.2551649Z // begin inline asm 2026-02-21T09:22:03.2551871Z cp.async.ca.shared.global [ %r118 + 0 ], [ %rd37 + 0 ], 0x8, %r101; 2026-02-21T09:22:03.2552231Z // end inline asm 2026-02-21T09:22:03.2552394Z cp.async.commit_group; 2026-02-21T09:22:03.2552564Z shl.b32 %r155, %r1, 4; 2026-02-21T09:22:03.2552739Z and.b32 %r156, %r155, 7680; 2026-02-21T09:22:03.2552938Z and.b32 %r157, %r150, 96; 2026-02-21T09:22:03.2553119Z shl.b32 %r158, %r146, 1; 2026-02-21T09:22:03.2553304Z or.b32 %r159, %r156, %r157; 2026-02-21T09:22:03.2553497Z or.b32 %r160, %r159, %r158; 2026-02-21T09:22:03.2553671Z or.b32 %r9, %r160, %r153; 2026-02-21T09:22:03.2553928Z xor.b32 %r10, %r9, 8; 2026-02-21T09:22:03.2554098Z and.b32 %r161, %r131, 124; 2026-02-21T09:22:03.2554279Z and.b32 %r162, %r1, 384; 2026-02-21T09:22:03.2554451Z shr.u32 %r163, %r1, 4; 2026-02-21T09:22:03.2554629Z and.b32 %r164, %r163, 2; 2026-02-21T09:22:03.2554796Z and.b32 %r165, %r150, 512; 2026-02-21T09:22:03.2554982Z setp.gt.u32 %p19, %r1, 511; 2026-02-21T09:22:03.2555168Z selp.b32 %r166, 1, 0, %p19; 2026-02-21T09:22:03.2555343Z add.s32 %r707, %r132, 81920; 2026-02-21T09:22:03.2555527Z add.s32 %r168, %r707, %r162; 2026-02-21T09:22:03.2555700Z add.s32 %r169, %r168, %r166; 2026-02-21T09:22:03.2555877Z add.s32 %r170, %r169, %r165; 2026-02-21T09:22:03.2556050Z add.s32 %r171, %r170, %r164; 2026-02-21T09:22:03.2556247Z add.s32 %r11, %r171, %r161; 2026-02-21T09:22:03.2556420Z shr.u32 
%r172, %r1, 1; 2026-02-21T09:22:03.2556720Z and.b32 %r173, %r172, 384; 2026-02-21T09:22:03.2556896Z add.s32 %r174, %r707, %r164; 2026-02-21T09:22:03.2557071Z add.s32 %r175, %r174, %r173; 2026-02-21T09:22:03.2557249Z add.s32 %r176, %r175, %r161; 2026-02-21T09:22:03.2557424Z add.s32 %r12, %r176, %r165; 2026-02-21T09:22:03.2557599Z shl.b32 %r177, %r143, 6; 2026-02-21T09:22:03.2557762Z and.b32 %r178, %r150, 48; 2026-02-21T09:22:03.2557933Z and.b32 %r179, %r3, 28; 2026-02-21T09:22:03.2558099Z xor.b32 %r180, %r178, %r179; 2026-02-21T09:22:03.2558276Z or.b32 %r181, %r180, %r177; 2026-02-21T09:22:03.2558449Z add.s32 %r13, %r707, %r181; 2026-02-21T09:22:03.2558623Z xor.b32 %r182, %r181, 32; 2026-02-21T09:22:03.2558797Z add.s32 %r14, %r707, %r182; 2026-02-21T09:22:03.2558966Z shl.b32 %r183, %r3, 7; 2026-02-21T09:22:03.2559133Z shl.b32 %r184, %r142, 4; 2026-02-21T09:22:03.2559295Z or.b32 %r185, %r183, %r184; 2026-02-21T09:22:03.2559476Z add.s32 %r186, %r132, 90112; 2026-02-21T09:22:03.2559647Z add.s32 %r711, %r186, %r185; 2026-02-21T09:22:03.2559825Z and.b32 %r16, %r155, 112; 2026-02-21T09:22:03.2559989Z shl.b32 %r187, %r142, 3; 2026-02-21T09:22:03.2560163Z or.b32 %r188, %r183, %r187; 2026-02-21T09:22:03.2560333Z and.b32 %r189, %r188, 1920; 2026-02-21T09:22:03.2560507Z shl.b32 %r190, %r1, 8; 2026-02-21T09:22:03.2560673Z and.b32 %r191, %r190, 2048; 2026-02-21T09:22:03.2560846Z add.s32 %r192, %r186, %r16; 2026-02-21T09:22:03.2561025Z add.s32 %r193, %r192, %r191; 2026-02-21T09:22:03.2561196Z add.s32 %r210, %r193, %r189; 2026-02-21T09:22:03.2561384Z bfe.u32 %r194, %r707, 4, 14; 2026-02-21T09:22:03.2561564Z cvt.u64.u32 %rd41, %r194; 2026-02-21T09:22:03.2561761Z or.b64 %rd48, %rd41, -9223371899382267904; 2026-02-21T09:22:03.2561978Z add.s32 %r195, %r132, 81952; 2026-02-21T09:22:03.2562157Z bfe.u32 %r196, %r195, 4, 14; 2026-02-21T09:22:03.2562330Z cvt.u64.u32 %rd42, %r196; 2026-02-21T09:22:03.2562680Z or.b64 %rd49, %rd42, -9223371899382267904; 2026-02-21T09:22:03.2562894Z add.s32 
%r197, %r132, 86016; 2026-02-21T09:22:03.2563070Z bfe.u32 %r198, %r197, 4, 14; 2026-02-21T09:22:03.2563251Z cvt.u64.u32 %rd43, %r198; 2026-02-21T09:22:03.2563435Z or.b64 %rd50, %rd43, -9223371899382267904; 2026-02-21T09:22:03.2563646Z add.s32 %r199, %r132, 86048; 2026-02-21T09:22:03.2563818Z bfe.u32 %r200, %r199, 4, 14; 2026-02-21T09:22:03.2563999Z cvt.u64.u32 %rd44, %r200; 2026-02-21T09:22:03.2564183Z or.b64 %rd51, %rd44, -9223371899382267904; 2026-02-21T09:22:03.2564545Z .loc 1 42 74 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:42:74 2026-02-21T09:22:03.2564916Z shl.b32 %r201, %r138, 18; 2026-02-21T09:22:03.2565093Z shl.b32 %r202, %r144, 10; 2026-02-21T09:22:03.2565348Z or.b32 %r203, %r201, %r202; 2026-02-21T09:22:03.2565533Z or.b32 %r204, %r203, %r147; 2026-02-21T09:22:03.2565713Z or.b32 %r1048, %r204, 176; 2026-02-21T09:22:03.2565885Z shl.b32 %r205, %r1, 6; 2026-02-21T09:22:03.2566053Z and.b32 %r206, %r205, 57344; 2026-02-21T09:22:03.2566222Z add.s32 %r207, %r206, %r2; 2026-02-21T09:22:03.2566395Z or.b32 %r1047, %r207, %r143; 2026-02-21T09:22:03.2566705Z mov.b32 %r842, 0f00000000; 2026-02-21T09:22:03.2566950Z mov.b32 %r1050, 4; 2026-02-21T09:22:03.2567125Z mov.b32 %r1049, -1; 2026-02-21T09:22:03.2567286Z mov.b64 %rd64, -16; 2026-02-21T09:22:03.2567447Z mov.b32 %r843, %r842; 2026-02-21T09:22:03.2567607Z mov.b32 %r844, %r842; 2026-02-21T09:22:03.2567768Z mov.b32 %r845, %r842; 2026-02-21T09:22:03.2567922Z mov.b32 %r846, %r842; 2026-02-21T09:22:03.2568079Z mov.b32 %r847, %r842; 2026-02-21T09:22:03.2568232Z mov.b32 %r848, %r842; 2026-02-21T09:22:03.2568392Z mov.b32 %r849, %r842; 2026-02-21T09:22:03.2568552Z mov.b32 %r850, %r842; 2026-02-21T09:22:03.2568705Z mov.b32 %r851, %r842; 2026-02-21T09:22:03.2568864Z mov.b32 %r852, %r842; 2026-02-21T09:22:03.2569017Z mov.b32 %r853, %r842; 2026-02-21T09:22:03.2569181Z mov.b32 %r854, %r842; 2026-02-21T09:22:03.2569334Z mov.b32 %r855, %r842; 2026-02-21T09:22:03.2569492Z mov.b32 %r856, %r842; 
2026-02-21T09:22:03.2569647Z mov.b32 %r857, %r842; 2026-02-21T09:22:03.2569810Z mov.b32 %r858, %r842; 2026-02-21T09:22:03.2569965Z mov.b32 %r859, %r842; 2026-02-21T09:22:03.2570127Z mov.b32 %r860, %r842; 2026-02-21T09:22:03.2570280Z mov.b32 %r861, %r842; 2026-02-21T09:22:03.2570438Z mov.b32 %r862, %r842; 2026-02-21T09:22:03.2570598Z mov.b32 %r863, %r842; 2026-02-21T09:22:03.2570750Z mov.b32 %r864, %r842; 2026-02-21T09:22:03.2570916Z mov.b32 %r865, %r842; 2026-02-21T09:22:03.2571073Z mov.b32 %r866, %r842; 2026-02-21T09:22:03.2571233Z mov.b32 %r867, %r842; 2026-02-21T09:22:03.2571386Z mov.b32 %r868, %r842; 2026-02-21T09:22:03.2571547Z mov.b32 %r869, %r842; 2026-02-21T09:22:03.2571705Z mov.b32 %r870, %r842; 2026-02-21T09:22:03.2571879Z mov.b32 %r871, %r842; 2026-02-21T09:22:03.2572037Z mov.b32 %r872, %r842; 2026-02-21T09:22:03.2572203Z mov.b32 %r873, %r842; 2026-02-21T09:22:03.2572428Z $L__BB0_1: // =>This Inner Loop Header: Depth=1 2026-02-21T09:22:03.2572838Z .loc 1 73 34 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:73:34 2026-02-21T09:22:03.2573206Z setp.eq.b32 %p26, %r5, 0; 2026-02-21T09:22:03.2573524Z .loc 1 42 74 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:42:74 2026-02-21T09:22:03.2573885Z add.s64 %rd64, %rd64, 16; 2026-02-21T09:22:03.2574063Z setp.lt.u64 %p27, %rd64, 432; 2026-02-21T09:22:03.2574255Z add.s32 %r984, %r1049, 1; 2026-02-21T09:22:03.2574439Z setp.gt.s32 %p28, %r984, 4; 2026-02-21T09:22:03.2574627Z selp.b32 %r1049, 0, %r984, %p28; 2026-02-21T09:22:03.2574967Z .loc 1 50 76 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:76 2026-02-21T09:22:03.2575320Z cp.async.wait_group 8; 2026-02-21T09:22:03.2575489Z bar.sync 0; 2026-02-21T09:22:03.2575631Z shl.b32 %r985, %r1049, 13; 2026-02-21T09:22:03.2575965Z add.s32 %r987, %r132, %r985; 2026-02-21T09:22:03.2576280Z .loc 1 54 28 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:54:28 2026-02-21T09:22:03.2576767Z add.s32 %r988, %r987, %r9; 
2026-02-21T09:22:03.2576949Z ld.shared.b16 %rs3, [%r988]; 2026-02-21T09:22:03.2577136Z ld.shared.b16 %rs4, [%r988+256]; 2026-02-21T09:22:03.2577338Z ld.shared.b16 %rs5, [%r988+16]; 2026-02-21T09:22:03.2577528Z ld.shared.b16 %rs6, [%r988+272]; 2026-02-21T09:22:03.2577731Z add.s32 %r989, %r987, %r10; 2026-02-21T09:22:03.2577911Z ld.shared.b16 %rs7, [%r989]; 2026-02-21T09:22:03.2578093Z ld.shared.b16 %rs8, [%r989+256]; 2026-02-21T09:22:03.2578281Z ld.shared.b16 %rs9, [%r989+16]; 2026-02-21T09:22:03.2578477Z ld.shared.b16 %rs10, [%r989+272]; 2026-02-21T09:22:03.2578752Z cvt.f32.bf16 %r504, %rs3; 2026-02-21T09:22:03.2578928Z cvt.f32.bf16 %r505, %rs4; 2026-02-21T09:22:03.2579099Z cvt.f32.bf16 %r506, %rs7; 2026-02-21T09:22:03.2579279Z cvt.f32.bf16 %r507, %rs8; 2026-02-21T09:22:03.2579459Z cvt.f32.bf16 %r572, %rs5; 2026-02-21T09:22:03.2579628Z cvt.f32.bf16 %r573, %rs6; 2026-02-21T09:22:03.2579802Z cvt.f32.bf16 %r574, %rs9; 2026-02-21T09:22:03.2579970Z cvt.f32.bf16 %r575, %rs10; 2026-02-21T09:22:03.2580373Z .loc 1 56 30 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:56:30 2026-02-21T09:22:03.2580734Z cvt.s64.s32 %rd59, %r1047; 2026-02-21T09:22:03.2580917Z add.s64 %rd46, %rd9, %rd59; 2026-02-21T09:22:03.2581238Z .loc 1 56 83 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:56:83 2026-02-21T09:22:03.2581584Z // begin inline asm 2026-02-21T09:22:03.2581751Z mov.u64 %rd45, 0x0; 2026-02-21T09:22:03.2581975Z createpolicy.fractional.L2::evict_first.b64 %rd45, 1.0; 2026-02-21T09:22:03.2582239Z // end inline asm 2026-02-21T09:22:03.2582392Z // begin inline asm 2026-02-21T09:22:03.2582551Z mov.u16 %rs1, 0x0; 2026-02-21T09:22:03.2582795Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs1 }, [ %rd46 + 0 ], %rd45; 2026-02-21T09:22:03.2583114Z // end inline asm 2026-02-21T09:22:03.2583418Z .loc 1 64 24 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:64:24 2026-02-21T09:22:03.2583772Z st.shared.b8 [%r11], %rs1; 2026-02-21T09:22:03.2583954Z bar.sync 0; 
2026-02-21T09:22:03.2584118Z ld.shared.v2.b8 {%rs11, %rs12}, [%r12]; 2026-02-21T09:22:03.2584480Z .loc 1 59 24 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:59:24 2026-02-21T09:22:03.2584830Z shl.b16 %rs13, %rs11, 4; 2026-02-21T09:22:03.2585013Z shl.b16 %rs14, %rs12, 4; 2026-02-21T09:22:03.2585328Z .loc 1 74 54 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:74:54 2026-02-21T09:22:03.2585690Z selp.b16 %rs15, %rs13, %rs11, %p26; 2026-02-21T09:22:03.2585899Z cvt.s16.s8 %rs16, %rs15; 2026-02-21T09:22:03.2586067Z shr.s16 %rs17, %rs16, 4; 2026-02-21T09:22:03.2586252Z selp.b16 %rs18, %rs14, %rs12, %p26; 2026-02-21T09:22:03.2586560Z cvt.s16.s8 %rs19, %rs18; 2026-02-21T09:22:03.2586754Z shr.s16 %rs20, %rs19, 4; 2026-02-21T09:22:03.2587081Z .loc 1 79 28 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:79:28 2026-02-21T09:22:03.2587443Z cvt.rn.f32.s16 %r990, %rs17; 2026-02-21T09:22:03.2587631Z cvt.rn.f32.s16 %r991, %rs20; 2026-02-21T09:22:03.2587799Z bar.sync 0; 2026-02-21T09:22:03.2587952Z st.shared.b32 [%r13], %r990; 2026-02-21T09:22:03.2588125Z st.shared.b32 [%r14], %r991; 2026-02-21T09:22:03.2588458Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r842}; 2026-02-21T09:22:03.2588725Z bar.sync 0; 2026-02-21T09:22:03.2588871Z // begin inline asm 2026-02-21T09:22:03.2589108Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r372, %r508}, [%r210]; 2026-02-21T09:22:03.2589397Z // end inline asm 2026-02-21T09:22:03.2589544Z bar.sync 0; 2026-02-21T09:22:03.2589750Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r844}; 2026-02-21T09:22:03.2590011Z bar.sync 0; 2026-02-21T09:22:03.2590335Z // begin inline asm 2026-02-21T09:22:03.2590576Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r374, %r510}, [%r210]; 2026-02-21T09:22:03.2590853Z // end inline asm 2026-02-21T09:22:03.2591001Z bar.sync 0; 2026-02-21T09:22:03.2591205Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r843}; 2026-02-21T09:22:03.2591467Z bar.sync 0; 2026-02-21T09:22:03.2591604Z // 
begin inline asm 2026-02-21T09:22:03.2591851Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r373, %r509}, [%r210]; 2026-02-21T09:22:03.2592131Z // end inline asm 2026-02-21T09:22:03.2592273Z bar.sync 0; 2026-02-21T09:22:03.2592482Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r845}; 2026-02-21T09:22:03.2592737Z bar.sync 0; 2026-02-21T09:22:03.2592895Z // begin inline asm 2026-02-21T09:22:03.2593213Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r375, %r511}, [%r210]; 2026-02-21T09:22:03.2593511Z // end inline asm 2026-02-21T09:22:03.2593653Z bar.sync 0; 2026-02-21T09:22:03.2593863Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r846}; 2026-02-21T09:22:03.2594126Z bar.sync 0; 2026-02-21T09:22:03.2594265Z // begin inline asm 2026-02-21T09:22:03.2594502Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r376, %r512}, [%r210]; 2026-02-21T09:22:03.2594776Z // end inline asm 2026-02-21T09:22:03.2595001Z bar.sync 0; 2026-02-21T09:22:03.2595212Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r848}; 2026-02-21T09:22:03.2595481Z bar.sync 0; 2026-02-21T09:22:03.2595620Z // begin inline asm 2026-02-21T09:22:03.2595859Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r378, %r514}, [%r210]; 2026-02-21T09:22:03.2596140Z // end inline asm 2026-02-21T09:22:03.2596288Z bar.sync 0; 2026-02-21T09:22:03.2596622Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r847}; 2026-02-21T09:22:03.2596890Z bar.sync 0; 2026-02-21T09:22:03.2597041Z // begin inline asm 2026-02-21T09:22:03.2597278Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r377, %r513}, [%r210]; 2026-02-21T09:22:03.2597579Z // end inline asm 2026-02-21T09:22:03.2597728Z bar.sync 0; 2026-02-21T09:22:03.2597941Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r849}; 2026-02-21T09:22:03.2598199Z bar.sync 0; 2026-02-21T09:22:03.2598345Z // begin inline asm 2026-02-21T09:22:03.2598586Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r379, %r515}, [%r210]; 2026-02-21T09:22:03.2598860Z // end inline asm 2026-02-21T09:22:03.2599006Z 
bar.sync 0; 2026-02-21T09:22:03.2599211Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r850}; 2026-02-21T09:22:03.2599473Z bar.sync 0; 2026-02-21T09:22:03.2599628Z // begin inline asm 2026-02-21T09:22:03.2599866Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r380, %r516}, [%r210]; 2026-02-21T09:22:03.2600140Z // end inline asm 2026-02-21T09:22:03.2600290Z bar.sync 0; 2026-02-21T09:22:03.2600496Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r852}; 2026-02-21T09:22:03.2600755Z bar.sync 0; 2026-02-21T09:22:03.2600899Z // begin inline asm 2026-02-21T09:22:03.2601130Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r382, %r518}, [%r210]; 2026-02-21T09:22:03.2601411Z // end inline asm 2026-02-21T09:22:03.2601552Z bar.sync 0; 2026-02-21T09:22:03.2601759Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r851}; 2026-02-21T09:22:03.2602015Z bar.sync 0; 2026-02-21T09:22:03.2602165Z // begin inline asm 2026-02-21T09:22:03.2602397Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r381, %r517}, [%r210]; 2026-02-21T09:22:03.2602679Z // end inline asm 2026-02-21T09:22:03.2602836Z bar.sync 0; 2026-02-21T09:22:03.2603045Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r853}; 2026-02-21T09:22:03.2603306Z bar.sync 0; 2026-02-21T09:22:03.2603449Z // begin inline asm 2026-02-21T09:22:03.2603688Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r383, %r519}, [%r210]; 2026-02-21T09:22:03.2603966Z // end inline asm 2026-02-21T09:22:03.2604115Z bar.sync 0; 2026-02-21T09:22:03.2604318Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r854}; 2026-02-21T09:22:03.2604586Z bar.sync 0; 2026-02-21T09:22:03.2604723Z // begin inline asm 2026-02-21T09:22:03.2605039Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r384, %r520}, [%r210]; 2026-02-21T09:22:03.2605100Z // end inline asm 2026-02-21T09:22:03.2605159Z bar.sync 0; 2026-02-21T09:22:03.2605301Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r856}; 2026-02-21T09:22:03.2605359Z bar.sync 0; 2026-02-21T09:22:03.2605419Z // begin inline asm 
2026-02-21T09:22:03.2605572Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r386, %r522}, [%r210]; 2026-02-21T09:22:03.2605633Z // end inline asm 2026-02-21T09:22:03.2605688Z bar.sync 0; 2026-02-21T09:22:03.2605817Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r855}; 2026-02-21T09:22:03.2605880Z bar.sync 0; 2026-02-21T09:22:03.2605942Z // begin inline asm 2026-02-21T09:22:03.2606159Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r385, %r521}, [%r210]; 2026-02-21T09:22:03.2606231Z // end inline asm 2026-02-21T09:22:03.2606292Z bar.sync 0; 2026-02-21T09:22:03.2606418Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r857}; 2026-02-21T09:22:03.2606609Z bar.sync 0; 2026-02-21T09:22:03.2606681Z // begin inline asm 2026-02-21T09:22:03.2606821Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r387, %r523}, [%r210]; 2026-02-21T09:22:03.2606880Z // end inline asm 2026-02-21T09:22:03.2606943Z bar.sync 0; 2026-02-21T09:22:03.2607149Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r858}; 2026-02-21T09:22:03.2607214Z bar.sync 0; 2026-02-21T09:22:03.2607274Z // begin inline asm 2026-02-21T09:22:03.2607422Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r388, %r524}, [%r210]; 2026-02-21T09:22:03.2607482Z // end inline asm 2026-02-21T09:22:03.2607540Z bar.sync 0; 2026-02-21T09:22:03.2607671Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r860}; 2026-02-21T09:22:03.2607726Z bar.sync 0; 2026-02-21T09:22:03.2607790Z // begin inline asm 2026-02-21T09:22:03.2607936Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r390, %r526}, [%r210]; 2026-02-21T09:22:03.2607996Z // end inline asm 2026-02-21T09:22:03.2608053Z bar.sync 0; 2026-02-21T09:22:03.2608178Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r859}; 2026-02-21T09:22:03.2608243Z bar.sync 0; 2026-02-21T09:22:03.2608305Z // begin inline asm 2026-02-21T09:22:03.2608445Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r389, %r525}, [%r210]; 2026-02-21T09:22:03.2608511Z // end inline asm 2026-02-21T09:22:03.2608565Z bar.sync 0; 
2026-02-21T09:22:03.2608688Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r861}; 2026-02-21T09:22:03.2608754Z bar.sync 0; 2026-02-21T09:22:03.2608825Z // begin inline asm 2026-02-21T09:22:03.2608964Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r391, %r527}, [%r210]; 2026-02-21T09:22:03.2609021Z // end inline asm 2026-02-21T09:22:03.2609084Z bar.sync 0; 2026-02-21T09:22:03.2609205Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r862}; 2026-02-21T09:22:03.2609262Z bar.sync 0; 2026-02-21T09:22:03.2609322Z // begin inline asm 2026-02-21T09:22:03.2609465Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r392, %r528}, [%r210]; 2026-02-21T09:22:03.2609526Z // end inline asm 2026-02-21T09:22:03.2609585Z bar.sync 0; 2026-02-21T09:22:03.2609713Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r864}; 2026-02-21T09:22:03.2609769Z bar.sync 0; 2026-02-21T09:22:03.2609828Z // begin inline asm 2026-02-21T09:22:03.2609972Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r394, %r530}, [%r210]; 2026-02-21T09:22:03.2610029Z // end inline asm 2026-02-21T09:22:03.2610084Z bar.sync 0; 2026-02-21T09:22:03.2610207Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r863}; 2026-02-21T09:22:03.2610267Z bar.sync 0; 2026-02-21T09:22:03.2610327Z // begin inline asm 2026-02-21T09:22:03.2610465Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r393, %r529}, [%r210]; 2026-02-21T09:22:03.2610526Z // end inline asm 2026-02-21T09:22:03.2610581Z bar.sync 0; 2026-02-21T09:22:03.2610704Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r865}; 2026-02-21T09:22:03.2610759Z bar.sync 0; 2026-02-21T09:22:03.2610824Z // begin inline asm 2026-02-21T09:22:03.2611064Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r395, %r531}, [%r210]; 2026-02-21T09:22:03.2611184Z // end inline asm 2026-02-21T09:22:03.2611245Z bar.sync 0; 2026-02-21T09:22:03.2611371Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r866}; 2026-02-21T09:22:03.2611426Z bar.sync 0; 2026-02-21T09:22:03.2611487Z // begin inline asm 
2026-02-21T09:22:03.2611631Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r396, %r532}, [%r210]; 2026-02-21T09:22:03.2611688Z // end inline asm 2026-02-21T09:22:03.2611743Z bar.sync 0; 2026-02-21T09:22:03.2611872Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r868}; 2026-02-21T09:22:03.2611926Z bar.sync 0; 2026-02-21T09:22:03.2611986Z // begin inline asm 2026-02-21T09:22:03.2612128Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r398, %r534}, [%r210]; 2026-02-21T09:22:03.2612249Z // end inline asm 2026-02-21T09:22:03.2612307Z bar.sync 0; 2026-02-21T09:22:03.2612431Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r867}; 2026-02-21T09:22:03.2612494Z bar.sync 0; 2026-02-21T09:22:03.2612558Z // begin inline asm 2026-02-21T09:22:03.2612708Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r397, %r533}, [%r210]; 2026-02-21T09:22:03.2612777Z // end inline asm 2026-02-21T09:22:03.2612834Z bar.sync 0; 2026-02-21T09:22:03.2613006Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r869}; 2026-02-21T09:22:03.2613064Z bar.sync 0; 2026-02-21T09:22:03.2613128Z // begin inline asm 2026-02-21T09:22:03.2613267Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r399, %r535}, [%r210]; 2026-02-21T09:22:03.2613325Z // end inline asm 2026-02-21T09:22:03.2613386Z bar.sync 0; 2026-02-21T09:22:03.2613512Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r870}; 2026-02-21T09:22:03.2613568Z bar.sync 0; 2026-02-21T09:22:03.2613628Z // begin inline asm 2026-02-21T09:22:03.2613776Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r400, %r536}, [%r210]; 2026-02-21T09:22:03.2613833Z // end inline asm 2026-02-21T09:22:03.2613888Z bar.sync 0; 2026-02-21T09:22:03.2614018Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r872}; 2026-02-21T09:22:03.2614077Z bar.sync 0; 2026-02-21T09:22:03.2614136Z // begin inline asm 2026-02-21T09:22:03.2614274Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r402, %r538}, [%r210]; 2026-02-21T09:22:03.2614338Z // end inline asm 2026-02-21T09:22:03.2614396Z bar.sync 0; 
2026-02-21T09:22:03.2614519Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r871}; 2026-02-21T09:22:03.2614580Z bar.sync 0; 2026-02-21T09:22:03.2614640Z // begin inline asm 2026-02-21T09:22:03.2614778Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r401, %r537}, [%r210]; 2026-02-21T09:22:03.2614840Z // end inline asm 2026-02-21T09:22:03.2614895Z bar.sync 0; 2026-02-21T09:22:03.2615017Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r711], {%r873}; 2026-02-21T09:22:03.2615074Z bar.sync 0; 2026-02-21T09:22:03.2615141Z // begin inline asm 2026-02-21T09:22:03.2615279Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r403, %r539}, [%r210]; 2026-02-21T09:22:03.2615338Z // end inline asm 2026-02-21T09:22:03.2615409Z $L__tmp1: 2026-02-21T09:22:03.2615705Z .loc 2 291 36 // standard.py:291:36 @[ ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:86:36 ] 2026-02-21T09:22:03.2615768Z // begin inline asm 2026-02-21T09:22:03.2615855Z fence.proxy.async.shared::cta; 2026-02-21T09:22:03.2615923Z // end inline asm 2026-02-21T09:22:03.2616004Z shfl.sync.idx.b32 %r992, %r3, 0, 31, -1; 2026-02-21T09:22:03.2616079Z wgmma.fence.sync.aligned; 2026-02-21T09:22:03.2616153Z mov.pred %p20, -1; 2026-02-21T09:22:03.2616213Z // begin inline asm 2026-02-21T09:22:03.2617019Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r372,%r373,%r374,%r375,%r376,%r377,%r378,%r379,%r380,%r381,%r382,%r383,%r384,%r385,%r386,%r387,%r388,%r389,%r390,%r391,%r392,%r393,%r394,%r395,%r396,%r397,%r398,%r399,%r400,%r401,%r402,%r403}, {%r504,%r505,%r506,%r507}, %rd48, %p20, 1, 1; 2026-02-21T09:22:03.2617091Z // end inline asm 2026-02-21T09:22:03.2617150Z // begin inline asm 2026-02-21T09:22:03.2617893Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r372,%r373,%r374,%r375,%r376,%r377,%r378,%r379,%r380,%r381,%r382,%r383,%r384,%r385,%r386,%r387,%r388,%r389,%r390,%r391,%r392,%r393,%r394,%r395,%r396,%r397,%r398,%r399,%r400,%r401,%r402,%r403}, {%r572,%r573,%r574,%r575}, %rd49, %p20, 1, 1; 2026-02-21T09:22:03.2618023Z 
// end inline asm 2026-02-21T09:22:03.2618084Z // begin inline asm 2026-02-21T09:22:03.2618726Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r508,%r509,%r510,%r511,%r512,%r513,%r514,%r515,%r516,%r517,%r518,%r519,%r520,%r521,%r522,%r523,%r524,%r525,%r526,%r527,%r528,%r529,%r530,%r531,%r532,%r533,%r534,%r535,%r536,%r537,%r538,%r539}, {%r504,%r505,%r506,%r507}, %rd50, %p20, 1, 1; 2026-02-21T09:22:03.2618792Z // end inline asm 2026-02-21T09:22:03.2618851Z // begin inline asm 2026-02-21T09:22:03.2619550Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r508,%r509,%r510,%r511,%r512,%r513,%r514,%r515,%r516,%r517,%r518,%r519,%r520,%r521,%r522,%r523,%r524,%r525,%r526,%r527,%r528,%r529,%r530,%r531,%r532,%r533,%r534,%r535,%r536,%r537,%r538,%r539}, {%r572,%r573,%r574,%r575}, %rd51, %p20, 1, 1; 2026-02-21T09:22:03.2619622Z // end inline asm 2026-02-21T09:22:03.2619704Z wgmma.commit_group.sync.aligned; 2026-02-21T09:22:03.2619765Z mov.b32 %r641, %r708; 2026-02-21T09:22:03.2619906Z mov.b32 %r642, %r708; 2026-02-21T09:22:03.2619973Z mov.b32 %r640, %r707; 2026-02-21T09:22:03.2620031Z // begin inline asm 2026-02-21T09:22:03.2620907Z // wait for regs: %r372,%r373,%r374,%r375,%r376,%r377,%r378,%r379,%r380,%r381,%r382,%r383,%r384,%r385,%r386,%r387,%r388,%r389,%r390,%r391,%r392,%r393,%r394,%r395,%r396,%r397,%r398,%r399,%r400,%r401,%r402,%r403,%r508,%r509,%r510,%r511,%r512,%r513,%r514,%r515,%r516,%r517,%r518,%r519,%r520,%r521,%r522,%r523,%r524,%r525,%r526,%r527,%r528,%r529,%r530,%r531,%r532,%r533,%r534,%r535,%r536,%r537,%r538,%r539,%r640,%r641,%r642 2026-02-21T09:22:03.2620995Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:22:03.2621051Z // end inline asm 2026-02-21T09:22:03.2621108Z $L__tmp2: 2026-02-21T09:22:03.2621335Z .loc 1 50 76 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:76 2026-02-21T09:22:03.2621400Z add.s32 %r993, %r987, 40960; 2026-02-21T09:22:03.2621607Z .loc 1 54 28 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:54:28 
2026-02-21T09:22:03.2621678Z add.s32 %r994, %r993, %r9; 2026-02-21T09:22:03.2621746Z ld.shared.b16 %rs21, [%r994]; 2026-02-21T09:22:03.2621816Z ld.shared.b16 %rs22, [%r994+256]; 2026-02-21T09:22:03.2621885Z ld.shared.b16 %rs23, [%r994+16]; 2026-02-21T09:22:03.2621955Z ld.shared.b16 %rs24, [%r994+272]; 2026-02-21T09:22:03.2622019Z add.s32 %r995, %r993, %r10; 2026-02-21T09:22:03.2622086Z ld.shared.b16 %rs25, [%r995]; 2026-02-21T09:22:03.2622159Z ld.shared.b16 %rs26, [%r995+256]; 2026-02-21T09:22:03.2622226Z ld.shared.b16 %rs27, [%r995+16]; 2026-02-21T09:22:03.2622292Z ld.shared.b16 %rs28, [%r995+272]; 2026-02-21T09:22:03.2622374Z cvt.f32.bf16 %r838, %rs21; 2026-02-21T09:22:03.2622441Z cvt.f32.bf16 %r839, %rs22; 2026-02-21T09:22:03.2622505Z cvt.f32.bf16 %r840, %rs25; 2026-02-21T09:22:03.2622567Z cvt.f32.bf16 %r841, %rs26; 2026-02-21T09:22:03.2622635Z cvt.f32.bf16 %r906, %rs23; 2026-02-21T09:22:03.2622697Z cvt.f32.bf16 %r907, %rs24; 2026-02-21T09:22:03.2622759Z cvt.f32.bf16 %r908, %rs27; 2026-02-21T09:22:03.2622824Z cvt.f32.bf16 %r909, %rs28; 2026-02-21T09:22:03.2623029Z .loc 1 56 30 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:56:30 2026-02-21T09:22:03.2623092Z add.s32 %r996, %r1047, 65536; 2026-02-21T09:22:03.2623155Z cvt.s64.s32 %rd60, %r996; 2026-02-21T09:22:03.2623226Z add.s64 %rd53, %rd9, %rd60; 2026-02-21T09:22:03.2623430Z .loc 1 56 83 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:56:83 2026-02-21T09:22:03.2623492Z // begin inline asm 2026-02-21T09:22:03.2623559Z mov.u64 %rd52, 0x0; 2026-02-21T09:22:03.2623686Z createpolicy.fractional.L2::evict_first.b64 %rd52, 1.0; 2026-02-21T09:22:03.2623847Z // end inline asm 2026-02-21T09:22:03.2623912Z // begin inline asm 2026-02-21T09:22:03.2623973Z mov.u16 %rs2, 0x0; 2026-02-21T09:22:03.2624131Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs2 }, [ %rd53 + 0 ], %rd52; 2026-02-21T09:22:03.2624191Z // end inline asm 2026-02-21T09:22:03.2624399Z .loc 1 64 24 // 
ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:64:24 2026-02-21T09:22:03.2624456Z bar.sync 0; 2026-02-21T09:22:03.2624523Z st.shared.b8 [%r11], %rs2; 2026-02-21T09:22:03.2624586Z bar.sync 0; 2026-02-21T09:22:03.2624665Z ld.shared.v2.b8 {%rs29, %rs30}, [%r12]; 2026-02-21T09:22:03.2624866Z .loc 1 59 24 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:59:24 2026-02-21T09:22:03.2624986Z shl.b16 %rs31, %rs29, 4; 2026-02-21T09:22:03.2625051Z shl.b16 %rs32, %rs30, 4; 2026-02-21T09:22:03.2625250Z .loc 1 74 54 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:74:54 2026-02-21T09:22:03.2625327Z selp.b16 %rs33, %rs31, %rs29, %p26; 2026-02-21T09:22:03.2625396Z cvt.s16.s8 %rs34, %rs33; 2026-02-21T09:22:03.2625457Z shr.s16 %rs35, %rs34, 4; 2026-02-21T09:22:03.2625527Z selp.b16 %rs36, %rs32, %rs30, %p26; 2026-02-21T09:22:03.2625654Z cvt.s16.s8 %rs37, %rs36; 2026-02-21T09:22:03.2625720Z shr.s16 %rs38, %rs37, 4; 2026-02-21T09:22:03.2625922Z .loc 1 79 28 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:79:28 2026-02-21T09:22:03.2625991Z cvt.rn.f32.s16 %r997, %rs35; 2026-02-21T09:22:03.2626063Z cvt.rn.f32.s16 %r998, %rs38; 2026-02-21T09:22:03.2626120Z bar.sync 0; 2026-02-21T09:22:03.2626183Z st.shared.b32 [%r13], %r997; 2026-02-21T09:22:03.2626264Z st.shared.b32 [%r14], %r998; 2026-02-21T09:22:03.2626323Z $L__tmp3: 2026-02-21T09:22:03.2626738Z .loc 2 291 36 // standard.py:291:36 @[ ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:86:36 ] 2026-02-21T09:22:03.2626897Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r372, %r508}; 2026-02-21T09:22:03.2626959Z bar.sync 0; 2026-02-21T09:22:03.2627021Z // begin inline asm 2026-02-21T09:22:03.2627152Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r842}, [%r711]; 2026-02-21T09:22:03.2627218Z // end inline asm 2026-02-21T09:22:03.2627275Z bar.sync 0; 2026-02-21T09:22:03.2627417Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r374, %r510}; 2026-02-21T09:22:03.2627481Z bar.sync 0; 
2026-02-21T09:22:03.2627542Z // begin inline asm 2026-02-21T09:22:03.2627671Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r844}, [%r711]; 2026-02-21T09:22:03.2627729Z // end inline asm 2026-02-21T09:22:03.2627796Z bar.sync 0; 2026-02-21T09:22:03.2627936Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r373, %r509}; 2026-02-21T09:22:03.2627992Z bar.sync 0; 2026-02-21T09:22:03.2628061Z // begin inline asm 2026-02-21T09:22:03.2628186Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r843}, [%r711]; 2026-02-21T09:22:03.2628316Z // end inline asm 2026-02-21T09:22:03.2628386Z bar.sync 0; 2026-02-21T09:22:03.2628526Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r375, %r511}; 2026-02-21T09:22:03.2628582Z bar.sync 0; 2026-02-21T09:22:03.2628643Z // begin inline asm 2026-02-21T09:22:03.2628781Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r845}, [%r711]; 2026-02-21T09:22:03.2628839Z // end inline asm 2026-02-21T09:22:03.2628902Z bar.sync 0; 2026-02-21T09:22:03.2629052Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r376, %r512}; 2026-02-21T09:22:03.2629109Z bar.sync 0; 2026-02-21T09:22:03.2629170Z // begin inline asm 2026-02-21T09:22:03.2629296Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r846}, [%r711]; 2026-02-21T09:22:03.2629361Z // end inline asm 2026-02-21T09:22:03.2629418Z bar.sync 0; 2026-02-21T09:22:03.2629560Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r378, %r514}; 2026-02-21T09:22:03.2629624Z bar.sync 0; 2026-02-21T09:22:03.2629685Z // begin inline asm 2026-02-21T09:22:03.2629809Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r848}, [%r711]; 2026-02-21T09:22:03.2630020Z // end inline asm 2026-02-21T09:22:03.2630096Z bar.sync 0; 2026-02-21T09:22:03.2630239Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r377, %r513}; 2026-02-21T09:22:03.2630294Z bar.sync 0; 2026-02-21T09:22:03.2630360Z // begin inline asm 2026-02-21T09:22:03.2630485Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r847}, [%r711]; 2026-02-21T09:22:03.2630544Z // end inline asm 
2026-02-21T09:22:03.2630606Z bar.sync 0; 2026-02-21T09:22:03.2630744Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r379, %r515}; 2026-02-21T09:22:03.2630799Z bar.sync 0; 2026-02-21T09:22:03.2630858Z // begin inline asm 2026-02-21T09:22:03.2630986Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r849}, [%r711]; 2026-02-21T09:22:03.2631042Z // end inline asm 2026-02-21T09:22:03.2631166Z bar.sync 0; 2026-02-21T09:22:03.2631318Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r380, %r516}; 2026-02-21T09:22:03.2631376Z bar.sync 0; 2026-02-21T09:22:03.2631438Z // begin inline asm 2026-02-21T09:22:03.2631564Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r850}, [%r711]; 2026-02-21T09:22:03.2631626Z // end inline asm 2026-02-21T09:22:03.2631684Z bar.sync 0; 2026-02-21T09:22:03.2631881Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r382, %r518}; 2026-02-21T09:22:03.2631945Z bar.sync 0; 2026-02-21T09:22:03.2632004Z // begin inline asm 2026-02-21T09:22:03.2632128Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r852}, [%r711]; 2026-02-21T09:22:03.2632186Z // end inline asm 2026-02-21T09:22:03.2632263Z bar.sync 0; 2026-02-21T09:22:03.2632404Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r381, %r517}; 2026-02-21T09:22:03.2632461Z bar.sync 0; 2026-02-21T09:22:03.2632525Z // begin inline asm 2026-02-21T09:22:03.2632650Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r851}, [%r711]; 2026-02-21T09:22:03.2632708Z // end inline asm 2026-02-21T09:22:03.2632769Z bar.sync 0; 2026-02-21T09:22:03.2632906Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r383, %r519}; 2026-02-21T09:22:03.2632967Z bar.sync 0; 2026-02-21T09:22:03.2633027Z // begin inline asm 2026-02-21T09:22:03.2633159Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r853}, [%r711]; 2026-02-21T09:22:03.2633219Z // end inline asm 2026-02-21T09:22:03.2633274Z bar.sync 0; 2026-02-21T09:22:03.2633421Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r384, %r520}; 2026-02-21T09:22:03.2633477Z bar.sync 0; 
2026-02-21T09:22:03.2633536Z // begin inline asm 2026-02-21T09:22:03.2633661Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r854}, [%r711]; 2026-02-21T09:22:03.2633726Z // end inline asm 2026-02-21T09:22:03.2633781Z bar.sync 0; 2026-02-21T09:22:03.2633919Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r386, %r522}; 2026-02-21T09:22:03.2633980Z bar.sync 0; 2026-02-21T09:22:03.2634042Z // begin inline asm 2026-02-21T09:22:03.2634170Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r856}, [%r711]; 2026-02-21T09:22:03.2634227Z // end inline asm 2026-02-21T09:22:03.2634289Z bar.sync 0; 2026-02-21T09:22:03.2634431Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r385, %r521}; 2026-02-21T09:22:03.2634486Z bar.sync 0; 2026-02-21T09:22:03.2634553Z // begin inline asm 2026-02-21T09:22:03.2634676Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r855}, [%r711]; 2026-02-21T09:22:03.2634735Z // end inline asm 2026-02-21T09:22:03.2634796Z bar.sync 0; 2026-02-21T09:22:03.2634935Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r387, %r523}; 2026-02-21T09:22:03.2634990Z bar.sync 0; 2026-02-21T09:22:03.2635050Z // begin inline asm 2026-02-21T09:22:03.2635179Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r857}, [%r711]; 2026-02-21T09:22:03.2635237Z // end inline asm 2026-02-21T09:22:03.2635294Z bar.sync 0; 2026-02-21T09:22:03.2635438Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r388, %r524}; 2026-02-21T09:22:03.2635498Z bar.sync 0; 2026-02-21T09:22:03.2635558Z // begin inline asm 2026-02-21T09:22:03.2635681Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r858}, [%r711]; 2026-02-21T09:22:03.2635861Z // end inline asm 2026-02-21T09:22:03.2635916Z bar.sync 0; 2026-02-21T09:22:03.2636055Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r390, %r526}; 2026-02-21T09:22:03.2636120Z bar.sync 0; 2026-02-21T09:22:03.2636183Z // begin inline asm 2026-02-21T09:22:03.2636309Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r860}, [%r711]; 2026-02-21T09:22:03.2636366Z // end inline asm 
2026-02-21T09:22:03.2644376Z bar.sync 0; 2026-02-21T09:22:03.2644635Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r389, %r525}; 2026-02-21T09:22:03.2644706Z bar.sync 0; 2026-02-21T09:22:03.2644778Z // begin inline asm 2026-02-21T09:22:03.2644931Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r859}, [%r711]; 2026-02-21T09:22:03.2645011Z // end inline asm 2026-02-21T09:22:03.2645072Z bar.sync 0; 2026-02-21T09:22:03.2645377Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r391, %r527}; 2026-02-21T09:22:03.2645455Z bar.sync 0; 2026-02-21T09:22:03.2645524Z // begin inline asm 2026-02-21T09:22:03.2645681Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r861}, [%r711]; 2026-02-21T09:22:03.2645744Z // end inline asm 2026-02-21T09:22:03.2645810Z bar.sync 0; 2026-02-21T09:22:03.2645971Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r392, %r528}; 2026-02-21T09:22:03.2646124Z bar.sync 0; 2026-02-21T09:22:03.2646199Z // begin inline asm 2026-02-21T09:22:03.2646340Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r862}, [%r711]; 2026-02-21T09:22:03.2646402Z // end inline asm 2026-02-21T09:22:03.2646640Z bar.sync 0; 2026-02-21T09:22:03.2646804Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r394, %r530}; 2026-02-21T09:22:03.2646865Z bar.sync 0; 2026-02-21T09:22:03.2646927Z // begin inline asm 2026-02-21T09:22:03.2647065Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r864}, [%r711]; 2026-02-21T09:22:03.2647127Z // end inline asm 2026-02-21T09:22:03.2647185Z bar.sync 0; 2026-02-21T09:22:03.2647327Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r393, %r529}; 2026-02-21T09:22:03.2647393Z bar.sync 0; 2026-02-21T09:22:03.2647456Z // begin inline asm 2026-02-21T09:22:03.2647582Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r863}, [%r711]; 2026-02-21T09:22:03.2647649Z // end inline asm 2026-02-21T09:22:03.2647708Z bar.sync 0; 2026-02-21T09:22:03.2647852Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r395, %r531}; 2026-02-21T09:22:03.2647917Z bar.sync 0; 
2026-02-21T09:22:03.2647981Z // begin inline asm 2026-02-21T09:22:03.2648117Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r865}, [%r711]; 2026-02-21T09:22:03.2648178Z // end inline asm 2026-02-21T09:22:03.2648243Z bar.sync 0; 2026-02-21T09:22:03.2648385Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r396, %r532}; 2026-02-21T09:22:03.2648443Z bar.sync 0; 2026-02-21T09:22:03.2648519Z // begin inline asm 2026-02-21T09:22:03.2648648Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r866}, [%r711]; 2026-02-21T09:22:03.2648709Z // end inline asm 2026-02-21T09:22:03.2648765Z bar.sync 0; 2026-02-21T09:22:03.2648913Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r398, %r534}; 2026-02-21T09:22:03.2648972Z bar.sync 0; 2026-02-21T09:22:03.2649033Z // begin inline asm 2026-02-21T09:22:03.2649167Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r868}, [%r711]; 2026-02-21T09:22:03.2649229Z // end inline asm 2026-02-21T09:22:03.2649289Z bar.sync 0; 2026-02-21T09:22:03.2649431Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r397, %r533}; 2026-02-21T09:22:03.2649498Z bar.sync 0; 2026-02-21T09:22:03.2649558Z // begin inline asm 2026-02-21T09:22:03.2649684Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r867}, [%r711]; 2026-02-21T09:22:03.2649752Z // end inline asm 2026-02-21T09:22:03.2649809Z bar.sync 0; 2026-02-21T09:22:03.2649961Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r399, %r535}; 2026-02-21T09:22:03.2650028Z bar.sync 0; 2026-02-21T09:22:03.2650091Z // begin inline asm 2026-02-21T09:22:03.2650226Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r869}, [%r711]; 2026-02-21T09:22:03.2650289Z // end inline asm 2026-02-21T09:22:03.2650526Z bar.sync 0; 2026-02-21T09:22:03.2650679Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r400, %r536}; 2026-02-21T09:22:03.2650742Z bar.sync 0; 2026-02-21T09:22:03.2650809Z // begin inline asm 2026-02-21T09:22:03.2650939Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r870}, [%r711]; 2026-02-21T09:22:03.2650999Z // end inline asm 
2026-02-21T09:22:03.2651057Z bar.sync 0; 2026-02-21T09:22:03.2651205Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r402, %r538}; 2026-02-21T09:22:03.2651262Z bar.sync 0; 2026-02-21T09:22:03.2651322Z // begin inline asm 2026-02-21T09:22:03.2651454Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r872}, [%r711]; 2026-02-21T09:22:03.2651513Z // end inline asm 2026-02-21T09:22:03.2651570Z bar.sync 0; 2026-02-21T09:22:03.2651785Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r401, %r537}; 2026-02-21T09:22:03.2651853Z bar.sync 0; 2026-02-21T09:22:03.2651913Z // begin inline asm 2026-02-21T09:22:03.2652037Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r871}, [%r711]; 2026-02-21T09:22:03.2652108Z // end inline asm 2026-02-21T09:22:03.2652165Z bar.sync 0; 2026-02-21T09:22:03.2652304Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r210], {%r403, %r539}; 2026-02-21T09:22:03.2652366Z bar.sync 0; 2026-02-21T09:22:03.2652487Z // begin inline asm 2026-02-21T09:22:03.2652615Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r873}, [%r711]; 2026-02-21T09:22:03.2652685Z // end inline asm 2026-02-21T09:22:03.2652755Z // begin inline asm 2026-02-21T09:22:03.2652849Z fence.proxy.async.shared::cta; 2026-02-21T09:22:03.2652909Z // end inline asm 2026-02-21T09:22:03.2652996Z wgmma.fence.sync.aligned; 2026-02-21T09:22:03.2653061Z shl.b32 %r999, %r992, 8; 2026-02-21T09:22:03.2653128Z and.b32 %r1000, %r999, 4096; 2026-02-21T09:22:03.2653200Z add.s32 %r1001, %r1000, %r707; 2026-02-21T09:22:03.2653273Z bfe.u32 %r1002, %r1001, 4, 14; 2026-02-21T09:22:03.2653343Z cvt.u64.u32 %rd61, %r1002; 2026-02-21T09:22:03.2653430Z or.b64 %rd55, %rd61, -9223371899382267904; 2026-02-21T09:22:03.2653503Z // begin inline asm 2026-02-21T09:22:03.2654170Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r842,%r843,%r844,%r845,%r846,%r847,%r848,%r849,%r850,%r851,%r852,%r853,%r854,%r855,%r856,%r857,%r858,%r859,%r860,%r861,%r862,%r863,%r864,%r865,%r866,%r867,%r868,%r869,%r870,%r871,%r872,%r873}, 
{%r838,%r839,%r840,%r841}, %rd55, %p20, 1, 1; 2026-02-21T09:22:03.2654233Z // end inline asm 2026-02-21T09:22:03.2654310Z add.s32 %r1003, %r1001, 32; 2026-02-21T09:22:03.2654376Z bfe.u32 %r1004, %r1003, 4, 14; 2026-02-21T09:22:03.2654445Z cvt.u64.u32 %rd62, %r1004; 2026-02-21T09:22:03.2654527Z or.b64 %rd56, %rd62, -9223371899382267904; 2026-02-21T09:22:03.2654599Z // begin inline asm 2026-02-21T09:22:03.2655251Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r842,%r843,%r844,%r845,%r846,%r847,%r848,%r849,%r850,%r851,%r852,%r853,%r854,%r855,%r856,%r857,%r858,%r859,%r860,%r861,%r862,%r863,%r864,%r865,%r866,%r867,%r868,%r869,%r870,%r871,%r872,%r873}, {%r906,%r907,%r908,%r909}, %rd56, %p20, 1, 1; 2026-02-21T09:22:03.2655315Z // end inline asm 2026-02-21T09:22:03.2655405Z wgmma.commit_group.sync.aligned; 2026-02-21T09:22:03.2655475Z mov.b32 %r942, %r707; 2026-02-21T09:22:03.2655542Z mov.b32 %r943, %r708; 2026-02-21T09:22:03.2655614Z mov.b32 %r944, %r708; 2026-02-21T09:22:03.2655679Z // begin inline asm 2026-02-21T09:22:03.2656148Z // wait for regs: %r842,%r843,%r844,%r845,%r846,%r847,%r848,%r849,%r850,%r851,%r852,%r853,%r854,%r855,%r856,%r857,%r858,%r859,%r860,%r861,%r862,%r863,%r864,%r865,%r866,%r867,%r868,%r869,%r870,%r871,%r872,%r873,%r942,%r943,%r944 2026-02-21T09:22:03.2656240Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:22:03.2656305Z // end inline asm 2026-02-21T09:22:03.2656365Z $L__tmp4: 2026-02-21T09:22:03.2656719Z .loc 1 42 74 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:42:74 2026-02-21T09:22:03.2656801Z add.s32 %r1005, %r1050, 1; 2026-02-21T09:22:03.2656875Z setp.gt.s32 %p29, %r1005, 4; 2026-02-21T09:22:03.2657097Z selp.b32 %r1050, 0, %r1005, %p29; 2026-02-21T09:22:03.2657317Z .loc 1 50 28 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:28 2026-02-21T09:22:03.2657388Z add.s32 %r1006, %r1048, -16; 2026-02-21T09:22:03.2657466Z mad.wide.s32 %rd57, %r1006, 2, %rd8; 2026-02-21T09:22:03.2657679Z .loc 1 50 76 // 
ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:76 2026-02-21T09:22:03.2657745Z shl.b32 %r1007, %r1050, 13; 2026-02-21T09:22:03.2657812Z add.s32 %r980, %r100, %r1007; 2026-02-21T09:22:03.2657880Z selp.b32 %r981, 8, 0, %p27; 2026-02-21T09:22:03.2657950Z // begin inline asm 2026-02-21T09:22:03.2658095Z cp.async.ca.shared.global [ %r980 + 0 ], [ %rd57 + 0 ], 0x8, %r981; 2026-02-21T09:22:03.2658159Z // end inline asm 2026-02-21T09:22:03.2658311Z cp.async.commit_group; 2026-02-21T09:22:03.2658525Z .loc 1 50 28 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:28 2026-02-21T09:22:03.2658603Z mad.wide.s32 %rd58, %r1048, 2, %rd8; 2026-02-21T09:22:03.2658815Z .loc 1 50 76 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:50:76 2026-02-21T09:22:03.2658884Z add.s32 %r982, %r102, %r1007; 2026-02-21T09:22:03.2659006Z // begin inline asm 2026-02-21T09:22:03.2659149Z cp.async.ca.shared.global [ %r982 + 0 ], [ %rd58 + 0 ], 0x8, %r981; 2026-02-21T09:22:03.2659215Z // end inline asm 2026-02-21T09:22:03.2659283Z cp.async.commit_group; 2026-02-21T09:22:03.2659489Z .loc 1 42 74 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:42:74 2026-02-21T09:22:03.2659561Z add.s32 %r1048, %r1048, 32; 2026-02-21T09:22:03.2659628Z add.s32 %r1047, %r1047, 131072; 2026-02-21T09:22:03.2659696Z setp.lt.u64 %p30, %rd64, 496; 2026-02-21T09:22:03.2659761Z @%p30 bra $L__BB0_1; 2026-02-21T09:22:03.2659835Z // %bb.2: 2026-02-21T09:22:03.2660041Z .loc 1 21 68 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:21:68 2026-02-21T09:22:03.2660117Z cvta.global.u64 %rd63, %rd1; 2026-02-21T09:22:03.2660325Z .loc 1 42 74 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:42:74 2026-02-21T09:22:03.2660396Z cp.async.wait_group 0; 2026-02-21T09:22:03.2660456Z bar.sync 0; 2026-02-21T09:22:03.2660665Z .loc 1 89 24 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:89:24 2026-02-21T09:22:03.2660744Z cvt.rn.bf16x2.f32 %r1011, %r843, %r842; 
2026-02-21T09:22:03.2660817Z cvt.rn.bf16x2.f32 %r1012, %r845, %r844; 2026-02-21T09:22:03.2660897Z cvt.rn.bf16x2.f32 %r1013, %r847, %r846; 2026-02-21T09:22:03.2660967Z cvt.rn.bf16x2.f32 %r1014, %r849, %r848; 2026-02-21T09:22:03.2661037Z cvt.rn.bf16x2.f32 %r1015, %r851, %r850; 2026-02-21T09:22:03.2661108Z cvt.rn.bf16x2.f32 %r1016, %r853, %r852; 2026-02-21T09:22:03.2661187Z cvt.rn.bf16x2.f32 %r1017, %r855, %r854; 2026-02-21T09:22:03.2661258Z cvt.rn.bf16x2.f32 %r1018, %r857, %r856; 2026-02-21T09:22:03.2661331Z cvt.rn.bf16x2.f32 %r1019, %r859, %r858; 2026-02-21T09:22:03.2661408Z cvt.rn.bf16x2.f32 %r1020, %r861, %r860; 2026-02-21T09:22:03.2661479Z cvt.rn.bf16x2.f32 %r1021, %r863, %r862; 2026-02-21T09:22:03.2661550Z cvt.rn.bf16x2.f32 %r1022, %r865, %r864; 2026-02-21T09:22:03.2661621Z cvt.rn.bf16x2.f32 %r1023, %r867, %r866; 2026-02-21T09:22:03.2661705Z cvt.rn.bf16x2.f32 %r1024, %r869, %r868; 2026-02-21T09:22:03.2661777Z cvt.rn.bf16x2.f32 %r1025, %r871, %r870; 2026-02-21T09:22:03.2661849Z cvt.rn.bf16x2.f32 %r1026, %r873, %r872; 2026-02-21T09:22:03.2662060Z .loc 1 90 39 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:90:39 2026-02-21T09:22:03.2662126Z shl.b32 %r1027, %r1, 7; 2026-02-21T09:22:03.2662192Z and.b32 %r1028, %r1027, 1920; 2026-02-21T09:22:03.2662265Z and.b32 %r1030, %r205, 63488; 2026-02-21T09:22:03.2662332Z or.b32 %r1031, %r1028, %r1030; 2026-02-21T09:22:03.2662412Z xor.b32 %r1032, %r16, %r6; 2026-02-21T09:22:03.2662480Z or.b32 %r1033, %r1031, %r1032; 2026-02-21T09:22:03.2662652Z add.s32 %r1035, %r132, %r1033; 2026-02-21T09:22:03.2662850Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1035], {%r1011, %r1012, %r1013, %r1014}; 2026-02-21T09:22:03.2662913Z xor.b32 %r1036, %r1033, 32; 2026-02-21T09:22:03.2662986Z add.s32 %r1037, %r132, %r1036; 2026-02-21T09:22:03.2663172Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1037], {%r1015, %r1016, %r1017, %r1018}; 2026-02-21T09:22:03.2663237Z xor.b32 %r1038, %r1033, 64; 2026-02-21T09:22:03.2663307Z add.s32 %r1039, 
%r132, %r1038; 2026-02-21T09:22:03.2663488Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1039], {%r1019, %r1020, %r1021, %r1022}; 2026-02-21T09:22:03.2663551Z xor.b32 %r1040, %r1033, 96; 2026-02-21T09:22:03.2663617Z add.s32 %r1041, %r132, %r1040; 2026-02-21T09:22:03.2663872Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r1041], {%r1023, %r1024, %r1025, %r1026}; 2026-02-21T09:22:03.2663936Z // begin inline asm 2026-02-21T09:22:03.2664018Z fence.proxy.async.shared::cta; 2026-02-21T09:22:03.2664086Z // end inline asm 2026-02-21T09:22:03.2664144Z bar.sync 0; 2026-02-21T09:22:03.2664214Z elect.sync %r1042|%p32, -1; 2026-02-21T09:22:03.2664302Z shfl.sync.idx.b32 %r1043, %r3, 0, 31, -1; 2026-02-21T09:22:03.2664377Z setp.lt.u32 %p33, %r1, 64; 2026-02-21T09:22:03.2664495Z and.pred %p31, %p33, %p32; 2026-02-21T09:22:03.2664563Z and.b32 %r1044, %r1043, 1; 2026-02-21T09:22:03.2664632Z shl.b32 %r1045, %r1044, 15; 2026-02-21T09:22:03.2664698Z add.s32 %r1010, %r132, %r1045; 2026-02-21T09:22:03.2664760Z shl.b32 %r1046, %r1044, 6; 2026-02-21T09:22:03.2664833Z or.b32 %r1008, %r1046, %r2; 2026-02-21T09:22:03.2664896Z // begin inline asm 2026-02-21T09:22:03.2665140Z @%p31 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd63, {%r1008, %r1009}], [%r1010]; 2026-02-21T09:22:03.2665204Z // end inline asm 2026-02-21T09:22:03.2665295Z cp.async.bulk.commit_group; 2026-02-21T09:22:03.2665382Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:22:03.2665442Z bar.sync 0; 2026-02-21T09:22:03.2665670Z .loc 1 90 4 // ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py:90:4 2026-02-21T09:22:03.2665727Z ret; 2026-02-21T09:22:03.2665785Z $L__tmp5: 2026-02-21T09:22:03.2665845Z $L__func_end0: 2026-02-21T09:22:03.2665946Z // -- End function 2026-02-21T09:22:03.2666004Z } 2026-02-21T09:22:03.2666258Z .file 1 "/tmp/torchinductor_root/eo/ceotrihm5xc7bdvtji2jw4mrmdhfse7sgwmm4gw7grflp4mgdlp4.py" 2026-02-21T09:22:03.2666609Z .file 2 
"/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T09:22:03.2666683Z .section .debug_abbrev 2026-02-21T09:22:03.2666739Z { 2026-02-21T09:22:03.2666850Z .b8 1 // Abbreviation Code 2026-02-21T09:22:03.2666951Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:22:03.2667041Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:22:03.2667130Z .b8 37 // DW_AT_producer 2026-02-21T09:22:03.2667223Z .b8 8 // DW_FORM_string 2026-02-21T09:22:03.2667307Z .b8 19 // DW_AT_language 2026-02-21T09:22:03.2667390Z .b8 5 // DW_FORM_data2 2026-02-21T09:22:03.2667479Z .b8 3 // DW_AT_name 2026-02-21T09:22:03.2667564Z .b8 8 // DW_FORM_string 2026-02-21T09:22:03.2667651Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:22:03.2667738Z .b8 6 // DW_FORM_data4 2026-02-21T09:22:03.2667821Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:22:03.2667901Z .b8 8 // DW_FORM_string 2026-02-21T09:22:03.2667980Z .b8 0 // EOM(1) 2026-02-21T09:22:03.2668059Z .b8 0 // EOM(2) 2026-02-21T09:22:03.2668148Z .b8 2 // Abbreviation Code 2026-02-21T09:22:03.2668503Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:22:03.2668598Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:22:03.2668677Z .b8 3 // DW_AT_name 2026-02-21T09:22:03.2668761Z .b8 8 // DW_FORM_string 2026-02-21T09:22:03.2668854Z .b8 32 // DW_AT_inline 2026-02-21T09:22:03.2668937Z .b8 11 // DW_FORM_data1 2026-02-21T09:22:03.2669008Z .b8 0 // EOM(1) 2026-02-21T09:22:03.2669087Z .b8 0 // EOM(2) 2026-02-21T09:22:03.2669179Z .b8 3 // Abbreviation Code 2026-02-21T09:22:03.2669331Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:22:03.2669417Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:22:03.2669504Z .b8 17 // DW_AT_low_pc 2026-02-21T09:22:03.2669584Z .b8 1 // DW_FORM_addr 2026-02-21T09:22:03.2669668Z .b8 18 // DW_AT_high_pc 2026-02-21T09:22:03.2669752Z .b8 1 // DW_FORM_addr 2026-02-21T09:22:03.2669917Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:22:03.2670007Z .b8 19 // DW_FORM_ref4 2026-02-21T09:22:03.2670088Z .b8 0 // EOM(1) 2026-02-21T09:22:03.2670158Z .b8 0 // EOM(2) 
2026-02-21T09:22:03.2670245Z .b8 4 // Abbreviation Code 2026-02-21T09:22:03.2670346Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T09:22:03.2670432Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:22:03.2670528Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:22:03.2670605Z .b8 19 // DW_FORM_ref4 2026-02-21T09:22:03.2670687Z .b8 17 // DW_AT_low_pc 2026-02-21T09:22:03.2670760Z .b8 1 // DW_FORM_addr 2026-02-21T09:22:03.2670839Z .b8 18 // DW_AT_high_pc 2026-02-21T09:22:03.2670923Z .b8 1 // DW_FORM_addr 2026-02-21T09:22:03.2671006Z .b8 88 // DW_AT_call_file 2026-02-21T09:22:03.2671087Z .b8 11 // DW_FORM_data1 2026-02-21T09:22:03.2671170Z .b8 89 // DW_AT_call_line 2026-02-21T09:22:03.2671252Z .b8 11 // DW_FORM_data1 2026-02-21T09:22:03.2671333Z .b8 87 // DW_AT_call_column 2026-02-21T09:22:03.2671411Z .b8 11 // DW_FORM_data1 2026-02-21T09:22:03.2671488Z .b8 0 // EOM(1) 2026-02-21T09:22:03.2671559Z .b8 0 // EOM(2) 2026-02-21T09:22:03.2671632Z .b8 0 // EOM(3) 2026-02-21T09:22:03.2671695Z } 2026-02-21T09:22:03.2671762Z .section .debug_info 2026-02-21T09:22:03.2671816Z { 2026-02-21T09:22:03.2671916Z .b32 178 // Length of Unit 2026-02-21T09:22:03.2672016Z .b8 2 // DWARF version number 2026-02-21T09:22:03.2672067Z .b8 0 2026-02-21T09:22:03.2672198Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:22:03.2672300Z .b8 8 // Address Size (in bytes) 2026-02-21T09:22:03.2672419Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T09:22:03.2672507Z .b8 116 // DW_AT_producer 2026-02-21T09:22:03.2672564Z .b8 114 2026-02-21T09:22:03.2672619Z .b8 105 2026-02-21T09:22:03.2672673Z .b8 116 2026-02-21T09:22:03.2672726Z .b8 111 2026-02-21T09:22:03.2672858Z .b8 110 2026-02-21T09:22:03.2672967Z .b8 0 2026-02-21T09:22:03.2673049Z .b8 2 // DW_AT_language 2026-02-21T09:22:03.2673107Z .b8 0 2026-02-21T09:22:03.2673189Z .b8 99 // DW_AT_name 2026-02-21T09:22:03.2673244Z .b8 101 2026-02-21T09:22:03.2673299Z .b8 111 2026-02-21T09:22:03.2673361Z .b8 116 2026-02-21T09:22:03.2673419Z .b8 114 2026-02-21T09:22:03.2673477Z .b8 105 2026-02-21T09:22:03.2673536Z .b8 104 2026-02-21T09:22:03.2673602Z .b8 109 2026-02-21T09:22:03.2673657Z .b8 53 2026-02-21T09:22:03.2673714Z .b8 120 2026-02-21T09:22:03.2673774Z .b8 99 2026-02-21T09:22:03.2673827Z .b8 55 2026-02-21T09:22:03.2673879Z .b8 98 2026-02-21T09:22:03.2673934Z .b8 100 2026-02-21T09:22:03.2673994Z .b8 118 2026-02-21T09:22:03.2674052Z .b8 116 2026-02-21T09:22:03.2674106Z .b8 106 2026-02-21T09:22:03.2674219Z .b8 105 2026-02-21T09:22:03.2674272Z .b8 50 2026-02-21T09:22:03.2674330Z .b8 106 2026-02-21T09:22:03.2674390Z .b8 119 2026-02-21T09:22:03.2674442Z .b8 52 2026-02-21T09:22:03.2674499Z .b8 109 2026-02-21T09:22:03.2674551Z .b8 114 2026-02-21T09:22:03.2674612Z .b8 109 2026-02-21T09:22:03.2674664Z .b8 100 2026-02-21T09:22:03.2674721Z .b8 104 2026-02-21T09:22:03.2674778Z .b8 102 2026-02-21T09:22:03.2674832Z .b8 115 2026-02-21T09:22:03.2674884Z .b8 101 2026-02-21T09:22:03.2674995Z .b8 55 2026-02-21T09:22:03.2675059Z .b8 115 2026-02-21T09:22:03.2675113Z .b8 103 2026-02-21T09:22:03.2675166Z .b8 119 2026-02-21T09:22:03.2675219Z .b8 109 2026-02-21T09:22:03.2675279Z .b8 109 2026-02-21T09:22:03.2675333Z .b8 52 2026-02-21T09:22:03.2675385Z .b8 103 2026-02-21T09:22:03.2675442Z .b8 119 2026-02-21T09:22:03.2675495Z .b8 55 
2026-02-21T09:22:03.2675549Z .b8 103 2026-02-21T09:22:03.2675603Z .b8 114 2026-02-21T09:22:03.2675661Z .b8 102 2026-02-21T09:22:03.2675715Z .b8 108 2026-02-21T09:22:03.2675769Z .b8 112 2026-02-21T09:22:03.2675831Z .b8 52 2026-02-21T09:22:03.2675885Z .b8 109 2026-02-21T09:22:03.2675938Z .b8 103 2026-02-21T09:22:03.2675993Z .b8 100 2026-02-21T09:22:03.2676055Z .b8 108 2026-02-21T09:22:03.2676111Z .b8 112 2026-02-21T09:22:03.2676163Z .b8 52 2026-02-21T09:22:03.2676219Z .b8 46 2026-02-21T09:22:03.2676279Z .b8 112 2026-02-21T09:22:03.2676332Z .b8 121 2026-02-21T09:22:03.2676385Z .b8 0 2026-02-21T09:22:03.2676643Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:22:03.2676734Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:22:03.2676789Z .b8 116 2026-02-21T09:22:03.2676843Z .b8 109 2026-02-21T09:22:03.2676903Z .b8 112 2026-02-21T09:22:03.2676955Z .b8 47 2026-02-21T09:22:03.2677009Z .b8 116 2026-02-21T09:22:03.2677071Z .b8 111 2026-02-21T09:22:03.2677126Z .b8 114 2026-02-21T09:22:03.2677180Z .b8 99 2026-02-21T09:22:03.2677232Z .b8 104 2026-02-21T09:22:03.2677291Z .b8 105 2026-02-21T09:22:03.2677344Z .b8 110 2026-02-21T09:22:03.2677399Z .b8 100 2026-02-21T09:22:03.2677456Z .b8 117 2026-02-21T09:22:03.2677508Z .b8 99 2026-02-21T09:22:03.2677563Z .b8 116 2026-02-21T09:22:03.2677616Z .b8 111 2026-02-21T09:22:03.2677676Z .b8 114 2026-02-21T09:22:03.2677731Z .b8 95 2026-02-21T09:22:03.2677787Z .b8 114 2026-02-21T09:22:03.2677844Z .b8 111 2026-02-21T09:22:03.2677897Z .b8 111 2026-02-21T09:22:03.2677949Z .b8 116 2026-02-21T09:22:03.2678002Z .b8 47 2026-02-21T09:22:03.2678060Z .b8 101 2026-02-21T09:22:03.2678112Z .b8 111 2026-02-21T09:22:03.2678165Z .b8 0 2026-02-21T09:22:03.2678280Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T09:22:03.2678366Z .b8 95 // DW_AT_name 2026-02-21T09:22:03.2678421Z .b8 104 2026-02-21T09:22:03.2678473Z .b8 101 2026-02-21T09:22:03.2678532Z .b8 108 2026-02-21T09:22:03.2678584Z .b8 105 2026-02-21T09:22:03.2678637Z .b8 111 
2026-02-21T09:22:03.2678689Z .b8 110 2026-02-21T09:22:03.2678747Z .b8 95 2026-02-21T09:22:03.2678799Z .b8 109 2026-02-21T09:22:03.2678853Z .b8 97 2026-02-21T09:22:03.2678911Z .b8 116 2026-02-21T09:22:03.2678964Z .b8 109 2026-02-21T09:22:03.2679015Z .b8 117 2026-02-21T09:22:03.2679069Z .b8 108 2026-02-21T09:22:03.2679221Z .b8 95 2026-02-21T09:22:03.2679333Z .b8 98 2026-02-21T09:22:03.2679387Z .b8 102 2026-02-21T09:22:03.2679445Z .b8 49 2026-02-21T09:22:03.2679498Z .b8 54 2026-02-21T09:22:03.2679551Z .b8 95 2026-02-21T09:22:03.2679604Z .b8 105 2026-02-21T09:22:03.2679662Z .b8 110 2026-02-21T09:22:03.2679717Z .b8 116 2026-02-21T09:22:03.2679769Z .b8 52 2026-02-21T09:22:03.2679828Z .b8 0 2026-02-21T09:22:03.2679915Z .b8 1 // DW_AT_inline 2026-02-21T09:22:03.2680022Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T09:22:03.2680126Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T09:22:03.2680225Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T09:22:03.2680328Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:22:03.2680550Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T09:22:03.2680655Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:22:03.2680754Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T09:22:03.2680852Z .b64 $L__tmp4 // DW_AT_high_pc 2026-02-21T09:22:03.2680938Z .b8 1 // DW_AT_call_file 2026-02-21T09:22:03.2681083Z .b8 86 // DW_AT_call_line 2026-02-21T09:22:03.2681185Z .b8 36 // DW_AT_call_column 2026-02-21T09:22:03.2681280Z .b8 0 // End Of Children Mark 2026-02-21T09:22:03.2681368Z .b8 0 // End Of Children Mark 2026-02-21T09:22:03.2681434Z } 2026-02-21T09:22:03.2681516Z .section .debug_macinfo { } 2026-02-21T09:22:03.2681522Z 2026-02-21T09:22:03.2681608Z ================================================================ 2026-02-21T09:22:03.2681731Z please share the reproducer above with Triton project. 2026-02-21T09:22:04.6939838Z [605s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:22:04.6941753Z Config: @helion.kernel(config=helion.Config(block_sizes=[8, 256, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['first', 'first'], loop_orders=[[1, 0]], num_stages=6, num_warps=32, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 2], range_warp_specializes=[]), static_shapes=True) 2026-02-21T09:22:04.6943378Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:22:04.6943694Z `ptxas` stderr: 2026-02-21T09:22:04.6944415Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_helion_matmul_bf16_int4' 2026-02-21T09:22:04.6945447Z ptxas fatal : (C7600) Register allocation failed with register count of '64'. Compile the program with a higher register target 2026-02-21T09:22:04.6946001Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:22:04.6946199Z 2026-02-21T09:22:04.6947105Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpt24fawp2.ptx -o /tmp/tmpt24fawp2.ptx.o 2026-02-21T09:22:04.6947784Z 2026-02-21T09:22:04.6947956Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:22:04.6948310Z 2026-02-21T09:22:04.6948315Z 2026-02-21T09:22:04.6948318Z 2026-02-21T09:22:04.6948412Z ================================================================ 2026-02-21T09:22:04.6948680Z Internal Triton PTX codegen error 2026-02-21T09:22:04.6948887Z `ptxas` stderr: 2026-02-21T09:22:04.6949588Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_helion_matmul_bf16_int4' 2026-02-21T09:22:04.6950607Z ptxas fatal : (C7600) Register allocation failed with register count of '64'. Compile the program with a higher register target 2026-02-21T09:22:04.6951622Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:22:04.6951817Z 2026-02-21T09:22:04.6952394Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpt24fawp2.ptx -o /tmp/tmpt24fawp2.ptx.o 2026-02-21T09:22:04.6953051Z 2026-02-21T09:22:04.6953056Z 2026-02-21T09:22:04.6953117Z // 2026-02-21T09:22:04.6953278Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:22:04.6953487Z // 2026-02-21T09:22:04.6953562Z 2026-02-21T09:22:04.6953619Z .version 8.7 2026-02-21T09:22:04.6953773Z .target sm_90a 2026-02-21T09:22:04.6953923Z .address_size 64 2026-02-21T09:22:04.6954016Z 2026-02-21T09:22:04.6954316Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T09:22:04.6954664Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:22:04.6954934Z // @_helion_matmul_bf16_int4 2026-02-21T09:22:04.6955198Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T09:22:04.6955496Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T09:22:04.6955933Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T09:22:04.6956308Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T09:22:04.6956798Z 
.param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T09:22:04.6957144Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T09:22:04.6957419Z ) 2026-02-21T09:22:04.6957556Z .reqntid 1024 2026-02-21T09:22:04.6957707Z { 2026-02-21T09:22:04.6957836Z .reg .pred %p<13>; 2026-02-21T09:22:04.6958001Z .reg .b16 %rs<39>; 2026-02-21T09:22:04.6958155Z .reg .b32 %r<1112>; 2026-02-21T09:22:04.6958314Z .reg .b64 %rd<48>; 2026-02-21T09:22:04.6958618Z .loc 1 13 0 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:13:0 2026-02-21T09:22:04.6958988Z $L__func_begin0: 2026-02-21T09:22:04.6959280Z .loc 1 13 0 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:13:0 2026-02-21T09:22:04.6959567Z 2026-02-21T09:22:04.6959623Z // %bb.0: 2026-02-21T09:22:04.6959825Z ld.param.b64 %rd9, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T09:22:04.6960119Z ld.param.b64 %rd8, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T09:22:04.6960411Z ld.param.b64 %rd7, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T09:22:04.6960649Z $L__tmp0: 2026-02-21T09:22:04.6960939Z .loc 1 17 33 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:17:33 2026-02-21T09:22:04.6961306Z mov.u32 %r117, %ctaid.x; 2026-02-21T09:22:04.6961620Z .loc 1 20 29 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:20:29 2026-02-21T09:22:04.6961975Z shr.u32 %r118, %r117, 6; 2026-02-21T09:22:04.6962154Z and.b32 %r119, %r118, 33554424; 2026-02-21T09:22:04.6962489Z .loc 1 21 35 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:21:35 2026-02-21T09:22:04.6962837Z sub.s32 %r120, 64, %r119; 2026-02-21T09:22:04.6963156Z .loc 1 21 48 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:21:48 2026-02-21T09:22:04.6963512Z min.s32 %r121, %r120, 8; 2026-02-21T09:22:04.6963816Z .loc 1 22 41 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:22:41 2026-02-21T09:22:04.6964169Z and.b32 %r122, %r117, 511; 2026-02-21T09:22:04.6964481Z .loc 1 23 47 // 
cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:23:47 2026-02-21T09:22:04.6964835Z div.s32 %r123, %r122, %r121; 2026-02-21T09:22:04.6965150Z .loc 1 22 60 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:22:60 2026-02-21T09:22:04.6965512Z mul.lo.s32 %r124, %r123, %r121; 2026-02-21T09:22:04.6965715Z sub.s32 %r125, %r122, %r124; 2026-02-21T09:22:04.6966217Z .loc 1 22 26 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:22:26 2026-02-21T09:22:04.6966705Z add.s32 %r126, %r125, %r119; 2026-02-21T09:22:04.6967031Z .loc 1 24 23 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:24:23 2026-02-21T09:22:04.6967387Z shl.b32 %r1, %r126, 7; 2026-02-21T09:22:04.6967694Z .loc 1 25 41 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:25:41 2026-02-21T09:22:04.6968046Z mov.u32 %r2, %tid.x; 2026-02-21T09:22:04.6968211Z and.b32 %r127, %r2, 31; 2026-02-21T09:22:04.6968376Z shr.u32 %r3, %r2, 5; 2026-02-21T09:22:04.6968541Z and.b32 %r128, %r2, 127; 2026-02-21T09:22:04.6968950Z .loc 1 26 23 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:26:23 2026-02-21T09:22:04.6969316Z shl.b32 %r4, %r123, 8; 2026-02-21T09:22:04.6969623Z .loc 1 27 41 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:27:41 2026-02-21T09:22:04.6969984Z shr.u32 %r129, %r2, 2; 2026-02-21T09:22:04.6970292Z .loc 1 27 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:27:28 2026-02-21T09:22:04.6970643Z or.b32 %r130, %r4, %r129; 2026-02-21T09:22:04.6971040Z .loc 1 41 34 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:41:34 2026-02-21T09:22:04.6971390Z and.b32 %r5, %r2, 3; 2026-02-21T09:22:04.6971553Z shl.b32 %r131, %r5, 2; 2026-02-21T09:22:04.6971852Z .loc 1 42 49 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:49 2026-02-21T09:22:04.6972209Z shl.b32 %r132, %r130, 10; 2026-02-21T09:22:04.6972522Z .loc 1 59 34 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:59:34 2026-02-21T09:22:04.6973023Z and.b32 %r6, %r2, 128; 
2026-02-21T09:22:04.6973333Z .loc 1 42 56 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:56 2026-02-21T09:22:04.6973686Z or.b32 %r133, %r132, %r131; 2026-02-21T09:22:04.6974005Z .loc 1 42 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:28 2026-02-21T09:22:04.6974360Z mad.wide.s32 %rd10, %r133, 2, %rd7; 2026-02-21T09:22:04.6974706Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:04.6975057Z shl.b32 %r7, %r2, 3; 2026-02-21T09:22:04.6975212Z and.b32 %r134, %r7, 8056; 2026-02-21T09:22:04.6975392Z bfe.s32 %r135, %r2, 4, 1; 2026-02-21T09:22:04.6975563Z and.b32 %r136, %r135, 136; 2026-02-21T09:22:04.6975749Z xor.b32 %r137, %r136, %r134; 2026-02-21T09:22:04.6975929Z mov.b32 %r138, global_smem; 2026-02-21T09:22:04.6976112Z add.s32 %r94, %r138, %r137; 2026-02-21T09:22:04.6976282Z mov.b32 %r95, 8; 2026-02-21T09:22:04.6976575Z // begin inline asm 2026-02-21T09:22:04.6976826Z cp.async.ca.shared.global [ %r94 + 0 ], [ %rd10 + 0 ], 0x8, %r95; 2026-02-21T09:22:04.6977112Z // end inline asm 2026-02-21T09:22:04.6977285Z cp.async.commit_group; 2026-02-21T09:22:04.6977610Z .loc 1 42 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:28 2026-02-21T09:22:04.6977974Z add.s64 %rd11, %rd10, 32; 2026-02-21T09:22:04.6978290Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:04.6978649Z add.s32 %r96, %r94, 40960; 2026-02-21T09:22:04.6978825Z // begin inline asm 2026-02-21T09:22:04.6979064Z cp.async.ca.shared.global [ %r96 + 0 ], [ %rd11 + 0 ], 0x8, %r95; 2026-02-21T09:22:04.6979336Z // end inline asm 2026-02-21T09:22:04.6979493Z cp.async.commit_group; 2026-02-21T09:22:04.6979803Z .loc 1 42 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:28 2026-02-21T09:22:04.6980151Z add.s64 %rd12, %rd10, 64; 2026-02-21T09:22:04.6980463Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:04.6980805Z bar.sync 0; 
2026-02-21T09:22:04.6981121Z add.s32 %r98, %r94, 8192; 2026-02-21T09:22:04.6981291Z // begin inline asm 2026-02-21T09:22:04.6981516Z cp.async.ca.shared.global [ %r98 + 0 ], [ %rd12 + 0 ], 0x8, %r95; 2026-02-21T09:22:04.6981786Z // end inline asm 2026-02-21T09:22:04.6981940Z cp.async.commit_group; 2026-02-21T09:22:04.6982270Z .loc 1 42 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:28 2026-02-21T09:22:04.6982621Z add.s64 %rd13, %rd10, 96; 2026-02-21T09:22:04.6982932Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:04.6983276Z add.s32 %r100, %r94, 49152; 2026-02-21T09:22:04.6983458Z // begin inline asm 2026-02-21T09:22:04.6983689Z cp.async.ca.shared.global [ %r100 + 0 ], [ %rd13 + 0 ], 0x8, %r95; 2026-02-21T09:22:04.6984043Z // end inline asm 2026-02-21T09:22:04.6984208Z cp.async.commit_group; 2026-02-21T09:22:04.6984512Z .loc 1 42 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:28 2026-02-21T09:22:04.6984871Z add.s64 %rd14, %rd10, 128; 2026-02-21T09:22:04.6985183Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:04.6985531Z bar.sync 0; 2026-02-21T09:22:04.6985752Z add.s32 %r102, %r94, 16384; 2026-02-21T09:22:04.6985942Z // begin inline asm 2026-02-21T09:22:04.6986174Z cp.async.ca.shared.global [ %r102 + 0 ], [ %rd14 + 0 ], 0x8, %r95; 2026-02-21T09:22:04.6986438Z // end inline asm 2026-02-21T09:22:04.6986737Z cp.async.commit_group; 2026-02-21T09:22:04.6987043Z .loc 1 42 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:28 2026-02-21T09:22:04.6987400Z add.s64 %rd15, %rd10, 160; 2026-02-21T09:22:04.6987720Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:04.6988071Z add.s32 %r104, %r94, 57344; 2026-02-21T09:22:04.6988321Z // begin inline asm 2026-02-21T09:22:04.6988553Z cp.async.ca.shared.global [ %r104 + 0 ], [ %rd15 + 0 ], 0x8, %r95; 2026-02-21T09:22:04.6988820Z // end inline asm 
2026-02-21T09:22:04.6988974Z cp.async.commit_group; 2026-02-21T09:22:04.6989285Z .loc 1 42 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:28 2026-02-21T09:22:04.6989635Z add.s64 %rd16, %rd10, 192; 2026-02-21T09:22:04.6989956Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:04.6990295Z bar.sync 0; 2026-02-21T09:22:04.6990444Z add.s32 %r106, %r94, 24576; 2026-02-21T09:22:04.6990622Z // begin inline asm 2026-02-21T09:22:04.6990844Z cp.async.ca.shared.global [ %r106 + 0 ], [ %rd16 + 0 ], 0x8, %r95; 2026-02-21T09:22:04.6991127Z // end inline asm 2026-02-21T09:22:04.6991284Z cp.async.commit_group; 2026-02-21T09:22:04.6991592Z .loc 1 42 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:28 2026-02-21T09:22:04.6991941Z add.s64 %rd17, %rd10, 224; 2026-02-21T09:22:04.6992259Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:04.6992632Z add.s32 %r108, %r94, 65536; 2026-02-21T09:22:04.6992805Z // begin inline asm 2026-02-21T09:22:04.6993038Z cp.async.ca.shared.global [ %r108 + 0 ], [ %rd17 + 0 ], 0x8, %r95; 2026-02-21T09:22:04.6993301Z // end inline asm 2026-02-21T09:22:04.6993461Z cp.async.commit_group; 2026-02-21T09:22:04.6993762Z .loc 1 42 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:28 2026-02-21T09:22:04.6994113Z add.s64 %rd18, %rd10, 256; 2026-02-21T09:22:04.6994426Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:04.6994767Z bar.sync 0; 2026-02-21T09:22:04.6994917Z add.s32 %r110, %r94, 32768; 2026-02-21T09:22:04.6995088Z // begin inline asm 2026-02-21T09:22:04.6995330Z cp.async.ca.shared.global [ %r110 + 0 ], [ %rd18 + 0 ], 0x8, %r95; 2026-02-21T09:22:04.6995742Z // end inline asm 2026-02-21T09:22:04.6995900Z cp.async.commit_group; 2026-02-21T09:22:04.6996199Z .loc 1 42 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:28 2026-02-21T09:22:04.6996692Z add.s64 %rd19, 
%rd10, 288; 2026-02-21T09:22:04.6997007Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:04.6997354Z add.s32 %r112, %r94, 73728; 2026-02-21T09:22:04.6997528Z // begin inline asm 2026-02-21T09:22:04.6997746Z cp.async.ca.shared.global [ %r112 + 0 ], [ %rd19 + 0 ], 0x8, %r95; 2026-02-21T09:22:04.6998017Z // end inline asm 2026-02-21T09:22:04.6998167Z cp.async.commit_group; 2026-02-21T09:22:04.6998334Z shl.b32 %r139, %r2, 4; 2026-02-21T09:22:04.6998620Z and.b32 %r140, %r139, 7680; 2026-02-21T09:22:04.6998807Z and.b32 %r141, %r7, 96; 2026-02-21T09:22:04.6998979Z shl.b32 %r142, %r5, 1; 2026-02-21T09:22:04.6999139Z or.b32 %r143, %r140, %r141; 2026-02-21T09:22:04.6999321Z or.b32 %r144, %r143, %r142; 2026-02-21T09:22:04.6999490Z or.b32 %r10, %r144, %r136; 2026-02-21T09:22:04.6999663Z xor.b32 %r11, %r10, 8; 2026-02-21T09:22:04.6999822Z shl.b32 %r12, %r2, 2; 2026-02-21T09:22:04.6999983Z and.b32 %r145, %r12, 124; 2026-02-21T09:22:04.7000231Z and.b32 %r146, %r2, 384; 2026-02-21T09:22:04.7000411Z shr.u32 %r13, %r2, 4; 2026-02-21T09:22:04.7000576Z and.b32 %r147, %r13, 2; 2026-02-21T09:22:04.7000740Z and.b32 %r148, %r7, 512; 2026-02-21T09:22:04.7000915Z setp.gt.u32 %p1, %r2, 511; 2026-02-21T09:22:04.7001091Z selp.b32 %r149, 1, 0, %p1; 2026-02-21T09:22:04.7001275Z add.s32 %r691, %r138, 81920; 2026-02-21T09:22:04.7001453Z add.s32 %r151, %r691, %r146; 2026-02-21T09:22:04.7001629Z add.s32 %r152, %r151, %r149; 2026-02-21T09:22:04.7001803Z add.s32 %r153, %r152, %r148; 2026-02-21T09:22:04.7001981Z add.s32 %r154, %r153, %r147; 2026-02-21T09:22:04.7002160Z add.s32 %r14, %r154, %r145; 2026-02-21T09:22:04.7002338Z shr.u32 %r155, %r2, 1; 2026-02-21T09:22:04.7002511Z and.b32 %r156, %r155, 384; 2026-02-21T09:22:04.7002679Z add.s32 %r157, %r691, %r147; 2026-02-21T09:22:04.7002855Z add.s32 %r158, %r157, %r156; 2026-02-21T09:22:04.7003029Z add.s32 %r159, %r158, %r145; 2026-02-21T09:22:04.7003207Z add.s32 %r15, %r159, %r148; 
2026-02-21T09:22:04.7003379Z shl.b32 %r160, %r128, 6; 2026-02-21T09:22:04.7003564Z and.b32 %r161, %r7, 48; 2026-02-21T09:22:04.7003733Z and.b32 %r162, %r3, 28; 2026-02-21T09:22:04.7003904Z xor.b32 %r163, %r161, %r162; 2026-02-21T09:22:04.7004074Z or.b32 %r164, %r163, %r160; 2026-02-21T09:22:04.7004252Z add.s32 %r16, %r691, %r164; 2026-02-21T09:22:04.7004432Z xor.b32 %r165, %r164, 32; 2026-02-21T09:22:04.7004599Z add.s32 %r17, %r691, %r165; 2026-02-21T09:22:04.7004776Z shl.b32 %r166, %r3, 7; 2026-02-21T09:22:04.7004936Z shl.b32 %r167, %r127, 4; 2026-02-21T09:22:04.7005106Z or.b32 %r168, %r166, %r167; 2026-02-21T09:22:04.7005274Z add.s32 %r169, %r138, 90112; 2026-02-21T09:22:04.7005466Z add.s32 %r695, %r169, %r168; 2026-02-21T09:22:04.7005642Z and.b32 %r170, %r139, 112; 2026-02-21T09:22:04.7005817Z shl.b32 %r171, %r127, 3; 2026-02-21T09:22:04.7005979Z or.b32 %r172, %r166, %r171; 2026-02-21T09:22:04.7006153Z and.b32 %r173, %r172, 1920; 2026-02-21T09:22:04.7006328Z shl.b32 %r174, %r2, 8; 2026-02-21T09:22:04.7006612Z and.b32 %r175, %r174, 2048; 2026-02-21T09:22:04.7006816Z add.s32 %r176, %r169, %r170; 2026-02-21T09:22:04.7006995Z add.s32 %r177, %r176, %r175; 2026-02-21T09:22:04.7007170Z add.s32 %r194, %r177, %r173; 2026-02-21T09:22:04.7007340Z bfe.u32 %r178, %r691, 4, 14; 2026-02-21T09:22:04.7007523Z cvt.u64.u32 %rd21, %r178; 2026-02-21T09:22:04.7007718Z or.b64 %rd28, %rd21, -9223371899382267904; 2026-02-21T09:22:04.7007934Z add.s32 %r179, %r138, 81952; 2026-02-21T09:22:04.7008116Z bfe.u32 %r180, %r179, 4, 14; 2026-02-21T09:22:04.7008286Z cvt.u64.u32 %rd22, %r180; 2026-02-21T09:22:04.7008478Z or.b64 %rd29, %rd22, -9223371899382267904; 2026-02-21T09:22:04.7008773Z add.s32 %r181, %r138, 86016; 2026-02-21T09:22:04.7009030Z bfe.u32 %r182, %r181, 4, 14; 2026-02-21T09:22:04.7009204Z cvt.u64.u32 %rd23, %r182; 2026-02-21T09:22:04.7009390Z or.b64 %rd30, %rd23, -9223371899382267904; 2026-02-21T09:22:04.7009592Z add.s32 %r183, %r138, 86048; 2026-02-21T09:22:04.7009770Z bfe.u32 
%r184, %r183, 4, 14; 2026-02-21T09:22:04.7009941Z cvt.u64.u32 %rd24, %r184; 2026-02-21T09:22:04.7010125Z or.b64 %rd31, %rd24, -9223371899382267904; 2026-02-21T09:22:04.7010481Z .loc 1 34 74 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:34:74 2026-02-21T09:22:04.7010834Z shl.b32 %r185, %r123, 18; 2026-02-21T09:22:04.7011004Z shl.b32 %r186, %r129, 10; 2026-02-21T09:22:04.7011170Z or.b32 %r187, %r185, %r186; 2026-02-21T09:22:04.7011349Z or.b32 %r188, %r187, %r131; 2026-02-21T09:22:04.7011602Z or.b32 %r1077, %r188, 176; 2026-02-21T09:22:04.7011788Z shl.b32 %r189, %r2, 6; 2026-02-21T09:22:04.7011954Z and.b32 %r190, %r189, 57344; 2026-02-21T09:22:04.7012138Z add.s32 %r191, %r190, %r1; 2026-02-21T09:22:04.7012312Z or.b32 %r1076, %r191, %r128; 2026-02-21T09:22:04.7012491Z mov.b32 %r826, 0f00000000; 2026-02-21T09:22:04.7012663Z mov.b32 %r1079, 4; 2026-02-21T09:22:04.7012819Z mov.b32 %r1078, -1; 2026-02-21T09:22:04.7013072Z mov.b64 %rd47, -16; 2026-02-21T09:22:04.7013244Z setp.eq.b32 %p8, %r6, 0; 2026-02-21T09:22:04.7013422Z mov.b32 %r827, %r826; 2026-02-21T09:22:04.7013583Z mov.b32 %r828, %r826; 2026-02-21T09:22:04.7013748Z mov.b32 %r829, %r826; 2026-02-21T09:22:04.7013906Z mov.b32 %r830, %r826; 2026-02-21T09:22:04.7014067Z mov.b32 %r831, %r826; 2026-02-21T09:22:04.7014229Z mov.b32 %r832, %r826; 2026-02-21T09:22:04.7014388Z mov.b32 %r833, %r826; 2026-02-21T09:22:04.7014551Z mov.b32 %r834, %r826; 2026-02-21T09:22:04.7014707Z mov.b32 %r835, %r826; 2026-02-21T09:22:04.7014868Z mov.b32 %r836, %r826; 2026-02-21T09:22:04.7015021Z mov.b32 %r837, %r826; 2026-02-21T09:22:04.7015183Z mov.b32 %r838, %r826; 2026-02-21T09:22:04.7015340Z mov.b32 %r839, %r826; 2026-02-21T09:22:04.7015510Z mov.b32 %r840, %r826; 2026-02-21T09:22:04.7015662Z mov.b32 %r841, %r826; 2026-02-21T09:22:04.7015821Z mov.b32 %r842, %r826; 2026-02-21T09:22:04.7015978Z mov.b32 %r843, %r826; 2026-02-21T09:22:04.7016132Z mov.b32 %r844, %r826; 2026-02-21T09:22:04.7016290Z mov.b32 %r845, %r826; 
2026-02-21T09:22:04.7016441Z mov.b32 %r846, %r826; 2026-02-21T09:22:04.7016745Z mov.b32 %r847, %r826; 2026-02-21T09:22:04.7016896Z mov.b32 %r848, %r826; 2026-02-21T09:22:04.7017055Z mov.b32 %r849, %r826; 2026-02-21T09:22:04.7017207Z mov.b32 %r850, %r826; 2026-02-21T09:22:04.7017364Z mov.b32 %r851, %r826; 2026-02-21T09:22:04.7017519Z mov.b32 %r852, %r826; 2026-02-21T09:22:04.7017678Z mov.b32 %r853, %r826; 2026-02-21T09:22:04.7017834Z mov.b32 %r854, %r826; 2026-02-21T09:22:04.7017997Z mov.b32 %r855, %r826; 2026-02-21T09:22:04.7018156Z mov.b32 %r856, %r826; 2026-02-21T09:22:04.7018310Z mov.b32 %r857, %r826; 2026-02-21T09:22:04.7018528Z $L__BB0_1: // =>This Inner Loop Header: Depth=1 2026-02-21T09:22:04.7018796Z add.s64 %rd47, %rd47, 16; 2026-02-21T09:22:04.7018976Z setp.lt.u64 %p9, %rd47, 432; 2026-02-21T09:22:04.7019155Z add.s32 %r968, %r1078, 1; 2026-02-21T09:22:04.7019336Z setp.gt.s32 %p10, %r968, 4; 2026-02-21T09:22:04.7019523Z selp.b32 %r1078, 0, %r968, %p10; 2026-02-21T09:22:04.7019870Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:04.7020237Z cp.async.wait_group 8; 2026-02-21T09:22:04.7020404Z bar.sync 0; 2026-02-21T09:22:04.7020556Z shl.b32 %r969, %r1078, 13; 2026-02-21T09:22:04.7020743Z add.s32 %r971, %r138, %r969; 2026-02-21T09:22:04.7021067Z .loc 1 46 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:46:28 2026-02-21T09:22:04.7021416Z add.s32 %r972, %r971, %r10; 2026-02-21T09:22:04.7021602Z ld.shared.b16 %rs3, [%r972]; 2026-02-21T09:22:04.7021787Z ld.shared.b16 %rs4, [%r972+256]; 2026-02-21T09:22:04.7022144Z ld.shared.b16 %rs5, [%r972+16]; 2026-02-21T09:22:04.7022349Z ld.shared.b16 %rs6, [%r972+272]; 2026-02-21T09:22:04.7022549Z add.s32 %r973, %r971, %r11; 2026-02-21T09:22:04.7022735Z ld.shared.b16 %rs7, [%r973]; 2026-02-21T09:22:04.7022931Z ld.shared.b16 %rs8, [%r973+256]; 2026-02-21T09:22:04.7023141Z ld.shared.b16 %rs9, [%r973+16]; 2026-02-21T09:22:04.7023337Z ld.shared.b16 %rs10, [%r973+272]; 
2026-02-21T09:22:04.7023548Z cvt.f32.bf16 %r488, %rs3; 2026-02-21T09:22:04.7023723Z cvt.f32.bf16 %r489, %rs4; 2026-02-21T09:22:04.7023902Z cvt.f32.bf16 %r490, %rs7; 2026-02-21T09:22:04.7024076Z cvt.f32.bf16 %r491, %rs8; 2026-02-21T09:22:04.7024244Z cvt.f32.bf16 %r556, %rs5; 2026-02-21T09:22:04.7024420Z cvt.f32.bf16 %r557, %rs6; 2026-02-21T09:22:04.7024664Z cvt.f32.bf16 %r558, %rs9; 2026-02-21T09:22:04.7024845Z cvt.f32.bf16 %r559, %rs10; 2026-02-21T09:22:04.7025184Z .loc 1 48 30 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:48:30 2026-02-21T09:22:04.7025555Z cvt.s64.s32 %rd39, %r1076; 2026-02-21T09:22:04.7025734Z add.s64 %rd26, %rd8, %rd39; 2026-02-21T09:22:04.7026057Z .loc 1 48 83 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:48:83 2026-02-21T09:22:04.7026606Z // begin inline asm 2026-02-21T09:22:04.7026795Z mov.u64 %rd25, 0x0; 2026-02-21T09:22:04.7027072Z createpolicy.fractional.L2::evict_first.b64 %rd25, 1.0; 2026-02-21T09:22:04.7027337Z // end inline asm 2026-02-21T09:22:04.7027496Z // begin inline asm 2026-02-21T09:22:04.7027661Z mov.u16 %rs1, 0x0; 2026-02-21T09:22:04.7027917Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs1 }, [ %rd26 + 0 ], %rd25; 2026-02-21T09:22:04.7028270Z // end inline asm 2026-02-21T09:22:04.7028598Z .loc 1 56 24 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:56:24 2026-02-21T09:22:04.7028972Z st.shared.b8 [%r14], %rs1; 2026-02-21T09:22:04.7029147Z bar.sync 0; 2026-02-21T09:22:04.7029319Z ld.shared.v2.b8 {%rs11, %rs12}, [%r15]; 2026-02-21T09:22:04.7029684Z .loc 1 51 24 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:51:24 2026-02-21T09:22:04.7030038Z shl.b16 %rs13, %rs11, 4; 2026-02-21T09:22:04.7030212Z shl.b16 %rs14, %rs12, 4; 2026-02-21T09:22:04.7030526Z .loc 1 66 54 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:66:54 2026-02-21T09:22:04.7030891Z selp.b16 %rs15, %rs13, %rs11, %p8; 2026-02-21T09:22:04.7031089Z cvt.s16.s8 %rs16, %rs15; 2026-02-21T09:22:04.7031264Z shr.s16 
%rs17, %rs16, 4; 2026-02-21T09:22:04.7031438Z selp.b16 %rs18, %rs14, %rs12, %p8; 2026-02-21T09:22:04.7031634Z cvt.s16.s8 %rs19, %rs18; 2026-02-21T09:22:04.7031799Z shr.s16 %rs20, %rs19, 4; 2026-02-21T09:22:04.7032109Z .loc 1 71 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:71:28 2026-02-21T09:22:04.7032468Z cvt.rn.f32.s16 %r974, %rs17; 2026-02-21T09:22:04.7032657Z cvt.rn.f32.s16 %r975, %rs20; 2026-02-21T09:22:04.7032838Z bar.sync 0; 2026-02-21T09:22:04.7032988Z st.shared.b32 [%r16], %r974; 2026-02-21T09:22:04.7033172Z st.shared.b32 [%r17], %r975; 2026-02-21T09:22:04.7033431Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r826}; 2026-02-21T09:22:04.7033706Z bar.sync 0; 2026-02-21T09:22:04.7033849Z // begin inline asm 2026-02-21T09:22:04.7034098Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r356, %r492}, [%r194]; 2026-02-21T09:22:04.7034377Z // end inline asm 2026-02-21T09:22:04.7034537Z bar.sync 0; 2026-02-21T09:22:04.7034746Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r828}; 2026-02-21T09:22:04.7035013Z bar.sync 0; 2026-02-21T09:22:04.7035160Z // begin inline asm 2026-02-21T09:22:04.7035397Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r358, %r494}, [%r194]; 2026-02-21T09:22:04.7035696Z // end inline asm 2026-02-21T09:22:04.7035848Z bar.sync 0; 2026-02-21T09:22:04.7036059Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r827}; 2026-02-21T09:22:04.7036317Z bar.sync 0; 2026-02-21T09:22:04.7036739Z // begin inline asm 2026-02-21T09:22:04.7036978Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r357, %r493}, [%r194]; 2026-02-21T09:22:04.7037277Z // end inline asm 2026-02-21T09:22:04.7037429Z bar.sync 0; 2026-02-21T09:22:04.7037637Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r829}; 2026-02-21T09:22:04.7037899Z bar.sync 0; 2026-02-21T09:22:04.7038039Z // begin inline asm 2026-02-21T09:22:04.7038275Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r359, %r495}, [%r194]; 2026-02-21T09:22:04.7038551Z // end inline asm 
2026-02-21T09:22:04.7038700Z bar.sync 0; 2026-02-21T09:22:04.7038903Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r830}; 2026-02-21T09:22:04.7039163Z bar.sync 0; 2026-02-21T09:22:04.7039300Z // begin inline asm 2026-02-21T09:22:04.7039623Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r360, %r496}, [%r194]; 2026-02-21T09:22:04.7039911Z // end inline asm 2026-02-21T09:22:04.7040056Z bar.sync 0; 2026-02-21T09:22:04.7040263Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r832}; 2026-02-21T09:22:04.7040518Z bar.sync 0; 2026-02-21T09:22:04.7040661Z // begin inline asm 2026-02-21T09:22:04.7040894Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r362, %r498}, [%r194]; 2026-02-21T09:22:04.7041173Z // end inline asm 2026-02-21T09:22:04.7041397Z bar.sync 0; 2026-02-21T09:22:04.7041616Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r831}; 2026-02-21T09:22:04.7041874Z bar.sync 0; 2026-02-21T09:22:04.7042017Z // begin inline asm 2026-02-21T09:22:04.7042255Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r361, %r497}, [%r194]; 2026-02-21T09:22:04.7042535Z // end inline asm 2026-02-21T09:22:04.7042688Z bar.sync 0; 2026-02-21T09:22:04.7042897Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r833}; 2026-02-21T09:22:04.7043163Z bar.sync 0; 2026-02-21T09:22:04.7043304Z // begin inline asm 2026-02-21T09:22:04.7043564Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r363, %r499}, [%r194]; 2026-02-21T09:22:04.7043844Z // end inline asm 2026-02-21T09:22:04.7043997Z bar.sync 0; 2026-02-21T09:22:04.7044209Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r834}; 2026-02-21T09:22:04.7044469Z bar.sync 0; 2026-02-21T09:22:04.7044617Z // begin inline asm 2026-02-21T09:22:04.7044855Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r364, %r500}, [%r194]; 2026-02-21T09:22:04.7045137Z // end inline asm 2026-02-21T09:22:04.7045278Z bar.sync 0; 2026-02-21T09:22:04.7045486Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r836}; 2026-02-21T09:22:04.7045743Z bar.sync 0; 
2026-02-21T09:22:04.7045892Z // begin inline asm 2026-02-21T09:22:04.7046137Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r366, %r502}, [%r194]; 2026-02-21T09:22:04.7046423Z // end inline asm 2026-02-21T09:22:04.7046713Z bar.sync 0; 2026-02-21T09:22:04.7046921Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r835}; 2026-02-21T09:22:04.7047183Z bar.sync 0; 2026-02-21T09:22:04.7047324Z // begin inline asm 2026-02-21T09:22:04.7047563Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r365, %r501}, [%r194]; 2026-02-21T09:22:04.7047842Z // end inline asm 2026-02-21T09:22:04.7047993Z bar.sync 0; 2026-02-21T09:22:04.7048197Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r837}; 2026-02-21T09:22:04.7048460Z bar.sync 0; 2026-02-21T09:22:04.7048611Z // begin inline asm 2026-02-21T09:22:04.7048858Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r367, %r503}, [%r194]; 2026-02-21T09:22:04.7049141Z // end inline asm 2026-02-21T09:22:04.7049281Z bar.sync 0; 2026-02-21T09:22:04.7049489Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r838}; 2026-02-21T09:22:04.7049746Z bar.sync 0; 2026-02-21T09:22:04.7049891Z // begin inline asm 2026-02-21T09:22:04.7050125Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r368, %r504}, [%r194]; 2026-02-21T09:22:04.7050408Z // end inline asm 2026-02-21T09:22:04.7050549Z bar.sync 0; 2026-02-21T09:22:04.7050758Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r840}; 2026-02-21T09:22:04.7051015Z bar.sync 0; 2026-02-21T09:22:04.7051156Z // begin inline asm 2026-02-21T09:22:04.7051588Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r370, %r506}, [%r194]; 2026-02-21T09:22:04.7051865Z // end inline asm 2026-02-21T09:22:04.7052017Z bar.sync 0; 2026-02-21T09:22:04.7052226Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r839}; 2026-02-21T09:22:04.7052489Z bar.sync 0; 2026-02-21T09:22:04.7052644Z // begin inline asm 2026-02-21T09:22:04.7052885Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r369, %r505}, [%r194]; 2026-02-21T09:22:04.7053166Z // end inline asm 
2026-02-21T09:22:04.7053307Z bar.sync 0; 2026-02-21T09:22:04.7053516Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r841}; 2026-02-21T09:22:04.7053772Z bar.sync 0; 2026-02-21T09:22:04.7053920Z // begin inline asm 2026-02-21T09:22:04.7054241Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r371, %r507}, [%r194]; 2026-02-21T09:22:04.7054527Z // end inline asm 2026-02-21T09:22:04.7054672Z bar.sync 0; 2026-02-21T09:22:04.7054882Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r842}; 2026-02-21T09:22:04.7055145Z bar.sync 0; 2026-02-21T09:22:04.7055283Z // begin inline asm 2026-02-21T09:22:04.7055525Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r372, %r508}, [%r194]; 2026-02-21T09:22:04.7055798Z // end inline asm 2026-02-21T09:22:04.7055947Z bar.sync 0; 2026-02-21T09:22:04.7056229Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r844}; 2026-02-21T09:22:04.7056613Z bar.sync 0; 2026-02-21T09:22:04.7056753Z // begin inline asm 2026-02-21T09:22:04.7056992Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r374, %r510}, [%r194]; 2026-02-21T09:22:04.7057267Z // end inline asm 2026-02-21T09:22:04.7057430Z bar.sync 0; 2026-02-21T09:22:04.7057643Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r843}; 2026-02-21T09:22:04.7057901Z bar.sync 0; 2026-02-21T09:22:04.7058051Z // begin inline asm 2026-02-21T09:22:04.7058287Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r373, %r509}, [%r194]; 2026-02-21T09:22:04.7058572Z // end inline asm 2026-02-21T09:22:04.7058718Z bar.sync 0; 2026-02-21T09:22:04.7058935Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r845}; 2026-02-21T09:22:04.7059191Z bar.sync 0; 2026-02-21T09:22:04.7059336Z // begin inline asm 2026-02-21T09:22:04.7059566Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r375, %r511}, [%r194]; 2026-02-21T09:22:04.7059849Z // end inline asm 2026-02-21T09:22:04.7059998Z bar.sync 0; 2026-02-21T09:22:04.7060202Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r846}; 2026-02-21T09:22:04.7060460Z bar.sync 0; 
2026-02-21T09:22:04.7060595Z // begin inline asm 2026-02-21T09:22:04.7060833Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r376, %r512}, [%r194]; 2026-02-21T09:22:04.7061109Z // end inline asm 2026-02-21T09:22:04.7061258Z bar.sync 0; 2026-02-21T09:22:04.7061460Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r848}; 2026-02-21T09:22:04.7061718Z bar.sync 0; 2026-02-21T09:22:04.7061863Z // begin inline asm 2026-02-21T09:22:04.7062092Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r378, %r514}, [%r194]; 2026-02-21T09:22:04.7062394Z // end inline asm 2026-02-21T09:22:04.7062539Z bar.sync 0; 2026-02-21T09:22:04.7062752Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r847}; 2026-02-21T09:22:04.7063006Z bar.sync 0; 2026-02-21T09:22:04.7063150Z // begin inline asm 2026-02-21T09:22:04.7063384Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r377, %r513}, [%r194]; 2026-02-21T09:22:04.7063663Z // end inline asm 2026-02-21T09:22:04.7063808Z bar.sync 0; 2026-02-21T09:22:04.7064019Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r849}; 2026-02-21T09:22:04.7064279Z bar.sync 0; 2026-02-21T09:22:04.7064421Z // begin inline asm 2026-02-21T09:22:04.7064659Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r379, %r515}, [%r194]; 2026-02-21T09:22:04.7064932Z // end inline asm 2026-02-21T09:22:04.7065079Z bar.sync 0; 2026-02-21T09:22:04.7065289Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r850}; 2026-02-21T09:22:04.7065554Z bar.sync 0; 2026-02-21T09:22:04.7065693Z // begin inline asm 2026-02-21T09:22:04.7066021Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r380, %r516}, [%r194]; 2026-02-21T09:22:04.7066382Z // end inline asm 2026-02-21T09:22:04.7066660Z bar.sync 0; 2026-02-21T09:22:04.7066880Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r852}; 2026-02-21T09:22:04.7067138Z bar.sync 0; 2026-02-21T09:22:04.7067290Z // begin inline asm 2026-02-21T09:22:04.7067524Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r382, %r518}, [%r194]; 2026-02-21T09:22:04.7067803Z // end inline asm 
2026-02-21T09:22:04.7067950Z bar.sync 0; 2026-02-21T09:22:04.7068160Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r851}; 2026-02-21T09:22:04.7068515Z bar.sync 0; 2026-02-21T09:22:04.7068665Z // begin inline asm 2026-02-21T09:22:04.7068906Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r381, %r517}, [%r194]; 2026-02-21T09:22:04.7069269Z // end inline asm 2026-02-21T09:22:04.7069436Z bar.sync 0; 2026-02-21T09:22:04.7069638Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r853}; 2026-02-21T09:22:04.7069901Z bar.sync 0; 2026-02-21T09:22:04.7070044Z // begin inline asm 2026-02-21T09:22:04.7070287Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r383, %r519}, [%r194]; 2026-02-21T09:22:04.7070563Z // end inline asm 2026-02-21T09:22:04.7070713Z bar.sync 0; 2026-02-21T09:22:04.7070993Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r854}; 2026-02-21T09:22:04.7071264Z bar.sync 0; 2026-02-21T09:22:04.7071407Z // begin inline asm 2026-02-21T09:22:04.7071637Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r384, %r520}, [%r194]; 2026-02-21T09:22:04.7071916Z // end inline asm 2026-02-21T09:22:04.7072058Z bar.sync 0; 2026-02-21T09:22:04.7072265Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r856}; 2026-02-21T09:22:04.7072520Z bar.sync 0; 2026-02-21T09:22:04.7072660Z // begin inline asm 2026-02-21T09:22:04.7072891Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r386, %r522}, [%r194]; 2026-02-21T09:22:04.7073170Z // end inline asm 2026-02-21T09:22:04.7073314Z bar.sync 0; 2026-02-21T09:22:04.7073515Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r855}; 2026-02-21T09:22:04.7073778Z bar.sync 0; 2026-02-21T09:22:04.7073915Z // begin inline asm 2026-02-21T09:22:04.7074152Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r385, %r521}, [%r194]; 2026-02-21T09:22:04.7074425Z // end inline asm 2026-02-21T09:22:04.7074574Z bar.sync 0; 2026-02-21T09:22:04.7074778Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r695], {%r857}; 2026-02-21T09:22:04.7075037Z bar.sync 0; 
2026-02-21T09:22:04.7075182Z // begin inline asm 2026-02-21T09:22:04.7075417Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r387, %r523}, [%r194]; 2026-02-21T09:22:04.7075697Z // end inline asm 2026-02-21T09:22:04.7075852Z $L__tmp1: 2026-02-21T09:22:04.7076223Z .loc 2 291 36 // standard.py:291:36 @[ cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:78:36 ] 2026-02-21T09:22:04.7076794Z // begin inline asm 2026-02-21T09:22:04.7076976Z fence.proxy.async.shared::cta; 2026-02-21T09:22:04.7077167Z // end inline asm 2026-02-21T09:22:04.7077345Z shfl.sync.idx.b32 %r976, %r3, 0, 31, -1; 2026-02-21T09:22:04.7077573Z wgmma.fence.sync.aligned; 2026-02-21T09:22:04.7077757Z mov.pred %p2, -1; 2026-02-21T09:22:04.7077919Z // begin inline asm 2026-02-21T09:22:04.7078676Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r356,%r357,%r358,%r359,%r360,%r361,%r362,%r363,%r364,%r365,%r366,%r367,%r368,%r369,%r370,%r371,%r372,%r373,%r374,%r375,%r376,%r377,%r378,%r379,%r380,%r381,%r382,%r383,%r384,%r385,%r386,%r387}, {%r488,%r489,%r490,%r491}, %rd28, %p2, 1, 1; 2026-02-21T09:22:04.7079486Z // end inline asm 2026-02-21T09:22:04.7079631Z // begin inline asm 2026-02-21T09:22:04.7080372Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r356,%r357,%r358,%r359,%r360,%r361,%r362,%r363,%r364,%r365,%r366,%r367,%r368,%r369,%r370,%r371,%r372,%r373,%r374,%r375,%r376,%r377,%r378,%r379,%r380,%r381,%r382,%r383,%r384,%r385,%r386,%r387}, {%r556,%r557,%r558,%r559}, %rd29, %p2, 1, 1; 2026-02-21T09:22:04.7081163Z // end inline asm 2026-02-21T09:22:04.7090031Z // begin inline asm 2026-02-21T09:22:04.7091082Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r492,%r493,%r494,%r495,%r496,%r497,%r498,%r499,%r500,%r501,%r502,%r503,%r504,%r505,%r506,%r507,%r508,%r509,%r510,%r511,%r512,%r513,%r514,%r515,%r516,%r517,%r518,%r519,%r520,%r521,%r522,%r523}, {%r488,%r489,%r490,%r491}, %rd30, %p2, 1, 1; 2026-02-21T09:22:04.7091919Z // end inline asm 2026-02-21T09:22:04.7092095Z // begin inline asm 
2026-02-21T09:22:04.7092865Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r492,%r493,%r494,%r495,%r496,%r497,%r498,%r499,%r500,%r501,%r502,%r503,%r504,%r505,%r506,%r507,%r508,%r509,%r510,%r511,%r512,%r513,%r514,%r515,%r516,%r517,%r518,%r519,%r520,%r521,%r522,%r523}, {%r556,%r557,%r558,%r559}, %rd31, %p2, 1, 1; 2026-02-21T09:22:04.7093676Z // end inline asm 2026-02-21T09:22:04.7093952Z wgmma.commit_group.sync.aligned; 2026-02-21T09:22:04.7094178Z mov.b32 %r928, 0; 2026-02-21T09:22:04.7094340Z mov.b32 %r624, %r691; 2026-02-21T09:22:04.7094515Z mov.b32 %r625, %r928; 2026-02-21T09:22:04.7094687Z mov.b32 %r626, %r928; 2026-02-21T09:22:04.7094847Z // begin inline asm 2026-02-21T09:22:04.7095919Z // wait for regs: %r356,%r357,%r358,%r359,%r360,%r361,%r362,%r363,%r364,%r365,%r366,%r367,%r368,%r369,%r370,%r371,%r372,%r373,%r374,%r375,%r376,%r377,%r378,%r379,%r380,%r381,%r382,%r383,%r384,%r385,%r386,%r387,%r492,%r493,%r494,%r495,%r496,%r497,%r498,%r499,%r500,%r501,%r502,%r503,%r504,%r505,%r506,%r507,%r508,%r509,%r510,%r511,%r512,%r513,%r514,%r515,%r516,%r517,%r518,%r519,%r520,%r521,%r522,%r523,%r624,%r625,%r626 2026-02-21T09:22:04.7097125Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:22:04.7097331Z // end inline asm 2026-02-21T09:22:04.7097492Z $L__tmp2: 2026-02-21T09:22:04.7097817Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:04.7098198Z add.s32 %r977, %r971, 40960; 2026-02-21T09:22:04.7098537Z .loc 1 46 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:46:28 2026-02-21T09:22:04.7098917Z add.s32 %r978, %r977, %r10; 2026-02-21T09:22:04.7099115Z ld.shared.b16 %rs21, [%r978]; 2026-02-21T09:22:04.7099321Z ld.shared.b16 %rs22, [%r978+256]; 2026-02-21T09:22:04.7099527Z ld.shared.b16 %rs23, [%r978+16]; 2026-02-21T09:22:04.7099745Z ld.shared.b16 %rs24, [%r978+272]; 2026-02-21T09:22:04.7099952Z add.s32 %r979, %r977, %r11; 2026-02-21T09:22:04.7100140Z ld.shared.b16 %rs25, [%r979]; 2026-02-21T09:22:04.7100336Z 
ld.shared.b16 %rs26, [%r979+256]; 2026-02-21T09:22:04.7100530Z ld.shared.b16 %rs27, [%r979+16]; 2026-02-21T09:22:04.7100729Z ld.shared.b16 %rs28, [%r979+272]; 2026-02-21T09:22:04.7100922Z cvt.f32.bf16 %r822, %rs21; 2026-02-21T09:22:04.7101113Z cvt.f32.bf16 %r823, %rs22; 2026-02-21T09:22:04.7101289Z cvt.f32.bf16 %r824, %rs25; 2026-02-21T09:22:04.7101470Z cvt.f32.bf16 %r825, %rs26; 2026-02-21T09:22:04.7101646Z cvt.f32.bf16 %r890, %rs23; 2026-02-21T09:22:04.7101828Z cvt.f32.bf16 %r891, %rs24; 2026-02-21T09:22:04.7102005Z cvt.f32.bf16 %r892, %rs27; 2026-02-21T09:22:04.7102190Z cvt.f32.bf16 %r893, %rs28; 2026-02-21T09:22:04.7102535Z .loc 1 48 30 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:48:30 2026-02-21T09:22:04.7102897Z add.s32 %r980, %r1076, 65536; 2026-02-21T09:22:04.7103097Z cvt.s64.s32 %rd40, %r980; 2026-02-21T09:22:04.7103276Z add.s64 %rd33, %rd8, %rd40; 2026-02-21T09:22:04.7103608Z .loc 1 48 83 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:48:83 2026-02-21T09:22:04.7103969Z // begin inline asm 2026-02-21T09:22:04.7104133Z mov.u64 %rd32, 0x0; 2026-02-21T09:22:04.7104377Z createpolicy.fractional.L2::evict_first.b64 %rd32, 1.0; 2026-02-21T09:22:04.7104633Z // end inline asm 2026-02-21T09:22:04.7104795Z // begin inline asm 2026-02-21T09:22:04.7104954Z mov.u16 %rs2, 0x0; 2026-02-21T09:22:04.7105214Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs2 }, [ %rd33 + 0 ], %rd32; 2026-02-21T09:22:04.7105514Z // end inline asm 2026-02-21T09:22:04.7105965Z .loc 1 56 24 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:56:24 2026-02-21T09:22:04.7106326Z bar.sync 0; 2026-02-21T09:22:04.7106613Z st.shared.b8 [%r14], %rs2; 2026-02-21T09:22:04.7106818Z bar.sync 0; 2026-02-21T09:22:04.7106990Z ld.shared.v2.b8 {%rs29, %rs30}, [%r15]; 2026-02-21T09:22:04.7107357Z .loc 1 51 24 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:51:24 2026-02-21T09:22:04.7107714Z shl.b16 %rs31, %rs29, 4; 2026-02-21T09:22:04.7107900Z shl.b16 %rs32, %rs30, 4; 
2026-02-21T09:22:04.7108290Z .loc 1 66 54 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:66:54 2026-02-21T09:22:04.7108672Z selp.b16 %rs33, %rs31, %rs29, %p8; 2026-02-21T09:22:04.7108881Z cvt.s16.s8 %rs34, %rs33; 2026-02-21T09:22:04.7109132Z shr.s16 %rs35, %rs34, 4; 2026-02-21T09:22:04.7109333Z selp.b16 %rs36, %rs32, %rs30, %p8; 2026-02-21T09:22:04.7109531Z cvt.s16.s8 %rs37, %rs36; 2026-02-21T09:22:04.7109711Z shr.s16 %rs38, %rs37, 4; 2026-02-21T09:22:04.7110015Z .loc 1 71 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:71:28 2026-02-21T09:22:04.7110378Z cvt.rn.f32.s16 %r981, %rs35; 2026-02-21T09:22:04.7110562Z cvt.rn.f32.s16 %r982, %rs38; 2026-02-21T09:22:04.7110814Z bar.sync 0; 2026-02-21T09:22:04.7110974Z st.shared.b32 [%r16], %r981; 2026-02-21T09:22:04.7111156Z st.shared.b32 [%r17], %r982; 2026-02-21T09:22:04.7111346Z $L__tmp3: 2026-02-21T09:22:04.7111705Z .loc 2 291 36 // standard.py:291:36 @[ cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:78:36 ] 2026-02-21T09:22:04.7112217Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r356, %r492}; 2026-02-21T09:22:04.7112510Z bar.sync 0; 2026-02-21T09:22:04.7112664Z // begin inline asm 2026-02-21T09:22:04.7112895Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r826}, [%r695]; 2026-02-21T09:22:04.7113168Z // end inline asm 2026-02-21T09:22:04.7113324Z bar.sync 0; 2026-02-21T09:22:04.7113563Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r358, %r494}; 2026-02-21T09:22:04.7113856Z bar.sync 0; 2026-02-21T09:22:04.7114005Z // begin inline asm 2026-02-21T09:22:04.7114260Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r828}, [%r695]; 2026-02-21T09:22:04.7114529Z // end inline asm 2026-02-21T09:22:04.7114687Z bar.sync 0; 2026-02-21T09:22:04.7114912Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r357, %r493}; 2026-02-21T09:22:04.7115197Z bar.sync 0; 2026-02-21T09:22:04.7115349Z // begin inline asm 2026-02-21T09:22:04.7115573Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r827}, [%r695]; 
2026-02-21T09:22:04.7115843Z // end inline asm 2026-02-21T09:22:04.7115990Z bar.sync 0; 2026-02-21T09:22:04.7116224Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r359, %r495}; 2026-02-21T09:22:04.7116638Z bar.sync 0; 2026-02-21T09:22:04.7116789Z // begin inline asm 2026-02-21T09:22:04.7117012Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r829}, [%r695]; 2026-02-21T09:22:04.7117284Z // end inline asm 2026-02-21T09:22:04.7117430Z bar.sync 0; 2026-02-21T09:22:04.7117665Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r360, %r496}; 2026-02-21T09:22:04.7117947Z bar.sync 0; 2026-02-21T09:22:04.7118092Z // begin inline asm 2026-02-21T09:22:04.7118324Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r830}, [%r695]; 2026-02-21T09:22:04.7118586Z // end inline asm 2026-02-21T09:22:04.7118740Z bar.sync 0; 2026-02-21T09:22:04.7118963Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r362, %r498}; 2026-02-21T09:22:04.7119245Z bar.sync 0; 2026-02-21T09:22:04.7119388Z // begin inline asm 2026-02-21T09:22:04.7119618Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r832}, [%r695]; 2026-02-21T09:22:04.7119887Z // end inline asm 2026-02-21T09:22:04.7120037Z bar.sync 0; 2026-02-21T09:22:04.7120267Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r361, %r497}; 2026-02-21T09:22:04.7120540Z bar.sync 0; 2026-02-21T09:22:04.7120693Z // begin inline asm 2026-02-21T09:22:04.7121047Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r831}, [%r695]; 2026-02-21T09:22:04.7121378Z // end inline asm 2026-02-21T09:22:04.7121526Z bar.sync 0; 2026-02-21T09:22:04.7121757Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r363, %r499}; 2026-02-21T09:22:04.7122034Z bar.sync 0; 2026-02-21T09:22:04.7122194Z // begin inline asm 2026-02-21T09:22:04.7122420Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r833}, [%r695]; 2026-02-21T09:22:04.7122681Z // end inline asm 2026-02-21T09:22:04.7122834Z bar.sync 0; 2026-02-21T09:22:04.7123070Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r364, %r500}; 
2026-02-21T09:22:04.7123352Z bar.sync 0; 2026-02-21T09:22:04.7123495Z // begin inline asm 2026-02-21T09:22:04.7123725Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r834}, [%r695]; 2026-02-21T09:22:04.7124056Z // end inline asm 2026-02-21T09:22:04.7124216Z bar.sync 0; 2026-02-21T09:22:04.7124448Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r366, %r502}; 2026-02-21T09:22:04.7124727Z bar.sync 0; 2026-02-21T09:22:04.7124877Z // begin inline asm 2026-02-21T09:22:04.7125113Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r836}, [%r695]; 2026-02-21T09:22:04.7125381Z // end inline asm 2026-02-21T09:22:04.7125524Z bar.sync 0; 2026-02-21T09:22:04.7125822Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r365, %r501}; 2026-02-21T09:22:04.7126108Z bar.sync 0; 2026-02-21T09:22:04.7126256Z // begin inline asm 2026-02-21T09:22:04.7126584Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r835}, [%r695]; 2026-02-21T09:22:04.7126854Z // end inline asm 2026-02-21T09:22:04.7127006Z bar.sync 0; 2026-02-21T09:22:04.7127228Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r367, %r503}; 2026-02-21T09:22:04.7127509Z bar.sync 0; 2026-02-21T09:22:04.7127649Z // begin inline asm 2026-02-21T09:22:04.7127889Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r837}, [%r695]; 2026-02-21T09:22:04.7128152Z // end inline asm 2026-02-21T09:22:04.7128304Z bar.sync 0; 2026-02-21T09:22:04.7128526Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r368, %r504}; 2026-02-21T09:22:04.7128817Z bar.sync 0; 2026-02-21T09:22:04.7128968Z // begin inline asm 2026-02-21T09:22:04.7129192Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r838}, [%r695]; 2026-02-21T09:22:04.7129458Z // end inline asm 2026-02-21T09:22:04.7129608Z bar.sync 0; 2026-02-21T09:22:04.7129839Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r370, %r506}; 2026-02-21T09:22:04.7130128Z bar.sync 0; 2026-02-21T09:22:04.7130278Z // begin inline asm 2026-02-21T09:22:04.7130495Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r840}, [%r695]; 
2026-02-21T09:22:04.7130762Z // end inline asm 2026-02-21T09:22:04.7130912Z bar.sync 0; 2026-02-21T09:22:04.7131130Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r369, %r505}; 2026-02-21T09:22:04.7131410Z bar.sync 0; 2026-02-21T09:22:04.7131551Z // begin inline asm 2026-02-21T09:22:04.7131777Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r839}, [%r695]; 2026-02-21T09:22:04.7132036Z // end inline asm 2026-02-21T09:22:04.7132191Z bar.sync 0; 2026-02-21T09:22:04.7132409Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r371, %r507}; 2026-02-21T09:22:04.7132698Z bar.sync 0; 2026-02-21T09:22:04.7132844Z // begin inline asm 2026-02-21T09:22:04.7133062Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r841}, [%r695]; 2026-02-21T09:22:04.7133324Z // end inline asm 2026-02-21T09:22:04.7133469Z bar.sync 0; 2026-02-21T09:22:04.7133693Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r372, %r508}; 2026-02-21T09:22:04.7133967Z bar.sync 0; 2026-02-21T09:22:04.7134128Z // begin inline asm 2026-02-21T09:22:04.7134352Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r842}, [%r695]; 2026-02-21T09:22:04.7134619Z // end inline asm 2026-02-21T09:22:04.7134765Z bar.sync 0; 2026-02-21T09:22:04.7134997Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r374, %r510}; 2026-02-21T09:22:04.7135280Z bar.sync 0; 2026-02-21T09:22:04.7135419Z // begin inline asm 2026-02-21T09:22:04.7135644Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r844}, [%r695]; 2026-02-21T09:22:04.7136059Z // end inline asm 2026-02-21T09:22:04.7136209Z bar.sync 0; 2026-02-21T09:22:04.7136429Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r373, %r509}; 2026-02-21T09:22:04.7136839Z bar.sync 0; 2026-02-21T09:22:04.7136981Z // begin inline asm 2026-02-21T09:22:04.7137204Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r843}, [%r695]; 2026-02-21T09:22:04.7137471Z // end inline asm 2026-02-21T09:22:04.7137613Z bar.sync 0; 2026-02-21T09:22:04.7137836Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r375, %r511}; 
2026-02-21T09:22:04.7138108Z bar.sync 0; 2026-02-21T09:22:04.7138256Z // begin inline asm 2026-02-21T09:22:04.7138473Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r845}, [%r695]; 2026-02-21T09:22:04.7138737Z // end inline asm 2026-02-21T09:22:04.7138973Z bar.sync 0; 2026-02-21T09:22:04.7139212Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r376, %r512}; 2026-02-21T09:22:04.7139490Z bar.sync 0; 2026-02-21T09:22:04.7139635Z // begin inline asm 2026-02-21T09:22:04.7139868Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r846}, [%r695]; 2026-02-21T09:22:04.7140125Z // end inline asm 2026-02-21T09:22:04.7140279Z bar.sync 0; 2026-02-21T09:22:04.7140579Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r378, %r514}; 2026-02-21T09:22:04.7140865Z bar.sync 0; 2026-02-21T09:22:04.7141017Z // begin inline asm 2026-02-21T09:22:04.7141248Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r848}, [%r695]; 2026-02-21T09:22:04.7141511Z // end inline asm 2026-02-21T09:22:04.7141655Z bar.sync 0; 2026-02-21T09:22:04.7141879Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r377, %r513}; 2026-02-21T09:22:04.7142153Z bar.sync 0; 2026-02-21T09:22:04.7142308Z // begin inline asm 2026-02-21T09:22:04.7142534Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r847}, [%r695]; 2026-02-21T09:22:04.7142790Z // end inline asm 2026-02-21T09:22:04.7142942Z bar.sync 0; 2026-02-21T09:22:04.7143160Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r379, %r515}; 2026-02-21T09:22:04.7143442Z bar.sync 0; 2026-02-21T09:22:04.7143593Z // begin inline asm 2026-02-21T09:22:04.7143817Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r849}, [%r695]; 2026-02-21T09:22:04.7144076Z // end inline asm 2026-02-21T09:22:04.7144222Z bar.sync 0; 2026-02-21T09:22:04.7144447Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r380, %r516}; 2026-02-21T09:22:04.7144720Z bar.sync 0; 2026-02-21T09:22:04.7144862Z // begin inline asm 2026-02-21T09:22:04.7145077Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r850}, [%r695]; 
2026-02-21T09:22:04.7145338Z // end inline asm 2026-02-21T09:22:04.7145478Z bar.sync 0; 2026-02-21T09:22:04.7145700Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r382, %r518}; 2026-02-21T09:22:04.7145970Z bar.sync 0; 2026-02-21T09:22:04.7146115Z // begin inline asm 2026-02-21T09:22:04.7146329Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r852}, [%r695]; 2026-02-21T09:22:04.7146723Z // end inline asm 2026-02-21T09:22:04.7146873Z bar.sync 0; 2026-02-21T09:22:04.7147095Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r381, %r517}; 2026-02-21T09:22:04.7147373Z bar.sync 0; 2026-02-21T09:22:04.7147512Z // begin inline asm 2026-02-21T09:22:04.7147734Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r851}, [%r695]; 2026-02-21T09:22:04.7147994Z // end inline asm 2026-02-21T09:22:04.7148142Z bar.sync 0; 2026-02-21T09:22:04.7148428Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r383, %r519}; 2026-02-21T09:22:04.7148708Z bar.sync 0; 2026-02-21T09:22:04.7148853Z // begin inline asm 2026-02-21T09:22:04.7149073Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r853}, [%r695]; 2026-02-21T09:22:04.7149336Z // end inline asm 2026-02-21T09:22:04.7149477Z bar.sync 0; 2026-02-21T09:22:04.7149700Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r384, %r520}; 2026-02-21T09:22:04.7149977Z bar.sync 0; 2026-02-21T09:22:04.7150123Z // begin inline asm 2026-02-21T09:22:04.7150338Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r854}, [%r695]; 2026-02-21T09:22:04.7150759Z // end inline asm 2026-02-21T09:22:04.7150903Z bar.sync 0; 2026-02-21T09:22:04.7151128Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r386, %r522}; 2026-02-21T09:22:04.7151407Z bar.sync 0; 2026-02-21T09:22:04.7151560Z // begin inline asm 2026-02-21T09:22:04.7151797Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r856}, [%r695]; 2026-02-21T09:22:04.7152055Z // end inline asm 2026-02-21T09:22:04.7152204Z bar.sync 0; 2026-02-21T09:22:04.7152431Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r385, %r521}; 
2026-02-21T09:22:04.7152713Z bar.sync 0; 2026-02-21T09:22:04.7152852Z // begin inline asm 2026-02-21T09:22:04.7153077Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r855}, [%r695]; 2026-02-21T09:22:04.7153343Z // end inline asm 2026-02-21T09:22:04.7153490Z bar.sync 0; 2026-02-21T09:22:04.7153796Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r194], {%r387, %r523}; 2026-02-21T09:22:04.7154077Z bar.sync 0; 2026-02-21T09:22:04.7154221Z // begin inline asm 2026-02-21T09:22:04.7154444Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r857}, [%r695]; 2026-02-21T09:22:04.7154724Z // end inline asm 2026-02-21T09:22:04.7154871Z // begin inline asm 2026-02-21T09:22:04.7155056Z fence.proxy.async.shared::cta; 2026-02-21T09:22:04.7155252Z // end inline asm 2026-02-21T09:22:04.7155490Z wgmma.fence.sync.aligned; 2026-02-21T09:22:04.7155688Z shl.b32 %r983, %r976, 8; 2026-02-21T09:22:04.7155859Z and.b32 %r984, %r983, 4096; 2026-02-21T09:22:04.7156059Z add.s32 %r985, %r984, %r691; 2026-02-21T09:22:04.7156243Z bfe.u32 %r986, %r985, 4, 14; 2026-02-21T09:22:04.7156426Z cvt.u64.u32 %rd41, %r986; 2026-02-21T09:22:04.7156727Z or.b64 %rd35, %rd41, -9223371899382267904; 2026-02-21T09:22:04.7156947Z // begin inline asm 2026-02-21T09:22:04.7157699Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r826,%r827,%r828,%r829,%r830,%r831,%r832,%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848,%r849,%r850,%r851,%r852,%r853,%r854,%r855,%r856,%r857}, {%r822,%r823,%r824,%r825}, %rd35, %p2, 1, 1; 2026-02-21T09:22:04.7158498Z // end inline asm 2026-02-21T09:22:04.7158657Z add.s32 %r987, %r985, 32; 2026-02-21T09:22:04.7158829Z bfe.u32 %r988, %r987, 4, 14; 2026-02-21T09:22:04.7159012Z cvt.u64.u32 %rd42, %r988; 2026-02-21T09:22:04.7159198Z or.b64 %rd36, %rd42, -9223371899382267904; 2026-02-21T09:22:04.7159414Z // begin inline asm 2026-02-21T09:22:04.7160168Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r826,%r827,%r828,%r829,%r830,%r831,%r832,%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848,%r849,%r850,%r851,%r852,%r853,%r854,%r855,%r856,%r857}, {%r890,%r891,%r892,%r893}, %rd36, %p2, 1, 1; 2026-02-21T09:22:04.7160960Z // end inline asm 2026-02-21T09:22:04.7161135Z wgmma.commit_group.sync.aligned; 2026-02-21T09:22:04.7161338Z mov.b32 %r926, %r691; 2026-02-21T09:22:04.7161510Z mov.b32 %r927, %r928; 2026-02-21T09:22:04.7161670Z // begin inline asm 2026-02-21T09:22:04.7162238Z // wait for regs: %r826,%r827,%r828,%r829,%r830,%r831,%r832,%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848,%r849,%r850,%r851,%r852,%r853,%r854,%r855,%r856,%r857,%r926,%r927,%r928 2026-02-21T09:22:04.7162872Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:22:04.7163066Z // end inline asm 2026-02-21T09:22:04.7163234Z $L__tmp4: 2026-02-21T09:22:04.7163532Z .loc 1 34 74 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:34:74 2026-02-21T09:22:04.7163905Z add.s32 %r989, %r1079, 1; 2026-02-21T09:22:04.7164084Z setp.gt.s32 %p11, %r989, 4; 2026-02-21T09:22:04.7164279Z selp.b32 %r1079, 0, %r989, %p11; 2026-02-21T09:22:04.7164612Z .loc 1 42 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:28 2026-02-21T09:22:04.7164976Z add.s32 %r990, %r1077, -16; 2026-02-21T09:22:04.7165175Z mad.wide.s32 %rd37, %r990, 2, %rd7; 2026-02-21T09:22:04.7165525Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:04.7166035Z shl.b32 %r991, %r1079, 13; 2026-02-21T09:22:04.7166213Z add.s32 %r964, %r94, %r991; 2026-02-21T09:22:04.7166403Z selp.b32 %r965, 8, 0, %p9; 2026-02-21T09:22:04.7166716Z // begin inline asm 2026-02-21T09:22:04.7166969Z cp.async.ca.shared.global [ %r964 + 0 ], [ %rd37 + 0 ], 0x8, %r965; 2026-02-21T09:22:04.7167250Z // end inline asm 2026-02-21T09:22:04.7167417Z cp.async.commit_group; 2026-02-21T09:22:04.7167741Z .loc 1 42 28 // 
cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:28 2026-02-21T09:22:04.7168108Z mad.wide.s32 %rd38, %r1077, 2, %rd7; 2026-02-21T09:22:04.7168457Z .loc 1 42 76 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:42:76 2026-02-21T09:22:04.7168895Z add.s32 %r966, %r96, %r991; 2026-02-21T09:22:04.7169086Z // begin inline asm 2026-02-21T09:22:04.7169316Z cp.async.ca.shared.global [ %r966 + 0 ], [ %rd38 + 0 ], 0x8, %r965; 2026-02-21T09:22:04.7169592Z // end inline asm 2026-02-21T09:22:04.7169751Z cp.async.commit_group; 2026-02-21T09:22:04.7170060Z .loc 1 34 74 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:34:74 2026-02-21T09:22:04.7170417Z add.s32 %r1077, %r1077, 32; 2026-02-21T09:22:04.7170696Z add.s32 %r1076, %r1076, 131072; 2026-02-21T09:22:04.7170899Z setp.lt.u64 %p12, %rd47, 496; 2026-02-21T09:22:04.7171083Z @%p12 bra $L__BB0_1; 2026-02-21T09:22:04.7171245Z // %bb.2: 2026-02-21T09:22:04.7171526Z .loc 1 27 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:27:28 2026-02-21T09:22:04.7171888Z or.b32 %r1028, %r4, %r13; 2026-02-21T09:22:04.7172204Z .loc 1 25 41 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:25:41 2026-02-21T09:22:04.7172553Z and.b32 %r1029, %r7, 120; 2026-02-21T09:22:04.7172866Z .loc 1 25 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:25:28 2026-02-21T09:22:04.7173219Z or.b32 %r1030, %r1, %r1029; 2026-02-21T09:22:04.7173536Z .loc 1 34 74 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:34:74 2026-02-21T09:22:04.7173891Z cp.async.wait_group 0; 2026-02-21T09:22:04.7174066Z bar.sync 0; 2026-02-21T09:22:04.7174354Z .loc 1 81 24 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:81:24 2026-02-21T09:22:04.7174712Z cvt.rn.bf16x2.f32 %r1031, %r827, %r826; 2026-02-21T09:22:04.7174933Z cvt.rn.bf16x2.f32 %r1032, %r829, %r828; 2026-02-21T09:22:04.7175143Z cvt.rn.bf16x2.f32 %r1033, %r831, %r830; 2026-02-21T09:22:04.7175356Z cvt.rn.bf16x2.f32 %r1034, %r833, %r832; 
2026-02-21T09:22:04.7175577Z cvt.rn.bf16x2.f32 %r1035, %r835, %r834; 2026-02-21T09:22:04.7175803Z cvt.rn.bf16x2.f32 %r1036, %r837, %r836; 2026-02-21T09:22:04.7176014Z cvt.rn.bf16x2.f32 %r1037, %r839, %r838; 2026-02-21T09:22:04.7176229Z cvt.rn.bf16x2.f32 %r1038, %r841, %r840; 2026-02-21T09:22:04.7176561Z cvt.rn.bf16x2.f32 %r1039, %r843, %r842; 2026-02-21T09:22:04.7176785Z cvt.rn.bf16x2.f32 %r1040, %r845, %r844; 2026-02-21T09:22:04.7177000Z cvt.rn.bf16x2.f32 %r1041, %r847, %r846; 2026-02-21T09:22:04.7177206Z cvt.rn.bf16x2.f32 %r1042, %r849, %r848; 2026-02-21T09:22:04.7177420Z cvt.rn.bf16x2.f32 %r1043, %r851, %r850; 2026-02-21T09:22:04.7177627Z cvt.rn.bf16x2.f32 %r1044, %r853, %r852; 2026-02-21T09:22:04.7177840Z cvt.rn.bf16x2.f32 %r1045, %r855, %r854; 2026-02-21T09:22:04.7178051Z cvt.rn.bf16x2.f32 %r1046, %r857, %r856; 2026-02-21T09:22:04.7178399Z .loc 1 27 28 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:27:28 2026-02-21T09:22:04.7178761Z shl.b32 %r1047, %r1028, 13; 2026-02-21T09:22:04.7179078Z .loc 1 82 39 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:82:39 2026-02-21T09:22:04.7179453Z or.b32 %r1048, %r1047, 524288; 2026-02-21T09:22:04.7179643Z or.b32 %r1049, %r1047, 1048576; 2026-02-21T09:22:04.7179833Z or.b32 %r1050, %r1047, 1572864; 2026-02-21T09:22:04.7180309Z .loc 1 82 46 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:82:46 2026-02-21T09:22:04.7180667Z add.s32 %r1051, %r1047, %r1030; 2026-02-21T09:22:04.7180858Z add.s32 %r1052, %r1048, %r1030; 2026-02-21T09:22:04.7181041Z add.s32 %r1053, %r1049, %r1030; 2026-02-21T09:22:04.7181228Z add.s32 %r1054, %r1050, %r1030; 2026-02-21T09:22:04.7181549Z .loc 1 82 18 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:82:18 2026-02-21T09:22:04.7181913Z mad.wide.s32 %rd43, %r1051, 2, %rd9; 2026-02-21T09:22:04.7182123Z mad.wide.s32 %rd44, %r1052, 2, %rd9; 2026-02-21T09:22:04.7182337Z mad.wide.s32 %rd45, %r1053, 2, %rd9; 2026-02-21T09:22:04.7182542Z mad.wide.s32 %rd46, %r1054, 2, 
%rd9; 2026-02-21T09:22:04.7182956Z .loc 1 82 77 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:82:77 2026-02-21T09:22:04.7183316Z shl.b32 %r1055, %r2, 5; 2026-02-21T09:22:04.7183489Z and.b32 %r1056, %r2, 24; 2026-02-21T09:22:04.7183669Z shr.u32 %r1057, %r2, 3; 2026-02-21T09:22:04.7183833Z and.b32 %r1058, %r1057, 64; 2026-02-21T09:22:04.7184013Z shl.b32 %r1059, %r1056, 4; 2026-02-21T09:22:04.7184184Z shl.b32 %r1060, %r5, 14; 2026-02-21T09:22:04.7184358Z and.b32 %r1061, %r1055, 15456; 2026-02-21T09:22:04.7184629Z and.b32 %r1062, %r12, 16; 2026-02-21T09:22:04.7184811Z or.b32 %r1063, %r1062, %r1060; 2026-02-21T09:22:04.7184996Z or.b32 %r1064, %r1061, %r1059; 2026-02-21T09:22:04.7185176Z xor.b32 %r1065, %r1064, %r1058; 2026-02-21T09:22:04.7185362Z or.b32 %r1066, %r1065, %r1063; 2026-02-21T09:22:04.7185539Z add.s32 %r1068, %r138, %r1066; 2026-02-21T09:22:04.7185778Z st.shared.v4.b32 [%r1068], {%r1031, %r1033, %r1035, %r1037}; 2026-02-21T09:22:04.7186100Z st.shared.v4.b32 [%r1068+512], {%r1032, %r1034, %r1036, %r1038}; 2026-02-21T09:22:04.7186383Z xor.b32 %r1069, %r1066, 32; 2026-02-21T09:22:04.7186688Z add.s32 %r1070, %r138, %r1069; 2026-02-21T09:22:04.7186919Z st.shared.v4.b32 [%r1070], {%r1039, %r1041, %r1043, %r1045}; 2026-02-21T09:22:04.7187233Z st.shared.v4.b32 [%r1070+512], {%r1040, %r1042, %r1044, %r1046}; 2026-02-21T09:22:04.7187490Z bar.sync 0; 2026-02-21T09:22:04.7187645Z shl.b32 %r1071, %r1056, 11; 2026-02-21T09:22:04.7187820Z shl.b32 %r1072, %r5, 5; 2026-02-21T09:22:04.7187994Z and.b32 %r1073, %r12, 4080; 2026-02-21T09:22:04.7188169Z or.b32 %r1074, %r1071, %r1072; 2026-02-21T09:22:04.7188428Z xor.b32 %r1075, %r1074, %r1073; 2026-02-21T09:22:04.7188614Z add.s32 %r996, %r138, %r1075; 2026-02-21T09:22:04.7188798Z // begin inline asm 2026-02-21T09:22:04.7189089Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1012, %r1013, %r1014, %r1015}, [%r996]; 2026-02-21T09:22:04.7189424Z // end inline asm 2026-02-21T09:22:04.7189587Z add.s32 %r1001, %r996, 4096; 
2026-02-21T09:22:04.7189766Z // begin inline asm 2026-02-21T09:22:04.7190054Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1016, %r1017, %r1018, %r1019}, [%r1001]; 2026-02-21T09:22:04.7190389Z // end inline asm 2026-02-21T09:22:04.7190549Z add.s32 %r1006, %r996, 8192; 2026-02-21T09:22:04.7190722Z // begin inline asm 2026-02-21T09:22:04.7191009Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1020, %r1021, %r1022, %r1023}, [%r1006]; 2026-02-21T09:22:04.7191345Z // end inline asm 2026-02-21T09:22:04.7191505Z add.s32 %r1011, %r996, 12288; 2026-02-21T09:22:04.7191688Z // begin inline asm 2026-02-21T09:22:04.7191963Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1024, %r1025, %r1026, %r1027}, [%r1011]; 2026-02-21T09:22:04.7192295Z // end inline asm 2026-02-21T09:22:04.7192446Z // begin inline asm 2026-02-21T09:22:04.7192678Z st.global.v4.b32 [ %rd43 + 0 ], { %r1012, %r1013, %r1014, %r1015 }; 2026-02-21T09:22:04.7192942Z // end inline asm 2026-02-21T09:22:04.7193101Z // begin inline asm 2026-02-21T09:22:04.7193334Z st.global.v4.b32 [ %rd44 + 0 ], { %r1016, %r1017, %r1018, %r1019 }; 2026-02-21T09:22:04.7193592Z // end inline asm 2026-02-21T09:22:04.7193745Z // begin inline asm 2026-02-21T09:22:04.7193953Z st.global.v4.b32 [ %rd45 + 0 ], { %r1020, %r1021, %r1022, %r1023 }; 2026-02-21T09:22:04.7194370Z // end inline asm 2026-02-21T09:22:04.7194517Z // begin inline asm 2026-02-21T09:22:04.7194727Z st.global.v4.b32 [ %rd46 + 0 ], { %r1024, %r1025, %r1026, %r1027 }; 2026-02-21T09:22:04.7194982Z // end inline asm 2026-02-21T09:22:04.7195285Z .loc 1 82 4 // cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py:82:4 2026-02-21T09:22:04.7195640Z ret; 2026-02-21T09:22:04.7195773Z $L__tmp5: 2026-02-21T09:22:04.7195910Z $L__func_end0: 2026-02-21T09:22:04.7196084Z // -- End function 2026-02-21T09:22:04.7196310Z } 2026-02-21T09:22:04.7196757Z .file 1 "/tmp/torchinductor_root/fi/cfi36blrjnhorkxcxso67refekxxh5unauz7nnm6svzlphhk52oq.py" 2026-02-21T09:22:04.7197389Z .file 2 
"/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T09:22:04.7197763Z .section .debug_abbrev 2026-02-21T09:22:04.7197935Z { 2026-02-21T09:22:04.7198108Z .b8 1 // Abbreviation Code 2026-02-21T09:22:04.7198371Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:22:04.7198632Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:22:04.7198950Z .b8 37 // DW_AT_producer 2026-02-21T09:22:04.7199203Z .b8 8 // DW_FORM_string 2026-02-21T09:22:04.7199443Z .b8 19 // DW_AT_language 2026-02-21T09:22:04.7199690Z .b8 5 // DW_FORM_data2 2026-02-21T09:22:04.7199931Z .b8 3 // DW_AT_name 2026-02-21T09:22:04.7200166Z .b8 8 // DW_FORM_string 2026-02-21T09:22:04.7200414Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:22:04.7200654Z .b8 6 // DW_FORM_data4 2026-02-21T09:22:04.7200913Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:22:04.7201154Z .b8 8 // DW_FORM_string 2026-02-21T09:22:04.7201393Z .b8 0 // EOM(1) 2026-02-21T09:22:04.7201620Z .b8 0 // EOM(2) 2026-02-21T09:22:04.7201860Z .b8 2 // Abbreviation Code 2026-02-21T09:22:04.7202127Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:22:04.7202378Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:22:04.7202620Z .b8 3 // DW_AT_name 2026-02-21T09:22:04.7202854Z .b8 8 // DW_FORM_string 2026-02-21T09:22:04.7203098Z .b8 32 // DW_AT_inline 2026-02-21T09:22:04.7203184Z .b8 11 // DW_FORM_data1 2026-02-21T09:22:04.7203262Z .b8 0 // EOM(1) 2026-02-21T09:22:04.7203334Z .b8 0 // EOM(2) 2026-02-21T09:22:04.7203425Z .b8 3 // Abbreviation Code 2026-02-21T09:22:04.7203515Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:22:04.7203606Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:22:04.7203701Z .b8 17 // DW_AT_low_pc 2026-02-21T09:22:04.7203785Z .b8 1 // DW_FORM_addr 2026-02-21T09:22:04.7203876Z .b8 18 // DW_AT_high_pc 2026-02-21T09:22:04.7203954Z .b8 1 // DW_FORM_addr 2026-02-21T09:22:04.7204050Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:22:04.7204134Z .b8 19 // DW_FORM_ref4 2026-02-21T09:22:04.7204212Z .b8 0 // EOM(1) 2026-02-21T09:22:04.7204282Z .b8 0 // EOM(2) 
2026-02-21T09:22:04.7204369Z .b8 4 // Abbreviation Code 2026-02-21T09:22:04.7204625Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T09:22:04.7204718Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:22:04.7204814Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:22:04.7204908Z .b8 19 // DW_FORM_ref4 2026-02-21T09:22:04.7204991Z .b8 17 // DW_AT_low_pc 2026-02-21T09:22:04.7205068Z .b8 1 // DW_FORM_addr 2026-02-21T09:22:04.7205158Z .b8 18 // DW_AT_high_pc 2026-02-21T09:22:04.7205235Z .b8 1 // DW_FORM_addr 2026-02-21T09:22:04.7205321Z .b8 88 // DW_AT_call_file 2026-02-21T09:22:04.7205458Z .b8 11 // DW_FORM_data1 2026-02-21T09:22:04.7205556Z .b8 89 // DW_AT_call_line 2026-02-21T09:22:04.7205641Z .b8 11 // DW_FORM_data1 2026-02-21T09:22:04.7205736Z .b8 87 // DW_AT_call_column 2026-02-21T09:22:04.7205822Z .b8 11 // DW_FORM_data1 2026-02-21T09:22:04.7205897Z .b8 0 // EOM(1) 2026-02-21T09:22:04.7206015Z .b8 0 // EOM(2) 2026-02-21T09:22:04.7206095Z .b8 0 // EOM(3) 2026-02-21T09:22:04.7206149Z } 2026-02-21T09:22:04.7206215Z .section .debug_info 2026-02-21T09:22:04.7206268Z { 2026-02-21T09:22:04.7206366Z .b32 178 // Length of Unit 2026-02-21T09:22:04.7206595Z .b8 2 // DWARF version number 2026-02-21T09:22:04.7206654Z .b8 0 2026-02-21T09:22:04.7206798Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T09:22:04.7206898Z .b8 8 // Address Size (in bytes) 2026-02-21T09:22:04.7207016Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T09:22:04.7207114Z .b8 116 // DW_AT_producer 2026-02-21T09:22:04.7207171Z .b8 114 2026-02-21T09:22:04.7207225Z .b8 105 2026-02-21T09:22:04.7207277Z .b8 116 2026-02-21T09:22:04.7207336Z .b8 111 2026-02-21T09:22:04.7207390Z .b8 110 2026-02-21T09:22:04.7207443Z .b8 0 2026-02-21T09:22:04.7207532Z .b8 2 // DW_AT_language 2026-02-21T09:22:04.7207585Z .b8 0 2026-02-21T09:22:04.7207665Z .b8 99 // DW_AT_name 2026-02-21T09:22:04.7207718Z .b8 102 2026-02-21T09:22:04.7207776Z .b8 105 2026-02-21T09:22:04.7207829Z .b8 51 2026-02-21T09:22:04.7207880Z .b8 54 2026-02-21T09:22:04.7207937Z .b8 98 2026-02-21T09:22:04.7207990Z .b8 108 2026-02-21T09:22:04.7208042Z .b8 114 2026-02-21T09:22:04.7208097Z .b8 106 2026-02-21T09:22:04.7208155Z .b8 110 2026-02-21T09:22:04.7208208Z .b8 104 2026-02-21T09:22:04.7208260Z .b8 111 2026-02-21T09:22:04.7208319Z .b8 114 2026-02-21T09:22:04.7208377Z .b8 107 2026-02-21T09:22:04.7208431Z .b8 120 2026-02-21T09:22:04.7208483Z .b8 99 2026-02-21T09:22:04.7208542Z .b8 120 2026-02-21T09:22:04.7208596Z .b8 115 2026-02-21T09:22:04.7208649Z .b8 111 2026-02-21T09:22:04.7208702Z .b8 54 2026-02-21T09:22:04.7208761Z .b8 55 2026-02-21T09:22:04.7208816Z .b8 114 2026-02-21T09:22:04.7208867Z .b8 101 2026-02-21T09:22:04.7208925Z .b8 102 2026-02-21T09:22:04.7208982Z .b8 101 2026-02-21T09:22:04.7209034Z .b8 107 2026-02-21T09:22:04.7209087Z .b8 120 2026-02-21T09:22:04.7209145Z .b8 120 2026-02-21T09:22:04.7209200Z .b8 104 2026-02-21T09:22:04.7209253Z .b8 53 2026-02-21T09:22:04.7209311Z .b8 117 2026-02-21T09:22:04.7209365Z .b8 110 2026-02-21T09:22:04.7209417Z .b8 97 2026-02-21T09:22:04.7209485Z .b8 117 2026-02-21T09:22:04.7209547Z .b8 122 2026-02-21T09:22:04.7209600Z .b8 55 2026-02-21T09:22:04.7209655Z .b8 110 2026-02-21T09:22:04.7209708Z .b8 110 2026-02-21T09:22:04.7209766Z .b8 109 2026-02-21T09:22:04.7209817Z .b8 54 
2026-02-21T09:22:04.7209870Z .b8 115 2026-02-21T09:22:04.7210072Z .b8 118 2026-02-21T09:22:04.7210125Z .b8 122 2026-02-21T09:22:04.7210178Z .b8 108 2026-02-21T09:22:04.7210231Z .b8 112 2026-02-21T09:22:04.7210290Z .b8 104 2026-02-21T09:22:04.7210343Z .b8 104 2026-02-21T09:22:04.7210396Z .b8 107 2026-02-21T09:22:04.7210464Z .b8 53 2026-02-21T09:22:04.7210524Z .b8 50 2026-02-21T09:22:04.7210579Z .b8 111 2026-02-21T09:22:04.7210633Z .b8 113 2026-02-21T09:22:04.7210695Z .b8 46 2026-02-21T09:22:04.7210749Z .b8 112 2026-02-21T09:22:04.7210803Z .b8 121 2026-02-21T09:22:04.7210862Z .b8 0 2026-02-21T09:22:04.7210971Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:22:04.7211055Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:22:04.7211109Z .b8 116 2026-02-21T09:22:04.7211171Z .b8 109 2026-02-21T09:22:04.7211227Z .b8 112 2026-02-21T09:22:04.7211380Z .b8 47 2026-02-21T09:22:04.7211446Z .b8 116 2026-02-21T09:22:04.7211502Z .b8 111 2026-02-21T09:22:04.7211555Z .b8 114 2026-02-21T09:22:04.7211609Z .b8 99 2026-02-21T09:22:04.7211673Z .b8 104 2026-02-21T09:22:04.7211726Z .b8 105 2026-02-21T09:22:04.7211781Z .b8 110 2026-02-21T09:22:04.7211833Z .b8 100 2026-02-21T09:22:04.7211892Z .b8 117 2026-02-21T09:22:04.7211944Z .b8 99 2026-02-21T09:22:04.7211998Z .b8 116 2026-02-21T09:22:04.7212059Z .b8 111 2026-02-21T09:22:04.7212175Z .b8 114 2026-02-21T09:22:04.7212230Z .b8 95 2026-02-21T09:22:04.7212286Z .b8 114 2026-02-21T09:22:04.7212344Z .b8 111 2026-02-21T09:22:04.7212400Z .b8 111 2026-02-21T09:22:04.7212453Z .b8 116 2026-02-21T09:22:04.7212508Z .b8 47 2026-02-21T09:22:04.7212571Z .b8 102 2026-02-21T09:22:04.7212623Z .b8 105 2026-02-21T09:22:04.7212675Z .b8 0 2026-02-21T09:22:04.7212808Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T09:22:04.7212890Z .b8 95 // DW_AT_name 2026-02-21T09:22:04.7212944Z .b8 104 2026-02-21T09:22:04.7213001Z .b8 101 2026-02-21T09:22:04.7213053Z .b8 108 2026-02-21T09:22:04.7213106Z .b8 105 2026-02-21T09:22:04.7213160Z .b8 111 
2026-02-21T09:22:04.7213219Z .b8 110 2026-02-21T09:22:04.7213271Z .b8 95 2026-02-21T09:22:04.7213323Z .b8 109 2026-02-21T09:22:04.7213383Z .b8 97 2026-02-21T09:22:04.7213435Z .b8 116 2026-02-21T09:22:04.7213492Z .b8 109 2026-02-21T09:22:04.7213545Z .b8 117 2026-02-21T09:22:04.7213604Z .b8 108 2026-02-21T09:22:04.7213656Z .b8 95 2026-02-21T09:22:04.7213710Z .b8 98 2026-02-21T09:22:04.7213762Z .b8 102 2026-02-21T09:22:04.7213819Z .b8 49 2026-02-21T09:22:04.7213870Z .b8 54 2026-02-21T09:22:04.7213922Z .b8 95 2026-02-21T09:22:04.7213984Z .b8 105 2026-02-21T09:22:04.7214038Z .b8 110 2026-02-21T09:22:04.7214090Z .b8 116 2026-02-21T09:22:04.7214141Z .b8 52 2026-02-21T09:22:04.7214199Z .b8 0 2026-02-21T09:22:04.7214280Z .b8 1 // DW_AT_inline 2026-02-21T09:22:04.7214390Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T09:22:04.7214490Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T09:22:04.7214587Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T09:22:04.7214692Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:22:04.7214843Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T09:22:04.7214947Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:22:04.7215037Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T09:22:04.7215128Z .b64 $L__tmp4 // DW_AT_high_pc 2026-02-21T09:22:04.7215219Z .b8 1 // DW_AT_call_file 2026-02-21T09:22:04.7215302Z .b8 78 // DW_AT_call_line 2026-02-21T09:22:04.7215392Z .b8 36 // DW_AT_call_column 2026-02-21T09:22:04.7215490Z .b8 0 // End Of Children Mark 2026-02-21T09:22:04.7215581Z .b8 0 // End Of Children Mark 2026-02-21T09:22:04.7215634Z } 2026-02-21T09:22:04.7215812Z .section .debug_macinfo { } 2026-02-21T09:22:04.7215818Z 2026-02-21T09:22:04.7215901Z ================================================================ 2026-02-21T09:22:04.7216018Z please share the reproducer above with Triton project. 
2026-02-21T09:22:05.5297804Z 2026-02-21T09:22:05.5297833Z 2026-02-21T09:22:05.5297840Z 2026-02-21T09:22:05.5298158Z ================================================================ 2026-02-21T09:22:05.5298520Z Internal Triton PTX codegen error 2026-02-21T09:22:05.5298794Z `ptxas` stderr: 2026-02-21T09:22:05.5299621Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_helion_matmul_bf16_int4' 2026-02-21T09:22:05.5301045Z ptxas fatal : (C7600) Register allocation failed with register count of '64'. Compile the program with a higher register target 2026-02-21T09:22:05.5301712Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:22:05.5301976Z 2026-02-21T09:22:05.5302684Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpksa2hx_s.ptx -o /tmp/tmpksa2hx_s.ptx.o 2026-02-21T09:22:05.5303360Z 2026-02-21T09:22:05.5303364Z 2026-02-21T09:22:05.5303537Z // 2026-02-21T09:22:05.5303706Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:22:05.5303920Z // 2026-02-21T09:22:05.5303996Z 2026-02-21T09:22:05.5304062Z .version 8.7 2026-02-21T09:22:05.5304211Z .target sm_90a 2026-02-21T09:22:05.5304368Z .address_size 64 2026-02-21T09:22:05.5304465Z 2026-02-21T09:22:05.5304657Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T09:22:05.5305039Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:22:05.5305320Z // @_helion_matmul_bf16_int4 2026-02-21T09:22:05.5305607Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T09:22:05.5305926Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T09:22:05.5306310Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T09:22:05.5306887Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T09:22:05.5307265Z 
.param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T09:22:05.5307643Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T09:22:05.5307928Z ) 2026-02-21T09:22:05.5308067Z .reqntid 1024 2026-02-21T09:22:05.5308328Z { 2026-02-21T09:22:05.5308476Z .reg .pred %p<13>; 2026-02-21T09:22:05.5308653Z .reg .b16 %rs<39>; 2026-02-21T09:22:05.5308836Z .reg .b32 %r<1116>; 2026-02-21T09:22:05.5308999Z .reg .b64 %rd<50>; 2026-02-21T09:22:05.5309340Z .loc 1 13 0 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:13:0 2026-02-21T09:22:05.5309733Z $L__func_begin0: 2026-02-21T09:22:05.5310052Z .loc 1 13 0 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:13:0 2026-02-21T09:22:05.5310380Z 2026-02-21T09:22:05.5310439Z // %bb.0: 2026-02-21T09:22:05.5310651Z ld.param.b64 %rd9, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T09:22:05.5310984Z ld.param.b64 %rd8, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T09:22:05.5311294Z ld.param.b64 %rd7, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T09:22:05.5311559Z $L__tmp0: 2026-02-21T09:22:05.5311870Z .loc 1 17 33 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:17:33 2026-02-21T09:22:05.5312272Z mov.u32 %r121, %ctaid.x; 2026-02-21T09:22:05.5312624Z .loc 1 20 29 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:20:29 2026-02-21T09:22:05.5313018Z shr.u32 %r122, %r121, 6; 2026-02-21T09:22:05.5313199Z and.b32 %r123, %r122, 33554424; 2026-02-21T09:22:05.5313543Z .loc 1 21 35 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:21:35 2026-02-21T09:22:05.5313904Z sub.s32 %r124, 64, %r123; 2026-02-21T09:22:05.5314452Z .loc 1 21 48 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:21:48 2026-02-21T09:22:05.5314803Z min.s32 %r125, %r124, 8; 2026-02-21T09:22:05.5315127Z .loc 1 22 41 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:22:41 2026-02-21T09:22:05.5315487Z and.b32 %r126, %r121, 511; 2026-02-21T09:22:05.5315800Z .loc 1 23 47 // 
cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:23:47 2026-02-21T09:22:05.5316159Z div.s32 %r127, %r126, %r125; 2026-02-21T09:22:05.5316639Z .loc 1 22 60 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:22:60 2026-02-21T09:22:05.5317002Z mul.lo.s32 %r128, %r127, %r125; 2026-02-21T09:22:05.5317199Z sub.s32 %r129, %r126, %r128; 2026-02-21T09:22:05.5317600Z .loc 1 22 26 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:22:26 2026-02-21T09:22:05.5317959Z add.s32 %r130, %r129, %r123; 2026-02-21T09:22:05.5318278Z .loc 1 24 23 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:24:23 2026-02-21T09:22:05.5318650Z shl.b32 %r1, %r130, 7; 2026-02-21T09:22:05.5318963Z .loc 1 25 41 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:25:41 2026-02-21T09:22:05.5319384Z mov.u32 %r2, %tid.x; 2026-02-21T09:22:05.5319562Z and.b32 %r131, %r2, 31; 2026-02-21T09:22:05.5319731Z shr.u32 %r3, %r2, 5; 2026-02-21T09:22:05.5320207Z [605s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:22:05.5321637Z Config: @helion.kernel(config=helion.Config(block_sizes=[8, 256, 128], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['first', 'first'], loop_orders=[[1, 0]], num_stages=7, num_warps=32, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 2], range_warp_specializes=[]), static_shapes=True) 2026-02-21T09:22:05.5322975Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:22:05.5323262Z `ptxas` stderr: 2026-02-21T09:22:05.5323894Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_helion_matmul_bf16_int4' 2026-02-21T09:22:05.5324819Z ptxas fatal : (C7600) Register allocation failed with register count of '64'. 
Compile the program with a higher register target 2026-02-21T09:22:05.5325324Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:22:05.5325504Z 2026-02-21T09:22:05.5326001Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpksa2hx_s.ptx -o /tmp/tmpksa2hx_s.ptx.o 2026-02-21T09:22:05.5326730Z 2026-02-21T09:22:05.5326885Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T09:22:05.5327187Z and.b32 %r132, %r2, 127; 2026-02-21T09:22:05.5327514Z .loc 1 26 23 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:26:23 2026-02-21T09:22:05.5327879Z shl.b32 %r4, %r127, 8; 2026-02-21T09:22:05.5328188Z .loc 1 27 41 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:27:41 2026-02-21T09:22:05.5328547Z shr.u32 %r133, %r2, 2; 2026-02-21T09:22:05.5328854Z .loc 1 27 28 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:27:28 2026-02-21T09:22:05.5329205Z or.b32 %r134, %r4, %r133; 2026-02-21T09:22:05.5329521Z .loc 1 41 34 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:41:34 2026-02-21T09:22:05.5329868Z and.b32 %r5, %r2, 3; 2026-02-21T09:22:05.5330033Z shl.b32 %r135, %r5, 2; 2026-02-21T09:22:05.5330332Z .loc 1 42 49 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:49 2026-02-21T09:22:05.5330687Z shl.b32 %r136, %r134, 10; 2026-02-21T09:22:05.5331114Z .loc 1 59 34 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:59:34 2026-02-21T09:22:05.5331544Z and.b32 %r6, %r2, 128; 2026-02-21T09:22:05.5331876Z .loc 1 42 56 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:56 2026-02-21T09:22:05.5332231Z or.b32 %r137, %r136, %r135; 2026-02-21T09:22:05.5332573Z .loc 1 42 28 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:28 2026-02-21T09:22:05.5332939Z mad.wide.s32 %rd10, %r137, 2, %rd7; 2026-02-21T09:22:05.5333288Z .loc 1 42 76 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:76 
2026-02-21T09:22:05.5333653Z shl.b32 %r7, %r2, 3; 2026-02-21T09:22:05.5333820Z and.b32 %r138, %r7, 8056; 2026-02-21T09:22:05.5334076Z bfe.s32 %r139, %r2, 4, 1; 2026-02-21T09:22:05.5334251Z and.b32 %r140, %r139, 136; 2026-02-21T09:22:05.5334434Z xor.b32 %r141, %r140, %r138; 2026-02-21T09:22:05.5334614Z mov.b32 %r142, global_smem; 2026-02-21T09:22:05.5334800Z add.s32 %r94, %r142, %r141; 2026-02-21T09:22:05.5334966Z mov.b32 %r95, 8; 2026-02-21T09:22:05.5335120Z // begin inline asm 2026-02-21T09:22:05.5335366Z cp.async.ca.shared.global [ %r94 + 0 ], [ %rd10 + 0 ], 0x8, %r95; 2026-02-21T09:22:05.5335716Z // end inline asm 2026-02-21T09:22:05.5335879Z cp.async.commit_group; 2026-02-21T09:22:05.5336209Z .loc 1 42 28 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:28 2026-02-21T09:22:05.5336706Z add.s64 %rd11, %rd10, 32; 2026-02-21T09:22:05.5337022Z .loc 1 42 76 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:76 2026-02-21T09:22:05.5337378Z add.s32 %r96, %r94, 49152; 2026-02-21T09:22:05.5337553Z // begin inline asm 2026-02-21T09:22:05.5337789Z cp.async.ca.shared.global [ %r96 + 0 ], [ %rd11 + 0 ], 0x8, %r95; 2026-02-21T09:22:05.5338060Z // end inline asm 2026-02-21T09:22:05.5338215Z cp.async.commit_group; 2026-02-21T09:22:05.5338545Z .loc 1 42 28 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:28 2026-02-21T09:22:05.5338912Z add.s64 %rd12, %rd10, 64; 2026-02-21T09:22:05.5339232Z .loc 1 42 76 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:76 2026-02-21T09:22:05.5339579Z bar.sync 0; 2026-02-21T09:22:05.5339735Z add.s32 %r98, %r94, 8192; 2026-02-21T09:22:05.5339904Z // begin inline asm 2026-02-21T09:22:05.5340136Z cp.async.ca.shared.global [ %r98 + 0 ], [ %rd12 + 0 ], 0x8, %r95; 2026-02-21T09:22:05.5340408Z // end inline asm 2026-02-21T09:22:05.5340565Z cp.async.commit_group; 2026-02-21T09:22:05.5340875Z .loc 1 42 28 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:28 2026-02-21T09:22:05.5341223Z add.s64 
%rd13, %rd10, 96; 2026-02-21T09:22:05.5341537Z .loc 1 42 76 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:76 2026-02-21T09:22:05.5341884Z add.s32 %r100, %r94, 57344; 2026-02-21T09:22:05.5342073Z // begin inline asm 2026-02-21T09:22:05.5342304Z cp.async.ca.shared.global [ %r100 + 0 ], [ %rd13 + 0 ], 0x8, %r95; 2026-02-21T09:22:05.5342580Z // end inline asm 2026-02-21T09:22:05.5342753Z cp.async.commit_group; 2026-02-21T09:22:05.5343069Z .loc 1 42 28 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:28 2026-02-21T09:22:05.5343430Z add.s64 %rd14, %rd10, 128; 2026-02-21T09:22:05.5343745Z .loc 1 42 76 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:76 2026-02-21T09:22:05.5344092Z bar.sync 0; 2026-02-21T09:22:05.5344240Z add.s32 %r102, %r94, 16384; 2026-02-21T09:22:05.5344421Z // begin inline asm 2026-02-21T09:22:05.5344655Z cp.async.ca.shared.global [ %r102 + 0 ], [ %rd14 + 0 ], 0x8, %r95; 2026-02-21T09:22:05.5344927Z // end inline asm 2026-02-21T09:22:05.5345092Z cp.async.commit_group; 2026-02-21T09:22:05.5345399Z .loc 1 42 28 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:28 2026-02-21T09:22:05.5345927Z add.s64 %rd15, %rd10, 160; 2026-02-21T09:22:05.5346241Z .loc 1 42 76 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:76 2026-02-21T09:22:05.5346761Z add.s32 %r104, %r94, 65536; 2026-02-21T09:22:05.5346936Z // begin inline asm 2026-02-21T09:22:05.5347161Z cp.async.ca.shared.global [ %r104 + 0 ], [ %rd15 + 0 ], 0x8, %r95; 2026-02-21T09:22:05.5347431Z // end inline asm 2026-02-21T09:22:05.5347581Z cp.async.commit_group; 2026-02-21T09:22:05.5347888Z .loc 1 42 28 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:28 2026-02-21T09:22:05.5348328Z add.s64 %rd16, %rd10, 192; 2026-02-21T09:22:05.5348653Z .loc 1 42 76 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:76 2026-02-21T09:22:05.5349079Z bar.sync 0; 2026-02-21T09:22:05.5349231Z add.s32 %r106, %r94, 24576; 
2026-02-21T09:22:05.5349399Z // begin inline asm 2026-02-21T09:22:05.5349622Z cp.async.ca.shared.global [ %r106 + 0 ], [ %rd16 + 0 ], 0x8, %r95; 2026-02-21T09:22:05.5349895Z // end inline asm 2026-02-21T09:22:05.5350062Z cp.async.commit_group; 2026-02-21T09:22:05.5350369Z .loc 1 42 28 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:28 2026-02-21T09:22:05.5350789Z add.s64 %rd17, %rd10, 224; 2026-02-21T09:22:05.5351108Z .loc 1 42 76 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:76 2026-02-21T09:22:05.5351450Z add.s32 %r108, %r94, 73728; 2026-02-21T09:22:05.5351624Z // begin inline asm 2026-02-21T09:22:05.5351847Z cp.async.ca.shared.global [ %r108 + 0 ], [ %rd17 + 0 ], 0x8, %r95; 2026-02-21T09:22:05.5352113Z // end inline asm 2026-02-21T09:22:05.5352271Z cp.async.commit_group; 2026-02-21T09:22:05.5352576Z .loc 1 42 28 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:28 2026-02-21T09:22:05.5352939Z add.s64 %rd18, %rd10, 256; 2026-02-21T09:22:05.5353249Z .loc 1 42 76 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:76 2026-02-21T09:22:05.5353597Z bar.sync 0; 2026-02-21T09:22:05.5353742Z add.s32 %r110, %r94, 32768; 2026-02-21T09:22:05.5353926Z // begin inline asm 2026-02-21T09:22:05.5354165Z cp.async.ca.shared.global [ %r110 + 0 ], [ %rd18 + 0 ], 0x8, %r95; 2026-02-21T09:22:05.5354436Z // end inline asm 2026-02-21T09:22:05.5354615Z cp.async.commit_group; 2026-02-21T09:22:05.5354928Z .loc 1 42 28 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:28 2026-02-21T09:22:05.5355289Z add.s64 %rd19, %rd10, 288; 2026-02-21T09:22:05.5355619Z .loc 1 42 76 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:76 2026-02-21T09:22:05.5355986Z add.s32 %r112, %r94, 81920; 2026-02-21T09:22:05.5356175Z // begin inline asm 2026-02-21T09:22:05.5356410Z cp.async.ca.shared.global [ %r112 + 0 ], [ %rd19 + 0 ], 0x8, %r95; 2026-02-21T09:22:05.5356831Z // end inline asm 2026-02-21T09:22:05.5357001Z cp.async.commit_group; 
2026-02-21T09:22:05.5357322Z .loc 1 42 28 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:28 2026-02-21T09:22:05.5357687Z add.s64 %rd20, %rd10, 320; 2026-02-21T09:22:05.5358018Z .loc 1 42 76 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:76 2026-02-21T09:22:05.5358364Z bar.sync 0; 2026-02-21T09:22:05.5358522Z add.s32 %r114, %r94, 40960; 2026-02-21T09:22:05.5358703Z // begin inline asm 2026-02-21T09:22:05.5358932Z cp.async.ca.shared.global [ %r114 + 0 ], [ %rd20 + 0 ], 0x8, %r95; 2026-02-21T09:22:05.5359214Z // end inline asm 2026-02-21T09:22:05.5359371Z cp.async.commit_group; 2026-02-21T09:22:05.5359686Z .loc 1 42 28 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:28 2026-02-21T09:22:05.5360044Z add.s64 %rd21, %rd10, 352; 2026-02-21T09:22:05.5360379Z .loc 1 42 76 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:76 2026-02-21T09:22:05.5360923Z add.s32 %r116, %r94, 90112; 2026-02-21T09:22:05.5361098Z // begin inline asm 2026-02-21T09:22:05.5361326Z cp.async.ca.shared.global [ %r116 + 0 ], [ %rd21 + 0 ], 0x8, %r95; 2026-02-21T09:22:05.5361595Z // end inline asm 2026-02-21T09:22:05.5361758Z cp.async.commit_group; 2026-02-21T09:22:05.5361924Z shl.b32 %r143, %r2, 4; 2026-02-21T09:22:05.5362100Z and.b32 %r144, %r143, 7680; 2026-02-21T09:22:05.5362280Z and.b32 %r145, %r7, 96; 2026-02-21T09:22:05.5362450Z shl.b32 %r146, %r5, 1; 2026-02-21T09:22:05.5362614Z or.b32 %r147, %r144, %r145; 2026-02-21T09:22:05.5362790Z or.b32 %r148, %r147, %r146; 2026-02-21T09:22:05.5362968Z or.b32 %r10, %r148, %r140; 2026-02-21T09:22:05.5363137Z xor.b32 %r11, %r10, 8; 2026-02-21T09:22:05.5363306Z shl.b32 %r12, %r2, 2; 2026-02-21T09:22:05.5363549Z and.b32 %r149, %r12, 124; 2026-02-21T09:22:05.5363741Z and.b32 %r150, %r2, 384; 2026-02-21T09:22:05.5363913Z shr.u32 %r13, %r2, 4; 2026-02-21T09:22:05.5364077Z and.b32 %r151, %r13, 2; 2026-02-21T09:22:05.5364248Z and.b32 %r152, %r7, 512; 2026-02-21T09:22:05.5364426Z setp.gt.u32 %p1, %r2, 511; 
2026-02-21T09:22:05.5364604Z selp.b32 %r153, 1, 0, %p1; 2026-02-21T09:22:05.5364785Z add.s32 %r695, %r142, 98304; 2026-02-21T09:22:05.5365039Z add.s32 %r155, %r695, %r150; 2026-02-21T09:22:05.5365216Z add.s32 %r156, %r155, %r153; 2026-02-21T09:22:05.5365395Z add.s32 %r157, %r156, %r152; 2026-02-21T09:22:05.5365566Z add.s32 %r158, %r157, %r151; 2026-02-21T09:22:05.5365743Z add.s32 %r14, %r158, %r149; 2026-02-21T09:22:05.5365916Z shr.u32 %r159, %r2, 1; 2026-02-21T09:22:05.5366083Z and.b32 %r160, %r159, 384; 2026-02-21T09:22:05.5366265Z add.s32 %r161, %r695, %r151; 2026-02-21T09:22:05.5366607Z add.s32 %r162, %r161, %r160; 2026-02-21T09:22:05.5366785Z add.s32 %r163, %r162, %r149; 2026-02-21T09:22:05.5366980Z add.s32 %r15, %r163, %r152; 2026-02-21T09:22:05.5367162Z shl.b32 %r164, %r132, 6; 2026-02-21T09:22:05.5367327Z and.b32 %r165, %r7, 48; 2026-02-21T09:22:05.5367502Z and.b32 %r166, %r3, 28; 2026-02-21T09:22:05.5374892Z xor.b32 %r167, %r165, %r166; 2026-02-21T09:22:05.5375151Z or.b32 %r168, %r167, %r164; 2026-02-21T09:22:05.5375359Z add.s32 %r16, %r695, %r168; 2026-02-21T09:22:05.5375550Z xor.b32 %r169, %r168, 32; 2026-02-21T09:22:05.5375745Z add.s32 %r17, %r695, %r169; 2026-02-21T09:22:05.5375924Z shl.b32 %r170, %r3, 7; 2026-02-21T09:22:05.5376115Z shl.b32 %r171, %r131, 4; 2026-02-21T09:22:05.5376289Z or.b32 %r172, %r170, %r171; 2026-02-21T09:22:05.5376671Z add.s32 %r173, %r142, 106496; 2026-02-21T09:22:05.5376877Z add.s32 %r699, %r173, %r172; 2026-02-21T09:22:05.5377057Z and.b32 %r174, %r143, 112; 2026-02-21T09:22:05.5377239Z shl.b32 %r175, %r131, 3; 2026-02-21T09:22:05.5377409Z or.b32 %r176, %r170, %r175; 2026-02-21T09:22:05.5377595Z and.b32 %r177, %r176, 1920; 2026-02-21T09:22:05.5377770Z shl.b32 %r178, %r2, 8; 2026-02-21T09:22:05.5377961Z and.b32 %r179, %r178, 2048; 2026-02-21T09:22:05.5378147Z add.s32 %r180, %r173, %r174; 2026-02-21T09:22:05.5378339Z add.s32 %r181, %r180, %r179; 2026-02-21T09:22:05.5378519Z add.s32 %r198, %r181, %r177; 
2026-02-21T09:22:05.5378705Z bfe.u32 %r182, %r695, 4, 14; 2026-02-21T09:22:05.5378888Z cvt.u64.u32 %rd23, %r182; 2026-02-21T09:22:05.5379093Z or.b64 %rd30, %rd23, -9223371899382267904; 2026-02-21T09:22:05.5379318Z add.s32 %r183, %r142, 98336; 2026-02-21T09:22:05.5379494Z bfe.u32 %r184, %r183, 4, 14; 2026-02-21T09:22:05.5379676Z cvt.u64.u32 %rd24, %r184; 2026-02-21T09:22:05.5379866Z or.b64 %rd31, %rd24, -9223371899382267904; 2026-02-21T09:22:05.5380083Z add.s32 %r185, %r142, 102400; 2026-02-21T09:22:05.5380266Z bfe.u32 %r186, %r185, 4, 14; 2026-02-21T09:22:05.5380454Z cvt.u64.u32 %rd25, %r186; 2026-02-21T09:22:05.5380646Z or.b64 %rd32, %rd25, -9223371899382267904; 2026-02-21T09:22:05.5380856Z add.s32 %r187, %r142, 102432; 2026-02-21T09:22:05.5381039Z bfe.u32 %r188, %r187, 4, 14; 2026-02-21T09:22:05.5381214Z cvt.u64.u32 %rd26, %r188; 2026-02-21T09:22:05.5381582Z or.b64 %rd33, %rd26, -9223371899382267904; 2026-02-21T09:22:05.5382038Z .loc 1 34 74 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:34:74 2026-02-21T09:22:05.5382424Z shl.b32 %r189, %r127, 18; 2026-02-21T09:22:05.5382612Z shl.b32 %r190, %r133, 10; 2026-02-21T09:22:05.5382797Z or.b32 %r191, %r189, %r190; 2026-02-21T09:22:05.5382984Z or.b32 %r192, %r191, %r135; 2026-02-21T09:22:05.5383158Z or.b32 %r1081, %r192, 208; 2026-02-21T09:22:05.5383340Z shl.b32 %r193, %r2, 6; 2026-02-21T09:22:05.5383511Z and.b32 %r194, %r193, 57344; 2026-02-21T09:22:05.5383695Z add.s32 %r195, %r194, %r1; 2026-02-21T09:22:05.5383868Z or.b32 %r1080, %r195, %r132; 2026-02-21T09:22:05.5384054Z mov.b32 %r830, 0f00000000; 2026-02-21T09:22:05.5384223Z mov.b32 %r1083, 5; 2026-02-21T09:22:05.5384471Z mov.b32 %r1082, -1; 2026-02-21T09:22:05.5384638Z mov.b64 %rd49, -16; 2026-02-21T09:22:05.5384809Z setp.eq.b32 %p8, %r6, 0; 2026-02-21T09:22:05.5384997Z mov.b32 %r831, %r830; 2026-02-21T09:22:05.5385165Z mov.b32 %r832, %r830; 2026-02-21T09:22:05.5385328Z mov.b32 %r833, %r830; 2026-02-21T09:22:05.5385498Z mov.b32 %r834, %r830; 
2026-02-21T09:22:05.5385660Z mov.b32 %r835, %r830; 2026-02-21T09:22:05.5385814Z mov.b32 %r836, %r830; 2026-02-21T09:22:05.5386055Z mov.b32 %r837, %r830; 2026-02-21T09:22:05.5386218Z mov.b32 %r838, %r830; 2026-02-21T09:22:05.5386378Z mov.b32 %r839, %r830; 2026-02-21T09:22:05.5386671Z mov.b32 %r840, %r830; 2026-02-21T09:22:05.5386839Z mov.b32 %r841, %r830; 2026-02-21T09:22:05.5387000Z mov.b32 %r842, %r830; 2026-02-21T09:22:05.5387163Z mov.b32 %r843, %r830; 2026-02-21T09:22:05.5387335Z mov.b32 %r844, %r830; 2026-02-21T09:22:05.5387492Z mov.b32 %r845, %r830; 2026-02-21T09:22:05.5387658Z mov.b32 %r846, %r830; 2026-02-21T09:22:05.5387816Z mov.b32 %r847, %r830; 2026-02-21T09:22:05.5387977Z mov.b32 %r848, %r830; 2026-02-21T09:22:05.5388133Z mov.b32 %r849, %r830; 2026-02-21T09:22:05.5388370Z mov.b32 %r850, %r830; 2026-02-21T09:22:05.5388529Z mov.b32 %r851, %r830; 2026-02-21T09:22:05.5388699Z mov.b32 %r852, %r830; 2026-02-21T09:22:05.5388855Z mov.b32 %r853, %r830; 2026-02-21T09:22:05.5389018Z mov.b32 %r854, %r830; 2026-02-21T09:22:05.5389181Z mov.b32 %r855, %r830; 2026-02-21T09:22:05.5389337Z mov.b32 %r856, %r830; 2026-02-21T09:22:05.5389502Z mov.b32 %r857, %r830; 2026-02-21T09:22:05.5389660Z mov.b32 %r858, %r830; 2026-02-21T09:22:05.5389821Z mov.b32 %r859, %r830; 2026-02-21T09:22:05.5389974Z mov.b32 %r860, %r830; 2026-02-21T09:22:05.5390135Z mov.b32 %r861, %r830; 2026-02-21T09:22:05.5390352Z $L__BB0_1: // =>This Inner Loop Header: Depth=1 2026-02-21T09:22:05.5390635Z add.s64 %rd49, %rd49, 16; 2026-02-21T09:22:05.5390822Z setp.lt.u64 %p9, %rd49, 416; 2026-02-21T09:22:05.5391008Z add.s32 %r972, %r1082, 1; 2026-02-21T09:22:05.5391195Z setp.gt.s32 %p10, %r972, 5; 2026-02-21T09:22:05.5391386Z selp.b32 %r1082, 0, %r972, %p10; 2026-02-21T09:22:05.5391746Z .loc 1 42 76 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:76 2026-02-21T09:22:05.5392125Z cp.async.wait_group 10; 2026-02-21T09:22:05.5392323Z bar.sync 0; 2026-02-21T09:22:05.5392480Z shl.b32 %r973, %r1082, 13; 
2026-02-21T09:22:05.5392672Z add.s32 %r975, %r142, %r973; 2026-02-21T09:22:05.5393013Z .loc 1 46 28 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:46:28 2026-02-21T09:22:05.5393375Z add.s32 %r976, %r975, %r10; 2026-02-21T09:22:05.5393569Z ld.shared.b16 %rs3, [%r976]; 2026-02-21T09:22:05.5393761Z ld.shared.b16 %rs4, [%r976+256]; 2026-02-21T09:22:05.5393967Z ld.shared.b16 %rs5, [%r976+16]; 2026-02-21T09:22:05.5394160Z ld.shared.b16 %rs6, [%r976+272]; 2026-02-21T09:22:05.5394352Z add.s32 %r977, %r975, %r11; 2026-02-21T09:22:05.5394533Z ld.shared.b16 %rs7, [%r977]; 2026-02-21T09:22:05.5394720Z ld.shared.b16 %rs8, [%r977+256]; 2026-02-21T09:22:05.5394913Z ld.shared.b16 %rs9, [%r977+16]; 2026-02-21T09:22:05.5395114Z ld.shared.b16 %rs10, [%r977+272]; 2026-02-21T09:22:05.5395464Z cvt.f32.bf16 %r492, %rs3; 2026-02-21T09:22:05.5395647Z cvt.f32.bf16 %r493, %rs4; 2026-02-21T09:22:05.5395830Z cvt.f32.bf16 %r494, %rs7; 2026-02-21T09:22:05.5396013Z cvt.f32.bf16 %r495, %rs8; 2026-02-21T09:22:05.5396192Z cvt.f32.bf16 %r560, %rs5; 2026-02-21T09:22:05.5396364Z cvt.f32.bf16 %r561, %rs6; 2026-02-21T09:22:05.5396700Z cvt.f32.bf16 %r562, %rs9; 2026-02-21T09:22:05.5396878Z cvt.f32.bf16 %r563, %rs10; 2026-02-21T09:22:05.5397215Z .loc 1 48 30 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:48:30 2026-02-21T09:22:05.5397588Z cvt.s64.s32 %rd41, %r1080; 2026-02-21T09:22:05.5397771Z add.s64 %rd28, %rd8, %rd41; 2026-02-21T09:22:05.5398188Z .loc 1 48 83 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:48:83 2026-02-21T09:22:05.5398550Z // begin inline asm 2026-02-21T09:22:05.5398716Z mov.u64 %rd27, 0x0; 2026-02-21T09:22:05.5398962Z createpolicy.fractional.L2::evict_first.b64 %rd27, 1.0; 2026-02-21T09:22:05.5399229Z // end inline asm 2026-02-21T09:22:05.5399383Z // begin inline asm 2026-02-21T09:22:05.5399544Z mov.u16 %rs1, 0x0; 2026-02-21T09:22:05.5399800Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs1 }, [ %rd28 + 0 ], %rd27; 2026-02-21T09:22:05.5400180Z // 
end inline asm 2026-02-21T09:22:05.5400495Z .loc 1 56 24 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:56:24 2026-02-21T09:22:05.5400851Z st.shared.b8 [%r14], %rs1; 2026-02-21T09:22:05.5401027Z bar.sync 0; 2026-02-21T09:22:05.5401191Z ld.shared.v2.b8 {%rs11, %rs12}, [%r15]; 2026-02-21T09:22:05.5401553Z .loc 1 51 24 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:51:24 2026-02-21T09:22:05.5401914Z shl.b16 %rs13, %rs11, 4; 2026-02-21T09:22:05.5402091Z shl.b16 %rs14, %rs12, 4; 2026-02-21T09:22:05.5402420Z .loc 1 66 54 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:66:54 2026-02-21T09:22:05.5402783Z selp.b16 %rs15, %rs13, %rs11, %p8; 2026-02-21T09:22:05.5402992Z cvt.s16.s8 %rs16, %rs15; 2026-02-21T09:22:05.5403180Z shr.s16 %rs17, %rs16, 4; 2026-02-21T09:22:05.5403363Z selp.b16 %rs18, %rs14, %rs12, %p8; 2026-02-21T09:22:05.5403557Z cvt.s16.s8 %rs19, %rs18; 2026-02-21T09:22:05.5403736Z shr.s16 %rs20, %rs19, 4; 2026-02-21T09:22:05.5404056Z .loc 1 71 28 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:71:28 2026-02-21T09:22:05.5404413Z cvt.rn.f32.s16 %r978, %rs17; 2026-02-21T09:22:05.5404617Z cvt.rn.f32.s16 %r979, %rs20; 2026-02-21T09:22:05.5404796Z bar.sync 0; 2026-02-21T09:22:05.5404955Z st.shared.b32 [%r16], %r978; 2026-02-21T09:22:05.5405139Z st.shared.b32 [%r17], %r979; 2026-02-21T09:22:05.5405395Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r830}; 2026-02-21T09:22:05.5405666Z bar.sync 0; 2026-02-21T09:22:05.5405826Z // begin inline asm 2026-02-21T09:22:05.5406078Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r360, %r496}, [%r198]; 2026-02-21T09:22:05.5406362Z // end inline asm 2026-02-21T09:22:05.5406649Z bar.sync 0; 2026-02-21T09:22:05.5406856Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r832}; 2026-02-21T09:22:05.5407123Z bar.sync 0; 2026-02-21T09:22:05.5407262Z // begin inline asm 2026-02-21T09:22:05.5407505Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r362, %r498}, [%r198]; 2026-02-21T09:22:05.5407789Z // 
end inline asm 2026-02-21T09:22:05.5407932Z bar.sync 0; 2026-02-21T09:22:05.5408141Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r831}; 2026-02-21T09:22:05.5408397Z bar.sync 0; 2026-02-21T09:22:05.5408543Z // begin inline asm 2026-02-21T09:22:05.5408791Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r361, %r497}, [%r198]; 2026-02-21T09:22:05.5409077Z // end inline asm 2026-02-21T09:22:05.5409224Z bar.sync 0; 2026-02-21T09:22:05.5409437Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r833}; 2026-02-21T09:22:05.5409700Z bar.sync 0; 2026-02-21T09:22:05.5409838Z // begin inline asm 2026-02-21T09:22:05.5410243Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r363, %r499}, [%r198]; 2026-02-21T09:22:05.5410523Z // end inline asm 2026-02-21T09:22:05.5410674Z bar.sync 0; 2026-02-21T09:22:05.5410883Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r834}; 2026-02-21T09:22:05.5411145Z bar.sync 0; 2026-02-21T09:22:05.5411285Z // begin inline asm 2026-02-21T09:22:05.5411521Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r364, %r500}, [%r198]; 2026-02-21T09:22:05.5411792Z // end inline asm 2026-02-21T09:22:05.5411940Z bar.sync 0; 2026-02-21T09:22:05.5412147Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r836}; 2026-02-21T09:22:05.5412398Z bar.sync 0; 2026-02-21T09:22:05.5412542Z // begin inline asm 2026-02-21T09:22:05.5412773Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r366, %r502}, [%r198]; 2026-02-21T09:22:05.5413151Z // end inline asm 2026-02-21T09:22:05.5413302Z bar.sync 0; 2026-02-21T09:22:05.5413510Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r835}; 2026-02-21T09:22:05.5413767Z bar.sync 0; 2026-02-21T09:22:05.5413912Z // begin inline asm 2026-02-21T09:22:05.5414150Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r365, %r501}, [%r198]; 2026-02-21T09:22:05.5414420Z // end inline asm 2026-02-21T09:22:05.5414567Z bar.sync 0; 2026-02-21T09:22:05.5414872Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r837}; 2026-02-21T09:22:05.5415139Z bar.sync 0; 
2026-02-21T09:22:05.5415275Z // begin inline asm 2026-02-21T09:22:05.5415512Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r367, %r503}, [%r198]; 2026-02-21T09:22:05.5415788Z // end inline asm 2026-02-21T09:22:05.5415936Z bar.sync 0; 2026-02-21T09:22:05.5416136Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r838}; 2026-02-21T09:22:05.5416398Z bar.sync 0; 2026-02-21T09:22:05.5416664Z // begin inline asm 2026-02-21T09:22:05.5416902Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r368, %r504}, [%r198]; 2026-02-21T09:22:05.5417185Z // end inline asm 2026-02-21T09:22:05.5417332Z bar.sync 0; 2026-02-21T09:22:05.5417542Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r840}; 2026-02-21T09:22:05.5417817Z bar.sync 0; 2026-02-21T09:22:05.5417960Z // begin inline asm 2026-02-21T09:22:05.5418198Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r370, %r506}, [%r198]; 2026-02-21T09:22:05.5418474Z // end inline asm 2026-02-21T09:22:05.5418624Z bar.sync 0; 2026-02-21T09:22:05.5418828Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r839}; 2026-02-21T09:22:05.5419090Z bar.sync 0; 2026-02-21T09:22:05.5419232Z // begin inline asm 2026-02-21T09:22:05.5419462Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r369, %r505}, [%r198]; 2026-02-21T09:22:05.5419739Z // end inline asm 2026-02-21T09:22:05.5419878Z bar.sync 0; 2026-02-21T09:22:05.5420086Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r841}; 2026-02-21T09:22:05.5420338Z bar.sync 0; 2026-02-21T09:22:05.5420499Z // begin inline asm 2026-02-21T09:22:05.5420731Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r371, %r507}, [%r198]; 2026-02-21T09:22:05.5421015Z // end inline asm 2026-02-21T09:22:05.5421161Z bar.sync 0; 2026-02-21T09:22:05.5421368Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r842}; 2026-02-21T09:22:05.5421623Z bar.sync 0; 2026-02-21T09:22:05.5421761Z // begin inline asm 2026-02-21T09:22:05.5421999Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r372, %r508}, [%r198]; 2026-02-21T09:22:05.5422272Z // end inline asm 
2026-02-21T09:22:05.5422416Z bar.sync 0; 2026-02-21T09:22:05.5422624Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r844}; 2026-02-21T09:22:05.5422895Z bar.sync 0; 2026-02-21T09:22:05.5423035Z // begin inline asm 2026-02-21T09:22:05.5423271Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r374, %r510}, [%r198]; 2026-02-21T09:22:05.5423549Z // end inline asm 2026-02-21T09:22:05.5423693Z bar.sync 0; 2026-02-21T09:22:05.5423897Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r843}; 2026-02-21T09:22:05.5424152Z bar.sync 0; 2026-02-21T09:22:05.5424297Z // begin inline asm 2026-02-21T09:22:05.5424528Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r373, %r509}, [%r198]; 2026-02-21T09:22:05.5424968Z // end inline asm 2026-02-21T09:22:05.5425109Z bar.sync 0; 2026-02-21T09:22:05.5425316Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r845}; 2026-02-21T09:22:05.5425576Z bar.sync 0; 2026-02-21T09:22:05.5425716Z // begin inline asm 2026-02-21T09:22:05.5425952Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r375, %r511}, [%r198]; 2026-02-21T09:22:05.5426224Z // end inline asm 2026-02-21T09:22:05.5426384Z bar.sync 0; 2026-02-21T09:22:05.5426741Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r846}; 2026-02-21T09:22:05.5427013Z bar.sync 0; 2026-02-21T09:22:05.5427156Z // begin inline asm 2026-02-21T09:22:05.5427395Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r376, %r512}, [%r198]; 2026-02-21T09:22:05.5427668Z // end inline asm 2026-02-21T09:22:05.5427905Z bar.sync 0; 2026-02-21T09:22:05.5428116Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r848}; 2026-02-21T09:22:05.5428477Z bar.sync 0; 2026-02-21T09:22:05.5428626Z // begin inline asm 2026-02-21T09:22:05.5428857Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r378, %r514}, [%r198]; 2026-02-21T09:22:05.5429137Z // end inline asm 2026-02-21T09:22:05.5429281Z bar.sync 0; 2026-02-21T09:22:05.5429573Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r847}; 2026-02-21T09:22:05.5429835Z bar.sync 0; 
2026-02-21T09:22:05.5429982Z // begin inline asm 2026-02-21T09:22:05.5430211Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r377, %r513}, [%r198]; 2026-02-21T09:22:05.5430488Z // end inline asm 2026-02-21T09:22:05.5430644Z bar.sync 0; 2026-02-21T09:22:05.5430847Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r849}; 2026-02-21T09:22:05.5431106Z bar.sync 0; 2026-02-21T09:22:05.5431241Z // begin inline asm 2026-02-21T09:22:05.5431477Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r379, %r515}, [%r198]; 2026-02-21T09:22:05.5431747Z // end inline asm 2026-02-21T09:22:05.5431897Z bar.sync 0; 2026-02-21T09:22:05.5432094Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r850}; 2026-02-21T09:22:05.5432356Z bar.sync 0; 2026-02-21T09:22:05.5432496Z // begin inline asm 2026-02-21T09:22:05.5432723Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r380, %r516}, [%r198]; 2026-02-21T09:22:05.5432999Z // end inline asm 2026-02-21T09:22:05.5433140Z bar.sync 0; 2026-02-21T09:22:05.5433345Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r852}; 2026-02-21T09:22:05.5433596Z bar.sync 0; 2026-02-21T09:22:05.5433738Z // begin inline asm 2026-02-21T09:22:05.5433966Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r382, %r518}, [%r198]; 2026-02-21T09:22:05.5434262Z // end inline asm 2026-02-21T09:22:05.5434402Z bar.sync 0; 2026-02-21T09:22:05.5434610Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r851}; 2026-02-21T09:22:05.5434872Z bar.sync 0; 2026-02-21T09:22:05.5435014Z // begin inline asm 2026-02-21T09:22:05.5435248Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r381, %r517}, [%r198]; 2026-02-21T09:22:05.5435522Z // end inline asm 2026-02-21T09:22:05.5435669Z bar.sync 0; 2026-02-21T09:22:05.5435872Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r853}; 2026-02-21T09:22:05.5436127Z bar.sync 0; 2026-02-21T09:22:05.5436264Z // begin inline asm 2026-02-21T09:22:05.5436633Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r383, %r519}, [%r198]; 2026-02-21T09:22:05.5436923Z // end inline asm 
2026-02-21T09:22:05.5437064Z bar.sync 0; 2026-02-21T09:22:05.5437268Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r854}; 2026-02-21T09:22:05.5437521Z bar.sync 0; 2026-02-21T09:22:05.5437665Z // begin inline asm 2026-02-21T09:22:05.5437895Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r384, %r520}, [%r198]; 2026-02-21T09:22:05.5438172Z // end inline asm 2026-02-21T09:22:05.5438313Z bar.sync 0; 2026-02-21T09:22:05.5438526Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r856}; 2026-02-21T09:22:05.5438781Z bar.sync 0; 2026-02-21T09:22:05.5438928Z // begin inline asm 2026-02-21T09:22:05.5439167Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r386, %r522}, [%r198]; 2026-02-21T09:22:05.5439601Z // end inline asm 2026-02-21T09:22:05.5439748Z bar.sync 0; 2026-02-21T09:22:05.5439950Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r855}; 2026-02-21T09:22:05.5440214Z bar.sync 0; 2026-02-21T09:22:05.5440355Z // begin inline asm 2026-02-21T09:22:05.5440594Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r385, %r521}, [%r198]; 2026-02-21T09:22:05.5440869Z // end inline asm 2026-02-21T09:22:05.5441014Z bar.sync 0; 2026-02-21T09:22:05.5441222Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r857}; 2026-02-21T09:22:05.5441481Z bar.sync 0; 2026-02-21T09:22:05.5441624Z // begin inline asm 2026-02-21T09:22:05.5441871Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r387, %r523}, [%r198]; 2026-02-21T09:22:05.5442156Z // end inline asm 2026-02-21T09:22:05.5442297Z bar.sync 0; 2026-02-21T09:22:05.5442587Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r858}; 2026-02-21T09:22:05.5442847Z bar.sync 0; 2026-02-21T09:22:05.5442989Z // begin inline asm 2026-02-21T09:22:05.5443226Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r388, %r524}, [%r198]; 2026-02-21T09:22:05.5443504Z // end inline asm 2026-02-21T09:22:05.5443648Z bar.sync 0; 2026-02-21T09:22:05.5443848Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r860}; 2026-02-21T09:22:05.5444184Z bar.sync 0; 
2026-02-21T09:22:05.5444328Z // begin inline asm 2026-02-21T09:22:05.5444565Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r390, %r526}, [%r198]; 2026-02-21T09:22:05.5444838Z // end inline asm 2026-02-21T09:22:05.5444986Z bar.sync 0; 2026-02-21T09:22:05.5445186Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r859}; 2026-02-21T09:22:05.5445442Z bar.sync 0; 2026-02-21T09:22:05.5445587Z // begin inline asm 2026-02-21T09:22:05.5445818Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r389, %r525}, [%r198]; 2026-02-21T09:22:05.5446100Z // end inline asm 2026-02-21T09:22:05.5446240Z bar.sync 0; 2026-02-21T09:22:05.5446582Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r699], {%r861}; 2026-02-21T09:22:05.5446852Z bar.sync 0; 2026-02-21T09:22:05.5446999Z // begin inline asm 2026-02-21T09:22:05.5447231Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r391, %r527}, [%r198]; 2026-02-21T09:22:05.5447511Z // end inline asm 2026-02-21T09:22:05.5447652Z $L__tmp1: 2026-02-21T09:22:05.5448023Z .loc 2 291 36 // standard.py:291:36 @[ cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:78:36 ] 2026-02-21T09:22:05.5448450Z // begin inline asm 2026-02-21T09:22:05.5448643Z fence.proxy.async.shared::cta; 2026-02-21T09:22:05.5448842Z // end inline asm 2026-02-21T09:22:05.5449009Z shfl.sync.idx.b32 %r980, %r3, 0, 31, -1; 2026-02-21T09:22:05.5449241Z wgmma.fence.sync.aligned; 2026-02-21T09:22:05.5449431Z mov.pred %p2, -1; 2026-02-21T09:22:05.5449594Z // begin inline asm 2026-02-21T09:22:05.5450351Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r360,%r361,%r362,%r363,%r364,%r365,%r366,%r367,%r368,%r369,%r370,%r371,%r372,%r373,%r374,%r375,%r376,%r377,%r378,%r379,%r380,%r381,%r382,%r383,%r384,%r385,%r386,%r387,%r388,%r389,%r390,%r391}, {%r492,%r493,%r494,%r495}, %rd30, %p2, 1, 1; 2026-02-21T09:22:05.5451164Z // end inline asm 2026-02-21T09:22:05.5451319Z // begin inline asm 2026-02-21T09:22:05.5452053Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r360,%r361,%r362,%r363,%r364,%r365,%r366,%r367,%r368,%r369,%r370,%r371,%r372,%r373,%r374,%r375,%r376,%r377,%r378,%r379,%r380,%r381,%r382,%r383,%r384,%r385,%r386,%r387,%r388,%r389,%r390,%r391}, {%r560,%r561,%r562,%r563}, %rd31, %p2, 1, 1; 2026-02-21T09:22:05.5452844Z // end inline asm 2026-02-21T09:22:05.5452998Z // begin inline asm 2026-02-21T09:22:05.5453735Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r496,%r497,%r498,%r499,%r500,%r501,%r502,%r503,%r504,%r505,%r506,%r507,%r508,%r509,%r510,%r511,%r512,%r513,%r514,%r515,%r516,%r517,%r518,%r519,%r520,%r521,%r522,%r523,%r524,%r525,%r526,%r527}, {%r492,%r493,%r494,%r495}, %rd32, %p2, 1, 1; 2026-02-21T09:22:05.5454520Z // end inline asm 2026-02-21T09:22:05.5454666Z // begin inline asm 2026-02-21T09:22:05.5455578Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r496,%r497,%r498,%r499,%r500,%r501,%r502,%r503,%r504,%r505,%r506,%r507,%r508,%r509,%r510,%r511,%r512,%r513,%r514,%r515,%r516,%r517,%r518,%r519,%r520,%r521,%r522,%r523,%r524,%r525,%r526,%r527}, {%r560,%r561,%r562,%r563}, %rd33, %p2, 1, 1; 2026-02-21T09:22:05.5456363Z // end inline asm 2026-02-21T09:22:05.5456666Z wgmma.commit_group.sync.aligned; 2026-02-21T09:22:05.5456882Z mov.b32 %r931, 0; 2026-02-21T09:22:05.5457036Z mov.b32 %r629, %r931; 2026-02-21T09:22:05.5457202Z mov.b32 %r630, %r931; 2026-02-21T09:22:05.5457360Z mov.b32 %r628, %r695; 2026-02-21T09:22:05.5457519Z // begin inline asm 2026-02-21T09:22:05.5458588Z // wait for regs: %r360,%r361,%r362,%r363,%r364,%r365,%r366,%r367,%r368,%r369,%r370,%r371,%r372,%r373,%r374,%r375,%r376,%r377,%r378,%r379,%r380,%r381,%r382,%r383,%r384,%r385,%r386,%r387,%r388,%r389,%r390,%r391,%r496,%r497,%r498,%r499,%r500,%r501,%r502,%r503,%r504,%r505,%r506,%r507,%r508,%r509,%r510,%r511,%r512,%r513,%r514,%r515,%r516,%r517,%r518,%r519,%r520,%r521,%r522,%r523,%r524,%r525,%r526,%r527,%r628,%r629,%r630 2026-02-21T09:22:05.5459626Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:22:05.5459823Z // end inline asm 
2026-02-21T09:22:05.5459966Z $L__tmp2: 2026-02-21T09:22:05.5460329Z .loc 1 42 76 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:76 2026-02-21T09:22:05.5460703Z add.s32 %r981, %r975, 49152; 2026-02-21T09:22:05.5461026Z .loc 1 46 28 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:46:28 2026-02-21T09:22:05.5461399Z add.s32 %r982, %r981, %r10; 2026-02-21T09:22:05.5461590Z ld.shared.b16 %rs21, [%r982]; 2026-02-21T09:22:05.5461787Z ld.shared.b16 %rs22, [%r982+256]; 2026-02-21T09:22:05.5461987Z ld.shared.b16 %rs23, [%r982+16]; 2026-02-21T09:22:05.5462184Z ld.shared.b16 %rs24, [%r982+272]; 2026-02-21T09:22:05.5462370Z add.s32 %r983, %r981, %r11; 2026-02-21T09:22:05.5462562Z ld.shared.b16 %rs25, [%r983]; 2026-02-21T09:22:05.5462755Z ld.shared.b16 %rs26, [%r983+256]; 2026-02-21T09:22:05.5462944Z ld.shared.b16 %rs27, [%r983+16]; 2026-02-21T09:22:05.5463136Z ld.shared.b16 %rs28, [%r983+272]; 2026-02-21T09:22:05.5463326Z cvt.f32.bf16 %r826, %rs21; 2026-02-21T09:22:05.5463513Z cvt.f32.bf16 %r827, %rs22; 2026-02-21T09:22:05.5463683Z cvt.f32.bf16 %r828, %rs25; 2026-02-21T09:22:05.5463858Z cvt.f32.bf16 %r829, %rs26; 2026-02-21T09:22:05.5464031Z cvt.f32.bf16 %r894, %rs23; 2026-02-21T09:22:05.5464206Z cvt.f32.bf16 %r895, %rs24; 2026-02-21T09:22:05.5464373Z cvt.f32.bf16 %r896, %rs27; 2026-02-21T09:22:05.5464550Z cvt.f32.bf16 %r897, %rs28; 2026-02-21T09:22:05.5464873Z .loc 1 48 30 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:48:30 2026-02-21T09:22:05.5465229Z add.s32 %r984, %r1080, 65536; 2026-02-21T09:22:05.5465415Z cvt.s64.s32 %rd42, %r984; 2026-02-21T09:22:05.5465589Z add.s64 %rd35, %rd8, %rd42; 2026-02-21T09:22:05.5465914Z .loc 1 48 83 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:48:83 2026-02-21T09:22:05.5466267Z // begin inline asm 2026-02-21T09:22:05.5466442Z mov.u64 %rd34, 0x0; 2026-02-21T09:22:05.5466808Z createpolicy.fractional.L2::evict_first.b64 %rd34, 1.0; 2026-02-21T09:22:05.5467068Z // end inline asm 
2026-02-21T09:22:05.5467226Z // begin inline asm 2026-02-21T09:22:05.5467378Z mov.u16 %rs2, 0x0; 2026-02-21T09:22:05.5467633Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs2 }, [ %rd35 + 0 ], %rd34; 2026-02-21T09:22:05.5467931Z // end inline asm 2026-02-21T09:22:05.5468309Z .loc 1 56 24 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:56:24 2026-02-21T09:22:05.5468680Z bar.sync 0; 2026-02-21T09:22:05.5468840Z st.shared.b8 [%r14], %rs2; 2026-02-21T09:22:05.5469019Z bar.sync 0; 2026-02-21T09:22:05.5469181Z ld.shared.v2.b8 {%rs29, %rs30}, [%r15]; 2026-02-21T09:22:05.5469543Z .loc 1 51 24 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:51:24 2026-02-21T09:22:05.5470055Z shl.b16 %rs31, %rs29, 4; 2026-02-21T09:22:05.5470228Z shl.b16 %rs32, %rs30, 4; 2026-02-21T09:22:05.5470538Z .loc 1 66 54 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:66:54 2026-02-21T09:22:05.5470904Z selp.b16 %rs33, %rs31, %rs29, %p8; 2026-02-21T09:22:05.5471100Z cvt.s16.s8 %rs34, %rs33; 2026-02-21T09:22:05.5471289Z shr.s16 %rs35, %rs34, 4; 2026-02-21T09:22:05.5471466Z selp.b16 %rs36, %rs32, %rs30, %p8; 2026-02-21T09:22:05.5471658Z cvt.s16.s8 %rs37, %rs36; 2026-02-21T09:22:05.5471828Z shr.s16 %rs38, %rs37, 4; 2026-02-21T09:22:05.5472134Z .loc 1 71 28 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:71:28 2026-02-21T09:22:05.5472493Z cvt.rn.f32.s16 %r985, %rs35; 2026-02-21T09:22:05.5472782Z cvt.rn.f32.s16 %r986, %rs38; 2026-02-21T09:22:05.5472963Z bar.sync 0; 2026-02-21T09:22:05.5473109Z st.shared.b32 [%r16], %r985; 2026-02-21T09:22:05.5473304Z st.shared.b32 [%r17], %r986; 2026-02-21T09:22:05.5473488Z $L__tmp3: 2026-02-21T09:22:05.5473841Z .loc 2 291 36 // standard.py:291:36 @[ cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:78:36 ] 2026-02-21T09:22:05.5474418Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r360, %r496}; 2026-02-21T09:22:05.5474704Z bar.sync 0; 2026-02-21T09:22:05.5474851Z // begin inline asm 2026-02-21T09:22:05.5475074Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r830}, [%r699]; 2026-02-21T09:22:05.5475344Z // end inline asm 2026-02-21T09:22:05.5475488Z bar.sync 0; 2026-02-21T09:22:05.5475715Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r362, %r498}; 2026-02-21T09:22:05.5475997Z bar.sync 0; 2026-02-21T09:22:05.5476135Z // begin inline asm 2026-02-21T09:22:05.5476362Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r832}, [%r699]; 2026-02-21T09:22:05.5476763Z // end inline asm 2026-02-21T09:22:05.5476917Z bar.sync 0; 2026-02-21T09:22:05.5477135Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r361, %r497}; 2026-02-21T09:22:05.5477418Z bar.sync 0; 2026-02-21T09:22:05.5477560Z // begin inline asm 2026-02-21T09:22:05.5477786Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r831}, [%r699]; 2026-02-21T09:22:05.5478050Z // end inline asm 2026-02-21T09:22:05.5478197Z bar.sync 0; 2026-02-21T09:22:05.5478419Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r363, %r499}; 2026-02-21T09:22:05.5478689Z bar.sync 0; 2026-02-21T09:22:05.5478832Z // begin inline asm 2026-02-21T09:22:05.5479049Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r833}, [%r699]; 2026-02-21T09:22:05.5479325Z // end inline asm 2026-02-21T09:22:05.5479474Z bar.sync 0; 2026-02-21T09:22:05.5479700Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r364, %r500}; 2026-02-21T09:22:05.5479974Z bar.sync 0; 2026-02-21T09:22:05.5480122Z // begin inline asm 2026-02-21T09:22:05.5480346Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r834}, [%r699]; 2026-02-21T09:22:05.5480605Z // end inline asm 2026-02-21T09:22:05.5480759Z bar.sync 0; 2026-02-21T09:22:05.5480981Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r366, %r502}; 2026-02-21T09:22:05.5481261Z bar.sync 0; 2026-02-21T09:22:05.5481398Z // begin inline asm 2026-02-21T09:22:05.5481621Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r836}, [%r699]; 2026-02-21T09:22:05.5481885Z // end inline asm 2026-02-21T09:22:05.5482035Z bar.sync 0; 2026-02-21T09:22:05.5482259Z 
stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r365, %r501}; 2026-02-21T09:22:05.5482536Z bar.sync 0; 2026-02-21T09:22:05.5482680Z // begin inline asm 2026-02-21T09:22:05.5482912Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r835}, [%r699]; 2026-02-21T09:22:05.5483176Z // end inline asm 2026-02-21T09:22:05.5483317Z bar.sync 0; 2026-02-21T09:22:05.5483543Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r367, %r503}; 2026-02-21T09:22:05.5483813Z bar.sync 0; 2026-02-21T09:22:05.5483959Z // begin inline asm 2026-02-21T09:22:05.5484172Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r837}, [%r699]; 2026-02-21T09:22:05.5484580Z // end inline asm 2026-02-21T09:22:05.5484728Z bar.sync 0; 2026-02-21T09:22:05.5484946Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r368, %r504}; 2026-02-21T09:22:05.5485223Z bar.sync 0; 2026-02-21T09:22:05.5485361Z // begin inline asm 2026-02-21T09:22:05.5485583Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r838}, [%r699]; 2026-02-21T09:22:05.5485853Z // end inline asm 2026-02-21T09:22:05.5486007Z bar.sync 0; 2026-02-21T09:22:05.5486226Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r370, %r506}; 2026-02-21T09:22:05.5486645Z bar.sync 0; 2026-02-21T09:22:05.5486789Z // begin inline asm 2026-02-21T09:22:05.5487006Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r840}, [%r699]; 2026-02-21T09:22:05.5487270Z // end inline asm 2026-02-21T09:22:05.5487411Z bar.sync 0; 2026-02-21T09:22:05.5487714Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r369, %r505}; 2026-02-21T09:22:05.5488002Z bar.sync 0; 2026-02-21T09:22:05.5488150Z // begin inline asm 2026-02-21T09:22:05.5488370Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r839}, [%r699]; 2026-02-21T09:22:05.5488632Z // end inline asm 2026-02-21T09:22:05.5488776Z bar.sync 0; 2026-02-21T09:22:05.5489001Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r371, %r507}; 2026-02-21T09:22:05.5489347Z bar.sync 0; 2026-02-21T09:22:05.5489490Z // begin inline asm 2026-02-21T09:22:05.5489712Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r841}, [%r699]; 2026-02-21T09:22:05.5489972Z // end inline asm 2026-02-21T09:22:05.5490121Z bar.sync 0; 2026-02-21T09:22:05.5490341Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r372, %r508}; 2026-02-21T09:22:05.5490638Z bar.sync 0; 2026-02-21T09:22:05.5490781Z // begin inline asm 2026-02-21T09:22:05.5491002Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r842}, [%r699]; 2026-02-21T09:22:05.5491270Z // end inline asm 2026-02-21T09:22:05.5491418Z bar.sync 0; 2026-02-21T09:22:05.5491656Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r374, %r510}; 2026-02-21T09:22:05.5491945Z bar.sync 0; 2026-02-21T09:22:05.5492097Z // begin inline asm 2026-02-21T09:22:05.5492324Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r844}, [%r699]; 2026-02-21T09:22:05.5492596Z // end inline asm 2026-02-21T09:22:05.5492742Z bar.sync 0; 2026-02-21T09:22:05.5492972Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r373, %r509}; 2026-02-21T09:22:05.5493249Z bar.sync 0; 2026-02-21T09:22:05.5493395Z // begin inline asm 2026-02-21T09:22:05.5493622Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r843}, [%r699]; 2026-02-21T09:22:05.5493885Z // end inline asm 2026-02-21T09:22:05.5494032Z bar.sync 0; 2026-02-21T09:22:05.5494252Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r375, %r511}; 2026-02-21T09:22:05.5494528Z bar.sync 0; 2026-02-21T09:22:05.5494672Z // begin inline asm 2026-02-21T09:22:05.5494911Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r845}, [%r699]; 2026-02-21T09:22:05.5495171Z // end inline asm 2026-02-21T09:22:05.5495318Z bar.sync 0; 2026-02-21T09:22:05.5495540Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r376, %r512}; 2026-02-21T09:22:05.5495821Z bar.sync 0; 2026-02-21T09:22:05.5495966Z // begin inline asm 2026-02-21T09:22:05.5496184Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r846}, [%r699]; 2026-02-21T09:22:05.5496576Z // end inline asm 2026-02-21T09:22:05.5496725Z bar.sync 0; 2026-02-21T09:22:05.5496948Z 
stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r378, %r514}; 2026-02-21T09:22:05.5497236Z bar.sync 0; 2026-02-21T09:22:05.5497384Z // begin inline asm 2026-02-21T09:22:05.5497603Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r848}, [%r699]; 2026-02-21T09:22:05.5497864Z // end inline asm 2026-02-21T09:22:05.5498013Z bar.sync 0; 2026-02-21T09:22:05.5498231Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r377, %r513}; 2026-02-21T09:22:05.5498509Z bar.sync 0; 2026-02-21T09:22:05.5498652Z // begin inline asm 2026-02-21T09:22:05.5498877Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r847}, [%r699]; 2026-02-21T09:22:05.5499221Z // end inline asm 2026-02-21T09:22:05.5499443Z bar.sync 0; 2026-02-21T09:22:05.5499668Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r379, %r515}; 2026-02-21T09:22:05.5499949Z bar.sync 0; 2026-02-21T09:22:05.5500090Z // begin inline asm 2026-02-21T09:22:05.5500316Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r849}, [%r699]; 2026-02-21T09:22:05.5500581Z // end inline asm 2026-02-21T09:22:05.5500721Z bar.sync 0; 2026-02-21T09:22:05.5500938Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r380, %r516}; 2026-02-21T09:22:05.5501210Z bar.sync 0; 2026-02-21T09:22:05.5501355Z // begin inline asm 2026-02-21T09:22:05.5501575Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r850}, [%r699]; 2026-02-21T09:22:05.5501840Z // end inline asm 2026-02-21T09:22:05.5501983Z bar.sync 0; 2026-02-21T09:22:05.5502293Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r382, %r518}; 2026-02-21T09:22:05.5502582Z bar.sync 0; 2026-02-21T09:22:05.5502731Z // begin inline asm 2026-02-21T09:22:05.5502953Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r852}, [%r699]; 2026-02-21T09:22:05.5503215Z // end inline asm 2026-02-21T09:22:05.5503360Z bar.sync 0; 2026-02-21T09:22:05.5503575Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r381, %r517}; 2026-02-21T09:22:05.5503853Z bar.sync 0; 2026-02-21T09:22:05.5504056Z // begin inline asm 2026-02-21T09:22:05.5504280Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r851}, [%r699]; 2026-02-21T09:22:05.5504544Z // end inline asm 2026-02-21T09:22:05.5504683Z bar.sync 0; 2026-02-21T09:22:05.5504905Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r383, %r519}; 2026-02-21T09:22:05.5505174Z bar.sync 0; 2026-02-21T09:22:05.5505317Z // begin inline asm 2026-02-21T09:22:05.5505532Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r853}, [%r699]; 2026-02-21T09:22:05.5505804Z // end inline asm 2026-02-21T09:22:05.5505953Z bar.sync 0; 2026-02-21T09:22:05.5506184Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r384, %r520}; 2026-02-21T09:22:05.5506584Z bar.sync 0; 2026-02-21T09:22:05.5506739Z // begin inline asm 2026-02-21T09:22:05.5506961Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r854}, [%r699]; 2026-02-21T09:22:05.5507219Z // end inline asm 2026-02-21T09:22:05.5507368Z bar.sync 0; 2026-02-21T09:22:05.5507589Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r386, %r522}; 2026-02-21T09:22:05.5507867Z bar.sync 0; 2026-02-21T09:22:05.5508006Z // begin inline asm 2026-02-21T09:22:05.5508330Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r856}, [%r699]; 2026-02-21T09:22:05.5508599Z // end inline asm 2026-02-21T09:22:05.5508750Z bar.sync 0; 2026-02-21T09:22:05.5508971Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r385, %r521}; 2026-02-21T09:22:05.5509248Z bar.sync 0; 2026-02-21T09:22:05.5509390Z // begin inline asm 2026-02-21T09:22:05.5509607Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r855}, [%r699]; 2026-02-21T09:22:05.5509873Z // end inline asm 2026-02-21T09:22:05.5510022Z bar.sync 0; 2026-02-21T09:22:05.5510251Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r387, %r523}; 2026-02-21T09:22:05.5510526Z bar.sync 0; 2026-02-21T09:22:05.5510672Z // begin inline asm 2026-02-21T09:22:05.5510889Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r857}, [%r699]; 2026-02-21T09:22:05.5511164Z // end inline asm 2026-02-21T09:22:05.5511313Z bar.sync 0; 2026-02-21T09:22:05.5511535Z 
stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r388, %r524}; 2026-02-21T09:22:05.5511811Z bar.sync 0; 2026-02-21T09:22:05.5511949Z // begin inline asm 2026-02-21T09:22:05.5512178Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r858}, [%r699]; 2026-02-21T09:22:05.5512438Z // end inline asm 2026-02-21T09:22:05.5512594Z bar.sync 0; 2026-02-21T09:22:05.5512815Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r390, %r526}; 2026-02-21T09:22:05.5513096Z bar.sync 0; 2026-02-21T09:22:05.5513239Z // begin inline asm 2026-02-21T09:22:05.5513465Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r860}, [%r699]; 2026-02-21T09:22:05.5513729Z // end inline asm 2026-02-21T09:22:05.5514028Z bar.sync 0; 2026-02-21T09:22:05.5514262Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r389, %r525}; 2026-02-21T09:22:05.5514546Z bar.sync 0; 2026-02-21T09:22:05.5514692Z // begin inline asm 2026-02-21T09:22:05.5514913Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r859}, [%r699]; 2026-02-21T09:22:05.5515177Z // end inline asm 2026-02-21T09:22:05.5515319Z bar.sync 0; 2026-02-21T09:22:05.5515542Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r198], {%r391, %r527}; 2026-02-21T09:22:05.5515820Z bar.sync 0; 2026-02-21T09:22:05.5515959Z // begin inline asm 2026-02-21T09:22:05.5516182Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r861}, [%r699]; 2026-02-21T09:22:05.5516436Z // end inline asm 2026-02-21T09:22:05.5516719Z // begin inline asm 2026-02-21T09:22:05.5516898Z fence.proxy.async.shared::cta; 2026-02-21T09:22:05.5517189Z // end inline asm 2026-02-21T09:22:05.5517359Z wgmma.fence.sync.aligned; 2026-02-21T09:22:05.5517544Z shl.b32 %r987, %r980, 8; 2026-02-21T09:22:05.5517713Z and.b32 %r988, %r987, 4096; 2026-02-21T09:22:05.5517902Z add.s32 %r989, %r988, %r695; 2026-02-21T09:22:05.5518092Z bfe.u32 %r990, %r989, 4, 14; 2026-02-21T09:22:05.5518269Z cvt.u64.u32 %rd43, %r990; 2026-02-21T09:22:05.5518466Z or.b64 %rd37, %rd43, -9223371899382267904; 2026-02-21T09:22:05.5518757Z // begin inline asm 
2026-02-21T09:22:05.5519523Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r830,%r831,%r832,%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848,%r849,%r850,%r851,%r852,%r853,%r854,%r855,%r856,%r857,%r858,%r859,%r860,%r861}, {%r826,%r827,%r828,%r829}, %rd37, %p2, 1, 1; 2026-02-21T09:22:05.5520321Z // end inline asm 2026-02-21T09:22:05.5520472Z add.s32 %r991, %r989, 32; 2026-02-21T09:22:05.5520648Z bfe.u32 %r992, %r991, 4, 14; 2026-02-21T09:22:05.5520826Z cvt.u64.u32 %rd44, %r992; 2026-02-21T09:22:05.5521017Z or.b64 %rd38, %rd44, -9223371899382267904; 2026-02-21T09:22:05.5521228Z // begin inline asm 2026-02-21T09:22:05.5521971Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r830,%r831,%r832,%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848,%r849,%r850,%r851,%r852,%r853,%r854,%r855,%r856,%r857,%r858,%r859,%r860,%r861}, {%r894,%r895,%r896,%r897}, %rd38, %p2, 1, 1; 2026-02-21T09:22:05.5522778Z // end inline asm 2026-02-21T09:22:05.5522947Z wgmma.commit_group.sync.aligned; 2026-02-21T09:22:05.5523149Z mov.b32 %r932, %r931; 2026-02-21T09:22:05.5523314Z mov.b32 %r930, %r695; 2026-02-21T09:22:05.5523478Z // begin inline asm 2026-02-21T09:22:05.5524033Z // wait for regs: %r830,%r831,%r832,%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848,%r849,%r850,%r851,%r852,%r853,%r854,%r855,%r856,%r857,%r858,%r859,%r860,%r861,%r930,%r931,%r932 2026-02-21T09:22:05.5524659Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:22:05.5524851Z // end inline asm 2026-02-21T09:22:05.5525003Z $L__tmp4: 2026-02-21T09:22:05.5525305Z .loc 1 34 74 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:34:74 2026-02-21T09:22:05.5525675Z add.s32 %r993, %r1083, 1; 2026-02-21T09:22:05.5525863Z setp.gt.s32 %p11, %r993, 5; 2026-02-21T09:22:05.5526057Z selp.b32 %r1083, 0, %r993, %p11; 2026-02-21T09:22:05.5526407Z .loc 1 42 28 // 
cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:28 2026-02-21T09:22:05.5526913Z add.s32 %r994, %r1081, -16; 2026-02-21T09:22:05.5527108Z mad.wide.s32 %rd39, %r994, 2, %rd7; 2026-02-21T09:22:05.5527460Z .loc 1 42 76 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:76 2026-02-21T09:22:05.5527821Z shl.b32 %r995, %r1083, 13; 2026-02-21T09:22:05.5528006Z add.s32 %r968, %r94, %r995; 2026-02-21T09:22:05.5528187Z selp.b32 %r969, 8, 0, %p9; 2026-02-21T09:22:05.5528370Z // begin inline asm 2026-02-21T09:22:05.5528604Z cp.async.ca.shared.global [ %r968 + 0 ], [ %rd39 + 0 ], 0x8, %r969; 2026-02-21T09:22:05.5528888Z // end inline asm 2026-02-21T09:22:05.5529223Z cp.async.commit_group; 2026-02-21T09:22:05.5529553Z .loc 1 42 28 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:28 2026-02-21T09:22:05.5529924Z mad.wide.s32 %rd40, %r1081, 2, %rd7; 2026-02-21T09:22:05.5530267Z .loc 1 42 76 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:42:76 2026-02-21T09:22:05.5530626Z add.s32 %r970, %r96, %r995; 2026-02-21T09:22:05.5530801Z // begin inline asm 2026-02-21T09:22:05.5531034Z cp.async.ca.shared.global [ %r970 + 0 ], [ %rd40 + 0 ], 0x8, %r969; 2026-02-21T09:22:05.5531301Z // end inline asm 2026-02-21T09:22:05.5531460Z cp.async.commit_group; 2026-02-21T09:22:05.5531766Z .loc 1 34 74 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:34:74 2026-02-21T09:22:05.5532197Z add.s32 %r1081, %r1081, 32; 2026-02-21T09:22:05.5532387Z add.s32 %r1080, %r1080, 131072; 2026-02-21T09:22:05.5532578Z setp.lt.u64 %p12, %rd49, 496; 2026-02-21T09:22:05.5532768Z @%p12 bra $L__BB0_1; 2026-02-21T09:22:05.5532926Z // %bb.2: 2026-02-21T09:22:05.5533233Z .loc 1 27 28 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:27:28 2026-02-21T09:22:05.5533592Z or.b32 %r1032, %r4, %r13; 2026-02-21T09:22:05.5533981Z .loc 1 25 41 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:25:41 2026-02-21T09:22:05.5534342Z and.b32 %r1033, %r7, 120; 
2026-02-21T09:22:05.5534649Z .loc 1 25 28 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:25:28 2026-02-21T09:22:05.5535019Z or.b32 %r1034, %r1, %r1033; 2026-02-21T09:22:05.5535334Z .loc 1 34 74 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:34:74 2026-02-21T09:22:05.5535696Z cp.async.wait_group 0; 2026-02-21T09:22:05.5535869Z bar.sync 0; 2026-02-21T09:22:05.5536160Z .loc 1 81 24 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:81:24 2026-02-21T09:22:05.5536664Z cvt.rn.bf16x2.f32 %r1035, %r831, %r830; 2026-02-21T09:22:05.5536886Z cvt.rn.bf16x2.f32 %r1036, %r833, %r832; 2026-02-21T09:22:05.5537118Z cvt.rn.bf16x2.f32 %r1037, %r835, %r834; 2026-02-21T09:22:05.5537333Z cvt.rn.bf16x2.f32 %r1038, %r837, %r836; 2026-02-21T09:22:05.5537549Z cvt.rn.bf16x2.f32 %r1039, %r839, %r838; 2026-02-21T09:22:05.5537755Z cvt.rn.bf16x2.f32 %r1040, %r841, %r840; 2026-02-21T09:22:05.5537969Z cvt.rn.bf16x2.f32 %r1041, %r843, %r842; 2026-02-21T09:22:05.5538175Z cvt.rn.bf16x2.f32 %r1042, %r845, %r844; 2026-02-21T09:22:05.5538390Z cvt.rn.bf16x2.f32 %r1043, %r847, %r846; 2026-02-21T09:22:05.5538603Z cvt.rn.bf16x2.f32 %r1044, %r849, %r848; 2026-02-21T09:22:05.5538808Z cvt.rn.bf16x2.f32 %r1045, %r851, %r850; 2026-02-21T09:22:05.5539021Z cvt.rn.bf16x2.f32 %r1046, %r853, %r852; 2026-02-21T09:22:05.5539232Z cvt.rn.bf16x2.f32 %r1047, %r855, %r854; 2026-02-21T09:22:05.5539450Z cvt.rn.bf16x2.f32 %r1048, %r857, %r856; 2026-02-21T09:22:05.5539655Z cvt.rn.bf16x2.f32 %r1049, %r859, %r858; 2026-02-21T09:22:05.5539874Z cvt.rn.bf16x2.f32 %r1050, %r861, %r860; 2026-02-21T09:22:05.5540218Z .loc 1 27 28 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:27:28 2026-02-21T09:22:05.5540583Z shl.b32 %r1051, %r1032, 13; 2026-02-21T09:22:05.5540911Z .loc 1 82 39 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:82:39 2026-02-21T09:22:05.5541267Z or.b32 %r1052, %r1051, 524288; 2026-02-21T09:22:05.5541463Z or.b32 %r1053, %r1051, 1048576; 2026-02-21T09:22:05.5541648Z 
or.b32 %r1054, %r1051, 1572864; 2026-02-21T09:22:05.5541974Z .loc 1 82 46 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:82:46 2026-02-21T09:22:05.5542329Z add.s32 %r1055, %r1051, %r1034; 2026-02-21T09:22:05.5542540Z add.s32 %r1056, %r1052, %r1034; 2026-02-21T09:22:05.5542732Z add.s32 %r1057, %r1053, %r1034; 2026-02-21T09:22:05.5542914Z add.s32 %r1058, %r1054, %r1034; 2026-02-21T09:22:05.5543245Z .loc 1 82 18 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:82:18 2026-02-21T09:22:05.5543762Z mad.wide.s32 %rd45, %r1055, 2, %rd9; 2026-02-21T09:22:05.5543972Z mad.wide.s32 %rd46, %r1056, 2, %rd9; 2026-02-21T09:22:05.5544170Z mad.wide.s32 %rd47, %r1057, 2, %rd9; 2026-02-21T09:22:05.5544374Z mad.wide.s32 %rd48, %r1058, 2, %rd9; 2026-02-21T09:22:05.5544722Z .loc 1 82 77 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:82:77 2026-02-21T09:22:05.5545080Z shl.b32 %r1059, %r2, 5; 2026-02-21T09:22:05.5545252Z and.b32 %r1060, %r2, 24; 2026-02-21T09:22:05.5545422Z shr.u32 %r1061, %r2, 3; 2026-02-21T09:22:05.5545594Z and.b32 %r1062, %r1061, 64; 2026-02-21T09:22:05.5545773Z shl.b32 %r1063, %r1060, 4; 2026-02-21T09:22:05.5545953Z shl.b32 %r1064, %r5, 14; 2026-02-21T09:22:05.5546197Z and.b32 %r1065, %r1059, 15456; 2026-02-21T09:22:05.5546390Z and.b32 %r1066, %r12, 16; 2026-02-21T09:22:05.5546706Z or.b32 %r1067, %r1066, %r1064; 2026-02-21T09:22:05.5546901Z or.b32 %r1068, %r1065, %r1063; 2026-02-21T09:22:05.5547089Z xor.b32 %r1069, %r1068, %r1062; 2026-02-21T09:22:05.5547270Z or.b32 %r1070, %r1069, %r1067; 2026-02-21T09:22:05.5547455Z add.s32 %r1072, %r142, %r1070; 2026-02-21T09:22:05.5547763Z st.shared.v4.b32 [%r1072], {%r1035, %r1037, %r1039, %r1041}; 2026-02-21T09:22:05.5548098Z st.shared.v4.b32 [%r1072+512], {%r1036, %r1038, %r1040, %r1042}; 2026-02-21T09:22:05.5548455Z xor.b32 %r1073, %r1070, 32; 2026-02-21T09:22:05.5548638Z add.s32 %r1074, %r142, %r1073; 2026-02-21T09:22:05.5548866Z st.shared.v4.b32 [%r1074], {%r1043, %r1045, %r1047, %r1049}; 
2026-02-21T09:22:05.5549177Z st.shared.v4.b32 [%r1074+512], {%r1044, %r1046, %r1048, %r1050}; 2026-02-21T09:22:05.5549436Z bar.sync 0; 2026-02-21T09:22:05.5549583Z shl.b32 %r1075, %r1060, 11; 2026-02-21T09:22:05.5549766Z shl.b32 %r1076, %r5, 5; 2026-02-21T09:22:05.5549934Z and.b32 %r1077, %r12, 4080; 2026-02-21T09:22:05.5550113Z or.b32 %r1078, %r1075, %r1076; 2026-02-21T09:22:05.5550298Z xor.b32 %r1079, %r1078, %r1077; 2026-02-21T09:22:05.5550486Z add.s32 %r1000, %r142, %r1079; 2026-02-21T09:22:05.5550664Z // begin inline asm 2026-02-21T09:22:05.5550957Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1016, %r1017, %r1018, %r1019}, [%r1000]; 2026-02-21T09:22:05.5551298Z // end inline asm 2026-02-21T09:22:05.5551466Z add.s32 %r1005, %r1000, 4096; 2026-02-21T09:22:05.5551656Z // begin inline asm 2026-02-21T09:22:05.5551939Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1020, %r1021, %r1022, %r1023}, [%r1005]; 2026-02-21T09:22:05.5552274Z // end inline asm 2026-02-21T09:22:05.5552427Z add.s32 %r1010, %r1000, 8192; 2026-02-21T09:22:05.5552611Z // begin inline asm 2026-02-21T09:22:05.5552888Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1024, %r1025, %r1026, %r1027}, [%r1010]; 2026-02-21T09:22:05.5553220Z // end inline asm 2026-02-21T09:22:05.5553380Z add.s32 %r1015, %r1000, 12288; 2026-02-21T09:22:05.5553559Z // begin inline asm 2026-02-21T09:22:05.5553839Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1028, %r1029, %r1030, %r1031}, [%r1015]; 2026-02-21T09:22:05.5554171Z // end inline asm 2026-02-21T09:22:05.5554327Z // begin inline asm 2026-02-21T09:22:05.5554544Z st.global.v4.b32 [ %rd45 + 0 ], { %r1016, %r1017, %r1018, %r1019 }; 2026-02-21T09:22:05.5554814Z // end inline asm 2026-02-21T09:22:05.5554960Z // begin inline asm 2026-02-21T09:22:05.5555175Z st.global.v4.b32 [ %rd46 + 0 ], { %r1020, %r1021, %r1022, %r1023 }; 2026-02-21T09:22:05.5555437Z // end inline asm 2026-02-21T09:22:05.5555582Z // begin inline asm 2026-02-21T09:22:05.5555796Z st.global.v4.b32 [ %rd47 + 0 ], { 
%r1024, %r1025, %r1026, %r1027 }; 2026-02-21T09:22:05.5556049Z // end inline asm 2026-02-21T09:22:05.5556200Z // begin inline asm 2026-02-21T09:22:05.5556408Z st.global.v4.b32 [ %rd48 + 0 ], { %r1028, %r1029, %r1030, %r1031 }; 2026-02-21T09:22:05.5556802Z // end inline asm 2026-02-21T09:22:05.5557098Z .loc 1 82 4 // cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py:82:4 2026-02-21T09:22:05.5557600Z ret; 2026-02-21T09:22:05.5557736Z $L__tmp5: 2026-02-21T09:22:05.5557871Z $L__func_end0: 2026-02-21T09:22:05.5558047Z // -- End function 2026-02-21T09:22:05.5558262Z } 2026-02-21T09:22:05.5558606Z .file 1 "/tmp/torchinductor_root/fg/cfgqmitkxn4t4d5zxkeijpcbv5hjeci7dxjgpuuvsd2mtvtxj4ki.py" 2026-02-21T09:22:05.5559151Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T09:22:05.5559516Z .section .debug_abbrev 2026-02-21T09:22:05.5559675Z { 2026-02-21T09:22:05.5559849Z .b8 1 // Abbreviation Code 2026-02-21T09:22:05.5560116Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:22:05.5560453Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:22:05.5560715Z .b8 37 // DW_AT_producer 2026-02-21T09:22:05.5560804Z .b8 8 // DW_FORM_string 2026-02-21T09:22:05.5560890Z .b8 19 // DW_AT_language 2026-02-21T09:22:05.5560979Z .b8 5 // DW_FORM_data2 2026-02-21T09:22:05.5561059Z .b8 3 // DW_AT_name 2026-02-21T09:22:05.5561204Z .b8 8 // DW_FORM_string 2026-02-21T09:22:05.5561295Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:22:05.5561387Z .b8 6 // DW_FORM_data4 2026-02-21T09:22:05.5561473Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:22:05.5561553Z .b8 8 // DW_FORM_string 2026-02-21T09:22:05.5561637Z .b8 0 // EOM(1) 2026-02-21T09:22:05.5561710Z .b8 0 // EOM(2) 2026-02-21T09:22:05.5561801Z .b8 2 // Abbreviation Code 2026-02-21T09:22:05.5561899Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:22:05.5561985Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:22:05.5562065Z .b8 3 // DW_AT_name 2026-02-21T09:22:05.5562154Z .b8 8 // DW_FORM_string 2026-02-21T09:22:05.5562237Z .b8 32 // 
DW_AT_inline 2026-02-21T09:22:05.5562319Z .b8 11 // DW_FORM_data1 2026-02-21T09:22:05.5562392Z .b8 0 // EOM(1) 2026-02-21T09:22:05.5562470Z .b8 0 // EOM(2) 2026-02-21T09:22:05.5562556Z .b8 3 // Abbreviation Code 2026-02-21T09:22:05.5562643Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:22:05.5562746Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:22:05.5562831Z .b8 17 // DW_AT_low_pc 2026-02-21T09:22:05.5562913Z .b8 1 // DW_FORM_addr 2026-02-21T09:22:05.5563010Z .b8 18 // DW_AT_high_pc 2026-02-21T09:22:05.5563088Z .b8 1 // DW_FORM_addr 2026-02-21T09:22:05.5563183Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:22:05.5563263Z .b8 19 // DW_FORM_ref4 2026-02-21T09:22:05.5563343Z .b8 0 // EOM(1) 2026-02-21T09:22:05.5563414Z .b8 0 // EOM(2) 2026-02-21T09:22:05.5563503Z .b8 4 // Abbreviation Code 2026-02-21T09:22:05.5563614Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T09:22:05.5563696Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:22:05.5563791Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:22:05.5563876Z .b8 19 // DW_FORM_ref4 2026-02-21T09:22:05.5563956Z .b8 17 // DW_AT_low_pc 2026-02-21T09:22:05.5564136Z .b8 1 // DW_FORM_addr 2026-02-21T09:22:05.5564220Z .b8 18 // DW_AT_high_pc 2026-02-21T09:22:05.5564303Z .b8 1 // DW_FORM_addr 2026-02-21T09:22:05.5564389Z .b8 88 // DW_AT_call_file 2026-02-21T09:22:05.5570530Z .b8 11 // DW_FORM_data1 2026-02-21T09:22:05.5570702Z .b8 89 // DW_AT_call_line 2026-02-21T09:22:05.5570812Z .b8 11 // DW_FORM_data1 2026-02-21T09:22:05.5570925Z .b8 87 // DW_AT_call_column 2026-02-21T09:22:05.5571012Z .b8 11 // DW_FORM_data1 2026-02-21T09:22:05.5571237Z .b8 0 // EOM(1) 2026-02-21T09:22:05.5571327Z .b8 0 // EOM(2) 2026-02-21T09:22:05.5571400Z .b8 0 // EOM(3) 2026-02-21T09:22:05.5571466Z } 2026-02-21T09:22:05.5571547Z .section .debug_info 2026-02-21T09:22:05.5571608Z { 2026-02-21T09:22:05.5571710Z .b32 178 // Length of Unit 2026-02-21T09:22:05.5571878Z .b8 2 // DWARF version number 2026-02-21T09:22:05.5571945Z .b8 0 2026-02-21T09:22:05.5572087Z .b32 .debug_abbrev 
// Offset Into Abbrev. Section 2026-02-21T09:22:05.5572202Z .b8 8 // Address Size (in bytes) 2026-02-21T09:22:05.5572331Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T09:22:05.5572429Z .b8 116 // DW_AT_producer 2026-02-21T09:22:05.5572485Z .b8 114 2026-02-21T09:22:05.5572539Z .b8 105 2026-02-21T09:22:05.5572601Z .b8 116 2026-02-21T09:22:05.5572656Z .b8 111 2026-02-21T09:22:05.5572710Z .b8 110 2026-02-21T09:22:05.5572768Z .b8 0 2026-02-21T09:22:05.5572852Z .b8 2 // DW_AT_language 2026-02-21T09:22:05.5572910Z .b8 0 2026-02-21T09:22:05.5573001Z .b8 99 // DW_AT_name 2026-02-21T09:22:05.5573061Z .b8 102 2026-02-21T09:22:05.5573114Z .b8 103 2026-02-21T09:22:05.5573168Z .b8 113 2026-02-21T09:22:05.5573226Z .b8 109 2026-02-21T09:22:05.5573282Z .b8 105 2026-02-21T09:22:05.5573336Z .b8 116 2026-02-21T09:22:05.5573389Z .b8 107 2026-02-21T09:22:05.5573448Z .b8 120 2026-02-21T09:22:05.5573502Z .b8 110 2026-02-21T09:22:05.5573555Z .b8 52 2026-02-21T09:22:05.5573616Z .b8 116 2026-02-21T09:22:05.5573671Z .b8 52 2026-02-21T09:22:05.5573723Z .b8 100 2026-02-21T09:22:05.5573778Z .b8 53 2026-02-21T09:22:05.5573836Z .b8 122 2026-02-21T09:22:05.5573890Z .b8 120 2026-02-21T09:22:05.5573945Z .b8 107 2026-02-21T09:22:05.5573998Z .b8 101 2026-02-21T09:22:05.5574059Z .b8 105 2026-02-21T09:22:05.5574116Z .b8 106 2026-02-21T09:22:05.5574168Z .b8 112 2026-02-21T09:22:05.5574233Z .b8 99 2026-02-21T09:22:05.5574293Z .b8 98 2026-02-21T09:22:05.5574350Z .b8 118 2026-02-21T09:22:05.5574406Z .b8 53 2026-02-21T09:22:05.5574465Z .b8 104 2026-02-21T09:22:05.5574519Z .b8 106 2026-02-21T09:22:05.5574575Z .b8 101 2026-02-21T09:22:05.5574635Z .b8 99 2026-02-21T09:22:05.5574688Z .b8 105 2026-02-21T09:22:05.5574741Z .b8 55 2026-02-21T09:22:05.5574796Z .b8 100 2026-02-21T09:22:05.5574854Z .b8 120 2026-02-21T09:22:05.5574908Z .b8 106 2026-02-21T09:22:05.5574962Z .b8 103 2026-02-21T09:22:05.5575016Z .b8 112 2026-02-21T09:22:05.5575076Z .b8 117 2026-02-21T09:22:05.5575130Z .b8 117 
2026-02-21T09:22:05.5575183Z .b8 118 2026-02-21T09:22:05.5575243Z .b8 115 2026-02-21T09:22:05.5575296Z .b8 100 2026-02-21T09:22:05.5575351Z .b8 50 2026-02-21T09:22:05.5575404Z .b8 109 2026-02-21T09:22:05.5575466Z .b8 116 2026-02-21T09:22:05.5575519Z .b8 118 2026-02-21T09:22:05.5575572Z .b8 116 2026-02-21T09:22:05.5575632Z .b8 120 2026-02-21T09:22:05.5575687Z .b8 106 2026-02-21T09:22:05.5575739Z .b8 52 2026-02-21T09:22:05.5575793Z .b8 107 2026-02-21T09:22:05.5575852Z .b8 105 2026-02-21T09:22:05.5576031Z .b8 46 2026-02-21T09:22:05.5576158Z .b8 112 2026-02-21T09:22:05.5576221Z .b8 121 2026-02-21T09:22:05.5576275Z .b8 0 2026-02-21T09:22:05.5576390Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:22:05.5576657Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:22:05.5576729Z .b8 116 2026-02-21T09:22:05.5576786Z .b8 109 2026-02-21T09:22:05.5576840Z .b8 112 2026-02-21T09:22:05.5576901Z .b8 47 2026-02-21T09:22:05.5576955Z .b8 116 2026-02-21T09:22:05.5577009Z .b8 111 2026-02-21T09:22:05.5577063Z .b8 114 2026-02-21T09:22:05.5577130Z .b8 99 2026-02-21T09:22:05.5577184Z .b8 104 2026-02-21T09:22:05.5577238Z .b8 105 2026-02-21T09:22:05.5577292Z .b8 110 2026-02-21T09:22:05.5577352Z .b8 100 2026-02-21T09:22:05.5577406Z .b8 117 2026-02-21T09:22:05.5577462Z .b8 99 2026-02-21T09:22:05.5577611Z .b8 116 2026-02-21T09:22:05.5577671Z .b8 111 2026-02-21T09:22:05.5577738Z .b8 114 2026-02-21T09:22:05.5577794Z .b8 95 2026-02-21T09:22:05.5577856Z .b8 114 2026-02-21T09:22:05.5577912Z .b8 111 2026-02-21T09:22:05.5577968Z .b8 111 2026-02-21T09:22:05.5578028Z .b8 116 2026-02-21T09:22:05.5578080Z .b8 47 2026-02-21T09:22:05.5578132Z .b8 102 2026-02-21T09:22:05.5578185Z .b8 103 2026-02-21T09:22:05.5578254Z .b8 0 2026-02-21T09:22:05.5578442Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T09:22:05.5578529Z .b8 95 // DW_AT_name 2026-02-21T09:22:05.5578592Z .b8 104 2026-02-21T09:22:05.5578646Z .b8 101 2026-02-21T09:22:05.5578699Z .b8 108 2026-02-21T09:22:05.5578752Z .b8 105 
2026-02-21T09:22:05.5578812Z .b8 111 2026-02-21T09:22:05.5578865Z .b8 110 2026-02-21T09:22:05.5578920Z .b8 95 2026-02-21T09:22:05.5578973Z .b8 109 2026-02-21T09:22:05.5579033Z .b8 97 2026-02-21T09:22:05.5579095Z .b8 116 2026-02-21T09:22:05.5579151Z .b8 109 2026-02-21T09:22:05.5579216Z .b8 117 2026-02-21T09:22:05.5579271Z .b8 108 2026-02-21T09:22:05.5579324Z .b8 95 2026-02-21T09:22:05.5579376Z .b8 98 2026-02-21T09:22:05.5579437Z .b8 102 2026-02-21T09:22:05.5579493Z .b8 49 2026-02-21T09:22:05.5579546Z .b8 54 2026-02-21T09:22:05.5579613Z .b8 95 2026-02-21T09:22:05.5579668Z .b8 105 2026-02-21T09:22:05.5579723Z .b8 110 2026-02-21T09:22:05.5579777Z .b8 116 2026-02-21T09:22:05.5579836Z .b8 52 2026-02-21T09:22:05.5579889Z .b8 0 2026-02-21T09:22:05.5579978Z .b8 1 // DW_AT_inline 2026-02-21T09:22:05.5580100Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T09:22:05.5580199Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T09:22:05.5580302Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T09:22:05.5580408Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:22:05.5580555Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T09:22:05.5580656Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:22:05.5580749Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T09:22:05.5580864Z .b64 $L__tmp4 // DW_AT_high_pc 2026-02-21T09:22:05.5580958Z .b8 1 // DW_AT_call_file 2026-02-21T09:22:05.5581043Z .b8 78 // DW_AT_call_line 2026-02-21T09:22:05.5581140Z .b8 36 // DW_AT_call_column 2026-02-21T09:22:05.5581238Z .b8 0 // End Of Children Mark 2026-02-21T09:22:05.5581325Z .b8 0 // End Of Children Mark 2026-02-21T09:22:05.5581397Z } 2026-02-21T09:22:05.5581473Z .section .debug_macinfo { } 2026-02-21T09:22:05.5581480Z 2026-02-21T09:22:05.5581561Z ================================================================ 2026-02-21T09:22:05.5581685Z please share the reproducer above with Triton project. 
2026-02-21T09:22:42.3132691Z 2026-02-21T09:22:42.3133517Z Generation 5: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 104/104 1.4 configs/s 2026-02-21T09:22:44.3919602Z Generation 5: verifying top configs 100% ━━━━━━━━━━━━━━━━ 147/147 60.7 configs/s 2026-02-21T09:22:44.7792925Z [645s] Generation 5 complete: 2026-02-21T09:22:44.7793208Z error=53 2026-02-21T09:22:44.7793395Z timeout=3 2026-02-21T09:22:44.7793568Z ok=51 2026-02-21T09:22:44.7793730Z min=1.3350 2026-02-21T09:22:44.7793916Z mid=4.6778 2026-02-21T09:22:44.7794081Z max=1498.5692 2026-02-21T09:22:44.7794303Z best={'block_sizes': [32, 128, 128], 2026-02-21T09:22:44.7794785Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:22:44.7795227Z 'l2_groupings': [1], 2026-02-21T09:22:44.7795459Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:22:44.7795742Z 'loop_orders': [[0, 1]], 2026-02-21T09:22:44.7795958Z 'maxnreg': 256, 2026-02-21T09:22:44.7796171Z 'num_sm_multiplier': 8, 2026-02-21T09:22:44.7796837Z 'num_stages': 7, 2026-02-21T09:22:44.7797061Z 'num_warps': 4, 2026-02-21T09:22:44.7797270Z 'pid_type': 'persistent_blocked', 2026-02-21T09:22:44.7797529Z 'range_flattens': [None, None], 2026-02-21T09:22:44.7797801Z 'range_multi_buffers': [False, None], 2026-02-21T09:22:44.7798062Z 'range_num_stages': [2, 3], 2026-02-21T09:22:44.7798298Z 'range_unroll_factors': [4, 1], 2026-02-21T09:22:44.7798539Z 'range_warp_specializes': []} 2026-02-21T09:22:44.7834936Z [645s] Fitting surrogate: 662 points, 662 targets 2026-02-21T09:22:46.2647194Z [646s] Generation 6 starting: 86 neighbors, 4 active search path(s) 2026-02-21T09:23:29.8341722Z [690s] Timeout after 30s compiling Config(block_sizes=[64, 64, 128], indexing=['pointer', 'pointer', 'tensor_descriptor'], l2_groupings=[1], load_eviction_policies=['last', 'last'], loop_orders=[[0, 1]], num_sm_multiplier=16, num_stages=7, num_warps=2, pid_type='persistent_blocked', range_flattens=[None, True], range_multi_buffers=[True, 
None], range_num_stages=[1, 0], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T09:23:29.8360975Z Generation 6: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 87/87 0.3 configs/s 2026-02-21T09:23:31.1233261Z 2026-02-21T09:23:31.1233277Z 2026-02-21T09:23:31.1233674Z ================================================================ 2026-02-21T09:23:31.1234084Z Internal Triton PTX codegen error 2026-02-21T09:23:31.1234348Z `ptxas` stderr: 2026-02-21T09:23:31.1235098Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 1397 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T09:23:31.1235908Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:23:31.1236148Z 2026-02-21T09:23:31.1237040Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpjsxmz11a.ptx -o /tmp/tmpjsxmz11a.ptx.o 2026-02-21T09:23:31.1237784Z 2026-02-21T09:23:31.1237789Z 2026-02-21T09:23:31.1237879Z // 2026-02-21T09:23:31.1238073Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:23:31.1238332Z // 2026-02-21T09:23:31.1238423Z 2026-02-21T09:23:31.1238502Z .version 8.7 2026-02-21T09:23:31.1238720Z .target sm_90a 2026-02-21T09:23:31.1238939Z .address_size 64 2026-02-21T09:23:31.1239077Z 2026-02-21T09:23:31.1239361Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T09:23:31.1239833Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:23:31.1240184Z // @_helion_matmul_bf16_int4 2026-02-21T09:23:31.1240515Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T09:23:31.1240878Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T09:23:31.1241327Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T09:23:31.1241771Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T09:23:31.1242199Z .param 
.u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T09:23:31.1242630Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T09:23:31.1242962Z ) 2026-02-21T09:23:31.1243709Z .reqntid 128 2026-02-21T09:23:31.1243890Z .maxnreg 32 2026-02-21T09:23:31.1244048Z { 2026-02-21T09:23:31.1244219Z .reg .pred %p<202>; 2026-02-21T09:23:31.1244421Z .reg .b16 %rs<897>; 2026-02-21T09:23:31.1244619Z .reg .b32 %r<13007>; 2026-02-21T09:23:31.1244817Z .reg .b64 %rd<763>; 2026-02-21T09:23:31.1245208Z .loc 1 19 0 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:19:0 2026-02-21T09:23:31.1245664Z $L__func_begin0: 2026-02-21T09:23:31.1246025Z .loc 1 19 0 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:19:0 2026-02-21T09:23:31.1246401Z 2026-02-21T09:23:31.1246645Z // %bb.0: 2026-02-21T09:23:31.1246903Z ld.param.b64 %rd102, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T09:23:31.1247488Z ld.param.b64 %rd104, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T09:23:31.1247814Z $L__tmp0: 2026-02-21T09:23:31.1248198Z .loc 1 21 66 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:21:66 2026-02-21T09:23:31.1248658Z mov.u32 %r1281, %ctaid.x; 2026-02-21T09:23:31.1248937Z ld.param.b64 %rd122, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T09:23:31.1249250Z mov.u32 %r1282, %ctaid.y; 2026-02-21T09:23:31.1249519Z ld.param.b64 %rd139, [_helion_matmul_bf16_int4_param_3]; 2026-02-21T09:23:31.1249981Z mov.u32 %r1283, %ctaid.z; 2026-02-21T09:23:31.1250186Z mov.u32 %r1284, %nctaid.x; 2026-02-21T09:23:31.1250378Z mov.u32 %r1285, %nctaid.y; 2026-02-21T09:23:31.1250580Z mad.lo.s32 %r1286, %r1283, %r1285, %r1282; 2026-02-21T09:23:31.1250816Z mad.lo.s32 %r1287, %r1286, %r1284, %r1281; 2026-02-21T09:23:31.1251031Z shl.b32 %r1288, %r1287, 8; 2026-02-21T09:23:31.1251210Z cvt.s64.s32 %rd140, %r1288; 2026-02-21T09:23:31.1251417Z add.s64 %rd118, %rd139, %rd140; 2026-02-21T09:23:31.1251985Z [691s] Triton compile failed. 
This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T09:23:31.1253621Z Config: @helion.kernel(config=helion.Config(block_sizes=[32, 128, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[1], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=8, num_stages=7, num_warps=4, pid_type='persistent_blocked', range_flattens=[None, None], range_multi_buffers=[False, None], range_num_stages=[2, 3], range_unroll_factors=[3, 1], range_warp_specializes=[]), static_shapes=True) 2026-02-21T09:23:31.1255174Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:23:31.1255471Z `ptxas` stderr: 2026-02-21T09:23:31.1256060Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 1397 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T09:23:31.1256924Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:23:31.1257108Z 2026-02-21T09:23:31.1257660Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpjsxmz11a.ptx -o /tmp/tmpjsxmz11a.ptx.o 2026-02-21T09:23:31.1258276Z 2026-02-21T09:23:31.1258434Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:23:31.1258748Z mov.u32 %r1, %tid.x; 2026-02-21T09:23:31.1258940Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:23:31.1259162Z shl.b32 %r1289, %r1, 2; 2026-02-21T09:23:31.1259349Z mov.b32 %r1290, global_smem; 2026-02-21T09:23:31.1259557Z add.s32 %r1265, %r1290, %r1289; 2026-02-21T09:23:31.1259760Z mov.b32 %r1274, 0; 2026-02-21T09:23:31.1259926Z // begin inline asm 2026-02-21T09:23:31.1260109Z @%p1 st.shared.b32 [ %r1265 + 0 ], %r1274; 2026-02-21T09:23:31.1260318Z // end inline asm 2026-02-21T09:23:31.1260482Z bar.warp.sync -1; 2026-02-21T09:23:31.1260653Z setp.eq.b32 %p160, %r1, 0; 2026-02-21T09:23:31.1260853Z cvt.u64.u32 %rd103, %r1290; 2026-02-21T09:23:31.1261035Z // begin inline asm 2026-02-21T09:23:31.1261395Z @%p160 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd103 + 0 ], %rd104; 2026-02-21T09:23:31.1261952Z // end inline asm 2026-02-21T09:23:31.1262112Z // begin inline asm 2026-02-21T09:23:31.1262432Z @%p160 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd103 + 0 ], 0x1; 2026-02-21T09:23:31.1262760Z // end inline asm 2026-02-21T09:23:31.1262941Z mov.b32 %r1267, 128; 2026-02-21T09:23:31.1263112Z // begin inline asm 2026-02-21T09:23:31.1263444Z @%p160 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd103 + 0 ], 0x0, %r1267; 2026-02-21T09:23:31.1263821Z // end inline asm 2026-02-21T09:23:31.1263996Z mov.b32 %r1268, 32; 2026-02-21T09:23:31.1264239Z // begin inline asm 2026-02-21T09:23:31.1264575Z @%p160 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd103 + 0 ], 0x1, %r1268; 2026-02-21T09:23:31.1264936Z // end inline asm 2026-02-21T09:23:31.1265188Z mov.b32 %r1269, 8192; 2026-02-21T09:23:31.1265369Z // begin inline asm 2026-02-21T09:23:31.1265681Z @%p160 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd103 + 0 ], 0x0, %r1269; 2026-02-21T09:23:31.1266043Z // end inline asm 2026-02-21T09:23:31.1266199Z mov.b32 %r1270, 512; 2026-02-21T09:23:31.1266357Z // begin inline asm 2026-02-21T09:23:31.1266828Z 
@%p160 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd103 + 0 ], 0x1, %r1270; 2026-02-21T09:23:31.1267291Z // end inline asm 2026-02-21T09:23:31.1267456Z mov.b64 %rd111, 8192; 2026-02-21T09:23:31.1267615Z // begin inline asm 2026-02-21T09:23:31.1267942Z @%p160 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd103 + 0 ], 0x0, %rd111; 2026-02-21T09:23:31.1268407Z // end inline asm 2026-02-21T09:23:31.1268563Z mov.b32 %r1271, 1; 2026-02-21T09:23:31.1268722Z // begin inline asm 2026-02-21T09:23:31.1269045Z @%p160 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd103 + 0 ], 0x0, %r1271; 2026-02-21T09:23:31.1269427Z // end inline asm 2026-02-21T09:23:31.1269575Z // begin inline asm 2026-02-21T09:23:31.1269894Z @%p160 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd103 + 0 ], 0x1, %r1271; 2026-02-21T09:23:31.1270266Z // end inline asm 2026-02-21T09:23:31.1270413Z // begin inline asm 2026-02-21T09:23:31.1270704Z @%p160 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd103 + 0 ], 0x0; 2026-02-21T09:23:31.1271037Z // end inline asm 2026-02-21T09:23:31.1271207Z // begin inline asm 2026-02-21T09:23:31.1271521Z @%p160 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd103 + 0 ], 0x0; 2026-02-21T09:23:31.1271890Z // end inline asm 2026-02-21T09:23:31.1272039Z // begin inline asm 2026-02-21T09:23:31.1272332Z @%p160 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd103 + 0 ], 0x3; 2026-02-21T09:23:31.1272675Z // end inline asm 2026-02-21T09:23:31.1272824Z // begin inline asm 2026-02-21T09:23:31.1273112Z @%p160 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd103 + 0 ], 0x0; 2026-02-21T09:23:31.1273441Z // end inline asm 2026-02-21T09:23:31.1273595Z // begin inline asm 2026-02-21T09:23:31.1274028Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd118 + 0 ], [ %rd103 + 0 ], 0x80; 2026-02-21T09:23:31.1274520Z // end inline asm 
2026-02-21T09:23:31.1274672Z // begin inline asm 2026-02-21T09:23:31.1274917Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd118 + 0 ], 0x80; 2026-02-21T09:23:31.1275233Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:23:31.1275455Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:23:31.1275668Z // end inline asm 2026-02-21T09:23:31.1275809Z bar.sync 0; 2026-02-21T09:23:31.1275969Z cvta.global.u64 %rd463, %rd118; 2026-02-21T09:23:31.1276314Z .loc 1 23 68 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:23:68 2026-02-21T09:23:31.1276830Z add.s64 %rd136, %rd118, 128; 2026-02-21T09:23:31.1277013Z bar.sync 0; 2026-02-21T09:23:31.1277163Z // begin inline asm 2026-02-21T09:23:31.1277339Z @%p1 st.shared.b32 [ %r1265 + 0 ], %r1274; 2026-02-21T09:23:31.1277542Z // end inline asm 2026-02-21T09:23:31.1277698Z bar.warp.sync -1; 2026-02-21T09:23:31.1278054Z // begin inline asm 2026-02-21T09:23:31.1278367Z @%p160 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd103 + 0 ], %rd122; 2026-02-21T09:23:31.1278730Z // end inline asm 2026-02-21T09:23:31.1278882Z // begin inline asm 2026-02-21T09:23:31.1279175Z @%p160 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd103 + 0 ], 0x1; 2026-02-21T09:23:31.1279495Z // end inline asm 2026-02-21T09:23:31.1279649Z mov.b32 %r1275, 64; 2026-02-21T09:23:31.1279806Z // begin inline asm 2026-02-21T09:23:31.1280105Z @%p160 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd103 + 0 ], 0x0, %r1275; 2026-02-21T09:23:31.1280449Z // end inline asm 2026-02-21T09:23:31.1280601Z // begin inline asm 2026-02-21T09:23:31.1280997Z @%p160 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd103 + 0 ], 0x1, %r1267; 2026-02-21T09:23:31.1281336Z // end inline asm 2026-02-21T09:23:31.1281489Z // begin inline asm 2026-02-21T09:23:31.1281793Z @%p160 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd103 + 0 ], 0x0, %r1269; 2026-02-21T09:23:31.1282155Z // end inline asm 2026-02-21T09:23:31.1282305Z mov.b32 
%r1278, 16384; 2026-02-21T09:23:31.1282485Z // begin inline asm 2026-02-21T09:23:31.1282885Z @%p160 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd103 + 0 ], 0x1, %r1278; 2026-02-21T09:23:31.1283247Z // end inline asm 2026-02-21T09:23:31.1283419Z mov.b64 %rd129, 16384; 2026-02-21T09:23:31.1283588Z // begin inline asm 2026-02-21T09:23:31.1283919Z @%p160 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd103 + 0 ], 0x0, %rd129; 2026-02-21T09:23:31.1284287Z // end inline asm 2026-02-21T09:23:31.1284446Z // begin inline asm 2026-02-21T09:23:31.1284764Z @%p160 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd103 + 0 ], 0x0, %r1271; 2026-02-21T09:23:31.1285136Z // end inline asm 2026-02-21T09:23:31.1285291Z // begin inline asm 2026-02-21T09:23:31.1285605Z @%p160 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd103 + 0 ], 0x1, %r1271; 2026-02-21T09:23:31.1285982Z // end inline asm 2026-02-21T09:23:31.1286131Z // begin inline asm 2026-02-21T09:23:31.1286421Z @%p160 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd103 + 0 ], 0xa; 2026-02-21T09:23:31.1286888Z // end inline asm 2026-02-21T09:23:31.1287045Z // begin inline asm 2026-02-21T09:23:31.1287359Z @%p160 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd103 + 0 ], 0x0; 2026-02-21T09:23:31.1287720Z // end inline asm 2026-02-21T09:23:31.1287879Z // begin inline asm 2026-02-21T09:23:31.1288164Z @%p160 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd103 + 0 ], 0x3; 2026-02-21T09:23:31.1288508Z // end inline asm 2026-02-21T09:23:31.1288665Z // begin inline asm 2026-02-21T09:23:31.1288946Z @%p160 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd103 + 0 ], 0x0; 2026-02-21T09:23:31.1289281Z // end inline asm 2026-02-21T09:23:31.1289431Z // begin inline asm 2026-02-21T09:23:31.1289884Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd136 + 0 ], [ %rd103 + 0 ], 0x80; 
2026-02-21T09:23:31.1290373Z // end inline asm 2026-02-21T09:23:31.1290529Z // begin inline asm 2026-02-21T09:23:31.1290787Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd136 + 0 ], 0x80; 2026-02-21T09:23:31.1291099Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:23:31.1291326Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:23:31.1291543Z // end inline asm 2026-02-21T09:23:31.1291695Z bar.sync 0; 2026-02-21T09:23:31.1291851Z cvta.global.u64 %rd446, %rd136; 2026-02-21T09:23:31.1292196Z .loc 1 29 35 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:29:35 2026-02-21T09:23:31.1292556Z shl.b32 %r12481, %r1281, 3; 2026-02-21T09:23:31.1292881Z .loc 1 30 37 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:30:37 2026-02-21T09:23:31.1293238Z add.s32 %r1291, %r12481, 8; 2026-02-21T09:23:31.1293656Z .loc 1 30 49 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:30:49 2026-02-21T09:23:31.1294082Z min.s32 %r3, %r1291, 8192; 2026-02-21T09:23:31.1294396Z .loc 1 31 88 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:31:88 2026-02-21T09:23:31.1294752Z sub.s32 %r1292, %r3, %r12481; 2026-02-21T09:23:31.1294943Z mul.hi.s32 %r1293, %r1292, 1431655766; 2026-02-21T09:23:31.1295154Z shr.u32 %r1294, %r1293, 31; 2026-02-21T09:23:31.1295337Z add.s32 %r1295, %r1293, %r1294; 2026-02-21T09:23:31.1295528Z mad.lo.s32 %r12875, %r1295, 3, %r12481; 2026-02-21T09:23:31.1295875Z .loc 1 37 45 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:37:45 2026-02-21T09:23:31.1296225Z shr.u32 %r5, %r1, 5; 2026-02-21T09:23:31.1296596Z and.b32 %r6, %r1, 112; 2026-02-21T09:23:31.1296791Z bfe.u32 %r7, %r1, 4, 3; 2026-02-21T09:23:31.1296967Z or.b32 %r8, %r7, 8; 2026-02-21T09:23:31.1297129Z or.b32 %r9, %r7, 16; 2026-02-21T09:23:31.1297301Z or.b32 %r10, %r7, 24; 2026-02-21T09:23:31.1297461Z or.b32 %r11, %r7, 32; 2026-02-21T09:23:31.1297625Z or.b32 %r12, %r7, 40; 2026-02-21T09:23:31.1297786Z or.b32 %r13, %r7, 48; 2026-02-21T09:23:31.1297940Z or.b32 %r14, %r7, 
56; 2026-02-21T09:23:31.1298187Z or.b32 %r15, %r7, 64; 2026-02-21T09:23:31.1298351Z or.b32 %r16, %r7, 72; 2026-02-21T09:23:31.1298510Z or.b32 %r17, %r7, 80; 2026-02-21T09:23:31.1298664Z or.b32 %r18, %r7, 88; 2026-02-21T09:23:31.1298821Z or.b32 %r19, %r7, 96; 2026-02-21T09:23:31.1298979Z or.b32 %r20, %r7, 104; 2026-02-21T09:23:31.1299150Z or.b32 %r21, %r7, 112; 2026-02-21T09:23:31.1299311Z or.b32 %r22, %r7, 120; 2026-02-21T09:23:31.1299635Z .loc 1 51 38 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:51:38 2026-02-21T09:23:31.1300003Z and.b32 %r23, %r1, 15; 2026-02-21T09:23:31.1300167Z shl.b32 %r24, %r23, 2; 2026-02-21T09:23:31.1300481Z .loc 1 31 88 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:31:88 2026-02-21T09:23:31.1300850Z setp.ge.s32 %p37, %r12481, %r12875; 2026-02-21T09:23:31.1301064Z and.b32 %r12473, %r1, 127; 2026-02-21T09:23:31.1301246Z shr.u32 %r12474, %r6, 1; 2026-02-21T09:23:31.1301421Z shl.b32 %r12475, %r1, 6; 2026-02-21T09:23:31.1301589Z shl.b32 %r12476, %r1, 5; 2026-02-21T09:23:31.1301762Z shl.b32 %r12477, %r1, 1; 2026-02-21T09:23:31.1301946Z shl.b32 %r12478, %r1, 4; 2026-02-21T09:23:31.1302117Z shl.b32 %r12479, %r23, 7; 2026-02-21T09:23:31.1302297Z and.b32 %r12480, %r1, 16; 2026-02-21T09:23:31.1302468Z cvt.u64.u32 %rd754, %r24; 2026-02-21T09:23:31.1302654Z setp.lt.u32 %p201, %r1, 64; 2026-02-21T09:23:31.1302837Z @%p37 bra $L__BB0_9; 2026-02-21T09:23:31.1303029Z // %bb.1: // %.lr.ph 2026-02-21T09:23:31.1303395Z .loc 1 0 88 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:0:88 2026-02-21T09:23:31.1303756Z shl.b32 %r1296, %r12473, 3; 2026-02-21T09:23:31.1303943Z xor.b32 %r26, %r1296, %r12474; 2026-02-21T09:23:31.1304127Z add.s32 %r27, %r1290, %r26; 2026-02-21T09:23:31.1304308Z add.s32 %r28, %r27, 1024; 2026-02-21T09:23:31.1304473Z add.s32 %r29, %r27, 2048; 2026-02-21T09:23:31.1304643Z add.s32 %r30, %r27, 3072; 2026-02-21T09:23:31.1304807Z add.s32 %r31, %r27, 4096; 2026-02-21T09:23:31.1304974Z add.s32 %r32, %r27, 5120; 
2026-02-21T09:23:31.1305140Z add.s32 %r33, %r27, 6144; 2026-02-21T09:23:31.1305310Z add.s32 %r34, %r27, 7168; 2026-02-21T09:23:31.1305472Z add.s32 %r35, %r27, 8192; 2026-02-21T09:23:31.1305642Z add.s32 %r36, %r27, 9216; 2026-02-21T09:23:31.1305816Z add.s32 %r37, %r27, 10240; 2026-02-21T09:23:31.1305987Z add.s32 %r38, %r27, 11264; 2026-02-21T09:23:31.1306162Z add.s32 %r39, %r27, 12288; 2026-02-21T09:23:31.1306332Z add.s32 %r40, %r27, 13312; 2026-02-21T09:23:31.1306635Z add.s32 %r41, %r27, 14336; 2026-02-21T09:23:31.1306804Z add.s32 %r42, %r27, 15360; 2026-02-21T09:23:31.1306977Z add.s32 %r43, %r27, 16384; 2026-02-21T09:23:31.1307311Z add.s32 %r44, %r27, 17408; 2026-02-21T09:23:31.1307487Z add.s32 %r45, %r27, 18432; 2026-02-21T09:23:31.1307673Z add.s32 %r46, %r27, 19456; 2026-02-21T09:23:31.1307843Z add.s32 %r47, %r27, 20480; 2026-02-21T09:23:31.1308018Z add.s32 %r48, %r27, 21504; 2026-02-21T09:23:31.1308188Z add.s32 %r49, %r27, 22528; 2026-02-21T09:23:31.1308455Z add.s32 %r50, %r27, 23552; 2026-02-21T09:23:31.1308627Z add.s32 %r51, %r27, 24576; 2026-02-21T09:23:31.1308802Z add.s32 %r52, %r27, 25600; 2026-02-21T09:23:31.1308978Z add.s32 %r53, %r27, 26624; 2026-02-21T09:23:31.1309153Z add.s32 %r54, %r27, 27648; 2026-02-21T09:23:31.1309323Z add.s32 %r55, %r27, 28672; 2026-02-21T09:23:31.1309495Z add.s32 %r56, %r27, 29696; 2026-02-21T09:23:31.1309668Z add.s32 %r57, %r27, 30720; 2026-02-21T09:23:31.1309923Z add.s32 %r58, %r27, 31744; 2026-02-21T09:23:31.1310114Z and.b32 %r1300, %r12475, 6144; 2026-02-21T09:23:31.1310305Z and.b32 %r1302, %r12476, 896; 2026-02-21T09:23:31.1310493Z and.b32 %r1304, %r12477, 62; 2026-02-21T09:23:31.1310677Z or.b32 %r1305, %r1300, %r1302; 2026-02-21T09:23:31.1310865Z or.b32 %r59, %r1305, %r1304; 2026-02-21T09:23:31.1311041Z xor.b32 %r60, %r59, 8; 2026-02-21T09:23:31.1311217Z xor.b32 %r61, %r59, 16; 2026-02-21T09:23:31.1311452Z xor.b32 %r62, %r59, 24; 2026-02-21T09:23:31.1311627Z xor.b32 %r63, %r59, 32; 2026-02-21T09:23:31.1311800Z xor.b32 
%r64, %r59, 40; 2026-02-21T09:23:31.1311963Z xor.b32 %r65, %r59, 48; 2026-02-21T09:23:31.1312131Z xor.b32 %r66, %r59, 56; 2026-02-21T09:23:31.1312311Z shl.b32 %r1306, %r12473, 7; 2026-02-21T09:23:31.1312499Z and.b32 %r1308, %r12478, 112; 2026-02-21T09:23:31.1312678Z or.b32 %r1309, %r1306, %r1308; 2026-02-21T09:23:31.1312864Z add.s32 %r1310, %r1290, 32768; 2026-02-21T09:23:31.1313041Z add.s32 %r67, %r1310, %r1309; 2026-02-21T09:23:31.1313224Z xor.b32 %r1311, %r1309, 16; 2026-02-21T09:23:31.1313397Z add.s32 %r68, %r1310, %r1311; 2026-02-21T09:23:31.1313577Z xor.b32 %r1312, %r1309, 32; 2026-02-21T09:23:31.1313759Z add.s32 %r69, %r1310, %r1312; 2026-02-21T09:23:31.1313935Z xor.b32 %r1313, %r1309, 48; 2026-02-21T09:23:31.1314108Z add.s32 %r70, %r1310, %r1313; 2026-02-21T09:23:31.1314283Z xor.b32 %r1314, %r1309, 64; 2026-02-21T09:23:31.1314461Z add.s32 %r71, %r1310, %r1314; 2026-02-21T09:23:31.1314632Z xor.b32 %r1315, %r1309, 80; 2026-02-21T09:23:31.1314809Z add.s32 %r72, %r1310, %r1315; 2026-02-21T09:23:31.1314981Z xor.b32 %r1316, %r1309, 96; 2026-02-21T09:23:31.1315157Z add.s32 %r73, %r1310, %r1316; 2026-02-21T09:23:31.1315339Z xor.b32 %r1317, %r1309, 112; 2026-02-21T09:23:31.1315509Z add.s32 %r74, %r1310, %r1317; 2026-02-21T09:23:31.1315688Z bfe.u32 %r1318, %r1310, 4, 14; 2026-02-21T09:23:31.1315867Z cvt.u64.u32 %rd141, %r1318; 2026-02-21T09:23:31.1316065Z or.b64 %rd3, %rd141, 4611686293372403712; 2026-02-21T09:23:31.1316278Z add.s32 %r1319, %r1290, 32800; 2026-02-21T09:23:31.1316592Z bfe.u32 %r1320, %r1319, 4, 14; 2026-02-21T09:23:31.1316781Z cvt.u64.u32 %rd142, %r1320; 2026-02-21T09:23:31.1316977Z or.b64 %rd4, %rd142, 4611686293372403712; 2026-02-21T09:23:31.1317179Z add.s32 %r1321, %r1290, 32832; 2026-02-21T09:23:31.1317365Z bfe.u32 %r1322, %r1321, 4, 14; 2026-02-21T09:23:31.1317548Z cvt.u64.u32 %rd143, %r1322; 2026-02-21T09:23:31.1317737Z or.b64 %rd5, %rd143, 4611686293372403712; 2026-02-21T09:23:31.1317949Z add.s32 %r1323, %r1290, 32864; 
2026-02-21T09:23:31.1318128Z bfe.u32 %r1324, %r1323, 4, 14; 2026-02-21T09:23:31.1318312Z cvt.u64.u32 %rd144, %r1324; 2026-02-21T09:23:31.1318494Z or.b64 %rd6, %rd144, 4611686293372403712; 2026-02-21T09:23:31.1318700Z add.s32 %r1325, %r1290, 49152; 2026-02-21T09:23:31.1318875Z bfe.u32 %r1326, %r1325, 4, 14; 2026-02-21T09:23:31.1319067Z cvt.u64.u32 %rd145, %r1326; 2026-02-21T09:23:31.1319261Z or.b64 %rd7, %rd145, 4611686293372403712; 2026-02-21T09:23:31.1319468Z add.s32 %r1327, %r1290, 49184; 2026-02-21T09:23:31.1319653Z bfe.u32 %r1328, %r1327, 4, 14; 2026-02-21T09:23:31.1319830Z cvt.u64.u32 %rd146, %r1328; 2026-02-21T09:23:31.1320191Z or.b64 %rd8, %rd146, 4611686293372403712; 2026-02-21T09:23:31.1320396Z add.s32 %r1329, %r1290, 49216; 2026-02-21T09:23:31.1320579Z bfe.u32 %r1330, %r1329, 4, 14; 2026-02-21T09:23:31.1320756Z cvt.u64.u32 %rd147, %r1330; 2026-02-21T09:23:31.1320951Z or.b64 %rd9, %rd147, 4611686293372403712; 2026-02-21T09:23:31.1321157Z add.s32 %r1331, %r1290, 49248; 2026-02-21T09:23:31.1321342Z bfe.u32 %r1332, %r1331, 4, 14; 2026-02-21T09:23:31.1321526Z cvt.u64.u32 %rd148, %r1332; 2026-02-21T09:23:31.1321714Z or.b64 %rd10, %rd148, 4611686293372403712; 2026-02-21T09:23:31.1321938Z or.b32 %r1335, %r12479, %r1308; 2026-02-21T09:23:31.1322128Z xor.b32 %r1336, %r1335, %r12480; 2026-02-21T09:23:31.1322325Z or.b32 %r1337, %r1336, %r1300; 2026-02-21T09:23:31.1322503Z add.s32 %r75, %r1290, %r1337; 2026-02-21T09:23:31.1322769Z add.s32 %r76, %r75, 16384; 2026-02-21T09:23:31.1322945Z add.s32 %r77, %r75, 8192; 2026-02-21T09:23:31.1323122Z add.s32 %r78, %r75, 24576; 2026-02-21T09:23:31.1323310Z xor.b32 %r1338, %r1337, 32; 2026-02-21T09:23:31.1323501Z add.s32 %r79, %r1290, %r1338; 2026-02-21T09:23:31.1323682Z add.s32 %r80, %r79, 16384; 2026-02-21T09:23:31.1323853Z add.s32 %r81, %r79, 8192; 2026-02-21T09:23:31.1324024Z add.s32 %r82, %r79, 24576; 2026-02-21T09:23:31.1324267Z xor.b32 %r1339, %r1337, 64; 2026-02-21T09:23:31.1324451Z add.s32 %r83, %r1290, %r1339; 
2026-02-21T09:23:31.1324621Z add.s32 %r84, %r83, 16384; 2026-02-21T09:23:31.1324794Z add.s32 %r85, %r83, 8192; 2026-02-21T09:23:31.1324959Z add.s32 %r86, %r83, 24576; 2026-02-21T09:23:31.1325148Z xor.b32 %r1340, %r1337, 96; 2026-02-21T09:23:31.1325328Z add.s32 %r87, %r1290, %r1340; 2026-02-21T09:23:31.1325502Z add.s32 %r88, %r87, 16384; 2026-02-21T09:23:31.1325680Z add.s32 %r89, %r87, 8192; 2026-02-21T09:23:31.1325850Z add.s32 %r90, %r87, 24576; 2026-02-21T09:23:31.1326028Z or.b32 %r1341, %r1302, %r1304; 2026-02-21T09:23:31.1326207Z or.b32 %r91, %r1341, %r1300; 2026-02-21T09:23:31.1326386Z xor.b32 %r92, %r91, 8; 2026-02-21T09:23:31.1326673Z xor.b32 %r93, %r91, 16; 2026-02-21T09:23:31.1326843Z xor.b32 %r94, %r91, 24; 2026-02-21T09:23:31.1327007Z xor.b32 %r95, %r91, 32; 2026-02-21T09:23:31.1327175Z xor.b32 %r96, %r91, 40; 2026-02-21T09:23:31.1327339Z xor.b32 %r97, %r91, 48; 2026-02-21T09:23:31.1327503Z xor.b32 %r98, %r91, 56; 2026-02-21T09:23:31.1327828Z .loc 1 31 88 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:31:88 2026-02-21T09:23:31.1328204Z mad.wide.u32 %rd11, %r23, 8, %rd102; 2026-02-21T09:23:31.1328462Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:23:31.1328749Z // Child Loop BB0_3 Depth 2 2026-02-21T09:23:31.1329033Z // Child Loop BB0_5 Depth 2 2026-02-21T09:23:31.1329299Z // Child Loop BB0_7 Depth 2 2026-02-21T09:23:31.1329671Z .loc 1 35 31 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:35:31 2026-02-21T09:23:31.1330053Z shr.s32 %r1423, %r12481, 31; 2026-02-21T09:23:31.1330237Z shr.u32 %r1424, %r1423, 25; 2026-02-21T09:23:31.1330424Z add.s32 %r1425, %r12481, %r1424; 2026-02-21T09:23:31.1330751Z .loc 1 34 30 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:34:30 2026-02-21T09:23:31.1331114Z and.b32 %r1379, %r1425, -128; 2026-02-21T09:23:31.1331303Z sub.s32 %r1426, %r12481, %r1379; 2026-02-21T09:23:31.1331629Z .loc 1 36 27 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:36:27 
2026-02-21T09:23:31.1331986Z shl.b32 %r4043, %r1426, 7; 2026-02-21T09:23:31.1332298Z .loc 1 37 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:37:32 2026-02-21T09:23:31.1332658Z or.b32 %r1427, %r4043, %r7; 2026-02-21T09:23:31.1332836Z or.b32 %r1428, %r4043, %r8; 2026-02-21T09:23:31.1333015Z or.b32 %r1429, %r4043, %r9; 2026-02-21T09:23:31.1333190Z or.b32 %r1430, %r4043, %r10; 2026-02-21T09:23:31.1333563Z or.b32 %r1431, %r4043, %r11; 2026-02-21T09:23:31.1333736Z or.b32 %r1432, %r4043, %r12; 2026-02-21T09:23:31.1333910Z or.b32 %r1433, %r4043, %r13; 2026-02-21T09:23:31.1334088Z or.b32 %r1434, %r4043, %r14; 2026-02-21T09:23:31.1334262Z or.b32 %r1435, %r4043, %r15; 2026-02-21T09:23:31.1334444Z or.b32 %r1436, %r4043, %r16; 2026-02-21T09:23:31.1334612Z or.b32 %r1437, %r4043, %r17; 2026-02-21T09:23:31.1334785Z or.b32 %r1438, %r4043, %r18; 2026-02-21T09:23:31.1334959Z or.b32 %r1439, %r4043, %r19; 2026-02-21T09:23:31.1335136Z or.b32 %r1440, %r4043, %r20; 2026-02-21T09:23:31.1335313Z or.b32 %r1441, %r4043, %r21; 2026-02-21T09:23:31.1335481Z or.b32 %r1442, %r4043, %r22; 2026-02-21T09:23:31.1335878Z .loc 1 52 53 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:53 2026-02-21T09:23:31.1336232Z shl.b32 %r1443, %r1427, 10; 2026-02-21T09:23:31.1336410Z shl.b32 %r1444, %r1428, 10; 2026-02-21T09:23:31.1336708Z shl.b32 %r1445, %r1429, 10; 2026-02-21T09:23:31.1336889Z shl.b32 %r1446, %r1430, 10; 2026-02-21T09:23:31.1337060Z shl.b32 %r1447, %r1431, 10; 2026-02-21T09:23:31.1337236Z shl.b32 %r1448, %r1432, 10; 2026-02-21T09:23:31.1337411Z shl.b32 %r1449, %r1433, 10; 2026-02-21T09:23:31.1337672Z shl.b32 %r1450, %r1434, 10; 2026-02-21T09:23:31.1337855Z shl.b32 %r1451, %r1435, 10; 2026-02-21T09:23:31.1338025Z shl.b32 %r1452, %r1436, 10; 2026-02-21T09:23:31.1338199Z shl.b32 %r1453, %r1437, 10; 2026-02-21T09:23:31.1338371Z shl.b32 %r1454, %r1438, 10; 2026-02-21T09:23:31.1338549Z shl.b32 %r1455, %r1439, 10; 2026-02-21T09:23:31.1338718Z shl.b32 %r1456, %r1440, 10; 
2026-02-21T09:23:31.1338896Z shl.b32 %r1457, %r1441, 10; 2026-02-21T09:23:31.1339074Z shl.b32 %r1458, %r1442, 10; 2026-02-21T09:23:31.1339391Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1339751Z add.s32 %r6810, %r1290, 73728; 2026-02-21T09:23:31.1339952Z // begin inline asm 2026-02-21T09:23:31.1340174Z @%p160 mbarrier.init.shared::cta.b64 [%r6810], 1; 2026-02-21T09:23:31.1340416Z // end inline asm 2026-02-21T09:23:31.1340570Z bar.sync 0; 2026-02-21T09:23:31.1340721Z add.s32 %r6811, %r1290, 73736; 2026-02-21T09:23:31.1340908Z // begin inline asm 2026-02-21T09:23:31.1341101Z @%p160 mbarrier.init.shared::cta.b64 [%r6811], 1; 2026-02-21T09:23:31.1341338Z // end inline asm 2026-02-21T09:23:31.1341644Z .loc 1 52 60 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:60 2026-02-21T09:23:31.1342110Z or.b32 %r1460, %r1443, %r24; 2026-02-21T09:23:31.1342294Z or.b32 %r1461, %r1444, %r24; 2026-02-21T09:23:31.1342468Z or.b32 %r1462, %r1445, %r24; 2026-02-21T09:23:31.1342646Z or.b32 %r1463, %r1446, %r24; 2026-02-21T09:23:31.1342818Z or.b32 %r1464, %r1447, %r24; 2026-02-21T09:23:31.1342995Z or.b32 %r1465, %r1448, %r24; 2026-02-21T09:23:31.1343167Z or.b32 %r1466, %r1449, %r24; 2026-02-21T09:23:31.1343346Z or.b32 %r1467, %r1450, %r24; 2026-02-21T09:23:31.1343532Z or.b32 %r1468, %r1451, %r24; 2026-02-21T09:23:31.1343707Z or.b32 %r1469, %r1452, %r24; 2026-02-21T09:23:31.1343888Z or.b32 %r1470, %r1453, %r24; 2026-02-21T09:23:31.1344056Z or.b32 %r1471, %r1454, %r24; 2026-02-21T09:23:31.1344233Z or.b32 %r1472, %r1455, %r24; 2026-02-21T09:23:31.1344403Z or.b32 %r1473, %r1456, %r24; 2026-02-21T09:23:31.1344578Z or.b32 %r1474, %r1457, %r24; 2026-02-21T09:23:31.1344762Z or.b32 %r1475, %r1458, %r24; 2026-02-21T09:23:31.1345084Z .loc 1 52 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:32 2026-02-21T09:23:31.1345450Z mad.wide.s32 %rd149, %r1460, 2, %rd102; 2026-02-21T09:23:31.1345665Z mad.wide.s32 %rd150, 
%r1461, 2, %rd102; 2026-02-21T09:23:31.1345881Z mad.wide.s32 %rd151, %r1462, 2, %rd102; 2026-02-21T09:23:31.1346086Z mad.wide.s32 %rd152, %r1463, 2, %rd102; 2026-02-21T09:23:31.1346293Z mad.wide.s32 %rd153, %r1464, 2, %rd102; 2026-02-21T09:23:31.1346613Z mad.wide.s32 %rd154, %r1465, 2, %rd102; 2026-02-21T09:23:31.1346994Z mad.wide.s32 %rd155, %r1466, 2, %rd102; 2026-02-21T09:23:31.1347197Z mad.wide.s32 %rd156, %r1467, 2, %rd102; 2026-02-21T09:23:31.1347406Z mad.wide.s32 %rd157, %r1468, 2, %rd102; 2026-02-21T09:23:31.1347618Z mad.wide.s32 %rd158, %r1469, 2, %rd102; 2026-02-21T09:23:31.1347822Z mad.wide.s32 %rd159, %r1470, 2, %rd102; 2026-02-21T09:23:31.1348030Z mad.wide.s32 %rd160, %r1471, 2, %rd102; 2026-02-21T09:23:31.1348301Z mad.wide.s32 %rd161, %r1472, 2, %rd102; 2026-02-21T09:23:31.1348521Z mad.wide.s32 %rd162, %r1473, 2, %rd102; 2026-02-21T09:23:31.1348724Z mad.wide.s32 %rd163, %r1474, 2, %rd102; 2026-02-21T09:23:31.1348934Z mad.wide.s32 %rd164, %r1475, 2, %rd102; 2026-02-21T09:23:31.1349130Z mov.b32 %r1346, 8; 2026-02-21T09:23:31.1349521Z .loc 1 52 80 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:80 2026-02-21T09:23:31.1349880Z // begin inline asm 2026-02-21T09:23:31.1350114Z cp.async.ca.shared.global [ %r27 + 0 ], [ %rd149 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1350401Z // end inline asm 2026-02-21T09:23:31.1350549Z // begin inline asm 2026-02-21T09:23:31.1350777Z cp.async.ca.shared.global [ %r28 + 0 ], [ %rd150 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1351049Z // end inline asm 2026-02-21T09:23:31.1351301Z // begin inline asm 2026-02-21T09:23:31.1351530Z cp.async.ca.shared.global [ %r29 + 0 ], [ %rd151 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1351803Z // end inline asm 2026-02-21T09:23:31.1351953Z // begin inline asm 2026-02-21T09:23:31.1352172Z cp.async.ca.shared.global [ %r30 + 0 ], [ %rd152 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1352445Z // end inline asm 2026-02-21T09:23:31.1352590Z // begin inline asm 2026-02-21T09:23:31.1352836Z 
cp.async.ca.shared.global [ %r31 + 0 ], [ %rd153 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1353110Z // end inline asm 2026-02-21T09:23:31.1353263Z // begin inline asm 2026-02-21T09:23:31.1353487Z cp.async.ca.shared.global [ %r32 + 0 ], [ %rd154 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1353769Z // end inline asm 2026-02-21T09:23:31.1353922Z // begin inline asm 2026-02-21T09:23:31.1354144Z cp.async.ca.shared.global [ %r33 + 0 ], [ %rd155 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1354417Z // end inline asm 2026-02-21T09:23:31.1354566Z // begin inline asm 2026-02-21T09:23:31.1354794Z cp.async.ca.shared.global [ %r34 + 0 ], [ %rd156 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1355061Z // end inline asm 2026-02-21T09:23:31.1355214Z // begin inline asm 2026-02-21T09:23:31.1355436Z cp.async.ca.shared.global [ %r35 + 0 ], [ %rd157 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1355708Z // end inline asm 2026-02-21T09:23:31.1355864Z // begin inline asm 2026-02-21T09:23:31.1356086Z cp.async.ca.shared.global [ %r36 + 0 ], [ %rd158 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1356361Z // end inline asm 2026-02-21T09:23:31.1356637Z // begin inline asm 2026-02-21T09:23:31.1356870Z cp.async.ca.shared.global [ %r37 + 0 ], [ %rd159 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1357141Z // end inline asm 2026-02-21T09:23:31.1357292Z // begin inline asm 2026-02-21T09:23:31.1357512Z cp.async.ca.shared.global [ %r38 + 0 ], [ %rd160 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1357783Z // end inline asm 2026-02-21T09:23:31.1357932Z // begin inline asm 2026-02-21T09:23:31.1358152Z cp.async.ca.shared.global [ %r39 + 0 ], [ %rd161 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1358423Z // end inline asm 2026-02-21T09:23:31.1358570Z // begin inline asm 2026-02-21T09:23:31.1358793Z cp.async.ca.shared.global [ %r40 + 0 ], [ %rd162 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1359059Z // end inline asm 2026-02-21T09:23:31.1359210Z // begin inline asm 2026-02-21T09:23:31.1359429Z cp.async.ca.shared.global [ %r41 + 0 ], [ %rd163 + 0 ], 
0x8, %r1346; 2026-02-21T09:23:31.1359700Z // end inline asm 2026-02-21T09:23:31.1359846Z // begin inline asm 2026-02-21T09:23:31.1360092Z cp.async.ca.shared.global [ %r42 + 0 ], [ %rd164 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1360365Z // end inline asm 2026-02-21T09:23:31.1360665Z cp.async.commit_group; 2026-02-21T09:23:31.1360983Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1361329Z bar.sync 0; 2026-02-21T09:23:31.1361478Z // begin inline asm 2026-02-21T09:23:31.1361706Z @%p160 mbarrier.arrive.expect_tx.shared.b64 _, [%r6810], 4096; 2026-02-21T09:23:31.1361980Z // end inline asm 2026-02-21T09:23:31.1362270Z .loc 1 58 33 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:58:33 2026-02-21T09:23:31.1362621Z bar.sync 0; 2026-02-21T09:23:31.1362781Z elect.sync %r1476|%p45, -1; 2026-02-21T09:23:31.1362974Z and.pred %p41, %p1, %p45; 2026-02-21T09:23:31.1363163Z add.s32 %r1378, %r1290, 65536; 2026-02-21T09:23:31.1363341Z mov.b32 %r1380, 0; 2026-02-21T09:23:31.1363581Z // begin inline asm 2026-02-21T09:23:31.1364022Z @%p41 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1378], [%rd463, {%r1379, %r1380}], [%r6810]; 2026-02-21T09:23:31.1364502Z // end inline asm 2026-02-21T09:23:31.1364801Z .loc 1 52 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:32 2026-02-21T09:23:31.1365160Z cvt.s64.s32 %rd184, %r1443; 2026-02-21T09:23:31.1365354Z or.b64 %rd185, %rd184, %rd754; 2026-02-21T09:23:31.1365620Z shl.b64 %rd186, %rd185, 1; 2026-02-21T09:23:31.1365816Z add.s64 %rd187, %rd102, %rd186; 2026-02-21T09:23:31.1366005Z add.s64 %rd166, %rd187, 128; 2026-02-21T09:23:31.1366193Z cvt.s64.s32 %rd188, %r1444; 2026-02-21T09:23:31.1366369Z or.b64 %rd189, %rd188, %rd754; 2026-02-21T09:23:31.1366672Z shl.b64 %rd190, %rd189, 1; 2026-02-21T09:23:31.1366857Z add.s64 %rd191, %rd102, %rd190; 2026-02-21T09:23:31.1367041Z add.s64 %rd167, %rd191, 128; 2026-02-21T09:23:31.1367226Z cvt.s64.s32 
%rd192, %r1445; 2026-02-21T09:23:31.1367404Z or.b64 %rd193, %rd192, %rd754; 2026-02-21T09:23:31.1367588Z shl.b64 %rd194, %rd193, 1; 2026-02-21T09:23:31.1367768Z add.s64 %rd195, %rd102, %rd194; 2026-02-21T09:23:31.1367964Z add.s64 %rd168, %rd195, 128; 2026-02-21T09:23:31.1368153Z cvt.s64.s32 %rd196, %r1446; 2026-02-21T09:23:31.1368336Z or.b64 %rd197, %rd196, %rd754; 2026-02-21T09:23:31.1368515Z shl.b64 %rd198, %rd197, 1; 2026-02-21T09:23:31.1368696Z add.s64 %rd199, %rd102, %rd198; 2026-02-21T09:23:31.1368896Z add.s64 %rd169, %rd199, 128; 2026-02-21T09:23:31.1369071Z cvt.s64.s32 %rd200, %r1447; 2026-02-21T09:23:31.1369251Z or.b64 %rd201, %rd200, %rd754; 2026-02-21T09:23:31.1369430Z shl.b64 %rd202, %rd201, 1; 2026-02-21T09:23:31.1369608Z add.s64 %rd203, %rd102, %rd202; 2026-02-21T09:23:31.1369788Z add.s64 %rd170, %rd203, 128; 2026-02-21T09:23:31.1369970Z cvt.s64.s32 %rd204, %r1448; 2026-02-21T09:23:31.1370143Z or.b64 %rd205, %rd204, %rd754; 2026-02-21T09:23:31.1370325Z shl.b64 %rd206, %rd205, 1; 2026-02-21T09:23:31.1370514Z add.s64 %rd207, %rd102, %rd206; 2026-02-21T09:23:31.1370702Z add.s64 %rd171, %rd207, 128; 2026-02-21T09:23:31.1370888Z cvt.s64.s32 %rd208, %r1449; 2026-02-21T09:23:31.1371066Z or.b64 %rd209, %rd208, %rd754; 2026-02-21T09:23:31.1371254Z shl.b64 %rd210, %rd209, 1; 2026-02-21T09:23:31.1371429Z add.s64 %rd211, %rd102, %rd210; 2026-02-21T09:23:31.1371616Z add.s64 %rd172, %rd211, 128; 2026-02-21T09:23:31.1371790Z cvt.s64.s32 %rd212, %r1450; 2026-02-21T09:23:31.1371970Z or.b64 %rd213, %rd212, %rd754; 2026-02-21T09:23:31.1372149Z shl.b64 %rd214, %rd213, 1; 2026-02-21T09:23:31.1372328Z add.s64 %rd215, %rd102, %rd214; 2026-02-21T09:23:31.1372520Z add.s64 %rd173, %rd215, 128; 2026-02-21T09:23:31.1372693Z cvt.s64.s32 %rd216, %r1451; 2026-02-21T09:23:31.1372874Z or.b64 %rd217, %rd216, %rd754; 2026-02-21T09:23:31.1373051Z shl.b64 %rd218, %rd217, 1; 2026-02-21T09:23:31.1373234Z add.s64 %rd219, %rd102, %rd218; 2026-02-21T09:23:31.1373419Z add.s64 %rd174, %rd219, 
128; 2026-02-21T09:23:31.1373602Z cvt.s64.s32 %rd220, %r1452; 2026-02-21T09:23:31.1373775Z or.b64 %rd221, %rd220, %rd754; 2026-02-21T09:23:31.1373959Z shl.b64 %rd222, %rd221, 1; 2026-02-21T09:23:31.1374300Z add.s64 %rd223, %rd102, %rd222; 2026-02-21T09:23:31.1374494Z add.s64 %rd175, %rd223, 128; 2026-02-21T09:23:31.1374674Z cvt.s64.s32 %rd224, %r1453; 2026-02-21T09:23:31.1374850Z or.b64 %rd225, %rd224, %rd754; 2026-02-21T09:23:31.1375055Z shl.b64 %rd226, %rd225, 1; 2026-02-21T09:23:31.1375233Z add.s64 %rd227, %rd102, %rd226; 2026-02-21T09:23:31.1375427Z add.s64 %rd176, %rd227, 128; 2026-02-21T09:23:31.1375605Z cvt.s64.s32 %rd228, %r1454; 2026-02-21T09:23:31.1375785Z or.b64 %rd229, %rd228, %rd754; 2026-02-21T09:23:31.1375965Z shl.b64 %rd230, %rd229, 1; 2026-02-21T09:23:31.1376143Z add.s64 %rd231, %rd102, %rd230; 2026-02-21T09:23:31.1376340Z add.s64 %rd177, %rd231, 128; 2026-02-21T09:23:31.1392275Z cvt.s64.s32 %rd232, %r1455; 2026-02-21T09:23:31.1392703Z or.b64 %rd233, %rd232, %rd754; 2026-02-21T09:23:31.1392922Z shl.b64 %rd234, %rd233, 1; 2026-02-21T09:23:31.1393129Z add.s64 %rd235, %rd102, %rd234; 2026-02-21T09:23:31.1393327Z add.s64 %rd178, %rd235, 128; 2026-02-21T09:23:31.1393537Z cvt.s64.s32 %rd236, %r1456; 2026-02-21T09:23:31.1393730Z or.b64 %rd237, %rd236, %rd754; 2026-02-21T09:23:31.1393931Z shl.b64 %rd238, %rd237, 1; 2026-02-21T09:23:31.1394118Z add.s64 %rd239, %rd102, %rd238; 2026-02-21T09:23:31.1394448Z add.s64 %rd179, %rd239, 128; 2026-02-21T09:23:31.1394644Z cvt.s64.s32 %rd240, %r1457; 2026-02-21T09:23:31.1394826Z or.b64 %rd241, %rd240, %rd754; 2026-02-21T09:23:31.1395015Z shl.b64 %rd242, %rd241, 1; 2026-02-21T09:23:31.1395191Z add.s64 %rd243, %rd102, %rd242; 2026-02-21T09:23:31.1395389Z add.s64 %rd180, %rd243, 128; 2026-02-21T09:23:31.1395568Z cvt.s64.s32 %rd244, %r1458; 2026-02-21T09:23:31.1395752Z or.b64 %rd245, %rd244, %rd754; 2026-02-21T09:23:31.1395932Z shl.b64 %rd246, %rd245, 1; 2026-02-21T09:23:31.1396116Z add.s64 %rd247, %rd102, %rd246; 
2026-02-21T09:23:31.1396302Z add.s64 %rd181, %rd247, 128; 2026-02-21T09:23:31.1396794Z .loc 1 52 80 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:80 2026-02-21T09:23:31.1397186Z // begin inline asm 2026-02-21T09:23:31.1397438Z cp.async.ca.shared.global [ %r43 + 0 ], [ %rd166 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1397728Z // end inline asm 2026-02-21T09:23:31.1397897Z // begin inline asm 2026-02-21T09:23:31.1398141Z cp.async.ca.shared.global [ %r44 + 0 ], [ %rd167 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1398413Z // end inline asm 2026-02-21T09:23:31.1398573Z // begin inline asm 2026-02-21T09:23:31.1398799Z cp.async.ca.shared.global [ %r45 + 0 ], [ %rd168 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1399086Z // end inline asm 2026-02-21T09:23:31.1399249Z // begin inline asm 2026-02-21T09:23:31.1399484Z cp.async.ca.shared.global [ %r46 + 0 ], [ %rd169 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1399776Z // end inline asm 2026-02-21T09:23:31.1399933Z // begin inline asm 2026-02-21T09:23:31.1400169Z cp.async.ca.shared.global [ %r47 + 0 ], [ %rd170 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1400441Z // end inline asm 2026-02-21T09:23:31.1400614Z // begin inline asm 2026-02-21T09:23:31.1400840Z cp.async.ca.shared.global [ %r48 + 0 ], [ %rd171 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1401119Z // end inline asm 2026-02-21T09:23:31.1401273Z // begin inline asm 2026-02-21T09:23:31.1401498Z cp.async.ca.shared.global [ %r49 + 0 ], [ %rd172 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1401774Z // end inline asm 2026-02-21T09:23:31.1401924Z // begin inline asm 2026-02-21T09:23:31.1402152Z cp.async.ca.shared.global [ %r50 + 0 ], [ %rd173 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1402426Z // end inline asm 2026-02-21T09:23:31.1402587Z // begin inline asm 2026-02-21T09:23:31.1402821Z cp.async.ca.shared.global [ %r51 + 0 ], [ %rd174 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1403102Z // end inline asm 2026-02-21T09:23:31.1403264Z // begin inline asm 2026-02-21T09:23:31.1403495Z 
cp.async.ca.shared.global [ %r52 + 0 ], [ %rd175 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1403776Z // end inline asm 2026-02-21T09:23:31.1403930Z // begin inline asm 2026-02-21T09:23:31.1404354Z cp.async.ca.shared.global [ %r53 + 0 ], [ %rd176 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1404640Z // end inline asm 2026-02-21T09:23:31.1404800Z // begin inline asm 2026-02-21T09:23:31.1405031Z cp.async.ca.shared.global [ %r54 + 0 ], [ %rd177 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1405309Z // end inline asm 2026-02-21T09:23:31.1405468Z // begin inline asm 2026-02-21T09:23:31.1405696Z cp.async.ca.shared.global [ %r55 + 0 ], [ %rd178 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1405977Z // end inline asm 2026-02-21T09:23:31.1406130Z // begin inline asm 2026-02-21T09:23:31.1406360Z cp.async.ca.shared.global [ %r56 + 0 ], [ %rd179 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1406762Z // end inline asm 2026-02-21T09:23:31.1406922Z // begin inline asm 2026-02-21T09:23:31.1407240Z cp.async.ca.shared.global [ %r57 + 0 ], [ %rd180 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1407523Z // end inline asm 2026-02-21T09:23:31.1407677Z // begin inline asm 2026-02-21T09:23:31.1407900Z cp.async.ca.shared.global [ %r58 + 0 ], [ %rd181 + 0 ], 0x8, %r1346; 2026-02-21T09:23:31.1408180Z // end inline asm 2026-02-21T09:23:31.1408340Z cp.async.commit_group; 2026-02-21T09:23:31.1408679Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1409145Z bar.sync 0; 2026-02-21T09:23:31.1409309Z // begin inline asm 2026-02-21T09:23:31.1409559Z @%p160 mbarrier.arrive.expect_tx.shared.b64 _, [%r6811], 4096; 2026-02-21T09:23:31.1409845Z // end inline asm 2026-02-21T09:23:31.1410159Z .loc 1 58 33 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:58:33 2026-02-21T09:23:31.1410517Z bar.sync 0; 2026-02-21T09:23:31.1410683Z elect.sync %r1477|%p46, -1; 2026-02-21T09:23:31.1410888Z and.pred %p43, %p1, %p46; 2026-02-21T09:23:31.1411088Z add.s32 %r1415, %r1290, 69632; 
2026-02-21T09:23:31.1411271Z mov.b32 %r1417, 32; 2026-02-21T09:23:31.1411436Z // begin inline asm 2026-02-21T09:23:31.1411877Z @%p43 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1415], [%rd463, {%r1379, %r1417}], [%r6811]; 2026-02-21T09:23:31.1412358Z // end inline asm 2026-02-21T09:23:31.1412669Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1413028Z shl.b32 %r1478, %r12481, 7; 2026-02-21T09:23:31.1413225Z or.b32 %r1479, %r22, %r1478; 2026-02-21T09:23:31.1413405Z shl.b32 %r1480, %r1425, 7; 2026-02-21T09:23:31.1413596Z and.b32 %r1481, %r1480, -16384; 2026-02-21T09:23:31.1413788Z sub.s32 %r1482, %r1479, %r1481; 2026-02-21T09:23:31.1413988Z shl.b32 %r1483, %r1482, 10; 2026-02-21T09:23:31.1414171Z mul.wide.s32 %rd248, %r1483, 2; 2026-02-21T09:23:31.1414368Z or.b64 %rd22, %rd248, 256; 2026-02-21T09:23:31.1414551Z or.b32 %r1484, %r21, %r1478; 2026-02-21T09:23:31.1414730Z sub.s32 %r1485, %r1484, %r1481; 2026-02-21T09:23:31.1414922Z shl.b32 %r1486, %r1485, 10; 2026-02-21T09:23:31.1415101Z mul.wide.s32 %rd249, %r1486, 2; 2026-02-21T09:23:31.1415299Z or.b64 %rd23, %rd249, 256; 2026-02-21T09:23:31.1415475Z or.b32 %r1487, %r20, %r1478; 2026-02-21T09:23:31.1415661Z sub.s32 %r1488, %r1487, %r1481; 2026-02-21T09:23:31.1415842Z shl.b32 %r1489, %r1488, 10; 2026-02-21T09:23:31.1416024Z mul.wide.s32 %rd250, %r1489, 2; 2026-02-21T09:23:31.1416219Z or.b64 %rd24, %rd250, 256; 2026-02-21T09:23:31.1416392Z or.b32 %r1490, %r19, %r1478; 2026-02-21T09:23:31.1416712Z sub.s32 %r1491, %r1490, %r1481; 2026-02-21T09:23:31.1416902Z shl.b32 %r1492, %r1491, 10; 2026-02-21T09:23:31.1417091Z mul.wide.s32 %rd251, %r1492, 2; 2026-02-21T09:23:31.1417281Z or.b64 %rd25, %rd251, 256; 2026-02-21T09:23:31.1417463Z or.b32 %r1493, %r18, %r1478; 2026-02-21T09:23:31.1417644Z sub.s32 %r1494, %r1493, %r1481; 2026-02-21T09:23:31.1417835Z shl.b32 %r1495, %r1494, 10; 2026-02-21T09:23:31.1418015Z mul.wide.s32 %rd252, %r1495, 2; 
2026-02-21T09:23:31.1418209Z or.b64 %rd26, %rd252, 256; 2026-02-21T09:23:31.1418400Z or.b32 %r1496, %r17, %r1478; 2026-02-21T09:23:31.1418741Z sub.s32 %r1497, %r1496, %r1481; 2026-02-21T09:23:31.1418928Z shl.b32 %r1498, %r1497, 10; 2026-02-21T09:23:31.1419113Z mul.wide.s32 %rd253, %r1498, 2; 2026-02-21T09:23:31.1419297Z or.b64 %rd27, %rd253, 256; 2026-02-21T09:23:31.1419475Z or.b32 %r1499, %r16, %r1478; 2026-02-21T09:23:31.1419656Z sub.s32 %r1500, %r1499, %r1481; 2026-02-21T09:23:31.1419840Z shl.b32 %r1501, %r1500, 10; 2026-02-21T09:23:31.1420021Z mul.wide.s32 %rd254, %r1501, 2; 2026-02-21T09:23:31.1420203Z or.b64 %rd28, %rd254, 256; 2026-02-21T09:23:31.1420380Z or.b32 %r1502, %r15, %r1478; 2026-02-21T09:23:31.1420553Z sub.s32 %r1503, %r1502, %r1481; 2026-02-21T09:23:31.1420744Z shl.b32 %r1504, %r1503, 10; 2026-02-21T09:23:31.1420923Z mul.wide.s32 %rd255, %r1504, 2; 2026-02-21T09:23:31.1421114Z or.b64 %rd29, %rd255, 256; 2026-02-21T09:23:31.1421376Z or.b32 %r1505, %r14, %r1478; 2026-02-21T09:23:31.1421556Z sub.s32 %r1506, %r1505, %r1481; 2026-02-21T09:23:31.1421746Z shl.b32 %r1507, %r1506, 10; 2026-02-21T09:23:31.1421941Z mul.wide.s32 %rd256, %r1507, 2; 2026-02-21T09:23:31.1422131Z or.b64 %rd30, %rd256, 256; 2026-02-21T09:23:31.1422302Z or.b32 %r1508, %r13, %r1478; 2026-02-21T09:23:31.1422476Z sub.s32 %r1509, %r1508, %r1481; 2026-02-21T09:23:31.1422651Z shl.b32 %r1510, %r1509, 10; 2026-02-21T09:23:31.1422901Z mul.wide.s32 %rd257, %r1510, 2; 2026-02-21T09:23:31.1423098Z or.b64 %rd31, %rd257, 256; 2026-02-21T09:23:31.1423284Z or.b32 %r1511, %r12, %r1478; 2026-02-21T09:23:31.1423471Z sub.s32 %r1512, %r1511, %r1481; 2026-02-21T09:23:31.1423655Z shl.b32 %r1513, %r1512, 10; 2026-02-21T09:23:31.1423835Z mul.wide.s32 %rd258, %r1513, 2; 2026-02-21T09:23:31.1424025Z or.b64 %rd32, %rd258, 256; 2026-02-21T09:23:31.1424206Z or.b32 %r1514, %r11, %r1478; 2026-02-21T09:23:31.1424383Z sub.s32 %r1515, %r1514, %r1481; 2026-02-21T09:23:31.1424578Z shl.b32 %r1516, %r1515, 10; 
2026-02-21T09:23:31.1424762Z mul.wide.s32 %rd259, %r1516, 2; 2026-02-21T09:23:31.1424948Z or.b64 %rd33, %rd259, 256; 2026-02-21T09:23:31.1425129Z or.b32 %r1517, %r10, %r1478; 2026-02-21T09:23:31.1425306Z sub.s32 %r1518, %r1517, %r1481; 2026-02-21T09:23:31.1425497Z shl.b32 %r1519, %r1518, 10; 2026-02-21T09:23:31.1425674Z mul.wide.s32 %rd260, %r1519, 2; 2026-02-21T09:23:31.1425865Z or.b64 %rd34, %rd260, 256; 2026-02-21T09:23:31.1426042Z or.b32 %r1520, %r9, %r1478; 2026-02-21T09:23:31.1426220Z sub.s32 %r1521, %r1520, %r1481; 2026-02-21T09:23:31.1426403Z shl.b32 %r1522, %r1521, 10; 2026-02-21T09:23:31.1426722Z mul.wide.s32 %rd261, %r1522, 2; 2026-02-21T09:23:31.1426910Z or.b64 %rd35, %rd261, 256; 2026-02-21T09:23:31.1427088Z or.b32 %r1523, %r8, %r1478; 2026-02-21T09:23:31.1427264Z sub.s32 %r1524, %r1523, %r1481; 2026-02-21T09:23:31.1427445Z shl.b32 %r1525, %r1524, 10; 2026-02-21T09:23:31.1427635Z mul.wide.s32 %rd262, %r1525, 2; 2026-02-21T09:23:31.1427829Z or.b64 %rd36, %rd262, 256; 2026-02-21T09:23:31.1428001Z or.b32 %r1526, %r7, %r1478; 2026-02-21T09:23:31.1428177Z sub.s32 %r1527, %r1526, %r1481; 2026-02-21T09:23:31.1428449Z shl.b32 %r1528, %r1527, 10; 2026-02-21T09:23:31.1428637Z mul.wide.s32 %rd263, %r1528, 2; 2026-02-21T09:23:31.1428821Z or.b64 %rd37, %rd263, 256; 2026-02-21T09:23:31.1428998Z mov.b32 %r12485, 0f00000000; 2026-02-21T09:23:31.1429171Z mov.b32 %r12484, 1; 2026-02-21T09:23:31.1429338Z mov.b32 %r12483, -1; 2026-02-21T09:23:31.1429498Z mov.b64 %rd756, 0; 2026-02-21T09:23:31.1429665Z mov.b64 %rd755, %rd11; 2026-02-21T09:23:31.1429842Z mov.b32 %r12482, %r1380; 2026-02-21T09:23:31.1430017Z mov.b32 %r12486, %r12485; 2026-02-21T09:23:31.1430193Z mov.b32 %r12487, %r12485; 2026-02-21T09:23:31.1430358Z mov.b32 %r12488, %r12485; 2026-02-21T09:23:31.1430526Z mov.b32 %r12489, %r12485; 2026-02-21T09:23:31.1430688Z mov.b32 %r12490, %r12485; 2026-02-21T09:23:31.1430856Z mov.b32 %r12491, %r12485; 2026-02-21T09:23:31.1431019Z mov.b32 %r12492, %r12485; 
2026-02-21T09:23:31.1431187Z mov.b32 %r12493, %r12485; 2026-02-21T09:23:31.1431350Z mov.b32 %r12494, %r12485; 2026-02-21T09:23:31.1431689Z mov.b32 %r12495, %r12485; 2026-02-21T09:23:31.1431860Z mov.b32 %r12496, %r12485; 2026-02-21T09:23:31.1432023Z mov.b32 %r12497, %r12485; 2026-02-21T09:23:31.1432192Z mov.b32 %r12498, %r12485; 2026-02-21T09:23:31.1432356Z mov.b32 %r12499, %r12485; 2026-02-21T09:23:31.1432525Z mov.b32 %r12500, %r12485; 2026-02-21T09:23:31.1432689Z mov.b32 %r12501, %r12485; 2026-02-21T09:23:31.1432857Z mov.b32 %r12502, %r12485; 2026-02-21T09:23:31.1433022Z mov.b32 %r12503, %r12485; 2026-02-21T09:23:31.1433193Z mov.b32 %r12504, %r12485; 2026-02-21T09:23:31.1433362Z mov.b32 %r12505, %r12485; 2026-02-21T09:23:31.1433526Z mov.b32 %r12506, %r12485; 2026-02-21T09:23:31.1433695Z mov.b32 %r12507, %r12485; 2026-02-21T09:23:31.1433855Z mov.b32 %r12508, %r12485; 2026-02-21T09:23:31.1434106Z mov.b32 %r12509, %r12485; 2026-02-21T09:23:31.1434273Z mov.b32 %r12510, %r12485; 2026-02-21T09:23:31.1434443Z mov.b32 %r12511, %r12485; 2026-02-21T09:23:31.1434605Z mov.b32 %r12512, %r12485; 2026-02-21T09:23:31.1434778Z mov.b32 %r12513, %r12485; 2026-02-21T09:23:31.1434940Z mov.b32 %r12514, %r12485; 2026-02-21T09:23:31.1435109Z mov.b32 %r12515, %r12485; 2026-02-21T09:23:31.1435279Z mov.b32 %r12516, %r12485; 2026-02-21T09:23:31.1435445Z mov.b32 %r12517, %r12485; 2026-02-21T09:23:31.1435690Z mov.b32 %r12518, %r12485; 2026-02-21T09:23:31.1435858Z mov.b32 %r12519, %r12485; 2026-02-21T09:23:31.1436029Z mov.b32 %r12520, %r12485; 2026-02-21T09:23:31.1436194Z mov.b32 %r12521, %r12485; 2026-02-21T09:23:31.1436365Z mov.b32 %r12522, %r12485; 2026-02-21T09:23:31.1436666Z mov.b32 %r12523, %r12485; 2026-02-21T09:23:31.1436843Z mov.b32 %r12524, %r12485; 2026-02-21T09:23:31.1437008Z mov.b32 %r12525, %r12485; 2026-02-21T09:23:31.1437178Z mov.b32 %r12526, %r12485; 2026-02-21T09:23:31.1437352Z mov.b32 %r12527, %r12485; 2026-02-21T09:23:31.1437519Z mov.b32 %r12528, %r12485; 
2026-02-21T09:23:31.1437688Z mov.b32 %r12529, %r12485; 2026-02-21T09:23:31.1437849Z mov.b32 %r12530, %r12485; 2026-02-21T09:23:31.1438023Z mov.b32 %r12531, %r12485; 2026-02-21T09:23:31.1438190Z mov.b32 %r12532, %r12485; 2026-02-21T09:23:31.1438358Z mov.b32 %r12533, %r12485; 2026-02-21T09:23:31.1438524Z mov.b32 %r12534, %r12485; 2026-02-21T09:23:31.1438696Z mov.b32 %r12535, %r12485; 2026-02-21T09:23:31.1438862Z mov.b32 %r12536, %r12485; 2026-02-21T09:23:31.1439030Z mov.b32 %r12537, %r12485; 2026-02-21T09:23:31.1439204Z mov.b32 %r12538, %r12485; 2026-02-21T09:23:31.1439365Z mov.b32 %r12539, %r12485; 2026-02-21T09:23:31.1439536Z mov.b32 %r12540, %r12485; 2026-02-21T09:23:31.1439715Z mov.b32 %r12541, %r12485; 2026-02-21T09:23:31.1439886Z mov.b32 %r12542, %r12485; 2026-02-21T09:23:31.1440048Z mov.b32 %r12543, %r12485; 2026-02-21T09:23:31.1440214Z mov.b32 %r12544, %r12485; 2026-02-21T09:23:31.1440381Z mov.b32 %r12545, %r12485; 2026-02-21T09:23:31.1440549Z mov.b32 %r12546, %r12485; 2026-02-21T09:23:31.1440711Z mov.b32 %r12547, %r12485; 2026-02-21T09:23:31.1440875Z mov.b32 %r12548, %r12485; 2026-02-21T09:23:31.1441044Z mov.b32 %r12549, %r12485; 2026-02-21T09:23:31.1441207Z mov.b32 %r12550, %r12485; 2026-02-21T09:23:31.1441377Z mov.b32 %r12551, %r12485; 2026-02-21T09:23:31.1441542Z mov.b32 %r12552, %r12485; 2026-02-21T09:23:31.1441714Z mov.b32 %r12553, %r12485; 2026-02-21T09:23:31.1441879Z mov.b32 %r12554, %r12485; 2026-02-21T09:23:31.1442049Z mov.b32 %r12555, %r12485; 2026-02-21T09:23:31.1442211Z mov.b32 %r12556, %r12485; 2026-02-21T09:23:31.1442382Z mov.b32 %r12557, %r12485; 2026-02-21T09:23:31.1442548Z mov.b32 %r12558, %r12485; 2026-02-21T09:23:31.1442721Z mov.b32 %r12559, %r12485; 2026-02-21T09:23:31.1442891Z mov.b32 %r12560, %r12485; 2026-02-21T09:23:31.1443054Z mov.b32 %r12561, %r12485; 2026-02-21T09:23:31.1443222Z mov.b32 %r12562, %r12485; 2026-02-21T09:23:31.1443387Z mov.b32 %r12563, %r12485; 2026-02-21T09:23:31.1443556Z mov.b32 %r12564, %r12485; 
2026-02-21T09:23:31.1443721Z mov.b32 %r12565, %r12485; 2026-02-21T09:23:31.1443888Z mov.b32 %r12566, %r12485; 2026-02-21T09:23:31.1444237Z mov.b32 %r12567, %r12485; 2026-02-21T09:23:31.1444407Z mov.b32 %r12568, %r12485; 2026-02-21T09:23:31.1444573Z mov.b32 %r12569, %r12485; 2026-02-21T09:23:31.1444746Z mov.b32 %r12570, %r12485; 2026-02-21T09:23:31.1444915Z mov.b32 %r12571, %r12485; 2026-02-21T09:23:31.1445084Z mov.b32 %r12572, %r12485; 2026-02-21T09:23:31.1445256Z mov.b32 %r12573, %r12485; 2026-02-21T09:23:31.1445420Z mov.b32 %r12574, %r12485; 2026-02-21T09:23:31.1445595Z mov.b32 %r12575, %r12485; 2026-02-21T09:23:31.1445760Z mov.b32 %r12576, %r12485; 2026-02-21T09:23:31.1445935Z mov.b32 %r12577, %r12485; 2026-02-21T09:23:31.1446100Z mov.b32 %r12578, %r12485; 2026-02-21T09:23:31.1446269Z mov.b32 %r12579, %r12485; 2026-02-21T09:23:31.1446437Z mov.b32 %r12580, %r12485; 2026-02-21T09:23:31.1446860Z mov.b32 %r12581, %r12485; 2026-02-21T09:23:31.1447045Z mov.b32 %r12582, %r12485; 2026-02-21T09:23:31.1447215Z mov.b32 %r12583, %r12485; 2026-02-21T09:23:31.1447389Z mov.b32 %r12584, %r12485; 2026-02-21T09:23:31.1447559Z mov.b32 %r12585, %r12485; 2026-02-21T09:23:31.1447733Z mov.b32 %r12586, %r12485; 2026-02-21T09:23:31.1447899Z mov.b32 %r12587, %r12485; 2026-02-21T09:23:31.1448065Z mov.b32 %r12588, %r12485; 2026-02-21T09:23:31.1448229Z mov.b32 %r12589, %r12485; 2026-02-21T09:23:31.1448468Z mov.b32 %r12590, %r12485; 2026-02-21T09:23:31.1448637Z mov.b32 %r12591, %r12485; 2026-02-21T09:23:31.1448806Z mov.b32 %r12592, %r12485; 2026-02-21T09:23:31.1448975Z mov.b32 %r12593, %r12485; 2026-02-21T09:23:31.1449142Z mov.b32 %r12594, %r12485; 2026-02-21T09:23:31.1449309Z mov.b32 %r12595, %r12485; 2026-02-21T09:23:31.1449471Z mov.b32 %r12596, %r12485; 2026-02-21T09:23:31.1449639Z mov.b32 %r12597, %r12485; 2026-02-21T09:23:31.1449803Z mov.b32 %r12598, %r12485; 2026-02-21T09:23:31.1449991Z mov.b32 %r12599, %r12485; 2026-02-21T09:23:31.1450159Z mov.b32 %r12600, %r12485; 
2026-02-21T09:23:31.1450326Z mov.b32 %r12601, %r12485; 2026-02-21T09:23:31.1450492Z mov.b32 %r12602, %r12485; 2026-02-21T09:23:31.1450663Z mov.b32 %r12603, %r12485; 2026-02-21T09:23:31.1450831Z mov.b32 %r12604, %r12485; 2026-02-21T09:23:31.1450997Z mov.b32 %r12605, %r12485; 2026-02-21T09:23:31.1451168Z mov.b32 %r12606, %r12485; 2026-02-21T09:23:31.1451332Z mov.b32 %r12607, %r12485; 2026-02-21T09:23:31.1451504Z mov.b32 %r12608, %r12485; 2026-02-21T09:23:31.1451667Z mov.b32 %r12609, %r12485; 2026-02-21T09:23:31.1451841Z mov.b32 %r12610, %r12485; 2026-02-21T09:23:31.1452004Z mov.b32 %r12611, %r12485; 2026-02-21T09:23:31.1452172Z mov.b32 %r12612, %r12485; 2026-02-21T09:23:31.1452400Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T09:23:31.1452699Z // => This Inner Loop Header: Depth=2 2026-02-21T09:23:31.1452969Z setp.lt.u64 %p67, %rd756, 448; 2026-02-21T09:23:31.1453161Z add.s32 %r3942, %r12483, 1; 2026-02-21T09:23:31.1453347Z setp.gt.s32 %p68, %r3942, 1; 2026-02-21T09:23:31.1453535Z selp.b32 %r12483, 0, %r3942, %p68; 2026-02-21T09:23:31.1453743Z selp.b32 %r3943, 1, 0, %p68; 2026-02-21T09:23:31.1453926Z xor.b32 %r12482, %r12482, %r3943; 2026-02-21T09:23:31.1454289Z .loc 1 52 80 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:80 2026-02-21T09:23:31.1454665Z cp.async.wait_group 1; 2026-02-21T09:23:31.1454837Z bar.sync 0; 2026-02-21T09:23:31.1454990Z shl.b32 %r3944, %r12483, 14; 2026-02-21T09:23:31.1455169Z add.s32 %r3946, %r1290, %r3944; 2026-02-21T09:23:31.1455501Z .loc 1 56 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:56:32 2026-02-21T09:23:31.1455859Z add.s32 %r3947, %r3946, %r59; 2026-02-21T09:23:31.1456057Z ld.shared.b16 %rs1, [%r3947]; 2026-02-21T09:23:31.1456242Z ld.shared.b16 %rs2, [%r3947+1024]; 2026-02-21T09:23:31.1456578Z ld.shared.b16 %rs3, [%r3947+64]; 2026-02-21T09:23:31.1456787Z ld.shared.b16 %rs4, [%r3947+1088]; 2026-02-21T09:23:31.1456985Z ld.shared.b16 %rs5, [%r3947+8192]; 2026-02-21T09:23:31.1457346Z ld.shared.b16 
%rs6, [%r3947+9216]; 2026-02-21T09:23:31.1457538Z ld.shared.b16 %rs7, [%r3947+8256]; 2026-02-21T09:23:31.1457734Z ld.shared.b16 %rs8, [%r3947+9280]; 2026-02-21T09:23:31.1457926Z add.s32 %r3948, %r3946, %r60; 2026-02-21T09:23:31.1458115Z ld.shared.b16 %rs9, [%r3948]; 2026-02-21T09:23:31.1458303Z ld.shared.b16 %rs10, [%r3948+1024]; 2026-02-21T09:23:31.1458510Z ld.shared.b16 %rs11, [%r3948+64]; 2026-02-21T09:23:31.1458708Z ld.shared.b16 %rs12, [%r3948+1088]; 2026-02-21T09:23:31.1458904Z ld.shared.b16 %rs13, [%r3948+8192]; 2026-02-21T09:23:31.1459110Z ld.shared.b16 %rs14, [%r3948+9216]; 2026-02-21T09:23:31.1459305Z ld.shared.b16 %rs15, [%r3948+8256]; 2026-02-21T09:23:31.1459505Z ld.shared.b16 %rs16, [%r3948+9280]; 2026-02-21T09:23:31.1459708Z add.s32 %r3949, %r3946, %r61; 2026-02-21T09:23:31.1459980Z ld.shared.b16 %rs17, [%r3949]; 2026-02-21T09:23:31.1460174Z ld.shared.b16 %rs18, [%r3949+1024]; 2026-02-21T09:23:31.1460372Z ld.shared.b16 %rs19, [%r3949+64]; 2026-02-21T09:23:31.1460575Z ld.shared.b16 %rs20, [%r3949+1088]; 2026-02-21T09:23:31.1460766Z ld.shared.b16 %rs21, [%r3949+8192]; 2026-02-21T09:23:31.1460966Z ld.shared.b16 %rs22, [%r3949+9216]; 2026-02-21T09:23:31.1461159Z ld.shared.b16 %rs23, [%r3949+8256]; 2026-02-21T09:23:31.1461424Z ld.shared.b16 %rs24, [%r3949+9280]; 2026-02-21T09:23:31.1461619Z add.s32 %r3950, %r3946, %r62; 2026-02-21T09:23:31.1461808Z ld.shared.b16 %rs25, [%r3950]; 2026-02-21T09:23:31.1461996Z ld.shared.b16 %rs26, [%r3950+1024]; 2026-02-21T09:23:31.1462194Z ld.shared.b16 %rs27, [%r3950+64]; 2026-02-21T09:23:31.1462389Z ld.shared.b16 %rs28, [%r3950+1088]; 2026-02-21T09:23:31.1462602Z ld.shared.b16 %rs29, [%r3950+8192]; 2026-02-21T09:23:31.1462807Z ld.shared.b16 %rs30, [%r3950+9216]; 2026-02-21T09:23:31.1463009Z ld.shared.b16 %rs31, [%r3950+8256]; 2026-02-21T09:23:31.1463212Z ld.shared.b16 %rs32, [%r3950+9280]; 2026-02-21T09:23:31.1463408Z add.s32 %r3951, %r3946, %r63; 2026-02-21T09:23:31.1463602Z ld.shared.b16 %rs33, [%r3951]; 
2026-02-21T09:23:31.1463799Z ld.shared.b16 %rs34, [%r3951+1024]; 2026-02-21T09:23:31.1464008Z ld.shared.b16 %rs35, [%r3951+64]; 2026-02-21T09:23:31.1464208Z ld.shared.b16 %rs36, [%r3951+1088]; 2026-02-21T09:23:31.1464409Z ld.shared.b16 %rs37, [%r3951+8192]; 2026-02-21T09:23:31.1464613Z ld.shared.b16 %rs38, [%r3951+9216]; 2026-02-21T09:23:31.1464811Z ld.shared.b16 %rs39, [%r3951+8256]; 2026-02-21T09:23:31.1465010Z ld.shared.b16 %rs40, [%r3951+9280]; 2026-02-21T09:23:31.1465200Z add.s32 %r3952, %r3946, %r64; 2026-02-21T09:23:31.1465391Z ld.shared.b16 %rs41, [%r3952]; 2026-02-21T09:23:31.1465589Z ld.shared.b16 %rs42, [%r3952+1024]; 2026-02-21T09:23:31.1465792Z ld.shared.b16 %rs43, [%r3952+64]; 2026-02-21T09:23:31.1465990Z ld.shared.b16 %rs44, [%r3952+1088]; 2026-02-21T09:23:31.1466193Z ld.shared.b16 %rs45, [%r3952+8192]; 2026-02-21T09:23:31.1466410Z ld.shared.b16 %rs46, [%r3952+9216]; 2026-02-21T09:23:31.1466724Z ld.shared.b16 %rs47, [%r3952+8256]; 2026-02-21T09:23:31.1466933Z ld.shared.b16 %rs48, [%r3952+9280]; 2026-02-21T09:23:31.1467129Z add.s32 %r3953, %r3946, %r65; 2026-02-21T09:23:31.1467318Z ld.shared.b16 %rs49, [%r3953]; 2026-02-21T09:23:31.1467504Z ld.shared.b16 %rs50, [%r3953+1024]; 2026-02-21T09:23:31.1467710Z ld.shared.b16 %rs51, [%r3953+64]; 2026-02-21T09:23:31.1467903Z ld.shared.b16 %rs52, [%r3953+1088]; 2026-02-21T09:23:31.1468108Z ld.shared.b16 %rs53, [%r3953+8192]; 2026-02-21T09:23:31.1468408Z ld.shared.b16 %rs54, [%r3953+9216]; 2026-02-21T09:23:31.1468607Z ld.shared.b16 %rs55, [%r3953+8256]; 2026-02-21T09:23:31.1468806Z ld.shared.b16 %rs56, [%r3953+9280]; 2026-02-21T09:23:31.1469003Z add.s32 %r3954, %r3946, %r66; 2026-02-21T09:23:31.1469195Z ld.shared.b16 %rs57, [%r3954]; 2026-02-21T09:23:31.1469379Z ld.shared.b16 %rs58, [%r3954+1024]; 2026-02-21T09:23:31.1469580Z ld.shared.b16 %rs59, [%r3954+64]; 2026-02-21T09:23:31.1469771Z ld.shared.b16 %rs60, [%r3954+1088]; 2026-02-21T09:23:31.1469968Z ld.shared.b16 %rs61, [%r3954+8192]; 
2026-02-21T09:23:31.1470352Z ld.shared.b16 %rs62, [%r3954+9216]; 2026-02-21T09:23:31.1470552Z ld.shared.b16 %rs63, [%r3954+8256]; 2026-02-21T09:23:31.1470751Z ld.shared.b16 %rs64, [%r3954+9280]; 2026-02-21T09:23:31.1470943Z cvt.f32.bf16 %r1659, %rs1; 2026-02-21T09:23:31.1471127Z cvt.f32.bf16 %r1660, %rs2; 2026-02-21T09:23:31.1471302Z cvt.f32.bf16 %r1661, %rs9; 2026-02-21T09:23:31.1471484Z cvt.f32.bf16 %r1662, %rs10; 2026-02-21T09:23:31.1471666Z cvt.f32.bf16 %r1791, %rs17; 2026-02-21T09:23:31.1471870Z cvt.f32.bf16 %r1792, %rs18; 2026-02-21T09:23:31.1472048Z cvt.f32.bf16 %r1793, %rs25; 2026-02-21T09:23:31.1472227Z cvt.f32.bf16 %r1794, %rs26; 2026-02-21T09:23:31.1472405Z cvt.f32.bf16 %r1923, %rs33; 2026-02-21T09:23:31.1472579Z cvt.f32.bf16 %r1924, %rs34; 2026-02-21T09:23:31.1472851Z cvt.f32.bf16 %r1925, %rs41; 2026-02-21T09:23:31.1473030Z cvt.f32.bf16 %r1926, %rs42; 2026-02-21T09:23:31.1473208Z cvt.f32.bf16 %r2055, %rs49; 2026-02-21T09:23:31.1473381Z cvt.f32.bf16 %r2056, %rs50; 2026-02-21T09:23:31.1473564Z cvt.f32.bf16 %r2057, %rs57; 2026-02-21T09:23:31.1473736Z cvt.f32.bf16 %r2058, %rs58; 2026-02-21T09:23:31.1473919Z cvt.f32.bf16 %r2187, %rs3; 2026-02-21T09:23:31.1474097Z cvt.f32.bf16 %r2188, %rs4; 2026-02-21T09:23:31.1474350Z cvt.f32.bf16 %r2189, %rs11; 2026-02-21T09:23:31.1474535Z cvt.f32.bf16 %r2190, %rs12; 2026-02-21T09:23:31.1474708Z cvt.f32.bf16 %r2319, %rs19; 2026-02-21T09:23:31.1474888Z cvt.f32.bf16 %r2320, %rs20; 2026-02-21T09:23:31.1475061Z cvt.f32.bf16 %r2321, %rs27; 2026-02-21T09:23:31.1475239Z cvt.f32.bf16 %r2322, %rs28; 2026-02-21T09:23:31.1475412Z cvt.f32.bf16 %r2451, %rs35; 2026-02-21T09:23:31.1475606Z cvt.f32.bf16 %r2452, %rs36; 2026-02-21T09:23:31.1475783Z cvt.f32.bf16 %r2453, %rs43; 2026-02-21T09:23:31.1475962Z cvt.f32.bf16 %r2454, %rs44; 2026-02-21T09:23:31.1476141Z cvt.f32.bf16 %r2583, %rs51; 2026-02-21T09:23:31.1476324Z cvt.f32.bf16 %r2584, %rs52; 2026-02-21T09:23:31.1476625Z cvt.f32.bf16 %r2585, %rs59; 2026-02-21T09:23:31.1476810Z cvt.f32.bf16 
%r2586, %rs60; 2026-02-21T09:23:31.1477006Z cvt.f32.bf16 %r2715, %rs5; 2026-02-21T09:23:31.1477185Z cvt.f32.bf16 %r2716, %rs6; 2026-02-21T09:23:31.1477364Z cvt.f32.bf16 %r2717, %rs13; 2026-02-21T09:23:31.1477542Z cvt.f32.bf16 %r2718, %rs14; 2026-02-21T09:23:31.1477723Z cvt.f32.bf16 %r2847, %rs21; 2026-02-21T09:23:31.1477897Z cvt.f32.bf16 %r2848, %rs22; 2026-02-21T09:23:31.1478072Z cvt.f32.bf16 %r2849, %rs29; 2026-02-21T09:23:31.1478251Z cvt.f32.bf16 %r2850, %rs30; 2026-02-21T09:23:31.1478425Z cvt.f32.bf16 %r2979, %rs37; 2026-02-21T09:23:31.1478604Z cvt.f32.bf16 %r2980, %rs38; 2026-02-21T09:23:31.1478776Z cvt.f32.bf16 %r2981, %rs45; 2026-02-21T09:23:31.1478960Z cvt.f32.bf16 %r2982, %rs46; 2026-02-21T09:23:31.1479142Z cvt.f32.bf16 %r3111, %rs53; 2026-02-21T09:23:31.1479320Z cvt.f32.bf16 %r3112, %rs54; 2026-02-21T09:23:31.1479492Z cvt.f32.bf16 %r3113, %rs61; 2026-02-21T09:23:31.1479676Z cvt.f32.bf16 %r3114, %rs62; 2026-02-21T09:23:31.1479863Z cvt.f32.bf16 %r3243, %rs7; 2026-02-21T09:23:31.1480045Z cvt.f32.bf16 %r3244, %rs8; 2026-02-21T09:23:31.1480225Z cvt.f32.bf16 %r3245, %rs15; 2026-02-21T09:23:31.1480399Z cvt.f32.bf16 %r3246, %rs16; 2026-02-21T09:23:31.1480578Z cvt.f32.bf16 %r3375, %rs23; 2026-02-21T09:23:31.1480752Z cvt.f32.bf16 %r3376, %rs24; 2026-02-21T09:23:31.1480931Z cvt.f32.bf16 %r3377, %rs31; 2026-02-21T09:23:31.1481106Z cvt.f32.bf16 %r3378, %rs32; 2026-02-21T09:23:31.1481283Z cvt.f32.bf16 %r3507, %rs39; 2026-02-21T09:23:31.1481454Z cvt.f32.bf16 %r3508, %rs40; 2026-02-21T09:23:31.1481633Z cvt.f32.bf16 %r3509, %rs47; 2026-02-21T09:23:31.1481808Z cvt.f32.bf16 %r3510, %rs48; 2026-02-21T09:23:31.1481876Z cvt.f32.bf16 %r3639, %rs55; 2026-02-21T09:23:31.1481937Z cvt.f32.bf16 %r3640, %rs56; 2026-02-21T09:23:31.1482003Z cvt.f32.bf16 %r3641, %rs63; 2026-02-21T09:23:31.1482069Z cvt.f32.bf16 %r3642, %rs64; 2026-02-21T09:23:31.1482295Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1482530Z shl.b32 %r3955, %r12483, 3; 
2026-02-21T09:23:31.1482603Z add.s32 %r1529, %r6810, %r3955; 2026-02-21T09:23:31.1482669Z // begin inline asm 2026-02-21T09:23:31.1482724Z 2026-02-21T09:23:31.1482777Z { 2026-02-21T09:23:31.1482853Z .reg .pred complete; 2026-02-21T09:23:31.1482911Z waitLoop: 2026-02-21T09:23:31.1483068Z mbarrier.try_wait.parity.shared.b64 complete, [%r1529], %r12482; 2026-02-21T09:23:31.1483147Z @!complete bra.uni waitLoop; 2026-02-21T09:23:31.1483201Z } 2026-02-21T09:23:31.1483207Z 2026-02-21T09:23:31.1483267Z // end inline asm 2026-02-21T09:23:31.1483482Z .loc 1 58 33 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:58:33 2026-02-21T09:23:31.1483553Z shl.b32 %r3957, %r12483, 12; 2026-02-21T09:23:31.1483690Z add.s32 %r3959, %r1378, %r3957; 2026-02-21T09:23:31.1483900Z .loc 1 76 58 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:76:58 2026-02-21T09:23:31.1483974Z add.s32 %r3960, %r3959, %r12473; 2026-02-21T09:23:31.1484042Z xor.b32 %r332, %r12473, 16; 2026-02-21T09:23:31.1484109Z add.s32 %r3961, %r3959, %r332; 2026-02-21T09:23:31.1484178Z xor.b32 %r333, %r12473, 32; 2026-02-21T09:23:31.1484243Z add.s32 %r3962, %r3959, %r333; 2026-02-21T09:23:31.1484370Z xor.b32 %r334, %r12473, 48; 2026-02-21T09:23:31.1484436Z add.s32 %r3963, %r3959, %r334; 2026-02-21T09:23:31.1484507Z xor.b32 %r335, %r12473, 64; 2026-02-21T09:23:31.1484571Z add.s32 %r3964, %r3959, %r335; 2026-02-21T09:23:31.1484634Z xor.b32 %r336, %r12473, 80; 2026-02-21T09:23:31.1484700Z add.s32 %r3965, %r3959, %r336; 2026-02-21T09:23:31.1484763Z xor.b32 %r337, %r12473, 96; 2026-02-21T09:23:31.1484825Z add.s32 %r3966, %r3959, %r337; 2026-02-21T09:23:31.1484889Z xor.b32 %r338, %r12473, 112; 2026-02-21T09:23:31.1484957Z add.s32 %r3967, %r3959, %r338; 2026-02-21T09:23:31.1485161Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1485235Z ld.shared.s8 %rs65, [%r3960]; 2026-02-21T09:23:31.1485441Z .loc 1 61 28 // 
cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1485508Z shl.b16 %rs66, %rs65, 4; 2026-02-21T09:23:31.1485709Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1485787Z ld.shared.s8 %rs67, [%r3961+128]; 2026-02-21T09:23:31.1485986Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1486050Z shl.b16 %rs68, %rs67, 4; 2026-02-21T09:23:31.1486255Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1486325Z ld.shared.s8 %rs69, [%r3962+256]; 2026-02-21T09:23:31.1486658Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1486726Z shl.b16 %rs70, %rs69, 4; 2026-02-21T09:23:31.1486932Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1487000Z ld.shared.s8 %rs71, [%r3963+384]; 2026-02-21T09:23:31.1487200Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1487270Z shl.b16 %rs72, %rs71, 4; 2026-02-21T09:23:31.1487474Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1487540Z ld.shared.s8 %rs73, [%r3964+512]; 2026-02-21T09:23:31.1487741Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1487804Z shl.b16 %rs74, %rs73, 4; 2026-02-21T09:23:31.1488002Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1488073Z ld.shared.s8 %rs75, [%r3965+640]; 2026-02-21T09:23:31.1488270Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1488511Z shl.b16 %rs76, %rs75, 4; 2026-02-21T09:23:31.1488711Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1488785Z ld.shared.s8 %rs77, [%r3966+768]; 2026-02-21T09:23:31.1488990Z .loc 1 61 28 // 
cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1489053Z shl.b16 %rs78, %rs77, 4; 2026-02-21T09:23:31.1489255Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1489324Z ld.shared.s8 %rs79, [%r3967+896]; 2026-02-21T09:23:31.1489522Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1489654Z shl.b16 %rs80, %rs79, 4; 2026-02-21T09:23:31.1489855Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1489928Z ld.shared.s8 %rs81, [%r3960+1024]; 2026-02-21T09:23:31.1490129Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1490191Z shl.b16 %rs82, %rs81, 4; 2026-02-21T09:23:31.1490462Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1490538Z ld.shared.s8 %rs83, [%r3961+1152]; 2026-02-21T09:23:31.1490742Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1490805Z shl.b16 %rs84, %rs83, 4; 2026-02-21T09:23:31.1491001Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1491075Z ld.shared.s8 %rs85, [%r3962+1280]; 2026-02-21T09:23:31.1491274Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1491336Z shl.b16 %rs86, %rs85, 4; 2026-02-21T09:23:31.1491542Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1491608Z ld.shared.s8 %rs87, [%r3963+1408]; 2026-02-21T09:23:31.1491805Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1491873Z shl.b16 %rs88, %rs87, 4; 2026-02-21T09:23:31.1492070Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1492138Z ld.shared.s8 %rs89, [%r3964+1536]; 2026-02-21T09:23:31.1492335Z .loc 1 61 28 // 
cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1492403Z shl.b16 %rs90, %rs89, 4; 2026-02-21T09:23:31.1492605Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1492673Z ld.shared.s8 %rs91, [%r3965+1664]; 2026-02-21T09:23:31.1492875Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1492941Z shl.b16 %rs92, %rs91, 4; 2026-02-21T09:23:31.1493141Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1493216Z ld.shared.s8 %rs93, [%r3966+1792]; 2026-02-21T09:23:31.1493415Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1493477Z shl.b16 %rs94, %rs93, 4; 2026-02-21T09:23:31.1493681Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1493748Z ld.shared.s8 %rs95, [%r3967+1920]; 2026-02-21T09:23:31.1493947Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1494011Z shl.b16 %rs96, %rs95, 4; 2026-02-21T09:23:31.1494213Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1494394Z ld.shared.s8 %rs97, [%r3960+2048]; 2026-02-21T09:23:31.1494591Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1494660Z shl.b16 %rs98, %rs97, 4; 2026-02-21T09:23:31.1494860Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1494926Z ld.shared.s8 %rs99, [%r3961+2176]; 2026-02-21T09:23:31.1495126Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1495192Z shl.b16 %rs100, %rs99, 4; 2026-02-21T09:23:31.1495389Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1495518Z ld.shared.s8 %rs101, [%r3962+2304]; 2026-02-21T09:23:31.1495720Z .loc 1 61 28 // 
cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1495786Z shl.b16 %rs102, %rs101, 4; 2026-02-21T09:23:31.1495990Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1496063Z ld.shared.s8 %rs103, [%r3963+2432]; 2026-02-21T09:23:31.1496304Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1496370Z shl.b16 %rs104, %rs103, 4; 2026-02-21T09:23:31.1496702Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1496776Z ld.shared.s8 %rs105, [%r3964+2560]; 2026-02-21T09:23:31.1496974Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1497043Z shl.b16 %rs106, %rs105, 4; 2026-02-21T09:23:31.1497242Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1497311Z ld.shared.s8 %rs107, [%r3965+2688]; 2026-02-21T09:23:31.1497512Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1497579Z shl.b16 %rs108, %rs107, 4; 2026-02-21T09:23:31.1497777Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1497852Z ld.shared.s8 %rs109, [%r3966+2816]; 2026-02-21T09:23:31.1498049Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1498113Z shl.b16 %rs110, %rs109, 4; 2026-02-21T09:23:31.1498308Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1498382Z ld.shared.s8 %rs111, [%r3967+2944]; 2026-02-21T09:23:31.1498580Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1498644Z shl.b16 %rs112, %rs111, 4; 2026-02-21T09:23:31.1498854Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1498926Z ld.shared.s8 %rs113, [%r3960+3072]; 2026-02-21T09:23:31.1499126Z .loc 
1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1499194Z shl.b16 %rs114, %rs113, 4; 2026-02-21T09:23:31.1499392Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1499461Z ld.shared.s8 %rs115, [%r3961+3200]; 2026-02-21T09:23:31.1499661Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1499725Z shl.b16 %rs116, %rs115, 4; 2026-02-21T09:23:31.1499921Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1499992Z ld.shared.s8 %rs117, [%r3962+3328]; 2026-02-21T09:23:31.1500196Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1500417Z shl.b16 %rs118, %rs117, 4; 2026-02-21T09:23:31.1500625Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1500701Z ld.shared.s8 %rs119, [%r3963+3456]; 2026-02-21T09:23:31.1500900Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1500961Z shl.b16 %rs120, %rs119, 4; 2026-02-21T09:23:31.1501164Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1501231Z ld.shared.s8 %rs121, [%r3964+3584]; 2026-02-21T09:23:31.1501427Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1501494Z shl.b16 %rs122, %rs121, 4; 2026-02-21T09:23:31.1501755Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1501826Z ld.shared.s8 %rs123, [%r3965+3712]; 2026-02-21T09:23:31.1502027Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1502100Z shl.b16 %rs124, %rs123, 4; 2026-02-21T09:23:31.1502355Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1502427Z ld.shared.s8 %rs125, [%r3966+3840]; 
2026-02-21T09:23:31.1502630Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1502694Z shl.b16 %rs126, %rs125, 4; 2026-02-21T09:23:31.1502888Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1502960Z ld.shared.s8 %rs127, [%r3967+3968]; 2026-02-21T09:23:31.1503158Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1503221Z shl.b16 %rs128, %rs127, 4; 2026-02-21T09:23:31.1503424Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1503494Z cvt.s16.s8 %rs129, %rs66; 2026-02-21T09:23:31.1503555Z shr.s16 %rs130, %rs129, 4; 2026-02-21T09:23:31.1503619Z cvt.s16.s8 %rs131, %rs68; 2026-02-21T09:23:31.1503689Z shr.s16 %rs132, %rs131, 4; 2026-02-21T09:23:31.1503765Z shr.s16 %rs133, %rs65, 4; 2026-02-21T09:23:31.1503830Z shr.s16 %rs134, %rs67, 4; 2026-02-21T09:23:31.1504035Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1504103Z cvt.rn.f32.s16 %r3968, %rs134; 2026-02-21T09:23:31.1504168Z cvt.rn.f32.s16 %r3969, %rs133; 2026-02-21T09:23:31.1504239Z cvt.rn.f32.s16 %r3970, %rs132; 2026-02-21T09:23:31.1504304Z cvt.rn.f32.s16 %r3971, %rs130; 2026-02-21T09:23:31.1504503Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1504566Z cvt.s16.s8 %rs135, %rs70; 2026-02-21T09:23:31.1504635Z shr.s16 %rs136, %rs135, 4; 2026-02-21T09:23:31.1504698Z cvt.s16.s8 %rs137, %rs72; 2026-02-21T09:23:31.1504760Z shr.s16 %rs138, %rs137, 4; 2026-02-21T09:23:31.1504829Z shr.s16 %rs139, %rs69, 4; 2026-02-21T09:23:31.1504889Z shr.s16 %rs140, %rs71, 4; 2026-02-21T09:23:31.1505087Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1505152Z cvt.rn.f32.s16 %r3972, %rs140; 2026-02-21T09:23:31.1505223Z cvt.rn.f32.s16 %r3973, %rs139; 2026-02-21T09:23:31.1505285Z cvt.rn.f32.s16 
%r3974, %rs138; 2026-02-21T09:23:31.1505348Z cvt.rn.f32.s16 %r3975, %rs136; 2026-02-21T09:23:31.1505552Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1505614Z cvt.s16.s8 %rs141, %rs74; 2026-02-21T09:23:31.1505679Z shr.s16 %rs142, %rs141, 4; 2026-02-21T09:23:31.1505746Z cvt.s16.s8 %rs143, %rs76; 2026-02-21T09:23:31.1505809Z shr.s16 %rs144, %rs143, 4; 2026-02-21T09:23:31.1505935Z shr.s16 %rs145, %rs73, 4; 2026-02-21T09:23:31.1506041Z shr.s16 %rs146, %rs75, 4; 2026-02-21T09:23:31.1506241Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1506306Z cvt.rn.f32.s16 %r3976, %rs146; 2026-02-21T09:23:31.1506372Z cvt.rn.f32.s16 %r3977, %rs145; 2026-02-21T09:23:31.1506441Z cvt.rn.f32.s16 %r3978, %rs144; 2026-02-21T09:23:31.1506628Z cvt.rn.f32.s16 %r3979, %rs142; 2026-02-21T09:23:31.1506828Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1506898Z cvt.s16.s8 %rs147, %rs78; 2026-02-21T09:23:31.1506961Z shr.s16 %rs148, %rs147, 4; 2026-02-21T09:23:31.1507024Z cvt.s16.s8 %rs149, %rs80; 2026-02-21T09:23:31.1507086Z shr.s16 %rs150, %rs149, 4; 2026-02-21T09:23:31.1507235Z shr.s16 %rs151, %rs77, 4; 2026-02-21T09:23:31.1507300Z shr.s16 %rs152, %rs79, 4; 2026-02-21T09:23:31.1507494Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1507570Z cvt.rn.f32.s16 %r3980, %rs152; 2026-02-21T09:23:31.1507636Z cvt.rn.f32.s16 %r3981, %rs151; 2026-02-21T09:23:31.1507701Z cvt.rn.f32.s16 %r3982, %rs150; 2026-02-21T09:23:31.1507828Z cvt.rn.f32.s16 %r3983, %rs148; 2026-02-21T09:23:31.1508036Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1508100Z cvt.s16.s8 %rs153, %rs82; 2026-02-21T09:23:31.1508164Z shr.s16 %rs154, %rs153, 4; 2026-02-21T09:23:31.1508308Z cvt.s16.s8 %rs155, %rs84; 2026-02-21T09:23:31.1508375Z shr.s16 %rs156, %rs155, 4; 
2026-02-21T09:23:31.1508437Z shr.s16 %rs157, %rs81, 4; 2026-02-21T09:23:31.1508505Z shr.s16 %rs158, %rs83, 4; 2026-02-21T09:23:31.1508707Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1508773Z cvt.rn.f32.s16 %r3984, %rs158; 2026-02-21T09:23:31.1508836Z cvt.rn.f32.s16 %r3985, %rs157; 2026-02-21T09:23:31.1508908Z cvt.rn.f32.s16 %r3986, %rs156; 2026-02-21T09:23:31.1508970Z cvt.rn.f32.s16 %r3987, %rs154; 2026-02-21T09:23:31.1509176Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1509247Z cvt.s16.s8 %rs159, %rs86; 2026-02-21T09:23:31.1509314Z shr.s16 %rs160, %rs159, 4; 2026-02-21T09:23:31.1509378Z cvt.s16.s8 %rs161, %rs88; 2026-02-21T09:23:31.1509445Z shr.s16 %rs162, %rs161, 4; 2026-02-21T09:23:31.1509513Z shr.s16 %rs163, %rs85, 4; 2026-02-21T09:23:31.1509579Z shr.s16 %rs164, %rs87, 4; 2026-02-21T09:23:31.1509777Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1509849Z cvt.rn.f32.s16 %r3988, %rs164; 2026-02-21T09:23:31.1509914Z cvt.rn.f32.s16 %r3989, %rs163; 2026-02-21T09:23:31.1509980Z cvt.rn.f32.s16 %r3990, %rs162; 2026-02-21T09:23:31.1510052Z cvt.rn.f32.s16 %r3991, %rs160; 2026-02-21T09:23:31.1510249Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1510314Z cvt.s16.s8 %rs165, %rs90; 2026-02-21T09:23:31.1510378Z shr.s16 %rs166, %rs165, 4; 2026-02-21T09:23:31.1510449Z cvt.s16.s8 %rs167, %rs92; 2026-02-21T09:23:31.1510513Z shr.s16 %rs168, %rs167, 4; 2026-02-21T09:23:31.1510579Z shr.s16 %rs169, %rs89, 4; 2026-02-21T09:23:31.1510646Z shr.s16 %rs170, %rs91, 4; 2026-02-21T09:23:31.1510842Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1510908Z cvt.rn.f32.s16 %r3992, %rs170; 2026-02-21T09:23:31.1510976Z cvt.rn.f32.s16 %r3993, %rs169; 2026-02-21T09:23:31.1511040Z cvt.rn.f32.s16 %r3994, %rs168; 2026-02-21T09:23:31.1511106Z 
cvt.rn.f32.s16 %r3995, %rs166; 2026-02-21T09:23:31.1511304Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1511372Z cvt.s16.s8 %rs171, %rs94; 2026-02-21T09:23:31.1511629Z shr.s16 %rs172, %rs171, 4; 2026-02-21T09:23:31.1511693Z cvt.s16.s8 %rs173, %rs96; 2026-02-21T09:23:31.1511764Z shr.s16 %rs174, %rs173, 4; 2026-02-21T09:23:31.1511828Z shr.s16 %rs175, %rs93, 4; 2026-02-21T09:23:31.1511895Z shr.s16 %rs176, %rs95, 4; 2026-02-21T09:23:31.1512103Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1512183Z cvt.rn.f32.s16 %r3996, %rs176; 2026-02-21T09:23:31.1512253Z cvt.rn.f32.s16 %r3997, %rs175; 2026-02-21T09:23:31.1512319Z cvt.rn.f32.s16 %r3998, %rs174; 2026-02-21T09:23:31.1512397Z cvt.rn.f32.s16 %r3999, %rs172; 2026-02-21T09:23:31.1512595Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1512716Z cvt.s16.s8 %rs177, %rs98; 2026-02-21T09:23:31.1512789Z shr.s16 %rs178, %rs177, 4; 2026-02-21T09:23:31.1512852Z cvt.s16.s8 %rs179, %rs100; 2026-02-21T09:23:31.1512916Z shr.s16 %rs180, %rs179, 4; 2026-02-21T09:23:31.1512984Z shr.s16 %rs181, %rs97, 4; 2026-02-21T09:23:31.1513058Z shr.s16 %rs182, %rs99, 4; 2026-02-21T09:23:31.1513256Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1513368Z cvt.rn.f32.s16 %r4000, %rs182; 2026-02-21T09:23:31.1513439Z cvt.rn.f32.s16 %r4001, %rs181; 2026-02-21T09:23:31.1513502Z cvt.rn.f32.s16 %r4002, %rs180; 2026-02-21T09:23:31.1513566Z cvt.rn.f32.s16 %r4003, %rs178; 2026-02-21T09:23:31.1513766Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1513838Z cvt.s16.s8 %rs183, %rs102; 2026-02-21T09:23:31.1513900Z shr.s16 %rs184, %rs183, 4; 2026-02-21T09:23:31.1513962Z cvt.s16.s8 %rs185, %rs104; 2026-02-21T09:23:31.1514038Z shr.s16 %rs186, %rs185, 4; 2026-02-21T09:23:31.1514107Z shr.s16 %rs187, %rs101, 4; 
2026-02-21T09:23:31.1514168Z shr.s16 %rs188, %rs103, 4; 2026-02-21T09:23:31.1514376Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1514446Z cvt.rn.f32.s16 %r4004, %rs188; 2026-02-21T09:23:31.1514510Z cvt.rn.f32.s16 %r4005, %rs187; 2026-02-21T09:23:31.1514574Z cvt.rn.f32.s16 %r4006, %rs186; 2026-02-21T09:23:31.1514646Z cvt.rn.f32.s16 %r4007, %rs184; 2026-02-21T09:23:31.1514845Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1514908Z cvt.s16.s8 %rs189, %rs106; 2026-02-21T09:23:31.1514977Z shr.s16 %rs190, %rs189, 4; 2026-02-21T09:23:31.1515041Z cvt.s16.s8 %rs191, %rs108; 2026-02-21T09:23:31.1515105Z shr.s16 %rs192, %rs191, 4; 2026-02-21T09:23:31.1515171Z shr.s16 %rs193, %rs105, 4; 2026-02-21T09:23:31.1515234Z shr.s16 %rs194, %rs107, 4; 2026-02-21T09:23:31.1515434Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1515500Z cvt.rn.f32.s16 %r4008, %rs194; 2026-02-21T09:23:31.1515575Z cvt.rn.f32.s16 %r4009, %rs193; 2026-02-21T09:23:31.1515640Z cvt.rn.f32.s16 %r4010, %rs192; 2026-02-21T09:23:31.1515705Z cvt.rn.f32.s16 %r4011, %rs190; 2026-02-21T09:23:31.1515911Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1515976Z cvt.s16.s8 %rs195, %rs110; 2026-02-21T09:23:31.1516045Z shr.s16 %rs196, %rs195, 4; 2026-02-21T09:23:31.1516110Z cvt.s16.s8 %rs197, %rs112; 2026-02-21T09:23:31.1516184Z shr.s16 %rs198, %rs197, 4; 2026-02-21T09:23:31.1516250Z shr.s16 %rs199, %rs109, 4; 2026-02-21T09:23:31.1516313Z shr.s16 %rs200, %rs111, 4; 2026-02-21T09:23:31.1516641Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1516714Z cvt.rn.f32.s16 %r4012, %rs200; 2026-02-21T09:23:31.1516785Z cvt.rn.f32.s16 %r4013, %rs199; 2026-02-21T09:23:31.1516858Z cvt.rn.f32.s16 %r4014, %rs198; 2026-02-21T09:23:31.1516923Z cvt.rn.f32.s16 %r4015, %rs196; 
2026-02-21T09:23:31.1517274Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1517340Z cvt.s16.s8 %rs201, %rs114; 2026-02-21T09:23:31.1517416Z shr.s16 %rs202, %rs201, 4; 2026-02-21T09:23:31.1517481Z cvt.s16.s8 %rs203, %rs116; 2026-02-21T09:23:31.1517548Z shr.s16 %rs204, %rs203, 4; 2026-02-21T09:23:31.1517630Z shr.s16 %rs205, %rs113, 4; 2026-02-21T09:23:31.1517700Z shr.s16 %rs206, %rs115, 4; 2026-02-21T09:23:31.1517899Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1517972Z cvt.rn.f32.s16 %r4016, %rs206; 2026-02-21T09:23:31.1518039Z cvt.rn.f32.s16 %r4017, %rs205; 2026-02-21T09:23:31.1518106Z cvt.rn.f32.s16 %r4018, %rs204; 2026-02-21T09:23:31.1518234Z cvt.rn.f32.s16 %r4019, %rs202; 2026-02-21T09:23:31.1518445Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1518508Z cvt.s16.s8 %rs207, %rs118; 2026-02-21T09:23:31.1518577Z shr.s16 %rs208, %rs207, 4; 2026-02-21T09:23:31.1518643Z cvt.s16.s8 %rs209, %rs120; 2026-02-21T09:23:31.1518707Z shr.s16 %rs210, %rs209, 4; 2026-02-21T09:23:31.1518775Z shr.s16 %rs211, %rs117, 4; 2026-02-21T09:23:31.1518897Z shr.s16 %rs212, %rs119, 4; 2026-02-21T09:23:31.1519112Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1519179Z cvt.rn.f32.s16 %r4020, %rs212; 2026-02-21T09:23:31.1519245Z cvt.rn.f32.s16 %r4021, %rs211; 2026-02-21T09:23:31.1519316Z cvt.rn.f32.s16 %r4022, %rs210; 2026-02-21T09:23:31.1519383Z cvt.rn.f32.s16 %r4023, %rs208; 2026-02-21T09:23:31.1519580Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1519654Z cvt.s16.s8 %rs213, %rs122; 2026-02-21T09:23:31.1519719Z shr.s16 %rs214, %rs213, 4; 2026-02-21T09:23:31.1519783Z cvt.s16.s8 %rs215, %rs124; 2026-02-21T09:23:31.1519848Z shr.s16 %rs216, %rs215, 4; 2026-02-21T09:23:31.1519917Z shr.s16 %rs217, %rs121, 4; 2026-02-21T09:23:31.1519983Z 
shr.s16 %rs218, %rs123, 4; 2026-02-21T09:23:31.1520182Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1520269Z cvt.rn.f32.s16 %r4024, %rs218; 2026-02-21T09:23:31.1520339Z cvt.rn.f32.s16 %r4025, %rs217; 2026-02-21T09:23:31.1520405Z cvt.rn.f32.s16 %r4026, %rs216; 2026-02-21T09:23:31.1520471Z cvt.rn.f32.s16 %r4027, %rs214; 2026-02-21T09:23:31.1520677Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1520742Z cvt.s16.s8 %rs219, %rs126; 2026-02-21T09:23:31.1520805Z shr.s16 %rs220, %rs219, 4; 2026-02-21T09:23:31.1520876Z cvt.s16.s8 %rs221, %rs128; 2026-02-21T09:23:31.1520942Z shr.s16 %rs222, %rs221, 4; 2026-02-21T09:23:31.1521005Z shr.s16 %rs223, %rs125, 4; 2026-02-21T09:23:31.1521073Z shr.s16 %rs224, %rs127, 4; 2026-02-21T09:23:31.1521276Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1521342Z cvt.rn.f32.s16 %r4028, %rs224; 2026-02-21T09:23:31.1521409Z cvt.rn.f32.s16 %r4029, %rs223; 2026-02-21T09:23:31.1521481Z cvt.rn.f32.s16 %r4030, %rs222; 2026-02-21T09:23:31.1521543Z cvt.rn.f32.s16 %r4031, %rs220; 2026-02-21T09:23:31.1521661Z st.shared.v4.b32 [%r67], {%r3971, %r3969, %r3970, %r3968}; 2026-02-21T09:23:31.1521794Z st.shared.v4.b32 [%r67+16384], {%r4003, %r4001, %r4002, %r4000}; 2026-02-21T09:23:31.1521903Z st.shared.v4.b32 [%r68], {%r3975, %r3973, %r3974, %r3972}; 2026-02-21T09:23:31.1522020Z st.shared.v4.b32 [%r68+16384], {%r4007, %r4005, %r4006, %r4004}; 2026-02-21T09:23:31.1522130Z st.shared.v4.b32 [%r69], {%r3979, %r3977, %r3978, %r3976}; 2026-02-21T09:23:31.1522249Z st.shared.v4.b32 [%r69+16384], {%r4011, %r4009, %r4010, %r4008}; 2026-02-21T09:23:31.1522354Z st.shared.v4.b32 [%r70], {%r3983, %r3981, %r3982, %r3980}; 2026-02-21T09:23:31.1522585Z st.shared.v4.b32 [%r70+16384], {%r4015, %r4013, %r4014, %r4012}; 2026-02-21T09:23:31.1522696Z st.shared.v4.b32 [%r71], {%r3987, %r3985, %r3986, %r3984}; 
2026-02-21T09:23:31.1522809Z st.shared.v4.b32 [%r71+16384], {%r4019, %r4017, %r4018, %r4016}; 2026-02-21T09:23:31.1522916Z st.shared.v4.b32 [%r72], {%r3991, %r3989, %r3990, %r3988}; 2026-02-21T09:23:31.1523040Z st.shared.v4.b32 [%r72+16384], {%r4023, %r4021, %r4022, %r4020}; 2026-02-21T09:23:31.1523141Z st.shared.v4.b32 [%r73], {%r3995, %r3993, %r3994, %r3992}; 2026-02-21T09:23:31.1523254Z st.shared.v4.b32 [%r73+16384], {%r4027, %r4025, %r4026, %r4024}; 2026-02-21T09:23:31.1523361Z st.shared.v4.b32 [%r74], {%r3999, %r3997, %r3998, %r3996}; 2026-02-21T09:23:31.1523479Z st.shared.v4.b32 [%r74+16384], {%r4031, %r4029, %r4030, %r4028}; 2026-02-21T09:23:31.1523587Z $L__tmp1: 2026-02-21T09:23:31.1523876Z .loc 2 291 36 // standard.py:291:36 @[ cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:88:40 ] 2026-02-21T09:23:31.1523946Z // begin inline asm 2026-02-21T09:23:31.1524033Z fence.proxy.async.shared::cta; 2026-02-21T09:23:31.1524094Z // end inline asm 2026-02-21T09:23:31.1524160Z bar.sync 0; 2026-02-21T09:23:31.1524246Z shfl.sync.idx.b32 %r4032, %r5, 0, 31, -1; 2026-02-21T09:23:31.1524367Z wgmma.fence.sync.aligned; 2026-02-21T09:23:31.1524455Z mov.pred %p47, -1; 2026-02-21T09:23:31.1524518Z // begin inline asm 2026-02-21T09:23:31.1526010Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548}, {%r1659,%r1660,%r1661,%r1662}, %rd3, %p47, 1, 1; 2026-02-21T09:23:31.1526079Z // end inline asm 2026-02-21T09:23:31.1526140Z // begin inline asm 
2026-02-21T09:23:31.1527733Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548}, {%r1791,%r1792,%r1793,%r1794}, %rd4, %p47, 1, 1; 2026-02-21T09:23:31.1527806Z // end inline asm 2026-02-21T09:23:31.1527867Z // begin inline asm 2026-02-21T09:23:31.1529359Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548}, {%r1923,%r1924,%r1925,%r1926}, %rd5, %p47, 1, 1; 2026-02-21T09:23:31.1529429Z // end inline asm 2026-02-21T09:23:31.1529495Z // begin inline asm 2026-02-21T09:23:31.1530980Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548}, {%r2055,%r2056,%r2057,%r2058}, %rd6, %p47, 1, 1; 2026-02-21T09:23:31.1531198Z // end inline asm 2026-02-21T09:23:31.1531263Z // begin inline asm 2026-02-21T09:23:31.1532804Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548}, {%r2187,%r2188,%r2189,%r2190}, %rd7, %p47, 1, 1; 2026-02-21T09:23:31.1532870Z // end inline asm 2026-02-21T09:23:31.1532947Z // begin inline asm 2026-02-21T09:23:31.1534498Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548}, 
{%r2319,%r2320,%r2321,%r2322}, %rd8, %p47, 1, 1; 2026-02-21T09:23:31.1534561Z // end inline asm 2026-02-21T09:23:31.1534628Z // begin inline asm 2026-02-21T09:23:31.1536104Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548}, {%r2451,%r2452,%r2453,%r2454}, %rd9, %p47, 1, 1; 2026-02-21T09:23:31.1536173Z // end inline asm 2026-02-21T09:23:31.1536234Z // begin inline asm 2026-02-21T09:23:31.1537824Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548}, {%r2583,%r2584,%r2585,%r2586}, %rd10, %p47, 1, 1; 2026-02-21T09:23:31.1537912Z // end inline asm 2026-02-21T09:23:31.1537980Z // begin inline asm 2026-02-21T09:23:31.1539467Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588,%r12589,%r12590,%r12591,%r12592,%r12593,%r12594,%r12595,%r12596,%r12597,%r12598,%r12599,%r12600,%r12601,%r12602,%r12603,%r12604,%r12605,%r12606,%r12607,%r12608,%r12609,%r12610,%r12611,%r12612}, {%r2715,%r2716,%r2717,%r2718}, %rd3, %p47, 1, 1; 2026-02-21T09:23:31.1539540Z // end inline asm 2026-02-21T09:23:31.1539602Z // begin inline asm 2026-02-21T09:23:31.1541080Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588,%r12589,%r12590,%r12591,%r12592,%r12593,%r12594,%r12595,%r12596,%r12597,%r12598,%r12599,%r12600,%r12601,%r12602,%r12603,%r12604,%r12605,%r12606,%r12607,%r12608,%r12609,%r12610,%r12611,%r12612}, {%r2847,%r2848,%r2849,%r2850}, %rd4, %p47, 1, 1; 2026-02-21T09:23:31.1541284Z // end inline asm 2026-02-21T09:23:31.1541346Z // begin inline asm 2026-02-21T09:23:31.1542947Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588,%r12589,%r12590,%r12591,%r12592,%r12593,%r12594,%r12595,%r12596,%r12597,%r12598,%r12599,%r12600,%r12601,%r12602,%r12603,%r12604,%r12605,%r12606,%r12607,%r12608,%r12609,%r12610,%r12611,%r12612}, 
{%r2979,%r2980,%r2981,%r2982}, %rd5, %p47, 1, 1; 2026-02-21T09:23:31.1543015Z // end inline asm 2026-02-21T09:23:31.1543077Z // begin inline asm 2026-02-21T09:23:31.1544561Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588,%r12589,%r12590,%r12591,%r12592,%r12593,%r12594,%r12595,%r12596,%r12597,%r12598,%r12599,%r12600,%r12601,%r12602,%r12603,%r12604,%r12605,%r12606,%r12607,%r12608,%r12609,%r12610,%r12611,%r12612}, {%r3111,%r3112,%r3113,%r3114}, %rd6, %p47, 1, 1; 2026-02-21T09:23:31.1544626Z // end inline asm 2026-02-21T09:23:31.1544696Z // begin inline asm 2026-02-21T09:23:31.1546176Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588,%r12589,%r12590,%r12591,%r12592,%r12593,%r12594,%r12595,%r12596,%r12597,%r12598,%r12599,%r12600,%r12601,%r12602,%r12603,%r12604,%r12605,%r12606,%r12607,%r12608,%r12609,%r12610,%r12611,%r12612}, {%r3243,%r3244,%r3245,%r3246}, %rd7, %p47, 1, 1; 2026-02-21T09:23:31.1546250Z // end inline asm 2026-02-21T09:23:31.1546327Z // begin inline asm 2026-02-21T09:23:31.1547917Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588,%r12589,%r12590,%r12591,%r12592,%r12593,%r12594,%r12595,%r12596,%r12597,%r12598,%r12599,%r12600,%r12601,%r12602,%r12603,%r12604,%r12605,%r12606,%r12607,%r12608,%r12609,%r12610,%r12611,%r12612}, {%r3375,%r3376,%r3377,%r3378}, %rd8, %p47, 1, 1; 2026-02-21T09:23:31.1547995Z // end inline asm 2026-02-21T09:23:31.1548056Z // begin inline asm 2026-02-21T09:23:31.1549592Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588,%r12589,%r12590,%r12591,%r12592,%r12593,%r12594,%r12595,%r12596,%r12597,%r12598,%r12599,%r12600,%r12601,%r12602,%r12603,%r12604,%r12605,%r12606,%r12607,%r12608,%r12609,%r12610,%r12611,%r12612}, {%r3507,%r3508,%r3509,%r3510}, %rd9, %p47, 1, 1; 2026-02-21T09:23:31.1549798Z // end inline asm 2026-02-21T09:23:31.1549861Z // begin inline asm 2026-02-21T09:23:31.1551402Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588,%r12589,%r12590,%r12591,%r12592,%r12593,%r12594,%r12595,%r12596,%r12597,%r12598,%r12599,%r12600,%r12601,%r12602,%r12603,%r12604,%r12605,%r12606,%r12607,%r12608,%r12609,%r12610,%r12611,%r12612}, 
{%r3639,%r3640,%r3641,%r3642}, %rd10, %p47, 1, 1; 2026-02-21T09:23:31.1551476Z // end inline asm 2026-02-21T09:23:31.1551563Z wgmma.commit_group.sync.aligned; 2026-02-21T09:23:31.1551635Z mov.b32 %r3772, %r1380; 2026-02-21T09:23:31.1551699Z mov.b32 %r3773, %r1380; 2026-02-21T09:23:31.1551761Z mov.b32 %r3771, %r1310; 2026-02-21T09:23:31.1551821Z // begin inline asm 2026-02-21T09:23:31.1554408Z // wait for regs: %r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548,%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588,%r12589,%r12590,%r12591,%r12592,%r12593,%r12594,%r12595,%r12596,%r12597,%r12598,%r12599,%r12600,%r12601,%r12602,%r12603,%r12604,%r12605,%r12606,%r12607,%r12608,%r12609,%r12610,%r12611,%r12612,%r3771,%r3772,%r3773 2026-02-21T09:23:31.1554497Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:23:31.1554562Z // end inline asm 2026-02-21T09:23:31.1554623Z $L__tmp2: 2026-02-21T09:23:31.1554838Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1554912Z add.s32 %r4033, %r12484, 1; 2026-02-21T09:23:31.1554984Z setp.gt.s32 %p69, %r4033, 1; 2026-02-21T09:23:31.1555058Z selp.b32 %r12484, 0, %r4033, %p69; 2026-02-21T09:23:31.1555272Z .loc 1 52 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:32 
2026-02-21T09:23:31.1555340Z add.s64 %rd280, %rd755, %rd37; 2026-02-21T09:23:31.1555411Z add.s64 %rd281, %rd755, %rd36; 2026-02-21T09:23:31.1555474Z add.s64 %rd282, %rd755, %rd35; 2026-02-21T09:23:31.1555544Z add.s64 %rd283, %rd755, %rd34; 2026-02-21T09:23:31.1555608Z add.s64 %rd284, %rd755, %rd33; 2026-02-21T09:23:31.1555674Z add.s64 %rd285, %rd755, %rd32; 2026-02-21T09:23:31.1555742Z add.s64 %rd286, %rd755, %rd31; 2026-02-21T09:23:31.1555804Z add.s64 %rd287, %rd755, %rd30; 2026-02-21T09:23:31.1555868Z add.s64 %rd288, %rd755, %rd29; 2026-02-21T09:23:31.1555932Z add.s64 %rd289, %rd755, %rd28; 2026-02-21T09:23:31.1555999Z add.s64 %rd290, %rd755, %rd27; 2026-02-21T09:23:31.1556063Z add.s64 %rd291, %rd755, %rd26; 2026-02-21T09:23:31.1556127Z add.s64 %rd292, %rd755, %rd25; 2026-02-21T09:23:31.1556196Z add.s64 %rd293, %rd755, %rd24; 2026-02-21T09:23:31.1556260Z add.s64 %rd294, %rd755, %rd23; 2026-02-21T09:23:31.1556586Z .loc 1 52 80 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:80 2026-02-21T09:23:31.1556753Z add.s64 %rd295, %rd755, %rd22; 2026-02-21T09:23:31.1556889Z shl.b32 %r4034, %r12484, 14; 2026-02-21T09:23:31.1556956Z add.s32 %r4035, %r1290, %r4034; 2026-02-21T09:23:31.1557022Z add.s32 %r3905, %r4035, %r26; 2026-02-21T09:23:31.1557094Z selp.b32 %r3906, 8, 0, %p67; 2026-02-21T09:23:31.1557160Z // begin inline asm 2026-02-21T09:23:31.1557330Z cp.async.ca.shared.global [ %r3905 + 0 ], [ %rd280 + 0 ], 0x8, %r3906; 2026-02-21T09:23:31.1557398Z // end inline asm 2026-02-21T09:23:31.1557466Z add.s32 %r3907, %r3905, 1024; 2026-02-21T09:23:31.1557528Z // begin inline asm 2026-02-21T09:23:31.1557670Z cp.async.ca.shared.global [ %r3907 + 0 ], [ %rd281 + 0 ], 0x8, %r3906; 2026-02-21T09:23:31.1557737Z // end inline asm 2026-02-21T09:23:31.1557800Z add.s32 %r3909, %r3905, 2048; 2026-02-21T09:23:31.1557927Z // begin inline asm 2026-02-21T09:23:31.1558071Z cp.async.ca.shared.global [ %r3909 + 0 ], [ %rd282 + 0 ], 0x8, %r3906; 2026-02-21T09:23:31.1558130Z // end 
inline asm 2026-02-21T09:23:31.1558193Z add.s32 %r3911, %r3905, 3072; 2026-02-21T09:23:31.1558257Z // begin inline asm 2026-02-21T09:23:31.1558399Z cp.async.ca.shared.global [ %r3911 + 0 ], [ %rd283 + 0 ], 0x8, %r3906; 2026-02-21T09:23:31.1558458Z // end inline asm 2026-02-21T09:23:31.1558520Z add.s32 %r3913, %r3905, 4096; 2026-02-21T09:23:31.1558652Z // begin inline asm 2026-02-21T09:23:31.1558800Z cp.async.ca.shared.global [ %r3913 + 0 ], [ %rd284 + 0 ], 0x8, %r3906; 2026-02-21T09:23:31.1558860Z // end inline asm 2026-02-21T09:23:31.1558926Z add.s32 %r3915, %r3905, 5120; 2026-02-21T09:23:31.1558986Z // begin inline asm 2026-02-21T09:23:31.1559121Z cp.async.ca.shared.global [ %r3915 + 0 ], [ %rd285 + 0 ], 0x8, %r3906; 2026-02-21T09:23:31.1559180Z // end inline asm 2026-02-21T09:23:31.1559248Z add.s32 %r3917, %r3905, 6144; 2026-02-21T09:23:31.1559311Z // begin inline asm 2026-02-21T09:23:31.1559446Z cp.async.ca.shared.global [ %r3917 + 0 ], [ %rd286 + 0 ], 0x8, %r3906; 2026-02-21T09:23:31.1559511Z // end inline asm 2026-02-21T09:23:31.1559575Z add.s32 %r3919, %r3905, 7168; 2026-02-21T09:23:31.1559634Z // begin inline asm 2026-02-21T09:23:31.1559769Z cp.async.ca.shared.global [ %r3919 + 0 ], [ %rd287 + 0 ], 0x8, %r3906; 2026-02-21T09:23:31.1559834Z // end inline asm 2026-02-21T09:23:31.1559899Z add.s32 %r3921, %r3905, 8192; 2026-02-21T09:23:31.1559958Z // begin inline asm 2026-02-21T09:23:31.1560097Z cp.async.ca.shared.global [ %r3921 + 0 ], [ %rd288 + 0 ], 0x8, %r3906; 2026-02-21T09:23:31.1560155Z // end inline asm 2026-02-21T09:23:31.1560217Z add.s32 %r3923, %r3905, 9216; 2026-02-21T09:23:31.1560284Z // begin inline asm 2026-02-21T09:23:31.1560419Z cp.async.ca.shared.global [ %r3923 + 0 ], [ %rd289 + 0 ], 0x8, %r3906; 2026-02-21T09:23:31.1560477Z // end inline asm 2026-02-21T09:23:31.1560543Z add.s32 %r3925, %r3905, 10240; 2026-02-21T09:23:31.1560610Z // begin inline asm 2026-02-21T09:23:31.1560742Z cp.async.ca.shared.global [ %r3925 + 0 ], [ %rd290 + 0 ], 0x8, 
%r3906; 2026-02-21T09:23:31.1560802Z // end inline asm 2026-02-21T09:23:31.1560873Z add.s32 %r3927, %r3905, 11264; 2026-02-21T09:23:31.1560934Z // begin inline asm 2026-02-21T09:23:31.1561067Z cp.async.ca.shared.global [ %r3927 + 0 ], [ %rd291 + 0 ], 0x8, %r3906; 2026-02-21T09:23:31.1561127Z // end inline asm 2026-02-21T09:23:31.1561197Z add.s32 %r3929, %r3905, 12288; 2026-02-21T09:23:31.1561258Z // begin inline asm 2026-02-21T09:23:31.1561392Z cp.async.ca.shared.global [ %r3929 + 0 ], [ %rd292 + 0 ], 0x8, %r3906; 2026-02-21T09:23:31.1561455Z // end inline asm 2026-02-21T09:23:31.1561517Z add.s32 %r3931, %r3905, 13312; 2026-02-21T09:23:31.1561578Z // begin inline asm 2026-02-21T09:23:31.1561709Z cp.async.ca.shared.global [ %r3931 + 0 ], [ %rd293 + 0 ], 0x8, %r3906; 2026-02-21T09:23:31.1561776Z // end inline asm 2026-02-21T09:23:31.1561841Z add.s32 %r3933, %r3905, 14336; 2026-02-21T09:23:31.1561903Z // begin inline asm 2026-02-21T09:23:31.1562040Z cp.async.ca.shared.global [ %r3933 + 0 ], [ %rd294 + 0 ], 0x8, %r3906; 2026-02-21T09:23:31.1562174Z // end inline asm 2026-02-21T09:23:31.1562287Z add.s32 %r3935, %r3905, 15360; 2026-02-21T09:23:31.1562352Z // begin inline asm 2026-02-21T09:23:31.1562483Z cp.async.ca.shared.global [ %r3935 + 0 ], [ %rd295 + 0 ], 0x8, %r3906; 2026-02-21T09:23:31.1562542Z // end inline asm 2026-02-21T09:23:31.1562612Z cp.async.commit_group; 2026-02-21T09:23:31.1562839Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1562907Z shl.b32 %r4036, %r12484, 3; 2026-02-21T09:23:31.1562973Z add.s32 %r3937, %r6810, %r4036; 2026-02-21T09:23:31.1563048Z and.pred %p63, %p160, %p67; 2026-02-21T09:23:31.1563109Z // begin inline asm 2026-02-21T09:23:31.1563248Z @%p63 mbarrier.arrive.expect_tx.shared.b64 _, [%r3937], 4096; 2026-02-21T09:23:31.1563386Z // end inline asm 2026-02-21T09:23:31.1563606Z .loc 1 58 33 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:58:33 2026-02-21T09:23:31.1563671Z shl.b32 
%r4037, %r12484, 12; 2026-02-21T09:23:31.1563741Z add.s32 %r3938, %r1378, %r4037; 2026-02-21T09:23:31.1563811Z bar.sync 0; 2026-02-21T09:23:31.1563882Z elect.sync %r4038|%p70, -1; 2026-02-21T09:23:31.1563953Z and.pred %p71, %p67, %p70; 2026-02-21T09:23:31.1564028Z and.pred %p64, %p1, %p71; 2026-02-21T09:23:31.1564143Z cvt.u32.u64 %r4039, %rd756; 2026-02-21T09:23:31.1564207Z add.s32 %r3940, %r4039, 64; 2026-02-21T09:23:31.1564268Z // begin inline asm 2026-02-21T09:23:31.1564603Z @%p64 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r3938], [%rd463, {%r1379, %r3940}], [%r3937]; 2026-02-21T09:23:31.1564663Z // end inline asm 2026-02-21T09:23:31.1564869Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1564944Z add.s64 %rd755, %rd755, 128; 2026-02-21T09:23:31.1565014Z setp.lt.u64 %p72, %rd756, 480; 2026-02-21T09:23:31.1565076Z add.s64 %rd756, %rd756, 32; 2026-02-21T09:23:31.1565142Z @%p72 bra $L__BB0_3; 2026-02-21T09:23:31.1565258Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:23:31.1565327Z cp.async.wait_group 0; 2026-02-21T09:23:31.1565385Z bar.sync 0; 2026-02-21T09:23:31.1565450Z // begin inline asm 2026-02-21T09:23:31.1565551Z @%p160 mbarrier.inval.shared::cta.b64 [%r6810]; 2026-02-21T09:23:31.1565611Z // end inline asm 2026-02-21T09:23:31.1565672Z bar.sync 0; 2026-02-21T09:23:31.1565732Z // begin inline asm 2026-02-21T09:23:31.1565826Z @%p160 mbarrier.inval.shared::cta.b64 [%r6811]; 2026-02-21T09:23:31.1565884Z // end inline asm 2026-02-21T09:23:31.1566095Z .loc 1 91 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:91:28 2026-02-21T09:23:31.1566184Z cvt.rn.bf16x2.f32 %r4126, %r12486, %r12485; 2026-02-21T09:23:31.1566269Z cvt.rn.bf16x2.f32 %r4127, %r12488, %r12487; 2026-02-21T09:23:31.1566353Z cvt.rn.bf16x2.f32 %r4128, %r12490, %r12489; 2026-02-21T09:23:31.1566433Z cvt.rn.bf16x2.f32 %r4129, %r12492, %r12491; 2026-02-21T09:23:31.1566641Z cvt.rn.bf16x2.f32 %r4130, 
%r12494, %r12493; 2026-02-21T09:23:31.1566731Z cvt.rn.bf16x2.f32 %r4131, %r12496, %r12495; 2026-02-21T09:23:31.1566807Z cvt.rn.bf16x2.f32 %r4132, %r12498, %r12497; 2026-02-21T09:23:31.1566884Z cvt.rn.bf16x2.f32 %r4133, %r12500, %r12499; 2026-02-21T09:23:31.1566962Z cvt.rn.bf16x2.f32 %r4134, %r12502, %r12501; 2026-02-21T09:23:31.1567045Z cvt.rn.bf16x2.f32 %r4135, %r12504, %r12503; 2026-02-21T09:23:31.1567120Z cvt.rn.bf16x2.f32 %r4136, %r12506, %r12505; 2026-02-21T09:23:31.1567196Z cvt.rn.bf16x2.f32 %r4137, %r12508, %r12507; 2026-02-21T09:23:31.1567278Z cvt.rn.bf16x2.f32 %r4138, %r12510, %r12509; 2026-02-21T09:23:31.1567355Z cvt.rn.bf16x2.f32 %r4139, %r12512, %r12511; 2026-02-21T09:23:31.1567431Z cvt.rn.bf16x2.f32 %r4140, %r12514, %r12513; 2026-02-21T09:23:31.1567513Z cvt.rn.bf16x2.f32 %r4141, %r12516, %r12515; 2026-02-21T09:23:31.1567589Z cvt.rn.bf16x2.f32 %r4142, %r12518, %r12517; 2026-02-21T09:23:31.1567666Z cvt.rn.bf16x2.f32 %r4143, %r12520, %r12519; 2026-02-21T09:23:31.1567894Z cvt.rn.bf16x2.f32 %r4144, %r12522, %r12521; 2026-02-21T09:23:31.1567975Z cvt.rn.bf16x2.f32 %r4145, %r12524, %r12523; 2026-02-21T09:23:31.1568051Z cvt.rn.bf16x2.f32 %r4146, %r12526, %r12525; 2026-02-21T09:23:31.1568590Z cvt.rn.bf16x2.f32 %r4147, %r12528, %r12527; 2026-02-21T09:23:31.1568674Z cvt.rn.bf16x2.f32 %r4148, %r12530, %r12529; 2026-02-21T09:23:31.1568749Z cvt.rn.bf16x2.f32 %r4149, %r12532, %r12531; 2026-02-21T09:23:31.1568833Z cvt.rn.bf16x2.f32 %r4150, %r12534, %r12533; 2026-02-21T09:23:31.1568914Z cvt.rn.bf16x2.f32 %r4151, %r12536, %r12535; 2026-02-21T09:23:31.1568992Z cvt.rn.bf16x2.f32 %r4152, %r12538, %r12537; 2026-02-21T09:23:31.1569069Z cvt.rn.bf16x2.f32 %r4153, %r12540, %r12539; 2026-02-21T09:23:31.1569228Z cvt.rn.bf16x2.f32 %r4154, %r12542, %r12541; 2026-02-21T09:23:31.1569308Z cvt.rn.bf16x2.f32 %r4155, %r12544, %r12543; 2026-02-21T09:23:31.1569386Z cvt.rn.bf16x2.f32 %r4156, %r12546, %r12545; 2026-02-21T09:23:31.1569466Z cvt.rn.bf16x2.f32 %r4157, %r12548, %r12547; 
2026-02-21T09:23:31.1569548Z cvt.rn.bf16x2.f32 %r4158, %r12550, %r12549; 2026-02-21T09:23:31.1569624Z cvt.rn.bf16x2.f32 %r4159, %r12552, %r12551; 2026-02-21T09:23:31.1569700Z cvt.rn.bf16x2.f32 %r4160, %r12554, %r12553; 2026-02-21T09:23:31.1569842Z cvt.rn.bf16x2.f32 %r4161, %r12556, %r12555; 2026-02-21T09:23:31.1569923Z cvt.rn.bf16x2.f32 %r4162, %r12558, %r12557; 2026-02-21T09:23:31.1569998Z cvt.rn.bf16x2.f32 %r4163, %r12560, %r12559; 2026-02-21T09:23:31.1570073Z cvt.rn.bf16x2.f32 %r4164, %r12562, %r12561; 2026-02-21T09:23:31.1570158Z cvt.rn.bf16x2.f32 %r4165, %r12564, %r12563; 2026-02-21T09:23:31.1570233Z cvt.rn.bf16x2.f32 %r4166, %r12566, %r12565; 2026-02-21T09:23:31.1570315Z cvt.rn.bf16x2.f32 %r4167, %r12568, %r12567; 2026-02-21T09:23:31.1570402Z cvt.rn.bf16x2.f32 %r4168, %r12570, %r12569; 2026-02-21T09:23:31.1570480Z cvt.rn.bf16x2.f32 %r4169, %r12572, %r12571; 2026-02-21T09:23:31.1570556Z cvt.rn.bf16x2.f32 %r4170, %r12574, %r12573; 2026-02-21T09:23:31.1570642Z cvt.rn.bf16x2.f32 %r4171, %r12576, %r12575; 2026-02-21T09:23:31.1570720Z cvt.rn.bf16x2.f32 %r4172, %r12578, %r12577; 2026-02-21T09:23:31.1570796Z cvt.rn.bf16x2.f32 %r4173, %r12580, %r12579; 2026-02-21T09:23:31.1570876Z cvt.rn.bf16x2.f32 %r4174, %r12582, %r12581; 2026-02-21T09:23:31.1570960Z cvt.rn.bf16x2.f32 %r4175, %r12584, %r12583; 2026-02-21T09:23:31.1571039Z cvt.rn.bf16x2.f32 %r4176, %r12586, %r12585; 2026-02-21T09:23:31.1571116Z cvt.rn.bf16x2.f32 %r4177, %r12588, %r12587; 2026-02-21T09:23:31.1571203Z cvt.rn.bf16x2.f32 %r4178, %r12590, %r12589; 2026-02-21T09:23:31.1571280Z cvt.rn.bf16x2.f32 %r4179, %r12592, %r12591; 2026-02-21T09:23:31.1571357Z cvt.rn.bf16x2.f32 %r4180, %r12594, %r12593; 2026-02-21T09:23:31.1571442Z cvt.rn.bf16x2.f32 %r4181, %r12596, %r12595; 2026-02-21T09:23:31.1571522Z cvt.rn.bf16x2.f32 %r4182, %r12598, %r12597; 2026-02-21T09:23:31.1571600Z cvt.rn.bf16x2.f32 %r4183, %r12600, %r12599; 2026-02-21T09:23:31.1571679Z cvt.rn.bf16x2.f32 %r4184, %r12602, %r12601; 2026-02-21T09:23:31.1571770Z 
cvt.rn.bf16x2.f32 %r4185, %r12604, %r12603; 2026-02-21T09:23:31.1571847Z cvt.rn.bf16x2.f32 %r4186, %r12606, %r12605; 2026-02-21T09:23:31.1571923Z cvt.rn.bf16x2.f32 %r4187, %r12608, %r12607; 2026-02-21T09:23:31.1572009Z cvt.rn.bf16x2.f32 %r4188, %r12610, %r12609; 2026-02-21T09:23:31.1572086Z cvt.rn.bf16x2.f32 %r4189, %r12612, %r12611; 2026-02-21T09:23:31.1572301Z .loc 1 92 43 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:92:43 2026-02-21T09:23:31.1572496Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r75], {%r4126, %r4127, %r4128, %r4129}; 2026-02-21T09:23:31.1572678Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r76], {%r4142, %r4143, %r4144, %r4145}; 2026-02-21T09:23:31.1572861Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r77], {%r4158, %r4159, %r4160, %r4161}; 2026-02-21T09:23:31.1573050Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r78], {%r4174, %r4175, %r4176, %r4177}; 2026-02-21T09:23:31.1573312Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r79], {%r4130, %r4131, %r4132, %r4133}; 2026-02-21T09:23:31.1573534Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r80], {%r4146, %r4147, %r4148, %r4149}; 2026-02-21T09:23:31.1573714Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r81], {%r4162, %r4163, %r4164, %r4165}; 2026-02-21T09:23:31.1573897Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r82], {%r4178, %r4179, %r4180, %r4181}; 2026-02-21T09:23:31.1574071Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r83], {%r4134, %r4135, %r4136, %r4137}; 2026-02-21T09:23:31.1574246Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r84], {%r4150, %r4151, %r4152, %r4153}; 2026-02-21T09:23:31.1574426Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r85], {%r4166, %r4167, %r4168, %r4169}; 2026-02-21T09:23:31.1574649Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r86], {%r4182, %r4183, %r4184, %r4185}; 2026-02-21T09:23:31.1574826Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r87], {%r4138, %r4139, %r4140, %r4141}; 2026-02-21T09:23:31.1575007Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r88], {%r4154, %r4155, 
%r4156, %r4157}; 2026-02-21T09:23:31.1575184Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r89], {%r4170, %r4171, %r4172, %r4173}; 2026-02-21T09:23:31.1575410Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r90], {%r4186, %r4187, %r4188, %r4189}; 2026-02-21T09:23:31.1575485Z // begin inline asm 2026-02-21T09:23:31.1575569Z fence.proxy.async.shared::cta; 2026-02-21T09:23:31.1575630Z // end inline asm 2026-02-21T09:23:31.1575688Z bar.sync 0; 2026-02-21T09:23:31.1575768Z elect.sync %r4190|%p84, -1; 2026-02-21T09:23:31.1575853Z shfl.sync.idx.b32 %r4191, %r5, 0, 31, -1; 2026-02-21T09:23:31.1575923Z and.pred %p75, %p201, %p84; 2026-02-21T09:23:31.1575997Z and.b32 %r4192, %r4191, 1; 2026-02-21T09:23:31.1576065Z shl.b32 %r4193, %r4192, 14; 2026-02-21T09:23:31.1576134Z add.s32 %r6814, %r1290, %r4193; 2026-02-21T09:23:31.1576201Z shl.b32 %r469, %r4192, 6; 2026-02-21T09:23:31.1576268Z or.b32 %r4042, %r469, %r1379; 2026-02-21T09:23:31.1576332Z // begin inline asm 2026-02-21T09:23:31.1576697Z @%p75 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd446, {%r4042, %r4043}], [%r6814]; 2026-02-21T09:23:31.1576768Z // end inline asm 2026-02-21T09:23:31.1576848Z cp.async.bulk.commit_group; 2026-02-21T09:23:31.1576929Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:23:31.1576993Z bar.sync 0; 2026-02-21T09:23:31.1577207Z .loc 1 31 88 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:31:88 2026-02-21T09:23:31.1577279Z add.s32 %r4194, %r12481, 1; 2026-02-21T09:23:31.1577496Z .loc 1 35 31 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:35:31 2026-02-21T09:23:31.1577561Z shr.s32 %r4195, %r4194, 31; 2026-02-21T09:23:31.1577624Z shr.u32 %r4196, %r4195, 25; 2026-02-21T09:23:31.1577694Z add.s32 %r4197, %r4194, %r4196; 2026-02-21T09:23:31.1577902Z .loc 1 34 30 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:34:30 2026-02-21T09:23:31.1577971Z and.b32 %r4081, %r4197, -128; 2026-02-21T09:23:31.1578039Z sub.s32 %r4198, %r4194, %r4081; 2026-02-21T09:23:31.1578249Z 
.loc 1 36 27 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:36:27 2026-02-21T09:23:31.1578315Z shl.b32 %r6813, %r4198, 7; 2026-02-21T09:23:31.1578516Z .loc 1 37 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:37:32 2026-02-21T09:23:31.1578588Z or.b32 %r4199, %r6813, %r7; 2026-02-21T09:23:31.1578653Z or.b32 %r4200, %r6813, %r8; 2026-02-21T09:23:31.1578714Z or.b32 %r4201, %r6813, %r9; 2026-02-21T09:23:31.1578782Z or.b32 %r4202, %r6813, %r10; 2026-02-21T09:23:31.1587454Z or.b32 %r4203, %r6813, %r11; 2026-02-21T09:23:31.1587564Z or.b32 %r4204, %r6813, %r12; 2026-02-21T09:23:31.1587638Z or.b32 %r4205, %r6813, %r13; 2026-02-21T09:23:31.1587704Z or.b32 %r4206, %r6813, %r14; 2026-02-21T09:23:31.1587773Z or.b32 %r4207, %r6813, %r15; 2026-02-21T09:23:31.1587836Z or.b32 %r4208, %r6813, %r16; 2026-02-21T09:23:31.1588140Z or.b32 %r4209, %r6813, %r17; 2026-02-21T09:23:31.1588211Z or.b32 %r4210, %r6813, %r18; 2026-02-21T09:23:31.1588358Z or.b32 %r4211, %r6813, %r19; 2026-02-21T09:23:31.1588424Z or.b32 %r4212, %r6813, %r20; 2026-02-21T09:23:31.1588490Z or.b32 %r4213, %r6813, %r21; 2026-02-21T09:23:31.1588564Z or.b32 %r4214, %r6813, %r22; 2026-02-21T09:23:31.1588805Z .loc 1 52 53 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:53 2026-02-21T09:23:31.1588878Z shl.b32 %r4215, %r4199, 10; 2026-02-21T09:23:31.1588951Z shl.b32 %r4216, %r4200, 10; 2026-02-21T09:23:31.1589014Z shl.b32 %r4217, %r4201, 10; 2026-02-21T09:23:31.1589078Z shl.b32 %r4218, %r4202, 10; 2026-02-21T09:23:31.1589153Z shl.b32 %r4219, %r4203, 10; 2026-02-21T09:23:31.1589300Z shl.b32 %r4220, %r4204, 10; 2026-02-21T09:23:31.1589368Z shl.b32 %r4221, %r4205, 10; 2026-02-21T09:23:31.1589431Z shl.b32 %r4222, %r4206, 10; 2026-02-21T09:23:31.1589500Z shl.b32 %r4223, %r4207, 10; 2026-02-21T09:23:31.1589567Z shl.b32 %r4224, %r4208, 10; 2026-02-21T09:23:31.1589629Z shl.b32 %r4225, %r4209, 10; 2026-02-21T09:23:31.1589701Z shl.b32 %r4226, %r4210, 10; 2026-02-21T09:23:31.1589761Z shl.b32 
%r4227, %r4211, 10; 2026-02-21T09:23:31.1589822Z shl.b32 %r4228, %r4212, 10; 2026-02-21T09:23:31.1589948Z shl.b32 %r4229, %r4213, 10; 2026-02-21T09:23:31.1590020Z shl.b32 %r4230, %r4214, 10; 2026-02-21T09:23:31.1590245Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1590311Z // begin inline asm 2026-02-21T09:23:31.1590435Z @%p160 mbarrier.init.shared::cta.b64 [%r6810], 1; 2026-02-21T09:23:31.1590498Z // end inline asm 2026-02-21T09:23:31.1590557Z bar.sync 0; 2026-02-21T09:23:31.1590625Z // begin inline asm 2026-02-21T09:23:31.1590727Z @%p160 mbarrier.init.shared::cta.b64 [%r6811], 1; 2026-02-21T09:23:31.1590788Z // end inline asm 2026-02-21T09:23:31.1591008Z .loc 1 52 60 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:60 2026-02-21T09:23:31.1591085Z or.b32 %r4231, %r4215, %r24; 2026-02-21T09:23:31.1591150Z or.b32 %r4232, %r4216, %r24; 2026-02-21T09:23:31.1591213Z or.b32 %r4233, %r4217, %r24; 2026-02-21T09:23:31.1591281Z or.b32 %r4234, %r4218, %r24; 2026-02-21T09:23:31.1591343Z or.b32 %r4235, %r4219, %r24; 2026-02-21T09:23:31.1591405Z or.b32 %r4236, %r4220, %r24; 2026-02-21T09:23:31.1591472Z or.b32 %r4237, %r4221, %r24; 2026-02-21T09:23:31.1591532Z or.b32 %r4238, %r4222, %r24; 2026-02-21T09:23:31.1591593Z or.b32 %r4239, %r4223, %r24; 2026-02-21T09:23:31.1591653Z or.b32 %r4240, %r4224, %r24; 2026-02-21T09:23:31.1591719Z or.b32 %r4241, %r4225, %r24; 2026-02-21T09:23:31.1591781Z or.b32 %r4242, %r4226, %r24; 2026-02-21T09:23:31.1591843Z or.b32 %r4243, %r4227, %r24; 2026-02-21T09:23:31.1591911Z or.b32 %r4244, %r4228, %r24; 2026-02-21T09:23:31.1591972Z or.b32 %r4245, %r4229, %r24; 2026-02-21T09:23:31.1592032Z or.b32 %r4246, %r4230, %r24; 2026-02-21T09:23:31.1592249Z .loc 1 52 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:32 2026-02-21T09:23:31.1592339Z mad.wide.s32 %rd298, %r4231, 2, %rd102; 2026-02-21T09:23:31.1592413Z mad.wide.s32 %rd299, %r4232, 2, %rd102; 2026-02-21T09:23:31.1592485Z 
mad.wide.s32 %rd300, %r4233, 2, %rd102; 2026-02-21T09:23:31.1592559Z mad.wide.s32 %rd301, %r4234, 2, %rd102; 2026-02-21T09:23:31.1592630Z mad.wide.s32 %rd302, %r4235, 2, %rd102; 2026-02-21T09:23:31.1592701Z mad.wide.s32 %rd303, %r4236, 2, %rd102; 2026-02-21T09:23:31.1592777Z mad.wide.s32 %rd304, %r4237, 2, %rd102; 2026-02-21T09:23:31.1592847Z mad.wide.s32 %rd305, %r4238, 2, %rd102; 2026-02-21T09:23:31.1592916Z mad.wide.s32 %rd306, %r4239, 2, %rd102; 2026-02-21T09:23:31.1592985Z mad.wide.s32 %rd307, %r4240, 2, %rd102; 2026-02-21T09:23:31.1593061Z mad.wide.s32 %rd308, %r4241, 2, %rd102; 2026-02-21T09:23:31.1593129Z mad.wide.s32 %rd309, %r4242, 2, %rd102; 2026-02-21T09:23:31.1593200Z mad.wide.s32 %rd310, %r4243, 2, %rd102; 2026-02-21T09:23:31.1593415Z mad.wide.s32 %rd311, %r4244, 2, %rd102; 2026-02-21T09:23:31.1593485Z mad.wide.s32 %rd312, %r4245, 2, %rd102; 2026-02-21T09:23:31.1593555Z mad.wide.s32 %rd313, %r4246, 2, %rd102; 2026-02-21T09:23:31.1593614Z mov.b32 %r4048, 8; 2026-02-21T09:23:31.1593824Z .loc 1 52 80 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:80 2026-02-21T09:23:31.1593888Z // begin inline asm 2026-02-21T09:23:31.1594035Z cp.async.ca.shared.global [ %r27 + 0 ], [ %rd298 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1594108Z // end inline asm 2026-02-21T09:23:31.1594170Z // begin inline asm 2026-02-21T09:23:31.1594322Z cp.async.ca.shared.global [ %r28 + 0 ], [ %rd299 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1594391Z // end inline asm 2026-02-21T09:23:31.1594513Z // begin inline asm 2026-02-21T09:23:31.1594654Z cp.async.ca.shared.global [ %r29 + 0 ], [ %rd300 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1594716Z // end inline asm 2026-02-21T09:23:31.1594788Z // begin inline asm 2026-02-21T09:23:31.1594925Z cp.async.ca.shared.global [ %r30 + 0 ], [ %rd301 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1594984Z // end inline asm 2026-02-21T09:23:31.1595052Z // begin inline asm 2026-02-21T09:23:31.1595229Z cp.async.ca.shared.global [ %r31 + 0 ], [ %rd302 + 0 
], 0x8, %r4048; 2026-02-21T09:23:31.1595294Z // end inline asm 2026-02-21T09:23:31.1595355Z // begin inline asm 2026-02-21T09:23:31.1595491Z cp.async.ca.shared.global [ %r32 + 0 ], [ %rd303 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1595552Z // end inline asm 2026-02-21T09:23:31.1595612Z // begin inline asm 2026-02-21T09:23:31.1595747Z cp.async.ca.shared.global [ %r33 + 0 ], [ %rd304 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1595806Z // end inline asm 2026-02-21T09:23:31.1595866Z // begin inline asm 2026-02-21T09:23:31.1596002Z cp.async.ca.shared.global [ %r34 + 0 ], [ %rd305 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1596061Z // end inline asm 2026-02-21T09:23:31.1596122Z // begin inline asm 2026-02-21T09:23:31.1596256Z cp.async.ca.shared.global [ %r35 + 0 ], [ %rd306 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1596321Z // end inline asm 2026-02-21T09:23:31.1596381Z // begin inline asm 2026-02-21T09:23:31.1596636Z cp.async.ca.shared.global [ %r36 + 0 ], [ %rd307 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1596707Z // end inline asm 2026-02-21T09:23:31.1596768Z // begin inline asm 2026-02-21T09:23:31.1596895Z cp.async.ca.shared.global [ %r37 + 0 ], [ %rd308 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1596954Z // end inline asm 2026-02-21T09:23:31.1597020Z // begin inline asm 2026-02-21T09:23:31.1597147Z cp.async.ca.shared.global [ %r38 + 0 ], [ %rd309 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1597206Z // end inline asm 2026-02-21T09:23:31.1597273Z // begin inline asm 2026-02-21T09:23:31.1597403Z cp.async.ca.shared.global [ %r39 + 0 ], [ %rd310 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1597463Z // end inline asm 2026-02-21T09:23:31.1597535Z // begin inline asm 2026-02-21T09:23:31.1597662Z cp.async.ca.shared.global [ %r40 + 0 ], [ %rd311 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1597727Z // end inline asm 2026-02-21T09:23:31.1597789Z // begin inline asm 2026-02-21T09:23:31.1597923Z cp.async.ca.shared.global [ %r41 + 0 ], [ %rd312 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1597985Z // end inline 
asm 2026-02-21T09:23:31.1598047Z // begin inline asm 2026-02-21T09:23:31.1598196Z cp.async.ca.shared.global [ %r42 + 0 ], [ %rd313 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1598257Z // end inline asm 2026-02-21T09:23:31.1598328Z cp.async.commit_group; 2026-02-21T09:23:31.1598549Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1598618Z bar.sync 0; 2026-02-21T09:23:31.1598680Z // begin inline asm 2026-02-21T09:23:31.1598824Z @%p160 mbarrier.arrive.expect_tx.shared.b64 _, [%r6810], 4096; 2026-02-21T09:23:31.1598892Z // end inline asm 2026-02-21T09:23:31.1599108Z .loc 1 58 33 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:58:33 2026-02-21T09:23:31.1599326Z bar.sync 0; 2026-02-21T09:23:31.1599410Z elect.sync %r4247|%p85, -1; 2026-02-21T09:23:31.1599484Z and.pred %p79, %p1, %p85; 2026-02-21T09:23:31.1599547Z mov.b32 %r4082, 0; 2026-02-21T09:23:31.1599612Z // begin inline asm 2026-02-21T09:23:31.1599961Z @%p79 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1378], [%rd463, {%r4081, %r4082}], [%r6810]; 2026-02-21T09:23:31.1600025Z // end inline asm 2026-02-21T09:23:31.1600241Z .loc 1 52 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:32 2026-02-21T09:23:31.1600316Z cvt.s64.s32 %rd333, %r4215; 2026-02-21T09:23:31.1600386Z or.b64 %rd334, %rd333, %rd754; 2026-02-21T09:23:31.1600453Z shl.b64 %rd335, %rd334, 1; 2026-02-21T09:23:31.1600593Z add.s64 %rd336, %rd102, %rd335; 2026-02-21T09:23:31.1600662Z add.s64 %rd315, %rd336, 128; 2026-02-21T09:23:31.1600732Z cvt.s64.s32 %rd337, %r4216; 2026-02-21T09:23:31.1600798Z or.b64 %rd338, %rd337, %rd754; 2026-02-21T09:23:31.1600873Z shl.b64 %rd339, %rd338, 1; 2026-02-21T09:23:31.1600939Z add.s64 %rd340, %rd102, %rd339; 2026-02-21T09:23:31.1601006Z add.s64 %rd316, %rd340, 128; 2026-02-21T09:23:31.1601076Z cvt.s64.s32 %rd341, %r4217; 2026-02-21T09:23:31.1601202Z or.b64 %rd342, %rd341, %rd754; 2026-02-21T09:23:31.1601269Z shl.b64 
%rd343, %rd342, 1; 2026-02-21T09:23:31.1601334Z add.s64 %rd344, %rd102, %rd343; 2026-02-21T09:23:31.1601419Z add.s64 %rd317, %rd344, 128; 2026-02-21T09:23:31.1601484Z cvt.s64.s32 %rd345, %r4218; 2026-02-21T09:23:31.1601547Z or.b64 %rd346, %rd345, %rd754; 2026-02-21T09:23:31.1601614Z shl.b64 %rd347, %rd346, 1; 2026-02-21T09:23:31.1601680Z add.s64 %rd348, %rd102, %rd347; 2026-02-21T09:23:31.1601743Z add.s64 %rd318, %rd348, 128; 2026-02-21T09:23:31.1601815Z cvt.s64.s32 %rd349, %r4219; 2026-02-21T09:23:31.1601879Z or.b64 %rd350, %rd349, %rd754; 2026-02-21T09:23:31.1601943Z shl.b64 %rd351, %rd350, 1; 2026-02-21T09:23:31.1602007Z add.s64 %rd352, %rd102, %rd351; 2026-02-21T09:23:31.1602079Z add.s64 %rd319, %rd352, 128; 2026-02-21T09:23:31.1602142Z cvt.s64.s32 %rd353, %r4220; 2026-02-21T09:23:31.1602206Z or.b64 %rd354, %rd353, %rd754; 2026-02-21T09:23:31.1602274Z shl.b64 %rd355, %rd354, 1; 2026-02-21T09:23:31.1602341Z add.s64 %rd356, %rd102, %rd355; 2026-02-21T09:23:31.1602405Z add.s64 %rd320, %rd356, 128; 2026-02-21T09:23:31.1602467Z cvt.s64.s32 %rd357, %r4221; 2026-02-21T09:23:31.1602540Z or.b64 %rd358, %rd357, %rd754; 2026-02-21T09:23:31.1602603Z shl.b64 %rd359, %rd358, 1; 2026-02-21T09:23:31.1602670Z add.s64 %rd360, %rd102, %rd359; 2026-02-21T09:23:31.1602738Z add.s64 %rd321, %rd360, 128; 2026-02-21T09:23:31.1602802Z cvt.s64.s32 %rd361, %r4222; 2026-02-21T09:23:31.1602865Z or.b64 %rd362, %rd361, %rd754; 2026-02-21T09:23:31.1602931Z shl.b64 %rd363, %rd362, 1; 2026-02-21T09:23:31.1603002Z add.s64 %rd364, %rd102, %rd363; 2026-02-21T09:23:31.1603065Z add.s64 %rd322, %rd364, 128; 2026-02-21T09:23:31.1603129Z cvt.s64.s32 %rd365, %r4223; 2026-02-21T09:23:31.1603208Z or.b64 %rd366, %rd365, %rd754; 2026-02-21T09:23:31.1603275Z shl.b64 %rd367, %rd366, 1; 2026-02-21T09:23:31.1603342Z add.s64 %rd368, %rd102, %rd367; 2026-02-21T09:23:31.1603410Z add.s64 %rd323, %rd368, 128; 2026-02-21T09:23:31.1603483Z cvt.s64.s32 %rd369, %r4224; 2026-02-21T09:23:31.1603550Z or.b64 %rd370, %rd369, 
%rd754; 2026-02-21T09:23:31.1603615Z shl.b64 %rd371, %rd370, 1; 2026-02-21T09:23:31.1603688Z add.s64 %rd372, %rd102, %rd371; 2026-02-21T09:23:31.1603753Z add.s64 %rd324, %rd372, 128; 2026-02-21T09:23:31.1603816Z cvt.s64.s32 %rd373, %r4225; 2026-02-21T09:23:31.1603886Z or.b64 %rd374, %rd373, %rd754; 2026-02-21T09:23:31.1603951Z shl.b64 %rd375, %rd374, 1; 2026-02-21T09:23:31.1604015Z add.s64 %rd376, %rd102, %rd375; 2026-02-21T09:23:31.1604080Z add.s64 %rd325, %rd376, 128; 2026-02-21T09:23:31.1604152Z cvt.s64.s32 %rd377, %r4226; 2026-02-21T09:23:31.1604217Z or.b64 %rd378, %rd377, %rd754; 2026-02-21T09:23:31.1604281Z shl.b64 %rd379, %rd378, 1; 2026-02-21T09:23:31.1604457Z add.s64 %rd380, %rd102, %rd379; 2026-02-21T09:23:31.1604525Z add.s64 %rd326, %rd380, 128; 2026-02-21T09:23:31.1604590Z cvt.s64.s32 %rd381, %r4227; 2026-02-21T09:23:31.1604654Z or.b64 %rd382, %rd381, %rd754; 2026-02-21T09:23:31.1604726Z shl.b64 %rd383, %rd382, 1; 2026-02-21T09:23:31.1604792Z add.s64 %rd384, %rd102, %rd383; 2026-02-21T09:23:31.1604855Z add.s64 %rd327, %rd384, 128; 2026-02-21T09:23:31.1604927Z cvt.s64.s32 %rd385, %r4228; 2026-02-21T09:23:31.1604990Z or.b64 %rd386, %rd385, %rd754; 2026-02-21T09:23:31.1605053Z shl.b64 %rd387, %rd386, 1; 2026-02-21T09:23:31.1605119Z add.s64 %rd388, %rd102, %rd387; 2026-02-21T09:23:31.1605188Z add.s64 %rd328, %rd388, 128; 2026-02-21T09:23:31.1605251Z cvt.s64.s32 %rd389, %r4229; 2026-02-21T09:23:31.1605363Z or.b64 %rd390, %rd389, %rd754; 2026-02-21T09:23:31.1605435Z shl.b64 %rd391, %rd390, 1; 2026-02-21T09:23:31.1605502Z add.s64 %rd392, %rd102, %rd391; 2026-02-21T09:23:31.1605566Z add.s64 %rd329, %rd392, 128; 2026-02-21T09:23:31.1605636Z cvt.s64.s32 %rd393, %r4230; 2026-02-21T09:23:31.1605712Z or.b64 %rd394, %rd393, %rd754; 2026-02-21T09:23:31.1605776Z shl.b64 %rd395, %rd394, 1; 2026-02-21T09:23:31.1605842Z add.s64 %rd396, %rd102, %rd395; 2026-02-21T09:23:31.1605958Z add.s64 %rd330, %rd396, 128; 2026-02-21T09:23:31.1606175Z .loc 1 52 80 // 
cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:80 2026-02-21T09:23:31.1606252Z // begin inline asm 2026-02-21T09:23:31.1606412Z cp.async.ca.shared.global [ %r43 + 0 ], [ %rd315 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1606582Z // end inline asm 2026-02-21T09:23:31.1606648Z // begin inline asm 2026-02-21T09:23:31.1606785Z cp.async.ca.shared.global [ %r44 + 0 ], [ %rd316 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1606857Z // end inline asm 2026-02-21T09:23:31.1606919Z // begin inline asm 2026-02-21T09:23:31.1607050Z cp.async.ca.shared.global [ %r45 + 0 ], [ %rd317 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1607119Z // end inline asm 2026-02-21T09:23:31.1607184Z // begin inline asm 2026-02-21T09:23:31.1607313Z cp.async.ca.shared.global [ %r46 + 0 ], [ %rd318 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1607376Z // end inline asm 2026-02-21T09:23:31.1607440Z // begin inline asm 2026-02-21T09:23:31.1607575Z cp.async.ca.shared.global [ %r47 + 0 ], [ %rd319 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1607634Z // end inline asm 2026-02-21T09:23:31.1607694Z // begin inline asm 2026-02-21T09:23:31.1607827Z cp.async.ca.shared.global [ %r48 + 0 ], [ %rd320 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1607885Z // end inline asm 2026-02-21T09:23:31.1607945Z // begin inline asm 2026-02-21T09:23:31.1608077Z cp.async.ca.shared.global [ %r49 + 0 ], [ %rd321 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1608135Z // end inline asm 2026-02-21T09:23:31.1608197Z // begin inline asm 2026-02-21T09:23:31.1608325Z cp.async.ca.shared.global [ %r50 + 0 ], [ %rd322 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1608390Z // end inline asm 2026-02-21T09:23:31.1608455Z // begin inline asm 2026-02-21T09:23:31.1608584Z cp.async.ca.shared.global [ %r51 + 0 ], [ %rd323 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1608652Z // end inline asm 2026-02-21T09:23:31.1608712Z // begin inline asm 2026-02-21T09:23:31.1608844Z cp.async.ca.shared.global [ %r52 + 0 ], [ %rd324 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1608909Z // end inline 
asm 2026-02-21T09:23:31.1608975Z // begin inline asm 2026-02-21T09:23:31.1609101Z cp.async.ca.shared.global [ %r53 + 0 ], [ %rd325 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1609158Z // end inline asm 2026-02-21T09:23:31.1609226Z // begin inline asm 2026-02-21T09:23:31.1609354Z cp.async.ca.shared.global [ %r54 + 0 ], [ %rd326 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1609412Z // end inline asm 2026-02-21T09:23:31.1609477Z // begin inline asm 2026-02-21T09:23:31.1609604Z cp.async.ca.shared.global [ %r55 + 0 ], [ %rd327 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1609662Z // end inline asm 2026-02-21T09:23:31.1609722Z // begin inline asm 2026-02-21T09:23:31.1610031Z cp.async.ca.shared.global [ %r56 + 0 ], [ %rd328 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1610092Z // end inline asm 2026-02-21T09:23:31.1610155Z // begin inline asm 2026-02-21T09:23:31.1610290Z cp.async.ca.shared.global [ %r57 + 0 ], [ %rd329 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1610352Z // end inline asm 2026-02-21T09:23:31.1610414Z // begin inline asm 2026-02-21T09:23:31.1610540Z cp.async.ca.shared.global [ %r58 + 0 ], [ %rd330 + 0 ], 0x8, %r4048; 2026-02-21T09:23:31.1610599Z // end inline asm 2026-02-21T09:23:31.1610669Z cp.async.commit_group; 2026-02-21T09:23:31.1610876Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1610940Z bar.sync 0; 2026-02-21T09:23:31.1611001Z // begin inline asm 2026-02-21T09:23:31.1611201Z @%p160 mbarrier.arrive.expect_tx.shared.b64 _, [%r6811], 4096; 2026-02-21T09:23:31.1611270Z // end inline asm 2026-02-21T09:23:31.1611475Z .loc 1 58 33 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:58:33 2026-02-21T09:23:31.1611541Z bar.sync 0; 2026-02-21T09:23:31.1611612Z elect.sync %r4248|%p86, -1; 2026-02-21T09:23:31.1611686Z and.pred %p81, %p1, %p86; 2026-02-21T09:23:31.1611746Z mov.b32 %r4119, 32; 2026-02-21T09:23:31.1611865Z // begin inline asm 2026-02-21T09:23:31.1612203Z @%p81 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1415], [%rd463, {%r4081, %r4119}], [%r6811]; 2026-02-21T09:23:31.1612263Z // end inline asm 2026-02-21T09:23:31.1612472Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1612542Z shl.b32 %r4249, %r4194, 7; 2026-02-21T09:23:31.1612605Z or.b32 %r4250, %r22, %r4249; 2026-02-21T09:23:31.1612670Z shl.b32 %r4251, %r4197, 7; 2026-02-21T09:23:31.1612739Z and.b32 %r4252, %r4251, -16384; 2026-02-21T09:23:31.1612807Z sub.s32 %r4253, %r4250, %r4252; 2026-02-21T09:23:31.1612869Z shl.b32 %r4254, %r4253, 10; 2026-02-21T09:23:31.1612941Z mul.wide.s32 %rd397, %r4254, 2; 2026-02-21T09:23:31.1613019Z or.b64 %rd42, %rd397, 256; 2026-02-21T09:23:31.1613084Z or.b32 %r4255, %r21, %r4249; 2026-02-21T09:23:31.1613147Z sub.s32 %r4256, %r4255, %r4252; 2026-02-21T09:23:31.1613210Z shl.b32 %r4257, %r4256, 10; 2026-02-21T09:23:31.1613281Z mul.wide.s32 %rd398, %r4257, 2; 2026-02-21T09:23:31.1613344Z or.b64 %rd43, %rd398, 256; 2026-02-21T09:23:31.1613406Z or.b32 %r4258, %r20, %r4249; 2026-02-21T09:23:31.1613471Z sub.s32 %r4259, %r4258, %r4252; 2026-02-21T09:23:31.1613531Z shl.b32 %r4260, %r4259, 10; 2026-02-21T09:23:31.1613594Z mul.wide.s32 %rd399, %r4260, 2; 2026-02-21T09:23:31.1613659Z or.b64 %rd44, %rd399, 256; 2026-02-21T09:23:31.1613720Z or.b32 %r4261, %r19, %r4249; 2026-02-21T09:23:31.1613784Z sub.s32 %r4262, %r4261, %r4252; 2026-02-21T09:23:31.1613845Z shl.b32 %r4263, %r4262, 10; 2026-02-21T09:23:31.1613915Z mul.wide.s32 %rd400, %r4263, 2; 2026-02-21T09:23:31.1613976Z or.b64 %rd45, %rd400, 256; 2026-02-21T09:23:31.1614040Z or.b32 %r4264, %r18, %r4249; 2026-02-21T09:23:31.1614107Z sub.s32 %r4265, %r4264, %r4252; 2026-02-21T09:23:31.1614168Z shl.b32 %r4266, %r4265, 10; 2026-02-21T09:23:31.1614233Z mul.wide.s32 %rd401, %r4266, 2; 2026-02-21T09:23:31.1614296Z or.b64 %rd46, %rd401, 256; 2026-02-21T09:23:31.1614363Z or.b32 %r4267, %r17, %r4249; 
2026-02-21T09:23:31.1614424Z sub.s32 %r4268, %r4267, %r4252; 2026-02-21T09:23:31.1614485Z shl.b32 %r4269, %r4268, 10; 2026-02-21T09:23:31.1614564Z mul.wide.s32 %rd402, %r4269, 2; 2026-02-21T09:23:31.1614627Z or.b64 %rd47, %rd402, 256; 2026-02-21T09:23:31.1614690Z or.b32 %r4270, %r16, %r4249; 2026-02-21T09:23:31.1614753Z sub.s32 %r4271, %r4270, %r4252; 2026-02-21T09:23:31.1614820Z shl.b32 %r4272, %r4271, 10; 2026-02-21T09:23:31.1614888Z mul.wide.s32 %rd403, %r4272, 2; 2026-02-21T09:23:31.1614952Z or.b64 %rd48, %rd403, 256; 2026-02-21T09:23:31.1615020Z or.b32 %r4273, %r15, %r4249; 2026-02-21T09:23:31.1615082Z sub.s32 %r4274, %r4273, %r4252; 2026-02-21T09:23:31.1615277Z shl.b32 %r4275, %r4274, 10; 2026-02-21T09:23:31.1615349Z mul.wide.s32 %rd404, %r4275, 2; 2026-02-21T09:23:31.1615410Z or.b64 %rd49, %rd404, 256; 2026-02-21T09:23:31.1615472Z or.b32 %r4276, %r14, %r4249; 2026-02-21T09:23:31.1615540Z sub.s32 %r4277, %r4276, %r4252; 2026-02-21T09:23:31.1615600Z shl.b32 %r4278, %r4277, 10; 2026-02-21T09:23:31.1615665Z mul.wide.s32 %rd405, %r4278, 2; 2026-02-21T09:23:31.1615733Z or.b64 %rd50, %rd405, 256; 2026-02-21T09:23:31.1615794Z or.b32 %r4279, %r13, %r4249; 2026-02-21T09:23:31.1615857Z sub.s32 %r4280, %r4279, %r4252; 2026-02-21T09:23:31.1615919Z shl.b32 %r4281, %r4280, 10; 2026-02-21T09:23:31.1615989Z mul.wide.s32 %rd406, %r4281, 2; 2026-02-21T09:23:31.1616051Z or.b64 %rd51, %rd406, 256; 2026-02-21T09:23:31.1616166Z or.b32 %r4282, %r12, %r4249; 2026-02-21T09:23:31.1616240Z sub.s32 %r4283, %r4282, %r4252; 2026-02-21T09:23:31.1616302Z shl.b32 %r4284, %r4283, 10; 2026-02-21T09:23:31.1616368Z mul.wide.s32 %rd407, %r4284, 2; 2026-02-21T09:23:31.1616437Z or.b64 %rd52, %rd407, 256; 2026-02-21T09:23:31.1616619Z or.b32 %r4285, %r11, %r4249; 2026-02-21T09:23:31.1616684Z sub.s32 %r4286, %r4285, %r4252; 2026-02-21T09:23:31.1616745Z shl.b32 %r4287, %r4286, 10; 2026-02-21T09:23:31.1616900Z mul.wide.s32 %rd408, %r4287, 2; 2026-02-21T09:23:31.1616968Z or.b64 %rd53, %rd408, 256; 
2026-02-21T09:23:31.1617028Z or.b32 %r4288, %r10, %r4249; 2026-02-21T09:23:31.1617094Z sub.s32 %r4289, %r4288, %r4252; 2026-02-21T09:23:31.1617166Z shl.b32 %r4290, %r4289, 10; 2026-02-21T09:23:31.1617234Z mul.wide.s32 %rd409, %r4290, 2; 2026-02-21T09:23:31.1617295Z or.b64 %rd54, %rd409, 256; 2026-02-21T09:23:31.1617361Z or.b32 %r4291, %r9, %r4249; 2026-02-21T09:23:31.1617423Z sub.s32 %r4292, %r4291, %r4252; 2026-02-21T09:23:31.1617488Z shl.b32 %r4293, %r4292, 10; 2026-02-21T09:23:31.1617560Z mul.wide.s32 %rd410, %r4293, 2; 2026-02-21T09:23:31.1617620Z or.b64 %rd55, %rd410, 256; 2026-02-21T09:23:31.1617678Z or.b32 %r4294, %r8, %r4249; 2026-02-21T09:23:31.1617745Z sub.s32 %r4295, %r4294, %r4252; 2026-02-21T09:23:31.1617811Z shl.b32 %r4296, %r4295, 10; 2026-02-21T09:23:31.1617874Z mul.wide.s32 %rd411, %r4296, 2; 2026-02-21T09:23:31.1617937Z or.b64 %rd56, %rd411, 256; 2026-02-21T09:23:31.1618002Z or.b32 %r4297, %r7, %r4249; 2026-02-21T09:23:31.1618063Z sub.s32 %r4298, %r4297, %r4252; 2026-02-21T09:23:31.1618124Z shl.b32 %r4299, %r4298, 10; 2026-02-21T09:23:31.1618192Z mul.wide.s32 %rd412, %r4299, 2; 2026-02-21T09:23:31.1618253Z or.b64 %rd57, %rd412, 256; 2026-02-21T09:23:31.1618317Z mov.b32 %r12616, 0f00000000; 2026-02-21T09:23:31.1618376Z mov.b32 %r12615, 1; 2026-02-21T09:23:31.1618443Z mov.b32 %r12614, -1; 2026-02-21T09:23:31.1618502Z mov.b64 %rd758, 0; 2026-02-21T09:23:31.1618564Z mov.b64 %rd757, %rd11; 2026-02-21T09:23:31.1618633Z mov.b32 %r12613, %r4082; 2026-02-21T09:23:31.1618696Z mov.b32 %r12617, %r12616; 2026-02-21T09:23:31.1618757Z mov.b32 %r12618, %r12616; 2026-02-21T09:23:31.1618820Z mov.b32 %r12619, %r12616; 2026-02-21T09:23:31.1618885Z mov.b32 %r12620, %r12616; 2026-02-21T09:23:31.1618944Z mov.b32 %r12621, %r12616; 2026-02-21T09:23:31.1619005Z mov.b32 %r12622, %r12616; 2026-02-21T09:23:31.1619068Z mov.b32 %r12623, %r12616; 2026-02-21T09:23:31.1619129Z mov.b32 %r12624, %r12616; 2026-02-21T09:23:31.1619190Z mov.b32 %r12625, %r12616; 
2026-02-21T09:23:31.1619249Z mov.b32 %r12626, %r12616; 2026-02-21T09:23:31.1619312Z mov.b32 %r12627, %r12616; 2026-02-21T09:23:31.1619371Z mov.b32 %r12628, %r12616; 2026-02-21T09:23:31.1619431Z mov.b32 %r12629, %r12616; 2026-02-21T09:23:31.1619493Z mov.b32 %r12630, %r12616; 2026-02-21T09:23:31.1619552Z mov.b32 %r12631, %r12616; 2026-02-21T09:23:31.1619610Z mov.b32 %r12632, %r12616; 2026-02-21T09:23:31.1619670Z mov.b32 %r12633, %r12616; 2026-02-21T09:23:31.1619734Z mov.b32 %r12634, %r12616; 2026-02-21T09:23:31.1619795Z mov.b32 %r12635, %r12616; 2026-02-21T09:23:31.1619856Z mov.b32 %r12636, %r12616; 2026-02-21T09:23:31.1620069Z mov.b32 %r12637, %r12616; 2026-02-21T09:23:31.1620129Z mov.b32 %r12638, %r12616; 2026-02-21T09:23:31.1620190Z mov.b32 %r12639, %r12616; 2026-02-21T09:23:31.1620252Z mov.b32 %r12640, %r12616; 2026-02-21T09:23:31.1620315Z mov.b32 %r12641, %r12616; 2026-02-21T09:23:31.1620376Z mov.b32 %r12642, %r12616; 2026-02-21T09:23:31.1620435Z mov.b32 %r12643, %r12616; 2026-02-21T09:23:31.1620500Z mov.b32 %r12644, %r12616; 2026-02-21T09:23:31.1620560Z mov.b32 %r12645, %r12616; 2026-02-21T09:23:31.1620619Z mov.b32 %r12646, %r12616; 2026-02-21T09:23:31.1620678Z mov.b32 %r12647, %r12616; 2026-02-21T09:23:31.1620743Z mov.b32 %r12648, %r12616; 2026-02-21T09:23:31.1620805Z mov.b32 %r12649, %r12616; 2026-02-21T09:23:31.1620865Z mov.b32 %r12650, %r12616; 2026-02-21T09:23:31.1621013Z mov.b32 %r12651, %r12616; 2026-02-21T09:23:31.1621078Z mov.b32 %r12652, %r12616; 2026-02-21T09:23:31.1621139Z mov.b32 %r12653, %r12616; 2026-02-21T09:23:31.1621205Z mov.b32 %r12654, %r12616; 2026-02-21T09:23:31.1621273Z mov.b32 %r12655, %r12616; 2026-02-21T09:23:31.1621333Z mov.b32 %r12656, %r12616; 2026-02-21T09:23:31.1621393Z mov.b32 %r12657, %r12616; 2026-02-21T09:23:31.1621456Z mov.b32 %r12658, %r12616; 2026-02-21T09:23:31.1621513Z mov.b32 %r12659, %r12616; 2026-02-21T09:23:31.1621617Z mov.b32 %r12660, %r12616; 2026-02-21T09:23:31.1621683Z mov.b32 %r12661, %r12616; 
2026-02-21T09:23:31.1621741Z mov.b32 %r12662, %r12616; 2026-02-21T09:23:31.1621800Z mov.b32 %r12663, %r12616; 2026-02-21T09:23:31.1621858Z mov.b32 %r12664, %r12616; 2026-02-21T09:23:31.1621923Z mov.b32 %r12665, %r12616; 2026-02-21T09:23:31.1621981Z mov.b32 %r12666, %r12616; 2026-02-21T09:23:31.1622040Z mov.b32 %r12667, %r12616; 2026-02-21T09:23:31.1622102Z mov.b32 %r12668, %r12616; 2026-02-21T09:23:31.1622164Z mov.b32 %r12669, %r12616; 2026-02-21T09:23:31.1622221Z mov.b32 %r12670, %r12616; 2026-02-21T09:23:31.1622284Z mov.b32 %r12671, %r12616; 2026-02-21T09:23:31.1622349Z mov.b32 %r12672, %r12616; 2026-02-21T09:23:31.1622412Z mov.b32 %r12673, %r12616; 2026-02-21T09:23:31.1622470Z mov.b32 %r12674, %r12616; 2026-02-21T09:23:31.1622535Z mov.b32 %r12675, %r12616; 2026-02-21T09:23:31.1622605Z mov.b32 %r12676, %r12616; 2026-02-21T09:23:31.1622668Z mov.b32 %r12677, %r12616; 2026-02-21T09:23:31.1622729Z mov.b32 %r12678, %r12616; 2026-02-21T09:23:31.1622794Z mov.b32 %r12679, %r12616; 2026-02-21T09:23:31.1622852Z mov.b32 %r12680, %r12616; 2026-02-21T09:23:31.1622910Z mov.b32 %r12681, %r12616; 2026-02-21T09:23:31.1622978Z mov.b32 %r12682, %r12616; 2026-02-21T09:23:31.1623038Z mov.b32 %r12683, %r12616; 2026-02-21T09:23:31.1623097Z mov.b32 %r12684, %r12616; 2026-02-21T09:23:31.1623157Z mov.b32 %r12685, %r12616; 2026-02-21T09:23:31.1623220Z mov.b32 %r12686, %r12616; 2026-02-21T09:23:31.1623282Z mov.b32 %r12687, %r12616; 2026-02-21T09:23:31.1623340Z mov.b32 %r12688, %r12616; 2026-02-21T09:23:31.1623408Z mov.b32 %r12689, %r12616; 2026-02-21T09:23:31.1623466Z mov.b32 %r12690, %r12616; 2026-02-21T09:23:31.1623529Z mov.b32 %r12691, %r12616; 2026-02-21T09:23:31.1623588Z mov.b32 %r12692, %r12616; 2026-02-21T09:23:31.1623652Z mov.b32 %r12693, %r12616; 2026-02-21T09:23:31.1623711Z mov.b32 %r12694, %r12616; 2026-02-21T09:23:31.1623768Z mov.b32 %r12695, %r12616; 2026-02-21T09:23:31.1623832Z mov.b32 %r12696, %r12616; 2026-02-21T09:23:31.1623890Z mov.b32 %r12697, %r12616; 
2026-02-21T09:23:31.1623949Z mov.b32 %r12698, %r12616; 2026-02-21T09:23:31.1624011Z mov.b32 %r12699, %r12616; 2026-02-21T09:23:31.1624070Z mov.b32 %r12700, %r12616; 2026-02-21T09:23:31.1624130Z mov.b32 %r12701, %r12616; 2026-02-21T09:23:31.1624188Z mov.b32 %r12702, %r12616; 2026-02-21T09:23:31.1624251Z mov.b32 %r12703, %r12616; 2026-02-21T09:23:31.1624310Z mov.b32 %r12704, %r12616; 2026-02-21T09:23:31.1624370Z mov.b32 %r12705, %r12616; 2026-02-21T09:23:31.1624436Z mov.b32 %r12706, %r12616; 2026-02-21T09:23:31.1624493Z mov.b32 %r12707, %r12616; 2026-02-21T09:23:31.1624556Z mov.b32 %r12708, %r12616; 2026-02-21T09:23:31.1624736Z mov.b32 %r12709, %r12616; 2026-02-21T09:23:31.1624799Z mov.b32 %r12710, %r12616; 2026-02-21T09:23:31.1624857Z mov.b32 %r12711, %r12616; 2026-02-21T09:23:31.1624917Z mov.b32 %r12712, %r12616; 2026-02-21T09:23:31.1624986Z mov.b32 %r12713, %r12616; 2026-02-21T09:23:31.1625046Z mov.b32 %r12714, %r12616; 2026-02-21T09:23:31.1625107Z mov.b32 %r12715, %r12616; 2026-02-21T09:23:31.1625166Z mov.b32 %r12716, %r12616; 2026-02-21T09:23:31.1625230Z mov.b32 %r12717, %r12616; 2026-02-21T09:23:31.1625288Z mov.b32 %r12718, %r12616; 2026-02-21T09:23:31.1625347Z mov.b32 %r12719, %r12616; 2026-02-21T09:23:31.1625410Z mov.b32 %r12720, %r12616; 2026-02-21T09:23:31.1625469Z mov.b32 %r12721, %r12616; 2026-02-21T09:23:31.1625531Z mov.b32 %r12722, %r12616; 2026-02-21T09:23:31.1625641Z mov.b32 %r12723, %r12616; 2026-02-21T09:23:31.1625708Z mov.b32 %r12724, %r12616; 2026-02-21T09:23:31.1625766Z mov.b32 %r12725, %r12616; 2026-02-21T09:23:31.1625829Z mov.b32 %r12726, %r12616; 2026-02-21T09:23:31.1625898Z mov.b32 %r12727, %r12616; 2026-02-21T09:23:31.1625968Z mov.b32 %r12728, %r12616; 2026-02-21T09:23:31.1626029Z mov.b32 %r12729, %r12616; 2026-02-21T09:23:31.1626092Z mov.b32 %r12730, %r12616; 2026-02-21T09:23:31.1626157Z mov.b32 %r12731, %r12616; 2026-02-21T09:23:31.1626269Z mov.b32 %r12732, %r12616; 2026-02-21T09:23:31.1626330Z mov.b32 %r12733, %r12616; 
2026-02-21T09:23:31.1626394Z mov.b32 %r12734, %r12616; 2026-02-21T09:23:31.1626563Z mov.b32 %r12735, %r12616; 2026-02-21T09:23:31.1626628Z mov.b32 %r12736, %r12616; 2026-02-21T09:23:31.1626687Z mov.b32 %r12737, %r12616; 2026-02-21T09:23:31.1626750Z mov.b32 %r12738, %r12616; 2026-02-21T09:23:31.1626810Z mov.b32 %r12739, %r12616; 2026-02-21T09:23:31.1626873Z mov.b32 %r12740, %r12616; 2026-02-21T09:23:31.1626939Z mov.b32 %r12741, %r12616; 2026-02-21T09:23:31.1626998Z mov.b32 %r12742, %r12616; 2026-02-21T09:23:31.1627057Z mov.b32 %r12743, %r12616; 2026-02-21T09:23:31.1627185Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T09:23:31.1627303Z // => This Inner Loop Header: Depth=2 2026-02-21T09:23:31.1627376Z setp.lt.u64 %p107, %rd758, 448; 2026-02-21T09:23:31.1627439Z add.s32 %r6713, %r12614, 1; 2026-02-21T09:23:31.1627512Z setp.gt.s32 %p108, %r6713, 1; 2026-02-21T09:23:31.1627586Z selp.b32 %r12614, 0, %r6713, %p108; 2026-02-21T09:23:31.1627653Z selp.b32 %r6714, 1, 0, %p108; 2026-02-21T09:23:31.1627722Z xor.b32 %r12613, %r12613, %r6714; 2026-02-21T09:23:31.1627939Z .loc 1 52 80 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:80 2026-02-21T09:23:31.1628011Z cp.async.wait_group 1; 2026-02-21T09:23:31.1628069Z bar.sync 0; 2026-02-21T09:23:31.1628146Z shl.b32 %r6715, %r12614, 14; 2026-02-21T09:23:31.1628262Z add.s32 %r6717, %r1290, %r6715; 2026-02-21T09:23:31.1628484Z .loc 1 56 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:56:32 2026-02-21T09:23:31.1628562Z add.s32 %r6718, %r6717, %r91; 2026-02-21T09:23:31.1628631Z ld.shared.b16 %rs225, [%r6718]; 2026-02-21T09:23:31.1628706Z ld.shared.b16 %rs226, [%r6718+1024]; 2026-02-21T09:23:31.1628779Z ld.shared.b16 %rs227, [%r6718+64]; 2026-02-21T09:23:31.1628850Z ld.shared.b16 %rs228, [%r6718+1088]; 2026-02-21T09:23:31.1628917Z ld.shared.b16 %rs229, [%r6718+8192]; 2026-02-21T09:23:31.1628982Z ld.shared.b16 %rs230, [%r6718+9216]; 2026-02-21T09:23:31.1629052Z ld.shared.b16 %rs231, [%r6718+8256]; 
2026-02-21T09:23:31.1629119Z ld.shared.b16 %rs232, [%r6718+9280]; 2026-02-21T09:23:31.1629181Z add.s32 %r6719, %r6717, %r92; 2026-02-21T09:23:31.1629252Z ld.shared.b16 %rs233, [%r6719]; 2026-02-21T09:23:31.1629317Z ld.shared.b16 %rs234, [%r6719+1024]; 2026-02-21T09:23:31.1629385Z ld.shared.b16 %rs235, [%r6719+64]; 2026-02-21T09:23:31.1629454Z ld.shared.b16 %rs236, [%r6719+1088]; 2026-02-21T09:23:31.1629523Z ld.shared.b16 %rs237, [%r6719+8192]; 2026-02-21T09:23:31.1629590Z ld.shared.b16 %rs238, [%r6719+9216]; 2026-02-21T09:23:31.1629820Z ld.shared.b16 %rs239, [%r6719+8256]; 2026-02-21T09:23:31.1629890Z ld.shared.b16 %rs240, [%r6719+9280]; 2026-02-21T09:23:31.1629952Z add.s32 %r6720, %r6717, %r93; 2026-02-21T09:23:31.1630020Z ld.shared.b16 %rs241, [%r6720]; 2026-02-21T09:23:31.1630090Z ld.shared.b16 %rs242, [%r6720+1024]; 2026-02-21T09:23:31.1630156Z ld.shared.b16 %rs243, [%r6720+64]; 2026-02-21T09:23:31.1630222Z ld.shared.b16 %rs244, [%r6720+1088]; 2026-02-21T09:23:31.1630300Z ld.shared.b16 %rs245, [%r6720+8192]; 2026-02-21T09:23:31.1630374Z ld.shared.b16 %rs246, [%r6720+9216]; 2026-02-21T09:23:31.1630442Z ld.shared.b16 %rs247, [%r6720+8256]; 2026-02-21T09:23:31.1630507Z ld.shared.b16 %rs248, [%r6720+9280]; 2026-02-21T09:23:31.1630574Z add.s32 %r6721, %r6717, %r94; 2026-02-21T09:23:31.1630705Z ld.shared.b16 %rs249, [%r6721]; 2026-02-21T09:23:31.1630775Z ld.shared.b16 %rs250, [%r6721+1024]; 2026-02-21T09:23:31.1630839Z ld.shared.b16 %rs251, [%r6721+64]; 2026-02-21T09:23:31.1630911Z ld.shared.b16 %rs252, [%r6721+1088]; 2026-02-21T09:23:31.1630976Z ld.shared.b16 %rs253, [%r6721+8192]; 2026-02-21T09:23:31.1631041Z ld.shared.b16 %rs254, [%r6721+9216]; 2026-02-21T09:23:31.1631113Z ld.shared.b16 %rs255, [%r6721+8256]; 2026-02-21T09:23:31.1631266Z ld.shared.b16 %rs256, [%r6721+9280]; 2026-02-21T09:23:31.1631332Z add.s32 %r6722, %r6717, %r95; 2026-02-21T09:23:31.1631402Z ld.shared.b16 %rs257, [%r6722]; 2026-02-21T09:23:31.1631466Z ld.shared.b16 %rs258, [%r6722+1024]; 
2026-02-21T09:23:31.1631533Z ld.shared.b16 %rs259, [%r6722+64]; 2026-02-21T09:23:31.1631599Z ld.shared.b16 %rs260, [%r6722+1088]; 2026-02-21T09:23:31.1631668Z ld.shared.b16 %rs261, [%r6722+8192]; 2026-02-21T09:23:31.1631733Z ld.shared.b16 %rs262, [%r6722+9216]; 2026-02-21T09:23:31.1631801Z ld.shared.b16 %rs263, [%r6722+8256]; 2026-02-21T09:23:31.1631872Z ld.shared.b16 %rs264, [%r6722+9280]; 2026-02-21T09:23:31.1631934Z add.s32 %r6723, %r6717, %r96; 2026-02-21T09:23:31.1631999Z ld.shared.b16 %rs265, [%r6723]; 2026-02-21T09:23:31.1632068Z ld.shared.b16 %rs266, [%r6723+1024]; 2026-02-21T09:23:31.1632137Z ld.shared.b16 %rs267, [%r6723+64]; 2026-02-21T09:23:31.1632203Z ld.shared.b16 %rs268, [%r6723+1088]; 2026-02-21T09:23:31.1632270Z ld.shared.b16 %rs269, [%r6723+8192]; 2026-02-21T09:23:31.1632342Z ld.shared.b16 %rs270, [%r6723+9216]; 2026-02-21T09:23:31.1632409Z ld.shared.b16 %rs271, [%r6723+8256]; 2026-02-21T09:23:31.1632475Z ld.shared.b16 %rs272, [%r6723+9280]; 2026-02-21T09:23:31.1632540Z add.s32 %r6724, %r6717, %r97; 2026-02-21T09:23:31.1632605Z ld.shared.b16 %rs273, [%r6724]; 2026-02-21T09:23:31.1632670Z ld.shared.b16 %rs274, [%r6724+1024]; 2026-02-21T09:23:31.1632737Z ld.shared.b16 %rs275, [%r6724+64]; 2026-02-21T09:23:31.1632807Z ld.shared.b16 %rs276, [%r6724+1088]; 2026-02-21T09:23:31.1632875Z ld.shared.b16 %rs277, [%r6724+8192]; 2026-02-21T09:23:31.1632943Z ld.shared.b16 %rs278, [%r6724+9216]; 2026-02-21T09:23:31.1633018Z ld.shared.b16 %rs279, [%r6724+8256]; 2026-02-21T09:23:31.1633098Z ld.shared.b16 %rs280, [%r6724+9280]; 2026-02-21T09:23:31.1633162Z add.s32 %r6725, %r6717, %r98; 2026-02-21T09:23:31.1633228Z ld.shared.b16 %rs281, [%r6725]; 2026-02-21T09:23:31.1633299Z ld.shared.b16 %rs282, [%r6725+1024]; 2026-02-21T09:23:31.1633365Z ld.shared.b16 %rs283, [%r6725+64]; 2026-02-21T09:23:31.1633433Z ld.shared.b16 %rs284, [%r6725+1088]; 2026-02-21T09:23:31.1633503Z ld.shared.b16 %rs285, [%r6725+8192]; 2026-02-21T09:23:31.1633569Z ld.shared.b16 %rs286, 
[%r6725+9216]; 2026-02-21T09:23:31.1633637Z ld.shared.b16 %rs287, [%r6725+8256]; 2026-02-21T09:23:31.1633707Z ld.shared.b16 %rs288, [%r6725+9280]; 2026-02-21T09:23:31.1633774Z cvt.f32.bf16 %r4430, %rs225; 2026-02-21T09:23:31.1633837Z cvt.f32.bf16 %r4431, %rs226; 2026-02-21T09:23:31.1633900Z cvt.f32.bf16 %r4432, %rs233; 2026-02-21T09:23:31.1633966Z cvt.f32.bf16 %r4433, %rs234; 2026-02-21T09:23:31.1634027Z cvt.f32.bf16 %r4562, %rs241; 2026-02-21T09:23:31.1634090Z cvt.f32.bf16 %r4563, %rs242; 2026-02-21T09:23:31.1634270Z cvt.f32.bf16 %r4564, %rs249; 2026-02-21T09:23:31.1634332Z cvt.f32.bf16 %r4565, %rs250; 2026-02-21T09:23:31.1634394Z cvt.f32.bf16 %r4694, %rs257; 2026-02-21T09:23:31.1634456Z cvt.f32.bf16 %r4695, %rs258; 2026-02-21T09:23:31.1634523Z cvt.f32.bf16 %r4696, %rs265; 2026-02-21T09:23:31.1634585Z cvt.f32.bf16 %r4697, %rs266; 2026-02-21T09:23:31.1634647Z cvt.f32.bf16 %r4826, %rs273; 2026-02-21T09:23:31.1634712Z cvt.f32.bf16 %r4827, %rs274; 2026-02-21T09:23:31.1634775Z cvt.f32.bf16 %r4828, %rs281; 2026-02-21T09:23:31.1634836Z cvt.f32.bf16 %r4829, %rs282; 2026-02-21T09:23:31.1634897Z cvt.f32.bf16 %r4958, %rs227; 2026-02-21T09:23:31.1634965Z cvt.f32.bf16 %r4959, %rs228; 2026-02-21T09:23:31.1635027Z cvt.f32.bf16 %r4960, %rs235; 2026-02-21T09:23:31.1635137Z cvt.f32.bf16 %r4961, %rs236; 2026-02-21T09:23:31.1635207Z cvt.f32.bf16 %r5090, %rs243; 2026-02-21T09:23:31.1635269Z cvt.f32.bf16 %r5091, %rs244; 2026-02-21T09:23:31.1635330Z cvt.f32.bf16 %r5092, %rs251; 2026-02-21T09:23:31.1635402Z cvt.f32.bf16 %r5093, %rs252; 2026-02-21T09:23:31.1635465Z cvt.f32.bf16 %r5222, %rs259; 2026-02-21T09:23:31.1635529Z cvt.f32.bf16 %r5223, %rs260; 2026-02-21T09:23:31.1635591Z cvt.f32.bf16 %r5224, %rs267; 2026-02-21T09:23:31.1635706Z cvt.f32.bf16 %r5225, %rs268; 2026-02-21T09:23:31.1635769Z cvt.f32.bf16 %r5354, %rs275; 2026-02-21T09:23:31.1635844Z cvt.f32.bf16 %r5355, %rs276; 2026-02-21T09:23:31.1635912Z cvt.f32.bf16 %r5356, %rs283; 2026-02-21T09:23:31.1635974Z cvt.f32.bf16 %r5357, 
%rs284; 2026-02-21T09:23:31.1636037Z cvt.f32.bf16 %r5486, %rs229; 2026-02-21T09:23:31.1636098Z cvt.f32.bf16 %r5487, %rs230; 2026-02-21T09:23:31.1636165Z cvt.f32.bf16 %r5488, %rs237; 2026-02-21T09:23:31.1636228Z cvt.f32.bf16 %r5489, %rs238; 2026-02-21T09:23:31.1636292Z cvt.f32.bf16 %r5618, %rs245; 2026-02-21T09:23:31.1636360Z cvt.f32.bf16 %r5619, %rs246; 2026-02-21T09:23:31.1636424Z cvt.f32.bf16 %r5620, %rs253; 2026-02-21T09:23:31.1636598Z cvt.f32.bf16 %r5621, %rs254; 2026-02-21T09:23:31.1636668Z cvt.f32.bf16 %r5750, %rs261; 2026-02-21T09:23:31.1636736Z cvt.f32.bf16 %r5751, %rs262; 2026-02-21T09:23:31.1636799Z cvt.f32.bf16 %r5752, %rs269; 2026-02-21T09:23:31.1636859Z cvt.f32.bf16 %r5753, %rs270; 2026-02-21T09:23:31.1636928Z cvt.f32.bf16 %r5882, %rs277; 2026-02-21T09:23:31.1636990Z cvt.f32.bf16 %r5883, %rs278; 2026-02-21T09:23:31.1637051Z cvt.f32.bf16 %r5884, %rs285; 2026-02-21T09:23:31.1637120Z cvt.f32.bf16 %r5885, %rs286; 2026-02-21T09:23:31.1637181Z cvt.f32.bf16 %r6014, %rs231; 2026-02-21T09:23:31.1637243Z cvt.f32.bf16 %r6015, %rs232; 2026-02-21T09:23:31.1637305Z cvt.f32.bf16 %r6016, %rs239; 2026-02-21T09:23:31.1637372Z cvt.f32.bf16 %r6017, %rs240; 2026-02-21T09:23:31.1637435Z cvt.f32.bf16 %r6146, %rs247; 2026-02-21T09:23:31.1637499Z cvt.f32.bf16 %r6147, %rs248; 2026-02-21T09:23:31.1637565Z cvt.f32.bf16 %r6148, %rs255; 2026-02-21T09:23:31.1637631Z cvt.f32.bf16 %r6149, %rs256; 2026-02-21T09:23:31.1637693Z cvt.f32.bf16 %r6278, %rs263; 2026-02-21T09:23:31.1637757Z cvt.f32.bf16 %r6279, %rs264; 2026-02-21T09:23:31.1637821Z cvt.f32.bf16 %r6280, %rs271; 2026-02-21T09:23:31.1637882Z cvt.f32.bf16 %r6281, %rs272; 2026-02-21T09:23:31.1637944Z cvt.f32.bf16 %r6410, %rs279; 2026-02-21T09:23:31.1638011Z cvt.f32.bf16 %r6411, %rs280; 2026-02-21T09:23:31.1638074Z cvt.f32.bf16 %r6412, %rs287; 2026-02-21T09:23:31.1638134Z cvt.f32.bf16 %r6413, %rs288; 2026-02-21T09:23:31.1638349Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1638422Z 
shl.b32 %r6726, %r12614, 3; 2026-02-21T09:23:31.1638484Z add.s32 %r4300, %r6810, %r6726; 2026-02-21T09:23:31.1638548Z // begin inline asm 2026-02-21T09:23:31.1638613Z 2026-02-21T09:23:31.1638671Z { 2026-02-21T09:23:31.1638740Z .reg .pred complete; 2026-02-21T09:23:31.1638802Z waitLoop: 2026-02-21T09:23:31.1638957Z mbarrier.try_wait.parity.shared.b64 complete, [%r4300], %r12613; 2026-02-21T09:23:31.1639030Z @!complete bra.uni waitLoop; 2026-02-21T09:23:31.1639224Z } 2026-02-21T09:23:31.1639231Z 2026-02-21T09:23:31.1639298Z // end inline asm 2026-02-21T09:23:31.1639506Z .loc 1 58 33 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:58:33 2026-02-21T09:23:31.1639572Z shl.b32 %r6728, %r12614, 12; 2026-02-21T09:23:31.1639642Z add.s32 %r6730, %r1378, %r6728; 2026-02-21T09:23:31.1639842Z .loc 1 76 58 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:76:58 2026-02-21T09:23:31.1639907Z add.s32 %r6731, %r6730, %r12473; 2026-02-21T09:23:31.1639974Z add.s32 %r6732, %r6730, %r332; 2026-02-21T09:23:31.1640042Z add.s32 %r6733, %r6730, %r333; 2026-02-21T09:23:31.1640104Z add.s32 %r6734, %r6730, %r334; 2026-02-21T09:23:31.1640165Z add.s32 %r6735, %r6730, %r335; 2026-02-21T09:23:31.1640301Z add.s32 %r6736, %r6730, %r336; 2026-02-21T09:23:31.1640367Z add.s32 %r6737, %r6730, %r337; 2026-02-21T09:23:31.1640429Z add.s32 %r6738, %r6730, %r338; 2026-02-21T09:23:31.1640643Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1640716Z ld.shared.s8 %rs289, [%r6731]; 2026-02-21T09:23:31.1640980Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1641063Z shl.b16 %rs290, %rs289, 4; 2026-02-21T09:23:31.1641276Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1641347Z ld.shared.s8 %rs291, [%r6732+128]; 2026-02-21T09:23:31.1641544Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1641614Z 
shl.b16 %rs292, %rs291, 4; 2026-02-21T09:23:31.1641815Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1641884Z ld.shared.s8 %rs293, [%r6733+256]; 2026-02-21T09:23:31.1642086Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1642152Z shl.b16 %rs294, %rs293, 4; 2026-02-21T09:23:31.1642348Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1642423Z ld.shared.s8 %rs295, [%r6734+384]; 2026-02-21T09:23:31.1642627Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1642691Z shl.b16 %rs296, %rs295, 4; 2026-02-21T09:23:31.1642897Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1642973Z ld.shared.s8 %rs297, [%r6735+512]; 2026-02-21T09:23:31.1643174Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1643239Z shl.b16 %rs298, %rs297, 4; 2026-02-21T09:23:31.1643438Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1643509Z ld.shared.s8 %rs299, [%r6736+640]; 2026-02-21T09:23:31.1643705Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1643770Z shl.b16 %rs300, %rs299, 4; 2026-02-21T09:23:31.1643966Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1644031Z ld.shared.s8 %rs301, [%r6737+768]; 2026-02-21T09:23:31.1644234Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1644302Z shl.b16 %rs302, %rs301, 4; 2026-02-21T09:23:31.1644499Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1644566Z ld.shared.s8 %rs303, [%r6738+896]; 2026-02-21T09:23:31.1644768Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 
2026-02-21T09:23:31.1644834Z shl.b16 %rs304, %rs303, 4; 2026-02-21T09:23:31.1645146Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1645223Z ld.shared.s8 %rs305, [%r6731+1024]; 2026-02-21T09:23:31.1645422Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1645486Z shl.b16 %rs306, %rs305, 4; 2026-02-21T09:23:31.1645687Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1645756Z ld.shared.s8 %rs307, [%r6732+1152]; 2026-02-21T09:23:31.1645954Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1646021Z shl.b16 %rs308, %rs307, 4; 2026-02-21T09:23:31.1646266Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1646335Z ld.shared.s8 %rs309, [%r6733+1280]; 2026-02-21T09:23:31.1646662Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1646735Z shl.b16 %rs310, %rs309, 4; 2026-02-21T09:23:31.1646932Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1647071Z ld.shared.s8 %rs311, [%r6734+1408]; 2026-02-21T09:23:31.1647279Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1647341Z shl.b16 %rs312, %rs311, 4; 2026-02-21T09:23:31.1647536Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1647611Z ld.shared.s8 %rs313, [%r6735+1536]; 2026-02-21T09:23:31.1647809Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1647873Z shl.b16 %rs314, %rs313, 4; 2026-02-21T09:23:31.1648072Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1648144Z ld.shared.s8 %rs315, [%r6736+1664]; 2026-02-21T09:23:31.1648340Z .loc 1 61 28 // 
cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1648409Z shl.b16 %rs316, %rs315, 4; 2026-02-21T09:23:31.1648608Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1648676Z ld.shared.s8 %rs317, [%r6737+1792]; 2026-02-21T09:23:31.1648870Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1648937Z shl.b16 %rs318, %rs317, 4; 2026-02-21T09:23:31.1649132Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1649201Z ld.shared.s8 %rs319, [%r6738+1920]; 2026-02-21T09:23:31.1649404Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1649472Z shl.b16 %rs320, %rs319, 4; 2026-02-21T09:23:31.1649666Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1649739Z ld.shared.s8 %rs321, [%r6731+2048]; 2026-02-21T09:23:31.1649936Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1649999Z shl.b16 %rs322, %rs321, 4; 2026-02-21T09:23:31.1650200Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1650268Z ld.shared.s8 %rs323, [%r6732+2176]; 2026-02-21T09:23:31.1650462Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1650525Z shl.b16 %rs324, %rs323, 4; 2026-02-21T09:23:31.1650724Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1650792Z ld.shared.s8 %rs325, [%r6733+2304]; 2026-02-21T09:23:31.1651135Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1651203Z shl.b16 %rs326, %rs325, 4; 2026-02-21T09:23:31.1651400Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1651471Z ld.shared.s8 %rs327, [%r6734+2432]; 2026-02-21T09:23:31.1651670Z .loc 
1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1651732Z shl.b16 %rs328, %rs327, 4; 2026-02-21T09:23:31.1651927Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1651997Z ld.shared.s8 %rs329, [%r6735+2560]; 2026-02-21T09:23:31.1652276Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1652344Z shl.b16 %rs330, %rs329, 4; 2026-02-21T09:23:31.1652539Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1652626Z ld.shared.s8 %rs331, [%r6736+2688]; 2026-02-21T09:23:31.1652827Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1652938Z shl.b16 %rs332, %rs331, 4; 2026-02-21T09:23:31.1653141Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1653209Z ld.shared.s8 %rs333, [%r6737+2816]; 2026-02-21T09:23:31.1653406Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1653476Z shl.b16 %rs334, %rs333, 4; 2026-02-21T09:23:31.1653675Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1653744Z ld.shared.s8 %rs335, [%r6738+2944]; 2026-02-21T09:23:31.1653945Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1654012Z shl.b16 %rs336, %rs335, 4; 2026-02-21T09:23:31.1654208Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1654275Z ld.shared.s8 %rs337, [%r6731+3072]; 2026-02-21T09:23:31.1654478Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1654543Z shl.b16 %rs338, %rs337, 4; 2026-02-21T09:23:31.1654738Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1654810Z ld.shared.s8 %rs339, [%r6732+3200]; 
2026-02-21T09:23:31.1655005Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1655070Z shl.b16 %rs340, %rs339, 4; 2026-02-21T09:23:31.1655274Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1655341Z ld.shared.s8 %rs341, [%r6733+3328]; 2026-02-21T09:23:31.1655538Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1655605Z shl.b16 %rs342, %rs341, 4; 2026-02-21T09:23:31.1655803Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1655872Z ld.shared.s8 %rs343, [%r6734+3456]; 2026-02-21T09:23:31.1656071Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1656134Z shl.b16 %rs344, %rs343, 4; 2026-02-21T09:23:31.1656332Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1656399Z ld.shared.s8 %rs345, [%r6735+3584]; 2026-02-21T09:23:31.1656731Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1656798Z shl.b16 %rs346, %rs345, 4; 2026-02-21T09:23:31.1657080Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1657213Z ld.shared.s8 %rs347, [%r6736+3712]; 2026-02-21T09:23:31.1657411Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1657473Z shl.b16 %rs348, %rs347, 4; 2026-02-21T09:23:31.1657672Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1657739Z ld.shared.s8 %rs349, [%r6737+3840]; 2026-02-21T09:23:31.1657935Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1658003Z shl.b16 %rs350, %rs349, 4; 2026-02-21T09:23:31.1658263Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1658334Z ld.shared.s8 %rs351, 
[%r6738+3968]; 2026-02-21T09:23:31.1658530Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1658600Z shl.b16 %rs352, %rs351, 4; 2026-02-21T09:23:31.1658793Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1658915Z cvt.s16.s8 %rs353, %rs290; 2026-02-21T09:23:31.1658987Z shr.s16 %rs354, %rs353, 4; 2026-02-21T09:23:31.1659057Z cvt.s16.s8 %rs355, %rs292; 2026-02-21T09:23:31.1659120Z shr.s16 %rs356, %rs355, 4; 2026-02-21T09:23:31.1659187Z shr.s16 %rs357, %rs289, 4; 2026-02-21T09:23:31.1659248Z shr.s16 %rs358, %rs291, 4; 2026-02-21T09:23:31.1659445Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1659513Z cvt.rn.f32.s16 %r6739, %rs358; 2026-02-21T09:23:31.1659584Z cvt.rn.f32.s16 %r6740, %rs357; 2026-02-21T09:23:31.1659648Z cvt.rn.f32.s16 %r6741, %rs356; 2026-02-21T09:23:31.1659711Z cvt.rn.f32.s16 %r6742, %rs354; 2026-02-21T09:23:31.1659913Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1659979Z cvt.s16.s8 %rs359, %rs294; 2026-02-21T09:23:31.1660042Z shr.s16 %rs360, %rs359, 4; 2026-02-21T09:23:31.1660118Z cvt.s16.s8 %rs361, %rs296; 2026-02-21T09:23:31.1660190Z shr.s16 %rs362, %rs361, 4; 2026-02-21T09:23:31.1660254Z shr.s16 %rs363, %rs293, 4; 2026-02-21T09:23:31.1660316Z shr.s16 %rs364, %rs295, 4; 2026-02-21T09:23:31.1660518Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1660583Z cvt.rn.f32.s16 %r6743, %rs364; 2026-02-21T09:23:31.1660648Z cvt.rn.f32.s16 %r6744, %rs363; 2026-02-21T09:23:31.1660717Z cvt.rn.f32.s16 %r6745, %rs362; 2026-02-21T09:23:31.1660781Z cvt.rn.f32.s16 %r6746, %rs360; 2026-02-21T09:23:31.1660982Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1661046Z cvt.s16.s8 %rs365, %rs298; 2026-02-21T09:23:31.1661116Z shr.s16 %rs366, %rs365, 4; 
2026-02-21T09:23:31.1661179Z cvt.s16.s8 %rs367, %rs300; 2026-02-21T09:23:31.1661241Z shr.s16 %rs368, %rs367, 4; 2026-02-21T09:23:31.1661310Z shr.s16 %rs369, %rs297, 4; 2026-02-21T09:23:31.1661372Z shr.s16 %rs370, %rs299, 4; 2026-02-21T09:23:31.1661570Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1661639Z cvt.rn.f32.s16 %r6747, %rs370; 2026-02-21T09:23:31.1661705Z cvt.rn.f32.s16 %r6748, %rs369; 2026-02-21T09:23:31.1661767Z cvt.rn.f32.s16 %r6749, %rs368; 2026-02-21T09:23:31.1661831Z cvt.rn.f32.s16 %r6750, %rs366; 2026-02-21T09:23:31.1662034Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1662098Z cvt.s16.s8 %rs371, %rs302; 2026-02-21T09:23:31.1662162Z shr.s16 %rs372, %rs371, 4; 2026-02-21T09:23:31.1662231Z cvt.s16.s8 %rs373, %rs304; 2026-02-21T09:23:31.1662291Z shr.s16 %rs374, %rs373, 4; 2026-02-21T09:23:31.1662474Z shr.s16 %rs375, %rs301, 4; 2026-02-21T09:23:31.1662535Z shr.s16 %rs376, %rs303, 4; 2026-02-21T09:23:31.1662738Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1662803Z cvt.rn.f32.s16 %r6751, %rs376; 2026-02-21T09:23:31.1662867Z cvt.rn.f32.s16 %r6752, %rs375; 2026-02-21T09:23:31.1662935Z cvt.rn.f32.s16 %r6753, %rs374; 2026-02-21T09:23:31.1663000Z cvt.rn.f32.s16 %r6754, %rs372; 2026-02-21T09:23:31.1663198Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1663264Z cvt.s16.s8 %rs377, %rs306; 2026-02-21T09:23:31.1663327Z shr.s16 %rs378, %rs377, 4; 2026-02-21T09:23:31.1663389Z cvt.s16.s8 %rs379, %rs308; 2026-02-21T09:23:31.1663502Z shr.s16 %rs380, %rs379, 4; 2026-02-21T09:23:31.1663583Z shr.s16 %rs381, %rs305, 4; 2026-02-21T09:23:31.1663647Z shr.s16 %rs382, %rs307, 4; 2026-02-21T09:23:31.1663844Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1663916Z cvt.rn.f32.s16 %r6755, %rs382; 2026-02-21T09:23:31.1663979Z 
cvt.rn.f32.s16 %r6756, %rs381; 2026-02-21T09:23:31.1664042Z cvt.rn.f32.s16 %r6757, %rs380; 2026-02-21T09:23:31.1664153Z cvt.rn.f32.s16 %r6758, %rs378; 2026-02-21T09:23:31.1664358Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1664420Z cvt.s16.s8 %rs383, %rs310; 2026-02-21T09:23:31.1664482Z shr.s16 %rs384, %rs383, 4; 2026-02-21T09:23:31.1664550Z cvt.s16.s8 %rs385, %rs312; 2026-02-21T09:23:31.1664612Z shr.s16 %rs386, %rs385, 4; 2026-02-21T09:23:31.1664674Z shr.s16 %rs387, %rs309, 4; 2026-02-21T09:23:31.1664740Z shr.s16 %rs388, %rs311, 4; 2026-02-21T09:23:31.1664937Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1665001Z cvt.rn.f32.s16 %r6759, %rs388; 2026-02-21T09:23:31.1665067Z cvt.rn.f32.s16 %r6760, %rs387; 2026-02-21T09:23:31.1665139Z cvt.rn.f32.s16 %r6761, %rs386; 2026-02-21T09:23:31.1665201Z cvt.rn.f32.s16 %r6762, %rs384; 2026-02-21T09:23:31.1665400Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1665470Z cvt.s16.s8 %rs389, %rs314; 2026-02-21T09:23:31.1665534Z shr.s16 %rs390, %rs389, 4; 2026-02-21T09:23:31.1665596Z cvt.s16.s8 %rs391, %rs316; 2026-02-21T09:23:31.1665661Z shr.s16 %rs392, %rs391, 4; 2026-02-21T09:23:31.1665724Z shr.s16 %rs393, %rs313, 4; 2026-02-21T09:23:31.1665785Z shr.s16 %rs394, %rs315, 4; 2026-02-21T09:23:31.1665983Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1666051Z cvt.rn.f32.s16 %r6763, %rs394; 2026-02-21T09:23:31.1666117Z cvt.rn.f32.s16 %r6764, %rs393; 2026-02-21T09:23:31.1666181Z cvt.rn.f32.s16 %r6765, %rs392; 2026-02-21T09:23:31.1666249Z cvt.rn.f32.s16 %r6766, %rs390; 2026-02-21T09:23:31.1666561Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1666630Z cvt.s16.s8 %rs395, %rs318; 2026-02-21T09:23:31.1666692Z shr.s16 %rs396, %rs395, 4; 2026-02-21T09:23:31.1666760Z cvt.s16.s8 %rs397, 
%rs320; 2026-02-21T09:23:31.1666821Z shr.s16 %rs398, %rs397, 4; 2026-02-21T09:23:31.1666881Z shr.s16 %rs399, %rs317, 4; 2026-02-21T09:23:31.1666948Z shr.s16 %rs400, %rs319, 4; 2026-02-21T09:23:31.1667143Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1667208Z cvt.rn.f32.s16 %r6767, %rs400; 2026-02-21T09:23:31.1667277Z cvt.rn.f32.s16 %r6768, %rs399; 2026-02-21T09:23:31.1667343Z cvt.rn.f32.s16 %r6769, %rs398; 2026-02-21T09:23:31.1667407Z cvt.rn.f32.s16 %r6770, %rs396; 2026-02-21T09:23:31.1667605Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1667672Z cvt.s16.s8 %rs401, %rs322; 2026-02-21T09:23:31.1667901Z shr.s16 %rs402, %rs401, 4; 2026-02-21T09:23:31.1667965Z cvt.s16.s8 %rs403, %rs324; 2026-02-21T09:23:31.1668032Z shr.s16 %rs404, %rs403, 4; 2026-02-21T09:23:31.1668095Z shr.s16 %rs405, %rs321, 4; 2026-02-21T09:23:31.1668159Z shr.s16 %rs406, %rs323, 4; 2026-02-21T09:23:31.1668438Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1668513Z cvt.rn.f32.s16 %r6771, %rs406; 2026-02-21T09:23:31.1668577Z cvt.rn.f32.s16 %r6772, %rs405; 2026-02-21T09:23:31.1668640Z cvt.rn.f32.s16 %r6773, %rs404; 2026-02-21T09:23:31.1668710Z cvt.rn.f32.s16 %r6774, %rs402; 2026-02-21T09:23:31.1668914Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1669064Z cvt.s16.s8 %rs407, %rs326; 2026-02-21T09:23:31.1669134Z shr.s16 %rs408, %rs407, 4; 2026-02-21T09:23:31.1669196Z cvt.s16.s8 %rs409, %rs328; 2026-02-21T09:23:31.1669265Z shr.s16 %rs410, %rs409, 4; 2026-02-21T09:23:31.1669329Z shr.s16 %rs411, %rs325, 4; 2026-02-21T09:23:31.1669395Z shr.s16 %rs412, %rs327, 4; 2026-02-21T09:23:31.1669593Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1669721Z cvt.rn.f32.s16 %r6775, %rs412; 2026-02-21T09:23:31.1669795Z cvt.rn.f32.s16 %r6776, %rs411; 
2026-02-21T09:23:31.1669858Z cvt.rn.f32.s16 %r6777, %rs410; 2026-02-21T09:23:31.1669921Z cvt.rn.f32.s16 %r6778, %rs408; 2026-02-21T09:23:31.1670121Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1670185Z cvt.s16.s8 %rs413, %rs330; 2026-02-21T09:23:31.1670247Z shr.s16 %rs414, %rs413, 4; 2026-02-21T09:23:31.1670307Z cvt.s16.s8 %rs415, %rs332; 2026-02-21T09:23:31.1670377Z shr.s16 %rs416, %rs415, 4; 2026-02-21T09:23:31.1670438Z shr.s16 %rs417, %rs329, 4; 2026-02-21T09:23:31.1670499Z shr.s16 %rs418, %rs331, 4; 2026-02-21T09:23:31.1670704Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1670770Z cvt.rn.f32.s16 %r6779, %rs418; 2026-02-21T09:23:31.1670834Z cvt.rn.f32.s16 %r6780, %rs417; 2026-02-21T09:23:31.1670903Z cvt.rn.f32.s16 %r6781, %rs416; 2026-02-21T09:23:31.1670971Z cvt.rn.f32.s16 %r6782, %rs414; 2026-02-21T09:23:31.1671167Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1671230Z cvt.s16.s8 %rs419, %rs334; 2026-02-21T09:23:31.1671296Z shr.s16 %rs420, %rs419, 4; 2026-02-21T09:23:31.1671358Z cvt.s16.s8 %rs421, %rs336; 2026-02-21T09:23:31.1671419Z shr.s16 %rs422, %rs421, 4; 2026-02-21T09:23:31.1671488Z shr.s16 %rs423, %rs333, 4; 2026-02-21T09:23:31.1671551Z shr.s16 %rs424, %rs335, 4; 2026-02-21T09:23:31.1671746Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1671810Z cvt.rn.f32.s16 %r6783, %rs424; 2026-02-21T09:23:31.1671885Z cvt.rn.f32.s16 %r6784, %rs423; 2026-02-21T09:23:31.1671951Z cvt.rn.f32.s16 %r6785, %rs422; 2026-02-21T09:23:31.1672013Z cvt.rn.f32.s16 %r6786, %rs420; 2026-02-21T09:23:31.1672219Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1672281Z cvt.s16.s8 %rs425, %rs338; 2026-02-21T09:23:31.1672342Z shr.s16 %rs426, %rs425, 4; 2026-02-21T09:23:31.1672422Z cvt.s16.s8 %rs427, %rs340; 
2026-02-21T09:23:31.1672486Z shr.s16 %rs428, %rs427, 4; 2026-02-21T09:23:31.1672548Z shr.s16 %rs429, %rs337, 4; 2026-02-21T09:23:31.1672610Z shr.s16 %rs430, %rs339, 4; 2026-02-21T09:23:31.1672814Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1672879Z cvt.rn.f32.s16 %r6787, %rs430; 2026-02-21T09:23:31.1672943Z cvt.rn.f32.s16 %r6788, %rs429; 2026-02-21T09:23:31.1673011Z cvt.rn.f32.s16 %r6789, %rs428; 2026-02-21T09:23:31.1673131Z cvt.rn.f32.s16 %r6790, %rs426; 2026-02-21T09:23:31.1673399Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1673463Z cvt.s16.s8 %rs431, %rs342; 2026-02-21T09:23:31.1673529Z shr.s16 %rs432, %rs431, 4; 2026-02-21T09:23:31.1673593Z cvt.s16.s8 %rs433, %rs344; 2026-02-21T09:23:31.1673656Z shr.s16 %rs434, %rs433, 4; 2026-02-21T09:23:31.1673723Z shr.s16 %rs435, %rs341, 4; 2026-02-21T09:23:31.1673785Z shr.s16 %rs436, %rs343, 4; 2026-02-21T09:23:31.1673982Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1674051Z cvt.rn.f32.s16 %r6791, %rs436; 2026-02-21T09:23:31.1674114Z cvt.rn.f32.s16 %r6792, %rs435; 2026-02-21T09:23:31.1674177Z cvt.rn.f32.s16 %r6793, %rs434; 2026-02-21T09:23:31.1674290Z cvt.rn.f32.s16 %r6794, %rs432; 2026-02-21T09:23:31.1674497Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1674563Z cvt.s16.s8 %rs437, %rs346; 2026-02-21T09:23:31.1674624Z shr.s16 %rs438, %rs437, 4; 2026-02-21T09:23:31.1674690Z cvt.s16.s8 %rs439, %rs348; 2026-02-21T09:23:31.1674752Z shr.s16 %rs440, %rs439, 4; 2026-02-21T09:23:31.1674812Z shr.s16 %rs441, %rs345, 4; 2026-02-21T09:23:31.1674921Z shr.s16 %rs442, %rs347, 4; 2026-02-21T09:23:31.1675128Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1675205Z cvt.rn.f32.s16 %r6795, %rs442; 2026-02-21T09:23:31.1675270Z cvt.rn.f32.s16 %r6796, %rs441; 
2026-02-21T09:23:31.1675339Z cvt.rn.f32.s16 %r6797, %rs440; 2026-02-21T09:23:31.1675403Z cvt.rn.f32.s16 %r6798, %rs438; 2026-02-21T09:23:31.1675602Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1675670Z cvt.s16.s8 %rs443, %rs350; 2026-02-21T09:23:31.1675732Z shr.s16 %rs444, %rs443, 4; 2026-02-21T09:23:31.1675796Z cvt.s16.s8 %rs445, %rs352; 2026-02-21T09:23:31.1675861Z shr.s16 %rs446, %rs445, 4; 2026-02-21T09:23:31.1675928Z shr.s16 %rs447, %rs349, 4; 2026-02-21T09:23:31.1675990Z shr.s16 %rs448, %rs351, 4; 2026-02-21T09:23:31.1676187Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1676257Z cvt.rn.f32.s16 %r6799, %rs448; 2026-02-21T09:23:31.1676322Z cvt.rn.f32.s16 %r6800, %rs447; 2026-02-21T09:23:31.1676385Z cvt.rn.f32.s16 %r6801, %rs446; 2026-02-21T09:23:31.1676563Z cvt.rn.f32.s16 %r6802, %rs444; 2026-02-21T09:23:31.1676688Z st.shared.v4.b32 [%r67], {%r6742, %r6740, %r6741, %r6739}; 2026-02-21T09:23:31.1676816Z st.shared.v4.b32 [%r67+16384], {%r6774, %r6772, %r6773, %r6771}; 2026-02-21T09:23:31.1676925Z st.shared.v4.b32 [%r68], {%r6746, %r6744, %r6745, %r6743}; 2026-02-21T09:23:31.1677045Z st.shared.v4.b32 [%r68+16384], {%r6778, %r6776, %r6777, %r6775}; 2026-02-21T09:23:31.1677150Z st.shared.v4.b32 [%r69], {%r6750, %r6748, %r6749, %r6747}; 2026-02-21T09:23:31.1677267Z st.shared.v4.b32 [%r69+16384], {%r6782, %r6780, %r6781, %r6779}; 2026-02-21T09:23:31.1677374Z st.shared.v4.b32 [%r70], {%r6754, %r6752, %r6753, %r6751}; 2026-02-21T09:23:31.1677490Z st.shared.v4.b32 [%r70+16384], {%r6786, %r6784, %r6785, %r6783}; 2026-02-21T09:23:31.1677598Z st.shared.v4.b32 [%r71], {%r6758, %r6756, %r6757, %r6755}; 2026-02-21T09:23:31.1677717Z st.shared.v4.b32 [%r71+16384], {%r6790, %r6788, %r6789, %r6787}; 2026-02-21T09:23:31.1677820Z st.shared.v4.b32 [%r72], {%r6762, %r6760, %r6761, %r6759}; 2026-02-21T09:23:31.1677932Z st.shared.v4.b32 [%r72+16384], {%r6794, %r6792, %r6793, 
%r6791}; 2026-02-21T09:23:31.1678034Z st.shared.v4.b32 [%r73], {%r6766, %r6764, %r6765, %r6763}; 2026-02-21T09:23:31.1678153Z st.shared.v4.b32 [%r73+16384], {%r6798, %r6796, %r6797, %r6795}; 2026-02-21T09:23:31.1678255Z st.shared.v4.b32 [%r74], {%r6770, %r6768, %r6769, %r6767}; 2026-02-21T09:23:31.1678368Z st.shared.v4.b32 [%r74+16384], {%r6802, %r6800, %r6801, %r6799}; 2026-02-21T09:23:31.1678598Z $L__tmp3: 2026-02-21T09:23:31.1678884Z .loc 2 291 36 // standard.py:291:36 @[ cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:88:40 ] 2026-02-21T09:23:31.1678949Z // begin inline asm 2026-02-21T09:23:31.1679038Z fence.proxy.async.shared::cta; 2026-02-21T09:23:31.1679097Z // end inline asm 2026-02-21T09:23:31.1679156Z bar.sync 0; 2026-02-21T09:23:31.1679230Z wgmma.fence.sync.aligned; 2026-02-21T09:23:31.1679305Z mov.pred %p87, -1; 2026-02-21T09:23:31.1679367Z // begin inline asm 2026-02-21T09:23:31.1680945Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12616,%r12617,%r12618,%r12619,%r12620,%r12621,%r12622,%r12623,%r12624,%r12625,%r12626,%r12627,%r12628,%r12629,%r12630,%r12631,%r12632,%r12633,%r12634,%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655,%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679}, {%r4430,%r4431,%r4432,%r4433}, %rd3, %p87, 1, 1; 2026-02-21T09:23:31.1681019Z // end inline asm 2026-02-21T09:23:31.1681080Z // begin inline asm 2026-02-21T09:23:31.1682619Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12616,%r12617,%r12618,%r12619,%r12620,%r12621,%r12622,%r12623,%r12624,%r12625,%r12626,%r12627,%r12628,%r12629,%r12630,%r12631,%r12632,%r12633,%r12634,%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655,%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679}, {%r4562,%r4563,%r4564,%r4565}, %rd4, %p87, 1, 1; 2026-02-21T09:23:31.1682681Z // end inline asm 2026-02-21T09:23:31.1682740Z // begin inline asm 2026-02-21T09:23:31.1684228Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12616,%r12617,%r12618,%r12619,%r12620,%r12621,%r12622,%r12623,%r12624,%r12625,%r12626,%r12627,%r12628,%r12629,%r12630,%r12631,%r12632,%r12633,%r12634,%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655,%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679}, {%r4694,%r4695,%r4696,%r4697}, %rd5, %p87, 1, 1; 2026-02-21T09:23:31.1684289Z // end inline asm 2026-02-21T09:23:31.1684351Z // begin inline asm 2026-02-21T09:23:31.1685839Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12616,%r12617,%r12618,%r12619,%r12620,%r12621,%r12622,%r12623,%r12624,%r12625,%r12626,%r12627,%r12628,%r12629,%r12630,%r12631,%r12632,%r12633,%r12634,%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655,%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679}, 
{%r4826,%r4827,%r4828,%r4829}, %rd6, %p87, 1, 1; 2026-02-21T09:23:31.1685901Z // end inline asm 2026-02-21T09:23:31.1685968Z // begin inline asm 2026-02-21T09:23:31.1687563Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12616,%r12617,%r12618,%r12619,%r12620,%r12621,%r12622,%r12623,%r12624,%r12625,%r12626,%r12627,%r12628,%r12629,%r12630,%r12631,%r12632,%r12633,%r12634,%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655,%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679}, {%r4958,%r4959,%r4960,%r4961}, %rd7, %p87, 1, 1; 2026-02-21T09:23:31.1687757Z // end inline asm 2026-02-21T09:23:31.1687822Z // begin inline asm 2026-02-21T09:23:31.1689304Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12616,%r12617,%r12618,%r12619,%r12620,%r12621,%r12622,%r12623,%r12624,%r12625,%r12626,%r12627,%r12628,%r12629,%r12630,%r12631,%r12632,%r12633,%r12634,%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655,%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679}, {%r5090,%r5091,%r5092,%r5093}, %rd8, %p87, 1, 1; 2026-02-21T09:23:31.1689426Z // end inline asm 2026-02-21T09:23:31.1689490Z // begin inline asm 2026-02-21T09:23:31.1691014Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12616,%r12617,%r12618,%r12619,%r12620,%r12621,%r12622,%r12623,%r12624,%r12625,%r12626,%r12627,%r12628,%r12629,%r12630,%r12631,%r12632,%r12633,%r12634,%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655,%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679}, {%r5222,%r5223,%r5224,%r5225}, %rd9, %p87, 1, 1; 2026-02-21T09:23:31.1691087Z // end inline asm 2026-02-21T09:23:31.1691146Z // begin inline asm 2026-02-21T09:23:31.1692628Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12616,%r12617,%r12618,%r12619,%r12620,%r12621,%r12622,%r12623,%r12624,%r12625,%r12626,%r12627,%r12628,%r12629,%r12630,%r12631,%r12632,%r12633,%r12634,%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655,%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679}, {%r5354,%r5355,%r5356,%r5357}, %rd10, %p87, 1, 1; 2026-02-21T09:23:31.1692692Z // end inline asm 2026-02-21T09:23:31.1692752Z // begin inline asm 2026-02-21T09:23:31.1694231Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698,%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719,%r12720,%r12721,%r12722,%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743}, 
{%r5486,%r5487,%r5488,%r5489}, %rd3, %p87, 1, 1; 2026-02-21T09:23:31.1694293Z // end inline asm 2026-02-21T09:23:31.1694352Z // begin inline asm 2026-02-21T09:23:31.1695829Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698,%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719,%r12720,%r12721,%r12722,%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743}, {%r5618,%r5619,%r5620,%r5621}, %rd4, %p87, 1, 1; 2026-02-21T09:23:31.1695889Z // end inline asm 2026-02-21T09:23:31.1695954Z // begin inline asm 2026-02-21T09:23:31.1697555Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698,%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719,%r12720,%r12721,%r12722,%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743}, {%r5750,%r5751,%r5752,%r5753}, %rd5, %p87, 1, 1; 2026-02-21T09:23:31.1697761Z // end inline asm 2026-02-21T09:23:31.1697828Z // begin inline asm 2026-02-21T09:23:31.1699378Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698,%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719,%r12720,%r12721,%r12722,%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743}, {%r5882,%r5883,%r5884,%r5885}, %rd6, %p87, 1, 1; 2026-02-21T09:23:31.1699448Z // end inline asm 2026-02-21T09:23:31.1699508Z // begin inline asm 2026-02-21T09:23:31.1701047Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698,%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719,%r12720,%r12721,%r12722,%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743}, {%r6014,%r6015,%r6016,%r6017}, %rd7, %p87, 1, 1; 2026-02-21T09:23:31.1701121Z // end inline asm 2026-02-21T09:23:31.1701189Z // begin inline asm 2026-02-21T09:23:31.1702667Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698,%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719,%r12720,%r12721,%r12722,%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743}, 
{%r6146,%r6147,%r6148,%r6149}, %rd8, %p87, 1, 1; 2026-02-21T09:23:31.1702731Z // end inline asm 2026-02-21T09:23:31.1702796Z // begin inline asm 2026-02-21T09:23:31.1704267Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698,%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719,%r12720,%r12721,%r12722,%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743}, {%r6278,%r6279,%r6280,%r6281}, %rd9, %p87, 1, 1; 2026-02-21T09:23:31.1704327Z // end inline asm 2026-02-21T09:23:31.1704386Z // begin inline asm 2026-02-21T09:23:31.1705862Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698,%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719,%r12720,%r12721,%r12722,%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743}, {%r6410,%r6411,%r6412,%r6413}, %rd10, %p87, 1, 1; 2026-02-21T09:23:31.1706033Z // end inline asm 2026-02-21T09:23:31.1706113Z wgmma.commit_group.sync.aligned; 2026-02-21T09:23:31.1706181Z mov.b32 %r6543, %r4082; 2026-02-21T09:23:31.1706245Z mov.b32 %r6544, %r4082; 2026-02-21T09:23:31.1706306Z mov.b32 %r6542, %r1310; 2026-02-21T09:23:31.1706370Z // begin inline asm 2026-02-21T09:23:31.1709206Z // wait for regs: 
%r12616,%r12617,%r12618,%r12619,%r12620,%r12621,%r12622,%r12623,%r12624,%r12625,%r12626,%r12627,%r12628,%r12629,%r12630,%r12631,%r12632,%r12633,%r12634,%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655,%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679,%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698,%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719,%r12720,%r12721,%r12722,%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743,%r6542,%r6543,%r6544 2026-02-21T09:23:31.1709307Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:23:31.1709367Z // end inline asm 2026-02-21T09:23:31.1709424Z $L__tmp4: 2026-02-21T09:23:31.1709641Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1709706Z add.s32 %r6803, %r12615, 1; 2026-02-21T09:23:31.1709776Z setp.gt.s32 %p109, %r6803, 1; 2026-02-21T09:23:31.1709848Z selp.b32 %r12615, 0, %r6803, %p109; 2026-02-21T09:23:31.1710055Z .loc 1 52 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:32 2026-02-21T09:23:31.1710126Z add.s64 %rd429, %rd757, %rd57; 2026-02-21T09:23:31.1710191Z add.s64 %rd430, %rd757, %rd56; 2026-02-21T09:23:31.1710258Z add.s64 %rd431, %rd757, %rd55; 2026-02-21T09:23:31.1710323Z add.s64 %rd432, %rd757, %rd54; 2026-02-21T09:23:31.1710386Z add.s64 %rd433, %rd757, %rd53; 2026-02-21T09:23:31.1710449Z add.s64 %rd434, %rd757, %rd52; 2026-02-21T09:23:31.1710516Z add.s64 %rd435, 
%rd757, %rd51; 2026-02-21T09:23:31.1710578Z add.s64 %rd436, %rd757, %rd50; 2026-02-21T09:23:31.1710641Z add.s64 %rd437, %rd757, %rd49; 2026-02-21T09:23:31.1710709Z add.s64 %rd438, %rd757, %rd48; 2026-02-21T09:23:31.1710772Z add.s64 %rd439, %rd757, %rd47; 2026-02-21T09:23:31.1710836Z add.s64 %rd440, %rd757, %rd46; 2026-02-21T09:23:31.1710904Z add.s64 %rd441, %rd757, %rd45; 2026-02-21T09:23:31.1710967Z add.s64 %rd442, %rd757, %rd44; 2026-02-21T09:23:31.1711031Z add.s64 %rd443, %rd757, %rd43; 2026-02-21T09:23:31.1711233Z .loc 1 52 80 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:80 2026-02-21T09:23:31.1711302Z add.s64 %rd444, %rd757, %rd42; 2026-02-21T09:23:31.1711367Z shl.b32 %r6804, %r12615, 14; 2026-02-21T09:23:31.1711431Z add.s32 %r6805, %r1290, %r6804; 2026-02-21T09:23:31.1711516Z add.s32 %r6676, %r6805, %r26; 2026-02-21T09:23:31.1711584Z selp.b32 %r6677, 8, 0, %p107; 2026-02-21T09:23:31.1711646Z // begin inline asm 2026-02-21T09:23:31.1711797Z cp.async.ca.shared.global [ %r6676 + 0 ], [ %rd429 + 0 ], 0x8, %r6677; 2026-02-21T09:23:31.1711862Z // end inline asm 2026-02-21T09:23:31.1711924Z add.s32 %r6678, %r6676, 1024; 2026-02-21T09:23:31.1711986Z // begin inline asm 2026-02-21T09:23:31.1712130Z cp.async.ca.shared.global [ %r6678 + 0 ], [ %rd430 + 0 ], 0x8, %r6677; 2026-02-21T09:23:31.1712190Z // end inline asm 2026-02-21T09:23:31.1712252Z add.s32 %r6680, %r6676, 2048; 2026-02-21T09:23:31.1712318Z // begin inline asm 2026-02-21T09:23:31.1712453Z cp.async.ca.shared.global [ %r6680 + 0 ], [ %rd431 + 0 ], 0x8, %r6677; 2026-02-21T09:23:31.1712673Z // end inline asm 2026-02-21T09:23:31.1712736Z add.s32 %r6682, %r6676, 3072; 2026-02-21T09:23:31.1712802Z // begin inline asm 2026-02-21T09:23:31.1712935Z cp.async.ca.shared.global [ %r6682 + 0 ], [ %rd432 + 0 ], 0x8, %r6677; 2026-02-21T09:23:31.1712995Z // end inline asm 2026-02-21T09:23:31.1713060Z add.s32 %r6684, %r6676, 4096; 2026-02-21T09:23:31.1713123Z // begin inline asm 2026-02-21T09:23:31.1713253Z 
cp.async.ca.shared.global [ %r6684 + 0 ], [ %rd433 + 0 ], 0x8, %r6677; 2026-02-21T09:23:31.1713311Z // end inline asm 2026-02-21T09:23:31.1713379Z add.s32 %r6686, %r6676, 5120; 2026-02-21T09:23:31.1713443Z // begin inline asm 2026-02-21T09:23:31.1713625Z cp.async.ca.shared.global [ %r6686 + 0 ], [ %rd434 + 0 ], 0x8, %r6677; 2026-02-21T09:23:31.1713692Z // end inline asm 2026-02-21T09:23:31.1713755Z add.s32 %r6688, %r6676, 6144; 2026-02-21T09:23:31.1713815Z // begin inline asm 2026-02-21T09:23:31.1713949Z cp.async.ca.shared.global [ %r6688 + 0 ], [ %rd435 + 0 ], 0x8, %r6677; 2026-02-21T09:23:31.1714017Z // end inline asm 2026-02-21T09:23:31.1714079Z add.s32 %r6690, %r6676, 7168; 2026-02-21T09:23:31.1714139Z // begin inline asm 2026-02-21T09:23:31.1714327Z cp.async.ca.shared.global [ %r6690 + 0 ], [ %rd436 + 0 ], 0x8, %r6677; 2026-02-21T09:23:31.1714388Z // end inline asm 2026-02-21T09:23:31.1714449Z add.s32 %r6692, %r6676, 8192; 2026-02-21T09:23:31.1714513Z // begin inline asm 2026-02-21T09:23:31.1714644Z cp.async.ca.shared.global [ %r6692 + 0 ], [ %rd437 + 0 ], 0x8, %r6677; 2026-02-21T09:23:31.1714702Z // end inline asm 2026-02-21T09:23:31.1714764Z add.s32 %r6694, %r6676, 9216; 2026-02-21T09:23:31.1714827Z // begin inline asm 2026-02-21T09:23:31.1714958Z cp.async.ca.shared.global [ %r6694 + 0 ], [ %rd438 + 0 ], 0x8, %r6677; 2026-02-21T09:23:31.1715015Z // end inline asm 2026-02-21T09:23:31.1715084Z add.s32 %r6696, %r6676, 10240; 2026-02-21T09:23:31.1715156Z // begin inline asm 2026-02-21T09:23:31.1715300Z cp.async.ca.shared.global [ %r6696 + 0 ], [ %rd439 + 0 ], 0x8, %r6677; 2026-02-21T09:23:31.1715364Z // end inline asm 2026-02-21T09:23:31.1715433Z add.s32 %r6698, %r6676, 11264; 2026-02-21T09:23:31.1715493Z // begin inline asm 2026-02-21T09:23:31.1715627Z cp.async.ca.shared.global [ %r6698 + 0 ], [ %rd440 + 0 ], 0x8, %r6677; 2026-02-21T09:23:31.1715689Z // end inline asm 2026-02-21T09:23:31.1715752Z add.s32 %r6700, %r6676, 12288; 2026-02-21T09:23:31.1715811Z // 
begin inline asm 2026-02-21T09:23:31.1715948Z cp.async.ca.shared.global [ %r6700 + 0 ], [ %rd441 + 0 ], 0x8, %r6677; 2026-02-21T09:23:31.1716008Z // end inline asm 2026-02-21T09:23:31.1716070Z add.s32 %r6702, %r6676, 13312; 2026-02-21T09:23:31.1716129Z // begin inline asm 2026-02-21T09:23:31.1716268Z cp.async.ca.shared.global [ %r6702 + 0 ], [ %rd442 + 0 ], 0x8, %r6677; 2026-02-21T09:23:31.1716326Z // end inline asm 2026-02-21T09:23:31.1716388Z add.s32 %r6704, %r6676, 14336; 2026-02-21T09:23:31.1716564Z // begin inline asm 2026-02-21T09:23:31.1716707Z cp.async.ca.shared.global [ %r6704 + 0 ], [ %rd443 + 0 ], 0x8, %r6677; 2026-02-21T09:23:31.1716765Z // end inline asm 2026-02-21T09:23:31.1716828Z add.s32 %r6706, %r6676, 15360; 2026-02-21T09:23:31.1716891Z // begin inline asm 2026-02-21T09:23:31.1717023Z cp.async.ca.shared.global [ %r6706 + 0 ], [ %rd444 + 0 ], 0x8, %r6677; 2026-02-21T09:23:31.1717080Z // end inline asm 2026-02-21T09:23:31.1717153Z cp.async.commit_group; 2026-02-21T09:23:31.1717360Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1717425Z shl.b32 %r6806, %r12615, 3; 2026-02-21T09:23:31.1717494Z add.s32 %r6708, %r6810, %r6806; 2026-02-21T09:23:31.1717565Z and.pred %p103, %p160, %p107; 2026-02-21T09:23:31.1717628Z // begin inline asm 2026-02-21T09:23:31.1717764Z @%p103 mbarrier.arrive.expect_tx.shared.b64 _, [%r6708], 4096; 2026-02-21T09:23:31.1717828Z // end inline asm 2026-02-21T09:23:31.1718033Z .loc 1 58 33 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:58:33 2026-02-21T09:23:31.1718255Z shl.b32 %r6807, %r12615, 12; 2026-02-21T09:23:31.1718324Z add.s32 %r6709, %r1378, %r6807; 2026-02-21T09:23:31.1718382Z bar.sync 0; 2026-02-21T09:23:31.1718454Z elect.sync %r6808|%p110, -1; 2026-02-21T09:23:31.1718521Z and.pred %p111, %p107, %p110; 2026-02-21T09:23:31.1718608Z and.pred %p104, %p1, %p111; 2026-02-21T09:23:31.1718674Z cvt.u32.u64 %r6809, %rd758; 2026-02-21T09:23:31.1718736Z add.s32 %r6711, 
%r6809, 64; 2026-02-21T09:23:31.1718801Z // begin inline asm 2026-02-21T09:23:31.1719141Z @%p104 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r6709], [%rd463, {%r4081, %r6711}], [%r6708]; 2026-02-21T09:23:31.1719201Z // end inline asm 2026-02-21T09:23:31.1719477Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1719547Z add.s64 %rd757, %rd757, 128; 2026-02-21T09:23:31.1719621Z setp.lt.u64 %p112, %rd758, 480; 2026-02-21T09:23:31.1719684Z add.s64 %rd758, %rd758, 32; 2026-02-21T09:23:31.1719751Z @%p112 bra $L__BB0_5; 2026-02-21T09:23:31.1719865Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:23:31.1719994Z cp.async.wait_group 0; 2026-02-21T09:23:31.1720059Z bar.sync 0; 2026-02-21T09:23:31.1720120Z // begin inline asm 2026-02-21T09:23:31.1720220Z @%p160 mbarrier.inval.shared::cta.b64 [%r6810]; 2026-02-21T09:23:31.1720291Z // end inline asm 2026-02-21T09:23:31.1720348Z bar.sync 0; 2026-02-21T09:23:31.1720408Z // begin inline asm 2026-02-21T09:23:31.1720501Z @%p160 mbarrier.inval.shared::cta.b64 [%r6811]; 2026-02-21T09:23:31.1720567Z // end inline asm 2026-02-21T09:23:31.1720771Z .loc 1 91 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:91:28 2026-02-21T09:23:31.1720859Z cvt.rn.bf16x2.f32 %r6896, %r12617, %r12616; 2026-02-21T09:23:31.1720946Z cvt.rn.bf16x2.f32 %r6897, %r12619, %r12618; 2026-02-21T09:23:31.1721029Z cvt.rn.bf16x2.f32 %r6898, %r12621, %r12620; 2026-02-21T09:23:31.1721105Z cvt.rn.bf16x2.f32 %r6899, %r12623, %r12622; 2026-02-21T09:23:31.1721187Z cvt.rn.bf16x2.f32 %r6900, %r12625, %r12624; 2026-02-21T09:23:31.1721265Z cvt.rn.bf16x2.f32 %r6901, %r12627, %r12626; 2026-02-21T09:23:31.1721342Z cvt.rn.bf16x2.f32 %r6902, %r12629, %r12628; 2026-02-21T09:23:31.1721419Z cvt.rn.bf16x2.f32 %r6903, %r12631, %r12630; 2026-02-21T09:23:31.1721503Z cvt.rn.bf16x2.f32 %r6904, %r12633, %r12632; 2026-02-21T09:23:31.1721579Z cvt.rn.bf16x2.f32 %r6905, %r12635, %r12634; 
2026-02-21T09:23:31.1721655Z cvt.rn.bf16x2.f32 %r6906, %r12637, %r12636; 2026-02-21T09:23:31.1721736Z cvt.rn.bf16x2.f32 %r6907, %r12639, %r12638; 2026-02-21T09:23:31.1721811Z cvt.rn.bf16x2.f32 %r6908, %r12641, %r12640; 2026-02-21T09:23:31.1721888Z cvt.rn.bf16x2.f32 %r6909, %r12643, %r12642; 2026-02-21T09:23:31.1721964Z cvt.rn.bf16x2.f32 %r6910, %r12645, %r12644; 2026-02-21T09:23:31.1722044Z cvt.rn.bf16x2.f32 %r6911, %r12647, %r12646; 2026-02-21T09:23:31.1722124Z cvt.rn.bf16x2.f32 %r6912, %r12649, %r12648; 2026-02-21T09:23:31.1722199Z cvt.rn.bf16x2.f32 %r6913, %r12651, %r12650; 2026-02-21T09:23:31.1722281Z cvt.rn.bf16x2.f32 %r6914, %r12653, %r12652; 2026-02-21T09:23:31.1722357Z cvt.rn.bf16x2.f32 %r6915, %r12655, %r12654; 2026-02-21T09:23:31.1722437Z cvt.rn.bf16x2.f32 %r6916, %r12657, %r12656; 2026-02-21T09:23:31.1722517Z cvt.rn.bf16x2.f32 %r6917, %r12659, %r12658; 2026-02-21T09:23:31.1722593Z cvt.rn.bf16x2.f32 %r6918, %r12661, %r12660; 2026-02-21T09:23:31.1722682Z cvt.rn.bf16x2.f32 %r6919, %r12663, %r12662; 2026-02-21T09:23:31.1722762Z cvt.rn.bf16x2.f32 %r6920, %r12665, %r12664; 2026-02-21T09:23:31.1722845Z cvt.rn.bf16x2.f32 %r6921, %r12667, %r12666; 2026-02-21T09:23:31.1722922Z cvt.rn.bf16x2.f32 %r6922, %r12669, %r12668; 2026-02-21T09:23:31.1722999Z cvt.rn.bf16x2.f32 %r6923, %r12671, %r12670; 2026-02-21T09:23:31.1723083Z cvt.rn.bf16x2.f32 %r6924, %r12673, %r12672; 2026-02-21T09:23:31.1723266Z cvt.rn.bf16x2.f32 %r6925, %r12675, %r12674; 2026-02-21T09:23:31.1723344Z cvt.rn.bf16x2.f32 %r6926, %r12677, %r12676; 2026-02-21T09:23:31.1723427Z cvt.rn.bf16x2.f32 %r6927, %r12679, %r12678; 2026-02-21T09:23:31.1723503Z cvt.rn.bf16x2.f32 %r6928, %r12681, %r12680; 2026-02-21T09:23:31.1723579Z cvt.rn.bf16x2.f32 %r6929, %r12683, %r12682; 2026-02-21T09:23:31.1723655Z cvt.rn.bf16x2.f32 %r6930, %r12685, %r12684; 2026-02-21T09:23:31.1723734Z cvt.rn.bf16x2.f32 %r6931, %r12687, %r12686; 2026-02-21T09:23:31.1723809Z cvt.rn.bf16x2.f32 %r6932, %r12689, %r12688; 2026-02-21T09:23:31.1723882Z 
cvt.rn.bf16x2.f32 %r6933, %r12691, %r12690; 2026-02-21T09:23:31.1723964Z cvt.rn.bf16x2.f32 %r6934, %r12693, %r12692; 2026-02-21T09:23:31.1724039Z cvt.rn.bf16x2.f32 %r6935, %r12695, %r12694; 2026-02-21T09:23:31.1724169Z cvt.rn.bf16x2.f32 %r6936, %r12697, %r12696; 2026-02-21T09:23:31.1724252Z cvt.rn.bf16x2.f32 %r6937, %r12699, %r12698; 2026-02-21T09:23:31.1724329Z cvt.rn.bf16x2.f32 %r6938, %r12701, %r12700; 2026-02-21T09:23:31.1724408Z cvt.rn.bf16x2.f32 %r6939, %r12703, %r12702; 2026-02-21T09:23:31.1724484Z cvt.rn.bf16x2.f32 %r6940, %r12705, %r12704; 2026-02-21T09:23:31.1724564Z cvt.rn.bf16x2.f32 %r6941, %r12707, %r12706; 2026-02-21T09:23:31.1724716Z cvt.rn.bf16x2.f32 %r6942, %r12709, %r12708; 2026-02-21T09:23:31.1724798Z cvt.rn.bf16x2.f32 %r6943, %r12711, %r12710; 2026-02-21T09:23:31.1724879Z cvt.rn.bf16x2.f32 %r6944, %r12713, %r12712; 2026-02-21T09:23:31.1724956Z cvt.rn.bf16x2.f32 %r6945, %r12715, %r12714; 2026-02-21T09:23:31.1725033Z cvt.rn.bf16x2.f32 %r6946, %r12717, %r12716; 2026-02-21T09:23:31.1725112Z cvt.rn.bf16x2.f32 %r6947, %r12719, %r12718; 2026-02-21T09:23:31.1725188Z cvt.rn.bf16x2.f32 %r6948, %r12721, %r12720; 2026-02-21T09:23:31.1725263Z cvt.rn.bf16x2.f32 %r6949, %r12723, %r12722; 2026-02-21T09:23:31.1725339Z cvt.rn.bf16x2.f32 %r6950, %r12725, %r12724; 2026-02-21T09:23:31.1725422Z cvt.rn.bf16x2.f32 %r6951, %r12727, %r12726; 2026-02-21T09:23:31.1725497Z cvt.rn.bf16x2.f32 %r6952, %r12729, %r12728; 2026-02-21T09:23:31.1725575Z cvt.rn.bf16x2.f32 %r6953, %r12731, %r12730; 2026-02-21T09:23:31.1725654Z cvt.rn.bf16x2.f32 %r6954, %r12733, %r12732; 2026-02-21T09:23:31.1725730Z cvt.rn.bf16x2.f32 %r6955, %r12735, %r12734; 2026-02-21T09:23:31.1725809Z cvt.rn.bf16x2.f32 %r6956, %r12737, %r12736; 2026-02-21T09:23:31.1725889Z cvt.rn.bf16x2.f32 %r6957, %r12739, %r12738; 2026-02-21T09:23:31.1725963Z cvt.rn.bf16x2.f32 %r6958, %r12741, %r12740; 2026-02-21T09:23:31.1726040Z cvt.rn.bf16x2.f32 %r6959, %r12743, %r12742; 2026-02-21T09:23:31.1726249Z .loc 1 92 43 // 
cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:92:43 2026-02-21T09:23:31.1726442Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r75], {%r6896, %r6897, %r6898, %r6899}; 2026-02-21T09:23:31.1726737Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r76], {%r6912, %r6913, %r6914, %r6915}; 2026-02-21T09:23:31.1726914Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r77], {%r6928, %r6929, %r6930, %r6931}; 2026-02-21T09:23:31.1727100Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r78], {%r6944, %r6945, %r6946, %r6947}; 2026-02-21T09:23:31.1727275Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r79], {%r6900, %r6901, %r6902, %r6903}; 2026-02-21T09:23:31.1727450Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r80], {%r6916, %r6917, %r6918, %r6919}; 2026-02-21T09:23:31.1727629Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r81], {%r6932, %r6933, %r6934, %r6935}; 2026-02-21T09:23:31.1727804Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r82], {%r6948, %r6949, %r6950, %r6951}; 2026-02-21T09:23:31.1727978Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r83], {%r6904, %r6905, %r6906, %r6907}; 2026-02-21T09:23:31.1728157Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r84], {%r6920, %r6921, %r6922, %r6923}; 2026-02-21T09:23:31.1728334Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r85], {%r6936, %r6937, %r6938, %r6939}; 2026-02-21T09:23:31.1728509Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r86], {%r6952, %r6953, %r6954, %r6955}; 2026-02-21T09:23:31.1728865Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r87], {%r6908, %r6909, %r6910, %r6911}; 2026-02-21T09:23:31.1729046Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r88], {%r6924, %r6925, %r6926, %r6927}; 2026-02-21T09:23:31.1729221Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r89], {%r6940, %r6941, %r6942, %r6943}; 2026-02-21T09:23:31.1729394Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r90], {%r6956, %r6957, %r6958, %r6959}; 2026-02-21T09:23:31.1729463Z // begin inline asm 2026-02-21T09:23:31.1729544Z fence.proxy.async.shared::cta; 2026-02-21T09:23:31.1729603Z // end 
inline asm 2026-02-21T09:23:31.1729665Z bar.sync 0; 2026-02-21T09:23:31.1729739Z elect.sync %r6960|%p124, -1; 2026-02-21T09:23:31.1729808Z and.pred %p115, %p201, %p124; 2026-02-21T09:23:31.1729943Z or.b32 %r6812, %r469, %r4081; 2026-02-21T09:23:31.1730009Z // begin inline asm 2026-02-21T09:23:31.1730244Z @%p115 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd446, {%r6812, %r6813}], [%r6814]; 2026-02-21T09:23:31.1730310Z // end inline asm 2026-02-21T09:23:31.1730389Z cp.async.bulk.commit_group; 2026-02-21T09:23:31.1730467Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:23:31.1730524Z bar.sync 0; 2026-02-21T09:23:31.1730794Z .loc 1 31 88 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:31:88 2026-02-21T09:23:31.1730860Z add.s32 %r6961, %r12481, 2; 2026-02-21T09:23:31.1731060Z .loc 1 35 31 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:35:31 2026-02-21T09:23:31.1731129Z shr.s32 %r6962, %r6961, 31; 2026-02-21T09:23:31.1731191Z shr.u32 %r6963, %r6962, 25; 2026-02-21T09:23:31.1731255Z add.s32 %r6964, %r6961, %r6963; 2026-02-21T09:23:31.1731454Z .loc 1 34 30 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:34:30 2026-02-21T09:23:31.1731524Z and.b32 %r6851, %r6964, -128; 2026-02-21T09:23:31.1731588Z sub.s32 %r6965, %r6961, %r6851; 2026-02-21T09:23:31.1731786Z .loc 1 36 27 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:36:27 2026-02-21T09:23:31.1731858Z shl.b32 %r9580, %r6965, 7; 2026-02-21T09:23:31.1732053Z .loc 1 37 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:37:32 2026-02-21T09:23:31.1732117Z or.b32 %r6966, %r9580, %r7; 2026-02-21T09:23:31.1732185Z or.b32 %r6967, %r9580, %r8; 2026-02-21T09:23:31.1732248Z or.b32 %r6968, %r9580, %r9; 2026-02-21T09:23:31.1732311Z or.b32 %r6969, %r9580, %r10; 2026-02-21T09:23:31.1732373Z or.b32 %r6970, %r9580, %r11; 2026-02-21T09:23:31.1732453Z or.b32 %r6971, %r9580, %r12; 2026-02-21T09:23:31.1732521Z or.b32 %r6972, %r9580, %r13; 2026-02-21T09:23:31.1732583Z or.b32 
%r6973, %r9580, %r14; 2026-02-21T09:23:31.1732651Z or.b32 %r6974, %r9580, %r15; 2026-02-21T09:23:31.1732713Z or.b32 %r6975, %r9580, %r16; 2026-02-21T09:23:31.1732775Z or.b32 %r6976, %r9580, %r17; 2026-02-21T09:23:31.1732836Z or.b32 %r6977, %r9580, %r18; 2026-02-21T09:23:31.1732902Z or.b32 %r6978, %r9580, %r19; 2026-02-21T09:23:31.1732967Z or.b32 %r6979, %r9580, %r20; 2026-02-21T09:23:31.1733029Z or.b32 %r6980, %r9580, %r21; 2026-02-21T09:23:31.1733096Z or.b32 %r6981, %r9580, %r22; 2026-02-21T09:23:31.1733298Z .loc 1 52 53 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:53 2026-02-21T09:23:31.1733362Z shl.b32 %r6982, %r6966, 10; 2026-02-21T09:23:31.1733428Z shl.b32 %r6983, %r6967, 10; 2026-02-21T09:23:31.1733489Z shl.b32 %r6984, %r6968, 10; 2026-02-21T09:23:31.1733550Z shl.b32 %r6985, %r6969, 10; 2026-02-21T09:23:31.1733612Z shl.b32 %r6986, %r6970, 10; 2026-02-21T09:23:31.1733677Z shl.b32 %r6987, %r6971, 10; 2026-02-21T09:23:31.1733737Z shl.b32 %r6988, %r6972, 10; 2026-02-21T09:23:31.1733798Z shl.b32 %r6989, %r6973, 10; 2026-02-21T09:23:31.1733864Z shl.b32 %r6990, %r6974, 10; 2026-02-21T09:23:31.1733926Z shl.b32 %r6991, %r6975, 10; 2026-02-21T09:23:31.1733986Z shl.b32 %r6992, %r6976, 10; 2026-02-21T09:23:31.1734045Z shl.b32 %r6993, %r6977, 10; 2026-02-21T09:23:31.1734227Z shl.b32 %r6994, %r6978, 10; 2026-02-21T09:23:31.1734288Z shl.b32 %r6995, %r6979, 10; 2026-02-21T09:23:31.1734351Z shl.b32 %r6996, %r6980, 10; 2026-02-21T09:23:31.1734416Z shl.b32 %r6997, %r6981, 10; 2026-02-21T09:23:31.1734616Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1734679Z // begin inline asm 2026-02-21T09:23:31.1734784Z @%p160 mbarrier.init.shared::cta.b64 [%r6810], 1; 2026-02-21T09:23:31.1734842Z // end inline asm 2026-02-21T09:23:31.1734898Z bar.sync 0; 2026-02-21T09:23:31.1734970Z // begin inline asm 2026-02-21T09:23:31.1735072Z @%p160 mbarrier.init.shared::cta.b64 [%r6811], 1; 2026-02-21T09:23:31.1735131Z // end inline 
asm 2026-02-21T09:23:31.1735386Z .loc 1 52 60 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:60 2026-02-21T09:23:31.1735459Z or.b32 %r6998, %r6982, %r24; 2026-02-21T09:23:31.1735521Z or.b32 %r6999, %r6983, %r24; 2026-02-21T09:23:31.1735585Z or.b32 %r7000, %r6984, %r24; 2026-02-21T09:23:31.1735647Z or.b32 %r7001, %r6985, %r24; 2026-02-21T09:23:31.1735712Z or.b32 %r7002, %r6986, %r24; 2026-02-21T09:23:31.1735771Z or.b32 %r7003, %r6987, %r24; 2026-02-21T09:23:31.1735879Z or.b32 %r7004, %r6988, %r24; 2026-02-21T09:23:31.1735948Z or.b32 %r7005, %r6989, %r24; 2026-02-21T09:23:31.1736008Z or.b32 %r7006, %r6990, %r24; 2026-02-21T09:23:31.1736069Z or.b32 %r7007, %r6991, %r24; 2026-02-21T09:23:31.1736132Z or.b32 %r7008, %r6992, %r24; 2026-02-21T09:23:31.1736192Z or.b32 %r7009, %r6993, %r24; 2026-02-21T09:23:31.1736253Z or.b32 %r7010, %r6994, %r24; 2026-02-21T09:23:31.1736313Z or.b32 %r7011, %r6995, %r24; 2026-02-21T09:23:31.1736379Z or.b32 %r7012, %r6996, %r24; 2026-02-21T09:23:31.1736440Z or.b32 %r7013, %r6997, %r24; 2026-02-21T09:23:31.1736749Z .loc 1 52 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:32 2026-02-21T09:23:31.1736830Z mad.wide.s32 %rd447, %r6998, 2, %rd102; 2026-02-21T09:23:31.1736921Z mad.wide.s32 %rd448, %r6999, 2, %rd102; 2026-02-21T09:23:31.1736994Z mad.wide.s32 %rd449, %r7000, 2, %rd102; 2026-02-21T09:23:31.1737065Z mad.wide.s32 %rd450, %r7001, 2, %rd102; 2026-02-21T09:23:31.1737142Z mad.wide.s32 %rd451, %r7002, 2, %rd102; 2026-02-21T09:23:31.1737212Z mad.wide.s32 %rd452, %r7003, 2, %rd102; 2026-02-21T09:23:31.1737280Z mad.wide.s32 %rd453, %r7004, 2, %rd102; 2026-02-21T09:23:31.1737356Z mad.wide.s32 %rd454, %r7005, 2, %rd102; 2026-02-21T09:23:31.1737424Z mad.wide.s32 %rd455, %r7006, 2, %rd102; 2026-02-21T09:23:31.1737493Z mad.wide.s32 %rd456, %r7007, 2, %rd102; 2026-02-21T09:23:31.1737566Z mad.wide.s32 %rd457, %r7008, 2, %rd102; 2026-02-21T09:23:31.1737637Z mad.wide.s32 %rd458, %r7009, 2, %rd102; 
2026-02-21T09:23:31.1737706Z mad.wide.s32 %rd459, %r7010, 2, %rd102; 2026-02-21T09:23:31.1737775Z mad.wide.s32 %rd460, %r7011, 2, %rd102; 2026-02-21T09:23:31.1737849Z mad.wide.s32 %rd461, %r7012, 2, %rd102; 2026-02-21T09:23:31.1737921Z mad.wide.s32 %rd462, %r7013, 2, %rd102; 2026-02-21T09:23:31.1737981Z mov.b32 %r6818, 8; 2026-02-21T09:23:31.1738186Z .loc 1 52 80 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:80 2026-02-21T09:23:31.1738249Z // begin inline asm 2026-02-21T09:23:31.1738391Z cp.async.ca.shared.global [ %r27 + 0 ], [ %rd447 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1738454Z // end inline asm 2026-02-21T09:23:31.1738514Z // begin inline asm 2026-02-21T09:23:31.1738648Z cp.async.ca.shared.global [ %r28 + 0 ], [ %rd448 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1738705Z // end inline asm 2026-02-21T09:23:31.1738771Z // begin inline asm 2026-02-21T09:23:31.1738907Z cp.async.ca.shared.global [ %r29 + 0 ], [ %rd449 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1738966Z // end inline asm 2026-02-21T09:23:31.1739032Z // begin inline asm 2026-02-21T09:23:31.1739161Z cp.async.ca.shared.global [ %r30 + 0 ], [ %rd450 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1739311Z // end inline asm 2026-02-21T09:23:31.1739432Z // begin inline asm 2026-02-21T09:23:31.1739564Z cp.async.ca.shared.global [ %r31 + 0 ], [ %rd451 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1739635Z // end inline asm 2026-02-21T09:23:31.1739698Z // begin inline asm 2026-02-21T09:23:31.1739836Z cp.async.ca.shared.global [ %r32 + 0 ], [ %rd452 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1739894Z // end inline asm 2026-02-21T09:23:31.1739954Z // begin inline asm 2026-02-21T09:23:31.1740090Z cp.async.ca.shared.global [ %r33 + 0 ], [ %rd453 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1740148Z // end inline asm 2026-02-21T09:23:31.1740211Z // begin inline asm 2026-02-21T09:23:31.1740340Z cp.async.ca.shared.global [ %r34 + 0 ], [ %rd454 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1740403Z // end inline asm 
2026-02-21T09:23:31.1740535Z // begin inline asm 2026-02-21T09:23:31.1740669Z cp.async.ca.shared.global [ %r35 + 0 ], [ %rd455 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1740732Z // end inline asm 2026-02-21T09:23:31.1740794Z // begin inline asm 2026-02-21T09:23:31.1740924Z cp.async.ca.shared.global [ %r36 + 0 ], [ %rd456 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1740984Z // end inline asm 2026-02-21T09:23:31.1741050Z // begin inline asm 2026-02-21T09:23:31.1741235Z cp.async.ca.shared.global [ %r37 + 0 ], [ %rd457 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1741295Z // end inline asm 2026-02-21T09:23:31.1741360Z // begin inline asm 2026-02-21T09:23:31.1741487Z cp.async.ca.shared.global [ %r38 + 0 ], [ %rd458 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1741545Z // end inline asm 2026-02-21T09:23:31.1741606Z // begin inline asm 2026-02-21T09:23:31.1741739Z cp.async.ca.shared.global [ %r39 + 0 ], [ %rd459 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1741797Z // end inline asm 2026-02-21T09:23:31.1741861Z // begin inline asm 2026-02-21T09:23:31.1741995Z cp.async.ca.shared.global [ %r40 + 0 ], [ %rd460 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1742066Z // end inline asm 2026-02-21T09:23:31.1742127Z // begin inline asm 2026-02-21T09:23:31.1742266Z cp.async.ca.shared.global [ %r41 + 0 ], [ %rd461 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1742324Z // end inline asm 2026-02-21T09:23:31.1742384Z // begin inline asm 2026-02-21T09:23:31.1742514Z cp.async.ca.shared.global [ %r42 + 0 ], [ %rd462 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1742580Z // end inline asm 2026-02-21T09:23:31.1742649Z cp.async.commit_group; 2026-02-21T09:23:31.1742853Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1742917Z bar.sync 0; 2026-02-21T09:23:31.1742976Z // begin inline asm 2026-02-21T09:23:31.1743112Z @%p160 mbarrier.arrive.expect_tx.shared.b64 _, [%r6810], 4096; 2026-02-21T09:23:31.1743177Z // end inline asm 2026-02-21T09:23:31.1743377Z .loc 1 58 33 // 
cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:58:33 2026-02-21T09:23:31.1743436Z bar.sync 0; 2026-02-21T09:23:31.1743505Z elect.sync %r7014|%p125, -1; 2026-02-21T09:23:31.1743578Z and.pred %p119, %p1, %p125; 2026-02-21T09:23:31.1743641Z mov.b32 %r6852, 0; 2026-02-21T09:23:31.1743700Z // begin inline asm 2026-02-21T09:23:31.1744033Z @%p119 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1378], [%rd463, {%r6851, %r6852}], [%r6810]; 2026-02-21T09:23:31.1744093Z // end inline asm 2026-02-21T09:23:31.1744294Z .loc 1 52 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:32 2026-02-21T09:23:31.1744366Z cvt.s64.s32 %rd482, %r6982; 2026-02-21T09:23:31.1744432Z or.b64 %rd483, %rd482, %rd754; 2026-02-21T09:23:31.1744496Z shl.b64 %rd484, %rd483, 1; 2026-02-21T09:23:31.1744562Z add.s64 %rd485, %rd102, %rd484; 2026-02-21T09:23:31.1744646Z add.s64 %rd464, %rd485, 128; 2026-02-21T09:23:31.1744712Z cvt.s64.s32 %rd486, %r6983; 2026-02-21T09:23:31.1744779Z or.b64 %rd487, %rd486, %rd754; 2026-02-21T09:23:31.1744850Z shl.b64 %rd488, %rd487, 1; 2026-02-21T09:23:31.1744915Z add.s64 %rd489, %rd102, %rd488; 2026-02-21T09:23:31.1745089Z add.s64 %rd465, %rd489, 128; 2026-02-21T09:23:31.1745153Z cvt.s64.s32 %rd490, %r6984; 2026-02-21T09:23:31.1745221Z or.b64 %rd491, %rd490, %rd754; 2026-02-21T09:23:31.1745284Z shl.b64 %rd492, %rd491, 1; 2026-02-21T09:23:31.1745347Z add.s64 %rd493, %rd102, %rd492; 2026-02-21T09:23:31.1745415Z add.s64 %rd466, %rd493, 128; 2026-02-21T09:23:31.1745478Z cvt.s64.s32 %rd494, %r6985; 2026-02-21T09:23:31.1745541Z or.b64 %rd495, %rd494, %rd754; 2026-02-21T09:23:31.1745603Z shl.b64 %rd496, %rd495, 1; 2026-02-21T09:23:31.1745671Z add.s64 %rd497, %rd102, %rd496; 2026-02-21T09:23:31.1745732Z add.s64 %rd467, %rd497, 128; 2026-02-21T09:23:31.1745798Z cvt.s64.s32 %rd498, %r6986; 2026-02-21T09:23:31.1745867Z or.b64 %rd499, %rd498, %rd754; 2026-02-21T09:23:31.1745931Z shl.b64 %rd500, %rd499, 1; 2026-02-21T09:23:31.1746067Z 
add.s64 %rd501, %rd102, %rd500; 2026-02-21T09:23:31.1746139Z add.s64 %rd468, %rd501, 128; 2026-02-21T09:23:31.1746203Z cvt.s64.s32 %rd502, %r6987; 2026-02-21T09:23:31.1746271Z or.b64 %rd503, %rd502, %rd754; 2026-02-21T09:23:31.1746332Z shl.b64 %rd504, %rd503, 1; 2026-02-21T09:23:31.1746402Z add.s64 %rd505, %rd102, %rd504; 2026-02-21T09:23:31.1746584Z add.s64 %rd469, %rd505, 128; 2026-02-21T09:23:31.1746653Z cvt.s64.s32 %rd506, %r6988; 2026-02-21T09:23:31.1746803Z or.b64 %rd507, %rd506, %rd754; 2026-02-21T09:23:31.1746868Z shl.b64 %rd508, %rd507, 1; 2026-02-21T09:23:31.1746933Z add.s64 %rd509, %rd102, %rd508; 2026-02-21T09:23:31.1746996Z add.s64 %rd470, %rd509, 128; 2026-02-21T09:23:31.1747065Z cvt.s64.s32 %rd510, %r6989; 2026-02-21T09:23:31.1747128Z or.b64 %rd511, %rd510, %rd754; 2026-02-21T09:23:31.1747190Z shl.b64 %rd512, %rd511, 1; 2026-02-21T09:23:31.1747260Z add.s64 %rd513, %rd102, %rd512; 2026-02-21T09:23:31.1747322Z add.s64 %rd471, %rd513, 128; 2026-02-21T09:23:31.1747386Z cvt.s64.s32 %rd514, %r6990; 2026-02-21T09:23:31.1747450Z or.b64 %rd515, %rd514, %rd754; 2026-02-21T09:23:31.1747518Z shl.b64 %rd516, %rd515, 1; 2026-02-21T09:23:31.1747585Z add.s64 %rd517, %rd102, %rd516; 2026-02-21T09:23:31.1747648Z add.s64 %rd472, %rd517, 128; 2026-02-21T09:23:31.1747717Z cvt.s64.s32 %rd518, %r6991; 2026-02-21T09:23:31.1747780Z or.b64 %rd519, %rd518, %rd754; 2026-02-21T09:23:31.1747842Z shl.b64 %rd520, %rd519, 1; 2026-02-21T09:23:31.1747907Z add.s64 %rd521, %rd102, %rd520; 2026-02-21T09:23:31.1747978Z add.s64 %rd473, %rd521, 128; 2026-02-21T09:23:31.1748041Z cvt.s64.s32 %rd522, %r6992; 2026-02-21T09:23:31.1748104Z or.b64 %rd523, %rd522, %rd754; 2026-02-21T09:23:31.1748170Z shl.b64 %rd524, %rd523, 1; 2026-02-21T09:23:31.1748294Z add.s64 %rd525, %rd102, %rd524; 2026-02-21T09:23:31.1748361Z add.s64 %rd474, %rd525, 128; 2026-02-21T09:23:31.1748429Z cvt.s64.s32 %rd526, %r6993; 2026-02-21T09:23:31.1748496Z or.b64 %rd527, %rd526, %rd754; 2026-02-21T09:23:31.1748561Z shl.b64 
%rd528, %rd527, 1; 2026-02-21T09:23:31.1748628Z add.s64 %rd529, %rd102, %rd528; 2026-02-21T09:23:31.1748697Z add.s64 %rd475, %rd529, 128; 2026-02-21T09:23:31.1748767Z cvt.s64.s32 %rd530, %r6994; 2026-02-21T09:23:31.1748831Z or.b64 %rd531, %rd530, %rd754; 2026-02-21T09:23:31.1748899Z shl.b64 %rd532, %rd531, 1; 2026-02-21T09:23:31.1748963Z add.s64 %rd533, %rd102, %rd532; 2026-02-21T09:23:31.1749027Z add.s64 %rd476, %rd533, 128; 2026-02-21T09:23:31.1749095Z cvt.s64.s32 %rd534, %r6995; 2026-02-21T09:23:31.1749173Z or.b64 %rd535, %rd534, %rd754; 2026-02-21T09:23:31.1749240Z shl.b64 %rd536, %rd535, 1; 2026-02-21T09:23:31.1749306Z add.s64 %rd537, %rd102, %rd536; 2026-02-21T09:23:31.1749373Z add.s64 %rd477, %rd537, 128; 2026-02-21T09:23:31.1749436Z cvt.s64.s32 %rd538, %r6996; 2026-02-21T09:23:31.1749499Z or.b64 %rd539, %rd538, %rd754; 2026-02-21T09:23:31.1749562Z shl.b64 %rd540, %rd539, 1; 2026-02-21T09:23:31.1749630Z add.s64 %rd541, %rd102, %rd540; 2026-02-21T09:23:31.1749696Z add.s64 %rd478, %rd541, 128; 2026-02-21T09:23:31.1749757Z cvt.s64.s32 %rd542, %r6997; 2026-02-21T09:23:31.1749831Z or.b64 %rd543, %rd542, %rd754; 2026-02-21T09:23:31.1750031Z shl.b64 %rd544, %rd543, 1; 2026-02-21T09:23:31.1750094Z add.s64 %rd545, %rd102, %rd544; 2026-02-21T09:23:31.1750162Z add.s64 %rd479, %rd545, 128; 2026-02-21T09:23:31.1750368Z .loc 1 52 80 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:80 2026-02-21T09:23:31.1750431Z // begin inline asm 2026-02-21T09:23:31.1750567Z cp.async.ca.shared.global [ %r43 + 0 ], [ %rd464 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1750635Z // end inline asm 2026-02-21T09:23:31.1750708Z // begin inline asm 2026-02-21T09:23:31.1750850Z cp.async.ca.shared.global [ %r44 + 0 ], [ %rd465 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1750912Z // end inline asm 2026-02-21T09:23:31.1750972Z // begin inline asm 2026-02-21T09:23:31.1751170Z cp.async.ca.shared.global [ %r45 + 0 ], [ %rd466 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1751231Z // end inline asm 
2026-02-21T09:23:31.1751297Z // begin inline asm 2026-02-21T09:23:31.1751427Z cp.async.ca.shared.global [ %r46 + 0 ], [ %rd467 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1751490Z // end inline asm 2026-02-21T09:23:31.1751555Z // begin inline asm 2026-02-21T09:23:31.1751683Z cp.async.ca.shared.global [ %r47 + 0 ], [ %rd468 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1751740Z // end inline asm 2026-02-21T09:23:31.1751856Z // begin inline asm 2026-02-21T09:23:31.1751987Z cp.async.ca.shared.global [ %r48 + 0 ], [ %rd469 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1752046Z // end inline asm 2026-02-21T09:23:31.1752107Z // begin inline asm 2026-02-21T09:23:31.1752243Z cp.async.ca.shared.global [ %r49 + 0 ], [ %rd470 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1752302Z // end inline asm 2026-02-21T09:23:31.1752367Z // begin inline asm 2026-02-21T09:23:31.1752500Z cp.async.ca.shared.global [ %r50 + 0 ], [ %rd471 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1752559Z // end inline asm 2026-02-21T09:23:31.1752620Z // begin inline asm 2026-02-21T09:23:31.1752748Z cp.async.ca.shared.global [ %r51 + 0 ], [ %rd472 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1752815Z // end inline asm 2026-02-21T09:23:31.1752876Z // begin inline asm 2026-02-21T09:23:31.1753004Z cp.async.ca.shared.global [ %r52 + 0 ], [ %rd473 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1753066Z // end inline asm 2026-02-21T09:23:31.1753127Z // begin inline asm 2026-02-21T09:23:31.1753257Z cp.async.ca.shared.global [ %r53 + 0 ], [ %rd474 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1753320Z // end inline asm 2026-02-21T09:23:31.1753384Z // begin inline asm 2026-02-21T09:23:31.1753513Z cp.async.ca.shared.global [ %r54 + 0 ], [ %rd475 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1753572Z // end inline asm 2026-02-21T09:23:31.1753637Z // begin inline asm 2026-02-21T09:23:31.1753765Z cp.async.ca.shared.global [ %r55 + 0 ], [ %rd476 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1753824Z // end inline asm 2026-02-21T09:23:31.1753891Z // begin inline asm 
2026-02-21T09:23:31.1754022Z cp.async.ca.shared.global [ %r56 + 0 ], [ %rd477 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1754081Z // end inline asm 2026-02-21T09:23:31.1754144Z // begin inline asm 2026-02-21T09:23:31.1754280Z cp.async.ca.shared.global [ %r57 + 0 ], [ %rd478 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1754338Z // end inline asm 2026-02-21T09:23:31.1754399Z // begin inline asm 2026-02-21T09:23:31.1754533Z cp.async.ca.shared.global [ %r58 + 0 ], [ %rd479 + 0 ], 0x8, %r6818; 2026-02-21T09:23:31.1754592Z // end inline asm 2026-02-21T09:23:31.1761152Z cp.async.commit_group; 2026-02-21T09:23:31.1761434Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1761503Z bar.sync 0; 2026-02-21T09:23:31.1761568Z // begin inline asm 2026-02-21T09:23:31.1761720Z @%p160 mbarrier.arrive.expect_tx.shared.b64 _, [%r6811], 4096; 2026-02-21T09:23:31.1761784Z // end inline asm 2026-02-21T09:23:31.1762009Z .loc 1 58 33 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:58:33 2026-02-21T09:23:31.1762070Z bar.sync 0; 2026-02-21T09:23:31.1762149Z elect.sync %r7015|%p126, -1; 2026-02-21T09:23:31.1762453Z and.pred %p121, %p1, %p126; 2026-02-21T09:23:31.1762515Z mov.b32 %r6889, 32; 2026-02-21T09:23:31.1762579Z // begin inline asm 2026-02-21T09:23:31.1762955Z @%p121 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1415], [%rd463, {%r6851, %r6889}], [%r6811]; 2026-02-21T09:23:31.1763019Z // end inline asm 2026-02-21T09:23:31.1763237Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1763314Z shl.b32 %r7016, %r6961, 7; 2026-02-21T09:23:31.1763381Z or.b32 %r7017, %r22, %r7016; 2026-02-21T09:23:31.1763447Z shl.b32 %r7018, %r6964, 7; 2026-02-21T09:23:31.1763516Z and.b32 %r7019, %r7018, -16384; 2026-02-21T09:23:31.1763588Z sub.s32 %r7020, %r7017, %r7019; 2026-02-21T09:23:31.1763727Z shl.b32 %r7021, %r7020, 10; 2026-02-21T09:23:31.1763797Z mul.wide.s32 %rd546, 
%r7021, 2; 2026-02-21T09:23:31.1763872Z or.b64 %rd62, %rd546, 256; 2026-02-21T09:23:31.1763939Z or.b32 %r7022, %r21, %r7016; 2026-02-21T09:23:31.1764006Z sub.s32 %r7023, %r7022, %r7019; 2026-02-21T09:23:31.1764075Z shl.b32 %r7024, %r7023, 10; 2026-02-21T09:23:31.1764142Z mul.wide.s32 %rd547, %r7024, 2; 2026-02-21T09:23:31.1764207Z or.b64 %rd63, %rd547, 256; 2026-02-21T09:23:31.1764336Z or.b32 %r7025, %r20, %r7016; 2026-02-21T09:23:31.1764408Z sub.s32 %r7026, %r7025, %r7019; 2026-02-21T09:23:31.1764469Z shl.b32 %r7027, %r7026, 10; 2026-02-21T09:23:31.1764534Z mul.wide.s32 %rd548, %r7027, 2; 2026-02-21T09:23:31.1764606Z or.b64 %rd64, %rd548, 256; 2026-02-21T09:23:31.1764667Z or.b32 %r7028, %r19, %r7016; 2026-02-21T09:23:31.1764729Z sub.s32 %r7029, %r7028, %r7019; 2026-02-21T09:23:31.1764792Z shl.b32 %r7030, %r7029, 10; 2026-02-21T09:23:31.1764863Z mul.wide.s32 %rd549, %r7030, 2; 2026-02-21T09:23:31.1764927Z or.b64 %rd65, %rd549, 256; 2026-02-21T09:23:31.1764989Z or.b32 %r7031, %r18, %r7016; 2026-02-21T09:23:31.1765058Z sub.s32 %r7032, %r7031, %r7019; 2026-02-21T09:23:31.1765124Z shl.b32 %r7033, %r7032, 10; 2026-02-21T09:23:31.1765192Z mul.wide.s32 %rd550, %r7033, 2; 2026-02-21T09:23:31.1765254Z or.b64 %rd66, %rd550, 256; 2026-02-21T09:23:31.1765321Z or.b32 %r7034, %r17, %r7016; 2026-02-21T09:23:31.1765385Z sub.s32 %r7035, %r7034, %r7019; 2026-02-21T09:23:31.1765447Z shl.b32 %r7036, %r7035, 10; 2026-02-21T09:23:31.1765518Z mul.wide.s32 %rd551, %r7036, 2; 2026-02-21T09:23:31.1765580Z or.b64 %rd67, %rd551, 256; 2026-02-21T09:23:31.1765641Z or.b32 %r7037, %r16, %r7016; 2026-02-21T09:23:31.1765708Z sub.s32 %r7038, %r7037, %r7019; 2026-02-21T09:23:31.1765770Z shl.b32 %r7039, %r7038, 10; 2026-02-21T09:23:31.1765834Z mul.wide.s32 %rd552, %r7039, 2; 2026-02-21T09:23:31.1765895Z or.b64 %rd68, %rd552, 256; 2026-02-21T09:23:31.1765963Z or.b32 %r7040, %r15, %r7016; 2026-02-21T09:23:31.1766026Z sub.s32 %r7041, %r7040, %r7019; 2026-02-21T09:23:31.1766089Z shl.b32 %r7042, %r7041, 
10; 2026-02-21T09:23:31.1766158Z mul.wide.s32 %rd553, %r7042, 2; 2026-02-21T09:23:31.1766223Z or.b64 %rd69, %rd553, 256; 2026-02-21T09:23:31.1766283Z or.b32 %r7043, %r14, %r7016; 2026-02-21T09:23:31.1766344Z sub.s32 %r7044, %r7043, %r7019; 2026-02-21T09:23:31.1766411Z shl.b32 %r7045, %r7044, 10; 2026-02-21T09:23:31.1766618Z mul.wide.s32 %rd554, %r7045, 2; 2026-02-21T09:23:31.1766691Z or.b64 %rd70, %rd554, 256; 2026-02-21T09:23:31.1766760Z or.b32 %r7046, %r13, %r7016; 2026-02-21T09:23:31.1766821Z sub.s32 %r7047, %r7046, %r7019; 2026-02-21T09:23:31.1766880Z shl.b32 %r7048, %r7047, 10; 2026-02-21T09:23:31.1766945Z mul.wide.s32 %rd555, %r7048, 2; 2026-02-21T09:23:31.1767011Z or.b64 %rd71, %rd555, 256; 2026-02-21T09:23:31.1767071Z or.b32 %r7049, %r12, %r7016; 2026-02-21T09:23:31.1767133Z sub.s32 %r7050, %r7049, %r7019; 2026-02-21T09:23:31.1767201Z shl.b32 %r7051, %r7050, 10; 2026-02-21T09:23:31.1767267Z mul.wide.s32 %rd556, %r7051, 2; 2026-02-21T09:23:31.1767329Z or.b64 %rd72, %rd556, 256; 2026-02-21T09:23:31.1767396Z or.b32 %r7052, %r11, %r7016; 2026-02-21T09:23:31.1767619Z sub.s32 %r7053, %r7052, %r7019; 2026-02-21T09:23:31.1767682Z shl.b32 %r7054, %r7053, 10; 2026-02-21T09:23:31.1767745Z mul.wide.s32 %rd557, %r7054, 2; 2026-02-21T09:23:31.1767812Z or.b64 %rd73, %rd557, 256; 2026-02-21T09:23:31.1767873Z or.b32 %r7055, %r10, %r7016; 2026-02-21T09:23:31.1767938Z sub.s32 %r7056, %r7055, %r7019; 2026-02-21T09:23:31.1768005Z shl.b32 %r7057, %r7056, 10; 2026-02-21T09:23:31.1768071Z mul.wide.s32 %rd558, %r7057, 2; 2026-02-21T09:23:31.1768134Z or.b64 %rd74, %rd558, 256; 2026-02-21T09:23:31.1768194Z or.b32 %r7058, %r9, %r7016; 2026-02-21T09:23:31.1768261Z sub.s32 %r7059, %r7058, %r7019; 2026-02-21T09:23:31.1768321Z shl.b32 %r7060, %r7059, 10; 2026-02-21T09:23:31.1768386Z mul.wide.s32 %rd559, %r7060, 2; 2026-02-21T09:23:31.1768517Z or.b64 %rd75, %rd559, 256; 2026-02-21T09:23:31.1768581Z or.b32 %r7061, %r8, %r7016; 2026-02-21T09:23:31.1768644Z sub.s32 %r7062, %r7061, %r7019; 
2026-02-21T09:23:31.1768704Z shl.b32 %r7063, %r7062, 10; 2026-02-21T09:23:31.1768777Z mul.wide.s32 %rd560, %r7063, 2; 2026-02-21T09:23:31.1768839Z or.b64 %rd76, %rd560, 256; 2026-02-21T09:23:31.1768907Z or.b32 %r7064, %r7, %r7016; 2026-02-21T09:23:31.1768975Z sub.s32 %r7065, %r7064, %r7019; 2026-02-21T09:23:31.1769094Z shl.b32 %r7066, %r7065, 10; 2026-02-21T09:23:31.1769163Z mul.wide.s32 %rd561, %r7066, 2; 2026-02-21T09:23:31.1769227Z or.b64 %rd77, %rd561, 256; 2026-02-21T09:23:31.1769296Z mov.b32 %r12747, 0f00000000; 2026-02-21T09:23:31.1769358Z mov.b32 %r12746, 1; 2026-02-21T09:23:31.1769426Z mov.b32 %r12745, -1; 2026-02-21T09:23:31.1769493Z mov.b64 %rd760, 0; 2026-02-21T09:23:31.1769556Z mov.b64 %rd759, %rd11; 2026-02-21T09:23:31.1769617Z mov.b32 %r12744, %r6852; 2026-02-21T09:23:31.1769685Z mov.b32 %r12748, %r12747; 2026-02-21T09:23:31.1769748Z mov.b32 %r12749, %r12747; 2026-02-21T09:23:31.1769810Z mov.b32 %r12750, %r12747; 2026-02-21T09:23:31.1769871Z mov.b32 %r12751, %r12747; 2026-02-21T09:23:31.1769934Z mov.b32 %r12752, %r12747; 2026-02-21T09:23:31.1769998Z mov.b32 %r12753, %r12747; 2026-02-21T09:23:31.1770058Z mov.b32 %r12754, %r12747; 2026-02-21T09:23:31.1770124Z mov.b32 %r12755, %r12747; 2026-02-21T09:23:31.1770183Z mov.b32 %r12756, %r12747; 2026-02-21T09:23:31.1770243Z mov.b32 %r12757, %r12747; 2026-02-21T09:23:31.1770305Z mov.b32 %r12758, %r12747; 2026-02-21T09:23:31.1770374Z mov.b32 %r12759, %r12747; 2026-02-21T09:23:31.1770434Z mov.b32 %r12760, %r12747; 2026-02-21T09:23:31.1770494Z mov.b32 %r12761, %r12747; 2026-02-21T09:23:31.1770562Z mov.b32 %r12762, %r12747; 2026-02-21T09:23:31.1770622Z mov.b32 %r12763, %r12747; 2026-02-21T09:23:31.1770681Z mov.b32 %r12764, %r12747; 2026-02-21T09:23:31.1770740Z mov.b32 %r12765, %r12747; 2026-02-21T09:23:31.1770809Z mov.b32 %r12766, %r12747; 2026-02-21T09:23:31.1770870Z mov.b32 %r12767, %r12747; 2026-02-21T09:23:31.1770930Z mov.b32 %r12768, %r12747; 2026-02-21T09:23:31.1771000Z mov.b32 %r12769, %r12747; 
2026-02-21T09:23:31.1771061Z mov.b32 %r12770, %r12747; 2026-02-21T09:23:31.1771123Z mov.b32 %r12771, %r12747; 2026-02-21T09:23:31.1771184Z mov.b32 %r12772, %r12747; 2026-02-21T09:23:31.1771251Z mov.b32 %r12773, %r12747; 2026-02-21T09:23:31.1771311Z mov.b32 %r12774, %r12747; 2026-02-21T09:23:31.1771371Z mov.b32 %r12775, %r12747; 2026-02-21T09:23:31.1771437Z mov.b32 %r12776, %r12747; 2026-02-21T09:23:31.1771498Z mov.b32 %r12777, %r12747; 2026-02-21T09:23:31.1771557Z mov.b32 %r12778, %r12747; 2026-02-21T09:23:31.1771617Z mov.b32 %r12779, %r12747; 2026-02-21T09:23:31.1771683Z mov.b32 %r12780, %r12747; 2026-02-21T09:23:31.1771743Z mov.b32 %r12781, %r12747; 2026-02-21T09:23:31.1771806Z mov.b32 %r12782, %r12747; 2026-02-21T09:23:31.1771871Z mov.b32 %r12783, %r12747; 2026-02-21T09:23:31.1771931Z mov.b32 %r12784, %r12747; 2026-02-21T09:23:31.1771991Z mov.b32 %r12785, %r12747; 2026-02-21T09:23:31.1772053Z mov.b32 %r12786, %r12747; 2026-02-21T09:23:31.1772120Z mov.b32 %r12787, %r12747; 2026-02-21T09:23:31.1772192Z mov.b32 %r12788, %r12747; 2026-02-21T09:23:31.1772403Z mov.b32 %r12789, %r12747; 2026-02-21T09:23:31.1772469Z mov.b32 %r12790, %r12747; 2026-02-21T09:23:31.1772532Z mov.b32 %r12791, %r12747; 2026-02-21T09:23:31.1772592Z mov.b32 %r12792, %r12747; 2026-02-21T09:23:31.1772660Z mov.b32 %r12793, %r12747; 2026-02-21T09:23:31.1772727Z mov.b32 %r12794, %r12747; 2026-02-21T09:23:31.1772788Z mov.b32 %r12795, %r12747; 2026-02-21T09:23:31.1772849Z mov.b32 %r12796, %r12747; 2026-02-21T09:23:31.1772917Z mov.b32 %r12797, %r12747; 2026-02-21T09:23:31.1772975Z mov.b32 %r12798, %r12747; 2026-02-21T09:23:31.1773035Z mov.b32 %r12799, %r12747; 2026-02-21T09:23:31.1773103Z mov.b32 %r12800, %r12747; 2026-02-21T09:23:31.1773163Z mov.b32 %r12801, %r12747; 2026-02-21T09:23:31.1773222Z mov.b32 %r12802, %r12747; 2026-02-21T09:23:31.1773334Z mov.b32 %r12803, %r12747; 2026-02-21T09:23:31.1773404Z mov.b32 %r12804, %r12747; 2026-02-21T09:23:31.1773465Z mov.b32 %r12805, %r12747; 
2026-02-21T09:23:31.1773525Z mov.b32 %r12806, %r12747; 2026-02-21T09:23:31.1773593Z mov.b32 %r12807, %r12747; 2026-02-21T09:23:31.1773652Z mov.b32 %r12808, %r12747; 2026-02-21T09:23:31.1773712Z mov.b32 %r12809, %r12747; 2026-02-21T09:23:31.1773770Z mov.b32 %r12810, %r12747; 2026-02-21T09:23:31.1773838Z mov.b32 %r12811, %r12747; 2026-02-21T09:23:31.1773950Z mov.b32 %r12812, %r12747; 2026-02-21T09:23:31.1774012Z mov.b32 %r12813, %r12747; 2026-02-21T09:23:31.1774077Z mov.b32 %r12814, %r12747; 2026-02-21T09:23:31.1774136Z mov.b32 %r12815, %r12747; 2026-02-21T09:23:31.1774197Z mov.b32 %r12816, %r12747; 2026-02-21T09:23:31.1774257Z mov.b32 %r12817, %r12747; 2026-02-21T09:23:31.1774323Z mov.b32 %r12818, %r12747; 2026-02-21T09:23:31.1774383Z mov.b32 %r12819, %r12747; 2026-02-21T09:23:31.1774444Z mov.b32 %r12820, %r12747; 2026-02-21T09:23:31.1774510Z mov.b32 %r12821, %r12747; 2026-02-21T09:23:31.1774572Z mov.b32 %r12822, %r12747; 2026-02-21T09:23:31.1774645Z mov.b32 %r12823, %r12747; 2026-02-21T09:23:31.1774706Z mov.b32 %r12824, %r12747; 2026-02-21T09:23:31.1774781Z mov.b32 %r12825, %r12747; 2026-02-21T09:23:31.1774841Z mov.b32 %r12826, %r12747; 2026-02-21T09:23:31.1774900Z mov.b32 %r12827, %r12747; 2026-02-21T09:23:31.1774965Z mov.b32 %r12828, %r12747; 2026-02-21T09:23:31.1775029Z mov.b32 %r12829, %r12747; 2026-02-21T09:23:31.1775088Z mov.b32 %r12830, %r12747; 2026-02-21T09:23:31.1775149Z mov.b32 %r12831, %r12747; 2026-02-21T09:23:31.1775213Z mov.b32 %r12832, %r12747; 2026-02-21T09:23:31.1775272Z mov.b32 %r12833, %r12747; 2026-02-21T09:23:31.1775330Z mov.b32 %r12834, %r12747; 2026-02-21T09:23:31.1775395Z mov.b32 %r12835, %r12747; 2026-02-21T09:23:31.1775454Z mov.b32 %r12836, %r12747; 2026-02-21T09:23:31.1775513Z mov.b32 %r12837, %r12747; 2026-02-21T09:23:31.1775573Z mov.b32 %r12838, %r12747; 2026-02-21T09:23:31.1775639Z mov.b32 %r12839, %r12747; 2026-02-21T09:23:31.1775699Z mov.b32 %r12840, %r12747; 2026-02-21T09:23:31.1775760Z mov.b32 %r12841, %r12747; 
2026-02-21T09:23:31.1775825Z mov.b32 %r12842, %r12747; 2026-02-21T09:23:31.1775887Z mov.b32 %r12843, %r12747; 2026-02-21T09:23:31.1775947Z mov.b32 %r12844, %r12747; 2026-02-21T09:23:31.1776011Z mov.b32 %r12845, %r12747; 2026-02-21T09:23:31.1776071Z mov.b32 %r12846, %r12747; 2026-02-21T09:23:31.1776132Z mov.b32 %r12847, %r12747; 2026-02-21T09:23:31.1776192Z mov.b32 %r12848, %r12747; 2026-02-21T09:23:31.1776259Z mov.b32 %r12849, %r12747; 2026-02-21T09:23:31.1776320Z mov.b32 %r12850, %r12747; 2026-02-21T09:23:31.1776381Z mov.b32 %r12851, %r12747; 2026-02-21T09:23:31.1776446Z mov.b32 %r12852, %r12747; 2026-02-21T09:23:31.1776634Z mov.b32 %r12853, %r12747; 2026-02-21T09:23:31.1776695Z mov.b32 %r12854, %r12747; 2026-02-21T09:23:31.1776756Z mov.b32 %r12855, %r12747; 2026-02-21T09:23:31.1776824Z mov.b32 %r12856, %r12747; 2026-02-21T09:23:31.1776886Z mov.b32 %r12857, %r12747; 2026-02-21T09:23:31.1776947Z mov.b32 %r12858, %r12747; 2026-02-21T09:23:31.1777014Z mov.b32 %r12859, %r12747; 2026-02-21T09:23:31.1777077Z mov.b32 %r12860, %r12747; 2026-02-21T09:23:31.1777283Z mov.b32 %r12861, %r12747; 2026-02-21T09:23:31.1777347Z mov.b32 %r12862, %r12747; 2026-02-21T09:23:31.1777414Z mov.b32 %r12863, %r12747; 2026-02-21T09:23:31.1777474Z mov.b32 %r12864, %r12747; 2026-02-21T09:23:31.1777539Z mov.b32 %r12865, %r12747; 2026-02-21T09:23:31.1777609Z mov.b32 %r12866, %r12747; 2026-02-21T09:23:31.1777670Z mov.b32 %r12867, %r12747; 2026-02-21T09:23:31.1777731Z mov.b32 %r12868, %r12747; 2026-02-21T09:23:31.1777791Z mov.b32 %r12869, %r12747; 2026-02-21T09:23:31.1777860Z mov.b32 %r12870, %r12747; 2026-02-21T09:23:31.1777919Z mov.b32 %r12871, %r12747; 2026-02-21T09:23:31.1777979Z mov.b32 %r12872, %r12747; 2026-02-21T09:23:31.1778040Z mov.b32 %r12873, %r12747; 2026-02-21T09:23:31.1778109Z mov.b32 %r12874, %r12747; 2026-02-21T09:23:31.1778302Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:23:31.1778416Z // => This Inner Loop Header: Depth=2 2026-02-21T09:23:31.1778500Z setp.lt.u64 %p147, 
%rd760, 448; 2026-02-21T09:23:31.1778565Z add.s32 %r9480, %r12745, 1; 2026-02-21T09:23:31.1778633Z setp.gt.s32 %p148, %r9480, 1; 2026-02-21T09:23:31.1778705Z selp.b32 %r12745, 0, %r9480, %p148; 2026-02-21T09:23:31.1778839Z selp.b32 %r9481, 1, 0, %p148; 2026-02-21T09:23:31.1778906Z xor.b32 %r12744, %r12744, %r9481; 2026-02-21T09:23:31.1779127Z .loc 1 52 80 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:80 2026-02-21T09:23:31.1779201Z cp.async.wait_group 1; 2026-02-21T09:23:31.1779260Z bar.sync 0; 2026-02-21T09:23:31.1779327Z shl.b32 %r9482, %r12745, 14; 2026-02-21T09:23:31.1779392Z add.s32 %r9484, %r1290, %r9482; 2026-02-21T09:23:31.1779606Z .loc 1 56 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:56:32 2026-02-21T09:23:31.1779673Z add.s32 %r9485, %r9484, %r91; 2026-02-21T09:23:31.1779743Z ld.shared.b16 %rs449, [%r9485]; 2026-02-21T09:23:31.1779822Z ld.shared.b16 %rs450, [%r9485+1024]; 2026-02-21T09:23:31.1779895Z ld.shared.b16 %rs451, [%r9485+64]; 2026-02-21T09:23:31.1779962Z ld.shared.b16 %rs452, [%r9485+1088]; 2026-02-21T09:23:31.1780035Z ld.shared.b16 %rs453, [%r9485+8192]; 2026-02-21T09:23:31.1780101Z ld.shared.b16 %rs454, [%r9485+9216]; 2026-02-21T09:23:31.1780182Z ld.shared.b16 %rs455, [%r9485+8256]; 2026-02-21T09:23:31.1780252Z ld.shared.b16 %rs456, [%r9485+9280]; 2026-02-21T09:23:31.1780320Z add.s32 %r9486, %r9484, %r92; 2026-02-21T09:23:31.1780385Z ld.shared.b16 %rs457, [%r9486]; 2026-02-21T09:23:31.1780452Z ld.shared.b16 %rs458, [%r9486+1024]; 2026-02-21T09:23:31.1780524Z ld.shared.b16 %rs459, [%r9486+64]; 2026-02-21T09:23:31.1780590Z ld.shared.b16 %rs460, [%r9486+1088]; 2026-02-21T09:23:31.1780657Z ld.shared.b16 %rs461, [%r9486+8192]; 2026-02-21T09:23:31.1780725Z ld.shared.b16 %rs462, [%r9486+9216]; 2026-02-21T09:23:31.1780798Z ld.shared.b16 %rs463, [%r9486+8256]; 2026-02-21T09:23:31.1780865Z ld.shared.b16 %rs464, [%r9486+9280]; 2026-02-21T09:23:31.1780930Z add.s32 %r9487, %r9484, %r93; 2026-02-21T09:23:31.1780999Z 
ld.shared.b16 %rs465, [%r9487]; 2026-02-21T09:23:31.1781062Z ld.shared.b16 %rs466, [%r9487+1024]; 2026-02-21T09:23:31.1781129Z ld.shared.b16 %rs467, [%r9487+64]; 2026-02-21T09:23:31.1781199Z ld.shared.b16 %rs468, [%r9487+1088]; 2026-02-21T09:23:31.1781266Z ld.shared.b16 %rs469, [%r9487+8192]; 2026-02-21T09:23:31.1781333Z ld.shared.b16 %rs470, [%r9487+9216]; 2026-02-21T09:23:31.1781398Z ld.shared.b16 %rs471, [%r9487+8256]; 2026-02-21T09:23:31.1781471Z ld.shared.b16 %rs472, [%r9487+9280]; 2026-02-21T09:23:31.1781531Z add.s32 %r9488, %r9484, %r94; 2026-02-21T09:23:31.1781598Z ld.shared.b16 %rs473, [%r9488]; 2026-02-21T09:23:31.1781668Z ld.shared.b16 %rs474, [%r9488+1024]; 2026-02-21T09:23:31.1781734Z ld.shared.b16 %rs475, [%r9488+64]; 2026-02-21T09:23:31.1781801Z ld.shared.b16 %rs476, [%r9488+1088]; 2026-02-21T09:23:31.1781866Z ld.shared.b16 %rs477, [%r9488+8192]; 2026-02-21T09:23:31.1782008Z ld.shared.b16 %rs478, [%r9488+9216]; 2026-02-21T09:23:31.1782121Z ld.shared.b16 %rs479, [%r9488+8256]; 2026-02-21T09:23:31.1782186Z ld.shared.b16 %rs480, [%r9488+9280]; 2026-02-21T09:23:31.1782253Z add.s32 %r9489, %r9484, %r95; 2026-02-21T09:23:31.1782319Z ld.shared.b16 %rs481, [%r9489]; 2026-02-21T09:23:31.1782384Z ld.shared.b16 %rs482, [%r9489+1024]; 2026-02-21T09:23:31.1782454Z ld.shared.b16 %rs483, [%r9489+64]; 2026-02-21T09:23:31.1782519Z ld.shared.b16 %rs484, [%r9489+1088]; 2026-02-21T09:23:31.1782584Z ld.shared.b16 %rs485, [%r9489+8192]; 2026-02-21T09:23:31.1782649Z ld.shared.b16 %rs486, [%r9489+9216]; 2026-02-21T09:23:31.1782719Z ld.shared.b16 %rs487, [%r9489+8256]; 2026-02-21T09:23:31.1782783Z ld.shared.b16 %rs488, [%r9489+9280]; 2026-02-21T09:23:31.1782856Z add.s32 %r9490, %r9484, %r96; 2026-02-21T09:23:31.1782980Z ld.shared.b16 %rs489, [%r9490]; 2026-02-21T09:23:31.1783048Z ld.shared.b16 %rs490, [%r9490+1024]; 2026-02-21T09:23:31.1783114Z ld.shared.b16 %rs491, [%r9490+64]; 2026-02-21T09:23:31.1783184Z ld.shared.b16 %rs492, [%r9490+1088]; 2026-02-21T09:23:31.1783253Z 
ld.shared.b16 %rs493, [%r9490+8192]; 2026-02-21T09:23:31.1783318Z ld.shared.b16 %rs494, [%r9490+9216]; 2026-02-21T09:23:31.1783381Z ld.shared.b16 %rs495, [%r9490+8256]; 2026-02-21T09:23:31.1783496Z ld.shared.b16 %rs496, [%r9490+9280]; 2026-02-21T09:23:31.1783558Z add.s32 %r9491, %r9484, %r97; 2026-02-21T09:23:31.1783623Z ld.shared.b16 %rs497, [%r9491]; 2026-02-21T09:23:31.1783694Z ld.shared.b16 %rs498, [%r9491+1024]; 2026-02-21T09:23:31.1783758Z ld.shared.b16 %rs499, [%r9491+64]; 2026-02-21T09:23:31.1783823Z ld.shared.b16 %rs500, [%r9491+1088]; 2026-02-21T09:23:31.1783889Z ld.shared.b16 %rs501, [%r9491+8192]; 2026-02-21T09:23:31.1783962Z ld.shared.b16 %rs502, [%r9491+9216]; 2026-02-21T09:23:31.1784028Z ld.shared.b16 %rs503, [%r9491+8256]; 2026-02-21T09:23:31.1784093Z ld.shared.b16 %rs504, [%r9491+9280]; 2026-02-21T09:23:31.1784157Z add.s32 %r9492, %r9484, %r98; 2026-02-21T09:23:31.1784224Z ld.shared.b16 %rs505, [%r9492]; 2026-02-21T09:23:31.1784288Z ld.shared.b16 %rs506, [%r9492+1024]; 2026-02-21T09:23:31.1784352Z ld.shared.b16 %rs507, [%r9492+64]; 2026-02-21T09:23:31.1784422Z ld.shared.b16 %rs508, [%r9492+1088]; 2026-02-21T09:23:31.1784488Z ld.shared.b16 %rs509, [%r9492+8192]; 2026-02-21T09:23:31.1784554Z ld.shared.b16 %rs510, [%r9492+9216]; 2026-02-21T09:23:31.1784624Z ld.shared.b16 %rs511, [%r9492+8256]; 2026-02-21T09:23:31.1784690Z ld.shared.b16 %rs512, [%r9492+9280]; 2026-02-21T09:23:31.1784756Z cvt.f32.bf16 %r7197, %rs449; 2026-02-21T09:23:31.1784823Z cvt.f32.bf16 %r7198, %rs450; 2026-02-21T09:23:31.1784886Z cvt.f32.bf16 %r7199, %rs457; 2026-02-21T09:23:31.1784947Z cvt.f32.bf16 %r7200, %rs458; 2026-02-21T09:23:31.1785007Z cvt.f32.bf16 %r7329, %rs465; 2026-02-21T09:23:31.1785079Z cvt.f32.bf16 %r7330, %rs466; 2026-02-21T09:23:31.1785141Z cvt.f32.bf16 %r7331, %rs473; 2026-02-21T09:23:31.1785206Z cvt.f32.bf16 %r7332, %rs474; 2026-02-21T09:23:31.1785272Z cvt.f32.bf16 %r7461, %rs481; 2026-02-21T09:23:31.1785333Z cvt.f32.bf16 %r7462, %rs482; 
2026-02-21T09:23:31.1785398Z cvt.f32.bf16 %r7463, %rs489; 2026-02-21T09:23:31.1785460Z cvt.f32.bf16 %r7464, %rs490; 2026-02-21T09:23:31.1785520Z cvt.f32.bf16 %r7593, %rs497; 2026-02-21T09:23:31.1785583Z cvt.f32.bf16 %r7594, %rs498; 2026-02-21T09:23:31.1785650Z cvt.f32.bf16 %r7595, %rs505; 2026-02-21T09:23:31.1785715Z cvt.f32.bf16 %r7596, %rs506; 2026-02-21T09:23:31.1785775Z cvt.f32.bf16 %r7725, %rs451; 2026-02-21T09:23:31.1785839Z cvt.f32.bf16 %r7726, %rs452; 2026-02-21T09:23:31.1785898Z cvt.f32.bf16 %r7727, %rs459; 2026-02-21T09:23:31.1785959Z cvt.f32.bf16 %r7728, %rs460; 2026-02-21T09:23:31.1786022Z cvt.f32.bf16 %r7857, %rs467; 2026-02-21T09:23:31.1786088Z cvt.f32.bf16 %r7858, %rs468; 2026-02-21T09:23:31.1786162Z cvt.f32.bf16 %r7859, %rs475; 2026-02-21T09:23:31.1786225Z cvt.f32.bf16 %r7860, %rs476; 2026-02-21T09:23:31.1786290Z cvt.f32.bf16 %r7989, %rs483; 2026-02-21T09:23:31.1786416Z cvt.f32.bf16 %r7990, %rs484; 2026-02-21T09:23:31.1786657Z cvt.f32.bf16 %r7991, %rs491; 2026-02-21T09:23:31.1786724Z cvt.f32.bf16 %r7992, %rs492; 2026-02-21T09:23:31.1786790Z cvt.f32.bf16 %r8121, %rs499; 2026-02-21T09:23:31.1786849Z cvt.f32.bf16 %r8122, %rs500; 2026-02-21T09:23:31.1786912Z cvt.f32.bf16 %r8123, %rs507; 2026-02-21T09:23:31.1786977Z cvt.f32.bf16 %r8124, %rs508; 2026-02-21T09:23:31.1787038Z cvt.f32.bf16 %r8253, %rs453; 2026-02-21T09:23:31.1787098Z cvt.f32.bf16 %r8254, %rs454; 2026-02-21T09:23:31.1787159Z cvt.f32.bf16 %r8255, %rs461; 2026-02-21T09:23:31.1787223Z cvt.f32.bf16 %r8256, %rs462; 2026-02-21T09:23:31.1787283Z cvt.f32.bf16 %r8385, %rs469; 2026-02-21T09:23:31.1787343Z cvt.f32.bf16 %r8386, %rs470; 2026-02-21T09:23:31.1787406Z cvt.f32.bf16 %r8387, %rs477; 2026-02-21T09:23:31.1787547Z cvt.f32.bf16 %r8388, %rs478; 2026-02-21T09:23:31.1787610Z cvt.f32.bf16 %r8517, %rs485; 2026-02-21T09:23:31.1787674Z cvt.f32.bf16 %r8518, %rs486; 2026-02-21T09:23:31.1787737Z cvt.f32.bf16 %r8519, %rs493; 2026-02-21T09:23:31.1787800Z cvt.f32.bf16 %r8520, %rs494; 
2026-02-21T09:23:31.1787861Z cvt.f32.bf16 %r8649, %rs501; 2026-02-21T09:23:31.1787929Z cvt.f32.bf16 %r8650, %rs502; 2026-02-21T09:23:31.1787990Z cvt.f32.bf16 %r8651, %rs509; 2026-02-21T09:23:31.1788111Z cvt.f32.bf16 %r8652, %rs510; 2026-02-21T09:23:31.1788178Z cvt.f32.bf16 %r8781, %rs455; 2026-02-21T09:23:31.1788320Z cvt.f32.bf16 %r8782, %rs456; 2026-02-21T09:23:31.1788384Z cvt.f32.bf16 %r8783, %rs463; 2026-02-21T09:23:31.1788446Z cvt.f32.bf16 %r8784, %rs464; 2026-02-21T09:23:31.1788512Z cvt.f32.bf16 %r8913, %rs471; 2026-02-21T09:23:31.1788573Z cvt.f32.bf16 %r8914, %rs472; 2026-02-21T09:23:31.1788633Z cvt.f32.bf16 %r8915, %rs479; 2026-02-21T09:23:31.1788703Z cvt.f32.bf16 %r8916, %rs480; 2026-02-21T09:23:31.1788766Z cvt.f32.bf16 %r9045, %rs487; 2026-02-21T09:23:31.1788828Z cvt.f32.bf16 %r9046, %rs488; 2026-02-21T09:23:31.1788888Z cvt.f32.bf16 %r9047, %rs495; 2026-02-21T09:23:31.1788955Z cvt.f32.bf16 %r9048, %rs496; 2026-02-21T09:23:31.1789018Z cvt.f32.bf16 %r9177, %rs503; 2026-02-21T09:23:31.1789079Z cvt.f32.bf16 %r9178, %rs504; 2026-02-21T09:23:31.1789150Z cvt.f32.bf16 %r9179, %rs511; 2026-02-21T09:23:31.1789209Z cvt.f32.bf16 %r9180, %rs512; 2026-02-21T09:23:31.1789423Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1789492Z shl.b32 %r9493, %r12745, 3; 2026-02-21T09:23:31.1789554Z add.s32 %r7067, %r6810, %r9493; 2026-02-21T09:23:31.1789618Z // begin inline asm 2026-02-21T09:23:31.1789672Z 2026-02-21T09:23:31.1789727Z { 2026-02-21T09:23:31.1789793Z .reg .pred complete; 2026-02-21T09:23:31.1789850Z waitLoop: 2026-02-21T09:23:31.1790001Z mbarrier.try_wait.parity.shared.b64 complete, [%r7067], %r12744; 2026-02-21T09:23:31.1790074Z @!complete bra.uni waitLoop; 2026-02-21T09:23:31.1790126Z } 2026-02-21T09:23:31.1790131Z 2026-02-21T09:23:31.1790190Z // end inline asm 2026-02-21T09:23:31.1790399Z .loc 1 58 33 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:58:33 2026-02-21T09:23:31.1790466Z shl.b32 %r9495, 
%r12745, 12; 2026-02-21T09:23:31.1790532Z add.s32 %r9497, %r1378, %r9495; 2026-02-21T09:23:31.1790739Z .loc 1 76 58 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:76:58 2026-02-21T09:23:31.1790805Z add.s32 %r9498, %r9497, %r12473; 2026-02-21T09:23:31.1790874Z add.s32 %r9499, %r9497, %r332; 2026-02-21T09:23:31.1790942Z add.s32 %r9500, %r9497, %r333; 2026-02-21T09:23:31.1791002Z add.s32 %r9501, %r9497, %r334; 2026-02-21T09:23:31.1791062Z add.s32 %r9502, %r9497, %r335; 2026-02-21T09:23:31.1791122Z add.s32 %r9503, %r9497, %r336; 2026-02-21T09:23:31.1791188Z add.s32 %r9504, %r9497, %r337; 2026-02-21T09:23:31.1791253Z add.s32 %r9505, %r9497, %r338; 2026-02-21T09:23:31.1791451Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1791522Z ld.shared.s8 %rs513, [%r9498]; 2026-02-21T09:23:31.1791884Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1791949Z shl.b16 %rs514, %rs513, 4; 2026-02-21T09:23:31.1792162Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1792233Z ld.shared.s8 %rs515, [%r9499+128]; 2026-02-21T09:23:31.1792441Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1792508Z shl.b16 %rs516, %rs515, 4; 2026-02-21T09:23:31.1792712Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1792781Z ld.shared.s8 %rs517, [%r9500+256]; 2026-02-21T09:23:31.1793027Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1793098Z shl.b16 %rs518, %rs517, 4; 2026-02-21T09:23:31.1793295Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1793369Z ld.shared.s8 %rs519, [%r9501+384]; 2026-02-21T09:23:31.1793570Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1793687Z shl.b16 %rs520, %rs519, 4; 
2026-02-21T09:23:31.1793883Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1793954Z ld.shared.s8 %rs521, [%r9502+512]; 2026-02-21T09:23:31.1794151Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1794214Z shl.b16 %rs522, %rs521, 4; 2026-02-21T09:23:31.1794413Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1794483Z ld.shared.s8 %rs523, [%r9503+640]; 2026-02-21T09:23:31.1794678Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1794745Z shl.b16 %rs524, %rs523, 4; 2026-02-21T09:23:31.1794949Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1795017Z ld.shared.s8 %rs525, [%r9504+768]; 2026-02-21T09:23:31.1795212Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1795277Z shl.b16 %rs526, %rs525, 4; 2026-02-21T09:23:31.1795472Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1795538Z ld.shared.s8 %rs527, [%r9505+896]; 2026-02-21T09:23:31.1795737Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1795799Z shl.b16 %rs528, %rs527, 4; 2026-02-21T09:23:31.1795994Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1796069Z ld.shared.s8 %rs529, [%r9498+1024]; 2026-02-21T09:23:31.1796273Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1796337Z shl.b16 %rs530, %rs529, 4; 2026-02-21T09:23:31.1796675Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1796759Z ld.shared.s8 %rs531, [%r9499+1152]; 2026-02-21T09:23:31.1796962Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1797027Z shl.b16 %rs532, 
%rs531, 4; 2026-02-21T09:23:31.1797234Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1797305Z ld.shared.s8 %rs533, [%r9500+1280]; 2026-02-21T09:23:31.1797508Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1797576Z shl.b16 %rs534, %rs533, 4; 2026-02-21T09:23:31.1797928Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1797995Z ld.shared.s8 %rs535, [%r9501+1408]; 2026-02-21T09:23:31.1798201Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1798264Z shl.b16 %rs536, %rs535, 4; 2026-02-21T09:23:31.1798460Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1798540Z ld.shared.s8 %rs537, [%r9502+1536]; 2026-02-21T09:23:31.1798746Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1798810Z shl.b16 %rs538, %rs537, 4; 2026-02-21T09:23:31.1799071Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1799151Z ld.shared.s8 %rs539, [%r9503+1664]; 2026-02-21T09:23:31.1799348Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1799414Z shl.b16 %rs540, %rs539, 4; 2026-02-21T09:23:31.1799613Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1799742Z ld.shared.s8 %rs541, [%r9504+1792]; 2026-02-21T09:23:31.1799938Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1800002Z shl.b16 %rs542, %rs541, 4; 2026-02-21T09:23:31.1800199Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1800266Z ld.shared.s8 %rs543, [%r9505+1920]; 2026-02-21T09:23:31.1800462Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1800526Z 
shl.b16 %rs544, %rs543, 4; 2026-02-21T09:23:31.1800720Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1800789Z ld.shared.s8 %rs545, [%r9498+2048]; 2026-02-21T09:23:31.1800992Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1801053Z shl.b16 %rs546, %rs545, 4; 2026-02-21T09:23:31.1801249Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1801319Z ld.shared.s8 %rs547, [%r9499+2176]; 2026-02-21T09:23:31.1801512Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1801573Z shl.b16 %rs548, %rs547, 4; 2026-02-21T09:23:31.1801771Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1801839Z ld.shared.s8 %rs549, [%r9500+2304]; 2026-02-21T09:23:31.1802032Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1802098Z shl.b16 %rs550, %rs549, 4; 2026-02-21T09:23:31.1802301Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1802369Z ld.shared.s8 %rs551, [%r9501+2432]; 2026-02-21T09:23:31.1802567Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1802634Z shl.b16 %rs552, %rs551, 4; 2026-02-21T09:23:31.1802829Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1802896Z ld.shared.s8 %rs553, [%r9502+2560]; 2026-02-21T09:23:31.1803097Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1803162Z shl.b16 %rs554, %rs553, 4; 2026-02-21T09:23:31.1803359Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1803430Z ld.shared.s8 %rs555, [%r9503+2688]; 2026-02-21T09:23:31.1803733Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 
2026-02-21T09:23:31.1803795Z shl.b16 %rs556, %rs555, 4; 2026-02-21T09:23:31.1803991Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1804062Z ld.shared.s8 %rs557, [%r9504+2816]; 2026-02-21T09:23:31.1804255Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1804317Z shl.b16 %rs558, %rs557, 4; 2026-02-21T09:23:31.1804516Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1804582Z ld.shared.s8 %rs559, [%r9505+2944]; 2026-02-21T09:23:31.1804824Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1804893Z shl.b16 %rs560, %rs559, 4; 2026-02-21T09:23:31.1805088Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1805158Z ld.shared.s8 %rs561, [%r9498+3072]; 2026-02-21T09:23:31.1805356Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1805463Z shl.b16 %rs562, %rs561, 4; 2026-02-21T09:23:31.1805676Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1805742Z ld.shared.s8 %rs563, [%r9499+3200]; 2026-02-21T09:23:31.1805942Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1806005Z shl.b16 %rs564, %rs563, 4; 2026-02-21T09:23:31.1806201Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1806273Z ld.shared.s8 %rs565, [%r9500+3328]; 2026-02-21T09:23:31.1806583Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1806653Z shl.b16 %rs566, %rs565, 4; 2026-02-21T09:23:31.1806854Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1806920Z ld.shared.s8 %rs567, [%r9501+3456]; 2026-02-21T09:23:31.1807117Z .loc 1 61 28 // 
cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1807183Z shl.b16 %rs568, %rs567, 4; 2026-02-21T09:23:31.1807378Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1807444Z ld.shared.s8 %rs569, [%r9502+3584]; 2026-02-21T09:23:31.1807644Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1807708Z shl.b16 %rs570, %rs569, 4; 2026-02-21T09:23:31.1807904Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1807970Z ld.shared.s8 %rs571, [%r9503+3712]; 2026-02-21T09:23:31.1808173Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1808236Z shl.b16 %rs572, %rs571, 4; 2026-02-21T09:23:31.1808445Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1808522Z ld.shared.s8 %rs573, [%r9504+3840]; 2026-02-21T09:23:31.1808720Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1808783Z shl.b16 %rs574, %rs573, 4; 2026-02-21T09:23:31.1808990Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1809054Z ld.shared.s8 %rs575, [%r9505+3968]; 2026-02-21T09:23:31.1809251Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1809318Z shl.b16 %rs576, %rs575, 4; 2026-02-21T09:23:31.1809607Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1809731Z cvt.s16.s8 %rs577, %rs514; 2026-02-21T09:23:31.1809791Z shr.s16 %rs578, %rs577, 4; 2026-02-21T09:23:31.1809857Z cvt.s16.s8 %rs579, %rs516; 2026-02-21T09:23:31.1809920Z shr.s16 %rs580, %rs579, 4; 2026-02-21T09:23:31.1809981Z shr.s16 %rs581, %rs513, 4; 2026-02-21T09:23:31.1810047Z shr.s16 %rs582, %rs515, 4; 2026-02-21T09:23:31.1810244Z .loc 1 81 32 // 
cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1810310Z cvt.rn.f32.s16 %r9506, %rs582; 2026-02-21T09:23:31.1810376Z cvt.rn.f32.s16 %r9507, %rs581; 2026-02-21T09:23:31.1810444Z cvt.rn.f32.s16 %r9508, %rs580; 2026-02-21T09:23:31.1810506Z cvt.rn.f32.s16 %r9509, %rs578; 2026-02-21T09:23:31.1810767Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1810838Z cvt.s16.s8 %rs583, %rs518; 2026-02-21T09:23:31.1810904Z shr.s16 %rs584, %rs583, 4; 2026-02-21T09:23:31.1810965Z cvt.s16.s8 %rs585, %rs520; 2026-02-21T09:23:31.1811030Z shr.s16 %rs586, %rs585, 4; 2026-02-21T09:23:31.1811101Z shr.s16 %rs587, %rs517, 4; 2026-02-21T09:23:31.1811165Z shr.s16 %rs588, %rs519, 4; 2026-02-21T09:23:31.1811441Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1811511Z cvt.rn.f32.s16 %r9510, %rs588; 2026-02-21T09:23:31.1811574Z cvt.rn.f32.s16 %r9511, %rs587; 2026-02-21T09:23:31.1811636Z cvt.rn.f32.s16 %r9512, %rs586; 2026-02-21T09:23:31.1811702Z cvt.rn.f32.s16 %r9513, %rs584; 2026-02-21T09:23:31.1811897Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1811961Z cvt.s16.s8 %rs589, %rs522; 2026-02-21T09:23:31.1812028Z shr.s16 %rs590, %rs589, 4; 2026-02-21T09:23:31.1812089Z cvt.s16.s8 %rs591, %rs524; 2026-02-21T09:23:31.1812149Z shr.s16 %rs592, %rs591, 4; 2026-02-21T09:23:31.1812214Z shr.s16 %rs593, %rs521, 4; 2026-02-21T09:23:31.1812280Z shr.s16 %rs594, %rs523, 4; 2026-02-21T09:23:31.1812474Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1812539Z cvt.rn.f32.s16 %r9514, %rs594; 2026-02-21T09:23:31.1812605Z cvt.rn.f32.s16 %r9515, %rs593; 2026-02-21T09:23:31.1812666Z cvt.rn.f32.s16 %r9516, %rs592; 2026-02-21T09:23:31.1812727Z cvt.rn.f32.s16 %r9517, %rs590; 2026-02-21T09:23:31.1812920Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 
2026-02-21T09:23:31.1812988Z cvt.s16.s8 %rs595, %rs526; 2026-02-21T09:23:31.1813047Z shr.s16 %rs596, %rs595, 4; 2026-02-21T09:23:31.1813107Z cvt.s16.s8 %rs597, %rs528; 2026-02-21T09:23:31.1813174Z shr.s16 %rs598, %rs597, 4; 2026-02-21T09:23:31.1813234Z shr.s16 %rs599, %rs525, 4; 2026-02-21T09:23:31.1813296Z shr.s16 %rs600, %rs527, 4; 2026-02-21T09:23:31.1813493Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1813560Z cvt.rn.f32.s16 %r9518, %rs600; 2026-02-21T09:23:31.1813621Z cvt.rn.f32.s16 %r9519, %rs599; 2026-02-21T09:23:31.1813682Z cvt.rn.f32.s16 %r9520, %rs598; 2026-02-21T09:23:31.1813751Z cvt.rn.f32.s16 %r9521, %rs596; 2026-02-21T09:23:31.1813945Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1814020Z cvt.s16.s8 %rs601, %rs530; 2026-02-21T09:23:31.1814090Z shr.s16 %rs602, %rs601, 4; 2026-02-21T09:23:31.1814152Z cvt.s16.s8 %rs603, %rs532; 2026-02-21T09:23:31.1814213Z shr.s16 %rs604, %rs603, 4; 2026-02-21T09:23:31.1814274Z shr.s16 %rs605, %rs529, 4; 2026-02-21T09:23:31.1814340Z shr.s16 %rs606, %rs531, 4; 2026-02-21T09:23:31.1814537Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1814603Z cvt.rn.f32.s16 %r9522, %rs606; 2026-02-21T09:23:31.1814786Z cvt.rn.f32.s16 %r9523, %rs605; 2026-02-21T09:23:31.1814848Z cvt.rn.f32.s16 %r9524, %rs604; 2026-02-21T09:23:31.1814910Z cvt.rn.f32.s16 %r9525, %rs602; 2026-02-21T09:23:31.1815112Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1815174Z cvt.s16.s8 %rs607, %rs534; 2026-02-21T09:23:31.1815235Z shr.s16 %rs608, %rs607, 4; 2026-02-21T09:23:31.1815295Z cvt.s16.s8 %rs609, %rs536; 2026-02-21T09:23:31.1815362Z shr.s16 %rs610, %rs609, 4; 2026-02-21T09:23:31.1815422Z shr.s16 %rs611, %rs533, 4; 2026-02-21T09:23:31.1815484Z shr.s16 %rs612, %rs535, 4; 2026-02-21T09:23:31.1815686Z .loc 1 81 32 // 
cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1815800Z cvt.rn.f32.s16 %r9526, %rs612; 2026-02-21T09:23:31.1815865Z cvt.rn.f32.s16 %r9527, %rs611; 2026-02-21T09:23:31.1815932Z cvt.rn.f32.s16 %r9528, %rs610; 2026-02-21T09:23:31.1815995Z cvt.rn.f32.s16 %r9529, %rs608; 2026-02-21T09:23:31.1816192Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1816254Z cvt.s16.s8 %rs613, %rs538; 2026-02-21T09:23:31.1816324Z shr.s16 %rs614, %rs613, 4; 2026-02-21T09:23:31.1816434Z cvt.s16.s8 %rs615, %rs540; 2026-02-21T09:23:31.1816620Z shr.s16 %rs616, %rs615, 4; 2026-02-21T09:23:31.1816690Z shr.s16 %rs617, %rs537, 4; 2026-02-21T09:23:31.1816750Z shr.s16 %rs618, %rs539, 4; 2026-02-21T09:23:31.1816948Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1817011Z cvt.rn.f32.s16 %r9530, %rs618; 2026-02-21T09:23:31.1817078Z cvt.rn.f32.s16 %r9531, %rs617; 2026-02-21T09:23:31.1817140Z cvt.rn.f32.s16 %r9532, %rs616; 2026-02-21T09:23:31.1817204Z cvt.rn.f32.s16 %r9533, %rs614; 2026-02-21T09:23:31.1817402Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1817468Z cvt.s16.s8 %rs619, %rs542; 2026-02-21T09:23:31.1817539Z shr.s16 %rs620, %rs619, 4; 2026-02-21T09:23:31.1817610Z cvt.s16.s8 %rs621, %rs544; 2026-02-21T09:23:31.1817671Z shr.s16 %rs622, %rs621, 4; 2026-02-21T09:23:31.1817733Z shr.s16 %rs623, %rs541, 4; 2026-02-21T09:23:31.1817796Z shr.s16 %rs624, %rs543, 4; 2026-02-21T09:23:31.1817999Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1818065Z cvt.rn.f32.s16 %r9534, %rs624; 2026-02-21T09:23:31.1818129Z cvt.rn.f32.s16 %r9535, %rs623; 2026-02-21T09:23:31.1818196Z cvt.rn.f32.s16 %r9536, %rs622; 2026-02-21T09:23:31.1818259Z cvt.rn.f32.s16 %r9537, %rs620; 2026-02-21T09:23:31.1818458Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 
2026-02-21T09:23:31.1818524Z cvt.s16.s8 %rs625, %rs546; 2026-02-21T09:23:31.1818587Z shr.s16 %rs626, %rs625, 4; 2026-02-21T09:23:31.1818647Z cvt.s16.s8 %rs627, %rs548; 2026-02-21T09:23:31.1818716Z shr.s16 %rs628, %rs627, 4; 2026-02-21T09:23:31.1818782Z shr.s16 %rs629, %rs545, 4; 2026-02-21T09:23:31.1818843Z shr.s16 %rs630, %rs547, 4; 2026-02-21T09:23:31.1819044Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1819116Z cvt.rn.f32.s16 %r9538, %rs630; 2026-02-21T09:23:31.1819178Z cvt.rn.f32.s16 %r9539, %rs629; 2026-02-21T09:23:31.1819242Z cvt.rn.f32.s16 %r9540, %rs628; 2026-02-21T09:23:31.1819303Z cvt.rn.f32.s16 %r9541, %rs626; 2026-02-21T09:23:31.1819504Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1819566Z cvt.s16.s8 %rs631, %rs550; 2026-02-21T09:23:31.1819625Z shr.s16 %rs632, %rs631, 4; 2026-02-21T09:23:31.1819690Z cvt.s16.s8 %rs633, %rs552; 2026-02-21T09:23:31.1819752Z shr.s16 %rs634, %rs633, 4; 2026-02-21T09:23:31.1819813Z shr.s16 %rs635, %rs549, 4; 2026-02-21T09:23:31.1819874Z shr.s16 %rs636, %rs551, 4; 2026-02-21T09:23:31.1820217Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1820281Z cvt.rn.f32.s16 %r9542, %rs636; 2026-02-21T09:23:31.1820341Z cvt.rn.f32.s16 %r9543, %rs635; 2026-02-21T09:23:31.1820412Z cvt.rn.f32.s16 %r9544, %rs634; 2026-02-21T09:23:31.1820477Z cvt.rn.f32.s16 %r9545, %rs632; 2026-02-21T09:23:31.1820672Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1820738Z cvt.s16.s8 %rs637, %rs554; 2026-02-21T09:23:31.1820799Z shr.s16 %rs638, %rs637, 4; 2026-02-21T09:23:31.1820860Z cvt.s16.s8 %rs639, %rs556; 2026-02-21T09:23:31.1820920Z shr.s16 %rs640, %rs639, 4; 2026-02-21T09:23:31.1821000Z shr.s16 %rs641, %rs553, 4; 2026-02-21T09:23:31.1821129Z shr.s16 %rs642, %rs555, 4; 2026-02-21T09:23:31.1821331Z .loc 1 81 32 // 
cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1821403Z cvt.rn.f32.s16 %r9546, %rs642; 2026-02-21T09:23:31.1821472Z cvt.rn.f32.s16 %r9547, %rs641; 2026-02-21T09:23:31.1821533Z cvt.rn.f32.s16 %r9548, %rs640; 2026-02-21T09:23:31.1821599Z cvt.rn.f32.s16 %r9549, %rs638; 2026-02-21T09:23:31.1821864Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1821929Z cvt.s16.s8 %rs643, %rs558; 2026-02-21T09:23:31.1821993Z shr.s16 %rs644, %rs643, 4; 2026-02-21T09:23:31.1822059Z cvt.s16.s8 %rs645, %rs560; 2026-02-21T09:23:31.1822119Z shr.s16 %rs646, %rs645, 4; 2026-02-21T09:23:31.1822180Z shr.s16 %rs647, %rs557, 4; 2026-02-21T09:23:31.1822244Z shr.s16 %rs648, %rs559, 4; 2026-02-21T09:23:31.1822439Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1822504Z cvt.rn.f32.s16 %r9550, %rs648; 2026-02-21T09:23:31.1822567Z cvt.rn.f32.s16 %r9551, %rs647; 2026-02-21T09:23:31.1822633Z cvt.rn.f32.s16 %r9552, %rs646; 2026-02-21T09:23:31.1822704Z cvt.rn.f32.s16 %r9553, %rs644; 2026-02-21T09:23:31.1822898Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1822968Z cvt.s16.s8 %rs649, %rs562; 2026-02-21T09:23:31.1823031Z shr.s16 %rs650, %rs649, 4; 2026-02-21T09:23:31.1823091Z cvt.s16.s8 %rs651, %rs564; 2026-02-21T09:23:31.1823155Z shr.s16 %rs652, %rs651, 4; 2026-02-21T09:23:31.1823217Z shr.s16 %rs653, %rs561, 4; 2026-02-21T09:23:31.1823277Z shr.s16 %rs654, %rs563, 4; 2026-02-21T09:23:31.1823475Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1823544Z cvt.rn.f32.s16 %r9554, %rs654; 2026-02-21T09:23:31.1823607Z cvt.rn.f32.s16 %r9555, %rs653; 2026-02-21T09:23:31.1823670Z cvt.rn.f32.s16 %r9556, %rs652; 2026-02-21T09:23:31.1823742Z cvt.rn.f32.s16 %r9557, %rs650; 2026-02-21T09:23:31.1823938Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 
2026-02-21T09:23:31.1824002Z cvt.s16.s8 %rs655, %rs566; 2026-02-21T09:23:31.1824069Z shr.s16 %rs656, %rs655, 4; 2026-02-21T09:23:31.1824130Z cvt.s16.s8 %rs657, %rs568; 2026-02-21T09:23:31.1824190Z shr.s16 %rs658, %rs657, 4; 2026-02-21T09:23:31.1824252Z shr.s16 %rs659, %rs565, 4; 2026-02-21T09:23:31.1824317Z shr.s16 %rs660, %rs567, 4; 2026-02-21T09:23:31.1824512Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1824576Z cvt.rn.f32.s16 %r9558, %rs660; 2026-02-21T09:23:31.1824643Z cvt.rn.f32.s16 %r9559, %rs659; 2026-02-21T09:23:31.1824706Z cvt.rn.f32.s16 %r9560, %rs658; 2026-02-21T09:23:31.1824770Z cvt.rn.f32.s16 %r9561, %rs656; 2026-02-21T09:23:31.1824968Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1825036Z cvt.s16.s8 %rs661, %rs570; 2026-02-21T09:23:31.1825101Z shr.s16 %rs662, %rs661, 4; 2026-02-21T09:23:31.1825225Z cvt.s16.s8 %rs663, %rs572; 2026-02-21T09:23:31.1825338Z shr.s16 %rs664, %rs663, 4; 2026-02-21T09:23:31.1825399Z shr.s16 %rs665, %rs569, 4; 2026-02-21T09:23:31.1825459Z shr.s16 %rs666, %rs571, 4; 2026-02-21T09:23:31.1825661Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1825727Z cvt.rn.f32.s16 %r9562, %rs666; 2026-02-21T09:23:31.1825791Z cvt.rn.f32.s16 %r9563, %rs665; 2026-02-21T09:23:31.1825852Z cvt.rn.f32.s16 %r9564, %rs664; 2026-02-21T09:23:31.1825919Z cvt.rn.f32.s16 %r9565, %rs662; 2026-02-21T09:23:31.1826113Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1826175Z cvt.s16.s8 %rs667, %rs574; 2026-02-21T09:23:31.1826245Z shr.s16 %rs668, %rs667, 4; 2026-02-21T09:23:31.1826353Z cvt.s16.s8 %rs669, %rs576; 2026-02-21T09:23:31.1826418Z shr.s16 %rs670, %rs669, 4; 2026-02-21T09:23:31.1826632Z shr.s16 %rs671, %rs573, 4; 2026-02-21T09:23:31.1826713Z shr.s16 %rs672, %rs575, 4; 2026-02-21T09:23:31.1826912Z .loc 1 81 32 // 
cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1826974Z cvt.rn.f32.s16 %r9566, %rs672; 2026-02-21T09:23:31.1827042Z cvt.rn.f32.s16 %r9567, %rs671; 2026-02-21T09:23:31.1827188Z cvt.rn.f32.s16 %r9568, %rs670; 2026-02-21T09:23:31.1827254Z cvt.rn.f32.s16 %r9569, %rs668; 2026-02-21T09:23:31.1827377Z st.shared.v4.b32 [%r67], {%r9509, %r9507, %r9508, %r9506}; 2026-02-21T09:23:31.1827503Z st.shared.v4.b32 [%r67+16384], {%r9541, %r9539, %r9540, %r9538}; 2026-02-21T09:23:31.1827610Z st.shared.v4.b32 [%r68], {%r9513, %r9511, %r9512, %r9510}; 2026-02-21T09:23:31.1827726Z st.shared.v4.b32 [%r68+16384], {%r9545, %r9543, %r9544, %r9542}; 2026-02-21T09:23:31.1827837Z st.shared.v4.b32 [%r69], {%r9517, %r9515, %r9516, %r9514}; 2026-02-21T09:23:31.1827951Z st.shared.v4.b32 [%r69+16384], {%r9549, %r9547, %r9548, %r9546}; 2026-02-21T09:23:31.1828051Z st.shared.v4.b32 [%r70], {%r9521, %r9519, %r9520, %r9518}; 2026-02-21T09:23:31.1828180Z st.shared.v4.b32 [%r70+16384], {%r9553, %r9551, %r9552, %r9550}; 2026-02-21T09:23:31.1828369Z st.shared.v4.b32 [%r71], {%r9525, %r9523, %r9524, %r9522}; 2026-02-21T09:23:31.1828485Z st.shared.v4.b32 [%r71+16384], {%r9557, %r9555, %r9556, %r9554}; 2026-02-21T09:23:31.1828590Z st.shared.v4.b32 [%r72], {%r9529, %r9527, %r9528, %r9526}; 2026-02-21T09:23:31.1828702Z st.shared.v4.b32 [%r72+16384], {%r9561, %r9559, %r9560, %r9558}; 2026-02-21T09:23:31.1828804Z st.shared.v4.b32 [%r73], {%r9533, %r9531, %r9532, %r9530}; 2026-02-21T09:23:31.1828919Z st.shared.v4.b32 [%r73+16384], {%r9565, %r9563, %r9564, %r9562}; 2026-02-21T09:23:31.1829019Z st.shared.v4.b32 [%r74], {%r9537, %r9535, %r9536, %r9534}; 2026-02-21T09:23:31.1829132Z st.shared.v4.b32 [%r74+16384], {%r9569, %r9567, %r9568, %r9566}; 2026-02-21T09:23:31.1829189Z $L__tmp5: 2026-02-21T09:23:31.1829477Z .loc 2 291 36 // standard.py:291:36 @[ cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:88:40 ] 2026-02-21T09:23:31.1829544Z // begin inline asm 
2026-02-21T09:23:31.1829626Z fence.proxy.async.shared::cta; 2026-02-21T09:23:31.1829689Z // end inline asm 2026-02-21T09:23:31.1829746Z bar.sync 0; 2026-02-21T09:23:31.1829822Z wgmma.fence.sync.aligned; 2026-02-21T09:23:31.1829898Z mov.pred %p127, -1; 2026-02-21T09:23:31.1829958Z // begin inline asm 2026-02-21T09:23:31.1831430Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762,%r12763,%r12764,%r12765,%r12766,%r12767,%r12768,%r12769,%r12770,%r12771,%r12772,%r12773,%r12774,%r12775,%r12776,%r12777,%r12778,%r12779,%r12780,%r12781,%r12782,%r12783,%r12784,%r12785,%r12786,%r12787,%r12788,%r12789,%r12790,%r12791,%r12792,%r12793,%r12794,%r12795,%r12796,%r12797,%r12798,%r12799,%r12800,%r12801,%r12802,%r12803,%r12804,%r12805,%r12806,%r12807,%r12808,%r12809,%r12810}, {%r7197,%r7198,%r7199,%r7200}, %rd3, %p127, 1, 1; 2026-02-21T09:23:31.1831642Z // end inline asm 2026-02-21T09:23:31.1831707Z // begin inline asm 2026-02-21T09:23:31.1833159Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762,%r12763,%r12764,%r12765,%r12766,%r12767,%r12768,%r12769,%r12770,%r12771,%r12772,%r12773,%r12774,%r12775,%r12776,%r12777,%r12778,%r12779,%r12780,%r12781,%r12782,%r12783,%r12784,%r12785,%r12786,%r12787,%r12788,%r12789,%r12790,%r12791,%r12792,%r12793,%r12794,%r12795,%r12796,%r12797,%r12798,%r12799,%r12800,%r12801,%r12802,%r12803,%r12804,%r12805,%r12806,%r12807,%r12808,%r12809,%r12810}, {%r7329,%r7330,%r7331,%r7332}, %rd4, %p127, 1, 1; 2026-02-21T09:23:31.1833223Z // end inline asm 2026-02-21T09:23:31.1833358Z // begin inline asm 2026-02-21T09:23:31.1834859Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762,%r12763,%r12764,%r12765,%r12766,%r12767,%r12768,%r12769,%r12770,%r12771,%r12772,%r12773,%r12774,%r12775,%r12776,%r12777,%r12778,%r12779,%r12780,%r12781,%r12782,%r12783,%r12784,%r12785,%r12786,%r12787,%r12788,%r12789,%r12790,%r12791,%r12792,%r12793,%r12794,%r12795,%r12796,%r12797,%r12798,%r12799,%r12800,%r12801,%r12802,%r12803,%r12804,%r12805,%r12806,%r12807,%r12808,%r12809,%r12810}, {%r7461,%r7462,%r7463,%r7464}, %rd5, %p127, 1, 1; 2026-02-21T09:23:31.1834924Z // end inline asm 2026-02-21T09:23:31.1834983Z // begin inline asm 2026-02-21T09:23:31.1836435Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762,%r12763,%r12764,%r12765,%r12766,%r12767,%r12768,%r12769,%r12770,%r12771,%r12772,%r12773,%r12774,%r12775,%r12776,%r12777,%r12778,%r12779,%r12780,%r12781,%r12782,%r12783,%r12784,%r12785,%r12786,%r12787,%r12788,%r12789,%r12790,%r12791,%r12792,%r12793,%r12794,%r12795,%r12796,%r12797,%r12798,%r12799,%r12800,%r12801,%r12802,%r12803,%r12804,%r12805,%r12806,%r12807,%r12808,%r12809,%r12810}, {%r7593,%r7594,%r7595,%r7596}, %rd6, %p127, 1, 1; 2026-02-21T09:23:31.1836625Z // end inline asm 2026-02-21T09:23:31.1836693Z // begin inline asm 2026-02-21T09:23:31.1838147Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762,%r12763,%r12764,%r12765,%r12766,%r12767,%r12768,%r12769,%r12770,%r12771,%r12772,%r12773,%r12774,%r12775,%r12776,%r12777,%r12778,%r12779,%r12780,%r12781,%r12782,%r12783,%r12784,%r12785,%r12786,%r12787,%r12788,%r12789,%r12790,%r12791,%r12792,%r12793,%r12794,%r12795,%r12796,%r12797,%r12798,%r12799,%r12800,%r12801,%r12802,%r12803,%r12804,%r12805,%r12806,%r12807,%r12808,%r12809,%r12810}, {%r7725,%r7726,%r7727,%r7728}, %rd7, %p127, 1, 1; 2026-02-21T09:23:31.1838205Z // end inline asm 2026-02-21T09:23:31.1838271Z // begin inline asm 2026-02-21T09:23:31.1839760Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762,%r12763,%r12764,%r12765,%r12766,%r12767,%r12768,%r12769,%r12770,%r12771,%r12772,%r12773,%r12774,%r12775,%r12776,%r12777,%r12778,%r12779,%r12780,%r12781,%r12782,%r12783,%r12784,%r12785,%r12786,%r12787,%r12788,%r12789,%r12790,%r12791,%r12792,%r12793,%r12794,%r12795,%r12796,%r12797,%r12798,%r12799,%r12800,%r12801,%r12802,%r12803,%r12804,%r12805,%r12806,%r12807,%r12808,%r12809,%r12810}, {%r7857,%r7858,%r7859,%r7860}, %rd8, %p127, 1, 1; 2026-02-21T09:23:31.1839820Z // end inline asm 2026-02-21T09:23:31.1839884Z // begin inline asm 2026-02-21T09:23:31.1841364Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762,%r12763,%r12764,%r12765,%r12766,%r12767,%r12768,%r12769,%r12770,%r12771,%r12772,%r12773,%r12774,%r12775,%r12776,%r12777,%r12778,%r12779,%r12780,%r12781,%r12782,%r12783,%r12784,%r12785,%r12786,%r12787,%r12788,%r12789,%r12790,%r12791,%r12792,%r12793,%r12794,%r12795,%r12796,%r12797,%r12798,%r12799,%r12800,%r12801,%r12802,%r12803,%r12804,%r12805,%r12806,%r12807,%r12808,%r12809,%r12810}, {%r7989,%r7990,%r7991,%r7992}, %rd9, %p127, 1, 1; 2026-02-21T09:23:31.1841564Z // end inline asm 2026-02-21T09:23:31.1841628Z // begin inline asm 2026-02-21T09:23:31.1843170Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762,%r12763,%r12764,%r12765,%r12766,%r12767,%r12768,%r12769,%r12770,%r12771,%r12772,%r12773,%r12774,%r12775,%r12776,%r12777,%r12778,%r12779,%r12780,%r12781,%r12782,%r12783,%r12784,%r12785,%r12786,%r12787,%r12788,%r12789,%r12790,%r12791,%r12792,%r12793,%r12794,%r12795,%r12796,%r12797,%r12798,%r12799,%r12800,%r12801,%r12802,%r12803,%r12804,%r12805,%r12806,%r12807,%r12808,%r12809,%r12810}, {%r8121,%r8122,%r8123,%r8124}, %rd10, %p127, 1, 1; 2026-02-21T09:23:31.1843255Z // end inline asm 2026-02-21T09:23:31.1843318Z // begin inline asm 2026-02-21T09:23:31.1844853Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12811,%r12812,%r12813,%r12814,%r12815,%r12816,%r12817,%r12818,%r12819,%r12820,%r12821,%r12822,%r12823,%r12824,%r12825,%r12826,%r12827,%r12828,%r12829,%r12830,%r12831,%r12832,%r12833,%r12834,%r12835,%r12836,%r12837,%r12838,%r12839,%r12840,%r12841,%r12842,%r12843,%r12844,%r12845,%r12846,%r12847,%r12848,%r12849,%r12850,%r12851,%r12852,%r12853,%r12854,%r12855,%r12856,%r12857,%r12858,%r12859,%r12860,%r12861,%r12862,%r12863,%r12864,%r12865,%r12866,%r12867,%r12868,%r12869,%r12870,%r12871,%r12872,%r12873,%r12874}, {%r8253,%r8254,%r8255,%r8256}, %rd3, %p127, 1, 1; 2026-02-21T09:23:31.1844917Z // end inline asm 2026-02-21T09:23:31.1844976Z // begin inline asm 2026-02-21T09:23:31.1846571Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12811,%r12812,%r12813,%r12814,%r12815,%r12816,%r12817,%r12818,%r12819,%r12820,%r12821,%r12822,%r12823,%r12824,%r12825,%r12826,%r12827,%r12828,%r12829,%r12830,%r12831,%r12832,%r12833,%r12834,%r12835,%r12836,%r12837,%r12838,%r12839,%r12840,%r12841,%r12842,%r12843,%r12844,%r12845,%r12846,%r12847,%r12848,%r12849,%r12850,%r12851,%r12852,%r12853,%r12854,%r12855,%r12856,%r12857,%r12858,%r12859,%r12860,%r12861,%r12862,%r12863,%r12864,%r12865,%r12866,%r12867,%r12868,%r12869,%r12870,%r12871,%r12872,%r12873,%r12874}, {%r8385,%r8386,%r8387,%r8388}, %rd4, %p127, 1, 1; 2026-02-21T09:23:31.1846637Z // end inline asm 2026-02-21T09:23:31.1846695Z // begin inline asm 2026-02-21T09:23:31.1848177Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12811,%r12812,%r12813,%r12814,%r12815,%r12816,%r12817,%r12818,%r12819,%r12820,%r12821,%r12822,%r12823,%r12824,%r12825,%r12826,%r12827,%r12828,%r12829,%r12830,%r12831,%r12832,%r12833,%r12834,%r12835,%r12836,%r12837,%r12838,%r12839,%r12840,%r12841,%r12842,%r12843,%r12844,%r12845,%r12846,%r12847,%r12848,%r12849,%r12850,%r12851,%r12852,%r12853,%r12854,%r12855,%r12856,%r12857,%r12858,%r12859,%r12860,%r12861,%r12862,%r12863,%r12864,%r12865,%r12866,%r12867,%r12868,%r12869,%r12870,%r12871,%r12872,%r12873,%r12874}, {%r8517,%r8518,%r8519,%r8520}, %rd5, %p127, 1, 1; 2026-02-21T09:23:31.1848236Z // end inline asm 2026-02-21T09:23:31.1848293Z // begin inline asm 2026-02-21T09:23:31.1849787Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12811,%r12812,%r12813,%r12814,%r12815,%r12816,%r12817,%r12818,%r12819,%r12820,%r12821,%r12822,%r12823,%r12824,%r12825,%r12826,%r12827,%r12828,%r12829,%r12830,%r12831,%r12832,%r12833,%r12834,%r12835,%r12836,%r12837,%r12838,%r12839,%r12840,%r12841,%r12842,%r12843,%r12844,%r12845,%r12846,%r12847,%r12848,%r12849,%r12850,%r12851,%r12852,%r12853,%r12854,%r12855,%r12856,%r12857,%r12858,%r12859,%r12860,%r12861,%r12862,%r12863,%r12864,%r12865,%r12866,%r12867,%r12868,%r12869,%r12870,%r12871,%r12872,%r12873,%r12874}, {%r8649,%r8650,%r8651,%r8652}, %rd6, %p127, 1, 1; 2026-02-21T09:23:31.1849991Z // end inline asm 2026-02-21T09:23:31.1850057Z // begin inline asm 2026-02-21T09:23:31.1851532Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12811,%r12812,%r12813,%r12814,%r12815,%r12816,%r12817,%r12818,%r12819,%r12820,%r12821,%r12822,%r12823,%r12824,%r12825,%r12826,%r12827,%r12828,%r12829,%r12830,%r12831,%r12832,%r12833,%r12834,%r12835,%r12836,%r12837,%r12838,%r12839,%r12840,%r12841,%r12842,%r12843,%r12844,%r12845,%r12846,%r12847,%r12848,%r12849,%r12850,%r12851,%r12852,%r12853,%r12854,%r12855,%r12856,%r12857,%r12858,%r12859,%r12860,%r12861,%r12862,%r12863,%r12864,%r12865,%r12866,%r12867,%r12868,%r12869,%r12870,%r12871,%r12872,%r12873,%r12874}, {%r8781,%r8782,%r8783,%r8784}, %rd7, %p127, 1, 1; 2026-02-21T09:23:31.1851650Z // end inline asm 2026-02-21T09:23:31.1851715Z // begin inline asm 2026-02-21T09:23:31.1853242Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12811,%r12812,%r12813,%r12814,%r12815,%r12816,%r12817,%r12818,%r12819,%r12820,%r12821,%r12822,%r12823,%r12824,%r12825,%r12826,%r12827,%r12828,%r12829,%r12830,%r12831,%r12832,%r12833,%r12834,%r12835,%r12836,%r12837,%r12838,%r12839,%r12840,%r12841,%r12842,%r12843,%r12844,%r12845,%r12846,%r12847,%r12848,%r12849,%r12850,%r12851,%r12852,%r12853,%r12854,%r12855,%r12856,%r12857,%r12858,%r12859,%r12860,%r12861,%r12862,%r12863,%r12864,%r12865,%r12866,%r12867,%r12868,%r12869,%r12870,%r12871,%r12872,%r12873,%r12874}, {%r8913,%r8914,%r8915,%r8916}, %rd8, %p127, 1, 1; 2026-02-21T09:23:31.1853312Z // end inline asm 2026-02-21T09:23:31.1853372Z // begin inline asm 2026-02-21T09:23:31.1854842Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12811,%r12812,%r12813,%r12814,%r12815,%r12816,%r12817,%r12818,%r12819,%r12820,%r12821,%r12822,%r12823,%r12824,%r12825,%r12826,%r12827,%r12828,%r12829,%r12830,%r12831,%r12832,%r12833,%r12834,%r12835,%r12836,%r12837,%r12838,%r12839,%r12840,%r12841,%r12842,%r12843,%r12844,%r12845,%r12846,%r12847,%r12848,%r12849,%r12850,%r12851,%r12852,%r12853,%r12854,%r12855,%r12856,%r12857,%r12858,%r12859,%r12860,%r12861,%r12862,%r12863,%r12864,%r12865,%r12866,%r12867,%r12868,%r12869,%r12870,%r12871,%r12872,%r12873,%r12874}, {%r9045,%r9046,%r9047,%r9048}, %rd9, %p127, 1, 1; 2026-02-21T09:23:31.1854910Z // end inline asm 2026-02-21T09:23:31.1854972Z // begin inline asm 2026-02-21T09:23:31.1856555Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12811,%r12812,%r12813,%r12814,%r12815,%r12816,%r12817,%r12818,%r12819,%r12820,%r12821,%r12822,%r12823,%r12824,%r12825,%r12826,%r12827,%r12828,%r12829,%r12830,%r12831,%r12832,%r12833,%r12834,%r12835,%r12836,%r12837,%r12838,%r12839,%r12840,%r12841,%r12842,%r12843,%r12844,%r12845,%r12846,%r12847,%r12848,%r12849,%r12850,%r12851,%r12852,%r12853,%r12854,%r12855,%r12856,%r12857,%r12858,%r12859,%r12860,%r12861,%r12862,%r12863,%r12864,%r12865,%r12866,%r12867,%r12868,%r12869,%r12870,%r12871,%r12872,%r12873,%r12874}, {%r9177,%r9178,%r9179,%r9180}, %rd10, %p127, 1, 1; 2026-02-21T09:23:31.1856622Z // end inline asm 2026-02-21T09:23:31.1856705Z wgmma.commit_group.sync.aligned; 2026-02-21T09:23:31.1856769Z mov.b32 %r9310, %r6852; 2026-02-21T09:23:31.1856835Z mov.b32 %r9311, %r6852; 2026-02-21T09:23:31.1856894Z mov.b32 %r9309, %r1310; 2026-02-21T09:23:31.1856953Z // begin inline asm 2026-02-21T09:23:31.1859500Z // wait for regs: 
%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762,%r12763,%r12764,%r12765,%r12766,%r12767,%r12768,%r12769,%r12770,%r12771,%r12772,%r12773,%r12774,%r12775,%r12776,%r12777,%r12778,%r12779,%r12780,%r12781,%r12782,%r12783,%r12784,%r12785,%r12786,%r12787,%r12788,%r12789,%r12790,%r12791,%r12792,%r12793,%r12794,%r12795,%r12796,%r12797,%r12798,%r12799,%r12800,%r12801,%r12802,%r12803,%r12804,%r12805,%r12806,%r12807,%r12808,%r12809,%r12810,%r12811,%r12812,%r12813,%r12814,%r12815,%r12816,%r12817,%r12818,%r12819,%r12820,%r12821,%r12822,%r12823,%r12824,%r12825,%r12826,%r12827,%r12828,%r12829,%r12830,%r12831,%r12832,%r12833,%r12834,%r12835,%r12836,%r12837,%r12838,%r12839,%r12840,%r12841,%r12842,%r12843,%r12844,%r12845,%r12846,%r12847,%r12848,%r12849,%r12850,%r12851,%r12852,%r12853,%r12854,%r12855,%r12856,%r12857,%r12858,%r12859,%r12860,%r12861,%r12862,%r12863,%r12864,%r12865,%r12866,%r12867,%r12868,%r12869,%r12870,%r12871,%r12872,%r12873,%r12874,%r9309,%r9310,%r9311 2026-02-21T09:23:31.1859741Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:23:31.1859804Z // end inline asm 2026-02-21T09:23:31.1859863Z $L__tmp6: 2026-02-21T09:23:31.1860082Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1860151Z add.s32 %r9570, %r12746, 1; 2026-02-21T09:23:31.1860219Z setp.gt.s32 %p149, %r9570, 1; 2026-02-21T09:23:31.1860291Z selp.b32 %r12746, 0, %r9570, %p149; 2026-02-21T09:23:31.1860553Z .loc 1 52 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:32 2026-02-21T09:23:31.1860628Z add.s64 %rd578, %rd759, %rd77; 2026-02-21T09:23:31.1860697Z add.s64 %rd579, %rd759, %rd76; 2026-02-21T09:23:31.1860758Z add.s64 %rd580, %rd759, %rd75; 2026-02-21T09:23:31.1860829Z add.s64 %rd581, %rd759, %rd74; 2026-02-21T09:23:31.1860893Z add.s64 %rd582, %rd759, %rd73; 2026-02-21T09:23:31.1861016Z add.s64 %rd583, %rd759, %rd72; 2026-02-21T09:23:31.1861089Z add.s64 %rd584, 
%rd759, %rd71; 2026-02-21T09:23:31.1861151Z add.s64 %rd585, %rd759, %rd70; 2026-02-21T09:23:31.1861213Z add.s64 %rd586, %rd759, %rd69; 2026-02-21T09:23:31.1861278Z add.s64 %rd587, %rd759, %rd68; 2026-02-21T09:23:31.1861345Z add.s64 %rd588, %rd759, %rd67; 2026-02-21T09:23:31.1861407Z add.s64 %rd589, %rd759, %rd66; 2026-02-21T09:23:31.1861472Z add.s64 %rd590, %rd759, %rd65; 2026-02-21T09:23:31.1861538Z add.s64 %rd591, %rd759, %rd64; 2026-02-21T09:23:31.1861601Z add.s64 %rd592, %rd759, %rd63; 2026-02-21T09:23:31.1861802Z .loc 1 52 80 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:80 2026-02-21T09:23:31.1861867Z add.s64 %rd593, %rd759, %rd62; 2026-02-21T09:23:31.1861937Z shl.b32 %r9571, %r12746, 14; 2026-02-21T09:23:31.1862000Z add.s32 %r9572, %r1290, %r9571; 2026-02-21T09:23:31.1862062Z add.s32 %r9443, %r9572, %r26; 2026-02-21T09:23:31.1862134Z selp.b32 %r9444, 8, 0, %p147; 2026-02-21T09:23:31.1862209Z // begin inline asm 2026-02-21T09:23:31.1862360Z cp.async.ca.shared.global [ %r9443 + 0 ], [ %rd578 + 0 ], 0x8, %r9444; 2026-02-21T09:23:31.1862423Z // end inline asm 2026-02-21T09:23:31.1862488Z add.s32 %r9445, %r9443, 1024; 2026-02-21T09:23:31.1862547Z // begin inline asm 2026-02-21T09:23:31.1862684Z cp.async.ca.shared.global [ %r9445 + 0 ], [ %rd579 + 0 ], 0x8, %r9444; 2026-02-21T09:23:31.1862745Z // end inline asm 2026-02-21T09:23:31.1862804Z add.s32 %r9447, %r9443, 2048; 2026-02-21T09:23:31.1862864Z // begin inline asm 2026-02-21T09:23:31.1863002Z cp.async.ca.shared.global [ %r9447 + 0 ], [ %rd580 + 0 ], 0x8, %r9444; 2026-02-21T09:23:31.1863059Z // end inline asm 2026-02-21T09:23:31.1863123Z add.s32 %r9449, %r9443, 3072; 2026-02-21T09:23:31.1863183Z // begin inline asm 2026-02-21T09:23:31.1863319Z cp.async.ca.shared.global [ %r9449 + 0 ], [ %rd581 + 0 ], 0x8, %r9444; 2026-02-21T09:23:31.1863377Z // end inline asm 2026-02-21T09:23:31.1863440Z add.s32 %r9451, %r9443, 4096; 2026-02-21T09:23:31.1863504Z // begin inline asm 2026-02-21T09:23:31.1863635Z 
cp.async.ca.shared.global [ %r9451 + 0 ], [ %rd582 + 0 ], 0x8, %r9444; 2026-02-21T09:23:31.1863691Z // end inline asm 2026-02-21T09:23:31.1863751Z add.s32 %r9453, %r9443, 5120; 2026-02-21T09:23:31.1863816Z // begin inline asm 2026-02-21T09:23:31.1863945Z cp.async.ca.shared.global [ %r9453 + 0 ], [ %rd583 + 0 ], 0x8, %r9444; 2026-02-21T09:23:31.1864003Z // end inline asm 2026-02-21T09:23:31.1864070Z add.s32 %r9455, %r9443, 6144; 2026-02-21T09:23:31.1864131Z // begin inline asm 2026-02-21T09:23:31.1864262Z cp.async.ca.shared.global [ %r9455 + 0 ], [ %rd584 + 0 ], 0x8, %r9444; 2026-02-21T09:23:31.1864323Z // end inline asm 2026-02-21T09:23:31.1864527Z add.s32 %r9457, %r9443, 7168; 2026-02-21T09:23:31.1864584Z // begin inline asm 2026-02-21T09:23:31.1864715Z cp.async.ca.shared.global [ %r9457 + 0 ], [ %rd585 + 0 ], 0x8, %r9444; 2026-02-21T09:23:31.1864775Z // end inline asm 2026-02-21T09:23:31.1864838Z add.s32 %r9459, %r9443, 8192; 2026-02-21T09:23:31.1864900Z // begin inline asm 2026-02-21T09:23:31.1865033Z cp.async.ca.shared.global [ %r9459 + 0 ], [ %rd586 + 0 ], 0x8, %r9444; 2026-02-21T09:23:31.1865089Z // end inline asm 2026-02-21T09:23:31.1865148Z add.s32 %r9461, %r9443, 9216; 2026-02-21T09:23:31.1865206Z // begin inline asm 2026-02-21T09:23:31.1865340Z cp.async.ca.shared.global [ %r9461 + 0 ], [ %rd587 + 0 ], 0x8, %r9444; 2026-02-21T09:23:31.1865396Z // end inline asm 2026-02-21T09:23:31.1865458Z add.s32 %r9463, %r9443, 10240; 2026-02-21T09:23:31.1865579Z // begin inline asm 2026-02-21T09:23:31.1865714Z cp.async.ca.shared.global [ %r9463 + 0 ], [ %rd588 + 0 ], 0x8, %r9444; 2026-02-21T09:23:31.1865770Z // end inline asm 2026-02-21T09:23:31.1865841Z add.s32 %r9465, %r9443, 11264; 2026-02-21T09:23:31.1865900Z // begin inline asm 2026-02-21T09:23:31.1866031Z cp.async.ca.shared.global [ %r9465 + 0 ], [ %rd589 + 0 ], 0x8, %r9444; 2026-02-21T09:23:31.1866088Z // end inline asm 2026-02-21T09:23:31.1866203Z add.s32 %r9467, %r9443, 12288; 2026-02-21T09:23:31.1866265Z // 
begin inline asm 2026-02-21T09:23:31.1866394Z cp.async.ca.shared.global [ %r9467 + 0 ], [ %rd590 + 0 ], 0x8, %r9444; 2026-02-21T09:23:31.1866567Z // end inline asm 2026-02-21T09:23:31.1866634Z add.s32 %r9469, %r9443, 13312; 2026-02-21T09:23:31.1866695Z // begin inline asm 2026-02-21T09:23:31.1866829Z cp.async.ca.shared.global [ %r9469 + 0 ], [ %rd591 + 0 ], 0x8, %r9444; 2026-02-21T09:23:31.1866893Z // end inline asm 2026-02-21T09:23:31.1866956Z add.s32 %r9471, %r9443, 14336; 2026-02-21T09:23:31.1867014Z // begin inline asm 2026-02-21T09:23:31.1867149Z cp.async.ca.shared.global [ %r9471 + 0 ], [ %rd592 + 0 ], 0x8, %r9444; 2026-02-21T09:23:31.1867212Z // end inline asm 2026-02-21T09:23:31.1867273Z add.s32 %r9473, %r9443, 15360; 2026-02-21T09:23:31.1867331Z // begin inline asm 2026-02-21T09:23:31.1867467Z cp.async.ca.shared.global [ %r9473 + 0 ], [ %rd593 + 0 ], 0x8, %r9444; 2026-02-21T09:23:31.1867524Z // end inline asm 2026-02-21T09:23:31.1867592Z cp.async.commit_group; 2026-02-21T09:23:31.1867801Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1867866Z shl.b32 %r9573, %r12746, 3; 2026-02-21T09:23:31.1867930Z add.s32 %r9475, %r6810, %r9573; 2026-02-21T09:23:31.1868004Z and.pred %p143, %p160, %p147; 2026-02-21T09:23:31.1868067Z // begin inline asm 2026-02-21T09:23:31.1868203Z @%p143 mbarrier.arrive.expect_tx.shared.b64 _, [%r9475], 4096; 2026-02-21T09:23:31.1868347Z // end inline asm 2026-02-21T09:23:31.1868559Z .loc 1 58 33 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:58:33 2026-02-21T09:23:31.1868623Z shl.b32 %r9574, %r12746, 12; 2026-02-21T09:23:31.1868688Z add.s32 %r9476, %r1378, %r9574; 2026-02-21T09:23:31.1868750Z bar.sync 0; 2026-02-21T09:23:31.1868822Z elect.sync %r9575|%p150, -1; 2026-02-21T09:23:31.1868890Z and.pred %p151, %p147, %p150; 2026-02-21T09:23:31.1868964Z and.pred %p144, %p1, %p151; 2026-02-21T09:23:31.1869027Z cvt.u32.u64 %r9576, %rd760; 2026-02-21T09:23:31.1869088Z add.s32 %r9478, 
%r9576, 64; 2026-02-21T09:23:31.1869147Z // begin inline asm 2026-02-21T09:23:31.1869482Z @%p144 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r9476], [%rd463, {%r6851, %r9478}], [%r9475]; 2026-02-21T09:23:31.1869540Z // end inline asm 2026-02-21T09:23:31.1869740Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1869810Z add.s64 %rd759, %rd759, 128; 2026-02-21T09:23:31.1869877Z setp.lt.u64 %p152, %rd760, 480; 2026-02-21T09:23:31.1869942Z add.s64 %rd760, %rd760, 32; 2026-02-21T09:23:31.1870090Z @%p152 bra $L__BB0_7; 2026-02-21T09:23:31.1870264Z // %bb.8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:23:31.1870333Z cp.async.wait_group 0; 2026-02-21T09:23:31.1870389Z bar.sync 0; 2026-02-21T09:23:31.1870453Z // begin inline asm 2026-02-21T09:23:31.1870552Z @%p160 mbarrier.inval.shared::cta.b64 [%r6810]; 2026-02-21T09:23:31.1870610Z // end inline asm 2026-02-21T09:23:31.1870670Z bar.sync 0; 2026-02-21T09:23:31.1870729Z // begin inline asm 2026-02-21T09:23:31.1870818Z @%p160 mbarrier.inval.shared::cta.b64 [%r6811]; 2026-02-21T09:23:31.1870878Z // end inline asm 2026-02-21T09:23:31.1871085Z .loc 1 91 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:91:28 2026-02-21T09:23:31.1871172Z cvt.rn.bf16x2.f32 %r9583, %r12748, %r12747; 2026-02-21T09:23:31.1871317Z cvt.rn.bf16x2.f32 %r9584, %r12750, %r12749; 2026-02-21T09:23:31.1871403Z cvt.rn.bf16x2.f32 %r9585, %r12752, %r12751; 2026-02-21T09:23:31.1871479Z cvt.rn.bf16x2.f32 %r9586, %r12754, %r12753; 2026-02-21T09:23:31.1871558Z cvt.rn.bf16x2.f32 %r9587, %r12756, %r12755; 2026-02-21T09:23:31.1871636Z cvt.rn.bf16x2.f32 %r9588, %r12758, %r12757; 2026-02-21T09:23:31.1871714Z cvt.rn.bf16x2.f32 %r9589, %r12760, %r12759; 2026-02-21T09:23:31.1871846Z cvt.rn.bf16x2.f32 %r9590, %r12762, %r12761; 2026-02-21T09:23:31.1871927Z cvt.rn.bf16x2.f32 %r9591, %r12764, %r12763; 2026-02-21T09:23:31.1872006Z cvt.rn.bf16x2.f32 %r9592, %r12766, %r12765; 
2026-02-21T09:23:31.1872092Z cvt.rn.bf16x2.f32 %r9593, %r12768, %r12767; 2026-02-21T09:23:31.1872170Z cvt.rn.bf16x2.f32 %r9594, %r12770, %r12769; 2026-02-21T09:23:31.1872249Z cvt.rn.bf16x2.f32 %r9595, %r12772, %r12771; 2026-02-21T09:23:31.1872323Z cvt.rn.bf16x2.f32 %r9596, %r12774, %r12773; 2026-02-21T09:23:31.1872400Z cvt.rn.bf16x2.f32 %r9597, %r12776, %r12775; 2026-02-21T09:23:31.1872479Z cvt.rn.bf16x2.f32 %r9598, %r12778, %r12777; 2026-02-21T09:23:31.1872553Z cvt.rn.bf16x2.f32 %r9599, %r12780, %r12779; 2026-02-21T09:23:31.1872631Z cvt.rn.bf16x2.f32 %r9600, %r12782, %r12781; 2026-02-21T09:23:31.1872706Z cvt.rn.bf16x2.f32 %r9601, %r12784, %r12783; 2026-02-21T09:23:31.1872787Z cvt.rn.bf16x2.f32 %r9602, %r12786, %r12785; 2026-02-21T09:23:31.1872862Z cvt.rn.bf16x2.f32 %r9603, %r12788, %r12787; 2026-02-21T09:23:31.1872936Z cvt.rn.bf16x2.f32 %r9604, %r12790, %r12789; 2026-02-21T09:23:31.1873014Z cvt.rn.bf16x2.f32 %r9605, %r12792, %r12791; 2026-02-21T09:23:31.1873088Z cvt.rn.bf16x2.f32 %r9606, %r12794, %r12793; 2026-02-21T09:23:31.1873161Z cvt.rn.bf16x2.f32 %r9607, %r12796, %r12795; 2026-02-21T09:23:31.1873242Z cvt.rn.bf16x2.f32 %r9608, %r12798, %r12797; 2026-02-21T09:23:31.1873316Z cvt.rn.bf16x2.f32 %r9609, %r12800, %r12799; 2026-02-21T09:23:31.1873389Z cvt.rn.bf16x2.f32 %r9610, %r12802, %r12801; 2026-02-21T09:23:31.1873464Z cvt.rn.bf16x2.f32 %r9611, %r12804, %r12803; 2026-02-21T09:23:31.1873543Z cvt.rn.bf16x2.f32 %r9612, %r12806, %r12805; 2026-02-21T09:23:31.1873624Z cvt.rn.bf16x2.f32 %r9613, %r12808, %r12807; 2026-02-21T09:23:31.1873706Z cvt.rn.bf16x2.f32 %r9614, %r12810, %r12809; 2026-02-21T09:23:31.1873789Z cvt.rn.bf16x2.f32 %r9615, %r12812, %r12811; 2026-02-21T09:23:31.1873864Z cvt.rn.bf16x2.f32 %r9616, %r12814, %r12813; 2026-02-21T09:23:31.1873939Z cvt.rn.bf16x2.f32 %r9617, %r12816, %r12815; 2026-02-21T09:23:31.1874022Z cvt.rn.bf16x2.f32 %r9618, %r12818, %r12817; 2026-02-21T09:23:31.1874096Z cvt.rn.bf16x2.f32 %r9619, %r12820, %r12819; 2026-02-21T09:23:31.1874170Z 
cvt.rn.bf16x2.f32 %r9620, %r12822, %r12821; 2026-02-21T09:23:31.1874246Z cvt.rn.bf16x2.f32 %r9621, %r12824, %r12823; 2026-02-21T09:23:31.1874326Z cvt.rn.bf16x2.f32 %r9622, %r12826, %r12825; 2026-02-21T09:23:31.1874401Z cvt.rn.bf16x2.f32 %r9623, %r12828, %r12827; 2026-02-21T09:23:31.1874478Z cvt.rn.bf16x2.f32 %r9624, %r12830, %r12829; 2026-02-21T09:23:31.1874559Z cvt.rn.bf16x2.f32 %r9625, %r12832, %r12831; 2026-02-21T09:23:31.1874634Z cvt.rn.bf16x2.f32 %r9626, %r12834, %r12833; 2026-02-21T09:23:31.1874709Z cvt.rn.bf16x2.f32 %r9627, %r12836, %r12835; 2026-02-21T09:23:31.1874900Z cvt.rn.bf16x2.f32 %r9628, %r12838, %r12837; 2026-02-21T09:23:31.1874974Z cvt.rn.bf16x2.f32 %r9629, %r12840, %r12839; 2026-02-21T09:23:31.1875047Z cvt.rn.bf16x2.f32 %r9630, %r12842, %r12841; 2026-02-21T09:23:31.1875126Z cvt.rn.bf16x2.f32 %r9631, %r12844, %r12843; 2026-02-21T09:23:31.1875206Z cvt.rn.bf16x2.f32 %r9632, %r12846, %r12845; 2026-02-21T09:23:31.1875280Z cvt.rn.bf16x2.f32 %r9633, %r12848, %r12847; 2026-02-21T09:23:31.1875355Z cvt.rn.bf16x2.f32 %r9634, %r12850, %r12849; 2026-02-21T09:23:31.1875431Z cvt.rn.bf16x2.f32 %r9635, %r12852, %r12851; 2026-02-21T09:23:31.1875507Z cvt.rn.bf16x2.f32 %r9636, %r12854, %r12853; 2026-02-21T09:23:31.1875584Z cvt.rn.bf16x2.f32 %r9637, %r12856, %r12855; 2026-02-21T09:23:31.1875707Z cvt.rn.bf16x2.f32 %r9638, %r12858, %r12857; 2026-02-21T09:23:31.1875793Z cvt.rn.bf16x2.f32 %r9639, %r12860, %r12859; 2026-02-21T09:23:31.1875868Z cvt.rn.bf16x2.f32 %r9640, %r12862, %r12861; 2026-02-21T09:23:31.1875946Z cvt.rn.bf16x2.f32 %r9641, %r12864, %r12863; 2026-02-21T09:23:31.1876025Z cvt.rn.bf16x2.f32 %r9642, %r12866, %r12865; 2026-02-21T09:23:31.1876099Z cvt.rn.bf16x2.f32 %r9643, %r12868, %r12867; 2026-02-21T09:23:31.1876174Z cvt.rn.bf16x2.f32 %r9644, %r12870, %r12869; 2026-02-21T09:23:31.1876298Z cvt.rn.bf16x2.f32 %r9645, %r12872, %r12871; 2026-02-21T09:23:31.1876375Z cvt.rn.bf16x2.f32 %r9646, %r12874, %r12873; 2026-02-21T09:23:31.1876720Z .loc 1 92 43 // 
cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:92:43 2026-02-21T09:23:31.1876921Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r75], {%r9583, %r9584, %r9585, %r9586}; 2026-02-21T09:23:31.1877101Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r76], {%r9599, %r9600, %r9601, %r9602}; 2026-02-21T09:23:31.1877279Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r77], {%r9615, %r9616, %r9617, %r9618}; 2026-02-21T09:23:31.1877453Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r78], {%r9631, %r9632, %r9633, %r9634}; 2026-02-21T09:23:31.1877635Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r79], {%r9587, %r9588, %r9589, %r9590}; 2026-02-21T09:23:31.1877809Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r80], {%r9603, %r9604, %r9605, %r9606}; 2026-02-21T09:23:31.1877984Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r81], {%r9619, %r9620, %r9621, %r9622}; 2026-02-21T09:23:31.1878164Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r82], {%r9635, %r9636, %r9637, %r9638}; 2026-02-21T09:23:31.1878336Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r83], {%r9591, %r9592, %r9593, %r9594}; 2026-02-21T09:23:31.1878509Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r84], {%r9607, %r9608, %r9609, %r9610}; 2026-02-21T09:23:31.1878685Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r85], {%r9623, %r9624, %r9625, %r9626}; 2026-02-21T09:23:31.1878859Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r86], {%r9639, %r9640, %r9641, %r9642}; 2026-02-21T09:23:31.1879034Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r87], {%r9595, %r9596, %r9597, %r9598}; 2026-02-21T09:23:31.1879216Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r88], {%r9611, %r9612, %r9613, %r9614}; 2026-02-21T09:23:31.1879389Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r89], {%r9627, %r9628, %r9629, %r9630}; 2026-02-21T09:23:31.1879566Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r90], {%r9643, %r9644, %r9645, %r9646}; 2026-02-21T09:23:31.1879637Z // begin inline asm 2026-02-21T09:23:31.1879718Z fence.proxy.async.shared::cta; 2026-02-21T09:23:31.1879777Z // end 
inline asm 2026-02-21T09:23:31.1879833Z bar.sync 0; 2026-02-21T09:23:31.1879909Z elect.sync %r9647|%p157, -1; 2026-02-21T09:23:31.1879978Z and.pred %p155, %p201, %p157; 2026-02-21T09:23:31.1880040Z or.b32 %r9579, %r469, %r6851; 2026-02-21T09:23:31.1880105Z // begin inline asm 2026-02-21T09:23:31.1880342Z @%p155 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd446, {%r9579, %r9580}], [%r6814]; 2026-02-21T09:23:31.1880402Z // end inline asm 2026-02-21T09:23:31.1880484Z cp.async.bulk.commit_group; 2026-02-21T09:23:31.1880669Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:23:31.1880787Z bar.sync 0; 2026-02-21T09:23:31.1881000Z .loc 1 31 88 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:31:88 2026-02-21T09:23:31.1881069Z add.s32 %r12481, %r12481, 3; 2026-02-21T09:23:31.1881143Z setp.lt.s32 %p158, %r12481, %r12875; 2026-02-21T09:23:31.1881205Z @%p158 bra $L__BB0_2; 2026-02-21T09:23:31.1881299Z $L__BB0_9: // %.preheader 2026-02-21T09:23:31.1881368Z setp.ge.s32 %p159, %r12875, %r3; 2026-02-21T09:23:31.1881432Z @%p159 bra $L__BB0_14; 2026-02-21T09:23:31.1881515Z // %bb.10: // %.lr.ph19 2026-02-21T09:23:31.1881721Z .loc 1 0 88 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:0:88 2026-02-21T09:23:31.1881847Z shl.b32 %r9648, %r12473, 3; 2026-02-21T09:23:31.1881915Z xor.b32 %r115, %r9648, %r12474; 2026-02-21T09:23:31.1881983Z add.s32 %r116, %r1290, %r115; 2026-02-21T09:23:31.1882045Z add.s32 %r9698, %r116, 1024; 2026-02-21T09:23:31.1882111Z add.s32 %r9700, %r116, 2048; 2026-02-21T09:23:31.1882175Z add.s32 %r9702, %r116, 3072; 2026-02-21T09:23:31.1882233Z add.s32 %r9704, %r116, 4096; 2026-02-21T09:23:31.1882293Z add.s32 %r9706, %r116, 5120; 2026-02-21T09:23:31.1882414Z add.s32 %r9708, %r116, 6144; 2026-02-21T09:23:31.1882480Z add.s32 %r9710, %r116, 7168; 2026-02-21T09:23:31.1882540Z add.s32 %r9712, %r116, 8192; 2026-02-21T09:23:31.1882605Z add.s32 %r9714, %r116, 9216; 2026-02-21T09:23:31.1882669Z add.s32 %r9716, %r116, 10240; 
2026-02-21T09:23:31.1882731Z add.s32 %r9718, %r116, 11264; 2026-02-21T09:23:31.1882791Z add.s32 %r9720, %r116, 12288; 2026-02-21T09:23:31.1882851Z add.s32 %r9722, %r116, 13312; 2026-02-21T09:23:31.1882914Z add.s32 %r9724, %r116, 14336; 2026-02-21T09:23:31.1882976Z add.s32 %r9726, %r116, 15360; 2026-02-21T09:23:31.1883037Z add.s32 %r9733, %r116, 16384; 2026-02-21T09:23:31.1883102Z add.s32 %r9735, %r116, 17408; 2026-02-21T09:23:31.1883164Z add.s32 %r9737, %r116, 18432; 2026-02-21T09:23:31.1883226Z add.s32 %r9739, %r116, 19456; 2026-02-21T09:23:31.1883285Z add.s32 %r9741, %r116, 20480; 2026-02-21T09:23:31.1883348Z add.s32 %r9743, %r116, 21504; 2026-02-21T09:23:31.1883411Z add.s32 %r9745, %r116, 22528; 2026-02-21T09:23:31.1883481Z add.s32 %r9747, %r116, 23552; 2026-02-21T09:23:31.1883546Z add.s32 %r9749, %r116, 24576; 2026-02-21T09:23:31.1883608Z add.s32 %r9751, %r116, 25600; 2026-02-21T09:23:31.1883668Z add.s32 %r9753, %r116, 26624; 2026-02-21T09:23:31.1883733Z add.s32 %r9755, %r116, 27648; 2026-02-21T09:23:31.1883793Z add.s32 %r9757, %r116, 28672; 2026-02-21T09:23:31.1883853Z add.s32 %r9759, %r116, 29696; 2026-02-21T09:23:31.1883912Z add.s32 %r9761, %r116, 30720; 2026-02-21T09:23:31.1883975Z add.s32 %r9763, %r116, 31744; 2026-02-21T09:23:31.1884042Z and.b32 %r9652, %r12475, 6144; 2026-02-21T09:23:31.1884104Z and.b32 %r9654, %r12476, 896; 2026-02-21T09:23:31.1884169Z and.b32 %r9656, %r12477, 62; 2026-02-21T09:23:31.1884235Z or.b32 %r9657, %r9654, %r9656; 2026-02-21T09:23:31.1884297Z or.b32 %r148, %r9657, %r9652; 2026-02-21T09:23:31.1884357Z xor.b32 %r149, %r148, 8; 2026-02-21T09:23:31.1884424Z xor.b32 %r150, %r148, 16; 2026-02-21T09:23:31.1884484Z xor.b32 %r151, %r148, 24; 2026-02-21T09:23:31.1884545Z xor.b32 %r152, %r148, 32; 2026-02-21T09:23:31.1884609Z xor.b32 %r153, %r148, 40; 2026-02-21T09:23:31.1884667Z xor.b32 %r154, %r148, 48; 2026-02-21T09:23:31.1884726Z xor.b32 %r155, %r148, 56; 2026-02-21T09:23:31.1884789Z shl.b32 %r9658, %r12473, 7; 
2026-02-21T09:23:31.1884855Z and.b32 %r9660, %r12478, 112; 2026-02-21T09:23:31.1884917Z or.b32 %r9661, %r9658, %r9660; 2026-02-21T09:23:31.1884979Z add.s32 %r9662, %r1290, 32768; 2026-02-21T09:23:31.1885045Z add.s32 %r156, %r9662, %r9661; 2026-02-21T09:23:31.1885109Z xor.b32 %r9663, %r9661, 16; 2026-02-21T09:23:31.1885170Z add.s32 %r157, %r9662, %r9663; 2026-02-21T09:23:31.1885231Z xor.b32 %r9664, %r9661, 32; 2026-02-21T09:23:31.1885299Z add.s32 %r158, %r9662, %r9664; 2026-02-21T09:23:31.1885507Z xor.b32 %r9665, %r9661, 48; 2026-02-21T09:23:31.1885570Z add.s32 %r159, %r9662, %r9665; 2026-02-21T09:23:31.1885635Z xor.b32 %r9666, %r9661, 64; 2026-02-21T09:23:31.1885696Z add.s32 %r160, %r9662, %r9666; 2026-02-21T09:23:31.1885757Z xor.b32 %r9667, %r9661, 80; 2026-02-21T09:23:31.1885823Z add.s32 %r161, %r9662, %r9667; 2026-02-21T09:23:31.1885883Z xor.b32 %r9668, %r9661, 96; 2026-02-21T09:23:31.1885945Z add.s32 %r162, %r9662, %r9668; 2026-02-21T09:23:31.1886006Z xor.b32 %r9669, %r9661, 112; 2026-02-21T09:23:31.1886071Z add.s32 %r163, %r9662, %r9669; 2026-02-21T09:23:31.1886132Z bfe.u32 %r9670, %r9662, 4, 14; 2026-02-21T09:23:31.1886196Z cvt.u64.u32 %rd596, %r9670; 2026-02-21T09:23:31.1886278Z or.b64 %rd12, %rd596, 4611686293372403712; 2026-02-21T09:23:31.1886390Z add.s32 %r9671, %r1290, 32800; 2026-02-21T09:23:31.1886560Z bfe.u32 %r9672, %r9671, 4, 14; 2026-02-21T09:23:31.1886627Z cvt.u64.u32 %rd597, %r9672; 2026-02-21T09:23:31.1886713Z or.b64 %rd13, %rd597, 4611686293372403712; 2026-02-21T09:23:31.1886776Z add.s32 %r9673, %r1290, 32832; 2026-02-21T09:23:31.1886835Z bfe.u32 %r9674, %r9673, 4, 14; 2026-02-21T09:23:31.1886901Z cvt.u64.u32 %rd598, %r9674; 2026-02-21T09:23:31.1887048Z or.b64 %rd14, %rd598, 4611686293372403712; 2026-02-21T09:23:31.1887114Z add.s32 %r9675, %r1290, 32864; 2026-02-21T09:23:31.1887175Z bfe.u32 %r9676, %r9675, 4, 14; 2026-02-21T09:23:31.1887243Z cvt.u64.u32 %rd599, %r9676; 2026-02-21T09:23:31.1887314Z or.b64 %rd15, %rd599, 4611686293372403712; 
2026-02-21T09:23:31.1887375Z add.s32 %r9677, %r1290, 49152; 2026-02-21T09:23:31.1887455Z bfe.u32 %r9678, %r9677, 4, 14; 2026-02-21T09:23:31.1887518Z cvt.u64.u32 %rd600, %r9678; 2026-02-21T09:23:31.1887590Z or.b64 %rd16, %rd600, 4611686293372403712; 2026-02-21T09:23:31.1887659Z add.s32 %r9679, %r1290, 49184; 2026-02-21T09:23:31.1887721Z bfe.u32 %r9680, %r9679, 4, 14; 2026-02-21T09:23:31.1887782Z cvt.u64.u32 %rd601, %r9680; 2026-02-21T09:23:31.1887856Z or.b64 %rd17, %rd601, 4611686293372403712; 2026-02-21T09:23:31.1887923Z add.s32 %r9681, %r1290, 49216; 2026-02-21T09:23:31.1887983Z bfe.u32 %r9682, %r9681, 4, 14; 2026-02-21T09:23:31.1888044Z cvt.u64.u32 %rd602, %r9682; 2026-02-21T09:23:31.1888121Z or.b64 %rd18, %rd602, 4611686293372403712; 2026-02-21T09:23:31.1888181Z add.s32 %r9683, %r1290, 49248; 2026-02-21T09:23:31.1888241Z bfe.u32 %r9684, %r9683, 4, 14; 2026-02-21T09:23:31.1888301Z cvt.u64.u32 %rd603, %r9684; 2026-02-21T09:23:31.1888375Z or.b64 %rd19, %rd603, 4611686293372403712; 2026-02-21T09:23:31.1888437Z or.b32 %r9687, %r12479, %r9660; 2026-02-21T09:23:31.1888501Z xor.b32 %r9688, %r9687, %r12480; 2026-02-21T09:23:31.1888566Z or.b32 %r9689, %r9688, %r9652; 2026-02-21T09:23:31.1888627Z add.s32 %r164, %r1290, %r9689; 2026-02-21T09:23:31.1888690Z add.s32 %r165, %r164, 16384; 2026-02-21T09:23:31.1888754Z add.s32 %r166, %r164, 8192; 2026-02-21T09:23:31.1888820Z add.s32 %r167, %r164, 24576; 2026-02-21T09:23:31.1888889Z xor.b32 %r9690, %r9689, 32; 2026-02-21T09:23:31.1888950Z add.s32 %r168, %r1290, %r9690; 2026-02-21T09:23:31.1889015Z add.s32 %r169, %r168, 16384; 2026-02-21T09:23:31.1889075Z add.s32 %r170, %r168, 8192; 2026-02-21T09:23:31.1889133Z add.s32 %r171, %r168, 24576; 2026-02-21T09:23:31.1889197Z xor.b32 %r9691, %r9689, 64; 2026-02-21T09:23:31.1889259Z add.s32 %r172, %r1290, %r9691; 2026-02-21T09:23:31.1889318Z add.s32 %r173, %r172, 16384; 2026-02-21T09:23:31.1889377Z add.s32 %r174, %r172, 8192; 2026-02-21T09:23:31.1889441Z add.s32 %r175, %r172, 24576; 
2026-02-21T09:23:31.1889501Z xor.b32 %r9692, %r9689, 96; 2026-02-21T09:23:31.1889564Z add.s32 %r176, %r1290, %r9692; 2026-02-21T09:23:31.1889627Z add.s32 %r177, %r176, 16384; 2026-02-21T09:23:31.1889698Z add.s32 %r178, %r176, 8192; 2026-02-21T09:23:31.1889761Z add.s32 %r179, %r176, 24576; 2026-02-21T09:23:31.1889970Z .loc 1 31 88 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:31:88 2026-02-21T09:23:31.1890185Z mad.wide.u32 %rd20, %r23, 8, %rd102; 2026-02-21T09:23:31.1890299Z $L__BB0_11: // =>This Loop Header: Depth=1 2026-02-21T09:23:31.1890400Z // Child Loop BB0_12 Depth 2 2026-02-21T09:23:31.1890607Z .loc 1 35 31 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:35:31 2026-02-21T09:23:31.1890669Z shr.s32 %r9774, %r12875, 31; 2026-02-21T09:23:31.1890730Z shr.u32 %r9775, %r9774, 25; 2026-02-21T09:23:31.1890798Z add.s32 %r9776, %r12875, %r9775; 2026-02-21T09:23:31.1890995Z .loc 1 34 30 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:34:30 2026-02-21T09:23:31.1891061Z and.b32 %r9730, %r9776, -128; 2026-02-21T09:23:31.1891123Z sub.s32 %r9777, %r12875, %r9730; 2026-02-21T09:23:31.1891387Z .loc 1 36 27 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:36:27 2026-02-21T09:23:31.1891452Z shl.b32 %r12401, %r9777, 7; 2026-02-21T09:23:31.1891650Z .loc 1 37 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:37:32 2026-02-21T09:23:31.1891715Z or.b32 %r9778, %r12401, %r7; 2026-02-21T09:23:31.1891775Z or.b32 %r9779, %r12401, %r8; 2026-02-21T09:23:31.1891879Z or.b32 %r9780, %r12401, %r9; 2026-02-21T09:23:31.1891946Z or.b32 %r9781, %r12401, %r10; 2026-02-21T09:23:31.1892006Z or.b32 %r9782, %r12401, %r11; 2026-02-21T09:23:31.1892065Z or.b32 %r9783, %r12401, %r12; 2026-02-21T09:23:31.1892125Z or.b32 %r9784, %r12401, %r13; 2026-02-21T09:23:31.1892189Z or.b32 %r9785, %r12401, %r14; 2026-02-21T09:23:31.1892249Z or.b32 %r9786, %r12401, %r15; 2026-02-21T09:23:31.1892310Z or.b32 %r9787, %r12401, %r16; 2026-02-21T09:23:31.1892373Z or.b32 
%r9788, %r12401, %r17; 2026-02-21T09:23:31.1892434Z or.b32 %r9789, %r12401, %r18; 2026-02-21T09:23:31.1892493Z or.b32 %r9790, %r12401, %r19; 2026-02-21T09:23:31.1892558Z or.b32 %r9791, %r12401, %r20; 2026-02-21T09:23:31.1892618Z or.b32 %r9792, %r12401, %r21; 2026-02-21T09:23:31.1892682Z or.b32 %r9793, %r12401, %r22; 2026-02-21T09:23:31.1892877Z .loc 1 52 53 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:53 2026-02-21T09:23:31.1892942Z shl.b32 %r9794, %r9778, 10; 2026-02-21T09:23:31.1893005Z shl.b32 %r9795, %r9779, 10; 2026-02-21T09:23:31.1893067Z shl.b32 %r9796, %r9780, 10; 2026-02-21T09:23:31.1893130Z shl.b32 %r9797, %r9781, 10; 2026-02-21T09:23:31.1893188Z shl.b32 %r9798, %r9782, 10; 2026-02-21T09:23:31.1893249Z shl.b32 %r9799, %r9783, 10; 2026-02-21T09:23:31.1893311Z shl.b32 %r9800, %r9784, 10; 2026-02-21T09:23:31.1893377Z shl.b32 %r9801, %r9785, 10; 2026-02-21T09:23:31.1893437Z shl.b32 %r9802, %r9786, 10; 2026-02-21T09:23:31.1893509Z shl.b32 %r9803, %r9787, 10; 2026-02-21T09:23:31.1893575Z shl.b32 %r9804, %r9788, 10; 2026-02-21T09:23:31.1893635Z shl.b32 %r9805, %r9789, 10; 2026-02-21T09:23:31.1893696Z shl.b32 %r9806, %r9790, 10; 2026-02-21T09:23:31.1893755Z shl.b32 %r9807, %r9791, 10; 2026-02-21T09:23:31.1893821Z shl.b32 %r9808, %r9792, 10; 2026-02-21T09:23:31.1893880Z shl.b32 %r9809, %r9793, 10; 2026-02-21T09:23:31.1894076Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1894148Z add.s32 %r9694, %r1290, 73728; 2026-02-21T09:23:31.1894209Z // begin inline asm 2026-02-21T09:23:31.1894310Z @%p160 mbarrier.init.shared::cta.b64 [%r9694], 1; 2026-02-21T09:23:31.1894371Z // end inline asm 2026-02-21T09:23:31.1894429Z bar.sync 0; 2026-02-21T09:23:31.1894493Z add.s32 %r9695, %r1290, 73736; 2026-02-21T09:23:31.1894555Z // begin inline asm 2026-02-21T09:23:31.1894655Z @%p160 mbarrier.init.shared::cta.b64 [%r9695], 1; 2026-02-21T09:23:31.1894713Z // end inline asm 2026-02-21T09:23:31.1894912Z .loc 1 52 60 // 
cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:60 2026-02-21T09:23:31.1894977Z or.b32 %r9811, %r9794, %r24; 2026-02-21T09:23:31.1895036Z or.b32 %r9812, %r9795, %r24; 2026-02-21T09:23:31.1895218Z or.b32 %r9813, %r9796, %r24; 2026-02-21T09:23:31.1895280Z or.b32 %r9814, %r9797, %r24; 2026-02-21T09:23:31.1895340Z or.b32 %r9815, %r9798, %r24; 2026-02-21T09:23:31.1895400Z or.b32 %r9816, %r9799, %r24; 2026-02-21T09:23:31.1895462Z or.b32 %r9817, %r9800, %r24; 2026-02-21T09:23:31.1895531Z or.b32 %r9818, %r9801, %r24; 2026-02-21T09:23:31.1895592Z or.b32 %r9819, %r9802, %r24; 2026-02-21T09:23:31.1895663Z or.b32 %r9820, %r9803, %r24; 2026-02-21T09:23:31.1895731Z or.b32 %r9821, %r9804, %r24; 2026-02-21T09:23:31.1895790Z or.b32 %r9822, %r9805, %r24; 2026-02-21T09:23:31.1895851Z or.b32 %r9823, %r9806, %r24; 2026-02-21T09:23:31.1895912Z or.b32 %r9824, %r9807, %r24; 2026-02-21T09:23:31.1895976Z or.b32 %r9825, %r9808, %r24; 2026-02-21T09:23:31.1896085Z or.b32 %r9826, %r9809, %r24; 2026-02-21T09:23:31.1896285Z .loc 1 52 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:32 2026-02-21T09:23:31.1896366Z mad.wide.s32 %rd604, %r9811, 2, %rd102; 2026-02-21T09:23:31.1896442Z mad.wide.s32 %rd605, %r9812, 2, %rd102; 2026-02-21T09:23:31.1896618Z mad.wide.s32 %rd606, %r9813, 2, %rd102; 2026-02-21T09:23:31.1896689Z mad.wide.s32 %rd607, %r9814, 2, %rd102; 2026-02-21T09:23:31.1896849Z mad.wide.s32 %rd608, %r9815, 2, %rd102; 2026-02-21T09:23:31.1896923Z mad.wide.s32 %rd609, %r9816, 2, %rd102; 2026-02-21T09:23:31.1896993Z mad.wide.s32 %rd610, %r9817, 2, %rd102; 2026-02-21T09:23:31.1897065Z mad.wide.s32 %rd611, %r9818, 2, %rd102; 2026-02-21T09:23:31.1897132Z mad.wide.s32 %rd612, %r9819, 2, %rd102; 2026-02-21T09:23:31.1897201Z mad.wide.s32 %rd613, %r9820, 2, %rd102; 2026-02-21T09:23:31.1897282Z mad.wide.s32 %rd614, %r9821, 2, %rd102; 2026-02-21T09:23:31.1897353Z mad.wide.s32 %rd615, %r9822, 2, %rd102; 2026-02-21T09:23:31.1897422Z mad.wide.s32 %rd616, %r9823, 2, %rd102; 
2026-02-21T09:23:31.1897490Z mad.wide.s32 %rd617, %r9824, 2, %rd102; 2026-02-21T09:23:31.1897562Z mad.wide.s32 %rd618, %r9825, 2, %rd102; 2026-02-21T09:23:31.1897632Z mad.wide.s32 %rd619, %r9826, 2, %rd102; 2026-02-21T09:23:31.1897691Z mov.b32 %r9697, 8; 2026-02-21T09:23:31.1897897Z .loc 1 52 80 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:80 2026-02-21T09:23:31.1897960Z // begin inline asm 2026-02-21T09:23:31.1898104Z cp.async.ca.shared.global [ %r116 + 0 ], [ %rd604 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1898166Z // end inline asm 2026-02-21T09:23:31.1898228Z // begin inline asm 2026-02-21T09:23:31.1898369Z cp.async.ca.shared.global [ %r9698 + 0 ], [ %rd605 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1898428Z // end inline asm 2026-02-21T09:23:31.1898492Z // begin inline asm 2026-02-21T09:23:31.1898625Z cp.async.ca.shared.global [ %r9700 + 0 ], [ %rd606 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1898683Z // end inline asm 2026-02-21T09:23:31.1898744Z // begin inline asm 2026-02-21T09:23:31.1898876Z cp.async.ca.shared.global [ %r9702 + 0 ], [ %rd607 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1898936Z // end inline asm 2026-02-21T09:23:31.1898994Z // begin inline asm 2026-02-21T09:23:31.1899137Z cp.async.ca.shared.global [ %r9704 + 0 ], [ %rd608 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1899194Z // end inline asm 2026-02-21T09:23:31.1899252Z // begin inline asm 2026-02-21T09:23:31.1899387Z cp.async.ca.shared.global [ %r9706 + 0 ], [ %rd609 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1899444Z // end inline asm 2026-02-21T09:23:31.1899507Z // begin inline asm 2026-02-21T09:23:31.1899643Z cp.async.ca.shared.global [ %r9708 + 0 ], [ %rd610 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1899700Z // end inline asm 2026-02-21T09:23:31.1899759Z // begin inline asm 2026-02-21T09:23:31.1899889Z cp.async.ca.shared.global [ %r9710 + 0 ], [ %rd611 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1899952Z // end inline asm 2026-02-21T09:23:31.1900012Z // begin inline asm 
2026-02-21T09:23:31.1900155Z cp.async.ca.shared.global [ %r9712 + 0 ], [ %rd612 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1900218Z // end inline asm 2026-02-21T09:23:31.1900425Z // begin inline asm 2026-02-21T09:23:31.1900556Z cp.async.ca.shared.global [ %r9714 + 0 ], [ %rd613 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1900614Z // end inline asm 2026-02-21T09:23:31.1900677Z // begin inline asm 2026-02-21T09:23:31.1900807Z cp.async.ca.shared.global [ %r9716 + 0 ], [ %rd614 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1900865Z // end inline asm 2026-02-21T09:23:31.1900927Z // begin inline asm 2026-02-21T09:23:31.1901056Z cp.async.ca.shared.global [ %r9718 + 0 ], [ %rd615 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1901114Z // end inline asm 2026-02-21T09:23:31.1901179Z // begin inline asm 2026-02-21T09:23:31.1901309Z cp.async.ca.shared.global [ %r9720 + 0 ], [ %rd616 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1901365Z // end inline asm 2026-02-21T09:23:31.1901489Z // begin inline asm 2026-02-21T09:23:31.1901625Z cp.async.ca.shared.global [ %r9722 + 0 ], [ %rd617 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1901684Z // end inline asm 2026-02-21T09:23:31.1901744Z // begin inline asm 2026-02-21T09:23:31.1901881Z cp.async.ca.shared.global [ %r9724 + 0 ], [ %rd618 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1901938Z // end inline asm 2026-02-21T09:23:31.1901996Z // begin inline asm 2026-02-21T09:23:31.1902197Z cp.async.ca.shared.global [ %r9726 + 0 ], [ %rd619 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1902262Z // end inline asm 2026-02-21T09:23:31.1902332Z cp.async.commit_group; 2026-02-21T09:23:31.1902543Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1902604Z bar.sync 0; 2026-02-21T09:23:31.1902663Z // begin inline asm 2026-02-21T09:23:31.1902797Z @%p160 mbarrier.arrive.expect_tx.shared.b64 _, [%r9694], 4096; 2026-02-21T09:23:31.1902859Z // end inline asm 2026-02-21T09:23:31.1903060Z .loc 1 58 33 // 
cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:58:33 2026-02-21T09:23:31.1903115Z bar.sync 0; 2026-02-21T09:23:31.1903185Z elect.sync %r9827|%p167, -1; 2026-02-21T09:23:31.1903260Z and.pred %p163, %p1, %p167; 2026-02-21T09:23:31.1903325Z add.s32 %r9729, %r1290, 65536; 2026-02-21T09:23:31.1903382Z mov.b32 %r9731, 0; 2026-02-21T09:23:31.1903444Z // begin inline asm 2026-02-21T09:23:31.1903775Z @%p163 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r9729], [%rd463, {%r9730, %r9731}], [%r9694]; 2026-02-21T09:23:31.1903834Z // end inline asm 2026-02-21T09:23:31.1904037Z .loc 1 52 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:32 2026-02-21T09:23:31.1904102Z cvt.s64.s32 %rd639, %r9794; 2026-02-21T09:23:31.1904166Z or.b64 %rd641, %rd639, %rd754; 2026-02-21T09:23:31.1904229Z shl.b64 %rd642, %rd641, 1; 2026-02-21T09:23:31.1904297Z add.s64 %rd643, %rd102, %rd642; 2026-02-21T09:23:31.1904359Z add.s64 %rd621, %rd643, 128; 2026-02-21T09:23:31.1904422Z cvt.s64.s32 %rd644, %r9795; 2026-02-21T09:23:31.1904493Z or.b64 %rd645, %rd644, %rd754; 2026-02-21T09:23:31.1904563Z shl.b64 %rd646, %rd645, 1; 2026-02-21T09:23:31.1904632Z add.s64 %rd647, %rd102, %rd646; 2026-02-21T09:23:31.1904694Z add.s64 %rd622, %rd647, 128; 2026-02-21T09:23:31.1904760Z cvt.s64.s32 %rd648, %r9796; 2026-02-21T09:23:31.1904824Z or.b64 %rd649, %rd648, %rd754; 2026-02-21T09:23:31.1904885Z shl.b64 %rd650, %rd649, 1; 2026-02-21T09:23:31.1904951Z add.s64 %rd651, %rd102, %rd650; 2026-02-21T09:23:31.1905012Z add.s64 %rd623, %rd651, 128; 2026-02-21T09:23:31.1905073Z cvt.s64.s32 %rd652, %r9797; 2026-02-21T09:23:31.1905139Z or.b64 %rd653, %rd652, %rd754; 2026-02-21T09:23:31.1905201Z shl.b64 %rd654, %rd653, 1; 2026-02-21T09:23:31.1905262Z add.s64 %rd655, %rd102, %rd654; 2026-02-21T09:23:31.1905322Z add.s64 %rd624, %rd655, 128; 2026-02-21T09:23:31.1905387Z cvt.s64.s32 %rd656, %r9798; 2026-02-21T09:23:31.1905449Z or.b64 %rd657, %rd656, %rd754; 
2026-02-21T09:23:31.1905508Z shl.b64 %rd658, %rd657, 1; 2026-02-21T09:23:31.1905575Z add.s64 %rd659, %rd102, %rd658; 2026-02-21T09:23:31.1905637Z add.s64 %rd625, %rd659, 128; 2026-02-21T09:23:31.1905808Z cvt.s64.s32 %rd660, %r9799; 2026-02-21T09:23:31.1905870Z or.b64 %rd661, %rd660, %rd754; 2026-02-21T09:23:31.1905935Z shl.b64 %rd662, %rd661, 1; 2026-02-21T09:23:31.1905997Z add.s64 %rd663, %rd102, %rd662; 2026-02-21T09:23:31.1906059Z add.s64 %rd626, %rd663, 128; 2026-02-21T09:23:31.1906124Z cvt.s64.s32 %rd664, %r9800; 2026-02-21T09:23:31.1906185Z or.b64 %rd665, %rd664, %rd754; 2026-02-21T09:23:31.1906246Z shl.b64 %rd666, %rd665, 1; 2026-02-21T09:23:31.1906308Z add.s64 %rd667, %rd102, %rd666; 2026-02-21T09:23:31.1906373Z add.s64 %rd627, %rd667, 128; 2026-02-21T09:23:31.1906434Z cvt.s64.s32 %rd668, %r9801; 2026-02-21T09:23:31.1906618Z or.b64 %rd669, %rd668, %rd754; 2026-02-21T09:23:31.1906689Z shl.b64 %rd670, %rd669, 1; 2026-02-21T09:23:31.1906843Z add.s64 %rd671, %rd102, %rd670; 2026-02-21T09:23:31.1906907Z add.s64 %rd628, %rd671, 128; 2026-02-21T09:23:31.1906968Z cvt.s64.s32 %rd672, %r9802; 2026-02-21T09:23:31.1907034Z or.b64 %rd673, %rd672, %rd754; 2026-02-21T09:23:31.1907113Z shl.b64 %rd674, %rd673, 1; 2026-02-21T09:23:31.1907177Z add.s64 %rd675, %rd102, %rd674; 2026-02-21T09:23:31.1907243Z add.s64 %rd629, %rd675, 128; 2026-02-21T09:23:31.1907303Z cvt.s64.s32 %rd676, %r9803; 2026-02-21T09:23:31.1907429Z or.b64 %rd677, %rd676, %rd754; 2026-02-21T09:23:31.1907498Z shl.b64 %rd678, %rd677, 1; 2026-02-21T09:23:31.1907560Z add.s64 %rd679, %rd102, %rd678; 2026-02-21T09:23:31.1907621Z add.s64 %rd630, %rd679, 128; 2026-02-21T09:23:31.1907682Z cvt.s64.s32 %rd680, %r9804; 2026-02-21T09:23:31.1907748Z or.b64 %rd681, %rd680, %rd754; 2026-02-21T09:23:31.1907810Z shl.b64 %rd682, %rd681, 1; 2026-02-21T09:23:31.1907873Z add.s64 %rd683, %rd102, %rd682; 2026-02-21T09:23:31.1907939Z add.s64 %rd631, %rd683, 128; 2026-02-21T09:23:31.1908001Z cvt.s64.s32 %rd684, %r9805; 
2026-02-21T09:23:31.1908063Z or.b64 %rd685, %rd684, %rd754; 2026-02-21T09:23:31.1908124Z shl.b64 %rd686, %rd685, 1; 2026-02-21T09:23:31.1908192Z add.s64 %rd687, %rd102, %rd686; 2026-02-21T09:23:31.1908354Z add.s64 %rd632, %rd687, 128; 2026-02-21T09:23:31.1908420Z cvt.s64.s32 %rd688, %r9806; 2026-02-21T09:23:31.1908487Z or.b64 %rd689, %rd688, %rd754; 2026-02-21T09:23:31.1908548Z shl.b64 %rd690, %rd689, 1; 2026-02-21T09:23:31.1908613Z add.s64 %rd691, %rd102, %rd690; 2026-02-21T09:23:31.1908675Z add.s64 %rd633, %rd691, 128; 2026-02-21T09:23:31.1908741Z cvt.s64.s32 %rd692, %r9807; 2026-02-21T09:23:31.1908803Z or.b64 %rd693, %rd692, %rd754; 2026-02-21T09:23:31.1908865Z shl.b64 %rd694, %rd693, 1; 2026-02-21T09:23:31.1908932Z add.s64 %rd695, %rd102, %rd694; 2026-02-21T09:23:31.1908994Z add.s64 %rd634, %rd695, 128; 2026-02-21T09:23:31.1909061Z cvt.s64.s32 %rd696, %r9808; 2026-02-21T09:23:31.1909126Z or.b64 %rd697, %rd696, %rd754; 2026-02-21T09:23:31.1909190Z shl.b64 %rd698, %rd697, 1; 2026-02-21T09:23:31.1909252Z add.s64 %rd699, %rd102, %rd698; 2026-02-21T09:23:31.1909313Z add.s64 %rd635, %rd699, 128; 2026-02-21T09:23:31.1909380Z cvt.s64.s32 %rd700, %r9809; 2026-02-21T09:23:31.1909444Z or.b64 %rd701, %rd700, %rd754; 2026-02-21T09:23:31.1909504Z shl.b64 %rd702, %rd701, 1; 2026-02-21T09:23:31.1909573Z add.s64 %rd703, %rd102, %rd702; 2026-02-21T09:23:31.1909634Z add.s64 %rd636, %rd703, 128; 2026-02-21T09:23:31.1909843Z .loc 1 52 80 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:80 2026-02-21T09:23:31.1909906Z // begin inline asm 2026-02-21T09:23:31.1910048Z cp.async.ca.shared.global [ %r9733 + 0 ], [ %rd621 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1910106Z // end inline asm 2026-02-21T09:23:31.1910164Z // begin inline asm 2026-02-21T09:23:31.1910303Z cp.async.ca.shared.global [ %r9735 + 0 ], [ %rd622 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1910361Z // end inline asm 2026-02-21T09:23:31.1910420Z // begin inline asm 2026-02-21T09:23:31.1910551Z 
cp.async.ca.shared.global [ %r9737 + 0 ], [ %rd623 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1910611Z // end inline asm 2026-02-21T09:23:31.1910816Z // begin inline asm 2026-02-21T09:23:31.1910948Z cp.async.ca.shared.global [ %r9739 + 0 ], [ %rd624 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1911010Z // end inline asm 2026-02-21T09:23:31.1911069Z // begin inline asm 2026-02-21T09:23:31.1911202Z cp.async.ca.shared.global [ %r9741 + 0 ], [ %rd625 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1911264Z // end inline asm 2026-02-21T09:23:31.1911323Z // begin inline asm 2026-02-21T09:23:31.1911455Z cp.async.ca.shared.global [ %r9743 + 0 ], [ %rd626 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1911512Z // end inline asm 2026-02-21T09:23:31.1911584Z // begin inline asm 2026-02-21T09:23:31.1911719Z cp.async.ca.shared.global [ %r9745 + 0 ], [ %rd627 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1911778Z // end inline asm 2026-02-21T09:23:31.1911841Z // begin inline asm 2026-02-21T09:23:31.1912023Z cp.async.ca.shared.global [ %r9747 + 0 ], [ %rd628 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1912081Z // end inline asm 2026-02-21T09:23:31.1912139Z // begin inline asm 2026-02-21T09:23:31.1912274Z cp.async.ca.shared.global [ %r9749 + 0 ], [ %rd629 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1912333Z // end inline asm 2026-02-21T09:23:31.1912392Z // begin inline asm 2026-02-21T09:23:31.1912571Z cp.async.ca.shared.global [ %r9751 + 0 ], [ %rd630 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1912630Z // end inline asm 2026-02-21T09:23:31.1912690Z // begin inline asm 2026-02-21T09:23:31.1912825Z cp.async.ca.shared.global [ %r9753 + 0 ], [ %rd631 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1912881Z // end inline asm 2026-02-21T09:23:31.1912939Z // begin inline asm 2026-02-21T09:23:31.1913068Z cp.async.ca.shared.global [ %r9755 + 0 ], [ %rd632 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1913131Z // end inline asm 2026-02-21T09:23:31.1913192Z // begin inline asm 2026-02-21T09:23:31.1913322Z cp.async.ca.shared.global [ %r9757 + 0 
], [ %rd633 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1913387Z // end inline asm 2026-02-21T09:23:31.1913446Z // begin inline asm 2026-02-21T09:23:31.1913577Z cp.async.ca.shared.global [ %r9759 + 0 ], [ %rd634 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1913636Z // end inline asm 2026-02-21T09:23:31.1913699Z // begin inline asm 2026-02-21T09:23:31.1913829Z cp.async.ca.shared.global [ %r9761 + 0 ], [ %rd635 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1913887Z // end inline asm 2026-02-21T09:23:31.1913955Z // begin inline asm 2026-02-21T09:23:31.1914086Z cp.async.ca.shared.global [ %r9763 + 0 ], [ %rd636 + 0 ], 0x8, %r9697; 2026-02-21T09:23:31.1914143Z // end inline asm 2026-02-21T09:23:31.1914213Z cp.async.commit_group; 2026-02-21T09:23:31.1914415Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1914472Z bar.sync 0; 2026-02-21T09:23:31.1914530Z // begin inline asm 2026-02-21T09:23:31.1914667Z @%p160 mbarrier.arrive.expect_tx.shared.b64 _, [%r9695], 4096; 2026-02-21T09:23:31.1914723Z // end inline asm 2026-02-21T09:23:31.1914922Z .loc 1 58 33 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:58:33 2026-02-21T09:23:31.1914986Z bar.sync 0; 2026-02-21T09:23:31.1915054Z elect.sync %r9828|%p168, -1; 2026-02-21T09:23:31.1915121Z and.pred %p165, %p1, %p168; 2026-02-21T09:23:31.1915183Z add.s32 %r9766, %r1290, 69632; 2026-02-21T09:23:31.1915248Z mov.b32 %r9768, 32; 2026-02-21T09:23:31.1915308Z // begin inline asm 2026-02-21T09:23:31.1915632Z @%p165 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r9766], [%rd463, {%r9730, %r9768}], [%r9695]; 2026-02-21T09:23:31.1915697Z // end inline asm 2026-02-21T09:23:31.1915895Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1915958Z shl.b32 %r9829, %r12875, 7; 2026-02-21T09:23:31.1916030Z or.b32 %r9830, %r22, %r9829; 2026-02-21T09:23:31.1916099Z shl.b32 %r9831, %r9776, 7; 2026-02-21T09:23:31.1916165Z and.b32 
%r9832, %r9831, -16384; 2026-02-21T09:23:31.1916229Z sub.s32 %r9833, %r9830, %r9832; 2026-02-21T09:23:31.1916409Z shl.b32 %r9834, %r9833, 10; 2026-02-21T09:23:31.1916585Z mul.wide.s32 %rd704, %r9834, 2; 2026-02-21T09:23:31.1916652Z or.b64 %rd82, %rd704, 256; 2026-02-21T09:23:31.1916717Z or.b32 %r9835, %r21, %r9829; 2026-02-21T09:23:31.1916779Z sub.s32 %r9836, %r9835, %r9832; 2026-02-21T09:23:31.1916843Z shl.b32 %r9837, %r9836, 10; 2026-02-21T09:23:31.1916915Z mul.wide.s32 %rd705, %r9837, 2; 2026-02-21T09:23:31.1916976Z or.b64 %rd83, %rd705, 256; 2026-02-21T09:23:31.1917036Z or.b32 %r9838, %r20, %r9829; 2026-02-21T09:23:31.1917096Z sub.s32 %r9839, %r9838, %r9832; 2026-02-21T09:23:31.1917159Z shl.b32 %r9840, %r9839, 10; 2026-02-21T09:23:31.1917222Z mul.wide.s32 %rd706, %r9840, 2; 2026-02-21T09:23:31.1917282Z or.b64 %rd84, %rd706, 256; 2026-02-21T09:23:31.1917343Z or.b32 %r9841, %r19, %r9829; 2026-02-21T09:23:31.1917494Z sub.s32 %r9842, %r9841, %r9832; 2026-02-21T09:23:31.1917559Z shl.b32 %r9843, %r9842, 10; 2026-02-21T09:23:31.1917623Z mul.wide.s32 %rd707, %r9843, 2; 2026-02-21T09:23:31.1917690Z or.b64 %rd85, %rd707, 256; 2026-02-21T09:23:31.1917748Z or.b32 %r9844, %r18, %r9829; 2026-02-21T09:23:31.1917808Z sub.s32 %r9845, %r9844, %r9832; 2026-02-21T09:23:31.1917874Z shl.b32 %r9846, %r9845, 10; 2026-02-21T09:23:31.1918003Z mul.wide.s32 %rd708, %r9846, 2; 2026-02-21T09:23:31.1918067Z or.b64 %rd86, %rd708, 256; 2026-02-21T09:23:31.1918128Z or.b32 %r9847, %r17, %r9829; 2026-02-21T09:23:31.1918193Z sub.s32 %r9848, %r9847, %r9832; 2026-02-21T09:23:31.1918254Z shl.b32 %r9849, %r9848, 10; 2026-02-21T09:23:31.1918317Z mul.wide.s32 %rd709, %r9849, 2; 2026-02-21T09:23:31.1918382Z or.b64 %rd87, %rd709, 256; 2026-02-21T09:23:31.1918441Z or.b32 %r9850, %r16, %r9829; 2026-02-21T09:23:31.1918502Z sub.s32 %r9851, %r9850, %r9832; 2026-02-21T09:23:31.1918569Z shl.b32 %r9852, %r9851, 10; 2026-02-21T09:23:31.1918633Z mul.wide.s32 %rd710, %r9852, 2; 2026-02-21T09:23:31.1918693Z or.b64 
%rd88, %rd710, 256; 2026-02-21T09:23:31.1918757Z or.b32 %r9853, %r15, %r9829; 2026-02-21T09:23:31.1918826Z sub.s32 %r9854, %r9853, %r9832; 2026-02-21T09:23:31.1918887Z shl.b32 %r9855, %r9854, 10; 2026-02-21T09:23:31.1918952Z mul.wide.s32 %rd711, %r9855, 2; 2026-02-21T09:23:31.1919017Z or.b64 %rd89, %rd711, 256; 2026-02-21T09:23:31.1919079Z or.b32 %r9856, %r14, %r9829; 2026-02-21T09:23:31.1919140Z sub.s32 %r9857, %r9856, %r9832; 2026-02-21T09:23:31.1919202Z shl.b32 %r9858, %r9857, 10; 2026-02-21T09:23:31.1919272Z mul.wide.s32 %rd712, %r9858, 2; 2026-02-21T09:23:31.1919333Z or.b64 %rd90, %rd712, 256; 2026-02-21T09:23:31.1919395Z or.b32 %r9859, %r13, %r9829; 2026-02-21T09:23:31.1919463Z sub.s32 %r9860, %r9859, %r9832; 2026-02-21T09:23:31.1919523Z shl.b32 %r9861, %r9860, 10; 2026-02-21T09:23:31.1919589Z mul.wide.s32 %rd713, %r9861, 2; 2026-02-21T09:23:31.1919651Z or.b64 %rd91, %rd713, 256; 2026-02-21T09:23:31.1919717Z or.b32 %r9862, %r12, %r9829; 2026-02-21T09:23:31.1919793Z sub.s32 %r9863, %r9862, %r9832; 2026-02-21T09:23:31.1919856Z shl.b32 %r9864, %r9863, 10; 2026-02-21T09:23:31.1919931Z mul.wide.s32 %rd714, %r9864, 2; 2026-02-21T09:23:31.1919990Z or.b64 %rd92, %rd714, 256; 2026-02-21T09:23:31.1920050Z or.b32 %r9865, %r11, %r9829; 2026-02-21T09:23:31.1920121Z sub.s32 %r9866, %r9865, %r9832; 2026-02-21T09:23:31.1920182Z shl.b32 %r9867, %r9866, 10; 2026-02-21T09:23:31.1920247Z mul.wide.s32 %rd715, %r9867, 2; 2026-02-21T09:23:31.1920307Z or.b64 %rd93, %rd715, 256; 2026-02-21T09:23:31.1920374Z or.b32 %r9868, %r10, %r9829; 2026-02-21T09:23:31.1920439Z sub.s32 %r9869, %r9868, %r9832; 2026-02-21T09:23:31.1920498Z shl.b32 %r9870, %r9869, 10; 2026-02-21T09:23:31.1920567Z mul.wide.s32 %rd716, %r9870, 2; 2026-02-21T09:23:31.1920630Z or.b64 %rd94, %rd716, 256; 2026-02-21T09:23:31.1920690Z or.b32 %r9871, %r9, %r9829; 2026-02-21T09:23:31.1920755Z sub.s32 %r9872, %r9871, %r9832; 2026-02-21T09:23:31.1920823Z shl.b32 %r9873, %r9872, 10; 2026-02-21T09:23:31.1920889Z mul.wide.s32 
%rd717, %r9873, 2; 2026-02-21T09:23:31.1920950Z or.b64 %rd95, %rd717, 256; 2026-02-21T09:23:31.1921161Z or.b32 %r9874, %r8, %r9829; 2026-02-21T09:23:31.1921221Z sub.s32 %r9875, %r9874, %r9832; 2026-02-21T09:23:31.1921280Z shl.b32 %r9876, %r9875, 10; 2026-02-21T09:23:31.1921344Z mul.wide.s32 %rd718, %r9876, 2; 2026-02-21T09:23:31.1921412Z or.b64 %rd96, %rd718, 256; 2026-02-21T09:23:31.1921473Z or.b32 %r9877, %r7, %r9829; 2026-02-21T09:23:31.1921534Z sub.s32 %r9878, %r9877, %r9832; 2026-02-21T09:23:31.1921599Z shl.b32 %r9879, %r9878, 10; 2026-02-21T09:23:31.1921662Z mul.wide.s32 %rd719, %r9879, 2; 2026-02-21T09:23:31.1921722Z or.b64 %rd97, %rd719, 256; 2026-02-21T09:23:31.1921788Z mov.b32 %r12879, 0f00000000; 2026-02-21T09:23:31.1921849Z mov.b32 %r12878, 1; 2026-02-21T09:23:31.1921909Z mov.b32 %r12877, -1; 2026-02-21T09:23:31.1921970Z mov.b64 %rd762, 0; 2026-02-21T09:23:31.1922114Z mov.b64 %rd761, %rd20; 2026-02-21T09:23:31.1922179Z mov.b32 %r12876, %r9731; 2026-02-21T09:23:31.1922241Z mov.b32 %r12880, %r12879; 2026-02-21T09:23:31.1922308Z mov.b32 %r12881, %r12879; 2026-02-21T09:23:31.1922370Z mov.b32 %r12882, %r12879; 2026-02-21T09:23:31.1922429Z mov.b32 %r12883, %r12879; 2026-02-21T09:23:31.1922487Z mov.b32 %r12884, %r12879; 2026-02-21T09:23:31.1922552Z mov.b32 %r12885, %r12879; 2026-02-21T09:23:31.1922657Z mov.b32 %r12886, %r12879; 2026-02-21T09:23:31.1922718Z mov.b32 %r12887, %r12879; 2026-02-21T09:23:31.1922782Z mov.b32 %r12888, %r12879; 2026-02-21T09:23:31.1922841Z mov.b32 %r12889, %r12879; 2026-02-21T09:23:31.1922900Z mov.b32 %r12890, %r12879; 2026-02-21T09:23:31.1922959Z mov.b32 %r12891, %r12879; 2026-02-21T09:23:31.1923023Z mov.b32 %r12892, %r12879; 2026-02-21T09:23:31.1923083Z mov.b32 %r12893, %r12879; 2026-02-21T09:23:31.1923143Z mov.b32 %r12894, %r12879; 2026-02-21T09:23:31.1923206Z mov.b32 %r12895, %r12879; 2026-02-21T09:23:31.1923266Z mov.b32 %r12896, %r12879; 2026-02-21T09:23:31.1923324Z mov.b32 %r12897, %r12879; 2026-02-21T09:23:31.1923383Z mov.b32 
%r12898, %r12879; 2026-02-21T09:23:31.1923448Z mov.b32 %r12899, %r12879; 2026-02-21T09:23:31.1923510Z mov.b32 %r12900, %r12879; 2026-02-21T09:23:31.1923569Z mov.b32 %r12901, %r12879; 2026-02-21T09:23:31.1923632Z mov.b32 %r12902, %r12879; 2026-02-21T09:23:31.1923694Z mov.b32 %r12903, %r12879; 2026-02-21T09:23:31.1923754Z mov.b32 %r12904, %r12879; 2026-02-21T09:23:31.1923812Z mov.b32 %r12905, %r12879; 2026-02-21T09:23:31.1923875Z mov.b32 %r12906, %r12879; 2026-02-21T09:23:31.1923934Z mov.b32 %r12907, %r12879; 2026-02-21T09:23:31.1923994Z mov.b32 %r12908, %r12879; 2026-02-21T09:23:31.1934564Z mov.b32 %r12909, %r12879; 2026-02-21T09:23:31.1934687Z mov.b32 %r12910, %r12879; 2026-02-21T09:23:31.1934759Z mov.b32 %r12911, %r12879; 2026-02-21T09:23:31.1934824Z mov.b32 %r12912, %r12879; 2026-02-21T09:23:31.1934898Z mov.b32 %r12913, %r12879; 2026-02-21T09:23:31.1934969Z mov.b32 %r12914, %r12879; 2026-02-21T09:23:31.1935041Z mov.b32 %r12915, %r12879; 2026-02-21T09:23:31.1935125Z mov.b32 %r12916, %r12879; 2026-02-21T09:23:31.1935189Z mov.b32 %r12917, %r12879; 2026-02-21T09:23:31.1935253Z mov.b32 %r12918, %r12879; 2026-02-21T09:23:31.1935314Z mov.b32 %r12919, %r12879; 2026-02-21T09:23:31.1935384Z mov.b32 %r12920, %r12879; 2026-02-21T09:23:31.1935446Z mov.b32 %r12921, %r12879; 2026-02-21T09:23:31.1935509Z mov.b32 %r12922, %r12879; 2026-02-21T09:23:31.1935578Z mov.b32 %r12923, %r12879; 2026-02-21T09:23:31.1935640Z mov.b32 %r12924, %r12879; 2026-02-21T09:23:31.1935702Z mov.b32 %r12925, %r12879; 2026-02-21T09:23:31.1935764Z mov.b32 %r12926, %r12879; 2026-02-21T09:23:31.1935834Z mov.b32 %r12927, %r12879; 2026-02-21T09:23:31.1935896Z mov.b32 %r12928, %r12879; 2026-02-21T09:23:31.1935958Z mov.b32 %r12929, %r12879; 2026-02-21T09:23:31.1936025Z mov.b32 %r12930, %r12879; 2026-02-21T09:23:31.1936091Z mov.b32 %r12931, %r12879; 2026-02-21T09:23:31.1936156Z mov.b32 %r12932, %r12879; 2026-02-21T09:23:31.1936217Z mov.b32 %r12933, %r12879; 2026-02-21T09:23:31.1936286Z mov.b32 %r12934, %r12879; 
2026-02-21T09:23:31.1936712Z mov.b32 %r12935, %r12879; 2026-02-21T09:23:31.1936869Z mov.b32 %r12936, %r12879; 2026-02-21T09:23:31.1936938Z mov.b32 %r12937, %r12879; 2026-02-21T09:23:31.1936999Z mov.b32 %r12938, %r12879; 2026-02-21T09:23:31.1937061Z mov.b32 %r12939, %r12879; 2026-02-21T09:23:31.1937139Z mov.b32 %r12940, %r12879; 2026-02-21T09:23:31.1937210Z mov.b32 %r12941, %r12879; 2026-02-21T09:23:31.1937273Z mov.b32 %r12942, %r12879; 2026-02-21T09:23:31.1937337Z mov.b32 %r12943, %r12879; 2026-02-21T09:23:31.1937405Z mov.b32 %r12944, %r12879; 2026-02-21T09:23:31.1937468Z mov.b32 %r12945, %r12879; 2026-02-21T09:23:31.1937529Z mov.b32 %r12946, %r12879; 2026-02-21T09:23:31.1937590Z mov.b32 %r12947, %r12879; 2026-02-21T09:23:31.1937659Z mov.b32 %r12948, %r12879; 2026-02-21T09:23:31.1937721Z mov.b32 %r12949, %r12879; 2026-02-21T09:23:31.1937869Z mov.b32 %r12950, %r12879; 2026-02-21T09:23:31.1937941Z mov.b32 %r12951, %r12879; 2026-02-21T09:23:31.1938006Z mov.b32 %r12952, %r12879; 2026-02-21T09:23:31.1938073Z mov.b32 %r12953, %r12879; 2026-02-21T09:23:31.1938138Z mov.b32 %r12954, %r12879; 2026-02-21T09:23:31.1938205Z mov.b32 %r12955, %r12879; 2026-02-21T09:23:31.1938266Z mov.b32 %r12956, %r12879; 2026-02-21T09:23:31.1938327Z mov.b32 %r12957, %r12879; 2026-02-21T09:23:31.1938457Z mov.b32 %r12958, %r12879; 2026-02-21T09:23:31.1938523Z mov.b32 %r12959, %r12879; 2026-02-21T09:23:31.1938585Z mov.b32 %r12960, %r12879; 2026-02-21T09:23:31.1938653Z mov.b32 %r12961, %r12879; 2026-02-21T09:23:31.1938716Z mov.b32 %r12962, %r12879; 2026-02-21T09:23:31.1938779Z mov.b32 %r12963, %r12879; 2026-02-21T09:23:31.1938840Z mov.b32 %r12964, %r12879; 2026-02-21T09:23:31.1938919Z mov.b32 %r12965, %r12879; 2026-02-21T09:23:31.1938981Z mov.b32 %r12966, %r12879; 2026-02-21T09:23:31.1939049Z mov.b32 %r12967, %r12879; 2026-02-21T09:23:31.1939117Z mov.b32 %r12968, %r12879; 2026-02-21T09:23:31.1939177Z mov.b32 %r12969, %r12879; 2026-02-21T09:23:31.1939240Z mov.b32 %r12970, %r12879; 
2026-02-21T09:23:31.1939306Z mov.b32 %r12971, %r12879; 2026-02-21T09:23:31.1939373Z mov.b32 %r12972, %r12879; 2026-02-21T09:23:31.1939434Z mov.b32 %r12973, %r12879; 2026-02-21T09:23:31.1939496Z mov.b32 %r12974, %r12879; 2026-02-21T09:23:31.1939562Z mov.b32 %r12975, %r12879; 2026-02-21T09:23:31.1939625Z mov.b32 %r12976, %r12879; 2026-02-21T09:23:31.1939686Z mov.b32 %r12977, %r12879; 2026-02-21T09:23:31.1939747Z mov.b32 %r12978, %r12879; 2026-02-21T09:23:31.1939813Z mov.b32 %r12979, %r12879; 2026-02-21T09:23:31.1939878Z mov.b32 %r12980, %r12879; 2026-02-21T09:23:31.1939940Z mov.b32 %r12981, %r12879; 2026-02-21T09:23:31.1940007Z mov.b32 %r12982, %r12879; 2026-02-21T09:23:31.1940069Z mov.b32 %r12983, %r12879; 2026-02-21T09:23:31.1940131Z mov.b32 %r12984, %r12879; 2026-02-21T09:23:31.1940191Z mov.b32 %r12985, %r12879; 2026-02-21T09:23:31.1940259Z mov.b32 %r12986, %r12879; 2026-02-21T09:23:31.1940321Z mov.b32 %r12987, %r12879; 2026-02-21T09:23:31.1940383Z mov.b32 %r12988, %r12879; 2026-02-21T09:23:31.1940452Z mov.b32 %r12989, %r12879; 2026-02-21T09:23:31.1940512Z mov.b32 %r12990, %r12879; 2026-02-21T09:23:31.1940575Z mov.b32 %r12991, %r12879; 2026-02-21T09:23:31.1940638Z mov.b32 %r12992, %r12879; 2026-02-21T09:23:31.1940706Z mov.b32 %r12993, %r12879; 2026-02-21T09:23:31.1940769Z mov.b32 %r12994, %r12879; 2026-02-21T09:23:31.1940832Z mov.b32 %r12995, %r12879; 2026-02-21T09:23:31.1940899Z mov.b32 %r12996, %r12879; 2026-02-21T09:23:31.1940961Z mov.b32 %r12997, %r12879; 2026-02-21T09:23:31.1941022Z mov.b32 %r12998, %r12879; 2026-02-21T09:23:31.1941085Z mov.b32 %r12999, %r12879; 2026-02-21T09:23:31.1941152Z mov.b32 %r13000, %r12879; 2026-02-21T09:23:31.1941214Z mov.b32 %r13001, %r12879; 2026-02-21T09:23:31.1941275Z mov.b32 %r13002, %r12879; 2026-02-21T09:23:31.1941345Z mov.b32 %r13003, %r12879; 2026-02-21T09:23:31.1941406Z mov.b32 %r13004, %r12879; 2026-02-21T09:23:31.1941468Z mov.b32 %r13005, %r12879; 2026-02-21T09:23:31.1941534Z mov.b32 %r13006, %r12879; 
2026-02-21T09:23:31.1941829Z $L__BB0_12: // Parent Loop BB0_11 Depth=1 2026-02-21T09:23:31.1941947Z // => This Inner Loop Header: Depth=2 2026-02-21T09:23:31.1942026Z setp.lt.u64 %p189, %rd762, 448; 2026-02-21T09:23:31.1942105Z add.s32 %r12293, %r12877, 1; 2026-02-21T09:23:31.1942181Z setp.gt.s32 %p190, %r12293, 1; 2026-02-21T09:23:31.1942257Z selp.b32 %r12877, 0, %r12293, %p190; 2026-02-21T09:23:31.1942332Z selp.b32 %r12294, 1, 0, %p190; 2026-02-21T09:23:31.1942402Z xor.b32 %r12876, %r12876, %r12294; 2026-02-21T09:23:31.1942634Z .loc 1 52 80 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:80 2026-02-21T09:23:31.1942710Z cp.async.wait_group 1; 2026-02-21T09:23:31.1963581Z bar.sync 0; 2026-02-21T09:23:31.1963889Z shl.b32 %r12295, %r12877, 14; 2026-02-21T09:23:31.1963982Z add.s32 %r12297, %r1290, %r12295; 2026-02-21T09:23:31.1964236Z .loc 1 56 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:56:32 2026-02-21T09:23:31.1964332Z add.s32 %r12298, %r12297, %r148; 2026-02-21T09:23:31.1964405Z ld.shared.b16 %rs673, [%r12298]; 2026-02-21T09:23:31.1964483Z ld.shared.b16 %rs674, [%r12298+1024]; 2026-02-21T09:23:31.1964656Z ld.shared.b16 %rs675, [%r12298+64]; 2026-02-21T09:23:31.1964740Z ld.shared.b16 %rs676, [%r12298+1088]; 2026-02-21T09:23:31.1964810Z ld.shared.b16 %rs677, [%r12298+8192]; 2026-02-21T09:23:31.1964880Z ld.shared.b16 %rs678, [%r12298+9216]; 2026-02-21T09:23:31.1964945Z ld.shared.b16 %rs679, [%r12298+8256]; 2026-02-21T09:23:31.1965009Z ld.shared.b16 %rs680, [%r12298+9280]; 2026-02-21T09:23:31.1965079Z add.s32 %r12299, %r12297, %r149; 2026-02-21T09:23:31.1965146Z ld.shared.b16 %rs681, [%r12299]; 2026-02-21T09:23:31.1965211Z ld.shared.b16 %rs682, [%r12299+1024]; 2026-02-21T09:23:31.1965293Z ld.shared.b16 %rs683, [%r12299+64]; 2026-02-21T09:23:31.1965362Z ld.shared.b16 %rs684, [%r12299+1088]; 2026-02-21T09:23:31.1965427Z ld.shared.b16 %rs685, [%r12299+8192]; 2026-02-21T09:23:31.1965498Z ld.shared.b16 %rs686, [%r12299+9216]; 
2026-02-21T09:23:31.1965567Z ld.shared.b16 %rs687, [%r12299+8256]; 2026-02-21T09:23:31.1965632Z ld.shared.b16 %rs688, [%r12299+9280]; 2026-02-21T09:23:31.1965693Z add.s32 %r12300, %r12297, %r150; 2026-02-21T09:23:31.1965766Z ld.shared.b16 %rs689, [%r12300]; 2026-02-21T09:23:31.1965836Z ld.shared.b16 %rs690, [%r12300+1024]; 2026-02-21T09:23:31.1965902Z ld.shared.b16 %rs691, [%r12300+64]; 2026-02-21T09:23:31.1965966Z ld.shared.b16 %rs692, [%r12300+1088]; 2026-02-21T09:23:31.1966045Z ld.shared.b16 %rs693, [%r12300+8192]; 2026-02-21T09:23:31.1966111Z ld.shared.b16 %rs694, [%r12300+9216]; 2026-02-21T09:23:31.1966176Z ld.shared.b16 %rs695, [%r12300+8256]; 2026-02-21T09:23:31.1966250Z ld.shared.b16 %rs696, [%r12300+9280]; 2026-02-21T09:23:31.1966312Z add.s32 %r12301, %r12297, %r151; 2026-02-21T09:23:31.1966375Z ld.shared.b16 %rs697, [%r12301]; 2026-02-21T09:23:31.1966443Z ld.shared.b16 %rs698, [%r12301+1024]; 2026-02-21T09:23:31.1966733Z ld.shared.b16 %rs699, [%r12301+64]; 2026-02-21T09:23:31.1966804Z ld.shared.b16 %rs700, [%r12301+1088]; 2026-02-21T09:23:31.1966868Z ld.shared.b16 %rs701, [%r12301+8192]; 2026-02-21T09:23:31.1966936Z ld.shared.b16 %rs702, [%r12301+9216]; 2026-02-21T09:23:31.1967003Z ld.shared.b16 %rs703, [%r12301+8256]; 2026-02-21T09:23:31.1967068Z ld.shared.b16 %rs704, [%r12301+9280]; 2026-02-21T09:23:31.1967132Z add.s32 %r12302, %r12297, %r152; 2026-02-21T09:23:31.1967196Z ld.shared.b16 %rs705, [%r12302]; 2026-02-21T09:23:31.1967261Z ld.shared.b16 %rs706, [%r12302+1024]; 2026-02-21T09:23:31.1967328Z ld.shared.b16 %rs707, [%r12302+64]; 2026-02-21T09:23:31.1967409Z ld.shared.b16 %rs708, [%r12302+1088]; 2026-02-21T09:23:31.1967475Z ld.shared.b16 %rs709, [%r12302+8192]; 2026-02-21T09:23:31.1967552Z ld.shared.b16 %rs710, [%r12302+9216]; 2026-02-21T09:23:31.1967626Z ld.shared.b16 %rs711, [%r12302+8256]; 2026-02-21T09:23:31.1967691Z ld.shared.b16 %rs712, [%r12302+9280]; 2026-02-21T09:23:31.1967980Z add.s32 %r12303, %r12297, %r153; 2026-02-21T09:23:31.1968052Z 
ld.shared.b16 %rs713, [%r12303]; 2026-02-21T09:23:31.1968122Z ld.shared.b16 %rs714, [%r12303+1024]; 2026-02-21T09:23:31.1968200Z ld.shared.b16 %rs715, [%r12303+64]; 2026-02-21T09:23:31.1968270Z ld.shared.b16 %rs716, [%r12303+1088]; 2026-02-21T09:23:31.1968341Z ld.shared.b16 %rs717, [%r12303+8192]; 2026-02-21T09:23:31.1968407Z ld.shared.b16 %rs718, [%r12303+9216]; 2026-02-21T09:23:31.1968471Z ld.shared.b16 %rs719, [%r12303+8256]; 2026-02-21T09:23:31.1968543Z ld.shared.b16 %rs720, [%r12303+9280]; 2026-02-21T09:23:31.1968603Z add.s32 %r12304, %r12297, %r154; 2026-02-21T09:23:31.1968665Z ld.shared.b16 %rs721, [%r12304]; 2026-02-21T09:23:31.1968729Z ld.shared.b16 %rs722, [%r12304+1024]; 2026-02-21T09:23:31.1968878Z ld.shared.b16 %rs723, [%r12304+64]; 2026-02-21T09:23:31.1968949Z ld.shared.b16 %rs724, [%r12304+1088]; 2026-02-21T09:23:31.1969014Z ld.shared.b16 %rs725, [%r12304+8192]; 2026-02-21T09:23:31.1969088Z ld.shared.b16 %rs726, [%r12304+9216]; 2026-02-21T09:23:31.1969152Z ld.shared.b16 %rs727, [%r12304+8256]; 2026-02-21T09:23:31.1969217Z ld.shared.b16 %rs728, [%r12304+9280]; 2026-02-21T09:23:31.1969279Z add.s32 %r12305, %r12297, %r155; 2026-02-21T09:23:31.1969414Z ld.shared.b16 %rs729, [%r12305]; 2026-02-21T09:23:31.1969484Z ld.shared.b16 %rs730, [%r12305+1024]; 2026-02-21T09:23:31.1969549Z ld.shared.b16 %rs731, [%r12305+64]; 2026-02-21T09:23:31.1969618Z ld.shared.b16 %rs732, [%r12305+1088]; 2026-02-21T09:23:31.1969684Z ld.shared.b16 %rs733, [%r12305+8192]; 2026-02-21T09:23:31.1969748Z ld.shared.b16 %rs734, [%r12305+9216]; 2026-02-21T09:23:31.1969813Z ld.shared.b16 %rs735, [%r12305+8256]; 2026-02-21T09:23:31.1969878Z ld.shared.b16 %rs736, [%r12305+9280]; 2026-02-21T09:23:31.1969949Z cvt.f32.bf16 %r10010, %rs673; 2026-02-21T09:23:31.1970013Z cvt.f32.bf16 %r10011, %rs674; 2026-02-21T09:23:31.1970075Z cvt.f32.bf16 %r10012, %rs681; 2026-02-21T09:23:31.1970135Z cvt.f32.bf16 %r10013, %rs682; 2026-02-21T09:23:31.1970207Z cvt.f32.bf16 %r10142, %rs689; 
2026-02-21T09:23:31.1970274Z cvt.f32.bf16 %r10143, %rs690; 2026-02-21T09:23:31.1970332Z cvt.f32.bf16 %r10144, %rs697; 2026-02-21T09:23:31.1970389Z cvt.f32.bf16 %r10145, %rs698; 2026-02-21T09:23:31.1970449Z cvt.f32.bf16 %r10274, %rs705; 2026-02-21T09:23:31.1970511Z cvt.f32.bf16 %r10275, %rs706; 2026-02-21T09:23:31.1970570Z cvt.f32.bf16 %r10276, %rs713; 2026-02-21T09:23:31.1970628Z cvt.f32.bf16 %r10277, %rs714; 2026-02-21T09:23:31.1970693Z cvt.f32.bf16 %r10406, %rs721; 2026-02-21T09:23:31.1970749Z cvt.f32.bf16 %r10407, %rs722; 2026-02-21T09:23:31.1970806Z cvt.f32.bf16 %r10408, %rs729; 2026-02-21T09:23:31.1970868Z cvt.f32.bf16 %r10409, %rs730; 2026-02-21T09:23:31.1970928Z cvt.f32.bf16 %r10538, %rs675; 2026-02-21T09:23:31.1970988Z cvt.f32.bf16 %r10539, %rs676; 2026-02-21T09:23:31.1971046Z cvt.f32.bf16 %r10540, %rs683; 2026-02-21T09:23:31.1971110Z cvt.f32.bf16 %r10541, %rs684; 2026-02-21T09:23:31.1971183Z cvt.f32.bf16 %r10670, %rs691; 2026-02-21T09:23:31.1971244Z cvt.f32.bf16 %r10671, %rs692; 2026-02-21T09:23:31.1971306Z cvt.f32.bf16 %r10672, %rs699; 2026-02-21T09:23:31.1971365Z cvt.f32.bf16 %r10673, %rs700; 2026-02-21T09:23:31.1971469Z cvt.f32.bf16 %r10802, %rs707; 2026-02-21T09:23:31.1971531Z cvt.f32.bf16 %r10803, %rs708; 2026-02-21T09:23:31.1971594Z cvt.f32.bf16 %r10804, %rs715; 2026-02-21T09:23:31.1971651Z cvt.f32.bf16 %r10805, %rs716; 2026-02-21T09:23:31.1971709Z cvt.f32.bf16 %r10934, %rs723; 2026-02-21T09:23:31.1971768Z cvt.f32.bf16 %r10935, %rs724; 2026-02-21T09:23:31.1971831Z cvt.f32.bf16 %r10936, %rs731; 2026-02-21T09:23:31.1971901Z cvt.f32.bf16 %r10937, %rs732; 2026-02-21T09:23:31.1971966Z cvt.f32.bf16 %r11066, %rs677; 2026-02-21T09:23:31.1972035Z cvt.f32.bf16 %r11067, %rs678; 2026-02-21T09:23:31.1972099Z cvt.f32.bf16 %r11068, %rs685; 2026-02-21T09:23:31.1972160Z cvt.f32.bf16 %r11069, %rs686; 2026-02-21T09:23:31.1972230Z cvt.f32.bf16 %r11198, %rs693; 2026-02-21T09:23:31.1972445Z cvt.f32.bf16 %r11199, %rs694; 2026-02-21T09:23:31.1972508Z cvt.f32.bf16 %r11200, 
%rs701; 2026-02-21T09:23:31.1972584Z cvt.f32.bf16 %r11201, %rs702; 2026-02-21T09:23:31.1972648Z cvt.f32.bf16 %r11330, %rs709; 2026-02-21T09:23:31.1972714Z cvt.f32.bf16 %r11331, %rs710; 2026-02-21T09:23:31.1972775Z cvt.f32.bf16 %r11332, %rs717; 2026-02-21T09:23:31.1972842Z cvt.f32.bf16 %r11333, %rs718; 2026-02-21T09:23:31.1972906Z cvt.f32.bf16 %r11462, %rs725; 2026-02-21T09:23:31.1972969Z cvt.f32.bf16 %r11463, %rs726; 2026-02-21T09:23:31.1973036Z cvt.f32.bf16 %r11464, %rs733; 2026-02-21T09:23:31.1973103Z cvt.f32.bf16 %r11465, %rs734; 2026-02-21T09:23:31.1973177Z cvt.f32.bf16 %r11594, %rs679; 2026-02-21T09:23:31.1973240Z cvt.f32.bf16 %r11595, %rs680; 2026-02-21T09:23:31.1973363Z cvt.f32.bf16 %r11596, %rs687; 2026-02-21T09:23:31.1973427Z cvt.f32.bf16 %r11597, %rs688; 2026-02-21T09:23:31.1973488Z cvt.f32.bf16 %r11726, %rs695; 2026-02-21T09:23:31.1973555Z cvt.f32.bf16 %r11727, %rs696; 2026-02-21T09:23:31.1973624Z cvt.f32.bf16 %r11728, %rs703; 2026-02-21T09:23:31.1973686Z cvt.f32.bf16 %r11729, %rs704; 2026-02-21T09:23:31.1973758Z cvt.f32.bf16 %r11858, %rs711; 2026-02-21T09:23:31.1973828Z cvt.f32.bf16 %r11859, %rs712; 2026-02-21T09:23:31.1973938Z cvt.f32.bf16 %r11860, %rs719; 2026-02-21T09:23:31.1974002Z cvt.f32.bf16 %r11861, %rs720; 2026-02-21T09:23:31.1974070Z cvt.f32.bf16 %r11990, %rs727; 2026-02-21T09:23:31.1974132Z cvt.f32.bf16 %r11991, %rs728; 2026-02-21T09:23:31.1974191Z cvt.f32.bf16 %r11992, %rs735; 2026-02-21T09:23:31.1974252Z cvt.f32.bf16 %r11993, %rs736; 2026-02-21T09:23:31.1974499Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.1974566Z shl.b32 %r12306, %r12877, 3; 2026-02-21T09:23:31.1974636Z add.s32 %r9880, %r9694, %r12306; 2026-02-21T09:23:31.1974703Z // begin inline asm 2026-02-21T09:23:31.1974758Z 2026-02-21T09:23:31.1974810Z { 2026-02-21T09:23:31.1974885Z .reg .pred complete; 2026-02-21T09:23:31.1974945Z waitLoop: 2026-02-21T09:23:31.1975098Z mbarrier.try_wait.parity.shared.b64 complete, [%r9880], 
%r12876; 2026-02-21T09:23:31.1975172Z @!complete bra.uni waitLoop; 2026-02-21T09:23:31.1975231Z } 2026-02-21T09:23:31.1975237Z 2026-02-21T09:23:31.1975298Z // end inline asm 2026-02-21T09:23:31.1975514Z .loc 1 58 33 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:58:33 2026-02-21T09:23:31.1975585Z shl.b32 %r12308, %r12877, 12; 2026-02-21T09:23:31.1975652Z add.s32 %r12310, %r9729, %r12308; 2026-02-21T09:23:31.1975853Z .loc 1 76 58 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:76:58 2026-02-21T09:23:31.1975923Z add.s32 %r12311, %r12310, %r12473; 2026-02-21T09:23:31.1975985Z xor.b32 %r12312, %r12473, 16; 2026-02-21T09:23:31.1976050Z add.s32 %r12313, %r12310, %r12312; 2026-02-21T09:23:31.1976122Z xor.b32 %r12314, %r12473, 32; 2026-02-21T09:23:31.1976193Z add.s32 %r12315, %r12310, %r12314; 2026-02-21T09:23:31.1976261Z xor.b32 %r12316, %r12473, 48; 2026-02-21T09:23:31.1976326Z add.s32 %r12317, %r12310, %r12316; 2026-02-21T09:23:31.1976392Z xor.b32 %r12318, %r12473, 64; 2026-02-21T09:23:31.1976618Z add.s32 %r12319, %r12310, %r12318; 2026-02-21T09:23:31.1976689Z xor.b32 %r12320, %r12473, 80; 2026-02-21T09:23:31.1976753Z add.s32 %r12321, %r12310, %r12320; 2026-02-21T09:23:31.1976822Z xor.b32 %r12322, %r12473, 96; 2026-02-21T09:23:31.1976884Z add.s32 %r12323, %r12310, %r12322; 2026-02-21T09:23:31.1976948Z xor.b32 %r12324, %r12473, 112; 2026-02-21T09:23:31.1977015Z add.s32 %r12325, %r12310, %r12324; 2026-02-21T09:23:31.1977215Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1977285Z ld.shared.s8 %rs737, [%r12311]; 2026-02-21T09:23:31.1977494Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1977560Z shl.b16 %rs738, %rs737, 4; 2026-02-21T09:23:31.1977940Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1978018Z ld.shared.s8 %rs739, [%r12313+128]; 2026-02-21T09:23:31.1978225Z .loc 1 61 28 // 
cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1978292Z shl.b16 %rs740, %rs739, 4; 2026-02-21T09:23:31.1978487Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1978564Z ld.shared.s8 %rs741, [%r12315+256]; 2026-02-21T09:23:31.1978759Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1978823Z shl.b16 %rs742, %rs741, 4; 2026-02-21T09:23:31.1979085Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1979155Z ld.shared.s8 %rs743, [%r12317+384]; 2026-02-21T09:23:31.1979351Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1979433Z shl.b16 %rs744, %rs743, 4; 2026-02-21T09:23:31.1979631Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1979772Z ld.shared.s8 %rs745, [%r12319+512]; 2026-02-21T09:23:31.1979968Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1980039Z shl.b16 %rs746, %rs745, 4; 2026-02-21T09:23:31.1980234Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1980300Z ld.shared.s8 %rs747, [%r12321+640]; 2026-02-21T09:23:31.1980499Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1980562Z shl.b16 %rs748, %rs747, 4; 2026-02-21T09:23:31.1980758Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1980835Z ld.shared.s8 %rs749, [%r12323+768]; 2026-02-21T09:23:31.1981027Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1981099Z shl.b16 %rs750, %rs749, 4; 2026-02-21T09:23:31.1981304Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1981370Z ld.shared.s8 %rs751, [%r12325+896]; 2026-02-21T09:23:31.1981564Z .loc 
1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1981626Z shl.b16 %rs752, %rs751, 4; 2026-02-21T09:23:31.1981825Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1981896Z ld.shared.s8 %rs753, [%r12311+1024]; 2026-02-21T09:23:31.1982090Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1982162Z shl.b16 %rs754, %rs753, 4; 2026-02-21T09:23:31.1982355Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1982425Z ld.shared.s8 %rs755, [%r12313+1152]; 2026-02-21T09:23:31.1982628Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1982689Z shl.b16 %rs756, %rs755, 4; 2026-02-21T09:23:31.1982884Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1982957Z ld.shared.s8 %rs757, [%r12315+1280]; 2026-02-21T09:23:31.1983152Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1983214Z shl.b16 %rs758, %rs757, 4; 2026-02-21T09:23:31.1983416Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1983486Z ld.shared.s8 %rs759, [%r12317+1408]; 2026-02-21T09:23:31.1983793Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1983857Z shl.b16 %rs760, %rs759, 4; 2026-02-21T09:23:31.1984060Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1984128Z ld.shared.s8 %rs761, [%r12319+1536]; 2026-02-21T09:23:31.1984333Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1984405Z shl.b16 %rs762, %rs761, 4; 2026-02-21T09:23:31.1984611Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1984683Z ld.shared.s8 %rs763, [%r12321+1664]; 
2026-02-21T09:23:31.1984948Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1985015Z shl.b16 %rs764, %rs763, 4; 2026-02-21T09:23:31.1985221Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1985295Z ld.shared.s8 %rs765, [%r12323+1792]; 2026-02-21T09:23:31.1985495Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1985606Z shl.b16 %rs766, %rs765, 4; 2026-02-21T09:23:31.1985810Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1985878Z ld.shared.s8 %rs767, [%r12325+1920]; 2026-02-21T09:23:31.1986081Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1986143Z shl.b16 %rs768, %rs767, 4; 2026-02-21T09:23:31.1986341Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1986413Z ld.shared.s8 %rs769, [%r12311+2048]; 2026-02-21T09:23:31.1986742Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1986822Z shl.b16 %rs770, %rs769, 4; 2026-02-21T09:23:31.1987023Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1987090Z ld.shared.s8 %rs771, [%r12313+2176]; 2026-02-21T09:23:31.1987284Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1987350Z shl.b16 %rs772, %rs771, 4; 2026-02-21T09:23:31.1987545Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1987611Z ld.shared.s8 %rs773, [%r12315+2304]; 2026-02-21T09:23:31.1987805Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1987873Z shl.b16 %rs774, %rs773, 4; 2026-02-21T09:23:31.1988067Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1988137Z ld.shared.s8 %rs775, 
[%r12317+2432]; 2026-02-21T09:23:31.1988427Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1988491Z shl.b16 %rs776, %rs775, 4; 2026-02-21T09:23:31.1988687Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1988759Z ld.shared.s8 %rs777, [%r12319+2560]; 2026-02-21T09:23:31.1988954Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1989017Z shl.b16 %rs778, %rs777, 4; 2026-02-21T09:23:31.1989224Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1989291Z ld.shared.s8 %rs779, [%r12321+2688]; 2026-02-21T09:23:31.1989489Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1989552Z shl.b16 %rs780, %rs779, 4; 2026-02-21T09:23:31.1989945Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1990014Z ld.shared.s8 %rs781, [%r12323+2816]; 2026-02-21T09:23:31.1990208Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1990273Z shl.b16 %rs782, %rs781, 4; 2026-02-21T09:23:31.1990464Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1990530Z ld.shared.s8 %rs783, [%r12325+2944]; 2026-02-21T09:23:31.1990728Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1990790Z shl.b16 %rs784, %rs783, 4; 2026-02-21T09:23:31.1991059Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1991135Z ld.shared.s8 %rs785, [%r12311+3072]; 2026-02-21T09:23:31.1991334Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1991399Z shl.b16 %rs786, %rs785, 4; 2026-02-21T09:23:31.1991597Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1991721Z 
ld.shared.s8 %rs787, [%r12313+3200]; 2026-02-21T09:23:31.1991917Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1991979Z shl.b16 %rs788, %rs787, 4; 2026-02-21T09:23:31.1992176Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1992255Z ld.shared.s8 %rs789, [%r12315+3328]; 2026-02-21T09:23:31.1992450Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1992515Z shl.b16 %rs790, %rs789, 4; 2026-02-21T09:23:31.1992706Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1992776Z ld.shared.s8 %rs791, [%r12317+3456]; 2026-02-21T09:23:31.1992974Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1993035Z shl.b16 %rs792, %rs791, 4; 2026-02-21T09:23:31.1993229Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1993300Z ld.shared.s8 %rs793, [%r12319+3584]; 2026-02-21T09:23:31.1993492Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1993556Z shl.b16 %rs794, %rs793, 4; 2026-02-21T09:23:31.1993749Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1993821Z ld.shared.s8 %rs795, [%r12321+3712]; 2026-02-21T09:23:31.1994014Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1994078Z shl.b16 %rs796, %rs795, 4; 2026-02-21T09:23:31.1994276Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1994341Z ld.shared.s8 %rs797, [%r12323+3840]; 2026-02-21T09:23:31.1994536Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1994601Z shl.b16 %rs798, %rs797, 4; 2026-02-21T09:23:31.1994793Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 
2026-02-21T09:23:31.1994859Z ld.shared.s8 %rs799, [%r12325+3968]; 2026-02-21T09:23:31.1995060Z .loc 1 61 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:61:28 2026-02-21T09:23:31.1995122Z shl.b16 %rs800, %rs799, 4; 2026-02-21T09:23:31.1995315Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1995378Z cvt.s16.s8 %rs801, %rs738; 2026-02-21T09:23:31.1995586Z shr.s16 %rs802, %rs801, 4; 2026-02-21T09:23:31.1995647Z cvt.s16.s8 %rs803, %rs740; 2026-02-21T09:23:31.1995705Z shr.s16 %rs804, %rs803, 4; 2026-02-21T09:23:31.1995772Z shr.s16 %rs805, %rs737, 4; 2026-02-21T09:23:31.1995833Z shr.s16 %rs806, %rs739, 4; 2026-02-21T09:23:31.1996029Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1996102Z cvt.rn.f32.s16 %r12326, %rs806; 2026-02-21T09:23:31.1996165Z cvt.rn.f32.s16 %r12327, %rs805; 2026-02-21T09:23:31.1996226Z cvt.rn.f32.s16 %r12328, %rs804; 2026-02-21T09:23:31.1996287Z cvt.rn.f32.s16 %r12329, %rs802; 2026-02-21T09:23:31.1996623Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1996782Z cvt.s16.s8 %rs807, %rs742; 2026-02-21T09:23:31.1996861Z shr.s16 %rs808, %rs807, 4; 2026-02-21T09:23:31.1996930Z cvt.s16.s8 %rs809, %rs744; 2026-02-21T09:23:31.1996990Z shr.s16 %rs810, %rs809, 4; 2026-02-21T09:23:31.1997057Z shr.s16 %rs811, %rs741, 4; 2026-02-21T09:23:31.1997128Z shr.s16 %rs812, %rs743, 4; 2026-02-21T09:23:31.1997338Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1997474Z cvt.rn.f32.s16 %r12330, %rs812; 2026-02-21T09:23:31.1997543Z cvt.rn.f32.s16 %r12331, %rs811; 2026-02-21T09:23:31.1997605Z cvt.rn.f32.s16 %r12332, %rs810; 2026-02-21T09:23:31.1997668Z cvt.rn.f32.s16 %r12333, %rs808; 2026-02-21T09:23:31.1997875Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1997935Z cvt.s16.s8 %rs813, %rs746; 
2026-02-21T09:23:31.1997998Z shr.s16 %rs814, %rs813, 4; 2026-02-21T09:23:31.1998064Z cvt.s16.s8 %rs815, %rs748; 2026-02-21T09:23:31.1998126Z shr.s16 %rs816, %rs815, 4; 2026-02-21T09:23:31.1998186Z shr.s16 %rs817, %rs745, 4; 2026-02-21T09:23:31.1998246Z shr.s16 %rs818, %rs747, 4; 2026-02-21T09:23:31.1998449Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1998528Z cvt.rn.f32.s16 %r12334, %rs818; 2026-02-21T09:23:31.1998592Z cvt.rn.f32.s16 %r12335, %rs817; 2026-02-21T09:23:31.1998657Z cvt.rn.f32.s16 %r12336, %rs816; 2026-02-21T09:23:31.1998721Z cvt.rn.f32.s16 %r12337, %rs814; 2026-02-21T09:23:31.1998917Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.1998981Z cvt.s16.s8 %rs819, %rs750; 2026-02-21T09:23:31.1999044Z shr.s16 %rs820, %rs819, 4; 2026-02-21T09:23:31.1999113Z cvt.s16.s8 %rs821, %rs752; 2026-02-21T09:23:31.1999172Z shr.s16 %rs822, %rs821, 4; 2026-02-21T09:23:31.1999236Z shr.s16 %rs823, %rs749, 4; 2026-02-21T09:23:31.1999295Z shr.s16 %rs824, %rs751, 4; 2026-02-21T09:23:31.1999493Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.1999561Z cvt.rn.f32.s16 %r12338, %rs824; 2026-02-21T09:23:31.1999625Z cvt.rn.f32.s16 %r12339, %rs823; 2026-02-21T09:23:31.1999686Z cvt.rn.f32.s16 %r12340, %rs822; 2026-02-21T09:23:31.1999748Z cvt.rn.f32.s16 %r12341, %rs820; 2026-02-21T09:23:31.1999947Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.2000008Z cvt.s16.s8 %rs825, %rs754; 2026-02-21T09:23:31.2000067Z shr.s16 %rs826, %rs825, 4; 2026-02-21T09:23:31.2000130Z cvt.s16.s8 %rs827, %rs756; 2026-02-21T09:23:31.2000189Z shr.s16 %rs828, %rs827, 4; 2026-02-21T09:23:31.2000247Z shr.s16 %rs829, %rs753, 4; 2026-02-21T09:23:31.2000323Z shr.s16 %rs830, %rs755, 4; 2026-02-21T09:23:31.2000522Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 
2026-02-21T09:23:31.2000585Z cvt.rn.f32.s16 %r12342, %rs830; 2026-02-21T09:23:31.2000648Z cvt.rn.f32.s16 %r12343, %rs829; 2026-02-21T09:23:31.2000713Z cvt.rn.f32.s16 %r12344, %rs828; 2026-02-21T09:23:31.2000775Z cvt.rn.f32.s16 %r12345, %rs826; 2026-02-21T09:23:31.2001120Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.2001186Z cvt.s16.s8 %rs831, %rs758; 2026-02-21T09:23:31.2001246Z shr.s16 %rs832, %rs831, 4; 2026-02-21T09:23:31.2001308Z cvt.s16.s8 %rs833, %rs760; 2026-02-21T09:23:31.2001368Z shr.s16 %rs834, %rs833, 4; 2026-02-21T09:23:31.2001432Z shr.s16 %rs835, %rs757, 4; 2026-02-21T09:23:31.2001493Z shr.s16 %rs836, %rs759, 4; 2026-02-21T09:23:31.2001686Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.2001754Z cvt.rn.f32.s16 %r12346, %rs836; 2026-02-21T09:23:31.2001815Z cvt.rn.f32.s16 %r12347, %rs835; 2026-02-21T09:23:31.2001877Z cvt.rn.f32.s16 %r12348, %rs834; 2026-02-21T09:23:31.2002006Z cvt.rn.f32.s16 %r12349, %rs832; 2026-02-21T09:23:31.2002206Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.2002271Z cvt.s16.s8 %rs837, %rs762; 2026-02-21T09:23:31.2002332Z shr.s16 %rs838, %rs837, 4; 2026-02-21T09:23:31.2002396Z cvt.s16.s8 %rs839, %rs764; 2026-02-21T09:23:31.2002456Z shr.s16 %rs840, %rs839, 4; 2026-02-21T09:23:31.2002516Z shr.s16 %rs841, %rs761, 4; 2026-02-21T09:23:31.2002635Z shr.s16 %rs842, %rs763, 4; 2026-02-21T09:23:31.2002835Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.2002898Z cvt.rn.f32.s16 %r12350, %rs842; 2026-02-21T09:23:31.2002964Z cvt.rn.f32.s16 %r12351, %rs841; 2026-02-21T09:23:31.2003026Z cvt.rn.f32.s16 %r12352, %rs840; 2026-02-21T09:23:31.2003086Z cvt.rn.f32.s16 %r12353, %rs838; 2026-02-21T09:23:31.2003282Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.2003348Z cvt.s16.s8 %rs843, %rs766; 
2026-02-21T09:23:31.2003407Z shr.s16 %rs844, %rs843, 4; 2026-02-21T09:23:31.2003467Z cvt.s16.s8 %rs845, %rs768; 2026-02-21T09:23:31.2003543Z shr.s16 %rs846, %rs845, 4; 2026-02-21T09:23:31.2003612Z shr.s16 %rs847, %rs765, 4; 2026-02-21T09:23:31.2003671Z shr.s16 %rs848, %rs767, 4; 2026-02-21T09:23:31.2003871Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.2003938Z cvt.rn.f32.s16 %r12354, %rs848; 2026-02-21T09:23:31.2004000Z cvt.rn.f32.s16 %r12355, %rs847; 2026-02-21T09:23:31.2004062Z cvt.rn.f32.s16 %r12356, %rs846; 2026-02-21T09:23:31.2004128Z cvt.rn.f32.s16 %r12357, %rs844; 2026-02-21T09:23:31.2004321Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.2004382Z cvt.s16.s8 %rs849, %rs770; 2026-02-21T09:23:31.2004447Z shr.s16 %rs850, %rs849, 4; 2026-02-21T09:23:31.2004508Z cvt.s16.s8 %rs851, %rs772; 2026-02-21T09:23:31.2004572Z shr.s16 %rs852, %rs851, 4; 2026-02-21T09:23:31.2004632Z shr.s16 %rs853, %rs769, 4; 2026-02-21T09:23:31.2004698Z shr.s16 %rs854, %rs771, 4; 2026-02-21T09:23:31.2004896Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.2004962Z cvt.rn.f32.s16 %r12358, %rs854; 2026-02-21T09:23:31.2005028Z cvt.rn.f32.s16 %r12359, %rs853; 2026-02-21T09:23:31.2005091Z cvt.rn.f32.s16 %r12360, %rs852; 2026-02-21T09:23:31.2005151Z cvt.rn.f32.s16 %r12361, %rs850; 2026-02-21T09:23:31.2005346Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.2005407Z cvt.s16.s8 %rs855, %rs774; 2026-02-21T09:23:31.2005468Z shr.s16 %rs856, %rs855, 4; 2026-02-21T09:23:31.2005527Z cvt.s16.s8 %rs857, %rs776; 2026-02-21T09:23:31.2005594Z shr.s16 %rs858, %rs857, 4; 2026-02-21T09:23:31.2005654Z shr.s16 %rs859, %rs773, 4; 2026-02-21T09:23:31.2005714Z shr.s16 %rs860, %rs775, 4; 2026-02-21T09:23:31.2005910Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 
2026-02-21T09:23:31.2006095Z cvt.rn.f32.s16 %r12362, %rs860; 2026-02-21T09:23:31.2006157Z cvt.rn.f32.s16 %r12363, %rs859; 2026-02-21T09:23:31.2006220Z cvt.rn.f32.s16 %r12364, %rs858; 2026-02-21T09:23:31.2006290Z cvt.rn.f32.s16 %r12365, %rs856; 2026-02-21T09:23:31.2006608Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.2006676Z cvt.s16.s8 %rs861, %rs778; 2026-02-21T09:23:31.2006742Z shr.s16 %rs862, %rs861, 4; 2026-02-21T09:23:31.2006813Z cvt.s16.s8 %rs863, %rs780; 2026-02-21T09:23:31.2006875Z shr.s16 %rs864, %rs863, 4; 2026-02-21T09:23:31.2006940Z shr.s16 %rs865, %rs777, 4; 2026-02-21T09:23:31.2007002Z shr.s16 %rs866, %rs779, 4; 2026-02-21T09:23:31.2007281Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.2007363Z cvt.rn.f32.s16 %r12366, %rs866; 2026-02-21T09:23:31.2007431Z cvt.rn.f32.s16 %r12367, %rs865; 2026-02-21T09:23:31.2007492Z cvt.rn.f32.s16 %r12368, %rs864; 2026-02-21T09:23:31.2007569Z cvt.rn.f32.s16 %r12369, %rs862; 2026-02-21T09:23:31.2007774Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.2007835Z cvt.s16.s8 %rs867, %rs782; 2026-02-21T09:23:31.2007956Z shr.s16 %rs868, %rs867, 4; 2026-02-21T09:23:31.2008026Z cvt.s16.s8 %rs869, %rs784; 2026-02-21T09:23:31.2008086Z shr.s16 %rs870, %rs869, 4; 2026-02-21T09:23:31.2008146Z shr.s16 %rs871, %rs781, 4; 2026-02-21T09:23:31.2008204Z shr.s16 %rs872, %rs783, 4; 2026-02-21T09:23:31.2008402Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.2008467Z cvt.rn.f32.s16 %r12370, %rs872; 2026-02-21T09:23:31.2008529Z cvt.rn.f32.s16 %r12371, %rs871; 2026-02-21T09:23:31.2008594Z cvt.rn.f32.s16 %r12372, %rs870; 2026-02-21T09:23:31.2008655Z cvt.rn.f32.s16 %r12373, %rs868; 2026-02-21T09:23:31.2008858Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.2008927Z cvt.s16.s8 %rs873, %rs786; 
2026-02-21T09:23:31.2008990Z shr.s16 %rs874, %rs873, 4; 2026-02-21T09:23:31.2009051Z cvt.s16.s8 %rs875, %rs788; 2026-02-21T09:23:31.2009112Z shr.s16 %rs876, %rs875, 4; 2026-02-21T09:23:31.2009180Z shr.s16 %rs877, %rs785, 4; 2026-02-21T09:23:31.2009239Z shr.s16 %rs878, %rs787, 4; 2026-02-21T09:23:31.2009429Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.2009495Z cvt.rn.f32.s16 %r12374, %rs878; 2026-02-21T09:23:31.2009556Z cvt.rn.f32.s16 %r12375, %rs877; 2026-02-21T09:23:31.2009628Z cvt.rn.f32.s16 %r12376, %rs876; 2026-02-21T09:23:31.2009691Z cvt.rn.f32.s16 %r12377, %rs874; 2026-02-21T09:23:31.2009894Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.2009955Z cvt.s16.s8 %rs879, %rs790; 2026-02-21T09:23:31.2010014Z shr.s16 %rs880, %rs879, 4; 2026-02-21T09:23:31.2010082Z cvt.s16.s8 %rs881, %rs792; 2026-02-21T09:23:31.2010141Z shr.s16 %rs882, %rs881, 4; 2026-02-21T09:23:31.2010200Z shr.s16 %rs883, %rs789, 4; 2026-02-21T09:23:31.2010260Z shr.s16 %rs884, %rs791, 4; 2026-02-21T09:23:31.2010458Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.2010522Z cvt.rn.f32.s16 %r12378, %rs884; 2026-02-21T09:23:31.2010586Z cvt.rn.f32.s16 %r12379, %rs883; 2026-02-21T09:23:31.2010651Z cvt.rn.f32.s16 %r12380, %rs882; 2026-02-21T09:23:31.2010712Z cvt.rn.f32.s16 %r12381, %rs880; 2026-02-21T09:23:31.2010904Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.2010970Z cvt.s16.s8 %rs885, %rs794; 2026-02-21T09:23:31.2011032Z shr.s16 %rs886, %rs885, 4; 2026-02-21T09:23:31.2011092Z cvt.s16.s8 %rs887, %rs796; 2026-02-21T09:23:31.2011151Z shr.s16 %rs888, %rs887, 4; 2026-02-21T09:23:31.2011302Z shr.s16 %rs889, %rs793, 4; 2026-02-21T09:23:31.2011420Z shr.s16 %rs890, %rs795, 4; 2026-02-21T09:23:31.2011615Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 
2026-02-21T09:23:31.2011682Z cvt.rn.f32.s16 %r12382, %rs890; 2026-02-21T09:23:31.2011745Z cvt.rn.f32.s16 %r12383, %rs889; 2026-02-21T09:23:31.2011819Z cvt.rn.f32.s16 %r12384, %rs888; 2026-02-21T09:23:31.2011888Z cvt.rn.f32.s16 %r12385, %rs886; 2026-02-21T09:23:31.2012084Z .loc 1 63 25 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:63:25 2026-02-21T09:23:31.2012145Z cvt.s16.s8 %rs891, %rs798; 2026-02-21T09:23:31.2012206Z shr.s16 %rs892, %rs891, 4; 2026-02-21T09:23:31.2012271Z cvt.s16.s8 %rs893, %rs800; 2026-02-21T09:23:31.2012331Z shr.s16 %rs894, %rs893, 4; 2026-02-21T09:23:31.2012443Z shr.s16 %rs895, %rs797, 4; 2026-02-21T09:23:31.2012511Z shr.s16 %rs896, %rs799, 4; 2026-02-21T09:23:31.2012715Z .loc 1 81 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:81:32 2026-02-21T09:23:31.2012783Z cvt.rn.f32.s16 %r12386, %rs896; 2026-02-21T09:23:31.2012851Z cvt.rn.f32.s16 %r12387, %rs895; 2026-02-21T09:23:31.2012912Z cvt.rn.f32.s16 %r12388, %rs894; 2026-02-21T09:23:31.2013045Z cvt.rn.f32.s16 %r12389, %rs892; 2026-02-21T09:23:31.2013176Z st.shared.v4.b32 [%r156], {%r12329, %r12327, %r12328, %r12326}; 2026-02-21T09:23:31.2013316Z st.shared.v4.b32 [%r156+16384], {%r12361, %r12359, %r12360, %r12358}; 2026-02-21T09:23:31.2013430Z st.shared.v4.b32 [%r157], {%r12333, %r12331, %r12332, %r12330}; 2026-02-21T09:23:31.2013554Z st.shared.v4.b32 [%r157+16384], {%r12365, %r12363, %r12364, %r12362}; 2026-02-21T09:23:31.2013666Z st.shared.v4.b32 [%r158], {%r12337, %r12335, %r12336, %r12334}; 2026-02-21T09:23:31.2013787Z st.shared.v4.b32 [%r158+16384], {%r12369, %r12367, %r12368, %r12366}; 2026-02-21T09:23:31.2013894Z st.shared.v4.b32 [%r159], {%r12341, %r12339, %r12340, %r12338}; 2026-02-21T09:23:31.2014018Z st.shared.v4.b32 [%r159+16384], {%r12373, %r12371, %r12372, %r12370}; 2026-02-21T09:23:31.2014140Z st.shared.v4.b32 [%r160], {%r12345, %r12343, %r12344, %r12342}; 2026-02-21T09:23:31.2014258Z st.shared.v4.b32 [%r160+16384], {%r12377, %r12375, %r12376, %r12374}; 
2026-02-21T09:23:31.2014367Z st.shared.v4.b32 [%r161], {%r12349, %r12347, %r12348, %r12346}; 2026-02-21T09:23:31.2014491Z st.shared.v4.b32 [%r161+16384], {%r12381, %r12379, %r12380, %r12378}; 2026-02-21T09:23:31.2014597Z st.shared.v4.b32 [%r162], {%r12353, %r12351, %r12352, %r12350}; 2026-02-21T09:23:31.2014714Z st.shared.v4.b32 [%r162+16384], {%r12385, %r12383, %r12384, %r12382}; 2026-02-21T09:23:31.2014827Z st.shared.v4.b32 [%r163], {%r12357, %r12355, %r12356, %r12354}; 2026-02-21T09:23:31.2014947Z st.shared.v4.b32 [%r163+16384], {%r12389, %r12387, %r12388, %r12386}; 2026-02-21T09:23:31.2015003Z $L__tmp7: 2026-02-21T09:23:31.2015284Z .loc 2 291 36 // standard.py:291:36 @[ cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:88:40 ] 2026-02-21T09:23:31.2015350Z // begin inline asm 2026-02-21T09:23:31.2015441Z fence.proxy.async.shared::cta; 2026-02-21T09:23:31.2015506Z // end inline asm 2026-02-21T09:23:31.2015563Z bar.sync 0; 2026-02-21T09:23:31.2015653Z shfl.sync.idx.b32 %r12390, %r5, 0, 31, -1; 2026-02-21T09:23:31.2015729Z wgmma.fence.sync.aligned; 2026-02-21T09:23:31.2015803Z mov.pred %p169, -1; 2026-02-21T09:23:31.2015872Z // begin inline asm 2026-02-21T09:23:31.2017489Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12879,%r12880,%r12881,%r12882,%r12883,%r12884,%r12885,%r12886,%r12887,%r12888,%r12889,%r12890,%r12891,%r12892,%r12893,%r12894,%r12895,%r12896,%r12897,%r12898,%r12899,%r12900,%r12901,%r12902,%r12903,%r12904,%r12905,%r12906,%r12907,%r12908,%r12909,%r12910,%r12911,%r12912,%r12913,%r12914,%r12915,%r12916,%r12917,%r12918,%r12919,%r12920,%r12921,%r12922,%r12923,%r12924,%r12925,%r12926,%r12927,%r12928,%r12929,%r12930,%r12931,%r12932,%r12933,%r12934,%r12935,%r12936,%r12937,%r12938,%r12939,%r12940,%r12941,%r12942}, {%r10010,%r10011,%r10012,%r10013}, %rd12, %p169, 1, 1; 2026-02-21T09:23:31.2017720Z // end inline asm 2026-02-21T09:23:31.2017781Z // begin inline asm 2026-02-21T09:23:31.2019316Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12879,%r12880,%r12881,%r12882,%r12883,%r12884,%r12885,%r12886,%r12887,%r12888,%r12889,%r12890,%r12891,%r12892,%r12893,%r12894,%r12895,%r12896,%r12897,%r12898,%r12899,%r12900,%r12901,%r12902,%r12903,%r12904,%r12905,%r12906,%r12907,%r12908,%r12909,%r12910,%r12911,%r12912,%r12913,%r12914,%r12915,%r12916,%r12917,%r12918,%r12919,%r12920,%r12921,%r12922,%r12923,%r12924,%r12925,%r12926,%r12927,%r12928,%r12929,%r12930,%r12931,%r12932,%r12933,%r12934,%r12935,%r12936,%r12937,%r12938,%r12939,%r12940,%r12941,%r12942}, {%r10142,%r10143,%r10144,%r10145}, %rd13, %p169, 1, 1; 2026-02-21T09:23:31.2019378Z // end inline asm 2026-02-21T09:23:31.2019437Z // begin inline asm 2026-02-21T09:23:31.2020971Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12879,%r12880,%r12881,%r12882,%r12883,%r12884,%r12885,%r12886,%r12887,%r12888,%r12889,%r12890,%r12891,%r12892,%r12893,%r12894,%r12895,%r12896,%r12897,%r12898,%r12899,%r12900,%r12901,%r12902,%r12903,%r12904,%r12905,%r12906,%r12907,%r12908,%r12909,%r12910,%r12911,%r12912,%r12913,%r12914,%r12915,%r12916,%r12917,%r12918,%r12919,%r12920,%r12921,%r12922,%r12923,%r12924,%r12925,%r12926,%r12927,%r12928,%r12929,%r12930,%r12931,%r12932,%r12933,%r12934,%r12935,%r12936,%r12937,%r12938,%r12939,%r12940,%r12941,%r12942}, {%r10274,%r10275,%r10276,%r10277}, %rd14, %p169, 1, 1; 2026-02-21T09:23:31.2021040Z // end inline asm 2026-02-21T09:23:31.2021097Z // begin inline asm 2026-02-21T09:23:31.2022559Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12879,%r12880,%r12881,%r12882,%r12883,%r12884,%r12885,%r12886,%r12887,%r12888,%r12889,%r12890,%r12891,%r12892,%r12893,%r12894,%r12895,%r12896,%r12897,%r12898,%r12899,%r12900,%r12901,%r12902,%r12903,%r12904,%r12905,%r12906,%r12907,%r12908,%r12909,%r12910,%r12911,%r12912,%r12913,%r12914,%r12915,%r12916,%r12917,%r12918,%r12919,%r12920,%r12921,%r12922,%r12923,%r12924,%r12925,%r12926,%r12927,%r12928,%r12929,%r12930,%r12931,%r12932,%r12933,%r12934,%r12935,%r12936,%r12937,%r12938,%r12939,%r12940,%r12941,%r12942}, {%r10406,%r10407,%r10408,%r10409}, %rd15, %p169, 1, 1; 2026-02-21T09:23:31.2022622Z // end inline asm 2026-02-21T09:23:31.2022683Z // begin inline asm 2026-02-21T09:23:31.2024140Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12879,%r12880,%r12881,%r12882,%r12883,%r12884,%r12885,%r12886,%r12887,%r12888,%r12889,%r12890,%r12891,%r12892,%r12893,%r12894,%r12895,%r12896,%r12897,%r12898,%r12899,%r12900,%r12901,%r12902,%r12903,%r12904,%r12905,%r12906,%r12907,%r12908,%r12909,%r12910,%r12911,%r12912,%r12913,%r12914,%r12915,%r12916,%r12917,%r12918,%r12919,%r12920,%r12921,%r12922,%r12923,%r12924,%r12925,%r12926,%r12927,%r12928,%r12929,%r12930,%r12931,%r12932,%r12933,%r12934,%r12935,%r12936,%r12937,%r12938,%r12939,%r12940,%r12941,%r12942}, {%r10538,%r10539,%r10540,%r10541}, %rd16, %p169, 1, 1; 2026-02-21T09:23:31.2024199Z // end inline asm 2026-02-21T09:23:31.2024264Z // begin inline asm 2026-02-21T09:23:31.2025721Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12879,%r12880,%r12881,%r12882,%r12883,%r12884,%r12885,%r12886,%r12887,%r12888,%r12889,%r12890,%r12891,%r12892,%r12893,%r12894,%r12895,%r12896,%r12897,%r12898,%r12899,%r12900,%r12901,%r12902,%r12903,%r12904,%r12905,%r12906,%r12907,%r12908,%r12909,%r12910,%r12911,%r12912,%r12913,%r12914,%r12915,%r12916,%r12917,%r12918,%r12919,%r12920,%r12921,%r12922,%r12923,%r12924,%r12925,%r12926,%r12927,%r12928,%r12929,%r12930,%r12931,%r12932,%r12933,%r12934,%r12935,%r12936,%r12937,%r12938,%r12939,%r12940,%r12941,%r12942}, {%r10670,%r10671,%r10672,%r10673}, %rd17, %p169, 1, 1; 2026-02-21T09:23:31.2025781Z // end inline asm 2026-02-21T09:23:31.2025840Z // begin inline asm 2026-02-21T09:23:31.2027412Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12879,%r12880,%r12881,%r12882,%r12883,%r12884,%r12885,%r12886,%r12887,%r12888,%r12889,%r12890,%r12891,%r12892,%r12893,%r12894,%r12895,%r12896,%r12897,%r12898,%r12899,%r12900,%r12901,%r12902,%r12903,%r12904,%r12905,%r12906,%r12907,%r12908,%r12909,%r12910,%r12911,%r12912,%r12913,%r12914,%r12915,%r12916,%r12917,%r12918,%r12919,%r12920,%r12921,%r12922,%r12923,%r12924,%r12925,%r12926,%r12927,%r12928,%r12929,%r12930,%r12931,%r12932,%r12933,%r12934,%r12935,%r12936,%r12937,%r12938,%r12939,%r12940,%r12941,%r12942}, {%r10802,%r10803,%r10804,%r10805}, %rd18, %p169, 1, 1; 2026-02-21T09:23:31.2027637Z // end inline asm 2026-02-21T09:23:31.2027700Z // begin inline asm 2026-02-21T09:23:31.2029330Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12879,%r12880,%r12881,%r12882,%r12883,%r12884,%r12885,%r12886,%r12887,%r12888,%r12889,%r12890,%r12891,%r12892,%r12893,%r12894,%r12895,%r12896,%r12897,%r12898,%r12899,%r12900,%r12901,%r12902,%r12903,%r12904,%r12905,%r12906,%r12907,%r12908,%r12909,%r12910,%r12911,%r12912,%r12913,%r12914,%r12915,%r12916,%r12917,%r12918,%r12919,%r12920,%r12921,%r12922,%r12923,%r12924,%r12925,%r12926,%r12927,%r12928,%r12929,%r12930,%r12931,%r12932,%r12933,%r12934,%r12935,%r12936,%r12937,%r12938,%r12939,%r12940,%r12941,%r12942}, {%r10934,%r10935,%r10936,%r10937}, %rd19, %p169, 1, 1; 2026-02-21T09:23:31.2029413Z // end inline asm 2026-02-21T09:23:31.2029533Z // begin inline asm 2026-02-21T09:23:31.2031002Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12943,%r12944,%r12945,%r12946,%r12947,%r12948,%r12949,%r12950,%r12951,%r12952,%r12953,%r12954,%r12955,%r12956,%r12957,%r12958,%r12959,%r12960,%r12961,%r12962,%r12963,%r12964,%r12965,%r12966,%r12967,%r12968,%r12969,%r12970,%r12971,%r12972,%r12973,%r12974,%r12975,%r12976,%r12977,%r12978,%r12979,%r12980,%r12981,%r12982,%r12983,%r12984,%r12985,%r12986,%r12987,%r12988,%r12989,%r12990,%r12991,%r12992,%r12993,%r12994,%r12995,%r12996,%r12997,%r12998,%r12999,%r13000,%r13001,%r13002,%r13003,%r13004,%r13005,%r13006}, {%r11066,%r11067,%r11068,%r11069}, %rd12, %p169, 1, 1; 2026-02-21T09:23:31.2031062Z // end inline asm 2026-02-21T09:23:31.2031124Z // begin inline asm 2026-02-21T09:23:31.2032583Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12943,%r12944,%r12945,%r12946,%r12947,%r12948,%r12949,%r12950,%r12951,%r12952,%r12953,%r12954,%r12955,%r12956,%r12957,%r12958,%r12959,%r12960,%r12961,%r12962,%r12963,%r12964,%r12965,%r12966,%r12967,%r12968,%r12969,%r12970,%r12971,%r12972,%r12973,%r12974,%r12975,%r12976,%r12977,%r12978,%r12979,%r12980,%r12981,%r12982,%r12983,%r12984,%r12985,%r12986,%r12987,%r12988,%r12989,%r12990,%r12991,%r12992,%r12993,%r12994,%r12995,%r12996,%r12997,%r12998,%r12999,%r13000,%r13001,%r13002,%r13003,%r13004,%r13005,%r13006}, {%r11198,%r11199,%r11200,%r11201}, %rd13, %p169, 1, 1; 2026-02-21T09:23:31.2032642Z // end inline asm 2026-02-21T09:23:31.2032710Z // begin inline asm 2026-02-21T09:23:31.2034180Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12943,%r12944,%r12945,%r12946,%r12947,%r12948,%r12949,%r12950,%r12951,%r12952,%r12953,%r12954,%r12955,%r12956,%r12957,%r12958,%r12959,%r12960,%r12961,%r12962,%r12963,%r12964,%r12965,%r12966,%r12967,%r12968,%r12969,%r12970,%r12971,%r12972,%r12973,%r12974,%r12975,%r12976,%r12977,%r12978,%r12979,%r12980,%r12981,%r12982,%r12983,%r12984,%r12985,%r12986,%r12987,%r12988,%r12989,%r12990,%r12991,%r12992,%r12993,%r12994,%r12995,%r12996,%r12997,%r12998,%r12999,%r13000,%r13001,%r13002,%r13003,%r13004,%r13005,%r13006}, {%r11330,%r11331,%r11332,%r11333}, %rd14, %p169, 1, 1; 2026-02-21T09:23:31.2034239Z // end inline asm 2026-02-21T09:23:31.2034305Z // begin inline asm 2026-02-21T09:23:31.2035756Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12943,%r12944,%r12945,%r12946,%r12947,%r12948,%r12949,%r12950,%r12951,%r12952,%r12953,%r12954,%r12955,%r12956,%r12957,%r12958,%r12959,%r12960,%r12961,%r12962,%r12963,%r12964,%r12965,%r12966,%r12967,%r12968,%r12969,%r12970,%r12971,%r12972,%r12973,%r12974,%r12975,%r12976,%r12977,%r12978,%r12979,%r12980,%r12981,%r12982,%r12983,%r12984,%r12985,%r12986,%r12987,%r12988,%r12989,%r12990,%r12991,%r12992,%r12993,%r12994,%r12995,%r12996,%r12997,%r12998,%r12999,%r13000,%r13001,%r13002,%r13003,%r13004,%r13005,%r13006}, {%r11462,%r11463,%r11464,%r11465}, %rd15, %p169, 1, 1; 2026-02-21T09:23:31.2035922Z // end inline asm 2026-02-21T09:23:31.2035985Z // begin inline asm 2026-02-21T09:23:31.2037647Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12943,%r12944,%r12945,%r12946,%r12947,%r12948,%r12949,%r12950,%r12951,%r12952,%r12953,%r12954,%r12955,%r12956,%r12957,%r12958,%r12959,%r12960,%r12961,%r12962,%r12963,%r12964,%r12965,%r12966,%r12967,%r12968,%r12969,%r12970,%r12971,%r12972,%r12973,%r12974,%r12975,%r12976,%r12977,%r12978,%r12979,%r12980,%r12981,%r12982,%r12983,%r12984,%r12985,%r12986,%r12987,%r12988,%r12989,%r12990,%r12991,%r12992,%r12993,%r12994,%r12995,%r12996,%r12997,%r12998,%r12999,%r13000,%r13001,%r13002,%r13003,%r13004,%r13005,%r13006}, {%r11594,%r11595,%r11596,%r11597}, %rd16, %p169, 1, 1; 2026-02-21T09:23:31.2037728Z // end inline asm 2026-02-21T09:23:31.2037790Z // begin inline asm 2026-02-21T09:23:31.2039313Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12943,%r12944,%r12945,%r12946,%r12947,%r12948,%r12949,%r12950,%r12951,%r12952,%r12953,%r12954,%r12955,%r12956,%r12957,%r12958,%r12959,%r12960,%r12961,%r12962,%r12963,%r12964,%r12965,%r12966,%r12967,%r12968,%r12969,%r12970,%r12971,%r12972,%r12973,%r12974,%r12975,%r12976,%r12977,%r12978,%r12979,%r12980,%r12981,%r12982,%r12983,%r12984,%r12985,%r12986,%r12987,%r12988,%r12989,%r12990,%r12991,%r12992,%r12993,%r12994,%r12995,%r12996,%r12997,%r12998,%r12999,%r13000,%r13001,%r13002,%r13003,%r13004,%r13005,%r13006}, {%r11726,%r11727,%r11728,%r11729}, %rd17, %p169, 1, 1; 2026-02-21T09:23:31.2039388Z // end inline asm 2026-02-21T09:23:31.2039447Z // begin inline asm 2026-02-21T09:23:31.2040922Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12943,%r12944,%r12945,%r12946,%r12947,%r12948,%r12949,%r12950,%r12951,%r12952,%r12953,%r12954,%r12955,%r12956,%r12957,%r12958,%r12959,%r12960,%r12961,%r12962,%r12963,%r12964,%r12965,%r12966,%r12967,%r12968,%r12969,%r12970,%r12971,%r12972,%r12973,%r12974,%r12975,%r12976,%r12977,%r12978,%r12979,%r12980,%r12981,%r12982,%r12983,%r12984,%r12985,%r12986,%r12987,%r12988,%r12989,%r12990,%r12991,%r12992,%r12993,%r12994,%r12995,%r12996,%r12997,%r12998,%r12999,%r13000,%r13001,%r13002,%r13003,%r13004,%r13005,%r13006}, {%r11858,%r11859,%r11860,%r11861}, %rd18, %p169, 1, 1; 2026-02-21T09:23:31.2040983Z // end inline asm 2026-02-21T09:23:31.2041039Z // begin inline asm 2026-02-21T09:23:31.2042503Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12943,%r12944,%r12945,%r12946,%r12947,%r12948,%r12949,%r12950,%r12951,%r12952,%r12953,%r12954,%r12955,%r12956,%r12957,%r12958,%r12959,%r12960,%r12961,%r12962,%r12963,%r12964,%r12965,%r12966,%r12967,%r12968,%r12969,%r12970,%r12971,%r12972,%r12973,%r12974,%r12975,%r12976,%r12977,%r12978,%r12979,%r12980,%r12981,%r12982,%r12983,%r12984,%r12985,%r12986,%r12987,%r12988,%r12989,%r12990,%r12991,%r12992,%r12993,%r12994,%r12995,%r12996,%r12997,%r12998,%r12999,%r13000,%r13001,%r13002,%r13003,%r13004,%r13005,%r13006}, {%r11990,%r11991,%r11992,%r11993}, %rd19, %p169, 1, 1; 2026-02-21T09:23:31.2042563Z // end inline asm 2026-02-21T09:23:31.2042642Z wgmma.commit_group.sync.aligned; 2026-02-21T09:23:31.2042710Z mov.b32 %r12124, %r9731; 2026-02-21T09:23:31.2042771Z mov.b32 %r12122, %r9662; 2026-02-21T09:23:31.2042830Z mov.b32 %r12123, %r9731; 2026-02-21T09:23:31.2042895Z // begin inline asm 2026-02-21T09:23:31.2045364Z // wait for regs: %r12879,%r12880,%r12881,%r12882,%r12883,%r12884,%r12885,%r12886,%r12887,%r12888,%r12889,%r12890,%r12891,%r12892,%r12893,%r12894,%r12895,%r12896,%r12897,%r12898,%r12899,%r12900,%r12901,%r12902,%r12903,%r12904,%r12905,%r12906,%r12907,%r12908,%r12909,%r12910,%r12911,%r12912,%r12913,%r12914,%r12915,%r12916,%r12917,%r12918,%r12919,%r12920,%r12921,%r12922,%r12923,%r12924,%r12925,%r12926,%r12927,%r12928,%r12929,%r12930,%r12931,%r12932,%r12933,%r12934,%r12935,%r12936,%r12937,%r12938,%r12939,%r12940,%r12941,%r12942,%r12943,%r12944,%r12945,%r12946,%r12947,%r12948,%r12949,%r12950,%r12951,%r12952,%r12953,%r12954,%r12955,%r12956,%r12957,%r12958,%r12959,%r12960,%r12961,%r12962,%r12963,%r12964,%r12965,%r12966,%r12967,%r12968,%r12969,%r12970,%r12971,%r12972,%r12973,%r12974,%r12975,%r12976,%r12977,%r12978,%r12979,%r12980,%r12981,%r12982,%r12983,%r12984,%r12985,%r12986,%r12987,%r12988,%r12989,%r12990,%r12991,%r12992,%r12993,%r12994,%r12995,%r12996,%r12997,%r12998,%r12999,%r13000,%r13001,%r13002,%r13003,%r13004,%r13005,%r13006,%r12122,%r12123,%r12124 
2026-02-21T09:23:31.2045589Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:23:31.2045648Z // end inline asm 2026-02-21T09:23:31.2045704Z $L__tmp8: 2026-02-21T09:23:31.2045922Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.2045995Z add.s32 %r12391, %r12878, 1; 2026-02-21T09:23:31.2046154Z setp.gt.s32 %p191, %r12391, 1; 2026-02-21T09:23:31.2046229Z selp.b32 %r12878, 0, %r12391, %p191; 2026-02-21T09:23:31.2046443Z .loc 1 52 32 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:32 2026-02-21T09:23:31.2046651Z add.s64 %rd736, %rd761, %rd97; 2026-02-21T09:23:31.2046717Z add.s64 %rd737, %rd761, %rd96; 2026-02-21T09:23:31.2046785Z add.s64 %rd738, %rd761, %rd95; 2026-02-21T09:23:31.2046929Z add.s64 %rd739, %rd761, %rd94; 2026-02-21T09:23:31.2046996Z add.s64 %rd740, %rd761, %rd93; 2026-02-21T09:23:31.2047059Z add.s64 %rd741, %rd761, %rd92; 2026-02-21T09:23:31.2047125Z add.s64 %rd742, %rd761, %rd91; 2026-02-21T09:23:31.2047200Z add.s64 %rd743, %rd761, %rd90; 2026-02-21T09:23:31.2047264Z add.s64 %rd744, %rd761, %rd89; 2026-02-21T09:23:31.2047331Z add.s64 %rd745, %rd761, %rd88; 2026-02-21T09:23:31.2047400Z add.s64 %rd746, %rd761, %rd87; 2026-02-21T09:23:31.2047462Z add.s64 %rd747, %rd761, %rd86; 2026-02-21T09:23:31.2047526Z add.s64 %rd748, %rd761, %rd85; 2026-02-21T09:23:31.2047594Z add.s64 %rd749, %rd761, %rd84; 2026-02-21T09:23:31.2047654Z add.s64 %rd750, %rd761, %rd83; 2026-02-21T09:23:31.2047854Z .loc 1 52 80 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:52:80 2026-02-21T09:23:31.2047927Z add.s64 %rd751, %rd761, %rd82; 2026-02-21T09:23:31.2047991Z shl.b32 %r12392, %r12878, 14; 2026-02-21T09:23:31.2048056Z add.s32 %r12393, %r1290, %r12392; 2026-02-21T09:23:31.2048129Z add.s32 %r12256, %r12393, %r115; 2026-02-21T09:23:31.2048193Z selp.b32 %r12257, 8, 0, %p189; 2026-02-21T09:23:31.2048254Z // begin inline asm 2026-02-21T09:23:31.2048405Z cp.async.ca.shared.global [ %r12256 + 0 ], [ %rd736 + 0 ], 0x8, 
%r12257; 2026-02-21T09:23:31.2048468Z // end inline asm 2026-02-21T09:23:31.2048533Z add.s32 %r12258, %r12256, 1024; 2026-02-21T09:23:31.2048591Z // begin inline asm 2026-02-21T09:23:31.2048741Z cp.async.ca.shared.global [ %r12258 + 0 ], [ %rd737 + 0 ], 0x8, %r12257; 2026-02-21T09:23:31.2048797Z // end inline asm 2026-02-21T09:23:31.2048867Z add.s32 %r12260, %r12256, 2048; 2026-02-21T09:23:31.2048924Z // begin inline asm 2026-02-21T09:23:31.2049081Z cp.async.ca.shared.global [ %r12260 + 0 ], [ %rd738 + 0 ], 0x8, %r12257; 2026-02-21T09:23:31.2049142Z // end inline asm 2026-02-21T09:23:31.2049204Z add.s32 %r12262, %r12256, 3072; 2026-02-21T09:23:31.2049270Z // begin inline asm 2026-02-21T09:23:31.2049408Z cp.async.ca.shared.global [ %r12262 + 0 ], [ %rd739 + 0 ], 0x8, %r12257; 2026-02-21T09:23:31.2049464Z // end inline asm 2026-02-21T09:23:31.2049533Z add.s32 %r12264, %r12256, 4096; 2026-02-21T09:23:31.2049602Z // begin inline asm 2026-02-21T09:23:31.2049741Z cp.async.ca.shared.global [ %r12264 + 0 ], [ %rd740 + 0 ], 0x8, %r12257; 2026-02-21T09:23:31.2049798Z // end inline asm 2026-02-21T09:23:31.2049866Z add.s32 %r12266, %r12256, 5120; 2026-02-21T09:23:31.2049924Z // begin inline asm 2026-02-21T09:23:31.2050060Z cp.async.ca.shared.global [ %r12266 + 0 ], [ %rd741 + 0 ], 0x8, %r12257; 2026-02-21T09:23:31.2050121Z // end inline asm 2026-02-21T09:23:31.2050181Z add.s32 %r12268, %r12256, 6144; 2026-02-21T09:23:31.2050238Z // begin inline asm 2026-02-21T09:23:31.2050529Z cp.async.ca.shared.global [ %r12268 + 0 ], [ %rd742 + 0 ], 0x8, %r12257; 2026-02-21T09:23:31.2050595Z // end inline asm 2026-02-21T09:23:31.2050657Z add.s32 %r12270, %r12256, 7168; 2026-02-21T09:23:31.2050715Z // begin inline asm 2026-02-21T09:23:31.2050859Z cp.async.ca.shared.global [ %r12270 + 0 ], [ %rd743 + 0 ], 0x8, %r12257; 2026-02-21T09:23:31.2050915Z // end inline asm 2026-02-21T09:23:31.2050974Z add.s32 %r12272, %r12256, 8192; 2026-02-21T09:23:31.2051038Z // begin inline asm 
2026-02-21T09:23:31.2051172Z cp.async.ca.shared.global [ %r12272 + 0 ], [ %rd744 + 0 ], 0x8, %r12257; 2026-02-21T09:23:31.2051228Z // end inline asm 2026-02-21T09:23:31.2051288Z add.s32 %r12274, %r12256, 9216; 2026-02-21T09:23:31.2051350Z // begin inline asm 2026-02-21T09:23:31.2051554Z cp.async.ca.shared.global [ %r12274 + 0 ], [ %rd745 + 0 ], 0x8, %r12257; 2026-02-21T09:23:31.2051617Z // end inline asm 2026-02-21T09:23:31.2051683Z add.s32 %r12276, %r12256, 10240; 2026-02-21T09:23:31.2051747Z // begin inline asm 2026-02-21T09:23:31.2051884Z cp.async.ca.shared.global [ %r12276 + 0 ], [ %rd746 + 0 ], 0x8, %r12257; 2026-02-21T09:23:31.2051941Z // end inline asm 2026-02-21T09:23:31.2052006Z add.s32 %r12278, %r12256, 11264; 2026-02-21T09:23:31.2052118Z // begin inline asm 2026-02-21T09:23:31.2052259Z cp.async.ca.shared.global [ %r12278 + 0 ], [ %rd747 + 0 ], 0x8, %r12257; 2026-02-21T09:23:31.2052320Z // end inline asm 2026-02-21T09:23:31.2052380Z add.s32 %r12280, %r12256, 12288; 2026-02-21T09:23:31.2052441Z // begin inline asm 2026-02-21T09:23:31.2052579Z cp.async.ca.shared.global [ %r12280 + 0 ], [ %rd748 + 0 ], 0x8, %r12257; 2026-02-21T09:23:31.2052647Z // end inline asm 2026-02-21T09:23:31.2052708Z add.s32 %r12282, %r12256, 13312; 2026-02-21T09:23:31.2052767Z // begin inline asm 2026-02-21T09:23:31.2052910Z cp.async.ca.shared.global [ %r12282 + 0 ], [ %rd749 + 0 ], 0x8, %r12257; 2026-02-21T09:23:31.2052967Z // end inline asm 2026-02-21T09:23:31.2053030Z add.s32 %r12284, %r12256, 14336; 2026-02-21T09:23:31.2053095Z // begin inline asm 2026-02-21T09:23:31.2053231Z cp.async.ca.shared.global [ %r12284 + 0 ], [ %rd750 + 0 ], 0x8, %r12257; 2026-02-21T09:23:31.2053288Z // end inline asm 2026-02-21T09:23:31.2053349Z add.s32 %r12286, %r12256, 15360; 2026-02-21T09:23:31.2053415Z // begin inline asm 2026-02-21T09:23:31.2053550Z cp.async.ca.shared.global [ %r12286 + 0 ], [ %rd751 + 0 ], 0x8, %r12257; 2026-02-21T09:23:31.2053606Z // end inline asm 2026-02-21T09:23:31.2053677Z 
cp.async.commit_group; 2026-02-21T09:23:31.2053881Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.2053944Z shl.b32 %r12394, %r12878, 3; 2026-02-21T09:23:31.2054011Z add.s32 %r12288, %r9694, %r12394; 2026-02-21T09:23:31.2054082Z and.pred %p185, %p160, %p189; 2026-02-21T09:23:31.2054143Z // begin inline asm 2026-02-21T09:23:31.2054283Z @%p185 mbarrier.arrive.expect_tx.shared.b64 _, [%r12288], 4096; 2026-02-21T09:23:31.2054360Z // end inline asm 2026-02-21T09:23:31.2054563Z .loc 1 58 33 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:58:33 2026-02-21T09:23:31.2054627Z shl.b32 %r12395, %r12878, 12; 2026-02-21T09:23:31.2054694Z add.s32 %r12289, %r9729, %r12395; 2026-02-21T09:23:31.2054752Z bar.sync 0; 2026-02-21T09:23:31.2054822Z elect.sync %r12396|%p192, -1; 2026-02-21T09:23:31.2054889Z and.pred %p193, %p189, %p192; 2026-02-21T09:23:31.2054962Z and.pred %p186, %p1, %p193; 2026-02-21T09:23:31.2055026Z cvt.u32.u64 %r12397, %rd762; 2026-02-21T09:23:31.2055086Z add.s32 %r12291, %r12397, 64; 2026-02-21T09:23:31.2055149Z // begin inline asm 2026-02-21T09:23:31.2055490Z @%p186 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r12289], [%rd463, {%r9730, %r12291}], [%r12288]; 2026-02-21T09:23:31.2055550Z // end inline asm 2026-02-21T09:23:31.2055764Z .loc 1 45 78 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:45:78 2026-02-21T09:23:31.2055953Z add.s64 %rd761, %rd761, 128; 2026-02-21T09:23:31.2056025Z setp.lt.u64 %p194, %rd762, 480; 2026-02-21T09:23:31.2056091Z add.s64 %rd762, %rd762, 32; 2026-02-21T09:23:31.2056156Z @%p194 bra $L__BB0_12; 2026-02-21T09:23:31.2056271Z // %bb.13: // in Loop: Header=BB0_11 Depth=1 2026-02-21T09:23:31.2056341Z cp.async.wait_group 0; 2026-02-21T09:23:31.2056403Z bar.sync 0; 2026-02-21T09:23:31.2056586Z // begin inline asm 2026-02-21T09:23:31.2056708Z @%p160 mbarrier.inval.shared::cta.b64 [%r9694]; 2026-02-21T09:23:31.2056787Z // end inline asm 
2026-02-21T09:23:31.2056845Z bar.sync 0; 2026-02-21T09:23:31.2056906Z // begin inline asm 2026-02-21T09:23:31.2057004Z @%p160 mbarrier.inval.shared::cta.b64 [%r9695]; 2026-02-21T09:23:31.2057066Z // end inline asm 2026-02-21T09:23:31.2057370Z .loc 1 91 28 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:91:28 2026-02-21T09:23:31.2057466Z cvt.rn.bf16x2.f32 %r12404, %r12880, %r12879; 2026-02-21T09:23:31.2057559Z cvt.rn.bf16x2.f32 %r12405, %r12882, %r12881; 2026-02-21T09:23:31.2057642Z cvt.rn.bf16x2.f32 %r12406, %r12884, %r12883; 2026-02-21T09:23:31.2057720Z cvt.rn.bf16x2.f32 %r12407, %r12886, %r12885; 2026-02-21T09:23:31.2057869Z cvt.rn.bf16x2.f32 %r12408, %r12888, %r12887; 2026-02-21T09:23:31.2057955Z cvt.rn.bf16x2.f32 %r12409, %r12890, %r12889; 2026-02-21T09:23:31.2058033Z cvt.rn.bf16x2.f32 %r12410, %r12892, %r12891; 2026-02-21T09:23:31.2058109Z cvt.rn.bf16x2.f32 %r12411, %r12894, %r12893; 2026-02-21T09:23:31.2058190Z cvt.rn.bf16x2.f32 %r12412, %r12896, %r12895; 2026-02-21T09:23:31.2058283Z cvt.rn.bf16x2.f32 %r12413, %r12898, %r12897; 2026-02-21T09:23:31.2058359Z cvt.rn.bf16x2.f32 %r12414, %r12900, %r12899; 2026-02-21T09:23:31.2058441Z cvt.rn.bf16x2.f32 %r12415, %r12902, %r12901; 2026-02-21T09:23:31.2058516Z cvt.rn.bf16x2.f32 %r12416, %r12904, %r12903; 2026-02-21T09:23:31.2058593Z cvt.rn.bf16x2.f32 %r12417, %r12906, %r12905; 2026-02-21T09:23:31.2058679Z cvt.rn.bf16x2.f32 %r12418, %r12908, %r12907; 2026-02-21T09:23:31.2058756Z cvt.rn.bf16x2.f32 %r12419, %r12910, %r12909; 2026-02-21T09:23:31.2058835Z cvt.rn.bf16x2.f32 %r12420, %r12912, %r12911; 2026-02-21T09:23:31.2058910Z cvt.rn.bf16x2.f32 %r12421, %r12914, %r12913; 2026-02-21T09:23:31.2058995Z cvt.rn.bf16x2.f32 %r12422, %r12916, %r12915; 2026-02-21T09:23:31.2059072Z cvt.rn.bf16x2.f32 %r12423, %r12918, %r12917; 2026-02-21T09:23:31.2059148Z cvt.rn.bf16x2.f32 %r12424, %r12920, %r12919; 2026-02-21T09:23:31.2059235Z cvt.rn.bf16x2.f32 %r12425, %r12922, %r12921; 2026-02-21T09:23:31.2059315Z cvt.rn.bf16x2.f32 
%r12426, %r12924, %r12923; 2026-02-21T09:23:31.2059391Z cvt.rn.bf16x2.f32 %r12427, %r12926, %r12925; 2026-02-21T09:23:31.2059472Z cvt.rn.bf16x2.f32 %r12428, %r12928, %r12927; 2026-02-21T09:23:31.2059550Z cvt.rn.bf16x2.f32 %r12429, %r12930, %r12929; 2026-02-21T09:23:31.2059626Z cvt.rn.bf16x2.f32 %r12430, %r12932, %r12931; 2026-02-21T09:23:31.2059704Z cvt.rn.bf16x2.f32 %r12431, %r12934, %r12933; 2026-02-21T09:23:31.2059788Z cvt.rn.bf16x2.f32 %r12432, %r12936, %r12935; 2026-02-21T09:23:31.2059864Z cvt.rn.bf16x2.f32 %r12433, %r12938, %r12937; 2026-02-21T09:23:31.2059939Z cvt.rn.bf16x2.f32 %r12434, %r12940, %r12939; 2026-02-21T09:23:31.2060020Z cvt.rn.bf16x2.f32 %r12435, %r12942, %r12941; 2026-02-21T09:23:31.2060096Z cvt.rn.bf16x2.f32 %r12436, %r12944, %r12943; 2026-02-21T09:23:31.2060172Z cvt.rn.bf16x2.f32 %r12437, %r12946, %r12945; 2026-02-21T09:23:31.2060254Z cvt.rn.bf16x2.f32 %r12438, %r12948, %r12947; 2026-02-21T09:23:31.2060339Z cvt.rn.bf16x2.f32 %r12439, %r12950, %r12949; 2026-02-21T09:23:31.2060415Z cvt.rn.bf16x2.f32 %r12440, %r12952, %r12951; 2026-02-21T09:23:31.2060489Z cvt.rn.bf16x2.f32 %r12441, %r12954, %r12953; 2026-02-21T09:23:31.2060571Z cvt.rn.bf16x2.f32 %r12442, %r12956, %r12955; 2026-02-21T09:23:31.2060646Z cvt.rn.bf16x2.f32 %r12443, %r12958, %r12957; 2026-02-21T09:23:31.2060723Z cvt.rn.bf16x2.f32 %r12444, %r12960, %r12959; 2026-02-21T09:23:31.2060972Z cvt.rn.bf16x2.f32 %r12445, %r12962, %r12961; 2026-02-21T09:23:31.2061049Z cvt.rn.bf16x2.f32 %r12446, %r12964, %r12963; 2026-02-21T09:23:31.2061124Z cvt.rn.bf16x2.f32 %r12447, %r12966, %r12965; 2026-02-21T09:23:31.2061198Z cvt.rn.bf16x2.f32 %r12448, %r12968, %r12967; 2026-02-21T09:23:31.2061278Z cvt.rn.bf16x2.f32 %r12449, %r12970, %r12969; 2026-02-21T09:23:31.2061355Z cvt.rn.bf16x2.f32 %r12450, %r12972, %r12971; 2026-02-21T09:23:31.2061431Z cvt.rn.bf16x2.f32 %r12451, %r12974, %r12973; 2026-02-21T09:23:31.2061513Z cvt.rn.bf16x2.f32 %r12452, %r12976, %r12975; 2026-02-21T09:23:31.2061588Z cvt.rn.bf16x2.f32 
%r12453, %r12978, %r12977; 2026-02-21T09:23:31.2061667Z cvt.rn.bf16x2.f32 %r12454, %r12980, %r12979; 2026-02-21T09:23:31.2061746Z cvt.rn.bf16x2.f32 %r12455, %r12982, %r12981; 2026-02-21T09:23:31.2061884Z cvt.rn.bf16x2.f32 %r12456, %r12984, %r12983; 2026-02-21T09:23:31.2061963Z cvt.rn.bf16x2.f32 %r12457, %r12986, %r12985; 2026-02-21T09:23:31.2062041Z cvt.rn.bf16x2.f32 %r12458, %r12988, %r12987; 2026-02-21T09:23:31.2062138Z cvt.rn.bf16x2.f32 %r12459, %r12990, %r12989; 2026-02-21T09:23:31.2062216Z cvt.rn.bf16x2.f32 %r12460, %r12992, %r12991; 2026-02-21T09:23:31.2062294Z cvt.rn.bf16x2.f32 %r12461, %r12994, %r12993; 2026-02-21T09:23:31.2062421Z cvt.rn.bf16x2.f32 %r12462, %r12996, %r12995; 2026-02-21T09:23:31.2062499Z cvt.rn.bf16x2.f32 %r12463, %r12998, %r12997; 2026-02-21T09:23:31.2062575Z cvt.rn.bf16x2.f32 %r12464, %r13000, %r12999; 2026-02-21T09:23:31.2062657Z cvt.rn.bf16x2.f32 %r12465, %r13002, %r13001; 2026-02-21T09:23:31.2062732Z cvt.rn.bf16x2.f32 %r12466, %r13004, %r13003; 2026-02-21T09:23:31.2062818Z cvt.rn.bf16x2.f32 %r12467, %r13006, %r13005; 2026-02-21T09:23:31.2063029Z .loc 1 92 43 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:92:43 2026-02-21T09:23:31.2063238Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r164], {%r12404, %r12405, %r12406, %r12407}; 2026-02-21T09:23:31.2063434Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r165], {%r12420, %r12421, %r12422, %r12423}; 2026-02-21T09:23:31.2063630Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r166], {%r12436, %r12437, %r12438, %r12439}; 2026-02-21T09:23:31.2063872Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r167], {%r12452, %r12453, %r12454, %r12455}; 2026-02-21T09:23:31.2064075Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r168], {%r12408, %r12409, %r12410, %r12411}; 2026-02-21T09:23:31.2064265Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r169], {%r12424, %r12425, %r12426, %r12427}; 2026-02-21T09:23:31.2064456Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r170], {%r12440, %r12441, %r12442, %r12443}; 
2026-02-21T09:23:31.2064640Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r171], {%r12456, %r12457, %r12458, %r12459}; 2026-02-21T09:23:31.2064825Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r172], {%r12412, %r12413, %r12414, %r12415}; 2026-02-21T09:23:31.2065013Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r173], {%r12428, %r12429, %r12430, %r12431}; 2026-02-21T09:23:31.2065198Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r174], {%r12444, %r12445, %r12446, %r12447}; 2026-02-21T09:23:31.2065383Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r175], {%r12460, %r12461, %r12462, %r12463}; 2026-02-21T09:23:31.2065575Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r176], {%r12416, %r12417, %r12418, %r12419}; 2026-02-21T09:23:31.2065761Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r177], {%r12432, %r12433, %r12434, %r12435}; 2026-02-21T09:23:31.2065945Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r178], {%r12448, %r12449, %r12450, %r12451}; 2026-02-21T09:23:31.2066133Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r179], {%r12464, %r12465, %r12466, %r12467}; 2026-02-21T09:23:31.2066196Z // begin inline asm 2026-02-21T09:23:31.2066276Z fence.proxy.async.shared::cta; 2026-02-21T09:23:31.2066335Z // end inline asm 2026-02-21T09:23:31.2066396Z bar.sync 0; 2026-02-21T09:23:31.2066595Z elect.sync %r12468|%p199, -1; 2026-02-21T09:23:31.2066695Z shfl.sync.idx.b32 %r12469, %r5, 0, 31, -1; 2026-02-21T09:23:31.2066958Z and.pred %p197, %p201, %p199; 2026-02-21T09:23:31.2067024Z and.b32 %r12470, %r12469, 1; 2026-02-21T09:23:31.2067086Z shl.b32 %r12471, %r12470, 14; 2026-02-21T09:23:31.2067152Z add.s32 %r12402, %r1290, %r12471; 2026-02-21T09:23:31.2067221Z shl.b32 %r12472, %r12470, 6; 2026-02-21T09:23:31.2067285Z or.b32 %r12400, %r12472, %r9730; 2026-02-21T09:23:31.2067346Z // begin inline asm 2026-02-21T09:23:31.2067590Z @%p197 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd446, {%r12400, %r12401}], [%r12402]; 2026-02-21T09:23:31.2067649Z // end inline asm 2026-02-21T09:23:31.2067724Z 
cp.async.bulk.commit_group; 2026-02-21T09:23:31.2067804Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:23:31.2067859Z bar.sync 0; 2026-02-21T09:23:31.2068133Z .loc 1 31 88 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:31:88 2026-02-21T09:23:31.2068200Z add.s32 %r12875, %r12875, 1; 2026-02-21T09:23:31.2068356Z setp.ne.b32 %p200, %r12875, %r3; 2026-02-21T09:23:31.2068425Z @%p200 bra $L__BB0_11; 2026-02-21T09:23:31.2068518Z $L__BB0_14: // %._crit_edge 2026-02-21T09:23:31.2068737Z .loc 1 31 4 // cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py:31:4 2026-02-21T09:23:31.2068860Z ret; 2026-02-21T09:23:31.2068920Z $L__tmp9: 2026-02-21T09:23:31.2068981Z $L__func_end0: 2026-02-21T09:23:31.2069071Z // -- End function 2026-02-21T09:23:31.2069125Z } 2026-02-21T09:23:31.2069371Z .file 1 "/tmp/torchinductor_root/nl/cnlnm23revxarq5q5fe6q2fdt5yovpbxx4oeezty5gndrpq2ifp4.py" 2026-02-21T09:23:31.2069605Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T09:23:31.2069673Z .section .debug_abbrev 2026-02-21T09:23:31.2069726Z { 2026-02-21T09:23:31.2069830Z .b8 1 // Abbreviation Code 2026-02-21T09:23:31.2069926Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:23:31.2070014Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:23:31.2070105Z .b8 37 // DW_AT_producer 2026-02-21T09:23:31.2070187Z .b8 8 // DW_FORM_string 2026-02-21T09:23:31.2070267Z .b8 19 // DW_AT_language 2026-02-21T09:23:31.2070349Z .b8 5 // DW_FORM_data2 2026-02-21T09:23:31.2070434Z .b8 3 // DW_AT_name 2026-02-21T09:23:31.2070512Z .b8 8 // DW_FORM_string 2026-02-21T09:23:31.2070605Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:23:31.2070693Z .b8 6 // DW_FORM_data4 2026-02-21T09:23:31.2070776Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:23:31.2070855Z .b8 8 // DW_FORM_string 2026-02-21T09:23:31.2070936Z .b8 0 // EOM(1) 2026-02-21T09:23:31.2071013Z .b8 0 // EOM(2) 2026-02-21T09:23:31.2071103Z .b8 2 // Abbreviation Code 2026-02-21T09:23:31.2071191Z .b8 46 // DW_TAG_subprogram 
2026-02-21T09:23:31.2071278Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:23:31.2071354Z .b8 3 // DW_AT_name 2026-02-21T09:23:31.2071433Z .b8 8 // DW_FORM_string 2026-02-21T09:23:31.2071517Z .b8 32 // DW_AT_inline 2026-02-21T09:23:31.2071601Z .b8 11 // DW_FORM_data1 2026-02-21T09:23:31.2071671Z .b8 0 // EOM(1) 2026-02-21T09:23:31.2071745Z .b8 0 // EOM(2) 2026-02-21T09:23:31.2071833Z .b8 3 // Abbreviation Code 2026-02-21T09:23:31.2071919Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:23:31.2072128Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:23:31.2072208Z .b8 17 // DW_AT_low_pc 2026-02-21T09:23:31.2072287Z .b8 1 // DW_FORM_addr 2026-02-21T09:23:31.2072369Z .b8 18 // DW_AT_high_pc 2026-02-21T09:23:31.2072450Z .b8 1 // DW_FORM_addr 2026-02-21T09:23:31.2072543Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:23:31.2072619Z .b8 19 // DW_FORM_ref4 2026-02-21T09:23:31.2072692Z .b8 0 // EOM(1) 2026-02-21T09:23:31.2072759Z .b8 0 // EOM(2) 2026-02-21T09:23:31.2072903Z .b8 4 // Abbreviation Code 2026-02-21T09:23:31.2073014Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T09:23:31.2073097Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:23:31.2073192Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:23:31.2073269Z .b8 19 // DW_FORM_ref4 2026-02-21T09:23:31.2073351Z .b8 17 // DW_AT_low_pc 2026-02-21T09:23:31.2073479Z .b8 1 // DW_FORM_addr 2026-02-21T09:23:31.2073565Z .b8 18 // DW_AT_high_pc 2026-02-21T09:23:31.2073644Z .b8 1 // DW_FORM_addr 2026-02-21T09:23:31.2073726Z .b8 88 // DW_AT_call_file 2026-02-21T09:23:31.2073815Z .b8 11 // DW_FORM_data1 2026-02-21T09:23:31.2073900Z .b8 89 // DW_AT_call_line 2026-02-21T09:23:31.2073980Z .b8 11 // DW_FORM_data1 2026-02-21T09:23:31.2074063Z .b8 87 // DW_AT_call_column 2026-02-21T09:23:31.2074141Z .b8 11 // DW_FORM_data1 2026-02-21T09:23:31.2074220Z .b8 0 // EOM(1) 2026-02-21T09:23:31.2074289Z .b8 0 // EOM(2) 2026-02-21T09:23:31.2074357Z .b8 0 // EOM(3) 2026-02-21T09:23:31.2074414Z } 2026-02-21T09:23:31.2074478Z .section .debug_info 2026-02-21T09:23:31.2074530Z { 
2026-02-21T09:23:31.2074627Z .b32 178 // Length of Unit 2026-02-21T09:23:31.2074718Z .b8 2 // DWARF version number 2026-02-21T09:23:31.2074770Z .b8 0 2026-02-21T09:23:31.2074902Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:23:31.2075005Z .b8 8 // Address Size (in bytes) 2026-02-21T09:23:31.2075133Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T09:23:31.2075221Z .b8 116 // DW_AT_producer 2026-02-21T09:23:31.2075283Z .b8 114 2026-02-21T09:23:31.2075340Z .b8 105 2026-02-21T09:23:31.2075393Z .b8 116 2026-02-21T09:23:31.2075444Z .b8 111 2026-02-21T09:23:31.2075501Z .b8 110 2026-02-21T09:23:31.2075551Z .b8 0 2026-02-21T09:23:31.2075632Z .b8 2 // DW_AT_language 2026-02-21T09:23:31.2075689Z .b8 0 2026-02-21T09:23:31.2075766Z .b8 99 // DW_AT_name 2026-02-21T09:23:31.2075818Z .b8 110 2026-02-21T09:23:31.2075870Z .b8 108 2026-02-21T09:23:31.2075927Z .b8 110 2026-02-21T09:23:31.2075978Z .b8 109 2026-02-21T09:23:31.2076033Z .b8 50 2026-02-21T09:23:31.2076091Z .b8 51 2026-02-21T09:23:31.2076143Z .b8 114 2026-02-21T09:23:31.2076194Z .b8 101 2026-02-21T09:23:31.2076247Z .b8 118 2026-02-21T09:23:31.2076302Z .b8 120 2026-02-21T09:23:31.2076354Z .b8 97 2026-02-21T09:23:31.2076406Z .b8 114 2026-02-21T09:23:31.2076616Z .b8 113 2026-02-21T09:23:31.2076673Z .b8 53 2026-02-21T09:23:31.2076727Z .b8 113 2026-02-21T09:23:31.2076778Z .b8 53 2026-02-21T09:23:31.2076835Z .b8 102 2026-02-21T09:23:31.2077043Z .b8 101 2026-02-21T09:23:31.2077096Z .b8 54 2026-02-21T09:23:31.2077153Z .b8 113 2026-02-21T09:23:31.2077206Z .b8 50 2026-02-21T09:23:31.2077259Z .b8 102 2026-02-21T09:23:31.2077310Z .b8 100 2026-02-21T09:23:31.2077366Z .b8 116 2026-02-21T09:23:31.2077418Z .b8 53 2026-02-21T09:23:31.2077473Z .b8 121 2026-02-21T09:23:31.2077524Z .b8 111 2026-02-21T09:23:31.2077581Z .b8 118 2026-02-21T09:23:31.2077645Z .b8 112 2026-02-21T09:23:31.2077698Z .b8 98 2026-02-21T09:23:31.2077756Z .b8 120 2026-02-21T09:23:31.2077807Z .b8 120 2026-02-21T09:23:31.2077859Z .b8 52 
2026-02-21T09:23:31.2077912Z .b8 111 2026-02-21T09:23:31.2077969Z .b8 101 2026-02-21T09:23:31.2078021Z .b8 101 2026-02-21T09:23:31.2078072Z .b8 122 2026-02-21T09:23:31.2078128Z .b8 116 2026-02-21T09:23:31.2078183Z .b8 121 2026-02-21T09:23:31.2078316Z .b8 53 2026-02-21T09:23:31.2078371Z .b8 103 2026-02-21T09:23:31.2078429Z .b8 110 2026-02-21T09:23:31.2078481Z .b8 100 2026-02-21T09:23:31.2078533Z .b8 114 2026-02-21T09:23:31.2078587Z .b8 112 2026-02-21T09:23:31.2078645Z .b8 113 2026-02-21T09:23:31.2078696Z .b8 50 2026-02-21T09:23:31.2078748Z .b8 105 2026-02-21T09:23:31.2078802Z .b8 102 2026-02-21T09:23:31.2078854Z .b8 112 2026-02-21T09:23:31.2078904Z .b8 52 2026-02-21T09:23:31.2078955Z .b8 46 2026-02-21T09:23:31.2079085Z .b8 112 2026-02-21T09:23:31.2079145Z .b8 121 2026-02-21T09:23:31.2079197Z .b8 0 2026-02-21T09:23:31.2079308Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:23:31.2079390Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:23:31.2079443Z .b8 116 2026-02-21T09:23:31.2079504Z .b8 109 2026-02-21T09:23:31.2079566Z .b8 112 2026-02-21T09:23:31.2079617Z .b8 47 2026-02-21T09:23:31.2079668Z .b8 116 2026-02-21T09:23:31.2079734Z .b8 111 2026-02-21T09:23:31.2079787Z .b8 114 2026-02-21T09:23:31.2079840Z .b8 99 2026-02-21T09:23:31.2079892Z .b8 104 2026-02-21T09:23:31.2079952Z .b8 105 2026-02-21T09:23:31.2080003Z .b8 110 2026-02-21T09:23:31.2080054Z .b8 100 2026-02-21T09:23:31.2080110Z .b8 117 2026-02-21T09:23:31.2080165Z .b8 99 2026-02-21T09:23:31.2080216Z .b8 116 2026-02-21T09:23:31.2080267Z .b8 111 2026-02-21T09:23:31.2080321Z .b8 114 2026-02-21T09:23:31.2080372Z .b8 95 2026-02-21T09:23:31.2080424Z .b8 114 2026-02-21T09:23:31.2080475Z .b8 111 2026-02-21T09:23:31.2080530Z .b8 111 2026-02-21T09:23:31.2080583Z .b8 116 2026-02-21T09:23:31.2080634Z .b8 47 2026-02-21T09:23:31.2080689Z .b8 110 2026-02-21T09:23:31.2080740Z .b8 108 2026-02-21T09:23:31.2080790Z .b8 0 2026-02-21T09:23:31.2080903Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T09:23:31.2080985Z .b8 95 // 
DW_AT_name 2026-02-21T09:23:31.2081037Z .b8 104 2026-02-21T09:23:31.2081090Z .b8 101 2026-02-21T09:23:31.2081145Z .b8 108 2026-02-21T09:23:31.2081198Z .b8 105 2026-02-21T09:23:31.2081261Z .b8 111 2026-02-21T09:23:31.2081316Z .b8 110 2026-02-21T09:23:31.2081374Z .b8 95 2026-02-21T09:23:31.2081426Z .b8 109 2026-02-21T09:23:31.2081478Z .b8 97 2026-02-21T09:23:31.2081538Z .b8 116 2026-02-21T09:23:31.2081589Z .b8 109 2026-02-21T09:23:31.2081643Z .b8 117 2026-02-21T09:23:31.2081695Z .b8 108 2026-02-21T09:23:31.2081751Z .b8 95 2026-02-21T09:23:31.2081804Z .b8 98 2026-02-21T09:23:31.2081856Z .b8 102 2026-02-21T09:23:31.2081908Z .b8 49 2026-02-21T09:23:31.2081967Z .b8 54 2026-02-21T09:23:31.2082018Z .b8 95 2026-02-21T09:23:31.2082069Z .b8 105 2026-02-21T09:23:31.2082125Z .b8 110 2026-02-21T09:23:31.2082178Z .b8 116 2026-02-21T09:23:31.2082228Z .b8 52 2026-02-21T09:23:31.2082279Z .b8 0 2026-02-21T09:23:31.2082365Z .b8 1 // DW_AT_inline 2026-02-21T09:23:31.2082471Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T09:23:31.2082564Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T09:23:31.2082665Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T09:23:31.2082764Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:23:31.2082895Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T09:23:31.2083133Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:23:31.2083225Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T09:23:31.2083317Z .b64 $L__tmp8 // DW_AT_high_pc 2026-02-21T09:23:31.2083405Z .b8 1 // DW_AT_call_file 2026-02-21T09:23:31.2083489Z .b8 88 // DW_AT_call_line 2026-02-21T09:23:31.2083577Z .b8 40 // DW_AT_call_column 2026-02-21T09:23:31.2083667Z .b8 0 // End Of Children Mark 2026-02-21T09:23:31.2083762Z .b8 0 // End Of Children Mark 2026-02-21T09:23:31.2083815Z } 2026-02-21T09:23:31.2083939Z .section .debug_macinfo { } 2026-02-21T09:23:31.2083945Z 2026-02-21T09:23:31.2084034Z ================================================================ 
2026-02-21T09:23:31.2084162Z please share the reproducer above with Triton project. 2026-02-21T09:23:32.8301011Z 2026-02-21T09:23:32.8301025Z 2026-02-21T09:23:32.8301030Z 2026-02-21T09:23:32.8301527Z ================================================================ 2026-02-21T09:23:32.8302348Z Internal Triton PTX codegen error 2026-02-21T09:23:32.8302644Z `ptxas` stderr: 2026-02-21T09:23:32.8303428Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 1389 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T09:23:32.8304255Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:23:32.8304492Z 2026-02-21T09:23:32.8305143Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpcdmj90jr.ptx -o /tmp/tmpcdmj90jr.ptx.o 2026-02-21T09:23:32.8305911Z 2026-02-21T09:23:32.8305916Z 2026-02-21T09:23:32.8305990Z // 2026-02-21T09:23:32.8306191Z // Generated by LLVM NVPTX Back-End 2026-02-21T09:23:32.8306690Z // 2026-02-21T09:23:32.8306796Z 2026-02-21T09:23:32.8306874Z .version 8.7 2026-02-21T09:23:32.8307064Z .target sm_90a 2026-02-21T09:23:32.8307268Z .address_size 64 2026-02-21T09:23:32.8307397Z 2026-02-21T09:23:32.8307639Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T09:23:32.8308092Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T09:23:32.8308522Z // @_helion_matmul_bf16_int4 2026-02-21T09:23:32.8308862Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T09:23:32.8309257Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T09:23:32.8309758Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T09:23:32.8310201Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T09:23:32.8310587Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 
2026-02-21T09:23:32.8310983Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T09:23:32.8311284Z ) 2026-02-21T09:23:32.8311432Z .reqntid 128 2026-02-21T09:23:32.8311590Z .maxnreg 32 2026-02-21T09:23:32.8311742Z { 2026-02-21T09:23:32.8311888Z .reg .pred %p<199>; 2026-02-21T09:23:32.8312069Z .reg .b16 %rs<897>; 2026-02-21T09:23:32.8312240Z .reg .b32 %r<12851>; 2026-02-21T09:23:32.8312434Z .reg .b64 %rd<744>; 2026-02-21T09:23:32.8312776Z .loc 1 19 0 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:19:0 2026-02-21T09:23:32.8313193Z $L__func_begin0: 2026-02-21T09:23:32.8313525Z .loc 1 19 0 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:19:0 2026-02-21T09:23:32.8313850Z 2026-02-21T09:23:32.8313916Z // %bb.0: 2026-02-21T09:23:32.8314137Z ld.param.b64 %rd95, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T09:23:32.8314415Z $L__tmp0: 2026-02-21T09:23:32.8314721Z .loc 1 21 66 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:21:66 2026-02-21T09:23:32.8315417Z mov.u32 %r1196, %ctaid.x; 2026-02-21T09:23:32.8315702Z ld.param.b64 %rd113, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T09:23:32.8315991Z mov.u32 %r1197, %ctaid.y; 2026-02-21T09:23:32.8316843Z [693s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T09:23:32.8318812Z Config: @helion.kernel(config=helion.Config(block_sizes=[32, 128, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[1], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=8, num_stages=7, num_warps=4, pid_type='persistent_blocked', range_flattens=[None, False], range_multi_buffers=[False, None], range_num_stages=[2, 3], range_unroll_factors=[4, 1], range_warp_specializes=[]), static_shapes=True) 2026-02-21T09:23:32.8320678Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T09:23:32.8320981Z `ptxas` stderr: 2026-02-21T09:23:32.8321593Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 1389 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T09:23:32.8322251Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T09:23:32.8322533Z 2026-02-21T09:23:32.8323053Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpcdmj90jr.ptx -o /tmp/tmpcdmj90jr.ptx.o 2026-02-21T09:23:32.8323636Z 2026-02-21T09:23:32.8323804Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T09:23:32.8324172Z ld.param.b64 %rd130, [_helion_matmul_bf16_int4_param_3]; 2026-02-21T09:23:32.8324449Z mov.u32 %r1198, %ctaid.z; 2026-02-21T09:23:32.8324634Z mov.u32 %r1199, %nctaid.x; 2026-02-21T09:23:32.8324835Z mov.u32 %r1200, %nctaid.y; 2026-02-21T09:23:32.8325036Z mad.lo.s32 %r1201, %r1198, %r1200, %r1197; 2026-02-21T09:23:32.8325320Z mad.lo.s32 %r1202, %r1201, %r1199, %r1196; 2026-02-21T09:23:32.8325574Z shl.b32 %r1203, %r1202, 8; 2026-02-21T09:23:32.8325755Z cvt.s64.s32 %rd131, %r1203; 2026-02-21T09:23:32.8325960Z add.s64 %rd109, %rd130, %rd131; 2026-02-21T09:23:32.8326157Z mov.u32 %r1, %tid.x; 2026-02-21T09:23:32.8326345Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T09:23:32.8326686Z shl.b32 %r1204, %r1, 2; 2026-02-21T09:23:32.8326882Z mov.b32 %r1205, global_smem; 2026-02-21T09:23:32.8327071Z add.s32 %r1180, %r1205, %r1204; 2026-02-21T09:23:32.8327268Z mov.b32 %r1189, 0; 2026-02-21T09:23:32.8327438Z // begin inline asm 2026-02-21T09:23:32.8327622Z @%p1 st.shared.b32 [ %r1180 + 0 ], %r1189; 2026-02-21T09:23:32.8327848Z // end inline asm 2026-02-21T09:23:32.8328007Z bar.warp.sync -1; 2026-02-21T09:23:32.8328183Z setp.eq.b32 %p2, %r1, 0; 2026-02-21T09:23:32.8328369Z cvt.u64.u32 %rd94, %r1205; 2026-02-21T09:23:32.8328563Z // begin inline asm 2026-02-21T09:23:32.8328898Z @%p2 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd94 + 0 ], %rd95; 2026-02-21T09:23:32.8329285Z // end inline asm 2026-02-21T09:23:32.8329448Z // begin inline asm 2026-02-21T09:23:32.8329731Z @%p2 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd94 + 0 ], 0x1; 2026-02-21T09:23:32.8330056Z // end inline asm 2026-02-21T09:23:32.8330219Z mov.b32 %r1182, 128; 2026-02-21T09:23:32.8330408Z // begin inline asm 2026-02-21T09:23:32.8330692Z @%p2 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd94 + 0 ], 0x0, %r1182; 2026-02-21T09:23:32.8331044Z // end inline asm 2026-02-21T09:23:32.8331206Z mov.b32 %r1183, 32; 2026-02-21T09:23:32.8331384Z // begin inline asm 
2026-02-21T09:23:32.8331685Z @%p2 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd94 + 0 ], 0x1, %r1183; 2026-02-21T09:23:32.8332020Z // end inline asm 2026-02-21T09:23:32.8332192Z mov.b32 %r1184, 8192; 2026-02-21T09:23:32.8332364Z // begin inline asm 2026-02-21T09:23:32.8332671Z @%p2 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd94 + 0 ], 0x0, %r1184; 2026-02-21T09:23:32.8333223Z // end inline asm 2026-02-21T09:23:32.8333415Z mov.b32 %r1185, 512; 2026-02-21T09:23:32.8333594Z // begin inline asm 2026-02-21T09:23:32.8333900Z @%p2 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd94 + 0 ], 0x1, %r1185; 2026-02-21T09:23:32.8334273Z // end inline asm 2026-02-21T09:23:32.8334447Z mov.b64 %rd102, 8192; 2026-02-21T09:23:32.8334618Z // begin inline asm 2026-02-21T09:23:32.8334946Z @%p2 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd94 + 0 ], 0x0, %rd102; 2026-02-21T09:23:32.8335318Z // end inline asm 2026-02-21T09:23:32.8335472Z mov.b32 %r1186, 1; 2026-02-21T09:23:32.8335643Z // begin inline asm 2026-02-21T09:23:32.8335975Z @%p2 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd94 + 0 ], 0x0, %r1186; 2026-02-21T09:23:32.8336437Z // end inline asm 2026-02-21T09:23:32.8336766Z // begin inline asm 2026-02-21T09:23:32.8337081Z @%p2 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd94 + 0 ], 0x1, %r1186; 2026-02-21T09:23:32.8337450Z // end inline asm 2026-02-21T09:23:32.8337622Z // begin inline asm 2026-02-21T09:23:32.8337907Z @%p2 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd94 + 0 ], 0x0; 2026-02-21T09:23:32.8338240Z // end inline asm 2026-02-21T09:23:32.8338486Z // begin inline asm 2026-02-21T09:23:32.8338805Z @%p2 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd94 + 0 ], 0x0; 2026-02-21T09:23:32.8339156Z // end inline asm 2026-02-21T09:23:32.8339313Z // begin inline asm 2026-02-21T09:23:32.8339600Z @%p2 
tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd94 + 0 ], 0x3; 2026-02-21T09:23:32.8339937Z // end inline asm 2026-02-21T09:23:32.8340111Z // begin inline asm 2026-02-21T09:23:32.8340392Z @%p2 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd94 + 0 ], 0x0; 2026-02-21T09:23:32.8340722Z // end inline asm 2026-02-21T09:23:32.8340878Z // begin inline asm 2026-02-21T09:23:32.8341327Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd109 + 0 ], [ %rd94 + 0 ], 0x80; 2026-02-21T09:23:32.8341813Z // end inline asm 2026-02-21T09:23:32.8341973Z // begin inline asm 2026-02-21T09:23:32.8342241Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd109 + 0 ], 0x80; 2026-02-21T09:23:32.8342558Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:23:32.8342806Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:23:32.8343021Z // end inline asm 2026-02-21T09:23:32.8343180Z bar.sync 0; 2026-02-21T09:23:32.8343343Z cvta.global.u64 %rd603, %rd109; 2026-02-21T09:23:32.8343697Z .loc 1 23 68 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:23:68 2026-02-21T09:23:32.8344080Z add.s64 %rd127, %rd109, 128; 2026-02-21T09:23:32.8344276Z bar.sync 0; 2026-02-21T09:23:32.8344432Z // begin inline asm 2026-02-21T09:23:32.8344610Z @%p1 st.shared.b32 [ %r1180 + 0 ], %r1189; 2026-02-21T09:23:32.8344831Z // end inline asm 2026-02-21T09:23:32.8344988Z bar.warp.sync -1; 2026-02-21T09:23:32.8345160Z // begin inline asm 2026-02-21T09:23:32.8345465Z @%p2 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd94 + 0 ], %rd113; 2026-02-21T09:23:32.8345817Z // end inline asm 2026-02-21T09:23:32.8345968Z // begin inline asm 2026-02-21T09:23:32.8346243Z @%p2 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd94 + 0 ], 0x1; 2026-02-21T09:23:32.8346702Z // end inline asm 2026-02-21T09:23:32.8346864Z mov.b32 %r1190, 64; 2026-02-21T09:23:32.8347030Z // begin inline asm 2026-02-21T09:23:32.8347314Z @%p2 
tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd94 + 0 ], 0x0, %r1190; 2026-02-21T09:23:32.8347667Z // end inline asm 2026-02-21T09:23:32.8347820Z // begin inline asm 2026-02-21T09:23:32.8348109Z @%p2 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd94 + 0 ], 0x1, %r1182; 2026-02-21T09:23:32.8348535Z // end inline asm 2026-02-21T09:23:32.8348689Z // begin inline asm 2026-02-21T09:23:32.8348995Z @%p2 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd94 + 0 ], 0x0, %r1184; 2026-02-21T09:23:32.8349519Z // end inline asm 2026-02-21T09:23:32.8349682Z mov.b32 %r1193, 16384; 2026-02-21T09:23:32.8349852Z // begin inline asm 2026-02-21T09:23:32.8365062Z @%p2 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd94 + 0 ], 0x1, %r1193; 2026-02-21T09:23:32.8365534Z // end inline asm 2026-02-21T09:23:32.8365722Z mov.b64 %rd120, 16384; 2026-02-21T09:23:32.8365911Z // begin inline asm 2026-02-21T09:23:32.8366266Z @%p2 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd94 + 0 ], 0x0, %rd120; 2026-02-21T09:23:32.8366831Z // end inline asm 2026-02-21T09:23:32.8366994Z // begin inline asm 2026-02-21T09:23:32.8367330Z @%p2 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd94 + 0 ], 0x0, %r1186; 2026-02-21T09:23:32.8367854Z // end inline asm 2026-02-21T09:23:32.8368030Z // begin inline asm 2026-02-21T09:23:32.8368374Z @%p2 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd94 + 0 ], 0x1, %r1186; 2026-02-21T09:23:32.8368767Z // end inline asm 2026-02-21T09:23:32.8368936Z // begin inline asm 2026-02-21T09:23:32.8369236Z @%p2 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd94 + 0 ], 0xa; 2026-02-21T09:23:32.8369579Z // end inline asm 2026-02-21T09:23:32.8369835Z // begin inline asm 2026-02-21T09:23:32.8370157Z @%p2 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd94 + 0 ], 0x0; 2026-02-21T09:23:32.8370524Z // end inline asm 2026-02-21T09:23:32.8370694Z // begin inline asm 
2026-02-21T09:23:32.8371012Z @%p2 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd94 + 0 ], 0x3; 2026-02-21T09:23:32.8371367Z // end inline asm 2026-02-21T09:23:32.8371534Z // begin inline asm 2026-02-21T09:23:32.8371823Z @%p2 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd94 + 0 ], 0x0; 2026-02-21T09:23:32.8372167Z // end inline asm 2026-02-21T09:23:32.8372324Z // begin inline asm 2026-02-21T09:23:32.8372776Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd127 + 0 ], [ %rd94 + 0 ], 0x80; 2026-02-21T09:23:32.8373262Z // end inline asm 2026-02-21T09:23:32.8373424Z // begin inline asm 2026-02-21T09:23:32.8373686Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd127 + 0 ], 0x80; 2026-02-21T09:23:32.8374005Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T09:23:32.8374255Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T09:23:32.8374475Z // end inline asm 2026-02-21T09:23:32.8374634Z bar.sync 0; 2026-02-21T09:23:32.8374800Z cvta.global.u64 %rd586, %rd127; 2026-02-21T09:23:32.8375164Z .loc 1 29 35 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:29:35 2026-02-21T09:23:32.8375532Z shl.b32 %r12326, %r1196, 3; 2026-02-21T09:23:32.8375883Z .loc 1 30 37 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:30:37 2026-02-21T09:23:32.8376244Z add.s32 %r1206, %r12326, 8; 2026-02-21T09:23:32.8376692Z .loc 1 30 49 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:30:49 2026-02-21T09:23:32.8377061Z min.s32 %r3, %r1206, 8192; 2026-02-21T09:23:32.8377386Z .loc 1 31 88 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:31:88 2026-02-21T09:23:32.8377758Z setp.ge.s32 %p37, %r12326, %r3; 2026-02-21T09:23:32.8377960Z @%p37 bra $L__BB0_11; 2026-02-21T09:23:32.8378165Z // %bb.1: // %.lr.ph 2026-02-21T09:23:32.8378550Z .loc 1 0 88 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:0:88 2026-02-21T09:23:32.8378966Z ld.param.b64 %rd93, [_helion_matmul_bf16_int4_param_0]; 
2026-02-21T09:23:32.8379232Z shr.u32 %r4, %r1, 5; 2026-02-21T09:23:32.8379402Z and.b32 %r5, %r1, 112; 2026-02-21T09:23:32.8379583Z bfe.u32 %r6, %r1, 4, 3; 2026-02-21T09:23:32.8379765Z or.b32 %r7, %r6, 8; 2026-02-21T09:23:32.8379935Z or.b32 %r8, %r6, 16; 2026-02-21T09:23:32.8380098Z or.b32 %r9, %r6, 24; 2026-02-21T09:23:32.8380265Z or.b32 %r10, %r6, 32; 2026-02-21T09:23:32.8380631Z or.b32 %r11, %r6, 40; 2026-02-21T09:23:32.8380795Z or.b32 %r12, %r6, 48; 2026-02-21T09:23:32.8380965Z or.b32 %r13, %r6, 56; 2026-02-21T09:23:32.8381126Z or.b32 %r14, %r6, 64; 2026-02-21T09:23:32.8381292Z or.b32 %r15, %r6, 72; 2026-02-21T09:23:32.8381452Z or.b32 %r16, %r6, 80; 2026-02-21T09:23:32.8381619Z or.b32 %r17, %r6, 88; 2026-02-21T09:23:32.8381793Z or.b32 %r18, %r6, 96; 2026-02-21T09:23:32.8381970Z or.b32 %r19, %r6, 104; 2026-02-21T09:23:32.8382145Z or.b32 %r20, %r6, 112; 2026-02-21T09:23:32.8382322Z or.b32 %r21, %r6, 120; 2026-02-21T09:23:32.8382501Z and.b32 %r22, %r1, 15; 2026-02-21T09:23:32.8382670Z shl.b32 %r23, %r22, 2; 2026-02-21T09:23:32.8382850Z and.b32 %r24, %r1, 127; 2026-02-21T09:23:32.8383030Z shl.b32 %r1207, %r24, 3; 2026-02-21T09:23:32.8383337Z shr.u32 %r1208, %r5, 1; 2026-02-21T09:23:32.8383530Z xor.b32 %r25, %r1207, %r1208; 2026-02-21T09:23:32.8383731Z add.s32 %r9495, %r1205, %r25; 2026-02-21T09:23:32.8383922Z add.s32 %r9497, %r9495, 1024; 2026-02-21T09:23:32.8384120Z add.s32 %r9499, %r9495, 2048; 2026-02-21T09:23:32.8384302Z add.s32 %r9501, %r9495, 3072; 2026-02-21T09:23:32.8384491Z add.s32 %r9503, %r9495, 4096; 2026-02-21T09:23:32.8384681Z add.s32 %r9505, %r9495, 5120; 2026-02-21T09:23:32.8384940Z add.s32 %r9507, %r9495, 6144; 2026-02-21T09:23:32.8385135Z add.s32 %r9509, %r9495, 7168; 2026-02-21T09:23:32.8385323Z add.s32 %r9511, %r9495, 8192; 2026-02-21T09:23:32.8385513Z add.s32 %r9513, %r9495, 9216; 2026-02-21T09:23:32.8385699Z add.s32 %r9515, %r9495, 10240; 2026-02-21T09:23:32.8385897Z add.s32 %r9517, %r9495, 11264; 2026-02-21T09:23:32.8386083Z add.s32 %r9519, 
%r9495, 12288; 2026-02-21T09:23:32.8386277Z add.s32 %r9521, %r9495, 13312; 2026-02-21T09:23:32.8386599Z add.s32 %r9523, %r9495, 14336; 2026-02-21T09:23:32.8386806Z add.s32 %r9525, %r9495, 15360; 2026-02-21T09:23:32.8387001Z add.s32 %r9532, %r9495, 16384; 2026-02-21T09:23:32.8387186Z add.s32 %r9534, %r9495, 17408; 2026-02-21T09:23:32.8387384Z add.s32 %r9536, %r9495, 18432; 2026-02-21T09:23:32.8387571Z add.s32 %r9538, %r9495, 19456; 2026-02-21T09:23:32.8387766Z add.s32 %r9540, %r9495, 20480; 2026-02-21T09:23:32.8387954Z add.s32 %r9542, %r9495, 21504; 2026-02-21T09:23:32.8388154Z add.s32 %r9544, %r9495, 22528; 2026-02-21T09:23:32.8388430Z add.s32 %r9546, %r9495, 23552; 2026-02-21T09:23:32.8388627Z add.s32 %r9548, %r9495, 24576; 2026-02-21T09:23:32.8388825Z add.s32 %r9550, %r9495, 25600; 2026-02-21T09:23:32.8389015Z add.s32 %r9552, %r9495, 26624; 2026-02-21T09:23:32.8389204Z add.s32 %r9554, %r9495, 27648; 2026-02-21T09:23:32.8389387Z add.s32 %r9556, %r9495, 28672; 2026-02-21T09:23:32.8389576Z add.s32 %r9558, %r9495, 29696; 2026-02-21T09:23:32.8389771Z add.s32 %r9560, %r9495, 30720; 2026-02-21T09:23:32.8389978Z add.s32 %r9562, %r9495, 31744; 2026-02-21T09:23:32.8390165Z shl.b32 %r1210, %r1, 6; 2026-02-21T09:23:32.8390356Z and.b32 %r1211, %r1210, 6144; 2026-02-21T09:23:32.8390541Z shl.b32 %r1212, %r1, 5; 2026-02-21T09:23:32.8390730Z and.b32 %r1213, %r1212, 896; 2026-02-21T09:23:32.8390922Z shl.b32 %r1214, %r1, 1; 2026-02-21T09:23:32.8391097Z and.b32 %r1215, %r1214, 62; 2026-02-21T09:23:32.8391291Z or.b32 %r1216, %r1211, %r1213; 2026-02-21T09:23:32.8391493Z or.b32 %r58, %r1216, %r1215; 2026-02-21T09:23:32.8391691Z xor.b32 %r59, %r58, 8; 2026-02-21T09:23:32.8391866Z xor.b32 %r60, %r58, 16; 2026-02-21T09:23:32.8392049Z xor.b32 %r61, %r58, 24; 2026-02-21T09:23:32.8392220Z xor.b32 %r62, %r58, 32; 2026-02-21T09:23:32.8392403Z xor.b32 %r63, %r58, 40; 2026-02-21T09:23:32.8392572Z xor.b32 %r64, %r58, 48; 2026-02-21T09:23:32.8392747Z xor.b32 %r65, %r58, 56; 
2026-02-21T09:23:32.8392932Z shl.b32 %r1217, %r24, 7; 2026-02-21T09:23:32.8393110Z shl.b32 %r1218, %r1, 4; 2026-02-21T09:23:32.8393292Z and.b32 %r1219, %r1218, 112; 2026-02-21T09:23:32.8393476Z or.b32 %r1220, %r1217, %r1219; 2026-02-21T09:23:32.8393672Z add.s32 %r1221, %r1205, 32768; 2026-02-21T09:23:32.8393864Z add.s32 %r66, %r1221, %r1220; 2026-02-21T09:23:32.8394241Z xor.b32 %r1222, %r1220, 16; 2026-02-21T09:23:32.8394431Z add.s32 %r67, %r1221, %r1222; 2026-02-21T09:23:32.8394616Z xor.b32 %r1223, %r1220, 32; 2026-02-21T09:23:32.8394805Z add.s32 %r68, %r1221, %r1223; 2026-02-21T09:23:32.8394990Z xor.b32 %r1224, %r1220, 48; 2026-02-21T09:23:32.8395178Z add.s32 %r69, %r1221, %r1224; 2026-02-21T09:23:32.8395361Z xor.b32 %r1225, %r1220, 64; 2026-02-21T09:23:32.8395565Z add.s32 %r70, %r1221, %r1225; 2026-02-21T09:23:32.8395751Z xor.b32 %r1226, %r1220, 80; 2026-02-21T09:23:32.8395940Z add.s32 %r71, %r1221, %r1226; 2026-02-21T09:23:32.8396127Z xor.b32 %r1227, %r1220, 96; 2026-02-21T09:23:32.8396317Z add.s32 %r72, %r1221, %r1227; 2026-02-21T09:23:32.8396624Z xor.b32 %r1228, %r1220, 112; 2026-02-21T09:23:32.8396911Z add.s32 %r73, %r1221, %r1228; 2026-02-21T09:23:32.8397109Z bfe.u32 %r1229, %r1221, 4, 14; 2026-02-21T09:23:32.8397297Z cvt.u64.u32 %rd132, %r1229; 2026-02-21T09:23:32.8397511Z or.b64 %rd702, %rd132, 4611686293372403712; 2026-02-21T09:23:32.8397741Z add.s32 %r1230, %r1205, 32800; 2026-02-21T09:23:32.8397935Z bfe.u32 %r1231, %r1230, 4, 14; 2026-02-21T09:23:32.8398124Z cvt.u64.u32 %rd133, %r1231; 2026-02-21T09:23:32.8398328Z or.b64 %rd703, %rd133, 4611686293372403712; 2026-02-21T09:23:32.8398651Z add.s32 %r1232, %r1205, 32832; 2026-02-21T09:23:32.8398839Z bfe.u32 %r1233, %r1232, 4, 14; 2026-02-21T09:23:32.8399033Z cvt.u64.u32 %rd134, %r1233; 2026-02-21T09:23:32.8399223Z or.b64 %rd704, %rd134, 4611686293372403712; 2026-02-21T09:23:32.8399439Z add.s32 %r1234, %r1205, 32864; 2026-02-21T09:23:32.8399637Z bfe.u32 %r1235, %r1234, 4, 14; 2026-02-21T09:23:32.8399826Z 
cvt.u64.u32 %rd135, %r1235; 2026-02-21T09:23:32.8400022Z or.b64 %rd705, %rd135, 4611686293372403712; 2026-02-21T09:23:32.8400244Z add.s32 %r1236, %r1205, 49152; 2026-02-21T09:23:32.8400436Z bfe.u32 %r1237, %r1236, 4, 14; 2026-02-21T09:23:32.8400622Z cvt.u64.u32 %rd136, %r1237; 2026-02-21T09:23:32.8400823Z or.b64 %rd706, %rd136, 4611686293372403712; 2026-02-21T09:23:32.8401042Z add.s32 %r1238, %r1205, 49184; 2026-02-21T09:23:32.8401239Z bfe.u32 %r1239, %r1238, 4, 14; 2026-02-21T09:23:32.8401427Z cvt.u64.u32 %rd137, %r1239; 2026-02-21T09:23:32.8401626Z or.b64 %rd707, %rd137, 4611686293372403712; 2026-02-21T09:23:32.8401857Z add.s32 %r1240, %r1205, 49216; 2026-02-21T09:23:32.8402053Z bfe.u32 %r1241, %r1240, 4, 14; 2026-02-21T09:23:32.8402240Z cvt.u64.u32 %rd138, %r1241; 2026-02-21T09:23:32.8402438Z or.b64 %rd708, %rd138, 4611686293372403712; 2026-02-21T09:23:32.8402665Z add.s32 %r1242, %r1205, 49248; 2026-02-21T09:23:32.8402859Z bfe.u32 %r1243, %r1242, 4, 14; 2026-02-21T09:23:32.8403044Z cvt.u64.u32 %rd139, %r1243; 2026-02-21T09:23:32.8403247Z or.b64 %rd709, %rd139, 4611686293372403712; 2026-02-21T09:23:32.8403476Z shl.b32 %r1244, %r22, 7; 2026-02-21T09:23:32.8403663Z and.b32 %r1245, %r1, 16; 2026-02-21T09:23:32.8403843Z or.b32 %r1246, %r1244, %r1219; 2026-02-21T09:23:32.8404041Z xor.b32 %r1247, %r1246, %r1245; 2026-02-21T09:23:32.8404244Z or.b32 %r1248, %r1247, %r1211; 2026-02-21T09:23:32.8404443Z add.s32 %r74, %r1205, %r1248; 2026-02-21T09:23:32.8404646Z add.s32 %r75, %r74, 16384; 2026-02-21T09:23:32.8404833Z add.s32 %r76, %r74, 8192; 2026-02-21T09:23:32.8405017Z add.s32 %r77, %r74, 24576; 2026-02-21T09:23:32.8405196Z xor.b32 %r1249, %r1248, 32; 2026-02-21T09:23:32.8405384Z add.s32 %r78, %r1205, %r1249; 2026-02-21T09:23:32.8405565Z add.s32 %r79, %r78, 16384; 2026-02-21T09:23:32.8405747Z add.s32 %r80, %r78, 8192; 2026-02-21T09:23:32.8405919Z add.s32 %r81, %r78, 24576; 2026-02-21T09:23:32.8406103Z xor.b32 %r1250, %r1248, 64; 2026-02-21T09:23:32.8406279Z add.s32 %r82, 
%r1205, %r1250; 2026-02-21T09:23:32.8406587Z add.s32 %r83, %r82, 16384; 2026-02-21T09:23:32.8406776Z add.s32 %r84, %r82, 8192; 2026-02-21T09:23:32.8406957Z add.s32 %r85, %r82, 24576; 2026-02-21T09:23:32.8407139Z xor.b32 %r1251, %r1248, 96; 2026-02-21T09:23:32.8407319Z add.s32 %r86, %r1205, %r1251; 2026-02-21T09:23:32.8407707Z add.s32 %r87, %r86, 16384; 2026-02-21T09:23:32.8407885Z add.s32 %r88, %r86, 8192; 2026-02-21T09:23:32.8408061Z add.s32 %r89, %r86, 24576; 2026-02-21T09:23:32.8408238Z or.b32 %r1252, %r1213, %r1215; 2026-02-21T09:23:32.8408428Z or.b32 %r90, %r1252, %r1211; 2026-02-21T09:23:32.8408609Z xor.b32 %r91, %r90, 8; 2026-02-21T09:23:32.8408785Z xor.b32 %r92, %r90, 16; 2026-02-21T09:23:32.8408964Z xor.b32 %r93, %r90, 24; 2026-02-21T09:23:32.8409139Z xor.b32 %r94, %r90, 32; 2026-02-21T09:23:32.8409315Z xor.b32 %r95, %r90, 40; 2026-02-21T09:23:32.8409492Z xor.b32 %r96, %r90, 48; 2026-02-21T09:23:32.8409669Z xor.b32 %r97, %r90, 56; 2026-02-21T09:23:32.8410014Z .loc 1 31 88 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:31:88 2026-02-21T09:23:32.8410490Z mad.wide.u32 %rd11, %r22, 8, %rd93; 2026-02-21T09:23:32.8410759Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T09:23:32.8411052Z // Child Loop BB0_3 Depth 2 2026-02-21T09:23:32.8411346Z // Child Loop BB0_5 Depth 2 2026-02-21T09:23:32.8411629Z // Child Loop BB0_7 Depth 2 2026-02-21T09:23:32.8411970Z // Child Loop BB0_9 Depth 2 2026-02-21T09:23:32.8412353Z .loc 1 35 31 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:35:31 2026-02-21T09:23:32.8412723Z shr.s32 %r1334, %r12326, 31; 2026-02-21T09:23:32.8412912Z shr.u32 %r1335, %r1334, 25; 2026-02-21T09:23:32.8413113Z add.s32 %r1336, %r12326, %r1335; 2026-02-21T09:23:32.8413472Z .loc 1 34 30 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:34:30 2026-02-21T09:23:32.8413846Z and.b32 %r1290, %r1336, -128; 2026-02-21T09:23:32.8414051Z sub.s32 %r1337, %r12326, %r1290; 2026-02-21T09:23:32.8414387Z .loc 1 36 27 // 
c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:36:27 2026-02-21T09:23:32.8414770Z shl.b32 %r3954, %r1337, 7; 2026-02-21T09:23:32.8415089Z .loc 1 37 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:37:32 2026-02-21T09:23:32.8415454Z or.b32 %r1338, %r3954, %r6; 2026-02-21T09:23:32.8415644Z or.b32 %r1339, %r3954, %r7; 2026-02-21T09:23:32.8415823Z or.b32 %r1340, %r3954, %r8; 2026-02-21T09:23:32.8416015Z or.b32 %r1341, %r3954, %r9; 2026-02-21T09:23:32.8416204Z or.b32 %r1342, %r3954, %r10; 2026-02-21T09:23:32.8416398Z or.b32 %r1343, %r3954, %r11; 2026-02-21T09:23:32.8416718Z or.b32 %r1344, %r3954, %r12; 2026-02-21T09:23:32.8416911Z or.b32 %r1345, %r3954, %r13; 2026-02-21T09:23:32.8417094Z or.b32 %r1346, %r3954, %r14; 2026-02-21T09:23:32.8417276Z or.b32 %r1347, %r3954, %r15; 2026-02-21T09:23:32.8417456Z or.b32 %r1348, %r3954, %r16; 2026-02-21T09:23:32.8417642Z or.b32 %r1349, %r3954, %r17; 2026-02-21T09:23:32.8417830Z or.b32 %r1350, %r3954, %r18; 2026-02-21T09:23:32.8418012Z or.b32 %r1351, %r3954, %r19; 2026-02-21T09:23:32.8418202Z or.b32 %r1352, %r3954, %r20; 2026-02-21T09:23:32.8418381Z or.b32 %r1353, %r3954, %r21; 2026-02-21T09:23:32.8418714Z .loc 1 52 53 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:53 2026-02-21T09:23:32.8419070Z shl.b32 %r1354, %r1338, 10; 2026-02-21T09:23:32.8419256Z shl.b32 %r1355, %r1339, 10; 2026-02-21T09:23:32.8419433Z shl.b32 %r1356, %r1340, 10; 2026-02-21T09:23:32.8419618Z shl.b32 %r1357, %r1341, 10; 2026-02-21T09:23:32.8419801Z shl.b32 %r1358, %r1342, 10; 2026-02-21T09:23:32.8419976Z shl.b32 %r1359, %r1343, 10; 2026-02-21T09:23:32.8420163Z shl.b32 %r1360, %r1344, 10; 2026-02-21T09:23:32.8420340Z shl.b32 %r1361, %r1345, 10; 2026-02-21T09:23:32.8420523Z shl.b32 %r1362, %r1346, 10; 2026-02-21T09:23:32.8420717Z shl.b32 %r1363, %r1347, 10; 2026-02-21T09:23:32.8420906Z shl.b32 %r1364, %r1348, 10; 2026-02-21T09:23:32.8421084Z shl.b32 %r1365, %r1349, 10; 2026-02-21T09:23:32.8421400Z shl.b32 %r1366, %r1350, 10; 
2026-02-21T09:23:32.8421653Z shl.b32 %r1367, %r1351, 10; 2026-02-21T09:23:32.8421840Z shl.b32 %r1368, %r1352, 10; 2026-02-21T09:23:32.8422024Z shl.b32 %r1369, %r1353, 10; 2026-02-21T09:23:32.8422343Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8422705Z add.s32 %r9488, %r1205, 73728; 2026-02-21T09:23:32.8422897Z // begin inline asm 2026-02-21T09:23:32.8423108Z @%p2 mbarrier.init.shared::cta.b64 [%r9488], 1; 2026-02-21T09:23:32.8423348Z // end inline asm 2026-02-21T09:23:32.8423513Z bar.sync 0; 2026-02-21T09:23:32.8423667Z add.s32 %r9489, %r1205, 73736; 2026-02-21T09:23:32.8423859Z // begin inline asm 2026-02-21T09:23:32.8424060Z @%p2 mbarrier.init.shared::cta.b64 [%r9489], 1; 2026-02-21T09:23:32.8424389Z // end inline asm 2026-02-21T09:23:32.8424703Z .loc 1 52 60 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:60 2026-02-21T09:23:32.8425063Z or.b32 %r1371, %r1354, %r23; 2026-02-21T09:23:32.8425255Z or.b32 %r1372, %r1355, %r23; 2026-02-21T09:23:32.8425440Z or.b32 %r1373, %r1356, %r23; 2026-02-21T09:23:32.8425628Z or.b32 %r1374, %r1357, %r23; 2026-02-21T09:23:32.8425811Z or.b32 %r1375, %r1358, %r23; 2026-02-21T09:23:32.8426071Z or.b32 %r1376, %r1359, %r23; 2026-02-21T09:23:32.8426265Z or.b32 %r1377, %r1360, %r23; 2026-02-21T09:23:32.8426442Z or.b32 %r1378, %r1361, %r23; 2026-02-21T09:23:32.8426791Z or.b32 %r1379, %r1362, %r23; 2026-02-21T09:23:32.8426972Z or.b32 %r1380, %r1363, %r23; 2026-02-21T09:23:32.8427176Z or.b32 %r1381, %r1364, %r23; 2026-02-21T09:23:32.8427360Z or.b32 %r1382, %r1365, %r23; 2026-02-21T09:23:32.8427547Z or.b32 %r1383, %r1366, %r23; 2026-02-21T09:23:32.8427729Z or.b32 %r1384, %r1367, %r23; 2026-02-21T09:23:32.8427921Z or.b32 %r1385, %r1368, %r23; 2026-02-21T09:23:32.8428110Z or.b32 %r1386, %r1369, %r23; 2026-02-21T09:23:32.8428519Z .loc 1 52 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:32 2026-02-21T09:23:32.8428904Z mad.wide.s32 %rd140, %r1371, 2, %rd93; 
2026-02-21T09:23:32.8429133Z mad.wide.s32 %rd141, %r1372, 2, %rd93; 2026-02-21T09:23:32.8429354Z mad.wide.s32 %rd142, %r1373, 2, %rd93; 2026-02-21T09:23:32.8429567Z mad.wide.s32 %rd143, %r1374, 2, %rd93; 2026-02-21T09:23:32.8429807Z mad.wide.s32 %rd144, %r1375, 2, %rd93; 2026-02-21T09:23:32.8430023Z mad.wide.s32 %rd145, %r1376, 2, %rd93; 2026-02-21T09:23:32.8430236Z mad.wide.s32 %rd146, %r1377, 2, %rd93; 2026-02-21T09:23:32.8430451Z mad.wide.s32 %rd147, %r1378, 2, %rd93; 2026-02-21T09:23:32.8430674Z mad.wide.s32 %rd148, %r1379, 2, %rd93; 2026-02-21T09:23:32.8430901Z mad.wide.s32 %rd149, %r1380, 2, %rd93; 2026-02-21T09:23:32.8431111Z mad.wide.s32 %rd150, %r1381, 2, %rd93; 2026-02-21T09:23:32.8431330Z mad.wide.s32 %rd151, %r1382, 2, %rd93; 2026-02-21T09:23:32.8431538Z mad.wide.s32 %rd152, %r1383, 2, %rd93; 2026-02-21T09:23:32.8431754Z mad.wide.s32 %rd153, %r1384, 2, %rd93; 2026-02-21T09:23:32.8431967Z mad.wide.s32 %rd154, %r1385, 2, %rd93; 2026-02-21T09:23:32.8432180Z mad.wide.s32 %rd155, %r1386, 2, %rd93; 2026-02-21T09:23:32.8432394Z mov.b32 %r1257, 8; 2026-02-21T09:23:32.8432707Z .loc 1 52 80 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:80 2026-02-21T09:23:32.8433067Z // begin inline asm 2026-02-21T09:23:32.8433314Z cp.async.ca.shared.global [ %r9495 + 0 ], [ %rd140 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8433611Z // end inline asm 2026-02-21T09:23:32.8433768Z // begin inline asm 2026-02-21T09:23:32.8434009Z cp.async.ca.shared.global [ %r9497 + 0 ], [ %rd141 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8434288Z // end inline asm 2026-02-21T09:23:32.8434449Z // begin inline asm 2026-02-21T09:23:32.8434690Z cp.async.ca.shared.global [ %r9499 + 0 ], [ %rd142 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8434968Z // end inline asm 2026-02-21T09:23:32.8435123Z // begin inline asm 2026-02-21T09:23:32.8435374Z cp.async.ca.shared.global [ %r9501 + 0 ], [ %rd143 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8435833Z // end inline asm 2026-02-21T09:23:32.8435986Z // begin 
inline asm 2026-02-21T09:23:32.8436226Z cp.async.ca.shared.global [ %r9503 + 0 ], [ %rd144 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8436640Z // end inline asm 2026-02-21T09:23:32.8436805Z // begin inline asm 2026-02-21T09:23:32.8437049Z cp.async.ca.shared.global [ %r9505 + 0 ], [ %rd145 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8437326Z // end inline asm 2026-02-21T09:23:32.8437493Z // begin inline asm 2026-02-21T09:23:32.8437733Z cp.async.ca.shared.global [ %r9507 + 0 ], [ %rd146 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8438022Z // end inline asm 2026-02-21T09:23:32.8438176Z // begin inline asm 2026-02-21T09:23:32.8438414Z cp.async.ca.shared.global [ %r9509 + 0 ], [ %rd147 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8438803Z // end inline asm 2026-02-21T09:23:32.8438967Z // begin inline asm 2026-02-21T09:23:32.8439211Z cp.async.ca.shared.global [ %r9511 + 0 ], [ %rd148 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8439507Z // end inline asm 2026-02-21T09:23:32.8439667Z // begin inline asm 2026-02-21T09:23:32.8439902Z cp.async.ca.shared.global [ %r9513 + 0 ], [ %rd149 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8440185Z // end inline asm 2026-02-21T09:23:32.8440414Z // begin inline asm 2026-02-21T09:23:32.8440659Z cp.async.ca.shared.global [ %r9515 + 0 ], [ %rd150 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8440937Z // end inline asm 2026-02-21T09:23:32.8441094Z // begin inline asm 2026-02-21T09:23:32.8441333Z cp.async.ca.shared.global [ %r9517 + 0 ], [ %rd151 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8441608Z // end inline asm 2026-02-21T09:23:32.8441761Z // begin inline asm 2026-02-21T09:23:32.8441993Z cp.async.ca.shared.global [ %r9519 + 0 ], [ %rd152 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8442278Z // end inline asm 2026-02-21T09:23:32.8442430Z // begin inline asm 2026-02-21T09:23:32.8442668Z cp.async.ca.shared.global [ %r9521 + 0 ], [ %rd153 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8442949Z // end inline asm 2026-02-21T09:23:32.8443109Z // begin inline asm 2026-02-21T09:23:32.8443342Z 
cp.async.ca.shared.global [ %r9523 + 0 ], [ %rd154 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8443622Z // end inline asm 2026-02-21T09:23:32.8443776Z // begin inline asm 2026-02-21T09:23:32.8444012Z cp.async.ca.shared.global [ %r9525 + 0 ], [ %rd155 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8444298Z // end inline asm 2026-02-21T09:23:32.8444472Z cp.async.commit_group; 2026-02-21T09:23:32.8444801Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8445154Z bar.sync 0; 2026-02-21T09:23:32.8445316Z // begin inline asm 2026-02-21T09:23:32.8445547Z @%p2 mbarrier.arrive.expect_tx.shared.b64 _, [%r9488], 4096; 2026-02-21T09:23:32.8445823Z // end inline asm 2026-02-21T09:23:32.8446126Z .loc 1 58 33 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:58:33 2026-02-21T09:23:32.8446609Z bar.sync 0; 2026-02-21T09:23:32.8446787Z elect.sync %r1387|%p45, -1; 2026-02-21T09:23:32.8446986Z and.pred %p41, %p1, %p45; 2026-02-21T09:23:32.8447186Z add.s32 %r1289, %r1205, 65536; 2026-02-21T09:23:32.8447372Z mov.b32 %r1291, 0; 2026-02-21T09:23:32.8447540Z // begin inline asm 2026-02-21T09:23:32.8447977Z @%p41 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1289], [%rd603, {%r1290, %r1291}], [%r9488]; 2026-02-21T09:23:32.8448447Z // end inline asm 2026-02-21T09:23:32.8448748Z .loc 1 52 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:32 2026-02-21T09:23:32.8449107Z cvt.s64.s32 %rd175, %r1354; 2026-02-21T09:23:32.8449305Z cvt.u64.u32 %rd12, %r23; 2026-02-21T09:23:32.8449503Z or.b64 %rd176, %rd175, %rd12; 2026-02-21T09:23:32.8449701Z shl.b64 %rd177, %rd176, 1; 2026-02-21T09:23:32.8449906Z add.s64 %rd178, %rd93, %rd177; 2026-02-21T09:23:32.8450109Z add.s64 %rd157, %rd178, 128; 2026-02-21T09:23:32.8450305Z cvt.s64.s32 %rd179, %r1355; 2026-02-21T09:23:32.8450664Z or.b64 %rd180, %rd179, %rd12; 2026-02-21T09:23:32.8450861Z shl.b64 %rd181, %rd180, 1; 2026-02-21T09:23:32.8451045Z add.s64 %rd182, %rd93, 
%rd181; 2026-02-21T09:23:32.8451247Z add.s64 %rd158, %rd182, 128; 2026-02-21T09:23:32.8451434Z cvt.s64.s32 %rd183, %r1356; 2026-02-21T09:23:32.8451624Z or.b64 %rd184, %rd183, %rd12; 2026-02-21T09:23:32.8451808Z shl.b64 %rd185, %rd184, 1; 2026-02-21T09:23:32.8451996Z add.s64 %rd186, %rd93, %rd185; 2026-02-21T09:23:32.8452184Z add.s64 %rd159, %rd186, 128; 2026-02-21T09:23:32.8452387Z cvt.s64.s32 %rd187, %r1357; 2026-02-21T09:23:32.8452577Z or.b64 %rd188, %rd187, %rd12; 2026-02-21T09:23:32.8452760Z shl.b64 %rd189, %rd188, 1; 2026-02-21T09:23:32.8452944Z add.s64 %rd190, %rd93, %rd189; 2026-02-21T09:23:32.8453213Z add.s64 %rd160, %rd190, 128; 2026-02-21T09:23:32.8453405Z cvt.s64.s32 %rd191, %r1358; 2026-02-21T09:23:32.8453589Z or.b64 %rd192, %rd191, %rd12; 2026-02-21T09:23:32.8453779Z shl.b64 %rd193, %rd192, 1; 2026-02-21T09:23:32.8453985Z add.s64 %rd194, %rd93, %rd193; 2026-02-21T09:23:32.8454182Z add.s64 %rd161, %rd194, 128; 2026-02-21T09:23:32.8454369Z cvt.s64.s32 %rd195, %r1359; 2026-02-21T09:23:32.8454550Z or.b64 %rd196, %rd195, %rd12; 2026-02-21T09:23:32.8454820Z shl.b64 %rd197, %rd196, 1; 2026-02-21T09:23:32.8455005Z add.s64 %rd198, %rd93, %rd197; 2026-02-21T09:23:32.8455198Z add.s64 %rd162, %rd198, 128; 2026-02-21T09:23:32.8455382Z cvt.s64.s32 %rd199, %r1360; 2026-02-21T09:23:32.8455570Z or.b64 %rd200, %rd199, %rd12; 2026-02-21T09:23:32.8455753Z shl.b64 %rd201, %rd200, 1; 2026-02-21T09:23:32.8455939Z add.s64 %rd202, %rd93, %rd201; 2026-02-21T09:23:32.8456127Z add.s64 %rd163, %rd202, 128; 2026-02-21T09:23:32.8456314Z cvt.s64.s32 %rd203, %r1361; 2026-02-21T09:23:32.8456625Z or.b64 %rd204, %rd203, %rd12; 2026-02-21T09:23:32.8456827Z shl.b64 %rd205, %rd204, 1; 2026-02-21T09:23:32.8457017Z add.s64 %rd206, %rd93, %rd205; 2026-02-21T09:23:32.8457205Z add.s64 %rd164, %rd206, 128; 2026-02-21T09:23:32.8457400Z cvt.s64.s32 %rd207, %r1362; 2026-02-21T09:23:32.8457585Z or.b64 %rd208, %rd207, %rd12; 2026-02-21T09:23:32.8457777Z shl.b64 %rd209, %rd208, 1; 
2026-02-21T09:23:32.8457960Z add.s64 %rd210, %rd93, %rd209; 2026-02-21T09:23:32.8458154Z add.s64 %rd165, %rd210, 128; 2026-02-21T09:23:32.8458339Z cvt.s64.s32 %rd211, %r1363; 2026-02-21T09:23:32.8458529Z or.b64 %rd212, %rd211, %rd12; 2026-02-21T09:23:32.8458720Z shl.b64 %rd213, %rd212, 1; 2026-02-21T09:23:32.8458901Z add.s64 %rd214, %rd93, %rd213; 2026-02-21T09:23:32.8459101Z add.s64 %rd166, %rd214, 128; 2026-02-21T09:23:32.8459287Z cvt.s64.s32 %rd215, %r1364; 2026-02-21T09:23:32.8459476Z or.b64 %rd216, %rd215, %rd12; 2026-02-21T09:23:32.8459659Z shl.b64 %rd217, %rd216, 1; 2026-02-21T09:23:32.8459850Z add.s64 %rd218, %rd93, %rd217; 2026-02-21T09:23:32.8460043Z add.s64 %rd167, %rd218, 128; 2026-02-21T09:23:32.8460235Z cvt.s64.s32 %rd219, %r1365; 2026-02-21T09:23:32.8460422Z or.b64 %rd220, %rd219, %rd12; 2026-02-21T09:23:32.8460612Z shl.b64 %rd221, %rd220, 1; 2026-02-21T09:23:32.8460798Z add.s64 %rd222, %rd93, %rd221; 2026-02-21T09:23:32.8460987Z add.s64 %rd168, %rd222, 128; 2026-02-21T09:23:32.8461176Z cvt.s64.s32 %rd223, %r1366; 2026-02-21T09:23:32.8461363Z or.b64 %rd224, %rd223, %rd12; 2026-02-21T09:23:32.8461560Z shl.b64 %rd225, %rd224, 1; 2026-02-21T09:23:32.8461740Z add.s64 %rd226, %rd93, %rd225; 2026-02-21T09:23:32.8461945Z add.s64 %rd169, %rd226, 128; 2026-02-21T09:23:32.8462131Z cvt.s64.s32 %rd227, %r1367; 2026-02-21T09:23:32.8462321Z or.b64 %rd228, %rd227, %rd12; 2026-02-21T09:23:32.8462510Z shl.b64 %rd229, %rd228, 1; 2026-02-21T09:23:32.8462689Z add.s64 %rd230, %rd93, %rd229; 2026-02-21T09:23:32.8462878Z add.s64 %rd170, %rd230, 128; 2026-02-21T09:23:32.8463056Z cvt.s64.s32 %rd231, %r1368; 2026-02-21T09:23:32.8463243Z or.b64 %rd232, %rd231, %rd12; 2026-02-21T09:23:32.8463426Z shl.b64 %rd233, %rd232, 1; 2026-02-21T09:23:32.8463613Z add.s64 %rd234, %rd93, %rd233; 2026-02-21T09:23:32.8463986Z add.s64 %rd171, %rd234, 128; 2026-02-21T09:23:32.8464175Z cvt.s64.s32 %rd235, %r1369; 2026-02-21T09:23:32.8464356Z or.b64 %rd236, %rd235, %rd12; 
2026-02-21T09:23:32.8464543Z shl.b64 %rd237, %rd236, 1; 2026-02-21T09:23:32.8464730Z add.s64 %rd238, %rd93, %rd237; 2026-02-21T09:23:32.8464917Z add.s64 %rd172, %rd238, 128; 2026-02-21T09:23:32.8465258Z .loc 1 52 80 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:80 2026-02-21T09:23:32.8465617Z // begin inline asm 2026-02-21T09:23:32.8465876Z cp.async.ca.shared.global [ %r9532 + 0 ], [ %rd157 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8466165Z // end inline asm 2026-02-21T09:23:32.8466328Z // begin inline asm 2026-02-21T09:23:32.8466696Z cp.async.ca.shared.global [ %r9534 + 0 ], [ %rd158 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8467100Z // end inline asm 2026-02-21T09:23:32.8467265Z // begin inline asm 2026-02-21T09:23:32.8467509Z cp.async.ca.shared.global [ %r9536 + 0 ], [ %rd159 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8467816Z // end inline asm 2026-02-21T09:23:32.8467973Z // begin inline asm 2026-02-21T09:23:32.8468302Z cp.async.ca.shared.global [ %r9538 + 0 ], [ %rd160 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8468611Z // end inline asm 2026-02-21T09:23:32.8468877Z // begin inline asm 2026-02-21T09:23:32.8469116Z cp.async.ca.shared.global [ %r9540 + 0 ], [ %rd161 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8469402Z // end inline asm 2026-02-21T09:23:32.8469563Z // begin inline asm 2026-02-21T09:23:32.8469797Z cp.async.ca.shared.global [ %r9542 + 0 ], [ %rd162 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8470086Z // end inline asm 2026-02-21T09:23:32.8470250Z // begin inline asm 2026-02-21T09:23:32.8470497Z cp.async.ca.shared.global [ %r9544 + 0 ], [ %rd163 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8470786Z // end inline asm 2026-02-21T09:23:32.8470953Z // begin inline asm 2026-02-21T09:23:32.8471192Z cp.async.ca.shared.global [ %r9546 + 0 ], [ %rd164 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8471489Z // end inline asm 2026-02-21T09:23:32.8471648Z // begin inline asm 2026-02-21T09:23:32.8471880Z cp.async.ca.shared.global [ %r9548 + 0 ], [ %rd165 + 0 ], 0x8, %r1257; 
2026-02-21T09:23:32.8472169Z // end inline asm 2026-02-21T09:23:32.8472323Z // begin inline asm 2026-02-21T09:23:32.8472562Z cp.async.ca.shared.global [ %r9550 + 0 ], [ %rd166 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8472844Z // end inline asm 2026-02-21T09:23:32.8473009Z // begin inline asm 2026-02-21T09:23:32.8473247Z cp.async.ca.shared.global [ %r9552 + 0 ], [ %rd167 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8473538Z // end inline asm 2026-02-21T09:23:32.8473696Z // begin inline asm 2026-02-21T09:23:32.8473930Z cp.async.ca.shared.global [ %r9554 + 0 ], [ %rd168 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8474215Z // end inline asm 2026-02-21T09:23:32.8474368Z // begin inline asm 2026-02-21T09:23:32.8474606Z cp.async.ca.shared.global [ %r9556 + 0 ], [ %rd169 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8474900Z // end inline asm 2026-02-21T09:23:32.8475069Z // begin inline asm 2026-02-21T09:23:32.8475314Z cp.async.ca.shared.global [ %r9558 + 0 ], [ %rd170 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8475602Z // end inline asm 2026-02-21T09:23:32.8475761Z // begin inline asm 2026-02-21T09:23:32.8475998Z cp.async.ca.shared.global [ %r9560 + 0 ], [ %rd171 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8476293Z // end inline asm 2026-02-21T09:23:32.8476568Z // begin inline asm 2026-02-21T09:23:32.8476817Z cp.async.ca.shared.global [ %r9562 + 0 ], [ %rd172 + 0 ], 0x8, %r1257; 2026-02-21T09:23:32.8477094Z // end inline asm 2026-02-21T09:23:32.8477264Z cp.async.commit_group; 2026-02-21T09:23:32.8477594Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8477968Z bar.sync 0; 2026-02-21T09:23:32.8478128Z // begin inline asm 2026-02-21T09:23:32.8478357Z @%p2 mbarrier.arrive.expect_tx.shared.b64 _, [%r9489], 4096; 2026-02-21T09:23:32.8478634Z // end inline asm 2026-02-21T09:23:32.8479078Z .loc 1 58 33 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:58:33 2026-02-21T09:23:32.8479433Z bar.sync 0; 2026-02-21T09:23:32.8479593Z elect.sync 
%r1388|%p46, -1; 2026-02-21T09:23:32.8479798Z and.pred %p43, %p1, %p46; 2026-02-21T09:23:32.8479994Z add.s32 %r1326, %r1205, 69632; 2026-02-21T09:23:32.8480189Z mov.b32 %r1328, 32; 2026-02-21T09:23:32.8480355Z // begin inline asm 2026-02-21T09:23:32.8480788Z @%p43 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1326], [%rd603, {%r1290, %r1328}], [%r9489]; 2026-02-21T09:23:32.8481265Z // end inline asm 2026-02-21T09:23:32.8481563Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8481930Z shl.b32 %r1389, %r12326, 7; 2026-02-21T09:23:32.8482213Z or.b32 %r1390, %r21, %r1389; 2026-02-21T09:23:32.8482407Z shl.b32 %r1391, %r1336, 7; 2026-02-21T09:23:32.8482596Z and.b32 %r1392, %r1391, -16384; 2026-02-21T09:23:32.8482810Z sub.s32 %r1393, %r1390, %r1392; 2026-02-21T09:23:32.8483025Z shl.b32 %r1394, %r1393, 10; 2026-02-21T09:23:32.8483219Z mul.wide.s32 %rd239, %r1394, 2; 2026-02-21T09:23:32.8483432Z or.b64 %rd13, %rd239, 256; 2026-02-21T09:23:32.8483695Z or.b32 %r1395, %r20, %r1389; 2026-02-21T09:23:32.8483892Z sub.s32 %r1396, %r1395, %r1392; 2026-02-21T09:23:32.8484082Z shl.b32 %r1397, %r1396, 10; 2026-02-21T09:23:32.8484277Z mul.wide.s32 %rd240, %r1397, 2; 2026-02-21T09:23:32.8484471Z or.b64 %rd14, %rd240, 256; 2026-02-21T09:23:32.8484663Z or.b32 %r1398, %r19, %r1389; 2026-02-21T09:23:32.8484855Z sub.s32 %r1399, %r1398, %r1392; 2026-02-21T09:23:32.8485044Z shl.b32 %r1400, %r1399, 10; 2026-02-21T09:23:32.8485233Z mul.wide.s32 %rd241, %r1400, 2; 2026-02-21T09:23:32.8485423Z or.b64 %rd15, %rd241, 256; 2026-02-21T09:23:32.8485610Z or.b32 %r1401, %r18, %r1389; 2026-02-21T09:23:32.8485795Z sub.s32 %r1402, %r1401, %r1392; 2026-02-21T09:23:32.8485988Z shl.b32 %r1403, %r1402, 10; 2026-02-21T09:23:32.8486182Z mul.wide.s32 %rd242, %r1403, 2; 2026-02-21T09:23:32.8486377Z or.b64 %rd16, %rd242, 256; 2026-02-21T09:23:32.8486684Z or.b32 %r1404, %r17, %r1389; 2026-02-21T09:23:32.8486875Z sub.s32 %r1405, %r1404, 
%r1392; 2026-02-21T09:23:32.8487070Z shl.b32 %r1406, %r1405, 10; 2026-02-21T09:23:32.8487256Z mul.wide.s32 %rd243, %r1406, 2; 2026-02-21T09:23:32.8487450Z or.b64 %rd17, %rd243, 256; 2026-02-21T09:23:32.8487630Z or.b32 %r1407, %r16, %r1389; 2026-02-21T09:23:32.8487813Z sub.s32 %r1408, %r1407, %r1392; 2026-02-21T09:23:32.8487999Z shl.b32 %r1409, %r1408, 10; 2026-02-21T09:23:32.8488185Z mul.wide.s32 %rd244, %r1409, 2; 2026-02-21T09:23:32.8488372Z or.b64 %rd18, %rd244, 256; 2026-02-21T09:23:32.8488551Z or.b32 %r1410, %r15, %r1389; 2026-02-21T09:23:32.8488750Z sub.s32 %r1411, %r1410, %r1392; 2026-02-21T09:23:32.8488941Z shl.b32 %r1412, %r1411, 10; 2026-02-21T09:23:32.8489145Z mul.wide.s32 %rd245, %r1412, 2; 2026-02-21T09:23:32.8489340Z or.b64 %rd19, %rd245, 256; 2026-02-21T09:23:32.8489530Z or.b32 %r1413, %r14, %r1389; 2026-02-21T09:23:32.8489716Z sub.s32 %r1414, %r1413, %r1392; 2026-02-21T09:23:32.8489911Z shl.b32 %r1415, %r1414, 10; 2026-02-21T09:23:32.8490093Z mul.wide.s32 %rd246, %r1415, 2; 2026-02-21T09:23:32.8490292Z or.b64 %rd20, %rd246, 256; 2026-02-21T09:23:32.8490476Z or.b32 %r1416, %r13, %r1389; 2026-02-21T09:23:32.8490667Z sub.s32 %r1417, %r1416, %r1392; 2026-02-21T09:23:32.8490861Z shl.b32 %r1418, %r1417, 10; 2026-02-21T09:23:32.8491051Z mul.wide.s32 %rd247, %r1418, 2; 2026-02-21T09:23:32.8491247Z or.b64 %rd21, %rd247, 256; 2026-02-21T09:23:32.8491434Z or.b32 %r1419, %r12, %r1389; 2026-02-21T09:23:32.8491621Z sub.s32 %r1420, %r1419, %r1392; 2026-02-21T09:23:32.8491809Z shl.b32 %r1421, %r1420, 10; 2026-02-21T09:23:32.8491999Z mul.wide.s32 %rd248, %r1421, 2; 2026-02-21T09:23:32.8492194Z or.b64 %rd22, %rd248, 256; 2026-02-21T09:23:32.8492379Z or.b32 %r1422, %r11, %r1389; 2026-02-21T09:23:32.8492566Z sub.s32 %r1423, %r1422, %r1392; 2026-02-21T09:23:32.8492933Z shl.b32 %r1424, %r1423, 10; 2026-02-21T09:23:32.8493135Z mul.wide.s32 %rd249, %r1424, 2; 2026-02-21T09:23:32.8493325Z or.b64 %rd23, %rd249, 256; 2026-02-21T09:23:32.8493512Z or.b32 %r1425, %r10, %r1389; 
2026-02-21T09:23:32.8493696Z sub.s32 %r1426, %r1425, %r1392; 2026-02-21T09:23:32.8493906Z shl.b32 %r1427, %r1426, 10; 2026-02-21T09:23:32.8494092Z mul.wide.s32 %rd250, %r1427, 2; 2026-02-21T09:23:32.8494293Z or.b64 %rd24, %rd250, 256; 2026-02-21T09:23:32.8494470Z or.b32 %r1428, %r9, %r1389; 2026-02-21T09:23:32.8494657Z sub.s32 %r1429, %r1428, %r1392; 2026-02-21T09:23:32.8494848Z shl.b32 %r1430, %r1429, 10; 2026-02-21T09:23:32.8495031Z mul.wide.s32 %rd251, %r1430, 2; 2026-02-21T09:23:32.8495230Z or.b64 %rd25, %rd251, 256; 2026-02-21T09:23:32.8495488Z or.b32 %r1431, %r8, %r1389; 2026-02-21T09:23:32.8495681Z sub.s32 %r1432, %r1431, %r1392; 2026-02-21T09:23:32.8495870Z shl.b32 %r1433, %r1432, 10; 2026-02-21T09:23:32.8496059Z mul.wide.s32 %rd252, %r1433, 2; 2026-02-21T09:23:32.8496270Z or.b64 %rd26, %rd252, 256; 2026-02-21T09:23:32.8496568Z or.b32 %r1434, %r7, %r1389; 2026-02-21T09:23:32.8496754Z sub.s32 %r1435, %r1434, %r1392; 2026-02-21T09:23:32.8496946Z shl.b32 %r1436, %r1435, 10; 2026-02-21T09:23:32.8497219Z mul.wide.s32 %rd253, %r1436, 2; 2026-02-21T09:23:32.8497414Z or.b64 %rd27, %rd253, 256; 2026-02-21T09:23:32.8497596Z or.b32 %r1437, %r6, %r1389; 2026-02-21T09:23:32.8497778Z sub.s32 %r1438, %r1437, %r1392; 2026-02-21T09:23:32.8497974Z shl.b32 %r1439, %r1438, 10; 2026-02-21T09:23:32.8498156Z mul.wide.s32 %rd254, %r1439, 2; 2026-02-21T09:23:32.8498351Z or.b64 %rd28, %rd254, 256; 2026-02-21T09:23:32.8498529Z mov.b32 %r12330, 0f00000000; 2026-02-21T09:23:32.8498721Z mov.b32 %r12329, 1; 2026-02-21T09:23:32.8498911Z mov.b32 %r12328, -1; 2026-02-21T09:23:32.8499077Z mov.b64 %rd737, 0; 2026-02-21T09:23:32.8499246Z mov.b64 %rd736, %rd11; 2026-02-21T09:23:32.8499423Z mov.b32 %r12327, %r1291; 2026-02-21T09:23:32.8499614Z mov.b32 %r12331, %r12330; 2026-02-21T09:23:32.8499794Z mov.b32 %r12332, %r12330; 2026-02-21T09:23:32.8499975Z mov.b32 %r12333, %r12330; 2026-02-21T09:23:32.8500149Z mov.b32 %r12334, %r12330; 2026-02-21T09:23:32.8500325Z mov.b32 %r12335, %r12330; 
2026-02-21T09:23:32.8500501Z mov.b32 %r12336, %r12330; 2026-02-21T09:23:32.8500679Z mov.b32 %r12337, %r12330; 2026-02-21T09:23:32.8500858Z mov.b32 %r12338, %r12330; 2026-02-21T09:23:32.8501032Z mov.b32 %r12339, %r12330; 2026-02-21T09:23:32.8501209Z mov.b32 %r12340, %r12330; 2026-02-21T09:23:32.8501399Z mov.b32 %r12341, %r12330; 2026-02-21T09:23:32.8501581Z mov.b32 %r12342, %r12330; 2026-02-21T09:23:32.8501753Z mov.b32 %r12343, %r12330; 2026-02-21T09:23:32.8501934Z mov.b32 %r12344, %r12330; 2026-02-21T09:23:32.8502109Z mov.b32 %r12345, %r12330; 2026-02-21T09:23:32.8502290Z mov.b32 %r12346, %r12330; 2026-02-21T09:23:32.8502464Z mov.b32 %r12347, %r12330; 2026-02-21T09:23:32.8502647Z mov.b32 %r12348, %r12330; 2026-02-21T09:23:32.8502831Z mov.b32 %r12349, %r12330; 2026-02-21T09:23:32.8503004Z mov.b32 %r12350, %r12330; 2026-02-21T09:23:32.8503184Z mov.b32 %r12351, %r12330; 2026-02-21T09:23:32.8503357Z mov.b32 %r12352, %r12330; 2026-02-21T09:23:32.8503541Z mov.b32 %r12353, %r12330; 2026-02-21T09:23:32.8503715Z mov.b32 %r12354, %r12330; 2026-02-21T09:23:32.8503893Z mov.b32 %r12355, %r12330; 2026-02-21T09:23:32.8504068Z mov.b32 %r12356, %r12330; 2026-02-21T09:23:32.8504252Z mov.b32 %r12357, %r12330; 2026-02-21T09:23:32.8504424Z mov.b32 %r12358, %r12330; 2026-02-21T09:23:32.8504604Z mov.b32 %r12359, %r12330; 2026-02-21T09:23:32.8504784Z mov.b32 %r12360, %r12330; 2026-02-21T09:23:32.8504958Z mov.b32 %r12361, %r12330; 2026-02-21T09:23:32.8505138Z mov.b32 %r12362, %r12330; 2026-02-21T09:23:32.8505315Z mov.b32 %r12363, %r12330; 2026-02-21T09:23:32.8505499Z mov.b32 %r12364, %r12330; 2026-02-21T09:23:32.8505673Z mov.b32 %r12365, %r12330; 2026-02-21T09:23:32.8505853Z mov.b32 %r12366, %r12330; 2026-02-21T09:23:32.8506197Z mov.b32 %r12367, %r12330; 2026-02-21T09:23:32.8506374Z mov.b32 %r12368, %r12330; 2026-02-21T09:23:32.8506661Z mov.b32 %r12369, %r12330; 2026-02-21T09:23:32.8506838Z mov.b32 %r12370, %r12330; 2026-02-21T09:23:32.8507014Z mov.b32 %r12371, %r12330; 
2026-02-21T09:23:32.8507188Z mov.b32 %r12372, %r12330; 2026-02-21T09:23:32.8507377Z mov.b32 %r12373, %r12330; 2026-02-21T09:23:32.8507554Z mov.b32 %r12374, %r12330; 2026-02-21T09:23:32.8507729Z mov.b32 %r12375, %r12330; 2026-02-21T09:23:32.8507901Z mov.b32 %r12376, %r12330; 2026-02-21T09:23:32.8508076Z mov.b32 %r12377, %r12330; 2026-02-21T09:23:32.8508308Z mov.b32 %r12378, %r12330; 2026-02-21T09:23:32.8508494Z mov.b32 %r12379, %r12330; 2026-02-21T09:23:32.8508668Z mov.b32 %r12380, %r12330; 2026-02-21T09:23:32.8508947Z mov.b32 %r12381, %r12330; 2026-02-21T09:23:32.8509130Z mov.b32 %r12382, %r12330; 2026-02-21T09:23:32.8509302Z mov.b32 %r12383, %r12330; 2026-02-21T09:23:32.8509483Z mov.b32 %r12384, %r12330; 2026-02-21T09:23:32.8509657Z mov.b32 %r12385, %r12330; 2026-02-21T09:23:32.8509838Z mov.b32 %r12386, %r12330; 2026-02-21T09:23:32.8510009Z mov.b32 %r12387, %r12330; 2026-02-21T09:23:32.8510186Z mov.b32 %r12388, %r12330; 2026-02-21T09:23:32.8510358Z mov.b32 %r12389, %r12330; 2026-02-21T09:23:32.8510619Z mov.b32 %r12390, %r12330; 2026-02-21T09:23:32.8510794Z mov.b32 %r12391, %r12330; 2026-02-21T09:23:32.8510973Z mov.b32 %r12392, %r12330; 2026-02-21T09:23:32.8511151Z mov.b32 %r12393, %r12330; 2026-02-21T09:23:32.8511326Z mov.b32 %r12394, %r12330; 2026-02-21T09:23:32.8511506Z mov.b32 %r12395, %r12330; 2026-02-21T09:23:32.8511677Z mov.b32 %r12396, %r12330; 2026-02-21T09:23:32.8511852Z mov.b32 %r12397, %r12330; 2026-02-21T09:23:32.8512041Z mov.b32 %r12398, %r12330; 2026-02-21T09:23:32.8512230Z mov.b32 %r12399, %r12330; 2026-02-21T09:23:32.8512405Z mov.b32 %r12400, %r12330; 2026-02-21T09:23:32.8512585Z mov.b32 %r12401, %r12330; 2026-02-21T09:23:32.8512758Z mov.b32 %r12402, %r12330; 2026-02-21T09:23:32.8512941Z mov.b32 %r12403, %r12330; 2026-02-21T09:23:32.8513121Z mov.b32 %r12404, %r12330; 2026-02-21T09:23:32.8513295Z mov.b32 %r12405, %r12330; 2026-02-21T09:23:32.8513475Z mov.b32 %r12406, %r12330; 2026-02-21T09:23:32.8513649Z mov.b32 %r12407, %r12330; 
2026-02-21T09:23:32.8513831Z mov.b32 %r12408, %r12330; 2026-02-21T09:23:32.8514005Z mov.b32 %r12409, %r12330; 2026-02-21T09:23:32.8514186Z mov.b32 %r12410, %r12330; 2026-02-21T09:23:32.8514360Z mov.b32 %r12411, %r12330; 2026-02-21T09:23:32.8514539Z mov.b32 %r12412, %r12330; 2026-02-21T09:23:32.8514724Z mov.b32 %r12413, %r12330; 2026-02-21T09:23:32.8514907Z mov.b32 %r12414, %r12330; 2026-02-21T09:23:32.8515087Z mov.b32 %r12415, %r12330; 2026-02-21T09:23:32.8515258Z mov.b32 %r12416, %r12330; 2026-02-21T09:23:32.8515437Z mov.b32 %r12417, %r12330; 2026-02-21T09:23:32.8515608Z mov.b32 %r12418, %r12330; 2026-02-21T09:23:32.8515787Z mov.b32 %r12419, %r12330; 2026-02-21T09:23:32.8515959Z mov.b32 %r12420, %r12330; 2026-02-21T09:23:32.8516142Z mov.b32 %r12421, %r12330; 2026-02-21T09:23:32.8516313Z mov.b32 %r12422, %r12330; 2026-02-21T09:23:32.8516612Z mov.b32 %r12423, %r12330; 2026-02-21T09:23:32.8516803Z mov.b32 %r12424, %r12330; 2026-02-21T09:23:32.8516982Z mov.b32 %r12425, %r12330; 2026-02-21T09:23:32.8517160Z mov.b32 %r12426, %r12330; 2026-02-21T09:23:32.8517333Z mov.b32 %r12427, %r12330; 2026-02-21T09:23:32.8517511Z mov.b32 %r12428, %r12330; 2026-02-21T09:23:32.8517680Z mov.b32 %r12429, %r12330; 2026-02-21T09:23:32.8517861Z mov.b32 %r12430, %r12330; 2026-02-21T09:23:32.8518035Z mov.b32 %r12431, %r12330; 2026-02-21T09:23:32.8518214Z mov.b32 %r12432, %r12330; 2026-02-21T09:23:32.8518386Z mov.b32 %r12433, %r12330; 2026-02-21T09:23:32.8518562Z mov.b32 %r12434, %r12330; 2026-02-21T09:23:32.8518743Z mov.b32 %r12435, %r12330; 2026-02-21T09:23:32.8518920Z mov.b32 %r12436, %r12330; 2026-02-21T09:23:32.8519100Z mov.b32 %r12437, %r12330; 2026-02-21T09:23:32.8519272Z mov.b32 %r12438, %r12330; 2026-02-21T09:23:32.8519639Z mov.b32 %r12439, %r12330; 2026-02-21T09:23:32.8519810Z mov.b32 %r12440, %r12330; 2026-02-21T09:23:32.8519989Z mov.b32 %r12441, %r12330; 2026-02-21T09:23:32.8520161Z mov.b32 %r12442, %r12330; 2026-02-21T09:23:32.8520343Z mov.b32 %r12443, %r12330; 
2026-02-21T09:23:32.8520511Z mov.b32 %r12444, %r12330; 2026-02-21T09:23:32.8520688Z mov.b32 %r12445, %r12330; 2026-02-21T09:23:32.8520865Z mov.b32 %r12446, %r12330; 2026-02-21T09:23:32.8521036Z mov.b32 %r12447, %r12330; 2026-02-21T09:23:32.8521231Z mov.b32 %r12448, %r12330; 2026-02-21T09:23:32.8521403Z mov.b32 %r12449, %r12330; 2026-02-21T09:23:32.8521586Z mov.b32 %r12450, %r12330; 2026-02-21T09:23:32.8521763Z mov.b32 %r12451, %r12330; 2026-02-21T09:23:32.8521940Z mov.b32 %r12452, %r12330; 2026-02-21T09:23:32.8522224Z mov.b32 %r12453, %r12330; 2026-02-21T09:23:32.8522410Z mov.b32 %r12454, %r12330; 2026-02-21T09:23:32.8522583Z mov.b32 %r12455, %r12330; 2026-02-21T09:23:32.8522762Z mov.b32 %r12456, %r12330; 2026-02-21T09:23:32.8522943Z mov.b32 %r12457, %r12330; 2026-02-21T09:23:32.8523177Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T09:23:32.8523497Z // => This Inner Loop Header: Depth=2 2026-02-21T09:23:32.8523844Z setp.lt.u64 %p67, %rd737, 448; 2026-02-21T09:23:32.8524068Z add.s32 %r3853, %r12328, 1; 2026-02-21T09:23:32.8524262Z setp.gt.s32 %p68, %r3853, 1; 2026-02-21T09:23:32.8524465Z selp.b32 %r12328, 0, %r3853, %p68; 2026-02-21T09:23:32.8524672Z selp.b32 %r3854, 1, 0, %p68; 2026-02-21T09:23:32.8524868Z xor.b32 %r12327, %r12327, %r3854; 2026-02-21T09:23:32.8525221Z .loc 1 52 80 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:80 2026-02-21T09:23:32.8525602Z cp.async.wait_group 1; 2026-02-21T09:23:32.8525804Z bar.sync 0; 2026-02-21T09:23:32.8525964Z shl.b32 %r3855, %r12328, 14; 2026-02-21T09:23:32.8526159Z add.s32 %r3857, %r1205, %r3855; 2026-02-21T09:23:32.8526619Z .loc 1 56 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:56:32 2026-02-21T09:23:32.8527012Z add.s32 %r3858, %r3857, %r58; 2026-02-21T09:23:32.8527204Z ld.shared.b16 %rs1, [%r3858]; 2026-02-21T09:23:32.8527408Z ld.shared.b16 %rs2, [%r3858+1024]; 2026-02-21T09:23:32.8527624Z ld.shared.b16 %rs3, [%r3858+64]; 2026-02-21T09:23:32.8527828Z ld.shared.b16 %rs4, [%r3858+1088]; 
2026-02-21T09:23:32.8528037Z ld.shared.b16 %rs5, [%r3858+8192]; 2026-02-21T09:23:32.8528238Z ld.shared.b16 %rs6, [%r3858+9216]; 2026-02-21T09:23:32.8528446Z ld.shared.b16 %rs7, [%r3858+8256]; 2026-02-21T09:23:32.8528646Z ld.shared.b16 %rs8, [%r3858+9280]; 2026-02-21T09:23:32.8528849Z add.s32 %r3859, %r3857, %r59; 2026-02-21T09:23:32.8529041Z ld.shared.b16 %rs9, [%r3859]; 2026-02-21T09:23:32.8529242Z ld.shared.b16 %rs10, [%r3859+1024]; 2026-02-21T09:23:32.8529457Z ld.shared.b16 %rs11, [%r3859+64]; 2026-02-21T09:23:32.8529661Z ld.shared.b16 %rs12, [%r3859+1088]; 2026-02-21T09:23:32.8529891Z ld.shared.b16 %rs13, [%r3859+8192]; 2026-02-21T09:23:32.8530102Z ld.shared.b16 %rs14, [%r3859+9216]; 2026-02-21T09:23:32.8530313Z ld.shared.b16 %rs15, [%r3859+8256]; 2026-02-21T09:23:32.8530513Z ld.shared.b16 %rs16, [%r3859+9280]; 2026-02-21T09:23:32.8530718Z add.s32 %r3860, %r3857, %r60; 2026-02-21T09:23:32.8530910Z ld.shared.b16 %rs17, [%r3860]; 2026-02-21T09:23:32.8531112Z ld.shared.b16 %rs18, [%r3860+1024]; 2026-02-21T09:23:32.8531323Z ld.shared.b16 %rs19, [%r3860+64]; 2026-02-21T09:23:32.8531525Z ld.shared.b16 %rs20, [%r3860+1088]; 2026-02-21T09:23:32.8531736Z ld.shared.b16 %rs21, [%r3860+8192]; 2026-02-21T09:23:32.8531953Z ld.shared.b16 %rs22, [%r3860+9216]; 2026-02-21T09:23:32.8532164Z ld.shared.b16 %rs23, [%r3860+8256]; 2026-02-21T09:23:32.8532372Z ld.shared.b16 %rs24, [%r3860+9280]; 2026-02-21T09:23:32.8532581Z add.s32 %r3861, %r3857, %r61; 2026-02-21T09:23:32.8532771Z ld.shared.b16 %rs25, [%r3861]; 2026-02-21T09:23:32.8532981Z ld.shared.b16 %rs26, [%r3861+1024]; 2026-02-21T09:23:32.8533350Z ld.shared.b16 %rs27, [%r3861+64]; 2026-02-21T09:23:32.8533563Z ld.shared.b16 %rs28, [%r3861+1088]; 2026-02-21T09:23:32.8533775Z ld.shared.b16 %rs29, [%r3861+8192]; 2026-02-21T09:23:32.8533980Z ld.shared.b16 %rs30, [%r3861+9216]; 2026-02-21T09:23:32.8534193Z ld.shared.b16 %rs31, [%r3861+8256]; 2026-02-21T09:23:32.8534396Z ld.shared.b16 %rs32, [%r3861+9280]; 2026-02-21T09:23:32.8534602Z 
add.s32 %r3862, %r3857, %r62; 2026-02-21T09:23:32.8534813Z ld.shared.b16 %rs33, [%r3862]; 2026-02-21T09:23:32.8535021Z ld.shared.b16 %rs34, [%r3862+1024]; 2026-02-21T09:23:32.8535228Z ld.shared.b16 %rs35, [%r3862+64]; 2026-02-21T09:23:32.8535435Z ld.shared.b16 %rs36, [%r3862+1088]; 2026-02-21T09:23:32.8535645Z ld.shared.b16 %rs37, [%r3862+8192]; 2026-02-21T09:23:32.8535931Z ld.shared.b16 %rs38, [%r3862+9216]; 2026-02-21T09:23:32.8536148Z ld.shared.b16 %rs39, [%r3862+8256]; 2026-02-21T09:23:32.8536364Z ld.shared.b16 %rs40, [%r3862+9280]; 2026-02-21T09:23:32.8536685Z add.s32 %r3863, %r3857, %r63; 2026-02-21T09:23:32.8536879Z ld.shared.b16 %rs41, [%r3863]; 2026-02-21T09:23:32.8537079Z ld.shared.b16 %rs42, [%r3863+1024]; 2026-02-21T09:23:32.8537285Z ld.shared.b16 %rs43, [%r3863+64]; 2026-02-21T09:23:32.8537491Z ld.shared.b16 %rs44, [%r3863+1088]; 2026-02-21T09:23:32.8537808Z ld.shared.b16 %rs45, [%r3863+8192]; 2026-02-21T09:23:32.8538019Z ld.shared.b16 %rs46, [%r3863+9216]; 2026-02-21T09:23:32.8538231Z ld.shared.b16 %rs47, [%r3863+8256]; 2026-02-21T09:23:32.8538433Z ld.shared.b16 %rs48, [%r3863+9280]; 2026-02-21T09:23:32.8538638Z add.s32 %r3864, %r3857, %r64; 2026-02-21T09:23:32.8538825Z ld.shared.b16 %rs49, [%r3864]; 2026-02-21T09:23:32.8539027Z ld.shared.b16 %rs50, [%r3864+1024]; 2026-02-21T09:23:32.8539233Z ld.shared.b16 %rs51, [%r3864+64]; 2026-02-21T09:23:32.8539441Z ld.shared.b16 %rs52, [%r3864+1088]; 2026-02-21T09:23:32.8539654Z ld.shared.b16 %rs53, [%r3864+8192]; 2026-02-21T09:23:32.8539859Z ld.shared.b16 %rs54, [%r3864+9216]; 2026-02-21T09:23:32.8540074Z ld.shared.b16 %rs55, [%r3864+8256]; 2026-02-21T09:23:32.8540278Z ld.shared.b16 %rs56, [%r3864+9280]; 2026-02-21T09:23:32.8550467Z add.s32 %r3865, %r3857, %r65; 2026-02-21T09:23:32.8550715Z ld.shared.b16 %rs57, [%r3865]; 2026-02-21T09:23:32.8550942Z ld.shared.b16 %rs58, [%r3865+1024]; 2026-02-21T09:23:32.8551163Z ld.shared.b16 %rs59, [%r3865+64]; 2026-02-21T09:23:32.8551368Z ld.shared.b16 %rs60, [%r3865+1088]; 
2026-02-21T09:23:32.8551571Z ld.shared.b16 %rs61, [%r3865+8192]; 2026-02-21T09:23:32.8551789Z ld.shared.b16 %rs62, [%r3865+9216]; 2026-02-21T09:23:32.8551988Z ld.shared.b16 %rs63, [%r3865+8256]; 2026-02-21T09:23:32.8552194Z ld.shared.b16 %rs64, [%r3865+9280]; 2026-02-21T09:23:32.8552396Z cvt.f32.bf16 %r1570, %rs1; 2026-02-21T09:23:32.8552590Z cvt.f32.bf16 %r1571, %rs2; 2026-02-21T09:23:32.8552784Z cvt.f32.bf16 %r1572, %rs9; 2026-02-21T09:23:32.8552976Z cvt.f32.bf16 %r1573, %rs10; 2026-02-21T09:23:32.8553156Z cvt.f32.bf16 %r1702, %rs17; 2026-02-21T09:23:32.8553347Z cvt.f32.bf16 %r1703, %rs18; 2026-02-21T09:23:32.8553536Z cvt.f32.bf16 %r1704, %rs25; 2026-02-21T09:23:32.8553722Z cvt.f32.bf16 %r1705, %rs26; 2026-02-21T09:23:32.8553909Z cvt.f32.bf16 %r1834, %rs33; 2026-02-21T09:23:32.8554091Z cvt.f32.bf16 %r1835, %rs34; 2026-02-21T09:23:32.8554276Z cvt.f32.bf16 %r1836, %rs41; 2026-02-21T09:23:32.8554452Z cvt.f32.bf16 %r1837, %rs42; 2026-02-21T09:23:32.8554627Z cvt.f32.bf16 %r1966, %rs49; 2026-02-21T09:23:32.8554814Z cvt.f32.bf16 %r1967, %rs50; 2026-02-21T09:23:32.8555004Z cvt.f32.bf16 %r1968, %rs57; 2026-02-21T09:23:32.8555181Z cvt.f32.bf16 %r1969, %rs58; 2026-02-21T09:23:32.8555367Z cvt.f32.bf16 %r2098, %rs3; 2026-02-21T09:23:32.8555553Z cvt.f32.bf16 %r2099, %rs4; 2026-02-21T09:23:32.8555729Z cvt.f32.bf16 %r2100, %rs11; 2026-02-21T09:23:32.8555910Z cvt.f32.bf16 %r2101, %rs12; 2026-02-21T09:23:32.8556084Z cvt.f32.bf16 %r2230, %rs19; 2026-02-21T09:23:32.8556261Z cvt.f32.bf16 %r2231, %rs20; 2026-02-21T09:23:32.8556435Z cvt.f32.bf16 %r2232, %rs27; 2026-02-21T09:23:32.8557052Z cvt.f32.bf16 %r2233, %rs28; 2026-02-21T09:23:32.8557245Z cvt.f32.bf16 %r2362, %rs35; 2026-02-21T09:23:32.8557429Z cvt.f32.bf16 %r2363, %rs36; 2026-02-21T09:23:32.8557604Z cvt.f32.bf16 %r2364, %rs43; 2026-02-21T09:23:32.8557785Z cvt.f32.bf16 %r2365, %rs44; 2026-02-21T09:23:32.8557972Z cvt.f32.bf16 %r2494, %rs51; 2026-02-21T09:23:32.8558153Z cvt.f32.bf16 %r2495, %rs52; 2026-02-21T09:23:32.8558339Z 
cvt.f32.bf16 %r2496, %rs59; 2026-02-21T09:23:32.8558516Z cvt.f32.bf16 %r2497, %rs60; 2026-02-21T09:23:32.8558707Z cvt.f32.bf16 %r2626, %rs5; 2026-02-21T09:23:32.8558888Z cvt.f32.bf16 %r2627, %rs6; 2026-02-21T09:23:32.8559072Z cvt.f32.bf16 %r2628, %rs13; 2026-02-21T09:23:32.8559267Z cvt.f32.bf16 %r2629, %rs14; 2026-02-21T09:23:32.8559454Z cvt.f32.bf16 %r2758, %rs21; 2026-02-21T09:23:32.8559727Z cvt.f32.bf16 %r2759, %rs22; 2026-02-21T09:23:32.8559920Z cvt.f32.bf16 %r2760, %rs29; 2026-02-21T09:23:32.8560103Z cvt.f32.bf16 %r2761, %rs30; 2026-02-21T09:23:32.8560286Z cvt.f32.bf16 %r2890, %rs37; 2026-02-21T09:23:32.8560471Z cvt.f32.bf16 %r2891, %rs38; 2026-02-21T09:23:32.8560653Z cvt.f32.bf16 %r2892, %rs45; 2026-02-21T09:23:32.8560838Z cvt.f32.bf16 %r2893, %rs46; 2026-02-21T09:23:32.8561020Z cvt.f32.bf16 %r3022, %rs53; 2026-02-21T09:23:32.8561285Z cvt.f32.bf16 %r3023, %rs54; 2026-02-21T09:23:32.8561467Z cvt.f32.bf16 %r3024, %rs61; 2026-02-21T09:23:32.8561658Z cvt.f32.bf16 %r3025, %rs62; 2026-02-21T09:23:32.8561837Z cvt.f32.bf16 %r3154, %rs7; 2026-02-21T09:23:32.8562025Z cvt.f32.bf16 %r3155, %rs8; 2026-02-21T09:23:32.8562213Z cvt.f32.bf16 %r3156, %rs15; 2026-02-21T09:23:32.8562392Z cvt.f32.bf16 %r3157, %rs16; 2026-02-21T09:23:32.8562576Z cvt.f32.bf16 %r3286, %rs23; 2026-02-21T09:23:32.8562755Z cvt.f32.bf16 %r3287, %rs24; 2026-02-21T09:23:32.8562956Z cvt.f32.bf16 %r3288, %rs31; 2026-02-21T09:23:32.8563138Z cvt.f32.bf16 %r3289, %rs32; 2026-02-21T09:23:32.8563318Z cvt.f32.bf16 %r3418, %rs39; 2026-02-21T09:23:32.8563495Z cvt.f32.bf16 %r3419, %rs40; 2026-02-21T09:23:32.8563687Z cvt.f32.bf16 %r3420, %rs47; 2026-02-21T09:23:32.8563869Z cvt.f32.bf16 %r3421, %rs48; 2026-02-21T09:23:32.8564060Z cvt.f32.bf16 %r3550, %rs55; 2026-02-21T09:23:32.8564247Z cvt.f32.bf16 %r3551, %rs56; 2026-02-21T09:23:32.8564425Z cvt.f32.bf16 %r3552, %rs63; 2026-02-21T09:23:32.8564609Z cvt.f32.bf16 %r3553, %rs64; 2026-02-21T09:23:32.8564950Z .loc 1 45 92 // 
c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8565326Z shl.b32 %r3866, %r12328, 3; 2026-02-21T09:23:32.8565514Z add.s32 %r1440, %r9488, %r3866; 2026-02-21T09:23:32.8565714Z // begin inline asm 2026-02-21T09:23:32.8565775Z 2026-02-21T09:23:32.8565834Z { 2026-02-21T09:23:32.8565903Z .reg .pred complete; 2026-02-21T09:23:32.8565963Z waitLoop: 2026-02-21T09:23:32.8566117Z mbarrier.try_wait.parity.shared.b64 complete, [%r1440], %r12327; 2026-02-21T09:23:32.8566198Z @!complete bra.uni waitLoop; 2026-02-21T09:23:32.8566252Z } 2026-02-21T09:23:32.8566263Z 2026-02-21T09:23:32.8566327Z // end inline asm 2026-02-21T09:23:32.8566681Z .loc 1 58 33 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:58:33 2026-02-21T09:23:32.8566756Z shl.b32 %r3868, %r12328, 12; 2026-02-21T09:23:32.8566826Z add.s32 %r3870, %r1289, %r3868; 2026-02-21T09:23:32.8567039Z .loc 1 76 58 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:76:58 2026-02-21T09:23:32.8567108Z add.s32 %r3871, %r3870, %r24; 2026-02-21T09:23:32.8567173Z xor.b32 %r249, %r24, 16; 2026-02-21T09:23:32.8567240Z add.s32 %r3872, %r3870, %r249; 2026-02-21T09:23:32.8567309Z xor.b32 %r250, %r24, 32; 2026-02-21T09:23:32.8567374Z add.s32 %r3873, %r3870, %r250; 2026-02-21T09:23:32.8567437Z xor.b32 %r251, %r24, 48; 2026-02-21T09:23:32.8567508Z add.s32 %r3874, %r3870, %r251; 2026-02-21T09:23:32.8567571Z xor.b32 %r252, %r24, 64; 2026-02-21T09:23:32.8567634Z add.s32 %r3875, %r3870, %r252; 2026-02-21T09:23:32.8567805Z xor.b32 %r253, %r24, 80; 2026-02-21T09:23:32.8567945Z add.s32 %r3876, %r3870, %r253; 2026-02-21T09:23:32.8568008Z xor.b32 %r254, %r24, 96; 2026-02-21T09:23:32.8568074Z add.s32 %r3877, %r3870, %r254; 2026-02-21T09:23:32.8568144Z xor.b32 %r255, %r24, 112; 2026-02-21T09:23:32.8568210Z add.s32 %r3878, %r3870, %r255; 2026-02-21T09:23:32.8568422Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8568495Z ld.shared.s8 %rs65, [%r3871]; 
2026-02-21T09:23:32.8568714Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8568795Z shl.b16 %rs66, %rs65, 4; 2026-02-21T09:23:32.8569078Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8569163Z ld.shared.s8 %rs67, [%r3872+128]; 2026-02-21T09:23:32.8569368Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8569446Z shl.b16 %rs68, %rs67, 4; 2026-02-21T09:23:32.8569663Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8569737Z ld.shared.s8 %rs69, [%r3873+256]; 2026-02-21T09:23:32.8570006Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8570078Z shl.b16 %rs70, %rs69, 4; 2026-02-21T09:23:32.8570280Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8570348Z ld.shared.s8 %rs71, [%r3874+384]; 2026-02-21T09:23:32.8570543Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8570612Z shl.b16 %rs72, %rs71, 4; 2026-02-21T09:23:32.8570808Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8570878Z ld.shared.s8 %rs73, [%r3875+512]; 2026-02-21T09:23:32.8571102Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8571167Z shl.b16 %rs74, %rs73, 4; 2026-02-21T09:23:32.8571371Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8571455Z ld.shared.s8 %rs75, [%r3876+640]; 2026-02-21T09:23:32.8571656Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8571721Z shl.b16 %rs76, %rs75, 4; 2026-02-21T09:23:32.8571931Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8572002Z ld.shared.s8 %rs77, [%r3877+768]; 
2026-02-21T09:23:32.8572201Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8572272Z shl.b16 %rs78, %rs77, 4; 2026-02-21T09:23:32.8572473Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8572546Z ld.shared.s8 %rs79, [%r3878+896]; 2026-02-21T09:23:32.8572744Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8572832Z shl.b16 %rs80, %rs79, 4; 2026-02-21T09:23:32.8573033Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8573105Z ld.shared.s8 %rs81, [%r3871+1024]; 2026-02-21T09:23:32.8573311Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8573377Z shl.b16 %rs82, %rs81, 4; 2026-02-21T09:23:32.8573576Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8573659Z ld.shared.s8 %rs83, [%r3872+1152]; 2026-02-21T09:23:32.8573862Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8574063Z shl.b16 %rs84, %rs83, 4; 2026-02-21T09:23:32.8574270Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8574345Z ld.shared.s8 %rs85, [%r3873+1280]; 2026-02-21T09:23:32.8574545Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8574607Z shl.b16 %rs86, %rs85, 4; 2026-02-21T09:23:32.8574812Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8574884Z ld.shared.s8 %rs87, [%r3874+1408]; 2026-02-21T09:23:32.8575083Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8575203Z shl.b16 %rs88, %rs87, 4; 2026-02-21T09:23:32.8575408Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8575478Z ld.shared.s8 %rs89, [%r3875+1536]; 
2026-02-21T09:23:32.8575686Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8575753Z shl.b16 %rs90, %rs89, 4; 2026-02-21T09:23:32.8575997Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8576073Z ld.shared.s8 %rs91, [%r3876+1664]; 2026-02-21T09:23:32.8576270Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8576333Z shl.b16 %rs92, %rs91, 4; 2026-02-21T09:23:32.8576660Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8576743Z ld.shared.s8 %rs93, [%r3877+1792]; 2026-02-21T09:23:32.8576952Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8577018Z shl.b16 %rs94, %rs93, 4; 2026-02-21T09:23:32.8577219Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8577297Z ld.shared.s8 %rs95, [%r3878+1920]; 2026-02-21T09:23:32.8577496Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8577566Z shl.b16 %rs96, %rs95, 4; 2026-02-21T09:23:32.8577766Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8577834Z ld.shared.s8 %rs97, [%r3871+2048]; 2026-02-21T09:23:32.8578042Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8578110Z shl.b16 %rs98, %rs97, 4; 2026-02-21T09:23:32.8578325Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8578397Z ld.shared.s8 %rs99, [%r3872+2176]; 2026-02-21T09:23:32.8578606Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8578677Z shl.b16 %rs100, %rs99, 4; 2026-02-21T09:23:32.8578877Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8578960Z ld.shared.s8 %rs101, [%r3873+2304]; 
2026-02-21T09:23:32.8579161Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8579228Z shl.b16 %rs102, %rs101, 4; 2026-02-21T09:23:32.8579433Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8579503Z ld.shared.s8 %rs103, [%r3874+2432]; 2026-02-21T09:23:32.8579711Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8579786Z shl.b16 %rs104, %rs103, 4; 2026-02-21T09:23:32.8579985Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8580232Z ld.shared.s8 %rs105, [%r3875+2560]; 2026-02-21T09:23:32.8580435Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8580506Z shl.b16 %rs106, %rs105, 4; 2026-02-21T09:23:32.8580705Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8580778Z ld.shared.s8 %rs107, [%r3876+2688]; 2026-02-21T09:23:32.8580980Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8581044Z shl.b16 %rs108, %rs107, 4; 2026-02-21T09:23:32.8581240Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8581318Z ld.shared.s8 %rs109, [%r3877+2816]; 2026-02-21T09:23:32.8581579Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8581647Z shl.b16 %rs110, %rs109, 4; 2026-02-21T09:23:32.8581854Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8581923Z ld.shared.s8 %rs111, [%r3878+2944]; 2026-02-21T09:23:32.8582179Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8582248Z shl.b16 %rs112, %rs111, 4; 2026-02-21T09:23:32.8582451Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8582522Z ld.shared.s8 %rs113, 
[%r3871+3072]; 2026-02-21T09:23:32.8582733Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8582805Z shl.b16 %rs114, %rs113, 4; 2026-02-21T09:23:32.8583003Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8583073Z ld.shared.s8 %rs115, [%r3872+3200]; 2026-02-21T09:23:32.8583282Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8583356Z shl.b16 %rs116, %rs115, 4; 2026-02-21T09:23:32.8583554Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8583631Z ld.shared.s8 %rs117, [%r3873+3328]; 2026-02-21T09:23:32.8583827Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8583892Z shl.b16 %rs118, %rs117, 4; 2026-02-21T09:23:32.8584087Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8584163Z ld.shared.s8 %rs119, [%r3874+3456]; 2026-02-21T09:23:32.8584363Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8584427Z shl.b16 %rs120, %rs119, 4; 2026-02-21T09:23:32.8584631Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8584704Z ld.shared.s8 %rs121, [%r3875+3584]; 2026-02-21T09:23:32.8584902Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8584975Z shl.b16 %rs122, %rs121, 4; 2026-02-21T09:23:32.8585176Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8585251Z ld.shared.s8 %rs123, [%r3876+3712]; 2026-02-21T09:23:32.8585460Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8585525Z shl.b16 %rs124, %rs123, 4; 2026-02-21T09:23:32.8585722Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8585802Z 
ld.shared.s8 %rs125, [%r3877+3840]; 2026-02-21T09:23:32.8586002Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8586146Z shl.b16 %rs126, %rs125, 4; 2026-02-21T09:23:32.8586402Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8586592Z ld.shared.s8 %rs127, [%r3878+3968]; 2026-02-21T09:23:32.8586802Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8586866Z shl.b16 %rs128, %rs127, 4; 2026-02-21T09:23:32.8587071Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8587147Z cvt.s16.s8 %rs129, %rs66; 2026-02-21T09:23:32.8587213Z shr.s16 %rs130, %rs129, 4; 2026-02-21T09:23:32.8587283Z cvt.s16.s8 %rs131, %rs68; 2026-02-21T09:23:32.8587349Z shr.s16 %rs132, %rs131, 4; 2026-02-21T09:23:32.8587496Z shr.s16 %rs133, %rs65, 4; 2026-02-21T09:23:32.8587566Z shr.s16 %rs134, %rs67, 4; 2026-02-21T09:23:32.8587765Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8587841Z cvt.rn.f32.s16 %r3879, %rs134; 2026-02-21T09:23:32.8587910Z cvt.rn.f32.s16 %r3880, %rs133; 2026-02-21T09:23:32.8587983Z cvt.rn.f32.s16 %r3881, %rs132; 2026-02-21T09:23:32.8588047Z cvt.rn.f32.s16 %r3882, %rs130; 2026-02-21T09:23:32.8588395Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8588473Z cvt.s16.s8 %rs135, %rs70; 2026-02-21T09:23:32.8588537Z shr.s16 %rs136, %rs135, 4; 2026-02-21T09:23:32.8588601Z cvt.s16.s8 %rs137, %rs72; 2026-02-21T09:23:32.8588670Z shr.s16 %rs138, %rs137, 4; 2026-02-21T09:23:32.8588734Z shr.s16 %rs139, %rs69, 4; 2026-02-21T09:23:32.8588794Z shr.s16 %rs140, %rs71, 4; 2026-02-21T09:23:32.8588996Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8589068Z cvt.rn.f32.s16 %r3883, %rs140; 2026-02-21T09:23:32.8589134Z cvt.rn.f32.s16 %r3884, %rs139; 
2026-02-21T09:23:32.8589200Z cvt.rn.f32.s16 %r3885, %rs138; 2026-02-21T09:23:32.8589275Z cvt.rn.f32.s16 %r3886, %rs136; 2026-02-21T09:23:32.8589481Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8589546Z cvt.s16.s8 %rs141, %rs74; 2026-02-21T09:23:32.8589621Z shr.s16 %rs142, %rs141, 4; 2026-02-21T09:23:32.8589710Z cvt.s16.s8 %rs143, %rs76; 2026-02-21T09:23:32.8589777Z shr.s16 %rs144, %rs143, 4; 2026-02-21T09:23:32.8589841Z shr.s16 %rs145, %rs73, 4; 2026-02-21T09:23:32.8589908Z shr.s16 %rs146, %rs75, 4; 2026-02-21T09:23:32.8590110Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8590177Z cvt.rn.f32.s16 %r3887, %rs146; 2026-02-21T09:23:32.8590250Z cvt.rn.f32.s16 %r3888, %rs145; 2026-02-21T09:23:32.8590315Z cvt.rn.f32.s16 %r3889, %rs144; 2026-02-21T09:23:32.8590381Z cvt.rn.f32.s16 %r3890, %rs142; 2026-02-21T09:23:32.8590584Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8590659Z cvt.s16.s8 %rs147, %rs78; 2026-02-21T09:23:32.8590725Z shr.s16 %rs148, %rs147, 4; 2026-02-21T09:23:32.8590787Z cvt.s16.s8 %rs149, %rs80; 2026-02-21T09:23:32.8590855Z shr.s16 %rs150, %rs149, 4; 2026-02-21T09:23:32.8590918Z shr.s16 %rs151, %rs77, 4; 2026-02-21T09:23:32.8590980Z shr.s16 %rs152, %rs79, 4; 2026-02-21T09:23:32.8591176Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8591247Z cvt.rn.f32.s16 %r3891, %rs152; 2026-02-21T09:23:32.8591323Z cvt.rn.f32.s16 %r3892, %rs151; 2026-02-21T09:23:32.8591389Z cvt.rn.f32.s16 %r3893, %rs150; 2026-02-21T09:23:32.8591462Z cvt.rn.f32.s16 %r3894, %rs148; 2026-02-21T09:23:32.8591661Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8591726Z cvt.s16.s8 %rs153, %rs82; 2026-02-21T09:23:32.8591794Z shr.s16 %rs154, %rs153, 4; 2026-02-21T09:23:32.8592004Z cvt.s16.s8 %rs155, %rs84; 2026-02-21T09:23:32.8592069Z 
shr.s16 %rs156, %rs155, 4; 2026-02-21T09:23:32.8592130Z shr.s16 %rs157, %rs81, 4; 2026-02-21T09:23:32.8592194Z shr.s16 %rs158, %rs83, 4; 2026-02-21T09:23:32.8592406Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8592473Z cvt.rn.f32.s16 %r3895, %rs158; 2026-02-21T09:23:32.8592544Z cvt.rn.f32.s16 %r3896, %rs157; 2026-02-21T09:23:32.8592609Z cvt.rn.f32.s16 %r3897, %rs156; 2026-02-21T09:23:32.8592672Z cvt.rn.f32.s16 %r3898, %rs154; 2026-02-21T09:23:32.8592878Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8592942Z cvt.s16.s8 %rs159, %rs86; 2026-02-21T09:23:32.8593059Z shr.s16 %rs160, %rs159, 4; 2026-02-21T09:23:32.8593125Z cvt.s16.s8 %rs161, %rs88; 2026-02-21T09:23:32.8593193Z shr.s16 %rs162, %rs161, 4; 2026-02-21T09:23:32.8593255Z shr.s16 %rs163, %rs85, 4; 2026-02-21T09:23:32.8593319Z shr.s16 %rs164, %rs87, 4; 2026-02-21T09:23:32.8593522Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8593591Z cvt.rn.f32.s16 %r3899, %rs164; 2026-02-21T09:23:32.8593703Z cvt.rn.f32.s16 %r3900, %rs163; 2026-02-21T09:23:32.8593783Z cvt.rn.f32.s16 %r3901, %rs162; 2026-02-21T09:23:32.8593859Z cvt.rn.f32.s16 %r3902, %rs160; 2026-02-21T09:23:32.8594058Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8594121Z cvt.s16.s8 %rs165, %rs90; 2026-02-21T09:23:32.8594191Z shr.s16 %rs166, %rs165, 4; 2026-02-21T09:23:32.8594257Z cvt.s16.s8 %rs167, %rs92; 2026-02-21T09:23:32.8594320Z shr.s16 %rs168, %rs167, 4; 2026-02-21T09:23:32.8594387Z shr.s16 %rs169, %rs89, 4; 2026-02-21T09:23:32.8594449Z shr.s16 %rs170, %rs91, 4; 2026-02-21T09:23:32.8594646Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8594715Z cvt.rn.f32.s16 %r3903, %rs170; 2026-02-21T09:23:32.8594788Z cvt.rn.f32.s16 %r3904, %rs169; 2026-02-21T09:23:32.8594852Z cvt.rn.f32.s16 %r3905, %rs168; 
2026-02-21T09:23:32.8594918Z cvt.rn.f32.s16 %r3906, %rs166; 2026-02-21T09:23:32.8595124Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8595186Z cvt.s16.s8 %rs171, %rs94; 2026-02-21T09:23:32.8595276Z shr.s16 %rs172, %rs171, 4; 2026-02-21T09:23:32.8595340Z cvt.s16.s8 %rs173, %rs96; 2026-02-21T09:23:32.8595404Z shr.s16 %rs174, %rs173, 4; 2026-02-21T09:23:32.8595467Z shr.s16 %rs175, %rs93, 4; 2026-02-21T09:23:32.8595535Z shr.s16 %rs176, %rs95, 4; 2026-02-21T09:23:32.8595735Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8595802Z cvt.rn.f32.s16 %r3907, %rs176; 2026-02-21T09:23:32.8595873Z cvt.rn.f32.s16 %r3908, %rs175; 2026-02-21T09:23:32.8595940Z cvt.rn.f32.s16 %r3909, %rs174; 2026-02-21T09:23:32.8596008Z cvt.rn.f32.s16 %r3910, %rs172; 2026-02-21T09:23:32.8596211Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8596276Z cvt.s16.s8 %rs177, %rs98; 2026-02-21T09:23:32.8596342Z shr.s16 %rs178, %rs177, 4; 2026-02-21T09:23:32.8596406Z cvt.s16.s8 %rs179, %rs100; 2026-02-21T09:23:32.8596606Z shr.s16 %rs180, %rs179, 4; 2026-02-21T09:23:32.8596677Z shr.s16 %rs181, %rs97, 4; 2026-02-21T09:23:32.8596739Z shr.s16 %rs182, %rs99, 4; 2026-02-21T09:23:32.8596946Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8597011Z cvt.rn.f32.s16 %r3911, %rs182; 2026-02-21T09:23:32.8597078Z cvt.rn.f32.s16 %r3912, %rs181; 2026-02-21T09:23:32.8597153Z cvt.rn.f32.s16 %r3913, %rs180; 2026-02-21T09:23:32.8597217Z cvt.rn.f32.s16 %r3914, %rs178; 2026-02-21T09:23:32.8597413Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8597666Z cvt.s16.s8 %rs183, %rs102; 2026-02-21T09:23:32.8597739Z shr.s16 %rs184, %rs183, 4; 2026-02-21T09:23:32.8597802Z cvt.s16.s8 %rs185, %rs104; 2026-02-21T09:23:32.8597868Z shr.s16 %rs186, %rs185, 4; 2026-02-21T09:23:32.8597936Z 
shr.s16 %rs187, %rs101, 4; 2026-02-21T09:23:32.8597999Z shr.s16 %rs188, %rs103, 4; 2026-02-21T09:23:32.8598202Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8598268Z cvt.rn.f32.s16 %r3915, %rs188; 2026-02-21T09:23:32.8598340Z cvt.rn.f32.s16 %r3916, %rs187; 2026-02-21T09:23:32.8598405Z cvt.rn.f32.s16 %r3917, %rs186; 2026-02-21T09:23:32.8598469Z cvt.rn.f32.s16 %r3918, %rs184; 2026-02-21T09:23:32.8598743Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8598809Z cvt.s16.s8 %rs189, %rs106; 2026-02-21T09:23:32.8598872Z shr.s16 %rs190, %rs189, 4; 2026-02-21T09:23:32.8598943Z cvt.s16.s8 %rs191, %rs108; 2026-02-21T09:23:32.8599005Z shr.s16 %rs192, %rs191, 4; 2026-02-21T09:23:32.8599068Z shr.s16 %rs193, %rs105, 4; 2026-02-21T09:23:32.8599132Z shr.s16 %rs194, %rs107, 4; 2026-02-21T09:23:32.8599399Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8599467Z cvt.rn.f32.s16 %r3919, %rs194; 2026-02-21T09:23:32.8599530Z cvt.rn.f32.s16 %r3920, %rs193; 2026-02-21T09:23:32.8599602Z cvt.rn.f32.s16 %r3921, %rs192; 2026-02-21T09:23:32.8599669Z cvt.rn.f32.s16 %r3922, %rs190; 2026-02-21T09:23:32.8599869Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8599938Z cvt.s16.s8 %rs195, %rs110; 2026-02-21T09:23:32.8600004Z shr.s16 %rs196, %rs195, 4; 2026-02-21T09:23:32.8600080Z cvt.s16.s8 %rs197, %rs112; 2026-02-21T09:23:32.8600144Z shr.s16 %rs198, %rs197, 4; 2026-02-21T09:23:32.8600217Z shr.s16 %rs199, %rs109, 4; 2026-02-21T09:23:32.8600281Z shr.s16 %rs200, %rs111, 4; 2026-02-21T09:23:32.8600480Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8600552Z cvt.rn.f32.s16 %r3923, %rs200; 2026-02-21T09:23:32.8600619Z cvt.rn.f32.s16 %r3924, %rs199; 2026-02-21T09:23:32.8600683Z cvt.rn.f32.s16 %r3925, %rs198; 2026-02-21T09:23:32.8600749Z cvt.rn.f32.s16 
%r3926, %rs196; 2026-02-21T09:23:32.8600951Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8601016Z cvt.s16.s8 %rs201, %rs114; 2026-02-21T09:23:32.8601079Z shr.s16 %rs202, %rs201, 4; 2026-02-21T09:23:32.8601148Z cvt.s16.s8 %rs203, %rs116; 2026-02-21T09:23:32.8601211Z shr.s16 %rs204, %rs203, 4; 2026-02-21T09:23:32.8601276Z shr.s16 %rs205, %rs113, 4; 2026-02-21T09:23:32.8601342Z shr.s16 %rs206, %rs115, 4; 2026-02-21T09:23:32.8601545Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8601613Z cvt.rn.f32.s16 %r3927, %rs206; 2026-02-21T09:23:32.8601679Z cvt.rn.f32.s16 %r3928, %rs205; 2026-02-21T09:23:32.8601751Z cvt.rn.f32.s16 %r3929, %rs204; 2026-02-21T09:23:32.8601818Z cvt.rn.f32.s16 %r3930, %rs202; 2026-02-21T09:23:32.8602018Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8602088Z cvt.s16.s8 %rs207, %rs118; 2026-02-21T09:23:32.8602153Z shr.s16 %rs208, %rs207, 4; 2026-02-21T09:23:32.8602217Z cvt.s16.s8 %rs209, %rs120; 2026-02-21T09:23:32.8602296Z shr.s16 %rs210, %rs209, 4; 2026-02-21T09:23:32.8602367Z shr.s16 %rs211, %rs117, 4; 2026-02-21T09:23:32.8602431Z shr.s16 %rs212, %rs119, 4; 2026-02-21T09:23:32.8602631Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8602704Z cvt.rn.f32.s16 %r3931, %rs212; 2026-02-21T09:23:32.8602769Z cvt.rn.f32.s16 %r3932, %rs211; 2026-02-21T09:23:32.8602949Z cvt.rn.f32.s16 %r3933, %rs210; 2026-02-21T09:23:32.8603019Z cvt.rn.f32.s16 %r3934, %rs208; 2026-02-21T09:23:32.8603217Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8603282Z cvt.s16.s8 %rs213, %rs122; 2026-02-21T09:23:32.8603346Z shr.s16 %rs214, %rs213, 4; 2026-02-21T09:23:32.8603416Z cvt.s16.s8 %rs215, %rs124; 2026-02-21T09:23:32.8603480Z shr.s16 %rs216, %rs215, 4; 2026-02-21T09:23:32.8603543Z shr.s16 %rs217, %rs121, 4; 
2026-02-21T09:23:32.8603612Z shr.s16 %rs218, %rs123, 4; 2026-02-21T09:23:32.8603811Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8603877Z cvt.rn.f32.s16 %r3935, %rs218; 2026-02-21T09:23:32.8603995Z cvt.rn.f32.s16 %r3936, %rs217; 2026-02-21T09:23:32.8604069Z cvt.rn.f32.s16 %r3937, %rs216; 2026-02-21T09:23:32.8604132Z cvt.rn.f32.s16 %r3938, %rs214; 2026-02-21T09:23:32.8604331Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8604403Z cvt.s16.s8 %rs219, %rs126; 2026-02-21T09:23:32.8604466Z shr.s16 %rs220, %rs219, 4; 2026-02-21T09:23:32.8604530Z cvt.s16.s8 %rs221, %rs128; 2026-02-21T09:23:32.8604645Z shr.s16 %rs222, %rs221, 4; 2026-02-21T09:23:32.8604711Z shr.s16 %rs223, %rs125, 4; 2026-02-21T09:23:32.8604776Z shr.s16 %rs224, %rs127, 4; 2026-02-21T09:23:32.8604972Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8605045Z cvt.rn.f32.s16 %r3939, %rs224; 2026-02-21T09:23:32.8605110Z cvt.rn.f32.s16 %r3940, %rs223; 2026-02-21T09:23:32.8605176Z cvt.rn.f32.s16 %r3941, %rs222; 2026-02-21T09:23:32.8605248Z cvt.rn.f32.s16 %r3942, %rs220; 2026-02-21T09:23:32.8605373Z st.shared.v4.b32 [%r66], {%r3882, %r3880, %r3881, %r3879}; 2026-02-21T09:23:32.8605504Z st.shared.v4.b32 [%r66+16384], {%r3914, %r3912, %r3913, %r3911}; 2026-02-21T09:23:32.8605629Z st.shared.v4.b32 [%r67], {%r3886, %r3884, %r3885, %r3883}; 2026-02-21T09:23:32.8605751Z st.shared.v4.b32 [%r67+16384], {%r3918, %r3916, %r3917, %r3915}; 2026-02-21T09:23:32.8605864Z st.shared.v4.b32 [%r68], {%r3890, %r3888, %r3889, %r3887}; 2026-02-21T09:23:32.8605984Z st.shared.v4.b32 [%r68+16384], {%r3922, %r3920, %r3921, %r3919}; 2026-02-21T09:23:32.8606097Z st.shared.v4.b32 [%r69], {%r3894, %r3892, %r3893, %r3891}; 2026-02-21T09:23:32.8606214Z st.shared.v4.b32 [%r69+16384], {%r3926, %r3924, %r3925, %r3923}; 2026-02-21T09:23:32.8606320Z st.shared.v4.b32 [%r70], {%r3898, %r3896, %r3897, 
%r3895}; 2026-02-21T09:23:32.8606444Z st.shared.v4.b32 [%r70+16384], {%r3930, %r3928, %r3929, %r3927}; 2026-02-21T09:23:32.8606681Z st.shared.v4.b32 [%r71], {%r3902, %r3900, %r3901, %r3899}; 2026-02-21T09:23:32.8606808Z st.shared.v4.b32 [%r71+16384], {%r3934, %r3932, %r3933, %r3931}; 2026-02-21T09:23:32.8606922Z st.shared.v4.b32 [%r72], {%r3906, %r3904, %r3905, %r3903}; 2026-02-21T09:23:32.8607046Z st.shared.v4.b32 [%r72+16384], {%r3938, %r3936, %r3937, %r3935}; 2026-02-21T09:23:32.8607154Z st.shared.v4.b32 [%r73], {%r3910, %r3908, %r3909, %r3907}; 2026-02-21T09:23:32.8607277Z st.shared.v4.b32 [%r73+16384], {%r3942, %r3940, %r3941, %r3939}; 2026-02-21T09:23:32.8607349Z $L__tmp1: 2026-02-21T09:23:32.8607633Z .loc 2 291 36 // standard.py:291:36 @[ c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:88:40 ] 2026-02-21T09:23:32.8607699Z // begin inline asm 2026-02-21T09:23:32.8607793Z fence.proxy.async.shared::cta; 2026-02-21T09:23:32.8607857Z // end inline asm 2026-02-21T09:23:32.8607918Z bar.sync 0; 2026-02-21T09:23:32.8608009Z shfl.sync.idx.b32 %r3943, %r4, 0, 31, -1; 2026-02-21T09:23:32.8608085Z wgmma.fence.sync.aligned; 2026-02-21T09:23:32.8608156Z mov.pred %p47, -1; 2026-02-21T09:23:32.8608219Z // begin inline asm 2026-02-21T09:23:32.8609725Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12330,%r12331,%r12332,%r12333,%r12334,%r12335,%r12336,%r12337,%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351,%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393}, {%r1570,%r1571,%r1572,%r1573}, %rd702, %p47, 1, 1; 2026-02-21T09:23:32.8609947Z // end inline asm 2026-02-21T09:23:32.8610016Z // begin inline asm 
2026-02-21T09:23:32.8611570Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12330,%r12331,%r12332,%r12333,%r12334,%r12335,%r12336,%r12337,%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351,%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393}, {%r1702,%r1703,%r1704,%r1705}, %rd703, %p47, 1, 1; 2026-02-21T09:23:32.8611635Z // end inline asm 2026-02-21T09:23:32.8611762Z // begin inline asm 2026-02-21T09:23:32.8613249Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12330,%r12331,%r12332,%r12333,%r12334,%r12335,%r12336,%r12337,%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351,%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393}, {%r1834,%r1835,%r1836,%r1837}, %rd704, %p47, 1, 1; 2026-02-21T09:23:32.8613317Z // end inline asm 2026-02-21T09:23:32.8613380Z // begin inline asm 2026-02-21T09:23:32.8614859Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12330,%r12331,%r12332,%r12333,%r12334,%r12335,%r12336,%r12337,%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351,%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393}, {%r1966,%r1967,%r1968,%r1969}, %rd705, %p47, 1, 1; 2026-02-21T09:23:32.8614926Z // end inline asm 2026-02-21T09:23:32.8614985Z // begin inline asm 2026-02-21T09:23:32.8616586Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12330,%r12331,%r12332,%r12333,%r12334,%r12335,%r12336,%r12337,%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351,%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393}, {%r2098,%r2099,%r2100,%r2101}, %rd706, %p47, 1, 1; 2026-02-21T09:23:32.8616659Z // end inline asm 2026-02-21T09:23:32.8616722Z // begin inline asm 2026-02-21T09:23:32.8618213Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12330,%r12331,%r12332,%r12333,%r12334,%r12335,%r12336,%r12337,%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351,%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393}, {%r2230,%r2231,%r2232,%r2233}, %rd707, %p47, 1, 1; 2026-02-21T09:23:32.8618417Z // end inline asm 2026-02-21T09:23:32.8618481Z // begin inline asm 2026-02-21T09:23:32.8620026Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12330,%r12331,%r12332,%r12333,%r12334,%r12335,%r12336,%r12337,%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351,%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393}, {%r2362,%r2363,%r2364,%r2365}, %rd708, %p47, 1, 1; 2026-02-21T09:23:32.8620087Z // end inline asm 2026-02-21T09:23:32.8620149Z // begin inline asm 2026-02-21T09:23:32.8621724Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12330,%r12331,%r12332,%r12333,%r12334,%r12335,%r12336,%r12337,%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351,%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393}, {%r2494,%r2495,%r2496,%r2497}, %rd709, %p47, 1, 1; 2026-02-21T09:23:32.8621788Z // end inline asm 2026-02-21T09:23:32.8621853Z // begin inline asm 2026-02-21T09:23:32.8623333Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401,%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415,%r12416,%r12417,%r12418,%r12419,%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457}, {%r2626,%r2627,%r2628,%r2629}, %rd702, %p47, 1, 1; 2026-02-21T09:23:32.8623397Z // end inline asm 2026-02-21T09:23:32.8623463Z // begin inline asm 2026-02-21T09:23:32.8624941Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401,%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415,%r12416,%r12417,%r12418,%r12419,%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457}, {%r2758,%r2759,%r2760,%r2761}, %rd703, %p47, 1, 1; 2026-02-21T09:23:32.8625010Z // end inline asm 2026-02-21T09:23:32.8625071Z // begin inline asm 2026-02-21T09:23:32.8626668Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401,%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415,%r12416,%r12417,%r12418,%r12419,%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457}, {%r2890,%r2891,%r2892,%r2893}, %rd704, %p47, 1, 1; 2026-02-21T09:23:32.8626741Z // end inline asm 2026-02-21T09:23:32.8626803Z // begin inline asm 2026-02-21T09:23:32.8628332Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401,%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415,%r12416,%r12417,%r12418,%r12419,%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457}, {%r3022,%r3023,%r3024,%r3025}, %rd705, %p47, 1, 1; 2026-02-21T09:23:32.8628561Z // end inline asm 2026-02-21T09:23:32.8628622Z // begin inline asm 2026-02-21T09:23:32.8630187Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401,%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415,%r12416,%r12417,%r12418,%r12419,%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457}, {%r3154,%r3155,%r3156,%r3157}, %rd706, %p47, 1, 1; 2026-02-21T09:23:32.8630310Z // end inline asm 2026-02-21T09:23:32.8630374Z // begin inline asm 2026-02-21T09:23:32.8631867Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401,%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415,%r12416,%r12417,%r12418,%r12419,%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457}, {%r3286,%r3287,%r3288,%r3289}, %rd707, %p47, 1, 1; 2026-02-21T09:23:32.8631930Z // end inline asm 2026-02-21T09:23:32.8631995Z // begin inline asm 2026-02-21T09:23:32.8633475Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401,%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415,%r12416,%r12417,%r12418,%r12419,%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457}, {%r3418,%r3419,%r3420,%r3421}, %rd708, %p47, 1, 1; 2026-02-21T09:23:32.8633536Z // end inline asm 2026-02-21T09:23:32.8633606Z // begin inline asm 2026-02-21T09:23:32.8635082Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401,%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415,%r12416,%r12417,%r12418,%r12419,%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457}, {%r3550,%r3551,%r3552,%r3553}, %rd709, %p47, 1, 1; 2026-02-21T09:23:32.8635150Z // end inline asm 2026-02-21T09:23:32.8635233Z wgmma.commit_group.sync.aligned; 2026-02-21T09:23:32.8635299Z mov.b32 %r3683, %r1291; 2026-02-21T09:23:32.8635363Z mov.b32 %r3684, %r1291; 2026-02-21T09:23:32.8635429Z mov.b32 %r3682, %r1221; 2026-02-21T09:23:32.8635495Z // begin inline asm 2026-02-21T09:23:32.8638226Z // wait for regs: %r12330,%r12331,%r12332,%r12333,%r12334,%r12335,%r12336,%r12337,%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351,%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393,%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401,%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415,%r12416,%r12417,%r12418,%r12419,%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457,%r3682,%r3683,%r3684 
2026-02-21T09:23:32.8638429Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:23:32.8638490Z // end inline asm 2026-02-21T09:23:32.8638553Z $L__tmp2: 2026-02-21T09:23:32.8638775Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8638842Z add.s32 %r3944, %r12329, 1; 2026-02-21T09:23:32.8638916Z setp.gt.s32 %p69, %r3944, 1; 2026-02-21T09:23:32.8639049Z selp.b32 %r12329, 0, %r3944, %p69; 2026-02-21T09:23:32.8639262Z .loc 1 52 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:32 2026-02-21T09:23:32.8639345Z add.s64 %rd271, %rd736, %rd28; 2026-02-21T09:23:32.8639422Z add.s64 %rd272, %rd736, %rd27; 2026-02-21T09:23:32.8639489Z add.s64 %rd273, %rd736, %rd26; 2026-02-21T09:23:32.8639554Z add.s64 %rd274, %rd736, %rd25; 2026-02-21T09:23:32.8639625Z add.s64 %rd275, %rd736, %rd24; 2026-02-21T09:23:32.8639693Z add.s64 %rd276, %rd736, %rd23; 2026-02-21T09:23:32.8639767Z add.s64 %rd277, %rd736, %rd22; 2026-02-21T09:23:32.8639833Z add.s64 %rd278, %rd736, %rd21; 2026-02-21T09:23:32.8639904Z add.s64 %rd279, %rd736, %rd20; 2026-02-21T09:23:32.8639971Z add.s64 %rd280, %rd736, %rd19; 2026-02-21T09:23:32.8640038Z add.s64 %rd281, %rd736, %rd18; 2026-02-21T09:23:32.8640108Z add.s64 %rd282, %rd736, %rd17; 2026-02-21T09:23:32.8640173Z add.s64 %rd283, %rd736, %rd16; 2026-02-21T09:23:32.8640240Z add.s64 %rd284, %rd736, %rd15; 2026-02-21T09:23:32.8640315Z add.s64 %rd285, %rd736, %rd14; 2026-02-21T09:23:32.8640524Z .loc 1 52 80 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:80 2026-02-21T09:23:32.8640591Z add.s64 %rd286, %rd736, %rd13; 2026-02-21T09:23:32.8640657Z shl.b32 %r3945, %r12329, 14; 2026-02-21T09:23:32.8640731Z add.s32 %r3946, %r1205, %r3945; 2026-02-21T09:23:32.8640798Z add.s32 %r3816, %r3946, %r25; 2026-02-21T09:23:32.8640866Z selp.b32 %r3817, 8, 0, %p67; 2026-02-21T09:23:32.8640932Z // begin inline asm 2026-02-21T09:23:32.8641087Z cp.async.ca.shared.global [ %r3816 + 0 ], [ %rd271 + 0 ], 0x8, %r3817; 
2026-02-21T09:23:32.8641147Z // end inline asm 2026-02-21T09:23:32.8641212Z add.s32 %r3818, %r3816, 1024; 2026-02-21T09:23:32.8641293Z // begin inline asm 2026-02-21T09:23:32.8641440Z cp.async.ca.shared.global [ %r3818 + 0 ], [ %rd272 + 0 ], 0x8, %r3817; 2026-02-21T09:23:32.8641499Z // end inline asm 2026-02-21T09:23:32.8641570Z add.s32 %r3820, %r3816, 2048; 2026-02-21T09:23:32.8641635Z // begin inline asm 2026-02-21T09:23:32.8641775Z cp.async.ca.shared.global [ %r3820 + 0 ], [ %rd273 + 0 ], 0x8, %r3817; 2026-02-21T09:23:32.8641839Z // end inline asm 2026-02-21T09:23:32.8641903Z add.s32 %r3822, %r3816, 3072; 2026-02-21T09:23:32.8641962Z // begin inline asm 2026-02-21T09:23:32.8642101Z cp.async.ca.shared.global [ %r3822 + 0 ], [ %rd274 + 0 ], 0x8, %r3817; 2026-02-21T09:23:32.8642168Z // end inline asm 2026-02-21T09:23:32.8642229Z add.s32 %r3824, %r3816, 4096; 2026-02-21T09:23:32.8642291Z // begin inline asm 2026-02-21T09:23:32.8642430Z cp.async.ca.shared.global [ %r3824 + 0 ], [ %rd275 + 0 ], 0x8, %r3817; 2026-02-21T09:23:32.8642488Z // end inline asm 2026-02-21T09:23:32.8642619Z add.s32 %r3826, %r3816, 5120; 2026-02-21T09:23:32.8642726Z // begin inline asm 2026-02-21T09:23:32.8642867Z cp.async.ca.shared.global [ %r3826 + 0 ], [ %rd276 + 0 ], 0x8, %r3817; 2026-02-21T09:23:32.8642924Z // end inline asm 2026-02-21T09:23:32.8642988Z add.s32 %r3828, %r3816, 6144; 2026-02-21T09:23:32.8643054Z // begin inline asm 2026-02-21T09:23:32.8643190Z cp.async.ca.shared.global [ %r3828 + 0 ], [ %rd277 + 0 ], 0x8, %r3817; 2026-02-21T09:23:32.8643249Z // end inline asm 2026-02-21T09:23:32.8643311Z add.s32 %r3830, %r3816, 7168; 2026-02-21T09:23:32.8643377Z // begin inline asm 2026-02-21T09:23:32.8643512Z cp.async.ca.shared.global [ %r3830 + 0 ], [ %rd278 + 0 ], 0x8, %r3817; 2026-02-21T09:23:32.8643570Z // end inline asm 2026-02-21T09:23:32.8643638Z add.s32 %r3832, %r3816, 8192; 2026-02-21T09:23:32.8643753Z // begin inline asm 2026-02-21T09:23:32.8643891Z cp.async.ca.shared.global [ 
%r3832 + 0 ], [ %rd279 + 0 ], 0x8, %r3817; 2026-02-21T09:23:32.8643966Z // end inline asm 2026-02-21T09:23:32.8644035Z add.s32 %r3834, %r3816, 9216; 2026-02-21T09:23:32.8644096Z // begin inline asm 2026-02-21T09:23:32.8644235Z cp.async.ca.shared.global [ %r3834 + 0 ], [ %rd280 + 0 ], 0x8, %r3817; 2026-02-21T09:23:32.8644297Z // end inline asm 2026-02-21T09:23:32.8644410Z add.s32 %r3836, %r3816, 10240; 2026-02-21T09:23:32.8644472Z // begin inline asm 2026-02-21T09:23:32.8644612Z cp.async.ca.shared.global [ %r3836 + 0 ], [ %rd281 + 0 ], 0x8, %r3817; 2026-02-21T09:23:32.8644671Z // end inline asm 2026-02-21T09:23:32.8644734Z add.s32 %r3838, %r3816, 11264; 2026-02-21T09:23:32.8644794Z // begin inline asm 2026-02-21T09:23:32.8644936Z cp.async.ca.shared.global [ %r3838 + 0 ], [ %rd282 + 0 ], 0x8, %r3817; 2026-02-21T09:23:32.8644992Z // end inline asm 2026-02-21T09:23:32.8645054Z add.s32 %r3840, %r3816, 12288; 2026-02-21T09:23:32.8645121Z // begin inline asm 2026-02-21T09:23:32.8645258Z cp.async.ca.shared.global [ %r3840 + 0 ], [ %rd283 + 0 ], 0x8, %r3817; 2026-02-21T09:23:32.8645317Z // end inline asm 2026-02-21T09:23:32.8645389Z add.s32 %r3842, %r3816, 13312; 2026-02-21T09:23:32.8645449Z // begin inline asm 2026-02-21T09:23:32.8645586Z cp.async.ca.shared.global [ %r3842 + 0 ], [ %rd284 + 0 ], 0x8, %r3817; 2026-02-21T09:23:32.8645644Z // end inline asm 2026-02-21T09:23:32.8645715Z add.s32 %r3844, %r3816, 14336; 2026-02-21T09:23:32.8645776Z // begin inline asm 2026-02-21T09:23:32.8645913Z cp.async.ca.shared.global [ %r3844 + 0 ], [ %rd285 + 0 ], 0x8, %r3817; 2026-02-21T09:23:32.8645976Z // end inline asm 2026-02-21T09:23:32.8646040Z add.s32 %r3846, %r3816, 15360; 2026-02-21T09:23:32.8646102Z // begin inline asm 2026-02-21T09:23:32.8646238Z cp.async.ca.shared.global [ %r3846 + 0 ], [ %rd286 + 0 ], 0x8, %r3817; 2026-02-21T09:23:32.8646305Z // end inline asm 2026-02-21T09:23:32.8646376Z cp.async.commit_group; 2026-02-21T09:23:32.8646711Z .loc 1 45 92 // 
c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8646786Z shl.b32 %r3947, %r12329, 3; 2026-02-21T09:23:32.8646857Z add.s32 %r3848, %r9488, %r3947; 2026-02-21T09:23:32.8646929Z and.pred %p63, %p2, %p67; 2026-02-21T09:23:32.8646996Z // begin inline asm 2026-02-21T09:23:32.8647135Z @%p63 mbarrier.arrive.expect_tx.shared.b64 _, [%r3848], 4096; 2026-02-21T09:23:32.8647196Z // end inline asm 2026-02-21T09:23:32.8647401Z .loc 1 58 33 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:58:33 2026-02-21T09:23:32.8647472Z shl.b32 %r3948, %r12329, 12; 2026-02-21T09:23:32.8647538Z add.s32 %r3849, %r1289, %r3948; 2026-02-21T09:23:32.8647596Z bar.sync 0; 2026-02-21T09:23:32.8647671Z elect.sync %r3949|%p70, -1; 2026-02-21T09:23:32.8647741Z and.pred %p71, %p67, %p70; 2026-02-21T09:23:32.8647809Z and.pred %p64, %p1, %p71; 2026-02-21T09:23:32.8647876Z cvt.u32.u64 %r3950, %rd737; 2026-02-21T09:23:32.8647954Z add.s32 %r3851, %r3950, 64; 2026-02-21T09:23:32.8648021Z // begin inline asm 2026-02-21T09:23:32.8648358Z @%p64 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r3849], [%rd603, {%r1290, %r3851}], [%r3848]; 2026-02-21T09:23:32.8648586Z // end inline asm 2026-02-21T09:23:32.8648802Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8648872Z add.s64 %rd736, %rd736, 128; 2026-02-21T09:23:32.8648952Z setp.lt.u64 %p72, %rd737, 480; 2026-02-21T09:23:32.8649022Z add.s64 %rd737, %rd737, 32; 2026-02-21T09:23:32.8649085Z @%p72 bra $L__BB0_3; 2026-02-21T09:23:32.8649205Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:23:32.8649416Z .loc 1 0 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:0:92 2026-02-21T09:23:32.8649484Z setp.lt.u32 %p82, %r1, 64; 2026-02-21T09:23:32.8649791Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8649872Z cp.async.wait_group 0; 2026-02-21T09:23:32.8649934Z bar.sync 0; 
2026-02-21T09:23:32.8650000Z // begin inline asm 2026-02-21T09:23:32.8650103Z @%p2 mbarrier.inval.shared::cta.b64 [%r9488]; 2026-02-21T09:23:32.8650177Z // end inline asm 2026-02-21T09:23:32.8650237Z bar.sync 0; 2026-02-21T09:23:32.8650300Z // begin inline asm 2026-02-21T09:23:32.8650470Z @%p2 mbarrier.inval.shared::cta.b64 [%r9489]; 2026-02-21T09:23:32.8650530Z // end inline asm 2026-02-21T09:23:32.8650733Z .loc 1 91 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:91:28 2026-02-21T09:23:32.8650829Z cvt.rn.bf16x2.f32 %r4037, %r12331, %r12330; 2026-02-21T09:23:32.8650915Z cvt.rn.bf16x2.f32 %r4038, %r12333, %r12332; 2026-02-21T09:23:32.8650998Z cvt.rn.bf16x2.f32 %r4039, %r12335, %r12334; 2026-02-21T09:23:32.8651083Z cvt.rn.bf16x2.f32 %r4040, %r12337, %r12336; 2026-02-21T09:23:32.8651169Z cvt.rn.bf16x2.f32 %r4041, %r12339, %r12338; 2026-02-21T09:23:32.8651259Z cvt.rn.bf16x2.f32 %r4042, %r12341, %r12340; 2026-02-21T09:23:32.8651339Z cvt.rn.bf16x2.f32 %r4043, %r12343, %r12342; 2026-02-21T09:23:32.8651429Z cvt.rn.bf16x2.f32 %r4044, %r12345, %r12344; 2026-02-21T09:23:32.8651507Z cvt.rn.bf16x2.f32 %r4045, %r12347, %r12346; 2026-02-21T09:23:32.8651587Z cvt.rn.bf16x2.f32 %r4046, %r12349, %r12348; 2026-02-21T09:23:32.8651672Z cvt.rn.bf16x2.f32 %r4047, %r12351, %r12350; 2026-02-21T09:23:32.8651753Z cvt.rn.bf16x2.f32 %r4048, %r12353, %r12352; 2026-02-21T09:23:32.8651834Z cvt.rn.bf16x2.f32 %r4049, %r12355, %r12354; 2026-02-21T09:23:32.8651917Z cvt.rn.bf16x2.f32 %r4050, %r12357, %r12356; 2026-02-21T09:23:32.8651996Z cvt.rn.bf16x2.f32 %r4051, %r12359, %r12358; 2026-02-21T09:23:32.8652074Z cvt.rn.bf16x2.f32 %r4052, %r12361, %r12360; 2026-02-21T09:23:32.8652150Z cvt.rn.bf16x2.f32 %r4053, %r12363, %r12362; 2026-02-21T09:23:32.8652234Z cvt.rn.bf16x2.f32 %r4054, %r12365, %r12364; 2026-02-21T09:23:32.8652312Z cvt.rn.bf16x2.f32 %r4055, %r12367, %r12366; 2026-02-21T09:23:32.8652390Z cvt.rn.bf16x2.f32 %r4056, %r12369, %r12368; 2026-02-21T09:23:32.8652475Z cvt.rn.bf16x2.f32 
%r4057, %r12371, %r12370; 2026-02-21T09:23:32.8652551Z cvt.rn.bf16x2.f32 %r4058, %r12373, %r12372; 2026-02-21T09:23:32.8652626Z cvt.rn.bf16x2.f32 %r4059, %r12375, %r12374; 2026-02-21T09:23:32.8652708Z cvt.rn.bf16x2.f32 %r4060, %r12377, %r12376; 2026-02-21T09:23:32.8652798Z cvt.rn.bf16x2.f32 %r4061, %r12379, %r12378; 2026-02-21T09:23:32.8652880Z cvt.rn.bf16x2.f32 %r4062, %r12381, %r12380; 2026-02-21T09:23:32.8652958Z cvt.rn.bf16x2.f32 %r4063, %r12383, %r12382; 2026-02-21T09:23:32.8653042Z cvt.rn.bf16x2.f32 %r4064, %r12385, %r12384; 2026-02-21T09:23:32.8653119Z cvt.rn.bf16x2.f32 %r4065, %r12387, %r12386; 2026-02-21T09:23:32.8653196Z cvt.rn.bf16x2.f32 %r4066, %r12389, %r12388; 2026-02-21T09:23:32.8653281Z cvt.rn.bf16x2.f32 %r4067, %r12391, %r12390; 2026-02-21T09:23:32.8653360Z cvt.rn.bf16x2.f32 %r4068, %r12393, %r12392; 2026-02-21T09:23:32.8653439Z cvt.rn.bf16x2.f32 %r4069, %r12395, %r12394; 2026-02-21T09:23:32.8653523Z cvt.rn.bf16x2.f32 %r4070, %r12397, %r12396; 2026-02-21T09:23:32.8653714Z cvt.rn.bf16x2.f32 %r4071, %r12399, %r12398; 2026-02-21T09:23:32.8653791Z cvt.rn.bf16x2.f32 %r4072, %r12401, %r12400; 2026-02-21T09:23:32.8653870Z cvt.rn.bf16x2.f32 %r4073, %r12403, %r12402; 2026-02-21T09:23:32.8653956Z cvt.rn.bf16x2.f32 %r4074, %r12405, %r12404; 2026-02-21T09:23:32.8654036Z cvt.rn.bf16x2.f32 %r4075, %r12407, %r12406; 2026-02-21T09:23:32.8654114Z cvt.rn.bf16x2.f32 %r4076, %r12409, %r12408; 2026-02-21T09:23:32.8654197Z cvt.rn.bf16x2.f32 %r4077, %r12411, %r12410; 2026-02-21T09:23:32.8654274Z cvt.rn.bf16x2.f32 %r4078, %r12413, %r12412; 2026-02-21T09:23:32.8654353Z cvt.rn.bf16x2.f32 %r4079, %r12415, %r12414; 2026-02-21T09:23:32.8654431Z cvt.rn.bf16x2.f32 %r4080, %r12417, %r12416; 2026-02-21T09:23:32.8654516Z cvt.rn.bf16x2.f32 %r4081, %r12419, %r12418; 2026-02-21T09:23:32.8654644Z cvt.rn.bf16x2.f32 %r4082, %r12421, %r12420; 2026-02-21T09:23:32.8654737Z cvt.rn.bf16x2.f32 %r4083, %r12423, %r12422; 2026-02-21T09:23:32.8654826Z cvt.rn.bf16x2.f32 %r4084, %r12425, %r12424; 
2026-02-21T09:23:32.8654907Z cvt.rn.bf16x2.f32 %r4085, %r12427, %r12426; 2026-02-21T09:23:32.8654986Z cvt.rn.bf16x2.f32 %r4086, %r12429, %r12428; 2026-02-21T09:23:32.8655070Z cvt.rn.bf16x2.f32 %r4087, %r12431, %r12430; 2026-02-21T09:23:32.8655192Z cvt.rn.bf16x2.f32 %r4088, %r12433, %r12432; 2026-02-21T09:23:32.8655272Z cvt.rn.bf16x2.f32 %r4089, %r12435, %r12434; 2026-02-21T09:23:32.8655354Z cvt.rn.bf16x2.f32 %r4090, %r12437, %r12436; 2026-02-21T09:23:32.8655439Z cvt.rn.bf16x2.f32 %r4091, %r12439, %r12438; 2026-02-21T09:23:32.8655520Z cvt.rn.bf16x2.f32 %r4092, %r12441, %r12440; 2026-02-21T09:23:32.8655598Z cvt.rn.bf16x2.f32 %r4093, %r12443, %r12442; 2026-02-21T09:23:32.8655680Z cvt.rn.bf16x2.f32 %r4094, %r12445, %r12444; 2026-02-21T09:23:32.8655760Z cvt.rn.bf16x2.f32 %r4095, %r12447, %r12446; 2026-02-21T09:23:32.8655838Z cvt.rn.bf16x2.f32 %r4096, %r12449, %r12448; 2026-02-21T09:23:32.8655935Z cvt.rn.bf16x2.f32 %r4097, %r12451, %r12450; 2026-02-21T09:23:32.8656020Z cvt.rn.bf16x2.f32 %r4098, %r12453, %r12452; 2026-02-21T09:23:32.8656099Z cvt.rn.bf16x2.f32 %r4099, %r12455, %r12454; 2026-02-21T09:23:32.8656179Z cvt.rn.bf16x2.f32 %r4100, %r12457, %r12456; 2026-02-21T09:23:32.8656393Z .loc 1 92 43 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:92:43 2026-02-21T09:23:32.8656713Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r74], {%r4037, %r4038, %r4039, %r4040}; 2026-02-21T09:23:32.8656901Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r75], {%r4053, %r4054, %r4055, %r4056}; 2026-02-21T09:23:32.8657091Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r76], {%r4069, %r4070, %r4071, %r4072}; 2026-02-21T09:23:32.8657274Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r77], {%r4085, %r4086, %r4087, %r4088}; 2026-02-21T09:23:32.8657455Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r78], {%r4041, %r4042, %r4043, %r4044}; 2026-02-21T09:23:32.8657644Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r79], {%r4057, %r4058, %r4059, %r4060}; 2026-02-21T09:23:32.8657828Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r80], {%r4073, %r4074, %r4075, %r4076}; 2026-02-21T09:23:32.8658006Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r81], {%r4089, %r4090, %r4091, %r4092}; 2026-02-21T09:23:32.8658197Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r82], {%r4045, %r4046, %r4047, %r4048}; 2026-02-21T09:23:32.8658376Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r83], {%r4061, %r4062, %r4063, %r4064}; 2026-02-21T09:23:32.8658567Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r84], {%r4077, %r4078, %r4079, %r4080}; 2026-02-21T09:23:32.8658755Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r85], {%r4093, %r4094, %r4095, %r4096}; 2026-02-21T09:23:32.8658935Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r86], {%r4049, %r4050, %r4051, %r4052}; 2026-02-21T09:23:32.8659114Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r87], {%r4065, %r4066, %r4067, %r4068}; 2026-02-21T09:23:32.8659302Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r88], {%r4081, %r4082, %r4083, %r4084}; 2026-02-21T09:23:32.8659621Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r89], {%r4097, %r4098, %r4099, %r4100}; 2026-02-21T09:23:32.8659692Z // begin inline asm 2026-02-21T09:23:32.8659776Z fence.proxy.async.shared::cta; 2026-02-21T09:23:32.8659846Z // end inline asm 2026-02-21T09:23:32.8659906Z bar.sync 0; 2026-02-21T09:23:32.8659980Z elect.sync %r4101|%p84, -1; 2026-02-21T09:23:32.8660068Z shfl.sync.idx.b32 %r4102, %r4, 0, 31, -1; 2026-02-21T09:23:32.8660139Z and.pred %p75, %p82, %p84; 2026-02-21T09:23:32.8660203Z and.b32 %r4103, %r4102, 1; 2026-02-21T09:23:32.8660267Z shl.b32 %r4104, %r4103, 14; 2026-02-21T09:23:32.8660348Z add.s32 %r9492, %r1205, %r4104; 2026-02-21T09:23:32.8660419Z shl.b32 %r386, %r4103, 6; 2026-02-21T09:23:32.8660485Z or.b32 %r3953, %r386, %r1290; 2026-02-21T09:23:32.8660626Z // begin inline asm 2026-02-21T09:23:32.8660870Z @%p75 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd586, {%r3953, %r3954}], [%r9492]; 2026-02-21T09:23:32.8660932Z // end inline asm 
2026-02-21T09:23:32.8661020Z cp.async.bulk.commit_group; 2026-02-21T09:23:32.8661103Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:23:32.8661160Z bar.sync 0; 2026-02-21T09:23:32.8661365Z .loc 1 31 88 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:31:88 2026-02-21T09:23:32.8661508Z or.b32 %r4105, %r12326, 1; 2026-02-21T09:23:32.8661718Z .loc 1 35 31 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:35:31 2026-02-21T09:23:32.8661786Z add.s32 %r4108, %r4105, %r1335; 2026-02-21T09:23:32.8661989Z .loc 1 34 30 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:34:30 2026-02-21T09:23:32.8662055Z and.b32 %r3992, %r4108, -128; 2026-02-21T09:23:32.8662121Z sub.s32 %r4109, %r4105, %r3992; 2026-02-21T09:23:32.8662324Z .loc 1 36 27 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:36:27 2026-02-21T09:23:32.8662386Z shl.b32 %r6724, %r4109, 7; 2026-02-21T09:23:32.8662583Z .loc 1 37 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:37:32 2026-02-21T09:23:32.8662652Z or.b32 %r4110, %r6724, %r6; 2026-02-21T09:23:32.8662719Z or.b32 %r4111, %r6724, %r7; 2026-02-21T09:23:32.8662781Z or.b32 %r4112, %r6724, %r8; 2026-02-21T09:23:32.8662842Z or.b32 %r4113, %r6724, %r9; 2026-02-21T09:23:32.8662911Z or.b32 %r4114, %r6724, %r10; 2026-02-21T09:23:32.8662974Z or.b32 %r4115, %r6724, %r11; 2026-02-21T09:23:32.8663036Z or.b32 %r4116, %r6724, %r12; 2026-02-21T09:23:32.8663108Z or.b32 %r4117, %r6724, %r13; 2026-02-21T09:23:32.8663182Z or.b32 %r4118, %r6724, %r14; 2026-02-21T09:23:32.8663243Z or.b32 %r4119, %r6724, %r15; 2026-02-21T09:23:32.8663304Z or.b32 %r4120, %r6724, %r16; 2026-02-21T09:23:32.8663370Z or.b32 %r4121, %r6724, %r17; 2026-02-21T09:23:32.8663432Z or.b32 %r4122, %r6724, %r18; 2026-02-21T09:23:32.8663495Z or.b32 %r4123, %r6724, %r19; 2026-02-21T09:23:32.8663563Z or.b32 %r4124, %r6724, %r20; 2026-02-21T09:23:32.8663626Z or.b32 %r4125, %r6724, %r21; 2026-02-21T09:23:32.8663827Z .loc 1 52 53 // 
c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:53 2026-02-21T09:23:32.8663890Z shl.b32 %r4126, %r4110, 10; 2026-02-21T09:23:32.8663957Z shl.b32 %r4127, %r4111, 10; 2026-02-21T09:23:32.8664020Z shl.b32 %r4128, %r4112, 10; 2026-02-21T09:23:32.8664086Z shl.b32 %r4129, %r4113, 10; 2026-02-21T09:23:32.8664152Z shl.b32 %r4130, %r4114, 10; 2026-02-21T09:23:32.8664214Z shl.b32 %r4131, %r4115, 10; 2026-02-21T09:23:32.8664278Z shl.b32 %r4132, %r4116, 10; 2026-02-21T09:23:32.8664346Z shl.b32 %r4133, %r4117, 10; 2026-02-21T09:23:32.8664407Z shl.b32 %r4134, %r4118, 10; 2026-02-21T09:23:32.8664468Z shl.b32 %r4135, %r4119, 10; 2026-02-21T09:23:32.8664530Z shl.b32 %r4136, %r4120, 10; 2026-02-21T09:23:32.8664599Z shl.b32 %r4137, %r4121, 10; 2026-02-21T09:23:32.8664660Z shl.b32 %r4138, %r4122, 10; 2026-02-21T09:23:32.8664721Z shl.b32 %r4139, %r4123, 10; 2026-02-21T09:23:32.8664787Z shl.b32 %r4140, %r4124, 10; 2026-02-21T09:23:32.8664976Z shl.b32 %r4141, %r4125, 10; 2026-02-21T09:23:32.8665175Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8665238Z // begin inline asm 2026-02-21T09:23:32.8665359Z @%p2 mbarrier.init.shared::cta.b64 [%r9488], 1; 2026-02-21T09:23:32.8665421Z // end inline asm 2026-02-21T09:23:32.8665480Z bar.sync 0; 2026-02-21T09:23:32.8665545Z // begin inline asm 2026-02-21T09:23:32.8665639Z @%p2 mbarrier.init.shared::cta.b64 [%r9489], 1; 2026-02-21T09:23:32.8665698Z // end inline asm 2026-02-21T09:23:32.8665906Z .loc 1 52 60 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:60 2026-02-21T09:23:32.8665970Z or.b32 %r4142, %r4126, %r23; 2026-02-21T09:23:32.8666084Z or.b32 %r4143, %r4127, %r23; 2026-02-21T09:23:32.8666146Z or.b32 %r4144, %r4128, %r23; 2026-02-21T09:23:32.8666214Z or.b32 %r4145, %r4129, %r23; 2026-02-21T09:23:32.8666277Z or.b32 %r4146, %r4130, %r23; 2026-02-21T09:23:32.8666343Z or.b32 %r4147, %r4131, %r23; 2026-02-21T09:23:32.8666410Z or.b32 %r4148, %r4132, %r23; 
2026-02-21T09:23:32.8666598Z or.b32 %r4149, %r4133, %r23; 2026-02-21T09:23:32.8666672Z or.b32 %r4150, %r4134, %r23; 2026-02-21T09:23:32.8666824Z or.b32 %r4151, %r4135, %r23; 2026-02-21T09:23:32.8666896Z or.b32 %r4152, %r4136, %r23; 2026-02-21T09:23:32.8666959Z or.b32 %r4153, %r4137, %r23; 2026-02-21T09:23:32.8667022Z or.b32 %r4154, %r4138, %r23; 2026-02-21T09:23:32.8667090Z or.b32 %r4155, %r4139, %r23; 2026-02-21T09:23:32.8667154Z or.b32 %r4156, %r4140, %r23; 2026-02-21T09:23:32.8667215Z or.b32 %r4157, %r4141, %r23; 2026-02-21T09:23:32.8667417Z .loc 1 52 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:32 2026-02-21T09:23:32.8667501Z mad.wide.s32 %rd289, %r4142, 2, %rd93; 2026-02-21T09:23:32.8667576Z mad.wide.s32 %rd290, %r4143, 2, %rd93; 2026-02-21T09:23:32.8667646Z mad.wide.s32 %rd291, %r4144, 2, %rd93; 2026-02-21T09:23:32.8667726Z mad.wide.s32 %rd292, %r4145, 2, %rd93; 2026-02-21T09:23:32.8667795Z mad.wide.s32 %rd293, %r4146, 2, %rd93; 2026-02-21T09:23:32.8667865Z mad.wide.s32 %rd294, %r4147, 2, %rd93; 2026-02-21T09:23:32.8667943Z mad.wide.s32 %rd295, %r4148, 2, %rd93; 2026-02-21T09:23:32.8668018Z mad.wide.s32 %rd296, %r4149, 2, %rd93; 2026-02-21T09:23:32.8668090Z mad.wide.s32 %rd297, %r4150, 2, %rd93; 2026-02-21T09:23:32.8668160Z mad.wide.s32 %rd298, %r4151, 2, %rd93; 2026-02-21T09:23:32.8668310Z mad.wide.s32 %rd299, %r4152, 2, %rd93; 2026-02-21T09:23:32.8668384Z mad.wide.s32 %rd300, %r4153, 2, %rd93; 2026-02-21T09:23:32.8668457Z mad.wide.s32 %rd301, %r4154, 2, %rd93; 2026-02-21T09:23:32.8668533Z mad.wide.s32 %rd302, %r4155, 2, %rd93; 2026-02-21T09:23:32.8668601Z mad.wide.s32 %rd303, %r4156, 2, %rd93; 2026-02-21T09:23:32.8668672Z mad.wide.s32 %rd304, %r4157, 2, %rd93; 2026-02-21T09:23:32.8668736Z mov.b32 %r3959, 8; 2026-02-21T09:23:32.8668936Z .loc 1 52 80 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:80 2026-02-21T09:23:32.8669002Z // begin inline asm 2026-02-21T09:23:32.8669151Z cp.async.ca.shared.global [ %r9495 + 0 ], [ 
%rd289 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8669217Z // end inline asm 2026-02-21T09:23:32.8669278Z // begin inline asm 2026-02-21T09:23:32.8669420Z cp.async.ca.shared.global [ %r9497 + 0 ], [ %rd290 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8669485Z // end inline asm 2026-02-21T09:23:32.8669546Z // begin inline asm 2026-02-21T09:23:32.8669692Z cp.async.ca.shared.global [ %r9499 + 0 ], [ %rd291 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8669752Z // end inline asm 2026-02-21T09:23:32.8669820Z // begin inline asm 2026-02-21T09:23:32.8669956Z cp.async.ca.shared.global [ %r9501 + 0 ], [ %rd292 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8670016Z // end inline asm 2026-02-21T09:23:32.8670082Z // begin inline asm 2026-02-21T09:23:32.8670218Z cp.async.ca.shared.global [ %r9503 + 0 ], [ %rd293 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8670458Z // end inline asm 2026-02-21T09:23:32.8670522Z // begin inline asm 2026-02-21T09:23:32.8670658Z cp.async.ca.shared.global [ %r9505 + 0 ], [ %rd294 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8670716Z // end inline asm 2026-02-21T09:23:32.8670776Z // begin inline asm 2026-02-21T09:23:32.8670919Z cp.async.ca.shared.global [ %r9507 + 0 ], [ %rd295 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8670977Z // end inline asm 2026-02-21T09:23:32.8671037Z // begin inline asm 2026-02-21T09:23:32.8671178Z cp.async.ca.shared.global [ %r9509 + 0 ], [ %rd296 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8671234Z // end inline asm 2026-02-21T09:23:32.8671294Z // begin inline asm 2026-02-21T09:23:32.8671430Z cp.async.ca.shared.global [ %r9511 + 0 ], [ %rd297 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8671493Z // end inline asm 2026-02-21T09:23:32.8671626Z // begin inline asm 2026-02-21T09:23:32.8671764Z cp.async.ca.shared.global [ %r9513 + 0 ], [ %rd298 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8671827Z // end inline asm 2026-02-21T09:23:32.8671892Z // begin inline asm 2026-02-21T09:23:32.8672028Z cp.async.ca.shared.global [ %r9515 + 0 ], [ %rd299 + 0 ], 0x8, %r3959; 
2026-02-21T09:23:32.8672091Z // end inline asm 2026-02-21T09:23:32.8672151Z // begin inline asm 2026-02-21T09:23:32.8672333Z cp.async.ca.shared.global [ %r9517 + 0 ], [ %rd300 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8672394Z // end inline asm 2026-02-21T09:23:32.8672471Z // begin inline asm 2026-02-21T09:23:32.8672609Z cp.async.ca.shared.global [ %r9519 + 0 ], [ %rd301 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8672665Z // end inline asm 2026-02-21T09:23:32.8672729Z // begin inline asm 2026-02-21T09:23:32.8672867Z cp.async.ca.shared.global [ %r9521 + 0 ], [ %rd302 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8672925Z // end inline asm 2026-02-21T09:23:32.8672987Z // begin inline asm 2026-02-21T09:23:32.8673128Z cp.async.ca.shared.global [ %r9523 + 0 ], [ %rd303 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8673185Z // end inline asm 2026-02-21T09:23:32.8673250Z // begin inline asm 2026-02-21T09:23:32.8673391Z cp.async.ca.shared.global [ %r9525 + 0 ], [ %rd304 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8673449Z // end inline asm 2026-02-21T09:23:32.8673518Z cp.async.commit_group; 2026-02-21T09:23:32.8673728Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8673786Z bar.sync 0; 2026-02-21T09:23:32.8673847Z // begin inline asm 2026-02-21T09:23:32.8673981Z @%p2 mbarrier.arrive.expect_tx.shared.b64 _, [%r9488], 4096; 2026-02-21T09:23:32.8674049Z // end inline asm 2026-02-21T09:23:32.8674259Z .loc 1 58 33 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:58:33 2026-02-21T09:23:32.8674318Z bar.sync 0; 2026-02-21T09:23:32.8674398Z elect.sync %r4158|%p85, -1; 2026-02-21T09:23:32.8674470Z and.pred %p79, %p1, %p85; 2026-02-21T09:23:32.8674531Z mov.b32 %r3993, 0; 2026-02-21T09:23:32.8674591Z // begin inline asm 2026-02-21T09:23:32.8674926Z @%p79 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1289], [%rd603, {%r3992, %r3993}], [%r9488]; 2026-02-21T09:23:32.8674988Z // end inline asm 
2026-02-21T09:23:32.8675190Z .loc 1 52 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:32 2026-02-21T09:23:32.8675265Z cvt.s64.s32 %rd324, %r4126; 2026-02-21T09:23:32.8675332Z or.b64 %rd325, %rd324, %rd12; 2026-02-21T09:23:32.8675398Z shl.b64 %rd326, %rd325, 1; 2026-02-21T09:23:32.8675472Z add.s64 %rd327, %rd93, %rd326; 2026-02-21T09:23:32.8675538Z add.s64 %rd306, %rd327, 128; 2026-02-21T09:23:32.8675602Z cvt.s64.s32 %rd328, %r4127; 2026-02-21T09:23:32.8675665Z or.b64 %rd329, %rd328, %rd12; 2026-02-21T09:23:32.8675736Z shl.b64 %rd330, %rd329, 1; 2026-02-21T09:23:32.8675804Z add.s64 %rd331, %rd93, %rd330; 2026-02-21T09:23:32.8675870Z add.s64 %rd307, %rd331, 128; 2026-02-21T09:23:32.8675941Z cvt.s64.s32 %rd332, %r4128; 2026-02-21T09:23:32.8676005Z or.b64 %rd333, %rd332, %rd12; 2026-02-21T09:23:32.8676184Z shl.b64 %rd334, %rd333, 1; 2026-02-21T09:23:32.8676249Z add.s64 %rd335, %rd93, %rd334; 2026-02-21T09:23:32.8676318Z add.s64 %rd308, %rd335, 128; 2026-02-21T09:23:32.8676385Z cvt.s64.s32 %rd336, %r4129; 2026-02-21T09:23:32.8676574Z or.b64 %rd337, %rd336, %rd12; 2026-02-21T09:23:32.8676651Z shl.b64 %rd338, %rd337, 1; 2026-02-21T09:23:32.8676717Z add.s64 %rd339, %rd93, %rd338; 2026-02-21T09:23:32.8676781Z add.s64 %rd309, %rd339, 128; 2026-02-21T09:23:32.8676853Z cvt.s64.s32 %rd340, %r4130; 2026-02-21T09:23:32.8676920Z or.b64 %rd341, %rd340, %rd12; 2026-02-21T09:23:32.8676983Z shl.b64 %rd342, %rd341, 1; 2026-02-21T09:23:32.8677047Z add.s64 %rd343, %rd93, %rd342; 2026-02-21T09:23:32.8677120Z add.s64 %rd310, %rd343, 128; 2026-02-21T09:23:32.8677183Z cvt.s64.s32 %rd344, %r4131; 2026-02-21T09:23:32.8677339Z or.b64 %rd345, %rd344, %rd12; 2026-02-21T09:23:32.8677411Z shl.b64 %rd346, %rd345, 1; 2026-02-21T09:23:32.8677476Z add.s64 %rd347, %rd93, %rd346; 2026-02-21T09:23:32.8677559Z add.s64 %rd311, %rd347, 128; 2026-02-21T09:23:32.8677624Z cvt.s64.s32 %rd348, %r4132; 2026-02-21T09:23:32.8677698Z or.b64 %rd349, %rd348, %rd12; 2026-02-21T09:23:32.8677764Z shl.b64 
%rd350, %rd349, 1; 2026-02-21T09:23:32.8677829Z add.s64 %rd351, %rd93, %rd350; 2026-02-21T09:23:32.8677967Z add.s64 %rd312, %rd351, 128; 2026-02-21T09:23:32.8678036Z cvt.s64.s32 %rd352, %r4133; 2026-02-21T09:23:32.8678099Z or.b64 %rd353, %rd352, %rd12; 2026-02-21T09:23:32.8678164Z shl.b64 %rd354, %rd353, 1; 2026-02-21T09:23:32.8678235Z add.s64 %rd355, %rd93, %rd354; 2026-02-21T09:23:32.8678300Z add.s64 %rd313, %rd355, 128; 2026-02-21T09:23:32.8678365Z cvt.s64.s32 %rd356, %r4134; 2026-02-21T09:23:32.8678436Z or.b64 %rd357, %rd356, %rd12; 2026-02-21T09:23:32.8678502Z shl.b64 %rd358, %rd357, 1; 2026-02-21T09:23:32.8678570Z add.s64 %rd359, %rd93, %rd358; 2026-02-21T09:23:32.8678638Z add.s64 %rd314, %rd359, 128; 2026-02-21T09:23:32.8678723Z cvt.s64.s32 %rd360, %r4135; 2026-02-21T09:23:32.8678791Z or.b64 %rd361, %rd360, %rd12; 2026-02-21T09:23:32.8678853Z shl.b64 %rd362, %rd361, 1; 2026-02-21T09:23:32.8678921Z add.s64 %rd363, %rd93, %rd362; 2026-02-21T09:23:32.8678987Z add.s64 %rd315, %rd363, 128; 2026-02-21T09:23:32.8679051Z cvt.s64.s32 %rd364, %r4136; 2026-02-21T09:23:32.8679120Z or.b64 %rd365, %rd364, %rd12; 2026-02-21T09:23:32.8679182Z shl.b64 %rd366, %rd365, 1; 2026-02-21T09:23:32.8679245Z add.s64 %rd367, %rd93, %rd366; 2026-02-21T09:23:32.8679307Z add.s64 %rd316, %rd367, 128; 2026-02-21T09:23:32.8679374Z cvt.s64.s32 %rd368, %r4137; 2026-02-21T09:23:32.8679439Z or.b64 %rd369, %rd368, %rd12; 2026-02-21T09:23:32.8679503Z shl.b64 %rd370, %rd369, 1; 2026-02-21T09:23:32.8679574Z add.s64 %rd371, %rd93, %rd370; 2026-02-21T09:23:32.8679643Z add.s64 %rd317, %rd371, 128; 2026-02-21T09:23:32.8679708Z cvt.s64.s32 %rd372, %r4138; 2026-02-21T09:23:32.8679771Z or.b64 %rd373, %rd372, %rd12; 2026-02-21T09:23:32.8679841Z shl.b64 %rd374, %rd373, 1; 2026-02-21T09:23:32.8679909Z add.s64 %rd375, %rd93, %rd374; 2026-02-21T09:23:32.8679972Z add.s64 %rd318, %rd375, 128; 2026-02-21T09:23:32.8680041Z cvt.s64.s32 %rd376, %r4139; 2026-02-21T09:23:32.8680105Z or.b64 %rd377, %rd376, %rd12; 
2026-02-21T09:23:32.8680168Z shl.b64 %rd378, %rd377, 1; 2026-02-21T09:23:32.8680238Z add.s64 %rd379, %rd93, %rd378; 2026-02-21T09:23:32.8680307Z add.s64 %rd319, %rd379, 128; 2026-02-21T09:23:32.8680386Z cvt.s64.s32 %rd380, %r4140; 2026-02-21T09:23:32.8680451Z or.b64 %rd381, %rd380, %rd12; 2026-02-21T09:23:32.8680520Z shl.b64 %rd382, %rd381, 1; 2026-02-21T09:23:32.8680588Z add.s64 %rd383, %rd93, %rd382; 2026-02-21T09:23:32.8680652Z add.s64 %rd320, %rd383, 128; 2026-02-21T09:23:32.8680716Z cvt.s64.s32 %rd384, %r4141; 2026-02-21T09:23:32.8680783Z or.b64 %rd385, %rd384, %rd12; 2026-02-21T09:23:32.8680847Z shl.b64 %rd386, %rd385, 1; 2026-02-21T09:23:32.8680911Z add.s64 %rd387, %rd93, %rd386; 2026-02-21T09:23:32.8680980Z add.s64 %rd321, %rd387, 128; 2026-02-21T09:23:32.8681269Z .loc 1 52 80 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:80 2026-02-21T09:23:32.8681404Z // begin inline asm 2026-02-21T09:23:32.8681555Z cp.async.ca.shared.global [ %r9532 + 0 ], [ %rd306 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8681619Z // end inline asm 2026-02-21T09:23:32.8681679Z // begin inline asm 2026-02-21T09:23:32.8681820Z cp.async.ca.shared.global [ %r9534 + 0 ], [ %rd307 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8681884Z // end inline asm 2026-02-21T09:23:32.8681944Z // begin inline asm 2026-02-21T09:23:32.8682079Z cp.async.ca.shared.global [ %r9536 + 0 ], [ %rd308 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8682144Z // end inline asm 2026-02-21T09:23:32.8682203Z // begin inline asm 2026-02-21T09:23:32.8682390Z cp.async.ca.shared.global [ %r9538 + 0 ], [ %rd309 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8682456Z // end inline asm 2026-02-21T09:23:32.8682518Z // begin inline asm 2026-02-21T09:23:32.8682656Z cp.async.ca.shared.global [ %r9540 + 0 ], [ %rd310 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8682718Z // end inline asm 2026-02-21T09:23:32.8682783Z // begin inline asm 2026-02-21T09:23:32.8682921Z cp.async.ca.shared.global [ %r9542 + 0 ], [ %rd311 + 0 ], 0x8, %r3959; 
2026-02-21T09:23:32.8682979Z // end inline asm 2026-02-21T09:23:32.8683092Z // begin inline asm 2026-02-21T09:23:32.8683230Z cp.async.ca.shared.global [ %r9544 + 0 ], [ %rd312 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8683288Z // end inline asm 2026-02-21T09:23:32.8683350Z // begin inline asm 2026-02-21T09:23:32.8683490Z cp.async.ca.shared.global [ %r9546 + 0 ], [ %rd313 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8683548Z // end inline asm 2026-02-21T09:23:32.8683609Z // begin inline asm 2026-02-21T09:23:32.8683749Z cp.async.ca.shared.global [ %r9548 + 0 ], [ %rd314 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8683810Z // end inline asm 2026-02-21T09:23:32.8683870Z // begin inline asm 2026-02-21T09:23:32.8684006Z cp.async.ca.shared.global [ %r9550 + 0 ], [ %rd315 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8684073Z // end inline asm 2026-02-21T09:23:32.8684133Z // begin inline asm 2026-02-21T09:23:32.8684270Z cp.async.ca.shared.global [ %r9552 + 0 ], [ %rd316 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8684331Z // end inline asm 2026-02-21T09:23:32.8684394Z // begin inline asm 2026-02-21T09:23:32.8684533Z cp.async.ca.shared.global [ %r9554 + 0 ], [ %rd317 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8684596Z // end inline asm 2026-02-21T09:23:32.8684666Z // begin inline asm 2026-02-21T09:23:32.8684804Z cp.async.ca.shared.global [ %r9556 + 0 ], [ %rd318 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8684863Z // end inline asm 2026-02-21T09:23:32.8684928Z // begin inline asm 2026-02-21T09:23:32.8685063Z cp.async.ca.shared.global [ %r9558 + 0 ], [ %rd319 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8685124Z // end inline asm 2026-02-21T09:23:32.8685189Z // begin inline asm 2026-02-21T09:23:32.8685338Z cp.async.ca.shared.global [ %r9560 + 0 ], [ %rd320 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8685400Z // end inline asm 2026-02-21T09:23:32.8685466Z // begin inline asm 2026-02-21T09:23:32.8685610Z cp.async.ca.shared.global [ %r9562 + 0 ], [ %rd321 + 0 ], 0x8, %r3959; 2026-02-21T09:23:32.8685668Z // end 
inline asm 2026-02-21T09:23:32.8685737Z cp.async.commit_group; 2026-02-21T09:23:32.8685949Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8686009Z bar.sync 0; 2026-02-21T09:23:32.8686070Z // begin inline asm 2026-02-21T09:23:32.8686209Z @%p2 mbarrier.arrive.expect_tx.shared.b64 _, [%r9489], 4096; 2026-02-21T09:23:32.8686270Z // end inline asm 2026-02-21T09:23:32.8686602Z .loc 1 58 33 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:58:33 2026-02-21T09:23:32.8686667Z bar.sync 0; 2026-02-21T09:23:32.8686746Z elect.sync %r4159|%p86, -1; 2026-02-21T09:23:32.8686816Z and.pred %p81, %p1, %p86; 2026-02-21T09:23:32.8686877Z mov.b32 %r4030, 32; 2026-02-21T09:23:32.8686941Z // begin inline asm 2026-02-21T09:23:32.8687420Z @%p81 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1326], [%rd603, {%r3992, %r4030}], [%r9489]; 2026-02-21T09:23:32.8687480Z // end inline asm 2026-02-21T09:23:32.8687702Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8687768Z shl.b32 %r4160, %r4105, 7; 2026-02-21T09:23:32.8687833Z or.b32 %r4161, %r21, %r4160; 2026-02-21T09:23:32.8687895Z shl.b32 %r4162, %r4108, 7; 2026-02-21T09:23:32.8687969Z and.b32 %r4163, %r4162, -16384; 2026-02-21T09:23:32.8688033Z sub.s32 %r4164, %r4161, %r4163; 2026-02-21T09:23:32.8688095Z shl.b32 %r4165, %r4164, 10; 2026-02-21T09:23:32.8688172Z mul.wide.s32 %rd388, %r4165, 2; 2026-02-21T09:23:32.8688237Z or.b64 %rd33, %rd388, 256; 2026-02-21T09:23:32.8688367Z or.b32 %r4166, %r20, %r4160; 2026-02-21T09:23:32.8688437Z sub.s32 %r4167, %r4166, %r4163; 2026-02-21T09:23:32.8688504Z shl.b32 %r4168, %r4167, 10; 2026-02-21T09:23:32.8688575Z mul.wide.s32 %rd389, %r4168, 2; 2026-02-21T09:23:32.8688637Z or.b64 %rd34, %rd389, 256; 2026-02-21T09:23:32.8688703Z or.b32 %r4169, %r19, %r4160; 2026-02-21T09:23:32.8688764Z sub.s32 %r4170, %r4169, %r4163; 2026-02-21T09:23:32.8688827Z shl.b32 %r4171, %r4170, 
10; 2026-02-21T09:23:32.8688974Z mul.wide.s32 %rd390, %r4171, 2; 2026-02-21T09:23:32.8689041Z or.b64 %rd35, %rd390, 256; 2026-02-21T09:23:32.8689105Z or.b32 %r4172, %r18, %r4160; 2026-02-21T09:23:32.8689168Z sub.s32 %r4173, %r4172, %r4163; 2026-02-21T09:23:32.8689236Z shl.b32 %r4174, %r4173, 10; 2026-02-21T09:23:32.8689303Z mul.wide.s32 %rd391, %r4174, 2; 2026-02-21T09:23:32.8689368Z or.b64 %rd36, %rd391, 256; 2026-02-21T09:23:32.8689435Z or.b32 %r4175, %r17, %r4160; 2026-02-21T09:23:32.8689500Z sub.s32 %r4176, %r4175, %r4163; 2026-02-21T09:23:32.8689573Z shl.b32 %r4177, %r4176, 10; 2026-02-21T09:23:32.8689639Z mul.wide.s32 %rd392, %r4177, 2; 2026-02-21T09:23:32.8689707Z or.b64 %rd37, %rd392, 256; 2026-02-21T09:23:32.8689772Z or.b32 %r4178, %r16, %r4160; 2026-02-21T09:23:32.8689833Z sub.s32 %r4179, %r4178, %r4163; 2026-02-21T09:23:32.8689900Z shl.b32 %r4180, %r4179, 10; 2026-02-21T09:23:32.8689965Z mul.wide.s32 %rd393, %r4180, 2; 2026-02-21T09:23:32.8690030Z or.b64 %rd38, %rd393, 256; 2026-02-21T09:23:32.8690094Z or.b32 %r4181, %r15, %r4160; 2026-02-21T09:23:32.8690160Z sub.s32 %r4182, %r4181, %r4163; 2026-02-21T09:23:32.8690223Z shl.b32 %r4183, %r4182, 10; 2026-02-21T09:23:32.8690287Z mul.wide.s32 %rd394, %r4183, 2; 2026-02-21T09:23:32.8690352Z or.b64 %rd39, %rd394, 256; 2026-02-21T09:23:32.8690413Z or.b32 %r4184, %r14, %r4160; 2026-02-21T09:23:32.8690475Z sub.s32 %r4185, %r4184, %r4163; 2026-02-21T09:23:32.8690542Z shl.b32 %r4186, %r4185, 10; 2026-02-21T09:23:32.8690609Z mul.wide.s32 %rd395, %r4186, 2; 2026-02-21T09:23:32.8690672Z or.b64 %rd40, %rd395, 256; 2026-02-21T09:23:32.8690735Z or.b32 %r4187, %r13, %r4160; 2026-02-21T09:23:32.8690802Z sub.s32 %r4188, %r4187, %r4163; 2026-02-21T09:23:32.8690866Z shl.b32 %r4189, %r4188, 10; 2026-02-21T09:23:32.8690932Z mul.wide.s32 %rd396, %r4189, 2; 2026-02-21T09:23:32.8690998Z or.b64 %rd41, %rd396, 256; 2026-02-21T09:23:32.8691060Z or.b32 %r4190, %r12, %r4160; 2026-02-21T09:23:32.8691124Z sub.s32 %r4191, %r4190, %r4163; 
2026-02-21T09:23:32.8691185Z shl.b32 %r4192, %r4191, 10; 2026-02-21T09:23:32.8691254Z mul.wide.s32 %rd397, %r4192, 2; 2026-02-21T09:23:32.8691315Z or.b64 %rd42, %rd397, 256; 2026-02-21T09:23:32.8691375Z or.b32 %r4193, %r11, %r4160; 2026-02-21T09:23:32.8691443Z sub.s32 %r4194, %r4193, %r4163; 2026-02-21T09:23:32.8691503Z shl.b32 %r4195, %r4194, 10; 2026-02-21T09:23:32.8691570Z mul.wide.s32 %rd398, %r4195, 2; 2026-02-21T09:23:32.8691632Z or.b64 %rd43, %rd398, 256; 2026-02-21T09:23:32.8691699Z or.b32 %r4196, %r10, %r4160; 2026-02-21T09:23:32.8691763Z sub.s32 %r4197, %r4196, %r4163; 2026-02-21T09:23:32.8691825Z shl.b32 %r4198, %r4197, 10; 2026-02-21T09:23:32.8691895Z mul.wide.s32 %rd399, %r4198, 2; 2026-02-21T09:23:32.8692094Z or.b64 %rd44, %rd399, 256; 2026-02-21T09:23:32.8692154Z or.b32 %r4199, %r9, %r4160; 2026-02-21T09:23:32.8692219Z sub.s32 %r4200, %r4199, %r4163; 2026-02-21T09:23:32.8692294Z shl.b32 %r4201, %r4200, 10; 2026-02-21T09:23:32.8692365Z mul.wide.s32 %rd400, %r4201, 2; 2026-02-21T09:23:32.8692430Z or.b64 %rd45, %rd400, 256; 2026-02-21T09:23:32.8692497Z or.b32 %r4202, %r8, %r4160; 2026-02-21T09:23:32.8692560Z sub.s32 %r4203, %r4202, %r4163; 2026-02-21T09:23:32.8692623Z shl.b32 %r4204, %r4203, 10; 2026-02-21T09:23:32.8692692Z mul.wide.s32 %rd401, %r4204, 2; 2026-02-21T09:23:32.8692756Z or.b64 %rd46, %rd401, 256; 2026-02-21T09:23:32.8692818Z or.b32 %r4205, %r7, %r4160; 2026-02-21T09:23:32.8692881Z sub.s32 %r4206, %r4205, %r4163; 2026-02-21T09:23:32.8693005Z shl.b32 %r4207, %r4206, 10; 2026-02-21T09:23:32.8693074Z mul.wide.s32 %rd402, %r4207, 2; 2026-02-21T09:23:32.8693137Z or.b64 %rd47, %rd402, 256; 2026-02-21T09:23:32.8693200Z or.b32 %r4208, %r6, %r4160; 2026-02-21T09:23:32.8693267Z sub.s32 %r4209, %r4208, %r4163; 2026-02-21T09:23:32.8693326Z shl.b32 %r4210, %r4209, 10; 2026-02-21T09:23:32.8693394Z mul.wide.s32 %rd403, %r4210, 2; 2026-02-21T09:23:32.8693461Z or.b64 %rd48, %rd403, 256; 2026-02-21T09:23:32.8693571Z mov.b32 %r12461, 0f00000000; 
2026-02-21T09:23:32.8693631Z mov.b32 %r12460, 1; 2026-02-21T09:23:32.8693697Z mov.b32 %r12459, -1; 2026-02-21T09:23:32.8693759Z mov.b64 %rd739, 0; 2026-02-21T09:23:32.8693824Z mov.b64 %rd738, %rd11; 2026-02-21T09:23:32.8693888Z mov.b32 %r12458, %r3993; 2026-02-21T09:23:32.8693968Z mov.b32 %r12462, %r12461; 2026-02-21T09:23:32.8694031Z mov.b32 %r12463, %r12461; 2026-02-21T09:23:32.8694094Z mov.b32 %r12464, %r12461; 2026-02-21T09:23:32.8694161Z mov.b32 %r12465, %r12461; 2026-02-21T09:23:32.8694225Z mov.b32 %r12466, %r12461; 2026-02-21T09:23:32.8694284Z mov.b32 %r12467, %r12461; 2026-02-21T09:23:32.8694345Z mov.b32 %r12468, %r12461; 2026-02-21T09:23:32.8694411Z mov.b32 %r12469, %r12461; 2026-02-21T09:23:32.8694475Z mov.b32 %r12470, %r12461; 2026-02-21T09:23:32.8694535Z mov.b32 %r12471, %r12461; 2026-02-21T09:23:32.8694599Z mov.b32 %r12472, %r12461; 2026-02-21T09:23:32.8694660Z mov.b32 %r12473, %r12461; 2026-02-21T09:23:32.8694720Z mov.b32 %r12474, %r12461; 2026-02-21T09:23:32.8694788Z mov.b32 %r12475, %r12461; 2026-02-21T09:23:32.8694850Z mov.b32 %r12476, %r12461; 2026-02-21T09:23:32.8694908Z mov.b32 %r12477, %r12461; 2026-02-21T09:23:32.8694967Z mov.b32 %r12478, %r12461; 2026-02-21T09:23:32.8695031Z mov.b32 %r12479, %r12461; 2026-02-21T09:23:32.8695090Z mov.b32 %r12480, %r12461; 2026-02-21T09:23:32.8695151Z mov.b32 %r12481, %r12461; 2026-02-21T09:23:32.8695217Z mov.b32 %r12482, %r12461; 2026-02-21T09:23:32.8695277Z mov.b32 %r12483, %r12461; 2026-02-21T09:23:32.8695337Z mov.b32 %r12484, %r12461; 2026-02-21T09:23:32.8695399Z mov.b32 %r12485, %r12461; 2026-02-21T09:23:32.8695463Z mov.b32 %r12486, %r12461; 2026-02-21T09:23:32.8695522Z mov.b32 %r12487, %r12461; 2026-02-21T09:23:32.8695586Z mov.b32 %r12488, %r12461; 2026-02-21T09:23:32.8695650Z mov.b32 %r12489, %r12461; 2026-02-21T09:23:32.8695710Z mov.b32 %r12490, %r12461; 2026-02-21T09:23:32.8695770Z mov.b32 %r12491, %r12461; 2026-02-21T09:23:32.8695829Z mov.b32 %r12492, %r12461; 2026-02-21T09:23:32.8695897Z mov.b32 
%r12493, %r12461; 2026-02-21T09:23:32.8695957Z mov.b32 %r12494, %r12461; 2026-02-21T09:23:32.8696016Z mov.b32 %r12495, %r12461; 2026-02-21T09:23:32.8696081Z mov.b32 %r12496, %r12461; 2026-02-21T09:23:32.8696142Z mov.b32 %r12497, %r12461; 2026-02-21T09:23:32.8696216Z mov.b32 %r12498, %r12461; 2026-02-21T09:23:32.8696277Z mov.b32 %r12499, %r12461; 2026-02-21T09:23:32.8696344Z mov.b32 %r12500, %r12461; 2026-02-21T09:23:32.8696404Z mov.b32 %r12501, %r12461; 2026-02-21T09:23:32.8696581Z mov.b32 %r12502, %r12461; 2026-02-21T09:23:32.8696651Z mov.b32 %r12503, %r12461; 2026-02-21T09:23:32.8696710Z mov.b32 %r12504, %r12461; 2026-02-21T09:23:32.8696769Z mov.b32 %r12505, %r12461; 2026-02-21T09:23:32.8697001Z mov.b32 %r12506, %r12461; 2026-02-21T09:23:32.8697066Z mov.b32 %r12507, %r12461; 2026-02-21T09:23:32.8697127Z mov.b32 %r12508, %r12461; 2026-02-21T09:23:32.8697190Z mov.b32 %r12509, %r12461; 2026-02-21T09:23:32.8697254Z mov.b32 %r12510, %r12461; 2026-02-21T09:23:32.8697315Z mov.b32 %r12511, %r12461; 2026-02-21T09:23:32.8697376Z mov.b32 %r12512, %r12461; 2026-02-21T09:23:32.8697438Z mov.b32 %r12513, %r12461; 2026-02-21T09:23:32.8697504Z mov.b32 %r12514, %r12461; 2026-02-21T09:23:32.8697564Z mov.b32 %r12515, %r12461; 2026-02-21T09:23:32.8697624Z mov.b32 %r12516, %r12461; 2026-02-21T09:23:32.8697687Z mov.b32 %r12517, %r12461; 2026-02-21T09:23:32.8697746Z mov.b32 %r12518, %r12461; 2026-02-21T09:23:32.8697805Z mov.b32 %r12519, %r12461; 2026-02-21T09:23:32.8697936Z mov.b32 %r12520, %r12461; 2026-02-21T09:23:32.8697999Z mov.b32 %r12521, %r12461; 2026-02-21T09:23:32.8698062Z mov.b32 %r12522, %r12461; 2026-02-21T09:23:32.8698122Z mov.b32 %r12523, %r12461; 2026-02-21T09:23:32.8698191Z mov.b32 %r12524, %r12461; 2026-02-21T09:23:32.8698252Z mov.b32 %r12525, %r12461; 2026-02-21T09:23:32.8698312Z mov.b32 %r12526, %r12461; 2026-02-21T09:23:32.8698374Z mov.b32 %r12527, %r12461; 2026-02-21T09:23:32.8698434Z mov.b32 %r12528, %r12461; 2026-02-21T09:23:32.8698556Z mov.b32 %r12529, %r12461; 
2026-02-21T09:23:32.8698632Z mov.b32 %r12530, %r12461; 2026-02-21T09:23:32.8698698Z mov.b32 %r12531, %r12461; 2026-02-21T09:23:32.8698760Z mov.b32 %r12532, %r12461; 2026-02-21T09:23:32.8698820Z mov.b32 %r12533, %r12461; 2026-02-21T09:23:32.8698884Z mov.b32 %r12534, %r12461; 2026-02-21T09:23:32.8698945Z mov.b32 %r12535, %r12461; 2026-02-21T09:23:32.8699006Z mov.b32 %r12536, %r12461; 2026-02-21T09:23:32.8699065Z mov.b32 %r12537, %r12461; 2026-02-21T09:23:32.8699131Z mov.b32 %r12538, %r12461; 2026-02-21T09:23:32.8699190Z mov.b32 %r12539, %r12461; 2026-02-21T09:23:32.8699251Z mov.b32 %r12540, %r12461; 2026-02-21T09:23:32.8699314Z mov.b32 %r12541, %r12461; 2026-02-21T09:23:32.8699386Z mov.b32 %r12542, %r12461; 2026-02-21T09:23:32.8699448Z mov.b32 %r12543, %r12461; 2026-02-21T09:23:32.8699509Z mov.b32 %r12544, %r12461; 2026-02-21T09:23:32.8699581Z mov.b32 %r12545, %r12461; 2026-02-21T09:23:32.8699641Z mov.b32 %r12546, %r12461; 2026-02-21T09:23:32.8699704Z mov.b32 %r12547, %r12461; 2026-02-21T09:23:32.8699770Z mov.b32 %r12548, %r12461; 2026-02-21T09:23:32.8699832Z mov.b32 %r12549, %r12461; 2026-02-21T09:23:32.8699895Z mov.b32 %r12550, %r12461; 2026-02-21T09:23:32.8699962Z mov.b32 %r12551, %r12461; 2026-02-21T09:23:32.8700025Z mov.b32 %r12552, %r12461; 2026-02-21T09:23:32.8700089Z mov.b32 %r12553, %r12461; 2026-02-21T09:23:32.8700150Z mov.b32 %r12554, %r12461; 2026-02-21T09:23:32.8700215Z mov.b32 %r12555, %r12461; 2026-02-21T09:23:32.8700277Z mov.b32 %r12556, %r12461; 2026-02-21T09:23:32.8700336Z mov.b32 %r12557, %r12461; 2026-02-21T09:23:32.8700396Z mov.b32 %r12558, %r12461; 2026-02-21T09:23:32.8700461Z mov.b32 %r12559, %r12461; 2026-02-21T09:23:32.8700534Z mov.b32 %r12560, %r12461; 2026-02-21T09:23:32.8700594Z mov.b32 %r12561, %r12461; 2026-02-21T09:23:32.8700659Z mov.b32 %r12562, %r12461; 2026-02-21T09:23:32.8700720Z mov.b32 %r12563, %r12461; 2026-02-21T09:23:32.8700781Z mov.b32 %r12564, %r12461; 2026-02-21T09:23:32.8700848Z mov.b32 %r12565, %r12461; 
2026-02-21T09:23:32.8700907Z mov.b32 %r12566, %r12461; 2026-02-21T09:23:32.8700967Z mov.b32 %r12567, %r12461; 2026-02-21T09:23:32.8701027Z mov.b32 %r12568, %r12461; 2026-02-21T09:23:32.8701091Z mov.b32 %r12569, %r12461; 2026-02-21T09:23:32.8701149Z mov.b32 %r12570, %r12461; 2026-02-21T09:23:32.8701211Z mov.b32 %r12571, %r12461; 2026-02-21T09:23:32.8701279Z mov.b32 %r12572, %r12461; 2026-02-21T09:23:32.8701340Z mov.b32 %r12573, %r12461; 2026-02-21T09:23:32.8701401Z mov.b32 %r12574, %r12461; 2026-02-21T09:23:32.8701463Z mov.b32 %r12575, %r12461; 2026-02-21T09:23:32.8701526Z mov.b32 %r12576, %r12461; 2026-02-21T09:23:32.8701585Z mov.b32 %r12577, %r12461; 2026-02-21T09:23:32.8701758Z mov.b32 %r12578, %r12461; 2026-02-21T09:23:32.8701821Z mov.b32 %r12579, %r12461; 2026-02-21T09:23:32.8701883Z mov.b32 %r12580, %r12461; 2026-02-21T09:23:32.8701943Z mov.b32 %r12581, %r12461; 2026-02-21T09:23:32.8702004Z mov.b32 %r12582, %r12461; 2026-02-21T09:23:32.8702071Z mov.b32 %r12583, %r12461; 2026-02-21T09:23:32.8702134Z mov.b32 %r12584, %r12461; 2026-02-21T09:23:32.8702193Z mov.b32 %r12585, %r12461; 2026-02-21T09:23:32.8702257Z mov.b32 %r12586, %r12461; 2026-02-21T09:23:32.8702316Z mov.b32 %r12587, %r12461; 2026-02-21T09:23:32.8702377Z mov.b32 %r12588, %r12461; 2026-02-21T09:23:32.8702498Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T09:23:32.8702684Z // => This Inner Loop Header: Depth=2 2026-02-21T09:23:32.8702758Z setp.lt.u64 %p107, %rd739, 448; 2026-02-21T09:23:32.8702822Z add.s32 %r6624, %r12459, 1; 2026-02-21T09:23:32.8702894Z setp.gt.s32 %p108, %r6624, 1; 2026-02-21T09:23:32.8702972Z selp.b32 %r12459, 0, %r6624, %p108; 2026-02-21T09:23:32.8703052Z selp.b32 %r6625, 1, 0, %p108; 2026-02-21T09:23:32.8703125Z xor.b32 %r12458, %r12458, %r6625; 2026-02-21T09:23:32.8703392Z .loc 1 52 80 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:80 2026-02-21T09:23:32.8703463Z cp.async.wait_group 1; 2026-02-21T09:23:32.8703521Z bar.sync 0; 2026-02-21T09:23:32.8703595Z 
shl.b32 %r6626, %r12459, 14; 2026-02-21T09:23:32.8703660Z add.s32 %r6628, %r1205, %r6626; 2026-02-21T09:23:32.8703864Z .loc 1 56 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:56:32 2026-02-21T09:23:32.8703934Z add.s32 %r6629, %r6628, %r90; 2026-02-21T09:23:32.8704003Z ld.shared.b16 %rs225, [%r6629]; 2026-02-21T09:23:32.8704078Z ld.shared.b16 %rs226, [%r6629+1024]; 2026-02-21T09:23:32.8704148Z ld.shared.b16 %rs227, [%r6629+64]; 2026-02-21T09:23:32.8704223Z ld.shared.b16 %rs228, [%r6629+1088]; 2026-02-21T09:23:32.8704292Z ld.shared.b16 %rs229, [%r6629+8192]; 2026-02-21T09:23:32.8704360Z ld.shared.b16 %rs230, [%r6629+9216]; 2026-02-21T09:23:32.8704432Z ld.shared.b16 %rs231, [%r6629+8256]; 2026-02-21T09:23:32.8704500Z ld.shared.b16 %rs232, [%r6629+9280]; 2026-02-21T09:23:32.8704565Z add.s32 %r6630, %r6628, %r91; 2026-02-21T09:23:32.8704638Z ld.shared.b16 %rs233, [%r6630]; 2026-02-21T09:23:32.8704706Z ld.shared.b16 %rs234, [%r6630+1024]; 2026-02-21T09:23:32.8704774Z ld.shared.b16 %rs235, [%r6630+64]; 2026-02-21T09:23:32.8704842Z ld.shared.b16 %rs236, [%r6630+1088]; 2026-02-21T09:23:32.8704918Z ld.shared.b16 %rs237, [%r6630+8192]; 2026-02-21T09:23:32.8704984Z ld.shared.b16 %rs238, [%r6630+9216]; 2026-02-21T09:23:32.8705053Z ld.shared.b16 %rs239, [%r6630+8256]; 2026-02-21T09:23:32.8705126Z ld.shared.b16 %rs240, [%r6630+9280]; 2026-02-21T09:23:32.8705191Z add.s32 %r6631, %r6628, %r92; 2026-02-21T09:23:32.8705259Z ld.shared.b16 %rs241, [%r6631]; 2026-02-21T09:23:32.8705326Z ld.shared.b16 %rs242, [%r6631+1024]; 2026-02-21T09:23:32.8705404Z ld.shared.b16 %rs243, [%r6631+64]; 2026-02-21T09:23:32.8705473Z ld.shared.b16 %rs244, [%r6631+1088]; 2026-02-21T09:23:32.8705542Z ld.shared.b16 %rs245, [%r6631+8192]; 2026-02-21T09:23:32.8705616Z ld.shared.b16 %rs246, [%r6631+9216]; 2026-02-21T09:23:32.8705685Z ld.shared.b16 %rs247, [%r6631+8256]; 2026-02-21T09:23:32.8705752Z ld.shared.b16 %rs248, [%r6631+9280]; 2026-02-21T09:23:32.8705822Z add.s32 %r6632, %r6628, %r93; 
2026-02-21T09:23:32.8705890Z ld.shared.b16 %rs249, [%r6632]; 2026-02-21T09:23:32.8705962Z ld.shared.b16 %rs250, [%r6632+1024]; 2026-02-21T09:23:32.8706031Z ld.shared.b16 %rs251, [%r6632+64]; 2026-02-21T09:23:32.8706107Z ld.shared.b16 %rs252, [%r6632+1088]; 2026-02-21T09:23:32.8706174Z ld.shared.b16 %rs253, [%r6632+8192]; 2026-02-21T09:23:32.8706242Z ld.shared.b16 %rs254, [%r6632+9216]; 2026-02-21T09:23:32.8706321Z ld.shared.b16 %rs255, [%r6632+8256]; 2026-02-21T09:23:32.8706397Z ld.shared.b16 %rs256, [%r6632+9280]; 2026-02-21T09:23:32.8706749Z add.s32 %r6633, %r6628, %r94; 2026-02-21T09:23:32.8706832Z ld.shared.b16 %rs257, [%r6633]; 2026-02-21T09:23:32.8706913Z ld.shared.b16 %rs258, [%r6633+1024]; 2026-02-21T09:23:32.8706984Z ld.shared.b16 %rs259, [%r6633+64]; 2026-02-21T09:23:32.8707059Z ld.shared.b16 %rs260, [%r6633+1088]; 2026-02-21T09:23:32.8707135Z ld.shared.b16 %rs261, [%r6633+8192]; 2026-02-21T09:23:32.8707203Z ld.shared.b16 %rs262, [%r6633+9216]; 2026-02-21T09:23:32.8707272Z ld.shared.b16 %rs263, [%r6633+8256]; 2026-02-21T09:23:32.8707346Z ld.shared.b16 %rs264, [%r6633+9280]; 2026-02-21T09:23:32.8707411Z add.s32 %r6634, %r6628, %r95; 2026-02-21T09:23:32.8707481Z ld.shared.b16 %rs265, [%r6634]; 2026-02-21T09:23:32.8707551Z ld.shared.b16 %rs266, [%r6634+1024]; 2026-02-21T09:23:32.8707695Z ld.shared.b16 %rs267, [%r6634+64]; 2026-02-21T09:23:32.8707769Z ld.shared.b16 %rs268, [%r6634+1088]; 2026-02-21T09:23:32.8707840Z ld.shared.b16 %rs269, [%r6634+8192]; 2026-02-21T09:23:32.8707916Z ld.shared.b16 %rs270, [%r6634+9216]; 2026-02-21T09:23:32.8707986Z ld.shared.b16 %rs271, [%r6634+8256]; 2026-02-21T09:23:32.8708058Z ld.shared.b16 %rs272, [%r6634+9280]; 2026-02-21T09:23:32.8708120Z add.s32 %r6635, %r6628, %r96; 2026-02-21T09:23:32.8708342Z ld.shared.b16 %rs273, [%r6635]; 2026-02-21T09:23:32.8708419Z ld.shared.b16 %rs274, [%r6635+1024]; 2026-02-21T09:23:32.8708485Z ld.shared.b16 %rs275, [%r6635+64]; 2026-02-21T09:23:32.8708559Z ld.shared.b16 %rs276, [%r6635+1088]; 
2026-02-21T09:23:32.8708627Z ld.shared.b16 %rs277, [%r6635+8192]; 2026-02-21T09:23:32.8708697Z ld.shared.b16 %rs278, [%r6635+9216]; 2026-02-21T09:23:32.8708772Z ld.shared.b16 %rs279, [%r6635+8256]; 2026-02-21T09:23:32.8708841Z ld.shared.b16 %rs280, [%r6635+9280]; 2026-02-21T09:23:32.8708906Z add.s32 %r6636, %r6628, %r97; 2026-02-21T09:23:32.8708974Z ld.shared.b16 %rs281, [%r6636]; 2026-02-21T09:23:32.8709047Z ld.shared.b16 %rs282, [%r6636+1024]; 2026-02-21T09:23:32.8709113Z ld.shared.b16 %rs283, [%r6636+64]; 2026-02-21T09:23:32.8709184Z ld.shared.b16 %rs284, [%r6636+1088]; 2026-02-21T09:23:32.8709258Z ld.shared.b16 %rs285, [%r6636+8192]; 2026-02-21T09:23:32.8709328Z ld.shared.b16 %rs286, [%r6636+9216]; 2026-02-21T09:23:32.8709393Z ld.shared.b16 %rs287, [%r6636+8256]; 2026-02-21T09:23:32.8709462Z ld.shared.b16 %rs288, [%r6636+9280]; 2026-02-21T09:23:32.8709536Z cvt.f32.bf16 %r4341, %rs225; 2026-02-21T09:23:32.8709600Z cvt.f32.bf16 %r4342, %rs226; 2026-02-21T09:23:32.8709664Z cvt.f32.bf16 %r4343, %rs233; 2026-02-21T09:23:32.8709731Z cvt.f32.bf16 %r4344, %rs234; 2026-02-21T09:23:32.8709803Z cvt.f32.bf16 %r4473, %rs241; 2026-02-21T09:23:32.8709867Z cvt.f32.bf16 %r4474, %rs242; 2026-02-21T09:23:32.8709937Z cvt.f32.bf16 %r4475, %rs249; 2026-02-21T09:23:32.8710000Z cvt.f32.bf16 %r4476, %rs250; 2026-02-21T09:23:32.8710064Z cvt.f32.bf16 %r4605, %rs257; 2026-02-21T09:23:32.8710128Z cvt.f32.bf16 %r4606, %rs258; 2026-02-21T09:23:32.8710196Z cvt.f32.bf16 %r4607, %rs265; 2026-02-21T09:23:32.8710262Z cvt.f32.bf16 %r4608, %rs266; 2026-02-21T09:23:32.8710324Z cvt.f32.bf16 %r4737, %rs273; 2026-02-21T09:23:32.8710393Z cvt.f32.bf16 %r4738, %rs274; 2026-02-21T09:23:32.8710453Z cvt.f32.bf16 %r4739, %rs281; 2026-02-21T09:23:32.8710529Z cvt.f32.bf16 %r4740, %rs282; 2026-02-21T09:23:32.8710594Z cvt.f32.bf16 %r4869, %rs227; 2026-02-21T09:23:32.8710662Z cvt.f32.bf16 %r4870, %rs228; 2026-02-21T09:23:32.8710725Z cvt.f32.bf16 %r4871, %rs235; 2026-02-21T09:23:32.8710787Z cvt.f32.bf16 %r4872, 
%rs236; 2026-02-21T09:23:32.8710854Z cvt.f32.bf16 %r5001, %rs243; 2026-02-21T09:23:32.8710915Z cvt.f32.bf16 %r5002, %rs244; 2026-02-21T09:23:32.8710978Z cvt.f32.bf16 %r5003, %rs251; 2026-02-21T09:23:32.8711041Z cvt.f32.bf16 %r5004, %rs252; 2026-02-21T09:23:32.8711108Z cvt.f32.bf16 %r5133, %rs259; 2026-02-21T09:23:32.8711171Z cvt.f32.bf16 %r5134, %rs260; 2026-02-21T09:23:32.8711235Z cvt.f32.bf16 %r5135, %rs267; 2026-02-21T09:23:32.8711305Z cvt.f32.bf16 %r5136, %rs268; 2026-02-21T09:23:32.8711494Z cvt.f32.bf16 %r5265, %rs275; 2026-02-21T09:23:32.8711558Z cvt.f32.bf16 %r5266, %rs276; 2026-02-21T09:23:32.8711622Z cvt.f32.bf16 %r5267, %rs283; 2026-02-21T09:23:32.8711691Z cvt.f32.bf16 %r5268, %rs284; 2026-02-21T09:23:32.8711753Z cvt.f32.bf16 %r5397, %rs229; 2026-02-21T09:23:32.8711817Z cvt.f32.bf16 %r5398, %rs230; 2026-02-21T09:23:32.8711884Z cvt.f32.bf16 %r5399, %rs237; 2026-02-21T09:23:32.8711947Z cvt.f32.bf16 %r5400, %rs238; 2026-02-21T09:23:32.8712024Z cvt.f32.bf16 %r5529, %rs245; 2026-02-21T09:23:32.8712095Z cvt.f32.bf16 %r5530, %rs246; 2026-02-21T09:23:32.8712158Z cvt.f32.bf16 %r5531, %rs253; 2026-02-21T09:23:32.8712222Z cvt.f32.bf16 %r5532, %rs254; 2026-02-21T09:23:32.8712286Z cvt.f32.bf16 %r5661, %rs261; 2026-02-21T09:23:32.8712354Z cvt.f32.bf16 %r5662, %rs262; 2026-02-21T09:23:32.8712471Z cvt.f32.bf16 %r5663, %rs269; 2026-02-21T09:23:32.8712536Z cvt.f32.bf16 %r5664, %rs270; 2026-02-21T09:23:32.8712607Z cvt.f32.bf16 %r5793, %rs277; 2026-02-21T09:23:32.8712674Z cvt.f32.bf16 %r5794, %rs278; 2026-02-21T09:23:32.8712749Z cvt.f32.bf16 %r5795, %rs285; 2026-02-21T09:23:32.8712814Z cvt.f32.bf16 %r5796, %rs286; 2026-02-21T09:23:32.8712883Z cvt.f32.bf16 %r5925, %rs231; 2026-02-21T09:23:32.8712948Z cvt.f32.bf16 %r5926, %rs232; 2026-02-21T09:23:32.8713057Z cvt.f32.bf16 %r5927, %rs239; 2026-02-21T09:23:32.8713128Z cvt.f32.bf16 %r5928, %rs240; 2026-02-21T09:23:32.8713191Z cvt.f32.bf16 %r6057, %rs247; 2026-02-21T09:23:32.8713254Z cvt.f32.bf16 %r6058, %rs248; 
2026-02-21T09:23:32.8713318Z cvt.f32.bf16 %r6059, %rs255; 2026-02-21T09:23:32.8713386Z cvt.f32.bf16 %r6060, %rs256; 2026-02-21T09:23:32.8713450Z cvt.f32.bf16 %r6189, %rs263; 2026-02-21T09:23:32.8713512Z cvt.f32.bf16 %r6190, %rs264; 2026-02-21T09:23:32.8713579Z cvt.f32.bf16 %r6191, %rs271; 2026-02-21T09:23:32.8713643Z cvt.f32.bf16 %r6192, %rs272; 2026-02-21T09:23:32.8713706Z cvt.f32.bf16 %r6321, %rs279; 2026-02-21T09:23:32.8713775Z cvt.f32.bf16 %r6322, %rs280; 2026-02-21T09:23:32.8713841Z cvt.f32.bf16 %r6323, %rs287; 2026-02-21T09:23:32.8713906Z cvt.f32.bf16 %r6324, %rs288; 2026-02-21T09:23:32.8714118Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8714187Z shl.b32 %r6637, %r12459, 3; 2026-02-21T09:23:32.8714253Z add.s32 %r4211, %r9488, %r6637; 2026-02-21T09:23:32.8714318Z // begin inline asm 2026-02-21T09:23:32.8714381Z 2026-02-21T09:23:32.8714434Z { 2026-02-21T09:23:32.8714501Z .reg .pred complete; 2026-02-21T09:23:32.8714561Z waitLoop: 2026-02-21T09:23:32.8714716Z mbarrier.try_wait.parity.shared.b64 complete, [%r4211], %r12458; 2026-02-21T09:23:32.8714802Z @!complete bra.uni waitLoop; 2026-02-21T09:23:32.8714858Z } 2026-02-21T09:23:32.8714863Z 2026-02-21T09:23:32.8714930Z // end inline asm 2026-02-21T09:23:32.8715143Z .loc 1 58 33 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:58:33 2026-02-21T09:23:32.8715208Z shl.b32 %r6639, %r12459, 12; 2026-02-21T09:23:32.8715282Z add.s32 %r6641, %r1289, %r6639; 2026-02-21T09:23:32.8715481Z .loc 1 76 58 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:76:58 2026-02-21T09:23:32.8715550Z add.s32 %r6642, %r6641, %r24; 2026-02-21T09:23:32.8715619Z add.s32 %r6643, %r6641, %r249; 2026-02-21T09:23:32.8715691Z add.s32 %r6644, %r6641, %r250; 2026-02-21T09:23:32.8715755Z add.s32 %r6645, %r6641, %r251; 2026-02-21T09:23:32.8715820Z add.s32 %r6646, %r6641, %r252; 2026-02-21T09:23:32.8715889Z add.s32 %r6647, %r6641, %r253; 2026-02-21T09:23:32.8715956Z add.s32 %r6648, 
%r6641, %r254; 2026-02-21T09:23:32.8716020Z add.s32 %r6649, %r6641, %r255; 2026-02-21T09:23:32.8716235Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8716312Z ld.shared.s8 %rs289, [%r6642]; 2026-02-21T09:23:32.8716633Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8716805Z shl.b16 %rs290, %rs289, 4; 2026-02-21T09:23:32.8717076Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8717147Z ld.shared.s8 %rs291, [%r6643+128]; 2026-02-21T09:23:32.8717360Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8717432Z shl.b16 %rs292, %rs291, 4; 2026-02-21T09:23:32.8717630Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8717702Z ld.shared.s8 %rs293, [%r6644+256]; 2026-02-21T09:23:32.8717905Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8717969Z shl.b16 %rs294, %rs293, 4; 2026-02-21T09:23:32.8718232Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8718305Z ld.shared.s8 %rs295, [%r6645+384]; 2026-02-21T09:23:32.8718511Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8718575Z shl.b16 %rs296, %rs295, 4; 2026-02-21T09:23:32.8718834Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8718909Z ld.shared.s8 %rs297, [%r6646+512]; 2026-02-21T09:23:32.8719111Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8719175Z shl.b16 %rs298, %rs297, 4; 2026-02-21T09:23:32.8719380Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8719446Z ld.shared.s8 %rs299, [%r6647+640]; 2026-02-21T09:23:32.8719644Z .loc 1 61 28 // 
c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8719713Z shl.b16 %rs300, %rs299, 4; 2026-02-21T09:23:32.8719911Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8719981Z ld.shared.s8 %rs301, [%r6648+768]; 2026-02-21T09:23:32.8720182Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8720251Z shl.b16 %rs302, %rs301, 4; 2026-02-21T09:23:32.8720449Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8720529Z ld.shared.s8 %rs303, [%r6649+896]; 2026-02-21T09:23:32.8720731Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8720796Z shl.b16 %rs304, %rs303, 4; 2026-02-21T09:23:32.8720991Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8721069Z ld.shared.s8 %rs305, [%r6642+1024]; 2026-02-21T09:23:32.8721267Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8721335Z shl.b16 %rs306, %rs305, 4; 2026-02-21T09:23:32.8721540Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8721610Z ld.shared.s8 %rs307, [%r6643+1152]; 2026-02-21T09:23:32.8721807Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8721876Z shl.b16 %rs308, %rs307, 4; 2026-02-21T09:23:32.8722076Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8722145Z ld.shared.s8 %rs309, [%r6644+1280]; 2026-02-21T09:23:32.8722341Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8722412Z shl.b16 %rs310, %rs309, 4; 2026-02-21T09:23:32.8722610Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8722680Z ld.shared.s8 %rs311, [%r6645+1408]; 2026-02-21T09:23:32.8722999Z .loc 1 
61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8723075Z shl.b16 %rs312, %rs311, 4; 2026-02-21T09:23:32.8723277Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8723355Z ld.shared.s8 %rs313, [%r6646+1536]; 2026-02-21T09:23:32.8723552Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8723617Z shl.b16 %rs314, %rs313, 4; 2026-02-21T09:23:32.8723819Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8723888Z ld.shared.s8 %rs315, [%r6647+1664]; 2026-02-21T09:23:32.8724134Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8724201Z shl.b16 %rs316, %rs315, 4; 2026-02-21T09:23:32.8724404Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8724478Z ld.shared.s8 %rs317, [%r6648+1792]; 2026-02-21T09:23:32.8724677Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8724792Z shl.b16 %rs318, %rs317, 4; 2026-02-21T09:23:32.8724993Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8725074Z ld.shared.s8 %rs319, [%r6649+1920]; 2026-02-21T09:23:32.8725279Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8725344Z shl.b16 %rs320, %rs319, 4; 2026-02-21T09:23:32.8725546Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8725620Z ld.shared.s8 %rs321, [%r6642+2048]; 2026-02-21T09:23:32.8725821Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8725891Z shl.b16 %rs322, %rs321, 4; 2026-02-21T09:23:32.8726089Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8726180Z ld.shared.s8 %rs323, [%r6643+2176]; 
2026-02-21T09:23:32.8726381Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8726446Z shl.b16 %rs324, %rs323, 4; 2026-02-21T09:23:32.8726772Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8726842Z ld.shared.s8 %rs325, [%r6644+2304]; 2026-02-21T09:23:32.8727038Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8727107Z shl.b16 %rs326, %rs325, 4; 2026-02-21T09:23:32.8727308Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8727380Z ld.shared.s8 %rs327, [%r6645+2432]; 2026-02-21T09:23:32.8727589Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8732226Z shl.b16 %rs328, %rs327, 4; 2026-02-21T09:23:32.8732507Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8732587Z ld.shared.s8 %rs329, [%r6646+2560]; 2026-02-21T09:23:32.8732808Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8732881Z shl.b16 %rs330, %rs329, 4; 2026-02-21T09:23:32.8733095Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8733174Z ld.shared.s8 %rs331, [%r6647+2688]; 2026-02-21T09:23:32.8733387Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8733461Z shl.b16 %rs332, %rs331, 4; 2026-02-21T09:23:32.8733897Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8733969Z ld.shared.s8 %rs333, [%r6648+2816]; 2026-02-21T09:23:32.8734173Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8734239Z shl.b16 %rs334, %rs333, 4; 2026-02-21T09:23:32.8734438Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8734510Z ld.shared.s8 %rs335, 
[%r6649+2944]; 2026-02-21T09:23:32.8734705Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8734769Z shl.b16 %rs336, %rs335, 4; 2026-02-21T09:23:32.8735072Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8735142Z ld.shared.s8 %rs337, [%r6642+3072]; 2026-02-21T09:23:32.8735338Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8735406Z shl.b16 %rs338, %rs337, 4; 2026-02-21T09:23:32.8735606Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8735739Z ld.shared.s8 %rs339, [%r6643+3200]; 2026-02-21T09:23:32.8735936Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8736003Z shl.b16 %rs340, %rs339, 4; 2026-02-21T09:23:32.8736199Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8736268Z ld.shared.s8 %rs341, [%r6644+3328]; 2026-02-21T09:23:32.8736642Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8736714Z shl.b16 %rs342, %rs341, 4; 2026-02-21T09:23:32.8736910Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8736990Z ld.shared.s8 %rs343, [%r6645+3456]; 2026-02-21T09:23:32.8737195Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8737261Z shl.b16 %rs344, %rs343, 4; 2026-02-21T09:23:32.8737463Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8737535Z ld.shared.s8 %rs345, [%r6646+3584]; 2026-02-21T09:23:32.8737730Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8737797Z shl.b16 %rs346, %rs345, 4; 2026-02-21T09:23:32.8737994Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8738065Z 
ld.shared.s8 %rs347, [%r6647+3712]; 2026-02-21T09:23:32.8738261Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8738331Z shl.b16 %rs348, %rs347, 4; 2026-02-21T09:23:32.8738525Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8738604Z ld.shared.s8 %rs349, [%r6648+3840]; 2026-02-21T09:23:32.8738807Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8738872Z shl.b16 %rs350, %rs349, 4; 2026-02-21T09:23:32.8739066Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8739137Z ld.shared.s8 %rs351, [%r6649+3968]; 2026-02-21T09:23:32.8739329Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8739393Z shl.b16 %rs352, %rs351, 4; 2026-02-21T09:23:32.8739588Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8739659Z cvt.s16.s8 %rs353, %rs290; 2026-02-21T09:23:32.8739901Z shr.s16 %rs354, %rs353, 4; 2026-02-21T09:23:32.8739965Z cvt.s16.s8 %rs355, %rs292; 2026-02-21T09:23:32.8740031Z shr.s16 %rs356, %rs355, 4; 2026-02-21T09:23:32.8740093Z shr.s16 %rs357, %rs289, 4; 2026-02-21T09:23:32.8740154Z shr.s16 %rs358, %rs291, 4; 2026-02-21T09:23:32.8740362Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8740431Z cvt.rn.f32.s16 %r6650, %rs358; 2026-02-21T09:23:32.8740498Z cvt.rn.f32.s16 %r6651, %rs357; 2026-02-21T09:23:32.8740563Z cvt.rn.f32.s16 %r6652, %rs356; 2026-02-21T09:23:32.8740630Z cvt.rn.f32.s16 %r6653, %rs354; 2026-02-21T09:23:32.8740827Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8740958Z cvt.s16.s8 %rs359, %rs294; 2026-02-21T09:23:32.8741027Z shr.s16 %rs360, %rs359, 4; 2026-02-21T09:23:32.8741089Z cvt.s16.s8 %rs361, %rs296; 2026-02-21T09:23:32.8741151Z shr.s16 %rs362, %rs361, 4; 
2026-02-21T09:23:32.8741220Z shr.s16 %rs363, %rs293, 4; 2026-02-21T09:23:32.8741284Z shr.s16 %rs364, %rs295, 4; 2026-02-21T09:23:32.8741479Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8741612Z cvt.rn.f32.s16 %r6654, %rs364; 2026-02-21T09:23:32.8741685Z cvt.rn.f32.s16 %r6655, %rs363; 2026-02-21T09:23:32.8741752Z cvt.rn.f32.s16 %r6656, %rs362; 2026-02-21T09:23:32.8741820Z cvt.rn.f32.s16 %r6657, %rs360; 2026-02-21T09:23:32.8742026Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8742091Z cvt.s16.s8 %rs365, %rs298; 2026-02-21T09:23:32.8742156Z shr.s16 %rs366, %rs365, 4; 2026-02-21T09:23:32.8742221Z cvt.s16.s8 %rs367, %rs300; 2026-02-21T09:23:32.8742292Z shr.s16 %rs368, %rs367, 4; 2026-02-21T09:23:32.8742356Z shr.s16 %rs369, %rs297, 4; 2026-02-21T09:23:32.8742420Z shr.s16 %rs370, %rs299, 4; 2026-02-21T09:23:32.8742624Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8742698Z cvt.rn.f32.s16 %r6658, %rs370; 2026-02-21T09:23:32.8742764Z cvt.rn.f32.s16 %r6659, %rs369; 2026-02-21T09:23:32.8742830Z cvt.rn.f32.s16 %r6660, %rs368; 2026-02-21T09:23:32.8742901Z cvt.rn.f32.s16 %r6661, %rs366; 2026-02-21T09:23:32.8743098Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8743165Z cvt.s16.s8 %rs371, %rs302; 2026-02-21T09:23:32.8743236Z shr.s16 %rs372, %rs371, 4; 2026-02-21T09:23:32.8743616Z cvt.s16.s8 %rs373, %rs304; 2026-02-21T09:23:32.8743682Z shr.s16 %rs374, %rs373, 4; 2026-02-21T09:23:32.8743750Z shr.s16 %rs375, %rs301, 4; 2026-02-21T09:23:32.8743817Z shr.s16 %rs376, %rs303, 4; 2026-02-21T09:23:32.8744024Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8744094Z cvt.rn.f32.s16 %r6662, %rs376; 2026-02-21T09:23:32.8744172Z cvt.rn.f32.s16 %r6663, %rs375; 2026-02-21T09:23:32.8744242Z cvt.rn.f32.s16 %r6664, %rs374; 
2026-02-21T09:23:32.8744309Z cvt.rn.f32.s16 %r6665, %rs372; 2026-02-21T09:23:32.8744520Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8744587Z cvt.s16.s8 %rs377, %rs306; 2026-02-21T09:23:32.8744653Z shr.s16 %rs378, %rs377, 4; 2026-02-21T09:23:32.8744724Z cvt.s16.s8 %rs379, %rs308; 2026-02-21T09:23:32.8744790Z shr.s16 %rs380, %rs379, 4; 2026-02-21T09:23:32.8744857Z shr.s16 %rs381, %rs305, 4; 2026-02-21T09:23:32.8744921Z shr.s16 %rs382, %rs307, 4; 2026-02-21T09:23:32.8745127Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8745197Z cvt.rn.f32.s16 %r6666, %rs382; 2026-02-21T09:23:32.8745265Z cvt.rn.f32.s16 %r6667, %rs381; 2026-02-21T09:23:32.8745339Z cvt.rn.f32.s16 %r6668, %rs380; 2026-02-21T09:23:32.8745404Z cvt.rn.f32.s16 %r6669, %rs378; 2026-02-21T09:23:32.8745723Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8745795Z cvt.s16.s8 %rs383, %rs310; 2026-02-21T09:23:32.8745862Z shr.s16 %rs384, %rs383, 4; 2026-02-21T09:23:32.8745931Z cvt.s16.s8 %rs385, %rs312; 2026-02-21T09:23:32.8745997Z shr.s16 %rs386, %rs385, 4; 2026-02-21T09:23:32.8746068Z shr.s16 %rs387, %rs309, 4; 2026-02-21T09:23:32.8746136Z shr.s16 %rs388, %rs311, 4; 2026-02-21T09:23:32.8746335Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8746412Z cvt.rn.f32.s16 %r6670, %rs388; 2026-02-21T09:23:32.8746599Z cvt.rn.f32.s16 %r6671, %rs387; 2026-02-21T09:23:32.8746681Z cvt.rn.f32.s16 %r6672, %rs386; 2026-02-21T09:23:32.8746830Z cvt.rn.f32.s16 %r6673, %rs384; 2026-02-21T09:23:32.8747061Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8747134Z cvt.s16.s8 %rs389, %rs314; 2026-02-21T09:23:32.8747205Z shr.s16 %rs390, %rs389, 4; 2026-02-21T09:23:32.8747279Z cvt.s16.s8 %rs391, %rs316; 2026-02-21T09:23:32.8747343Z shr.s16 %rs392, %rs391, 4; 2026-02-21T09:23:32.8747408Z 
shr.s16 %rs393, %rs313, 4; 2026-02-21T09:23:32.8747540Z shr.s16 %rs394, %rs315, 4; 2026-02-21T09:23:32.8747758Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8747830Z cvt.rn.f32.s16 %r6674, %rs394; 2026-02-21T09:23:32.8747897Z cvt.rn.f32.s16 %r6675, %rs393; 2026-02-21T09:23:32.8747969Z cvt.rn.f32.s16 %r6676, %rs392; 2026-02-21T09:23:32.8748038Z cvt.rn.f32.s16 %r6677, %rs390; 2026-02-21T09:23:32.8748318Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8748397Z cvt.s16.s8 %rs395, %rs318; 2026-02-21T09:23:32.8748463Z shr.s16 %rs396, %rs395, 4; 2026-02-21T09:23:32.8748530Z cvt.s16.s8 %rs397, %rs320; 2026-02-21T09:23:32.8748600Z shr.s16 %rs398, %rs397, 4; 2026-02-21T09:23:32.8748672Z shr.s16 %rs399, %rs317, 4; 2026-02-21T09:23:32.8748737Z shr.s16 %rs400, %rs319, 4; 2026-02-21T09:23:32.8748938Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8749013Z cvt.rn.f32.s16 %r6678, %rs400; 2026-02-21T09:23:32.8749080Z cvt.rn.f32.s16 %r6679, %rs399; 2026-02-21T09:23:32.8749148Z cvt.rn.f32.s16 %r6680, %rs398; 2026-02-21T09:23:32.8749225Z cvt.rn.f32.s16 %r6681, %rs396; 2026-02-21T09:23:32.8749424Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8749489Z cvt.s16.s8 %rs401, %rs322; 2026-02-21T09:23:32.8749555Z shr.s16 %rs402, %rs401, 4; 2026-02-21T09:23:32.8749628Z cvt.s16.s8 %rs403, %rs324; 2026-02-21T09:23:32.8749700Z shr.s16 %rs404, %rs403, 4; 2026-02-21T09:23:32.8749764Z shr.s16 %rs405, %rs321, 4; 2026-02-21T09:23:32.8749836Z shr.s16 %rs406, %rs323, 4; 2026-02-21T09:23:32.8750041Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8750110Z cvt.rn.f32.s16 %r6682, %rs406; 2026-02-21T09:23:32.8750182Z cvt.rn.f32.s16 %r6683, %rs405; 2026-02-21T09:23:32.8750252Z cvt.rn.f32.s16 %r6684, %rs404; 2026-02-21T09:23:32.8750320Z cvt.rn.f32.s16 
%r6685, %rs402; 2026-02-21T09:23:32.8750519Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8750590Z cvt.s16.s8 %rs407, %rs326; 2026-02-21T09:23:32.8750656Z shr.s16 %rs408, %rs407, 4; 2026-02-21T09:23:32.8750721Z cvt.s16.s8 %rs409, %rs328; 2026-02-21T09:23:32.8750792Z shr.s16 %rs410, %rs409, 4; 2026-02-21T09:23:32.8750856Z shr.s16 %rs411, %rs325, 4; 2026-02-21T09:23:32.8750923Z shr.s16 %rs412, %rs327, 4; 2026-02-21T09:23:32.8751123Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8751196Z cvt.rn.f32.s16 %r6686, %rs412; 2026-02-21T09:23:32.8751433Z cvt.rn.f32.s16 %r6687, %rs411; 2026-02-21T09:23:32.8751500Z cvt.rn.f32.s16 %r6688, %rs410; 2026-02-21T09:23:32.8751573Z cvt.rn.f32.s16 %r6689, %rs408; 2026-02-21T09:23:32.8751772Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8751837Z cvt.s16.s8 %rs413, %rs330; 2026-02-21T09:23:32.8751908Z shr.s16 %rs414, %rs413, 4; 2026-02-21T09:23:32.8751974Z cvt.s16.s8 %rs415, %rs332; 2026-02-21T09:23:32.8752040Z shr.s16 %rs416, %rs415, 4; 2026-02-21T09:23:32.8752106Z shr.s16 %rs417, %rs329, 4; 2026-02-21T09:23:32.8752175Z shr.s16 %rs418, %rs331, 4; 2026-02-21T09:23:32.8752374Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8752492Z cvt.rn.f32.s16 %r6690, %rs418; 2026-02-21T09:23:32.8752567Z cvt.rn.f32.s16 %r6691, %rs417; 2026-02-21T09:23:32.8752643Z cvt.rn.f32.s16 %r6692, %rs416; 2026-02-21T09:23:32.8752717Z cvt.rn.f32.s16 %r6693, %rs414; 2026-02-21T09:23:32.8752927Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8752993Z cvt.s16.s8 %rs419, %rs334; 2026-02-21T09:23:32.8753060Z shr.s16 %rs420, %rs419, 4; 2026-02-21T09:23:32.8753172Z cvt.s16.s8 %rs421, %rs336; 2026-02-21T09:23:32.8753247Z shr.s16 %rs422, %rs421, 4; 2026-02-21T09:23:32.8753312Z shr.s16 %rs423, %rs333, 4; 
2026-02-21T09:23:32.8753376Z shr.s16 %rs424, %rs335, 4; 2026-02-21T09:23:32.8753581Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8753650Z cvt.rn.f32.s16 %r6694, %rs424; 2026-02-21T09:23:32.8753715Z cvt.rn.f32.s16 %r6695, %rs423; 2026-02-21T09:23:32.8753784Z cvt.rn.f32.s16 %r6696, %rs422; 2026-02-21T09:23:32.8753858Z cvt.rn.f32.s16 %r6697, %rs420; 2026-02-21T09:23:32.8754056Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8754127Z cvt.s16.s8 %rs425, %rs338; 2026-02-21T09:23:32.8754202Z shr.s16 %rs426, %rs425, 4; 2026-02-21T09:23:32.8754268Z cvt.s16.s8 %rs427, %rs340; 2026-02-21T09:23:32.8754335Z shr.s16 %rs428, %rs427, 4; 2026-02-21T09:23:32.8754406Z shr.s16 %rs429, %rs337, 4; 2026-02-21T09:23:32.8754472Z shr.s16 %rs430, %rs339, 4; 2026-02-21T09:23:32.8754672Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8754741Z cvt.rn.f32.s16 %r6698, %rs430; 2026-02-21T09:23:32.8754817Z cvt.rn.f32.s16 %r6699, %rs429; 2026-02-21T09:23:32.8754884Z cvt.rn.f32.s16 %r6700, %rs428; 2026-02-21T09:23:32.8754951Z cvt.rn.f32.s16 %r6701, %rs426; 2026-02-21T09:23:32.8755158Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8755225Z cvt.s16.s8 %rs431, %rs342; 2026-02-21T09:23:32.8755289Z shr.s16 %rs432, %rs431, 4; 2026-02-21T09:23:32.8755359Z cvt.s16.s8 %rs433, %rs344; 2026-02-21T09:23:32.8755433Z shr.s16 %rs434, %rs433, 4; 2026-02-21T09:23:32.8755501Z shr.s16 %rs435, %rs341, 4; 2026-02-21T09:23:32.8755569Z shr.s16 %rs436, %rs343, 4; 2026-02-21T09:23:32.8755772Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8755840Z cvt.rn.f32.s16 %r6702, %rs436; 2026-02-21T09:23:32.8755909Z cvt.rn.f32.s16 %r6703, %rs435; 2026-02-21T09:23:32.8755980Z cvt.rn.f32.s16 %r6704, %rs434; 2026-02-21T09:23:32.8756060Z cvt.rn.f32.s16 %r6705, %rs432; 
2026-02-21T09:23:32.8756276Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8756350Z cvt.s16.s8 %rs437, %rs346; 2026-02-21T09:23:32.8756421Z shr.s16 %rs438, %rs437, 4; 2026-02-21T09:23:32.8756619Z cvt.s16.s8 %rs439, %rs348; 2026-02-21T09:23:32.8756690Z shr.s16 %rs440, %rs439, 4; 2026-02-21T09:23:32.8756761Z shr.s16 %rs441, %rs345, 4; 2026-02-21T09:23:32.8756921Z shr.s16 %rs442, %rs347, 4; 2026-02-21T09:23:32.8757223Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8757300Z cvt.rn.f32.s16 %r6706, %rs442; 2026-02-21T09:23:32.8757368Z cvt.rn.f32.s16 %r6707, %rs441; 2026-02-21T09:23:32.8757436Z cvt.rn.f32.s16 %r6708, %rs440; 2026-02-21T09:23:32.8757503Z cvt.rn.f32.s16 %r6709, %rs438; 2026-02-21T09:23:32.8757713Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8757783Z cvt.s16.s8 %rs443, %rs350; 2026-02-21T09:23:32.8757851Z shr.s16 %rs444, %rs443, 4; 2026-02-21T09:23:32.8757922Z cvt.s16.s8 %rs445, %rs352; 2026-02-21T09:23:32.8757992Z shr.s16 %rs446, %rs445, 4; 2026-02-21T09:23:32.8758057Z shr.s16 %rs447, %rs349, 4; 2026-02-21T09:23:32.8758187Z shr.s16 %rs448, %rs351, 4; 2026-02-21T09:23:32.8758400Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8758474Z cvt.rn.f32.s16 %r6710, %rs448; 2026-02-21T09:23:32.8758541Z cvt.rn.f32.s16 %r6711, %rs447; 2026-02-21T09:23:32.8758613Z cvt.rn.f32.s16 %r6712, %rs446; 2026-02-21T09:23:32.8758680Z cvt.rn.f32.s16 %r6713, %rs444; 2026-02-21T09:23:32.8758866Z st.shared.v4.b32 [%r66], {%r6653, %r6651, %r6652, %r6650}; 2026-02-21T09:23:32.8759002Z st.shared.v4.b32 [%r66+16384], {%r6685, %r6683, %r6684, %r6682}; 2026-02-21T09:23:32.8759115Z st.shared.v4.b32 [%r67], {%r6657, %r6655, %r6656, %r6654}; 2026-02-21T09:23:32.8759237Z st.shared.v4.b32 [%r67+16384], {%r6689, %r6687, %r6688, %r6686}; 2026-02-21T09:23:32.8759347Z st.shared.v4.b32 [%r68], 
{%r6661, %r6659, %r6660, %r6658}; 2026-02-21T09:23:32.8759475Z st.shared.v4.b32 [%r68+16384], {%r6693, %r6691, %r6692, %r6690}; 2026-02-21T09:23:32.8759582Z st.shared.v4.b32 [%r69], {%r6665, %r6663, %r6664, %r6662}; 2026-02-21T09:23:32.8759701Z st.shared.v4.b32 [%r69+16384], {%r6697, %r6695, %r6696, %r6694}; 2026-02-21T09:23:32.8759813Z st.shared.v4.b32 [%r70], {%r6669, %r6667, %r6668, %r6666}; 2026-02-21T09:23:32.8759937Z st.shared.v4.b32 [%r70+16384], {%r6701, %r6699, %r6700, %r6698}; 2026-02-21T09:23:32.8760058Z st.shared.v4.b32 [%r71], {%r6673, %r6671, %r6672, %r6670}; 2026-02-21T09:23:32.8760189Z st.shared.v4.b32 [%r71+16384], {%r6705, %r6703, %r6704, %r6702}; 2026-02-21T09:23:32.8760297Z st.shared.v4.b32 [%r72], {%r6677, %r6675, %r6676, %r6674}; 2026-02-21T09:23:32.8760416Z st.shared.v4.b32 [%r72+16384], {%r6709, %r6707, %r6708, %r6706}; 2026-02-21T09:23:32.8760527Z st.shared.v4.b32 [%r73], {%r6681, %r6679, %r6680, %r6678}; 2026-02-21T09:23:32.8760645Z st.shared.v4.b32 [%r73+16384], {%r6713, %r6711, %r6712, %r6710}; 2026-02-21T09:23:32.8760707Z $L__tmp3: 2026-02-21T09:23:32.8760990Z .loc 2 291 36 // standard.py:291:36 @[ c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:88:40 ] 2026-02-21T09:23:32.8761065Z // begin inline asm 2026-02-21T09:23:32.8761158Z fence.proxy.async.shared::cta; 2026-02-21T09:23:32.8761224Z // end inline asm 2026-02-21T09:23:32.8761290Z bar.sync 0; 2026-02-21T09:23:32.8761372Z wgmma.fence.sync.aligned; 2026-02-21T09:23:32.8761440Z mov.pred %p87, -1; 2026-02-21T09:23:32.8761506Z // begin inline asm 2026-02-21T09:23:32.8762996Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12461,%r12462,%r12463,%r12464,%r12465,%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483,%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524}, {%r4341,%r4342,%r4343,%r4344}, %rd702, %p87, 1, 1; 2026-02-21T09:23:32.8763061Z // end inline asm 2026-02-21T09:23:32.8763130Z // begin inline asm 2026-02-21T09:23:32.8764596Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12461,%r12462,%r12463,%r12464,%r12465,%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483,%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524}, {%r4473,%r4474,%r4475,%r4476}, %rd703, %p87, 1, 1; 2026-02-21T09:23:32.8764798Z // end inline asm 2026-02-21T09:23:32.8764868Z // begin inline asm 2026-02-21T09:23:32.8766410Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12461,%r12462,%r12463,%r12464,%r12465,%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483,%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524}, {%r4605,%r4606,%r4607,%r4608}, %rd704, %p87, 1, 1; 2026-02-21T09:23:32.8766607Z // end inline asm 2026-02-21T09:23:32.8766677Z // begin inline asm 2026-02-21T09:23:32.8768165Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12461,%r12462,%r12463,%r12464,%r12465,%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483,%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524}, {%r4737,%r4738,%r4739,%r4740}, %rd705, %p87, 1, 1; 2026-02-21T09:23:32.8768239Z // end inline asm 2026-02-21T09:23:32.8768302Z // begin inline asm 2026-02-21T09:23:32.8769806Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12461,%r12462,%r12463,%r12464,%r12465,%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483,%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524}, {%r4869,%r4870,%r4871,%r4872}, %rd706, %p87, 1, 1; 2026-02-21T09:23:32.8769878Z // end inline asm 2026-02-21T09:23:32.8769940Z // begin inline asm 2026-02-21T09:23:32.8771434Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12461,%r12462,%r12463,%r12464,%r12465,%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483,%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524}, {%r5001,%r5002,%r5003,%r5004}, %rd707, %p87, 1, 1; 2026-02-21T09:23:32.8771499Z // end inline asm 2026-02-21T09:23:32.8771562Z // begin inline asm 2026-02-21T09:23:32.8773049Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12461,%r12462,%r12463,%r12464,%r12465,%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483,%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524}, {%r5133,%r5134,%r5135,%r5136}, %rd708, %p87, 1, 1; 2026-02-21T09:23:32.8773269Z // end inline asm 2026-02-21T09:23:32.8773344Z // begin inline asm 2026-02-21T09:23:32.8774882Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12461,%r12462,%r12463,%r12464,%r12465,%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483,%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524}, {%r5265,%r5266,%r5267,%r5268}, %rd709, %p87, 1, 1; 2026-02-21T09:23:32.8774950Z // end inline asm 2026-02-21T09:23:32.8775025Z // begin inline asm 2026-02-21T09:23:32.8776700Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548,%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588}, {%r5397,%r5398,%r5399,%r5400}, %rd702, %p87, 1, 1; 2026-02-21T09:23:32.8776776Z // end inline asm 2026-02-21T09:23:32.8776842Z // begin inline asm 2026-02-21T09:23:32.8778322Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548,%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588}, {%r5529,%r5530,%r5531,%r5532}, %rd703, %p87, 1, 1; 2026-02-21T09:23:32.8778404Z // end inline asm 2026-02-21T09:23:32.8778470Z // begin inline asm 2026-02-21T09:23:32.8779965Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548,%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588}, {%r5661,%r5662,%r5663,%r5664}, %rd704, %p87, 1, 1; 2026-02-21T09:23:32.8780031Z // end inline asm 2026-02-21T09:23:32.8780094Z // begin inline asm 2026-02-21T09:23:32.8781576Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548,%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588}, {%r5793,%r5794,%r5795,%r5796}, %rd705, %p87, 1, 1; 2026-02-21T09:23:32.8781639Z // end inline asm 2026-02-21T09:23:32.8781700Z // begin inline asm 2026-02-21T09:23:32.8783307Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548,%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588}, {%r5925,%r5926,%r5927,%r5928}, %rd706, %p87, 1, 1; 2026-02-21T09:23:32.8783368Z // end inline asm 2026-02-21T09:23:32.8783439Z // begin inline asm 2026-02-21T09:23:32.8785028Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548,%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588}, {%r6057,%r6058,%r6059,%r6060}, %rd707, %p87, 1, 1; 2026-02-21T09:23:32.8785097Z // end inline asm 2026-02-21T09:23:32.8785167Z // begin inline asm 2026-02-21T09:23:32.8786756Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548,%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588}, {%r6189,%r6190,%r6191,%r6192}, %rd708, %p87, 1, 1; 2026-02-21T09:23:32.8786840Z // end inline asm 2026-02-21T09:23:32.8786903Z // begin inline asm 2026-02-21T09:23:32.8788449Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548,%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588}, {%r6321,%r6322,%r6323,%r6324}, %rd709, %p87, 1, 1; 2026-02-21T09:23:32.8788520Z // end inline asm 2026-02-21T09:23:32.8788607Z wgmma.commit_group.sync.aligned; 2026-02-21T09:23:32.8788675Z mov.b32 %r6454, %r3993; 2026-02-21T09:23:32.8788744Z mov.b32 %r6455, %r3993; 2026-02-21T09:23:32.8788811Z mov.b32 %r6453, %r1221; 2026-02-21T09:23:32.8788878Z // begin inline asm 2026-02-21T09:23:32.8791402Z // wait for regs: 
%r12461,%r12462,%r12463,%r12464,%r12465,%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483,%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548,%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588,%r6453,%r6454,%r6455 2026-02-21T09:23:32.8791673Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:23:32.8791737Z // end inline asm 2026-02-21T09:23:32.8791804Z $L__tmp4: 2026-02-21T09:23:32.8792027Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8792096Z add.s32 %r6714, %r12460, 1; 2026-02-21T09:23:32.8792177Z setp.gt.s32 %p109, %r6714, 1; 2026-02-21T09:23:32.8792254Z selp.b32 %r12460, 0, %r6714, %p109; 2026-02-21T09:23:32.8792458Z .loc 1 52 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:32 2026-02-21T09:23:32.8792601Z add.s64 %rd420, %rd738, %rd48; 2026-02-21T09:23:32.8792674Z add.s64 %rd421, %rd738, %rd47; 2026-02-21T09:23:32.8792741Z add.s64 %rd422, %rd738, %rd46; 2026-02-21T09:23:32.8792813Z add.s64 %rd423, %rd738, %rd45; 2026-02-21T09:23:32.8792891Z add.s64 %rd424, %rd738, %rd44; 2026-02-21T09:23:32.8792959Z add.s64 %rd425, %rd738, %rd43; 2026-02-21T09:23:32.8793026Z add.s64 %rd426, 
%rd738, %rd42; 2026-02-21T09:23:32.8793110Z add.s64 %rd427, %rd738, %rd41; 2026-02-21T09:23:32.8793241Z add.s64 %rd428, %rd738, %rd40; 2026-02-21T09:23:32.8793310Z add.s64 %rd429, %rd738, %rd39; 2026-02-21T09:23:32.8793373Z add.s64 %rd430, %rd738, %rd38; 2026-02-21T09:23:32.8793443Z add.s64 %rd431, %rd738, %rd37; 2026-02-21T09:23:32.8793508Z add.s64 %rd432, %rd738, %rd36; 2026-02-21T09:23:32.8793572Z add.s64 %rd433, %rd738, %rd35; 2026-02-21T09:23:32.8793643Z add.s64 %rd434, %rd738, %rd34; 2026-02-21T09:23:32.8793848Z .loc 1 52 80 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:80 2026-02-21T09:23:32.8793917Z add.s64 %rd435, %rd738, %rd33; 2026-02-21T09:23:32.8793991Z shl.b32 %r6715, %r12460, 14; 2026-02-21T09:23:32.8794059Z add.s32 %r6716, %r1205, %r6715; 2026-02-21T09:23:32.8794128Z add.s32 %r6587, %r6716, %r25; 2026-02-21T09:23:32.8794208Z selp.b32 %r6588, 8, 0, %p107; 2026-02-21T09:23:32.8794280Z // begin inline asm 2026-02-21T09:23:32.8794432Z cp.async.ca.shared.global [ %r6587 + 0 ], [ %rd420 + 0 ], 0x8, %r6588; 2026-02-21T09:23:32.8794495Z // end inline asm 2026-02-21T09:23:32.8794566Z add.s32 %r6589, %r6587, 1024; 2026-02-21T09:23:32.8794629Z // begin inline asm 2026-02-21T09:23:32.8794771Z cp.async.ca.shared.global [ %r6589 + 0 ], [ %rd421 + 0 ], 0x8, %r6588; 2026-02-21T09:23:32.8794830Z // end inline asm 2026-02-21T09:23:32.8794900Z add.s32 %r6591, %r6587, 2048; 2026-02-21T09:23:32.8794961Z // begin inline asm 2026-02-21T09:23:32.8795099Z cp.async.ca.shared.global [ %r6591 + 0 ], [ %rd422 + 0 ], 0x8, %r6588; 2026-02-21T09:23:32.8795177Z // end inline asm 2026-02-21T09:23:32.8795244Z add.s32 %r6593, %r6587, 3072; 2026-02-21T09:23:32.8795308Z // begin inline asm 2026-02-21T09:23:32.8795453Z cp.async.ca.shared.global [ %r6593 + 0 ], [ %rd423 + 0 ], 0x8, %r6588; 2026-02-21T09:23:32.8795518Z // end inline asm 2026-02-21T09:23:32.8795581Z add.s32 %r6595, %r6587, 4096; 2026-02-21T09:23:32.8795643Z // begin inline asm 2026-02-21T09:23:32.8795788Z 
cp.async.ca.shared.global [ %r6595 + 0 ], [ %rd424 + 0 ], 0x8, %r6588; 2026-02-21T09:23:32.8795850Z // end inline asm 2026-02-21T09:23:32.8795914Z add.s32 %r6597, %r6587, 5120; 2026-02-21T09:23:32.8795982Z // begin inline asm 2026-02-21T09:23:32.8796118Z cp.async.ca.shared.global [ %r6597 + 0 ], [ %rd425 + 0 ], 0x8, %r6588; 2026-02-21T09:23:32.8796177Z // end inline asm 2026-02-21T09:23:32.8796244Z add.s32 %r6599, %r6587, 6144; 2026-02-21T09:23:32.8796311Z // begin inline asm 2026-02-21T09:23:32.8796576Z cp.async.ca.shared.global [ %r6599 + 0 ], [ %rd426 + 0 ], 0x8, %r6588; 2026-02-21T09:23:32.8796644Z // end inline asm 2026-02-21T09:23:32.8796715Z add.s32 %r6601, %r6587, 7168; 2026-02-21T09:23:32.8796777Z // begin inline asm 2026-02-21T09:23:32.8796920Z cp.async.ca.shared.global [ %r6601 + 0 ], [ %rd427 + 0 ], 0x8, %r6588; 2026-02-21T09:23:32.8797153Z // end inline asm 2026-02-21T09:23:32.8797217Z add.s32 %r6603, %r6587, 8192; 2026-02-21T09:23:32.8797281Z // begin inline asm 2026-02-21T09:23:32.8797423Z cp.async.ca.shared.global [ %r6603 + 0 ], [ %rd428 + 0 ], 0x8, %r6588; 2026-02-21T09:23:32.8797487Z // end inline asm 2026-02-21T09:23:32.8797554Z add.s32 %r6605, %r6587, 9216; 2026-02-21T09:23:32.8797621Z // begin inline asm 2026-02-21T09:23:32.8797758Z cp.async.ca.shared.global [ %r6605 + 0 ], [ %rd429 + 0 ], 0x8, %r6588; 2026-02-21T09:23:32.8797817Z // end inline asm 2026-02-21T09:23:32.8797887Z add.s32 %r6607, %r6587, 10240; 2026-02-21T09:23:32.8797949Z // begin inline asm 2026-02-21T09:23:32.8798087Z cp.async.ca.shared.global [ %r6607 + 0 ], [ %rd430 + 0 ], 0x8, %r6588; 2026-02-21T09:23:32.8798224Z // end inline asm 2026-02-21T09:23:32.8798299Z add.s32 %r6609, %r6587, 11264; 2026-02-21T09:23:32.8798361Z // begin inline asm 2026-02-21T09:23:32.8798501Z cp.async.ca.shared.global [ %r6609 + 0 ], [ %rd431 + 0 ], 0x8, %r6588; 2026-02-21T09:23:32.8798571Z // end inline asm 2026-02-21T09:23:32.8798637Z add.s32 %r6611, %r6587, 12288; 2026-02-21T09:23:32.8798701Z // 
begin inline asm 2026-02-21T09:23:32.8798909Z cp.async.ca.shared.global [ %r6611 + 0 ], [ %rd432 + 0 ], 0x8, %r6588; 2026-02-21T09:23:32.8798977Z // end inline asm 2026-02-21T09:23:32.8799041Z add.s32 %r6613, %r6587, 13312; 2026-02-21T09:23:32.8799104Z // begin inline asm 2026-02-21T09:23:32.8799251Z cp.async.ca.shared.global [ %r6613 + 0 ], [ %rd433 + 0 ], 0x8, %r6588; 2026-02-21T09:23:32.8799317Z // end inline asm 2026-02-21T09:23:32.8799385Z add.s32 %r6615, %r6587, 14336; 2026-02-21T09:23:32.8799446Z // begin inline asm 2026-02-21T09:23:32.8799591Z cp.async.ca.shared.global [ %r6615 + 0 ], [ %rd434 + 0 ], 0x8, %r6588; 2026-02-21T09:23:32.8799649Z // end inline asm 2026-02-21T09:23:32.8799714Z add.s32 %r6617, %r6587, 15360; 2026-02-21T09:23:32.8799787Z // begin inline asm 2026-02-21T09:23:32.8799926Z cp.async.ca.shared.global [ %r6617 + 0 ], [ %rd435 + 0 ], 0x8, %r6588; 2026-02-21T09:23:32.8799984Z // end inline asm 2026-02-21T09:23:32.8800059Z cp.async.commit_group; 2026-02-21T09:23:32.8800268Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8800336Z shl.b32 %r6717, %r12460, 3; 2026-02-21T09:23:32.8800401Z add.s32 %r6619, %r9488, %r6717; 2026-02-21T09:23:32.8800479Z and.pred %p103, %p2, %p107; 2026-02-21T09:23:32.8800540Z // begin inline asm 2026-02-21T09:23:32.8800682Z @%p103 mbarrier.arrive.expect_tx.shared.b64 _, [%r6619], 4096; 2026-02-21T09:23:32.8800744Z // end inline asm 2026-02-21T09:23:32.8800948Z .loc 1 58 33 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:58:33 2026-02-21T09:23:32.8801015Z shl.b32 %r6718, %r12460, 12; 2026-02-21T09:23:32.8801086Z add.s32 %r6620, %r1289, %r6718; 2026-02-21T09:23:32.8801144Z bar.sync 0; 2026-02-21T09:23:32.8801216Z elect.sync %r6719|%p110, -1; 2026-02-21T09:23:32.8801290Z and.pred %p111, %p107, %p110; 2026-02-21T09:23:32.8801364Z and.pred %p104, %p1, %p111; 2026-02-21T09:23:32.8801431Z cvt.u32.u64 %r6720, %rd739; 2026-02-21T09:23:32.8801493Z add.s32 %r6622, 
%r6720, 64; 2026-02-21T09:23:32.8801560Z // begin inline asm 2026-02-21T09:23:32.8801897Z @%p104 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r6620], [%rd603, {%r3992, %r6622}], [%r6619]; 2026-02-21T09:23:32.8801958Z // end inline asm 2026-02-21T09:23:32.8802166Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8802236Z add.s64 %rd738, %rd738, 128; 2026-02-21T09:23:32.8802310Z setp.lt.u64 %p112, %rd739, 480; 2026-02-21T09:23:32.8802376Z add.s64 %rd739, %rd739, 32; 2026-02-21T09:23:32.8802445Z @%p112 bra $L__BB0_5; 2026-02-21T09:23:32.8802562Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:23:32.8802634Z cp.async.wait_group 0; 2026-02-21T09:23:32.8802831Z bar.sync 0; 2026-02-21T09:23:32.8802894Z // begin inline asm 2026-02-21T09:23:32.8802994Z @%p2 mbarrier.inval.shared::cta.b64 [%r9488]; 2026-02-21T09:23:32.8803054Z // end inline asm 2026-02-21T09:23:32.8803119Z bar.sync 0; 2026-02-21T09:23:32.8803182Z // begin inline asm 2026-02-21T09:23:32.8803273Z @%p2 mbarrier.inval.shared::cta.b64 [%r9489]; 2026-02-21T09:23:32.8803337Z // end inline asm 2026-02-21T09:23:32.8803546Z .loc 1 91 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:91:28 2026-02-21T09:23:32.8803633Z cvt.rn.bf16x2.f32 %r6807, %r12462, %r12461; 2026-02-21T09:23:32.8803725Z cvt.rn.bf16x2.f32 %r6808, %r12464, %r12463; 2026-02-21T09:23:32.8803807Z cvt.rn.bf16x2.f32 %r6809, %r12466, %r12465; 2026-02-21T09:23:32.8803943Z cvt.rn.bf16x2.f32 %r6810, %r12468, %r12467; 2026-02-21T09:23:32.8804025Z cvt.rn.bf16x2.f32 %r6811, %r12470, %r12469; 2026-02-21T09:23:32.8804110Z cvt.rn.bf16x2.f32 %r6812, %r12472, %r12471; 2026-02-21T09:23:32.8804193Z cvt.rn.bf16x2.f32 %r6813, %r12474, %r12473; 2026-02-21T09:23:32.8804272Z cvt.rn.bf16x2.f32 %r6814, %r12476, %r12475; 2026-02-21T09:23:32.8804357Z cvt.rn.bf16x2.f32 %r6815, %r12478, %r12477; 2026-02-21T09:23:32.8804484Z cvt.rn.bf16x2.f32 %r6816, %r12480, %r12479; 
2026-02-21T09:23:32.8804577Z cvt.rn.bf16x2.f32 %r6817, %r12482, %r12481; 2026-02-21T09:23:32.8804663Z cvt.rn.bf16x2.f32 %r6818, %r12484, %r12483; 2026-02-21T09:23:32.8804743Z cvt.rn.bf16x2.f32 %r6819, %r12486, %r12485; 2026-02-21T09:23:32.8804821Z cvt.rn.bf16x2.f32 %r6820, %r12488, %r12487; 2026-02-21T09:23:32.8804901Z cvt.rn.bf16x2.f32 %r6821, %r12490, %r12489; 2026-02-21T09:23:32.8804982Z cvt.rn.bf16x2.f32 %r6822, %r12492, %r12491; 2026-02-21T09:23:32.8805061Z cvt.rn.bf16x2.f32 %r6823, %r12494, %r12493; 2026-02-21T09:23:32.8805141Z cvt.rn.bf16x2.f32 %r6824, %r12496, %r12495; 2026-02-21T09:23:32.8805226Z cvt.rn.bf16x2.f32 %r6825, %r12498, %r12497; 2026-02-21T09:23:32.8805307Z cvt.rn.bf16x2.f32 %r6826, %r12500, %r12499; 2026-02-21T09:23:32.8805388Z cvt.rn.bf16x2.f32 %r6827, %r12502, %r12501; 2026-02-21T09:23:32.8805472Z cvt.rn.bf16x2.f32 %r6828, %r12504, %r12503; 2026-02-21T09:23:32.8805551Z cvt.rn.bf16x2.f32 %r6829, %r12506, %r12505; 2026-02-21T09:23:32.8805632Z cvt.rn.bf16x2.f32 %r6830, %r12508, %r12507; 2026-02-21T09:23:32.8805712Z cvt.rn.bf16x2.f32 %r6831, %r12510, %r12509; 2026-02-21T09:23:32.8805794Z cvt.rn.bf16x2.f32 %r6832, %r12512, %r12511; 2026-02-21T09:23:32.8805872Z cvt.rn.bf16x2.f32 %r6833, %r12514, %r12513; 2026-02-21T09:23:32.8805948Z cvt.rn.bf16x2.f32 %r6834, %r12516, %r12515; 2026-02-21T09:23:32.8806032Z cvt.rn.bf16x2.f32 %r6835, %r12518, %r12517; 2026-02-21T09:23:32.8806111Z cvt.rn.bf16x2.f32 %r6836, %r12520, %r12519; 2026-02-21T09:23:32.8806192Z cvt.rn.bf16x2.f32 %r6837, %r12522, %r12521; 2026-02-21T09:23:32.8806270Z cvt.rn.bf16x2.f32 %r6838, %r12524, %r12523; 2026-02-21T09:23:32.8806366Z cvt.rn.bf16x2.f32 %r6839, %r12526, %r12525; 2026-02-21T09:23:32.8806563Z cvt.rn.bf16x2.f32 %r6840, %r12528, %r12527; 2026-02-21T09:23:32.8806647Z cvt.rn.bf16x2.f32 %r6841, %r12530, %r12529; 2026-02-21T09:23:32.8806729Z cvt.rn.bf16x2.f32 %r6842, %r12532, %r12531; 2026-02-21T09:23:32.8806808Z cvt.rn.bf16x2.f32 %r6843, %r12534, %r12533; 2026-02-21T09:23:32.8806892Z 
cvt.rn.bf16x2.f32 %r6844, %r12536, %r12535; 2026-02-21T09:23:32.8806973Z cvt.rn.bf16x2.f32 %r6845, %r12538, %r12537; 2026-02-21T09:23:32.8807062Z cvt.rn.bf16x2.f32 %r6846, %r12540, %r12539; 2026-02-21T09:23:32.8807142Z cvt.rn.bf16x2.f32 %r6847, %r12542, %r12541; 2026-02-21T09:23:32.8807222Z cvt.rn.bf16x2.f32 %r6848, %r12544, %r12543; 2026-02-21T09:23:32.8807308Z cvt.rn.bf16x2.f32 %r6849, %r12546, %r12545; 2026-02-21T09:23:32.8807388Z cvt.rn.bf16x2.f32 %r6850, %r12548, %r12547; 2026-02-21T09:23:32.8807466Z cvt.rn.bf16x2.f32 %r6851, %r12550, %r12549; 2026-02-21T09:23:32.8807548Z cvt.rn.bf16x2.f32 %r6852, %r12552, %r12551; 2026-02-21T09:23:32.8807627Z cvt.rn.bf16x2.f32 %r6853, %r12554, %r12553; 2026-02-21T09:23:32.8807850Z cvt.rn.bf16x2.f32 %r6854, %r12556, %r12555; 2026-02-21T09:23:32.8807932Z cvt.rn.bf16x2.f32 %r6855, %r12558, %r12557; 2026-02-21T09:23:32.8808010Z cvt.rn.bf16x2.f32 %r6856, %r12560, %r12559; 2026-02-21T09:23:32.8808091Z cvt.rn.bf16x2.f32 %r6857, %r12562, %r12561; 2026-02-21T09:23:32.8808171Z cvt.rn.bf16x2.f32 %r6858, %r12564, %r12563; 2026-02-21T09:23:32.8808255Z cvt.rn.bf16x2.f32 %r6859, %r12566, %r12565; 2026-02-21T09:23:32.8808335Z cvt.rn.bf16x2.f32 %r6860, %r12568, %r12567; 2026-02-21T09:23:32.8808412Z cvt.rn.bf16x2.f32 %r6861, %r12570, %r12569; 2026-02-21T09:23:32.8808493Z cvt.rn.bf16x2.f32 %r6862, %r12572, %r12571; 2026-02-21T09:23:32.8808571Z cvt.rn.bf16x2.f32 %r6863, %r12574, %r12573; 2026-02-21T09:23:32.8808713Z cvt.rn.bf16x2.f32 %r6864, %r12576, %r12575; 2026-02-21T09:23:32.8808799Z cvt.rn.bf16x2.f32 %r6865, %r12578, %r12577; 2026-02-21T09:23:32.8808878Z cvt.rn.bf16x2.f32 %r6866, %r12580, %r12579; 2026-02-21T09:23:32.8808956Z cvt.rn.bf16x2.f32 %r6867, %r12582, %r12581; 2026-02-21T09:23:32.8809038Z cvt.rn.bf16x2.f32 %r6868, %r12584, %r12583; 2026-02-21T09:23:32.8809120Z cvt.rn.bf16x2.f32 %r6869, %r12586, %r12585; 2026-02-21T09:23:32.8809198Z cvt.rn.bf16x2.f32 %r6870, %r12588, %r12587; 2026-02-21T09:23:32.8809498Z .loc 1 92 43 // 
c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:92:43 2026-02-21T09:23:32.8809704Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r74], {%r6807, %r6808, %r6809, %r6810}; 2026-02-21T09:23:32.8809892Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r75], {%r6823, %r6824, %r6825, %r6826}; 2026-02-21T09:23:32.8810084Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r76], {%r6839, %r6840, %r6841, %r6842}; 2026-02-21T09:23:32.8810269Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r77], {%r6855, %r6856, %r6857, %r6858}; 2026-02-21T09:23:32.8810449Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r78], {%r6811, %r6812, %r6813, %r6814}; 2026-02-21T09:23:32.8810629Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r79], {%r6827, %r6828, %r6829, %r6830}; 2026-02-21T09:23:32.8810814Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r80], {%r6843, %r6844, %r6845, %r6846}; 2026-02-21T09:23:32.8810990Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r81], {%r6859, %r6860, %r6861, %r6862}; 2026-02-21T09:23:32.8811170Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r82], {%r6815, %r6816, %r6817, %r6818}; 2026-02-21T09:23:32.8811351Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r83], {%r6831, %r6832, %r6833, %r6834}; 2026-02-21T09:23:32.8811532Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r84], {%r6847, %r6848, %r6849, %r6850}; 2026-02-21T09:23:32.8811713Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r85], {%r6863, %r6864, %r6865, %r6866}; 2026-02-21T09:23:32.8811890Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r86], {%r6819, %r6820, %r6821, %r6822}; 2026-02-21T09:23:32.8812074Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r87], {%r6835, %r6836, %r6837, %r6838}; 2026-02-21T09:23:32.8812254Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r88], {%r6851, %r6852, %r6853, %r6854}; 2026-02-21T09:23:32.8812433Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r89], {%r6867, %r6868, %r6869, %r6870}; 2026-02-21T09:23:32.8812501Z // begin inline asm 2026-02-21T09:23:32.8812584Z fence.proxy.async.shared::cta; 2026-02-21T09:23:32.8812648Z // end 
inline asm 2026-02-21T09:23:32.8812711Z bar.sync 0; 2026-02-21T09:23:32.8812785Z elect.sync %r6871|%p124, -1; 2026-02-21T09:23:32.8812855Z and.pred %p115, %p82, %p124; 2026-02-21T09:23:32.8812921Z or.b32 %r6723, %r386, %r3992; 2026-02-21T09:23:32.8812989Z // begin inline asm 2026-02-21T09:23:32.8813228Z @%p115 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd586, {%r6723, %r6724}], [%r9492]; 2026-02-21T09:23:32.8813289Z // end inline asm 2026-02-21T09:23:32.8813369Z cp.async.bulk.commit_group; 2026-02-21T09:23:32.8813451Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:23:32.8813510Z bar.sync 0; 2026-02-21T09:23:32.8813723Z .loc 1 31 88 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:31:88 2026-02-21T09:23:32.8813908Z or.b32 %r6872, %r12326, 2; 2026-02-21T09:23:32.8814113Z .loc 1 35 31 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:35:31 2026-02-21T09:23:32.8814184Z add.s32 %r6875, %r6872, %r1335; 2026-02-21T09:23:32.8814394Z .loc 1 34 30 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:34:30 2026-02-21T09:23:32.8814461Z and.b32 %r6762, %r6875, -128; 2026-02-21T09:23:32.8814529Z sub.s32 %r6876, %r6872, %r6762; 2026-02-21T09:23:32.8814733Z .loc 1 36 27 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:36:27 2026-02-21T09:23:32.8814797Z shl.b32 %r9491, %r6876, 7; 2026-02-21T09:23:32.8815047Z .loc 1 37 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:37:32 2026-02-21T09:23:32.8815120Z or.b32 %r6877, %r9491, %r6; 2026-02-21T09:23:32.8815186Z or.b32 %r6878, %r9491, %r7; 2026-02-21T09:23:32.8815249Z or.b32 %r6879, %r9491, %r8; 2026-02-21T09:23:32.8815314Z or.b32 %r6880, %r9491, %r9; 2026-02-21T09:23:32.8815381Z or.b32 %r6881, %r9491, %r10; 2026-02-21T09:23:32.8815445Z or.b32 %r6882, %r9491, %r11; 2026-02-21T09:23:32.8815512Z or.b32 %r6883, %r9491, %r12; 2026-02-21T09:23:32.8815624Z or.b32 %r6884, %r9491, %r13; 2026-02-21T09:23:32.8815688Z or.b32 %r6885, %r9491, %r14; 2026-02-21T09:23:32.8815750Z or.b32 
%r6886, %r9491, %r15; 2026-02-21T09:23:32.8815816Z or.b32 %r6887, %r9491, %r16; 2026-02-21T09:23:32.8815878Z or.b32 %r6888, %r9491, %r17; 2026-02-21T09:23:32.8815941Z or.b32 %r6889, %r9491, %r18; 2026-02-21T09:23:32.8816004Z or.b32 %r6890, %r9491, %r19; 2026-02-21T09:23:32.8816070Z or.b32 %r6891, %r9491, %r20; 2026-02-21T09:23:32.8816134Z or.b32 %r6892, %r9491, %r21; 2026-02-21T09:23:32.8816333Z .loc 1 52 53 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:53 2026-02-21T09:23:32.8816404Z shl.b32 %r6893, %r6877, 10; 2026-02-21T09:23:32.8816589Z shl.b32 %r6894, %r6878, 10; 2026-02-21T09:23:32.8816660Z shl.b32 %r6895, %r6879, 10; 2026-02-21T09:23:32.8816724Z shl.b32 %r6896, %r6880, 10; 2026-02-21T09:23:32.8816790Z shl.b32 %r6897, %r6881, 10; 2026-02-21T09:23:32.8816853Z shl.b32 %r6898, %r6882, 10; 2026-02-21T09:23:32.8816917Z shl.b32 %r6899, %r6883, 10; 2026-02-21T09:23:32.8816984Z shl.b32 %r6900, %r6884, 10; 2026-02-21T09:23:32.8817048Z shl.b32 %r6901, %r6885, 10; 2026-02-21T09:23:32.8817113Z shl.b32 %r6902, %r6886, 10; 2026-02-21T09:23:32.8817181Z shl.b32 %r6903, %r6887, 10; 2026-02-21T09:23:32.8817243Z shl.b32 %r6904, %r6888, 10; 2026-02-21T09:23:32.8817306Z shl.b32 %r6905, %r6889, 10; 2026-02-21T09:23:32.8817369Z shl.b32 %r6906, %r6890, 10; 2026-02-21T09:23:32.8817436Z shl.b32 %r6907, %r6891, 10; 2026-02-21T09:23:32.8817499Z shl.b32 %r6908, %r6892, 10; 2026-02-21T09:23:32.8817705Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8817775Z // begin inline asm 2026-02-21T09:23:32.8817881Z @%p2 mbarrier.init.shared::cta.b64 [%r9488], 1; 2026-02-21T09:23:32.8817943Z // end inline asm 2026-02-21T09:23:32.8818013Z bar.sync 0; 2026-02-21T09:23:32.8818082Z // begin inline asm 2026-02-21T09:23:32.8818182Z @%p2 mbarrier.init.shared::cta.b64 [%r9489], 1; 2026-02-21T09:23:32.8818243Z // end inline asm 2026-02-21T09:23:32.8818449Z .loc 1 52 60 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:60 
2026-02-21T09:23:32.8818517Z or.b32 %r6909, %r6893, %r23; 2026-02-21T09:23:32.8818580Z or.b32 %r6910, %r6894, %r23; 2026-02-21T09:23:32.8818648Z or.b32 %r6911, %r6895, %r23; 2026-02-21T09:23:32.8818709Z or.b32 %r6912, %r6896, %r23; 2026-02-21T09:23:32.8818772Z or.b32 %r6913, %r6897, %r23; 2026-02-21T09:23:32.8818839Z or.b32 %r6914, %r6898, %r23; 2026-02-21T09:23:32.8818908Z or.b32 %r6915, %r6899, %r23; 2026-02-21T09:23:32.8818971Z or.b32 %r6916, %r6900, %r23; 2026-02-21T09:23:32.8819035Z or.b32 %r6917, %r6901, %r23; 2026-02-21T09:23:32.8819253Z or.b32 %r6918, %r6902, %r23; 2026-02-21T09:23:32.8819315Z or.b32 %r6919, %r6903, %r23; 2026-02-21T09:23:32.8819378Z or.b32 %r6920, %r6904, %r23; 2026-02-21T09:23:32.8819441Z or.b32 %r6921, %r6905, %r23; 2026-02-21T09:23:32.8819509Z or.b32 %r6922, %r6906, %r23; 2026-02-21T09:23:32.8819571Z or.b32 %r6923, %r6907, %r23; 2026-02-21T09:23:32.8819633Z or.b32 %r6924, %r6908, %r23; 2026-02-21T09:23:32.8819852Z .loc 1 52 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:32 2026-02-21T09:23:32.8819932Z mad.wide.s32 %rd438, %r6909, 2, %rd93; 2026-02-21T09:23:32.8820007Z mad.wide.s32 %rd439, %r6910, 2, %rd93; 2026-02-21T09:23:32.8820083Z mad.wide.s32 %rd440, %r6911, 2, %rd93; 2026-02-21T09:23:32.8820222Z mad.wide.s32 %rd441, %r6912, 2, %rd93; 2026-02-21T09:23:32.8820296Z mad.wide.s32 %rd442, %r6913, 2, %rd93; 2026-02-21T09:23:32.8820366Z mad.wide.s32 %rd443, %r6914, 2, %rd93; 2026-02-21T09:23:32.8820442Z mad.wide.s32 %rd444, %r6915, 2, %rd93; 2026-02-21T09:23:32.8820532Z mad.wide.s32 %rd445, %r6916, 2, %rd93; 2026-02-21T09:23:32.8820603Z mad.wide.s32 %rd446, %r6917, 2, %rd93; 2026-02-21T09:23:32.8820678Z mad.wide.s32 %rd447, %r6918, 2, %rd93; 2026-02-21T09:23:32.8820812Z mad.wide.s32 %rd448, %r6919, 2, %rd93; 2026-02-21T09:23:32.8820889Z mad.wide.s32 %rd449, %r6920, 2, %rd93; 2026-02-21T09:23:32.8820960Z mad.wide.s32 %rd450, %r6921, 2, %rd93; 2026-02-21T09:23:32.8821036Z mad.wide.s32 %rd451, %r6922, 2, %rd93; 
2026-02-21T09:23:32.8821105Z mad.wide.s32 %rd452, %r6923, 2, %rd93; 2026-02-21T09:23:32.8821177Z mad.wide.s32 %rd453, %r6924, 2, %rd93; 2026-02-21T09:23:32.8821240Z mov.b32 %r6729, 8; 2026-02-21T09:23:32.8821443Z .loc 1 52 80 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:80 2026-02-21T09:23:32.8821505Z // begin inline asm 2026-02-21T09:23:32.8821660Z cp.async.ca.shared.global [ %r9495 + 0 ], [ %rd438 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8821718Z // end inline asm 2026-02-21T09:23:32.8821795Z // begin inline asm 2026-02-21T09:23:32.8821940Z cp.async.ca.shared.global [ %r9497 + 0 ], [ %rd439 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8822005Z // end inline asm 2026-02-21T09:23:32.8822065Z // begin inline asm 2026-02-21T09:23:32.8822203Z cp.async.ca.shared.global [ %r9499 + 0 ], [ %rd440 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8822268Z // end inline asm 2026-02-21T09:23:32.8822329Z // begin inline asm 2026-02-21T09:23:32.8822465Z cp.async.ca.shared.global [ %r9501 + 0 ], [ %rd441 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8822524Z // end inline asm 2026-02-21T09:23:32.8822589Z // begin inline asm 2026-02-21T09:23:32.8822727Z cp.async.ca.shared.global [ %r9503 + 0 ], [ %rd442 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8822786Z // end inline asm 2026-02-21T09:23:32.8822852Z // begin inline asm 2026-02-21T09:23:32.8822988Z cp.async.ca.shared.global [ %r9505 + 0 ], [ %rd443 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8823047Z // end inline asm 2026-02-21T09:23:32.8823118Z // begin inline asm 2026-02-21T09:23:32.8823255Z cp.async.ca.shared.global [ %r9507 + 0 ], [ %rd444 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8823315Z // end inline asm 2026-02-21T09:23:32.8823378Z // begin inline asm 2026-02-21T09:23:32.8823522Z cp.async.ca.shared.global [ %r9509 + 0 ], [ %rd445 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8823593Z // end inline asm 2026-02-21T09:23:32.8823656Z // begin inline asm 2026-02-21T09:23:32.8823799Z cp.async.ca.shared.global [ %r9511 + 0 ], [ %rd446 + 0 ], 0x8, 
%r6729; 2026-02-21T09:23:32.8823859Z // end inline asm 2026-02-21T09:23:32.8823920Z // begin inline asm 2026-02-21T09:23:32.8824057Z cp.async.ca.shared.global [ %r9513 + 0 ], [ %rd447 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8824123Z // end inline asm 2026-02-21T09:23:32.8824185Z // begin inline asm 2026-02-21T09:23:32.8824323Z cp.async.ca.shared.global [ %r9515 + 0 ], [ %rd448 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8824387Z // end inline asm 2026-02-21T09:23:32.8824450Z // begin inline asm 2026-02-21T09:23:32.8824717Z cp.async.ca.shared.global [ %r9517 + 0 ], [ %rd449 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8824782Z // end inline asm 2026-02-21T09:23:32.8824842Z // begin inline asm 2026-02-21T09:23:32.8824979Z cp.async.ca.shared.global [ %r9519 + 0 ], [ %rd450 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8825038Z // end inline asm 2026-02-21T09:23:32.8825103Z // begin inline asm 2026-02-21T09:23:32.8825242Z cp.async.ca.shared.global [ %r9521 + 0 ], [ %rd451 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8825301Z // end inline asm 2026-02-21T09:23:32.8825368Z // begin inline asm 2026-02-21T09:23:32.8825504Z cp.async.ca.shared.global [ %r9523 + 0 ], [ %rd452 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8825565Z // end inline asm 2026-02-21T09:23:32.8825625Z // begin inline asm 2026-02-21T09:23:32.8825813Z cp.async.ca.shared.global [ %r9525 + 0 ], [ %rd453 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8825875Z // end inline asm 2026-02-21T09:23:32.8825945Z cp.async.commit_group; 2026-02-21T09:23:32.8826156Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8826218Z bar.sync 0; 2026-02-21T09:23:32.8826293Z // begin inline asm 2026-02-21T09:23:32.8826434Z @%p2 mbarrier.arrive.expect_tx.shared.b64 _, [%r9488], 4096; 2026-02-21T09:23:32.8826697Z // end inline asm 2026-02-21T09:23:32.8826919Z .loc 1 58 33 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:58:33 2026-02-21T09:23:32.8826980Z bar.sync 0; 2026-02-21T09:23:32.8827056Z 
elect.sync %r6925|%p125, -1; 2026-02-21T09:23:32.8827129Z and.pred %p119, %p1, %p125; 2026-02-21T09:23:32.8827189Z mov.b32 %r6763, 0; 2026-02-21T09:23:32.8827255Z // begin inline asm 2026-02-21T09:23:32.8827596Z @%p119 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1289], [%rd603, {%r6762, %r6763}], [%r9488]; 2026-02-21T09:23:32.8827658Z // end inline asm 2026-02-21T09:23:32.8827866Z .loc 1 52 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:32 2026-02-21T09:23:32.8827939Z cvt.s64.s32 %rd473, %r6893; 2026-02-21T09:23:32.8828005Z or.b64 %rd474, %rd473, %rd12; 2026-02-21T09:23:32.8828076Z shl.b64 %rd475, %rd474, 1; 2026-02-21T09:23:32.8828152Z add.s64 %rd476, %rd93, %rd475; 2026-02-21T09:23:32.8828285Z add.s64 %rd455, %rd476, 128; 2026-02-21T09:23:32.8828364Z cvt.s64.s32 %rd477, %r6894; 2026-02-21T09:23:32.8828435Z or.b64 %rd478, %rd477, %rd12; 2026-02-21T09:23:32.8828500Z shl.b64 %rd479, %rd478, 1; 2026-02-21T09:23:32.8828570Z add.s64 %rd480, %rd93, %rd479; 2026-02-21T09:23:32.8828639Z add.s64 %rd456, %rd480, 128; 2026-02-21T09:23:32.8828710Z cvt.s64.s32 %rd481, %r6895; 2026-02-21T09:23:32.8828774Z or.b64 %rd482, %rd481, %rd12; 2026-02-21T09:23:32.8828838Z shl.b64 %rd483, %rd482, 1; 2026-02-21T09:23:32.8828912Z add.s64 %rd484, %rd93, %rd483; 2026-02-21T09:23:32.8828979Z add.s64 %rd457, %rd484, 128; 2026-02-21T09:23:32.8829047Z cvt.s64.s32 %rd485, %r6896; 2026-02-21T09:23:32.8829117Z or.b64 %rd486, %rd485, %rd12; 2026-02-21T09:23:32.8829188Z shl.b64 %rd487, %rd486, 1; 2026-02-21T09:23:32.8829257Z add.s64 %rd488, %rd93, %rd487; 2026-02-21T09:23:32.8829319Z add.s64 %rd458, %rd488, 128; 2026-02-21T09:23:32.8829386Z cvt.s64.s32 %rd489, %r6897; 2026-02-21T09:23:32.8829452Z or.b64 %rd490, %rd489, %rd12; 2026-02-21T09:23:32.8829517Z shl.b64 %rd491, %rd490, 1; 2026-02-21T09:23:32.8829586Z add.s64 %rd492, %rd93, %rd491; 2026-02-21T09:23:32.8829649Z add.s64 %rd459, %rd492, 128; 2026-02-21T09:23:32.8829714Z cvt.s64.s32 %rd493, 
%r6898; 2026-02-21T09:23:32.8829778Z or.b64 %rd494, %rd493, %rd12; 2026-02-21T09:23:32.8829846Z shl.b64 %rd495, %rd494, 1; 2026-02-21T09:23:32.8829913Z add.s64 %rd496, %rd93, %rd495; 2026-02-21T09:23:32.8829979Z add.s64 %rd460, %rd496, 128; 2026-02-21T09:23:32.8830049Z cvt.s64.s32 %rd497, %r6899; 2026-02-21T09:23:32.8830115Z or.b64 %rd498, %rd497, %rd12; 2026-02-21T09:23:32.8830180Z shl.b64 %rd499, %rd498, 1; 2026-02-21T09:23:32.8830415Z add.s64 %rd500, %rd93, %rd499; 2026-02-21T09:23:32.8830485Z add.s64 %rd461, %rd500, 128; 2026-02-21T09:23:32.8830549Z cvt.s64.s32 %rd501, %r6900; 2026-02-21T09:23:32.8830613Z or.b64 %rd502, %rd501, %rd12; 2026-02-21T09:23:32.8830683Z shl.b64 %rd503, %rd502, 1; 2026-02-21T09:23:32.8830749Z add.s64 %rd504, %rd93, %rd503; 2026-02-21T09:23:32.8830813Z add.s64 %rd462, %rd504, 128; 2026-02-21T09:23:32.8830878Z cvt.s64.s32 %rd505, %r6901; 2026-02-21T09:23:32.8830948Z or.b64 %rd506, %rd505, %rd12; 2026-02-21T09:23:32.8831012Z shl.b64 %rd507, %rd506, 1; 2026-02-21T09:23:32.8831077Z add.s64 %rd508, %rd93, %rd507; 2026-02-21T09:23:32.8831146Z add.s64 %rd463, %rd508, 128; 2026-02-21T09:23:32.8831211Z cvt.s64.s32 %rd509, %r6902; 2026-02-21T09:23:32.8831275Z or.b64 %rd510, %rd509, %rd12; 2026-02-21T09:23:32.8831427Z shl.b64 %rd511, %rd510, 1; 2026-02-21T09:23:32.8831503Z add.s64 %rd512, %rd93, %rd511; 2026-02-21T09:23:32.8831570Z add.s64 %rd464, %rd512, 128; 2026-02-21T09:23:32.8831637Z cvt.s64.s32 %rd513, %r6903; 2026-02-21T09:23:32.8831707Z or.b64 %rd514, %rd513, %rd12; 2026-02-21T09:23:32.8831770Z shl.b64 %rd515, %rd514, 1; 2026-02-21T09:23:32.8831836Z add.s64 %rd516, %rd93, %rd515; 2026-02-21T09:23:32.8831904Z add.s64 %rd465, %rd516, 128; 2026-02-21T09:23:32.8832017Z cvt.s64.s32 %rd517, %r6904; 2026-02-21T09:23:32.8832082Z or.b64 %rd518, %rd517, %rd12; 2026-02-21T09:23:32.8832147Z shl.b64 %rd519, %rd518, 1; 2026-02-21T09:23:32.8832218Z add.s64 %rd520, %rd93, %rd519; 2026-02-21T09:23:32.8832284Z add.s64 %rd466, %rd520, 128; 
2026-02-21T09:23:32.8832350Z cvt.s64.s32 %rd521, %r6905; 2026-02-21T09:23:32.8832420Z or.b64 %rd522, %rd521, %rd12; 2026-02-21T09:23:32.8832486Z shl.b64 %rd523, %rd522, 1; 2026-02-21T09:23:32.8832553Z add.s64 %rd524, %rd93, %rd523; 2026-02-21T09:23:32.8832621Z add.s64 %rd467, %rd524, 128; 2026-02-21T09:23:32.8832692Z cvt.s64.s32 %rd525, %r6906; 2026-02-21T09:23:32.8832757Z or.b64 %rd526, %rd525, %rd12; 2026-02-21T09:23:32.8832823Z shl.b64 %rd527, %rd526, 1; 2026-02-21T09:23:32.8832908Z add.s64 %rd528, %rd93, %rd527; 2026-02-21T09:23:32.8832975Z add.s64 %rd468, %rd528, 128; 2026-02-21T09:23:32.8833040Z cvt.s64.s32 %rd529, %r6907; 2026-02-21T09:23:32.8833105Z or.b64 %rd530, %rd529, %rd12; 2026-02-21T09:23:32.8833179Z shl.b64 %rd531, %rd530, 1; 2026-02-21T09:23:32.8833246Z add.s64 %rd532, %rd93, %rd531; 2026-02-21T09:23:32.8833312Z add.s64 %rd469, %rd532, 128; 2026-02-21T09:23:32.8833382Z cvt.s64.s32 %rd533, %r6908; 2026-02-21T09:23:32.8833447Z or.b64 %rd534, %rd533, %rd12; 2026-02-21T09:23:32.8833513Z shl.b64 %rd535, %rd534, 1; 2026-02-21T09:23:32.8833581Z add.s64 %rd536, %rd93, %rd535; 2026-02-21T09:23:32.8833652Z add.s64 %rd470, %rd536, 128; 2026-02-21T09:23:32.8833862Z .loc 1 52 80 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:80 2026-02-21T09:23:32.8833926Z // begin inline asm 2026-02-21T09:23:32.8834078Z cp.async.ca.shared.global [ %r9532 + 0 ], [ %rd455 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8834147Z // end inline asm 2026-02-21T09:23:32.8834209Z // begin inline asm 2026-02-21T09:23:32.8834361Z cp.async.ca.shared.global [ %r9534 + 0 ], [ %rd456 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8834423Z // end inline asm 2026-02-21T09:23:32.8834487Z // begin inline asm 2026-02-21T09:23:32.8834626Z cp.async.ca.shared.global [ %r9536 + 0 ], [ %rd457 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8834692Z // end inline asm 2026-02-21T09:23:32.8834755Z // begin inline asm 2026-02-21T09:23:32.8834892Z cp.async.ca.shared.global [ %r9538 + 0 ], [ %rd458 + 0 ], 0x8, 
%r6729; 2026-02-21T09:23:32.8834956Z // end inline asm 2026-02-21T09:23:32.8835017Z // begin inline asm 2026-02-21T09:23:32.8835153Z cp.async.ca.shared.global [ %r9540 + 0 ], [ %rd459 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8835219Z // end inline asm 2026-02-21T09:23:32.8835281Z // begin inline asm 2026-02-21T09:23:32.8835422Z cp.async.ca.shared.global [ %r9542 + 0 ], [ %rd460 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8835612Z // end inline asm 2026-02-21T09:23:32.8835678Z // begin inline asm 2026-02-21T09:23:32.8835816Z cp.async.ca.shared.global [ %r9544 + 0 ], [ %rd461 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8835876Z // end inline asm 2026-02-21T09:23:32.8835942Z // begin inline asm 2026-02-21T09:23:32.8836079Z cp.async.ca.shared.global [ %r9546 + 0 ], [ %rd462 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8836141Z // end inline asm 2026-02-21T09:23:32.8836202Z // begin inline asm 2026-02-21T09:23:32.8836346Z cp.async.ca.shared.global [ %r9548 + 0 ], [ %rd463 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8836406Z // end inline asm 2026-02-21T09:23:32.8836582Z // begin inline asm 2026-02-21T09:23:32.8836744Z cp.async.ca.shared.global [ %r9550 + 0 ], [ %rd464 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8836805Z // end inline asm 2026-02-21T09:23:32.8836955Z // begin inline asm 2026-02-21T09:23:32.8837101Z cp.async.ca.shared.global [ %r9552 + 0 ], [ %rd465 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8837161Z // end inline asm 2026-02-21T09:23:32.8837232Z // begin inline asm 2026-02-21T09:23:32.8837366Z cp.async.ca.shared.global [ %r9554 + 0 ], [ %rd466 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8837433Z // end inline asm 2026-02-21T09:23:32.8837494Z // begin inline asm 2026-02-21T09:23:32.8837708Z cp.async.ca.shared.global [ %r9556 + 0 ], [ %rd467 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8837778Z // end inline asm 2026-02-21T09:23:32.8837841Z // begin inline asm 2026-02-21T09:23:32.8837977Z cp.async.ca.shared.global [ %r9558 + 0 ], [ %rd468 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8838035Z // 
end inline asm 2026-02-21T09:23:32.8838101Z // begin inline asm 2026-02-21T09:23:32.8838236Z cp.async.ca.shared.global [ %r9560 + 0 ], [ %rd469 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8838296Z // end inline asm 2026-02-21T09:23:32.8838363Z // begin inline asm 2026-02-21T09:23:32.8838499Z cp.async.ca.shared.global [ %r9562 + 0 ], [ %rd470 + 0 ], 0x8, %r6729; 2026-02-21T09:23:32.8838561Z // end inline asm 2026-02-21T09:23:32.8838634Z cp.async.commit_group; 2026-02-21T09:23:32.8838850Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8838908Z bar.sync 0; 2026-02-21T09:23:32.8838970Z // begin inline asm 2026-02-21T09:23:32.8839109Z @%p2 mbarrier.arrive.expect_tx.shared.b64 _, [%r9489], 4096; 2026-02-21T09:23:32.8839168Z // end inline asm 2026-02-21T09:23:32.8839367Z .loc 1 58 33 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:58:33 2026-02-21T09:23:32.8839433Z bar.sync 0; 2026-02-21T09:23:32.8839505Z elect.sync %r6926|%p126, -1; 2026-02-21T09:23:32.8839580Z and.pred %p121, %p1, %p126; 2026-02-21T09:23:32.8839640Z mov.b32 %r6800, 32; 2026-02-21T09:23:32.8839707Z // begin inline asm 2026-02-21T09:23:32.8840039Z @%p121 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1326], [%rd603, {%r6762, %r6800}], [%r9489]; 2026-02-21T09:23:32.8840099Z // end inline asm 2026-02-21T09:23:32.8840308Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8840377Z shl.b32 %r6927, %r6872, 7; 2026-02-21T09:23:32.8840442Z or.b32 %r6928, %r21, %r6927; 2026-02-21T09:23:32.8840513Z shl.b32 %r6929, %r6875, 7; 2026-02-21T09:23:32.8840585Z and.b32 %r6930, %r6929, -16384; 2026-02-21T09:23:32.8840653Z sub.s32 %r6931, %r6928, %r6930; 2026-02-21T09:23:32.8840717Z shl.b32 %r6932, %r6931, 10; 2026-02-21T09:23:32.8840792Z mul.wide.s32 %rd537, %r6932, 2; 2026-02-21T09:23:32.8840872Z or.b64 %rd53, %rd537, 256; 2026-02-21T09:23:32.8840938Z or.b32 %r6933, %r20, %r6927; 
2026-02-21T09:23:32.8841007Z sub.s32 %r6934, %r6933, %r6930; 2026-02-21T09:23:32.8841072Z shl.b32 %r6935, %r6934, 10; 2026-02-21T09:23:32.8841141Z mul.wide.s32 %rd538, %r6935, 2; 2026-02-21T09:23:32.8841206Z or.b64 %rd54, %rd538, 256; 2026-02-21T09:23:32.8841278Z or.b32 %r6936, %r19, %r6927; 2026-02-21T09:23:32.8841346Z sub.s32 %r6937, %r6936, %r6930; 2026-02-21T09:23:32.8841586Z shl.b32 %r6938, %r6937, 10; 2026-02-21T09:23:32.8841660Z mul.wide.s32 %rd539, %r6938, 2; 2026-02-21T09:23:32.8841724Z or.b64 %rd55, %rd539, 256; 2026-02-21T09:23:32.8841786Z or.b32 %r6939, %r18, %r6927; 2026-02-21T09:23:32.8841856Z sub.s32 %r6940, %r6939, %r6930; 2026-02-21T09:23:32.8841920Z shl.b32 %r6941, %r6940, 10; 2026-02-21T09:23:32.8841989Z mul.wide.s32 %rd540, %r6941, 2; 2026-02-21T09:23:32.8842054Z or.b64 %rd56, %rd540, 256; 2026-02-21T09:23:32.8842124Z or.b32 %r6942, %r17, %r6927; 2026-02-21T09:23:32.8842189Z sub.s32 %r6943, %r6942, %r6930; 2026-02-21T09:23:32.8842252Z shl.b32 %r6944, %r6943, 10; 2026-02-21T09:23:32.8842328Z mul.wide.s32 %rd541, %r6944, 2; 2026-02-21T09:23:32.8842394Z or.b64 %rd57, %rd541, 256; 2026-02-21T09:23:32.8842460Z or.b32 %r6945, %r16, %r6927; 2026-02-21T09:23:32.8842585Z sub.s32 %r6946, %r6945, %r6930; 2026-02-21T09:23:32.8842657Z shl.b32 %r6947, %r6946, 10; 2026-02-21T09:23:32.8842728Z mul.wide.s32 %rd542, %r6947, 2; 2026-02-21T09:23:32.8842796Z or.b64 %rd58, %rd542, 256; 2026-02-21T09:23:32.8842878Z or.b32 %r6948, %r15, %r6927; 2026-02-21T09:23:32.8842948Z sub.s32 %r6949, %r6948, %r6930; 2026-02-21T09:23:32.8843012Z shl.b32 %r6950, %r6949, 10; 2026-02-21T09:23:32.8843127Z mul.wide.s32 %rd543, %r6950, 2; 2026-02-21T09:23:32.8843200Z or.b64 %rd59, %rd543, 256; 2026-02-21T09:23:32.8843264Z or.b32 %r6951, %r14, %r6927; 2026-02-21T09:23:32.8843329Z sub.s32 %r6952, %r6951, %r6930; 2026-02-21T09:23:32.8843403Z shl.b32 %r6953, %r6952, 10; 2026-02-21T09:23:32.8843471Z mul.wide.s32 %rd544, %r6953, 2; 2026-02-21T09:23:32.8843537Z or.b64 %rd60, %rd544, 256; 
2026-02-21T09:23:32.8843606Z or.b32 %r6954, %r13, %r6927; 2026-02-21T09:23:32.8843672Z sub.s32 %r6955, %r6954, %r6930; 2026-02-21T09:23:32.8843736Z shl.b32 %r6956, %r6955, 10; 2026-02-21T09:23:32.8843806Z mul.wide.s32 %rd545, %r6956, 2; 2026-02-21T09:23:32.8843877Z or.b64 %rd61, %rd545, 256; 2026-02-21T09:23:32.8843941Z or.b32 %r6957, %r12, %r6927; 2026-02-21T09:23:32.8844009Z sub.s32 %r6958, %r6957, %r6930; 2026-02-21T09:23:32.8844078Z shl.b32 %r6959, %r6958, 10; 2026-02-21T09:23:32.8844146Z mul.wide.s32 %rd546, %r6959, 2; 2026-02-21T09:23:32.8844210Z or.b64 %rd62, %rd546, 256; 2026-02-21T09:23:32.8844275Z or.b32 %r6960, %r11, %r6927; 2026-02-21T09:23:32.8844360Z sub.s32 %r6961, %r6960, %r6930; 2026-02-21T09:23:32.8844425Z shl.b32 %r6962, %r6961, 10; 2026-02-21T09:23:32.8844494Z mul.wide.s32 %rd547, %r6962, 2; 2026-02-21T09:23:32.8844565Z or.b64 %rd63, %rd547, 256; 2026-02-21T09:23:32.8844631Z or.b32 %r6963, %r10, %r6927; 2026-02-21T09:23:32.8844696Z sub.s32 %r6964, %r6963, %r6930; 2026-02-21T09:23:32.8844762Z shl.b32 %r6965, %r6964, 10; 2026-02-21T09:23:32.8844838Z mul.wide.s32 %rd548, %r6965, 2; 2026-02-21T09:23:32.8844906Z or.b64 %rd64, %rd548, 256; 2026-02-21T09:23:32.8844970Z or.b32 %r6966, %r9, %r6927; 2026-02-21T09:23:32.8845040Z sub.s32 %r6967, %r6966, %r6930; 2026-02-21T09:23:32.8845103Z shl.b32 %r6968, %r6967, 10; 2026-02-21T09:23:32.8845174Z mul.wide.s32 %rd549, %r6968, 2; 2026-02-21T09:23:32.8845243Z or.b64 %rd65, %rd549, 256; 2026-02-21T09:23:32.8845307Z or.b32 %r6969, %r8, %r6927; 2026-02-21T09:23:32.8845371Z sub.s32 %r6970, %r6969, %r6930; 2026-02-21T09:23:32.8845434Z shl.b32 %r6971, %r6970, 10; 2026-02-21T09:23:32.8845510Z mul.wide.s32 %rd550, %r6971, 2; 2026-02-21T09:23:32.8845574Z or.b64 %rd66, %rd550, 256; 2026-02-21T09:23:32.8845641Z or.b32 %r6972, %r7, %r6927; 2026-02-21T09:23:32.8845711Z sub.s32 %r6973, %r6972, %r6930; 2026-02-21T09:23:32.8845774Z shl.b32 %r6974, %r6973, 10; 2026-02-21T09:23:32.8845842Z mul.wide.s32 %rd551, %r6974, 2; 
2026-02-21T09:23:32.8845905Z or.b64 %rd67, %rd551, 256; 2026-02-21T09:23:32.8845975Z or.b32 %r6975, %r6, %r6927; 2026-02-21T09:23:32.8846040Z sub.s32 %r6976, %r6975, %r6930; 2026-02-21T09:23:32.8846103Z shl.b32 %r6977, %r6976, 10; 2026-02-21T09:23:32.8846176Z mul.wide.s32 %rd552, %r6977, 2; 2026-02-21T09:23:32.8846240Z or.b64 %rd68, %rd552, 256; 2026-02-21T09:23:32.8846419Z mov.b32 %r12592, 0f00000000; 2026-02-21T09:23:32.8846600Z mov.b32 %r12591, 1; 2026-02-21T09:23:32.8846675Z mov.b32 %r12590, -1; 2026-02-21T09:23:32.8846738Z mov.b64 %rd741, 0; 2026-02-21T09:23:32.8846802Z mov.b64 %rd740, %rd11; 2026-02-21T09:23:32.8846872Z mov.b32 %r12589, %r6763; 2026-02-21T09:23:32.8846936Z mov.b32 %r12593, %r12592; 2026-02-21T09:23:32.8846999Z mov.b32 %r12594, %r12592; 2026-02-21T09:23:32.8847063Z mov.b32 %r12595, %r12592; 2026-02-21T09:23:32.8847130Z mov.b32 %r12596, %r12592; 2026-02-21T09:23:32.8847194Z mov.b32 %r12597, %r12592; 2026-02-21T09:23:32.8847256Z mov.b32 %r12598, %r12592; 2026-02-21T09:23:32.8847323Z mov.b32 %r12599, %r12592; 2026-02-21T09:23:32.8847387Z mov.b32 %r12600, %r12592; 2026-02-21T09:23:32.8847531Z mov.b32 %r12601, %r12592; 2026-02-21T09:23:32.8847597Z mov.b32 %r12602, %r12592; 2026-02-21T09:23:32.8847664Z mov.b32 %r12603, %r12592; 2026-02-21T09:23:32.8847725Z mov.b32 %r12604, %r12592; 2026-02-21T09:23:32.8847792Z mov.b32 %r12605, %r12592; 2026-02-21T09:23:32.8847863Z mov.b32 %r12606, %r12592; 2026-02-21T09:23:32.8847927Z mov.b32 %r12607, %r12592; 2026-02-21T09:23:32.8847988Z mov.b32 %r12608, %r12592; 2026-02-21T09:23:32.8848055Z mov.b32 %r12609, %r12592; 2026-02-21T09:23:32.8848180Z mov.b32 %r12610, %r12592; 2026-02-21T09:23:32.8848259Z mov.b32 %r12611, %r12592; 2026-02-21T09:23:32.8848323Z mov.b32 %r12612, %r12592; 2026-02-21T09:23:32.8848392Z mov.b32 %r12613, %r12592; 2026-02-21T09:23:32.8848457Z mov.b32 %r12614, %r12592; 2026-02-21T09:23:32.8848520Z mov.b32 %r12615, %r12592; 2026-02-21T09:23:32.8848587Z mov.b32 %r12616, %r12592; 
2026-02-21T09:23:32.8848648Z mov.b32 %r12617, %r12592; 2026-02-21T09:23:32.8848711Z mov.b32 %r12618, %r12592; 2026-02-21T09:23:32.8848774Z mov.b32 %r12619, %r12592; 2026-02-21T09:23:32.8848841Z mov.b32 %r12620, %r12592; 2026-02-21T09:23:32.8848906Z mov.b32 %r12621, %r12592; 2026-02-21T09:23:32.8848968Z mov.b32 %r12622, %r12592; 2026-02-21T09:23:32.8849039Z mov.b32 %r12623, %r12592; 2026-02-21T09:23:32.8849100Z mov.b32 %r12624, %r12592; 2026-02-21T09:23:32.8849162Z mov.b32 %r12625, %r12592; 2026-02-21T09:23:32.8849224Z mov.b32 %r12626, %r12592; 2026-02-21T09:23:32.8849290Z mov.b32 %r12627, %r12592; 2026-02-21T09:23:32.8849355Z mov.b32 %r12628, %r12592; 2026-02-21T09:23:32.8849426Z mov.b32 %r12629, %r12592; 2026-02-21T09:23:32.8849491Z mov.b32 %r12630, %r12592; 2026-02-21T09:23:32.8849555Z mov.b32 %r12631, %r12592; 2026-02-21T09:23:32.8849617Z mov.b32 %r12632, %r12592; 2026-02-21T09:23:32.8849679Z mov.b32 %r12633, %r12592; 2026-02-21T09:23:32.8849748Z mov.b32 %r12634, %r12592; 2026-02-21T09:23:32.8849810Z mov.b32 %r12635, %r12592; 2026-02-21T09:23:32.8849874Z mov.b32 %r12636, %r12592; 2026-02-21T09:23:32.8849943Z mov.b32 %r12637, %r12592; 2026-02-21T09:23:32.8850007Z mov.b32 %r12638, %r12592; 2026-02-21T09:23:32.8850084Z mov.b32 %r12639, %r12592; 2026-02-21T09:23:32.8850148Z mov.b32 %r12640, %r12592; 2026-02-21T09:23:32.8850220Z mov.b32 %r12641, %r12592; 2026-02-21T09:23:32.8850283Z mov.b32 %r12642, %r12592; 2026-02-21T09:23:32.8850350Z mov.b32 %r12643, %r12592; 2026-02-21T09:23:32.8850417Z mov.b32 %r12644, %r12592; 2026-02-21T09:23:32.8850480Z mov.b32 %r12645, %r12592; 2026-02-21T09:23:32.8850544Z mov.b32 %r12646, %r12592; 2026-02-21T09:23:32.8850608Z mov.b32 %r12647, %r12592; 2026-02-21T09:23:32.8850675Z mov.b32 %r12648, %r12592; 2026-02-21T09:23:32.8850736Z mov.b32 %r12649, %r12592; 2026-02-21T09:23:32.8850799Z mov.b32 %r12650, %r12592; 2026-02-21T09:23:32.8850866Z mov.b32 %r12651, %r12592; 2026-02-21T09:23:32.8850928Z mov.b32 %r12652, %r12592; 
2026-02-21T09:23:32.8850992Z mov.b32 %r12653, %r12592; 2026-02-21T09:23:32.8851061Z mov.b32 %r12654, %r12592; 2026-02-21T09:23:32.8851127Z mov.b32 %r12655, %r12592; 2026-02-21T09:23:32.8851193Z mov.b32 %r12656, %r12592; 2026-02-21T09:23:32.8851255Z mov.b32 %r12657, %r12592; 2026-02-21T09:23:32.8851323Z mov.b32 %r12658, %r12592; 2026-02-21T09:23:32.8851564Z mov.b32 %r12659, %r12592; 2026-02-21T09:23:32.8851626Z mov.b32 %r12660, %r12592; 2026-02-21T09:23:32.8851693Z mov.b32 %r12661, %r12592; 2026-02-21T09:23:32.8851757Z mov.b32 %r12662, %r12592; 2026-02-21T09:23:32.8851821Z mov.b32 %r12663, %r12592; 2026-02-21T09:23:32.8851886Z mov.b32 %r12664, %r12592; 2026-02-21T09:23:32.8851953Z mov.b32 %r12665, %r12592; 2026-02-21T09:23:32.8852015Z mov.b32 %r12666, %r12592; 2026-02-21T09:23:32.8852077Z mov.b32 %r12667, %r12592; 2026-02-21T09:23:32.8852146Z mov.b32 %r12668, %r12592; 2026-02-21T09:23:32.8852207Z mov.b32 %r12669, %r12592; 2026-02-21T09:23:32.8852267Z mov.b32 %r12670, %r12592; 2026-02-21T09:23:32.8852328Z mov.b32 %r12671, %r12592; 2026-02-21T09:23:32.8852394Z mov.b32 %r12672, %r12592; 2026-02-21T09:23:32.8852518Z mov.b32 %r12673, %r12592; 2026-02-21T09:23:32.8852584Z mov.b32 %r12674, %r12592; 2026-02-21T09:23:32.8852648Z mov.b32 %r12675, %r12592; 2026-02-21T09:23:32.8852711Z mov.b32 %r12676, %r12592; 2026-02-21T09:23:32.8852775Z mov.b32 %r12677, %r12592; 2026-02-21T09:23:32.8852835Z mov.b32 %r12678, %r12592; 2026-02-21T09:23:32.8852903Z mov.b32 %r12679, %r12592; 2026-02-21T09:23:32.8852964Z mov.b32 %r12680, %r12592; 2026-02-21T09:23:32.8853026Z mov.b32 %r12681, %r12592; 2026-02-21T09:23:32.8853139Z mov.b32 %r12682, %r12592; 2026-02-21T09:23:32.8853206Z mov.b32 %r12683, %r12592; 2026-02-21T09:23:32.8853268Z mov.b32 %r12684, %r12592; 2026-02-21T09:23:32.8853330Z mov.b32 %r12685, %r12592; 2026-02-21T09:23:32.8853396Z mov.b32 %r12686, %r12592; 2026-02-21T09:23:32.8853457Z mov.b32 %r12687, %r12592; 2026-02-21T09:23:32.8853517Z mov.b32 %r12688, %r12592; 
2026-02-21T09:23:32.8853586Z mov.b32 %r12689, %r12592; 2026-02-21T09:23:32.8853660Z mov.b32 %r12690, %r12592; 2026-02-21T09:23:32.8853727Z mov.b32 %r12691, %r12592; 2026-02-21T09:23:32.8853789Z mov.b32 %r12692, %r12592; 2026-02-21T09:23:32.8853857Z mov.b32 %r12693, %r12592; 2026-02-21T09:23:32.8853919Z mov.b32 %r12694, %r12592; 2026-02-21T09:23:32.8853984Z mov.b32 %r12695, %r12592; 2026-02-21T09:23:32.8854049Z mov.b32 %r12696, %r12592; 2026-02-21T09:23:32.8854112Z mov.b32 %r12697, %r12592; 2026-02-21T09:23:32.8854174Z mov.b32 %r12698, %r12592; 2026-02-21T09:23:32.8854241Z mov.b32 %r12699, %r12592; 2026-02-21T09:23:32.8854306Z mov.b32 %r12700, %r12592; 2026-02-21T09:23:32.8854367Z mov.b32 %r12701, %r12592; 2026-02-21T09:23:32.8854440Z mov.b32 %r12702, %r12592; 2026-02-21T09:23:32.8854508Z mov.b32 %r12703, %r12592; 2026-02-21T09:23:32.8854570Z mov.b32 %r12704, %r12592; 2026-02-21T09:23:32.8854634Z mov.b32 %r12705, %r12592; 2026-02-21T09:23:32.8854706Z mov.b32 %r12706, %r12592; 2026-02-21T09:23:32.8854768Z mov.b32 %r12707, %r12592; 2026-02-21T09:23:32.8854830Z mov.b32 %r12708, %r12592; 2026-02-21T09:23:32.8854894Z mov.b32 %r12709, %r12592; 2026-02-21T09:23:32.8854963Z mov.b32 %r12710, %r12592; 2026-02-21T09:23:32.8855026Z mov.b32 %r12711, %r12592; 2026-02-21T09:23:32.8855087Z mov.b32 %r12712, %r12592; 2026-02-21T09:23:32.8855157Z mov.b32 %r12713, %r12592; 2026-02-21T09:23:32.8855218Z mov.b32 %r12714, %r12592; 2026-02-21T09:23:32.8855280Z mov.b32 %r12715, %r12592; 2026-02-21T09:23:32.8855342Z mov.b32 %r12716, %r12592; 2026-02-21T09:23:32.8855410Z mov.b32 %r12717, %r12592; 2026-02-21T09:23:32.8855474Z mov.b32 %r12718, %r12592; 2026-02-21T09:23:32.8855535Z mov.b32 %r12719, %r12592; 2026-02-21T09:23:32.8855667Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T09:23:32.8855783Z // => This Inner Loop Header: Depth=2 2026-02-21T09:23:32.8855865Z setp.lt.u64 %p147, %rd741, 448; 2026-02-21T09:23:32.8855932Z add.s32 %r9391, %r12590, 1; 2026-02-21T09:23:32.8856008Z setp.gt.s32 
%p148, %r9391, 1; 2026-02-21T09:23:32.8856082Z selp.b32 %r12590, 0, %r9391, %p148; 2026-02-21T09:23:32.8856152Z selp.b32 %r9392, 1, 0, %p148; 2026-02-21T09:23:32.8856226Z xor.b32 %r12589, %r12589, %r9392; 2026-02-21T09:23:32.8856442Z .loc 1 52 80 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:80 2026-02-21T09:23:32.8856817Z cp.async.wait_group 1; 2026-02-21T09:23:32.8856885Z bar.sync 0; 2026-02-21T09:23:32.8856956Z shl.b32 %r9393, %r12590, 14; 2026-02-21T09:23:32.8857026Z add.s32 %r9395, %r1205, %r9393; 2026-02-21T09:23:32.8857239Z .loc 1 56 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:56:32 2026-02-21T09:23:32.8857311Z add.s32 %r9396, %r9395, %r90; 2026-02-21T09:23:32.8857382Z ld.shared.b16 %rs449, [%r9396]; 2026-02-21T09:23:32.8857457Z ld.shared.b16 %rs450, [%r9396+1024]; 2026-02-21T09:23:32.8857535Z ld.shared.b16 %rs451, [%r9396+64]; 2026-02-21T09:23:32.8857611Z ld.shared.b16 %rs452, [%r9396+1088]; 2026-02-21T09:23:32.8857751Z ld.shared.b16 %rs453, [%r9396+8192]; 2026-02-21T09:23:32.8857829Z ld.shared.b16 %rs454, [%r9396+9216]; 2026-02-21T09:23:32.8857898Z ld.shared.b16 %rs455, [%r9396+8256]; 2026-02-21T09:23:32.8857968Z ld.shared.b16 %rs456, [%r9396+9280]; 2026-02-21T09:23:32.8858038Z add.s32 %r9397, %r9395, %r91; 2026-02-21T09:23:32.8858114Z ld.shared.b16 %rs457, [%r9397]; 2026-02-21T09:23:32.8858186Z ld.shared.b16 %rs458, [%r9397+1024]; 2026-02-21T09:23:32.8858261Z ld.shared.b16 %rs459, [%r9397+64]; 2026-02-21T09:23:32.8858396Z ld.shared.b16 %rs460, [%r9397+1088]; 2026-02-21T09:23:32.8858468Z ld.shared.b16 %rs461, [%r9397+8192]; 2026-02-21T09:23:32.8858539Z ld.shared.b16 %rs462, [%r9397+9216]; 2026-02-21T09:23:32.8858611Z ld.shared.b16 %rs463, [%r9397+8256]; 2026-02-21T09:23:32.8858690Z ld.shared.b16 %rs464, [%r9397+9280]; 2026-02-21T09:23:32.8858760Z add.s32 %r9398, %r9395, %r92; 2026-02-21T09:23:32.8858831Z ld.shared.b16 %rs465, [%r9398]; 2026-02-21T09:23:32.8858908Z ld.shared.b16 %rs466, [%r9398+1024]; 
2026-02-21T09:23:32.8858980Z ld.shared.b16 %rs467, [%r9398+64]; 2026-02-21T09:23:32.8859052Z ld.shared.b16 %rs468, [%r9398+1088]; 2026-02-21T09:23:32.8859128Z ld.shared.b16 %rs469, [%r9398+8192]; 2026-02-21T09:23:32.8859202Z ld.shared.b16 %rs470, [%r9398+9216]; 2026-02-21T09:23:32.8859273Z ld.shared.b16 %rs471, [%r9398+8256]; 2026-02-21T09:23:32.8859341Z ld.shared.b16 %rs472, [%r9398+9280]; 2026-02-21T09:23:32.8859412Z add.s32 %r9399, %r9395, %r93; 2026-02-21T09:23:32.8859481Z ld.shared.b16 %rs473, [%r9399]; 2026-02-21T09:23:32.8859562Z ld.shared.b16 %rs474, [%r9399+1024]; 2026-02-21T09:23:32.8859647Z ld.shared.b16 %rs475, [%r9399+64]; 2026-02-21T09:23:32.8859719Z ld.shared.b16 %rs476, [%r9399+1088]; 2026-02-21T09:23:32.8859789Z ld.shared.b16 %rs477, [%r9399+8192]; 2026-02-21T09:23:32.8859860Z ld.shared.b16 %rs478, [%r9399+9216]; 2026-02-21T09:23:32.8859935Z ld.shared.b16 %rs479, [%r9399+8256]; 2026-02-21T09:23:32.8860005Z ld.shared.b16 %rs480, [%r9399+9280]; 2026-02-21T09:23:32.8860073Z add.s32 %r9400, %r9395, %r94; 2026-02-21T09:23:32.8860147Z ld.shared.b16 %rs481, [%r9400]; 2026-02-21T09:23:32.8860217Z ld.shared.b16 %rs482, [%r9400+1024]; 2026-02-21T09:23:32.8860286Z ld.shared.b16 %rs483, [%r9400+64]; 2026-02-21T09:23:32.8860357Z ld.shared.b16 %rs484, [%r9400+1088]; 2026-02-21T09:23:32.8860432Z ld.shared.b16 %rs485, [%r9400+8192]; 2026-02-21T09:23:32.8860503Z ld.shared.b16 %rs486, [%r9400+9216]; 2026-02-21T09:23:32.8860571Z ld.shared.b16 %rs487, [%r9400+8256]; 2026-02-21T09:23:32.8860646Z ld.shared.b16 %rs488, [%r9400+9280]; 2026-02-21T09:23:32.8860711Z add.s32 %r9401, %r9395, %r95; 2026-02-21T09:23:32.8860780Z ld.shared.b16 %rs489, [%r9401]; 2026-02-21T09:23:32.8860853Z ld.shared.b16 %rs490, [%r9401+1024]; 2026-02-21T09:23:32.8860919Z ld.shared.b16 %rs491, [%r9401+64]; 2026-02-21T09:23:32.8860988Z ld.shared.b16 %rs492, [%r9401+1088]; 2026-02-21T09:23:32.8861055Z ld.shared.b16 %rs493, [%r9401+8192]; 2026-02-21T09:23:32.8861128Z ld.shared.b16 %rs494, 
[%r9401+9216]; 2026-02-21T09:23:32.8861199Z ld.shared.b16 %rs495, [%r9401+8256]; 2026-02-21T09:23:32.8861268Z ld.shared.b16 %rs496, [%r9401+9280]; 2026-02-21T09:23:32.8861338Z add.s32 %r9402, %r9395, %r96; 2026-02-21T09:23:32.8861521Z ld.shared.b16 %rs497, [%r9402]; 2026-02-21T09:23:32.8861592Z ld.shared.b16 %rs498, [%r9402+1024]; 2026-02-21T09:23:32.8861660Z ld.shared.b16 %rs499, [%r9402+64]; 2026-02-21T09:23:32.8861733Z ld.shared.b16 %rs500, [%r9402+1088]; 2026-02-21T09:23:32.8861805Z ld.shared.b16 %rs501, [%r9402+8192]; 2026-02-21T09:23:32.8861875Z ld.shared.b16 %rs502, [%r9402+9216]; 2026-02-21T09:23:32.8861947Z ld.shared.b16 %rs503, [%r9402+8256]; 2026-02-21T09:23:32.8862014Z ld.shared.b16 %rs504, [%r9402+9280]; 2026-02-21T09:23:32.8862079Z add.s32 %r9403, %r9395, %r97; 2026-02-21T09:23:32.8862152Z ld.shared.b16 %rs505, [%r9403]; 2026-02-21T09:23:32.8862222Z ld.shared.b16 %rs506, [%r9403+1024]; 2026-02-21T09:23:32.8862302Z ld.shared.b16 %rs507, [%r9403+64]; 2026-02-21T09:23:32.8862422Z ld.shared.b16 %rs508, [%r9403+1088]; 2026-02-21T09:23:32.8862496Z ld.shared.b16 %rs509, [%r9403+8192]; 2026-02-21T09:23:32.8862567Z ld.shared.b16 %rs510, [%r9403+9216]; 2026-02-21T09:23:32.8862639Z ld.shared.b16 %rs511, [%r9403+8256]; 2026-02-21T09:23:32.8862718Z ld.shared.b16 %rs512, [%r9403+9280]; 2026-02-21T09:23:32.8862789Z cvt.f32.bf16 %r7108, %rs449; 2026-02-21T09:23:32.8862855Z cvt.f32.bf16 %r7109, %rs450; 2026-02-21T09:23:32.8862921Z cvt.f32.bf16 %r7110, %rs457; 2026-02-21T09:23:32.8863040Z cvt.f32.bf16 %r7111, %rs458; 2026-02-21T09:23:32.8863107Z cvt.f32.bf16 %r7240, %rs465; 2026-02-21T09:23:32.8863172Z cvt.f32.bf16 %r7241, %rs466; 2026-02-21T09:23:32.8863241Z cvt.f32.bf16 %r7242, %rs473; 2026-02-21T09:23:32.8863306Z cvt.f32.bf16 %r7243, %rs474; 2026-02-21T09:23:32.8863371Z cvt.f32.bf16 %r7372, %rs481; 2026-02-21T09:23:32.8863438Z cvt.f32.bf16 %r7373, %rs482; 2026-02-21T09:23:32.8863503Z cvt.f32.bf16 %r7374, %rs489; 2026-02-21T09:23:32.8863567Z cvt.f32.bf16 %r7375, 
%rs490; 2026-02-21T09:23:32.8863633Z cvt.f32.bf16 %r7504, %rs497; 2026-02-21T09:23:32.8863702Z cvt.f32.bf16 %r7505, %rs498; 2026-02-21T09:23:32.8863767Z cvt.f32.bf16 %r7506, %rs505; 2026-02-21T09:23:32.8863834Z cvt.f32.bf16 %r7507, %rs506; 2026-02-21T09:23:32.8863907Z cvt.f32.bf16 %r7636, %rs451; 2026-02-21T09:23:32.8863972Z cvt.f32.bf16 %r7637, %rs452; 2026-02-21T09:23:32.8864036Z cvt.f32.bf16 %r7638, %rs459; 2026-02-21T09:23:32.8864101Z cvt.f32.bf16 %r7639, %rs460; 2026-02-21T09:23:32.8864174Z cvt.f32.bf16 %r7768, %rs467; 2026-02-21T09:23:32.8864240Z cvt.f32.bf16 %r7769, %rs468; 2026-02-21T09:23:32.8864305Z cvt.f32.bf16 %r7770, %rs475; 2026-02-21T09:23:32.8864374Z cvt.f32.bf16 %r7771, %rs476; 2026-02-21T09:23:32.8864438Z cvt.f32.bf16 %r7900, %rs483; 2026-02-21T09:23:32.8864503Z cvt.f32.bf16 %r7901, %rs484; 2026-02-21T09:23:32.8864568Z cvt.f32.bf16 %r7902, %rs491; 2026-02-21T09:23:32.8864637Z cvt.f32.bf16 %r7903, %rs492; 2026-02-21T09:23:32.8864701Z cvt.f32.bf16 %r8032, %rs499; 2026-02-21T09:23:32.8864768Z cvt.f32.bf16 %r8033, %rs500; 2026-02-21T09:23:32.8864838Z cvt.f32.bf16 %r8034, %rs507; 2026-02-21T09:23:32.8864905Z cvt.f32.bf16 %r8035, %rs508; 2026-02-21T09:23:32.8864973Z cvt.f32.bf16 %r8164, %rs453; 2026-02-21T09:23:32.8865036Z cvt.f32.bf16 %r8165, %rs454; 2026-02-21T09:23:32.8865113Z cvt.f32.bf16 %r8166, %rs461; 2026-02-21T09:23:32.8865182Z cvt.f32.bf16 %r8167, %rs462; 2026-02-21T09:23:32.8865248Z cvt.f32.bf16 %r8296, %rs469; 2026-02-21T09:23:32.8865318Z cvt.f32.bf16 %r8297, %rs470; 2026-02-21T09:23:32.8865384Z cvt.f32.bf16 %r8298, %rs477; 2026-02-21T09:23:32.8865449Z cvt.f32.bf16 %r8299, %rs478; 2026-02-21T09:23:32.8865522Z cvt.f32.bf16 %r8428, %rs485; 2026-02-21T09:23:32.8865587Z cvt.f32.bf16 %r8429, %rs486; 2026-02-21T09:23:32.8865664Z cvt.f32.bf16 %r8430, %rs493; 2026-02-21T09:23:32.8865731Z cvt.f32.bf16 %r8431, %rs494; 2026-02-21T09:23:32.8865805Z cvt.f32.bf16 %r8560, %rs501; 2026-02-21T09:23:32.8865871Z cvt.f32.bf16 %r8561, %rs502; 
2026-02-21T09:23:32.8865939Z cvt.f32.bf16 %r8562, %rs509; 2026-02-21T09:23:32.8866010Z cvt.f32.bf16 %r8563, %rs510; 2026-02-21T09:23:32.8866074Z cvt.f32.bf16 %r8692, %rs455; 2026-02-21T09:23:32.8866243Z cvt.f32.bf16 %r8693, %rs456; 2026-02-21T09:23:32.8866307Z cvt.f32.bf16 %r8694, %rs463; 2026-02-21T09:23:32.8866379Z cvt.f32.bf16 %r8695, %rs464; 2026-02-21T09:23:32.8866443Z cvt.f32.bf16 %r8824, %rs471; 2026-02-21T09:23:32.8866622Z cvt.f32.bf16 %r8825, %rs472; 2026-02-21T09:23:32.8866696Z cvt.f32.bf16 %r8826, %rs479; 2026-02-21T09:23:32.8866764Z cvt.f32.bf16 %r8827, %rs480; 2026-02-21T09:23:32.8866831Z cvt.f32.bf16 %r8956, %rs487; 2026-02-21T09:23:32.8866897Z cvt.f32.bf16 %r8957, %rs488; 2026-02-21T09:23:32.8866969Z cvt.f32.bf16 %r8958, %rs495; 2026-02-21T09:23:32.8867036Z cvt.f32.bf16 %r8959, %rs496; 2026-02-21T09:23:32.8867101Z cvt.f32.bf16 %r9088, %rs503; 2026-02-21T09:23:32.8867168Z cvt.f32.bf16 %r9089, %rs504; 2026-02-21T09:23:32.8867232Z cvt.f32.bf16 %r9090, %rs511; 2026-02-21T09:23:32.8867383Z cvt.f32.bf16 %r9091, %rs512; 2026-02-21T09:23:32.8867615Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8867687Z shl.b32 %r9404, %r12590, 3; 2026-02-21T09:23:32.8867753Z add.s32 %r6978, %r9488, %r9404; 2026-02-21T09:23:32.8867817Z // begin inline asm 2026-02-21T09:23:32.8867878Z 2026-02-21T09:23:32.8867935Z { 2026-02-21T09:23:32.8868005Z .reg .pred complete; 2026-02-21T09:23:32.8868135Z waitLoop: 2026-02-21T09:23:32.8868367Z mbarrier.try_wait.parity.shared.b64 complete, [%r6978], %r12589; 2026-02-21T09:23:32.8868445Z @!complete bra.uni waitLoop; 2026-02-21T09:23:32.8868500Z } 2026-02-21T09:23:32.8868506Z 2026-02-21T09:23:32.8868573Z // end inline asm 2026-02-21T09:23:32.8868781Z .loc 1 58 33 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:58:33 2026-02-21T09:23:32.8868847Z shl.b32 %r9406, %r12590, 12; 2026-02-21T09:23:32.8868919Z add.s32 %r9408, %r1289, %r9406; 2026-02-21T09:23:32.8869122Z .loc 1 76 58 // 
c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:76:58 2026-02-21T09:23:32.8869192Z add.s32 %r9409, %r9408, %r24; 2026-02-21T09:23:32.8869280Z add.s32 %r9410, %r9408, %r249; 2026-02-21T09:23:32.8869346Z add.s32 %r9411, %r9408, %r250; 2026-02-21T09:23:32.8869410Z add.s32 %r9412, %r9408, %r251; 2026-02-21T09:23:32.8869475Z add.s32 %r9413, %r9408, %r252; 2026-02-21T09:23:32.8869545Z add.s32 %r9414, %r9408, %r253; 2026-02-21T09:23:32.8869610Z add.s32 %r9415, %r9408, %r254; 2026-02-21T09:23:32.8869674Z add.s32 %r9416, %r9408, %r255; 2026-02-21T09:23:32.8869884Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8869954Z ld.shared.s8 %rs513, [%r9409]; 2026-02-21T09:23:32.8870149Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8870225Z shl.b16 %rs514, %rs513, 4; 2026-02-21T09:23:32.8870427Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8870500Z ld.shared.s8 %rs515, [%r9410+128]; 2026-02-21T09:23:32.8870699Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8870774Z shl.b16 %rs516, %rs515, 4; 2026-02-21T09:23:32.8870973Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8871044Z ld.shared.s8 %rs517, [%r9411+256]; 2026-02-21T09:23:32.8871250Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8871315Z shl.b16 %rs518, %rs517, 4; 2026-02-21T09:23:32.8871510Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8871586Z ld.shared.s8 %rs519, [%r9412+384]; 2026-02-21T09:23:32.8871784Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8871852Z shl.b16 %rs520, %rs519, 4; 2026-02-21T09:23:32.8872063Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 
2026-02-21T09:23:32.8872311Z ld.shared.s8 %rs521, [%r9413+512]; 2026-02-21T09:23:32.8872514Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8872582Z shl.b16 %rs522, %rs521, 4; 2026-02-21T09:23:32.8872799Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8872876Z ld.shared.s8 %rs523, [%r9414+640]; 2026-02-21T09:23:32.8873074Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8873147Z shl.b16 %rs524, %rs523, 4; 2026-02-21T09:23:32.8873394Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8873468Z ld.shared.s8 %rs525, [%r9415+768]; 2026-02-21T09:23:32.8873673Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8873743Z shl.b16 %rs526, %rs525, 4; 2026-02-21T09:23:32.8873941Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8874015Z ld.shared.s8 %rs527, [%r9416+896]; 2026-02-21T09:23:32.8874272Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8874341Z shl.b16 %rs528, %rs527, 4; 2026-02-21T09:23:32.8874539Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8874620Z ld.shared.s8 %rs529, [%r9409+1024]; 2026-02-21T09:23:32.8874818Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8874887Z shl.b16 %rs530, %rs529, 4; 2026-02-21T09:23:32.8875090Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8875164Z ld.shared.s8 %rs531, [%r9410+1152]; 2026-02-21T09:23:32.8875368Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8875439Z shl.b16 %rs532, %rs531, 4; 2026-02-21T09:23:32.8875637Z .loc 1 63 25 // 
c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8875708Z ld.shared.s8 %rs533, [%r9411+1280]; 2026-02-21T09:23:32.8875911Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8875978Z shl.b16 %rs534, %rs533, 4; 2026-02-21T09:23:32.8876173Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8876244Z ld.shared.s8 %rs535, [%r9412+1408]; 2026-02-21T09:23:32.8876577Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8876649Z shl.b16 %rs536, %rs535, 4; 2026-02-21T09:23:32.8876853Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8876936Z ld.shared.s8 %rs537, [%r9413+1536]; 2026-02-21T09:23:32.8877149Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8877215Z shl.b16 %rs538, %rs537, 4; 2026-02-21T09:23:32.8877424Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8877496Z ld.shared.s8 %rs539, [%r9414+1664]; 2026-02-21T09:23:32.8877698Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8877769Z shl.b16 %rs540, %rs539, 4; 2026-02-21T09:23:32.8877968Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8878038Z ld.shared.s8 %rs541, [%r9415+1792]; 2026-02-21T09:23:32.8878242Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8878456Z shl.b16 %rs542, %rs541, 4; 2026-02-21T09:23:32.8878656Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8878729Z ld.shared.s8 %rs543, [%r9416+1920]; 2026-02-21T09:23:32.8878935Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8879002Z shl.b16 %rs544, %rs543, 4; 2026-02-21T09:23:32.8879200Z .loc 
1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8879278Z ld.shared.s8 %rs545, [%r9409+2048]; 2026-02-21T09:23:32.8879475Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8879618Z shl.b16 %rs546, %rs545, 4; 2026-02-21T09:23:32.8879825Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8879899Z ld.shared.s8 %rs547, [%r9410+2176]; 2026-02-21T09:23:32.8880096Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8880167Z shl.b16 %rs548, %rs547, 4; 2026-02-21T09:23:32.8880426Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8880500Z ld.shared.s8 %rs549, [%r9411+2304]; 2026-02-21T09:23:32.8880697Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8880767Z shl.b16 %rs550, %rs549, 4; 2026-02-21T09:23:32.8880983Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8881056Z ld.shared.s8 %rs551, [%r9412+2432]; 2026-02-21T09:23:32.8881255Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8881322Z shl.b16 %rs552, %rs551, 4; 2026-02-21T09:23:32.8881521Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8881595Z ld.shared.s8 %rs553, [%r9413+2560]; 2026-02-21T09:23:32.8881798Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8881866Z shl.b16 %rs554, %rs553, 4; 2026-02-21T09:23:32.8882068Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8882141Z ld.shared.s8 %rs555, [%r9414+2688]; 2026-02-21T09:23:32.8882340Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8882406Z shl.b16 %rs556, %rs555, 4; 
2026-02-21T09:23:32.8882609Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8882679Z ld.shared.s8 %rs557, [%r9415+2816]; 2026-02-21T09:23:32.8882874Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8882947Z shl.b16 %rs558, %rs557, 4; 2026-02-21T09:23:32.8883142Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8883213Z ld.shared.s8 %rs559, [%r9416+2944]; 2026-02-21T09:23:32.8883414Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8883491Z shl.b16 %rs560, %rs559, 4; 2026-02-21T09:23:32.8883691Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8883767Z ld.shared.s8 %rs561, [%r9409+3072]; 2026-02-21T09:23:32.8883963Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8884027Z shl.b16 %rs562, %rs561, 4; 2026-02-21T09:23:32.8884224Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8884407Z ld.shared.s8 %rs563, [%r9410+3200]; 2026-02-21T09:23:32.8884605Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8884669Z shl.b16 %rs564, %rs563, 4; 2026-02-21T09:23:32.8884873Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8884945Z ld.shared.s8 %rs565, [%r9411+3328]; 2026-02-21T09:23:32.8885140Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8885213Z shl.b16 %rs566, %rs565, 4; 2026-02-21T09:23:32.8885410Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8885534Z ld.shared.s8 %rs567, [%r9412+3456]; 2026-02-21T09:23:32.8885741Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8885809Z shl.b16 %rs568, 
%rs567, 4; 2026-02-21T09:23:32.8886006Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8886077Z ld.shared.s8 %rs569, [%r9413+3584]; 2026-02-21T09:23:32.8886325Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8886392Z shl.b16 %rs570, %rs569, 4; 2026-02-21T09:23:32.8886719Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8886801Z ld.shared.s8 %rs571, [%r9414+3712]; 2026-02-21T09:23:32.8887002Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8887067Z shl.b16 %rs572, %rs571, 4; 2026-02-21T09:23:32.8887271Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8887341Z ld.shared.s8 %rs573, [%r9415+3840]; 2026-02-21T09:23:32.8887542Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8887616Z shl.b16 %rs574, %rs573, 4; 2026-02-21T09:23:32.8887817Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8887893Z ld.shared.s8 %rs575, [%r9416+3968]; 2026-02-21T09:23:32.8888100Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.8888167Z shl.b16 %rs576, %rs575, 4; 2026-02-21T09:23:32.8888366Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8888433Z cvt.s16.s8 %rs577, %rs514; 2026-02-21T09:23:32.8888515Z shr.s16 %rs578, %rs577, 4; 2026-02-21T09:23:32.8888585Z cvt.s16.s8 %rs579, %rs516; 2026-02-21T09:23:32.8888650Z shr.s16 %rs580, %rs579, 4; 2026-02-21T09:23:32.8888718Z shr.s16 %rs581, %rs513, 4; 2026-02-21T09:23:32.8888786Z shr.s16 %rs582, %rs515, 4; 2026-02-21T09:23:32.8888986Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8889056Z cvt.rn.f32.s16 %r9417, %rs582; 2026-02-21T09:23:32.8889133Z 
cvt.rn.f32.s16 %r9418, %rs581; 2026-02-21T09:23:32.8889202Z cvt.rn.f32.s16 %r9419, %rs580; 2026-02-21T09:23:32.8889268Z cvt.rn.f32.s16 %r9420, %rs578; 2026-02-21T09:23:32.8889471Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8889536Z cvt.s16.s8 %rs583, %rs518; 2026-02-21T09:23:32.8889606Z shr.s16 %rs584, %rs583, 4; 2026-02-21T09:23:32.8889680Z cvt.s16.s8 %rs585, %rs520; 2026-02-21T09:23:32.8889745Z shr.s16 %rs586, %rs585, 4; 2026-02-21T09:23:32.8889812Z shr.s16 %rs587, %rs517, 4; 2026-02-21T09:23:32.8889876Z shr.s16 %rs588, %rs519, 4; 2026-02-21T09:23:32.8890082Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8890326Z cvt.rn.f32.s16 %r9421, %rs588; 2026-02-21T09:23:32.8890395Z cvt.rn.f32.s16 %r9422, %rs587; 2026-02-21T09:23:32.8890466Z cvt.rn.f32.s16 %r9423, %rs586; 2026-02-21T09:23:32.8890532Z cvt.rn.f32.s16 %r9424, %rs584; 2026-02-21T09:23:32.8890734Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8890807Z cvt.s16.s8 %rs589, %rs522; 2026-02-21T09:23:32.8890873Z shr.s16 %rs590, %rs589, 4; 2026-02-21T09:23:32.8890941Z cvt.s16.s8 %rs591, %rs524; 2026-02-21T09:23:32.8891008Z shr.s16 %rs592, %rs591, 4; 2026-02-21T09:23:32.8891079Z shr.s16 %rs593, %rs521, 4; 2026-02-21T09:23:32.8891146Z shr.s16 %rs594, %rs523, 4; 2026-02-21T09:23:32.8891409Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8891486Z cvt.rn.f32.s16 %r9425, %rs594; 2026-02-21T09:23:32.8891553Z cvt.rn.f32.s16 %r9426, %rs593; 2026-02-21T09:23:32.8891619Z cvt.rn.f32.s16 %r9427, %rs592; 2026-02-21T09:23:32.8891688Z cvt.rn.f32.s16 %r9428, %rs590; 2026-02-21T09:23:32.8891907Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8891974Z cvt.s16.s8 %rs595, %rs526; 2026-02-21T09:23:32.8892117Z shr.s16 %rs596, %rs595, 4; 2026-02-21T09:23:32.8892189Z cvt.s16.s8 %rs597, 
%rs528; 2026-02-21T09:23:32.8892252Z shr.s16 %rs598, %rs597, 4; 2026-02-21T09:23:32.8892315Z shr.s16 %rs599, %rs525, 4; 2026-02-21T09:23:32.8892389Z shr.s16 %rs600, %rs527, 4; 2026-02-21T09:23:32.8892599Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8892668Z cvt.rn.f32.s16 %r9429, %rs600; 2026-02-21T09:23:32.8892738Z cvt.rn.f32.s16 %r9430, %rs599; 2026-02-21T09:23:32.8892809Z cvt.rn.f32.s16 %r9431, %rs598; 2026-02-21T09:23:32.8892875Z cvt.rn.f32.s16 %r9432, %rs596; 2026-02-21T09:23:32.8893074Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8893148Z cvt.s16.s8 %rs601, %rs530; 2026-02-21T09:23:32.8893212Z shr.s16 %rs602, %rs601, 4; 2026-02-21T09:23:32.8893278Z cvt.s16.s8 %rs603, %rs532; 2026-02-21T09:23:32.8893342Z shr.s16 %rs604, %rs603, 4; 2026-02-21T09:23:32.8893411Z shr.s16 %rs605, %rs529, 4; 2026-02-21T09:23:32.8893473Z shr.s16 %rs606, %rs531, 4; 2026-02-21T09:23:32.8893672Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8893746Z cvt.rn.f32.s16 %r9433, %rs606; 2026-02-21T09:23:32.8893812Z cvt.rn.f32.s16 %r9434, %rs605; 2026-02-21T09:23:32.8893878Z cvt.rn.f32.s16 %r9435, %rs604; 2026-02-21T09:23:32.8893946Z cvt.rn.f32.s16 %r9436, %rs602; 2026-02-21T09:23:32.8894145Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8894210Z cvt.s16.s8 %rs607, %rs534; 2026-02-21T09:23:32.8894273Z shr.s16 %rs608, %rs607, 4; 2026-02-21T09:23:32.8894344Z cvt.s16.s8 %rs609, %rs536; 2026-02-21T09:23:32.8894408Z shr.s16 %rs610, %rs609, 4; 2026-02-21T09:23:32.8894471Z shr.s16 %rs611, %rs533, 4; 2026-02-21T09:23:32.8894541Z shr.s16 %rs612, %rs535, 4; 2026-02-21T09:23:32.8894740Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8894808Z cvt.rn.f32.s16 %r9437, %rs612; 2026-02-21T09:23:32.8894882Z cvt.rn.f32.s16 %r9438, %rs611; 
2026-02-21T09:23:32.8894957Z cvt.rn.f32.s16 %r9439, %rs610; 2026-02-21T09:23:32.8895023Z cvt.rn.f32.s16 %r9440, %rs608; 2026-02-21T09:23:32.8895223Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8895294Z cvt.s16.s8 %rs613, %rs538; 2026-02-21T09:23:32.8895359Z shr.s16 %rs614, %rs613, 4; 2026-02-21T09:23:32.8895425Z cvt.s16.s8 %rs615, %rs540; 2026-02-21T09:23:32.8895494Z shr.s16 %rs616, %rs615, 4; 2026-02-21T09:23:32.8895559Z shr.s16 %rs617, %rs537, 4; 2026-02-21T09:23:32.8895735Z shr.s16 %rs618, %rs539, 4; 2026-02-21T09:23:32.8895934Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8896009Z cvt.rn.f32.s16 %r9441, %rs618; 2026-02-21T09:23:32.8896078Z cvt.rn.f32.s16 %r9442, %rs617; 2026-02-21T09:23:32.8896143Z cvt.rn.f32.s16 %r9443, %rs616; 2026-02-21T09:23:32.8896217Z cvt.rn.f32.s16 %r9444, %rs614; 2026-02-21T09:23:32.8896415Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8896602Z cvt.s16.s8 %rs619, %rs542; 2026-02-21T09:23:32.8896692Z shr.s16 %rs620, %rs619, 4; 2026-02-21T09:23:32.8896758Z cvt.s16.s8 %rs621, %rs544; 2026-02-21T09:23:32.8896823Z shr.s16 %rs622, %rs621, 4; 2026-02-21T09:23:32.8896965Z shr.s16 %rs623, %rs541, 4; 2026-02-21T09:23:32.8897039Z shr.s16 %rs624, %rs543, 4; 2026-02-21T09:23:32.8897246Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8897332Z cvt.rn.f32.s16 %r9445, %rs624; 2026-02-21T09:23:32.8897405Z cvt.rn.f32.s16 %r9446, %rs623; 2026-02-21T09:23:32.8897473Z cvt.rn.f32.s16 %r9447, %rs622; 2026-02-21T09:23:32.8897540Z cvt.rn.f32.s16 %r9448, %rs620; 2026-02-21T09:23:32.8897807Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8897889Z cvt.s16.s8 %rs625, %rs546; 2026-02-21T09:23:32.8897955Z shr.s16 %rs626, %rs625, 4; 2026-02-21T09:23:32.8898021Z cvt.s16.s8 %rs627, %rs548; 
2026-02-21T09:23:32.8898090Z shr.s16 %rs628, %rs627, 4; 2026-02-21T09:23:32.8898155Z shr.s16 %rs629, %rs545, 4; 2026-02-21T09:23:32.8898219Z shr.s16 %rs630, %rs547, 4; 2026-02-21T09:23:32.8898429Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8898496Z cvt.rn.f32.s16 %r9449, %rs630; 2026-02-21T09:23:32.8898562Z cvt.rn.f32.s16 %r9450, %rs629; 2026-02-21T09:23:32.8898632Z cvt.rn.f32.s16 %r9451, %rs628; 2026-02-21T09:23:32.8898704Z cvt.rn.f32.s16 %r9452, %rs626; 2026-02-21T09:23:32.8898904Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8898971Z cvt.s16.s8 %rs631, %rs550; 2026-02-21T09:23:32.8899041Z shr.s16 %rs632, %rs631, 4; 2026-02-21T09:23:32.8899110Z cvt.s16.s8 %rs633, %rs552; 2026-02-21T09:23:32.8899176Z shr.s16 %rs634, %rs633, 4; 2026-02-21T09:23:32.8899258Z shr.s16 %rs635, %rs549, 4; 2026-02-21T09:23:32.8899325Z shr.s16 %rs636, %rs551, 4; 2026-02-21T09:23:32.8899521Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8899590Z cvt.rn.f32.s16 %r9453, %rs636; 2026-02-21T09:23:32.8899664Z cvt.rn.f32.s16 %r9454, %rs635; 2026-02-21T09:23:32.8899739Z cvt.rn.f32.s16 %r9455, %rs634; 2026-02-21T09:23:32.8899807Z cvt.rn.f32.s16 %r9456, %rs632; 2026-02-21T09:23:32.8900011Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8900082Z cvt.s16.s8 %rs637, %rs554; 2026-02-21T09:23:32.8900147Z shr.s16 %rs638, %rs637, 4; 2026-02-21T09:23:32.8900212Z cvt.s16.s8 %rs639, %rs556; 2026-02-21T09:23:32.8900285Z shr.s16 %rs640, %rs639, 4; 2026-02-21T09:23:32.8900349Z shr.s16 %rs641, %rs553, 4; 2026-02-21T09:23:32.8900413Z shr.s16 %rs642, %rs555, 4; 2026-02-21T09:23:32.8900618Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8900686Z cvt.rn.f32.s16 %r9457, %rs642; 2026-02-21T09:23:32.8900754Z cvt.rn.f32.s16 %r9458, %rs641; 
2026-02-21T09:23:32.8900824Z cvt.rn.f32.s16 %r9459, %rs640; 2026-02-21T09:23:32.8900892Z cvt.rn.f32.s16 %r9460, %rs638; 2026-02-21T09:23:32.8901094Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8901161Z cvt.s16.s8 %rs643, %rs558; 2026-02-21T09:23:32.8901379Z shr.s16 %rs644, %rs643, 4; 2026-02-21T09:23:32.8901445Z cvt.s16.s8 %rs645, %rs560; 2026-02-21T09:23:32.8901510Z shr.s16 %rs646, %rs645, 4; 2026-02-21T09:23:32.8901591Z shr.s16 %rs647, %rs557, 4; 2026-02-21T09:23:32.8901658Z shr.s16 %rs648, %rs559, 4; 2026-02-21T09:23:32.8901862Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8901934Z cvt.rn.f32.s16 %r9461, %rs648; 2026-02-21T09:23:32.8902001Z cvt.rn.f32.s16 %r9462, %rs647; 2026-02-21T09:23:32.8902068Z cvt.rn.f32.s16 %r9463, %rs646; 2026-02-21T09:23:32.8902134Z cvt.rn.f32.s16 %r9464, %rs644; 2026-02-21T09:23:32.8902338Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8902454Z cvt.s16.s8 %rs649, %rs562; 2026-02-21T09:23:32.8902522Z shr.s16 %rs650, %rs649, 4; 2026-02-21T09:23:32.8902591Z cvt.s16.s8 %rs651, %rs564; 2026-02-21T09:23:32.8902653Z shr.s16 %rs652, %rs651, 4; 2026-02-21T09:23:32.8902721Z shr.s16 %rs653, %rs561, 4; 2026-02-21T09:23:32.8902795Z shr.s16 %rs654, %rs563, 4; 2026-02-21T09:23:32.8903003Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8903119Z cvt.rn.f32.s16 %r9465, %rs654; 2026-02-21T09:23:32.8903189Z cvt.rn.f32.s16 %r9466, %rs653; 2026-02-21T09:23:32.8903260Z cvt.rn.f32.s16 %r9467, %rs652; 2026-02-21T09:23:32.8903324Z cvt.rn.f32.s16 %r9468, %rs650; 2026-02-21T09:23:32.8903521Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8903591Z cvt.s16.s8 %rs655, %rs566; 2026-02-21T09:23:32.8903655Z shr.s16 %rs656, %rs655, 4; 2026-02-21T09:23:32.8903719Z cvt.s16.s8 %rs657, %rs568; 
2026-02-21T09:23:32.8903787Z shr.s16 %rs658, %rs657, 4; 2026-02-21T09:23:32.8903864Z shr.s16 %rs659, %rs565, 4; 2026-02-21T09:23:32.8903933Z shr.s16 %rs660, %rs567, 4; 2026-02-21T09:23:32.8904133Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8904209Z cvt.rn.f32.s16 %r9469, %rs660; 2026-02-21T09:23:32.8904276Z cvt.rn.f32.s16 %r9470, %rs659; 2026-02-21T09:23:32.8904342Z cvt.rn.f32.s16 %r9471, %rs658; 2026-02-21T09:23:32.8904411Z cvt.rn.f32.s16 %r9472, %rs656; 2026-02-21T09:23:32.8904617Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8904683Z cvt.s16.s8 %rs661, %rs570; 2026-02-21T09:23:32.8904747Z shr.s16 %rs662, %rs661, 4; 2026-02-21T09:23:32.8904817Z cvt.s16.s8 %rs663, %rs572; 2026-02-21T09:23:32.8904883Z shr.s16 %rs664, %rs663, 4; 2026-02-21T09:23:32.8904946Z shr.s16 %rs665, %rs569, 4; 2026-02-21T09:23:32.8905016Z shr.s16 %rs666, %rs571, 4; 2026-02-21T09:23:32.8905215Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8905282Z cvt.rn.f32.s16 %r9473, %rs666; 2026-02-21T09:23:32.8905353Z cvt.rn.f32.s16 %r9474, %rs665; 2026-02-21T09:23:32.8905424Z cvt.rn.f32.s16 %r9475, %rs664; 2026-02-21T09:23:32.8905490Z cvt.rn.f32.s16 %r9476, %rs662; 2026-02-21T09:23:32.8905690Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.8905762Z cvt.s16.s8 %rs667, %rs574; 2026-02-21T09:23:32.8905826Z shr.s16 %rs668, %rs667, 4; 2026-02-21T09:23:32.8905890Z cvt.s16.s8 %rs669, %rs576; 2026-02-21T09:23:32.8905960Z shr.s16 %rs670, %rs669, 4; 2026-02-21T09:23:32.8906027Z shr.s16 %rs671, %rs573, 4; 2026-02-21T09:23:32.8906093Z shr.s16 %rs672, %rs575, 4; 2026-02-21T09:23:32.8906293Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.8906368Z cvt.rn.f32.s16 %r9477, %rs672; 2026-02-21T09:23:32.8906434Z cvt.rn.f32.s16 %r9478, %rs671; 
2026-02-21T09:23:32.8906621Z cvt.rn.f32.s16 %r9479, %rs670; 2026-02-21T09:23:32.8906697Z cvt.rn.f32.s16 %r9480, %rs668; 2026-02-21T09:23:32.8906959Z st.shared.v4.b32 [%r66], {%r9420, %r9418, %r9419, %r9417}; 2026-02-21T09:23:32.8907088Z st.shared.v4.b32 [%r66+16384], {%r9452, %r9450, %r9451, %r9449}; 2026-02-21T09:23:32.8907197Z st.shared.v4.b32 [%r67], {%r9424, %r9422, %r9423, %r9421}; 2026-02-21T09:23:32.8907338Z st.shared.v4.b32 [%r67+16384], {%r9456, %r9454, %r9455, %r9453}; 2026-02-21T09:23:32.8907449Z st.shared.v4.b32 [%r68], {%r9428, %r9426, %r9427, %r9425}; 2026-02-21T09:23:32.8907570Z st.shared.v4.b32 [%r68+16384], {%r9460, %r9458, %r9459, %r9457}; 2026-02-21T09:23:32.8907685Z st.shared.v4.b32 [%r69], {%r9432, %r9430, %r9431, %r9429}; 2026-02-21T09:23:32.8907802Z st.shared.v4.b32 [%r69+16384], {%r9464, %r9462, %r9463, %r9461}; 2026-02-21T09:23:32.8907972Z st.shared.v4.b32 [%r70], {%r9436, %r9434, %r9435, %r9433}; 2026-02-21T09:23:32.8908112Z st.shared.v4.b32 [%r70+16384], {%r9468, %r9466, %r9467, %r9465}; 2026-02-21T09:23:32.8908304Z st.shared.v4.b32 [%r71], {%r9440, %r9438, %r9439, %r9437}; 2026-02-21T09:23:32.8908432Z st.shared.v4.b32 [%r71+16384], {%r9472, %r9470, %r9471, %r9469}; 2026-02-21T09:23:32.8908546Z st.shared.v4.b32 [%r72], {%r9444, %r9442, %r9443, %r9441}; 2026-02-21T09:23:32.8908664Z st.shared.v4.b32 [%r72+16384], {%r9476, %r9474, %r9475, %r9473}; 2026-02-21T09:23:32.8908842Z st.shared.v4.b32 [%r73], {%r9448, %r9446, %r9447, %r9445}; 2026-02-21T09:23:32.8908963Z st.shared.v4.b32 [%r73+16384], {%r9480, %r9478, %r9479, %r9477}; 2026-02-21T09:23:32.8909038Z $L__tmp5: 2026-02-21T09:23:32.8909325Z .loc 2 291 36 // standard.py:291:36 @[ c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:88:40 ] 2026-02-21T09:23:32.8909391Z // begin inline asm 2026-02-21T09:23:32.8916787Z fence.proxy.async.shared::cta; 2026-02-21T09:23:32.8916901Z // end inline asm 2026-02-21T09:23:32.8916978Z bar.sync 0; 2026-02-21T09:23:32.8917059Z wgmma.fence.sync.aligned; 
2026-02-21T09:23:32.8917133Z mov.pred %p127, -1; 2026-02-21T09:23:32.8917203Z // begin inline asm 2026-02-21T09:23:32.8918720Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12592,%r12593,%r12594,%r12595,%r12596,%r12597,%r12598,%r12599,%r12600,%r12601,%r12602,%r12603,%r12604,%r12605,%r12606,%r12607,%r12608,%r12609,%r12610,%r12611,%r12612,%r12613,%r12614,%r12615,%r12616,%r12617,%r12618,%r12619,%r12620,%r12621,%r12622,%r12623,%r12624,%r12625,%r12626,%r12627,%r12628,%r12629,%r12630,%r12631,%r12632,%r12633,%r12634,%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655}, {%r7108,%r7109,%r7110,%r7111}, %rd702, %p127, 1, 1; 2026-02-21T09:23:32.8918784Z // end inline asm 2026-02-21T09:23:32.8918852Z // begin inline asm 2026-02-21T09:23:32.8920331Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12592,%r12593,%r12594,%r12595,%r12596,%r12597,%r12598,%r12599,%r12600,%r12601,%r12602,%r12603,%r12604,%r12605,%r12606,%r12607,%r12608,%r12609,%r12610,%r12611,%r12612,%r12613,%r12614,%r12615,%r12616,%r12617,%r12618,%r12619,%r12620,%r12621,%r12622,%r12623,%r12624,%r12625,%r12626,%r12627,%r12628,%r12629,%r12630,%r12631,%r12632,%r12633,%r12634,%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655}, {%r7240,%r7241,%r7242,%r7243}, %rd703, %p127, 1, 1; 2026-02-21T09:23:32.8920397Z // end inline asm 2026-02-21T09:23:32.8920459Z // begin inline asm 2026-02-21T09:23:32.8921921Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12592,%r12593,%r12594,%r12595,%r12596,%r12597,%r12598,%r12599,%r12600,%r12601,%r12602,%r12603,%r12604,%r12605,%r12606,%r12607,%r12608,%r12609,%r12610,%r12611,%r12612,%r12613,%r12614,%r12615,%r12616,%r12617,%r12618,%r12619,%r12620,%r12621,%r12622,%r12623,%r12624,%r12625,%r12626,%r12627,%r12628,%r12629,%r12630,%r12631,%r12632,%r12633,%r12634,%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655}, {%r7372,%r7373,%r7374,%r7375}, %rd704, %p127, 1, 1; 2026-02-21T09:23:32.8922208Z // end inline asm 2026-02-21T09:23:32.8922268Z // begin inline asm 2026-02-21T09:23:32.8923821Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12592,%r12593,%r12594,%r12595,%r12596,%r12597,%r12598,%r12599,%r12600,%r12601,%r12602,%r12603,%r12604,%r12605,%r12606,%r12607,%r12608,%r12609,%r12610,%r12611,%r12612,%r12613,%r12614,%r12615,%r12616,%r12617,%r12618,%r12619,%r12620,%r12621,%r12622,%r12623,%r12624,%r12625,%r12626,%r12627,%r12628,%r12629,%r12630,%r12631,%r12632,%r12633,%r12634,%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655}, {%r7504,%r7505,%r7506,%r7507}, %rd705, %p127, 1, 1; 2026-02-21T09:23:32.8923892Z // end inline asm 2026-02-21T09:23:32.8923955Z // begin inline asm 2026-02-21T09:23:32.8925478Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12592,%r12593,%r12594,%r12595,%r12596,%r12597,%r12598,%r12599,%r12600,%r12601,%r12602,%r12603,%r12604,%r12605,%r12606,%r12607,%r12608,%r12609,%r12610,%r12611,%r12612,%r12613,%r12614,%r12615,%r12616,%r12617,%r12618,%r12619,%r12620,%r12621,%r12622,%r12623,%r12624,%r12625,%r12626,%r12627,%r12628,%r12629,%r12630,%r12631,%r12632,%r12633,%r12634,%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655}, {%r7636,%r7637,%r7638,%r7639}, %rd706, %p127, 1, 1; 2026-02-21T09:23:32.8925544Z // end inline asm 2026-02-21T09:23:32.8925608Z // begin inline asm 2026-02-21T09:23:32.8927201Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12592,%r12593,%r12594,%r12595,%r12596,%r12597,%r12598,%r12599,%r12600,%r12601,%r12602,%r12603,%r12604,%r12605,%r12606,%r12607,%r12608,%r12609,%r12610,%r12611,%r12612,%r12613,%r12614,%r12615,%r12616,%r12617,%r12618,%r12619,%r12620,%r12621,%r12622,%r12623,%r12624,%r12625,%r12626,%r12627,%r12628,%r12629,%r12630,%r12631,%r12632,%r12633,%r12634,%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655}, {%r7768,%r7769,%r7770,%r7771}, %rd707, %p127, 1, 1; 2026-02-21T09:23:32.8927272Z // end inline asm 2026-02-21T09:23:32.8927337Z // begin inline asm 2026-02-21T09:23:32.8928808Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12592,%r12593,%r12594,%r12595,%r12596,%r12597,%r12598,%r12599,%r12600,%r12601,%r12602,%r12603,%r12604,%r12605,%r12606,%r12607,%r12608,%r12609,%r12610,%r12611,%r12612,%r12613,%r12614,%r12615,%r12616,%r12617,%r12618,%r12619,%r12620,%r12621,%r12622,%r12623,%r12624,%r12625,%r12626,%r12627,%r12628,%r12629,%r12630,%r12631,%r12632,%r12633,%r12634,%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655}, {%r7900,%r7901,%r7902,%r7903}, %rd708, %p127, 1, 1; 2026-02-21T09:23:32.8928874Z // end inline asm 2026-02-21T09:23:32.8928935Z // begin inline asm 2026-02-21T09:23:32.8930396Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12592,%r12593,%r12594,%r12595,%r12596,%r12597,%r12598,%r12599,%r12600,%r12601,%r12602,%r12603,%r12604,%r12605,%r12606,%r12607,%r12608,%r12609,%r12610,%r12611,%r12612,%r12613,%r12614,%r12615,%r12616,%r12617,%r12618,%r12619,%r12620,%r12621,%r12622,%r12623,%r12624,%r12625,%r12626,%r12627,%r12628,%r12629,%r12630,%r12631,%r12632,%r12633,%r12634,%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655}, {%r8032,%r8033,%r8034,%r8035}, %rd709, %p127, 1, 1; 2026-02-21T09:23:32.8930454Z // end inline asm 2026-02-21T09:23:32.8930519Z // begin inline asm 2026-02-21T09:23:32.8931971Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679,%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698,%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719}, {%r8164,%r8165,%r8166,%r8167}, %rd702, %p127, 1, 1; 2026-02-21T09:23:32.8932181Z // end inline asm 2026-02-21T09:23:32.8932245Z // begin inline asm 2026-02-21T09:23:32.8933763Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679,%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698,%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719}, {%r8296,%r8297,%r8298,%r8299}, %rd703, %p127, 1, 1; 2026-02-21T09:23:32.8933836Z // end inline asm 2026-02-21T09:23:32.8933954Z // begin inline asm 2026-02-21T09:23:32.8935444Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679,%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698,%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719}, {%r8428,%r8429,%r8430,%r8431}, %rd704, %p127, 1, 1; 2026-02-21T09:23:32.8935510Z // end inline asm 2026-02-21T09:23:32.8935577Z // begin inline asm 2026-02-21T09:23:32.8937186Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679,%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698,%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719}, {%r8560,%r8561,%r8562,%r8563}, %rd705, %p127, 1, 1; 2026-02-21T09:23:32.8937249Z // end inline asm 2026-02-21T09:23:32.8937322Z // begin inline asm 2026-02-21T09:23:32.8938812Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679,%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698,%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719}, {%r8692,%r8693,%r8694,%r8695}, %rd706, %p127, 1, 1; 2026-02-21T09:23:32.8938875Z // end inline asm 2026-02-21T09:23:32.8938933Z // begin inline asm 2026-02-21T09:23:32.8940424Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679,%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698,%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719}, {%r8824,%r8825,%r8826,%r8827}, %rd707, %p127, 1, 1; 2026-02-21T09:23:32.8940618Z // end inline asm 2026-02-21T09:23:32.8940683Z // begin inline asm 2026-02-21T09:23:32.8942219Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679,%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698,%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719}, {%r8956,%r8957,%r8958,%r8959}, %rd708, %p127, 1, 1; 2026-02-21T09:23:32.8942281Z // end inline asm 2026-02-21T09:23:32.8942347Z // begin inline asm 2026-02-21T09:23:32.8943888Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679,%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698,%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719}, {%r9088,%r9089,%r9090,%r9091}, %rd709, %p127, 1, 1; 2026-02-21T09:23:32.8943961Z // end inline asm 2026-02-21T09:23:32.8944051Z wgmma.commit_group.sync.aligned; 2026-02-21T09:23:32.8944116Z mov.b32 %r9222, %r6763; 2026-02-21T09:23:32.8944182Z mov.b32 %r9220, %r1221; 2026-02-21T09:23:32.8944243Z mov.b32 %r9221, %r6763; 2026-02-21T09:23:32.8944302Z // begin inline asm 2026-02-21T09:23:32.8946953Z // wait for regs: 
%r12592,%r12593,%r12594,%r12595,%r12596,%r12597,%r12598,%r12599,%r12600,%r12601,%r12602,%r12603,%r12604,%r12605,%r12606,%r12607,%r12608,%r12609,%r12610,%r12611,%r12612,%r12613,%r12614,%r12615,%r12616,%r12617,%r12618,%r12619,%r12620,%r12621,%r12622,%r12623,%r12624,%r12625,%r12626,%r12627,%r12628,%r12629,%r12630,%r12631,%r12632,%r12633,%r12634,%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655,%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679,%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698,%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719,%r9220,%r9221,%r9222 2026-02-21T09:23:32.8947047Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:23:32.8947107Z // end inline asm 2026-02-21T09:23:32.8947169Z $L__tmp6: 2026-02-21T09:23:32.8947392Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8947459Z add.s32 %r9481, %r12591, 1; 2026-02-21T09:23:32.8947534Z setp.gt.s32 %p149, %r9481, 1; 2026-02-21T09:23:32.8947607Z selp.b32 %r12591, 0, %r9481, %p149; 2026-02-21T09:23:32.8947814Z .loc 1 52 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:32 2026-02-21T09:23:32.8947887Z add.s64 %rd569, %rd740, %rd68; 2026-02-21T09:23:32.8947965Z add.s64 %rd570, %rd740, %rd67; 2026-02-21T09:23:32.8948032Z add.s64 %rd571, %rd740, %rd66; 2026-02-21T09:23:32.8948097Z add.s64 %rd572, %rd740, %rd65; 2026-02-21T09:23:32.8948166Z add.s64 %rd573, %rd740, %rd64; 2026-02-21T09:23:32.8948294Z add.s64 %rd574, %rd740, %rd63; 2026-02-21T09:23:32.8948373Z add.s64 %rd575, 
%rd740, %rd62; 2026-02-21T09:23:32.8948603Z add.s64 %rd576, %rd740, %rd61; 2026-02-21T09:23:32.8948667Z add.s64 %rd577, %rd740, %rd60; 2026-02-21T09:23:32.8948732Z add.s64 %rd578, %rd740, %rd59; 2026-02-21T09:23:32.8948797Z add.s64 %rd579, %rd740, %rd58; 2026-02-21T09:23:32.8948868Z add.s64 %rd580, %rd740, %rd57; 2026-02-21T09:23:32.8948936Z add.s64 %rd581, %rd740, %rd56; 2026-02-21T09:23:32.8949001Z add.s64 %rd582, %rd740, %rd55; 2026-02-21T09:23:32.8949072Z add.s64 %rd583, %rd740, %rd54; 2026-02-21T09:23:32.8949277Z .loc 1 52 80 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:80 2026-02-21T09:23:32.8949343Z add.s64 %rd584, %rd740, %rd53; 2026-02-21T09:23:32.8949413Z shl.b32 %r9482, %r12591, 14; 2026-02-21T09:23:32.8949481Z add.s32 %r9483, %r1205, %r9482; 2026-02-21T09:23:32.8949609Z add.s32 %r9354, %r9483, %r25; 2026-02-21T09:23:32.8949679Z selp.b32 %r9355, 8, 0, %p147; 2026-02-21T09:23:32.8949744Z // begin inline asm 2026-02-21T09:23:32.8949905Z cp.async.ca.shared.global [ %r9354 + 0 ], [ %rd569 + 0 ], 0x8, %r9355; 2026-02-21T09:23:32.8949968Z // end inline asm 2026-02-21T09:23:32.8950034Z add.s32 %r9356, %r9354, 1024; 2026-02-21T09:23:32.8950095Z // begin inline asm 2026-02-21T09:23:32.8950309Z cp.async.ca.shared.global [ %r9356 + 0 ], [ %rd570 + 0 ], 0x8, %r9355; 2026-02-21T09:23:32.8950373Z // end inline asm 2026-02-21T09:23:32.8950443Z add.s32 %r9358, %r9354, 2048; 2026-02-21T09:23:32.8950504Z // begin inline asm 2026-02-21T09:23:32.8950644Z cp.async.ca.shared.global [ %r9358 + 0 ], [ %rd571 + 0 ], 0x8, %r9355; 2026-02-21T09:23:32.8950711Z // end inline asm 2026-02-21T09:23:32.8950776Z add.s32 %r9360, %r9354, 3072; 2026-02-21T09:23:32.8950837Z // begin inline asm 2026-02-21T09:23:32.8950984Z cp.async.ca.shared.global [ %r9360 + 0 ], [ %rd572 + 0 ], 0x8, %r9355; 2026-02-21T09:23:32.8951050Z // end inline asm 2026-02-21T09:23:32.8951114Z add.s32 %r9362, %r9354, 4096; 2026-02-21T09:23:32.8951179Z // begin inline asm 2026-02-21T09:23:32.8951320Z 
cp.async.ca.shared.global [ %r9362 + 0 ], [ %rd573 + 0 ], 0x8, %r9355; 2026-02-21T09:23:32.8951385Z // end inline asm 2026-02-21T09:23:32.8951447Z add.s32 %r9364, %r9354, 5120; 2026-02-21T09:23:32.8951514Z // begin inline asm 2026-02-21T09:23:32.8951654Z cp.async.ca.shared.global [ %r9364 + 0 ], [ %rd574 + 0 ], 0x8, %r9355; 2026-02-21T09:23:32.8951712Z // end inline asm 2026-02-21T09:23:32.8951774Z add.s32 %r9366, %r9354, 6144; 2026-02-21T09:23:32.8951845Z // begin inline asm 2026-02-21T09:23:32.8951984Z cp.async.ca.shared.global [ %r9366 + 0 ], [ %rd575 + 0 ], 0x8, %r9355; 2026-02-21T09:23:32.8952043Z // end inline asm 2026-02-21T09:23:32.8952113Z add.s32 %r9368, %r9354, 7168; 2026-02-21T09:23:32.8952175Z // begin inline asm 2026-02-21T09:23:32.8952312Z cp.async.ca.shared.global [ %r9368 + 0 ], [ %rd576 + 0 ], 0x8, %r9355; 2026-02-21T09:23:32.8952373Z // end inline asm 2026-02-21T09:23:32.8952443Z add.s32 %r9370, %r9354, 8192; 2026-02-21T09:23:32.8952515Z // begin inline asm 2026-02-21T09:23:32.8952662Z cp.async.ca.shared.global [ %r9370 + 0 ], [ %rd577 + 0 ], 0x8, %r9355; 2026-02-21T09:23:32.8952727Z // end inline asm 2026-02-21T09:23:32.8952790Z add.s32 %r9372, %r9354, 9216; 2026-02-21T09:23:32.8952853Z // begin inline asm 2026-02-21T09:23:32.8952994Z cp.async.ca.shared.global [ %r9372 + 0 ], [ %rd578 + 0 ], 0x8, %r9355; 2026-02-21T09:23:32.8953053Z // end inline asm 2026-02-21T09:23:32.8953119Z add.s32 %r9374, %r9354, 10240; 2026-02-21T09:23:32.8953180Z // begin inline asm 2026-02-21T09:23:32.8953323Z cp.async.ca.shared.global [ %r9374 + 0 ], [ %rd579 + 0 ], 0x8, %r9355; 2026-02-21T09:23:32.8953381Z // end inline asm 2026-02-21T09:23:32.8953445Z add.s32 %r9376, %r9354, 11264; 2026-02-21T09:23:32.8953512Z // begin inline asm 2026-02-21T09:23:32.8953650Z cp.async.ca.shared.global [ %r9376 + 0 ], [ %rd580 + 0 ], 0x8, %r9355; 2026-02-21T09:23:32.8953708Z // end inline asm 2026-02-21T09:23:32.8953772Z add.s32 %r9378, %r9354, 12288; 2026-02-21T09:23:32.8953837Z // 
begin inline asm 2026-02-21T09:23:32.8954106Z cp.async.ca.shared.global [ %r9378 + 0 ], [ %rd581 + 0 ], 0x8, %r9355; 2026-02-21T09:23:32.8954164Z // end inline asm 2026-02-21T09:23:32.8954231Z add.s32 %r9380, %r9354, 13312; 2026-02-21T09:23:32.8954292Z // begin inline asm 2026-02-21T09:23:32.8954436Z cp.async.ca.shared.global [ %r9380 + 0 ], [ %rd582 + 0 ], 0x8, %r9355; 2026-02-21T09:23:32.8954500Z // end inline asm 2026-02-21T09:23:32.8954564Z add.s32 %r9382, %r9354, 14336; 2026-02-21T09:23:32.8954623Z // begin inline asm 2026-02-21T09:23:32.8954758Z cp.async.ca.shared.global [ %r9382 + 0 ], [ %rd583 + 0 ], 0x8, %r9355; 2026-02-21T09:23:32.8954826Z // end inline asm 2026-02-21T09:23:32.8954889Z add.s32 %r9384, %r9354, 15360; 2026-02-21T09:23:32.8954949Z // begin inline asm 2026-02-21T09:23:32.8955145Z cp.async.ca.shared.global [ %r9384 + 0 ], [ %rd584 + 0 ], 0x8, %r9355; 2026-02-21T09:23:32.8955206Z // end inline asm 2026-02-21T09:23:32.8955276Z cp.async.commit_group; 2026-02-21T09:23:32.8955485Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8955561Z shl.b32 %r9484, %r12591, 3; 2026-02-21T09:23:32.8955629Z add.s32 %r9386, %r9488, %r9484; 2026-02-21T09:23:32.8955703Z and.pred %p143, %p2, %p147; 2026-02-21T09:23:32.8955812Z // begin inline asm 2026-02-21T09:23:32.8955956Z @%p143 mbarrier.arrive.expect_tx.shared.b64 _, [%r9386], 4096; 2026-02-21T09:23:32.8956014Z // end inline asm 2026-02-21T09:23:32.8956224Z .loc 1 58 33 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:58:33 2026-02-21T09:23:32.8956291Z shl.b32 %r9485, %r12591, 12; 2026-02-21T09:23:32.8956361Z add.s32 %r9387, %r1289, %r9485; 2026-02-21T09:23:32.8956420Z bar.sync 0; 2026-02-21T09:23:32.8956625Z elect.sync %r9486|%p150, -1; 2026-02-21T09:23:32.8956704Z and.pred %p151, %p147, %p150; 2026-02-21T09:23:32.8956778Z and.pred %p144, %p1, %p151; 2026-02-21T09:23:32.8956847Z cvt.u32.u64 %r9487, %rd741; 2026-02-21T09:23:32.8956917Z add.s32 %r9389, 
%r9487, 64; 2026-02-21T09:23:32.8956981Z // begin inline asm 2026-02-21T09:23:32.8957321Z @%p144 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r9387], [%rd603, {%r6762, %r9389}], [%r9386]; 2026-02-21T09:23:32.8957387Z // end inline asm 2026-02-21T09:23:32.8957596Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8957665Z add.s64 %rd740, %rd740, 128; 2026-02-21T09:23:32.8957744Z setp.lt.u64 %p152, %rd741, 480; 2026-02-21T09:23:32.8957807Z add.s64 %rd741, %rd741, 32; 2026-02-21T09:23:32.8957870Z @%p152 bra $L__BB0_7; 2026-02-21T09:23:32.8957999Z // %bb.8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:23:32.8958078Z cp.async.wait_group 0; 2026-02-21T09:23:32.8958140Z bar.sync 0; 2026-02-21T09:23:32.8958203Z // begin inline asm 2026-02-21T09:23:32.8958307Z @%p2 mbarrier.inval.shared::cta.b64 [%r9488]; 2026-02-21T09:23:32.8958367Z // end inline asm 2026-02-21T09:23:32.8958429Z bar.sync 0; 2026-02-21T09:23:32.8958495Z // begin inline asm 2026-02-21T09:23:32.8958588Z @%p2 mbarrier.inval.shared::cta.b64 [%r9489]; 2026-02-21T09:23:32.8958649Z // end inline asm 2026-02-21T09:23:32.8958873Z .loc 1 91 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:91:28 2026-02-21T09:23:32.8958972Z cvt.rn.bf16x2.f32 %r9574, %r12593, %r12592; 2026-02-21T09:23:32.8959057Z cvt.rn.bf16x2.f32 %r9575, %r12595, %r12594; 2026-02-21T09:23:32.8959137Z cvt.rn.bf16x2.f32 %r9576, %r12597, %r12596; 2026-02-21T09:23:32.8959220Z cvt.rn.bf16x2.f32 %r9577, %r12599, %r12598; 2026-02-21T09:23:32.8959300Z cvt.rn.bf16x2.f32 %r9578, %r12601, %r12600; 2026-02-21T09:23:32.8959378Z cvt.rn.bf16x2.f32 %r9579, %r12603, %r12602; 2026-02-21T09:23:32.8959467Z cvt.rn.bf16x2.f32 %r9580, %r12605, %r12604; 2026-02-21T09:23:32.8959546Z cvt.rn.bf16x2.f32 %r9581, %r12607, %r12606; 2026-02-21T09:23:32.8959625Z cvt.rn.bf16x2.f32 %r9582, %r12609, %r12608; 2026-02-21T09:23:32.8959870Z cvt.rn.bf16x2.f32 %r9583, %r12611, %r12610; 
2026-02-21T09:23:32.8959951Z cvt.rn.bf16x2.f32 %r9584, %r12613, %r12612; 2026-02-21T09:23:32.8960029Z cvt.rn.bf16x2.f32 %r9585, %r12615, %r12614; 2026-02-21T09:23:32.8960111Z cvt.rn.bf16x2.f32 %r9586, %r12617, %r12616; 2026-02-21T09:23:32.8960197Z cvt.rn.bf16x2.f32 %r9587, %r12619, %r12618; 2026-02-21T09:23:32.8960277Z cvt.rn.bf16x2.f32 %r9588, %r12621, %r12620; 2026-02-21T09:23:32.8960354Z cvt.rn.bf16x2.f32 %r9589, %r12623, %r12622; 2026-02-21T09:23:32.8960441Z cvt.rn.bf16x2.f32 %r9590, %r12625, %r12624; 2026-02-21T09:23:32.8960518Z cvt.rn.bf16x2.f32 %r9591, %r12627, %r12626; 2026-02-21T09:23:32.8960600Z cvt.rn.bf16x2.f32 %r9592, %r12629, %r12628; 2026-02-21T09:23:32.8960684Z cvt.rn.bf16x2.f32 %r9593, %r12631, %r12630; 2026-02-21T09:23:32.8960826Z cvt.rn.bf16x2.f32 %r9594, %r12633, %r12632; 2026-02-21T09:23:32.8960908Z cvt.rn.bf16x2.f32 %r9595, %r12635, %r12634; 2026-02-21T09:23:32.8960991Z cvt.rn.bf16x2.f32 %r9596, %r12637, %r12636; 2026-02-21T09:23:32.8961081Z cvt.rn.bf16x2.f32 %r9597, %r12639, %r12638; 2026-02-21T09:23:32.8961159Z cvt.rn.bf16x2.f32 %r9598, %r12641, %r12640; 2026-02-21T09:23:32.8961237Z cvt.rn.bf16x2.f32 %r9599, %r12643, %r12642; 2026-02-21T09:23:32.8961382Z cvt.rn.bf16x2.f32 %r9600, %r12645, %r12644; 2026-02-21T09:23:32.8961466Z cvt.rn.bf16x2.f32 %r9601, %r12647, %r12646; 2026-02-21T09:23:32.8961547Z cvt.rn.bf16x2.f32 %r9602, %r12649, %r12648; 2026-02-21T09:23:32.8961625Z cvt.rn.bf16x2.f32 %r9603, %r12651, %r12650; 2026-02-21T09:23:32.8961711Z cvt.rn.bf16x2.f32 %r9604, %r12653, %r12652; 2026-02-21T09:23:32.8961789Z cvt.rn.bf16x2.f32 %r9605, %r12655, %r12654; 2026-02-21T09:23:32.8961868Z cvt.rn.bf16x2.f32 %r9606, %r12657, %r12656; 2026-02-21T09:23:32.8961958Z cvt.rn.bf16x2.f32 %r9607, %r12659, %r12658; 2026-02-21T09:23:32.8962037Z cvt.rn.bf16x2.f32 %r9608, %r12661, %r12660; 2026-02-21T09:23:32.8962130Z cvt.rn.bf16x2.f32 %r9609, %r12663, %r12662; 2026-02-21T09:23:32.8962217Z cvt.rn.bf16x2.f32 %r9610, %r12665, %r12664; 2026-02-21T09:23:32.8962295Z 
cvt.rn.bf16x2.f32 %r9611, %r12667, %r12666; 2026-02-21T09:23:32.8962373Z cvt.rn.bf16x2.f32 %r9612, %r12669, %r12668; 2026-02-21T09:23:32.8962449Z cvt.rn.bf16x2.f32 %r9613, %r12671, %r12670; 2026-02-21T09:23:32.8962533Z cvt.rn.bf16x2.f32 %r9614, %r12673, %r12672; 2026-02-21T09:23:32.8962611Z cvt.rn.bf16x2.f32 %r9615, %r12675, %r12674; 2026-02-21T09:23:32.8962688Z cvt.rn.bf16x2.f32 %r9616, %r12677, %r12676; 2026-02-21T09:23:32.8962770Z cvt.rn.bf16x2.f32 %r9617, %r12679, %r12678; 2026-02-21T09:23:32.8962851Z cvt.rn.bf16x2.f32 %r9618, %r12681, %r12680; 2026-02-21T09:23:32.8962929Z cvt.rn.bf16x2.f32 %r9619, %r12683, %r12682; 2026-02-21T09:23:32.8963013Z cvt.rn.bf16x2.f32 %r9620, %r12685, %r12684; 2026-02-21T09:23:32.8963093Z cvt.rn.bf16x2.f32 %r9621, %r12687, %r12686; 2026-02-21T09:23:32.8963173Z cvt.rn.bf16x2.f32 %r9622, %r12689, %r12688; 2026-02-21T09:23:32.8963249Z cvt.rn.bf16x2.f32 %r9623, %r12691, %r12690; 2026-02-21T09:23:32.8963334Z cvt.rn.bf16x2.f32 %r9624, %r12693, %r12692; 2026-02-21T09:23:32.8963411Z cvt.rn.bf16x2.f32 %r9625, %r12695, %r12694; 2026-02-21T09:23:32.8963489Z cvt.rn.bf16x2.f32 %r9626, %r12697, %r12696; 2026-02-21T09:23:32.8963571Z cvt.rn.bf16x2.f32 %r9627, %r12699, %r12698; 2026-02-21T09:23:32.8963648Z cvt.rn.bf16x2.f32 %r9628, %r12701, %r12700; 2026-02-21T09:23:32.8963726Z cvt.rn.bf16x2.f32 %r9629, %r12703, %r12702; 2026-02-21T09:23:32.8963808Z cvt.rn.bf16x2.f32 %r9630, %r12705, %r12704; 2026-02-21T09:23:32.8963886Z cvt.rn.bf16x2.f32 %r9631, %r12707, %r12706; 2026-02-21T09:23:32.8963964Z cvt.rn.bf16x2.f32 %r9632, %r12709, %r12708; 2026-02-21T09:23:32.8964041Z cvt.rn.bf16x2.f32 %r9633, %r12711, %r12710; 2026-02-21T09:23:32.8964124Z cvt.rn.bf16x2.f32 %r9634, %r12713, %r12712; 2026-02-21T09:23:32.8964203Z cvt.rn.bf16x2.f32 %r9635, %r12715, %r12714; 2026-02-21T09:23:32.8964280Z cvt.rn.bf16x2.f32 %r9636, %r12717, %r12716; 2026-02-21T09:23:32.8964432Z cvt.rn.bf16x2.f32 %r9637, %r12719, %r12718; 2026-02-21T09:23:32.8964691Z .loc 1 92 43 // 
c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:92:43 2026-02-21T09:23:32.8964885Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r74], {%r9574, %r9575, %r9576, %r9577}; 2026-02-21T09:23:32.8965077Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r75], {%r9590, %r9591, %r9592, %r9593}; 2026-02-21T09:23:32.8965260Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r76], {%r9606, %r9607, %r9608, %r9609}; 2026-02-21T09:23:32.8965439Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r77], {%r9622, %r9623, %r9624, %r9625}; 2026-02-21T09:23:32.8965621Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r78], {%r9578, %r9579, %r9580, %r9581}; 2026-02-21T09:23:32.8965843Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r79], {%r9594, %r9595, %r9596, %r9597}; 2026-02-21T09:23:32.8966022Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r80], {%r9610, %r9611, %r9612, %r9613}; 2026-02-21T09:23:32.8966201Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r81], {%r9626, %r9627, %r9628, %r9629}; 2026-02-21T09:23:32.8966401Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r82], {%r9582, %r9583, %r9584, %r9585}; 2026-02-21T09:23:32.8966704Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r83], {%r9598, %r9599, %r9600, %r9601}; 2026-02-21T09:23:32.8966974Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r84], {%r9614, %r9615, %r9616, %r9617}; 2026-02-21T09:23:32.8967163Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r85], {%r9630, %r9631, %r9632, %r9633}; 2026-02-21T09:23:32.8967344Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r86], {%r9586, %r9587, %r9588, %r9589}; 2026-02-21T09:23:32.8967523Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r87], {%r9602, %r9603, %r9604, %r9605}; 2026-02-21T09:23:32.8967709Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r88], {%r9618, %r9619, %r9620, %r9621}; 2026-02-21T09:23:32.8967887Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r89], {%r9634, %r9635, %r9636, %r9637}; 2026-02-21T09:23:32.8967951Z // begin inline asm 2026-02-21T09:23:32.8968045Z fence.proxy.async.shared::cta; 2026-02-21T09:23:32.8968108Z // end 
inline asm 2026-02-21T09:23:32.8968167Z bar.sync 0; 2026-02-21T09:23:32.8968240Z elect.sync %r9638|%p164, -1; 2026-02-21T09:23:32.8968316Z and.pred %p155, %p82, %p164; 2026-02-21T09:23:32.8968382Z or.b32 %r9490, %r386, %r6762; 2026-02-21T09:23:32.8968456Z // begin inline asm 2026-02-21T09:23:32.8968705Z @%p155 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd586, {%r9490, %r9491}], [%r9492]; 2026-02-21T09:23:32.8968766Z // end inline asm 2026-02-21T09:23:32.8968844Z cp.async.bulk.commit_group; 2026-02-21T09:23:32.8968931Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:23:32.8968990Z bar.sync 0; 2026-02-21T09:23:32.8969200Z .loc 1 31 88 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:31:88 2026-02-21T09:23:32.8969265Z or.b32 %r9639, %r12326, 3; 2026-02-21T09:23:32.8969482Z .loc 1 35 31 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:35:31 2026-02-21T09:23:32.8969558Z add.s32 %r9642, %r9639, %r1335; 2026-02-21T09:23:32.8969764Z .loc 1 34 30 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:34:30 2026-02-21T09:23:32.8969835Z and.b32 %r9529, %r9642, -128; 2026-02-21T09:23:32.8969902Z sub.s32 %r9643, %r9639, %r9529; 2026-02-21T09:23:32.8970103Z .loc 1 36 27 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:36:27 2026-02-21T09:23:32.8970175Z shl.b32 %r12258, %r9643, 7; 2026-02-21T09:23:32.8970374Z .loc 1 37 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:37:32 2026-02-21T09:23:32.8970441Z or.b32 %r9644, %r12258, %r6; 2026-02-21T09:23:32.8970514Z or.b32 %r9645, %r12258, %r7; 2026-02-21T09:23:32.8970577Z or.b32 %r9646, %r12258, %r8; 2026-02-21T09:23:32.8970645Z or.b32 %r9647, %r12258, %r9; 2026-02-21T09:23:32.8970709Z or.b32 %r9648, %r12258, %r10; 2026-02-21T09:23:32.8970777Z or.b32 %r9649, %r12258, %r11; 2026-02-21T09:23:32.8970985Z or.b32 %r9650, %r12258, %r12; 2026-02-21T09:23:32.8971052Z or.b32 %r9651, %r12258, %r13; 2026-02-21T09:23:32.8971118Z or.b32 %r9652, %r12258, %r14; 2026-02-21T09:23:32.8971181Z 
or.b32 %r9653, %r12258, %r15; 2026-02-21T09:23:32.8971243Z or.b32 %r9654, %r12258, %r16; 2026-02-21T09:23:32.8971311Z or.b32 %r9655, %r12258, %r17; 2026-02-21T09:23:32.8971374Z or.b32 %r9656, %r12258, %r18; 2026-02-21T09:23:32.8971437Z or.b32 %r9657, %r12258, %r19; 2026-02-21T09:23:32.8971499Z or.b32 %r9658, %r12258, %r20; 2026-02-21T09:23:32.8971566Z or.b32 %r9659, %r12258, %r21; 2026-02-21T09:23:32.8971773Z .loc 1 52 53 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:53 2026-02-21T09:23:32.8971837Z shl.b32 %r9660, %r9644, 10; 2026-02-21T09:23:32.8971972Z shl.b32 %r9661, %r9645, 10; 2026-02-21T09:23:32.8972036Z shl.b32 %r9662, %r9646, 10; 2026-02-21T09:23:32.8972099Z shl.b32 %r9663, %r9647, 10; 2026-02-21T09:23:32.8972173Z shl.b32 %r9664, %r9648, 10; 2026-02-21T09:23:32.8972246Z shl.b32 %r9665, %r9649, 10; 2026-02-21T09:23:32.8972310Z shl.b32 %r9666, %r9650, 10; 2026-02-21T09:23:32.8972372Z shl.b32 %r9667, %r9651, 10; 2026-02-21T09:23:32.8972437Z shl.b32 %r9668, %r9652, 10; 2026-02-21T09:23:32.8972499Z shl.b32 %r9669, %r9653, 10; 2026-02-21T09:23:32.8972610Z shl.b32 %r9670, %r9654, 10; 2026-02-21T09:23:32.8972679Z shl.b32 %r9671, %r9655, 10; 2026-02-21T09:23:32.8972741Z shl.b32 %r9672, %r9656, 10; 2026-02-21T09:23:32.8972802Z shl.b32 %r9673, %r9657, 10; 2026-02-21T09:23:32.8972863Z shl.b32 %r9674, %r9658, 10; 2026-02-21T09:23:32.8972932Z shl.b32 %r9675, %r9659, 10; 2026-02-21T09:23:32.8973138Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8973200Z // begin inline asm 2026-02-21T09:23:32.8973309Z @%p2 mbarrier.init.shared::cta.b64 [%r9488], 1; 2026-02-21T09:23:32.8973369Z // end inline asm 2026-02-21T09:23:32.8973428Z bar.sync 0; 2026-02-21T09:23:32.8973491Z // begin inline asm 2026-02-21T09:23:32.8973607Z @%p2 mbarrier.init.shared::cta.b64 [%r9489], 1; 2026-02-21T09:23:32.8973670Z // end inline asm 2026-02-21T09:23:32.8973876Z .loc 1 52 60 // 
c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:60 2026-02-21T09:23:32.8973951Z or.b32 %r9676, %r9660, %r23; 2026-02-21T09:23:32.8974015Z or.b32 %r9677, %r9661, %r23; 2026-02-21T09:23:32.8974076Z or.b32 %r9678, %r9662, %r23; 2026-02-21T09:23:32.8974144Z or.b32 %r9679, %r9663, %r23; 2026-02-21T09:23:32.8974206Z or.b32 %r9680, %r9664, %r23; 2026-02-21T09:23:32.8974267Z or.b32 %r9681, %r9665, %r23; 2026-02-21T09:23:32.8974329Z or.b32 %r9682, %r9666, %r23; 2026-02-21T09:23:32.8974396Z or.b32 %r9683, %r9667, %r23; 2026-02-21T09:23:32.8974458Z or.b32 %r9684, %r9668, %r23; 2026-02-21T09:23:32.8974519Z or.b32 %r9685, %r9669, %r23; 2026-02-21T09:23:32.8974583Z or.b32 %r9686, %r9670, %r23; 2026-02-21T09:23:32.8974644Z or.b32 %r9687, %r9671, %r23; 2026-02-21T09:23:32.8974707Z or.b32 %r9688, %r9672, %r23; 2026-02-21T09:23:32.8974768Z or.b32 %r9689, %r9673, %r23; 2026-02-21T09:23:32.8974832Z or.b32 %r9690, %r9674, %r23; 2026-02-21T09:23:32.8974891Z or.b32 %r9691, %r9675, %r23; 2026-02-21T09:23:32.8975092Z .loc 1 52 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:32 2026-02-21T09:23:32.8975172Z mad.wide.s32 %rd587, %r9676, 2, %rd93; 2026-02-21T09:23:32.8975251Z mad.wide.s32 %rd588, %r9677, 2, %rd93; 2026-02-21T09:23:32.8975330Z mad.wide.s32 %rd589, %r9678, 2, %rd93; 2026-02-21T09:23:32.8975408Z mad.wide.s32 %rd590, %r9679, 2, %rd93; 2026-02-21T09:23:32.8975478Z mad.wide.s32 %rd591, %r9680, 2, %rd93; 2026-02-21T09:23:32.8975549Z mad.wide.s32 %rd592, %r9681, 2, %rd93; 2026-02-21T09:23:32.8975619Z mad.wide.s32 %rd593, %r9682, 2, %rd93; 2026-02-21T09:23:32.8975693Z mad.wide.s32 %rd594, %r9683, 2, %rd93; 2026-02-21T09:23:32.8975762Z mad.wide.s32 %rd595, %r9684, 2, %rd93; 2026-02-21T09:23:32.8975832Z mad.wide.s32 %rd596, %r9685, 2, %rd93; 2026-02-21T09:23:32.8976036Z mad.wide.s32 %rd597, %r9686, 2, %rd93; 2026-02-21T09:23:32.8976105Z mad.wide.s32 %rd598, %r9687, 2, %rd93; 2026-02-21T09:23:32.8976173Z mad.wide.s32 %rd599, %r9688, 2, %rd93; 
2026-02-21T09:23:32.8976244Z mad.wide.s32 %rd600, %r9689, 2, %rd93; 2026-02-21T09:23:32.8976330Z mad.wide.s32 %rd601, %r9690, 2, %rd93; 2026-02-21T09:23:32.8976404Z mad.wide.s32 %rd602, %r9691, 2, %rd93; 2026-02-21T09:23:32.8976574Z mov.b32 %r9496, 8; 2026-02-21T09:23:32.8976800Z .loc 1 52 80 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:80 2026-02-21T09:23:32.8976864Z // begin inline asm 2026-02-21T09:23:32.8977015Z cp.async.ca.shared.global [ %r9495 + 0 ], [ %rd587 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8977081Z // end inline asm 2026-02-21T09:23:32.8977228Z // begin inline asm 2026-02-21T09:23:32.8977377Z cp.async.ca.shared.global [ %r9497 + 0 ], [ %rd588 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8977437Z // end inline asm 2026-02-21T09:23:32.8977507Z // begin inline asm 2026-02-21T09:23:32.8977650Z cp.async.ca.shared.global [ %r9499 + 0 ], [ %rd589 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8977709Z // end inline asm 2026-02-21T09:23:32.8977775Z // begin inline asm 2026-02-21T09:23:32.8977972Z cp.async.ca.shared.global [ %r9501 + 0 ], [ %rd590 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8978033Z // end inline asm 2026-02-21T09:23:32.8978099Z // begin inline asm 2026-02-21T09:23:32.8978235Z cp.async.ca.shared.global [ %r9503 + 0 ], [ %rd591 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8978293Z // end inline asm 2026-02-21T09:23:32.8978364Z // begin inline asm 2026-02-21T09:23:32.8978510Z cp.async.ca.shared.global [ %r9505 + 0 ], [ %rd592 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8978568Z // end inline asm 2026-02-21T09:23:32.8978631Z // begin inline asm 2026-02-21T09:23:32.8978772Z cp.async.ca.shared.global [ %r9507 + 0 ], [ %rd593 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8978831Z // end inline asm 2026-02-21T09:23:32.8978901Z // begin inline asm 2026-02-21T09:23:32.8979041Z cp.async.ca.shared.global [ %r9509 + 0 ], [ %rd594 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8979099Z // end inline asm 2026-02-21T09:23:32.8979160Z // begin inline asm 
2026-02-21T09:23:32.8979300Z cp.async.ca.shared.global [ %r9511 + 0 ], [ %rd595 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8979362Z // end inline asm 2026-02-21T09:23:32.8979424Z // begin inline asm 2026-02-21T09:23:32.8979571Z cp.async.ca.shared.global [ %r9513 + 0 ], [ %rd596 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8979630Z // end inline asm 2026-02-21T09:23:32.8979691Z // begin inline asm 2026-02-21T09:23:32.8979827Z cp.async.ca.shared.global [ %r9515 + 0 ], [ %rd597 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8979888Z // end inline asm 2026-02-21T09:23:32.8979948Z // begin inline asm 2026-02-21T09:23:32.8980086Z cp.async.ca.shared.global [ %r9517 + 0 ], [ %rd598 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8980149Z // end inline asm 2026-02-21T09:23:32.8980209Z // begin inline asm 2026-02-21T09:23:32.8980357Z cp.async.ca.shared.global [ %r9519 + 0 ], [ %rd599 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8980422Z // end inline asm 2026-02-21T09:23:32.8980487Z // begin inline asm 2026-02-21T09:23:32.8980620Z cp.async.ca.shared.global [ %r9521 + 0 ], [ %rd600 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8980681Z // end inline asm 2026-02-21T09:23:32.8980746Z // begin inline asm 2026-02-21T09:23:32.8980878Z cp.async.ca.shared.global [ %r9523 + 0 ], [ %rd601 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8980936Z // end inline asm 2026-02-21T09:23:32.8981002Z // begin inline asm 2026-02-21T09:23:32.8981141Z cp.async.ca.shared.global [ %r9525 + 0 ], [ %rd602 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8981198Z // end inline asm 2026-02-21T09:23:32.8981267Z cp.async.commit_group; 2026-02-21T09:23:32.8981481Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8981540Z bar.sync 0; 2026-02-21T09:23:32.8981599Z // begin inline asm 2026-02-21T09:23:32.8981881Z @%p2 mbarrier.arrive.expect_tx.shared.b64 _, [%r9488], 4096; 2026-02-21T09:23:32.8981940Z // end inline asm 2026-02-21T09:23:32.8982138Z .loc 1 58 33 // 
c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:58:33 2026-02-21T09:23:32.8982200Z bar.sync 0; 2026-02-21T09:23:32.8982270Z elect.sync %r9692|%p165, -1; 2026-02-21T09:23:32.8982338Z and.pred %p159, %p1, %p165; 2026-02-21T09:23:32.8982397Z mov.b32 %r9530, 0; 2026-02-21T09:23:32.8982469Z // begin inline asm 2026-02-21T09:23:32.8982810Z @%p159 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1289], [%rd603, {%r9529, %r9530}], [%r9488]; 2026-02-21T09:23:32.8982870Z // end inline asm 2026-02-21T09:23:32.8983121Z .loc 1 52 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:32 2026-02-21T09:23:32.8983189Z cvt.s64.s32 %rd622, %r9660; 2026-02-21T09:23:32.8983254Z or.b64 %rd623, %rd622, %rd12; 2026-02-21T09:23:32.8983324Z shl.b64 %rd624, %rd623, 1; 2026-02-21T09:23:32.8983397Z add.s64 %rd625, %rd93, %rd624; 2026-02-21T09:23:32.8983462Z add.s64 %rd604, %rd625, 128; 2026-02-21T09:23:32.8983525Z cvt.s64.s32 %rd626, %r9661; 2026-02-21T09:23:32.8983593Z or.b64 %rd627, %rd626, %rd12; 2026-02-21T09:23:32.8983699Z shl.b64 %rd628, %rd627, 1; 2026-02-21T09:23:32.8983765Z add.s64 %rd629, %rd93, %rd628; 2026-02-21T09:23:32.8983831Z add.s64 %rd605, %rd629, 128; 2026-02-21T09:23:32.8983894Z cvt.s64.s32 %rd630, %r9662; 2026-02-21T09:23:32.8983956Z or.b64 %rd631, %rd630, %rd12; 2026-02-21T09:23:32.8984021Z shl.b64 %rd632, %rd631, 1; 2026-02-21T09:23:32.8984090Z add.s64 %rd633, %rd93, %rd632; 2026-02-21T09:23:32.8984153Z add.s64 %rd606, %rd633, 128; 2026-02-21T09:23:32.8984215Z cvt.s64.s32 %rd634, %r9663; 2026-02-21T09:23:32.8984286Z or.b64 %rd635, %rd634, %rd12; 2026-02-21T09:23:32.8984349Z shl.b64 %rd636, %rd635, 1; 2026-02-21T09:23:32.8984417Z add.s64 %rd637, %rd93, %rd636; 2026-02-21T09:23:32.8984480Z add.s64 %rd607, %rd637, 128; 2026-02-21T09:23:32.8984551Z cvt.s64.s32 %rd638, %r9664; 2026-02-21T09:23:32.8984614Z or.b64 %rd639, %rd638, %rd12; 2026-02-21T09:23:32.8984677Z shl.b64 %rd640, %rd639, 1; 2026-02-21T09:23:32.8984745Z add.s64 
%rd641, %rd93, %rd640; 2026-02-21T09:23:32.8984810Z add.s64 %rd608, %rd641, 128; 2026-02-21T09:23:32.8984872Z cvt.s64.s32 %rd642, %r9665; 2026-02-21T09:23:32.8984939Z or.b64 %rd643, %rd642, %rd12; 2026-02-21T09:23:32.8985002Z shl.b64 %rd644, %rd643, 1; 2026-02-21T09:23:32.8985068Z add.s64 %rd645, %rd93, %rd644; 2026-02-21T09:23:32.8985130Z add.s64 %rd609, %rd645, 128; 2026-02-21T09:23:32.8985197Z cvt.s64.s32 %rd646, %r9666; 2026-02-21T09:23:32.8985261Z or.b64 %rd647, %rd646, %rd12; 2026-02-21T09:23:32.8985324Z shl.b64 %rd648, %rd647, 1; 2026-02-21T09:23:32.8985408Z add.s64 %rd649, %rd93, %rd648; 2026-02-21T09:23:32.8985474Z add.s64 %rd610, %rd649, 128; 2026-02-21T09:23:32.8985540Z cvt.s64.s32 %rd650, %r9667; 2026-02-21T09:23:32.8985604Z or.b64 %rd651, %rd650, %rd12; 2026-02-21T09:23:32.8985677Z shl.b64 %rd652, %rd651, 1; 2026-02-21T09:23:32.8985743Z add.s64 %rd653, %rd93, %rd652; 2026-02-21T09:23:32.8985808Z add.s64 %rd611, %rd653, 128; 2026-02-21T09:23:32.8985880Z cvt.s64.s32 %rd654, %r9668; 2026-02-21T09:23:32.8985945Z or.b64 %rd655, %rd654, %rd12; 2026-02-21T09:23:32.8986010Z shl.b64 %rd656, %rd655, 1; 2026-02-21T09:23:32.8986074Z add.s64 %rd657, %rd93, %rd656; 2026-02-21T09:23:32.8986142Z add.s64 %rd612, %rd657, 128; 2026-02-21T09:23:32.8986206Z cvt.s64.s32 %rd658, %r9669; 2026-02-21T09:23:32.8986271Z or.b64 %rd659, %rd658, %rd12; 2026-02-21T09:23:32.8986340Z shl.b64 %rd660, %rd659, 1; 2026-02-21T09:23:32.8986403Z add.s64 %rd661, %rd93, %rd660; 2026-02-21T09:23:32.8986586Z add.s64 %rd613, %rd661, 128; 2026-02-21T09:23:32.8986654Z cvt.s64.s32 %rd662, %r9670; 2026-02-21T09:23:32.8986724Z or.b64 %rd663, %rd662, %rd12; 2026-02-21T09:23:32.8986786Z shl.b64 %rd664, %rd663, 1; 2026-02-21T09:23:32.8986851Z add.s64 %rd665, %rd93, %rd664; 2026-02-21T09:23:32.8987091Z add.s64 %rd614, %rd665, 128; 2026-02-21T09:23:32.8987156Z cvt.s64.s32 %rd666, %r9671; 2026-02-21T09:23:32.8987219Z or.b64 %rd667, %rd666, %rd12; 2026-02-21T09:23:32.8987290Z shl.b64 %rd668, %rd667, 1; 
2026-02-21T09:23:32.8987356Z add.s64 %rd669, %rd93, %rd668; 2026-02-21T09:23:32.8987421Z add.s64 %rd615, %rd669, 128; 2026-02-21T09:23:32.8987485Z cvt.s64.s32 %rd670, %r9672; 2026-02-21T09:23:32.8987555Z or.b64 %rd671, %rd670, %rd12; 2026-02-21T09:23:32.8987619Z shl.b64 %rd672, %rd671, 1; 2026-02-21T09:23:32.8987684Z add.s64 %rd673, %rd93, %rd672; 2026-02-21T09:23:32.8987756Z add.s64 %rd616, %rd673, 128; 2026-02-21T09:23:32.8987820Z cvt.s64.s32 %rd674, %r9673; 2026-02-21T09:23:32.8987885Z or.b64 %rd675, %rd674, %rd12; 2026-02-21T09:23:32.8988018Z shl.b64 %rd676, %rd675, 1; 2026-02-21T09:23:32.8988092Z add.s64 %rd677, %rd93, %rd676; 2026-02-21T09:23:32.8988157Z add.s64 %rd617, %rd677, 128; 2026-02-21T09:23:32.8988292Z cvt.s64.s32 %rd678, %r9674; 2026-02-21T09:23:32.8988374Z or.b64 %rd679, %rd678, %rd12; 2026-02-21T09:23:32.8988438Z shl.b64 %rd680, %rd679, 1; 2026-02-21T09:23:32.8988503Z add.s64 %rd681, %rd93, %rd680; 2026-02-21T09:23:32.8988569Z add.s64 %rd618, %rd681, 128; 2026-02-21T09:23:32.8988702Z cvt.s64.s32 %rd682, %r9675; 2026-02-21T09:23:32.8988770Z or.b64 %rd683, %rd682, %rd12; 2026-02-21T09:23:32.8988833Z shl.b64 %rd684, %rd683, 1; 2026-02-21T09:23:32.8988903Z add.s64 %rd685, %rd93, %rd684; 2026-02-21T09:23:32.8988970Z add.s64 %rd619, %rd685, 128; 2026-02-21T09:23:32.8989173Z .loc 1 52 80 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:80 2026-02-21T09:23:32.8989240Z // begin inline asm 2026-02-21T09:23:32.8989384Z cp.async.ca.shared.global [ %r9532 + 0 ], [ %rd604 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8989447Z // end inline asm 2026-02-21T09:23:32.8989510Z // begin inline asm 2026-02-21T09:23:32.8989655Z cp.async.ca.shared.global [ %r9534 + 0 ], [ %rd605 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8989717Z // end inline asm 2026-02-21T09:23:32.8989784Z // begin inline asm 2026-02-21T09:23:32.8989926Z cp.async.ca.shared.global [ %r9536 + 0 ], [ %rd606 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8989987Z // end inline asm 
2026-02-21T09:23:32.8990048Z // begin inline asm 2026-02-21T09:23:32.8990184Z cp.async.ca.shared.global [ %r9538 + 0 ], [ %rd607 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8990259Z // end inline asm 2026-02-21T09:23:32.8990322Z // begin inline asm 2026-02-21T09:23:32.8990460Z cp.async.ca.shared.global [ %r9540 + 0 ], [ %rd608 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8990523Z // end inline asm 2026-02-21T09:23:32.8990582Z // begin inline asm 2026-02-21T09:23:32.8990715Z cp.async.ca.shared.global [ %r9542 + 0 ], [ %rd609 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8990782Z // end inline asm 2026-02-21T09:23:32.8990842Z // begin inline asm 2026-02-21T09:23:32.8990978Z cp.async.ca.shared.global [ %r9544 + 0 ], [ %rd610 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8991039Z // end inline asm 2026-02-21T09:23:32.8991104Z // begin inline asm 2026-02-21T09:23:32.8991240Z cp.async.ca.shared.global [ %r9546 + 0 ], [ %rd611 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8991301Z // end inline asm 2026-02-21T09:23:32.8991364Z // begin inline asm 2026-02-21T09:23:32.8991497Z cp.async.ca.shared.global [ %r9548 + 0 ], [ %rd612 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8991557Z // end inline asm 2026-02-21T09:23:32.8991616Z // begin inline asm 2026-02-21T09:23:32.8991756Z cp.async.ca.shared.global [ %r9550 + 0 ], [ %rd613 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8991814Z // end inline asm 2026-02-21T09:23:32.8991872Z // begin inline asm 2026-02-21T09:23:32.8992012Z cp.async.ca.shared.global [ %r9552 + 0 ], [ %rd614 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8992069Z // end inline asm 2026-02-21T09:23:32.8992130Z // begin inline asm 2026-02-21T09:23:32.8992264Z cp.async.ca.shared.global [ %r9554 + 0 ], [ %rd615 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8992330Z // end inline asm 2026-02-21T09:23:32.8992490Z // begin inline asm 2026-02-21T09:23:32.8992624Z cp.async.ca.shared.global [ %r9556 + 0 ], [ %rd616 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8992688Z // end inline asm 2026-02-21T09:23:32.8992747Z // begin 
inline asm 2026-02-21T09:23:32.8992884Z cp.async.ca.shared.global [ %r9558 + 0 ], [ %rd617 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8992947Z // end inline asm 2026-02-21T09:23:32.8993007Z // begin inline asm 2026-02-21T09:23:32.8993140Z cp.async.ca.shared.global [ %r9560 + 0 ], [ %rd618 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8993199Z // end inline asm 2026-02-21T09:23:32.8993263Z // begin inline asm 2026-02-21T09:23:32.8993414Z cp.async.ca.shared.global [ %r9562 + 0 ], [ %rd619 + 0 ], 0x8, %r9496; 2026-02-21T09:23:32.8993473Z // end inline asm 2026-02-21T09:23:32.8993593Z cp.async.commit_group; 2026-02-21T09:23:32.8993799Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8993858Z bar.sync 0; 2026-02-21T09:23:32.8993921Z // begin inline asm 2026-02-21T09:23:32.8994056Z @%p2 mbarrier.arrive.expect_tx.shared.b64 _, [%r9489], 4096; 2026-02-21T09:23:32.8994115Z // end inline asm 2026-02-21T09:23:32.8994384Z .loc 1 58 33 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:58:33 2026-02-21T09:23:32.8994449Z bar.sync 0; 2026-02-21T09:23:32.8994521Z elect.sync %r9693|%p166, -1; 2026-02-21T09:23:32.8994591Z and.pred %p161, %p1, %p166; 2026-02-21T09:23:32.8994658Z mov.b32 %r9567, 32; 2026-02-21T09:23:32.8994722Z // begin inline asm 2026-02-21T09:23:32.8995054Z @%p161 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1326], [%rd603, {%r9529, %r9567}], [%r9489]; 2026-02-21T09:23:32.8995121Z // end inline asm 2026-02-21T09:23:32.8995323Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.8995387Z shl.b32 %r9694, %r9639, 7; 2026-02-21T09:23:32.8995450Z or.b32 %r9695, %r21, %r9694; 2026-02-21T09:23:32.8995522Z shl.b32 %r9696, %r9642, 7; 2026-02-21T09:23:32.8995590Z and.b32 %r9697, %r9696, -16384; 2026-02-21T09:23:32.8995656Z sub.s32 %r9698, %r9695, %r9697; 2026-02-21T09:23:32.8995723Z shl.b32 %r9699, %r9698, 10; 2026-02-21T09:23:32.8995792Z 
mul.wide.s32 %rd686, %r9699, 2; 2026-02-21T09:23:32.8995857Z or.b64 %rd73, %rd686, 256; 2026-02-21T09:23:32.8995920Z or.b32 %r9700, %r20, %r9694; 2026-02-21T09:23:32.8995990Z sub.s32 %r9701, %r9700, %r9697; 2026-02-21T09:23:32.8996054Z shl.b32 %r9702, %r9701, 10; 2026-02-21T09:23:32.8996120Z mul.wide.s32 %rd687, %r9702, 2; 2026-02-21T09:23:32.8996189Z or.b64 %rd74, %rd687, 256; 2026-02-21T09:23:32.8996254Z or.b32 %r9703, %r19, %r9694; 2026-02-21T09:23:32.8996317Z sub.s32 %r9704, %r9703, %r9697; 2026-02-21T09:23:32.8996381Z shl.b32 %r9705, %r9704, 10; 2026-02-21T09:23:32.8996583Z mul.wide.s32 %rd688, %r9705, 2; 2026-02-21T09:23:32.8996651Z or.b64 %rd75, %rd688, 256; 2026-02-21T09:23:32.8996713Z or.b32 %r9706, %r18, %r9694; 2026-02-21T09:23:32.8996785Z sub.s32 %r9707, %r9706, %r9697; 2026-02-21T09:23:32.8996849Z shl.b32 %r9708, %r9707, 10; 2026-02-21T09:23:32.8996916Z mul.wide.s32 %rd689, %r9708, 2; 2026-02-21T09:23:32.8996985Z or.b64 %rd76, %rd689, 256; 2026-02-21T09:23:32.8997048Z or.b32 %r9709, %r17, %r9694; 2026-02-21T09:23:32.8997114Z sub.s32 %r9710, %r9709, %r9697; 2026-02-21T09:23:32.8997186Z shl.b32 %r9711, %r9710, 10; 2026-02-21T09:23:32.8997261Z mul.wide.s32 %rd690, %r9711, 2; 2026-02-21T09:23:32.8997326Z or.b64 %rd77, %rd690, 256; 2026-02-21T09:23:32.8997389Z or.b32 %r9712, %r16, %r9694; 2026-02-21T09:23:32.8997460Z sub.s32 %r9713, %r9712, %r9697; 2026-02-21T09:23:32.8997522Z shl.b32 %r9714, %r9713, 10; 2026-02-21T09:23:32.8997588Z mul.wide.s32 %rd691, %r9714, 2; 2026-02-21T09:23:32.8997653Z or.b64 %rd78, %rd691, 256; 2026-02-21T09:23:32.8997719Z or.b32 %r9715, %r15, %r9694; 2026-02-21T09:23:32.8997782Z sub.s32 %r9716, %r9715, %r9697; 2026-02-21T09:23:32.8997939Z shl.b32 %r9717, %r9716, 10; 2026-02-21T09:23:32.8998073Z mul.wide.s32 %rd692, %r9717, 2; 2026-02-21T09:23:32.8998136Z or.b64 %rd79, %rd692, 256; 2026-02-21T09:23:32.8998212Z or.b32 %r9718, %r14, %r9694; 2026-02-21T09:23:32.8998277Z sub.s32 %r9719, %r9718, %r9697; 2026-02-21T09:23:32.8998348Z 
shl.b32 %r9720, %r9719, 10; 2026-02-21T09:23:32.8998416Z mul.wide.s32 %rd693, %r9720, 2; 2026-02-21T09:23:32.8998482Z or.b64 %rd80, %rd693, 256; 2026-02-21T09:23:32.8998554Z or.b32 %r9721, %r13, %r9694; 2026-02-21T09:23:32.8998618Z sub.s32 %r9722, %r9721, %r9697; 2026-02-21T09:23:32.8998678Z shl.b32 %r9723, %r9722, 10; 2026-02-21T09:23:32.8998750Z mul.wide.s32 %rd694, %r9723, 2; 2026-02-21T09:23:32.8998813Z or.b64 %rd81, %rd694, 256; 2026-02-21T09:23:32.8998875Z or.b32 %r9724, %r12, %r9694; 2026-02-21T09:23:32.8999001Z sub.s32 %r9725, %r9724, %r9697; 2026-02-21T09:23:32.8999070Z shl.b32 %r9726, %r9725, 10; 2026-02-21T09:23:32.8999135Z mul.wide.s32 %rd695, %r9726, 2; 2026-02-21T09:23:32.8999200Z or.b64 %rd82, %rd695, 256; 2026-02-21T09:23:32.8999270Z or.b32 %r9727, %r11, %r9694; 2026-02-21T09:23:32.8999332Z sub.s32 %r9728, %r9727, %r9697; 2026-02-21T09:23:32.8999392Z shl.b32 %r9729, %r9728, 10; 2026-02-21T09:23:32.8999458Z mul.wide.s32 %rd696, %r9729, 2; 2026-02-21T09:23:32.8999585Z or.b64 %rd83, %rd696, 256; 2026-02-21T09:23:32.8999648Z or.b32 %r9730, %r10, %r9694; 2026-02-21T09:23:32.8999710Z sub.s32 %r9731, %r9730, %r9697; 2026-02-21T09:23:32.8999781Z shl.b32 %r9732, %r9731, 10; 2026-02-21T09:23:32.8999849Z mul.wide.s32 %rd697, %r9732, 2; 2026-02-21T09:23:32.8999911Z or.b64 %rd84, %rd697, 256; 2026-02-21T09:23:32.8999973Z or.b32 %r9733, %r9, %r9694; 2026-02-21T09:23:32.9000042Z sub.s32 %r9734, %r9733, %r9697; 2026-02-21T09:23:32.9000104Z shl.b32 %r9735, %r9734, 10; 2026-02-21T09:23:32.9000173Z mul.wide.s32 %rd698, %r9735, 2; 2026-02-21T09:23:32.9000240Z or.b64 %rd85, %rd698, 256; 2026-02-21T09:23:32.9000302Z or.b32 %r9736, %r8, %r9694; 2026-02-21T09:23:32.9000368Z sub.s32 %r9737, %r9736, %r9697; 2026-02-21T09:23:32.9000427Z shl.b32 %r9738, %r9737, 10; 2026-02-21T09:23:32.9000497Z mul.wide.s32 %rd699, %r9738, 2; 2026-02-21T09:23:32.9000560Z or.b64 %rd86, %rd699, 256; 2026-02-21T09:23:32.9000620Z or.b32 %r9739, %r7, %r9694; 2026-02-21T09:23:32.9000696Z sub.s32 
%r9740, %r9739, %r9697; 2026-02-21T09:23:32.9000762Z shl.b32 %r9741, %r9740, 10; 2026-02-21T09:23:32.9000827Z mul.wide.s32 %rd700, %r9741, 2; 2026-02-21T09:23:32.9000892Z or.b64 %rd87, %rd700, 256; 2026-02-21T09:23:32.9000953Z or.b32 %r9742, %r6, %r9694; 2026-02-21T09:23:32.9001017Z sub.s32 %r9743, %r9742, %r9697; 2026-02-21T09:23:32.9001077Z shl.b32 %r9744, %r9743, 10; 2026-02-21T09:23:32.9001149Z mul.wide.s32 %rd701, %r9744, 2; 2026-02-21T09:23:32.9001211Z or.b64 %rd88, %rd701, 256; 2026-02-21T09:23:32.9001276Z mov.b32 %r12723, 0f00000000; 2026-02-21T09:23:32.9001343Z mov.b32 %r12722, 1; 2026-02-21T09:23:32.9001405Z mov.b32 %r12721, -1; 2026-02-21T09:23:32.9001464Z mov.b64 %rd743, 0; 2026-02-21T09:23:32.9001529Z mov.b64 %rd742, %rd11; 2026-02-21T09:23:32.9001599Z mov.b32 %r12720, %r9530; 2026-02-21T09:23:32.9001661Z mov.b32 %r12724, %r12723; 2026-02-21T09:23:32.9001722Z mov.b32 %r12725, %r12723; 2026-02-21T09:23:32.9001789Z mov.b32 %r12726, %r12723; 2026-02-21T09:23:32.9001851Z mov.b32 %r12727, %r12723; 2026-02-21T09:23:32.9001910Z mov.b32 %r12728, %r12723; 2026-02-21T09:23:32.9001971Z mov.b32 %r12729, %r12723; 2026-02-21T09:23:32.9002038Z mov.b32 %r12730, %r12723; 2026-02-21T09:23:32.9002097Z mov.b32 %r12731, %r12723; 2026-02-21T09:23:32.9002158Z mov.b32 %r12732, %r12723; 2026-02-21T09:23:32.9002223Z mov.b32 %r12733, %r12723; 2026-02-21T09:23:32.9002287Z mov.b32 %r12734, %r12723; 2026-02-21T09:23:32.9002348Z mov.b32 %r12735, %r12723; 2026-02-21T09:23:32.9002409Z mov.b32 %r12736, %r12723; 2026-02-21T09:23:32.9002477Z mov.b32 %r12737, %r12723; 2026-02-21T09:23:32.9002536Z mov.b32 %r12738, %r12723; 2026-02-21T09:23:32.9002595Z mov.b32 %r12739, %r12723; 2026-02-21T09:23:32.9002774Z mov.b32 %r12740, %r12723; 2026-02-21T09:23:32.9002848Z mov.b32 %r12741, %r12723; 2026-02-21T09:23:32.9002910Z mov.b32 %r12742, %r12723; 2026-02-21T09:23:32.9002971Z mov.b32 %r12743, %r12723; 2026-02-21T09:23:32.9003039Z mov.b32 %r12744, %r12723; 2026-02-21T09:23:32.9003101Z mov.b32 
%r12745, %r12723; 2026-02-21T09:23:32.9003162Z mov.b32 %r12746, %r12723; 2026-02-21T09:23:32.9003228Z mov.b32 %r12747, %r12723; 2026-02-21T09:23:32.9003290Z mov.b32 %r12748, %r12723; 2026-02-21T09:23:32.9003350Z mov.b32 %r12749, %r12723; 2026-02-21T09:23:32.9003414Z mov.b32 %r12750, %r12723; 2026-02-21T09:23:32.9003475Z mov.b32 %r12751, %r12723; 2026-02-21T09:23:32.9003535Z mov.b32 %r12752, %r12723; 2026-02-21T09:23:32.9003596Z mov.b32 %r12753, %r12723; 2026-02-21T09:23:32.9003719Z mov.b32 %r12754, %r12723; 2026-02-21T09:23:32.9003782Z mov.b32 %r12755, %r12723; 2026-02-21T09:23:32.9003841Z mov.b32 %r12756, %r12723; 2026-02-21T09:23:32.9003907Z mov.b32 %r12757, %r12723; 2026-02-21T09:23:32.9003971Z mov.b32 %r12758, %r12723; 2026-02-21T09:23:32.9004031Z mov.b32 %r12759, %r12723; 2026-02-21T09:23:32.9004091Z mov.b32 %r12760, %r12723; 2026-02-21T09:23:32.9004157Z mov.b32 %r12761, %r12723; 2026-02-21T09:23:32.9004261Z mov.b32 %r12762, %r12723; 2026-02-21T09:23:32.9004322Z mov.b32 %r12763, %r12723; 2026-02-21T09:23:32.9004388Z mov.b32 %r12764, %r12723; 2026-02-21T09:23:32.9004448Z mov.b32 %r12765, %r12723; 2026-02-21T09:23:32.9004508Z mov.b32 %r12766, %r12723; 2026-02-21T09:23:32.9004569Z mov.b32 %r12767, %r12723; 2026-02-21T09:23:32.9004636Z mov.b32 %r12768, %r12723; 2026-02-21T09:23:32.9004698Z mov.b32 %r12769, %r12723; 2026-02-21T09:23:32.9004757Z mov.b32 %r12770, %r12723; 2026-02-21T09:23:32.9004832Z mov.b32 %r12771, %r12723; 2026-02-21T09:23:32.9004895Z mov.b32 %r12772, %r12723; 2026-02-21T09:23:32.9004956Z mov.b32 %r12773, %r12723; 2026-02-21T09:23:32.9005016Z mov.b32 %r12774, %r12723; 2026-02-21T09:23:32.9005081Z mov.b32 %r12775, %r12723; 2026-02-21T09:23:32.9005144Z mov.b32 %r12776, %r12723; 2026-02-21T09:23:32.9005203Z mov.b32 %r12777, %r12723; 2026-02-21T09:23:32.9005266Z mov.b32 %r12778, %r12723; 2026-02-21T09:23:32.9005328Z mov.b32 %r12779, %r12723; 2026-02-21T09:23:32.9005391Z mov.b32 %r12780, %r12723; 2026-02-21T09:23:32.9005451Z mov.b32 %r12781, %r12723; 
2026-02-21T09:23:32.9005519Z mov.b32 %r12782, %r12723; 2026-02-21T09:23:32.9005579Z mov.b32 %r12783, %r12723; 2026-02-21T09:23:32.9005640Z mov.b32 %r12784, %r12723; 2026-02-21T09:23:32.9005704Z mov.b32 %r12785, %r12723; 2026-02-21T09:23:32.9005764Z mov.b32 %r12786, %r12723; 2026-02-21T09:23:32.9005824Z mov.b32 %r12787, %r12723; 2026-02-21T09:23:32.9005895Z mov.b32 %r12788, %r12723; 2026-02-21T09:23:32.9005962Z mov.b32 %r12789, %r12723; 2026-02-21T09:23:32.9006024Z mov.b32 %r12790, %r12723; 2026-02-21T09:23:32.9006084Z mov.b32 %r12791, %r12723; 2026-02-21T09:23:32.9006149Z mov.b32 %r12792, %r12723; 2026-02-21T09:23:32.9006210Z mov.b32 %r12793, %r12723; 2026-02-21T09:23:32.9006274Z mov.b32 %r12794, %r12723; 2026-02-21T09:23:32.9006339Z mov.b32 %r12795, %r12723; 2026-02-21T09:23:32.9006400Z mov.b32 %r12796, %r12723; 2026-02-21T09:23:32.9006605Z mov.b32 %r12797, %r12723; 2026-02-21T09:23:32.9006673Z mov.b32 %r12798, %r12723; 2026-02-21T09:23:32.9006739Z mov.b32 %r12799, %r12723; 2026-02-21T09:23:32.9006800Z mov.b32 %r12800, %r12723; 2026-02-21T09:23:32.9006859Z mov.b32 %r12801, %r12723; 2026-02-21T09:23:32.9006924Z mov.b32 %r12802, %r12723; 2026-02-21T09:23:32.9006983Z mov.b32 %r12803, %r12723; 2026-02-21T09:23:32.9007043Z mov.b32 %r12804, %r12723; 2026-02-21T09:23:32.9007105Z mov.b32 %r12805, %r12723; 2026-02-21T09:23:32.9007170Z mov.b32 %r12806, %r12723; 2026-02-21T09:23:32.9007231Z mov.b32 %r12807, %r12723; 2026-02-21T09:23:32.9007294Z mov.b32 %r12808, %r12723; 2026-02-21T09:23:32.9007361Z mov.b32 %r12809, %r12723; 2026-02-21T09:23:32.9007421Z mov.b32 %r12810, %r12723; 2026-02-21T09:23:32.9007481Z mov.b32 %r12811, %r12723; 2026-02-21T09:23:32.9007699Z mov.b32 %r12812, %r12723; 2026-02-21T09:23:32.9007767Z mov.b32 %r12813, %r12723; 2026-02-21T09:23:32.9007827Z mov.b32 %r12814, %r12723; 2026-02-21T09:23:32.9007888Z mov.b32 %r12815, %r12723; 2026-02-21T09:23:32.9007952Z mov.b32 %r12816, %r12723; 2026-02-21T09:23:32.9008011Z mov.b32 %r12817, %r12723; 
2026-02-21T09:23:32.9008071Z mov.b32 %r12818, %r12723; 2026-02-21T09:23:32.9008133Z mov.b32 %r12819, %r12723; 2026-02-21T09:23:32.9008199Z mov.b32 %r12820, %r12723; 2026-02-21T09:23:32.9008258Z mov.b32 %r12821, %r12723; 2026-02-21T09:23:32.9008317Z mov.b32 %r12822, %r12723; 2026-02-21T09:23:32.9008383Z mov.b32 %r12823, %r12723; 2026-02-21T09:23:32.9008442Z mov.b32 %r12824, %r12723; 2026-02-21T09:23:32.9008502Z mov.b32 %r12825, %r12723; 2026-02-21T09:23:32.9008626Z mov.b32 %r12826, %r12723; 2026-02-21T09:23:32.9008695Z mov.b32 %r12827, %r12723; 2026-02-21T09:23:32.9008754Z mov.b32 %r12828, %r12723; 2026-02-21T09:23:32.9008812Z mov.b32 %r12829, %r12723; 2026-02-21T09:23:32.9008881Z mov.b32 %r12830, %r12723; 2026-02-21T09:23:32.9008940Z mov.b32 %r12831, %r12723; 2026-02-21T09:23:32.9009002Z mov.b32 %r12832, %r12723; 2026-02-21T09:23:32.9009064Z mov.b32 %r12833, %r12723; 2026-02-21T09:23:32.9009193Z mov.b32 %r12834, %r12723; 2026-02-21T09:23:32.9009267Z mov.b32 %r12835, %r12723; 2026-02-21T09:23:32.9009331Z mov.b32 %r12836, %r12723; 2026-02-21T09:23:32.9009397Z mov.b32 %r12837, %r12723; 2026-02-21T09:23:32.9009465Z mov.b32 %r12838, %r12723; 2026-02-21T09:23:32.9009525Z mov.b32 %r12839, %r12723; 2026-02-21T09:23:32.9009586Z mov.b32 %r12840, %r12723; 2026-02-21T09:23:32.9009651Z mov.b32 %r12841, %r12723; 2026-02-21T09:23:32.9009711Z mov.b32 %r12842, %r12723; 2026-02-21T09:23:32.9009771Z mov.b32 %r12843, %r12723; 2026-02-21T09:23:32.9009841Z mov.b32 %r12844, %r12723; 2026-02-21T09:23:32.9009902Z mov.b32 %r12845, %r12723; 2026-02-21T09:23:32.9009962Z mov.b32 %r12846, %r12723; 2026-02-21T09:23:32.9010039Z mov.b32 %r12847, %r12723; 2026-02-21T09:23:32.9010110Z mov.b32 %r12848, %r12723; 2026-02-21T09:23:32.9010171Z mov.b32 %r12849, %r12723; 2026-02-21T09:23:32.9010232Z mov.b32 %r12850, %r12723; 2026-02-21T09:23:32.9010354Z $L__BB0_9: // Parent Loop BB0_2 Depth=1 2026-02-21T09:23:32.9010468Z // => This Inner Loop Header: Depth=2 2026-02-21T09:23:32.9010539Z setp.lt.u64 %p187, 
%rd743, 448; 2026-02-21T09:23:32.9010608Z add.s32 %r12158, %r12721, 1; 2026-02-21T09:23:32.9010678Z setp.gt.s32 %p188, %r12158, 1; 2026-02-21T09:23:32.9010751Z selp.b32 %r12721, 0, %r12158, %p188; 2026-02-21T09:23:32.9010817Z selp.b32 %r12159, 1, 0, %p188; 2026-02-21T09:23:32.9010891Z xor.b32 %r12720, %r12720, %r12159; 2026-02-21T09:23:32.9011098Z .loc 1 52 80 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:80 2026-02-21T09:23:32.9011169Z cp.async.wait_group 1; 2026-02-21T09:23:32.9011234Z bar.sync 0; 2026-02-21T09:23:32.9011301Z shl.b32 %r12160, %r12721, 14; 2026-02-21T09:23:32.9011368Z add.s32 %r12162, %r1205, %r12160; 2026-02-21T09:23:32.9011574Z .loc 1 56 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:56:32 2026-02-21T09:23:32.9011642Z add.s32 %r12163, %r12162, %r90; 2026-02-21T09:23:32.9011714Z ld.shared.b16 %rs673, [%r12163]; 2026-02-21T09:23:32.9011789Z ld.shared.b16 %rs674, [%r12163+1024]; 2026-02-21T09:23:32.9011866Z ld.shared.b16 %rs675, [%r12163+64]; 2026-02-21T09:23:32.9011937Z ld.shared.b16 %rs676, [%r12163+1088]; 2026-02-21T09:23:32.9012003Z ld.shared.b16 %rs677, [%r12163+8192]; 2026-02-21T09:23:32.9012088Z ld.shared.b16 %rs678, [%r12163+9216]; 2026-02-21T09:23:32.9012157Z ld.shared.b16 %rs679, [%r12163+8256]; 2026-02-21T09:23:32.9012229Z ld.shared.b16 %rs680, [%r12163+9280]; 2026-02-21T09:23:32.9012298Z add.s32 %r12164, %r12162, %r91; 2026-02-21T09:23:32.9012368Z ld.shared.b16 %rs681, [%r12164]; 2026-02-21T09:23:32.9012437Z ld.shared.b16 %rs682, [%r12164+1024]; 2026-02-21T09:23:32.9012626Z ld.shared.b16 %rs683, [%r12164+64]; 2026-02-21T09:23:32.9012701Z ld.shared.b16 %rs684, [%r12164+1088]; 2026-02-21T09:23:32.9012773Z ld.shared.b16 %rs685, [%r12164+8192]; 2026-02-21T09:23:32.9012842Z ld.shared.b16 %rs686, [%r12164+9216]; 2026-02-21T09:23:32.9012914Z ld.shared.b16 %rs687, [%r12164+8256]; 2026-02-21T09:23:32.9012985Z ld.shared.b16 %rs688, [%r12164+9280]; 2026-02-21T09:23:32.9013048Z add.s32 %r12165, %r12162, %r92; 
2026-02-21T09:23:32.9013116Z ld.shared.b16 %rs689, [%r12165]; 2026-02-21T09:23:32.9013189Z ld.shared.b16 %rs690, [%r12165+1024]; 2026-02-21T09:23:32.9013256Z ld.shared.b16 %rs691, [%r12165+64]; 2026-02-21T09:23:32.9013325Z ld.shared.b16 %rs692, [%r12165+1088]; 2026-02-21T09:23:32.9013464Z ld.shared.b16 %rs693, [%r12165+8192]; 2026-02-21T09:23:32.9013538Z ld.shared.b16 %rs694, [%r12165+9216]; 2026-02-21T09:23:32.9013606Z ld.shared.b16 %rs695, [%r12165+8256]; 2026-02-21T09:23:32.9013681Z ld.shared.b16 %rs696, [%r12165+9280]; 2026-02-21T09:23:32.9013748Z add.s32 %r12166, %r12162, %r93; 2026-02-21T09:23:32.9013816Z ld.shared.b16 %rs697, [%r12166]; 2026-02-21T09:23:32.9013885Z ld.shared.b16 %rs698, [%r12166+1024]; 2026-02-21T09:23:32.9013958Z ld.shared.b16 %rs699, [%r12166+64]; 2026-02-21T09:23:32.9014088Z ld.shared.b16 %rs700, [%r12166+1088]; 2026-02-21T09:23:32.9014158Z ld.shared.b16 %rs701, [%r12166+8192]; 2026-02-21T09:23:32.9014231Z ld.shared.b16 %rs702, [%r12166+9216]; 2026-02-21T09:23:32.9014300Z ld.shared.b16 %rs703, [%r12166+8256]; 2026-02-21T09:23:32.9014368Z ld.shared.b16 %rs704, [%r12166+9280]; 2026-02-21T09:23:32.9014432Z add.s32 %r12167, %r12162, %r94; 2026-02-21T09:23:32.9014503Z ld.shared.b16 %rs705, [%r12167]; 2026-02-21T09:23:32.9014573Z ld.shared.b16 %rs706, [%r12167+1024]; 2026-02-21T09:23:32.9014656Z ld.shared.b16 %rs707, [%r12167+64]; 2026-02-21T09:23:32.9014732Z ld.shared.b16 %rs708, [%r12167+1088]; 2026-02-21T09:23:32.9014802Z ld.shared.b16 %rs709, [%r12167+8192]; 2026-02-21T09:23:32.9014874Z ld.shared.b16 %rs710, [%r12167+9216]; 2026-02-21T09:23:32.9014946Z ld.shared.b16 %rs711, [%r12167+8256]; 2026-02-21T09:23:32.9015016Z ld.shared.b16 %rs712, [%r12167+9280]; 2026-02-21T09:23:32.9015079Z add.s32 %r12168, %r12162, %r95; 2026-02-21T09:23:32.9015148Z ld.shared.b16 %rs713, [%r12168]; 2026-02-21T09:23:32.9015222Z ld.shared.b16 %rs714, [%r12168+1024]; 2026-02-21T09:23:32.9015293Z ld.shared.b16 %rs715, [%r12168+64]; 2026-02-21T09:23:32.9015363Z 
ld.shared.b16 %rs716, [%r12168+1088]; 2026-02-21T09:23:32.9015438Z ld.shared.b16 %rs717, [%r12168+8192]; 2026-02-21T09:23:32.9015505Z ld.shared.b16 %rs718, [%r12168+9216]; 2026-02-21T09:23:32.9015575Z ld.shared.b16 %rs719, [%r12168+8256]; 2026-02-21T09:23:32.9015654Z ld.shared.b16 %rs720, [%r12168+9280]; 2026-02-21T09:23:32.9015726Z add.s32 %r12169, %r12162, %r96; 2026-02-21T09:23:32.9015794Z ld.shared.b16 %rs721, [%r12169]; 2026-02-21T09:23:32.9015862Z ld.shared.b16 %rs722, [%r12169+1024]; 2026-02-21T09:23:32.9015940Z ld.shared.b16 %rs723, [%r12169+64]; 2026-02-21T09:23:32.9016008Z ld.shared.b16 %rs724, [%r12169+1088]; 2026-02-21T09:23:32.9016077Z ld.shared.b16 %rs725, [%r12169+8192]; 2026-02-21T09:23:32.9016150Z ld.shared.b16 %rs726, [%r12169+9216]; 2026-02-21T09:23:32.9016220Z ld.shared.b16 %rs727, [%r12169+8256]; 2026-02-21T09:23:32.9016290Z ld.shared.b16 %rs728, [%r12169+9280]; 2026-02-21T09:23:32.9016354Z add.s32 %r12170, %r12162, %r97; 2026-02-21T09:23:32.9016428Z ld.shared.b16 %rs729, [%r12170]; 2026-02-21T09:23:32.9016624Z ld.shared.b16 %rs730, [%r12170+1024]; 2026-02-21T09:23:32.9016696Z ld.shared.b16 %rs731, [%r12170+64]; 2026-02-21T09:23:32.9016769Z ld.shared.b16 %rs732, [%r12170+1088]; 2026-02-21T09:23:32.9016839Z ld.shared.b16 %rs733, [%r12170+8192]; 2026-02-21T09:23:32.9016910Z ld.shared.b16 %rs734, [%r12170+9216]; 2026-02-21T09:23:32.9016980Z ld.shared.b16 %rs735, [%r12170+8256]; 2026-02-21T09:23:32.9017058Z ld.shared.b16 %rs736, [%r12170+9280]; 2026-02-21T09:23:32.9017308Z cvt.f32.bf16 %r9875, %rs673; 2026-02-21T09:23:32.9017374Z cvt.f32.bf16 %r9876, %rs674; 2026-02-21T09:23:32.9017445Z cvt.f32.bf16 %r9877, %rs681; 2026-02-21T09:23:32.9017508Z cvt.f32.bf16 %r9878, %rs682; 2026-02-21T09:23:32.9017572Z cvt.f32.bf16 %r10007, %rs689; 2026-02-21T09:23:32.9017638Z cvt.f32.bf16 %r10008, %rs690; 2026-02-21T09:23:32.9017707Z cvt.f32.bf16 %r10009, %rs697; 2026-02-21T09:23:32.9017774Z cvt.f32.bf16 %r10010, %rs698; 2026-02-21T09:23:32.9017837Z cvt.f32.bf16 
%r10139, %rs705; 2026-02-21T09:23:32.9017904Z cvt.f32.bf16 %r10140, %rs706; 2026-02-21T09:23:32.9017967Z cvt.f32.bf16 %r10141, %rs713; 2026-02-21T09:23:32.9018031Z cvt.f32.bf16 %r10142, %rs714; 2026-02-21T09:23:32.9018099Z cvt.f32.bf16 %r10271, %rs721; 2026-02-21T09:23:32.9018242Z cvt.f32.bf16 %r10272, %rs722; 2026-02-21T09:23:32.9018310Z cvt.f32.bf16 %r10273, %rs729; 2026-02-21T09:23:32.9018373Z cvt.f32.bf16 %r10274, %rs730; 2026-02-21T09:23:32.9018443Z cvt.f32.bf16 %r10403, %rs675; 2026-02-21T09:23:32.9018510Z cvt.f32.bf16 %r10404, %rs676; 2026-02-21T09:23:32.9018574Z cvt.f32.bf16 %r10405, %rs683; 2026-02-21T09:23:32.9018640Z cvt.f32.bf16 %r10406, %rs684; 2026-02-21T09:23:32.9018703Z cvt.f32.bf16 %r10535, %rs691; 2026-02-21T09:23:32.9018823Z cvt.f32.bf16 %r10536, %rs692; 2026-02-21T09:23:32.9018890Z cvt.f32.bf16 %r10537, %rs699; 2026-02-21T09:23:32.9018959Z cvt.f32.bf16 %r10538, %rs700; 2026-02-21T09:23:32.9019024Z cvt.f32.bf16 %r10667, %rs707; 2026-02-21T09:23:32.9019089Z cvt.f32.bf16 %r10668, %rs708; 2026-02-21T09:23:32.9019157Z cvt.f32.bf16 %r10669, %rs715; 2026-02-21T09:23:32.9019218Z cvt.f32.bf16 %r10670, %rs716; 2026-02-21T09:23:32.9019280Z cvt.f32.bf16 %r10799, %rs723; 2026-02-21T09:23:32.9019344Z cvt.f32.bf16 %r10800, %rs724; 2026-02-21T09:23:32.9019414Z cvt.f32.bf16 %r10801, %rs731; 2026-02-21T09:23:32.9019477Z cvt.f32.bf16 %r10802, %rs732; 2026-02-21T09:23:32.9019540Z cvt.f32.bf16 %r10931, %rs677; 2026-02-21T09:23:32.9019618Z cvt.f32.bf16 %r10932, %rs678; 2026-02-21T09:23:32.9019686Z cvt.f32.bf16 %r10933, %rs685; 2026-02-21T09:23:32.9019747Z cvt.f32.bf16 %r10934, %rs686; 2026-02-21T09:23:32.9019814Z cvt.f32.bf16 %r11063, %rs693; 2026-02-21T09:23:32.9019878Z cvt.f32.bf16 %r11064, %rs694; 2026-02-21T09:23:32.9019942Z cvt.f32.bf16 %r11065, %rs701; 2026-02-21T09:23:32.9020016Z cvt.f32.bf16 %r11066, %rs702; 2026-02-21T09:23:32.9020085Z cvt.f32.bf16 %r11195, %rs709; 2026-02-21T09:23:32.9020148Z cvt.f32.bf16 %r11196, %rs710; 2026-02-21T09:23:32.9020209Z 
cvt.f32.bf16 %r11197, %rs717; 2026-02-21T09:23:32.9020275Z cvt.f32.bf16 %r11198, %rs718; 2026-02-21T09:23:32.9020337Z cvt.f32.bf16 %r11327, %rs725; 2026-02-21T09:23:32.9020401Z cvt.f32.bf16 %r11328, %rs726; 2026-02-21T09:23:32.9020462Z cvt.f32.bf16 %r11329, %rs733; 2026-02-21T09:23:32.9020529Z cvt.f32.bf16 %r11330, %rs734; 2026-02-21T09:23:32.9020591Z cvt.f32.bf16 %r11459, %rs679; 2026-02-21T09:23:32.9020653Z cvt.f32.bf16 %r11460, %rs680; 2026-02-21T09:23:32.9020720Z cvt.f32.bf16 %r11461, %rs687; 2026-02-21T09:23:32.9020785Z cvt.f32.bf16 %r11462, %rs688; 2026-02-21T09:23:32.9020847Z cvt.f32.bf16 %r11591, %rs695; 2026-02-21T09:23:32.9020912Z cvt.f32.bf16 %r11592, %rs696; 2026-02-21T09:23:32.9020983Z cvt.f32.bf16 %r11593, %rs703; 2026-02-21T09:23:32.9021048Z cvt.f32.bf16 %r11594, %rs704; 2026-02-21T09:23:32.9021111Z cvt.f32.bf16 %r11723, %rs711; 2026-02-21T09:23:32.9021177Z cvt.f32.bf16 %r11724, %rs712; 2026-02-21T09:23:32.9021239Z cvt.f32.bf16 %r11725, %rs719; 2026-02-21T09:23:32.9021302Z cvt.f32.bf16 %r11726, %rs720; 2026-02-21T09:23:32.9021366Z cvt.f32.bf16 %r11855, %rs727; 2026-02-21T09:23:32.9021432Z cvt.f32.bf16 %r11856, %rs728; 2026-02-21T09:23:32.9021497Z cvt.f32.bf16 %r11857, %rs735; 2026-02-21T09:23:32.9021559Z cvt.f32.bf16 %r11858, %rs736; 2026-02-21T09:23:32.9021786Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.9021853Z shl.b32 %r12171, %r12721, 3; 2026-02-21T09:23:32.9022023Z add.s32 %r9745, %r9488, %r12171; 2026-02-21T09:23:32.9022088Z // begin inline asm 2026-02-21T09:23:32.9022145Z 2026-02-21T09:23:32.9022197Z { 2026-02-21T09:23:32.9022267Z .reg .pred complete; 2026-02-21T09:23:32.9022329Z waitLoop: 2026-02-21T09:23:32.9022482Z mbarrier.try_wait.parity.shared.b64 complete, [%r9745], %r12720; 2026-02-21T09:23:32.9022561Z @!complete bra.uni waitLoop; 2026-02-21T09:23:32.9022618Z } 2026-02-21T09:23:32.9022624Z 2026-02-21T09:23:32.9022683Z // end inline asm 2026-02-21T09:23:32.9022899Z .loc 1 58 33 // 
c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:58:33 2026-02-21T09:23:32.9022970Z shl.b32 %r12173, %r12721, 12; 2026-02-21T09:23:32.9023038Z add.s32 %r12175, %r1289, %r12173; 2026-02-21T09:23:32.9023297Z .loc 1 76 58 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:76:58 2026-02-21T09:23:32.9023366Z add.s32 %r12176, %r12175, %r24; 2026-02-21T09:23:32.9023440Z add.s32 %r12177, %r12175, %r249; 2026-02-21T09:23:32.9023508Z add.s32 %r12178, %r12175, %r250; 2026-02-21T09:23:32.9023573Z add.s32 %r12179, %r12175, %r251; 2026-02-21T09:23:32.9023639Z add.s32 %r12180, %r12175, %r252; 2026-02-21T09:23:32.9023703Z add.s32 %r12181, %r12175, %r253; 2026-02-21T09:23:32.9023810Z add.s32 %r12182, %r12175, %r254; 2026-02-21T09:23:32.9023875Z add.s32 %r12183, %r12175, %r255; 2026-02-21T09:23:32.9024082Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9024151Z ld.shared.s8 %rs737, [%r12176]; 2026-02-21T09:23:32.9024350Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9024422Z shl.b16 %rs738, %rs737, 4; 2026-02-21T09:23:32.9024620Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9024694Z ld.shared.s8 %rs739, [%r12177+128]; 2026-02-21T09:23:32.9024892Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9024960Z shl.b16 %rs740, %rs739, 4; 2026-02-21T09:23:32.9025161Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9025239Z ld.shared.s8 %rs741, [%r12178+256]; 2026-02-21T09:23:32.9025441Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9025506Z shl.b16 %rs742, %rs741, 4; 2026-02-21T09:23:32.9025705Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9025781Z ld.shared.s8 %rs743, [%r12179+384]; 2026-02-21T09:23:32.9025980Z .loc 1 61 
28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9026044Z shl.b16 %rs744, %rs743, 4; 2026-02-21T09:23:32.9026248Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9026333Z ld.shared.s8 %rs745, [%r12180+512]; 2026-02-21T09:23:32.9026648Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9026723Z shl.b16 %rs746, %rs745, 4; 2026-02-21T09:23:32.9026924Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9026993Z ld.shared.s8 %rs747, [%r12181+640]; 2026-02-21T09:23:32.9027193Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9027257Z shl.b16 %rs748, %rs747, 4; 2026-02-21T09:23:32.9027453Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9027524Z ld.shared.s8 %rs749, [%r12182+768]; 2026-02-21T09:23:32.9027740Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9027954Z shl.b16 %rs750, %rs749, 4; 2026-02-21T09:23:32.9028155Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9028285Z ld.shared.s8 %rs751, [%r12183+896]; 2026-02-21T09:23:32.9028489Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9028553Z shl.b16 %rs752, %rs751, 4; 2026-02-21T09:23:32.9028758Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9028829Z ld.shared.s8 %rs753, [%r12176+1024]; 2026-02-21T09:23:32.9029027Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9029099Z shl.b16 %rs754, %rs753, 4; 2026-02-21T09:23:32.9029362Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9029436Z ld.shared.s8 %rs755, [%r12177+1152]; 
2026-02-21T09:23:32.9029639Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9029703Z shl.b16 %rs756, %rs755, 4; 2026-02-21T09:23:32.9029956Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9030041Z ld.shared.s8 %rs757, [%r12178+1280]; 2026-02-21T09:23:32.9030245Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9030310Z shl.b16 %rs758, %rs757, 4; 2026-02-21T09:23:32.9030506Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9030580Z ld.shared.s8 %rs759, [%r12179+1408]; 2026-02-21T09:23:32.9030778Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9030845Z shl.b16 %rs760, %rs759, 4; 2026-02-21T09:23:32.9031044Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9031116Z ld.shared.s8 %rs761, [%r12180+1536]; 2026-02-21T09:23:32.9031311Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9031383Z shl.b16 %rs762, %rs761, 4; 2026-02-21T09:23:32.9031580Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9031650Z ld.shared.s8 %rs763, [%r12181+1664]; 2026-02-21T09:23:32.9031858Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9031925Z shl.b16 %rs764, %rs763, 4; 2026-02-21T09:23:32.9032121Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9032189Z ld.shared.s8 %rs765, [%r12182+1792]; 2026-02-21T09:23:32.9032391Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9032456Z shl.b16 %rs766, %rs765, 4; 2026-02-21T09:23:32.9032651Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9032724Z ld.shared.s8 %rs767, 
[%r12183+1920]; 2026-02-21T09:23:32.9032920Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9032985Z shl.b16 %rs768, %rs767, 4; 2026-02-21T09:23:32.9033188Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9033267Z ld.shared.s8 %rs769, [%r12176+2048]; 2026-02-21T09:23:32.9033464Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9033527Z shl.b16 %rs770, %rs769, 4; 2026-02-21T09:23:32.9033727Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9033858Z ld.shared.s8 %rs771, [%r12177+2176]; 2026-02-21T09:23:32.9034117Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9034184Z shl.b16 %rs772, %rs771, 4; 2026-02-21T09:23:32.9034383Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9034454Z ld.shared.s8 %rs773, [%r12178+2304]; 2026-02-21T09:23:32.9034651Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9034713Z shl.b16 %rs774, %rs773, 4; 2026-02-21T09:23:32.9034908Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9034993Z ld.shared.s8 %rs775, [%r12179+2432]; 2026-02-21T09:23:32.9035261Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9035326Z shl.b16 %rs776, %rs775, 4; 2026-02-21T09:23:32.9035529Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9035602Z ld.shared.s8 %rs777, [%r12180+2560]; 2026-02-21T09:23:32.9035845Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9035910Z shl.b16 %rs778, %rs777, 4; 2026-02-21T09:23:32.9036112Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9036179Z 
ld.shared.s8 %rs779, [%r12181+2688]; 2026-02-21T09:23:32.9036373Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9036441Z shl.b16 %rs780, %rs779, 4; 2026-02-21T09:23:32.9036762Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9036834Z ld.shared.s8 %rs781, [%r12182+2816]; 2026-02-21T09:23:32.9037035Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9037103Z shl.b16 %rs782, %rs781, 4; 2026-02-21T09:23:32.9037311Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9037383Z ld.shared.s8 %rs783, [%r12183+2944]; 2026-02-21T09:23:32.9037585Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9037650Z shl.b16 %rs784, %rs783, 4; 2026-02-21T09:23:32.9037848Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9037920Z ld.shared.s8 %rs785, [%r12176+3072]; 2026-02-21T09:23:32.9038116Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9038179Z shl.b16 %rs786, %rs785, 4; 2026-02-21T09:23:32.9038380Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9038454Z ld.shared.s8 %rs787, [%r12177+3200]; 2026-02-21T09:23:32.9038653Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9038723Z shl.b16 %rs788, %rs787, 4; 2026-02-21T09:23:32.9038925Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9038992Z ld.shared.s8 %rs789, [%r12178+3328]; 2026-02-21T09:23:32.9039192Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9039258Z shl.b16 %rs790, %rs789, 4; 2026-02-21T09:23:32.9039457Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 
2026-02-21T09:23:32.9039526Z ld.shared.s8 %rs791, [%r12179+3456]; 2026-02-21T09:23:32.9039728Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9039941Z shl.b16 %rs792, %rs791, 4; 2026-02-21T09:23:32.9040140Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9040214Z ld.shared.s8 %rs793, [%r12180+3584]; 2026-02-21T09:23:32.9040415Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9040483Z shl.b16 %rs794, %rs793, 4; 2026-02-21T09:23:32.9040688Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9040761Z ld.shared.s8 %rs795, [%r12181+3712]; 2026-02-21T09:23:32.9040960Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9041046Z shl.b16 %rs796, %rs795, 4; 2026-02-21T09:23:32.9041316Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9041389Z ld.shared.s8 %rs797, [%r12182+3840]; 2026-02-21T09:23:32.9041590Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9041660Z shl.b16 %rs798, %rs797, 4; 2026-02-21T09:23:32.9041916Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9041987Z ld.shared.s8 %rs799, [%r12183+3968]; 2026-02-21T09:23:32.9042188Z .loc 1 61 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:61:28 2026-02-21T09:23:32.9042252Z shl.b16 %rs800, %rs799, 4; 2026-02-21T09:23:32.9042452Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9042523Z cvt.s16.s8 %rs801, %rs738; 2026-02-21T09:23:32.9042586Z shr.s16 %rs802, %rs801, 4; 2026-02-21T09:23:32.9042650Z cvt.s16.s8 %rs803, %rs740; 2026-02-21T09:23:32.9042715Z shr.s16 %rs804, %rs803, 4; 2026-02-21T09:23:32.9042782Z shr.s16 %rs805, %rs737, 4; 2026-02-21T09:23:32.9042847Z shr.s16 
%rs806, %rs739, 4; 2026-02-21T09:23:32.9043046Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.9043131Z cvt.rn.f32.s16 %r12184, %rs806; 2026-02-21T09:23:32.9043200Z cvt.rn.f32.s16 %r12185, %rs805; 2026-02-21T09:23:32.9043269Z cvt.rn.f32.s16 %r12186, %rs804; 2026-02-21T09:23:32.9043342Z cvt.rn.f32.s16 %r12187, %rs802; 2026-02-21T09:23:32.9043541Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9043606Z cvt.s16.s8 %rs807, %rs742; 2026-02-21T09:23:32.9043669Z shr.s16 %rs808, %rs807, 4; 2026-02-21T09:23:32.9043736Z cvt.s16.s8 %rs809, %rs744; 2026-02-21T09:23:32.9043801Z shr.s16 %rs810, %rs809, 4; 2026-02-21T09:23:32.9043863Z shr.s16 %rs811, %rs741, 4; 2026-02-21T09:23:32.9043932Z shr.s16 %rs812, %rs743, 4; 2026-02-21T09:23:32.9044131Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.9044202Z cvt.rn.f32.s16 %r12188, %rs812; 2026-02-21T09:23:32.9044275Z cvt.rn.f32.s16 %r12189, %rs811; 2026-02-21T09:23:32.9044350Z cvt.rn.f32.s16 %r12190, %rs810; 2026-02-21T09:23:32.9044415Z cvt.rn.f32.s16 %r12191, %rs808; 2026-02-21T09:23:32.9044616Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9044686Z cvt.s16.s8 %rs813, %rs746; 2026-02-21T09:23:32.9044749Z shr.s16 %rs814, %rs813, 4; 2026-02-21T09:23:32.9044815Z cvt.s16.s8 %rs815, %rs748; 2026-02-21T09:23:32.9044880Z shr.s16 %rs816, %rs815, 4; 2026-02-21T09:23:32.9044943Z shr.s16 %rs817, %rs745, 4; 2026-02-21T09:23:32.9045004Z shr.s16 %rs818, %rs747, 4; 2026-02-21T09:23:32.9045202Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.9045276Z cvt.rn.f32.s16 %r12192, %rs818; 2026-02-21T09:23:32.9045341Z cvt.rn.f32.s16 %r12193, %rs817; 2026-02-21T09:23:32.9045479Z cvt.rn.f32.s16 %r12194, %rs816; 2026-02-21T09:23:32.9045596Z cvt.rn.f32.s16 %r12195, %rs814; 2026-02-21T09:23:32.9045791Z .loc 1 63 25 
// c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9045854Z cvt.s16.s8 %rs819, %rs750; 2026-02-21T09:23:32.9045921Z shr.s16 %rs820, %rs819, 4; 2026-02-21T09:23:32.9045984Z cvt.s16.s8 %rs821, %rs752; 2026-02-21T09:23:32.9046045Z shr.s16 %rs822, %rs821, 4; 2026-02-21T09:23:32.9046107Z shr.s16 %rs823, %rs749, 4; 2026-02-21T09:23:32.9046173Z shr.s16 %rs824, %rs751, 4; 2026-02-21T09:23:32.9046369Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.9046437Z cvt.rn.f32.s16 %r12196, %rs824; 2026-02-21T09:23:32.9046625Z cvt.rn.f32.s16 %r12197, %rs823; 2026-02-21T09:23:32.9046769Z cvt.rn.f32.s16 %r12198, %rs822; 2026-02-21T09:23:32.9046838Z cvt.rn.f32.s16 %r12199, %rs820; 2026-02-21T09:23:32.9047044Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9047113Z cvt.s16.s8 %rs825, %rs754; 2026-02-21T09:23:32.9047176Z shr.s16 %rs826, %rs825, 4; 2026-02-21T09:23:32.9047238Z cvt.s16.s8 %rs827, %rs756; 2026-02-21T09:23:32.9047365Z shr.s16 %rs828, %rs827, 4; 2026-02-21T09:23:32.9047431Z shr.s16 %rs829, %rs753, 4; 2026-02-21T09:23:32.9047492Z shr.s16 %rs830, %rs755, 4; 2026-02-21T09:23:32.9047697Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.9047763Z cvt.rn.f32.s16 %r12200, %rs830; 2026-02-21T09:23:32.9047829Z cvt.rn.f32.s16 %r12201, %rs829; 2026-02-21T09:23:32.9047910Z cvt.rn.f32.s16 %r12202, %rs828; 2026-02-21T09:23:32.9047984Z cvt.rn.f32.s16 %r12203, %rs826; 2026-02-21T09:23:32.9048184Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9048248Z cvt.s16.s8 %rs831, %rs758; 2026-02-21T09:23:32.9048316Z shr.s16 %rs832, %rs831, 4; 2026-02-21T09:23:32.9048383Z cvt.s16.s8 %rs833, %rs760; 2026-02-21T09:23:32.9048445Z shr.s16 %rs834, %rs833, 4; 2026-02-21T09:23:32.9048512Z shr.s16 %rs835, %rs757, 4; 2026-02-21T09:23:32.9048574Z shr.s16 %rs836, %rs759, 4; 
2026-02-21T09:23:32.9048775Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.9048844Z cvt.rn.f32.s16 %r12204, %rs836; 2026-02-21T09:23:32.9048919Z cvt.rn.f32.s16 %r12205, %rs835; 2026-02-21T09:23:32.9048983Z cvt.rn.f32.s16 %r12206, %rs834; 2026-02-21T09:23:32.9049047Z cvt.rn.f32.s16 %r12207, %rs832; 2026-02-21T09:23:32.9049249Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9049316Z cvt.s16.s8 %rs837, %rs762; 2026-02-21T09:23:32.9049381Z shr.s16 %rs838, %rs837, 4; 2026-02-21T09:23:32.9049456Z cvt.s16.s8 %rs839, %rs764; 2026-02-21T09:23:32.9049519Z shr.s16 %rs840, %rs839, 4; 2026-02-21T09:23:32.9049585Z shr.s16 %rs841, %rs761, 4; 2026-02-21T09:23:32.9049648Z shr.s16 %rs842, %rs763, 4; 2026-02-21T09:23:32.9049852Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.9049918Z cvt.rn.f32.s16 %r12208, %rs842; 2026-02-21T09:23:32.9049986Z cvt.rn.f32.s16 %r12209, %rs841; 2026-02-21T09:23:32.9050058Z cvt.rn.f32.s16 %r12210, %rs840; 2026-02-21T09:23:32.9050123Z cvt.rn.f32.s16 %r12211, %rs838; 2026-02-21T09:23:32.9050324Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9050390Z cvt.s16.s8 %rs843, %rs766; 2026-02-21T09:23:32.9050467Z shr.s16 %rs844, %rs843, 4; 2026-02-21T09:23:32.9050532Z cvt.s16.s8 %rs845, %rs768; 2026-02-21T09:23:32.9050597Z shr.s16 %rs846, %rs845, 4; 2026-02-21T09:23:32.9050663Z shr.s16 %rs847, %rs765, 4; 2026-02-21T09:23:32.9050726Z shr.s16 %rs848, %rs767, 4; 2026-02-21T09:23:32.9050923Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.9051140Z cvt.rn.f32.s16 %r12212, %rs848; 2026-02-21T09:23:32.9051203Z cvt.rn.f32.s16 %r12213, %rs847; 2026-02-21T09:23:32.9051268Z cvt.rn.f32.s16 %r12214, %rs846; 2026-02-21T09:23:32.9051333Z cvt.rn.f32.s16 %r12215, %rs844; 2026-02-21T09:23:32.9051551Z .loc 1 63 25 // 
c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9051620Z cvt.s16.s8 %rs849, %rs770; 2026-02-21T09:23:32.9051684Z shr.s16 %rs850, %rs849, 4; 2026-02-21T09:23:32.9051751Z cvt.s16.s8 %rs851, %rs772; 2026-02-21T09:23:32.9051817Z shr.s16 %rs852, %rs851, 4; 2026-02-21T09:23:32.9051880Z shr.s16 %rs853, %rs769, 4; 2026-02-21T09:23:32.9051946Z shr.s16 %rs854, %rs771, 4; 2026-02-21T09:23:32.9052198Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.9052266Z cvt.rn.f32.s16 %r12216, %rs854; 2026-02-21T09:23:32.9052345Z cvt.rn.f32.s16 %r12217, %rs853; 2026-02-21T09:23:32.9052419Z cvt.rn.f32.s16 %r12218, %rs852; 2026-02-21T09:23:32.9052485Z cvt.rn.f32.s16 %r12219, %rs850; 2026-02-21T09:23:32.9052685Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9052802Z cvt.s16.s8 %rs855, %rs774; 2026-02-21T09:23:32.9052865Z shr.s16 %rs856, %rs855, 4; 2026-02-21T09:23:32.9052925Z cvt.s16.s8 %rs857, %rs776; 2026-02-21T09:23:32.9052987Z shr.s16 %rs858, %rs857, 4; 2026-02-21T09:23:32.9053053Z shr.s16 %rs859, %rs773, 4; 2026-02-21T09:23:32.9053117Z shr.s16 %rs860, %rs775, 4; 2026-02-21T09:23:32.9053315Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.9053385Z cvt.rn.f32.s16 %r12220, %rs860; 2026-02-21T09:23:32.9053452Z cvt.rn.f32.s16 %r12221, %rs859; 2026-02-21T09:23:32.9053518Z cvt.rn.f32.s16 %r12222, %rs858; 2026-02-21T09:23:32.9053592Z cvt.rn.f32.s16 %r12223, %rs856; 2026-02-21T09:23:32.9053791Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9053855Z cvt.s16.s8 %rs861, %rs778; 2026-02-21T09:23:32.9053919Z shr.s16 %rs862, %rs861, 4; 2026-02-21T09:23:32.9053988Z cvt.s16.s8 %rs863, %rs780; 2026-02-21T09:23:32.9054049Z shr.s16 %rs864, %rs863, 4; 2026-02-21T09:23:32.9054110Z shr.s16 %rs865, %rs777, 4; 2026-02-21T09:23:32.9054175Z shr.s16 %rs866, %rs779, 4; 
2026-02-21T09:23:32.9054374Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.9054439Z cvt.rn.f32.s16 %r12224, %rs866; 2026-02-21T09:23:32.9054508Z cvt.rn.f32.s16 %r12225, %rs865; 2026-02-21T09:23:32.9054574Z cvt.rn.f32.s16 %r12226, %rs864; 2026-02-21T09:23:32.9054639Z cvt.rn.f32.s16 %r12227, %rs862; 2026-02-21T09:23:32.9054835Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9054907Z cvt.s16.s8 %rs867, %rs782; 2026-02-21T09:23:32.9054972Z shr.s16 %rs868, %rs867, 4; 2026-02-21T09:23:32.9055034Z cvt.s16.s8 %rs869, %rs784; 2026-02-21T09:23:32.9055100Z shr.s16 %rs870, %rs869, 4; 2026-02-21T09:23:32.9055163Z shr.s16 %rs871, %rs781, 4; 2026-02-21T09:23:32.9055227Z shr.s16 %rs872, %rs783, 4; 2026-02-21T09:23:32.9055424Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.9055494Z cvt.rn.f32.s16 %r12228, %rs872; 2026-02-21T09:23:32.9055558Z cvt.rn.f32.s16 %r12229, %rs871; 2026-02-21T09:23:32.9055623Z cvt.rn.f32.s16 %r12230, %rs870; 2026-02-21T09:23:32.9055691Z cvt.rn.f32.s16 %r12231, %rs868; 2026-02-21T09:23:32.9055888Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9055953Z cvt.s16.s8 %rs873, %rs786; 2026-02-21T09:23:32.9056019Z shr.s16 %rs874, %rs873, 4; 2026-02-21T09:23:32.9056082Z cvt.s16.s8 %rs875, %rs788; 2026-02-21T09:23:32.9056281Z shr.s16 %rs876, %rs875, 4; 2026-02-21T09:23:32.9056343Z shr.s16 %rs877, %rs785, 4; 2026-02-21T09:23:32.9056411Z shr.s16 %rs878, %rs787, 4; 2026-02-21T09:23:32.9056723Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.9056796Z cvt.rn.f32.s16 %r12232, %rs878; 2026-02-21T09:23:32.9056870Z cvt.rn.f32.s16 %r12233, %rs877; 2026-02-21T09:23:32.9056934Z cvt.rn.f32.s16 %r12234, %rs876; 2026-02-21T09:23:32.9056997Z cvt.rn.f32.s16 %r12235, %rs874; 2026-02-21T09:23:32.9057199Z .loc 1 63 25 // 
c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9057263Z cvt.s16.s8 %rs879, %rs790; 2026-02-21T09:23:32.9057337Z shr.s16 %rs880, %rs879, 4; 2026-02-21T09:23:32.9057483Z cvt.s16.s8 %rs881, %rs792; 2026-02-21T09:23:32.9057557Z shr.s16 %rs882, %rs881, 4; 2026-02-21T09:23:32.9057619Z shr.s16 %rs883, %rs789, 4; 2026-02-21T09:23:32.9057682Z shr.s16 %rs884, %rs791, 4; 2026-02-21T09:23:32.9057895Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.9057960Z cvt.rn.f32.s16 %r12236, %rs884; 2026-02-21T09:23:32.9058024Z cvt.rn.f32.s16 %r12237, %rs883; 2026-02-21T09:23:32.9058153Z cvt.rn.f32.s16 %r12238, %rs882; 2026-02-21T09:23:32.9058223Z cvt.rn.f32.s16 %r12239, %rs880; 2026-02-21T09:23:32.9058420Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9058485Z cvt.s16.s8 %rs885, %rs794; 2026-02-21T09:23:32.9058552Z shr.s16 %rs886, %rs885, 4; 2026-02-21T09:23:32.9058617Z cvt.s16.s8 %rs887, %rs796; 2026-02-21T09:23:32.9058679Z shr.s16 %rs888, %rs887, 4; 2026-02-21T09:23:32.9058760Z shr.s16 %rs889, %rs793, 4; 2026-02-21T09:23:32.9058828Z shr.s16 %rs890, %rs795, 4; 2026-02-21T09:23:32.9059028Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.9059097Z cvt.rn.f32.s16 %r12240, %rs890; 2026-02-21T09:23:32.9059170Z cvt.rn.f32.s16 %r12241, %rs889; 2026-02-21T09:23:32.9059233Z cvt.rn.f32.s16 %r12242, %rs888; 2026-02-21T09:23:32.9059296Z cvt.rn.f32.s16 %r12243, %rs886; 2026-02-21T09:23:32.9059497Z .loc 1 63 25 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:63:25 2026-02-21T09:23:32.9059560Z cvt.s16.s8 %rs891, %rs798; 2026-02-21T09:23:32.9059622Z shr.s16 %rs892, %rs891, 4; 2026-02-21T09:23:32.9059690Z cvt.s16.s8 %rs893, %rs800; 2026-02-21T09:23:32.9059754Z shr.s16 %rs894, %rs893, 4; 2026-02-21T09:23:32.9059819Z shr.s16 %rs895, %rs797, 4; 2026-02-21T09:23:32.9059881Z shr.s16 %rs896, %rs799, 4; 
2026-02-21T09:23:32.9060083Z .loc 1 81 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:81:32 2026-02-21T09:23:32.9060148Z cvt.rn.f32.s16 %r12244, %rs896; 2026-02-21T09:23:32.9060214Z cvt.rn.f32.s16 %r12245, %rs895; 2026-02-21T09:23:32.9060283Z cvt.rn.f32.s16 %r12246, %rs894; 2026-02-21T09:23:32.9060350Z cvt.rn.f32.s16 %r12247, %rs892; 2026-02-21T09:23:32.9060475Z st.shared.v4.b32 [%r66], {%r12187, %r12185, %r12186, %r12184}; 2026-02-21T09:23:32.9060609Z st.shared.v4.b32 [%r66+16384], {%r12219, %r12217, %r12218, %r12216}; 2026-02-21T09:23:32.9060734Z st.shared.v4.b32 [%r67], {%r12191, %r12189, %r12190, %r12188}; 2026-02-21T09:23:32.9060859Z st.shared.v4.b32 [%r67+16384], {%r12223, %r12221, %r12222, %r12220}; 2026-02-21T09:23:32.9060984Z st.shared.v4.b32 [%r68], {%r12195, %r12193, %r12194, %r12192}; 2026-02-21T09:23:32.9061113Z st.shared.v4.b32 [%r68+16384], {%r12227, %r12225, %r12226, %r12224}; 2026-02-21T09:23:32.9061229Z st.shared.v4.b32 [%r69], {%r12199, %r12197, %r12198, %r12196}; 2026-02-21T09:23:32.9061356Z st.shared.v4.b32 [%r69+16384], {%r12231, %r12229, %r12230, %r12228}; 2026-02-21T09:23:32.9061472Z st.shared.v4.b32 [%r70], {%r12203, %r12201, %r12202, %r12200}; 2026-02-21T09:23:32.9061594Z st.shared.v4.b32 [%r70+16384], {%r12235, %r12233, %r12234, %r12232}; 2026-02-21T09:23:32.9061843Z st.shared.v4.b32 [%r71], {%r12207, %r12205, %r12206, %r12204}; 2026-02-21T09:23:32.9061985Z st.shared.v4.b32 [%r71+16384], {%r12239, %r12237, %r12238, %r12236}; 2026-02-21T09:23:32.9062099Z st.shared.v4.b32 [%r72], {%r12211, %r12209, %r12210, %r12208}; 2026-02-21T09:23:32.9062222Z st.shared.v4.b32 [%r72+16384], {%r12243, %r12241, %r12242, %r12240}; 2026-02-21T09:23:32.9062333Z st.shared.v4.b32 [%r73], {%r12215, %r12213, %r12214, %r12212}; 2026-02-21T09:23:32.9062459Z st.shared.v4.b32 [%r73+16384], {%r12247, %r12245, %r12246, %r12244}; 2026-02-21T09:23:32.9062517Z $L__tmp7: 2026-02-21T09:23:32.9062794Z .loc 2 291 36 // standard.py:291:36 @[ 
c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:88:40 ] 2026-02-21T09:23:32.9062861Z // begin inline asm 2026-02-21T09:23:32.9062998Z fence.proxy.async.shared::cta; 2026-02-21T09:23:32.9063059Z // end inline asm 2026-02-21T09:23:32.9063122Z bar.sync 0; 2026-02-21T09:23:32.9063195Z wgmma.fence.sync.aligned; 2026-02-21T09:23:32.9063264Z mov.pred %p167, -1; 2026-02-21T09:23:32.9063325Z // begin inline asm 2026-02-21T09:23:32.9064877Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743,%r12744,%r12745,%r12746,%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762,%r12763,%r12764,%r12765,%r12766,%r12767,%r12768,%r12769,%r12770,%r12771,%r12772,%r12773,%r12774,%r12775,%r12776,%r12777,%r12778,%r12779,%r12780,%r12781,%r12782,%r12783,%r12784,%r12785,%r12786}, {%r9875,%r9876,%r9877,%r9878}, %rd702, %p167, 1, 1; 2026-02-21T09:23:32.9064939Z // end inline asm 2026-02-21T09:23:32.9065003Z // begin inline asm 2026-02-21T09:23:32.9066623Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743,%r12744,%r12745,%r12746,%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762,%r12763,%r12764,%r12765,%r12766,%r12767,%r12768,%r12769,%r12770,%r12771,%r12772,%r12773,%r12774,%r12775,%r12776,%r12777,%r12778,%r12779,%r12780,%r12781,%r12782,%r12783,%r12784,%r12785,%r12786}, {%r10007,%r10008,%r10009,%r10010}, %rd703, %p167, 1, 1; 2026-02-21T09:23:32.9066694Z // end inline asm 2026-02-21T09:23:32.9066760Z // begin inline asm 2026-02-21T09:23:32.9068336Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743,%r12744,%r12745,%r12746,%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762,%r12763,%r12764,%r12765,%r12766,%r12767,%r12768,%r12769,%r12770,%r12771,%r12772,%r12773,%r12774,%r12775,%r12776,%r12777,%r12778,%r12779,%r12780,%r12781,%r12782,%r12783,%r12784,%r12785,%r12786}, {%r10139,%r10140,%r10141,%r10142}, %rd704, %p167, 1, 1; 2026-02-21T09:23:32.9068408Z // end inline asm 2026-02-21T09:23:32.9068471Z // begin inline asm 2026-02-21T09:23:32.9069970Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743,%r12744,%r12745,%r12746,%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762,%r12763,%r12764,%r12765,%r12766,%r12767,%r12768,%r12769,%r12770,%r12771,%r12772,%r12773,%r12774,%r12775,%r12776,%r12777,%r12778,%r12779,%r12780,%r12781,%r12782,%r12783,%r12784,%r12785,%r12786}, {%r10271,%r10272,%r10273,%r10274}, %rd705, %p167, 1, 1; 2026-02-21T09:23:32.9070035Z // end inline asm 2026-02-21T09:23:32.9070259Z // begin inline asm 2026-02-21T09:23:32.9071754Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743,%r12744,%r12745,%r12746,%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762,%r12763,%r12764,%r12765,%r12766,%r12767,%r12768,%r12769,%r12770,%r12771,%r12772,%r12773,%r12774,%r12775,%r12776,%r12777,%r12778,%r12779,%r12780,%r12781,%r12782,%r12783,%r12784,%r12785,%r12786}, {%r10403,%r10404,%r10405,%r10406}, %rd706, %p167, 1, 1; 2026-02-21T09:23:32.9071819Z // end inline asm 2026-02-21T09:23:32.9071880Z // begin inline asm 2026-02-21T09:23:32.9073483Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743,%r12744,%r12745,%r12746,%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762,%r12763,%r12764,%r12765,%r12766,%r12767,%r12768,%r12769,%r12770,%r12771,%r12772,%r12773,%r12774,%r12775,%r12776,%r12777,%r12778,%r12779,%r12780,%r12781,%r12782,%r12783,%r12784,%r12785,%r12786}, {%r10535,%r10536,%r10537,%r10538}, %rd707, %p167, 1, 1; 2026-02-21T09:23:32.9073551Z // end inline asm 2026-02-21T09:23:32.9073611Z // begin inline asm 2026-02-21T09:23:32.9075106Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743,%r12744,%r12745,%r12746,%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762,%r12763,%r12764,%r12765,%r12766,%r12767,%r12768,%r12769,%r12770,%r12771,%r12772,%r12773,%r12774,%r12775,%r12776,%r12777,%r12778,%r12779,%r12780,%r12781,%r12782,%r12783,%r12784,%r12785,%r12786}, {%r10667,%r10668,%r10669,%r10670}, %rd708, %p167, 1, 1; 2026-02-21T09:23:32.9075169Z // end inline asm 2026-02-21T09:23:32.9075229Z // begin inline asm 2026-02-21T09:23:32.9076850Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743,%r12744,%r12745,%r12746,%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762,%r12763,%r12764,%r12765,%r12766,%r12767,%r12768,%r12769,%r12770,%r12771,%r12772,%r12773,%r12774,%r12775,%r12776,%r12777,%r12778,%r12779,%r12780,%r12781,%r12782,%r12783,%r12784,%r12785,%r12786}, {%r10799,%r10800,%r10801,%r10802}, %rd709, %p167, 1, 1; 2026-02-21T09:23:32.9076915Z // end inline asm 2026-02-21T09:23:32.9076979Z // begin inline asm 2026-02-21T09:23:32.9078469Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12787,%r12788,%r12789,%r12790,%r12791,%r12792,%r12793,%r12794,%r12795,%r12796,%r12797,%r12798,%r12799,%r12800,%r12801,%r12802,%r12803,%r12804,%r12805,%r12806,%r12807,%r12808,%r12809,%r12810,%r12811,%r12812,%r12813,%r12814,%r12815,%r12816,%r12817,%r12818,%r12819,%r12820,%r12821,%r12822,%r12823,%r12824,%r12825,%r12826,%r12827,%r12828,%r12829,%r12830,%r12831,%r12832,%r12833,%r12834,%r12835,%r12836,%r12837,%r12838,%r12839,%r12840,%r12841,%r12842,%r12843,%r12844,%r12845,%r12846,%r12847,%r12848,%r12849,%r12850}, {%r10931,%r10932,%r10933,%r10934}, %rd702, %p167, 1, 1; 2026-02-21T09:23:32.9078527Z // end inline asm 2026-02-21T09:23:32.9078593Z // begin inline asm 2026-02-21T09:23:32.9080090Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12787,%r12788,%r12789,%r12790,%r12791,%r12792,%r12793,%r12794,%r12795,%r12796,%r12797,%r12798,%r12799,%r12800,%r12801,%r12802,%r12803,%r12804,%r12805,%r12806,%r12807,%r12808,%r12809,%r12810,%r12811,%r12812,%r12813,%r12814,%r12815,%r12816,%r12817,%r12818,%r12819,%r12820,%r12821,%r12822,%r12823,%r12824,%r12825,%r12826,%r12827,%r12828,%r12829,%r12830,%r12831,%r12832,%r12833,%r12834,%r12835,%r12836,%r12837,%r12838,%r12839,%r12840,%r12841,%r12842,%r12843,%r12844,%r12845,%r12846,%r12847,%r12848,%r12849,%r12850}, {%r11063,%r11064,%r11065,%r11066}, %rd703, %p167, 1, 1; 2026-02-21T09:23:32.9080297Z // end inline asm 2026-02-21T09:23:32.9080359Z // begin inline asm 2026-02-21T09:23:32.9081911Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12787,%r12788,%r12789,%r12790,%r12791,%r12792,%r12793,%r12794,%r12795,%r12796,%r12797,%r12798,%r12799,%r12800,%r12801,%r12802,%r12803,%r12804,%r12805,%r12806,%r12807,%r12808,%r12809,%r12810,%r12811,%r12812,%r12813,%r12814,%r12815,%r12816,%r12817,%r12818,%r12819,%r12820,%r12821,%r12822,%r12823,%r12824,%r12825,%r12826,%r12827,%r12828,%r12829,%r12830,%r12831,%r12832,%r12833,%r12834,%r12835,%r12836,%r12837,%r12838,%r12839,%r12840,%r12841,%r12842,%r12843,%r12844,%r12845,%r12846,%r12847,%r12848,%r12849,%r12850}, {%r11195,%r11196,%r11197,%r11198}, %rd704, %p167, 1, 1; 2026-02-21T09:23:32.9081980Z // end inline asm 2026-02-21T09:23:32.9082041Z // begin inline asm 2026-02-21T09:23:32.9083608Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12787,%r12788,%r12789,%r12790,%r12791,%r12792,%r12793,%r12794,%r12795,%r12796,%r12797,%r12798,%r12799,%r12800,%r12801,%r12802,%r12803,%r12804,%r12805,%r12806,%r12807,%r12808,%r12809,%r12810,%r12811,%r12812,%r12813,%r12814,%r12815,%r12816,%r12817,%r12818,%r12819,%r12820,%r12821,%r12822,%r12823,%r12824,%r12825,%r12826,%r12827,%r12828,%r12829,%r12830,%r12831,%r12832,%r12833,%r12834,%r12835,%r12836,%r12837,%r12838,%r12839,%r12840,%r12841,%r12842,%r12843,%r12844,%r12845,%r12846,%r12847,%r12848,%r12849,%r12850}, {%r11327,%r11328,%r11329,%r11330}, %rd705, %p167, 1, 1; 2026-02-21T09:23:32.9083671Z // end inline asm 2026-02-21T09:23:32.9083732Z // begin inline asm 2026-02-21T09:23:32.9085226Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12787,%r12788,%r12789,%r12790,%r12791,%r12792,%r12793,%r12794,%r12795,%r12796,%r12797,%r12798,%r12799,%r12800,%r12801,%r12802,%r12803,%r12804,%r12805,%r12806,%r12807,%r12808,%r12809,%r12810,%r12811,%r12812,%r12813,%r12814,%r12815,%r12816,%r12817,%r12818,%r12819,%r12820,%r12821,%r12822,%r12823,%r12824,%r12825,%r12826,%r12827,%r12828,%r12829,%r12830,%r12831,%r12832,%r12833,%r12834,%r12835,%r12836,%r12837,%r12838,%r12839,%r12840,%r12841,%r12842,%r12843,%r12844,%r12845,%r12846,%r12847,%r12848,%r12849,%r12850}, {%r11459,%r11460,%r11461,%r11462}, %rd706, %p167, 1, 1; 2026-02-21T09:23:32.9085290Z // end inline asm 2026-02-21T09:23:32.9085352Z // begin inline asm 2026-02-21T09:23:32.9086965Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12787,%r12788,%r12789,%r12790,%r12791,%r12792,%r12793,%r12794,%r12795,%r12796,%r12797,%r12798,%r12799,%r12800,%r12801,%r12802,%r12803,%r12804,%r12805,%r12806,%r12807,%r12808,%r12809,%r12810,%r12811,%r12812,%r12813,%r12814,%r12815,%r12816,%r12817,%r12818,%r12819,%r12820,%r12821,%r12822,%r12823,%r12824,%r12825,%r12826,%r12827,%r12828,%r12829,%r12830,%r12831,%r12832,%r12833,%r12834,%r12835,%r12836,%r12837,%r12838,%r12839,%r12840,%r12841,%r12842,%r12843,%r12844,%r12845,%r12846,%r12847,%r12848,%r12849,%r12850}, {%r11591,%r11592,%r11593,%r11594}, %rd707, %p167, 1, 1; 2026-02-21T09:23:32.9087044Z // end inline asm 2026-02-21T09:23:32.9087111Z // begin inline asm 2026-02-21T09:23:32.9088596Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12787,%r12788,%r12789,%r12790,%r12791,%r12792,%r12793,%r12794,%r12795,%r12796,%r12797,%r12798,%r12799,%r12800,%r12801,%r12802,%r12803,%r12804,%r12805,%r12806,%r12807,%r12808,%r12809,%r12810,%r12811,%r12812,%r12813,%r12814,%r12815,%r12816,%r12817,%r12818,%r12819,%r12820,%r12821,%r12822,%r12823,%r12824,%r12825,%r12826,%r12827,%r12828,%r12829,%r12830,%r12831,%r12832,%r12833,%r12834,%r12835,%r12836,%r12837,%r12838,%r12839,%r12840,%r12841,%r12842,%r12843,%r12844,%r12845,%r12846,%r12847,%r12848,%r12849,%r12850}, {%r11723,%r11724,%r11725,%r11726}, %rd708, %p167, 1, 1; 2026-02-21T09:23:32.9088789Z // end inline asm 2026-02-21T09:23:32.9088853Z // begin inline asm 2026-02-21T09:23:32.9090337Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12787,%r12788,%r12789,%r12790,%r12791,%r12792,%r12793,%r12794,%r12795,%r12796,%r12797,%r12798,%r12799,%r12800,%r12801,%r12802,%r12803,%r12804,%r12805,%r12806,%r12807,%r12808,%r12809,%r12810,%r12811,%r12812,%r12813,%r12814,%r12815,%r12816,%r12817,%r12818,%r12819,%r12820,%r12821,%r12822,%r12823,%r12824,%r12825,%r12826,%r12827,%r12828,%r12829,%r12830,%r12831,%r12832,%r12833,%r12834,%r12835,%r12836,%r12837,%r12838,%r12839,%r12840,%r12841,%r12842,%r12843,%r12844,%r12845,%r12846,%r12847,%r12848,%r12849,%r12850}, {%r11855,%r11856,%r11857,%r11858}, %rd709, %p167, 1, 1; 2026-02-21T09:23:32.9090400Z // end inline asm 2026-02-21T09:23:32.9090543Z wgmma.commit_group.sync.aligned; 2026-02-21T09:23:32.9090610Z mov.b32 %r11989, %r9530; 2026-02-21T09:23:32.9090675Z mov.b32 %r11988, %r9530; 2026-02-21T09:23:32.9090743Z mov.b32 %r11987, %r1221; 2026-02-21T09:23:32.9090803Z // begin inline asm 2026-02-21T09:23:32.9093412Z // wait for regs: 
%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743,%r12744,%r12745,%r12746,%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762,%r12763,%r12764,%r12765,%r12766,%r12767,%r12768,%r12769,%r12770,%r12771,%r12772,%r12773,%r12774,%r12775,%r12776,%r12777,%r12778,%r12779,%r12780,%r12781,%r12782,%r12783,%r12784,%r12785,%r12786,%r12787,%r12788,%r12789,%r12790,%r12791,%r12792,%r12793,%r12794,%r12795,%r12796,%r12797,%r12798,%r12799,%r12800,%r12801,%r12802,%r12803,%r12804,%r12805,%r12806,%r12807,%r12808,%r12809,%r12810,%r12811,%r12812,%r12813,%r12814,%r12815,%r12816,%r12817,%r12818,%r12819,%r12820,%r12821,%r12822,%r12823,%r12824,%r12825,%r12826,%r12827,%r12828,%r12829,%r12830,%r12831,%r12832,%r12833,%r12834,%r12835,%r12836,%r12837,%r12838,%r12839,%r12840,%r12841,%r12842,%r12843,%r12844,%r12845,%r12846,%r12847,%r12848,%r12849,%r12850,%r11987,%r11988,%r11989 2026-02-21T09:23:32.9093504Z wgmma.wait_group.sync.aligned 0; 2026-02-21T09:23:32.9093563Z // end inline asm 2026-02-21T09:23:32.9093626Z $L__tmp8: 2026-02-21T09:23:32.9093837Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.9093905Z add.s32 %r12248, %r12722, 1; 2026-02-21T09:23:32.9093981Z setp.gt.s32 %p189, %r12248, 1; 2026-02-21T09:23:32.9094054Z selp.b32 %r12722, 0, %r12248, %p189; 2026-02-21T09:23:32.9094257Z .loc 1 52 32 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:32 2026-02-21T09:23:32.9094325Z add.s64 %rd718, %rd742, %rd88; 2026-02-21T09:23:32.9094400Z add.s64 %rd719, %rd742, %rd87; 2026-02-21T09:23:32.9094465Z add.s64 %rd720, %rd742, %rd86; 2026-02-21T09:23:32.9094530Z add.s64 %rd721, %rd742, %rd85; 2026-02-21T09:23:32.9094600Z add.s64 %rd722, %rd742, %rd84; 2026-02-21T09:23:32.9094666Z add.s64 %rd723, %rd742, %rd83; 2026-02-21T09:23:32.9094734Z add.s64 %rd724, 
%rd742, %rd82; 2026-02-21T09:23:32.9094801Z add.s64 %rd725, %rd742, %rd81; 2026-02-21T09:23:32.9094885Z add.s64 %rd726, %rd742, %rd80; 2026-02-21T09:23:32.9094951Z add.s64 %rd727, %rd742, %rd79; 2026-02-21T09:23:32.9095017Z add.s64 %rd728, %rd742, %rd78; 2026-02-21T09:23:32.9095086Z add.s64 %rd729, %rd742, %rd77; 2026-02-21T09:23:32.9095150Z add.s64 %rd730, %rd742, %rd76; 2026-02-21T09:23:32.9095215Z add.s64 %rd731, %rd742, %rd75; 2026-02-21T09:23:32.9095281Z add.s64 %rd732, %rd742, %rd74; 2026-02-21T09:23:32.9095482Z .loc 1 52 80 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:52:80 2026-02-21T09:23:32.9095550Z add.s64 %rd733, %rd742, %rd73; 2026-02-21T09:23:32.9095615Z shl.b32 %r12249, %r12722, 14; 2026-02-21T09:23:32.9095697Z add.s32 %r12250, %r1205, %r12249; 2026-02-21T09:23:32.9095825Z add.s32 %r12121, %r12250, %r25; 2026-02-21T09:23:32.9095937Z selp.b32 %r12122, 8, 0, %p187; 2026-02-21T09:23:32.9096004Z // begin inline asm 2026-02-21T09:23:32.9096160Z cp.async.ca.shared.global [ %r12121 + 0 ], [ %rd718 + 0 ], 0x8, %r12122; 2026-02-21T09:23:32.9096220Z // end inline asm 2026-02-21T09:23:32.9096288Z add.s32 %r12123, %r12121, 1024; 2026-02-21T09:23:32.9096356Z // begin inline asm 2026-02-21T09:23:32.9096627Z cp.async.ca.shared.global [ %r12123 + 0 ], [ %rd719 + 0 ], 0x8, %r12122; 2026-02-21T09:23:32.9096693Z // end inline asm 2026-02-21T09:23:32.9096763Z add.s32 %r12125, %r12121, 2048; 2026-02-21T09:23:32.9096826Z // begin inline asm 2026-02-21T09:23:32.9096976Z cp.async.ca.shared.global [ %r12125 + 0 ], [ %rd720 + 0 ], 0x8, %r12122; 2026-02-21T09:23:32.9097046Z // end inline asm 2026-02-21T09:23:32.9097184Z add.s32 %r12127, %r12121, 3072; 2026-02-21T09:23:32.9097250Z // begin inline asm 2026-02-21T09:23:32.9097398Z cp.async.ca.shared.global [ %r12127 + 0 ], [ %rd721 + 0 ], 0x8, %r12122; 2026-02-21T09:23:32.9097467Z // end inline asm 2026-02-21T09:23:32.9097529Z add.s32 %r12129, %r12121, 4096; 2026-02-21T09:23:32.9097591Z // begin inline asm 
2026-02-21T09:23:32.9097739Z cp.async.ca.shared.global [ %r12129 + 0 ], [ %rd722 + 0 ], 0x8, %r12122; 2026-02-21T09:23:32.9097874Z // end inline asm 2026-02-21T09:23:32.9097943Z add.s32 %r12131, %r12121, 5120; 2026-02-21T09:23:32.9098004Z // begin inline asm 2026-02-21T09:23:32.9098151Z cp.async.ca.shared.global [ %r12131 + 0 ], [ %rd723 + 0 ], 0x8, %r12122; 2026-02-21T09:23:32.9098210Z // end inline asm 2026-02-21T09:23:32.9098273Z add.s32 %r12133, %r12121, 6144; 2026-02-21T09:23:32.9098340Z // begin inline asm 2026-02-21T09:23:32.9098483Z cp.async.ca.shared.global [ %r12133 + 0 ], [ %rd724 + 0 ], 0x8, %r12122; 2026-02-21T09:23:32.9098544Z // end inline asm 2026-02-21T09:23:32.9098610Z add.s32 %r12135, %r12121, 7168; 2026-02-21T09:23:32.9098671Z // begin inline asm 2026-02-21T09:23:32.9098812Z cp.async.ca.shared.global [ %r12135 + 0 ], [ %rd725 + 0 ], 0x8, %r12122; 2026-02-21T09:23:32.9098874Z // end inline asm 2026-02-21T09:23:32.9098942Z add.s32 %r12137, %r12121, 8192; 2026-02-21T09:23:32.9099000Z // begin inline asm 2026-02-21T09:23:32.9099143Z cp.async.ca.shared.global [ %r12137 + 0 ], [ %rd726 + 0 ], 0x8, %r12122; 2026-02-21T09:23:32.9099205Z // end inline asm 2026-02-21T09:23:32.9099269Z add.s32 %r12139, %r12121, 9216; 2026-02-21T09:23:32.9099333Z // begin inline asm 2026-02-21T09:23:32.9099475Z cp.async.ca.shared.global [ %r12139 + 0 ], [ %rd727 + 0 ], 0x8, %r12122; 2026-02-21T09:23:32.9099541Z // end inline asm 2026-02-21T09:23:32.9099607Z add.s32 %r12141, %r12121, 10240; 2026-02-21T09:23:32.9099671Z // begin inline asm 2026-02-21T09:23:32.9099829Z cp.async.ca.shared.global [ %r12141 + 0 ], [ %rd728 + 0 ], 0x8, %r12122; 2026-02-21T09:23:32.9099889Z // end inline asm 2026-02-21T09:23:32.9099954Z add.s32 %r12143, %r12121, 11264; 2026-02-21T09:23:32.9100019Z // begin inline asm 2026-02-21T09:23:32.9100168Z cp.async.ca.shared.global [ %r12143 + 0 ], [ %rd729 + 0 ], 0x8, %r12122; 2026-02-21T09:23:32.9100227Z // end inline asm 2026-02-21T09:23:32.9100296Z 
add.s32 %r12145, %r12121, 12288; 2026-02-21T09:23:32.9100361Z // begin inline asm 2026-02-21T09:23:32.9100504Z cp.async.ca.shared.global [ %r12145 + 0 ], [ %rd730 + 0 ], 0x8, %r12122; 2026-02-21T09:23:32.9100573Z // end inline asm 2026-02-21T09:23:32.9100642Z add.s32 %r12147, %r12121, 13312; 2026-02-21T09:23:32.9100705Z // begin inline asm 2026-02-21T09:23:32.9100847Z cp.async.ca.shared.global [ %r12147 + 0 ], [ %rd731 + 0 ], 0x8, %r12122; 2026-02-21T09:23:32.9100904Z // end inline asm 2026-02-21T09:23:32.9100978Z add.s32 %r12149, %r12121, 14336; 2026-02-21T09:23:32.9101038Z // begin inline asm 2026-02-21T09:23:32.9101181Z cp.async.ca.shared.global [ %r12149 + 0 ], [ %rd732 + 0 ], 0x8, %r12122; 2026-02-21T09:23:32.9101243Z // end inline asm 2026-02-21T09:23:32.9101307Z add.s32 %r12151, %r12121, 15360; 2026-02-21T09:23:32.9101369Z // begin inline asm 2026-02-21T09:23:32.9101658Z cp.async.ca.shared.global [ %r12151 + 0 ], [ %rd733 + 0 ], 0x8, %r12122; 2026-02-21T09:23:32.9101720Z // end inline asm 2026-02-21T09:23:32.9101786Z cp.async.commit_group; 2026-02-21T09:23:32.9101999Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.9102069Z shl.b32 %r12251, %r12722, 3; 2026-02-21T09:23:32.9102137Z add.s32 %r12153, %r9488, %r12251; 2026-02-21T09:23:32.9102209Z and.pred %p183, %p2, %p187; 2026-02-21T09:23:32.9102274Z // begin inline asm 2026-02-21T09:23:32.9102416Z @%p183 mbarrier.arrive.expect_tx.shared.b64 _, [%r12153], 4096; 2026-02-21T09:23:32.9102474Z // end inline asm 2026-02-21T09:23:32.9102726Z .loc 1 58 33 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:58:33 2026-02-21T09:23:32.9102799Z shl.b32 %r12252, %r12722, 12; 2026-02-21T09:23:32.9102866Z add.s32 %r12154, %r1289, %r12252; 2026-02-21T09:23:32.9102924Z bar.sync 0; 2026-02-21T09:23:32.9103015Z elect.sync %r12253|%p190, -1; 2026-02-21T09:23:32.9103087Z and.pred %p191, %p187, %p190; 2026-02-21T09:23:32.9103158Z and.pred %p184, %p1, %p191; 
2026-02-21T09:23:32.9103231Z cvt.u32.u64 %r12254, %rd743; 2026-02-21T09:23:32.9103345Z add.s32 %r12156, %r12254, 64; 2026-02-21T09:23:32.9103408Z // begin inline asm 2026-02-21T09:23:32.9103761Z @%p184 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r12154], [%rd603, {%r9529, %r12156}], [%r12153]; 2026-02-21T09:23:32.9103827Z // end inline asm 2026-02-21T09:23:32.9104030Z .loc 1 45 92 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:45:92 2026-02-21T09:23:32.9104098Z add.s64 %rd742, %rd742, 128; 2026-02-21T09:23:32.9104173Z setp.lt.u64 %p192, %rd743, 480; 2026-02-21T09:23:32.9104241Z add.s64 %rd743, %rd743, 32; 2026-02-21T09:23:32.9104304Z @%p192 bra $L__BB0_9; 2026-02-21T09:23:32.9104425Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T09:23:32.9104497Z cp.async.wait_group 0; 2026-02-21T09:23:32.9104558Z bar.sync 0; 2026-02-21T09:23:32.9104623Z // begin inline asm 2026-02-21T09:23:32.9104727Z @%p2 mbarrier.inval.shared::cta.b64 [%r9488]; 2026-02-21T09:23:32.9104786Z // end inline asm 2026-02-21T09:23:32.9104844Z bar.sync 0; 2026-02-21T09:23:32.9104909Z // begin inline asm 2026-02-21T09:23:32.9105003Z @%p2 mbarrier.inval.shared::cta.b64 [%r9489]; 2026-02-21T09:23:32.9105063Z // end inline asm 2026-02-21T09:23:32.9105266Z .loc 1 91 28 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:91:28 2026-02-21T09:23:32.9105359Z cvt.rn.bf16x2.f32 %r12261, %r12724, %r12723; 2026-02-21T09:23:32.9105444Z cvt.rn.bf16x2.f32 %r12262, %r12726, %r12725; 2026-02-21T09:23:32.9105525Z cvt.rn.bf16x2.f32 %r12263, %r12728, %r12727; 2026-02-21T09:23:32.9105608Z cvt.rn.bf16x2.f32 %r12264, %r12730, %r12729; 2026-02-21T09:23:32.9105689Z cvt.rn.bf16x2.f32 %r12265, %r12732, %r12731; 2026-02-21T09:23:32.9105773Z cvt.rn.bf16x2.f32 %r12266, %r12734, %r12733; 2026-02-21T09:23:32.9105857Z cvt.rn.bf16x2.f32 %r12267, %r12736, %r12735; 2026-02-21T09:23:32.9105948Z cvt.rn.bf16x2.f32 %r12268, %r12738, %r12737; 2026-02-21T09:23:32.9106031Z 
cvt.rn.bf16x2.f32 %r12269, %r12740, %r12739; 2026-02-21T09:23:32.9106114Z cvt.rn.bf16x2.f32 %r12270, %r12742, %r12741; 2026-02-21T09:23:32.9106198Z cvt.rn.bf16x2.f32 %r12271, %r12744, %r12743; 2026-02-21T09:23:32.9106277Z cvt.rn.bf16x2.f32 %r12272, %r12746, %r12745; 2026-02-21T09:23:32.9106356Z cvt.rn.bf16x2.f32 %r12273, %r12748, %r12747; 2026-02-21T09:23:32.9106439Z cvt.rn.bf16x2.f32 %r12274, %r12750, %r12749; 2026-02-21T09:23:32.9106634Z cvt.rn.bf16x2.f32 %r12275, %r12752, %r12751; 2026-02-21T09:23:32.9106716Z cvt.rn.bf16x2.f32 %r12276, %r12754, %r12753; 2026-02-21T09:23:32.9106802Z cvt.rn.bf16x2.f32 %r12277, %r12756, %r12755; 2026-02-21T09:23:32.9106882Z cvt.rn.bf16x2.f32 %r12278, %r12758, %r12757; 2026-02-21T09:23:32.9106961Z cvt.rn.bf16x2.f32 %r12279, %r12760, %r12759; 2026-02-21T09:23:32.9107212Z cvt.rn.bf16x2.f32 %r12280, %r12762, %r12761; 2026-02-21T09:23:32.9107301Z cvt.rn.bf16x2.f32 %r12281, %r12764, %r12763; 2026-02-21T09:23:32.9107383Z cvt.rn.bf16x2.f32 %r12282, %r12766, %r12765; 2026-02-21T09:23:32.9107466Z cvt.rn.bf16x2.f32 %r12283, %r12768, %r12767; 2026-02-21T09:23:32.9107551Z cvt.rn.bf16x2.f32 %r12284, %r12770, %r12769; 2026-02-21T09:23:32.9107631Z cvt.rn.bf16x2.f32 %r12285, %r12772, %r12771; 2026-02-21T09:23:32.9107710Z cvt.rn.bf16x2.f32 %r12286, %r12774, %r12773; 2026-02-21T09:23:32.9107795Z cvt.rn.bf16x2.f32 %r12287, %r12776, %r12775; 2026-02-21T09:23:32.9107876Z cvt.rn.bf16x2.f32 %r12288, %r12778, %r12777; 2026-02-21T09:23:32.9107956Z cvt.rn.bf16x2.f32 %r12289, %r12780, %r12779; 2026-02-21T09:23:32.9108120Z cvt.rn.bf16x2.f32 %r12290, %r12782, %r12781; 2026-02-21T09:23:32.9108207Z cvt.rn.bf16x2.f32 %r12291, %r12784, %r12783; 2026-02-21T09:23:32.9108362Z cvt.rn.bf16x2.f32 %r12292, %r12786, %r12785; 2026-02-21T09:23:32.9108450Z cvt.rn.bf16x2.f32 %r12293, %r12788, %r12787; 2026-02-21T09:23:32.9108534Z cvt.rn.bf16x2.f32 %r12294, %r12790, %r12789; 2026-02-21T09:23:32.9108616Z cvt.rn.bf16x2.f32 %r12295, %r12792, %r12791; 2026-02-21T09:23:32.9108764Z 
cvt.rn.bf16x2.f32 %r12296, %r12794, %r12793; 2026-02-21T09:23:32.9108850Z cvt.rn.bf16x2.f32 %r12297, %r12796, %r12795; 2026-02-21T09:23:32.9108929Z cvt.rn.bf16x2.f32 %r12298, %r12798, %r12797; 2026-02-21T09:23:32.9109007Z cvt.rn.bf16x2.f32 %r12299, %r12800, %r12799; 2026-02-21T09:23:32.9109085Z cvt.rn.bf16x2.f32 %r12300, %r12802, %r12801; 2026-02-21T09:23:32.9109170Z cvt.rn.bf16x2.f32 %r12301, %r12804, %r12803; 2026-02-21T09:23:32.9109250Z cvt.rn.bf16x2.f32 %r12302, %r12806, %r12805; 2026-02-21T09:23:32.9109331Z cvt.rn.bf16x2.f32 %r12303, %r12808, %r12807; 2026-02-21T09:23:32.9109417Z cvt.rn.bf16x2.f32 %r12304, %r12810, %r12809; 2026-02-21T09:23:32.9109497Z cvt.rn.bf16x2.f32 %r12305, %r12812, %r12811; 2026-02-21T09:23:32.9109578Z cvt.rn.bf16x2.f32 %r12306, %r12814, %r12813; 2026-02-21T09:23:32.9109673Z cvt.rn.bf16x2.f32 %r12307, %r12816, %r12815; 2026-02-21T09:23:32.9109753Z cvt.rn.bf16x2.f32 %r12308, %r12818, %r12817; 2026-02-21T09:23:32.9109834Z cvt.rn.bf16x2.f32 %r12309, %r12820, %r12819; 2026-02-21T09:23:32.9109914Z cvt.rn.bf16x2.f32 %r12310, %r12822, %r12821; 2026-02-21T09:23:32.9110000Z cvt.rn.bf16x2.f32 %r12311, %r12824, %r12823; 2026-02-21T09:23:32.9110078Z cvt.rn.bf16x2.f32 %r12312, %r12826, %r12825; 2026-02-21T09:23:32.9110160Z cvt.rn.bf16x2.f32 %r12313, %r12828, %r12827; 2026-02-21T09:23:32.9110246Z cvt.rn.bf16x2.f32 %r12314, %r12830, %r12829; 2026-02-21T09:23:32.9110327Z cvt.rn.bf16x2.f32 %r12315, %r12832, %r12831; 2026-02-21T09:23:32.9110408Z cvt.rn.bf16x2.f32 %r12316, %r12834, %r12833; 2026-02-21T09:23:32.9110493Z cvt.rn.bf16x2.f32 %r12317, %r12836, %r12835; 2026-02-21T09:23:32.9110572Z cvt.rn.bf16x2.f32 %r12318, %r12838, %r12837; 2026-02-21T09:23:32.9110652Z cvt.rn.bf16x2.f32 %r12319, %r12840, %r12839; 2026-02-21T09:23:32.9110737Z cvt.rn.bf16x2.f32 %r12320, %r12842, %r12841; 2026-02-21T09:23:32.9110821Z cvt.rn.bf16x2.f32 %r12321, %r12844, %r12843; 2026-02-21T09:23:32.9110900Z cvt.rn.bf16x2.f32 %r12322, %r12846, %r12845; 2026-02-21T09:23:32.9110981Z 
cvt.rn.bf16x2.f32 %r12323, %r12848, %r12847; 2026-02-21T09:23:32.9111072Z cvt.rn.bf16x2.f32 %r12324, %r12850, %r12849; 2026-02-21T09:23:32.9111289Z .loc 1 92 43 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:92:43 2026-02-21T09:23:32.9111493Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r74], {%r12261, %r12262, %r12263, %r12264}; 2026-02-21T09:23:32.9111694Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r75], {%r12277, %r12278, %r12279, %r12280}; 2026-02-21T09:23:32.9111891Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r76], {%r12293, %r12294, %r12295, %r12296}; 2026-02-21T09:23:32.9112081Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r77], {%r12309, %r12310, %r12311, %r12312}; 2026-02-21T09:23:32.9112277Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r78], {%r12265, %r12266, %r12267, %r12268}; 2026-02-21T09:23:32.9112571Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r79], {%r12281, %r12282, %r12283, %r12284}; 2026-02-21T09:23:32.9112758Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r80], {%r12297, %r12298, %r12299, %r12300}; 2026-02-21T09:23:32.9112943Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r81], {%r12313, %r12314, %r12315, %r12316}; 2026-02-21T09:23:32.9113148Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r82], {%r12269, %r12270, %r12271, %r12272}; 2026-02-21T09:23:32.9113333Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r83], {%r12285, %r12286, %r12287, %r12288}; 2026-02-21T09:23:32.9113524Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r84], {%r12301, %r12302, %r12303, %r12304}; 2026-02-21T09:23:32.9116693Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r85], {%r12317, %r12318, %r12319, %r12320}; 2026-02-21T09:23:32.9116960Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r86], {%r12273, %r12274, %r12275, %r12276}; 2026-02-21T09:23:32.9117180Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r87], {%r12289, %r12290, %r12291, %r12292}; 2026-02-21T09:23:32.9117376Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r88], {%r12305, %r12306, %r12307, %r12308}; 2026-02-21T09:23:32.9117672Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r89], {%r12321, %r12322, %r12323, %r12324}; 2026-02-21T09:23:32.9117749Z // begin inline asm 2026-02-21T09:23:32.9117840Z fence.proxy.async.shared::cta; 2026-02-21T09:23:32.9117902Z // end inline asm 2026-02-21T09:23:32.9117963Z bar.sync 0; 2026-02-21T09:23:32.9118036Z elect.sync %r12325|%p197, -1; 2026-02-21T09:23:32.9118106Z and.pred %p195, %p82, %p197; 2026-02-21T09:23:32.9118177Z or.b32 %r12257, %r386, %r9529; 2026-02-21T09:23:32.9118241Z // begin inline asm 2026-02-21T09:23:32.9118490Z @%p195 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd586, {%r12257, %r12258}], [%r9492]; 2026-02-21T09:23:32.9118551Z // end inline asm 2026-02-21T09:23:32.9118631Z cp.async.bulk.commit_group; 2026-02-21T09:23:32.9118715Z cp.async.bulk.wait_group.read 0; 2026-02-21T09:23:32.9118772Z bar.sync 0; 2026-02-21T09:23:32.9119022Z .loc 1 31 88 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:31:88 2026-02-21T09:23:32.9119094Z add.s32 %r12326, %r12326, 4; 2026-02-21T09:23:32.9119173Z setp.lt.s32 %p198, %r12326, %r3; 2026-02-21T09:23:32.9119249Z @%p198 bra $L__BB0_2; 2026-02-21T09:23:32.9119353Z $L__BB0_11: // %.preheader 2026-02-21T09:23:32.9119574Z .loc 1 31 4 // c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py:31:4 2026-02-21T09:23:32.9119637Z ret; 2026-02-21T09:23:32.9119699Z $L__tmp9: 2026-02-21T09:23:32.9119757Z $L__func_end0: 2026-02-21T09:23:32.9119850Z // -- End function 2026-02-21T09:23:32.9119908Z } 2026-02-21T09:23:32.9120160Z .file 1 "/tmp/torchinductor_root/64/c64csfft7u75h7xweol6l4ggjgqkcuk76hr6hyipbeiik6dln3ar.py" 2026-02-21T09:23:32.9120381Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T09:23:32.9120451Z .section .debug_abbrev 2026-02-21T09:23:32.9120507Z { 2026-02-21T09:23:32.9120612Z .b8 1 // Abbreviation Code 2026-02-21T09:23:32.9120713Z .b8 17 // DW_TAG_compile_unit 2026-02-21T09:23:32.9120807Z .b8 1 // DW_CHILDREN_yes 
2026-02-21T09:23:32.9120896Z .b8 37 // DW_AT_producer 2026-02-21T09:23:32.9120979Z .b8 8 // DW_FORM_string 2026-02-21T09:23:32.9121063Z .b8 19 // DW_AT_language 2026-02-21T09:23:32.9121149Z .b8 5 // DW_FORM_data2 2026-02-21T09:23:32.9121233Z .b8 3 // DW_AT_name 2026-02-21T09:23:32.9121314Z .b8 8 // DW_FORM_string 2026-02-21T09:23:32.9121401Z .b8 16 // DW_AT_stmt_list 2026-02-21T09:23:32.9121635Z .b8 6 // DW_FORM_data4 2026-02-21T09:23:32.9121723Z .b8 27 // DW_AT_comp_dir 2026-02-21T09:23:32.9121811Z .b8 8 // DW_FORM_string 2026-02-21T09:23:32.9121890Z .b8 0 // EOM(1) 2026-02-21T09:23:32.9121965Z .b8 0 // EOM(2) 2026-02-21T09:23:32.9122074Z .b8 2 // Abbreviation Code 2026-02-21T09:23:32.9122171Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:23:32.9122263Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:23:32.9122346Z .b8 3 // DW_AT_name 2026-02-21T09:23:32.9122497Z .b8 8 // DW_FORM_string 2026-02-21T09:23:32.9122587Z .b8 32 // DW_AT_inline 2026-02-21T09:23:32.9122669Z .b8 11 // DW_FORM_data1 2026-02-21T09:23:32.9122750Z .b8 0 // EOM(1) 2026-02-21T09:23:32.9122823Z .b8 0 // EOM(2) 2026-02-21T09:23:32.9122915Z .b8 3 // Abbreviation Code 2026-02-21T09:23:32.9123061Z .b8 46 // DW_TAG_subprogram 2026-02-21T09:23:32.9123148Z .b8 1 // DW_CHILDREN_yes 2026-02-21T09:23:32.9123228Z .b8 17 // DW_AT_low_pc 2026-02-21T09:23:32.9123309Z .b8 1 // DW_FORM_addr 2026-02-21T09:23:32.9123394Z .b8 18 // DW_AT_high_pc 2026-02-21T09:23:32.9123478Z .b8 1 // DW_FORM_addr 2026-02-21T09:23:32.9123576Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:23:32.9123660Z .b8 19 // DW_FORM_ref4 2026-02-21T09:23:32.9123735Z .b8 0 // EOM(1) 2026-02-21T09:23:32.9123823Z .b8 0 // EOM(2) 2026-02-21T09:23:32.9123922Z .b8 4 // Abbreviation Code 2026-02-21T09:23:32.9124031Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T09:23:32.9124118Z .b8 0 // DW_CHILDREN_no 2026-02-21T09:23:32.9124217Z .b8 49 // DW_AT_abstract_origin 2026-02-21T09:23:32.9124299Z .b8 19 // DW_FORM_ref4 2026-02-21T09:23:32.9124380Z .b8 17 // DW_AT_low_pc 
2026-02-21T09:23:32.9124459Z .b8 1 // DW_FORM_addr 2026-02-21T09:23:32.9124549Z .b8 18 // DW_AT_high_pc 2026-02-21T09:23:32.9124630Z .b8 1 // DW_FORM_addr 2026-02-21T09:23:32.9124716Z .b8 88 // DW_AT_call_file 2026-02-21T09:23:32.9124801Z .b8 11 // DW_FORM_data1 2026-02-21T09:23:32.9124892Z .b8 89 // DW_AT_call_line 2026-02-21T09:23:32.9124975Z .b8 11 // DW_FORM_data1 2026-02-21T09:23:32.9125074Z .b8 87 // DW_AT_call_column 2026-02-21T09:23:32.9125156Z .b8 11 // DW_FORM_data1 2026-02-21T09:23:32.9125232Z .b8 0 // EOM(1) 2026-02-21T09:23:32.9125310Z .b8 0 // EOM(2) 2026-02-21T09:23:32.9125383Z .b8 0 // EOM(3) 2026-02-21T09:23:32.9125448Z } 2026-02-21T09:23:32.9125516Z .section .debug_info 2026-02-21T09:23:32.9125580Z { 2026-02-21T09:23:32.9125675Z .b32 178 // Length of Unit 2026-02-21T09:23:32.9125777Z .b8 2 // DWARF version number 2026-02-21T09:23:32.9125833Z .b8 0 2026-02-21T09:23:32.9125973Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T09:23:32.9126195Z .b8 8 // Address Size (in bytes) 2026-02-21T09:23:32.9126314Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T09:23:32.9126414Z .b8 116 // DW_AT_producer 2026-02-21T09:23:32.9126601Z .b8 114 2026-02-21T09:23:32.9126662Z .b8 105 2026-02-21T09:23:32.9126724Z .b8 116 2026-02-21T09:23:32.9126778Z .b8 111 2026-02-21T09:23:32.9126834Z .b8 110 2026-02-21T09:23:32.9126892Z .b8 0 2026-02-21T09:23:32.9126986Z .b8 2 // DW_AT_language 2026-02-21T09:23:32.9127038Z .b8 0 2026-02-21T09:23:32.9127121Z .b8 99 // DW_AT_name 2026-02-21T09:23:32.9127179Z .b8 54 2026-02-21T09:23:32.9127233Z .b8 52 2026-02-21T09:23:32.9127376Z .b8 99 2026-02-21T09:23:32.9127433Z .b8 115 2026-02-21T09:23:32.9127496Z .b8 102 2026-02-21T09:23:32.9127550Z .b8 102 2026-02-21T09:23:32.9127603Z .b8 116 2026-02-21T09:23:32.9127667Z .b8 55 2026-02-21T09:23:32.9127721Z .b8 117 2026-02-21T09:23:32.9127778Z .b8 55 2026-02-21T09:23:32.9127832Z .b8 53 2026-02-21T09:23:32.9127892Z .b8 104 2026-02-21T09:23:32.9127944Z .b8 55 
2026-02-21T09:23:32.9127998Z .b8 120 2026-02-21T09:23:32.9128057Z .b8 119 2026-02-21T09:23:32.9128176Z .b8 101 2026-02-21T09:23:32.9128234Z .b8 111 2026-02-21T09:23:32.9128291Z .b8 108 2026-02-21T09:23:32.9128349Z .b8 54 2026-02-21T09:23:32.9128405Z .b8 108 2026-02-21T09:23:32.9128474Z .b8 52 2026-02-21T09:23:32.9128535Z .b8 103 2026-02-21T09:23:32.9128589Z .b8 103 2026-02-21T09:23:32.9128641Z .b8 106 2026-02-21T09:23:32.9128695Z .b8 103 2026-02-21T09:23:32.9128763Z .b8 113 2026-02-21T09:23:32.9128818Z .b8 107 2026-02-21T09:23:32.9128871Z .b8 99 2026-02-21T09:23:32.9128925Z .b8 117 2026-02-21T09:23:32.9128984Z .b8 107 2026-02-21T09:23:32.9129042Z .b8 55 2026-02-21T09:23:32.9129095Z .b8 54 2026-02-21T09:23:32.9129153Z .b8 104 2026-02-21T09:23:32.9129206Z .b8 114 2026-02-21T09:23:32.9129258Z .b8 54 2026-02-21T09:23:32.9129317Z .b8 104 2026-02-21T09:23:32.9129376Z .b8 121 2026-02-21T09:23:32.9129430Z .b8 105 2026-02-21T09:23:32.9129482Z .b8 112 2026-02-21T09:23:32.9129542Z .b8 98 2026-02-21T09:23:32.9129603Z .b8 101 2026-02-21T09:23:32.9129657Z .b8 105 2026-02-21T09:23:32.9129711Z .b8 105 2026-02-21T09:23:32.9129772Z .b8 107 2026-02-21T09:23:32.9129826Z .b8 54 2026-02-21T09:23:32.9129879Z .b8 100 2026-02-21T09:23:32.9129933Z .b8 108 2026-02-21T09:23:32.9129992Z .b8 110 2026-02-21T09:23:32.9130045Z .b8 51 2026-02-21T09:23:32.9130098Z .b8 97 2026-02-21T09:23:32.9130157Z .b8 114 2026-02-21T09:23:32.9130210Z .b8 46 2026-02-21T09:23:32.9130265Z .b8 112 2026-02-21T09:23:32.9130319Z .b8 121 2026-02-21T09:23:32.9130377Z .b8 0 2026-02-21T09:23:32.9130492Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T09:23:32.9130585Z .b8 47 // DW_AT_comp_dir 2026-02-21T09:23:32.9130645Z .b8 116 2026-02-21T09:23:32.9130700Z .b8 109 2026-02-21T09:23:32.9130754Z .b8 112 2026-02-21T09:23:32.9130808Z .b8 47 2026-02-21T09:23:32.9130870Z .b8 116 2026-02-21T09:23:32.9130923Z .b8 111 2026-02-21T09:23:32.9130979Z .b8 114 2026-02-21T09:23:32.9131036Z .b8 99 2026-02-21T09:23:32.9131089Z .b8 104 
2026-02-21T09:23:32.9131143Z .b8 105 2026-02-21T09:23:32.9131196Z .b8 110 2026-02-21T09:23:32.9131255Z .b8 100 2026-02-21T09:23:32.9131308Z .b8 117 2026-02-21T09:23:32.9131361Z .b8 99 2026-02-21T09:23:32.9131423Z .b8 116 2026-02-21T09:23:32.9131477Z .b8 111 2026-02-21T09:23:32.9131529Z .b8 114 2026-02-21T09:23:32.9131584Z .b8 95 2026-02-21T09:23:32.9131648Z .b8 114 2026-02-21T09:23:32.9131710Z .b8 111 2026-02-21T09:23:32.9131765Z .b8 111 2026-02-21T09:23:32.9131823Z .b8 116 2026-02-21T09:23:32.9131882Z .b8 47 2026-02-21T09:23:32.9131935Z .b8 54 2026-02-21T09:23:32.9131987Z .b8 52 2026-02-21T09:23:32.9132046Z .b8 0 2026-02-21T09:23:32.9132173Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T09:23:32.9132258Z .b8 95 // DW_AT_name 2026-02-21T09:23:32.9132486Z .b8 104 2026-02-21T09:23:32.9132547Z .b8 101 2026-02-21T09:23:32.9132600Z .b8 108 2026-02-21T09:23:32.9132655Z .b8 105 2026-02-21T09:23:32.9132712Z .b8 111 2026-02-21T09:23:32.9132766Z .b8 110 2026-02-21T09:23:32.9132818Z .b8 95 2026-02-21T09:23:32.9132872Z .b8 109 2026-02-21T09:23:32.9132934Z .b8 97 2026-02-21T09:23:32.9132988Z .b8 116 2026-02-21T09:23:32.9133046Z .b8 109 2026-02-21T09:23:32.9133106Z .b8 117 2026-02-21T09:23:32.9133161Z .b8 108 2026-02-21T09:23:32.9133213Z .b8 95 2026-02-21T09:23:32.9133268Z .b8 98 2026-02-21T09:23:32.9133326Z .b8 102 2026-02-21T09:23:32.9133379Z .b8 49 2026-02-21T09:23:32.9133432Z .b8 54 2026-02-21T09:23:32.9133484Z .b8 95 2026-02-21T09:23:32.9133546Z .b8 105 2026-02-21T09:23:32.9133599Z .b8 110 2026-02-21T09:23:32.9133653Z .b8 116 2026-02-21T09:23:32.9133710Z .b8 52 2026-02-21T09:23:32.9133819Z .b8 0 2026-02-21T09:23:32.9133911Z .b8 1 // DW_AT_inline 2026-02-21T09:23:32.9134040Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T09:23:32.9134152Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T09:23:32.9134257Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T09:23:32.9134410Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:23:32.9134555Z .b8 4 // Abbrev [4] 
0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T09:23:32.9134657Z .b32 108 // DW_AT_abstract_origin 2026-02-21T09:23:32.9134750Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T09:23:32.9134851Z .b64 $L__tmp8 // DW_AT_high_pc 2026-02-21T09:23:32.9134940Z .b8 1 // DW_AT_call_file 2026-02-21T09:23:32.9135029Z .b8 88 // DW_AT_call_line 2026-02-21T09:23:32.9135128Z .b8 40 // DW_AT_call_column 2026-02-21T09:23:32.9135224Z .b8 0 // End Of Children Mark 2026-02-21T09:23:32.9135319Z .b8 0 // End Of Children Mark 2026-02-21T09:23:32.9135374Z } 2026-02-21T09:23:32.9135452Z .section .debug_macinfo { } 2026-02-21T09:23:32.9135461Z 2026-02-21T09:23:32.9135559Z ================================================================ 2026-02-21T09:23:32.9135683Z please share the reproducer above with Triton project. 2026-02-21T09:23:39.7115275Z 2026-02-21T09:23:39.7116109Z Generation 6: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━━ 87/87 8.7 configs/s 2026-02-21T09:23:47.6692517Z Generation 6: verifying top configs 100% ━━━━━━━━━━━━━━━━ 155/155 18.8 configs/s 2026-02-21T09:23:48.0670757Z [708s] Generation 6 complete: 2026-02-21T09:23:48.0671067Z error=38 2026-02-21T09:23:48.0671255Z timeout=1 2026-02-21T09:23:48.0671437Z ok=51 2026-02-21T09:23:48.0671624Z min=1.2832 2026-02-21T09:23:48.0671801Z mid=1.7971 2026-02-21T09:23:48.0671977Z max=165.0409 2026-02-21T09:23:48.0672205Z best={'block_sizes': [32, 128, 128], 2026-02-21T09:23:48.0672637Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:23:48.0673065Z 'l2_groupings': [1], 2026-02-21T09:23:48.0673321Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:23:48.0673620Z 'loop_orders': [[0, 1]], 2026-02-21T09:23:48.0673846Z 'num_stages': 7, 2026-02-21T09:23:48.0674070Z 'num_warps': 4, 2026-02-21T09:23:48.0674313Z 'pid_type': 'flat', 2026-02-21T09:23:48.0674577Z 'range_flattens': [None, None], 2026-02-21T09:23:48.0674867Z 'range_multi_buffers': [None, None], 2026-02-21T09:23:48.0675130Z 
'range_num_stages': [0, 3], 2026-02-21T09:23:48.0675369Z 'range_unroll_factors': [0, 1], 2026-02-21T09:23:48.0675616Z 'range_warp_specializes': []} 2026-02-21T09:23:48.0716378Z [708s] Fitting surrogate: 752 points, 752 targets 2026-02-21T09:23:49.3864073Z [709s] Generation 7 starting: 79 neighbors, 4 active search path(s) 2026-02-21T09:24:33.4477299Z [753s] Timeout after 30s compiling Config(block_sizes=[128, 64, 64], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[1], load_eviction_policies=['last', 'last'], loop_orders=[[1, 0]], num_sm_multiplier=16, num_stages=7, num_warps=2, pid_type='persistent_blocked', range_flattens=[False, True], range_multi_buffers=[False, None], range_num_stages=[1, 0], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T09:24:33.4496990Z Generation 7: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 80/80 0.4 configs/s 2026-02-21T09:24:40.6602109Z Generation 7: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 80/80 11.0 configs/s 2026-02-21T09:24:47.8253346Z Generation 7: verifying top configs 100% ━━━━━━━━━━━━━━━━ 156/156 20.9 configs/s 2026-02-21T09:24:48.2019685Z [768s] Generation 7 complete: 2026-02-21T09:24:48.2020007Z error=29 2026-02-21T09:24:48.2020199Z timeout=1 2026-02-21T09:24:48.2020801Z ok=53 2026-02-21T09:24:48.2020976Z min=1.3213 2026-02-21T09:24:48.2021156Z mid=1.8410 2026-02-21T09:24:48.2021319Z max=155.4317 2026-02-21T09:24:48.2021562Z best={'block_sizes': [32, 128, 128], 2026-02-21T09:24:48.2021991Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:24:48.2022403Z 'l2_groupings': [1], 2026-02-21T09:24:48.2022663Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:24:48.2023171Z 'loop_orders': [[0, 1]], 2026-02-21T09:24:48.2023414Z 'num_stages': 7, 2026-02-21T09:24:48.2023588Z 'num_warps': 4, 2026-02-21T09:24:48.2023760Z 'pid_type': 'flat', 2026-02-21T09:24:48.2023937Z 'range_flattens': [None, None], 2026-02-21T09:24:48.2024162Z 
'range_multi_buffers': [None, None], 2026-02-21T09:24:48.2024387Z 'range_num_stages': [0, 3], 2026-02-21T09:24:48.2024593Z 'range_unroll_factors': [0, 1], 2026-02-21T09:24:48.2024806Z 'range_warp_specializes': []} 2026-02-21T09:24:48.2067660Z [768s] Fitting surrogate: 835 points, 835 targets 2026-02-21T09:24:49.0558861Z [769s] Generation 8 starting: 44 neighbors, 2 active search path(s) 2026-02-21T09:25:21.9685444Z [802s] Timeout after 30s compiling Config(block_sizes=[128, 64, 64], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[1], load_eviction_policies=['last', 'last'], loop_orders=[[0, 1]], num_sm_multiplier=16, num_stages=7, num_warps=2, pid_type='persistent_blocked', range_flattens=[False, True], range_multi_buffers=[False, True], range_num_stages=[1, 0], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T09:25:22.8710045Z Generation 8: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 45/45 0.4 configs/s 2026-02-21T09:25:25.7030250Z Generation 8: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 45/45 15.4 configs/s 2026-02-21T09:25:27.8840751Z Generation 8: verifying top configs 100% ━━━━━━━━━━━━━━━━ 156/156 63.0 configs/s 2026-02-21T09:25:28.2234021Z [808s] Generation 8 complete: 2026-02-21T09:25:28.2234285Z error=25 2026-02-21T09:25:28.2234488Z timeout=1 2026-02-21T09:25:28.2234654Z ok=21 2026-02-21T09:25:28.2234821Z min=1.3630 2026-02-21T09:25:28.2234985Z mid=2.0043 2026-02-21T09:25:28.2235153Z max=28.0511 2026-02-21T09:25:28.2235403Z best={'block_sizes': [32, 128, 128], 2026-02-21T09:25:28.2235877Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:25:28.2236322Z 'l2_groupings': [1], 2026-02-21T09:25:28.2236885Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:25:28.2237188Z 'loop_orders': [[0, 1]], 2026-02-21T09:25:28.2237396Z 'num_stages': 7, 2026-02-21T09:25:28.2237586Z 'num_warps': 4, 2026-02-21T09:25:28.2237770Z 'pid_type': 'flat', 2026-02-21T09:25:28.2237982Z 
'range_flattens': [None, None], 2026-02-21T09:25:28.2238229Z 'range_multi_buffers': [None, None], 2026-02-21T09:25:28.2238491Z 'range_num_stages': [0, 3], 2026-02-21T09:25:28.2238729Z 'range_unroll_factors': [0, 1], 2026-02-21T09:25:28.2238971Z 'range_warp_specializes': []} 2026-02-21T09:25:28.2278920Z [808s] Fitting surrogate: 882 points, 882 targets 2026-02-21T09:25:28.6495800Z [809s] Generation 9 starting: 18 neighbors, 1 active search path(s) 2026-02-21T09:26:02.1851554Z [842s] Timeout after 30s compiling Config(block_sizes=[128, 64, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[1], load_eviction_policies=['last', 'last'], loop_orders=[[0, 1]], num_sm_multiplier=8, num_stages=7, num_warps=2, pid_type='persistent_interleaved', range_flattens=[False, True], range_multi_buffers=[True, True], range_num_stages=[1, 1], range_unroll_factors=[4, 4], range_warp_specializes=[]) 2026-02-21T09:26:02.1872632Z Generation 9: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19/19 0.2 configs/s 2026-02-21T09:26:06.0566845Z Generation 9: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━━ 19/19 4.8 configs/s 2026-02-21T09:26:06.4700942Z Generation 9: verifying top configs 100% ━━━━━━━━━━━━━━━ 156/156 218.6 configs/s 2026-02-21T09:26:06.7947720Z [847s] Generation 9 complete: 2026-02-21T09:26:06.7948439Z timeout=1 2026-02-21T09:26:06.7948628Z ok=19 2026-02-21T09:26:06.7948801Z min=1.2681 2026-02-21T09:26:06.7948968Z mid=4.6396 2026-02-21T09:26:06.7949142Z max=81.3310 2026-02-21T09:26:06.7949356Z best={'block_sizes': [32, 128, 128], 2026-02-21T09:26:06.7949750Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:26:06.7950167Z 'l2_groupings': [1], 2026-02-21T09:26:06.7950414Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:26:06.7950907Z 'loop_orders': [[0, 1]], 2026-02-21T09:26:06.7951143Z 'num_stages': 7, 2026-02-21T09:26:06.7951344Z 'num_warps': 4, 2026-02-21T09:26:06.7951537Z 'pid_type': 
'flat', 2026-02-21T09:26:06.7951758Z 'range_flattens': [None, None], 2026-02-21T09:26:06.7952012Z 'range_multi_buffers': [None, None], 2026-02-21T09:26:06.7952283Z 'range_num_stages': [0, 3], 2026-02-21T09:26:06.7952522Z 'range_unroll_factors': [0, 1], 2026-02-21T09:26:06.7952772Z 'range_warp_specializes': []} 2026-02-21T09:26:06.7989270Z [847s] Fitting surrogate: 902 points, 902 targets 2026-02-21T09:26:07.2238115Z [847s] Generation 10 starting: 19 neighbors, 1 active search path(s) 2026-02-21T09:26:18.0269256Z Generation 10: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 19/19 1.9 configs/s 2026-02-21T09:26:20.0632975Z Generation 10: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 19/19 13.8 configs/s 2026-02-21T09:26:21.3066889Z Generation 10: verifying top configs 100% ━━━━━━━━━━━━━━━ 157/157 99.9 configs/s 2026-02-21T09:26:21.6509414Z [862s] Generation 10 complete: 2026-02-21T09:26:21.6509650Z error=8 2026-02-21T09:26:21.6509798Z ok=13 2026-02-21T09:26:21.6509936Z min=1.2585 2026-02-21T09:26:21.6510082Z mid=2.2142 2026-02-21T09:26:21.6510216Z max=72.1797 2026-02-21T09:26:21.6510377Z best={'block_sizes': [32, 128, 128], 2026-02-21T09:26:21.6510703Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:26:21.6511058Z 'l2_groupings': [1], 2026-02-21T09:26:21.6511264Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:26:21.6511508Z 'loop_orders': [[0, 1]], 2026-02-21T09:26:21.6511688Z 'num_stages': 7, 2026-02-21T09:26:21.6511857Z 'num_warps': 4, 2026-02-21T09:26:21.6512044Z 'pid_type': 'flat', 2026-02-21T09:26:21.6512567Z 'range_flattens': [None, None], 2026-02-21T09:26:21.6512820Z 'range_multi_buffers': [None, None], 2026-02-21T09:26:21.6513073Z 'range_num_stages': [0, 3], 2026-02-21T09:26:21.6513305Z 'range_unroll_factors': [0, 1], 2026-02-21T09:26:21.6513545Z 'range_warp_specializes': []} 2026-02-21T09:26:21.6556962Z [862s] Fitting surrogate: 923 points, 923 targets 2026-02-21T09:26:21.9904630Z [862s] Generation 11 
starting: 12 neighbors, 1 active search path(s) 2026-02-21T09:26:30.9534222Z Generation 11: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12/12 1.7 configs/s 2026-02-21T09:26:33.3739461Z Generation 11: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 12/12 4.7 configs/s 2026-02-21T09:26:34.5873156Z Generation 11: verifying top configs 100% ━━━━━━━━━━━━━━ 158/158 103.5 configs/s 2026-02-21T09:26:34.9227246Z [875s] Generation 11 complete: 2026-02-21T09:26:34.9227530Z error=5 2026-02-21T09:26:34.9227705Z ok=9 2026-02-21T09:26:34.9227864Z min=1.2775 2026-02-21T09:26:34.9228242Z mid=1.5490 2026-02-21T09:26:34.9228522Z max=107.6327 2026-02-21T09:26:34.9228729Z best={'block_sizes': [32, 128, 128], 2026-02-21T09:26:34.9229136Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T09:26:34.9229560Z 'l2_groupings': [1], 2026-02-21T09:26:34.9229815Z 'load_eviction_policies': ['first', 'last'], 2026-02-21T09:26:34.9230108Z 'loop_orders': [[0, 1]], 2026-02-21T09:26:34.9230336Z 'num_stages': 7, 2026-02-21T09:26:34.9230527Z 'num_warps': 4, 2026-02-21T09:26:34.9230739Z 'pid_type': 'flat', 2026-02-21T09:26:34.9230958Z 'range_flattens': [None, None], 2026-02-21T09:26:34.9231218Z 'range_multi_buffers': [None, None], 2026-02-21T09:26:34.9231478Z 'range_num_stages': [0, 3], 2026-02-21T09:26:34.9231718Z 'range_unroll_factors': [0, 1], 2026-02-21T09:26:34.9231968Z 'range_warp_specializes': []} 2026-02-21T09:26:34.9274086Z [875s] Fitting surrogate: 937 points, 937 targets 2026-02-21T09:26:35.1012885Z [875s] Autotuning complete in 875.5s after searching 795 configs. 
2026-02-21T09:26:35.1013408Z One can hardcode the best config and skip autotuning with: 2026-02-21T09:26:35.1015492Z @helion.kernel(config=helion.Config(block_sizes=[32, 128, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[1], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], num_stages=7, num_warps=4, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, None], range_num_stages=[0, 3], range_unroll_factors=[0, 1], range_warp_specializes=[]), static_shapes=True) 2026-02-21T09:26:35.1017305Z 2026-02-21T09:26:35.1017702Z [875s] Code of selected kernel: /tmp/torchinductor_root/2m/c2msqmnpytyk4utqgtvtjrri6fbgwopflmqfw72sam7gxz2u3vy6.py 2026-02-21T09:26:36.3132825Z WARNING:tritonbench.utils.triton_op:Completed input ID 21: 2026-02-21T09:26:36.3133416Z x_val 2026-02-21T09:26:36.3133709Z --------------------- 2026-02-21T09:26:36.3134070Z (4, 4096, 8192, 1024) 2026-02-21T09:26:36.3134293Z 2026-02-21T09:26:36.3165599Z 70%|███████ | 7/10 [1:08:16<33:32, 670.70s/it]WARNING:tritonbench.utils.triton_op:Running input ID 24: 2026-02-21T09:26:36.3166063Z x_val 2026-02-21T09:26:36.3166225Z ---------------------- 2026-02-21T09:26:36.3166423Z (16, 4096, 1280, 8192) 2026-02-21T09:26:36.3172672Z INFO:tritonbench.utils.triton_op:Took 0.27ms to get benchmark function for preprocessed_eager_int4_gemm 2026-02-21T09:26:37.5032760Z INFO:tritonbench.utils.triton_op:Took 3.54ms to get benchmark function for preprocessed_torch_compile_int4_gemm 2026-02-21T09:26:41.3433362Z Autotune Choices Stats: 2026-02-21T09:26:41.3434855Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 2.013792037963867, "best_triton_pos": 1, "best_triton_time": 2.3480639457702637, "best_triton_kernel": "triton_mm_107", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 
2026-02-21T09:26:41.4218704Z AUTOTUNE mm(65536x8192, 8192x1280) 2026-02-21T09:26:41.4219550Z strides: [8192, 1], [1280, 1] 2026-02-21T09:26:41.4219834Z dtypes: torch.bfloat16, torch.bfloat16 2026-02-21T09:26:41.4220114Z mm 2.0138 ms 100.0% 2026-02-21T09:26:41.4220798Z triton_mm_107 2.3481 ms 85.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2026-02-21T09:26:41.4221956Z triton_mm_106 2.3796 ms 84.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T09:26:41.4223094Z triton_mm_105 2.5932 ms 77.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T09:26:41.4224400Z triton_mm_101 2.8099 ms 71.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2026-02-21T09:26:41.4225700Z triton_mm_98 3.6783 ms 54.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T09:26:41.4227030Z triton_mm_102 3.7852 ms 53.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T09:26:41.4228172Z triton_mm_99 3.8361 ms 52.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2026-02-21T09:26:41.4229315Z triton_mm_100 3.8405 ms 52.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T09:26:41.4230304Z triton_mm_103 4.0932 ms 49.2% ACC_TYPE='tl.float32', 
ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2026-02-21T09:26:41.4231141Z SingleProcess AUTOTUNE benchmarking takes 2.9430 seconds and 0.7105 seconds precompiling for 20 choices 2026-02-21T09:26:44.7896963Z INFO:tritonbench.utils.triton_op:Took 0.19ms to get benchmark function for preprocessed_triton_int4_gemm 2026-02-21T09:26:46.0334258Z WARNING:__main__:Input tensor metadata: 2026-02-21T09:26:46.0334586Z { 'args': ( { 'device': 'cuda:0', 2026-02-21T09:26:46.0334860Z 'dtype': 'torch.bfloat16', 2026-02-21T09:26:46.0335148Z 'shape': (16, 4096, 8192), 2026-02-21T09:26:46.0335419Z 'stride': (33554432, 8192, 1)}, 2026-02-21T09:26:46.0335693Z { 'device': 'cuda:0', 2026-02-21T09:26:46.0335939Z 'dtype': 'torch.int32', 2026-02-21T09:26:46.0336213Z 'shape': (8192, 1280), 2026-02-21T09:26:46.0336912Z 'stride': (1280, 1)}), 2026-02-21T09:26:46.0337174Z 'kwargs': {}} 2026-02-21T09:26:46.0400806Z INFO:tritonbench.utils.triton_op:Took 6.95ms to get benchmark function for helion_int4_gemm_tritonbench 2026-02-21T09:26:46.4502796Z [0s] Autotune random seed: 2135373392 2026-02-21T09:26:46.9790719Z [0s] Starting LFBOPatternSearch with initial_population=FROM_RANDOM, copies=5, max_generations=20, similarity_penalty=1.0 2026-02-21T09:27:21.4757718Z [34s] Timeout after 30s compiling Config(block_sizes=[16, 8192, 2], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['first', 'last'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=4, num_stages=3, num_warps=8, pid_type='persistent_interleaved', range_flattens=[None, None], range_multi_buffers=[None, None], range_num_stages=[1, 3], range_unroll_factors=[0, 2], range_warp_specializes=[]) 2026-02-21T09:27:22.7275701Z [35s] Timeout after 30s compiling Config(block_sizes=[16, 1, 1024], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[4], load_eviction_policies=['last', 'first'], 
loop_orders=[[1, 0]], num_sm_multiplier=16, num_stages=3, num_warps=1, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[True, False], range_num_stages=[0, 3], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T09:27:28.0266947Z [41s] Timeout after 30s compiling Config(block_sizes=[128, 512, 8], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['last', 'last'], loop_orders=[[0, 1]], maxnreg=128, num_sm_multiplier=4, num_stages=4, num_warps=16, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[True, True], range_num_stages=[2, 2], range_unroll_factors=[4, 0], range_warp_specializes=[]) 2026-02-21T09:27:31.0969481Z [44s] Timeout after 30s compiling Config(block_sizes=[64, 4096, 1], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=32, num_stages=8, num_warps=8, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[False, True], range_num_stages=[3, 0], range_unroll_factors=[3, 1], range_warp_specializes=[]) 2026-02-21T09:27:36.2329488Z [49s] Timeout after 30s compiling Config(block_sizes=[8, 2048, 1], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[1], load_eviction_policies=['last', 'last'], loop_orders=[[0, 1]], maxnreg=64, num_sm_multiplier=1, num_stages=3, num_warps=2, pid_type='persistent_blocked', range_flattens=[None, True], range_multi_buffers=[None, False], range_num_stages=[2, 2], range_unroll_factors=[2, 3], range_warp_specializes=[]) 2026-02-21T09:27:42.1385821Z [55s] Timeout after 30s compiling Config(block_sizes=[4, 16384, 1], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[2], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=4, num_stages=5, num_warps=8, pid_type='persistent_blocked', range_flattens=[True, 
None], range_multi_buffers=[False, True], range_num_stages=[3, 2], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T09:27:42.8927401Z [55s] Timeout after 30s compiling Config(block_sizes=[1, 512, 256], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[8], load_eviction_policies=['last', 'last'], loop_orders=[[0, 1]], maxnreg=256, num_sm_multiplier=1, num_stages=1, num_warps=2, pid_type='persistent_blocked', range_flattens=[True, True], range_multi_buffers=[True, False], range_num_stages=[3, 3], range_unroll_factors=[4, 2], range_warp_specializes=[]) 2026-02-21T09:27:42.8929630Z Initial population precompiling 100% ━━━━━━━━━━━━━━━━━━━━━ 100/100 0.7 configs/s 2026-02-21T09:28:44.5641637Z [117s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 16, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=128, num_sm_multiplier=128, num_stages=8, num_warps=1, pid_type='persistent_blocked', range_flattens=[False, True], range_multi_buffers=[True, None], range_num_stages=[2, 2], range_unroll_factors=[2, 1], range_warp_specializes=[]) 2026-02-21T09:28:44.5643549Z Tensor-likes are not close! 2026-02-21T09:28:44.5643710Z 2026-02-21T09:28:44.5643821Z Mismatched elements: 83709763 / 83886080 (99.8%) 2026-02-21T09:28:44.5644260Z Greatest absolute difference: 7040.0 at index (54665, 1115) (up to 0.01 allowed) 2026-02-21T09:28:44.5644800Z Greatest relative difference: inf at index (38888, 1203) (up to 0.01 allowed) 2026-02-21T09:28:44.5645271Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:28:44.5645529Z 2026-02-21T09:32:10.4315846Z [323s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 2048, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['first', 'last'], loop_orders=[[1, 0]], num_stages=5, num_warps=16, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, True], range_num_stages=[0, 2], range_unroll_factors=[0, 2], range_warp_specializes=[]) 2026-02-21T09:32:10.4319478Z Tensor-likes are not close! 2026-02-21T09:32:10.4319742Z 2026-02-21T09:32:10.4319939Z Mismatched elements: 83761041 / 83886080 (99.9%) 2026-02-21T09:32:10.4320675Z Greatest absolute difference: 9728.0 at index (23718, 523) (up to 0.01 allowed) 2026-02-21T09:32:10.4321601Z Greatest relative difference: inf at index (38888, 1203) (up to 0.01 allowed) 2026-02-21T09:32:10.4322068Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:32:10.4322326Z 2026-02-21T09:34:41.9576265Z [474s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 32, 512], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[2], load_eviction_policies=['first', 'first'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=2, num_stages=2, num_warps=32, pid_type='persistent_interleaved', range_flattens=[True, False], range_multi_buffers=[None, True], range_num_stages=[1, 4], range_unroll_factors=[1, 4], range_warp_specializes=[]) 2026-02-21T09:34:41.9578372Z Tensor-likes are not close! 2026-02-21T09:34:41.9578526Z 2026-02-21T09:34:41.9578627Z Mismatched elements: 83702833 / 83886080 (99.8%) 2026-02-21T09:34:41.9579006Z Greatest absolute difference: 6848.0 at index (40628, 744) (up to 0.01 allowed) 2026-02-21T09:34:41.9579478Z Greatest relative difference: inf at index (38888, 1203) (up to 0.01 allowed) 2026-02-21T09:34:41.9579886Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:34:41.9580107Z 2026-02-21T09:38:48.7372283Z [721s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 2048, 16], indexing=['pointer', 'pointer', 'tensor_descriptor'], l2_groupings=[8], load_eviction_policies=['', 'first'], loop_orders=[[1, 0]], num_sm_multiplier=128, num_stages=3, num_warps=32, pid_type='persistent_interleaved', range_flattens=[True, True], range_multi_buffers=[True, True], range_num_stages=[1, 0], range_unroll_factors=[1, 4], range_warp_specializes=[]) 2026-02-21T09:38:48.7375222Z Tensor-likes are not close! 2026-02-21T09:38:48.7375441Z 2026-02-21T09:38:48.7375541Z Mismatched elements: 83774986 / 83886080 (99.9%) 2026-02-21T09:38:48.7375920Z Greatest absolute difference: 9728.0 at index (23718, 523) (up to 0.01 allowed) 2026-02-21T09:38:48.7377068Z Greatest relative difference: inf at index (38888, 1203) (up to 0.01 allowed) 2026-02-21T09:38:48.7377530Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:38:48.7377755Z 2026-02-21T09:47:47.9590714Z [1260s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 512, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['first', ''], loop_orders=[[0, 1]], maxnreg=64, num_sm_multiplier=64, num_stages=8, num_warps=32, pid_type='persistent_interleaved', range_flattens=[None, None], range_multi_buffers=[True, False], range_num_stages=[1, 3], range_unroll_factors=[3, 3], range_warp_specializes=[]) 2026-02-21T09:47:47.9592625Z Tensor-likes are not close! 2026-02-21T09:47:47.9592813Z 2026-02-21T09:47:47.9592941Z Mismatched elements: 83764624 / 83886080 (99.9%) 2026-02-21T09:47:47.9593368Z Greatest absolute difference: 9024.0 at index (56211, 554) (up to 0.01 allowed) 2026-02-21T09:47:47.9593902Z Greatest relative difference: inf at index (38888, 1203) (up to 0.01 allowed) 2026-02-21T09:47:47.9594365Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T09:47:47.9594624Z 2026-02-21T09:51:50.0932182Z [1503s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 128, 16], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[8], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=128, num_sm_multiplier=64, num_stages=2, num_warps=8, pid_type='persistent_blocked', range_flattens=[True, None], range_multi_buffers=[None, None], range_num_stages=[0, 1], range_unroll_factors=[3, 1], range_warp_specializes=[]) 2026-02-21T09:51:50.0935053Z Tensor-likes are not close! 2026-02-21T09:51:50.0935338Z 2026-02-21T09:51:50.0936097Z Mismatched elements: 83821367 / 83886080 (99.9%) 2026-02-21T09:51:50.0937273Z Greatest absolute difference: 8896.0 at index (34528, 600) (up to 0.01 allowed) 2026-02-21T09:51:50.0937969Z Greatest relative difference: 150994944.0 at index (40894, 397) (up to 0.01 allowed) 2026-02-21T09:51:50.0938416Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T09:51:50.0938641Z 2026-02-21T10:00:11.1485214Z Initial population exploring neighbors 100% ━━━━━━━━━━━━━━ 100/100 0.1 configs/s 2026-02-21T10:00:11.1508880Z [2004s] Adaptive compile timeout: 30s (90% percentile=24.9s, bounds=[30.0s, 30s]) 2026-02-21T10:00:11.2391150Z Verifying initial results 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 8/8 - configs/s 2026-02-21T10:00:11.8938683Z [2004s] Initial random population of 100, 5 starting points: 2026-02-21T10:00:11.8939082Z error=14 2026-02-21T10:00:11.8939275Z timeout=7 2026-02-21T10:00:11.8939437Z ok=79 2026-02-21T10:00:11.8939602Z min=24.5989 2026-02-21T10:00:11.8939962Z mid=1271.0828 2026-02-21T10:00:11.8940155Z max=17810.0020 2026-02-21T10:00:11.8940373Z best={'block_sizes': [32, 32, 32], 2026-02-21T10:00:11.8940710Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T10:00:11.8941048Z 'l2_groupings': [32], 2026-02-21T10:00:11.8941426Z 'load_eviction_policies': ['', 'first'], 2026-02-21T10:00:11.8941724Z 'loop_orders': 
[[0, 1]], 2026-02-21T10:00:11.8941942Z 'num_stages': 1, 2026-02-21T10:00:11.8942147Z 'num_warps': 1, 2026-02-21T10:00:11.8942355Z 'pid_type': 'flat', 2026-02-21T10:00:11.8942568Z 'range_flattens': [None, None], 2026-02-21T10:00:11.8942785Z 'range_multi_buffers': [None, True], 2026-02-21T10:00:11.8943014Z 'range_num_stages': [0, 3], 2026-02-21T10:00:11.8943215Z 'range_unroll_factors': [0, 2], 2026-02-21T10:00:11.8943433Z 'range_warp_specializes': []} 2026-02-21T10:00:11.8964428Z [2004s] Fitting surrogate: 100 points, 100 targets 2026-02-21T10:00:13.5282860Z [2006s] Generation 1 starting: 99 neighbors, 5 active search path(s) 2026-02-21T10:00:49.0792537Z Generation 1: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 102/102 0.5 configs/s 2026-02-21T10:01:06.6973223Z [2059s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 128, 16], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[32], load_eviction_policies=['', ''], loop_orders=[[0, 1]], maxnreg=256, num_sm_multiplier=64, num_stages=6, num_warps=1, pid_type='persistent_blocked', range_flattens=[False, False], range_multi_buffers=[False, True], range_num_stages=[0, 2], range_unroll_factors=[3, 3], range_warp_specializes=[]) 2026-02-21T10:01:06.6976153Z Tensor-likes are not close! 2026-02-21T10:01:06.6976425Z 2026-02-21T10:01:06.6977103Z Mismatched elements: 83714507 / 83886080 (99.8%) 2026-02-21T10:01:06.6977824Z Greatest absolute difference: 6720.0 at index (20213, 233) (up to 0.01 allowed) 2026-02-21T10:01:06.6978677Z Greatest relative difference: inf at index (38888, 1203) (up to 0.01 allowed) 2026-02-21T10:01:06.6979503Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T10:01:06.6980011Z 2026-02-21T10:04:37.1060482Z [2270s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 16], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[1], load_eviction_policies=['', ''], loop_orders=[[0, 1]], num_stages=1, num_warps=2, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[]) 2026-02-21T10:04:37.1062050Z Tensor-likes are not close! 2026-02-21T10:04:37.1062227Z 2026-02-21T10:04:37.1062352Z Mismatched elements: 83349780 / 83886080 (99.4%) 2026-02-21T10:04:37.1062797Z Greatest absolute difference: 2480.0 at index (16763, 429) (up to 0.01 allowed) 2026-02-21T10:04:37.1063348Z Greatest relative difference: inf at index (38888, 1203) (up to 0.01 allowed) 2026-02-21T10:04:37.1063821Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T10:04:37.1064071Z 2026-02-21T10:04:40.5632808Z Generation 1: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 102/102 0.9 configs/s 2026-02-21T10:04:40.7053460Z Generation 1: verifying top configs 100% ━━━━━━━━━━━━━━━━━━━━━ 14/14 - configs/s 2026-02-21T10:04:42.4175643Z [2275s] Generation 1 complete: 2026-02-21T10:04:42.4175901Z error=7 2026-02-21T10:04:42.4176064Z ok=98 2026-02-21T10:04:42.4176241Z min=13.8933 2026-02-21T10:04:42.4176432Z mid=40.9790 2026-02-21T10:04:42.4176999Z max=7685.8882 2026-02-21T10:04:42.4177204Z best={'block_sizes': [32, 128, 32], 2026-02-21T10:04:42.4177566Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T10:04:42.4177908Z 'l2_groupings': [32], 2026-02-21T10:04:42.4178143Z 'load_eviction_policies': ['', 'first'], 2026-02-21T10:04:42.4178437Z 'loop_orders': [[0, 1]], 2026-02-21T10:04:42.4178647Z 'num_stages': 1, 2026-02-21T10:04:42.4179146Z 'num_warps': 2, 2026-02-21T10:04:42.4179347Z 'pid_type': 'flat', 2026-02-21T10:04:42.4179562Z 'range_flattens': [None, None], 2026-02-21T10:04:42.4179813Z 
'range_multi_buffers': [None, True], 2026-02-21T10:04:42.4180117Z 'range_num_stages': [0, 4], 2026-02-21T10:04:42.4180350Z 'range_unroll_factors': [0, 2], 2026-02-21T10:04:42.4180598Z 'range_warp_specializes': []} 2026-02-21T10:04:42.4207596Z [2275s] Fitting surrogate: 205 points, 205 targets 2026-02-21T10:04:44.1294061Z [2277s] Generation 2 starting: 101 neighbors, 5 active search path(s) 2026-02-21T10:05:15.8657714Z Generation 2: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 105/105 5.9 configs/s 2026-02-21T10:06:07.5854531Z Generation 2: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 105/105 2.5 configs/s 2026-02-21T10:06:07.5882205Z [2360s] Generation 2 complete: 2026-02-21T10:06:07.5882485Z error=22 2026-02-21T10:06:07.5882705Z ok=85 2026-02-21T10:06:07.5882899Z min=6.8942 2026-02-21T10:06:07.5883107Z mid=30.2716 2026-02-21T10:06:07.5883326Z max=837.1562 2026-02-21T10:06:07.5883530Z best={'block_sizes': [32, 128, 128], 2026-02-21T10:06:07.5883866Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T10:06:07.5884226Z 'l2_groupings': [32], 2026-02-21T10:06:07.5884456Z 'load_eviction_policies': ['', 'last'], 2026-02-21T10:06:07.5884727Z 'loop_orders': [[0, 1]], 2026-02-21T10:06:07.5884934Z 'num_stages': 1, 2026-02-21T10:06:07.5885128Z 'num_warps': 4, 2026-02-21T10:06:07.5885331Z 'pid_type': 'flat', 2026-02-21T10:06:07.5885544Z 'range_flattens': [None, None], 2026-02-21T10:06:07.5885804Z 'range_multi_buffers': [None, True], 2026-02-21T10:06:07.5886060Z 'range_num_stages': [0, 4], 2026-02-21T10:06:07.5886314Z 'range_unroll_factors': [0, 2], 2026-02-21T10:06:07.5886729Z 'range_warp_specializes': []} 2026-02-21T10:06:07.5919092Z [2360s] Fitting surrogate: 312 points, 312 targets 2026-02-21T10:06:09.1313442Z [2362s] Generation 3 starting: 94 neighbors, 5 active search path(s) 2026-02-21T10:06:41.6619643Z Generation 3: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 95/95 3.3 configs/s 2026-02-21T10:07:17.3136800Z [2430s] Skipping config with accuracy 
mismatch: helion.Config(block_sizes=[4, 256, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[32], load_eviction_policies=['', 'first'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=64, num_stages=1, num_warps=2, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[None, True], range_num_stages=[4, 3], range_unroll_factors=[1, 0], range_warp_specializes=[]) 2026-02-21T10:07:17.3138702Z Tensor-likes are not close! 2026-02-21T10:07:17.3138868Z 2026-02-21T10:07:17.3138991Z Mismatched elements: 83627204 / 83886080 (99.7%) 2026-02-21T10:07:17.3139427Z Greatest absolute difference: 3872.0 at index (32292, 406) (up to 0.01 allowed) 2026-02-21T10:07:17.3139970Z Greatest relative difference: inf at index (38888, 1203) (up to 0.01 allowed) 2026-02-21T10:07:17.3140465Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T10:07:17.3140732Z 2026-02-21T10:07:43.7561401Z Generation 3: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━━ 95/95 1.4 configs/s 2026-02-21T10:07:43.8317308Z Generation 3: verifying top configs 100% ━━━━━━━━━━━━━━━━━━━━━ 29/29 - configs/s 2026-02-21T10:07:45.2986918Z [2458s] Generation 3 complete: 2026-02-21T10:07:45.2987383Z error=16 2026-02-21T10:07:45.2987641Z ok=84 2026-02-21T10:07:45.2987891Z min=7.1298 2026-02-21T10:07:45.2988139Z mid=18.6694 2026-02-21T10:07:45.2988517Z max=637.5189 2026-02-21T10:07:45.2988811Z best={'block_sizes': [32, 128, 128], 2026-02-21T10:07:45.2989327Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T10:07:45.2989835Z 'l2_groupings': [32], 2026-02-21T10:07:45.2990178Z 'load_eviction_policies': ['', 'first'], 2026-02-21T10:07:45.2990587Z 'loop_orders': [[0, 1]], 2026-02-21T10:07:45.2990902Z 'num_stages': 1, 2026-02-21T10:07:45.2991185Z 'num_warps': 4, 2026-02-21T10:07:45.2991460Z 'pid_type': 'flat', 2026-02-21T10:07:45.2992215Z 'range_flattens': [None, None], 2026-02-21T10:07:45.2992635Z 'range_multi_buffers': [None, True], 
2026-02-21T10:07:45.2993119Z 'range_num_stages': [0, 4], 2026-02-21T10:07:45.2993505Z 'range_unroll_factors': [0, 2], 2026-02-21T10:07:45.2993758Z 'range_warp_specializes': []} 2026-02-21T10:07:45.3021423Z [2458s] Fitting surrogate: 412 points, 412 targets 2026-02-21T10:07:46.8061942Z [2459s] Generation 4 starting: 91 neighbors, 5 active search path(s) 2026-02-21T10:08:15.7303407Z Generation 4: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 93/93 3.9 configs/s 2026-02-21T10:08:48.7615754Z Generation 4: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━━ 93/93 3.1 configs/s 2026-02-21T10:08:50.1597475Z Generation 4: verifying top configs 100% ━━━━━━━━━━━━━━━━━━━ 29/29 9.3 configs/s 2026-02-21T10:08:51.9979961Z [2525s] Generation 4 complete: 2026-02-21T10:08:51.9980251Z error=24 2026-02-21T10:08:51.9980422Z ok=72 2026-02-21T10:08:51.9980604Z min=6.8459 2026-02-21T10:08:51.9980771Z mid=14.8084 2026-02-21T10:08:51.9980956Z max=551.9349 2026-02-21T10:08:51.9981172Z best={'block_sizes': [32, 128, 128], 2026-02-21T10:08:51.9981532Z 'indexing': ['pointer', 'tensor_descriptor', 'pointer'], 2026-02-21T10:08:51.9981893Z 'l2_groupings': [32], 2026-02-21T10:08:51.9982132Z 'load_eviction_policies': ['', 'first'], 2026-02-21T10:08:51.9982412Z 'loop_orders': [[0, 1]], 2026-02-21T10:08:51.9982623Z 'num_stages': 1, 2026-02-21T10:08:51.9982818Z 'num_warps': 4, 2026-02-21T10:08:51.9983011Z 'pid_type': 'flat', 2026-02-21T10:08:51.9983228Z 'range_flattens': [None, None], 2026-02-21T10:08:51.9983493Z 'range_multi_buffers': [None, True], 2026-02-21T10:08:51.9983774Z 'range_num_stages': [0, 4], 2026-02-21T10:08:51.9984001Z 'range_unroll_factors': [0, 2], 2026-02-21T10:08:51.9984218Z 'range_warp_specializes': []} 2026-02-21T10:08:52.0016810Z [2525s] Fitting surrogate: 508 points, 508 targets 2026-02-21T10:08:53.4414783Z [2526s] Generation 5 starting: 84 neighbors, 4 active search path(s) 2026-02-21T10:09:35.4471761Z [2568s] Timeout after 30s compiling Config(block_sizes=[16, 512, 64], 
indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[16], load_eviction_policies=['last', 'last'], loop_orders=[[1, 0]], num_sm_multiplier=1, num_stages=6, num_warps=2, pid_type='persistent_interleaved', range_flattens=[False, True], range_multi_buffers=[True, None], range_num_stages=[0, 4], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T10:09:35.4494819Z Generation 5: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 84/84 0.6 configs/s 2026-02-21T10:09:52.7320005Z Generation 5: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━━ 84/84 4.9 configs/s 2026-02-21T10:09:53.5767531Z Generation 5: verifying top configs 100% ━━━━━━━━━━━━━━━━━━ 34/34 14.8 configs/s 2026-02-21T10:09:55.1829503Z [2588s] Generation 5 complete: 2026-02-21T10:09:55.1829837Z error=22 2026-02-21T10:09:55.1830030Z timeout=1 2026-02-21T10:09:55.1830203Z ok=66 2026-02-21T10:09:55.1830371Z min=6.0421 2026-02-21T10:09:55.1830544Z mid=11.6511 2026-02-21T10:09:55.1831331Z max=405.1867 2026-02-21T10:09:55.1831552Z best={'block_sizes': [16, 128, 128], 2026-02-21T10:09:55.1831922Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T10:09:55.1832658Z 'l2_groupings': [32], 2026-02-21T10:09:55.1832902Z 'load_eviction_policies': ['', ''], 2026-02-21T10:09:55.1833221Z 'loop_orders': [[1, 0]], 2026-02-21T10:09:55.1833483Z 'maxnreg': 256, 2026-02-21T10:09:55.1833709Z 'num_sm_multiplier': 64, 2026-02-21T10:09:55.1833925Z 'num_stages': 1, 2026-02-21T10:09:55.1834114Z 'num_warps': 4, 2026-02-21T10:09:55.1834327Z 'pid_type': 'persistent_interleaved', 2026-02-21T10:09:55.1834599Z 'range_flattens': [True, True], 2026-02-21T10:09:55.1834853Z 'range_multi_buffers': [None, True], 2026-02-21T10:09:55.1835108Z 'range_num_stages': [4, 2], 2026-02-21T10:09:55.1835338Z 'range_unroll_factors': [1, 1], 2026-02-21T10:09:55.1835587Z 'range_warp_specializes': []} 2026-02-21T10:09:55.1867974Z [2588s] Fitting surrogate: 597 points, 597 targets 2026-02-21T10:09:56.5710371Z [2589s] 
Generation 6 starting: 81 neighbors, 4 active search path(s) 2026-02-21T10:10:22.8373473Z Generation 6: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 82/82 3.6 configs/s 2026-02-21T10:10:40.0955423Z Generation 6: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━━ 82/82 4.6 configs/s 2026-02-21T10:10:41.4363991Z Generation 6: verifying top configs 100% ━━━━━━━━━━━━━━━━━━ 34/34 12.9 configs/s 2026-02-21T10:10:42.9745385Z [2635s] Generation 6 complete: 2026-02-21T10:10:42.9746001Z error=26 2026-02-21T10:10:42.9746204Z ok=59 2026-02-21T10:10:42.9746352Z min=6.2661 2026-02-21T10:10:42.9746852Z mid=11.3126 2026-02-21T10:10:42.9747011Z max=228.7354 2026-02-21T10:10:42.9747195Z best={'block_sizes': [16, 128, 128], 2026-02-21T10:10:42.9747530Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T10:10:42.9747874Z 'l2_groupings': [32], 2026-02-21T10:10:42.9748077Z 'load_eviction_policies': ['', ''], 2026-02-21T10:10:42.9748390Z 'loop_orders': [[1, 0]], 2026-02-21T10:10:42.9748594Z 'maxnreg': 256, 2026-02-21T10:10:42.9748774Z 'num_sm_multiplier': 64, 2026-02-21T10:10:42.9748973Z 'num_stages': 1, 2026-02-21T10:10:42.9749139Z 'num_warps': 4, 2026-02-21T10:10:42.9749358Z 'pid_type': 'persistent_interleaved', 2026-02-21T10:10:42.9749607Z 'range_flattens': [True, True], 2026-02-21T10:10:42.9749840Z 'range_multi_buffers': [None, True], 2026-02-21T10:10:42.9750075Z 'range_num_stages': [4, 2], 2026-02-21T10:10:42.9750301Z 'range_unroll_factors': [1, 1], 2026-02-21T10:10:42.9750522Z 'range_warp_specializes': []} 2026-02-21T10:10:42.9788023Z [2636s] Fitting surrogate: 682 points, 682 targets 2026-02-21T10:10:44.2300098Z [2637s] Generation 7 starting: 75 neighbors, 4 active search path(s) 2026-02-21T10:11:13.3215618Z Generation 7: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 76/76 0.5 configs/s 2026-02-21T10:11:34.3358020Z Generation 7: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━━ 76/76 3.6 configs/s 2026-02-21T10:11:37.4411202Z Generation 7: verifying top 
configs 100% ━━━━━━━━━━━━━━━━━━━ 34/34 7.6 configs/s 2026-02-21T10:11:38.9421849Z [2691s] Generation 7 complete: 2026-02-21T10:11:38.9422306Z error=31 2026-02-21T10:11:38.9422577Z ok=48 2026-02-21T10:11:38.9423426Z min=5.9637 2026-02-21T10:11:38.9423705Z mid=10.2980 2026-02-21T10:11:38.9423978Z max=422.2885 2026-02-21T10:11:38.9424294Z best={'block_sizes': [16, 128, 128], 2026-02-21T10:11:38.9424874Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T10:11:38.9425483Z 'l2_groupings': [16], 2026-02-21T10:11:38.9425859Z 'load_eviction_policies': ['', ''], 2026-02-21T10:11:38.9426115Z 'loop_orders': [[1, 0]], 2026-02-21T10:11:38.9426303Z 'maxnreg': 256, 2026-02-21T10:11:38.9426782Z 'num_sm_multiplier': 64, 2026-02-21T10:11:38.9426984Z 'num_stages': 1, 2026-02-21T10:11:38.9427150Z 'num_warps': 4, 2026-02-21T10:11:38.9427338Z 'pid_type': 'persistent_interleaved', 2026-02-21T10:11:38.9427588Z 'range_flattens': [True, True], 2026-02-21T10:11:38.9427809Z 'range_multi_buffers': [None, True], 2026-02-21T10:11:38.9428200Z 'range_num_stages': [4, 2], 2026-02-21T10:11:38.9428507Z 'range_unroll_factors': [1, 1], 2026-02-21T10:11:38.9428721Z 'range_warp_specializes': []} 2026-02-21T10:11:38.9465888Z [2691s] Fitting surrogate: 761 points, 761 targets 2026-02-21T10:11:40.1769129Z [2693s] Generation 8 starting: 75 neighbors, 4 active search path(s) 2026-02-21T10:12:10.4918163Z Generation 8: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 76/76 1.1 configs/s 2026-02-21T10:12:32.2542961Z 2026-02-21T10:12:32.2542978Z 2026-02-21T10:12:32.2543272Z ================================================================ 2026-02-21T10:12:32.2543648Z Internal Triton PTX codegen error 2026-02-21T10:12:32.2543909Z `ptxas` stderr: 2026-02-21T10:12:32.2544740Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_helion_matmul_bf16_int4' 
2026-02-21T10:12:32.2545934Z ptxas fatal : (C7600) Register allocation failed with register count of '64'. Compile the program with a higher register target 2026-02-21T10:12:32.2546824Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:12:32.2547089Z 2026-02-21T10:12:32.2547507Z [2745s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T10:12:32.2549387Z Config: @helion.kernel(config=helion.Config(block_sizes=[16, 128, 256], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[32], load_eviction_policies=['last', 'last'], loop_orders=[[1, 0]], num_stages=6, num_warps=32, pid_type='flat', range_flattens=[None, True], range_multi_buffers=[None, True], range_num_stages=[0, 4], range_unroll_factors=[0, 4], range_warp_specializes=[]), static_shapes=True) 2026-02-21T10:12:32.2551153Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T10:12:32.2551511Z `ptxas` stderr: 2026-02-21T10:12:32.2552464Z ptxas info : (C7511) Potential Performance Loss: wgmma.mma_async instructions are serialized due to insufficient register resources for the wgmma pipeline in the function '_helion_matmul_bf16_int4' 2026-02-21T10:12:32.2553654Z ptxas fatal : (C7600) Register allocation failed with register count of '64'. Compile the program with a higher register target 2026-02-21T10:12:32.2554297Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:12:32.2554521Z 2026-02-21T10:12:32.2555157Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp7dx6ii7y.ptx -o /tmp/tmp7dx6ii7y.ptx.o 2026-02-21T10:12:32.2555907Z 2026-02-21T10:12:32.2556097Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T10:12:32.2557182Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp7dx6ii7y.ptx -o /tmp/tmp7dx6ii7y.ptx.o 2026-02-21T10:12:32.2557918Z 2026-02-21T10:12:32.2557922Z 2026-02-21T10:12:32.2557993Z // 2026-02-21T10:12:32.2558174Z // Generated by LLVM NVPTX Back-End 2026-02-21T10:12:32.2558414Z // 2026-02-21T10:12:32.2558500Z 2026-02-21T10:12:32.2558570Z .version 8.7 2026-02-21T10:12:32.2559082Z .target sm_90a 2026-02-21T10:12:32.2559259Z .address_size 64 2026-02-21T10:12:32.2559377Z 2026-02-21T10:12:32.2559597Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T10:12:32.2560037Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T10:12:32.2560361Z // @_helion_matmul_bf16_int4 2026-02-21T10:12:32.2560684Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T10:12:32.2561042Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T10:12:32.2561479Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T10:12:32.2561905Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T10:12:32.2579308Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T10:12:32.2579702Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T10:12:32.2579989Z ) 2026-02-21T10:12:32.2580118Z .reqntid 1024 2026-02-21T10:12:32.2580282Z { 2026-02-21T10:12:32.2580422Z .reg .pred %p<56>; 2026-02-21T10:12:32.2580584Z .reg .b16 %rs<225>; 2026-02-21T10:12:32.2580747Z .reg .b32 %r<5045>; 2026-02-21T10:12:32.2580902Z .reg .b64 %rd<135>; 2026-02-21T10:12:32.2581311Z .loc 1 13 0 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:13:0 2026-02-21T10:12:32.2581690Z $L__func_begin0: 2026-02-21T10:12:32.2582012Z .loc 1 13 0 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:13:0 2026-02-21T10:12:32.2582397Z 2026-02-21T10:12:32.2582466Z // 
%bb.0: 2026-02-21T10:12:32.2582731Z ld.param.b64 %rd25, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T10:12:32.2583112Z ld.param.b64 %rd27, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T10:12:32.2583471Z ld.param.b64 %rd28, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T10:12:32.2583771Z $L__tmp0: 2026-02-21T10:12:32.2584138Z .loc 1 17 33 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:17:33 2026-02-21T10:12:32.2584619Z mov.u32 %r96, %ctaid.x; 2026-02-21T10:12:32.2585019Z .loc 1 20 29 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:20:29 2026-02-21T10:12:32.2585459Z shr.u32 %r97, %r96, 9; 2026-02-21T10:12:32.2585674Z and.b32 %r98, %r97, 4194272; 2026-02-21T10:12:32.2586067Z .loc 1 21 35 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:21:35 2026-02-21T10:12:32.2586664Z sub.s32 %r99, 5, %r98; 2026-02-21T10:12:32.2587051Z .loc 1 22 41 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:22:41 2026-02-21T10:12:32.2587504Z and.b32 %r100, %r96, 16383; 2026-02-21T10:12:32.2587901Z .loc 1 23 47 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:23:47 2026-02-21T10:12:32.2588411Z div.s32 %r101, %r100, %r99; 2026-02-21T10:12:32.2588832Z .loc 1 22 60 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:22:60 2026-02-21T10:12:32.2589286Z mul.lo.s32 %r102, %r101, %r99; 2026-02-21T10:12:32.2589526Z sub.s32 %r103, %r100, %r102; 2026-02-21T10:12:32.2589913Z .loc 1 22 26 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:22:26 2026-02-21T10:12:32.2590356Z add.s32 %r104, %r103, %r98; 2026-02-21T10:12:32.2590765Z .loc 1 24 23 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:24:23 2026-02-21T10:12:32.2591213Z shl.b32 %r1, %r104, 8; 2026-02-21T10:12:32.2591602Z .loc 1 25 41 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:25:41 2026-02-21T10:12:32.2592040Z mov.u32 %r2, %tid.x; 2026-02-21T10:12:32.2592215Z shr.u32 %r3, %r2, 5; 2026-02-21T10:12:32.2592371Z shl.b32 %r105, %r2, 2; 
2026-02-21T10:12:32.2592543Z and.b32 %r106, %r105, 252; 2026-02-21T10:12:32.2592713Z and.b32 %r4, %r2, 31; 2026-02-21T10:12:32.2593023Z .loc 1 26 23 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:26:23 2026-02-21T10:12:32.2593570Z shl.b32 %r5, %r101, 7; 2026-02-21T10:12:32.2593890Z .loc 1 27 41 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:27:41 2026-02-21T10:12:32.2594240Z shr.u32 %r6, %r2, 3; 2026-02-21T10:12:32.2594536Z .loc 1 35 44 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:35:44 2026-02-21T10:12:32.2594885Z shr.u32 %r107, %r2, 6; 2026-02-21T10:12:32.2595182Z .loc 1 41 34 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:41:34 2026-02-21T10:12:32.2595533Z and.b32 %r108, %r2, 7; 2026-02-21T10:12:32.2595717Z shl.b32 %r109, %r108, 2; 2026-02-21T10:12:32.2596047Z .loc 1 59 34 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:59:34 2026-02-21T10:12:32.2611433Z and.b32 %r7, %r2, 256; 2026-02-21T10:12:32.2611649Z shl.b32 %r110, %r2, 3; 2026-02-21T10:12:32.2611829Z shr.u32 %r111, %r2, 1; 2026-02-21T10:12:32.2612009Z and.b32 %r112, %r111, 24; 2026-02-21T10:12:32.2612239Z xor.b32 %r113, %r112, %r110; 2026-02-21T10:12:32.2612463Z mov.b32 %r1713, global_smem; 2026-02-21T10:12:32.2622373Z add.s32 %r8, %r1713, %r113; 2026-02-21T10:12:32.2622664Z and.b32 %r115, %r2, 224; 2026-02-21T10:12:32.2622864Z shl.b32 %r116, %r115, 5; 2026-02-21T10:12:32.2623208Z shl.b32 %r117, %r2, 4; 2026-02-21T10:12:32.2623397Z and.b32 %r118, %r117, 448; 2026-02-21T10:12:32.2623575Z and.b32 %r119, %r2, 3; 2026-02-21T10:12:32.2623752Z shl.b32 %r120, %r119, 1; 2026-02-21T10:12:32.2623926Z and.b32 %r9, %r2, 24; 2026-02-21T10:12:32.2624115Z or.b32 %r121, %r116, %r118; 2026-02-21T10:12:32.2624314Z or.b32 %r122, %r120, %r9; 2026-02-21T10:12:32.2624506Z or.b32 %r123, %r121, %r122; 2026-02-21T10:12:32.2624687Z add.s32 %r10, %r1713, %r123; 2026-02-21T10:12:32.2624878Z xor.b32 %r124, %r123, 8; 2026-02-21T10:12:32.2625065Z add.s32 %r11, %r1713, %r124; 
2026-02-21T10:12:32.2625252Z xor.b32 %r125, %r123, 16; 2026-02-21T10:12:32.2625434Z add.s32 %r12, %r1713, %r125; 2026-02-21T10:12:32.2625612Z xor.b32 %r126, %r123, 24; 2026-02-21T10:12:32.2625790Z add.s32 %r13, %r1713, %r126; 2026-02-21T10:12:32.2625978Z shl.b32 %r127, %r4, 2; 2026-02-21T10:12:32.2626155Z shr.u32 %r128, %r2, 7; 2026-02-21T10:12:32.2626319Z or.b32 %r129, %r128, %r117; 2026-02-21T10:12:32.2626671Z shl.b32 %r14, %r2, 1; 2026-02-21T10:12:32.2626844Z and.b32 %r130, %r14, 128; 2026-02-21T10:12:32.2627030Z setp.lt.u32 %p1, %r2, 512; 2026-02-21T10:12:32.2627220Z selp.b32 %r131, 0, 256, %p1; 2026-02-21T10:12:32.2627419Z and.b32 %r132, %r129, 515; 2026-02-21T10:12:32.2627604Z or.b32 %r133, %r132, %r131; 2026-02-21T10:12:32.2627778Z or.b32 %r134, %r133, %r130; 2026-02-21T10:12:32.2627957Z or.b32 %r135, %r134, %r127; 2026-02-21T10:12:32.2628131Z add.s32 %r15, %r1713, %r135; 2026-02-21T10:12:32.2628314Z xor.b32 %r136, %r135, 32; 2026-02-21T10:12:32.2628552Z add.s32 %r16, %r1713, %r136; 2026-02-21T10:12:32.2628736Z xor.b32 %r137, %r135, 64; 2026-02-21T10:12:32.2628905Z add.s32 %r17, %r1713, %r137; 2026-02-21T10:12:32.2629089Z xor.b32 %r138, %r135, 96; 2026-02-21T10:12:32.2629257Z add.s32 %r18, %r1713, %r138; 2026-02-21T10:12:32.2629436Z shl.b32 %r139, %r119, 10; 2026-02-21T10:12:32.2629624Z shl.b32 %r140, %r119, 5; 2026-02-21T10:12:32.2629799Z and.b32 %r141, %r2, 124; 2026-02-21T10:12:32.2629973Z and.b32 %r142, %r105, 512; 2026-02-21T10:12:32.2630147Z selp.b32 %r143, 0, 128, %p1; 2026-02-21T10:12:32.2630331Z xor.b32 %r144, %r140, %r141; 2026-02-21T10:12:32.2630508Z add.s32 %r145, %r1713, %r139; 2026-02-21T10:12:32.2630710Z add.s32 %r146, %r145, %r142; 2026-02-21T10:12:32.2630889Z add.s32 %r147, %r146, %r143; 2026-02-21T10:12:32.2631074Z add.s32 %r19, %r147, %r144; 2026-02-21T10:12:32.2631264Z shl.b32 %r148, %r2, 7; 2026-02-21T10:12:32.2631431Z or.b32 %r149, %r148, %r107; 2026-02-21T10:12:32.2631630Z shl.b32 %r20, %r108, 4; 2026-02-21T10:12:32.2631804Z and.b32 
%r150, %r149, 32652; 2026-02-21T10:12:32.2631992Z or.b32 %r151, %r150, %r20; 2026-02-21T10:12:32.2632382Z add.s32 %r21, %r1713, %r151; 2026-02-21T10:12:32.2632573Z xor.b32 %r152, %r151, 16; 2026-02-21T10:12:32.2632752Z add.s32 %r22, %r1713, %r152; 2026-02-21T10:12:32.2632937Z xor.b32 %r153, %r151, 32; 2026-02-21T10:12:32.2633106Z add.s32 %r23, %r1713, %r153; 2026-02-21T10:12:32.2633290Z xor.b32 %r154, %r151, 48; 2026-02-21T10:12:32.2633465Z add.s32 %r24, %r1713, %r154; 2026-02-21T10:12:32.2633640Z xor.b32 %r155, %r151, 64; 2026-02-21T10:12:32.2633827Z add.s32 %r25, %r1713, %r155; 2026-02-21T10:12:32.2634011Z xor.b32 %r156, %r151, 80; 2026-02-21T10:12:32.2634192Z add.s32 %r26, %r1713, %r156; 2026-02-21T10:12:32.2634365Z xor.b32 %r157, %r151, 96; 2026-02-21T10:12:32.2634551Z add.s32 %r27, %r1713, %r157; 2026-02-21T10:12:32.2634728Z xor.b32 %r158, %r151, 112; 2026-02-21T10:12:32.2635001Z add.s32 %r28, %r1713, %r158; 2026-02-21T10:12:32.2635184Z shl.b32 %r159, %r3, 7; 2026-02-21T10:12:32.2635358Z and.b32 %r160, %r117, 496; 2026-02-21T10:12:32.2635536Z or.b32 %r161, %r159, %r160; 2026-02-21T10:12:32.2635727Z add.s32 %r162, %r1713, 32768; 2026-02-21T10:12:32.2635916Z add.s32 %r4426, %r162, %r161; 2026-02-21T10:12:32.2636090Z shl.b32 %r163, %r9, 7; 2026-02-21T10:12:32.2636268Z shl.b32 %r164, %r115, 2; 2026-02-21T10:12:32.2636685Z add.s32 %r165, %r162, %r163; 2026-02-21T10:12:32.2636899Z add.s32 %r166, %r165, %r164; 2026-02-21T10:12:32.2637074Z add.s32 %r210, %r166, %r20; 2026-02-21T10:12:32.2637254Z bfe.u32 %r167, %r1713, 4, 14; 2026-02-21T10:12:32.2637430Z cvt.u64.u32 %rd29, %r167; 2026-02-21T10:12:32.2637627Z or.b64 %rd54, %rd29, 4611686293439512576; 2026-02-21T10:12:32.2637847Z add.s32 %r168, %r1713, 32; 2026-02-21T10:12:32.2638036Z bfe.u32 %r169, %r168, 4, 14; 2026-02-21T10:12:32.2638223Z cvt.u64.u32 %rd30, %r169; 2026-02-21T10:12:32.2638405Z or.b64 %rd55, %rd30, 4611686293439512576; 2026-02-21T10:12:32.2638614Z add.s32 %r170, %r1713, 64; 2026-02-21T10:12:32.2638785Z 
bfe.u32 %r171, %r170, 4, 14; 2026-02-21T10:12:32.2638964Z cvt.u64.u32 %rd31, %r171; 2026-02-21T10:12:32.2639159Z or.b64 %rd56, %rd31, 4611686293439512576; 2026-02-21T10:12:32.2639374Z add.s32 %r172, %r1713, 96; 2026-02-21T10:12:32.2639555Z bfe.u32 %r173, %r172, 4, 14; 2026-02-21T10:12:32.2639735Z cvt.u64.u32 %rd32, %r173; 2026-02-21T10:12:32.2639931Z or.b64 %rd57, %rd32, 4611686293439512576; 2026-02-21T10:12:32.2640138Z add.s32 %r174, %r1713, 8192; 2026-02-21T10:12:32.2640330Z bfe.u32 %r175, %r174, 4, 14; 2026-02-21T10:12:32.2640507Z cvt.u64.u32 %rd33, %r175; 2026-02-21T10:12:32.2640691Z or.b64 %rd58, %rd33, 4611686293439512576; 2026-02-21T10:12:32.2640890Z add.s32 %r176, %r1713, 8224; 2026-02-21T10:12:32.2641081Z bfe.u32 %r177, %r176, 4, 14; 2026-02-21T10:12:32.2641256Z cvt.u64.u32 %rd34, %r177; 2026-02-21T10:12:32.2641431Z or.b64 %rd59, %rd34, 4611686293439512576; 2026-02-21T10:12:32.2641648Z add.s32 %r178, %r1713, 8256; 2026-02-21T10:12:32.2641829Z bfe.u32 %r179, %r178, 4, 14; 2026-02-21T10:12:32.2642013Z cvt.u64.u32 %rd35, %r179; 2026-02-21T10:12:32.2642200Z or.b64 %rd60, %rd35, 4611686293439512576; 2026-02-21T10:12:32.2642408Z add.s32 %r180, %r1713, 8288; 2026-02-21T10:12:32.2642582Z bfe.u32 %r181, %r180, 4, 14; 2026-02-21T10:12:32.2642761Z cvt.u64.u32 %rd36, %r181; 2026-02-21T10:12:32.2642955Z or.b64 %rd61, %rd36, 4611686293439512576; 2026-02-21T10:12:32.2643164Z add.s32 %r182, %r1713, 16384; 2026-02-21T10:12:32.2643356Z bfe.u32 %r183, %r182, 4, 14; 2026-02-21T10:12:32.2643531Z cvt.u64.u32 %rd37, %r183; 2026-02-21T10:12:32.2643719Z or.b64 %rd62, %rd37, 4611686293439512576; 2026-02-21T10:12:32.2643921Z add.s32 %r184, %r1713, 16416; 2026-02-21T10:12:32.2644104Z bfe.u32 %r185, %r184, 4, 14; 2026-02-21T10:12:32.2644280Z cvt.u64.u32 %rd38, %r185; 2026-02-21T10:12:32.2644463Z or.b64 %rd63, %rd38, 4611686293439512576; 2026-02-21T10:12:32.2644664Z add.s32 %r186, %r1713, 16448; 2026-02-21T10:12:32.2644859Z bfe.u32 %r187, %r186, 4, 14; 2026-02-21T10:12:32.2645048Z 
cvt.u64.u32 %rd39, %r187; 2026-02-21T10:12:32.2645321Z or.b64 %rd64, %rd39, 4611686293439512576; 2026-02-21T10:12:32.2645609Z add.s32 %r188, %r1713, 16480; 2026-02-21T10:12:32.2645783Z bfe.u32 %r189, %r188, 4, 14; 2026-02-21T10:12:32.2645961Z cvt.u64.u32 %rd40, %r189; 2026-02-21T10:12:32.2646136Z or.b64 %rd65, %rd40, 4611686293439512576; 2026-02-21T10:12:32.2646344Z add.s32 %r190, %r1713, 24576; 2026-02-21T10:12:32.2646661Z bfe.u32 %r191, %r190, 4, 14; 2026-02-21T10:12:32.2646846Z cvt.u64.u32 %rd41, %r191; 2026-02-21T10:12:32.2647021Z or.b64 %rd66, %rd41, 4611686293439512576; 2026-02-21T10:12:32.2647225Z add.s32 %r192, %r1713, 24608; 2026-02-21T10:12:32.2647406Z bfe.u32 %r193, %r192, 4, 14; 2026-02-21T10:12:32.2647577Z cvt.u64.u32 %rd42, %r193; 2026-02-21T10:12:32.2647759Z or.b64 %rd67, %rd42, 4611686293439512576; 2026-02-21T10:12:32.2648053Z add.s32 %r194, %r1713, 24640; 2026-02-21T10:12:32.2648247Z bfe.u32 %r195, %r194, 4, 14; 2026-02-21T10:12:32.2648422Z cvt.u64.u32 %rd43, %r195; 2026-02-21T10:12:32.2648601Z or.b64 %rd68, %rd43, 4611686293439512576; 2026-02-21T10:12:32.2648817Z add.s32 %r196, %r1713, 24672; 2026-02-21T10:12:32.2648996Z bfe.u32 %r197, %r196, 4, 14; 2026-02-21T10:12:32.2649173Z cvt.u64.u32 %rd44, %r197; 2026-02-21T10:12:32.2649345Z or.b64 %rd69, %rd44, 4611686293439512576; 2026-02-21T10:12:32.2649804Z .loc 1 34 122 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:34:122 2026-02-21T10:12:32.2650188Z shl.b32 %r198, %r101, 20; 2026-02-21T10:12:32.2650364Z shl.b32 %r199, %r6, 13; 2026-02-21T10:12:32.2650532Z or.b32 %r200, %r198, %r199; 2026-02-21T10:12:32.2650716Z or.b32 %r201, %r200, %r109; 2026-02-21T10:12:32.2650899Z mad.wide.s32 %rd45, %r201, 2, %rd27; 2026-02-21T10:12:32.2651105Z add.s64 %rd133, %rd45, 192; 2026-02-21T10:12:32.2651290Z or.b32 %r202, %r1, %r106; 2026-02-21T10:12:32.2651460Z cvt.u64.u32 %rd46, %r202; 2026-02-21T10:12:32.2651665Z mad.wide.u32 %rd47, %r107, 1280, %rd46; 2026-02-21T10:12:32.2651875Z add.s64 %rd132, %rd28, 
%rd47; 2026-02-21T10:12:32.2652085Z mov.b32 %r4557, 0f00000000; 2026-02-21T10:12:32.2652265Z mov.b64 %rd134, -64; 2026-02-21T10:12:32.2652443Z setp.eq.b32 %p54, %r7, 0; 2026-02-21T10:12:32.2652630Z mov.b32 %r4558, %r4557; 2026-02-21T10:12:32.2652793Z mov.b32 %r4559, %r4557; 2026-02-21T10:12:32.2652967Z mov.b32 %r4560, %r4557; 2026-02-21T10:12:32.2653132Z mov.b32 %r4561, %r4557; 2026-02-21T10:12:32.2653300Z mov.b32 %r4562, %r4557; 2026-02-21T10:12:32.2653460Z mov.b32 %r4563, %r4557; 2026-02-21T10:12:32.2653625Z mov.b32 %r4564, %r4557; 2026-02-21T10:12:32.2653787Z mov.b32 %r4565, %r4557; 2026-02-21T10:12:32.2653953Z mov.b32 %r4566, %r4557; 2026-02-21T10:12:32.2654115Z mov.b32 %r4567, %r4557; 2026-02-21T10:12:32.2654283Z mov.b32 %r4568, %r4557; 2026-02-21T10:12:32.2654460Z mov.b32 %r4569, %r4557; 2026-02-21T10:12:32.2654627Z mov.b32 %r4570, %r4557; 2026-02-21T10:12:32.2654794Z mov.b32 %r4571, %r4557; 2026-02-21T10:12:32.2654952Z mov.b32 %r4572, %r4557; 2026-02-21T10:12:32.2655117Z mov.b32 %r4573, %r4557; 2026-02-21T10:12:32.2655278Z mov.b32 %r4574, %r4557; 2026-02-21T10:12:32.2655445Z mov.b32 %r4575, %r4557; 2026-02-21T10:12:32.2655606Z mov.b32 %r4576, %r4557; 2026-02-21T10:12:32.2655771Z mov.b32 %r4577, %r4557; 2026-02-21T10:12:32.2655930Z mov.b32 %r4578, %r4557; 2026-02-21T10:12:32.2656114Z mov.b32 %r4579, %r4557; 2026-02-21T10:12:32.2656281Z mov.b32 %r4580, %r4557; 2026-02-21T10:12:32.2656439Z mov.b32 %r4581, %r4557; 2026-02-21T10:12:32.2656759Z mov.b32 %r4582, %r4557; 2026-02-21T10:12:32.2656919Z mov.b32 %r4583, %r4557; 2026-02-21T10:12:32.2657084Z mov.b32 %r4584, %r4557; 2026-02-21T10:12:32.2657241Z mov.b32 %r4585, %r4557; 2026-02-21T10:12:32.2657404Z mov.b32 %r4586, %r4557; 2026-02-21T10:12:32.2657561Z mov.b32 %r4587, %r4557; 2026-02-21T10:12:32.2657720Z mov.b32 %r4588, %r4557; 2026-02-21T10:12:32.2657939Z $L__BB0_1: // =>This Inner Loop Header: Depth=1 2026-02-21T10:12:32.2658379Z .loc 1 42 76 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:42:76 
2026-02-21T10:12:32.2658947Z add.s64 %rd49, %rd133, -192; 2026-02-21T10:12:32.2659137Z // begin inline asm 2026-02-21T10:12:32.2659301Z mov.u64 %rd48, 0x0; 2026-02-21T10:12:32.2659540Z createpolicy.fractional.L2::evict_last.b64 %rd48, 1.0; 2026-02-21T10:12:32.2659803Z // end inline asm 2026-02-21T10:12:32.2659955Z // begin inline asm 2026-02-21T10:12:32.2660113Z mov.u32 %r203, 0x0; 2026-02-21T10:12:32.2660262Z mov.u32 %r204, 0x0; 2026-02-21T10:12:32.2660534Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r203, %r204 }, [ %rd49 + 0 ], %rd48; 2026-02-21T10:12:32.2660863Z // end inline asm 2026-02-21T10:12:32.2661167Z .loc 1 46 28 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:46:28 2026-02-21T10:12:32.2661527Z bar.sync 0; 2026-02-21T10:12:32.2661768Z st.shared.v2.b32 [%r8], {%r203, %r204}; 2026-02-21T10:12:32.2661979Z bar.sync 0; 2026-02-21T10:12:32.2662167Z ld.shared.b16 %rs1, [%r10]; 2026-02-21T10:12:32.2662369Z ld.shared.b16 %rs2, [%r10+512]; 2026-02-21T10:12:32.2662574Z ld.shared.b16 %rs3, [%r10+32]; 2026-02-21T10:12:32.2662762Z ld.shared.b16 %rs4, [%r10+544]; 2026-02-21T10:12:32.2662957Z ld.shared.b16 %rs5, [%r11]; 2026-02-21T10:12:32.2663139Z ld.shared.b16 %rs6, [%r11+512]; 2026-02-21T10:12:32.2663414Z ld.shared.b16 %rs7, [%r11+32]; 2026-02-21T10:12:32.2663617Z ld.shared.b16 %rs8, [%r11+544]; 2026-02-21T10:12:32.2663805Z ld.shared.b16 %rs9, [%r12]; 2026-02-21T10:12:32.2663998Z ld.shared.b16 %rs10, [%r12+512]; 2026-02-21T10:12:32.2664197Z ld.shared.b16 %rs11, [%r12+32]; 2026-02-21T10:12:32.2664390Z ld.shared.b16 %rs12, [%r12+544]; 2026-02-21T10:12:32.2664581Z ld.shared.b16 %rs13, [%r13]; 2026-02-21T10:12:32.2664771Z ld.shared.b16 %rs14, [%r13+512]; 2026-02-21T10:12:32.2664959Z ld.shared.b16 %rs15, [%r13+32]; 2026-02-21T10:12:32.2665155Z ld.shared.b16 %rs16, [%r13+544]; 2026-02-21T10:12:32.2665348Z cvt.f32.bf16 %r702, %rs1; 2026-02-21T10:12:32.2665524Z cvt.f32.bf16 %r703, %rs2; 2026-02-21T10:12:32.2665719Z cvt.f32.bf16 %r704, %rs5; 
2026-02-21T10:12:32.2665900Z cvt.f32.bf16 %r705, %rs6; 2026-02-21T10:12:32.2666074Z cvt.f32.bf16 %r770, %rs9; 2026-02-21T10:12:32.2666246Z cvt.f32.bf16 %r771, %rs10; 2026-02-21T10:12:32.2666427Z cvt.f32.bf16 %r772, %rs13; 2026-02-21T10:12:32.2666731Z cvt.f32.bf16 %r773, %rs14; 2026-02-21T10:12:32.2666912Z cvt.f32.bf16 %r838, %rs3; 2026-02-21T10:12:32.2667078Z cvt.f32.bf16 %r839, %rs4; 2026-02-21T10:12:32.2667249Z cvt.f32.bf16 %r840, %rs7; 2026-02-21T10:12:32.2667419Z cvt.f32.bf16 %r841, %rs8; 2026-02-21T10:12:32.2667602Z cvt.f32.bf16 %r906, %rs11; 2026-02-21T10:12:32.2667780Z cvt.f32.bf16 %r907, %rs12; 2026-02-21T10:12:32.2667960Z cvt.f32.bf16 %r908, %rs15; 2026-02-21T10:12:32.2668137Z cvt.f32.bf16 %r909, %rs16; 2026-02-21T10:12:32.2668566Z .loc 1 48 83 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:48:83 2026-02-21T10:12:32.2668947Z // begin inline asm 2026-02-21T10:12:32.2669110Z mov.u64 %rd51, 0x0; 2026-02-21T10:12:32.2669339Z createpolicy.fractional.L2::evict_last.b64 %rd51, 1.0; 2026-02-21T10:12:32.2669594Z // end inline asm 2026-02-21T10:12:32.2669745Z // begin inline asm 2026-02-21T10:12:32.2669905Z mov.u32 %r205, 0x0; 2026-02-21T10:12:32.2670155Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r205 }, [ %rd132 + 0 ], %rd51; 2026-02-21T10:12:32.2670454Z // end inline asm 2026-02-21T10:12:32.2670757Z .loc 1 56 24 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:56:24 2026-02-21T10:12:32.2671112Z bar.sync 0; 2026-02-21T10:12:32.2671261Z st.shared.b8 [%r15], %r205; 2026-02-21T10:12:32.2671458Z prmt.b32 %r4831, %r205, 0, 0x7771U; 2026-02-21T10:12:32.2671674Z st.shared.b8 [%r16+1024], %r4831; 2026-02-21T10:12:32.2671879Z prmt.b32 %r4832, %r205, 0, 0x7772U; 2026-02-21T10:12:32.2672086Z st.shared.b8 [%r17+2048], %r4832; 2026-02-21T10:12:32.2672277Z prmt.b32 %r4833, %r205, 0, 0x7773U; 2026-02-21T10:12:32.2672479Z st.shared.b8 [%r18+3072], %r4833; 2026-02-21T10:12:32.2686632Z bar.sync 0; 2026-02-21T10:12:32.2686811Z ld.shared.b32 %r4834, [%r19]; 
2026-02-21T10:12:32.2687013Z prmt.b32 %r4835, %r4834, 0, 0x7770U; 2026-02-21T10:12:32.2687238Z cvt.u16.u32 %rs17, %r4835; 2026-02-21T10:12:32.2687451Z prmt.b32 %r4836, %r4834, 0, 0x7771U; 2026-02-21T10:12:32.2687656Z cvt.u16.u32 %rs18, %r4836; 2026-02-21T10:12:32.2687842Z prmt.b32 %r4837, %r4834, 0, 0x7772U; 2026-02-21T10:12:32.2688037Z cvt.u16.u32 %rs19, %r4837; 2026-02-21T10:12:32.2688222Z prmt.b32 %r4838, %r4834, 0, 0x7773U; 2026-02-21T10:12:32.2688417Z cvt.u16.u32 %rs20, %r4838; 2026-02-21T10:12:32.2688620Z ld.shared.b32 %r4839, [%r19+256]; 2026-02-21T10:12:32.2688827Z prmt.b32 %r4840, %r4839, 0, 0x7770U; 2026-02-21T10:12:32.2689033Z cvt.u16.u32 %rs21, %r4840; 2026-02-21T10:12:32.2689297Z prmt.b32 %r4841, %r4839, 0, 0x7771U; 2026-02-21T10:12:32.2689499Z cvt.u16.u32 %rs22, %r4841; 2026-02-21T10:12:32.2689678Z prmt.b32 %r4842, %r4839, 0, 0x7772U; 2026-02-21T10:12:32.2689884Z cvt.u16.u32 %rs23, %r4842; 2026-02-21T10:12:32.2690071Z prmt.b32 %r4843, %r4839, 0, 0x7773U; 2026-02-21T10:12:32.2690269Z cvt.u16.u32 %rs24, %r4843; 2026-02-21T10:12:32.2690684Z .loc 1 51 24 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:51:24 2026-02-21T10:12:32.2691058Z shl.b16 %rs25, %rs17, 4; 2026-02-21T10:12:32.2691234Z shl.b16 %rs26, %rs18, 4; 2026-02-21T10:12:32.2691401Z shl.b16 %rs27, %rs19, 4; 2026-02-21T10:12:32.2691572Z shl.b16 %rs28, %rs20, 4; 2026-02-21T10:12:32.2691742Z shl.b16 %rs29, %rs21, 4; 2026-02-21T10:12:32.2691912Z shl.b16 %rs30, %rs22, 4; 2026-02-21T10:12:32.2692084Z shl.b16 %rs31, %rs23, 4; 2026-02-21T10:12:32.2692247Z shl.b16 %rs32, %rs24, 4; 2026-02-21T10:12:32.2692579Z .loc 1 66 54 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:66:54 2026-02-21T10:12:32.2692945Z selp.b16 %rs33, %rs25, %rs17, %p54; 2026-02-21T10:12:32.2693149Z cvt.s16.s8 %rs34, %rs33; 2026-02-21T10:12:32.2693317Z shr.s16 %rs35, %rs34, 4; 2026-02-21T10:12:32.2693499Z selp.b16 %rs36, %rs26, %rs18, %p54; 2026-02-21T10:12:32.2693709Z cvt.s16.s8 %rs37, %rs36; 
2026-02-21T10:12:32.2693877Z shr.s16 %rs38, %rs37, 4; 2026-02-21T10:12:32.2694055Z selp.b16 %rs39, %rs27, %rs19, %p54; 2026-02-21T10:12:32.2694247Z cvt.s16.s8 %rs40, %rs39; 2026-02-21T10:12:32.2694418Z shr.s16 %rs41, %rs40, 4; 2026-02-21T10:12:32.2694587Z selp.b16 %rs42, %rs28, %rs20, %p54; 2026-02-21T10:12:32.2694784Z cvt.s16.s8 %rs43, %rs42; 2026-02-21T10:12:32.2694946Z shr.s16 %rs44, %rs43, 4; 2026-02-21T10:12:32.2695123Z selp.b16 %rs45, %rs29, %rs21, %p54; 2026-02-21T10:12:32.2695311Z cvt.s16.s8 %rs46, %rs45; 2026-02-21T10:12:32.2695494Z shr.s16 %rs47, %rs46, 4; 2026-02-21T10:12:32.2695664Z selp.b16 %rs48, %rs30, %rs22, %p54; 2026-02-21T10:12:32.2695851Z cvt.s16.s8 %rs49, %rs48; 2026-02-21T10:12:32.2696018Z shr.s16 %rs50, %rs49, 4; 2026-02-21T10:12:32.2696189Z selp.b16 %rs51, %rs31, %rs23, %p54; 2026-02-21T10:12:32.2696396Z cvt.s16.s8 %rs52, %rs51; 2026-02-21T10:12:32.2696699Z shr.s16 %rs53, %rs52, 4; 2026-02-21T10:12:32.2696881Z selp.b16 %rs54, %rs32, %rs24, %p54; 2026-02-21T10:12:32.2697076Z cvt.s16.s8 %rs55, %rs54; 2026-02-21T10:12:32.2697245Z shr.s16 %rs56, %rs55, 4; 2026-02-21T10:12:32.2697565Z .loc 1 71 28 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:71:28 2026-02-21T10:12:32.2697928Z cvt.rn.f32.s16 %r4844, %rs35; 2026-02-21T10:12:32.2698131Z cvt.rn.f32.s16 %r4845, %rs38; 2026-02-21T10:12:32.2698315Z cvt.rn.f32.s16 %r4846, %rs41; 2026-02-21T10:12:32.2698493Z cvt.rn.f32.s16 %r4847, %rs44; 2026-02-21T10:12:32.2698665Z cvt.rn.f32.s16 %r4848, %rs47; 2026-02-21T10:12:32.2698851Z cvt.rn.f32.s16 %r4849, %rs50; 2026-02-21T10:12:32.2699044Z cvt.rn.f32.s16 %r4850, %rs53; 2026-02-21T10:12:32.2699242Z cvt.rn.f32.s16 %r4851, %rs56; 2026-02-21T10:12:32.2699438Z bar.sync 0; 2026-02-21T10:12:32.2699592Z st.shared.b32 [%r21], %r4844; 2026-02-21T10:12:32.2699783Z st.shared.b32 [%r22], %r4845; 2026-02-21T10:12:32.2700135Z st.shared.b32 [%r23], %r4846; 2026-02-21T10:12:32.2700335Z st.shared.b32 [%r24], %r4847; 2026-02-21T10:12:32.2700511Z st.shared.b32 [%r25], 
%r4848; 2026-02-21T10:12:32.2700695Z st.shared.b32 [%r26], %r4849; 2026-02-21T10:12:32.2700872Z st.shared.b32 [%r27], %r4850; 2026-02-21T10:12:32.2701048Z st.shared.b32 [%r28], %r4851; 2026-02-21T10:12:32.2701295Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4557}; 2026-02-21T10:12:32.2701568Z bar.sync 0; 2026-02-21T10:12:32.2701711Z // begin inline asm 2026-02-21T10:12:32.2701982Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r434, %r706, %r978, %r1250}, [%r210]; 2026-02-21T10:12:32.2702314Z // end inline asm 2026-02-21T10:12:32.2702458Z bar.sync 0; 2026-02-21T10:12:32.2702757Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4559}; 2026-02-21T10:12:32.2703029Z bar.sync 0; 2026-02-21T10:12:32.2703169Z // begin inline asm 2026-02-21T10:12:32.2703435Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r436, %r708, %r980, %r1252}, [%r210]; 2026-02-21T10:12:32.2703756Z // end inline asm 2026-02-21T10:12:32.2703898Z bar.sync 0; 2026-02-21T10:12:32.2704103Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4558}; 2026-02-21T10:12:32.2704365Z bar.sync 0; 2026-02-21T10:12:32.2704581Z // begin inline asm 2026-02-21T10:12:32.2704867Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r435, %r707, %r979, %r1251}, [%r210]; 2026-02-21T10:12:32.2705181Z // end inline asm 2026-02-21T10:12:32.2705325Z bar.sync 0; 2026-02-21T10:12:32.2705528Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4560}; 2026-02-21T10:12:32.2705786Z bar.sync 0; 2026-02-21T10:12:32.2705921Z // begin inline asm 2026-02-21T10:12:32.2706187Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r437, %r709, %r981, %r1253}, [%r210]; 2026-02-21T10:12:32.2706656Z // end inline asm 2026-02-21T10:12:32.2706804Z bar.sync 0; 2026-02-21T10:12:32.2707014Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4561}; 2026-02-21T10:12:32.2707273Z bar.sync 0; 2026-02-21T10:12:32.2707414Z // begin inline asm 2026-02-21T10:12:32.2707673Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r438, %r710, %r982, %r1254}, [%r210]; 
2026-02-21T10:12:32.2707985Z // end inline asm 2026-02-21T10:12:32.2708123Z bar.sync 0; 2026-02-21T10:12:32.2708329Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4563}; 2026-02-21T10:12:32.2708659Z bar.sync 0; 2026-02-21T10:12:32.2708802Z // begin inline asm 2026-02-21T10:12:32.2709076Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r440, %r712, %r984, %r1256}, [%r210]; 2026-02-21T10:12:32.2709392Z // end inline asm 2026-02-21T10:12:32.2709537Z bar.sync 0; 2026-02-21T10:12:32.2709757Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4562}; 2026-02-21T10:12:32.2710021Z bar.sync 0; 2026-02-21T10:12:32.2710158Z // begin inline asm 2026-02-21T10:12:32.2710420Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r439, %r711, %r983, %r1255}, [%r210]; 2026-02-21T10:12:32.2710736Z // end inline asm 2026-02-21T10:12:32.2710878Z bar.sync 0; 2026-02-21T10:12:32.2711087Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4564}; 2026-02-21T10:12:32.2711344Z bar.sync 0; 2026-02-21T10:12:32.2711495Z // begin inline asm 2026-02-21T10:12:32.2711763Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r441, %r713, %r985, %r1257}, [%r210]; 2026-02-21T10:12:32.2712086Z // end inline asm 2026-02-21T10:12:32.2712225Z bar.sync 0; 2026-02-21T10:12:32.2712432Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4565}; 2026-02-21T10:12:32.2712699Z bar.sync 0; 2026-02-21T10:12:32.2712847Z // begin inline asm 2026-02-21T10:12:32.2713116Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r442, %r714, %r986, %r1258}, [%r210]; 2026-02-21T10:12:32.2713423Z // end inline asm 2026-02-21T10:12:32.2713568Z bar.sync 0; 2026-02-21T10:12:32.2713772Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4567}; 2026-02-21T10:12:32.2714031Z bar.sync 0; 2026-02-21T10:12:32.2714166Z // begin inline asm 2026-02-21T10:12:32.2714431Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r444, %r716, %r988, %r1260}, [%r210]; 2026-02-21T10:12:32.2714904Z // end inline asm 2026-02-21T10:12:32.2715049Z bar.sync 0; 
2026-02-21T10:12:32.2715257Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4566}; 2026-02-21T10:12:32.2715515Z bar.sync 0; 2026-02-21T10:12:32.2715655Z // begin inline asm 2026-02-21T10:12:32.2715914Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r443, %r715, %r987, %r1259}, [%r210]; 2026-02-21T10:12:32.2716227Z // end inline asm 2026-02-21T10:12:32.2716364Z bar.sync 0; 2026-02-21T10:12:32.2716731Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4568}; 2026-02-21T10:12:32.2717003Z bar.sync 0; 2026-02-21T10:12:32.2717142Z // begin inline asm 2026-02-21T10:12:32.2717405Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r445, %r717, %r989, %r1261}, [%r210]; 2026-02-21T10:12:32.2717799Z // end inline asm 2026-02-21T10:12:32.2717958Z bar.sync 0; 2026-02-21T10:12:32.2718167Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4569}; 2026-02-21T10:12:32.2718433Z bar.sync 0; 2026-02-21T10:12:32.2718569Z // begin inline asm 2026-02-21T10:12:32.2718850Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r446, %r718, %r990, %r1262}, [%r210]; 2026-02-21T10:12:32.2719158Z // end inline asm 2026-02-21T10:12:32.2719302Z bar.sync 0; 2026-02-21T10:12:32.2719586Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4571}; 2026-02-21T10:12:32.2719855Z bar.sync 0; 2026-02-21T10:12:32.2720002Z // begin inline asm 2026-02-21T10:12:32.2720266Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r448, %r720, %r992, %r1264}, [%r210]; 2026-02-21T10:12:32.2720597Z // end inline asm 2026-02-21T10:12:32.2720747Z bar.sync 0; 2026-02-21T10:12:32.2720962Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4570}; 2026-02-21T10:12:32.2721220Z bar.sync 0; 2026-02-21T10:12:32.2721387Z // begin inline asm 2026-02-21T10:12:32.2721651Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r447, %r719, %r991, %r1263}, [%r210]; 2026-02-21T10:12:32.2721980Z // end inline asm 2026-02-21T10:12:32.2722133Z bar.sync 0; 2026-02-21T10:12:32.2722340Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4572}; 
2026-02-21T10:12:32.2722607Z bar.sync 0; 2026-02-21T10:12:32.2722746Z // begin inline asm 2026-02-21T10:12:32.2723013Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r449, %r721, %r993, %r1265}, [%r210]; 2026-02-21T10:12:32.2723322Z // end inline asm 2026-02-21T10:12:32.2723469Z bar.sync 0; 2026-02-21T10:12:32.2723674Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4573}; 2026-02-21T10:12:32.2723940Z bar.sync 0; 2026-02-21T10:12:32.2724097Z // begin inline asm 2026-02-21T10:12:32.2724361Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r450, %r722, %r994, %r1266}, [%r210]; 2026-02-21T10:12:32.2724678Z // end inline asm 2026-02-21T10:12:32.2724819Z bar.sync 0; 2026-02-21T10:12:32.2725036Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4575}; 2026-02-21T10:12:32.2725301Z bar.sync 0; 2026-02-21T10:12:32.2725445Z // begin inline asm 2026-02-21T10:12:32.2725710Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r452, %r724, %r996, %r1268}, [%r210]; 2026-02-21T10:12:32.2726035Z // end inline asm 2026-02-21T10:12:32.2726180Z bar.sync 0; 2026-02-21T10:12:32.2726384Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4574}; 2026-02-21T10:12:32.2726781Z bar.sync 0; 2026-02-21T10:12:32.2726920Z // begin inline asm 2026-02-21T10:12:32.2727191Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r451, %r723, %r995, %r1267}, [%r210]; 2026-02-21T10:12:32.2727505Z // end inline asm 2026-02-21T10:12:32.2727654Z bar.sync 0; 2026-02-21T10:12:32.2727864Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4576}; 2026-02-21T10:12:32.2728132Z bar.sync 0; 2026-02-21T10:12:32.2728272Z // begin inline asm 2026-02-21T10:12:32.2728557Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r453, %r725, %r997, %r1269}, [%r210]; 2026-02-21T10:12:32.2728876Z // end inline asm 2026-02-21T10:12:32.2729019Z bar.sync 0; 2026-02-21T10:12:32.2729233Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4577}; 2026-02-21T10:12:32.2729647Z bar.sync 0; 2026-02-21T10:12:32.2729791Z // begin inline asm 
2026-02-21T10:12:32.2730057Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r454, %r726, %r998, %r1270}, [%r210]; 2026-02-21T10:12:32.2730377Z // end inline asm 2026-02-21T10:12:32.2730519Z bar.sync 0; 2026-02-21T10:12:32.2730730Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4579}; 2026-02-21T10:12:32.2730995Z bar.sync 0; 2026-02-21T10:12:32.2731134Z // begin inline asm 2026-02-21T10:12:32.2731420Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r456, %r728, %r1000, %r1272}, [%r210]; 2026-02-21T10:12:32.2731747Z // end inline asm 2026-02-21T10:12:32.2731896Z bar.sync 0; 2026-02-21T10:12:32.2732115Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4578}; 2026-02-21T10:12:32.2732383Z bar.sync 0; 2026-02-21T10:12:32.2732594Z // begin inline asm 2026-02-21T10:12:32.2732887Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r455, %r727, %r999, %r1271}, [%r210]; 2026-02-21T10:12:32.2733214Z // end inline asm 2026-02-21T10:12:32.2733379Z bar.sync 0; 2026-02-21T10:12:32.2733599Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4580}; 2026-02-21T10:12:32.2733864Z bar.sync 0; 2026-02-21T10:12:32.2734010Z // begin inline asm 2026-02-21T10:12:32.2734350Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r457, %r729, %r1001, %r1273}, [%r210]; 2026-02-21T10:12:32.2734688Z // end inline asm 2026-02-21T10:12:32.2734832Z bar.sync 0; 2026-02-21T10:12:32.2735045Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4581}; 2026-02-21T10:12:32.2735312Z bar.sync 0; 2026-02-21T10:12:32.2735451Z // begin inline asm 2026-02-21T10:12:32.2735721Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r458, %r730, %r1002, %r1274}, [%r210]; 2026-02-21T10:12:32.2736039Z // end inline asm 2026-02-21T10:12:32.2736188Z bar.sync 0; 2026-02-21T10:12:32.2736395Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4583}; 2026-02-21T10:12:32.2736791Z bar.sync 0; 2026-02-21T10:12:32.2736931Z // begin inline asm 2026-02-21T10:12:32.2737207Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r460, %r732, %r1004, %r1276}, 
[%r210]; 2026-02-21T10:12:32.2737533Z // end inline asm 2026-02-21T10:12:32.2737683Z bar.sync 0; 2026-02-21T10:12:32.2737894Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4582}; 2026-02-21T10:12:32.2738157Z bar.sync 0; 2026-02-21T10:12:32.2738299Z // begin inline asm 2026-02-21T10:12:32.2738564Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r459, %r731, %r1003, %r1275}, [%r210]; 2026-02-21T10:12:32.2738884Z // end inline asm 2026-02-21T10:12:32.2739026Z bar.sync 0; 2026-02-21T10:12:32.2739236Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4584}; 2026-02-21T10:12:32.2739510Z bar.sync 0; 2026-02-21T10:12:32.2739654Z // begin inline asm 2026-02-21T10:12:32.2739928Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r461, %r733, %r1005, %r1277}, [%r210]; 2026-02-21T10:12:32.2740243Z // end inline asm 2026-02-21T10:12:32.2740392Z bar.sync 0; 2026-02-21T10:12:32.2740598Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4585}; 2026-02-21T10:12:32.2740867Z bar.sync 0; 2026-02-21T10:12:32.2741005Z // begin inline asm 2026-02-21T10:12:32.2741275Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r462, %r734, %r1006, %r1278}, [%r210]; 2026-02-21T10:12:32.2741592Z // end inline asm 2026-02-21T10:12:32.2741753Z bar.sync 0; 2026-02-21T10:12:32.2741976Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4587}; 2026-02-21T10:12:32.2742245Z bar.sync 0; 2026-02-21T10:12:32.2742398Z // begin inline asm 2026-02-21T10:12:32.2742676Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r464, %r736, %r1008, %r1280}, [%r210]; 2026-02-21T10:12:32.2743001Z // end inline asm 2026-02-21T10:12:32.2743144Z bar.sync 0; 2026-02-21T10:12:32.2743360Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4586}; 2026-02-21T10:12:32.2743623Z bar.sync 0; 2026-02-21T10:12:32.2743775Z // begin inline asm 2026-02-21T10:12:32.2744043Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r463, %r735, %r1007, %r1279}, [%r210]; 2026-02-21T10:12:32.2744546Z // end inline asm 2026-02-21T10:12:32.2744697Z bar.sync 0; 
2026-02-21T10:12:32.2744906Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4426], {%r4588}; 2026-02-21T10:12:32.2745191Z bar.sync 0; 2026-02-21T10:12:32.2745334Z // begin inline asm 2026-02-21T10:12:32.2745610Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r465, %r737, %r1009, %r1281}, [%r210]; 2026-02-21T10:12:32.2745929Z // end inline asm 2026-02-21T10:12:32.2746079Z $L__tmp1: 2026-02-21T10:12:32.2746444Z .loc 2 291 36 // standard.py:291:36 @[ cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:78:36 ] 2026-02-21T10:12:32.2747037Z // begin inline asm 2026-02-21T10:12:32.2747225Z fence.proxy.async.shared::cta; 2026-02-21T10:12:32.2747414Z // end inline asm 2026-02-21T10:12:32.2747678Z shfl.sync.idx.b32 %r4852, %r3, 0, 31, -1; 2026-02-21T10:12:32.2747912Z wgmma.fence.sync.aligned; 2026-02-21T10:12:32.2748096Z mov.pred %p2, -1; 2026-02-21T10:12:32.2748250Z // begin inline asm 2026-02-21T10:12:32.2749159Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r434,%r435,%r436,%r437,%r438,%r439,%r440,%r441,%r442,%r443,%r444,%r445,%r446,%r447,%r448,%r449,%r450,%r451,%r452,%r453,%r454,%r455,%r456,%r457,%r458,%r459,%r460,%r461,%r462,%r463,%r464,%r465}, {%r702,%r703,%r704,%r705}, %rd54, %p2, 1, 1; 2026-02-21T10:12:32.2749972Z // end inline asm 2026-02-21T10:12:32.2750121Z // begin inline asm 2026-02-21T10:12:32.2750877Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r434,%r435,%r436,%r437,%r438,%r439,%r440,%r441,%r442,%r443,%r444,%r445,%r446,%r447,%r448,%r449,%r450,%r451,%r452,%r453,%r454,%r455,%r456,%r457,%r458,%r459,%r460,%r461,%r462,%r463,%r464,%r465}, {%r770,%r771,%r772,%r773}, %rd55, %p2, 1, 1; 2026-02-21T10:12:32.2751663Z // end inline asm 2026-02-21T10:12:32.2751814Z // begin inline asm 2026-02-21T10:12:32.2752565Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r434,%r435,%r436,%r437,%r438,%r439,%r440,%r441,%r442,%r443,%r444,%r445,%r446,%r447,%r448,%r449,%r450,%r451,%r452,%r453,%r454,%r455,%r456,%r457,%r458,%r459,%r460,%r461,%r462,%r463,%r464,%r465}, 
{%r838,%r839,%r840,%r841}, %rd56, %p2, 1, 1; 2026-02-21T10:12:32.2753349Z // end inline asm 2026-02-21T10:12:32.2753500Z // begin inline asm 2026-02-21T10:12:32.2754229Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r434,%r435,%r436,%r437,%r438,%r439,%r440,%r441,%r442,%r443,%r444,%r445,%r446,%r447,%r448,%r449,%r450,%r451,%r452,%r453,%r454,%r455,%r456,%r457,%r458,%r459,%r460,%r461,%r462,%r463,%r464,%r465}, {%r906,%r907,%r908,%r909}, %rd57, %p2, 1, 1; 2026-02-21T10:12:32.2755026Z // end inline asm 2026-02-21T10:12:32.2755181Z // begin inline asm 2026-02-21T10:12:32.2755921Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r706,%r707,%r708,%r709,%r710,%r711,%r712,%r713,%r714,%r715,%r716,%r717,%r718,%r719,%r720,%r721,%r722,%r723,%r724,%r725,%r726,%r727,%r728,%r729,%r730,%r731,%r732,%r733,%r734,%r735,%r736,%r737}, {%r702,%r703,%r704,%r705}, %rd58, %p2, 1, 1; 2026-02-21T10:12:32.2756884Z // end inline asm 2026-02-21T10:12:32.2757049Z // begin inline asm 2026-02-21T10:12:32.2757790Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r706,%r707,%r708,%r709,%r710,%r711,%r712,%r713,%r714,%r715,%r716,%r717,%r718,%r719,%r720,%r721,%r722,%r723,%r724,%r725,%r726,%r727,%r728,%r729,%r730,%r731,%r732,%r733,%r734,%r735,%r736,%r737}, {%r770,%r771,%r772,%r773}, %rd59, %p2, 1, 1; 2026-02-21T10:12:32.2758582Z // end inline asm 2026-02-21T10:12:32.2758729Z // begin inline asm 2026-02-21T10:12:32.2759469Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r706,%r707,%r708,%r709,%r710,%r711,%r712,%r713,%r714,%r715,%r716,%r717,%r718,%r719,%r720,%r721,%r722,%r723,%r724,%r725,%r726,%r727,%r728,%r729,%r730,%r731,%r732,%r733,%r734,%r735,%r736,%r737}, {%r838,%r839,%r840,%r841}, %rd60, %p2, 1, 1; 2026-02-21T10:12:32.2760255Z // end inline asm 2026-02-21T10:12:32.2760406Z // begin inline asm 2026-02-21T10:12:32.2761152Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r706,%r707,%r708,%r709,%r710,%r711,%r712,%r713,%r714,%r715,%r716,%r717,%r718,%r719,%r720,%r721,%r722,%r723,%r724,%r725,%r726,%r727,%r728,%r729,%r730,%r731,%r732,%r733,%r734,%r735,%r736,%r737}, {%r906,%r907,%r908,%r909}, %rd61, %p2, 1, 1; 2026-02-21T10:12:32.2762094Z // end inline asm 2026-02-21T10:12:32.2762259Z // begin inline asm 2026-02-21T10:12:32.2763047Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r978,%r979,%r980,%r981,%r982,%r983,%r984,%r985,%r986,%r987,%r988,%r989,%r990,%r991,%r992,%r993,%r994,%r995,%r996,%r997,%r998,%r999,%r1000,%r1001,%r1002,%r1003,%r1004,%r1005,%r1006,%r1007,%r1008,%r1009}, {%r702,%r703,%r704,%r705}, %rd62, %p2, 1, 1; 2026-02-21T10:12:32.2763877Z // end inline asm 2026-02-21T10:12:32.2764040Z // begin inline asm 2026-02-21T10:12:32.2764887Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r978,%r979,%r980,%r981,%r982,%r983,%r984,%r985,%r986,%r987,%r988,%r989,%r990,%r991,%r992,%r993,%r994,%r995,%r996,%r997,%r998,%r999,%r1000,%r1001,%r1002,%r1003,%r1004,%r1005,%r1006,%r1007,%r1008,%r1009}, {%r770,%r771,%r772,%r773}, %rd63, %p2, 1, 1; 2026-02-21T10:12:32.2765733Z // end inline asm 2026-02-21T10:12:32.2765895Z // begin inline asm 2026-02-21T10:12:32.2766888Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r978,%r979,%r980,%r981,%r982,%r983,%r984,%r985,%r986,%r987,%r988,%r989,%r990,%r991,%r992,%r993,%r994,%r995,%r996,%r997,%r998,%r999,%r1000,%r1001,%r1002,%r1003,%r1004,%r1005,%r1006,%r1007,%r1008,%r1009}, {%r838,%r839,%r840,%r841}, %rd64, %p2, 1, 1; 2026-02-21T10:12:32.2767745Z // end inline asm 2026-02-21T10:12:32.2767894Z // begin inline asm 2026-02-21T10:12:32.2768673Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r978,%r979,%r980,%r981,%r982,%r983,%r984,%r985,%r986,%r987,%r988,%r989,%r990,%r991,%r992,%r993,%r994,%r995,%r996,%r997,%r998,%r999,%r1000,%r1001,%r1002,%r1003,%r1004,%r1005,%r1006,%r1007,%r1008,%r1009}, {%r906,%r907,%r908,%r909}, %rd65, %p2, 1, 1; 2026-02-21T10:12:32.2769500Z // end inline asm 
2026-02-21T10:12:32.2769652Z // begin inline asm 2026-02-21T10:12:32.2770491Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1250,%r1251,%r1252,%r1253,%r1254,%r1255,%r1256,%r1257,%r1258,%r1259,%r1260,%r1261,%r1262,%r1263,%r1264,%r1265,%r1266,%r1267,%r1268,%r1269,%r1270,%r1271,%r1272,%r1273,%r1274,%r1275,%r1276,%r1277,%r1278,%r1279,%r1280,%r1281}, {%r702,%r703,%r704,%r705}, %rd66, %p2, 1, 1; 2026-02-21T10:12:32.2771384Z // end inline asm 2026-02-21T10:12:32.2771532Z // begin inline asm 2026-02-21T10:12:32.2772375Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1250,%r1251,%r1252,%r1253,%r1254,%r1255,%r1256,%r1257,%r1258,%r1259,%r1260,%r1261,%r1262,%r1263,%r1264,%r1265,%r1266,%r1267,%r1268,%r1269,%r1270,%r1271,%r1272,%r1273,%r1274,%r1275,%r1276,%r1277,%r1278,%r1279,%r1280,%r1281}, {%r770,%r771,%r772,%r773}, %rd67, %p2, 1, 1; 2026-02-21T10:12:32.2773262Z // end inline asm 2026-02-21T10:12:32.2773416Z // begin inline asm 2026-02-21T10:12:32.2774256Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1250,%r1251,%r1252,%r1253,%r1254,%r1255,%r1256,%r1257,%r1258,%r1259,%r1260,%r1261,%r1262,%r1263,%r1264,%r1265,%r1266,%r1267,%r1268,%r1269,%r1270,%r1271,%r1272,%r1273,%r1274,%r1275,%r1276,%r1277,%r1278,%r1279,%r1280,%r1281}, {%r838,%r839,%r840,%r841}, %rd68, %p2, 1, 1; 2026-02-21T10:12:32.2775150Z // end inline asm 2026-02-21T10:12:32.2775303Z // begin inline asm 2026-02-21T10:12:32.2776136Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1250,%r1251,%r1252,%r1253,%r1254,%r1255,%r1256,%r1257,%r1258,%r1259,%r1260,%r1261,%r1262,%r1263,%r1264,%r1265,%r1266,%r1267,%r1268,%r1269,%r1270,%r1271,%r1272,%r1273,%r1274,%r1275,%r1276,%r1277,%r1278,%r1279,%r1280,%r1281}, {%r906,%r907,%r908,%r909}, %rd69, %p2, 1, 1; 2026-02-21T10:12:32.2777183Z // end inline asm 2026-02-21T10:12:32.2777360Z wgmma.commit_group.sync.aligned; 2026-02-21T10:12:32.2777559Z mov.b32 %r4794, 0; 2026-02-21T10:12:32.2777721Z mov.b32 %r1583, %r4794; 2026-02-21T10:12:32.2777889Z mov.b32 
%r1584, %r4794; 2026-02-21T10:12:32.2778059Z mov.b32 %r1582, %r1713; 2026-02-21T10:12:32.2778229Z // begin inline asm 2026-02-21T10:12:32.2780332Z // wait for regs: %r434,%r435,%r436,%r437,%r438,%r439,%r440,%r441,%r442,%r443,%r444,%r445,%r446,%r447,%r448,%r449,%r450,%r451,%r452,%r453,%r454,%r455,%r456,%r457,%r458,%r459,%r460,%r461,%r462,%r463,%r464,%r465,%r706,%r707,%r708,%r709,%r710,%r711,%r712,%r713,%r714,%r715,%r716,%r717,%r718,%r719,%r720,%r721,%r722,%r723,%r724,%r725,%r726,%r727,%r728,%r729,%r730,%r731,%r732,%r733,%r734,%r735,%r736,%r737,%r978,%r979,%r980,%r981,%r982,%r983,%r984,%r985,%r986,%r987,%r988,%r989,%r990,%r991,%r992,%r993,%r994,%r995,%r996,%r997,%r998,%r999,%r1000,%r1001,%r1002,%r1003,%r1004,%r1005,%r1006,%r1007,%r1008,%r1009,%r1250,%r1251,%r1252,%r1253,%r1254,%r1255,%r1256,%r1257,%r1258,%r1259,%r1260,%r1261,%r1262,%r1263,%r1264,%r1265,%r1266,%r1267,%r1268,%r1269,%r1270,%r1271,%r1272,%r1273,%r1274,%r1275,%r1276,%r1277,%r1278,%r1279,%r1280,%r1281,%r1582,%r1583,%r1584 2026-02-21T10:12:32.2782388Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:12:32.2782589Z // end inline asm 2026-02-21T10:12:32.2782735Z $L__tmp2: 2026-02-21T10:12:32.2783045Z .loc 1 42 76 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:42:76 2026-02-21T10:12:32.2783437Z add.s64 %rd71, %rd133, -128; 2026-02-21T10:12:32.2783634Z // begin inline asm 2026-02-21T10:12:32.2783794Z mov.u64 %rd70, 0x0; 2026-02-21T10:12:32.2784088Z createpolicy.fractional.L2::evict_last.b64 %rd70, 1.0; 2026-02-21T10:12:32.2784352Z // end inline asm 2026-02-21T10:12:32.2784500Z // begin inline asm 2026-02-21T10:12:32.2784659Z mov.u32 %r1716, 0x0; 2026-02-21T10:12:32.2784815Z mov.u32 %r1717, 0x0; 2026-02-21T10:12:32.2785098Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r1716, %r1717 }, [ %rd71 + 0 ], %rd70; 2026-02-21T10:12:32.2785429Z // end inline asm 2026-02-21T10:12:32.2785743Z .loc 1 46 28 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:46:28 2026-02-21T10:12:32.2786108Z bar.sync 0; 
2026-02-21T10:12:32.2786292Z st.shared.v2.b32 [%r8], {%r1716, %r1717}; 2026-02-21T10:12:32.2786771Z bar.sync 0; 2026-02-21T10:12:32.2786947Z ld.shared.b16 %rs57, [%r10]; 2026-02-21T10:12:32.2787155Z ld.shared.b16 %rs58, [%r10+512]; 2026-02-21T10:12:32.2787357Z ld.shared.b16 %rs59, [%r10+32]; 2026-02-21T10:12:32.2787565Z ld.shared.b16 %rs60, [%r10+544]; 2026-02-21T10:12:32.2787770Z ld.shared.b16 %rs61, [%r11]; 2026-02-21T10:12:32.2787959Z ld.shared.b16 %rs62, [%r11+512]; 2026-02-21T10:12:32.2788166Z ld.shared.b16 %rs63, [%r11+32]; 2026-02-21T10:12:32.2788441Z ld.shared.b16 %rs64, [%r11+544]; 2026-02-21T10:12:32.2788653Z ld.shared.b16 %rs65, [%r12]; 2026-02-21T10:12:32.2788837Z ld.shared.b16 %rs66, [%r12+512]; 2026-02-21T10:12:32.2789034Z ld.shared.b16 %rs67, [%r12+32]; 2026-02-21T10:12:32.2789222Z ld.shared.b16 %rs68, [%r12+544]; 2026-02-21T10:12:32.2789419Z ld.shared.b16 %rs69, [%r13]; 2026-02-21T10:12:32.2789601Z ld.shared.b16 %rs70, [%r13+512]; 2026-02-21T10:12:32.2789798Z ld.shared.b16 %rs71, [%r13+32]; 2026-02-21T10:12:32.2789993Z ld.shared.b16 %rs72, [%r13+544]; 2026-02-21T10:12:32.2790188Z cvt.f32.bf16 %r2055, %rs57; 2026-02-21T10:12:32.2790391Z cvt.f32.bf16 %r2056, %rs58; 2026-02-21T10:12:32.2790566Z cvt.f32.bf16 %r2057, %rs61; 2026-02-21T10:12:32.2790742Z cvt.f32.bf16 %r2058, %rs62; 2026-02-21T10:12:32.2790914Z cvt.f32.bf16 %r2123, %rs65; 2026-02-21T10:12:32.2791095Z cvt.f32.bf16 %r2124, %rs66; 2026-02-21T10:12:32.2791269Z cvt.f32.bf16 %r2125, %rs69; 2026-02-21T10:12:32.2791446Z cvt.f32.bf16 %r2126, %rs70; 2026-02-21T10:12:32.2791635Z cvt.f32.bf16 %r2191, %rs59; 2026-02-21T10:12:32.2791811Z cvt.f32.bf16 %r2192, %rs60; 2026-02-21T10:12:32.2791988Z cvt.f32.bf16 %r2193, %rs63; 2026-02-21T10:12:32.2792166Z cvt.f32.bf16 %r2194, %rs64; 2026-02-21T10:12:32.2792342Z cvt.f32.bf16 %r2259, %rs67; 2026-02-21T10:12:32.2792513Z cvt.f32.bf16 %r2260, %rs68; 2026-02-21T10:12:32.2792691Z cvt.f32.bf16 %r2261, %rs71; 2026-02-21T10:12:32.2792867Z cvt.f32.bf16 %r2262, %rs72; 
2026-02-21T10:12:32.2793202Z .loc 1 48 83 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:48:83 2026-02-21T10:12:32.2793733Z add.s64 %rd74, %rd132, 20480; 2026-02-21T10:12:32.2793926Z // begin inline asm 2026-02-21T10:12:32.2794096Z mov.u64 %rd73, 0x0; 2026-02-21T10:12:32.2794314Z createpolicy.fractional.L2::evict_last.b64 %rd73, 1.0; 2026-02-21T10:12:32.2794578Z // end inline asm 2026-02-21T10:12:32.2794728Z // begin inline asm 2026-02-21T10:12:32.2794898Z mov.u32 %r1718, 0x0; 2026-02-21T10:12:32.2795158Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r1718 }, [ %rd74 + 0 ], %rd73; 2026-02-21T10:12:32.2795463Z // end inline asm 2026-02-21T10:12:32.2795773Z .loc 1 56 24 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:56:24 2026-02-21T10:12:32.2796131Z bar.sync 0; 2026-02-21T10:12:32.2796292Z st.shared.b8 [%r15], %r1718; 2026-02-21T10:12:32.2796730Z prmt.b32 %r4853, %r1718, 0, 0x7771U; 2026-02-21T10:12:32.2796971Z st.shared.b8 [%r16+1024], %r4853; 2026-02-21T10:12:32.2797176Z prmt.b32 %r4854, %r1718, 0, 0x7772U; 2026-02-21T10:12:32.2797400Z st.shared.b8 [%r17+2048], %r4854; 2026-02-21T10:12:32.2797589Z prmt.b32 %r4855, %r1718, 0, 0x7773U; 2026-02-21T10:12:32.2797792Z st.shared.b8 [%r18+3072], %r4855; 2026-02-21T10:12:32.2797982Z bar.sync 0; 2026-02-21T10:12:32.2798131Z ld.shared.b32 %r4856, [%r19]; 2026-02-21T10:12:32.2798429Z prmt.b32 %r4857, %r4856, 0, 0x7770U; 2026-02-21T10:12:32.2798637Z cvt.u16.u32 %rs73, %r4857; 2026-02-21T10:12:32.2798828Z prmt.b32 %r4858, %r4856, 0, 0x7771U; 2026-02-21T10:12:32.2799022Z cvt.u16.u32 %rs74, %r4858; 2026-02-21T10:12:32.2799205Z prmt.b32 %r4859, %r4856, 0, 0x7772U; 2026-02-21T10:12:32.2799396Z cvt.u16.u32 %rs75, %r4859; 2026-02-21T10:12:32.2799595Z prmt.b32 %r4860, %r4856, 0, 0x7773U; 2026-02-21T10:12:32.2799787Z cvt.u16.u32 %rs76, %r4860; 2026-02-21T10:12:32.2799974Z ld.shared.b32 %r4861, [%r19+256]; 2026-02-21T10:12:32.2800177Z prmt.b32 %r4862, %r4861, 0, 0x7770U; 2026-02-21T10:12:32.2800380Z cvt.u16.u32 
%rs77, %r4862; 2026-02-21T10:12:32.2800564Z prmt.b32 %r4863, %r4861, 0, 0x7771U; 2026-02-21T10:12:32.2800761Z cvt.u16.u32 %rs78, %r4863; 2026-02-21T10:12:32.2800938Z prmt.b32 %r4864, %r4861, 0, 0x7772U; 2026-02-21T10:12:32.2801125Z cvt.u16.u32 %rs79, %r4864; 2026-02-21T10:12:32.2801310Z prmt.b32 %r4865, %r4861, 0, 0x7773U; 2026-02-21T10:12:32.2801502Z cvt.u16.u32 %rs80, %r4865; 2026-02-21T10:12:32.2801840Z .loc 1 51 24 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:51:24 2026-02-21T10:12:32.2802219Z shl.b16 %rs81, %rs73, 4; 2026-02-21T10:12:32.2802391Z shl.b16 %rs82, %rs74, 4; 2026-02-21T10:12:32.2802564Z shl.b16 %rs83, %rs75, 4; 2026-02-21T10:12:32.2802732Z shl.b16 %rs84, %rs76, 4; 2026-02-21T10:12:32.2802903Z shl.b16 %rs85, %rs77, 4; 2026-02-21T10:12:32.2803067Z shl.b16 %rs86, %rs78, 4; 2026-02-21T10:12:32.2803251Z shl.b16 %rs87, %rs79, 4; 2026-02-21T10:12:32.2803417Z shl.b16 %rs88, %rs80, 4; 2026-02-21T10:12:32.2803735Z .loc 1 66 54 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:66:54 2026-02-21T10:12:32.2804108Z selp.b16 %rs89, %rs81, %rs73, %p54; 2026-02-21T10:12:32.2804306Z cvt.s16.s8 %rs90, %rs89; 2026-02-21T10:12:32.2804477Z shr.s16 %rs91, %rs90, 4; 2026-02-21T10:12:32.2804652Z selp.b16 %rs92, %rs82, %rs74, %p54; 2026-02-21T10:12:32.2804861Z cvt.s16.s8 %rs93, %rs92; 2026-02-21T10:12:32.2805036Z shr.s16 %rs94, %rs93, 4; 2026-02-21T10:12:32.2805221Z selp.b16 %rs95, %rs83, %rs75, %p54; 2026-02-21T10:12:32.2805417Z cvt.s16.s8 %rs96, %rs95; 2026-02-21T10:12:32.2805591Z shr.s16 %rs97, %rs96, 4; 2026-02-21T10:12:32.2805767Z selp.b16 %rs98, %rs84, %rs76, %p54; 2026-02-21T10:12:32.2805969Z cvt.s16.s8 %rs99, %rs98; 2026-02-21T10:12:32.2806146Z shr.s16 %rs100, %rs99, 4; 2026-02-21T10:12:32.2806332Z selp.b16 %rs101, %rs85, %rs77, %p54; 2026-02-21T10:12:32.2806677Z cvt.s16.s8 %rs102, %rs101; 2026-02-21T10:12:32.2806864Z shr.s16 %rs103, %rs102, 4; 2026-02-21T10:12:32.2807058Z selp.b16 %rs104, %rs86, %rs78, %p54; 2026-02-21T10:12:32.2807362Z 
cvt.s16.s8 %rs105, %rs104; 2026-02-21T10:12:32.2807614Z shr.s16 %rs106, %rs105, 4; 2026-02-21T10:12:32.2807809Z selp.b16 %rs107, %rs87, %rs79, %p54; 2026-02-21T10:12:32.2808023Z cvt.s16.s8 %rs108, %rs107; 2026-02-21T10:12:32.2808205Z shr.s16 %rs109, %rs108, 4; 2026-02-21T10:12:32.2808385Z selp.b16 %rs110, %rs88, %rs80, %p54; 2026-02-21T10:12:32.2808594Z cvt.s16.s8 %rs111, %rs110; 2026-02-21T10:12:32.2808770Z shr.s16 %rs112, %rs111, 4; 2026-02-21T10:12:32.2809108Z .loc 1 71 28 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:71:28 2026-02-21T10:12:32.2809494Z cvt.rn.f32.s16 %r4866, %rs91; 2026-02-21T10:12:32.2809696Z cvt.rn.f32.s16 %r4867, %rs94; 2026-02-21T10:12:32.2809878Z cvt.rn.f32.s16 %r4868, %rs97; 2026-02-21T10:12:32.2810070Z cvt.rn.f32.s16 %r4869, %rs100; 2026-02-21T10:12:32.2810356Z cvt.rn.f32.s16 %r4870, %rs103; 2026-02-21T10:12:32.2810550Z cvt.rn.f32.s16 %r4871, %rs106; 2026-02-21T10:12:32.2810744Z cvt.rn.f32.s16 %r4872, %rs109; 2026-02-21T10:12:32.2810943Z cvt.rn.f32.s16 %r4873, %rs112; 2026-02-21T10:12:32.2811131Z bar.sync 0; 2026-02-21T10:12:32.2811285Z st.shared.b32 [%r21], %r4866; 2026-02-21T10:12:32.2811480Z st.shared.b32 [%r22], %r4867; 2026-02-21T10:12:32.2811660Z st.shared.b32 [%r23], %r4868; 2026-02-21T10:12:32.2811923Z st.shared.b32 [%r24], %r4869; 2026-02-21T10:12:32.2812110Z st.shared.b32 [%r25], %r4870; 2026-02-21T10:12:32.2812296Z st.shared.b32 [%r26], %r4871; 2026-02-21T10:12:32.2812481Z st.shared.b32 [%r27], %r4872; 2026-02-21T10:12:32.2812659Z st.shared.b32 [%r28], %r4873; 2026-02-21T10:12:32.2812833Z $L__tmp3: 2026-02-21T10:12:32.2813195Z .loc 2 291 36 // standard.py:291:36 @[ cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:78:36 ] 2026-02-21T10:12:32.2813627Z // begin inline asm 2026-02-21T10:12:32.2813821Z fence.proxy.async.shared::cta; 2026-02-21T10:12:32.2814018Z // end inline asm 2026-02-21T10:12:32.2814166Z bar.sync 0; 2026-02-21T10:12:32.2814329Z wgmma.fence.sync.aligned; 2026-02-21T10:12:32.2814519Z // begin 
inline asm 2026-02-21T10:12:32.2815283Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r434,%r435,%r436,%r437,%r438,%r439,%r440,%r441,%r442,%r443,%r444,%r445,%r446,%r447,%r448,%r449,%r450,%r451,%r452,%r453,%r454,%r455,%r456,%r457,%r458,%r459,%r460,%r461,%r462,%r463,%r464,%r465}, {%r2055,%r2056,%r2057,%r2058}, %rd54, %p2, 1, 1; 2026-02-21T10:12:32.2816092Z // end inline asm 2026-02-21T10:12:32.2816241Z // begin inline asm 2026-02-21T10:12:32.2817125Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r434,%r435,%r436,%r437,%r438,%r439,%r440,%r441,%r442,%r443,%r444,%r445,%r446,%r447,%r448,%r449,%r450,%r451,%r452,%r453,%r454,%r455,%r456,%r457,%r458,%r459,%r460,%r461,%r462,%r463,%r464,%r465}, {%r2123,%r2124,%r2125,%r2126}, %rd55, %p2, 1, 1; 2026-02-21T10:12:32.2817935Z // end inline asm 2026-02-21T10:12:32.2818099Z // begin inline asm 2026-02-21T10:12:32.2818853Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r434,%r435,%r436,%r437,%r438,%r439,%r440,%r441,%r442,%r443,%r444,%r445,%r446,%r447,%r448,%r449,%r450,%r451,%r452,%r453,%r454,%r455,%r456,%r457,%r458,%r459,%r460,%r461,%r462,%r463,%r464,%r465}, {%r2191,%r2192,%r2193,%r2194}, %rd56, %p2, 1, 1; 2026-02-21T10:12:32.2819651Z // end inline asm 2026-02-21T10:12:32.2819822Z // begin inline asm 2026-02-21T10:12:32.2820571Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r434,%r435,%r436,%r437,%r438,%r439,%r440,%r441,%r442,%r443,%r444,%r445,%r446,%r447,%r448,%r449,%r450,%r451,%r452,%r453,%r454,%r455,%r456,%r457,%r458,%r459,%r460,%r461,%r462,%r463,%r464,%r465}, {%r2259,%r2260,%r2261,%r2262}, %rd57, %p2, 1, 1; 2026-02-21T10:12:32.2821364Z // end inline asm 2026-02-21T10:12:32.2821529Z // begin inline asm 2026-02-21T10:12:32.2822272Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r706,%r707,%r708,%r709,%r710,%r711,%r712,%r713,%r714,%r715,%r716,%r717,%r718,%r719,%r720,%r721,%r722,%r723,%r724,%r725,%r726,%r727,%r728,%r729,%r730,%r731,%r732,%r733,%r734,%r735,%r736,%r737}, 
{%r2055,%r2056,%r2057,%r2058}, %rd58, %p2, 1, 1; 2026-02-21T10:12:32.2823250Z // end inline asm 2026-02-21T10:12:32.2823409Z // begin inline asm 2026-02-21T10:12:32.2824151Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r706,%r707,%r708,%r709,%r710,%r711,%r712,%r713,%r714,%r715,%r716,%r717,%r718,%r719,%r720,%r721,%r722,%r723,%r724,%r725,%r726,%r727,%r728,%r729,%r730,%r731,%r732,%r733,%r734,%r735,%r736,%r737}, {%r2123,%r2124,%r2125,%r2126}, %rd59, %p2, 1, 1; 2026-02-21T10:12:32.2824955Z // end inline asm 2026-02-21T10:12:32.2825103Z // begin inline asm 2026-02-21T10:12:32.2825921Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r706,%r707,%r708,%r709,%r710,%r711,%r712,%r713,%r714,%r715,%r716,%r717,%r718,%r719,%r720,%r721,%r722,%r723,%r724,%r725,%r726,%r727,%r728,%r729,%r730,%r731,%r732,%r733,%r734,%r735,%r736,%r737}, {%r2191,%r2192,%r2193,%r2194}, %rd60, %p2, 1, 1; 2026-02-21T10:12:32.2826865Z // end inline asm 2026-02-21T10:12:32.2827015Z // begin inline asm 2026-02-21T10:12:32.2827767Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r706,%r707,%r708,%r709,%r710,%r711,%r712,%r713,%r714,%r715,%r716,%r717,%r718,%r719,%r720,%r721,%r722,%r723,%r724,%r725,%r726,%r727,%r728,%r729,%r730,%r731,%r732,%r733,%r734,%r735,%r736,%r737}, {%r2259,%r2260,%r2261,%r2262}, %rd61, %p2, 1, 1; 2026-02-21T10:12:32.2828755Z // end inline asm 2026-02-21T10:12:32.2828915Z // begin inline asm 2026-02-21T10:12:32.2829711Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r978,%r979,%r980,%r981,%r982,%r983,%r984,%r985,%r986,%r987,%r988,%r989,%r990,%r991,%r992,%r993,%r994,%r995,%r996,%r997,%r998,%r999,%r1000,%r1001,%r1002,%r1003,%r1004,%r1005,%r1006,%r1007,%r1008,%r1009}, {%r2055,%r2056,%r2057,%r2058}, %rd62, %p2, 1, 1; 2026-02-21T10:12:32.2830563Z // end inline asm 2026-02-21T10:12:32.2830728Z // begin inline asm 2026-02-21T10:12:32.2831523Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r978,%r979,%r980,%r981,%r982,%r983,%r984,%r985,%r986,%r987,%r988,%r989,%r990,%r991,%r992,%r993,%r994,%r995,%r996,%r997,%r998,%r999,%r1000,%r1001,%r1002,%r1003,%r1004,%r1005,%r1006,%r1007,%r1008,%r1009}, {%r2123,%r2124,%r2125,%r2126}, %rd63, %p2, 1, 1; 2026-02-21T10:12:32.2832374Z // end inline asm 2026-02-21T10:12:32.2832530Z // begin inline asm 2026-02-21T10:12:32.2833315Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r978,%r979,%r980,%r981,%r982,%r983,%r984,%r985,%r986,%r987,%r988,%r989,%r990,%r991,%r992,%r993,%r994,%r995,%r996,%r997,%r998,%r999,%r1000,%r1001,%r1002,%r1003,%r1004,%r1005,%r1006,%r1007,%r1008,%r1009}, {%r2191,%r2192,%r2193,%r2194}, %rd64, %p2, 1, 1; 2026-02-21T10:12:32.2834153Z // end inline asm 2026-02-21T10:12:32.2834311Z // begin inline asm 2026-02-21T10:12:32.2835093Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r978,%r979,%r980,%r981,%r982,%r983,%r984,%r985,%r986,%r987,%r988,%r989,%r990,%r991,%r992,%r993,%r994,%r995,%r996,%r997,%r998,%r999,%r1000,%r1001,%r1002,%r1003,%r1004,%r1005,%r1006,%r1007,%r1008,%r1009}, {%r2259,%r2260,%r2261,%r2262}, %rd65, %p2, 1, 1; 2026-02-21T10:12:32.2835932Z // end inline asm 2026-02-21T10:12:32.2836096Z // begin inline asm 2026-02-21T10:12:32.2837155Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1250,%r1251,%r1252,%r1253,%r1254,%r1255,%r1256,%r1257,%r1258,%r1259,%r1260,%r1261,%r1262,%r1263,%r1264,%r1265,%r1266,%r1267,%r1268,%r1269,%r1270,%r1271,%r1272,%r1273,%r1274,%r1275,%r1276,%r1277,%r1278,%r1279,%r1280,%r1281}, {%r2055,%r2056,%r2057,%r2058}, %rd66, %p2, 1, 1; 2026-02-21T10:12:32.2838057Z // end inline asm 2026-02-21T10:12:32.2838206Z // begin inline asm 2026-02-21T10:12:32.2839052Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1250,%r1251,%r1252,%r1253,%r1254,%r1255,%r1256,%r1257,%r1258,%r1259,%r1260,%r1261,%r1262,%r1263,%r1264,%r1265,%r1266,%r1267,%r1268,%r1269,%r1270,%r1271,%r1272,%r1273,%r1274,%r1275,%r1276,%r1277,%r1278,%r1279,%r1280,%r1281}, 
{%r2123,%r2124,%r2125,%r2126}, %rd67, %p2, 1, 1; 2026-02-21T10:12:32.2839947Z // end inline asm 2026-02-21T10:12:32.2840096Z // begin inline asm 2026-02-21T10:12:32.2840952Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1250,%r1251,%r1252,%r1253,%r1254,%r1255,%r1256,%r1257,%r1258,%r1259,%r1260,%r1261,%r1262,%r1263,%r1264,%r1265,%r1266,%r1267,%r1268,%r1269,%r1270,%r1271,%r1272,%r1273,%r1274,%r1275,%r1276,%r1277,%r1278,%r1279,%r1280,%r1281}, {%r2191,%r2192,%r2193,%r2194}, %rd68, %p2, 1, 1; 2026-02-21T10:12:32.2842001Z // end inline asm 2026-02-21T10:12:32.2842155Z // begin inline asm 2026-02-21T10:12:32.2842995Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1250,%r1251,%r1252,%r1253,%r1254,%r1255,%r1256,%r1257,%r1258,%r1259,%r1260,%r1261,%r1262,%r1263,%r1264,%r1265,%r1266,%r1267,%r1268,%r1269,%r1270,%r1271,%r1272,%r1273,%r1274,%r1275,%r1276,%r1277,%r1278,%r1279,%r1280,%r1281}, {%r2259,%r2260,%r2261,%r2262}, %rd69, %p2, 1, 1; 2026-02-21T10:12:32.2843886Z // end inline asm 2026-02-21T10:12:32.2844131Z wgmma.commit_group.sync.aligned; 2026-02-21T10:12:32.2844348Z mov.b32 %r2936, %r4794; 2026-02-21T10:12:32.2844530Z mov.b32 %r2937, %r4794; 2026-02-21T10:12:32.2844706Z mov.b32 %r2935, %r1713; 2026-02-21T10:12:32.2844879Z // begin inline asm 2026-02-21T10:12:32.2847037Z // wait for regs: 
%r434,%r435,%r436,%r437,%r438,%r439,%r440,%r441,%r442,%r443,%r444,%r445,%r446,%r447,%r448,%r449,%r450,%r451,%r452,%r453,%r454,%r455,%r456,%r457,%r458,%r459,%r460,%r461,%r462,%r463,%r464,%r465,%r706,%r707,%r708,%r709,%r710,%r711,%r712,%r713,%r714,%r715,%r716,%r717,%r718,%r719,%r720,%r721,%r722,%r723,%r724,%r725,%r726,%r727,%r728,%r729,%r730,%r731,%r732,%r733,%r734,%r735,%r736,%r737,%r978,%r979,%r980,%r981,%r982,%r983,%r984,%r985,%r986,%r987,%r988,%r989,%r990,%r991,%r992,%r993,%r994,%r995,%r996,%r997,%r998,%r999,%r1000,%r1001,%r1002,%r1003,%r1004,%r1005,%r1006,%r1007,%r1008,%r1009,%r1250,%r1251,%r1252,%r1253,%r1254,%r1255,%r1256,%r1257,%r1258,%r1259,%r1260,%r1261,%r1262,%r1263,%r1264,%r1265,%r1266,%r1267,%r1268,%r1269,%r1270,%r1271,%r1272,%r1273,%r1274,%r1275,%r1276,%r1277,%r1278,%r1279,%r1280,%r1281,%r2935,%r2936,%r2937 2026-02-21T10:12:32.2849040Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:12:32.2849242Z // end inline asm 2026-02-21T10:12:32.2849408Z $L__tmp4: 2026-02-21T10:12:32.2849708Z .loc 1 42 76 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:42:76 2026-02-21T10:12:32.2850085Z add.s64 %rd93, %rd133, -64; 2026-02-21T10:12:32.2850272Z // begin inline asm 2026-02-21T10:12:32.2850436Z mov.u64 %rd92, 0x0; 2026-02-21T10:12:32.2850661Z createpolicy.fractional.L2::evict_last.b64 %rd92, 1.0; 2026-02-21T10:12:32.2850923Z // end inline asm 2026-02-21T10:12:32.2851089Z // begin inline asm 2026-02-21T10:12:32.2851247Z mov.u32 %r3069, 0x0; 2026-02-21T10:12:32.2851424Z mov.u32 %r3070, 0x0; 2026-02-21T10:12:32.2851714Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r3069, %r3070 }, [ %rd93 + 0 ], %rd92; 2026-02-21T10:12:32.2852052Z // end inline asm 2026-02-21T10:12:32.2852366Z .loc 1 46 28 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:46:28 2026-02-21T10:12:32.2852730Z bar.sync 0; 2026-02-21T10:12:32.2852903Z st.shared.v2.b32 [%r8], {%r3069, %r3070}; 2026-02-21T10:12:32.2853117Z bar.sync 0; 2026-02-21T10:12:32.2853289Z ld.shared.b16 %rs113, 
[%r10]; 2026-02-21T10:12:32.2858349Z ld.shared.b16 %rs114, [%r10+512]; 2026-02-21T10:12:32.2858622Z ld.shared.b16 %rs115, [%r10+32]; 2026-02-21T10:12:32.2858857Z ld.shared.b16 %rs116, [%r10+544]; 2026-02-21T10:12:32.2859078Z ld.shared.b16 %rs117, [%r11]; 2026-02-21T10:12:32.2859279Z ld.shared.b16 %rs118, [%r11+512]; 2026-02-21T10:12:32.2859483Z ld.shared.b16 %rs119, [%r11+32]; 2026-02-21T10:12:32.2859689Z ld.shared.b16 %rs120, [%r11+544]; 2026-02-21T10:12:32.2859890Z ld.shared.b16 %rs121, [%r12]; 2026-02-21T10:12:32.2860079Z ld.shared.b16 %rs122, [%r12+512]; 2026-02-21T10:12:32.2860283Z ld.shared.b16 %rs123, [%r12+32]; 2026-02-21T10:12:32.2860474Z ld.shared.b16 %rs124, [%r12+544]; 2026-02-21T10:12:32.2860674Z ld.shared.b16 %rs125, [%r13]; 2026-02-21T10:12:32.2860867Z ld.shared.b16 %rs126, [%r13+512]; 2026-02-21T10:12:32.2861063Z ld.shared.b16 %rs127, [%r13+32]; 2026-02-21T10:12:32.2861474Z ld.shared.b16 %rs128, [%r13+544]; 2026-02-21T10:12:32.2861679Z cvt.f32.bf16 %r3408, %rs113; 2026-02-21T10:12:32.2861872Z cvt.f32.bf16 %r3409, %rs114; 2026-02-21T10:12:32.2862056Z cvt.f32.bf16 %r3410, %rs117; 2026-02-21T10:12:32.2862246Z cvt.f32.bf16 %r3411, %rs118; 2026-02-21T10:12:32.2862422Z cvt.f32.bf16 %r3476, %rs121; 2026-02-21T10:12:32.2862597Z cvt.f32.bf16 %r3477, %rs122; 2026-02-21T10:12:32.2862769Z cvt.f32.bf16 %r3478, %rs125; 2026-02-21T10:12:32.2862947Z cvt.f32.bf16 %r3479, %rs126; 2026-02-21T10:12:32.2863116Z cvt.f32.bf16 %r3544, %rs115; 2026-02-21T10:12:32.2863294Z cvt.f32.bf16 %r3545, %rs116; 2026-02-21T10:12:32.2863464Z cvt.f32.bf16 %r3546, %rs119; 2026-02-21T10:12:32.2863640Z cvt.f32.bf16 %r3547, %rs120; 2026-02-21T10:12:32.2863928Z cvt.f32.bf16 %r3612, %rs123; 2026-02-21T10:12:32.2864108Z cvt.f32.bf16 %r3613, %rs124; 2026-02-21T10:12:32.2864289Z cvt.f32.bf16 %r3614, %rs127; 2026-02-21T10:12:32.2864456Z cvt.f32.bf16 %r3615, %rs128; 2026-02-21T10:12:32.2864817Z .loc 1 48 83 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:48:83 
2026-02-21T10:12:32.2865189Z add.s64 %rd96, %rd132, 40960; 2026-02-21T10:12:32.2865387Z // begin inline asm 2026-02-21T10:12:32.2865634Z mov.u64 %rd95, 0x0; 2026-02-21T10:12:32.2865868Z createpolicy.fractional.L2::evict_last.b64 %rd95, 1.0; 2026-02-21T10:12:32.2866128Z // end inline asm 2026-02-21T10:12:32.2866279Z // begin inline asm 2026-02-21T10:12:32.2866589Z mov.u32 %r3071, 0x0; 2026-02-21T10:12:32.2866873Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r3071 }, [ %rd96 + 0 ], %rd95; 2026-02-21T10:12:32.2867180Z // end inline asm 2026-02-21T10:12:32.2867497Z .loc 1 56 24 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:56:24 2026-02-21T10:12:32.2867872Z bar.sync 0; 2026-02-21T10:12:32.2868031Z st.shared.b8 [%r15], %r3071; 2026-02-21T10:12:32.2868234Z prmt.b32 %r4874, %r3071, 0, 0x7771U; 2026-02-21T10:12:32.2868536Z st.shared.b8 [%r16+1024], %r4874; 2026-02-21T10:12:32.2868747Z prmt.b32 %r4875, %r3071, 0, 0x7772U; 2026-02-21T10:12:32.2868953Z st.shared.b8 [%r17+2048], %r4875; 2026-02-21T10:12:32.2869150Z prmt.b32 %r4876, %r3071, 0, 0x7773U; 2026-02-21T10:12:32.2869349Z st.shared.b8 [%r18+3072], %r4876; 2026-02-21T10:12:32.2869530Z bar.sync 0; 2026-02-21T10:12:32.2869700Z ld.shared.b32 %r4877, [%r19]; 2026-02-21T10:12:32.2869898Z prmt.b32 %r4878, %r4877, 0, 0x7770U; 2026-02-21T10:12:32.2870100Z cvt.u16.u32 %rs129, %r4878; 2026-02-21T10:12:32.2870288Z prmt.b32 %r4879, %r4877, 0, 0x7771U; 2026-02-21T10:12:32.2870480Z cvt.u16.u32 %rs130, %r4879; 2026-02-21T10:12:32.2870663Z prmt.b32 %r4880, %r4877, 0, 0x7772U; 2026-02-21T10:12:32.2870853Z cvt.u16.u32 %rs131, %r4880; 2026-02-21T10:12:32.2871047Z prmt.b32 %r4881, %r4877, 0, 0x7773U; 2026-02-21T10:12:32.2871240Z cvt.u16.u32 %rs132, %r4881; 2026-02-21T10:12:32.2871424Z ld.shared.b32 %r4882, [%r19+256]; 2026-02-21T10:12:32.2871618Z prmt.b32 %r4883, %r4882, 0, 0x7770U; 2026-02-21T10:12:32.2871849Z cvt.u16.u32 %rs133, %r4883; 2026-02-21T10:12:32.2872035Z prmt.b32 %r4884, %r4882, 0, 0x7771U; 
2026-02-21T10:12:32.2872237Z cvt.u16.u32 %rs134, %r4884; 2026-02-21T10:12:32.2872425Z prmt.b32 %r4885, %r4882, 0, 0x7772U; 2026-02-21T10:12:32.2872614Z cvt.u16.u32 %rs135, %r4885; 2026-02-21T10:12:32.2872793Z prmt.b32 %r4886, %r4882, 0, 0x7773U; 2026-02-21T10:12:32.2872981Z cvt.u16.u32 %rs136, %r4886; 2026-02-21T10:12:32.2873313Z .loc 1 51 24 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:51:24 2026-02-21T10:12:32.2873676Z shl.b16 %rs137, %rs129, 4; 2026-02-21T10:12:32.2873871Z shl.b16 %rs138, %rs130, 4; 2026-02-21T10:12:32.2874045Z shl.b16 %rs139, %rs131, 4; 2026-02-21T10:12:32.2874215Z shl.b16 %rs140, %rs132, 4; 2026-02-21T10:12:32.2874395Z shl.b16 %rs141, %rs133, 4; 2026-02-21T10:12:32.2874563Z shl.b16 %rs142, %rs134, 4; 2026-02-21T10:12:32.2874736Z shl.b16 %rs143, %rs135, 4; 2026-02-21T10:12:32.2875077Z shl.b16 %rs144, %rs136, 4; 2026-02-21T10:12:32.2875400Z .loc 1 66 54 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:66:54 2026-02-21T10:12:32.2875764Z selp.b16 %rs145, %rs137, %rs129, %p54; 2026-02-21T10:12:32.2875975Z cvt.s16.s8 %rs146, %rs145; 2026-02-21T10:12:32.2876151Z shr.s16 %rs147, %rs146, 4; 2026-02-21T10:12:32.2876333Z selp.b16 %rs148, %rs138, %rs130, %p54; 2026-02-21T10:12:32.2876676Z cvt.s16.s8 %rs149, %rs148; 2026-02-21T10:12:32.2876863Z shr.s16 %rs150, %rs149, 4; 2026-02-21T10:12:32.2877044Z selp.b16 %rs151, %rs139, %rs131, %p54; 2026-02-21T10:12:32.2877242Z cvt.s16.s8 %rs152, %rs151; 2026-02-21T10:12:32.2877415Z shr.s16 %rs153, %rs152, 4; 2026-02-21T10:12:32.2877587Z selp.b16 %rs154, %rs140, %rs132, %p54; 2026-02-21T10:12:32.2877878Z cvt.s16.s8 %rs155, %rs154; 2026-02-21T10:12:32.2878053Z shr.s16 %rs156, %rs155, 4; 2026-02-21T10:12:32.2878236Z selp.b16 %rs157, %rs141, %rs133, %p54; 2026-02-21T10:12:32.2878452Z cvt.s16.s8 %rs158, %rs157; 2026-02-21T10:12:32.2878627Z shr.s16 %rs159, %rs158, 4; 2026-02-21T10:12:32.2878812Z selp.b16 %rs160, %rs142, %rs134, %p54; 2026-02-21T10:12:32.2879008Z cvt.s16.s8 %rs161, %rs160; 
2026-02-21T10:12:32.2879197Z shr.s16 %rs162, %rs161, 4; 2026-02-21T10:12:32.2879452Z selp.b16 %rs163, %rs143, %rs135, %p54; 2026-02-21T10:12:32.2879669Z cvt.s16.s8 %rs164, %rs163; 2026-02-21T10:12:32.2879839Z shr.s16 %rs165, %rs164, 4; 2026-02-21T10:12:32.2880030Z selp.b16 %rs166, %rs144, %rs136, %p54; 2026-02-21T10:12:32.2880231Z cvt.s16.s8 %rs167, %rs166; 2026-02-21T10:12:32.2880409Z shr.s16 %rs168, %rs167, 4; 2026-02-21T10:12:32.2880732Z .loc 1 71 28 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:71:28 2026-02-21T10:12:32.2881091Z cvt.rn.f32.s16 %r4887, %rs147; 2026-02-21T10:12:32.2881283Z cvt.rn.f32.s16 %r4888, %rs150; 2026-02-21T10:12:32.2881465Z cvt.rn.f32.s16 %r4889, %rs153; 2026-02-21T10:12:32.2881654Z cvt.rn.f32.s16 %r4890, %rs156; 2026-02-21T10:12:32.2881838Z cvt.rn.f32.s16 %r4891, %rs159; 2026-02-21T10:12:32.2882042Z cvt.rn.f32.s16 %r4892, %rs162; 2026-02-21T10:12:32.2882222Z cvt.rn.f32.s16 %r4893, %rs165; 2026-02-21T10:12:32.2882406Z cvt.rn.f32.s16 %r4894, %rs168; 2026-02-21T10:12:32.2882583Z bar.sync 0; 2026-02-21T10:12:32.2882739Z st.shared.b32 [%r21], %r4887; 2026-02-21T10:12:32.2882930Z st.shared.b32 [%r22], %r4888; 2026-02-21T10:12:32.2883107Z st.shared.b32 [%r23], %r4889; 2026-02-21T10:12:32.2883300Z st.shared.b32 [%r24], %r4890; 2026-02-21T10:12:32.2883481Z st.shared.b32 [%r25], %r4891; 2026-02-21T10:12:32.2883664Z st.shared.b32 [%r26], %r4892; 2026-02-21T10:12:32.2883837Z st.shared.b32 [%r27], %r4893; 2026-02-21T10:12:32.2884016Z st.shared.b32 [%r28], %r4894; 2026-02-21T10:12:32.2884190Z $L__tmp5: 2026-02-21T10:12:32.2884551Z .loc 2 291 36 // standard.py:291:36 @[ cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:78:36 ] 2026-02-21T10:12:32.2884985Z // begin inline asm 2026-02-21T10:12:32.2885165Z fence.proxy.async.shared::cta; 2026-02-21T10:12:32.2885352Z // end inline asm 2026-02-21T10:12:32.2885499Z bar.sync 0; 2026-02-21T10:12:32.2885659Z wgmma.fence.sync.aligned; 2026-02-21T10:12:32.2885842Z // begin inline asm 
2026-02-21T10:12:32.2886743Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r434,%r435,%r436,%r437,%r438,%r439,%r440,%r441,%r442,%r443,%r444,%r445,%r446,%r447,%r448,%r449,%r450,%r451,%r452,%r453,%r454,%r455,%r456,%r457,%r458,%r459,%r460,%r461,%r462,%r463,%r464,%r465}, {%r3408,%r3409,%r3410,%r3411}, %rd54, %p2, 1, 1; 2026-02-21T10:12:32.2887571Z // end inline asm 2026-02-21T10:12:32.2887725Z // begin inline asm 2026-02-21T10:12:32.2888478Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r434,%r435,%r436,%r437,%r438,%r439,%r440,%r441,%r442,%r443,%r444,%r445,%r446,%r447,%r448,%r449,%r450,%r451,%r452,%r453,%r454,%r455,%r456,%r457,%r458,%r459,%r460,%r461,%r462,%r463,%r464,%r465}, {%r3476,%r3477,%r3478,%r3479}, %rd55, %p2, 1, 1; 2026-02-21T10:12:32.2889433Z // end inline asm 2026-02-21T10:12:32.2889582Z // begin inline asm 2026-02-21T10:12:32.2890333Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r434,%r435,%r436,%r437,%r438,%r439,%r440,%r441,%r442,%r443,%r444,%r445,%r446,%r447,%r448,%r449,%r450,%r451,%r452,%r453,%r454,%r455,%r456,%r457,%r458,%r459,%r460,%r461,%r462,%r463,%r464,%r465}, {%r3544,%r3545,%r3546,%r3547}, %rd56, %p2, 1, 1; 2026-02-21T10:12:32.2891142Z // end inline asm 2026-02-21T10:12:32.2891299Z // begin inline asm 2026-02-21T10:12:32.2892048Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r434,%r435,%r436,%r437,%r438,%r439,%r440,%r441,%r442,%r443,%r444,%r445,%r446,%r447,%r448,%r449,%r450,%r451,%r452,%r453,%r454,%r455,%r456,%r457,%r458,%r459,%r460,%r461,%r462,%r463,%r464,%r465}, {%r3612,%r3613,%r3614,%r3615}, %rd57, %p2, 1, 1; 2026-02-21T10:12:32.2892915Z // end inline asm 2026-02-21T10:12:32.2893067Z // begin inline asm 2026-02-21T10:12:32.2893818Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r706,%r707,%r708,%r709,%r710,%r711,%r712,%r713,%r714,%r715,%r716,%r717,%r718,%r719,%r720,%r721,%r722,%r723,%r724,%r725,%r726,%r727,%r728,%r729,%r730,%r731,%r732,%r733,%r734,%r735,%r736,%r737}, {%r3408,%r3409,%r3410,%r3411}, %rd58, 
%p2, 1, 1; 2026-02-21T10:12:32.2894621Z // end inline asm 2026-02-21T10:12:32.2894852Z // begin inline asm 2026-02-21T10:12:32.2895509Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r706,%r707,%r708,%r709,%r710,%r711,%r712,%r713,%r714,%r715,%r716,%r717,%r718,%r719,%r720,%r721,%r722,%r723,%r724,%r725,%r726,%r727,%r728,%r729,%r730,%r731,%r732,%r733,%r734,%r735,%r736,%r737}, {%r3476,%r3477,%r3478,%r3479}, %rd59, %p2, 1, 1; 2026-02-21T10:12:32.2895571Z // end inline asm 2026-02-21T10:12:32.2895638Z // begin inline asm 2026-02-21T10:12:32.2896287Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r706,%r707,%r708,%r709,%r710,%r711,%r712,%r713,%r714,%r715,%r716,%r717,%r718,%r719,%r720,%r721,%r722,%r723,%r724,%r725,%r726,%r727,%r728,%r729,%r730,%r731,%r732,%r733,%r734,%r735,%r736,%r737}, {%r3544,%r3545,%r3546,%r3547}, %rd60, %p2, 1, 1; 2026-02-21T10:12:32.2896348Z // end inline asm 2026-02-21T10:12:32.2896407Z // begin inline asm 2026-02-21T10:12:32.2897198Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r706,%r707,%r708,%r709,%r710,%r711,%r712,%r713,%r714,%r715,%r716,%r717,%r718,%r719,%r720,%r721,%r722,%r723,%r724,%r725,%r726,%r727,%r728,%r729,%r730,%r731,%r732,%r733,%r734,%r735,%r736,%r737}, {%r3612,%r3613,%r3614,%r3615}, %rd61, %p2, 1, 1; 2026-02-21T10:12:32.2897272Z // end inline asm 2026-02-21T10:12:32.2897334Z // begin inline asm 2026-02-21T10:12:32.2898025Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r978,%r979,%r980,%r981,%r982,%r983,%r984,%r985,%r986,%r987,%r988,%r989,%r990,%r991,%r992,%r993,%r994,%r995,%r996,%r997,%r998,%r999,%r1000,%r1001,%r1002,%r1003,%r1004,%r1005,%r1006,%r1007,%r1008,%r1009}, {%r3408,%r3409,%r3410,%r3411}, %rd62, %p2, 1, 1; 2026-02-21T10:12:32.2898083Z // end inline asm 2026-02-21T10:12:32.2898141Z // begin inline asm 2026-02-21T10:12:32.2898837Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r978,%r979,%r980,%r981,%r982,%r983,%r984,%r985,%r986,%r987,%r988,%r989,%r990,%r991,%r992,%r993,%r994,%r995,%r996,%r997,%r998,%r999,%r1000,%r1001,%r1002,%r1003,%r1004,%r1005,%r1006,%r1007,%r1008,%r1009}, {%r3476,%r3477,%r3478,%r3479}, %rd63, %p2, 1, 1; 2026-02-21T10:12:32.2898897Z // end inline asm 2026-02-21T10:12:32.2898956Z // begin inline asm 2026-02-21T10:12:32.2899645Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r978,%r979,%r980,%r981,%r982,%r983,%r984,%r985,%r986,%r987,%r988,%r989,%r990,%r991,%r992,%r993,%r994,%r995,%r996,%r997,%r998,%r999,%r1000,%r1001,%r1002,%r1003,%r1004,%r1005,%r1006,%r1007,%r1008,%r1009}, {%r3544,%r3545,%r3546,%r3547}, %rd64, %p2, 1, 1; 2026-02-21T10:12:32.2899701Z // end inline asm 2026-02-21T10:12:32.2899772Z // begin inline asm 2026-02-21T10:12:32.2900456Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r978,%r979,%r980,%r981,%r982,%r983,%r984,%r985,%r986,%r987,%r988,%r989,%r990,%r991,%r992,%r993,%r994,%r995,%r996,%r997,%r998,%r999,%r1000,%r1001,%r1002,%r1003,%r1004,%r1005,%r1006,%r1007,%r1008,%r1009}, {%r3612,%r3613,%r3614,%r3615}, %rd65, %p2, 1, 1; 2026-02-21T10:12:32.2900665Z // end inline asm 2026-02-21T10:12:32.2900731Z // begin inline asm 2026-02-21T10:12:32.2901482Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1250,%r1251,%r1252,%r1253,%r1254,%r1255,%r1256,%r1257,%r1258,%r1259,%r1260,%r1261,%r1262,%r1263,%r1264,%r1265,%r1266,%r1267,%r1268,%r1269,%r1270,%r1271,%r1272,%r1273,%r1274,%r1275,%r1276,%r1277,%r1278,%r1279,%r1280,%r1281}, {%r3408,%r3409,%r3410,%r3411}, %rd66, %p2, 1, 1; 2026-02-21T10:12:32.2901541Z // end inline asm 2026-02-21T10:12:32.2901613Z // begin inline asm 2026-02-21T10:12:32.2902444Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1250,%r1251,%r1252,%r1253,%r1254,%r1255,%r1256,%r1257,%r1258,%r1259,%r1260,%r1261,%r1262,%r1263,%r1264,%r1265,%r1266,%r1267,%r1268,%r1269,%r1270,%r1271,%r1272,%r1273,%r1274,%r1275,%r1276,%r1277,%r1278,%r1279,%r1280,%r1281}, 
{%r3476,%r3477,%r3478,%r3479}, %rd67, %p2, 1, 1; 2026-02-21T10:12:32.2902516Z // end inline asm 2026-02-21T10:12:32.2902579Z // begin inline asm 2026-02-21T10:12:32.2903392Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1250,%r1251,%r1252,%r1253,%r1254,%r1255,%r1256,%r1257,%r1258,%r1259,%r1260,%r1261,%r1262,%r1263,%r1264,%r1265,%r1266,%r1267,%r1268,%r1269,%r1270,%r1271,%r1272,%r1273,%r1274,%r1275,%r1276,%r1277,%r1278,%r1279,%r1280,%r1281}, {%r3544,%r3545,%r3546,%r3547}, %rd68, %p2, 1, 1; 2026-02-21T10:12:32.2903452Z // end inline asm 2026-02-21T10:12:32.2903516Z // begin inline asm 2026-02-21T10:12:32.2904258Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1250,%r1251,%r1252,%r1253,%r1254,%r1255,%r1256,%r1257,%r1258,%r1259,%r1260,%r1261,%r1262,%r1263,%r1264,%r1265,%r1266,%r1267,%r1268,%r1269,%r1270,%r1271,%r1272,%r1273,%r1274,%r1275,%r1276,%r1277,%r1278,%r1279,%r1280,%r1281}, {%r3612,%r3613,%r3614,%r3615}, %rd69, %p2, 1, 1; 2026-02-21T10:12:32.2904314Z // end inline asm 2026-02-21T10:12:32.2904419Z wgmma.commit_group.sync.aligned; 2026-02-21T10:12:32.2904483Z mov.b32 %r4289, %r4794; 2026-02-21T10:12:32.2904544Z mov.b32 %r4290, %r4794; 2026-02-21T10:12:32.2904604Z mov.b32 %r4288, %r1713; 2026-02-21T10:12:32.2904668Z // begin inline asm 2026-02-21T10:12:32.2906614Z // wait for regs: 
%r434,%r435,%r436,%r437,%r438,%r439,%r440,%r441,%r442,%r443,%r444,%r445,%r446,%r447,%r448,%r449,%r450,%r451,%r452,%r453,%r454,%r455,%r456,%r457,%r458,%r459,%r460,%r461,%r462,%r463,%r464,%r465,%r706,%r707,%r708,%r709,%r710,%r711,%r712,%r713,%r714,%r715,%r716,%r717,%r718,%r719,%r720,%r721,%r722,%r723,%r724,%r725,%r726,%r727,%r728,%r729,%r730,%r731,%r732,%r733,%r734,%r735,%r736,%r737,%r978,%r979,%r980,%r981,%r982,%r983,%r984,%r985,%r986,%r987,%r988,%r989,%r990,%r991,%r992,%r993,%r994,%r995,%r996,%r997,%r998,%r999,%r1000,%r1001,%r1002,%r1003,%r1004,%r1005,%r1006,%r1007,%r1008,%r1009,%r1250,%r1251,%r1252,%r1253,%r1254,%r1255,%r1256,%r1257,%r1258,%r1259,%r1260,%r1261,%r1262,%r1263,%r1264,%r1265,%r1266,%r1267,%r1268,%r1269,%r1270,%r1271,%r1272,%r1273,%r1274,%r1275,%r1276,%r1277,%r1278,%r1279,%r1280,%r1281,%r4288,%r4289,%r4290 2026-02-21T10:12:32.2906713Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:12:32.2906782Z // end inline asm 2026-02-21T10:12:32.2906838Z $L__tmp6: 2026-02-21T10:12:32.2907069Z .loc 1 42 76 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:42:76 2026-02-21T10:12:32.2907129Z // begin inline asm 2026-02-21T10:12:32.2907190Z mov.u64 %rd114, 0x0; 2026-02-21T10:12:32.2907323Z createpolicy.fractional.L2::evict_last.b64 %rd114, 1.0; 2026-02-21T10:12:32.2907381Z // end inline asm 2026-02-21T10:12:32.2907441Z // begin inline asm 2026-02-21T10:12:32.2907501Z mov.u32 %r4422, 0x0; 2026-02-21T10:12:32.2907563Z mov.u32 %r4423, 0x0; 2026-02-21T10:12:32.2907756Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r4422, %r4423 }, [ %rd133 + 0 ], %rd114; 2026-02-21T10:12:32.2907815Z // end inline asm 2026-02-21T10:12:32.2908031Z .loc 1 46 28 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:46:28 2026-02-21T10:12:32.2908261Z bar.sync 0; 2026-02-21T10:12:32.2908395Z st.shared.v2.b32 [%r8], {%r4422, %r4423}; 2026-02-21T10:12:32.2908459Z bar.sync 0; 2026-02-21T10:12:32.2908540Z ld.shared.b16 %rs169, [%r10]; 2026-02-21T10:12:32.2908613Z ld.shared.b16 
%rs170, [%r10+512]; 2026-02-21T10:12:32.2908681Z ld.shared.b16 %rs171, [%r10+32]; 2026-02-21T10:12:32.2908750Z ld.shared.b16 %rs172, [%r10+544]; 2026-02-21T10:12:32.2908816Z ld.shared.b16 %rs173, [%r11]; 2026-02-21T10:12:32.2908880Z ld.shared.b16 %rs174, [%r11+512]; 2026-02-21T10:12:32.2908950Z ld.shared.b16 %rs175, [%r11+32]; 2026-02-21T10:12:32.2909014Z ld.shared.b16 %rs176, [%r11+544]; 2026-02-21T10:12:32.2909076Z ld.shared.b16 %rs177, [%r12]; 2026-02-21T10:12:32.2909226Z ld.shared.b16 %rs178, [%r12+512]; 2026-02-21T10:12:32.2909302Z ld.shared.b16 %rs179, [%r12+32]; 2026-02-21T10:12:32.2909367Z ld.shared.b16 %rs180, [%r12+544]; 2026-02-21T10:12:32.2909434Z ld.shared.b16 %rs181, [%r13]; 2026-02-21T10:12:32.2909505Z ld.shared.b16 %rs182, [%r13+512]; 2026-02-21T10:12:32.2909569Z ld.shared.b16 %rs183, [%r13+32]; 2026-02-21T10:12:32.2909632Z ld.shared.b16 %rs184, [%r13+544]; 2026-02-21T10:12:32.2909699Z cvt.f32.bf16 %r4553, %rs169; 2026-02-21T10:12:32.2909842Z cvt.f32.bf16 %r4554, %rs170; 2026-02-21T10:12:32.2909909Z cvt.f32.bf16 %r4555, %rs173; 2026-02-21T10:12:32.2909971Z cvt.f32.bf16 %r4556, %rs174; 2026-02-21T10:12:32.2910035Z cvt.f32.bf16 %r4621, %rs177; 2026-02-21T10:12:32.2910095Z cvt.f32.bf16 %r4622, %rs178; 2026-02-21T10:12:32.2910155Z cvt.f32.bf16 %r4623, %rs181; 2026-02-21T10:12:32.2910215Z cvt.f32.bf16 %r4624, %rs182; 2026-02-21T10:12:32.2910280Z cvt.f32.bf16 %r4689, %rs171; 2026-02-21T10:12:32.2910340Z cvt.f32.bf16 %r4690, %rs172; 2026-02-21T10:12:32.2910402Z cvt.f32.bf16 %r4691, %rs175; 2026-02-21T10:12:32.2910466Z cvt.f32.bf16 %r4692, %rs176; 2026-02-21T10:12:32.2910530Z cvt.f32.bf16 %r4757, %rs179; 2026-02-21T10:12:32.2910604Z cvt.f32.bf16 %r4758, %rs180; 2026-02-21T10:12:32.2910669Z cvt.f32.bf16 %r4759, %rs183; 2026-02-21T10:12:32.2910731Z cvt.f32.bf16 %r4760, %rs184; 2026-02-21T10:12:32.2910943Z .loc 1 48 83 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:48:83 2026-02-21T10:12:32.2911012Z add.s64 %rd118, %rd132, 61440; 
2026-02-21T10:12:32.2911077Z // begin inline asm 2026-02-21T10:12:32.2911138Z mov.u64 %rd117, 0x0; 2026-02-21T10:12:32.2911261Z createpolicy.fractional.L2::evict_last.b64 %rd117, 1.0; 2026-02-21T10:12:32.2911325Z // end inline asm 2026-02-21T10:12:32.2911383Z // begin inline asm 2026-02-21T10:12:32.2911441Z mov.u32 %r4424, 0x0; 2026-02-21T10:12:32.2911610Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r4424 }, [ %rd118 + 0 ], %rd117; 2026-02-21T10:12:32.2911670Z // end inline asm 2026-02-21T10:12:32.2911875Z .loc 1 56 24 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:56:24 2026-02-21T10:12:32.2911932Z bar.sync 0; 2026-02-21T10:12:32.2912006Z st.shared.b8 [%r15], %r4424; 2026-02-21T10:12:32.2912077Z prmt.b32 %r4895, %r4424, 0, 0x7771U; 2026-02-21T10:12:32.2912143Z st.shared.b8 [%r16+1024], %r4895; 2026-02-21T10:12:32.2912222Z prmt.b32 %r4896, %r4424, 0, 0x7772U; 2026-02-21T10:12:32.2912289Z st.shared.b8 [%r17+2048], %r4896; 2026-02-21T10:12:32.2912353Z prmt.b32 %r4897, %r4424, 0, 0x7773U; 2026-02-21T10:12:32.2912417Z st.shared.b8 [%r18+3072], %r4897; 2026-02-21T10:12:32.2912477Z bar.sync 0; 2026-02-21T10:12:32.2912544Z ld.shared.b32 %r4898, [%r19]; 2026-02-21T10:12:32.2912620Z prmt.b32 %r4899, %r4898, 0, 0x7770U; 2026-02-21T10:12:32.2912687Z cvt.u16.u32 %rs185, %r4899; 2026-02-21T10:12:32.2912750Z prmt.b32 %r4900, %r4898, 0, 0x7771U; 2026-02-21T10:12:32.2912820Z cvt.u16.u32 %rs186, %r4900; 2026-02-21T10:12:32.2912889Z prmt.b32 %r4901, %r4898, 0, 0x7772U; 2026-02-21T10:12:32.2912955Z cvt.u16.u32 %rs187, %r4901; 2026-02-21T10:12:32.2913020Z prmt.b32 %r4902, %r4898, 0, 0x7773U; 2026-02-21T10:12:32.2913205Z cvt.u16.u32 %rs188, %r4902; 2026-02-21T10:12:32.2913271Z ld.shared.b32 %r4903, [%r19+256]; 2026-02-21T10:12:32.2913336Z prmt.b32 %r4904, %r4903, 0, 0x7770U; 2026-02-21T10:12:32.2913406Z cvt.u16.u32 %rs189, %r4904; 2026-02-21T10:12:32.2913471Z prmt.b32 %r4905, %r4903, 0, 0x7771U; 2026-02-21T10:12:32.2913534Z cvt.u16.u32 %rs190, %r4905; 
2026-02-21T10:12:32.2913603Z prmt.b32 %r4906, %r4903, 0, 0x7772U; 2026-02-21T10:12:32.2913668Z cvt.u16.u32 %rs191, %r4906; 2026-02-21T10:12:32.2913730Z prmt.b32 %r4907, %r4903, 0, 0x7773U; 2026-02-21T10:12:32.2913792Z cvt.u16.u32 %rs192, %r4907; 2026-02-21T10:12:32.2914000Z .loc 1 51 24 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:51:24 2026-02-21T10:12:32.2914064Z shl.b16 %rs193, %rs185, 4; 2026-02-21T10:12:32.2914184Z shl.b16 %rs194, %rs186, 4; 2026-02-21T10:12:32.2914255Z shl.b16 %rs195, %rs187, 4; 2026-02-21T10:12:32.2914317Z shl.b16 %rs196, %rs188, 4; 2026-02-21T10:12:32.2914380Z shl.b16 %rs197, %rs189, 4; 2026-02-21T10:12:32.2914445Z shl.b16 %rs198, %rs190, 4; 2026-02-21T10:12:32.2914512Z shl.b16 %rs199, %rs191, 4; 2026-02-21T10:12:32.2914572Z shl.b16 %rs200, %rs192, 4; 2026-02-21T10:12:32.2914827Z .loc 1 66 54 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:66:54 2026-02-21T10:12:32.2914909Z selp.b16 %rs201, %rs193, %rs185, %p54; 2026-02-21T10:12:32.2914972Z cvt.s16.s8 %rs202, %rs201; 2026-02-21T10:12:32.2915032Z shr.s16 %rs203, %rs202, 4; 2026-02-21T10:12:32.2915107Z selp.b16 %rs204, %rs194, %rs186, %p54; 2026-02-21T10:12:32.2915169Z cvt.s16.s8 %rs205, %rs204; 2026-02-21T10:12:32.2915228Z shr.s16 %rs206, %rs205, 4; 2026-02-21T10:12:32.2915296Z selp.b16 %rs207, %rs195, %rs187, %p54; 2026-02-21T10:12:32.2915375Z cvt.s16.s8 %rs208, %rs207; 2026-02-21T10:12:32.2915437Z shr.s16 %rs209, %rs208, 4; 2026-02-21T10:12:32.2915505Z selp.b16 %rs210, %rs196, %rs188, %p54; 2026-02-21T10:12:32.2915571Z cvt.s16.s8 %rs211, %rs210; 2026-02-21T10:12:32.2915635Z shr.s16 %rs212, %rs211, 4; 2026-02-21T10:12:32.2915702Z selp.b16 %rs213, %rs197, %rs189, %p54; 2026-02-21T10:12:32.2915764Z cvt.s16.s8 %rs214, %rs213; 2026-02-21T10:12:32.2915843Z shr.s16 %rs215, %rs214, 4; 2026-02-21T10:12:32.2915916Z selp.b16 %rs216, %rs198, %rs190, %p54; 2026-02-21T10:12:32.2915978Z cvt.s16.s8 %rs217, %rs216; 2026-02-21T10:12:32.2916041Z shr.s16 %rs218, %rs217, 4; 
2026-02-21T10:12:32.2916110Z selp.b16 %rs219, %rs199, %rs191, %p54; 2026-02-21T10:12:32.2916173Z cvt.s16.s8 %rs220, %rs219; 2026-02-21T10:12:32.2916235Z shr.s16 %rs221, %rs220, 4; 2026-02-21T10:12:32.2916310Z selp.b16 %rs222, %rs200, %rs192, %p54; 2026-02-21T10:12:32.2916370Z cvt.s16.s8 %rs223, %rs222; 2026-02-21T10:12:32.2916441Z shr.s16 %rs224, %rs223, 4; 2026-02-21T10:12:32.2916780Z .loc 1 71 28 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:71:28 2026-02-21T10:12:32.2916847Z cvt.rn.f32.s16 %r4908, %rs203; 2026-02-21T10:12:32.2916922Z cvt.rn.f32.s16 %r4909, %rs206; 2026-02-21T10:12:32.2916995Z cvt.rn.f32.s16 %r4910, %rs209; 2026-02-21T10:12:32.2917058Z cvt.rn.f32.s16 %r4911, %rs212; 2026-02-21T10:12:32.2917120Z cvt.rn.f32.s16 %r4912, %rs215; 2026-02-21T10:12:32.2917181Z cvt.rn.f32.s16 %r4913, %rs218; 2026-02-21T10:12:32.2917251Z cvt.rn.f32.s16 %r4914, %rs221; 2026-02-21T10:12:32.2917313Z cvt.rn.f32.s16 %r4915, %rs224; 2026-02-21T10:12:32.2917369Z bar.sync 0; 2026-02-21T10:12:32.2917439Z st.shared.b32 [%r21], %r4908; 2026-02-21T10:12:32.2917505Z st.shared.b32 [%r22], %r4909; 2026-02-21T10:12:32.2917567Z st.shared.b32 [%r23], %r4910; 2026-02-21T10:12:32.2917630Z st.shared.b32 [%r24], %r4911; 2026-02-21T10:12:32.2917699Z st.shared.b32 [%r25], %r4912; 2026-02-21T10:12:32.2917761Z st.shared.b32 [%r26], %r4913; 2026-02-21T10:12:32.2917824Z st.shared.b32 [%r27], %r4914; 2026-02-21T10:12:32.2917891Z st.shared.b32 [%r28], %r4915; 2026-02-21T10:12:32.2917946Z $L__tmp7: 2026-02-21T10:12:32.2918225Z .loc 2 291 36 // standard.py:291:36 @[ cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:78:36 ] 2026-02-21T10:12:32.2918580Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r434, %r706, %r978, %r1250}; 2026-02-21T10:12:32.2918639Z bar.sync 0; 2026-02-21T10:12:32.2918702Z // begin inline asm 2026-02-21T10:12:32.2918841Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4557}, [%r4426]; 2026-02-21T10:12:32.2918905Z // end inline asm 2026-02-21T10:12:32.2918960Z 
bar.sync 0; 2026-02-21T10:12:32.2919136Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r436, %r708, %r980, %r1252}; 2026-02-21T10:12:32.2919198Z bar.sync 0; 2026-02-21T10:12:32.2919258Z // begin inline asm 2026-02-21T10:12:32.2919390Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4559}, [%r4426]; 2026-02-21T10:12:32.2919447Z // end inline asm 2026-02-21T10:12:32.2919582Z bar.sync 0; 2026-02-21T10:12:32.2919761Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r435, %r707, %r979, %r1251}; 2026-02-21T10:12:32.2919817Z bar.sync 0; 2026-02-21T10:12:32.2919886Z // begin inline asm 2026-02-21T10:12:32.2920015Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4558}, [%r4426]; 2026-02-21T10:12:32.2920074Z // end inline asm 2026-02-21T10:12:32.2920128Z bar.sync 0; 2026-02-21T10:12:32.2920379Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r437, %r709, %r981, %r1253}; 2026-02-21T10:12:32.2920440Z bar.sync 0; 2026-02-21T10:12:32.2920500Z // begin inline asm 2026-02-21T10:12:32.2920641Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4560}, [%r4426]; 2026-02-21T10:12:32.2920698Z // end inline asm 2026-02-21T10:12:32.2920754Z bar.sync 0; 2026-02-21T10:12:32.2920945Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r438, %r710, %r982, %r1254}; 2026-02-21T10:12:32.2921002Z bar.sync 0; 2026-02-21T10:12:32.2921065Z // begin inline asm 2026-02-21T10:12:32.2921197Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4561}, [%r4426]; 2026-02-21T10:12:32.2921265Z // end inline asm 2026-02-21T10:12:32.2921320Z bar.sync 0; 2026-02-21T10:12:32.2921493Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r440, %r712, %r984, %r1256}; 2026-02-21T10:12:32.2921556Z bar.sync 0; 2026-02-21T10:12:32.2921615Z // begin inline asm 2026-02-21T10:12:32.2921745Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4563}, [%r4426]; 2026-02-21T10:12:32.2921805Z // end inline asm 2026-02-21T10:12:32.2921869Z bar.sync 0; 2026-02-21T10:12:32.2922047Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r439, %r711, 
%r983, %r1255}; 2026-02-21T10:12:32.2922114Z bar.sync 0; 2026-02-21T10:12:32.2922181Z // begin inline asm 2026-02-21T10:12:32.2922311Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4562}, [%r4426]; 2026-02-21T10:12:32.2922369Z // end inline asm 2026-02-21T10:12:32.2922432Z bar.sync 0; 2026-02-21T10:12:32.2922604Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r441, %r713, %r985, %r1257}; 2026-02-21T10:12:32.2922662Z bar.sync 0; 2026-02-21T10:12:32.2922721Z // begin inline asm 2026-02-21T10:12:32.2922867Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4564}, [%r4426]; 2026-02-21T10:12:32.2922929Z // end inline asm 2026-02-21T10:12:32.2922986Z bar.sync 0; 2026-02-21T10:12:32.2923161Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r442, %r714, %r986, %r1258}; 2026-02-21T10:12:32.2923217Z bar.sync 0; 2026-02-21T10:12:32.2923276Z // begin inline asm 2026-02-21T10:12:32.2923405Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4565}, [%r4426]; 2026-02-21T10:12:32.2923471Z // end inline asm 2026-02-21T10:12:32.2923525Z bar.sync 0; 2026-02-21T10:12:32.2923694Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r444, %r716, %r988, %r1260}; 2026-02-21T10:12:32.2923756Z bar.sync 0; 2026-02-21T10:12:32.2923815Z // begin inline asm 2026-02-21T10:12:32.2923946Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4567}, [%r4426]; 2026-02-21T10:12:32.2924010Z // end inline asm 2026-02-21T10:12:32.2924065Z bar.sync 0; 2026-02-21T10:12:32.2924234Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r443, %r715, %r987, %r1259}; 2026-02-21T10:12:32.2924362Z bar.sync 0; 2026-02-21T10:12:32.2924476Z // begin inline asm 2026-02-21T10:12:32.2924606Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4566}, [%r4426]; 2026-02-21T10:12:32.2924663Z // end inline asm 2026-02-21T10:12:32.2924722Z bar.sync 0; 2026-02-21T10:12:32.2924894Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r445, %r717, %r989, %r1261}; 2026-02-21T10:12:32.2924949Z bar.sync 0; 2026-02-21T10:12:32.2925010Z // begin inline asm 
2026-02-21T10:12:32.2925143Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4568}, [%r4426]; 2026-02-21T10:12:32.2925199Z // end inline asm 2026-02-21T10:12:32.2925254Z bar.sync 0; 2026-02-21T10:12:32.2925431Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r446, %r718, %r990, %r1262}; 2026-02-21T10:12:32.2925487Z bar.sync 0; 2026-02-21T10:12:32.2925558Z // begin inline asm 2026-02-21T10:12:32.2925739Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4569}, [%r4426]; 2026-02-21T10:12:32.2925804Z // end inline asm 2026-02-21T10:12:32.2925860Z bar.sync 0; 2026-02-21T10:12:32.2926035Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r448, %r720, %r992, %r1264}; 2026-02-21T10:12:32.2926094Z bar.sync 0; 2026-02-21T10:12:32.2926153Z // begin inline asm 2026-02-21T10:12:32.2926280Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4571}, [%r4426]; 2026-02-21T10:12:32.2926402Z // end inline asm 2026-02-21T10:12:32.2926599Z bar.sync 0; 2026-02-21T10:12:32.2926776Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r447, %r719, %r991, %r1263}; 2026-02-21T10:12:32.2926832Z bar.sync 0; 2026-02-21T10:12:32.2926895Z // begin inline asm 2026-02-21T10:12:32.2927022Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4570}, [%r4426]; 2026-02-21T10:12:32.2927079Z // end inline asm 2026-02-21T10:12:32.2927145Z bar.sync 0; 2026-02-21T10:12:32.2927327Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r449, %r721, %r993, %r1265}; 2026-02-21T10:12:32.2927382Z bar.sync 0; 2026-02-21T10:12:32.2927441Z // begin inline asm 2026-02-21T10:12:32.2927572Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4572}, [%r4426]; 2026-02-21T10:12:32.2927631Z // end inline asm 2026-02-21T10:12:32.2927685Z bar.sync 0; 2026-02-21T10:12:32.2927858Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r450, %r722, %r994, %r1266}; 2026-02-21T10:12:32.2927915Z bar.sync 0; 2026-02-21T10:12:32.2927975Z // begin inline asm 2026-02-21T10:12:32.2928117Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4573}, [%r4426]; 
2026-02-21T10:12:32.2928175Z // end inline asm 2026-02-21T10:12:32.2928232Z bar.sync 0; 2026-02-21T10:12:32.2928404Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r452, %r724, %r996, %r1268}; 2026-02-21T10:12:32.2928466Z bar.sync 0; 2026-02-21T10:12:32.2928524Z // begin inline asm 2026-02-21T10:12:32.2928653Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4575}, [%r4426]; 2026-02-21T10:12:32.2928717Z // end inline asm 2026-02-21T10:12:32.2928773Z bar.sync 0; 2026-02-21T10:12:32.2928945Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r451, %r723, %r995, %r1267}; 2026-02-21T10:12:32.2929014Z bar.sync 0; 2026-02-21T10:12:32.2929080Z // begin inline asm 2026-02-21T10:12:32.2929208Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4574}, [%r4426]; 2026-02-21T10:12:32.2929265Z // end inline asm 2026-02-21T10:12:32.2929327Z bar.sync 0; 2026-02-21T10:12:32.2929498Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r453, %r725, %r997, %r1269}; 2026-02-21T10:12:32.2929555Z bar.sync 0; 2026-02-21T10:12:32.2929616Z // begin inline asm 2026-02-21T10:12:32.2929753Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4576}, [%r4426]; 2026-02-21T10:12:32.2929810Z // end inline asm 2026-02-21T10:12:32.2929865Z bar.sync 0; 2026-02-21T10:12:32.2930039Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r454, %r726, %r998, %r1270}; 2026-02-21T10:12:32.2930094Z bar.sync 0; 2026-02-21T10:12:32.2930155Z // begin inline asm 2026-02-21T10:12:32.2930290Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4577}, [%r4426]; 2026-02-21T10:12:32.2930346Z // end inline asm 2026-02-21T10:12:32.2930527Z bar.sync 0; 2026-02-21T10:12:32.2930771Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r456, %r728, %r1000, %r1272}; 2026-02-21T10:12:32.2930835Z bar.sync 0; 2026-02-21T10:12:32.2930895Z // begin inline asm 2026-02-21T10:12:32.2931024Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4579}, [%r4426]; 2026-02-21T10:12:32.2931098Z // end inline asm 2026-02-21T10:12:32.2931157Z bar.sync 0; 
2026-02-21T10:12:32.2931332Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r455, %r727, %r999, %r1271}; 2026-02-21T10:12:32.2931389Z bar.sync 0; 2026-02-21T10:12:32.2931454Z // begin inline asm 2026-02-21T10:12:32.2931582Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4578}, [%r4426]; 2026-02-21T10:12:32.2931638Z // end inline asm 2026-02-21T10:12:32.2931698Z bar.sync 0; 2026-02-21T10:12:32.2931947Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r457, %r729, %r1001, %r1273}; 2026-02-21T10:12:32.2932009Z bar.sync 0; 2026-02-21T10:12:32.2932077Z // begin inline asm 2026-02-21T10:12:32.2932208Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4580}, [%r4426]; 2026-02-21T10:12:32.2932280Z // end inline asm 2026-02-21T10:12:32.2932694Z bar.sync 0; 2026-02-21T10:12:32.2932884Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r458, %r730, %r1002, %r1274}; 2026-02-21T10:12:32.2933024Z bar.sync 0; 2026-02-21T10:12:32.2933087Z // begin inline asm 2026-02-21T10:12:32.2933237Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4581}, [%r4426]; 2026-02-21T10:12:32.2933297Z // end inline asm 2026-02-21T10:12:32.2933354Z bar.sync 0; 2026-02-21T10:12:32.2933532Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r460, %r732, %r1004, %r1276}; 2026-02-21T10:12:32.2933596Z bar.sync 0; 2026-02-21T10:12:32.2933658Z // begin inline asm 2026-02-21T10:12:32.2933789Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4583}, [%r4426]; 2026-02-21T10:12:32.2933854Z // end inline asm 2026-02-21T10:12:32.2933910Z bar.sync 0; 2026-02-21T10:12:32.2934082Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r459, %r731, %r1003, %r1275}; 2026-02-21T10:12:32.2934156Z bar.sync 0; 2026-02-21T10:12:32.2934219Z // begin inline asm 2026-02-21T10:12:32.2934350Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4582}, [%r4426]; 2026-02-21T10:12:32.2934410Z // end inline asm 2026-02-21T10:12:32.2934471Z bar.sync 0; 2026-02-21T10:12:32.2934645Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r461, %r733, %r1005, 
%r1277}; 2026-02-21T10:12:32.2934700Z bar.sync 0; 2026-02-21T10:12:32.2934764Z // begin inline asm 2026-02-21T10:12:32.2934892Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4584}, [%r4426]; 2026-02-21T10:12:32.2934949Z // end inline asm 2026-02-21T10:12:32.2935004Z bar.sync 0; 2026-02-21T10:12:32.2935181Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r462, %r734, %r1006, %r1278}; 2026-02-21T10:12:32.2935238Z bar.sync 0; 2026-02-21T10:12:32.2935308Z // begin inline asm 2026-02-21T10:12:32.2935444Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4585}, [%r4426]; 2026-02-21T10:12:32.2935500Z // end inline asm 2026-02-21T10:12:32.2935559Z bar.sync 0; 2026-02-21T10:12:32.2935730Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r464, %r736, %r1008, %r1280}; 2026-02-21T10:12:32.2935795Z bar.sync 0; 2026-02-21T10:12:32.2935854Z // begin inline asm 2026-02-21T10:12:32.2935984Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4587}, [%r4426]; 2026-02-21T10:12:32.2936049Z // end inline asm 2026-02-21T10:12:32.2936106Z bar.sync 0; 2026-02-21T10:12:32.2936281Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r463, %r735, %r1007, %r1279}; 2026-02-21T10:12:32.2936340Z bar.sync 0; 2026-02-21T10:12:32.2936400Z // begin inline asm 2026-02-21T10:12:32.2936680Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4586}, [%r4426]; 2026-02-21T10:12:32.2936743Z // end inline asm 2026-02-21T10:12:32.2936804Z bar.sync 0; 2026-02-21T10:12:32.2936980Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r210], {%r465, %r737, %r1009, %r1281}; 2026-02-21T10:12:32.2937038Z bar.sync 0; 2026-02-21T10:12:32.2937104Z // begin inline asm 2026-02-21T10:12:32.2937392Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4588}, [%r4426]; 2026-02-21T10:12:32.2937450Z // end inline asm 2026-02-21T10:12:32.2937508Z // begin inline asm 2026-02-21T10:12:32.2937594Z fence.proxy.async.shared::cta; 2026-02-21T10:12:32.2937653Z // end inline asm 2026-02-21T10:12:32.2937729Z wgmma.fence.sync.aligned; 2026-02-21T10:12:32.2937799Z shl.b32 
%r4916, %r4852, 10; 2026-02-21T10:12:32.2937867Z and.b32 %r4917, %r4916, 24576; 2026-02-21T10:12:32.2937934Z add.s32 %r4918, %r4917, %r1713; 2026-02-21T10:12:32.2938002Z bfe.u32 %r4919, %r4918, 4, 14; 2026-02-21T10:12:32.2938076Z cvt.u64.u32 %rd124, %r4919; 2026-02-21T10:12:32.2938158Z or.b64 %rd120, %rd124, 4611686293439512576; 2026-02-21T10:12:32.2938222Z // begin inline asm 2026-02-21T10:12:32.2939071Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r4557,%r4558,%r4559,%r4560,%r4561,%r4562,%r4563,%r4564,%r4565,%r4566,%r4567,%r4568,%r4569,%r4570,%r4571,%r4572,%r4573,%r4574,%r4575,%r4576,%r4577,%r4578,%r4579,%r4580,%r4581,%r4582,%r4583,%r4584,%r4585,%r4586,%r4587,%r4588}, {%r4553,%r4554,%r4555,%r4556}, %rd120, %p2, 1, 1; 2026-02-21T10:12:32.2939148Z // end inline asm 2026-02-21T10:12:32.2939213Z add.s32 %r4920, %r4918, 32; 2026-02-21T10:12:32.2939280Z bfe.u32 %r4921, %r4920, 4, 14; 2026-02-21T10:12:32.2939413Z cvt.u64.u32 %rd125, %r4921; 2026-02-21T10:12:32.2939494Z or.b64 %rd121, %rd125, 4611686293439512576; 2026-02-21T10:12:32.2939560Z // begin inline asm 2026-02-21T10:12:32.2940310Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r4557,%r4558,%r4559,%r4560,%r4561,%r4562,%r4563,%r4564,%r4565,%r4566,%r4567,%r4568,%r4569,%r4570,%r4571,%r4572,%r4573,%r4574,%r4575,%r4576,%r4577,%r4578,%r4579,%r4580,%r4581,%r4582,%r4583,%r4584,%r4585,%r4586,%r4587,%r4588}, {%r4621,%r4622,%r4623,%r4624}, %rd121, %p2, 1, 1; 2026-02-21T10:12:32.2940372Z // end inline asm 2026-02-21T10:12:32.2940439Z add.s32 %r4922, %r4918, 64; 2026-02-21T10:12:32.2940501Z bfe.u32 %r4923, %r4922, 4, 14; 2026-02-21T10:12:32.2940564Z cvt.u64.u32 %rd126, %r4923; 2026-02-21T10:12:32.2940653Z or.b64 %rd122, %rd126, 4611686293439512576; 2026-02-21T10:12:32.2940720Z // begin inline asm 2026-02-21T10:12:32.2941474Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r4557,%r4558,%r4559,%r4560,%r4561,%r4562,%r4563,%r4564,%r4565,%r4566,%r4567,%r4568,%r4569,%r4570,%r4571,%r4572,%r4573,%r4574,%r4575,%r4576,%r4577,%r4578,%r4579,%r4580,%r4581,%r4582,%r4583,%r4584,%r4585,%r4586,%r4587,%r4588}, {%r4689,%r4690,%r4691,%r4692}, %rd122, %p2, 1, 1; 2026-02-21T10:12:32.2941533Z // end inline asm 2026-02-21T10:12:32.2941601Z add.s32 %r4924, %r4918, 96; 2026-02-21T10:12:32.2941663Z bfe.u32 %r4925, %r4924, 4, 14; 2026-02-21T10:12:32.2941725Z cvt.u64.u32 %rd127, %r4925; 2026-02-21T10:12:32.2941801Z or.b64 %rd123, %rd127, 4611686293439512576; 2026-02-21T10:12:32.2941860Z // begin inline asm 2026-02-21T10:12:32.2942606Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r4557,%r4558,%r4559,%r4560,%r4561,%r4562,%r4563,%r4564,%r4565,%r4566,%r4567,%r4568,%r4569,%r4570,%r4571,%r4572,%r4573,%r4574,%r4575,%r4576,%r4577,%r4578,%r4579,%r4580,%r4581,%r4582,%r4583,%r4584,%r4585,%r4586,%r4587,%r4588}, {%r4757,%r4758,%r4759,%r4760}, %rd123, %p2, 1, 1; 2026-02-21T10:12:32.2942673Z // end inline asm 2026-02-21T10:12:32.2942753Z wgmma.commit_group.sync.aligned; 2026-02-21T10:12:32.2942818Z mov.b32 %r4793, %r1713; 2026-02-21T10:12:32.2942885Z mov.b32 %r4795, %r4794; 2026-02-21T10:12:32.2942944Z // begin inline asm 2026-02-21T10:12:32.2943502Z // wait for regs: %r4557,%r4558,%r4559,%r4560,%r4561,%r4562,%r4563,%r4564,%r4565,%r4566,%r4567,%r4568,%r4569,%r4570,%r4571,%r4572,%r4573,%r4574,%r4575,%r4576,%r4577,%r4578,%r4579,%r4580,%r4581,%r4582,%r4583,%r4584,%r4585,%r4586,%r4587,%r4588,%r4793,%r4794,%r4795 2026-02-21T10:12:32.2943591Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:12:32.2943657Z // end inline asm 2026-02-21T10:12:32.2943718Z $L__tmp8: 2026-02-21T10:12:32.2943946Z .loc 1 34 122 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:34:122 2026-02-21T10:12:32.2944132Z add.s64 %rd134, %rd134, 64; 2026-02-21T10:12:32.2944198Z add.s64 %rd133, %rd133, 256; 2026-02-21T10:12:32.2944264Z add.s64 %rd132, %rd132, 81920; 2026-02-21T10:12:32.2944338Z 
setp.lt.u64 %p55, %rd134, 4032; 2026-02-21T10:12:32.2944402Z @%p55 bra $L__BB0_1; 2026-02-21T10:12:32.2944459Z // %bb.2: 2026-02-21T10:12:32.2944674Z .loc 1 27 28 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:27:28 2026-02-21T10:12:32.2944745Z or.b32 %r4962, %r5, %r3; 2026-02-21T10:12:32.2944810Z or.b32 %r4963, %r4962, 96; 2026-02-21T10:12:32.2944871Z or.b32 %r4964, %r4962, 64; 2026-02-21T10:12:32.2944949Z or.b32 %r4965, %r4962, 32; 2026-02-21T10:12:32.2945219Z .loc 1 25 41 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:25:41 2026-02-21T10:12:32.2945284Z shl.b32 %r4966, %r4, 3; 2026-02-21T10:12:32.2945492Z .loc 1 25 28 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:25:28 2026-02-21T10:12:32.2945562Z or.b32 %r4967, %r1, %r4966; 2026-02-21T10:12:32.2945763Z .loc 1 81 24 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:81:24 2026-02-21T10:12:32.2945842Z cvt.rn.bf16x2.f32 %r4968, %r4558, %r4557; 2026-02-21T10:12:32.2945980Z cvt.rn.bf16x2.f32 %r4969, %r4560, %r4559; 2026-02-21T10:12:32.2946057Z cvt.rn.bf16x2.f32 %r4970, %r4562, %r4561; 2026-02-21T10:12:32.2946130Z cvt.rn.bf16x2.f32 %r4971, %r4564, %r4563; 2026-02-21T10:12:32.2946206Z cvt.rn.bf16x2.f32 %r4972, %r4566, %r4565; 2026-02-21T10:12:32.2946276Z cvt.rn.bf16x2.f32 %r4973, %r4568, %r4567; 2026-02-21T10:12:32.2946346Z cvt.rn.bf16x2.f32 %r4974, %r4570, %r4569; 2026-02-21T10:12:32.2946420Z cvt.rn.bf16x2.f32 %r4975, %r4572, %r4571; 2026-02-21T10:12:32.2946619Z cvt.rn.bf16x2.f32 %r4976, %r4574, %r4573; 2026-02-21T10:12:32.2946710Z cvt.rn.bf16x2.f32 %r4977, %r4576, %r4575; 2026-02-21T10:12:32.2946783Z cvt.rn.bf16x2.f32 %r4978, %r4578, %r4577; 2026-02-21T10:12:32.2946864Z cvt.rn.bf16x2.f32 %r4979, %r4580, %r4579; 2026-02-21T10:12:32.2946934Z cvt.rn.bf16x2.f32 %r4980, %r4582, %r4581; 2026-02-21T10:12:32.2947006Z cvt.rn.bf16x2.f32 %r4981, %r4584, %r4583; 2026-02-21T10:12:32.2947082Z cvt.rn.bf16x2.f32 %r4982, %r4586, %r4585; 2026-02-21T10:12:32.2947157Z cvt.rn.bf16x2.f32 
%r4983, %r4588, %r4587; 2026-02-21T10:12:32.2947369Z .loc 1 82 46 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:82:46 2026-02-21T10:12:32.2947447Z mad.lo.s32 %r4984, %r4962, 1280, %r4967; 2026-02-21T10:12:32.2947519Z mad.lo.s32 %r4985, %r4965, 1280, %r4967; 2026-02-21T10:12:32.2947586Z mad.lo.s32 %r4986, %r4964, 1280, %r4967; 2026-02-21T10:12:32.2947654Z mad.lo.s32 %r4987, %r4963, 1280, %r4967; 2026-02-21T10:12:32.2947879Z .loc 1 82 18 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:82:18 2026-02-21T10:12:32.2947954Z mad.wide.s32 %rd128, %r4984, 2, %rd25; 2026-02-21T10:12:32.2948025Z mad.wide.s32 %rd129, %r4985, 2, %rd25; 2026-02-21T10:12:32.2948102Z mad.wide.s32 %rd130, %r4986, 2, %rd25; 2026-02-21T10:12:32.2948168Z mad.wide.s32 %rd131, %r4987, 2, %rd25; 2026-02-21T10:12:32.2948437Z .loc 1 82 77 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:82:77 2026-02-21T10:12:32.2948504Z bar.sync 0; 2026-02-21T10:12:32.2948566Z shl.b32 %r4988, %r2, 14; 2026-02-21T10:12:32.2948633Z and.b32 %r4989, %r4988, 49152; 2026-02-21T10:12:32.2948692Z shl.b32 %r4990, %r2, 5; 2026-02-21T10:12:32.2948759Z and.b32 %r4991, %r4990, 7264; 2026-02-21T10:12:32.2948819Z shl.b32 %r4992, %r9, 4; 2026-02-21T10:12:32.2948880Z and.b32 %r4993, %r6, 96; 2026-02-21T10:12:32.2948948Z bfe.s32 %r4994, %r2, 2, 1; 2026-02-21T10:12:32.2949010Z and.b32 %r4995, %r4994, 8208; 2026-02-21T10:12:32.2949075Z or.b32 %r4996, %r4995, %r4989; 2026-02-21T10:12:32.2949138Z or.b32 %r4997, %r4991, %r4992; 2026-02-21T10:12:32.2949207Z xor.b32 %r4998, %r4997, %r4993; 2026-02-21T10:12:32.2949270Z or.b32 %r4999, %r4996, %r4998; 2026-02-21T10:12:32.2949500Z add.s32 %r5001, %r1713, %r4999; 2026-02-21T10:12:32.2949630Z st.shared.v4.b32 [%r5001], {%r4968, %r4970, %r4972, %r4974}; 2026-02-21T10:12:32.2949754Z st.shared.v4.b32 [%r5001+512], {%r4969, %r4971, %r4973, %r4975}; 2026-02-21T10:12:32.2949819Z xor.b32 %r5002, %r4999, 16; 2026-02-21T10:12:32.2949883Z add.s32 %r5003, %r1713, %r5002; 
2026-02-21T10:12:32.2949998Z st.shared.v4.b32 [%r5003], {%r4976, %r4978, %r4980, %r4982}; 2026-02-21T10:12:32.2950115Z st.shared.v4.b32 [%r5003+512], {%r4977, %r4979, %r4981, %r4983}; 2026-02-21T10:12:32.2950171Z bar.sync 0; 2026-02-21T10:12:32.2950237Z shl.b32 %r5004, %r9, 11; 2026-02-21T10:12:32.2950297Z shl.b32 %r5005, %r9, 2; 2026-02-21T10:12:32.2950359Z and.b32 %r5006, %r14, 1920; 2026-02-21T10:12:32.2950501Z bfe.s32 %r5007, %r2, 5, 1; 2026-02-21T10:12:32.2950570Z and.b32 %r5008, %r5007, 8208; 2026-02-21T10:12:32.2950634Z or.b32 %r5009, %r5004, %r20; 2026-02-21T10:12:32.2950696Z or.b32 %r5010, %r5005, %r5006; 2026-02-21T10:12:32.2950781Z xor.b32 %r5011, %r5009, %r5010; 2026-02-21T10:12:32.2950844Z xor.b32 %r5012, %r5011, %r5008; 2026-02-21T10:12:32.2950906Z add.s32 %r4930, %r1713, %r5012; 2026-02-21T10:12:32.2950972Z // begin inline asm 2026-02-21T10:12:32.2951261Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4946, %r4947, %r4948, %r4949}, [%r4930]; 2026-02-21T10:12:32.2951325Z // end inline asm 2026-02-21T10:12:32.2951388Z add.s32 %r4935, %r4930, 2048; 2026-02-21T10:12:32.2951455Z // begin inline asm 2026-02-21T10:12:32.2951648Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4950, %r4951, %r4952, %r4953}, [%r4935]; 2026-02-21T10:12:32.2951707Z // end inline asm 2026-02-21T10:12:32.2951776Z add.s32 %r4940, %r4930, 4096; 2026-02-21T10:12:32.2951837Z // begin inline asm 2026-02-21T10:12:32.2952020Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4954, %r4955, %r4956, %r4957}, [%r4940]; 2026-02-21T10:12:32.2952082Z // end inline asm 2026-02-21T10:12:32.2952143Z add.s32 %r4945, %r4930, 6144; 2026-02-21T10:12:32.2952206Z // begin inline asm 2026-02-21T10:12:32.2952393Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4958, %r4959, %r4960, %r4961}, [%r4945]; 2026-02-21T10:12:32.2952457Z // end inline asm 2026-02-21T10:12:32.2952519Z // begin inline asm 2026-02-21T10:12:32.2952650Z st.global.v4.b32 [ %rd128 + 0 ], { %r4946, %r4947, %r4948, %r4949 }; 2026-02-21T10:12:32.2952722Z 
// end inline asm 2026-02-21T10:12:32.2952786Z // begin inline asm 2026-02-21T10:12:32.2952912Z st.global.v4.b32 [ %rd129 + 0 ], { %r4950, %r4951, %r4952, %r4953 }; 2026-02-21T10:12:32.2952970Z // end inline asm 2026-02-21T10:12:32.2953035Z // begin inline asm 2026-02-21T10:12:32.2953163Z st.global.v4.b32 [ %rd130 + 0 ], { %r4954, %r4955, %r4956, %r4957 }; 2026-02-21T10:12:32.2953223Z // end inline asm 2026-02-21T10:12:32.2953291Z // begin inline asm 2026-02-21T10:12:32.2953408Z st.global.v4.b32 [ %rd131 + 0 ], { %r4958, %r4959, %r4960, %r4961 }; 2026-02-21T10:12:32.2953467Z // end inline asm 2026-02-21T10:12:32.2953688Z .loc 1 82 4 // cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py:82:4 2026-02-21T10:12:32.2953756Z ret; 2026-02-21T10:12:32.2953814Z $L__tmp9: 2026-02-21T10:12:32.2953872Z $L__func_end0: 2026-02-21T10:12:32.2953970Z // -- End function 2026-02-21T10:12:32.2954026Z } 2026-02-21T10:12:32.2954282Z .file 1 "/tmp/torchinductor_root/ub/cubcpyzjneiopnugk2ru3waezagurbj7mtfv5wbzvxj4ryw2k3pv.py" 2026-02-21T10:12:32.2954499Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T10:12:32.2954565Z .section .debug_abbrev 2026-02-21T10:12:32.2954620Z { 2026-02-21T10:12:32.2954722Z .b8 1 // Abbreviation Code 2026-02-21T10:12:32.2954819Z .b8 17 // DW_TAG_compile_unit 2026-02-21T10:12:32.2954905Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:12:32.2954988Z .b8 37 // DW_AT_producer 2026-02-21T10:12:32.2955187Z .b8 8 // DW_FORM_string 2026-02-21T10:12:32.2955266Z .b8 19 // DW_AT_language 2026-02-21T10:12:32.2955349Z .b8 5 // DW_FORM_data2 2026-02-21T10:12:32.2955435Z .b8 3 // DW_AT_name 2026-02-21T10:12:32.2955515Z .b8 8 // DW_FORM_string 2026-02-21T10:12:32.2955597Z .b8 16 // DW_AT_stmt_list 2026-02-21T10:12:32.2955681Z .b8 6 // DW_FORM_data4 2026-02-21T10:12:32.2955761Z .b8 27 // DW_AT_comp_dir 2026-02-21T10:12:32.2955838Z .b8 8 // DW_FORM_string 2026-02-21T10:12:32.2955963Z .b8 0 // EOM(1) 2026-02-21T10:12:32.2956043Z 
.b8 0 // EOM(2) 2026-02-21T10:12:32.2956129Z .b8 2 // Abbreviation Code 2026-02-21T10:12:32.2956234Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:12:32.2956322Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:12:32.2956398Z .b8 3 // DW_AT_name 2026-02-21T10:12:32.2956690Z .b8 8 // DW_FORM_string 2026-02-21T10:12:32.2956800Z .b8 32 // DW_AT_inline 2026-02-21T10:12:32.2956883Z .b8 11 // DW_FORM_data1 2026-02-21T10:12:32.2956955Z .b8 0 // EOM(1) 2026-02-21T10:12:32.2957026Z .b8 0 // EOM(2) 2026-02-21T10:12:32.2957117Z .b8 3 // Abbreviation Code 2026-02-21T10:12:32.2957206Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:12:32.2957288Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:12:32.2957374Z .b8 17 // DW_AT_low_pc 2026-02-21T10:12:32.2957455Z .b8 1 // DW_FORM_addr 2026-02-21T10:12:32.2957537Z .b8 18 // DW_AT_high_pc 2026-02-21T10:12:32.2957619Z .b8 1 // DW_FORM_addr 2026-02-21T10:12:32.2957716Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:12:32.2957794Z .b8 19 // DW_FORM_ref4 2026-02-21T10:12:32.2957874Z .b8 0 // EOM(1) 2026-02-21T10:12:32.2957945Z .b8 0 // EOM(2) 2026-02-21T10:12:32.2958032Z .b8 4 // Abbreviation Code 2026-02-21T10:12:32.2958133Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T10:12:32.2958218Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:12:32.2958310Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:12:32.2958391Z .b8 19 // DW_FORM_ref4 2026-02-21T10:12:32.2958474Z .b8 17 // DW_AT_low_pc 2026-02-21T10:12:32.2958549Z .b8 1 // DW_FORM_addr 2026-02-21T10:12:32.2958641Z .b8 18 // DW_AT_high_pc 2026-02-21T10:12:32.2958725Z .b8 1 // DW_FORM_addr 2026-02-21T10:12:32.2958809Z .b8 88 // DW_AT_call_file 2026-02-21T10:12:32.2958887Z .b8 11 // DW_FORM_data1 2026-02-21T10:12:32.2958966Z .b8 89 // DW_AT_call_line 2026-02-21T10:12:32.2959050Z .b8 11 // DW_FORM_data1 2026-02-21T10:12:32.2959135Z .b8 87 // DW_AT_call_column 2026-02-21T10:12:32.2959216Z .b8 11 // DW_FORM_data1 2026-02-21T10:12:32.2959291Z .b8 0 // EOM(1) 2026-02-21T10:12:32.2959363Z .b8 0 // EOM(2) 2026-02-21T10:12:32.2959585Z .b8 
0 // EOM(3) 2026-02-21T10:12:32.2959648Z } 2026-02-21T10:12:32.2959719Z .section .debug_info 2026-02-21T10:12:32.2959771Z { 2026-02-21T10:12:32.2959866Z .b32 178 // Length of Unit 2026-02-21T10:12:32.2959970Z .b8 2 // DWARF version number 2026-02-21T10:12:32.2960023Z .b8 0 2026-02-21T10:12:32.2960156Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T10:12:32.2960258Z .b8 8 // Address Size (in bytes) 2026-02-21T10:12:32.2960372Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T10:12:32.2960459Z .b8 116 // DW_AT_producer 2026-02-21T10:12:32.2960595Z .b8 114 2026-02-21T10:12:32.2960655Z .b8 105 2026-02-21T10:12:32.2960709Z .b8 116 2026-02-21T10:12:32.2960763Z .b8 111 2026-02-21T10:12:32.2960823Z .b8 110 2026-02-21T10:12:32.2960880Z .b8 0 2026-02-21T10:12:32.2960965Z .b8 2 // DW_AT_language 2026-02-21T10:12:32.2961022Z .b8 0 2026-02-21T10:12:32.2961102Z .b8 99 // DW_AT_name 2026-02-21T10:12:32.2961156Z .b8 117 2026-02-21T10:12:32.2961275Z .b8 98 2026-02-21T10:12:32.2961334Z .b8 99 2026-02-21T10:12:32.2961387Z .b8 112 2026-02-21T10:12:32.2961441Z .b8 121 2026-02-21T10:12:32.2961494Z .b8 122 2026-02-21T10:12:32.2961550Z .b8 106 2026-02-21T10:12:32.2961602Z .b8 110 2026-02-21T10:12:32.2961654Z .b8 101 2026-02-21T10:12:32.2961712Z .b8 105 2026-02-21T10:12:32.2961768Z .b8 111 2026-02-21T10:12:32.2961820Z .b8 112 2026-02-21T10:12:32.2961872Z .b8 110 2026-02-21T10:12:32.2961939Z .b8 117 2026-02-21T10:12:32.2961996Z .b8 103 2026-02-21T10:12:32.2962049Z .b8 107 2026-02-21T10:12:32.2962109Z .b8 50 2026-02-21T10:12:32.2962160Z .b8 114 2026-02-21T10:12:32.2962213Z .b8 117 2026-02-21T10:12:32.2962265Z .b8 51 2026-02-21T10:12:32.2962333Z .b8 119 2026-02-21T10:12:32.2962389Z .b8 97 2026-02-21T10:12:32.2962441Z .b8 101 2026-02-21T10:12:32.2962499Z .b8 122 2026-02-21T10:12:32.2962551Z .b8 97 2026-02-21T10:12:32.2962605Z .b8 103 2026-02-21T10:12:32.2962659Z .b8 117 2026-02-21T10:12:32.2962717Z .b8 114 2026-02-21T10:12:32.2962770Z .b8 98 
2026-02-21T10:12:32.2962822Z .b8 106 2026-02-21T10:12:32.2962874Z .b8 55 2026-02-21T10:12:32.2962933Z .b8 109 2026-02-21T10:12:32.2962984Z .b8 116 2026-02-21T10:12:32.2963037Z .b8 102 2026-02-21T10:12:32.2963092Z .b8 118 2026-02-21T10:12:32.2963143Z .b8 53 2026-02-21T10:12:32.2963194Z .b8 119 2026-02-21T10:12:32.2963245Z .b8 98 2026-02-21T10:12:32.2963303Z .b8 122 2026-02-21T10:12:32.2963367Z .b8 118 2026-02-21T10:12:32.2963419Z .b8 120 2026-02-21T10:12:32.2963475Z .b8 106 2026-02-21T10:12:32.2963526Z .b8 52 2026-02-21T10:12:32.2963578Z .b8 114 2026-02-21T10:12:32.2963632Z .b8 121 2026-02-21T10:12:32.2963689Z .b8 119 2026-02-21T10:12:32.2963741Z .b8 50 2026-02-21T10:12:32.2963792Z .b8 107 2026-02-21T10:12:32.2963844Z .b8 51 2026-02-21T10:12:32.2963902Z .b8 112 2026-02-21T10:12:32.2963953Z .b8 118 2026-02-21T10:12:32.2964004Z .b8 46 2026-02-21T10:12:32.2964061Z .b8 112 2026-02-21T10:12:32.2964112Z .b8 121 2026-02-21T10:12:32.2964162Z .b8 0 2026-02-21T10:12:32.2964268Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T10:12:32.2964368Z .b8 47 // DW_AT_comp_dir 2026-02-21T10:12:32.2964423Z .b8 116 2026-02-21T10:12:32.2964475Z .b8 109 2026-02-21T10:12:32.2964531Z .b8 112 2026-02-21T10:12:32.2964582Z .b8 47 2026-02-21T10:12:32.2964634Z .b8 116 2026-02-21T10:12:32.2964688Z .b8 111 2026-02-21T10:12:32.2964745Z .b8 114 2026-02-21T10:12:32.2964797Z .b8 99 2026-02-21T10:12:32.2964848Z .b8 104 2026-02-21T10:12:32.2964904Z .b8 105 2026-02-21T10:12:32.2964956Z .b8 110 2026-02-21T10:12:32.2965008Z .b8 100 2026-02-21T10:12:32.2965061Z .b8 117 2026-02-21T10:12:32.2965117Z .b8 99 2026-02-21T10:12:32.2965169Z .b8 116 2026-02-21T10:12:32.2965220Z .b8 111 2026-02-21T10:12:32.2965281Z .b8 114 2026-02-21T10:12:32.2965452Z .b8 95 2026-02-21T10:12:32.2965505Z .b8 114 2026-02-21T10:12:32.2965557Z .b8 111 2026-02-21T10:12:32.2965615Z .b8 111 2026-02-21T10:12:32.2965667Z .b8 116 2026-02-21T10:12:32.2965720Z .b8 47 2026-02-21T10:12:32.2965774Z .b8 117 2026-02-21T10:12:32.2965831Z .b8 98 
2026-02-21T10:12:32.2965884Z .b8 0 2026-02-21T10:12:32.2965995Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T10:12:32.2966079Z .b8 95 // DW_AT_name 2026-02-21T10:12:32.2966131Z .b8 104 2026-02-21T10:12:32.2966183Z .b8 101 2026-02-21T10:12:32.2966235Z .b8 108 2026-02-21T10:12:32.2966291Z .b8 105 2026-02-21T10:12:32.2966343Z .b8 111 2026-02-21T10:12:32.2966395Z .b8 110 2026-02-21T10:12:32.2966586Z .b8 95 2026-02-21T10:12:32.2966645Z .b8 109 2026-02-21T10:12:32.2966795Z .b8 97 2026-02-21T10:12:32.2966852Z .b8 116 2026-02-21T10:12:32.2966912Z .b8 109 2026-02-21T10:12:32.2966963Z .b8 117 2026-02-21T10:12:32.2967017Z .b8 108 2026-02-21T10:12:32.2967074Z .b8 95 2026-02-21T10:12:32.2967131Z .b8 98 2026-02-21T10:12:32.2967184Z .b8 102 2026-02-21T10:12:32.2967236Z .b8 49 2026-02-21T10:12:32.2967295Z .b8 54 2026-02-21T10:12:32.2967347Z .b8 95 2026-02-21T10:12:32.2967400Z .b8 105 2026-02-21T10:12:32.2967453Z .b8 110 2026-02-21T10:12:32.2967510Z .b8 116 2026-02-21T10:12:32.2967645Z .b8 52 2026-02-21T10:12:32.2967701Z .b8 0 2026-02-21T10:12:32.2967791Z .b8 1 // DW_AT_inline 2026-02-21T10:12:32.2967896Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T10:12:32.2967992Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T10:12:32.2968094Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T10:12:32.2968194Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:12:32.2968337Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T10:12:32.2968436Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:12:32.2968547Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T10:12:32.2968637Z .b64 $L__tmp8 // DW_AT_high_pc 2026-02-21T10:12:32.2968722Z .b8 1 // DW_AT_call_file 2026-02-21T10:12:32.2968813Z .b8 78 // DW_AT_call_line 2026-02-21T10:12:32.2968901Z .b8 36 // DW_AT_call_column 2026-02-21T10:12:32.2968992Z .b8 0 // End Of Children Mark 2026-02-21T10:12:32.2969084Z .b8 0 // End Of Children Mark 2026-02-21T10:12:32.2969136Z } 2026-02-21T10:12:32.2969206Z .section .debug_macinfo { 
} 2026-02-21T10:12:32.2969213Z 2026-02-21T10:12:32.2969302Z ================================================================ 2026-02-21T10:12:32.2969422Z please share the reproducer above with Triton project. 2026-02-21T10:12:47.8237907Z 2026-02-21T10:12:47.8239287Z Generation 8: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━━ 76/76 2.3 configs/s 2026-02-21T10:12:53.0752533Z Generation 8: verifying top configs 100% ━━━━━━━━━━━━━━━━━━━ 34/34 5.2 configs/s 2026-02-21T10:12:54.6068574Z [2767s] Generation 8 complete: 2026-02-21T10:12:54.6069028Z error=16 2026-02-21T10:12:54.6069308Z ok=63 2026-02-21T10:12:54.6069607Z min=5.8521 2026-02-21T10:12:54.6069894Z mid=10.1293 2026-02-21T10:12:54.6070161Z max=670.4321 2026-02-21T10:12:54.6070483Z best={'block_sizes': [16, 128, 128], 2026-02-21T10:12:54.6071067Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T10:12:54.6071673Z 'l2_groupings': [16], 2026-02-21T10:12:54.6072052Z 'load_eviction_policies': ['', ''], 2026-02-21T10:12:54.6072494Z 'loop_orders': [[1, 0]], 2026-02-21T10:12:54.6072845Z 'maxnreg': 256, 2026-02-21T10:12:54.6073176Z 'num_sm_multiplier': 64, 2026-02-21T10:12:54.6073533Z 'num_stages': 1, 2026-02-21T10:12:54.6073831Z 'num_warps': 4, 2026-02-21T10:12:54.6074206Z 'pid_type': 'persistent_interleaved', 2026-02-21T10:12:54.6075479Z 'range_flattens': [True, True], 2026-02-21T10:12:54.6075901Z 'range_multi_buffers': [None, True], 2026-02-21T10:12:54.6076321Z 'range_num_stages': [4, 2], 2026-02-21T10:12:54.6077206Z 'range_unroll_factors': [1, 0], 2026-02-21T10:12:54.6077638Z 'range_warp_specializes': []} 2026-02-21T10:12:54.6117302Z [2767s] Fitting surrogate: 840 points, 840 targets 2026-02-21T10:12:55.9100355Z [2768s] Generation 9 starting: 76 neighbors, 4 active search path(s) 2026-02-21T10:13:25.2280907Z Generation 9: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77/77 1.5 configs/s 2026-02-21T10:13:56.6938013Z 2026-02-21T10:13:56.6938029Z 2026-02-21T10:13:56.6938442Z 
================================================================ 2026-02-21T10:13:56.6938857Z Internal Triton PTX codegen error 2026-02-21T10:13:56.6939445Z `ptxas` stderr: 2026-02-21T10:13:56.6940183Z ptxas fatal : (C7602) Insufficient registers (64) to compile instruction at line 1007 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T10:13:56.6941019Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:13:56.6941271Z 2026-02-21T10:13:56.6942123Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpj8sj081l.ptx -o /tmp/tmpj8sj081l.ptx.o 2026-02-21T10:13:56.6942910Z 2026-02-21T10:13:56.6942916Z 2026-02-21T10:13:56.6942989Z // 2026-02-21T10:13:56.6943188Z // Generated by LLVM NVPTX Back-End 2026-02-21T10:13:56.6943433Z // 2026-02-21T10:13:56.6943528Z 2026-02-21T10:13:56.6943604Z .version 8.7 2026-02-21T10:13:56.6943810Z .target sm_90a 2026-02-21T10:13:56.6943999Z .address_size 64 2026-02-21T10:13:56.6944115Z 2026-02-21T10:13:56.6944349Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T10:13:56.6944776Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T10:13:56.6945100Z // @_helion_matmul_bf16_int4 2026-02-21T10:13:56.6945430Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T10:13:56.6945801Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T10:13:56.6946242Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T10:13:56.6946960Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T10:13:56.6947386Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T10:13:56.6947801Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T10:13:56.6948171Z ) 2026-02-21T10:13:56.6948340Z .reqntid 128 2026-02-21T10:13:56.6948632Z .maxnreg 64 
2026-02-21T10:13:56.6948774Z { 2026-02-21T10:13:56.6948917Z .reg .pred %p<64>; 2026-02-21T10:13:56.6949087Z .reg .b16 %rs<113>; 2026-02-21T10:13:56.6949323Z .reg .b32 %r<2654>; 2026-02-21T10:13:56.6949492Z .reg .b64 %rd<125>; 2026-02-21T10:13:56.6949841Z .loc 1 19 0 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:19:0 2026-02-21T10:13:56.6950252Z $L__func_begin0: 2026-02-21T10:13:56.6950601Z .loc 1 19 0 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:19:0 2026-02-21T10:13:56.6950935Z 2026-02-21T10:13:56.6951001Z // %bb.0: 2026-02-21T10:13:56.6951209Z ld.param.b64 %rd6, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T10:13:56.6951492Z $L__tmp0: 2026-02-21T10:13:56.6951812Z .loc 1 21 67 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:21:67 2026-02-21T10:13:56.6952233Z mov.u32 %r2517, %ctaid.x; 2026-02-21T10:13:56.6952485Z ld.param.b64 %rd9, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T10:13:56.6952751Z mov.u32 %r578, %ctaid.y; 2026-02-21T10:13:56.6952997Z ld.param.b64 %rd53, [_helion_matmul_bf16_int4_param_3]; 2026-02-21T10:13:56.6953264Z mov.u32 %r579, %ctaid.z; 2026-02-21T10:13:56.6953828Z [2829s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T10:13:56.6955573Z Config: @helion.kernel(config=helion.Config(block_sizes=[16, 128, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[16], load_eviction_policies=['', ''], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=64, num_stages=1, num_warps=4, pid_type='persistent_interleaved', range_flattens=[True, True], range_multi_buffers=[None, True], range_num_stages=[4, 2], range_unroll_factors=[1, 0], range_warp_specializes=[]), static_shapes=True) 2026-02-21T10:13:56.6957737Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T10:13:56.6958055Z `ptxas` stderr: 2026-02-21T10:13:56.6958687Z ptxas fatal : (C7602) Insufficient registers (64) to compile instruction at line 1007 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T10:13:56.6959463Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:13:56.6959655Z 2026-02-21T10:13:56.6960175Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpj8sj081l.ptx -o /tmp/tmpj8sj081l.ptx.o 2026-02-21T10:13:56.6960764Z 2026-02-21T10:13:56.6960920Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T10:13:56.6961232Z mov.u32 %r580, %nctaid.x; 2026-02-21T10:13:56.6961514Z mov.u32 %r581, %nctaid.y; 2026-02-21T10:13:56.6961714Z mad.lo.s32 %r582, %r579, %r581, %r578; 2026-02-21T10:13:56.6961935Z mad.lo.s32 %r583, %r582, %r580, %r2517; 2026-02-21T10:13:56.6962136Z shl.b32 %r584, %r583, 7; 2026-02-21T10:13:56.6962312Z cvt.s64.s32 %rd54, %r584; 2026-02-21T10:13:56.6962489Z add.s64 %rd23, %rd53, %rd54; 2026-02-21T10:13:56.6962674Z mov.u32 %r2, %tid.x; 2026-02-21T10:13:56.6962838Z setp.lt.u32 %p1, %r2, 32; 2026-02-21T10:13:56.6963013Z shl.b32 %r585, %r2, 2; 2026-02-21T10:13:56.6963193Z mov.b32 %r586, global_smem; 2026-02-21T10:13:56.6963372Z add.s32 %r504, %r586, %r585; 2026-02-21T10:13:56.6963550Z mov.b32 %r2377, 0; 2026-02-21T10:13:56.6963703Z // begin inline asm 2026-02-21T10:13:56.6963886Z @%p1 st.shared.b32 [ %r504 + 0 ], %r2377; 2026-02-21T10:13:56.6964094Z // end inline asm 2026-02-21T10:13:56.6964255Z bar.warp.sync -1; 2026-02-21T10:13:56.6964418Z setp.eq.b32 %p61, %r2, 0; 2026-02-21T10:13:56.6964606Z cvt.u64.u32 %rd8, %r586; 2026-02-21T10:13:56.6964782Z // begin inline asm 2026-02-21T10:13:56.6965106Z @%p61 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd8 + 0 ], %rd9; 2026-02-21T10:13:56.6965467Z // end inline asm 2026-02-21T10:13:56.6965618Z // begin inline asm 2026-02-21T10:13:56.6965888Z @%p61 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1; 2026-02-21T10:13:56.6966196Z // end inline asm 2026-02-21T10:13:56.6966355Z mov.b32 %r506, 128; 2026-02-21T10:13:56.6966659Z // begin inline asm 2026-02-21T10:13:56.6966951Z @%p61 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r506; 2026-02-21T10:13:56.6967280Z // end inline asm 2026-02-21T10:13:56.6967448Z mov.b32 %r507, 16; 2026-02-21T10:13:56.6967611Z // begin inline asm 2026-02-21T10:13:56.6967886Z @%p61 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r507; 2026-02-21T10:13:56.6968213Z // end inline asm 
2026-02-21T10:13:56.6968365Z mov.b32 %r508, 1280; 2026-02-21T10:13:56.6968531Z // begin inline asm 2026-02-21T10:13:56.6968816Z @%p61 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r508; 2026-02-21T10:13:56.6969160Z // end inline asm 2026-02-21T10:13:56.6969316Z mov.b32 %r509, 4096; 2026-02-21T10:13:56.6969471Z // begin inline asm 2026-02-21T10:13:56.6969760Z @%p61 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r509; 2026-02-21T10:13:56.6970098Z // end inline asm 2026-02-21T10:13:56.6970252Z mov.b64 %rd16, 1280; 2026-02-21T10:13:56.6970424Z // begin inline asm 2026-02-21T10:13:56.6970737Z @%p61 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd8 + 0 ], 0x0, %rd16; 2026-02-21T10:13:56.6971088Z // end inline asm 2026-02-21T10:13:56.6971407Z mov.b32 %r2376, 1; 2026-02-21T10:13:56.6971563Z // begin inline asm 2026-02-21T10:13:56.6971870Z @%p61 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r2376; 2026-02-21T10:13:56.6972231Z // end inline asm 2026-02-21T10:13:56.6972396Z // begin inline asm 2026-02-21T10:13:56.6972707Z @%p61 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r2376; 2026-02-21T10:13:56.6973059Z // end inline asm 2026-02-21T10:13:56.6973215Z // begin inline asm 2026-02-21T10:13:56.6973498Z @%p61 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T10:13:56.6973827Z // end inline asm 2026-02-21T10:13:56.6973984Z // begin inline asm 2026-02-21T10:13:56.6974361Z @%p61 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T10:13:56.6974719Z // end inline asm 2026-02-21T10:13:56.6974862Z // begin inline asm 2026-02-21T10:13:56.6975147Z @%p61 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x3; 2026-02-21T10:13:56.6975488Z // end inline asm 2026-02-21T10:13:56.6975640Z // begin inline asm 2026-02-21T10:13:56.6975909Z @%p61 
tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T10:13:56.6976309Z // end inline asm 2026-02-21T10:13:56.6976600Z // begin inline asm 2026-02-21T10:13:56.6977048Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd23 + 0 ], [ %rd8 + 0 ], 0x80; 2026-02-21T10:13:56.6977527Z // end inline asm 2026-02-21T10:13:56.6977674Z // begin inline asm 2026-02-21T10:13:56.6977926Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd23 + 0 ], 0x80; 2026-02-21T10:13:56.6978242Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T10:13:56.6978467Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T10:13:56.6978681Z // end inline asm 2026-02-21T10:13:56.6978830Z bar.sync 0; 2026-02-21T10:13:56.6978995Z cvta.global.u64 %rd34, %rd23; 2026-02-21T10:13:56.6979354Z .loc 1 38 45 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:38:45 2026-02-21T10:13:56.6979728Z bfe.u32 %r5, %r2, 3, 4; 2026-02-21T10:13:56.6979913Z or.b32 %r6, %r5, 16; 2026-02-21T10:13:56.6980075Z or.b32 %r7, %r5, 32; 2026-02-21T10:13:56.6980238Z or.b32 %r8, %r5, 48; 2026-02-21T10:13:56.6980391Z or.b32 %r9, %r5, 64; 2026-02-21T10:13:56.6980549Z or.b32 %r10, %r5, 80; 2026-02-21T10:13:56.6980708Z or.b32 %r11, %r5, 96; 2026-02-21T10:13:56.6980878Z or.b32 %r12, %r5, 112; 2026-02-21T10:13:56.6981207Z .loc 1 26 112 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:26:112 2026-02-21T10:13:56.6981585Z sub.s32 %r589, 5120, %r2517; 2026-02-21T10:13:56.6981772Z mul.hi.s32 %r590, %r589, 1041204193; 2026-02-21T10:13:56.6981982Z shr.u32 %r591, %r590, 31; 2026-02-21T10:13:56.6982164Z shr.s32 %r592, %r590, 11; 2026-02-21T10:13:56.6982339Z add.s32 %r30, %r592, %r591; 2026-02-21T10:13:56.6982522Z mul.lo.s32 %r593, %r30, 8448; 2026-02-21T10:13:56.6982725Z setp.ne.b32 %p28, %r589, %r593; 2026-02-21T10:13:56.6982932Z setp.lt.u32 %p29, %r2517, 5121; 2026-02-21T10:13:56.6983128Z and.pred %p30, %p29, %p28; 2026-02-21T10:13:56.6983322Z selp.b32 %r31, 1, 0, 
%p30; 2026-02-21T10:13:56.6983497Z add.s32 %r32, %r30, %r31; 2026-02-21T10:13:56.6983828Z .loc 1 53 38 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:53:38 2026-02-21T10:13:56.6984198Z and.b32 %r33, %r2, 7; 2026-02-21T10:13:56.6984368Z shl.b32 %r34, %r33, 2; 2026-02-21T10:13:56.6984691Z .loc 1 26 112 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:26:112 2026-02-21T10:13:56.6985049Z add.s32 %r512, %r586, 47104; 2026-02-21T10:13:56.6985235Z // begin inline asm 2026-02-21T10:13:56.6985435Z @%p61 mbarrier.init.shared::cta.b64 [%r512], 1; 2026-02-21T10:13:56.6985668Z // end inline asm 2026-02-21T10:13:56.6985818Z bar.sync 0; 2026-02-21T10:13:56.6985971Z add.s32 %r513, %r586, 47112; 2026-02-21T10:13:56.6986310Z // begin inline asm 2026-02-21T10:13:56.6986630Z @%p61 mbarrier.init.shared::cta.b64 [%r513], 1; 2026-02-21T10:13:56.6986869Z // end inline asm 2026-02-21T10:13:56.6987012Z bar.sync 0; 2026-02-21T10:13:56.6987160Z add.s32 %r514, %r586, 47120; 2026-02-21T10:13:56.6987336Z // begin inline asm 2026-02-21T10:13:56.6987526Z @%p61 mbarrier.init.shared::cta.b64 [%r514], 1; 2026-02-21T10:13:56.6987746Z // end inline asm 2026-02-21T10:13:56.6987918Z setp.lt.s32 %p31, %r32, 1; 2026-02-21T10:13:56.6988096Z setp.gt.s32 %p32, %r32, 0; 2026-02-21T10:13:56.6988428Z .loc 1 33 33 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:33:33 2026-02-21T10:13:56.6988889Z shr.u32 %r594, %r2517, 9; 2026-02-21T10:13:56.6989067Z and.b32 %r595, %r594, 4194288; 2026-02-21T10:13:56.6989482Z .loc 1 34 39 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:34:39 2026-02-21T10:13:56.6989837Z sub.s32 %r596, 10, %r595; 2026-02-21T10:13:56.6990153Z .loc 1 35 45 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:35:45 2026-02-21T10:13:56.6990508Z and.b32 %r597, %r2517, 8191; 2026-02-21T10:13:56.6990847Z .loc 1 36 51 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:36:51 2026-02-21T10:13:56.6991299Z div.s32 %r598, %r597, %r596; 
2026-02-21T10:13:56.6991616Z .loc 1 35 64 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:35:64 2026-02-21T10:13:56.6991987Z mul.lo.s32 %r599, %r598, %r596; 2026-02-21T10:13:56.6992178Z sub.s32 %r600, %r597, %r599; 2026-02-21T10:13:56.6992511Z .loc 1 35 30 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:35:30 2026-02-21T10:13:56.6992871Z add.s32 %r601, %r600, %r595; 2026-02-21T10:13:56.6993196Z .loc 1 37 27 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:37:27 2026-02-21T10:13:56.6993552Z shl.b32 %r2374, %r601, 7; 2026-02-21T10:13:56.6993862Z .loc 1 39 27 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:39:27 2026-02-21T10:13:56.6994223Z shl.b32 %r2372, %r598, 7; 2026-02-21T10:13:56.6994529Z .loc 1 40 32 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:40:32 2026-02-21T10:13:56.6994886Z or.b32 %r2518, %r2372, %r5; 2026-02-21T10:13:56.6995064Z or.b32 %r2519, %r2372, %r6; 2026-02-21T10:13:56.6995242Z or.b32 %r2520, %r2372, %r7; 2026-02-21T10:13:56.6995418Z or.b32 %r2521, %r2372, %r8; 2026-02-21T10:13:56.6995601Z or.b32 %r2522, %r2372, %r9; 2026-02-21T10:13:56.6995783Z or.b32 %r2523, %r2372, %r10; 2026-02-21T10:13:56.7003837Z or.b32 %r2524, %r2372, %r11; 2026-02-21T10:13:56.7004131Z or.b32 %r2525, %r2372, %r12; 2026-02-21T10:13:56.7004519Z .loc 1 54 53 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:54:53 2026-02-21T10:13:56.7004921Z shl.b32 %r602, %r2518, 13; 2026-02-21T10:13:56.7005126Z shl.b32 %r603, %r2519, 13; 2026-02-21T10:13:56.7005313Z shl.b32 %r604, %r2520, 13; 2026-02-21T10:13:56.7005509Z shl.b32 %r605, %r2521, 13; 2026-02-21T10:13:56.7005691Z shl.b32 %r606, %r2522, 13; 2026-02-21T10:13:56.7005873Z shl.b32 %r607, %r2523, 13; 2026-02-21T10:13:56.7006051Z shl.b32 %r608, %r2524, 13; 2026-02-21T10:13:56.7006228Z shl.b32 %r609, %r2525, 13; 2026-02-21T10:13:56.7006762Z .loc 1 54 60 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:54:60 2026-02-21T10:13:56.7007139Z or.b32 %r610, %r602, %r34; 
2026-02-21T10:13:56.7007319Z or.b32 %r611, %r603, %r34; 2026-02-21T10:13:56.7007495Z or.b32 %r612, %r604, %r34; 2026-02-21T10:13:56.7007676Z or.b32 %r613, %r605, %r34; 2026-02-21T10:13:56.7007848Z or.b32 %r614, %r606, %r34; 2026-02-21T10:13:56.7008024Z or.b32 %r615, %r607, %r34; 2026-02-21T10:13:56.7008200Z or.b32 %r616, %r608, %r34; 2026-02-21T10:13:56.7008382Z or.b32 %r617, %r609, %r34; 2026-02-21T10:13:56.7008709Z .loc 1 54 32 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:54:32 2026-02-21T10:13:56.7009357Z mad.wide.s32 %rd26, %r610, 2, %rd6; 2026-02-21T10:13:56.7009595Z mad.wide.s32 %rd27, %r611, 2, %rd6; 2026-02-21T10:13:56.7009814Z mad.wide.s32 %rd28, %r612, 2, %rd6; 2026-02-21T10:13:56.7010031Z mad.wide.s32 %rd29, %r613, 2, %rd6; 2026-02-21T10:13:56.7010246Z mad.wide.s32 %rd30, %r614, 2, %rd6; 2026-02-21T10:13:56.7010453Z mad.wide.s32 %rd31, %r615, 2, %rd6; 2026-02-21T10:13:56.7010657Z mad.wide.s32 %rd32, %r616, 2, %rd6; 2026-02-21T10:13:56.7010857Z mad.wide.s32 %rd33, %r617, 2, %rd6; 2026-02-21T10:13:56.7011233Z .loc 1 54 80 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:54:80 2026-02-21T10:13:56.7011606Z and.b32 %r45, %r2, 127; 2026-02-21T10:13:56.7011794Z shl.b32 %r618, %r45, 3; 2026-02-21T10:13:56.7012057Z shr.u32 %r619, %r2, 1; 2026-02-21T10:13:56.7012243Z and.b32 %r620, %r619, 24; 2026-02-21T10:13:56.7012424Z xor.b32 %r46, %r618, %r620; 2026-02-21T10:13:56.7012613Z add.s32 %r515, %r586, %r46; 2026-02-21T10:13:56.7012817Z selp.b32 %r516, 8, 0, %p32; 2026-02-21T10:13:56.7012995Z // begin inline asm 2026-02-21T10:13:56.7013248Z cp.async.ca.shared.global [ %r515 + 0 ], [ %rd26 + 0 ], 0x8, %r516; 2026-02-21T10:13:56.7013536Z // end inline asm 2026-02-21T10:13:56.7013793Z add.s32 %r517, %r515, 1024; 2026-02-21T10:13:56.7013976Z // begin inline asm 2026-02-21T10:13:56.7014215Z cp.async.ca.shared.global [ %r517 + 0 ], [ %rd27 + 0 ], 0x8, %r516; 2026-02-21T10:13:56.7014499Z // end inline asm 2026-02-21T10:13:56.7014662Z add.s32 %r519, 
%r515, 2048; 2026-02-21T10:13:56.7014838Z // begin inline asm 2026-02-21T10:13:56.7015090Z cp.async.ca.shared.global [ %r519 + 0 ], [ %rd28 + 0 ], 0x8, %r516; 2026-02-21T10:13:56.7015367Z // end inline asm 2026-02-21T10:13:56.7015521Z add.s32 %r521, %r515, 3072; 2026-02-21T10:13:56.7015705Z // begin inline asm 2026-02-21T10:13:56.7015933Z cp.async.ca.shared.global [ %r521 + 0 ], [ %rd29 + 0 ], 0x8, %r516; 2026-02-21T10:13:56.7016222Z // end inline asm 2026-02-21T10:13:56.7016384Z add.s32 %r523, %r515, 4096; 2026-02-21T10:13:56.7016703Z // begin inline asm 2026-02-21T10:13:56.7016930Z cp.async.ca.shared.global [ %r523 + 0 ], [ %rd30 + 0 ], 0x8, %r516; 2026-02-21T10:13:56.7017208Z // end inline asm 2026-02-21T10:13:56.7017369Z add.s32 %r525, %r515, 5120; 2026-02-21T10:13:56.7017545Z // begin inline asm 2026-02-21T10:13:56.7017777Z cp.async.ca.shared.global [ %r525 + 0 ], [ %rd31 + 0 ], 0x8, %r516; 2026-02-21T10:13:56.7018058Z // end inline asm 2026-02-21T10:13:56.7018223Z add.s32 %r527, %r515, 6144; 2026-02-21T10:13:56.7018399Z // begin inline asm 2026-02-21T10:13:56.7018629Z cp.async.ca.shared.global [ %r527 + 0 ], [ %rd32 + 0 ], 0x8, %r516; 2026-02-21T10:13:56.7018897Z // end inline asm 2026-02-21T10:13:56.7019062Z add.s32 %r529, %r515, 7168; 2026-02-21T10:13:56.7019232Z // begin inline asm 2026-02-21T10:13:56.7019465Z cp.async.ca.shared.global [ %r529 + 0 ], [ %rd33 + 0 ], 0x8, %r516; 2026-02-21T10:13:56.7019739Z // end inline asm 2026-02-21T10:13:56.7019902Z cp.async.commit_group; 2026-02-21T10:13:56.7020259Z .loc 1 26 112 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:26:112 2026-02-21T10:13:56.7020641Z bar.sync 0; 2026-02-21T10:13:56.7020811Z and.pred %p22, %p61, %p32; 2026-02-21T10:13:56.7020997Z // begin inline asm 2026-02-21T10:13:56.7021237Z @%p22 mbarrier.arrive.expect_tx.shared.b64 _, [%r512], 2048; 2026-02-21T10:13:56.7021506Z // end inline asm 2026-02-21T10:13:56.7021815Z .loc 1 60 33 // 
ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:60:33 2026-02-21T10:13:56.7022178Z bar.sync 0; 2026-02-21T10:13:56.7022338Z elect.sync %r621|%p33, -1; 2026-02-21T10:13:56.7022533Z and.pred %p34, %p32, %p33; 2026-02-21T10:13:56.7022720Z and.pred %p23, %p1, %p34; 2026-02-21T10:13:56.7022913Z add.s32 %r532, %r586, 40960; 2026-02-21T10:13:56.7023097Z // begin inline asm 2026-02-21T10:13:56.7023534Z @%p23 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r532], [%rd34, {%r2374, %r2377}], [%r512]; 2026-02-21T10:13:56.7024173Z // end inline asm 2026-02-21T10:13:56.7024477Z .loc 1 54 32 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:54:32 2026-02-21T10:13:56.7024849Z cvt.s64.s32 %rd55, %r602; 2026-02-21T10:13:56.7025026Z cvt.u64.u32 %rd56, %r34; 2026-02-21T10:13:56.7025208Z or.b64 %rd57, %rd55, %rd56; 2026-02-21T10:13:56.7025388Z shl.b64 %rd58, %rd57, 1; 2026-02-21T10:13:56.7025566Z add.s64 %rd59, %rd6, %rd58; 2026-02-21T10:13:56.7025750Z add.s64 %rd35, %rd59, 64; 2026-02-21T10:13:56.7025930Z cvt.s64.s32 %rd60, %r603; 2026-02-21T10:13:56.7026112Z or.b64 %rd61, %rd60, %rd56; 2026-02-21T10:13:56.7026288Z shl.b64 %rd62, %rd61, 1; 2026-02-21T10:13:56.7026604Z add.s64 %rd63, %rd6, %rd62; 2026-02-21T10:13:56.7026891Z add.s64 %rd36, %rd63, 64; 2026-02-21T10:13:56.7027076Z cvt.s64.s32 %rd64, %r604; 2026-02-21T10:13:56.7027264Z or.b64 %rd65, %rd64, %rd56; 2026-02-21T10:13:56.7027452Z shl.b64 %rd66, %rd65, 1; 2026-02-21T10:13:56.7027623Z add.s64 %rd67, %rd6, %rd66; 2026-02-21T10:13:56.7027808Z add.s64 %rd37, %rd67, 64; 2026-02-21T10:13:56.7027979Z cvt.s64.s32 %rd68, %r605; 2026-02-21T10:13:56.7028161Z or.b64 %rd69, %rd68, %rd56; 2026-02-21T10:13:56.7028430Z shl.b64 %rd70, %rd69, 1; 2026-02-21T10:13:56.7028693Z add.s64 %rd71, %rd6, %rd70; 2026-02-21T10:13:56.7028883Z add.s64 %rd38, %rd71, 64; 2026-02-21T10:13:56.7029057Z cvt.s64.s32 %rd72, %r606; 2026-02-21T10:13:56.7029238Z or.b64 %rd73, %rd72, %rd56; 2026-02-21T10:13:56.7029414Z 
shl.b64 %rd74, %rd73, 1; 2026-02-21T10:13:56.7029592Z add.s64 %rd75, %rd6, %rd74; 2026-02-21T10:13:56.7029767Z add.s64 %rd39, %rd75, 64; 2026-02-21T10:13:56.7029945Z cvt.s64.s32 %rd76, %r607; 2026-02-21T10:13:56.7030124Z or.b64 %rd77, %rd76, %rd56; 2026-02-21T10:13:56.7030308Z shl.b64 %rd78, %rd77, 1; 2026-02-21T10:13:56.7030487Z add.s64 %rd79, %rd6, %rd78; 2026-02-21T10:13:56.7030664Z add.s64 %rd40, %rd79, 64; 2026-02-21T10:13:56.7030842Z cvt.s64.s32 %rd80, %r608; 2026-02-21T10:13:56.7031012Z or.b64 %rd81, %rd80, %rd56; 2026-02-21T10:13:56.7031198Z shl.b64 %rd82, %rd81, 1; 2026-02-21T10:13:56.7031373Z add.s64 %rd83, %rd6, %rd82; 2026-02-21T10:13:56.7031558Z add.s64 %rd41, %rd83, 64; 2026-02-21T10:13:56.7031732Z cvt.s64.s32 %rd84, %r609; 2026-02-21T10:13:56.7031911Z or.b64 %rd85, %rd84, %rd56; 2026-02-21T10:13:56.7032089Z shl.b64 %rd86, %rd85, 1; 2026-02-21T10:13:56.7032265Z add.s64 %rd87, %rd6, %rd86; 2026-02-21T10:13:56.7032449Z add.s64 %rd42, %rd87, 64; 2026-02-21T10:13:56.7032784Z .loc 1 54 80 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:54:80 2026-02-21T10:13:56.7033164Z add.s32 %r536, %r515, 8192; 2026-02-21T10:13:56.7033351Z // begin inline asm 2026-02-21T10:13:56.7033602Z cp.async.ca.shared.global [ %r536 + 0 ], [ %rd35 + 0 ], 0x8, %r516; 2026-02-21T10:13:56.7033883Z // end inline asm 2026-02-21T10:13:56.7034049Z add.s32 %r538, %r515, 9216; 2026-02-21T10:13:56.7034230Z // begin inline asm 2026-02-21T10:13:56.7034472Z cp.async.ca.shared.global [ %r538 + 0 ], [ %rd36 + 0 ], 0x8, %r516; 2026-02-21T10:13:56.7034751Z // end inline asm 2026-02-21T10:13:56.7034908Z add.s32 %r540, %r515, 10240; 2026-02-21T10:13:56.7035094Z // begin inline asm 2026-02-21T10:13:56.7035336Z cp.async.ca.shared.global [ %r540 + 0 ], [ %rd37 + 0 ], 0x8, %r516; 2026-02-21T10:13:56.7035615Z // end inline asm 2026-02-21T10:13:56.7035773Z add.s32 %r542, %r515, 11264; 2026-02-21T10:13:56.7035955Z // begin inline asm 2026-02-21T10:13:56.7036185Z cp.async.ca.shared.global [ %r542 
+ 0 ], [ %rd38 + 0 ], 0x8, %r516; 2026-02-21T10:13:56.7036594Z // end inline asm 2026-02-21T10:13:56.7036766Z add.s32 %r544, %r515, 12288; 2026-02-21T10:13:56.7036943Z // begin inline asm 2026-02-21T10:13:56.7037174Z cp.async.ca.shared.global [ %r544 + 0 ], [ %rd39 + 0 ], 0x8, %r516; 2026-02-21T10:13:56.7037443Z // end inline asm 2026-02-21T10:13:56.7037599Z add.s32 %r546, %r515, 13312; 2026-02-21T10:13:56.7037930Z // begin inline asm 2026-02-21T10:13:56.7038164Z cp.async.ca.shared.global [ %r546 + 0 ], [ %rd40 + 0 ], 0x8, %r516; 2026-02-21T10:13:56.7038431Z // end inline asm 2026-02-21T10:13:56.7038600Z add.s32 %r548, %r515, 14336; 2026-02-21T10:13:56.7038787Z // begin inline asm 2026-02-21T10:13:56.7039031Z cp.async.ca.shared.global [ %r548 + 0 ], [ %rd41 + 0 ], 0x8, %r516; 2026-02-21T10:13:56.7039312Z // end inline asm 2026-02-21T10:13:56.7039470Z add.s32 %r550, %r515, 15360; 2026-02-21T10:13:56.7039653Z // begin inline asm 2026-02-21T10:13:56.7039883Z cp.async.ca.shared.global [ %r550 + 0 ], [ %rd42 + 0 ], 0x8, %r516; 2026-02-21T10:13:56.7040163Z // end inline asm 2026-02-21T10:13:56.7040319Z cp.async.commit_group; 2026-02-21T10:13:56.7040728Z .loc 1 26 112 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:26:112 2026-02-21T10:13:56.7041116Z bar.sync 0; 2026-02-21T10:13:56.7041269Z // begin inline asm 2026-02-21T10:13:56.7041513Z @%p22 mbarrier.arrive.expect_tx.shared.b64 _, [%r513], 2048; 2026-02-21T10:13:56.7041787Z // end inline asm 2026-02-21T10:13:56.7042095Z .loc 1 60 33 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:60:33 2026-02-21T10:13:56.7042452Z bar.sync 0; 2026-02-21T10:13:56.7042684Z elect.sync %r622|%p35, -1; 2026-02-21T10:13:56.7042890Z and.pred %p36, %p32, %p35; 2026-02-21T10:13:56.7043095Z and.pred %p25, %p1, %p36; 2026-02-21T10:13:56.7043290Z add.s32 %r553, %r586, 43008; 2026-02-21T10:13:56.7043469Z // begin inline asm 2026-02-21T10:13:56.7043892Z @%p25 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r553], [%rd34, {%r2374, %r507}], [%r513]; 2026-02-21T10:13:56.7044352Z // end inline asm 2026-02-21T10:13:56.7044662Z .loc 1 54 32 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:54:32 2026-02-21T10:13:56.7045024Z add.s64 %rd44, %rd59, 128; 2026-02-21T10:13:56.7045209Z add.s64 %rd45, %rd63, 128; 2026-02-21T10:13:56.7045409Z add.s64 %rd46, %rd67, 128; 2026-02-21T10:13:56.7045579Z add.s64 %rd47, %rd71, 128; 2026-02-21T10:13:56.7045756Z add.s64 %rd48, %rd75, 128; 2026-02-21T10:13:56.7045929Z add.s64 %rd49, %rd79, 128; 2026-02-21T10:13:56.7046103Z add.s64 %rd50, %rd83, 128; 2026-02-21T10:13:56.7046277Z add.s64 %rd51, %rd87, 128; 2026-02-21T10:13:56.7046731Z .loc 1 54 80 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:54:80 2026-02-21T10:13:56.7047094Z add.s32 %r557, %r515, 16384; 2026-02-21T10:13:56.7047288Z // begin inline asm 2026-02-21T10:13:56.7047528Z cp.async.ca.shared.global [ %r557 + 0 ], [ %rd44 + 0 ], 0x8, %r516; 2026-02-21T10:13:56.7047801Z // end inline asm 2026-02-21T10:13:56.7047958Z add.s32 %r559, %r515, 17408; 2026-02-21T10:13:56.7048132Z // begin inline asm 2026-02-21T10:13:56.7048367Z cp.async.ca.shared.global [ %r559 + 0 ], [ %rd45 + 0 ], 0x8, %r516; 2026-02-21T10:13:56.7048637Z // end inline asm 2026-02-21T10:13:56.7048794Z add.s32 %r561, %r515, 18432; 2026-02-21T10:13:56.7048969Z // begin inline asm 2026-02-21T10:13:56.7049197Z cp.async.ca.shared.global [ %r561 + 0 ], [ %rd46 + 0 ], 0x8, %r516; 2026-02-21T10:13:56.7049471Z // end inline asm 2026-02-21T10:13:56.7049634Z add.s32 %r563, %r515, 19456; 2026-02-21T10:13:56.7049819Z // begin inline asm 2026-02-21T10:13:56.7050040Z cp.async.ca.shared.global [ %r563 + 0 ], [ %rd47 + 0 ], 0x8, %r516; 2026-02-21T10:13:56.7050312Z // end inline asm 2026-02-21T10:13:56.7050475Z add.s32 %r565, %r515, 20480; 2026-02-21T10:13:56.7050651Z // begin inline asm 2026-02-21T10:13:56.7050878Z cp.async.ca.shared.global [ 
%r565 + 0 ], [ %rd48 + 0 ], 0x8, %r516; 2026-02-21T10:13:56.7051142Z // end inline asm 2026-02-21T10:13:56.7051291Z add.s32 %r567, %r515, 21504; 2026-02-21T10:13:56.7051460Z // begin inline asm 2026-02-21T10:13:56.7051683Z cp.async.ca.shared.global [ %r567 + 0 ], [ %rd49 + 0 ], 0x8, %r516; 2026-02-21T10:13:56.7051969Z // end inline asm 2026-02-21T10:13:56.7052123Z add.s32 %r569, %r515, 22528; 2026-02-21T10:13:56.7052455Z // begin inline asm 2026-02-21T10:13:56.7052675Z cp.async.ca.shared.global [ %r569 + 0 ], [ %rd50 + 0 ], 0x8, %r516; 2026-02-21T10:13:56.7052947Z // end inline asm 2026-02-21T10:13:56.7053096Z add.s32 %r571, %r515, 23552; 2026-02-21T10:13:56.7053279Z // begin inline asm 2026-02-21T10:13:56.7053501Z cp.async.ca.shared.global [ %r571 + 0 ], [ %rd51 + 0 ], 0x8, %r516; 2026-02-21T10:13:56.7053776Z // end inline asm 2026-02-21T10:13:56.7053936Z cp.async.commit_group; 2026-02-21T10:13:56.7054267Z .loc 1 26 112 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:26:112 2026-02-21T10:13:56.7054630Z bar.sync 0; 2026-02-21T10:13:56.7054771Z // begin inline asm 2026-02-21T10:13:56.7054993Z @%p22 mbarrier.arrive.expect_tx.shared.b64 _, [%r514], 2048; 2026-02-21T10:13:56.7055372Z // end inline asm 2026-02-21T10:13:56.7055672Z .loc 1 60 33 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:60:33 2026-02-21T10:13:56.7056024Z bar.sync 0; 2026-02-21T10:13:56.7056182Z elect.sync %r623|%p37, -1; 2026-02-21T10:13:56.7056371Z and.pred %p38, %p32, %p37; 2026-02-21T10:13:56.7056679Z and.pred %p27, %p1, %p38; 2026-02-21T10:13:56.7056869Z add.s32 %r574, %r586, 45056; 2026-02-21T10:13:56.7057635Z mov.b32 %r2378, 32; 2026-02-21T10:13:56.7057829Z // begin inline asm 2026-02-21T10:13:56.7058255Z @%p27 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r574], [%rd34, {%r2374, %r2378}], [%r514]; 2026-02-21T10:13:56.7058720Z // end inline asm 2026-02-21T10:13:56.7059037Z .loc 1 26 112 // 
ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:26:112 2026-02-21T10:13:56.7059411Z @%p31 bra $L__BB0_7; 2026-02-21T10:13:56.7059598Z // %bb.1: // %.lr.ph 2026-02-21T10:13:56.7059988Z .loc 1 0 112 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:0:112 2026-02-21T10:13:56.7060417Z ld.param.b64 %rd7, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T10:13:56.7060679Z shr.u32 %r3, %r2, 5; 2026-02-21T10:13:56.7060856Z and.b32 %r4, %r2, 120; 2026-02-21T10:13:56.7061024Z shr.u32 %r587, %r2, 4; 2026-02-21T10:13:56.7061205Z bfe.u32 %r13, %r2, 4, 3; 2026-02-21T10:13:56.7061390Z or.b32 %r14, %r13, 8; 2026-02-21T10:13:56.7061570Z or.b32 %r15, %r13, 16; 2026-02-21T10:13:56.7061734Z or.b32 %r16, %r13, 24; 2026-02-21T10:13:56.7061892Z or.b32 %r17, %r13, 32; 2026-02-21T10:13:56.7062059Z or.b32 %r18, %r13, 40; 2026-02-21T10:13:56.7062221Z or.b32 %r19, %r13, 48; 2026-02-21T10:13:56.7062390Z or.b32 %r20, %r587, 56; 2026-02-21T10:13:56.7062555Z or.b32 %r21, %r13, 64; 2026-02-21T10:13:56.7062721Z or.b32 %r22, %r13, 72; 2026-02-21T10:13:56.7062877Z or.b32 %r23, %r13, 80; 2026-02-21T10:13:56.7063055Z or.b32 %r24, %r13, 88; 2026-02-21T10:13:56.7063220Z or.b32 %r25, %r13, 96; 2026-02-21T10:13:56.7063387Z or.b32 %r26, %r13, 104; 2026-02-21T10:13:56.7063560Z or.b32 %r27, %r13, 112; 2026-02-21T10:13:56.7063730Z or.b32 %r28, %r587, 120; 2026-02-21T10:13:56.7063907Z shl.b32 %r588, %r2, 3; 2026-02-21T10:13:56.7064070Z and.b32 %r29, %r588, 120; 2026-02-21T10:13:56.7064406Z .loc 1 26 112 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:26:112 2026-02-21T10:13:56.7064767Z shl.b32 %r630, %r32, 8; 2026-02-21T10:13:56.7064940Z add.s32 %r47, %r630, -3; 2026-02-21T10:13:56.7065108Z shl.b32 %r631, %r2, 5; 2026-02-21T10:13:56.7065281Z and.b32 %r632, %r631, 3072; 2026-02-21T10:13:56.7065460Z shl.b32 %r633, %r2, 4; 2026-02-21T10:13:56.7065626Z and.b32 %r634, %r633, 448; 2026-02-21T10:13:56.7065801Z and.b32 %r635, %r2, 3; 2026-02-21T10:13:56.7065960Z shl.b32 %r636, %r635, 1; 
2026-02-21T10:13:56.7066130Z and.b32 %r637, %r2, 24; 2026-02-21T10:13:56.7066293Z or.b32 %r638, %r632, %r634; 2026-02-21T10:13:56.7066616Z or.b32 %r639, %r636, %r637; 2026-02-21T10:13:56.7066810Z or.b32 %r48, %r638, %r639; 2026-02-21T10:13:56.7066987Z xor.b32 %r49, %r48, 8; 2026-02-21T10:13:56.7067455Z xor.b32 %r50, %r48, 16; 2026-02-21T10:13:56.7067628Z xor.b32 %r51, %r48, 24; 2026-02-21T10:13:56.7067796Z shl.b32 %r640, %r45, 7; 2026-02-21T10:13:56.7067955Z shl.b32 %r641, %r33, 4; 2026-02-21T10:13:56.7068119Z or.b32 %r642, %r640, %r641; 2026-02-21T10:13:56.7068293Z add.s32 %r644, %r586, 24576; 2026-02-21T10:13:56.7068541Z add.s32 %r52, %r644, %r642; 2026-02-21T10:13:56.7068719Z xor.b32 %r645, %r642, 16; 2026-02-21T10:13:56.7068894Z add.s32 %r53, %r644, %r645; 2026-02-21T10:13:56.7069063Z xor.b32 %r646, %r642, 32; 2026-02-21T10:13:56.7069238Z add.s32 %r54, %r644, %r646; 2026-02-21T10:13:56.7069407Z xor.b32 %r647, %r642, 48; 2026-02-21T10:13:56.7069579Z add.s32 %r55, %r644, %r647; 2026-02-21T10:13:56.7069755Z xor.b32 %r648, %r642, 64; 2026-02-21T10:13:56.7070021Z add.s32 %r56, %r644, %r648; 2026-02-21T10:13:56.7070200Z xor.b32 %r649, %r642, 80; 2026-02-21T10:13:56.7070365Z add.s32 %r57, %r644, %r649; 2026-02-21T10:13:56.7070539Z xor.b32 %r650, %r642, 96; 2026-02-21T10:13:56.7070709Z add.s32 %r58, %r644, %r650; 2026-02-21T10:13:56.7070888Z xor.b32 %r651, %r642, 112; 2026-02-21T10:13:56.7071058Z add.s32 %r59, %r644, %r651; 2026-02-21T10:13:56.7071237Z bfe.u32 %r652, %r644, 4, 14; 2026-02-21T10:13:56.7071414Z cvt.u64.u32 %rd88, %r652; 2026-02-21T10:13:56.7071676Z or.b64 %rd92, %rd88, 4611686293372403712; 2026-02-21T10:13:56.7071895Z add.s32 %r653, %r586, 24608; 2026-02-21T10:13:56.7072070Z bfe.u32 %r654, %r653, 4, 14; 2026-02-21T10:13:56.7072266Z cvt.u64.u32 %rd89, %r654; 2026-02-21T10:13:56.7072446Z or.b64 %rd93, %rd89, 4611686293372403712; 2026-02-21T10:13:56.7072659Z add.s32 %r655, %r586, 24640; 2026-02-21T10:13:56.7072835Z bfe.u32 %r656, %r655, 4, 14; 
2026-02-21T10:13:56.7073014Z cvt.u64.u32 %rd90, %r656; 2026-02-21T10:13:56.7073196Z or.b64 %rd94, %rd90, 4611686293372403712; 2026-02-21T10:13:56.7073407Z add.s32 %r657, %r586, 24672; 2026-02-21T10:13:56.7073586Z bfe.u32 %r658, %r657, 4, 14; 2026-02-21T10:13:56.7073760Z cvt.u64.u32 %rd91, %r658; 2026-02-21T10:13:56.7073950Z or.b64 %rd95, %rd91, 4611686293372403712; 2026-02-21T10:13:56.7074151Z shl.b32 %r659, %r635, 11; 2026-02-21T10:13:56.7074326Z shl.b32 %r660, %r635, 5; 2026-02-21T10:13:56.7074492Z shl.b32 %r661, %r4, 4; 2026-02-21T10:13:56.7074663Z and.b32 %r663, %r585, 16; 2026-02-21T10:13:56.7074836Z or.b32 %r664, %r661, %r663; 2026-02-21T10:13:56.7075016Z or.b32 %r665, %r664, %r659; 2026-02-21T10:13:56.7075185Z or.b32 %r666, %r665, %r660; 2026-02-21T10:13:56.7075361Z add.s32 %r60, %r644, %r666; 2026-02-21T10:13:56.7075536Z xor.b32 %r667, %r666, 32; 2026-02-21T10:13:56.7075702Z add.s32 %r61, %r644, %r667; 2026-02-21T10:13:56.7075876Z xor.b32 %r668, %r666, 64; 2026-02-21T10:13:56.7076053Z add.s32 %r62, %r644, %r668; 2026-02-21T10:13:56.7076231Z xor.b32 %r669, %r666, 96; 2026-02-21T10:13:56.7076400Z add.s32 %r63, %r644, %r669; 2026-02-21T10:13:56.7076708Z shl.b32 %r670, %r637, 8; 2026-02-21T10:13:56.7076877Z and.b32 %r671, %r585, 496; 2026-02-21T10:13:56.7077054Z or.b32 %r672, %r670, %r660; 2026-02-21T10:13:56.7077223Z xor.b32 %r673, %r672, %r671; 2026-02-21T10:13:56.7077404Z add.s32 %r2122, %r644, %r673; 2026-02-21T10:13:56.7077587Z add.s32 %r2127, %r2122, 512; 2026-02-21T10:13:56.7077761Z add.s32 %r2132, %r2122, 1024; 2026-02-21T10:13:56.7077945Z add.s32 %r2137, %r2122, 1536; 2026-02-21T10:13:56.7078119Z shl.b32 %r674, %r30, 8; 2026-02-21T10:13:56.7078293Z shl.b32 %r675, %r31, 8; 2026-02-21T10:13:56.7078457Z add.s32 %r68, %r674, %r675; 2026-02-21T10:13:56.7078636Z mov.b32 %r2384, 0f00000000; 2026-02-21T10:13:56.7078805Z mov.b32 %r2381, 2; 2026-02-21T10:13:56.7078981Z mov.b32 %r2380, -1; 2026-02-21T10:13:56.7079147Z mov.b32 %r2373, %r2372; 
2026-02-21T10:13:56.7079318Z mov.b32 %r2375, %r2374; 2026-02-21T10:13:56.7079491Z mov.b32 %r2379, %r2377; 2026-02-21T10:13:56.7079661Z mov.b32 %r2382, %r2372; 2026-02-21T10:13:56.7079832Z mov.b32 %r2383, %r2374; 2026-02-21T10:13:56.7079994Z mov.b32 %r2385, %r2384; 2026-02-21T10:13:56.7080329Z mov.b32 %r2386, %r2384; 2026-02-21T10:13:56.7080490Z mov.b32 %r2387, %r2384; 2026-02-21T10:13:56.7080656Z mov.b32 %r2388, %r2384; 2026-02-21T10:13:56.7080818Z mov.b32 %r2389, %r2384; 2026-02-21T10:13:56.7080985Z mov.b32 %r2390, %r2384; 2026-02-21T10:13:56.7081148Z mov.b32 %r2391, %r2384; 2026-02-21T10:13:56.7081316Z mov.b32 %r2392, %r2384; 2026-02-21T10:13:56.7081482Z mov.b32 %r2393, %r2384; 2026-02-21T10:13:56.7081642Z mov.b32 %r2394, %r2384; 2026-02-21T10:13:56.7081811Z mov.b32 %r2395, %r2384; 2026-02-21T10:13:56.7081973Z mov.b32 %r2396, %r2384; 2026-02-21T10:13:56.7082139Z mov.b32 %r2397, %r2384; 2026-02-21T10:13:56.7082301Z mov.b32 %r2398, %r2384; 2026-02-21T10:13:56.7082470Z mov.b32 %r2399, %r2384; 2026-02-21T10:13:56.7082631Z mov.b32 %r2400, %r2384; 2026-02-21T10:13:56.7082891Z mov.b32 %r2401, %r2384; 2026-02-21T10:13:56.7083061Z mov.b32 %r2402, %r2384; 2026-02-21T10:13:56.7083230Z mov.b32 %r2403, %r2384; 2026-02-21T10:13:56.7083396Z mov.b32 %r2404, %r2384; 2026-02-21T10:13:56.7083564Z mov.b32 %r2405, %r2384; 2026-02-21T10:13:56.7083733Z mov.b32 %r2406, %r2384; 2026-02-21T10:13:56.7083898Z mov.b32 %r2407, %r2384; 2026-02-21T10:13:56.7084063Z mov.b32 %r2408, %r2384; 2026-02-21T10:13:56.7084227Z mov.b32 %r2409, %r2384; 2026-02-21T10:13:56.7084458Z mov.b32 %r2410, %r2384; 2026-02-21T10:13:56.7084622Z mov.b32 %r2411, %r2384; 2026-02-21T10:13:56.7084784Z mov.b32 %r2412, %r2384; 2026-02-21T10:13:56.7084944Z mov.b32 %r2413, %r2384; 2026-02-21T10:13:56.7085110Z mov.b32 %r2414, %r2384; 2026-02-21T10:13:56.7085275Z mov.b32 %r2415, %r2384; 2026-02-21T10:13:56.7085450Z mov.b32 %r2416, %r2384; 2026-02-21T10:13:56.7085621Z mov.b32 %r2417, %r2384; 2026-02-21T10:13:56.7085778Z mov.b32 
%r2418, %r2384; 2026-02-21T10:13:56.7085945Z mov.b32 %r2419, %r2384; 2026-02-21T10:13:56.7086106Z mov.b32 %r2420, %r2384; 2026-02-21T10:13:56.7086271Z mov.b32 %r2421, %r2384; 2026-02-21T10:13:56.7086429Z mov.b32 %r2422, %r2384; 2026-02-21T10:13:56.7086722Z mov.b32 %r2423, %r2384; 2026-02-21T10:13:56.7086884Z mov.b32 %r2424, %r2384; 2026-02-21T10:13:56.7087050Z mov.b32 %r2425, %r2384; 2026-02-21T10:13:56.7087235Z mov.b32 %r2426, %r2384; 2026-02-21T10:13:56.7087397Z mov.b32 %r2427, %r2384; 2026-02-21T10:13:56.7087566Z mov.b32 %r2428, %r2384; 2026-02-21T10:13:56.7087725Z mov.b32 %r2429, %r2384; 2026-02-21T10:13:56.7087891Z mov.b32 %r2430, %r2384; 2026-02-21T10:13:56.7088053Z mov.b32 %r2431, %r2384; 2026-02-21T10:13:56.7088220Z mov.b32 %r2432, %r2384; 2026-02-21T10:13:56.7088381Z mov.b32 %r2433, %r2384; 2026-02-21T10:13:56.7088547Z mov.b32 %r2434, %r2384; 2026-02-21T10:13:56.7088710Z mov.b32 %r2435, %r2384; 2026-02-21T10:13:56.7088877Z mov.b32 %r2436, %r2384; 2026-02-21T10:13:56.7089044Z mov.b32 %r2437, %r2384; 2026-02-21T10:13:56.7089206Z mov.b32 %r2438, %r2384; 2026-02-21T10:13:56.7089372Z mov.b32 %r2439, %r2384; 2026-02-21T10:13:56.7089530Z mov.b32 %r2440, %r2384; 2026-02-21T10:13:56.7089695Z mov.b32 %r2441, %r2384; 2026-02-21T10:13:56.7089875Z mov.b32 %r2442, %r2384; 2026-02-21T10:13:56.7090043Z mov.b32 %r2443, %r2384; 2026-02-21T10:13:56.7090210Z mov.b32 %r2444, %r2384; 2026-02-21T10:13:56.7090377Z mov.b32 %r2445, %r2384; 2026-02-21T10:13:56.7090541Z mov.b32 %r2446, %r2384; 2026-02-21T10:13:56.7090713Z mov.b32 %r2447, %r2384; 2026-02-21T10:13:56.7090883Z mov.b32 %r2448, %r2384; 2026-02-21T10:13:56.7091047Z mov.b32 %r2449, %r2384; 2026-02-21T10:13:56.7091215Z mov.b32 %r2450, %r2384; 2026-02-21T10:13:56.7091383Z mov.b32 %r2451, %r2384; 2026-02-21T10:13:56.7091552Z mov.b32 %r2452, %r2384; 2026-02-21T10:13:56.7091718Z mov.b32 %r2453, %r2384; 2026-02-21T10:13:56.7091889Z mov.b32 %r2454, %r2384; 2026-02-21T10:13:56.7092051Z mov.b32 %r2455, %r2384; 
2026-02-21T10:13:56.7092221Z mov.b32 %r2456, %r2384; 2026-02-21T10:13:56.7092398Z mov.b32 %r2457, %r2384; 2026-02-21T10:13:56.7092570Z mov.b32 %r2458, %r2384; 2026-02-21T10:13:56.7092737Z mov.b32 %r2459, %r2384; 2026-02-21T10:13:56.7092896Z mov.b32 %r2460, %r2384; 2026-02-21T10:13:56.7093226Z mov.b32 %r2461, %r2384; 2026-02-21T10:13:56.7093389Z mov.b32 %r2462, %r2384; 2026-02-21T10:13:56.7093553Z mov.b32 %r2463, %r2384; 2026-02-21T10:13:56.7093714Z mov.b32 %r2464, %r2384; 2026-02-21T10:13:56.7093879Z mov.b32 %r2465, %r2384; 2026-02-21T10:13:56.7094043Z mov.b32 %r2466, %r2384; 2026-02-21T10:13:56.7094208Z mov.b32 %r2467, %r2384; 2026-02-21T10:13:56.7094367Z mov.b32 %r2468, %r2384; 2026-02-21T10:13:56.7094538Z mov.b32 %r2469, %r2384; 2026-02-21T10:13:56.7094704Z mov.b32 %r2470, %r2384; 2026-02-21T10:13:56.7094866Z mov.b32 %r2471, %r2384; 2026-02-21T10:13:56.7095034Z mov.b32 %r2472, %r2384; 2026-02-21T10:13:56.7095195Z mov.b32 %r2473, %r2384; 2026-02-21T10:13:56.7095361Z mov.b32 %r2474, %r2384; 2026-02-21T10:13:56.7095522Z mov.b32 %r2475, %r2384; 2026-02-21T10:13:56.7095762Z mov.b32 %r2476, %r2384; 2026-02-21T10:13:56.7095939Z mov.b32 %r2477, %r2384; 2026-02-21T10:13:56.7096111Z mov.b32 %r2478, %r2384; 2026-02-21T10:13:56.7096275Z mov.b32 %r2479, %r2384; 2026-02-21T10:13:56.7096557Z mov.b32 %r2480, %r2384; 2026-02-21T10:13:56.7096740Z mov.b32 %r2481, %r2384; 2026-02-21T10:13:56.7096906Z mov.b32 %r2482, %r2384; 2026-02-21T10:13:56.7097073Z mov.b32 %r2483, %r2384; 2026-02-21T10:13:56.7097308Z mov.b32 %r2484, %r2384; 2026-02-21T10:13:56.7097480Z mov.b32 %r2485, %r2384; 2026-02-21T10:13:56.7097643Z mov.b32 %r2486, %r2384; 2026-02-21T10:13:56.7097812Z mov.b32 %r2487, %r2384; 2026-02-21T10:13:56.7097973Z mov.b32 %r2488, %r2384; 2026-02-21T10:13:56.7098141Z mov.b32 %r2489, %r2384; 2026-02-21T10:13:56.7098310Z mov.b32 %r2490, %r2384; 2026-02-21T10:13:56.7098479Z mov.b32 %r2491, %r2384; 2026-02-21T10:13:56.7098645Z mov.b32 %r2492, %r2384; 2026-02-21T10:13:56.7098805Z mov.b32 
%r2493, %r2384; 2026-02-21T10:13:56.7098980Z mov.b32 %r2494, %r2384; 2026-02-21T10:13:56.7099144Z mov.b32 %r2495, %r2384; 2026-02-21T10:13:56.7099310Z mov.b32 %r2496, %r2384; 2026-02-21T10:13:56.7099476Z mov.b32 %r2497, %r2384; 2026-02-21T10:13:56.7099666Z mov.b32 %r2498, %r2384; 2026-02-21T10:13:56.7099831Z mov.b32 %r2499, %r2384; 2026-02-21T10:13:56.7100000Z mov.b32 %r2500, %r2384; 2026-02-21T10:13:56.7100163Z mov.b32 %r2501, %r2384; 2026-02-21T10:13:56.7100335Z mov.b32 %r2502, %r2384; 2026-02-21T10:13:56.7100507Z mov.b32 %r2503, %r2384; 2026-02-21T10:13:56.7100672Z mov.b32 %r2504, %r2384; 2026-02-21T10:13:56.7100844Z mov.b32 %r2505, %r2384; 2026-02-21T10:13:56.7101007Z mov.b32 %r2506, %r2384; 2026-02-21T10:13:56.7101173Z mov.b32 %r2507, %r2384; 2026-02-21T10:13:56.7101336Z mov.b32 %r2508, %r2384; 2026-02-21T10:13:56.7101501Z mov.b32 %r2509, %r2384; 2026-02-21T10:13:56.7101663Z mov.b32 %r2510, %r2384; 2026-02-21T10:13:56.7101828Z mov.b32 %r2511, %r2384; 2026-02-21T10:13:56.7101987Z mov.b32 %r2513, %r2381; 2026-02-21T10:13:56.7102154Z mov.b32 %r2514, %r2377; 2026-02-21T10:13:56.7102319Z mov.b32 %r2515, %r2383; 2026-02-21T10:13:56.7102480Z mov.b32 %r2516, %r2382; 2026-02-21T10:13:56.7102646Z bra.uni $L__BB0_2; 2026-02-21T10:13:56.7102867Z $L__BB0_6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:13:56.7103294Z .loc 1 26 112 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:26:112 2026-02-21T10:13:56.7103664Z add.s32 %r2514, %r2514, 1; 2026-02-21T10:13:56.7103858Z setp.ne.b32 %p60, %r68, %r2514; 2026-02-21T10:13:56.7104052Z mov.b32 %r2372, %r2382; 2026-02-21T10:13:56.7104239Z mov.b32 %r2373, %r77; 2026-02-21T10:13:56.7104411Z mov.b32 %r2374, %r2383; 2026-02-21T10:13:56.7104575Z mov.b32 %r2375, %r79; 2026-02-21T10:13:56.7104743Z mov.b32 %r2376, %r2513; 2026-02-21T10:13:56.7104909Z mov.b32 %r2377, %r81; 2026-02-21T10:13:56.7105073Z mov.b32 %r2382, %r2516; 2026-02-21T10:13:56.7105235Z mov.b32 %r2383, %r2515; 2026-02-21T10:13:56.7105405Z mov.b32 %r2513, %r220; 
2026-02-21T10:13:56.7105575Z @%p60 bra $L__BB0_2; 2026-02-21T10:13:56.7105744Z bra.uni $L__BB0_7; 2026-02-21T10:13:56.7105970Z $L__BB0_2: // =>This Inner Loop Header: Depth=1 2026-02-21T10:13:56.7106693Z .loc 1 0 112 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:0:112 2026-02-21T10:13:56.7107062Z mov.b32 %r81, %r2376; 2026-02-21T10:13:56.7107221Z mov.b32 %r79, %r2374; 2026-02-21T10:13:56.7107392Z mov.b32 %r77, %r2372; 2026-02-21T10:13:56.7107700Z .loc 1 26 112 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:26:112 2026-02-21T10:13:56.7108068Z add.s32 %r676, %r2513, 1; 2026-02-21T10:13:56.7108260Z setp.eq.b32 %p39, %r2513, 255; 2026-02-21T10:13:56.7108535Z selp.b32 %r220, 0, %r676, %p39; 2026-02-21T10:13:56.7108753Z setp.ne.b32 %p40, %r220, 0; 2026-02-21T10:13:56.7108936Z @%p40 bra $L__BB0_4; 2026-02-21T10:13:56.7109237Z // %bb.3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:13:56.7109501Z add.s32 %r2517, %r2517, 8448; 2026-02-21T10:13:56.7109843Z .loc 1 32 35 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:32:35 2026-02-21T10:13:56.7110203Z shr.s32 %r677, %r2517, 31; 2026-02-21T10:13:56.7110388Z shr.u32 %r678, %r677, 19; 2026-02-21T10:13:56.7110567Z add.s32 %r679, %r2517, %r678; 2026-02-21T10:13:56.7110756Z shr.s32 %r680, %r679, 13; 2026-02-21T10:13:56.7111153Z .loc 1 33 33 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:33:33 2026-02-21T10:13:56.7111512Z shl.b32 %r681, %r680, 4; 2026-02-21T10:13:56.7111832Z .loc 1 34 39 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:34:39 2026-02-21T10:13:56.7112181Z sub.s32 %r682, 10, %r681; 2026-02-21T10:13:56.7112497Z .loc 1 34 52 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:34:52 2026-02-21T10:13:56.7112848Z min.s32 %r683, %r682, 16; 2026-02-21T10:13:56.7113164Z .loc 1 35 45 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:35:45 2026-02-21T10:13:56.7113535Z and.b32 %r684, %r679, -8192; 2026-02-21T10:13:56.7113721Z sub.s32 %r685, %r2517, %r684; 
2026-02-21T10:13:56.7114042Z .loc 1 36 51 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:36:51 2026-02-21T10:13:56.7114394Z div.s32 %r686, %r685, %r683; 2026-02-21T10:13:56.7114718Z .loc 1 35 64 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:35:64 2026-02-21T10:13:56.7115077Z mul.lo.s32 %r687, %r686, %r683; 2026-02-21T10:13:56.7115271Z sub.s32 %r688, %r685, %r687; 2026-02-21T10:13:56.7115590Z .loc 1 35 30 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:35:30 2026-02-21T10:13:56.7115939Z add.s32 %r689, %r688, %r681; 2026-02-21T10:13:56.7116258Z .loc 1 37 27 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:37:27 2026-02-21T10:13:56.7116749Z shl.b32 %r2515, %r689, 7; 2026-02-21T10:13:56.7117079Z .loc 1 39 27 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:39:27 2026-02-21T10:13:56.7117434Z shl.b32 %r2516, %r686, 7; 2026-02-21T10:13:56.7117748Z .loc 1 40 32 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:40:32 2026-02-21T10:13:56.7118102Z or.b32 %r2518, %r2516, %r5; 2026-02-21T10:13:56.7118283Z or.b32 %r2519, %r2516, %r6; 2026-02-21T10:13:56.7118462Z or.b32 %r2520, %r2516, %r7; 2026-02-21T10:13:56.7118646Z or.b32 %r2521, %r2516, %r8; 2026-02-21T10:13:56.7118824Z or.b32 %r2522, %r2516, %r9; 2026-02-21T10:13:56.7118999Z or.b32 %r2523, %r2516, %r10; 2026-02-21T10:13:56.7119178Z or.b32 %r2524, %r2516, %r11; 2026-02-21T10:13:56.7119351Z or.b32 %r2525, %r2516, %r12; 2026-02-21T10:13:56.7119582Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:13:56.7119997Z .loc 1 26 112 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:26:112 2026-02-21T10:13:56.7120363Z setp.eq.b32 %p51, %r220, 0; 2026-02-21T10:13:56.7120556Z setp.lt.s32 %p52, %r2514, %r47; 2026-02-21T10:13:56.7120906Z add.s32 %r2031, %r2380, 1; 2026-02-21T10:13:56.7121093Z setp.gt.s32 %p55, %r2031, 2; 2026-02-21T10:13:56.7121278Z selp.b32 %r2380, 0, %r2031, %p55; 2026-02-21T10:13:56.7121484Z selp.b32 %r2032, 1, 0, %p55; 
2026-02-21T10:13:56.7121670Z xor.b32 %r2379, %r2379, %r2032; 2026-02-21T10:13:56.7122003Z .loc 1 54 80 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:54:80 2026-02-21T10:13:56.7122368Z cp.async.wait_group 2; 2026-02-21T10:13:56.7122540Z bar.sync 0; 2026-02-21T10:13:56.7122696Z shl.b32 %r2033, %r2380, 13; 2026-02-21T10:13:56.7122880Z add.s32 %r2035, %r586, %r2033; 2026-02-21T10:13:56.7123210Z .loc 1 58 32 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:58:32 2026-02-21T10:13:56.7123634Z add.s32 %r2036, %r2035, %r48; 2026-02-21T10:13:56.7123830Z ld.shared.b16 %rs1, [%r2036]; 2026-02-21T10:13:56.7124041Z ld.shared.b16 %rs2, [%r2036+512]; 2026-02-21T10:13:56.7124251Z ld.shared.b16 %rs3, [%r2036+32]; 2026-02-21T10:13:56.7124461Z ld.shared.b16 %rs4, [%r2036+544]; 2026-02-21T10:13:56.7124661Z ld.shared.b16 %rs5, [%r2036+4096]; 2026-02-21T10:13:56.7124873Z ld.shared.b16 %rs6, [%r2036+4608]; 2026-02-21T10:13:56.7125072Z ld.shared.b16 %rs7, [%r2036+4128]; 2026-02-21T10:13:56.7125346Z ld.shared.b16 %rs8, [%r2036+4640]; 2026-02-21T10:13:56.7125544Z add.s32 %r2037, %r2035, %r49; 2026-02-21T10:13:56.7125733Z ld.shared.b16 %rs9, [%r2037]; 2026-02-21T10:13:56.7125922Z ld.shared.b16 %rs10, [%r2037+512]; 2026-02-21T10:13:56.7126130Z ld.shared.b16 %rs11, [%r2037+32]; 2026-02-21T10:13:56.7126328Z ld.shared.b16 %rs12, [%r2037+544]; 2026-02-21T10:13:56.7126659Z ld.shared.b16 %rs13, [%r2037+4096]; 2026-02-21T10:13:56.7126887Z ld.shared.b16 %rs14, [%r2037+4608]; 2026-02-21T10:13:56.7127098Z ld.shared.b16 %rs15, [%r2037+4128]; 2026-02-21T10:13:56.7127307Z ld.shared.b16 %rs16, [%r2037+4640]; 2026-02-21T10:13:56.7127504Z add.s32 %r2038, %r2035, %r50; 2026-02-21T10:13:56.7127698Z ld.shared.b16 %rs17, [%r2038]; 2026-02-21T10:13:56.7127897Z ld.shared.b16 %rs18, [%r2038+512]; 2026-02-21T10:13:56.7128098Z ld.shared.b16 %rs19, [%r2038+32]; 2026-02-21T10:13:56.7128298Z ld.shared.b16 %rs20, [%r2038+544]; 2026-02-21T10:13:56.7128493Z ld.shared.b16 %rs21, [%r2038+4096]; 
2026-02-21T10:13:56.7128702Z ld.shared.b16 %rs22, [%r2038+4608]; 2026-02-21T10:13:56.7128900Z ld.shared.b16 %rs23, [%r2038+4128]; 2026-02-21T10:13:56.7129102Z ld.shared.b16 %rs24, [%r2038+4640]; 2026-02-21T10:13:56.7129293Z add.s32 %r2039, %r2035, %r51; 2026-02-21T10:13:56.7129481Z ld.shared.b16 %rs25, [%r2039]; 2026-02-21T10:13:56.7129675Z ld.shared.b16 %rs26, [%r2039+512]; 2026-02-21T10:13:56.7129883Z ld.shared.b16 %rs27, [%r2039+32]; 2026-02-21T10:13:56.7130087Z ld.shared.b16 %rs28, [%r2039+544]; 2026-02-21T10:13:56.7130306Z ld.shared.b16 %rs29, [%r2039+4096]; 2026-02-21T10:13:56.7130516Z ld.shared.b16 %rs30, [%r2039+4608]; 2026-02-21T10:13:56.7130716Z ld.shared.b16 %rs31, [%r2039+4128]; 2026-02-21T10:13:56.7130922Z ld.shared.b16 %rs32, [%r2039+4640]; 2026-02-21T10:13:56.7131120Z cvt.f32.bf16 %r820, %rs1; 2026-02-21T10:13:56.7131306Z cvt.f32.bf16 %r821, %rs2; 2026-02-21T10:13:56.7131481Z cvt.f32.bf16 %r822, %rs9; 2026-02-21T10:13:56.7131660Z cvt.f32.bf16 %r823, %rs10; 2026-02-21T10:13:56.7131839Z cvt.f32.bf16 %r952, %rs17; 2026-02-21T10:13:56.7132017Z cvt.f32.bf16 %r953, %rs18; 2026-02-21T10:13:56.7132196Z cvt.f32.bf16 %r954, %rs25; 2026-02-21T10:13:56.7132369Z cvt.f32.bf16 %r955, %rs26; 2026-02-21T10:13:56.7132545Z cvt.f32.bf16 %r1084, %rs3; 2026-02-21T10:13:56.7132714Z cvt.f32.bf16 %r1085, %rs4; 2026-02-21T10:13:56.7132897Z cvt.f32.bf16 %r1086, %rs11; 2026-02-21T10:13:56.7133076Z cvt.f32.bf16 %r1087, %rs12; 2026-02-21T10:13:56.7133261Z cvt.f32.bf16 %r1216, %rs19; 2026-02-21T10:13:56.7133437Z cvt.f32.bf16 %r1217, %rs20; 2026-02-21T10:13:56.7133629Z cvt.f32.bf16 %r1218, %rs27; 2026-02-21T10:13:56.7133806Z cvt.f32.bf16 %r1219, %rs28; 2026-02-21T10:13:56.7133990Z cvt.f32.bf16 %r1348, %rs5; 2026-02-21T10:13:56.7134313Z cvt.f32.bf16 %r1349, %rs6; 2026-02-21T10:13:56.7134487Z cvt.f32.bf16 %r1350, %rs13; 2026-02-21T10:13:56.7134687Z cvt.f32.bf16 %r1351, %rs14; 2026-02-21T10:13:56.7134866Z cvt.f32.bf16 %r1480, %rs21; 2026-02-21T10:13:56.7135051Z cvt.f32.bf16 %r1481, 
%rs22; 2026-02-21T10:13:56.7135224Z cvt.f32.bf16 %r1482, %rs29; 2026-02-21T10:13:56.7135406Z cvt.f32.bf16 %r1483, %rs30; 2026-02-21T10:13:56.7135581Z cvt.f32.bf16 %r1612, %rs7; 2026-02-21T10:13:56.7135762Z cvt.f32.bf16 %r1613, %rs8; 2026-02-21T10:13:56.7135942Z cvt.f32.bf16 %r1614, %rs15; 2026-02-21T10:13:56.7136115Z cvt.f32.bf16 %r1615, %rs16; 2026-02-21T10:13:56.7136294Z cvt.f32.bf16 %r1744, %rs23; 2026-02-21T10:13:56.7136588Z cvt.f32.bf16 %r1745, %rs24; 2026-02-21T10:13:56.7136859Z cvt.f32.bf16 %r1746, %rs31; 2026-02-21T10:13:56.7137035Z cvt.f32.bf16 %r1747, %rs32; 2026-02-21T10:13:56.7137381Z .loc 1 26 112 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:26:112 2026-02-21T10:13:56.7137755Z shl.b32 %r2040, %r2380, 3; 2026-02-21T10:13:56.7137936Z add.s32 %r690, %r512, %r2040; 2026-02-21T10:13:56.7138123Z // begin inline asm 2026-02-21T10:13:56.7138277Z 2026-02-21T10:13:56.7138403Z { 2026-02-21T10:13:56.7138537Z .reg .pred complete; 2026-02-21T10:13:56.7138785Z waitLoop: 2026-02-21T10:13:56.7139017Z mbarrier.try_wait.parity.shared.b64 complete, [%r690], %r2379; 2026-02-21T10:13:56.7139323Z @!complete bra.uni waitLoop; 2026-02-21T10:13:56.7139497Z } 2026-02-21T10:13:56.7139574Z 2026-02-21T10:13:56.7139636Z // end inline asm 2026-02-21T10:13:56.7139944Z .loc 1 60 33 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:60:33 2026-02-21T10:13:56.7140317Z shl.b32 %r2042, %r2380, 11; 2026-02-21T10:13:56.7140510Z add.s32 %r2044, %r532, %r2042; 2026-02-21T10:13:56.7140840Z .loc 1 78 58 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:78:58 2026-02-21T10:13:56.7141204Z add.s32 %r2045, %r2044, %r45; 2026-02-21T10:13:56.7141407Z xor.b32 %r2046, %r45, 16; 2026-02-21T10:13:56.7141590Z add.s32 %r2047, %r2044, %r2046; 2026-02-21T10:13:56.7141779Z xor.b32 %r2048, %r45, 32; 2026-02-21T10:13:56.7141956Z add.s32 %r2049, %r2044, %r2048; 2026-02-21T10:13:56.7142141Z xor.b32 %r2050, %r45, 48; 2026-02-21T10:13:56.7142318Z add.s32 %r2051, %r2044, %r2050; 
2026-02-21T10:13:56.7142505Z xor.b32 %r2052, %r45, 64; 2026-02-21T10:13:56.7142673Z add.s32 %r2053, %r2044, %r2052; 2026-02-21T10:13:56.7142865Z xor.b32 %r2054, %r45, 80; 2026-02-21T10:13:56.7143033Z add.s32 %r2055, %r2044, %r2054; 2026-02-21T10:13:56.7143223Z xor.b32 %r2056, %r45, 96; 2026-02-21T10:13:56.7143394Z add.s32 %r2057, %r2044, %r2056; 2026-02-21T10:13:56.7143597Z xor.b32 %r2058, %r45, 112; 2026-02-21T10:13:56.7143778Z add.s32 %r2059, %r2044, %r2058; 2026-02-21T10:13:56.7144115Z .loc 1 65 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:65:25 2026-02-21T10:13:56.7144489Z ld.shared.s8 %rs33, [%r2045]; 2026-02-21T10:13:56.7144817Z .loc 1 63 28 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:63:28 2026-02-21T10:13:56.7145180Z shl.b16 %rs34, %rs33, 4; 2026-02-21T10:13:56.7145496Z .loc 1 65 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:65:25 2026-02-21T10:13:56.7145865Z ld.shared.s8 %rs35, [%r2047+128]; 2026-02-21T10:13:56.7146205Z .loc 1 63 28 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:63:28 2026-02-21T10:13:56.7146697Z shl.b16 %rs36, %rs35, 4; 2026-02-21T10:13:56.7147021Z .loc 1 65 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:65:25 2026-02-21T10:13:56.7147380Z ld.shared.s8 %rs37, [%r2049+256]; 2026-02-21T10:13:56.7147730Z .loc 1 63 28 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:63:28 2026-02-21T10:13:56.7148080Z shl.b16 %rs38, %rs37, 4; 2026-02-21T10:13:56.7148394Z .loc 1 65 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:65:25 2026-02-21T10:13:56.7148981Z ld.shared.s8 %rs39, [%r2051+384]; 2026-02-21T10:13:56.7149320Z .loc 1 63 28 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:63:28 2026-02-21T10:13:56.7149679Z shl.b16 %rs40, %rs39, 4; 2026-02-21T10:13:56.7149987Z .loc 1 65 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:65:25 2026-02-21T10:13:56.7150348Z ld.shared.s8 %rs41, [%r2053+512]; 2026-02-21T10:13:56.7150676Z .loc 1 63 28 // 
ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:63:28 2026-02-21T10:13:56.7151030Z shl.b16 %rs42, %rs41, 4; 2026-02-21T10:13:56.7151408Z .loc 1 65 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:65:25 2026-02-21T10:13:56.7151773Z ld.shared.s8 %rs43, [%r2055+640]; 2026-02-21T10:13:56.7152105Z .loc 1 63 28 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:63:28 2026-02-21T10:13:56.7152475Z shl.b16 %rs44, %rs43, 4; 2026-02-21T10:13:56.7152787Z .loc 1 65 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:65:25 2026-02-21T10:13:56.7153144Z ld.shared.s8 %rs45, [%r2057+768]; 2026-02-21T10:13:56.7153541Z .loc 1 63 28 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:63:28 2026-02-21T10:13:56.7153902Z shl.b16 %rs46, %rs45, 4; 2026-02-21T10:13:56.7154212Z .loc 1 65 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:65:25 2026-02-21T10:13:56.7154572Z ld.shared.s8 %rs47, [%r2059+896]; 2026-02-21T10:13:56.7154901Z .loc 1 63 28 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:63:28 2026-02-21T10:13:56.7155258Z shl.b16 %rs48, %rs47, 4; 2026-02-21T10:13:56.7155565Z .loc 1 65 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:65:25 2026-02-21T10:13:56.7155930Z ld.shared.s8 %rs49, [%r2045+1024]; 2026-02-21T10:13:56.7156274Z .loc 1 63 28 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:63:28 2026-02-21T10:13:56.7156756Z shl.b16 %rs50, %rs49, 4; 2026-02-21T10:13:56.7157071Z .loc 1 65 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:65:25 2026-02-21T10:13:56.7157441Z ld.shared.s8 %rs51, [%r2047+1152]; 2026-02-21T10:13:56.7157787Z .loc 1 63 28 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:63:28 2026-02-21T10:13:56.7158134Z shl.b16 %rs52, %rs51, 4; 2026-02-21T10:13:56.7158449Z .loc 1 65 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:65:25 2026-02-21T10:13:56.7158810Z ld.shared.s8 %rs53, [%r2049+1280]; 2026-02-21T10:13:56.7159143Z .loc 1 63 28 // 
ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:63:28 2026-02-21T10:13:56.7159501Z shl.b16 %rs54, %rs53, 4; 2026-02-21T10:13:56.7159814Z .loc 1 65 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:65:25 2026-02-21T10:13:56.7160180Z ld.shared.s8 %rs55, [%r2051+1408]; 2026-02-21T10:13:56.7160512Z .loc 1 63 28 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:63:28 2026-02-21T10:13:56.7160871Z shl.b16 %rs56, %rs55, 4; 2026-02-21T10:13:56.7161182Z .loc 1 65 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:65:25 2026-02-21T10:13:56.7161539Z ld.shared.s8 %rs57, [%r2053+1536]; 2026-02-21T10:13:56.7161893Z .loc 1 63 28 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:63:28 2026-02-21T10:13:56.7162248Z shl.b16 %rs58, %rs57, 4; 2026-02-21T10:13:56.7162568Z .loc 1 65 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:65:25 2026-02-21T10:13:56.7162923Z ld.shared.s8 %rs59, [%r2055+1664]; 2026-02-21T10:13:56.7163260Z .loc 1 63 28 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:63:28 2026-02-21T10:13:56.7163802Z shl.b16 %rs60, %rs59, 4; 2026-02-21T10:13:56.7164116Z .loc 1 65 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:65:25 2026-02-21T10:13:56.7164482Z ld.shared.s8 %rs61, [%r2057+1792]; 2026-02-21T10:13:56.7164812Z .loc 1 63 28 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:63:28 2026-02-21T10:13:56.7165168Z shl.b16 %rs62, %rs61, 4; 2026-02-21T10:13:56.7165484Z .loc 1 65 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:65:25 2026-02-21T10:13:56.7165839Z ld.shared.s8 %rs63, [%r2059+1920]; 2026-02-21T10:13:56.7166177Z .loc 1 63 28 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:63:28 2026-02-21T10:13:56.7166733Z shl.b16 %rs64, %rs63, 4; 2026-02-21T10:13:56.7167065Z .loc 1 65 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:65:25 2026-02-21T10:13:56.7167424Z cvt.s16.s8 %rs65, %rs34; 2026-02-21T10:13:56.7167601Z shr.s16 %rs66, %rs65, 4; 
2026-02-21T10:13:56.7167773Z cvt.s16.s8 %rs67, %rs36; 2026-02-21T10:13:56.7167949Z shr.s16 %rs68, %rs67, 4; 2026-02-21T10:13:56.7168119Z shr.s16 %rs69, %rs33, 4; 2026-02-21T10:13:56.7168361Z shr.s16 %rs70, %rs35, 4; 2026-02-21T10:13:56.7168676Z .loc 1 83 32 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:83:32 2026-02-21T10:13:56.7169032Z cvt.rn.f32.s16 %r2060, %rs70; 2026-02-21T10:13:56.7169225Z cvt.rn.f32.s16 %r2061, %rs69; 2026-02-21T10:13:56.7169405Z cvt.rn.f32.s16 %r2062, %rs68; 2026-02-21T10:13:56.7169590Z cvt.rn.f32.s16 %r2063, %rs66; 2026-02-21T10:13:56.7169912Z .loc 1 65 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:65:25 2026-02-21T10:13:56.7170272Z cvt.s16.s8 %rs71, %rs38; 2026-02-21T10:13:56.7170461Z shr.s16 %rs72, %rs71, 4; 2026-02-21T10:13:56.7170631Z cvt.s16.s8 %rs73, %rs40; 2026-02-21T10:13:56.7170807Z shr.s16 %rs74, %rs73, 4; 2026-02-21T10:13:56.7170976Z shr.s16 %rs75, %rs37, 4; 2026-02-21T10:13:56.7171151Z shr.s16 %rs76, %rs39, 4; 2026-02-21T10:13:56.7171459Z .loc 1 83 32 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:83:32 2026-02-21T10:13:56.7171821Z cvt.rn.f32.s16 %r2064, %rs76; 2026-02-21T10:13:56.7172002Z cvt.rn.f32.s16 %r2065, %rs75; 2026-02-21T10:13:56.7172187Z cvt.rn.f32.s16 %r2066, %rs74; 2026-02-21T10:13:56.7172371Z cvt.rn.f32.s16 %r2067, %rs72; 2026-02-21T10:13:56.7172686Z .loc 1 65 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:65:25 2026-02-21T10:13:56.7173049Z cvt.s16.s8 %rs77, %rs42; 2026-02-21T10:13:56.7173221Z shr.s16 %rs78, %rs77, 4; 2026-02-21T10:13:56.7173397Z cvt.s16.s8 %rs79, %rs44; 2026-02-21T10:13:56.7173563Z shr.s16 %rs80, %rs79, 4; 2026-02-21T10:13:56.7173742Z shr.s16 %rs81, %rs41, 4; 2026-02-21T10:13:56.7173909Z shr.s16 %rs82, %rs43, 4; 2026-02-21T10:13:56.7174225Z .loc 1 83 32 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:83:32 2026-02-21T10:13:56.7174596Z cvt.rn.f32.s16 %r2068, %rs82; 2026-02-21T10:13:56.7174781Z cvt.rn.f32.s16 %r2069, %rs81; 
2026-02-21T10:13:56.7174967Z cvt.rn.f32.s16 %r2070, %rs80; 2026-02-21T10:13:56.7175145Z cvt.rn.f32.s16 %r2071, %rs78; 2026-02-21T10:13:56.7175470Z .loc 1 65 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:65:25 2026-02-21T10:13:56.7175823Z cvt.s16.s8 %rs83, %rs46; 2026-02-21T10:13:56.7175998Z shr.s16 %rs84, %rs83, 4; 2026-02-21T10:13:56.7176177Z cvt.s16.s8 %rs85, %rs48; 2026-02-21T10:13:56.7176355Z shr.s16 %rs86, %rs85, 4; 2026-02-21T10:13:56.7176653Z shr.s16 %rs87, %rs45, 4; 2026-02-21T10:13:56.7176822Z shr.s16 %rs88, %rs47, 4; 2026-02-21T10:13:56.7177153Z .loc 1 83 32 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:83:32 2026-02-21T10:13:56.7177511Z cvt.rn.f32.s16 %r2072, %rs88; 2026-02-21T10:13:56.7177793Z cvt.rn.f32.s16 %r2073, %rs87; 2026-02-21T10:13:56.7178035Z cvt.rn.f32.s16 %r2074, %rs86; 2026-02-21T10:13:56.7178226Z cvt.rn.f32.s16 %r2075, %rs84; 2026-02-21T10:13:56.7178556Z .loc 1 65 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:65:25 2026-02-21T10:13:56.7178924Z cvt.s16.s8 %rs89, %rs50; 2026-02-21T10:13:56.7179100Z shr.s16 %rs90, %rs89, 4; 2026-02-21T10:13:56.7179269Z cvt.s16.s8 %rs91, %rs52; 2026-02-21T10:13:56.7179444Z shr.s16 %rs92, %rs91, 4; 2026-02-21T10:13:56.7179612Z shr.s16 %rs93, %rs49, 4; 2026-02-21T10:13:56.7179782Z shr.s16 %rs94, %rs51, 4; 2026-02-21T10:13:56.7180086Z .loc 1 83 32 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:83:32 2026-02-21T10:13:56.7180446Z cvt.rn.f32.s16 %r2076, %rs94; 2026-02-21T10:13:56.7180716Z cvt.rn.f32.s16 %r2077, %rs93; 2026-02-21T10:13:56.7180904Z cvt.rn.f32.s16 %r2078, %rs92; 2026-02-21T10:13:56.7181086Z cvt.rn.f32.s16 %r2079, %rs90; 2026-02-21T10:13:56.7181405Z .loc 1 65 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:65:25 2026-02-21T10:13:56.7181776Z cvt.s16.s8 %rs95, %rs54; 2026-02-21T10:13:56.7181947Z shr.s16 %rs96, %rs95, 4; 2026-02-21T10:13:56.7182120Z cvt.s16.s8 %rs97, %rs56; 2026-02-21T10:13:56.7182355Z shr.s16 %rs98, %rs97, 4; 
2026-02-21T10:13:56.7182528Z shr.s16 %rs99, %rs53, 4; 2026-02-21T10:13:56.7182703Z shr.s16 %rs100, %rs55, 4; 2026-02-21T10:13:56.7183017Z .loc 1 83 32 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:83:32 2026-02-21T10:13:56.7183378Z cvt.rn.f32.s16 %r2080, %rs100; 2026-02-21T10:13:56.7183567Z cvt.rn.f32.s16 %r2081, %rs99; 2026-02-21T10:13:56.7183754Z cvt.rn.f32.s16 %r2082, %rs98; 2026-02-21T10:13:56.7183933Z cvt.rn.f32.s16 %r2083, %rs96; 2026-02-21T10:13:56.7184255Z .loc 1 65 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:65:25 2026-02-21T10:13:56.7184607Z cvt.s16.s8 %rs101, %rs58; 2026-02-21T10:13:56.7184796Z shr.s16 %rs102, %rs101, 4; 2026-02-21T10:13:56.7184988Z cvt.s16.s8 %rs103, %rs60; 2026-02-21T10:13:56.7185171Z shr.s16 %rs104, %rs103, 4; 2026-02-21T10:13:56.7185353Z shr.s16 %rs105, %rs57, 4; 2026-02-21T10:13:56.7185523Z shr.s16 %rs106, %rs59, 4; 2026-02-21T10:13:56.7185843Z .loc 1 83 32 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:83:32 2026-02-21T10:13:56.7186202Z cvt.rn.f32.s16 %r2084, %rs106; 2026-02-21T10:13:56.7186407Z cvt.rn.f32.s16 %r2085, %rs105; 2026-02-21T10:13:56.7186714Z cvt.rn.f32.s16 %r2086, %rs104; 2026-02-21T10:13:56.7186905Z cvt.rn.f32.s16 %r2087, %rs102; 2026-02-21T10:13:56.7187233Z .loc 1 65 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:65:25 2026-02-21T10:13:56.7187584Z cvt.s16.s8 %rs107, %rs62; 2026-02-21T10:13:56.7192800Z shr.s16 %rs108, %rs107, 4; 2026-02-21T10:13:56.7193032Z cvt.s16.s8 %rs109, %rs64; 2026-02-21T10:13:56.7193238Z shr.s16 %rs110, %rs109, 4; 2026-02-21T10:13:56.7193446Z shr.s16 %rs111, %rs61, 4; 2026-02-21T10:13:56.7193624Z shr.s16 %rs112, %rs63, 4; 2026-02-21T10:13:56.7193974Z .loc 1 83 32 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:83:32 2026-02-21T10:13:56.7194370Z cvt.rn.f32.s16 %r2088, %rs112; 2026-02-21T10:13:56.7194586Z cvt.rn.f32.s16 %r2089, %rs111; 2026-02-21T10:13:56.7194786Z cvt.rn.f32.s16 %r2090, %rs110; 2026-02-21T10:13:56.7194976Z 
cvt.rn.f32.s16 %r2091, %rs108; 2026-02-21T10:13:56.7195227Z st.shared.v4.b32 [%r52], {%r2063, %r2061, %r2062, %r2060}; 2026-02-21T10:13:56.7195538Z st.shared.v4.b32 [%r53], {%r2067, %r2065, %r2066, %r2064}; 2026-02-21T10:13:56.7195838Z st.shared.v4.b32 [%r54], {%r2071, %r2069, %r2070, %r2068}; 2026-02-21T10:13:56.7196156Z st.shared.v4.b32 [%r55], {%r2075, %r2073, %r2074, %r2072}; 2026-02-21T10:13:56.7196629Z st.shared.v4.b32 [%r56], {%r2079, %r2077, %r2078, %r2076}; 2026-02-21T10:13:56.7196941Z st.shared.v4.b32 [%r57], {%r2083, %r2081, %r2082, %r2080}; 2026-02-21T10:13:56.7197436Z st.shared.v4.b32 [%r58], {%r2087, %r2085, %r2086, %r2084}; 2026-02-21T10:13:56.7197729Z st.shared.v4.b32 [%r59], {%r2091, %r2089, %r2090, %r2088}; 2026-02-21T10:13:56.7197966Z $L__tmp1: 2026-02-21T10:13:56.7198360Z .loc 2 291 36 // standard.py:291:36 @[ ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:90:40 ] 2026-02-21T10:13:56.7198792Z // begin inline asm 2026-02-21T10:13:56.7198995Z fence.proxy.async.shared::cta; 2026-02-21T10:13:56.7199207Z // end inline asm 2026-02-21T10:13:56.7199370Z bar.sync 0; 2026-02-21T10:13:56.7199547Z shfl.sync.idx.b32 %r2092, %r3, 0, 31, -1; 2026-02-21T10:13:56.7199783Z wgmma.fence.sync.aligned; 2026-02-21T10:13:56.7199980Z mov.pred %p41, -1; 2026-02-21T10:13:56.7200147Z // begin inline asm 2026-02-21T10:13:56.7201661Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2384,%r2385,%r2386,%r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447}, {%r820,%r821,%r822,%r823}, %rd92, %p41, 1, 1; 2026-02-21T10:13:56.7203094Z // end inline asm 2026-02-21T10:13:56.7203253Z // begin inline 
asm 2026-02-21T10:13:56.7204634Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2384,%r2385,%r2386,%r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447}, {%r952,%r953,%r954,%r955}, %rd93, %p41, 1, 1; 2026-02-21T10:13:56.7206033Z // end inline asm 2026-02-21T10:13:56.7206187Z // begin inline asm 2026-02-21T10:13:56.7207679Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2384,%r2385,%r2386,%r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447}, {%r1084,%r1085,%r1086,%r1087}, %rd94, %p41, 1, 1; 2026-02-21T10:13:56.7209091Z // end inline asm 2026-02-21T10:13:56.7209241Z // begin inline asm 2026-02-21T10:13:56.7210509Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2384,%r2385,%r2386,%r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447}, {%r1216,%r1217,%r1218,%r1219}, %rd95, %p41, 1, 1; 2026-02-21T10:13:56.7210575Z // end inline asm 
2026-02-21T10:13:56.7210637Z // begin inline asm 2026-02-21T10:13:56.7211905Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2448,%r2449,%r2450,%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511}, {%r1348,%r1349,%r1350,%r1351}, %rd92, %p41, 1, 1; 2026-02-21T10:13:56.7212116Z // end inline asm 2026-02-21T10:13:56.7212182Z // begin inline asm 2026-02-21T10:13:56.7213464Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2448,%r2449,%r2450,%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511}, {%r1480,%r1481,%r1482,%r1483}, %rd93, %p41, 1, 1; 2026-02-21T10:13:56.7213527Z // end inline asm 2026-02-21T10:13:56.7213598Z // begin inline asm 2026-02-21T10:13:56.7215000Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2448,%r2449,%r2450,%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511}, {%r1612,%r1613,%r1614,%r1615}, %rd94, %p41, 1, 1; 
2026-02-21T10:13:56.7215068Z // end inline asm 2026-02-21T10:13:56.7215136Z // begin inline asm 2026-02-21T10:13:56.7216400Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2448,%r2449,%r2450,%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511}, {%r1744,%r1745,%r1746,%r1747}, %rd95, %p41, 1, 1; 2026-02-21T10:13:56.7216592Z // end inline asm 2026-02-21T10:13:56.7216690Z wgmma.commit_group.sync.aligned; 2026-02-21T10:13:56.7216754Z mov.b32 %r1877, 0; 2026-02-21T10:13:56.7216822Z mov.b32 %r1876, %r644; 2026-02-21T10:13:56.7216895Z mov.b32 %r1878, %r1877; 2026-02-21T10:13:56.7216959Z // begin inline asm 2026-02-21T10:13:56.7219041Z // wait for regs: %r2384,%r2385,%r2386,%r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447,%r2448,%r2449,%r2450,%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r1876,%r1877,%r1878 2026-02-21T10:13:56.7219137Z 
wgmma.wait_group.sync.aligned 0; 2026-02-21T10:13:56.7219198Z // end inline asm 2026-02-21T10:13:56.7219262Z $L__tmp2: 2026-02-21T10:13:56.7219492Z .loc 1 26 112 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:26:112 2026-02-21T10:13:56.7219561Z add.s32 %r2093, %r2378, 16; 2026-02-21T10:13:56.7219639Z add.s32 %r2094, %r2381, 1; 2026-02-21T10:13:56.7219720Z setp.gt.s32 %p56, %r2094, 2; 2026-02-21T10:13:56.7219794Z selp.b32 %r2381, 0, %r2094, %p56; 2026-02-21T10:13:56.7219866Z selp.b32 %r2378, 0, %r2093, %p51; 2026-02-21T10:13:56.7220092Z .loc 1 51 22 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:51:22 2026-02-21T10:13:56.7220239Z shl.b32 %r2095, %r2378, 1; 2026-02-21T10:13:56.7220509Z .loc 1 53 25 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:53:25 2026-02-21T10:13:56.7220585Z add.s32 %r2096, %r2095, %r34; 2026-02-21T10:13:56.7220789Z .loc 1 54 53 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:54:53 2026-02-21T10:13:56.7220855Z shl.b32 %r2097, %r2518, 13; 2026-02-21T10:13:56.7220921Z shl.b32 %r2098, %r2519, 13; 2026-02-21T10:13:56.7220996Z shl.b32 %r2099, %r2520, 13; 2026-02-21T10:13:56.7221060Z shl.b32 %r2100, %r2521, 13; 2026-02-21T10:13:56.7221133Z shl.b32 %r2101, %r2522, 13; 2026-02-21T10:13:56.7221208Z shl.b32 %r2102, %r2523, 13; 2026-02-21T10:13:56.7221273Z shl.b32 %r2103, %r2524, 13; 2026-02-21T10:13:56.7221334Z shl.b32 %r2104, %r2525, 13; 2026-02-21T10:13:56.7221617Z .loc 1 54 60 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:54:60 2026-02-21T10:13:56.7221689Z add.s32 %r2105, %r2097, %r2096; 2026-02-21T10:13:56.7221759Z add.s32 %r2106, %r2098, %r2096; 2026-02-21T10:13:56.7221822Z add.s32 %r2107, %r2099, %r2096; 2026-02-21T10:13:56.7221892Z add.s32 %r2108, %r2100, %r2096; 2026-02-21T10:13:56.7221956Z add.s32 %r2109, %r2101, %r2096; 2026-02-21T10:13:56.7222081Z add.s32 %r2110, %r2102, %r2096; 2026-02-21T10:13:56.7222151Z add.s32 %r2111, %r2103, %r2096; 2026-02-21T10:13:56.7222225Z add.s32 %r2112, 
%r2104, %r2096; 2026-02-21T10:13:56.7222434Z .loc 1 54 32 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:54:32 2026-02-21T10:13:56.7222518Z mad.wide.s32 %rd100, %r2105, 2, %rd6; 2026-02-21T10:13:56.7222590Z mad.wide.s32 %rd101, %r2106, 2, %rd6; 2026-02-21T10:13:56.7222661Z mad.wide.s32 %rd102, %r2107, 2, %rd6; 2026-02-21T10:13:56.7222732Z mad.wide.s32 %rd103, %r2108, 2, %rd6; 2026-02-21T10:13:56.7222808Z mad.wide.s32 %rd104, %r2109, 2, %rd6; 2026-02-21T10:13:56.7222876Z mad.wide.s32 %rd105, %r2110, 2, %rd6; 2026-02-21T10:13:56.7222943Z mad.wide.s32 %rd106, %r2111, 2, %rd6; 2026-02-21T10:13:56.7223022Z mad.wide.s32 %rd107, %r2112, 2, %rd6; 2026-02-21T10:13:56.7223233Z .loc 1 54 80 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:54:80 2026-02-21T10:13:56.7223301Z shl.b32 %r2113, %r2381, 13; 2026-02-21T10:13:56.7223371Z add.s32 %r2114, %r586, %r2113; 2026-02-21T10:13:56.7223443Z add.s32 %r2010, %r2114, %r46; 2026-02-21T10:13:56.7223511Z selp.b32 %r2011, 8, 0, %p52; 2026-02-21T10:13:56.7223574Z // begin inline asm 2026-02-21T10:13:56.7223731Z cp.async.ca.shared.global [ %r2010 + 0 ], [ %rd100 + 0 ], 0x8, %r2011; 2026-02-21T10:13:56.7223793Z // end inline asm 2026-02-21T10:13:56.7223857Z add.s32 %r2012, %r2010, 1024; 2026-02-21T10:13:56.7223927Z // begin inline asm 2026-02-21T10:13:56.7224068Z cp.async.ca.shared.global [ %r2012 + 0 ], [ %rd101 + 0 ], 0x8, %r2011; 2026-02-21T10:13:56.7224130Z // end inline asm 2026-02-21T10:13:56.7224194Z add.s32 %r2014, %r2010, 2048; 2026-02-21T10:13:56.7224263Z // begin inline asm 2026-02-21T10:13:56.7224400Z cp.async.ca.shared.global [ %r2014 + 0 ], [ %rd102 + 0 ], 0x8, %r2011; 2026-02-21T10:13:56.7224461Z // end inline asm 2026-02-21T10:13:56.7224531Z add.s32 %r2016, %r2010, 3072; 2026-02-21T10:13:56.7224592Z // begin inline asm 2026-02-21T10:13:56.7224727Z cp.async.ca.shared.global [ %r2016 + 0 ], [ %rd103 + 0 ], 0x8, %r2011; 2026-02-21T10:13:56.7224786Z // end inline asm 2026-02-21T10:13:56.7224860Z add.s32 
%r2018, %r2010, 4096; 2026-02-21T10:13:56.7224922Z // begin inline asm 2026-02-21T10:13:56.7225056Z cp.async.ca.shared.global [ %r2018 + 0 ], [ %rd104 + 0 ], 0x8, %r2011; 2026-02-21T10:13:56.7225122Z // end inline asm 2026-02-21T10:13:56.7225188Z add.s32 %r2020, %r2010, 5120; 2026-02-21T10:13:56.7225251Z // begin inline asm 2026-02-21T10:13:56.7225393Z cp.async.ca.shared.global [ %r2020 + 0 ], [ %rd105 + 0 ], 0x8, %r2011; 2026-02-21T10:13:56.7225456Z // end inline asm 2026-02-21T10:13:56.7225518Z add.s32 %r2022, %r2010, 6144; 2026-02-21T10:13:56.7225691Z // begin inline asm 2026-02-21T10:13:56.7225832Z cp.async.ca.shared.global [ %r2022 + 0 ], [ %rd106 + 0 ], 0x8, %r2011; 2026-02-21T10:13:56.7225892Z // end inline asm 2026-02-21T10:13:56.7225958Z add.s32 %r2024, %r2010, 7168; 2026-02-21T10:13:56.7226025Z // begin inline asm 2026-02-21T10:13:56.7226158Z cp.async.ca.shared.global [ %r2024 + 0 ], [ %rd107 + 0 ], 0x8, %r2011; 2026-02-21T10:13:56.7226218Z // end inline asm 2026-02-21T10:13:56.7226287Z cp.async.commit_group; 2026-02-21T10:13:56.7226651Z .loc 1 26 112 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:26:112 2026-02-21T10:13:56.7226723Z shl.b32 %r2115, %r2381, 3; 2026-02-21T10:13:56.7226790Z add.s32 %r2026, %r512, %r2115; 2026-02-21T10:13:56.7226866Z and.pred %p49, %p61, %p52; 2026-02-21T10:13:56.7227013Z // begin inline asm 2026-02-21T10:13:56.7227161Z @%p49 mbarrier.arrive.expect_tx.shared.b64 _, [%r2026], 2048; 2026-02-21T10:13:56.7227230Z // end inline asm 2026-02-21T10:13:56.7227437Z .loc 1 60 33 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:60:33 2026-02-21T10:13:56.7227504Z shl.b32 %r2116, %r2381, 11; 2026-02-21T10:13:56.7227569Z add.s32 %r2027, %r532, %r2116; 2026-02-21T10:13:56.7227645Z bar.sync 0; 2026-02-21T10:13:56.7227786Z elect.sync %r2117|%p57, -1; 2026-02-21T10:13:56.7227858Z and.pred %p58, %p52, %p57; 2026-02-21T10:13:56.7227934Z and.pred %p50, %p1, %p58; 2026-02-21T10:13:56.7227997Z // begin inline asm 
2026-02-21T10:13:56.7228329Z @%p50 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r2027], [%rd34, {%r2515, %r2378}], [%r2026]; 2026-02-21T10:13:56.7228397Z // end inline asm 2026-02-21T10:13:56.7228707Z .loc 1 26 112 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:26:112 2026-02-21T10:13:56.7228786Z setp.ne.b32 %p59, %r2377, 255; 2026-02-21T10:13:56.7228852Z @%p59 bra $L__BB0_6; 2026-02-21T10:13:56.7228973Z // %bb.5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:13:56.7229186Z .loc 1 38 32 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:38:32 2026-02-21T10:13:56.7229255Z add.s32 %r2263, %r2375, %r29; 2026-02-21T10:13:56.7229465Z .loc 1 40 32 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:40:32 2026-02-21T10:13:56.7229530Z add.s32 %r2264, %r2373, %r13; 2026-02-21T10:13:56.7229595Z add.s32 %r2265, %r14, %r2373; 2026-02-21T10:13:56.7229664Z add.s32 %r2266, %r15, %r2373; 2026-02-21T10:13:56.7229730Z add.s32 %r2267, %r16, %r2373; 2026-02-21T10:13:56.7229791Z add.s32 %r2268, %r17, %r2373; 2026-02-21T10:13:56.7229853Z add.s32 %r2269, %r18, %r2373; 2026-02-21T10:13:56.7229923Z add.s32 %r2270, %r19, %r2373; 2026-02-21T10:13:56.7229987Z add.s32 %r2271, %r2373, %r20; 2026-02-21T10:13:56.7230051Z add.s32 %r2272, %r21, %r2373; 2026-02-21T10:13:56.7230121Z add.s32 %r2273, %r22, %r2373; 2026-02-21T10:13:56.7230184Z add.s32 %r2274, %r23, %r2373; 2026-02-21T10:13:56.7230249Z add.s32 %r2275, %r24, %r2373; 2026-02-21T10:13:56.7230314Z add.s32 %r2276, %r25, %r2373; 2026-02-21T10:13:56.7230386Z add.s32 %r2277, %r26, %r2373; 2026-02-21T10:13:56.7230449Z add.s32 %r2278, %r27, %r2373; 2026-02-21T10:13:56.7230518Z add.s32 %r2279, %r2373, %r28; 2026-02-21T10:13:56.7230734Z .loc 1 93 28 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:93:28 2026-02-21T10:13:56.7230818Z cvt.rn.bf16x2.f32 %r2280, %r2385, %r2384; 2026-02-21T10:13:56.7230896Z cvt.rn.bf16x2.f32 %r2281, %r2387, %r2386; 2026-02-21T10:13:56.7230979Z 
cvt.rn.bf16x2.f32 %r2282, %r2389, %r2388; 2026-02-21T10:13:56.7231053Z cvt.rn.bf16x2.f32 %r2283, %r2391, %r2390; 2026-02-21T10:13:56.7231129Z cvt.rn.bf16x2.f32 %r2284, %r2393, %r2392; 2026-02-21T10:13:56.7231204Z cvt.rn.bf16x2.f32 %r2285, %r2395, %r2394; 2026-02-21T10:13:56.7231286Z cvt.rn.bf16x2.f32 %r2286, %r2397, %r2396; 2026-02-21T10:13:56.7231359Z cvt.rn.bf16x2.f32 %r2287, %r2399, %r2398; 2026-02-21T10:13:56.7231583Z cvt.rn.bf16x2.f32 %r2288, %r2401, %r2400; 2026-02-21T10:13:56.7231663Z cvt.rn.bf16x2.f32 %r2289, %r2403, %r2402; 2026-02-21T10:13:56.7231737Z cvt.rn.bf16x2.f32 %r2290, %r2405, %r2404; 2026-02-21T10:13:56.7231811Z cvt.rn.bf16x2.f32 %r2291, %r2407, %r2406; 2026-02-21T10:13:56.7231885Z cvt.rn.bf16x2.f32 %r2292, %r2409, %r2408; 2026-02-21T10:13:56.7231966Z cvt.rn.bf16x2.f32 %r2293, %r2411, %r2410; 2026-02-21T10:13:56.7232039Z cvt.rn.bf16x2.f32 %r2294, %r2413, %r2412; 2026-02-21T10:13:56.7232112Z cvt.rn.bf16x2.f32 %r2295, %r2415, %r2414; 2026-02-21T10:13:56.7232190Z cvt.rn.bf16x2.f32 %r2296, %r2417, %r2416; 2026-02-21T10:13:56.7232264Z cvt.rn.bf16x2.f32 %r2297, %r2419, %r2418; 2026-02-21T10:13:56.7232338Z cvt.rn.bf16x2.f32 %r2298, %r2421, %r2420; 2026-02-21T10:13:56.7232464Z cvt.rn.bf16x2.f32 %r2299, %r2423, %r2422; 2026-02-21T10:13:56.7232540Z cvt.rn.bf16x2.f32 %r2300, %r2425, %r2424; 2026-02-21T10:13:56.7232612Z cvt.rn.bf16x2.f32 %r2301, %r2427, %r2426; 2026-02-21T10:13:56.7232690Z cvt.rn.bf16x2.f32 %r2302, %r2429, %r2428; 2026-02-21T10:13:56.7232769Z cvt.rn.bf16x2.f32 %r2303, %r2431, %r2430; 2026-02-21T10:13:56.7232846Z cvt.rn.bf16x2.f32 %r2304, %r2433, %r2432; 2026-02-21T10:13:56.7232919Z cvt.rn.bf16x2.f32 %r2305, %r2435, %r2434; 2026-02-21T10:13:56.7233059Z cvt.rn.bf16x2.f32 %r2306, %r2437, %r2436; 2026-02-21T10:13:56.7233138Z cvt.rn.bf16x2.f32 %r2307, %r2439, %r2438; 2026-02-21T10:13:56.7233211Z cvt.rn.bf16x2.f32 %r2308, %r2441, %r2440; 2026-02-21T10:13:56.7233290Z cvt.rn.bf16x2.f32 %r2309, %r2443, %r2442; 2026-02-21T10:13:56.7233363Z 
cvt.rn.bf16x2.f32 %r2310, %r2445, %r2444; 2026-02-21T10:13:56.7233435Z cvt.rn.bf16x2.f32 %r2311, %r2447, %r2446; 2026-02-21T10:13:56.7233509Z cvt.rn.bf16x2.f32 %r2312, %r2449, %r2448; 2026-02-21T10:13:56.7233591Z cvt.rn.bf16x2.f32 %r2313, %r2451, %r2450; 2026-02-21T10:13:56.7233665Z cvt.rn.bf16x2.f32 %r2314, %r2453, %r2452; 2026-02-21T10:13:56.7233737Z cvt.rn.bf16x2.f32 %r2315, %r2455, %r2454; 2026-02-21T10:13:56.7233818Z cvt.rn.bf16x2.f32 %r2316, %r2457, %r2456; 2026-02-21T10:13:56.7233890Z cvt.rn.bf16x2.f32 %r2317, %r2459, %r2458; 2026-02-21T10:13:56.7233962Z cvt.rn.bf16x2.f32 %r2318, %r2461, %r2460; 2026-02-21T10:13:56.7234043Z cvt.rn.bf16x2.f32 %r2319, %r2463, %r2462; 2026-02-21T10:13:56.7234118Z cvt.rn.bf16x2.f32 %r2320, %r2465, %r2464; 2026-02-21T10:13:56.7234192Z cvt.rn.bf16x2.f32 %r2321, %r2467, %r2466; 2026-02-21T10:13:56.7234266Z cvt.rn.bf16x2.f32 %r2322, %r2469, %r2468; 2026-02-21T10:13:56.7234347Z cvt.rn.bf16x2.f32 %r2323, %r2471, %r2470; 2026-02-21T10:13:56.7234421Z cvt.rn.bf16x2.f32 %r2324, %r2473, %r2472; 2026-02-21T10:13:56.7234493Z cvt.rn.bf16x2.f32 %r2325, %r2475, %r2474; 2026-02-21T10:13:56.7234570Z cvt.rn.bf16x2.f32 %r2326, %r2477, %r2476; 2026-02-21T10:13:56.7234647Z cvt.rn.bf16x2.f32 %r2327, %r2479, %r2478; 2026-02-21T10:13:56.7234720Z cvt.rn.bf16x2.f32 %r2328, %r2481, %r2480; 2026-02-21T10:13:56.7234794Z cvt.rn.bf16x2.f32 %r2329, %r2483, %r2482; 2026-02-21T10:13:56.7234877Z cvt.rn.bf16x2.f32 %r2330, %r2485, %r2484; 2026-02-21T10:13:56.7234951Z cvt.rn.bf16x2.f32 %r2331, %r2487, %r2486; 2026-02-21T10:13:56.7235024Z cvt.rn.bf16x2.f32 %r2332, %r2489, %r2488; 2026-02-21T10:13:56.7235103Z cvt.rn.bf16x2.f32 %r2333, %r2491, %r2490; 2026-02-21T10:13:56.7235177Z cvt.rn.bf16x2.f32 %r2334, %r2493, %r2492; 2026-02-21T10:13:56.7235250Z cvt.rn.bf16x2.f32 %r2335, %r2495, %r2494; 2026-02-21T10:13:56.7235334Z cvt.rn.bf16x2.f32 %r2336, %r2497, %r2496; 2026-02-21T10:13:56.7235419Z cvt.rn.bf16x2.f32 %r2337, %r2499, %r2498; 2026-02-21T10:13:56.7235494Z 
cvt.rn.bf16x2.f32 %r2338, %r2501, %r2500; 2026-02-21T10:13:56.7235568Z cvt.rn.bf16x2.f32 %r2339, %r2503, %r2502; 2026-02-21T10:13:56.7235645Z cvt.rn.bf16x2.f32 %r2340, %r2505, %r2504; 2026-02-21T10:13:56.7235723Z cvt.rn.bf16x2.f32 %r2341, %r2507, %r2506; 2026-02-21T10:13:56.7235796Z cvt.rn.bf16x2.f32 %r2342, %r2509, %r2508; 2026-02-21T10:13:56.7235876Z cvt.rn.bf16x2.f32 %r2343, %r2511, %r2510; 2026-02-21T10:13:56.7236195Z .loc 1 94 50 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:94:50 2026-02-21T10:13:56.7236275Z mad.lo.s32 %r2344, %r2264, 1280, %r2263; 2026-02-21T10:13:56.7236364Z mad.lo.s32 %r2345, %r2265, 1280, %r2263; 2026-02-21T10:13:56.7236439Z mad.lo.s32 %r2346, %r2266, 1280, %r2263; 2026-02-21T10:13:56.7236636Z mad.lo.s32 %r2347, %r2267, 1280, %r2263; 2026-02-21T10:13:56.7236711Z mad.lo.s32 %r2348, %r2268, 1280, %r2263; 2026-02-21T10:13:56.7236791Z mad.lo.s32 %r2349, %r2269, 1280, %r2263; 2026-02-21T10:13:56.7236861Z mad.lo.s32 %r2350, %r2270, 1280, %r2263; 2026-02-21T10:13:56.7236930Z mad.lo.s32 %r2351, %r2271, 1280, %r2263; 2026-02-21T10:13:56.7237006Z mad.lo.s32 %r2352, %r2272, 1280, %r2263; 2026-02-21T10:13:56.7237170Z mad.lo.s32 %r2353, %r2273, 1280, %r2263; 2026-02-21T10:13:56.7237242Z mad.lo.s32 %r2354, %r2274, 1280, %r2263; 2026-02-21T10:13:56.7237318Z mad.lo.s32 %r2355, %r2275, 1280, %r2263; 2026-02-21T10:13:56.7237388Z mad.lo.s32 %r2356, %r2276, 1280, %r2263; 2026-02-21T10:13:56.7237476Z mad.lo.s32 %r2357, %r2277, 1280, %r2263; 2026-02-21T10:13:56.7237550Z mad.lo.s32 %r2358, %r2278, 1280, %r2263; 2026-02-21T10:13:56.7237627Z mad.lo.s32 %r2359, %r2279, 1280, %r2263; 2026-02-21T10:13:56.7237898Z .loc 1 94 22 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:94:22 2026-02-21T10:13:56.7237977Z mad.wide.s32 %rd109, %r2344, 2, %rd7; 2026-02-21T10:13:56.7238054Z mad.wide.s32 %rd110, %r2345, 2, %rd7; 2026-02-21T10:13:56.7238125Z mad.wide.s32 %rd111, %r2346, 2, %rd7; 2026-02-21T10:13:56.7238195Z mad.wide.s32 %rd112, %r2347, 2, %rd7; 
2026-02-21T10:13:56.7238263Z mad.wide.s32 %rd113, %r2348, 2, %rd7; 2026-02-21T10:13:56.7238337Z mad.wide.s32 %rd114, %r2349, 2, %rd7; 2026-02-21T10:13:56.7238408Z mad.wide.s32 %rd115, %r2350, 2, %rd7; 2026-02-21T10:13:56.7238482Z mad.wide.s32 %rd116, %r2351, 2, %rd7; 2026-02-21T10:13:56.7238551Z mad.wide.s32 %rd117, %r2352, 2, %rd7; 2026-02-21T10:13:56.7238622Z mad.wide.s32 %rd118, %r2353, 2, %rd7; 2026-02-21T10:13:56.7238692Z mad.wide.s32 %rd119, %r2354, 2, %rd7; 2026-02-21T10:13:56.7238768Z mad.wide.s32 %rd120, %r2355, 2, %rd7; 2026-02-21T10:13:56.7238836Z mad.wide.s32 %rd121, %r2356, 2, %rd7; 2026-02-21T10:13:56.7238907Z mad.wide.s32 %rd122, %r2357, 2, %rd7; 2026-02-21T10:13:56.7238982Z mad.wide.s32 %rd123, %r2358, 2, %rd7; 2026-02-21T10:13:56.7239050Z mad.wide.s32 %rd124, %r2359, 2, %rd7; 2026-02-21T10:13:56.7239253Z .loc 1 94 81 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:94:81 2026-02-21T10:13:56.7239371Z st.shared.v4.b32 [%r60], {%r2280, %r2282, %r2284, %r2286}; 2026-02-21T10:13:56.7239487Z st.shared.v4.b32 [%r61], {%r2288, %r2290, %r2292, %r2294}; 2026-02-21T10:13:56.7239595Z st.shared.v4.b32 [%r62], {%r2296, %r2298, %r2300, %r2302}; 2026-02-21T10:13:56.7239699Z st.shared.v4.b32 [%r63], {%r2304, %r2306, %r2308, %r2310}; 2026-02-21T10:13:56.7239779Z bar.sync 0; 2026-02-21T10:13:56.7239846Z // begin inline asm 2026-02-21T10:13:56.7240048Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2198, %r2199, %r2200, %r2201}, [%r2122]; 2026-02-21T10:13:56.7240112Z // end inline asm 2026-02-21T10:13:56.7240174Z // begin inline asm 2026-02-21T10:13:56.7240361Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2206, %r2207, %r2208, %r2209}, [%r2127]; 2026-02-21T10:13:56.7240421Z // end inline asm 2026-02-21T10:13:56.7240489Z // begin inline asm 2026-02-21T10:13:56.7240670Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2214, %r2215, %r2216, %r2217}, [%r2132]; 2026-02-21T10:13:56.7240730Z // end inline asm 2026-02-21T10:13:56.7240797Z // begin inline asm 
2026-02-21T10:13:56.7240977Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2222, %r2223, %r2224, %r2225}, [%r2137]; 2026-02-21T10:13:56.7241036Z // end inline asm 2026-02-21T10:13:56.7241102Z bar.sync 0; 2026-02-21T10:13:56.7241213Z st.shared.v4.b32 [%r60], {%r2281, %r2283, %r2285, %r2287}; 2026-02-21T10:13:56.7241319Z st.shared.v4.b32 [%r61], {%r2289, %r2291, %r2293, %r2295}; 2026-02-21T10:13:56.7241558Z st.shared.v4.b32 [%r62], {%r2297, %r2299, %r2301, %r2303}; 2026-02-21T10:13:56.7241682Z st.shared.v4.b32 [%r63], {%r2305, %r2307, %r2309, %r2311}; 2026-02-21T10:13:56.7241745Z bar.sync 0; 2026-02-21T10:13:56.7241808Z // begin inline asm 2026-02-21T10:13:56.7241999Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2202, %r2203, %r2204, %r2205}, [%r2122]; 2026-02-21T10:13:56.7242058Z // end inline asm 2026-02-21T10:13:56.7242119Z // begin inline asm 2026-02-21T10:13:56.7242304Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2210, %r2211, %r2212, %r2213}, [%r2127]; 2026-02-21T10:13:56.7242363Z // end inline asm 2026-02-21T10:13:56.7242424Z // begin inline asm 2026-02-21T10:13:56.7242652Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2218, %r2219, %r2220, %r2221}, [%r2132]; 2026-02-21T10:13:56.7242720Z // end inline asm 2026-02-21T10:13:56.7242780Z // begin inline asm 2026-02-21T10:13:56.7242960Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2226, %r2227, %r2228, %r2229}, [%r2137]; 2026-02-21T10:13:56.7243028Z // end inline asm 2026-02-21T10:13:56.7243086Z bar.sync 0; 2026-02-21T10:13:56.7243192Z st.shared.v4.b32 [%r60], {%r2312, %r2314, %r2316, %r2318}; 2026-02-21T10:13:56.7243343Z st.shared.v4.b32 [%r61], {%r2320, %r2322, %r2324, %r2326}; 2026-02-21T10:13:56.7243454Z st.shared.v4.b32 [%r62], {%r2328, %r2330, %r2332, %r2334}; 2026-02-21T10:13:56.7243559Z st.shared.v4.b32 [%r63], {%r2336, %r2338, %r2340, %r2342}; 2026-02-21T10:13:56.7243615Z bar.sync 0; 2026-02-21T10:13:56.7243682Z // begin inline asm 2026-02-21T10:13:56.7243861Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2230, 
%r2231, %r2232, %r2233}, [%r2122]; 2026-02-21T10:13:56.7243919Z // end inline asm 2026-02-21T10:13:56.7243983Z // begin inline asm 2026-02-21T10:13:56.7244175Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2238, %r2239, %r2240, %r2241}, [%r2127]; 2026-02-21T10:13:56.7244237Z // end inline asm 2026-02-21T10:13:56.7244296Z // begin inline asm 2026-02-21T10:13:56.7244484Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2246, %r2247, %r2248, %r2249}, [%r2132]; 2026-02-21T10:13:56.7244542Z // end inline asm 2026-02-21T10:13:56.7244603Z // begin inline asm 2026-02-21T10:13:56.7244799Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2254, %r2255, %r2256, %r2257}, [%r2137]; 2026-02-21T10:13:56.7244861Z // end inline asm 2026-02-21T10:13:56.7244918Z bar.sync 0; 2026-02-21T10:13:56.7245022Z st.shared.v4.b32 [%r60], {%r2313, %r2315, %r2317, %r2319}; 2026-02-21T10:13:56.7245131Z st.shared.v4.b32 [%r61], {%r2321, %r2323, %r2325, %r2327}; 2026-02-21T10:13:56.7245235Z st.shared.v4.b32 [%r62], {%r2329, %r2331, %r2333, %r2335}; 2026-02-21T10:13:56.7245339Z st.shared.v4.b32 [%r63], {%r2337, %r2339, %r2341, %r2343}; 2026-02-21T10:13:56.7245399Z bar.sync 0; 2026-02-21T10:13:56.7245463Z // begin inline asm 2026-02-21T10:13:56.7245645Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2234, %r2235, %r2236, %r2237}, [%r2122]; 2026-02-21T10:13:56.7245707Z // end inline asm 2026-02-21T10:13:56.7245769Z // begin inline asm 2026-02-21T10:13:56.7245948Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2242, %r2243, %r2244, %r2245}, [%r2127]; 2026-02-21T10:13:56.7246005Z // end inline asm 2026-02-21T10:13:56.7246072Z // begin inline asm 2026-02-21T10:13:56.7246252Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2250, %r2251, %r2252, %r2253}, [%r2132]; 2026-02-21T10:13:56.7246310Z // end inline asm 2026-02-21T10:13:56.7246376Z // begin inline asm 2026-02-21T10:13:56.7246687Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2258, %r2259, %r2260, %r2261}, [%r2137]; 2026-02-21T10:13:56.7246749Z // end inline asm 
2026-02-21T10:13:56.7246814Z // begin inline asm 2026-02-21T10:13:56.7246947Z st.global.v4.b32 [ %rd109 + 0 ], { %r2198, %r2199, %r2200, %r2201 }; 2026-02-21T10:13:56.7247005Z // end inline asm 2026-02-21T10:13:56.7247067Z // begin inline asm 2026-02-21T10:13:56.7247194Z st.global.v4.b32 [ %rd110 + 0 ], { %r2202, %r2203, %r2204, %r2205 }; 2026-02-21T10:13:56.7247252Z // end inline asm 2026-02-21T10:13:56.7247467Z // begin inline asm 2026-02-21T10:13:56.7247594Z st.global.v4.b32 [ %rd111 + 0 ], { %r2206, %r2207, %r2208, %r2209 }; 2026-02-21T10:13:56.7247652Z // end inline asm 2026-02-21T10:13:56.7247712Z // begin inline asm 2026-02-21T10:13:56.7247838Z st.global.v4.b32 [ %rd112 + 0 ], { %r2210, %r2211, %r2212, %r2213 }; 2026-02-21T10:13:56.7247898Z // end inline asm 2026-02-21T10:13:56.7247958Z // begin inline asm 2026-02-21T10:13:56.7248084Z st.global.v4.b32 [ %rd113 + 0 ], { %r2214, %r2215, %r2216, %r2217 }; 2026-02-21T10:13:56.7248145Z // end inline asm 2026-02-21T10:13:56.7248205Z // begin inline asm 2026-02-21T10:13:56.7248328Z st.global.v4.b32 [ %rd114 + 0 ], { %r2218, %r2219, %r2220, %r2221 }; 2026-02-21T10:13:56.7248389Z // end inline asm 2026-02-21T10:13:56.7248449Z // begin inline asm 2026-02-21T10:13:56.7248636Z st.global.v4.b32 [ %rd115 + 0 ], { %r2222, %r2223, %r2224, %r2225 }; 2026-02-21T10:13:56.7248703Z // end inline asm 2026-02-21T10:13:56.7248764Z // begin inline asm 2026-02-21T10:13:56.7248885Z st.global.v4.b32 [ %rd116 + 0 ], { %r2226, %r2227, %r2228, %r2229 }; 2026-02-21T10:13:56.7248950Z // end inline asm 2026-02-21T10:13:56.7249010Z // begin inline asm 2026-02-21T10:13:56.7249189Z st.global.v4.b32 [ %rd117 + 0 ], { %r2230, %r2231, %r2232, %r2233 }; 2026-02-21T10:13:56.7249257Z // end inline asm 2026-02-21T10:13:56.7249316Z // begin inline asm 2026-02-21T10:13:56.7249433Z st.global.v4.b32 [ %rd118 + 0 ], { %r2234, %r2235, %r2236, %r2237 }; 2026-02-21T10:13:56.7249491Z // end inline asm 2026-02-21T10:13:56.7249559Z // begin inline asm 
2026-02-21T10:13:56.7249676Z st.global.v4.b32 [ %rd119 + 0 ], { %r2238, %r2239, %r2240, %r2241 }; 2026-02-21T10:13:56.7249745Z // end inline asm 2026-02-21T10:13:56.7249813Z // begin inline asm 2026-02-21T10:13:56.7249935Z st.global.v4.b32 [ %rd120 + 0 ], { %r2242, %r2243, %r2244, %r2245 }; 2026-02-21T10:13:56.7249993Z // end inline asm 2026-02-21T10:13:56.7250056Z // begin inline asm 2026-02-21T10:13:56.7250179Z st.global.v4.b32 [ %rd121 + 0 ], { %r2246, %r2247, %r2248, %r2249 }; 2026-02-21T10:13:56.7250238Z // end inline asm 2026-02-21T10:13:56.7250298Z // begin inline asm 2026-02-21T10:13:56.7250418Z st.global.v4.b32 [ %rd122 + 0 ], { %r2250, %r2251, %r2252, %r2253 }; 2026-02-21T10:13:56.7250477Z // end inline asm 2026-02-21T10:13:56.7250537Z // begin inline asm 2026-02-21T10:13:56.7250656Z st.global.v4.b32 [ %rd123 + 0 ], { %r2254, %r2255, %r2256, %r2257 }; 2026-02-21T10:13:56.7250714Z // end inline asm 2026-02-21T10:13:56.7250774Z // begin inline asm 2026-02-21T10:13:56.7250890Z st.global.v4.b32 [ %rd124 + 0 ], { %r2258, %r2259, %r2260, %r2261 }; 2026-02-21T10:13:56.7250955Z // end inline asm 2026-02-21T10:13:56.7251024Z mov.b32 %r2384, 0f00000000; 2026-02-21T10:13:56.7251091Z mov.b32 %r2385, %r2384; 2026-02-21T10:13:56.7251158Z mov.b32 %r2386, %r2384; 2026-02-21T10:13:56.7251219Z mov.b32 %r2387, %r2384; 2026-02-21T10:13:56.7251281Z mov.b32 %r2388, %r2384; 2026-02-21T10:13:56.7251340Z mov.b32 %r2389, %r2384; 2026-02-21T10:13:56.7251410Z mov.b32 %r2390, %r2384; 2026-02-21T10:13:56.7251474Z mov.b32 %r2391, %r2384; 2026-02-21T10:13:56.7251533Z mov.b32 %r2392, %r2384; 2026-02-21T10:13:56.7251597Z mov.b32 %r2393, %r2384; 2026-02-21T10:13:56.7251658Z mov.b32 %r2394, %r2384; 2026-02-21T10:13:56.7251718Z mov.b32 %r2395, %r2384; 2026-02-21T10:13:56.7251777Z mov.b32 %r2396, %r2384; 2026-02-21T10:13:56.7251854Z mov.b32 %r2397, %r2384; 2026-02-21T10:13:56.7251915Z mov.b32 %r2398, %r2384; 2026-02-21T10:13:56.7251976Z mov.b32 %r2399, %r2384; 2026-02-21T10:13:56.7252042Z 
mov.b32 %r2400, %r2384; 2026-02-21T10:13:56.7252105Z mov.b32 %r2401, %r2384; 2026-02-21T10:13:56.7252166Z mov.b32 %r2402, %r2384; 2026-02-21T10:13:56.7252227Z mov.b32 %r2403, %r2384; 2026-02-21T10:13:56.7252293Z mov.b32 %r2404, %r2384; 2026-02-21T10:13:56.7252353Z mov.b32 %r2405, %r2384; 2026-02-21T10:13:56.7252413Z mov.b32 %r2406, %r2384; 2026-02-21T10:13:56.7252478Z mov.b32 %r2407, %r2384; 2026-02-21T10:13:56.7252642Z mov.b32 %r2408, %r2384; 2026-02-21T10:13:56.7252702Z mov.b32 %r2409, %r2384; 2026-02-21T10:13:56.7252768Z mov.b32 %r2410, %r2384; 2026-02-21T10:13:56.7252831Z mov.b32 %r2411, %r2384; 2026-02-21T10:13:56.7252890Z mov.b32 %r2412, %r2384; 2026-02-21T10:13:56.7252955Z mov.b32 %r2413, %r2384; 2026-02-21T10:13:56.7253022Z mov.b32 %r2414, %r2384; 2026-02-21T10:13:56.7253082Z mov.b32 %r2415, %r2384; 2026-02-21T10:13:56.7253142Z mov.b32 %r2416, %r2384; 2026-02-21T10:13:56.7253214Z mov.b32 %r2417, %r2384; 2026-02-21T10:13:56.7253273Z mov.b32 %r2418, %r2384; 2026-02-21T10:13:56.7253333Z mov.b32 %r2419, %r2384; 2026-02-21T10:13:56.7253395Z mov.b32 %r2420, %r2384; 2026-02-21T10:13:56.7253463Z mov.b32 %r2421, %r2384; 2026-02-21T10:13:56.7253524Z mov.b32 %r2422, %r2384; 2026-02-21T10:13:56.7253636Z mov.b32 %r2423, %r2384; 2026-02-21T10:13:56.7253708Z mov.b32 %r2424, %r2384; 2026-02-21T10:13:56.7253769Z mov.b32 %r2425, %r2384; 2026-02-21T10:13:56.7253828Z mov.b32 %r2426, %r2384; 2026-02-21T10:13:56.7253892Z mov.b32 %r2427, %r2384; 2026-02-21T10:13:56.7253959Z mov.b32 %r2428, %r2384; 2026-02-21T10:13:56.7254020Z mov.b32 %r2429, %r2384; 2026-02-21T10:13:56.7254080Z mov.b32 %r2430, %r2384; 2026-02-21T10:13:56.7254159Z mov.b32 %r2431, %r2384; 2026-02-21T10:13:56.7254271Z mov.b32 %r2432, %r2384; 2026-02-21T10:13:56.7254335Z mov.b32 %r2433, %r2384; 2026-02-21T10:13:56.7254396Z mov.b32 %r2434, %r2384; 2026-02-21T10:13:56.7254461Z mov.b32 %r2435, %r2384; 2026-02-21T10:13:56.7254523Z mov.b32 %r2436, %r2384; 2026-02-21T10:13:56.7254584Z mov.b32 %r2437, %r2384; 
2026-02-21T10:13:56.7254650Z mov.b32 %r2438, %r2384; 2026-02-21T10:13:56.7254711Z mov.b32 %r2439, %r2384; 2026-02-21T10:13:56.7254772Z mov.b32 %r2440, %r2384; 2026-02-21T10:13:56.7254831Z mov.b32 %r2441, %r2384; 2026-02-21T10:13:56.7254898Z mov.b32 %r2442, %r2384; 2026-02-21T10:13:56.7254958Z mov.b32 %r2443, %r2384; 2026-02-21T10:13:56.7255019Z mov.b32 %r2444, %r2384; 2026-02-21T10:13:56.7255086Z mov.b32 %r2445, %r2384; 2026-02-21T10:13:56.7255152Z mov.b32 %r2446, %r2384; 2026-02-21T10:13:56.7255212Z mov.b32 %r2447, %r2384; 2026-02-21T10:13:56.7255273Z mov.b32 %r2448, %r2384; 2026-02-21T10:13:56.7255344Z mov.b32 %r2449, %r2384; 2026-02-21T10:13:56.7255405Z mov.b32 %r2450, %r2384; 2026-02-21T10:13:56.7255468Z mov.b32 %r2451, %r2384; 2026-02-21T10:13:56.7255537Z mov.b32 %r2452, %r2384; 2026-02-21T10:13:56.7255598Z mov.b32 %r2453, %r2384; 2026-02-21T10:13:56.7255659Z mov.b32 %r2454, %r2384; 2026-02-21T10:13:56.7255726Z mov.b32 %r2455, %r2384; 2026-02-21T10:13:56.7255786Z mov.b32 %r2456, %r2384; 2026-02-21T10:13:56.7255847Z mov.b32 %r2457, %r2384; 2026-02-21T10:13:56.7255906Z mov.b32 %r2458, %r2384; 2026-02-21T10:13:56.7255973Z mov.b32 %r2459, %r2384; 2026-02-21T10:13:56.7256034Z mov.b32 %r2460, %r2384; 2026-02-21T10:13:56.7256095Z mov.b32 %r2461, %r2384; 2026-02-21T10:13:56.7256162Z mov.b32 %r2462, %r2384; 2026-02-21T10:13:56.7256224Z mov.b32 %r2463, %r2384; 2026-02-21T10:13:56.7256289Z mov.b32 %r2464, %r2384; 2026-02-21T10:13:56.7256351Z mov.b32 %r2465, %r2384; 2026-02-21T10:13:56.7256418Z mov.b32 %r2466, %r2384; 2026-02-21T10:13:56.7256604Z mov.b32 %r2467, %r2384; 2026-02-21T10:13:56.7256670Z mov.b32 %r2468, %r2384; 2026-02-21T10:13:56.7256749Z mov.b32 %r2469, %r2384; 2026-02-21T10:13:56.7256812Z mov.b32 %r2470, %r2384; 2026-02-21T10:13:56.7256874Z mov.b32 %r2471, %r2384; 2026-02-21T10:13:56.7256933Z mov.b32 %r2472, %r2384; 2026-02-21T10:13:56.7256999Z mov.b32 %r2473, %r2384; 2026-02-21T10:13:56.7257061Z mov.b32 %r2474, %r2384; 2026-02-21T10:13:56.7257120Z mov.b32 
%r2475, %r2384; 2026-02-21T10:13:56.7257185Z mov.b32 %r2476, %r2384; 2026-02-21T10:13:56.7257244Z mov.b32 %r2477, %r2384; 2026-02-21T10:13:56.7257303Z mov.b32 %r2478, %r2384; 2026-02-21T10:13:56.7257365Z mov.b32 %r2479, %r2384; 2026-02-21T10:13:56.7257431Z mov.b32 %r2480, %r2384; 2026-02-21T10:13:56.7257491Z mov.b32 %r2481, %r2384; 2026-02-21T10:13:56.7257550Z mov.b32 %r2482, %r2384; 2026-02-21T10:13:56.7257797Z mov.b32 %r2483, %r2384; 2026-02-21T10:13:56.7257858Z mov.b32 %r2484, %r2384; 2026-02-21T10:13:56.7257917Z mov.b32 %r2485, %r2384; 2026-02-21T10:13:56.7257976Z mov.b32 %r2486, %r2384; 2026-02-21T10:13:56.7258043Z mov.b32 %r2487, %r2384; 2026-02-21T10:13:56.7258106Z mov.b32 %r2488, %r2384; 2026-02-21T10:13:56.7258167Z mov.b32 %r2489, %r2384; 2026-02-21T10:13:56.7258232Z mov.b32 %r2490, %r2384; 2026-02-21T10:13:56.7258294Z mov.b32 %r2491, %r2384; 2026-02-21T10:13:56.7258353Z mov.b32 %r2492, %r2384; 2026-02-21T10:13:56.7258413Z mov.b32 %r2493, %r2384; 2026-02-21T10:13:56.7258479Z mov.b32 %r2494, %r2384; 2026-02-21T10:13:56.7258538Z mov.b32 %r2495, %r2384; 2026-02-21T10:13:56.7258599Z mov.b32 %r2496, %r2384; 2026-02-21T10:13:56.7258663Z mov.b32 %r2497, %r2384; 2026-02-21T10:13:56.7258788Z mov.b32 %r2498, %r2384; 2026-02-21T10:13:56.7258851Z mov.b32 %r2499, %r2384; 2026-02-21T10:13:56.7258916Z mov.b32 %r2500, %r2384; 2026-02-21T10:13:56.7258978Z mov.b32 %r2501, %r2384; 2026-02-21T10:13:56.7259041Z mov.b32 %r2502, %r2384; 2026-02-21T10:13:56.7259101Z mov.b32 %r2503, %r2384; 2026-02-21T10:13:56.7259166Z mov.b32 %r2504, %r2384; 2026-02-21T10:13:56.7259228Z mov.b32 %r2505, %r2384; 2026-02-21T10:13:56.7259350Z mov.b32 %r2506, %r2384; 2026-02-21T10:13:56.7259418Z mov.b32 %r2507, %r2384; 2026-02-21T10:13:56.7259481Z mov.b32 %r2508, %r2384; 2026-02-21T10:13:56.7259542Z mov.b32 %r2509, %r2384; 2026-02-21T10:13:56.7259601Z mov.b32 %r2510, %r2384; 2026-02-21T10:13:56.7259666Z mov.b32 %r2511, %r2384; 2026-02-21T10:13:56.7259727Z bra.uni $L__BB0_6; 
2026-02-21T10:13:56.7259819Z $L__BB0_7: // %._crit_edge 2026-02-21T10:13:56.7260066Z .loc 1 26 112 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:26:112 2026-02-21T10:13:56.7260139Z cp.async.wait_group 0; 2026-02-21T10:13:56.7260198Z bar.sync 0; 2026-02-21T10:13:56.7260260Z // begin inline asm 2026-02-21T10:13:56.7260368Z @%p61 mbarrier.inval.shared::cta.b64 [%r512]; 2026-02-21T10:13:56.7260429Z // end inline asm 2026-02-21T10:13:56.7260487Z bar.sync 0; 2026-02-21T10:13:56.7260553Z // begin inline asm 2026-02-21T10:13:56.7260647Z @%p61 mbarrier.inval.shared::cta.b64 [%r513]; 2026-02-21T10:13:56.7260708Z // end inline asm 2026-02-21T10:13:56.7260768Z bar.sync 0; 2026-02-21T10:13:56.7260836Z // begin inline asm 2026-02-21T10:13:56.7260923Z @%p61 mbarrier.inval.shared::cta.b64 [%r514]; 2026-02-21T10:13:56.7260982Z // end inline asm 2026-02-21T10:13:56.7261196Z .loc 1 26 4 // ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py:26:4 2026-02-21T10:13:56.7261253Z ret; 2026-02-21T10:13:56.7261312Z $L__tmp3: 2026-02-21T10:13:56.7261381Z $L__func_end0: 2026-02-21T10:13:56.7261473Z // -- End function 2026-02-21T10:13:56.7261529Z } 2026-02-21T10:13:56.7261782Z .file 1 "/tmp/torchinductor_root/ia/ciaahmr7lflzaxnnnghnqheym6xkx2cf5rlz435tdxl7crdm7q3k.py" 2026-02-21T10:13:56.7262005Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T10:13:56.7262074Z .section .debug_abbrev 2026-02-21T10:13:56.7262129Z { 2026-02-21T10:13:56.7262237Z .b8 1 // Abbreviation Code 2026-02-21T10:13:56.7262339Z .b8 17 // DW_TAG_compile_unit 2026-02-21T10:13:56.7262427Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:13:56.7262520Z .b8 37 // DW_AT_producer 2026-02-21T10:13:56.7262605Z .b8 8 // DW_FORM_string 2026-02-21T10:13:56.7262686Z .b8 19 // DW_AT_language 2026-02-21T10:13:56.7262785Z .b8 5 // DW_FORM_data2 2026-02-21T10:13:56.7262882Z .b8 3 // DW_AT_name 2026-02-21T10:13:56.7262966Z .b8 8 // DW_FORM_string 2026-02-21T10:13:56.7263113Z .b8 16 
// DW_AT_stmt_list 2026-02-21T10:13:56.7263251Z .b8 6 // DW_FORM_data4 2026-02-21T10:13:56.7263334Z .b8 27 // DW_AT_comp_dir 2026-02-21T10:13:56.7263421Z .b8 8 // DW_FORM_string 2026-02-21T10:13:56.7263506Z .b8 0 // EOM(1) 2026-02-21T10:13:56.7263581Z .b8 0 // EOM(2) 2026-02-21T10:13:56.7263671Z .b8 2 // Abbreviation Code 2026-02-21T10:13:56.7263761Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:13:56.7263850Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:13:56.7263928Z .b8 3 // DW_AT_name 2026-02-21T10:13:56.7264057Z .b8 8 // DW_FORM_string 2026-02-21T10:13:56.7264150Z .b8 32 // DW_AT_inline 2026-02-21T10:13:56.7264237Z .b8 11 // DW_FORM_data1 2026-02-21T10:13:56.7264314Z .b8 0 // EOM(1) 2026-02-21T10:13:56.7264392Z .b8 0 // EOM(2) 2026-02-21T10:13:56.7264531Z .b8 3 // Abbreviation Code 2026-02-21T10:13:56.7264622Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:13:56.7264710Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:13:56.7264793Z .b8 17 // DW_AT_low_pc 2026-02-21T10:13:56.7264886Z .b8 1 // DW_FORM_addr 2026-02-21T10:13:56.7264970Z .b8 18 // DW_AT_high_pc 2026-02-21T10:13:56.7265055Z .b8 1 // DW_FORM_addr 2026-02-21T10:13:56.7265153Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:13:56.7265233Z .b8 19 // DW_FORM_ref4 2026-02-21T10:13:56.7265316Z .b8 0 // EOM(1) 2026-02-21T10:13:56.7265388Z .b8 0 // EOM(2) 2026-02-21T10:13:56.7265476Z .b8 4 // Abbreviation Code 2026-02-21T10:13:56.7265596Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T10:13:56.7265682Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:13:56.7265774Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:13:56.7265856Z .b8 19 // DW_FORM_ref4 2026-02-21T10:13:56.7265944Z .b8 17 // DW_AT_low_pc 2026-02-21T10:13:56.7266023Z .b8 1 // DW_FORM_addr 2026-02-21T10:13:56.7266106Z .b8 18 // DW_AT_high_pc 2026-02-21T10:13:56.7266187Z .b8 1 // DW_FORM_addr 2026-02-21T10:13:56.7266271Z .b8 88 // DW_AT_call_file 2026-02-21T10:13:56.7266354Z .b8 11 // DW_FORM_data1 2026-02-21T10:13:56.7266442Z .b8 89 // DW_AT_call_line 2026-02-21T10:13:56.7266640Z .b8 
11 // DW_FORM_data1 2026-02-21T10:13:56.7266729Z .b8 87 // DW_AT_call_column 2026-02-21T10:13:56.7266809Z .b8 11 // DW_FORM_data1 2026-02-21T10:13:56.7266889Z .b8 0 // EOM(1) 2026-02-21T10:13:56.7266959Z .b8 0 // EOM(2) 2026-02-21T10:13:56.7267032Z .b8 0 // EOM(3) 2026-02-21T10:13:56.7267091Z } 2026-02-21T10:13:56.7267156Z .section .debug_info 2026-02-21T10:13:56.7267208Z { 2026-02-21T10:13:56.7267314Z .b32 178 // Length of Unit 2026-02-21T10:13:56.7267413Z .b8 2 // DWARF version number 2026-02-21T10:13:56.7267469Z .b8 0 2026-02-21T10:13:56.7267685Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T10:13:56.7267848Z .b8 8 // Address Size (in bytes) 2026-02-21T10:13:56.7267964Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T10:13:56.7268053Z .b8 116 // DW_AT_producer 2026-02-21T10:13:56.7268115Z .b8 114 2026-02-21T10:13:56.7268170Z .b8 105 2026-02-21T10:13:56.7268227Z .b8 116 2026-02-21T10:13:56.7268280Z .b8 111 2026-02-21T10:13:56.7268340Z .b8 110 2026-02-21T10:13:56.7268393Z .b8 0 2026-02-21T10:13:56.7268552Z .b8 2 // DW_AT_language 2026-02-21T10:13:56.7268618Z .b8 0 2026-02-21T10:13:56.7268702Z .b8 99 // DW_AT_name 2026-02-21T10:13:56.7268758Z .b8 105 2026-02-21T10:13:56.7268882Z .b8 97 2026-02-21T10:13:56.7268942Z .b8 97 2026-02-21T10:13:56.7268997Z .b8 104 2026-02-21T10:13:56.7269051Z .b8 109 2026-02-21T10:13:56.7269110Z .b8 114 2026-02-21T10:13:56.7269171Z .b8 55 2026-02-21T10:13:56.7269224Z .b8 108 2026-02-21T10:13:56.7269279Z .b8 102 2026-02-21T10:13:56.7269342Z .b8 108 2026-02-21T10:13:56.7269397Z .b8 122 2026-02-21T10:13:56.7269450Z .b8 97 2026-02-21T10:13:56.7269509Z .b8 120 2026-02-21T10:13:56.7269563Z .b8 110 2026-02-21T10:13:56.7269682Z .b8 110 2026-02-21T10:13:56.7269737Z .b8 110 2026-02-21T10:13:56.7269796Z .b8 103 2026-02-21T10:13:56.7269850Z .b8 104 2026-02-21T10:13:56.7269905Z .b8 110 2026-02-21T10:13:56.7269965Z .b8 113 2026-02-21T10:13:56.7270017Z .b8 104 2026-02-21T10:13:56.7270071Z .b8 101 2026-02-21T10:13:56.7270125Z 
.b8 121 2026-02-21T10:13:56.7270184Z .b8 109 2026-02-21T10:13:56.7270236Z .b8 54 2026-02-21T10:13:56.7270301Z .b8 120 2026-02-21T10:13:56.7270358Z .b8 107 2026-02-21T10:13:56.7270420Z .b8 120 2026-02-21T10:13:56.7270474Z .b8 50 2026-02-21T10:13:56.7270531Z .b8 99 2026-02-21T10:13:56.7270591Z .b8 102 2026-02-21T10:13:56.7270643Z .b8 53 2026-02-21T10:13:56.7270697Z .b8 114 2026-02-21T10:13:56.7270751Z .b8 108 2026-02-21T10:13:56.7270817Z .b8 122 2026-02-21T10:13:56.7270871Z .b8 52 2026-02-21T10:13:56.7270924Z .b8 51 2026-02-21T10:13:56.7270984Z .b8 53 2026-02-21T10:13:56.7271038Z .b8 116 2026-02-21T10:13:56.7271093Z .b8 100 2026-02-21T10:13:56.7271148Z .b8 120 2026-02-21T10:13:56.7271208Z .b8 108 2026-02-21T10:13:56.7271263Z .b8 55 2026-02-21T10:13:56.7271315Z .b8 99 2026-02-21T10:13:56.7271370Z .b8 114 2026-02-21T10:13:56.7271429Z .b8 100 2026-02-21T10:13:56.7271484Z .b8 109 2026-02-21T10:13:56.7271539Z .b8 55 2026-02-21T10:13:56.7271598Z .b8 113 2026-02-21T10:13:56.7271653Z .b8 51 2026-02-21T10:13:56.7271706Z .b8 107 2026-02-21T10:13:56.7271760Z .b8 46 2026-02-21T10:13:56.7271821Z .b8 112 2026-02-21T10:13:56.7271875Z .b8 121 2026-02-21T10:13:56.7271929Z .b8 0 2026-02-21T10:13:56.7272046Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T10:13:56.7272135Z .b8 47 // DW_AT_comp_dir 2026-02-21T10:13:56.7272203Z .b8 116 2026-02-21T10:13:56.7272260Z .b8 109 2026-02-21T10:13:56.7272324Z .b8 112 2026-02-21T10:13:56.7272376Z .b8 47 2026-02-21T10:13:56.7272431Z .b8 116 2026-02-21T10:13:56.7272490Z .b8 111 2026-02-21T10:13:56.7272543Z .b8 114 2026-02-21T10:13:56.7272596Z .b8 99 2026-02-21T10:13:56.7272650Z .b8 104 2026-02-21T10:13:56.7272708Z .b8 105 2026-02-21T10:13:56.7272764Z .b8 110 2026-02-21T10:13:56.7272818Z .b8 100 2026-02-21T10:13:56.7272877Z .b8 117 2026-02-21T10:13:56.7272930Z .b8 99 2026-02-21T10:13:56.7272983Z .b8 116 2026-02-21T10:13:56.7273047Z .b8 111 2026-02-21T10:13:56.7273108Z .b8 114 2026-02-21T10:13:56.7273161Z .b8 95 2026-02-21T10:13:56.7273213Z .b8 114 
2026-02-21T10:13:56.7273266Z .b8 111 2026-02-21T10:13:56.7273327Z .b8 111 2026-02-21T10:13:56.7273381Z .b8 116 2026-02-21T10:13:56.7273435Z .b8 47 2026-02-21T10:13:56.7273493Z .b8 105 2026-02-21T10:13:56.7273546Z .b8 97 2026-02-21T10:13:56.7273600Z .b8 0 2026-02-21T10:13:56.7273717Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T10:13:56.7273806Z .b8 95 // DW_AT_name 2026-02-21T10:13:56.7273969Z .b8 104 2026-02-21T10:13:56.7274022Z .b8 101 2026-02-21T10:13:56.7274081Z .b8 108 2026-02-21T10:13:56.7274133Z .b8 105 2026-02-21T10:13:56.7274186Z .b8 111 2026-02-21T10:13:56.7274239Z .b8 110 2026-02-21T10:13:56.7274299Z .b8 95 2026-02-21T10:13:56.7274353Z .b8 109 2026-02-21T10:13:56.7274405Z .b8 97 2026-02-21T10:13:56.7274490Z .b8 116 2026-02-21T10:13:56.7274544Z .b8 109 2026-02-21T10:13:56.7274597Z .b8 117 2026-02-21T10:13:56.7274655Z .b8 108 2026-02-21T10:13:56.7274708Z .b8 95 2026-02-21T10:13:56.7274760Z .b8 98 2026-02-21T10:13:56.7274813Z .b8 102 2026-02-21T10:13:56.7274871Z .b8 49 2026-02-21T10:13:56.7274924Z .b8 54 2026-02-21T10:13:56.7274977Z .b8 95 2026-02-21T10:13:56.7275050Z .b8 105 2026-02-21T10:13:56.7275107Z .b8 110 2026-02-21T10:13:56.7275215Z .b8 116 2026-02-21T10:13:56.7275269Z .b8 52 2026-02-21T10:13:56.7275327Z .b8 0 2026-02-21T10:13:56.7275413Z .b8 1 // DW_AT_inline 2026-02-21T10:13:56.7275529Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T10:13:56.7275635Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T10:13:56.7275734Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T10:13:56.7275892Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:13:56.7276028Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T10:13:56.7276132Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:13:56.7276223Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T10:13:56.7276316Z .b64 $L__tmp2 // DW_AT_high_pc 2026-02-21T10:13:56.7276408Z .b8 1 // DW_AT_call_file 2026-02-21T10:13:56.7276623Z .b8 90 // DW_AT_call_line 2026-02-21T10:13:56.7276722Z .b8 40 // 
DW_AT_call_column 2026-02-21T10:13:56.7276828Z .b8 0 // End Of Children Mark 2026-02-21T10:13:56.7276918Z .b8 0 // End Of Children Mark 2026-02-21T10:13:56.7276972Z } 2026-02-21T10:13:56.7277047Z .section .debug_macinfo { } 2026-02-21T10:13:56.7277063Z 2026-02-21T10:13:56.7277157Z ================================================================ 2026-02-21T10:13:56.7277279Z please share the reproducer above with Triton project. 2026-02-21T10:14:11.3782449Z [2844s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 64, 128], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[1], load_eviction_policies=['', 'first'], loop_orders=[[0, 1]], num_stages=2, num_warps=2, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 2], range_warp_specializes=[]) 2026-02-21T10:14:11.3784052Z Tensor-likes are not close! 2026-02-21T10:14:11.3784224Z 2026-02-21T10:14:11.3784343Z Mismatched elements: 83620783 / 83886080 (99.7%) 2026-02-21T10:14:11.3784800Z Greatest absolute difference: 3984.0 at index (50476, 568) (up to 0.01 allowed) 2026-02-21T10:14:11.3785355Z Greatest relative difference: inf at index (38888, 1203) (up to 0.01 allowed) 2026-02-21T10:14:11.3785846Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T10:14:11.3786108Z 2026-02-21T10:14:11.8004230Z 2026-02-21T10:14:11.8005609Z Generation 9: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━━ 77/77 1.8 configs/s 2026-02-21T10:14:16.9131449Z Generation 9: verifying top configs 100% ━━━━━━━━━━━━━━━━━━━ 34/34 5.4 configs/s 2026-02-21T10:14:18.4848196Z [2851s] Generation 9 complete: 2026-02-21T10:14:18.4848477Z error=15 2026-02-21T10:14:18.4848650Z ok=65 2026-02-21T10:14:18.4848815Z min=6.2016 2026-02-21T10:14:18.4849002Z mid=10.5176 2026-02-21T10:14:18.4849169Z max=1027.4597 2026-02-21T10:14:18.4849378Z best={'block_sizes': [32, 128, 128], 2026-02-21T10:14:18.4849750Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T10:14:18.4850688Z 'l2_groupings': [16], 2026-02-21T10:14:18.4850924Z 'load_eviction_policies': ['last', ''], 2026-02-21T10:14:18.4851200Z 'loop_orders': [[1, 0]], 2026-02-21T10:14:18.4851416Z 'maxnreg': 256, 2026-02-21T10:14:18.4851640Z 'num_sm_multiplier': 64, 2026-02-21T10:14:18.4851862Z 'num_stages': 1, 2026-02-21T10:14:18.4852049Z 'num_warps': 4, 2026-02-21T10:14:18.4852270Z 'pid_type': 'persistent_interleaved', 2026-02-21T10:14:18.4852548Z 'range_flattens': [True, True], 2026-02-21T10:14:18.4852802Z 'range_multi_buffers': [None, True], 2026-02-21T10:14:18.4853058Z 'range_num_stages': [4, 2], 2026-02-21T10:14:18.4853296Z 'range_unroll_factors': [1, 0], 2026-02-21T10:14:18.4853538Z 'range_warp_specializes': []} 2026-02-21T10:14:18.4899992Z [2851s] Fitting surrogate: 920 points, 920 targets 2026-02-21T10:14:19.7706055Z [2852s] Generation 10 starting: 77 neighbors, 4 active search path(s) 2026-02-21T10:14:44.9807007Z Generation 10: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 78/78 3.8 configs/s 2026-02-21T10:15:10.5424410Z 2026-02-21T10:15:10.5424426Z 2026-02-21T10:15:10.5424832Z ================================================================ 2026-02-21T10:15:10.5425226Z Internal Triton PTX codegen error 2026-02-21T10:15:10.5425901Z `ptxas` stderr: 
2026-02-21T10:15:10.5427059Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 1007 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T10:15:10.5427907Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:15:10.5428139Z 2026-02-21T10:15:10.5428898Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp8q8_cija.ptx -o /tmp/tmp8q8_cija.ptx.o 2026-02-21T10:15:10.5429636Z 2026-02-21T10:15:10.5429642Z 2026-02-21T10:15:10.5429716Z // 2026-02-21T10:15:10.5429908Z // Generated by LLVM NVPTX Back-End 2026-02-21T10:15:10.5430144Z // 2026-02-21T10:15:10.5430258Z 2026-02-21T10:15:10.5430331Z .version 8.7 2026-02-21T10:15:10.5430507Z .target sm_90a 2026-02-21T10:15:10.5430693Z .address_size 64 2026-02-21T10:15:10.5430807Z 2026-02-21T10:15:10.5431044Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T10:15:10.5431523Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T10:15:10.5431818Z // @_helion_matmul_bf16_int4 2026-02-21T10:15:10.5432098Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T10:15:10.5432418Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T10:15:10.5432793Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T10:15:10.5433165Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T10:15:10.5433539Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T10:15:10.5433907Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T10:15:10.5434203Z ) 2026-02-21T10:15:10.5434340Z .reqntid 128 2026-02-21T10:15:10.5434494Z .maxnreg 32 2026-02-21T10:15:10.5434631Z { 2026-02-21T10:15:10.5434790Z .reg .pred %p<64>; 2026-02-21T10:15:10.5434968Z .reg .b16 %rs<113>; 2026-02-21T10:15:10.5435146Z .reg .b32 %r<2654>; 
2026-02-21T10:15:10.5435311Z .reg .b64 %rd<125>; 2026-02-21T10:15:10.5435655Z .loc 1 19 0 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:19:0 2026-02-21T10:15:10.5436053Z $L__func_begin0: 2026-02-21T10:15:10.5436370Z .loc 1 19 0 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:19:0 2026-02-21T10:15:10.5436888Z 2026-02-21T10:15:10.5436947Z // %bb.0: 2026-02-21T10:15:10.5437151Z ld.param.b64 %rd6, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T10:15:10.5437422Z $L__tmp0: 2026-02-21T10:15:10.5437731Z .loc 1 21 67 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:21:67 2026-02-21T10:15:10.5438126Z mov.u32 %r2517, %ctaid.x; 2026-02-21T10:15:10.5438707Z ld.param.b64 %rd9, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T10:15:10.5438978Z mov.u32 %r578, %ctaid.y; 2026-02-21T10:15:10.5439219Z ld.param.b64 %rd53, [_helion_matmul_bf16_int4_param_3]; 2026-02-21T10:15:10.5439489Z mov.u32 %r579, %ctaid.z; 2026-02-21T10:15:10.5439673Z mov.u32 %r580, %nctaid.x; 2026-02-21T10:15:10.5439853Z mov.u32 %r581, %nctaid.y; 2026-02-21T10:15:10.5440046Z mad.lo.s32 %r582, %r579, %r581, %r578; 2026-02-21T10:15:10.5440274Z mad.lo.s32 %r583, %r582, %r580, %r2517; 2026-02-21T10:15:10.5440497Z shl.b32 %r584, %r583, 7; 2026-02-21T10:15:10.5440681Z cvt.s64.s32 %rd54, %r584; 2026-02-21T10:15:10.5440867Z add.s64 %rd23, %rd53, %rd54; 2026-02-21T10:15:10.5441059Z mov.u32 %r2, %tid.x; 2026-02-21T10:15:10.5441349Z setp.lt.u32 %p1, %r2, 32; 2026-02-21T10:15:10.5441542Z shl.b32 %r585, %r2, 2; 2026-02-21T10:15:10.5441721Z mov.b32 %r586, global_smem; 2026-02-21T10:15:10.5441916Z add.s32 %r504, %r586, %r585; 2026-02-21T10:15:10.5442105Z mov.b32 %r2377, 0; 2026-02-21T10:15:10.5442274Z // begin inline asm 2026-02-21T10:15:10.5442453Z @%p1 st.shared.b32 [ %r504 + 0 ], %r2377; 2026-02-21T10:15:10.5442671Z // end inline asm 2026-02-21T10:15:10.5442835Z bar.warp.sync -1; 2026-02-21T10:15:10.5443093Z setp.eq.b32 %p61, %r2, 0; 2026-02-21T10:15:10.5443290Z cvt.u64.u32 %rd8, %r586; 
2026-02-21T10:15:10.5443463Z // begin inline asm 2026-02-21T10:15:10.5443787Z @%p61 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd8 + 0 ], %rd9; 2026-02-21T10:15:10.5444137Z // end inline asm 2026-02-21T10:15:10.5444298Z // begin inline asm 2026-02-21T10:15:10.5444572Z @%p61 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1; 2026-02-21T10:15:10.5444890Z // end inline asm 2026-02-21T10:15:10.5445051Z mov.b32 %r506, 128; 2026-02-21T10:15:10.5445209Z // begin inline asm 2026-02-21T10:15:10.5445512Z @%p61 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r506; 2026-02-21T10:15:10.5445847Z // end inline asm 2026-02-21T10:15:10.5446001Z mov.b32 %r507, 16; 2026-02-21T10:15:10.5446153Z // begin inline asm 2026-02-21T10:15:10.5446433Z @%p61 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r507; 2026-02-21T10:15:10.5446899Z // end inline asm 2026-02-21T10:15:10.5447053Z mov.b32 %r508, 1280; 2026-02-21T10:15:10.5447218Z // begin inline asm 2026-02-21T10:15:10.5447521Z @%p61 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r508; 2026-02-21T10:15:10.5447872Z // end inline asm 2026-02-21T10:15:10.5448021Z mov.b32 %r509, 4096; 2026-02-21T10:15:10.5448187Z // begin inline asm 2026-02-21T10:15:10.5448472Z @%p61 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r509; 2026-02-21T10:15:10.5448816Z // end inline asm 2026-02-21T10:15:10.5448973Z mov.b64 %rd16, 1280; 2026-02-21T10:15:10.5449128Z // begin inline asm 2026-02-21T10:15:10.5449432Z @%p61 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd8 + 0 ], 0x0, %rd16; 2026-02-21T10:15:10.5450125Z [2903s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T10:15:10.5451710Z Config: @helion.kernel(config=helion.Config(block_sizes=[16, 128, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[16], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=64, num_stages=1, num_warps=4, pid_type='persistent_interleaved', range_flattens=[True, True], range_multi_buffers=[None, True], range_num_stages=[4, 2], range_unroll_factors=[0, 0], range_warp_specializes=[]), static_shapes=True) 2026-02-21T10:15:10.5453223Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T10:15:10.5453515Z `ptxas` stderr: 2026-02-21T10:15:10.5454066Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 1007 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T10:15:10.5454810Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:15:10.5455190Z 2026-02-21T10:15:10.5455688Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp8q8_cija.ptx -o /tmp/tmp8q8_cija.ptx.o 2026-02-21T10:15:10.5456273Z 2026-02-21T10:15:10.5456426Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T10:15:10.5456877Z // end inline asm 2026-02-21T10:15:10.5457026Z mov.b32 %r2376, 1; 2026-02-21T10:15:10.5457187Z // begin inline asm 2026-02-21T10:15:10.5457512Z @%p61 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r2376; 2026-02-21T10:15:10.5457880Z // end inline asm 2026-02-21T10:15:10.5458030Z // begin inline asm 2026-02-21T10:15:10.5458437Z @%p61 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r2376; 2026-02-21T10:15:10.5458802Z // end inline asm 2026-02-21T10:15:10.5458947Z // begin inline asm 2026-02-21T10:15:10.5459250Z @%p61 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T10:15:10.5459585Z // end inline asm 2026-02-21T10:15:10.5459737Z // begin inline asm 2026-02-21T10:15:10.5460117Z @%p61 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T10:15:10.5460473Z // end inline asm 2026-02-21T10:15:10.5460625Z // begin inline asm 2026-02-21T10:15:10.5460904Z @%p61 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x3; 2026-02-21T10:15:10.5461248Z // end inline asm 2026-02-21T10:15:10.5461417Z // begin inline asm 2026-02-21T10:15:10.5461692Z @%p61 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T10:15:10.5462016Z // end inline asm 2026-02-21T10:15:10.5462165Z // begin inline asm 2026-02-21T10:15:10.5462602Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd23 + 0 ], [ %rd8 + 0 ], 0x80; 2026-02-21T10:15:10.5463074Z // end inline asm 2026-02-21T10:15:10.5463228Z // begin inline asm 2026-02-21T10:15:10.5463470Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd23 + 0 ], 0x80; 2026-02-21T10:15:10.5463779Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T10:15:10.5464003Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T10:15:10.5464211Z // end inline asm 2026-02-21T10:15:10.5464366Z bar.sync 0; 
2026-02-21T10:15:10.5464519Z cvta.global.u64 %rd34, %rd23; 2026-02-21T10:15:10.5464871Z .loc 1 38 45 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:38:45 2026-02-21T10:15:10.5465240Z bfe.u32 %r5, %r2, 3, 4; 2026-02-21T10:15:10.5465414Z or.b32 %r6, %r5, 16; 2026-02-21T10:15:10.5465575Z or.b32 %r7, %r5, 32; 2026-02-21T10:15:10.5465731Z or.b32 %r8, %r5, 48; 2026-02-21T10:15:10.5465891Z or.b32 %r9, %r5, 64; 2026-02-21T10:15:10.5466045Z or.b32 %r10, %r5, 80; 2026-02-21T10:15:10.5466207Z or.b32 %r11, %r5, 96; 2026-02-21T10:15:10.5466366Z or.b32 %r12, %r5, 112; 2026-02-21T10:15:10.5466818Z .loc 1 26 90 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:26:90 2026-02-21T10:15:10.5467177Z sub.s32 %r589, 5120, %r2517; 2026-02-21T10:15:10.5467370Z mul.hi.s32 %r590, %r589, 1041204193; 2026-02-21T10:15:10.5467569Z shr.u32 %r591, %r590, 31; 2026-02-21T10:15:10.5467749Z shr.s32 %r592, %r590, 11; 2026-02-21T10:15:10.5467926Z add.s32 %r30, %r592, %r591; 2026-02-21T10:15:10.5468106Z mul.lo.s32 %r593, %r30, 8448; 2026-02-21T10:15:10.5468301Z setp.ne.b32 %p28, %r589, %r593; 2026-02-21T10:15:10.5468597Z setp.lt.u32 %p29, %r2517, 5121; 2026-02-21T10:15:10.5468808Z and.pred %p30, %p29, %p28; 2026-02-21T10:15:10.5469000Z selp.b32 %r31, 1, 0, %p30; 2026-02-21T10:15:10.5469185Z add.s32 %r32, %r30, %r31; 2026-02-21T10:15:10.5469508Z .loc 1 53 38 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:53:38 2026-02-21T10:15:10.5469872Z and.b32 %r33, %r2, 7; 2026-02-21T10:15:10.5470043Z shl.b32 %r34, %r33, 2; 2026-02-21T10:15:10.5470526Z .loc 1 26 90 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:26:90 2026-02-21T10:15:10.5470890Z add.s32 %r512, %r586, 47104; 2026-02-21T10:15:10.5471070Z // begin inline asm 2026-02-21T10:15:10.5471278Z @%p61 mbarrier.init.shared::cta.b64 [%r512], 1; 2026-02-21T10:15:10.5471506Z // end inline asm 2026-02-21T10:15:10.5471659Z bar.sync 0; 2026-02-21T10:15:10.5471807Z add.s32 %r513, %r586, 47112; 2026-02-21T10:15:10.5471994Z // 
begin inline asm 2026-02-21T10:15:10.5472187Z @%p61 mbarrier.init.shared::cta.b64 [%r513], 1; 2026-02-21T10:15:10.5472408Z // end inline asm 2026-02-21T10:15:10.5472560Z bar.sync 0; 2026-02-21T10:15:10.5472705Z add.s32 %r514, %r586, 47120; 2026-02-21T10:15:10.5472885Z // begin inline asm 2026-02-21T10:15:10.5473157Z @%p61 mbarrier.init.shared::cta.b64 [%r514], 1; 2026-02-21T10:15:10.5473395Z // end inline asm 2026-02-21T10:15:10.5473552Z setp.lt.s32 %p31, %r32, 1; 2026-02-21T10:15:10.5473745Z setp.gt.s32 %p32, %r32, 0; 2026-02-21T10:15:10.5474069Z .loc 1 33 33 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:33:33 2026-02-21T10:15:10.5474430Z shr.u32 %r594, %r2517, 9; 2026-02-21T10:15:10.5474616Z and.b32 %r595, %r594, 4194288; 2026-02-21T10:15:10.5475010Z .loc 1 34 39 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:34:39 2026-02-21T10:15:10.5475372Z sub.s32 %r596, 10, %r595; 2026-02-21T10:15:10.5475680Z .loc 1 35 45 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:35:45 2026-02-21T10:15:10.5476030Z and.b32 %r597, %r2517, 8191; 2026-02-21T10:15:10.5476341Z .loc 1 36 51 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:36:51 2026-02-21T10:15:10.5476841Z div.s32 %r598, %r597, %r596; 2026-02-21T10:15:10.5477158Z .loc 1 35 64 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:35:64 2026-02-21T10:15:10.5477583Z mul.lo.s32 %r599, %r598, %r596; 2026-02-21T10:15:10.5477783Z sub.s32 %r600, %r597, %r599; 2026-02-21T10:15:10.5478097Z .loc 1 35 30 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:35:30 2026-02-21T10:15:10.5478449Z add.s32 %r601, %r600, %r595; 2026-02-21T10:15:10.5478763Z .loc 1 37 27 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:37:27 2026-02-21T10:15:10.5479114Z shl.b32 %r2374, %r601, 7; 2026-02-21T10:15:10.5479426Z .loc 1 39 27 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:39:27 2026-02-21T10:15:10.5479768Z shl.b32 %r2372, %r598, 7; 2026-02-21T10:15:10.5480074Z .loc 1 40 32 // 
cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:40:32 2026-02-21T10:15:10.5480421Z or.b32 %r2518, %r2372, %r5; 2026-02-21T10:15:10.5480606Z or.b32 %r2519, %r2372, %r6; 2026-02-21T10:15:10.5480780Z or.b32 %r2520, %r2372, %r7; 2026-02-21T10:15:10.5480957Z or.b32 %r2521, %r2372, %r8; 2026-02-21T10:15:10.5481135Z or.b32 %r2522, %r2372, %r9; 2026-02-21T10:15:10.5481312Z or.b32 %r2523, %r2372, %r10; 2026-02-21T10:15:10.5481497Z or.b32 %r2524, %r2372, %r11; 2026-02-21T10:15:10.5481668Z or.b32 %r2525, %r2372, %r12; 2026-02-21T10:15:10.5481989Z .loc 1 54 53 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:54:53 2026-02-21T10:15:10.5482336Z shl.b32 %r602, %r2518, 13; 2026-02-21T10:15:10.5482515Z shl.b32 %r603, %r2519, 13; 2026-02-21T10:15:10.5482682Z shl.b32 %r604, %r2520, 13; 2026-02-21T10:15:10.5482857Z shl.b32 %r605, %r2521, 13; 2026-02-21T10:15:10.5483028Z shl.b32 %r606, %r2522, 13; 2026-02-21T10:15:10.5483195Z shl.b32 %r607, %r2523, 13; 2026-02-21T10:15:10.5483372Z shl.b32 %r608, %r2524, 13; 2026-02-21T10:15:10.5483553Z shl.b32 %r609, %r2525, 13; 2026-02-21T10:15:10.5483870Z .loc 1 54 60 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:54:60 2026-02-21T10:15:10.5484220Z or.b32 %r610, %r602, %r34; 2026-02-21T10:15:10.5484497Z or.b32 %r611, %r603, %r34; 2026-02-21T10:15:10.5484734Z or.b32 %r612, %r604, %r34; 2026-02-21T10:15:10.5484911Z or.b32 %r613, %r605, %r34; 2026-02-21T10:15:10.5485087Z or.b32 %r614, %r606, %r34; 2026-02-21T10:15:10.5485255Z or.b32 %r615, %r607, %r34; 2026-02-21T10:15:10.5485436Z or.b32 %r616, %r608, %r34; 2026-02-21T10:15:10.5485607Z or.b32 %r617, %r609, %r34; 2026-02-21T10:15:10.5485926Z .loc 1 54 32 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:54:32 2026-02-21T10:15:10.5486281Z mad.wide.s32 %rd26, %r610, 2, %rd6; 2026-02-21T10:15:10.5486628Z mad.wide.s32 %rd27, %r611, 2, %rd6; 2026-02-21T10:15:10.5486838Z mad.wide.s32 %rd28, %r612, 2, %rd6; 2026-02-21T10:15:10.5487040Z mad.wide.s32 %rd29, %r613, 2, %rd6; 
2026-02-21T10:15:10.5487328Z mad.wide.s32 %rd30, %r614, 2, %rd6; 2026-02-21T10:15:10.5487528Z mad.wide.s32 %rd31, %r615, 2, %rd6; 2026-02-21T10:15:10.5487727Z mad.wide.s32 %rd32, %r616, 2, %rd6; 2026-02-21T10:15:10.5487919Z mad.wide.s32 %rd33, %r617, 2, %rd6; 2026-02-21T10:15:10.5488259Z .loc 1 54 80 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:54:80 2026-02-21T10:15:10.5488617Z and.b32 %r45, %r2, 127; 2026-02-21T10:15:10.5488793Z shl.b32 %r618, %r45, 3; 2026-02-21T10:15:10.5489025Z shr.u32 %r619, %r2, 1; 2026-02-21T10:15:10.5489203Z and.b32 %r620, %r619, 24; 2026-02-21T10:15:10.5489380Z xor.b32 %r46, %r618, %r620; 2026-02-21T10:15:10.5489553Z add.s32 %r515, %r586, %r46; 2026-02-21T10:15:10.5489734Z selp.b32 %r516, 8, 0, %p32; 2026-02-21T10:15:10.5489907Z // begin inline asm 2026-02-21T10:15:10.5490151Z cp.async.ca.shared.global [ %r515 + 0 ], [ %rd26 + 0 ], 0x8, %r516; 2026-02-21T10:15:10.5490426Z // end inline asm 2026-02-21T10:15:10.5490587Z add.s32 %r517, %r515, 1024; 2026-02-21T10:15:10.5490763Z // begin inline asm 2026-02-21T10:15:10.5491001Z cp.async.ca.shared.global [ %r517 + 0 ], [ %rd27 + 0 ], 0x8, %r516; 2026-02-21T10:15:10.5491284Z // end inline asm 2026-02-21T10:15:10.5491445Z add.s32 %r519, %r515, 2048; 2026-02-21T10:15:10.5491618Z // begin inline asm 2026-02-21T10:15:10.5491837Z cp.async.ca.shared.global [ %r519 + 0 ], [ %rd28 + 0 ], 0x8, %r516; 2026-02-21T10:15:10.5492109Z // end inline asm 2026-02-21T10:15:10.5492258Z add.s32 %r521, %r515, 3072; 2026-02-21T10:15:10.5492436Z // begin inline asm 2026-02-21T10:15:10.5492657Z cp.async.ca.shared.global [ %r521 + 0 ], [ %rd29 + 0 ], 0x8, %r516; 2026-02-21T10:15:10.5492929Z // end inline asm 2026-02-21T10:15:10.5493076Z add.s32 %r523, %r515, 4096; 2026-02-21T10:15:10.5493252Z // begin inline asm 2026-02-21T10:15:10.5493479Z cp.async.ca.shared.global [ %r523 + 0 ], [ %rd30 + 0 ], 0x8, %r516; 2026-02-21T10:15:10.5493746Z // end inline asm 2026-02-21T10:15:10.5493901Z add.s32 %r525, %r515, 
5120; 2026-02-21T10:15:10.5494076Z // begin inline asm 2026-02-21T10:15:10.5494304Z cp.async.ca.shared.global [ %r525 + 0 ], [ %rd31 + 0 ], 0x8, %r516; 2026-02-21T10:15:10.5494579Z // end inline asm 2026-02-21T10:15:10.5494744Z add.s32 %r527, %r515, 6144; 2026-02-21T10:15:10.5494916Z // begin inline asm 2026-02-21T10:15:10.5495142Z cp.async.ca.shared.global [ %r527 + 0 ], [ %rd32 + 0 ], 0x8, %r516; 2026-02-21T10:15:10.5495417Z // end inline asm 2026-02-21T10:15:10.5495568Z add.s32 %r529, %r515, 7168; 2026-02-21T10:15:10.5495746Z // begin inline asm 2026-02-21T10:15:10.5495966Z cp.async.ca.shared.global [ %r529 + 0 ], [ %rd33 + 0 ], 0x8, %r516; 2026-02-21T10:15:10.5496242Z // end inline asm 2026-02-21T10:15:10.5496399Z cp.async.commit_group; 2026-02-21T10:15:10.5496841Z .loc 1 26 90 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:26:90 2026-02-21T10:15:10.5497201Z bar.sync 0; 2026-02-21T10:15:10.5497364Z and.pred %p22, %p61, %p32; 2026-02-21T10:15:10.5497542Z // begin inline asm 2026-02-21T10:15:10.5497785Z @%p22 mbarrier.arrive.expect_tx.shared.b64 _, [%r512], 2048; 2026-02-21T10:15:10.5498054Z // end inline asm 2026-02-21T10:15:10.5498344Z .loc 1 60 33 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:60:33 2026-02-21T10:15:10.5498850Z bar.sync 0; 2026-02-21T10:15:10.5499004Z elect.sync %r621|%p33, -1; 2026-02-21T10:15:10.5499192Z and.pred %p34, %p32, %p33; 2026-02-21T10:15:10.5499375Z and.pred %p23, %p1, %p34; 2026-02-21T10:15:10.5499560Z add.s32 %r532, %r586, 40960; 2026-02-21T10:15:10.5499736Z // begin inline asm 2026-02-21T10:15:10.5500159Z @%p23 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r532], [%rd34, {%r2374, %r2377}], [%r512]; 2026-02-21T10:15:10.5500633Z // end inline asm 2026-02-21T10:15:10.5500930Z .loc 1 54 32 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:54:32 2026-02-21T10:15:10.5501298Z cvt.s64.s32 %rd55, %r602; 2026-02-21T10:15:10.5501545Z cvt.u64.u32 %rd56, %r34; 
2026-02-21T10:15:10.5501729Z or.b64 %rd57, %rd55, %rd56; 2026-02-21T10:15:10.5501902Z shl.b64 %rd58, %rd57, 1; 2026-02-21T10:15:10.5502076Z add.s64 %rd59, %rd6, %rd58; 2026-02-21T10:15:10.5502260Z add.s64 %rd35, %rd59, 64; 2026-02-21T10:15:10.5502426Z cvt.s64.s32 %rd60, %r603; 2026-02-21T10:15:10.5502600Z or.b64 %rd61, %rd60, %rd56; 2026-02-21T10:15:10.5502773Z shl.b64 %rd62, %rd61, 1; 2026-02-21T10:15:10.5502950Z add.s64 %rd63, %rd6, %rd62; 2026-02-21T10:15:10.5503210Z add.s64 %rd36, %rd63, 64; 2026-02-21T10:15:10.5503388Z cvt.s64.s32 %rd64, %r604; 2026-02-21T10:15:10.5503556Z or.b64 %rd65, %rd64, %rd56; 2026-02-21T10:15:10.5503734Z shl.b64 %rd66, %rd65, 1; 2026-02-21T10:15:10.5503900Z add.s64 %rd67, %rd6, %rd66; 2026-02-21T10:15:10.5504080Z add.s64 %rd37, %rd67, 64; 2026-02-21T10:15:10.5504252Z cvt.s64.s32 %rd68, %r605; 2026-02-21T10:15:10.5504424Z or.b64 %rd69, %rd68, %rd56; 2026-02-21T10:15:10.5504601Z shl.b64 %rd70, %rd69, 1; 2026-02-21T10:15:10.5504771Z add.s64 %rd71, %rd6, %rd70; 2026-02-21T10:15:10.5504949Z add.s64 %rd38, %rd71, 64; 2026-02-21T10:15:10.5505121Z cvt.s64.s32 %rd72, %r606; 2026-02-21T10:15:10.5505293Z or.b64 %rd73, %rd72, %rd56; 2026-02-21T10:15:10.5505467Z shl.b64 %rd74, %rd73, 1; 2026-02-21T10:15:10.5505641Z add.s64 %rd75, %rd6, %rd74; 2026-02-21T10:15:10.5505813Z add.s64 %rd39, %rd75, 64; 2026-02-21T10:15:10.5505986Z cvt.s64.s32 %rd76, %r607; 2026-02-21T10:15:10.5514008Z or.b64 %rd77, %rd76, %rd56; 2026-02-21T10:15:10.5514316Z shl.b64 %rd78, %rd77, 1; 2026-02-21T10:15:10.5514517Z add.s64 %rd79, %rd6, %rd78; 2026-02-21T10:15:10.5514719Z add.s64 %rd40, %rd79, 64; 2026-02-21T10:15:10.5514909Z cvt.s64.s32 %rd80, %r608; 2026-02-21T10:15:10.5515102Z or.b64 %rd81, %rd80, %rd56; 2026-02-21T10:15:10.5515299Z shl.b64 %rd82, %rd81, 1; 2026-02-21T10:15:10.5515489Z add.s64 %rd83, %rd6, %rd82; 2026-02-21T10:15:10.5515668Z add.s64 %rd41, %rd83, 64; 2026-02-21T10:15:10.5515850Z cvt.s64.s32 %rd84, %r609; 2026-02-21T10:15:10.5516026Z or.b64 %rd85, %rd84, 
%rd56; 2026-02-21T10:15:10.5516209Z shl.b64 %rd86, %rd85, 1; 2026-02-21T10:15:10.5516387Z add.s64 %rd87, %rd6, %rd86; 2026-02-21T10:15:10.5516730Z add.s64 %rd42, %rd87, 64; 2026-02-21T10:15:10.5517085Z .loc 1 54 80 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:54:80 2026-02-21T10:15:10.5517460Z add.s32 %r536, %r515, 8192; 2026-02-21T10:15:10.5517654Z // begin inline asm 2026-02-21T10:15:10.5517908Z cp.async.ca.shared.global [ %r536 + 0 ], [ %rd35 + 0 ], 0x8, %r516; 2026-02-21T10:15:10.5518203Z // end inline asm 2026-02-21T10:15:10.5518365Z add.s32 %r538, %r515, 9216; 2026-02-21T10:15:10.5518549Z // begin inline asm 2026-02-21T10:15:10.5518788Z cp.async.ca.shared.global [ %r538 + 0 ], [ %rd36 + 0 ], 0x8, %r516; 2026-02-21T10:15:10.5519065Z // end inline asm 2026-02-21T10:15:10.5519230Z add.s32 %r540, %r515, 10240; 2026-02-21T10:15:10.5519412Z // begin inline asm 2026-02-21T10:15:10.5519648Z cp.async.ca.shared.global [ %r540 + 0 ], [ %rd37 + 0 ], 0x8, %r516; 2026-02-21T10:15:10.5519918Z // end inline asm 2026-02-21T10:15:10.5520080Z add.s32 %r542, %r515, 11264; 2026-02-21T10:15:10.5520252Z // begin inline asm 2026-02-21T10:15:10.5520742Z cp.async.ca.shared.global [ %r542 + 0 ], [ %rd38 + 0 ], 0x8, %r516; 2026-02-21T10:15:10.5521020Z // end inline asm 2026-02-21T10:15:10.5521174Z add.s32 %r544, %r515, 12288; 2026-02-21T10:15:10.5521352Z // begin inline asm 2026-02-21T10:15:10.5521587Z cp.async.ca.shared.global [ %r544 + 0 ], [ %rd39 + 0 ], 0x8, %r516; 2026-02-21T10:15:10.5521860Z // end inline asm 2026-02-21T10:15:10.5522008Z add.s32 %r546, %r515, 13312; 2026-02-21T10:15:10.5522187Z // begin inline asm 2026-02-21T10:15:10.5522412Z cp.async.ca.shared.global [ %r546 + 0 ], [ %rd40 + 0 ], 0x8, %r516; 2026-02-21T10:15:10.5522682Z // end inline asm 2026-02-21T10:15:10.5522833Z add.s32 %r548, %r515, 14336; 2026-02-21T10:15:10.5523017Z // begin inline asm 2026-02-21T10:15:10.5523325Z cp.async.ca.shared.global [ %r548 + 0 ], [ %rd41 + 0 ], 0x8, %r516; 
2026-02-21T10:15:10.5523596Z // end inline asm 2026-02-21T10:15:10.5523754Z add.s32 %r550, %r515, 15360; 2026-02-21T10:15:10.5523930Z // begin inline asm 2026-02-21T10:15:10.5524160Z cp.async.ca.shared.global [ %r550 + 0 ], [ %rd42 + 0 ], 0x8, %r516; 2026-02-21T10:15:10.5524424Z // end inline asm 2026-02-21T10:15:10.5524589Z cp.async.commit_group; 2026-02-21T10:15:10.5524979Z .loc 1 26 90 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:26:90 2026-02-21T10:15:10.5525350Z bar.sync 0; 2026-02-21T10:15:10.5525508Z // begin inline asm 2026-02-21T10:15:10.5525733Z @%p22 mbarrier.arrive.expect_tx.shared.b64 _, [%r513], 2048; 2026-02-21T10:15:10.5526009Z // end inline asm 2026-02-21T10:15:10.5526303Z .loc 1 60 33 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:60:33 2026-02-21T10:15:10.5526822Z bar.sync 0; 2026-02-21T10:15:10.5526980Z elect.sync %r622|%p35, -1; 2026-02-21T10:15:10.5527184Z and.pred %p36, %p32, %p35; 2026-02-21T10:15:10.5527385Z and.pred %p25, %p1, %p36; 2026-02-21T10:15:10.5527582Z add.s32 %r553, %r586, 43008; 2026-02-21T10:15:10.5527766Z // begin inline asm 2026-02-21T10:15:10.5528198Z @%p25 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r553], [%rd34, {%r2374, %r507}], [%r513]; 2026-02-21T10:15:10.5528672Z // end inline asm 2026-02-21T10:15:10.5528980Z .loc 1 54 32 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:54:32 2026-02-21T10:15:10.5529347Z add.s64 %rd44, %rd59, 128; 2026-02-21T10:15:10.5529525Z add.s64 %rd45, %rd63, 128; 2026-02-21T10:15:10.5529706Z add.s64 %rd46, %rd67, 128; 2026-02-21T10:15:10.5529888Z add.s64 %rd47, %rd71, 128; 2026-02-21T10:15:10.5530075Z add.s64 %rd48, %rd75, 128; 2026-02-21T10:15:10.5530264Z add.s64 %rd49, %rd79, 128; 2026-02-21T10:15:10.5530439Z add.s64 %rd50, %rd83, 128; 2026-02-21T10:15:10.5530620Z add.s64 %rd51, %rd87, 128; 2026-02-21T10:15:10.5530937Z .loc 1 54 80 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:54:80 2026-02-21T10:15:10.5531300Z 
add.s32 %r557, %r515, 16384; 2026-02-21T10:15:10.5531479Z // begin inline asm 2026-02-21T10:15:10.5531723Z cp.async.ca.shared.global [ %r557 + 0 ], [ %rd44 + 0 ], 0x8, %r516; 2026-02-21T10:15:10.5532007Z // end inline asm 2026-02-21T10:15:10.5532160Z add.s32 %r559, %r515, 17408; 2026-02-21T10:15:10.5532345Z // begin inline asm 2026-02-21T10:15:10.5532576Z cp.async.ca.shared.global [ %r559 + 0 ], [ %rd45 + 0 ], 0x8, %r516; 2026-02-21T10:15:10.5532857Z // end inline asm 2026-02-21T10:15:10.5533008Z add.s32 %r561, %r515, 18432; 2026-02-21T10:15:10.5533203Z // begin inline asm 2026-02-21T10:15:10.5533429Z cp.async.ca.shared.global [ %r561 + 0 ], [ %rd46 + 0 ], 0x8, %r516; 2026-02-21T10:15:10.5533704Z // end inline asm 2026-02-21T10:15:10.5533857Z add.s32 %r563, %r515, 19456; 2026-02-21T10:15:10.5534036Z // begin inline asm 2026-02-21T10:15:10.5534263Z cp.async.ca.shared.global [ %r563 + 0 ], [ %rd47 + 0 ], 0x8, %r516; 2026-02-21T10:15:10.5534531Z // end inline asm 2026-02-21T10:15:10.5534688Z add.s32 %r565, %r515, 20480; 2026-02-21T10:15:10.5534958Z // begin inline asm 2026-02-21T10:15:10.5535252Z cp.async.ca.shared.global [ %r565 + 0 ], [ %rd48 + 0 ], 0x8, %r516; 2026-02-21T10:15:10.5535518Z // end inline asm 2026-02-21T10:15:10.5535676Z add.s32 %r567, %r515, 21504; 2026-02-21T10:15:10.5535848Z // begin inline asm 2026-02-21T10:15:10.5536076Z cp.async.ca.shared.global [ %r567 + 0 ], [ %rd49 + 0 ], 0x8, %r516; 2026-02-21T10:15:10.5536347Z // end inline asm 2026-02-21T10:15:10.5536628Z add.s32 %r569, %r515, 22528; 2026-02-21T10:15:10.5536813Z // begin inline asm 2026-02-21T10:15:10.5537033Z cp.async.ca.shared.global [ %r569 + 0 ], [ %rd50 + 0 ], 0x8, %r516; 2026-02-21T10:15:10.5537305Z // end inline asm 2026-02-21T10:15:10.5537459Z add.s32 %r571, %r515, 23552; 2026-02-21T10:15:10.5537642Z // begin inline asm 2026-02-21T10:15:10.5537955Z cp.async.ca.shared.global [ %r571 + 0 ], [ %rd51 + 0 ], 0x8, %r516; 2026-02-21T10:15:10.5538247Z // end inline asm 
2026-02-21T10:15:10.5538499Z cp.async.commit_group; 2026-02-21T10:15:10.5538826Z .loc 1 26 90 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:26:90 2026-02-21T10:15:10.5539199Z bar.sync 0; 2026-02-21T10:15:10.5539351Z // begin inline asm 2026-02-21T10:15:10.5539587Z @%p22 mbarrier.arrive.expect_tx.shared.b64 _, [%r514], 2048; 2026-02-21T10:15:10.5539930Z // end inline asm 2026-02-21T10:15:10.5540235Z .loc 1 60 33 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:60:33 2026-02-21T10:15:10.5540601Z bar.sync 0; 2026-02-21T10:15:10.5540765Z elect.sync %r623|%p37, -1; 2026-02-21T10:15:10.5540960Z and.pred %p38, %p32, %p37; 2026-02-21T10:15:10.5541156Z and.pred %p27, %p1, %p38; 2026-02-21T10:15:10.5541349Z add.s32 %r574, %r586, 45056; 2026-02-21T10:15:10.5541529Z mov.b32 %r2378, 32; 2026-02-21T10:15:10.5541698Z // begin inline asm 2026-02-21T10:15:10.5542125Z @%p27 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r574], [%rd34, {%r2374, %r2378}], [%r514]; 2026-02-21T10:15:10.5542593Z // end inline asm 2026-02-21T10:15:10.5542893Z .loc 1 26 90 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:26:90 2026-02-21T10:15:10.5543259Z @%p31 bra $L__BB0_7; 2026-02-21T10:15:10.5543446Z // %bb.1: // %.lr.ph 2026-02-21T10:15:10.5543817Z .loc 1 0 90 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:0:90 2026-02-21T10:15:10.5544232Z ld.param.b64 %rd7, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T10:15:10.5544484Z shr.u32 %r3, %r2, 5; 2026-02-21T10:15:10.5544654Z and.b32 %r4, %r2, 120; 2026-02-21T10:15:10.5544818Z shr.u32 %r587, %r2, 4; 2026-02-21T10:15:10.5544994Z bfe.u32 %r13, %r2, 4, 3; 2026-02-21T10:15:10.5545164Z or.b32 %r14, %r13, 8; 2026-02-21T10:15:10.5545333Z or.b32 %r15, %r13, 16; 2026-02-21T10:15:10.5545500Z or.b32 %r16, %r13, 24; 2026-02-21T10:15:10.5545657Z or.b32 %r17, %r13, 32; 2026-02-21T10:15:10.5545823Z or.b32 %r18, %r13, 40; 2026-02-21T10:15:10.5545984Z or.b32 %r19, %r13, 48; 
2026-02-21T10:15:10.5546156Z or.b32 %r20, %r587, 56; 2026-02-21T10:15:10.5546324Z or.b32 %r21, %r13, 64; 2026-02-21T10:15:10.5546621Z or.b32 %r22, %r13, 72; 2026-02-21T10:15:10.5546788Z or.b32 %r23, %r13, 80; 2026-02-21T10:15:10.5546957Z or.b32 %r24, %r13, 88; 2026-02-21T10:15:10.5547117Z or.b32 %r25, %r13, 96; 2026-02-21T10:15:10.5547289Z or.b32 %r26, %r13, 104; 2026-02-21T10:15:10.5547460Z or.b32 %r27, %r13, 112; 2026-02-21T10:15:10.5547622Z or.b32 %r28, %r587, 120; 2026-02-21T10:15:10.5547800Z shl.b32 %r588, %r2, 3; 2026-02-21T10:15:10.5547981Z and.b32 %r29, %r588, 120; 2026-02-21T10:15:10.5548307Z .loc 1 26 90 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:26:90 2026-02-21T10:15:10.5548742Z shl.b32 %r630, %r32, 8; 2026-02-21T10:15:10.5548921Z add.s32 %r47, %r630, -3; 2026-02-21T10:15:10.5549090Z shl.b32 %r631, %r2, 5; 2026-02-21T10:15:10.5549264Z and.b32 %r632, %r631, 3072; 2026-02-21T10:15:10.5549449Z shl.b32 %r633, %r2, 4; 2026-02-21T10:15:10.5549760Z and.b32 %r634, %r633, 448; 2026-02-21T10:15:10.5549944Z and.b32 %r635, %r2, 3; 2026-02-21T10:15:10.5550105Z shl.b32 %r636, %r635, 1; 2026-02-21T10:15:10.5550276Z and.b32 %r637, %r2, 24; 2026-02-21T10:15:10.5550440Z or.b32 %r638, %r632, %r634; 2026-02-21T10:15:10.5550621Z or.b32 %r639, %r636, %r637; 2026-02-21T10:15:10.5550794Z or.b32 %r48, %r638, %r639; 2026-02-21T10:15:10.5550974Z xor.b32 %r49, %r48, 8; 2026-02-21T10:15:10.5551142Z xor.b32 %r50, %r48, 16; 2026-02-21T10:15:10.5551312Z xor.b32 %r51, %r48, 24; 2026-02-21T10:15:10.5551479Z shl.b32 %r640, %r45, 7; 2026-02-21T10:15:10.5551639Z shl.b32 %r641, %r33, 4; 2026-02-21T10:15:10.5551807Z or.b32 %r642, %r640, %r641; 2026-02-21T10:15:10.5551983Z add.s32 %r644, %r586, 24576; 2026-02-21T10:15:10.5552253Z add.s32 %r52, %r644, %r642; 2026-02-21T10:15:10.5552435Z xor.b32 %r645, %r642, 16; 2026-02-21T10:15:10.5552613Z add.s32 %r53, %r644, %r645; 2026-02-21T10:15:10.5552782Z xor.b32 %r646, %r642, 32; 2026-02-21T10:15:10.5552961Z add.s32 %r54, %r644, %r646; 
2026-02-21T10:15:10.5553142Z xor.b32 %r647, %r642, 48; 2026-02-21T10:15:10.5553310Z add.s32 %r55, %r644, %r647; 2026-02-21T10:15:10.5553490Z xor.b32 %r648, %r642, 64; 2026-02-21T10:15:10.5553657Z add.s32 %r56, %r644, %r648; 2026-02-21T10:15:10.5553925Z xor.b32 %r649, %r642, 80; 2026-02-21T10:15:10.5554094Z add.s32 %r57, %r644, %r649; 2026-02-21T10:15:10.5554271Z xor.b32 %r650, %r642, 96; 2026-02-21T10:15:10.5554437Z add.s32 %r58, %r644, %r650; 2026-02-21T10:15:10.5554610Z xor.b32 %r651, %r642, 112; 2026-02-21T10:15:10.5554784Z add.s32 %r59, %r644, %r651; 2026-02-21T10:15:10.5554959Z bfe.u32 %r652, %r644, 4, 14; 2026-02-21T10:15:10.5555140Z cvt.u64.u32 %rd88, %r652; 2026-02-21T10:15:10.5555319Z or.b64 %rd92, %rd88, 4611686293372403712; 2026-02-21T10:15:10.5555552Z add.s32 %r653, %r586, 24608; 2026-02-21T10:15:10.5555727Z bfe.u32 %r654, %r653, 4, 14; 2026-02-21T10:15:10.5555903Z cvt.u64.u32 %rd89, %r654; 2026-02-21T10:15:10.5556085Z or.b64 %rd93, %rd89, 4611686293372403712; 2026-02-21T10:15:10.5556292Z add.s32 %r655, %r586, 24640; 2026-02-21T10:15:10.5556585Z bfe.u32 %r656, %r655, 4, 14; 2026-02-21T10:15:10.5556776Z cvt.u64.u32 %rd90, %r656; 2026-02-21T10:15:10.5556958Z or.b64 %rd94, %rd90, 4611686293372403712; 2026-02-21T10:15:10.5557157Z add.s32 %r657, %r586, 24672; 2026-02-21T10:15:10.5557355Z bfe.u32 %r658, %r657, 4, 14; 2026-02-21T10:15:10.5557530Z cvt.u64.u32 %rd91, %r658; 2026-02-21T10:15:10.5557711Z or.b64 %rd95, %rd91, 4611686293372403712; 2026-02-21T10:15:10.5557908Z shl.b32 %r659, %r635, 11; 2026-02-21T10:15:10.5558081Z shl.b32 %r660, %r635, 5; 2026-02-21T10:15:10.5558247Z shl.b32 %r661, %r4, 4; 2026-02-21T10:15:10.5558414Z and.b32 %r663, %r585, 16; 2026-02-21T10:15:10.5558593Z or.b32 %r664, %r661, %r663; 2026-02-21T10:15:10.5558767Z or.b32 %r665, %r664, %r659; 2026-02-21T10:15:10.5558944Z or.b32 %r666, %r665, %r660; 2026-02-21T10:15:10.5559120Z add.s32 %r60, %r644, %r666; 2026-02-21T10:15:10.5559298Z xor.b32 %r667, %r666, 32; 2026-02-21T10:15:10.5559470Z 
add.s32 %r61, %r644, %r667; 2026-02-21T10:15:10.5559641Z xor.b32 %r668, %r666, 64; 2026-02-21T10:15:10.5559822Z add.s32 %r62, %r644, %r668; 2026-02-21T10:15:10.5559993Z xor.b32 %r669, %r666, 96; 2026-02-21T10:15:10.5560169Z add.s32 %r63, %r644, %r669; 2026-02-21T10:15:10.5560342Z shl.b32 %r670, %r637, 8; 2026-02-21T10:15:10.5560517Z and.b32 %r671, %r585, 496; 2026-02-21T10:15:10.5560686Z or.b32 %r672, %r670, %r660; 2026-02-21T10:15:10.5560861Z xor.b32 %r673, %r672, %r671; 2026-02-21T10:15:10.5561042Z add.s32 %r2122, %r644, %r673; 2026-02-21T10:15:10.5561224Z add.s32 %r2127, %r2122, 512; 2026-02-21T10:15:10.5561404Z add.s32 %r2132, %r2122, 1024; 2026-02-21T10:15:10.5561576Z add.s32 %r2137, %r2122, 1536; 2026-02-21T10:15:10.5561756Z shl.b32 %r674, %r30, 8; 2026-02-21T10:15:10.5561921Z shl.b32 %r675, %r31, 8; 2026-02-21T10:15:10.5562091Z add.s32 %r68, %r674, %r675; 2026-02-21T10:15:10.5562267Z mov.b32 %r2384, 0f00000000; 2026-02-21T10:15:10.5562607Z mov.b32 %r2381, 2; 2026-02-21T10:15:10.5562770Z mov.b32 %r2380, -1; 2026-02-21T10:15:10.5562934Z mov.b32 %r2373, %r2372; 2026-02-21T10:15:10.5563102Z mov.b32 %r2375, %r2374; 2026-02-21T10:15:10.5563261Z mov.b32 %r2379, %r2377; 2026-02-21T10:15:10.5563428Z mov.b32 %r2382, %r2372; 2026-02-21T10:15:10.5563586Z mov.b32 %r2383, %r2374; 2026-02-21T10:15:10.5563751Z mov.b32 %r2385, %r2384; 2026-02-21T10:15:10.5563909Z mov.b32 %r2386, %r2384; 2026-02-21T10:15:10.5564072Z mov.b32 %r2387, %r2384; 2026-02-21T10:15:10.5564230Z mov.b32 %r2388, %r2384; 2026-02-21T10:15:10.5564400Z mov.b32 %r2389, %r2384; 2026-02-21T10:15:10.5564560Z mov.b32 %r2390, %r2384; 2026-02-21T10:15:10.5564724Z mov.b32 %r2391, %r2384; 2026-02-21T10:15:10.5564887Z mov.b32 %r2392, %r2384; 2026-02-21T10:15:10.5565142Z mov.b32 %r2393, %r2384; 2026-02-21T10:15:10.5565316Z mov.b32 %r2394, %r2384; 2026-02-21T10:15:10.5565476Z mov.b32 %r2395, %r2384; 2026-02-21T10:15:10.5565647Z mov.b32 %r2396, %r2384; 2026-02-21T10:15:10.5565804Z mov.b32 %r2397, %r2384; 
2026-02-21T10:15:10.5565969Z mov.b32 %r2398, %r2384; 2026-02-21T10:15:10.5566127Z mov.b32 %r2399, %r2384; 2026-02-21T10:15:10.5566293Z mov.b32 %r2400, %r2384; 2026-02-21T10:15:10.5566659Z mov.b32 %r2401, %r2384; 2026-02-21T10:15:10.5566847Z mov.b32 %r2402, %r2384; 2026-02-21T10:15:10.5567013Z mov.b32 %r2403, %r2384; 2026-02-21T10:15:10.5567172Z mov.b32 %r2404, %r2384; 2026-02-21T10:15:10.5567340Z mov.b32 %r2405, %r2384; 2026-02-21T10:15:10.5567499Z mov.b32 %r2406, %r2384; 2026-02-21T10:15:10.5567665Z mov.b32 %r2407, %r2384; 2026-02-21T10:15:10.5567830Z mov.b32 %r2408, %r2384; 2026-02-21T10:15:10.5567997Z mov.b32 %r2409, %r2384; 2026-02-21T10:15:10.5568157Z mov.b32 %r2410, %r2384; 2026-02-21T10:15:10.5568325Z mov.b32 %r2411, %r2384; 2026-02-21T10:15:10.5568486Z mov.b32 %r2412, %r2384; 2026-02-21T10:15:10.5568658Z mov.b32 %r2413, %r2384; 2026-02-21T10:15:10.5568823Z mov.b32 %r2414, %r2384; 2026-02-21T10:15:10.5568990Z mov.b32 %r2415, %r2384; 2026-02-21T10:15:10.5569158Z mov.b32 %r2416, %r2384; 2026-02-21T10:15:10.5569316Z mov.b32 %r2417, %r2384; 2026-02-21T10:15:10.5569485Z mov.b32 %r2418, %r2384; 2026-02-21T10:15:10.5569641Z mov.b32 %r2419, %r2384; 2026-02-21T10:15:10.5569807Z mov.b32 %r2420, %r2384; 2026-02-21T10:15:10.5569963Z mov.b32 %r2421, %r2384; 2026-02-21T10:15:10.5570123Z mov.b32 %r2422, %r2384; 2026-02-21T10:15:10.5570281Z mov.b32 %r2423, %r2384; 2026-02-21T10:15:10.5570447Z mov.b32 %r2424, %r2384; 2026-02-21T10:15:10.5570611Z mov.b32 %r2425, %r2384; 2026-02-21T10:15:10.5570769Z mov.b32 %r2426, %r2384; 2026-02-21T10:15:10.5570932Z mov.b32 %r2427, %r2384; 2026-02-21T10:15:10.5571088Z mov.b32 %r2428, %r2384; 2026-02-21T10:15:10.5571266Z mov.b32 %r2429, %r2384; 2026-02-21T10:15:10.5571426Z mov.b32 %r2430, %r2384; 2026-02-21T10:15:10.5571592Z mov.b32 %r2431, %r2384; 2026-02-21T10:15:10.5571751Z mov.b32 %r2432, %r2384; 2026-02-21T10:15:10.5571917Z mov.b32 %r2433, %r2384; 2026-02-21T10:15:10.5572092Z mov.b32 %r2434, %r2384; 2026-02-21T10:15:10.5572260Z mov.b32 
%r2435, %r2384; 2026-02-21T10:15:10.5572422Z mov.b32 %r2436, %r2384; 2026-02-21T10:15:10.5572582Z mov.b32 %r2437, %r2384; 2026-02-21T10:15:10.5572747Z mov.b32 %r2438, %r2384; 2026-02-21T10:15:10.5572912Z mov.b32 %r2439, %r2384; 2026-02-21T10:15:10.5573078Z mov.b32 %r2440, %r2384; 2026-02-21T10:15:10.5573235Z mov.b32 %r2441, %r2384; 2026-02-21T10:15:10.5573400Z mov.b32 %r2442, %r2384; 2026-02-21T10:15:10.5573557Z mov.b32 %r2443, %r2384; 2026-02-21T10:15:10.5573722Z mov.b32 %r2444, %r2384; 2026-02-21T10:15:10.5573880Z mov.b32 %r2445, %r2384; 2026-02-21T10:15:10.5574047Z mov.b32 %r2446, %r2384; 2026-02-21T10:15:10.5574213Z mov.b32 %r2447, %r2384; 2026-02-21T10:15:10.5574372Z mov.b32 %r2448, %r2384; 2026-02-21T10:15:10.5574538Z mov.b32 %r2449, %r2384; 2026-02-21T10:15:10.5574697Z mov.b32 %r2450, %r2384; 2026-02-21T10:15:10.5574863Z mov.b32 %r2451, %r2384; 2026-02-21T10:15:10.5575164Z mov.b32 %r2452, %r2384; 2026-02-21T10:15:10.5575329Z mov.b32 %r2453, %r2384; 2026-02-21T10:15:10.5575487Z mov.b32 %r2454, %r2384; 2026-02-21T10:15:10.5575655Z mov.b32 %r2455, %r2384; 2026-02-21T10:15:10.5575828Z mov.b32 %r2456, %r2384; 2026-02-21T10:15:10.5576003Z mov.b32 %r2457, %r2384; 2026-02-21T10:15:10.5576169Z mov.b32 %r2458, %r2384; 2026-02-21T10:15:10.5576329Z mov.b32 %r2459, %r2384; 2026-02-21T10:15:10.5576620Z mov.b32 %r2460, %r2384; 2026-02-21T10:15:10.5576785Z mov.b32 %r2461, %r2384; 2026-02-21T10:15:10.5576952Z mov.b32 %r2462, %r2384; 2026-02-21T10:15:10.5577114Z mov.b32 %r2463, %r2384; 2026-02-21T10:15:10.5577283Z mov.b32 %r2464, %r2384; 2026-02-21T10:15:10.5577442Z mov.b32 %r2465, %r2384; 2026-02-21T10:15:10.5577606Z mov.b32 %r2466, %r2384; 2026-02-21T10:15:10.5577848Z mov.b32 %r2467, %r2384; 2026-02-21T10:15:10.5578019Z mov.b32 %r2468, %r2384; 2026-02-21T10:15:10.5578183Z mov.b32 %r2469, %r2384; 2026-02-21T10:15:10.5578340Z mov.b32 %r2470, %r2384; 2026-02-21T10:15:10.5578514Z mov.b32 %r2471, %r2384; 2026-02-21T10:15:10.5578673Z mov.b32 %r2472, %r2384; 
2026-02-21T10:15:10.5578839Z mov.b32 %r2473, %r2384; 2026-02-21T10:15:10.5579010Z mov.b32 %r2474, %r2384; 2026-02-21T10:15:10.5579176Z mov.b32 %r2475, %r2384; 2026-02-21T10:15:10.5579404Z mov.b32 %r2476, %r2384; 2026-02-21T10:15:10.5579575Z mov.b32 %r2477, %r2384; 2026-02-21T10:15:10.5579733Z mov.b32 %r2478, %r2384; 2026-02-21T10:15:10.5579896Z mov.b32 %r2479, %r2384; 2026-02-21T10:15:10.5580060Z mov.b32 %r2480, %r2384; 2026-02-21T10:15:10.5580217Z mov.b32 %r2481, %r2384; 2026-02-21T10:15:10.5580385Z mov.b32 %r2482, %r2384; 2026-02-21T10:15:10.5580543Z mov.b32 %r2483, %r2384; 2026-02-21T10:15:10.5580706Z mov.b32 %r2484, %r2384; 2026-02-21T10:15:10.5580865Z mov.b32 %r2485, %r2384; 2026-02-21T10:15:10.5581028Z mov.b32 %r2486, %r2384; 2026-02-21T10:15:10.5581189Z mov.b32 %r2487, %r2384; 2026-02-21T10:15:10.5581353Z mov.b32 %r2488, %r2384; 2026-02-21T10:15:10.5581519Z mov.b32 %r2489, %r2384; 2026-02-21T10:15:10.5581701Z mov.b32 %r2490, %r2384; 2026-02-21T10:15:10.5581867Z mov.b32 %r2491, %r2384; 2026-02-21T10:15:10.5582027Z mov.b32 %r2492, %r2384; 2026-02-21T10:15:10.5582194Z mov.b32 %r2493, %r2384; 2026-02-21T10:15:10.5582355Z mov.b32 %r2494, %r2384; 2026-02-21T10:15:10.5582521Z mov.b32 %r2495, %r2384; 2026-02-21T10:15:10.5582679Z mov.b32 %r2496, %r2384; 2026-02-21T10:15:10.5582843Z mov.b32 %r2497, %r2384; 2026-02-21T10:15:10.5583004Z mov.b32 %r2498, %r2384; 2026-02-21T10:15:10.5583170Z mov.b32 %r2499, %r2384; 2026-02-21T10:15:10.5583332Z mov.b32 %r2500, %r2384; 2026-02-21T10:15:10.5583499Z mov.b32 %r2501, %r2384; 2026-02-21T10:15:10.5583665Z mov.b32 %r2502, %r2384; 2026-02-21T10:15:10.5583825Z mov.b32 %r2503, %r2384; 2026-02-21T10:15:10.5583992Z mov.b32 %r2504, %r2384; 2026-02-21T10:15:10.5584150Z mov.b32 %r2505, %r2384; 2026-02-21T10:15:10.5584317Z mov.b32 %r2506, %r2384; 2026-02-21T10:15:10.5584491Z mov.b32 %r2507, %r2384; 2026-02-21T10:15:10.5584666Z mov.b32 %r2508, %r2384; 2026-02-21T10:15:10.5584826Z mov.b32 %r2509, %r2384; 2026-02-21T10:15:10.5584991Z mov.b32 
%r2510, %r2384; 2026-02-21T10:15:10.5585152Z mov.b32 %r2511, %r2384; 2026-02-21T10:15:10.5585319Z mov.b32 %r2513, %r2381; 2026-02-21T10:15:10.5585488Z mov.b32 %r2514, %r2377; 2026-02-21T10:15:10.5585649Z mov.b32 %r2515, %r2383; 2026-02-21T10:15:10.5585815Z mov.b32 %r2516, %r2382; 2026-02-21T10:15:10.5585975Z bra.uni $L__BB0_2; 2026-02-21T10:15:10.5586195Z $L__BB0_6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:15:10.5586711Z .loc 1 26 90 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:26:90 2026-02-21T10:15:10.5587079Z add.s32 %r2514, %r2514, 1; 2026-02-21T10:15:10.5587266Z setp.ne.b32 %p60, %r68, %r2514; 2026-02-21T10:15:10.5587459Z mov.b32 %r2372, %r2382; 2026-02-21T10:15:10.5587625Z mov.b32 %r2373, %r77; 2026-02-21T10:15:10.5587788Z mov.b32 %r2374, %r2383; 2026-02-21T10:15:10.5588098Z mov.b32 %r2375, %r79; 2026-02-21T10:15:10.5588270Z mov.b32 %r2376, %r2513; 2026-02-21T10:15:10.5588523Z mov.b32 %r2377, %r81; 2026-02-21T10:15:10.5588683Z mov.b32 %r2382, %r2516; 2026-02-21T10:15:10.5588849Z mov.b32 %r2383, %r2515; 2026-02-21T10:15:10.5589010Z mov.b32 %r2513, %r220; 2026-02-21T10:15:10.5589180Z @%p60 bra $L__BB0_2; 2026-02-21T10:15:10.5589340Z bra.uni $L__BB0_7; 2026-02-21T10:15:10.5589552Z $L__BB0_2: // =>This Inner Loop Header: Depth=1 2026-02-21T10:15:10.5589965Z .loc 1 0 90 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:0:90 2026-02-21T10:15:10.5590314Z mov.b32 %r81, %r2376; 2026-02-21T10:15:10.5590476Z mov.b32 %r79, %r2374; 2026-02-21T10:15:10.5590632Z mov.b32 %r77, %r2372; 2026-02-21T10:15:10.5591009Z .loc 1 26 90 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:26:90 2026-02-21T10:15:10.5591369Z add.s32 %r676, %r2513, 1; 2026-02-21T10:15:10.5591575Z setp.eq.b32 %p39, %r2513, 255; 2026-02-21T10:15:10.5591782Z selp.b32 %r220, 0, %r676, %p39; 2026-02-21T10:15:10.5591981Z setp.ne.b32 %p40, %r220, 0; 2026-02-21T10:15:10.5592167Z @%p40 bra $L__BB0_4; 2026-02-21T10:15:10.5592443Z // %bb.3: // in Loop: Header=BB0_2 Depth=1 
2026-02-21T10:15:10.5592712Z add.s32 %r2517, %r2517, 8448; 2026-02-21T10:15:10.5593041Z .loc 1 32 35 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:32:35 2026-02-21T10:15:10.5593401Z shr.s32 %r677, %r2517, 31; 2026-02-21T10:15:10.5593582Z shr.u32 %r678, %r677, 19; 2026-02-21T10:15:10.5593760Z add.s32 %r679, %r2517, %r678; 2026-02-21T10:15:10.5593942Z shr.s32 %r680, %r679, 13; 2026-02-21T10:15:10.5594257Z .loc 1 33 33 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:33:33 2026-02-21T10:15:10.5594613Z shl.b32 %r681, %r680, 4; 2026-02-21T10:15:10.5594924Z .loc 1 34 39 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:34:39 2026-02-21T10:15:10.5595287Z sub.s32 %r682, 10, %r681; 2026-02-21T10:15:10.5595593Z .loc 1 34 52 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:34:52 2026-02-21T10:15:10.5595944Z min.s32 %r683, %r682, 16; 2026-02-21T10:15:10.5596260Z .loc 1 35 45 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:35:45 2026-02-21T10:15:10.5596737Z and.b32 %r684, %r679, -8192; 2026-02-21T10:15:10.5596926Z sub.s32 %r685, %r2517, %r684; 2026-02-21T10:15:10.5597245Z .loc 1 36 51 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:36:51 2026-02-21T10:15:10.5597603Z div.s32 %r686, %r685, %r683; 2026-02-21T10:15:10.5597920Z .loc 1 35 64 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:35:64 2026-02-21T10:15:10.5598283Z mul.lo.s32 %r687, %r686, %r683; 2026-02-21T10:15:10.5598480Z sub.s32 %r688, %r685, %r687; 2026-02-21T10:15:10.5598792Z .loc 1 35 30 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:35:30 2026-02-21T10:15:10.5599154Z add.s32 %r689, %r688, %r681; 2026-02-21T10:15:10.5599472Z .loc 1 37 27 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:37:27 2026-02-21T10:15:10.5599835Z shl.b32 %r2515, %r689, 7; 2026-02-21T10:15:10.5600144Z .loc 1 39 27 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:39:27 2026-02-21T10:15:10.5600496Z shl.b32 %r2516, %r686, 7; 
2026-02-21T10:15:10.5600806Z .loc 1 40 32 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:40:32 2026-02-21T10:15:10.5601154Z or.b32 %r2518, %r2516, %r5; 2026-02-21T10:15:10.5601337Z or.b32 %r2519, %r2516, %r6; 2026-02-21T10:15:10.5601517Z or.b32 %r2520, %r2516, %r7; 2026-02-21T10:15:10.5601710Z or.b32 %r2521, %r2516, %r8; 2026-02-21T10:15:10.5601882Z or.b32 %r2522, %r2516, %r9; 2026-02-21T10:15:10.5602059Z or.b32 %r2523, %r2516, %r10; 2026-02-21T10:15:10.5602375Z or.b32 %r2524, %r2516, %r11; 2026-02-21T10:15:10.5602552Z or.b32 %r2525, %r2516, %r12; 2026-02-21T10:15:10.5602777Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:15:10.5603172Z .loc 1 26 90 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:26:90 2026-02-21T10:15:10.5603527Z setp.eq.b32 %p51, %r220, 0; 2026-02-21T10:15:10.5603710Z setp.lt.s32 %p52, %r2514, %r47; 2026-02-21T10:15:10.5603903Z add.s32 %r2031, %r2380, 1; 2026-02-21T10:15:10.5604078Z setp.gt.s32 %p55, %r2031, 2; 2026-02-21T10:15:10.5604267Z selp.b32 %r2380, 0, %r2031, %p55; 2026-02-21T10:15:10.5604466Z selp.b32 %r2032, 1, 0, %p55; 2026-02-21T10:15:10.5604665Z xor.b32 %r2379, %r2379, %r2032; 2026-02-21T10:15:10.5605066Z .loc 1 54 80 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:54:80 2026-02-21T10:15:10.5605423Z cp.async.wait_group 2; 2026-02-21T10:15:10.5605598Z bar.sync 0; 2026-02-21T10:15:10.5605751Z shl.b32 %r2033, %r2380, 13; 2026-02-21T10:15:10.5605937Z add.s32 %r2035, %r586, %r2033; 2026-02-21T10:15:10.5606258Z .loc 1 58 32 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:58:32 2026-02-21T10:15:10.5606744Z add.s32 %r2036, %r2035, %r48; 2026-02-21T10:15:10.5607003Z ld.shared.b16 %rs1, [%r2036]; 2026-02-21T10:15:10.5607200Z ld.shared.b16 %rs2, [%r2036+512]; 2026-02-21T10:15:10.5607482Z ld.shared.b16 %rs3, [%r2036+32]; 2026-02-21T10:15:10.5607682Z ld.shared.b16 %rs4, [%r2036+544]; 2026-02-21T10:15:10.5607885Z ld.shared.b16 %rs5, [%r2036+4096]; 2026-02-21T10:15:10.5608087Z ld.shared.b16 
%rs6, [%r2036+4608]; 2026-02-21T10:15:10.5608288Z ld.shared.b16 %rs7, [%r2036+4128]; 2026-02-21T10:15:10.5608479Z ld.shared.b16 %rs8, [%r2036+4640]; 2026-02-21T10:15:10.5608683Z add.s32 %r2037, %r2035, %r49; 2026-02-21T10:15:10.5608864Z ld.shared.b16 %rs9, [%r2037]; 2026-02-21T10:15:10.5609052Z ld.shared.b16 %rs10, [%r2037+512]; 2026-02-21T10:15:10.5609274Z ld.shared.b16 %rs11, [%r2037+32]; 2026-02-21T10:15:10.5609474Z ld.shared.b16 %rs12, [%r2037+544]; 2026-02-21T10:15:10.5609681Z ld.shared.b16 %rs13, [%r2037+4096]; 2026-02-21T10:15:10.5609887Z ld.shared.b16 %rs14, [%r2037+4608]; 2026-02-21T10:15:10.5610095Z ld.shared.b16 %rs15, [%r2037+4128]; 2026-02-21T10:15:10.5610293Z ld.shared.b16 %rs16, [%r2037+4640]; 2026-02-21T10:15:10.5610494Z add.s32 %r2038, %r2035, %r50; 2026-02-21T10:15:10.5610678Z ld.shared.b16 %rs17, [%r2038]; 2026-02-21T10:15:10.5610874Z ld.shared.b16 %rs18, [%r2038+512]; 2026-02-21T10:15:10.5611080Z ld.shared.b16 %rs19, [%r2038+32]; 2026-02-21T10:15:10.5611286Z ld.shared.b16 %rs20, [%r2038+544]; 2026-02-21T10:15:10.5611490Z ld.shared.b16 %rs21, [%r2038+4096]; 2026-02-21T10:15:10.5611689Z ld.shared.b16 %rs22, [%r2038+4608]; 2026-02-21T10:15:10.5611894Z ld.shared.b16 %rs23, [%r2038+4128]; 2026-02-21T10:15:10.5612091Z ld.shared.b16 %rs24, [%r2038+4640]; 2026-02-21T10:15:10.5612289Z add.s32 %r2039, %r2035, %r51; 2026-02-21T10:15:10.5612470Z ld.shared.b16 %rs25, [%r2039]; 2026-02-21T10:15:10.5612664Z ld.shared.b16 %rs26, [%r2039+512]; 2026-02-21T10:15:10.5612857Z ld.shared.b16 %rs27, [%r2039+32]; 2026-02-21T10:15:10.5613070Z ld.shared.b16 %rs28, [%r2039+544]; 2026-02-21T10:15:10.5613272Z ld.shared.b16 %rs29, [%r2039+4096]; 2026-02-21T10:15:10.5613469Z ld.shared.b16 %rs30, [%r2039+4608]; 2026-02-21T10:15:10.5613666Z ld.shared.b16 %rs31, [%r2039+4128]; 2026-02-21T10:15:10.5613862Z ld.shared.b16 %rs32, [%r2039+4640]; 2026-02-21T10:15:10.5614064Z cvt.f32.bf16 %r820, %rs1; 2026-02-21T10:15:10.5614238Z cvt.f32.bf16 %r821, %rs2; 2026-02-21T10:15:10.5614411Z 
cvt.f32.bf16 %r822, %rs9; 2026-02-21T10:15:10.5614580Z cvt.f32.bf16 %r823, %rs10; 2026-02-21T10:15:10.5614760Z cvt.f32.bf16 %r952, %rs17; 2026-02-21T10:15:10.5614932Z cvt.f32.bf16 %r953, %rs18; 2026-02-21T10:15:10.5615101Z cvt.f32.bf16 %r954, %rs25; 2026-02-21T10:15:10.5615276Z cvt.f32.bf16 %r955, %rs26; 2026-02-21T10:15:10.5615444Z cvt.f32.bf16 %r1084, %rs3; 2026-02-21T10:15:10.5615784Z cvt.f32.bf16 %r1085, %rs4; 2026-02-21T10:15:10.5615954Z cvt.f32.bf16 %r1086, %rs11; 2026-02-21T10:15:10.5616130Z cvt.f32.bf16 %r1087, %rs12; 2026-02-21T10:15:10.5616299Z cvt.f32.bf16 %r1216, %rs19; 2026-02-21T10:15:10.5616608Z cvt.f32.bf16 %r1217, %rs20; 2026-02-21T10:15:10.5616786Z cvt.f32.bf16 %r1218, %rs27; 2026-02-21T10:15:10.5616957Z cvt.f32.bf16 %r1219, %rs28; 2026-02-21T10:15:10.5617126Z cvt.f32.bf16 %r1348, %rs5; 2026-02-21T10:15:10.5617296Z cvt.f32.bf16 %r1349, %rs6; 2026-02-21T10:15:10.5617465Z cvt.f32.bf16 %r1350, %rs13; 2026-02-21T10:15:10.5617637Z cvt.f32.bf16 %r1351, %rs14; 2026-02-21T10:15:10.5617807Z cvt.f32.bf16 %r1480, %rs21; 2026-02-21T10:15:10.5617976Z cvt.f32.bf16 %r1481, %rs22; 2026-02-21T10:15:10.5618229Z cvt.f32.bf16 %r1482, %rs29; 2026-02-21T10:15:10.5618401Z cvt.f32.bf16 %r1483, %rs30; 2026-02-21T10:15:10.5618571Z cvt.f32.bf16 %r1612, %rs7; 2026-02-21T10:15:10.5618736Z cvt.f32.bf16 %r1613, %rs8; 2026-02-21T10:15:10.5618913Z cvt.f32.bf16 %r1614, %rs15; 2026-02-21T10:15:10.5619091Z cvt.f32.bf16 %r1615, %rs16; 2026-02-21T10:15:10.5619263Z cvt.f32.bf16 %r1744, %rs23; 2026-02-21T10:15:10.5619444Z cvt.f32.bf16 %r1745, %rs24; 2026-02-21T10:15:10.5619616Z cvt.f32.bf16 %r1746, %rs31; 2026-02-21T10:15:10.5619869Z cvt.f32.bf16 %r1747, %rs32; 2026-02-21T10:15:10.5620196Z .loc 1 26 90 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:26:90 2026-02-21T10:15:10.5620562Z shl.b32 %r2040, %r2380, 3; 2026-02-21T10:15:10.5620736Z add.s32 %r690, %r512, %r2040; 2026-02-21T10:15:10.5620917Z // begin inline asm 2026-02-21T10:15:10.5621079Z 2026-02-21T10:15:10.5621203Z { 
2026-02-21T10:15:10.5621340Z .reg .pred complete; 2026-02-21T10:15:10.5621500Z waitLoop: 2026-02-21T10:15:10.5621727Z mbarrier.try_wait.parity.shared.b64 complete, [%r690], %r2379; 2026-02-21T10:15:10.5622016Z @!complete bra.uni waitLoop; 2026-02-21T10:15:10.5622196Z } 2026-02-21T10:15:10.5622283Z 2026-02-21T10:15:10.5622349Z // end inline asm 2026-02-21T10:15:10.5622653Z .loc 1 60 33 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:60:33 2026-02-21T10:15:10.5623009Z shl.b32 %r2042, %r2380, 11; 2026-02-21T10:15:10.5623189Z add.s32 %r2044, %r532, %r2042; 2026-02-21T10:15:10.5623514Z .loc 1 78 58 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:78:58 2026-02-21T10:15:10.5623868Z add.s32 %r2045, %r2044, %r45; 2026-02-21T10:15:10.5624050Z xor.b32 %r2046, %r45, 16; 2026-02-21T10:15:10.5624221Z add.s32 %r2047, %r2044, %r2046; 2026-02-21T10:15:10.5624408Z xor.b32 %r2048, %r45, 32; 2026-02-21T10:15:10.5624577Z add.s32 %r2049, %r2044, %r2048; 2026-02-21T10:15:10.5624760Z xor.b32 %r2050, %r45, 48; 2026-02-21T10:15:10.5624936Z add.s32 %r2051, %r2044, %r2050; 2026-02-21T10:15:10.5625113Z xor.b32 %r2052, %r45, 64; 2026-02-21T10:15:10.5625284Z add.s32 %r2053, %r2044, %r2052; 2026-02-21T10:15:10.5625464Z xor.b32 %r2054, %r45, 80; 2026-02-21T10:15:10.5625639Z add.s32 %r2055, %r2044, %r2054; 2026-02-21T10:15:10.5625817Z xor.b32 %r2056, %r45, 96; 2026-02-21T10:15:10.5625989Z add.s32 %r2057, %r2044, %r2056; 2026-02-21T10:15:10.5626166Z xor.b32 %r2058, %r45, 112; 2026-02-21T10:15:10.5626352Z add.s32 %r2059, %r2044, %r2058; 2026-02-21T10:15:10.5626801Z .loc 1 65 25 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:65:25 2026-02-21T10:15:10.5627163Z ld.shared.s8 %rs33, [%r2045]; 2026-02-21T10:15:10.5627493Z .loc 1 63 28 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:63:28 2026-02-21T10:15:10.5627844Z shl.b16 %rs34, %rs33, 4; 2026-02-21T10:15:10.5628159Z .loc 1 65 25 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:65:25 
2026-02-21T10:15:10.5628593Z ld.shared.s8 %rs35, [%r2047+128]; 2026-02-21T10:15:10.5628932Z .loc 1 63 28 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:63:28 2026-02-21T10:15:10.5629430Z shl.b16 %rs36, %rs35, 4; 2026-02-21T10:15:10.5629734Z .loc 1 65 25 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:65:25 2026-02-21T10:15:10.5630089Z ld.shared.s8 %rs37, [%r2049+256]; 2026-02-21T10:15:10.5630416Z .loc 1 63 28 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:63:28 2026-02-21T10:15:10.5630769Z shl.b16 %rs38, %rs37, 4; 2026-02-21T10:15:10.5631071Z .loc 1 65 25 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:65:25 2026-02-21T10:15:10.5631436Z ld.shared.s8 %rs39, [%r2051+384]; 2026-02-21T10:15:10.5631771Z .loc 1 63 28 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:63:28 2026-02-21T10:15:10.5632115Z shl.b16 %rs40, %rs39, 4; 2026-02-21T10:15:10.5632506Z .loc 1 65 25 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:65:25 2026-02-21T10:15:10.5632862Z ld.shared.s8 %rs41, [%r2053+512]; 2026-02-21T10:15:10.5633197Z .loc 1 63 28 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:63:28 2026-02-21T10:15:10.5633544Z shl.b16 %rs42, %rs41, 4; 2026-02-21T10:15:10.5633919Z .loc 1 65 25 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:65:25 2026-02-21T10:15:10.5634280Z ld.shared.s8 %rs43, [%r2055+640]; 2026-02-21T10:15:10.5634609Z .loc 1 63 28 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:63:28 2026-02-21T10:15:10.5634963Z shl.b16 %rs44, %rs43, 4; 2026-02-21T10:15:10.5635266Z .loc 1 65 25 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:65:25 2026-02-21T10:15:10.5635621Z ld.shared.s8 %rs45, [%r2057+768]; 2026-02-21T10:15:10.5635949Z .loc 1 63 28 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:63:28 2026-02-21T10:15:10.5636312Z shl.b16 %rs46, %rs45, 4; 2026-02-21T10:15:10.5636745Z .loc 1 65 25 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:65:25 
2026-02-21T10:15:10.5637103Z ld.shared.s8 %rs47, [%r2059+896]; 2026-02-21T10:15:10.5637431Z .loc 1 63 28 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:63:28 2026-02-21T10:15:10.5637776Z shl.b16 %rs48, %rs47, 4; 2026-02-21T10:15:10.5638085Z .loc 1 65 25 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:65:25 2026-02-21T10:15:10.5638435Z ld.shared.s8 %rs49, [%r2045+1024]; 2026-02-21T10:15:10.5638773Z .loc 1 63 28 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:63:28 2026-02-21T10:15:10.5639128Z shl.b16 %rs50, %rs49, 4; 2026-02-21T10:15:10.5639439Z .loc 1 65 25 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:65:25 2026-02-21T10:15:10.5639793Z ld.shared.s8 %rs51, [%r2047+1152]; 2026-02-21T10:15:10.5640119Z .loc 1 63 28 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:63:28 2026-02-21T10:15:10.5640470Z shl.b16 %rs52, %rs51, 4; 2026-02-21T10:15:10.5640768Z .loc 1 65 25 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:65:25 2026-02-21T10:15:10.5641122Z ld.shared.s8 %rs53, [%r2049+1280]; 2026-02-21T10:15:10.5641455Z .loc 1 63 28 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:63:28 2026-02-21T10:15:10.5641806Z shl.b16 %rs54, %rs53, 4; 2026-02-21T10:15:10.5642113Z .loc 1 65 25 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:65:25 2026-02-21T10:15:10.5642460Z ld.shared.s8 %rs55, [%r2051+1408]; 2026-02-21T10:15:10.5642788Z .loc 1 63 28 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:63:28 2026-02-21T10:15:10.5643142Z shl.b16 %rs56, %rs55, 4; 2026-02-21T10:15:10.5643453Z .loc 1 65 25 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:65:25 2026-02-21T10:15:10.5643806Z ld.shared.s8 %rs57, [%r2053+1536]; 2026-02-21T10:15:10.5644274Z .loc 1 63 28 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:63:28 2026-02-21T10:15:10.5644622Z shl.b16 %rs58, %rs57, 4; 2026-02-21T10:15:10.5644924Z .loc 1 65 25 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:65:25 
2026-02-21T10:15:10.5645281Z ld.shared.s8 %rs59, [%r2055+1664]; 2026-02-21T10:15:10.5645612Z .loc 1 63 28 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:63:28 2026-02-21T10:15:10.5645958Z shl.b16 %rs60, %rs59, 4; 2026-02-21T10:15:10.5646265Z .loc 1 65 25 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:65:25 2026-02-21T10:15:10.5646737Z ld.shared.s8 %rs61, [%r2057+1792]; 2026-02-21T10:15:10.5647155Z .loc 1 63 28 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:63:28 2026-02-21T10:15:10.5647503Z shl.b16 %rs62, %rs61, 4; 2026-02-21T10:15:10.5647810Z .loc 1 65 25 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:65:25 2026-02-21T10:15:10.5648169Z ld.shared.s8 %rs63, [%r2059+1920]; 2026-02-21T10:15:10.5648498Z .loc 1 63 28 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:63:28 2026-02-21T10:15:10.5648915Z shl.b16 %rs64, %rs63, 4; 2026-02-21T10:15:10.5649224Z .loc 1 65 25 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:65:25 2026-02-21T10:15:10.5649578Z cvt.s16.s8 %rs65, %rs34; 2026-02-21T10:15:10.5649765Z shr.s16 %rs66, %rs65, 4; 2026-02-21T10:15:10.5649943Z cvt.s16.s8 %rs67, %rs36; 2026-02-21T10:15:10.5650117Z shr.s16 %rs68, %rs67, 4; 2026-02-21T10:15:10.5650285Z shr.s16 %rs69, %rs33, 4; 2026-02-21T10:15:10.5650468Z shr.s16 %rs70, %rs35, 4; 2026-02-21T10:15:10.5650786Z .loc 1 83 32 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:83:32 2026-02-21T10:15:10.5651151Z cvt.rn.f32.s16 %r2060, %rs70; 2026-02-21T10:15:10.5651346Z cvt.rn.f32.s16 %r2061, %rs69; 2026-02-21T10:15:10.5651540Z cvt.rn.f32.s16 %r2062, %rs68; 2026-02-21T10:15:10.5651718Z cvt.rn.f32.s16 %r2063, %rs66; 2026-02-21T10:15:10.5652038Z .loc 1 65 25 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:65:25 2026-02-21T10:15:10.5652398Z cvt.s16.s8 %rs71, %rs38; 2026-02-21T10:15:10.5652567Z shr.s16 %rs72, %rs71, 4; 2026-02-21T10:15:10.5652742Z cvt.s16.s8 %rs73, %rs40; 2026-02-21T10:15:10.5652907Z shr.s16 %rs74, %rs73, 4; 
2026-02-21T10:15:10.5653076Z shr.s16 %rs75, %rs37, 4; 2026-02-21T10:15:10.5653240Z shr.s16 %rs76, %rs39, 4; 2026-02-21T10:15:10.5653549Z .loc 1 83 32 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:83:32 2026-02-21T10:15:10.5653898Z cvt.rn.f32.s16 %r2064, %rs76; 2026-02-21T10:15:10.5654084Z cvt.rn.f32.s16 %r2065, %rs75; 2026-02-21T10:15:10.5654268Z cvt.rn.f32.s16 %r2066, %rs74; 2026-02-21T10:15:10.5654449Z cvt.rn.f32.s16 %r2067, %rs72; 2026-02-21T10:15:10.5654781Z .loc 1 65 25 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:65:25 2026-02-21T10:15:10.5655139Z cvt.s16.s8 %rs77, %rs42; 2026-02-21T10:15:10.5655314Z shr.s16 %rs78, %rs77, 4; 2026-02-21T10:15:10.5655482Z cvt.s16.s8 %rs79, %rs44; 2026-02-21T10:15:10.5655657Z shr.s16 %rs80, %rs79, 4; 2026-02-21T10:15:10.5655823Z shr.s16 %rs81, %rs41, 4; 2026-02-21T10:15:10.5655995Z shr.s16 %rs82, %rs43, 4; 2026-02-21T10:15:10.5656304Z .loc 1 83 32 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:83:32 2026-02-21T10:15:10.5656771Z cvt.rn.f32.s16 %r2068, %rs82; 2026-02-21T10:15:10.5656956Z cvt.rn.f32.s16 %r2069, %rs81; 2026-02-21T10:15:10.5657135Z cvt.rn.f32.s16 %r2070, %rs80; 2026-02-21T10:15:10.5657317Z cvt.rn.f32.s16 %r2071, %rs78; 2026-02-21T10:15:10.5657634Z .loc 1 65 25 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:65:25 2026-02-21T10:15:10.5657987Z cvt.s16.s8 %rs83, %rs46; 2026-02-21T10:15:10.5658246Z shr.s16 %rs84, %rs83, 4; 2026-02-21T10:15:10.5658493Z cvt.s16.s8 %rs85, %rs48; 2026-02-21T10:15:10.5658668Z shr.s16 %rs86, %rs85, 4; 2026-02-21T10:15:10.5658834Z shr.s16 %rs87, %rs45, 4; 2026-02-21T10:15:10.5659006Z shr.s16 %rs88, %rs47, 4; 2026-02-21T10:15:10.5659317Z .loc 1 83 32 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:83:32 2026-02-21T10:15:10.5659674Z cvt.rn.f32.s16 %r2072, %rs88; 2026-02-21T10:15:10.5659866Z cvt.rn.f32.s16 %r2073, %rs87; 2026-02-21T10:15:10.5660059Z cvt.rn.f32.s16 %r2074, %rs86; 2026-02-21T10:15:10.5660245Z cvt.rn.f32.s16 %r2075, %rs84; 
2026-02-21T10:15:10.5660578Z .loc 1 65 25 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:65:25 2026-02-21T10:15:10.5660941Z cvt.s16.s8 %rs89, %rs50; 2026-02-21T10:15:10.5661190Z shr.s16 %rs90, %rs89, 4; 2026-02-21T10:15:10.5661373Z cvt.s16.s8 %rs91, %rs52; 2026-02-21T10:15:10.5661542Z shr.s16 %rs92, %rs91, 4; 2026-02-21T10:15:10.5661714Z shr.s16 %rs93, %rs49, 4; 2026-02-21T10:15:10.5661890Z shr.s16 %rs94, %rs51, 4; 2026-02-21T10:15:10.5662203Z .loc 1 83 32 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:83:32 2026-02-21T10:15:10.5662558Z cvt.rn.f32.s16 %r2076, %rs94; 2026-02-21T10:15:10.5662811Z cvt.rn.f32.s16 %r2077, %rs93; 2026-02-21T10:15:10.5662999Z cvt.rn.f32.s16 %r2078, %rs92; 2026-02-21T10:15:10.5663178Z cvt.rn.f32.s16 %r2079, %rs90; 2026-02-21T10:15:10.5663501Z .loc 1 65 25 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:65:25 2026-02-21T10:15:10.5663849Z cvt.s16.s8 %rs95, %rs54; 2026-02-21T10:15:10.5664021Z shr.s16 %rs96, %rs95, 4; 2026-02-21T10:15:10.5664187Z cvt.s16.s8 %rs97, %rs56; 2026-02-21T10:15:10.5664359Z shr.s16 %rs98, %rs97, 4; 2026-02-21T10:15:10.5664540Z shr.s16 %rs99, %rs53, 4; 2026-02-21T10:15:10.5664716Z shr.s16 %rs100, %rs55, 4; 2026-02-21T10:15:10.5665034Z .loc 1 83 32 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:83:32 2026-02-21T10:15:10.5665392Z cvt.rn.f32.s16 %r2080, %rs100; 2026-02-21T10:15:10.5665585Z cvt.rn.f32.s16 %r2081, %rs99; 2026-02-21T10:15:10.5665764Z cvt.rn.f32.s16 %r2082, %rs98; 2026-02-21T10:15:10.5665946Z cvt.rn.f32.s16 %r2083, %rs96; 2026-02-21T10:15:10.5666258Z .loc 1 65 25 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:65:25 2026-02-21T10:15:10.5666727Z cvt.s16.s8 %rs101, %rs58; 2026-02-21T10:15:10.5666916Z shr.s16 %rs102, %rs101, 4; 2026-02-21T10:15:10.5667097Z cvt.s16.s8 %rs103, %rs60; 2026-02-21T10:15:10.5667274Z shr.s16 %rs104, %rs103, 4; 2026-02-21T10:15:10.5667446Z shr.s16 %rs105, %rs57, 4; 2026-02-21T10:15:10.5667623Z shr.s16 %rs106, %rs59, 4; 
2026-02-21T10:15:10.5667935Z .loc 1 83 32 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:83:32 2026-02-21T10:15:10.5668295Z cvt.rn.f32.s16 %r2084, %rs106; 2026-02-21T10:15:10.5668568Z cvt.rn.f32.s16 %r2085, %rs105; 2026-02-21T10:15:10.5668766Z cvt.rn.f32.s16 %r2086, %rs104; 2026-02-21T10:15:10.5668959Z cvt.rn.f32.s16 %r2087, %rs102; 2026-02-21T10:15:10.5669280Z .loc 1 65 25 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:65:25 2026-02-21T10:15:10.5669638Z cvt.s16.s8 %rs107, %rs62; 2026-02-21T10:15:10.5669815Z shr.s16 %rs108, %rs107, 4; 2026-02-21T10:15:10.5669999Z cvt.s16.s8 %rs109, %rs64; 2026-02-21T10:15:10.5670168Z shr.s16 %rs110, %rs109, 4; 2026-02-21T10:15:10.5670355Z shr.s16 %rs111, %rs61, 4; 2026-02-21T10:15:10.5670526Z shr.s16 %rs112, %rs63, 4; 2026-02-21T10:15:10.5670838Z .loc 1 83 32 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:83:32 2026-02-21T10:15:10.5671198Z cvt.rn.f32.s16 %r2088, %rs112; 2026-02-21T10:15:10.5671388Z cvt.rn.f32.s16 %r2089, %rs111; 2026-02-21T10:15:10.5671578Z cvt.rn.f32.s16 %r2090, %rs110; 2026-02-21T10:15:10.5671761Z cvt.rn.f32.s16 %r2091, %rs108; 2026-02-21T10:15:10.5671999Z st.shared.v4.b32 [%r52], {%r2063, %r2061, %r2062, %r2060}; 2026-02-21T10:15:10.5672498Z st.shared.v4.b32 [%r53], {%r2067, %r2065, %r2066, %r2064}; 2026-02-21T10:15:10.5672795Z st.shared.v4.b32 [%r54], {%r2071, %r2069, %r2070, %r2068}; 2026-02-21T10:15:10.5673076Z st.shared.v4.b32 [%r55], {%r2075, %r2073, %r2074, %r2072}; 2026-02-21T10:15:10.5673371Z st.shared.v4.b32 [%r56], {%r2079, %r2077, %r2078, %r2076}; 2026-02-21T10:15:10.5673661Z st.shared.v4.b32 [%r57], {%r2083, %r2081, %r2082, %r2080}; 2026-02-21T10:15:10.5673948Z st.shared.v4.b32 [%r58], {%r2087, %r2085, %r2086, %r2084}; 2026-02-21T10:15:10.5674236Z st.shared.v4.b32 [%r59], {%r2091, %r2089, %r2090, %r2088}; 2026-02-21T10:15:10.5674478Z $L__tmp1: 2026-02-21T10:15:10.5674924Z .loc 2 291 36 // standard.py:291:36 @[ 
cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:90:40 ] 2026-02-21T10:15:10.5675360Z // begin inline asm 2026-02-21T10:15:10.5675556Z fence.proxy.async.shared::cta; 2026-02-21T10:15:10.5675757Z // end inline asm 2026-02-21T10:15:10.5675914Z bar.sync 0; 2026-02-21T10:15:10.5676090Z shfl.sync.idx.b32 %r2092, %r3, 0, 31, -1; 2026-02-21T10:15:10.5676318Z wgmma.fence.sync.aligned; 2026-02-21T10:15:10.5676631Z mov.pred %p41, -1; 2026-02-21T10:15:10.5676796Z // begin inline asm 2026-02-21T10:15:10.5678245Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2384,%r2385,%r2386,%r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447}, {%r820,%r821,%r822,%r823}, %rd92, %p41, 1, 1; 2026-02-21T10:15:10.5679672Z // end inline asm 2026-02-21T10:15:10.5679830Z // begin inline asm 2026-02-21T10:15:10.5681170Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2384,%r2385,%r2386,%r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447}, {%r952,%r953,%r954,%r955}, %rd93, %p41, 1, 1; 2026-02-21T10:15:10.5682569Z // end inline asm 2026-02-21T10:15:10.5682724Z // begin inline asm 2026-02-21T10:15:10.5684083Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r2384,%r2385,%r2386,%r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447}, {%r1084,%r1085,%r1086,%r1087}, %rd94, %p41, 1, 1; 2026-02-21T10:15:10.5685489Z // end inline asm 2026-02-21T10:15:10.5685643Z // begin inline asm 2026-02-21T10:15:10.5687093Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2384,%r2385,%r2386,%r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447}, {%r1216,%r1217,%r1218,%r1219}, %rd95, %p41, 1, 1; 2026-02-21T10:15:10.5687160Z // end inline asm 2026-02-21T10:15:10.5687231Z // begin inline asm 2026-02-21T10:15:10.5688491Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2448,%r2449,%r2450,%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511}, {%r1348,%r1349,%r1350,%r1351}, %rd92, %p41, 1, 1; 2026-02-21T10:15:10.5688703Z // end inline asm 2026-02-21T10:15:10.5688775Z // begin inline asm 2026-02-21T10:15:10.5690099Z 
wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2448,%r2449,%r2450,%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511}, {%r1480,%r1481,%r1482,%r1483}, %rd93, %p41, 1, 1; 2026-02-21T10:15:10.5690171Z // end inline asm 2026-02-21T10:15:10.5690233Z // begin inline asm 2026-02-21T10:15:10.5691551Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2448,%r2449,%r2450,%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511}, {%r1612,%r1613,%r1614,%r1615}, %rd94, %p41, 1, 1; 2026-02-21T10:15:10.5691622Z // end inline asm 2026-02-21T10:15:10.5691682Z // begin inline asm 2026-02-21T10:15:10.5692937Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2448,%r2449,%r2450,%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511}, {%r1744,%r1745,%r1746,%r1747}, %rd95, %p41, 1, 1; 2026-02-21T10:15:10.5693008Z // end inline asm 2026-02-21T10:15:10.5693090Z 
wgmma.commit_group.sync.aligned; 2026-02-21T10:15:10.5693150Z mov.b32 %r1877, 0; 2026-02-21T10:15:10.5693217Z mov.b32 %r1876, %r644; 2026-02-21T10:15:10.5693280Z mov.b32 %r1878, %r1877; 2026-02-21T10:15:10.5693342Z // begin inline asm 2026-02-21T10:15:10.5695425Z // wait for regs: %r2384,%r2385,%r2386,%r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447,%r2448,%r2449,%r2450,%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r1876,%r1877,%r1878 2026-02-21T10:15:10.5695508Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:15:10.5695574Z // end inline asm 2026-02-21T10:15:10.5695633Z $L__tmp2: 2026-02-21T10:15:10.5695851Z .loc 1 26 90 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:26:90 2026-02-21T10:15:10.5696017Z add.s32 %r2093, %r2378, 16; 2026-02-21T10:15:10.5696089Z add.s32 %r2094, %r2381, 1; 2026-02-21T10:15:10.5696159Z setp.gt.s32 %p56, %r2094, 2; 2026-02-21T10:15:10.5696228Z selp.b32 %r2381, 0, %r2094, %p56; 2026-02-21T10:15:10.5696301Z selp.b32 %r2378, 0, %r2093, %p51; 2026-02-21T10:15:10.5696625Z .loc 1 51 22 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:51:22 2026-02-21T10:15:10.5696692Z shl.b32 %r2095, %r2378, 1; 2026-02-21T10:15:10.5696900Z .loc 1 53 25 // 
cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:53:25 2026-02-21T10:15:10.5696966Z add.s32 %r2096, %r2095, %r34; 2026-02-21T10:15:10.5697164Z .loc 1 54 53 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:54:53 2026-02-21T10:15:10.5697316Z shl.b32 %r2097, %r2518, 13; 2026-02-21T10:15:10.5697385Z shl.b32 %r2098, %r2519, 13; 2026-02-21T10:15:10.5697447Z shl.b32 %r2099, %r2520, 13; 2026-02-21T10:15:10.5697512Z shl.b32 %r2100, %r2521, 13; 2026-02-21T10:15:10.5697583Z shl.b32 %r2101, %r2522, 13; 2026-02-21T10:15:10.5697646Z shl.b32 %r2102, %r2523, 13; 2026-02-21T10:15:10.5697708Z shl.b32 %r2103, %r2524, 13; 2026-02-21T10:15:10.5697775Z shl.b32 %r2104, %r2525, 13; 2026-02-21T10:15:10.5698039Z .loc 1 54 60 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:54:60 2026-02-21T10:15:10.5698107Z add.s32 %r2105, %r2097, %r2096; 2026-02-21T10:15:10.5698171Z add.s32 %r2106, %r2098, %r2096; 2026-02-21T10:15:10.5698241Z add.s32 %r2107, %r2099, %r2096; 2026-02-21T10:15:10.5698305Z add.s32 %r2108, %r2100, %r2096; 2026-02-21T10:15:10.5698368Z add.s32 %r2109, %r2101, %r2096; 2026-02-21T10:15:10.5698439Z add.s32 %r2110, %r2102, %r2096; 2026-02-21T10:15:10.5698505Z add.s32 %r2111, %r2103, %r2096; 2026-02-21T10:15:10.5698570Z add.s32 %r2112, %r2104, %r2096; 2026-02-21T10:15:10.5698778Z .loc 1 54 32 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:54:32 2026-02-21T10:15:10.5698857Z mad.wide.s32 %rd100, %r2105, 2, %rd6; 2026-02-21T10:15:10.5698928Z mad.wide.s32 %rd101, %r2106, 2, %rd6; 2026-02-21T10:15:10.5698998Z mad.wide.s32 %rd102, %r2107, 2, %rd6; 2026-02-21T10:15:10.5699073Z mad.wide.s32 %rd103, %r2108, 2, %rd6; 2026-02-21T10:15:10.5699144Z mad.wide.s32 %rd104, %r2109, 2, %rd6; 2026-02-21T10:15:10.5699211Z mad.wide.s32 %rd105, %r2110, 2, %rd6; 2026-02-21T10:15:10.5699286Z mad.wide.s32 %rd106, %r2111, 2, %rd6; 2026-02-21T10:15:10.5699356Z mad.wide.s32 %rd107, %r2112, 2, %rd6; 2026-02-21T10:15:10.5699557Z .loc 1 54 80 // 
cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:54:80 2026-02-21T10:15:10.5699628Z shl.b32 %r2113, %r2381, 13; 2026-02-21T10:15:10.5699697Z add.s32 %r2114, %r586, %r2113; 2026-02-21T10:15:10.5699762Z add.s32 %r2010, %r2114, %r46; 2026-02-21T10:15:10.5699829Z selp.b32 %r2011, 8, 0, %p52; 2026-02-21T10:15:10.5699897Z // begin inline asm 2026-02-21T10:15:10.5700048Z cp.async.ca.shared.global [ %r2010 + 0 ], [ %rd100 + 0 ], 0x8, %r2011; 2026-02-21T10:15:10.5700110Z // end inline asm 2026-02-21T10:15:10.5700181Z add.s32 %r2012, %r2010, 1024; 2026-02-21T10:15:10.5700243Z // begin inline asm 2026-02-21T10:15:10.5700384Z cp.async.ca.shared.global [ %r2012 + 0 ], [ %rd101 + 0 ], 0x8, %r2011; 2026-02-21T10:15:10.5700443Z // end inline asm 2026-02-21T10:15:10.5700515Z add.s32 %r2014, %r2010, 2048; 2026-02-21T10:15:10.5700575Z // begin inline asm 2026-02-21T10:15:10.5700709Z cp.async.ca.shared.global [ %r2014 + 0 ], [ %rd102 + 0 ], 0x8, %r2011; 2026-02-21T10:15:10.5700773Z // end inline asm 2026-02-21T10:15:10.5700836Z add.s32 %r2016, %r2010, 3072; 2026-02-21T10:15:10.5700897Z // begin inline asm 2026-02-21T10:15:10.5701031Z cp.async.ca.shared.global [ %r2016 + 0 ], [ %rd103 + 0 ], 0x8, %r2011; 2026-02-21T10:15:10.5701098Z // end inline asm 2026-02-21T10:15:10.5701158Z add.s32 %r2018, %r2010, 4096; 2026-02-21T10:15:10.5701218Z // begin inline asm 2026-02-21T10:15:10.5701492Z cp.async.ca.shared.global [ %r2018 + 0 ], [ %rd104 + 0 ], 0x8, %r2011; 2026-02-21T10:15:10.5701551Z // end inline asm 2026-02-21T10:15:10.5701612Z add.s32 %r2020, %r2010, 5120; 2026-02-21T10:15:10.5701676Z // begin inline asm 2026-02-21T10:15:10.5701810Z cp.async.ca.shared.global [ %r2020 + 0 ], [ %rd105 + 0 ], 0x8, %r2011; 2026-02-21T10:15:10.5701869Z // end inline asm 2026-02-21T10:15:10.5701930Z add.s32 %r2022, %r2010, 6144; 2026-02-21T10:15:10.5701996Z // begin inline asm 2026-02-21T10:15:10.5702127Z cp.async.ca.shared.global [ %r2022 + 0 ], [ %rd106 + 0 ], 0x8, %r2011; 
2026-02-21T10:15:10.5702186Z // end inline asm 2026-02-21T10:15:10.5702260Z add.s32 %r2024, %r2010, 7168; 2026-02-21T10:15:10.5702330Z // begin inline asm 2026-02-21T10:15:10.5702514Z cp.async.ca.shared.global [ %r2024 + 0 ], [ %rd107 + 0 ], 0x8, %r2011; 2026-02-21T10:15:10.5702574Z // end inline asm 2026-02-21T10:15:10.5702647Z cp.async.commit_group; 2026-02-21T10:15:10.5702851Z .loc 1 26 90 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:26:90 2026-02-21T10:15:10.5702918Z shl.b32 %r2115, %r2381, 3; 2026-02-21T10:15:10.5702989Z add.s32 %r2026, %r512, %r2115; 2026-02-21T10:15:10.5703062Z and.pred %p49, %p61, %p52; 2026-02-21T10:15:10.5703188Z // begin inline asm 2026-02-21T10:15:10.5703330Z @%p49 mbarrier.arrive.expect_tx.shared.b64 _, [%r2026], 2048; 2026-02-21T10:15:10.5703391Z // end inline asm 2026-02-21T10:15:10.5703590Z .loc 1 60 33 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:60:33 2026-02-21T10:15:10.5703654Z shl.b32 %r2116, %r2381, 11; 2026-02-21T10:15:10.5703722Z add.s32 %r2027, %r532, %r2116; 2026-02-21T10:15:10.5703781Z bar.sync 0; 2026-02-21T10:15:10.5703851Z elect.sync %r2117|%p57, -1; 2026-02-21T10:15:10.5703926Z and.pred %p58, %p52, %p57; 2026-02-21T10:15:10.5703995Z and.pred %p50, %p1, %p58; 2026-02-21T10:15:10.5704056Z // begin inline asm 2026-02-21T10:15:10.5704381Z @%p50 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r2027], [%rd34, {%r2515, %r2378}], [%r2026]; 2026-02-21T10:15:10.5704448Z // end inline asm 2026-02-21T10:15:10.5704648Z .loc 1 26 90 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:26:90 2026-02-21T10:15:10.5704719Z setp.ne.b32 %p59, %r2377, 255; 2026-02-21T10:15:10.5704801Z @%p59 bra $L__BB0_6; 2026-02-21T10:15:10.5704914Z // %bb.5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:15:10.5705119Z .loc 1 38 32 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:38:32 2026-02-21T10:15:10.5705191Z add.s32 %r2263, %r2375, %r29; 2026-02-21T10:15:10.5705389Z .loc 1 
40 32 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:40:32 2026-02-21T10:15:10.5705454Z add.s32 %r2264, %r2373, %r13; 2026-02-21T10:15:10.5705523Z add.s32 %r2265, %r14, %r2373; 2026-02-21T10:15:10.5705586Z add.s32 %r2266, %r15, %r2373; 2026-02-21T10:15:10.5705651Z add.s32 %r2267, %r16, %r2373; 2026-02-21T10:15:10.5705716Z add.s32 %r2268, %r17, %r2373; 2026-02-21T10:15:10.5705784Z add.s32 %r2269, %r18, %r2373; 2026-02-21T10:15:10.5705845Z add.s32 %r2270, %r19, %r2373; 2026-02-21T10:15:10.5705907Z add.s32 %r2271, %r2373, %r20; 2026-02-21T10:15:10.5705974Z add.s32 %r2272, %r21, %r2373; 2026-02-21T10:15:10.5706037Z add.s32 %r2273, %r22, %r2373; 2026-02-21T10:15:10.5706103Z add.s32 %r2274, %r23, %r2373; 2026-02-21T10:15:10.5706165Z add.s32 %r2275, %r24, %r2373; 2026-02-21T10:15:10.5706232Z add.s32 %r2276, %r25, %r2373; 2026-02-21T10:15:10.5706295Z add.s32 %r2277, %r26, %r2373; 2026-02-21T10:15:10.5706357Z add.s32 %r2278, %r27, %r2373; 2026-02-21T10:15:10.5706424Z add.s32 %r2279, %r2373, %r28; 2026-02-21T10:15:10.5706743Z .loc 1 93 28 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:93:28 2026-02-21T10:15:10.5706827Z cvt.rn.bf16x2.f32 %r2280, %r2385, %r2384; 2026-02-21T10:15:10.5706910Z cvt.rn.bf16x2.f32 %r2281, %r2387, %r2386; 2026-02-21T10:15:10.5707127Z cvt.rn.bf16x2.f32 %r2282, %r2389, %r2388; 2026-02-21T10:15:10.5707200Z cvt.rn.bf16x2.f32 %r2283, %r2391, %r2390; 2026-02-21T10:15:10.5707273Z cvt.rn.bf16x2.f32 %r2284, %r2393, %r2392; 2026-02-21T10:15:10.5707351Z cvt.rn.bf16x2.f32 %r2285, %r2395, %r2394; 2026-02-21T10:15:10.5707424Z cvt.rn.bf16x2.f32 %r2286, %r2397, %r2396; 2026-02-21T10:15:10.5707496Z cvt.rn.bf16x2.f32 %r2287, %r2399, %r2398; 2026-02-21T10:15:10.5712345Z cvt.rn.bf16x2.f32 %r2288, %r2401, %r2400; 2026-02-21T10:15:10.5712483Z cvt.rn.bf16x2.f32 %r2289, %r2403, %r2402; 2026-02-21T10:15:10.5712579Z cvt.rn.bf16x2.f32 %r2290, %r2405, %r2404; 2026-02-21T10:15:10.5712657Z cvt.rn.bf16x2.f32 %r2291, %r2407, %r2406; 
2026-02-21T10:15:10.5712733Z cvt.rn.bf16x2.f32 %r2292, %r2409, %r2408; 2026-02-21T10:15:10.5712963Z cvt.rn.bf16x2.f32 %r2293, %r2411, %r2410; 2026-02-21T10:15:10.5713040Z cvt.rn.bf16x2.f32 %r2294, %r2413, %r2412; 2026-02-21T10:15:10.5713112Z cvt.rn.bf16x2.f32 %r2295, %r2415, %r2414; 2026-02-21T10:15:10.5713191Z cvt.rn.bf16x2.f32 %r2296, %r2417, %r2416; 2026-02-21T10:15:10.5713270Z cvt.rn.bf16x2.f32 %r2297, %r2419, %r2418; 2026-02-21T10:15:10.5713342Z cvt.rn.bf16x2.f32 %r2298, %r2421, %r2420; 2026-02-21T10:15:10.5713487Z cvt.rn.bf16x2.f32 %r2299, %r2423, %r2422; 2026-02-21T10:15:10.5713571Z cvt.rn.bf16x2.f32 %r2300, %r2425, %r2424; 2026-02-21T10:15:10.5713643Z cvt.rn.bf16x2.f32 %r2301, %r2427, %r2426; 2026-02-21T10:15:10.5713715Z cvt.rn.bf16x2.f32 %r2302, %r2429, %r2428; 2026-02-21T10:15:10.5713792Z cvt.rn.bf16x2.f32 %r2303, %r2431, %r2430; 2026-02-21T10:15:10.5713862Z cvt.rn.bf16x2.f32 %r2304, %r2433, %r2432; 2026-02-21T10:15:10.5713933Z cvt.rn.bf16x2.f32 %r2305, %r2435, %r2434; 2026-02-21T10:15:10.5714006Z cvt.rn.bf16x2.f32 %r2306, %r2437, %r2436; 2026-02-21T10:15:10.5714086Z cvt.rn.bf16x2.f32 %r2307, %r2439, %r2438; 2026-02-21T10:15:10.5714159Z cvt.rn.bf16x2.f32 %r2308, %r2441, %r2440; 2026-02-21T10:15:10.5714229Z cvt.rn.bf16x2.f32 %r2309, %r2443, %r2442; 2026-02-21T10:15:10.5714314Z cvt.rn.bf16x2.f32 %r2310, %r2445, %r2444; 2026-02-21T10:15:10.5714385Z cvt.rn.bf16x2.f32 %r2311, %r2447, %r2446; 2026-02-21T10:15:10.5714457Z cvt.rn.bf16x2.f32 %r2312, %r2449, %r2448; 2026-02-21T10:15:10.5714529Z cvt.rn.bf16x2.f32 %r2313, %r2451, %r2450; 2026-02-21T10:15:10.5714606Z cvt.rn.bf16x2.f32 %r2314, %r2453, %r2452; 2026-02-21T10:15:10.5714679Z cvt.rn.bf16x2.f32 %r2315, %r2455, %r2454; 2026-02-21T10:15:10.5714750Z cvt.rn.bf16x2.f32 %r2316, %r2457, %r2456; 2026-02-21T10:15:10.5714827Z cvt.rn.bf16x2.f32 %r2317, %r2459, %r2458; 2026-02-21T10:15:10.5714911Z cvt.rn.bf16x2.f32 %r2318, %r2461, %r2460; 2026-02-21T10:15:10.5714983Z cvt.rn.bf16x2.f32 %r2319, %r2463, %r2462; 
2026-02-21T10:15:10.5715059Z cvt.rn.bf16x2.f32 %r2320, %r2465, %r2464; 2026-02-21T10:15:10.5715132Z cvt.rn.bf16x2.f32 %r2321, %r2467, %r2466; 2026-02-21T10:15:10.5715203Z cvt.rn.bf16x2.f32 %r2322, %r2469, %r2468; 2026-02-21T10:15:10.5715275Z cvt.rn.bf16x2.f32 %r2323, %r2471, %r2470; 2026-02-21T10:15:10.5715358Z cvt.rn.bf16x2.f32 %r2324, %r2473, %r2472; 2026-02-21T10:15:10.5715431Z cvt.rn.bf16x2.f32 %r2325, %r2475, %r2474; 2026-02-21T10:15:10.5715502Z cvt.rn.bf16x2.f32 %r2326, %r2477, %r2476; 2026-02-21T10:15:10.5715582Z cvt.rn.bf16x2.f32 %r2327, %r2479, %r2478; 2026-02-21T10:15:10.5715654Z cvt.rn.bf16x2.f32 %r2328, %r2481, %r2480; 2026-02-21T10:15:10.5715733Z cvt.rn.bf16x2.f32 %r2329, %r2483, %r2482; 2026-02-21T10:15:10.5715814Z cvt.rn.bf16x2.f32 %r2330, %r2485, %r2484; 2026-02-21T10:15:10.5715886Z cvt.rn.bf16x2.f32 %r2331, %r2487, %r2486; 2026-02-21T10:15:10.5715957Z cvt.rn.bf16x2.f32 %r2332, %r2489, %r2488; 2026-02-21T10:15:10.5716029Z cvt.rn.bf16x2.f32 %r2333, %r2491, %r2490; 2026-02-21T10:15:10.5716106Z cvt.rn.bf16x2.f32 %r2334, %r2493, %r2492; 2026-02-21T10:15:10.5716181Z cvt.rn.bf16x2.f32 %r2335, %r2495, %r2494; 2026-02-21T10:15:10.5716251Z cvt.rn.bf16x2.f32 %r2336, %r2497, %r2496; 2026-02-21T10:15:10.5716328Z cvt.rn.bf16x2.f32 %r2337, %r2499, %r2498; 2026-02-21T10:15:10.5716722Z cvt.rn.bf16x2.f32 %r2338, %r2501, %r2500; 2026-02-21T10:15:10.5716798Z cvt.rn.bf16x2.f32 %r2339, %r2503, %r2502; 2026-02-21T10:15:10.5716869Z cvt.rn.bf16x2.f32 %r2340, %r2505, %r2504; 2026-02-21T10:15:10.5716950Z cvt.rn.bf16x2.f32 %r2341, %r2507, %r2506; 2026-02-21T10:15:10.5717021Z cvt.rn.bf16x2.f32 %r2342, %r2509, %r2508; 2026-02-21T10:15:10.5717092Z cvt.rn.bf16x2.f32 %r2343, %r2511, %r2510; 2026-02-21T10:15:10.5717339Z .loc 1 94 50 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:94:50 2026-02-21T10:15:10.5717419Z mad.lo.s32 %r2344, %r2264, 1280, %r2263; 2026-02-21T10:15:10.5717493Z mad.lo.s32 %r2345, %r2265, 1280, %r2263; 2026-02-21T10:15:10.5717568Z mad.lo.s32 %r2346, 
%r2266, 1280, %r2263; 2026-02-21T10:15:10.5717724Z mad.lo.s32 %r2347, %r2267, 1280, %r2263; 2026-02-21T10:15:10.5717800Z mad.lo.s32 %r2348, %r2268, 1280, %r2263; 2026-02-21T10:15:10.5717870Z mad.lo.s32 %r2349, %r2269, 1280, %r2263; 2026-02-21T10:15:10.5717950Z mad.lo.s32 %r2350, %r2270, 1280, %r2263; 2026-02-21T10:15:10.5718019Z mad.lo.s32 %r2351, %r2271, 1280, %r2263; 2026-02-21T10:15:10.5718087Z mad.lo.s32 %r2352, %r2272, 1280, %r2263; 2026-02-21T10:15:10.5718163Z mad.lo.s32 %r2353, %r2273, 1280, %r2263; 2026-02-21T10:15:10.5718292Z mad.lo.s32 %r2354, %r2274, 1280, %r2263; 2026-02-21T10:15:10.5718363Z mad.lo.s32 %r2355, %r2275, 1280, %r2263; 2026-02-21T10:15:10.5718439Z mad.lo.s32 %r2356, %r2276, 1280, %r2263; 2026-02-21T10:15:10.5718511Z mad.lo.s32 %r2357, %r2277, 1280, %r2263; 2026-02-21T10:15:10.5718581Z mad.lo.s32 %r2358, %r2278, 1280, %r2263; 2026-02-21T10:15:10.5718649Z mad.lo.s32 %r2359, %r2279, 1280, %r2263; 2026-02-21T10:15:10.5718872Z .loc 1 94 22 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:94:22 2026-02-21T10:15:10.5718947Z mad.wide.s32 %rd109, %r2344, 2, %rd7; 2026-02-21T10:15:10.5719016Z mad.wide.s32 %rd110, %r2345, 2, %rd7; 2026-02-21T10:15:10.5719094Z mad.wide.s32 %rd111, %r2346, 2, %rd7; 2026-02-21T10:15:10.5719165Z mad.wide.s32 %rd112, %r2347, 2, %rd7; 2026-02-21T10:15:10.5719232Z mad.wide.s32 %rd113, %r2348, 2, %rd7; 2026-02-21T10:15:10.5719303Z mad.wide.s32 %rd114, %r2349, 2, %rd7; 2026-02-21T10:15:10.5719372Z mad.wide.s32 %rd115, %r2350, 2, %rd7; 2026-02-21T10:15:10.5719439Z mad.wide.s32 %rd116, %r2351, 2, %rd7; 2026-02-21T10:15:10.5719506Z mad.wide.s32 %rd117, %r2352, 2, %rd7; 2026-02-21T10:15:10.5719581Z mad.wide.s32 %rd118, %r2353, 2, %rd7; 2026-02-21T10:15:10.5719654Z mad.wide.s32 %rd119, %r2354, 2, %rd7; 2026-02-21T10:15:10.5719722Z mad.wide.s32 %rd120, %r2355, 2, %rd7; 2026-02-21T10:15:10.5719796Z mad.wide.s32 %rd121, %r2356, 2, %rd7; 2026-02-21T10:15:10.5719863Z mad.wide.s32 %rd122, %r2357, 2, %rd7; 
2026-02-21T10:15:10.5719932Z mad.wide.s32 %rd123, %r2358, 2, %rd7; 2026-02-21T10:15:10.5720004Z mad.wide.s32 %rd124, %r2359, 2, %rd7; 2026-02-21T10:15:10.5720212Z .loc 1 94 81 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:94:81 2026-02-21T10:15:10.5720333Z st.shared.v4.b32 [%r60], {%r2280, %r2282, %r2284, %r2286}; 2026-02-21T10:15:10.5720444Z st.shared.v4.b32 [%r61], {%r2288, %r2290, %r2292, %r2294}; 2026-02-21T10:15:10.5720552Z st.shared.v4.b32 [%r62], {%r2296, %r2298, %r2300, %r2302}; 2026-02-21T10:15:10.5720652Z st.shared.v4.b32 [%r63], {%r2304, %r2306, %r2308, %r2310}; 2026-02-21T10:15:10.5720712Z bar.sync 0; 2026-02-21T10:15:10.5720784Z // begin inline asm 2026-02-21T10:15:10.5720981Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2198, %r2199, %r2200, %r2201}, [%r2122]; 2026-02-21T10:15:10.5721043Z // end inline asm 2026-02-21T10:15:10.5721113Z // begin inline asm 2026-02-21T10:15:10.5721298Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2206, %r2207, %r2208, %r2209}, [%r2127]; 2026-02-21T10:15:10.5721359Z // end inline asm 2026-02-21T10:15:10.5721422Z // begin inline asm 2026-02-21T10:15:10.5721615Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2214, %r2215, %r2216, %r2217}, [%r2132]; 2026-02-21T10:15:10.5721808Z // end inline asm 2026-02-21T10:15:10.5721869Z // begin inline asm 2026-02-21T10:15:10.5722061Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2222, %r2223, %r2224, %r2225}, [%r2137]; 2026-02-21T10:15:10.5722123Z // end inline asm 2026-02-21T10:15:10.5722187Z bar.sync 0; 2026-02-21T10:15:10.5722301Z st.shared.v4.b32 [%r60], {%r2281, %r2283, %r2285, %r2287}; 2026-02-21T10:15:10.5724182Z st.shared.v4.b32 [%r61], {%r2289, %r2291, %r2293, %r2295}; 2026-02-21T10:15:10.5727652Z st.shared.v4.b32 [%r62], {%r2297, %r2299, %r2301, %r2303}; 2026-02-21T10:15:10.5727788Z st.shared.v4.b32 [%r63], {%r2305, %r2307, %r2309, %r2311}; 2026-02-21T10:15:10.5727850Z bar.sync 0; 2026-02-21T10:15:10.5727917Z // begin inline asm 2026-02-21T10:15:10.5728259Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2202, %r2203, %r2204, %r2205}, [%r2122]; 2026-02-21T10:15:10.5728327Z // end inline asm 2026-02-21T10:15:10.5728390Z // begin inline asm 2026-02-21T10:15:10.5728583Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2210, %r2211, %r2212, %r2213}, [%r2127]; 2026-02-21T10:15:10.5728652Z // end inline asm 2026-02-21T10:15:10.5728714Z // begin inline asm 2026-02-21T10:15:10.5728906Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2218, %r2219, %r2220, %r2221}, [%r2132]; 2026-02-21T10:15:10.5729042Z // end inline asm 2026-02-21T10:15:10.5729104Z // begin inline asm 2026-02-21T10:15:10.5729286Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2226, %r2227, %r2228, %r2229}, [%r2137]; 2026-02-21T10:15:10.5729379Z // end inline asm 2026-02-21T10:15:10.5729436Z bar.sync 0; 2026-02-21T10:15:10.5729553Z st.shared.v4.b32 [%r60], {%r2312, %r2314, %r2316, %r2318}; 2026-02-21T10:15:10.5729667Z st.shared.v4.b32 [%r61], {%r2320, %r2322, %r2324, %r2326}; 2026-02-21T10:15:10.5729772Z st.shared.v4.b32 [%r62], {%r2328, %r2330, %r2332, %r2334}; 2026-02-21T10:15:10.5729873Z st.shared.v4.b32 [%r63], {%r2336, %r2338, %r2340, %r2342}; 2026-02-21T10:15:10.5729936Z bar.sync 0; 2026-02-21T10:15:10.5729997Z // begin inline asm 2026-02-21T10:15:10.5730185Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2230, %r2231, %r2232, %r2233}, [%r2122]; 2026-02-21T10:15:10.5730252Z // end inline asm 2026-02-21T10:15:10.5730311Z // begin inline asm 2026-02-21T10:15:10.5730493Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2238, %r2239, %r2240, %r2241}, [%r2127]; 2026-02-21T10:15:10.5730550Z // end inline asm 2026-02-21T10:15:10.5730615Z // begin inline asm 2026-02-21T10:15:10.5730794Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2246, %r2247, %r2248, %r2249}, [%r2132]; 2026-02-21T10:15:10.5730855Z // end inline asm 2026-02-21T10:15:10.5730933Z // begin inline asm 2026-02-21T10:15:10.5731112Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2254, %r2255, %r2256, %r2257}, [%r2137]; 
2026-02-21T10:15:10.5731171Z // end inline asm 2026-02-21T10:15:10.5731236Z bar.sync 0; 2026-02-21T10:15:10.5731357Z st.shared.v4.b32 [%r60], {%r2313, %r2315, %r2317, %r2319}; 2026-02-21T10:15:10.5731463Z st.shared.v4.b32 [%r61], {%r2321, %r2323, %r2325, %r2327}; 2026-02-21T10:15:10.5731569Z st.shared.v4.b32 [%r62], {%r2329, %r2331, %r2333, %r2335}; 2026-02-21T10:15:10.5731677Z st.shared.v4.b32 [%r63], {%r2337, %r2339, %r2341, %r2343}; 2026-02-21T10:15:10.5731734Z bar.sync 0; 2026-02-21T10:15:10.5731793Z // begin inline asm 2026-02-21T10:15:10.5731981Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2234, %r2235, %r2236, %r2237}, [%r2122]; 2026-02-21T10:15:10.5732040Z // end inline asm 2026-02-21T10:15:10.5732099Z // begin inline asm 2026-02-21T10:15:10.5732283Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2242, %r2243, %r2244, %r2245}, [%r2127]; 2026-02-21T10:15:10.5732353Z // end inline asm 2026-02-21T10:15:10.5732414Z // begin inline asm 2026-02-21T10:15:10.5732601Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2250, %r2251, %r2252, %r2253}, [%r2132]; 2026-02-21T10:15:10.5732667Z // end inline asm 2026-02-21T10:15:10.5732729Z // begin inline asm 2026-02-21T10:15:10.5732911Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2258, %r2259, %r2260, %r2261}, [%r2137]; 2026-02-21T10:15:10.5732974Z // end inline asm 2026-02-21T10:15:10.5733034Z // begin inline asm 2026-02-21T10:15:10.5733167Z st.global.v4.b32 [ %rd109 + 0 ], { %r2198, %r2199, %r2200, %r2201 }; 2026-02-21T10:15:10.5733227Z // end inline asm 2026-02-21T10:15:10.5733293Z // begin inline asm 2026-02-21T10:15:10.5733418Z st.global.v4.b32 [ %rd110 + 0 ], { %r2202, %r2203, %r2204, %r2205 }; 2026-02-21T10:15:10.5733623Z // end inline asm 2026-02-21T10:15:10.5733690Z // begin inline asm 2026-02-21T10:15:10.5733890Z st.global.v4.b32 [ %rd111 + 0 ], { %r2206, %r2207, %r2208, %r2209 }; 2026-02-21T10:15:10.5733949Z // end inline asm 2026-02-21T10:15:10.5734009Z // begin inline asm 2026-02-21T10:15:10.5734133Z st.global.v4.b32 [ 
%rd112 + 0 ], { %r2210, %r2211, %r2212, %r2213 }; 2026-02-21T10:15:10.5734255Z // end inline asm 2026-02-21T10:15:10.5734318Z // begin inline asm 2026-02-21T10:15:10.5734440Z st.global.v4.b32 [ %rd113 + 0 ], { %r2214, %r2215, %r2216, %r2217 }; 2026-02-21T10:15:10.5734500Z // end inline asm 2026-02-21T10:15:10.5734560Z // begin inline asm 2026-02-21T10:15:10.5734676Z st.global.v4.b32 [ %rd114 + 0 ], { %r2218, %r2219, %r2220, %r2221 }; 2026-02-21T10:15:10.5734739Z // end inline asm 2026-02-21T10:15:10.5734799Z // begin inline asm 2026-02-21T10:15:10.5734968Z st.global.v4.b32 [ %rd115 + 0 ], { %r2222, %r2223, %r2224, %r2225 }; 2026-02-21T10:15:10.5735037Z // end inline asm 2026-02-21T10:15:10.5735099Z // begin inline asm 2026-02-21T10:15:10.5735215Z st.global.v4.b32 [ %rd116 + 0 ], { %r2226, %r2227, %r2228, %r2229 }; 2026-02-21T10:15:10.5735281Z // end inline asm 2026-02-21T10:15:10.5735340Z // begin inline asm 2026-02-21T10:15:10.5735452Z st.global.v4.b32 [ %rd117 + 0 ], { %r2230, %r2231, %r2232, %r2233 }; 2026-02-21T10:15:10.5735511Z // end inline asm 2026-02-21T10:15:10.5735578Z // begin inline asm 2026-02-21T10:15:10.5735694Z st.global.v4.b32 [ %rd118 + 0 ], { %r2234, %r2235, %r2236, %r2237 }; 2026-02-21T10:15:10.5735754Z // end inline asm 2026-02-21T10:15:10.5735821Z // begin inline asm 2026-02-21T10:15:10.5735934Z st.global.v4.b32 [ %rd119 + 0 ], { %r2238, %r2239, %r2240, %r2241 }; 2026-02-21T10:15:10.5735992Z // end inline asm 2026-02-21T10:15:10.5736052Z // begin inline asm 2026-02-21T10:15:10.5736176Z st.global.v4.b32 [ %rd120 + 0 ], { %r2242, %r2243, %r2244, %r2245 }; 2026-02-21T10:15:10.5736238Z // end inline asm 2026-02-21T10:15:10.5736298Z // begin inline asm 2026-02-21T10:15:10.5736425Z st.global.v4.b32 [ %rd121 + 0 ], { %r2246, %r2247, %r2248, %r2249 }; 2026-02-21T10:15:10.5736638Z // end inline asm 2026-02-21T10:15:10.5736705Z // begin inline asm 2026-02-21T10:15:10.5736829Z st.global.v4.b32 [ %rd122 + 0 ], { %r2250, %r2251, %r2252, %r2253 }; 
2026-02-21T10:15:10.5736888Z // end inline asm 2026-02-21T10:15:10.5736949Z // begin inline asm 2026-02-21T10:15:10.5737066Z st.global.v4.b32 [ %rd123 + 0 ], { %r2254, %r2255, %r2256, %r2257 }; 2026-02-21T10:15:10.5737129Z // end inline asm 2026-02-21T10:15:10.5737188Z // begin inline asm 2026-02-21T10:15:10.5737305Z st.global.v4.b32 [ %rd124 + 0 ], { %r2258, %r2259, %r2260, %r2261 }; 2026-02-21T10:15:10.5737370Z // end inline asm 2026-02-21T10:15:10.5737439Z mov.b32 %r2384, 0f00000000; 2026-02-21T10:15:10.5737505Z mov.b32 %r2385, %r2384; 2026-02-21T10:15:10.5737567Z mov.b32 %r2386, %r2384; 2026-02-21T10:15:10.5737634Z mov.b32 %r2387, %r2384; 2026-02-21T10:15:10.5737707Z mov.b32 %r2388, %r2384; 2026-02-21T10:15:10.5737771Z mov.b32 %r2389, %r2384; 2026-02-21T10:15:10.5737838Z mov.b32 %r2390, %r2384; 2026-02-21T10:15:10.5737900Z mov.b32 %r2391, %r2384; 2026-02-21T10:15:10.5737961Z mov.b32 %r2392, %r2384; 2026-02-21T10:15:10.5738020Z mov.b32 %r2393, %r2384; 2026-02-21T10:15:10.5738085Z mov.b32 %r2394, %r2384; 2026-02-21T10:15:10.5738144Z mov.b32 %r2395, %r2384; 2026-02-21T10:15:10.5738204Z mov.b32 %r2396, %r2384; 2026-02-21T10:15:10.5738272Z mov.b32 %r2397, %r2384; 2026-02-21T10:15:10.5738332Z mov.b32 %r2398, %r2384; 2026-02-21T10:15:10.5738390Z mov.b32 %r2399, %r2384; 2026-02-21T10:15:10.5738450Z mov.b32 %r2400, %r2384; 2026-02-21T10:15:10.5738515Z mov.b32 %r2401, %r2384; 2026-02-21T10:15:10.5738573Z mov.b32 %r2402, %r2384; 2026-02-21T10:15:10.5738631Z mov.b32 %r2403, %r2384; 2026-02-21T10:15:10.5738696Z mov.b32 %r2404, %r2384; 2026-02-21T10:15:10.5738756Z mov.b32 %r2405, %r2384; 2026-02-21T10:15:10.5738814Z mov.b32 %r2406, %r2384; 2026-02-21T10:15:10.5738880Z mov.b32 %r2407, %r2384; 2026-02-21T10:15:10.5739037Z mov.b32 %r2408, %r2384; 2026-02-21T10:15:10.5739164Z mov.b32 %r2409, %r2384; 2026-02-21T10:15:10.5739223Z mov.b32 %r2410, %r2384; 2026-02-21T10:15:10.5739288Z mov.b32 %r2411, %r2384; 2026-02-21T10:15:10.5739349Z mov.b32 %r2412, %r2384; 
2026-02-21T10:15:10.5739408Z mov.b32 %r2413, %r2384; 2026-02-21T10:15:10.5739473Z mov.b32 %r2414, %r2384; 2026-02-21T10:15:10.5739604Z mov.b32 %r2415, %r2384; 2026-02-21T10:15:10.5739667Z mov.b32 %r2416, %r2384; 2026-02-21T10:15:10.5739725Z mov.b32 %r2417, %r2384; 2026-02-21T10:15:10.5739790Z mov.b32 %r2418, %r2384; 2026-02-21T10:15:10.5739849Z mov.b32 %r2419, %r2384; 2026-02-21T10:15:10.5739907Z mov.b32 %r2420, %r2384; 2026-02-21T10:15:10.5739972Z mov.b32 %r2421, %r2384; 2026-02-21T10:15:10.5740031Z mov.b32 %r2422, %r2384; 2026-02-21T10:15:10.5740150Z mov.b32 %r2423, %r2384; 2026-02-21T10:15:10.5740211Z mov.b32 %r2424, %r2384; 2026-02-21T10:15:10.5740276Z mov.b32 %r2425, %r2384; 2026-02-21T10:15:10.5740337Z mov.b32 %r2426, %r2384; 2026-02-21T10:15:10.5740396Z mov.b32 %r2427, %r2384; 2026-02-21T10:15:10.5740464Z mov.b32 %r2428, %r2384; 2026-02-21T10:15:10.5740523Z mov.b32 %r2429, %r2384; 2026-02-21T10:15:10.5740581Z mov.b32 %r2430, %r2384; 2026-02-21T10:15:10.5740640Z mov.b32 %r2431, %r2384; 2026-02-21T10:15:10.5740705Z mov.b32 %r2432, %r2384; 2026-02-21T10:15:10.5740767Z mov.b32 %r2433, %r2384; 2026-02-21T10:15:10.5740830Z mov.b32 %r2434, %r2384; 2026-02-21T10:15:10.5740896Z mov.b32 %r2435, %r2384; 2026-02-21T10:15:10.5740956Z mov.b32 %r2436, %r2384; 2026-02-21T10:15:10.5741017Z mov.b32 %r2437, %r2384; 2026-02-21T10:15:10.5741076Z mov.b32 %r2438, %r2384; 2026-02-21T10:15:10.5741144Z mov.b32 %r2439, %r2384; 2026-02-21T10:15:10.5741204Z mov.b32 %r2440, %r2384; 2026-02-21T10:15:10.5741262Z mov.b32 %r2441, %r2384; 2026-02-21T10:15:10.5741329Z mov.b32 %r2442, %r2384; 2026-02-21T10:15:10.5741390Z mov.b32 %r2443, %r2384; 2026-02-21T10:15:10.5741450Z mov.b32 %r2444, %r2384; 2026-02-21T10:15:10.5741509Z mov.b32 %r2445, %r2384; 2026-02-21T10:15:10.5741575Z mov.b32 %r2446, %r2384; 2026-02-21T10:15:10.5741725Z mov.b32 %r2447, %r2384; 2026-02-21T10:15:10.5741792Z mov.b32 %r2448, %r2384; 2026-02-21T10:15:10.5741862Z mov.b32 %r2449, %r2384; 2026-02-21T10:15:10.5741922Z mov.b32 
%r2450, %r2384; 2026-02-21T10:15:10.5741981Z mov.b32 %r2451, %r2384; 2026-02-21T10:15:10.5742048Z mov.b32 %r2452, %r2384; 2026-02-21T10:15:10.5742110Z mov.b32 %r2453, %r2384; 2026-02-21T10:15:10.5742170Z mov.b32 %r2454, %r2384; 2026-02-21T10:15:10.5742230Z mov.b32 %r2455, %r2384; 2026-02-21T10:15:10.5742297Z mov.b32 %r2456, %r2384; 2026-02-21T10:15:10.5742359Z mov.b32 %r2457, %r2384; 2026-02-21T10:15:10.5742418Z mov.b32 %r2458, %r2384; 2026-02-21T10:15:10.5742484Z mov.b32 %r2459, %r2384; 2026-02-21T10:15:10.5742543Z mov.b32 %r2460, %r2384; 2026-02-21T10:15:10.5742604Z mov.b32 %r2461, %r2384; 2026-02-21T10:15:10.5742678Z mov.b32 %r2462, %r2384; 2026-02-21T10:15:10.5742748Z mov.b32 %r2463, %r2384; 2026-02-21T10:15:10.5742809Z mov.b32 %r2464, %r2384; 2026-02-21T10:15:10.5742874Z mov.b32 %r2465, %r2384; 2026-02-21T10:15:10.5742939Z mov.b32 %r2466, %r2384; 2026-02-21T10:15:10.5742999Z mov.b32 %r2467, %r2384; 2026-02-21T10:15:10.5743057Z mov.b32 %r2468, %r2384; 2026-02-21T10:15:10.5743115Z mov.b32 %r2469, %r2384; 2026-02-21T10:15:10.5743178Z mov.b32 %r2470, %r2384; 2026-02-21T10:15:10.5743247Z mov.b32 %r2471, %r2384; 2026-02-21T10:15:10.5743304Z mov.b32 %r2472, %r2384; 2026-02-21T10:15:10.5743362Z mov.b32 %r2473, %r2384; 2026-02-21T10:15:10.5743426Z mov.b32 %r2474, %r2384; 2026-02-21T10:15:10.5743485Z mov.b32 %r2475, %r2384; 2026-02-21T10:15:10.5743543Z mov.b32 %r2476, %r2384; 2026-02-21T10:15:10.5743601Z mov.b32 %r2477, %r2384; 2026-02-21T10:15:10.5743666Z mov.b32 %r2478, %r2384; 2026-02-21T10:15:10.5743726Z mov.b32 %r2479, %r2384; 2026-02-21T10:15:10.5743786Z mov.b32 %r2480, %r2384; 2026-02-21T10:15:10.5743848Z mov.b32 %r2481, %r2384; 2026-02-21T10:15:10.5743972Z mov.b32 %r2482, %r2384; 2026-02-21T10:15:10.5744030Z mov.b32 %r2483, %r2384; 2026-02-21T10:15:10.5744134Z mov.b32 %r2484, %r2384; 2026-02-21T10:15:10.5744197Z mov.b32 %r2485, %r2384; 2026-02-21T10:15:10.5744256Z mov.b32 %r2486, %r2384; 2026-02-21T10:15:10.5744315Z mov.b32 %r2487, %r2384; 
2026-02-21T10:15:10.5744378Z mov.b32 %r2488, %r2384; 2026-02-21T10:15:10.5744485Z mov.b32 %r2489, %r2384; 2026-02-21T10:15:10.5744547Z mov.b32 %r2490, %r2384; 2026-02-21T10:15:10.5744605Z mov.b32 %r2491, %r2384; 2026-02-21T10:15:10.5744672Z mov.b32 %r2492, %r2384; 2026-02-21T10:15:10.5744729Z mov.b32 %r2493, %r2384; 2026-02-21T10:15:10.5744788Z mov.b32 %r2494, %r2384; 2026-02-21T10:15:10.5744853Z mov.b32 %r2495, %r2384; 2026-02-21T10:15:10.5744912Z mov.b32 %r2496, %r2384; 2026-02-21T10:15:10.5744970Z mov.b32 %r2497, %r2384; 2026-02-21T10:15:10.5745081Z mov.b32 %r2498, %r2384; 2026-02-21T10:15:10.5745149Z mov.b32 %r2499, %r2384; 2026-02-21T10:15:10.5745210Z mov.b32 %r2500, %r2384; 2026-02-21T10:15:10.5745282Z mov.b32 %r2501, %r2384; 2026-02-21T10:15:10.5745351Z mov.b32 %r2502, %r2384; 2026-02-21T10:15:10.5745410Z mov.b32 %r2503, %r2384; 2026-02-21T10:15:10.5745469Z mov.b32 %r2504, %r2384; 2026-02-21T10:15:10.5745528Z mov.b32 %r2505, %r2384; 2026-02-21T10:15:10.5745591Z mov.b32 %r2506, %r2384; 2026-02-21T10:15:10.5745651Z mov.b32 %r2507, %r2384; 2026-02-21T10:15:10.5745711Z mov.b32 %r2508, %r2384; 2026-02-21T10:15:10.5745776Z mov.b32 %r2509, %r2384; 2026-02-21T10:15:10.5745836Z mov.b32 %r2510, %r2384; 2026-02-21T10:15:10.5745894Z mov.b32 %r2511, %r2384; 2026-02-21T10:15:10.5745954Z bra.uni $L__BB0_6; 2026-02-21T10:15:10.5746056Z $L__BB0_7: // %._crit_edge 2026-02-21T10:15:10.5746281Z .loc 1 26 90 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:26:90 2026-02-21T10:15:10.5746354Z cp.async.wait_group 0; 2026-02-21T10:15:10.5746418Z bar.sync 0; 2026-02-21T10:15:10.5746589Z // begin inline asm 2026-02-21T10:15:10.5746707Z @%p61 mbarrier.inval.shared::cta.b64 [%r512]; 2026-02-21T10:15:10.5746774Z // end inline asm 2026-02-21T10:15:10.5746830Z bar.sync 0; 2026-02-21T10:15:10.5746890Z // begin inline asm 2026-02-21T10:15:10.5746986Z @%p61 mbarrier.inval.shared::cta.b64 [%r513]; 2026-02-21T10:15:10.5747049Z // end inline asm 2026-02-21T10:15:10.5747105Z bar.sync 0; 
2026-02-21T10:15:10.5747165Z // begin inline asm 2026-02-21T10:15:10.5747260Z @%p61 mbarrier.inval.shared::cta.b64 [%r514]; 2026-02-21T10:15:10.5747317Z // end inline asm 2026-02-21T10:15:10.5747532Z .loc 1 26 4 // cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py:26:4 2026-02-21T10:15:10.5747586Z ret; 2026-02-21T10:15:10.5747647Z $L__tmp3: 2026-02-21T10:15:10.5747704Z $L__func_end0: 2026-02-21T10:15:10.5747798Z // -- End function 2026-02-21T10:15:10.5747872Z } 2026-02-21T10:15:10.5748128Z .file 1 "/tmp/torchinductor_root/jt/cjtqx5orxxwrfgt66i7rw3rc3zbkzpuna6shslaks2lb6w2ae6bm.py" 2026-02-21T10:15:10.5748345Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T10:15:10.5748497Z .section .debug_abbrev 2026-02-21T10:15:10.5748554Z { 2026-02-21T10:15:10.5748651Z .b8 1 // Abbreviation Code 2026-02-21T10:15:10.5748750Z .b8 17 // DW_TAG_compile_unit 2026-02-21T10:15:10.5748840Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:15:10.5748926Z .b8 37 // DW_AT_producer 2026-02-21T10:15:10.5749004Z .b8 8 // DW_FORM_string 2026-02-21T10:15:10.5749095Z .b8 19 // DW_AT_language 2026-02-21T10:15:10.5749177Z .b8 5 // DW_FORM_data2 2026-02-21T10:15:10.5749258Z .b8 3 // DW_AT_name 2026-02-21T10:15:10.5749341Z .b8 8 // DW_FORM_string 2026-02-21T10:15:10.5749513Z .b8 16 // DW_AT_stmt_list 2026-02-21T10:15:10.5749681Z .b8 6 // DW_FORM_data4 2026-02-21T10:15:10.5749762Z .b8 27 // DW_AT_comp_dir 2026-02-21T10:15:10.5749849Z .b8 8 // DW_FORM_string 2026-02-21T10:15:10.5749994Z .b8 0 // EOM(1) 2026-02-21T10:15:10.5750070Z .b8 0 // EOM(2) 2026-02-21T10:15:10.5750165Z .b8 2 // Abbreviation Code 2026-02-21T10:15:10.5750252Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:15:10.5750333Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:15:10.5750413Z .b8 3 // DW_AT_name 2026-02-21T10:15:10.5750551Z .b8 8 // DW_FORM_string 2026-02-21T10:15:10.5750635Z .b8 32 // DW_AT_inline 2026-02-21T10:15:10.5750724Z .b8 11 // DW_FORM_data1 2026-02-21T10:15:10.5750798Z .b8 0 // 
EOM(1) 2026-02-21T10:15:10.5750868Z .b8 0 // EOM(2) 2026-02-21T10:15:10.5750964Z .b8 3 // Abbreviation Code 2026-02-21T10:15:10.5751051Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:15:10.5751134Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:15:10.5751219Z .b8 17 // DW_AT_low_pc 2026-02-21T10:15:10.5751295Z .b8 1 // DW_FORM_addr 2026-02-21T10:15:10.5751380Z .b8 18 // DW_AT_high_pc 2026-02-21T10:15:10.5751463Z .b8 1 // DW_FORM_addr 2026-02-21T10:15:10.5751557Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:15:10.5751635Z .b8 19 // DW_FORM_ref4 2026-02-21T10:15:10.5751712Z .b8 0 // EOM(1) 2026-02-21T10:15:10.5751782Z .b8 0 // EOM(2) 2026-02-21T10:15:10.5751869Z .b8 4 // Abbreviation Code 2026-02-21T10:15:10.5751972Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T10:15:10.5752058Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:15:10.5752149Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:15:10.5752227Z .b8 19 // DW_FORM_ref4 2026-02-21T10:15:10.5752312Z .b8 17 // DW_AT_low_pc 2026-02-21T10:15:10.5752387Z .b8 1 // DW_FORM_addr 2026-02-21T10:15:10.5752468Z .b8 18 // DW_AT_high_pc 2026-02-21T10:15:10.5752548Z .b8 1 // DW_FORM_addr 2026-02-21T10:15:10.5752629Z .b8 88 // DW_AT_call_file 2026-02-21T10:15:10.5752709Z .b8 11 // DW_FORM_data1 2026-02-21T10:15:10.5752790Z .b8 89 // DW_AT_call_line 2026-02-21T10:15:10.5752877Z .b8 11 // DW_FORM_data1 2026-02-21T10:15:10.5752963Z .b8 87 // DW_AT_call_column 2026-02-21T10:15:10.5753042Z .b8 11 // DW_FORM_data1 2026-02-21T10:15:10.5753120Z .b8 0 // EOM(1) 2026-02-21T10:15:10.5753189Z .b8 0 // EOM(2) 2026-02-21T10:15:10.5753257Z .b8 0 // EOM(3) 2026-02-21T10:15:10.5753318Z } 2026-02-21T10:15:10.5753381Z .section .debug_info 2026-02-21T10:15:10.5753431Z { 2026-02-21T10:15:10.5753520Z .b32 178 // Length of Unit 2026-02-21T10:15:10.5753619Z .b8 2 // DWARF version number 2026-02-21T10:15:10.5753743Z .b8 0 2026-02-21T10:15:10.5753878Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T10:15:10.5754029Z .b8 8 // Address Size (in bytes) 2026-02-21T10:15:10.5754145Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T10:15:10.5754230Z .b8 116 // DW_AT_producer 2026-02-21T10:15:10.5754336Z .b8 114 2026-02-21T10:15:10.5754391Z .b8 105 2026-02-21T10:15:10.5754442Z .b8 116 2026-02-21T10:15:10.5754492Z .b8 111 2026-02-21T10:15:10.5754552Z .b8 110 2026-02-21T10:15:10.5754603Z .b8 0 2026-02-21T10:15:10.5754682Z .b8 2 // DW_AT_language 2026-02-21T10:15:10.5754739Z .b8 0 2026-02-21T10:15:10.5754816Z .b8 99 // DW_AT_name 2026-02-21T10:15:10.5754869Z .b8 106 2026-02-21T10:15:10.5754970Z .b8 116 2026-02-21T10:15:10.5755028Z .b8 113 2026-02-21T10:15:10.5755079Z .b8 120 2026-02-21T10:15:10.5755135Z .b8 53 2026-02-21T10:15:10.5755193Z .b8 111 2026-02-21T10:15:10.5755244Z .b8 114 2026-02-21T10:15:10.5755297Z .b8 120 2026-02-21T10:15:10.5755348Z .b8 120 2026-02-21T10:15:10.5755408Z .b8 119 2026-02-21T10:15:10.5755460Z .b8 114 2026-02-21T10:15:10.5755512Z .b8 102 2026-02-21T10:15:10.5755563Z .b8 103 2026-02-21T10:15:10.5755620Z .b8 116 2026-02-21T10:15:10.5755672Z .b8 54 2026-02-21T10:15:10.5755723Z .b8 54 2026-02-21T10:15:10.5755781Z .b8 105 2026-02-21T10:15:10.5755832Z .b8 55 2026-02-21T10:15:10.5755884Z .b8 114 2026-02-21T10:15:10.5755935Z .b8 119 2026-02-21T10:15:10.5755992Z .b8 51 2026-02-21T10:15:10.5756044Z .b8 114 2026-02-21T10:15:10.5756095Z .b8 99 2026-02-21T10:15:10.5756152Z .b8 51 2026-02-21T10:15:10.5756203Z .b8 122 2026-02-21T10:15:10.5756254Z .b8 98 2026-02-21T10:15:10.5756305Z .b8 107 2026-02-21T10:15:10.5756362Z .b8 122 2026-02-21T10:15:10.5756415Z .b8 112 2026-02-21T10:15:10.5756609Z .b8 117 2026-02-21T10:15:10.5756669Z .b8 110 2026-02-21T10:15:10.5756726Z .b8 97 2026-02-21T10:15:10.5756778Z .b8 54 2026-02-21T10:15:10.5756831Z .b8 115 2026-02-21T10:15:10.5756888Z .b8 104 2026-02-21T10:15:10.5756944Z .b8 115 2026-02-21T10:15:10.5756996Z .b8 108 2026-02-21T10:15:10.5757046Z .b8 97 2026-02-21T10:15:10.5757107Z .b8 107 
2026-02-21T10:15:10.5757159Z .b8 115 2026-02-21T10:15:10.5757210Z .b8 50 2026-02-21T10:15:10.5757268Z .b8 108 2026-02-21T10:15:10.5757320Z .b8 98 2026-02-21T10:15:10.5757371Z .b8 54 2026-02-21T10:15:10.5757424Z .b8 119 2026-02-21T10:15:10.5757481Z .b8 50 2026-02-21T10:15:10.5757533Z .b8 97 2026-02-21T10:15:10.5757585Z .b8 101 2026-02-21T10:15:10.5757635Z .b8 54 2026-02-21T10:15:10.5757692Z .b8 98 2026-02-21T10:15:10.5757745Z .b8 109 2026-02-21T10:15:10.5757796Z .b8 46 2026-02-21T10:15:10.5757855Z .b8 112 2026-02-21T10:15:10.5757925Z .b8 121 2026-02-21T10:15:10.5757978Z .b8 0 2026-02-21T10:15:10.5758080Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T10:15:10.5758170Z .b8 47 // DW_AT_comp_dir 2026-02-21T10:15:10.5758223Z .b8 116 2026-02-21T10:15:10.5758278Z .b8 109 2026-02-21T10:15:10.5758336Z .b8 112 2026-02-21T10:15:10.5758388Z .b8 47 2026-02-21T10:15:10.5758441Z .b8 116 2026-02-21T10:15:10.5758497Z .b8 111 2026-02-21T10:15:10.5758555Z .b8 114 2026-02-21T10:15:10.5758607Z .b8 99 2026-02-21T10:15:10.5758660Z .b8 104 2026-02-21T10:15:10.5758718Z .b8 105 2026-02-21T10:15:10.5758770Z .b8 110 2026-02-21T10:15:10.5758823Z .b8 100 2026-02-21T10:15:10.5758876Z .b8 117 2026-02-21T10:15:10.5758934Z .b8 99 2026-02-21T10:15:10.5758987Z .b8 116 2026-02-21T10:15:10.5759038Z .b8 111 2026-02-21T10:15:10.5759096Z .b8 114 2026-02-21T10:15:10.5759147Z .b8 95 2026-02-21T10:15:10.5759198Z .b8 114 2026-02-21T10:15:10.5759252Z .b8 111 2026-02-21T10:15:10.5759309Z .b8 111 2026-02-21T10:15:10.5759361Z .b8 116 2026-02-21T10:15:10.5759410Z .b8 47 2026-02-21T10:15:10.5759461Z .b8 106 2026-02-21T10:15:10.5759517Z .b8 116 2026-02-21T10:15:10.5759569Z .b8 0 2026-02-21T10:15:10.5759688Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T10:15:10.5759854Z .b8 95 // DW_AT_name 2026-02-21T10:15:10.5759970Z .b8 104 2026-02-21T10:15:10.5760020Z .b8 101 2026-02-21T10:15:10.5760072Z .b8 108 2026-02-21T10:15:10.5760132Z .b8 105 2026-02-21T10:15:10.5760196Z .b8 111 2026-02-21T10:15:10.5760249Z 
.b8 110 2026-02-21T10:15:10.5760306Z .b8 95 2026-02-21T10:15:10.5760357Z .b8 109 2026-02-21T10:15:10.5760409Z .b8 97 2026-02-21T10:15:10.5760529Z .b8 116 2026-02-21T10:15:10.5760589Z .b8 109 2026-02-21T10:15:10.5760640Z .b8 117 2026-02-21T10:15:10.5760690Z .b8 108 2026-02-21T10:15:10.5760745Z .b8 95 2026-02-21T10:15:10.5760796Z .b8 98 2026-02-21T10:15:10.5760846Z .b8 102 2026-02-21T10:15:10.5760896Z .b8 49 2026-02-21T10:15:10.5760952Z .b8 54 2026-02-21T10:15:10.5761002Z .b8 95 2026-02-21T10:15:10.5761054Z .b8 105 2026-02-21T10:15:10.5761110Z .b8 110 2026-02-21T10:15:10.5761161Z .b8 116 2026-02-21T10:15:10.5761275Z .b8 52 2026-02-21T10:15:10.5761329Z .b8 0 2026-02-21T10:15:10.5761419Z .b8 1 // DW_AT_inline 2026-02-21T10:15:10.5761529Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T10:15:10.5761625Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T10:15:10.5761724Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T10:15:10.5761825Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:15:10.5761955Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T10:15:10.5762052Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:15:10.5762148Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T10:15:10.5762236Z .b64 $L__tmp2 // DW_AT_high_pc 2026-02-21T10:15:10.5762320Z .b8 1 // DW_AT_call_file 2026-02-21T10:15:10.5762422Z .b8 90 // DW_AT_call_line 2026-02-21T10:15:10.5762509Z .b8 40 // DW_AT_call_column 2026-02-21T10:15:10.5762601Z .b8 0 // End Of Children Mark 2026-02-21T10:15:10.5762695Z .b8 0 // End Of Children Mark 2026-02-21T10:15:10.5762747Z } 2026-02-21T10:15:10.5762819Z .section .debug_macinfo { } 2026-02-21T10:15:10.5762826Z 2026-02-21T10:15:10.5762916Z ================================================================ 2026-02-21T10:15:10.5763034Z please share the reproducer above with Triton project. 
2026-02-21T10:15:14.3941662Z 2026-02-21T10:15:14.3941679Z 2026-02-21T10:15:14.3941684Z 2026-02-21T10:15:14.3942077Z ================================================================ 2026-02-21T10:15:14.3942468Z Internal Triton PTX codegen error 2026-02-21T10:15:14.3942740Z `ptxas` stderr: 2026-02-21T10:15:14.3943505Z ptxas fatal : (C7602) Insufficient registers (64) to compile instruction at line 1181 in function _helion_matmul_bf16_int4. Try to compile with register target of 158 or higher. 2026-02-21T10:15:14.3944351Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:15:14.3944594Z 2026-02-21T10:15:14.3945264Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmporhgp67d.ptx -o /tmp/tmporhgp67d.ptx.o 2026-02-21T10:15:14.3946044Z 2026-02-21T10:15:14.3946050Z 2026-02-21T10:15:14.3946124Z // 2026-02-21T10:15:14.3946322Z // Generated by LLVM NVPTX Back-End 2026-02-21T10:15:14.3946834Z // 2026-02-21T10:15:14.3946931Z 2026-02-21T10:15:14.3947012Z .version 8.7 2026-02-21T10:15:14.3947197Z .target sm_90a 2026-02-21T10:15:14.3947382Z .address_size 64 2026-02-21T10:15:14.3947500Z 2026-02-21T10:15:14.3947731Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T10:15:14.3948189Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T10:15:14.3948655Z // @_helion_matmul_bf16_int4 2026-02-21T10:15:14.3948991Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T10:15:14.3949674Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T10:15:14.3950238Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T10:15:14.3950677Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T10:15:14.3951106Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T10:15:14.3951700Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 
2026-02-21T10:15:14.3952085Z ) 2026-02-21T10:15:14.3952302Z .reqntid 128 2026-02-21T10:15:14.3952537Z .maxnreg 64 2026-02-21T10:15:14.3952742Z { 2026-02-21T10:15:14.3952925Z .reg .pred %p<23>; 2026-02-21T10:15:14.3953141Z .reg .b16 %rs<385>; 2026-02-21T10:15:14.3953355Z .reg .b32 %r<6947>; 2026-02-21T10:15:14.3953555Z .reg .b64 %rd<160>; 2026-02-21T10:15:14.3953761Z $L__func_begin0: 2026-02-21T10:15:14.3954040Z 2026-02-21T10:15:14.3954121Z // %bb.0: 2026-02-21T10:15:14.3954542Z .loc 1 20 30 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:20:30 2026-02-21T10:15:14.3955070Z mov.u32 %r646, %ctaid.x; 2026-02-21T10:15:14.3955522Z .loc 1 20 35 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:20:35 2026-02-21T10:15:14.3956024Z mul.lo.s32 %r6688, %r646, 20; 2026-02-21T10:15:14.3956709Z .loc 1 21 37 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:21:37 2026-02-21T10:15:14.3957220Z add.s32 %r647, %r6688, 20; 2026-02-21T10:15:14.3957669Z .loc 1 21 49 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:21:49 2026-02-21T10:15:14.3958169Z min.s32 %r2, %r647, 2560; 2026-02-21T10:15:14.3958602Z .loc 1 22 52 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:22:52 2026-02-21T10:15:14.3959102Z setp.ge.s32 %p1, %r6688, %r2; 2026-02-21T10:15:14.3959350Z @%p1 bra $L__BB0_5; 2026-02-21T10:15:14.3959584Z // %bb.1: // %.lr.ph 2026-02-21T10:15:14.3960091Z .loc 1 0 52 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:0:52 2026-02-21T10:15:14.3960651Z ld.param.b64 %rd17, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T10:15:14.3961051Z ld.param.b64 %rd16, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T10:15:14.3961437Z ld.param.b64 %rd15, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T10:15:14.3961768Z mov.u32 %r3, %tid.x; 2026-02-21T10:15:14.3961981Z shr.u32 %r4, %r3, 5; 2026-02-21T10:15:14.3962188Z and.b32 %r5, %r3, 120; 2026-02-21T10:15:14.3962417Z bfe.u32 %r6, %r3, 3, 4; 2026-02-21T10:15:14.3962636Z and.b32 %r7, %r3, 96; 
2026-02-21T10:15:14.3962842Z bfe.u32 %r8, %r3, 5, 2; 2026-02-21T10:15:14.3963003Z or.b32 %r9, %r8, 4; 2026-02-21T10:15:14.3963161Z or.b32 %r10, %r8, 8; 2026-02-21T10:15:14.3963311Z or.b32 %r11, %r8, 12; 2026-02-21T10:15:14.3963472Z or.b32 %r12, %r8, 16; 2026-02-21T10:15:14.3963623Z or.b32 %r13, %r8, 20; 2026-02-21T10:15:14.3963783Z or.b32 %r14, %r8, 24; 2026-02-21T10:15:14.3963951Z or.b32 %r15, %r8, 28; 2026-02-21T10:15:14.3964103Z or.b32 %r16, %r8, 32; 2026-02-21T10:15:14.3964279Z or.b32 %r17, %r8, 36; 2026-02-21T10:15:14.3964434Z or.b32 %r18, %r8, 40; 2026-02-21T10:15:14.3964599Z or.b32 %r19, %r8, 44; 2026-02-21T10:15:14.3964748Z or.b32 %r20, %r8, 48; 2026-02-21T10:15:14.3964909Z or.b32 %r21, %r8, 52; 2026-02-21T10:15:14.3965064Z or.b32 %r22, %r8, 56; 2026-02-21T10:15:14.3965230Z or.b32 %r23, %r8, 60; 2026-02-21T10:15:14.3965385Z or.b32 %r24, %r8, 64; 2026-02-21T10:15:14.3965551Z or.b32 %r25, %r8, 68; 2026-02-21T10:15:14.3965708Z or.b32 %r26, %r8, 72; 2026-02-21T10:15:14.3965872Z or.b32 %r27, %r8, 76; 2026-02-21T10:15:14.3966032Z or.b32 %r28, %r8, 80; 2026-02-21T10:15:14.3966187Z or.b32 %r29, %r8, 84; 2026-02-21T10:15:14.3966347Z or.b32 %r30, %r8, 88; 2026-02-21T10:15:14.3966650Z or.b32 %r31, %r8, 92; 2026-02-21T10:15:14.3966820Z or.b32 %r32, %r8, 96; 2026-02-21T10:15:14.3966974Z or.b32 %r33, %r8, 100; 2026-02-21T10:15:14.3967272Z or.b32 %r34, %r8, 104; 2026-02-21T10:15:14.3967435Z or.b32 %r35, %r8, 108; 2026-02-21T10:15:14.3967678Z or.b32 %r36, %r8, 112; 2026-02-21T10:15:14.3967836Z or.b32 %r37, %r8, 116; 2026-02-21T10:15:14.3967997Z or.b32 %r38, %r8, 120; 2026-02-21T10:15:14.3968157Z or.b32 %r39, %r8, 124; 2026-02-21T10:15:14.3968313Z shl.b32 %r40, %r3, 4; 2026-02-21T10:15:14.3968566Z and.b32 %r41, %r40, 240; 2026-02-21T10:15:14.3968752Z shl.b32 %r648, %r3, 3; 2026-02-21T10:15:14.3968924Z and.b32 %r42, %r648, 248; 2026-02-21T10:15:14.3969094Z and.b32 %r43, %r3, 7; 2026-02-21T10:15:14.3969256Z shl.b32 %r44, %r43, 2; 2026-02-21T10:15:14.3969579Z .loc 1 38 48 // 
cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:38:48 2026-02-21T10:15:14.3969955Z shr.u32 %r649, %r3, 4; 2026-02-21T10:15:14.3970132Z bfe.u32 %r650, %r3, 4, 3; 2026-02-21T10:15:14.3970388Z and.b32 %r45, %r3, 127; 2026-02-21T10:15:14.3970571Z shl.b32 %r651, %r45, 3; 2026-02-21T10:15:14.3970733Z shr.u32 %r652, %r3, 1; 2026-02-21T10:15:14.3970902Z and.b32 %r653, %r652, 24; 2026-02-21T10:15:14.3971074Z xor.b32 %r46, %r651, %r653; 2026-02-21T10:15:14.3971261Z mov.b32 %r3401, global_smem; 2026-02-21T10:15:14.3971439Z add.s32 %r655, %r3401, %r46; 2026-02-21T10:15:14.3971616Z add.s32 %r705, %r655, 32768; 2026-02-21T10:15:14.3971785Z add.s32 %r707, %r655, 33792; 2026-02-21T10:15:14.3971963Z add.s32 %r709, %r655, 34816; 2026-02-21T10:15:14.3972142Z add.s32 %r711, %r655, 35840; 2026-02-21T10:15:14.3972320Z add.s32 %r713, %r655, 36864; 2026-02-21T10:15:14.3972495Z add.s32 %r715, %r655, 37888; 2026-02-21T10:15:14.3972670Z add.s32 %r717, %r655, 38912; 2026-02-21T10:15:14.3972847Z add.s32 %r719, %r655, 39936; 2026-02-21T10:15:14.3973021Z mul.lo.s32 %r55, %r650, 1280; 2026-02-21T10:15:14.3973206Z add.s32 %r56, %r55, 10240; 2026-02-21T10:15:14.3973826Z [2907s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T10:15:14.3975369Z Config: @helion.kernel(config=helion.Config(block_sizes=[16, 128, 256], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[1], load_eviction_policies=['', 'first'], loop_orders=[[0, 1]], maxnreg=64, num_sm_multiplier=1, num_stages=3, num_warps=4, pid_type='persistent_blocked', range_flattens=[False, True], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 2], range_warp_specializes=[]), static_shapes=True) 2026-02-21T10:15:14.3977008Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T10:15:14.3977297Z `ptxas` stderr: 2026-02-21T10:15:14.3977853Z ptxas fatal : (C7602) Insufficient registers (64) to compile instruction at line 1181 in function _helion_matmul_bf16_int4. Try to compile with register target of 158 or higher. 2026-02-21T10:15:14.3978509Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:15:14.3978691Z 2026-02-21T10:15:14.3979200Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmporhgp67d.ptx -o /tmp/tmporhgp67d.ptx.o 2026-02-21T10:15:14.3979782Z 2026-02-21T10:15:14.3979936Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T10:15:14.3980240Z shl.b32 %r57, %r45, 4; 2026-02-21T10:15:14.3980412Z add.s32 %r656, %r3401, %r57; 2026-02-21T10:15:14.3980599Z add.s32 %r721, %r656, 65536; 2026-02-21T10:15:14.3980773Z add.s32 %r723, %r656, 67584; 2026-02-21T10:15:14.3980953Z or.b32 %r60, %r44, 32; 2026-02-21T10:15:14.3981118Z add.s32 %r725, %r655, 49152; 2026-02-21T10:15:14.3981297Z add.s32 %r727, %r655, 50176; 2026-02-21T10:15:14.3981474Z add.s32 %r729, %r655, 51200; 2026-02-21T10:15:14.3981644Z add.s32 %r731, %r655, 52224; 2026-02-21T10:15:14.3981821Z add.s32 %r733, %r655, 53248; 2026-02-21T10:15:14.3981989Z add.s32 %r735, %r655, 54272; 2026-02-21T10:15:14.3982167Z add.s32 %r737, %r655, 55296; 2026-02-21T10:15:14.3982336Z add.s32 %r739, %r655, 56320; 2026-02-21T10:15:14.3982517Z add.s32 %r741, %r656, 73728; 2026-02-21T10:15:14.3982805Z add.s32 %r743, %r656, 75776; 2026-02-21T10:15:14.3983053Z or.b32 %r71, %r44, 64; 2026-02-21T10:15:14.3983218Z add.s32 %r745, %r655, 40960; 2026-02-21T10:15:14.3983389Z add.s32 %r747, %r655, 41984; 2026-02-21T10:15:14.3983563Z add.s32 %r749, %r655, 43008; 2026-02-21T10:15:14.3983731Z add.s32 %r751, %r655, 44032; 2026-02-21T10:15:14.3984009Z add.s32 %r753, %r655, 45056; 2026-02-21T10:15:14.3984186Z add.s32 %r755, %r655, 46080; 2026-02-21T10:15:14.3984377Z add.s32 %r757, %r655, 47104; 2026-02-21T10:15:14.3984552Z add.s32 %r759, %r655, 48128; 2026-02-21T10:15:14.3984721Z add.s32 %r761, %r656, 69632; 2026-02-21T10:15:14.3984899Z add.s32 %r763, %r656, 71680; 2026-02-21T10:15:14.3985069Z or.b32 %r657, %r649, 56; 2026-02-21T10:15:14.3985254Z or.b32 %r82, %r44, 96; 2026-02-21T10:15:14.3985420Z add.s32 %r765, %r655, 57344; 2026-02-21T10:15:14.3985685Z add.s32 %r767, %r655, 58368; 2026-02-21T10:15:14.3985865Z add.s32 %r769, %r655, 59392; 2026-02-21T10:15:14.3986045Z add.s32 %r771, %r655, 60416; 2026-02-21T10:15:14.3986223Z add.s32 %r773, %r655, 61440; 2026-02-21T10:15:14.3986402Z add.s32 %r775, %r655, 62464; 2026-02-21T10:15:14.3986729Z add.s32 %r777, %r655, 
63488; 2026-02-21T10:15:14.3986897Z add.s32 %r779, %r655, 64512; 2026-02-21T10:15:14.3987079Z mul.lo.s32 %r91, %r657, 1280; 2026-02-21T10:15:14.3987267Z add.s32 %r781, %r656, 77824; 2026-02-21T10:15:14.3987453Z add.s32 %r783, %r656, 79872; 2026-02-21T10:15:14.3987624Z shl.b32 %r658, %r7, 5; 2026-02-21T10:15:14.3987788Z and.b32 %r659, %r40, 448; 2026-02-21T10:15:14.3987957Z shl.b32 %r660, %r3, 1; 2026-02-21T10:15:14.3988123Z and.b32 %r661, %r660, 6; 2026-02-21T10:15:14.3988294Z and.b32 %r662, %r3, 24; 2026-02-21T10:15:14.3988564Z or.b32 %r663, %r658, %r659; 2026-02-21T10:15:14.3988744Z or.b32 %r664, %r662, %r661; 2026-02-21T10:15:14.3988917Z or.b32 %r94, %r663, %r664; 2026-02-21T10:15:14.3989095Z xor.b32 %r95, %r94, 8; 2026-02-21T10:15:14.3989255Z xor.b32 %r96, %r94, 16; 2026-02-21T10:15:14.3989426Z xor.b32 %r97, %r94, 24; 2026-02-21T10:15:14.3989584Z or.b32 %r98, %r3, 896; 2026-02-21T10:15:14.3989749Z or.b32 %r99, %r3, 1920; 2026-02-21T10:15:14.3989909Z or.b32 %r100, %r3, 2944; 2026-02-21T10:15:14.3990077Z or.b32 %r101, %r3, 3968; 2026-02-21T10:15:14.3990244Z shl.b32 %r665, %r45, 7; 2026-02-21T10:15:14.3990406Z shl.b32 %r666, %r43, 4; 2026-02-21T10:15:14.3990576Z or.b32 %r667, %r665, %r666; 2026-02-21T10:15:14.3990751Z add.s32 %r102, %r3401, %r667; 2026-02-21T10:15:14.3990932Z xor.b32 %r668, %r667, 16; 2026-02-21T10:15:14.3991105Z add.s32 %r103, %r3401, %r668; 2026-02-21T10:15:14.3991287Z xor.b32 %r669, %r667, 32; 2026-02-21T10:15:14.3991453Z add.s32 %r104, %r3401, %r669; 2026-02-21T10:15:14.3991632Z xor.b32 %r670, %r667, 48; 2026-02-21T10:15:14.3991806Z add.s32 %r105, %r3401, %r670; 2026-02-21T10:15:14.3991993Z xor.b32 %r671, %r667, 64; 2026-02-21T10:15:14.3992173Z add.s32 %r106, %r3401, %r671; 2026-02-21T10:15:14.3992344Z xor.b32 %r672, %r667, 80; 2026-02-21T10:15:14.3992531Z add.s32 %r107, %r3401, %r672; 2026-02-21T10:15:14.3992705Z xor.b32 %r673, %r667, 96; 2026-02-21T10:15:14.3992879Z add.s32 %r108, %r3401, %r673; 2026-02-21T10:15:14.3993053Z xor.b32 %r674, 
%r667, 112; 2026-02-21T10:15:14.3993233Z add.s32 %r109, %r3401, %r674; 2026-02-21T10:15:14.3993409Z bfe.u32 %r675, %r3401, 4, 14; 2026-02-21T10:15:14.3993590Z cvt.u64.u32 %rd18, %r675; 2026-02-21T10:15:14.3993777Z or.b64 %rd89, %rd18, 4611686293439512576; 2026-02-21T10:15:14.3993986Z add.s32 %r676, %r3401, 32; 2026-02-21T10:15:14.3994172Z bfe.u32 %r677, %r676, 4, 14; 2026-02-21T10:15:14.3994354Z cvt.u64.u32 %rd19, %r677; 2026-02-21T10:15:14.3994542Z or.b64 %rd90, %rd19, 4611686293439512576; 2026-02-21T10:15:14.3994746Z add.s32 %r678, %r3401, 64; 2026-02-21T10:15:14.3994922Z bfe.u32 %r679, %r678, 4, 14; 2026-02-21T10:15:14.3995100Z cvt.u64.u32 %rd20, %r679; 2026-02-21T10:15:14.3995285Z or.b64 %rd91, %rd20, 4611686293439512576; 2026-02-21T10:15:14.3995597Z add.s32 %r680, %r3401, 96; 2026-02-21T10:15:14.3995778Z bfe.u32 %r681, %r680, 4, 14; 2026-02-21T10:15:14.3996043Z cvt.u64.u32 %rd21, %r681; 2026-02-21T10:15:14.3996236Z or.b64 %rd92, %rd21, 4611686293439512576; 2026-02-21T10:15:14.3996447Z and.b32 %r682, %r3, 3; 2026-02-21T10:15:14.3996754Z shl.b32 %r683, %r5, 4; 2026-02-21T10:15:14.3996918Z bfe.s32 %r684, %r3, 2, 1; 2026-02-21T10:15:14.3997210Z and.b32 %r685, %r684, 2064; 2026-02-21T10:15:14.3997400Z or.b32 %r686, %r685, %r683; 2026-02-21T10:15:14.3997579Z mad.lo.s32 %r687, %r682, 4128, %r686; 2026-02-21T10:15:14.3997782Z add.s32 %r110, %r3401, %r687; 2026-02-21T10:15:14.3997963Z xor.b32 %r688, %r687, 16; 2026-02-21T10:15:14.3998129Z add.s32 %r111, %r3401, %r688; 2026-02-21T10:15:14.3998303Z xor.b32 %r689, %r687, 32; 2026-02-21T10:15:14.3998469Z add.s32 %r112, %r3401, %r689; 2026-02-21T10:15:14.3998729Z xor.b32 %r690, %r687, 48; 2026-02-21T10:15:14.3998904Z add.s32 %r113, %r3401, %r690; 2026-02-21T10:15:14.3999079Z xor.b32 %r691, %r687, 64; 2026-02-21T10:15:14.3999249Z add.s32 %r114, %r3401, %r691; 2026-02-21T10:15:14.3999432Z xor.b32 %r692, %r687, 80; 2026-02-21T10:15:14.3999597Z add.s32 %r115, %r3401, %r692; 2026-02-21T10:15:14.3999773Z xor.b32 %r693, %r687, 96; 
2026-02-21T10:15:14.3999942Z add.s32 %r116, %r3401, %r693; 2026-02-21T10:15:14.4000116Z xor.b32 %r694, %r687, 112; 2026-02-21T10:15:14.4000295Z add.s32 %r117, %r3401, %r694; 2026-02-21T10:15:14.4000470Z shl.b32 %r695, %r662, 9; 2026-02-21T10:15:14.4000645Z shl.b32 %r696, %r662, 2; 2026-02-21T10:15:14.4000813Z bfe.s32 %r697, %r3, 5, 1; 2026-02-21T10:15:14.4000985Z and.b32 %r698, %r697, 2064; 2026-02-21T10:15:14.4001158Z and.b32 %r699, %r660, 128; 2026-02-21T10:15:14.4001334Z or.b32 %r700, %r695, %r666; 2026-02-21T10:15:14.4001504Z or.b32 %r701, %r698, %r696; 2026-02-21T10:15:14.4001680Z xor.b32 %r702, %r701, %r700; 2026-02-21T10:15:14.4001861Z add.s32 %r703, %r3401, %r699; 2026-02-21T10:15:14.4002035Z add.s32 %r6211, %r703, %r702; 2026-02-21T10:15:14.4002218Z add.s32 %r6216, %r6211, 256; 2026-02-21T10:15:14.4002396Z add.s32 %r6221, %r6211, 512; 2026-02-21T10:15:14.4002574Z add.s32 %r6226, %r6211, 768; 2026-02-21T10:15:14.4002743Z add.s32 %r6231, %r6211, 1024; 2026-02-21T10:15:14.4002924Z add.s32 %r6236, %r6211, 1280; 2026-02-21T10:15:14.4003097Z add.s32 %r6241, %r6211, 1536; 2026-02-21T10:15:14.4003279Z add.s32 %r6246, %r6211, 1792; 2026-02-21T10:15:14.4003636Z .loc 1 22 52 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:22:52 2026-02-21T10:15:14.4004018Z mad.wide.u32 %rd22, %r43, 8, %rd15; 2026-02-21T10:15:14.4004227Z add.s64 %rd5, %rd22, 1835328; 2026-02-21T10:15:14.4004418Z mad.wide.u32 %rd23, %r650, 1280, %rd16; 2026-02-21T10:15:14.4004633Z add.s64 %rd6, %rd23, 112640; 2026-02-21T10:15:14.4004858Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T10:15:14.4005153Z // Child Loop BB0_3 Depth 2 2026-02-21T10:15:14.4005549Z .loc 1 26 31 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:26:31 2026-02-21T10:15:14.4005914Z shr.s32 %r788, %r6688, 31; 2026-02-21T10:15:14.4006096Z shr.u32 %r789, %r788, 23; 2026-02-21T10:15:14.4006266Z add.s32 %r790, %r6688, %r789; 2026-02-21T10:15:14.4006443Z shr.s32 %r791, %r790, 9; 2026-02-21T10:15:14.4006902Z 
.loc 1 25 30 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:25:30 2026-02-21T10:15:14.4007268Z and.b32 %r792, %r790, 33553920; 2026-02-21T10:15:14.4007453Z sub.s32 %r793, %r6688, %r792; 2026-02-21T10:15:14.4007784Z .loc 1 27 27 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:27:27 2026-02-21T10:15:14.4008142Z shl.b32 %r127, %r793, 7; 2026-02-21T10:15:14.4008457Z .loc 1 28 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:28:32 2026-02-21T10:15:14.4008817Z or.b32 %r794, %r127, %r6; 2026-02-21T10:15:14.4009131Z .loc 1 29 27 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:29:27 2026-02-21T10:15:14.4009672Z shl.b32 %r128, %r791, 8; 2026-02-21T10:15:14.4009992Z .loc 1 30 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:30:32 2026-02-21T10:15:14.4010358Z or.b32 %r795, %r128, %r41; 2026-02-21T10:15:14.4010749Z .loc 1 45 53 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:45:53 2026-02-21T10:15:14.4011105Z shl.b32 %r796, %r794, 13; 2026-02-21T10:15:14.4011425Z .loc 1 45 60 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:45:60 2026-02-21T10:15:14.4011780Z or.b32 %r797, %r796, %r44; 2026-02-21T10:15:14.4012097Z .loc 1 45 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:45:32 2026-02-21T10:15:14.4012544Z mad.wide.s32 %rd24, %r797, 2, %rd15; 2026-02-21T10:15:14.4012760Z cvt.u64.u32 %rd65, %r44; 2026-02-21T10:15:14.4012936Z cvt.s64.s32 %rd66, %r796; 2026-02-21T10:15:14.4013111Z or.b64 %rd67, %rd66, %rd65; 2026-02-21T10:15:14.4013300Z shl.b64 %rd68, %rd67, 1; 2026-02-21T10:15:14.4013471Z add.s64 %rd69, %rd15, %rd68; 2026-02-21T10:15:14.4013656Z add.s64 %rd25, %rd69, 262144; 2026-02-21T10:15:14.4013838Z add.s64 %rd26, %rd69, 524288; 2026-02-21T10:15:14.4014019Z add.s64 %rd27, %rd69, 786432; 2026-02-21T10:15:14.4014198Z add.s64 %rd28, %rd69, 1048576; 2026-02-21T10:15:14.4014389Z add.s64 %rd29, %rd69, 1310720; 2026-02-21T10:15:14.4014576Z add.s64 %rd30, %rd69, 1572864; 
2026-02-21T10:15:14.4014755Z add.s64 %rd31, %rd69, 1835008; 2026-02-21T10:15:14.4014935Z mov.b32 %r706, 8; 2026-02-21T10:15:14.4015241Z .loc 1 45 80 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:45:80 2026-02-21T10:15:14.4015607Z // begin inline asm 2026-02-21T10:15:14.4015843Z cp.async.ca.shared.global [ %r705 + 0 ], [ %rd24 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4016126Z // end inline asm 2026-02-21T10:15:14.4016275Z // begin inline asm 2026-02-21T10:15:14.4016636Z cp.async.ca.shared.global [ %r707 + 0 ], [ %rd25 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4016922Z // end inline asm 2026-02-21T10:15:14.4017071Z // begin inline asm 2026-02-21T10:15:14.4017306Z cp.async.ca.shared.global [ %r709 + 0 ], [ %rd26 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4017570Z // end inline asm 2026-02-21T10:15:14.4017726Z // begin inline asm 2026-02-21T10:15:14.4017948Z cp.async.ca.shared.global [ %r711 + 0 ], [ %rd27 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4018222Z // end inline asm 2026-02-21T10:15:14.4018368Z // begin inline asm 2026-02-21T10:15:14.4018596Z cp.async.ca.shared.global [ %r713 + 0 ], [ %rd28 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4018864Z // end inline asm 2026-02-21T10:15:14.4019016Z // begin inline asm 2026-02-21T10:15:14.4019241Z cp.async.ca.shared.global [ %r715 + 0 ], [ %rd29 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4019518Z // end inline asm 2026-02-21T10:15:14.4019670Z // begin inline asm 2026-02-21T10:15:14.4019889Z cp.async.ca.shared.global [ %r717 + 0 ], [ %rd30 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4020156Z // end inline asm 2026-02-21T10:15:14.4020303Z // begin inline asm 2026-02-21T10:15:14.4020529Z cp.async.ca.shared.global [ %r719 + 0 ], [ %rd31 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4020791Z // end inline asm 2026-02-21T10:15:14.4020951Z cp.async.commit_group; 2026-02-21T10:15:14.4021281Z .loc 1 51 62 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:51:62 2026-02-21T10:15:14.4021649Z add.s32 %r798, %r795, %r55; 
2026-02-21T10:15:14.4021835Z add.s32 %r799, %r795, %r56; 2026-02-21T10:15:14.4022153Z .loc 1 51 34 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:51:34 2026-02-21T10:15:14.4022514Z cvt.s64.s32 %rd70, %r798; 2026-02-21T10:15:14.4022689Z add.s64 %rd32, %rd16, %rd70; 2026-02-21T10:15:14.4022870Z cvt.s64.s32 %rd71, %r799; 2026-02-21T10:15:14.4023039Z add.s64 %rd33, %rd16, %rd71; 2026-02-21T10:15:14.4023214Z mov.b32 %r722, 16; 2026-02-21T10:15:14.4023644Z .loc 1 51 87 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:51:87 2026-02-21T10:15:14.4024065Z // begin inline asm 2026-02-21T10:15:14.4024300Z cp.async.cg.shared.global [ %r721 + 0 ], [ %rd32 + 0 ], 0x10, %r722; 2026-02-21T10:15:14.4024573Z // end inline asm 2026-02-21T10:15:14.4024726Z // begin inline asm 2026-02-21T10:15:14.4025015Z cp.async.cg.shared.global [ %r723 + 0 ], [ %rd33 + 0 ], 0x10, %r722; 2026-02-21T10:15:14.4025296Z // end inline asm 2026-02-21T10:15:14.4025454Z cp.async.commit_group; 2026-02-21T10:15:14.4025784Z .loc 1 45 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:45:32 2026-02-21T10:15:14.4026149Z add.s64 %rd34, %rd69, 64; 2026-02-21T10:15:14.4026319Z cvt.u64.u32 %rd72, %r60; 2026-02-21T10:15:14.4026696Z or.b64 %rd73, %rd66, %rd72; 2026-02-21T10:15:14.4026887Z shl.b64 %rd74, %rd73, 1; 2026-02-21T10:15:14.4027062Z add.s64 %rd75, %rd15, %rd74; 2026-02-21T10:15:14.4027248Z add.s64 %rd35, %rd75, 262144; 2026-02-21T10:15:14.4027440Z add.s64 %rd36, %rd75, 524288; 2026-02-21T10:15:14.4027623Z add.s64 %rd37, %rd75, 786432; 2026-02-21T10:15:14.4027807Z add.s64 %rd38, %rd75, 1048576; 2026-02-21T10:15:14.4027994Z add.s64 %rd39, %rd75, 1310720; 2026-02-21T10:15:14.4028174Z add.s64 %rd40, %rd75, 1572864; 2026-02-21T10:15:14.4040459Z add.s64 %rd41, %rd75, 1835008; 2026-02-21T10:15:14.4040878Z .loc 1 45 80 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:45:80 2026-02-21T10:15:14.4041277Z // begin inline asm 2026-02-21T10:15:14.4041549Z cp.async.ca.shared.global [ 
%r725 + 0 ], [ %rd34 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4041844Z // end inline asm 2026-02-21T10:15:14.4042014Z // begin inline asm 2026-02-21T10:15:14.4042267Z cp.async.ca.shared.global [ %r727 + 0 ], [ %rd35 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4042557Z // end inline asm 2026-02-21T10:15:14.4042717Z // begin inline asm 2026-02-21T10:15:14.4042957Z cp.async.ca.shared.global [ %r729 + 0 ], [ %rd36 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4043246Z // end inline asm 2026-02-21T10:15:14.4043397Z // begin inline asm 2026-02-21T10:15:14.4043647Z cp.async.ca.shared.global [ %r731 + 0 ], [ %rd37 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4043917Z // end inline asm 2026-02-21T10:15:14.4044071Z // begin inline asm 2026-02-21T10:15:14.4044296Z cp.async.ca.shared.global [ %r733 + 0 ], [ %rd38 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4044567Z // end inline asm 2026-02-21T10:15:14.4044720Z // begin inline asm 2026-02-21T10:15:14.4044945Z cp.async.ca.shared.global [ %r735 + 0 ], [ %rd39 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4045228Z // end inline asm 2026-02-21T10:15:14.4045380Z // begin inline asm 2026-02-21T10:15:14.4045607Z cp.async.ca.shared.global [ %r737 + 0 ], [ %rd40 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4045872Z // end inline asm 2026-02-21T10:15:14.4046028Z // begin inline asm 2026-02-21T10:15:14.4046249Z cp.async.ca.shared.global [ %r739 + 0 ], [ %rd41 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4046681Z // end inline asm 2026-02-21T10:15:14.4046847Z cp.async.commit_group; 2026-02-21T10:15:14.4047191Z .loc 1 51 34 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:51:34 2026-02-21T10:15:14.4047572Z cvt.s64.s32 %rd76, %r795; 2026-02-21T10:15:14.4047756Z cvt.u64.u32 %rd77, %r55; 2026-02-21T10:15:14.4047949Z add.s64 %rd78, %rd76, %rd77; 2026-02-21T10:15:14.4048139Z add.s64 %rd79, %rd16, %rd78; 2026-02-21T10:15:14.4048326Z add.s64 %rd42, %rd79, 20480; 2026-02-21T10:15:14.4048504Z add.s64 %rd43, %rd79, 30720; 2026-02-21T10:15:14.4048840Z .loc 1 51 87 // 
cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:51:87 2026-02-21T10:15:14.4049222Z // begin inline asm 2026-02-21T10:15:14.4049481Z cp.async.cg.shared.global [ %r741 + 0 ], [ %rd42 + 0 ], 0x10, %r722; 2026-02-21T10:15:14.4049769Z // end inline asm 2026-02-21T10:15:14.4049926Z // begin inline asm 2026-02-21T10:15:14.4050165Z cp.async.cg.shared.global [ %r743 + 0 ], [ %rd43 + 0 ], 0x10, %r722; 2026-02-21T10:15:14.4050719Z // end inline asm 2026-02-21T10:15:14.4050887Z cp.async.commit_group; 2026-02-21T10:15:14.4051220Z .loc 1 45 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:45:32 2026-02-21T10:15:14.4051602Z add.s64 %rd44, %rd69, 128; 2026-02-21T10:15:14.4051788Z cvt.u64.u32 %rd80, %r71; 2026-02-21T10:15:14.4052051Z or.b64 %rd81, %rd66, %rd80; 2026-02-21T10:15:14.4052244Z shl.b64 %rd82, %rd81, 1; 2026-02-21T10:15:14.4052416Z add.s64 %rd83, %rd15, %rd82; 2026-02-21T10:15:14.4052605Z add.s64 %rd45, %rd83, 262144; 2026-02-21T10:15:14.4052790Z add.s64 %rd46, %rd83, 524288; 2026-02-21T10:15:14.4052976Z add.s64 %rd47, %rd83, 786432; 2026-02-21T10:15:14.4053158Z add.s64 %rd48, %rd83, 1048576; 2026-02-21T10:15:14.4053354Z add.s64 %rd49, %rd83, 1310720; 2026-02-21T10:15:14.4053644Z add.s64 %rd50, %rd83, 1572864; 2026-02-21T10:15:14.4053852Z add.s64 %rd51, %rd83, 1835008; 2026-02-21T10:15:14.4054193Z .loc 1 45 80 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:45:80 2026-02-21T10:15:14.4054554Z bar.sync 0; 2026-02-21T10:15:14.4054710Z // begin inline asm 2026-02-21T10:15:14.4054943Z cp.async.ca.shared.global [ %r745 + 0 ], [ %rd44 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4055227Z // end inline asm 2026-02-21T10:15:14.4055387Z // begin inline asm 2026-02-21T10:15:14.4055634Z cp.async.ca.shared.global [ %r747 + 0 ], [ %rd45 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4055917Z // end inline asm 2026-02-21T10:15:14.4056079Z // begin inline asm 2026-02-21T10:15:14.4056317Z cp.async.ca.shared.global [ %r749 + 0 ], [ %rd46 + 0 ], 0x8, %r706; 
2026-02-21T10:15:14.4056720Z // end inline asm 2026-02-21T10:15:14.4056883Z // begin inline asm 2026-02-21T10:15:14.4057111Z cp.async.ca.shared.global [ %r751 + 0 ], [ %rd47 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4057392Z // end inline asm 2026-02-21T10:15:14.4057541Z // begin inline asm 2026-02-21T10:15:14.4057773Z cp.async.ca.shared.global [ %r753 + 0 ], [ %rd48 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4058052Z // end inline asm 2026-02-21T10:15:14.4058214Z // begin inline asm 2026-02-21T10:15:14.4058436Z cp.async.ca.shared.global [ %r755 + 0 ], [ %rd49 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4058708Z // end inline asm 2026-02-21T10:15:14.4058865Z // begin inline asm 2026-02-21T10:15:14.4059086Z cp.async.ca.shared.global [ %r757 + 0 ], [ %rd50 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4059357Z // end inline asm 2026-02-21T10:15:14.4059507Z // begin inline asm 2026-02-21T10:15:14.4059735Z cp.async.ca.shared.global [ %r759 + 0 ], [ %rd51 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4060002Z // end inline asm 2026-02-21T10:15:14.4060167Z cp.async.commit_group; 2026-02-21T10:15:14.4060499Z .loc 1 51 34 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:51:34 2026-02-21T10:15:14.4060883Z add.s64 %rd52, %rd79, 40960; 2026-02-21T10:15:14.4061075Z add.s64 %rd53, %rd79, 51200; 2026-02-21T10:15:14.4061407Z .loc 1 51 87 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:51:87 2026-02-21T10:15:14.4061792Z // begin inline asm 2026-02-21T10:15:14.4062026Z cp.async.cg.shared.global [ %r761 + 0 ], [ %rd52 + 0 ], 0x10, %r722; 2026-02-21T10:15:14.4062307Z // end inline asm 2026-02-21T10:15:14.4062466Z // begin inline asm 2026-02-21T10:15:14.4062703Z cp.async.cg.shared.global [ %r763 + 0 ], [ %rd53 + 0 ], 0x10, %r722; 2026-02-21T10:15:14.4062981Z // end inline asm 2026-02-21T10:15:14.4063138Z cp.async.commit_group; 2026-02-21T10:15:14.4063464Z .loc 1 45 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:45:32 2026-02-21T10:15:14.4063828Z add.s64 %rd54, %rd69, 192; 
2026-02-21T10:15:14.4064021Z cvt.u64.u32 %rd84, %r82; 2026-02-21T10:15:14.4064201Z or.b64 %rd85, %rd66, %rd84; 2026-02-21T10:15:14.4064388Z shl.b64 %rd86, %rd85, 1; 2026-02-21T10:15:14.4064561Z add.s64 %rd87, %rd15, %rd86; 2026-02-21T10:15:14.4064753Z add.s64 %rd55, %rd87, 262144; 2026-02-21T10:15:14.4065046Z add.s64 %rd56, %rd87, 524288; 2026-02-21T10:15:14.4065304Z add.s64 %rd57, %rd87, 786432; 2026-02-21T10:15:14.4065498Z add.s64 %rd58, %rd87, 1048576; 2026-02-21T10:15:14.4065687Z add.s64 %rd59, %rd87, 1310720; 2026-02-21T10:15:14.4065880Z add.s64 %rd60, %rd87, 1572864; 2026-02-21T10:15:14.4066064Z add.s64 %rd61, %rd87, 1835008; 2026-02-21T10:15:14.4066620Z .loc 1 45 80 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:45:80 2026-02-21T10:15:14.4067004Z // begin inline asm 2026-02-21T10:15:14.4067244Z cp.async.ca.shared.global [ %r765 + 0 ], [ %rd54 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4067524Z // end inline asm 2026-02-21T10:15:14.4067677Z // begin inline asm 2026-02-21T10:15:14.4067913Z cp.async.ca.shared.global [ %r767 + 0 ], [ %rd55 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4068295Z // end inline asm 2026-02-21T10:15:14.4068543Z // begin inline asm 2026-02-21T10:15:14.4068773Z cp.async.ca.shared.global [ %r769 + 0 ], [ %rd56 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4069054Z // end inline asm 2026-02-21T10:15:14.4069206Z // begin inline asm 2026-02-21T10:15:14.4069435Z cp.async.ca.shared.global [ %r771 + 0 ], [ %rd57 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4069699Z // end inline asm 2026-02-21T10:15:14.4069854Z // begin inline asm 2026-02-21T10:15:14.4070082Z cp.async.ca.shared.global [ %r773 + 0 ], [ %rd58 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4070350Z // end inline asm 2026-02-21T10:15:14.4070504Z // begin inline asm 2026-02-21T10:15:14.4070724Z cp.async.ca.shared.global [ %r775 + 0 ], [ %rd59 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4071000Z // end inline asm 2026-02-21T10:15:14.4071148Z // begin inline asm 2026-02-21T10:15:14.4071378Z 
cp.async.ca.shared.global [ %r777 + 0 ], [ %rd60 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4071641Z // end inline asm 2026-02-21T10:15:14.4071802Z // begin inline asm 2026-02-21T10:15:14.4072033Z cp.async.ca.shared.global [ %r779 + 0 ], [ %rd61 + 0 ], 0x8, %r706; 2026-02-21T10:15:14.4072299Z // end inline asm 2026-02-21T10:15:14.4072464Z cp.async.commit_group; 2026-02-21T10:15:14.4072788Z .loc 1 51 62 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:51:62 2026-02-21T10:15:14.4073168Z add.s32 %r800, %r795, %r91; 2026-02-21T10:15:14.4073511Z .loc 1 51 34 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:51:34 2026-02-21T10:15:14.4073887Z add.s64 %rd62, %rd79, 61440; 2026-02-21T10:15:14.4074073Z cvt.s64.s32 %rd88, %r800; 2026-02-21T10:15:14.4074279Z add.s64 %rd63, %rd16, %rd88; 2026-02-21T10:15:14.4074618Z .loc 1 51 87 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:51:87 2026-02-21T10:15:14.4074973Z // begin inline asm 2026-02-21T10:15:14.4075215Z cp.async.cg.shared.global [ %r781 + 0 ], [ %rd62 + 0 ], 0x10, %r722; 2026-02-21T10:15:14.4075493Z // end inline asm 2026-02-21T10:15:14.4075651Z // begin inline asm 2026-02-21T10:15:14.4075881Z cp.async.cg.shared.global [ %r783 + 0 ], [ %rd63 + 0 ], 0x10, %r722; 2026-02-21T10:15:14.4076171Z // end inline asm 2026-02-21T10:15:14.4076338Z cp.async.commit_group; 2026-02-21T10:15:14.4076819Z .loc 1 37 112 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:37:112 2026-02-21T10:15:14.4077207Z shl.b32 %r801, %r6688, 7; 2026-02-21T10:15:14.4077392Z or.b32 %r802, %r6, %r801; 2026-02-21T10:15:14.4077587Z shl.b32 %r803, %r791, 16; 2026-02-21T10:15:14.4077768Z sub.s32 %r804, %r802, %r803; 2026-02-21T10:15:14.4077961Z shl.b32 %r805, %r804, 13; 2026-02-21T10:15:14.4078155Z mad.wide.s32 %rd158, %r805, 2, %rd5; 2026-02-21T10:15:14.4078367Z add.s64 %rd157, %rd6, %rd76; 2026-02-21T10:15:14.4078558Z mov.b32 %r6691, 0f00000000; 2026-02-21T10:15:14.4078736Z mov.b32 %r6690, 1; 2026-02-21T10:15:14.4078906Z 
mov.b32 %r6689, -1; 2026-02-21T10:15:14.4079080Z mov.b64 %rd159, -32; 2026-02-21T10:15:14.4079256Z mov.b32 %r6692, %r6691; 2026-02-21T10:15:14.4079426Z mov.b32 %r6693, %r6691; 2026-02-21T10:15:14.4079700Z mov.b32 %r6694, %r6691; 2026-02-21T10:15:14.4079866Z mov.b32 %r6695, %r6691; 2026-02-21T10:15:14.4080097Z mov.b32 %r6696, %r6691; 2026-02-21T10:15:14.4080281Z mov.b32 %r6697, %r6691; 2026-02-21T10:15:14.4080452Z mov.b32 %r6698, %r6691; 2026-02-21T10:15:14.4080627Z mov.b32 %r6699, %r6691; 2026-02-21T10:15:14.4080795Z mov.b32 %r6700, %r6691; 2026-02-21T10:15:14.4081044Z mov.b32 %r6701, %r6691; 2026-02-21T10:15:14.4081218Z mov.b32 %r6702, %r6691; 2026-02-21T10:15:14.4081395Z mov.b32 %r6703, %r6691; 2026-02-21T10:15:14.4081558Z mov.b32 %r6704, %r6691; 2026-02-21T10:15:14.4081733Z mov.b32 %r6705, %r6691; 2026-02-21T10:15:14.4081896Z mov.b32 %r6706, %r6691; 2026-02-21T10:15:14.4082071Z mov.b32 %r6707, %r6691; 2026-02-21T10:15:14.4082251Z mov.b32 %r6708, %r6691; 2026-02-21T10:15:14.4082414Z mov.b32 %r6709, %r6691; 2026-02-21T10:15:14.4082647Z mov.b32 %r6710, %r6691; 2026-02-21T10:15:14.4082829Z mov.b32 %r6711, %r6691; 2026-02-21T10:15:14.4083001Z mov.b32 %r6712, %r6691; 2026-02-21T10:15:14.4083164Z mov.b32 %r6713, %r6691; 2026-02-21T10:15:14.4083336Z mov.b32 %r6714, %r6691; 2026-02-21T10:15:14.4083498Z mov.b32 %r6715, %r6691; 2026-02-21T10:15:14.4083665Z mov.b32 %r6716, %r6691; 2026-02-21T10:15:14.4083826Z mov.b32 %r6717, %r6691; 2026-02-21T10:15:14.4083998Z mov.b32 %r6718, %r6691; 2026-02-21T10:15:14.4084168Z mov.b32 %r6719, %r6691; 2026-02-21T10:15:14.4084329Z mov.b32 %r6720, %r6691; 2026-02-21T10:15:14.4084500Z mov.b32 %r6721, %r6691; 2026-02-21T10:15:14.4084663Z mov.b32 %r6722, %r6691; 2026-02-21T10:15:14.4084832Z mov.b32 %r6723, %r6691; 2026-02-21T10:15:14.4084995Z mov.b32 %r6724, %r6691; 2026-02-21T10:15:14.4085162Z mov.b32 %r6725, %r6691; 2026-02-21T10:15:14.4085324Z mov.b32 %r6726, %r6691; 2026-02-21T10:15:14.4085493Z mov.b32 %r6727, %r6691; 
2026-02-21T10:15:14.4085659Z mov.b32 %r6728, %r6691; 2026-02-21T10:15:14.4085828Z mov.b32 %r6729, %r6691; 2026-02-21T10:15:14.4085998Z mov.b32 %r6730, %r6691; 2026-02-21T10:15:14.4086163Z mov.b32 %r6731, %r6691; 2026-02-21T10:15:14.4086333Z mov.b32 %r6732, %r6691; 2026-02-21T10:15:14.4086635Z mov.b32 %r6733, %r6691; 2026-02-21T10:15:14.4086806Z mov.b32 %r6734, %r6691; 2026-02-21T10:15:14.4086974Z mov.b32 %r6735, %r6691; 2026-02-21T10:15:14.4087135Z mov.b32 %r6736, %r6691; 2026-02-21T10:15:14.4087304Z mov.b32 %r6737, %r6691; 2026-02-21T10:15:14.4087472Z mov.b32 %r6738, %r6691; 2026-02-21T10:15:14.4087638Z mov.b32 %r6739, %r6691; 2026-02-21T10:15:14.4087804Z mov.b32 %r6740, %r6691; 2026-02-21T10:15:14.4087978Z mov.b32 %r6741, %r6691; 2026-02-21T10:15:14.4088142Z mov.b32 %r6742, %r6691; 2026-02-21T10:15:14.4088312Z mov.b32 %r6743, %r6691; 2026-02-21T10:15:14.4088483Z mov.b32 %r6744, %r6691; 2026-02-21T10:15:14.4088653Z mov.b32 %r6745, %r6691; 2026-02-21T10:15:14.4088822Z mov.b32 %r6746, %r6691; 2026-02-21T10:15:14.4088985Z mov.b32 %r6747, %r6691; 2026-02-21T10:15:14.4089153Z mov.b32 %r6748, %r6691; 2026-02-21T10:15:14.4089313Z mov.b32 %r6749, %r6691; 2026-02-21T10:15:14.4089486Z mov.b32 %r6750, %r6691; 2026-02-21T10:15:14.4089649Z mov.b32 %r6751, %r6691; 2026-02-21T10:15:14.4089831Z mov.b32 %r6752, %r6691; 2026-02-21T10:15:14.4089993Z mov.b32 %r6753, %r6691; 2026-02-21T10:15:14.4090161Z mov.b32 %r6754, %r6691; 2026-02-21T10:15:14.4090330Z mov.b32 %r6755, %r6691; 2026-02-21T10:15:14.4090490Z mov.b32 %r6756, %r6691; 2026-02-21T10:15:14.4090660Z mov.b32 %r6757, %r6691; 2026-02-21T10:15:14.4090821Z mov.b32 %r6758, %r6691; 2026-02-21T10:15:14.4090991Z mov.b32 %r6759, %r6691; 2026-02-21T10:15:14.4091155Z mov.b32 %r6760, %r6691; 2026-02-21T10:15:14.4091321Z mov.b32 %r6761, %r6691; 2026-02-21T10:15:14.4091484Z mov.b32 %r6762, %r6691; 2026-02-21T10:15:14.4091649Z mov.b32 %r6763, %r6691; 2026-02-21T10:15:14.4091808Z mov.b32 %r6764, %r6691; 2026-02-21T10:15:14.4091975Z mov.b32 
%r6765, %r6691; 2026-02-21T10:15:14.4092142Z mov.b32 %r6766, %r6691; 2026-02-21T10:15:14.4092316Z mov.b32 %r6767, %r6691; 2026-02-21T10:15:14.4092581Z mov.b32 %r6768, %r6691; 2026-02-21T10:15:14.4092748Z mov.b32 %r6769, %r6691; 2026-02-21T10:15:14.4092978Z mov.b32 %r6770, %r6691; 2026-02-21T10:15:14.4093145Z mov.b32 %r6771, %r6691; 2026-02-21T10:15:14.4093312Z mov.b32 %r6772, %r6691; 2026-02-21T10:15:14.4093475Z mov.b32 %r6773, %r6691; 2026-02-21T10:15:14.4093645Z mov.b32 %r6774, %r6691; 2026-02-21T10:15:14.4093898Z mov.b32 %r6775, %r6691; 2026-02-21T10:15:14.4094085Z mov.b32 %r6776, %r6691; 2026-02-21T10:15:14.4094255Z mov.b32 %r6777, %r6691; 2026-02-21T10:15:14.4094417Z mov.b32 %r6778, %r6691; 2026-02-21T10:15:14.4094589Z mov.b32 %r6779, %r6691; 2026-02-21T10:15:14.4094750Z mov.b32 %r6780, %r6691; 2026-02-21T10:15:14.4094917Z mov.b32 %r6781, %r6691; 2026-02-21T10:15:14.4095079Z mov.b32 %r6782, %r6691; 2026-02-21T10:15:14.4095246Z mov.b32 %r6783, %r6691; 2026-02-21T10:15:14.4095481Z mov.b32 %r6784, %r6691; 2026-02-21T10:15:14.4095657Z mov.b32 %r6785, %r6691; 2026-02-21T10:15:14.4095820Z mov.b32 %r6786, %r6691; 2026-02-21T10:15:14.4095994Z mov.b32 %r6787, %r6691; 2026-02-21T10:15:14.4096161Z mov.b32 %r6788, %r6691; 2026-02-21T10:15:14.4096323Z mov.b32 %r6789, %r6691; 2026-02-21T10:15:14.4096629Z mov.b32 %r6790, %r6691; 2026-02-21T10:15:14.4096800Z mov.b32 %r6791, %r6691; 2026-02-21T10:15:14.4096967Z mov.b32 %r6792, %r6691; 2026-02-21T10:15:14.4097133Z mov.b32 %r6793, %r6691; 2026-02-21T10:15:14.4097304Z mov.b32 %r6794, %r6691; 2026-02-21T10:15:14.4097474Z mov.b32 %r6795, %r6691; 2026-02-21T10:15:14.4097656Z mov.b32 %r6796, %r6691; 2026-02-21T10:15:14.4097823Z mov.b32 %r6797, %r6691; 2026-02-21T10:15:14.4097990Z mov.b32 %r6798, %r6691; 2026-02-21T10:15:14.4098157Z mov.b32 %r6799, %r6691; 2026-02-21T10:15:14.4098318Z mov.b32 %r6800, %r6691; 2026-02-21T10:15:14.4098487Z mov.b32 %r6801, %r6691; 2026-02-21T10:15:14.4098647Z mov.b32 %r6802, %r6691; 
2026-02-21T10:15:14.4098815Z mov.b32 %r6803, %r6691; 2026-02-21T10:15:14.4098974Z mov.b32 %r6804, %r6691; 2026-02-21T10:15:14.4099143Z mov.b32 %r6805, %r6691; 2026-02-21T10:15:14.4099304Z mov.b32 %r6806, %r6691; 2026-02-21T10:15:14.4099473Z mov.b32 %r6807, %r6691; 2026-02-21T10:15:14.4099646Z mov.b32 %r6808, %r6691; 2026-02-21T10:15:14.4099817Z mov.b32 %r6809, %r6691; 2026-02-21T10:15:14.4099984Z mov.b32 %r6810, %r6691; 2026-02-21T10:15:14.4100143Z mov.b32 %r6811, %r6691; 2026-02-21T10:15:14.4100309Z mov.b32 %r6812, %r6691; 2026-02-21T10:15:14.4100471Z mov.b32 %r6813, %r6691; 2026-02-21T10:15:14.4100638Z mov.b32 %r6814, %r6691; 2026-02-21T10:15:14.4100800Z mov.b32 %r6815, %r6691; 2026-02-21T10:15:14.4100966Z mov.b32 %r6816, %r6691; 2026-02-21T10:15:14.4101128Z mov.b32 %r6817, %r6691; 2026-02-21T10:15:14.4101294Z mov.b32 %r6818, %r6691; 2026-02-21T10:15:14.4101456Z mov.b32 %r6819, %r6691; 2026-02-21T10:15:14.4101622Z mov.b32 %r6820, %r6691; 2026-02-21T10:15:14.4101792Z mov.b32 %r6821, %r6691; 2026-02-21T10:15:14.4101954Z mov.b32 %r6822, %r6691; 2026-02-21T10:15:14.4102118Z mov.b32 %r6823, %r6691; 2026-02-21T10:15:14.4102280Z mov.b32 %r6824, %r6691; 2026-02-21T10:15:14.4102448Z mov.b32 %r6825, %r6691; 2026-02-21T10:15:14.4102611Z mov.b32 %r6826, %r6691; 2026-02-21T10:15:14.4102777Z mov.b32 %r6827, %r6691; 2026-02-21T10:15:14.4102938Z mov.b32 %r6828, %r6691; 2026-02-21T10:15:14.4103106Z mov.b32 %r6829, %r6691; 2026-02-21T10:15:14.4103266Z mov.b32 %r6830, %r6691; 2026-02-21T10:15:14.4103441Z mov.b32 %r6831, %r6691; 2026-02-21T10:15:14.4103611Z mov.b32 %r6832, %r6691; 2026-02-21T10:15:14.4103773Z mov.b32 %r6833, %r6691; 2026-02-21T10:15:14.4103946Z mov.b32 %r6834, %r6691; 2026-02-21T10:15:14.4104108Z mov.b32 %r6835, %r6691; 2026-02-21T10:15:14.4104289Z mov.b32 %r6836, %r6691; 2026-02-21T10:15:14.4104458Z mov.b32 %r6837, %r6691; 2026-02-21T10:15:14.4104627Z mov.b32 %r6838, %r6691; 2026-02-21T10:15:14.4104788Z mov.b32 %r6839, %r6691; 2026-02-21T10:15:14.4104961Z mov.b32 
%r6840, %r6691; 2026-02-21T10:15:14.4105128Z mov.b32 %r6841, %r6691; 2026-02-21T10:15:14.4105296Z mov.b32 %r6842, %r6691; 2026-02-21T10:15:14.4105561Z mov.b32 %r6843, %r6691; 2026-02-21T10:15:14.4105792Z mov.b32 %r6844, %r6691; 2026-02-21T10:15:14.4105957Z mov.b32 %r6845, %r6691; 2026-02-21T10:15:14.4106125Z mov.b32 %r6846, %r6691; 2026-02-21T10:15:14.4106294Z mov.b32 %r6847, %r6691; 2026-02-21T10:15:14.4106585Z mov.b32 %r6848, %r6691; 2026-02-21T10:15:14.4106782Z mov.b32 %r6849, %r6691; 2026-02-21T10:15:14.4107028Z mov.b32 %r6850, %r6691; 2026-02-21T10:15:14.4107204Z mov.b32 %r6851, %r6691; 2026-02-21T10:15:14.4107367Z mov.b32 %r6852, %r6691; 2026-02-21T10:15:14.4107532Z mov.b32 %r6853, %r6691; 2026-02-21T10:15:14.4107698Z mov.b32 %r6854, %r6691; 2026-02-21T10:15:14.4107857Z mov.b32 %r6855, %r6691; 2026-02-21T10:15:14.4108022Z mov.b32 %r6856, %r6691; 2026-02-21T10:15:14.4108185Z mov.b32 %r6857, %r6691; 2026-02-21T10:15:14.4108352Z mov.b32 %r6858, %r6691; 2026-02-21T10:15:14.4108713Z mov.b32 %r6859, %r6691; 2026-02-21T10:15:14.4108894Z mov.b32 %r6860, %r6691; 2026-02-21T10:15:14.4109061Z mov.b32 %r6861, %r6691; 2026-02-21T10:15:14.4109227Z mov.b32 %r6862, %r6691; 2026-02-21T10:15:14.4109391Z mov.b32 %r6863, %r6691; 2026-02-21T10:15:14.4109556Z mov.b32 %r6864, %r6691; 2026-02-21T10:15:14.4109722Z mov.b32 %r6865, %r6691; 2026-02-21T10:15:14.4109885Z mov.b32 %r6866, %r6691; 2026-02-21T10:15:14.4110052Z mov.b32 %r6867, %r6691; 2026-02-21T10:15:14.4110222Z mov.b32 %r6868, %r6691; 2026-02-21T10:15:14.4110391Z mov.b32 %r6869, %r6691; 2026-02-21T10:15:14.4110552Z mov.b32 %r6870, %r6691; 2026-02-21T10:15:14.4110719Z mov.b32 %r6871, %r6691; 2026-02-21T10:15:14.4110881Z mov.b32 %r6872, %r6691; 2026-02-21T10:15:14.4111048Z mov.b32 %r6873, %r6691; 2026-02-21T10:15:14.4111209Z mov.b32 %r6874, %r6691; 2026-02-21T10:15:14.4111375Z mov.b32 %r6875, %r6691; 2026-02-21T10:15:14.4111541Z mov.b32 %r6876, %r6691; 2026-02-21T10:15:14.4111704Z mov.b32 %r6877, %r6691; 
2026-02-21T10:15:14.4111872Z mov.b32 %r6878, %r6691; 2026-02-21T10:15:14.4112037Z mov.b32 %r6879, %r6691; 2026-02-21T10:15:14.4112209Z mov.b32 %r6880, %r6691; 2026-02-21T10:15:14.4112372Z mov.b32 %r6881, %r6691; 2026-02-21T10:15:14.4112549Z mov.b32 %r6882, %r6691; 2026-02-21T10:15:14.4112710Z mov.b32 %r6883, %r6691; 2026-02-21T10:15:14.4112878Z mov.b32 %r6884, %r6691; 2026-02-21T10:15:14.4113042Z mov.b32 %r6885, %r6691; 2026-02-21T10:15:14.4113216Z mov.b32 %r6886, %r6691; 2026-02-21T10:15:14.4113399Z mov.b32 %r6887, %r6691; 2026-02-21T10:15:14.4113565Z mov.b32 %r6888, %r6691; 2026-02-21T10:15:14.4113735Z mov.b32 %r6889, %r6691; 2026-02-21T10:15:14.4113902Z mov.b32 %r6890, %r6691; 2026-02-21T10:15:14.4114071Z mov.b32 %r6891, %r6691; 2026-02-21T10:15:14.4114232Z mov.b32 %r6892, %r6691; 2026-02-21T10:15:14.4114404Z mov.b32 %r6893, %r6691; 2026-02-21T10:15:14.4114573Z mov.b32 %r6894, %r6691; 2026-02-21T10:15:14.4114739Z mov.b32 %r6895, %r6691; 2026-02-21T10:15:14.4114903Z mov.b32 %r6896, %r6691; 2026-02-21T10:15:14.4115071Z mov.b32 %r6897, %r6691; 2026-02-21T10:15:14.4115235Z mov.b32 %r6898, %r6691; 2026-02-21T10:15:14.4115403Z mov.b32 %r6899, %r6691; 2026-02-21T10:15:14.4115568Z mov.b32 %r6900, %r6691; 2026-02-21T10:15:14.4115740Z mov.b32 %r6901, %r6691; 2026-02-21T10:15:14.4115915Z mov.b32 %r6902, %r6691; 2026-02-21T10:15:14.4116074Z mov.b32 %r6903, %r6691; 2026-02-21T10:15:14.4116240Z mov.b32 %r6904, %r6691; 2026-02-21T10:15:14.4116400Z mov.b32 %r6905, %r6691; 2026-02-21T10:15:14.4116689Z mov.b32 %r6906, %r6691; 2026-02-21T10:15:14.4116851Z mov.b32 %r6907, %r6691; 2026-02-21T10:15:14.4117014Z mov.b32 %r6908, %r6691; 2026-02-21T10:15:14.4117181Z mov.b32 %r6909, %r6691; 2026-02-21T10:15:14.4117339Z mov.b32 %r6910, %r6691; 2026-02-21T10:15:14.4117505Z mov.b32 %r6911, %r6691; 2026-02-21T10:15:14.4117667Z mov.b32 %r6912, %r6691; 2026-02-21T10:15:14.4117834Z mov.b32 %r6913, %r6691; 2026-02-21T10:15:14.4117997Z mov.b32 %r6914, %r6691; 2026-02-21T10:15:14.4118163Z mov.b32 
%r6915, %r6691; 2026-02-21T10:15:14.4118322Z mov.b32 %r6916, %r6691; 2026-02-21T10:15:14.4118605Z mov.b32 %r6917, %r6691; 2026-02-21T10:15:14.4118771Z mov.b32 %r6918, %r6691; 2026-02-21T10:15:14.4119002Z mov.b32 %r6919, %r6691; 2026-02-21T10:15:14.4119170Z mov.b32 %r6920, %r6691; 2026-02-21T10:15:14.4119336Z mov.b32 %r6921, %r6691; 2026-02-21T10:15:14.4119505Z mov.b32 %r6922, %r6691; 2026-02-21T10:15:14.4119669Z mov.b32 %r6923, %r6691; 2026-02-21T10:15:14.4119902Z mov.b32 %r6924, %r6691; 2026-02-21T10:15:14.4120067Z mov.b32 %r6925, %r6691; 2026-02-21T10:15:14.4120232Z mov.b32 %r6926, %r6691; 2026-02-21T10:15:14.4120392Z mov.b32 %r6927, %r6691; 2026-02-21T10:15:14.4120573Z mov.b32 %r6928, %r6691; 2026-02-21T10:15:14.4120739Z mov.b32 %r6929, %r6691; 2026-02-21T10:15:14.4120909Z mov.b32 %r6930, %r6691; 2026-02-21T10:15:14.4121079Z mov.b32 %r6931, %r6691; 2026-02-21T10:15:14.4121241Z mov.b32 %r6932, %r6691; 2026-02-21T10:15:14.4121478Z mov.b32 %r6933, %r6691; 2026-02-21T10:15:14.4121651Z mov.b32 %r6934, %r6691; 2026-02-21T10:15:14.4121822Z mov.b32 %r6935, %r6691; 2026-02-21T10:15:14.4121988Z mov.b32 %r6936, %r6691; 2026-02-21T10:15:14.4122162Z mov.b32 %r6937, %r6691; 2026-02-21T10:15:14.4122335Z mov.b32 %r6938, %r6691; 2026-02-21T10:15:14.4122508Z mov.b32 %r6939, %r6691; 2026-02-21T10:15:14.4122670Z mov.b32 %r6940, %r6691; 2026-02-21T10:15:14.4122851Z mov.b32 %r6941, %r6691; 2026-02-21T10:15:14.4123021Z mov.b32 %r6942, %r6691; 2026-02-21T10:15:14.4123183Z mov.b32 %r6943, %r6691; 2026-02-21T10:15:14.4123351Z mov.b32 %r6944, %r6691; 2026-02-21T10:15:14.4123513Z mov.b32 %r6945, %r6691; 2026-02-21T10:15:14.4123683Z mov.b32 %r6946, %r6691; 2026-02-21T10:15:14.4123902Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T10:15:14.4124212Z // => This Inner Loop Header: Depth=2 2026-02-21T10:15:14.4124470Z add.s64 %rd159, %rd159, 32; 2026-02-21T10:15:14.4124686Z setp.lt.u64 %p18, %rd159, 4032; 2026-02-21T10:15:14.4124889Z add.s32 %r6042, %r6689, 1; 2026-02-21T10:15:14.4125077Z 
setp.gt.s32 %p19, %r6042, 1; 2026-02-21T10:15:14.4125273Z selp.b32 %r6689, 0, %r6042, %p19; 2026-02-21T10:15:14.4125636Z .loc 1 45 80 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:45:80 2026-02-21T10:15:14.4126020Z cp.async.wait_group 4; 2026-02-21T10:15:14.4126194Z bar.sync 0; 2026-02-21T10:15:14.4126349Z shl.b32 %r6043, %r6689, 12; 2026-02-21T10:15:14.4126650Z shl.b32 %r6044, %r6689, 13; 2026-02-21T10:15:14.4126835Z add.s32 %r6045, %r3401, 32768; 2026-02-21T10:15:14.4127021Z add.s32 %r6046, %r6045, %r6044; 2026-02-21T10:15:14.4127364Z .loc 1 49 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:49:32 2026-02-21T10:15:14.4127733Z add.s32 %r6047, %r6046, %r94; 2026-02-21T10:15:14.4127916Z ld.shared.b16 %rs1, [%r6047]; 2026-02-21T10:15:14.4128103Z ld.shared.b16 %rs2, [%r6047+512]; 2026-02-21T10:15:14.4128304Z ld.shared.b16 %rs3, [%r6047+32]; 2026-02-21T10:15:14.4128499Z ld.shared.b16 %rs4, [%r6047+544]; 2026-02-21T10:15:14.4128692Z ld.shared.b16 %rs5, [%r6047+4096]; 2026-02-21T10:15:14.4128889Z ld.shared.b16 %rs6, [%r6047+4608]; 2026-02-21T10:15:14.4129093Z ld.shared.b16 %rs7, [%r6047+4128]; 2026-02-21T10:15:14.4129282Z ld.shared.b16 %rs8, [%r6047+4640]; 2026-02-21T10:15:14.4129474Z add.s32 %r6048, %r6046, %r95; 2026-02-21T10:15:14.4129656Z ld.shared.b16 %rs9, [%r6048]; 2026-02-21T10:15:14.4129865Z ld.shared.b16 %rs10, [%r6048+512]; 2026-02-21T10:15:14.4130061Z ld.shared.b16 %rs11, [%r6048+32]; 2026-02-21T10:15:14.4130255Z ld.shared.b16 %rs12, [%r6048+544]; 2026-02-21T10:15:14.4130445Z ld.shared.b16 %rs13, [%r6048+4096]; 2026-02-21T10:15:14.4130647Z ld.shared.b16 %rs14, [%r6048+4608]; 2026-02-21T10:15:14.4130848Z ld.shared.b16 %rs15, [%r6048+4128]; 2026-02-21T10:15:14.4131046Z ld.shared.b16 %rs16, [%r6048+4640]; 2026-02-21T10:15:14.4131243Z add.s32 %r6049, %r6046, %r96; 2026-02-21T10:15:14.4131427Z ld.shared.b16 %rs17, [%r6049]; 2026-02-21T10:15:14.4131623Z ld.shared.b16 %rs18, [%r6049+512]; 2026-02-21T10:15:14.4131928Z ld.shared.b16 %rs19, 
[%r6049+32]; 2026-02-21T10:15:14.4132203Z ld.shared.b16 %rs20, [%r6049+544]; 2026-02-21T10:15:14.4132409Z ld.shared.b16 %rs21, [%r6049+4096]; 2026-02-21T10:15:14.4132615Z ld.shared.b16 %rs22, [%r6049+4608]; 2026-02-21T10:15:14.4132814Z ld.shared.b16 %rs23, [%r6049+4128]; 2026-02-21T10:15:14.4133017Z ld.shared.b16 %rs24, [%r6049+4640]; 2026-02-21T10:15:14.4133279Z add.s32 %r6050, %r6046, %r97; 2026-02-21T10:15:14.4133483Z ld.shared.b16 %rs25, [%r6050]; 2026-02-21T10:15:14.4133683Z ld.shared.b16 %rs26, [%r6050+512]; 2026-02-21T10:15:14.4133881Z ld.shared.b16 %rs27, [%r6050+32]; 2026-02-21T10:15:14.4134083Z ld.shared.b16 %rs28, [%r6050+544]; 2026-02-21T10:15:14.4134280Z ld.shared.b16 %rs29, [%r6050+4096]; 2026-02-21T10:15:14.4134484Z ld.shared.b16 %rs30, [%r6050+4608]; 2026-02-21T10:15:14.4134758Z ld.shared.b16 %rs31, [%r6050+4128]; 2026-02-21T10:15:14.4134971Z ld.shared.b16 %rs32, [%r6050+4640]; 2026-02-21T10:15:14.4135182Z cvt.f32.bf16 %r1062, %rs1; 2026-02-21T10:15:14.4135360Z cvt.f32.bf16 %r1063, %rs2; 2026-02-21T10:15:14.4135544Z cvt.f32.bf16 %r1064, %rs9; 2026-02-21T10:15:14.4137700Z cvt.f32.bf16 %r1065, %rs10; 2026-02-21T10:15:14.4137969Z cvt.f32.bf16 %r1322, %rs17; 2026-02-21T10:15:14.4138171Z cvt.f32.bf16 %r1323, %rs18; 2026-02-21T10:15:14.4138356Z cvt.f32.bf16 %r1324, %rs25; 2026-02-21T10:15:14.4138542Z cvt.f32.bf16 %r1325, %rs26; 2026-02-21T10:15:14.4140244Z cvt.f32.bf16 %r1582, %rs3; 2026-02-21T10:15:14.4140491Z cvt.f32.bf16 %r1583, %rs4; 2026-02-21T10:15:14.4140687Z cvt.f32.bf16 %r1584, %rs11; 2026-02-21T10:15:14.4140874Z cvt.f32.bf16 %r1585, %rs12; 2026-02-21T10:15:14.4141056Z cvt.f32.bf16 %r1842, %rs19; 2026-02-21T10:15:14.4141232Z cvt.f32.bf16 %r1843, %rs20; 2026-02-21T10:15:14.4141408Z cvt.f32.bf16 %r1844, %rs27; 2026-02-21T10:15:14.4141582Z cvt.f32.bf16 %r1845, %rs28; 2026-02-21T10:15:14.4141761Z cvt.f32.bf16 %r2102, %rs5; 2026-02-21T10:15:14.4141934Z cvt.f32.bf16 %r2103, %rs6; 2026-02-21T10:15:14.4142120Z cvt.f32.bf16 %r2104, %rs13; 
2026-02-21T10:15:14.4142296Z cvt.f32.bf16 %r2105, %rs14; 2026-02-21T10:15:14.4142476Z cvt.f32.bf16 %r2362, %rs21; 2026-02-21T10:15:14.4142684Z cvt.f32.bf16 %r2363, %rs22; 2026-02-21T10:15:14.4142861Z cvt.f32.bf16 %r2364, %rs29; 2026-02-21T10:15:14.4143043Z cvt.f32.bf16 %r2365, %rs30; 2026-02-21T10:15:14.4143222Z cvt.f32.bf16 %r2622, %rs7; 2026-02-21T10:15:14.4143395Z cvt.f32.bf16 %r2623, %rs8; 2026-02-21T10:15:14.4143579Z cvt.f32.bf16 %r2624, %rs15; 2026-02-21T10:15:14.4143755Z cvt.f32.bf16 %r2625, %rs16; 2026-02-21T10:15:14.4143934Z cvt.f32.bf16 %r2882, %rs23; 2026-02-21T10:15:14.4144108Z cvt.f32.bf16 %r2883, %rs24; 2026-02-21T10:15:14.4144290Z cvt.f32.bf16 %r2884, %rs31; 2026-02-21T10:15:14.4144463Z cvt.f32.bf16 %r2885, %rs32; 2026-02-21T10:15:14.4144818Z .loc 1 51 87 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:51:87 2026-02-21T10:15:14.4145204Z add.s32 %r6051, %r3401, 65536; 2026-02-21T10:15:14.4145420Z add.s32 %r6052, %r6051, %r6043; 2026-02-21T10:15:14.4145774Z .loc 1 64 45 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:64:45 2026-02-21T10:15:14.4146146Z add.s32 %r6053, %r6052, %r45; 2026-02-21T10:15:14.4146338Z add.s32 %r6054, %r6052, %r98; 2026-02-21T10:15:14.4146684Z add.s32 %r6055, %r6052, %r99; 2026-02-21T10:15:14.4146878Z add.s32 %r6056, %r6052, %r100; 2026-02-21T10:15:14.4147063Z add.s32 %r6057, %r6052, %r101; 2026-02-21T10:15:14.4147408Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4147775Z ld.shared.s8 %rs33, [%r6053]; 2026-02-21T10:15:14.4148115Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4148584Z shl.b16 %rs34, %rs33, 4; 2026-02-21T10:15:14.4148914Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4149418Z ld.shared.s8 %rs35, [%r6053+128]; 2026-02-21T10:15:14.4149760Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 
2026-02-21T10:15:14.4150197Z shl.b16 %rs36, %rs35, 4; 2026-02-21T10:15:14.4150514Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4150883Z ld.shared.s8 %rs37, [%r6053+256]; 2026-02-21T10:15:14.4151241Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4151603Z shl.b16 %rs38, %rs37, 4; 2026-02-21T10:15:14.4151927Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4152295Z ld.shared.s8 %rs39, [%r6053+384]; 2026-02-21T10:15:14.4152651Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4153014Z shl.b16 %rs40, %rs39, 4; 2026-02-21T10:15:14.4153343Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4153712Z ld.shared.s8 %rs41, [%r6053+512]; 2026-02-21T10:15:14.4154197Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4154591Z shl.b16 %rs42, %rs41, 4; 2026-02-21T10:15:14.4155018Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4155402Z ld.shared.s8 %rs43, [%r6053+640]; 2026-02-21T10:15:14.4155751Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4156111Z shl.b16 %rs44, %rs43, 4; 2026-02-21T10:15:14.4156434Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4156936Z ld.shared.s8 %rs45, [%r6053+768]; 2026-02-21T10:15:14.4157279Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4157641Z shl.b16 %rs46, %rs45, 4; 2026-02-21T10:15:14.4157968Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4158335Z ld.shared.s8 %rs47, [%r6054]; 2026-02-21T10:15:14.4158664Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 
2026-02-21T10:15:14.4159026Z shl.b16 %rs48, %rs47, 4; 2026-02-21T10:15:14.4159341Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4159712Z ld.shared.s8 %rs49, [%r6053+1024]; 2026-02-21T10:15:14.4160059Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4160439Z shl.b16 %rs50, %rs49, 4; 2026-02-21T10:15:14.4160769Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4161135Z ld.shared.s8 %rs51, [%r6053+1152]; 2026-02-21T10:15:14.4161486Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4161848Z shl.b16 %rs52, %rs51, 4; 2026-02-21T10:15:14.4162168Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4162530Z ld.shared.s8 %rs53, [%r6053+1280]; 2026-02-21T10:15:14.4162878Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4163240Z shl.b16 %rs54, %rs53, 4; 2026-02-21T10:15:14.4163556Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4163924Z ld.shared.s8 %rs55, [%r6053+1408]; 2026-02-21T10:15:14.4164278Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4164645Z shl.b16 %rs56, %rs55, 4; 2026-02-21T10:15:14.4165050Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4165485Z ld.shared.s8 %rs57, [%r6053+1536]; 2026-02-21T10:15:14.4165838Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4166198Z shl.b16 %rs58, %rs57, 4; 2026-02-21T10:15:14.4166641Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4167007Z ld.shared.s8 %rs59, [%r6053+1664]; 2026-02-21T10:15:14.4167354Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 
2026-02-21T10:15:14.4167720Z shl.b16 %rs60, %rs59, 4; 2026-02-21T10:15:14.4168039Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4168409Z ld.shared.s8 %rs61, [%r6053+1792]; 2026-02-21T10:15:14.4168750Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4169117Z shl.b16 %rs62, %rs61, 4; 2026-02-21T10:15:14.4169537Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4169905Z ld.shared.s8 %rs63, [%r6055]; 2026-02-21T10:15:14.4170242Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4170668Z shl.b16 %rs64, %rs63, 4; 2026-02-21T10:15:14.4170993Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4171355Z ld.shared.s8 %rs65, [%r6053+2048]; 2026-02-21T10:15:14.4171699Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4172069Z shl.b16 %rs66, %rs65, 4; 2026-02-21T10:15:14.4172391Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4172759Z ld.shared.s8 %rs67, [%r6053+2176]; 2026-02-21T10:15:14.4173100Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4173464Z shl.b16 %rs68, %rs67, 4; 2026-02-21T10:15:14.4173781Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4174145Z ld.shared.s8 %rs69, [%r6053+2304]; 2026-02-21T10:15:14.4174485Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4174848Z shl.b16 %rs70, %rs69, 4; 2026-02-21T10:15:14.4175168Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4175527Z ld.shared.s8 %rs71, [%r6053+2432]; 2026-02-21T10:15:14.4175874Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 
2026-02-21T10:15:14.4176231Z shl.b16 %rs72, %rs71, 4; 2026-02-21T10:15:14.4176671Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4177051Z ld.shared.s8 %rs73, [%r6053+2560]; 2026-02-21T10:15:14.4177403Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4177764Z shl.b16 %rs74, %rs73, 4; 2026-02-21T10:15:14.4178081Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4178449Z ld.shared.s8 %rs75, [%r6053+2688]; 2026-02-21T10:15:14.4178788Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4179155Z shl.b16 %rs76, %rs75, 4; 2026-02-21T10:15:14.4179474Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4179843Z ld.shared.s8 %rs77, [%r6053+2816]; 2026-02-21T10:15:14.4180190Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4180705Z shl.b16 %rs78, %rs77, 4; 2026-02-21T10:15:14.4181034Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4181403Z ld.shared.s8 %rs79, [%r6056]; 2026-02-21T10:15:14.4181748Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4182117Z shl.b16 %rs80, %rs79, 4; 2026-02-21T10:15:14.4182444Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4182807Z ld.shared.s8 %rs81, [%r6053+3072]; 2026-02-21T10:15:14.4183158Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4183517Z shl.b16 %rs82, %rs81, 4; 2026-02-21T10:15:14.4183840Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4184207Z ld.shared.s8 %rs83, [%r6053+3200]; 2026-02-21T10:15:14.4184639Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 
2026-02-21T10:15:14.4185021Z shl.b16 %rs84, %rs83, 4; 2026-02-21T10:15:14.4185339Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4185708Z ld.shared.s8 %rs85, [%r6053+3328]; 2026-02-21T10:15:14.4186109Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4186599Z shl.b16 %rs86, %rs85, 4; 2026-02-21T10:15:14.4186932Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4187302Z ld.shared.s8 %rs87, [%r6053+3456]; 2026-02-21T10:15:14.4187647Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4188004Z shl.b16 %rs88, %rs87, 4; 2026-02-21T10:15:14.4188326Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4188774Z ld.shared.s8 %rs89, [%r6053+3584]; 2026-02-21T10:15:14.4189124Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4189483Z shl.b16 %rs90, %rs89, 4; 2026-02-21T10:15:14.4189803Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4190170Z ld.shared.s8 %rs91, [%r6053+3712]; 2026-02-21T10:15:14.4190511Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4190875Z shl.b16 %rs92, %rs91, 4; 2026-02-21T10:15:14.4191188Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4191558Z ld.shared.s8 %rs93, [%r6053+3840]; 2026-02-21T10:15:14.4191903Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4192264Z shl.b16 %rs94, %rs93, 4; 2026-02-21T10:15:14.4192590Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4192965Z ld.shared.s8 %rs95, [%r6057]; 2026-02-21T10:15:14.4193303Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 
2026-02-21T10:15:14.4193662Z shl.b16 %rs96, %rs95, 4; 2026-02-21T10:15:14.4193988Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4194357Z cvt.s16.s8 %rs97, %rs34; 2026-02-21T10:15:14.4194530Z shr.s16 %rs98, %rs97, 4; 2026-02-21T10:15:14.4194707Z cvt.s16.s8 %rs99, %rs38; 2026-02-21T10:15:14.4194877Z shr.s16 %rs100, %rs99, 4; 2026-02-21T10:15:14.4195062Z shr.s16 %rs101, %rs33, 4; 2026-02-21T10:15:14.4195233Z shr.s16 %rs102, %rs37, 4; 2026-02-21T10:15:14.4195662Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4196093Z cvt.rn.f32.s16 %r6058, %rs102; 2026-02-21T10:15:14.4196300Z cvt.rn.f32.s16 %r6059, %rs101; 2026-02-21T10:15:14.4196615Z cvt.rn.f32.s16 %r6060, %rs100; 2026-02-21T10:15:14.4196811Z cvt.rn.f32.s16 %r6061, %rs98; 2026-02-21T10:15:14.4197149Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4197511Z cvt.s16.s8 %rs103, %rs36; 2026-02-21T10:15:14.4197694Z shr.s16 %rs104, %rs103, 4; 2026-02-21T10:15:14.4197874Z cvt.s16.s8 %rs105, %rs40; 2026-02-21T10:15:14.4198053Z shr.s16 %rs106, %rs105, 4; 2026-02-21T10:15:14.4198229Z shr.s16 %rs107, %rs35, 4; 2026-02-21T10:15:14.4198408Z shr.s16 %rs108, %rs39, 4; 2026-02-21T10:15:14.4198737Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4199099Z cvt.rn.f32.s16 %r6062, %rs108; 2026-02-21T10:15:14.4199297Z cvt.rn.f32.s16 %r6063, %rs107; 2026-02-21T10:15:14.4199483Z cvt.rn.f32.s16 %r6064, %rs106; 2026-02-21T10:15:14.4199680Z cvt.rn.f32.s16 %r6065, %rs104; 2026-02-21T10:15:14.4200122Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4200497Z cvt.s16.s8 %rs109, %rs42; 2026-02-21T10:15:14.4200676Z shr.s16 %rs110, %rs109, 4; 2026-02-21T10:15:14.4200859Z cvt.s16.s8 %rs111, %rs46; 2026-02-21T10:15:14.4201102Z shr.s16 %rs112, %rs111, 4; 2026-02-21T10:15:14.4201279Z shr.s16 %rs113, 
%rs41, 4; 2026-02-21T10:15:14.4201453Z shr.s16 %rs114, %rs45, 4; 2026-02-21T10:15:14.4201789Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4202159Z cvt.rn.f32.s16 %r6066, %rs114; 2026-02-21T10:15:14.4202345Z cvt.rn.f32.s16 %r6067, %rs113; 2026-02-21T10:15:14.4202533Z cvt.rn.f32.s16 %r6068, %rs112; 2026-02-21T10:15:14.4202714Z cvt.rn.f32.s16 %r6069, %rs110; 2026-02-21T10:15:14.4203048Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4203417Z cvt.s16.s8 %rs115, %rs44; 2026-02-21T10:15:14.4203588Z shr.s16 %rs116, %rs115, 4; 2026-02-21T10:15:14.4203771Z cvt.s16.s8 %rs117, %rs48; 2026-02-21T10:15:14.4203942Z shr.s16 %rs118, %rs117, 4; 2026-02-21T10:15:14.4204119Z shr.s16 %rs119, %rs43, 4; 2026-02-21T10:15:14.4204288Z shr.s16 %rs120, %rs47, 4; 2026-02-21T10:15:14.4204611Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4204971Z cvt.rn.f32.s16 %r6070, %rs120; 2026-02-21T10:15:14.4205169Z cvt.rn.f32.s16 %r6071, %rs119; 2026-02-21T10:15:14.4205358Z cvt.rn.f32.s16 %r6072, %rs118; 2026-02-21T10:15:14.4205540Z cvt.rn.f32.s16 %r6073, %rs116; 2026-02-21T10:15:14.4205879Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4206239Z cvt.s16.s8 %rs121, %rs50; 2026-02-21T10:15:14.4206418Z shr.s16 %rs122, %rs121, 4; 2026-02-21T10:15:14.4206728Z cvt.s16.s8 %rs123, %rs54; 2026-02-21T10:15:14.4206800Z shr.s16 %rs124, %rs123, 4; 2026-02-21T10:15:14.4206864Z shr.s16 %rs125, %rs49, 4; 2026-02-21T10:15:14.4206928Z shr.s16 %rs126, %rs53, 4; 2026-02-21T10:15:14.4207150Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4207217Z cvt.rn.f32.s16 %r6074, %rs126; 2026-02-21T10:15:14.4207282Z cvt.rn.f32.s16 %r6075, %rs125; 2026-02-21T10:15:14.4207347Z cvt.rn.f32.s16 %r6076, %rs124; 2026-02-21T10:15:14.4207419Z cvt.rn.f32.s16 %r6077, %rs122; 
2026-02-21T10:15:14.4207631Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4207694Z cvt.s16.s8 %rs127, %rs52; 2026-02-21T10:15:14.4207764Z shr.s16 %rs128, %rs127, 4; 2026-02-21T10:15:14.4207827Z cvt.s16.s8 %rs129, %rs56; 2026-02-21T10:15:14.4207986Z shr.s16 %rs130, %rs129, 4; 2026-02-21T10:15:14.4208053Z shr.s16 %rs131, %rs51, 4; 2026-02-21T10:15:14.4208185Z shr.s16 %rs132, %rs55, 4; 2026-02-21T10:15:14.4208408Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4208479Z cvt.rn.f32.s16 %r6078, %rs132; 2026-02-21T10:15:14.4208551Z cvt.rn.f32.s16 %r6079, %rs131; 2026-02-21T10:15:14.4208617Z cvt.rn.f32.s16 %r6080, %rs130; 2026-02-21T10:15:14.4208681Z cvt.rn.f32.s16 %r6081, %rs128; 2026-02-21T10:15:14.4208910Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4208983Z cvt.s16.s8 %rs133, %rs58; 2026-02-21T10:15:14.4209053Z shr.s16 %rs134, %rs133, 4; 2026-02-21T10:15:14.4209119Z cvt.s16.s8 %rs135, %rs62; 2026-02-21T10:15:14.4209189Z shr.s16 %rs136, %rs135, 4; 2026-02-21T10:15:14.4209253Z shr.s16 %rs137, %rs57, 4; 2026-02-21T10:15:14.4209314Z shr.s16 %rs138, %rs61, 4; 2026-02-21T10:15:14.4209533Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4209602Z cvt.rn.f32.s16 %r6082, %rs138; 2026-02-21T10:15:14.4209741Z cvt.rn.f32.s16 %r6083, %rs137; 2026-02-21T10:15:14.4209819Z cvt.rn.f32.s16 %r6084, %rs136; 2026-02-21T10:15:14.4209884Z cvt.rn.f32.s16 %r6085, %rs134; 2026-02-21T10:15:14.4210152Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4210220Z cvt.s16.s8 %rs139, %rs60; 2026-02-21T10:15:14.4210291Z shr.s16 %rs140, %rs139, 4; 2026-02-21T10:15:14.4210355Z cvt.s16.s8 %rs141, %rs64; 2026-02-21T10:15:14.4210419Z shr.s16 %rs142, %rs141, 4; 2026-02-21T10:15:14.4210489Z shr.s16 %rs143, %rs59, 4; 2026-02-21T10:15:14.4210552Z shr.s16 
%rs144, %rs63, 4; 2026-02-21T10:15:14.4210764Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4210830Z cvt.rn.f32.s16 %r6086, %rs144; 2026-02-21T10:15:14.4210904Z cvt.rn.f32.s16 %r6087, %rs143; 2026-02-21T10:15:14.4210972Z cvt.rn.f32.s16 %r6088, %rs142; 2026-02-21T10:15:14.4211037Z cvt.rn.f32.s16 %r6089, %rs140; 2026-02-21T10:15:14.4211250Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4211313Z cvt.s16.s8 %rs145, %rs66; 2026-02-21T10:15:14.4211377Z shr.s16 %rs146, %rs145, 4; 2026-02-21T10:15:14.4211461Z cvt.s16.s8 %rs147, %rs70; 2026-02-21T10:15:14.4211526Z shr.s16 %rs148, %rs147, 4; 2026-02-21T10:15:14.4211590Z shr.s16 %rs149, %rs65, 4; 2026-02-21T10:15:14.4211656Z shr.s16 %rs150, %rs69, 4; 2026-02-21T10:15:14.4211869Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4211933Z cvt.rn.f32.s16 %r6090, %rs150; 2026-02-21T10:15:14.4212001Z cvt.rn.f32.s16 %r6091, %rs149; 2026-02-21T10:15:14.4212070Z cvt.rn.f32.s16 %r6092, %rs148; 2026-02-21T10:15:14.4212135Z cvt.rn.f32.s16 %r6093, %rs146; 2026-02-21T10:15:14.4212340Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4212414Z cvt.s16.s8 %rs151, %rs68; 2026-02-21T10:15:14.4212476Z shr.s16 %rs152, %rs151, 4; 2026-02-21T10:15:14.4212541Z cvt.s16.s8 %rs153, %rs72; 2026-02-21T10:15:14.4212604Z shr.s16 %rs154, %rs153, 4; 2026-02-21T10:15:14.4212671Z shr.s16 %rs155, %rs67, 4; 2026-02-21T10:15:14.4212734Z shr.s16 %rs156, %rs71, 4; 2026-02-21T10:15:14.4212940Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4213010Z cvt.rn.f32.s16 %r6094, %rs156; 2026-02-21T10:15:14.4213074Z cvt.rn.f32.s16 %r6095, %rs155; 2026-02-21T10:15:14.4213136Z cvt.rn.f32.s16 %r6096, %rs154; 2026-02-21T10:15:14.4213198Z cvt.rn.f32.s16 %r6097, %rs152; 2026-02-21T10:15:14.4213410Z .loc 1 56 25 // 
cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4213545Z cvt.s16.s8 %rs157, %rs74; 2026-02-21T10:15:14.4213608Z shr.s16 %rs158, %rs157, 4; 2026-02-21T10:15:14.4213725Z cvt.s16.s8 %rs159, %rs78; 2026-02-21T10:15:14.4213790Z shr.s16 %rs160, %rs159, 4; 2026-02-21T10:15:14.4213854Z shr.s16 %rs161, %rs73, 4; 2026-02-21T10:15:14.4213923Z shr.s16 %rs162, %rs77, 4; 2026-02-21T10:15:14.4214132Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4214199Z cvt.rn.f32.s16 %r6098, %rs162; 2026-02-21T10:15:14.4214262Z cvt.rn.f32.s16 %r6099, %rs161; 2026-02-21T10:15:14.4214333Z cvt.rn.f32.s16 %r6100, %rs160; 2026-02-21T10:15:14.4214397Z cvt.rn.f32.s16 %r6101, %rs158; 2026-02-21T10:15:14.4214602Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4214670Z cvt.s16.s8 %rs163, %rs76; 2026-02-21T10:15:14.4214733Z shr.s16 %rs164, %rs163, 4; 2026-02-21T10:15:14.4214796Z cvt.s16.s8 %rs165, %rs80; 2026-02-21T10:15:14.4214861Z shr.s16 %rs166, %rs165, 4; 2026-02-21T10:15:14.4214931Z shr.s16 %rs167, %rs75, 4; 2026-02-21T10:15:14.4214993Z shr.s16 %rs168, %rs79, 4; 2026-02-21T10:15:14.4215254Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4215328Z cvt.rn.f32.s16 %r6102, %rs168; 2026-02-21T10:15:14.4215392Z cvt.rn.f32.s16 %r6103, %rs167; 2026-02-21T10:15:14.4215457Z cvt.rn.f32.s16 %r6104, %rs166; 2026-02-21T10:15:14.4215570Z cvt.rn.f32.s16 %r6105, %rs164; 2026-02-21T10:15:14.4215777Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4215840Z cvt.s16.s8 %rs169, %rs82; 2026-02-21T10:15:14.4215903Z shr.s16 %rs170, %rs169, 4; 2026-02-21T10:15:14.4215971Z cvt.s16.s8 %rs171, %rs86; 2026-02-21T10:15:14.4216035Z shr.s16 %rs172, %rs171, 4; 2026-02-21T10:15:14.4216099Z shr.s16 %rs173, %rs81, 4; 2026-02-21T10:15:14.4216172Z shr.s16 %rs174, %rs85, 4; 2026-02-21T10:15:14.4216380Z .loc 
1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4216564Z cvt.rn.f32.s16 %r6106, %rs174; 2026-02-21T10:15:14.4216643Z cvt.rn.f32.s16 %r6107, %rs173; 2026-02-21T10:15:14.4216709Z cvt.rn.f32.s16 %r6108, %rs172; 2026-02-21T10:15:14.4216774Z cvt.rn.f32.s16 %r6109, %rs170; 2026-02-21T10:15:14.4216982Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4217052Z cvt.s16.s8 %rs175, %rs84; 2026-02-21T10:15:14.4217116Z shr.s16 %rs176, %rs175, 4; 2026-02-21T10:15:14.4217179Z cvt.s16.s8 %rs177, %rs88; 2026-02-21T10:15:14.4217249Z shr.s16 %rs178, %rs177, 4; 2026-02-21T10:15:14.4217312Z shr.s16 %rs179, %rs83, 4; 2026-02-21T10:15:14.4217373Z shr.s16 %rs180, %rs87, 4; 2026-02-21T10:15:14.4217580Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4217659Z cvt.rn.f32.s16 %r6110, %rs180; 2026-02-21T10:15:14.4217723Z cvt.rn.f32.s16 %r6111, %rs179; 2026-02-21T10:15:14.4217790Z cvt.rn.f32.s16 %r6112, %rs178; 2026-02-21T10:15:14.4217861Z cvt.rn.f32.s16 %r6113, %rs176; 2026-02-21T10:15:14.4218068Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4218130Z cvt.s16.s8 %rs181, %rs90; 2026-02-21T10:15:14.4218200Z shr.s16 %rs182, %rs181, 4; 2026-02-21T10:15:14.4218266Z cvt.s16.s8 %rs183, %rs94; 2026-02-21T10:15:14.4218329Z shr.s16 %rs184, %rs183, 4; 2026-02-21T10:15:14.4218394Z shr.s16 %rs185, %rs89, 4; 2026-02-21T10:15:14.4218478Z shr.s16 %rs186, %rs93, 4; 2026-02-21T10:15:14.4218686Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4218754Z cvt.rn.f32.s16 %r6114, %rs186; 2026-02-21T10:15:14.4218826Z cvt.rn.f32.s16 %r6115, %rs185; 2026-02-21T10:15:14.4218891Z cvt.rn.f32.s16 %r6116, %rs184; 2026-02-21T10:15:14.4219044Z cvt.rn.f32.s16 %r6117, %rs182; 2026-02-21T10:15:14.4219254Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 
2026-02-21T10:15:14.4219387Z cvt.s16.s8 %rs187, %rs92; 2026-02-21T10:15:14.4219453Z shr.s16 %rs188, %rs187, 4; 2026-02-21T10:15:14.4219527Z cvt.s16.s8 %rs189, %rs96; 2026-02-21T10:15:14.4219597Z shr.s16 %rs190, %rs189, 4; 2026-02-21T10:15:14.4219660Z shr.s16 %rs191, %rs91, 4; 2026-02-21T10:15:14.4219726Z shr.s16 %rs192, %rs95, 4; 2026-02-21T10:15:14.4219941Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4220007Z cvt.rn.f32.s16 %r6118, %rs192; 2026-02-21T10:15:14.4220072Z cvt.rn.f32.s16 %r6119, %rs191; 2026-02-21T10:15:14.4220136Z cvt.rn.f32.s16 %r6120, %rs190; 2026-02-21T10:15:14.4220209Z cvt.rn.f32.s16 %r6121, %rs188; 2026-02-21T10:15:14.4220332Z st.shared.v4.b32 [%r102], {%r6061, %r6059, %r6060, %r6058}; 2026-02-21T10:15:14.4220461Z st.shared.v4.b32 [%r102+16384], {%r6065, %r6063, %r6064, %r6062}; 2026-02-21T10:15:14.4220580Z st.shared.v4.b32 [%r103], {%r6069, %r6067, %r6068, %r6066}; 2026-02-21T10:15:14.4220767Z st.shared.v4.b32 [%r103+16384], {%r6073, %r6071, %r6072, %r6070}; 2026-02-21T10:15:14.4220879Z st.shared.v4.b32 [%r104], {%r6077, %r6075, %r6076, %r6074}; 2026-02-21T10:15:14.4221007Z st.shared.v4.b32 [%r104+16384], {%r6081, %r6079, %r6080, %r6078}; 2026-02-21T10:15:14.4221175Z st.shared.v4.b32 [%r105], {%r6085, %r6083, %r6084, %r6082}; 2026-02-21T10:15:14.4221293Z st.shared.v4.b32 [%r105+16384], {%r6089, %r6087, %r6088, %r6086}; 2026-02-21T10:15:14.4221398Z st.shared.v4.b32 [%r106], {%r6093, %r6091, %r6092, %r6090}; 2026-02-21T10:15:14.4221520Z st.shared.v4.b32 [%r106+16384], {%r6097, %r6095, %r6096, %r6094}; 2026-02-21T10:15:14.4221623Z st.shared.v4.b32 [%r107], {%r6101, %r6099, %r6100, %r6098}; 2026-02-21T10:15:14.4221740Z st.shared.v4.b32 [%r107+16384], {%r6105, %r6103, %r6104, %r6102}; 2026-02-21T10:15:14.4221853Z st.shared.v4.b32 [%r108], {%r6109, %r6107, %r6108, %r6106}; 2026-02-21T10:15:14.4221970Z st.shared.v4.b32 [%r108+16384], {%r6113, %r6111, %r6112, %r6110}; 2026-02-21T10:15:14.4222077Z 
st.shared.v4.b32 [%r109], {%r6117, %r6115, %r6116, %r6114}; 2026-02-21T10:15:14.4222198Z st.shared.v4.b32 [%r109+16384], {%r6121, %r6119, %r6120, %r6118}; 2026-02-21T10:15:14.4222256Z $L__tmp0: 2026-02-21T10:15:14.4222545Z .loc 2 291 36 // standard.py:291:36 @[ cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:81:40 ] 2026-02-21T10:15:14.4222620Z // begin inline asm 2026-02-21T10:15:14.4222734Z fence.proxy.async.shared::cta; 2026-02-21T10:15:14.4222795Z // end inline asm 2026-02-21T10:15:14.4222853Z bar.sync 0; 2026-02-21T10:15:14.4222946Z shfl.sync.idx.b32 %r6122, %r4, 0, 31, -1; 2026-02-21T10:15:14.4223022Z wgmma.fence.sync.aligned; 2026-02-21T10:15:14.4223089Z mov.pred %p2, -1; 2026-02-21T10:15:14.4223156Z // begin inline asm 2026-02-21T10:15:14.4225445Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r6691,%r6692,%r6693,%r6694,%r6695,%r6696,%r6697,%r6698,%r6699,%r6700,%r6701,%r6702,%r6703,%r6704,%r6705,%r6706,%r6707,%r6708,%r6709,%r6710,%r6711,%r6712,%r6713,%r6714,%r6715,%r6716,%r6717,%r6718,%r6719,%r6720,%r6721,%r6722,%r6723,%r6724,%r6725,%r6726,%r6727,%r6728,%r6729,%r6730,%r6731,%r6732,%r6733,%r6734,%r6735,%r6736,%r6737,%r6738,%r6739,%r6740,%r6741,%r6742,%r6743,%r6744,%r6745,%r6746,%r6747,%r6748,%r6749,%r6750,%r6751,%r6752,%r6753,%r6754,%r6755,%r6756,%r6757,%r6758,%r6759,%r6760,%r6761,%r6762,%r6763,%r6764,%r6765,%r6766,%r6767,%r6768,%r6769,%r6770,%r6771,%r6772,%r6773,%r6774,%r6775,%r6776,%r6777,%r6778,%r6779,%r6780,%r6781,%r6782,%r6783,%r6784,%r6785,%r6786,%r6787,%r6788,%r6789,%r6790,%r6791,%r6792,%r6793,%r6794,%r6795,%r6796,%r6797,%r6798,%r6799,%r6800,%r6801,%r6802,%r6803,%r6804,%r6805,%r6806,%r6807,%r6808,%r6809,%r6810,%r6811,%r6812,%r6813,%r6814,%r6815,%r6816,%r6817,%r6818}, {%r1062,%r1063,%r1064,%r1065}, %rd89, %p2, 1, 1; 2026-02-21T10:15:14.4225574Z // end inline asm 2026-02-21T10:15:14.4225634Z // begin inline asm 2026-02-21T10:15:14.4228101Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r6691,%r6692,%r6693,%r6694,%r6695,%r6696,%r6697,%r6698,%r6699,%r6700,%r6701,%r6702,%r6703,%r6704,%r6705,%r6706,%r6707,%r6708,%r6709,%r6710,%r6711,%r6712,%r6713,%r6714,%r6715,%r6716,%r6717,%r6718,%r6719,%r6720,%r6721,%r6722,%r6723,%r6724,%r6725,%r6726,%r6727,%r6728,%r6729,%r6730,%r6731,%r6732,%r6733,%r6734,%r6735,%r6736,%r6737,%r6738,%r6739,%r6740,%r6741,%r6742,%r6743,%r6744,%r6745,%r6746,%r6747,%r6748,%r6749,%r6750,%r6751,%r6752,%r6753,%r6754,%r6755,%r6756,%r6757,%r6758,%r6759,%r6760,%r6761,%r6762,%r6763,%r6764,%r6765,%r6766,%r6767,%r6768,%r6769,%r6770,%r6771,%r6772,%r6773,%r6774,%r6775,%r6776,%r6777,%r6778,%r6779,%r6780,%r6781,%r6782,%r6783,%r6784,%r6785,%r6786,%r6787,%r6788,%r6789,%r6790,%r6791,%r6792,%r6793,%r6794,%r6795,%r6796,%r6797,%r6798,%r6799,%r6800,%r6801,%r6802,%r6803,%r6804,%r6805,%r6806,%r6807,%r6808,%r6809,%r6810,%r6811,%r6812,%r6813,%r6814,%r6815,%r6816,%r6817,%r6818}, {%r1322,%r1323,%r1324,%r1325}, %rd90, %p2, 1, 1; 2026-02-21T10:15:14.4228176Z // end inline asm 2026-02-21T10:15:14.4228238Z // begin inline asm 2026-02-21T10:15:14.4230739Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r6691,%r6692,%r6693,%r6694,%r6695,%r6696,%r6697,%r6698,%r6699,%r6700,%r6701,%r6702,%r6703,%r6704,%r6705,%r6706,%r6707,%r6708,%r6709,%r6710,%r6711,%r6712,%r6713,%r6714,%r6715,%r6716,%r6717,%r6718,%r6719,%r6720,%r6721,%r6722,%r6723,%r6724,%r6725,%r6726,%r6727,%r6728,%r6729,%r6730,%r6731,%r6732,%r6733,%r6734,%r6735,%r6736,%r6737,%r6738,%r6739,%r6740,%r6741,%r6742,%r6743,%r6744,%r6745,%r6746,%r6747,%r6748,%r6749,%r6750,%r6751,%r6752,%r6753,%r6754,%r6755,%r6756,%r6757,%r6758,%r6759,%r6760,%r6761,%r6762,%r6763,%r6764,%r6765,%r6766,%r6767,%r6768,%r6769,%r6770,%r6771,%r6772,%r6773,%r6774,%r6775,%r6776,%r6777,%r6778,%r6779,%r6780,%r6781,%r6782,%r6783,%r6784,%r6785,%r6786,%r6787,%r6788,%r6789,%r6790,%r6791,%r6792,%r6793,%r6794,%r6795,%r6796,%r6797,%r6798,%r6799,%r6800,%r6801,%r6802,%r6803,%r6804,%r6805,%r6806,%r6807,%r6808,%r6809,%r6810,%r6811,%r6812,%r6813,%r6814,%r6815,%r6816,%r6817,%r6818}, {%r1582,%r1583,%r1584,%r1585}, %rd91, %p2, 1, 1; 2026-02-21T10:15:14.4230816Z // end inline asm 2026-02-21T10:15:14.4230885Z // begin inline asm 2026-02-21T10:15:14.4233161Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r6691,%r6692,%r6693,%r6694,%r6695,%r6696,%r6697,%r6698,%r6699,%r6700,%r6701,%r6702,%r6703,%r6704,%r6705,%r6706,%r6707,%r6708,%r6709,%r6710,%r6711,%r6712,%r6713,%r6714,%r6715,%r6716,%r6717,%r6718,%r6719,%r6720,%r6721,%r6722,%r6723,%r6724,%r6725,%r6726,%r6727,%r6728,%r6729,%r6730,%r6731,%r6732,%r6733,%r6734,%r6735,%r6736,%r6737,%r6738,%r6739,%r6740,%r6741,%r6742,%r6743,%r6744,%r6745,%r6746,%r6747,%r6748,%r6749,%r6750,%r6751,%r6752,%r6753,%r6754,%r6755,%r6756,%r6757,%r6758,%r6759,%r6760,%r6761,%r6762,%r6763,%r6764,%r6765,%r6766,%r6767,%r6768,%r6769,%r6770,%r6771,%r6772,%r6773,%r6774,%r6775,%r6776,%r6777,%r6778,%r6779,%r6780,%r6781,%r6782,%r6783,%r6784,%r6785,%r6786,%r6787,%r6788,%r6789,%r6790,%r6791,%r6792,%r6793,%r6794,%r6795,%r6796,%r6797,%r6798,%r6799,%r6800,%r6801,%r6802,%r6803,%r6804,%r6805,%r6806,%r6807,%r6808,%r6809,%r6810,%r6811,%r6812,%r6813,%r6814,%r6815,%r6816,%r6817,%r6818}, {%r1842,%r1843,%r1844,%r1845}, %rd92, %p2, 1, 1; 2026-02-21T10:15:14.4233228Z // end inline asm 2026-02-21T10:15:14.4233298Z // begin inline asm 2026-02-21T10:15:14.4235570Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r6819,%r6820,%r6821,%r6822,%r6823,%r6824,%r6825,%r6826,%r6827,%r6828,%r6829,%r6830,%r6831,%r6832,%r6833,%r6834,%r6835,%r6836,%r6837,%r6838,%r6839,%r6840,%r6841,%r6842,%r6843,%r6844,%r6845,%r6846,%r6847,%r6848,%r6849,%r6850,%r6851,%r6852,%r6853,%r6854,%r6855,%r6856,%r6857,%r6858,%r6859,%r6860,%r6861,%r6862,%r6863,%r6864,%r6865,%r6866,%r6867,%r6868,%r6869,%r6870,%r6871,%r6872,%r6873,%r6874,%r6875,%r6876,%r6877,%r6878,%r6879,%r6880,%r6881,%r6882,%r6883,%r6884,%r6885,%r6886,%r6887,%r6888,%r6889,%r6890,%r6891,%r6892,%r6893,%r6894,%r6895,%r6896,%r6897,%r6898,%r6899,%r6900,%r6901,%r6902,%r6903,%r6904,%r6905,%r6906,%r6907,%r6908,%r6909,%r6910,%r6911,%r6912,%r6913,%r6914,%r6915,%r6916,%r6917,%r6918,%r6919,%r6920,%r6921,%r6922,%r6923,%r6924,%r6925,%r6926,%r6927,%r6928,%r6929,%r6930,%r6931,%r6932,%r6933,%r6934,%r6935,%r6936,%r6937,%r6938,%r6939,%r6940,%r6941,%r6942,%r6943,%r6944,%r6945,%r6946}, {%r2102,%r2103,%r2104,%r2105}, %rd89, %p2, 1, 1; 2026-02-21T10:15:14.4235768Z // end inline asm 2026-02-21T10:15:14.4235833Z // begin inline asm 2026-02-21T10:15:14.4238322Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r6819,%r6820,%r6821,%r6822,%r6823,%r6824,%r6825,%r6826,%r6827,%r6828,%r6829,%r6830,%r6831,%r6832,%r6833,%r6834,%r6835,%r6836,%r6837,%r6838,%r6839,%r6840,%r6841,%r6842,%r6843,%r6844,%r6845,%r6846,%r6847,%r6848,%r6849,%r6850,%r6851,%r6852,%r6853,%r6854,%r6855,%r6856,%r6857,%r6858,%r6859,%r6860,%r6861,%r6862,%r6863,%r6864,%r6865,%r6866,%r6867,%r6868,%r6869,%r6870,%r6871,%r6872,%r6873,%r6874,%r6875,%r6876,%r6877,%r6878,%r6879,%r6880,%r6881,%r6882,%r6883,%r6884,%r6885,%r6886,%r6887,%r6888,%r6889,%r6890,%r6891,%r6892,%r6893,%r6894,%r6895,%r6896,%r6897,%r6898,%r6899,%r6900,%r6901,%r6902,%r6903,%r6904,%r6905,%r6906,%r6907,%r6908,%r6909,%r6910,%r6911,%r6912,%r6913,%r6914,%r6915,%r6916,%r6917,%r6918,%r6919,%r6920,%r6921,%r6922,%r6923,%r6924,%r6925,%r6926,%r6927,%r6928,%r6929,%r6930,%r6931,%r6932,%r6933,%r6934,%r6935,%r6936,%r6937,%r6938,%r6939,%r6940,%r6941,%r6942,%r6943,%r6944,%r6945,%r6946}, {%r2362,%r2363,%r2364,%r2365}, %rd90, %p2, 1, 1; 2026-02-21T10:15:14.4238451Z // end inline asm 2026-02-21T10:15:14.4238519Z // begin inline asm 2026-02-21T10:15:14.4240792Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r6819,%r6820,%r6821,%r6822,%r6823,%r6824,%r6825,%r6826,%r6827,%r6828,%r6829,%r6830,%r6831,%r6832,%r6833,%r6834,%r6835,%r6836,%r6837,%r6838,%r6839,%r6840,%r6841,%r6842,%r6843,%r6844,%r6845,%r6846,%r6847,%r6848,%r6849,%r6850,%r6851,%r6852,%r6853,%r6854,%r6855,%r6856,%r6857,%r6858,%r6859,%r6860,%r6861,%r6862,%r6863,%r6864,%r6865,%r6866,%r6867,%r6868,%r6869,%r6870,%r6871,%r6872,%r6873,%r6874,%r6875,%r6876,%r6877,%r6878,%r6879,%r6880,%r6881,%r6882,%r6883,%r6884,%r6885,%r6886,%r6887,%r6888,%r6889,%r6890,%r6891,%r6892,%r6893,%r6894,%r6895,%r6896,%r6897,%r6898,%r6899,%r6900,%r6901,%r6902,%r6903,%r6904,%r6905,%r6906,%r6907,%r6908,%r6909,%r6910,%r6911,%r6912,%r6913,%r6914,%r6915,%r6916,%r6917,%r6918,%r6919,%r6920,%r6921,%r6922,%r6923,%r6924,%r6925,%r6926,%r6927,%r6928,%r6929,%r6930,%r6931,%r6932,%r6933,%r6934,%r6935,%r6936,%r6937,%r6938,%r6939,%r6940,%r6941,%r6942,%r6943,%r6944,%r6945,%r6946}, {%r2622,%r2623,%r2624,%r2625}, %rd91, %p2, 1, 1; 2026-02-21T10:15:14.4240860Z // end inline asm 2026-02-21T10:15:14.4240920Z // begin inline asm 2026-02-21T10:15:14.4243193Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r6819,%r6820,%r6821,%r6822,%r6823,%r6824,%r6825,%r6826,%r6827,%r6828,%r6829,%r6830,%r6831,%r6832,%r6833,%r6834,%r6835,%r6836,%r6837,%r6838,%r6839,%r6840,%r6841,%r6842,%r6843,%r6844,%r6845,%r6846,%r6847,%r6848,%r6849,%r6850,%r6851,%r6852,%r6853,%r6854,%r6855,%r6856,%r6857,%r6858,%r6859,%r6860,%r6861,%r6862,%r6863,%r6864,%r6865,%r6866,%r6867,%r6868,%r6869,%r6870,%r6871,%r6872,%r6873,%r6874,%r6875,%r6876,%r6877,%r6878,%r6879,%r6880,%r6881,%r6882,%r6883,%r6884,%r6885,%r6886,%r6887,%r6888,%r6889,%r6890,%r6891,%r6892,%r6893,%r6894,%r6895,%r6896,%r6897,%r6898,%r6899,%r6900,%r6901,%r6902,%r6903,%r6904,%r6905,%r6906,%r6907,%r6908,%r6909,%r6910,%r6911,%r6912,%r6913,%r6914,%r6915,%r6916,%r6917,%r6918,%r6919,%r6920,%r6921,%r6922,%r6923,%r6924,%r6925,%r6926,%r6927,%r6928,%r6929,%r6930,%r6931,%r6932,%r6933,%r6934,%r6935,%r6936,%r6937,%r6938,%r6939,%r6940,%r6941,%r6942,%r6943,%r6944,%r6945,%r6946}, {%r2882,%r2883,%r2884,%r2885}, %rd92, %p2, 1, 1; 2026-02-21T10:15:14.4243264Z // end inline asm 2026-02-21T10:15:14.4243359Z wgmma.commit_group.sync.aligned; 2026-02-21T10:15:14.4243428Z mov.b32 %r5742, 0; 2026-02-21T10:15:14.4243495Z mov.b32 %r3142, %r3401; 2026-02-21T10:15:14.4243557Z mov.b32 %r3143, %r5742; 2026-02-21T10:15:14.4243618Z mov.b32 %r3144, %r5742; 2026-02-21T10:15:14.4243765Z // begin inline asm 2026-02-21T10:15:14.4248195Z // wait for regs: 
%r6691,%r6692,%r6693,%r6694,%r6695,%r6696,%r6697,%r6698,%r6699,%r6700,%r6701,%r6702,%r6703,%r6704,%r6705,%r6706,%r6707,%r6708,%r6709,%r6710,%r6711,%r6712,%r6713,%r6714,%r6715,%r6716,%r6717,%r6718,%r6719,%r6720,%r6721,%r6722,%r6723,%r6724,%r6725,%r6726,%r6727,%r6728,%r6729,%r6730,%r6731,%r6732,%r6733,%r6734,%r6735,%r6736,%r6737,%r6738,%r6739,%r6740,%r6741,%r6742,%r6743,%r6744,%r6745,%r6746,%r6747,%r6748,%r6749,%r6750,%r6751,%r6752,%r6753,%r6754,%r6755,%r6756,%r6757,%r6758,%r6759,%r6760,%r6761,%r6762,%r6763,%r6764,%r6765,%r6766,%r6767,%r6768,%r6769,%r6770,%r6771,%r6772,%r6773,%r6774,%r6775,%r6776,%r6777,%r6778,%r6779,%r6780,%r6781,%r6782,%r6783,%r6784,%r6785,%r6786,%r6787,%r6788,%r6789,%r6790,%r6791,%r6792,%r6793,%r6794,%r6795,%r6796,%r6797,%r6798,%r6799,%r6800,%r6801,%r6802,%r6803,%r6804,%r6805,%r6806,%r6807,%r6808,%r6809,%r6810,%r6811,%r6812,%r6813,%r6814,%r6815,%r6816,%r6817,%r6818,%r6819,%r6820,%r6821,%r6822,%r6823,%r6824,%r6825,%r6826,%r6827,%r6828,%r6829,%r6830,%r6831,%r6832,%r6833,%r6834,%r6835,%r6836,%r6837,%r6838,%r6839,%r6840,%r6841,%r6842,%r6843,%r6844,%r6845,%r6846,%r6847,%r6848,%r6849,%r6850,%r6851,%r6852,%r6853,%r6854,%r6855,%r6856,%r6857,%r6858,%r6859,%r6860,%r6861,%r6862,%r6863,%r6864,%r6865,%r6866,%r6867,%r6868,%r6869,%r6870,%r6871,%r6872,%r6873,%r6874,%r6875,%r6876,%r6877,%r6878,%r6879,%r6880,%r6881,%r6882,%r6883,%r6884,%r6885,%r6886,%r6887,%r6888,%r6889,%r6890,%r6891,%r6892,%r6893,%r6894,%r6895,%r6896,%r6897,%r6898,%r6899,%r6900,%r6901,%r6902,%r6903,%r6904,%r6905,%r6906,%r6907,%r6908,%r6909,%r6910,%r6911,%r6912,%r6913,%r6914,%r6915,%r6916,%r6917,%r6918,%r6919,%r6920,%r6921,%r6922,%r6923,%r6924,%r6925,%r6926,%r6927,%r6928,%r6929,%r6930,%r6931,%r6932,%r6933,%r6934,%r6935,%r6936,%r6937,%r6938,%r6939,%r6940,%r6941,%r6942,%r6943,%r6944,%r6945,%r6946,%r3142,%r3143,%r3144 2026-02-21T10:15:14.4248297Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:15:14.4248356Z // end inline asm 2026-02-21T10:15:14.4248423Z $L__tmp1: 2026-02-21T10:15:14.4248655Z .loc 1 45 
80 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:45:80 2026-02-21T10:15:14.4248725Z add.s32 %r6123, %r3401, 49152; 2026-02-21T10:15:14.4248797Z add.s32 %r6124, %r6123, %r6044; 2026-02-21T10:15:14.4249022Z .loc 1 49 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:49:32 2026-02-21T10:15:14.4249092Z add.s32 %r6125, %r6124, %r94; 2026-02-21T10:15:14.4249168Z ld.shared.b16 %rs193, [%r6125]; 2026-02-21T10:15:14.4249243Z ld.shared.b16 %rs194, [%r6125+512]; 2026-02-21T10:15:14.4249314Z ld.shared.b16 %rs195, [%r6125+32]; 2026-02-21T10:15:14.4249382Z ld.shared.b16 %rs196, [%r6125+544]; 2026-02-21T10:15:14.4249459Z ld.shared.b16 %rs197, [%r6125+4096]; 2026-02-21T10:15:14.4249527Z ld.shared.b16 %rs198, [%r6125+4608]; 2026-02-21T10:15:14.4249596Z ld.shared.b16 %rs199, [%r6125+4128]; 2026-02-21T10:15:14.4249668Z ld.shared.b16 %rs200, [%r6125+4640]; 2026-02-21T10:15:14.4249734Z add.s32 %r6126, %r6124, %r95; 2026-02-21T10:15:14.4249804Z ld.shared.b16 %rs201, [%r6126]; 2026-02-21T10:15:14.4249873Z ld.shared.b16 %rs202, [%r6126+512]; 2026-02-21T10:15:14.4249949Z ld.shared.b16 %rs203, [%r6126+32]; 2026-02-21T10:15:14.4255573Z ld.shared.b16 %rs204, [%r6126+544]; 2026-02-21T10:15:14.4255700Z ld.shared.b16 %rs205, [%r6126+4096]; 2026-02-21T10:15:14.4255780Z ld.shared.b16 %rs206, [%r6126+4608]; 2026-02-21T10:15:14.4255858Z ld.shared.b16 %rs207, [%r6126+4128]; 2026-02-21T10:15:14.4255936Z ld.shared.b16 %rs208, [%r6126+4640]; 2026-02-21T10:15:14.4256006Z add.s32 %r6127, %r6124, %r96; 2026-02-21T10:15:14.4256078Z ld.shared.b16 %rs209, [%r6127]; 2026-02-21T10:15:14.4256156Z ld.shared.b16 %rs210, [%r6127+512]; 2026-02-21T10:15:14.4256229Z ld.shared.b16 %rs211, [%r6127+32]; 2026-02-21T10:15:14.4256299Z ld.shared.b16 %rs212, [%r6127+544]; 2026-02-21T10:15:14.4256372Z ld.shared.b16 %rs213, [%r6127+4096]; 2026-02-21T10:15:14.4256642Z ld.shared.b16 %rs214, [%r6127+4608]; 2026-02-21T10:15:14.4256893Z ld.shared.b16 %rs215, [%r6127+4128]; 2026-02-21T10:15:14.4257045Z 
ld.shared.b16 %rs216, [%r6127+4640]; 2026-02-21T10:15:14.4257120Z add.s32 %r6128, %r6124, %r97; 2026-02-21T10:15:14.4257190Z ld.shared.b16 %rs217, [%r6128]; 2026-02-21T10:15:14.4257258Z ld.shared.b16 %rs218, [%r6128+512]; 2026-02-21T10:15:14.4257330Z ld.shared.b16 %rs219, [%r6128+32]; 2026-02-21T10:15:14.4257400Z ld.shared.b16 %rs220, [%r6128+544]; 2026-02-21T10:15:14.4257471Z ld.shared.b16 %rs221, [%r6128+4096]; 2026-02-21T10:15:14.4257538Z ld.shared.b16 %rs222, [%r6128+4608]; 2026-02-21T10:15:14.4257610Z ld.shared.b16 %rs223, [%r6128+4128]; 2026-02-21T10:15:14.4257677Z ld.shared.b16 %rs224, [%r6128+4640]; 2026-02-21T10:15:14.4257750Z cvt.f32.bf16 %r3660, %rs193; 2026-02-21T10:15:14.4257821Z cvt.f32.bf16 %r3661, %rs194; 2026-02-21T10:15:14.4257894Z cvt.f32.bf16 %r3662, %rs201; 2026-02-21T10:15:14.4257957Z cvt.f32.bf16 %r3663, %rs202; 2026-02-21T10:15:14.4258020Z cvt.f32.bf16 %r3920, %rs209; 2026-02-21T10:15:14.4258095Z cvt.f32.bf16 %r3921, %rs210; 2026-02-21T10:15:14.4258163Z cvt.f32.bf16 %r3922, %rs217; 2026-02-21T10:15:14.4258223Z cvt.f32.bf16 %r3923, %rs218; 2026-02-21T10:15:14.4258375Z cvt.f32.bf16 %r4180, %rs195; 2026-02-21T10:15:14.4258449Z cvt.f32.bf16 %r4181, %rs196; 2026-02-21T10:15:14.4258516Z cvt.f32.bf16 %r4182, %rs203; 2026-02-21T10:15:14.4258581Z cvt.f32.bf16 %r4183, %rs204; 2026-02-21T10:15:14.4258713Z cvt.f32.bf16 %r4440, %rs211; 2026-02-21T10:15:14.4258778Z cvt.f32.bf16 %r4441, %rs212; 2026-02-21T10:15:14.4258840Z cvt.f32.bf16 %r4442, %rs219; 2026-02-21T10:15:14.4258912Z cvt.f32.bf16 %r4443, %rs220; 2026-02-21T10:15:14.4258974Z cvt.f32.bf16 %r4700, %rs197; 2026-02-21T10:15:14.4259036Z cvt.f32.bf16 %r4701, %rs198; 2026-02-21T10:15:14.4259106Z cvt.f32.bf16 %r4702, %rs205; 2026-02-21T10:15:14.4259169Z cvt.f32.bf16 %r4703, %rs206; 2026-02-21T10:15:14.4259242Z cvt.f32.bf16 %r4960, %rs213; 2026-02-21T10:15:14.4259305Z cvt.f32.bf16 %r4961, %rs214; 2026-02-21T10:15:14.4259378Z cvt.f32.bf16 %r4962, %rs221; 2026-02-21T10:15:14.4259441Z cvt.f32.bf16 %r4963, 
%rs222; 2026-02-21T10:15:14.4259504Z cvt.f32.bf16 %r5220, %rs199; 2026-02-21T10:15:14.4259574Z cvt.f32.bf16 %r5221, %rs200; 2026-02-21T10:15:14.4259639Z cvt.f32.bf16 %r5222, %rs207; 2026-02-21T10:15:14.4259700Z cvt.f32.bf16 %r5223, %rs208; 2026-02-21T10:15:14.4259762Z cvt.f32.bf16 %r5480, %rs215; 2026-02-21T10:15:14.4259833Z cvt.f32.bf16 %r5481, %rs216; 2026-02-21T10:15:14.4259896Z cvt.f32.bf16 %r5482, %rs223; 2026-02-21T10:15:14.4259958Z cvt.f32.bf16 %r5483, %rs224; 2026-02-21T10:15:14.4260211Z .loc 1 51 87 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:51:87 2026-02-21T10:15:14.4260286Z add.s32 %r6129, %r3401, 73728; 2026-02-21T10:15:14.4260351Z add.s32 %r6130, %r6129, %r6043; 2026-02-21T10:15:14.4260586Z .loc 1 64 45 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:64:45 2026-02-21T10:15:14.4260657Z add.s32 %r6131, %r6130, %r45; 2026-02-21T10:15:14.4260724Z add.s32 %r6132, %r6130, %r98; 2026-02-21T10:15:14.4260787Z add.s32 %r6133, %r6130, %r99; 2026-02-21T10:15:14.4260864Z add.s32 %r6134, %r6130, %r100; 2026-02-21T10:15:14.4260926Z add.s32 %r6135, %r6130, %r101; 2026-02-21T10:15:14.4261155Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4261230Z ld.shared.s8 %rs225, [%r6131]; 2026-02-21T10:15:14.4261453Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4261521Z shl.b16 %rs226, %rs225, 4; 2026-02-21T10:15:14.4261741Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4261813Z ld.shared.s8 %rs227, [%r6131+128]; 2026-02-21T10:15:14.4262019Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4262184Z shl.b16 %rs228, %rs227, 4; 2026-02-21T10:15:14.4262406Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4262523Z ld.shared.s8 %rs229, [%r6131+256]; 2026-02-21T10:15:14.4262728Z .loc 1 54 28 // 
cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4262800Z shl.b16 %rs230, %rs229, 4; 2026-02-21T10:15:14.4263005Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4263071Z ld.shared.s8 %rs231, [%r6131+384]; 2026-02-21T10:15:14.4263280Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4263344Z shl.b16 %rs232, %rs231, 4; 2026-02-21T10:15:14.4263545Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4263617Z ld.shared.s8 %rs233, [%r6131+512]; 2026-02-21T10:15:14.4263820Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4263882Z shl.b16 %rs234, %rs233, 4; 2026-02-21T10:15:14.4264135Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4264208Z ld.shared.s8 %rs235, [%r6131+640]; 2026-02-21T10:15:14.4264456Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4264521Z shl.b16 %rs236, %rs235, 4; 2026-02-21T10:15:14.4264730Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4264798Z ld.shared.s8 %rs237, [%r6131+768]; 2026-02-21T10:15:14.4265000Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4265070Z shl.b16 %rs238, %rs237, 4; 2026-02-21T10:15:14.4265274Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4265346Z ld.shared.s8 %rs239, [%r6132]; 2026-02-21T10:15:14.4265561Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4265624Z shl.b16 %rs240, %rs239, 4; 2026-02-21T10:15:14.4265828Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4265903Z ld.shared.s8 %rs241, [%r6131+1024]; 2026-02-21T10:15:14.4266113Z .loc 1 54 28 
// cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4266177Z shl.b16 %rs242, %rs241, 4; 2026-02-21T10:15:14.4266381Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4266639Z ld.shared.s8 %rs243, [%r6131+1152]; 2026-02-21T10:15:14.4266859Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4266929Z shl.b16 %rs244, %rs243, 4; 2026-02-21T10:15:14.4267144Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4267218Z ld.shared.s8 %rs245, [%r6131+1280]; 2026-02-21T10:15:14.4267424Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4267491Z shl.b16 %rs246, %rs245, 4; 2026-02-21T10:15:14.4267697Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4267765Z ld.shared.s8 %rs247, [%r6131+1408]; 2026-02-21T10:15:14.4267968Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4268040Z shl.b16 %rs248, %rs247, 4; 2026-02-21T10:15:14.4268243Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4268310Z ld.shared.s8 %rs249, [%r6131+1536]; 2026-02-21T10:15:14.4268712Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4268847Z shl.b16 %rs250, %rs249, 4; 2026-02-21T10:15:14.4269066Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4269143Z ld.shared.s8 %rs251, [%r6131+1664]; 2026-02-21T10:15:14.4269348Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4269412Z shl.b16 %rs252, %rs251, 4; 2026-02-21T10:15:14.4269621Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4269687Z ld.shared.s8 %rs253, [%r6131+1792]; 2026-02-21T10:15:14.4269885Z 
.loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4269952Z shl.b16 %rs254, %rs253, 4; 2026-02-21T10:15:14.4270155Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4270229Z ld.shared.s8 %rs255, [%r6133]; 2026-02-21T10:15:14.4270511Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4270586Z shl.b16 %rs256, %rs255, 4; 2026-02-21T10:15:14.4270791Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4270919Z ld.shared.s8 %rs257, [%r6131+2048]; 2026-02-21T10:15:14.4271132Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4271195Z shl.b16 %rs258, %rs257, 4; 2026-02-21T10:15:14.4271398Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4271472Z ld.shared.s8 %rs259, [%r6131+2176]; 2026-02-21T10:15:14.4271684Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4271752Z shl.b16 %rs260, %rs259, 4; 2026-02-21T10:15:14.4271970Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4272038Z ld.shared.s8 %rs261, [%r6131+2304]; 2026-02-21T10:15:14.4272239Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4272302Z shl.b16 %rs262, %rs261, 4; 2026-02-21T10:15:14.4272514Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4272584Z ld.shared.s8 %rs263, [%r6131+2432]; 2026-02-21T10:15:14.4272796Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4272863Z shl.b16 %rs264, %rs263, 4; 2026-02-21T10:15:14.4273064Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4273133Z ld.shared.s8 %rs265, [%r6131+2560]; 
2026-02-21T10:15:14.4273343Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4273409Z shl.b16 %rs266, %rs265, 4; 2026-02-21T10:15:14.4273613Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4273690Z ld.shared.s8 %rs267, [%r6131+2688]; 2026-02-21T10:15:14.4273895Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4273958Z shl.b16 %rs268, %rs267, 4; 2026-02-21T10:15:14.4274161Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4274236Z ld.shared.s8 %rs269, [%r6131+2816]; 2026-02-21T10:15:14.4274437Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4274511Z shl.b16 %rs270, %rs269, 4; 2026-02-21T10:15:14.4274782Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4274896Z ld.shared.s8 %rs271, [%r6134]; 2026-02-21T10:15:14.4275099Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4275168Z shl.b16 %rs272, %rs271, 4; 2026-02-21T10:15:14.4275372Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4275442Z ld.shared.s8 %rs273, [%r6131+3072]; 2026-02-21T10:15:14.4275650Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4275712Z shl.b16 %rs274, %rs273, 4; 2026-02-21T10:15:14.4275915Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4275982Z ld.shared.s8 %rs275, [%r6131+3200]; 2026-02-21T10:15:14.4276191Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4276255Z shl.b16 %rs276, %rs275, 4; 2026-02-21T10:15:14.4276665Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4276752Z ld.shared.s8 %rs277, 
[%r6131+3328]; 2026-02-21T10:15:14.4276978Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4277105Z shl.b16 %rs278, %rs277, 4; 2026-02-21T10:15:14.4277335Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4277405Z ld.shared.s8 %rs279, [%r6131+3456]; 2026-02-21T10:15:14.4277607Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4277674Z shl.b16 %rs280, %rs279, 4; 2026-02-21T10:15:14.4277876Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4277947Z ld.shared.s8 %rs281, [%r6131+3584]; 2026-02-21T10:15:14.4278154Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4278223Z shl.b16 %rs282, %rs281, 4; 2026-02-21T10:15:14.4278426Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4278493Z ld.shared.s8 %rs283, [%r6131+3712]; 2026-02-21T10:15:14.4278705Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4278768Z shl.b16 %rs284, %rs283, 4; 2026-02-21T10:15:14.4278971Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4279043Z ld.shared.s8 %rs285, [%r6131+3840]; 2026-02-21T10:15:14.4279243Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4279307Z shl.b16 %rs286, %rs285, 4; 2026-02-21T10:15:14.4279515Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4279587Z ld.shared.s8 %rs287, [%r6135]; 2026-02-21T10:15:14.4279789Z .loc 1 54 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:54:28 2026-02-21T10:15:14.4279853Z shl.b16 %rs288, %rs287, 4; 2026-02-21T10:15:14.4280073Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4280140Z cvt.s16.s8 
%rs289, %rs226; 2026-02-21T10:15:14.4280202Z shr.s16 %rs290, %rs289, 4; 2026-02-21T10:15:14.4280270Z cvt.s16.s8 %rs291, %rs230; 2026-02-21T10:15:14.4280332Z shr.s16 %rs292, %rs291, 4; 2026-02-21T10:15:14.4280394Z shr.s16 %rs293, %rs225, 4; 2026-02-21T10:15:14.4280461Z shr.s16 %rs294, %rs229, 4; 2026-02-21T10:15:14.4280664Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4280807Z cvt.rn.f32.s16 %r6136, %rs294; 2026-02-21T10:15:14.4280943Z cvt.rn.f32.s16 %r6137, %rs293; 2026-02-21T10:15:14.4281012Z cvt.rn.f32.s16 %r6138, %rs292; 2026-02-21T10:15:14.4281075Z cvt.rn.f32.s16 %r6139, %rs290; 2026-02-21T10:15:14.4281291Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4281364Z cvt.s16.s8 %rs295, %rs228; 2026-02-21T10:15:14.4281428Z shr.s16 %rs296, %rs295, 4; 2026-02-21T10:15:14.4281489Z cvt.s16.s8 %rs297, %rs232; 2026-02-21T10:15:14.4281557Z shr.s16 %rs298, %rs297, 4; 2026-02-21T10:15:14.4281618Z shr.s16 %rs299, %rs227, 4; 2026-02-21T10:15:14.4281679Z shr.s16 %rs300, %rs231, 4; 2026-02-21T10:15:14.4281886Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4281955Z cvt.rn.f32.s16 %r6140, %rs300; 2026-02-21T10:15:14.4282018Z cvt.rn.f32.s16 %r6141, %rs299; 2026-02-21T10:15:14.4282080Z cvt.rn.f32.s16 %r6142, %rs298; 2026-02-21T10:15:14.4282151Z cvt.rn.f32.s16 %r6143, %rs296; 2026-02-21T10:15:14.4282416Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4282481Z cvt.s16.s8 %rs301, %rs234; 2026-02-21T10:15:14.4282543Z shr.s16 %rs302, %rs301, 4; 2026-02-21T10:15:14.4282610Z cvt.s16.s8 %rs303, %rs238; 2026-02-21T10:15:14.4282671Z shr.s16 %rs304, %rs303, 4; 2026-02-21T10:15:14.4282795Z shr.s16 %rs305, %rs233, 4; 2026-02-21T10:15:14.4282871Z shr.s16 %rs306, %rs237, 4; 2026-02-21T10:15:14.4283074Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 
2026-02-21T10:15:14.4283141Z cvt.rn.f32.s16 %r6144, %rs306; 2026-02-21T10:15:14.4283210Z cvt.rn.f32.s16 %r6145, %rs305; 2026-02-21T10:15:14.4283272Z cvt.rn.f32.s16 %r6146, %rs304; 2026-02-21T10:15:14.4283335Z cvt.rn.f32.s16 %r6147, %rs302; 2026-02-21T10:15:14.4283537Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4283611Z cvt.s16.s8 %rs307, %rs236; 2026-02-21T10:15:14.4283674Z shr.s16 %rs308, %rs307, 4; 2026-02-21T10:15:14.4283738Z cvt.s16.s8 %rs309, %rs240; 2026-02-21T10:15:14.4283820Z shr.s16 %rs310, %rs309, 4; 2026-02-21T10:15:14.4283883Z shr.s16 %rs311, %rs235, 4; 2026-02-21T10:15:14.4283946Z shr.s16 %rs312, %rs239, 4; 2026-02-21T10:15:14.4284155Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4284227Z cvt.rn.f32.s16 %r6148, %rs312; 2026-02-21T10:15:14.4284291Z cvt.rn.f32.s16 %r6149, %rs311; 2026-02-21T10:15:14.4284353Z cvt.rn.f32.s16 %r6150, %rs310; 2026-02-21T10:15:14.4284425Z cvt.rn.f32.s16 %r6151, %rs308; 2026-02-21T10:15:14.4284628Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4284691Z cvt.s16.s8 %rs313, %rs242; 2026-02-21T10:15:14.4284762Z shr.s16 %rs314, %rs313, 4; 2026-02-21T10:15:14.4284829Z cvt.s16.s8 %rs315, %rs246; 2026-02-21T10:15:14.4284894Z shr.s16 %rs316, %rs315, 4; 2026-02-21T10:15:14.4284959Z shr.s16 %rs317, %rs241, 4; 2026-02-21T10:15:14.4285028Z shr.s16 %rs318, %rs245, 4; 2026-02-21T10:15:14.4285232Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4285296Z cvt.rn.f32.s16 %r6152, %rs318; 2026-02-21T10:15:14.4285370Z cvt.rn.f32.s16 %r6153, %rs317; 2026-02-21T10:15:14.4285437Z cvt.rn.f32.s16 %r6154, %rs316; 2026-02-21T10:15:14.4285500Z cvt.rn.f32.s16 %r6155, %rs314; 2026-02-21T10:15:14.4285710Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4285771Z cvt.s16.s8 %rs319, %rs244; 
2026-02-21T10:15:14.4285843Z shr.s16 %rs320, %rs319, 4; 2026-02-21T10:15:14.4285906Z cvt.s16.s8 %rs321, %rs248; 2026-02-21T10:15:14.4285976Z shr.s16 %rs322, %rs321, 4; 2026-02-21T10:15:14.4286039Z shr.s16 %rs323, %rs243, 4; 2026-02-21T10:15:14.4286161Z shr.s16 %rs324, %rs247, 4; 2026-02-21T10:15:14.4286418Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4286593Z cvt.rn.f32.s16 %r6156, %rs324; 2026-02-21T10:15:14.4286661Z cvt.rn.f32.s16 %r6157, %rs323; 2026-02-21T10:15:14.4286724Z cvt.rn.f32.s16 %r6158, %rs322; 2026-02-21T10:15:14.4286795Z cvt.rn.f32.s16 %r6159, %rs320; 2026-02-21T10:15:14.4287000Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4287065Z cvt.s16.s8 %rs325, %rs250; 2026-02-21T10:15:14.4287132Z shr.s16 %rs326, %rs325, 4; 2026-02-21T10:15:14.4287206Z cvt.s16.s8 %rs327, %rs254; 2026-02-21T10:15:14.4287268Z shr.s16 %rs328, %rs327, 4; 2026-02-21T10:15:14.4287335Z shr.s16 %rs329, %rs249, 4; 2026-02-21T10:15:14.4287395Z shr.s16 %rs330, %rs253, 4; 2026-02-21T10:15:14.4287598Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4287663Z cvt.rn.f32.s16 %r6160, %rs330; 2026-02-21T10:15:14.4287734Z cvt.rn.f32.s16 %r6161, %rs329; 2026-02-21T10:15:14.4287878Z cvt.rn.f32.s16 %r6162, %rs328; 2026-02-21T10:15:14.4287946Z cvt.rn.f32.s16 %r6163, %rs326; 2026-02-21T10:15:14.4288158Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4288222Z cvt.s16.s8 %rs331, %rs252; 2026-02-21T10:15:14.4288348Z shr.s16 %rs332, %rs331, 4; 2026-02-21T10:15:14.4288424Z cvt.s16.s8 %rs333, %rs256; 2026-02-21T10:15:14.4288486Z shr.s16 %rs334, %rs333, 4; 2026-02-21T10:15:14.4288547Z shr.s16 %rs335, %rs251, 4; 2026-02-21T10:15:14.4288607Z shr.s16 %rs336, %rs255, 4; 2026-02-21T10:15:14.4288817Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4288881Z 
cvt.rn.f32.s16 %r6164, %rs336; 2026-02-21T10:15:14.4288946Z cvt.rn.f32.s16 %r6165, %rs335; 2026-02-21T10:15:14.4289016Z cvt.rn.f32.s16 %r6166, %rs334; 2026-02-21T10:15:14.4289080Z cvt.rn.f32.s16 %r6167, %rs332; 2026-02-21T10:15:14.4289288Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4289352Z cvt.s16.s8 %rs337, %rs258; 2026-02-21T10:15:14.4289421Z shr.s16 %rs338, %rs337, 4; 2026-02-21T10:15:14.4289482Z cvt.s16.s8 %rs339, %rs262; 2026-02-21T10:15:14.4289556Z shr.s16 %rs340, %rs339, 4; 2026-02-21T10:15:14.4289628Z shr.s16 %rs341, %rs257, 4; 2026-02-21T10:15:14.4289686Z shr.s16 %rs342, %rs261, 4; 2026-02-21T10:15:14.4289900Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4289964Z cvt.rn.f32.s16 %r6168, %rs342; 2026-02-21T10:15:14.4290026Z cvt.rn.f32.s16 %r6169, %rs341; 2026-02-21T10:15:14.4290092Z cvt.rn.f32.s16 %r6170, %rs340; 2026-02-21T10:15:14.4290155Z cvt.rn.f32.s16 %r6171, %rs338; 2026-02-21T10:15:14.4290357Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4290422Z cvt.s16.s8 %rs343, %rs260; 2026-02-21T10:15:14.4290491Z shr.s16 %rs344, %rs343, 4; 2026-02-21T10:15:14.4290553Z cvt.s16.s8 %rs345, %rs264; 2026-02-21T10:15:14.4290613Z shr.s16 %rs346, %rs345, 4; 2026-02-21T10:15:14.4290679Z shr.s16 %rs347, %rs259, 4; 2026-02-21T10:15:14.4290739Z shr.s16 %rs348, %rs263, 4; 2026-02-21T10:15:14.4290943Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4291012Z cvt.rn.f32.s16 %r6172, %rs348; 2026-02-21T10:15:14.4291074Z cvt.rn.f32.s16 %r6173, %rs347; 2026-02-21T10:15:14.4291137Z cvt.rn.f32.s16 %r6174, %rs346; 2026-02-21T10:15:14.4291200Z cvt.rn.f32.s16 %r6175, %rs344; 2026-02-21T10:15:14.4291408Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4291469Z cvt.s16.s8 %rs349, %rs266; 2026-02-21T10:15:14.4291618Z shr.s16 %rs350, 
%rs349, 4; 2026-02-21T10:15:14.4291687Z cvt.s16.s8 %rs351, %rs270; 2026-02-21T10:15:14.4291812Z shr.s16 %rs352, %rs351, 4; 2026-02-21T10:15:14.4291878Z shr.s16 %rs353, %rs265, 4; 2026-02-21T10:15:14.4291941Z shr.s16 %rs354, %rs269, 4; 2026-02-21T10:15:14.4292169Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4292234Z cvt.rn.f32.s16 %r6176, %rs354; 2026-02-21T10:15:14.4292300Z cvt.rn.f32.s16 %r6177, %rs353; 2026-02-21T10:15:14.4292369Z cvt.rn.f32.s16 %r6178, %rs352; 2026-02-21T10:15:14.4292432Z cvt.rn.f32.s16 %r6179, %rs350; 2026-02-21T10:15:14.4292634Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4292702Z cvt.s16.s8 %rs355, %rs268; 2026-02-21T10:15:14.4292764Z shr.s16 %rs356, %rs355, 4; 2026-02-21T10:15:14.4292828Z cvt.s16.s8 %rs357, %rs272; 2026-02-21T10:15:14.4292889Z shr.s16 %rs358, %rs357, 4; 2026-02-21T10:15:14.4292959Z shr.s16 %rs359, %rs267, 4; 2026-02-21T10:15:14.4293022Z shr.s16 %rs360, %rs271, 4; 2026-02-21T10:15:14.4293284Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4293362Z cvt.rn.f32.s16 %r6180, %rs360; 2026-02-21T10:15:14.4293426Z cvt.rn.f32.s16 %r6181, %rs359; 2026-02-21T10:15:14.4293492Z cvt.rn.f32.s16 %r6182, %rs358; 2026-02-21T10:15:14.4293601Z cvt.rn.f32.s16 %r6183, %rs356; 2026-02-21T10:15:14.4293818Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4293881Z cvt.s16.s8 %rs361, %rs274; 2026-02-21T10:15:14.4293944Z shr.s16 %rs362, %rs361, 4; 2026-02-21T10:15:14.4294012Z cvt.s16.s8 %rs363, %rs278; 2026-02-21T10:15:14.4294073Z shr.s16 %rs364, %rs363, 4; 2026-02-21T10:15:14.4294134Z shr.s16 %rs365, %rs273, 4; 2026-02-21T10:15:14.4294201Z shr.s16 %rs366, %rs277, 4; 2026-02-21T10:15:14.4294405Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4294473Z cvt.rn.f32.s16 %r6184, %rs366; 
2026-02-21T10:15:14.4294536Z cvt.rn.f32.s16 %r6185, %rs365; 2026-02-21T10:15:14.4294604Z cvt.rn.f32.s16 %r6186, %rs364; 2026-02-21T10:15:14.4294666Z cvt.rn.f32.s16 %r6187, %rs362; 2026-02-21T10:15:14.4294867Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4294940Z cvt.s16.s8 %rs367, %rs276; 2026-02-21T10:15:14.4295002Z shr.s16 %rs368, %rs367, 4; 2026-02-21T10:15:14.4295064Z cvt.s16.s8 %rs369, %rs280; 2026-02-21T10:15:14.4295129Z shr.s16 %rs370, %rs369, 4; 2026-02-21T10:15:14.4295189Z shr.s16 %rs371, %rs275, 4; 2026-02-21T10:15:14.4295249Z shr.s16 %rs372, %rs279, 4; 2026-02-21T10:15:14.4295449Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4295519Z cvt.rn.f32.s16 %r6188, %rs372; 2026-02-21T10:15:14.4295583Z cvt.rn.f32.s16 %r6189, %rs371; 2026-02-21T10:15:14.4295645Z cvt.rn.f32.s16 %r6190, %rs370; 2026-02-21T10:15:14.4295715Z cvt.rn.f32.s16 %r6191, %rs368; 2026-02-21T10:15:14.4295918Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4295979Z cvt.s16.s8 %rs373, %rs282; 2026-02-21T10:15:14.4296040Z shr.s16 %rs374, %rs373, 4; 2026-02-21T10:15:14.4296106Z cvt.s16.s8 %rs375, %rs286; 2026-02-21T10:15:14.4296167Z shr.s16 %rs376, %rs375, 4; 2026-02-21T10:15:14.4296228Z shr.s16 %rs377, %rs281, 4; 2026-02-21T10:15:14.4296295Z shr.s16 %rs378, %rs285, 4; 2026-02-21T10:15:14.4296624Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4296690Z cvt.rn.f32.s16 %r6192, %rs378; 2026-02-21T10:15:14.4296756Z cvt.rn.f32.s16 %r6193, %rs377; 2026-02-21T10:15:14.4296817Z cvt.rn.f32.s16 %r6194, %rs376; 2026-02-21T10:15:14.4296880Z cvt.rn.f32.s16 %r6195, %rs374; 2026-02-21T10:15:14.4297161Z .loc 1 56 25 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:56:25 2026-02-21T10:15:14.4297294Z cvt.s16.s8 %rs379, %rs284; 2026-02-21T10:15:14.4297359Z shr.s16 %rs380, %rs379, 4; 
2026-02-21T10:15:14.4297419Z cvt.s16.s8 %rs381, %rs288; 2026-02-21T10:15:14.4297485Z shr.s16 %rs382, %rs381, 4; 2026-02-21T10:15:14.4297546Z shr.s16 %rs383, %rs283, 4; 2026-02-21T10:15:14.4297607Z shr.s16 %rs384, %rs287, 4; 2026-02-21T10:15:14.4297815Z .loc 1 74 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:74:32 2026-02-21T10:15:14.4297880Z cvt.rn.f32.s16 %r6196, %rs384; 2026-02-21T10:15:14.4297943Z cvt.rn.f32.s16 %r6197, %rs383; 2026-02-21T10:15:14.4298005Z cvt.rn.f32.s16 %r6198, %rs382; 2026-02-21T10:15:14.4298072Z cvt.rn.f32.s16 %r6199, %rs380; 2026-02-21T10:15:14.4298129Z bar.sync 0; 2026-02-21T10:15:14.4298252Z st.shared.v4.b32 [%r102], {%r6139, %r6137, %r6138, %r6136}; 2026-02-21T10:15:14.4298383Z st.shared.v4.b32 [%r102+16384], {%r6143, %r6141, %r6142, %r6140}; 2026-02-21T10:15:14.4298500Z st.shared.v4.b32 [%r103], {%r6147, %r6145, %r6146, %r6144}; 2026-02-21T10:15:14.4298698Z st.shared.v4.b32 [%r103+16384], {%r6151, %r6149, %r6150, %r6148}; 2026-02-21T10:15:14.4298813Z st.shared.v4.b32 [%r104], {%r6155, %r6153, %r6154, %r6152}; 2026-02-21T10:15:14.4298927Z st.shared.v4.b32 [%r104+16384], {%r6159, %r6157, %r6158, %r6156}; 2026-02-21T10:15:14.4299096Z st.shared.v4.b32 [%r105], {%r6163, %r6161, %r6162, %r6160}; 2026-02-21T10:15:14.4299236Z st.shared.v4.b32 [%r105+16384], {%r6167, %r6165, %r6166, %r6164}; 2026-02-21T10:15:14.4299344Z st.shared.v4.b32 [%r106], {%r6171, %r6169, %r6170, %r6168}; 2026-02-21T10:15:14.4299459Z st.shared.v4.b32 [%r106+16384], {%r6175, %r6173, %r6174, %r6172}; 2026-02-21T10:15:14.4299569Z st.shared.v4.b32 [%r107], {%r6179, %r6177, %r6178, %r6176}; 2026-02-21T10:15:14.4299682Z st.shared.v4.b32 [%r107+16384], {%r6183, %r6181, %r6182, %r6180}; 2026-02-21T10:15:14.4299789Z st.shared.v4.b32 [%r108], {%r6187, %r6185, %r6186, %r6184}; 2026-02-21T10:15:14.4299909Z st.shared.v4.b32 [%r108+16384], {%r6191, %r6189, %r6190, %r6188}; 2026-02-21T10:15:14.4300014Z st.shared.v4.b32 [%r109], {%r6195, %r6193, %r6194, %r6192}; 
2026-02-21T10:15:14.4300128Z st.shared.v4.b32 [%r109+16384], {%r6199, %r6197, %r6198, %r6196}; 2026-02-21T10:15:14.4300193Z $L__tmp2: 2026-02-21T10:15:14.4300489Z .loc 2 291 36 // standard.py:291:36 @[ cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:81:40 ] 2026-02-21T10:15:14.4300555Z // begin inline asm 2026-02-21T10:15:14.4300657Z fence.proxy.async.shared::cta; 2026-02-21T10:15:14.4300724Z // end inline asm 2026-02-21T10:15:14.4300782Z bar.sync 0; 2026-02-21T10:15:14.4300857Z wgmma.fence.sync.aligned; 2026-02-21T10:15:14.4300924Z // begin inline asm 2026-02-21T10:15:14.4303210Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r6691,%r6692,%r6693,%r6694,%r6695,%r6696,%r6697,%r6698,%r6699,%r6700,%r6701,%r6702,%r6703,%r6704,%r6705,%r6706,%r6707,%r6708,%r6709,%r6710,%r6711,%r6712,%r6713,%r6714,%r6715,%r6716,%r6717,%r6718,%r6719,%r6720,%r6721,%r6722,%r6723,%r6724,%r6725,%r6726,%r6727,%r6728,%r6729,%r6730,%r6731,%r6732,%r6733,%r6734,%r6735,%r6736,%r6737,%r6738,%r6739,%r6740,%r6741,%r6742,%r6743,%r6744,%r6745,%r6746,%r6747,%r6748,%r6749,%r6750,%r6751,%r6752,%r6753,%r6754,%r6755,%r6756,%r6757,%r6758,%r6759,%r6760,%r6761,%r6762,%r6763,%r6764,%r6765,%r6766,%r6767,%r6768,%r6769,%r6770,%r6771,%r6772,%r6773,%r6774,%r6775,%r6776,%r6777,%r6778,%r6779,%r6780,%r6781,%r6782,%r6783,%r6784,%r6785,%r6786,%r6787,%r6788,%r6789,%r6790,%r6791,%r6792,%r6793,%r6794,%r6795,%r6796,%r6797,%r6798,%r6799,%r6800,%r6801,%r6802,%r6803,%r6804,%r6805,%r6806,%r6807,%r6808,%r6809,%r6810,%r6811,%r6812,%r6813,%r6814,%r6815,%r6816,%r6817,%r6818}, {%r3660,%r3661,%r3662,%r3663}, %rd89, %p2, 1, 1; 2026-02-21T10:15:14.4303284Z // end inline asm 2026-02-21T10:15:14.4303342Z // begin inline asm 2026-02-21T10:15:14.4305615Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r6691,%r6692,%r6693,%r6694,%r6695,%r6696,%r6697,%r6698,%r6699,%r6700,%r6701,%r6702,%r6703,%r6704,%r6705,%r6706,%r6707,%r6708,%r6709,%r6710,%r6711,%r6712,%r6713,%r6714,%r6715,%r6716,%r6717,%r6718,%r6719,%r6720,%r6721,%r6722,%r6723,%r6724,%r6725,%r6726,%r6727,%r6728,%r6729,%r6730,%r6731,%r6732,%r6733,%r6734,%r6735,%r6736,%r6737,%r6738,%r6739,%r6740,%r6741,%r6742,%r6743,%r6744,%r6745,%r6746,%r6747,%r6748,%r6749,%r6750,%r6751,%r6752,%r6753,%r6754,%r6755,%r6756,%r6757,%r6758,%r6759,%r6760,%r6761,%r6762,%r6763,%r6764,%r6765,%r6766,%r6767,%r6768,%r6769,%r6770,%r6771,%r6772,%r6773,%r6774,%r6775,%r6776,%r6777,%r6778,%r6779,%r6780,%r6781,%r6782,%r6783,%r6784,%r6785,%r6786,%r6787,%r6788,%r6789,%r6790,%r6791,%r6792,%r6793,%r6794,%r6795,%r6796,%r6797,%r6798,%r6799,%r6800,%r6801,%r6802,%r6803,%r6804,%r6805,%r6806,%r6807,%r6808,%r6809,%r6810,%r6811,%r6812,%r6813,%r6814,%r6815,%r6816,%r6817,%r6818}, {%r3920,%r3921,%r3922,%r3923}, %rd90, %p2, 1, 1; 2026-02-21T10:15:14.4305777Z // end inline asm 2026-02-21T10:15:14.4305841Z // begin inline asm 2026-02-21T10:15:14.4308384Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r6691,%r6692,%r6693,%r6694,%r6695,%r6696,%r6697,%r6698,%r6699,%r6700,%r6701,%r6702,%r6703,%r6704,%r6705,%r6706,%r6707,%r6708,%r6709,%r6710,%r6711,%r6712,%r6713,%r6714,%r6715,%r6716,%r6717,%r6718,%r6719,%r6720,%r6721,%r6722,%r6723,%r6724,%r6725,%r6726,%r6727,%r6728,%r6729,%r6730,%r6731,%r6732,%r6733,%r6734,%r6735,%r6736,%r6737,%r6738,%r6739,%r6740,%r6741,%r6742,%r6743,%r6744,%r6745,%r6746,%r6747,%r6748,%r6749,%r6750,%r6751,%r6752,%r6753,%r6754,%r6755,%r6756,%r6757,%r6758,%r6759,%r6760,%r6761,%r6762,%r6763,%r6764,%r6765,%r6766,%r6767,%r6768,%r6769,%r6770,%r6771,%r6772,%r6773,%r6774,%r6775,%r6776,%r6777,%r6778,%r6779,%r6780,%r6781,%r6782,%r6783,%r6784,%r6785,%r6786,%r6787,%r6788,%r6789,%r6790,%r6791,%r6792,%r6793,%r6794,%r6795,%r6796,%r6797,%r6798,%r6799,%r6800,%r6801,%r6802,%r6803,%r6804,%r6805,%r6806,%r6807,%r6808,%r6809,%r6810,%r6811,%r6812,%r6813,%r6814,%r6815,%r6816,%r6817,%r6818}, {%r4180,%r4181,%r4182,%r4183}, %rd91, %p2, 1, 1; 2026-02-21T10:15:14.4308522Z // end inline asm 2026-02-21T10:15:14.4308589Z // begin inline asm 2026-02-21T10:15:14.4310874Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r6691,%r6692,%r6693,%r6694,%r6695,%r6696,%r6697,%r6698,%r6699,%r6700,%r6701,%r6702,%r6703,%r6704,%r6705,%r6706,%r6707,%r6708,%r6709,%r6710,%r6711,%r6712,%r6713,%r6714,%r6715,%r6716,%r6717,%r6718,%r6719,%r6720,%r6721,%r6722,%r6723,%r6724,%r6725,%r6726,%r6727,%r6728,%r6729,%r6730,%r6731,%r6732,%r6733,%r6734,%r6735,%r6736,%r6737,%r6738,%r6739,%r6740,%r6741,%r6742,%r6743,%r6744,%r6745,%r6746,%r6747,%r6748,%r6749,%r6750,%r6751,%r6752,%r6753,%r6754,%r6755,%r6756,%r6757,%r6758,%r6759,%r6760,%r6761,%r6762,%r6763,%r6764,%r6765,%r6766,%r6767,%r6768,%r6769,%r6770,%r6771,%r6772,%r6773,%r6774,%r6775,%r6776,%r6777,%r6778,%r6779,%r6780,%r6781,%r6782,%r6783,%r6784,%r6785,%r6786,%r6787,%r6788,%r6789,%r6790,%r6791,%r6792,%r6793,%r6794,%r6795,%r6796,%r6797,%r6798,%r6799,%r6800,%r6801,%r6802,%r6803,%r6804,%r6805,%r6806,%r6807,%r6808,%r6809,%r6810,%r6811,%r6812,%r6813,%r6814,%r6815,%r6816,%r6817,%r6818}, {%r4440,%r4441,%r4442,%r4443}, %rd92, %p2, 1, 1; 2026-02-21T10:15:14.4310947Z // end inline asm 2026-02-21T10:15:14.4311007Z // begin inline asm 2026-02-21T10:15:14.4313280Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r6819,%r6820,%r6821,%r6822,%r6823,%r6824,%r6825,%r6826,%r6827,%r6828,%r6829,%r6830,%r6831,%r6832,%r6833,%r6834,%r6835,%r6836,%r6837,%r6838,%r6839,%r6840,%r6841,%r6842,%r6843,%r6844,%r6845,%r6846,%r6847,%r6848,%r6849,%r6850,%r6851,%r6852,%r6853,%r6854,%r6855,%r6856,%r6857,%r6858,%r6859,%r6860,%r6861,%r6862,%r6863,%r6864,%r6865,%r6866,%r6867,%r6868,%r6869,%r6870,%r6871,%r6872,%r6873,%r6874,%r6875,%r6876,%r6877,%r6878,%r6879,%r6880,%r6881,%r6882,%r6883,%r6884,%r6885,%r6886,%r6887,%r6888,%r6889,%r6890,%r6891,%r6892,%r6893,%r6894,%r6895,%r6896,%r6897,%r6898,%r6899,%r6900,%r6901,%r6902,%r6903,%r6904,%r6905,%r6906,%r6907,%r6908,%r6909,%r6910,%r6911,%r6912,%r6913,%r6914,%r6915,%r6916,%r6917,%r6918,%r6919,%r6920,%r6921,%r6922,%r6923,%r6924,%r6925,%r6926,%r6927,%r6928,%r6929,%r6930,%r6931,%r6932,%r6933,%r6934,%r6935,%r6936,%r6937,%r6938,%r6939,%r6940,%r6941,%r6942,%r6943,%r6944,%r6945,%r6946}, {%r4700,%r4701,%r4702,%r4703}, %rd89, %p2, 1, 1; 2026-02-21T10:15:14.4313476Z // end inline asm 2026-02-21T10:15:14.4313538Z // begin inline asm 2026-02-21T10:15:14.4315866Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r6819,%r6820,%r6821,%r6822,%r6823,%r6824,%r6825,%r6826,%r6827,%r6828,%r6829,%r6830,%r6831,%r6832,%r6833,%r6834,%r6835,%r6836,%r6837,%r6838,%r6839,%r6840,%r6841,%r6842,%r6843,%r6844,%r6845,%r6846,%r6847,%r6848,%r6849,%r6850,%r6851,%r6852,%r6853,%r6854,%r6855,%r6856,%r6857,%r6858,%r6859,%r6860,%r6861,%r6862,%r6863,%r6864,%r6865,%r6866,%r6867,%r6868,%r6869,%r6870,%r6871,%r6872,%r6873,%r6874,%r6875,%r6876,%r6877,%r6878,%r6879,%r6880,%r6881,%r6882,%r6883,%r6884,%r6885,%r6886,%r6887,%r6888,%r6889,%r6890,%r6891,%r6892,%r6893,%r6894,%r6895,%r6896,%r6897,%r6898,%r6899,%r6900,%r6901,%r6902,%r6903,%r6904,%r6905,%r6906,%r6907,%r6908,%r6909,%r6910,%r6911,%r6912,%r6913,%r6914,%r6915,%r6916,%r6917,%r6918,%r6919,%r6920,%r6921,%r6922,%r6923,%r6924,%r6925,%r6926,%r6927,%r6928,%r6929,%r6930,%r6931,%r6932,%r6933,%r6934,%r6935,%r6936,%r6937,%r6938,%r6939,%r6940,%r6941,%r6942,%r6943,%r6944,%r6945,%r6946}, {%r4960,%r4961,%r4962,%r4963}, %rd90, %p2, 1, 1; 2026-02-21T10:15:14.4315933Z // end inline asm 2026-02-21T10:15:14.4315996Z // begin inline asm 2026-02-21T10:15:14.4318479Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r6819,%r6820,%r6821,%r6822,%r6823,%r6824,%r6825,%r6826,%r6827,%r6828,%r6829,%r6830,%r6831,%r6832,%r6833,%r6834,%r6835,%r6836,%r6837,%r6838,%r6839,%r6840,%r6841,%r6842,%r6843,%r6844,%r6845,%r6846,%r6847,%r6848,%r6849,%r6850,%r6851,%r6852,%r6853,%r6854,%r6855,%r6856,%r6857,%r6858,%r6859,%r6860,%r6861,%r6862,%r6863,%r6864,%r6865,%r6866,%r6867,%r6868,%r6869,%r6870,%r6871,%r6872,%r6873,%r6874,%r6875,%r6876,%r6877,%r6878,%r6879,%r6880,%r6881,%r6882,%r6883,%r6884,%r6885,%r6886,%r6887,%r6888,%r6889,%r6890,%r6891,%r6892,%r6893,%r6894,%r6895,%r6896,%r6897,%r6898,%r6899,%r6900,%r6901,%r6902,%r6903,%r6904,%r6905,%r6906,%r6907,%r6908,%r6909,%r6910,%r6911,%r6912,%r6913,%r6914,%r6915,%r6916,%r6917,%r6918,%r6919,%r6920,%r6921,%r6922,%r6923,%r6924,%r6925,%r6926,%r6927,%r6928,%r6929,%r6930,%r6931,%r6932,%r6933,%r6934,%r6935,%r6936,%r6937,%r6938,%r6939,%r6940,%r6941,%r6942,%r6943,%r6944,%r6945,%r6946}, {%r5220,%r5221,%r5222,%r5223}, %rd91, %p2, 1, 1; 2026-02-21T10:15:14.4318558Z // end inline asm 2026-02-21T10:15:14.4318620Z // begin inline asm 2026-02-21T10:15:14.4320893Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r6819,%r6820,%r6821,%r6822,%r6823,%r6824,%r6825,%r6826,%r6827,%r6828,%r6829,%r6830,%r6831,%r6832,%r6833,%r6834,%r6835,%r6836,%r6837,%r6838,%r6839,%r6840,%r6841,%r6842,%r6843,%r6844,%r6845,%r6846,%r6847,%r6848,%r6849,%r6850,%r6851,%r6852,%r6853,%r6854,%r6855,%r6856,%r6857,%r6858,%r6859,%r6860,%r6861,%r6862,%r6863,%r6864,%r6865,%r6866,%r6867,%r6868,%r6869,%r6870,%r6871,%r6872,%r6873,%r6874,%r6875,%r6876,%r6877,%r6878,%r6879,%r6880,%r6881,%r6882,%r6883,%r6884,%r6885,%r6886,%r6887,%r6888,%r6889,%r6890,%r6891,%r6892,%r6893,%r6894,%r6895,%r6896,%r6897,%r6898,%r6899,%r6900,%r6901,%r6902,%r6903,%r6904,%r6905,%r6906,%r6907,%r6908,%r6909,%r6910,%r6911,%r6912,%r6913,%r6914,%r6915,%r6916,%r6917,%r6918,%r6919,%r6920,%r6921,%r6922,%r6923,%r6924,%r6925,%r6926,%r6927,%r6928,%r6929,%r6930,%r6931,%r6932,%r6933,%r6934,%r6935,%r6936,%r6937,%r6938,%r6939,%r6940,%r6941,%r6942,%r6943,%r6944,%r6945,%r6946}, {%r5480,%r5481,%r5482,%r5483}, %rd92, %p2, 1, 1; 2026-02-21T10:15:14.4320961Z // end inline asm 2026-02-21T10:15:14.4321041Z wgmma.commit_group.sync.aligned; 2026-02-21T10:15:14.4321108Z mov.b32 %r5741, %r5742; 2026-02-21T10:15:14.4321167Z mov.b32 %r5740, %r3401; 2026-02-21T10:15:14.4321225Z // begin inline asm 2026-02-21T10:15:14.4325423Z // wait for regs: 
%r6691,%r6692,%r6693,%r6694,%r6695,%r6696,%r6697,%r6698,%r6699,%r6700,%r6701,%r6702,%r6703,%r6704,%r6705,%r6706,%r6707,%r6708,%r6709,%r6710,%r6711,%r6712,%r6713,%r6714,%r6715,%r6716,%r6717,%r6718,%r6719,%r6720,%r6721,%r6722,%r6723,%r6724,%r6725,%r6726,%r6727,%r6728,%r6729,%r6730,%r6731,%r6732,%r6733,%r6734,%r6735,%r6736,%r6737,%r6738,%r6739,%r6740,%r6741,%r6742,%r6743,%r6744,%r6745,%r6746,%r6747,%r6748,%r6749,%r6750,%r6751,%r6752,%r6753,%r6754,%r6755,%r6756,%r6757,%r6758,%r6759,%r6760,%r6761,%r6762,%r6763,%r6764,%r6765,%r6766,%r6767,%r6768,%r6769,%r6770,%r6771,%r6772,%r6773,%r6774,%r6775,%r6776,%r6777,%r6778,%r6779,%r6780,%r6781,%r6782,%r6783,%r6784,%r6785,%r6786,%r6787,%r6788,%r6789,%r6790,%r6791,%r6792,%r6793,%r6794,%r6795,%r6796,%r6797,%r6798,%r6799,%r6800,%r6801,%r6802,%r6803,%r6804,%r6805,%r6806,%r6807,%r6808,%r6809,%r6810,%r6811,%r6812,%r6813,%r6814,%r6815,%r6816,%r6817,%r6818,%r6819,%r6820,%r6821,%r6822,%r6823,%r6824,%r6825,%r6826,%r6827,%r6828,%r6829,%r6830,%r6831,%r6832,%r6833,%r6834,%r6835,%r6836,%r6837,%r6838,%r6839,%r6840,%r6841,%r6842,%r6843,%r6844,%r6845,%r6846,%r6847,%r6848,%r6849,%r6850,%r6851,%r6852,%r6853,%r6854,%r6855,%r6856,%r6857,%r6858,%r6859,%r6860,%r6861,%r6862,%r6863,%r6864,%r6865,%r6866,%r6867,%r6868,%r6869,%r6870,%r6871,%r6872,%r6873,%r6874,%r6875,%r6876,%r6877,%r6878,%r6879,%r6880,%r6881,%r6882,%r6883,%r6884,%r6885,%r6886,%r6887,%r6888,%r6889,%r6890,%r6891,%r6892,%r6893,%r6894,%r6895,%r6896,%r6897,%r6898,%r6899,%r6900,%r6901,%r6902,%r6903,%r6904,%r6905,%r6906,%r6907,%r6908,%r6909,%r6910,%r6911,%r6912,%r6913,%r6914,%r6915,%r6916,%r6917,%r6918,%r6919,%r6920,%r6921,%r6922,%r6923,%r6924,%r6925,%r6926,%r6927,%r6928,%r6929,%r6930,%r6931,%r6932,%r6933,%r6934,%r6935,%r6936,%r6937,%r6938,%r6939,%r6940,%r6941,%r6942,%r6943,%r6944,%r6945,%r6946,%r5740,%r5741,%r5742 2026-02-21T10:15:14.4325630Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:15:14.4325690Z // end inline asm 2026-02-21T10:15:14.4325744Z $L__tmp3: 2026-02-21T10:15:14.4325979Z .loc 1 37 
112 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:37:112 2026-02-21T10:15:14.4326048Z add.s32 %r6200, %r6690, 1; 2026-02-21T10:15:14.4326119Z setp.gt.s32 %p20, %r6200, 1; 2026-02-21T10:15:14.4326187Z selp.b32 %r6690, 0, %r6200, %p20; 2026-02-21T10:15:14.4326411Z .loc 1 45 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:45:32 2026-02-21T10:15:14.4326594Z add.s64 %rd105, %rd158, -1835072; 2026-02-21T10:15:14.4326665Z add.s64 %rd106, %rd158, -1572928; 2026-02-21T10:15:14.4326733Z add.s64 %rd107, %rd158, -1310784; 2026-02-21T10:15:14.4326798Z add.s64 %rd108, %rd158, -1048640; 2026-02-21T10:15:14.4326864Z add.s64 %rd109, %rd158, -786496; 2026-02-21T10:15:14.4326928Z add.s64 %rd110, %rd158, -524352; 2026-02-21T10:15:14.4326997Z add.s64 %rd111, %rd158, -262208; 2026-02-21T10:15:14.4327208Z .loc 1 45 80 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:45:80 2026-02-21T10:15:14.4327276Z add.s64 %rd112, %rd158, -64; 2026-02-21T10:15:14.4327344Z shl.b32 %r6201, %r6690, 12; 2026-02-21T10:15:14.4327404Z shl.b32 %r6202, %r6690, 13; 2026-02-21T10:15:14.4327470Z add.s32 %r6203, %r6045, %r6202; 2026-02-21T10:15:14.4327539Z add.s32 %r6002, %r6203, %r46; 2026-02-21T10:15:14.4327608Z selp.b32 %r6003, 8, 0, %p18; 2026-02-21T10:15:14.4327672Z // begin inline asm 2026-02-21T10:15:14.4327832Z cp.async.ca.shared.global [ %r6002 + 0 ], [ %rd105 + 0 ], 0x8, %r6003; 2026-02-21T10:15:14.4327897Z // end inline asm 2026-02-21T10:15:14.4327958Z add.s32 %r6004, %r6002, 1024; 2026-02-21T10:15:14.4328018Z // begin inline asm 2026-02-21T10:15:14.4328165Z cp.async.ca.shared.global [ %r6004 + 0 ], [ %rd106 + 0 ], 0x8, %r6003; 2026-02-21T10:15:14.4328225Z // end inline asm 2026-02-21T10:15:14.4328288Z add.s32 %r6006, %r6002, 2048; 2026-02-21T10:15:14.4328347Z // begin inline asm 2026-02-21T10:15:14.4328485Z cp.async.ca.shared.global [ %r6006 + 0 ], [ %rd107 + 0 ], 0x8, %r6003; 2026-02-21T10:15:14.4328545Z // end inline asm 2026-02-21T10:15:14.4328606Z add.s32 %r6008, 
%r6002, 3072; 2026-02-21T10:15:14.4328671Z // begin inline asm 2026-02-21T10:15:14.4328895Z cp.async.ca.shared.global [ %r6008 + 0 ], [ %rd108 + 0 ], 0x8, %r6003; 2026-02-21T10:15:14.4329013Z // end inline asm 2026-02-21T10:15:14.4329080Z add.s32 %r6010, %r6002, 4096; 2026-02-21T10:15:14.4329140Z // begin inline asm 2026-02-21T10:15:14.4329271Z cp.async.ca.shared.global [ %r6010 + 0 ], [ %rd109 + 0 ], 0x8, %r6003; 2026-02-21T10:15:14.4329328Z // end inline asm 2026-02-21T10:15:14.4329395Z add.s32 %r6012, %r6002, 5120; 2026-02-21T10:15:14.4329459Z // begin inline asm 2026-02-21T10:15:14.4329591Z cp.async.ca.shared.global [ %r6012 + 0 ], [ %rd110 + 0 ], 0x8, %r6003; 2026-02-21T10:15:14.4329653Z // end inline asm 2026-02-21T10:15:14.4329714Z add.s32 %r6014, %r6002, 6144; 2026-02-21T10:15:14.4329773Z // begin inline asm 2026-02-21T10:15:14.4329903Z cp.async.ca.shared.global [ %r6014 + 0 ], [ %rd111 + 0 ], 0x8, %r6003; 2026-02-21T10:15:14.4329978Z // end inline asm 2026-02-21T10:15:14.4330041Z add.s32 %r6016, %r6002, 7168; 2026-02-21T10:15:14.4330101Z // begin inline asm 2026-02-21T10:15:14.4330245Z cp.async.ca.shared.global [ %r6016 + 0 ], [ %rd112 + 0 ], 0x8, %r6003; 2026-02-21T10:15:14.4330304Z // end inline asm 2026-02-21T10:15:14.4330373Z cp.async.commit_group; 2026-02-21T10:15:14.4330664Z .loc 1 51 34 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:51:34 2026-02-21T10:15:14.4330736Z add.s64 %rd113, %rd157, -30720; 2026-02-21T10:15:14.4331003Z .loc 1 51 87 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:51:87 2026-02-21T10:15:14.4331072Z add.s64 %rd114, %rd157, -20480; 2026-02-21T10:15:14.4331141Z add.s32 %r6204, %r6051, %r6201; 2026-02-21T10:15:14.4331203Z add.s32 %r6018, %r6204, %r57; 2026-02-21T10:15:14.4331279Z selp.b32 %r6019, 16, 0, %p18; 2026-02-21T10:15:14.4331346Z // begin inline asm 2026-02-21T10:15:14.4331490Z cp.async.cg.shared.global [ %r6018 + 0 ], [ %rd113 + 0 ], 0x10, %r6019; 2026-02-21T10:15:14.4331548Z // end inline asm 
2026-02-21T10:15:14.4331611Z add.s32 %r6020, %r6018, 2048; 2026-02-21T10:15:14.4331678Z // begin inline asm 2026-02-21T10:15:14.4331816Z cp.async.cg.shared.global [ %r6020 + 0 ], [ %rd114 + 0 ], 0x10, %r6019; 2026-02-21T10:15:14.4331878Z // end inline asm 2026-02-21T10:15:14.4331951Z cp.async.commit_group; 2026-02-21T10:15:14.4332159Z .loc 1 45 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:45:32 2026-02-21T10:15:14.4332225Z add.s64 %rd115, %rd158, -1835008; 2026-02-21T10:15:14.4332294Z add.s64 %rd116, %rd158, -1572864; 2026-02-21T10:15:14.4332363Z add.s64 %rd117, %rd158, -1310720; 2026-02-21T10:15:14.4332429Z add.s64 %rd118, %rd158, -1048576; 2026-02-21T10:15:14.4332494Z add.s64 %rd119, %rd158, -786432; 2026-02-21T10:15:14.4332563Z add.s64 %rd120, %rd158, -524288; 2026-02-21T10:15:14.4332625Z add.s64 %rd121, %rd158, -262144; 2026-02-21T10:15:14.4332829Z .loc 1 45 80 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:45:80 2026-02-21T10:15:14.4332896Z add.s32 %r6205, %r6123, %r6202; 2026-02-21T10:15:14.4332957Z add.s32 %r6022, %r6205, %r46; 2026-02-21T10:15:14.4333019Z // begin inline asm 2026-02-21T10:15:14.4333162Z cp.async.ca.shared.global [ %r6022 + 0 ], [ %rd115 + 0 ], 0x8, %r6003; 2026-02-21T10:15:14.4333217Z // end inline asm 2026-02-21T10:15:14.4333277Z add.s32 %r6024, %r6022, 1024; 2026-02-21T10:15:14.4333348Z // begin inline asm 2026-02-21T10:15:14.4333490Z cp.async.ca.shared.global [ %r6024 + 0 ], [ %rd116 + 0 ], 0x8, %r6003; 2026-02-21T10:15:14.4333550Z // end inline asm 2026-02-21T10:15:14.4333614Z add.s32 %r6026, %r6022, 2048; 2026-02-21T10:15:14.4333676Z // begin inline asm 2026-02-21T10:15:14.4333806Z cp.async.ca.shared.global [ %r6026 + 0 ], [ %rd117 + 0 ], 0x8, %r6003; 2026-02-21T10:15:14.4333862Z // end inline asm 2026-02-21T10:15:14.4333922Z add.s32 %r6028, %r6022, 3072; 2026-02-21T10:15:14.4333986Z // begin inline asm 2026-02-21T10:15:14.4334115Z cp.async.ca.shared.global [ %r6028 + 0 ], [ %rd118 + 0 ], 0x8, %r6003; 
2026-02-21T10:15:14.4334233Z // end inline asm 2026-02-21T10:15:14.4334302Z add.s32 %r6030, %r6022, 4096; 2026-02-21T10:15:14.4334434Z // begin inline asm 2026-02-21T10:15:14.4334567Z cp.async.ca.shared.global [ %r6030 + 0 ], [ %rd119 + 0 ], 0x8, %r6003; 2026-02-21T10:15:14.4334625Z // end inline asm 2026-02-21T10:15:14.4334692Z add.s32 %r6032, %r6022, 5120; 2026-02-21T10:15:14.4334750Z // begin inline asm 2026-02-21T10:15:14.4334881Z cp.async.ca.shared.global [ %r6032 + 0 ], [ %rd120 + 0 ], 0x8, %r6003; 2026-02-21T10:15:14.4334943Z // end inline asm 2026-02-21T10:15:14.4335002Z add.s32 %r6034, %r6022, 6144; 2026-02-21T10:15:14.4335074Z // begin inline asm 2026-02-21T10:15:14.4335212Z cp.async.ca.shared.global [ %r6034 + 0 ], [ %rd121 + 0 ], 0x8, %r6003; 2026-02-21T10:15:14.4335269Z // end inline asm 2026-02-21T10:15:14.4335328Z add.s32 %r6036, %r6022, 7168; 2026-02-21T10:15:14.4335388Z // begin inline asm 2026-02-21T10:15:14.4335523Z cp.async.ca.shared.global [ %r6036 + 0 ], [ %rd158 + 0 ], 0x8, %r6003; 2026-02-21T10:15:14.4335581Z // end inline asm 2026-02-21T10:15:14.4335648Z cp.async.commit_group; 2026-02-21T10:15:14.4335962Z .loc 1 51 34 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:51:34 2026-02-21T10:15:14.4336037Z add.s64 %rd123, %rd157, -10240; 2026-02-21T10:15:14.4336244Z .loc 1 51 87 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:51:87 2026-02-21T10:15:14.4336312Z add.s32 %r6206, %r6129, %r6201; 2026-02-21T10:15:14.4336417Z add.s32 %r6038, %r6206, %r57; 2026-02-21T10:15:14.4336600Z // begin inline asm 2026-02-21T10:15:14.4336748Z cp.async.cg.shared.global [ %r6038 + 0 ], [ %rd123 + 0 ], 0x10, %r6019; 2026-02-21T10:15:14.4336810Z // end inline asm 2026-02-21T10:15:14.4336870Z add.s32 %r6040, %r6038, 2048; 2026-02-21T10:15:14.4336930Z // begin inline asm 2026-02-21T10:15:14.4337084Z cp.async.cg.shared.global [ %r6040 + 0 ], [ %rd157 + 0 ], 0x10, %r6019; 2026-02-21T10:15:14.4337143Z // end inline asm 2026-02-21T10:15:14.4337212Z 
cp.async.commit_group; 2026-02-21T10:15:14.4337431Z .loc 1 37 112 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:37:112 2026-02-21T10:15:14.4337505Z add.s64 %rd158, %rd158, 128; 2026-02-21T10:15:14.4337570Z add.s64 %rd157, %rd157, 40960; 2026-02-21T10:15:14.4337641Z setp.lt.u64 %p21, %rd159, 4064; 2026-02-21T10:15:14.4337709Z @%p21 bra $L__BB0_3; 2026-02-21T10:15:14.4337822Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:15:14.4338031Z .loc 1 28 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:28:32 2026-02-21T10:15:14.4338105Z or.b32 %r6495, %r127, %r8; 2026-02-21T10:15:14.4338169Z or.b32 %r6496, %r127, %r9; 2026-02-21T10:15:14.4338232Z or.b32 %r6497, %r127, %r10; 2026-02-21T10:15:14.4338292Z or.b32 %r6498, %r127, %r11; 2026-02-21T10:15:14.4338358Z or.b32 %r6499, %r127, %r12; 2026-02-21T10:15:14.4338418Z or.b32 %r6500, %r127, %r13; 2026-02-21T10:15:14.4338480Z or.b32 %r6501, %r127, %r14; 2026-02-21T10:15:14.4338545Z or.b32 %r6502, %r127, %r15; 2026-02-21T10:15:14.4338606Z or.b32 %r6503, %r127, %r16; 2026-02-21T10:15:14.4338670Z or.b32 %r6504, %r127, %r17; 2026-02-21T10:15:14.4338730Z or.b32 %r6505, %r127, %r18; 2026-02-21T10:15:14.4338800Z or.b32 %r6506, %r127, %r19; 2026-02-21T10:15:14.4338859Z or.b32 %r6507, %r127, %r20; 2026-02-21T10:15:14.4338920Z or.b32 %r6508, %r127, %r21; 2026-02-21T10:15:14.4338994Z or.b32 %r6509, %r127, %r22; 2026-02-21T10:15:14.4339058Z or.b32 %r6510, %r127, %r23; 2026-02-21T10:15:14.4339119Z or.b32 %r6511, %r127, %r24; 2026-02-21T10:15:14.4339186Z or.b32 %r6512, %r127, %r25; 2026-02-21T10:15:14.4339245Z or.b32 %r6513, %r127, %r26; 2026-02-21T10:15:14.4339305Z or.b32 %r6514, %r127, %r27; 2026-02-21T10:15:14.4339364Z or.b32 %r6515, %r127, %r28; 2026-02-21T10:15:14.4339431Z or.b32 %r6516, %r127, %r29; 2026-02-21T10:15:14.4339492Z or.b32 %r6517, %r127, %r30; 2026-02-21T10:15:14.4339552Z or.b32 %r6518, %r127, %r31; 2026-02-21T10:15:14.4339701Z or.b32 %r6519, %r127, %r32; 2026-02-21T10:15:14.4339819Z or.b32 
%r6520, %r127, %r33; 2026-02-21T10:15:14.4339879Z or.b32 %r6521, %r127, %r34; 2026-02-21T10:15:14.4339941Z or.b32 %r6522, %r127, %r35; 2026-02-21T10:15:14.4340021Z or.b32 %r6523, %r127, %r36; 2026-02-21T10:15:14.4340085Z or.b32 %r6524, %r127, %r37; 2026-02-21T10:15:14.4340147Z or.b32 %r6525, %r127, %r38; 2026-02-21T10:15:14.4340213Z or.b32 %r6526, %r127, %r39; 2026-02-21T10:15:14.4340424Z .loc 1 30 32 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:30:32 2026-02-21T10:15:14.4340484Z or.b32 %r6527, %r128, %r42; 2026-02-21T10:15:14.4340703Z .loc 1 37 112 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:37:112 2026-02-21T10:15:14.4340772Z cp.async.wait_group 0; 2026-02-21T10:15:14.4340828Z bar.sync 0; 2026-02-21T10:15:14.4341032Z .loc 1 84 28 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:84:28 2026-02-21T10:15:14.4341123Z cvt.rn.bf16x2.f32 %r6528, %r6692, %r6691; 2026-02-21T10:15:14.4341200Z cvt.rn.bf16x2.f32 %r6529, %r6694, %r6693; 2026-02-21T10:15:14.4341343Z cvt.rn.bf16x2.f32 %r6530, %r6696, %r6695; 2026-02-21T10:15:14.4341426Z cvt.rn.bf16x2.f32 %r6531, %r6698, %r6697; 2026-02-21T10:15:14.4341497Z cvt.rn.bf16x2.f32 %r6532, %r6700, %r6699; 2026-02-21T10:15:14.4341567Z cvt.rn.bf16x2.f32 %r6533, %r6702, %r6701; 2026-02-21T10:15:14.4341691Z cvt.rn.bf16x2.f32 %r6534, %r6704, %r6703; 2026-02-21T10:15:14.4341771Z cvt.rn.bf16x2.f32 %r6535, %r6706, %r6705; 2026-02-21T10:15:14.4341841Z cvt.rn.bf16x2.f32 %r6536, %r6708, %r6707; 2026-02-21T10:15:14.4341912Z cvt.rn.bf16x2.f32 %r6537, %r6710, %r6709; 2026-02-21T10:15:14.4341987Z cvt.rn.bf16x2.f32 %r6538, %r6712, %r6711; 2026-02-21T10:15:14.4342055Z cvt.rn.bf16x2.f32 %r6539, %r6714, %r6713; 2026-02-21T10:15:14.4342125Z cvt.rn.bf16x2.f32 %r6540, %r6716, %r6715; 2026-02-21T10:15:14.4342199Z cvt.rn.bf16x2.f32 %r6541, %r6718, %r6717; 2026-02-21T10:15:14.4342270Z cvt.rn.bf16x2.f32 %r6542, %r6720, %r6719; 2026-02-21T10:15:14.4342350Z cvt.rn.bf16x2.f32 %r6543, %r6722, %r6721; 
2026-02-21T10:15:14.4342422Z cvt.rn.bf16x2.f32 %r6544, %r6724, %r6723; 2026-02-21T10:15:14.4342497Z cvt.rn.bf16x2.f32 %r6545, %r6726, %r6725; 2026-02-21T10:15:14.4342567Z cvt.rn.bf16x2.f32 %r6546, %r6728, %r6727; 2026-02-21T10:15:14.4342636Z cvt.rn.bf16x2.f32 %r6547, %r6730, %r6729; 2026-02-21T10:15:14.4342711Z cvt.rn.bf16x2.f32 %r6548, %r6732, %r6731; 2026-02-21T10:15:14.4342792Z cvt.rn.bf16x2.f32 %r6549, %r6734, %r6733; 2026-02-21T10:15:14.4342862Z cvt.rn.bf16x2.f32 %r6550, %r6736, %r6735; 2026-02-21T10:15:14.4342938Z cvt.rn.bf16x2.f32 %r6551, %r6738, %r6737; 2026-02-21T10:15:14.4343011Z cvt.rn.bf16x2.f32 %r6552, %r6740, %r6739; 2026-02-21T10:15:14.4343081Z cvt.rn.bf16x2.f32 %r6553, %r6742, %r6741; 2026-02-21T10:15:14.4343150Z cvt.rn.bf16x2.f32 %r6554, %r6744, %r6743; 2026-02-21T10:15:14.4343226Z cvt.rn.bf16x2.f32 %r6555, %r6746, %r6745; 2026-02-21T10:15:14.4343298Z cvt.rn.bf16x2.f32 %r6556, %r6748, %r6747; 2026-02-21T10:15:14.4343371Z cvt.rn.bf16x2.f32 %r6557, %r6750, %r6749; 2026-02-21T10:15:14.4343449Z cvt.rn.bf16x2.f32 %r6558, %r6752, %r6751; 2026-02-21T10:15:14.4343519Z cvt.rn.bf16x2.f32 %r6559, %r6754, %r6753; 2026-02-21T10:15:14.4343589Z cvt.rn.bf16x2.f32 %r6560, %r6756, %r6755; 2026-02-21T10:15:14.4343658Z cvt.rn.bf16x2.f32 %r6561, %r6758, %r6757; 2026-02-21T10:15:14.4343735Z cvt.rn.bf16x2.f32 %r6562, %r6760, %r6759; 2026-02-21T10:15:14.4343806Z cvt.rn.bf16x2.f32 %r6563, %r6762, %r6761; 2026-02-21T10:15:14.4343876Z cvt.rn.bf16x2.f32 %r6564, %r6764, %r6763; 2026-02-21T10:15:14.4343952Z cvt.rn.bf16x2.f32 %r6565, %r6766, %r6765; 2026-02-21T10:15:14.4344022Z cvt.rn.bf16x2.f32 %r6566, %r6768, %r6767; 2026-02-21T10:15:14.4344093Z cvt.rn.bf16x2.f32 %r6567, %r6770, %r6769; 2026-02-21T10:15:14.4344168Z cvt.rn.bf16x2.f32 %r6568, %r6772, %r6771; 2026-02-21T10:15:14.4344238Z cvt.rn.bf16x2.f32 %r6569, %r6774, %r6773; 2026-02-21T10:15:14.4344364Z cvt.rn.bf16x2.f32 %r6570, %r6776, %r6775; 2026-02-21T10:15:14.4344492Z cvt.rn.bf16x2.f32 %r6571, %r6778, %r6777; 
2026-02-21T10:15:14.4344570Z cvt.rn.bf16x2.f32 %r6572, %r6780, %r6779; 2026-02-21T10:15:14.4344641Z cvt.rn.bf16x2.f32 %r6573, %r6782, %r6781; 2026-02-21T10:15:14.4344712Z cvt.rn.bf16x2.f32 %r6574, %r6784, %r6783; 2026-02-21T10:15:14.4344792Z cvt.rn.bf16x2.f32 %r6575, %r6786, %r6785; 2026-02-21T10:15:14.4344863Z cvt.rn.bf16x2.f32 %r6576, %r6788, %r6787; 2026-02-21T10:15:14.4344933Z cvt.rn.bf16x2.f32 %r6577, %r6790, %r6789; 2026-02-21T10:15:14.4345008Z cvt.rn.bf16x2.f32 %r6578, %r6792, %r6791; 2026-02-21T10:15:14.4345079Z cvt.rn.bf16x2.f32 %r6579, %r6794, %r6793; 2026-02-21T10:15:14.4345148Z cvt.rn.bf16x2.f32 %r6580, %r6796, %r6795; 2026-02-21T10:15:14.4345217Z cvt.rn.bf16x2.f32 %r6581, %r6798, %r6797; 2026-02-21T10:15:14.4345292Z cvt.rn.bf16x2.f32 %r6582, %r6800, %r6799; 2026-02-21T10:15:14.4345362Z cvt.rn.bf16x2.f32 %r6583, %r6802, %r6801; 2026-02-21T10:15:14.4345437Z cvt.rn.bf16x2.f32 %r6584, %r6804, %r6803; 2026-02-21T10:15:14.4345515Z cvt.rn.bf16x2.f32 %r6585, %r6806, %r6805; 2026-02-21T10:15:14.4345649Z cvt.rn.bf16x2.f32 %r6586, %r6808, %r6807; 2026-02-21T10:15:14.4345733Z cvt.rn.bf16x2.f32 %r6587, %r6810, %r6809; 2026-02-21T10:15:14.4345812Z cvt.rn.bf16x2.f32 %r6588, %r6812, %r6811; 2026-02-21T10:15:14.4345883Z cvt.rn.bf16x2.f32 %r6589, %r6814, %r6813; 2026-02-21T10:15:14.4346005Z cvt.rn.bf16x2.f32 %r6590, %r6816, %r6815; 2026-02-21T10:15:14.4346078Z cvt.rn.bf16x2.f32 %r6591, %r6818, %r6817; 2026-02-21T10:15:14.4346154Z cvt.rn.bf16x2.f32 %r6592, %r6820, %r6819; 2026-02-21T10:15:14.4346226Z cvt.rn.bf16x2.f32 %r6593, %r6822, %r6821; 2026-02-21T10:15:14.4346297Z cvt.rn.bf16x2.f32 %r6594, %r6824, %r6823; 2026-02-21T10:15:14.4346376Z cvt.rn.bf16x2.f32 %r6595, %r6826, %r6825; 2026-02-21T10:15:14.4346564Z cvt.rn.bf16x2.f32 %r6596, %r6828, %r6827; 2026-02-21T10:15:14.4346641Z cvt.rn.bf16x2.f32 %r6597, %r6830, %r6829; 2026-02-21T10:15:14.4346715Z cvt.rn.bf16x2.f32 %r6598, %r6832, %r6831; 2026-02-21T10:15:14.4346795Z cvt.rn.bf16x2.f32 %r6599, %r6834, %r6833; 
2026-02-21T10:15:14.4346868Z cvt.rn.bf16x2.f32 %r6600, %r6836, %r6835; 2026-02-21T10:15:14.4346939Z cvt.rn.bf16x2.f32 %r6601, %r6838, %r6837; 2026-02-21T10:15:14.4347017Z cvt.rn.bf16x2.f32 %r6602, %r6840, %r6839; 2026-02-21T10:15:14.4347088Z cvt.rn.bf16x2.f32 %r6603, %r6842, %r6841; 2026-02-21T10:15:14.4347160Z cvt.rn.bf16x2.f32 %r6604, %r6844, %r6843; 2026-02-21T10:15:14.4347238Z cvt.rn.bf16x2.f32 %r6605, %r6846, %r6845; 2026-02-21T10:15:14.4347308Z cvt.rn.bf16x2.f32 %r6606, %r6848, %r6847; 2026-02-21T10:15:14.4347378Z cvt.rn.bf16x2.f32 %r6607, %r6850, %r6849; 2026-02-21T10:15:14.4347449Z cvt.rn.bf16x2.f32 %r6608, %r6852, %r6851; 2026-02-21T10:15:14.4347542Z cvt.rn.bf16x2.f32 %r6609, %r6854, %r6853; 2026-02-21T10:15:14.4347613Z cvt.rn.bf16x2.f32 %r6610, %r6856, %r6855; 2026-02-21T10:15:14.4347685Z cvt.rn.bf16x2.f32 %r6611, %r6858, %r6857; 2026-02-21T10:15:14.4347761Z cvt.rn.bf16x2.f32 %r6612, %r6860, %r6859; 2026-02-21T10:15:14.4347832Z cvt.rn.bf16x2.f32 %r6613, %r6862, %r6861; 2026-02-21T10:15:14.4347905Z cvt.rn.bf16x2.f32 %r6614, %r6864, %r6863; 2026-02-21T10:15:14.4347983Z cvt.rn.bf16x2.f32 %r6615, %r6866, %r6865; 2026-02-21T10:15:14.4348055Z cvt.rn.bf16x2.f32 %r6616, %r6868, %r6867; 2026-02-21T10:15:14.4348126Z cvt.rn.bf16x2.f32 %r6617, %r6870, %r6869; 2026-02-21T10:15:14.4348202Z cvt.rn.bf16x2.f32 %r6618, %r6872, %r6871; 2026-02-21T10:15:14.4348285Z cvt.rn.bf16x2.f32 %r6619, %r6874, %r6873; 2026-02-21T10:15:14.4348357Z cvt.rn.bf16x2.f32 %r6620, %r6876, %r6875; 2026-02-21T10:15:14.4348497Z cvt.rn.bf16x2.f32 %r6621, %r6878, %r6877; 2026-02-21T10:15:14.4348578Z cvt.rn.bf16x2.f32 %r6622, %r6880, %r6879; 2026-02-21T10:15:14.4348650Z cvt.rn.bf16x2.f32 %r6623, %r6882, %r6881; 2026-02-21T10:15:14.4348721Z cvt.rn.bf16x2.f32 %r6624, %r6884, %r6883; 2026-02-21T10:15:14.4348792Z cvt.rn.bf16x2.f32 %r6625, %r6886, %r6885; 2026-02-21T10:15:14.4348959Z cvt.rn.bf16x2.f32 %r6626, %r6888, %r6887; 2026-02-21T10:15:14.4349097Z cvt.rn.bf16x2.f32 %r6627, %r6890, %r6889; 
2026-02-21T10:15:14.4349170Z cvt.rn.bf16x2.f32 %r6628, %r6892, %r6891; 2026-02-21T10:15:14.4349249Z cvt.rn.bf16x2.f32 %r6629, %r6894, %r6893; 2026-02-21T10:15:14.4349319Z cvt.rn.bf16x2.f32 %r6630, %r6896, %r6895; 2026-02-21T10:15:14.4349389Z cvt.rn.bf16x2.f32 %r6631, %r6898, %r6897; 2026-02-21T10:15:14.4349467Z cvt.rn.bf16x2.f32 %r6632, %r6900, %r6899; 2026-02-21T10:15:14.4349537Z cvt.rn.bf16x2.f32 %r6633, %r6902, %r6901; 2026-02-21T10:15:14.4349607Z cvt.rn.bf16x2.f32 %r6634, %r6904, %r6903; 2026-02-21T10:15:14.4349676Z cvt.rn.bf16x2.f32 %r6635, %r6906, %r6905; 2026-02-21T10:15:14.4349753Z cvt.rn.bf16x2.f32 %r6636, %r6908, %r6907; 2026-02-21T10:15:14.4349822Z cvt.rn.bf16x2.f32 %r6637, %r6910, %r6909; 2026-02-21T10:15:14.4349890Z cvt.rn.bf16x2.f32 %r6638, %r6912, %r6911; 2026-02-21T10:15:14.4349965Z cvt.rn.bf16x2.f32 %r6639, %r6914, %r6913; 2026-02-21T10:15:14.4350040Z cvt.rn.bf16x2.f32 %r6640, %r6916, %r6915; 2026-02-21T10:15:14.4350113Z cvt.rn.bf16x2.f32 %r6641, %r6918, %r6917; 2026-02-21T10:15:14.4350263Z cvt.rn.bf16x2.f32 %r6642, %r6920, %r6919; 2026-02-21T10:15:14.4350339Z cvt.rn.bf16x2.f32 %r6643, %r6922, %r6921; 2026-02-21T10:15:14.4350409Z cvt.rn.bf16x2.f32 %r6644, %r6924, %r6923; 2026-02-21T10:15:14.4350478Z cvt.rn.bf16x2.f32 %r6645, %r6926, %r6925; 2026-02-21T10:15:14.4350613Z cvt.rn.bf16x2.f32 %r6646, %r6928, %r6927; 2026-02-21T10:15:14.4350687Z cvt.rn.bf16x2.f32 %r6647, %r6930, %r6929; 2026-02-21T10:15:14.4350757Z cvt.rn.bf16x2.f32 %r6648, %r6932, %r6931; 2026-02-21T10:15:14.4350835Z cvt.rn.bf16x2.f32 %r6649, %r6934, %r6933; 2026-02-21T10:15:14.4350904Z cvt.rn.bf16x2.f32 %r6650, %r6936, %r6935; 2026-02-21T10:15:14.4350973Z cvt.rn.bf16x2.f32 %r6651, %r6938, %r6937; 2026-02-21T10:15:14.4351044Z cvt.rn.bf16x2.f32 %r6652, %r6940, %r6939; 2026-02-21T10:15:14.4351121Z cvt.rn.bf16x2.f32 %r6653, %r6942, %r6941; 2026-02-21T10:15:14.4351195Z cvt.rn.bf16x2.f32 %r6654, %r6944, %r6943; 2026-02-21T10:15:14.4351278Z cvt.rn.bf16x2.f32 %r6655, %r6946, %r6945; 
2026-02-21T10:15:14.4351504Z .loc 1 85 50 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:85:50 2026-02-21T10:15:14.4351578Z mad.lo.s32 %r6656, %r6495, 1280, %r6527; 2026-02-21T10:15:14.4351648Z mad.lo.s32 %r6657, %r6496, 1280, %r6527; 2026-02-21T10:15:14.4351720Z mad.lo.s32 %r6658, %r6497, 1280, %r6527; 2026-02-21T10:15:14.4351791Z mad.lo.s32 %r6659, %r6498, 1280, %r6527; 2026-02-21T10:15:14.4351858Z mad.lo.s32 %r6660, %r6499, 1280, %r6527; 2026-02-21T10:15:14.4351925Z mad.lo.s32 %r6661, %r6500, 1280, %r6527; 2026-02-21T10:15:14.4352000Z mad.lo.s32 %r6662, %r6501, 1280, %r6527; 2026-02-21T10:15:14.4352068Z mad.lo.s32 %r6663, %r6502, 1280, %r6527; 2026-02-21T10:15:14.4352136Z mad.lo.s32 %r6664, %r6503, 1280, %r6527; 2026-02-21T10:15:14.4352209Z mad.lo.s32 %r6665, %r6504, 1280, %r6527; 2026-02-21T10:15:14.4352277Z mad.lo.s32 %r6666, %r6505, 1280, %r6527; 2026-02-21T10:15:14.4352348Z mad.lo.s32 %r6667, %r6506, 1280, %r6527; 2026-02-21T10:15:14.4352423Z mad.lo.s32 %r6668, %r6507, 1280, %r6527; 2026-02-21T10:15:14.4352492Z mad.lo.s32 %r6669, %r6508, 1280, %r6527; 2026-02-21T10:15:14.4352560Z mad.lo.s32 %r6670, %r6509, 1280, %r6527; 2026-02-21T10:15:14.4352627Z mad.lo.s32 %r6671, %r6510, 1280, %r6527; 2026-02-21T10:15:14.4352699Z mad.lo.s32 %r6672, %r6511, 1280, %r6527; 2026-02-21T10:15:14.4352768Z mad.lo.s32 %r6673, %r6512, 1280, %r6527; 2026-02-21T10:15:14.4352836Z mad.lo.s32 %r6674, %r6513, 1280, %r6527; 2026-02-21T10:15:14.4352909Z mad.lo.s32 %r6675, %r6514, 1280, %r6527; 2026-02-21T10:15:14.4352975Z mad.lo.s32 %r6676, %r6515, 1280, %r6527; 2026-02-21T10:15:14.4353043Z mad.lo.s32 %r6677, %r6516, 1280, %r6527; 2026-02-21T10:15:14.4353117Z mad.lo.s32 %r6678, %r6517, 1280, %r6527; 2026-02-21T10:15:14.4353186Z mad.lo.s32 %r6679, %r6518, 1280, %r6527; 2026-02-21T10:15:14.4353255Z mad.lo.s32 %r6680, %r6519, 1280, %r6527; 2026-02-21T10:15:14.4353418Z mad.lo.s32 %r6681, %r6520, 1280, %r6527; 2026-02-21T10:15:14.4353543Z mad.lo.s32 %r6682, %r6521, 1280, %r6527; 
2026-02-21T10:15:14.4353613Z mad.lo.s32 %r6683, %r6522, 1280, %r6527; 2026-02-21T10:15:14.4353683Z mad.lo.s32 %r6684, %r6523, 1280, %r6527; 2026-02-21T10:15:14.4353757Z mad.lo.s32 %r6685, %r6524, 1280, %r6527; 2026-02-21T10:15:14.4353827Z mad.lo.s32 %r6686, %r6525, 1280, %r6527; 2026-02-21T10:15:14.4353897Z mad.lo.s32 %r6687, %r6526, 1280, %r6527; 2026-02-21T10:15:14.4354117Z .loc 1 85 22 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:85:22 2026-02-21T10:15:14.4354189Z mad.wide.s32 %rd125, %r6656, 2, %rd17; 2026-02-21T10:15:14.4354257Z mad.wide.s32 %rd126, %r6657, 2, %rd17; 2026-02-21T10:15:14.4354324Z mad.wide.s32 %rd127, %r6658, 2, %rd17; 2026-02-21T10:15:14.4354397Z mad.wide.s32 %rd128, %r6659, 2, %rd17; 2026-02-21T10:15:14.4354464Z mad.wide.s32 %rd129, %r6660, 2, %rd17; 2026-02-21T10:15:14.4354534Z mad.wide.s32 %rd130, %r6661, 2, %rd17; 2026-02-21T10:15:14.4354606Z mad.wide.s32 %rd131, %r6662, 2, %rd17; 2026-02-21T10:15:14.4354679Z mad.wide.s32 %rd132, %r6663, 2, %rd17; 2026-02-21T10:15:14.4354797Z mad.wide.s32 %rd133, %r6664, 2, %rd17; 2026-02-21T10:15:14.4354868Z mad.wide.s32 %rd134, %r6665, 2, %rd17; 2026-02-21T10:15:14.4354942Z mad.wide.s32 %rd135, %r6666, 2, %rd17; 2026-02-21T10:15:14.4355010Z mad.wide.s32 %rd136, %r6667, 2, %rd17; 2026-02-21T10:15:14.4355119Z mad.wide.s32 %rd137, %r6668, 2, %rd17; 2026-02-21T10:15:14.4355195Z mad.wide.s32 %rd138, %r6669, 2, %rd17; 2026-02-21T10:15:14.4355263Z mad.wide.s32 %rd139, %r6670, 2, %rd17; 2026-02-21T10:15:14.4355331Z mad.wide.s32 %rd140, %r6671, 2, %rd17; 2026-02-21T10:15:14.4355405Z mad.wide.s32 %rd141, %r6672, 2, %rd17; 2026-02-21T10:15:14.4355472Z mad.wide.s32 %rd142, %r6673, 2, %rd17; 2026-02-21T10:15:14.4355539Z mad.wide.s32 %rd143, %r6674, 2, %rd17; 2026-02-21T10:15:14.4355606Z mad.wide.s32 %rd144, %r6675, 2, %rd17; 2026-02-21T10:15:14.4355689Z mad.wide.s32 %rd145, %r6676, 2, %rd17; 2026-02-21T10:15:14.4355762Z mad.wide.s32 %rd146, %r6677, 2, %rd17; 2026-02-21T10:15:14.4355832Z mad.wide.s32 
%rd147, %r6678, 2, %rd17; 2026-02-21T10:15:14.4355904Z mad.wide.s32 %rd148, %r6679, 2, %rd17; 2026-02-21T10:15:14.4355971Z mad.wide.s32 %rd149, %r6680, 2, %rd17; 2026-02-21T10:15:14.4356036Z mad.wide.s32 %rd150, %r6681, 2, %rd17; 2026-02-21T10:15:14.4356104Z mad.wide.s32 %rd151, %r6682, 2, %rd17; 2026-02-21T10:15:14.4356175Z mad.wide.s32 %rd152, %r6683, 2, %rd17; 2026-02-21T10:15:14.4356243Z mad.wide.s32 %rd153, %r6684, 2, %rd17; 2026-02-21T10:15:14.4356310Z mad.wide.s32 %rd154, %r6685, 2, %rd17; 2026-02-21T10:15:14.4356386Z mad.wide.s32 %rd155, %r6686, 2, %rd17; 2026-02-21T10:15:14.4356565Z mad.wide.s32 %rd156, %r6687, 2, %rd17; 2026-02-21T10:15:14.4356793Z .loc 1 85 81 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:85:81 2026-02-21T10:15:14.4356918Z st.shared.v4.b32 [%r110], {%r6528, %r6530, %r6532, %r6534}; 2026-02-21T10:15:14.4357029Z st.shared.v4.b32 [%r111], {%r6536, %r6538, %r6540, %r6542}; 2026-02-21T10:15:14.4357138Z st.shared.v4.b32 [%r112], {%r6544, %r6546, %r6548, %r6550}; 2026-02-21T10:15:14.4357244Z st.shared.v4.b32 [%r113], {%r6552, %r6554, %r6556, %r6558}; 2026-02-21T10:15:14.4357353Z st.shared.v4.b32 [%r114], {%r6560, %r6562, %r6564, %r6566}; 2026-02-21T10:15:14.4357456Z st.shared.v4.b32 [%r115], {%r6568, %r6570, %r6572, %r6574}; 2026-02-21T10:15:14.4357559Z st.shared.v4.b32 [%r116], {%r6576, %r6578, %r6580, %r6582}; 2026-02-21T10:15:14.4357666Z st.shared.v4.b32 [%r117], {%r6584, %r6586, %r6588, %r6590}; 2026-02-21T10:15:14.4357725Z bar.sync 0; 2026-02-21T10:15:14.4357787Z // begin inline asm 2026-02-21T10:15:14.4357985Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6207, %r6208, %r6209, %r6210}, [%r6211]; 2026-02-21T10:15:14.4358044Z // end inline asm 2026-02-21T10:15:14.4358104Z // begin inline asm 2026-02-21T10:15:14.4358384Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6212, %r6213, %r6214, %r6215}, [%r6216]; 2026-02-21T10:15:14.4358508Z // end inline asm 2026-02-21T10:15:14.4358570Z // begin inline asm 2026-02-21T10:15:14.4358756Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6217, %r6218, %r6219, %r6220}, [%r6221]; 2026-02-21T10:15:14.4358818Z // end inline asm 2026-02-21T10:15:14.4358876Z // begin inline asm 2026-02-21T10:15:14.4359058Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6222, %r6223, %r6224, %r6225}, [%r6226]; 2026-02-21T10:15:14.4359123Z // end inline asm 2026-02-21T10:15:14.4359182Z // begin inline asm 2026-02-21T10:15:14.4359361Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6227, %r6228, %r6229, %r6230}, [%r6231]; 2026-02-21T10:15:14.4359418Z // end inline asm 2026-02-21T10:15:14.4359482Z // begin inline asm 2026-02-21T10:15:14.4359659Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6232, %r6233, %r6234, %r6235}, [%r6236]; 2026-02-21T10:15:14.4359716Z // end inline asm 2026-02-21T10:15:14.4359795Z // begin inline asm 2026-02-21T10:15:14.4359978Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6237, %r6238, %r6239, %r6240}, [%r6241]; 2026-02-21T10:15:14.4360038Z // end inline asm 2026-02-21T10:15:14.4360166Z // begin inline asm 2026-02-21T10:15:14.4360343Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6242, %r6243, %r6244, %r6245}, [%r6246]; 2026-02-21T10:15:14.4360397Z // end inline asm 2026-02-21T10:15:14.4360453Z bar.sync 0; 2026-02-21T10:15:14.4360616Z st.shared.v4.b32 [%r110], {%r6529, %r6531, %r6533, %r6535}; 2026-02-21T10:15:14.4360721Z st.shared.v4.b32 [%r111], {%r6537, %r6539, %r6541, %r6543}; 2026-02-21T10:15:14.4360823Z st.shared.v4.b32 [%r112], {%r6545, %r6547, %r6549, %r6551}; 2026-02-21T10:15:14.4360927Z st.shared.v4.b32 [%r113], {%r6553, %r6555, %r6557, %r6559}; 2026-02-21T10:15:14.4361026Z st.shared.v4.b32 [%r114], {%r6561, %r6563, %r6565, %r6567}; 2026-02-21T10:15:14.4361125Z st.shared.v4.b32 [%r115], {%r6569, %r6571, %r6573, %r6575}; 2026-02-21T10:15:14.4361225Z st.shared.v4.b32 [%r116], {%r6577, %r6579, %r6581, %r6583}; 2026-02-21T10:15:14.4361326Z st.shared.v4.b32 [%r117], {%r6585, %r6587, %r6589, %r6591}; 2026-02-21T10:15:14.4361382Z bar.sync 0; 
2026-02-21T10:15:14.4361442Z // begin inline asm 2026-02-21T10:15:14.4361629Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6247, %r6248, %r6249, %r6250}, [%r6211]; 2026-02-21T10:15:14.4361687Z // end inline asm 2026-02-21T10:15:14.4361744Z // begin inline asm 2026-02-21T10:15:14.4361927Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6252, %r6253, %r6254, %r6255}, [%r6216]; 2026-02-21T10:15:14.4361985Z // end inline asm 2026-02-21T10:15:14.4362044Z // begin inline asm 2026-02-21T10:15:14.4362223Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6257, %r6258, %r6259, %r6260}, [%r6221]; 2026-02-21T10:15:14.4362300Z // end inline asm 2026-02-21T10:15:14.4362360Z // begin inline asm 2026-02-21T10:15:14.4362539Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6262, %r6263, %r6264, %r6265}, [%r6226]; 2026-02-21T10:15:14.4362601Z // end inline asm 2026-02-21T10:15:14.4362659Z // begin inline asm 2026-02-21T10:15:14.4362836Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6267, %r6268, %r6269, %r6270}, [%r6231]; 2026-02-21T10:15:14.4362898Z // end inline asm 2026-02-21T10:15:14.4362954Z // begin inline asm 2026-02-21T10:15:14.4363130Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6272, %r6273, %r6274, %r6275}, [%r6236]; 2026-02-21T10:15:14.4363185Z // end inline asm 2026-02-21T10:15:14.4363249Z // begin inline asm 2026-02-21T10:15:14.4363428Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6277, %r6278, %r6279, %r6280}, [%r6241]; 2026-02-21T10:15:14.4363483Z // end inline asm 2026-02-21T10:15:14.4363547Z // begin inline asm 2026-02-21T10:15:14.4363722Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6282, %r6283, %r6284, %r6285}, [%r6246]; 2026-02-21T10:15:14.4363782Z // end inline asm 2026-02-21T10:15:14.4363839Z bar.sync 0; 2026-02-21T10:15:14.4363944Z st.shared.v4.b32 [%r110], {%r6592, %r6594, %r6596, %r6598}; 2026-02-21T10:15:14.4364052Z st.shared.v4.b32 [%r111], {%r6600, %r6602, %r6604, %r6606}; 2026-02-21T10:15:14.4364221Z st.shared.v4.b32 [%r112], {%r6608, %r6610, %r6612, %r6614}; 
2026-02-21T10:15:14.4364393Z st.shared.v4.b32 [%r113], {%r6616, %r6618, %r6620, %r6622}; 2026-02-21T10:15:14.4364498Z st.shared.v4.b32 [%r114], {%r6624, %r6626, %r6628, %r6630}; 2026-02-21T10:15:14.4364603Z st.shared.v4.b32 [%r115], {%r6632, %r6634, %r6636, %r6638}; 2026-02-21T10:15:14.4364714Z st.shared.v4.b32 [%r116], {%r6640, %r6642, %r6644, %r6646}; 2026-02-21T10:15:14.4364816Z st.shared.v4.b32 [%r117], {%r6648, %r6650, %r6652, %r6654}; 2026-02-21T10:15:14.4364871Z bar.sync 0; 2026-02-21T10:15:14.4364937Z // begin inline asm 2026-02-21T10:15:14.4365120Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6287, %r6288, %r6289, %r6290}, [%r6211]; 2026-02-21T10:15:14.4365176Z // end inline asm 2026-02-21T10:15:14.4365235Z // begin inline asm 2026-02-21T10:15:14.4365420Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6292, %r6293, %r6294, %r6295}, [%r6216]; 2026-02-21T10:15:14.4365480Z // end inline asm 2026-02-21T10:15:14.4365535Z // begin inline asm 2026-02-21T10:15:14.4365773Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6297, %r6298, %r6299, %r6300}, [%r6221]; 2026-02-21T10:15:14.4365835Z // end inline asm 2026-02-21T10:15:14.4365894Z // begin inline asm 2026-02-21T10:15:14.4366080Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6302, %r6303, %r6304, %r6305}, [%r6226]; 2026-02-21T10:15:14.4366138Z // end inline asm 2026-02-21T10:15:14.4366238Z // begin inline asm 2026-02-21T10:15:14.4366417Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6307, %r6308, %r6309, %r6310}, [%r6231]; 2026-02-21T10:15:14.4366594Z // end inline asm 2026-02-21T10:15:14.4366659Z // begin inline asm 2026-02-21T10:15:14.4366837Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6312, %r6313, %r6314, %r6315}, [%r6236]; 2026-02-21T10:15:14.4366896Z // end inline asm 2026-02-21T10:15:14.4366954Z // begin inline asm 2026-02-21T10:15:14.4367143Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6317, %r6318, %r6319, %r6320}, [%r6241]; 2026-02-21T10:15:14.4367205Z // end inline asm 2026-02-21T10:15:14.4367270Z // begin 
inline asm 2026-02-21T10:15:14.4367448Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6322, %r6323, %r6324, %r6325}, [%r6246]; 2026-02-21T10:15:14.4367503Z // end inline asm 2026-02-21T10:15:14.4367561Z bar.sync 0; 2026-02-21T10:15:14.4367663Z st.shared.v4.b32 [%r110], {%r6593, %r6595, %r6597, %r6599}; 2026-02-21T10:15:14.4367765Z st.shared.v4.b32 [%r111], {%r6601, %r6603, %r6605, %r6607}; 2026-02-21T10:15:14.4367871Z st.shared.v4.b32 [%r112], {%r6609, %r6611, %r6613, %r6615}; 2026-02-21T10:15:14.4367972Z st.shared.v4.b32 [%r113], {%r6617, %r6619, %r6621, %r6623}; 2026-02-21T10:15:14.4368072Z st.shared.v4.b32 [%r114], {%r6625, %r6627, %r6629, %r6631}; 2026-02-21T10:15:14.4368171Z st.shared.v4.b32 [%r115], {%r6633, %r6635, %r6637, %r6639}; 2026-02-21T10:15:14.4368275Z st.shared.v4.b32 [%r116], {%r6641, %r6643, %r6645, %r6647}; 2026-02-21T10:15:14.4368377Z st.shared.v4.b32 [%r117], {%r6649, %r6651, %r6653, %r6655}; 2026-02-21T10:15:14.4368433Z bar.sync 0; 2026-02-21T10:15:14.4368497Z // begin inline asm 2026-02-21T10:15:14.4368674Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6327, %r6328, %r6329, %r6330}, [%r6211]; 2026-02-21T10:15:14.4368730Z // end inline asm 2026-02-21T10:15:14.4368792Z // begin inline asm 2026-02-21T10:15:14.4368968Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6332, %r6333, %r6334, %r6335}, [%r6216]; 2026-02-21T10:15:14.4369025Z // end inline asm 2026-02-21T10:15:14.4369096Z // begin inline asm 2026-02-21T10:15:14.4369279Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6337, %r6338, %r6339, %r6340}, [%r6221]; 2026-02-21T10:15:14.4369333Z // end inline asm 2026-02-21T10:15:14.4369391Z // begin inline asm 2026-02-21T10:15:14.4369575Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6342, %r6343, %r6344, %r6345}, [%r6226]; 2026-02-21T10:15:14.4369631Z // end inline asm 2026-02-21T10:15:14.4369689Z // begin inline asm 2026-02-21T10:15:14.4369951Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6347, %r6348, %r6349, %r6350}, [%r6231]; 2026-02-21T10:15:14.4370069Z 
// end inline asm 2026-02-21T10:15:14.4370127Z // begin inline asm 2026-02-21T10:15:14.4370307Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6352, %r6353, %r6354, %r6355}, [%r6236]; 2026-02-21T10:15:14.4370366Z // end inline asm 2026-02-21T10:15:14.4370424Z // begin inline asm 2026-02-21T10:15:14.4370603Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6357, %r6358, %r6359, %r6360}, [%r6241]; 2026-02-21T10:15:14.4370662Z // end inline asm 2026-02-21T10:15:14.4370719Z // begin inline asm 2026-02-21T10:15:14.4370897Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6362, %r6363, %r6364, %r6365}, [%r6246]; 2026-02-21T10:15:14.4370953Z // end inline asm 2026-02-21T10:15:14.4371026Z // begin inline asm 2026-02-21T10:15:14.4371154Z st.global.v4.b32 [ %rd125 + 0 ], { %r6207, %r6208, %r6209, %r6210 }; 2026-02-21T10:15:14.4371209Z // end inline asm 2026-02-21T10:15:14.4371272Z // begin inline asm 2026-02-21T10:15:14.4371393Z st.global.v4.b32 [ %rd126 + 0 ], { %r6212, %r6213, %r6214, %r6215 }; 2026-02-21T10:15:14.4371452Z // end inline asm 2026-02-21T10:15:14.4371515Z // begin inline asm 2026-02-21T10:15:14.4371697Z st.global.v4.b32 [ %rd127 + 0 ], { %r6247, %r6248, %r6249, %r6250 }; 2026-02-21T10:15:14.4371756Z // end inline asm 2026-02-21T10:15:14.4371815Z // begin inline asm 2026-02-21T10:15:14.4371935Z st.global.v4.b32 [ %rd128 + 0 ], { %r6252, %r6253, %r6254, %r6255 }; 2026-02-21T10:15:14.4372080Z // end inline asm 2026-02-21T10:15:14.4372142Z // begin inline asm 2026-02-21T10:15:14.4372262Z st.global.v4.b32 [ %rd129 + 0 ], { %r6217, %r6218, %r6219, %r6220 }; 2026-02-21T10:15:14.4372320Z // end inline asm 2026-02-21T10:15:14.4372379Z // begin inline asm 2026-02-21T10:15:14.4372493Z st.global.v4.b32 [ %rd130 + 0 ], { %r6222, %r6223, %r6224, %r6225 }; 2026-02-21T10:15:14.4372554Z // end inline asm 2026-02-21T10:15:14.4372611Z // begin inline asm 2026-02-21T10:15:14.4372727Z st.global.v4.b32 [ %rd131 + 0 ], { %r6257, %r6258, %r6259, %r6260 }; 2026-02-21T10:15:14.4372799Z // end 
inline asm 2026-02-21T10:15:14.4372860Z // begin inline asm 2026-02-21T10:15:14.4372979Z st.global.v4.b32 [ %rd132 + 0 ], { %r6262, %r6263, %r6264, %r6265 }; 2026-02-21T10:15:14.4373039Z // end inline asm 2026-02-21T10:15:14.4373098Z // begin inline asm 2026-02-21T10:15:14.4373215Z st.global.v4.b32 [ %rd133 + 0 ], { %r6227, %r6228, %r6229, %r6230 }; 2026-02-21T10:15:14.4373273Z // end inline asm 2026-02-21T10:15:14.4373338Z // begin inline asm 2026-02-21T10:15:14.4373453Z st.global.v4.b32 [ %rd134 + 0 ], { %r6232, %r6233, %r6234, %r6235 }; 2026-02-21T10:15:14.4373511Z // end inline asm 2026-02-21T10:15:14.4373575Z // begin inline asm 2026-02-21T10:15:14.4373690Z st.global.v4.b32 [ %rd135 + 0 ], { %r6267, %r6268, %r6269, %r6270 }; 2026-02-21T10:15:14.4373746Z // end inline asm 2026-02-21T10:15:14.4373809Z // begin inline asm 2026-02-21T10:15:14.4373930Z st.global.v4.b32 [ %rd136 + 0 ], { %r6272, %r6273, %r6274, %r6275 }; 2026-02-21T10:15:14.4373989Z // end inline asm 2026-02-21T10:15:14.4374051Z // begin inline asm 2026-02-21T10:15:14.4374175Z st.global.v4.b32 [ %rd137 + 0 ], { %r6237, %r6238, %r6239, %r6240 }; 2026-02-21T10:15:14.4374233Z // end inline asm 2026-02-21T10:15:14.4374291Z // begin inline asm 2026-02-21T10:15:14.4374405Z st.global.v4.b32 [ %rd138 + 0 ], { %r6242, %r6243, %r6244, %r6245 }; 2026-02-21T10:15:14.4374468Z // end inline asm 2026-02-21T10:15:14.4374529Z // begin inline asm 2026-02-21T10:15:14.4374647Z st.global.v4.b32 [ %rd139 + 0 ], { %r6277, %r6278, %r6279, %r6280 }; 2026-02-21T10:15:14.4374711Z // end inline asm 2026-02-21T10:15:14.4374769Z // begin inline asm 2026-02-21T10:15:14.4374886Z st.global.v4.b32 [ %rd140 + 0 ], { %r6282, %r6283, %r6284, %r6285 }; 2026-02-21T10:15:14.4374950Z // end inline asm 2026-02-21T10:15:14.4375009Z // begin inline asm 2026-02-21T10:15:14.4375128Z st.global.v4.b32 [ %rd141 + 0 ], { %r6287, %r6288, %r6289, %r6290 }; 2026-02-21T10:15:14.4375255Z // end inline asm 2026-02-21T10:15:14.4375320Z // begin inline asm 
2026-02-21T10:15:14.4375482Z st.global.v4.b32 [ %rd142 + 0 ], { %r6292, %r6293, %r6294, %r6295 }; 2026-02-21T10:15:14.4375539Z // end inline asm 2026-02-21T10:15:14.4375603Z // begin inline asm 2026-02-21T10:15:14.4375719Z st.global.v4.b32 [ %rd143 + 0 ], { %r6327, %r6328, %r6329, %r6330 }; 2026-02-21T10:15:14.4375776Z // end inline asm 2026-02-21T10:15:14.4375836Z // begin inline asm 2026-02-21T10:15:14.4375960Z st.global.v4.b32 [ %rd144 + 0 ], { %r6332, %r6333, %r6334, %r6335 }; 2026-02-21T10:15:14.4376016Z // end inline asm 2026-02-21T10:15:14.4376074Z // begin inline asm 2026-02-21T10:15:14.4376193Z st.global.v4.b32 [ %rd145 + 0 ], { %r6297, %r6298, %r6299, %r6300 }; 2026-02-21T10:15:14.4376250Z // end inline asm 2026-02-21T10:15:14.4376308Z // begin inline asm 2026-02-21T10:15:14.4376426Z st.global.v4.b32 [ %rd146 + 0 ], { %r6302, %r6303, %r6304, %r6305 }; 2026-02-21T10:15:14.4376608Z // end inline asm 2026-02-21T10:15:14.4376675Z // begin inline asm 2026-02-21T10:15:14.4376791Z st.global.v4.b32 [ %rd147 + 0 ], { %r6337, %r6338, %r6339, %r6340 }; 2026-02-21T10:15:14.4376852Z // end inline asm 2026-02-21T10:15:14.4376980Z // begin inline asm 2026-02-21T10:15:14.4377117Z st.global.v4.b32 [ %rd148 + 0 ], { %r6342, %r6343, %r6344, %r6345 }; 2026-02-21T10:15:14.4377181Z // end inline asm 2026-02-21T10:15:14.4377243Z // begin inline asm 2026-02-21T10:15:14.4377425Z st.global.v4.b32 [ %rd149 + 0 ], { %r6307, %r6308, %r6309, %r6310 }; 2026-02-21T10:15:14.4377486Z // end inline asm 2026-02-21T10:15:14.4377550Z // begin inline asm 2026-02-21T10:15:14.4377666Z st.global.v4.b32 [ %rd150 + 0 ], { %r6312, %r6313, %r6314, %r6315 }; 2026-02-21T10:15:14.4377725Z // end inline asm 2026-02-21T10:15:14.4377788Z // begin inline asm 2026-02-21T10:15:14.4377900Z st.global.v4.b32 [ %rd151 + 0 ], { %r6347, %r6348, %r6349, %r6350 }; 2026-02-21T10:15:14.4377956Z // end inline asm 2026-02-21T10:15:14.4378018Z // begin inline asm 2026-02-21T10:15:14.4378134Z st.global.v4.b32 [ %rd152 + 0 
], { %r6352, %r6353, %r6354, %r6355 }; 2026-02-21T10:15:14.4378193Z // end inline asm 2026-02-21T10:15:14.4378251Z // begin inline asm 2026-02-21T10:15:14.4378370Z st.global.v4.b32 [ %rd153 + 0 ], { %r6317, %r6318, %r6319, %r6320 }; 2026-02-21T10:15:14.4378430Z // end inline asm 2026-02-21T10:15:14.4378487Z // begin inline asm 2026-02-21T10:15:14.4378610Z st.global.v4.b32 [ %rd154 + 0 ], { %r6322, %r6323, %r6324, %r6325 }; 2026-02-21T10:15:14.4378670Z // end inline asm 2026-02-21T10:15:14.4378728Z // begin inline asm 2026-02-21T10:15:14.4378843Z st.global.v4.b32 [ %rd155 + 0 ], { %r6357, %r6358, %r6359, %r6360 }; 2026-02-21T10:15:14.4378918Z // end inline asm 2026-02-21T10:15:14.4378979Z // begin inline asm 2026-02-21T10:15:14.4379092Z st.global.v4.b32 [ %rd156 + 0 ], { %r6362, %r6363, %r6364, %r6365 }; 2026-02-21T10:15:14.4379153Z // end inline asm 2026-02-21T10:15:14.4379370Z .loc 1 22 52 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:22:52 2026-02-21T10:15:14.4379438Z add.s32 %r6688, %r6688, 1; 2026-02-21T10:15:14.4379519Z setp.ne.b32 %p22, %r6688, %r2; 2026-02-21T10:15:14.4379582Z @%p22 bra $L__BB0_2; 2026-02-21T10:15:14.4379673Z $L__BB0_5: // %._crit_edge 2026-02-21T10:15:14.4379885Z .loc 1 22 4 // cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py:22:4 2026-02-21T10:15:14.4379951Z ret; 2026-02-21T10:15:14.4380009Z $L__tmp4: 2026-02-21T10:15:14.4380070Z $L__func_end0: 2026-02-21T10:15:14.4380163Z // -- End function 2026-02-21T10:15:14.4380220Z } 2026-02-21T10:15:14.4380476Z .file 1 "/tmp/torchinductor_root/uv/cuvmqzfw3uuijcpdbgvimvagyasvkutahwyjwoqjzyrcjikqnozb.py" 2026-02-21T10:15:14.4380695Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T10:15:14.4380761Z .section .debug_abbrev 2026-02-21T10:15:14.4380814Z { 2026-02-21T10:15:14.4381001Z .b8 1 // Abbreviation Code 2026-02-21T10:15:14.4381161Z .b8 17 // DW_TAG_compile_unit 2026-02-21T10:15:14.4381250Z .b8 1 // DW_CHILDREN_yes 
2026-02-21T10:15:14.4381336Z .b8 37 // DW_AT_producer 2026-02-21T10:15:14.4381423Z .b8 8 // DW_FORM_string 2026-02-21T10:15:14.4381503Z .b8 19 // DW_AT_language 2026-02-21T10:15:14.4381586Z .b8 5 // DW_FORM_data2 2026-02-21T10:15:14.4381672Z .b8 3 // DW_AT_name 2026-02-21T10:15:14.4381753Z .b8 8 // DW_FORM_string 2026-02-21T10:15:14.4381837Z .b8 16 // DW_AT_stmt_list 2026-02-21T10:15:14.4381917Z .b8 6 // DW_FORM_data4 2026-02-21T10:15:14.4382005Z .b8 27 // DW_AT_comp_dir 2026-02-21T10:15:14.4382085Z .b8 8 // DW_FORM_string 2026-02-21T10:15:14.4382163Z .b8 0 // EOM(1) 2026-02-21T10:15:14.4382296Z .b8 0 // EOM(2) 2026-02-21T10:15:14.4382391Z .b8 2 // Abbreviation Code 2026-02-21T10:15:14.4382479Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:15:14.4382604Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:15:14.4382684Z .b8 3 // DW_AT_name 2026-02-21T10:15:14.4382763Z .b8 8 // DW_FORM_string 2026-02-21T10:15:14.4382848Z .b8 32 // DW_AT_inline 2026-02-21T10:15:14.4382936Z .b8 11 // DW_FORM_data1 2026-02-21T10:15:14.4383005Z .b8 0 // EOM(1) 2026-02-21T10:15:14.4383074Z .b8 0 // EOM(2) 2026-02-21T10:15:14.4383168Z .b8 3 // Abbreviation Code 2026-02-21T10:15:14.4383256Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:15:14.4383343Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:15:14.4383433Z .b8 17 // DW_AT_low_pc 2026-02-21T10:15:14.4383513Z .b8 1 // DW_FORM_addr 2026-02-21T10:15:14.4383596Z .b8 18 // DW_AT_high_pc 2026-02-21T10:15:14.4383676Z .b8 1 // DW_FORM_addr 2026-02-21T10:15:14.4383791Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:15:14.4383873Z .b8 19 // DW_FORM_ref4 2026-02-21T10:15:14.4383948Z .b8 0 // EOM(1) 2026-02-21T10:15:14.4384028Z .b8 0 // EOM(2) 2026-02-21T10:15:14.4384114Z .b8 4 // Abbreviation Code 2026-02-21T10:15:14.4384216Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T10:15:14.4384304Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:15:14.4384396Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:15:14.4384474Z .b8 19 // DW_FORM_ref4 2026-02-21T10:15:14.4384557Z .b8 17 // DW_AT_low_pc 
2026-02-21T10:15:14.4384639Z .b8 1 // DW_FORM_addr 2026-02-21T10:15:14.4384722Z .b8 18 // DW_AT_high_pc 2026-02-21T10:15:14.4384800Z .b8 1 // DW_FORM_addr 2026-02-21T10:15:14.4384892Z .b8 88 // DW_AT_call_file 2026-02-21T10:15:14.4384971Z .b8 11 // DW_FORM_data1 2026-02-21T10:15:14.4385055Z .b8 89 // DW_AT_call_line 2026-02-21T10:15:14.4385140Z .b8 11 // DW_FORM_data1 2026-02-21T10:15:14.4385283Z .b8 87 // DW_AT_call_column 2026-02-21T10:15:14.4385407Z .b8 11 // DW_FORM_data1 2026-02-21T10:15:14.4385485Z .b8 0 // EOM(1) 2026-02-21T10:15:14.4385556Z .b8 0 // EOM(2) 2026-02-21T10:15:14.4385626Z .b8 0 // EOM(3) 2026-02-21T10:15:14.4385679Z } 2026-02-21T10:15:14.4385748Z .section .debug_info 2026-02-21T10:15:14.4385804Z { 2026-02-21T10:15:14.4385895Z .b32 178 // Length of Unit 2026-02-21T10:15:14.4385994Z .b8 2 // DWARF version number 2026-02-21T10:15:14.4386047Z .b8 0 2026-02-21T10:15:14.4386178Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T10:15:14.4386271Z .b8 8 // Address Size (in bytes) 2026-02-21T10:15:14.4386383Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T10:15:14.4386586Z .b8 116 // DW_AT_producer 2026-02-21T10:15:14.4386650Z .b8 114 2026-02-21T10:15:14.4386796Z .b8 105 2026-02-21T10:15:14.4386852Z .b8 116 2026-02-21T10:15:14.4386905Z .b8 111 2026-02-21T10:15:14.4386961Z .b8 110 2026-02-21T10:15:14.4387012Z .b8 0 2026-02-21T10:15:14.4387092Z .b8 2 // DW_AT_language 2026-02-21T10:15:14.4387144Z .b8 0 2026-02-21T10:15:14.4387286Z .b8 99 // DW_AT_name 2026-02-21T10:15:14.4387352Z .b8 117 2026-02-21T10:15:14.4387405Z .b8 118 2026-02-21T10:15:14.4387462Z .b8 109 2026-02-21T10:15:14.4387517Z .b8 113 2026-02-21T10:15:14.4387569Z .b8 122 2026-02-21T10:15:14.4387620Z .b8 102 2026-02-21T10:15:14.4387676Z .b8 119 2026-02-21T10:15:14.4387728Z .b8 51 2026-02-21T10:15:14.4387779Z .b8 117 2026-02-21T10:15:14.4387830Z .b8 117 2026-02-21T10:15:14.4387889Z .b8 105 2026-02-21T10:15:14.4387941Z .b8 106 2026-02-21T10:15:14.4387994Z .b8 99 
2026-02-21T10:15:14.4388050Z .b8 112 2026-02-21T10:15:14.4388101Z .b8 100 2026-02-21T10:15:14.4388154Z .b8 98 2026-02-21T10:15:14.4388205Z .b8 103 2026-02-21T10:15:14.4388262Z .b8 118 2026-02-21T10:15:14.4388316Z .b8 105 2026-02-21T10:15:14.4388368Z .b8 109 2026-02-21T10:15:14.4388495Z .b8 118 2026-02-21T10:15:14.4388550Z .b8 97 2026-02-21T10:15:14.4388603Z .b8 103 2026-02-21T10:15:14.4388660Z .b8 121 2026-02-21T10:15:14.4388716Z .b8 97 2026-02-21T10:15:14.4388768Z .b8 115 2026-02-21T10:15:14.4388820Z .b8 118 2026-02-21T10:15:14.4388876Z .b8 107 2026-02-21T10:15:14.4388927Z .b8 117 2026-02-21T10:15:14.4388979Z .b8 116 2026-02-21T10:15:14.4389033Z .b8 97 2026-02-21T10:15:14.4389090Z .b8 104 2026-02-21T10:15:14.4389143Z .b8 119 2026-02-21T10:15:14.4389193Z .b8 121 2026-02-21T10:15:14.4389245Z .b8 106 2026-02-21T10:15:14.4389304Z .b8 119 2026-02-21T10:15:14.4389356Z .b8 111 2026-02-21T10:15:14.4389408Z .b8 113 2026-02-21T10:15:14.4389463Z .b8 106 2026-02-21T10:15:14.4389515Z .b8 122 2026-02-21T10:15:14.4389570Z .b8 121 2026-02-21T10:15:14.4389624Z .b8 114 2026-02-21T10:15:14.4389682Z .b8 99 2026-02-21T10:15:14.4389735Z .b8 106 2026-02-21T10:15:14.4389789Z .b8 105 2026-02-21T10:15:14.4389853Z .b8 107 2026-02-21T10:15:14.4389915Z .b8 113 2026-02-21T10:15:14.4389968Z .b8 110 2026-02-21T10:15:14.4390020Z .b8 111 2026-02-21T10:15:14.4390076Z .b8 122 2026-02-21T10:15:14.4390131Z .b8 98 2026-02-21T10:15:14.4390183Z .b8 46 2026-02-21T10:15:14.4390234Z .b8 112 2026-02-21T10:15:14.4390294Z .b8 121 2026-02-21T10:15:14.4390344Z .b8 0 2026-02-21T10:15:14.4390453Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T10:15:14.4390546Z .b8 47 // DW_AT_comp_dir 2026-02-21T10:15:14.4390599Z .b8 116 2026-02-21T10:15:14.4390651Z .b8 109 2026-02-21T10:15:14.4390704Z .b8 112 2026-02-21T10:15:14.4390761Z .b8 47 2026-02-21T10:15:14.4390817Z .b8 116 2026-02-21T10:15:14.4390868Z .b8 111 2026-02-21T10:15:14.4390925Z .b8 114 2026-02-21T10:15:14.4390977Z .b8 99 2026-02-21T10:15:14.4391120Z .b8 104 
2026-02-21T10:15:14.4391172Z .b8 105 2026-02-21T10:15:14.4391231Z .b8 110 2026-02-21T10:15:14.4391343Z .b8 100 2026-02-21T10:15:14.4391396Z .b8 117 2026-02-21T10:15:14.4391455Z .b8 99 2026-02-21T10:15:14.4391510Z .b8 116 2026-02-21T10:15:14.4391564Z .b8 111 2026-02-21T10:15:14.4391616Z .b8 114 2026-02-21T10:15:14.4391672Z .b8 95 2026-02-21T10:15:14.4391724Z .b8 114 2026-02-21T10:15:14.4391778Z .b8 111 2026-02-21T10:15:14.4391835Z .b8 111 2026-02-21T10:15:14.4391889Z .b8 116 2026-02-21T10:15:14.4391940Z .b8 47 2026-02-21T10:15:14.4391991Z .b8 117 2026-02-21T10:15:14.4392059Z .b8 118 2026-02-21T10:15:14.4392111Z .b8 0 2026-02-21T10:15:14.4392231Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T10:15:14.4392317Z .b8 95 // DW_AT_name 2026-02-21T10:15:14.4392370Z .b8 104 2026-02-21T10:15:14.4392421Z .b8 101 2026-02-21T10:15:14.4392472Z .b8 108 2026-02-21T10:15:14.4392527Z .b8 105 2026-02-21T10:15:14.4392578Z .b8 111 2026-02-21T10:15:14.4392634Z .b8 110 2026-02-21T10:15:14.4392685Z .b8 95 2026-02-21T10:15:14.4392742Z .b8 109 2026-02-21T10:15:14.4392796Z .b8 97 2026-02-21T10:15:14.4392847Z .b8 116 2026-02-21T10:15:14.4392979Z .b8 109 2026-02-21T10:15:14.4393037Z .b8 117 2026-02-21T10:15:14.4393090Z .b8 108 2026-02-21T10:15:14.4393140Z .b8 95 2026-02-21T10:15:14.4393196Z .b8 98 2026-02-21T10:15:14.4393247Z .b8 102 2026-02-21T10:15:14.4393298Z .b8 49 2026-02-21T10:15:14.4393359Z .b8 54 2026-02-21T10:15:14.4393460Z .b8 95 2026-02-21T10:15:14.4393515Z .b8 105 2026-02-21T10:15:14.4393566Z .b8 110 2026-02-21T10:15:14.4393623Z .b8 116 2026-02-21T10:15:14.4393675Z .b8 52 2026-02-21T10:15:14.4393726Z .b8 0 2026-02-21T10:15:14.4393815Z .b8 1 // DW_AT_inline 2026-02-21T10:15:14.4393924Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T10:15:14.4394022Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T10:15:14.4394120Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T10:15:14.4394228Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:15:14.4394361Z .b8 4 // Abbrev [4] 
0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T10:15:14.4394458Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:15:14.4394551Z .b64 $L__tmp0 // DW_AT_low_pc 2026-02-21T10:15:14.4394655Z .b64 $L__tmp3 // DW_AT_high_pc 2026-02-21T10:15:14.4394744Z .b8 1 // DW_AT_call_file 2026-02-21T10:15:14.4394832Z .b8 81 // DW_AT_call_line 2026-02-21T10:15:14.4394921Z .b8 40 // DW_AT_call_column 2026-02-21T10:15:14.4395013Z .b8 0 // End Of Children Mark 2026-02-21T10:15:14.4395100Z .b8 0 // End Of Children Mark 2026-02-21T10:15:14.4395162Z } 2026-02-21T10:15:14.4395234Z .section .debug_macinfo { } 2026-02-21T10:15:14.4395240Z 2026-02-21T10:15:14.4395323Z ================================================================ 2026-02-21T10:15:14.4395447Z please share the reproducer above with Triton project. 2026-02-21T10:15:16.7307642Z 2026-02-21T10:15:16.7310626Z Generation 10: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 78/78 2.5 configs/s 2026-02-21T10:15:22.2492175Z Generation 10: verifying top configs 100% ━━━━━━━━━━━━━━━━━━ 34/34 5.0 configs/s 2026-02-21T10:15:23.7845484Z [2916s] Generation 10 complete: 2026-02-21T10:15:23.7845816Z error=23 2026-02-21T10:15:23.7846037Z ok=58 2026-02-21T10:15:23.7846232Z min=5.9408 2026-02-21T10:15:23.7846428Z mid=10.1352 2026-02-21T10:15:23.7846885Z max=499.3928 2026-02-21T10:15:23.7847084Z best={'block_sizes': [16, 128, 128], 2026-02-21T10:15:23.7847445Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T10:15:23.7847818Z 'l2_groupings': [16], 2026-02-21T10:15:23.7848056Z 'load_eviction_policies': ['last', ''], 2026-02-21T10:15:23.7848796Z 'loop_orders': [[0, 1]], 2026-02-21T10:15:23.7849124Z 'maxnreg': 256, 2026-02-21T10:15:23.7849324Z 'num_sm_multiplier': 64, 2026-02-21T10:15:23.7849565Z 'num_stages': 1, 2026-02-21T10:15:23.7849764Z 'num_warps': 4, 2026-02-21T10:15:23.7849984Z 'pid_type': 'persistent_interleaved', 2026-02-21T10:15:23.7850257Z 'range_flattens': [True, True], 2026-02-21T10:15:23.7850516Z 
'range_multi_buffers': [None, False], 2026-02-21T10:15:23.7850780Z 'range_num_stages': [4, 2], 2026-02-21T10:15:23.7851019Z 'range_unroll_factors': [1, 0], 2026-02-21T10:15:23.7851262Z 'range_warp_specializes': []} 2026-02-21T10:15:23.7902718Z [2916s] Fitting surrogate: 1001 points, 1001 targets 2026-02-21T10:15:25.0053119Z [2918s] Generation 11 starting: 72 neighbors, 4 active search path(s) 2026-02-21T10:15:48.1292181Z Generation 11: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 72/72 2.9 configs/s 2026-02-21T10:16:13.0097002Z Generation 11: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 72/72 2.8 configs/s 2026-02-21T10:16:19.7519966Z Generation 11: verifying top configs 100% ━━━━━━━━━━━━━━━━━━ 34/34 4.4 configs/s 2026-02-21T10:16:21.2682980Z [2974s] Generation 11 complete: 2026-02-21T10:16:21.2683689Z error=14 2026-02-21T10:16:21.2683906Z ok=62 2026-02-21T10:16:21.2684107Z min=5.8497 2026-02-21T10:16:21.2684306Z mid=8.7092 2026-02-21T10:16:21.2684494Z max=466.0323 2026-02-21T10:16:21.2684687Z best={'block_sizes': [16, 128, 128], 2026-02-21T10:16:21.2685240Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T10:16:21.2685630Z 'l2_groupings': [16], 2026-02-21T10:16:21.2685867Z 'load_eviction_policies': ['last', ''], 2026-02-21T10:16:21.2686149Z 'loop_orders': [[0, 1]], 2026-02-21T10:16:21.2686366Z 'maxnreg': 256, 2026-02-21T10:16:21.2686864Z 'num_sm_multiplier': 64, 2026-02-21T10:16:21.2687082Z 'num_stages': 1, 2026-02-21T10:16:21.2687276Z 'num_warps': 4, 2026-02-21T10:16:21.2687488Z 'pid_type': 'persistent_interleaved', 2026-02-21T10:16:21.2687771Z 'range_flattens': [True, True], 2026-02-21T10:16:21.2688043Z 'range_multi_buffers': [True, False], 2026-02-21T10:16:21.2688320Z 'range_num_stages': [4, 2], 2026-02-21T10:16:21.2688568Z 'range_unroll_factors': [1, 0], 2026-02-21T10:16:21.2688818Z 'range_warp_specializes': []} 2026-02-21T10:16:21.2742793Z [2974s] Fitting surrogate: 1077 points, 1077 targets 2026-02-21T10:16:22.3733798Z [2975s] 
Generation 12 starting: 61 neighbors, 3 active search path(s) 2026-02-21T10:16:46.7995882Z Generation 12: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61/61 0.9 configs/s 2026-02-21T10:17:13.5371802Z Generation 12: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 61/61 2.2 configs/s 2026-02-21T10:17:17.5292023Z Generation 12: verifying top configs 100% ━━━━━━━━━━━━━━━━━━ 34/34 6.7 configs/s 2026-02-21T10:17:18.9834724Z [3032s] Generation 12 complete: 2026-02-21T10:17:18.9835017Z error=18 2026-02-21T10:17:18.9835195Z ok=46 2026-02-21T10:17:18.9835374Z min=5.9786 2026-02-21T10:17:18.9835552Z mid=9.3078 2026-02-21T10:17:18.9835740Z max=465.8338 2026-02-21T10:17:18.9835941Z best={'block_sizes': [16, 128, 128], 2026-02-21T10:17:18.9836317Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T10:17:18.9837352Z 'l2_groupings': [16], 2026-02-21T10:17:18.9837625Z 'load_eviction_policies': ['last', ''], 2026-02-21T10:17:18.9837908Z 'loop_orders': [[0, 1]], 2026-02-21T10:17:18.9838155Z 'maxnreg': 256, 2026-02-21T10:17:18.9838353Z 'num_sm_multiplier': 64, 2026-02-21T10:17:18.9838580Z 'num_stages': 1, 2026-02-21T10:17:18.9838769Z 'num_warps': 4, 2026-02-21T10:17:18.9838990Z 'pid_type': 'persistent_interleaved', 2026-02-21T10:17:18.9839274Z 'range_flattens': [True, False], 2026-02-21T10:17:18.9839531Z 'range_multi_buffers': [True, False], 2026-02-21T10:17:18.9839808Z 'range_num_stages': [4, 2], 2026-02-21T10:17:18.9840046Z 'range_unroll_factors': [1, 0], 2026-02-21T10:17:18.9840300Z 'range_warp_specializes': []} 2026-02-21T10:17:18.9889812Z [3032s] Fitting surrogate: 1141 points, 1141 targets 2026-02-21T10:17:19.9989987Z [3033s] Generation 13 starting: 58 neighbors, 3 active search path(s) 2026-02-21T10:17:40.3245639Z Generation 13: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58/58 2.6 configs/s 2026-02-21T10:18:00.0759775Z Generation 13: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 58/58 2.9 configs/s 2026-02-21T10:18:03.6115996Z Generation 13: 
verifying top configs 100% ━━━━━━━━━━━━━━━━━━ 34/34 7.3 configs/s 2026-02-21T10:18:05.0847036Z [3078s] Generation 13 complete: 2026-02-21T10:18:05.0847514Z error=18 2026-02-21T10:18:05.0847801Z ok=43 2026-02-21T10:18:05.0848059Z min=6.3683 2026-02-21T10:18:05.0848332Z mid=9.6286 2026-02-21T10:18:05.0848585Z max=467.3730 2026-02-21T10:18:05.0848901Z best={'block_sizes': [16, 128, 128], 2026-02-21T10:18:05.0849492Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T10:18:05.0850116Z 'l2_groupings': [8], 2026-02-21T10:18:05.0850503Z 'load_eviction_policies': ['last', ''], 2026-02-21T10:18:05.0850931Z 'loop_orders': [[0, 1]], 2026-02-21T10:18:05.0851293Z 'maxnreg': 256, 2026-02-21T10:18:05.0851605Z 'num_sm_multiplier': 64, 2026-02-21T10:18:05.0851965Z 'num_stages': 1, 2026-02-21T10:18:05.0852263Z 'num_warps': 4, 2026-02-21T10:18:05.0853124Z 'pid_type': 'persistent_interleaved', 2026-02-21T10:18:05.0853606Z 'range_flattens': [True, False], 2026-02-21T10:18:05.0854020Z 'range_multi_buffers': [True, False], 2026-02-21T10:18:05.0854429Z 'range_num_stages': [4, 2], 2026-02-21T10:18:05.0854814Z 'range_unroll_factors': [1, 0], 2026-02-21T10:18:05.0855436Z 'range_warp_specializes': []} 2026-02-21T10:18:05.0906179Z [3078s] Fitting surrogate: 1202 points, 1202 targets 2026-02-21T10:18:06.0911782Z [3079s] Generation 14 starting: 57 neighbors, 3 active search path(s) 2026-02-21T10:18:27.0019512Z Generation 14: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58/58 3.2 configs/s 2026-02-21T10:18:33.5512613Z 2026-02-21T10:18:33.5512636Z 2026-02-21T10:18:33.5513059Z ================================================================ 2026-02-21T10:18:33.5513687Z Internal Triton PTX codegen error 2026-02-21T10:18:33.5514129Z `ptxas` stderr: 2026-02-21T10:18:33.5515467Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 1154 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 
2026-02-21T10:18:33.5516298Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:18:33.5516824Z 2026-02-21T10:18:33.5517494Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp4v0g1n9x.ptx -o /tmp/tmp4v0g1n9x.ptx.o 2026-02-21T10:18:33.5518259Z 2026-02-21T10:18:33.5518263Z 2026-02-21T10:18:33.5518333Z // 2026-02-21T10:18:33.5518525Z // Generated by LLVM NVPTX Back-End 2026-02-21T10:18:33.5518768Z // 2026-02-21T10:18:33.5518858Z 2026-02-21T10:18:33.5518935Z .version 8.7 2026-02-21T10:18:33.5519109Z .target sm_90a 2026-02-21T10:18:33.5519293Z .address_size 64 2026-02-21T10:18:33.5519405Z 2026-02-21T10:18:33.5519626Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T10:18:33.5520059Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T10:18:33.5520373Z // @_helion_matmul_bf16_int4 2026-02-21T10:18:33.5520700Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T10:18:33.5521067Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T10:18:33.5521519Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T10:18:33.5521955Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T10:18:33.5522397Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T10:18:33.5522886Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T10:18:33.5523219Z ) 2026-02-21T10:18:33.5523374Z .reqntid 128 2026-02-21T10:18:33.5523552Z .maxnreg 32 2026-02-21T10:18:33.5523710Z { 2026-02-21T10:18:33.5523889Z .reg .pred %p<315>; 2026-02-21T10:18:33.5524564Z .reg .b16 %rs<2689>; 2026-02-21T10:18:33.5524766Z .reg .b32 %r<32677>; 2026-02-21T10:18:33.5525046Z .reg .b64 %rd<666>; 2026-02-21T10:18:33.5525393Z .loc 1 19 0 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:19:0 2026-02-21T10:18:33.5525761Z $L__func_begin0: 
2026-02-21T10:18:33.5526093Z .loc 1 19 0 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:19:0 2026-02-21T10:18:33.5526404Z 2026-02-21T10:18:33.5526620Z // %bb.0: 2026-02-21T10:18:33.5526849Z ld.param.b64 %rd85, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T10:18:33.5527181Z $L__tmp0: 2026-02-21T10:18:33.5527724Z .loc 1 21 67 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:21:67 2026-02-21T10:18:33.5528111Z mov.u32 %r31905, %ctaid.x; 2026-02-21T10:18:33.5528367Z ld.param.b64 %rd87, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T10:18:33.5528688Z ld.param.b64 %rd105, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T10:18:33.5528962Z mov.u32 %r1689, %ctaid.y; 2026-02-21T10:18:33.5529204Z ld.param.b64 %rd122, [_helion_matmul_bf16_int4_param_3]; 2026-02-21T10:18:33.5529460Z mov.u32 %r1690, %ctaid.z; 2026-02-21T10:18:33.5529811Z mov.u32 %r1691, %nctaid.x; 2026-02-21T10:18:33.5530002Z mov.u32 %r1692, %nctaid.y; 2026-02-21T10:18:33.5530213Z mad.lo.s32 %r1693, %r1690, %r1692, %r1689; 2026-02-21T10:18:33.5530457Z mad.lo.s32 %r1694, %r1693, %r1691, %r31905; 2026-02-21T10:18:33.5530824Z shl.b32 %r1695, %r1694, 8; 2026-02-21T10:18:33.5531034Z cvt.s64.s32 %rd123, %r1695; 2026-02-21T10:18:33.5531274Z add.s64 %rd101, %rd122, %rd123; 2026-02-21T10:18:33.5531506Z mov.u32 %r2, %tid.x; 2026-02-21T10:18:33.5531743Z setp.lt.u32 %p1, %r2, 32; 2026-02-21T10:18:33.5531944Z shl.b32 %r1696, %r2, 2; 2026-02-21T10:18:33.5532129Z mov.b32 %r29377, global_smem; 2026-02-21T10:18:33.5532336Z add.s32 %r1673, %r29377, %r1696; 2026-02-21T10:18:33.5532531Z mov.b32 %r1682, 0; 2026-02-21T10:18:33.5532699Z // begin inline asm 2026-02-21T10:18:33.5532878Z @%p1 st.shared.b32 [ %r1673 + 0 ], %r1682; 2026-02-21T10:18:33.5533114Z // end inline asm 2026-02-21T10:18:33.5533291Z bar.warp.sync -1; 2026-02-21T10:18:33.5533476Z setp.eq.b32 %p222, %r2, 0; 2026-02-21T10:18:33.5533679Z cvt.u64.u32 %rd86, %r29377; 2026-02-21T10:18:33.5533880Z // begin inline asm 2026-02-21T10:18:33.5534221Z @%p222 
tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd86 + 0 ], %rd87; 2026-02-21T10:18:33.5534609Z // end inline asm 2026-02-21T10:18:33.5534772Z // begin inline asm 2026-02-21T10:18:33.5535124Z @%p222 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd86 + 0 ], 0x1; 2026-02-21T10:18:33.5535469Z // end inline asm 2026-02-21T10:18:33.5535627Z mov.b32 %r1675, 128; 2026-02-21T10:18:33.5535819Z // begin inline asm 2026-02-21T10:18:33.5536114Z @%p222 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd86 + 0 ], 0x0, %r1675; 2026-02-21T10:18:33.5536623Z // end inline asm 2026-02-21T10:18:33.5536786Z mov.b32 %r1676, 32; 2026-02-21T10:18:33.5536954Z // begin inline asm 2026-02-21T10:18:33.5537232Z @%p222 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd86 + 0 ], 0x1, %r1676; 2026-02-21T10:18:33.5537588Z // end inline asm 2026-02-21T10:18:33.5537767Z mov.b32 %r1677, 1280; 2026-02-21T10:18:33.5537957Z // begin inline asm 2026-02-21T10:18:33.5538341Z @%p222 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd86 + 0 ], 0x0, %r1677; 2026-02-21T10:18:33.5538713Z // end inline asm 2026-02-21T10:18:33.5538901Z mov.b32 %r1678, 4096; 2026-02-21T10:18:33.5539076Z // begin inline asm 2026-02-21T10:18:33.5539392Z @%p222 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd86 + 0 ], 0x1, %r1678; 2026-02-21T10:18:33.5539743Z // end inline asm 2026-02-21T10:18:33.5539894Z mov.b64 %rd94, 1280; 2026-02-21T10:18:33.5540053Z // begin inline asm 2026-02-21T10:18:33.5540384Z @%p222 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd86 + 0 ], 0x0, %rd94; 2026-02-21T10:18:33.5540741Z // end inline asm 2026-02-21T10:18:33.5541018Z mov.b32 %r1679, 1; 2026-02-21T10:18:33.5541172Z // begin inline asm 2026-02-21T10:18:33.5541585Z @%p222 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd86 + 0 ], 0x0, %r1679; 2026-02-21T10:18:33.5541958Z // end inline asm 2026-02-21T10:18:33.5542111Z // begin inline asm 
2026-02-21T10:18:33.5542449Z @%p222 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd86 + 0 ], 0x1, %r1679; 2026-02-21T10:18:33.5542816Z // end inline asm 2026-02-21T10:18:33.5542971Z // begin inline asm 2026-02-21T10:18:33.5543268Z @%p222 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd86 + 0 ], 0x0; 2026-02-21T10:18:33.5543615Z // end inline asm 2026-02-21T10:18:33.5543762Z // begin inline asm 2026-02-21T10:18:33.5544102Z @%p222 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd86 + 0 ], 0x0; 2026-02-21T10:18:33.5544486Z // end inline asm 2026-02-21T10:18:33.5544634Z // begin inline asm 2026-02-21T10:18:33.5544937Z @%p222 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd86 + 0 ], 0x3; 2026-02-21T10:18:33.5545285Z // end inline asm 2026-02-21T10:18:33.5545437Z // begin inline asm 2026-02-21T10:18:33.5545802Z @%p222 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd86 + 0 ], 0x0; 2026-02-21T10:18:33.5546148Z // end inline asm 2026-02-21T10:18:33.5546309Z // begin inline asm 2026-02-21T10:18:33.5546975Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd101 + 0 ], [ %rd86 + 0 ], 0x80; 2026-02-21T10:18:33.5547486Z // end inline asm 2026-02-21T10:18:33.5547638Z // begin inline asm 2026-02-21T10:18:33.5547903Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd101 + 0 ], 0x80; 2026-02-21T10:18:33.5548210Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T10:18:33.5548523Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T10:18:33.5548756Z // end inline asm 2026-02-21T10:18:33.5548916Z bar.sync 0; 2026-02-21T10:18:33.5549080Z cvta.global.u64 %rd633, %rd101; 2026-02-21T10:18:33.5549431Z .loc 1 23 68 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:23:68 2026-02-21T10:18:33.5549801Z add.s64 %rd119, %rd101, 128; 2026-02-21T10:18:33.5549977Z bar.sync 0; 2026-02-21T10:18:33.5550127Z // begin inline asm 2026-02-21T10:18:33.5550293Z @%p1 st.shared.b32 [ 
%r1673 + 0 ], %r1682; 2026-02-21T10:18:33.5550503Z // end inline asm 2026-02-21T10:18:33.5550661Z bar.warp.sync -1; 2026-02-21T10:18:33.5550817Z // begin inline asm 2026-02-21T10:18:33.5551131Z @%p222 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd86 + 0 ], %rd105; 2026-02-21T10:18:33.5551486Z // end inline asm 2026-02-21T10:18:33.5551639Z // begin inline asm 2026-02-21T10:18:33.5551907Z @%p222 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd86 + 0 ], 0x1; 2026-02-21T10:18:33.5552225Z // end inline asm 2026-02-21T10:18:33.5552371Z mov.b32 %r1683, 64; 2026-02-21T10:18:33.5552532Z // begin inline asm 2026-02-21T10:18:33.5552824Z @%p222 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd86 + 0 ], 0x0, %r1683; 2026-02-21T10:18:33.5553169Z // end inline asm 2026-02-21T10:18:33.5553325Z // begin inline asm 2026-02-21T10:18:33.5553606Z @%p222 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd86 + 0 ], 0x1, %r1675; 2026-02-21T10:18:33.5553988Z // end inline asm 2026-02-21T10:18:33.5554147Z // begin inline asm 2026-02-21T10:18:33.5554457Z @%p222 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd86 + 0 ], 0x0, %r1677; 2026-02-21T10:18:33.5554820Z // end inline asm 2026-02-21T10:18:33.5554966Z mov.b32 %r1686, 65536; 2026-02-21T10:18:33.5555144Z // begin inline asm 2026-02-21T10:18:33.5555438Z @%p222 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd86 + 0 ], 0x1, %r1686; 2026-02-21T10:18:33.5555789Z // end inline asm 2026-02-21T10:18:33.5555940Z mov.b64 %rd112, 2560; 2026-02-21T10:18:33.5556111Z // begin inline asm 2026-02-21T10:18:33.5556420Z @%p222 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd86 + 0 ], 0x0, %rd112; 2026-02-21T10:18:33.5557046Z // end inline asm 2026-02-21T10:18:33.5557282Z // begin inline asm 2026-02-21T10:18:33.5557611Z @%p222 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd86 + 0 ], 0x0, %r1679; 2026-02-21T10:18:33.5557984Z // end inline asm 
2026-02-21T10:18:33.5558133Z // begin inline asm 2026-02-21T10:18:33.5558448Z @%p222 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd86 + 0 ], 0x1, %r1679; 2026-02-21T10:18:33.5558820Z // end inline asm 2026-02-21T10:18:33.5558986Z // begin inline asm 2026-02-21T10:18:33.5559284Z @%p222 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd86 + 0 ], 0xa; 2026-02-21T10:18:33.5559617Z // end inline asm 2026-02-21T10:18:33.5559771Z // begin inline asm 2026-02-21T10:18:33.5560076Z @%p222 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd86 + 0 ], 0x0; 2026-02-21T10:18:33.5560443Z // end inline asm 2026-02-21T10:18:33.5560598Z // begin inline asm 2026-02-21T10:18:33.5560896Z @%p222 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd86 + 0 ], 0x3; 2026-02-21T10:18:33.5561240Z // end inline asm 2026-02-21T10:18:33.5561393Z // begin inline asm 2026-02-21T10:18:33.5561747Z @%p222 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd86 + 0 ], 0x0; 2026-02-21T10:18:33.5562089Z // end inline asm 2026-02-21T10:18:33.5562259Z // begin inline asm 2026-02-21T10:18:33.5562769Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd119 + 0 ], [ %rd86 + 0 ], 0x80; 2026-02-21T10:18:33.5563259Z // end inline asm 2026-02-21T10:18:33.5563419Z // begin inline asm 2026-02-21T10:18:33.5563674Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd119 + 0 ], 0x80; 2026-02-21T10:18:33.5563995Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T10:18:33.5564209Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T10:18:33.5564432Z // end inline asm 2026-02-21T10:18:33.5564574Z bar.sync 0; 2026-02-21T10:18:33.5564739Z cvta.global.u64 %rd298, %rd119; 2026-02-21T10:18:33.5565089Z .loc 1 30 49 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:30:49 2026-02-21T10:18:33.5565458Z min.u32 %r3, %r31905, 5119; 2026-02-21T10:18:33.5565797Z .loc 1 31 88 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:31:88 2026-02-21T10:18:33.5566156Z sub.s32 %r1698, %r3, %r31905; 2026-02-21T10:18:33.5566354Z add.s32 %r1699, %r1698, 1; 2026-02-21T10:18:33.5566665Z shr.u32 %r1700, %r1699, 31; 2026-02-21T10:18:33.5566851Z add.s32 %r1701, %r1699, %r1700; 2026-02-21T10:18:33.5567048Z and.b32 %r1702, %r1701, -2; 2026-02-21T10:18:33.5567233Z add.s32 %r32420, %r1702, %r31905; 2026-02-21T10:18:33.5567597Z .loc 1 44 45 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:44:45 2026-02-21T10:18:33.5567958Z shr.u32 %r5, %r2, 5; 2026-02-21T10:18:33.5568130Z shr.u32 %r1703, %r2, 3; 2026-02-21T10:18:33.5568296Z bfe.u32 %r6, %r2, 3, 4; 2026-02-21T10:18:33.5568473Z or.b32 %r7, %r6, 16; 2026-02-21T10:18:33.5568628Z or.b32 %r8, %r6, 32; 2026-02-21T10:18:33.5568804Z or.b32 %r9, %r6, 48; 2026-02-21T10:18:33.5568974Z or.b32 %r10, %r6, 64; 2026-02-21T10:18:33.5569140Z or.b32 %r11, %r6, 80; 2026-02-21T10:18:33.5569305Z or.b32 %r12, %r6, 96; 2026-02-21T10:18:33.5569460Z or.b32 %r13, %r1703, 112; 2026-02-21T10:18:33.5569792Z .loc 1 57 38 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:57:38 2026-02-21T10:18:33.5570150Z and.b32 %r14, %r2, 7; 2026-02-21T10:18:33.5570319Z shl.b32 %r15, %r14, 3; 2026-02-21T10:18:33.5570630Z .loc 1 31 88 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:31:88 2026-02-21T10:18:33.5571002Z setp.ge.s32 %p37, %r31905, %r32420; 2026-02-21T10:18:33.5571221Z and.b32 %r31897, %r2, 127; 2026-02-21T10:18:33.5571408Z and.b32 %r31898, %r2, 56; 2026-02-21T10:18:33.5571620Z shl.b32 %r31899, %r2, 6; 2026-02-21T10:18:33.5571807Z shl.b32 %r31900, %r2, 5; 2026-02-21T10:18:33.5572098Z shl.b32 %r31901, %r2, 1; 2026-02-21T10:18:33.5572264Z shl.b32 %r31902, %r14, 4; 2026-02-21T10:18:33.5572507Z shl.b32 %r31903, %r2, 7; 2026-02-21T10:18:33.5572691Z and.b32 %r31904, %r2, 16; 2026-02-21T10:18:33.5572879Z setp.lt.u32 %p314, %r2, 64; 2026-02-21T10:18:33.5573058Z @%p37 bra $L__BB0_11; 2026-02-21T10:18:33.5573249Z // 
%bb.1: // %.lr.ph 2026-02-21T10:18:33.5573633Z .loc 1 0 88 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:0:88 2026-02-21T10:18:33.5573992Z shl.b32 %r1705, %r31897, 4; 2026-02-21T10:18:33.5574209Z xor.b32 %r1707, %r1705, %r31898; 2026-02-21T10:18:33.5574405Z add.s32 %r16, %r29377, %r1707; 2026-02-21T10:18:33.5574598Z xor.b32 %r1709, %r1707, 8; 2026-02-21T10:18:33.5574772Z add.s32 %r17, %r29377, %r1709; 2026-02-21T10:18:33.5574959Z and.b32 %r1711, %r31899, 6144; 2026-02-21T10:18:33.5575167Z and.b32 %r1713, %r31900, 896; 2026-02-21T10:18:33.5575350Z and.b32 %r1715, %r31901, 62; 2026-02-21T10:18:33.5575564Z or.b32 %r1716, %r1711, %r1713; 2026-02-21T10:18:33.5575758Z or.b32 %r1717, %r1716, %r1715; 2026-02-21T10:18:33.5575949Z add.s32 %r18, %r29377, %r1717; 2026-02-21T10:18:33.5576231Z xor.b32 %r1718, %r1717, 8; 2026-02-21T10:18:33.5576415Z add.s32 %r19, %r29377, %r1718; 2026-02-21T10:18:33.5576723Z xor.b32 %r1719, %r1717, 16; 2026-02-21T10:18:33.5576918Z add.s32 %r20, %r29377, %r1719; 2026-02-21T10:18:33.5577095Z xor.b32 %r1720, %r1717, 24; 2026-02-21T10:18:33.5577360Z add.s32 %r21, %r29377, %r1720; 2026-02-21T10:18:33.5577552Z xor.b32 %r1721, %r1717, 32; 2026-02-21T10:18:33.5577725Z add.s32 %r22, %r29377, %r1721; 2026-02-21T10:18:33.5577927Z xor.b32 %r1722, %r1717, 40; 2026-02-21T10:18:33.5578102Z add.s32 %r23, %r29377, %r1722; 2026-02-21T10:18:33.5578285Z xor.b32 %r1723, %r1717, 48; 2026-02-21T10:18:33.5578459Z add.s32 %r24, %r29377, %r1723; 2026-02-21T10:18:33.5578652Z xor.b32 %r1724, %r1717, 56; 2026-02-21T10:18:33.5578822Z add.s32 %r25, %r29377, %r1724; 2026-02-21T10:18:33.5579013Z add.s32 %r26, %r29377, %r31897; 2026-02-21T10:18:33.5579207Z xor.b32 %r1725, %r31897, 16; 2026-02-21T10:18:33.5579380Z add.s32 %r27, %r29377, %r1725; 2026-02-21T10:18:33.5579579Z xor.b32 %r1726, %r31897, 32; 2026-02-21T10:18:33.5579758Z add.s32 %r28, %r29377, %r1726; 2026-02-21T10:18:33.5579941Z xor.b32 %r1727, %r31897, 48; 2026-02-21T10:18:33.5580118Z add.s32 %r29, %r29377, 
%r1727; 2026-02-21T10:18:33.5580302Z xor.b32 %r1728, %r31897, 64; 2026-02-21T10:18:33.5580472Z add.s32 %r30, %r29377, %r1728; 2026-02-21T10:18:33.5580651Z xor.b32 %r1729, %r31897, 80; 2026-02-21T10:18:33.5580833Z add.s32 %r31, %r29377, %r1729; 2026-02-21T10:18:33.5581015Z xor.b32 %r1730, %r31897, 96; 2026-02-21T10:18:33.5581191Z add.s32 %r32, %r29377, %r1730; 2026-02-21T10:18:33.5581371Z xor.b32 %r1731, %r31897, 112; 2026-02-21T10:18:33.5581550Z add.s32 %r33, %r29377, %r1731; 2026-02-21T10:18:33.5581724Z shl.b32 %r1732, %r31897, 7; 2026-02-21T10:18:33.5581901Z or.b32 %r1734, %r1732, %r31902; 2026-02-21T10:18:33.5582082Z add.s32 %r34, %r29377, %r1734; 2026-02-21T10:18:33.5582261Z xor.b32 %r1735, %r1734, 16; 2026-02-21T10:18:33.5582434Z add.s32 %r35, %r29377, %r1735; 2026-02-21T10:18:33.5582627Z xor.b32 %r1736, %r1734, 32; 2026-02-21T10:18:33.5582803Z add.s32 %r36, %r29377, %r1736; 2026-02-21T10:18:33.5582977Z xor.b32 %r1737, %r1734, 48; 2026-02-21T10:18:33.5583152Z add.s32 %r37, %r29377, %r1737; 2026-02-21T10:18:33.5583329Z xor.b32 %r1738, %r1734, 64; 2026-02-21T10:18:33.5583504Z add.s32 %r38, %r29377, %r1738; 2026-02-21T10:18:33.5583679Z xor.b32 %r1739, %r1734, 80; 2026-02-21T10:18:33.5583853Z add.s32 %r39, %r29377, %r1739; 2026-02-21T10:18:33.5584029Z xor.b32 %r1740, %r1734, 96; 2026-02-21T10:18:33.5584206Z add.s32 %r40, %r29377, %r1740; 2026-02-21T10:18:33.5584382Z xor.b32 %r1741, %r1734, 112; 2026-02-21T10:18:33.5584558Z add.s32 %r41, %r29377, %r1741; 2026-02-21T10:18:33.5584741Z bfe.u32 %r1742, %r29377, 4, 14; 2026-02-21T10:18:33.5585016Z cvt.u64.u32 %rd124, %r1742; 2026-02-21T10:18:33.5585211Z or.b64 %rd3, %rd124, 4611686293372403712; 2026-02-21T10:18:33.5585481Z add.s32 %r1743, %r29377, 32; 2026-02-21T10:18:33.5585668Z bfe.u32 %r1744, %r1743, 4, 14; 2026-02-21T10:18:33.5585850Z cvt.u64.u32 %rd125, %r1744; 2026-02-21T10:18:33.5586056Z or.b64 %rd4, %rd125, 4611686293372403712; 2026-02-21T10:18:33.5586261Z add.s32 %r1745, %r29377, 64; 
2026-02-21T10:18:33.5586442Z bfe.u32 %r1746, %r1745, 4, 14; 2026-02-21T10:18:33.5586777Z cvt.u64.u32 %rd126, %r1746; 2026-02-21T10:18:33.5586962Z or.b64 %rd5, %rd126, 4611686293372403712; 2026-02-21T10:18:33.5587168Z add.s32 %r1747, %r29377, 96; 2026-02-21T10:18:33.5587343Z bfe.u32 %r1748, %r1747, 4, 14; 2026-02-21T10:18:33.5587530Z cvt.u64.u32 %rd127, %r1748; 2026-02-21T10:18:33.5587709Z or.b64 %rd6, %rd127, 4611686293372403712; 2026-02-21T10:18:33.5587919Z add.s32 %r1749, %r29377, 16384; 2026-02-21T10:18:33.5588114Z bfe.u32 %r1750, %r1749, 4, 14; 2026-02-21T10:18:33.5588302Z cvt.u64.u32 %rd128, %r1750; 2026-02-21T10:18:33.5588565Z or.b64 %rd7, %rd128, 4611686293372403712; 2026-02-21T10:18:33.5588778Z add.s32 %r1751, %r29377, 16416; 2026-02-21T10:18:33.5589043Z bfe.u32 %r1752, %r1751, 4, 14; 2026-02-21T10:18:33.5589227Z cvt.u64.u32 %rd129, %r1752; 2026-02-21T10:18:33.5589416Z or.b64 %rd8, %rd129, 4611686293372403712; 2026-02-21T10:18:33.5589627Z add.s32 %r1753, %r29377, 16448; 2026-02-21T10:18:33.5589816Z bfe.u32 %r1754, %r1753, 4, 14; 2026-02-21T10:18:33.5590070Z cvt.u64.u32 %rd130, %r1754; 2026-02-21T10:18:33.5590263Z or.b64 %rd9, %rd130, 4611686293372403712; 2026-02-21T10:18:33.5590462Z add.s32 %r1755, %r29377, 16480; 2026-02-21T10:18:33.5590644Z bfe.u32 %r1756, %r1755, 4, 14; 2026-02-21T10:18:33.5590823Z cvt.u64.u32 %rd131, %r1756; 2026-02-21T10:18:33.5591009Z or.b64 %rd10, %rd131, 4611686293372403712; 2026-02-21T10:18:33.5591219Z and.b32 %r1758, %r31903, 1920; 2026-02-21T10:18:33.5591392Z or.b32 %r1760, %r1758, %r31902; 2026-02-21T10:18:33.5591585Z xor.b32 %r1761, %r1760, %r31904; 2026-02-21T10:18:33.5591780Z or.b32 %r1762, %r1761, %r1711; 2026-02-21T10:18:33.5591967Z add.s32 %r42, %r29377, %r1762; 2026-02-21T10:18:33.5592144Z add.s32 %r43, %r42, 16384; 2026-02-21T10:18:33.5592321Z add.s32 %r44, %r42, 8192; 2026-02-21T10:18:33.5592490Z add.s32 %r45, %r42, 24576; 2026-02-21T10:18:33.5592672Z xor.b32 %r1763, %r1762, 32; 2026-02-21T10:18:33.5592849Z add.s32 
%r46, %r29377, %r1763; 2026-02-21T10:18:33.5593025Z add.s32 %r47, %r46, 16384; 2026-02-21T10:18:33.5593199Z add.s32 %r48, %r46, 8192; 2026-02-21T10:18:33.5593365Z add.s32 %r49, %r46, 24576; 2026-02-21T10:18:33.5593540Z xor.b32 %r1764, %r1762, 64; 2026-02-21T10:18:33.5593712Z add.s32 %r50, %r29377, %r1764; 2026-02-21T10:18:33.5593893Z add.s32 %r51, %r50, 16384; 2026-02-21T10:18:33.5594061Z add.s32 %r52, %r50, 8192; 2026-02-21T10:18:33.5594232Z add.s32 %r53, %r50, 24576; 2026-02-21T10:18:33.5594411Z xor.b32 %r1765, %r1762, 96; 2026-02-21T10:18:33.5594582Z add.s32 %r54, %r29377, %r1765; 2026-02-21T10:18:33.5594766Z add.s32 %r55, %r54, 16384; 2026-02-21T10:18:33.5594934Z add.s32 %r56, %r54, 8192; 2026-02-21T10:18:33.5595103Z add.s32 %r57, %r54, 24576; 2026-02-21T10:18:33.5595274Z or.b32 %r1766, %r1713, %r1715; 2026-02-21T10:18:33.5595456Z or.b32 %r1767, %r1766, %r1711; 2026-02-21T10:18:33.5595632Z add.s32 %r58, %r29377, %r1767; 2026-02-21T10:18:33.5595813Z xor.b32 %r1768, %r1767, 8; 2026-02-21T10:18:33.5595984Z add.s32 %r59, %r29377, %r1768; 2026-02-21T10:18:33.5596165Z xor.b32 %r1769, %r1767, 16; 2026-02-21T10:18:33.5596343Z add.s32 %r60, %r29377, %r1769; 2026-02-21T10:18:33.5596649Z xor.b32 %r1770, %r1767, 24; 2026-02-21T10:18:33.5596846Z add.s32 %r61, %r29377, %r1770; 2026-02-21T10:18:33.5597027Z xor.b32 %r1771, %r1767, 32; 2026-02-21T10:18:33.5597206Z add.s32 %r62, %r29377, %r1771; 2026-02-21T10:18:33.5597382Z xor.b32 %r1772, %r1767, 40; 2026-02-21T10:18:33.5597562Z add.s32 %r63, %r29377, %r1772; 2026-02-21T10:18:33.5597736Z xor.b32 %r1773, %r1767, 48; 2026-02-21T10:18:33.5598014Z add.s32 %r64, %r29377, %r1773; 2026-02-21T10:18:33.5598266Z xor.b32 %r1774, %r1767, 56; 2026-02-21T10:18:33.5598448Z add.s32 %r65, %r29377, %r1774; 2026-02-21T10:18:33.5598797Z .loc 1 31 88 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:31:88 2026-02-21T10:18:33.5599166Z mad.wide.u32 %rd11, %r14, 16, %rd85; 2026-02-21T10:18:33.5599373Z shl.b32 %r72, %r6, 13; 
2026-02-21T10:18:33.5599546Z shl.b32 %r73, %r13, 13; 2026-02-21T10:18:33.5599859Z .loc 1 51 93 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:51:93 2026-02-21T10:18:33.5600209Z or.b32 %r1776, %r73, %r15; 2026-02-21T10:18:33.5600385Z or.b32 %r74, %r1776, 128; 2026-02-21T10:18:33.5600603Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T10:18:33.5600883Z // Child Loop BB0_3 Depth 2 2026-02-21T10:18:33.5601157Z // Child Loop BB0_5 Depth 2 2026-02-21T10:18:33.5601418Z // Child Loop BB0_7 Depth 2 2026-02-21T10:18:33.5601747Z // Child Loop BB0_9 Depth 2 2026-02-21T10:18:33.5602125Z .loc 1 38 33 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:38:33 2026-02-21T10:18:33.5602482Z shr.u32 %r1778, %r31905, 9; 2026-02-21T10:18:33.5602675Z and.b32 %r1779, %r1778, 4194272; 2026-02-21T10:18:33.5603062Z .loc 1 39 39 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:39:39 2026-02-21T10:18:33.5603419Z sub.s32 %r1780, 10, %r1779; 2026-02-21T10:18:33.5603725Z .loc 1 40 45 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:40:45 2026-02-21T10:18:33.5604093Z and.b32 %r1781, %r31905, 16383; 2026-02-21T10:18:33.5604423Z .loc 1 41 51 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:41:51 2026-02-21T10:18:33.5604780Z div.s32 %r1782, %r1781, %r1780; 2026-02-21T10:18:33.5605099Z .loc 1 40 64 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:40:64 2026-02-21T10:18:33.5605465Z mul.lo.s32 %r1783, %r1782, %r1780; 2026-02-21T10:18:33.5605673Z sub.s32 %r1784, %r1781, %r1783; 2026-02-21T10:18:33.5605988Z .loc 1 40 30 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:40:30 2026-02-21T10:18:33.5606339Z add.s32 %r1785, %r1784, %r1779; 2026-02-21T10:18:33.5606765Z .loc 1 42 27 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:42:27 2026-02-21T10:18:33.5607114Z shl.b32 %r9279, %r1785, 7; 2026-02-21T10:18:33.5607426Z .loc 1 43 27 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:43:27 2026-02-21T10:18:33.5607771Z 
shl.b32 %r11726, %r1782, 7; 2026-02-21T10:18:33.5608088Z .loc 1 51 93 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:51:93 2026-02-21T10:18:33.5608438Z or.b32 %r1786, %r12, %r11726; 2026-02-21T10:18:33.5608625Z shl.b32 %r1787, %r1786, 13; 2026-02-21T10:18:33.5608812Z mul.wide.s32 %rd21, %r1787, 2; 2026-02-21T10:18:33.5609011Z or.b32 %r1788, %r11, %r11726; 2026-02-21T10:18:33.5609190Z shl.b32 %r1789, %r1788, 13; 2026-02-21T10:18:33.5609364Z mul.wide.s32 %rd22, %r1789, 2; 2026-02-21T10:18:33.5609549Z or.b32 %r1790, %r10, %r11726; 2026-02-21T10:18:33.5609724Z shl.b32 %r1791, %r1790, 13; 2026-02-21T10:18:33.5609906Z mul.wide.s32 %rd23, %r1791, 2; 2026-02-21T10:18:33.5610086Z or.b32 %r1792, %r9, %r11726; 2026-02-21T10:18:33.5610264Z shl.b32 %r1793, %r1792, 13; 2026-02-21T10:18:33.5610436Z mul.wide.s32 %rd24, %r1793, 2; 2026-02-21T10:18:33.5610626Z or.b32 %r1794, %r8, %r11726; 2026-02-21T10:18:33.5610799Z shl.b32 %r1795, %r1794, 13; 2026-02-21T10:18:33.5610983Z mul.wide.s32 %rd25, %r1795, 2; 2026-02-21T10:18:33.5611169Z or.b32 %r1796, %r7, %r11726; 2026-02-21T10:18:33.5611343Z shl.b32 %r1797, %r1796, 13; 2026-02-21T10:18:33.5611616Z mul.wide.s32 %rd26, %r1797, 2; 2026-02-21T10:18:33.5611851Z shl.b32 %r1798, %r1782, 20; 2026-02-21T10:18:33.5612030Z or.b32 %r1799, %r72, %r1798; 2026-02-21T10:18:33.5612206Z mul.wide.s32 %rd27, %r1799, 2; 2026-02-21T10:18:33.5612392Z or.b32 %r31906, %r74, %r1798; 2026-02-21T10:18:33.5612566Z or.b32 %r1800, %r73, %r1798; 2026-02-21T10:18:33.5612743Z mul.wide.s32 %rd28, %r1800, 2; 2026-02-21T10:18:33.5612926Z mov.b32 %r31907, 0f00000000; 2026-02-21T10:18:33.5613098Z mov.b64 %rd655, -96; 2026-02-21T10:18:33.5613264Z mov.b64 %rd654, %rd11; 2026-02-21T10:18:33.5613430Z mov.b32 %r31908, %r31907; 2026-02-21T10:18:33.5613603Z mov.b32 %r31909, %r31907; 2026-02-21T10:18:33.5613765Z mov.b32 %r31910, %r31907; 2026-02-21T10:18:33.5613942Z mov.b32 %r31911, %r31907; 2026-02-21T10:18:33.5614107Z mov.b32 %r31912, %r31907; 
2026-02-21T10:18:33.5614276Z mov.b32 %r31913, %r31907; 2026-02-21T10:18:33.5614436Z mov.b32 %r31914, %r31907; 2026-02-21T10:18:33.5614607Z mov.b32 %r31915, %r31907; 2026-02-21T10:18:33.5614773Z mov.b32 %r31916, %r31907; 2026-02-21T10:18:33.5614936Z mov.b32 %r31917, %r31907; 2026-02-21T10:18:33.5615104Z mov.b32 %r31918, %r31907; 2026-02-21T10:18:33.5615340Z mov.b32 %r31919, %r31907; 2026-02-21T10:18:33.5615511Z mov.b32 %r31920, %r31907; 2026-02-21T10:18:33.5615673Z mov.b32 %r31921, %r31907; 2026-02-21T10:18:33.5615840Z mov.b32 %r31922, %r31907; 2026-02-21T10:18:33.5616019Z mov.b32 %r31923, %r31907; 2026-02-21T10:18:33.5616248Z mov.b32 %r31924, %r31907; 2026-02-21T10:18:33.5616414Z mov.b32 %r31925, %r31907; 2026-02-21T10:18:33.5616713Z mov.b32 %r31926, %r31907; 2026-02-21T10:18:33.5616881Z mov.b32 %r31927, %r31907; 2026-02-21T10:18:33.5617043Z mov.b32 %r31928, %r31907; 2026-02-21T10:18:33.5617224Z mov.b32 %r31929, %r31907; 2026-02-21T10:18:33.5617391Z mov.b32 %r31930, %r31907; 2026-02-21T10:18:33.5617561Z mov.b32 %r31931, %r31907; 2026-02-21T10:18:33.5617722Z mov.b32 %r31932, %r31907; 2026-02-21T10:18:33.5617895Z mov.b32 %r31933, %r31907; 2026-02-21T10:18:33.5618057Z mov.b32 %r31934, %r31907; 2026-02-21T10:18:33.5618238Z mov.b32 %r31935, %r31907; 2026-02-21T10:18:33.5618401Z mov.b32 %r31936, %r31907; 2026-02-21T10:18:33.5618576Z mov.b32 %r31937, %r31907; 2026-02-21T10:18:33.5618744Z mov.b32 %r31938, %r31907; 2026-02-21T10:18:33.5618907Z mov.b32 %r31939, %r31907; 2026-02-21T10:18:33.5619080Z mov.b32 %r31940, %r31907; 2026-02-21T10:18:33.5619245Z mov.b32 %r31941, %r31907; 2026-02-21T10:18:33.5619416Z mov.b32 %r31942, %r31907; 2026-02-21T10:18:33.5619583Z mov.b32 %r31943, %r31907; 2026-02-21T10:18:33.5619758Z mov.b32 %r31944, %r31907; 2026-02-21T10:18:33.5619922Z mov.b32 %r31945, %r31907; 2026-02-21T10:18:33.5620094Z mov.b32 %r31946, %r31907; 2026-02-21T10:18:33.5620257Z mov.b32 %r31947, %r31907; 2026-02-21T10:18:33.5620426Z mov.b32 %r31948, %r31907; 
2026-02-21T10:18:33.5620609Z mov.b32 %r31949, %r31907; 2026-02-21T10:18:33.5620774Z mov.b32 %r31950, %r31907; 2026-02-21T10:18:33.5620947Z mov.b32 %r31951, %r31907; 2026-02-21T10:18:33.5621111Z mov.b32 %r31952, %r31907; 2026-02-21T10:18:33.5621281Z mov.b32 %r31953, %r31907; 2026-02-21T10:18:33.5621444Z mov.b32 %r31954, %r31907; 2026-02-21T10:18:33.5621613Z mov.b32 %r31955, %r31907; 2026-02-21T10:18:33.5621776Z mov.b32 %r31956, %r31907; 2026-02-21T10:18:33.5621946Z mov.b32 %r31957, %r31907; 2026-02-21T10:18:33.5622109Z mov.b32 %r31958, %r31907; 2026-02-21T10:18:33.5622280Z mov.b32 %r31959, %r31907; 2026-02-21T10:18:33.5622452Z mov.b32 %r31960, %r31907; 2026-02-21T10:18:33.5622615Z mov.b32 %r31961, %r31907; 2026-02-21T10:18:33.5622783Z mov.b32 %r31962, %r31907; 2026-02-21T10:18:33.5622942Z mov.b32 %r31963, %r31907; 2026-02-21T10:18:33.5623109Z mov.b32 %r31964, %r31907; 2026-02-21T10:18:33.5623287Z mov.b32 %r31965, %r31907; 2026-02-21T10:18:33.5623456Z mov.b32 %r31966, %r31907; 2026-02-21T10:18:33.5623618Z mov.b32 %r31967, %r31907; 2026-02-21T10:18:33.5623785Z mov.b32 %r31968, %r31907; 2026-02-21T10:18:33.5624032Z mov.b32 %r31969, %r31907; 2026-02-21T10:18:33.5624201Z mov.b32 %r31970, %r31907; 2026-02-21T10:18:33.5624476Z mov.b32 %r31971, %r31907; 2026-02-21T10:18:33.5624642Z mov.b32 %r31972, %r31907; 2026-02-21T10:18:33.5624813Z mov.b32 %r31973, %r31907; 2026-02-21T10:18:33.5624977Z mov.b32 %r31974, %r31907; 2026-02-21T10:18:33.5625158Z mov.b32 %r31975, %r31907; 2026-02-21T10:18:33.5625322Z mov.b32 %r31976, %r31907; 2026-02-21T10:18:33.5625494Z mov.b32 %r31977, %r31907; 2026-02-21T10:18:33.5625661Z mov.b32 %r31978, %r31907; 2026-02-21T10:18:33.5625835Z mov.b32 %r31979, %r31907; 2026-02-21T10:18:33.5625998Z mov.b32 %r31980, %r31907; 2026-02-21T10:18:33.5626169Z mov.b32 %r31981, %r31907; 2026-02-21T10:18:33.5626340Z mov.b32 %r31982, %r31907; 2026-02-21T10:18:33.5626637Z mov.b32 %r31983, %r31907; 2026-02-21T10:18:33.5626812Z mov.b32 %r31984, %r31907; 
2026-02-21T10:18:33.5626974Z mov.b32 %r31985, %r31907; 2026-02-21T10:18:33.5627143Z mov.b32 %r31986, %r31907; 2026-02-21T10:18:33.5627310Z mov.b32 %r31987, %r31907; 2026-02-21T10:18:33.5627479Z mov.b32 %r31988, %r31907; 2026-02-21T10:18:33.5627645Z mov.b32 %r31989, %r31907; 2026-02-21T10:18:33.5627897Z mov.b32 %r31990, %r31907; 2026-02-21T10:18:33.5628071Z mov.b32 %r31991, %r31907; 2026-02-21T10:18:33.5628241Z mov.b32 %r31992, %r31907; 2026-02-21T10:18:33.5628465Z mov.b32 %r31993, %r31907; 2026-02-21T10:18:33.5628653Z mov.b32 %r31994, %r31907; 2026-02-21T10:18:33.5628831Z mov.b32 %r31995, %r31907; 2026-02-21T10:18:33.5629067Z mov.b32 %r31996, %r31907; 2026-02-21T10:18:33.5629260Z mov.b32 %r31997, %r31907; 2026-02-21T10:18:33.5629425Z mov.b32 %r31998, %r31907; 2026-02-21T10:18:33.5629595Z mov.b32 %r31999, %r31907; 2026-02-21T10:18:33.5629756Z mov.b32 %r32000, %r31907; 2026-02-21T10:18:33.5629927Z mov.b32 %r32001, %r31907; 2026-02-21T10:18:33.5630096Z mov.b32 %r32002, %r31907; 2026-02-21T10:18:33.5630259Z mov.b32 %r32003, %r31907; 2026-02-21T10:18:33.5630425Z mov.b32 %r32004, %r31907; 2026-02-21T10:18:33.5630589Z mov.b32 %r32005, %r31907; 2026-02-21T10:18:33.5630756Z mov.b32 %r32006, %r31907; 2026-02-21T10:18:33.5630919Z mov.b32 %r32007, %r31907; 2026-02-21T10:18:33.5631086Z mov.b32 %r32008, %r31907; 2026-02-21T10:18:33.5631248Z mov.b32 %r32009, %r31907; 2026-02-21T10:18:33.5631427Z mov.b32 %r32010, %r31907; 2026-02-21T10:18:33.5631591Z mov.b32 %r32011, %r31907; 2026-02-21T10:18:33.5631759Z mov.b32 %r32012, %r31907; 2026-02-21T10:18:33.5631925Z mov.b32 %r32013, %r31907; 2026-02-21T10:18:33.5632091Z mov.b32 %r32014, %r31907; 2026-02-21T10:18:33.5632258Z mov.b32 %r32015, %r31907; 2026-02-21T10:18:33.5632420Z mov.b32 %r32016, %r31907; 2026-02-21T10:18:33.5632587Z mov.b32 %r32017, %r31907; 2026-02-21T10:18:33.5632750Z mov.b32 %r32018, %r31907; 2026-02-21T10:18:33.5632917Z mov.b32 %r32019, %r31907; 2026-02-21T10:18:33.5633079Z mov.b32 %r32020, %r31907; 
2026-02-21T10:18:33.5633245Z mov.b32 %r32021, %r31907; 2026-02-21T10:18:33.5633408Z mov.b32 %r32022, %r31907; 2026-02-21T10:18:33.5633578Z mov.b32 %r32023, %r31907; 2026-02-21T10:18:33.5633747Z mov.b32 %r32024, %r31907; 2026-02-21T10:18:33.5633912Z mov.b32 %r32025, %r31907; 2026-02-21T10:18:33.5634080Z mov.b32 %r32026, %r31907; 2026-02-21T10:18:33.5634244Z mov.b32 %r32027, %r31907; 2026-02-21T10:18:33.5634415Z mov.b32 %r32028, %r31907; 2026-02-21T10:18:33.5634575Z mov.b32 %r32029, %r31907; 2026-02-21T10:18:33.5634756Z mov.b32 %r32030, %r31907; 2026-02-21T10:18:33.5634921Z mov.b32 %r32031, %r31907; 2026-02-21T10:18:33.5635100Z mov.b32 %r32032, %r31907; 2026-02-21T10:18:33.5635263Z mov.b32 %r32033, %r31907; 2026-02-21T10:18:33.5635432Z mov.b32 %r32034, %r31907; 2026-02-21T10:18:33.5635653Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T10:18:33.5635947Z // => This Inner Loop Header: Depth=2 2026-02-21T10:18:33.5636347Z .loc 1 58 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:32 2026-02-21T10:18:33.5636924Z add.s64 %rd134, %rd654, %rd27; 2026-02-21T10:18:33.5637122Z add.s64 %rd137, %rd654, %rd26; 2026-02-21T10:18:33.5637390Z add.s64 %rd140, %rd654, %rd25; 2026-02-21T10:18:33.5637599Z add.s64 %rd143, %rd654, %rd24; 2026-02-21T10:18:33.5637785Z add.s64 %rd146, %rd654, %rd23; 2026-02-21T10:18:33.5637975Z add.s64 %rd149, %rd654, %rd22; 2026-02-21T10:18:33.5638167Z add.s64 %rd152, %rd654, %rd21; 2026-02-21T10:18:33.5638492Z .loc 1 58 80 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:80 2026-02-21T10:18:33.5638850Z add.s64 %rd155, %rd654, %rd28; 2026-02-21T10:18:33.5639030Z // begin inline asm 2026-02-21T10:18:33.5639199Z mov.u64 %rd133, 0x0; 2026-02-21T10:18:33.5639430Z createpolicy.fractional.L2::evict_first.b64 %rd133, 1.0; 2026-02-21T10:18:33.5639697Z // end inline asm 2026-02-21T10:18:33.5639847Z // begin inline asm 2026-02-21T10:18:33.5640012Z mov.u32 %r1801, 0x0; 2026-02-21T10:18:33.5640175Z mov.u32 %r1802, 0x0; 
2026-02-21T10:18:33.5640327Z mov.u32 %r1803, 0x0; 2026-02-21T10:18:33.5640479Z mov.u32 %r1804, 0x0; 2026-02-21T10:18:33.5640886Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r1801, %r1802, %r1803, %r1804 }, [ %rd134 + 0 ], %rd133; 2026-02-21T10:18:33.5641275Z // end inline asm 2026-02-21T10:18:33.5641425Z // begin inline asm 2026-02-21T10:18:33.5641584Z mov.u64 %rd136, 0x0; 2026-02-21T10:18:33.5641796Z createpolicy.fractional.L2::evict_first.b64 %rd136, 1.0; 2026-02-21T10:18:33.5642059Z // end inline asm 2026-02-21T10:18:33.5642282Z // begin inline asm 2026-02-21T10:18:33.5642439Z mov.u32 %r1805, 0x0; 2026-02-21T10:18:33.5642593Z mov.u32 %r1806, 0x0; 2026-02-21T10:18:33.5642745Z mov.u32 %r1807, 0x0; 2026-02-21T10:18:33.5642897Z mov.u32 %r1808, 0x0; 2026-02-21T10:18:33.5643207Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r1805, %r1806, %r1807, %r1808 }, [ %rd137 + 0 ], %rd136; 2026-02-21T10:18:33.5643572Z // end inline asm 2026-02-21T10:18:33.5643717Z // begin inline asm 2026-02-21T10:18:33.5643876Z mov.u64 %rd139, 0x0; 2026-02-21T10:18:33.5644101Z createpolicy.fractional.L2::evict_first.b64 %rd139, 1.0; 2026-02-21T10:18:33.5644367Z // end inline asm 2026-02-21T10:18:33.5644524Z // begin inline asm 2026-02-21T10:18:33.5644677Z mov.u32 %r1809, 0x0; 2026-02-21T10:18:33.5644834Z mov.u32 %r1810, 0x0; 2026-02-21T10:18:33.5644982Z mov.u32 %r1811, 0x0; 2026-02-21T10:18:33.5645135Z mov.u32 %r1812, 0x0; 2026-02-21T10:18:33.5645448Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r1809, %r1810, %r1811, %r1812 }, [ %rd140 + 0 ], %rd139; 2026-02-21T10:18:33.5645823Z // end inline asm 2026-02-21T10:18:33.5645987Z // begin inline asm 2026-02-21T10:18:33.5646158Z mov.u64 %rd142, 0x0; 2026-02-21T10:18:33.5646398Z createpolicy.fractional.L2::evict_first.b64 %rd142, 1.0; 2026-02-21T10:18:33.5646771Z // end inline asm 2026-02-21T10:18:33.5646925Z // begin inline asm 2026-02-21T10:18:33.5647075Z mov.u32 %r1813, 0x0; 2026-02-21T10:18:33.5647233Z mov.u32 %r1814, 0x0; 
2026-02-21T10:18:33.5647380Z mov.u32 %r1815, 0x0; 2026-02-21T10:18:33.5647536Z mov.u32 %r1816, 0x0; 2026-02-21T10:18:33.5647859Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r1813, %r1814, %r1815, %r1816 }, [ %rd143 + 0 ], %rd142; 2026-02-21T10:18:33.5648231Z // end inline asm 2026-02-21T10:18:33.5648403Z // begin inline asm 2026-02-21T10:18:33.5648556Z mov.u64 %rd145, 0x0; 2026-02-21T10:18:33.5648780Z createpolicy.fractional.L2::evict_first.b64 %rd145, 1.0; 2026-02-21T10:18:33.5649034Z // end inline asm 2026-02-21T10:18:33.5649188Z // begin inline asm 2026-02-21T10:18:33.5649340Z mov.u32 %r1817, 0x0; 2026-02-21T10:18:33.5649499Z mov.u32 %r1818, 0x0; 2026-02-21T10:18:33.5649648Z mov.u32 %r1819, 0x0; 2026-02-21T10:18:33.5649803Z mov.u32 %r1820, 0x0; 2026-02-21T10:18:33.5650119Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r1817, %r1818, %r1819, %r1820 }, [ %rd146 + 0 ], %rd145; 2026-02-21T10:18:33.5650485Z // end inline asm 2026-02-21T10:18:33.5650639Z // begin inline asm 2026-02-21T10:18:33.5650794Z mov.u64 %rd148, 0x0; 2026-02-21T10:18:33.5651143Z createpolicy.fractional.L2::evict_first.b64 %rd148, 1.0; 2026-02-21T10:18:33.5651476Z // end inline asm 2026-02-21T10:18:33.5651626Z // begin inline asm 2026-02-21T10:18:33.5651781Z mov.u32 %r1821, 0x0; 2026-02-21T10:18:33.5651933Z mov.u32 %r1822, 0x0; 2026-02-21T10:18:33.5652087Z mov.u32 %r1823, 0x0; 2026-02-21T10:18:33.5652233Z mov.u32 %r1824, 0x0; 2026-02-21T10:18:33.5652550Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r1821, %r1822, %r1823, %r1824 }, [ %rd149 + 0 ], %rd148; 2026-02-21T10:18:33.5652907Z // end inline asm 2026-02-21T10:18:33.5653061Z // begin inline asm 2026-02-21T10:18:33.5653211Z mov.u64 %rd151, 0x0; 2026-02-21T10:18:33.5653428Z createpolicy.fractional.L2::evict_first.b64 %rd151, 1.0; 2026-02-21T10:18:33.5653683Z // end inline asm 2026-02-21T10:18:33.5653827Z // begin inline asm 2026-02-21T10:18:33.5653983Z mov.u32 %r1825, 0x0; 2026-02-21T10:18:33.5654129Z mov.u32 %r1826, 0x0; 
2026-02-21T10:18:33.5654283Z mov.u32 %r1827, 0x0; 2026-02-21T10:18:33.5654438Z mov.u32 %r1828, 0x0; 2026-02-21T10:18:33.5654750Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r1825, %r1826, %r1827, %r1828 }, [ %rd152 + 0 ], %rd151; 2026-02-21T10:18:33.5655187Z // end inline asm 2026-02-21T10:18:33.5655342Z // begin inline asm 2026-02-21T10:18:33.5655492Z mov.u64 %rd154, 0x0; 2026-02-21T10:18:33.5655707Z createpolicy.fractional.L2::evict_first.b64 %rd154, 1.0; 2026-02-21T10:18:33.5655975Z // end inline asm 2026-02-21T10:18:33.5656190Z // begin inline asm 2026-02-21T10:18:33.5656352Z mov.u32 %r1829, 0x0; 2026-02-21T10:18:33.5656623Z mov.u32 %r1830, 0x0; 2026-02-21T10:18:33.5656780Z mov.u32 %r1831, 0x0; 2026-02-21T10:18:33.5656930Z mov.u32 %r1832, 0x0; 2026-02-21T10:18:33.5657258Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r1829, %r1830, %r1831, %r1832 }, [ %rd155 + 0 ], %rd154; 2026-02-21T10:18:33.5657621Z // end inline asm 2026-02-21T10:18:33.5657928Z .loc 1 62 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:62:32 2026-02-21T10:18:33.5658286Z bar.sync 0; 2026-02-21T10:18:33.5658450Z st.shared.v2.b32 [%r16], {%r1801, %r1802}; 2026-02-21T10:18:33.5658693Z st.shared.v2.b32 [%r16+2048], {%r1805, %r1806}; 2026-02-21T10:18:33.5658937Z st.shared.v2.b32 [%r16+4096], {%r1809, %r1810}; 2026-02-21T10:18:33.5659189Z st.shared.v2.b32 [%r16+6144], {%r1813, %r1814}; 2026-02-21T10:18:33.5659428Z st.shared.v2.b32 [%r16+8192], {%r1817, %r1818}; 2026-02-21T10:18:33.5659679Z st.shared.v2.b32 [%r16+10240], {%r1821, %r1822}; 2026-02-21T10:18:33.5659931Z st.shared.v2.b32 [%r16+12288], {%r1825, %r1826}; 2026-02-21T10:18:33.5660175Z st.shared.v2.b32 [%r16+14336], {%r1829, %r1830}; 2026-02-21T10:18:33.5660415Z st.shared.v2.b32 [%r17], {%r1803, %r1804}; 2026-02-21T10:18:33.5660645Z st.shared.v2.b32 [%r17+2048], {%r1807, %r1808}; 2026-02-21T10:18:33.5660892Z st.shared.v2.b32 [%r17+4096], {%r1811, %r1812}; 2026-02-21T10:18:33.5661127Z st.shared.v2.b32 [%r17+6144], 
{%r1815, %r1816}; 2026-02-21T10:18:33.5686247Z st.shared.v2.b32 [%r17+8192], {%r1819, %r1820}; 2026-02-21T10:18:33.5686736Z st.shared.v2.b32 [%r17+10240], {%r1823, %r1824}; 2026-02-21T10:18:33.5687043Z st.shared.v2.b32 [%r17+12288], {%r1827, %r1828}; 2026-02-21T10:18:33.5687326Z st.shared.v2.b32 [%r17+14336], {%r1831, %r1832}; 2026-02-21T10:18:33.5687576Z bar.sync 0; 2026-02-21T10:18:33.5687746Z ld.shared.b16 %rs1, [%r18]; 2026-02-21T10:18:33.5687944Z ld.shared.b16 %rs2, [%r18+1024]; 2026-02-21T10:18:33.5688164Z ld.shared.b16 %rs3, [%r18+64]; 2026-02-21T10:18:33.5688372Z ld.shared.b16 %rs4, [%r18+1088]; 2026-02-21T10:18:33.5688568Z ld.shared.b16 %rs5, [%r18+8192]; 2026-02-21T10:18:33.5688767Z ld.shared.b16 %rs6, [%r18+9216]; 2026-02-21T10:18:33.5688957Z ld.shared.b16 %rs7, [%r18+8256]; 2026-02-21T10:18:33.5689154Z ld.shared.b16 %rs8, [%r18+9280]; 2026-02-21T10:18:33.5689349Z ld.shared.b16 %rs9, [%r19]; 2026-02-21T10:18:33.5689545Z ld.shared.b16 %rs10, [%r19+1024]; 2026-02-21T10:18:33.5689749Z ld.shared.b16 %rs11, [%r19+64]; 2026-02-21T10:18:33.5690141Z ld.shared.b16 %rs12, [%r19+1088]; 2026-02-21T10:18:33.5690416Z ld.shared.b16 %rs13, [%r19+8192]; 2026-02-21T10:18:33.5690625Z ld.shared.b16 %rs14, [%r19+9216]; 2026-02-21T10:18:33.5690831Z ld.shared.b16 %rs15, [%r19+8256]; 2026-02-21T10:18:33.5691024Z ld.shared.b16 %rs16, [%r19+9280]; 2026-02-21T10:18:33.5691227Z ld.shared.b16 %rs17, [%r20]; 2026-02-21T10:18:33.5691425Z ld.shared.b16 %rs18, [%r20+1024]; 2026-02-21T10:18:33.5691635Z ld.shared.b16 %rs19, [%r20+64]; 2026-02-21T10:18:33.5691829Z ld.shared.b16 %rs20, [%r20+1088]; 2026-02-21T10:18:33.5692029Z ld.shared.b16 %rs21, [%r20+8192]; 2026-02-21T10:18:33.5692217Z ld.shared.b16 %rs22, [%r20+9216]; 2026-02-21T10:18:33.5692414Z ld.shared.b16 %rs23, [%r20+8256]; 2026-02-21T10:18:33.5692612Z ld.shared.b16 %rs24, [%r20+9280]; 2026-02-21T10:18:33.5692806Z ld.shared.b16 %rs25, [%r21]; 2026-02-21T10:18:33.5692998Z ld.shared.b16 %rs26, [%r21+1024]; 
2026-02-21T10:18:33.5693189Z ld.shared.b16 %rs27, [%r21+64]; 2026-02-21T10:18:33.5693385Z ld.shared.b16 %rs28, [%r21+1088]; 2026-02-21T10:18:33.5693580Z ld.shared.b16 %rs29, [%r21+8192]; 2026-02-21T10:18:33.5693793Z ld.shared.b16 %rs30, [%r21+9216]; 2026-02-21T10:18:33.5694076Z ld.shared.b16 %rs31, [%r21+8256]; 2026-02-21T10:18:33.5694275Z ld.shared.b16 %rs32, [%r21+9280]; 2026-02-21T10:18:33.5694472Z ld.shared.b16 %rs33, [%r22]; 2026-02-21T10:18:33.5694653Z ld.shared.b16 %rs34, [%r22+1024]; 2026-02-21T10:18:33.5694909Z ld.shared.b16 %rs35, [%r22+64]; 2026-02-21T10:18:33.5695100Z ld.shared.b16 %rs36, [%r22+1088]; 2026-02-21T10:18:33.5695299Z ld.shared.b16 %rs37, [%r22+8192]; 2026-02-21T10:18:33.5695490Z ld.shared.b16 %rs38, [%r22+9216]; 2026-02-21T10:18:33.5695698Z ld.shared.b16 %rs39, [%r22+8256]; 2026-02-21T10:18:33.5695892Z ld.shared.b16 %rs40, [%r22+9280]; 2026-02-21T10:18:33.5696092Z ld.shared.b16 %rs41, [%r23]; 2026-02-21T10:18:33.5696281Z ld.shared.b16 %rs42, [%r23+1024]; 2026-02-21T10:18:33.5696607Z ld.shared.b16 %rs43, [%r23+64]; 2026-02-21T10:18:33.5696816Z ld.shared.b16 %rs44, [%r23+1088]; 2026-02-21T10:18:33.5697010Z ld.shared.b16 %rs45, [%r23+8192]; 2026-02-21T10:18:33.5697224Z ld.shared.b16 %rs46, [%r23+9216]; 2026-02-21T10:18:33.5697426Z ld.shared.b16 %rs47, [%r23+8256]; 2026-02-21T10:18:33.5697625Z ld.shared.b16 %rs48, [%r23+9280]; 2026-02-21T10:18:33.5697818Z ld.shared.b16 %rs49, [%r24]; 2026-02-21T10:18:33.5698013Z ld.shared.b16 %rs50, [%r24+1024]; 2026-02-21T10:18:33.5698210Z ld.shared.b16 %rs51, [%r24+64]; 2026-02-21T10:18:33.5698412Z ld.shared.b16 %rs52, [%r24+1088]; 2026-02-21T10:18:33.5698609Z ld.shared.b16 %rs53, [%r24+8192]; 2026-02-21T10:18:33.5698799Z ld.shared.b16 %rs54, [%r24+9216]; 2026-02-21T10:18:33.5698995Z ld.shared.b16 %rs55, [%r24+8256]; 2026-02-21T10:18:33.5699182Z ld.shared.b16 %rs56, [%r24+9280]; 2026-02-21T10:18:33.5699382Z ld.shared.b16 %rs57, [%r25]; 2026-02-21T10:18:33.5699566Z ld.shared.b16 %rs58, [%r25+1024]; 
2026-02-21T10:18:33.5699767Z ld.shared.b16 %rs59, [%r25+64]; 2026-02-21T10:18:33.5699959Z ld.shared.b16 %rs60, [%r25+1088]; 2026-02-21T10:18:33.5700158Z ld.shared.b16 %rs61, [%r25+8192]; 2026-02-21T10:18:33.5700355Z ld.shared.b16 %rs62, [%r25+9216]; 2026-02-21T10:18:33.5700547Z ld.shared.b16 %rs63, [%r25+8256]; 2026-02-21T10:18:33.5700743Z ld.shared.b16 %rs64, [%r25+9280]; 2026-02-21T10:18:33.5700935Z cvt.f32.bf16 %r1970, %rs1; 2026-02-21T10:18:33.5701121Z cvt.f32.bf16 %r1971, %rs2; 2026-02-21T10:18:33.5701299Z cvt.f32.bf16 %r1972, %rs9; 2026-02-21T10:18:33.5701487Z cvt.f32.bf16 %r1973, %rs10; 2026-02-21T10:18:33.5701664Z cvt.f32.bf16 %r2102, %rs17; 2026-02-21T10:18:33.5701845Z cvt.f32.bf16 %r2103, %rs18; 2026-02-21T10:18:33.5702019Z cvt.f32.bf16 %r2104, %rs25; 2026-02-21T10:18:33.5702204Z cvt.f32.bf16 %r2105, %rs26; 2026-02-21T10:18:33.5702396Z cvt.f32.bf16 %r2234, %rs33; 2026-02-21T10:18:33.5702572Z cvt.f32.bf16 %r2235, %rs34; 2026-02-21T10:18:33.5702752Z cvt.f32.bf16 %r2236, %rs41; 2026-02-21T10:18:33.5702935Z cvt.f32.bf16 %r2237, %rs42; 2026-02-21T10:18:33.5703224Z cvt.f32.bf16 %r2366, %rs49; 2026-02-21T10:18:33.5703466Z cvt.f32.bf16 %r2367, %rs50; 2026-02-21T10:18:33.5703642Z cvt.f32.bf16 %r2368, %rs57; 2026-02-21T10:18:33.5703819Z cvt.f32.bf16 %r2369, %rs58; 2026-02-21T10:18:33.5704004Z cvt.f32.bf16 %r2498, %rs3; 2026-02-21T10:18:33.5704197Z cvt.f32.bf16 %r2499, %rs4; 2026-02-21T10:18:33.5704396Z cvt.f32.bf16 %r2500, %rs11; 2026-02-21T10:18:33.5704594Z cvt.f32.bf16 %r2501, %rs12; 2026-02-21T10:18:33.5704774Z cvt.f32.bf16 %r2630, %rs19; 2026-02-21T10:18:33.5704957Z cvt.f32.bf16 %r2631, %rs20; 2026-02-21T10:18:33.5705137Z cvt.f32.bf16 %r2632, %rs27; 2026-02-21T10:18:33.5705321Z cvt.f32.bf16 %r2633, %rs28; 2026-02-21T10:18:33.5705497Z cvt.f32.bf16 %r2762, %rs35; 2026-02-21T10:18:33.5705680Z cvt.f32.bf16 %r2763, %rs36; 2026-02-21T10:18:33.5705862Z cvt.f32.bf16 %r2764, %rs43; 2026-02-21T10:18:33.5706040Z cvt.f32.bf16 %r2765, %rs44; 2026-02-21T10:18:33.5706226Z 
cvt.f32.bf16 %r2894, %rs51; 2026-02-21T10:18:33.5706403Z cvt.f32.bf16 %r2895, %rs52; 2026-02-21T10:18:33.5706718Z cvt.f32.bf16 %r2896, %rs59; 2026-02-21T10:18:33.5706894Z cvt.f32.bf16 %r2897, %rs60; 2026-02-21T10:18:33.5707173Z cvt.f32.bf16 %r3026, %rs5; 2026-02-21T10:18:33.5707359Z cvt.f32.bf16 %r3027, %rs6; 2026-02-21T10:18:33.5707548Z cvt.f32.bf16 %r3028, %rs13; 2026-02-21T10:18:33.5707725Z cvt.f32.bf16 %r3029, %rs14; 2026-02-21T10:18:33.5707909Z cvt.f32.bf16 %r3158, %rs21; 2026-02-21T10:18:33.5708147Z cvt.f32.bf16 %r3159, %rs22; 2026-02-21T10:18:33.5708339Z cvt.f32.bf16 %r3160, %rs29; 2026-02-21T10:18:33.5708632Z cvt.f32.bf16 %r3161, %rs30; 2026-02-21T10:18:33.5708809Z cvt.f32.bf16 %r3290, %rs37; 2026-02-21T10:18:33.5709006Z cvt.f32.bf16 %r3291, %rs38; 2026-02-21T10:18:33.5709191Z cvt.f32.bf16 %r3292, %rs45; 2026-02-21T10:18:33.5709377Z cvt.f32.bf16 %r3293, %rs46; 2026-02-21T10:18:33.5709556Z cvt.f32.bf16 %r3422, %rs53; 2026-02-21T10:18:33.5709737Z cvt.f32.bf16 %r3423, %rs54; 2026-02-21T10:18:33.5709915Z cvt.f32.bf16 %r3424, %rs61; 2026-02-21T10:18:33.5710097Z cvt.f32.bf16 %r3425, %rs62; 2026-02-21T10:18:33.5710271Z cvt.f32.bf16 %r3554, %rs7; 2026-02-21T10:18:33.5710449Z cvt.f32.bf16 %r3555, %rs8; 2026-02-21T10:18:33.5710630Z cvt.f32.bf16 %r3556, %rs15; 2026-02-21T10:18:33.5710809Z cvt.f32.bf16 %r3557, %rs16; 2026-02-21T10:18:33.5710990Z cvt.f32.bf16 %r3686, %rs23; 2026-02-21T10:18:33.5711164Z cvt.f32.bf16 %r3687, %rs24; 2026-02-21T10:18:33.5711348Z cvt.f32.bf16 %r3688, %rs31; 2026-02-21T10:18:33.5711525Z cvt.f32.bf16 %r3689, %rs32; 2026-02-21T10:18:33.5711713Z cvt.f32.bf16 %r3818, %rs39; 2026-02-21T10:18:33.5711885Z cvt.f32.bf16 %r3819, %rs40; 2026-02-21T10:18:33.5712069Z cvt.f32.bf16 %r3820, %rs47; 2026-02-21T10:18:33.5712246Z cvt.f32.bf16 %r3821, %rs48; 2026-02-21T10:18:33.5712428Z cvt.f32.bf16 %r3950, %rs55; 2026-02-21T10:18:33.5712609Z cvt.f32.bf16 %r3951, %rs56; 2026-02-21T10:18:33.5712794Z cvt.f32.bf16 %r3952, %rs63; 2026-02-21T10:18:33.5712982Z 
cvt.f32.bf16 %r3953, %rs64; 2026-02-21T10:18:33.5713324Z .loc 1 64 33 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:64:33 2026-02-21T10:18:33.5713695Z bar.sync 0; 2026-02-21T10:18:33.5713855Z add.s32 %r19296, %r29377, 4096; 2026-02-21T10:18:33.5714057Z // begin inline asm 2026-02-21T10:18:33.5714272Z @%p222 mbarrier.init.shared::cta.b64 [%r19296], 1; 2026-02-21T10:18:33.5714524Z // end inline asm 2026-02-21T10:18:33.5714682Z bar.sync 0; 2026-02-21T10:18:33.5714830Z // begin inline asm 2026-02-21T10:18:33.5715082Z @%p222 mbarrier.arrive.expect_tx.shared.b64 _, [%r19296], 4096; 2026-02-21T10:18:33.5715360Z // end inline asm 2026-02-21T10:18:33.5715520Z // begin inline asm 2026-02-21T10:18:33.5715695Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.5715891Z // end inline asm 2026-02-21T10:18:33.5716035Z bar.sync 0; 2026-02-21T10:18:33.5716208Z elect.sync %r9046|%p99, -1; 2026-02-21T10:18:33.5716407Z and.pred %p40, %p1, %p99; 2026-02-21T10:18:33.5716721Z add.s64 %rd31, %rd655, 96; 2026-02-21T10:18:33.5716995Z cvt.u32.u64 %r1837, %rd31; 2026-02-21T10:18:33.5717250Z // begin inline asm 2026-02-21T10:18:33.5717692Z @%p40 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r29377], [%rd633, {%r9279, %r1837}], [%r19296]; 2026-02-21T10:18:33.5718158Z // end inline asm 2026-02-21T10:18:33.5718317Z bar.sync 0; 2026-02-21T10:18:33.5718464Z mov.b32 %r8914, 0; 2026-02-21T10:18:33.5718615Z // begin inline asm 2026-02-21T10:18:33.5718770Z 2026-02-21T10:18:33.5718897Z { 2026-02-21T10:18:33.5719053Z .reg .pred complete; 2026-02-21T10:18:33.5719216Z waitLoop: 2026-02-21T10:18:33.5719446Z mbarrier.try_wait.parity.shared.b64 complete, [%r19296], %r8914; 2026-02-21T10:18:33.5719741Z @!complete bra.uni waitLoop; 2026-02-21T10:18:33.5719928Z } 2026-02-21T10:18:33.5720000Z 2026-02-21T10:18:33.5720061Z // end inline asm 2026-02-21T10:18:33.5720219Z bar.sync 0; 2026-02-21T10:18:33.5720370Z // begin inline asm 2026-02-21T10:18:33.5720565Z @%p222 
mbarrier.inval.shared::cta.b64 [%r19296]; 2026-02-21T10:18:33.5720806Z // end inline asm 2026-02-21T10:18:33.5721198Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5721578Z ld.shared.s8 %rs65, [%r26]; 2026-02-21T10:18:33.5721904Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5722267Z shl.b16 %rs66, %rs65, 4; 2026-02-21T10:18:33.5722642Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5722999Z ld.shared.s8 %rs67, [%r27+128]; 2026-02-21T10:18:33.5723326Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5723675Z shl.b16 %rs68, %rs67, 4; 2026-02-21T10:18:33.5724011Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5724374Z ld.shared.s8 %rs69, [%r28+256]; 2026-02-21T10:18:33.5724716Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5725076Z shl.b16 %rs70, %rs69, 4; 2026-02-21T10:18:33.5725391Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5725749Z ld.shared.s8 %rs71, [%r29+384]; 2026-02-21T10:18:33.5726078Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5726431Z shl.b16 %rs72, %rs71, 4; 2026-02-21T10:18:33.5726854Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5727205Z ld.shared.s8 %rs73, [%r30+512]; 2026-02-21T10:18:33.5727530Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5727874Z shl.b16 %rs74, %rs73, 4; 2026-02-21T10:18:33.5728201Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5728569Z ld.shared.s8 %rs75, [%r31+640]; 2026-02-21T10:18:33.5728907Z .loc 1 67 28 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5729260Z shl.b16 %rs76, %rs75, 4; 2026-02-21T10:18:33.5729586Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5729963Z ld.shared.s8 %rs77, [%r32+768]; 2026-02-21T10:18:33.5730293Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5730649Z shl.b16 %rs78, %rs77, 4; 2026-02-21T10:18:33.5730955Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5731307Z ld.shared.s8 %rs79, [%r33+896]; 2026-02-21T10:18:33.5731638Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5732082Z shl.b16 %rs80, %rs79, 4; 2026-02-21T10:18:33.5732458Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5732820Z ld.shared.s8 %rs81, [%r26+1024]; 2026-02-21T10:18:33.5733147Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5733509Z shl.b16 %rs82, %rs81, 4; 2026-02-21T10:18:33.5733815Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5734172Z ld.shared.s8 %rs83, [%r27+1152]; 2026-02-21T10:18:33.5734497Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5734846Z shl.b16 %rs84, %rs83, 4; 2026-02-21T10:18:33.5735156Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5735505Z ld.shared.s8 %rs85, [%r28+1280]; 2026-02-21T10:18:33.5735838Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5736185Z shl.b16 %rs86, %rs85, 4; 2026-02-21T10:18:33.5736718Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5737107Z ld.shared.s8 %rs87, [%r29+1408]; 2026-02-21T10:18:33.5737498Z .loc 1 67 28 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5737856Z shl.b16 %rs88, %rs87, 4; 2026-02-21T10:18:33.5738166Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5738538Z ld.shared.s8 %rs89, [%r30+1536]; 2026-02-21T10:18:33.5738870Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5739224Z shl.b16 %rs90, %rs89, 4; 2026-02-21T10:18:33.5739533Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5739887Z ld.shared.s8 %rs91, [%r31+1664]; 2026-02-21T10:18:33.5740213Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5740557Z shl.b16 %rs92, %rs91, 4; 2026-02-21T10:18:33.5740864Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5741212Z ld.shared.s8 %rs93, [%r32+1792]; 2026-02-21T10:18:33.5741535Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5741885Z shl.b16 %rs94, %rs93, 4; 2026-02-21T10:18:33.5742188Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5742540Z ld.shared.s8 %rs95, [%r33+1920]; 2026-02-21T10:18:33.5742861Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5743213Z shl.b16 %rs96, %rs95, 4; 2026-02-21T10:18:33.5743513Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5743868Z ld.shared.s8 %rs97, [%r26+2048]; 2026-02-21T10:18:33.5744199Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5744555Z shl.b16 %rs98, %rs97, 4; 2026-02-21T10:18:33.5744864Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5745212Z ld.shared.s8 %rs99, [%r27+2176]; 2026-02-21T10:18:33.5745534Z .loc 1 67 28 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5745882Z shl.b16 %rs100, %rs99, 4; 2026-02-21T10:18:33.5746198Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5746684Z ld.shared.s8 %rs101, [%r28+2304]; 2026-02-21T10:18:33.5747112Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5747544Z shl.b16 %rs102, %rs101, 4; 2026-02-21T10:18:33.5747861Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5748223Z ld.shared.s8 %rs103, [%r29+2432]; 2026-02-21T10:18:33.5748636Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5748994Z shl.b16 %rs104, %rs103, 4; 2026-02-21T10:18:33.5749311Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5749663Z ld.shared.s8 %rs105, [%r30+2560]; 2026-02-21T10:18:33.5749995Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5750341Z shl.b16 %rs106, %rs105, 4; 2026-02-21T10:18:33.5750659Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5751018Z ld.shared.s8 %rs107, [%r31+2688]; 2026-02-21T10:18:33.5751419Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5751786Z shl.b16 %rs108, %rs107, 4; 2026-02-21T10:18:33.5752099Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5752518Z ld.shared.s8 %rs109, [%r32+2816]; 2026-02-21T10:18:33.5752856Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5753213Z shl.b16 %rs110, %rs109, 4; 2026-02-21T10:18:33.5753525Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5753873Z ld.shared.s8 %rs111, [%r33+2944]; 2026-02-21T10:18:33.5754198Z .loc 1 67 28 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5754556Z shl.b16 %rs112, %rs111, 4; 2026-02-21T10:18:33.5754868Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5755215Z ld.shared.s8 %rs113, [%r26+3072]; 2026-02-21T10:18:33.5755538Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5755901Z shl.b16 %rs114, %rs113, 4; 2026-02-21T10:18:33.5756212Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5756683Z ld.shared.s8 %rs115, [%r27+3200]; 2026-02-21T10:18:33.5757005Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5757359Z shl.b16 %rs116, %rs115, 4; 2026-02-21T10:18:33.5757665Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5758016Z ld.shared.s8 %rs117, [%r28+3328]; 2026-02-21T10:18:33.5758347Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5758692Z shl.b16 %rs118, %rs117, 4; 2026-02-21T10:18:33.5759002Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5759365Z ld.shared.s8 %rs119, [%r29+3456]; 2026-02-21T10:18:33.5759697Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5760043Z shl.b16 %rs120, %rs119, 4; 2026-02-21T10:18:33.5760358Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5760711Z ld.shared.s8 %rs121, [%r30+3584]; 2026-02-21T10:18:33.5761031Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5761381Z shl.b16 %rs122, %rs121, 4; 2026-02-21T10:18:33.5761689Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5762199Z ld.shared.s8 %rs123, [%r31+3712]; 2026-02-21T10:18:33.5762528Z .loc 1 67 28 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5762877Z shl.b16 %rs124, %rs123, 4; 2026-02-21T10:18:33.5763191Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5763542Z ld.shared.s8 %rs125, [%r32+3840]; 2026-02-21T10:18:33.5763871Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5764218Z shl.b16 %rs126, %rs125, 4; 2026-02-21T10:18:33.5764537Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5764903Z ld.shared.s8 %rs127, [%r33+3968]; 2026-02-21T10:18:33.5765231Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5765588Z shl.b16 %rs128, %rs127, 4; 2026-02-21T10:18:33.5765969Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5766331Z cvt.s16.s8 %rs129, %rs66; 2026-02-21T10:18:33.5766619Z shr.s16 %rs130, %rs129, 4; 2026-02-21T10:18:33.5766819Z cvt.s16.s8 %rs131, %rs68; 2026-02-21T10:18:33.5766995Z shr.s16 %rs132, %rs131, 4; 2026-02-21T10:18:33.5767247Z shr.s16 %rs133, %rs65, 4; 2026-02-21T10:18:33.5767428Z shr.s16 %rs134, %rs67, 4; 2026-02-21T10:18:33.5767735Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5768097Z cvt.rn.f32.s16 %r9047, %rs134; 2026-02-21T10:18:33.5768293Z cvt.rn.f32.s16 %r9048, %rs133; 2026-02-21T10:18:33.5768483Z cvt.rn.f32.s16 %r9049, %rs132; 2026-02-21T10:18:33.5768667Z cvt.rn.f32.s16 %r9050, %rs130; 2026-02-21T10:18:33.5768988Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5769361Z cvt.s16.s8 %rs135, %rs70; 2026-02-21T10:18:33.5769534Z shr.s16 %rs136, %rs135, 4; 2026-02-21T10:18:33.5769716Z cvt.s16.s8 %rs137, %rs72; 2026-02-21T10:18:33.5769886Z shr.s16 %rs138, %rs137, 4; 2026-02-21T10:18:33.5770065Z shr.s16 %rs139, %rs69, 4; 2026-02-21T10:18:33.5770234Z shr.s16 %rs140, 
%rs71, 4; 2026-02-21T10:18:33.5770540Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5770891Z cvt.rn.f32.s16 %r9051, %rs140; 2026-02-21T10:18:33.5771080Z cvt.rn.f32.s16 %r9052, %rs139; 2026-02-21T10:18:33.5771264Z cvt.rn.f32.s16 %r9053, %rs138; 2026-02-21T10:18:33.5771442Z cvt.rn.f32.s16 %r9054, %rs136; 2026-02-21T10:18:33.5771759Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5772105Z cvt.s16.s8 %rs141, %rs74; 2026-02-21T10:18:33.5772278Z shr.s16 %rs142, %rs141, 4; 2026-02-21T10:18:33.5772455Z cvt.s16.s8 %rs143, %rs76; 2026-02-21T10:18:33.5772631Z shr.s16 %rs144, %rs143, 4; 2026-02-21T10:18:33.5772803Z shr.s16 %rs145, %rs73, 4; 2026-02-21T10:18:33.5772981Z shr.s16 %rs146, %rs75, 4; 2026-02-21T10:18:33.5773287Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5773633Z cvt.rn.f32.s16 %r9055, %rs146; 2026-02-21T10:18:33.5773824Z cvt.rn.f32.s16 %r9056, %rs145; 2026-02-21T10:18:33.5774005Z cvt.rn.f32.s16 %r9057, %rs144; 2026-02-21T10:18:33.5774189Z cvt.rn.f32.s16 %r9058, %rs142; 2026-02-21T10:18:33.5774505Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5774855Z cvt.s16.s8 %rs147, %rs78; 2026-02-21T10:18:33.5775025Z shr.s16 %rs148, %rs147, 4; 2026-02-21T10:18:33.5775205Z cvt.s16.s8 %rs149, %rs80; 2026-02-21T10:18:33.5775383Z shr.s16 %rs150, %rs149, 4; 2026-02-21T10:18:33.5775555Z shr.s16 %rs151, %rs77, 4; 2026-02-21T10:18:33.5775832Z shr.s16 %rs152, %rs79, 4; 2026-02-21T10:18:33.5776199Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5776701Z cvt.rn.f32.s16 %r9059, %rs152; 2026-02-21T10:18:33.5776891Z cvt.rn.f32.s16 %r9060, %rs151; 2026-02-21T10:18:33.5777080Z cvt.rn.f32.s16 %r9061, %rs150; 2026-02-21T10:18:33.5777263Z cvt.rn.f32.s16 %r9062, %rs148; 2026-02-21T10:18:33.5777584Z .loc 1 69 25 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5777936Z cvt.s16.s8 %rs153, %rs82; 2026-02-21T10:18:33.5778107Z shr.s16 %rs154, %rs153, 4; 2026-02-21T10:18:33.5778290Z cvt.s16.s8 %rs155, %rs84; 2026-02-21T10:18:33.5778464Z shr.s16 %rs156, %rs155, 4; 2026-02-21T10:18:33.5778645Z shr.s16 %rs157, %rs81, 4; 2026-02-21T10:18:33.5778813Z shr.s16 %rs158, %rs83, 4; 2026-02-21T10:18:33.5779124Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5779472Z cvt.rn.f32.s16 %r9063, %rs158; 2026-02-21T10:18:33.5779663Z cvt.rn.f32.s16 %r9064, %rs157; 2026-02-21T10:18:33.5779941Z cvt.rn.f32.s16 %r9065, %rs156; 2026-02-21T10:18:33.5780132Z cvt.rn.f32.s16 %r9066, %rs154; 2026-02-21T10:18:33.5780457Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5780805Z cvt.s16.s8 %rs159, %rs86; 2026-02-21T10:18:33.5781048Z shr.s16 %rs160, %rs159, 4; 2026-02-21T10:18:33.5781226Z cvt.s16.s8 %rs161, %rs88; 2026-02-21T10:18:33.5781411Z shr.s16 %rs162, %rs161, 4; 2026-02-21T10:18:33.5781589Z shr.s16 %rs163, %rs85, 4; 2026-02-21T10:18:33.5781761Z shr.s16 %rs164, %rs87, 4; 2026-02-21T10:18:33.5782070Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5782420Z cvt.rn.f32.s16 %r9067, %rs164; 2026-02-21T10:18:33.5782608Z cvt.rn.f32.s16 %r9068, %rs163; 2026-02-21T10:18:33.5782794Z cvt.rn.f32.s16 %r9069, %rs162; 2026-02-21T10:18:33.5782978Z cvt.rn.f32.s16 %r9070, %rs160; 2026-02-21T10:18:33.5783291Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5783644Z cvt.s16.s8 %rs165, %rs90; 2026-02-21T10:18:33.5783812Z shr.s16 %rs166, %rs165, 4; 2026-02-21T10:18:33.5783995Z cvt.s16.s8 %rs167, %rs92; 2026-02-21T10:18:33.5784170Z shr.s16 %rs168, %rs167, 4; 2026-02-21T10:18:33.5784351Z shr.s16 %rs169, %rs89, 4; 2026-02-21T10:18:33.5784524Z shr.s16 %rs170, %rs91, 4; 2026-02-21T10:18:33.5784826Z .loc 
1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5785180Z cvt.rn.f32.s16 %r9071, %rs170; 2026-02-21T10:18:33.5785365Z cvt.rn.f32.s16 %r9072, %rs169; 2026-02-21T10:18:33.5785550Z cvt.rn.f32.s16 %r9073, %rs168; 2026-02-21T10:18:33.5785735Z cvt.rn.f32.s16 %r9074, %rs166; 2026-02-21T10:18:33.5786054Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5786409Z cvt.s16.s8 %rs171, %rs94; 2026-02-21T10:18:33.5786712Z shr.s16 %rs172, %rs171, 4; 2026-02-21T10:18:33.5786898Z cvt.s16.s8 %rs173, %rs96; 2026-02-21T10:18:33.5787067Z shr.s16 %rs174, %rs173, 4; 2026-02-21T10:18:33.5787135Z shr.s16 %rs175, %rs93, 4; 2026-02-21T10:18:33.5787198Z shr.s16 %rs176, %rs95, 4; 2026-02-21T10:18:33.5787398Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5787463Z cvt.rn.f32.s16 %r9075, %rs176; 2026-02-21T10:18:33.5787534Z cvt.rn.f32.s16 %r9076, %rs175; 2026-02-21T10:18:33.5787597Z cvt.rn.f32.s16 %r9077, %rs174; 2026-02-21T10:18:33.5787660Z cvt.rn.f32.s16 %r9078, %rs172; 2026-02-21T10:18:33.5787861Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5787925Z cvt.s16.s8 %rs177, %rs98; 2026-02-21T10:18:33.5788077Z shr.s16 %rs178, %rs177, 4; 2026-02-21T10:18:33.5788143Z cvt.s16.s8 %rs179, %rs100; 2026-02-21T10:18:33.5788291Z shr.s16 %rs180, %rs179, 4; 2026-02-21T10:18:33.5788357Z shr.s16 %rs181, %rs97, 4; 2026-02-21T10:18:33.5788476Z shr.s16 %rs182, %rs99, 4; 2026-02-21T10:18:33.5788707Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5788776Z cvt.rn.f32.s16 %r9079, %rs182; 2026-02-21T10:18:33.5788844Z cvt.rn.f32.s16 %r9080, %rs181; 2026-02-21T10:18:33.5788914Z cvt.rn.f32.s16 %r9081, %rs180; 2026-02-21T10:18:33.5788977Z cvt.rn.f32.s16 %r9082, %rs178; 2026-02-21T10:18:33.5789178Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 
2026-02-21T10:18:33.5789244Z cvt.s16.s8 %rs183, %rs102; 2026-02-21T10:18:33.5789315Z shr.s16 %rs184, %rs183, 4; 2026-02-21T10:18:33.5789377Z cvt.s16.s8 %rs185, %rs104; 2026-02-21T10:18:33.5789446Z shr.s16 %rs186, %rs185, 4; 2026-02-21T10:18:33.5789520Z shr.s16 %rs187, %rs101, 4; 2026-02-21T10:18:33.5789583Z shr.s16 %rs188, %rs103, 4; 2026-02-21T10:18:33.5789858Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5789936Z cvt.rn.f32.s16 %r9083, %rs188; 2026-02-21T10:18:33.5790002Z cvt.rn.f32.s16 %r9084, %rs187; 2026-02-21T10:18:33.5790068Z cvt.rn.f32.s16 %r9085, %rs186; 2026-02-21T10:18:33.5790135Z cvt.rn.f32.s16 %r9086, %rs184; 2026-02-21T10:18:33.5790404Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5790472Z cvt.s16.s8 %rs189, %rs106; 2026-02-21T10:18:33.5790535Z shr.s16 %rs190, %rs189, 4; 2026-02-21T10:18:33.5790603Z cvt.s16.s8 %rs191, %rs108; 2026-02-21T10:18:33.5790666Z shr.s16 %rs192, %rs191, 4; 2026-02-21T10:18:33.5790739Z shr.s16 %rs193, %rs105, 4; 2026-02-21T10:18:33.5790806Z shr.s16 %rs194, %rs107, 4; 2026-02-21T10:18:33.5791011Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5791079Z cvt.rn.f32.s16 %r9087, %rs194; 2026-02-21T10:18:33.5791144Z cvt.rn.f32.s16 %r9088, %rs193; 2026-02-21T10:18:33.5791217Z cvt.rn.f32.s16 %r9089, %rs192; 2026-02-21T10:18:33.5791281Z cvt.rn.f32.s16 %r9090, %rs190; 2026-02-21T10:18:33.5791477Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5791549Z cvt.s16.s8 %rs195, %rs110; 2026-02-21T10:18:33.5791613Z shr.s16 %rs196, %rs195, 4; 2026-02-21T10:18:33.5791676Z cvt.s16.s8 %rs197, %rs112; 2026-02-21T10:18:33.5791738Z shr.s16 %rs198, %rs197, 4; 2026-02-21T10:18:33.5791804Z shr.s16 %rs199, %rs109, 4; 2026-02-21T10:18:33.5791864Z shr.s16 %rs200, %rs111, 4; 2026-02-21T10:18:33.5792063Z .loc 1 87 32 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5792132Z cvt.rn.f32.s16 %r9091, %rs200; 2026-02-21T10:18:33.5792197Z cvt.rn.f32.s16 %r9092, %rs199; 2026-02-21T10:18:33.5792275Z cvt.rn.f32.s16 %r9093, %rs198; 2026-02-21T10:18:33.5792343Z cvt.rn.f32.s16 %r9094, %rs196; 2026-02-21T10:18:33.5792552Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5792615Z cvt.s16.s8 %rs201, %rs114; 2026-02-21T10:18:33.5792677Z shr.s16 %rs202, %rs201, 4; 2026-02-21T10:18:33.5792743Z cvt.s16.s8 %rs203, %rs116; 2026-02-21T10:18:33.5792808Z shr.s16 %rs204, %rs203, 4; 2026-02-21T10:18:33.5792870Z shr.s16 %rs205, %rs113, 4; 2026-02-21T10:18:33.5792935Z shr.s16 %rs206, %rs115, 4; 2026-02-21T10:18:33.5793139Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5793205Z cvt.rn.f32.s16 %r9095, %rs206; 2026-02-21T10:18:33.5793273Z cvt.rn.f32.s16 %r9096, %rs205; 2026-02-21T10:18:33.5793342Z cvt.rn.f32.s16 %r9097, %rs204; 2026-02-21T10:18:33.5793405Z cvt.rn.f32.s16 %r9098, %rs202; 2026-02-21T10:18:33.5793667Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5793782Z cvt.s16.s8 %rs207, %rs118; 2026-02-21T10:18:33.5793847Z shr.s16 %rs208, %rs207, 4; 2026-02-21T10:18:33.5793909Z cvt.s16.s8 %rs209, %rs120; 2026-02-21T10:18:33.5793976Z shr.s16 %rs210, %rs209, 4; 2026-02-21T10:18:33.5794040Z shr.s16 %rs211, %rs117, 4; 2026-02-21T10:18:33.5794103Z shr.s16 %rs212, %rs119, 4; 2026-02-21T10:18:33.5794316Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5794390Z cvt.rn.f32.s16 %r9099, %rs212; 2026-02-21T10:18:33.5794460Z cvt.rn.f32.s16 %r9100, %rs211; 2026-02-21T10:18:33.5794528Z cvt.rn.f32.s16 %r9101, %rs210; 2026-02-21T10:18:33.5794597Z cvt.rn.f32.s16 %r9102, %rs208; 2026-02-21T10:18:33.5794797Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 
2026-02-21T10:18:33.5794861Z cvt.s16.s8 %rs213, %rs122; 2026-02-21T10:18:33.5794926Z shr.s16 %rs214, %rs213, 4; 2026-02-21T10:18:33.5795001Z cvt.s16.s8 %rs215, %rs124; 2026-02-21T10:18:33.5795063Z shr.s16 %rs216, %rs215, 4; 2026-02-21T10:18:33.5795174Z shr.s16 %rs217, %rs121, 4; 2026-02-21T10:18:33.5795243Z shr.s16 %rs218, %rs123, 4; 2026-02-21T10:18:33.5795442Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5795511Z cvt.rn.f32.s16 %r9103, %rs218; 2026-02-21T10:18:33.5795629Z cvt.rn.f32.s16 %r9104, %rs217; 2026-02-21T10:18:33.5795696Z cvt.rn.f32.s16 %r9105, %rs216; 2026-02-21T10:18:33.5795760Z cvt.rn.f32.s16 %r9106, %rs214; 2026-02-21T10:18:33.5795958Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5796027Z cvt.s16.s8 %rs219, %rs126; 2026-02-21T10:18:33.5796090Z shr.s16 %rs220, %rs219, 4; 2026-02-21T10:18:33.5796152Z cvt.s16.s8 %rs221, %rs128; 2026-02-21T10:18:33.5796220Z shr.s16 %rs222, %rs221, 4; 2026-02-21T10:18:33.5796284Z shr.s16 %rs223, %rs125, 4; 2026-02-21T10:18:33.5796347Z shr.s16 %rs224, %rs127, 4; 2026-02-21T10:18:33.5796684Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5796755Z cvt.rn.f32.s16 %r9107, %rs224; 2026-02-21T10:18:33.5796818Z cvt.rn.f32.s16 %r9108, %rs223; 2026-02-21T10:18:33.5796884Z cvt.rn.f32.s16 %r9109, %rs222; 2026-02-21T10:18:33.5796955Z cvt.rn.f32.s16 %r9110, %rs220; 2026-02-21T10:18:33.5797016Z bar.sync 0; 2026-02-21T10:18:33.5797136Z st.shared.v4.b32 [%r34], {%r9050, %r9048, %r9049, %r9047}; 2026-02-21T10:18:33.5797263Z st.shared.v4.b32 [%r34+16384], {%r9082, %r9080, %r9081, %r9079}; 2026-02-21T10:18:33.5797370Z st.shared.v4.b32 [%r35], {%r9054, %r9052, %r9053, %r9051}; 2026-02-21T10:18:33.5797484Z st.shared.v4.b32 [%r35+16384], {%r9086, %r9084, %r9085, %r9083}; 2026-02-21T10:18:33.5797588Z st.shared.v4.b32 [%r36], {%r9058, %r9056, %r9057, %r9055}; 2026-02-21T10:18:33.5797711Z 
st.shared.v4.b32 [%r36+16384], {%r9090, %r9088, %r9089, %r9087}; 2026-02-21T10:18:33.5797817Z st.shared.v4.b32 [%r37], {%r9062, %r9060, %r9061, %r9059}; 2026-02-21T10:18:33.5797931Z st.shared.v4.b32 [%r37+16384], {%r9094, %r9092, %r9093, %r9091}; 2026-02-21T10:18:33.5798041Z st.shared.v4.b32 [%r38], {%r9066, %r9064, %r9065, %r9063}; 2026-02-21T10:18:33.5798153Z st.shared.v4.b32 [%r38+16384], {%r9098, %r9096, %r9097, %r9095}; 2026-02-21T10:18:33.5798257Z st.shared.v4.b32 [%r39], {%r9070, %r9068, %r9069, %r9067}; 2026-02-21T10:18:33.5798374Z st.shared.v4.b32 [%r39+16384], {%r9102, %r9100, %r9101, %r9099}; 2026-02-21T10:18:33.5798476Z st.shared.v4.b32 [%r40], {%r9074, %r9072, %r9073, %r9071}; 2026-02-21T10:18:33.5798588Z st.shared.v4.b32 [%r40+16384], {%r9106, %r9104, %r9105, %r9103}; 2026-02-21T10:18:33.5798695Z st.shared.v4.b32 [%r41], {%r9078, %r9076, %r9077, %r9075}; 2026-02-21T10:18:33.5798808Z st.shared.v4.b32 [%r41+16384], {%r9110, %r9108, %r9109, %r9107}; 2026-02-21T10:18:33.5798950Z $L__tmp1: 2026-02-21T10:18:33.5799230Z .loc 2 291 36 // standard.py:291:36 @[ c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:94:40 ] 2026-02-21T10:18:33.5799374Z // begin inline asm 2026-02-21T10:18:33.5799462Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.5799526Z // end inline asm 2026-02-21T10:18:33.5799589Z bar.sync 0; 2026-02-21T10:18:33.5799676Z shfl.sync.idx.b32 %r9111, %r5, 0, 31, -1; 2026-02-21T10:18:33.5799753Z wgmma.fence.sync.aligned; 2026-02-21T10:18:33.5799827Z mov.pred %p42, -1; 2026-02-21T10:18:33.5799889Z // begin inline asm 2026-02-21T10:18:33.5801457Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r1970,%r1971,%r1972,%r1973}, %rd3, %p42, 1, 1; 2026-02-21T10:18:33.5801535Z // end inline asm 2026-02-21T10:18:33.5801598Z // begin inline asm 2026-02-21T10:18:33.5803135Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r2102,%r2103,%r2104,%r2105}, %rd4, %p42, 1, 1; 2026-02-21T10:18:33.5803207Z // end inline asm 2026-02-21T10:18:33.5803269Z // begin inline asm 2026-02-21T10:18:33.5804752Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, 
{%r2234,%r2235,%r2236,%r2237}, %rd5, %p42, 1, 1; 2026-02-21T10:18:33.5804815Z // end inline asm 2026-02-21T10:18:33.5804875Z // begin inline asm 2026-02-21T10:18:33.5806355Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r2366,%r2367,%r2368,%r2369}, %rd6, %p42, 1, 1; 2026-02-21T10:18:33.5806429Z // end inline asm 2026-02-21T10:18:33.5806611Z // begin inline asm 2026-02-21T10:18:33.5808093Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r2498,%r2499,%r2500,%r2501}, %rd7, %p42, 1, 1; 2026-02-21T10:18:33.5808293Z // end inline asm 2026-02-21T10:18:33.5808360Z // begin inline asm 2026-02-21T10:18:33.5809836Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r2630,%r2631,%r2632,%r2633}, %rd8, %p42, 1, 1; 2026-02-21T10:18:33.5809898Z // end inline asm 2026-02-21T10:18:33.5809963Z // begin inline asm 2026-02-21T10:18:33.5811550Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r2762,%r2763,%r2764,%r2765}, %rd9, %p42, 1, 1; 2026-02-21T10:18:33.5811620Z // end inline asm 2026-02-21T10:18:33.5811680Z // begin inline asm 2026-02-21T10:18:33.5813160Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, 
{%r2894,%r2895,%r2896,%r2897}, %rd10, %p42, 1, 1; 2026-02-21T10:18:33.5813230Z // end inline asm 2026-02-21T10:18:33.5813300Z // begin inline asm 2026-02-21T10:18:33.5814778Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r3026,%r3027,%r3028,%r3029}, %rd3, %p42, 1, 1; 2026-02-21T10:18:33.5814848Z // end inline asm 2026-02-21T10:18:33.5814907Z // begin inline asm 2026-02-21T10:18:33.5816397Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r3158,%r3159,%r3160,%r3161}, %rd4, %p42, 1, 1; 2026-02-21T10:18:33.5816571Z // end inline asm 2026-02-21T10:18:33.5816638Z // begin inline asm 2026-02-21T10:18:33.5818138Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r3290,%r3291,%r3292,%r3293}, %rd5, %p42, 1, 1; 2026-02-21T10:18:33.5818360Z // end inline asm 2026-02-21T10:18:33.5818421Z // begin inline asm 2026-02-21T10:18:33.5819966Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r3422,%r3423,%r3424,%r3425}, %rd6, %p42, 1, 1; 2026-02-21T10:18:33.5820033Z // end inline asm 2026-02-21T10:18:33.5820098Z // begin inline asm 2026-02-21T10:18:33.5821643Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, 
{%r3554,%r3555,%r3556,%r3557}, %rd7, %p42, 1, 1; 2026-02-21T10:18:33.5821708Z // end inline asm 2026-02-21T10:18:33.5821774Z // begin inline asm 2026-02-21T10:18:33.5823246Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r3686,%r3687,%r3688,%r3689}, %rd8, %p42, 1, 1; 2026-02-21T10:18:33.5823310Z // end inline asm 2026-02-21T10:18:33.5823368Z // begin inline asm 2026-02-21T10:18:33.5824840Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r3818,%r3819,%r3820,%r3821}, %rd9, %p42, 1, 1; 2026-02-21T10:18:33.5824910Z // end inline asm 2026-02-21T10:18:33.5824971Z // begin inline asm 2026-02-21T10:18:33.5826561Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r3950,%r3951,%r3952,%r3953}, %rd10, %p42, 1, 1; 2026-02-21T10:18:33.5826767Z // end inline asm 2026-02-21T10:18:33.5826853Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:33.5826920Z mov.b32 %r4083, %r8914; 2026-02-21T10:18:33.5826987Z mov.b32 %r4084, %r8914; 2026-02-21T10:18:33.5827050Z mov.b32 %r4082, %r29377; 2026-02-21T10:18:33.5827112Z // begin inline asm 2026-02-21T10:18:33.5829821Z // wait for regs: %r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970,%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034,%r4082,%r4083,%r4084 
2026-02-21T10:18:33.5829912Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:33.5829977Z // end inline asm 2026-02-21T10:18:33.5830035Z $L__tmp2: 2026-02-21T10:18:33.5830256Z .loc 1 58 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:32 2026-02-21T10:18:33.5830332Z add.s32 %r9112, %r31906, -64; 2026-02-21T10:18:33.5830398Z add.s64 %rd175, %rd134, 128; 2026-02-21T10:18:33.5830460Z add.s64 %rd178, %rd137, 128; 2026-02-21T10:18:33.5830527Z add.s64 %rd181, %rd140, 128; 2026-02-21T10:18:33.5830606Z add.s64 %rd184, %rd143, 128; 2026-02-21T10:18:33.5830673Z add.s64 %rd187, %rd146, 128; 2026-02-21T10:18:33.5830738Z add.s64 %rd190, %rd149, 128; 2026-02-21T10:18:33.5830807Z add.s64 %rd193, %rd152, 128; 2026-02-21T10:18:33.5830888Z mad.wide.s32 %rd196, %r9112, 2, %rd85; 2026-02-21T10:18:33.5831099Z .loc 1 58 80 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:80 2026-02-21T10:18:33.5831167Z // begin inline asm 2026-02-21T10:18:33.5831228Z mov.u64 %rd174, 0x0; 2026-02-21T10:18:33.5831362Z createpolicy.fractional.L2::evict_first.b64 %rd174, 1.0; 2026-02-21T10:18:33.5831424Z // end inline asm 2026-02-21T10:18:33.5831491Z // begin inline asm 2026-02-21T10:18:33.5831553Z mov.u32 %r4216, 0x0; 2026-02-21T10:18:33.5831616Z mov.u32 %r4217, 0x0; 2026-02-21T10:18:33.5831680Z mov.u32 %r4218, 0x0; 2026-02-21T10:18:33.5831743Z mov.u32 %r4219, 0x0; 2026-02-21T10:18:33.5831979Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r4216, %r4217, %r4218, %r4219 }, [ %rd175 + 0 ], %rd174; 2026-02-21T10:18:33.5832042Z // end inline asm 2026-02-21T10:18:33.5832108Z // begin inline asm 2026-02-21T10:18:33.5832171Z mov.u64 %rd177, 0x0; 2026-02-21T10:18:33.5832298Z createpolicy.fractional.L2::evict_first.b64 %rd177, 1.0; 2026-02-21T10:18:33.5832363Z // end inline asm 2026-02-21T10:18:33.5832424Z // begin inline asm 2026-02-21T10:18:33.5832483Z mov.u32 %r4220, 0x0; 2026-02-21T10:18:33.5832547Z mov.u32 %r4221, 0x0; 2026-02-21T10:18:33.5832606Z mov.u32 %r4222, 0x0; 
2026-02-21T10:18:33.5832666Z mov.u32 %r4223, 0x0; 2026-02-21T10:18:33.5832885Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r4220, %r4221, %r4222, %r4223 }, [ %rd178 + 0 ], %rd177; 2026-02-21T10:18:33.5832954Z // end inline asm 2026-02-21T10:18:33.5833015Z // begin inline asm 2026-02-21T10:18:33.5833075Z mov.u64 %rd180, 0x0; 2026-02-21T10:18:33.5833204Z createpolicy.fractional.L2::evict_first.b64 %rd180, 1.0; 2026-02-21T10:18:33.5833332Z // end inline asm 2026-02-21T10:18:33.5833393Z // begin inline asm 2026-02-21T10:18:33.5833501Z mov.u32 %r4224, 0x0; 2026-02-21T10:18:33.5833566Z mov.u32 %r4225, 0x0; 2026-02-21T10:18:33.5833627Z mov.u32 %r4226, 0x0; 2026-02-21T10:18:33.5833687Z mov.u32 %r4227, 0x0; 2026-02-21T10:18:33.5833911Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r4224, %r4225, %r4226, %r4227 }, [ %rd181 + 0 ], %rd180; 2026-02-21T10:18:33.5833973Z // end inline asm 2026-02-21T10:18:33.5834035Z // begin inline asm 2026-02-21T10:18:33.5834100Z mov.u64 %rd183, 0x0; 2026-02-21T10:18:33.5834222Z createpolicy.fractional.L2::evict_first.b64 %rd183, 1.0; 2026-02-21T10:18:33.5834282Z // end inline asm 2026-02-21T10:18:33.5834344Z // begin inline asm 2026-02-21T10:18:33.5834411Z mov.u32 %r4228, 0x0; 2026-02-21T10:18:33.5834471Z mov.u32 %r4229, 0x0; 2026-02-21T10:18:33.5834540Z mov.u32 %r4230, 0x0; 2026-02-21T10:18:33.5834606Z mov.u32 %r4231, 0x0; 2026-02-21T10:18:33.5834825Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r4228, %r4229, %r4230, %r4231 }, [ %rd184 + 0 ], %rd183; 2026-02-21T10:18:33.5834889Z // end inline asm 2026-02-21T10:18:33.5834953Z // begin inline asm 2026-02-21T10:18:33.5835063Z mov.u64 %rd186, 0x0; 2026-02-21T10:18:33.5835187Z createpolicy.fractional.L2::evict_first.b64 %rd186, 1.0; 2026-02-21T10:18:33.5835248Z // end inline asm 2026-02-21T10:18:33.5835313Z // begin inline asm 2026-02-21T10:18:33.5835371Z mov.u32 %r4232, 0x0; 2026-02-21T10:18:33.5835472Z mov.u32 %r4233, 0x0; 2026-02-21T10:18:33.5835537Z mov.u32 %r4234, 0x0; 
2026-02-21T10:18:33.5835594Z mov.u32 %r4235, 0x0; 2026-02-21T10:18:33.5835810Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r4232, %r4233, %r4234, %r4235 }, [ %rd187 + 0 ], %rd186; 2026-02-21T10:18:33.5835881Z // end inline asm 2026-02-21T10:18:33.5835950Z // begin inline asm 2026-02-21T10:18:33.5836014Z mov.u64 %rd189, 0x0; 2026-02-21T10:18:33.5836133Z createpolicy.fractional.L2::evict_first.b64 %rd189, 1.0; 2026-02-21T10:18:33.5836199Z // end inline asm 2026-02-21T10:18:33.5836257Z // begin inline asm 2026-02-21T10:18:33.5836317Z mov.u32 %r4236, 0x0; 2026-02-21T10:18:33.5836380Z mov.u32 %r4237, 0x0; 2026-02-21T10:18:33.5836440Z mov.u32 %r4238, 0x0; 2026-02-21T10:18:33.5836612Z mov.u32 %r4239, 0x0; 2026-02-21T10:18:33.5836851Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r4236, %r4237, %r4238, %r4239 }, [ %rd190 + 0 ], %rd189; 2026-02-21T10:18:33.5836916Z // end inline asm 2026-02-21T10:18:33.5836979Z // begin inline asm 2026-02-21T10:18:33.5837039Z mov.u64 %rd192, 0x0; 2026-02-21T10:18:33.5837167Z createpolicy.fractional.L2::evict_first.b64 %rd192, 1.0; 2026-02-21T10:18:33.5837225Z // end inline asm 2026-02-21T10:18:33.5837285Z // begin inline asm 2026-02-21T10:18:33.5837349Z mov.u32 %r4240, 0x0; 2026-02-21T10:18:33.5837408Z mov.u32 %r4241, 0x0; 2026-02-21T10:18:33.5837465Z mov.u32 %r4242, 0x0; 2026-02-21T10:18:33.5837522Z mov.u32 %r4243, 0x0; 2026-02-21T10:18:33.5837743Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r4240, %r4241, %r4242, %r4243 }, [ %rd193 + 0 ], %rd192; 2026-02-21T10:18:33.5837806Z // end inline asm 2026-02-21T10:18:33.5837866Z // begin inline asm 2026-02-21T10:18:33.5837931Z mov.u64 %rd195, 0x0; 2026-02-21T10:18:33.5838051Z createpolicy.fractional.L2::evict_first.b64 %rd195, 1.0; 2026-02-21T10:18:33.5838108Z // end inline asm 2026-02-21T10:18:33.5838169Z // begin inline asm 2026-02-21T10:18:33.5838233Z mov.u32 %r4244, 0x0; 2026-02-21T10:18:33.5838293Z mov.u32 %r4245, 0x0; 2026-02-21T10:18:33.5838351Z mov.u32 %r4246, 0x0; 
2026-02-21T10:18:33.5838422Z mov.u32 %r4247, 0x0; 2026-02-21T10:18:33.5838650Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r4244, %r4245, %r4246, %r4247 }, [ %rd196 + 0 ], %rd195; 2026-02-21T10:18:33.5838711Z // end inline asm 2026-02-21T10:18:33.5838925Z .loc 1 62 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:62:32 2026-02-21T10:18:33.5838984Z bar.sync 0; 2026-02-21T10:18:33.5839069Z st.shared.v2.b32 [%r16], {%r4216, %r4217}; 2026-02-21T10:18:33.5839239Z st.shared.v2.b32 [%r16+2048], {%r4220, %r4221}; 2026-02-21T10:18:33.5839387Z st.shared.v2.b32 [%r16+4096], {%r4224, %r4225}; 2026-02-21T10:18:33.5839471Z st.shared.v2.b32 [%r16+6144], {%r4228, %r4229}; 2026-02-21T10:18:33.5839554Z st.shared.v2.b32 [%r16+8192], {%r4232, %r4233}; 2026-02-21T10:18:33.5839649Z st.shared.v2.b32 [%r16+10240], {%r4236, %r4237}; 2026-02-21T10:18:33.5839737Z st.shared.v2.b32 [%r16+12288], {%r4240, %r4241}; 2026-02-21T10:18:33.5839827Z st.shared.v2.b32 [%r16+14336], {%r4244, %r4245}; 2026-02-21T10:18:33.5839912Z st.shared.v2.b32 [%r17], {%r4218, %r4219}; 2026-02-21T10:18:33.5839997Z st.shared.v2.b32 [%r17+2048], {%r4222, %r4223}; 2026-02-21T10:18:33.5840079Z st.shared.v2.b32 [%r17+4096], {%r4226, %r4227}; 2026-02-21T10:18:33.5840159Z st.shared.v2.b32 [%r17+6144], {%r4230, %r4231}; 2026-02-21T10:18:33.5840247Z st.shared.v2.b32 [%r17+8192], {%r4234, %r4235}; 2026-02-21T10:18:33.5840342Z st.shared.v2.b32 [%r17+10240], {%r4238, %r4239}; 2026-02-21T10:18:33.5840435Z st.shared.v2.b32 [%r17+12288], {%r4242, %r4243}; 2026-02-21T10:18:33.5840530Z st.shared.v2.b32 [%r17+14336], {%r4246, %r4247}; 2026-02-21T10:18:33.5840650Z bar.sync 0; 2026-02-21T10:18:33.5840725Z ld.shared.b16 %rs225, [%r18]; 2026-02-21T10:18:33.5840802Z ld.shared.b16 %rs226, [%r18+1024]; 2026-02-21T10:18:33.5840875Z ld.shared.b16 %rs227, [%r18+64]; 2026-02-21T10:18:33.5840942Z ld.shared.b16 %rs228, [%r18+1088]; 2026-02-21T10:18:33.5841081Z ld.shared.b16 %rs229, [%r18+8192]; 2026-02-21T10:18:33.5841155Z 
ld.shared.b16 %rs230, [%r18+9216]; 2026-02-21T10:18:33.5841233Z ld.shared.b16 %rs231, [%r18+8256]; 2026-02-21T10:18:33.5841303Z ld.shared.b16 %rs232, [%r18+9280]; 2026-02-21T10:18:33.5841375Z ld.shared.b16 %rs233, [%r19]; 2026-02-21T10:18:33.5841441Z ld.shared.b16 %rs234, [%r19+1024]; 2026-02-21T10:18:33.5841508Z ld.shared.b16 %rs235, [%r19+64]; 2026-02-21T10:18:33.5841577Z ld.shared.b16 %rs236, [%r19+1088]; 2026-02-21T10:18:33.5841651Z ld.shared.b16 %rs237, [%r19+8192]; 2026-02-21T10:18:33.5841717Z ld.shared.b16 %rs238, [%r19+9216]; 2026-02-21T10:18:33.5841784Z ld.shared.b16 %rs239, [%r19+8256]; 2026-02-21T10:18:33.5841859Z ld.shared.b16 %rs240, [%r19+9280]; 2026-02-21T10:18:33.5841927Z ld.shared.b16 %rs241, [%r20]; 2026-02-21T10:18:33.5841992Z ld.shared.b16 %rs242, [%r20+1024]; 2026-02-21T10:18:33.5842058Z ld.shared.b16 %rs243, [%r20+64]; 2026-02-21T10:18:33.5842132Z ld.shared.b16 %rs244, [%r20+1088]; 2026-02-21T10:18:33.5842201Z ld.shared.b16 %rs245, [%r20+8192]; 2026-02-21T10:18:33.5842266Z ld.shared.b16 %rs246, [%r20+9216]; 2026-02-21T10:18:33.5842341Z ld.shared.b16 %rs247, [%r20+8256]; 2026-02-21T10:18:33.5842408Z ld.shared.b16 %rs248, [%r20+9280]; 2026-02-21T10:18:33.5842477Z ld.shared.b16 %rs249, [%r21]; 2026-02-21T10:18:33.5842552Z ld.shared.b16 %rs250, [%r21+1024]; 2026-02-21T10:18:33.5842618Z ld.shared.b16 %rs251, [%r21+64]; 2026-02-21T10:18:33.5842685Z ld.shared.b16 %rs252, [%r21+1088]; 2026-02-21T10:18:33.5842755Z ld.shared.b16 %rs253, [%r21+8192]; 2026-02-21T10:18:33.5842829Z ld.shared.b16 %rs254, [%r21+9216]; 2026-02-21T10:18:33.5842899Z ld.shared.b16 %rs255, [%r21+8256]; 2026-02-21T10:18:33.5842967Z ld.shared.b16 %rs256, [%r21+9280]; 2026-02-21T10:18:33.5843041Z ld.shared.b16 %rs257, [%r22]; 2026-02-21T10:18:33.5843108Z ld.shared.b16 %rs258, [%r22+1024]; 2026-02-21T10:18:33.5843175Z ld.shared.b16 %rs259, [%r22+64]; 2026-02-21T10:18:33.5843242Z ld.shared.b16 %rs260, [%r22+1088]; 2026-02-21T10:18:33.5843327Z ld.shared.b16 %rs261, [%r22+8192]; 
2026-02-21T10:18:33.5843395Z ld.shared.b16 %rs262, [%r22+9216]; 2026-02-21T10:18:33.5843460Z ld.shared.b16 %rs263, [%r22+8256]; 2026-02-21T10:18:33.5843530Z ld.shared.b16 %rs264, [%r22+9280]; 2026-02-21T10:18:33.5843597Z ld.shared.b16 %rs265, [%r23]; 2026-02-21T10:18:33.5843665Z ld.shared.b16 %rs266, [%r23+1024]; 2026-02-21T10:18:33.5843738Z ld.shared.b16 %rs267, [%r23+64]; 2026-02-21T10:18:33.5843803Z ld.shared.b16 %rs268, [%r23+1088]; 2026-02-21T10:18:33.5843926Z ld.shared.b16 %rs269, [%r23+8192]; 2026-02-21T10:18:33.5844037Z ld.shared.b16 %rs270, [%r23+9216]; 2026-02-21T10:18:33.5844110Z ld.shared.b16 %rs271, [%r23+8256]; 2026-02-21T10:18:33.5844175Z ld.shared.b16 %rs272, [%r23+9280]; 2026-02-21T10:18:33.5844242Z ld.shared.b16 %rs273, [%r24]; 2026-02-21T10:18:33.5844317Z ld.shared.b16 %rs274, [%r24+1024]; 2026-02-21T10:18:33.5844381Z ld.shared.b16 %rs275, [%r24+64]; 2026-02-21T10:18:33.5844447Z ld.shared.b16 %rs276, [%r24+1088]; 2026-02-21T10:18:33.5844514Z ld.shared.b16 %rs277, [%r24+8192]; 2026-02-21T10:18:33.5844585Z ld.shared.b16 %rs278, [%r24+9216]; 2026-02-21T10:18:33.5844650Z ld.shared.b16 %rs279, [%r24+8256]; 2026-02-21T10:18:33.5844715Z ld.shared.b16 %rs280, [%r24+9280]; 2026-02-21T10:18:33.5844785Z ld.shared.b16 %rs281, [%r25]; 2026-02-21T10:18:33.5844850Z ld.shared.b16 %rs282, [%r25+1024]; 2026-02-21T10:18:33.5844916Z ld.shared.b16 %rs283, [%r25+64]; 2026-02-21T10:18:33.5844984Z ld.shared.b16 %rs284, [%r25+1088]; 2026-02-21T10:18:33.5845056Z ld.shared.b16 %rs285, [%r25+8192]; 2026-02-21T10:18:33.5845122Z ld.shared.b16 %rs286, [%r25+9216]; 2026-02-21T10:18:33.5845237Z ld.shared.b16 %rs287, [%r25+8256]; 2026-02-21T10:18:33.5845311Z ld.shared.b16 %rs288, [%r25+9280]; 2026-02-21T10:18:33.5845379Z cvt.f32.bf16 %r4385, %rs225; 2026-02-21T10:18:33.5845444Z cvt.f32.bf16 %r4386, %rs226; 2026-02-21T10:18:33.5845515Z cvt.f32.bf16 %r4387, %rs233; 2026-02-21T10:18:33.5845620Z cvt.f32.bf16 %r4388, %rs234; 2026-02-21T10:18:33.5845687Z cvt.f32.bf16 %r4517, %rs241; 
2026-02-21T10:18:33.5845749Z cvt.f32.bf16 %r4518, %rs242; 2026-02-21T10:18:33.5845819Z cvt.f32.bf16 %r4519, %rs249; 2026-02-21T10:18:33.5845881Z cvt.f32.bf16 %r4520, %rs250; 2026-02-21T10:18:33.5845943Z cvt.f32.bf16 %r4649, %rs257; 2026-02-21T10:18:33.5846010Z cvt.f32.bf16 %r4650, %rs258; 2026-02-21T10:18:33.5846071Z cvt.f32.bf16 %r4651, %rs265; 2026-02-21T10:18:33.5846133Z cvt.f32.bf16 %r4652, %rs266; 2026-02-21T10:18:33.5846197Z cvt.f32.bf16 %r4781, %rs273; 2026-02-21T10:18:33.5846266Z cvt.f32.bf16 %r4782, %rs274; 2026-02-21T10:18:33.5846330Z cvt.f32.bf16 %r4783, %rs281; 2026-02-21T10:18:33.5846397Z cvt.f32.bf16 %r4784, %rs282; 2026-02-21T10:18:33.5846596Z cvt.f32.bf16 %r4913, %rs227; 2026-02-21T10:18:33.5846663Z cvt.f32.bf16 %r4914, %rs228; 2026-02-21T10:18:33.5846726Z cvt.f32.bf16 %r4915, %rs235; 2026-02-21T10:18:33.5846789Z cvt.f32.bf16 %r4916, %rs236; 2026-02-21T10:18:33.5846859Z cvt.f32.bf16 %r5045, %rs243; 2026-02-21T10:18:33.5846923Z cvt.f32.bf16 %r5046, %rs244; 2026-02-21T10:18:33.5846985Z cvt.f32.bf16 %r5047, %rs251; 2026-02-21T10:18:33.5847057Z cvt.f32.bf16 %r5048, %rs252; 2026-02-21T10:18:33.5847118Z cvt.f32.bf16 %r5177, %rs259; 2026-02-21T10:18:33.5847181Z cvt.f32.bf16 %r5178, %rs260; 2026-02-21T10:18:33.5847250Z cvt.f32.bf16 %r5179, %rs267; 2026-02-21T10:18:33.5847312Z cvt.f32.bf16 %r5180, %rs268; 2026-02-21T10:18:33.5847373Z cvt.f32.bf16 %r5309, %rs275; 2026-02-21T10:18:33.5847439Z cvt.f32.bf16 %r5310, %rs276; 2026-02-21T10:18:33.5847510Z cvt.f32.bf16 %r5311, %rs283; 2026-02-21T10:18:33.5847575Z cvt.f32.bf16 %r5312, %rs284; 2026-02-21T10:18:33.5847637Z cvt.f32.bf16 %r5441, %rs229; 2026-02-21T10:18:33.5847705Z cvt.f32.bf16 %r5442, %rs230; 2026-02-21T10:18:33.5847768Z cvt.f32.bf16 %r5443, %rs237; 2026-02-21T10:18:33.5847830Z cvt.f32.bf16 %r5444, %rs238; 2026-02-21T10:18:33.5847892Z cvt.f32.bf16 %r5573, %rs245; 2026-02-21T10:18:33.5847962Z cvt.f32.bf16 %r5574, %rs246; 2026-02-21T10:18:33.5848025Z cvt.f32.bf16 %r5575, %rs253; 
2026-02-21T10:18:33.5848087Z cvt.f32.bf16 %r5576, %rs254; 2026-02-21T10:18:33.5848156Z cvt.f32.bf16 %r5705, %rs261; 2026-02-21T10:18:33.5848219Z cvt.f32.bf16 %r5706, %rs262; 2026-02-21T10:18:33.5848284Z cvt.f32.bf16 %r5707, %rs269; 2026-02-21T10:18:33.5848346Z cvt.f32.bf16 %r5708, %rs270; 2026-02-21T10:18:33.5848414Z cvt.f32.bf16 %r5837, %rs277; 2026-02-21T10:18:33.5848481Z cvt.f32.bf16 %r5838, %rs278; 2026-02-21T10:18:33.5848636Z cvt.f32.bf16 %r5839, %rs285; 2026-02-21T10:18:33.5848711Z cvt.f32.bf16 %r5840, %rs286; 2026-02-21T10:18:33.5848837Z cvt.f32.bf16 %r5969, %rs231; 2026-02-21T10:18:33.5848901Z cvt.f32.bf16 %r5970, %rs232; 2026-02-21T10:18:33.5848965Z cvt.f32.bf16 %r5971, %rs239; 2026-02-21T10:18:33.5849036Z cvt.f32.bf16 %r5972, %rs240; 2026-02-21T10:18:33.5849099Z cvt.f32.bf16 %r6101, %rs247; 2026-02-21T10:18:33.5849162Z cvt.f32.bf16 %r6102, %rs248; 2026-02-21T10:18:33.5849233Z cvt.f32.bf16 %r6103, %rs255; 2026-02-21T10:18:33.5849297Z cvt.f32.bf16 %r6104, %rs256; 2026-02-21T10:18:33.5849359Z cvt.f32.bf16 %r6233, %rs263; 2026-02-21T10:18:33.5849433Z cvt.f32.bf16 %r6234, %rs264; 2026-02-21T10:18:33.5849499Z cvt.f32.bf16 %r6235, %rs271; 2026-02-21T10:18:33.5849561Z cvt.f32.bf16 %r6236, %rs272; 2026-02-21T10:18:33.5849625Z cvt.f32.bf16 %r6365, %rs279; 2026-02-21T10:18:33.5849697Z cvt.f32.bf16 %r6366, %rs280; 2026-02-21T10:18:33.5849773Z cvt.f32.bf16 %r6367, %rs287; 2026-02-21T10:18:33.5849840Z cvt.f32.bf16 %r6368, %rs288; 2026-02-21T10:18:33.5850069Z .loc 1 64 33 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:64:33 2026-02-21T10:18:33.5850195Z bar.sync 0; 2026-02-21T10:18:33.5850262Z // begin inline asm 2026-02-21T10:18:33.5850371Z @%p222 mbarrier.init.shared::cta.b64 [%r19296], 1; 2026-02-21T10:18:33.5850441Z // end inline asm 2026-02-21T10:18:33.5850499Z bar.sync 0; 2026-02-21T10:18:33.5850561Z // begin inline asm 2026-02-21T10:18:33.5850765Z @%p222 mbarrier.arrive.expect_tx.shared.b64 _, [%r19296], 4096; 2026-02-21T10:18:33.5850831Z // end inline 
asm 2026-02-21T10:18:33.5850891Z // begin inline asm 2026-02-21T10:18:33.5850975Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.5851036Z // end inline asm 2026-02-21T10:18:33.5851105Z bar.sync 0; 2026-02-21T10:18:33.5851182Z elect.sync %r9113|%p100, -1; 2026-02-21T10:18:33.5851261Z and.pred %p60, %p1, %p100; 2026-02-21T10:18:33.5851327Z cvt.u32.u64 %r9114, %rd655; 2026-02-21T10:18:33.5851394Z add.s32 %r4252, %r9114, 128; 2026-02-21T10:18:33.5851457Z // begin inline asm 2026-02-21T10:18:33.5851799Z @%p60 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r29377], [%rd633, {%r9279, %r4252}], [%r19296]; 2026-02-21T10:18:33.5851857Z // end inline asm 2026-02-21T10:18:33.5851917Z bar.sync 0; 2026-02-21T10:18:33.5851989Z // begin inline asm 2026-02-21T10:18:33.5852046Z 2026-02-21T10:18:33.5852099Z { 2026-02-21T10:18:33.5852178Z .reg .pred complete; 2026-02-21T10:18:33.5852239Z waitLoop: 2026-02-21T10:18:33.5852392Z mbarrier.try_wait.parity.shared.b64 complete, [%r19296], %r8914; 2026-02-21T10:18:33.5852467Z @!complete bra.uni waitLoop; 2026-02-21T10:18:33.5852529Z } 2026-02-21T10:18:33.5852535Z 2026-02-21T10:18:33.5852594Z // end inline asm 2026-02-21T10:18:33.5852654Z bar.sync 0; 2026-02-21T10:18:33.5852723Z // begin inline asm 2026-02-21T10:18:33.5852831Z @%p222 mbarrier.inval.shared::cta.b64 [%r19296]; 2026-02-21T10:18:33.5852889Z // end inline asm 2026-02-21T10:18:33.5853109Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5853180Z ld.shared.s8 %rs289, [%r26]; 2026-02-21T10:18:33.5853385Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5853451Z shl.b16 %rs290, %rs289, 4; 2026-02-21T10:18:33.5853658Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5853728Z ld.shared.s8 %rs291, [%r27+128]; 2026-02-21T10:18:33.5853931Z .loc 1 67 28 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5854001Z shl.b16 %rs292, %rs291, 4; 2026-02-21T10:18:33.5854200Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5854270Z ld.shared.s8 %rs293, [%r28+256]; 2026-02-21T10:18:33.5854473Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5854651Z shl.b16 %rs294, %rs293, 4; 2026-02-21T10:18:33.5854851Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5854923Z ld.shared.s8 %rs295, [%r29+384]; 2026-02-21T10:18:33.5855118Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5855184Z shl.b16 %rs296, %rs295, 4; 2026-02-21T10:18:33.5855381Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5855454Z ld.shared.s8 %rs297, [%r30+512]; 2026-02-21T10:18:33.5855650Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5855711Z shl.b16 %rs298, %rs297, 4; 2026-02-21T10:18:33.5855913Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5855988Z ld.shared.s8 %rs299, [%r31+640]; 2026-02-21T10:18:33.5856231Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5856303Z shl.b16 %rs300, %rs299, 4; 2026-02-21T10:18:33.5856762Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5856833Z ld.shared.s8 %rs301, [%r32+768]; 2026-02-21T10:18:33.5857117Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5857186Z shl.b16 %rs302, %rs301, 4; 2026-02-21T10:18:33.5857393Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5859143Z [3106s] Triton compile failed. This likely indicates a bug in Triton. 
Skipping failing config. 2026-02-21T10:18:33.5860456Z Config: @helion.kernel(config=helion.Config(block_sizes=[32, 128, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['first', 'last'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=64, num_stages=4, num_warps=4, pid_type='persistent_blocked', range_flattens=[True, None], range_multi_buffers=[None, True], range_num_stages=[2, 3], range_unroll_factors=[2, 3], range_warp_specializes=[]), static_shapes=True) 2026-02-21T10:18:33.5860608Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T10:18:33.5860688Z `ptxas` stderr: 2026-02-21T10:18:33.5861154Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 1154 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T10:18:33.5861262Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:18:33.5861267Z 2026-02-21T10:18:33.5861777Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp4v0g1n9x.ptx -o /tmp/tmp4v0g1n9x.ptx.o 2026-02-21T10:18:33.5861784Z 2026-02-21T10:18:33.5861936Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T10:18:33.5862011Z ld.shared.s8 %rs303, [%r33+896]; 2026-02-21T10:18:33.5862226Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5862294Z shl.b16 %rs304, %rs303, 4; 2026-02-21T10:18:33.5862499Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5862578Z ld.shared.s8 %rs305, [%r26+1024]; 2026-02-21T10:18:33.5862777Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5862844Z shl.b16 %rs306, %rs305, 4; 2026-02-21T10:18:33.5863049Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5863117Z ld.shared.s8 %rs307, [%r27+1152]; 2026-02-21T10:18:33.5863417Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5863568Z shl.b16 %rs308, %rs307, 4; 2026-02-21T10:18:33.5863770Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5863843Z ld.shared.s8 %rs309, [%r28+1280]; 2026-02-21T10:18:33.5864045Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5864128Z shl.b16 %rs310, %rs309, 4; 2026-02-21T10:18:33.5864331Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5864399Z ld.shared.s8 %rs311, [%r29+1408]; 2026-02-21T10:18:33.5864601Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5864665Z shl.b16 %rs312, %rs311, 4; 2026-02-21T10:18:33.5864862Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5864945Z ld.shared.s8 %rs313, [%r30+1536]; 2026-02-21T10:18:33.5865242Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5865314Z shl.b16 %rs314, %rs313, 4; 2026-02-21T10:18:33.5865523Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 
2026-02-21T10:18:33.5865642Z ld.shared.s8 %rs315, [%r31+1664]; 2026-02-21T10:18:33.5865841Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5865910Z shl.b16 %rs316, %rs315, 4; 2026-02-21T10:18:33.5866119Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5866188Z ld.shared.s8 %rs317, [%r32+1792]; 2026-02-21T10:18:33.5866386Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5866672Z shl.b16 %rs318, %rs317, 4; 2026-02-21T10:18:33.5866887Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5866961Z ld.shared.s8 %rs319, [%r33+1920]; 2026-02-21T10:18:33.5867168Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5867232Z shl.b16 %rs320, %rs319, 4; 2026-02-21T10:18:33.5867435Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5867509Z ld.shared.s8 %rs321, [%r26+2048]; 2026-02-21T10:18:33.5867709Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5867776Z shl.b16 %rs322, %rs321, 4; 2026-02-21T10:18:33.5867976Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5868048Z ld.shared.s8 %rs323, [%r27+2176]; 2026-02-21T10:18:33.5868264Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5868332Z shl.b16 %rs324, %rs323, 4; 2026-02-21T10:18:33.5868617Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5868691Z ld.shared.s8 %rs325, [%r28+2304]; 2026-02-21T10:18:33.5868893Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5868963Z shl.b16 %rs326, %rs325, 4; 2026-02-21T10:18:33.5869163Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 
2026-02-21T10:18:33.5869255Z ld.shared.s8 %rs327, [%r29+2432]; 2026-02-21T10:18:33.5869619Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5869694Z shl.b16 %rs328, %rs327, 4; 2026-02-21T10:18:33.5869898Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5870127Z ld.shared.s8 %rs329, [%r30+2560]; 2026-02-21T10:18:33.5870335Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5870401Z shl.b16 %rs330, %rs329, 4; 2026-02-21T10:18:33.5870599Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5870673Z ld.shared.s8 %rs331, [%r31+2688]; 2026-02-21T10:18:33.5870872Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5870939Z shl.b16 %rs332, %rs331, 4; 2026-02-21T10:18:33.5871141Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5871211Z ld.shared.s8 %rs333, [%r32+2816]; 2026-02-21T10:18:33.5871412Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5871484Z shl.b16 %rs334, %rs333, 4; 2026-02-21T10:18:33.5871744Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5871819Z ld.shared.s8 %rs335, [%r33+2944]; 2026-02-21T10:18:33.5872018Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5872101Z shl.b16 %rs336, %rs335, 4; 2026-02-21T10:18:33.5872366Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5872438Z ld.shared.s8 %rs337, [%r26+3072]; 2026-02-21T10:18:33.5872644Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5872707Z shl.b16 %rs338, %rs337, 4; 2026-02-21T10:18:33.5872904Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 
2026-02-21T10:18:33.5872979Z ld.shared.s8 %rs339, [%r27+3200]; 2026-02-21T10:18:33.5873179Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5873246Z shl.b16 %rs340, %rs339, 4; 2026-02-21T10:18:33.5873449Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5873515Z ld.shared.s8 %rs341, [%r28+3328]; 2026-02-21T10:18:33.5873718Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5873781Z shl.b16 %rs342, %rs341, 4; 2026-02-21T10:18:33.5873982Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5874049Z ld.shared.s8 %rs343, [%r29+3456]; 2026-02-21T10:18:33.5874255Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5874326Z shl.b16 %rs344, %rs343, 4; 2026-02-21T10:18:33.5874524Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5874592Z ld.shared.s8 %rs345, [%r30+3584]; 2026-02-21T10:18:33.5874798Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5874862Z shl.b16 %rs346, %rs345, 4; 2026-02-21T10:18:33.5875062Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5875138Z ld.shared.s8 %rs347, [%r31+3712]; 2026-02-21T10:18:33.5875336Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5875404Z shl.b16 %rs348, %rs347, 4; 2026-02-21T10:18:33.5875603Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5875684Z ld.shared.s8 %rs349, [%r32+3840]; 2026-02-21T10:18:33.5875896Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5876022Z shl.b16 %rs350, %rs349, 4; 2026-02-21T10:18:33.5876273Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 
2026-02-21T10:18:33.5876340Z ld.shared.s8 %rs351, [%r33+3968]; 2026-02-21T10:18:33.5876679Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5876756Z shl.b16 %rs352, %rs351, 4; 2026-02-21T10:18:33.5876967Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5877036Z cvt.s16.s8 %rs353, %rs290; 2026-02-21T10:18:33.5877105Z shr.s16 %rs354, %rs353, 4; 2026-02-21T10:18:33.5877168Z cvt.s16.s8 %rs355, %rs292; 2026-02-21T10:18:33.5877232Z shr.s16 %rs356, %rs355, 4; 2026-02-21T10:18:33.5877296Z shr.s16 %rs357, %rs289, 4; 2026-02-21T10:18:33.5877368Z shr.s16 %rs358, %rs291, 4; 2026-02-21T10:18:33.5877569Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5877651Z cvt.rn.f32.s16 %r9115, %rs358; 2026-02-21T10:18:33.5877802Z cvt.rn.f32.s16 %r9116, %rs357; 2026-02-21T10:18:33.5877872Z cvt.rn.f32.s16 %r9117, %rs356; 2026-02-21T10:18:33.5877940Z cvt.rn.f32.s16 %r9118, %rs354; 2026-02-21T10:18:33.5878155Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5878278Z cvt.s16.s8 %rs359, %rs294; 2026-02-21T10:18:33.5878345Z shr.s16 %rs360, %rs359, 4; 2026-02-21T10:18:33.5878410Z cvt.s16.s8 %rs361, %rs296; 2026-02-21T10:18:33.5878479Z shr.s16 %rs362, %rs361, 4; 2026-02-21T10:18:33.5878547Z shr.s16 %rs363, %rs293, 4; 2026-02-21T10:18:33.5878611Z shr.s16 %rs364, %rs295, 4; 2026-02-21T10:18:33.5878821Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5878892Z cvt.rn.f32.s16 %r9119, %rs364; 2026-02-21T10:18:33.5878959Z cvt.rn.f32.s16 %r9120, %rs363; 2026-02-21T10:18:33.5879024Z cvt.rn.f32.s16 %r9121, %rs362; 2026-02-21T10:18:33.5879100Z cvt.rn.f32.s16 %r9122, %rs360; 2026-02-21T10:18:33.5879301Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5879367Z cvt.s16.s8 %rs365, %rs298; 2026-02-21T10:18:33.5879447Z 
shr.s16 %rs366, %rs365, 4; 2026-02-21T10:18:33.5879512Z cvt.s16.s8 %rs367, %rs300; 2026-02-21T10:18:33.5879575Z shr.s16 %rs368, %rs367, 4; 2026-02-21T10:18:33.5879651Z shr.s16 %rs369, %rs297, 4; 2026-02-21T10:18:33.5879714Z shr.s16 %rs370, %rs299, 4; 2026-02-21T10:18:33.5879912Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5879980Z cvt.rn.f32.s16 %r9123, %rs370; 2026-02-21T10:18:33.5880051Z cvt.rn.f32.s16 %r9124, %rs369; 2026-02-21T10:18:33.5880117Z cvt.rn.f32.s16 %r9125, %rs368; 2026-02-21T10:18:33.5880183Z cvt.rn.f32.s16 %r9126, %rs366; 2026-02-21T10:18:33.5880389Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5880456Z cvt.s16.s8 %rs371, %rs302; 2026-02-21T10:18:33.5880521Z shr.s16 %rs372, %rs371, 4; 2026-02-21T10:18:33.5880586Z cvt.s16.s8 %rs373, %rs304; 2026-02-21T10:18:33.5880655Z shr.s16 %rs374, %rs373, 4; 2026-02-21T10:18:33.5880719Z shr.s16 %rs375, %rs301, 4; 2026-02-21T10:18:33.5880784Z shr.s16 %rs376, %rs303, 4; 2026-02-21T10:18:33.5880989Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5881056Z cvt.rn.f32.s16 %r9127, %rs376; 2026-02-21T10:18:33.5881123Z cvt.rn.f32.s16 %r9128, %rs375; 2026-02-21T10:18:33.5881195Z cvt.rn.f32.s16 %r9129, %rs374; 2026-02-21T10:18:33.5881260Z cvt.rn.f32.s16 %r9130, %rs372; 2026-02-21T10:18:33.5881459Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5881612Z cvt.s16.s8 %rs377, %rs306; 2026-02-21T10:18:33.5881691Z shr.s16 %rs378, %rs377, 4; 2026-02-21T10:18:33.5881816Z cvt.s16.s8 %rs379, %rs308; 2026-02-21T10:18:33.5881884Z shr.s16 %rs380, %rs379, 4; 2026-02-21T10:18:33.5881957Z shr.s16 %rs381, %rs305, 4; 2026-02-21T10:18:33.5882020Z shr.s16 %rs382, %rs307, 4; 2026-02-21T10:18:33.5882219Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5882298Z cvt.rn.f32.s16 %r9131, %rs382; 
2026-02-21T10:18:33.5882362Z cvt.rn.f32.s16 %r9132, %rs381; 2026-02-21T10:18:33.5882427Z cvt.rn.f32.s16 %r9133, %rs380; 2026-02-21T10:18:33.5882496Z cvt.rn.f32.s16 %r9134, %rs378; 2026-02-21T10:18:33.5882700Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5882765Z cvt.s16.s8 %rs383, %rs310; 2026-02-21T10:18:33.5882829Z shr.s16 %rs384, %rs383, 4; 2026-02-21T10:18:33.5882897Z cvt.s16.s8 %rs385, %rs312; 2026-02-21T10:18:33.5882963Z shr.s16 %rs386, %rs385, 4; 2026-02-21T10:18:33.5883028Z shr.s16 %rs387, %rs309, 4; 2026-02-21T10:18:33.5883093Z shr.s16 %rs388, %rs311, 4; 2026-02-21T10:18:33.5883351Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5883419Z cvt.rn.f32.s16 %r9135, %rs388; 2026-02-21T10:18:33.5883483Z cvt.rn.f32.s16 %r9136, %rs387; 2026-02-21T10:18:33.5883553Z cvt.rn.f32.s16 %r9137, %rs386; 2026-02-21T10:18:33.5883664Z cvt.rn.f32.s16 %r9138, %rs384; 2026-02-21T10:18:33.5883865Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5883933Z cvt.s16.s8 %rs389, %rs314; 2026-02-21T10:18:33.5883995Z shr.s16 %rs390, %rs389, 4; 2026-02-21T10:18:33.5884059Z cvt.s16.s8 %rs391, %rs316; 2026-02-21T10:18:33.5884120Z shr.s16 %rs392, %rs391, 4; 2026-02-21T10:18:33.5884189Z shr.s16 %rs393, %rs313, 4; 2026-02-21T10:18:33.5884266Z shr.s16 %rs394, %rs315, 4; 2026-02-21T10:18:33.5884472Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5884548Z cvt.rn.f32.s16 %r9139, %rs394; 2026-02-21T10:18:33.5884615Z cvt.rn.f32.s16 %r9140, %rs393; 2026-02-21T10:18:33.5884690Z cvt.rn.f32.s16 %r9141, %rs392; 2026-02-21T10:18:33.5884755Z cvt.rn.f32.s16 %r9142, %rs390; 2026-02-21T10:18:33.5884962Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5885027Z cvt.s16.s8 %rs395, %rs318; 2026-02-21T10:18:33.5885094Z shr.s16 %rs396, %rs395, 4; 
2026-02-21T10:18:33.5885164Z cvt.s16.s8 %rs397, %rs320; 2026-02-21T10:18:33.5885227Z shr.s16 %rs398, %rs397, 4; 2026-02-21T10:18:33.5885289Z shr.s16 %rs399, %rs317, 4; 2026-02-21T10:18:33.5885357Z shr.s16 %rs400, %rs319, 4; 2026-02-21T10:18:33.5885556Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5885621Z cvt.rn.f32.s16 %r9143, %rs400; 2026-02-21T10:18:33.5885690Z cvt.rn.f32.s16 %r9144, %rs399; 2026-02-21T10:18:33.5885765Z cvt.rn.f32.s16 %r9145, %rs398; 2026-02-21T10:18:33.5885832Z cvt.rn.f32.s16 %r9146, %rs396; 2026-02-21T10:18:33.5886030Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5886103Z cvt.s16.s8 %rs401, %rs322; 2026-02-21T10:18:33.5886165Z shr.s16 %rs402, %rs401, 4; 2026-02-21T10:18:33.5886230Z cvt.s16.s8 %rs403, %rs324; 2026-02-21T10:18:33.5886301Z shr.s16 %rs404, %rs403, 4; 2026-02-21T10:18:33.5886365Z shr.s16 %rs405, %rs321, 4; 2026-02-21T10:18:33.5886429Z shr.s16 %rs406, %rs323, 4; 2026-02-21T10:18:33.5886755Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5886836Z cvt.rn.f32.s16 %r9147, %rs406; 2026-02-21T10:18:33.5886902Z cvt.rn.f32.s16 %r9148, %rs405; 2026-02-21T10:18:33.5886970Z cvt.rn.f32.s16 %r9149, %rs404; 2026-02-21T10:18:33.5887151Z cvt.rn.f32.s16 %r9150, %rs402; 2026-02-21T10:18:33.5887353Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5887479Z cvt.s16.s8 %rs407, %rs326; 2026-02-21T10:18:33.5887542Z shr.s16 %rs408, %rs407, 4; 2026-02-21T10:18:33.5887617Z cvt.s16.s8 %rs409, %rs328; 2026-02-21T10:18:33.5887680Z shr.s16 %rs410, %rs409, 4; 2026-02-21T10:18:33.5887743Z shr.s16 %rs411, %rs325, 4; 2026-02-21T10:18:33.5887817Z shr.s16 %rs412, %rs327, 4; 2026-02-21T10:18:33.5888018Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5888085Z cvt.rn.f32.s16 %r9151, %rs412; 2026-02-21T10:18:33.5888159Z 
cvt.rn.f32.s16 %r9152, %rs411; 2026-02-21T10:18:33.5888232Z cvt.rn.f32.s16 %r9153, %rs410; 2026-02-21T10:18:33.5888300Z cvt.rn.f32.s16 %r9154, %rs408; 2026-02-21T10:18:33.5888499Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5888573Z cvt.s16.s8 %rs413, %rs330; 2026-02-21T10:18:33.5888642Z shr.s16 %rs414, %rs413, 4; 2026-02-21T10:18:33.5888706Z cvt.s16.s8 %rs415, %rs332; 2026-02-21T10:18:33.5888851Z shr.s16 %rs416, %rs415, 4; 2026-02-21T10:18:33.5888920Z shr.s16 %rs417, %rs329, 4; 2026-02-21T10:18:33.5888983Z shr.s16 %rs418, %rs331, 4; 2026-02-21T10:18:33.5889188Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5889320Z cvt.rn.f32.s16 %r9155, %rs418; 2026-02-21T10:18:33.5889388Z cvt.rn.f32.s16 %r9156, %rs417; 2026-02-21T10:18:33.5889454Z cvt.rn.f32.s16 %r9157, %rs416; 2026-02-21T10:18:33.5889532Z cvt.rn.f32.s16 %r9158, %rs414; 2026-02-21T10:18:33.5889732Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5889796Z cvt.s16.s8 %rs419, %rs334; 2026-02-21T10:18:33.5889868Z shr.s16 %rs420, %rs419, 4; 2026-02-21T10:18:33.5889933Z cvt.s16.s8 %rs421, %rs336; 2026-02-21T10:18:33.5889998Z shr.s16 %rs422, %rs421, 4; 2026-02-21T10:18:33.5890066Z shr.s16 %rs423, %rs333, 4; 2026-02-21T10:18:33.5890142Z shr.s16 %rs424, %rs335, 4; 2026-02-21T10:18:33.5890343Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5890410Z cvt.rn.f32.s16 %r9159, %rs424; 2026-02-21T10:18:33.5890481Z cvt.rn.f32.s16 %r9160, %rs423; 2026-02-21T10:18:33.5890549Z cvt.rn.f32.s16 %r9161, %rs422; 2026-02-21T10:18:33.5890628Z cvt.rn.f32.s16 %r9162, %rs420; 2026-02-21T10:18:33.5890842Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5890910Z cvt.s16.s8 %rs425, %rs338; 2026-02-21T10:18:33.5890971Z shr.s16 %rs426, %rs425, 4; 2026-02-21T10:18:33.5891033Z cvt.s16.s8 %rs427, 
%rs340; 2026-02-21T10:18:33.5891105Z shr.s16 %rs428, %rs427, 4; 2026-02-21T10:18:33.5891166Z shr.s16 %rs429, %rs337, 4; 2026-02-21T10:18:33.5891232Z shr.s16 %rs430, %rs339, 4; 2026-02-21T10:18:33.5891439Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5891508Z cvt.rn.f32.s16 %r9163, %rs430; 2026-02-21T10:18:33.5891572Z cvt.rn.f32.s16 %r9164, %rs429; 2026-02-21T10:18:33.5891637Z cvt.rn.f32.s16 %r9165, %rs428; 2026-02-21T10:18:33.5891708Z cvt.rn.f32.s16 %r9166, %rs426; 2026-02-21T10:18:33.5891908Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5891973Z cvt.s16.s8 %rs431, %rs342; 2026-02-21T10:18:33.5892041Z shr.s16 %rs432, %rs431, 4; 2026-02-21T10:18:33.5892107Z cvt.s16.s8 %rs433, %rs344; 2026-02-21T10:18:33.5892168Z shr.s16 %rs434, %rs433, 4; 2026-02-21T10:18:33.5892237Z shr.s16 %rs435, %rs341, 4; 2026-02-21T10:18:33.5892300Z shr.s16 %rs436, %rs343, 4; 2026-02-21T10:18:33.5892502Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5892632Z cvt.rn.f32.s16 %r9167, %rs436; 2026-02-21T10:18:33.5892744Z cvt.rn.f32.s16 %r9168, %rs435; 2026-02-21T10:18:33.5892809Z cvt.rn.f32.s16 %r9169, %rs434; 2026-02-21T10:18:33.5892874Z cvt.rn.f32.s16 %r9170, %rs432; 2026-02-21T10:18:33.5893081Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5893146Z cvt.s16.s8 %rs437, %rs346; 2026-02-21T10:18:33.5893208Z shr.s16 %rs438, %rs437, 4; 2026-02-21T10:18:33.5893286Z cvt.s16.s8 %rs439, %rs348; 2026-02-21T10:18:33.5893351Z shr.s16 %rs440, %rs439, 4; 2026-02-21T10:18:33.5893415Z shr.s16 %rs441, %rs345, 4; 2026-02-21T10:18:33.5893479Z shr.s16 %rs442, %rs347, 4; 2026-02-21T10:18:33.5893684Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5893751Z cvt.rn.f32.s16 %r9171, %rs442; 2026-02-21T10:18:33.5893815Z cvt.rn.f32.s16 %r9172, %rs441; 
2026-02-21T10:18:33.5893888Z cvt.rn.f32.s16 %r9173, %rs440; 2026-02-21T10:18:33.5893954Z cvt.rn.f32.s16 %r9174, %rs438; 2026-02-21T10:18:33.5894201Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5894266Z cvt.s16.s8 %rs443, %rs350; 2026-02-21T10:18:33.5894343Z shr.s16 %rs444, %rs443, 4; 2026-02-21T10:18:33.5894407Z cvt.s16.s8 %rs445, %rs352; 2026-02-21T10:18:33.5894468Z shr.s16 %rs446, %rs445, 4; 2026-02-21T10:18:33.5894579Z shr.s16 %rs447, %rs349, 4; 2026-02-21T10:18:33.5894643Z shr.s16 %rs448, %rs351, 4; 2026-02-21T10:18:33.5894841Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5894913Z cvt.rn.f32.s16 %r9175, %rs448; 2026-02-21T10:18:33.5894991Z cvt.rn.f32.s16 %r9176, %rs447; 2026-02-21T10:18:33.5895058Z cvt.rn.f32.s16 %r9177, %rs446; 2026-02-21T10:18:33.5907562Z cvt.rn.f32.s16 %r9178, %rs444; 2026-02-21T10:18:33.5907671Z bar.sync 0; 2026-02-21T10:18:33.5907819Z st.shared.v4.b32 [%r34], {%r9118, %r9116, %r9117, %r9115}; 2026-02-21T10:18:33.5907972Z st.shared.v4.b32 [%r34+16384], {%r9150, %r9148, %r9149, %r9147}; 2026-02-21T10:18:33.5908103Z st.shared.v4.b32 [%r35], {%r9122, %r9120, %r9121, %r9119}; 2026-02-21T10:18:33.5908236Z st.shared.v4.b32 [%r35+16384], {%r9154, %r9152, %r9153, %r9151}; 2026-02-21T10:18:33.5908354Z st.shared.v4.b32 [%r36], {%r9126, %r9124, %r9125, %r9123}; 2026-02-21T10:18:33.5908554Z st.shared.v4.b32 [%r36+16384], {%r9158, %r9156, %r9157, %r9155}; 2026-02-21T10:18:33.5908663Z st.shared.v4.b32 [%r37], {%r9130, %r9128, %r9129, %r9127}; 2026-02-21T10:18:33.5908785Z st.shared.v4.b32 [%r37+16384], {%r9162, %r9160, %r9161, %r9159}; 2026-02-21T10:18:33.5908886Z st.shared.v4.b32 [%r38], {%r9134, %r9132, %r9133, %r9131}; 2026-02-21T10:18:33.5908995Z st.shared.v4.b32 [%r38+16384], {%r9166, %r9164, %r9165, %r9163}; 2026-02-21T10:18:33.5909095Z st.shared.v4.b32 [%r39], {%r9138, %r9136, %r9137, %r9135}; 2026-02-21T10:18:33.5909212Z st.shared.v4.b32 
[%r39+16384], {%r9170, %r9168, %r9169, %r9167}; 2026-02-21T10:18:33.5909315Z st.shared.v4.b32 [%r40], {%r9142, %r9140, %r9141, %r9139}; 2026-02-21T10:18:33.5909428Z st.shared.v4.b32 [%r40+16384], {%r9174, %r9172, %r9173, %r9171}; 2026-02-21T10:18:33.5909537Z st.shared.v4.b32 [%r41], {%r9146, %r9144, %r9145, %r9143}; 2026-02-21T10:18:33.5909649Z st.shared.v4.b32 [%r41+16384], {%r9178, %r9176, %r9177, %r9175}; 2026-02-21T10:18:33.5909710Z $L__tmp3: 2026-02-21T10:18:33.5910016Z .loc 2 291 36 // standard.py:291:36 @[ c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:94:40 ] 2026-02-21T10:18:33.5910083Z // begin inline asm 2026-02-21T10:18:33.5910173Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.5910243Z // end inline asm 2026-02-21T10:18:33.5910303Z bar.sync 0; 2026-02-21T10:18:33.5910382Z wgmma.fence.sync.aligned; 2026-02-21T10:18:33.5910447Z // begin inline asm 2026-02-21T10:18:33.5911922Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r4385,%r4386,%r4387,%r4388}, %rd3, %p42, 1, 1; 2026-02-21T10:18:33.5912228Z // end inline asm 2026-02-21T10:18:33.5912301Z // begin inline asm 2026-02-21T10:18:33.5913838Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r4517,%r4518,%r4519,%r4520}, %rd4, %p42, 1, 1; 2026-02-21T10:18:33.5913906Z // end inline asm 2026-02-21T10:18:33.5913975Z // begin inline asm 2026-02-21T10:18:33.5915499Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r4649,%r4650,%r4651,%r4652}, %rd5, %p42, 1, 1; 2026-02-21T10:18:33.5915571Z // end inline asm 2026-02-21T10:18:33.5915632Z // begin inline asm 2026-02-21T10:18:33.5917224Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, 
{%r4781,%r4782,%r4783,%r4784}, %rd6, %p42, 1, 1; 2026-02-21T10:18:33.5917296Z // end inline asm 2026-02-21T10:18:33.5917357Z // begin inline asm 2026-02-21T10:18:33.5918811Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r4913,%r4914,%r4915,%r4916}, %rd7, %p42, 1, 1; 2026-02-21T10:18:33.5918885Z // end inline asm 2026-02-21T10:18:33.5918945Z // begin inline asm 2026-02-21T10:18:33.5920407Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r5045,%r5046,%r5047,%r5048}, %rd8, %p42, 1, 1; 2026-02-21T10:18:33.5920613Z // end inline asm 2026-02-21T10:18:33.5920680Z // begin inline asm 2026-02-21T10:18:33.5922147Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r5177,%r5178,%r5179,%r5180}, %rd9, %p42, 1, 1; 2026-02-21T10:18:33.5922210Z // end inline asm 2026-02-21T10:18:33.5922273Z // begin inline asm 2026-02-21T10:18:33.5923878Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r5309,%r5310,%r5311,%r5312}, %rd10, %p42, 1, 1; 2026-02-21T10:18:33.5923946Z // end inline asm 2026-02-21T10:18:33.5924016Z // begin inline asm 2026-02-21T10:18:33.5925493Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, 
{%r5441,%r5442,%r5443,%r5444}, %rd3, %p42, 1, 1; 2026-02-21T10:18:33.5925556Z // end inline asm 2026-02-21T10:18:33.5925623Z // begin inline asm 2026-02-21T10:18:33.5927198Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r5573,%r5574,%r5575,%r5576}, %rd4, %p42, 1, 1; 2026-02-21T10:18:33.5927269Z // end inline asm 2026-02-21T10:18:33.5927333Z // begin inline asm 2026-02-21T10:18:33.5928780Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r5705,%r5706,%r5707,%r5708}, %rd5, %p42, 1, 1; 2026-02-21T10:18:33.5928849Z // end inline asm 2026-02-21T10:18:33.5928909Z // begin inline asm 2026-02-21T10:18:33.5930352Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r5837,%r5838,%r5839,%r5840}, %rd6, %p42, 1, 1; 2026-02-21T10:18:33.5930559Z // end inline asm 2026-02-21T10:18:33.5930620Z // begin inline asm 2026-02-21T10:18:33.5932127Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r5969,%r5970,%r5971,%r5972}, %rd7, %p42, 1, 1; 2026-02-21T10:18:33.5932190Z // end inline asm 2026-02-21T10:18:33.5932250Z // begin inline asm 2026-02-21T10:18:33.5933771Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, 
{%r6101,%r6102,%r6103,%r6104}, %rd8, %p42, 1, 1; 2026-02-21T10:18:33.5933836Z // end inline asm 2026-02-21T10:18:33.5933897Z // begin inline asm 2026-02-21T10:18:33.5935349Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r6233,%r6234,%r6235,%r6236}, %rd9, %p42, 1, 1; 2026-02-21T10:18:33.5935407Z // end inline asm 2026-02-21T10:18:33.5935476Z // begin inline asm 2026-02-21T10:18:33.5937035Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r6365,%r6366,%r6367,%r6368}, %rd10, %p42, 1, 1; 2026-02-21T10:18:33.5937102Z // end inline asm 2026-02-21T10:18:33.5937192Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:33.5937255Z mov.b32 %r6498, %r8914; 2026-02-21T10:18:33.5937316Z mov.b32 %r6499, %r8914; 2026-02-21T10:18:33.5937386Z mov.b32 %r6497, %r29377; 2026-02-21T10:18:33.5937446Z // begin inline asm 2026-02-21T10:18:33.5939907Z // wait for regs: 
%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970,%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034,%r6497,%r6498,%r6499 2026-02-21T10:18:33.5940133Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:33.5940193Z // end inline asm 2026-02-21T10:18:33.5940257Z $L__tmp4: 2026-02-21T10:18:33.5940484Z .loc 1 58 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:32 2026-02-21T10:18:33.5940556Z add.s64 %rd216, %rd134, 256; 2026-02-21T10:18:33.5940675Z add.s64 %rd219, %rd137, 256; 2026-02-21T10:18:33.5940748Z add.s64 %rd222, %rd140, 256; 2026-02-21T10:18:33.5940811Z add.s64 %rd225, %rd143, 256; 2026-02-21T10:18:33.5940873Z add.s64 %rd228, %rd146, 256; 2026-02-21T10:18:33.5940940Z add.s64 %rd231, %rd149, 256; 2026-02-21T10:18:33.5941073Z add.s64 %rd234, %rd152, 256; 2026-02-21T10:18:33.5941156Z mad.wide.s32 %rd237, %r31906, 2, %rd85; 2026-02-21T10:18:33.5941371Z .loc 1 58 80 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:80 2026-02-21T10:18:33.5941436Z // begin inline asm 2026-02-21T10:18:33.5941498Z mov.u64 %rd215, 0x0; 
2026-02-21T10:18:33.5941631Z createpolicy.fractional.L2::evict_first.b64 %rd215, 1.0; 2026-02-21T10:18:33.5941696Z // end inline asm 2026-02-21T10:18:33.5941757Z // begin inline asm 2026-02-21T10:18:33.5941820Z mov.u32 %r6631, 0x0; 2026-02-21T10:18:33.5941888Z mov.u32 %r6632, 0x0; 2026-02-21T10:18:33.5941950Z mov.u32 %r6633, 0x0; 2026-02-21T10:18:33.5942011Z mov.u32 %r6634, 0x0; 2026-02-21T10:18:33.5942246Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r6631, %r6632, %r6633, %r6634 }, [ %rd216 + 0 ], %rd215; 2026-02-21T10:18:33.5942317Z // end inline asm 2026-02-21T10:18:33.5942378Z // begin inline asm 2026-02-21T10:18:33.5942437Z mov.u64 %rd218, 0x0; 2026-02-21T10:18:33.5942574Z createpolicy.fractional.L2::evict_first.b64 %rd218, 1.0; 2026-02-21T10:18:33.5942635Z // end inline asm 2026-02-21T10:18:33.5942696Z // begin inline asm 2026-02-21T10:18:33.5942764Z mov.u32 %r6635, 0x0; 2026-02-21T10:18:33.5942825Z mov.u32 %r6636, 0x0; 2026-02-21T10:18:33.5942883Z mov.u32 %r6637, 0x0; 2026-02-21T10:18:33.5942945Z mov.u32 %r6638, 0x0; 2026-02-21T10:18:33.5943170Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r6635, %r6636, %r6637, %r6638 }, [ %rd219 + 0 ], %rd218; 2026-02-21T10:18:33.5943231Z // end inline asm 2026-02-21T10:18:33.5943293Z // begin inline asm 2026-02-21T10:18:33.5943365Z mov.u64 %rd221, 0x0; 2026-02-21T10:18:33.5943496Z createpolicy.fractional.L2::evict_first.b64 %rd221, 1.0; 2026-02-21T10:18:33.5943555Z // end inline asm 2026-02-21T10:18:33.5943616Z // begin inline asm 2026-02-21T10:18:33.5943682Z mov.u32 %r6639, 0x0; 2026-02-21T10:18:33.5943740Z mov.u32 %r6640, 0x0; 2026-02-21T10:18:33.5943796Z mov.u32 %r6641, 0x0; 2026-02-21T10:18:33.5943862Z mov.u32 %r6642, 0x0; 2026-02-21T10:18:33.5944077Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r6639, %r6640, %r6641, %r6642 }, [ %rd222 + 0 ], %rd221; 2026-02-21T10:18:33.5944136Z // end inline asm 2026-02-21T10:18:33.5944201Z // begin inline asm 2026-02-21T10:18:33.5944261Z mov.u64 %rd224, 0x0; 
2026-02-21T10:18:33.5944379Z createpolicy.fractional.L2::evict_first.b64 %rd224, 1.0; 2026-02-21T10:18:33.5944438Z // end inline asm 2026-02-21T10:18:33.5944503Z // begin inline asm 2026-02-21T10:18:33.5944624Z mov.u32 %r6643, 0x0; 2026-02-21T10:18:33.5944681Z mov.u32 %r6644, 0x0; 2026-02-21T10:18:33.5944787Z mov.u32 %r6645, 0x0; 2026-02-21T10:18:33.5944844Z mov.u32 %r6646, 0x0; 2026-02-21T10:18:33.5945061Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r6643, %r6644, %r6645, %r6646 }, [ %rd225 + 0 ], %rd224; 2026-02-21T10:18:33.5945125Z // end inline asm 2026-02-21T10:18:33.5945184Z // begin inline asm 2026-02-21T10:18:33.5945244Z mov.u64 %rd227, 0x0; 2026-02-21T10:18:33.5945364Z createpolicy.fractional.L2::evict_first.b64 %rd227, 1.0; 2026-02-21T10:18:33.5945427Z // end inline asm 2026-02-21T10:18:33.5945487Z // begin inline asm 2026-02-21T10:18:33.5945543Z mov.u32 %r6647, 0x0; 2026-02-21T10:18:33.5945609Z mov.u32 %r6648, 0x0; 2026-02-21T10:18:33.5945667Z mov.u32 %r6649, 0x0; 2026-02-21T10:18:33.5945727Z mov.u32 %r6650, 0x0; 2026-02-21T10:18:33.5945941Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r6647, %r6648, %r6649, %r6650 }, [ %rd228 + 0 ], %rd227; 2026-02-21T10:18:33.5946010Z // end inline asm 2026-02-21T10:18:33.5946070Z // begin inline asm 2026-02-21T10:18:33.5946131Z mov.u64 %rd230, 0x0; 2026-02-21T10:18:33.5946301Z createpolicy.fractional.L2::evict_first.b64 %rd230, 1.0; 2026-02-21T10:18:33.5946362Z // end inline asm 2026-02-21T10:18:33.5946422Z // begin inline asm 2026-02-21T10:18:33.5946608Z mov.u32 %r6651, 0x0; 2026-02-21T10:18:33.5946671Z mov.u32 %r6652, 0x0; 2026-02-21T10:18:33.5946742Z mov.u32 %r6653, 0x0; 2026-02-21T10:18:33.5946871Z mov.u32 %r6654, 0x0; 2026-02-21T10:18:33.5947110Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r6651, %r6652, %r6653, %r6654 }, [ %rd231 + 0 ], %rd230; 2026-02-21T10:18:33.5947171Z // end inline asm 2026-02-21T10:18:33.5947231Z // begin inline asm 2026-02-21T10:18:33.5947295Z mov.u64 %rd233, 0x0; 
2026-02-21T10:18:33.5947413Z createpolicy.fractional.L2::evict_first.b64 %rd233, 1.0; 2026-02-21T10:18:33.5947471Z // end inline asm 2026-02-21T10:18:33.5947530Z // begin inline asm 2026-02-21T10:18:33.5947593Z mov.u32 %r6655, 0x0; 2026-02-21T10:18:33.5947654Z mov.u32 %r6656, 0x0; 2026-02-21T10:18:33.5947713Z mov.u32 %r6657, 0x0; 2026-02-21T10:18:33.5947776Z mov.u32 %r6658, 0x0; 2026-02-21T10:18:33.5947989Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r6655, %r6656, %r6657, %r6658 }, [ %rd234 + 0 ], %rd233; 2026-02-21T10:18:33.5948045Z // end inline asm 2026-02-21T10:18:33.5948110Z // begin inline asm 2026-02-21T10:18:33.5948168Z mov.u64 %rd236, 0x0; 2026-02-21T10:18:33.5948287Z createpolicy.fractional.L2::evict_first.b64 %rd236, 1.0; 2026-02-21T10:18:33.5948345Z // end inline asm 2026-02-21T10:18:33.5948474Z // begin inline asm 2026-02-21T10:18:33.5948545Z mov.u32 %r6659, 0x0; 2026-02-21T10:18:33.5948605Z mov.u32 %r6660, 0x0; 2026-02-21T10:18:33.5948675Z mov.u32 %r6661, 0x0; 2026-02-21T10:18:33.5948734Z mov.u32 %r6662, 0x0; 2026-02-21T10:18:33.5948948Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r6659, %r6660, %r6661, %r6662 }, [ %rd237 + 0 ], %rd236; 2026-02-21T10:18:33.5949011Z // end inline asm 2026-02-21T10:18:33.5949221Z .loc 1 62 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:62:32 2026-02-21T10:18:33.5949281Z bar.sync 0; 2026-02-21T10:18:33.5949365Z st.shared.v2.b32 [%r16], {%r6631, %r6632}; 2026-02-21T10:18:33.5949459Z st.shared.v2.b32 [%r16+2048], {%r6635, %r6636}; 2026-02-21T10:18:33.5949542Z st.shared.v2.b32 [%r16+4096], {%r6639, %r6640}; 2026-02-21T10:18:33.5949627Z st.shared.v2.b32 [%r16+6144], {%r6643, %r6644}; 2026-02-21T10:18:33.5949715Z st.shared.v2.b32 [%r16+8192], {%r6647, %r6648}; 2026-02-21T10:18:33.5949805Z st.shared.v2.b32 [%r16+10240], {%r6651, %r6652}; 2026-02-21T10:18:33.5949890Z st.shared.v2.b32 [%r16+12288], {%r6655, %r6656}; 2026-02-21T10:18:33.5949979Z st.shared.v2.b32 [%r16+14336], {%r6659, %r6660}; 
2026-02-21T10:18:33.5950056Z st.shared.v2.b32 [%r17], {%r6633, %r6634}; 2026-02-21T10:18:33.5950139Z st.shared.v2.b32 [%r17+2048], {%r6637, %r6638}; 2026-02-21T10:18:33.5950222Z st.shared.v2.b32 [%r17+4096], {%r6641, %r6642}; 2026-02-21T10:18:33.5950399Z st.shared.v2.b32 [%r17+6144], {%r6645, %r6646}; 2026-02-21T10:18:33.5950535Z st.shared.v2.b32 [%r17+8192], {%r6649, %r6650}; 2026-02-21T10:18:33.5950623Z st.shared.v2.b32 [%r17+10240], {%r6653, %r6654}; 2026-02-21T10:18:33.5950718Z st.shared.v2.b32 [%r17+12288], {%r6657, %r6658}; 2026-02-21T10:18:33.5950801Z st.shared.v2.b32 [%r17+14336], {%r6661, %r6662}; 2026-02-21T10:18:33.5950859Z bar.sync 0; 2026-02-21T10:18:33.5950937Z ld.shared.b16 %rs449, [%r18]; 2026-02-21T10:18:33.5951009Z ld.shared.b16 %rs450, [%r18+1024]; 2026-02-21T10:18:33.5951080Z ld.shared.b16 %rs451, [%r18+64]; 2026-02-21T10:18:33.5951146Z ld.shared.b16 %rs452, [%r18+1088]; 2026-02-21T10:18:33.5951218Z ld.shared.b16 %rs453, [%r18+8192]; 2026-02-21T10:18:33.5951285Z ld.shared.b16 %rs454, [%r18+9216]; 2026-02-21T10:18:33.5951350Z ld.shared.b16 %rs455, [%r18+8256]; 2026-02-21T10:18:33.5951423Z ld.shared.b16 %rs456, [%r18+9280]; 2026-02-21T10:18:33.5951488Z ld.shared.b16 %rs457, [%r19]; 2026-02-21T10:18:33.5951555Z ld.shared.b16 %rs458, [%r19+1024]; 2026-02-21T10:18:33.5951625Z ld.shared.b16 %rs459, [%r19+64]; 2026-02-21T10:18:33.5951775Z ld.shared.b16 %rs460, [%r19+1088]; 2026-02-21T10:18:33.5951843Z ld.shared.b16 %rs461, [%r19+8192]; 2026-02-21T10:18:33.5951908Z ld.shared.b16 %rs462, [%r19+9216]; 2026-02-21T10:18:33.5951979Z ld.shared.b16 %rs463, [%r19+8256]; 2026-02-21T10:18:33.5952046Z ld.shared.b16 %rs464, [%r19+9280]; 2026-02-21T10:18:33.5952156Z ld.shared.b16 %rs465, [%r20]; 2026-02-21T10:18:33.5952224Z ld.shared.b16 %rs466, [%r20+1024]; 2026-02-21T10:18:33.5952294Z ld.shared.b16 %rs467, [%r20+64]; 2026-02-21T10:18:33.5952357Z ld.shared.b16 %rs468, [%r20+1088]; 2026-02-21T10:18:33.5952436Z ld.shared.b16 %rs469, [%r20+8192]; 
2026-02-21T10:18:33.5952509Z ld.shared.b16 %rs470, [%r20+9216]; 2026-02-21T10:18:33.5952572Z ld.shared.b16 %rs471, [%r20+8256]; 2026-02-21T10:18:33.5952633Z ld.shared.b16 %rs472, [%r20+9280]; 2026-02-21T10:18:33.5952709Z ld.shared.b16 %rs473, [%r21]; 2026-02-21T10:18:33.5952776Z ld.shared.b16 %rs474, [%r21+1024]; 2026-02-21T10:18:33.5952844Z ld.shared.b16 %rs475, [%r21+64]; 2026-02-21T10:18:33.5952910Z ld.shared.b16 %rs476, [%r21+1088]; 2026-02-21T10:18:33.5952982Z ld.shared.b16 %rs477, [%r21+8192]; 2026-02-21T10:18:33.5953044Z ld.shared.b16 %rs478, [%r21+9216]; 2026-02-21T10:18:33.5953107Z ld.shared.b16 %rs479, [%r21+8256]; 2026-02-21T10:18:33.5953178Z ld.shared.b16 %rs480, [%r21+9280]; 2026-02-21T10:18:33.5953248Z ld.shared.b16 %rs481, [%r22]; 2026-02-21T10:18:33.5953314Z ld.shared.b16 %rs482, [%r22+1024]; 2026-02-21T10:18:33.5953381Z ld.shared.b16 %rs483, [%r22+64]; 2026-02-21T10:18:33.5953451Z ld.shared.b16 %rs484, [%r22+1088]; 2026-02-21T10:18:33.5953515Z ld.shared.b16 %rs485, [%r22+8192]; 2026-02-21T10:18:33.5953579Z ld.shared.b16 %rs486, [%r22+9216]; 2026-02-21T10:18:33.5953649Z ld.shared.b16 %rs487, [%r22+8256]; 2026-02-21T10:18:33.5953716Z ld.shared.b16 %rs488, [%r22+9280]; 2026-02-21T10:18:33.5953783Z ld.shared.b16 %rs489, [%r23]; 2026-02-21T10:18:33.5953854Z ld.shared.b16 %rs490, [%r23+1024]; 2026-02-21T10:18:33.5953924Z ld.shared.b16 %rs491, [%r23+64]; 2026-02-21T10:18:33.5953993Z ld.shared.b16 %rs492, [%r23+1088]; 2026-02-21T10:18:33.5954056Z ld.shared.b16 %rs493, [%r23+8192]; 2026-02-21T10:18:33.5954125Z ld.shared.b16 %rs494, [%r23+9216]; 2026-02-21T10:18:33.5954191Z ld.shared.b16 %rs495, [%r23+8256]; 2026-02-21T10:18:33.5954257Z ld.shared.b16 %rs496, [%r23+9280]; 2026-02-21T10:18:33.5954328Z ld.shared.b16 %rs497, [%r24]; 2026-02-21T10:18:33.5954396Z ld.shared.b16 %rs498, [%r24+1024]; 2026-02-21T10:18:33.5954463Z ld.shared.b16 %rs499, [%r24+64]; 2026-02-21T10:18:33.5954528Z ld.shared.b16 %rs500, [%r24+1088]; 2026-02-21T10:18:33.5954596Z ld.shared.b16 
%rs501, [%r24+8192]; 2026-02-21T10:18:33.5954660Z ld.shared.b16 %rs502, [%r24+9216]; 2026-02-21T10:18:33.5954724Z ld.shared.b16 %rs503, [%r24+8256]; 2026-02-21T10:18:33.5954793Z ld.shared.b16 %rs504, [%r24+9280]; 2026-02-21T10:18:33.5954919Z ld.shared.b16 %rs505, [%r25]; 2026-02-21T10:18:33.5955027Z ld.shared.b16 %rs506, [%r25+1024]; 2026-02-21T10:18:33.5955092Z ld.shared.b16 %rs507, [%r25+64]; 2026-02-21T10:18:33.5955159Z ld.shared.b16 %rs508, [%r25+1088]; 2026-02-21T10:18:33.5955223Z ld.shared.b16 %rs509, [%r25+8192]; 2026-02-21T10:18:33.5955288Z ld.shared.b16 %rs510, [%r25+9216]; 2026-02-21T10:18:33.5955356Z ld.shared.b16 %rs511, [%r25+8256]; 2026-02-21T10:18:33.5955432Z ld.shared.b16 %rs512, [%r25+9280]; 2026-02-21T10:18:33.5955502Z cvt.f32.bf16 %r6800, %rs449; 2026-02-21T10:18:33.5955575Z cvt.f32.bf16 %r6801, %rs450; 2026-02-21T10:18:33.5955636Z cvt.f32.bf16 %r6802, %rs457; 2026-02-21T10:18:33.5955699Z cvt.f32.bf16 %r6803, %rs458; 2026-02-21T10:18:33.5955757Z cvt.f32.bf16 %r6932, %rs465; 2026-02-21T10:18:33.5955821Z cvt.f32.bf16 %r6933, %rs466; 2026-02-21T10:18:33.5955882Z cvt.f32.bf16 %r6934, %rs473; 2026-02-21T10:18:33.5955942Z cvt.f32.bf16 %r6935, %rs474; 2026-02-21T10:18:33.5956008Z cvt.f32.bf16 %r7064, %rs481; 2026-02-21T10:18:33.5956072Z cvt.f32.bf16 %r7065, %rs482; 2026-02-21T10:18:33.5956134Z cvt.f32.bf16 %r7066, %rs489; 2026-02-21T10:18:33.5956251Z cvt.f32.bf16 %r7067, %rs490; 2026-02-21T10:18:33.5956323Z cvt.f32.bf16 %r7196, %rs497; 2026-02-21T10:18:33.5956384Z cvt.f32.bf16 %r7197, %rs498; 2026-02-21T10:18:33.5956445Z cvt.f32.bf16 %r7198, %rs505; 2026-02-21T10:18:33.5956721Z cvt.f32.bf16 %r7199, %rs506; 2026-02-21T10:18:33.5956876Z cvt.f32.bf16 %r7328, %rs451; 2026-02-21T10:18:33.5956941Z cvt.f32.bf16 %r7329, %rs452; 2026-02-21T10:18:33.5957004Z cvt.f32.bf16 %r7330, %rs459; 2026-02-21T10:18:33.5957074Z cvt.f32.bf16 %r7331, %rs460; 2026-02-21T10:18:33.5957135Z cvt.f32.bf16 %r7460, %rs467; 2026-02-21T10:18:33.5957197Z cvt.f32.bf16 %r7461, %rs468; 
2026-02-21T10:18:33.5957270Z cvt.f32.bf16 %r7462, %rs475; 2026-02-21T10:18:33.5957339Z cvt.f32.bf16 %r7463, %rs476; 2026-02-21T10:18:33.5957401Z cvt.f32.bf16 %r7592, %rs483; 2026-02-21T10:18:33.5957465Z cvt.f32.bf16 %r7593, %rs484; 2026-02-21T10:18:33.5957533Z cvt.f32.bf16 %r7594, %rs491; 2026-02-21T10:18:33.5957595Z cvt.f32.bf16 %r7595, %rs492; 2026-02-21T10:18:33.5957658Z cvt.f32.bf16 %r7724, %rs499; 2026-02-21T10:18:33.5957727Z cvt.f32.bf16 %r7725, %rs500; 2026-02-21T10:18:33.5957789Z cvt.f32.bf16 %r7726, %rs507; 2026-02-21T10:18:33.5957852Z cvt.f32.bf16 %r7727, %rs508; 2026-02-21T10:18:33.5957934Z cvt.f32.bf16 %r7856, %rs453; 2026-02-21T10:18:33.5957999Z cvt.f32.bf16 %r7857, %rs454; 2026-02-21T10:18:33.5958062Z cvt.f32.bf16 %r7858, %rs461; 2026-02-21T10:18:33.5958129Z cvt.f32.bf16 %r7859, %rs462; 2026-02-21T10:18:33.5958190Z cvt.f32.bf16 %r7988, %rs469; 2026-02-21T10:18:33.5958250Z cvt.f32.bf16 %r7989, %rs470; 2026-02-21T10:18:33.5958312Z cvt.f32.bf16 %r7990, %rs477; 2026-02-21T10:18:33.5958378Z cvt.f32.bf16 %r7991, %rs478; 2026-02-21T10:18:33.5958438Z cvt.f32.bf16 %r8120, %rs485; 2026-02-21T10:18:33.5958503Z cvt.f32.bf16 %r8121, %rs486; 2026-02-21T10:18:33.5958570Z cvt.f32.bf16 %r8122, %rs493; 2026-02-21T10:18:33.5958632Z cvt.f32.bf16 %r8123, %rs494; 2026-02-21T10:18:33.5958694Z cvt.f32.bf16 %r8252, %rs501; 2026-02-21T10:18:33.5958756Z cvt.f32.bf16 %r8253, %rs502; 2026-02-21T10:18:33.5958827Z cvt.f32.bf16 %r8254, %rs509; 2026-02-21T10:18:33.5958889Z cvt.f32.bf16 %r8255, %rs510; 2026-02-21T10:18:33.5958950Z cvt.f32.bf16 %r8384, %rs455; 2026-02-21T10:18:33.5959014Z cvt.f32.bf16 %r8385, %rs456; 2026-02-21T10:18:33.5959078Z cvt.f32.bf16 %r8386, %rs463; 2026-02-21T10:18:33.5959140Z cvt.f32.bf16 %r8387, %rs464; 2026-02-21T10:18:33.5959200Z cvt.f32.bf16 %r8516, %rs471; 2026-02-21T10:18:33.5959268Z cvt.f32.bf16 %r8517, %rs472; 2026-02-21T10:18:33.5959328Z cvt.f32.bf16 %r8518, %rs479; 2026-02-21T10:18:33.5959387Z cvt.f32.bf16 %r8519, %rs480; 
2026-02-21T10:18:33.5959451Z cvt.f32.bf16 %r8648, %rs487; 2026-02-21T10:18:33.5959510Z cvt.f32.bf16 %r8649, %rs488; 2026-02-21T10:18:33.5959570Z cvt.f32.bf16 %r8650, %rs495; 2026-02-21T10:18:33.5959724Z cvt.f32.bf16 %r8651, %rs496; 2026-02-21T10:18:33.5959786Z cvt.f32.bf16 %r8780, %rs503; 2026-02-21T10:18:33.5959904Z cvt.f32.bf16 %r8781, %rs504; 2026-02-21T10:18:33.5959966Z cvt.f32.bf16 %r8782, %rs511; 2026-02-21T10:18:33.5960030Z cvt.f32.bf16 %r8783, %rs512; 2026-02-21T10:18:33.5960247Z .loc 1 64 33 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:64:33 2026-02-21T10:18:33.5960305Z bar.sync 0; 2026-02-21T10:18:33.5960372Z // begin inline asm 2026-02-21T10:18:33.5960479Z @%p222 mbarrier.init.shared::cta.b64 [%r19296], 1; 2026-02-21T10:18:33.5960549Z // end inline asm 2026-02-21T10:18:33.5960606Z bar.sync 0; 2026-02-21T10:18:33.5960670Z // begin inline asm 2026-02-21T10:18:33.5960808Z @%p222 mbarrier.arrive.expect_tx.shared.b64 _, [%r19296], 4096; 2026-02-21T10:18:33.5960866Z // end inline asm 2026-02-21T10:18:33.5960931Z // begin inline asm 2026-02-21T10:18:33.5961010Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.5961066Z // end inline asm 2026-02-21T10:18:33.5961123Z bar.sync 0; 2026-02-21T10:18:33.5961198Z elect.sync %r9179|%p101, -1; 2026-02-21T10:18:33.5961272Z and.pred %p80, %p1, %p101; 2026-02-21T10:18:33.5961401Z add.s32 %r6667, %r9114, 160; 2026-02-21T10:18:33.5961469Z // begin inline asm 2026-02-21T10:18:33.5961803Z @%p80 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r29377], [%rd633, {%r9279, %r6667}], [%r19296]; 2026-02-21T10:18:33.5961863Z // end inline asm 2026-02-21T10:18:33.5961973Z bar.sync 0; 2026-02-21T10:18:33.5962038Z // begin inline asm 2026-02-21T10:18:33.5962095Z 2026-02-21T10:18:33.5962145Z { 2026-02-21T10:18:33.5962226Z .reg .pred complete; 2026-02-21T10:18:33.5962287Z waitLoop: 2026-02-21T10:18:33.5962432Z mbarrier.try_wait.parity.shared.b64 complete, [%r19296], %r8914; 
2026-02-21T10:18:33.5962507Z @!complete bra.uni waitLoop; 2026-02-21T10:18:33.5962558Z } 2026-02-21T10:18:33.5962564Z 2026-02-21T10:18:33.5962623Z // end inline asm 2026-02-21T10:18:33.5962681Z bar.sync 0; 2026-02-21T10:18:33.5962749Z // begin inline asm 2026-02-21T10:18:33.5962849Z @%p222 mbarrier.inval.shared::cta.b64 [%r19296]; 2026-02-21T10:18:33.5962906Z // end inline asm 2026-02-21T10:18:33.5963124Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5963193Z ld.shared.s8 %rs513, [%r26]; 2026-02-21T10:18:33.5963394Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5963464Z shl.b16 %rs514, %rs513, 4; 2026-02-21T10:18:33.5963660Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5963730Z ld.shared.s8 %rs515, [%r27+128]; 2026-02-21T10:18:33.5963928Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5963991Z shl.b16 %rs516, %rs515, 4; 2026-02-21T10:18:33.5964185Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5964254Z ld.shared.s8 %rs517, [%r28+256]; 2026-02-21T10:18:33.5964456Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5964521Z shl.b16 %rs518, %rs517, 4; 2026-02-21T10:18:33.5964730Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5964804Z ld.shared.s8 %rs519, [%r29+384]; 2026-02-21T10:18:33.5965003Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5965067Z shl.b16 %rs520, %rs519, 4; 2026-02-21T10:18:33.5965271Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5965337Z ld.shared.s8 %rs521, [%r30+512]; 2026-02-21T10:18:33.5965535Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 
2026-02-21T10:18:33.5965672Z shl.b16 %rs522, %rs521, 4; 2026-02-21T10:18:33.5965914Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5965980Z ld.shared.s8 %rs523, [%r31+640]; 2026-02-21T10:18:33.5966176Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5966240Z shl.b16 %rs524, %rs523, 4; 2026-02-21T10:18:33.5966435Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5966642Z ld.shared.s8 %rs525, [%r32+768]; 2026-02-21T10:18:33.5966845Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5966908Z shl.b16 %rs526, %rs525, 4; 2026-02-21T10:18:33.5967106Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5967181Z ld.shared.s8 %rs527, [%r33+896]; 2026-02-21T10:18:33.5967386Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5967459Z shl.b16 %rs528, %rs527, 4; 2026-02-21T10:18:33.5967740Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5967816Z ld.shared.s8 %rs529, [%r26+1024]; 2026-02-21T10:18:33.5968077Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5968143Z shl.b16 %rs530, %rs529, 4; 2026-02-21T10:18:33.5968340Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5968409Z ld.shared.s8 %rs531, [%r27+1152]; 2026-02-21T10:18:33.5968608Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5968673Z shl.b16 %rs532, %rs531, 4; 2026-02-21T10:18:33.5968882Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5968963Z ld.shared.s8 %rs533, [%r28+1280]; 2026-02-21T10:18:33.5969160Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 
2026-02-21T10:18:33.5969225Z shl.b16 %rs534, %rs533, 4; 2026-02-21T10:18:33.5969427Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5969495Z ld.shared.s8 %rs535, [%r29+1408]; 2026-02-21T10:18:33.5969690Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5969757Z shl.b16 %rs536, %rs535, 4; 2026-02-21T10:18:33.5969954Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5970018Z ld.shared.s8 %rs537, [%r30+1536]; 2026-02-21T10:18:33.5970362Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5970454Z shl.b16 %rs538, %rs537, 4; 2026-02-21T10:18:33.5970658Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5970722Z ld.shared.s8 %rs539, [%r31+1664]; 2026-02-21T10:18:33.5970924Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5970986Z shl.b16 %rs540, %rs539, 4; 2026-02-21T10:18:33.5971183Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5971255Z ld.shared.s8 %rs541, [%r32+1792]; 2026-02-21T10:18:33.5971451Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5971513Z shl.b16 %rs542, %rs541, 4; 2026-02-21T10:18:33.5971709Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5971871Z ld.shared.s8 %rs543, [%r33+1920]; 2026-02-21T10:18:33.5972073Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5972214Z shl.b16 %rs544, %rs543, 4; 2026-02-21T10:18:33.5972413Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5972488Z ld.shared.s8 %rs545, [%r26+2048]; 2026-02-21T10:18:33.5972694Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 
2026-02-21T10:18:33.5972761Z shl.b16 %rs546, %rs545, 4; 2026-02-21T10:18:33.5972956Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5973020Z ld.shared.s8 %rs547, [%r27+2176]; 2026-02-21T10:18:33.5973217Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5973278Z shl.b16 %rs548, %rs547, 4; 2026-02-21T10:18:33.5973473Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5973545Z ld.shared.s8 %rs549, [%r28+2304]; 2026-02-21T10:18:33.5973788Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5973850Z shl.b16 %rs550, %rs549, 4; 2026-02-21T10:18:33.5974044Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5974154Z ld.shared.s8 %rs551, [%r29+2432]; 2026-02-21T10:18:33.5974352Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5974413Z shl.b16 %rs552, %rs551, 4; 2026-02-21T10:18:33.5974611Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5974676Z ld.shared.s8 %rs553, [%r30+2560]; 2026-02-21T10:18:33.5974870Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5974950Z shl.b16 %rs554, %rs553, 4; 2026-02-21T10:18:33.5975152Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5975218Z ld.shared.s8 %rs555, [%r31+2688]; 2026-02-21T10:18:33.5975417Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5975480Z shl.b16 %rs556, %rs555, 4; 2026-02-21T10:18:33.5975676Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5975741Z ld.shared.s8 %rs557, [%r32+2816]; 2026-02-21T10:18:33.5975939Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 
2026-02-21T10:18:33.5976002Z shl.b16 %rs558, %rs557, 4; 2026-02-21T10:18:33.5976196Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5976269Z ld.shared.s8 %rs559, [%r33+2944]; 2026-02-21T10:18:33.5976630Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5976710Z shl.b16 %rs560, %rs559, 4; 2026-02-21T10:18:33.5976928Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5976995Z ld.shared.s8 %rs561, [%r26+3072]; 2026-02-21T10:18:33.5977193Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5977259Z shl.b16 %rs562, %rs561, 4; 2026-02-21T10:18:33.5977452Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5977517Z ld.shared.s8 %rs563, [%r27+3200]; 2026-02-21T10:18:33.5977721Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5977784Z shl.b16 %rs564, %rs563, 4; 2026-02-21T10:18:33.5978081Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5978204Z ld.shared.s8 %rs565, [%r28+3328]; 2026-02-21T10:18:33.5978405Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5978468Z shl.b16 %rs566, %rs565, 4; 2026-02-21T10:18:33.5978663Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5978734Z ld.shared.s8 %rs567, [%r29+3456]; 2026-02-21T10:18:33.5978928Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5978989Z shl.b16 %rs568, %rs567, 4; 2026-02-21T10:18:33.5979187Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5979253Z ld.shared.s8 %rs569, [%r30+3584]; 2026-02-21T10:18:33.5979447Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 
2026-02-21T10:18:33.5979516Z shl.b16 %rs570, %rs569, 4; 2026-02-21T10:18:33.5979779Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5979849Z ld.shared.s8 %rs571, [%r31+3712]; 2026-02-21T10:18:33.5980045Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5980172Z shl.b16 %rs572, %rs571, 4; 2026-02-21T10:18:33.5980371Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5980520Z ld.shared.s8 %rs573, [%r32+3840]; 2026-02-21T10:18:33.5980731Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5980796Z shl.b16 %rs574, %rs573, 4; 2026-02-21T10:18:33.5980994Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5981067Z ld.shared.s8 %rs575, [%r33+3968]; 2026-02-21T10:18:33.5981265Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.5981326Z shl.b16 %rs576, %rs575, 4; 2026-02-21T10:18:33.5981531Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5981594Z cvt.s16.s8 %rs577, %rs514; 2026-02-21T10:18:33.5981658Z shr.s16 %rs578, %rs577, 4; 2026-02-21T10:18:33.5981719Z cvt.s16.s8 %rs579, %rs516; 2026-02-21T10:18:33.5981785Z shr.s16 %rs580, %rs579, 4; 2026-02-21T10:18:33.5981845Z shr.s16 %rs581, %rs513, 4; 2026-02-21T10:18:33.5981906Z shr.s16 %rs582, %rs515, 4; 2026-02-21T10:18:33.5982108Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5982175Z cvt.rn.f32.s16 %r9180, %rs582; 2026-02-21T10:18:33.5982240Z cvt.rn.f32.s16 %r9181, %rs581; 2026-02-21T10:18:33.5982306Z cvt.rn.f32.s16 %r9182, %rs580; 2026-02-21T10:18:33.5982375Z cvt.rn.f32.s16 %r9183, %rs578; 2026-02-21T10:18:33.5982573Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5982635Z cvt.s16.s8 %rs583, 
%rs518; 2026-02-21T10:18:33.5982703Z shr.s16 %rs584, %rs583, 4; 2026-02-21T10:18:33.5982767Z cvt.s16.s8 %rs585, %rs520; 2026-02-21T10:18:33.5982828Z shr.s16 %rs586, %rs585, 4; 2026-02-21T10:18:33.5982895Z shr.s16 %rs587, %rs517, 4; 2026-02-21T10:18:33.5982958Z shr.s16 %rs588, %rs519, 4; 2026-02-21T10:18:33.5983154Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5983218Z cvt.rn.f32.s16 %r9184, %rs588; 2026-02-21T10:18:33.5983287Z cvt.rn.f32.s16 %r9185, %rs587; 2026-02-21T10:18:33.5983349Z cvt.rn.f32.s16 %r9186, %rs586; 2026-02-21T10:18:33.5983424Z cvt.rn.f32.s16 %r9187, %rs584; 2026-02-21T10:18:33.5983629Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5983790Z cvt.s16.s8 %rs589, %rs522; 2026-02-21T10:18:33.5983850Z shr.s16 %rs590, %rs589, 4; 2026-02-21T10:18:33.5983911Z cvt.s16.s8 %rs591, %rs524; 2026-02-21T10:18:33.5983979Z shr.s16 %rs592, %rs591, 4; 2026-02-21T10:18:33.5984038Z shr.s16 %rs593, %rs521, 4; 2026-02-21T10:18:33.5984099Z shr.s16 %rs594, %rs523, 4; 2026-02-21T10:18:33.5984301Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5984365Z cvt.rn.f32.s16 %r9188, %rs594; 2026-02-21T10:18:33.5984426Z cvt.rn.f32.s16 %r9189, %rs593; 2026-02-21T10:18:33.5984491Z cvt.rn.f32.s16 %r9190, %rs592; 2026-02-21T10:18:33.5984552Z cvt.rn.f32.s16 %r9191, %rs590; 2026-02-21T10:18:33.5984746Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5984807Z cvt.s16.s8 %rs595, %rs526; 2026-02-21T10:18:33.5984874Z shr.s16 %rs596, %rs595, 4; 2026-02-21T10:18:33.5984934Z cvt.s16.s8 %rs597, %rs528; 2026-02-21T10:18:33.5984995Z shr.s16 %rs598, %rs597, 4; 2026-02-21T10:18:33.5985058Z shr.s16 %rs599, %rs525, 4; 2026-02-21T10:18:33.5985165Z shr.s16 %rs600, %rs527, 4; 2026-02-21T10:18:33.5985374Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 
2026-02-21T10:18:33.5985443Z cvt.rn.f32.s16 %r9192, %rs600; 2026-02-21T10:18:33.5985549Z cvt.rn.f32.s16 %r9193, %rs599; 2026-02-21T10:18:33.5985613Z cvt.rn.f32.s16 %r9194, %rs598; 2026-02-21T10:18:33.5985674Z cvt.rn.f32.s16 %r9195, %rs596; 2026-02-21T10:18:33.5985878Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5985940Z cvt.s16.s8 %rs601, %rs530; 2026-02-21T10:18:33.5986000Z shr.s16 %rs602, %rs601, 4; 2026-02-21T10:18:33.5986068Z cvt.s16.s8 %rs603, %rs532; 2026-02-21T10:18:33.5986128Z shr.s16 %rs604, %rs603, 4; 2026-02-21T10:18:33.5986189Z shr.s16 %rs605, %rs529, 4; 2026-02-21T10:18:33.5986249Z shr.s16 %rs606, %rs531, 4; 2026-02-21T10:18:33.5986569Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5986636Z cvt.rn.f32.s16 %r9196, %rs606; 2026-02-21T10:18:33.5986700Z cvt.rn.f32.s16 %r9197, %rs605; 2026-02-21T10:18:33.5986775Z cvt.rn.f32.s16 %r9198, %rs604; 2026-02-21T10:18:33.5986845Z cvt.rn.f32.s16 %r9199, %rs602; 2026-02-21T10:18:33.5987044Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5987112Z cvt.s16.s8 %rs607, %rs534; 2026-02-21T10:18:33.5987173Z shr.s16 %rs608, %rs607, 4; 2026-02-21T10:18:33.5987232Z cvt.s16.s8 %rs609, %rs536; 2026-02-21T10:18:33.5987295Z shr.s16 %rs610, %rs609, 4; 2026-02-21T10:18:33.5987362Z shr.s16 %rs611, %rs533, 4; 2026-02-21T10:18:33.5987424Z shr.s16 %rs612, %rs535, 4; 2026-02-21T10:18:33.5987620Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5987693Z cvt.rn.f32.s16 %r9200, %rs612; 2026-02-21T10:18:33.5987757Z cvt.rn.f32.s16 %r9201, %rs611; 2026-02-21T10:18:33.5987821Z cvt.rn.f32.s16 %r9202, %rs610; 2026-02-21T10:18:33.5987882Z cvt.rn.f32.s16 %r9203, %rs608; 2026-02-21T10:18:33.5988085Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5988148Z cvt.s16.s8 %rs613, %rs538; 
2026-02-21T10:18:33.5988208Z shr.s16 %rs614, %rs613, 4; 2026-02-21T10:18:33.5988275Z cvt.s16.s8 %rs615, %rs540; 2026-02-21T10:18:33.5988334Z shr.s16 %rs616, %rs615, 4; 2026-02-21T10:18:33.5988395Z shr.s16 %rs617, %rs537, 4; 2026-02-21T10:18:33.5988525Z shr.s16 %rs618, %rs539, 4; 2026-02-21T10:18:33.5988725Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5988788Z cvt.rn.f32.s16 %r9204, %rs618; 2026-02-21T10:18:33.5988850Z cvt.rn.f32.s16 %r9205, %rs617; 2026-02-21T10:18:33.5989003Z cvt.rn.f32.s16 %r9206, %rs616; 2026-02-21T10:18:33.5989124Z cvt.rn.f32.s16 %r9207, %rs614; 2026-02-21T10:18:33.5989322Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5989391Z cvt.s16.s8 %rs619, %rs542; 2026-02-21T10:18:33.5989453Z shr.s16 %rs620, %rs619, 4; 2026-02-21T10:18:33.5989511Z cvt.s16.s8 %rs621, %rs544; 2026-02-21T10:18:33.5989580Z shr.s16 %rs622, %rs621, 4; 2026-02-21T10:18:33.5989641Z shr.s16 %rs623, %rs541, 4; 2026-02-21T10:18:33.5989701Z shr.s16 %rs624, %rs543, 4; 2026-02-21T10:18:33.5989905Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5989980Z cvt.rn.f32.s16 %r9208, %rs624; 2026-02-21T10:18:33.5990046Z cvt.rn.f32.s16 %r9209, %rs623; 2026-02-21T10:18:33.5990110Z cvt.rn.f32.s16 %r9210, %rs622; 2026-02-21T10:18:33.5990176Z cvt.rn.f32.s16 %r9211, %rs620; 2026-02-21T10:18:33.5990382Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5990463Z cvt.s16.s8 %rs625, %rs546; 2026-02-21T10:18:33.5990590Z shr.s16 %rs626, %rs625, 4; 2026-02-21T10:18:33.5990661Z cvt.s16.s8 %rs627, %rs548; 2026-02-21T10:18:33.5990722Z shr.s16 %rs628, %rs627, 4; 2026-02-21T10:18:33.5990783Z shr.s16 %rs629, %rs545, 4; 2026-02-21T10:18:33.5990850Z shr.s16 %rs630, %rs547, 4; 2026-02-21T10:18:33.5991107Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5991174Z 
cvt.rn.f32.s16 %r9212, %rs630; 2026-02-21T10:18:33.5991253Z cvt.rn.f32.s16 %r9213, %rs629; 2026-02-21T10:18:33.5991318Z cvt.rn.f32.s16 %r9214, %rs628; 2026-02-21T10:18:33.5991380Z cvt.rn.f32.s16 %r9215, %rs626; 2026-02-21T10:18:33.5991578Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5991647Z cvt.s16.s8 %rs631, %rs550; 2026-02-21T10:18:33.5991710Z shr.s16 %rs632, %rs631, 4; 2026-02-21T10:18:33.5991773Z cvt.s16.s8 %rs633, %rs552; 2026-02-21T10:18:33.5991837Z shr.s16 %rs634, %rs633, 4; 2026-02-21T10:18:33.5991897Z shr.s16 %rs635, %rs549, 4; 2026-02-21T10:18:33.5991959Z shr.s16 %rs636, %rs551, 4; 2026-02-21T10:18:33.5992159Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5992221Z cvt.rn.f32.s16 %r9216, %rs636; 2026-02-21T10:18:33.5992286Z cvt.rn.f32.s16 %r9217, %rs635; 2026-02-21T10:18:33.5992355Z cvt.rn.f32.s16 %r9218, %rs634; 2026-02-21T10:18:33.5992430Z cvt.rn.f32.s16 %r9219, %rs632; 2026-02-21T10:18:33.5992630Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5992693Z cvt.s16.s8 %rs637, %rs554; 2026-02-21T10:18:33.5992757Z shr.s16 %rs638, %rs637, 4; 2026-02-21T10:18:33.5992817Z cvt.s16.s8 %rs639, %rs556; 2026-02-21T10:18:33.5992877Z shr.s16 %rs640, %rs639, 4; 2026-02-21T10:18:33.5992940Z shr.s16 %rs641, %rs553, 4; 2026-02-21T10:18:33.5993008Z shr.s16 %rs642, %rs555, 4; 2026-02-21T10:18:33.5993206Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5993268Z cvt.rn.f32.s16 %r9220, %rs642; 2026-02-21T10:18:33.5993336Z cvt.rn.f32.s16 %r9221, %rs641; 2026-02-21T10:18:33.5993400Z cvt.rn.f32.s16 %r9222, %rs640; 2026-02-21T10:18:33.5993463Z cvt.rn.f32.s16 %r9223, %rs638; 2026-02-21T10:18:33.5993665Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5993726Z cvt.s16.s8 %rs643, %rs558; 2026-02-21T10:18:33.5993789Z shr.s16 %rs644, 
%rs643, 4; 2026-02-21T10:18:33.5993850Z cvt.s16.s8 %rs645, %rs560; 2026-02-21T10:18:33.5993915Z shr.s16 %rs646, %rs645, 4; 2026-02-21T10:18:33.5993976Z shr.s16 %rs647, %rs557, 4; 2026-02-21T10:18:33.5994037Z shr.s16 %rs648, %rs559, 4; 2026-02-21T10:18:33.5994316Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5994423Z cvt.rn.f32.s16 %r9224, %rs648; 2026-02-21T10:18:33.5994487Z cvt.rn.f32.s16 %r9225, %rs647; 2026-02-21T10:18:33.5994548Z cvt.rn.f32.s16 %r9226, %rs646; 2026-02-21T10:18:33.5994627Z cvt.rn.f32.s16 %r9227, %rs644; 2026-02-21T10:18:33.5994829Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5994891Z cvt.s16.s8 %rs649, %rs562; 2026-02-21T10:18:33.5994958Z shr.s16 %rs650, %rs649, 4; 2026-02-21T10:18:33.5995019Z cvt.s16.s8 %rs651, %rs564; 2026-02-21T10:18:33.5995079Z shr.s16 %rs652, %rs651, 4; 2026-02-21T10:18:33.5995143Z shr.s16 %rs653, %rs561, 4; 2026-02-21T10:18:33.5995203Z shr.s16 %rs654, %rs563, 4; 2026-02-21T10:18:33.5995400Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5995465Z cvt.rn.f32.s16 %r9228, %rs654; 2026-02-21T10:18:33.5995535Z cvt.rn.f32.s16 %r9229, %rs653; 2026-02-21T10:18:33.5995600Z cvt.rn.f32.s16 %r9230, %rs652; 2026-02-21T10:18:33.5995665Z cvt.rn.f32.s16 %r9231, %rs650; 2026-02-21T10:18:33.5995912Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5995976Z cvt.s16.s8 %rs655, %rs566; 2026-02-21T10:18:33.5996037Z shr.s16 %rs656, %rs655, 4; 2026-02-21T10:18:33.5996100Z cvt.s16.s8 %rs657, %rs568; 2026-02-21T10:18:33.5996203Z shr.s16 %rs658, %rs657, 4; 2026-02-21T10:18:33.5996265Z shr.s16 %rs659, %rs565, 4; 2026-02-21T10:18:33.5996324Z shr.s16 %rs660, %rs567, 4; 2026-02-21T10:18:33.5996644Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5996710Z cvt.rn.f32.s16 %r9232, %rs660; 
2026-02-21T10:18:33.5996777Z cvt.rn.f32.s16 %r9233, %rs659; 2026-02-21T10:18:33.5996844Z cvt.rn.f32.s16 %r9234, %rs658; 2026-02-21T10:18:33.5996908Z cvt.rn.f32.s16 %r9235, %rs656; 2026-02-21T10:18:33.5997108Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5997175Z cvt.s16.s8 %rs661, %rs570; 2026-02-21T10:18:33.5997241Z shr.s16 %rs662, %rs661, 4; 2026-02-21T10:18:33.5997302Z cvt.s16.s8 %rs663, %rs572; 2026-02-21T10:18:33.5997365Z shr.s16 %rs664, %rs663, 4; 2026-02-21T10:18:33.5997429Z shr.s16 %rs665, %rs569, 4; 2026-02-21T10:18:33.5997491Z shr.s16 %rs666, %rs571, 4; 2026-02-21T10:18:33.5997689Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5997756Z cvt.rn.f32.s16 %r9236, %rs666; 2026-02-21T10:18:33.5997819Z cvt.rn.f32.s16 %r9237, %rs665; 2026-02-21T10:18:33.5997880Z cvt.rn.f32.s16 %r9238, %rs664; 2026-02-21T10:18:33.5997944Z cvt.rn.f32.s16 %r9239, %rs662; 2026-02-21T10:18:33.5998144Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.5998209Z cvt.s16.s8 %rs667, %rs574; 2026-02-21T10:18:33.5998270Z shr.s16 %rs668, %rs667, 4; 2026-02-21T10:18:33.5998338Z cvt.s16.s8 %rs669, %rs576; 2026-02-21T10:18:33.5998400Z shr.s16 %rs670, %rs669, 4; 2026-02-21T10:18:33.5998473Z shr.s16 %rs671, %rs573, 4; 2026-02-21T10:18:33.5998537Z shr.s16 %rs672, %rs575, 4; 2026-02-21T10:18:33.5998743Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.5998809Z cvt.rn.f32.s16 %r9240, %rs672; 2026-02-21T10:18:33.5998872Z cvt.rn.f32.s16 %r9241, %rs671; 2026-02-21T10:18:33.5998942Z cvt.rn.f32.s16 %r9242, %rs670; 2026-02-21T10:18:33.5999004Z cvt.rn.f32.s16 %r9243, %rs668; 2026-02-21T10:18:33.5999061Z bar.sync 0; 2026-02-21T10:18:33.5999183Z st.shared.v4.b32 [%r34], {%r9183, %r9181, %r9182, %r9180}; 2026-02-21T10:18:33.5999308Z st.shared.v4.b32 [%r34+16384], {%r9215, %r9213, %r9214, %r9212}; 
2026-02-21T10:18:33.5999413Z st.shared.v4.b32 [%r35], {%r9187, %r9185, %r9186, %r9184}; 2026-02-21T10:18:33.5999608Z st.shared.v4.b32 [%r35+16384], {%r9219, %r9217, %r9218, %r9216}; 2026-02-21T10:18:33.5999772Z st.shared.v4.b32 [%r36], {%r9191, %r9189, %r9190, %r9188}; 2026-02-21T10:18:33.5999887Z st.shared.v4.b32 [%r36+16384], {%r9223, %r9221, %r9222, %r9220}; 2026-02-21T10:18:33.5999988Z st.shared.v4.b32 [%r37], {%r9195, %r9193, %r9194, %r9192}; 2026-02-21T10:18:33.6000104Z st.shared.v4.b32 [%r37+16384], {%r9227, %r9225, %r9226, %r9224}; 2026-02-21T10:18:33.6000209Z st.shared.v4.b32 [%r38], {%r9199, %r9197, %r9198, %r9196}; 2026-02-21T10:18:33.6000320Z st.shared.v4.b32 [%r38+16384], {%r9231, %r9229, %r9230, %r9228}; 2026-02-21T10:18:33.6000425Z st.shared.v4.b32 [%r39], {%r9203, %r9201, %r9202, %r9200}; 2026-02-21T10:18:33.6000534Z st.shared.v4.b32 [%r39+16384], {%r9235, %r9233, %r9234, %r9232}; 2026-02-21T10:18:33.6000636Z st.shared.v4.b32 [%r40], {%r9207, %r9205, %r9206, %r9204}; 2026-02-21T10:18:33.6000751Z st.shared.v4.b32 [%r40+16384], {%r9239, %r9237, %r9238, %r9236}; 2026-02-21T10:18:33.6000858Z st.shared.v4.b32 [%r41], {%r9211, %r9209, %r9210, %r9208}; 2026-02-21T10:18:33.6000972Z st.shared.v4.b32 [%r41+16384], {%r9243, %r9241, %r9242, %r9240}; 2026-02-21T10:18:33.6001090Z $L__tmp5: 2026-02-21T10:18:33.6001377Z .loc 2 291 36 // standard.py:291:36 @[ c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:94:40 ] 2026-02-21T10:18:33.6001440Z // begin inline asm 2026-02-21T10:18:33.6001592Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.6001659Z // end inline asm 2026-02-21T10:18:33.6001716Z bar.sync 0; 2026-02-21T10:18:33.6001789Z wgmma.fence.sync.aligned; 2026-02-21T10:18:33.6001850Z // begin inline asm 2026-02-21T10:18:33.6003350Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r6800,%r6801,%r6802,%r6803}, %rd3, %p42, 1, 1; 2026-02-21T10:18:33.6003412Z // end inline asm 2026-02-21T10:18:33.6003475Z // begin inline asm 2026-02-21T10:18:33.6004956Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r6932,%r6933,%r6934,%r6935}, %rd4, %p42, 1, 1; 2026-02-21T10:18:33.6005031Z // end inline asm 2026-02-21T10:18:33.6005094Z // begin inline asm 2026-02-21T10:18:33.6006683Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, 
{%r7064,%r7065,%r7066,%r7067}, %rd5, %p42, 1, 1; 2026-02-21T10:18:33.6006750Z // end inline asm 2026-02-21T10:18:33.6006810Z // begin inline asm 2026-02-21T10:18:33.6008282Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r7196,%r7197,%r7198,%r7199}, %rd6, %p42, 1, 1; 2026-02-21T10:18:33.6008485Z // end inline asm 2026-02-21T10:18:33.6008545Z // begin inline asm 2026-02-21T10:18:33.6010088Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r7328,%r7329,%r7330,%r7331}, %rd7, %p42, 1, 1; 2026-02-21T10:18:33.6010156Z // end inline asm 2026-02-21T10:18:33.6010216Z // begin inline asm 2026-02-21T10:18:33.6011758Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r7460,%r7461,%r7462,%r7463}, %rd8, %p42, 1, 1; 2026-02-21T10:18:33.6011822Z // end inline asm 2026-02-21T10:18:33.6011884Z // begin inline asm 2026-02-21T10:18:33.6013359Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r7592,%r7593,%r7594,%r7595}, %rd9, %p42, 1, 1; 2026-02-21T10:18:33.6013419Z // end inline asm 2026-02-21T10:18:33.6013482Z // begin inline asm 2026-02-21T10:18:33.6014960Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, 
{%r7724,%r7725,%r7726,%r7727}, %rd10, %p42, 1, 1; 2026-02-21T10:18:33.6015023Z // end inline asm 2026-02-21T10:18:33.6015089Z // begin inline asm 2026-02-21T10:18:33.6016677Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r7856,%r7857,%r7858,%r7859}, %rd3, %p42, 1, 1; 2026-02-21T10:18:33.6016883Z // end inline asm 2026-02-21T10:18:33.6016945Z // begin inline asm 2026-02-21T10:18:33.6018430Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r7988,%r7989,%r7990,%r7991}, %rd4, %p42, 1, 1; 2026-02-21T10:18:33.6018496Z // end inline asm 2026-02-21T10:18:33.6018555Z // begin inline asm 2026-02-21T10:18:33.6020144Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r8120,%r8121,%r8122,%r8123}, %rd5, %p42, 1, 1; 2026-02-21T10:18:33.6020212Z // end inline asm 2026-02-21T10:18:33.6020271Z // begin inline asm 2026-02-21T10:18:33.6021741Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r8252,%r8253,%r8254,%r8255}, %rd6, %p42, 1, 1; 2026-02-21T10:18:33.6021802Z // end inline asm 2026-02-21T10:18:33.6021861Z // begin inline asm 2026-02-21T10:18:33.6023337Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, 
{%r8384,%r8385,%r8386,%r8387}, %rd7, %p42, 1, 1; 2026-02-21T10:18:33.6023397Z // end inline asm 2026-02-21T10:18:33.6023458Z // begin inline asm 2026-02-21T10:18:33.6024936Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r8516,%r8517,%r8518,%r8519}, %rd8, %p42, 1, 1; 2026-02-21T10:18:33.6024994Z // end inline asm 2026-02-21T10:18:33.6025058Z // begin inline asm 2026-02-21T10:18:33.6026658Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r8648,%r8649,%r8650,%r8651}, %rd9, %p42, 1, 1; 2026-02-21T10:18:33.6026850Z // end inline asm 2026-02-21T10:18:33.6026918Z // begin inline asm 2026-02-21T10:18:33.6028539Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r8780,%r8781,%r8782,%r8783}, %rd10, %p42, 1, 1; 2026-02-21T10:18:33.6028615Z // end inline asm 2026-02-21T10:18:33.6028699Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:33.6028761Z mov.b32 %r8912, %r29377; 2026-02-21T10:18:33.6028824Z mov.b32 %r8913, %r8914; 2026-02-21T10:18:33.6028947Z // begin inline asm 2026-02-21T10:18:33.6031460Z // wait for regs: %r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970,%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034,%r8912,%r8913,%r8914 2026-02-21T10:18:33.6031546Z wgmma.wait_group.sync.aligned 0; 
2026-02-21T10:18:33.6031604Z // end inline asm 2026-02-21T10:18:33.6031660Z $L__tmp6: 2026-02-21T10:18:33.6031875Z .loc 1 51 93 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:51:93 2026-02-21T10:18:33.6031941Z add.s64 %rd654, %rd654, 384; 2026-02-21T10:18:33.6032008Z add.s32 %r31906, %r31906, 192; 2026-02-21T10:18:33.6032081Z setp.lt.u64 %p102, %rd31, 3936; 2026-02-21T10:18:33.6032144Z mov.b64 %rd655, %rd31; 2026-02-21T10:18:33.6032220Z @%p102 bra $L__BB0_3; 2026-02-21T10:18:33.6032337Z // %bb.4: // %.preheader204.preheader 2026-02-21T10:18:33.6032450Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:18:33.6032516Z add.s64 %rd33, %rd28, 16128; 2026-02-21T10:18:33.6032578Z add.s64 %rd34, %rd21, 16128; 2026-02-21T10:18:33.6032647Z add.s64 %rd35, %rd22, 16128; 2026-02-21T10:18:33.6032710Z add.s64 %rd36, %rd23, 16128; 2026-02-21T10:18:33.6032770Z add.s64 %rd37, %rd24, 16128; 2026-02-21T10:18:33.6032837Z add.s64 %rd38, %rd25, 16128; 2026-02-21T10:18:33.6032898Z add.s64 %rd39, %rd26, 16128; 2026-02-21T10:18:33.6032959Z add.s64 %rd40, %rd27, 16128; 2026-02-21T10:18:33.6033019Z mov.b64 %rd657, 4000; 2026-02-21T10:18:33.6033084Z mov.b64 %rd656, %rd11; 2026-02-21T10:18:33.6033190Z $L__BB0_5: // %.preheader204 2026-02-21T10:18:33.6033359Z // Parent Loop BB0_2 Depth=1 2026-02-21T10:18:33.6033514Z // => This Inner Loop Header: Depth=2 2026-02-21T10:18:33.6033727Z .loc 1 58 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:32 2026-02-21T10:18:33.6033797Z add.s64 %rd258, %rd656, %rd40; 2026-02-21T10:18:33.6033866Z add.s64 %rd261, %rd656, %rd39; 2026-02-21T10:18:33.6033928Z add.s64 %rd264, %rd656, %rd38; 2026-02-21T10:18:33.6033992Z add.s64 %rd267, %rd656, %rd37; 2026-02-21T10:18:33.6034056Z add.s64 %rd270, %rd656, %rd36; 2026-02-21T10:18:33.6034124Z add.s64 %rd273, %rd656, %rd35; 2026-02-21T10:18:33.6034186Z add.s64 %rd276, %rd656, %rd34; 2026-02-21T10:18:33.6034387Z .loc 1 58 80 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:80 
2026-02-21T10:18:33.6034454Z add.s64 %rd279, %rd656, %rd33; 2026-02-21T10:18:33.6034522Z // begin inline asm 2026-02-21T10:18:33.6034586Z mov.u64 %rd257, 0x0; 2026-02-21T10:18:33.6034715Z createpolicy.fractional.L2::evict_first.b64 %rd257, 1.0; 2026-02-21T10:18:33.6034780Z // end inline asm 2026-02-21T10:18:33.6034892Z // begin inline asm 2026-02-21T10:18:33.6034954Z mov.u32 %r9244, 0x0; 2026-02-21T10:18:33.6035025Z mov.u32 %r9245, 0x0; 2026-02-21T10:18:33.6035095Z mov.u32 %r9246, 0x0; 2026-02-21T10:18:33.6035154Z mov.u32 %r9247, 0x0; 2026-02-21T10:18:33.6035438Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r9244, %r9245, %r9246, %r9247 }, [ %rd258 + 0 ], %rd257; 2026-02-21T10:18:33.6035498Z // end inline asm 2026-02-21T10:18:33.6035558Z // begin inline asm 2026-02-21T10:18:33.6035616Z mov.u64 %rd260, 0x0; 2026-02-21T10:18:33.6035743Z createpolicy.fractional.L2::evict_first.b64 %rd260, 1.0; 2026-02-21T10:18:33.6035801Z // end inline asm 2026-02-21T10:18:33.6035860Z // begin inline asm 2026-02-21T10:18:33.6035924Z mov.u32 %r9248, 0x0; 2026-02-21T10:18:33.6035982Z mov.u32 %r9249, 0x0; 2026-02-21T10:18:33.6036039Z mov.u32 %r9250, 0x0; 2026-02-21T10:18:33.6036099Z mov.u32 %r9251, 0x0; 2026-02-21T10:18:33.6036345Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r9248, %r9249, %r9250, %r9251 }, [ %rd261 + 0 ], %rd260; 2026-02-21T10:18:33.6036404Z // end inline asm 2026-02-21T10:18:33.6036576Z // begin inline asm 2026-02-21T10:18:33.6036647Z mov.u64 %rd263, 0x0; 2026-02-21T10:18:33.6036767Z createpolicy.fractional.L2::evict_first.b64 %rd263, 1.0; 2026-02-21T10:18:33.6036824Z // end inline asm 2026-02-21T10:18:33.6036890Z // begin inline asm 2026-02-21T10:18:33.6036949Z mov.u32 %r9252, 0x0; 2026-02-21T10:18:33.6037008Z mov.u32 %r9253, 0x0; 2026-02-21T10:18:33.6037066Z mov.u32 %r9254, 0x0; 2026-02-21T10:18:33.6037128Z mov.u32 %r9255, 0x0; 2026-02-21T10:18:33.6037345Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r9252, %r9253, %r9254, %r9255 }, [ %rd264 + 0 ], 
%rd263; 2026-02-21T10:18:33.6037403Z // end inline asm 2026-02-21T10:18:33.6037466Z // begin inline asm 2026-02-21T10:18:33.6037523Z mov.u64 %rd266, 0x0; 2026-02-21T10:18:33.6037643Z createpolicy.fractional.L2::evict_first.b64 %rd266, 1.0; 2026-02-21T10:18:33.6037707Z // end inline asm 2026-02-21T10:18:33.6037768Z // begin inline asm 2026-02-21T10:18:33.6037827Z mov.u32 %r9256, 0x0; 2026-02-21T10:18:33.6037885Z mov.u32 %r9257, 0x0; 2026-02-21T10:18:33.6037947Z mov.u32 %r9258, 0x0; 2026-02-21T10:18:33.6038005Z mov.u32 %r9259, 0x0; 2026-02-21T10:18:33.6038219Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r9256, %r9257, %r9258, %r9259 }, [ %rd267 + 0 ], %rd266; 2026-02-21T10:18:33.6038283Z // end inline asm 2026-02-21T10:18:33.6038341Z // begin inline asm 2026-02-21T10:18:33.6038401Z mov.u64 %rd269, 0x0; 2026-02-21T10:18:33.6038518Z createpolicy.fractional.L2::evict_first.b64 %rd269, 1.0; 2026-02-21T10:18:33.6038582Z // end inline asm 2026-02-21T10:18:33.6038641Z // begin inline asm 2026-02-21T10:18:33.6038699Z mov.u32 %r9260, 0x0; 2026-02-21T10:18:33.6038764Z mov.u32 %r9261, 0x0; 2026-02-21T10:18:33.6038822Z mov.u32 %r9262, 0x0; 2026-02-21T10:18:33.6038972Z mov.u32 %r9263, 0x0; 2026-02-21T10:18:33.6039192Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r9260, %r9261, %r9262, %r9263 }, [ %rd270 + 0 ], %rd269; 2026-02-21T10:18:33.6039314Z // end inline asm 2026-02-21T10:18:33.6039374Z // begin inline asm 2026-02-21T10:18:33.6039432Z mov.u64 %rd272, 0x0; 2026-02-21T10:18:33.6039556Z createpolicy.fractional.L2::evict_first.b64 %rd272, 1.0; 2026-02-21T10:18:33.6039625Z // end inline asm 2026-02-21T10:18:33.6039690Z // begin inline asm 2026-02-21T10:18:33.6039755Z mov.u32 %r9264, 0x0; 2026-02-21T10:18:33.6039813Z mov.u32 %r9265, 0x0; 2026-02-21T10:18:33.6039870Z mov.u32 %r9266, 0x0; 2026-02-21T10:18:33.6039927Z mov.u32 %r9267, 0x0; 2026-02-21T10:18:33.6040150Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r9264, %r9265, %r9266, %r9267 }, [ %rd273 + 0 ], 
%rd272; 2026-02-21T10:18:33.6040211Z // end inline asm 2026-02-21T10:18:33.6040271Z // begin inline asm 2026-02-21T10:18:33.6040336Z mov.u64 %rd275, 0x0; 2026-02-21T10:18:33.6040463Z createpolicy.fractional.L2::evict_first.b64 %rd275, 1.0; 2026-02-21T10:18:33.6040525Z // end inline asm 2026-02-21T10:18:33.6040589Z // begin inline asm 2026-02-21T10:18:33.6040724Z mov.u32 %r9268, 0x0; 2026-02-21T10:18:33.6040788Z mov.u32 %r9269, 0x0; 2026-02-21T10:18:33.6040848Z mov.u32 %r9270, 0x0; 2026-02-21T10:18:33.6040912Z mov.u32 %r9271, 0x0; 2026-02-21T10:18:33.6041193Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r9268, %r9269, %r9270, %r9271 }, [ %rd276 + 0 ], %rd275; 2026-02-21T10:18:33.6041256Z // end inline asm 2026-02-21T10:18:33.6041321Z // begin inline asm 2026-02-21T10:18:33.6041381Z mov.u64 %rd278, 0x0; 2026-02-21T10:18:33.6041500Z createpolicy.fractional.L2::evict_first.b64 %rd278, 1.0; 2026-02-21T10:18:33.6041563Z // end inline asm 2026-02-21T10:18:33.6041623Z // begin inline asm 2026-02-21T10:18:33.6041681Z mov.u32 %r9272, 0x0; 2026-02-21T10:18:33.6041739Z mov.u32 %r9273, 0x0; 2026-02-21T10:18:33.6041802Z mov.u32 %r9274, 0x0; 2026-02-21T10:18:33.6041862Z mov.u32 %r9275, 0x0; 2026-02-21T10:18:33.6042078Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r9272, %r9273, %r9274, %r9275 }, [ %rd279 + 0 ], %rd278; 2026-02-21T10:18:33.6042147Z // end inline asm 2026-02-21T10:18:33.6042357Z .loc 1 62 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:62:32 2026-02-21T10:18:33.6042415Z bar.sync 0; 2026-02-21T10:18:33.6042500Z st.shared.v2.b32 [%r16], {%r9244, %r9245}; 2026-02-21T10:18:33.6042601Z st.shared.v2.b32 [%r16+2048], {%r9248, %r9249}; 2026-02-21T10:18:33.6042686Z st.shared.v2.b32 [%r16+4096], {%r9252, %r9253}; 2026-02-21T10:18:33.6042769Z st.shared.v2.b32 [%r16+6144], {%r9256, %r9257}; 2026-02-21T10:18:33.6042856Z st.shared.v2.b32 [%r16+8192], {%r9260, %r9261}; 2026-02-21T10:18:33.6042945Z st.shared.v2.b32 [%r16+10240], {%r9264, %r9265}; 
2026-02-21T10:18:33.6043029Z st.shared.v2.b32 [%r16+12288], {%r9268, %r9269}; 2026-02-21T10:18:33.6043121Z st.shared.v2.b32 [%r16+14336], {%r9272, %r9273}; 2026-02-21T10:18:33.6043201Z st.shared.v2.b32 [%r17], {%r9246, %r9247}; 2026-02-21T10:18:33.6043287Z st.shared.v2.b32 [%r17+2048], {%r9250, %r9251}; 2026-02-21T10:18:33.6043372Z st.shared.v2.b32 [%r17+4096], {%r9254, %r9255}; 2026-02-21T10:18:33.6043461Z st.shared.v2.b32 [%r17+6144], {%r9258, %r9259}; 2026-02-21T10:18:33.6043544Z st.shared.v2.b32 [%r17+8192], {%r9262, %r9263}; 2026-02-21T10:18:33.6043629Z st.shared.v2.b32 [%r17+10240], {%r9266, %r9267}; 2026-02-21T10:18:33.6043719Z st.shared.v2.b32 [%r17+12288], {%r9270, %r9271}; 2026-02-21T10:18:33.6043814Z st.shared.v2.b32 [%r17+14336], {%r9274, %r9275}; 2026-02-21T10:18:33.6043873Z bar.sync 0; 2026-02-21T10:18:33.6043949Z ld.shared.b16 %rs673, [%r18]; 2026-02-21T10:18:33.6044020Z ld.shared.b16 %rs674, [%r18+1024]; 2026-02-21T10:18:33.6044090Z ld.shared.b16 %rs675, [%r18+64]; 2026-02-21T10:18:33.6044157Z ld.shared.b16 %rs676, [%r18+1088]; 2026-02-21T10:18:33.6044231Z ld.shared.b16 %rs677, [%r18+8192]; 2026-02-21T10:18:33.6044363Z ld.shared.b16 %rs678, [%r18+9216]; 2026-02-21T10:18:33.6044428Z ld.shared.b16 %rs679, [%r18+8256]; 2026-02-21T10:18:33.6044546Z ld.shared.b16 %rs680, [%r18+9280]; 2026-02-21T10:18:33.6044614Z ld.shared.b16 %rs681, [%r19]; 2026-02-21T10:18:33.6044678Z ld.shared.b16 %rs682, [%r19+1024]; 2026-02-21T10:18:33.6044748Z ld.shared.b16 %rs683, [%r19+64]; 2026-02-21T10:18:33.6044817Z ld.shared.b16 %rs684, [%r19+1088]; 2026-02-21T10:18:33.6044881Z ld.shared.b16 %rs685, [%r19+8192]; 2026-02-21T10:18:33.6044945Z ld.shared.b16 %rs686, [%r19+9216]; 2026-02-21T10:18:33.6045016Z ld.shared.b16 %rs687, [%r19+8256]; 2026-02-21T10:18:33.6045080Z ld.shared.b16 %rs688, [%r19+9280]; 2026-02-21T10:18:33.6045145Z ld.shared.b16 %rs689, [%r20]; 2026-02-21T10:18:33.6045213Z ld.shared.b16 %rs690, [%r20+1024]; 2026-02-21T10:18:33.6045278Z ld.shared.b16 %rs691, 
[%r20+64]; 2026-02-21T10:18:33.6045343Z ld.shared.b16 %rs692, [%r20+1088]; 2026-02-21T10:18:33.6045411Z ld.shared.b16 %rs693, [%r20+8192]; 2026-02-21T10:18:33.6045483Z ld.shared.b16 %rs694, [%r20+9216]; 2026-02-21T10:18:33.6045550Z ld.shared.b16 %rs695, [%r20+8256]; 2026-02-21T10:18:33.6045629Z ld.shared.b16 %rs696, [%r20+9280]; 2026-02-21T10:18:33.6045765Z ld.shared.b16 %rs697, [%r21]; 2026-02-21T10:18:33.6045833Z ld.shared.b16 %rs698, [%r21+1024]; 2026-02-21T10:18:33.6045898Z ld.shared.b16 %rs699, [%r21+64]; 2026-02-21T10:18:33.6045962Z ld.shared.b16 %rs700, [%r21+1088]; 2026-02-21T10:18:33.6046073Z ld.shared.b16 %rs701, [%r21+8192]; 2026-02-21T10:18:33.6046141Z ld.shared.b16 %rs702, [%r21+9216]; 2026-02-21T10:18:33.6046206Z ld.shared.b16 %rs703, [%r21+8256]; 2026-02-21T10:18:33.6046277Z ld.shared.b16 %rs704, [%r21+9280]; 2026-02-21T10:18:33.6046342Z ld.shared.b16 %rs705, [%r22]; 2026-02-21T10:18:33.6046407Z ld.shared.b16 %rs706, [%r22+1024]; 2026-02-21T10:18:33.6046626Z ld.shared.b16 %rs707, [%r22+64]; 2026-02-21T10:18:33.6046695Z ld.shared.b16 %rs708, [%r22+1088]; 2026-02-21T10:18:33.6046760Z ld.shared.b16 %rs709, [%r22+8192]; 2026-02-21T10:18:33.6046827Z ld.shared.b16 %rs710, [%r22+9216]; 2026-02-21T10:18:33.6046898Z ld.shared.b16 %rs711, [%r22+8256]; 2026-02-21T10:18:33.6046962Z ld.shared.b16 %rs712, [%r22+9280]; 2026-02-21T10:18:33.6047030Z ld.shared.b16 %rs713, [%r23]; 2026-02-21T10:18:33.6047102Z ld.shared.b16 %rs714, [%r23+1024]; 2026-02-21T10:18:33.6047166Z ld.shared.b16 %rs715, [%r23+64]; 2026-02-21T10:18:33.6047231Z ld.shared.b16 %rs716, [%r23+1088]; 2026-02-21T10:18:33.6047299Z ld.shared.b16 %rs717, [%r23+8192]; 2026-02-21T10:18:33.6047369Z ld.shared.b16 %rs718, [%r23+9216]; 2026-02-21T10:18:33.6047434Z ld.shared.b16 %rs719, [%r23+8256]; 2026-02-21T10:18:33.6047500Z ld.shared.b16 %rs720, [%r23+9280]; 2026-02-21T10:18:33.6047568Z ld.shared.b16 %rs721, [%r24]; 2026-02-21T10:18:33.6047634Z ld.shared.b16 %rs722, [%r24+1024]; 2026-02-21T10:18:33.6047700Z 
ld.shared.b16 %rs723, [%r24+64]; 2026-02-21T10:18:33.6047765Z ld.shared.b16 %rs724, [%r24+1088]; 2026-02-21T10:18:33.6047837Z ld.shared.b16 %rs725, [%r24+8192]; 2026-02-21T10:18:33.6047904Z ld.shared.b16 %rs726, [%r24+9216]; 2026-02-21T10:18:33.6047971Z ld.shared.b16 %rs727, [%r24+8256]; 2026-02-21T10:18:33.6048044Z ld.shared.b16 %rs728, [%r24+9280]; 2026-02-21T10:18:33.6048111Z ld.shared.b16 %rs729, [%r25]; 2026-02-21T10:18:33.6048175Z ld.shared.b16 %rs730, [%r25+1024]; 2026-02-21T10:18:33.6048250Z ld.shared.b16 %rs731, [%r25+64]; 2026-02-21T10:18:33.6048325Z ld.shared.b16 %rs732, [%r25+1088]; 2026-02-21T10:18:33.6048394Z ld.shared.b16 %rs733, [%r25+8192]; 2026-02-21T10:18:33.6048460Z ld.shared.b16 %rs734, [%r25+9216]; 2026-02-21T10:18:33.6048529Z ld.shared.b16 %rs735, [%r25+8256]; 2026-02-21T10:18:33.6048594Z ld.shared.b16 %rs736, [%r25+9280]; 2026-02-21T10:18:33.6048662Z cvt.f32.bf16 %r9413, %rs673; 2026-02-21T10:18:33.6048735Z cvt.f32.bf16 %r9414, %rs674; 2026-02-21T10:18:33.6048801Z cvt.f32.bf16 %r9415, %rs681; 2026-02-21T10:18:33.6048861Z cvt.f32.bf16 %r9416, %rs682; 2026-02-21T10:18:33.6048923Z cvt.f32.bf16 %r9545, %rs689; 2026-02-21T10:18:33.6049069Z cvt.f32.bf16 %r9546, %rs690; 2026-02-21T10:18:33.6049189Z cvt.f32.bf16 %r9547, %rs697; 2026-02-21T10:18:33.6049252Z cvt.f32.bf16 %r9548, %rs698; 2026-02-21T10:18:33.6049321Z cvt.f32.bf16 %r9677, %rs705; 2026-02-21T10:18:33.6049386Z cvt.f32.bf16 %r9678, %rs706; 2026-02-21T10:18:33.6049448Z cvt.f32.bf16 %r9679, %rs713; 2026-02-21T10:18:33.6049508Z cvt.f32.bf16 %r9680, %rs714; 2026-02-21T10:18:33.6049576Z cvt.f32.bf16 %r9809, %rs721; 2026-02-21T10:18:33.6049638Z cvt.f32.bf16 %r9810, %rs722; 2026-02-21T10:18:33.6049700Z cvt.f32.bf16 %r9811, %rs729; 2026-02-21T10:18:33.6049767Z cvt.f32.bf16 %r9812, %rs730; 2026-02-21T10:18:33.6049830Z cvt.f32.bf16 %r9941, %rs675; 2026-02-21T10:18:33.6049891Z cvt.f32.bf16 %r9942, %rs676; 2026-02-21T10:18:33.6049963Z cvt.f32.bf16 %r9943, %rs683; 2026-02-21T10:18:33.6050036Z 
cvt.f32.bf16 %r9944, %rs684; 2026-02-21T10:18:33.6050101Z cvt.f32.bf16 %r10073, %rs691; 2026-02-21T10:18:33.6050164Z cvt.f32.bf16 %r10074, %rs692; 2026-02-21T10:18:33.6050233Z cvt.f32.bf16 %r10075, %rs699; 2026-02-21T10:18:33.6050298Z cvt.f32.bf16 %r10076, %rs700; 2026-02-21T10:18:33.6050361Z cvt.f32.bf16 %r10205, %rs707; 2026-02-21T10:18:33.6050496Z cvt.f32.bf16 %r10206, %rs708; 2026-02-21T10:18:33.6050562Z cvt.f32.bf16 %r10207, %rs715; 2026-02-21T10:18:33.6050623Z cvt.f32.bf16 %r10208, %rs716; 2026-02-21T10:18:33.6050687Z cvt.f32.bf16 %r10337, %rs723; 2026-02-21T10:18:33.6050811Z cvt.f32.bf16 %r10338, %rs724; 2026-02-21T10:18:33.6050875Z cvt.f32.bf16 %r10339, %rs731; 2026-02-21T10:18:33.6050936Z cvt.f32.bf16 %r10340, %rs732; 2026-02-21T10:18:33.6051004Z cvt.f32.bf16 %r10469, %rs677; 2026-02-21T10:18:33.6051067Z cvt.f32.bf16 %r10470, %rs678; 2026-02-21T10:18:33.6051129Z cvt.f32.bf16 %r10471, %rs685; 2026-02-21T10:18:33.6051193Z cvt.f32.bf16 %r10472, %rs686; 2026-02-21T10:18:33.6051260Z cvt.f32.bf16 %r10601, %rs693; 2026-02-21T10:18:33.6051324Z cvt.f32.bf16 %r10602, %rs694; 2026-02-21T10:18:33.6051384Z cvt.f32.bf16 %r10603, %rs701; 2026-02-21T10:18:33.6051453Z cvt.f32.bf16 %r10604, %rs702; 2026-02-21T10:18:33.6051516Z cvt.f32.bf16 %r10733, %rs709; 2026-02-21T10:18:33.6051579Z cvt.f32.bf16 %r10734, %rs710; 2026-02-21T10:18:33.6051657Z cvt.f32.bf16 %r10735, %rs717; 2026-02-21T10:18:33.6051719Z cvt.f32.bf16 %r10736, %rs718; 2026-02-21T10:18:33.6051781Z cvt.f32.bf16 %r10865, %rs725; 2026-02-21T10:18:33.6051842Z cvt.f32.bf16 %r10866, %rs726; 2026-02-21T10:18:33.6051913Z cvt.f32.bf16 %r10867, %rs733; 2026-02-21T10:18:33.6051973Z cvt.f32.bf16 %r10868, %rs734; 2026-02-21T10:18:33.6052034Z cvt.f32.bf16 %r10997, %rs679; 2026-02-21T10:18:33.6052105Z cvt.f32.bf16 %r10998, %rs680; 2026-02-21T10:18:33.6052166Z cvt.f32.bf16 %r10999, %rs687; 2026-02-21T10:18:33.6052226Z cvt.f32.bf16 %r11000, %rs688; 2026-02-21T10:18:33.6052287Z cvt.f32.bf16 %r11129, %rs695; 
2026-02-21T10:18:33.6052353Z cvt.f32.bf16 %r11130, %rs696; 2026-02-21T10:18:33.6052414Z cvt.f32.bf16 %r11131, %rs703; 2026-02-21T10:18:33.6052478Z cvt.f32.bf16 %r11132, %rs704; 2026-02-21T10:18:33.6052542Z cvt.f32.bf16 %r11261, %rs711; 2026-02-21T10:18:33.6052604Z cvt.f32.bf16 %r11262, %rs712; 2026-02-21T10:18:33.6052667Z cvt.f32.bf16 %r11263, %rs719; 2026-02-21T10:18:33.6052729Z cvt.f32.bf16 %r11264, %rs720; 2026-02-21T10:18:33.6052795Z cvt.f32.bf16 %r11393, %rs727; 2026-02-21T10:18:33.6052855Z cvt.f32.bf16 %r11394, %rs728; 2026-02-21T10:18:33.6052915Z cvt.f32.bf16 %r11395, %rs735; 2026-02-21T10:18:33.6052986Z cvt.f32.bf16 %r11396, %rs736; 2026-02-21T10:18:33.6053208Z .loc 1 64 33 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:64:33 2026-02-21T10:18:33.6053267Z bar.sync 0; 2026-02-21T10:18:33.6053333Z // begin inline asm 2026-02-21T10:18:33.6053437Z @%p222 mbarrier.init.shared::cta.b64 [%r19296], 1; 2026-02-21T10:18:33.6053495Z // end inline asm 2026-02-21T10:18:33.6053551Z bar.sync 0; 2026-02-21T10:18:33.6053615Z // begin inline asm 2026-02-21T10:18:33.6053759Z @%p222 mbarrier.arrive.expect_tx.shared.b64 _, [%r19296], 4096; 2026-02-21T10:18:33.6053877Z // end inline asm 2026-02-21T10:18:33.6053998Z // begin inline asm 2026-02-21T10:18:33.6054078Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.6054136Z // end inline asm 2026-02-21T10:18:33.6054192Z bar.sync 0; 2026-02-21T10:18:33.6054268Z elect.sync %r11659|%p124, -1; 2026-02-21T10:18:33.6054344Z and.pred %p105, %p1, %p124; 2026-02-21T10:18:33.6054409Z add.s64 %rd657, %rd657, 32; 2026-02-21T10:18:33.6054476Z cvt.u32.u64 %r9280, %rd657; 2026-02-21T10:18:33.6054538Z // begin inline asm 2026-02-21T10:18:33.6054874Z @%p105 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r29377], [%rd633, {%r9279, %r9280}], [%r19296]; 2026-02-21T10:18:33.6054937Z // end inline asm 2026-02-21T10:18:33.6054992Z bar.sync 0; 2026-02-21T10:18:33.6055052Z mov.b32 %r11527, 0; 
2026-02-21T10:18:33.6055114Z // begin inline asm 2026-02-21T10:18:33.6055173Z 2026-02-21T10:18:33.6055225Z { 2026-02-21T10:18:33.6055290Z .reg .pred complete; 2026-02-21T10:18:33.6055354Z waitLoop: 2026-02-21T10:18:33.6055504Z mbarrier.try_wait.parity.shared.b64 complete, [%r19296], %r11527; 2026-02-21T10:18:33.6055628Z @!complete bra.uni waitLoop; 2026-02-21T10:18:33.6055682Z } 2026-02-21T10:18:33.6055687Z 2026-02-21T10:18:33.6055751Z // end inline asm 2026-02-21T10:18:33.6055808Z bar.sync 0; 2026-02-21T10:18:33.6055870Z // begin inline asm 2026-02-21T10:18:33.6056032Z @%p222 mbarrier.inval.shared::cta.b64 [%r19296]; 2026-02-21T10:18:33.6056093Z // end inline asm 2026-02-21T10:18:33.6056306Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6056373Z ld.shared.s8 %rs737, [%r26]; 2026-02-21T10:18:33.6056695Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6056764Z shl.b16 %rs738, %rs737, 4; 2026-02-21T10:18:33.6056962Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6057041Z ld.shared.s8 %rs739, [%r27+128]; 2026-02-21T10:18:33.6057242Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6057306Z shl.b16 %rs740, %rs739, 4; 2026-02-21T10:18:33.6057511Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6057586Z ld.shared.s8 %rs741, [%r28+256]; 2026-02-21T10:18:33.6057786Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6057853Z shl.b16 %rs742, %rs741, 4; 2026-02-21T10:18:33.6058050Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6058116Z ld.shared.s8 %rs743, [%r29+384]; 2026-02-21T10:18:33.6058318Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6058383Z shl.b16 %rs744, %rs743, 4; 
2026-02-21T10:18:33.6058579Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6058649Z ld.shared.s8 %rs745, [%r30+512]; 2026-02-21T10:18:33.6058852Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6058915Z shl.b16 %rs746, %rs745, 4; 2026-02-21T10:18:33.6059111Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6059185Z ld.shared.s8 %rs747, [%r31+640]; 2026-02-21T10:18:33.6059381Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6059446Z shl.b16 %rs748, %rs747, 4; 2026-02-21T10:18:33.6059649Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6059716Z ld.shared.s8 %rs749, [%r32+768]; 2026-02-21T10:18:33.6060009Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6060139Z shl.b16 %rs750, %rs749, 4; 2026-02-21T10:18:33.6060341Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6060408Z ld.shared.s8 %rs751, [%r33+896]; 2026-02-21T10:18:33.6060607Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6060675Z shl.b16 %rs752, %rs751, 4; 2026-02-21T10:18:33.6060870Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6060942Z ld.shared.s8 %rs753, [%r26+1024]; 2026-02-21T10:18:33.6061149Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6061213Z shl.b16 %rs754, %rs753, 4; 2026-02-21T10:18:33.6061409Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6061492Z ld.shared.s8 %rs755, [%r27+1152]; 2026-02-21T10:18:33.6061757Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6061823Z shl.b16 %rs756, %rs755, 4; 
2026-02-21T10:18:33.6062022Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6062144Z ld.shared.s8 %rs757, [%r28+1280]; 2026-02-21T10:18:33.6062345Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6062408Z shl.b16 %rs758, %rs757, 4; 2026-02-21T10:18:33.6062608Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6062672Z ld.shared.s8 %rs759, [%r29+1408]; 2026-02-21T10:18:33.6062867Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6062935Z shl.b16 %rs760, %rs759, 4; 2026-02-21T10:18:33.6063133Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6063199Z ld.shared.s8 %rs761, [%r30+1536]; 2026-02-21T10:18:33.6063398Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6063459Z shl.b16 %rs762, %rs761, 4; 2026-02-21T10:18:33.6063657Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6063728Z ld.shared.s8 %rs763, [%r31+1664]; 2026-02-21T10:18:33.6063920Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6063982Z shl.b16 %rs764, %rs763, 4; 2026-02-21T10:18:33.6064177Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6064247Z ld.shared.s8 %rs765, [%r32+1792]; 2026-02-21T10:18:33.6064444Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6064507Z shl.b16 %rs766, %rs765, 4; 2026-02-21T10:18:33.6064711Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6064788Z ld.shared.s8 %rs767, [%r33+1920]; 2026-02-21T10:18:33.6064988Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6065057Z shl.b16 %rs768, %rs767, 4; 
2026-02-21T10:18:33.6065251Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6065317Z ld.shared.s8 %rs769, [%r26+2048]; 2026-02-21T10:18:33.6065514Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6065576Z shl.b16 %rs770, %rs769, 4; 2026-02-21T10:18:33.6065771Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6065948Z ld.shared.s8 %rs771, [%r27+2176]; 2026-02-21T10:18:33.6066149Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6066210Z shl.b16 %rs772, %rs771, 4; 2026-02-21T10:18:33.6066404Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6066585Z ld.shared.s8 %rs773, [%r28+2304]; 2026-02-21T10:18:33.6066789Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6066851Z shl.b16 %rs774, %rs773, 4; 2026-02-21T10:18:33.6067052Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6067121Z ld.shared.s8 %rs775, [%r29+2432]; 2026-02-21T10:18:33.6067325Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6067398Z shl.b16 %rs776, %rs775, 4; 2026-02-21T10:18:33.6067680Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6067753Z ld.shared.s8 %rs777, [%r30+2560]; 2026-02-21T10:18:33.6067953Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6068023Z shl.b16 %rs778, %rs777, 4; 2026-02-21T10:18:33.6068276Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6068345Z ld.shared.s8 %rs779, [%r31+2688]; 2026-02-21T10:18:33.6068643Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6068708Z shl.b16 %rs780, %rs779, 4; 
2026-02-21T10:18:33.6068904Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6068977Z ld.shared.s8 %rs781, [%r32+2816]; 2026-02-21T10:18:33.6069172Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6069239Z shl.b16 %rs782, %rs781, 4; 2026-02-21T10:18:33.6069444Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6069513Z ld.shared.s8 %rs783, [%r33+2944]; 2026-02-21T10:18:33.6069721Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6069790Z shl.b16 %rs784, %rs783, 4; 2026-02-21T10:18:33.6069994Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6070061Z ld.shared.s8 %rs785, [%r26+3072]; 2026-02-21T10:18:33.6070257Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6070327Z shl.b16 %rs786, %rs785, 4; 2026-02-21T10:18:33.6070535Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6070607Z ld.shared.s8 %rs787, [%r27+3200]; 2026-02-21T10:18:33.6070809Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6070873Z shl.b16 %rs788, %rs787, 4; 2026-02-21T10:18:33.6071068Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6071141Z ld.shared.s8 %rs789, [%r28+3328]; 2026-02-21T10:18:33.6071336Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6071399Z shl.b16 %rs790, %rs789, 4; 2026-02-21T10:18:33.6071599Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6071672Z ld.shared.s8 %rs791, [%r29+3456]; 2026-02-21T10:18:33.6071871Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6072014Z shl.b16 %rs792, %rs791, 4; 
2026-02-21T10:18:33.6072293Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6072360Z ld.shared.s8 %rs793, [%r30+3584]; 2026-02-21T10:18:33.6072555Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6072627Z shl.b16 %rs794, %rs793, 4; 2026-02-21T10:18:33.6072823Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6072890Z ld.shared.s8 %rs795, [%r31+3712]; 2026-02-21T10:18:33.6073092Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6073157Z shl.b16 %rs796, %rs795, 4; 2026-02-21T10:18:33.6073363Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6073432Z ld.shared.s8 %rs797, [%r32+3840]; 2026-02-21T10:18:33.6073636Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6073746Z shl.b16 %rs798, %rs797, 4; 2026-02-21T10:18:33.6073945Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6074017Z ld.shared.s8 %rs799, [%r33+3968]; 2026-02-21T10:18:33.6074255Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6074318Z shl.b16 %rs800, %rs799, 4; 2026-02-21T10:18:33.6074516Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6074589Z cvt.s16.s8 %rs801, %rs738; 2026-02-21T10:18:33.6074652Z shr.s16 %rs802, %rs801, 4; 2026-02-21T10:18:33.6074729Z cvt.s16.s8 %rs803, %rs740; 2026-02-21T10:18:33.6074792Z shr.s16 %rs804, %rs803, 4; 2026-02-21T10:18:33.6074860Z shr.s16 %rs805, %rs737, 4; 2026-02-21T10:18:33.6074927Z shr.s16 %rs806, %rs739, 4; 2026-02-21T10:18:33.6075132Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6075201Z cvt.rn.f32.s16 %r11660, %rs806; 2026-02-21T10:18:33.6075267Z cvt.rn.f32.s16 %r11661, 
%rs805; 2026-02-21T10:18:33.6075335Z cvt.rn.f32.s16 %r11662, %rs804; 2026-02-21T10:18:33.6075399Z cvt.rn.f32.s16 %r11663, %rs802; 2026-02-21T10:18:33.6075597Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6075667Z cvt.s16.s8 %rs807, %rs742; 2026-02-21T10:18:33.6075731Z shr.s16 %rs808, %rs807, 4; 2026-02-21T10:18:33.6075793Z cvt.s16.s8 %rs809, %rs744; 2026-02-21T10:18:33.6075858Z shr.s16 %rs810, %rs809, 4; 2026-02-21T10:18:33.6075923Z shr.s16 %rs811, %rs741, 4; 2026-02-21T10:18:33.6075983Z shr.s16 %rs812, %rs743, 4; 2026-02-21T10:18:33.6076178Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6076250Z cvt.rn.f32.s16 %r11664, %rs812; 2026-02-21T10:18:33.6076313Z cvt.rn.f32.s16 %r11665, %rs811; 2026-02-21T10:18:33.6076378Z cvt.rn.f32.s16 %r11666, %rs810; 2026-02-21T10:18:33.6076445Z cvt.rn.f32.s16 %r11667, %rs808; 2026-02-21T10:18:33.6076774Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6076840Z cvt.s16.s8 %rs813, %rs746; 2026-02-21T10:18:33.6076902Z shr.s16 %rs814, %rs813, 4; 2026-02-21T10:18:33.6076968Z cvt.s16.s8 %rs815, %rs748; 2026-02-21T10:18:33.6077033Z shr.s16 %rs816, %rs815, 4; 2026-02-21T10:18:33.6077094Z shr.s16 %rs817, %rs745, 4; 2026-02-21T10:18:33.6077161Z shr.s16 %rs818, %rs747, 4; 2026-02-21T10:18:33.6077356Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6077421Z cvt.rn.f32.s16 %r11668, %rs818; 2026-02-21T10:18:33.6077485Z cvt.rn.f32.s16 %r11669, %rs817; 2026-02-21T10:18:33.6077644Z cvt.rn.f32.s16 %r11670, %rs816; 2026-02-21T10:18:33.6077766Z cvt.rn.f32.s16 %r11671, %rs814; 2026-02-21T10:18:33.6077973Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6078042Z cvt.s16.s8 %rs819, %rs750; 2026-02-21T10:18:33.6078103Z shr.s16 %rs820, %rs819, 4; 2026-02-21T10:18:33.6078166Z cvt.s16.s8 %rs821, %rs752; 
2026-02-21T10:18:33.6078229Z shr.s16 %rs822, %rs821, 4; 2026-02-21T10:18:33.6078295Z shr.s16 %rs823, %rs749, 4; 2026-02-21T10:18:33.6078356Z shr.s16 %rs824, %rs751, 4; 2026-02-21T10:18:33.6078557Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6078626Z cvt.rn.f32.s16 %r11672, %rs824; 2026-02-21T10:18:33.6078691Z cvt.rn.f32.s16 %r11673, %rs823; 2026-02-21T10:18:33.6078755Z cvt.rn.f32.s16 %r11674, %rs822; 2026-02-21T10:18:33.6078822Z cvt.rn.f32.s16 %r11675, %rs820; 2026-02-21T10:18:33.6079022Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6079087Z cvt.s16.s8 %rs825, %rs754; 2026-02-21T10:18:33.6079221Z shr.s16 %rs826, %rs825, 4; 2026-02-21T10:18:33.6079296Z cvt.s16.s8 %rs827, %rs756; 2026-02-21T10:18:33.6079359Z shr.s16 %rs828, %rs827, 4; 2026-02-21T10:18:33.6079420Z shr.s16 %rs829, %rs753, 4; 2026-02-21T10:18:33.6079487Z shr.s16 %rs830, %rs755, 4; 2026-02-21T10:18:33.6079743Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6079814Z cvt.rn.f32.s16 %r11676, %rs830; 2026-02-21T10:18:33.6079885Z cvt.rn.f32.s16 %r11677, %rs829; 2026-02-21T10:18:33.6079949Z cvt.rn.f32.s16 %r11678, %rs828; 2026-02-21T10:18:33.6080024Z cvt.rn.f32.s16 %r11679, %rs826; 2026-02-21T10:18:33.6080224Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6080295Z cvt.s16.s8 %rs831, %rs758; 2026-02-21T10:18:33.6080357Z shr.s16 %rs832, %rs831, 4; 2026-02-21T10:18:33.6080420Z cvt.s16.s8 %rs833, %rs760; 2026-02-21T10:18:33.6080490Z shr.s16 %rs834, %rs833, 4; 2026-02-21T10:18:33.6080554Z shr.s16 %rs835, %rs757, 4; 2026-02-21T10:18:33.6080618Z shr.s16 %rs836, %rs759, 4; 2026-02-21T10:18:33.6080814Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6080888Z cvt.rn.f32.s16 %r11680, %rs836; 2026-02-21T10:18:33.6080953Z cvt.rn.f32.s16 %r11681, %rs835; 
2026-02-21T10:18:33.6081016Z cvt.rn.f32.s16 %r11682, %rs834; 2026-02-21T10:18:33.6081085Z cvt.rn.f32.s16 %r11683, %rs832; 2026-02-21T10:18:33.6081281Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6081345Z cvt.s16.s8 %rs837, %rs762; 2026-02-21T10:18:33.6081413Z shr.s16 %rs838, %rs837, 4; 2026-02-21T10:18:33.6081477Z cvt.s16.s8 %rs839, %rs764; 2026-02-21T10:18:33.6081542Z shr.s16 %rs840, %rs839, 4; 2026-02-21T10:18:33.6081603Z shr.s16 %rs841, %rs761, 4; 2026-02-21T10:18:33.6081672Z shr.s16 %rs842, %rs763, 4; 2026-02-21T10:18:33.6081868Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6081934Z cvt.rn.f32.s16 %r11684, %rs842; 2026-02-21T10:18:33.6082005Z cvt.rn.f32.s16 %r11685, %rs841; 2026-02-21T10:18:33.6082069Z cvt.rn.f32.s16 %r11686, %rs840; 2026-02-21T10:18:33.6082134Z cvt.rn.f32.s16 %r11687, %rs838; 2026-02-21T10:18:33.6082348Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6082416Z cvt.s16.s8 %rs843, %rs766; 2026-02-21T10:18:33.6082479Z shr.s16 %rs844, %rs843, 4; 2026-02-21T10:18:33.6082542Z cvt.s16.s8 %rs845, %rs768; 2026-02-21T10:18:33.6082610Z shr.s16 %rs846, %rs845, 4; 2026-02-21T10:18:33.6082672Z shr.s16 %rs847, %rs765, 4; 2026-02-21T10:18:33.6082738Z shr.s16 %rs848, %rs767, 4; 2026-02-21T10:18:33.6083009Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6083120Z cvt.rn.f32.s16 %r11688, %rs848; 2026-02-21T10:18:33.6083186Z cvt.rn.f32.s16 %r11689, %rs847; 2026-02-21T10:18:33.6083253Z cvt.rn.f32.s16 %r11690, %rs846; 2026-02-21T10:18:33.6083326Z cvt.rn.f32.s16 %r11691, %rs844; 2026-02-21T10:18:33.6083526Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6083590Z cvt.s16.s8 %rs849, %rs770; 2026-02-21T10:18:33.6083658Z shr.s16 %rs850, %rs849, 4; 2026-02-21T10:18:33.6083721Z cvt.s16.s8 %rs851, %rs772; 
2026-02-21T10:18:33.6083782Z shr.s16 %rs852, %rs851, 4; 2026-02-21T10:18:33.6083849Z shr.s16 %rs853, %rs769, 4; 2026-02-21T10:18:33.6083914Z shr.s16 %rs854, %rs771, 4; 2026-02-21T10:18:33.6084111Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6084180Z cvt.rn.f32.s16 %r11692, %rs854; 2026-02-21T10:18:33.6084251Z cvt.rn.f32.s16 %r11693, %rs853; 2026-02-21T10:18:33.6084323Z cvt.rn.f32.s16 %r11694, %rs852; 2026-02-21T10:18:33.6084435Z cvt.rn.f32.s16 %r11695, %rs850; 2026-02-21T10:18:33.6084640Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6084701Z cvt.s16.s8 %rs855, %rs774; 2026-02-21T10:18:33.6084765Z shr.s16 %rs856, %rs855, 4; 2026-02-21T10:18:33.6084878Z cvt.s16.s8 %rs857, %rs776; 2026-02-21T10:18:33.6084945Z shr.s16 %rs858, %rs857, 4; 2026-02-21T10:18:33.6085005Z shr.s16 %rs859, %rs773, 4; 2026-02-21T10:18:33.6085066Z shr.s16 %rs860, %rs775, 4; 2026-02-21T10:18:33.6085282Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6085347Z cvt.rn.f32.s16 %r11696, %rs860; 2026-02-21T10:18:33.6085411Z cvt.rn.f32.s16 %r11697, %rs859; 2026-02-21T10:18:33.6085478Z cvt.rn.f32.s16 %r11698, %rs858; 2026-02-21T10:18:33.6085543Z cvt.rn.f32.s16 %r11699, %rs856; 2026-02-21T10:18:33.6085741Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6085810Z cvt.s16.s8 %rs861, %rs778; 2026-02-21T10:18:33.6085872Z shr.s16 %rs862, %rs861, 4; 2026-02-21T10:18:33.6085936Z cvt.s16.s8 %rs863, %rs780; 2026-02-21T10:18:33.6085996Z shr.s16 %rs864, %rs863, 4; 2026-02-21T10:18:33.6086063Z shr.s16 %rs865, %rs777, 4; 2026-02-21T10:18:33.6086126Z shr.s16 %rs866, %rs779, 4; 2026-02-21T10:18:33.6086323Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6086392Z cvt.rn.f32.s16 %r11700, %rs866; 2026-02-21T10:18:33.6086579Z cvt.rn.f32.s16 %r11701, %rs865; 
2026-02-21T10:18:33.6086646Z cvt.rn.f32.s16 %r11702, %rs864; 2026-02-21T10:18:33.6086710Z cvt.rn.f32.s16 %r11703, %rs862; 2026-02-21T10:18:33.6086909Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6086976Z cvt.s16.s8 %rs867, %rs782; 2026-02-21T10:18:33.6087039Z shr.s16 %rs868, %rs867, 4; 2026-02-21T10:18:33.6087108Z cvt.s16.s8 %rs869, %rs784; 2026-02-21T10:18:33.6087170Z shr.s16 %rs870, %rs869, 4; 2026-02-21T10:18:33.6087231Z shr.s16 %rs871, %rs781, 4; 2026-02-21T10:18:33.6087298Z shr.s16 %rs872, %rs783, 4; 2026-02-21T10:18:33.6087494Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6087571Z cvt.rn.f32.s16 %r11704, %rs872; 2026-02-21T10:18:33.6087639Z cvt.rn.f32.s16 %r11705, %rs871; 2026-02-21T10:18:33.6087711Z cvt.rn.f32.s16 %r11706, %rs870; 2026-02-21T10:18:33.6087774Z cvt.rn.f32.s16 %r11707, %rs868; 2026-02-21T10:18:33.6087969Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6088036Z cvt.s16.s8 %rs873, %rs786; 2026-02-21T10:18:33.6088097Z shr.s16 %rs874, %rs873, 4; 2026-02-21T10:18:33.6088262Z cvt.s16.s8 %rs875, %rs788; 2026-02-21T10:18:33.6088325Z shr.s16 %rs876, %rs875, 4; 2026-02-21T10:18:33.6088454Z shr.s16 %rs877, %rs785, 4; 2026-02-21T10:18:33.6088517Z shr.s16 %rs878, %rs787, 4; 2026-02-21T10:18:33.6088715Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6088786Z cvt.rn.f32.s16 %r11708, %rs878; 2026-02-21T10:18:33.6088851Z cvt.rn.f32.s16 %r11709, %rs877; 2026-02-21T10:18:33.6088918Z cvt.rn.f32.s16 %r11710, %rs876; 2026-02-21T10:18:33.6088988Z cvt.rn.f32.s16 %r11711, %rs874; 2026-02-21T10:18:33.6089184Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6089246Z cvt.s16.s8 %rs879, %rs790; 2026-02-21T10:18:33.6089311Z shr.s16 %rs880, %rs879, 4; 2026-02-21T10:18:33.6089377Z cvt.s16.s8 %rs881, %rs792; 
2026-02-21T10:18:33.6089440Z shr.s16 %rs882, %rs881, 4; 2026-02-21T10:18:33.6089503Z shr.s16 %rs883, %rs789, 4; 2026-02-21T10:18:33.6089572Z shr.s16 %rs884, %rs791, 4; 2026-02-21T10:18:33.6089769Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6089906Z cvt.rn.f32.s16 %r11712, %rs884; 2026-02-21T10:18:33.6089980Z cvt.rn.f32.s16 %r11713, %rs883; 2026-02-21T10:18:33.6090044Z cvt.rn.f32.s16 %r11714, %rs882; 2026-02-21T10:18:33.6090106Z cvt.rn.f32.s16 %r11715, %rs880; 2026-02-21T10:18:33.6090366Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6090437Z cvt.s16.s8 %rs885, %rs794; 2026-02-21T10:18:33.6090500Z shr.s16 %rs886, %rs885, 4; 2026-02-21T10:18:33.6090562Z cvt.s16.s8 %rs887, %rs796; 2026-02-21T10:18:33.6090631Z shr.s16 %rs888, %rs887, 4; 2026-02-21T10:18:33.6090693Z shr.s16 %rs889, %rs793, 4; 2026-02-21T10:18:33.6090754Z shr.s16 %rs890, %rs795, 4; 2026-02-21T10:18:33.6090950Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6091033Z cvt.rn.f32.s16 %r11716, %rs890; 2026-02-21T10:18:33.6091103Z cvt.rn.f32.s16 %r11717, %rs889; 2026-02-21T10:18:33.6091170Z cvt.rn.f32.s16 %r11718, %rs888; 2026-02-21T10:18:33.6091241Z cvt.rn.f32.s16 %r11719, %rs886; 2026-02-21T10:18:33.6091438Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6091502Z cvt.s16.s8 %rs891, %rs798; 2026-02-21T10:18:33.6091571Z shr.s16 %rs892, %rs891, 4; 2026-02-21T10:18:33.6091635Z cvt.s16.s8 %rs893, %rs800; 2026-02-21T10:18:33.6091697Z shr.s16 %rs894, %rs893, 4; 2026-02-21T10:18:33.6091759Z shr.s16 %rs895, %rs797, 4; 2026-02-21T10:18:33.6091826Z shr.s16 %rs896, %rs799, 4; 2026-02-21T10:18:33.6092022Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6092087Z cvt.rn.f32.s16 %r11720, %rs896; 2026-02-21T10:18:33.6092157Z cvt.rn.f32.s16 %r11721, %rs895; 
2026-02-21T10:18:33.6092224Z cvt.rn.f32.s16 %r11722, %rs894; 2026-02-21T10:18:33.6092291Z cvt.rn.f32.s16 %r11723, %rs892; 2026-02-21T10:18:33.6092357Z bar.sync 0; 2026-02-21T10:18:33.6092483Z st.shared.v4.b32 [%r34], {%r11663, %r11661, %r11662, %r11660}; 2026-02-21T10:18:33.6092617Z st.shared.v4.b32 [%r34+16384], {%r11695, %r11693, %r11694, %r11692}; 2026-02-21T10:18:33.6092732Z st.shared.v4.b32 [%r35], {%r11667, %r11665, %r11666, %r11664}; 2026-02-21T10:18:33.6092864Z st.shared.v4.b32 [%r35+16384], {%r11699, %r11697, %r11698, %r11696}; 2026-02-21T10:18:33.6092975Z st.shared.v4.b32 [%r36], {%r11671, %r11669, %r11670, %r11668}; 2026-02-21T10:18:33.6093098Z st.shared.v4.b32 [%r36+16384], {%r11703, %r11701, %r11702, %r11700}; 2026-02-21T10:18:33.6093213Z st.shared.v4.b32 [%r37], {%r11675, %r11673, %r11674, %r11672}; 2026-02-21T10:18:33.6093330Z st.shared.v4.b32 [%r37+16384], {%r11707, %r11705, %r11706, %r11704}; 2026-02-21T10:18:33.6093443Z st.shared.v4.b32 [%r38], {%r11679, %r11677, %r11678, %r11676}; 2026-02-21T10:18:33.6093634Z st.shared.v4.b32 [%r38+16384], {%r11711, %r11709, %r11710, %r11708}; 2026-02-21T10:18:33.6093786Z st.shared.v4.b32 [%r39], {%r11683, %r11681, %r11682, %r11680}; 2026-02-21T10:18:33.6093903Z st.shared.v4.b32 [%r39+16384], {%r11715, %r11713, %r11714, %r11712}; 2026-02-21T10:18:33.6094015Z st.shared.v4.b32 [%r40], {%r11687, %r11685, %r11686, %r11684}; 2026-02-21T10:18:33.6094134Z st.shared.v4.b32 [%r40+16384], {%r11719, %r11717, %r11718, %r11716}; 2026-02-21T10:18:33.6094242Z st.shared.v4.b32 [%r41], {%r11691, %r11689, %r11690, %r11688}; 2026-02-21T10:18:33.6094369Z st.shared.v4.b32 [%r41+16384], {%r11723, %r11721, %r11722, %r11720}; 2026-02-21T10:18:33.6094432Z $L__tmp7: 2026-02-21T10:18:33.6094709Z .loc 2 291 36 // standard.py:291:36 @[ c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:94:40 ] 2026-02-21T10:18:33.6094772Z // begin inline asm 2026-02-21T10:18:33.6094859Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.6094920Z // end 
inline asm 2026-02-21T10:18:33.6094979Z bar.sync 0; 2026-02-21T10:18:33.6095071Z shfl.sync.idx.b32 %r11724, %r5, 0, 31, -1; 2026-02-21T10:18:33.6095147Z wgmma.fence.sync.aligned; 2026-02-21T10:18:33.6095264Z mov.pred %p107, -1; 2026-02-21T10:18:33.6095336Z // begin inline asm 2026-02-21T10:18:33.6097016Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r9413,%r9414,%r9415,%r9416}, %rd3, %p107, 1, 1; 2026-02-21T10:18:33.6097084Z // end inline asm 2026-02-21T10:18:33.6097157Z // begin inline asm 2026-02-21T10:18:33.6098642Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r9545,%r9546,%r9547,%r9548}, %rd4, %p107, 1, 1; 2026-02-21T10:18:33.6098702Z // end inline asm 2026-02-21T10:18:33.6098767Z // begin inline asm 2026-02-21T10:18:33.6100240Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r9677,%r9678,%r9679,%r9680}, %rd5, %p107, 1, 1; 2026-02-21T10:18:33.6100308Z // end inline asm 2026-02-21T10:18:33.6100370Z // begin inline asm 2026-02-21T10:18:33.6101841Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r9809,%r9810,%r9811,%r9812}, %rd6, %p107, 1, 1; 2026-02-21T10:18:33.6102033Z // end inline asm 2026-02-21T10:18:33.6102096Z // begin inline asm 2026-02-21T10:18:33.6103572Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r9941,%r9942,%r9943,%r9944}, %rd7, %p107, 1, 1; 2026-02-21T10:18:33.6103636Z // end inline asm 2026-02-21T10:18:33.6103696Z // begin inline asm 2026-02-21T10:18:33.6105285Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r10073,%r10074,%r10075,%r10076}, %rd8, %p107, 1, 1; 2026-02-21T10:18:33.6105352Z // end inline asm 2026-02-21T10:18:33.6105412Z // begin inline asm 2026-02-21T10:18:33.6107023Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r10205,%r10206,%r10207,%r10208}, %rd9, %p107, 1, 1; 2026-02-21T10:18:33.6107090Z // end inline asm 2026-02-21T10:18:33.6107151Z // begin inline asm 2026-02-21T10:18:33.6108699Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970}, {%r10337,%r10338,%r10339,%r10340}, %rd10, %p107, 1, 1; 2026-02-21T10:18:33.6108761Z // end inline asm 2026-02-21T10:18:33.6108827Z // begin inline asm 2026-02-21T10:18:33.6110315Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r10469,%r10470,%r10471,%r10472}, %rd3, %p107, 1, 1; 2026-02-21T10:18:33.6110373Z // end inline asm 2026-02-21T10:18:33.6110440Z // begin inline asm 2026-02-21T10:18:33.6111923Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r10601,%r10602,%r10603,%r10604}, %rd4, %p107, 1, 1; 2026-02-21T10:18:33.6112124Z // end inline asm 2026-02-21T10:18:33.6112184Z // begin inline asm 2026-02-21T10:18:33.6113736Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r10733,%r10734,%r10735,%r10736}, %rd5, %p107, 1, 1; 2026-02-21T10:18:33.6113808Z // end inline asm 2026-02-21T10:18:33.6113868Z // begin inline asm 2026-02-21T10:18:33.6115432Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r10865,%r10866,%r10867,%r10868}, %rd6, %p107, 1, 1; 2026-02-21T10:18:33.6115499Z // end inline asm 2026-02-21T10:18:33.6115561Z // begin inline asm 2026-02-21T10:18:33.6117189Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r10997,%r10998,%r10999,%r11000}, %rd7, %p107, 1, 1; 2026-02-21T10:18:33.6117254Z // end inline asm 2026-02-21T10:18:33.6117316Z // begin inline asm 2026-02-21T10:18:33.6118808Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r11129,%r11130,%r11131,%r11132}, %rd8, %p107, 1, 1; 2026-02-21T10:18:33.6118871Z // end inline asm 2026-02-21T10:18:33.6118933Z // begin inline asm 2026-02-21T10:18:33.6120414Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r11261,%r11262,%r11263,%r11264}, %rd9, %p107, 1, 1; 2026-02-21T10:18:33.6120610Z // end inline asm 2026-02-21T10:18:33.6120678Z // begin inline asm 2026-02-21T10:18:33.6122161Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034}, {%r11393,%r11394,%r11395,%r11396}, %rd10, %p107, 1, 1; 2026-02-21T10:18:33.6122222Z // end inline asm 2026-02-21T10:18:33.6122306Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:33.6122372Z mov.b32 %r11525, %r29377; 2026-02-21T10:18:33.6122436Z mov.b32 %r11526, %r11527; 2026-02-21T10:18:33.6122501Z // begin inline asm 2026-02-21T10:18:33.6125167Z // wait for regs: 
%r31907,%r31908,%r31909,%r31910,%r31911,%r31912,%r31913,%r31914,%r31915,%r31916,%r31917,%r31918,%r31919,%r31920,%r31921,%r31922,%r31923,%r31924,%r31925,%r31926,%r31927,%r31928,%r31929,%r31930,%r31931,%r31932,%r31933,%r31934,%r31935,%r31936,%r31937,%r31938,%r31939,%r31940,%r31941,%r31942,%r31943,%r31944,%r31945,%r31946,%r31947,%r31948,%r31949,%r31950,%r31951,%r31952,%r31953,%r31954,%r31955,%r31956,%r31957,%r31958,%r31959,%r31960,%r31961,%r31962,%r31963,%r31964,%r31965,%r31966,%r31967,%r31968,%r31969,%r31970,%r31971,%r31972,%r31973,%r31974,%r31975,%r31976,%r31977,%r31978,%r31979,%r31980,%r31981,%r31982,%r31983,%r31984,%r31985,%r31986,%r31987,%r31988,%r31989,%r31990,%r31991,%r31992,%r31993,%r31994,%r31995,%r31996,%r31997,%r31998,%r31999,%r32000,%r32001,%r32002,%r32003,%r32004,%r32005,%r32006,%r32007,%r32008,%r32009,%r32010,%r32011,%r32012,%r32013,%r32014,%r32015,%r32016,%r32017,%r32018,%r32019,%r32020,%r32021,%r32022,%r32023,%r32024,%r32025,%r32026,%r32027,%r32028,%r32029,%r32030,%r32031,%r32032,%r32033,%r32034,%r11525,%r11526,%r11527 2026-02-21T10:18:33.6125260Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:33.6125320Z // end inline asm 2026-02-21T10:18:33.6125377Z $L__tmp8: 2026-02-21T10:18:33.6125596Z .loc 1 51 93 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:51:93 2026-02-21T10:18:33.6125676Z add.s64 %rd656, %rd656, 128; 2026-02-21T10:18:33.6125750Z setp.lt.u64 %p125, %rd657, 4064; 2026-02-21T10:18:33.6125816Z @%p125 bra $L__BB0_5; 2026-02-21T10:18:33.6125935Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:18:33.6126144Z .loc 1 97 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:97:28 2026-02-21T10:18:33.6126234Z cvt.rn.bf16x2.f32 %r11729, %r31908, %r31907; 2026-02-21T10:18:33.6126327Z cvt.rn.bf16x2.f32 %r11730, %r31910, %r31909; 2026-02-21T10:18:33.6126407Z cvt.rn.bf16x2.f32 %r11731, %r31912, %r31911; 2026-02-21T10:18:33.6126595Z cvt.rn.bf16x2.f32 %r11732, %r31914, %r31913; 2026-02-21T10:18:33.6126682Z cvt.rn.bf16x2.f32 
%r11733, %r31916, %r31915; 2026-02-21T10:18:33.6126761Z cvt.rn.bf16x2.f32 %r11734, %r31918, %r31917; 2026-02-21T10:18:33.6126838Z cvt.rn.bf16x2.f32 %r11735, %r31920, %r31919; 2026-02-21T10:18:33.6126916Z cvt.rn.bf16x2.f32 %r11736, %r31922, %r31921; 2026-02-21T10:18:33.6127000Z cvt.rn.bf16x2.f32 %r11737, %r31924, %r31923; 2026-02-21T10:18:33.6127077Z cvt.rn.bf16x2.f32 %r11738, %r31926, %r31925; 2026-02-21T10:18:33.6127158Z cvt.rn.bf16x2.f32 %r11739, %r31928, %r31927; 2026-02-21T10:18:33.6127242Z cvt.rn.bf16x2.f32 %r11740, %r31930, %r31929; 2026-02-21T10:18:33.6127318Z cvt.rn.bf16x2.f32 %r11741, %r31932, %r31931; 2026-02-21T10:18:33.6127395Z cvt.rn.bf16x2.f32 %r11742, %r31934, %r31933; 2026-02-21T10:18:33.6127477Z cvt.rn.bf16x2.f32 %r11743, %r31936, %r31935; 2026-02-21T10:18:33.6127647Z cvt.rn.bf16x2.f32 %r11744, %r31938, %r31937; 2026-02-21T10:18:33.6127785Z cvt.rn.bf16x2.f32 %r11745, %r31940, %r31939; 2026-02-21T10:18:33.6127865Z cvt.rn.bf16x2.f32 %r11746, %r31942, %r31941; 2026-02-21T10:18:33.6127951Z cvt.rn.bf16x2.f32 %r11747, %r31944, %r31943; 2026-02-21T10:18:33.6128028Z cvt.rn.bf16x2.f32 %r11748, %r31946, %r31945; 2026-02-21T10:18:33.6128105Z cvt.rn.bf16x2.f32 %r11749, %r31948, %r31947; 2026-02-21T10:18:33.6128188Z cvt.rn.bf16x2.f32 %r11750, %r31950, %r31949; 2026-02-21T10:18:33.6128265Z cvt.rn.bf16x2.f32 %r11751, %r31952, %r31951; 2026-02-21T10:18:33.6128343Z cvt.rn.bf16x2.f32 %r11752, %r31954, %r31953; 2026-02-21T10:18:33.6128423Z cvt.rn.bf16x2.f32 %r11753, %r31956, %r31955; 2026-02-21T10:18:33.6128500Z cvt.rn.bf16x2.f32 %r11754, %r31958, %r31957; 2026-02-21T10:18:33.6128576Z cvt.rn.bf16x2.f32 %r11755, %r31960, %r31959; 2026-02-21T10:18:33.6128650Z cvt.rn.bf16x2.f32 %r11756, %r31962, %r31961; 2026-02-21T10:18:33.6128735Z cvt.rn.bf16x2.f32 %r11757, %r31964, %r31963; 2026-02-21T10:18:33.6128814Z cvt.rn.bf16x2.f32 %r11758, %r31966, %r31965; 2026-02-21T10:18:33.6128953Z cvt.rn.bf16x2.f32 %r11759, %r31968, %r31967; 2026-02-21T10:18:33.6129038Z cvt.rn.bf16x2.f32 
%r11760, %r31970, %r31969; 2026-02-21T10:18:33.6129114Z cvt.rn.bf16x2.f32 %r11761, %r31972, %r31971; 2026-02-21T10:18:33.6129193Z cvt.rn.bf16x2.f32 %r11762, %r31974, %r31973; 2026-02-21T10:18:33.6129330Z cvt.rn.bf16x2.f32 %r11763, %r31976, %r31975; 2026-02-21T10:18:33.6129420Z cvt.rn.bf16x2.f32 %r11764, %r31978, %r31977; 2026-02-21T10:18:33.6129499Z cvt.rn.bf16x2.f32 %r11765, %r31980, %r31979; 2026-02-21T10:18:33.6129578Z cvt.rn.bf16x2.f32 %r11766, %r31982, %r31981; 2026-02-21T10:18:33.6129661Z cvt.rn.bf16x2.f32 %r11767, %r31984, %r31983; 2026-02-21T10:18:33.6129739Z cvt.rn.bf16x2.f32 %r11768, %r31986, %r31985; 2026-02-21T10:18:33.6129816Z cvt.rn.bf16x2.f32 %r11769, %r31988, %r31987; 2026-02-21T10:18:33.6129902Z cvt.rn.bf16x2.f32 %r11770, %r31990, %r31989; 2026-02-21T10:18:33.6129980Z cvt.rn.bf16x2.f32 %r11771, %r31992, %r31991; 2026-02-21T10:18:33.6130058Z cvt.rn.bf16x2.f32 %r11772, %r31994, %r31993; 2026-02-21T10:18:33.6130143Z cvt.rn.bf16x2.f32 %r11773, %r31996, %r31995; 2026-02-21T10:18:33.6130220Z cvt.rn.bf16x2.f32 %r11774, %r31998, %r31997; 2026-02-21T10:18:33.6130296Z cvt.rn.bf16x2.f32 %r11775, %r32000, %r31999; 2026-02-21T10:18:33.6130374Z cvt.rn.bf16x2.f32 %r11776, %r32002, %r32001; 2026-02-21T10:18:33.6130457Z cvt.rn.bf16x2.f32 %r11777, %r32004, %r32003; 2026-02-21T10:18:33.6130533Z cvt.rn.bf16x2.f32 %r11778, %r32006, %r32005; 2026-02-21T10:18:33.6130608Z cvt.rn.bf16x2.f32 %r11779, %r32008, %r32007; 2026-02-21T10:18:33.6130690Z cvt.rn.bf16x2.f32 %r11780, %r32010, %r32009; 2026-02-21T10:18:33.6130768Z cvt.rn.bf16x2.f32 %r11781, %r32012, %r32011; 2026-02-21T10:18:33.6130845Z cvt.rn.bf16x2.f32 %r11782, %r32014, %r32013; 2026-02-21T10:18:33.6130928Z cvt.rn.bf16x2.f32 %r11783, %r32016, %r32015; 2026-02-21T10:18:33.6131007Z cvt.rn.bf16x2.f32 %r11784, %r32018, %r32017; 2026-02-21T10:18:33.6131085Z cvt.rn.bf16x2.f32 %r11785, %r32020, %r32019; 2026-02-21T10:18:33.6131163Z cvt.rn.bf16x2.f32 %r11786, %r32022, %r32021; 2026-02-21T10:18:33.6131245Z cvt.rn.bf16x2.f32 
%r11787, %r32024, %r32023; 2026-02-21T10:18:33.6131322Z cvt.rn.bf16x2.f32 %r11788, %r32026, %r32025; 2026-02-21T10:18:33.6131398Z cvt.rn.bf16x2.f32 %r11789, %r32028, %r32027; 2026-02-21T10:18:33.6131482Z cvt.rn.bf16x2.f32 %r11790, %r32030, %r32029; 2026-02-21T10:18:33.6131559Z cvt.rn.bf16x2.f32 %r11791, %r32032, %r32031; 2026-02-21T10:18:33.6131637Z cvt.rn.bf16x2.f32 %r11792, %r32034, %r32033; 2026-02-21T10:18:33.6131848Z .loc 1 98 43 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:98:43 2026-02-21T10:18:33.6131921Z bar.sync 0; 2026-02-21T10:18:33.6132122Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r42], {%r11729, %r11730, %r11731, %r11732}; 2026-02-21T10:18:33.6132309Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r43], {%r11745, %r11746, %r11747, %r11748}; 2026-02-21T10:18:33.6132594Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r44], {%r11761, %r11762, %r11763, %r11764}; 2026-02-21T10:18:33.6132777Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r45], {%r11777, %r11778, %r11779, %r11780}; 2026-02-21T10:18:33.6132959Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r46], {%r11733, %r11734, %r11735, %r11736}; 2026-02-21T10:18:33.6133147Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r47], {%r11749, %r11750, %r11751, %r11752}; 2026-02-21T10:18:33.6133330Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r48], {%r11765, %r11766, %r11767, %r11768}; 2026-02-21T10:18:33.6133510Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r49], {%r11781, %r11782, %r11783, %r11784}; 2026-02-21T10:18:33.6133701Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r50], {%r11737, %r11738, %r11739, %r11740}; 2026-02-21T10:18:33.6133884Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r51], {%r11753, %r11754, %r11755, %r11756}; 2026-02-21T10:18:33.6134069Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r52], {%r11769, %r11770, %r11771, %r11772}; 2026-02-21T10:18:33.6134306Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r53], {%r11785, %r11786, %r11787, %r11788}; 2026-02-21T10:18:33.6134498Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r54], {%r11741, %r11742, %r11743, %r11744}; 2026-02-21T10:18:33.6134678Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r55], {%r11757, %r11758, %r11759, %r11760}; 2026-02-21T10:18:33.6134908Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r56], {%r11773, %r11774, %r11775, %r11776}; 2026-02-21T10:18:33.6135095Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r57], {%r11789, %r11790, %r11791, %r11792}; 2026-02-21T10:18:33.6135159Z // begin inline asm 2026-02-21T10:18:33.6135242Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.6135308Z // end inline asm 2026-02-21T10:18:33.6135369Z bar.sync 0; 2026-02-21T10:18:33.6135439Z elect.sync %r11793|%p128, -1; 2026-02-21T10:18:33.6135528Z shfl.sync.idx.b32 %r11794, %r5, 0, 31, -1; 2026-02-21T10:18:33.6135602Z and.pred %p126, %p314, %p128; 2026-02-21T10:18:33.6135667Z and.b32 %r11795, %r11794, 1; 2026-02-21T10:18:33.6135737Z shl.b32 %r11796, %r11795, 14; 2026-02-21T10:18:33.6135807Z add.s32 %r21746, %r29377, %r11796; 2026-02-21T10:18:33.6140482Z shl.b32 %r637, %r11795, 6; 2026-02-21T10:18:33.6140590Z or.b32 %r11725, %r637, %r9279; 2026-02-21T10:18:33.6140660Z // begin inline asm 2026-02-21T10:18:33.6140939Z @%p126 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd298, {%r11725, %r11726}], [%r21746]; 2026-02-21T10:18:33.6141014Z // end inline asm 2026-02-21T10:18:33.6141095Z cp.async.bulk.commit_group; 2026-02-21T10:18:33.6141174Z cp.async.bulk.wait_group.read 0; 2026-02-21T10:18:33.6141237Z bar.sync 0; 2026-02-21T10:18:33.6141475Z .loc 1 31 88 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:31:88 2026-02-21T10:18:33.6141547Z add.s32 %r11798, %r31905, 1; 2026-02-21T10:18:33.6141767Z .loc 1 38 33 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:38:33 2026-02-21T10:18:33.6141839Z shr.u32 %r11799, %r11798, 9; 2026-02-21T10:18:33.6141907Z and.b32 %r11800, %r11799, 4194272; 2026-02-21T10:18:33.6142120Z .loc 1 39 39 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:39:39 2026-02-21T10:18:33.6142192Z sub.s32 %r11801, 10, %r11800; 2026-02-21T10:18:33.6142393Z .loc 1 40 45 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:40:45 2026-02-21T10:18:33.6142460Z and.b32 %r11802, %r11798, 16383; 2026-02-21T10:18:33.6142664Z .loc 1 41 51 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:41:51 2026-02-21T10:18:33.6142731Z div.s32 %r11803, %r11802, %r11801; 2026-02-21T10:18:33.6142930Z .loc 1 40 64 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:40:64 2026-02-21T10:18:33.6143007Z mul.lo.s32 %r11804, %r11803, %r11801; 2026-02-21T10:18:33.6143072Z sub.s32 %r11805, %r11802, %r11804; 2026-02-21T10:18:33.6143407Z .loc 1 40 30 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:40:30 2026-02-21T10:18:33.6143542Z add.s32 %r11806, %r11805, %r11800; 2026-02-21T10:18:33.6143738Z .loc 1 42 27 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:42:27 2026-02-21T10:18:33.6143802Z shl.b32 %r19299, %r11806, 7; 2026-02-21T10:18:33.6143995Z .loc 1 43 27 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:43:27 2026-02-21T10:18:33.6144063Z shl.b32 %r21745, %r11803, 7; 2026-02-21T10:18:33.6144257Z .loc 1 51 93 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:51:93 2026-02-21T10:18:33.6144320Z or.b32 %r11807, %r12, %r21745; 2026-02-21T10:18:33.6144401Z shl.b32 %r11808, %r11807, 13; 2026-02-21T10:18:33.6144476Z mul.wide.s32 %rd45, %r11808, 2; 2026-02-21T10:18:33.6144539Z or.b32 %r11809, %r11, %r21745; 2026-02-21T10:18:33.6144606Z shl.b32 %r11810, %r11809, 13; 2026-02-21T10:18:33.6144674Z mul.wide.s32 %rd46, %r11810, 2; 2026-02-21T10:18:33.6144745Z or.b32 %r11811, %r10, %r21745; 2026-02-21T10:18:33.6144807Z shl.b32 %r11812, %r11811, 13; 2026-02-21T10:18:33.6144973Z mul.wide.s32 %rd47, %r11812, 2; 2026-02-21T10:18:33.6145040Z or.b32 %r11813, %r9, %r21745; 2026-02-21T10:18:33.6145102Z shl.b32 %r11814, %r11813, 13; 2026-02-21T10:18:33.6145172Z 
mul.wide.s32 %rd48, %r11814, 2; 2026-02-21T10:18:33.6145290Z or.b32 %r11815, %r8, %r21745; 2026-02-21T10:18:33.6145354Z shl.b32 %r11816, %r11815, 13; 2026-02-21T10:18:33.6145420Z mul.wide.s32 %rd49, %r11816, 2; 2026-02-21T10:18:33.6145499Z or.b32 %r11817, %r7, %r21745; 2026-02-21T10:18:33.6145563Z shl.b32 %r11818, %r11817, 13; 2026-02-21T10:18:33.6145631Z mul.wide.s32 %rd50, %r11818, 2; 2026-02-21T10:18:33.6145699Z shl.b32 %r11819, %r11803, 20; 2026-02-21T10:18:33.6145763Z or.b32 %r11820, %r72, %r11819; 2026-02-21T10:18:33.6145829Z mul.wide.s32 %rd51, %r11820, 2; 2026-02-21T10:18:33.6145904Z or.b32 %r32163, %r74, %r11819; 2026-02-21T10:18:33.6145966Z or.b32 %r11821, %r73, %r11819; 2026-02-21T10:18:33.6146033Z mul.wide.s32 %rd52, %r11821, 2; 2026-02-21T10:18:33.6146100Z mov.b32 %r32164, 0f00000000; 2026-02-21T10:18:33.6146169Z mov.b64 %rd659, -96; 2026-02-21T10:18:33.6146232Z mov.b64 %rd658, %rd11; 2026-02-21T10:18:33.6146298Z mov.b32 %r32165, %r32164; 2026-02-21T10:18:33.6146364Z mov.b32 %r32166, %r32164; 2026-02-21T10:18:33.6146425Z mov.b32 %r32167, %r32164; 2026-02-21T10:18:33.6146648Z mov.b32 %r32168, %r32164; 2026-02-21T10:18:33.6146716Z mov.b32 %r32169, %r32164; 2026-02-21T10:18:33.6146783Z mov.b32 %r32170, %r32164; 2026-02-21T10:18:33.6146845Z mov.b32 %r32171, %r32164; 2026-02-21T10:18:33.6146905Z mov.b32 %r32172, %r32164; 2026-02-21T10:18:33.6146971Z mov.b32 %r32173, %r32164; 2026-02-21T10:18:33.6147029Z mov.b32 %r32174, %r32164; 2026-02-21T10:18:33.6147089Z mov.b32 %r32175, %r32164; 2026-02-21T10:18:33.6147148Z mov.b32 %r32176, %r32164; 2026-02-21T10:18:33.6147215Z mov.b32 %r32177, %r32164; 2026-02-21T10:18:33.6147275Z mov.b32 %r32178, %r32164; 2026-02-21T10:18:33.6147337Z mov.b32 %r32179, %r32164; 2026-02-21T10:18:33.6147403Z mov.b32 %r32180, %r32164; 2026-02-21T10:18:33.6147461Z mov.b32 %r32181, %r32164; 2026-02-21T10:18:33.6147523Z mov.b32 %r32182, %r32164; 2026-02-21T10:18:33.6147583Z mov.b32 %r32183, %r32164; 2026-02-21T10:18:33.6147648Z mov.b32 
%r32184, %r32164; 2026-02-21T10:18:33.6147710Z mov.b32 %r32185, %r32164; 2026-02-21T10:18:33.6147770Z mov.b32 %r32186, %r32164; 2026-02-21T10:18:33.6147837Z mov.b32 %r32187, %r32164; 2026-02-21T10:18:33.6147896Z mov.b32 %r32188, %r32164; 2026-02-21T10:18:33.6147956Z mov.b32 %r32189, %r32164; 2026-02-21T10:18:33.6148016Z mov.b32 %r32190, %r32164; 2026-02-21T10:18:33.6148082Z mov.b32 %r32191, %r32164; 2026-02-21T10:18:33.6148142Z mov.b32 %r32192, %r32164; 2026-02-21T10:18:33.6148202Z mov.b32 %r32193, %r32164; 2026-02-21T10:18:33.6148268Z mov.b32 %r32194, %r32164; 2026-02-21T10:18:33.6148486Z mov.b32 %r32195, %r32164; 2026-02-21T10:18:33.6148556Z mov.b32 %r32196, %r32164; 2026-02-21T10:18:33.6148690Z mov.b32 %r32197, %r32164; 2026-02-21T10:18:33.6148760Z mov.b32 %r32198, %r32164; 2026-02-21T10:18:33.6148820Z mov.b32 %r32199, %r32164; 2026-02-21T10:18:33.6148880Z mov.b32 %r32200, %r32164; 2026-02-21T10:18:33.6148944Z mov.b32 %r32201, %r32164; 2026-02-21T10:18:33.6149003Z mov.b32 %r32202, %r32164; 2026-02-21T10:18:33.6149066Z mov.b32 %r32203, %r32164; 2026-02-21T10:18:33.6149133Z mov.b32 %r32204, %r32164; 2026-02-21T10:18:33.6149193Z mov.b32 %r32205, %r32164; 2026-02-21T10:18:33.6149254Z mov.b32 %r32206, %r32164; 2026-02-21T10:18:33.6149314Z mov.b32 %r32207, %r32164; 2026-02-21T10:18:33.6149382Z mov.b32 %r32208, %r32164; 2026-02-21T10:18:33.6149446Z mov.b32 %r32209, %r32164; 2026-02-21T10:18:33.6149506Z mov.b32 %r32210, %r32164; 2026-02-21T10:18:33.6149572Z mov.b32 %r32211, %r32164; 2026-02-21T10:18:33.6149632Z mov.b32 %r32212, %r32164; 2026-02-21T10:18:33.6149694Z mov.b32 %r32213, %r32164; 2026-02-21T10:18:33.6149753Z mov.b32 %r32214, %r32164; 2026-02-21T10:18:33.6149826Z mov.b32 %r32215, %r32164; 2026-02-21T10:18:33.6149958Z mov.b32 %r32216, %r32164; 2026-02-21T10:18:33.6150023Z mov.b32 %r32217, %r32164; 2026-02-21T10:18:33.6150088Z mov.b32 %r32218, %r32164; 2026-02-21T10:18:33.6150149Z mov.b32 %r32219, %r32164; 2026-02-21T10:18:33.6150212Z mov.b32 %r32220, %r32164; 
2026-02-21T10:18:33.6150325Z mov.b32 %r32221, %r32164; 2026-02-21T10:18:33.6150399Z mov.b32 %r32222, %r32164; 2026-02-21T10:18:33.6150462Z mov.b32 %r32223, %r32164; 2026-02-21T10:18:33.6150522Z mov.b32 %r32224, %r32164; 2026-02-21T10:18:33.6150588Z mov.b32 %r32225, %r32164; 2026-02-21T10:18:33.6150648Z mov.b32 %r32226, %r32164; 2026-02-21T10:18:33.6150707Z mov.b32 %r32227, %r32164; 2026-02-21T10:18:33.6150770Z mov.b32 %r32228, %r32164; 2026-02-21T10:18:33.6150834Z mov.b32 %r32229, %r32164; 2026-02-21T10:18:33.6150893Z mov.b32 %r32230, %r32164; 2026-02-21T10:18:33.6150967Z mov.b32 %r32231, %r32164; 2026-02-21T10:18:33.6151035Z mov.b32 %r32232, %r32164; 2026-02-21T10:18:33.6151096Z mov.b32 %r32233, %r32164; 2026-02-21T10:18:33.6151155Z mov.b32 %r32234, %r32164; 2026-02-21T10:18:33.6151215Z mov.b32 %r32235, %r32164; 2026-02-21T10:18:33.6151282Z mov.b32 %r32236, %r32164; 2026-02-21T10:18:33.6151342Z mov.b32 %r32237, %r32164; 2026-02-21T10:18:33.6151401Z mov.b32 %r32238, %r32164; 2026-02-21T10:18:33.6151467Z mov.b32 %r32239, %r32164; 2026-02-21T10:18:33.6151530Z mov.b32 %r32240, %r32164; 2026-02-21T10:18:33.6151588Z mov.b32 %r32241, %r32164; 2026-02-21T10:18:33.6151647Z mov.b32 %r32242, %r32164; 2026-02-21T10:18:33.6151712Z mov.b32 %r32243, %r32164; 2026-02-21T10:18:33.6151772Z mov.b32 %r32244, %r32164; 2026-02-21T10:18:33.6151830Z mov.b32 %r32245, %r32164; 2026-02-21T10:18:33.6151895Z mov.b32 %r32246, %r32164; 2026-02-21T10:18:33.6151955Z mov.b32 %r32247, %r32164; 2026-02-21T10:18:33.6152015Z mov.b32 %r32248, %r32164; 2026-02-21T10:18:33.6152085Z mov.b32 %r32249, %r32164; 2026-02-21T10:18:33.6152145Z mov.b32 %r32250, %r32164; 2026-02-21T10:18:33.6152206Z mov.b32 %r32251, %r32164; 2026-02-21T10:18:33.6152268Z mov.b32 %r32252, %r32164; 2026-02-21T10:18:33.6152333Z mov.b32 %r32253, %r32164; 2026-02-21T10:18:33.6152393Z mov.b32 %r32254, %r32164; 2026-02-21T10:18:33.6152452Z mov.b32 %r32255, %r32164; 2026-02-21T10:18:33.6152517Z mov.b32 %r32256, %r32164; 
2026-02-21T10:18:33.6152578Z mov.b32 %r32257, %r32164; 2026-02-21T10:18:33.6152639Z mov.b32 %r32258, %r32164; 2026-02-21T10:18:33.6152698Z mov.b32 %r32259, %r32164; 2026-02-21T10:18:33.6152764Z mov.b32 %r32260, %r32164; 2026-02-21T10:18:33.6152824Z mov.b32 %r32261, %r32164; 2026-02-21T10:18:33.6152886Z mov.b32 %r32262, %r32164; 2026-02-21T10:18:33.6152949Z mov.b32 %r32263, %r32164; 2026-02-21T10:18:33.6153011Z mov.b32 %r32264, %r32164; 2026-02-21T10:18:33.6153070Z mov.b32 %r32265, %r32164; 2026-02-21T10:18:33.6153128Z mov.b32 %r32266, %r32164; 2026-02-21T10:18:33.6153264Z mov.b32 %r32267, %r32164; 2026-02-21T10:18:33.6153328Z mov.b32 %r32268, %r32164; 2026-02-21T10:18:33.6153437Z mov.b32 %r32269, %r32164; 2026-02-21T10:18:33.6153506Z mov.b32 %r32270, %r32164; 2026-02-21T10:18:33.6153566Z mov.b32 %r32271, %r32164; 2026-02-21T10:18:33.6153625Z mov.b32 %r32272, %r32164; 2026-02-21T10:18:33.6153684Z mov.b32 %r32273, %r32164; 2026-02-21T10:18:33.6153750Z mov.b32 %r32274, %r32164; 2026-02-21T10:18:33.6153808Z mov.b32 %r32275, %r32164; 2026-02-21T10:18:33.6153867Z mov.b32 %r32276, %r32164; 2026-02-21T10:18:33.6153931Z mov.b32 %r32277, %r32164; 2026-02-21T10:18:33.6153989Z mov.b32 %r32278, %r32164; 2026-02-21T10:18:33.6154050Z mov.b32 %r32279, %r32164; 2026-02-21T10:18:33.6154109Z mov.b32 %r32280, %r32164; 2026-02-21T10:18:33.6154169Z mov.b32 %r32281, %r32164; 2026-02-21T10:18:33.6154229Z mov.b32 %r32282, %r32164; 2026-02-21T10:18:33.6154288Z mov.b32 %r32283, %r32164; 2026-02-21T10:18:33.6154353Z mov.b32 %r32284, %r32164; 2026-02-21T10:18:33.6154415Z mov.b32 %r32285, %r32164; 2026-02-21T10:18:33.6154476Z mov.b32 %r32286, %r32164; 2026-02-21T10:18:33.6154543Z mov.b32 %r32287, %r32164; 2026-02-21T10:18:33.6154658Z mov.b32 %r32288, %r32164; 2026-02-21T10:18:33.6154720Z mov.b32 %r32289, %r32164; 2026-02-21T10:18:33.6154781Z mov.b32 %r32290, %r32164; 2026-02-21T10:18:33.6154845Z mov.b32 %r32291, %r32164; 2026-02-21T10:18:33.6155027Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 
2026-02-21T10:18:33.6155141Z // => This Inner Loop Header: Depth=2 2026-02-21T10:18:33.6155356Z .loc 1 58 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:32 2026-02-21T10:18:33.6155425Z add.s64 %rd301, %rd658, %rd51; 2026-02-21T10:18:33.6155489Z add.s64 %rd304, %rd658, %rd50; 2026-02-21T10:18:33.6155551Z add.s64 %rd307, %rd658, %rd49; 2026-02-21T10:18:33.6155621Z add.s64 %rd310, %rd658, %rd48; 2026-02-21T10:18:33.6155684Z add.s64 %rd313, %rd658, %rd47; 2026-02-21T10:18:33.6155750Z add.s64 %rd316, %rd658, %rd46; 2026-02-21T10:18:33.6155818Z add.s64 %rd319, %rd658, %rd45; 2026-02-21T10:18:33.6156022Z .loc 1 58 80 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:80 2026-02-21T10:18:33.6156087Z add.s64 %rd322, %rd658, %rd52; 2026-02-21T10:18:33.6156156Z // begin inline asm 2026-02-21T10:18:33.6156217Z mov.u64 %rd300, 0x0; 2026-02-21T10:18:33.6156353Z createpolicy.fractional.L2::evict_first.b64 %rd300, 1.0; 2026-02-21T10:18:33.6156414Z // end inline asm 2026-02-21T10:18:33.6156615Z // begin inline asm 2026-02-21T10:18:33.6156683Z mov.u32 %r11822, 0x0; 2026-02-21T10:18:33.6156742Z mov.u32 %r11823, 0x0; 2026-02-21T10:18:33.6156806Z mov.u32 %r11824, 0x0; 2026-02-21T10:18:33.6156864Z mov.u32 %r11825, 0x0; 2026-02-21T10:18:33.6157121Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r11822, %r11823, %r11824, %r11825 }, [ %rd301 + 0 ], %rd300; 2026-02-21T10:18:33.6157187Z // end inline asm 2026-02-21T10:18:33.6157249Z // begin inline asm 2026-02-21T10:18:33.6157310Z mov.u64 %rd303, 0x0; 2026-02-21T10:18:33.6157441Z createpolicy.fractional.L2::evict_first.b64 %rd303, 1.0; 2026-02-21T10:18:33.6157511Z // end inline asm 2026-02-21T10:18:33.6157571Z // begin inline asm 2026-02-21T10:18:33.6157630Z mov.u32 %r11826, 0x0; 2026-02-21T10:18:33.6157698Z mov.u32 %r11827, 0x0; 2026-02-21T10:18:33.6157757Z mov.u32 %r11828, 0x0; 2026-02-21T10:18:33.6157817Z mov.u32 %r11829, 0x0; 2026-02-21T10:18:33.6158053Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { 
%r11826, %r11827, %r11828, %r11829 }, [ %rd304 + 0 ], %rd303; 2026-02-21T10:18:33.6158118Z // end inline asm 2026-02-21T10:18:33.6158177Z // begin inline asm 2026-02-21T10:18:33.6158239Z mov.u64 %rd306, 0x0; 2026-02-21T10:18:33.6158369Z createpolicy.fractional.L2::evict_first.b64 %rd306, 1.0; 2026-02-21T10:18:33.6158427Z // end inline asm 2026-02-21T10:18:33.6158487Z // begin inline asm 2026-02-21T10:18:33.6158556Z mov.u32 %r11830, 0x0; 2026-02-21T10:18:33.6158696Z mov.u32 %r11831, 0x0; 2026-02-21T10:18:33.6158755Z mov.u32 %r11832, 0x0; 2026-02-21T10:18:33.6158876Z mov.u32 %r11833, 0x0; 2026-02-21T10:18:33.6159120Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r11830, %r11831, %r11832, %r11833 }, [ %rd307 + 0 ], %rd306; 2026-02-21T10:18:33.6159178Z // end inline asm 2026-02-21T10:18:33.6159238Z // begin inline asm 2026-02-21T10:18:33.6159302Z mov.u64 %rd309, 0x0; 2026-02-21T10:18:33.6159426Z createpolicy.fractional.L2::evict_first.b64 %rd309, 1.0; 2026-02-21T10:18:33.6159485Z // end inline asm 2026-02-21T10:18:33.6159544Z // begin inline asm 2026-02-21T10:18:33.6159606Z mov.u32 %r11834, 0x0; 2026-02-21T10:18:33.6159664Z mov.u32 %r11835, 0x0; 2026-02-21T10:18:33.6159723Z mov.u32 %r11836, 0x0; 2026-02-21T10:18:33.6159788Z mov.u32 %r11837, 0x0; 2026-02-21T10:18:33.6160012Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r11834, %r11835, %r11836, %r11837 }, [ %rd310 + 0 ], %rd309; 2026-02-21T10:18:33.6160070Z // end inline asm 2026-02-21T10:18:33.6160149Z // begin inline asm 2026-02-21T10:18:33.6160212Z mov.u64 %rd312, 0x0; 2026-02-21T10:18:33.6160331Z createpolicy.fractional.L2::evict_first.b64 %rd312, 1.0; 2026-02-21T10:18:33.6160454Z // end inline asm 2026-02-21T10:18:33.6160522Z // begin inline asm 2026-02-21T10:18:33.6160580Z mov.u32 %r11838, 0x0; 2026-02-21T10:18:33.6160638Z mov.u32 %r11839, 0x0; 2026-02-21T10:18:33.6160701Z mov.u32 %r11840, 0x0; 2026-02-21T10:18:33.6160813Z mov.u32 %r11841, 0x0; 2026-02-21T10:18:33.6161037Z 
ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r11838, %r11839, %r11840, %r11841 }, [ %rd313 + 0 ], %rd312; 2026-02-21T10:18:33.6161101Z // end inline asm 2026-02-21T10:18:33.6161160Z // begin inline asm 2026-02-21T10:18:33.6161220Z mov.u64 %rd315, 0x0; 2026-02-21T10:18:33.6161336Z createpolicy.fractional.L2::evict_first.b64 %rd315, 1.0; 2026-02-21T10:18:33.6161400Z // end inline asm 2026-02-21T10:18:33.6161459Z // begin inline asm 2026-02-21T10:18:33.6161517Z mov.u32 %r11842, 0x0; 2026-02-21T10:18:33.6161595Z mov.u32 %r11843, 0x0; 2026-02-21T10:18:33.6161658Z mov.u32 %r11844, 0x0; 2026-02-21T10:18:33.6161716Z mov.u32 %r11845, 0x0; 2026-02-21T10:18:33.6161940Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r11842, %r11843, %r11844, %r11845 }, [ %rd316 + 0 ], %rd315; 2026-02-21T10:18:33.6162005Z // end inline asm 2026-02-21T10:18:33.6162063Z // begin inline asm 2026-02-21T10:18:33.6162121Z mov.u64 %rd318, 0x0; 2026-02-21T10:18:33.6162244Z createpolicy.fractional.L2::evict_first.b64 %rd318, 1.0; 2026-02-21T10:18:33.6162301Z // end inline asm 2026-02-21T10:18:33.6162361Z // begin inline asm 2026-02-21T10:18:33.6162424Z mov.u32 %r11846, 0x0; 2026-02-21T10:18:33.6162482Z mov.u32 %r11847, 0x0; 2026-02-21T10:18:33.6162541Z mov.u32 %r11848, 0x0; 2026-02-21T10:18:33.6162598Z mov.u32 %r11849, 0x0; 2026-02-21T10:18:33.6162824Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r11846, %r11847, %r11848, %r11849 }, [ %rd319 + 0 ], %rd318; 2026-02-21T10:18:33.6162891Z // end inline asm 2026-02-21T10:18:33.6162955Z // begin inline asm 2026-02-21T10:18:33.6163020Z mov.u64 %rd321, 0x0; 2026-02-21T10:18:33.6163136Z createpolicy.fractional.L2::evict_first.b64 %rd321, 1.0; 2026-02-21T10:18:33.6163196Z // end inline asm 2026-02-21T10:18:33.6163254Z // begin inline asm 2026-02-21T10:18:33.6163316Z mov.u32 %r11850, 0x0; 2026-02-21T10:18:33.6163375Z mov.u32 %r11851, 0x0; 2026-02-21T10:18:33.6163432Z mov.u32 %r11852, 0x0; 2026-02-21T10:18:33.6163495Z mov.u32 %r11853, 0x0; 
2026-02-21T10:18:33.6163716Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r11850, %r11851, %r11852, %r11853 }, [ %rd322 + 0 ], %rd321; 2026-02-21T10:18:33.6163785Z // end inline asm 2026-02-21T10:18:33.6164006Z .loc 1 62 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:62:32 2026-02-21T10:18:33.6164067Z bar.sync 0; 2026-02-21T10:18:33.6164154Z st.shared.v2.b32 [%r16], {%r11822, %r11823}; 2026-02-21T10:18:33.6164250Z st.shared.v2.b32 [%r16+2048], {%r11826, %r11827}; 2026-02-21T10:18:33.6164427Z st.shared.v2.b32 [%r16+4096], {%r11830, %r11831}; 2026-02-21T10:18:33.6164582Z st.shared.v2.b32 [%r16+6144], {%r11834, %r11835}; 2026-02-21T10:18:33.6164665Z st.shared.v2.b32 [%r16+8192], {%r11838, %r11839}; 2026-02-21T10:18:33.6164773Z st.shared.v2.b32 [%r16+10240], {%r11842, %r11843}; 2026-02-21T10:18:33.6164865Z st.shared.v2.b32 [%r16+12288], {%r11846, %r11847}; 2026-02-21T10:18:33.6164950Z st.shared.v2.b32 [%r16+14336], {%r11850, %r11851}; 2026-02-21T10:18:33.6165037Z st.shared.v2.b32 [%r17], {%r11824, %r11825}; 2026-02-21T10:18:33.6165122Z st.shared.v2.b32 [%r17+2048], {%r11828, %r11829}; 2026-02-21T10:18:33.6165206Z st.shared.v2.b32 [%r17+4096], {%r11832, %r11833}; 2026-02-21T10:18:33.6165290Z st.shared.v2.b32 [%r17+6144], {%r11836, %r11837}; 2026-02-21T10:18:33.6165381Z st.shared.v2.b32 [%r17+8192], {%r11840, %r11841}; 2026-02-21T10:18:33.6165466Z st.shared.v2.b32 [%r17+10240], {%r11844, %r11845}; 2026-02-21T10:18:33.6165553Z st.shared.v2.b32 [%r17+12288], {%r11848, %r11849}; 2026-02-21T10:18:33.6165642Z st.shared.v2.b32 [%r17+14336], {%r11852, %r11853}; 2026-02-21T10:18:33.6165703Z bar.sync 0; 2026-02-21T10:18:33.6165830Z ld.shared.b16 %rs897, [%r58]; 2026-02-21T10:18:33.6165911Z ld.shared.b16 %rs898, [%r58+1024]; 2026-02-21T10:18:33.6165981Z ld.shared.b16 %rs899, [%r58+64]; 2026-02-21T10:18:33.6166051Z ld.shared.b16 %rs900, [%r58+1088]; 2026-02-21T10:18:33.6166119Z ld.shared.b16 %rs901, [%r58+8192]; 2026-02-21T10:18:33.6166225Z ld.shared.b16 
%rs902, [%r58+9216]; 2026-02-21T10:18:33.6166292Z ld.shared.b16 %rs903, [%r58+8256]; 2026-02-21T10:18:33.6166357Z ld.shared.b16 %rs904, [%r58+9280]; 2026-02-21T10:18:33.6166427Z ld.shared.b16 %rs905, [%r59]; 2026-02-21T10:18:33.6166613Z ld.shared.b16 %rs906, [%r59+1024]; 2026-02-21T10:18:33.6166682Z ld.shared.b16 %rs907, [%r59+64]; 2026-02-21T10:18:33.6166753Z ld.shared.b16 %rs908, [%r59+1088]; 2026-02-21T10:18:33.6166819Z ld.shared.b16 %rs909, [%r59+8192]; 2026-02-21T10:18:33.6166886Z ld.shared.b16 %rs910, [%r59+9216]; 2026-02-21T10:18:33.6166949Z ld.shared.b16 %rs911, [%r59+8256]; 2026-02-21T10:18:33.6167022Z ld.shared.b16 %rs912, [%r59+9280]; 2026-02-21T10:18:33.6167089Z ld.shared.b16 %rs913, [%r60]; 2026-02-21T10:18:33.6167151Z ld.shared.b16 %rs914, [%r60+1024]; 2026-02-21T10:18:33.6167221Z ld.shared.b16 %rs915, [%r60+64]; 2026-02-21T10:18:33.6167284Z ld.shared.b16 %rs916, [%r60+1088]; 2026-02-21T10:18:33.6167349Z ld.shared.b16 %rs917, [%r60+8192]; 2026-02-21T10:18:33.6167416Z ld.shared.b16 %rs918, [%r60+9216]; 2026-02-21T10:18:33.6167478Z ld.shared.b16 %rs919, [%r60+8256]; 2026-02-21T10:18:33.6167541Z ld.shared.b16 %rs920, [%r60+9280]; 2026-02-21T10:18:33.6167604Z ld.shared.b16 %rs921, [%r61]; 2026-02-21T10:18:33.6167672Z ld.shared.b16 %rs922, [%r61+1024]; 2026-02-21T10:18:33.6167736Z ld.shared.b16 %rs923, [%r61+64]; 2026-02-21T10:18:33.6167800Z ld.shared.b16 %rs924, [%r61+1088]; 2026-02-21T10:18:33.6167869Z ld.shared.b16 %rs925, [%r61+8192]; 2026-02-21T10:18:33.6167934Z ld.shared.b16 %rs926, [%r61+9216]; 2026-02-21T10:18:33.6167999Z ld.shared.b16 %rs927, [%r61+8256]; 2026-02-21T10:18:33.6168065Z ld.shared.b16 %rs928, [%r61+9280]; 2026-02-21T10:18:33.6168133Z ld.shared.b16 %rs929, [%r62]; 2026-02-21T10:18:33.6168197Z ld.shared.b16 %rs930, [%r62+1024]; 2026-02-21T10:18:33.6168273Z ld.shared.b16 %rs931, [%r62+64]; 2026-02-21T10:18:33.6168348Z ld.shared.b16 %rs932, [%r62+1088]; 2026-02-21T10:18:33.6168413Z ld.shared.b16 %rs933, [%r62+8192]; 
2026-02-21T10:18:33.6168476Z ld.shared.b16 %rs934, [%r62+9216]; 2026-02-21T10:18:33.6168545Z ld.shared.b16 %rs935, [%r62+8256]; 2026-02-21T10:18:33.6168609Z ld.shared.b16 %rs936, [%r62+9280]; 2026-02-21T10:18:33.6168675Z ld.shared.b16 %rs937, [%r63]; 2026-02-21T10:18:33.6168742Z ld.shared.b16 %rs938, [%r63+1024]; 2026-02-21T10:18:33.6168812Z ld.shared.b16 %rs939, [%r63+64]; 2026-02-21T10:18:33.6168875Z ld.shared.b16 %rs940, [%r63+1088]; 2026-02-21T10:18:33.6168940Z ld.shared.b16 %rs941, [%r63+8192]; 2026-02-21T10:18:33.6169089Z ld.shared.b16 %rs942, [%r63+9216]; 2026-02-21T10:18:33.6169215Z ld.shared.b16 %rs943, [%r63+8256]; 2026-02-21T10:18:33.6169281Z ld.shared.b16 %rs944, [%r63+9280]; 2026-02-21T10:18:33.6169344Z ld.shared.b16 %rs945, [%r64]; 2026-02-21T10:18:33.6169415Z ld.shared.b16 %rs946, [%r64+1024]; 2026-02-21T10:18:33.6169478Z ld.shared.b16 %rs947, [%r64+64]; 2026-02-21T10:18:33.6169541Z ld.shared.b16 %rs948, [%r64+1088]; 2026-02-21T10:18:33.6169618Z ld.shared.b16 %rs949, [%r64+8192]; 2026-02-21T10:18:33.6169691Z ld.shared.b16 %rs950, [%r64+9216]; 2026-02-21T10:18:33.6169754Z ld.shared.b16 %rs951, [%r64+8256]; 2026-02-21T10:18:33.6169819Z ld.shared.b16 %rs952, [%r64+9280]; 2026-02-21T10:18:33.6169887Z ld.shared.b16 %rs953, [%r65]; 2026-02-21T10:18:33.6169950Z ld.shared.b16 %rs954, [%r65+1024]; 2026-02-21T10:18:33.6170015Z ld.shared.b16 %rs955, [%r65+64]; 2026-02-21T10:18:33.6170083Z ld.shared.b16 %rs956, [%r65+1088]; 2026-02-21T10:18:33.6170150Z ld.shared.b16 %rs957, [%r65+8192]; 2026-02-21T10:18:33.6170215Z ld.shared.b16 %rs958, [%r65+9216]; 2026-02-21T10:18:33.6170285Z ld.shared.b16 %rs959, [%r65+8256]; 2026-02-21T10:18:33.6170412Z ld.shared.b16 %rs960, [%r65+9280]; 2026-02-21T10:18:33.6170478Z cvt.f32.bf16 %r11991, %rs897; 2026-02-21T10:18:33.6170539Z cvt.f32.bf16 %r11992, %rs898; 2026-02-21T10:18:33.6170603Z cvt.f32.bf16 %r11993, %rs905; 2026-02-21T10:18:33.6170662Z cvt.f32.bf16 %r11994, %rs906; 2026-02-21T10:18:33.6170782Z cvt.f32.bf16 %r12123, 
%rs913; 2026-02-21T10:18:33.6170857Z cvt.f32.bf16 %r12124, %rs914; 2026-02-21T10:18:33.6170917Z cvt.f32.bf16 %r12125, %rs921; 2026-02-21T10:18:33.6170976Z cvt.f32.bf16 %r12126, %rs922; 2026-02-21T10:18:33.6171037Z cvt.f32.bf16 %r12255, %rs929; 2026-02-21T10:18:33.6171102Z cvt.f32.bf16 %r12256, %rs930; 2026-02-21T10:18:33.6171161Z cvt.f32.bf16 %r12257, %rs937; 2026-02-21T10:18:33.6171221Z cvt.f32.bf16 %r12258, %rs938; 2026-02-21T10:18:33.6171288Z cvt.f32.bf16 %r12387, %rs945; 2026-02-21T10:18:33.6171351Z cvt.f32.bf16 %r12388, %rs946; 2026-02-21T10:18:33.6171412Z cvt.f32.bf16 %r12389, %rs953; 2026-02-21T10:18:33.6171475Z cvt.f32.bf16 %r12390, %rs954; 2026-02-21T10:18:33.6171542Z cvt.f32.bf16 %r12519, %rs899; 2026-02-21T10:18:33.6171604Z cvt.f32.bf16 %r12520, %rs900; 2026-02-21T10:18:33.6171664Z cvt.f32.bf16 %r12521, %rs907; 2026-02-21T10:18:33.6171729Z cvt.f32.bf16 %r12522, %rs908; 2026-02-21T10:18:33.6171789Z cvt.f32.bf16 %r12651, %rs915; 2026-02-21T10:18:33.6171850Z cvt.f32.bf16 %r12652, %rs916; 2026-02-21T10:18:33.6171916Z cvt.f32.bf16 %r12653, %rs923; 2026-02-21T10:18:33.6171979Z cvt.f32.bf16 %r12654, %rs924; 2026-02-21T10:18:33.6172038Z cvt.f32.bf16 %r12783, %rs931; 2026-02-21T10:18:33.6172098Z cvt.f32.bf16 %r12784, %rs932; 2026-02-21T10:18:33.6172163Z cvt.f32.bf16 %r12785, %rs939; 2026-02-21T10:18:33.6172221Z cvt.f32.bf16 %r12786, %rs940; 2026-02-21T10:18:33.6172288Z cvt.f32.bf16 %r12915, %rs947; 2026-02-21T10:18:33.6172348Z cvt.f32.bf16 %r12916, %rs948; 2026-02-21T10:18:33.6172408Z cvt.f32.bf16 %r12917, %rs955; 2026-02-21T10:18:33.6172475Z cvt.f32.bf16 %r12918, %rs956; 2026-02-21T10:18:33.6172546Z cvt.f32.bf16 %r13047, %rs901; 2026-02-21T10:18:33.6172609Z cvt.f32.bf16 %r13048, %rs902; 2026-02-21T10:18:33.6172670Z cvt.f32.bf16 %r13049, %rs909; 2026-02-21T10:18:33.6172736Z cvt.f32.bf16 %r13050, %rs910; 2026-02-21T10:18:33.6172797Z cvt.f32.bf16 %r13179, %rs917; 2026-02-21T10:18:33.6172858Z cvt.f32.bf16 %r13180, %rs918; 2026-02-21T10:18:33.6172924Z cvt.f32.bf16 
%r13181, %rs925; 2026-02-21T10:18:33.6172984Z cvt.f32.bf16 %r13182, %rs926; 2026-02-21T10:18:33.6173044Z cvt.f32.bf16 %r13311, %rs933; 2026-02-21T10:18:33.6173103Z cvt.f32.bf16 %r13312, %rs934; 2026-02-21T10:18:33.6173166Z cvt.f32.bf16 %r13313, %rs941; 2026-02-21T10:18:33.6173225Z cvt.f32.bf16 %r13314, %rs942; 2026-02-21T10:18:33.6173283Z cvt.f32.bf16 %r13443, %rs949; 2026-02-21T10:18:33.6173344Z cvt.f32.bf16 %r13444, %rs950; 2026-02-21T10:18:33.6173404Z cvt.f32.bf16 %r13445, %rs957; 2026-02-21T10:18:33.6173534Z cvt.f32.bf16 %r13446, %rs958; 2026-02-21T10:18:33.6173641Z cvt.f32.bf16 %r13575, %rs903; 2026-02-21T10:18:33.6173709Z cvt.f32.bf16 %r13576, %rs904; 2026-02-21T10:18:33.6173769Z cvt.f32.bf16 %r13577, %rs911; 2026-02-21T10:18:33.6173829Z cvt.f32.bf16 %r13578, %rs912; 2026-02-21T10:18:33.6173891Z cvt.f32.bf16 %r13707, %rs919; 2026-02-21T10:18:33.6173950Z cvt.f32.bf16 %r13708, %rs920; 2026-02-21T10:18:33.6174010Z cvt.f32.bf16 %r13709, %rs927; 2026-02-21T10:18:33.6174069Z cvt.f32.bf16 %r13710, %rs928; 2026-02-21T10:18:33.6174131Z cvt.f32.bf16 %r13839, %rs935; 2026-02-21T10:18:33.6174190Z cvt.f32.bf16 %r13840, %rs936; 2026-02-21T10:18:33.6174249Z cvt.f32.bf16 %r13841, %rs943; 2026-02-21T10:18:33.6174315Z cvt.f32.bf16 %r13842, %rs944; 2026-02-21T10:18:33.6174382Z cvt.f32.bf16 %r13971, %rs951; 2026-02-21T10:18:33.6174440Z cvt.f32.bf16 %r13972, %rs952; 2026-02-21T10:18:33.6174503Z cvt.f32.bf16 %r13973, %rs959; 2026-02-21T10:18:33.6174564Z cvt.f32.bf16 %r13974, %rs960; 2026-02-21T10:18:33.6174778Z .loc 1 64 33 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:64:33 2026-02-21T10:18:33.6174882Z bar.sync 0; 2026-02-21T10:18:33.6174948Z // begin inline asm 2026-02-21T10:18:33.6175062Z @%p222 mbarrier.init.shared::cta.b64 [%r19296], 1; 2026-02-21T10:18:33.6175119Z // end inline asm 2026-02-21T10:18:33.6175178Z bar.sync 0; 2026-02-21T10:18:33.6175237Z // begin inline asm 2026-02-21T10:18:33.6175416Z @%p222 mbarrier.arrive.expect_tx.shared.b64 _, [%r19296], 4096; 
2026-02-21T10:18:33.6175475Z // end inline asm 2026-02-21T10:18:33.6175536Z // begin inline asm 2026-02-21T10:18:33.6175613Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.6175670Z // end inline asm 2026-02-21T10:18:33.6175729Z bar.sync 0; 2026-02-21T10:18:33.6175797Z elect.sync %r19067|%p190, -1; 2026-02-21T10:18:33.6175867Z and.pred %p131, %p1, %p190; 2026-02-21T10:18:33.6175932Z add.s64 %rd55, %rd659, 96; 2026-02-21T10:18:33.6176001Z cvt.u32.u64 %r11858, %rd55; 2026-02-21T10:18:33.6176061Z // begin inline asm 2026-02-21T10:18:33.6176417Z @%p131 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r29377], [%rd633, {%r19299, %r11858}], [%r19296]; 2026-02-21T10:18:33.6176605Z // end inline asm 2026-02-21T10:18:33.6176664Z bar.sync 0; 2026-02-21T10:18:33.6176723Z mov.b32 %r18935, 0; 2026-02-21T10:18:33.6176786Z // begin inline asm 2026-02-21T10:18:33.6176838Z 2026-02-21T10:18:33.6176891Z { 2026-02-21T10:18:33.6176956Z .reg .pred complete; 2026-02-21T10:18:33.6177015Z waitLoop: 2026-02-21T10:18:33.6177168Z mbarrier.try_wait.parity.shared.b64 complete, [%r19296], %r18935; 2026-02-21T10:18:33.6177239Z @!complete bra.uni waitLoop; 2026-02-21T10:18:33.6177292Z } 2026-02-21T10:18:33.6177297Z 2026-02-21T10:18:33.6177354Z // end inline asm 2026-02-21T10:18:33.6177408Z bar.sync 0; 2026-02-21T10:18:33.6177466Z // begin inline asm 2026-02-21T10:18:33.6177568Z @%p222 mbarrier.inval.shared::cta.b64 [%r19296]; 2026-02-21T10:18:33.6177628Z // end inline asm 2026-02-21T10:18:33.6177843Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6177917Z ld.shared.s8 %rs961, [%r26]; 2026-02-21T10:18:33.6178115Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6178181Z shl.b16 %rs962, %rs961, 4; 2026-02-21T10:18:33.6178382Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6178452Z ld.shared.s8 %rs963, [%r27+128]; 
2026-02-21T10:18:33.6178659Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6178728Z shl.b16 %rs964, %rs963, 4; 2026-02-21T10:18:33.6178925Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6178990Z ld.shared.s8 %rs965, [%r28+256]; 2026-02-21T10:18:33.6179280Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6179411Z shl.b16 %rs966, %rs965, 4; 2026-02-21T10:18:33.6179610Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6179677Z ld.shared.s8 %rs967, [%r29+384]; 2026-02-21T10:18:33.6179877Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6179939Z shl.b16 %rs968, %rs967, 4; 2026-02-21T10:18:33.6180132Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6180204Z ld.shared.s8 %rs969, [%r30+512]; 2026-02-21T10:18:33.6180394Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6180456Z shl.b16 %rs970, %rs969, 4; 2026-02-21T10:18:33.6180653Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6180720Z ld.shared.s8 %rs971, [%r31+640]; 2026-02-21T10:18:33.6180983Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6181049Z shl.b16 %rs972, %rs971, 4; 2026-02-21T10:18:33.6181249Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6181369Z ld.shared.s8 %rs973, [%r32+768]; 2026-02-21T10:18:33.6181566Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6181635Z shl.b16 %rs974, %rs973, 4; 2026-02-21T10:18:33.6181828Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6181893Z ld.shared.s8 %rs975, [%r33+896]; 
2026-02-21T10:18:33.6182096Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6182165Z shl.b16 %rs976, %rs975, 4; 2026-02-21T10:18:33.6182363Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6182434Z ld.shared.s8 %rs977, [%r26+1024]; 2026-02-21T10:18:33.6182632Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6182692Z shl.b16 %rs978, %rs977, 4; 2026-02-21T10:18:33.6182887Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6182957Z ld.shared.s8 %rs979, [%r27+1152]; 2026-02-21T10:18:33.6183150Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6183211Z shl.b16 %rs980, %rs979, 4; 2026-02-21T10:18:33.6183406Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6183470Z ld.shared.s8 %rs981, [%r28+1280]; 2026-02-21T10:18:33.6183666Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6183733Z shl.b16 %rs982, %rs981, 4; 2026-02-21T10:18:33.6183926Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6183991Z ld.shared.s8 %rs983, [%r29+1408]; 2026-02-21T10:18:33.6184191Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6184251Z shl.b16 %rs984, %rs983, 4; 2026-02-21T10:18:33.6184450Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6184514Z ld.shared.s8 %rs985, [%r30+1536]; 2026-02-21T10:18:33.6184709Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6184769Z shl.b16 %rs986, %rs985, 4; 2026-02-21T10:18:33.6184961Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6185146Z ld.shared.s8 %rs987, [%r31+1664]; 
2026-02-21T10:18:33.6185340Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6185402Z shl.b16 %rs988, %rs987, 4; 2026-02-21T10:18:33.6185599Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6185665Z ld.shared.s8 %rs989, [%r32+1792]; 2026-02-21T10:18:33.6185860Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6185923Z shl.b16 %rs990, %rs989, 4; 2026-02-21T10:18:33.6186117Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6186181Z ld.shared.s8 %rs991, [%r33+1920]; 2026-02-21T10:18:33.6186380Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6186574Z shl.b16 %rs992, %rs991, 4; 2026-02-21T10:18:33.6186850Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6186917Z ld.shared.s8 %rs993, [%r26+2048]; 2026-02-21T10:18:33.6187127Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6187189Z shl.b16 %rs994, %rs993, 4; 2026-02-21T10:18:33.6187440Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6187514Z ld.shared.s8 %rs995, [%r27+2176]; 2026-02-21T10:18:33.6187708Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6187768Z shl.b16 %rs996, %rs995, 4; 2026-02-21T10:18:33.6187964Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6188031Z ld.shared.s8 %rs997, [%r28+2304]; 2026-02-21T10:18:33.6188224Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6188293Z shl.b16 %rs998, %rs997, 4; 2026-02-21T10:18:33.6188554Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6188621Z ld.shared.s8 %rs999, [%r29+2432]; 
2026-02-21T10:18:33.6188815Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6188883Z shl.b16 %rs1000, %rs999, 4; 2026-02-21T10:18:33.6189077Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6189146Z ld.shared.s8 %rs1001, [%r30+2560]; 2026-02-21T10:18:33.6189341Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6189405Z shl.b16 %rs1002, %rs1001, 4; 2026-02-21T10:18:33.6189600Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6189670Z ld.shared.s8 %rs1003, [%r31+2688]; 2026-02-21T10:18:33.6189865Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6189930Z shl.b16 %rs1004, %rs1003, 4; 2026-02-21T10:18:33.6190126Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6190190Z ld.shared.s8 %rs1005, [%r32+2816]; 2026-02-21T10:18:33.6190384Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6190446Z shl.b16 %rs1006, %rs1005, 4; 2026-02-21T10:18:33.6190643Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6190708Z ld.shared.s8 %rs1007, [%r33+2944]; 2026-02-21T10:18:33.6190900Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6191106Z shl.b16 %rs1008, %rs1007, 4; 2026-02-21T10:18:33.6191307Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6191373Z ld.shared.s8 %rs1009, [%r26+3072]; 2026-02-21T10:18:33.6191569Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6191633Z shl.b16 %rs1010, %rs1009, 4; 2026-02-21T10:18:33.6191827Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6191893Z ld.shared.s8 
%rs1011, [%r27+3200]; 2026-02-21T10:18:33.6192085Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6192147Z shl.b16 %rs1012, %rs1011, 4; 2026-02-21T10:18:33.6192344Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6192413Z ld.shared.s8 %rs1013, [%r28+3328]; 2026-02-21T10:18:33.6192655Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6192730Z shl.b16 %rs1014, %rs1013, 4; 2026-02-21T10:18:33.6192930Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6192994Z ld.shared.s8 %rs1015, [%r29+3456]; 2026-02-21T10:18:33.6193231Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6193299Z shl.b16 %rs1016, %rs1015, 4; 2026-02-21T10:18:33.6193491Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6193558Z ld.shared.s8 %rs1017, [%r30+3584]; 2026-02-21T10:18:33.6193756Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6193819Z shl.b16 %rs1018, %rs1017, 4; 2026-02-21T10:18:33.6194011Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6194079Z ld.shared.s8 %rs1019, [%r31+3712]; 2026-02-21T10:18:33.6194272Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6194332Z shl.b16 %rs1020, %rs1019, 4; 2026-02-21T10:18:33.6194526Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6194603Z ld.shared.s8 %rs1021, [%r32+3840]; 2026-02-21T10:18:33.6194795Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6194856Z shl.b16 %rs1022, %rs1021, 4; 2026-02-21T10:18:33.6195052Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 
2026-02-21T10:18:33.6195116Z ld.shared.s8 %rs1023, [%r33+3968]; 2026-02-21T10:18:33.6195311Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6195387Z shl.b16 %rs1024, %rs1023, 4; 2026-02-21T10:18:33.6195584Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6195647Z cvt.s16.s8 %rs1025, %rs962; 2026-02-21T10:18:33.6195707Z shr.s16 %rs1026, %rs1025, 4; 2026-02-21T10:18:33.6195772Z cvt.s16.s8 %rs1027, %rs964; 2026-02-21T10:18:33.6195835Z shr.s16 %rs1028, %rs1027, 4; 2026-02-21T10:18:33.6195896Z shr.s16 %rs1029, %rs961, 4; 2026-02-21T10:18:33.6195958Z shr.s16 %rs1030, %rs963, 4; 2026-02-21T10:18:33.6196152Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6196218Z cvt.rn.f32.s16 %r19068, %rs1030; 2026-02-21T10:18:33.6196285Z cvt.rn.f32.s16 %r19069, %rs1029; 2026-02-21T10:18:33.6196347Z cvt.rn.f32.s16 %r19070, %rs1028; 2026-02-21T10:18:33.6196592Z cvt.rn.f32.s16 %r19071, %rs1026; 2026-02-21T10:18:33.6196794Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6196934Z cvt.s16.s8 %rs1031, %rs966; 2026-02-21T10:18:33.6196995Z shr.s16 %rs1032, %rs1031, 4; 2026-02-21T10:18:33.6197056Z cvt.s16.s8 %rs1033, %rs968; 2026-02-21T10:18:33.6197120Z shr.s16 %rs1034, %rs1033, 4; 2026-02-21T10:18:33.6197181Z shr.s16 %rs1035, %rs965, 4; 2026-02-21T10:18:33.6197243Z shr.s16 %rs1036, %rs967, 4; 2026-02-21T10:18:33.6197441Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6197504Z cvt.rn.f32.s16 %r19072, %rs1036; 2026-02-21T10:18:33.6197565Z cvt.rn.f32.s16 %r19073, %rs1035; 2026-02-21T10:18:33.6197625Z cvt.rn.f32.s16 %r19074, %rs1034; 2026-02-21T10:18:33.6197692Z cvt.rn.f32.s16 %r19075, %rs1032; 2026-02-21T10:18:33.6197886Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6197948Z cvt.s16.s8 %rs1037, 
%rs970; 2026-02-21T10:18:33.6198015Z shr.s16 %rs1038, %rs1037, 4; 2026-02-21T10:18:33.6198086Z cvt.s16.s8 %rs1039, %rs972; 2026-02-21T10:18:33.6198213Z shr.s16 %rs1040, %rs1039, 4; 2026-02-21T10:18:33.6198277Z shr.s16 %rs1041, %rs969, 4; 2026-02-21T10:18:33.6198341Z shr.s16 %rs1042, %rs971, 4; 2026-02-21T10:18:33.6198592Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6198657Z cvt.rn.f32.s16 %r19076, %rs1042; 2026-02-21T10:18:33.6198722Z cvt.rn.f32.s16 %r19077, %rs1041; 2026-02-21T10:18:33.6198783Z cvt.rn.f32.s16 %r19078, %rs1040; 2026-02-21T10:18:33.6198844Z cvt.rn.f32.s16 %r19079, %rs1038; 2026-02-21T10:18:33.6199045Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6199105Z cvt.s16.s8 %rs1043, %rs974; 2026-02-21T10:18:33.6199167Z shr.s16 %rs1044, %rs1043, 4; 2026-02-21T10:18:33.6199230Z cvt.s16.s8 %rs1045, %rs976; 2026-02-21T10:18:33.6199292Z shr.s16 %rs1046, %rs1045, 4; 2026-02-21T10:18:33.6199354Z shr.s16 %rs1047, %rs973, 4; 2026-02-21T10:18:33.6199415Z shr.s16 %rs1048, %rs975, 4; 2026-02-21T10:18:33.6199611Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6199673Z cvt.rn.f32.s16 %r19080, %rs1048; 2026-02-21T10:18:33.6199744Z cvt.rn.f32.s16 %r19081, %rs1047; 2026-02-21T10:18:33.6199812Z cvt.rn.f32.s16 %r19082, %rs1046; 2026-02-21T10:18:33.6199874Z cvt.rn.f32.s16 %r19083, %rs1044; 2026-02-21T10:18:33.6200070Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6200135Z cvt.s16.s8 %rs1049, %rs978; 2026-02-21T10:18:33.6200199Z shr.s16 %rs1050, %rs1049, 4; 2026-02-21T10:18:33.6200259Z cvt.s16.s8 %rs1051, %rs980; 2026-02-21T10:18:33.6200319Z shr.s16 %rs1052, %rs1051, 4; 2026-02-21T10:18:33.6200382Z shr.s16 %rs1053, %rs977, 4; 2026-02-21T10:18:33.6200444Z shr.s16 %rs1054, %rs979, 4; 2026-02-21T10:18:33.6200641Z .loc 1 87 32 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6200707Z cvt.rn.f32.s16 %r19084, %rs1054; 2026-02-21T10:18:33.6200767Z cvt.rn.f32.s16 %r19085, %rs1053; 2026-02-21T10:18:33.6200829Z cvt.rn.f32.s16 %r19086, %rs1052; 2026-02-21T10:18:33.6200894Z cvt.rn.f32.s16 %r19087, %rs1050; 2026-02-21T10:18:33.6201091Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6201153Z cvt.s16.s8 %rs1055, %rs982; 2026-02-21T10:18:33.6201213Z shr.s16 %rs1056, %rs1055, 4; 2026-02-21T10:18:33.6201278Z cvt.s16.s8 %rs1057, %rs984; 2026-02-21T10:18:33.6201338Z shr.s16 %rs1058, %rs1057, 4; 2026-02-21T10:18:33.6201399Z shr.s16 %rs1059, %rs981, 4; 2026-02-21T10:18:33.6201458Z shr.s16 %rs1060, %rs983, 4; 2026-02-21T10:18:33.6201656Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6201833Z cvt.rn.f32.s16 %r19088, %rs1060; 2026-02-21T10:18:33.6201896Z cvt.rn.f32.s16 %r19089, %rs1059; 2026-02-21T10:18:33.6201962Z cvt.rn.f32.s16 %r19090, %rs1058; 2026-02-21T10:18:33.6202023Z cvt.rn.f32.s16 %r19091, %rs1056; 2026-02-21T10:18:33.6202217Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6202282Z cvt.s16.s8 %rs1061, %rs986; 2026-02-21T10:18:33.6202342Z shr.s16 %rs1062, %rs1061, 4; 2026-02-21T10:18:33.6202415Z cvt.s16.s8 %rs1063, %rs988; 2026-02-21T10:18:33.6202478Z shr.s16 %rs1064, %rs1063, 4; 2026-02-21T10:18:33.6202543Z shr.s16 %rs1065, %rs985, 4; 2026-02-21T10:18:33.6202603Z shr.s16 %rs1066, %rs987, 4; 2026-02-21T10:18:33.6202797Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6202862Z cvt.rn.f32.s16 %r19092, %rs1066; 2026-02-21T10:18:33.6202925Z cvt.rn.f32.s16 %r19093, %rs1065; 2026-02-21T10:18:33.6202987Z cvt.rn.f32.s16 %r19094, %rs1064; 2026-02-21T10:18:33.6203052Z cvt.rn.f32.s16 %r19095, %rs1062; 2026-02-21T10:18:33.6203294Z .loc 1 69 25 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6203357Z cvt.s16.s8 %rs1067, %rs990; 2026-02-21T10:18:33.6203418Z shr.s16 %rs1068, %rs1067, 4; 2026-02-21T10:18:33.6203484Z cvt.s16.s8 %rs1069, %rs992; 2026-02-21T10:18:33.6203604Z shr.s16 %rs1070, %rs1069, 4; 2026-02-21T10:18:33.6203667Z shr.s16 %rs1071, %rs989, 4; 2026-02-21T10:18:33.6203731Z shr.s16 %rs1072, %rs991, 4; 2026-02-21T10:18:33.6203924Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6203987Z cvt.rn.f32.s16 %r19096, %rs1072; 2026-02-21T10:18:33.6204050Z cvt.rn.f32.s16 %r19097, %rs1071; 2026-02-21T10:18:33.6204111Z cvt.rn.f32.s16 %r19098, %rs1070; 2026-02-21T10:18:33.6204172Z cvt.rn.f32.s16 %r19099, %rs1068; 2026-02-21T10:18:33.6204375Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6204455Z cvt.s16.s8 %rs1073, %rs994; 2026-02-21T10:18:33.6204518Z shr.s16 %rs1074, %rs1073, 4; 2026-02-21T10:18:33.6204578Z cvt.s16.s8 %rs1075, %rs996; 2026-02-21T10:18:33.6204641Z shr.s16 %rs1076, %rs1075, 4; 2026-02-21T10:18:33.6204701Z shr.s16 %rs1077, %rs993, 4; 2026-02-21T10:18:33.6204762Z shr.s16 %rs1078, %rs995, 4; 2026-02-21T10:18:33.6204956Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6205023Z cvt.rn.f32.s16 %r19100, %rs1078; 2026-02-21T10:18:33.6205084Z cvt.rn.f32.s16 %r19101, %rs1077; 2026-02-21T10:18:33.6205145Z cvt.rn.f32.s16 %r19102, %rs1076; 2026-02-21T10:18:33.6205209Z cvt.rn.f32.s16 %r19103, %rs1074; 2026-02-21T10:18:33.6205402Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6205465Z cvt.s16.s8 %rs1079, %rs998; 2026-02-21T10:18:33.6205531Z shr.s16 %rs1080, %rs1079, 4; 2026-02-21T10:18:33.6205594Z cvt.s16.s8 %rs1081, %rs1000; 2026-02-21T10:18:33.6205655Z shr.s16 %rs1082, %rs1081, 4; 2026-02-21T10:18:33.6205715Z shr.s16 %rs1083, %rs997, 4; 2026-02-21T10:18:33.6205779Z shr.s16 
%rs1084, %rs999, 4; 2026-02-21T10:18:33.6205973Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6206038Z cvt.rn.f32.s16 %r19104, %rs1084; 2026-02-21T10:18:33.6206102Z cvt.rn.f32.s16 %r19105, %rs1083; 2026-02-21T10:18:33.6206163Z cvt.rn.f32.s16 %r19106, %rs1082; 2026-02-21T10:18:33.6206222Z cvt.rn.f32.s16 %r19107, %rs1080; 2026-02-21T10:18:33.6206417Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6206587Z cvt.s16.s8 %rs1085, %rs1002; 2026-02-21T10:18:33.6206664Z shr.s16 %rs1086, %rs1085, 4; 2026-02-21T10:18:33.6206809Z cvt.s16.s8 %rs1087, %rs1004; 2026-02-21T10:18:33.6206876Z shr.s16 %rs1088, %rs1087, 4; 2026-02-21T10:18:33.6206994Z shr.s16 %rs1089, %rs1001, 4; 2026-02-21T10:18:33.6207055Z shr.s16 %rs1090, %rs1003, 4; 2026-02-21T10:18:33.6207253Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6207314Z cvt.rn.f32.s16 %r19108, %rs1090; 2026-02-21T10:18:33.6207375Z cvt.rn.f32.s16 %r19109, %rs1089; 2026-02-21T10:18:33.6207441Z cvt.rn.f32.s16 %r19110, %rs1088; 2026-02-21T10:18:33.6207501Z cvt.rn.f32.s16 %r19111, %rs1086; 2026-02-21T10:18:33.6207710Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6207772Z cvt.s16.s8 %rs1091, %rs1006; 2026-02-21T10:18:33.6207836Z shr.s16 %rs1092, %rs1091, 4; 2026-02-21T10:18:33.6207899Z cvt.s16.s8 %rs1093, %rs1008; 2026-02-21T10:18:33.6207957Z shr.s16 %rs1094, %rs1093, 4; 2026-02-21T10:18:33.6208022Z shr.s16 %rs1095, %rs1005, 4; 2026-02-21T10:18:33.6208083Z shr.s16 %rs1096, %rs1007, 4; 2026-02-21T10:18:33.6208342Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6208406Z cvt.rn.f32.s16 %r19112, %rs1096; 2026-02-21T10:18:33.6208470Z cvt.rn.f32.s16 %r19113, %rs1095; 2026-02-21T10:18:33.6208531Z cvt.rn.f32.s16 %r19114, %rs1094; 2026-02-21T10:18:33.6208592Z cvt.rn.f32.s16 %r19115, %rs1092; 
2026-02-21T10:18:33.6208844Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6208908Z cvt.s16.s8 %rs1097, %rs1010; 2026-02-21T10:18:33.6208967Z shr.s16 %rs1098, %rs1097, 4; 2026-02-21T10:18:33.6209031Z cvt.s16.s8 %rs1099, %rs1012; 2026-02-21T10:18:33.6209090Z shr.s16 %rs1100, %rs1099, 4; 2026-02-21T10:18:33.6209152Z shr.s16 %rs1101, %rs1009, 4; 2026-02-21T10:18:33.6209211Z shr.s16 %rs1102, %rs1011, 4; 2026-02-21T10:18:33.6209408Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6209473Z cvt.rn.f32.s16 %r19116, %rs1102; 2026-02-21T10:18:33.6209533Z cvt.rn.f32.s16 %r19117, %rs1101; 2026-02-21T10:18:33.6209610Z cvt.rn.f32.s16 %r19118, %rs1100; 2026-02-21T10:18:33.6209674Z cvt.rn.f32.s16 %r19119, %rs1098; 2026-02-21T10:18:33.6209869Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6209933Z cvt.s16.s8 %rs1103, %rs1014; 2026-02-21T10:18:33.6209993Z shr.s16 %rs1104, %rs1103, 4; 2026-02-21T10:18:33.6210052Z cvt.s16.s8 %rs1105, %rs1016; 2026-02-21T10:18:33.6210112Z shr.s16 %rs1106, %rs1105, 4; 2026-02-21T10:18:33.6210175Z shr.s16 %rs1107, %rs1013, 4; 2026-02-21T10:18:33.6210236Z shr.s16 %rs1108, %rs1015, 4; 2026-02-21T10:18:33.6210430Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6210497Z cvt.rn.f32.s16 %r19120, %rs1108; 2026-02-21T10:18:33.6210561Z cvt.rn.f32.s16 %r19121, %rs1107; 2026-02-21T10:18:33.6210622Z cvt.rn.f32.s16 %r19122, %rs1106; 2026-02-21T10:18:33.6210690Z cvt.rn.f32.s16 %r19123, %rs1104; 2026-02-21T10:18:33.6210884Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6210946Z cvt.s16.s8 %rs1109, %rs1018; 2026-02-21T10:18:33.6211006Z shr.s16 %rs1110, %rs1109, 4; 2026-02-21T10:18:33.6211069Z cvt.s16.s8 %rs1111, %rs1020; 2026-02-21T10:18:33.6211131Z shr.s16 %rs1112, %rs1111, 4; 2026-02-21T10:18:33.6211192Z shr.s16 
%rs1113, %rs1017, 4; 2026-02-21T10:18:33.6211253Z shr.s16 %rs1114, %rs1019, 4; 2026-02-21T10:18:33.6211444Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6211505Z cvt.rn.f32.s16 %r19124, %rs1114; 2026-02-21T10:18:33.6211565Z cvt.rn.f32.s16 %r19125, %rs1113; 2026-02-21T10:18:33.6211631Z cvt.rn.f32.s16 %r19126, %rs1112; 2026-02-21T10:18:33.6211763Z cvt.rn.f32.s16 %r19127, %rs1110; 2026-02-21T10:18:33.6211958Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6212068Z cvt.s16.s8 %rs1115, %rs1022; 2026-02-21T10:18:33.6212130Z shr.s16 %rs1116, %rs1115, 4; 2026-02-21T10:18:33.6212188Z cvt.s16.s8 %rs1117, %rs1024; 2026-02-21T10:18:33.6212251Z shr.s16 %rs1118, %rs1117, 4; 2026-02-21T10:18:33.6212310Z shr.s16 %rs1119, %rs1021, 4; 2026-02-21T10:18:33.6212372Z shr.s16 %rs1120, %rs1023, 4; 2026-02-21T10:18:33.6212567Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6212632Z cvt.rn.f32.s16 %r19128, %rs1120; 2026-02-21T10:18:33.6212693Z cvt.rn.f32.s16 %r19129, %rs1119; 2026-02-21T10:18:33.6212754Z cvt.rn.f32.s16 %r19130, %rs1118; 2026-02-21T10:18:33.6212818Z cvt.rn.f32.s16 %r19131, %rs1116; 2026-02-21T10:18:33.6212875Z bar.sync 0; 2026-02-21T10:18:33.6213012Z st.shared.v4.b32 [%r34], {%r19071, %r19069, %r19070, %r19068}; 2026-02-21T10:18:33.6213154Z st.shared.v4.b32 [%r34+16384], {%r19103, %r19101, %r19102, %r19100}; 2026-02-21T10:18:33.6213317Z st.shared.v4.b32 [%r35], {%r19075, %r19073, %r19074, %r19072}; 2026-02-21T10:18:33.6213443Z st.shared.v4.b32 [%r35+16384], {%r19107, %r19105, %r19106, %r19104}; 2026-02-21T10:18:33.6213551Z st.shared.v4.b32 [%r36], {%r19079, %r19077, %r19078, %r19076}; 2026-02-21T10:18:33.6213717Z st.shared.v4.b32 [%r36+16384], {%r19111, %r19109, %r19110, %r19108}; 2026-02-21T10:18:33.6213826Z st.shared.v4.b32 [%r37], {%r19083, %r19081, %r19082, %r19080}; 2026-02-21T10:18:33.6213941Z st.shared.v4.b32 
[%r37+16384], {%r19115, %r19113, %r19114, %r19112}; 2026-02-21T10:18:33.6214052Z st.shared.v4.b32 [%r38], {%r19087, %r19085, %r19086, %r19084}; 2026-02-21T10:18:33.6214167Z st.shared.v4.b32 [%r38+16384], {%r19119, %r19117, %r19118, %r19116}; 2026-02-21T10:18:33.6214271Z st.shared.v4.b32 [%r39], {%r19091, %r19089, %r19090, %r19088}; 2026-02-21T10:18:33.6214398Z st.shared.v4.b32 [%r39+16384], {%r19123, %r19121, %r19122, %r19120}; 2026-02-21T10:18:33.6214507Z st.shared.v4.b32 [%r40], {%r19095, %r19093, %r19094, %r19092}; 2026-02-21T10:18:33.6214625Z st.shared.v4.b32 [%r40+16384], {%r19127, %r19125, %r19126, %r19124}; 2026-02-21T10:18:33.6214739Z st.shared.v4.b32 [%r41], {%r19099, %r19097, %r19098, %r19096}; 2026-02-21T10:18:33.6214858Z st.shared.v4.b32 [%r41+16384], {%r19131, %r19129, %r19130, %r19128}; 2026-02-21T10:18:33.6214918Z $L__tmp9: 2026-02-21T10:18:33.6215196Z .loc 2 291 36 // standard.py:291:36 @[ c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:94:40 ] 2026-02-21T10:18:33.6215266Z // begin inline asm 2026-02-21T10:18:33.6215350Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.6215408Z // end inline asm 2026-02-21T10:18:33.6215470Z bar.sync 0; 2026-02-21T10:18:33.6215544Z wgmma.fence.sync.aligned; 2026-02-21T10:18:33.6215611Z mov.pred %p133, -1; 2026-02-21T10:18:33.6215671Z // begin inline asm 2026-02-21T10:18:33.6217309Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r11991,%r11992,%r11993,%r11994}, %rd3, %p133, 1, 1; 
2026-02-21T10:18:33.6217373Z // end inline asm 2026-02-21T10:18:33.6217439Z // begin inline asm 2026-02-21T10:18:33.6218928Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r12123,%r12124,%r12125,%r12126}, %rd4, %p133, 1, 1; 2026-02-21T10:18:33.6219132Z // end inline asm 2026-02-21T10:18:33.6219193Z // begin inline asm 2026-02-21T10:18:33.6220678Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r12255,%r12256,%r12257,%r12258}, %rd5, %p133, 1, 1; 2026-02-21T10:18:33.6220744Z // end inline asm 2026-02-21T10:18:33.6220805Z // begin inline asm 2026-02-21T10:18:33.6222409Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r12387,%r12388,%r12389,%r12390}, %rd6, %p133, 1, 1; 2026-02-21T10:18:33.6222477Z // end inline asm 2026-02-21T10:18:33.6222537Z // begin inline asm 2026-02-21T10:18:33.6224024Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r12519,%r12520,%r12521,%r12522}, %rd7, %p133, 1, 1; 2026-02-21T10:18:33.6224088Z // end inline asm 2026-02-21T10:18:33.6224147Z // begin inline asm 2026-02-21T10:18:33.6225629Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r12651,%r12652,%r12653,%r12654}, %rd8, %p133, 1, 1; 2026-02-21T10:18:33.6225690Z // end inline asm 2026-02-21T10:18:33.6225752Z // begin inline asm 2026-02-21T10:18:33.6227347Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r12783,%r12784,%r12785,%r12786}, %rd9, %p133, 1, 1; 2026-02-21T10:18:33.6227487Z // end inline asm 2026-02-21T10:18:33.6227615Z // begin inline asm 2026-02-21T10:18:33.6229196Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r12915,%r12916,%r12917,%r12918}, %rd10, %p133, 1, 1; 2026-02-21T10:18:33.6229262Z // end inline asm 2026-02-21T10:18:33.6229327Z // begin inline asm 2026-02-21T10:18:33.6230932Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r13047,%r13048,%r13049,%r13050}, %rd3, %p133, 1, 1; 2026-02-21T10:18:33.6231010Z // end inline asm 2026-02-21T10:18:33.6231076Z // begin inline asm 2026-02-21T10:18:33.6232557Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r13179,%r13180,%r13181,%r13182}, %rd4, %p133, 1, 1; 2026-02-21T10:18:33.6232625Z // end inline asm 2026-02-21T10:18:33.6232683Z // begin inline asm 2026-02-21T10:18:33.6234171Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r13311,%r13312,%r13313,%r13314}, %rd5, %p133, 1, 1; 2026-02-21T10:18:33.6234236Z // end inline asm 2026-02-21T10:18:33.6234297Z // begin inline asm 2026-02-21T10:18:33.6235780Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r13443,%r13444,%r13445,%r13446}, %rd6, %p133, 1, 1; 2026-02-21T10:18:33.6235839Z // end inline asm 2026-02-21T10:18:33.6235899Z // begin inline asm 2026-02-21T10:18:33.6237500Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r13575,%r13576,%r13577,%r13578}, %rd7, %p133, 1, 1; 2026-02-21T10:18:33.6237700Z // end inline asm 2026-02-21T10:18:33.6237761Z // begin inline asm 2026-02-21T10:18:33.6239243Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r13707,%r13708,%r13709,%r13710}, %rd8, %p133, 1, 1; 2026-02-21T10:18:33.6239304Z // end inline asm 2026-02-21T10:18:33.6239425Z // begin inline asm 2026-02-21T10:18:33.6240961Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r13839,%r13840,%r13841,%r13842}, %rd9, %p133, 1, 1; 2026-02-21T10:18:33.6241023Z // end inline asm 2026-02-21T10:18:33.6241087Z // begin inline asm 2026-02-21T10:18:33.6242581Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r13971,%r13972,%r13973,%r13974}, %rd10, %p133, 1, 1; 2026-02-21T10:18:33.6242644Z // end inline asm 2026-02-21T10:18:33.6242723Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:33.6242786Z mov.b32 %r14104, %r18935; 2026-02-21T10:18:33.6242846Z mov.b32 %r14105, %r18935; 2026-02-21T10:18:33.6242910Z mov.b32 %r14103, %r29377; 2026-02-21T10:18:33.6242971Z // begin inline asm 2026-02-21T10:18:33.6245500Z // wait for regs: %r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227,%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291,%r14103,%r14104,%r14105 
2026-02-21T10:18:33.6245631Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:33.6245733Z // end inline asm 2026-02-21T10:18:33.6245794Z $L__tmp10: 2026-02-21T10:18:33.6246005Z .loc 1 58 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:32 2026-02-21T10:18:33.6246074Z add.s32 %r19132, %r32163, -64; 2026-02-21T10:18:33.6246157Z add.s64 %rd342, %rd301, 128; 2026-02-21T10:18:33.6246223Z add.s64 %rd345, %rd304, 128; 2026-02-21T10:18:33.6246286Z add.s64 %rd348, %rd307, 128; 2026-02-21T10:18:33.6246348Z add.s64 %rd351, %rd310, 128; 2026-02-21T10:18:33.6246414Z add.s64 %rd354, %rd313, 128; 2026-02-21T10:18:33.6246590Z add.s64 %rd357, %rd316, 128; 2026-02-21T10:18:33.6246654Z add.s64 %rd360, %rd319, 128; 2026-02-21T10:18:33.6246735Z mad.wide.s32 %rd363, %r19132, 2, %rd85; 2026-02-21T10:18:33.6246938Z .loc 1 58 80 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:80 2026-02-21T10:18:33.6247000Z // begin inline asm 2026-02-21T10:18:33.6247066Z mov.u64 %rd341, 0x0; 2026-02-21T10:18:33.6247199Z createpolicy.fractional.L2::evict_first.b64 %rd341, 1.0; 2026-02-21T10:18:33.6247259Z // end inline asm 2026-02-21T10:18:33.6247401Z // begin inline asm 2026-02-21T10:18:33.6247471Z mov.u32 %r14237, 0x0; 2026-02-21T10:18:33.6247530Z mov.u32 %r14238, 0x0; 2026-02-21T10:18:33.6247587Z mov.u32 %r14239, 0x0; 2026-02-21T10:18:33.6247652Z mov.u32 %r14240, 0x0; 2026-02-21T10:18:33.6247955Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r14237, %r14238, %r14239, %r14240 }, [ %rd342 + 0 ], %rd341; 2026-02-21T10:18:33.6248026Z // end inline asm 2026-02-21T10:18:33.6248091Z // begin inline asm 2026-02-21T10:18:33.6248152Z mov.u64 %rd344, 0x0; 2026-02-21T10:18:33.6248276Z createpolicy.fractional.L2::evict_first.b64 %rd344, 1.0; 2026-02-21T10:18:33.6248334Z // end inline asm 2026-02-21T10:18:33.6248397Z // begin inline asm 2026-02-21T10:18:33.6248455Z mov.u32 %r14241, 0x0; 2026-02-21T10:18:33.6248513Z mov.u32 %r14242, 0x0; 2026-02-21T10:18:33.6248577Z mov.u32 %r14243, 
0x0; 2026-02-21T10:18:33.6248634Z mov.u32 %r14244, 0x0; 2026-02-21T10:18:33.6248871Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r14241, %r14242, %r14243, %r14244 }, [ %rd345 + 0 ], %rd344; 2026-02-21T10:18:33.6248930Z // end inline asm 2026-02-21T10:18:33.6248995Z // begin inline asm 2026-02-21T10:18:33.6249054Z mov.u64 %rd347, 0x0; 2026-02-21T10:18:33.6249172Z createpolicy.fractional.L2::evict_first.b64 %rd347, 1.0; 2026-02-21T10:18:33.6249242Z // end inline asm 2026-02-21T10:18:33.6249300Z // begin inline asm 2026-02-21T10:18:33.6249358Z mov.u32 %r14245, 0x0; 2026-02-21T10:18:33.6249419Z mov.u32 %r14246, 0x0; 2026-02-21T10:18:33.6249477Z mov.u32 %r14247, 0x0; 2026-02-21T10:18:33.6249534Z mov.u32 %r14248, 0x0; 2026-02-21T10:18:33.6249758Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r14245, %r14246, %r14247, %r14248 }, [ %rd348 + 0 ], %rd347; 2026-02-21T10:18:33.6249819Z // end inline asm 2026-02-21T10:18:33.6249877Z // begin inline asm 2026-02-21T10:18:33.6249937Z mov.u64 %rd350, 0x0; 2026-02-21T10:18:33.6250060Z createpolicy.fractional.L2::evict_first.b64 %rd350, 1.0; 2026-02-21T10:18:33.6250117Z // end inline asm 2026-02-21T10:18:33.6250176Z // begin inline asm 2026-02-21T10:18:33.6250234Z mov.u32 %r14249, 0x0; 2026-02-21T10:18:33.6250296Z mov.u32 %r14250, 0x0; 2026-02-21T10:18:33.6250353Z mov.u32 %r14251, 0x0; 2026-02-21T10:18:33.6250413Z mov.u32 %r14252, 0x0; 2026-02-21T10:18:33.6250641Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r14249, %r14250, %r14251, %r14252 }, [ %rd351 + 0 ], %rd350; 2026-02-21T10:18:33.6250698Z // end inline asm 2026-02-21T10:18:33.6250757Z // begin inline asm 2026-02-21T10:18:33.6250822Z mov.u64 %rd353, 0x0; 2026-02-21T10:18:33.6250940Z createpolicy.fractional.L2::evict_first.b64 %rd353, 1.0; 2026-02-21T10:18:33.6251008Z // end inline asm 2026-02-21T10:18:33.6251068Z // begin inline asm 2026-02-21T10:18:33.6251134Z mov.u32 %r14253, 0x0; 2026-02-21T10:18:33.6251192Z mov.u32 %r14254, 0x0; 
2026-02-21T10:18:33.6251327Z mov.u32 %r14255, 0x0; 2026-02-21T10:18:33.6251451Z mov.u32 %r14256, 0x0; 2026-02-21T10:18:33.6251675Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r14253, %r14254, %r14255, %r14256 }, [ %rd354 + 0 ], %rd353; 2026-02-21T10:18:33.6251731Z // end inline asm 2026-02-21T10:18:33.6251792Z // begin inline asm 2026-02-21T10:18:33.6251850Z mov.u64 %rd356, 0x0; 2026-02-21T10:18:33.6251966Z createpolicy.fractional.L2::evict_first.b64 %rd356, 1.0; 2026-02-21T10:18:33.6252025Z // end inline asm 2026-02-21T10:18:33.6252087Z // begin inline asm 2026-02-21T10:18:33.6252144Z mov.u32 %r14257, 0x0; 2026-02-21T10:18:33.6252202Z mov.u32 %r14258, 0x0; 2026-02-21T10:18:33.6252261Z mov.u32 %r14259, 0x0; 2026-02-21T10:18:33.6252319Z mov.u32 %r14260, 0x0; 2026-02-21T10:18:33.6252540Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r14257, %r14258, %r14259, %r14260 }, [ %rd357 + 0 ], %rd356; 2026-02-21T10:18:33.6252597Z // end inline asm 2026-02-21T10:18:33.6252658Z // begin inline asm 2026-02-21T10:18:33.6252718Z mov.u64 %rd359, 0x0; 2026-02-21T10:18:33.6252837Z createpolicy.fractional.L2::evict_first.b64 %rd359, 1.0; 2026-02-21T10:18:33.6252898Z // end inline asm 2026-02-21T10:18:33.6253020Z // begin inline asm 2026-02-21T10:18:33.6253082Z mov.u32 %r14261, 0x0; 2026-02-21T10:18:33.6253143Z mov.u32 %r14262, 0x0; 2026-02-21T10:18:33.6253200Z mov.u32 %r14263, 0x0; 2026-02-21T10:18:33.6253259Z mov.u32 %r14264, 0x0; 2026-02-21T10:18:33.6253528Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r14261, %r14262, %r14263, %r14264 }, [ %rd360 + 0 ], %rd359; 2026-02-21T10:18:33.6253594Z // end inline asm 2026-02-21T10:18:33.6253652Z // begin inline asm 2026-02-21T10:18:33.6253711Z mov.u64 %rd362, 0x0; 2026-02-21T10:18:33.6253833Z createpolicy.fractional.L2::evict_first.b64 %rd362, 1.0; 2026-02-21T10:18:33.6253890Z // end inline asm 2026-02-21T10:18:33.6253949Z // begin inline asm 2026-02-21T10:18:33.6254007Z mov.u32 %r14265, 0x0; 2026-02-21T10:18:33.6254069Z 
mov.u32 %r14266, 0x0; 2026-02-21T10:18:33.6254127Z mov.u32 %r14267, 0x0; 2026-02-21T10:18:33.6254186Z mov.u32 %r14268, 0x0; 2026-02-21T10:18:33.6254415Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r14265, %r14266, %r14267, %r14268 }, [ %rd363 + 0 ], %rd362; 2026-02-21T10:18:33.6254481Z // end inline asm 2026-02-21T10:18:33.6254685Z .loc 1 62 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:62:32 2026-02-21T10:18:33.6254745Z bar.sync 0; 2026-02-21T10:18:33.6254833Z st.shared.v2.b32 [%r16], {%r14237, %r14238}; 2026-02-21T10:18:33.6254927Z st.shared.v2.b32 [%r16+2048], {%r14241, %r14242}; 2026-02-21T10:18:33.6255019Z st.shared.v2.b32 [%r16+4096], {%r14245, %r14246}; 2026-02-21T10:18:33.6255105Z st.shared.v2.b32 [%r16+6144], {%r14249, %r14250}; 2026-02-21T10:18:33.6255189Z st.shared.v2.b32 [%r16+8192], {%r14253, %r14254}; 2026-02-21T10:18:33.6255279Z st.shared.v2.b32 [%r16+10240], {%r14257, %r14258}; 2026-02-21T10:18:33.6255372Z st.shared.v2.b32 [%r16+12288], {%r14261, %r14262}; 2026-02-21T10:18:33.6255459Z st.shared.v2.b32 [%r16+14336], {%r14265, %r14266}; 2026-02-21T10:18:33.6255539Z st.shared.v2.b32 [%r17], {%r14239, %r14240}; 2026-02-21T10:18:33.6255629Z st.shared.v2.b32 [%r17+2048], {%r14243, %r14244}; 2026-02-21T10:18:33.6255712Z st.shared.v2.b32 [%r17+4096], {%r14247, %r14248}; 2026-02-21T10:18:33.6255794Z st.shared.v2.b32 [%r17+6144], {%r14251, %r14252}; 2026-02-21T10:18:33.6255886Z st.shared.v2.b32 [%r17+8192], {%r14255, %r14256}; 2026-02-21T10:18:33.6255981Z st.shared.v2.b32 [%r17+10240], {%r14259, %r14260}; 2026-02-21T10:18:33.6256068Z st.shared.v2.b32 [%r17+12288], {%r14263, %r14264}; 2026-02-21T10:18:33.6256161Z st.shared.v2.b32 [%r17+14336], {%r14267, %r14268}; 2026-02-21T10:18:33.6256236Z bar.sync 0; 2026-02-21T10:18:33.6256313Z ld.shared.b16 %rs1121, [%r58]; 2026-02-21T10:18:33.6256387Z ld.shared.b16 %rs1122, [%r58+1024]; 2026-02-21T10:18:33.6256575Z ld.shared.b16 %rs1123, [%r58+64]; 2026-02-21T10:18:33.6256648Z ld.shared.b16 
%rs1124, [%r58+1088]; 2026-02-21T10:18:33.6256804Z ld.shared.b16 %rs1125, [%r58+8192]; 2026-02-21T10:18:33.6256946Z ld.shared.b16 %rs1126, [%r58+9216]; 2026-02-21T10:18:33.6257020Z ld.shared.b16 %rs1127, [%r58+8256]; 2026-02-21T10:18:33.6257087Z ld.shared.b16 %rs1128, [%r58+9280]; 2026-02-21T10:18:33.6257157Z ld.shared.b16 %rs1129, [%r59]; 2026-02-21T10:18:33.6257228Z ld.shared.b16 %rs1130, [%r59+1024]; 2026-02-21T10:18:33.6257295Z ld.shared.b16 %rs1131, [%r59+64]; 2026-02-21T10:18:33.6257361Z ld.shared.b16 %rs1132, [%r59+1088]; 2026-02-21T10:18:33.6257432Z ld.shared.b16 %rs1133, [%r59+8192]; 2026-02-21T10:18:33.6257498Z ld.shared.b16 %rs1134, [%r59+9216]; 2026-02-21T10:18:33.6257563Z ld.shared.b16 %rs1135, [%r59+8256]; 2026-02-21T10:18:33.6257631Z ld.shared.b16 %rs1136, [%r59+9280]; 2026-02-21T10:18:33.6257704Z ld.shared.b16 %rs1137, [%r60]; 2026-02-21T10:18:33.6257773Z ld.shared.b16 %rs1138, [%r60+1024]; 2026-02-21T10:18:33.6257842Z ld.shared.b16 %rs1139, [%r60+64]; 2026-02-21T10:18:33.6257914Z ld.shared.b16 %rs1140, [%r60+1088]; 2026-02-21T10:18:33.6257982Z ld.shared.b16 %rs1141, [%r60+8192]; 2026-02-21T10:18:33.6258049Z ld.shared.b16 %rs1142, [%r60+9216]; 2026-02-21T10:18:33.6258186Z ld.shared.b16 %rs1143, [%r60+8256]; 2026-02-21T10:18:33.6258262Z ld.shared.b16 %rs1144, [%r60+9280]; 2026-02-21T10:18:33.6258328Z ld.shared.b16 %rs1145, [%r61]; 2026-02-21T10:18:33.6258394Z ld.shared.b16 %rs1146, [%r61+1024]; 2026-02-21T10:18:33.6258518Z ld.shared.b16 %rs1147, [%r61+64]; 2026-02-21T10:18:33.6258586Z ld.shared.b16 %rs1148, [%r61+1088]; 2026-02-21T10:18:33.6258651Z ld.shared.b16 %rs1149, [%r61+8192]; 2026-02-21T10:18:33.6258732Z ld.shared.b16 %rs1150, [%r61+9216]; 2026-02-21T10:18:33.6258801Z ld.shared.b16 %rs1151, [%r61+8256]; 2026-02-21T10:18:33.6258867Z ld.shared.b16 %rs1152, [%r61+9280]; 2026-02-21T10:18:33.6258932Z ld.shared.b16 %rs1153, [%r62]; 2026-02-21T10:18:33.6259000Z ld.shared.b16 %rs1154, [%r62+1024]; 2026-02-21T10:18:33.6259064Z ld.shared.b16 %rs1155, 
[%r62+64]; 2026-02-21T10:18:33.6259133Z ld.shared.b16 %rs1156, [%r62+1088]; 2026-02-21T10:18:33.6259204Z ld.shared.b16 %rs1157, [%r62+8192]; 2026-02-21T10:18:33.6259269Z ld.shared.b16 %rs1158, [%r62+9216]; 2026-02-21T10:18:33.6259336Z ld.shared.b16 %rs1159, [%r62+8256]; 2026-02-21T10:18:33.6259403Z ld.shared.b16 %rs1160, [%r62+9280]; 2026-02-21T10:18:33.6259472Z ld.shared.b16 %rs1161, [%r63]; 2026-02-21T10:18:33.6259538Z ld.shared.b16 %rs1162, [%r63+1024]; 2026-02-21T10:18:33.6259604Z ld.shared.b16 %rs1163, [%r63+64]; 2026-02-21T10:18:33.6259673Z ld.shared.b16 %rs1164, [%r63+1088]; 2026-02-21T10:18:33.6259740Z ld.shared.b16 %rs1165, [%r63+8192]; 2026-02-21T10:18:33.6259808Z ld.shared.b16 %rs1166, [%r63+9216]; 2026-02-21T10:18:33.6259877Z ld.shared.b16 %rs1167, [%r63+8256]; 2026-02-21T10:18:33.6259943Z ld.shared.b16 %rs1168, [%r63+9280]; 2026-02-21T10:18:33.6260006Z ld.shared.b16 %rs1169, [%r64]; 2026-02-21T10:18:33.6260071Z ld.shared.b16 %rs1170, [%r64+1024]; 2026-02-21T10:18:33.6260143Z ld.shared.b16 %rs1171, [%r64+64]; 2026-02-21T10:18:33.6260209Z ld.shared.b16 %rs1172, [%r64+1088]; 2026-02-21T10:18:33.6260277Z ld.shared.b16 %rs1173, [%r64+8192]; 2026-02-21T10:18:33.6260347Z ld.shared.b16 %rs1174, [%r64+9216]; 2026-02-21T10:18:33.6260413Z ld.shared.b16 %rs1175, [%r64+8256]; 2026-02-21T10:18:33.6260477Z ld.shared.b16 %rs1176, [%r64+9280]; 2026-02-21T10:18:33.6260542Z ld.shared.b16 %rs1177, [%r65]; 2026-02-21T10:18:33.6260614Z ld.shared.b16 %rs1178, [%r65+1024]; 2026-02-21T10:18:33.6260680Z ld.shared.b16 %rs1179, [%r65+64]; 2026-02-21T10:18:33.6260745Z ld.shared.b16 %rs1180, [%r65+1088]; 2026-02-21T10:18:33.6260815Z ld.shared.b16 %rs1181, [%r65+8192]; 2026-02-21T10:18:33.6260880Z ld.shared.b16 %rs1182, [%r65+9216]; 2026-02-21T10:18:33.6260946Z ld.shared.b16 %rs1183, [%r65+8256]; 2026-02-21T10:18:33.6261018Z ld.shared.b16 %rs1184, [%r65+9280]; 2026-02-21T10:18:33.6261082Z cvt.f32.bf16 %r14406, %rs1121; 2026-02-21T10:18:33.6261145Z cvt.f32.bf16 %r14407, %rs1122; 
2026-02-21T10:18:33.6261276Z cvt.f32.bf16 %r14408, %rs1129; 2026-02-21T10:18:33.6261386Z cvt.f32.bf16 %r14409, %rs1130; 2026-02-21T10:18:33.6261449Z cvt.f32.bf16 %r14538, %rs1137; 2026-02-21T10:18:33.6261511Z cvt.f32.bf16 %r14539, %rs1138; 2026-02-21T10:18:33.6261579Z cvt.f32.bf16 %r14540, %rs1145; 2026-02-21T10:18:33.6261640Z cvt.f32.bf16 %r14541, %rs1146; 2026-02-21T10:18:33.6261703Z cvt.f32.bf16 %r14670, %rs1153; 2026-02-21T10:18:33.6261765Z cvt.f32.bf16 %r14671, %rs1154; 2026-02-21T10:18:33.6261833Z cvt.f32.bf16 %r14672, %rs1161; 2026-02-21T10:18:33.6261894Z cvt.f32.bf16 %r14673, %rs1162; 2026-02-21T10:18:33.6261955Z cvt.f32.bf16 %r14802, %rs1169; 2026-02-21T10:18:33.6262020Z cvt.f32.bf16 %r14803, %rs1170; 2026-02-21T10:18:33.6262080Z cvt.f32.bf16 %r14804, %rs1177; 2026-02-21T10:18:33.6262144Z cvt.f32.bf16 %r14805, %rs1178; 2026-02-21T10:18:33.6262206Z cvt.f32.bf16 %r14934, %rs1123; 2026-02-21T10:18:33.6262271Z cvt.f32.bf16 %r14935, %rs1124; 2026-02-21T10:18:33.6262334Z cvt.f32.bf16 %r14936, %rs1131; 2026-02-21T10:18:33.6262396Z cvt.f32.bf16 %r14937, %rs1132; 2026-02-21T10:18:33.6262461Z cvt.f32.bf16 %r15066, %rs1139; 2026-02-21T10:18:33.6262571Z cvt.f32.bf16 %r15067, %rs1140; 2026-02-21T10:18:33.6262636Z cvt.f32.bf16 %r15068, %rs1147; 2026-02-21T10:18:33.6262702Z cvt.f32.bf16 %r15069, %rs1148; 2026-02-21T10:18:33.6262768Z cvt.f32.bf16 %r15198, %rs1155; 2026-02-21T10:18:33.6262831Z cvt.f32.bf16 %r15199, %rs1156; 2026-02-21T10:18:33.6262938Z cvt.f32.bf16 %r15200, %rs1163; 2026-02-21T10:18:33.6263013Z cvt.f32.bf16 %r15201, %rs1164; 2026-02-21T10:18:33.6263080Z cvt.f32.bf16 %r15330, %rs1171; 2026-02-21T10:18:33.6263142Z cvt.f32.bf16 %r15331, %rs1172; 2026-02-21T10:18:33.6263208Z cvt.f32.bf16 %r15332, %rs1179; 2026-02-21T10:18:33.6263269Z cvt.f32.bf16 %r15333, %rs1180; 2026-02-21T10:18:33.6263332Z cvt.f32.bf16 %r15462, %rs1125; 2026-02-21T10:18:33.6263393Z cvt.f32.bf16 %r15463, %rs1126; 2026-02-21T10:18:33.6263459Z cvt.f32.bf16 %r15464, %rs1133; 
2026-02-21T10:18:33.6263522Z cvt.f32.bf16 %r15465, %rs1134; 2026-02-21T10:18:33.6263585Z cvt.f32.bf16 %r15594, %rs1141; 2026-02-21T10:18:33.6263650Z cvt.f32.bf16 %r15595, %rs1142; 2026-02-21T10:18:33.6263711Z cvt.f32.bf16 %r15596, %rs1149; 2026-02-21T10:18:33.6263773Z cvt.f32.bf16 %r15597, %rs1150; 2026-02-21T10:18:33.6263836Z cvt.f32.bf16 %r15726, %rs1157; 2026-02-21T10:18:33.6263905Z cvt.f32.bf16 %r15727, %rs1158; 2026-02-21T10:18:33.6263966Z cvt.f32.bf16 %r15728, %rs1165; 2026-02-21T10:18:33.6264029Z cvt.f32.bf16 %r15729, %rs1166; 2026-02-21T10:18:33.6264096Z cvt.f32.bf16 %r15858, %rs1173; 2026-02-21T10:18:33.6264158Z cvt.f32.bf16 %r15859, %rs1174; 2026-02-21T10:18:33.6264219Z cvt.f32.bf16 %r15860, %rs1181; 2026-02-21T10:18:33.6264281Z cvt.f32.bf16 %r15861, %rs1182; 2026-02-21T10:18:33.6264346Z cvt.f32.bf16 %r15990, %rs1127; 2026-02-21T10:18:33.6264409Z cvt.f32.bf16 %r15991, %rs1128; 2026-02-21T10:18:33.6264469Z cvt.f32.bf16 %r15992, %rs1135; 2026-02-21T10:18:33.6264536Z cvt.f32.bf16 %r15993, %rs1136; 2026-02-21T10:18:33.6264597Z cvt.f32.bf16 %r16122, %rs1143; 2026-02-21T10:18:33.6264659Z cvt.f32.bf16 %r16123, %rs1144; 2026-02-21T10:18:33.6264722Z cvt.f32.bf16 %r16124, %rs1151; 2026-02-21T10:18:33.6264788Z cvt.f32.bf16 %r16125, %rs1152; 2026-02-21T10:18:33.6264852Z cvt.f32.bf16 %r16254, %rs1159; 2026-02-21T10:18:33.6264913Z cvt.f32.bf16 %r16255, %rs1160; 2026-02-21T10:18:33.6264979Z cvt.f32.bf16 %r16256, %rs1167; 2026-02-21T10:18:33.6265042Z cvt.f32.bf16 %r16257, %rs1168; 2026-02-21T10:18:33.6265104Z cvt.f32.bf16 %r16386, %rs1175; 2026-02-21T10:18:33.6265168Z cvt.f32.bf16 %r16387, %rs1176; 2026-02-21T10:18:33.6265229Z cvt.f32.bf16 %r16388, %rs1183; 2026-02-21T10:18:33.6265291Z cvt.f32.bf16 %r16389, %rs1184; 2026-02-21T10:18:33.6265509Z .loc 1 64 33 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:64:33 2026-02-21T10:18:33.6265571Z bar.sync 0; 2026-02-21T10:18:33.6265631Z // begin inline asm 2026-02-21T10:18:33.6265806Z @%p222 
mbarrier.init.shared::cta.b64 [%r19296], 1; 2026-02-21T10:18:33.6265910Z // end inline asm 2026-02-21T10:18:33.6265967Z bar.sync 0; 2026-02-21T10:18:33.6266028Z // begin inline asm 2026-02-21T10:18:33.6266166Z @%p222 mbarrier.arrive.expect_tx.shared.b64 _, [%r19296], 4096; 2026-02-21T10:18:33.6266228Z // end inline asm 2026-02-21T10:18:33.6266288Z // begin inline asm 2026-02-21T10:18:33.6266366Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.6266428Z // end inline asm 2026-02-21T10:18:33.6266604Z bar.sync 0; 2026-02-21T10:18:33.6266678Z elect.sync %r19133|%p191, -1; 2026-02-21T10:18:33.6266754Z and.pred %p151, %p1, %p191; 2026-02-21T10:18:33.6266820Z cvt.u32.u64 %r19134, %rd659; 2026-02-21T10:18:33.6266883Z add.s32 %r14273, %r19134, 128; 2026-02-21T10:18:33.6266943Z // begin inline asm 2026-02-21T10:18:33.6267291Z @%p151 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r29377], [%rd633, {%r19299, %r14273}], [%r19296]; 2026-02-21T10:18:33.6267353Z // end inline asm 2026-02-21T10:18:33.6267408Z bar.sync 0; 2026-02-21T10:18:33.6267473Z // begin inline asm 2026-02-21T10:18:33.6267528Z 2026-02-21T10:18:33.6267667Z { 2026-02-21T10:18:33.6267738Z .reg .pred complete; 2026-02-21T10:18:33.6267800Z waitLoop: 2026-02-21T10:18:33.6267949Z mbarrier.try_wait.parity.shared.b64 complete, [%r19296], %r18935; 2026-02-21T10:18:33.6268022Z @!complete bra.uni waitLoop; 2026-02-21T10:18:33.6268077Z } 2026-02-21T10:18:33.6268141Z 2026-02-21T10:18:33.6268203Z // end inline asm 2026-02-21T10:18:33.6268259Z bar.sync 0; 2026-02-21T10:18:33.6268321Z // begin inline asm 2026-02-21T10:18:33.6268481Z @%p222 mbarrier.inval.shared::cta.b64 [%r19296]; 2026-02-21T10:18:33.6268551Z // end inline asm 2026-02-21T10:18:33.6268761Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6268834Z ld.shared.s8 %rs1185, [%r26]; 2026-02-21T10:18:33.6269032Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 
2026-02-21T10:18:33.6269101Z shl.b16 %rs1186, %rs1185, 4; 2026-02-21T10:18:33.6269304Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6269374Z ld.shared.s8 %rs1187, [%r27+128]; 2026-02-21T10:18:33.6269570Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6269639Z shl.b16 %rs1188, %rs1187, 4; 2026-02-21T10:18:33.6269833Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6269899Z ld.shared.s8 %rs1189, [%r28+256]; 2026-02-21T10:18:33.6270092Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6270160Z shl.b16 %rs1190, %rs1189, 4; 2026-02-21T10:18:33.6270353Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6270420Z ld.shared.s8 %rs1191, [%r29+384]; 2026-02-21T10:18:33.6270623Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6270687Z shl.b16 %rs1192, %rs1191, 4; 2026-02-21T10:18:33.6270880Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6270949Z ld.shared.s8 %rs1193, [%r30+512]; 2026-02-21T10:18:33.6271145Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6271208Z shl.b16 %rs1194, %rs1193, 4; 2026-02-21T10:18:33.6271407Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6271471Z ld.shared.s8 %rs1195, [%r31+640]; 2026-02-21T10:18:33.6271664Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6271820Z shl.b16 %rs1196, %rs1195, 4; 2026-02-21T10:18:33.6272025Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6272153Z ld.shared.s8 %rs1197, [%r32+768]; 2026-02-21T10:18:33.6272350Z .loc 1 67 28 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6272416Z shl.b16 %rs1198, %rs1197, 4; 2026-02-21T10:18:33.6272611Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6272681Z ld.shared.s8 %rs1199, [%r33+896]; 2026-02-21T10:18:33.6272884Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6272959Z shl.b16 %rs1200, %rs1199, 4; 2026-02-21T10:18:33.6273158Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6273231Z ld.shared.s8 %rs1201, [%r26+1024]; 2026-02-21T10:18:33.6273430Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6273497Z shl.b16 %rs1202, %rs1201, 4; 2026-02-21T10:18:33.6273746Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6273815Z ld.shared.s8 %rs1203, [%r27+1152]; 2026-02-21T10:18:33.6274051Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6274116Z shl.b16 %rs1204, %rs1203, 4; 2026-02-21T10:18:33.6274314Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6274381Z ld.shared.s8 %rs1205, [%r28+1280]; 2026-02-21T10:18:33.6274585Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6274655Z shl.b16 %rs1206, %rs1205, 4; 2026-02-21T10:18:33.6274849Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6274917Z ld.shared.s8 %rs1207, [%r29+1408]; 2026-02-21T10:18:33.6275118Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6275182Z shl.b16 %rs1208, %rs1207, 4; 2026-02-21T10:18:33.6275376Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6275447Z ld.shared.s8 %rs1209, [%r30+1536]; 2026-02-21T10:18:33.6275643Z 
.loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6275705Z shl.b16 %rs1210, %rs1209, 4; 2026-02-21T10:18:33.6275900Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6275973Z ld.shared.s8 %rs1211, [%r31+1664]; 2026-02-21T10:18:33.6276168Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6276233Z shl.b16 %rs1212, %rs1211, 4; 2026-02-21T10:18:33.6276435Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6276617Z ld.shared.s8 %rs1213, [%r32+1792]; 2026-02-21T10:18:33.6276836Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6276907Z shl.b16 %rs1214, %rs1213, 4; 2026-02-21T10:18:33.6277104Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6277170Z ld.shared.s8 %rs1215, [%r33+1920]; 2026-02-21T10:18:33.6277368Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6277430Z shl.b16 %rs1216, %rs1215, 4; 2026-02-21T10:18:33.6277623Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6277795Z ld.shared.s8 %rs1217, [%r26+2048]; 2026-02-21T10:18:33.6277996Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6278124Z shl.b16 %rs1218, %rs1217, 4; 2026-02-21T10:18:33.6278322Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6278391Z ld.shared.s8 %rs1219, [%r27+2176]; 2026-02-21T10:18:33.6278588Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6278651Z shl.b16 %rs1220, %rs1219, 4; 2026-02-21T10:18:33.6278850Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6278916Z ld.shared.s8 %rs1221, [%r28+2304]; 
2026-02-21T10:18:33.6279110Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6279181Z shl.b16 %rs1222, %rs1221, 4; 2026-02-21T10:18:33.6279386Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6279451Z ld.shared.s8 %rs1223, [%r29+2432]; 2026-02-21T10:18:33.6279711Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6279782Z shl.b16 %rs1224, %rs1223, 4; 2026-02-21T10:18:33.6280041Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6280109Z ld.shared.s8 %rs1225, [%r30+2560]; 2026-02-21T10:18:33.6280310Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6280373Z shl.b16 %rs1226, %rs1225, 4; 2026-02-21T10:18:33.6280568Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6280637Z ld.shared.s8 %rs1227, [%r31+2688]; 2026-02-21T10:18:33.6280845Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6280913Z shl.b16 %rs1228, %rs1227, 4; 2026-02-21T10:18:33.6281115Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6281181Z ld.shared.s8 %rs1229, [%r32+2816]; 2026-02-21T10:18:33.6281376Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6281447Z shl.b16 %rs1230, %rs1229, 4; 2026-02-21T10:18:33.6281644Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6281710Z ld.shared.s8 %rs1231, [%r33+2944]; 2026-02-21T10:18:33.6281903Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6281970Z shl.b16 %rs1232, %rs1231, 4; 2026-02-21T10:18:33.6282168Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6282236Z ld.shared.s8 
%rs1233, [%r26+3072]; 2026-02-21T10:18:33.6282437Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6282500Z shl.b16 %rs1234, %rs1233, 4; 2026-02-21T10:18:33.6282695Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6282765Z ld.shared.s8 %rs1235, [%r27+3200]; 2026-02-21T10:18:33.6282961Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6283023Z shl.b16 %rs1236, %rs1235, 4; 2026-02-21T10:18:33.6283225Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6283291Z ld.shared.s8 %rs1237, [%r28+3328]; 2026-02-21T10:18:33.6283486Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6283606Z shl.b16 %rs1238, %rs1237, 4; 2026-02-21T10:18:33.6283806Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6283917Z ld.shared.s8 %rs1239, [%r29+3456]; 2026-02-21T10:18:33.6284113Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6284196Z shl.b16 %rs1240, %rs1239, 4; 2026-02-21T10:18:33.6284404Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6284471Z ld.shared.s8 %rs1241, [%r30+3584]; 2026-02-21T10:18:33.6284671Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6284734Z shl.b16 %rs1242, %rs1241, 4; 2026-02-21T10:18:33.6284927Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6284999Z ld.shared.s8 %rs1243, [%r31+3712]; 2026-02-21T10:18:33.6285195Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6285260Z shl.b16 %rs1244, %rs1243, 4; 2026-02-21T10:18:33.6285502Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 
2026-02-21T10:18:33.6285574Z ld.shared.s8 %rs1245, [%r32+3840]; 2026-02-21T10:18:33.6285810Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6285876Z shl.b16 %rs1246, %rs1245, 4; 2026-02-21T10:18:33.6286080Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6286145Z ld.shared.s8 %rs1247, [%r33+3968]; 2026-02-21T10:18:33.6286342Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6286411Z shl.b16 %rs1248, %rs1247, 4; 2026-02-21T10:18:33.6286740Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6286813Z cvt.s16.s8 %rs1249, %rs1186; 2026-02-21T10:18:33.6286882Z shr.s16 %rs1250, %rs1249, 4; 2026-02-21T10:18:33.6286946Z cvt.s16.s8 %rs1251, %rs1188; 2026-02-21T10:18:33.6287010Z shr.s16 %rs1252, %rs1251, 4; 2026-02-21T10:18:33.6287072Z shr.s16 %rs1253, %rs1185, 4; 2026-02-21T10:18:33.6287140Z shr.s16 %rs1254, %rs1187, 4; 2026-02-21T10:18:33.6287339Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6287409Z cvt.rn.f32.s16 %r19135, %rs1254; 2026-02-21T10:18:33.6287479Z cvt.rn.f32.s16 %r19136, %rs1253; 2026-02-21T10:18:33.6287545Z cvt.rn.f32.s16 %r19137, %rs1252; 2026-02-21T10:18:33.6287611Z cvt.rn.f32.s16 %r19138, %rs1250; 2026-02-21T10:18:33.6287811Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6287885Z cvt.s16.s8 %rs1255, %rs1190; 2026-02-21T10:18:33.6287950Z shr.s16 %rs1256, %rs1255, 4; 2026-02-21T10:18:33.6288011Z cvt.s16.s8 %rs1257, %rs1192; 2026-02-21T10:18:33.6288082Z shr.s16 %rs1258, %rs1257, 4; 2026-02-21T10:18:33.6288144Z shr.s16 %rs1259, %rs1189, 4; 2026-02-21T10:18:33.6288207Z shr.s16 %rs1260, %rs1191, 4; 2026-02-21T10:18:33.6288410Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6288475Z cvt.rn.f32.s16 %r19139, %rs1260; 
2026-02-21T10:18:33.6288540Z cvt.rn.f32.s16 %r19140, %rs1259; 2026-02-21T10:18:33.6288604Z cvt.rn.f32.s16 %r19141, %rs1258; 2026-02-21T10:18:33.6288686Z cvt.rn.f32.s16 %r19142, %rs1256; 2026-02-21T10:18:33.6288884Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6288946Z cvt.s16.s8 %rs1261, %rs1194; 2026-02-21T10:18:33.6289014Z shr.s16 %rs1262, %rs1261, 4; 2026-02-21T10:18:33.6289076Z cvt.s16.s8 %rs1263, %rs1196; 2026-02-21T10:18:33.6289137Z shr.s16 %rs1264, %rs1263, 4; 2026-02-21T10:18:33.6289292Z shr.s16 %rs1265, %rs1193, 4; 2026-02-21T10:18:33.6289416Z shr.s16 %rs1266, %rs1195, 4; 2026-02-21T10:18:33.6289621Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6289687Z cvt.rn.f32.s16 %r19143, %rs1266; 2026-02-21T10:18:33.6289756Z cvt.rn.f32.s16 %r19144, %rs1265; 2026-02-21T10:18:33.6289818Z cvt.rn.f32.s16 %r19145, %rs1264; 2026-02-21T10:18:33.6289882Z cvt.rn.f32.s16 %r19146, %rs1262; 2026-02-21T10:18:33.6290085Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6290147Z cvt.s16.s8 %rs1267, %rs1198; 2026-02-21T10:18:33.6290209Z shr.s16 %rs1268, %rs1267, 4; 2026-02-21T10:18:33.6290278Z cvt.s16.s8 %rs1269, %rs1200; 2026-02-21T10:18:33.6290341Z shr.s16 %rs1270, %rs1269, 4; 2026-02-21T10:18:33.6290402Z shr.s16 %rs1271, %rs1197, 4; 2026-02-21T10:18:33.6290464Z shr.s16 %rs1272, %rs1199, 4; 2026-02-21T10:18:33.6290667Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6290744Z cvt.rn.f32.s16 %r19147, %rs1272; 2026-02-21T10:18:33.6290875Z cvt.rn.f32.s16 %r19148, %rs1271; 2026-02-21T10:18:33.6290946Z cvt.rn.f32.s16 %r19149, %rs1270; 2026-02-21T10:18:33.6291009Z cvt.rn.f32.s16 %r19150, %rs1268; 2026-02-21T10:18:33.6291262Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6291331Z cvt.s16.s8 %rs1273, %rs1202; 2026-02-21T10:18:33.6291395Z 
shr.s16 %rs1274, %rs1273, 4; 2026-02-21T10:18:33.6291458Z cvt.s16.s8 %rs1275, %rs1204; 2026-02-21T10:18:33.6291520Z shr.s16 %rs1276, %rs1275, 4; 2026-02-21T10:18:33.6291586Z shr.s16 %rs1277, %rs1201, 4; 2026-02-21T10:18:33.6291648Z shr.s16 %rs1278, %rs1203, 4; 2026-02-21T10:18:33.6291843Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6291914Z cvt.rn.f32.s16 %r19151, %rs1278; 2026-02-21T10:18:33.6291978Z cvt.rn.f32.s16 %r19152, %rs1277; 2026-02-21T10:18:33.6292043Z cvt.rn.f32.s16 %r19153, %rs1276; 2026-02-21T10:18:33.6292108Z cvt.rn.f32.s16 %r19154, %rs1274; 2026-02-21T10:18:33.6292308Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6292371Z cvt.s16.s8 %rs1279, %rs1206; 2026-02-21T10:18:33.6292433Z shr.s16 %rs1280, %rs1279, 4; 2026-02-21T10:18:33.6292500Z cvt.s16.s8 %rs1281, %rs1208; 2026-02-21T10:18:33.6292566Z shr.s16 %rs1282, %rs1281, 4; 2026-02-21T10:18:33.6292626Z shr.s16 %rs1283, %rs1205, 4; 2026-02-21T10:18:33.6292698Z shr.s16 %rs1284, %rs1207, 4; 2026-02-21T10:18:33.6292894Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6292961Z cvt.rn.f32.s16 %r19155, %rs1284; 2026-02-21T10:18:33.6293037Z cvt.rn.f32.s16 %r19156, %rs1283; 2026-02-21T10:18:33.6293109Z cvt.rn.f32.s16 %r19157, %rs1282; 2026-02-21T10:18:33.6293175Z cvt.rn.f32.s16 %r19158, %rs1280; 2026-02-21T10:18:33.6293374Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6293442Z cvt.s16.s8 %rs1285, %rs1210; 2026-02-21T10:18:33.6293509Z shr.s16 %rs1286, %rs1285, 4; 2026-02-21T10:18:33.6293572Z cvt.s16.s8 %rs1287, %rs1212; 2026-02-21T10:18:33.6293638Z shr.s16 %rs1288, %rs1287, 4; 2026-02-21T10:18:33.6293701Z shr.s16 %rs1289, %rs1209, 4; 2026-02-21T10:18:33.6293762Z shr.s16 %rs1290, %rs1211, 4; 2026-02-21T10:18:33.6293959Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 
2026-02-21T10:18:33.6294029Z cvt.rn.f32.s16 %r19159, %rs1290; 2026-02-21T10:18:33.6294093Z cvt.rn.f32.s16 %r19160, %rs1289; 2026-02-21T10:18:33.6294157Z cvt.rn.f32.s16 %r19161, %rs1288; 2026-02-21T10:18:33.6294226Z cvt.rn.f32.s16 %r19162, %rs1286; 2026-02-21T10:18:33.6294419Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6294586Z cvt.s16.s8 %rs1291, %rs1214; 2026-02-21T10:18:33.6294649Z shr.s16 %rs1292, %rs1291, 4; 2026-02-21T10:18:33.6294718Z cvt.s16.s8 %rs1293, %rs1216; 2026-02-21T10:18:33.6294780Z shr.s16 %rs1294, %rs1293, 4; 2026-02-21T10:18:33.6294840Z shr.s16 %rs1295, %rs1213, 4; 2026-02-21T10:18:33.6294907Z shr.s16 %rs1296, %rs1215, 4; 2026-02-21T10:18:33.6295104Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6295170Z cvt.rn.f32.s16 %r19163, %rs1296; 2026-02-21T10:18:33.6295241Z cvt.rn.f32.s16 %r19164, %rs1295; 2026-02-21T10:18:33.6295305Z cvt.rn.f32.s16 %r19165, %rs1294; 2026-02-21T10:18:33.6295367Z cvt.rn.f32.s16 %r19166, %rs1292; 2026-02-21T10:18:33.6295564Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6295633Z cvt.s16.s8 %rs1297, %rs1218; 2026-02-21T10:18:33.6295699Z shr.s16 %rs1298, %rs1297, 4; 2026-02-21T10:18:33.6295763Z cvt.s16.s8 %rs1299, %rs1220; 2026-02-21T10:18:33.6295830Z shr.s16 %rs1300, %rs1299, 4; 2026-02-21T10:18:33.6295942Z shr.s16 %rs1301, %rs1217, 4; 2026-02-21T10:18:33.6296007Z shr.s16 %rs1302, %rs1219, 4; 2026-02-21T10:18:33.6296209Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6296278Z cvt.rn.f32.s16 %r19167, %rs1302; 2026-02-21T10:18:33.6296403Z cvt.rn.f32.s16 %r19168, %rs1301; 2026-02-21T10:18:33.6296587Z cvt.rn.f32.s16 %r19169, %rs1300; 2026-02-21T10:18:33.6296662Z cvt.rn.f32.s16 %r19170, %rs1298; 2026-02-21T10:18:33.6296861Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 
2026-02-21T10:18:33.6296925Z cvt.s16.s8 %rs1303, %rs1222; 2026-02-21T10:18:33.6296992Z shr.s16 %rs1304, %rs1303, 4; 2026-02-21T10:18:33.6297054Z cvt.s16.s8 %rs1305, %rs1224; 2026-02-21T10:18:33.6297118Z shr.s16 %rs1306, %rs1305, 4; 2026-02-21T10:18:33.6297183Z shr.s16 %rs1307, %rs1221, 4; 2026-02-21T10:18:33.6297249Z shr.s16 %rs1308, %rs1223, 4; 2026-02-21T10:18:33.6297445Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6297509Z cvt.rn.f32.s16 %r19171, %rs1308; 2026-02-21T10:18:33.6297579Z cvt.rn.f32.s16 %r19172, %rs1307; 2026-02-21T10:18:33.6297645Z cvt.rn.f32.s16 %r19173, %rs1306; 2026-02-21T10:18:33.6297710Z cvt.rn.f32.s16 %r19174, %rs1304; 2026-02-21T10:18:33.6297913Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6297974Z cvt.s16.s8 %rs1309, %rs1226; 2026-02-21T10:18:33.6298039Z shr.s16 %rs1310, %rs1309, 4; 2026-02-21T10:18:33.6298099Z cvt.s16.s8 %rs1311, %rs1228; 2026-02-21T10:18:33.6298165Z shr.s16 %rs1312, %rs1311, 4; 2026-02-21T10:18:33.6298226Z shr.s16 %rs1313, %rs1225, 4; 2026-02-21T10:18:33.6298289Z shr.s16 %rs1314, %rs1227, 4; 2026-02-21T10:18:33.6298492Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6298560Z cvt.rn.f32.s16 %r19175, %rs1314; 2026-02-21T10:18:33.6298624Z cvt.rn.f32.s16 %r19176, %rs1313; 2026-02-21T10:18:33.6298692Z cvt.rn.f32.s16 %r19177, %rs1312; 2026-02-21T10:18:33.6298755Z cvt.rn.f32.s16 %r19178, %rs1310; 2026-02-21T10:18:33.6298951Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6299014Z cvt.s16.s8 %rs1315, %rs1230; 2026-02-21T10:18:33.6299081Z shr.s16 %rs1316, %rs1315, 4; 2026-02-21T10:18:33.6299143Z cvt.s16.s8 %rs1317, %rs1232; 2026-02-21T10:18:33.6299203Z shr.s16 %rs1318, %rs1317, 4; 2026-02-21T10:18:33.6299270Z shr.s16 %rs1319, %rs1229, 4; 2026-02-21T10:18:33.6299331Z shr.s16 %rs1320, %rs1231, 4; 2026-02-21T10:18:33.6299526Z 
.loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6299687Z cvt.rn.f32.s16 %r19179, %rs1320; 2026-02-21T10:18:33.6299810Z cvt.rn.f32.s16 %r19180, %rs1319; 2026-02-21T10:18:33.6299873Z cvt.rn.f32.s16 %r19181, %rs1318; 2026-02-21T10:18:33.6299938Z cvt.rn.f32.s16 %r19182, %rs1316; 2026-02-21T10:18:33.6300144Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6300209Z cvt.s16.s8 %rs1321, %rs1234; 2026-02-21T10:18:33.6300273Z shr.s16 %rs1322, %rs1321, 4; 2026-02-21T10:18:33.6300343Z cvt.s16.s8 %rs1323, %rs1236; 2026-02-21T10:18:33.6300405Z shr.s16 %rs1324, %rs1323, 4; 2026-02-21T10:18:33.6300466Z shr.s16 %rs1325, %rs1233, 4; 2026-02-21T10:18:33.6300527Z shr.s16 %rs1326, %rs1235, 4; 2026-02-21T10:18:33.6300741Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6300808Z cvt.rn.f32.s16 %r19183, %rs1326; 2026-02-21T10:18:33.6300872Z cvt.rn.f32.s16 %r19184, %rs1325; 2026-02-21T10:18:33.6300941Z cvt.rn.f32.s16 %r19185, %rs1324; 2026-02-21T10:18:33.6301005Z cvt.rn.f32.s16 %r19186, %rs1322; 2026-02-21T10:18:33.6301272Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6301344Z cvt.s16.s8 %rs1327, %rs1238; 2026-02-21T10:18:33.6301406Z shr.s16 %rs1328, %rs1327, 4; 2026-02-21T10:18:33.6301468Z cvt.s16.s8 %rs1329, %rs1240; 2026-02-21T10:18:33.6301529Z shr.s16 %rs1330, %rs1329, 4; 2026-02-21T10:18:33.6301651Z shr.s16 %rs1331, %rs1237, 4; 2026-02-21T10:18:33.6301714Z shr.s16 %rs1332, %rs1239, 4; 2026-02-21T10:18:33.6301913Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6301984Z cvt.rn.f32.s16 %r19187, %rs1332; 2026-02-21T10:18:33.6302051Z cvt.rn.f32.s16 %r19188, %rs1331; 2026-02-21T10:18:33.6302118Z cvt.rn.f32.s16 %r19189, %rs1330; 2026-02-21T10:18:33.6302191Z cvt.rn.f32.s16 %r19190, %rs1328; 2026-02-21T10:18:33.6302401Z .loc 1 69 25 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6302468Z cvt.s16.s8 %rs1333, %rs1242; 2026-02-21T10:18:33.6302534Z shr.s16 %rs1334, %rs1333, 4; 2026-02-21T10:18:33.6302603Z cvt.s16.s8 %rs1335, %rs1244; 2026-02-21T10:18:33.6302664Z shr.s16 %rs1336, %rs1335, 4; 2026-02-21T10:18:33.6302726Z shr.s16 %rs1337, %rs1241, 4; 2026-02-21T10:18:33.6302792Z shr.s16 %rs1338, %rs1243, 4; 2026-02-21T10:18:33.6302990Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6303055Z cvt.rn.f32.s16 %r19191, %rs1338; 2026-02-21T10:18:33.6303128Z cvt.rn.f32.s16 %r19192, %rs1337; 2026-02-21T10:18:33.6303192Z cvt.rn.f32.s16 %r19193, %rs1336; 2026-02-21T10:18:33.6303255Z cvt.rn.f32.s16 %r19194, %rs1334; 2026-02-21T10:18:33.6303455Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6303522Z cvt.s16.s8 %rs1339, %rs1246; 2026-02-21T10:18:33.6303587Z shr.s16 %rs1340, %rs1339, 4; 2026-02-21T10:18:33.6303650Z cvt.s16.s8 %rs1341, %rs1248; 2026-02-21T10:18:33.6303716Z shr.s16 %rs1342, %rs1341, 4; 2026-02-21T10:18:33.6303779Z shr.s16 %rs1343, %rs1245, 4; 2026-02-21T10:18:33.6303841Z shr.s16 %rs1344, %rs1247, 4; 2026-02-21T10:18:33.6304043Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6304109Z cvt.rn.f32.s16 %r19195, %rs1344; 2026-02-21T10:18:33.6304174Z cvt.rn.f32.s16 %r19196, %rs1343; 2026-02-21T10:18:33.6304237Z cvt.rn.f32.s16 %r19197, %rs1342; 2026-02-21T10:18:33.6304306Z cvt.rn.f32.s16 %r19198, %rs1340; 2026-02-21T10:18:33.6304363Z bar.sync 0; 2026-02-21T10:18:33.6304494Z st.shared.v4.b32 [%r34], {%r19138, %r19136, %r19137, %r19135}; 2026-02-21T10:18:33.6304632Z st.shared.v4.b32 [%r34+16384], {%r19170, %r19168, %r19169, %r19167}; 2026-02-21T10:18:33.6304746Z st.shared.v4.b32 [%r35], {%r19142, %r19140, %r19141, %r19139}; 2026-02-21T10:18:33.6304939Z st.shared.v4.b32 [%r35+16384], {%r19174, %r19172, %r19173, %r19171}; 
2026-02-21T10:18:33.6305100Z st.shared.v4.b32 [%r36], {%r19146, %r19144, %r19145, %r19143}; 2026-02-21T10:18:33.6305220Z st.shared.v4.b32 [%r36+16384], {%r19178, %r19176, %r19177, %r19175}; 2026-02-21T10:18:33.6305329Z st.shared.v4.b32 [%r37], {%r19150, %r19148, %r19149, %r19147}; 2026-02-21T10:18:33.6305448Z st.shared.v4.b32 [%r37+16384], {%r19182, %r19180, %r19181, %r19179}; 2026-02-21T10:18:33.6305561Z st.shared.v4.b32 [%r38], {%r19154, %r19152, %r19153, %r19151}; 2026-02-21T10:18:33.6305677Z st.shared.v4.b32 [%r38+16384], {%r19186, %r19184, %r19185, %r19183}; 2026-02-21T10:18:33.6305785Z st.shared.v4.b32 [%r39], {%r19158, %r19156, %r19157, %r19155}; 2026-02-21T10:18:33.6305907Z st.shared.v4.b32 [%r39+16384], {%r19190, %r19188, %r19189, %r19187}; 2026-02-21T10:18:33.6306016Z st.shared.v4.b32 [%r40], {%r19162, %r19160, %r19161, %r19159}; 2026-02-21T10:18:33.6306132Z st.shared.v4.b32 [%r40+16384], {%r19194, %r19192, %r19193, %r19191}; 2026-02-21T10:18:33.6306252Z st.shared.v4.b32 [%r41], {%r19166, %r19164, %r19165, %r19163}; 2026-02-21T10:18:33.6306418Z st.shared.v4.b32 [%r41+16384], {%r19198, %r19196, %r19197, %r19195}; 2026-02-21T10:18:33.6306591Z $L__tmp11: 2026-02-21T10:18:33.6306884Z .loc 2 291 36 // standard.py:291:36 @[ c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:94:40 ] 2026-02-21T10:18:33.6306959Z // begin inline asm 2026-02-21T10:18:33.6307114Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.6307177Z // end inline asm 2026-02-21T10:18:33.6307238Z bar.sync 0; 2026-02-21T10:18:33.6307315Z wgmma.fence.sync.aligned; 2026-02-21T10:18:33.6307375Z // begin inline asm 2026-02-21T10:18:33.6308975Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r14406,%r14407,%r14408,%r14409}, %rd3, %p133, 1, 1; 2026-02-21T10:18:33.6309042Z // end inline asm 2026-02-21T10:18:33.6309103Z // begin inline asm 2026-02-21T10:18:33.6310590Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r14538,%r14539,%r14540,%r14541}, %rd4, %p133, 1, 1; 2026-02-21T10:18:33.6310657Z // end inline asm 2026-02-21T10:18:33.6310722Z // begin inline asm 2026-02-21T10:18:33.6312205Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r14670,%r14671,%r14672,%r14673}, %rd5, %p133, 1, 1; 2026-02-21T10:18:33.6312264Z // end inline asm 2026-02-21T10:18:33.6312331Z // begin inline asm 2026-02-21T10:18:33.6313816Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r14802,%r14803,%r14804,%r14805}, %rd6, %p133, 1, 1; 2026-02-21T10:18:33.6314016Z // end inline asm 2026-02-21T10:18:33.6314077Z // begin inline asm 2026-02-21T10:18:33.6315630Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r14934,%r14935,%r14936,%r14937}, %rd7, %p133, 1, 1; 2026-02-21T10:18:33.6315698Z // end inline asm 2026-02-21T10:18:33.6315759Z // begin inline asm 2026-02-21T10:18:33.6317450Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r15066,%r15067,%r15068,%r15069}, %rd8, %p133, 1, 1; 2026-02-21T10:18:33.6317528Z // end inline asm 2026-02-21T10:18:33.6317595Z // begin inline asm 2026-02-21T10:18:33.6319087Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r15198,%r15199,%r15200,%r15201}, %rd9, %p133, 1, 1; 2026-02-21T10:18:33.6319149Z // end inline asm 2026-02-21T10:18:33.6319209Z // begin inline asm 2026-02-21T10:18:33.6320697Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r15330,%r15331,%r15332,%r15333}, %rd10, %p133, 1, 1; 2026-02-21T10:18:33.6320759Z // end inline asm 2026-02-21T10:18:33.6320817Z // begin inline asm 2026-02-21T10:18:33.6322299Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r15462,%r15463,%r15464,%r15465}, %rd3, %p133, 1, 1; 2026-02-21T10:18:33.6322485Z // end inline asm 2026-02-21T10:18:33.6322551Z // begin inline asm 2026-02-21T10:18:33.6324031Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r15594,%r15595,%r15596,%r15597}, %rd4, %p133, 1, 1; 2026-02-21T10:18:33.6324088Z // end inline asm 2026-02-21T10:18:33.6324154Z // begin inline asm 2026-02-21T10:18:33.6325761Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r15726,%r15727,%r15728,%r15729}, %rd5, %p133, 1, 1; 2026-02-21T10:18:33.6325827Z // end inline asm 2026-02-21T10:18:33.6325886Z // begin inline asm 2026-02-21T10:18:33.6327499Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r15858,%r15859,%r15860,%r15861}, %rd6, %p133, 1, 1; 2026-02-21T10:18:33.6327570Z // end inline asm 2026-02-21T10:18:33.6327635Z // begin inline asm 2026-02-21T10:18:33.6329111Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r15990,%r15991,%r15992,%r15993}, %rd7, %p133, 1, 1; 2026-02-21T10:18:33.6329180Z // end inline asm 2026-02-21T10:18:33.6329240Z // begin inline asm 2026-02-21T10:18:33.6330730Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r16122,%r16123,%r16124,%r16125}, %rd8, %p133, 1, 1; 2026-02-21T10:18:33.6330790Z // end inline asm 2026-02-21T10:18:33.6330849Z // begin inline asm 2026-02-21T10:18:33.6332405Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r16254,%r16255,%r16256,%r16257}, %rd9, %p133, 1, 1; 2026-02-21T10:18:33.6332524Z // end inline asm 2026-02-21T10:18:33.6332583Z // begin inline asm 2026-02-21T10:18:33.6334136Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r16386,%r16387,%r16388,%r16389}, %rd10, %p133, 1, 1; 2026-02-21T10:18:33.6334205Z // end inline asm 2026-02-21T10:18:33.6334292Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:33.6334411Z mov.b32 %r16519, %r18935; 2026-02-21T10:18:33.6334474Z mov.b32 %r16520, %r18935; 2026-02-21T10:18:33.6334540Z mov.b32 %r16518, %r29377; 2026-02-21T10:18:33.6334603Z // begin inline asm 2026-02-21T10:18:33.6337236Z // wait for regs: 
%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227,%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291,%r16518,%r16519,%r16520 2026-02-21T10:18:33.6337328Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:33.6337387Z // end inline asm 2026-02-21T10:18:33.6337443Z $L__tmp12: 2026-02-21T10:18:33.6337662Z .loc 1 58 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:32 2026-02-21T10:18:33.6337734Z add.s64 %rd383, %rd301, 256; 2026-02-21T10:18:33.6337801Z add.s64 %rd386, %rd304, 256; 2026-02-21T10:18:33.6337868Z add.s64 %rd389, %rd307, 256; 2026-02-21T10:18:33.6337930Z add.s64 %rd392, %rd310, 256; 2026-02-21T10:18:33.6337993Z add.s64 %rd395, %rd313, 256; 2026-02-21T10:18:33.6338064Z add.s64 %rd398, %rd316, 256; 2026-02-21T10:18:33.6338126Z add.s64 %rd401, %rd319, 256; 2026-02-21T10:18:33.6338205Z mad.wide.s32 %rd404, %r32163, 2, %rd85; 2026-02-21T10:18:33.6338408Z .loc 1 58 80 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:80 2026-02-21T10:18:33.6338474Z // begin inline asm 2026-02-21T10:18:33.6338535Z mov.u64 %rd382, 0x0; 
2026-02-21T10:18:33.6338667Z createpolicy.fractional.L2::evict_first.b64 %rd382, 1.0; 2026-02-21T10:18:33.6338731Z // end inline asm 2026-02-21T10:18:33.6338791Z // begin inline asm 2026-02-21T10:18:33.6338851Z mov.u32 %r16652, 0x0; 2026-02-21T10:18:33.6339002Z mov.u32 %r16653, 0x0; 2026-02-21T10:18:33.6341557Z mov.u32 %r16654, 0x0; 2026-02-21T10:18:33.6341771Z mov.u32 %r16655, 0x0; 2026-02-21T10:18:33.6342049Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r16652, %r16653, %r16654, %r16655 }, [ %rd383 + 0 ], %rd382; 2026-02-21T10:18:33.6342113Z // end inline asm 2026-02-21T10:18:33.6342176Z // begin inline asm 2026-02-21T10:18:33.6342243Z mov.u64 %rd385, 0x0; 2026-02-21T10:18:33.6342381Z createpolicy.fractional.L2::evict_first.b64 %rd385, 1.0; 2026-02-21T10:18:33.6342442Z // end inline asm 2026-02-21T10:18:33.6342514Z // begin inline asm 2026-02-21T10:18:33.6342582Z mov.u32 %r16656, 0x0; 2026-02-21T10:18:33.6342641Z mov.u32 %r16657, 0x0; 2026-02-21T10:18:33.6342700Z mov.u32 %r16658, 0x0; 2026-02-21T10:18:33.6342763Z mov.u32 %r16659, 0x0; 2026-02-21T10:18:33.6343005Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r16656, %r16657, %r16658, %r16659 }, [ %rd386 + 0 ], %rd385; 2026-02-21T10:18:33.6343064Z // end inline asm 2026-02-21T10:18:33.6343129Z // begin inline asm 2026-02-21T10:18:33.6343191Z mov.u64 %rd388, 0x0; 2026-02-21T10:18:33.6343320Z createpolicy.fractional.L2::evict_first.b64 %rd388, 1.0; 2026-02-21T10:18:33.6343461Z // end inline asm 2026-02-21T10:18:33.6343529Z // begin inline asm 2026-02-21T10:18:33.6343589Z mov.u32 %r16660, 0x0; 2026-02-21T10:18:33.6343649Z mov.u32 %r16661, 0x0; 2026-02-21T10:18:33.6343712Z mov.u32 %r16662, 0x0; 2026-02-21T10:18:33.6343768Z mov.u32 %r16663, 0x0; 2026-02-21T10:18:33.6344068Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r16660, %r16661, %r16662, %r16663 }, [ %rd389 + 0 ], %rd388; 2026-02-21T10:18:33.6344136Z // end inline asm 2026-02-21T10:18:33.6344196Z // begin inline asm 2026-02-21T10:18:33.6344254Z 
mov.u64 %rd391, 0x0; 2026-02-21T10:18:33.6344387Z createpolicy.fractional.L2::evict_first.b64 %rd391, 1.0; 2026-02-21T10:18:33.6344444Z // end inline asm 2026-02-21T10:18:33.6344503Z // begin inline asm 2026-02-21T10:18:33.6344559Z mov.u32 %r16664, 0x0; 2026-02-21T10:18:33.6344625Z mov.u32 %r16665, 0x0; 2026-02-21T10:18:33.6344682Z mov.u32 %r16666, 0x0; 2026-02-21T10:18:33.6344740Z mov.u32 %r16667, 0x0; 2026-02-21T10:18:33.6344972Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r16664, %r16665, %r16666, %r16667 }, [ %rd392 + 0 ], %rd391; 2026-02-21T10:18:33.6345032Z // end inline asm 2026-02-21T10:18:33.6345091Z // begin inline asm 2026-02-21T10:18:33.6345161Z mov.u64 %rd394, 0x0; 2026-02-21T10:18:33.6345293Z createpolicy.fractional.L2::evict_first.b64 %rd394, 1.0; 2026-02-21T10:18:33.6345354Z // end inline asm 2026-02-21T10:18:33.6345414Z // begin inline asm 2026-02-21T10:18:33.6345476Z mov.u32 %r16668, 0x0; 2026-02-21T10:18:33.6345535Z mov.u32 %r16669, 0x0; 2026-02-21T10:18:33.6345591Z mov.u32 %r16670, 0x0; 2026-02-21T10:18:33.6345654Z mov.u32 %r16671, 0x0; 2026-02-21T10:18:33.6345880Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r16668, %r16669, %r16670, %r16671 }, [ %rd395 + 0 ], %rd394; 2026-02-21T10:18:33.6345940Z // end inline asm 2026-02-21T10:18:33.6346000Z // begin inline asm 2026-02-21T10:18:33.6346065Z mov.u64 %rd397, 0x0; 2026-02-21T10:18:33.6346187Z createpolicy.fractional.L2::evict_first.b64 %rd397, 1.0; 2026-02-21T10:18:33.6346246Z // end inline asm 2026-02-21T10:18:33.6346309Z // begin inline asm 2026-02-21T10:18:33.6346367Z mov.u32 %r16672, 0x0; 2026-02-21T10:18:33.6346425Z mov.u32 %r16673, 0x0; 2026-02-21T10:18:33.6346649Z mov.u32 %r16674, 0x0; 2026-02-21T10:18:33.6346715Z mov.u32 %r16675, 0x0; 2026-02-21T10:18:33.6346939Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r16672, %r16673, %r16674, %r16675 }, [ %rd398 + 0 ], %rd397; 2026-02-21T10:18:33.6346996Z // end inline asm 2026-02-21T10:18:33.6347059Z // begin inline asm 
2026-02-21T10:18:33.6347116Z mov.u64 %rd400, 0x0; 2026-02-21T10:18:33.6347234Z createpolicy.fractional.L2::evict_first.b64 %rd400, 1.0; 2026-02-21T10:18:33.6347294Z // end inline asm 2026-02-21T10:18:33.6347351Z // begin inline asm 2026-02-21T10:18:33.6347408Z mov.u32 %r16676, 0x0; 2026-02-21T10:18:33.6347557Z mov.u32 %r16677, 0x0; 2026-02-21T10:18:33.6347619Z mov.u32 %r16678, 0x0; 2026-02-21T10:18:33.6347737Z mov.u32 %r16679, 0x0; 2026-02-21T10:18:33.6347963Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r16676, %r16677, %r16678, %r16679 }, [ %rd401 + 0 ], %rd400; 2026-02-21T10:18:33.6348026Z // end inline asm 2026-02-21T10:18:33.6348086Z // begin inline asm 2026-02-21T10:18:33.6348143Z mov.u64 %rd403, 0x0; 2026-02-21T10:18:33.6348267Z createpolicy.fractional.L2::evict_first.b64 %rd403, 1.0; 2026-02-21T10:18:33.6348324Z // end inline asm 2026-02-21T10:18:33.6348383Z // begin inline asm 2026-02-21T10:18:33.6348523Z mov.u32 %r16680, 0x0; 2026-02-21T10:18:33.6348590Z mov.u32 %r16681, 0x0; 2026-02-21T10:18:33.6348648Z mov.u32 %r16682, 0x0; 2026-02-21T10:18:33.6348707Z mov.u32 %r16683, 0x0; 2026-02-21T10:18:33.6348931Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r16680, %r16681, %r16682, %r16683 }, [ %rd404 + 0 ], %rd403; 2026-02-21T10:18:33.6348989Z // end inline asm 2026-02-21T10:18:33.6349213Z .loc 1 62 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:62:32 2026-02-21T10:18:33.6349278Z bar.sync 0; 2026-02-21T10:18:33.6349439Z st.shared.v2.b32 [%r16], {%r16652, %r16653}; 2026-02-21T10:18:33.6349538Z st.shared.v2.b32 [%r16+2048], {%r16656, %r16657}; 2026-02-21T10:18:33.6349624Z st.shared.v2.b32 [%r16+4096], {%r16660, %r16661}; 2026-02-21T10:18:33.6349715Z st.shared.v2.b32 [%r16+6144], {%r16664, %r16665}; 2026-02-21T10:18:33.6349871Z st.shared.v2.b32 [%r16+8192], {%r16668, %r16669}; 2026-02-21T10:18:33.6349970Z st.shared.v2.b32 [%r16+10240], {%r16672, %r16673}; 2026-02-21T10:18:33.6350063Z st.shared.v2.b32 [%r16+12288], {%r16676, %r16677}; 
2026-02-21T10:18:33.6350151Z st.shared.v2.b32 [%r16+14336], {%r16680, %r16681}; 2026-02-21T10:18:33.6350232Z st.shared.v2.b32 [%r17], {%r16654, %r16655}; 2026-02-21T10:18:33.6350322Z st.shared.v2.b32 [%r17+2048], {%r16658, %r16659}; 2026-02-21T10:18:33.6350407Z st.shared.v2.b32 [%r17+4096], {%r16662, %r16663}; 2026-02-21T10:18:33.6350492Z st.shared.v2.b32 [%r17+6144], {%r16666, %r16667}; 2026-02-21T10:18:33.6350577Z st.shared.v2.b32 [%r17+8192], {%r16670, %r16671}; 2026-02-21T10:18:33.6350669Z st.shared.v2.b32 [%r17+10240], {%r16674, %r16675}; 2026-02-21T10:18:33.6350757Z st.shared.v2.b32 [%r17+12288], {%r16678, %r16679}; 2026-02-21T10:18:33.6350843Z st.shared.v2.b32 [%r17+14336], {%r16682, %r16683}; 2026-02-21T10:18:33.6350904Z bar.sync 0; 2026-02-21T10:18:33.6350976Z ld.shared.b16 %rs1345, [%r58]; 2026-02-21T10:18:33.6351050Z ld.shared.b16 %rs1346, [%r58+1024]; 2026-02-21T10:18:33.6351122Z ld.shared.b16 %rs1347, [%r58+64]; 2026-02-21T10:18:33.6351189Z ld.shared.b16 %rs1348, [%r58+1088]; 2026-02-21T10:18:33.6351254Z ld.shared.b16 %rs1349, [%r58+8192]; 2026-02-21T10:18:33.6351321Z ld.shared.b16 %rs1350, [%r58+9216]; 2026-02-21T10:18:33.6351392Z ld.shared.b16 %rs1351, [%r58+8256]; 2026-02-21T10:18:33.6351460Z ld.shared.b16 %rs1352, [%r58+9280]; 2026-02-21T10:18:33.6351529Z ld.shared.b16 %rs1353, [%r59]; 2026-02-21T10:18:33.6351602Z ld.shared.b16 %rs1354, [%r59+1024]; 2026-02-21T10:18:33.6351675Z ld.shared.b16 %rs1355, [%r59+64]; 2026-02-21T10:18:33.6351743Z ld.shared.b16 %rs1356, [%r59+1088]; 2026-02-21T10:18:33.6351809Z ld.shared.b16 %rs1357, [%r59+8192]; 2026-02-21T10:18:33.6351879Z ld.shared.b16 %rs1358, [%r59+9216]; 2026-02-21T10:18:33.6351947Z ld.shared.b16 %rs1359, [%r59+8256]; 2026-02-21T10:18:33.6352013Z ld.shared.b16 %rs1360, [%r59+9280]; 2026-02-21T10:18:33.6352088Z ld.shared.b16 %rs1361, [%r60]; 2026-02-21T10:18:33.6352153Z ld.shared.b16 %rs1362, [%r60+1024]; 2026-02-21T10:18:33.6352220Z ld.shared.b16 %rs1363, [%r60+64]; 2026-02-21T10:18:33.6352289Z 
ld.shared.b16 %rs1364, [%r60+1088]; 2026-02-21T10:18:33.6352354Z ld.shared.b16 %rs1365, [%r60+8192]; 2026-02-21T10:18:33.6352418Z ld.shared.b16 %rs1366, [%r60+9216]; 2026-02-21T10:18:33.6352484Z ld.shared.b16 %rs1367, [%r60+8256]; 2026-02-21T10:18:33.6352553Z ld.shared.b16 %rs1368, [%r60+9280]; 2026-02-21T10:18:33.6352677Z ld.shared.b16 %rs1369, [%r61]; 2026-02-21T10:18:33.6352805Z ld.shared.b16 %rs1370, [%r61+1024]; 2026-02-21T10:18:33.6352873Z ld.shared.b16 %rs1371, [%r61+64]; 2026-02-21T10:18:33.6352940Z ld.shared.b16 %rs1372, [%r61+1088]; 2026-02-21T10:18:33.6353008Z ld.shared.b16 %rs1373, [%r61+8192]; 2026-02-21T10:18:33.6353072Z ld.shared.b16 %rs1374, [%r61+9216]; 2026-02-21T10:18:33.6353144Z ld.shared.b16 %rs1375, [%r61+8256]; 2026-02-21T10:18:33.6353213Z ld.shared.b16 %rs1376, [%r61+9280]; 2026-02-21T10:18:33.6353279Z ld.shared.b16 %rs1377, [%r62]; 2026-02-21T10:18:33.6353350Z ld.shared.b16 %rs1378, [%r62+1024]; 2026-02-21T10:18:33.6353414Z ld.shared.b16 %rs1379, [%r62+64]; 2026-02-21T10:18:33.6353479Z ld.shared.b16 %rs1380, [%r62+1088]; 2026-02-21T10:18:33.6353548Z ld.shared.b16 %rs1381, [%r62+8192]; 2026-02-21T10:18:33.6353615Z ld.shared.b16 %rs1382, [%r62+9216]; 2026-02-21T10:18:33.6353678Z ld.shared.b16 %rs1383, [%r62+8256]; 2026-02-21T10:18:33.6353743Z ld.shared.b16 %rs1384, [%r62+9280]; 2026-02-21T10:18:33.6353825Z ld.shared.b16 %rs1385, [%r63]; 2026-02-21T10:18:33.6353893Z ld.shared.b16 %rs1386, [%r63+1024]; 2026-02-21T10:18:33.6354007Z ld.shared.b16 %rs1387, [%r63+64]; 2026-02-21T10:18:33.6354078Z ld.shared.b16 %rs1388, [%r63+1088]; 2026-02-21T10:18:33.6354144Z ld.shared.b16 %rs1389, [%r63+8192]; 2026-02-21T10:18:33.6354207Z ld.shared.b16 %rs1390, [%r63+9216]; 2026-02-21T10:18:33.6354315Z ld.shared.b16 %rs1391, [%r63+8256]; 2026-02-21T10:18:33.6354385Z ld.shared.b16 %rs1392, [%r63+9280]; 2026-02-21T10:18:33.6354448Z ld.shared.b16 %rs1393, [%r64]; 2026-02-21T10:18:33.6354515Z ld.shared.b16 %rs1394, [%r64+1024]; 2026-02-21T10:18:33.6354589Z 
ld.shared.b16 %rs1395, [%r64+64]; 2026-02-21T10:18:33.6354655Z ld.shared.b16 %rs1396, [%r64+1088]; 2026-02-21T10:18:33.6354721Z ld.shared.b16 %rs1397, [%r64+8192]; 2026-02-21T10:18:33.6354793Z ld.shared.b16 %rs1398, [%r64+9216]; 2026-02-21T10:18:33.6354859Z ld.shared.b16 %rs1399, [%r64+8256]; 2026-02-21T10:18:33.6354926Z ld.shared.b16 %rs1400, [%r64+9280]; 2026-02-21T10:18:33.6354993Z ld.shared.b16 %rs1401, [%r65]; 2026-02-21T10:18:33.6355064Z ld.shared.b16 %rs1402, [%r65+1024]; 2026-02-21T10:18:33.6355129Z ld.shared.b16 %rs1403, [%r65+64]; 2026-02-21T10:18:33.6355194Z ld.shared.b16 %rs1404, [%r65+1088]; 2026-02-21T10:18:33.6355265Z ld.shared.b16 %rs1405, [%r65+8192]; 2026-02-21T10:18:33.6355342Z ld.shared.b16 %rs1406, [%r65+9216]; 2026-02-21T10:18:33.6355413Z ld.shared.b16 %rs1407, [%r65+8256]; 2026-02-21T10:18:33.6355477Z ld.shared.b16 %rs1408, [%r65+9280]; 2026-02-21T10:18:33.6355545Z cvt.f32.bf16 %r16821, %rs1345; 2026-02-21T10:18:33.6355606Z cvt.f32.bf16 %r16822, %rs1346; 2026-02-21T10:18:33.6355666Z cvt.f32.bf16 %r16823, %rs1353; 2026-02-21T10:18:33.6355729Z cvt.f32.bf16 %r16824, %rs1354; 2026-02-21T10:18:33.6355791Z cvt.f32.bf16 %r16953, %rs1361; 2026-02-21T10:18:33.6355851Z cvt.f32.bf16 %r16954, %rs1362; 2026-02-21T10:18:33.6355913Z cvt.f32.bf16 %r16955, %rs1369; 2026-02-21T10:18:33.6355979Z cvt.f32.bf16 %r16956, %rs1370; 2026-02-21T10:18:33.6356039Z cvt.f32.bf16 %r17085, %rs1377; 2026-02-21T10:18:33.6356100Z cvt.f32.bf16 %r17086, %rs1378; 2026-02-21T10:18:33.6356167Z cvt.f32.bf16 %r17087, %rs1385; 2026-02-21T10:18:33.6356232Z cvt.f32.bf16 %r17088, %rs1386; 2026-02-21T10:18:33.6356292Z cvt.f32.bf16 %r17217, %rs1393; 2026-02-21T10:18:33.6356359Z cvt.f32.bf16 %r17218, %rs1394; 2026-02-21T10:18:33.6356421Z cvt.f32.bf16 %r17219, %rs1401; 2026-02-21T10:18:33.6356607Z cvt.f32.bf16 %r17220, %rs1402; 2026-02-21T10:18:33.6356674Z cvt.f32.bf16 %r17349, %rs1347; 2026-02-21T10:18:33.6356739Z cvt.f32.bf16 %r17350, %rs1348; 2026-02-21T10:18:33.6356800Z cvt.f32.bf16 
%r17351, %rs1355; 2026-02-21T10:18:33.6356861Z cvt.f32.bf16 %r17352, %rs1356; 2026-02-21T10:18:33.6356925Z cvt.f32.bf16 %r17481, %rs1363; 2026-02-21T10:18:33.6356985Z cvt.f32.bf16 %r17482, %rs1364; 2026-02-21T10:18:33.6357046Z cvt.f32.bf16 %r17483, %rs1371; 2026-02-21T10:18:33.6357198Z cvt.f32.bf16 %r17484, %rs1372; 2026-02-21T10:18:33.6357263Z cvt.f32.bf16 %r17613, %rs1379; 2026-02-21T10:18:33.6357384Z cvt.f32.bf16 %r17614, %rs1380; 2026-02-21T10:18:33.6357448Z cvt.f32.bf16 %r17615, %rs1387; 2026-02-21T10:18:33.6357514Z cvt.f32.bf16 %r17616, %rs1388; 2026-02-21T10:18:33.6357574Z cvt.f32.bf16 %r17745, %rs1395; 2026-02-21T10:18:33.6357634Z cvt.f32.bf16 %r17746, %rs1396; 2026-02-21T10:18:33.6357696Z cvt.f32.bf16 %r17747, %rs1403; 2026-02-21T10:18:33.6357773Z cvt.f32.bf16 %r17748, %rs1404; 2026-02-21T10:18:33.6357837Z cvt.f32.bf16 %r17877, %rs1349; 2026-02-21T10:18:33.6357897Z cvt.f32.bf16 %r17878, %rs1350; 2026-02-21T10:18:33.6357963Z cvt.f32.bf16 %r17879, %rs1357; 2026-02-21T10:18:33.6358024Z cvt.f32.bf16 %r17880, %rs1358; 2026-02-21T10:18:33.6358086Z cvt.f32.bf16 %r18009, %rs1365; 2026-02-21T10:18:33.6358150Z cvt.f32.bf16 %r18010, %rs1366; 2026-02-21T10:18:33.6358211Z cvt.f32.bf16 %r18011, %rs1373; 2026-02-21T10:18:33.6358271Z cvt.f32.bf16 %r18012, %rs1374; 2026-02-21T10:18:33.6358334Z cvt.f32.bf16 %r18141, %rs1381; 2026-02-21T10:18:33.6358399Z cvt.f32.bf16 %r18142, %rs1382; 2026-02-21T10:18:33.6358459Z cvt.f32.bf16 %r18143, %rs1389; 2026-02-21T10:18:33.6358585Z cvt.f32.bf16 %r18144, %rs1390; 2026-02-21T10:18:33.6358656Z cvt.f32.bf16 %r18273, %rs1397; 2026-02-21T10:18:33.6358718Z cvt.f32.bf16 %r18274, %rs1398; 2026-02-21T10:18:33.6358777Z cvt.f32.bf16 %r18275, %rs1405; 2026-02-21T10:18:33.6358904Z cvt.f32.bf16 %r18276, %rs1406; 2026-02-21T10:18:33.6358975Z cvt.f32.bf16 %r18405, %rs1351; 2026-02-21T10:18:33.6359037Z cvt.f32.bf16 %r18406, %rs1352; 2026-02-21T10:18:33.6359097Z cvt.f32.bf16 %r18407, %rs1359; 2026-02-21T10:18:33.6359163Z cvt.f32.bf16 %r18408, %rs1360; 
2026-02-21T10:18:33.6359222Z cvt.f32.bf16 %r18537, %rs1367; 2026-02-21T10:18:33.6359282Z cvt.f32.bf16 %r18538, %rs1368; 2026-02-21T10:18:33.6359343Z cvt.f32.bf16 %r18539, %rs1375; 2026-02-21T10:18:33.6359405Z cvt.f32.bf16 %r18540, %rs1376; 2026-02-21T10:18:33.6359469Z cvt.f32.bf16 %r18669, %rs1383; 2026-02-21T10:18:33.6359529Z cvt.f32.bf16 %r18670, %rs1384; 2026-02-21T10:18:33.6359595Z cvt.f32.bf16 %r18671, %rs1391; 2026-02-21T10:18:33.6359658Z cvt.f32.bf16 %r18672, %rs1392; 2026-02-21T10:18:33.6359719Z cvt.f32.bf16 %r18801, %rs1399; 2026-02-21T10:18:33.6359779Z cvt.f32.bf16 %r18802, %rs1400; 2026-02-21T10:18:33.6359843Z cvt.f32.bf16 %r18803, %rs1407; 2026-02-21T10:18:33.6359905Z cvt.f32.bf16 %r18804, %rs1408; 2026-02-21T10:18:33.6360124Z .loc 1 64 33 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:64:33 2026-02-21T10:18:33.6360188Z bar.sync 0; 2026-02-21T10:18:33.6360249Z // begin inline asm 2026-02-21T10:18:33.6360356Z @%p222 mbarrier.init.shared::cta.b64 [%r19296], 1; 2026-02-21T10:18:33.6360419Z // end inline asm 2026-02-21T10:18:33.6360474Z bar.sync 0; 2026-02-21T10:18:33.6360534Z // begin inline asm 2026-02-21T10:18:33.6360671Z @%p222 mbarrier.arrive.expect_tx.shared.b64 _, [%r19296], 4096; 2026-02-21T10:18:33.6360734Z // end inline asm 2026-02-21T10:18:33.6360793Z // begin inline asm 2026-02-21T10:18:33.6360872Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.6360933Z // end inline asm 2026-02-21T10:18:33.6360989Z bar.sync 0; 2026-02-21T10:18:33.6361061Z elect.sync %r19199|%p192, -1; 2026-02-21T10:18:33.6361131Z and.pred %p171, %p1, %p192; 2026-02-21T10:18:33.6361199Z add.s32 %r16688, %r19134, 160; 2026-02-21T10:18:33.6361257Z // begin inline asm 2026-02-21T10:18:33.6361604Z @%p171 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r29377], [%rd633, {%r19299, %r16688}], [%r19296]; 2026-02-21T10:18:33.6361665Z // end inline asm 2026-02-21T10:18:33.6361719Z bar.sync 0; 2026-02-21T10:18:33.6361777Z // begin inline asm 
2026-02-21T10:18:33.6361836Z 2026-02-21T10:18:33.6361888Z { 2026-02-21T10:18:33.6361952Z .reg .pred complete; 2026-02-21T10:18:33.6362008Z waitLoop: 2026-02-21T10:18:33.6362164Z mbarrier.try_wait.parity.shared.b64 complete, [%r19296], %r18935; 2026-02-21T10:18:33.6362306Z @!complete bra.uni waitLoop; 2026-02-21T10:18:33.6362417Z } 2026-02-21T10:18:33.6362423Z 2026-02-21T10:18:33.6362483Z // end inline asm 2026-02-21T10:18:33.6362541Z bar.sync 0; 2026-02-21T10:18:33.6362601Z // begin inline asm 2026-02-21T10:18:33.6362700Z @%p222 mbarrier.inval.shared::cta.b64 [%r19296]; 2026-02-21T10:18:33.6362761Z // end inline asm 2026-02-21T10:18:33.6362976Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6363044Z ld.shared.s8 %rs1409, [%r26]; 2026-02-21T10:18:33.6363262Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6363331Z shl.b16 %rs1410, %rs1409, 4; 2026-02-21T10:18:33.6363532Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6363612Z ld.shared.s8 %rs1411, [%r27+128]; 2026-02-21T10:18:33.6363817Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6363884Z shl.b16 %rs1412, %rs1411, 4; 2026-02-21T10:18:33.6364132Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6364202Z ld.shared.s8 %rs1413, [%r28+256]; 2026-02-21T10:18:33.6364404Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6364508Z shl.b16 %rs1414, %rs1413, 4; 2026-02-21T10:18:33.6364710Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6364775Z ld.shared.s8 %rs1415, [%r29+384]; 2026-02-21T10:18:33.6364968Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6365032Z shl.b16 %rs1416, %rs1415, 4; 2026-02-21T10:18:33.6365235Z .loc 1 69 
25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6365316Z ld.shared.s8 %rs1417, [%r30+512]; 2026-02-21T10:18:33.6365532Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6365596Z shl.b16 %rs1418, %rs1417, 4; 2026-02-21T10:18:33.6365794Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6365866Z ld.shared.s8 %rs1419, [%r31+640]; 2026-02-21T10:18:33.6366060Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6366123Z shl.b16 %rs1420, %rs1419, 4; 2026-02-21T10:18:33.6366318Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6366387Z ld.shared.s8 %rs1421, [%r32+768]; 2026-02-21T10:18:33.6366697Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6366765Z shl.b16 %rs1422, %rs1421, 4; 2026-02-21T10:18:33.6366963Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6367030Z ld.shared.s8 %rs1423, [%r33+896]; 2026-02-21T10:18:33.6367225Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6367290Z shl.b16 %rs1424, %rs1423, 4; 2026-02-21T10:18:33.6367485Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6367566Z ld.shared.s8 %rs1425, [%r26+1024]; 2026-02-21T10:18:33.6367769Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6367835Z shl.b16 %rs1426, %rs1425, 4; 2026-02-21T10:18:33.6368031Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6368101Z ld.shared.s8 %rs1427, [%r27+1152]; 2026-02-21T10:18:33.6368397Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6368523Z shl.b16 %rs1428, %rs1427, 4; 
2026-02-21T10:18:33.6368738Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6368807Z ld.shared.s8 %rs1429, [%r28+1280]; 2026-02-21T10:18:33.6369003Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6369066Z shl.b16 %rs1430, %rs1429, 4; 2026-02-21T10:18:33.6369265Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6369331Z ld.shared.s8 %rs1431, [%r29+1408]; 2026-02-21T10:18:33.6369524Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6369591Z shl.b16 %rs1432, %rs1431, 4; 2026-02-21T10:18:33.6369784Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6369855Z ld.shared.s8 %rs1433, [%r30+1536]; 2026-02-21T10:18:33.6370116Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6370180Z shl.b16 %rs1434, %rs1433, 4; 2026-02-21T10:18:33.6370373Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6370498Z ld.shared.s8 %rs1435, [%r31+1664]; 2026-02-21T10:18:33.6370694Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6370757Z shl.b16 %rs1436, %rs1435, 4; 2026-02-21T10:18:33.6370951Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6371021Z ld.shared.s8 %rs1437, [%r32+1792]; 2026-02-21T10:18:33.6371212Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6371287Z shl.b16 %rs1438, %rs1437, 4; 2026-02-21T10:18:33.6371490Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6371556Z ld.shared.s8 %rs1439, [%r33+1920]; 2026-02-21T10:18:33.6371751Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6371815Z shl.b16 
%rs1440, %rs1439, 4; 2026-02-21T10:18:33.6372008Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6372076Z ld.shared.s8 %rs1441, [%r26+2048]; 2026-02-21T10:18:33.6372273Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6372334Z shl.b16 %rs1442, %rs1441, 4; 2026-02-21T10:18:33.6372534Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6372606Z ld.shared.s8 %rs1443, [%r27+2176]; 2026-02-21T10:18:33.6372820Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6372887Z shl.b16 %rs1444, %rs1443, 4; 2026-02-21T10:18:33.6373085Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6373155Z ld.shared.s8 %rs1445, [%r28+2304]; 2026-02-21T10:18:33.6373351Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6373420Z shl.b16 %rs1446, %rs1445, 4; 2026-02-21T10:18:33.6373619Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6373685Z ld.shared.s8 %rs1447, [%r29+2432]; 2026-02-21T10:18:33.6373881Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6373946Z shl.b16 %rs1448, %rs1447, 4; 2026-02-21T10:18:33.6374218Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6374334Z ld.shared.s8 %rs1449, [%r30+2560]; 2026-02-21T10:18:33.6374532Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6374596Z shl.b16 %rs1450, %rs1449, 4; 2026-02-21T10:18:33.6374790Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6374854Z ld.shared.s8 %rs1451, [%r31+2688]; 2026-02-21T10:18:33.6375048Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 
2026-02-21T10:18:33.6375110Z shl.b16 %rs1452, %rs1451, 4; 2026-02-21T10:18:33.6375304Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6375372Z ld.shared.s8 %rs1453, [%r32+2816]; 2026-02-21T10:18:33.6375563Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6375628Z shl.b16 %rs1454, %rs1453, 4; 2026-02-21T10:18:33.6375886Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6375957Z ld.shared.s8 %rs1455, [%r33+2944]; 2026-02-21T10:18:33.6376152Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6376259Z shl.b16 %rs1456, %rs1455, 4; 2026-02-21T10:18:33.6376568Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6376637Z ld.shared.s8 %rs1457, [%r26+3072]; 2026-02-21T10:18:33.6376831Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6376895Z shl.b16 %rs1458, %rs1457, 4; 2026-02-21T10:18:33.6377090Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6377158Z ld.shared.s8 %rs1459, [%r27+3200]; 2026-02-21T10:18:33.6377365Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6377429Z shl.b16 %rs1460, %rs1459, 4; 2026-02-21T10:18:33.6377624Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6377691Z ld.shared.s8 %rs1461, [%r28+3328]; 2026-02-21T10:18:33.6377887Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6377949Z shl.b16 %rs1462, %rs1461, 4; 2026-02-21T10:18:33.6378147Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6378211Z ld.shared.s8 %rs1463, [%r29+3456]; 2026-02-21T10:18:33.6378404Z .loc 1 67 28 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6378467Z shl.b16 %rs1464, %rs1463, 4; 2026-02-21T10:18:33.6378663Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6378730Z ld.shared.s8 %rs1465, [%r30+3584]; 2026-02-21T10:18:33.6378925Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6378988Z shl.b16 %rs1466, %rs1465, 4; 2026-02-21T10:18:33.6379181Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6379250Z ld.shared.s8 %rs1467, [%r31+3712]; 2026-02-21T10:18:33.6379446Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6379507Z shl.b16 %rs1468, %rs1467, 4; 2026-02-21T10:18:33.6379700Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6379769Z ld.shared.s8 %rs1469, [%r32+3840]; 2026-02-21T10:18:33.6380045Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6380176Z shl.b16 %rs1470, %rs1469, 4; 2026-02-21T10:18:33.6380373Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6380443Z ld.shared.s8 %rs1471, [%r33+3968]; 2026-02-21T10:18:33.6380638Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6380699Z shl.b16 %rs1472, %rs1471, 4; 2026-02-21T10:18:33.6380897Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6380960Z cvt.s16.s8 %rs1473, %rs1410; 2026-02-21T10:18:33.6381024Z shr.s16 %rs1474, %rs1473, 4; 2026-02-21T10:18:33.6381089Z cvt.s16.s8 %rs1475, %rs1412; 2026-02-21T10:18:33.6381149Z shr.s16 %rs1476, %rs1475, 4; 2026-02-21T10:18:33.6381212Z shr.s16 %rs1477, %rs1409, 4; 2026-02-21T10:18:33.6381274Z shr.s16 %rs1478, %rs1411, 4; 2026-02-21T10:18:33.6381470Z .loc 1 87 32 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6381611Z cvt.rn.f32.s16 %r19200, %rs1478; 2026-02-21T10:18:33.6381685Z cvt.rn.f32.s16 %r19201, %rs1477; 2026-02-21T10:18:33.6381749Z cvt.rn.f32.s16 %r19202, %rs1476; 2026-02-21T10:18:33.6381809Z cvt.rn.f32.s16 %r19203, %rs1474; 2026-02-21T10:18:33.6382067Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6382133Z cvt.s16.s8 %rs1479, %rs1414; 2026-02-21T10:18:33.6382194Z shr.s16 %rs1480, %rs1479, 4; 2026-02-21T10:18:33.6382258Z cvt.s16.s8 %rs1481, %rs1416; 2026-02-21T10:18:33.6382319Z shr.s16 %rs1482, %rs1481, 4; 2026-02-21T10:18:33.6382382Z shr.s16 %rs1483, %rs1413, 4; 2026-02-21T10:18:33.6382441Z shr.s16 %rs1484, %rs1415, 4; 2026-02-21T10:18:33.6382640Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6382705Z cvt.rn.f32.s16 %r19204, %rs1484; 2026-02-21T10:18:33.6382769Z cvt.rn.f32.s16 %r19205, %rs1483; 2026-02-21T10:18:33.6382847Z cvt.rn.f32.s16 %r19206, %rs1482; 2026-02-21T10:18:33.6382910Z cvt.rn.f32.s16 %r19207, %rs1480; 2026-02-21T10:18:33.6383104Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6383169Z cvt.s16.s8 %rs1485, %rs1418; 2026-02-21T10:18:33.6383230Z shr.s16 %rs1486, %rs1485, 4; 2026-02-21T10:18:33.6383292Z cvt.s16.s8 %rs1487, %rs1420; 2026-02-21T10:18:33.6383352Z shr.s16 %rs1488, %rs1487, 4; 2026-02-21T10:18:33.6383413Z shr.s16 %rs1489, %rs1417, 4; 2026-02-21T10:18:33.6383472Z shr.s16 %rs1490, %rs1419, 4; 2026-02-21T10:18:33.6383665Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6383730Z cvt.rn.f32.s16 %r19208, %rs1490; 2026-02-21T10:18:33.6383793Z cvt.rn.f32.s16 %r19209, %rs1489; 2026-02-21T10:18:33.6383854Z cvt.rn.f32.s16 %r19210, %rs1488; 2026-02-21T10:18:33.6383917Z cvt.rn.f32.s16 %r19211, %rs1486; 2026-02-21T10:18:33.6384113Z .loc 1 69 25 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6384174Z cvt.s16.s8 %rs1491, %rs1422; 2026-02-21T10:18:33.6384234Z shr.s16 %rs1492, %rs1491, 4; 2026-02-21T10:18:33.6384297Z cvt.s16.s8 %rs1493, %rs1424; 2026-02-21T10:18:33.6384359Z shr.s16 %rs1494, %rs1493, 4; 2026-02-21T10:18:33.6384418Z shr.s16 %rs1495, %rs1421, 4; 2026-02-21T10:18:33.6384479Z shr.s16 %rs1496, %rs1423, 4; 2026-02-21T10:18:33.6384680Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6384742Z cvt.rn.f32.s16 %r19212, %rs1496; 2026-02-21T10:18:33.6384805Z cvt.rn.f32.s16 %r19213, %rs1495; 2026-02-21T10:18:33.6384869Z cvt.rn.f32.s16 %r19214, %rs1494; 2026-02-21T10:18:33.6384930Z cvt.rn.f32.s16 %r19215, %rs1492; 2026-02-21T10:18:33.6385195Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6385304Z cvt.s16.s8 %rs1497, %rs1426; 2026-02-21T10:18:33.6385365Z shr.s16 %rs1498, %rs1497, 4; 2026-02-21T10:18:33.6385427Z cvt.s16.s8 %rs1499, %rs1428; 2026-02-21T10:18:33.6385490Z shr.s16 %rs1500, %rs1499, 4; 2026-02-21T10:18:33.6385548Z shr.s16 %rs1501, %rs1425, 4; 2026-02-21T10:18:33.6385607Z shr.s16 %rs1502, %rs1427, 4; 2026-02-21T10:18:33.6385802Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6385868Z cvt.rn.f32.s16 %r19216, %rs1502; 2026-02-21T10:18:33.6385929Z cvt.rn.f32.s16 %r19217, %rs1501; 2026-02-21T10:18:33.6385989Z cvt.rn.f32.s16 %r19218, %rs1500; 2026-02-21T10:18:33.6386053Z cvt.rn.f32.s16 %r19219, %rs1498; 2026-02-21T10:18:33.6386248Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6386323Z cvt.s16.s8 %rs1503, %rs1430; 2026-02-21T10:18:33.6386390Z shr.s16 %rs1504, %rs1503, 4; 2026-02-21T10:18:33.6386579Z cvt.s16.s8 %rs1505, %rs1432; 2026-02-21T10:18:33.6386726Z shr.s16 %rs1506, %rs1505, 4; 2026-02-21T10:18:33.6386794Z shr.s16 %rs1507, %rs1429, 4; 
2026-02-21T10:18:33.6386858Z shr.s16 %rs1508, %rs1431, 4; 2026-02-21T10:18:33.6387054Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6387177Z cvt.rn.f32.s16 %r19220, %rs1508; 2026-02-21T10:18:33.6387245Z cvt.rn.f32.s16 %r19221, %rs1507; 2026-02-21T10:18:33.6387307Z cvt.rn.f32.s16 %r19222, %rs1506; 2026-02-21T10:18:33.6387368Z cvt.rn.f32.s16 %r19223, %rs1504; 2026-02-21T10:18:33.6387561Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6387628Z cvt.s16.s8 %rs1509, %rs1434; 2026-02-21T10:18:33.6387687Z shr.s16 %rs1510, %rs1509, 4; 2026-02-21T10:18:33.6387745Z cvt.s16.s8 %rs1511, %rs1436; 2026-02-21T10:18:33.6387812Z shr.s16 %rs1512, %rs1511, 4; 2026-02-21T10:18:33.6387873Z shr.s16 %rs1513, %rs1433, 4; 2026-02-21T10:18:33.6387944Z shr.s16 %rs1514, %rs1435, 4; 2026-02-21T10:18:33.6388145Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6388208Z cvt.rn.f32.s16 %r19224, %rs1514; 2026-02-21T10:18:33.6388272Z cvt.rn.f32.s16 %r19225, %rs1513; 2026-02-21T10:18:33.6388336Z cvt.rn.f32.s16 %r19226, %rs1512; 2026-02-21T10:18:33.6388403Z cvt.rn.f32.s16 %r19227, %rs1510; 2026-02-21T10:18:33.6388657Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6388718Z cvt.s16.s8 %rs1515, %rs1438; 2026-02-21T10:18:33.6388782Z shr.s16 %rs1516, %rs1515, 4; 2026-02-21T10:18:33.6388843Z cvt.s16.s8 %rs1517, %rs1440; 2026-02-21T10:18:33.6388904Z shr.s16 %rs1518, %rs1517, 4; 2026-02-21T10:18:33.6388966Z shr.s16 %rs1519, %rs1437, 4; 2026-02-21T10:18:33.6389028Z shr.s16 %rs1520, %rs1439, 4; 2026-02-21T10:18:33.6389226Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6389296Z cvt.rn.f32.s16 %r19228, %rs1520; 2026-02-21T10:18:33.6389362Z cvt.rn.f32.s16 %r19229, %rs1519; 2026-02-21T10:18:33.6389423Z cvt.rn.f32.s16 %r19230, %rs1518; 2026-02-21T10:18:33.6389483Z 
cvt.rn.f32.s16 %r19231, %rs1516; 2026-02-21T10:18:33.6389693Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6389758Z cvt.s16.s8 %rs1521, %rs1442; 2026-02-21T10:18:33.6389820Z shr.s16 %rs1522, %rs1521, 4; 2026-02-21T10:18:33.6389885Z cvt.s16.s8 %rs1523, %rs1444; 2026-02-21T10:18:33.6389945Z shr.s16 %rs1524, %rs1523, 4; 2026-02-21T10:18:33.6390005Z shr.s16 %rs1525, %rs1441, 4; 2026-02-21T10:18:33.6390063Z shr.s16 %rs1526, %rs1443, 4; 2026-02-21T10:18:33.6390266Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6390418Z cvt.rn.f32.s16 %r19232, %rs1526; 2026-02-21T10:18:33.6390539Z cvt.rn.f32.s16 %r19233, %rs1525; 2026-02-21T10:18:33.6390605Z cvt.rn.f32.s16 %r19234, %rs1524; 2026-02-21T10:18:33.6390667Z cvt.rn.f32.s16 %r19235, %rs1522; 2026-02-21T10:18:33.6390863Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6390927Z cvt.s16.s8 %rs1527, %rs1446; 2026-02-21T10:18:33.6390988Z shr.s16 %rs1528, %rs1527, 4; 2026-02-21T10:18:33.6391048Z cvt.s16.s8 %rs1529, %rs1448; 2026-02-21T10:18:33.6391106Z shr.s16 %rs1530, %rs1529, 4; 2026-02-21T10:18:33.6391169Z shr.s16 %rs1531, %rs1445, 4; 2026-02-21T10:18:33.6391229Z shr.s16 %rs1532, %rs1447, 4; 2026-02-21T10:18:33.6391427Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6391492Z cvt.rn.f32.s16 %r19236, %rs1532; 2026-02-21T10:18:33.6391554Z cvt.rn.f32.s16 %r19237, %rs1531; 2026-02-21T10:18:33.6391617Z cvt.rn.f32.s16 %r19238, %rs1530; 2026-02-21T10:18:33.6391679Z cvt.rn.f32.s16 %r19239, %rs1528; 2026-02-21T10:18:33.6392051Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6392464Z cvt.s16.s8 %rs1533, %rs1450; 2026-02-21T10:18:33.6392674Z shr.s16 %rs1534, %rs1533, 4; 2026-02-21T10:18:33.6392867Z cvt.s16.s8 %rs1535, %rs1452; 2026-02-21T10:18:33.6393119Z shr.s16 %rs1536, %rs1535, 4; 
2026-02-21T10:18:33.6393316Z shr.s16 %rs1537, %rs1449, 4; 2026-02-21T10:18:33.6393497Z shr.s16 %rs1538, %rs1451, 4; 2026-02-21T10:18:33.6393854Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6394210Z cvt.rn.f32.s16 %r19240, %rs1538; 2026-02-21T10:18:33.6394405Z cvt.rn.f32.s16 %r19241, %rs1537; 2026-02-21T10:18:33.6394597Z cvt.rn.f32.s16 %r19242, %rs1536; 2026-02-21T10:18:33.6394783Z cvt.rn.f32.s16 %r19243, %rs1534; 2026-02-21T10:18:33.6395111Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6395468Z cvt.s16.s8 %rs1539, %rs1454; 2026-02-21T10:18:33.6395651Z shr.s16 %rs1540, %rs1539, 4; 2026-02-21T10:18:33.6395826Z cvt.s16.s8 %rs1541, %rs1456; 2026-02-21T10:18:33.6395999Z shr.s16 %rs1542, %rs1541, 4; 2026-02-21T10:18:33.6396186Z shr.s16 %rs1543, %rs1453, 4; 2026-02-21T10:18:33.6396363Z shr.s16 %rs1544, %rs1455, 4; 2026-02-21T10:18:33.6396811Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6397165Z cvt.rn.f32.s16 %r19244, %rs1544; 2026-02-21T10:18:33.6397351Z cvt.rn.f32.s16 %r19245, %rs1543; 2026-02-21T10:18:33.6397531Z cvt.rn.f32.s16 %r19246, %rs1542; 2026-02-21T10:18:33.6397712Z cvt.rn.f32.s16 %r19247, %rs1540; 2026-02-21T10:18:33.6398033Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6398387Z cvt.s16.s8 %rs1545, %rs1458; 2026-02-21T10:18:33.6398563Z shr.s16 %rs1546, %rs1545, 4; 2026-02-21T10:18:33.6398755Z cvt.s16.s8 %rs1547, %rs1460; 2026-02-21T10:18:33.6398938Z shr.s16 %rs1548, %rs1547, 4; 2026-02-21T10:18:33.6399111Z shr.s16 %rs1549, %rs1457, 4; 2026-02-21T10:18:33.6399323Z shr.s16 %rs1550, %rs1459, 4; 2026-02-21T10:18:33.6399636Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6400055Z cvt.rn.f32.s16 %r19248, %rs1550; 2026-02-21T10:18:33.6400267Z cvt.rn.f32.s16 %r19249, %rs1549; 2026-02-21T10:18:33.6400490Z 
cvt.rn.f32.s16 %r19250, %rs1548; 2026-02-21T10:18:33.6400675Z cvt.rn.f32.s16 %r19251, %rs1546; 2026-02-21T10:18:33.6400996Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6401340Z cvt.s16.s8 %rs1551, %rs1462; 2026-02-21T10:18:33.6401521Z shr.s16 %rs1552, %rs1551, 4; 2026-02-21T10:18:33.6401784Z cvt.s16.s8 %rs1553, %rs1464; 2026-02-21T10:18:33.6401956Z shr.s16 %rs1554, %rs1553, 4; 2026-02-21T10:18:33.6402191Z shr.s16 %rs1555, %rs1461, 4; 2026-02-21T10:18:33.6402370Z shr.s16 %rs1556, %rs1463, 4; 2026-02-21T10:18:33.6402689Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6403042Z cvt.rn.f32.s16 %r19252, %rs1556; 2026-02-21T10:18:33.6403227Z cvt.rn.f32.s16 %r19253, %rs1555; 2026-02-21T10:18:33.6403412Z cvt.rn.f32.s16 %r19254, %rs1554; 2026-02-21T10:18:33.6403594Z cvt.rn.f32.s16 %r19255, %rs1552; 2026-02-21T10:18:33.6403908Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6404251Z cvt.s16.s8 %rs1557, %rs1466; 2026-02-21T10:18:33.6404423Z shr.s16 %rs1558, %rs1557, 4; 2026-02-21T10:18:33.6404597Z cvt.s16.s8 %rs1559, %rs1468; 2026-02-21T10:18:33.6404770Z shr.s16 %rs1560, %rs1559, 4; 2026-02-21T10:18:33.6404944Z shr.s16 %rs1561, %rs1465, 4; 2026-02-21T10:18:33.6405118Z shr.s16 %rs1562, %rs1467, 4; 2026-02-21T10:18:33.6405503Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6405859Z cvt.rn.f32.s16 %r19256, %rs1562; 2026-02-21T10:18:33.6406041Z cvt.rn.f32.s16 %r19257, %rs1561; 2026-02-21T10:18:33.6406225Z cvt.rn.f32.s16 %r19258, %rs1560; 2026-02-21T10:18:33.6406407Z cvt.rn.f32.s16 %r19259, %rs1558; 2026-02-21T10:18:33.6406927Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6407281Z cvt.s16.s8 %rs1563, %rs1470; 2026-02-21T10:18:33.6407458Z shr.s16 %rs1564, %rs1563, 4; 2026-02-21T10:18:33.6407634Z cvt.s16.s8 %rs1565, %rs1472; 
2026-02-21T10:18:33.6407805Z shr.s16 %rs1566, %rs1565, 4; 2026-02-21T10:18:33.6407980Z shr.s16 %rs1567, %rs1469, 4; 2026-02-21T10:18:33.6408151Z shr.s16 %rs1568, %rs1471, 4; 2026-02-21T10:18:33.6408459Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6408806Z cvt.rn.f32.s16 %r19260, %rs1568; 2026-02-21T10:18:33.6408991Z cvt.rn.f32.s16 %r19261, %rs1567; 2026-02-21T10:18:33.6409176Z cvt.rn.f32.s16 %r19262, %rs1566; 2026-02-21T10:18:33.6409357Z cvt.rn.f32.s16 %r19263, %rs1564; 2026-02-21T10:18:33.6409538Z bar.sync 0; 2026-02-21T10:18:33.6409742Z st.shared.v4.b32 [%r34], {%r19203, %r19201, %r19202, %r19200}; 2026-02-21T10:18:33.6410067Z st.shared.v4.b32 [%r34+16384], {%r19235, %r19233, %r19234, %r19232}; 2026-02-21T10:18:33.6410383Z st.shared.v4.b32 [%r35], {%r19207, %r19205, %r19206, %r19204}; 2026-02-21T10:18:33.6410696Z st.shared.v4.b32 [%r35+16384], {%r19239, %r19237, %r19238, %r19236}; 2026-02-21T10:18:33.6411003Z st.shared.v4.b32 [%r36], {%r19211, %r19209, %r19210, %r19208}; 2026-02-21T10:18:33.6411310Z st.shared.v4.b32 [%r36+16384], {%r19243, %r19241, %r19242, %r19240}; 2026-02-21T10:18:33.6411620Z st.shared.v4.b32 [%r37], {%r19215, %r19213, %r19214, %r19212}; 2026-02-21T10:18:33.6411939Z st.shared.v4.b32 [%r37+16384], {%r19247, %r19245, %r19246, %r19244}; 2026-02-21T10:18:33.6412251Z st.shared.v4.b32 [%r38], {%r19219, %r19217, %r19218, %r19216}; 2026-02-21T10:18:33.6412551Z st.shared.v4.b32 [%r38+16384], {%r19251, %r19249, %r19250, %r19248}; 2026-02-21T10:18:33.6412854Z st.shared.v4.b32 [%r39], {%r19223, %r19221, %r19222, %r19220}; 2026-02-21T10:18:33.6413155Z st.shared.v4.b32 [%r39+16384], {%r19255, %r19253, %r19254, %r19252}; 2026-02-21T10:18:33.6413461Z st.shared.v4.b32 [%r40], {%r19227, %r19225, %r19226, %r19224}; 2026-02-21T10:18:33.6413766Z st.shared.v4.b32 [%r40+16384], {%r19259, %r19257, %r19258, %r19256}; 2026-02-21T10:18:33.6414066Z st.shared.v4.b32 [%r41], {%r19231, %r19229, %r19230, %r19228}; 
2026-02-21T10:18:33.6414369Z st.shared.v4.b32 [%r41+16384], {%r19263, %r19261, %r19262, %r19260}; 2026-02-21T10:18:33.6414622Z $L__tmp13: 2026-02-21T10:18:33.6414980Z .loc 2 291 36 // standard.py:291:36 @[ c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:94:40 ] 2026-02-21T10:18:33.6415553Z // begin inline asm 2026-02-21T10:18:33.6415731Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.6415925Z // end inline asm 2026-02-21T10:18:33.6416072Z bar.sync 0; 2026-02-21T10:18:33.6416231Z wgmma.fence.sync.aligned; 2026-02-21T10:18:33.6416410Z // begin inline asm 2026-02-21T10:18:33.6418137Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r16821,%r16822,%r16823,%r16824}, %rd3, %p133, 1, 1; 2026-02-21T10:18:33.6419787Z // end inline asm 2026-02-21T10:18:33.6419936Z // begin inline asm 2026-02-21T10:18:33.6421648Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r16953,%r16954,%r16955,%r16956}, %rd4, %p133, 1, 1; 2026-02-21T10:18:33.6423275Z 
// end inline asm 2026-02-21T10:18:33.6423423Z // begin inline asm 2026-02-21T10:18:33.6425006Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r17085,%r17086,%r17087,%r17088}, %rd5, %p133, 1, 1; 2026-02-21T10:18:33.6426769Z // end inline asm 2026-02-21T10:18:33.6426919Z // begin inline asm 2026-02-21T10:18:33.6428558Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r17217,%r17218,%r17219,%r17220}, %rd6, %p133, 1, 1; 2026-02-21T10:18:33.6430199Z // end inline asm 2026-02-21T10:18:33.6430350Z // begin inline asm 2026-02-21T10:18:33.6431929Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r17349,%r17350,%r17351,%r17352}, %rd7, %p133, 1, 1; 2026-02-21T10:18:33.6433544Z // end inline asm 2026-02-21T10:18:33.6433787Z // begin inline asm 2026-02-21T10:18:33.6435374Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r17481,%r17482,%r17483,%r17484}, %rd8, %p133, 1, 1; 2026-02-21T10:18:33.6437167Z // end inline asm 2026-02-21T10:18:33.6437317Z // begin inline asm 2026-02-21T10:18:33.6438967Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r17613,%r17614,%r17615,%r17616}, %rd9, %p133, 1, 1; 2026-02-21T10:18:33.6440592Z // end inline asm 2026-02-21T10:18:33.6440793Z // begin inline asm 2026-02-21T10:18:33.6442374Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r17745,%r17746,%r17747,%r17748}, %rd10, %p133, 1, 1; 2026-02-21T10:18:33.6443998Z // end inline asm 2026-02-21T10:18:33.6444159Z // begin inline asm 2026-02-21T10:18:33.6445740Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r17877,%r17878,%r17879,%r17880}, %rd3, %p133, 1, 1; 2026-02-21T10:18:33.6447458Z // end inline asm 2026-02-21T10:18:33.6447600Z // begin inline asm 2026-02-21T10:18:33.6449172Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r18009,%r18010,%r18011,%r18012}, %rd4, %p133, 1, 1; 2026-02-21T10:18:33.6450809Z // end inline asm 2026-02-21T10:18:33.6450955Z // begin inline asm 2026-02-21T10:18:33.6452534Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r18141,%r18142,%r18143,%r18144}, %rd5, %p133, 1, 1; 2026-02-21T10:18:33.6454289Z // end inline asm 2026-02-21T10:18:33.6454432Z // begin inline asm 2026-02-21T10:18:33.6456011Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r18273,%r18274,%r18275,%r18276}, %rd6, %p133, 1, 1; 2026-02-21T10:18:33.6457759Z // end inline asm 2026-02-21T10:18:33.6457908Z // begin inline asm 2026-02-21T10:18:33.6459627Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r18405,%r18406,%r18407,%r18408}, %rd7, %p133, 1, 1; 2026-02-21T10:18:33.6461284Z // end inline asm 2026-02-21T10:18:33.6461444Z // begin inline asm 2026-02-21T10:18:33.6463047Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r18537,%r18538,%r18539,%r18540}, %rd8, %p133, 1, 1; 2026-02-21T10:18:33.6464681Z // end inline asm 2026-02-21T10:18:33.6464831Z // begin inline asm 2026-02-21T10:18:33.6466413Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r18669,%r18670,%r18671,%r18672}, %rd9, %p133, 1, 1; 2026-02-21T10:18:33.6468143Z // end inline asm 2026-02-21T10:18:33.6468293Z // begin inline asm 2026-02-21T10:18:33.6469935Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r18801,%r18802,%r18803,%r18804}, %rd10, %p133, 1, 1; 2026-02-21T10:18:33.6471644Z // end inline asm 2026-02-21T10:18:33.6471817Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:33.6472083Z mov.b32 %r18933, %r29377; 2026-02-21T10:18:33.6472260Z mov.b32 %r18934, %r18935; 2026-02-21T10:18:33.6472425Z // begin inline asm 2026-02-21T10:18:33.6475112Z // wait for regs: 
%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227,%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291,%r18933,%r18934,%r18935 2026-02-21T10:18:33.6478085Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:33.6478281Z // end inline asm 2026-02-21T10:18:33.6478425Z $L__tmp14: 2026-02-21T10:18:33.6478794Z .loc 1 51 93 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:51:93 2026-02-21T10:18:33.6479160Z add.s64 %rd658, %rd658, 384; 2026-02-21T10:18:33.6479351Z add.s32 %r32163, %r32163, 192; 2026-02-21T10:18:33.6479543Z setp.lt.u64 %p193, %rd55, 3936; 2026-02-21T10:18:33.6479734Z mov.b64 %rd659, %rd55; 2026-02-21T10:18:33.6479902Z @%p193 bra $L__BB0_7; 2026-02-21T10:18:33.6480116Z // %bb.8: // %.preheader203.preheader 2026-02-21T10:18:33.6480416Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:18:33.6480666Z add.s64 %rd57, %rd52, 16128; 2026-02-21T10:18:33.6480853Z add.s64 %rd58, %rd45, 16128; 2026-02-21T10:18:33.6481027Z add.s64 %rd59, %rd46, 16128; 2026-02-21T10:18:33.6481206Z add.s64 %rd60, %rd47, 16128; 2026-02-21T10:18:33.6481377Z add.s64 %rd61, %rd48, 
16128; 2026-02-21T10:18:33.6481551Z add.s64 %rd62, %rd49, 16128; 2026-02-21T10:18:33.6481725Z add.s64 %rd63, %rd50, 16128; 2026-02-21T10:18:33.6481915Z add.s64 %rd64, %rd51, 16128; 2026-02-21T10:18:33.6482092Z mov.b64 %rd661, 4000; 2026-02-21T10:18:33.6482251Z mov.b64 %rd660, %rd11; 2026-02-21T10:18:33.6482452Z $L__BB0_9: // %.preheader203 2026-02-21T10:18:33.6482724Z // Parent Loop BB0_2 Depth=1 2026-02-21T10:18:33.6483010Z // => This Inner Loop Header: Depth=2 2026-02-21T10:18:33.6483405Z .loc 1 58 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:32 2026-02-21T10:18:33.6483772Z add.s64 %rd425, %rd660, %rd64; 2026-02-21T10:18:33.6483957Z add.s64 %rd428, %rd660, %rd63; 2026-02-21T10:18:33.6484142Z add.s64 %rd431, %rd660, %rd62; 2026-02-21T10:18:33.6484342Z add.s64 %rd434, %rd660, %rd61; 2026-02-21T10:18:33.6484523Z add.s64 %rd437, %rd660, %rd60; 2026-02-21T10:18:33.6484705Z add.s64 %rd440, %rd660, %rd59; 2026-02-21T10:18:33.6484885Z add.s64 %rd443, %rd660, %rd58; 2026-02-21T10:18:33.6485207Z .loc 1 58 80 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:80 2026-02-21T10:18:33.6485558Z add.s64 %rd446, %rd660, %rd57; 2026-02-21T10:18:33.6485738Z // begin inline asm 2026-02-21T10:18:33.6485899Z mov.u64 %rd424, 0x0; 2026-02-21T10:18:33.6486130Z createpolicy.fractional.L2::evict_first.b64 %rd424, 1.0; 2026-02-21T10:18:33.6486388Z // end inline asm 2026-02-21T10:18:33.6486656Z // begin inline asm 2026-02-21T10:18:33.6486814Z mov.u32 %r19264, 0x0; 2026-02-21T10:18:33.6487059Z mov.u32 %r19265, 0x0; 2026-02-21T10:18:33.6487271Z mov.u32 %r19266, 0x0; 2026-02-21T10:18:33.6487420Z mov.u32 %r19267, 0x0; 2026-02-21T10:18:33.6487760Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r19264, %r19265, %r19266, %r19267 }, [ %rd425 + 0 ], %rd424; 2026-02-21T10:18:33.6488137Z // end inline asm 2026-02-21T10:18:33.6488289Z // begin inline asm 2026-02-21T10:18:33.6488455Z mov.u64 %rd427, 0x0; 2026-02-21T10:18:33.6488676Z 
createpolicy.fractional.L2::evict_first.b64 %rd427, 1.0; 2026-02-21T10:18:33.6488929Z // end inline asm 2026-02-21T10:18:33.6489072Z // begin inline asm 2026-02-21T10:18:33.6489222Z mov.u32 %r19268, 0x0; 2026-02-21T10:18:33.6489373Z mov.u32 %r19269, 0x0; 2026-02-21T10:18:33.6489526Z mov.u32 %r19270, 0x0; 2026-02-21T10:18:33.6489677Z mov.u32 %r19271, 0x0; 2026-02-21T10:18:33.6490001Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r19268, %r19269, %r19270, %r19271 }, [ %rd428 + 0 ], %rd427; 2026-02-21T10:18:33.6490376Z // end inline asm 2026-02-21T10:18:33.6490519Z // begin inline asm 2026-02-21T10:18:33.6490675Z mov.u64 %rd430, 0x0; 2026-02-21T10:18:33.6490962Z createpolicy.fractional.L2::evict_first.b64 %rd430, 1.0; 2026-02-21T10:18:33.6491217Z // end inline asm 2026-02-21T10:18:33.6491360Z // begin inline asm 2026-02-21T10:18:33.6491521Z mov.u32 %r19272, 0x0; 2026-02-21T10:18:33.6491672Z mov.u32 %r19273, 0x0; 2026-02-21T10:18:33.6491827Z mov.u32 %r19274, 0x0; 2026-02-21T10:18:33.6492042Z mov.u32 %r19275, 0x0; 2026-02-21T10:18:33.6492374Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r19272, %r19273, %r19274, %r19275 }, [ %rd431 + 0 ], %rd430; 2026-02-21T10:18:33.6492750Z // end inline asm 2026-02-21T10:18:33.6492896Z // begin inline asm 2026-02-21T10:18:33.6493049Z mov.u64 %rd433, 0x0; 2026-02-21T10:18:33.6493263Z createpolicy.fractional.L2::evict_first.b64 %rd433, 1.0; 2026-02-21T10:18:33.6493513Z // end inline asm 2026-02-21T10:18:33.6493656Z // begin inline asm 2026-02-21T10:18:33.6493810Z mov.u32 %r19276, 0x0; 2026-02-21T10:18:33.6493966Z mov.u32 %r19277, 0x0; 2026-02-21T10:18:33.6494118Z mov.u32 %r19278, 0x0; 2026-02-21T10:18:33.6494270Z mov.u32 %r19279, 0x0; 2026-02-21T10:18:33.6494588Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r19276, %r19277, %r19278, %r19279 }, [ %rd434 + 0 ], %rd433; 2026-02-21T10:18:33.6494958Z // end inline asm 2026-02-21T10:18:33.6495102Z // begin inline asm 2026-02-21T10:18:33.6495255Z mov.u64 %rd436, 0x0; 
2026-02-21T10:18:33.6495472Z createpolicy.fractional.L2::evict_first.b64 %rd436, 1.0; 2026-02-21T10:18:33.6495725Z // end inline asm 2026-02-21T10:18:33.6495869Z // begin inline asm 2026-02-21T10:18:33.6496017Z mov.u32 %r19280, 0x0; 2026-02-21T10:18:33.6496171Z mov.u32 %r19281, 0x0; 2026-02-21T10:18:33.6496321Z mov.u32 %r19282, 0x0; 2026-02-21T10:18:33.6496602Z mov.u32 %r19283, 0x0; 2026-02-21T10:18:33.6496927Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r19280, %r19281, %r19282, %r19283 }, [ %rd437 + 0 ], %rd436; 2026-02-21T10:18:33.6497300Z // end inline asm 2026-02-21T10:18:33.6497443Z // begin inline asm 2026-02-21T10:18:33.6497597Z mov.u64 %rd439, 0x0; 2026-02-21T10:18:33.6497809Z createpolicy.fractional.L2::evict_first.b64 %rd439, 1.0; 2026-02-21T10:18:33.6498067Z // end inline asm 2026-02-21T10:18:33.6498218Z // begin inline asm 2026-02-21T10:18:33.6498365Z mov.u32 %r19284, 0x0; 2026-02-21T10:18:33.6498517Z mov.u32 %r19285, 0x0; 2026-02-21T10:18:33.6498664Z mov.u32 %r19286, 0x0; 2026-02-21T10:18:33.6498819Z mov.u32 %r19287, 0x0; 2026-02-21T10:18:33.6499141Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r19284, %r19285, %r19286, %r19287 }, [ %rd440 + 0 ], %rd439; 2026-02-21T10:18:33.6499523Z // end inline asm 2026-02-21T10:18:33.6499679Z // begin inline asm 2026-02-21T10:18:33.6499836Z mov.u64 %rd442, 0x0; 2026-02-21T10:18:33.6500061Z createpolicy.fractional.L2::evict_first.b64 %rd442, 1.0; 2026-02-21T10:18:33.6500318Z // end inline asm 2026-02-21T10:18:33.6500471Z // begin inline asm 2026-02-21T10:18:33.6500715Z mov.u32 %r19288, 0x0; 2026-02-21T10:18:33.6500870Z mov.u32 %r19289, 0x0; 2026-02-21T10:18:33.6501082Z mov.u32 %r19290, 0x0; 2026-02-21T10:18:33.6501234Z mov.u32 %r19291, 0x0; 2026-02-21T10:18:33.6501579Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r19288, %r19289, %r19290, %r19291 }, [ %rd443 + 0 ], %rd442; 2026-02-21T10:18:33.6501955Z // end inline asm 2026-02-21T10:18:33.6502103Z // begin inline asm 2026-02-21T10:18:33.6502253Z 
mov.u64 %rd445, 0x0; 2026-02-21T10:18:33.6502480Z createpolicy.fractional.L2::evict_first.b64 %rd445, 1.0; 2026-02-21T10:18:33.6502734Z // end inline asm 2026-02-21T10:18:33.6502884Z // begin inline asm 2026-02-21T10:18:33.6503033Z mov.u32 %r19292, 0x0; 2026-02-21T10:18:33.6503186Z mov.u32 %r19293, 0x0; 2026-02-21T10:18:33.6503335Z mov.u32 %r19294, 0x0; 2026-02-21T10:18:33.6503489Z mov.u32 %r19295, 0x0; 2026-02-21T10:18:33.6503814Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r19292, %r19293, %r19294, %r19295 }, [ %rd446 + 0 ], %rd445; 2026-02-21T10:18:33.6504188Z // end inline asm 2026-02-21T10:18:33.6504493Z .loc 1 62 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:62:32 2026-02-21T10:18:33.6504929Z bar.sync 0; 2026-02-21T10:18:33.6505111Z st.shared.v2.b32 [%r16], {%r19264, %r19265}; 2026-02-21T10:18:33.6505362Z st.shared.v2.b32 [%r16+2048], {%r19268, %r19269}; 2026-02-21T10:18:33.6505619Z st.shared.v2.b32 [%r16+4096], {%r19272, %r19273}; 2026-02-21T10:18:33.6505936Z st.shared.v2.b32 [%r16+6144], {%r19276, %r19277}; 2026-02-21T10:18:33.6506187Z st.shared.v2.b32 [%r16+8192], {%r19280, %r19281}; 2026-02-21T10:18:33.6506440Z st.shared.v2.b32 [%r16+10240], {%r19284, %r19285}; 2026-02-21T10:18:33.6506817Z st.shared.v2.b32 [%r16+12288], {%r19288, %r19289}; 2026-02-21T10:18:33.6507069Z st.shared.v2.b32 [%r16+14336], {%r19292, %r19293}; 2026-02-21T10:18:33.6507310Z st.shared.v2.b32 [%r17], {%r19266, %r19267}; 2026-02-21T10:18:33.6507547Z st.shared.v2.b32 [%r17+2048], {%r19270, %r19271}; 2026-02-21T10:18:33.6507798Z st.shared.v2.b32 [%r17+4096], {%r19274, %r19275}; 2026-02-21T10:18:33.6508042Z st.shared.v2.b32 [%r17+6144], {%r19278, %r19279}; 2026-02-21T10:18:33.6508286Z st.shared.v2.b32 [%r17+8192], {%r19282, %r19283}; 2026-02-21T10:18:33.6508614Z st.shared.v2.b32 [%r17+10240], {%r19286, %r19287}; 2026-02-21T10:18:33.6508866Z st.shared.v2.b32 [%r17+12288], {%r19290, %r19291}; 2026-02-21T10:18:33.6509115Z st.shared.v2.b32 [%r17+14336], {%r19294, 
%r19295}; 2026-02-21T10:18:33.6509338Z bar.sync 0; 2026-02-21T10:18:33.6509494Z ld.shared.b16 %rs1569, [%r58]; 2026-02-21T10:18:33.6509696Z ld.shared.b16 %rs1570, [%r58+1024]; 2026-02-21T10:18:33.6509900Z ld.shared.b16 %rs1571, [%r58+64]; 2026-02-21T10:18:33.6510098Z ld.shared.b16 %rs1572, [%r58+1088]; 2026-02-21T10:18:33.6510298Z ld.shared.b16 %rs1573, [%r58+8192]; 2026-02-21T10:18:33.6510492Z ld.shared.b16 %rs1574, [%r58+9216]; 2026-02-21T10:18:33.6510685Z ld.shared.b16 %rs1575, [%r58+8256]; 2026-02-21T10:18:33.6510880Z ld.shared.b16 %rs1576, [%r58+9280]; 2026-02-21T10:18:33.6511077Z ld.shared.b16 %rs1577, [%r59]; 2026-02-21T10:18:33.6511266Z ld.shared.b16 %rs1578, [%r59+1024]; 2026-02-21T10:18:33.6511464Z ld.shared.b16 %rs1579, [%r59+64]; 2026-02-21T10:18:33.6511658Z ld.shared.b16 %rs1580, [%r59+1088]; 2026-02-21T10:18:33.6511853Z ld.shared.b16 %rs1581, [%r59+8192]; 2026-02-21T10:18:33.6512047Z ld.shared.b16 %rs1582, [%r59+9216]; 2026-02-21T10:18:33.6512238Z ld.shared.b16 %rs1583, [%r59+8256]; 2026-02-21T10:18:33.6512435Z ld.shared.b16 %rs1584, [%r59+9280]; 2026-02-21T10:18:33.6512629Z ld.shared.b16 %rs1585, [%r60]; 2026-02-21T10:18:33.6512817Z ld.shared.b16 %rs1586, [%r60+1024]; 2026-02-21T10:18:33.6513010Z ld.shared.b16 %rs1587, [%r60+64]; 2026-02-21T10:18:33.6513203Z ld.shared.b16 %rs1588, [%r60+1088]; 2026-02-21T10:18:33.6513399Z ld.shared.b16 %rs1589, [%r60+8192]; 2026-02-21T10:18:33.6513590Z ld.shared.b16 %rs1590, [%r60+9216]; 2026-02-21T10:18:33.6513786Z ld.shared.b16 %rs1591, [%r60+8256]; 2026-02-21T10:18:33.6514074Z ld.shared.b16 %rs1592, [%r60+9280]; 2026-02-21T10:18:33.6514271Z ld.shared.b16 %rs1593, [%r61]; 2026-02-21T10:18:33.6514533Z ld.shared.b16 %rs1594, [%r61+1024]; 2026-02-21T10:18:33.6514735Z ld.shared.b16 %rs1595, [%r61+64]; 2026-02-21T10:18:33.6514936Z ld.shared.b16 %rs1596, [%r61+1088]; 2026-02-21T10:18:33.6515139Z ld.shared.b16 %rs1597, [%r61+8192]; 2026-02-21T10:18:33.6515331Z ld.shared.b16 %rs1598, [%r61+9216]; 
2026-02-21T10:18:33.6515529Z ld.shared.b16 %rs1599, [%r61+8256]; 2026-02-21T10:18:33.6515725Z ld.shared.b16 %rs1600, [%r61+9280]; 2026-02-21T10:18:33.6515921Z ld.shared.b16 %rs1601, [%r62]; 2026-02-21T10:18:33.6516105Z ld.shared.b16 %rs1602, [%r62+1024]; 2026-02-21T10:18:33.6516299Z ld.shared.b16 %rs1603, [%r62+64]; 2026-02-21T10:18:33.6516771Z ld.shared.b16 %rs1604, [%r62+1088]; 2026-02-21T10:18:33.6516989Z ld.shared.b16 %rs1605, [%r62+8192]; 2026-02-21T10:18:33.6517183Z ld.shared.b16 %rs1606, [%r62+9216]; 2026-02-21T10:18:33.6517377Z ld.shared.b16 %rs1607, [%r62+8256]; 2026-02-21T10:18:33.6517574Z ld.shared.b16 %rs1608, [%r62+9280]; 2026-02-21T10:18:33.6517771Z ld.shared.b16 %rs1609, [%r63]; 2026-02-21T10:18:33.6518046Z ld.shared.b16 %rs1610, [%r63+1024]; 2026-02-21T10:18:33.6518246Z ld.shared.b16 %rs1611, [%r63+64]; 2026-02-21T10:18:33.6518441Z ld.shared.b16 %rs1612, [%r63+1088]; 2026-02-21T10:18:33.6518633Z ld.shared.b16 %rs1613, [%r63+8192]; 2026-02-21T10:18:33.6518826Z ld.shared.b16 %rs1614, [%r63+9216]; 2026-02-21T10:18:33.6519089Z ld.shared.b16 %rs1615, [%r63+8256]; 2026-02-21T10:18:33.6519289Z ld.shared.b16 %rs1616, [%r63+9280]; 2026-02-21T10:18:33.6519483Z ld.shared.b16 %rs1617, [%r64]; 2026-02-21T10:18:33.6519667Z ld.shared.b16 %rs1618, [%r64+1024]; 2026-02-21T10:18:33.6519861Z ld.shared.b16 %rs1619, [%r64+64]; 2026-02-21T10:18:33.6520049Z ld.shared.b16 %rs1620, [%r64+1088]; 2026-02-21T10:18:33.6520246Z ld.shared.b16 %rs1621, [%r64+8192]; 2026-02-21T10:18:33.6520437Z ld.shared.b16 %rs1622, [%r64+9216]; 2026-02-21T10:18:33.6520635Z ld.shared.b16 %rs1623, [%r64+8256]; 2026-02-21T10:18:33.6520826Z ld.shared.b16 %rs1624, [%r64+9280]; 2026-02-21T10:18:33.6521021Z ld.shared.b16 %rs1625, [%r65]; 2026-02-21T10:18:33.6521206Z ld.shared.b16 %rs1626, [%r65+1024]; 2026-02-21T10:18:33.6521399Z ld.shared.b16 %rs1627, [%r65+64]; 2026-02-21T10:18:33.6521590Z ld.shared.b16 %rs1628, [%r65+1088]; 2026-02-21T10:18:33.6521781Z ld.shared.b16 %rs1629, [%r65+8192]; 
2026-02-21T10:18:33.6521976Z ld.shared.b16 %rs1630, [%r65+9216]; 2026-02-21T10:18:33.6522169Z ld.shared.b16 %rs1631, [%r65+8256]; 2026-02-21T10:18:33.6522366Z ld.shared.b16 %rs1632, [%r65+9280]; 2026-02-21T10:18:33.6522559Z cvt.f32.bf16 %r19433, %rs1569; 2026-02-21T10:18:33.6522746Z cvt.f32.bf16 %r19434, %rs1570; 2026-02-21T10:18:33.6522939Z cvt.f32.bf16 %r19435, %rs1577; 2026-02-21T10:18:33.6523120Z cvt.f32.bf16 %r19436, %rs1578; 2026-02-21T10:18:33.6523297Z cvt.f32.bf16 %r19565, %rs1585; 2026-02-21T10:18:33.6523473Z cvt.f32.bf16 %r19566, %rs1586; 2026-02-21T10:18:33.6523657Z cvt.f32.bf16 %r19567, %rs1593; 2026-02-21T10:18:33.6523835Z cvt.f32.bf16 %r19568, %rs1594; 2026-02-21T10:18:33.6524017Z cvt.f32.bf16 %r19697, %rs1601; 2026-02-21T10:18:33.6524194Z cvt.f32.bf16 %r19698, %rs1602; 2026-02-21T10:18:33.6524374Z cvt.f32.bf16 %r19699, %rs1609; 2026-02-21T10:18:33.6524549Z cvt.f32.bf16 %r19700, %rs1610; 2026-02-21T10:18:33.6524727Z cvt.f32.bf16 %r19829, %rs1617; 2026-02-21T10:18:33.6524915Z cvt.f32.bf16 %r19830, %rs1618; 2026-02-21T10:18:33.6525101Z cvt.f32.bf16 %r19831, %rs1625; 2026-02-21T10:18:33.6525277Z cvt.f32.bf16 %r19832, %rs1626; 2026-02-21T10:18:33.6525452Z cvt.f32.bf16 %r19961, %rs1571; 2026-02-21T10:18:33.6525630Z cvt.f32.bf16 %r19962, %rs1572; 2026-02-21T10:18:33.6525805Z cvt.f32.bf16 %r19963, %rs1579; 2026-02-21T10:18:33.6525984Z cvt.f32.bf16 %r19964, %rs1580; 2026-02-21T10:18:33.6526158Z cvt.f32.bf16 %r20093, %rs1587; 2026-02-21T10:18:33.6526337Z cvt.f32.bf16 %r20094, %rs1588; 2026-02-21T10:18:33.6526722Z cvt.f32.bf16 %r20095, %rs1595; 2026-02-21T10:18:33.6526904Z cvt.f32.bf16 %r20096, %rs1596; 2026-02-21T10:18:33.6527150Z cvt.f32.bf16 %r20225, %rs1603; 2026-02-21T10:18:33.6527328Z cvt.f32.bf16 %r20226, %rs1604; 2026-02-21T10:18:33.6527516Z cvt.f32.bf16 %r20227, %rs1611; 2026-02-21T10:18:33.6527694Z cvt.f32.bf16 %r20228, %rs1612; 2026-02-21T10:18:33.6527874Z cvt.f32.bf16 %r20357, %rs1619; 2026-02-21T10:18:33.6528049Z cvt.f32.bf16 %r20358, %rs1620; 
2026-02-21T10:18:33.6528242Z cvt.f32.bf16 %r20359, %rs1627; 2026-02-21T10:18:33.6528422Z cvt.f32.bf16 %r20360, %rs1628; 2026-02-21T10:18:33.6528598Z cvt.f32.bf16 %r20489, %rs1573; 2026-02-21T10:18:33.6528772Z cvt.f32.bf16 %r20490, %rs1574; 2026-02-21T10:18:33.6528953Z cvt.f32.bf16 %r20491, %rs1581; 2026-02-21T10:18:33.6529131Z cvt.f32.bf16 %r20492, %rs1582; 2026-02-21T10:18:33.6529308Z cvt.f32.bf16 %r20621, %rs1589; 2026-02-21T10:18:33.6529486Z cvt.f32.bf16 %r20622, %rs1590; 2026-02-21T10:18:33.6529658Z cvt.f32.bf16 %r20623, %rs1597; 2026-02-21T10:18:33.6529837Z cvt.f32.bf16 %r20624, %rs1598; 2026-02-21T10:18:33.6530015Z cvt.f32.bf16 %r20753, %rs1605; 2026-02-21T10:18:33.6530194Z cvt.f32.bf16 %r20754, %rs1606; 2026-02-21T10:18:33.6530447Z cvt.f32.bf16 %r20755, %rs1613; 2026-02-21T10:18:33.6530641Z cvt.f32.bf16 %r20756, %rs1614; 2026-02-21T10:18:33.6530819Z cvt.f32.bf16 %r20885, %rs1621; 2026-02-21T10:18:33.6530998Z cvt.f32.bf16 %r20886, %rs1622; 2026-02-21T10:18:33.6531175Z cvt.f32.bf16 %r20887, %rs1629; 2026-02-21T10:18:33.6531413Z cvt.f32.bf16 %r20888, %rs1630; 2026-02-21T10:18:33.6531596Z cvt.f32.bf16 %r21017, %rs1575; 2026-02-21T10:18:33.6531771Z cvt.f32.bf16 %r21018, %rs1576; 2026-02-21T10:18:33.6531952Z cvt.f32.bf16 %r21019, %rs1583; 2026-02-21T10:18:33.6532143Z cvt.f32.bf16 %r21020, %rs1584; 2026-02-21T10:18:33.6532322Z cvt.f32.bf16 %r21149, %rs1591; 2026-02-21T10:18:33.6532498Z cvt.f32.bf16 %r21150, %rs1592; 2026-02-21T10:18:33.6532675Z cvt.f32.bf16 %r21151, %rs1599; 2026-02-21T10:18:33.6532852Z cvt.f32.bf16 %r21152, %rs1600; 2026-02-21T10:18:33.6533031Z cvt.f32.bf16 %r21281, %rs1607; 2026-02-21T10:18:33.6533214Z cvt.f32.bf16 %r21282, %rs1608; 2026-02-21T10:18:33.6533393Z cvt.f32.bf16 %r21283, %rs1615; 2026-02-21T10:18:33.6533572Z cvt.f32.bf16 %r21284, %rs1616; 2026-02-21T10:18:33.6533745Z cvt.f32.bf16 %r21413, %rs1623; 2026-02-21T10:18:33.6533927Z cvt.f32.bf16 %r21414, %rs1624; 2026-02-21T10:18:33.6534103Z cvt.f32.bf16 %r21415, %rs1631; 
2026-02-21T10:18:33.6534283Z cvt.f32.bf16 %r21416, %rs1632; 2026-02-21T10:18:33.6534622Z .loc 1 64 33 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:64:33 2026-02-21T10:18:33.6534978Z bar.sync 0; 2026-02-21T10:18:33.6535126Z // begin inline asm 2026-02-21T10:18:33.6535337Z @%p222 mbarrier.init.shared::cta.b64 [%r19296], 1; 2026-02-21T10:18:33.6535573Z // end inline asm 2026-02-21T10:18:33.6535717Z bar.sync 0; 2026-02-21T10:18:33.6535857Z // begin inline asm 2026-02-21T10:18:33.6536083Z @%p222 mbarrier.arrive.expect_tx.shared.b64 _, [%r19296], 4096; 2026-02-21T10:18:33.6536358Z // end inline asm 2026-02-21T10:18:33.6536628Z // begin inline asm 2026-02-21T10:18:33.6536805Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.6536999Z // end inline asm 2026-02-21T10:18:33.6537139Z bar.sync 0; 2026-02-21T10:18:33.6537302Z elect.sync %r21679|%p215, -1; 2026-02-21T10:18:33.6537498Z and.pred %p196, %p1, %p215; 2026-02-21T10:18:33.6537692Z add.s64 %rd661, %rd661, 32; 2026-02-21T10:18:33.6537871Z cvt.u32.u64 %r19300, %rd661; 2026-02-21T10:18:33.6538050Z // begin inline asm 2026-02-21T10:18:33.6538482Z @%p196 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r29377], [%rd633, {%r19299, %r19300}], [%r19296]; 2026-02-21T10:18:33.6538961Z // end inline asm 2026-02-21T10:18:33.6539105Z bar.sync 0; 2026-02-21T10:18:33.6539242Z mov.b32 %r21547, 0; 2026-02-21T10:18:33.6539399Z // begin inline asm 2026-02-21T10:18:33.6539543Z 2026-02-21T10:18:33.6539665Z { 2026-02-21T10:18:33.6539882Z .reg .pred complete; 2026-02-21T10:18:33.6540037Z waitLoop: 2026-02-21T10:18:33.6540330Z mbarrier.try_wait.parity.shared.b64 complete, [%r19296], %r21547; 2026-02-21T10:18:33.6540626Z @!complete bra.uni waitLoop; 2026-02-21T10:18:33.6540809Z } 2026-02-21T10:18:33.6540884Z 2026-02-21T10:18:33.6540942Z // end inline asm 2026-02-21T10:18:33.6541089Z bar.sync 0; 2026-02-21T10:18:33.6541227Z // begin inline asm 2026-02-21T10:18:33.6541418Z @%p222 
mbarrier.inval.shared::cta.b64 [%r19296]; 2026-02-21T10:18:33.6541645Z // end inline asm 2026-02-21T10:18:33.6541944Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6542305Z ld.shared.s8 %rs1633, [%r26]; 2026-02-21T10:18:33.6542627Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6542980Z shl.b16 %rs1634, %rs1633, 4; 2026-02-21T10:18:33.6543295Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6543653Z ld.shared.s8 %rs1635, [%r27+128]; 2026-02-21T10:18:33.6544065Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6544419Z shl.b16 %rs1636, %rs1635, 4; 2026-02-21T10:18:33.6544742Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6545153Z ld.shared.s8 %rs1637, [%r28+256]; 2026-02-21T10:18:33.6545483Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6545846Z shl.b16 %rs1638, %rs1637, 4; 2026-02-21T10:18:33.6546162Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6546627Z ld.shared.s8 %rs1639, [%r29+384]; 2026-02-21T10:18:33.6546955Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6547303Z shl.b16 %rs1640, %rs1639, 4; 2026-02-21T10:18:33.6547616Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6547970Z ld.shared.s8 %rs1641, [%r30+512]; 2026-02-21T10:18:33.6548289Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6548714Z shl.b16 %rs1642, %rs1641, 4; 2026-02-21T10:18:33.6549027Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6549375Z ld.shared.s8 %rs1643, [%r31+640]; 2026-02-21T10:18:33.6549696Z .loc 1 67 28 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6550044Z shl.b16 %rs1644, %rs1643, 4; 2026-02-21T10:18:33.6550354Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6550699Z ld.shared.s8 %rs1645, [%r32+768]; 2026-02-21T10:18:33.6551022Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6551368Z shl.b16 %rs1646, %rs1645, 4; 2026-02-21T10:18:33.6551680Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6552024Z ld.shared.s8 %rs1647, [%r33+896]; 2026-02-21T10:18:33.6552351Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6552697Z shl.b16 %rs1648, %rs1647, 4; 2026-02-21T10:18:33.6553007Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6553358Z ld.shared.s8 %rs1649, [%r26+1024]; 2026-02-21T10:18:33.6553686Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6554033Z shl.b16 %rs1650, %rs1649, 4; 2026-02-21T10:18:33.6554344Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6554844Z ld.shared.s8 %rs1651, [%r27+1152]; 2026-02-21T10:18:33.6555175Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6555519Z shl.b16 %rs1652, %rs1651, 4; 2026-02-21T10:18:33.6555827Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6556177Z ld.shared.s8 %rs1653, [%r28+1280]; 2026-02-21T10:18:33.6556627Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6556980Z shl.b16 %rs1654, %rs1653, 4; 2026-02-21T10:18:33.6557304Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6557667Z ld.shared.s8 %rs1655, [%r29+1408]; 2026-02-21T10:18:33.6558000Z 
.loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6558352Z shl.b16 %rs1656, %rs1655, 4; 2026-02-21T10:18:33.6558759Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6558840Z ld.shared.s8 %rs1657, [%r30+1536]; 2026-02-21T10:18:33.6559048Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6559115Z shl.b16 %rs1658, %rs1657, 4; 2026-02-21T10:18:33.6559384Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6559457Z ld.shared.s8 %rs1659, [%r31+1664]; 2026-02-21T10:18:33.6559655Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6559722Z shl.b16 %rs1660, %rs1659, 4; 2026-02-21T10:18:33.6559919Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6559988Z ld.shared.s8 %rs1661, [%r32+1792]; 2026-02-21T10:18:33.6560185Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6560251Z shl.b16 %rs1662, %rs1661, 4; 2026-02-21T10:18:33.6560447Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6560513Z ld.shared.s8 %rs1663, [%r33+1920]; 2026-02-21T10:18:33.6560712Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6560775Z shl.b16 %rs1664, %rs1663, 4; 2026-02-21T10:18:33.6560970Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6561037Z ld.shared.s8 %rs1665, [%r26+2048]; 2026-02-21T10:18:33.6561231Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6561294Z shl.b16 %rs1666, %rs1665, 4; 2026-02-21T10:18:33.6561497Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6561565Z ld.shared.s8 %rs1667, [%r27+2176]; 
2026-02-21T10:18:33.6561761Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6561827Z shl.b16 %rs1668, %rs1667, 4; 2026-02-21T10:18:33.6562023Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6562088Z ld.shared.s8 %rs1669, [%r28+2304]; 2026-02-21T10:18:33.6562286Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6562350Z shl.b16 %rs1670, %rs1669, 4; 2026-02-21T10:18:33.6562545Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6562610Z ld.shared.s8 %rs1671, [%r29+2432]; 2026-02-21T10:18:33.6562808Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6563023Z shl.b16 %rs1672, %rs1671, 4; 2026-02-21T10:18:33.6563228Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6563298Z ld.shared.s8 %rs1673, [%r30+2560]; 2026-02-21T10:18:33.6563493Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6563555Z shl.b16 %rs1674, %rs1673, 4; 2026-02-21T10:18:33.6563751Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6563815Z ld.shared.s8 %rs1675, [%r31+2688]; 2026-02-21T10:18:33.6564008Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6564070Z shl.b16 %rs1676, %rs1675, 4; 2026-02-21T10:18:33.6564270Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6568304Z ld.shared.s8 %rs1677, [%r32+2816]; 2026-02-21T10:18:33.6568729Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6568820Z shl.b16 %rs1678, %rs1677, 4; 2026-02-21T10:18:33.6569045Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6569123Z ld.shared.s8 
%rs1679, [%r33+2944]; 2026-02-21T10:18:33.6569392Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6569461Z shl.b16 %rs1680, %rs1679, 4; 2026-02-21T10:18:33.6569671Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6569752Z ld.shared.s8 %rs1681, [%r26+3072]; 2026-02-21T10:18:33.6569964Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6570033Z shl.b16 %rs1682, %rs1681, 4; 2026-02-21T10:18:33.6570237Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6570308Z ld.shared.s8 %rs1683, [%r27+3200]; 2026-02-21T10:18:33.6570504Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6570565Z shl.b16 %rs1684, %rs1683, 4; 2026-02-21T10:18:33.6570762Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6570830Z ld.shared.s8 %rs1685, [%r28+3328]; 2026-02-21T10:18:33.6571023Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6571084Z shl.b16 %rs1686, %rs1685, 4; 2026-02-21T10:18:33.6571278Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6571347Z ld.shared.s8 %rs1687, [%r29+3456]; 2026-02-21T10:18:33.6571540Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6571602Z shl.b16 %rs1688, %rs1687, 4; 2026-02-21T10:18:33.6571795Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6571859Z ld.shared.s8 %rs1689, [%r30+3584]; 2026-02-21T10:18:33.6572052Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6572120Z shl.b16 %rs1690, %rs1689, 4; 2026-02-21T10:18:33.6572323Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 
2026-02-21T10:18:33.6572393Z ld.shared.s8 %rs1691, [%r31+3712]; 2026-02-21T10:18:33.6572591Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6572655Z shl.b16 %rs1692, %rs1691, 4; 2026-02-21T10:18:33.6572849Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6573052Z ld.shared.s8 %rs1693, [%r32+3840]; 2026-02-21T10:18:33.6573250Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6573312Z shl.b16 %rs1694, %rs1693, 4; 2026-02-21T10:18:33.6573503Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6573581Z ld.shared.s8 %rs1695, [%r33+3968]; 2026-02-21T10:18:33.6573779Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6573843Z shl.b16 %rs1696, %rs1695, 4; 2026-02-21T10:18:33.6574043Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6574109Z cvt.s16.s8 %rs1697, %rs1634; 2026-02-21T10:18:33.6574171Z shr.s16 %rs1698, %rs1697, 4; 2026-02-21T10:18:33.6574231Z cvt.s16.s8 %rs1699, %rs1636; 2026-02-21T10:18:33.6574296Z shr.s16 %rs1700, %rs1699, 4; 2026-02-21T10:18:33.6574358Z shr.s16 %rs1701, %rs1633, 4; 2026-02-21T10:18:33.6574425Z shr.s16 %rs1702, %rs1635, 4; 2026-02-21T10:18:33.6574681Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6574753Z cvt.rn.f32.s16 %r21680, %rs1702; 2026-02-21T10:18:33.6574817Z cvt.rn.f32.s16 %r21681, %rs1701; 2026-02-21T10:18:33.6574924Z cvt.rn.f32.s16 %r21682, %rs1700; 2026-02-21T10:18:33.6574988Z cvt.rn.f32.s16 %r21683, %rs1698; 2026-02-21T10:18:33.6575183Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6575246Z cvt.s16.s8 %rs1703, %rs1638; 2026-02-21T10:18:33.6575310Z shr.s16 %rs1704, %rs1703, 4; 2026-02-21T10:18:33.6575371Z cvt.s16.s8 %rs1705, %rs1640; 
2026-02-21T10:18:33.6575430Z shr.s16 %rs1706, %rs1705, 4; 2026-02-21T10:18:33.6575493Z shr.s16 %rs1707, %rs1637, 4; 2026-02-21T10:18:33.6575554Z shr.s16 %rs1708, %rs1639, 4; 2026-02-21T10:18:33.6575747Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6575814Z cvt.rn.f32.s16 %r21684, %rs1708; 2026-02-21T10:18:33.6575876Z cvt.rn.f32.s16 %r21685, %rs1707; 2026-02-21T10:18:33.6575936Z cvt.rn.f32.s16 %r21686, %rs1706; 2026-02-21T10:18:33.6575996Z cvt.rn.f32.s16 %r21687, %rs1704; 2026-02-21T10:18:33.6576194Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6576257Z cvt.s16.s8 %rs1709, %rs1642; 2026-02-21T10:18:33.6576318Z shr.s16 %rs1710, %rs1709, 4; 2026-02-21T10:18:33.6576383Z cvt.s16.s8 %rs1711, %rs1644; 2026-02-21T10:18:33.6576444Z shr.s16 %rs1712, %rs1711, 4; 2026-02-21T10:18:33.6576671Z shr.s16 %rs1713, %rs1641, 4; 2026-02-21T10:18:33.6576735Z shr.s16 %rs1714, %rs1643, 4; 2026-02-21T10:18:33.6576954Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6577022Z cvt.rn.f32.s16 %r21688, %rs1714; 2026-02-21T10:18:33.6577088Z cvt.rn.f32.s16 %r21689, %rs1713; 2026-02-21T10:18:33.6577154Z cvt.rn.f32.s16 %r21690, %rs1712; 2026-02-21T10:18:33.6577216Z cvt.rn.f32.s16 %r21691, %rs1710; 2026-02-21T10:18:33.6577417Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6577486Z cvt.s16.s8 %rs1715, %rs1646; 2026-02-21T10:18:33.6577548Z shr.s16 %rs1716, %rs1715, 4; 2026-02-21T10:18:33.6577609Z cvt.s16.s8 %rs1717, %rs1648; 2026-02-21T10:18:33.6577670Z shr.s16 %rs1718, %rs1717, 4; 2026-02-21T10:18:33.6577733Z shr.s16 %rs1719, %rs1645, 4; 2026-02-21T10:18:33.6577793Z shr.s16 %rs1720, %rs1647, 4; 2026-02-21T10:18:33.6577988Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6578054Z cvt.rn.f32.s16 %r21692, %rs1720; 2026-02-21T10:18:33.6578116Z 
cvt.rn.f32.s16 %r21693, %rs1719; 2026-02-21T10:18:33.6578271Z cvt.rn.f32.s16 %r21694, %rs1718; 2026-02-21T10:18:33.6578397Z cvt.rn.f32.s16 %r21695, %rs1716; 2026-02-21T10:18:33.6578604Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6578668Z cvt.s16.s8 %rs1721, %rs1650; 2026-02-21T10:18:33.6578729Z shr.s16 %rs1722, %rs1721, 4; 2026-02-21T10:18:33.6578792Z cvt.s16.s8 %rs1723, %rs1652; 2026-02-21T10:18:33.6578855Z shr.s16 %rs1724, %rs1723, 4; 2026-02-21T10:18:33.6578919Z shr.s16 %rs1725, %rs1649, 4; 2026-02-21T10:18:33.6578982Z shr.s16 %rs1726, %rs1651, 4; 2026-02-21T10:18:33.6579189Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6579256Z cvt.rn.f32.s16 %r21696, %rs1726; 2026-02-21T10:18:33.6579322Z cvt.rn.f32.s16 %r21697, %rs1725; 2026-02-21T10:18:33.6579383Z cvt.rn.f32.s16 %r21698, %rs1724; 2026-02-21T10:18:33.6579447Z cvt.rn.f32.s16 %r21699, %rs1722; 2026-02-21T10:18:33.6579658Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6579791Z cvt.s16.s8 %rs1727, %rs1654; 2026-02-21T10:18:33.6579859Z shr.s16 %rs1728, %rs1727, 4; 2026-02-21T10:18:33.6579921Z cvt.s16.s8 %rs1729, %rs1656; 2026-02-21T10:18:33.6579983Z shr.s16 %rs1730, %rs1729, 4; 2026-02-21T10:18:33.6580043Z shr.s16 %rs1731, %rs1653, 4; 2026-02-21T10:18:33.6580160Z shr.s16 %rs1732, %rs1655, 4; 2026-02-21T10:18:33.6580372Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6580443Z cvt.rn.f32.s16 %r21700, %rs1732; 2026-02-21T10:18:33.6580504Z cvt.rn.f32.s16 %r21701, %rs1731; 2026-02-21T10:18:33.6580567Z cvt.rn.f32.s16 %r21702, %rs1730; 2026-02-21T10:18:33.6580632Z cvt.rn.f32.s16 %r21703, %rs1728; 2026-02-21T10:18:33.6580837Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6580905Z cvt.s16.s8 %rs1733, %rs1658; 2026-02-21T10:18:33.6580971Z shr.s16 %rs1734, %rs1733, 4; 
2026-02-21T10:18:33.6581031Z cvt.s16.s8 %rs1735, %rs1660; 2026-02-21T10:18:33.6581092Z shr.s16 %rs1736, %rs1735, 4; 2026-02-21T10:18:33.6581152Z shr.s16 %rs1737, %rs1657, 4; 2026-02-21T10:18:33.6581220Z shr.s16 %rs1738, %rs1659, 4; 2026-02-21T10:18:33.6581418Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6581496Z cvt.rn.f32.s16 %r21704, %rs1738; 2026-02-21T10:18:33.6581570Z cvt.rn.f32.s16 %r21705, %rs1737; 2026-02-21T10:18:33.6581633Z cvt.rn.f32.s16 %r21706, %rs1736; 2026-02-21T10:18:33.6581693Z cvt.rn.f32.s16 %r21707, %rs1734; 2026-02-21T10:18:33.6581893Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6581954Z cvt.s16.s8 %rs1739, %rs1662; 2026-02-21T10:18:33.6582015Z shr.s16 %rs1740, %rs1739, 4; 2026-02-21T10:18:33.6582078Z cvt.s16.s8 %rs1741, %rs1664; 2026-02-21T10:18:33.6582142Z shr.s16 %rs1742, %rs1741, 4; 2026-02-21T10:18:33.6582204Z shr.s16 %rs1743, %rs1661, 4; 2026-02-21T10:18:33.6582265Z shr.s16 %rs1744, %rs1663, 4; 2026-02-21T10:18:33.6582463Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6582529Z cvt.rn.f32.s16 %r21708, %rs1744; 2026-02-21T10:18:33.6582591Z cvt.rn.f32.s16 %r21709, %rs1743; 2026-02-21T10:18:33.6582657Z cvt.rn.f32.s16 %r21710, %rs1742; 2026-02-21T10:18:33.6582718Z cvt.rn.f32.s16 %r21711, %rs1740; 2026-02-21T10:18:33.6582919Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6582982Z cvt.s16.s8 %rs1745, %rs1666; 2026-02-21T10:18:33.6583048Z shr.s16 %rs1746, %rs1745, 4; 2026-02-21T10:18:33.6583109Z cvt.s16.s8 %rs1747, %rs1668; 2026-02-21T10:18:33.6583169Z shr.s16 %rs1748, %rs1747, 4; 2026-02-21T10:18:33.6583230Z shr.s16 %rs1749, %rs1665, 4; 2026-02-21T10:18:33.6583363Z shr.s16 %rs1750, %rs1667, 4; 2026-02-21T10:18:33.6583626Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6583697Z 
cvt.rn.f32.s16 %r21712, %rs1750; 2026-02-21T10:18:33.6583762Z cvt.rn.f32.s16 %r21713, %rs1749; 2026-02-21T10:18:33.6583824Z cvt.rn.f32.s16 %r21714, %rs1748; 2026-02-21T10:18:33.6583886Z cvt.rn.f32.s16 %r21715, %rs1746; 2026-02-21T10:18:33.6584095Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6584159Z cvt.s16.s8 %rs1751, %rs1670; 2026-02-21T10:18:33.6584221Z shr.s16 %rs1752, %rs1751, 4; 2026-02-21T10:18:33.6584284Z cvt.s16.s8 %rs1753, %rs1672; 2026-02-21T10:18:33.6584348Z shr.s16 %rs1754, %rs1753, 4; 2026-02-21T10:18:33.6584409Z shr.s16 %rs1755, %rs1669, 4; 2026-02-21T10:18:33.6584470Z shr.s16 %rs1756, %rs1671, 4; 2026-02-21T10:18:33.6584674Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6584743Z cvt.rn.f32.s16 %r21716, %rs1756; 2026-02-21T10:18:33.6584805Z cvt.rn.f32.s16 %r21717, %rs1755; 2026-02-21T10:18:33.6584922Z cvt.rn.f32.s16 %r21718, %rs1754; 2026-02-21T10:18:33.6584987Z cvt.rn.f32.s16 %r21719, %rs1752; 2026-02-21T10:18:33.6585186Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6585324Z cvt.s16.s8 %rs1757, %rs1674; 2026-02-21T10:18:33.6585391Z shr.s16 %rs1758, %rs1757, 4; 2026-02-21T10:18:33.6585452Z cvt.s16.s8 %rs1759, %rs1676; 2026-02-21T10:18:33.6585512Z shr.s16 %rs1760, %rs1759, 4; 2026-02-21T10:18:33.6585574Z shr.s16 %rs1761, %rs1673, 4; 2026-02-21T10:18:33.6585633Z shr.s16 %rs1762, %rs1675, 4; 2026-02-21T10:18:33.6585830Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6585898Z cvt.rn.f32.s16 %r21720, %rs1762; 2026-02-21T10:18:33.6585964Z cvt.rn.f32.s16 %r21721, %rs1761; 2026-02-21T10:18:33.6586026Z cvt.rn.f32.s16 %r21722, %rs1760; 2026-02-21T10:18:33.6586092Z cvt.rn.f32.s16 %r21723, %rs1758; 2026-02-21T10:18:33.6586300Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6586363Z cvt.s16.s8 %rs1763, 
%rs1678; 2026-02-21T10:18:33.6586425Z shr.s16 %rs1764, %rs1763, 4; 2026-02-21T10:18:33.6586598Z cvt.s16.s8 %rs1765, %rs1680; 2026-02-21T10:18:33.6586664Z shr.s16 %rs1766, %rs1765, 4; 2026-02-21T10:18:33.6586724Z shr.s16 %rs1767, %rs1677, 4; 2026-02-21T10:18:33.6586786Z shr.s16 %rs1768, %rs1679, 4; 2026-02-21T10:18:33.6586986Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6587052Z cvt.rn.f32.s16 %r21724, %rs1768; 2026-02-21T10:18:33.6587114Z cvt.rn.f32.s16 %r21725, %rs1767; 2026-02-21T10:18:33.6587178Z cvt.rn.f32.s16 %r21726, %rs1766; 2026-02-21T10:18:33.6587240Z cvt.rn.f32.s16 %r21727, %rs1764; 2026-02-21T10:18:33.6587445Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6587513Z cvt.s16.s8 %rs1769, %rs1682; 2026-02-21T10:18:33.6587576Z shr.s16 %rs1770, %rs1769, 4; 2026-02-21T10:18:33.6587637Z cvt.s16.s8 %rs1771, %rs1684; 2026-02-21T10:18:33.6587700Z shr.s16 %rs1772, %rs1771, 4; 2026-02-21T10:18:33.6587760Z shr.s16 %rs1773, %rs1681, 4; 2026-02-21T10:18:33.6587822Z shr.s16 %rs1774, %rs1683, 4; 2026-02-21T10:18:33.6588019Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6588084Z cvt.rn.f32.s16 %r21728, %rs1774; 2026-02-21T10:18:33.6588145Z cvt.rn.f32.s16 %r21729, %rs1773; 2026-02-21T10:18:33.6588209Z cvt.rn.f32.s16 %r21730, %rs1772; 2026-02-21T10:18:33.6588281Z cvt.rn.f32.s16 %r21731, %rs1770; 2026-02-21T10:18:33.6588571Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6588721Z cvt.s16.s8 %rs1775, %rs1686; 2026-02-21T10:18:33.6588857Z shr.s16 %rs1776, %rs1775, 4; 2026-02-21T10:18:33.6588919Z cvt.s16.s8 %rs1777, %rs1688; 2026-02-21T10:18:33.6588981Z shr.s16 %rs1778, %rs1777, 4; 2026-02-21T10:18:33.6589042Z shr.s16 %rs1779, %rs1685, 4; 2026-02-21T10:18:33.6589104Z shr.s16 %rs1780, %rs1687, 4; 2026-02-21T10:18:33.6589302Z .loc 1 87 32 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6589366Z cvt.rn.f32.s16 %r21732, %rs1780; 2026-02-21T10:18:33.6589432Z cvt.rn.f32.s16 %r21733, %rs1779; 2026-02-21T10:18:33.6589494Z cvt.rn.f32.s16 %r21734, %rs1778; 2026-02-21T10:18:33.6589555Z cvt.rn.f32.s16 %r21735, %rs1776; 2026-02-21T10:18:33.6589755Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6589817Z cvt.s16.s8 %rs1781, %rs1690; 2026-02-21T10:18:33.6589878Z shr.s16 %rs1782, %rs1781, 4; 2026-02-21T10:18:33.6589941Z cvt.s16.s8 %rs1783, %rs1692; 2026-02-21T10:18:33.6590005Z shr.s16 %rs1784, %rs1783, 4; 2026-02-21T10:18:33.6590068Z shr.s16 %rs1785, %rs1689, 4; 2026-02-21T10:18:33.6590189Z shr.s16 %rs1786, %rs1691, 4; 2026-02-21T10:18:33.6590389Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6590452Z cvt.rn.f32.s16 %r21736, %rs1786; 2026-02-21T10:18:33.6590514Z cvt.rn.f32.s16 %r21737, %rs1785; 2026-02-21T10:18:33.6590631Z cvt.rn.f32.s16 %r21738, %rs1784; 2026-02-21T10:18:33.6590697Z cvt.rn.f32.s16 %r21739, %rs1782; 2026-02-21T10:18:33.6590892Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6590954Z cvt.s16.s8 %rs1787, %rs1694; 2026-02-21T10:18:33.6591018Z shr.s16 %rs1788, %rs1787, 4; 2026-02-21T10:18:33.6591090Z cvt.s16.s8 %rs1789, %rs1696; 2026-02-21T10:18:33.6591156Z shr.s16 %rs1790, %rs1789, 4; 2026-02-21T10:18:33.6591220Z shr.s16 %rs1791, %rs1693, 4; 2026-02-21T10:18:33.6591281Z shr.s16 %rs1792, %rs1695, 4; 2026-02-21T10:18:33.6591480Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6591543Z cvt.rn.f32.s16 %r21740, %rs1792; 2026-02-21T10:18:33.6591609Z cvt.rn.f32.s16 %r21741, %rs1791; 2026-02-21T10:18:33.6591670Z cvt.rn.f32.s16 %r21742, %rs1790; 2026-02-21T10:18:33.6591730Z cvt.rn.f32.s16 %r21743, %rs1788; 2026-02-21T10:18:33.6591790Z bar.sync 0; 2026-02-21T10:18:33.6591913Z 
st.shared.v4.b32 [%r34], {%r21683, %r21681, %r21682, %r21680}; 2026-02-21T10:18:33.6592043Z st.shared.v4.b32 [%r34+16384], {%r21715, %r21713, %r21714, %r21712}; 2026-02-21T10:18:33.6592158Z st.shared.v4.b32 [%r35], {%r21687, %r21685, %r21686, %r21684}; 2026-02-21T10:18:33.6592281Z st.shared.v4.b32 [%r35+16384], {%r21719, %r21717, %r21718, %r21716}; 2026-02-21T10:18:33.6592389Z st.shared.v4.b32 [%r36], {%r21691, %r21689, %r21690, %r21688}; 2026-02-21T10:18:33.6592509Z st.shared.v4.b32 [%r36+16384], {%r21723, %r21721, %r21722, %r21720}; 2026-02-21T10:18:33.6592621Z st.shared.v4.b32 [%r37], {%r21695, %r21693, %r21694, %r21692}; 2026-02-21T10:18:33.6592740Z st.shared.v4.b32 [%r37+16384], {%r21727, %r21725, %r21726, %r21724}; 2026-02-21T10:18:33.6592845Z st.shared.v4.b32 [%r38], {%r21699, %r21697, %r21698, %r21696}; 2026-02-21T10:18:33.6592965Z st.shared.v4.b32 [%r38+16384], {%r21731, %r21729, %r21730, %r21728}; 2026-02-21T10:18:33.6593075Z st.shared.v4.b32 [%r39], {%r21703, %r21701, %r21702, %r21700}; 2026-02-21T10:18:33.6593193Z st.shared.v4.b32 [%r39+16384], {%r21735, %r21733, %r21734, %r21732}; 2026-02-21T10:18:33.6593301Z st.shared.v4.b32 [%r40], {%r21707, %r21705, %r21706, %r21704}; 2026-02-21T10:18:33.6593418Z st.shared.v4.b32 [%r40+16384], {%r21739, %r21737, %r21738, %r21736}; 2026-02-21T10:18:33.6593524Z st.shared.v4.b32 [%r41], {%r21711, %r21709, %r21710, %r21708}; 2026-02-21T10:18:33.6593643Z st.shared.v4.b32 [%r41+16384], {%r21743, %r21741, %r21742, %r21740}; 2026-02-21T10:18:33.6593768Z $L__tmp15: 2026-02-21T10:18:33.6594101Z .loc 2 291 36 // standard.py:291:36 @[ c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:94:40 ] 2026-02-21T10:18:33.6594166Z // begin inline asm 2026-02-21T10:18:33.6594259Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.6594320Z // end inline asm 2026-02-21T10:18:33.6594377Z bar.sync 0; 2026-02-21T10:18:33.6594452Z wgmma.fence.sync.aligned; 2026-02-21T10:18:33.6594520Z mov.pred %p198, -1; 2026-02-21T10:18:33.6594588Z // 
begin inline asm 2026-02-21T10:18:33.6596147Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r19433,%r19434,%r19435,%r19436}, %rd3, %p198, 1, 1; 2026-02-21T10:18:33.6596212Z // end inline asm 2026-02-21T10:18:33.6596272Z // begin inline asm 2026-02-21T10:18:33.6597948Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r19565,%r19566,%r19567,%r19568}, %rd4, %p198, 1, 1; 2026-02-21T10:18:33.6598017Z // end inline asm 2026-02-21T10:18:33.6598082Z // begin inline asm 2026-02-21T10:18:33.6599569Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r19697,%r19698,%r19699,%r19700}, %rd5, %p198, 1, 1; 2026-02-21T10:18:33.6599631Z // end inline asm 2026-02-21T10:18:33.6599691Z // begin inline asm 2026-02-21T10:18:33.6601171Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r19829,%r19830,%r19831,%r19832}, %rd6, %p198, 1, 1; 2026-02-21T10:18:33.6601234Z // end inline asm 2026-02-21T10:18:33.6601294Z // begin inline asm 2026-02-21T10:18:33.6602774Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r19961,%r19962,%r19963,%r19964}, %rd7, %p198, 1, 1; 2026-02-21T10:18:33.6602964Z // end inline asm 2026-02-21T10:18:33.6603026Z // begin inline asm 2026-02-21T10:18:33.6604512Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r20093,%r20094,%r20095,%r20096}, %rd8, %p198, 1, 1; 2026-02-21T10:18:33.6604572Z // end inline asm 2026-02-21T10:18:33.6604631Z // begin inline asm 2026-02-21T10:18:33.6606214Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r20225,%r20226,%r20227,%r20228}, %rd9, %p198, 1, 1; 2026-02-21T10:18:33.6606279Z // end inline asm 2026-02-21T10:18:33.6606339Z // begin inline asm 2026-02-21T10:18:33.6607940Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227}, {%r20357,%r20358,%r20359,%r20360}, %rd10, %p198, 1, 1; 2026-02-21T10:18:33.6608003Z // end inline asm 2026-02-21T10:18:33.6608060Z // begin inline asm 2026-02-21T10:18:33.6609557Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r20489,%r20490,%r20491,%r20492}, %rd3, %p198, 1, 1; 2026-02-21T10:18:33.6609618Z // end inline asm 2026-02-21T10:18:33.6609679Z // begin inline asm 2026-02-21T10:18:33.6611161Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r20621,%r20622,%r20623,%r20624}, %rd4, %p198, 1, 1; 2026-02-21T10:18:33.6611218Z // end inline asm 2026-02-21T10:18:33.6611278Z // begin inline asm 2026-02-21T10:18:33.6612760Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r20753,%r20754,%r20755,%r20756}, %rd5, %p198, 1, 1; 2026-02-21T10:18:33.6612972Z // end inline asm 2026-02-21T10:18:33.6613031Z // begin inline asm 2026-02-21T10:18:33.6614581Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r20885,%r20886,%r20887,%r20888}, %rd6, %p198, 1, 1; 2026-02-21T10:18:33.6614647Z // end inline asm 2026-02-21T10:18:33.6614705Z // begin inline asm 2026-02-21T10:18:33.6616232Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r21017,%r21018,%r21019,%r21020}, %rd7, %p198, 1, 1; 2026-02-21T10:18:33.6616297Z // end inline asm 2026-02-21T10:18:33.6616367Z // begin inline asm 2026-02-21T10:18:33.6617972Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r21149,%r21150,%r21151,%r21152}, %rd8, %p198, 1, 1; 2026-02-21T10:18:33.6618035Z // end inline asm 2026-02-21T10:18:33.6618093Z // begin inline asm 2026-02-21T10:18:33.6619571Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r21281,%r21282,%r21283,%r21284}, %rd9, %p198, 1, 1; 2026-02-21T10:18:33.6619633Z // end inline asm 2026-02-21T10:18:33.6619696Z // begin inline asm 2026-02-21T10:18:33.6621179Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291}, {%r21413,%r21414,%r21415,%r21416}, %rd10, %p198, 1, 1; 2026-02-21T10:18:33.6621375Z // end inline asm 2026-02-21T10:18:33.6621458Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:33.6621522Z mov.b32 %r21545, %r29377; 2026-02-21T10:18:33.6621583Z mov.b32 %r21546, %r21547; 2026-02-21T10:18:33.6621643Z // begin inline asm 2026-02-21T10:18:33.6624277Z // wait for regs: 
%r32164,%r32165,%r32166,%r32167,%r32168,%r32169,%r32170,%r32171,%r32172,%r32173,%r32174,%r32175,%r32176,%r32177,%r32178,%r32179,%r32180,%r32181,%r32182,%r32183,%r32184,%r32185,%r32186,%r32187,%r32188,%r32189,%r32190,%r32191,%r32192,%r32193,%r32194,%r32195,%r32196,%r32197,%r32198,%r32199,%r32200,%r32201,%r32202,%r32203,%r32204,%r32205,%r32206,%r32207,%r32208,%r32209,%r32210,%r32211,%r32212,%r32213,%r32214,%r32215,%r32216,%r32217,%r32218,%r32219,%r32220,%r32221,%r32222,%r32223,%r32224,%r32225,%r32226,%r32227,%r32228,%r32229,%r32230,%r32231,%r32232,%r32233,%r32234,%r32235,%r32236,%r32237,%r32238,%r32239,%r32240,%r32241,%r32242,%r32243,%r32244,%r32245,%r32246,%r32247,%r32248,%r32249,%r32250,%r32251,%r32252,%r32253,%r32254,%r32255,%r32256,%r32257,%r32258,%r32259,%r32260,%r32261,%r32262,%r32263,%r32264,%r32265,%r32266,%r32267,%r32268,%r32269,%r32270,%r32271,%r32272,%r32273,%r32274,%r32275,%r32276,%r32277,%r32278,%r32279,%r32280,%r32281,%r32282,%r32283,%r32284,%r32285,%r32286,%r32287,%r32288,%r32289,%r32290,%r32291,%r21545,%r21546,%r21547 2026-02-21T10:18:33.6624365Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:33.6624423Z // end inline asm 2026-02-21T10:18:33.6624479Z $L__tmp16: 2026-02-21T10:18:33.6624698Z .loc 1 51 93 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:51:93 2026-02-21T10:18:33.6624768Z add.s64 %rd660, %rd660, 128; 2026-02-21T10:18:33.6624837Z setp.lt.u64 %p216, %rd661, 4064; 2026-02-21T10:18:33.6624898Z @%p216 bra $L__BB0_9; 2026-02-21T10:18:33.6625028Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:18:33.6625240Z .loc 1 97 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:97:28 2026-02-21T10:18:33.6625330Z cvt.rn.bf16x2.f32 %r21747, %r32165, %r32164; 2026-02-21T10:18:33.6625414Z cvt.rn.bf16x2.f32 %r21748, %r32167, %r32166; 2026-02-21T10:18:33.6625493Z cvt.rn.bf16x2.f32 %r21749, %r32169, %r32168; 2026-02-21T10:18:33.6625570Z cvt.rn.bf16x2.f32 %r21750, %r32171, %r32170; 2026-02-21T10:18:33.6625648Z cvt.rn.bf16x2.f32 
%r21751, %r32173, %r32172; 2026-02-21T10:18:33.6625724Z cvt.rn.bf16x2.f32 %r21752, %r32175, %r32174; 2026-02-21T10:18:33.6625799Z cvt.rn.bf16x2.f32 %r21753, %r32177, %r32176; 2026-02-21T10:18:33.6625875Z cvt.rn.bf16x2.f32 %r21754, %r32179, %r32178; 2026-02-21T10:18:33.6625952Z cvt.rn.bf16x2.f32 %r21755, %r32181, %r32180; 2026-02-21T10:18:33.6626027Z cvt.rn.bf16x2.f32 %r21756, %r32183, %r32182; 2026-02-21T10:18:33.6626101Z cvt.rn.bf16x2.f32 %r21757, %r32185, %r32184; 2026-02-21T10:18:33.6626181Z cvt.rn.bf16x2.f32 %r21758, %r32187, %r32186; 2026-02-21T10:18:33.6626258Z cvt.rn.bf16x2.f32 %r21759, %r32189, %r32188; 2026-02-21T10:18:33.6626334Z cvt.rn.bf16x2.f32 %r21760, %r32191, %r32190; 2026-02-21T10:18:33.6626414Z cvt.rn.bf16x2.f32 %r21761, %r32193, %r32192; 2026-02-21T10:18:33.6626621Z cvt.rn.bf16x2.f32 %r21762, %r32195, %r32194; 2026-02-21T10:18:33.6626701Z cvt.rn.bf16x2.f32 %r21763, %r32197, %r32196; 2026-02-21T10:18:33.6626777Z cvt.rn.bf16x2.f32 %r21764, %r32199, %r32198; 2026-02-21T10:18:33.6626856Z cvt.rn.bf16x2.f32 %r21765, %r32201, %r32200; 2026-02-21T10:18:33.6626932Z cvt.rn.bf16x2.f32 %r21766, %r32203, %r32202; 2026-02-21T10:18:33.6627007Z cvt.rn.bf16x2.f32 %r21767, %r32205, %r32204; 2026-02-21T10:18:33.6627085Z cvt.rn.bf16x2.f32 %r21768, %r32207, %r32206; 2026-02-21T10:18:33.6627158Z cvt.rn.bf16x2.f32 %r21769, %r32209, %r32208; 2026-02-21T10:18:33.6627233Z cvt.rn.bf16x2.f32 %r21770, %r32211, %r32210; 2026-02-21T10:18:33.6627406Z cvt.rn.bf16x2.f32 %r21771, %r32213, %r32212; 2026-02-21T10:18:33.6627543Z cvt.rn.bf16x2.f32 %r21772, %r32215, %r32214; 2026-02-21T10:18:33.6627620Z cvt.rn.bf16x2.f32 %r21773, %r32217, %r32216; 2026-02-21T10:18:33.6627696Z cvt.rn.bf16x2.f32 %r21774, %r32219, %r32218; 2026-02-21T10:18:33.6627773Z cvt.rn.bf16x2.f32 %r21775, %r32221, %r32220; 2026-02-21T10:18:33.6627847Z cvt.rn.bf16x2.f32 %r21776, %r32223, %r32222; 2026-02-21T10:18:33.6627923Z cvt.rn.bf16x2.f32 %r21777, %r32225, %r32224; 2026-02-21T10:18:33.6628005Z cvt.rn.bf16x2.f32 
%r21778, %r32227, %r32226; 2026-02-21T10:18:33.6628081Z cvt.rn.bf16x2.f32 %r21779, %r32229, %r32228; 2026-02-21T10:18:33.6628155Z cvt.rn.bf16x2.f32 %r21780, %r32231, %r32230; 2026-02-21T10:18:33.6628233Z cvt.rn.bf16x2.f32 %r21781, %r32233, %r32232; 2026-02-21T10:18:33.6628307Z cvt.rn.bf16x2.f32 %r21782, %r32235, %r32234; 2026-02-21T10:18:33.6628381Z cvt.rn.bf16x2.f32 %r21783, %r32237, %r32236; 2026-02-21T10:18:33.6628526Z cvt.rn.bf16x2.f32 %r21784, %r32239, %r32238; 2026-02-21T10:18:33.6628611Z cvt.rn.bf16x2.f32 %r21785, %r32241, %r32240; 2026-02-21T10:18:33.6628689Z cvt.rn.bf16x2.f32 %r21786, %r32243, %r32242; 2026-02-21T10:18:33.6628832Z cvt.rn.bf16x2.f32 %r21787, %r32245, %r32244; 2026-02-21T10:18:33.6628919Z cvt.rn.bf16x2.f32 %r21788, %r32247, %r32246; 2026-02-21T10:18:33.6628999Z cvt.rn.bf16x2.f32 %r21789, %r32249, %r32248; 2026-02-21T10:18:33.6629076Z cvt.rn.bf16x2.f32 %r21790, %r32251, %r32250; 2026-02-21T10:18:33.6629210Z cvt.rn.bf16x2.f32 %r21791, %r32253, %r32252; 2026-02-21T10:18:33.6629288Z cvt.rn.bf16x2.f32 %r21792, %r32255, %r32254; 2026-02-21T10:18:33.6629361Z cvt.rn.bf16x2.f32 %r21793, %r32257, %r32256; 2026-02-21T10:18:33.6629435Z cvt.rn.bf16x2.f32 %r21794, %r32259, %r32258; 2026-02-21T10:18:33.6629514Z cvt.rn.bf16x2.f32 %r21795, %r32261, %r32260; 2026-02-21T10:18:33.6629588Z cvt.rn.bf16x2.f32 %r21796, %r32263, %r32262; 2026-02-21T10:18:33.6629661Z cvt.rn.bf16x2.f32 %r21797, %r32265, %r32264; 2026-02-21T10:18:33.6629741Z cvt.rn.bf16x2.f32 %r21798, %r32267, %r32266; 2026-02-21T10:18:33.6629817Z cvt.rn.bf16x2.f32 %r21799, %r32269, %r32268; 2026-02-21T10:18:33.6629891Z cvt.rn.bf16x2.f32 %r21800, %r32271, %r32270; 2026-02-21T10:18:33.6629966Z cvt.rn.bf16x2.f32 %r21801, %r32273, %r32272; 2026-02-21T10:18:33.6630043Z cvt.rn.bf16x2.f32 %r21802, %r32275, %r32274; 2026-02-21T10:18:33.6630117Z cvt.rn.bf16x2.f32 %r21803, %r32277, %r32276; 2026-02-21T10:18:33.6630193Z cvt.rn.bf16x2.f32 %r21804, %r32279, %r32278; 2026-02-21T10:18:33.6630271Z cvt.rn.bf16x2.f32 
%r21805, %r32281, %r32280; 2026-02-21T10:18:33.6630346Z cvt.rn.bf16x2.f32 %r21806, %r32283, %r32282; 2026-02-21T10:18:33.6630422Z cvt.rn.bf16x2.f32 %r21807, %r32285, %r32284; 2026-02-21T10:18:33.6630498Z cvt.rn.bf16x2.f32 %r21808, %r32287, %r32286; 2026-02-21T10:18:33.6630573Z cvt.rn.bf16x2.f32 %r21809, %r32289, %r32288; 2026-02-21T10:18:33.6630649Z cvt.rn.bf16x2.f32 %r21810, %r32291, %r32290; 2026-02-21T10:18:33.6630857Z .loc 1 98 43 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:98:43 2026-02-21T10:18:33.6630923Z bar.sync 0; 2026-02-21T10:18:33.6631121Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r42], {%r21747, %r21748, %r21749, %r21750}; 2026-02-21T10:18:33.6631309Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r43], {%r21763, %r21764, %r21765, %r21766}; 2026-02-21T10:18:33.6631494Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r44], {%r21779, %r21780, %r21781, %r21782}; 2026-02-21T10:18:33.6631675Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r45], {%r21795, %r21796, %r21797, %r21798}; 2026-02-21T10:18:33.6631861Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r46], {%r21751, %r21752, %r21753, %r21754}; 2026-02-21T10:18:33.6632041Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r47], {%r21767, %r21768, %r21769, %r21770}; 2026-02-21T10:18:33.6632219Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r48], {%r21783, %r21784, %r21785, %r21786}; 2026-02-21T10:18:33.6632402Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r49], {%r21799, %r21800, %r21801, %r21802}; 2026-02-21T10:18:33.6632651Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r50], {%r21755, %r21756, %r21757, %r21758}; 2026-02-21T10:18:33.6632874Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r51], {%r21771, %r21772, %r21773, %r21774}; 2026-02-21T10:18:33.6633057Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r52], {%r21787, %r21788, %r21789, %r21790}; 2026-02-21T10:18:33.6633238Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r53], {%r21803, %r21804, %r21805, %r21806}; 2026-02-21T10:18:33.6633417Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r54], {%r21759, %r21760, %r21761, %r21762}; 2026-02-21T10:18:33.6633598Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r55], {%r21775, %r21776, %r21777, %r21778}; 2026-02-21T10:18:33.6633777Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r56], {%r21791, %r21792, %r21793, %r21794}; 2026-02-21T10:18:33.6633956Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r57], {%r21807, %r21808, %r21809, %r21810}; 2026-02-21T10:18:33.6634019Z // begin inline asm 2026-02-21T10:18:33.6634101Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.6634160Z // end inline asm 2026-02-21T10:18:33.6634215Z bar.sync 0; 2026-02-21T10:18:33.6634338Z elect.sync %r21811|%p219, -1; 2026-02-21T10:18:33.6634411Z and.pred %p217, %p314, %p219; 2026-02-21T10:18:33.6634477Z or.b32 %r21744, %r19299, %r637; 2026-02-21T10:18:33.6634546Z // begin inline asm 2026-02-21T10:18:33.6634833Z @%p217 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd298, {%r21744, %r21745}], [%r21746]; 2026-02-21T10:18:33.6634894Z // end inline asm 2026-02-21T10:18:33.6634972Z cp.async.bulk.commit_group; 2026-02-21T10:18:33.6635048Z cp.async.bulk.wait_group.read 0; 2026-02-21T10:18:33.6635104Z bar.sync 0; 2026-02-21T10:18:33.6635313Z .loc 1 31 88 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:31:88 2026-02-21T10:18:33.6635385Z add.s32 %r31905, %r31905, 2; 2026-02-21T10:18:33.6635462Z setp.lt.s32 %p220, %r31905, %r32420; 2026-02-21T10:18:33.6635525Z @%p220 bra $L__BB0_2; 2026-02-21T10:18:33.6635626Z $L__BB0_11: // %.preheader202 2026-02-21T10:18:33.6635696Z setp.gt.s32 %p221, %r32420, %r3; 2026-02-21T10:18:33.6635757Z @%p221 bra $L__BB0_18; 2026-02-21T10:18:33.6635844Z // %bb.12: // %.lr.ph211 2026-02-21T10:18:33.6636045Z .loc 1 0 88 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:0:88 2026-02-21T10:18:33.6636110Z shl.b32 %r21813, %r31897, 4; 2026-02-21T10:18:33.6636177Z xor.b32 %r21815, %r21813, %r31898; 2026-02-21T10:18:33.6636244Z add.s32 %r75, %r29377, %r21815; 
2026-02-21T10:18:33.6636305Z xor.b32 %r21817, %r21815, 8; 2026-02-21T10:18:33.6636367Z add.s32 %r76, %r29377, %r21817; 2026-02-21T10:18:33.6636432Z and.b32 %r21819, %r31899, 6144; 2026-02-21T10:18:33.6636613Z and.b32 %r21821, %r31900, 896; 2026-02-21T10:18:33.6636679Z and.b32 %r21823, %r31901, 62; 2026-02-21T10:18:33.6636741Z or.b32 %r21824, %r21821, %r21823; 2026-02-21T10:18:33.6636807Z or.b32 %r21825, %r21824, %r21819; 2026-02-21T10:18:33.6636868Z add.s32 %r77, %r29377, %r21825; 2026-02-21T10:18:33.6636931Z xor.b32 %r21826, %r21825, 8; 2026-02-21T10:18:33.6636995Z add.s32 %r78, %r29377, %r21826; 2026-02-21T10:18:33.6637057Z xor.b32 %r21827, %r21825, 16; 2026-02-21T10:18:33.6637118Z add.s32 %r79, %r29377, %r21827; 2026-02-21T10:18:33.6637177Z xor.b32 %r21828, %r21825, 24; 2026-02-21T10:18:33.6637239Z add.s32 %r80, %r29377, %r21828; 2026-02-21T10:18:33.6637299Z xor.b32 %r21829, %r21825, 32; 2026-02-21T10:18:33.6637359Z add.s32 %r81, %r29377, %r21829; 2026-02-21T10:18:33.6637420Z xor.b32 %r21830, %r21825, 40; 2026-02-21T10:18:33.6637481Z add.s32 %r82, %r29377, %r21830; 2026-02-21T10:18:33.6637541Z xor.b32 %r21831, %r21825, 48; 2026-02-21T10:18:33.6637605Z add.s32 %r83, %r29377, %r21831; 2026-02-21T10:18:33.6637664Z xor.b32 %r21832, %r21825, 56; 2026-02-21T10:18:33.6637724Z add.s32 %r84, %r29377, %r21832; 2026-02-21T10:18:33.6637784Z add.s32 %r85, %r29377, %r31897; 2026-02-21T10:18:33.6637940Z xor.b32 %r21833, %r31897, 16; 2026-02-21T10:18:33.6638076Z add.s32 %r86, %r29377, %r21833; 2026-02-21T10:18:33.6638138Z xor.b32 %r21834, %r31897, 32; 2026-02-21T10:18:33.6638201Z add.s32 %r87, %r29377, %r21834; 2026-02-21T10:18:33.6638259Z xor.b32 %r21835, %r31897, 48; 2026-02-21T10:18:33.6638319Z add.s32 %r88, %r29377, %r21835; 2026-02-21T10:18:33.6638389Z xor.b32 %r21836, %r31897, 64; 2026-02-21T10:18:33.6638456Z add.s32 %r89, %r29377, %r21836; 2026-02-21T10:18:33.6638515Z xor.b32 %r21837, %r31897, 80; 2026-02-21T10:18:33.6638576Z add.s32 %r90, %r29377, %r21837; 
2026-02-21T10:18:33.6638637Z xor.b32 %r21838, %r31897, 96; 2026-02-21T10:18:33.6638697Z add.s32 %r91, %r29377, %r21838; 2026-02-21T10:18:33.6638757Z xor.b32 %r21839, %r31897, 112; 2026-02-21T10:18:33.6638819Z add.s32 %r92, %r29377, %r21839; 2026-02-21T10:18:33.6638881Z shl.b32 %r21840, %r31897, 7; 2026-02-21T10:18:33.6638944Z or.b32 %r21842, %r21840, %r31902; 2026-02-21T10:18:33.6639007Z add.s32 %r93, %r29377, %r21842; 2026-02-21T10:18:33.6639069Z xor.b32 %r21843, %r21842, 16; 2026-02-21T10:18:33.6639130Z add.s32 %r94, %r29377, %r21843; 2026-02-21T10:18:33.6639255Z xor.b32 %r21844, %r21842, 32; 2026-02-21T10:18:33.6639323Z add.s32 %r95, %r29377, %r21844; 2026-02-21T10:18:33.6639382Z xor.b32 %r21845, %r21842, 48; 2026-02-21T10:18:33.6639442Z add.s32 %r96, %r29377, %r21845; 2026-02-21T10:18:33.6639503Z xor.b32 %r21846, %r21842, 64; 2026-02-21T10:18:33.6639639Z add.s32 %r97, %r29377, %r21846; 2026-02-21T10:18:33.6639703Z xor.b32 %r21847, %r21842, 80; 2026-02-21T10:18:33.6639765Z add.s32 %r98, %r29377, %r21847; 2026-02-21T10:18:33.6639826Z xor.b32 %r21848, %r21842, 96; 2026-02-21T10:18:33.6639887Z add.s32 %r99, %r29377, %r21848; 2026-02-21T10:18:33.6639946Z xor.b32 %r21849, %r21842, 112; 2026-02-21T10:18:33.6640009Z add.s32 %r100, %r29377, %r21849; 2026-02-21T10:18:33.6640074Z bfe.u32 %r21850, %r29377, 4, 14; 2026-02-21T10:18:33.6640139Z cvt.u64.u32 %rd466, %r21850; 2026-02-21T10:18:33.6640221Z or.b64 %rd12, %rd466, 4611686293372403712; 2026-02-21T10:18:33.6640286Z add.s32 %r21851, %r29377, 32; 2026-02-21T10:18:33.6640348Z bfe.u32 %r21852, %r21851, 4, 14; 2026-02-21T10:18:33.6640410Z cvt.u64.u32 %rd467, %r21852; 2026-02-21T10:18:33.6640485Z or.b64 %rd13, %rd467, 4611686293372403712; 2026-02-21T10:18:33.6640548Z add.s32 %r21853, %r29377, 64; 2026-02-21T10:18:33.6640608Z bfe.u32 %r21854, %r21853, 4, 14; 2026-02-21T10:18:33.6640672Z cvt.u64.u32 %rd468, %r21854; 2026-02-21T10:18:33.6640743Z or.b64 %rd14, %rd468, 4611686293372403712; 2026-02-21T10:18:33.6640803Z add.s32 
%r21855, %r29377, 96; 2026-02-21T10:18:33.6640863Z bfe.u32 %r21856, %r21855, 4, 14; 2026-02-21T10:18:33.6640926Z cvt.u64.u32 %rd469, %r21856; 2026-02-21T10:18:33.6640996Z or.b64 %rd15, %rd469, 4611686293372403712; 2026-02-21T10:18:33.6641057Z add.s32 %r21857, %r29377, 16384; 2026-02-21T10:18:33.6641116Z bfe.u32 %r21858, %r21857, 4, 14; 2026-02-21T10:18:33.6641181Z cvt.u64.u32 %rd470, %r21858; 2026-02-21T10:18:33.6641250Z or.b64 %rd16, %rd470, 4611686293372403712; 2026-02-21T10:18:33.6641311Z add.s32 %r21859, %r29377, 16416; 2026-02-21T10:18:33.6641374Z bfe.u32 %r21860, %r21859, 4, 14; 2026-02-21T10:18:33.6641437Z cvt.u64.u32 %rd471, %r21860; 2026-02-21T10:18:33.6641508Z or.b64 %rd17, %rd471, 4611686293372403712; 2026-02-21T10:18:33.6641569Z add.s32 %r21861, %r29377, 16448; 2026-02-21T10:18:33.6641633Z bfe.u32 %r21862, %r21861, 4, 14; 2026-02-21T10:18:33.6641694Z cvt.u64.u32 %rd472, %r21862; 2026-02-21T10:18:33.6641763Z or.b64 %rd18, %rd472, 4611686293372403712; 2026-02-21T10:18:33.6641826Z add.s32 %r21863, %r29377, 16480; 2026-02-21T10:18:33.6641885Z bfe.u32 %r21864, %r21863, 4, 14; 2026-02-21T10:18:33.6641958Z cvt.u64.u32 %rd473, %r21864; 2026-02-21T10:18:33.6642031Z or.b64 %rd19, %rd473, 4611686293372403712; 2026-02-21T10:18:33.6642093Z and.b32 %r21866, %r31903, 1920; 2026-02-21T10:18:33.6642156Z or.b32 %r21868, %r21866, %r31902; 2026-02-21T10:18:33.6642286Z xor.b32 %r21869, %r21868, %r31904; 2026-02-21T10:18:33.6642392Z or.b32 %r21870, %r21869, %r21819; 2026-02-21T10:18:33.6642454Z add.s32 %r101, %r29377, %r21870; 2026-02-21T10:18:33.6642517Z add.s32 %r102, %r101, 16384; 2026-02-21T10:18:33.6642583Z add.s32 %r103, %r101, 8192; 2026-02-21T10:18:33.6642646Z add.s32 %r104, %r101, 24576; 2026-02-21T10:18:33.6642707Z xor.b32 %r21871, %r21870, 32; 2026-02-21T10:18:33.6642768Z add.s32 %r105, %r29377, %r21871; 2026-02-21T10:18:33.6642830Z add.s32 %r106, %r105, 16384; 2026-02-21T10:18:33.6642888Z add.s32 %r107, %r105, 8192; 2026-02-21T10:18:33.6642947Z add.s32 %r108, 
%r105, 24576; 2026-02-21T10:18:33.6643010Z xor.b32 %r21872, %r21870, 64; 2026-02-21T10:18:33.6643070Z add.s32 %r109, %r29377, %r21872; 2026-02-21T10:18:33.6643129Z add.s32 %r110, %r109, 16384; 2026-02-21T10:18:33.6643199Z add.s32 %r111, %r109, 8192; 2026-02-21T10:18:33.6643262Z add.s32 %r112, %r109, 24576; 2026-02-21T10:18:33.6643322Z xor.b32 %r21873, %r21870, 96; 2026-02-21T10:18:33.6643384Z add.s32 %r113, %r29377, %r21873; 2026-02-21T10:18:33.6643447Z add.s32 %r114, %r113, 16384; 2026-02-21T10:18:33.6643506Z add.s32 %r115, %r113, 8192; 2026-02-21T10:18:33.6643615Z add.s32 %r116, %r113, 24576; 2026-02-21T10:18:33.6643838Z .loc 1 31 88 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:31:88 2026-02-21T10:18:33.6643903Z cvt.u64.u32 %rd20, %r15; 2026-02-21T10:18:33.6644065Z $L__BB0_13: // =>This Loop Header: Depth=1 2026-02-21T10:18:33.6644167Z // Child Loop BB0_14 Depth 2 2026-02-21T10:18:33.6644269Z // Child Loop BB0_16 Depth 2 2026-02-21T10:18:33.6644477Z .loc 1 38 33 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:38:33 2026-02-21T10:18:33.6644539Z shr.u32 %r21875, %r32420, 9; 2026-02-21T10:18:33.6644606Z and.b32 %r21876, %r21875, 8388576; 2026-02-21T10:18:33.6644807Z .loc 1 39 39 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:39:39 2026-02-21T10:18:33.6644870Z sub.s32 %r21877, 10, %r21876; 2026-02-21T10:18:33.6645069Z .loc 1 41 51 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:41:51 2026-02-21T10:18:33.6645132Z div.u32 %r21878, %r32420, %r21877; 2026-02-21T10:18:33.6645328Z .loc 1 40 64 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:40:64 2026-02-21T10:18:33.6645400Z mul.lo.s32 %r21879, %r21878, %r21877; 2026-02-21T10:18:33.6645462Z sub.s32 %r21880, %r32420, %r21879; 2026-02-21T10:18:33.6645656Z .loc 1 40 30 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:40:30 2026-02-21T10:18:33.6645719Z add.s32 %r21881, %r21880, %r21876; 2026-02-21T10:18:33.6645926Z .loc 1 42 27 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:42:27 2026-02-21T10:18:33.6645989Z shl.b32 %r29378, %r21881, 7; 2026-02-21T10:18:33.6646187Z .loc 1 43 27 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:43:27 2026-02-21T10:18:33.6646254Z shl.b32 %r31825, %r21878, 7; 2026-02-21T10:18:33.6646569Z .loc 1 44 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:44:32 2026-02-21T10:18:33.6646634Z or.b32 %r21882, %r31825, %r6; 2026-02-21T10:18:33.6646697Z or.b32 %r21883, %r31825, %r7; 2026-02-21T10:18:33.6646760Z or.b32 %r21884, %r31825, %r8; 2026-02-21T10:18:33.6646821Z or.b32 %r21885, %r31825, %r9; 2026-02-21T10:18:33.6646884Z or.b32 %r21886, %r31825, %r10; 2026-02-21T10:18:33.6646948Z or.b32 %r21887, %r31825, %r11; 2026-02-21T10:18:33.6647009Z or.b32 %r21888, %r31825, %r12; 2026-02-21T10:18:33.6647069Z or.b32 %r21889, %r31825, %r13; 2026-02-21T10:18:33.6647272Z .loc 1 58 53 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:53 2026-02-21T10:18:33.6647332Z shl.b32 %r21890, %r21882, 13; 2026-02-21T10:18:33.6647481Z shl.b32 %r21891, %r21883, 13; 2026-02-21T10:18:33.6647602Z shl.b32 %r21892, %r21884, 13; 2026-02-21T10:18:33.6647661Z shl.b32 %r21893, %r21885, 13; 2026-02-21T10:18:33.6647721Z shl.b32 %r21894, %r21886, 13; 2026-02-21T10:18:33.6647779Z shl.b32 %r21895, %r21887, 13; 2026-02-21T10:18:33.6647839Z shl.b32 %r21896, %r21888, 13; 2026-02-21T10:18:33.6647901Z shl.b32 %r1159, %r21889, 13; 2026-02-21T10:18:33.6648103Z .loc 1 51 93 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:51:93 2026-02-21T10:18:33.6648178Z mad.wide.u32 %rd69, %r21890, 2, %rd85; 2026-02-21T10:18:33.6648249Z mad.wide.u32 %rd70, %r21891, 2, %rd85; 2026-02-21T10:18:33.6648317Z mad.wide.u32 %rd71, %r21892, 2, %rd85; 2026-02-21T10:18:33.6648384Z mad.wide.u32 %rd72, %r21893, 2, %rd85; 2026-02-21T10:18:33.6648454Z mad.wide.u32 %rd73, %r21894, 2, %rd85; 2026-02-21T10:18:33.6648519Z mad.wide.u32 %rd74, %r21895, 2, %rd85; 2026-02-21T10:18:33.6648585Z 
mad.wide.u32 %rd75, %r21896, 2, %rd85; 2026-02-21T10:18:33.6648657Z mad.wide.u32 %rd76, %r1159, 2, %rd85; 2026-02-21T10:18:33.6648721Z mov.b32 %r32421, 0f00000000; 2026-02-21T10:18:33.6648779Z mov.b64 %rd663, 0; 2026-02-21T10:18:33.6648916Z mov.b64 %rd662, -96; 2026-02-21T10:18:33.6648982Z mov.b32 %r32422, %r32421; 2026-02-21T10:18:33.6649042Z mov.b32 %r32423, %r32421; 2026-02-21T10:18:33.6649100Z mov.b32 %r32424, %r32421; 2026-02-21T10:18:33.6649160Z mov.b32 %r32425, %r32421; 2026-02-21T10:18:33.6649275Z mov.b32 %r32426, %r32421; 2026-02-21T10:18:33.6649335Z mov.b32 %r32427, %r32421; 2026-02-21T10:18:33.6649397Z mov.b32 %r32428, %r32421; 2026-02-21T10:18:33.6649455Z mov.b32 %r32429, %r32421; 2026-02-21T10:18:33.6649514Z mov.b32 %r32430, %r32421; 2026-02-21T10:18:33.6649572Z mov.b32 %r32431, %r32421; 2026-02-21T10:18:33.6649632Z mov.b32 %r32432, %r32421; 2026-02-21T10:18:33.6649690Z mov.b32 %r32433, %r32421; 2026-02-21T10:18:33.6649749Z mov.b32 %r32434, %r32421; 2026-02-21T10:18:33.6649811Z mov.b32 %r32435, %r32421; 2026-02-21T10:18:33.6649869Z mov.b32 %r32436, %r32421; 2026-02-21T10:18:33.6649930Z mov.b32 %r32437, %r32421; 2026-02-21T10:18:33.6649988Z mov.b32 %r32438, %r32421; 2026-02-21T10:18:33.6650050Z mov.b32 %r32439, %r32421; 2026-02-21T10:18:33.6650120Z mov.b32 %r32440, %r32421; 2026-02-21T10:18:33.6650180Z mov.b32 %r32441, %r32421; 2026-02-21T10:18:33.6650240Z mov.b32 %r32442, %r32421; 2026-02-21T10:18:33.6650299Z mov.b32 %r32443, %r32421; 2026-02-21T10:18:33.6650359Z mov.b32 %r32444, %r32421; 2026-02-21T10:18:33.6650416Z mov.b32 %r32445, %r32421; 2026-02-21T10:18:33.6650475Z mov.b32 %r32446, %r32421; 2026-02-21T10:18:33.6650534Z mov.b32 %r32447, %r32421; 2026-02-21T10:18:33.6650593Z mov.b32 %r32448, %r32421; 2026-02-21T10:18:33.6650651Z mov.b32 %r32449, %r32421; 2026-02-21T10:18:33.6650709Z mov.b32 %r32450, %r32421; 2026-02-21T10:18:33.6650766Z mov.b32 %r32451, %r32421; 2026-02-21T10:18:33.6650825Z mov.b32 %r32452, %r32421; 2026-02-21T10:18:33.6650886Z 
mov.b32 %r32453, %r32421; 2026-02-21T10:18:33.6650944Z mov.b32 %r32454, %r32421; 2026-02-21T10:18:33.6651004Z mov.b32 %r32455, %r32421; 2026-02-21T10:18:33.6651065Z mov.b32 %r32456, %r32421; 2026-02-21T10:18:33.6651125Z mov.b32 %r32457, %r32421; 2026-02-21T10:18:33.6651183Z mov.b32 %r32458, %r32421; 2026-02-21T10:18:33.6651242Z mov.b32 %r32459, %r32421; 2026-02-21T10:18:33.6651301Z mov.b32 %r32460, %r32421; 2026-02-21T10:18:33.6651358Z mov.b32 %r32461, %r32421; 2026-02-21T10:18:33.6651418Z mov.b32 %r32462, %r32421; 2026-02-21T10:18:33.6651478Z mov.b32 %r32463, %r32421; 2026-02-21T10:18:33.6651536Z mov.b32 %r32464, %r32421; 2026-02-21T10:18:33.6651593Z mov.b32 %r32465, %r32421; 2026-02-21T10:18:33.6651653Z mov.b32 %r32466, %r32421; 2026-02-21T10:18:33.6651711Z mov.b32 %r32467, %r32421; 2026-02-21T10:18:33.6651769Z mov.b32 %r32468, %r32421; 2026-02-21T10:18:33.6651827Z mov.b32 %r32469, %r32421; 2026-02-21T10:18:33.6651887Z mov.b32 %r32470, %r32421; 2026-02-21T10:18:33.6652001Z mov.b32 %r32471, %r32421; 2026-02-21T10:18:33.6652059Z mov.b32 %r32472, %r32421; 2026-02-21T10:18:33.6652160Z mov.b32 %r32473, %r32421; 2026-02-21T10:18:33.6652218Z mov.b32 %r32474, %r32421; 2026-02-21T10:18:33.6652278Z mov.b32 %r32475, %r32421; 2026-02-21T10:18:33.6652336Z mov.b32 %r32476, %r32421; 2026-02-21T10:18:33.6652396Z mov.b32 %r32477, %r32421; 2026-02-21T10:18:33.6652453Z mov.b32 %r32478, %r32421; 2026-02-21T10:18:33.6652511Z mov.b32 %r32479, %r32421; 2026-02-21T10:18:33.6652571Z mov.b32 %r32480, %r32421; 2026-02-21T10:18:33.6652629Z mov.b32 %r32481, %r32421; 2026-02-21T10:18:33.6652687Z mov.b32 %r32482, %r32421; 2026-02-21T10:18:33.6652744Z mov.b32 %r32483, %r32421; 2026-02-21T10:18:33.6652803Z mov.b32 %r32484, %r32421; 2026-02-21T10:18:33.6652859Z mov.b32 %r32485, %r32421; 2026-02-21T10:18:33.6652918Z mov.b32 %r32486, %r32421; 2026-02-21T10:18:33.6652978Z mov.b32 %r32487, %r32421; 2026-02-21T10:18:33.6653036Z mov.b32 %r32488, %r32421; 2026-02-21T10:18:33.6653094Z mov.b32 %r32489, 
%r32421; 2026-02-21T10:18:33.6653152Z mov.b32 %r32490, %r32421; 2026-02-21T10:18:33.6653215Z mov.b32 %r32491, %r32421; 2026-02-21T10:18:33.6653321Z mov.b32 %r32492, %r32421; 2026-02-21T10:18:33.6653382Z mov.b32 %r32493, %r32421; 2026-02-21T10:18:33.6653455Z mov.b32 %r32494, %r32421; 2026-02-21T10:18:33.6653516Z mov.b32 %r32495, %r32421; 2026-02-21T10:18:33.6653575Z mov.b32 %r32496, %r32421; 2026-02-21T10:18:33.6653633Z mov.b32 %r32497, %r32421; 2026-02-21T10:18:33.6653738Z mov.b32 %r32498, %r32421; 2026-02-21T10:18:33.6653798Z mov.b32 %r32499, %r32421; 2026-02-21T10:18:33.6653856Z mov.b32 %r32500, %r32421; 2026-02-21T10:18:33.6653916Z mov.b32 %r32501, %r32421; 2026-02-21T10:18:33.6653974Z mov.b32 %r32502, %r32421; 2026-02-21T10:18:33.6654031Z mov.b32 %r32503, %r32421; 2026-02-21T10:18:33.6654091Z mov.b32 %r32504, %r32421; 2026-02-21T10:18:33.6654152Z mov.b32 %r32505, %r32421; 2026-02-21T10:18:33.6654210Z mov.b32 %r32506, %r32421; 2026-02-21T10:18:33.6654271Z mov.b32 %r32507, %r32421; 2026-02-21T10:18:33.6654331Z mov.b32 %r32508, %r32421; 2026-02-21T10:18:33.6654390Z mov.b32 %r32509, %r32421; 2026-02-21T10:18:33.6654457Z mov.b32 %r32510, %r32421; 2026-02-21T10:18:33.6654518Z mov.b32 %r32511, %r32421; 2026-02-21T10:18:33.6654577Z mov.b32 %r32512, %r32421; 2026-02-21T10:18:33.6654633Z mov.b32 %r32513, %r32421; 2026-02-21T10:18:33.6654691Z mov.b32 %r32514, %r32421; 2026-02-21T10:18:33.6654751Z mov.b32 %r32515, %r32421; 2026-02-21T10:18:33.6654811Z mov.b32 %r32516, %r32421; 2026-02-21T10:18:33.6654869Z mov.b32 %r32517, %r32421; 2026-02-21T10:18:33.6654928Z mov.b32 %r32518, %r32421; 2026-02-21T10:18:33.6654987Z mov.b32 %r32519, %r32421; 2026-02-21T10:18:33.6655044Z mov.b32 %r32520, %r32421; 2026-02-21T10:18:33.6655102Z mov.b32 %r32521, %r32421; 2026-02-21T10:18:33.6655162Z mov.b32 %r32522, %r32421; 2026-02-21T10:18:33.6655219Z mov.b32 %r32523, %r32421; 2026-02-21T10:18:33.6655279Z mov.b32 %r32524, %r32421; 2026-02-21T10:18:33.6655354Z mov.b32 %r32525, %r32421; 
2026-02-21T10:18:33.6655413Z mov.b32 %r32526, %r32421; 2026-02-21T10:18:33.6655473Z mov.b32 %r32527, %r32421; 2026-02-21T10:18:33.6655533Z mov.b32 %r32528, %r32421; 2026-02-21T10:18:33.6655594Z mov.b32 %r32529, %r32421; 2026-02-21T10:18:33.6655653Z mov.b32 %r32530, %r32421; 2026-02-21T10:18:33.6655712Z mov.b32 %r32531, %r32421; 2026-02-21T10:18:33.6655773Z mov.b32 %r32532, %r32421; 2026-02-21T10:18:33.6655831Z mov.b32 %r32533, %r32421; 2026-02-21T10:18:33.6655890Z mov.b32 %r32534, %r32421; 2026-02-21T10:18:33.6655948Z mov.b32 %r32535, %r32421; 2026-02-21T10:18:33.6656009Z mov.b32 %r32536, %r32421; 2026-02-21T10:18:33.6656068Z mov.b32 %r32537, %r32421; 2026-02-21T10:18:33.6656126Z mov.b32 %r32538, %r32421; 2026-02-21T10:18:33.6656186Z mov.b32 %r32539, %r32421; 2026-02-21T10:18:33.6656243Z mov.b32 %r32540, %r32421; 2026-02-21T10:18:33.6656300Z mov.b32 %r32541, %r32421; 2026-02-21T10:18:33.6656357Z mov.b32 %r32542, %r32421; 2026-02-21T10:18:33.6656638Z mov.b32 %r32543, %r32421; 2026-02-21T10:18:33.6656697Z mov.b32 %r32544, %r32421; 2026-02-21T10:18:33.6656829Z mov.b32 %r32545, %r32421; 2026-02-21T10:18:33.6656892Z mov.b32 %r32546, %r32421; 2026-02-21T10:18:33.6656950Z mov.b32 %r32547, %r32421; 2026-02-21T10:18:33.6657008Z mov.b32 %r32548, %r32421; 2026-02-21T10:18:33.6657125Z $L__BB0_14: // Parent Loop BB0_13 Depth=1 2026-02-21T10:18:33.6657236Z // => This Inner Loop Header: Depth=2 2026-02-21T10:18:33.6657440Z .loc 1 55 22 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:55:22 2026-02-21T10:18:33.6657506Z shl.b64 %rd599, %rd663, 1; 2026-02-21T10:18:33.6657704Z .loc 1 57 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:57:25 2026-02-21T10:18:33.6657767Z or.b64 %rd600, %rd599, %rd20; 2026-02-21T10:18:33.6657963Z .loc 1 58 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:32 2026-02-21T10:18:33.6658030Z shl.b64 %rd601, %rd600, 1; 2026-02-21T10:18:33.6658098Z add.s64 %rd477, %rd69, %rd601; 2026-02-21T10:18:33.6658223Z add.s64 %rd480, 
%rd70, %rd601; 2026-02-21T10:18:33.6658289Z add.s64 %rd483, %rd71, %rd601; 2026-02-21T10:18:33.6658349Z add.s64 %rd486, %rd72, %rd601; 2026-02-21T10:18:33.6658410Z add.s64 %rd489, %rd73, %rd601; 2026-02-21T10:18:33.6658470Z add.s64 %rd492, %rd74, %rd601; 2026-02-21T10:18:33.6658601Z add.s64 %rd495, %rd75, %rd601; 2026-02-21T10:18:33.6658665Z add.s64 %rd498, %rd76, %rd601; 2026-02-21T10:18:33.6658864Z .loc 1 58 80 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:80 2026-02-21T10:18:33.6658928Z // begin inline asm 2026-02-21T10:18:33.6658988Z mov.u64 %rd476, 0x0; 2026-02-21T10:18:33.6659119Z createpolicy.fractional.L2::evict_first.b64 %rd476, 1.0; 2026-02-21T10:18:33.6659177Z // end inline asm 2026-02-21T10:18:33.6659238Z // begin inline asm 2026-02-21T10:18:33.6659300Z mov.u32 %r21897, 0x0; 2026-02-21T10:18:33.6659357Z mov.u32 %r21898, 0x0; 2026-02-21T10:18:33.6659418Z mov.u32 %r21899, 0x0; 2026-02-21T10:18:33.6659476Z mov.u32 %r21900, 0x0; 2026-02-21T10:18:33.6659719Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r21897, %r21898, %r21899, %r21900 }, [ %rd477 + 0 ], %rd476; 2026-02-21T10:18:33.6659779Z // end inline asm 2026-02-21T10:18:33.6659836Z // begin inline asm 2026-02-21T10:18:33.6659895Z mov.u64 %rd479, 0x0; 2026-02-21T10:18:33.6660020Z createpolicy.fractional.L2::evict_first.b64 %rd479, 1.0; 2026-02-21T10:18:33.6660093Z // end inline asm 2026-02-21T10:18:33.6660154Z // begin inline asm 2026-02-21T10:18:33.6660213Z mov.u32 %r21901, 0x0; 2026-02-21T10:18:33.6660273Z mov.u32 %r21902, 0x0; 2026-02-21T10:18:33.6660331Z mov.u32 %r21903, 0x0; 2026-02-21T10:18:33.6660389Z mov.u32 %r21904, 0x0; 2026-02-21T10:18:33.6660622Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r21901, %r21902, %r21903, %r21904 }, [ %rd480 + 0 ], %rd479; 2026-02-21T10:18:33.6660682Z // end inline asm 2026-02-21T10:18:33.6660741Z // begin inline asm 2026-02-21T10:18:33.6660800Z mov.u64 %rd482, 0x0; 2026-02-21T10:18:33.6660920Z createpolicy.fractional.L2::evict_first.b64 
%rd482, 1.0; 2026-02-21T10:18:33.6660976Z // end inline asm 2026-02-21T10:18:33.6661034Z // begin inline asm 2026-02-21T10:18:33.6661092Z mov.u32 %r21905, 0x0; 2026-02-21T10:18:33.6661158Z mov.u32 %r21906, 0x0; 2026-02-21T10:18:33.6661221Z mov.u32 %r21907, 0x0; 2026-02-21T10:18:33.6661282Z mov.u32 %r21908, 0x0; 2026-02-21T10:18:33.6661517Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r21905, %r21906, %r21907, %r21908 }, [ %rd483 + 0 ], %rd482; 2026-02-21T10:18:33.6661576Z // end inline asm 2026-02-21T10:18:33.6661635Z // begin inline asm 2026-02-21T10:18:33.6661697Z mov.u64 %rd485, 0x0; 2026-02-21T10:18:33.6661826Z createpolicy.fractional.L2::evict_first.b64 %rd485, 1.0; 2026-02-21T10:18:33.6661884Z // end inline asm 2026-02-21T10:18:33.6661946Z // begin inline asm 2026-02-21T10:18:33.6662081Z mov.u32 %r21909, 0x0; 2026-02-21T10:18:33.6662140Z mov.u32 %r21910, 0x0; 2026-02-21T10:18:33.6662247Z mov.u32 %r21911, 0x0; 2026-02-21T10:18:33.6662307Z mov.u32 %r21912, 0x0; 2026-02-21T10:18:33.6662537Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r21909, %r21910, %r21911, %r21912 }, [ %rd486 + 0 ], %rd485; 2026-02-21T10:18:33.6662595Z // end inline asm 2026-02-21T10:18:33.6662654Z // begin inline asm 2026-02-21T10:18:33.6662714Z mov.u64 %rd488, 0x0; 2026-02-21T10:18:33.6662837Z createpolicy.fractional.L2::evict_first.b64 %rd488, 1.0; 2026-02-21T10:18:33.6662899Z // end inline asm 2026-02-21T10:18:33.6662957Z // begin inline asm 2026-02-21T10:18:33.6663015Z mov.u32 %r21913, 0x0; 2026-02-21T10:18:33.6663071Z mov.u32 %r21914, 0x0; 2026-02-21T10:18:33.6663131Z mov.u32 %r21915, 0x0; 2026-02-21T10:18:33.6663187Z mov.u32 %r21916, 0x0; 2026-02-21T10:18:33.6663411Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r21913, %r21914, %r21915, %r21916 }, [ %rd489 + 0 ], %rd488; 2026-02-21T10:18:33.6663474Z // end inline asm 2026-02-21T10:18:33.6663531Z // begin inline asm 2026-02-21T10:18:33.6663592Z mov.u64 %rd491, 0x0; 2026-02-21T10:18:33.6663761Z 
createpolicy.fractional.L2::evict_first.b64 %rd491, 1.0; 2026-02-21T10:18:33.6663824Z // end inline asm 2026-02-21T10:18:33.6663882Z // begin inline asm 2026-02-21T10:18:33.6663940Z mov.u32 %r21917, 0x0; 2026-02-21T10:18:33.6663999Z mov.u32 %r21918, 0x0; 2026-02-21T10:18:33.6664056Z mov.u32 %r21919, 0x0; 2026-02-21T10:18:33.6664154Z mov.u32 %r21920, 0x0; 2026-02-21T10:18:33.6664385Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r21917, %r21918, %r21919, %r21920 }, [ %rd492 + 0 ], %rd491; 2026-02-21T10:18:33.6664443Z // end inline asm 2026-02-21T10:18:33.6664501Z // begin inline asm 2026-02-21T10:18:33.6664567Z mov.u64 %rd494, 0x0; 2026-02-21T10:18:33.6664697Z createpolicy.fractional.L2::evict_first.b64 %rd494, 1.0; 2026-02-21T10:18:33.6664758Z // end inline asm 2026-02-21T10:18:33.6664816Z // begin inline asm 2026-02-21T10:18:33.6664878Z mov.u32 %r21921, 0x0; 2026-02-21T10:18:33.6664935Z mov.u32 %r21922, 0x0; 2026-02-21T10:18:33.6664993Z mov.u32 %r21923, 0x0; 2026-02-21T10:18:33.6665051Z mov.u32 %r21924, 0x0; 2026-02-21T10:18:33.6665279Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r21921, %r21922, %r21923, %r21924 }, [ %rd495 + 0 ], %rd494; 2026-02-21T10:18:33.6665338Z // end inline asm 2026-02-21T10:18:33.6665397Z // begin inline asm 2026-02-21T10:18:33.6665458Z mov.u64 %rd497, 0x0; 2026-02-21T10:18:33.6665578Z createpolicy.fractional.L2::evict_first.b64 %rd497, 1.0; 2026-02-21T10:18:33.6665638Z // end inline asm 2026-02-21T10:18:33.6665698Z // begin inline asm 2026-02-21T10:18:33.6665755Z mov.u32 %r21925, 0x0; 2026-02-21T10:18:33.6665812Z mov.u32 %r21926, 0x0; 2026-02-21T10:18:33.6665869Z mov.u32 %r21927, 0x0; 2026-02-21T10:18:33.6665929Z mov.u32 %r21928, 0x0; 2026-02-21T10:18:33.6666153Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r21925, %r21926, %r21927, %r21928 }, [ %rd498 + 0 ], %rd497; 2026-02-21T10:18:33.6666213Z // end inline asm 2026-02-21T10:18:33.6666425Z .loc 1 62 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:62:32 
2026-02-21T10:18:33.6666609Z bar.sync 0; 2026-02-21T10:18:33.6666700Z st.shared.v2.b32 [%r75], {%r21897, %r21898}; 2026-02-21T10:18:33.6666794Z st.shared.v2.b32 [%r75+2048], {%r21901, %r21902}; 2026-02-21T10:18:33.6666882Z st.shared.v2.b32 [%r75+4096], {%r21905, %r21906}; 2026-02-21T10:18:33.6666971Z st.shared.v2.b32 [%r75+6144], {%r21909, %r21910}; 2026-02-21T10:18:33.6667056Z st.shared.v2.b32 [%r75+8192], {%r21913, %r21914}; 2026-02-21T10:18:33.6667152Z st.shared.v2.b32 [%r75+10240], {%r21917, %r21918}; 2026-02-21T10:18:33.6667240Z st.shared.v2.b32 [%r75+12288], {%r21921, %r21922}; 2026-02-21T10:18:33.6667324Z st.shared.v2.b32 [%r75+14336], {%r21925, %r21926}; 2026-02-21T10:18:33.6667407Z st.shared.v2.b32 [%r76], {%r21899, %r21900}; 2026-02-21T10:18:33.6667494Z st.shared.v2.b32 [%r76+2048], {%r21903, %r21904}; 2026-02-21T10:18:33.6667670Z st.shared.v2.b32 [%r76+4096], {%r21907, %r21908}; 2026-02-21T10:18:33.6667813Z st.shared.v2.b32 [%r76+6144], {%r21911, %r21912}; 2026-02-21T10:18:33.6667899Z st.shared.v2.b32 [%r76+8192], {%r21915, %r21916}; 2026-02-21T10:18:33.6667986Z st.shared.v2.b32 [%r76+10240], {%r21919, %r21920}; 2026-02-21T10:18:33.6668071Z st.shared.v2.b32 [%r76+12288], {%r21923, %r21924}; 2026-02-21T10:18:33.6668158Z st.shared.v2.b32 [%r76+14336], {%r21927, %r21928}; 2026-02-21T10:18:33.6668216Z bar.sync 0; 2026-02-21T10:18:33.6668287Z ld.shared.b16 %rs1793, [%r77]; 2026-02-21T10:18:33.6668362Z ld.shared.b16 %rs1794, [%r77+1024]; 2026-02-21T10:18:33.6668528Z ld.shared.b16 %rs1795, [%r77+64]; 2026-02-21T10:18:33.6668600Z ld.shared.b16 %rs1796, [%r77+1088]; 2026-02-21T10:18:33.6668671Z ld.shared.b16 %rs1797, [%r77+8192]; 2026-02-21T10:18:33.6668737Z ld.shared.b16 %rs1798, [%r77+9216]; 2026-02-21T10:18:33.6668802Z ld.shared.b16 %rs1799, [%r77+8256]; 2026-02-21T10:18:33.6668866Z ld.shared.b16 %rs1800, [%r77+9280]; 2026-02-21T10:18:33.6668937Z ld.shared.b16 %rs1801, [%r78]; 2026-02-21T10:18:33.6669004Z ld.shared.b16 %rs1802, [%r78+1024]; 
2026-02-21T10:18:33.6669140Z ld.shared.b16 %rs1803, [%r78+64]; 2026-02-21T10:18:33.6669211Z ld.shared.b16 %rs1804, [%r78+1088]; 2026-02-21T10:18:33.6669277Z ld.shared.b16 %rs1805, [%r78+8192]; 2026-02-21T10:18:33.6669341Z ld.shared.b16 %rs1806, [%r78+9216]; 2026-02-21T10:18:33.6669406Z ld.shared.b16 %rs1807, [%r78+8256]; 2026-02-21T10:18:33.6669532Z ld.shared.b16 %rs1808, [%r78+9280]; 2026-02-21T10:18:33.6669600Z ld.shared.b16 %rs1809, [%r79]; 2026-02-21T10:18:33.6669664Z ld.shared.b16 %rs1810, [%r79+1024]; 2026-02-21T10:18:33.6669733Z ld.shared.b16 %rs1811, [%r79+64]; 2026-02-21T10:18:33.6669797Z ld.shared.b16 %rs1812, [%r79+1088]; 2026-02-21T10:18:33.6669860Z ld.shared.b16 %rs1813, [%r79+8192]; 2026-02-21T10:18:33.6669926Z ld.shared.b16 %rs1814, [%r79+9216]; 2026-02-21T10:18:33.6669991Z ld.shared.b16 %rs1815, [%r79+8256]; 2026-02-21T10:18:33.6670057Z ld.shared.b16 %rs1816, [%r79+9280]; 2026-02-21T10:18:33.6670125Z ld.shared.b16 %rs1817, [%r80]; 2026-02-21T10:18:33.6670206Z ld.shared.b16 %rs1818, [%r80+1024]; 2026-02-21T10:18:33.6670274Z ld.shared.b16 %rs1819, [%r80+64]; 2026-02-21T10:18:33.6670341Z ld.shared.b16 %rs1820, [%r80+1088]; 2026-02-21T10:18:33.6670408Z ld.shared.b16 %rs1821, [%r80+8192]; 2026-02-21T10:18:33.6670472Z ld.shared.b16 %rs1822, [%r80+9216]; 2026-02-21T10:18:33.6670539Z ld.shared.b16 %rs1823, [%r80+8256]; 2026-02-21T10:18:33.6670602Z ld.shared.b16 %rs1824, [%r80+9280]; 2026-02-21T10:18:33.6670670Z ld.shared.b16 %rs1825, [%r81]; 2026-02-21T10:18:33.6670735Z ld.shared.b16 %rs1826, [%r81+1024]; 2026-02-21T10:18:33.6670800Z ld.shared.b16 %rs1827, [%r81+64]; 2026-02-21T10:18:33.6670868Z ld.shared.b16 %rs1828, [%r81+1088]; 2026-02-21T10:18:33.6670932Z ld.shared.b16 %rs1829, [%r81+8192]; 2026-02-21T10:18:33.6670996Z ld.shared.b16 %rs1830, [%r81+9216]; 2026-02-21T10:18:33.6671061Z ld.shared.b16 %rs1831, [%r81+8256]; 2026-02-21T10:18:33.6671128Z ld.shared.b16 %rs1832, [%r81+9280]; 2026-02-21T10:18:33.6671194Z ld.shared.b16 %rs1833, [%r82]; 
2026-02-21T10:18:33.6671260Z ld.shared.b16 %rs1834, [%r82+1024]; 2026-02-21T10:18:33.6671330Z ld.shared.b16 %rs1835, [%r82+64]; 2026-02-21T10:18:33.6671394Z ld.shared.b16 %rs1836, [%r82+1088]; 2026-02-21T10:18:33.6671459Z ld.shared.b16 %rs1837, [%r82+8192]; 2026-02-21T10:18:33.6671524Z ld.shared.b16 %rs1838, [%r82+9216]; 2026-02-21T10:18:33.6671589Z ld.shared.b16 %rs1839, [%r82+8256]; 2026-02-21T10:18:33.6671654Z ld.shared.b16 %rs1840, [%r82+9280]; 2026-02-21T10:18:33.6671717Z ld.shared.b16 %rs1841, [%r83]; 2026-02-21T10:18:33.6671784Z ld.shared.b16 %rs1842, [%r83+1024]; 2026-02-21T10:18:33.6671848Z ld.shared.b16 %rs1843, [%r83+64]; 2026-02-21T10:18:33.6671911Z ld.shared.b16 %rs1844, [%r83+1088]; 2026-02-21T10:18:33.6671981Z ld.shared.b16 %rs1845, [%r83+8192]; 2026-02-21T10:18:33.6672044Z ld.shared.b16 %rs1846, [%r83+9216]; 2026-02-21T10:18:33.6672166Z ld.shared.b16 %rs1847, [%r83+8256]; 2026-02-21T10:18:33.6672276Z ld.shared.b16 %rs1848, [%r83+9280]; 2026-02-21T10:18:33.6672341Z ld.shared.b16 %rs1849, [%r84]; 2026-02-21T10:18:33.6672408Z ld.shared.b16 %rs1850, [%r84+1024]; 2026-02-21T10:18:33.6672471Z ld.shared.b16 %rs1851, [%r84+64]; 2026-02-21T10:18:33.6672538Z ld.shared.b16 %rs1852, [%r84+1088]; 2026-02-21T10:18:33.6672602Z ld.shared.b16 %rs1853, [%r84+8192]; 2026-02-21T10:18:33.6672667Z ld.shared.b16 %rs1854, [%r84+9216]; 2026-02-21T10:18:33.6672734Z ld.shared.b16 %rs1855, [%r84+8256]; 2026-02-21T10:18:33.6672798Z ld.shared.b16 %rs1856, [%r84+9280]; 2026-02-21T10:18:33.6672861Z cvt.f32.bf16 %r22066, %rs1793; 2026-02-21T10:18:33.6672922Z cvt.f32.bf16 %r22067, %rs1794; 2026-02-21T10:18:33.6672988Z cvt.f32.bf16 %r22068, %rs1801; 2026-02-21T10:18:33.6673050Z cvt.f32.bf16 %r22069, %rs1802; 2026-02-21T10:18:33.6673111Z cvt.f32.bf16 %r22198, %rs1809; 2026-02-21T10:18:33.6673187Z cvt.f32.bf16 %r22199, %rs1810; 2026-02-21T10:18:33.6673253Z cvt.f32.bf16 %r22200, %rs1817; 2026-02-21T10:18:33.6673316Z cvt.f32.bf16 %r22201, %rs1818; 2026-02-21T10:18:33.6673376Z 
cvt.f32.bf16 %r22330, %rs1825; 2026-02-21T10:18:33.6673490Z cvt.f32.bf16 %r22331, %rs1826; 2026-02-21T10:18:33.6673553Z cvt.f32.bf16 %r22332, %rs1833; 2026-02-21T10:18:33.6673612Z cvt.f32.bf16 %r22333, %rs1834; 2026-02-21T10:18:33.6673676Z cvt.f32.bf16 %r22462, %rs1841; 2026-02-21T10:18:33.6673797Z cvt.f32.bf16 %r22463, %rs1842; 2026-02-21T10:18:33.6673860Z cvt.f32.bf16 %r22464, %rs1849; 2026-02-21T10:18:33.6673923Z cvt.f32.bf16 %r22465, %rs1850; 2026-02-21T10:18:33.6673985Z cvt.f32.bf16 %r22594, %rs1795; 2026-02-21T10:18:33.6674044Z cvt.f32.bf16 %r22595, %rs1796; 2026-02-21T10:18:33.6674104Z cvt.f32.bf16 %r22596, %rs1803; 2026-02-21T10:18:33.6674167Z cvt.f32.bf16 %r22597, %rs1804; 2026-02-21T10:18:33.6674228Z cvt.f32.bf16 %r22726, %rs1811; 2026-02-21T10:18:33.6674289Z cvt.f32.bf16 %r22727, %rs1812; 2026-02-21T10:18:33.6674352Z cvt.f32.bf16 %r22728, %rs1819; 2026-02-21T10:18:33.6674414Z cvt.f32.bf16 %r22729, %rs1820; 2026-02-21T10:18:33.6674477Z cvt.f32.bf16 %r22858, %rs1827; 2026-02-21T10:18:33.6674547Z cvt.f32.bf16 %r22859, %rs1828; 2026-02-21T10:18:33.6674619Z cvt.f32.bf16 %r22860, %rs1835; 2026-02-21T10:18:33.6674685Z cvt.f32.bf16 %r22861, %rs1836; 2026-02-21T10:18:33.6674746Z cvt.f32.bf16 %r22990, %rs1843; 2026-02-21T10:18:33.6674809Z cvt.f32.bf16 %r22991, %rs1844; 2026-02-21T10:18:33.6674872Z cvt.f32.bf16 %r22992, %rs1851; 2026-02-21T10:18:33.6674934Z cvt.f32.bf16 %r22993, %rs1852; 2026-02-21T10:18:33.6674995Z cvt.f32.bf16 %r23122, %rs1797; 2026-02-21T10:18:33.6675059Z cvt.f32.bf16 %r23123, %rs1798; 2026-02-21T10:18:33.6675119Z cvt.f32.bf16 %r23124, %rs1805; 2026-02-21T10:18:33.6675180Z cvt.f32.bf16 %r23125, %rs1806; 2026-02-21T10:18:33.6675244Z cvt.f32.bf16 %r23254, %rs1813; 2026-02-21T10:18:33.6675304Z cvt.f32.bf16 %r23255, %rs1814; 2026-02-21T10:18:33.6675365Z cvt.f32.bf16 %r23256, %rs1821; 2026-02-21T10:18:33.6675428Z cvt.f32.bf16 %r23257, %rs1822; 2026-02-21T10:18:33.6675492Z cvt.f32.bf16 %r23386, %rs1829; 2026-02-21T10:18:33.6675552Z cvt.f32.bf16 
%r23387, %rs1830; 2026-02-21T10:18:33.6675615Z cvt.f32.bf16 %r23388, %rs1837; 2026-02-21T10:18:33.6675680Z cvt.f32.bf16 %r23389, %rs1838; 2026-02-21T10:18:33.6675741Z cvt.f32.bf16 %r23518, %rs1845; 2026-02-21T10:18:33.6675801Z cvt.f32.bf16 %r23519, %rs1846; 2026-02-21T10:18:33.6675866Z cvt.f32.bf16 %r23520, %rs1853; 2026-02-21T10:18:33.6675926Z cvt.f32.bf16 %r23521, %rs1854; 2026-02-21T10:18:33.6675987Z cvt.f32.bf16 %r23650, %rs1799; 2026-02-21T10:18:33.6676048Z cvt.f32.bf16 %r23651, %rs1800; 2026-02-21T10:18:33.6676110Z cvt.f32.bf16 %r23652, %rs1807; 2026-02-21T10:18:33.6676170Z cvt.f32.bf16 %r23653, %rs1808; 2026-02-21T10:18:33.6676231Z cvt.f32.bf16 %r23782, %rs1815; 2026-02-21T10:18:33.6676292Z cvt.f32.bf16 %r23783, %rs1816; 2026-02-21T10:18:33.6676353Z cvt.f32.bf16 %r23784, %rs1823; 2026-02-21T10:18:33.6676591Z cvt.f32.bf16 %r23785, %rs1824; 2026-02-21T10:18:33.6676656Z cvt.f32.bf16 %r23914, %rs1831; 2026-02-21T10:18:33.6676801Z cvt.f32.bf16 %r23915, %rs1832; 2026-02-21T10:18:33.6676867Z cvt.f32.bf16 %r23916, %rs1839; 2026-02-21T10:18:33.6676928Z cvt.f32.bf16 %r23917, %rs1840; 2026-02-21T10:18:33.6676991Z cvt.f32.bf16 %r24046, %rs1847; 2026-02-21T10:18:33.6677051Z cvt.f32.bf16 %r24047, %rs1848; 2026-02-21T10:18:33.6677111Z cvt.f32.bf16 %r24048, %rs1855; 2026-02-21T10:18:33.6677172Z cvt.f32.bf16 %r24049, %rs1856; 2026-02-21T10:18:33.6677386Z .loc 1 64 33 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:64:33 2026-02-21T10:18:33.6677444Z bar.sync 0; 2026-02-21T10:18:33.6677510Z add.s32 %r29375, %r29377, 4096; 2026-02-21T10:18:33.6677571Z // begin inline asm 2026-02-21T10:18:33.6677673Z @%p222 mbarrier.init.shared::cta.b64 [%r29375], 1; 2026-02-21T10:18:33.6677732Z // end inline asm 2026-02-21T10:18:33.6677789Z bar.sync 0; 2026-02-21T10:18:33.6677850Z // begin inline asm 2026-02-21T10:18:33.6677988Z @%p222 mbarrier.arrive.expect_tx.shared.b64 _, [%r29375], 4096; 2026-02-21T10:18:33.6678048Z // end inline asm 2026-02-21T10:18:33.6678178Z // begin inline 
asm 2026-02-21T10:18:33.6678260Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.6678317Z // end inline asm 2026-02-21T10:18:33.6678374Z bar.sync 0; 2026-02-21T10:18:33.6678442Z elect.sync %r29142|%p283, -1; 2026-02-21T10:18:33.6678511Z and.pred %p224, %p1, %p283; 2026-02-21T10:18:33.6678642Z add.s64 %rd79, %rd662, 96; 2026-02-21T10:18:33.6678710Z cvt.u32.u64 %r21933, %rd79; 2026-02-21T10:18:33.6678771Z // begin inline asm 2026-02-21T10:18:33.6679113Z @%p224 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r29377], [%rd633, {%r29378, %r21933}], [%r29375]; 2026-02-21T10:18:33.6679174Z // end inline asm 2026-02-21T10:18:33.6679230Z bar.sync 0; 2026-02-21T10:18:33.6679288Z mov.b32 %r29010, 0; 2026-02-21T10:18:33.6679349Z // begin inline asm 2026-02-21T10:18:33.6679406Z 2026-02-21T10:18:33.6679458Z { 2026-02-21T10:18:33.6679522Z .reg .pred complete; 2026-02-21T10:18:33.6679582Z waitLoop: 2026-02-21T10:18:33.6679734Z mbarrier.try_wait.parity.shared.b64 complete, [%r29375], %r29010; 2026-02-21T10:18:33.6679804Z @!complete bra.uni waitLoop; 2026-02-21T10:18:33.6679859Z } 2026-02-21T10:18:33.6679865Z 2026-02-21T10:18:33.6679923Z // end inline asm 2026-02-21T10:18:33.6679979Z bar.sync 0; 2026-02-21T10:18:33.6680038Z // begin inline asm 2026-02-21T10:18:33.6680140Z @%p222 mbarrier.inval.shared::cta.b64 [%r29375]; 2026-02-21T10:18:33.6680198Z // end inline asm 2026-02-21T10:18:33.6680406Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6680476Z ld.shared.s8 %rs1857, [%r85]; 2026-02-21T10:18:33.6680674Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6680739Z shl.b16 %rs1858, %rs1857, 4; 2026-02-21T10:18:33.6680939Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6681010Z ld.shared.s8 %rs1859, [%r86+128]; 2026-02-21T10:18:33.6681205Z .loc 1 67 28 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6681267Z shl.b16 %rs1860, %rs1859, 4; 2026-02-21T10:18:33.6681467Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6681545Z ld.shared.s8 %rs1861, [%r87+256]; 2026-02-21T10:18:33.6681744Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6681808Z shl.b16 %rs1862, %rs1861, 4; 2026-02-21T10:18:33.6682003Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6682070Z ld.shared.s8 %rs1863, [%r88+384]; 2026-02-21T10:18:33.6682266Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6682442Z shl.b16 %rs1864, %rs1863, 4; 2026-02-21T10:18:33.6682647Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6682718Z ld.shared.s8 %rs1865, [%r89+512]; 2026-02-21T10:18:33.6682912Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6682975Z shl.b16 %rs1866, %rs1865, 4; 2026-02-21T10:18:33.6683171Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6683238Z ld.shared.s8 %rs1867, [%r90+640]; 2026-02-21T10:18:33.6683431Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6683492Z shl.b16 %rs1868, %rs1867, 4; 2026-02-21T10:18:33.6683686Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6683755Z ld.shared.s8 %rs1869, [%r91+768]; 2026-02-21T10:18:33.6683998Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6684064Z shl.b16 %rs1870, %rs1869, 4; 2026-02-21T10:18:33.6684257Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6684329Z ld.shared.s8 %rs1871, [%r92+896]; 2026-02-21T10:18:33.6684570Z .loc 
1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6684635Z shl.b16 %rs1872, %rs1871, 4; 2026-02-21T10:18:33.6684828Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6684898Z ld.shared.s8 %rs1873, [%r85+1024]; 2026-02-21T10:18:33.6685091Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6685154Z shl.b16 %rs1874, %rs1873, 4; 2026-02-21T10:18:33.6685349Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6685423Z ld.shared.s8 %rs1875, [%r86+1152]; 2026-02-21T10:18:33.6685617Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6685679Z shl.b16 %rs1876, %rs1875, 4; 2026-02-21T10:18:33.6685877Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6685942Z ld.shared.s8 %rs1877, [%r87+1280]; 2026-02-21T10:18:33.6686136Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6686212Z shl.b16 %rs1878, %rs1877, 4; 2026-02-21T10:18:33.6686409Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6686588Z ld.shared.s8 %rs1879, [%r88+1408]; 2026-02-21T10:18:33.6686793Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6686858Z shl.b16 %rs1880, %rs1879, 4; 2026-02-21T10:18:33.6687052Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6687118Z ld.shared.s8 %rs1881, [%r89+1536]; 2026-02-21T10:18:33.6687316Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6687380Z shl.b16 %rs1882, %rs1881, 4; 2026-02-21T10:18:33.6687573Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6687642Z ld.shared.s8 %rs1883, [%r90+1664]; 
2026-02-21T10:18:33.6687838Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6687900Z shl.b16 %rs1884, %rs1883, 4; 2026-02-21T10:18:33.6688095Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6688249Z ld.shared.s8 %rs1885, [%r91+1792]; 2026-02-21T10:18:33.6688505Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6688572Z shl.b16 %rs1886, %rs1885, 4; 2026-02-21T10:18:33.6688767Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6688834Z ld.shared.s8 %rs1887, [%r92+1920]; 2026-02-21T10:18:33.6689028Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6689093Z shl.b16 %rs1888, %rs1887, 4; 2026-02-21T10:18:33.6689300Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6689367Z ld.shared.s8 %rs1889, [%r85+2048]; 2026-02-21T10:18:33.6689566Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6689629Z shl.b16 %rs1890, %rs1889, 4; 2026-02-21T10:18:33.6689825Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6689957Z ld.shared.s8 %rs1891, [%r86+2176]; 2026-02-21T10:18:33.6690167Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6690231Z shl.b16 %rs1892, %rs1891, 4; 2026-02-21T10:18:33.6690484Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6690552Z ld.shared.s8 %rs1893, [%r87+2304]; 2026-02-21T10:18:33.6690747Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6690810Z shl.b16 %rs1894, %rs1893, 4; 2026-02-21T10:18:33.6691005Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6691072Z ld.shared.s8 
%rs1895, [%r88+2432]; 2026-02-21T10:18:33.6691268Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6691337Z shl.b16 %rs1896, %rs1895, 4; 2026-02-21T10:18:33.6691529Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6691596Z ld.shared.s8 %rs1897, [%r89+2560]; 2026-02-21T10:18:33.6691795Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6691856Z shl.b16 %rs1898, %rs1897, 4; 2026-02-21T10:18:33.6692048Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6692122Z ld.shared.s8 %rs1899, [%r90+2688]; 2026-02-21T10:18:33.6692327Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6692389Z shl.b16 %rs1900, %rs1899, 4; 2026-02-21T10:18:33.6692585Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6692652Z ld.shared.s8 %rs1901, [%r91+2816]; 2026-02-21T10:18:33.6692848Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6692910Z shl.b16 %rs1902, %rs1901, 4; 2026-02-21T10:18:33.6693104Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6693172Z ld.shared.s8 %rs1903, [%r92+2944]; 2026-02-21T10:18:33.6693366Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6693430Z shl.b16 %rs1904, %rs1903, 4; 2026-02-21T10:18:33.6693622Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6693688Z ld.shared.s8 %rs1905, [%r85+3072]; 2026-02-21T10:18:33.6693885Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6694002Z shl.b16 %rs1906, %rs1905, 4; 2026-02-21T10:18:33.6694240Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 
2026-02-21T10:18:33.6694315Z ld.shared.s8 %rs1907, [%r86+3200]; 2026-02-21T10:18:33.6694510Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6694572Z shl.b16 %rs1908, %rs1907, 4; 2026-02-21T10:18:33.6694764Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6694831Z ld.shared.s8 %rs1909, [%r87+3328]; 2026-02-21T10:18:33.6695025Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6695084Z shl.b16 %rs1910, %rs1909, 4; 2026-02-21T10:18:33.6695278Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6695346Z ld.shared.s8 %rs1911, [%r88+3456]; 2026-02-21T10:18:33.6695541Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6695667Z shl.b16 %rs1912, %rs1911, 4; 2026-02-21T10:18:33.6695863Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6695929Z ld.shared.s8 %rs1913, [%r89+3584]; 2026-02-21T10:18:33.6696173Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6696237Z shl.b16 %rs1914, %rs1913, 4; 2026-02-21T10:18:33.6696431Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6696612Z ld.shared.s8 %rs1915, [%r90+3712]; 2026-02-21T10:18:33.6696826Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6696888Z shl.b16 %rs1916, %rs1915, 4; 2026-02-21T10:18:33.6697084Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6697156Z ld.shared.s8 %rs1917, [%r91+3840]; 2026-02-21T10:18:33.6697350Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6697412Z shl.b16 %rs1918, %rs1917, 4; 2026-02-21T10:18:33.6697608Z .loc 1 69 25 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6697675Z ld.shared.s8 %rs1919, [%r92+3968]; 2026-02-21T10:18:33.6697870Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6697935Z shl.b16 %rs1920, %rs1919, 4; 2026-02-21T10:18:33.6698130Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6698193Z cvt.s16.s8 %rs1921, %rs1858; 2026-02-21T10:18:33.6698254Z shr.s16 %rs1922, %rs1921, 4; 2026-02-21T10:18:33.6698321Z cvt.s16.s8 %rs1923, %rs1860; 2026-02-21T10:18:33.6698385Z shr.s16 %rs1924, %rs1923, 4; 2026-02-21T10:18:33.6698445Z shr.s16 %rs1925, %rs1857, 4; 2026-02-21T10:18:33.6698508Z shr.s16 %rs1926, %rs1859, 4; 2026-02-21T10:18:33.6698703Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6698772Z cvt.rn.f32.s16 %r29143, %rs1926; 2026-02-21T10:18:33.6698841Z cvt.rn.f32.s16 %r29144, %rs1925; 2026-02-21T10:18:33.6698904Z cvt.rn.f32.s16 %r29145, %rs1924; 2026-02-21T10:18:33.6698966Z cvt.rn.f32.s16 %r29146, %rs1922; 2026-02-21T10:18:33.6699162Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6699227Z cvt.s16.s8 %rs1927, %rs1862; 2026-02-21T10:18:33.6699288Z shr.s16 %rs1928, %rs1927, 4; 2026-02-21T10:18:33.6699349Z cvt.s16.s8 %rs1929, %rs1864; 2026-02-21T10:18:33.6699412Z shr.s16 %rs1930, %rs1929, 4; 2026-02-21T10:18:33.6699563Z shr.s16 %rs1931, %rs1861, 4; 2026-02-21T10:18:33.6699682Z shr.s16 %rs1932, %rs1863, 4; 2026-02-21T10:18:33.6699886Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6699954Z cvt.rn.f32.s16 %r29147, %rs1932; 2026-02-21T10:18:33.6700017Z cvt.rn.f32.s16 %r29148, %rs1931; 2026-02-21T10:18:33.6700078Z cvt.rn.f32.s16 %r29149, %rs1930; 2026-02-21T10:18:33.6700146Z cvt.rn.f32.s16 %r29150, %rs1928; 2026-02-21T10:18:33.6700344Z .loc 1 69 25 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6700417Z cvt.s16.s8 %rs1933, %rs1866; 2026-02-21T10:18:33.6700482Z shr.s16 %rs1934, %rs1933, 4; 2026-02-21T10:18:33.6700545Z cvt.s16.s8 %rs1935, %rs1868; 2026-02-21T10:18:33.6700605Z shr.s16 %rs1936, %rs1935, 4; 2026-02-21T10:18:33.6700664Z shr.s16 %rs1937, %rs1865, 4; 2026-02-21T10:18:33.6700727Z shr.s16 %rs1938, %rs1867, 4; 2026-02-21T10:18:33.6700924Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6700991Z cvt.rn.f32.s16 %r29151, %rs1938; 2026-02-21T10:18:33.6701119Z cvt.rn.f32.s16 %r29152, %rs1937; 2026-02-21T10:18:33.6701185Z cvt.rn.f32.s16 %r29153, %rs1936; 2026-02-21T10:18:33.6701247Z cvt.rn.f32.s16 %r29154, %rs1934; 2026-02-21T10:18:33.6701499Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6701564Z cvt.s16.s8 %rs1939, %rs1870; 2026-02-21T10:18:33.6701634Z shr.s16 %rs1940, %rs1939, 4; 2026-02-21T10:18:33.6701696Z cvt.s16.s8 %rs1941, %rs1872; 2026-02-21T10:18:33.6701761Z shr.s16 %rs1942, %rs1941, 4; 2026-02-21T10:18:33.6701820Z shr.s16 %rs1943, %rs1869, 4; 2026-02-21T10:18:33.6701880Z shr.s16 %rs1944, %rs1871, 4; 2026-02-21T10:18:33.6702079Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6702143Z cvt.rn.f32.s16 %r29155, %rs1944; 2026-02-21T10:18:33.6702209Z cvt.rn.f32.s16 %r29156, %rs1943; 2026-02-21T10:18:33.6702274Z cvt.rn.f32.s16 %r29157, %rs1942; 2026-02-21T10:18:33.6702338Z cvt.rn.f32.s16 %r29158, %rs1940; 2026-02-21T10:18:33.6702538Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6702599Z cvt.s16.s8 %rs1945, %rs1874; 2026-02-21T10:18:33.6702663Z shr.s16 %rs1946, %rs1945, 4; 2026-02-21T10:18:33.6702725Z cvt.s16.s8 %rs1947, %rs1876; 2026-02-21T10:18:33.6702786Z shr.s16 %rs1948, %rs1947, 4; 2026-02-21T10:18:33.6702849Z shr.s16 %rs1949, %rs1873, 4; 
2026-02-21T10:18:33.6702911Z shr.s16 %rs1950, %rs1875, 4; 2026-02-21T10:18:33.6703104Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6703170Z cvt.rn.f32.s16 %r29159, %rs1950; 2026-02-21T10:18:33.6703232Z cvt.rn.f32.s16 %r29160, %rs1949; 2026-02-21T10:18:33.6703295Z cvt.rn.f32.s16 %r29161, %rs1948; 2026-02-21T10:18:33.6703360Z cvt.rn.f32.s16 %r29162, %rs1946; 2026-02-21T10:18:33.6703564Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6703630Z cvt.s16.s8 %rs1951, %rs1878; 2026-02-21T10:18:33.6703693Z shr.s16 %rs1952, %rs1951, 4; 2026-02-21T10:18:33.6703755Z cvt.s16.s8 %rs1953, %rs1880; 2026-02-21T10:18:33.6703816Z shr.s16 %rs1954, %rs1953, 4; 2026-02-21T10:18:33.6703877Z shr.s16 %rs1955, %rs1877, 4; 2026-02-21T10:18:33.6703938Z shr.s16 %rs1956, %rs1879, 4; 2026-02-21T10:18:33.6704139Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6704203Z cvt.rn.f32.s16 %r29163, %rs1956; 2026-02-21T10:18:33.6704267Z cvt.rn.f32.s16 %r29164, %rs1955; 2026-02-21T10:18:33.6704333Z cvt.rn.f32.s16 %r29165, %rs1954; 2026-02-21T10:18:33.6704395Z cvt.rn.f32.s16 %r29166, %rs1952; 2026-02-21T10:18:33.6704598Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6704766Z cvt.s16.s8 %rs1957, %rs1882; 2026-02-21T10:18:33.6704827Z shr.s16 %rs1958, %rs1957, 4; 2026-02-21T10:18:33.6704889Z cvt.s16.s8 %rs1959, %rs1884; 2026-02-21T10:18:33.6704949Z shr.s16 %rs1960, %rs1959, 4; 2026-02-21T10:18:33.6705011Z shr.s16 %rs1961, %rs1881, 4; 2026-02-21T10:18:33.6705071Z shr.s16 %rs1962, %rs1883, 4; 2026-02-21T10:18:33.6705279Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6705347Z cvt.rn.f32.s16 %r29167, %rs1962; 2026-02-21T10:18:33.6705409Z cvt.rn.f32.s16 %r29168, %rs1961; 2026-02-21T10:18:33.6705470Z cvt.rn.f32.s16 %r29169, %rs1960; 2026-02-21T10:18:33.6705533Z 
cvt.rn.f32.s16 %r29170, %rs1958; 2026-02-21T10:18:33.6705729Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6705791Z cvt.s16.s8 %rs1963, %rs1886; 2026-02-21T10:18:33.6705853Z shr.s16 %rs1964, %rs1963, 4; 2026-02-21T10:18:33.6705916Z cvt.s16.s8 %rs1965, %rs1888; 2026-02-21T10:18:33.6705978Z shr.s16 %rs1966, %rs1965, 4; 2026-02-21T10:18:33.6706094Z shr.s16 %rs1967, %rs1885, 4; 2026-02-21T10:18:33.6706160Z shr.s16 %rs1968, %rs1887, 4; 2026-02-21T10:18:33.6706359Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6706422Z cvt.rn.f32.s16 %r29171, %rs1968; 2026-02-21T10:18:33.6706679Z cvt.rn.f32.s16 %r29172, %rs1967; 2026-02-21T10:18:33.6706750Z cvt.rn.f32.s16 %r29173, %rs1966; 2026-02-21T10:18:33.6706813Z cvt.rn.f32.s16 %r29174, %rs1964; 2026-02-21T10:18:33.6707011Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6707077Z cvt.s16.s8 %rs1969, %rs1890; 2026-02-21T10:18:33.6707137Z shr.s16 %rs1970, %rs1969, 4; 2026-02-21T10:18:33.6707197Z cvt.s16.s8 %rs1971, %rs1892; 2026-02-21T10:18:33.6707258Z shr.s16 %rs1972, %rs1971, 4; 2026-02-21T10:18:33.6707322Z shr.s16 %rs1973, %rs1889, 4; 2026-02-21T10:18:33.6707383Z shr.s16 %rs1974, %rs1891, 4; 2026-02-21T10:18:33.6707580Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6707648Z cvt.rn.f32.s16 %r29175, %rs1974; 2026-02-21T10:18:33.6707709Z cvt.rn.f32.s16 %r29176, %rs1973; 2026-02-21T10:18:33.6707771Z cvt.rn.f32.s16 %r29177, %rs1972; 2026-02-21T10:18:33.6707847Z cvt.rn.f32.s16 %r29178, %rs1970; 2026-02-21T10:18:33.6708045Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6708107Z cvt.s16.s8 %rs1975, %rs1894; 2026-02-21T10:18:33.6708172Z shr.s16 %rs1976, %rs1975, 4; 2026-02-21T10:18:33.6708231Z cvt.s16.s8 %rs1977, %rs1896; 2026-02-21T10:18:33.6708291Z shr.s16 %rs1978, %rs1977, 4; 
2026-02-21T10:18:33.6708353Z shr.s16 %rs1979, %rs1893, 4; 2026-02-21T10:18:33.6708494Z shr.s16 %rs1980, %rs1895, 4; 2026-02-21T10:18:33.6708695Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6708759Z cvt.rn.f32.s16 %r29179, %rs1980; 2026-02-21T10:18:33.6708828Z cvt.rn.f32.s16 %r29180, %rs1979; 2026-02-21T10:18:33.6708890Z cvt.rn.f32.s16 %r29181, %rs1978; 2026-02-21T10:18:33.6708951Z cvt.rn.f32.s16 %r29182, %rs1976; 2026-02-21T10:18:33.6709151Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6709213Z cvt.s16.s8 %rs1981, %rs1898; 2026-02-21T10:18:33.6709274Z shr.s16 %rs1982, %rs1981, 4; 2026-02-21T10:18:33.6709334Z cvt.s16.s8 %rs1983, %rs1900; 2026-02-21T10:18:33.6709395Z shr.s16 %rs1984, %rs1983, 4; 2026-02-21T10:18:33.6709455Z shr.s16 %rs1985, %rs1897, 4; 2026-02-21T10:18:33.6709514Z shr.s16 %rs1986, %rs1899, 4; 2026-02-21T10:18:33.6709711Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6709861Z cvt.rn.f32.s16 %r29183, %rs1986; 2026-02-21T10:18:33.6709924Z cvt.rn.f32.s16 %r29184, %rs1985; 2026-02-21T10:18:33.6710044Z cvt.rn.f32.s16 %r29185, %rs1984; 2026-02-21T10:18:33.6710111Z cvt.rn.f32.s16 %r29186, %rs1982; 2026-02-21T10:18:33.6710309Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6710371Z cvt.s16.s8 %rs1987, %rs1902; 2026-02-21T10:18:33.6710435Z shr.s16 %rs1988, %rs1987, 4; 2026-02-21T10:18:33.6710498Z cvt.s16.s8 %rs1989, %rs1904; 2026-02-21T10:18:33.6710559Z shr.s16 %rs1990, %rs1989, 4; 2026-02-21T10:18:33.6710622Z shr.s16 %rs1991, %rs1901, 4; 2026-02-21T10:18:33.6710681Z shr.s16 %rs1992, %rs1903, 4; 2026-02-21T10:18:33.6710876Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6710940Z cvt.rn.f32.s16 %r29187, %rs1992; 2026-02-21T10:18:33.6711007Z cvt.rn.f32.s16 %r29188, %rs1991; 2026-02-21T10:18:33.6711068Z 
cvt.rn.f32.s16 %r29189, %rs1990; 2026-02-21T10:18:33.6711132Z cvt.rn.f32.s16 %r29190, %rs1988; 2026-02-21T10:18:33.6711398Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6711469Z cvt.s16.s8 %rs1993, %rs1906; 2026-02-21T10:18:33.6711531Z shr.s16 %rs1994, %rs1993, 4; 2026-02-21T10:18:33.6711593Z cvt.s16.s8 %rs1995, %rs1908; 2026-02-21T10:18:33.6711653Z shr.s16 %rs1996, %rs1995, 4; 2026-02-21T10:18:33.6711760Z shr.s16 %rs1997, %rs1905, 4; 2026-02-21T10:18:33.6711824Z shr.s16 %rs1998, %rs1907, 4; 2026-02-21T10:18:33.6712025Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6712090Z cvt.rn.f32.s16 %r29191, %rs1998; 2026-02-21T10:18:33.6712152Z cvt.rn.f32.s16 %r29192, %rs1997; 2026-02-21T10:18:33.6712218Z cvt.rn.f32.s16 %r29193, %rs1996; 2026-02-21T10:18:33.6712279Z cvt.rn.f32.s16 %r29194, %rs1994; 2026-02-21T10:18:33.6712474Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6712544Z cvt.s16.s8 %rs1999, %rs1910; 2026-02-21T10:18:33.6712603Z shr.s16 %rs2000, %rs1999, 4; 2026-02-21T10:18:33.6712665Z cvt.s16.s8 %rs2001, %rs1912; 2026-02-21T10:18:33.6712725Z shr.s16 %rs2002, %rs2001, 4; 2026-02-21T10:18:33.6712787Z shr.s16 %rs2003, %rs1909, 4; 2026-02-21T10:18:33.6712848Z shr.s16 %rs2004, %rs1911, 4; 2026-02-21T10:18:33.6713043Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6713113Z cvt.rn.f32.s16 %r29195, %rs2004; 2026-02-21T10:18:33.6713176Z cvt.rn.f32.s16 %r29196, %rs2003; 2026-02-21T10:18:33.6713237Z cvt.rn.f32.s16 %r29197, %rs2002; 2026-02-21T10:18:33.6713299Z cvt.rn.f32.s16 %r29198, %rs2000; 2026-02-21T10:18:33.6713501Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6713566Z cvt.s16.s8 %rs2005, %rs1914; 2026-02-21T10:18:33.6713631Z shr.s16 %rs2006, %rs2005, 4; 2026-02-21T10:18:33.6713709Z cvt.s16.s8 %rs2007, %rs1916; 
2026-02-21T10:18:33.6713786Z shr.s16 %rs2008, %rs2007, 4; 2026-02-21T10:18:33.6713849Z shr.s16 %rs2009, %rs1913, 4; 2026-02-21T10:18:33.6713916Z shr.s16 %rs2010, %rs1915, 4; 2026-02-21T10:18:33.6714113Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6714177Z cvt.rn.f32.s16 %r29199, %rs2010; 2026-02-21T10:18:33.6714241Z cvt.rn.f32.s16 %r29200, %rs2009; 2026-02-21T10:18:33.6714307Z cvt.rn.f32.s16 %r29201, %rs2008; 2026-02-21T10:18:33.6714375Z cvt.rn.f32.s16 %r29202, %rs2006; 2026-02-21T10:18:33.6714573Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6714639Z cvt.s16.s8 %rs2011, %rs1918; 2026-02-21T10:18:33.6714703Z shr.s16 %rs2012, %rs2011, 4; 2026-02-21T10:18:33.6714764Z cvt.s16.s8 %rs2013, %rs1920; 2026-02-21T10:18:33.6714882Z shr.s16 %rs2014, %rs2013, 4; 2026-02-21T10:18:33.6714945Z shr.s16 %rs2015, %rs1917, 4; 2026-02-21T10:18:33.6715071Z shr.s16 %rs2016, %rs1919, 4; 2026-02-21T10:18:33.6715272Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6715354Z cvt.rn.f32.s16 %r29203, %rs2016; 2026-02-21T10:18:33.6715421Z cvt.rn.f32.s16 %r29204, %rs2015; 2026-02-21T10:18:33.6715484Z cvt.rn.f32.s16 %r29205, %rs2014; 2026-02-21T10:18:33.6715550Z cvt.rn.f32.s16 %r29206, %rs2012; 2026-02-21T10:18:33.6715606Z bar.sync 0; 2026-02-21T10:18:33.6715728Z st.shared.v4.b32 [%r93], {%r29146, %r29144, %r29145, %r29143}; 2026-02-21T10:18:33.6715860Z st.shared.v4.b32 [%r93+16384], {%r29178, %r29176, %r29177, %r29175}; 2026-02-21T10:18:33.6715971Z st.shared.v4.b32 [%r94], {%r29150, %r29148, %r29149, %r29147}; 2026-02-21T10:18:33.6716093Z st.shared.v4.b32 [%r94+16384], {%r29182, %r29180, %r29181, %r29179}; 2026-02-21T10:18:33.6716202Z st.shared.v4.b32 [%r95], {%r29154, %r29152, %r29153, %r29151}; 2026-02-21T10:18:33.6716323Z st.shared.v4.b32 [%r95+16384], {%r29186, %r29184, %r29185, %r29183}; 2026-02-21T10:18:33.6716613Z st.shared.v4.b32 
[%r96], {%r29158, %r29156, %r29157, %r29155}; 2026-02-21T10:18:33.6716743Z st.shared.v4.b32 [%r96+16384], {%r29190, %r29188, %r29189, %r29187}; 2026-02-21T10:18:33.6716854Z st.shared.v4.b32 [%r97], {%r29162, %r29160, %r29161, %r29159}; 2026-02-21T10:18:33.6717038Z st.shared.v4.b32 [%r97+16384], {%r29194, %r29192, %r29193, %r29191}; 2026-02-21T10:18:33.6717161Z st.shared.v4.b32 [%r98], {%r29166, %r29164, %r29165, %r29163}; 2026-02-21T10:18:33.6717284Z st.shared.v4.b32 [%r98+16384], {%r29198, %r29196, %r29197, %r29195}; 2026-02-21T10:18:33.6717390Z st.shared.v4.b32 [%r99], {%r29170, %r29168, %r29169, %r29167}; 2026-02-21T10:18:33.6717506Z st.shared.v4.b32 [%r99+16384], {%r29202, %r29200, %r29201, %r29199}; 2026-02-21T10:18:33.6717620Z st.shared.v4.b32 [%r100], {%r29174, %r29172, %r29173, %r29171}; 2026-02-21T10:18:33.6717748Z st.shared.v4.b32 [%r100+16384], {%r29206, %r29204, %r29205, %r29203}; 2026-02-21T10:18:33.6717806Z $L__tmp17: 2026-02-21T10:18:33.6718089Z .loc 2 291 36 // standard.py:291:36 @[ c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:94:40 ] 2026-02-21T10:18:33.6718159Z // begin inline asm 2026-02-21T10:18:33.6718241Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.6718303Z // end inline asm 2026-02-21T10:18:33.6718372Z bar.sync 0; 2026-02-21T10:18:33.6718465Z shfl.sync.idx.b32 %r29207, %r5, 0, 31, -1; 2026-02-21T10:18:33.6718545Z wgmma.fence.sync.aligned; 2026-02-21T10:18:33.6718609Z mov.pred %p226, -1; 2026-02-21T10:18:33.6718671Z // begin inline asm 2026-02-21T10:18:33.6720171Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r22066,%r22067,%r22068,%r22069}, %rd12, %p226, 1, 1; 2026-02-21T10:18:33.6720235Z // end inline asm 2026-02-21T10:18:33.6720294Z // begin inline asm 2026-02-21T10:18:33.6721777Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r22198,%r22199,%r22200,%r22201}, %rd13, %p226, 1, 1; 2026-02-21T10:18:33.6721966Z // end inline asm 2026-02-21T10:18:33.6722029Z // begin inline asm 2026-02-21T10:18:33.6723522Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r22330,%r22331,%r22332,%r22333}, %rd14, %p226, 1, 1; 2026-02-21T10:18:33.6723583Z // end inline asm 2026-02-21T10:18:33.6723646Z // begin inline asm 2026-02-21T10:18:33.6725262Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r22462,%r22463,%r22464,%r22465}, %rd15, %p226, 1, 1; 2026-02-21T10:18:33.6725328Z // end inline asm 2026-02-21T10:18:33.6725387Z // begin inline asm 2026-02-21T10:18:33.6726988Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r22594,%r22595,%r22596,%r22597}, %rd16, %p226, 1, 1; 2026-02-21T10:18:33.6727054Z // end inline asm 2026-02-21T10:18:33.6727123Z // begin inline asm 2026-02-21T10:18:33.6728649Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r22726,%r22727,%r22728,%r22729}, %rd17, %p226, 1, 1; 2026-02-21T10:18:33.6728710Z // end inline asm 2026-02-21T10:18:33.6728779Z // begin inline asm 2026-02-21T10:18:33.6730261Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r22858,%r22859,%r22860,%r22861}, %rd18, %p226, 1, 1; 2026-02-21T10:18:33.6730329Z // end inline asm 2026-02-21T10:18:33.6730395Z // begin inline asm 2026-02-21T10:18:33.6731875Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r22990,%r22991,%r22992,%r22993}, %rd19, %p226, 1, 1; 2026-02-21T10:18:33.6732073Z // end inline asm 2026-02-21T10:18:33.6732132Z // begin inline asm 2026-02-21T10:18:33.6733670Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r23122,%r23123,%r23124,%r23125}, %rd12, %p226, 1, 1; 2026-02-21T10:18:33.6733739Z // end inline asm 2026-02-21T10:18:33.6733805Z // begin inline asm 2026-02-21T10:18:33.6735348Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r23254,%r23255,%r23256,%r23257}, %rd13, %p226, 1, 1; 2026-02-21T10:18:33.6735415Z // end inline asm 2026-02-21T10:18:33.6735478Z // begin inline asm 2026-02-21T10:18:33.6737086Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r23386,%r23387,%r23388,%r23389}, %rd14, %p226, 1, 1; 2026-02-21T10:18:33.6737151Z // end inline asm 2026-02-21T10:18:33.6737210Z // begin inline asm 2026-02-21T10:18:33.6738690Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r23518,%r23519,%r23520,%r23521}, %rd15, %p226, 1, 1; 2026-02-21T10:18:33.6738750Z // end inline asm 2026-02-21T10:18:33.6738811Z // begin inline asm 2026-02-21T10:18:33.6740296Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r23650,%r23651,%r23652,%r23653}, %rd16, %p226, 1, 1; 2026-02-21T10:18:33.6740490Z // end inline asm 2026-02-21T10:18:33.6740554Z // begin inline asm 2026-02-21T10:18:33.6742037Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r23782,%r23783,%r23784,%r23785}, %rd17, %p226, 1, 1; 2026-02-21T10:18:33.6742098Z // end inline asm 2026-02-21T10:18:33.6742156Z // begin inline asm 2026-02-21T10:18:33.6743746Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r23914,%r23915,%r23916,%r23917}, %rd18, %p226, 1, 1; 2026-02-21T10:18:33.6743811Z // end inline asm 2026-02-21T10:18:33.6743870Z // begin inline asm 2026-02-21T10:18:33.6745358Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r24046,%r24047,%r24048,%r24049}, %rd19, %p226, 1, 1; 2026-02-21T10:18:33.6745422Z // end inline asm 2026-02-21T10:18:33.6745501Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:33.6745565Z mov.b32 %r24178, %r29377; 2026-02-21T10:18:33.6745629Z mov.b32 %r24179, %r29010; 2026-02-21T10:18:33.6745688Z mov.b32 %r24180, %r29010; 2026-02-21T10:18:33.6745746Z // begin inline asm 2026-02-21T10:18:33.6748378Z // wait for regs: 
%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484,%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548,%r24178,%r24179,%r24180 2026-02-21T10:18:33.6748547Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:33.6748608Z // end inline asm 2026-02-21T10:18:33.6748663Z $L__tmp18: 2026-02-21T10:18:33.6748878Z .loc 1 55 22 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:55:22 2026-02-21T10:18:33.6749041Z add.s64 %rd602, %rd599, %rd20; 2026-02-21T10:18:33.6749308Z .loc 1 57 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:57:25 2026-02-21T10:18:33.6749373Z add.s64 %rd603, %rd602, 64; 2026-02-21T10:18:33.6749589Z .loc 1 58 60 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:60 2026-02-21T10:18:33.6749654Z cvt.u32.u64 %r29208, %rd603; 2026-02-21T10:18:33.6749721Z add.s32 %r29209, %r1159, %r29208; 2026-02-21T10:18:33.6749918Z .loc 1 58 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:32 2026-02-21T10:18:33.6749982Z shl.b64 %rd604, %rd603, 1; 2026-02-21T10:18:33.6750046Z add.s64 %rd518, %rd69, %rd604; 
2026-02-21T10:18:33.6750108Z add.s64 %rd521, %rd70, %rd604; 2026-02-21T10:18:33.6750171Z add.s64 %rd524, %rd71, %rd604; 2026-02-21T10:18:33.6750232Z add.s64 %rd527, %rd72, %rd604; 2026-02-21T10:18:33.6750293Z add.s64 %rd530, %rd73, %rd604; 2026-02-21T10:18:33.6750354Z add.s64 %rd533, %rd74, %rd604; 2026-02-21T10:18:33.6750420Z add.s64 %rd536, %rd75, %rd604; 2026-02-21T10:18:33.6750496Z mad.wide.s32 %rd539, %r29209, 2, %rd85; 2026-02-21T10:18:33.6750756Z .loc 1 58 80 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:80 2026-02-21T10:18:33.6750826Z // begin inline asm 2026-02-21T10:18:33.6750885Z mov.u64 %rd517, 0x0; 2026-02-21T10:18:33.6751076Z createpolicy.fractional.L2::evict_first.b64 %rd517, 1.0; 2026-02-21T10:18:33.6751140Z // end inline asm 2026-02-21T10:18:33.6751199Z // begin inline asm 2026-02-21T10:18:33.6751260Z mov.u32 %r24312, 0x0; 2026-02-21T10:18:33.6751317Z mov.u32 %r24313, 0x0; 2026-02-21T10:18:33.6751379Z mov.u32 %r24314, 0x0; 2026-02-21T10:18:33.6751436Z mov.u32 %r24315, 0x0; 2026-02-21T10:18:33.6751682Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r24312, %r24313, %r24314, %r24315 }, [ %rd518 + 0 ], %rd517; 2026-02-21T10:18:33.6751741Z // end inline asm 2026-02-21T10:18:33.6751802Z // begin inline asm 2026-02-21T10:18:33.6751859Z mov.u64 %rd520, 0x0; 2026-02-21T10:18:33.6751985Z createpolicy.fractional.L2::evict_first.b64 %rd520, 1.0; 2026-02-21T10:18:33.6752043Z // end inline asm 2026-02-21T10:18:33.6752104Z // begin inline asm 2026-02-21T10:18:33.6752163Z mov.u32 %r24316, 0x0; 2026-02-21T10:18:33.6752223Z mov.u32 %r24317, 0x0; 2026-02-21T10:18:33.6752282Z mov.u32 %r24318, 0x0; 2026-02-21T10:18:33.6752351Z mov.u32 %r24319, 0x0; 2026-02-21T10:18:33.6752588Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r24316, %r24317, %r24318, %r24319 }, [ %rd521 + 0 ], %rd520; 2026-02-21T10:18:33.6752646Z // end inline asm 2026-02-21T10:18:33.6752705Z // begin inline asm 2026-02-21T10:18:33.6752763Z mov.u64 %rd523, 0x0; 
2026-02-21T10:18:33.6752886Z createpolicy.fractional.L2::evict_first.b64 %rd523, 1.0; 2026-02-21T10:18:33.6752943Z // end inline asm 2026-02-21T10:18:33.6753001Z // begin inline asm 2026-02-21T10:18:33.6753062Z mov.u32 %r24320, 0x0; 2026-02-21T10:18:33.6753123Z mov.u32 %r24321, 0x0; 2026-02-21T10:18:33.6753182Z mov.u32 %r24322, 0x0; 2026-02-21T10:18:33.6753243Z mov.u32 %r24323, 0x0; 2026-02-21T10:18:33.6753470Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r24320, %r24321, %r24322, %r24323 }, [ %rd524 + 0 ], %rd523; 2026-02-21T10:18:33.6753529Z // end inline asm 2026-02-21T10:18:33.6753587Z // begin inline asm 2026-02-21T10:18:33.6753648Z mov.u64 %rd526, 0x0; 2026-02-21T10:18:33.6753766Z createpolicy.fractional.L2::evict_first.b64 %rd526, 1.0; 2026-02-21T10:18:33.6753822Z // end inline asm 2026-02-21T10:18:33.6753884Z // begin inline asm 2026-02-21T10:18:33.6753941Z mov.u32 %r24324, 0x0; 2026-02-21T10:18:33.6753998Z mov.u32 %r24325, 0x0; 2026-02-21T10:18:33.6754056Z mov.u32 %r24326, 0x0; 2026-02-21T10:18:33.6754115Z mov.u32 %r24327, 0x0; 2026-02-21T10:18:33.6754339Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r24324, %r24325, %r24326, %r24327 }, [ %rd527 + 0 ], %rd526; 2026-02-21T10:18:33.6754396Z // end inline asm 2026-02-21T10:18:33.6754523Z // begin inline asm 2026-02-21T10:18:33.6754583Z mov.u64 %rd529, 0x0; 2026-02-21T10:18:33.6754745Z createpolicy.fractional.L2::evict_first.b64 %rd529, 1.0; 2026-02-21T10:18:33.6754806Z // end inline asm 2026-02-21T10:18:33.6754865Z // begin inline asm 2026-02-21T10:18:33.6754922Z mov.u32 %r24328, 0x0; 2026-02-21T10:18:33.6754977Z mov.u32 %r24329, 0x0; 2026-02-21T10:18:33.6755037Z mov.u32 %r24330, 0x0; 2026-02-21T10:18:33.6755106Z mov.u32 %r24331, 0x0; 2026-02-21T10:18:33.6755336Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r24328, %r24329, %r24330, %r24331 }, [ %rd530 + 0 ], %rd529; 2026-02-21T10:18:33.6755397Z // end inline asm 2026-02-21T10:18:33.6755455Z // begin inline asm 2026-02-21T10:18:33.6755513Z 
mov.u64 %rd532, 0x0; 2026-02-21T10:18:33.6755631Z createpolicy.fractional.L2::evict_first.b64 %rd532, 1.0; 2026-02-21T10:18:33.6755688Z // end inline asm 2026-02-21T10:18:33.6755745Z // begin inline asm 2026-02-21T10:18:33.6755802Z mov.u32 %r24332, 0x0; 2026-02-21T10:18:33.6755865Z mov.u32 %r24333, 0x0; 2026-02-21T10:18:33.6755922Z mov.u32 %r24334, 0x0; 2026-02-21T10:18:33.6755979Z mov.u32 %r24335, 0x0; 2026-02-21T10:18:33.6756255Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r24332, %r24333, %r24334, %r24335 }, [ %rd533 + 0 ], %rd532; 2026-02-21T10:18:33.6756316Z // end inline asm 2026-02-21T10:18:33.6756375Z // begin inline asm 2026-02-21T10:18:33.6756433Z mov.u64 %rd535, 0x0; 2026-02-21T10:18:33.6756751Z createpolicy.fractional.L2::evict_first.b64 %rd535, 1.0; 2026-02-21T10:18:33.6756817Z // end inline asm 2026-02-21T10:18:33.6756877Z // begin inline asm 2026-02-21T10:18:33.6756936Z mov.u32 %r24336, 0x0; 2026-02-21T10:18:33.6756993Z mov.u32 %r24337, 0x0; 2026-02-21T10:18:33.6757050Z mov.u32 %r24338, 0x0; 2026-02-21T10:18:33.6757108Z mov.u32 %r24339, 0x0; 2026-02-21T10:18:33.6757331Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r24336, %r24337, %r24338, %r24339 }, [ %rd536 + 0 ], %rd535; 2026-02-21T10:18:33.6757388Z // end inline asm 2026-02-21T10:18:33.6757449Z // begin inline asm 2026-02-21T10:18:33.6757509Z mov.u64 %rd538, 0x0; 2026-02-21T10:18:33.6757629Z createpolicy.fractional.L2::evict_first.b64 %rd538, 1.0; 2026-02-21T10:18:33.6757688Z // end inline asm 2026-02-21T10:18:33.6757750Z // begin inline asm 2026-02-21T10:18:33.6757809Z mov.u32 %r24340, 0x0; 2026-02-21T10:18:33.6757865Z mov.u32 %r24341, 0x0; 2026-02-21T10:18:33.6757922Z mov.u32 %r24342, 0x0; 2026-02-21T10:18:33.6757982Z mov.u32 %r24343, 0x0; 2026-02-21T10:18:33.6758207Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r24340, %r24341, %r24342, %r24343 }, [ %rd539 + 0 ], %rd538; 2026-02-21T10:18:33.6758263Z // end inline asm 2026-02-21T10:18:33.6758467Z .loc 1 62 32 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:62:32 2026-02-21T10:18:33.6758524Z bar.sync 0; 2026-02-21T10:18:33.6758608Z st.shared.v2.b32 [%r75], {%r24312, %r24313}; 2026-02-21T10:18:33.6758702Z st.shared.v2.b32 [%r75+2048], {%r24316, %r24317}; 2026-02-21T10:18:33.6758793Z st.shared.v2.b32 [%r75+4096], {%r24320, %r24321}; 2026-02-21T10:18:33.6758880Z st.shared.v2.b32 [%r75+6144], {%r24324, %r24325}; 2026-02-21T10:18:33.6758967Z st.shared.v2.b32 [%r75+8192], {%r24328, %r24329}; 2026-02-21T10:18:33.6759059Z st.shared.v2.b32 [%r75+10240], {%r24332, %r24333}; 2026-02-21T10:18:33.6759145Z st.shared.v2.b32 [%r75+12288], {%r24336, %r24337}; 2026-02-21T10:18:33.6759241Z st.shared.v2.b32 [%r75+14336], {%r24340, %r24341}; 2026-02-21T10:18:33.6759329Z st.shared.v2.b32 [%r76], {%r24314, %r24315}; 2026-02-21T10:18:33.6759414Z st.shared.v2.b32 [%r76+2048], {%r24318, %r24319}; 2026-02-21T10:18:33.6759498Z st.shared.v2.b32 [%r76+4096], {%r24322, %r24323}; 2026-02-21T10:18:33.6759584Z st.shared.v2.b32 [%r76+6144], {%r24326, %r24327}; 2026-02-21T10:18:33.6759666Z st.shared.v2.b32 [%r76+8192], {%r24330, %r24331}; 2026-02-21T10:18:33.6759751Z st.shared.v2.b32 [%r76+10240], {%r24334, %r24335}; 2026-02-21T10:18:33.6759839Z st.shared.v2.b32 [%r76+12288], {%r24338, %r24339}; 2026-02-21T10:18:33.6760000Z st.shared.v2.b32 [%r76+14336], {%r24342, %r24343}; 2026-02-21T10:18:33.6760112Z bar.sync 0; 2026-02-21T10:18:33.6760181Z ld.shared.b16 %rs2017, [%r77]; 2026-02-21T10:18:33.6760256Z ld.shared.b16 %rs2018, [%r77+1024]; 2026-02-21T10:18:33.6760324Z ld.shared.b16 %rs2019, [%r77+64]; 2026-02-21T10:18:33.6760391Z ld.shared.b16 %rs2020, [%r77+1088]; 2026-02-21T10:18:33.6760459Z ld.shared.b16 %rs2021, [%r77+8192]; 2026-02-21T10:18:33.6760525Z ld.shared.b16 %rs2022, [%r77+9216]; 2026-02-21T10:18:33.6760590Z ld.shared.b16 %rs2023, [%r77+8256]; 2026-02-21T10:18:33.6760655Z ld.shared.b16 %rs2024, [%r77+9280]; 2026-02-21T10:18:33.6760724Z ld.shared.b16 %rs2025, [%r78]; 
2026-02-21T10:18:33.6760789Z ld.shared.b16 %rs2026, [%r78+1024]; 2026-02-21T10:18:33.6760853Z ld.shared.b16 %rs2027, [%r78+64]; 2026-02-21T10:18:33.6760923Z ld.shared.b16 %rs2028, [%r78+1088]; 2026-02-21T10:18:33.6760990Z ld.shared.b16 %rs2029, [%r78+8192]; 2026-02-21T10:18:33.6761058Z ld.shared.b16 %rs2030, [%r78+9216]; 2026-02-21T10:18:33.6761126Z ld.shared.b16 %rs2031, [%r78+8256]; 2026-02-21T10:18:33.6761191Z ld.shared.b16 %rs2032, [%r78+9280]; 2026-02-21T10:18:33.6761333Z ld.shared.b16 %rs2033, [%r79]; 2026-02-21T10:18:33.6761405Z ld.shared.b16 %rs2034, [%r79+1024]; 2026-02-21T10:18:33.6761473Z ld.shared.b16 %rs2035, [%r79+64]; 2026-02-21T10:18:33.6761540Z ld.shared.b16 %rs2036, [%r79+1088]; 2026-02-21T10:18:33.6761650Z ld.shared.b16 %rs2037, [%r79+8192]; 2026-02-21T10:18:33.6761720Z ld.shared.b16 %rs2038, [%r79+9216]; 2026-02-21T10:18:33.6761785Z ld.shared.b16 %rs2039, [%r79+8256]; 2026-02-21T10:18:33.6761853Z ld.shared.b16 %rs2040, [%r79+9280]; 2026-02-21T10:18:33.6761918Z ld.shared.b16 %rs2041, [%r80]; 2026-02-21T10:18:33.6761984Z ld.shared.b16 %rs2042, [%r80+1024]; 2026-02-21T10:18:33.6762059Z ld.shared.b16 %rs2043, [%r80+64]; 2026-02-21T10:18:33.6762126Z ld.shared.b16 %rs2044, [%r80+1088]; 2026-02-21T10:18:33.6762194Z ld.shared.b16 %rs2045, [%r80+8192]; 2026-02-21T10:18:33.6762260Z ld.shared.b16 %rs2046, [%r80+9216]; 2026-02-21T10:18:33.6762328Z ld.shared.b16 %rs2047, [%r80+8256]; 2026-02-21T10:18:33.6762397Z ld.shared.b16 %rs2048, [%r80+9280]; 2026-02-21T10:18:33.6762462Z ld.shared.b16 %rs2049, [%r81]; 2026-02-21T10:18:33.6762526Z ld.shared.b16 %rs2050, [%r81+1024]; 2026-02-21T10:18:33.6762589Z ld.shared.b16 %rs2051, [%r81+64]; 2026-02-21T10:18:33.6762657Z ld.shared.b16 %rs2052, [%r81+1088]; 2026-02-21T10:18:33.6762723Z ld.shared.b16 %rs2053, [%r81+8192]; 2026-02-21T10:18:33.6762788Z ld.shared.b16 %rs2054, [%r81+9216]; 2026-02-21T10:18:33.6765434Z ld.shared.b16 %rs2055, [%r81+8256]; 2026-02-21T10:18:33.6765538Z ld.shared.b16 %rs2056, [%r81+9280]; 
2026-02-21T10:18:33.6765614Z ld.shared.b16 %rs2057, [%r82]; 2026-02-21T10:18:33.6765696Z ld.shared.b16 %rs2058, [%r82+1024]; 2026-02-21T10:18:33.6765767Z ld.shared.b16 %rs2059, [%r82+64]; 2026-02-21T10:18:33.6765837Z ld.shared.b16 %rs2060, [%r82+1088]; 2026-02-21T10:18:33.6765907Z ld.shared.b16 %rs2061, [%r82+8192]; 2026-02-21T10:18:33.6765972Z ld.shared.b16 %rs2062, [%r82+9216]; 2026-02-21T10:18:33.6766039Z ld.shared.b16 %rs2063, [%r82+8256]; 2026-02-21T10:18:33.6766112Z ld.shared.b16 %rs2064, [%r82+9280]; 2026-02-21T10:18:33.6766178Z ld.shared.b16 %rs2065, [%r83]; 2026-02-21T10:18:33.6766243Z ld.shared.b16 %rs2066, [%r83+1024]; 2026-02-21T10:18:33.6766313Z ld.shared.b16 %rs2067, [%r83+64]; 2026-02-21T10:18:33.6766379Z ld.shared.b16 %rs2068, [%r83+1088]; 2026-02-21T10:18:33.6766442Z ld.shared.b16 %rs2069, [%r83+8192]; 2026-02-21T10:18:33.6766663Z ld.shared.b16 %rs2070, [%r83+9216]; 2026-02-21T10:18:33.6766735Z ld.shared.b16 %rs2071, [%r83+8256]; 2026-02-21T10:18:33.6766798Z ld.shared.b16 %rs2072, [%r83+9280]; 2026-02-21T10:18:33.6766863Z ld.shared.b16 %rs2073, [%r84]; 2026-02-21T10:18:33.6766941Z ld.shared.b16 %rs2074, [%r84+1024]; 2026-02-21T10:18:33.6767007Z ld.shared.b16 %rs2075, [%r84+64]; 2026-02-21T10:18:33.6767073Z ld.shared.b16 %rs2076, [%r84+1088]; 2026-02-21T10:18:33.6767268Z ld.shared.b16 %rs2077, [%r84+8192]; 2026-02-21T10:18:33.6767393Z ld.shared.b16 %rs2078, [%r84+9216]; 2026-02-21T10:18:33.6767458Z ld.shared.b16 %rs2079, [%r84+8256]; 2026-02-21T10:18:33.6767522Z ld.shared.b16 %rs2080, [%r84+9280]; 2026-02-21T10:18:33.6767588Z cvt.f32.bf16 %r24481, %rs2017; 2026-02-21T10:18:33.6767651Z cvt.f32.bf16 %r24482, %rs2018; 2026-02-21T10:18:33.6767710Z cvt.f32.bf16 %r24483, %rs2025; 2026-02-21T10:18:33.6767774Z cvt.f32.bf16 %r24484, %rs2026; 2026-02-21T10:18:33.6767843Z cvt.f32.bf16 %r24613, %rs2033; 2026-02-21T10:18:33.6767905Z cvt.f32.bf16 %r24614, %rs2034; 2026-02-21T10:18:33.6767966Z cvt.f32.bf16 %r24615, %rs2041; 2026-02-21T10:18:33.6768028Z 
cvt.f32.bf16 %r24616, %rs2042; 2026-02-21T10:18:33.6768087Z cvt.f32.bf16 %r24745, %rs2049; 2026-02-21T10:18:33.6768145Z cvt.f32.bf16 %r24746, %rs2050; 2026-02-21T10:18:33.6768210Z cvt.f32.bf16 %r24747, %rs2057; 2026-02-21T10:18:33.6768269Z cvt.f32.bf16 %r24748, %rs2058; 2026-02-21T10:18:33.6768331Z cvt.f32.bf16 %r24877, %rs2065; 2026-02-21T10:18:33.6768397Z cvt.f32.bf16 %r24878, %rs2066; 2026-02-21T10:18:33.6768464Z cvt.f32.bf16 %r24879, %rs2073; 2026-02-21T10:18:33.6768591Z cvt.f32.bf16 %r24880, %rs2074; 2026-02-21T10:18:33.6768665Z cvt.f32.bf16 %r25009, %rs2019; 2026-02-21T10:18:33.6768732Z cvt.f32.bf16 %r25010, %rs2020; 2026-02-21T10:18:33.6768794Z cvt.f32.bf16 %r25011, %rs2027; 2026-02-21T10:18:33.6768853Z cvt.f32.bf16 %r25012, %rs2028; 2026-02-21T10:18:33.6768994Z cvt.f32.bf16 %r25141, %rs2035; 2026-02-21T10:18:33.6769061Z cvt.f32.bf16 %r25142, %rs2036; 2026-02-21T10:18:33.6769122Z cvt.f32.bf16 %r25143, %rs2043; 2026-02-21T10:18:33.6769181Z cvt.f32.bf16 %r25144, %rs2044; 2026-02-21T10:18:33.6769246Z cvt.f32.bf16 %r25273, %rs2051; 2026-02-21T10:18:33.6769305Z cvt.f32.bf16 %r25274, %rs2052; 2026-02-21T10:18:33.6769366Z cvt.f32.bf16 %r25275, %rs2059; 2026-02-21T10:18:33.6769428Z cvt.f32.bf16 %r25276, %rs2060; 2026-02-21T10:18:33.6769486Z cvt.f32.bf16 %r25405, %rs2067; 2026-02-21T10:18:33.6769548Z cvt.f32.bf16 %r25406, %rs2068; 2026-02-21T10:18:33.6769608Z cvt.f32.bf16 %r25407, %rs2075; 2026-02-21T10:18:33.6769684Z cvt.f32.bf16 %r25408, %rs2076; 2026-02-21T10:18:33.6769747Z cvt.f32.bf16 %r25537, %rs2021; 2026-02-21T10:18:33.6769806Z cvt.f32.bf16 %r25538, %rs2022; 2026-02-21T10:18:33.6769867Z cvt.f32.bf16 %r25539, %rs2029; 2026-02-21T10:18:33.6769927Z cvt.f32.bf16 %r25540, %rs2030; 2026-02-21T10:18:33.6769988Z cvt.f32.bf16 %r25669, %rs2037; 2026-02-21T10:18:33.6770047Z cvt.f32.bf16 %r25670, %rs2038; 2026-02-21T10:18:33.6770110Z cvt.f32.bf16 %r25671, %rs2045; 2026-02-21T10:18:33.6770168Z cvt.f32.bf16 %r25672, %rs2046; 2026-02-21T10:18:33.6770227Z cvt.f32.bf16 
%r25801, %rs2053; 2026-02-21T10:18:33.6770288Z cvt.f32.bf16 %r25802, %rs2054; 2026-02-21T10:18:33.6770348Z cvt.f32.bf16 %r25803, %rs2061; 2026-02-21T10:18:33.6770407Z cvt.f32.bf16 %r25804, %rs2062; 2026-02-21T10:18:33.6770467Z cvt.f32.bf16 %r25933, %rs2069; 2026-02-21T10:18:33.6770534Z cvt.f32.bf16 %r25934, %rs2070; 2026-02-21T10:18:33.6770594Z cvt.f32.bf16 %r25935, %rs2077; 2026-02-21T10:18:33.6770655Z cvt.f32.bf16 %r25936, %rs2078; 2026-02-21T10:18:33.6770718Z cvt.f32.bf16 %r26065, %rs2023; 2026-02-21T10:18:33.6770778Z cvt.f32.bf16 %r26066, %rs2024; 2026-02-21T10:18:33.6770836Z cvt.f32.bf16 %r26067, %rs2031; 2026-02-21T10:18:33.6770898Z cvt.f32.bf16 %r26068, %rs2032; 2026-02-21T10:18:33.6770957Z cvt.f32.bf16 %r26197, %rs2039; 2026-02-21T10:18:33.6771017Z cvt.f32.bf16 %r26198, %rs2040; 2026-02-21T10:18:33.6771077Z cvt.f32.bf16 %r26199, %rs2047; 2026-02-21T10:18:33.6771138Z cvt.f32.bf16 %r26200, %rs2048; 2026-02-21T10:18:33.6771198Z cvt.f32.bf16 %r26329, %rs2055; 2026-02-21T10:18:33.6771258Z cvt.f32.bf16 %r26330, %rs2056; 2026-02-21T10:18:33.6771319Z cvt.f32.bf16 %r26331, %rs2063; 2026-02-21T10:18:33.6771379Z cvt.f32.bf16 %r26332, %rs2064; 2026-02-21T10:18:33.6771437Z cvt.f32.bf16 %r26461, %rs2071; 2026-02-21T10:18:33.6771495Z cvt.f32.bf16 %r26462, %rs2072; 2026-02-21T10:18:33.6771617Z cvt.f32.bf16 %r26463, %rs2079; 2026-02-21T10:18:33.6771719Z cvt.f32.bf16 %r26464, %rs2080; 2026-02-21T10:18:33.6771956Z .loc 1 64 33 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:64:33 2026-02-21T10:18:33.6772016Z bar.sync 0; 2026-02-21T10:18:33.6772076Z // begin inline asm 2026-02-21T10:18:33.6772185Z @%p222 mbarrier.init.shared::cta.b64 [%r29375], 1; 2026-02-21T10:18:33.6772245Z // end inline asm 2026-02-21T10:18:33.6772304Z bar.sync 0; 2026-02-21T10:18:33.6772363Z // begin inline asm 2026-02-21T10:18:33.6772501Z @%p222 mbarrier.arrive.expect_tx.shared.b64 _, [%r29375], 4096; 2026-02-21T10:18:33.6772560Z // end inline asm 2026-02-21T10:18:33.6772619Z // begin inline 
asm 2026-02-21T10:18:33.6772697Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.6772755Z // end inline asm 2026-02-21T10:18:33.6772811Z bar.sync 0; 2026-02-21T10:18:33.6772879Z elect.sync %r29210|%p284, -1; 2026-02-21T10:18:33.6772949Z and.pred %p244, %p1, %p284; 2026-02-21T10:18:33.6773014Z cvt.u32.u64 %r29211, %rd662; 2026-02-21T10:18:33.6773079Z add.s32 %r24348, %r29211, 128; 2026-02-21T10:18:33.6773204Z // begin inline asm 2026-02-21T10:18:33.6773553Z @%p244 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r29377], [%rd633, {%r29378, %r24348}], [%r29375]; 2026-02-21T10:18:33.6773610Z // end inline asm 2026-02-21T10:18:33.6773664Z bar.sync 0; 2026-02-21T10:18:33.6773773Z // begin inline asm 2026-02-21T10:18:33.6773830Z 2026-02-21T10:18:33.6773880Z { 2026-02-21T10:18:33.6773944Z .reg .pred complete; 2026-02-21T10:18:33.6774002Z waitLoop: 2026-02-21T10:18:33.6774149Z mbarrier.try_wait.parity.shared.b64 complete, [%r29375], %r29010; 2026-02-21T10:18:33.6774221Z @!complete bra.uni waitLoop; 2026-02-21T10:18:33.6774271Z } 2026-02-21T10:18:33.6774280Z 2026-02-21T10:18:33.6774337Z // end inline asm 2026-02-21T10:18:33.6774393Z bar.sync 0; 2026-02-21T10:18:33.6774458Z // begin inline asm 2026-02-21T10:18:33.6774559Z @%p222 mbarrier.inval.shared::cta.b64 [%r29375]; 2026-02-21T10:18:33.6774617Z // end inline asm 2026-02-21T10:18:33.6774831Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6774901Z ld.shared.s8 %rs2081, [%r85]; 2026-02-21T10:18:33.6775097Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6775163Z shl.b16 %rs2082, %rs2081, 4; 2026-02-21T10:18:33.6775357Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6775425Z ld.shared.s8 %rs2083, [%r86+128]; 2026-02-21T10:18:33.6775616Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6775677Z 
shl.b16 %rs2084, %rs2083, 4; 2026-02-21T10:18:33.6775871Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6775937Z ld.shared.s8 %rs2085, [%r87+256]; 2026-02-21T10:18:33.6776134Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6776197Z shl.b16 %rs2086, %rs2085, 4; 2026-02-21T10:18:33.6776387Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6776599Z ld.shared.s8 %rs2087, [%r88+384]; 2026-02-21T10:18:33.6776803Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6776863Z shl.b16 %rs2088, %rs2087, 4; 2026-02-21T10:18:33.6777055Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6777119Z ld.shared.s8 %rs2089, [%r89+512]; 2026-02-21T10:18:33.6777310Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6777370Z shl.b16 %rs2090, %rs2089, 4; 2026-02-21T10:18:33.6777660Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6777787Z ld.shared.s8 %rs2091, [%r90+640]; 2026-02-21T10:18:33.6777976Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6778037Z shl.b16 %rs2092, %rs2091, 4; 2026-02-21T10:18:33.6778230Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6778292Z ld.shared.s8 %rs2093, [%r91+768]; 2026-02-21T10:18:33.6778482Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6778544Z shl.b16 %rs2094, %rs2093, 4; 2026-02-21T10:18:33.6778740Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6778807Z ld.shared.s8 %rs2095, [%r92+896]; 2026-02-21T10:18:33.6779016Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 
2026-02-21T10:18:33.6779080Z shl.b16 %rs2096, %rs2095, 4; 2026-02-21T10:18:33.6779338Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6779410Z ld.shared.s8 %rs2097, [%r85+1024]; 2026-02-21T10:18:33.6779607Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6779733Z shl.b16 %rs2098, %rs2097, 4; 2026-02-21T10:18:33.6779932Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6780002Z ld.shared.s8 %rs2099, [%r86+1152]; 2026-02-21T10:18:33.6780194Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6780255Z shl.b16 %rs2100, %rs2099, 4; 2026-02-21T10:18:33.6780450Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6780514Z ld.shared.s8 %rs2101, [%r87+1280]; 2026-02-21T10:18:33.6780708Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6780771Z shl.b16 %rs2102, %rs2101, 4; 2026-02-21T10:18:33.6780961Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6781038Z ld.shared.s8 %rs2103, [%r88+1408]; 2026-02-21T10:18:33.6781233Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6781296Z shl.b16 %rs2104, %rs2103, 4; 2026-02-21T10:18:33.6781487Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6781550Z ld.shared.s8 %rs2105, [%r89+1536]; 2026-02-21T10:18:33.6781744Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6781805Z shl.b16 %rs2106, %rs2105, 4; 2026-02-21T10:18:33.6781997Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6782064Z ld.shared.s8 %rs2107, [%r90+1664]; 2026-02-21T10:18:33.6782255Z .loc 1 67 28 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6782316Z shl.b16 %rs2108, %rs2107, 4; 2026-02-21T10:18:33.6782510Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6782574Z ld.shared.s8 %rs2109, [%r91+1792]; 2026-02-21T10:18:33.6782766Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6782827Z shl.b16 %rs2110, %rs2109, 4; 2026-02-21T10:18:33.6783020Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6783082Z ld.shared.s8 %rs2111, [%r92+1920]; 2026-02-21T10:18:33.6783338Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6783464Z shl.b16 %rs2112, %rs2111, 4; 2026-02-21T10:18:33.6783657Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6783719Z ld.shared.s8 %rs2113, [%r85+2048]; 2026-02-21T10:18:33.6783917Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6783977Z shl.b16 %rs2114, %rs2113, 4; 2026-02-21T10:18:33.6784166Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6784232Z ld.shared.s8 %rs2115, [%r86+2176]; 2026-02-21T10:18:33.6784422Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6784482Z shl.b16 %rs2116, %rs2115, 4; 2026-02-21T10:18:33.6784683Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6784753Z ld.shared.s8 %rs2117, [%r87+2304]; 2026-02-21T10:18:33.6785003Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6785067Z shl.b16 %rs2118, %rs2117, 4; 2026-02-21T10:18:33.6785270Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6785381Z ld.shared.s8 %rs2119, [%r88+2432]; 2026-02-21T10:18:33.6785579Z 
.loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6785644Z shl.b16 %rs2120, %rs2119, 4; 2026-02-21T10:18:33.6785835Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6785898Z ld.shared.s8 %rs2121, [%r89+2560]; 2026-02-21T10:18:33.6786091Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6786152Z shl.b16 %rs2122, %rs2121, 4; 2026-02-21T10:18:33.6786347Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6786413Z ld.shared.s8 %rs2123, [%r90+2688]; 2026-02-21T10:18:33.6786718Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6786785Z shl.b16 %rs2124, %rs2123, 4; 2026-02-21T10:18:33.6786980Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6787045Z ld.shared.s8 %rs2125, [%r91+2816]; 2026-02-21T10:18:33.6787237Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6787297Z shl.b16 %rs2126, %rs2125, 4; 2026-02-21T10:18:33.6787491Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6787556Z ld.shared.s8 %rs2127, [%r92+2944]; 2026-02-21T10:18:33.6787748Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6787812Z shl.b16 %rs2128, %rs2127, 4; 2026-02-21T10:18:33.6788003Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6788066Z ld.shared.s8 %rs2129, [%r85+3072]; 2026-02-21T10:18:33.6788261Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6788332Z shl.b16 %rs2130, %rs2129, 4; 2026-02-21T10:18:33.6788608Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6788674Z ld.shared.s8 %rs2131, [%r86+3200]; 
2026-02-21T10:18:33.6788866Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6788927Z shl.b16 %rs2132, %rs2131, 4; 2026-02-21T10:18:33.6789195Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6789322Z ld.shared.s8 %rs2133, [%r87+3328]; 2026-02-21T10:18:33.6789515Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6789575Z shl.b16 %rs2134, %rs2133, 4; 2026-02-21T10:18:33.6789770Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6789832Z ld.shared.s8 %rs2135, [%r88+3456]; 2026-02-21T10:18:33.6790024Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6790087Z shl.b16 %rs2136, %rs2135, 4; 2026-02-21T10:18:33.6790278Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6790340Z ld.shared.s8 %rs2137, [%r89+3584]; 2026-02-21T10:18:33.6790544Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6790611Z shl.b16 %rs2138, %rs2137, 4; 2026-02-21T10:18:33.6790888Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6790953Z ld.shared.s8 %rs2139, [%r90+3712]; 2026-02-21T10:18:33.6791147Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6791264Z shl.b16 %rs2140, %rs2139, 4; 2026-02-21T10:18:33.6791457Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6791524Z ld.shared.s8 %rs2141, [%r91+3840]; 2026-02-21T10:18:33.6791712Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6791771Z shl.b16 %rs2142, %rs2141, 4; 2026-02-21T10:18:33.6791964Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6792030Z ld.shared.s8 
%rs2143, [%r92+3968]; 2026-02-21T10:18:33.6792223Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6792283Z shl.b16 %rs2144, %rs2143, 4; 2026-02-21T10:18:33.6792481Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6792543Z cvt.s16.s8 %rs2145, %rs2082; 2026-02-21T10:18:33.6792602Z shr.s16 %rs2146, %rs2145, 4; 2026-02-21T10:18:33.6792676Z cvt.s16.s8 %rs2147, %rs2084; 2026-02-21T10:18:33.6792738Z shr.s16 %rs2148, %rs2147, 4; 2026-02-21T10:18:33.6792799Z shr.s16 %rs2149, %rs2081, 4; 2026-02-21T10:18:33.6792861Z shr.s16 %rs2150, %rs2083, 4; 2026-02-21T10:18:33.6793056Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6793122Z cvt.rn.f32.s16 %r29212, %rs2150; 2026-02-21T10:18:33.6793189Z cvt.rn.f32.s16 %r29213, %rs2149; 2026-02-21T10:18:33.6793252Z cvt.rn.f32.s16 %r29214, %rs2148; 2026-02-21T10:18:33.6793314Z cvt.rn.f32.s16 %r29215, %rs2146; 2026-02-21T10:18:33.6793507Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6793571Z cvt.s16.s8 %rs2151, %rs2086; 2026-02-21T10:18:33.6793631Z shr.s16 %rs2152, %rs2151, 4; 2026-02-21T10:18:33.6793690Z cvt.s16.s8 %rs2153, %rs2088; 2026-02-21T10:18:33.6793753Z shr.s16 %rs2154, %rs2153, 4; 2026-02-21T10:18:33.6793812Z shr.s16 %rs2155, %rs2085, 4; 2026-02-21T10:18:33.6793871Z shr.s16 %rs2156, %rs2087, 4; 2026-02-21T10:18:33.6794063Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6794129Z cvt.rn.f32.s16 %r29216, %rs2156; 2026-02-21T10:18:33.6794190Z cvt.rn.f32.s16 %r29217, %rs2155; 2026-02-21T10:18:33.6794251Z cvt.rn.f32.s16 %r29218, %rs2154; 2026-02-21T10:18:33.6794313Z cvt.rn.f32.s16 %r29219, %rs2152; 2026-02-21T10:18:33.6794584Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6794691Z cvt.s16.s8 %rs2157, %rs2090; 
2026-02-21T10:18:33.6794755Z shr.s16 %rs2158, %rs2157, 4; 2026-02-21T10:18:33.6794816Z cvt.s16.s8 %rs2159, %rs2092; 2026-02-21T10:18:33.6794876Z shr.s16 %rs2160, %rs2159, 4; 2026-02-21T10:18:33.6794936Z shr.s16 %rs2161, %rs2089, 4; 2026-02-21T10:18:33.6794999Z shr.s16 %rs2162, %rs2091, 4; 2026-02-21T10:18:33.6795201Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6795274Z cvt.rn.f32.s16 %r29220, %rs2162; 2026-02-21T10:18:33.6795341Z cvt.rn.f32.s16 %r29221, %rs2161; 2026-02-21T10:18:33.6795402Z cvt.rn.f32.s16 %r29222, %rs2160; 2026-02-21T10:18:33.6795462Z cvt.rn.f32.s16 %r29223, %rs2158; 2026-02-21T10:18:33.6795659Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6795722Z cvt.s16.s8 %rs2163, %rs2094; 2026-02-21T10:18:33.6795783Z shr.s16 %rs2164, %rs2163, 4; 2026-02-21T10:18:33.6795841Z cvt.s16.s8 %rs2165, %rs2096; 2026-02-21T10:18:33.6795954Z shr.s16 %rs2166, %rs2165, 4; 2026-02-21T10:18:33.6796017Z shr.s16 %rs2167, %rs2093, 4; 2026-02-21T10:18:33.6796076Z shr.s16 %rs2168, %rs2095, 4; 2026-02-21T10:18:33.6796317Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6796381Z cvt.rn.f32.s16 %r29224, %rs2168; 2026-02-21T10:18:33.6796442Z cvt.rn.f32.s16 %r29225, %rs2167; 2026-02-21T10:18:33.6796630Z cvt.rn.f32.s16 %r29226, %rs2166; 2026-02-21T10:18:33.6796696Z cvt.rn.f32.s16 %r29227, %rs2164; 2026-02-21T10:18:33.6796903Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6796968Z cvt.s16.s8 %rs2169, %rs2098; 2026-02-21T10:18:33.6797028Z shr.s16 %rs2170, %rs2169, 4; 2026-02-21T10:18:33.6797089Z cvt.s16.s8 %rs2171, %rs2100; 2026-02-21T10:18:33.6797148Z shr.s16 %rs2172, %rs2171, 4; 2026-02-21T10:18:33.6797211Z shr.s16 %rs2173, %rs2097, 4; 2026-02-21T10:18:33.6797276Z shr.s16 %rs2174, %rs2099, 4; 2026-02-21T10:18:33.6797468Z .loc 1 87 32 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6797531Z cvt.rn.f32.s16 %r29228, %rs2174; 2026-02-21T10:18:33.6797593Z cvt.rn.f32.s16 %r29229, %rs2173; 2026-02-21T10:18:33.6797654Z cvt.rn.f32.s16 %r29230, %rs2172; 2026-02-21T10:18:33.6797714Z cvt.rn.f32.s16 %r29231, %rs2170; 2026-02-21T10:18:33.6797908Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6797968Z cvt.s16.s8 %rs2175, %rs2102; 2026-02-21T10:18:33.6798028Z shr.s16 %rs2176, %rs2175, 4; 2026-02-21T10:18:33.6798090Z cvt.s16.s8 %rs2177, %rs2104; 2026-02-21T10:18:33.6798149Z shr.s16 %rs2178, %rs2177, 4; 2026-02-21T10:18:33.6798208Z shr.s16 %rs2179, %rs2101, 4; 2026-02-21T10:18:33.6798271Z shr.s16 %rs2180, %rs2103, 4; 2026-02-21T10:18:33.6798467Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6798529Z cvt.rn.f32.s16 %r29232, %rs2180; 2026-02-21T10:18:33.6798591Z cvt.rn.f32.s16 %r29233, %rs2179; 2026-02-21T10:18:33.6798653Z cvt.rn.f32.s16 %r29234, %rs2178; 2026-02-21T10:18:33.6798712Z cvt.rn.f32.s16 %r29235, %rs2176; 2026-02-21T10:18:33.6798905Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6798969Z cvt.s16.s8 %rs2181, %rs2106; 2026-02-21T10:18:33.6799030Z shr.s16 %rs2182, %rs2181, 4; 2026-02-21T10:18:33.6799091Z cvt.s16.s8 %rs2183, %rs2108; 2026-02-21T10:18:33.6799153Z shr.s16 %rs2184, %rs2183, 4; 2026-02-21T10:18:33.6799212Z shr.s16 %rs2185, %rs2105, 4; 2026-02-21T10:18:33.6799272Z shr.s16 %rs2186, %rs2107, 4; 2026-02-21T10:18:33.6799464Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6799682Z cvt.rn.f32.s16 %r29236, %rs2186; 2026-02-21T10:18:33.6799743Z cvt.rn.f32.s16 %r29237, %rs2185; 2026-02-21T10:18:33.6799805Z cvt.rn.f32.s16 %r29238, %rs2184; 2026-02-21T10:18:33.6799867Z cvt.rn.f32.s16 %r29239, %rs2182; 2026-02-21T10:18:33.6800064Z .loc 1 69 25 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6800126Z cvt.s16.s8 %rs2187, %rs2110; 2026-02-21T10:18:33.6800188Z shr.s16 %rs2188, %rs2187, 4; 2026-02-21T10:18:33.6800247Z cvt.s16.s8 %rs2189, %rs2112; 2026-02-21T10:18:33.6800306Z shr.s16 %rs2190, %rs2189, 4; 2026-02-21T10:18:33.6800365Z shr.s16 %rs2191, %rs2109, 4; 2026-02-21T10:18:33.6800426Z shr.s16 %rs2192, %rs2111, 4; 2026-02-21T10:18:33.6800620Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6800682Z cvt.rn.f32.s16 %r29240, %rs2192; 2026-02-21T10:18:33.6800746Z cvt.rn.f32.s16 %r29241, %rs2191; 2026-02-21T10:18:33.6800806Z cvt.rn.f32.s16 %r29242, %rs2190; 2026-02-21T10:18:33.6800868Z cvt.rn.f32.s16 %r29243, %rs2188; 2026-02-21T10:18:33.6801122Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6801198Z cvt.s16.s8 %rs2193, %rs2114; 2026-02-21T10:18:33.6801261Z shr.s16 %rs2194, %rs2193, 4; 2026-02-21T10:18:33.6801321Z cvt.s16.s8 %rs2195, %rs2116; 2026-02-21T10:18:33.6801439Z shr.s16 %rs2196, %rs2195, 4; 2026-02-21T10:18:33.6801504Z shr.s16 %rs2197, %rs2113, 4; 2026-02-21T10:18:33.6801565Z shr.s16 %rs2198, %rs2115, 4; 2026-02-21T10:18:33.6801760Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6801821Z cvt.rn.f32.s16 %r29244, %rs2198; 2026-02-21T10:18:33.6801881Z cvt.rn.f32.s16 %r29245, %rs2197; 2026-02-21T10:18:33.6801939Z cvt.rn.f32.s16 %r29246, %rs2196; 2026-02-21T10:18:33.6802004Z cvt.rn.f32.s16 %r29247, %rs2194; 2026-02-21T10:18:33.6802194Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6802257Z cvt.s16.s8 %rs2199, %rs2118; 2026-02-21T10:18:33.6802318Z shr.s16 %rs2200, %rs2199, 4; 2026-02-21T10:18:33.6802387Z cvt.s16.s8 %rs2201, %rs2120; 2026-02-21T10:18:33.6802448Z shr.s16 %rs2202, %rs2201, 4; 2026-02-21T10:18:33.6802509Z shr.s16 %rs2203, %rs2117, 4; 
2026-02-21T10:18:33.6802570Z shr.s16 %rs2204, %rs2119, 4; 2026-02-21T10:18:33.6802763Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6802825Z cvt.rn.f32.s16 %r29248, %rs2204; 2026-02-21T10:18:33.6802887Z cvt.rn.f32.s16 %r29249, %rs2203; 2026-02-21T10:18:33.6802947Z cvt.rn.f32.s16 %r29250, %rs2202; 2026-02-21T10:18:33.6803007Z cvt.rn.f32.s16 %r29251, %rs2200; 2026-02-21T10:18:33.6803201Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6803262Z cvt.s16.s8 %rs2205, %rs2122; 2026-02-21T10:18:33.6803322Z shr.s16 %rs2206, %rs2205, 4; 2026-02-21T10:18:33.6803380Z cvt.s16.s8 %rs2207, %rs2124; 2026-02-21T10:18:33.6803443Z shr.s16 %rs2208, %rs2207, 4; 2026-02-21T10:18:33.6803502Z shr.s16 %rs2209, %rs2121, 4; 2026-02-21T10:18:33.6803562Z shr.s16 %rs2210, %rs2123, 4; 2026-02-21T10:18:33.6803758Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6803820Z cvt.rn.f32.s16 %r29252, %rs2210; 2026-02-21T10:18:33.6803882Z cvt.rn.f32.s16 %r29253, %rs2209; 2026-02-21T10:18:33.6803944Z cvt.rn.f32.s16 %r29254, %rs2208; 2026-02-21T10:18:33.6804004Z cvt.rn.f32.s16 %r29255, %rs2206; 2026-02-21T10:18:33.6804195Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6804256Z cvt.s16.s8 %rs2211, %rs2126; 2026-02-21T10:18:33.6804326Z shr.s16 %rs2212, %rs2211, 4; 2026-02-21T10:18:33.6804446Z cvt.s16.s8 %rs2213, %rs2128; 2026-02-21T10:18:33.6804549Z shr.s16 %rs2214, %rs2213, 4; 2026-02-21T10:18:33.6804612Z shr.s16 %rs2215, %rs2125, 4; 2026-02-21T10:18:33.6804672Z shr.s16 %rs2216, %rs2127, 4; 2026-02-21T10:18:33.6804863Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6804929Z cvt.rn.f32.s16 %r29256, %rs2216; 2026-02-21T10:18:33.6804991Z cvt.rn.f32.s16 %r29257, %rs2215; 2026-02-21T10:18:33.6805053Z cvt.rn.f32.s16 %r29258, %rs2214; 2026-02-21T10:18:33.6805113Z 
cvt.rn.f32.s16 %r29259, %rs2212; 2026-02-21T10:18:33.6805303Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6805362Z cvt.s16.s8 %rs2217, %rs2130; 2026-02-21T10:18:33.6805426Z shr.s16 %rs2218, %rs2217, 4; 2026-02-21T10:18:33.6805485Z cvt.s16.s8 %rs2219, %rs2132; 2026-02-21T10:18:33.6805543Z shr.s16 %rs2220, %rs2219, 4; 2026-02-21T10:18:33.6805607Z shr.s16 %rs2221, %rs2129, 4; 2026-02-21T10:18:33.6805667Z shr.s16 %rs2222, %rs2131, 4; 2026-02-21T10:18:33.6805928Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6806007Z cvt.rn.f32.s16 %r29260, %rs2222; 2026-02-21T10:18:33.6806070Z cvt.rn.f32.s16 %r29261, %rs2221; 2026-02-21T10:18:33.6806131Z cvt.rn.f32.s16 %r29262, %rs2220; 2026-02-21T10:18:33.6806191Z cvt.rn.f32.s16 %r29263, %rs2218; 2026-02-21T10:18:33.6806433Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6806610Z cvt.s16.s8 %rs2223, %rs2134; 2026-02-21T10:18:33.6806673Z shr.s16 %rs2224, %rs2223, 4; 2026-02-21T10:18:33.6806735Z cvt.s16.s8 %rs2225, %rs2136; 2026-02-21T10:18:33.6806794Z shr.s16 %rs2226, %rs2225, 4; 2026-02-21T10:18:33.6806855Z shr.s16 %rs2227, %rs2133, 4; 2026-02-21T10:18:33.6806913Z shr.s16 %rs2228, %rs2135, 4; 2026-02-21T10:18:33.6807106Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6807172Z cvt.rn.f32.s16 %r29264, %rs2228; 2026-02-21T10:18:33.6807234Z cvt.rn.f32.s16 %r29265, %rs2227; 2026-02-21T10:18:33.6807295Z cvt.rn.f32.s16 %r29266, %rs2226; 2026-02-21T10:18:33.6807354Z cvt.rn.f32.s16 %r29267, %rs2224; 2026-02-21T10:18:33.6807550Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6807623Z cvt.s16.s8 %rs2229, %rs2138; 2026-02-21T10:18:33.6807684Z shr.s16 %rs2230, %rs2229, 4; 2026-02-21T10:18:33.6807743Z cvt.s16.s8 %rs2231, %rs2140; 2026-02-21T10:18:33.6807802Z shr.s16 %rs2232, %rs2231, 4; 
2026-02-21T10:18:33.6807862Z shr.s16 %rs2233, %rs2137, 4; 2026-02-21T10:18:33.6807919Z shr.s16 %rs2234, %rs2139, 4; 2026-02-21T10:18:33.6808110Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6808173Z cvt.rn.f32.s16 %r29268, %rs2234; 2026-02-21T10:18:33.6808235Z cvt.rn.f32.s16 %r29269, %rs2233; 2026-02-21T10:18:33.6808297Z cvt.rn.f32.s16 %r29270, %rs2232; 2026-02-21T10:18:33.6808359Z cvt.rn.f32.s16 %r29271, %rs2230; 2026-02-21T10:18:33.6808553Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6808614Z cvt.s16.s8 %rs2235, %rs2142; 2026-02-21T10:18:33.6808674Z shr.s16 %rs2236, %rs2235, 4; 2026-02-21T10:18:33.6808736Z cvt.s16.s8 %rs2237, %rs2144; 2026-02-21T10:18:33.6808796Z shr.s16 %rs2238, %rs2237, 4; 2026-02-21T10:18:33.6808855Z shr.s16 %rs2239, %rs2141, 4; 2026-02-21T10:18:33.6808915Z shr.s16 %rs2240, %rs2143, 4; 2026-02-21T10:18:33.6809116Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6809180Z cvt.rn.f32.s16 %r29272, %rs2240; 2026-02-21T10:18:33.6809248Z cvt.rn.f32.s16 %r29273, %rs2239; 2026-02-21T10:18:33.6809307Z cvt.rn.f32.s16 %r29274, %rs2238; 2026-02-21T10:18:33.6809449Z cvt.rn.f32.s16 %r29275, %rs2236; 2026-02-21T10:18:33.6809503Z bar.sync 0; 2026-02-21T10:18:33.6809709Z st.shared.v4.b32 [%r93], {%r29215, %r29213, %r29214, %r29212}; 2026-02-21T10:18:33.6809847Z st.shared.v4.b32 [%r93+16384], {%r29247, %r29245, %r29246, %r29244}; 2026-02-21T10:18:33.6809964Z st.shared.v4.b32 [%r94], {%r29219, %r29217, %r29218, %r29216}; 2026-02-21T10:18:33.6810085Z st.shared.v4.b32 [%r94+16384], {%r29251, %r29249, %r29250, %r29248}; 2026-02-21T10:18:33.6810196Z st.shared.v4.b32 [%r95], {%r29223, %r29221, %r29222, %r29220}; 2026-02-21T10:18:33.6810311Z st.shared.v4.b32 [%r95+16384], {%r29255, %r29253, %r29254, %r29252}; 2026-02-21T10:18:33.6810418Z st.shared.v4.b32 [%r96], {%r29227, %r29225, %r29226, %r29224}; 
2026-02-21T10:18:33.6810533Z st.shared.v4.b32 [%r96+16384], {%r29259, %r29257, %r29258, %r29256}; 2026-02-21T10:18:33.6810638Z st.shared.v4.b32 [%r97], {%r29231, %r29229, %r29230, %r29228}; 2026-02-21T10:18:33.6810752Z st.shared.v4.b32 [%r97+16384], {%r29263, %r29261, %r29262, %r29260}; 2026-02-21T10:18:33.6810863Z st.shared.v4.b32 [%r98], {%r29235, %r29233, %r29234, %r29232}; 2026-02-21T10:18:33.6811053Z st.shared.v4.b32 [%r98+16384], {%r29267, %r29265, %r29266, %r29264}; 2026-02-21T10:18:33.6811167Z st.shared.v4.b32 [%r99], {%r29239, %r29237, %r29238, %r29236}; 2026-02-21T10:18:33.6811287Z st.shared.v4.b32 [%r99+16384], {%r29271, %r29269, %r29270, %r29268}; 2026-02-21T10:18:33.6811456Z st.shared.v4.b32 [%r100], {%r29243, %r29241, %r29242, %r29240}; 2026-02-21T10:18:33.6811583Z st.shared.v4.b32 [%r100+16384], {%r29275, %r29273, %r29274, %r29272}; 2026-02-21T10:18:33.6811639Z $L__tmp19: 2026-02-21T10:18:33.6811913Z .loc 2 291 36 // standard.py:291:36 @[ c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:94:40 ] 2026-02-21T10:18:33.6811973Z // begin inline asm 2026-02-21T10:18:33.6812055Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.6812110Z // end inline asm 2026-02-21T10:18:33.6812165Z bar.sync 0; 2026-02-21T10:18:33.6812249Z wgmma.fence.sync.aligned; 2026-02-21T10:18:33.6812313Z // begin inline asm 2026-02-21T10:18:33.6813814Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r24481,%r24482,%r24483,%r24484}, %rd12, %p226, 1, 1; 
2026-02-21T10:18:33.6813874Z // end inline asm 2026-02-21T10:18:33.6813933Z // begin inline asm 2026-02-21T10:18:33.6815429Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r24613,%r24614,%r24615,%r24616}, %rd13, %p226, 1, 1; 2026-02-21T10:18:33.6815493Z // end inline asm 2026-02-21T10:18:33.6815551Z // begin inline asm 2026-02-21T10:18:33.6817159Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r24745,%r24746,%r24747,%r24748}, %rd14, %p226, 1, 1; 2026-02-21T10:18:33.6817369Z // end inline asm 2026-02-21T10:18:33.6817428Z // begin inline asm 2026-02-21T10:18:33.6818915Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r24877,%r24878,%r24879,%r24880}, %rd15, %p226, 1, 1; 2026-02-21T10:18:33.6818972Z // end inline asm 2026-02-21T10:18:33.6819031Z // begin inline asm 2026-02-21T10:18:33.6820619Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r25009,%r25010,%r25011,%r25012}, %rd16, %p226, 1, 1; 2026-02-21T10:18:33.6820681Z // end inline asm 2026-02-21T10:18:33.6820738Z // begin inline asm 2026-02-21T10:18:33.6822222Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r25141,%r25142,%r25143,%r25144}, %rd17, %p226, 1, 1; 2026-02-21T10:18:33.6822281Z // end inline asm 2026-02-21T10:18:33.6822342Z // begin inline asm 2026-02-21T10:18:33.6823837Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r25273,%r25274,%r25275,%r25276}, %rd18, %p226, 1, 1; 2026-02-21T10:18:33.6823897Z // end inline asm 2026-02-21T10:18:33.6823958Z // begin inline asm 2026-02-21T10:18:33.6825451Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r25405,%r25406,%r25407,%r25408}, %rd19, %p226, 1, 1; 2026-02-21T10:18:33.6825523Z // end inline asm 2026-02-21T10:18:33.6825582Z // begin inline asm 2026-02-21T10:18:33.6827178Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r25537,%r25538,%r25539,%r25540}, %rd12, %p226, 1, 1; 2026-02-21T10:18:33.6827382Z // end inline asm 2026-02-21T10:18:33.6827442Z // begin inline asm 2026-02-21T10:18:33.6829048Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r25669,%r25670,%r25671,%r25672}, %rd13, %p226, 1, 1; 2026-02-21T10:18:33.6829116Z // end inline asm 2026-02-21T10:18:33.6829176Z // begin inline asm 2026-02-21T10:18:33.6830713Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r25801,%r25802,%r25803,%r25804}, %rd14, %p226, 1, 1; 2026-02-21T10:18:33.6830774Z // end inline asm 2026-02-21T10:18:33.6830831Z // begin inline asm 2026-02-21T10:18:33.6832310Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r25933,%r25934,%r25935,%r25936}, %rd15, %p226, 1, 1; 2026-02-21T10:18:33.6832378Z // end inline asm 2026-02-21T10:18:33.6832438Z // begin inline asm 2026-02-21T10:18:33.6833921Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r26065,%r26066,%r26067,%r26068}, %rd16, %p226, 1, 1; 2026-02-21T10:18:33.6833979Z // end inline asm 2026-02-21T10:18:33.6834038Z // begin inline asm 2026-02-21T10:18:33.6835517Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r26197,%r26198,%r26199,%r26200}, %rd17, %p226, 1, 1; 2026-02-21T10:18:33.6835659Z // end inline asm 2026-02-21T10:18:33.6835718Z // begin inline asm 2026-02-21T10:18:33.6837310Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r26329,%r26330,%r26331,%r26332}, %rd18, %p226, 1, 1; 2026-02-21T10:18:33.6837373Z // end inline asm 2026-02-21T10:18:33.6837430Z // begin inline asm 2026-02-21T10:18:33.6839039Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r26461,%r26462,%r26463,%r26464}, %rd19, %p226, 1, 1; 2026-02-21T10:18:33.6839103Z // end inline asm 2026-02-21T10:18:33.6839181Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:33.6839244Z mov.b32 %r26593, %r29377; 2026-02-21T10:18:33.6839304Z mov.b32 %r26594, %r29010; 2026-02-21T10:18:33.6839361Z mov.b32 %r26595, %r29010; 2026-02-21T10:18:33.6839422Z // begin inline asm 2026-02-21T10:18:33.6842003Z // wait for regs: %r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484,%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548,%r26593,%r26594,%r26595 
2026-02-21T10:18:33.6842088Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:33.6842149Z // end inline asm 2026-02-21T10:18:33.6842202Z $L__tmp20: 2026-02-21T10:18:33.6842419Z .loc 1 57 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:57:25 2026-02-21T10:18:33.6842484Z add.s64 %rd605, %rd602, 128; 2026-02-21T10:18:33.6842691Z .loc 1 58 60 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:60 2026-02-21T10:18:33.6842753Z cvt.u32.u64 %r29276, %rd605; 2026-02-21T10:18:33.6842828Z add.s32 %r29277, %r1159, %r29276; 2026-02-21T10:18:33.6843030Z .loc 1 58 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:32 2026-02-21T10:18:33.6843093Z shl.b64 %rd606, %rd605, 1; 2026-02-21T10:18:33.6843159Z add.s64 %rd559, %rd69, %rd606; 2026-02-21T10:18:33.6843223Z add.s64 %rd562, %rd70, %rd606; 2026-02-21T10:18:33.6843384Z add.s64 %rd565, %rd71, %rd606; 2026-02-21T10:18:33.6843505Z add.s64 %rd568, %rd72, %rd606; 2026-02-21T10:18:33.6843568Z add.s64 %rd571, %rd73, %rd606; 2026-02-21T10:18:33.6843630Z add.s64 %rd574, %rd74, %rd606; 2026-02-21T10:18:33.6843689Z add.s64 %rd577, %rd75, %rd606; 2026-02-21T10:18:33.6843764Z mad.wide.s32 %rd580, %r29277, 2, %rd85; 2026-02-21T10:18:33.6843966Z .loc 1 58 80 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:80 2026-02-21T10:18:33.6844025Z // begin inline asm 2026-02-21T10:18:33.6844083Z mov.u64 %rd558, 0x0; 2026-02-21T10:18:33.6844217Z createpolicy.fractional.L2::evict_first.b64 %rd558, 1.0; 2026-02-21T10:18:33.6844275Z // end inline asm 2026-02-21T10:18:33.6844334Z // begin inline asm 2026-02-21T10:18:33.6844392Z mov.u32 %r26727, 0x0; 2026-02-21T10:18:33.6844450Z mov.u32 %r26728, 0x0; 2026-02-21T10:18:33.6844507Z mov.u32 %r26729, 0x0; 2026-02-21T10:18:33.6844562Z mov.u32 %r26730, 0x0; 2026-02-21T10:18:33.6844807Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r26727, %r26728, %r26729, %r26730 }, [ %rd559 + 0 ], %rd558; 2026-02-21T10:18:33.6844866Z // end inline asm 
2026-02-21T10:18:33.6844986Z // begin inline asm 2026-02-21T10:18:33.6845049Z mov.u64 %rd561, 0x0; 2026-02-21T10:18:33.6845177Z createpolicy.fractional.L2::evict_first.b64 %rd561, 1.0; 2026-02-21T10:18:33.6845234Z // end inline asm 2026-02-21T10:18:33.6845291Z // begin inline asm 2026-02-21T10:18:33.6845393Z mov.u32 %r26731, 0x0; 2026-02-21T10:18:33.6845453Z mov.u32 %r26732, 0x0; 2026-02-21T10:18:33.6845508Z mov.u32 %r26733, 0x0; 2026-02-21T10:18:33.6845565Z mov.u32 %r26734, 0x0; 2026-02-21T10:18:33.6845794Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r26731, %r26732, %r26733, %r26734 }, [ %rd562 + 0 ], %rd561; 2026-02-21T10:18:33.6845852Z // end inline asm 2026-02-21T10:18:33.6845909Z // begin inline asm 2026-02-21T10:18:33.6845969Z mov.u64 %rd564, 0x0; 2026-02-21T10:18:33.6846086Z createpolicy.fractional.L2::evict_first.b64 %rd564, 1.0; 2026-02-21T10:18:33.6846146Z // end inline asm 2026-02-21T10:18:33.6846206Z // begin inline asm 2026-02-21T10:18:33.6846261Z mov.u32 %r26735, 0x0; 2026-02-21T10:18:33.6846320Z mov.u32 %r26736, 0x0; 2026-02-21T10:18:33.6846376Z mov.u32 %r26737, 0x0; 2026-02-21T10:18:33.6846434Z mov.u32 %r26738, 0x0; 2026-02-21T10:18:33.6846781Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r26735, %r26736, %r26737, %r26738 }, [ %rd565 + 0 ], %rd564; 2026-02-21T10:18:33.6846841Z // end inline asm 2026-02-21T10:18:33.6846903Z // begin inline asm 2026-02-21T10:18:33.6846959Z mov.u64 %rd567, 0x0; 2026-02-21T10:18:33.6847074Z createpolicy.fractional.L2::evict_first.b64 %rd567, 1.0; 2026-02-21T10:18:33.6847134Z // end inline asm 2026-02-21T10:18:33.6847190Z // begin inline asm 2026-02-21T10:18:33.6847246Z mov.u32 %r26739, 0x0; 2026-02-21T10:18:33.6847303Z mov.u32 %r26740, 0x0; 2026-02-21T10:18:33.6847362Z mov.u32 %r26741, 0x0; 2026-02-21T10:18:33.6847420Z mov.u32 %r26742, 0x0; 2026-02-21T10:18:33.6847643Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r26739, %r26740, %r26741, %r26742 }, [ %rd568 + 0 ], %rd567; 
2026-02-21T10:18:33.6847704Z // end inline asm 2026-02-21T10:18:33.6847761Z // begin inline asm 2026-02-21T10:18:33.6847817Z mov.u64 %rd570, 0x0; 2026-02-21T10:18:33.6847937Z createpolicy.fractional.L2::evict_first.b64 %rd570, 1.0; 2026-02-21T10:18:33.6847993Z // end inline asm 2026-02-21T10:18:33.6848049Z // begin inline asm 2026-02-21T10:18:33.6848107Z mov.u32 %r26743, 0x0; 2026-02-21T10:18:33.6848168Z mov.u32 %r26744, 0x0; 2026-02-21T10:18:33.6848222Z mov.u32 %r26745, 0x0; 2026-02-21T10:18:33.6848279Z mov.u32 %r26746, 0x0; 2026-02-21T10:18:33.6848502Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r26743, %r26744, %r26745, %r26746 }, [ %rd571 + 0 ], %rd570; 2026-02-21T10:18:33.6848558Z // end inline asm 2026-02-21T10:18:33.6848615Z // begin inline asm 2026-02-21T10:18:33.6848673Z mov.u64 %rd573, 0x0; 2026-02-21T10:18:33.6848796Z createpolicy.fractional.L2::evict_first.b64 %rd573, 1.0; 2026-02-21T10:18:33.6848949Z // end inline asm 2026-02-21T10:18:33.6849070Z // begin inline asm 2026-02-21T10:18:33.6849130Z mov.u32 %r26747, 0x0; 2026-02-21T10:18:33.6849188Z mov.u32 %r26748, 0x0; 2026-02-21T10:18:33.6849244Z mov.u32 %r26749, 0x0; 2026-02-21T10:18:33.6849302Z mov.u32 %r26750, 0x0; 2026-02-21T10:18:33.6849536Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r26747, %r26748, %r26749, %r26750 }, [ %rd574 + 0 ], %rd573; 2026-02-21T10:18:33.6849591Z // end inline asm 2026-02-21T10:18:33.6849648Z // begin inline asm 2026-02-21T10:18:33.6849707Z mov.u64 %rd576, 0x0; 2026-02-21T10:18:33.6849823Z createpolicy.fractional.L2::evict_first.b64 %rd576, 1.0; 2026-02-21T10:18:33.6849878Z // end inline asm 2026-02-21T10:18:33.6849937Z // begin inline asm 2026-02-21T10:18:33.6849994Z mov.u32 %r26751, 0x0; 2026-02-21T10:18:33.6850049Z mov.u32 %r26752, 0x0; 2026-02-21T10:18:33.6850110Z mov.u32 %r26753, 0x0; 2026-02-21T10:18:33.6850167Z mov.u32 %r26754, 0x0; 2026-02-21T10:18:33.6850394Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r26751, %r26752, %r26753, %r26754 }, [ 
%rd577 + 0 ], %rd576; 2026-02-21T10:18:33.6850523Z // end inline asm 2026-02-21T10:18:33.6850589Z // begin inline asm 2026-02-21T10:18:33.6850647Z mov.u64 %rd579, 0x0; 2026-02-21T10:18:33.6850763Z createpolicy.fractional.L2::evict_first.b64 %rd579, 1.0; 2026-02-21T10:18:33.6850821Z // end inline asm 2026-02-21T10:18:33.6850878Z // begin inline asm 2026-02-21T10:18:33.6850993Z mov.u32 %r26755, 0x0; 2026-02-21T10:18:33.6851052Z mov.u32 %r26756, 0x0; 2026-02-21T10:18:33.6851109Z mov.u32 %r26757, 0x0; 2026-02-21T10:18:33.6851165Z mov.u32 %r26758, 0x0; 2026-02-21T10:18:33.6851388Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r26755, %r26756, %r26757, %r26758 }, [ %rd580 + 0 ], %rd579; 2026-02-21T10:18:33.6851445Z // end inline asm 2026-02-21T10:18:33.6851649Z .loc 1 62 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:62:32 2026-02-21T10:18:33.6851707Z bar.sync 0; 2026-02-21T10:18:33.6851793Z st.shared.v2.b32 [%r75], {%r26727, %r26728}; 2026-02-21T10:18:33.6851886Z st.shared.v2.b32 [%r75+2048], {%r26731, %r26732}; 2026-02-21T10:18:33.6851974Z st.shared.v2.b32 [%r75+4096], {%r26735, %r26736}; 2026-02-21T10:18:33.6852058Z st.shared.v2.b32 [%r75+6144], {%r26739, %r26740}; 2026-02-21T10:18:33.6852144Z st.shared.v2.b32 [%r75+8192], {%r26743, %r26744}; 2026-02-21T10:18:33.6852247Z st.shared.v2.b32 [%r75+10240], {%r26747, %r26748}; 2026-02-21T10:18:33.6852336Z st.shared.v2.b32 [%r75+12288], {%r26751, %r26752}; 2026-02-21T10:18:33.6852424Z st.shared.v2.b32 [%r75+14336], {%r26755, %r26756}; 2026-02-21T10:18:33.6852503Z st.shared.v2.b32 [%r76], {%r26729, %r26730}; 2026-02-21T10:18:33.6852587Z st.shared.v2.b32 [%r76+2048], {%r26733, %r26734}; 2026-02-21T10:18:33.6852670Z st.shared.v2.b32 [%r76+4096], {%r26737, %r26738}; 2026-02-21T10:18:33.6852752Z st.shared.v2.b32 [%r76+6144], {%r26741, %r26742}; 2026-02-21T10:18:33.6852833Z st.shared.v2.b32 [%r76+8192], {%r26745, %r26746}; 2026-02-21T10:18:33.6852920Z st.shared.v2.b32 [%r76+10240], {%r26749, %r26750}; 
2026-02-21T10:18:33.6853007Z st.shared.v2.b32 [%r76+12288], {%r26753, %r26754}; 2026-02-21T10:18:33.6853089Z st.shared.v2.b32 [%r76+14336], {%r26757, %r26758}; 2026-02-21T10:18:33.6853144Z bar.sync 0; 2026-02-21T10:18:33.6853214Z ld.shared.b16 %rs2241, [%r77]; 2026-02-21T10:18:33.6853284Z ld.shared.b16 %rs2242, [%r77+1024]; 2026-02-21T10:18:33.6853352Z ld.shared.b16 %rs2243, [%r77+64]; 2026-02-21T10:18:33.6853418Z ld.shared.b16 %rs2244, [%r77+1088]; 2026-02-21T10:18:33.6853482Z ld.shared.b16 %rs2245, [%r77+8192]; 2026-02-21T10:18:33.6853545Z ld.shared.b16 %rs2246, [%r77+9216]; 2026-02-21T10:18:33.6853610Z ld.shared.b16 %rs2247, [%r77+8256]; 2026-02-21T10:18:33.6853681Z ld.shared.b16 %rs2248, [%r77+9280]; 2026-02-21T10:18:33.6853747Z ld.shared.b16 %rs2249, [%r78]; 2026-02-21T10:18:33.6853811Z ld.shared.b16 %rs2250, [%r78+1024]; 2026-02-21T10:18:33.6853938Z ld.shared.b16 %rs2251, [%r78+64]; 2026-02-21T10:18:33.6854002Z ld.shared.b16 %rs2252, [%r78+1088]; 2026-02-21T10:18:33.6854111Z ld.shared.b16 %rs2253, [%r78+8192]; 2026-02-21T10:18:33.6854176Z ld.shared.b16 %rs2254, [%r78+9216]; 2026-02-21T10:18:33.6854242Z ld.shared.b16 %rs2255, [%r78+8256]; 2026-02-21T10:18:33.6854304Z ld.shared.b16 %rs2256, [%r78+9280]; 2026-02-21T10:18:33.6854366Z ld.shared.b16 %rs2257, [%r79]; 2026-02-21T10:18:33.6854434Z ld.shared.b16 %rs2258, [%r79+1024]; 2026-02-21T10:18:33.6854497Z ld.shared.b16 %rs2259, [%r79+64]; 2026-02-21T10:18:33.6854567Z ld.shared.b16 %rs2260, [%r79+1088]; 2026-02-21T10:18:33.6854633Z ld.shared.b16 %rs2261, [%r79+8192]; 2026-02-21T10:18:33.6854708Z ld.shared.b16 %rs2262, [%r79+9216]; 2026-02-21T10:18:33.6854775Z ld.shared.b16 %rs2263, [%r79+8256]; 2026-02-21T10:18:33.6854838Z ld.shared.b16 %rs2264, [%r79+9280]; 2026-02-21T10:18:33.6854904Z ld.shared.b16 %rs2265, [%r80]; 2026-02-21T10:18:33.6854968Z ld.shared.b16 %rs2266, [%r80+1024]; 2026-02-21T10:18:33.6855033Z ld.shared.b16 %rs2267, [%r80+64]; 2026-02-21T10:18:33.6855101Z ld.shared.b16 %rs2268, [%r80+1088]; 
2026-02-21T10:18:33.6855215Z ld.shared.b16 %rs2269, [%r80+8192]; 2026-02-21T10:18:33.6855280Z ld.shared.b16 %rs2270, [%r80+9216]; 2026-02-21T10:18:33.6855343Z ld.shared.b16 %rs2271, [%r80+8256]; 2026-02-21T10:18:33.6855409Z ld.shared.b16 %rs2272, [%r80+9280]; 2026-02-21T10:18:33.6855470Z ld.shared.b16 %rs2273, [%r81]; 2026-02-21T10:18:33.6855581Z ld.shared.b16 %rs2274, [%r81+1024]; 2026-02-21T10:18:33.6855650Z ld.shared.b16 %rs2275, [%r81+64]; 2026-02-21T10:18:33.6855714Z ld.shared.b16 %rs2276, [%r81+1088]; 2026-02-21T10:18:33.6855777Z ld.shared.b16 %rs2277, [%r81+8192]; 2026-02-21T10:18:33.6855840Z ld.shared.b16 %rs2278, [%r81+9216]; 2026-02-21T10:18:33.6855903Z ld.shared.b16 %rs2279, [%r81+8256]; 2026-02-21T10:18:33.6855965Z ld.shared.b16 %rs2280, [%r81+9280]; 2026-02-21T10:18:33.6856028Z ld.shared.b16 %rs2281, [%r82]; 2026-02-21T10:18:33.6856104Z ld.shared.b16 %rs2282, [%r82+1024]; 2026-02-21T10:18:33.6856172Z ld.shared.b16 %rs2283, [%r82+64]; 2026-02-21T10:18:33.6856238Z ld.shared.b16 %rs2284, [%r82+1088]; 2026-02-21T10:18:33.6856304Z ld.shared.b16 %rs2285, [%r82+8192]; 2026-02-21T10:18:33.6856368Z ld.shared.b16 %rs2286, [%r82+9216]; 2026-02-21T10:18:33.6856433Z ld.shared.b16 %rs2287, [%r82+8256]; 2026-02-21T10:18:33.6856616Z ld.shared.b16 %rs2288, [%r82+9280]; 2026-02-21T10:18:33.6856687Z ld.shared.b16 %rs2289, [%r83]; 2026-02-21T10:18:33.6856750Z ld.shared.b16 %rs2290, [%r83+1024]; 2026-02-21T10:18:33.6856813Z ld.shared.b16 %rs2291, [%r83+64]; 2026-02-21T10:18:33.6856880Z ld.shared.b16 %rs2292, [%r83+1088]; 2026-02-21T10:18:33.6856944Z ld.shared.b16 %rs2293, [%r83+8192]; 2026-02-21T10:18:33.6857007Z ld.shared.b16 %rs2294, [%r83+9216]; 2026-02-21T10:18:33.6857073Z ld.shared.b16 %rs2295, [%r83+8256]; 2026-02-21T10:18:33.6857135Z ld.shared.b16 %rs2296, [%r83+9280]; 2026-02-21T10:18:33.6857197Z ld.shared.b16 %rs2297, [%r84]; 2026-02-21T10:18:33.6857262Z ld.shared.b16 %rs2298, [%r84+1024]; 2026-02-21T10:18:33.6857329Z ld.shared.b16 %rs2299, [%r84+64]; 
2026-02-21T10:18:33.6857394Z ld.shared.b16 %rs2300, [%r84+1088]; 2026-02-21T10:18:33.6857457Z ld.shared.b16 %rs2301, [%r84+8192]; 2026-02-21T10:18:33.6857527Z ld.shared.b16 %rs2302, [%r84+9216]; 2026-02-21T10:18:33.6857599Z ld.shared.b16 %rs2303, [%r84+8256]; 2026-02-21T10:18:33.6857664Z ld.shared.b16 %rs2304, [%r84+9280]; 2026-02-21T10:18:33.6857728Z cvt.f32.bf16 %r26896, %rs2241; 2026-02-21T10:18:33.6857791Z cvt.f32.bf16 %r26897, %rs2242; 2026-02-21T10:18:33.6857851Z cvt.f32.bf16 %r26898, %rs2249; 2026-02-21T10:18:33.6857909Z cvt.f32.bf16 %r26899, %rs2250; 2026-02-21T10:18:33.6857970Z cvt.f32.bf16 %r27028, %rs2257; 2026-02-21T10:18:33.6858030Z cvt.f32.bf16 %r27029, %rs2258; 2026-02-21T10:18:33.6858088Z cvt.f32.bf16 %r27030, %rs2265; 2026-02-21T10:18:33.6858150Z cvt.f32.bf16 %r27031, %rs2266; 2026-02-21T10:18:33.6858209Z cvt.f32.bf16 %r27160, %rs2273; 2026-02-21T10:18:33.6858359Z cvt.f32.bf16 %r27161, %rs2274; 2026-02-21T10:18:33.6858479Z cvt.f32.bf16 %r27162, %rs2281; 2026-02-21T10:18:33.6858542Z cvt.f32.bf16 %r27163, %rs2282; 2026-02-21T10:18:33.6858603Z cvt.f32.bf16 %r27292, %rs2289; 2026-02-21T10:18:33.6858662Z cvt.f32.bf16 %r27293, %rs2290; 2026-02-21T10:18:33.6858724Z cvt.f32.bf16 %r27294, %rs2297; 2026-02-21T10:18:33.6858782Z cvt.f32.bf16 %r27295, %rs2298; 2026-02-21T10:18:33.6858842Z cvt.f32.bf16 %r27424, %rs2243; 2026-02-21T10:18:33.6858901Z cvt.f32.bf16 %r27425, %rs2244; 2026-02-21T10:18:33.6858964Z cvt.f32.bf16 %r27426, %rs2251; 2026-02-21T10:18:33.6859022Z cvt.f32.bf16 %r27427, %rs2252; 2026-02-21T10:18:33.6859081Z cvt.f32.bf16 %r27556, %rs2259; 2026-02-21T10:18:33.6859140Z cvt.f32.bf16 %r27557, %rs2260; 2026-02-21T10:18:33.6859197Z cvt.f32.bf16 %r27558, %rs2267; 2026-02-21T10:18:33.6859255Z cvt.f32.bf16 %r27559, %rs2268; 2026-02-21T10:18:33.6859314Z cvt.f32.bf16 %r27688, %rs2275; 2026-02-21T10:18:33.6859377Z cvt.f32.bf16 %r27689, %rs2276; 2026-02-21T10:18:33.6859436Z cvt.f32.bf16 %r27690, %rs2283; 2026-02-21T10:18:33.6859496Z cvt.f32.bf16 %r27691, 
%rs2284; 2026-02-21T10:18:33.6859631Z cvt.f32.bf16 %r27820, %rs2291; 2026-02-21T10:18:33.6859698Z cvt.f32.bf16 %r27821, %rs2292; 2026-02-21T10:18:33.6859758Z cvt.f32.bf16 %r27822, %rs2299; 2026-02-21T10:18:33.6859818Z cvt.f32.bf16 %r27823, %rs2300; 2026-02-21T10:18:33.6859878Z cvt.f32.bf16 %r27952, %rs2245; 2026-02-21T10:18:33.6860013Z cvt.f32.bf16 %r27953, %rs2246; 2026-02-21T10:18:33.6860075Z cvt.f32.bf16 %r27954, %rs2253; 2026-02-21T10:18:33.6860136Z cvt.f32.bf16 %r27955, %rs2254; 2026-02-21T10:18:33.6860195Z cvt.f32.bf16 %r28084, %rs2261; 2026-02-21T10:18:33.6860255Z cvt.f32.bf16 %r28085, %rs2262; 2026-02-21T10:18:33.6860315Z cvt.f32.bf16 %r28086, %rs2269; 2026-02-21T10:18:33.6860373Z cvt.f32.bf16 %r28087, %rs2270; 2026-02-21T10:18:33.6860432Z cvt.f32.bf16 %r28216, %rs2277; 2026-02-21T10:18:33.6860491Z cvt.f32.bf16 %r28217, %rs2278; 2026-02-21T10:18:33.6860554Z cvt.f32.bf16 %r28218, %rs2285; 2026-02-21T10:18:33.6860615Z cvt.f32.bf16 %r28219, %rs2286; 2026-02-21T10:18:33.6860676Z cvt.f32.bf16 %r28348, %rs2293; 2026-02-21T10:18:33.6860738Z cvt.f32.bf16 %r28349, %rs2294; 2026-02-21T10:18:33.6860797Z cvt.f32.bf16 %r28350, %rs2301; 2026-02-21T10:18:33.6860856Z cvt.f32.bf16 %r28351, %rs2302; 2026-02-21T10:18:33.6860915Z cvt.f32.bf16 %r28480, %rs2247; 2026-02-21T10:18:33.6860976Z cvt.f32.bf16 %r28481, %rs2248; 2026-02-21T10:18:33.6861036Z cvt.f32.bf16 %r28482, %rs2255; 2026-02-21T10:18:33.6861095Z cvt.f32.bf16 %r28483, %rs2256; 2026-02-21T10:18:33.6861156Z cvt.f32.bf16 %r28612, %rs2263; 2026-02-21T10:18:33.6861216Z cvt.f32.bf16 %r28613, %rs2264; 2026-02-21T10:18:33.6861275Z cvt.f32.bf16 %r28614, %rs2271; 2026-02-21T10:18:33.6861335Z cvt.f32.bf16 %r28615, %rs2272; 2026-02-21T10:18:33.6861397Z cvt.f32.bf16 %r28744, %rs2279; 2026-02-21T10:18:33.6861455Z cvt.f32.bf16 %r28745, %rs2280; 2026-02-21T10:18:33.6861515Z cvt.f32.bf16 %r28746, %rs2287; 2026-02-21T10:18:33.6861577Z cvt.f32.bf16 %r28747, %rs2288; 2026-02-21T10:18:33.6861638Z cvt.f32.bf16 %r28876, %rs2295; 
2026-02-21T10:18:33.6861700Z cvt.f32.bf16 %r28877, %rs2296; 2026-02-21T10:18:33.6861775Z cvt.f32.bf16 %r28878, %rs2303; 2026-02-21T10:18:33.6861837Z cvt.f32.bf16 %r28879, %rs2304; 2026-02-21T10:18:33.6862052Z .loc 1 64 33 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:64:33 2026-02-21T10:18:33.6862109Z bar.sync 0; 2026-02-21T10:18:33.6862171Z // begin inline asm 2026-02-21T10:18:33.6862271Z @%p222 mbarrier.init.shared::cta.b64 [%r29375], 1; 2026-02-21T10:18:33.6862328Z // end inline asm 2026-02-21T10:18:33.6862384Z bar.sync 0; 2026-02-21T10:18:33.6862444Z // begin inline asm 2026-02-21T10:18:33.6862577Z @%p222 mbarrier.arrive.expect_tx.shared.b64 _, [%r29375], 4096; 2026-02-21T10:18:33.6862633Z // end inline asm 2026-02-21T10:18:33.6862695Z // begin inline asm 2026-02-21T10:18:33.6862831Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.6862886Z // end inline asm 2026-02-21T10:18:33.6862989Z bar.sync 0; 2026-02-21T10:18:33.6863056Z elect.sync %r29278|%p285, -1; 2026-02-21T10:18:33.6863126Z and.pred %p264, %p1, %p285; 2026-02-21T10:18:33.6863187Z add.s32 %r26763, %r29211, 160; 2026-02-21T10:18:33.6863247Z // begin inline asm 2026-02-21T10:18:33.6863586Z @%p264 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r29377], [%rd633, {%r29378, %r26763}], [%r29375]; 2026-02-21T10:18:33.6863643Z // end inline asm 2026-02-21T10:18:33.6863699Z bar.sync 0; 2026-02-21T10:18:33.6863756Z // begin inline asm 2026-02-21T10:18:33.6863807Z 2026-02-21T10:18:33.6863858Z { 2026-02-21T10:18:33.6863920Z .reg .pred complete; 2026-02-21T10:18:33.6863974Z waitLoop: 2026-02-21T10:18:33.6864122Z mbarrier.try_wait.parity.shared.b64 complete, [%r29375], %r29010; 2026-02-21T10:18:33.6864206Z @!complete bra.uni waitLoop; 2026-02-21T10:18:33.6864256Z } 2026-02-21T10:18:33.6864264Z 2026-02-21T10:18:33.6864321Z // end inline asm 2026-02-21T10:18:33.6864381Z bar.sync 0; 2026-02-21T10:18:33.6864439Z // begin inline asm 2026-02-21T10:18:33.6864587Z @%p222 
mbarrier.inval.shared::cta.b64 [%r29375]; 2026-02-21T10:18:33.6864645Z // end inline asm 2026-02-21T10:18:33.6864856Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6864921Z ld.shared.s8 %rs2305, [%r85]; 2026-02-21T10:18:33.6865162Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6865232Z shl.b16 %rs2306, %rs2305, 4; 2026-02-21T10:18:33.6865426Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6865494Z ld.shared.s8 %rs2307, [%r86+128]; 2026-02-21T10:18:33.6865689Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6865752Z shl.b16 %rs2308, %rs2307, 4; 2026-02-21T10:18:33.6865944Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6866013Z ld.shared.s8 %rs2309, [%r87+256]; 2026-02-21T10:18:33.6866206Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6866268Z shl.b16 %rs2310, %rs2309, 4; 2026-02-21T10:18:33.6866592Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6866662Z ld.shared.s8 %rs2311, [%r88+384]; 2026-02-21T10:18:33.6866858Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6866919Z shl.b16 %rs2312, %rs2311, 4; 2026-02-21T10:18:33.6867114Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6867177Z ld.shared.s8 %rs2313, [%r89+512]; 2026-02-21T10:18:33.6867372Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6867437Z shl.b16 %rs2314, %rs2313, 4; 2026-02-21T10:18:33.6867629Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6867692Z ld.shared.s8 %rs2315, [%r90+640]; 2026-02-21T10:18:33.6867890Z .loc 1 67 28 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6867951Z shl.b16 %rs2316, %rs2315, 4; 2026-02-21T10:18:33.6868142Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6868206Z ld.shared.s8 %rs2317, [%r91+768]; 2026-02-21T10:18:33.6868397Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6868539Z shl.b16 %rs2318, %rs2317, 4; 2026-02-21T10:18:33.6868737Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6868953Z ld.shared.s8 %rs2319, [%r92+896]; 2026-02-21T10:18:33.6869153Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6869214Z shl.b16 %rs2320, %rs2319, 4; 2026-02-21T10:18:33.6869411Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6869479Z ld.shared.s8 %rs2321, [%r85+1024]; 2026-02-21T10:18:33.6869672Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6869734Z shl.b16 %rs2322, %rs2321, 4; 2026-02-21T10:18:33.6869928Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6869994Z ld.shared.s8 %rs2323, [%r86+1152]; 2026-02-21T10:18:33.6870188Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6870250Z shl.b16 %rs2324, %rs2323, 4; 2026-02-21T10:18:33.6870510Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6870576Z ld.shared.s8 %rs2325, [%r87+1280]; 2026-02-21T10:18:33.6870770Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6870830Z shl.b16 %rs2326, %rs2325, 4; 2026-02-21T10:18:33.6871079Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6871149Z ld.shared.s8 %rs2327, [%r88+1408]; 2026-02-21T10:18:33.6871340Z 
.loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6871401Z shl.b16 %rs2328, %rs2327, 4; 2026-02-21T10:18:33.6871594Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6871658Z ld.shared.s8 %rs2329, [%r89+1536]; 2026-02-21T10:18:33.6871851Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6871917Z shl.b16 %rs2330, %rs2329, 4; 2026-02-21T10:18:33.6872109Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6872173Z ld.shared.s8 %rs2331, [%r90+1664]; 2026-02-21T10:18:33.6872367Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6872428Z shl.b16 %rs2332, %rs2331, 4; 2026-02-21T10:18:33.6872619Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6872682Z ld.shared.s8 %rs2333, [%r91+1792]; 2026-02-21T10:18:33.6872878Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6872950Z shl.b16 %rs2334, %rs2333, 4; 2026-02-21T10:18:33.6873149Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6873218Z ld.shared.s8 %rs2335, [%r92+1920]; 2026-02-21T10:18:33.6873411Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6873473Z shl.b16 %rs2336, %rs2335, 4; 2026-02-21T10:18:33.6873669Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6873733Z ld.shared.s8 %rs2337, [%r85+2048]; 2026-02-21T10:18:33.6873926Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6873987Z shl.b16 %rs2338, %rs2337, 4; 2026-02-21T10:18:33.6874181Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6874244Z ld.shared.s8 %rs2339, [%r86+2176]; 
2026-02-21T10:18:33.6874437Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6874599Z shl.b16 %rs2340, %rs2339, 4; 2026-02-21T10:18:33.6874792Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6874856Z ld.shared.s8 %rs2341, [%r87+2304]; 2026-02-21T10:18:33.6875049Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6875111Z shl.b16 %rs2342, %rs2341, 4; 2026-02-21T10:18:33.6875302Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6875367Z ld.shared.s8 %rs2343, [%r88+2432]; 2026-02-21T10:18:33.6875558Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6875618Z shl.b16 %rs2344, %rs2343, 4; 2026-02-21T10:18:33.6875811Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6875877Z ld.shared.s8 %rs2345, [%r89+2560]; 2026-02-21T10:18:33.6876119Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6876181Z shl.b16 %rs2346, %rs2345, 4; 2026-02-21T10:18:33.6876374Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6876436Z ld.shared.s8 %rs2347, [%r90+2688]; 2026-02-21T10:18:33.6876819Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6876890Z shl.b16 %rs2348, %rs2347, 4; 2026-02-21T10:18:33.6877084Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6877150Z ld.shared.s8 %rs2349, [%r91+2816]; 2026-02-21T10:18:33.6877350Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6877417Z shl.b16 %rs2350, %rs2349, 4; 2026-02-21T10:18:33.6877615Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6877684Z ld.shared.s8 
%rs2351, [%r92+2944]; 2026-02-21T10:18:33.6877877Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6877938Z shl.b16 %rs2352, %rs2351, 4; 2026-02-21T10:18:33.6878133Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6878200Z ld.shared.s8 %rs2353, [%r85+3072]; 2026-02-21T10:18:33.6878393Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6878454Z shl.b16 %rs2354, %rs2353, 4; 2026-02-21T10:18:33.6878649Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6878712Z ld.shared.s8 %rs2355, [%r86+3200]; 2026-02-21T10:18:33.6878904Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6878967Z shl.b16 %rs2356, %rs2355, 4; 2026-02-21T10:18:33.6879161Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6879225Z ld.shared.s8 %rs2357, [%r87+3328]; 2026-02-21T10:18:33.6879422Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6879491Z shl.b16 %rs2358, %rs2357, 4; 2026-02-21T10:18:33.6879684Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6879747Z ld.shared.s8 %rs2359, [%r88+3456]; 2026-02-21T10:18:33.6879939Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6879999Z shl.b16 %rs2360, %rs2359, 4; 2026-02-21T10:18:33.6880195Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6880336Z ld.shared.s8 %rs2361, [%r89+3584]; 2026-02-21T10:18:33.6880594Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6880667Z shl.b16 %rs2362, %rs2361, 4; 2026-02-21T10:18:33.6880862Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 
2026-02-21T10:18:33.6880927Z ld.shared.s8 %rs2363, [%r90+3712]; 2026-02-21T10:18:33.6881118Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6881181Z shl.b16 %rs2364, %rs2363, 4; 2026-02-21T10:18:33.6881372Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6881434Z ld.shared.s8 %rs2365, [%r91+3840]; 2026-02-21T10:18:33.6881627Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6881692Z shl.b16 %rs2366, %rs2365, 4; 2026-02-21T10:18:33.6881886Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6882030Z ld.shared.s8 %rs2367, [%r92+3968]; 2026-02-21T10:18:33.6882229Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6882289Z shl.b16 %rs2368, %rs2367, 4; 2026-02-21T10:18:33.6882525Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6882590Z cvt.s16.s8 %rs2369, %rs2306; 2026-02-21T10:18:33.6882649Z shr.s16 %rs2370, %rs2369, 4; 2026-02-21T10:18:33.6882709Z cvt.s16.s8 %rs2371, %rs2308; 2026-02-21T10:18:33.6882776Z shr.s16 %rs2372, %rs2371, 4; 2026-02-21T10:18:33.6882844Z shr.s16 %rs2373, %rs2305, 4; 2026-02-21T10:18:33.6882903Z shr.s16 %rs2374, %rs2307, 4; 2026-02-21T10:18:33.6883098Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6883169Z cvt.rn.f32.s16 %r29279, %rs2374; 2026-02-21T10:18:33.6883235Z cvt.rn.f32.s16 %r29280, %rs2373; 2026-02-21T10:18:33.6883297Z cvt.rn.f32.s16 %r29281, %rs2372; 2026-02-21T10:18:33.6883358Z cvt.rn.f32.s16 %r29282, %rs2370; 2026-02-21T10:18:33.6883552Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6883612Z cvt.s16.s8 %rs2375, %rs2310; 2026-02-21T10:18:33.6883673Z shr.s16 %rs2376, %rs2375, 4; 2026-02-21T10:18:33.6883735Z cvt.s16.s8 %rs2377, %rs2312; 
2026-02-21T10:18:33.6883793Z shr.s16 %rs2378, %rs2377, 4; 2026-02-21T10:18:33.6883853Z shr.s16 %rs2379, %rs2309, 4; 2026-02-21T10:18:33.6883915Z shr.s16 %rs2380, %rs2311, 4; 2026-02-21T10:18:33.6884107Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6884170Z cvt.rn.f32.s16 %r29283, %rs2380; 2026-02-21T10:18:33.6884235Z cvt.rn.f32.s16 %r29284, %rs2379; 2026-02-21T10:18:33.6884295Z cvt.rn.f32.s16 %r29285, %rs2378; 2026-02-21T10:18:33.6884357Z cvt.rn.f32.s16 %r29286, %rs2376; 2026-02-21T10:18:33.6884551Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6884615Z cvt.s16.s8 %rs2381, %rs2314; 2026-02-21T10:18:33.6884675Z shr.s16 %rs2382, %rs2381, 4; 2026-02-21T10:18:33.6884734Z cvt.s16.s8 %rs2383, %rs2316; 2026-02-21T10:18:33.6884797Z shr.s16 %rs2384, %rs2383, 4; 2026-02-21T10:18:33.6884858Z shr.s16 %rs2385, %rs2313, 4; 2026-02-21T10:18:33.6884916Z shr.s16 %rs2386, %rs2315, 4; 2026-02-21T10:18:33.6885111Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6885174Z cvt.rn.f32.s16 %r29287, %rs2386; 2026-02-21T10:18:33.6885234Z cvt.rn.f32.s16 %r29288, %rs2385; 2026-02-21T10:18:33.6885294Z cvt.rn.f32.s16 %r29289, %rs2384; 2026-02-21T10:18:33.6885358Z cvt.rn.f32.s16 %r29290, %rs2382; 2026-02-21T10:18:33.6885614Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6885723Z cvt.s16.s8 %rs2387, %rs2318; 2026-02-21T10:18:33.6885784Z shr.s16 %rs2388, %rs2387, 4; 2026-02-21T10:18:33.6885843Z cvt.s16.s8 %rs2389, %rs2320; 2026-02-21T10:18:33.6885902Z shr.s16 %rs2390, %rs2389, 4; 2026-02-21T10:18:33.6885961Z shr.s16 %rs2391, %rs2317, 4; 2026-02-21T10:18:33.6886038Z shr.s16 %rs2392, %rs2319, 4; 2026-02-21T10:18:33.6886233Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6886295Z cvt.rn.f32.s16 %r29291, %rs2392; 2026-02-21T10:18:33.6886359Z 
cvt.rn.f32.s16 %r29292, %rs2391; 2026-02-21T10:18:33.6886420Z cvt.rn.f32.s16 %r29293, %rs2390; 2026-02-21T10:18:33.6886597Z cvt.rn.f32.s16 %r29294, %rs2388; 2026-02-21T10:18:33.6886798Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6886861Z cvt.s16.s8 %rs2393, %rs2322; 2026-02-21T10:18:33.6886922Z shr.s16 %rs2394, %rs2393, 4; 2026-02-21T10:18:33.6886982Z cvt.s16.s8 %rs2395, %rs2324; 2026-02-21T10:18:33.6887124Z shr.s16 %rs2396, %rs2395, 4; 2026-02-21T10:18:33.6887189Z shr.s16 %rs2397, %rs2321, 4; 2026-02-21T10:18:33.6887250Z shr.s16 %rs2398, %rs2323, 4; 2026-02-21T10:18:33.6887512Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6887577Z cvt.rn.f32.s16 %r29295, %rs2398; 2026-02-21T10:18:33.6887640Z cvt.rn.f32.s16 %r29296, %rs2397; 2026-02-21T10:18:33.6887702Z cvt.rn.f32.s16 %r29297, %rs2396; 2026-02-21T10:18:33.6887761Z cvt.rn.f32.s16 %r29298, %rs2394; 2026-02-21T10:18:33.6887959Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6888020Z cvt.s16.s8 %rs2399, %rs2326; 2026-02-21T10:18:33.6888083Z shr.s16 %rs2400, %rs2399, 4; 2026-02-21T10:18:33.6888145Z cvt.s16.s8 %rs2401, %rs2328; 2026-02-21T10:18:33.6888204Z shr.s16 %rs2402, %rs2401, 4; 2026-02-21T10:18:33.6888267Z shr.s16 %rs2403, %rs2325, 4; 2026-02-21T10:18:33.6888327Z shr.s16 %rs2404, %rs2327, 4; 2026-02-21T10:18:33.6888523Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6888587Z cvt.rn.f32.s16 %r29299, %rs2404; 2026-02-21T10:18:33.6888660Z cvt.rn.f32.s16 %r29300, %rs2403; 2026-02-21T10:18:33.6888725Z cvt.rn.f32.s16 %r29301, %rs2402; 2026-02-21T10:18:33.6888786Z cvt.rn.f32.s16 %r29302, %rs2400; 2026-02-21T10:18:33.6888990Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6889051Z cvt.s16.s8 %rs2405, %rs2330; 2026-02-21T10:18:33.6889109Z shr.s16 %rs2406, %rs2405, 4; 
2026-02-21T10:18:33.6889170Z cvt.s16.s8 %rs2407, %rs2332; 2026-02-21T10:18:33.6889229Z shr.s16 %rs2408, %rs2407, 4; 2026-02-21T10:18:33.6889291Z shr.s16 %rs2409, %rs2329, 4; 2026-02-21T10:18:33.6889351Z shr.s16 %rs2410, %rs2331, 4; 2026-02-21T10:18:33.6889551Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6889614Z cvt.rn.f32.s16 %r29303, %rs2410; 2026-02-21T10:18:33.6889674Z cvt.rn.f32.s16 %r29304, %rs2409; 2026-02-21T10:18:33.6889738Z cvt.rn.f32.s16 %r29305, %rs2408; 2026-02-21T10:18:33.6889798Z cvt.rn.f32.s16 %r29306, %rs2406; 2026-02-21T10:18:33.6889992Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6890055Z cvt.s16.s8 %rs2411, %rs2334; 2026-02-21T10:18:33.6890116Z shr.s16 %rs2412, %rs2411, 4; 2026-02-21T10:18:33.6890175Z cvt.s16.s8 %rs2413, %rs2336; 2026-02-21T10:18:33.6890235Z shr.s16 %rs2414, %rs2413, 4; 2026-02-21T10:18:33.6890297Z shr.s16 %rs2415, %rs2333, 4; 2026-02-21T10:18:33.6890355Z shr.s16 %rs2416, %rs2335, 4; 2026-02-21T10:18:33.6890547Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6890750Z cvt.rn.f32.s16 %r29307, %rs2416; 2026-02-21T10:18:33.6890813Z cvt.rn.f32.s16 %r29308, %rs2415; 2026-02-21T10:18:33.6890874Z cvt.rn.f32.s16 %r29309, %rs2414; 2026-02-21T10:18:33.6890937Z cvt.rn.f32.s16 %r29310, %rs2412; 2026-02-21T10:18:33.6891132Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6891194Z cvt.s16.s8 %rs2417, %rs2338; 2026-02-21T10:18:33.6891254Z shr.s16 %rs2418, %rs2417, 4; 2026-02-21T10:18:33.6891317Z cvt.s16.s8 %rs2419, %rs2340; 2026-02-21T10:18:33.6891376Z shr.s16 %rs2420, %rs2419, 4; 2026-02-21T10:18:33.6891438Z shr.s16 %rs2421, %rs2337, 4; 2026-02-21T10:18:33.6891504Z shr.s16 %rs2422, %rs2339, 4; 2026-02-21T10:18:33.6891718Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6891788Z 
cvt.rn.f32.s16 %r29311, %rs2422; 2026-02-21T10:18:33.6891857Z cvt.rn.f32.s16 %r29312, %rs2421; 2026-02-21T10:18:33.6891922Z cvt.rn.f32.s16 %r29313, %rs2420; 2026-02-21T10:18:33.6891984Z cvt.rn.f32.s16 %r29314, %rs2418; 2026-02-21T10:18:33.6892233Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6892301Z cvt.s16.s8 %rs2423, %rs2342; 2026-02-21T10:18:33.6892362Z shr.s16 %rs2424, %rs2423, 4; 2026-02-21T10:18:33.6892463Z cvt.s16.s8 %rs2425, %rs2344; 2026-02-21T10:18:33.6892529Z shr.s16 %rs2426, %rs2425, 4; 2026-02-21T10:18:33.6892588Z shr.s16 %rs2427, %rs2341, 4; 2026-02-21T10:18:33.6892647Z shr.s16 %rs2428, %rs2343, 4; 2026-02-21T10:18:33.6892844Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6892911Z cvt.rn.f32.s16 %r29315, %rs2428; 2026-02-21T10:18:33.6892973Z cvt.rn.f32.s16 %r29316, %rs2427; 2026-02-21T10:18:33.6893035Z cvt.rn.f32.s16 %r29317, %rs2426; 2026-02-21T10:18:33.6893103Z cvt.rn.f32.s16 %r29318, %rs2424; 2026-02-21T10:18:33.6893300Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6893366Z cvt.s16.s8 %rs2429, %rs2346; 2026-02-21T10:18:33.6893429Z shr.s16 %rs2430, %rs2429, 4; 2026-02-21T10:18:33.6893488Z cvt.s16.s8 %rs2431, %rs2348; 2026-02-21T10:18:33.6893547Z shr.s16 %rs2432, %rs2431, 4; 2026-02-21T10:18:33.6893617Z shr.s16 %rs2433, %rs2345, 4; 2026-02-21T10:18:33.6893682Z shr.s16 %rs2434, %rs2347, 4; 2026-02-21T10:18:33.6893878Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6893941Z cvt.rn.f32.s16 %r29319, %rs2434; 2026-02-21T10:18:33.6894006Z cvt.rn.f32.s16 %r29320, %rs2433; 2026-02-21T10:18:33.6894067Z cvt.rn.f32.s16 %r29321, %rs2432; 2026-02-21T10:18:33.6894128Z cvt.rn.f32.s16 %r29322, %rs2430; 2026-02-21T10:18:33.6894326Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6894389Z cvt.s16.s8 %rs2435, 
%rs2350; 2026-02-21T10:18:33.6894459Z shr.s16 %rs2436, %rs2435, 4; 2026-02-21T10:18:33.6894520Z cvt.s16.s8 %rs2437, %rs2352; 2026-02-21T10:18:33.6894583Z shr.s16 %rs2438, %rs2437, 4; 2026-02-21T10:18:33.6894641Z shr.s16 %rs2439, %rs2349, 4; 2026-02-21T10:18:33.6894701Z shr.s16 %rs2440, %rs2351, 4; 2026-02-21T10:18:33.6894896Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6894960Z cvt.rn.f32.s16 %r29323, %rs2440; 2026-02-21T10:18:33.6895022Z cvt.rn.f32.s16 %r29324, %rs2439; 2026-02-21T10:18:33.6895086Z cvt.rn.f32.s16 %r29325, %rs2438; 2026-02-21T10:18:33.6895147Z cvt.rn.f32.s16 %r29326, %rs2436; 2026-02-21T10:18:33.6895341Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6895402Z cvt.s16.s8 %rs2441, %rs2354; 2026-02-21T10:18:33.6895465Z shr.s16 %rs2442, %rs2441, 4; 2026-02-21T10:18:33.6895591Z cvt.s16.s8 %rs2443, %rs2356; 2026-02-21T10:18:33.6895698Z shr.s16 %rs2444, %rs2443, 4; 2026-02-21T10:18:33.6895761Z shr.s16 %rs2445, %rs2353, 4; 2026-02-21T10:18:33.6895821Z shr.s16 %rs2446, %rs2355, 4; 2026-02-21T10:18:33.6896017Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6896086Z cvt.rn.f32.s16 %r29327, %rs2446; 2026-02-21T10:18:33.6896149Z cvt.rn.f32.s16 %r29328, %rs2445; 2026-02-21T10:18:33.6896210Z cvt.rn.f32.s16 %r29329, %rs2444; 2026-02-21T10:18:33.6896271Z cvt.rn.f32.s16 %r29330, %rs2442; 2026-02-21T10:18:33.6896577Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6896641Z cvt.s16.s8 %rs2447, %rs2358; 2026-02-21T10:18:33.6896701Z shr.s16 %rs2448, %rs2447, 4; 2026-02-21T10:18:33.6896764Z cvt.s16.s8 %rs2449, %rs2360; 2026-02-21T10:18:33.6896824Z shr.s16 %rs2450, %rs2449, 4; 2026-02-21T10:18:33.6896885Z shr.s16 %rs2451, %rs2357, 4; 2026-02-21T10:18:33.6896944Z shr.s16 %rs2452, %rs2359, 4; 2026-02-21T10:18:33.6897224Z .loc 1 87 32 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6897293Z cvt.rn.f32.s16 %r29331, %rs2452; 2026-02-21T10:18:33.6897355Z cvt.rn.f32.s16 %r29332, %rs2451; 2026-02-21T10:18:33.6897417Z cvt.rn.f32.s16 %r29333, %rs2450; 2026-02-21T10:18:33.6897479Z cvt.rn.f32.s16 %r29334, %rs2448; 2026-02-21T10:18:33.6897729Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6897796Z cvt.s16.s8 %rs2453, %rs2362; 2026-02-21T10:18:33.6897855Z shr.s16 %rs2454, %rs2453, 4; 2026-02-21T10:18:33.6897914Z cvt.s16.s8 %rs2455, %rs2364; 2026-02-21T10:18:33.6897973Z shr.s16 %rs2456, %rs2455, 4; 2026-02-21T10:18:33.6898034Z shr.s16 %rs2457, %rs2361, 4; 2026-02-21T10:18:33.6898094Z shr.s16 %rs2458, %rs2363, 4; 2026-02-21T10:18:33.6898288Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6898359Z cvt.rn.f32.s16 %r29335, %rs2458; 2026-02-21T10:18:33.6898423Z cvt.rn.f32.s16 %r29336, %rs2457; 2026-02-21T10:18:33.6898485Z cvt.rn.f32.s16 %r29337, %rs2456; 2026-02-21T10:18:33.6898549Z cvt.rn.f32.s16 %r29338, %rs2454; 2026-02-21T10:18:33.6898743Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6898804Z cvt.s16.s8 %rs2459, %rs2366; 2026-02-21T10:18:33.6898864Z shr.s16 %rs2460, %rs2459, 4; 2026-02-21T10:18:33.6898927Z cvt.s16.s8 %rs2461, %rs2368; 2026-02-21T10:18:33.6898987Z shr.s16 %rs2462, %rs2461, 4; 2026-02-21T10:18:33.6899045Z shr.s16 %rs2463, %rs2365, 4; 2026-02-21T10:18:33.6899105Z shr.s16 %rs2464, %rs2367, 4; 2026-02-21T10:18:33.6899300Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6899362Z cvt.rn.f32.s16 %r29339, %rs2464; 2026-02-21T10:18:33.6899427Z cvt.rn.f32.s16 %r29340, %rs2463; 2026-02-21T10:18:33.6899493Z cvt.rn.f32.s16 %r29341, %rs2462; 2026-02-21T10:18:33.6899554Z cvt.rn.f32.s16 %r29342, %rs2460; 2026-02-21T10:18:33.6899611Z bar.sync 0; 2026-02-21T10:18:33.6899735Z 
st.shared.v4.b32 [%r93], {%r29282, %r29280, %r29281, %r29279}; 2026-02-21T10:18:33.6899863Z st.shared.v4.b32 [%r93+16384], {%r29314, %r29312, %r29313, %r29311}; 2026-02-21T10:18:33.6899976Z st.shared.v4.b32 [%r94], {%r29286, %r29284, %r29285, %r29283}; 2026-02-21T10:18:33.6900101Z st.shared.v4.b32 [%r94+16384], {%r29318, %r29316, %r29317, %r29315}; 2026-02-21T10:18:33.6900211Z st.shared.v4.b32 [%r95], {%r29290, %r29288, %r29289, %r29287}; 2026-02-21T10:18:33.6900329Z st.shared.v4.b32 [%r95+16384], {%r29322, %r29320, %r29321, %r29319}; 2026-02-21T10:18:33.6900443Z st.shared.v4.b32 [%r96], {%r29294, %r29292, %r29293, %r29291}; 2026-02-21T10:18:33.6900568Z st.shared.v4.b32 [%r96+16384], {%r29326, %r29324, %r29325, %r29323}; 2026-02-21T10:18:33.6900766Z st.shared.v4.b32 [%r97], {%r29298, %r29296, %r29297, %r29295}; 2026-02-21T10:18:33.6900958Z st.shared.v4.b32 [%r97+16384], {%r29330, %r29328, %r29329, %r29327}; 2026-02-21T10:18:33.6901069Z st.shared.v4.b32 [%r98], {%r29302, %r29300, %r29301, %r29299}; 2026-02-21T10:18:33.6901186Z st.shared.v4.b32 [%r98+16384], {%r29334, %r29332, %r29333, %r29331}; 2026-02-21T10:18:33.6901292Z st.shared.v4.b32 [%r99], {%r29306, %r29304, %r29305, %r29303}; 2026-02-21T10:18:33.6901411Z st.shared.v4.b32 [%r99+16384], {%r29338, %r29336, %r29337, %r29335}; 2026-02-21T10:18:33.6901525Z st.shared.v4.b32 [%r100], {%r29310, %r29308, %r29309, %r29307}; 2026-02-21T10:18:33.6901649Z st.shared.v4.b32 [%r100+16384], {%r29342, %r29340, %r29341, %r29339}; 2026-02-21T10:18:33.6901707Z $L__tmp21: 2026-02-21T10:18:33.6901991Z .loc 2 291 36 // standard.py:291:36 @[ c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:94:40 ] 2026-02-21T10:18:33.6902053Z // begin inline asm 2026-02-21T10:18:33.6902138Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.6902196Z // end inline asm 2026-02-21T10:18:33.6902251Z bar.sync 0; 2026-02-21T10:18:33.6902375Z wgmma.fence.sync.aligned; 2026-02-21T10:18:33.6902438Z // begin inline asm 2026-02-21T10:18:33.6903990Z 
wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r26896,%r26897,%r26898,%r26899}, %rd12, %p226, 1, 1; 2026-02-21T10:18:33.6904054Z // end inline asm 2026-02-21T10:18:33.6904112Z // begin inline asm 2026-02-21T10:18:33.6905609Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r27028,%r27029,%r27030,%r27031}, %rd13, %p226, 1, 1; 2026-02-21T10:18:33.6905671Z // end inline asm 2026-02-21T10:18:33.6905729Z // begin inline asm 2026-02-21T10:18:33.6907327Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r27160,%r27161,%r27162,%r27163}, %rd14, %p226, 1, 1; 2026-02-21T10:18:33.6907403Z // end inline asm 2026-02-21T10:18:33.6907461Z // begin inline asm 2026-02-21T10:18:33.6909051Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r27292,%r27293,%r27294,%r27295}, %rd15, %p226, 1, 1; 2026-02-21T10:18:33.6909261Z // end inline asm 2026-02-21T10:18:33.6909321Z // begin inline asm 2026-02-21T10:18:33.6910829Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r27424,%r27425,%r27426,%r27427}, %rd16, %p226, 1, 1; 2026-02-21T10:18:33.6910887Z // end inline asm 2026-02-21T10:18:33.6910945Z // begin inline asm 2026-02-21T10:18:33.6912542Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r27556,%r27557,%r27558,%r27559}, %rd17, %p226, 1, 1; 2026-02-21T10:18:33.6912606Z // end inline asm 2026-02-21T10:18:33.6912664Z // begin inline asm 2026-02-21T10:18:33.6914150Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r27688,%r27689,%r27690,%r27691}, %rd18, %p226, 1, 1; 2026-02-21T10:18:33.6914212Z // end inline asm 2026-02-21T10:18:33.6914273Z // begin inline asm 2026-02-21T10:18:33.6915764Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r27820,%r27821,%r27822,%r27823}, %rd19, %p226, 1, 1; 2026-02-21T10:18:33.6915837Z // end inline asm 2026-02-21T10:18:33.6915898Z // begin inline asm 2026-02-21T10:18:33.6917501Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r27952,%r27953,%r27954,%r27955}, %rd12, %p226, 1, 1; 2026-02-21T10:18:33.6917564Z // end inline asm 2026-02-21T10:18:33.6917622Z // begin inline asm 2026-02-21T10:18:33.6919107Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r28084,%r28085,%r28086,%r28087}, %rd13, %p226, 1, 1; 2026-02-21T10:18:33.6919307Z // end inline asm 2026-02-21T10:18:33.6919365Z // begin inline asm 2026-02-21T10:18:33.6920911Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r28216,%r28217,%r28218,%r28219}, %rd14, %p226, 1, 1; 2026-02-21T10:18:33.6920973Z // end inline asm 2026-02-21T10:18:33.6921030Z // begin inline asm 2026-02-21T10:18:33.6922565Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r28348,%r28349,%r28350,%r28351}, %rd15, %p226, 1, 1; 2026-02-21T10:18:33.6922634Z // end inline asm 2026-02-21T10:18:33.6922696Z // begin inline asm 2026-02-21T10:18:33.6924179Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r28480,%r28481,%r28482,%r28483}, %rd16, %p226, 1, 1; 2026-02-21T10:18:33.6924237Z // end inline asm 2026-02-21T10:18:33.6924298Z // begin inline asm 2026-02-21T10:18:33.6925795Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r28612,%r28613,%r28614,%r28615}, %rd17, %p226, 1, 1; 2026-02-21T10:18:33.6925854Z // end inline asm 2026-02-21T10:18:33.6925917Z // begin inline asm 2026-02-21T10:18:33.6927499Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r28744,%r28745,%r28746,%r28747}, %rd18, %p226, 1, 1; 2026-02-21T10:18:33.6927697Z // end inline asm 2026-02-21T10:18:33.6927755Z // begin inline asm 2026-02-21T10:18:33.6929241Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r28876,%r28877,%r28878,%r28879}, %rd19, %p226, 1, 1; 2026-02-21T10:18:33.6929300Z // end inline asm 2026-02-21T10:18:33.6929379Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:33.6929443Z mov.b32 %r29008, %r29377; 2026-02-21T10:18:33.6929504Z mov.b32 %r29009, %r29010; 2026-02-21T10:18:33.6929563Z // begin inline asm 2026-02-21T10:18:33.6932190Z // wait for regs: 
%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484,%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548,%r29008,%r29009,%r29010 2026-02-21T10:18:33.6932278Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:33.6932334Z // end inline asm 2026-02-21T10:18:33.6932389Z $L__tmp22: 2026-02-21T10:18:33.6932608Z .loc 1 51 93 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:51:93 2026-02-21T10:18:33.6932673Z add.s64 %rd663, %rd663, 96; 2026-02-21T10:18:33.6932738Z setp.lt.u64 %p286, %rd79, 3936; 2026-02-21T10:18:33.6932800Z mov.b64 %rd662, %rd79; 2026-02-21T10:18:33.6932861Z @%p286 bra $L__BB0_14; 2026-02-21T10:18:33.6932964Z // %bb.15: // %.preheader.preheader 2026-02-21T10:18:33.6933072Z // in Loop: Header=BB0_13 Depth=1 2026-02-21T10:18:33.6933276Z .loc 1 0 93 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:0:93 2026-02-21T10:18:33.6933339Z mov.b64 %rd665, 4032; 2026-02-21T10:18:33.6933408Z mov.b64 %rd664, 4000; 2026-02-21T10:18:33.6933507Z $L__BB0_16: // %.preheader 2026-02-21T10:18:33.6933606Z // Parent Loop BB0_13 Depth=1 
2026-02-21T10:18:33.6933714Z // => This Inner Loop Header: Depth=2 2026-02-21T10:18:33.6933922Z .loc 1 58 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:32 2026-02-21T10:18:33.6933986Z shl.b64 %rd650, %rd20, 1; 2026-02-21T10:18:33.6934048Z shl.b64 %rd651, %rd665, 2; 2026-02-21T10:18:33.6934114Z or.b64 %rd652, %rd651, %rd650; 2026-02-21T10:18:33.6934177Z add.s64 %rd610, %rd69, %rd652; 2026-02-21T10:18:33.6934238Z add.s64 %rd613, %rd70, %rd652; 2026-02-21T10:18:33.6934304Z add.s64 %rd616, %rd71, %rd652; 2026-02-21T10:18:33.6934364Z add.s64 %rd619, %rd72, %rd652; 2026-02-21T10:18:33.6934496Z add.s64 %rd622, %rd73, %rd652; 2026-02-21T10:18:33.6934556Z add.s64 %rd625, %rd74, %rd652; 2026-02-21T10:18:33.6934670Z add.s64 %rd628, %rd75, %rd652; 2026-02-21T10:18:33.6934731Z add.s64 %rd631, %rd76, %rd652; 2026-02-21T10:18:33.6934932Z .loc 1 58 80 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:58:80 2026-02-21T10:18:33.6934994Z // begin inline asm 2026-02-21T10:18:33.6935051Z mov.u64 %rd609, 0x0; 2026-02-21T10:18:33.6935184Z createpolicy.fractional.L2::evict_first.b64 %rd609, 1.0; 2026-02-21T10:18:33.6935244Z // end inline asm 2026-02-21T10:18:33.6935302Z // begin inline asm 2026-02-21T10:18:33.6935360Z mov.u32 %r29343, 0x0; 2026-02-21T10:18:33.6935418Z mov.u32 %r29344, 0x0; 2026-02-21T10:18:33.6935476Z mov.u32 %r29345, 0x0; 2026-02-21T10:18:33.6935534Z mov.u32 %r29346, 0x0; 2026-02-21T10:18:33.6935785Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r29343, %r29344, %r29345, %r29346 }, [ %rd610 + 0 ], %rd609; 2026-02-21T10:18:33.6935849Z // end inline asm 2026-02-21T10:18:33.6935908Z // begin inline asm 2026-02-21T10:18:33.6935967Z mov.u64 %rd612, 0x0; 2026-02-21T10:18:33.6936139Z createpolicy.fractional.L2::evict_first.b64 %rd612, 1.0; 2026-02-21T10:18:33.6936199Z // end inline asm 2026-02-21T10:18:33.6936257Z // begin inline asm 2026-02-21T10:18:33.6936313Z mov.u32 %r29347, 0x0; 2026-02-21T10:18:33.6936372Z mov.u32 %r29348, 0x0; 
2026-02-21T10:18:33.6936430Z mov.u32 %r29349, 0x0; 2026-02-21T10:18:33.6936672Z mov.u32 %r29350, 0x0; 2026-02-21T10:18:33.6936918Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r29347, %r29348, %r29349, %r29350 }, [ %rd613 + 0 ], %rd612; 2026-02-21T10:18:33.6936976Z // end inline asm 2026-02-21T10:18:33.6937033Z // begin inline asm 2026-02-21T10:18:33.6937090Z mov.u64 %rd615, 0x0; 2026-02-21T10:18:33.6937212Z createpolicy.fractional.L2::evict_first.b64 %rd615, 1.0; 2026-02-21T10:18:33.6937267Z // end inline asm 2026-02-21T10:18:33.6937324Z // begin inline asm 2026-02-21T10:18:33.6937386Z mov.u32 %r29351, 0x0; 2026-02-21T10:18:33.6937443Z mov.u32 %r29352, 0x0; 2026-02-21T10:18:33.6937501Z mov.u32 %r29353, 0x0; 2026-02-21T10:18:33.6937556Z mov.u32 %r29354, 0x0; 2026-02-21T10:18:33.6937784Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r29351, %r29352, %r29353, %r29354 }, [ %rd616 + 0 ], %rd615; 2026-02-21T10:18:33.6937843Z // end inline asm 2026-02-21T10:18:33.6937900Z // begin inline asm 2026-02-21T10:18:33.6937959Z mov.u64 %rd618, 0x0; 2026-02-21T10:18:33.6938078Z createpolicy.fractional.L2::evict_first.b64 %rd618, 1.0; 2026-02-21T10:18:33.6938134Z // end inline asm 2026-02-21T10:18:33.6938197Z // begin inline asm 2026-02-21T10:18:33.6938263Z mov.u32 %r29355, 0x0; 2026-02-21T10:18:33.6938321Z mov.u32 %r29356, 0x0; 2026-02-21T10:18:33.6938377Z mov.u32 %r29357, 0x0; 2026-02-21T10:18:33.6938437Z mov.u32 %r29358, 0x0; 2026-02-21T10:18:33.6938659Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r29355, %r29356, %r29357, %r29358 }, [ %rd619 + 0 ], %rd618; 2026-02-21T10:18:33.6938717Z // end inline asm 2026-02-21T10:18:33.6938776Z // begin inline asm 2026-02-21T10:18:33.6938834Z mov.u64 %rd621, 0x0; 2026-02-21T10:18:33.6938952Z createpolicy.fractional.L2::evict_first.b64 %rd621, 1.0; 2026-02-21T10:18:33.6939010Z // end inline asm 2026-02-21T10:18:33.6939067Z // begin inline asm 2026-02-21T10:18:33.6939126Z mov.u32 %r29359, 0x0; 2026-02-21T10:18:33.6939184Z 
mov.u32 %r29360, 0x0; 2026-02-21T10:18:33.6939243Z mov.u32 %r29361, 0x0; 2026-02-21T10:18:33.6939302Z mov.u32 %r29362, 0x0; 2026-02-21T10:18:33.6939523Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r29359, %r29360, %r29361, %r29362 }, [ %rd622 + 0 ], %rd621; 2026-02-21T10:18:33.6939581Z // end inline asm 2026-02-21T10:18:33.6939637Z // begin inline asm 2026-02-21T10:18:33.6939693Z mov.u64 %rd624, 0x0; 2026-02-21T10:18:33.6939809Z createpolicy.fractional.L2::evict_first.b64 %rd624, 1.0; 2026-02-21T10:18:33.6939867Z // end inline asm 2026-02-21T10:18:33.6939924Z // begin inline asm 2026-02-21T10:18:33.6940070Z mov.u32 %r29363, 0x0; 2026-02-21T10:18:33.6940128Z mov.u32 %r29364, 0x0; 2026-02-21T10:18:33.6940246Z mov.u32 %r29365, 0x0; 2026-02-21T10:18:33.6940303Z mov.u32 %r29366, 0x0; 2026-02-21T10:18:33.6940552Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r29363, %r29364, %r29365, %r29366 }, [ %rd625 + 0 ], %rd624; 2026-02-21T10:18:33.6940610Z // end inline asm 2026-02-21T10:18:33.6940668Z // begin inline asm 2026-02-21T10:18:33.6940725Z mov.u64 %rd627, 0x0; 2026-02-21T10:18:33.6940845Z createpolicy.fractional.L2::evict_first.b64 %rd627, 1.0; 2026-02-21T10:18:33.6940900Z // end inline asm 2026-02-21T10:18:33.6940957Z // begin inline asm 2026-02-21T10:18:33.6941016Z mov.u32 %r29367, 0x0; 2026-02-21T10:18:33.6941073Z mov.u32 %r29368, 0x0; 2026-02-21T10:18:33.6941128Z mov.u32 %r29369, 0x0; 2026-02-21T10:18:33.6941185Z mov.u32 %r29370, 0x0; 2026-02-21T10:18:33.6941409Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r29367, %r29368, %r29369, %r29370 }, [ %rd628 + 0 ], %rd627; 2026-02-21T10:18:33.6941467Z // end inline asm 2026-02-21T10:18:33.6941525Z // begin inline asm 2026-02-21T10:18:33.6941588Z mov.u64 %rd630, 0x0; 2026-02-21T10:18:33.6941776Z createpolicy.fractional.L2::evict_first.b64 %rd630, 1.0; 2026-02-21T10:18:33.6941838Z // end inline asm 2026-02-21T10:18:33.6941897Z // begin inline asm 2026-02-21T10:18:33.6941953Z mov.u32 %r29371, 0x0; 
2026-02-21T10:18:33.6942010Z mov.u32 %r29372, 0x0; 2026-02-21T10:18:33.6942067Z mov.u32 %r29373, 0x0; 2026-02-21T10:18:33.6942168Z mov.u32 %r29374, 0x0; 2026-02-21T10:18:33.6942395Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r29371, %r29372, %r29373, %r29374 }, [ %rd631 + 0 ], %rd630; 2026-02-21T10:18:33.6942451Z // end inline asm 2026-02-21T10:18:33.6942657Z .loc 1 62 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:62:32 2026-02-21T10:18:33.6942714Z bar.sync 0; 2026-02-21T10:18:33.6942800Z st.shared.v2.b32 [%r75], {%r29343, %r29344}; 2026-02-21T10:18:33.6942894Z st.shared.v2.b32 [%r75+2048], {%r29347, %r29348}; 2026-02-21T10:18:33.6942985Z st.shared.v2.b32 [%r75+4096], {%r29351, %r29352}; 2026-02-21T10:18:33.6943071Z st.shared.v2.b32 [%r75+6144], {%r29355, %r29356}; 2026-02-21T10:18:33.6943154Z st.shared.v2.b32 [%r75+8192], {%r29359, %r29360}; 2026-02-21T10:18:33.6943248Z st.shared.v2.b32 [%r75+10240], {%r29363, %r29364}; 2026-02-21T10:18:33.6943335Z st.shared.v2.b32 [%r75+12288], {%r29367, %r29368}; 2026-02-21T10:18:33.6943420Z st.shared.v2.b32 [%r75+14336], {%r29371, %r29372}; 2026-02-21T10:18:33.6943503Z st.shared.v2.b32 [%r76], {%r29345, %r29346}; 2026-02-21T10:18:33.6943586Z st.shared.v2.b32 [%r76+2048], {%r29349, %r29350}; 2026-02-21T10:18:33.6943669Z st.shared.v2.b32 [%r76+4096], {%r29353, %r29354}; 2026-02-21T10:18:33.6943753Z st.shared.v2.b32 [%r76+6144], {%r29357, %r29358}; 2026-02-21T10:18:33.6943834Z st.shared.v2.b32 [%r76+8192], {%r29361, %r29362}; 2026-02-21T10:18:33.6943918Z st.shared.v2.b32 [%r76+10240], {%r29365, %r29366}; 2026-02-21T10:18:33.6944017Z st.shared.v2.b32 [%r76+12288], {%r29369, %r29370}; 2026-02-21T10:18:33.6944109Z st.shared.v2.b32 [%r76+14336], {%r29373, %r29374}; 2026-02-21T10:18:33.6944169Z bar.sync 0; 2026-02-21T10:18:33.6944238Z ld.shared.b16 %rs2465, [%r77]; 2026-02-21T10:18:33.6944312Z ld.shared.b16 %rs2466, [%r77+1024]; 2026-02-21T10:18:33.6944379Z ld.shared.b16 %rs2467, [%r77+64]; 
2026-02-21T10:18:33.6944446Z ld.shared.b16 %rs2468, [%r77+1088]; 2026-02-21T10:18:33.6944518Z ld.shared.b16 %rs2469, [%r77+8192]; 2026-02-21T10:18:33.6944584Z ld.shared.b16 %rs2470, [%r77+9216]; 2026-02-21T10:18:33.6944647Z ld.shared.b16 %rs2471, [%r77+8256]; 2026-02-21T10:18:33.6944711Z ld.shared.b16 %rs2472, [%r77+9280]; 2026-02-21T10:18:33.6944777Z ld.shared.b16 %rs2473, [%r78]; 2026-02-21T10:18:33.6944841Z ld.shared.b16 %rs2474, [%r78+1024]; 2026-02-21T10:18:33.6944905Z ld.shared.b16 %rs2475, [%r78+64]; 2026-02-21T10:18:33.6944973Z ld.shared.b16 %rs2476, [%r78+1088]; 2026-02-21T10:18:33.6945037Z ld.shared.b16 %rs2477, [%r78+8192]; 2026-02-21T10:18:33.6945171Z ld.shared.b16 %rs2478, [%r78+9216]; 2026-02-21T10:18:33.6945279Z ld.shared.b16 %rs2479, [%r78+8256]; 2026-02-21T10:18:33.6945348Z ld.shared.b16 %rs2480, [%r78+9280]; 2026-02-21T10:18:33.6945411Z ld.shared.b16 %rs2481, [%r79]; 2026-02-21T10:18:33.6945475Z ld.shared.b16 %rs2482, [%r79+1024]; 2026-02-21T10:18:33.6945540Z ld.shared.b16 %rs2483, [%r79+64]; 2026-02-21T10:18:33.6945603Z ld.shared.b16 %rs2484, [%r79+1088]; 2026-02-21T10:18:33.6945669Z ld.shared.b16 %rs2485, [%r79+8192]; 2026-02-21T10:18:33.6945735Z ld.shared.b16 %rs2486, [%r79+9216]; 2026-02-21T10:18:33.6945799Z ld.shared.b16 %rs2487, [%r79+8256]; 2026-02-21T10:18:33.6945861Z ld.shared.b16 %rs2488, [%r79+9280]; 2026-02-21T10:18:33.6945923Z ld.shared.b16 %rs2489, [%r80]; 2026-02-21T10:18:33.6945990Z ld.shared.b16 %rs2490, [%r80+1024]; 2026-02-21T10:18:33.6946051Z ld.shared.b16 %rs2491, [%r80+64]; 2026-02-21T10:18:33.6946115Z ld.shared.b16 %rs2492, [%r80+1088]; 2026-02-21T10:18:33.6946182Z ld.shared.b16 %rs2493, [%r80+8192]; 2026-02-21T10:18:33.6946245Z ld.shared.b16 %rs2494, [%r80+9216]; 2026-02-21T10:18:33.6946312Z ld.shared.b16 %rs2495, [%r80+8256]; 2026-02-21T10:18:33.6946429Z ld.shared.b16 %rs2496, [%r80+9280]; 2026-02-21T10:18:33.6946610Z ld.shared.b16 %rs2497, [%r81]; 2026-02-21T10:18:33.6946676Z ld.shared.b16 %rs2498, [%r81+1024]; 
2026-02-21T10:18:33.6946739Z ld.shared.b16 %rs2499, [%r81+64]; 2026-02-21T10:18:33.6946880Z ld.shared.b16 %rs2500, [%r81+1088]; 2026-02-21T10:18:33.6946952Z ld.shared.b16 %rs2501, [%r81+8192]; 2026-02-21T10:18:33.6947018Z ld.shared.b16 %rs2502, [%r81+9216]; 2026-02-21T10:18:33.6947084Z ld.shared.b16 %rs2503, [%r81+8256]; 2026-02-21T10:18:33.6947148Z ld.shared.b16 %rs2504, [%r81+9280]; 2026-02-21T10:18:33.6947210Z ld.shared.b16 %rs2505, [%r82]; 2026-02-21T10:18:33.6947275Z ld.shared.b16 %rs2506, [%r82+1024]; 2026-02-21T10:18:33.6947341Z ld.shared.b16 %rs2507, [%r82+64]; 2026-02-21T10:18:33.6947405Z ld.shared.b16 %rs2508, [%r82+1088]; 2026-02-21T10:18:33.6947471Z ld.shared.b16 %rs2509, [%r82+8192]; 2026-02-21T10:18:33.6947540Z ld.shared.b16 %rs2510, [%r82+9216]; 2026-02-21T10:18:33.6947605Z ld.shared.b16 %rs2511, [%r82+8256]; 2026-02-21T10:18:33.6947667Z ld.shared.b16 %rs2512, [%r82+9280]; 2026-02-21T10:18:33.6947730Z ld.shared.b16 %rs2513, [%r83]; 2026-02-21T10:18:33.6947798Z ld.shared.b16 %rs2514, [%r83+1024]; 2026-02-21T10:18:33.6947862Z ld.shared.b16 %rs2515, [%r83+64]; 2026-02-21T10:18:33.6947927Z ld.shared.b16 %rs2516, [%r83+1088]; 2026-02-21T10:18:33.6947993Z ld.shared.b16 %rs2517, [%r83+8192]; 2026-02-21T10:18:33.6948056Z ld.shared.b16 %rs2518, [%r83+9216]; 2026-02-21T10:18:33.6948119Z ld.shared.b16 %rs2519, [%r83+8256]; 2026-02-21T10:18:33.6948187Z ld.shared.b16 %rs2520, [%r83+9280]; 2026-02-21T10:18:33.6948250Z ld.shared.b16 %rs2521, [%r84]; 2026-02-21T10:18:33.6948313Z ld.shared.b16 %rs2522, [%r84+1024]; 2026-02-21T10:18:33.6948375Z ld.shared.b16 %rs2523, [%r84+64]; 2026-02-21T10:18:33.6948521Z ld.shared.b16 %rs2524, [%r84+1088]; 2026-02-21T10:18:33.6948589Z ld.shared.b16 %rs2525, [%r84+8192]; 2026-02-21T10:18:33.6948654Z ld.shared.b16 %rs2526, [%r84+9216]; 2026-02-21T10:18:33.6948720Z ld.shared.b16 %rs2527, [%r84+8256]; 2026-02-21T10:18:33.6948785Z ld.shared.b16 %rs2528, [%r84+9280]; 2026-02-21T10:18:33.6948848Z cvt.f32.bf16 %r29512, %rs2465; 
2026-02-21T10:18:33.6948910Z cvt.f32.bf16 %r29513, %rs2466; 2026-02-21T10:18:33.6948974Z cvt.f32.bf16 %r29514, %rs2473; 2026-02-21T10:18:33.6949035Z cvt.f32.bf16 %r29515, %rs2474; 2026-02-21T10:18:33.6949095Z cvt.f32.bf16 %r29644, %rs2481; 2026-02-21T10:18:33.6949156Z cvt.f32.bf16 %r29645, %rs2482; 2026-02-21T10:18:33.6949215Z cvt.f32.bf16 %r29646, %rs2489; 2026-02-21T10:18:33.6949276Z cvt.f32.bf16 %r29647, %rs2490; 2026-02-21T10:18:33.6949335Z cvt.f32.bf16 %r29776, %rs2497; 2026-02-21T10:18:33.6949396Z cvt.f32.bf16 %r29777, %rs2498; 2026-02-21T10:18:33.6949456Z cvt.f32.bf16 %r29778, %rs2505; 2026-02-21T10:18:33.6949597Z cvt.f32.bf16 %r29779, %rs2506; 2026-02-21T10:18:33.6949658Z cvt.f32.bf16 %r29908, %rs2513; 2026-02-21T10:18:33.6949777Z cvt.f32.bf16 %r29909, %rs2514; 2026-02-21T10:18:33.6949838Z cvt.f32.bf16 %r29910, %rs2521; 2026-02-21T10:18:33.6949901Z cvt.f32.bf16 %r29911, %rs2522; 2026-02-21T10:18:33.6949960Z cvt.f32.bf16 %r30040, %rs2467; 2026-02-21T10:18:33.6950019Z cvt.f32.bf16 %r30041, %rs2468; 2026-02-21T10:18:33.6950078Z cvt.f32.bf16 %r30042, %rs2475; 2026-02-21T10:18:33.6950142Z cvt.f32.bf16 %r30043, %rs2476; 2026-02-21T10:18:33.6950202Z cvt.f32.bf16 %r30172, %rs2483; 2026-02-21T10:18:33.6950260Z cvt.f32.bf16 %r30173, %rs2484; 2026-02-21T10:18:33.6950321Z cvt.f32.bf16 %r30174, %rs2491; 2026-02-21T10:18:33.6950381Z cvt.f32.bf16 %r30175, %rs2492; 2026-02-21T10:18:33.6950441Z cvt.f32.bf16 %r30304, %rs2499; 2026-02-21T10:18:33.6950500Z cvt.f32.bf16 %r30305, %rs2500; 2026-02-21T10:18:33.6950563Z cvt.f32.bf16 %r30306, %rs2507; 2026-02-21T10:18:33.6950622Z cvt.f32.bf16 %r30307, %rs2508; 2026-02-21T10:18:33.6950683Z cvt.f32.bf16 %r30436, %rs2515; 2026-02-21T10:18:33.6950746Z cvt.f32.bf16 %r30437, %rs2516; 2026-02-21T10:18:33.6950806Z cvt.f32.bf16 %r30438, %rs2523; 2026-02-21T10:18:33.6950937Z cvt.f32.bf16 %r30439, %rs2524; 2026-02-21T10:18:33.6951002Z cvt.f32.bf16 %r30568, %rs2469; 2026-02-21T10:18:33.6951065Z cvt.f32.bf16 %r30569, %rs2470; 
2026-02-21T10:18:33.6951125Z cvt.f32.bf16 %r30570, %rs2477; 2026-02-21T10:18:33.6951250Z cvt.f32.bf16 %r30571, %rs2478; 2026-02-21T10:18:33.6951314Z cvt.f32.bf16 %r30700, %rs2485; 2026-02-21T10:18:33.6951374Z cvt.f32.bf16 %r30701, %rs2486; 2026-02-21T10:18:33.6951433Z cvt.f32.bf16 %r30702, %rs2493; 2026-02-21T10:18:33.6951495Z cvt.f32.bf16 %r30703, %rs2494; 2026-02-21T10:18:33.6951553Z cvt.f32.bf16 %r30832, %rs2501; 2026-02-21T10:18:33.6951612Z cvt.f32.bf16 %r30833, %rs2502; 2026-02-21T10:18:33.6951672Z cvt.f32.bf16 %r30834, %rs2509; 2026-02-21T10:18:33.6951743Z cvt.f32.bf16 %r30835, %rs2510; 2026-02-21T10:18:33.6951809Z cvt.f32.bf16 %r30964, %rs2517; 2026-02-21T10:18:33.6951870Z cvt.f32.bf16 %r30965, %rs2518; 2026-02-21T10:18:33.6951933Z cvt.f32.bf16 %r30966, %rs2525; 2026-02-21T10:18:33.6951994Z cvt.f32.bf16 %r30967, %rs2526; 2026-02-21T10:18:33.6952053Z cvt.f32.bf16 %r31096, %rs2471; 2026-02-21T10:18:33.6952113Z cvt.f32.bf16 %r31097, %rs2472; 2026-02-21T10:18:33.6952175Z cvt.f32.bf16 %r31098, %rs2479; 2026-02-21T10:18:33.6952234Z cvt.f32.bf16 %r31099, %rs2480; 2026-02-21T10:18:33.6952294Z cvt.f32.bf16 %r31228, %rs2487; 2026-02-21T10:18:33.6952356Z cvt.f32.bf16 %r31229, %rs2488; 2026-02-21T10:18:33.6952416Z cvt.f32.bf16 %r31230, %rs2495; 2026-02-21T10:18:33.6952475Z cvt.f32.bf16 %r31231, %rs2496; 2026-02-21T10:18:33.6952534Z cvt.f32.bf16 %r31360, %rs2503; 2026-02-21T10:18:33.6952596Z cvt.f32.bf16 %r31361, %rs2504; 2026-02-21T10:18:33.6952655Z cvt.f32.bf16 %r31362, %rs2511; 2026-02-21T10:18:33.6952713Z cvt.f32.bf16 %r31363, %rs2512; 2026-02-21T10:18:33.6952775Z cvt.f32.bf16 %r31492, %rs2519; 2026-02-21T10:18:33.6952837Z cvt.f32.bf16 %r31493, %rs2520; 2026-02-21T10:18:33.6952898Z cvt.f32.bf16 %r31494, %rs2527; 2026-02-21T10:18:33.6952968Z cvt.f32.bf16 %r31495, %rs2528; 2026-02-21T10:18:33.6953183Z .loc 1 64 33 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:64:33 2026-02-21T10:18:33.6953239Z bar.sync 0; 2026-02-21T10:18:33.6953298Z // begin inline asm 
2026-02-21T10:18:33.6953401Z @%p222 mbarrier.init.shared::cta.b64 [%r29375], 1; 2026-02-21T10:18:33.6953460Z // end inline asm 2026-02-21T10:18:33.6953514Z bar.sync 0; 2026-02-21T10:18:33.6953575Z // begin inline asm 2026-02-21T10:18:33.6953713Z @%p222 mbarrier.arrive.expect_tx.shared.b64 _, [%r29375], 4096; 2026-02-21T10:18:33.6953770Z // end inline asm 2026-02-21T10:18:33.6953828Z // begin inline asm 2026-02-21T10:18:33.6953907Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.6953962Z // end inline asm 2026-02-21T10:18:33.6954015Z bar.sync 0; 2026-02-21T10:18:33.6954084Z elect.sync %r31758|%p308, -1; 2026-02-21T10:18:33.6954212Z and.pred %p289, %p1, %p308; 2026-02-21T10:18:33.6954317Z add.s64 %rd664, %rd664, 32; 2026-02-21T10:18:33.6954380Z cvt.u32.u64 %r29379, %rd664; 2026-02-21T10:18:33.6954443Z // begin inline asm 2026-02-21T10:18:33.6954805Z @%p289 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r29377], [%rd633, {%r29378, %r29379}], [%r29375]; 2026-02-21T10:18:33.6954863Z // end inline asm 2026-02-21T10:18:33.6954923Z bar.sync 0; 2026-02-21T10:18:33.6954979Z mov.b32 %r31626, 0; 2026-02-21T10:18:33.6955037Z // begin inline asm 2026-02-21T10:18:33.6955089Z 2026-02-21T10:18:33.6955138Z { 2026-02-21T10:18:33.6955202Z .reg .pred complete; 2026-02-21T10:18:33.6955257Z waitLoop: 2026-02-21T10:18:33.6955407Z mbarrier.try_wait.parity.shared.b64 complete, [%r29375], %r31626; 2026-02-21T10:18:33.6955474Z @!complete bra.uni waitLoop; 2026-02-21T10:18:33.6955523Z } 2026-02-21T10:18:33.6955528Z 2026-02-21T10:18:33.6955585Z // end inline asm 2026-02-21T10:18:33.6955643Z bar.sync 0; 2026-02-21T10:18:33.6955700Z // begin inline asm 2026-02-21T10:18:33.6955798Z @%p222 mbarrier.inval.shared::cta.b64 [%r29375]; 2026-02-21T10:18:33.6955913Z // end inline asm 2026-02-21T10:18:33.6956123Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6956188Z ld.shared.s8 %rs2529, [%r85]; 
2026-02-21T10:18:33.6956433Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6956627Z shl.b16 %rs2530, %rs2529, 4; 2026-02-21T10:18:33.6956827Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6956899Z ld.shared.s8 %rs2531, [%r86+128]; 2026-02-21T10:18:33.6957094Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6957155Z shl.b16 %rs2532, %rs2531, 4; 2026-02-21T10:18:33.6957351Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6957421Z ld.shared.s8 %rs2533, [%r87+256]; 2026-02-21T10:18:33.6957617Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6957678Z shl.b16 %rs2534, %rs2533, 4; 2026-02-21T10:18:33.6957873Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6957939Z ld.shared.s8 %rs2535, [%r88+384]; 2026-02-21T10:18:33.6958133Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6958197Z shl.b16 %rs2536, %rs2535, 4; 2026-02-21T10:18:33.6958390Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6958454Z ld.shared.s8 %rs2537, [%r89+512]; 2026-02-21T10:18:33.6958648Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6958712Z shl.b16 %rs2538, %rs2537, 4; 2026-02-21T10:18:33.6958907Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6958988Z ld.shared.s8 %rs2539, [%r90+640]; 2026-02-21T10:18:33.6959184Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6959245Z shl.b16 %rs2540, %rs2539, 4; 2026-02-21T10:18:33.6959441Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6959507Z ld.shared.s8 %rs2541, 
[%r91+768]; 2026-02-21T10:18:33.6959700Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6959760Z shl.b16 %rs2542, %rs2541, 4; 2026-02-21T10:18:33.6959953Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6960097Z ld.shared.s8 %rs2543, [%r92+896]; 2026-02-21T10:18:33.6960290Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6960418Z shl.b16 %rs2544, %rs2543, 4; 2026-02-21T10:18:33.6960610Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6960676Z ld.shared.s8 %rs2545, [%r85+1024]; 2026-02-21T10:18:33.6960871Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6960933Z shl.b16 %rs2546, %rs2545, 4; 2026-02-21T10:18:33.6961126Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6961190Z ld.shared.s8 %rs2547, [%r86+1152]; 2026-02-21T10:18:33.6961400Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6961462Z shl.b16 %rs2548, %rs2547, 4; 2026-02-21T10:18:33.6961659Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6961787Z ld.shared.s8 %rs2549, [%r87+1280]; 2026-02-21T10:18:33.6961983Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6962044Z shl.b16 %rs2550, %rs2549, 4; 2026-02-21T10:18:33.6962296Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6962363Z ld.shared.s8 %rs2551, [%r88+1408]; 2026-02-21T10:18:33.6962557Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6962619Z shl.b16 %rs2552, %rs2551, 4; 2026-02-21T10:18:33.6962809Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6962886Z 
ld.shared.s8 %rs2553, [%r89+1536]; 2026-02-21T10:18:33.6963083Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6963148Z shl.b16 %rs2554, %rs2553, 4; 2026-02-21T10:18:33.6963349Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6963411Z ld.shared.s8 %rs2555, [%r90+1664]; 2026-02-21T10:18:33.6963605Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6963667Z shl.b16 %rs2556, %rs2555, 4; 2026-02-21T10:18:33.6963860Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6963926Z ld.shared.s8 %rs2557, [%r91+1792]; 2026-02-21T10:18:33.6964119Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6974831Z shl.b16 %rs2558, %rs2557, 4; 2026-02-21T10:18:33.6975126Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6975226Z ld.shared.s8 %rs2559, [%r92+1920]; 2026-02-21T10:18:33.6975464Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6975539Z shl.b16 %rs2560, %rs2559, 4; 2026-02-21T10:18:33.6975748Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6975826Z ld.shared.s8 %rs2561, [%r85+2048]; 2026-02-21T10:18:33.6976030Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6976101Z shl.b16 %rs2562, %rs2561, 4; 2026-02-21T10:18:33.6976319Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6976391Z ld.shared.s8 %rs2563, [%r86+2176]; 2026-02-21T10:18:33.6976769Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6976984Z shl.b16 %rs2564, %rs2563, 4; 2026-02-21T10:18:33.6977188Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 
2026-02-21T10:18:33.6977325Z ld.shared.s8 %rs2565, [%r87+2304]; 2026-02-21T10:18:33.6977526Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6977596Z shl.b16 %rs2566, %rs2565, 4; 2026-02-21T10:18:33.6977797Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6977865Z ld.shared.s8 %rs2567, [%r88+2432]; 2026-02-21T10:18:33.6978064Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6978128Z shl.b16 %rs2568, %rs2567, 4; 2026-02-21T10:18:33.6978323Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6978391Z ld.shared.s8 %rs2569, [%r89+2560]; 2026-02-21T10:18:33.6978585Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6978661Z shl.b16 %rs2570, %rs2569, 4; 2026-02-21T10:18:33.6978924Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6978998Z ld.shared.s8 %rs2571, [%r90+2688]; 2026-02-21T10:18:33.6979251Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6979316Z shl.b16 %rs2572, %rs2571, 4; 2026-02-21T10:18:33.6979512Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6979578Z ld.shared.s8 %rs2573, [%r91+2816]; 2026-02-21T10:18:33.6979770Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6979836Z shl.b16 %rs2574, %rs2573, 4; 2026-02-21T10:18:33.6980028Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6980107Z ld.shared.s8 %rs2575, [%r92+2944]; 2026-02-21T10:18:33.6980310Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6980374Z shl.b16 %rs2576, %rs2575, 4; 2026-02-21T10:18:33.6980568Z .loc 1 69 25 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6980636Z ld.shared.s8 %rs2577, [%r85+3072]; 2026-02-21T10:18:33.6980833Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6980901Z shl.b16 %rs2578, %rs2577, 4; 2026-02-21T10:18:33.6981096Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6981163Z ld.shared.s8 %rs2579, [%r86+3200]; 2026-02-21T10:18:33.6981359Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6981423Z shl.b16 %rs2580, %rs2579, 4; 2026-02-21T10:18:33.6981623Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6981688Z ld.shared.s8 %rs2581, [%r87+3328]; 2026-02-21T10:18:33.6981882Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6981948Z shl.b16 %rs2582, %rs2581, 4; 2026-02-21T10:18:33.6982145Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6982211Z ld.shared.s8 %rs2583, [%r88+3456]; 2026-02-21T10:18:33.6982406Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6982483Z shl.b16 %rs2584, %rs2583, 4; 2026-02-21T10:18:33.6982680Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6982802Z ld.shared.s8 %rs2585, [%r89+3584]; 2026-02-21T10:18:33.6982999Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6983109Z shl.b16 %rs2586, %rs2585, 4; 2026-02-21T10:18:33.6983317Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6983389Z ld.shared.s8 %rs2587, [%r90+3712]; 2026-02-21T10:18:33.6983590Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6983654Z shl.b16 %rs2588, %rs2587, 4; 2026-02-21T10:18:33.6983857Z 
.loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6983923Z ld.shared.s8 %rs2589, [%r91+3840]; 2026-02-21T10:18:33.6984120Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6984185Z shl.b16 %rs2590, %rs2589, 4; 2026-02-21T10:18:33.6984383Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6984449Z ld.shared.s8 %rs2591, [%r92+3968]; 2026-02-21T10:18:33.6984699Z .loc 1 67 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:67:28 2026-02-21T10:18:33.6984767Z shl.b16 %rs2592, %rs2591, 4; 2026-02-21T10:18:33.6985022Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6985097Z cvt.s16.s8 %rs2593, %rs2530; 2026-02-21T10:18:33.6985167Z shr.s16 %rs2594, %rs2593, 4; 2026-02-21T10:18:33.6985233Z cvt.s16.s8 %rs2595, %rs2532; 2026-02-21T10:18:33.6985294Z shr.s16 %rs2596, %rs2595, 4; 2026-02-21T10:18:33.6985359Z shr.s16 %rs2597, %rs2529, 4; 2026-02-21T10:18:33.6985420Z shr.s16 %rs2598, %rs2531, 4; 2026-02-21T10:18:33.6985639Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6985712Z cvt.rn.f32.s16 %r31759, %rs2598; 2026-02-21T10:18:33.6985779Z cvt.rn.f32.s16 %r31760, %rs2597; 2026-02-21T10:18:33.6985845Z cvt.rn.f32.s16 %r31761, %rs2596; 2026-02-21T10:18:33.6985910Z cvt.rn.f32.s16 %r31762, %rs2594; 2026-02-21T10:18:33.6986129Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6986194Z cvt.s16.s8 %rs2599, %rs2534; 2026-02-21T10:18:33.6986260Z shr.s16 %rs2600, %rs2599, 4; 2026-02-21T10:18:33.6986322Z cvt.s16.s8 %rs2601, %rs2536; 2026-02-21T10:18:33.6986389Z shr.s16 %rs2602, %rs2601, 4; 2026-02-21T10:18:33.6986570Z shr.s16 %rs2603, %rs2533, 4; 2026-02-21T10:18:33.6986637Z shr.s16 %rs2604, %rs2535, 4; 2026-02-21T10:18:33.6986849Z .loc 1 87 32 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6986916Z cvt.rn.f32.s16 %r31763, %rs2604; 2026-02-21T10:18:33.6986981Z cvt.rn.f32.s16 %r31764, %rs2603; 2026-02-21T10:18:33.6987048Z cvt.rn.f32.s16 %r31765, %rs2602; 2026-02-21T10:18:33.6987112Z cvt.rn.f32.s16 %r31766, %rs2600; 2026-02-21T10:18:33.6987317Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6987381Z cvt.s16.s8 %rs2605, %rs2538; 2026-02-21T10:18:33.6987447Z shr.s16 %rs2606, %rs2605, 4; 2026-02-21T10:18:33.6987508Z cvt.s16.s8 %rs2607, %rs2540; 2026-02-21T10:18:33.6987570Z shr.s16 %rs2608, %rs2607, 4; 2026-02-21T10:18:33.6987638Z shr.s16 %rs2609, %rs2537, 4; 2026-02-21T10:18:33.6987704Z shr.s16 %rs2610, %rs2539, 4; 2026-02-21T10:18:33.6987904Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.6987972Z cvt.rn.f32.s16 %r31767, %rs2610; 2026-02-21T10:18:33.6988037Z cvt.rn.f32.s16 %r31768, %rs2609; 2026-02-21T10:18:33.6988100Z cvt.rn.f32.s16 %r31769, %rs2608; 2026-02-21T10:18:33.6988163Z cvt.rn.f32.s16 %r31770, %rs2606; 2026-02-21T10:18:33.6988363Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.6988665Z cvt.s16.s8 %rs2611, %rs2542; 2026-02-21T10:18:33.7009398Z shr.s16 %rs2612, %rs2611, 4; 2026-02-21T10:18:33.7009498Z cvt.s16.s8 %rs2613, %rs2544; 2026-02-21T10:18:33.7009567Z shr.s16 %rs2614, %rs2613, 4; 2026-02-21T10:18:33.7009629Z shr.s16 %rs2615, %rs2541, 4; 2026-02-21T10:18:33.7009690Z shr.s16 %rs2616, %rs2543, 4; 2026-02-21T10:18:33.7009926Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.7009997Z cvt.rn.f32.s16 %r31771, %rs2616; 2026-02-21T10:18:33.7010061Z cvt.rn.f32.s16 %r31772, %rs2615; 2026-02-21T10:18:33.7010125Z cvt.rn.f32.s16 %r31773, %rs2614; 2026-02-21T10:18:33.7010185Z cvt.rn.f32.s16 %r31774, %rs2612; 2026-02-21T10:18:33.7010402Z .loc 1 69 25 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.7010470Z cvt.s16.s8 %rs2617, %rs2546; 2026-02-21T10:18:33.7010534Z shr.s16 %rs2618, %rs2617, 4; 2026-02-21T10:18:33.7010596Z cvt.s16.s8 %rs2619, %rs2548; 2026-02-21T10:18:33.7010656Z shr.s16 %rs2620, %rs2619, 4; 2026-02-21T10:18:33.7010805Z shr.s16 %rs2621, %rs2545, 4; 2026-02-21T10:18:33.7010879Z shr.s16 %rs2622, %rs2547, 4; 2026-02-21T10:18:33.7011086Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.7011218Z cvt.rn.f32.s16 %r31775, %rs2622; 2026-02-21T10:18:33.7011283Z cvt.rn.f32.s16 %r31776, %rs2621; 2026-02-21T10:18:33.7011343Z cvt.rn.f32.s16 %r31777, %rs2620; 2026-02-21T10:18:33.7011406Z cvt.rn.f32.s16 %r31778, %rs2618; 2026-02-21T10:18:33.7011603Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.7011664Z cvt.s16.s8 %rs2623, %rs2550; 2026-02-21T10:18:33.7011735Z shr.s16 %rs2624, %rs2623, 4; 2026-02-21T10:18:33.7011800Z cvt.s16.s8 %rs2625, %rs2552; 2026-02-21T10:18:33.7011862Z shr.s16 %rs2626, %rs2625, 4; 2026-02-21T10:18:33.7011921Z shr.s16 %rs2627, %rs2549, 4; 2026-02-21T10:18:33.7011984Z shr.s16 %rs2628, %rs2551, 4; 2026-02-21T10:18:33.7012181Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.7012243Z cvt.rn.f32.s16 %r31779, %rs2628; 2026-02-21T10:18:33.7012306Z cvt.rn.f32.s16 %r31780, %rs2627; 2026-02-21T10:18:33.7012366Z cvt.rn.f32.s16 %r31781, %rs2626; 2026-02-21T10:18:33.7012427Z cvt.rn.f32.s16 %r31782, %rs2624; 2026-02-21T10:18:33.7012620Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.7012683Z cvt.s16.s8 %rs2629, %rs2554; 2026-02-21T10:18:33.7012745Z shr.s16 %rs2630, %rs2629, 4; 2026-02-21T10:18:33.7012805Z cvt.s16.s8 %rs2631, %rs2556; 2026-02-21T10:18:33.7012873Z shr.s16 %rs2632, %rs2631, 4; 2026-02-21T10:18:33.7012938Z shr.s16 %rs2633, %rs2553, 4; 
2026-02-21T10:18:33.7012999Z shr.s16 %rs2634, %rs2555, 4; 2026-02-21T10:18:33.7013198Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.7013265Z cvt.rn.f32.s16 %r31783, %rs2634; 2026-02-21T10:18:33.7013326Z cvt.rn.f32.s16 %r31784, %rs2633; 2026-02-21T10:18:33.7013386Z cvt.rn.f32.s16 %r31785, %rs2632; 2026-02-21T10:18:33.7013450Z cvt.rn.f32.s16 %r31786, %rs2630; 2026-02-21T10:18:33.7013642Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.7013703Z cvt.s16.s8 %rs2635, %rs2558; 2026-02-21T10:18:33.7013765Z shr.s16 %rs2636, %rs2635, 4; 2026-02-21T10:18:33.7013824Z cvt.s16.s8 %rs2637, %rs2560; 2026-02-21T10:18:33.7013885Z shr.s16 %rs2638, %rs2637, 4; 2026-02-21T10:18:33.7013944Z shr.s16 %rs2639, %rs2557, 4; 2026-02-21T10:18:33.7014006Z shr.s16 %rs2640, %rs2559, 4; 2026-02-21T10:18:33.7014198Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.7014345Z cvt.rn.f32.s16 %r31787, %rs2640; 2026-02-21T10:18:33.7014468Z cvt.rn.f32.s16 %r31788, %rs2639; 2026-02-21T10:18:33.7014529Z cvt.rn.f32.s16 %r31789, %rs2638; 2026-02-21T10:18:33.7014590Z cvt.rn.f32.s16 %r31790, %rs2636; 2026-02-21T10:18:33.7014785Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.7014845Z cvt.s16.s8 %rs2641, %rs2562; 2026-02-21T10:18:33.7014906Z shr.s16 %rs2642, %rs2641, 4; 2026-02-21T10:18:33.7014967Z cvt.s16.s8 %rs2643, %rs2564; 2026-02-21T10:18:33.7015028Z shr.s16 %rs2644, %rs2643, 4; 2026-02-21T10:18:33.7015086Z shr.s16 %rs2645, %rs2561, 4; 2026-02-21T10:18:33.7015145Z shr.s16 %rs2646, %rs2563, 4; 2026-02-21T10:18:33.7015339Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.7015400Z cvt.rn.f32.s16 %r31791, %rs2646; 2026-02-21T10:18:33.7015461Z cvt.rn.f32.s16 %r31792, %rs2645; 2026-02-21T10:18:33.7015525Z cvt.rn.f32.s16 %r31793, %rs2644; 2026-02-21T10:18:33.7015598Z 
cvt.rn.f32.s16 %r31794, %rs2642; 2026-02-21T10:18:33.7015868Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.7015931Z cvt.s16.s8 %rs2647, %rs2566; 2026-02-21T10:18:33.7015994Z shr.s16 %rs2648, %rs2647, 4; 2026-02-21T10:18:33.7016054Z cvt.s16.s8 %rs2649, %rs2568; 2026-02-21T10:18:33.7016113Z shr.s16 %rs2650, %rs2649, 4; 2026-02-21T10:18:33.7016215Z shr.s16 %rs2651, %rs2565, 4; 2026-02-21T10:18:33.7016277Z shr.s16 %rs2652, %rs2567, 4; 2026-02-21T10:18:33.7016582Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.7016650Z cvt.rn.f32.s16 %r31795, %rs2652; 2026-02-21T10:18:33.7016712Z cvt.rn.f32.s16 %r31796, %rs2651; 2026-02-21T10:18:33.7016772Z cvt.rn.f32.s16 %r31797, %rs2650; 2026-02-21T10:18:33.7016831Z cvt.rn.f32.s16 %r31798, %rs2648; 2026-02-21T10:18:33.7017035Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.7017100Z cvt.s16.s8 %rs2653, %rs2570; 2026-02-21T10:18:33.7017161Z shr.s16 %rs2654, %rs2653, 4; 2026-02-21T10:18:33.7017224Z cvt.s16.s8 %rs2655, %rs2572; 2026-02-21T10:18:33.7017284Z shr.s16 %rs2656, %rs2655, 4; 2026-02-21T10:18:33.7017342Z shr.s16 %rs2657, %rs2569, 4; 2026-02-21T10:18:33.7017401Z shr.s16 %rs2658, %rs2571, 4; 2026-02-21T10:18:33.7017604Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.7017669Z cvt.rn.f32.s16 %r31799, %rs2658; 2026-02-21T10:18:33.7017730Z cvt.rn.f32.s16 %r31800, %rs2657; 2026-02-21T10:18:33.7017794Z cvt.rn.f32.s16 %r31801, %rs2656; 2026-02-21T10:18:33.7017856Z cvt.rn.f32.s16 %r31802, %rs2654; 2026-02-21T10:18:33.7018053Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.7018116Z cvt.s16.s8 %rs2659, %rs2574; 2026-02-21T10:18:33.7018179Z shr.s16 %rs2660, %rs2659, 4; 2026-02-21T10:18:33.7018240Z cvt.s16.s8 %rs2661, %rs2576; 2026-02-21T10:18:33.7018301Z shr.s16 %rs2662, %rs2661, 4; 
2026-02-21T10:18:33.7018362Z shr.s16 %rs2663, %rs2573, 4; 2026-02-21T10:18:33.7018420Z shr.s16 %rs2664, %rs2575, 4; 2026-02-21T10:18:33.7018614Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.7018681Z cvt.rn.f32.s16 %r31803, %rs2664; 2026-02-21T10:18:33.7018743Z cvt.rn.f32.s16 %r31804, %rs2663; 2026-02-21T10:18:33.7018804Z cvt.rn.f32.s16 %r31805, %rs2662; 2026-02-21T10:18:33.7018866Z cvt.rn.f32.s16 %r31806, %rs2660; 2026-02-21T10:18:33.7019059Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.7019122Z cvt.s16.s8 %rs2665, %rs2578; 2026-02-21T10:18:33.7019186Z shr.s16 %rs2666, %rs2665, 4; 2026-02-21T10:18:33.7019246Z cvt.s16.s8 %rs2667, %rs2580; 2026-02-21T10:18:33.7019397Z shr.s16 %rs2668, %rs2667, 4; 2026-02-21T10:18:33.7019519Z shr.s16 %rs2669, %rs2577, 4; 2026-02-21T10:18:33.7019579Z shr.s16 %rs2670, %rs2579, 4; 2026-02-21T10:18:33.7019775Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.7019841Z cvt.rn.f32.s16 %r31807, %rs2670; 2026-02-21T10:18:33.7019902Z cvt.rn.f32.s16 %r31808, %rs2669; 2026-02-21T10:18:33.7019967Z cvt.rn.f32.s16 %r31809, %rs2668; 2026-02-21T10:18:33.7020031Z cvt.rn.f32.s16 %r31810, %rs2666; 2026-02-21T10:18:33.7020225Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.7020286Z cvt.s16.s8 %rs2671, %rs2582; 2026-02-21T10:18:33.7020346Z shr.s16 %rs2672, %rs2671, 4; 2026-02-21T10:18:33.7020408Z cvt.s16.s8 %rs2673, %rs2584; 2026-02-21T10:18:33.7020467Z shr.s16 %rs2674, %rs2673, 4; 2026-02-21T10:18:33.7020526Z shr.s16 %rs2675, %rs2581, 4; 2026-02-21T10:18:33.7020589Z shr.s16 %rs2676, %rs2583, 4; 2026-02-21T10:18:33.7020783Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.7020910Z cvt.rn.f32.s16 %r31811, %rs2676; 2026-02-21T10:18:33.7020980Z cvt.rn.f32.s16 %r31812, %rs2675; 2026-02-21T10:18:33.7021051Z 
cvt.rn.f32.s16 %r31813, %rs2674; 2026-02-21T10:18:33.7021114Z cvt.rn.f32.s16 %r31814, %rs2672; 2026-02-21T10:18:33.7021365Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.7021431Z cvt.s16.s8 %rs2677, %rs2586; 2026-02-21T10:18:33.7021492Z shr.s16 %rs2678, %rs2677, 4; 2026-02-21T10:18:33.7021553Z cvt.s16.s8 %rs2679, %rs2588; 2026-02-21T10:18:33.7021615Z shr.s16 %rs2680, %rs2679, 4; 2026-02-21T10:18:33.7021675Z shr.s16 %rs2681, %rs2585, 4; 2026-02-21T10:18:33.7021733Z shr.s16 %rs2682, %rs2587, 4; 2026-02-21T10:18:33.7021927Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.7021995Z cvt.rn.f32.s16 %r31815, %rs2682; 2026-02-21T10:18:33.7022060Z cvt.rn.f32.s16 %r31816, %rs2681; 2026-02-21T10:18:33.7022123Z cvt.rn.f32.s16 %r31817, %rs2680; 2026-02-21T10:18:33.7022186Z cvt.rn.f32.s16 %r31818, %rs2678; 2026-02-21T10:18:33.7022379Z .loc 1 69 25 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:69:25 2026-02-21T10:18:33.7022439Z cvt.s16.s8 %rs2683, %rs2590; 2026-02-21T10:18:33.7022503Z shr.s16 %rs2684, %rs2683, 4; 2026-02-21T10:18:33.7022563Z cvt.s16.s8 %rs2685, %rs2592; 2026-02-21T10:18:33.7022627Z shr.s16 %rs2686, %rs2685, 4; 2026-02-21T10:18:33.7022687Z shr.s16 %rs2687, %rs2589, 4; 2026-02-21T10:18:33.7022751Z shr.s16 %rs2688, %rs2591, 4; 2026-02-21T10:18:33.7022944Z .loc 1 87 32 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:87:32 2026-02-21T10:18:33.7023007Z cvt.rn.f32.s16 %r31819, %rs2688; 2026-02-21T10:18:33.7023079Z cvt.rn.f32.s16 %r31820, %rs2687; 2026-02-21T10:18:33.7023154Z cvt.rn.f32.s16 %r31821, %rs2686; 2026-02-21T10:18:33.7023220Z cvt.rn.f32.s16 %r31822, %rs2684; 2026-02-21T10:18:33.7023280Z bar.sync 0; 2026-02-21T10:18:33.7023406Z st.shared.v4.b32 [%r93], {%r31762, %r31760, %r31761, %r31759}; 2026-02-21T10:18:33.7023537Z st.shared.v4.b32 [%r93+16384], {%r31794, %r31792, %r31793, %r31791}; 2026-02-21T10:18:33.7023652Z st.shared.v4.b32 [%r94], 
{%r31766, %r31764, %r31765, %r31763}; 2026-02-21T10:18:33.7023776Z st.shared.v4.b32 [%r94+16384], {%r31798, %r31796, %r31797, %r31795}; 2026-02-21T10:18:33.7023883Z st.shared.v4.b32 [%r95], {%r31770, %r31768, %r31769, %r31767}; 2026-02-21T10:18:33.7023998Z st.shared.v4.b32 [%r95+16384], {%r31802, %r31800, %r31801, %r31799}; 2026-02-21T10:18:33.7024107Z st.shared.v4.b32 [%r96], {%r31774, %r31772, %r31773, %r31771}; 2026-02-21T10:18:33.7024222Z st.shared.v4.b32 [%r96+16384], {%r31806, %r31804, %r31805, %r31803}; 2026-02-21T10:18:33.7024326Z st.shared.v4.b32 [%r97], {%r31778, %r31776, %r31777, %r31775}; 2026-02-21T10:18:33.7024508Z st.shared.v4.b32 [%r97+16384], {%r31810, %r31808, %r31809, %r31807}; 2026-02-21T10:18:33.7024659Z st.shared.v4.b32 [%r98], {%r31782, %r31780, %r31781, %r31779}; 2026-02-21T10:18:33.7024776Z st.shared.v4.b32 [%r98+16384], {%r31814, %r31812, %r31813, %r31811}; 2026-02-21T10:18:33.7024885Z st.shared.v4.b32 [%r99], {%r31786, %r31784, %r31785, %r31783}; 2026-02-21T10:18:33.7025003Z st.shared.v4.b32 [%r99+16384], {%r31818, %r31816, %r31817, %r31815}; 2026-02-21T10:18:33.7025127Z st.shared.v4.b32 [%r100], {%r31790, %r31788, %r31789, %r31787}; 2026-02-21T10:18:33.7025256Z st.shared.v4.b32 [%r100+16384], {%r31822, %r31820, %r31821, %r31819}; 2026-02-21T10:18:33.7025317Z $L__tmp23: 2026-02-21T10:18:33.7025605Z .loc 2 291 36 // standard.py:291:36 @[ c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:94:40 ] 2026-02-21T10:18:33.7025668Z // begin inline asm 2026-02-21T10:18:33.7025762Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.7025820Z // end inline asm 2026-02-21T10:18:33.7025875Z bar.sync 0; 2026-02-21T10:18:33.7025963Z shfl.sync.idx.b32 %r31823, %r5, 0, 31, -1; 2026-02-21T10:18:33.7026091Z wgmma.fence.sync.aligned; 2026-02-21T10:18:33.7026158Z mov.pred %p291, -1; 2026-02-21T10:18:33.7026218Z // begin inline asm 2026-02-21T10:18:33.7027917Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r29512,%r29513,%r29514,%r29515}, %rd12, %p291, 1, 1; 2026-02-21T10:18:33.7027984Z // end inline asm 2026-02-21T10:18:33.7028050Z // begin inline asm 2026-02-21T10:18:33.7029628Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r29644,%r29645,%r29646,%r29647}, %rd13, %p291, 1, 1; 2026-02-21T10:18:33.7029691Z // end inline asm 2026-02-21T10:18:33.7029755Z // begin inline asm 2026-02-21T10:18:33.7031241Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r29776,%r29777,%r29778,%r29779}, %rd14, %p291, 1, 1; 2026-02-21T10:18:33.7031308Z // end inline asm 2026-02-21T10:18:33.7031370Z // begin inline asm 2026-02-21T10:18:33.7032857Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r29908,%r29909,%r29910,%r29911}, %rd15, %p291, 1, 1; 2026-02-21T10:18:33.7033047Z // end inline asm 2026-02-21T10:18:33.7033109Z // begin inline asm 2026-02-21T10:18:33.7034606Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r30040,%r30041,%r30042,%r30043}, %rd16, %p291, 1, 1; 2026-02-21T10:18:33.7034668Z // end inline asm 2026-02-21T10:18:33.7034731Z // begin inline asm 2026-02-21T10:18:33.7036327Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r30172,%r30173,%r30174,%r30175}, %rd17, %p291, 1, 1; 2026-02-21T10:18:33.7036389Z // end inline asm 2026-02-21T10:18:33.7036564Z // begin inline asm 2026-02-21T10:18:33.7038068Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r30304,%r30305,%r30306,%r30307}, %rd18, %p291, 1, 1; 2026-02-21T10:18:33.7038129Z // end inline asm 2026-02-21T10:18:33.7038188Z // begin inline asm 2026-02-21T10:18:33.7039666Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484}, {%r30436,%r30437,%r30438,%r30439}, %rd19, %p291, 1, 1; 2026-02-21T10:18:33.7039725Z // end inline asm 2026-02-21T10:18:33.7039785Z // begin inline asm 2026-02-21T10:18:33.7041264Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r30568,%r30569,%r30570,%r30571}, %rd12, %p291, 1, 1; 2026-02-21T10:18:33.7041323Z // end inline asm 2026-02-21T10:18:33.7041385Z // begin inline asm 2026-02-21T10:18:33.7042868Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r30700,%r30701,%r30702,%r30703}, %rd13, %p291, 1, 1; 2026-02-21T10:18:33.7043071Z // end inline asm 2026-02-21T10:18:33.7043130Z // begin inline asm 2026-02-21T10:18:33.7044671Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r30832,%r30833,%r30834,%r30835}, %rd14, %p291, 1, 1; 2026-02-21T10:18:33.7044738Z // end inline asm 2026-02-21T10:18:33.7044795Z // begin inline asm 2026-02-21T10:18:33.7046330Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r30964,%r30965,%r30966,%r30967}, %rd15, %p291, 1, 1; 2026-02-21T10:18:33.7046396Z // end inline asm 2026-02-21T10:18:33.7046584Z // begin inline asm 2026-02-21T10:18:33.7048084Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r31096,%r31097,%r31098,%r31099}, %rd16, %p291, 1, 1; 2026-02-21T10:18:33.7048142Z // end inline asm 2026-02-21T10:18:33.7048199Z // begin inline asm 2026-02-21T10:18:33.7049678Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r31228,%r31229,%r31230,%r31231}, %rd17, %p291, 1, 1; 2026-02-21T10:18:33.7049739Z // end inline asm 2026-02-21T10:18:33.7049797Z // begin inline asm 2026-02-21T10:18:33.7051279Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r31360,%r31361,%r31362,%r31363}, %rd18, %p291, 1, 1; 2026-02-21T10:18:33.7051462Z // end inline asm 2026-02-21T10:18:33.7051522Z // begin inline asm 2026-02-21T10:18:33.7053004Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548}, {%r31492,%r31493,%r31494,%r31495}, %rd19, %p291, 1, 1; 2026-02-21T10:18:33.7053059Z // end inline asm 2026-02-21T10:18:33.7053143Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:33.7053205Z mov.b32 %r31624, %r29377; 2026-02-21T10:18:33.7053266Z mov.b32 %r31625, %r31626; 2026-02-21T10:18:33.7053395Z // begin inline asm 2026-02-21T10:18:33.7055980Z // wait for regs: 
%r32421,%r32422,%r32423,%r32424,%r32425,%r32426,%r32427,%r32428,%r32429,%r32430,%r32431,%r32432,%r32433,%r32434,%r32435,%r32436,%r32437,%r32438,%r32439,%r32440,%r32441,%r32442,%r32443,%r32444,%r32445,%r32446,%r32447,%r32448,%r32449,%r32450,%r32451,%r32452,%r32453,%r32454,%r32455,%r32456,%r32457,%r32458,%r32459,%r32460,%r32461,%r32462,%r32463,%r32464,%r32465,%r32466,%r32467,%r32468,%r32469,%r32470,%r32471,%r32472,%r32473,%r32474,%r32475,%r32476,%r32477,%r32478,%r32479,%r32480,%r32481,%r32482,%r32483,%r32484,%r32485,%r32486,%r32487,%r32488,%r32489,%r32490,%r32491,%r32492,%r32493,%r32494,%r32495,%r32496,%r32497,%r32498,%r32499,%r32500,%r32501,%r32502,%r32503,%r32504,%r32505,%r32506,%r32507,%r32508,%r32509,%r32510,%r32511,%r32512,%r32513,%r32514,%r32515,%r32516,%r32517,%r32518,%r32519,%r32520,%r32521,%r32522,%r32523,%r32524,%r32525,%r32526,%r32527,%r32528,%r32529,%r32530,%r32531,%r32532,%r32533,%r32534,%r32535,%r32536,%r32537,%r32538,%r32539,%r32540,%r32541,%r32542,%r32543,%r32544,%r32545,%r32546,%r32547,%r32548,%r31624,%r31625,%r31626 2026-02-21T10:18:33.7056070Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:33.7056129Z // end inline asm 2026-02-21T10:18:33.7056185Z $L__tmp24: 2026-02-21T10:18:33.7056408Z .loc 1 51 93 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:51:93 2026-02-21T10:18:33.7056611Z add.s64 %rd665, %rd665, 32; 2026-02-21T10:18:33.7056685Z setp.lt.u64 %p309, %rd664, 4064; 2026-02-21T10:18:33.7056750Z @%p309 bra $L__BB0_16; 2026-02-21T10:18:33.7056867Z // %bb.17: // in Loop: Header=BB0_13 Depth=1 2026-02-21T10:18:33.7057072Z .loc 1 97 28 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:97:28 2026-02-21T10:18:33.7057161Z cvt.rn.bf16x2.f32 %r31827, %r32422, %r32421; 2026-02-21T10:18:33.7057248Z cvt.rn.bf16x2.f32 %r31828, %r32424, %r32423; 2026-02-21T10:18:33.7057326Z cvt.rn.bf16x2.f32 %r31829, %r32426, %r32425; 2026-02-21T10:18:33.7057404Z cvt.rn.bf16x2.f32 %r31830, %r32428, %r32427; 2026-02-21T10:18:33.7057481Z cvt.rn.bf16x2.f32 
%r31831, %r32430, %r32429; 2026-02-21T10:18:33.7057558Z cvt.rn.bf16x2.f32 %r31832, %r32432, %r32431; 2026-02-21T10:18:33.7057635Z cvt.rn.bf16x2.f32 %r31833, %r32434, %r32433; 2026-02-21T10:18:33.7057722Z cvt.rn.bf16x2.f32 %r31834, %r32436, %r32435; 2026-02-21T10:18:33.7057798Z cvt.rn.bf16x2.f32 %r31835, %r32438, %r32437; 2026-02-21T10:18:33.7057872Z cvt.rn.bf16x2.f32 %r31836, %r32440, %r32439; 2026-02-21T10:18:33.7057945Z cvt.rn.bf16x2.f32 %r31837, %r32442, %r32441; 2026-02-21T10:18:33.7058024Z cvt.rn.bf16x2.f32 %r31838, %r32444, %r32443; 2026-02-21T10:18:33.7058098Z cvt.rn.bf16x2.f32 %r31839, %r32446, %r32445; 2026-02-21T10:18:33.7058171Z cvt.rn.bf16x2.f32 %r31840, %r32448, %r32447; 2026-02-21T10:18:33.7058343Z cvt.rn.bf16x2.f32 %r31841, %r32450, %r32449; 2026-02-21T10:18:33.7058485Z cvt.rn.bf16x2.f32 %r31842, %r32452, %r32451; 2026-02-21T10:18:33.7058561Z cvt.rn.bf16x2.f32 %r31843, %r32454, %r32453; 2026-02-21T10:18:33.7058636Z cvt.rn.bf16x2.f32 %r31844, %r32456, %r32455; 2026-02-21T10:18:33.7058713Z cvt.rn.bf16x2.f32 %r31845, %r32458, %r32457; 2026-02-21T10:18:33.7058788Z cvt.rn.bf16x2.f32 %r31846, %r32460, %r32459; 2026-02-21T10:18:33.7058867Z cvt.rn.bf16x2.f32 %r31847, %r32462, %r32461; 2026-02-21T10:18:33.7058959Z cvt.rn.bf16x2.f32 %r31848, %r32464, %r32463; 2026-02-21T10:18:33.7059036Z cvt.rn.bf16x2.f32 %r31849, %r32466, %r32465; 2026-02-21T10:18:33.7059110Z cvt.rn.bf16x2.f32 %r31850, %r32468, %r32467; 2026-02-21T10:18:33.7059187Z cvt.rn.bf16x2.f32 %r31851, %r32470, %r32469; 2026-02-21T10:18:33.7059260Z cvt.rn.bf16x2.f32 %r31852, %r32472, %r32471; 2026-02-21T10:18:33.7059335Z cvt.rn.bf16x2.f32 %r31853, %r32474, %r32473; 2026-02-21T10:18:33.7059412Z cvt.rn.bf16x2.f32 %r31854, %r32476, %r32475; 2026-02-21T10:18:33.7059492Z cvt.rn.bf16x2.f32 %r31855, %r32478, %r32477; 2026-02-21T10:18:33.7059570Z cvt.rn.bf16x2.f32 %r31856, %r32480, %r32479; 2026-02-21T10:18:33.7059714Z cvt.rn.bf16x2.f32 %r31857, %r32482, %r32481; 2026-02-21T10:18:33.7059798Z cvt.rn.bf16x2.f32 
%r31858, %r32484, %r32483; 2026-02-21T10:18:33.7059874Z cvt.rn.bf16x2.f32 %r31859, %r32486, %r32485; 2026-02-21T10:18:33.7060004Z cvt.rn.bf16x2.f32 %r31860, %r32488, %r32487; 2026-02-21T10:18:33.7060083Z cvt.rn.bf16x2.f32 %r31861, %r32490, %r32489; 2026-02-21T10:18:33.7060168Z cvt.rn.bf16x2.f32 %r31862, %r32492, %r32491; 2026-02-21T10:18:33.7060246Z cvt.rn.bf16x2.f32 %r31863, %r32494, %r32493; 2026-02-21T10:18:33.7060321Z cvt.rn.bf16x2.f32 %r31864, %r32496, %r32495; 2026-02-21T10:18:33.7060406Z cvt.rn.bf16x2.f32 %r31865, %r32498, %r32497; 2026-02-21T10:18:33.7060481Z cvt.rn.bf16x2.f32 %r31866, %r32500, %r32499; 2026-02-21T10:18:33.7060555Z cvt.rn.bf16x2.f32 %r31867, %r32502, %r32501; 2026-02-21T10:18:33.7060634Z cvt.rn.bf16x2.f32 %r31868, %r32504, %r32503; 2026-02-21T10:18:33.7060710Z cvt.rn.bf16x2.f32 %r31869, %r32506, %r32505; 2026-02-21T10:18:33.7060785Z cvt.rn.bf16x2.f32 %r31870, %r32508, %r32507; 2026-02-21T10:18:33.7060861Z cvt.rn.bf16x2.f32 %r31871, %r32510, %r32509; 2026-02-21T10:18:33.7060937Z cvt.rn.bf16x2.f32 %r31872, %r32512, %r32511; 2026-02-21T10:18:33.7061011Z cvt.rn.bf16x2.f32 %r31873, %r32514, %r32513; 2026-02-21T10:18:33.7061087Z cvt.rn.bf16x2.f32 %r31874, %r32516, %r32515; 2026-02-21T10:18:33.7061177Z cvt.rn.bf16x2.f32 %r31875, %r32518, %r32517; 2026-02-21T10:18:33.7061254Z cvt.rn.bf16x2.f32 %r31876, %r32520, %r32519; 2026-02-21T10:18:33.7061334Z cvt.rn.bf16x2.f32 %r31877, %r32522, %r32521; 2026-02-21T10:18:33.7061412Z cvt.rn.bf16x2.f32 %r31878, %r32524, %r32523; 2026-02-21T10:18:33.7061487Z cvt.rn.bf16x2.f32 %r31879, %r32526, %r32525; 2026-02-21T10:18:33.7061562Z cvt.rn.bf16x2.f32 %r31880, %r32528, %r32527; 2026-02-21T10:18:33.7061639Z cvt.rn.bf16x2.f32 %r31881, %r32530, %r32529; 2026-02-21T10:18:33.7061718Z cvt.rn.bf16x2.f32 %r31882, %r32532, %r32531; 2026-02-21T10:18:33.7061794Z cvt.rn.bf16x2.f32 %r31883, %r32534, %r32533; 2026-02-21T10:18:33.7061869Z cvt.rn.bf16x2.f32 %r31884, %r32536, %r32535; 2026-02-21T10:18:33.7061944Z cvt.rn.bf16x2.f32 
%r31885, %r32538, %r32537; 2026-02-21T10:18:33.7062021Z cvt.rn.bf16x2.f32 %r31886, %r32540, %r32539; 2026-02-21T10:18:33.7062096Z cvt.rn.bf16x2.f32 %r31887, %r32542, %r32541; 2026-02-21T10:18:33.7062171Z cvt.rn.bf16x2.f32 %r31888, %r32544, %r32543; 2026-02-21T10:18:33.7062248Z cvt.rn.bf16x2.f32 %r31889, %r32546, %r32545; 2026-02-21T10:18:33.7062322Z cvt.rn.bf16x2.f32 %r31890, %r32548, %r32547; 2026-02-21T10:18:33.7062532Z .loc 1 98 43 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:98:43 2026-02-21T10:18:33.7062592Z bar.sync 0; 2026-02-21T10:18:33.7062793Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r101], {%r31827, %r31828, %r31829, %r31830}; 2026-02-21T10:18:33.7063038Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r102], {%r31843, %r31844, %r31845, %r31846}; 2026-02-21T10:18:33.7063269Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r103], {%r31859, %r31860, %r31861, %r31862}; 2026-02-21T10:18:33.7063454Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r104], {%r31875, %r31876, %r31877, %r31878}; 2026-02-21T10:18:33.7063636Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r105], {%r31831, %r31832, %r31833, %r31834}; 2026-02-21T10:18:33.7063820Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r106], {%r31847, %r31848, %r31849, %r31850}; 2026-02-21T10:18:33.7064005Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r107], {%r31863, %r31864, %r31865, %r31866}; 2026-02-21T10:18:33.7064198Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r108], {%r31879, %r31880, %r31881, %r31882}; 2026-02-21T10:18:33.7064383Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r109], {%r31835, %r31836, %r31837, %r31838}; 2026-02-21T10:18:33.7064567Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r110], {%r31851, %r31852, %r31853, %r31854}; 2026-02-21T10:18:33.7064751Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r111], {%r31867, %r31868, %r31869, %r31870}; 2026-02-21T10:18:33.7064997Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r112], {%r31883, %r31884, %r31885, %r31886}; 2026-02-21T10:18:33.7065187Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r113], {%r31839, %r31840, %r31841, %r31842}; 2026-02-21T10:18:33.7065416Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r114], {%r31855, %r31856, %r31857, %r31858}; 2026-02-21T10:18:33.7065600Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r115], {%r31871, %r31872, %r31873, %r31874}; 2026-02-21T10:18:33.7065785Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r116], {%r31887, %r31888, %r31889, %r31890}; 2026-02-21T10:18:33.7065847Z // begin inline asm 2026-02-21T10:18:33.7065928Z fence.proxy.async.shared::cta; 2026-02-21T10:18:33.7065990Z // end inline asm 2026-02-21T10:18:33.7066044Z bar.sync 0; 2026-02-21T10:18:33.7066113Z elect.sync %r31891|%p312, -1; 2026-02-21T10:18:33.7066196Z shfl.sync.idx.b32 %r31892, %r5, 0, 31, -1; 2026-02-21T10:18:33.7066282Z and.pred %p310, %p314, %p312; 2026-02-21T10:18:33.7066346Z and.b32 %r31893, %r31892, 1; 2026-02-21T10:18:33.7066409Z shl.b32 %r31894, %r31893, 14; 2026-02-21T10:18:33.7066595Z add.s32 %r31826, %r29377, %r31894; 2026-02-21T10:18:33.7066659Z shl.b32 %r31896, %r31893, 6; 2026-02-21T10:18:33.7066724Z or.b32 %r31824, %r31896, %r29378; 2026-02-21T10:18:33.7066783Z // begin inline asm 2026-02-21T10:18:33.7067035Z @%p310 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd298, {%r31824, %r31825}], [%r31826]; 2026-02-21T10:18:33.7067093Z // end inline asm 2026-02-21T10:18:33.7067166Z cp.async.bulk.commit_group; 2026-02-21T10:18:33.7067242Z cp.async.bulk.wait_group.read 0; 2026-02-21T10:18:33.7067297Z bar.sync 0; 2026-02-21T10:18:33.7067503Z .loc 1 31 88 // c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:31:88 2026-02-21T10:18:33.7067569Z add.s32 %r1672, %r32420, 1; 2026-02-21T10:18:33.7067638Z setp.lt.u32 %p313, %r32420, %r3; 2026-02-21T10:18:33.7067701Z mov.b32 %r32420, %r1672; 2026-02-21T10:18:33.7067762Z @%p313 bra $L__BB0_13; 2026-02-21T10:18:33.7067857Z $L__BB0_18: // %._crit_edge 2026-02-21T10:18:33.7068055Z .loc 1 31 4 // 
c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py:31:4 2026-02-21T10:18:33.7068111Z ret; 2026-02-21T10:18:33.7068170Z $L__tmp25: 2026-02-21T10:18:33.7068227Z $L__func_end0: 2026-02-21T10:18:33.7068313Z // -- End function 2026-02-21T10:18:33.7068368Z } 2026-02-21T10:18:33.7068694Z .file 1 "/tmp/torchinductor_root/53/c535yyourbfbhd6v2q4wn7gvzijzbtcpnu67upfxxutgbtmxdpx5.py" 2026-02-21T10:18:33.7068906Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T10:18:33.7068970Z .section .debug_abbrev 2026-02-21T10:18:33.7069025Z { 2026-02-21T10:18:33.7069119Z .b8 1 // Abbreviation Code 2026-02-21T10:18:33.7069313Z .b8 17 // DW_TAG_compile_unit 2026-02-21T10:18:33.7069475Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:18:33.7069564Z .b8 37 // DW_AT_producer 2026-02-21T10:18:33.7069644Z .b8 8 // DW_FORM_string 2026-02-21T10:18:33.7069723Z .b8 19 // DW_AT_language 2026-02-21T10:18:33.7069811Z .b8 5 // DW_FORM_data2 2026-02-21T10:18:33.7069889Z .b8 3 // DW_AT_name 2026-02-21T10:18:33.7069968Z .b8 8 // DW_FORM_string 2026-02-21T10:18:33.7070057Z .b8 16 // DW_AT_stmt_list 2026-02-21T10:18:33.7070135Z .b8 6 // DW_FORM_data4 2026-02-21T10:18:33.7070213Z .b8 27 // DW_AT_comp_dir 2026-02-21T10:18:33.7070292Z .b8 8 // DW_FORM_string 2026-02-21T10:18:33.7070369Z .b8 0 // EOM(1) 2026-02-21T10:18:33.7070440Z .b8 0 // EOM(2) 2026-02-21T10:18:33.7070592Z .b8 2 // Abbreviation Code 2026-02-21T10:18:33.7070681Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:18:33.7070760Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:18:33.7070887Z .b8 3 // DW_AT_name 2026-02-21T10:18:33.7070970Z .b8 8 // DW_FORM_string 2026-02-21T10:18:33.7071051Z .b8 32 // DW_AT_inline 2026-02-21T10:18:33.7071130Z .b8 11 // DW_FORM_data1 2026-02-21T10:18:33.7071203Z .b8 0 // EOM(1) 2026-02-21T10:18:33.7071272Z .b8 0 // EOM(2) 2026-02-21T10:18:33.7071356Z .b8 3 // Abbreviation Code 2026-02-21T10:18:33.7071446Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:18:33.7071529Z .b8 1 // 
DW_CHILDREN_yes 2026-02-21T10:18:33.7071608Z .b8 17 // DW_AT_low_pc 2026-02-21T10:18:33.7071683Z .b8 1 // DW_FORM_addr 2026-02-21T10:18:33.7071765Z .b8 18 // DW_AT_high_pc 2026-02-21T10:18:33.7071847Z .b8 1 // DW_FORM_addr 2026-02-21T10:18:33.7071945Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:18:33.7072020Z .b8 19 // DW_FORM_ref4 2026-02-21T10:18:33.7072090Z .b8 0 // EOM(1) 2026-02-21T10:18:33.7072177Z .b8 0 // EOM(2) 2026-02-21T10:18:33.7072263Z .b8 4 // Abbreviation Code 2026-02-21T10:18:33.7072364Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T10:18:33.7072447Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:18:33.7072540Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:18:33.7072616Z .b8 19 // DW_FORM_ref4 2026-02-21T10:18:33.7072689Z .b8 17 // DW_AT_low_pc 2026-02-21T10:18:33.7072765Z .b8 1 // DW_FORM_addr 2026-02-21T10:18:33.7072847Z .b8 18 // DW_AT_high_pc 2026-02-21T10:18:33.7072921Z .b8 1 // DW_FORM_addr 2026-02-21T10:18:33.7073003Z .b8 88 // DW_AT_call_file 2026-02-21T10:18:33.7073078Z .b8 11 // DW_FORM_data1 2026-02-21T10:18:33.7073156Z .b8 89 // DW_AT_call_line 2026-02-21T10:18:33.7073235Z .b8 11 // DW_FORM_data1 2026-02-21T10:18:33.7073371Z .b8 87 // DW_AT_call_column 2026-02-21T10:18:33.7073504Z .b8 11 // DW_FORM_data1 2026-02-21T10:18:33.7073580Z .b8 0 // EOM(1) 2026-02-21T10:18:33.7073650Z .b8 0 // EOM(2) 2026-02-21T10:18:33.7073717Z .b8 0 // EOM(3) 2026-02-21T10:18:33.7073766Z } 2026-02-21T10:18:33.7073834Z .section .debug_info 2026-02-21T10:18:33.7073885Z { 2026-02-21T10:18:33.7073973Z .b32 178 // Length of Unit 2026-02-21T10:18:33.7074067Z .b8 2 // DWARF version number 2026-02-21T10:18:33.7074118Z .b8 0 2026-02-21T10:18:33.7074246Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T10:18:33.7074340Z .b8 8 // Address Size (in bytes) 2026-02-21T10:18:33.7074456Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T10:18:33.7074541Z .b8 116 // DW_AT_producer 2026-02-21T10:18:33.7074596Z .b8 114 2026-02-21T10:18:33.7074651Z .b8 105 2026-02-21T10:18:33.7074749Z .b8 116 2026-02-21T10:18:33.7074801Z .b8 111 2026-02-21T10:18:33.7074851Z .b8 110 2026-02-21T10:18:33.7074905Z .b8 0 2026-02-21T10:18:33.7074992Z .b8 2 // DW_AT_language 2026-02-21T10:18:33.7075045Z .b8 0 2026-02-21T10:18:33.7075170Z .b8 99 // DW_AT_name 2026-02-21T10:18:33.7075223Z .b8 53 2026-02-21T10:18:33.7075272Z .b8 51 2026-02-21T10:18:33.7075321Z .b8 53 2026-02-21T10:18:33.7075374Z .b8 121 2026-02-21T10:18:33.7075424Z .b8 121 2026-02-21T10:18:33.7075477Z .b8 111 2026-02-21T10:18:33.7075530Z .b8 117 2026-02-21T10:18:33.7075580Z .b8 114 2026-02-21T10:18:33.7075632Z .b8 98 2026-02-21T10:18:33.7075683Z .b8 102 2026-02-21T10:18:33.7075737Z .b8 98 2026-02-21T10:18:33.7075787Z .b8 104 2026-02-21T10:18:33.7075837Z .b8 100 2026-02-21T10:18:33.7075891Z .b8 54 2026-02-21T10:18:33.7075941Z .b8 118 2026-02-21T10:18:33.7075990Z .b8 50 2026-02-21T10:18:33.7076042Z .b8 113 2026-02-21T10:18:33.7076093Z .b8 52 2026-02-21T10:18:33.7076143Z .b8 119 2026-02-21T10:18:33.7076196Z .b8 110 2026-02-21T10:18:33.7076249Z .b8 55 2026-02-21T10:18:33.7076301Z .b8 103 2026-02-21T10:18:33.7076352Z .b8 118 2026-02-21T10:18:33.7076402Z .b8 122 2026-02-21T10:18:33.7076579Z .b8 105 2026-02-21T10:18:33.7076648Z .b8 106 2026-02-21T10:18:33.7076700Z .b8 122 2026-02-21T10:18:33.7076753Z .b8 98 2026-02-21T10:18:33.7076806Z .b8 116 2026-02-21T10:18:33.7076855Z .b8 99 2026-02-21T10:18:33.7076906Z .b8 112 2026-02-21T10:18:33.7076958Z .b8 110 2026-02-21T10:18:33.7077008Z .b8 117 2026-02-21T10:18:33.7077060Z .b8 54 2026-02-21T10:18:33.7077110Z .b8 55 2026-02-21T10:18:33.7077163Z .b8 117 2026-02-21T10:18:33.7077213Z .b8 112 2026-02-21T10:18:33.7077264Z .b8 102 2026-02-21T10:18:33.7077317Z .b8 120 
2026-02-21T10:18:33.7077367Z .b8 120 2026-02-21T10:18:33.7077416Z .b8 117 2026-02-21T10:18:33.7077470Z .b8 116 2026-02-21T10:18:33.7077521Z .b8 103 2026-02-21T10:18:33.7077571Z .b8 98 2026-02-21T10:18:33.7077622Z .b8 116 2026-02-21T10:18:33.7077672Z .b8 109 2026-02-21T10:18:33.7077727Z .b8 120 2026-02-21T10:18:33.7077779Z .b8 100 2026-02-21T10:18:33.7077830Z .b8 112 2026-02-21T10:18:33.7077883Z .b8 120 2026-02-21T10:18:33.7077932Z .b8 53 2026-02-21T10:18:33.7077983Z .b8 46 2026-02-21T10:18:33.7078034Z .b8 112 2026-02-21T10:18:33.7078087Z .b8 121 2026-02-21T10:18:33.7078138Z .b8 0 2026-02-21T10:18:33.7078260Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T10:18:33.7078345Z .b8 47 // DW_AT_comp_dir 2026-02-21T10:18:33.7078397Z .b8 116 2026-02-21T10:18:33.7078448Z .b8 109 2026-02-21T10:18:33.7078498Z .b8 112 2026-02-21T10:18:33.7078551Z .b8 47 2026-02-21T10:18:33.7078602Z .b8 116 2026-02-21T10:18:33.7078654Z .b8 111 2026-02-21T10:18:33.7078708Z .b8 114 2026-02-21T10:18:33.7078757Z .b8 99 2026-02-21T10:18:33.7078807Z .b8 104 2026-02-21T10:18:33.7078957Z .b8 105 2026-02-21T10:18:33.7079014Z .b8 110 2026-02-21T10:18:33.7079066Z .b8 100 2026-02-21T10:18:33.7079182Z .b8 117 2026-02-21T10:18:33.7079235Z .b8 99 2026-02-21T10:18:33.7079285Z .b8 116 2026-02-21T10:18:33.7079338Z .b8 111 2026-02-21T10:18:33.7079389Z .b8 114 2026-02-21T10:18:33.7079442Z .b8 95 2026-02-21T10:18:33.7079494Z .b8 114 2026-02-21T10:18:33.7079545Z .b8 111 2026-02-21T10:18:33.7079594Z .b8 111 2026-02-21T10:18:33.7079647Z .b8 116 2026-02-21T10:18:33.7079697Z .b8 47 2026-02-21T10:18:33.7079748Z .b8 53 2026-02-21T10:18:33.7079811Z .b8 51 2026-02-21T10:18:33.7079862Z .b8 0 2026-02-21T10:18:33.7079977Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T10:18:33.7080055Z .b8 95 // DW_AT_name 2026-02-21T10:18:33.7080110Z .b8 104 2026-02-21T10:18:33.7080162Z .b8 101 2026-02-21T10:18:33.7080213Z .b8 108 2026-02-21T10:18:33.7080268Z .b8 105 2026-02-21T10:18:33.7080317Z .b8 111 2026-02-21T10:18:33.7080368Z 
.b8 110 2026-02-21T10:18:33.7080420Z .b8 95 2026-02-21T10:18:33.7080473Z .b8 109 2026-02-21T10:18:33.7080525Z .b8 97 2026-02-21T10:18:33.7080575Z .b8 116 2026-02-21T10:18:33.7080629Z .b8 109 2026-02-21T10:18:33.7080751Z .b8 117 2026-02-21T10:18:33.7080817Z .b8 108 2026-02-21T10:18:33.7080869Z .b8 95 2026-02-21T10:18:33.7080921Z .b8 98 2026-02-21T10:18:33.7080972Z .b8 102 2026-02-21T10:18:33.7081022Z .b8 49 2026-02-21T10:18:33.7081075Z .b8 54 2026-02-21T10:18:33.7081125Z .b8 95 2026-02-21T10:18:33.7081234Z .b8 105 2026-02-21T10:18:33.7081289Z .b8 110 2026-02-21T10:18:33.7081343Z .b8 116 2026-02-21T10:18:33.7081393Z .b8 52 2026-02-21T10:18:33.7081442Z .b8 0 2026-02-21T10:18:33.7081522Z .b8 1 // DW_AT_inline 2026-02-21T10:18:33.7081632Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T10:18:33.7081726Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T10:18:33.7081819Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T10:18:33.7081921Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:18:33.7082050Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T10:18:33.7082147Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:18:33.7082236Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T10:18:33.7082335Z .b64 $L__tmp24 // DW_AT_high_pc 2026-02-21T10:18:33.7082422Z .b8 1 // DW_AT_call_file 2026-02-21T10:18:33.7082508Z .b8 94 // DW_AT_call_line 2026-02-21T10:18:33.7082593Z .b8 40 // DW_AT_call_column 2026-02-21T10:18:33.7082682Z .b8 0 // End Of Children Mark 2026-02-21T10:18:33.7082767Z .b8 0 // End Of Children Mark 2026-02-21T10:18:33.7082823Z } 2026-02-21T10:18:33.7082894Z .section .debug_macinfo { } 2026-02-21T10:18:33.7082901Z 2026-02-21T10:18:33.7082980Z ================================================================ 2026-02-21T10:18:33.7083101Z please share the reproducer above with Triton project. 
2026-02-21T10:18:46.5864429Z 2026-02-21T10:18:46.5864446Z 2026-02-21T10:18:46.5864451Z 2026-02-21T10:18:46.5864892Z ================================================================ 2026-02-21T10:18:46.5865306Z Internal Triton PTX codegen error 2026-02-21T10:18:46.5865578Z `ptxas` stderr: 2026-02-21T10:18:46.5866317Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 545 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T10:18:46.5867316Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:18:46.5867556Z 2026-02-21T10:18:46.5868242Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp7d73yk9g.ptx -o /tmp/tmp7d73yk9g.ptx.o 2026-02-21T10:18:46.5869464Z 2026-02-21T10:18:46.5869468Z 2026-02-21T10:18:46.5869537Z // 2026-02-21T10:18:46.5869812Z // Generated by LLVM NVPTX Back-End 2026-02-21T10:18:46.5870043Z // 2026-02-21T10:18:46.5870121Z 2026-02-21T10:18:46.5870196Z .version 8.7 2026-02-21T10:18:46.5870361Z .target sm_90a 2026-02-21T10:18:46.5870537Z .address_size 64 2026-02-21T10:18:46.5870640Z 2026-02-21T10:18:46.5870835Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T10:18:46.5871228Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T10:18:46.5871507Z // @_helion_matmul_bf16_int4 2026-02-21T10:18:46.5871804Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T10:18:46.5872144Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T10:18:46.5872533Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T10:18:46.5872927Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T10:18:46.5873311Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T10:18:46.5873696Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 
2026-02-21T10:18:46.5874169Z ) 2026-02-21T10:18:46.5874327Z .reqntid 128 2026-02-21T10:18:46.5874485Z .maxnreg 32 2026-02-21T10:18:46.5874642Z { 2026-02-21T10:18:46.5874784Z .reg .pred %p<68>; 2026-02-21T10:18:46.5874970Z .reg .b16 %rs<769>; 2026-02-21T10:18:46.5875157Z .reg .b32 %r<12901>; 2026-02-21T10:18:46.5875452Z .reg .b64 %rd<315>; 2026-02-21T10:18:46.5875813Z .loc 1 14 0 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:14:0 2026-02-21T10:18:46.5876217Z $L__func_begin0: 2026-02-21T10:18:46.5876729Z .loc 1 14 0 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:14:0 2026-02-21T10:18:46.5877062Z 2026-02-21T10:18:46.5877130Z // %bb.0: 2026-02-21T10:18:46.5877337Z ld.param.b64 %rd27, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T10:18:46.5877683Z ld.param.b64 %rd26, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T10:18:46.5878013Z ld.param.b64 %rd25, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T10:18:46.5878280Z $L__tmp0: 2026-02-21T10:18:46.5878603Z .loc 1 19 46 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:19:46 2026-02-21T10:18:46.5879005Z mov.u32 %r12337, %ctaid.x; 2026-02-21T10:18:46.5879676Z .loc 1 0 0 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:0 2026-02-21T10:18:46.5880029Z sub.s32 %r1071, 5251, %r12337; 2026-02-21T10:18:46.5880380Z .loc 1 19 139 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:19:139 2026-02-21T10:18:46.5880776Z mul.hi.u32 %r1072, %r1071, 1041204193; 2026-02-21T10:18:46.5880985Z shr.u32 %r1073, %r1072, 5; 2026-02-21T10:18:46.5881173Z and.b32 %r1074, %r1073, 33554430; 2026-02-21T10:18:46.5881380Z mad.lo.s32 %r12768, %r1074, 132, %r12337; 2026-02-21T10:18:46.5881776Z .loc 1 31 45 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:31:45 2026-02-21T10:18:46.5882179Z mov.u32 %r3, %tid.x; 2026-02-21T10:18:46.5882348Z shr.u32 %r4, %r3, 5; 2026-02-21T10:18:46.5882516Z and.b32 %r5, %r3, 124; 2026-02-21T10:18:46.5882700Z bfe.u32 %r6, %r3, 2, 5; 2026-02-21T10:18:46.5882869Z or.b32 
%r7, %r6, 32; 2026-02-21T10:18:46.5883027Z or.b32 %r8, %r6, 64; 2026-02-21T10:18:46.5883209Z or.b32 %r9, %r6, 96; 2026-02-21T10:18:46.5883369Z shr.u32 %r1075, %r3, 4; 2026-02-21T10:18:46.5883555Z bfe.u32 %r11, %r3, 4, 3; 2026-02-21T10:18:46.5883723Z or.b32 %r12, %r11, 8; 2026-02-21T10:18:46.5883897Z or.b32 %r13, %r11, 16; 2026-02-21T10:18:46.5884075Z or.b32 %r14, %r11, 24; 2026-02-21T10:18:46.5884246Z or.b32 %r15, %r11, 32; 2026-02-21T10:18:46.5884414Z or.b32 %r16, %r11, 40; 2026-02-21T10:18:46.5884579Z or.b32 %r17, %r11, 48; 2026-02-21T10:18:46.5884751Z or.b32 %r18, %r1075, 56; 2026-02-21T10:18:46.5884915Z or.b32 %r19, %r11, 64; 2026-02-21T10:18:46.5885086Z or.b32 %r20, %r11, 72; 2026-02-21T10:18:46.5885362Z or.b32 %r21, %r11, 80; 2026-02-21T10:18:46.5885602Z or.b32 %r22, %r11, 88; 2026-02-21T10:18:46.5885774Z or.b32 %r23, %r11, 96; 2026-02-21T10:18:46.5885949Z or.b32 %r24, %r11, 104; 2026-02-21T10:18:46.5886129Z or.b32 %r25, %r11, 112; 2026-02-21T10:18:46.5886300Z or.b32 %r26, %r1075, 120; 2026-02-21T10:18:46.5886687Z shl.b32 %r27, %r3, 3; 2026-02-21T10:18:46.5886865Z and.b32 %r28, %r27, 120; 2026-02-21T10:18:46.5887200Z .loc 1 47 38 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:47:38 2026-02-21T10:18:46.5887562Z and.b32 %r29, %r3, 3; 2026-02-21T10:18:46.5887737Z shl.b32 %r30, %r29, 2; 2026-02-21T10:18:46.5887907Z and.b32 %r31, %r3, 1; 2026-02-21T10:18:46.5888237Z .loc 1 19 139 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:19:139 2026-02-21T10:18:46.5888652Z setp.lt.s32 %p1, %r12337, %r12768; 2026-02-21T10:18:46.5888863Z mov.b32 %r5575, global_smem; 2026-02-21T10:18:46.5889065Z and.b32 %r12324, %r3, 96; 2026-02-21T10:18:46.5889244Z and.b32 %r12325, %r27, 96; 2026-02-21T10:18:46.5889425Z shl.b32 %r12326, %r29, 1; 2026-02-21T10:18:46.5889740Z bfe.u32 %r12327, %r3, 5, 2; 2026-02-21T10:18:46.5889939Z and.b32 %r12328, %r3, 6; 2026-02-21T10:18:46.5890124Z and.b32 %r12329, %r3, 120; 2026-02-21T10:18:46.5890297Z shl.b32 %r12330, %r31, 2; 
2026-02-21T10:18:46.5890475Z shl.b32 %r12331, %r3, 6; 2026-02-21T10:18:46.5890648Z shl.b32 %r12332, %r29, 11; 2026-02-21T10:18:46.5890921Z shl.b32 %r12333, %r29, 5; 2026-02-21T10:18:46.5891098Z shl.b32 %r12334, %r3, 2; 2026-02-21T10:18:46.5891280Z shl.b32 %r12335, %r3, 8; 2026-02-21T10:18:46.5891442Z shl.b32 %r12336, %r5, 2; 2026-02-21T10:18:46.5891628Z @%p1 bra $L__BB0_2; 2026-02-21T10:18:46.5891795Z bra.uni $L__BB0_1; 2026-02-21T10:18:46.5891990Z $L__BB0_2: // %.lr.ph 2026-02-21T10:18:46.5892374Z .loc 1 0 139 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:0:139 2026-02-21T10:18:46.5892740Z bfe.u32 %r10, %r3, 1, 6; 2026-02-21T10:18:46.5892924Z shl.b32 %r1076, %r3, 4; 2026-02-21T10:18:46.5893093Z and.b32 %r1077, %r1076, 1904; 2026-02-21T10:18:46.5893357Z bfe.s32 %r1078, %r3, 3, 1; 2026-02-21T10:18:46.5893537Z and.b32 %r1079, %r1078, 136; 2026-02-21T10:18:46.5893729Z or.b32 %r1080, %r1079, %r1077; 2026-02-21T10:18:46.5893913Z add.s32 %r33, %r5575, %r1080; 2026-02-21T10:18:46.5894099Z xor.b32 %r1082, %r1080, 8; 2026-02-21T10:18:46.5894288Z add.s32 %r34, %r5575, %r1082; 2026-02-21T10:18:46.5894463Z shl.b32 %r1084, %r12324, 4; 2026-02-21T10:18:46.5894644Z and.b32 %r12594, %r3, 16; 2026-02-21T10:18:46.5894814Z bfe.s32 %r1087, %r3, 4, 1; 2026-02-21T10:18:46.5894990Z and.b32 %r1088, %r1087, 136; 2026-02-21T10:18:46.5895168Z or.b32 %r1089, %r1084, %r12325; 2026-02-21T10:18:46.5895369Z or.b32 %r1090, %r1089, %r12326; 2026-02-21T10:18:46.5895551Z or.b32 %r1091, %r1090, %r1088; 2026-02-21T10:18:46.5895735Z add.s32 %r36, %r5575, %r1091; 2026-02-21T10:18:46.5895916Z xor.b32 %r1092, %r1091, 8; 2026-02-21T10:18:46.5896089Z add.s32 %r37, %r5575, %r1092; 2026-02-21T10:18:46.5896266Z and.b32 %r1094, %r1087, 132; 2026-02-21T10:18:46.5896441Z or.b32 %r1095, %r12327, %r1094; 2026-02-21T10:18:46.5896775Z or.b32 %r1096, %r1095, %r28; 2026-02-21T10:18:46.5896951Z add.s32 %r38, %r5575, %r1096; 2026-02-21T10:18:46.5897135Z xor.b32 %r1097, %r1096, 4; 2026-02-21T10:18:46.5897316Z 
add.s32 %r39, %r5575, %r1097; 2026-02-21T10:18:46.5897506Z xor.b32 %r1098, %r1096, 32; 2026-02-21T10:18:46.5897681Z add.s32 %r40, %r5575, %r1098; 2026-02-21T10:18:46.5897859Z xor.b32 %r1099, %r1096, 36; 2026-02-21T10:18:46.5898038Z add.s32 %r41, %r5575, %r1099; 2026-02-21T10:18:46.5898212Z xor.b32 %r1100, %r1096, 64; 2026-02-21T10:18:46.5898396Z add.s32 %r42, %r5575, %r1100; 2026-02-21T10:18:46.5898568Z xor.b32 %r1101, %r1096, 68; 2026-02-21T10:18:46.5898744Z add.s32 %r43, %r5575, %r1101; 2026-02-21T10:18:46.5898920Z xor.b32 %r1102, %r1096, 96; 2026-02-21T10:18:46.5899187Z add.s32 %r44, %r5575, %r1102; 2026-02-21T10:18:46.5899440Z xor.b32 %r1103, %r1096, 100; 2026-02-21T10:18:46.5899620Z add.s32 %r45, %r5575, %r1103; 2026-02-21T10:18:46.5899801Z mul.lo.s32 %r1107, %r12328, 144; 2026-02-21T10:18:46.5900006Z xor.b32 %r1108, %r1107, %r12329; 2026-02-21T10:18:46.5900196Z or.b32 %r1109, %r1108, %r12330; 2026-02-21T10:18:46.5900381Z add.s32 %r46, %r5575, %r1109; 2026-02-21T10:18:46.5900562Z xor.b32 %r1110, %r1109, 132; 2026-02-21T10:18:46.5900736Z add.s32 %r47, %r5575, %r1110; 2026-02-21T10:18:46.5900920Z and.b32 %r1112, %r12331, 8128; 2026-02-21T10:18:46.5901100Z shl.b32 %r1113, %r12328, 3; 2026-02-21T10:18:46.5901279Z or.b32 %r1114, %r1112, %r1113; 2026-02-21T10:18:46.5901457Z add.s32 %r48, %r5575, %r1114; 2026-02-21T10:18:46.5901657Z xor.b32 %r1115, %r1114, 16; 2026-02-21T10:18:46.5901838Z add.s32 %r49, %r5575, %r1115; 2026-02-21T10:18:46.5902012Z xor.b32 %r1116, %r1114, 32; 2026-02-21T10:18:46.5902191Z add.s32 %r50, %r5575, %r1116; 2026-02-21T10:18:46.5902366Z xor.b32 %r1117, %r1114, 48; 2026-02-21T10:18:46.5902546Z add.s32 %r51, %r5575, %r1117; 2026-02-21T10:18:46.5902719Z bfe.u32 %r1118, %r5575, 4, 14; 2026-02-21T10:18:46.5902979Z cvt.u64.u32 %rd28, %r1118; 2026-02-21T10:18:46.5903192Z or.b64 %rd101, %rd28, -9223371899382267904; 2026-02-21T10:18:46.5903413Z add.s32 %r1119, %r5575, 32; 2026-02-21T10:18:46.5903586Z bfe.u32 %r1120, %r1119, 4, 14; 
2026-02-21T10:18:46.5903847Z cvt.u64.u32 %rd29, %r1120; 2026-02-21T10:18:46.5904047Z or.b64 %rd102, %rd29, -9223371899382267904; 2026-02-21T10:18:46.5904252Z shl.b32 %r1123, %r12329, 4; 2026-02-21T10:18:46.5904435Z and.b32 %r1125, %r12334, 16; 2026-02-21T10:18:46.5904606Z or.b32 %r1126, %r1123, %r1125; 2026-02-21T10:18:46.5904788Z or.b32 %r1127, %r1126, %r12332; 2026-02-21T10:18:46.5904969Z or.b32 %r1128, %r1127, %r12333; 2026-02-21T10:18:46.5905154Z add.s32 %r52, %r5575, %r1128; 2026-02-21T10:18:46.5905330Z xor.b32 %r1129, %r1128, 32; 2026-02-21T10:18:46.5905512Z add.s32 %r53, %r5575, %r1129; 2026-02-21T10:18:46.5905687Z xor.b32 %r1130, %r1128, 64; 2026-02-21T10:18:46.5905858Z add.s32 %r54, %r5575, %r1130; 2026-02-21T10:18:46.5906038Z xor.b32 %r1131, %r1128, 96; 2026-02-21T10:18:46.5906209Z add.s32 %r55, %r5575, %r1131; 2026-02-21T10:18:46.5906392Z and.b32 %r1133, %r12335, 6144; 2026-02-21T10:18:46.5906717Z or.b32 %r1135, %r1133, %r12333; 2026-02-21T10:18:46.5906908Z xor.b32 %r1136, %r1135, %r12336; 2026-02-21T10:18:46.5907112Z add.s32 %r4522, %r5575, %r1136; 2026-02-21T10:18:46.5907299Z add.s32 %r4527, %r4522, 512; 2026-02-21T10:18:46.5907476Z add.s32 %r4532, %r4522, 1024; 2026-02-21T10:18:46.5907657Z add.s32 %r4537, %r4522, 1536; 2026-02-21T10:18:46.5907991Z .loc 1 40 93 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:40:93 2026-02-21T10:18:46.5908354Z or.b32 %r60, %r10, 64; 2026-02-21T10:18:46.5908771Z .loc 1 19 139 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:19:139 2026-02-21T10:18:46.5909163Z mad.wide.u32 %rd3, %r31, 16, %rd25; 2026-02-21T10:18:46.5909377Z shl.b32 %r61, %r10, 13; 2026-02-21T10:18:46.5909557Z mad.wide.u32 %rd4, %r11, 1280, %rd26; 2026-02-21T10:18:46.5909816Z $L__BB0_3: // =>This Loop Header: Depth=1 2026-02-21T10:18:46.5910104Z // Child Loop BB0_4 Depth 2 2026-02-21T10:18:46.5910372Z // Child Loop BB0_6 Depth 2 2026-02-21T10:18:46.5910773Z .loc 1 25 35 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:25:35 
2026-02-21T10:18:46.5911136Z shr.s32 %r1139, %r12337, 31; 2026-02-21T10:18:46.5911329Z shr.u32 %r1140, %r1139, 17; 2026-02-21T10:18:46.5911511Z add.s32 %r1141, %r12337, %r1140; 2026-02-21T10:18:46.5911707Z shr.s32 %r1142, %r1141, 15; 2026-02-21T10:18:46.5912031Z .loc 1 26 33 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:26:33 2026-02-21T10:18:46.5912486Z shl.b32 %r1143, %r1142, 6; 2026-02-21T10:18:46.5912808Z .loc 1 27 39 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:27:39 2026-02-21T10:18:46.5913234Z sub.s32 %r1144, 10, %r1143; 2026-02-21T10:18:46.5913557Z .loc 1 27 52 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:27:52 2026-02-21T10:18:46.5913906Z min.s32 %r1145, %r1144, 64; 2026-02-21T10:18:46.5914225Z .loc 1 28 45 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:28:45 2026-02-21T10:18:46.5914593Z and.b32 %r1146, %r1141, -32768; 2026-02-21T10:18:46.5914786Z sub.s32 %r1147, %r12337, %r1146; 2026-02-21T10:18:46.5915114Z .loc 1 29 51 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:29:51 2026-02-21T10:18:46.5915467Z div.s32 %r1148, %r1147, %r1145; 2026-02-21T10:18:46.5915792Z .loc 1 28 64 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:28:64 2026-02-21T10:18:46.5916147Z mul.lo.s32 %r1149, %r1148, %r1145; 2026-02-21T10:18:46.5916349Z sub.s32 %r1150, %r1147, %r1149; 2026-02-21T10:18:46.5916896Z .loc 1 28 30 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:28:30 2026-02-21T10:18:46.5917256Z add.s32 %r1151, %r1150, %r1143; 2026-02-21T10:18:46.5917584Z .loc 1 30 27 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:30:27 2026-02-21T10:18:46.5917994Z shl.b32 %r1152, %r1151, 7; 2026-02-21T10:18:46.5918325Z .loc 1 31 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:31:32 2026-02-21T10:18:46.5918675Z or.b32 %r63, %r1152, %r28; 2026-02-21T10:18:46.5918990Z .loc 1 32 27 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:32:27 2026-02-21T10:18:46.5919341Z shl.b32 %r64, 
%r1148, 7; 2026-02-21T10:18:46.5919662Z .loc 1 40 93 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:40:93 2026-02-21T10:18:46.5920021Z or.b32 %r1153, %r60, %r64; 2026-02-21T10:18:46.5920192Z shl.b32 %r1154, %r1153, 13; 2026-02-21T10:18:46.5920378Z mul.wide.s32 %rd5, %r1154, 2; 2026-02-21T10:18:46.5920560Z shl.b32 %r1155, %r1148, 20; 2026-02-21T10:18:46.5920743Z or.b32 %r1156, %r61, %r1155; 2026-02-21T10:18:46.5920921Z mul.wide.s32 %rd6, %r1156, 2; 2026-02-21T10:18:46.5921108Z cvt.s64.s32 %rd31, %r63; 2026-02-21T10:18:46.5921285Z add.s64 %rd309, %rd4, %rd31; 2026-02-21T10:18:46.5921462Z mov.b32 %r12338, 0f00000000; 2026-02-21T10:18:46.5921643Z mov.b64 %rd311, -32; 2026-02-21T10:18:46.5921806Z mov.b64 %rd310, %rd3; 2026-02-21T10:18:46.5921976Z mov.b32 %r12339, %r12338; 2026-02-21T10:18:46.5922144Z mov.b32 %r12340, %r12338; 2026-02-21T10:18:46.5922316Z mov.b32 %r12341, %r12338; 2026-02-21T10:18:46.5922480Z mov.b32 %r12342, %r12338; 2026-02-21T10:18:46.5922650Z mov.b32 %r12343, %r12338; 2026-02-21T10:18:46.5922812Z mov.b32 %r12344, %r12338; 2026-02-21T10:18:46.5922981Z mov.b32 %r12345, %r12338; 2026-02-21T10:18:46.5923165Z mov.b32 %r12346, %r12338; 2026-02-21T10:18:46.5923334Z mov.b32 %r12347, %r12338; 2026-02-21T10:18:46.5923505Z mov.b32 %r12348, %r12338; 2026-02-21T10:18:46.5923668Z mov.b32 %r12349, %r12338; 2026-02-21T10:18:46.5923846Z mov.b32 %r12350, %r12338; 2026-02-21T10:18:46.5924012Z mov.b32 %r12351, %r12338; 2026-02-21T10:18:46.5924186Z mov.b32 %r12352, %r12338; 2026-02-21T10:18:46.5924350Z mov.b32 %r12353, %r12338; 2026-02-21T10:18:46.5924520Z mov.b32 %r12354, %r12338; 2026-02-21T10:18:46.5924690Z mov.b32 %r12355, %r12338; 2026-02-21T10:18:46.5924854Z mov.b32 %r12356, %r12338; 2026-02-21T10:18:46.5925024Z mov.b32 %r12357, %r12338; 2026-02-21T10:18:46.5925187Z mov.b32 %r12358, %r12338; 2026-02-21T10:18:46.5925360Z mov.b32 %r12359, %r12338; 2026-02-21T10:18:46.5925526Z mov.b32 %r12360, %r12338; 2026-02-21T10:18:46.5925697Z mov.b32 %r12361, %r12338; 
2026-02-21T10:18:46.5925864Z mov.b32 %r12362, %r12338; 2026-02-21T10:18:46.5926032Z mov.b32 %r12363, %r12338; 2026-02-21T10:18:46.5926290Z mov.b32 %r12364, %r12338; 2026-02-21T10:18:46.5926656Z mov.b32 %r12365, %r12338; 2026-02-21T10:18:46.5926827Z mov.b32 %r12366, %r12338; 2026-02-21T10:18:46.5926991Z mov.b32 %r12367, %r12338; 2026-02-21T10:18:46.5927158Z mov.b32 %r12368, %r12338; 2026-02-21T10:18:46.5927332Z mov.b32 %r12369, %r12338; 2026-02-21T10:18:46.5927504Z mov.b32 %r12370, %r12338; 2026-02-21T10:18:46.5927664Z mov.b32 %r12371, %r12338; 2026-02-21T10:18:46.5927837Z mov.b32 %r12372, %r12338; 2026-02-21T10:18:46.5927999Z mov.b32 %r12373, %r12338; 2026-02-21T10:18:46.5928177Z mov.b32 %r12374, %r12338; 2026-02-21T10:18:46.5928351Z mov.b32 %r12375, %r12338; 2026-02-21T10:18:46.5928520Z mov.b32 %r12376, %r12338; 2026-02-21T10:18:46.5928685Z mov.b32 %r12377, %r12338; 2026-02-21T10:18:46.5928848Z mov.b32 %r12378, %r12338; 2026-02-21T10:18:46.5929019Z mov.b32 %r12379, %r12338; 2026-02-21T10:18:46.5929180Z mov.b32 %r12380, %r12338; 2026-02-21T10:18:46.5929348Z mov.b32 %r12381, %r12338; 2026-02-21T10:18:46.5929512Z mov.b32 %r12382, %r12338; 2026-02-21T10:18:46.5929684Z mov.b32 %r12383, %r12338; 2026-02-21T10:18:46.5929847Z mov.b32 %r12384, %r12338; 2026-02-21T10:18:46.5930104Z mov.b32 %r12385, %r12338; 2026-02-21T10:18:46.5930283Z mov.b32 %r12386, %r12338; 2026-02-21T10:18:46.5930457Z mov.b32 %r12387, %r12338; 2026-02-21T10:18:46.5930627Z mov.b32 %r12388, %r12338; 2026-02-21T10:18:46.5930793Z mov.b32 %r12389, %r12338; 2026-02-21T10:18:46.5931026Z mov.b32 %r12390, %r12338; 2026-02-21T10:18:46.5931193Z mov.b32 %r12391, %r12338; 2026-02-21T10:18:46.5931363Z mov.b32 %r12392, %r12338; 2026-02-21T10:18:46.5931526Z mov.b32 %r12393, %r12338; 2026-02-21T10:18:46.5931696Z mov.b32 %r12394, %r12338; 2026-02-21T10:18:46.5931857Z mov.b32 %r12395, %r12338; 2026-02-21T10:18:46.5932025Z mov.b32 %r12396, %r12338; 2026-02-21T10:18:46.5932190Z mov.b32 %r12397, %r12338; 
2026-02-21T10:18:46.5932362Z mov.b32 %r12398, %r12338; 2026-02-21T10:18:46.5932531Z mov.b32 %r12399, %r12338; 2026-02-21T10:18:46.5932699Z mov.b32 %r12400, %r12338; 2026-02-21T10:18:46.5932871Z mov.b32 %r12401, %r12338; 2026-02-21T10:18:46.5933034Z mov.b32 %r12402, %r12338; 2026-02-21T10:18:46.5933215Z mov.b32 %r12403, %r12338; 2026-02-21T10:18:46.5933390Z mov.b32 %r12404, %r12338; 2026-02-21T10:18:46.5933561Z mov.b32 %r12405, %r12338; 2026-02-21T10:18:46.5933723Z mov.b32 %r12406, %r12338; 2026-02-21T10:18:46.5933895Z mov.b32 %r12407, %r12338; 2026-02-21T10:18:46.5934060Z mov.b32 %r12408, %r12338; 2026-02-21T10:18:46.5934231Z mov.b32 %r12409, %r12338; 2026-02-21T10:18:46.5934403Z mov.b32 %r12410, %r12338; 2026-02-21T10:18:46.5934568Z mov.b32 %r12411, %r12338; 2026-02-21T10:18:46.5934738Z mov.b32 %r12412, %r12338; 2026-02-21T10:18:46.5934898Z mov.b32 %r12413, %r12338; 2026-02-21T10:18:46.5935067Z mov.b32 %r12414, %r12338; 2026-02-21T10:18:46.5935231Z mov.b32 %r12415, %r12338; 2026-02-21T10:18:46.5935398Z mov.b32 %r12416, %r12338; 2026-02-21T10:18:46.5935577Z mov.b32 %r12417, %r12338; 2026-02-21T10:18:46.5935750Z mov.b32 %r12418, %r12338; 2026-02-21T10:18:46.5935913Z mov.b32 %r12419, %r12338; 2026-02-21T10:18:46.5936081Z mov.b32 %r12420, %r12338; 2026-02-21T10:18:46.5936251Z mov.b32 %r12421, %r12338; 2026-02-21T10:18:46.5936414Z mov.b32 %r12422, %r12338; 2026-02-21T10:18:46.5936716Z mov.b32 %r12423, %r12338; 2026-02-21T10:18:46.5936879Z mov.b32 %r12424, %r12338; 2026-02-21T10:18:46.5937045Z mov.b32 %r12425, %r12338; 2026-02-21T10:18:46.5937208Z mov.b32 %r12426, %r12338; 2026-02-21T10:18:46.5937377Z mov.b32 %r12427, %r12338; 2026-02-21T10:18:46.5937540Z mov.b32 %r12428, %r12338; 2026-02-21T10:18:46.5937709Z mov.b32 %r12429, %r12338; 2026-02-21T10:18:46.5937881Z mov.b32 %r12430, %r12338; 2026-02-21T10:18:46.5938055Z mov.b32 %r12431, %r12338; 2026-02-21T10:18:46.5938220Z mov.b32 %r12432, %r12338; 2026-02-21T10:18:46.5938382Z mov.b32 %r12433, %r12338; 
2026-02-21T10:18:46.5938552Z mov.b32 %r12434, %r12338; 2026-02-21T10:18:46.5938716Z mov.b32 %r12435, %r12338; 2026-02-21T10:18:46.5938969Z mov.b32 %r12436, %r12338; 2026-02-21T10:18:46.5939219Z mov.b32 %r12437, %r12338; 2026-02-21T10:18:46.5939400Z mov.b32 %r12438, %r12338; 2026-02-21T10:18:46.5939568Z mov.b32 %r12439, %r12338; 2026-02-21T10:18:46.5939736Z mov.b32 %r12440, %r12338; 2026-02-21T10:18:46.5939899Z mov.b32 %r12441, %r12338; 2026-02-21T10:18:46.5940071Z mov.b32 %r12442, %r12338; 2026-02-21T10:18:46.5940239Z mov.b32 %r12443, %r12338; 2026-02-21T10:18:46.5940401Z mov.b32 %r12444, %r12338; 2026-02-21T10:18:46.5940569Z mov.b32 %r12445, %r12338; 2026-02-21T10:18:46.5940731Z mov.b32 %r12446, %r12338; 2026-02-21T10:18:46.5940903Z mov.b32 %r12447, %r12338; 2026-02-21T10:18:46.5941075Z mov.b32 %r12448, %r12338; 2026-02-21T10:18:46.5941249Z mov.b32 %r12449, %r12338; 2026-02-21T10:18:46.5941420Z mov.b32 %r12450, %r12338; 2026-02-21T10:18:46.5941590Z mov.b32 %r12451, %r12338; 2026-02-21T10:18:46.5941753Z mov.b32 %r12452, %r12338; 2026-02-21T10:18:46.5941925Z mov.b32 %r12453, %r12338; 2026-02-21T10:18:46.5942095Z mov.b32 %r12454, %r12338; 2026-02-21T10:18:46.5942260Z mov.b32 %r12455, %r12338; 2026-02-21T10:18:46.5942431Z mov.b32 %r12456, %r12338; 2026-02-21T10:18:46.5942675Z mov.b32 %r12457, %r12338; 2026-02-21T10:18:46.5942856Z mov.b32 %r12458, %r12338; 2026-02-21T10:18:46.5943020Z mov.b32 %r12459, %r12338; 2026-02-21T10:18:46.5943199Z mov.b32 %r12460, %r12338; 2026-02-21T10:18:46.5943367Z mov.b32 %r12461, %r12338; 2026-02-21T10:18:46.5943601Z mov.b32 %r12462, %r12338; 2026-02-21T10:18:46.5943776Z mov.b32 %r12463, %r12338; 2026-02-21T10:18:46.5943939Z mov.b32 %r12464, %r12338; 2026-02-21T10:18:46.5944110Z mov.b32 %r12465, %r12338; 2026-02-21T10:18:46.5944329Z $L__BB0_4: // Parent Loop BB0_3 Depth=1 2026-02-21T10:18:46.5944632Z // => This Inner Loop Header: Depth=2 2026-02-21T10:18:46.5945027Z .loc 1 48 32 // 
co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.5945394Z add.s64 %rd33, %rd310, %rd6; 2026-02-21T10:18:46.5945719Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.5946078Z add.s64 %rd36, %rd310, %rd5; 2026-02-21T10:18:46.5946258Z // begin inline asm 2026-02-21T10:18:46.5946421Z mov.u64 %rd32, 0x0; 2026-02-21T10:18:46.5946796Z createpolicy.fractional.L2::evict_last.b64 %rd32, 1.0; 2026-02-21T10:18:46.5947054Z // end inline asm 2026-02-21T10:18:46.5947216Z // begin inline asm 2026-02-21T10:18:46.5947368Z mov.u32 %r1157, 0x0; 2026-02-21T10:18:46.5947529Z mov.u32 %r1158, 0x0; 2026-02-21T10:18:46.5947679Z mov.u32 %r1159, 0x0; 2026-02-21T10:18:46.5947834Z mov.u32 %r1160, 0x0; 2026-02-21T10:18:46.5948156Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1157, %r1158, %r1159, %r1160 }, [ %rd33 + 0 ], %rd32; 2026-02-21T10:18:46.5948609Z // end inline asm 2026-02-21T10:18:46.5948765Z // begin inline asm 2026-02-21T10:18:46.5948915Z mov.u64 %rd35, 0x0; 2026-02-21T10:18:46.5949134Z createpolicy.fractional.L2::evict_last.b64 %rd35, 1.0; 2026-02-21T10:18:46.5949384Z // end inline asm 2026-02-21T10:18:46.5949536Z // begin inline asm 2026-02-21T10:18:46.5949688Z mov.u32 %r1161, 0x0; 2026-02-21T10:18:46.5949846Z mov.u32 %r1162, 0x0; 2026-02-21T10:18:46.5950000Z mov.u32 %r1163, 0x0; 2026-02-21T10:18:46.5950150Z mov.u32 %r1164, 0x0; 2026-02-21T10:18:46.5950458Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1161, %r1162, %r1163, %r1164 }, [ %rd36 + 0 ], %rd35; 2026-02-21T10:18:46.5950813Z // end inline asm 2026-02-21T10:18:46.5951116Z .loc 1 52 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:52:32 2026-02-21T10:18:46.5951466Z bar.sync 0; 2026-02-21T10:18:46.5965604Z st.shared.v2.b32 [%r33], {%r1157, %r1158}; 2026-02-21T10:18:46.5965926Z st.shared.v2.b32 [%r33+2048], {%r1161, %r1162}; 2026-02-21T10:18:46.5966206Z st.shared.v2.b32 [%r34], {%r1159, %r1160}; 
2026-02-21T10:18:46.5966638Z st.shared.v2.b32 [%r34+2048], {%r1163, %r1164}; 2026-02-21T10:18:46.5967060Z bar.sync 0; 2026-02-21T10:18:46.5967306Z ld.shared.b16 %rs1, [%r36]; 2026-02-21T10:18:46.5967523Z ld.shared.b16 %rs2, [%r36+256]; 2026-02-21T10:18:46.5967732Z ld.shared.b16 %rs3, [%r36+16]; 2026-02-21T10:18:46.5967936Z ld.shared.b16 %rs4, [%r36+272]; 2026-02-21T10:18:46.5968139Z ld.shared.b16 %rs5, [%r36+2048]; 2026-02-21T10:18:46.5968365Z ld.shared.b16 %rs6, [%r36+2304]; 2026-02-21T10:18:46.5968574Z ld.shared.b16 %rs7, [%r36+2064]; 2026-02-21T10:18:46.5968791Z ld.shared.b16 %rs8, [%r36+2320]; 2026-02-21T10:18:46.5968997Z ld.shared.b16 %rs9, [%r37]; 2026-02-21T10:18:46.5969194Z ld.shared.b16 %rs10, [%r37+256]; 2026-02-21T10:18:46.5969396Z ld.shared.b16 %rs11, [%r37+16]; 2026-02-21T10:18:46.5969589Z ld.shared.b16 %rs12, [%r37+272]; 2026-02-21T10:18:46.5969794Z ld.shared.b16 %rs13, [%r37+2048]; 2026-02-21T10:18:46.5969999Z ld.shared.b16 %rs14, [%r37+2304]; 2026-02-21T10:18:46.5970203Z ld.shared.b16 %rs15, [%r37+2064]; 2026-02-21T10:18:46.5970402Z ld.shared.b16 %rs16, [%r37+2320]; 2026-02-21T10:18:46.5970625Z cvt.f32.bf16 %r1295, %rs1; 2026-02-21T10:18:46.5970817Z cvt.f32.bf16 %r1296, %rs2; 2026-02-21T10:18:46.5971098Z cvt.f32.bf16 %r1297, %rs9; 2026-02-21T10:18:46.5971290Z cvt.f32.bf16 %r1298, %rs10; 2026-02-21T10:18:46.5971470Z cvt.f32.bf16 %r1427, %rs3; 2026-02-21T10:18:46.5971653Z cvt.f32.bf16 %r1428, %rs4; 2026-02-21T10:18:46.5971841Z cvt.f32.bf16 %r1429, %rs11; 2026-02-21T10:18:46.5972095Z cvt.f32.bf16 %r1430, %rs12; 2026-02-21T10:18:46.5972290Z cvt.f32.bf16 %r1559, %rs5; 2026-02-21T10:18:46.5972475Z cvt.f32.bf16 %r1560, %rs6; 2026-02-21T10:18:46.5972649Z cvt.f32.bf16 %r1561, %rs13; 2026-02-21T10:18:46.5972837Z cvt.f32.bf16 %r1562, %rs14; 2026-02-21T10:18:46.5973014Z cvt.f32.bf16 %r1691, %rs7; 2026-02-21T10:18:46.5973195Z cvt.f32.bf16 %r1692, %rs8; 2026-02-21T10:18:46.5973376Z cvt.f32.bf16 %r1693, %rs15; 2026-02-21T10:18:46.5973552Z cvt.f32.bf16 %r1694, 
%rs16; 2026-02-21T10:18:46.5973905Z .loc 1 54 87 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:54:87 2026-02-21T10:18:46.5974284Z // begin inline asm 2026-02-21T10:18:46.5974473Z mov.u32 %r1165, 0x0; 2026-02-21T10:18:46.5974641Z mov.u32 %r1166, 0x0; 2026-02-21T10:18:46.5974844Z ld.global.v2.b32 { %r1165, %r1166 }, [ %rd309 + 0 ]; 2026-02-21T10:18:46.5975082Z // end inline asm 2026-02-21T10:18:46.5975406Z .loc 1 62 28 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:62:28 2026-02-21T10:18:46.5975771Z bar.sync 0; 2026-02-21T10:18:46.5975933Z st.shared.b8 [%r38], %r1165; 2026-02-21T10:18:46.5976139Z prmt.b32 %r4357, %r1165, 0, 0x7771U; 2026-02-21T10:18:46.5976361Z st.shared.b8 [%r39], %r4357; 2026-02-21T10:18:46.5976699Z prmt.b32 %r4358, %r1165, 0, 0x7772U; 2026-02-21T10:18:46.5976907Z st.shared.b8 [%r40+256], %r4358; 2026-02-21T10:18:46.5977126Z prmt.b32 %r4359, %r1165, 0, 0x7773U; 2026-02-21T10:18:46.5977335Z st.shared.b8 [%r41+256], %r4359; 2026-02-21T10:18:46.5977537Z st.shared.b8 [%r42+512], %r1166; 2026-02-21T10:18:46.5977742Z prmt.b32 %r4360, %r1166, 0, 0x7771U; 2026-02-21T10:18:46.5977944Z st.shared.b8 [%r43+512], %r4360; 2026-02-21T10:18:46.5978149Z prmt.b32 %r4361, %r1166, 0, 0x7772U; 2026-02-21T10:18:46.5978354Z st.shared.b8 [%r44+768], %r4361; 2026-02-21T10:18:46.5978570Z prmt.b32 %r4362, %r1166, 0, 0x7773U; 2026-02-21T10:18:46.5978766Z st.shared.b8 [%r45+768], %r4362; 2026-02-21T10:18:46.5978956Z bar.sync 0; 2026-02-21T10:18:46.5979121Z ld.shared.b32 %r4363, [%r46]; 2026-02-21T10:18:46.5979324Z prmt.b32 %r4364, %r4363, 0, 0x7770U; 2026-02-21T10:18:46.5979530Z cvt.u16.u32 %rs17, %r4364; 2026-02-21T10:18:46.5979720Z prmt.b32 %r4365, %r4363, 0, 0x7771U; 2026-02-21T10:18:46.5979928Z cvt.u16.u32 %rs18, %r4365; 2026-02-21T10:18:46.5980107Z prmt.b32 %r4366, %r4363, 0, 0x7772U; 2026-02-21T10:18:46.5980311Z cvt.u16.u32 %rs19, %r4366; 2026-02-21T10:18:46.5980489Z prmt.b32 %r4367, %r4363, 0, 0x7773U; 2026-02-21T10:18:46.5980794Z cvt.u16.u32 
%rs20, %r4367; 2026-02-21T10:18:46.5980977Z ld.shared.b32 %r4368, [%r47]; 2026-02-21T10:18:46.5981288Z prmt.b32 %r4369, %r4368, 0, 0x7770U; 2026-02-21T10:18:46.5981488Z cvt.u16.u32 %rs21, %r4369; 2026-02-21T10:18:46.5981676Z prmt.b32 %r4370, %r4368, 0, 0x7771U; 2026-02-21T10:18:46.5981886Z cvt.u16.u32 %rs22, %r4370; 2026-02-21T10:18:46.5982072Z prmt.b32 %r4371, %r4368, 0, 0x7772U; 2026-02-21T10:18:46.5982276Z cvt.u16.u32 %rs23, %r4371; 2026-02-21T10:18:46.5982459Z prmt.b32 %r4372, %r4368, 0, 0x7773U; 2026-02-21T10:18:46.5982663Z cvt.u16.u32 %rs24, %r4372; 2026-02-21T10:18:46.5983008Z .loc 1 57 28 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:57:28 2026-02-21T10:18:46.5983391Z shl.b16 %rs25, %rs17, 4; 2026-02-21T10:18:46.5983575Z shl.b16 %rs26, %rs21, 4; 2026-02-21T10:18:46.5983756Z shl.b16 %rs27, %rs18, 4; 2026-02-21T10:18:46.5983934Z shl.b16 %rs28, %rs22, 4; 2026-02-21T10:18:46.5984100Z shl.b16 %rs29, %rs19, 4; 2026-02-21T10:18:46.5984289Z shl.b16 %rs30, %rs23, 4; 2026-02-21T10:18:46.5984459Z shl.b16 %rs31, %rs20, 4; 2026-02-21T10:18:46.5984636Z shl.b16 %rs32, %rs24, 4; 2026-02-21T10:18:46.5985036Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.5985417Z cvt.s16.s8 %rs33, %rs25; 2026-02-21T10:18:46.5985591Z shr.s16 %rs34, %rs33, 4; 2026-02-21T10:18:46.5985773Z cvt.s16.s8 %rs35, %rs26; 2026-02-21T10:18:46.5986028Z shr.s16 %rs36, %rs35, 4; 2026-02-21T10:18:46.5986213Z prmt.b32 %r4373, %r4363, 0, 0x8880U; 2026-02-21T10:18:46.5986420Z cvt.u16.u32 %rs37, %r4373; 2026-02-21T10:18:46.5986726Z shr.s16 %rs38, %rs37, 4; 2026-02-21T10:18:46.5986909Z prmt.b32 %r4374, %r4368, 0, 0x8880U; 2026-02-21T10:18:46.5987109Z cvt.u16.u32 %rs39, %r4374; 2026-02-21T10:18:46.5987305Z shr.s16 %rs40, %rs39, 4; 2026-02-21T10:18:46.5987627Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.5988000Z cvt.rn.f32.s16 %r4375, %rs40; 2026-02-21T10:18:46.5988196Z cvt.rn.f32.s16 %r4376, 
%rs38; 2026-02-21T10:18:46.5988378Z cvt.rn.f32.s16 %r4377, %rs36; 2026-02-21T10:18:46.5988680Z cvt.rn.f32.s16 %r4378, %rs34; 2026-02-21T10:18:46.5989004Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.5989370Z cvt.s16.s8 %rs41, %rs27; 2026-02-21T10:18:46.5989547Z shr.s16 %rs42, %rs41, 4; 2026-02-21T10:18:46.5989729Z cvt.s16.s8 %rs43, %rs28; 2026-02-21T10:18:46.5989902Z shr.s16 %rs44, %rs43, 4; 2026-02-21T10:18:46.5990085Z prmt.b32 %r4379, %r4363, 0, 0x9991U; 2026-02-21T10:18:46.5990291Z cvt.u16.u32 %rs45, %r4379; 2026-02-21T10:18:46.5990467Z shr.s16 %rs46, %rs45, 4; 2026-02-21T10:18:46.5990649Z prmt.b32 %r4380, %r4368, 0, 0x9991U; 2026-02-21T10:18:46.5990845Z cvt.u16.u32 %rs47, %r4380; 2026-02-21T10:18:46.5991029Z shr.s16 %rs48, %rs47, 4; 2026-02-21T10:18:46.5991349Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.5991725Z cvt.rn.f32.s16 %r4381, %rs48; 2026-02-21T10:18:46.5991913Z cvt.rn.f32.s16 %r4382, %rs46; 2026-02-21T10:18:46.5992105Z cvt.rn.f32.s16 %r4383, %rs44; 2026-02-21T10:18:46.5992289Z cvt.rn.f32.s16 %r4384, %rs42; 2026-02-21T10:18:46.5992612Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.5992979Z cvt.s16.s8 %rs49, %rs29; 2026-02-21T10:18:46.5993165Z shr.s16 %rs50, %rs49, 4; 2026-02-21T10:18:46.5993346Z cvt.s16.s8 %rs51, %rs30; 2026-02-21T10:18:46.5993515Z shr.s16 %rs52, %rs51, 4; 2026-02-21T10:18:46.5993701Z prmt.b32 %r4385, %r4363, 0, 0xaaa2U; 2026-02-21T10:18:46.5993898Z cvt.u16.u32 %rs53, %r4385; 2026-02-21T10:18:46.5994079Z shr.s16 %rs54, %rs53, 4; 2026-02-21T10:18:46.5994262Z prmt.b32 %r4386, %r4368, 0, 0xaaa2U; 2026-02-21T10:18:46.5994457Z cvt.u16.u32 %rs55, %r4386; 2026-02-21T10:18:46.5994639Z shr.s16 %rs56, %rs55, 4; 2026-02-21T10:18:46.5995039Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.5995463Z cvt.rn.f32.s16 %r4387, %rs56; 
2026-02-21T10:18:46.5995649Z cvt.rn.f32.s16 %r4388, %rs54; 2026-02-21T10:18:46.5995835Z cvt.rn.f32.s16 %r4389, %rs52; 2026-02-21T10:18:46.5996012Z cvt.rn.f32.s16 %r4390, %rs50; 2026-02-21T10:18:46.5996342Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.5996833Z cvt.s16.s8 %rs57, %rs31; 2026-02-21T10:18:46.5997006Z shr.s16 %rs58, %rs57, 4; 2026-02-21T10:18:46.5997184Z cvt.s16.s8 %rs59, %rs32; 2026-02-21T10:18:46.5997354Z shr.s16 %rs60, %rs59, 4; 2026-02-21T10:18:46.5997537Z prmt.b32 %r4391, %r4363, 0, 0xbbb3U; 2026-02-21T10:18:46.5997736Z cvt.u16.u32 %rs61, %r4391; 2026-02-21T10:18:46.5997918Z shr.s16 %rs62, %rs61, 4; 2026-02-21T10:18:46.5998092Z prmt.b32 %r4392, %r4368, 0, 0xbbb3U; 2026-02-21T10:18:46.5998291Z cvt.u16.u32 %rs63, %r4392; 2026-02-21T10:18:46.5998482Z shr.s16 %rs64, %rs63, 4; 2026-02-21T10:18:46.5998891Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.5999259Z cvt.rn.f32.s16 %r4393, %rs64; 2026-02-21T10:18:46.5999455Z cvt.rn.f32.s16 %r4394, %rs62; 2026-02-21T10:18:46.5999646Z cvt.rn.f32.s16 %r4395, %rs60; 2026-02-21T10:18:46.5999824Z cvt.rn.f32.s16 %r4396, %rs58; 2026-02-21T10:18:46.6000086Z bar.sync 0; 2026-02-21T10:18:46.6000303Z st.shared.v4.b32 [%r48], {%r4378, %r4376, %r4377, %r4375}; 2026-02-21T10:18:46.6000603Z st.shared.v4.b32 [%r49], {%r4384, %r4382, %r4383, %r4381}; 2026-02-21T10:18:46.6000896Z st.shared.v4.b32 [%r50], {%r4390, %r4388, %r4389, %r4387}; 2026-02-21T10:18:46.6001179Z st.shared.v4.b32 [%r51], {%r4396, %r4394, %r4395, %r4393}; 2026-02-21T10:18:46.6001421Z $L__tmp1: 2026-02-21T10:18:46.6001782Z .loc 2 291 36 // standard.py:291:36 @[ co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:84:40 ] 2026-02-21T10:18:46.6002218Z // begin inline asm 2026-02-21T10:18:46.6002410Z fence.proxy.async.shared::cta; 2026-02-21T10:18:46.6002617Z // end inline asm 2026-02-21T10:18:46.6002780Z bar.sync 0; 2026-02-21T10:18:46.6002952Z 
shfl.sync.idx.b32 %r4397, %r4, 0, 31, -1; 2026-02-21T10:18:46.6003193Z wgmma.fence.sync.aligned; 2026-02-21T10:18:46.6003385Z mov.pred %p2, -1; 2026-02-21T10:18:46.6003554Z // begin inline asm 2026-02-21T10:18:46.6005144Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351,%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393,%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401}, {%r1295,%r1296,%r1297,%r1298}, %rd101, %p2, 1, 1; 2026-02-21T10:18:46.6007139Z // end inline asm 2026-02-21T10:18:46.6007301Z // begin inline asm 2026-02-21T10:18:46.6008897Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351,%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393,%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401}, {%r1427,%r1428,%r1429,%r1430}, %rd102, %p2, 1, 1; 2026-02-21T10:18:46.6010515Z // end inline asm 2026-02-21T10:18:46.6010668Z // begin inline asm 2026-02-21T10:18:46.6012260Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415,%r12416,%r12417,%r12418,%r12419,%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457,%r12458,%r12459,%r12460,%r12461,%r12462,%r12463,%r12464,%r12465}, {%r1559,%r1560,%r1561,%r1562}, %rd101, %p2, 1, 1; 2026-02-21T10:18:46.6014031Z // end inline asm 2026-02-21T10:18:46.6014183Z // begin inline asm 2026-02-21T10:18:46.6015818Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415,%r12416,%r12417,%r12418,%r12419,%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457,%r12458,%r12459,%r12460,%r12461,%r12462,%r12463,%r12464,%r12465}, {%r1691,%r1692,%r1693,%r1694}, %rd102, %p2, 1, 1; 2026-02-21T10:18:46.6017565Z // end inline asm 2026-02-21T10:18:46.6017735Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:46.6017940Z mov.b32 %r4224, 0; 2026-02-21T10:18:46.6018097Z mov.b32 %r1823, %r5575; 2026-02-21T10:18:46.6018351Z mov.b32 %r1824, %r4224; 2026-02-21T10:18:46.6018529Z mov.b32 %r1825, %r4224; 2026-02-21T10:18:46.6018699Z // begin inline asm 2026-02-21T10:18:46.6021330Z // wait for regs: 
%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351,%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393,%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401,%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415,%r12416,%r12417,%r12418,%r12419,%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457,%r12458,%r12459,%r12460,%r12461,%r12462,%r12463,%r12464,%r12465,%r1823,%r1824,%r1825 2026-02-21T10:18:46.6024160Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:46.6024365Z // end inline asm 2026-02-21T10:18:46.6024510Z $L__tmp2: 2026-02-21T10:18:46.6024815Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6025184Z add.s64 %rd44, %rd33, 32; 2026-02-21T10:18:46.6025525Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6025901Z add.s64 %rd47, %rd36, 32; 2026-02-21T10:18:46.6026093Z // begin inline asm 2026-02-21T10:18:46.6026263Z mov.u64 %rd43, 0x0; 2026-02-21T10:18:46.6026617Z createpolicy.fractional.L2::evict_last.b64 %rd43, 1.0; 2026-02-21T10:18:46.6026898Z // end inline asm 2026-02-21T10:18:46.6027060Z // begin inline asm 2026-02-21T10:18:46.6027231Z mov.u32 %r1957, 0x0; 2026-02-21T10:18:46.6027400Z mov.u32 %r1958, 0x0; 2026-02-21T10:18:46.6027558Z mov.u32 %r1959, 0x0; 2026-02-21T10:18:46.6027717Z mov.u32 %r1960, 0x0; 
2026-02-21T10:18:46.6028035Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1957, %r1958, %r1959, %r1960 }, [ %rd44 + 0 ], %rd43; 2026-02-21T10:18:46.6028479Z // end inline asm 2026-02-21T10:18:46.6028656Z // begin inline asm 2026-02-21T10:18:46.6028814Z mov.u64 %rd46, 0x0; 2026-02-21T10:18:46.6029140Z createpolicy.fractional.L2::evict_last.b64 %rd46, 1.0; 2026-02-21T10:18:46.6029392Z // end inline asm 2026-02-21T10:18:46.6029616Z // begin inline asm 2026-02-21T10:18:46.6029772Z mov.u32 %r1961, 0x0; 2026-02-21T10:18:46.6029933Z mov.u32 %r1962, 0x0; 2026-02-21T10:18:46.6030089Z mov.u32 %r1963, 0x0; 2026-02-21T10:18:46.6030245Z mov.u32 %r1964, 0x0; 2026-02-21T10:18:46.6030557Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1961, %r1962, %r1963, %r1964 }, [ %rd47 + 0 ], %rd46; 2026-02-21T10:18:46.6030926Z // end inline asm 2026-02-21T10:18:46.6031234Z .loc 1 52 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:52:32 2026-02-21T10:18:46.6031591Z bar.sync 0; 2026-02-21T10:18:46.6031778Z st.shared.v2.b32 [%r33], {%r1957, %r1958}; 2026-02-21T10:18:46.6032025Z st.shared.v2.b32 [%r33+2048], {%r1961, %r1962}; 2026-02-21T10:18:46.6032286Z st.shared.v2.b32 [%r34], {%r1959, %r1960}; 2026-02-21T10:18:46.6032526Z st.shared.v2.b32 [%r34+2048], {%r1963, %r1964}; 2026-02-21T10:18:46.6032755Z bar.sync 0; 2026-02-21T10:18:46.6032915Z ld.shared.b16 %rs65, [%r36]; 2026-02-21T10:18:46.6033123Z ld.shared.b16 %rs66, [%r36+256]; 2026-02-21T10:18:46.6033414Z ld.shared.b16 %rs67, [%r36+16]; 2026-02-21T10:18:46.6033621Z ld.shared.b16 %rs68, [%r36+272]; 2026-02-21T10:18:46.6033824Z ld.shared.b16 %rs69, [%r36+2048]; 2026-02-21T10:18:46.6034023Z ld.shared.b16 %rs70, [%r36+2304]; 2026-02-21T10:18:46.6034219Z ld.shared.b16 %rs71, [%r36+2064]; 2026-02-21T10:18:46.6034490Z ld.shared.b16 %rs72, [%r36+2320]; 2026-02-21T10:18:46.6034709Z ld.shared.b16 %rs73, [%r37]; 2026-02-21T10:18:46.6034903Z ld.shared.b16 %rs74, [%r37+256]; 2026-02-21T10:18:46.6035103Z ld.shared.b16 %rs75, 
[%r37+16]; 2026-02-21T10:18:46.6035300Z ld.shared.b16 %rs76, [%r37+272]; 2026-02-21T10:18:46.6035493Z ld.shared.b16 %rs77, [%r37+2048]; 2026-02-21T10:18:46.6035694Z ld.shared.b16 %rs78, [%r37+2304]; 2026-02-21T10:18:46.6035888Z ld.shared.b16 %rs79, [%r37+2064]; 2026-02-21T10:18:46.6036091Z ld.shared.b16 %rs80, [%r37+2320]; 2026-02-21T10:18:46.6036287Z cvt.f32.bf16 %r2095, %rs65; 2026-02-21T10:18:46.6036615Z cvt.f32.bf16 %r2096, %rs66; 2026-02-21T10:18:46.6036807Z cvt.f32.bf16 %r2097, %rs73; 2026-02-21T10:18:46.6036988Z cvt.f32.bf16 %r2098, %rs74; 2026-02-21T10:18:46.6037168Z cvt.f32.bf16 %r2227, %rs67; 2026-02-21T10:18:46.6037342Z cvt.f32.bf16 %r2228, %rs68; 2026-02-21T10:18:46.6037521Z cvt.f32.bf16 %r2229, %rs75; 2026-02-21T10:18:46.6037698Z cvt.f32.bf16 %r2230, %rs76; 2026-02-21T10:18:46.6037880Z cvt.f32.bf16 %r2359, %rs69; 2026-02-21T10:18:46.6038054Z cvt.f32.bf16 %r2360, %rs70; 2026-02-21T10:18:46.6038234Z cvt.f32.bf16 %r2361, %rs77; 2026-02-21T10:18:46.6038416Z cvt.f32.bf16 %r2362, %rs78; 2026-02-21T10:18:46.6038608Z cvt.f32.bf16 %r2491, %rs71; 2026-02-21T10:18:46.6038784Z cvt.f32.bf16 %r2492, %rs72; 2026-02-21T10:18:46.6038968Z cvt.f32.bf16 %r2493, %rs79; 2026-02-21T10:18:46.6039149Z cvt.f32.bf16 %r2494, %rs80; 2026-02-21T10:18:46.6039477Z .loc 1 54 87 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:54:87 2026-02-21T10:18:46.6039859Z add.s64 %rd49, %rd309, 10240; 2026-02-21T10:18:46.6040049Z // begin inline asm 2026-02-21T10:18:46.6040221Z mov.u32 %r1965, 0x0; 2026-02-21T10:18:46.6040388Z mov.u32 %r1966, 0x0; 2026-02-21T10:18:46.6040590Z ld.global.v2.b32 { %r1965, %r1966 }, [ %rd49 + 0 ]; 2026-02-21T10:18:46.6040824Z // end inline asm 2026-02-21T10:18:46.6041136Z .loc 1 62 28 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:62:28 2026-02-21T10:18:46.6041498Z bar.sync 0; 2026-02-21T10:18:46.6041653Z st.shared.b8 [%r38], %r1965; 2026-02-21T10:18:46.6041849Z prmt.b32 %r4398, %r1965, 0, 0x7771U; 2026-02-21T10:18:46.6042055Z st.shared.b8 
[%r39], %r4398; 2026-02-21T10:18:46.6042246Z prmt.b32 %r4399, %r1965, 0, 0x7772U; 2026-02-21T10:18:46.6042448Z st.shared.b8 [%r40+256], %r4399; 2026-02-21T10:18:46.6042648Z prmt.b32 %r4400, %r1965, 0, 0x7773U; 2026-02-21T10:18:46.6042969Z st.shared.b8 [%r41+256], %r4400; 2026-02-21T10:18:46.6043171Z st.shared.b8 [%r42+512], %r1966; 2026-02-21T10:18:46.6043428Z prmt.b32 %r4401, %r1966, 0, 0x7771U; 2026-02-21T10:18:46.6043635Z st.shared.b8 [%r43+512], %r4401; 2026-02-21T10:18:46.6043832Z prmt.b32 %r4402, %r1966, 0, 0x7772U; 2026-02-21T10:18:46.6044027Z st.shared.b8 [%r44+768], %r4402; 2026-02-21T10:18:46.6044218Z prmt.b32 %r4403, %r1966, 0, 0x7773U; 2026-02-21T10:18:46.6044412Z st.shared.b8 [%r45+768], %r4403; 2026-02-21T10:18:46.6044599Z bar.sync 0; 2026-02-21T10:18:46.6044751Z ld.shared.b32 %r4404, [%r46]; 2026-02-21T10:18:46.6044945Z prmt.b32 %r4405, %r4404, 0, 0x7770U; 2026-02-21T10:18:46.6045149Z cvt.u16.u32 %rs81, %r4405; 2026-02-21T10:18:46.6045335Z prmt.b32 %r4406, %r4404, 0, 0x7771U; 2026-02-21T10:18:46.6045539Z cvt.u16.u32 %rs82, %r4406; 2026-02-21T10:18:46.6045719Z prmt.b32 %r4407, %r4404, 0, 0x7772U; 2026-02-21T10:18:46.6045920Z cvt.u16.u32 %rs83, %r4407; 2026-02-21T10:18:46.6046100Z prmt.b32 %r4408, %r4404, 0, 0x7773U; 2026-02-21T10:18:46.6046315Z cvt.u16.u32 %rs84, %r4408; 2026-02-21T10:18:46.6046636Z ld.shared.b32 %r4409, [%r47]; 2026-02-21T10:18:46.6046921Z prmt.b32 %r4410, %r4409, 0, 0x7770U; 2026-02-21T10:18:46.6047128Z cvt.u16.u32 %rs85, %r4410; 2026-02-21T10:18:46.6047313Z prmt.b32 %r4411, %r4409, 0, 0x7771U; 2026-02-21T10:18:46.6047511Z cvt.u16.u32 %rs86, %r4411; 2026-02-21T10:18:46.6047690Z prmt.b32 %r4412, %r4409, 0, 0x7772U; 2026-02-21T10:18:46.6047973Z cvt.u16.u32 %rs87, %r4412; 2026-02-21T10:18:46.6048159Z prmt.b32 %r4413, %r4409, 0, 0x7773U; 2026-02-21T10:18:46.6048365Z cvt.u16.u32 %rs88, %r4413; 2026-02-21T10:18:46.6048697Z .loc 1 57 28 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:57:28 2026-02-21T10:18:46.6049068Z shl.b16 
%rs89, %rs81, 4; 2026-02-21T10:18:46.6049259Z shl.b16 %rs90, %rs85, 4; 2026-02-21T10:18:46.6049439Z shl.b16 %rs91, %rs82, 4; 2026-02-21T10:18:46.6049613Z shl.b16 %rs92, %rs86, 4; 2026-02-21T10:18:46.6049784Z shl.b16 %rs93, %rs83, 4; 2026-02-21T10:18:46.6049959Z shl.b16 %rs94, %rs87, 4; 2026-02-21T10:18:46.6050131Z shl.b16 %rs95, %rs84, 4; 2026-02-21T10:18:46.6050308Z shl.b16 %rs96, %rs88, 4; 2026-02-21T10:18:46.6050624Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6050989Z cvt.s16.s8 %rs97, %rs89; 2026-02-21T10:18:46.6051171Z shr.s16 %rs98, %rs97, 4; 2026-02-21T10:18:46.6051354Z cvt.s16.s8 %rs99, %rs90; 2026-02-21T10:18:46.6051532Z shr.s16 %rs100, %rs99, 4; 2026-02-21T10:18:46.6051719Z prmt.b32 %r4414, %r4404, 0, 0x8880U; 2026-02-21T10:18:46.6051926Z cvt.u16.u32 %rs101, %r4414; 2026-02-21T10:18:46.6052110Z shr.s16 %rs102, %rs101, 4; 2026-02-21T10:18:46.6052296Z prmt.b32 %r4415, %r4409, 0, 0x8880U; 2026-02-21T10:18:46.6052492Z cvt.u16.u32 %rs103, %r4415; 2026-02-21T10:18:46.6052679Z shr.s16 %rs104, %rs103, 4; 2026-02-21T10:18:46.6053000Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6053369Z cvt.rn.f32.s16 %r4416, %rs104; 2026-02-21T10:18:46.6053566Z cvt.rn.f32.s16 %r4417, %rs102; 2026-02-21T10:18:46.6053749Z cvt.rn.f32.s16 %r4418, %rs100; 2026-02-21T10:18:46.6053938Z cvt.rn.f32.s16 %r4419, %rs98; 2026-02-21T10:18:46.6054261Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6054625Z cvt.s16.s8 %rs105, %rs91; 2026-02-21T10:18:46.6054805Z shr.s16 %rs106, %rs105, 4; 2026-02-21T10:18:46.6054989Z cvt.s16.s8 %rs107, %rs92; 2026-02-21T10:18:46.6055163Z shr.s16 %rs108, %rs107, 4; 2026-02-21T10:18:46.6055365Z prmt.b32 %r4420, %r4404, 0, 0x9991U; 2026-02-21T10:18:46.6055572Z cvt.u16.u32 %rs109, %r4420; 2026-02-21T10:18:46.6055754Z shr.s16 %rs110, %rs109, 4; 2026-02-21T10:18:46.6055936Z prmt.b32 %r4421, %r4409, 0, 0x9991U; 
2026-02-21T10:18:46.6056129Z cvt.u16.u32 %rs111, %r4421; 2026-02-21T10:18:46.6056310Z shr.s16 %rs112, %rs111, 4; 2026-02-21T10:18:46.6056850Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6057316Z cvt.rn.f32.s16 %r4422, %rs112; 2026-02-21T10:18:46.6057507Z cvt.rn.f32.s16 %r4423, %rs110; 2026-02-21T10:18:46.6057700Z cvt.rn.f32.s16 %r4424, %rs108; 2026-02-21T10:18:46.6057887Z cvt.rn.f32.s16 %r4425, %rs106; 2026-02-21T10:18:46.6058213Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6058582Z cvt.s16.s8 %rs113, %rs93; 2026-02-21T10:18:46.6058757Z shr.s16 %rs114, %rs113, 4; 2026-02-21T10:18:46.6058939Z cvt.s16.s8 %rs115, %rs94; 2026-02-21T10:18:46.6059111Z shr.s16 %rs116, %rs115, 4; 2026-02-21T10:18:46.6059311Z prmt.b32 %r4426, %r4404, 0, 0xaaa2U; 2026-02-21T10:18:46.6059514Z cvt.u16.u32 %rs117, %r4426; 2026-02-21T10:18:46.6059698Z shr.s16 %rs118, %rs117, 4; 2026-02-21T10:18:46.6059885Z prmt.b32 %r4427, %r4409, 0, 0xaaa2U; 2026-02-21T10:18:46.6060084Z cvt.u16.u32 %rs119, %r4427; 2026-02-21T10:18:46.6060273Z shr.s16 %rs120, %rs119, 4; 2026-02-21T10:18:46.6060665Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6061034Z cvt.rn.f32.s16 %r4428, %rs120; 2026-02-21T10:18:46.6061220Z cvt.rn.f32.s16 %r4429, %rs118; 2026-02-21T10:18:46.6061411Z cvt.rn.f32.s16 %r4430, %rs116; 2026-02-21T10:18:46.6061594Z cvt.rn.f32.s16 %r4431, %rs114; 2026-02-21T10:18:46.6062000Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6062371Z cvt.s16.s8 %rs121, %rs95; 2026-02-21T10:18:46.6062546Z shr.s16 %rs122, %rs121, 4; 2026-02-21T10:18:46.6062729Z cvt.s16.s8 %rs123, %rs96; 2026-02-21T10:18:46.6062916Z shr.s16 %rs124, %rs123, 4; 2026-02-21T10:18:46.6063111Z prmt.b32 %r4432, %r4404, 0, 0xbbb3U; 2026-02-21T10:18:46.6063312Z cvt.u16.u32 %rs125, %r4432; 2026-02-21T10:18:46.6063500Z shr.s16 %rs126, %rs125, 
4; 2026-02-21T10:18:46.6063685Z prmt.b32 %r4433, %r4409, 0, 0xbbb3U; 2026-02-21T10:18:46.6063885Z cvt.u16.u32 %rs127, %r4433; 2026-02-21T10:18:46.6064070Z shr.s16 %rs128, %rs127, 4; 2026-02-21T10:18:46.6064392Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6064753Z cvt.rn.f32.s16 %r4434, %rs128; 2026-02-21T10:18:46.6064938Z cvt.rn.f32.s16 %r4435, %rs126; 2026-02-21T10:18:46.6065128Z cvt.rn.f32.s16 %r4436, %rs124; 2026-02-21T10:18:46.6065311Z cvt.rn.f32.s16 %r4437, %rs122; 2026-02-21T10:18:46.6065504Z bar.sync 0; 2026-02-21T10:18:46.6065707Z st.shared.v4.b32 [%r48], {%r4419, %r4417, %r4418, %r4416}; 2026-02-21T10:18:46.6066011Z st.shared.v4.b32 [%r49], {%r4425, %r4423, %r4424, %r4422}; 2026-02-21T10:18:46.6066306Z st.shared.v4.b32 [%r50], {%r4431, %r4429, %r4430, %r4428}; 2026-02-21T10:18:46.6066715Z st.shared.v4.b32 [%r51], {%r4437, %r4435, %r4436, %r4434}; 2026-02-21T10:18:46.6066964Z $L__tmp3: 2026-02-21T10:18:46.6067328Z .loc 2 291 36 // standard.py:291:36 @[ co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:84:40 ] 2026-02-21T10:18:46.6067776Z // begin inline asm 2026-02-21T10:18:46.6067967Z fence.proxy.async.shared::cta; 2026-02-21T10:18:46.6068164Z // end inline asm 2026-02-21T10:18:46.6068316Z bar.sync 0; 2026-02-21T10:18:46.6068569Z wgmma.fence.sync.aligned; 2026-02-21T10:18:46.6068761Z // begin inline asm 2026-02-21T10:18:46.6070354Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351,%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393,%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401}, 
{%r2095,%r2096,%r2097,%r2098}, %rd101, %p2, 1, 1; 2026-02-21T10:18:46.6072099Z // end inline asm 2026-02-21T10:18:46.6072327Z // begin inline asm 2026-02-21T10:18:46.6073919Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351,%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393,%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401}, {%r2227,%r2228,%r2229,%r2230}, %rd102, %p2, 1, 1; 2026-02-21T10:18:46.6075552Z // end inline asm 2026-02-21T10:18:46.6075720Z // begin inline asm 2026-02-21T10:18:46.6077568Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415,%r12416,%r12417,%r12418,%r12419,%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457,%r12458,%r12459,%r12460,%r12461,%r12462,%r12463,%r12464,%r12465}, {%r2359,%r2360,%r2361,%r2362}, %rd101, %p2, 1, 1; 2026-02-21T10:18:46.6079227Z // end inline asm 2026-02-21T10:18:46.6079386Z // begin inline asm 2026-02-21T10:18:46.6080956Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415,%r12416,%r12417,%r12418,%r12419,%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457,%r12458,%r12459,%r12460,%r12461,%r12462,%r12463,%r12464,%r12465}, {%r2491,%r2492,%r2493,%r2494}, %rd102, %p2, 1, 1; 2026-02-21T10:18:46.6082580Z // end inline asm 2026-02-21T10:18:46.6082756Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:46.6082967Z mov.b32 %r2623, %r5575; 2026-02-21T10:18:46.6083149Z mov.b32 %r2624, %r4224; 2026-02-21T10:18:46.6083318Z mov.b32 %r2625, %r4224; 2026-02-21T10:18:46.6083489Z // begin inline asm 2026-02-21T10:18:46.6086104Z // wait for regs: %r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351,%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393,%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401,%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415,%r12416,%r12417,%r12418,%r12419,%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457,%r12458,%r12459,%r12460,%r12461,%r12462,%r12463,%r12464,%r12465,%r2623,%r2624,%r2625 
2026-02-21T10:18:46.6089057Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:46.6089267Z // end inline asm 2026-02-21T10:18:46.6089421Z $L__tmp4: 2026-02-21T10:18:46.6089731Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6090110Z add.s64 %rd55, %rd33, 64; 2026-02-21T10:18:46.6090451Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6090994Z add.s64 %rd58, %rd36, 64; 2026-02-21T10:18:46.6091177Z // begin inline asm 2026-02-21T10:18:46.6091351Z mov.u64 %rd54, 0x0; 2026-02-21T10:18:46.6091594Z createpolicy.fractional.L2::evict_last.b64 %rd54, 1.0; 2026-02-21T10:18:46.6091862Z // end inline asm 2026-02-21T10:18:46.6092021Z // begin inline asm 2026-02-21T10:18:46.6092176Z mov.u32 %r2757, 0x0; 2026-02-21T10:18:46.6092349Z mov.u32 %r2758, 0x0; 2026-02-21T10:18:46.6092502Z mov.u32 %r2759, 0x0; 2026-02-21T10:18:46.6092667Z mov.u32 %r2760, 0x0; 2026-02-21T10:18:46.6092988Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r2757, %r2758, %r2759, %r2760 }, [ %rd55 + 0 ], %rd54; 2026-02-21T10:18:46.6093358Z // end inline asm 2026-02-21T10:18:46.6093517Z // begin inline asm 2026-02-21T10:18:46.6093672Z mov.u64 %rd57, 0x0; 2026-02-21T10:18:46.6093893Z createpolicy.fractional.L2::evict_last.b64 %rd57, 1.0; 2026-02-21T10:18:46.6094143Z // end inline asm 2026-02-21T10:18:46.6094301Z // begin inline asm 2026-02-21T10:18:46.6094460Z mov.u32 %r2761, 0x0; 2026-02-21T10:18:46.6094623Z mov.u32 %r2762, 0x0; 2026-02-21T10:18:46.6094849Z mov.u32 %r2763, 0x0; 2026-02-21T10:18:46.6095010Z mov.u32 %r2764, 0x0; 2026-02-21T10:18:46.6095327Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r2761, %r2762, %r2763, %r2764 }, [ %rd58 + 0 ], %rd57; 2026-02-21T10:18:46.6095685Z // end inline asm 2026-02-21T10:18:46.6096085Z .loc 1 52 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:52:32 2026-02-21T10:18:46.6096573Z bar.sync 0; 2026-02-21T10:18:46.6096772Z st.shared.v2.b32 [%r33], 
{%r2757, %r2758}; 2026-02-21T10:18:46.6097022Z st.shared.v2.b32 [%r33+2048], {%r2761, %r2762}; 2026-02-21T10:18:46.6097284Z st.shared.v2.b32 [%r34], {%r2759, %r2760}; 2026-02-21T10:18:46.6097519Z st.shared.v2.b32 [%r34+2048], {%r2763, %r2764}; 2026-02-21T10:18:46.6097746Z bar.sync 0; 2026-02-21T10:18:46.6097915Z ld.shared.b16 %rs129, [%r36]; 2026-02-21T10:18:46.6098125Z ld.shared.b16 %rs130, [%r36+256]; 2026-02-21T10:18:46.6098341Z ld.shared.b16 %rs131, [%r36+16]; 2026-02-21T10:18:46.6098542Z ld.shared.b16 %rs132, [%r36+272]; 2026-02-21T10:18:46.6098751Z ld.shared.b16 %rs133, [%r36+2048]; 2026-02-21T10:18:46.6098957Z ld.shared.b16 %rs134, [%r36+2304]; 2026-02-21T10:18:46.6099161Z ld.shared.b16 %rs135, [%r36+2064]; 2026-02-21T10:18:46.6099356Z ld.shared.b16 %rs136, [%r36+2320]; 2026-02-21T10:18:46.6099563Z ld.shared.b16 %rs137, [%r37]; 2026-02-21T10:18:46.6099775Z ld.shared.b16 %rs138, [%r37+256]; 2026-02-21T10:18:46.6099975Z ld.shared.b16 %rs139, [%r37+16]; 2026-02-21T10:18:46.6100187Z ld.shared.b16 %rs140, [%r37+272]; 2026-02-21T10:18:46.6100391Z ld.shared.b16 %rs141, [%r37+2048]; 2026-02-21T10:18:46.6100594Z ld.shared.b16 %rs142, [%r37+2304]; 2026-02-21T10:18:46.6100796Z ld.shared.b16 %rs143, [%r37+2064]; 2026-02-21T10:18:46.6101001Z ld.shared.b16 %rs144, [%r37+2320]; 2026-02-21T10:18:46.6101209Z cvt.f32.bf16 %r2895, %rs129; 2026-02-21T10:18:46.6101426Z cvt.f32.bf16 %r2896, %rs130; 2026-02-21T10:18:46.6101615Z cvt.f32.bf16 %r2897, %rs137; 2026-02-21T10:18:46.6101801Z cvt.f32.bf16 %r2898, %rs138; 2026-02-21T10:18:46.6102004Z cvt.f32.bf16 %r3027, %rs131; 2026-02-21T10:18:46.6102191Z cvt.f32.bf16 %r3028, %rs132; 2026-02-21T10:18:46.6102379Z cvt.f32.bf16 %r3029, %rs139; 2026-02-21T10:18:46.6102556Z cvt.f32.bf16 %r3030, %rs140; 2026-02-21T10:18:46.6102741Z cvt.f32.bf16 %r3159, %rs133; 2026-02-21T10:18:46.6102924Z cvt.f32.bf16 %r3160, %rs134; 2026-02-21T10:18:46.6103108Z cvt.f32.bf16 %r3161, %rs141; 2026-02-21T10:18:46.6103283Z cvt.f32.bf16 %r3162, %rs142; 
2026-02-21T10:18:46.6103463Z cvt.f32.bf16 %r3291, %rs135; 2026-02-21T10:18:46.6103643Z cvt.f32.bf16 %r3292, %rs136; 2026-02-21T10:18:46.6103817Z cvt.f32.bf16 %r3293, %rs143; 2026-02-21T10:18:46.6104001Z cvt.f32.bf16 %r3294, %rs144; 2026-02-21T10:18:46.6104349Z .loc 1 54 87 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:54:87 2026-02-21T10:18:46.6104813Z add.s64 %rd60, %rd309, 20480; 2026-02-21T10:18:46.6105077Z // begin inline asm 2026-02-21T10:18:46.6105248Z mov.u32 %r2765, 0x0; 2026-02-21T10:18:46.6105408Z mov.u32 %r2766, 0x0; 2026-02-21T10:18:46.6105603Z ld.global.v2.b32 { %r2765, %r2766 }, [ %rd60 + 0 ]; 2026-02-21T10:18:46.6105841Z // end inline asm 2026-02-21T10:18:46.6106148Z .loc 1 62 28 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:62:28 2026-02-21T10:18:46.6106634Z bar.sync 0; 2026-02-21T10:18:46.6106792Z st.shared.b8 [%r38], %r2765; 2026-02-21T10:18:46.6106987Z prmt.b32 %r4438, %r2765, 0, 0x7771U; 2026-02-21T10:18:46.6107192Z st.shared.b8 [%r39], %r4438; 2026-02-21T10:18:46.6107382Z prmt.b32 %r4439, %r2765, 0, 0x7772U; 2026-02-21T10:18:46.6107584Z st.shared.b8 [%r40+256], %r4439; 2026-02-21T10:18:46.6107786Z prmt.b32 %r4440, %r2765, 0, 0x7773U; 2026-02-21T10:18:46.6108001Z st.shared.b8 [%r41+256], %r4440; 2026-02-21T10:18:46.6108196Z st.shared.b8 [%r42+512], %r2766; 2026-02-21T10:18:46.6108398Z prmt.b32 %r4441, %r2766, 0, 0x7771U; 2026-02-21T10:18:46.6108676Z st.shared.b8 [%r43+512], %r4441; 2026-02-21T10:18:46.6108957Z prmt.b32 %r4442, %r2766, 0, 0x7772U; 2026-02-21T10:18:46.6109161Z st.shared.b8 [%r44+768], %r4442; 2026-02-21T10:18:46.6109358Z prmt.b32 %r4443, %r2766, 0, 0x7773U; 2026-02-21T10:18:46.6109553Z st.shared.b8 [%r45+768], %r4443; 2026-02-21T10:18:46.6109736Z bar.sync 0; 2026-02-21T10:18:46.6109950Z ld.shared.b32 %r4444, [%r46]; 2026-02-21T10:18:46.6110163Z prmt.b32 %r4445, %r4444, 0, 0x7770U; 2026-02-21T10:18:46.6110368Z cvt.u16.u32 %rs145, %r4445; 2026-02-21T10:18:46.6110556Z prmt.b32 %r4446, %r4444, 0, 0x7771U; 
2026-02-21T10:18:46.6110759Z cvt.u16.u32 %rs146, %r4446; 2026-02-21T10:18:46.6110941Z prmt.b32 %r4447, %r4444, 0, 0x7772U; 2026-02-21T10:18:46.6111142Z cvt.u16.u32 %rs147, %r4447; 2026-02-21T10:18:46.6111325Z prmt.b32 %r4448, %r4444, 0, 0x7773U; 2026-02-21T10:18:46.6111528Z cvt.u16.u32 %rs148, %r4448; 2026-02-21T10:18:46.6111709Z ld.shared.b32 %r4449, [%r47]; 2026-02-21T10:18:46.6111901Z prmt.b32 %r4450, %r4449, 0, 0x7770U; 2026-02-21T10:18:46.6112103Z cvt.u16.u32 %rs149, %r4450; 2026-02-21T10:18:46.6112288Z prmt.b32 %r4451, %r4449, 0, 0x7771U; 2026-02-21T10:18:46.6112492Z cvt.u16.u32 %rs150, %r4451; 2026-02-21T10:18:46.6112696Z prmt.b32 %r4452, %r4449, 0, 0x7772U; 2026-02-21T10:18:46.6112915Z cvt.u16.u32 %rs151, %r4452; 2026-02-21T10:18:46.6113106Z prmt.b32 %r4453, %r4449, 0, 0x7773U; 2026-02-21T10:18:46.6113316Z cvt.u16.u32 %rs152, %r4453; 2026-02-21T10:18:46.6113650Z .loc 1 57 28 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:57:28 2026-02-21T10:18:46.6114025Z shl.b16 %rs153, %rs145, 4; 2026-02-21T10:18:46.6114222Z shl.b16 %rs154, %rs149, 4; 2026-02-21T10:18:46.6114400Z shl.b16 %rs155, %rs146, 4; 2026-02-21T10:18:46.6114579Z shl.b16 %rs156, %rs150, 4; 2026-02-21T10:18:46.6114751Z shl.b16 %rs157, %rs147, 4; 2026-02-21T10:18:46.6114933Z shl.b16 %rs158, %rs151, 4; 2026-02-21T10:18:46.6115118Z shl.b16 %rs159, %rs148, 4; 2026-02-21T10:18:46.6115303Z shl.b16 %rs160, %rs152, 4; 2026-02-21T10:18:46.6115629Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6115997Z cvt.s16.s8 %rs161, %rs153; 2026-02-21T10:18:46.6116179Z shr.s16 %rs162, %rs161, 4; 2026-02-21T10:18:46.6116353Z cvt.s16.s8 %rs163, %rs154; 2026-02-21T10:18:46.6116671Z shr.s16 %rs164, %rs163, 4; 2026-02-21T10:18:46.6116856Z prmt.b32 %r4454, %r4444, 0, 0x8880U; 2026-02-21T10:18:46.6117071Z cvt.u16.u32 %rs165, %r4454; 2026-02-21T10:18:46.6117254Z shr.s16 %rs166, %rs165, 4; 2026-02-21T10:18:46.6117437Z prmt.b32 %r4455, %r4449, 0, 0x8880U; 
2026-02-21T10:18:46.6117641Z cvt.u16.u32 %rs167, %r4455; 2026-02-21T10:18:46.6117826Z shr.s16 %rs168, %rs167, 4; 2026-02-21T10:18:46.6118161Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6118636Z cvt.rn.f32.s16 %r4456, %rs168; 2026-02-21T10:18:46.6118895Z cvt.rn.f32.s16 %r4457, %rs166; 2026-02-21T10:18:46.6119084Z cvt.rn.f32.s16 %r4458, %rs164; 2026-02-21T10:18:46.6119278Z cvt.rn.f32.s16 %r4459, %rs162; 2026-02-21T10:18:46.6119603Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6119967Z cvt.s16.s8 %rs169, %rs155; 2026-02-21T10:18:46.6120162Z shr.s16 %rs170, %rs169, 4; 2026-02-21T10:18:46.6120348Z cvt.s16.s8 %rs171, %rs156; 2026-02-21T10:18:46.6120528Z shr.s16 %rs172, %rs171, 4; 2026-02-21T10:18:46.6120712Z prmt.b32 %r4460, %r4444, 0, 0x9991U; 2026-02-21T10:18:46.6120918Z cvt.u16.u32 %rs173, %r4460; 2026-02-21T10:18:46.6121099Z shr.s16 %rs174, %rs173, 4; 2026-02-21T10:18:46.6121283Z prmt.b32 %r4461, %r4449, 0, 0x9991U; 2026-02-21T10:18:46.6121483Z cvt.u16.u32 %rs175, %r4461; 2026-02-21T10:18:46.6121667Z shr.s16 %rs176, %rs175, 4; 2026-02-21T10:18:46.6121987Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6122353Z cvt.rn.f32.s16 %r4462, %rs176; 2026-02-21T10:18:46.6122638Z cvt.rn.f32.s16 %r4463, %rs174; 2026-02-21T10:18:46.6122836Z cvt.rn.f32.s16 %r4464, %rs172; 2026-02-21T10:18:46.6123027Z cvt.rn.f32.s16 %r4465, %rs170; 2026-02-21T10:18:46.6123355Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6123789Z cvt.s16.s8 %rs177, %rs157; 2026-02-21T10:18:46.6123974Z shr.s16 %rs178, %rs177, 4; 2026-02-21T10:18:46.6124159Z cvt.s16.s8 %rs179, %rs158; 2026-02-21T10:18:46.6124347Z shr.s16 %rs180, %rs179, 4; 2026-02-21T10:18:46.6124538Z prmt.b32 %r4466, %r4444, 0, 0xaaa2U; 2026-02-21T10:18:46.6124744Z cvt.u16.u32 %rs181, %r4466; 2026-02-21T10:18:46.6124923Z shr.s16 %rs182, 
%rs181, 4; 2026-02-21T10:18:46.6125109Z prmt.b32 %r4467, %r4449, 0, 0xaaa2U; 2026-02-21T10:18:46.6125307Z cvt.u16.u32 %rs183, %r4467; 2026-02-21T10:18:46.6125509Z shr.s16 %rs184, %rs183, 4; 2026-02-21T10:18:46.6125830Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6126197Z cvt.rn.f32.s16 %r4468, %rs184; 2026-02-21T10:18:46.6126386Z cvt.rn.f32.s16 %r4469, %rs182; 2026-02-21T10:18:46.6126713Z cvt.rn.f32.s16 %r4470, %rs180; 2026-02-21T10:18:46.6126902Z cvt.rn.f32.s16 %r4471, %rs178; 2026-02-21T10:18:46.6127227Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6127589Z cvt.s16.s8 %rs185, %rs159; 2026-02-21T10:18:46.6127767Z shr.s16 %rs186, %rs185, 4; 2026-02-21T10:18:46.6127946Z cvt.s16.s8 %rs187, %rs160; 2026-02-21T10:18:46.6128124Z shr.s16 %rs188, %rs187, 4; 2026-02-21T10:18:46.6128319Z prmt.b32 %r4472, %r4444, 0, 0xbbb3U; 2026-02-21T10:18:46.6128527Z cvt.u16.u32 %rs189, %r4472; 2026-02-21T10:18:46.6128713Z shr.s16 %rs190, %rs189, 4; 2026-02-21T10:18:46.6128895Z prmt.b32 %r4473, %r4449, 0, 0xbbb3U; 2026-02-21T10:18:46.6129097Z cvt.u16.u32 %rs191, %r4473; 2026-02-21T10:18:46.6129282Z shr.s16 %rs192, %rs191, 4; 2026-02-21T10:18:46.6129602Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6129962Z cvt.rn.f32.s16 %r4474, %rs192; 2026-02-21T10:18:46.6130150Z cvt.rn.f32.s16 %r4475, %rs190; 2026-02-21T10:18:46.6130340Z cvt.rn.f32.s16 %r4476, %rs188; 2026-02-21T10:18:46.6130523Z cvt.rn.f32.s16 %r4477, %rs186; 2026-02-21T10:18:46.6130707Z bar.sync 0; 2026-02-21T10:18:46.6130909Z st.shared.v4.b32 [%r48], {%r4459, %r4457, %r4458, %r4456}; 2026-02-21T10:18:46.6131218Z st.shared.v4.b32 [%r49], {%r4465, %r4463, %r4464, %r4462}; 2026-02-21T10:18:46.6131511Z st.shared.v4.b32 [%r50], {%r4471, %r4469, %r4470, %r4468}; 2026-02-21T10:18:46.6131799Z st.shared.v4.b32 [%r51], {%r4477, %r4475, %r4476, %r4474}; 2026-02-21T10:18:46.6132048Z 
$L__tmp5: 2026-02-21T10:18:46.6132409Z .loc 2 291 36 // standard.py:291:36 @[ co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:84:40 ] 2026-02-21T10:18:46.6132992Z // begin inline asm 2026-02-21T10:18:46.6133188Z fence.proxy.async.shared::cta; 2026-02-21T10:18:46.6133387Z // end inline asm 2026-02-21T10:18:46.6133545Z bar.sync 0; 2026-02-21T10:18:46.6133704Z wgmma.fence.sync.aligned; 2026-02-21T10:18:46.6133892Z // begin inline asm 2026-02-21T10:18:46.6135501Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351,%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393,%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401}, {%r2895,%r2896,%r2897,%r2898}, %rd101, %p2, 1, 1; 2026-02-21T10:18:46.6137261Z // end inline asm 2026-02-21T10:18:46.6137436Z // begin inline asm 2026-02-21T10:18:46.6139145Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351,%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393,%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401}, {%r3027,%r3028,%r3029,%r3030}, %rd102, %p2, 1, 1; 2026-02-21T10:18:46.6140768Z // end inline asm 2026-02-21T10:18:46.6140927Z // begin inline asm 2026-02-21T10:18:46.6142523Z 
wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415,%r12416,%r12417,%r12418,%r12419,%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457,%r12458,%r12459,%r12460,%r12461,%r12462,%r12463,%r12464,%r12465}, {%r3159,%r3160,%r3161,%r3162}, %rd101, %p2, 1, 1; 2026-02-21T10:18:46.6144147Z // end inline asm 2026-02-21T10:18:46.6144306Z // begin inline asm 2026-02-21T10:18:46.6145894Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415,%r12416,%r12417,%r12418,%r12419,%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457,%r12458,%r12459,%r12460,%r12461,%r12462,%r12463,%r12464,%r12465}, {%r3291,%r3292,%r3293,%r3294}, %rd102, %p2, 1, 1; 2026-02-21T10:18:46.6147636Z // end inline asm 2026-02-21T10:18:46.6147813Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:46.6148022Z mov.b32 %r3423, %r5575; 2026-02-21T10:18:46.6148203Z mov.b32 %r3424, %r4224; 2026-02-21T10:18:46.6148381Z mov.b32 %r3425, %r4224; 2026-02-21T10:18:46.6148627Z // begin inline asm 2026-02-21T10:18:46.6151252Z // wait for regs: 
%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351,%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393,%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401,%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415,%r12416,%r12417,%r12418,%r12419,%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457,%r12458,%r12459,%r12460,%r12461,%r12462,%r12463,%r12464,%r12465,%r3423,%r3424,%r3425 2026-02-21T10:18:46.6154223Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:46.6154420Z // end inline asm 2026-02-21T10:18:46.6154584Z $L__tmp6: 2026-02-21T10:18:46.6154880Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6155257Z add.s64 %rd66, %rd33, 96; 2026-02-21T10:18:46.6155581Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6155945Z add.s64 %rd69, %rd36, 96; 2026-02-21T10:18:46.6156126Z // begin inline asm 2026-02-21T10:18:46.6156414Z mov.u64 %rd65, 0x0; 2026-02-21T10:18:46.6156770Z createpolicy.fractional.L2::evict_last.b64 %rd65, 1.0; 2026-02-21T10:18:46.6157028Z // end inline asm 2026-02-21T10:18:46.6157188Z // begin inline asm 2026-02-21T10:18:46.6157343Z mov.u32 %r3557, 0x0; 2026-02-21T10:18:46.6157602Z mov.u32 %r3558, 0x0; 2026-02-21T10:18:46.6157768Z mov.u32 %r3559, 0x0; 2026-02-21T10:18:46.6157930Z mov.u32 %r3560, 0x0; 
2026-02-21T10:18:46.6158268Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r3557, %r3558, %r3559, %r3560 }, [ %rd66 + 0 ], %rd65; 2026-02-21T10:18:46.6158645Z // end inline asm 2026-02-21T10:18:46.6158806Z // begin inline asm 2026-02-21T10:18:46.6158966Z mov.u64 %rd68, 0x0; 2026-02-21T10:18:46.6159190Z createpolicy.fractional.L2::evict_last.b64 %rd68, 1.0; 2026-02-21T10:18:46.6159441Z // end inline asm 2026-02-21T10:18:46.6159604Z // begin inline asm 2026-02-21T10:18:46.6159761Z mov.u32 %r3561, 0x0; 2026-02-21T10:18:46.6159923Z mov.u32 %r3562, 0x0; 2026-02-21T10:18:46.6160080Z mov.u32 %r3563, 0x0; 2026-02-21T10:18:46.6160242Z mov.u32 %r3564, 0x0; 2026-02-21T10:18:46.6160562Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r3561, %r3562, %r3563, %r3564 }, [ %rd69 + 0 ], %rd68; 2026-02-21T10:18:46.6160926Z // end inline asm 2026-02-21T10:18:46.6161255Z .loc 1 52 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:52:32 2026-02-21T10:18:46.6161619Z bar.sync 0; 2026-02-21T10:18:46.6161798Z st.shared.v2.b32 [%r33], {%r3557, %r3558}; 2026-02-21T10:18:46.6162048Z st.shared.v2.b32 [%r33+2048], {%r3561, %r3562}; 2026-02-21T10:18:46.6162303Z st.shared.v2.b32 [%r34], {%r3559, %r3560}; 2026-02-21T10:18:46.6162390Z st.shared.v2.b32 [%r34+2048], {%r3563, %r3564}; 2026-02-21T10:18:46.6162457Z bar.sync 0; 2026-02-21T10:18:46.6162532Z ld.shared.b16 %rs193, [%r36]; 2026-02-21T10:18:46.6162616Z ld.shared.b16 %rs194, [%r36+256]; 2026-02-21T10:18:46.6162693Z ld.shared.b16 %rs195, [%r36+16]; 2026-02-21T10:18:46.6162770Z ld.shared.b16 %rs196, [%r36+272]; 2026-02-21T10:18:46.6162845Z ld.shared.b16 %rs197, [%r36+2048]; 2026-02-21T10:18:46.6162917Z ld.shared.b16 %rs198, [%r36+2304]; 2026-02-21T10:18:46.6163002Z ld.shared.b16 %rs199, [%r36+2064]; 2026-02-21T10:18:46.6163072Z ld.shared.b16 %rs200, [%r36+2320]; 2026-02-21T10:18:46.6163143Z ld.shared.b16 %rs201, [%r37]; 2026-02-21T10:18:46.6163220Z ld.shared.b16 %rs202, [%r37+256]; 2026-02-21T10:18:46.6163290Z ld.shared.b16 
%rs203, [%r37+16]; 2026-02-21T10:18:46.6163359Z ld.shared.b16 %rs204, [%r37+272]; 2026-02-21T10:18:46.6163428Z ld.shared.b16 %rs205, [%r37+2048]; 2026-02-21T10:18:46.6163511Z ld.shared.b16 %rs206, [%r37+2304]; 2026-02-21T10:18:46.6163581Z ld.shared.b16 %rs207, [%r37+2064]; 2026-02-21T10:18:46.6163653Z ld.shared.b16 %rs208, [%r37+2320]; 2026-02-21T10:18:46.6163729Z cvt.f32.bf16 %r3695, %rs193; 2026-02-21T10:18:46.6163878Z cvt.f32.bf16 %r3696, %rs194; 2026-02-21T10:18:46.6164004Z cvt.f32.bf16 %r3697, %rs201; 2026-02-21T10:18:46.6164068Z cvt.f32.bf16 %r3698, %rs202; 2026-02-21T10:18:46.6164142Z cvt.f32.bf16 %r3827, %rs195; 2026-02-21T10:18:46.6164207Z cvt.f32.bf16 %r3828, %rs196; 2026-02-21T10:18:46.6164270Z cvt.f32.bf16 %r3829, %rs203; 2026-02-21T10:18:46.6164336Z cvt.f32.bf16 %r3830, %rs204; 2026-02-21T10:18:46.6164400Z cvt.f32.bf16 %r3959, %rs197; 2026-02-21T10:18:46.6164464Z cvt.f32.bf16 %r3960, %rs198; 2026-02-21T10:18:46.6164527Z cvt.f32.bf16 %r3961, %rs205; 2026-02-21T10:18:46.6164596Z cvt.f32.bf16 %r3962, %rs206; 2026-02-21T10:18:46.6164659Z cvt.f32.bf16 %r4091, %rs199; 2026-02-21T10:18:46.6164724Z cvt.f32.bf16 %r4092, %rs200; 2026-02-21T10:18:46.6164792Z cvt.f32.bf16 %r4093, %rs207; 2026-02-21T10:18:46.6164854Z cvt.f32.bf16 %r4094, %rs208; 2026-02-21T10:18:46.6165076Z .loc 1 54 87 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:54:87 2026-02-21T10:18:46.6165163Z add.s64 %rd71, %rd309, 30720; 2026-02-21T10:18:46.6165232Z // begin inline asm 2026-02-21T10:18:46.6165295Z mov.u32 %r3565, 0x0; 2026-02-21T10:18:46.6165421Z mov.u32 %r3566, 0x0; 2026-02-21T10:18:46.6165533Z ld.global.v2.b32 { %r3565, %r3566 }, [ %rd71 + 0 ]; 2026-02-21T10:18:46.6165594Z // end inline asm 2026-02-21T10:18:46.6165854Z .loc 1 62 28 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:62:28 2026-02-21T10:18:46.6165920Z bar.sync 0; 2026-02-21T10:18:46.6165986Z st.shared.b8 [%r38], %r3565; 2026-02-21T10:18:46.6166057Z prmt.b32 %r4478, %r3565, 0, 0x7771U; 
2026-02-21T10:18:46.6166124Z st.shared.b8 [%r39], %r4478; 2026-02-21T10:18:46.6166197Z prmt.b32 %r4479, %r3565, 0, 0x7772U; 2026-02-21T10:18:46.6166264Z st.shared.b8 [%r40+256], %r4479; 2026-02-21T10:18:46.6166332Z prmt.b32 %r4480, %r3565, 0, 0x7773U; 2026-02-21T10:18:46.6166404Z st.shared.b8 [%r41+256], %r4480; 2026-02-21T10:18:46.6166590Z st.shared.b8 [%r42+512], %r3566; 2026-02-21T10:18:46.6166665Z prmt.b32 %r4481, %r3566, 0, 0x7771U; 2026-02-21T10:18:46.6166739Z st.shared.b8 [%r43+512], %r4481; 2026-02-21T10:18:46.6166807Z prmt.b32 %r4482, %r3566, 0, 0x7772U; 2026-02-21T10:18:46.6166872Z st.shared.b8 [%r44+768], %r4482; 2026-02-21T10:18:46.6166938Z prmt.b32 %r4483, %r3566, 0, 0x7773U; 2026-02-21T10:18:46.6167008Z st.shared.b8 [%r45+768], %r4483; 2026-02-21T10:18:46.6167065Z bar.sync 0; 2026-02-21T10:18:46.6167146Z ld.shared.b32 %r4484, [%r46]; 2026-02-21T10:18:46.6167220Z prmt.b32 %r4485, %r4484, 0, 0x7770U; 2026-02-21T10:18:46.6167288Z cvt.u16.u32 %rs209, %r4485; 2026-02-21T10:18:46.6167354Z prmt.b32 %r4486, %r4484, 0, 0x7771U; 2026-02-21T10:18:46.6167421Z cvt.u16.u32 %rs210, %r4486; 2026-02-21T10:18:46.6167495Z prmt.b32 %r4487, %r4484, 0, 0x7772U; 2026-02-21T10:18:46.6167561Z cvt.u16.u32 %rs211, %r4487; 2026-02-21T10:18:46.6167628Z prmt.b32 %r4488, %r4484, 0, 0x7773U; 2026-02-21T10:18:46.6167697Z cvt.u16.u32 %rs212, %r4488; 2026-02-21T10:18:46.6167766Z ld.shared.b32 %r4489, [%r47]; 2026-02-21T10:18:46.6167834Z prmt.b32 %r4490, %r4489, 0, 0x7770U; 2026-02-21T10:18:46.6167900Z cvt.u16.u32 %rs213, %r4490; 2026-02-21T10:18:46.6167971Z prmt.b32 %r4491, %r4489, 0, 0x7771U; 2026-02-21T10:18:46.6168036Z cvt.u16.u32 %rs214, %r4491; 2026-02-21T10:18:46.6168102Z prmt.b32 %r4492, %r4489, 0, 0x7772U; 2026-02-21T10:18:46.6168175Z cvt.u16.u32 %rs215, %r4492; 2026-02-21T10:18:46.6168242Z prmt.b32 %r4493, %r4489, 0, 0x7773U; 2026-02-21T10:18:46.6168304Z cvt.u16.u32 %rs216, %r4493; 2026-02-21T10:18:46.6168530Z .loc 1 57 28 // 
co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:57:28 2026-02-21T10:18:46.6168607Z shl.b16 %rs217, %rs209, 4; 2026-02-21T10:18:46.6168671Z shl.b16 %rs218, %rs213, 4; 2026-02-21T10:18:46.6168735Z shl.b16 %rs219, %rs210, 4; 2026-02-21T10:18:46.6168805Z shl.b16 %rs220, %rs214, 4; 2026-02-21T10:18:46.6168868Z shl.b16 %rs221, %rs211, 4; 2026-02-21T10:18:46.6169012Z shl.b16 %rs222, %rs215, 4; 2026-02-21T10:18:46.6169081Z shl.b16 %rs223, %rs212, 4; 2026-02-21T10:18:46.6169204Z shl.b16 %rs224, %rs216, 4; 2026-02-21T10:18:46.6169412Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6169477Z cvt.s16.s8 %rs225, %rs217; 2026-02-21T10:18:46.6169547Z shr.s16 %rs226, %rs225, 4; 2026-02-21T10:18:46.6169610Z cvt.s16.s8 %rs227, %rs218; 2026-02-21T10:18:46.6169676Z shr.s16 %rs228, %rs227, 4; 2026-02-21T10:18:46.6169749Z prmt.b32 %r4494, %r4484, 0, 0x8880U; 2026-02-21T10:18:46.6169813Z cvt.u16.u32 %rs229, %r4494; 2026-02-21T10:18:46.6169876Z shr.s16 %rs230, %rs229, 4; 2026-02-21T10:18:46.6169947Z prmt.b32 %r4495, %r4489, 0, 0x8880U; 2026-02-21T10:18:46.6170013Z cvt.u16.u32 %rs231, %r4495; 2026-02-21T10:18:46.6170075Z shr.s16 %rs232, %rs231, 4; 2026-02-21T10:18:46.6170277Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6170354Z cvt.rn.f32.s16 %r4496, %rs232; 2026-02-21T10:18:46.6170421Z cvt.rn.f32.s16 %r4497, %rs230; 2026-02-21T10:18:46.6170487Z cvt.rn.f32.s16 %r4498, %rs228; 2026-02-21T10:18:46.6170634Z cvt.rn.f32.s16 %r4499, %rs226; 2026-02-21T10:18:46.6170854Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6170921Z cvt.s16.s8 %rs233, %rs219; 2026-02-21T10:18:46.6171056Z shr.s16 %rs234, %rs233, 4; 2026-02-21T10:18:46.6171123Z cvt.s16.s8 %rs235, %rs220; 2026-02-21T10:18:46.6171193Z shr.s16 %rs236, %rs235, 4; 2026-02-21T10:18:46.6171263Z prmt.b32 %r4500, %r4484, 0, 0x9991U; 2026-02-21T10:18:46.6171335Z cvt.u16.u32 %rs237, %r4500; 
2026-02-21T10:18:46.6171398Z shr.s16 %rs238, %rs237, 4; 2026-02-21T10:18:46.6171470Z prmt.b32 %r4501, %r4489, 0, 0x9991U; 2026-02-21T10:18:46.6171542Z cvt.u16.u32 %rs239, %r4501; 2026-02-21T10:18:46.6171608Z shr.s16 %rs240, %rs239, 4; 2026-02-21T10:18:46.6171817Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6171889Z cvt.rn.f32.s16 %r4502, %rs240; 2026-02-21T10:18:46.6171963Z cvt.rn.f32.s16 %r4503, %rs238; 2026-02-21T10:18:46.6172029Z cvt.rn.f32.s16 %r4504, %rs236; 2026-02-21T10:18:46.6172094Z cvt.rn.f32.s16 %r4505, %rs234; 2026-02-21T10:18:46.6172309Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6172377Z cvt.s16.s8 %rs241, %rs221; 2026-02-21T10:18:46.6172453Z shr.s16 %rs242, %rs241, 4; 2026-02-21T10:18:46.6172528Z cvt.s16.s8 %rs243, %rs222; 2026-02-21T10:18:46.6172593Z shr.s16 %rs244, %rs243, 4; 2026-02-21T10:18:46.6172662Z prmt.b32 %r4506, %r4484, 0, 0xaaa2U; 2026-02-21T10:18:46.6172724Z cvt.u16.u32 %rs245, %r4506; 2026-02-21T10:18:46.6172792Z shr.s16 %rs246, %rs245, 4; 2026-02-21T10:18:46.6172859Z prmt.b32 %r4507, %r4489, 0, 0xaaa2U; 2026-02-21T10:18:46.6172918Z cvt.u16.u32 %rs247, %r4507; 2026-02-21T10:18:46.6172986Z shr.s16 %rs248, %rs247, 4; 2026-02-21T10:18:46.6173189Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6173262Z cvt.rn.f32.s16 %r4508, %rs248; 2026-02-21T10:18:46.6173326Z cvt.rn.f32.s16 %r4509, %rs246; 2026-02-21T10:18:46.6173395Z cvt.rn.f32.s16 %r4510, %rs244; 2026-02-21T10:18:46.6173457Z cvt.rn.f32.s16 %r4511, %rs242; 2026-02-21T10:18:46.6173658Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6173733Z cvt.s16.s8 %rs249, %rs223; 2026-02-21T10:18:46.6173796Z shr.s16 %rs250, %rs249, 4; 2026-02-21T10:18:46.6173859Z cvt.s16.s8 %rs251, %rs224; 2026-02-21T10:18:46.6173934Z shr.s16 %rs252, %rs251, 4; 2026-02-21T10:18:46.6174003Z prmt.b32 %r4512, 
%r4484, 0, 0xbbb3U; 2026-02-21T10:18:46.6174070Z cvt.u16.u32 %rs253, %r4512; 2026-02-21T10:18:46.6174130Z shr.s16 %rs254, %rs253, 4; 2026-02-21T10:18:46.6174212Z prmt.b32 %r4513, %r4489, 0, 0xbbb3U; 2026-02-21T10:18:46.6174336Z cvt.u16.u32 %rs255, %r4513; 2026-02-21T10:18:46.6174445Z shr.s16 %rs256, %rs255, 4; 2026-02-21T10:18:46.6174654Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6174721Z cvt.rn.f32.s16 %r4514, %rs256; 2026-02-21T10:18:46.6174787Z cvt.rn.f32.s16 %r4515, %rs254; 2026-02-21T10:18:46.6174852Z cvt.rn.f32.s16 %r4516, %rs252; 2026-02-21T10:18:46.6174923Z cvt.rn.f32.s16 %r4517, %rs250; 2026-02-21T10:18:46.6174982Z bar.sync 0; 2026-02-21T10:18:46.6175107Z st.shared.v4.b32 [%r48], {%r4499, %r4497, %r4498, %r4496}; 2026-02-21T10:18:46.6175225Z st.shared.v4.b32 [%r49], {%r4505, %r4503, %r4504, %r4502}; 2026-02-21T10:18:46.6175329Z st.shared.v4.b32 [%r50], {%r4511, %r4509, %r4510, %r4508}; 2026-02-21T10:18:46.6175437Z st.shared.v4.b32 [%r51], {%r4517, %r4515, %r4516, %r4514}; 2026-02-21T10:18:46.6175499Z $L__tmp7: 2026-02-21T10:18:46.6175782Z .loc 2 291 36 // standard.py:291:36 @[ co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:84:40 ] 2026-02-21T10:18:46.6175849Z // begin inline asm 2026-02-21T10:18:46.6175981Z fence.proxy.async.shared::cta; 2026-02-21T10:18:46.6176053Z // end inline asm 2026-02-21T10:18:46.6176111Z bar.sync 0; 2026-02-21T10:18:46.6176199Z wgmma.fence.sync.aligned; 2026-02-21T10:18:46.6176268Z // begin inline asm 2026-02-21T10:18:46.6177958Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351,%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393,%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401}, {%r3695,%r3696,%r3697,%r3698}, %rd101, %p2, 1, 1; 2026-02-21T10:18:46.6178030Z // end inline asm 2026-02-21T10:18:46.6178101Z // begin inline asm 2026-02-21T10:18:46.6179596Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351,%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393,%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401}, {%r3827,%r3828,%r3829,%r3830}, %rd102, %p2, 1, 1; 2026-02-21T10:18:46.6179662Z // end inline asm 2026-02-21T10:18:46.6179729Z // begin inline asm 2026-02-21T10:18:46.6181208Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415,%r12416,%r12417,%r12418,%r12419,%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457,%r12458,%r12459,%r12460,%r12461,%r12462,%r12463,%r12464,%r12465}, {%r3959,%r3960,%r3961,%r3962}, %rd101, %p2, 1, 1; 2026-02-21T10:18:46.6181291Z // end inline asm 2026-02-21T10:18:46.6181353Z // begin inline asm 2026-02-21T10:18:46.6182832Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415,%r12416,%r12417,%r12418,%r12419,%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457,%r12458,%r12459,%r12460,%r12461,%r12462,%r12463,%r12464,%r12465}, {%r4091,%r4092,%r4093,%r4094}, %rd102, %p2, 1, 1; 2026-02-21T10:18:46.6183053Z // end inline asm 2026-02-21T10:18:46.6183133Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:46.6183198Z mov.b32 %r4223, %r5575; 2026-02-21T10:18:46.6183272Z mov.b32 %r4225, %r4224; 2026-02-21T10:18:46.6183337Z // begin inline asm 2026-02-21T10:18:46.6185965Z // wait for regs: 
%r12338,%r12339,%r12340,%r12341,%r12342,%r12343,%r12344,%r12345,%r12346,%r12347,%r12348,%r12349,%r12350,%r12351,%r12352,%r12353,%r12354,%r12355,%r12356,%r12357,%r12358,%r12359,%r12360,%r12361,%r12362,%r12363,%r12364,%r12365,%r12366,%r12367,%r12368,%r12369,%r12370,%r12371,%r12372,%r12373,%r12374,%r12375,%r12376,%r12377,%r12378,%r12379,%r12380,%r12381,%r12382,%r12383,%r12384,%r12385,%r12386,%r12387,%r12388,%r12389,%r12390,%r12391,%r12392,%r12393,%r12394,%r12395,%r12396,%r12397,%r12398,%r12399,%r12400,%r12401,%r12402,%r12403,%r12404,%r12405,%r12406,%r12407,%r12408,%r12409,%r12410,%r12411,%r12412,%r12413,%r12414,%r12415,%r12416,%r12417,%r12418,%r12419,%r12420,%r12421,%r12422,%r12423,%r12424,%r12425,%r12426,%r12427,%r12428,%r12429,%r12430,%r12431,%r12432,%r12433,%r12434,%r12435,%r12436,%r12437,%r12438,%r12439,%r12440,%r12441,%r12442,%r12443,%r12444,%r12445,%r12446,%r12447,%r12448,%r12449,%r12450,%r12451,%r12452,%r12453,%r12454,%r12455,%r12456,%r12457,%r12458,%r12459,%r12460,%r12461,%r12462,%r12463,%r12464,%r12465,%r4223,%r4224,%r4225 2026-02-21T10:18:46.6186057Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:46.6186118Z // end inline asm 2026-02-21T10:18:46.6186182Z $L__tmp8: 2026-02-21T10:18:46.6186398Z .loc 1 40 93 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:40:93 2026-02-21T10:18:46.6186581Z add.s64 %rd311, %rd311, 32; 2026-02-21T10:18:46.6186659Z add.s64 %rd310, %rd310, 128; 2026-02-21T10:18:46.6186727Z add.s64 %rd309, %rd309, 40960; 2026-02-21T10:18:46.6186801Z setp.lt.u64 %p18, %rd311, 4064; 2026-02-21T10:18:46.6186866Z @%p18 bra $L__BB0_4; 2026-02-21T10:18:46.6186994Z // %bb.5: // in Loop: Header=BB0_3 Depth=1 2026-02-21T10:18:46.6187203Z .loc 1 33 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:33:32 2026-02-21T10:18:46.6187269Z or.b32 %r4663, %r64, %r11; 2026-02-21T10:18:46.6187339Z or.b32 %r4664, %r64, %r12; 2026-02-21T10:18:46.6187404Z or.b32 %r4665, %r64, %r13; 2026-02-21T10:18:46.6187468Z or.b32 %r4666, %r64, %r14; 
2026-02-21T10:18:46.6187534Z or.b32 %r4667, %r64, %r15; 2026-02-21T10:18:46.6187602Z or.b32 %r4668, %r64, %r16; 2026-02-21T10:18:46.6187667Z or.b32 %r4669, %r64, %r17; 2026-02-21T10:18:46.6187729Z or.b32 %r4670, %r64, %r18; 2026-02-21T10:18:46.6187801Z or.b32 %r4671, %r64, %r19; 2026-02-21T10:18:46.6187863Z or.b32 %r4672, %r64, %r20; 2026-02-21T10:18:46.6187925Z or.b32 %r4673, %r64, %r21; 2026-02-21T10:18:46.6187998Z or.b32 %r4674, %r64, %r22; 2026-02-21T10:18:46.6188072Z or.b32 %r4675, %r64, %r23; 2026-02-21T10:18:46.6188137Z or.b32 %r4676, %r64, %r24; 2026-02-21T10:18:46.6188199Z or.b32 %r4677, %r64, %r25; 2026-02-21T10:18:46.6188272Z or.b32 %r4678, %r64, %r26; 2026-02-21T10:18:46.6188556Z .loc 1 87 28 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:87:28 2026-02-21T10:18:46.6188651Z cvt.rn.bf16x2.f32 %r4679, %r12339, %r12338; 2026-02-21T10:18:46.6188743Z cvt.rn.bf16x2.f32 %r4680, %r12341, %r12340; 2026-02-21T10:18:46.6188824Z cvt.rn.bf16x2.f32 %r4681, %r12343, %r12342; 2026-02-21T10:18:46.6188903Z cvt.rn.bf16x2.f32 %r4682, %r12345, %r12344; 2026-02-21T10:18:46.6188985Z cvt.rn.bf16x2.f32 %r4683, %r12347, %r12346; 2026-02-21T10:18:46.6189065Z cvt.rn.bf16x2.f32 %r4684, %r12349, %r12348; 2026-02-21T10:18:46.6189142Z cvt.rn.bf16x2.f32 %r4685, %r12351, %r12350; 2026-02-21T10:18:46.6189222Z cvt.rn.bf16x2.f32 %r4686, %r12353, %r12352; 2026-02-21T10:18:46.6189307Z cvt.rn.bf16x2.f32 %r4687, %r12355, %r12354; 2026-02-21T10:18:46.6189484Z cvt.rn.bf16x2.f32 %r4688, %r12357, %r12356; 2026-02-21T10:18:46.6189623Z cvt.rn.bf16x2.f32 %r4689, %r12359, %r12358; 2026-02-21T10:18:46.6189707Z cvt.rn.bf16x2.f32 %r4690, %r12361, %r12360; 2026-02-21T10:18:46.6189787Z cvt.rn.bf16x2.f32 %r4691, %r12363, %r12362; 2026-02-21T10:18:46.6189868Z cvt.rn.bf16x2.f32 %r4692, %r12365, %r12364; 2026-02-21T10:18:46.6189951Z cvt.rn.bf16x2.f32 %r4693, %r12367, %r12366; 2026-02-21T10:18:46.6190029Z cvt.rn.bf16x2.f32 %r4694, %r12369, %r12368; 2026-02-21T10:18:46.6190105Z cvt.rn.bf16x2.f32 
%r4695, %r12371, %r12370; 2026-02-21T10:18:46.6190182Z cvt.rn.bf16x2.f32 %r4696, %r12373, %r12372; 2026-02-21T10:18:46.6190272Z cvt.rn.bf16x2.f32 %r4697, %r12375, %r12374; 2026-02-21T10:18:46.6190354Z cvt.rn.bf16x2.f32 %r4698, %r12377, %r12376; 2026-02-21T10:18:46.6190429Z cvt.rn.bf16x2.f32 %r4699, %r12379, %r12378; 2026-02-21T10:18:46.6190510Z cvt.rn.bf16x2.f32 %r4700, %r12381, %r12380; 2026-02-21T10:18:46.6190586Z cvt.rn.bf16x2.f32 %r4701, %r12383, %r12382; 2026-02-21T10:18:46.6190663Z cvt.rn.bf16x2.f32 %r4702, %r12385, %r12384; 2026-02-21T10:18:46.6190808Z cvt.rn.bf16x2.f32 %r4703, %r12387, %r12386; 2026-02-21T10:18:46.6190893Z cvt.rn.bf16x2.f32 %r4704, %r12389, %r12388; 2026-02-21T10:18:46.6190970Z cvt.rn.bf16x2.f32 %r4705, %r12391, %r12390; 2026-02-21T10:18:46.6191048Z cvt.rn.bf16x2.f32 %r4706, %r12393, %r12392; 2026-02-21T10:18:46.6191198Z cvt.rn.bf16x2.f32 %r4707, %r12395, %r12394; 2026-02-21T10:18:46.6191284Z cvt.rn.bf16x2.f32 %r4708, %r12397, %r12396; 2026-02-21T10:18:46.6191363Z cvt.rn.bf16x2.f32 %r4709, %r12399, %r12398; 2026-02-21T10:18:46.6191446Z cvt.rn.bf16x2.f32 %r4710, %r12401, %r12400; 2026-02-21T10:18:46.6191522Z cvt.rn.bf16x2.f32 %r4711, %r12403, %r12402; 2026-02-21T10:18:46.6191598Z cvt.rn.bf16x2.f32 %r4712, %r12405, %r12404; 2026-02-21T10:18:46.6191685Z cvt.rn.bf16x2.f32 %r4713, %r12407, %r12406; 2026-02-21T10:18:46.6191762Z cvt.rn.bf16x2.f32 %r4714, %r12409, %r12408; 2026-02-21T10:18:46.6191839Z cvt.rn.bf16x2.f32 %r4715, %r12411, %r12410; 2026-02-21T10:18:46.6191916Z cvt.rn.bf16x2.f32 %r4716, %r12413, %r12412; 2026-02-21T10:18:46.6192001Z cvt.rn.bf16x2.f32 %r4717, %r12415, %r12414; 2026-02-21T10:18:46.6192077Z cvt.rn.bf16x2.f32 %r4718, %r12417, %r12416; 2026-02-21T10:18:46.6192157Z cvt.rn.bf16x2.f32 %r4719, %r12419, %r12418; 2026-02-21T10:18:46.6192238Z cvt.rn.bf16x2.f32 %r4720, %r12421, %r12420; 2026-02-21T10:18:46.6192318Z cvt.rn.bf16x2.f32 %r4721, %r12423, %r12422; 2026-02-21T10:18:46.6192398Z cvt.rn.bf16x2.f32 %r4722, %r12425, %r12424; 
2026-02-21T10:18:46.6192479Z cvt.rn.bf16x2.f32 %r4723, %r12427, %r12426; 2026-02-21T10:18:46.6192554Z cvt.rn.bf16x2.f32 %r4724, %r12429, %r12428; 2026-02-21T10:18:46.6192627Z cvt.rn.bf16x2.f32 %r4725, %r12431, %r12430; 2026-02-21T10:18:46.6192705Z cvt.rn.bf16x2.f32 %r4726, %r12433, %r12432; 2026-02-21T10:18:46.6192789Z cvt.rn.bf16x2.f32 %r4727, %r12435, %r12434; 2026-02-21T10:18:46.6192866Z cvt.rn.bf16x2.f32 %r4728, %r12437, %r12436; 2026-02-21T10:18:46.6192947Z cvt.rn.bf16x2.f32 %r4729, %r12439, %r12438; 2026-02-21T10:18:46.6193032Z cvt.rn.bf16x2.f32 %r4730, %r12441, %r12440; 2026-02-21T10:18:46.6193112Z cvt.rn.bf16x2.f32 %r4731, %r12443, %r12442; 2026-02-21T10:18:46.6193189Z cvt.rn.bf16x2.f32 %r4732, %r12445, %r12444; 2026-02-21T10:18:46.6193276Z cvt.rn.bf16x2.f32 %r4733, %r12447, %r12446; 2026-02-21T10:18:46.6193370Z cvt.rn.bf16x2.f32 %r4734, %r12449, %r12448; 2026-02-21T10:18:46.6193452Z cvt.rn.bf16x2.f32 %r4735, %r12451, %r12450; 2026-02-21T10:18:46.6193528Z cvt.rn.bf16x2.f32 %r4736, %r12453, %r12452; 2026-02-21T10:18:46.6193617Z cvt.rn.bf16x2.f32 %r4737, %r12455, %r12454; 2026-02-21T10:18:46.6193693Z cvt.rn.bf16x2.f32 %r4738, %r12457, %r12456; 2026-02-21T10:18:46.6193771Z cvt.rn.bf16x2.f32 %r4739, %r12459, %r12458; 2026-02-21T10:18:46.6193852Z cvt.rn.bf16x2.f32 %r4740, %r12461, %r12460; 2026-02-21T10:18:46.6193928Z cvt.rn.bf16x2.f32 %r4741, %r12463, %r12462; 2026-02-21T10:18:46.6194067Z cvt.rn.bf16x2.f32 %r4742, %r12465, %r12464; 2026-02-21T10:18:46.6194339Z .loc 1 88 50 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:88:50 2026-02-21T10:18:46.6194418Z mad.lo.s32 %r4743, %r4663, 1280, %r63; 2026-02-21T10:18:46.6194495Z mad.lo.s32 %r4744, %r4664, 1280, %r63; 2026-02-21T10:18:46.6194566Z mad.lo.s32 %r4745, %r4665, 1280, %r63; 2026-02-21T10:18:46.6194638Z mad.lo.s32 %r4746, %r4666, 1280, %r63; 2026-02-21T10:18:46.6194707Z mad.lo.s32 %r4747, %r4667, 1280, %r63; 2026-02-21T10:18:46.6194775Z mad.lo.s32 %r4748, %r4668, 1280, %r63; 
2026-02-21T10:18:46.6194853Z mad.lo.s32 %r4749, %r4669, 1280, %r63; 2026-02-21T10:18:46.6194920Z mad.lo.s32 %r4750, %r4670, 1280, %r63; 2026-02-21T10:18:46.6194990Z mad.lo.s32 %r4751, %r4671, 1280, %r63; 2026-02-21T10:18:46.6195057Z mad.lo.s32 %r4752, %r4672, 1280, %r63; 2026-02-21T10:18:46.6195130Z mad.lo.s32 %r4753, %r4673, 1280, %r63; 2026-02-21T10:18:46.6195197Z mad.lo.s32 %r4754, %r4674, 1280, %r63; 2026-02-21T10:18:46.6195266Z mad.lo.s32 %r4755, %r4675, 1280, %r63; 2026-02-21T10:18:46.6195340Z mad.lo.s32 %r4756, %r4676, 1280, %r63; 2026-02-21T10:18:46.6195466Z mad.lo.s32 %r4757, %r4677, 1280, %r63; 2026-02-21T10:18:46.6195542Z mad.lo.s32 %r4758, %r4678, 1280, %r63; 2026-02-21T10:18:46.6195757Z .loc 1 88 22 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:88:22 2026-02-21T10:18:46.6195831Z mad.wide.s32 %rd76, %r4743, 2, %rd27; 2026-02-21T10:18:46.6195947Z mad.wide.s32 %rd77, %r4744, 2, %rd27; 2026-02-21T10:18:46.6196023Z mad.wide.s32 %rd78, %r4745, 2, %rd27; 2026-02-21T10:18:46.6196099Z mad.wide.s32 %rd79, %r4746, 2, %rd27; 2026-02-21T10:18:46.6196167Z mad.wide.s32 %rd80, %r4747, 2, %rd27; 2026-02-21T10:18:46.6201820Z mad.wide.s32 %rd81, %r4748, 2, %rd27; 2026-02-21T10:18:46.6201957Z mad.wide.s32 %rd82, %r4749, 2, %rd27; 2026-02-21T10:18:46.6202040Z mad.wide.s32 %rd83, %r4750, 2, %rd27; 2026-02-21T10:18:46.6202121Z mad.wide.s32 %rd84, %r4751, 2, %rd27; 2026-02-21T10:18:46.6202203Z mad.wide.s32 %rd85, %r4752, 2, %rd27; 2026-02-21T10:18:46.6202278Z mad.wide.s32 %rd86, %r4753, 2, %rd27; 2026-02-21T10:18:46.6202355Z mad.wide.s32 %rd87, %r4754, 2, %rd27; 2026-02-21T10:18:46.6202425Z mad.wide.s32 %rd88, %r4755, 2, %rd27; 2026-02-21T10:18:46.6202492Z mad.wide.s32 %rd89, %r4756, 2, %rd27; 2026-02-21T10:18:46.6202562Z mad.wide.s32 %rd90, %r4757, 2, %rd27; 2026-02-21T10:18:46.6202636Z mad.wide.s32 %rd91, %r4758, 2, %rd27; 2026-02-21T10:18:46.6202872Z .loc 1 88 81 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:88:81 2026-02-21T10:18:46.6202938Z 
bar.sync 0; 2026-02-21T10:18:46.6203069Z st.shared.v4.b32 [%r52], {%r4679, %r4681, %r4683, %r4685}; 2026-02-21T10:18:46.6203181Z st.shared.v4.b32 [%r53], {%r4687, %r4689, %r4691, %r4693}; 2026-02-21T10:18:46.6203290Z st.shared.v4.b32 [%r54], {%r4695, %r4697, %r4699, %r4701}; 2026-02-21T10:18:46.6203399Z st.shared.v4.b32 [%r55], {%r4703, %r4705, %r4707, %r4709}; 2026-02-21T10:18:46.6203460Z bar.sync 0; 2026-02-21T10:18:46.6203528Z // begin inline asm 2026-02-21T10:18:46.6203726Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4598, %r4599, %r4600, %r4601}, [%r4522]; 2026-02-21T10:18:46.6203794Z // end inline asm 2026-02-21T10:18:46.6203859Z // begin inline asm 2026-02-21T10:18:46.6204044Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4606, %r4607, %r4608, %r4609}, [%r4527]; 2026-02-21T10:18:46.6204124Z // end inline asm 2026-02-21T10:18:46.6204192Z // begin inline asm 2026-02-21T10:18:46.6204373Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4614, %r4615, %r4616, %r4617}, [%r4532]; 2026-02-21T10:18:46.6204431Z // end inline asm 2026-02-21T10:18:46.6204497Z // begin inline asm 2026-02-21T10:18:46.6204673Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4622, %r4623, %r4624, %r4625}, [%r4537]; 2026-02-21T10:18:46.6204732Z // end inline asm 2026-02-21T10:18:46.6204797Z bar.sync 0; 2026-02-21T10:18:46.6204907Z st.shared.v4.b32 [%r52], {%r4680, %r4682, %r4684, %r4686}; 2026-02-21T10:18:46.6205183Z st.shared.v4.b32 [%r53], {%r4688, %r4690, %r4692, %r4694}; 2026-02-21T10:18:46.6205362Z st.shared.v4.b32 [%r54], {%r4696, %r4698, %r4700, %r4702}; 2026-02-21T10:18:46.6205470Z st.shared.v4.b32 [%r55], {%r4704, %r4706, %r4708, %r4710}; 2026-02-21T10:18:46.6205530Z bar.sync 0; 2026-02-21T10:18:46.6205594Z // begin inline asm 2026-02-21T10:18:46.6205801Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4602, %r4603, %r4604, %r4605}, [%r4522]; 2026-02-21T10:18:46.6205863Z // end inline asm 2026-02-21T10:18:46.6205927Z // begin inline asm 2026-02-21T10:18:46.6206120Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4610, %r4611, %r4612, %r4613}, [%r4527]; 2026-02-21T10:18:46.6206182Z // end inline asm 2026-02-21T10:18:46.6206243Z // begin inline asm 2026-02-21T10:18:46.6206430Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4618, %r4619, %r4620, %r4621}, [%r4532]; 2026-02-21T10:18:46.6206665Z // end inline asm 2026-02-21T10:18:46.6206735Z // begin inline asm 2026-02-21T10:18:46.6206921Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4626, %r4627, %r4628, %r4629}, [%r4537]; 2026-02-21T10:18:46.6206988Z // end inline asm 2026-02-21T10:18:46.6207141Z bar.sync 0; 2026-02-21T10:18:46.6207271Z st.shared.v4.b32 [%r52], {%r4711, %r4713, %r4715, %r4717}; 2026-02-21T10:18:46.6207390Z st.shared.v4.b32 [%r53], {%r4719, %r4721, %r4723, %r4725}; 2026-02-21T10:18:46.6207498Z st.shared.v4.b32 [%r54], {%r4727, %r4729, %r4731, %r4733}; 2026-02-21T10:18:46.6207663Z st.shared.v4.b32 [%r55], {%r4735, %r4737, %r4739, %r4741}; 2026-02-21T10:18:46.6207729Z bar.sync 0; 2026-02-21T10:18:46.6207793Z // begin inline asm 2026-02-21T10:18:46.6207986Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4630, %r4631, %r4632, %r4633}, [%r4522]; 2026-02-21T10:18:46.6208048Z // end inline asm 2026-02-21T10:18:46.6208116Z // begin inline asm 2026-02-21T10:18:46.6208313Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4638, %r4639, %r4640, %r4641}, [%r4527]; 2026-02-21T10:18:46.6208379Z // end inline asm 2026-02-21T10:18:46.6208450Z // begin inline asm 2026-02-21T10:18:46.6208630Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4646, %r4647, %r4648, %r4649}, [%r4532]; 2026-02-21T10:18:46.6208692Z // end inline asm 2026-02-21T10:18:46.6208752Z // begin inline asm 2026-02-21T10:18:46.6208940Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4654, %r4655, %r4656, %r4657}, [%r4537]; 2026-02-21T10:18:46.6208998Z // end inline asm 2026-02-21T10:18:46.6209058Z bar.sync 0; 2026-02-21T10:18:46.6209172Z st.shared.v4.b32 [%r52], {%r4712, %r4714, %r4716, %r4718}; 2026-02-21T10:18:46.6209277Z 
st.shared.v4.b32 [%r53], {%r4720, %r4722, %r4724, %r4726}; 2026-02-21T10:18:46.6209379Z st.shared.v4.b32 [%r54], {%r4728, %r4730, %r4732, %r4734}; 2026-02-21T10:18:46.6209486Z st.shared.v4.b32 [%r55], {%r4736, %r4738, %r4740, %r4742}; 2026-02-21T10:18:46.6209544Z bar.sync 0; 2026-02-21T10:18:46.6209606Z // begin inline asm 2026-02-21T10:18:46.6209789Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4634, %r4635, %r4636, %r4637}, [%r4522]; 2026-02-21T10:18:46.6209856Z // end inline asm 2026-02-21T10:18:46.6209917Z // begin inline asm 2026-02-21T10:18:46.6210098Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4642, %r4643, %r4644, %r4645}, [%r4527]; 2026-02-21T10:18:46.6210166Z // end inline asm 2026-02-21T10:18:46.6210227Z // begin inline asm 2026-02-21T10:18:46.6210405Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4650, %r4651, %r4652, %r4653}, [%r4532]; 2026-02-21T10:18:46.6210464Z // end inline asm 2026-02-21T10:18:46.6210546Z // begin inline asm 2026-02-21T10:18:46.6210729Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4658, %r4659, %r4660, %r4661}, [%r4537]; 2026-02-21T10:18:46.6210788Z // end inline asm 2026-02-21T10:18:46.6210855Z // begin inline asm 2026-02-21T10:18:46.6210986Z st.global.v4.b32 [ %rd76 + 0 ], { %r4598, %r4599, %r4600, %r4601 }; 2026-02-21T10:18:46.6211047Z // end inline asm 2026-02-21T10:18:46.6211115Z // begin inline asm 2026-02-21T10:18:46.6211236Z st.global.v4.b32 [ %rd77 + 0 ], { %r4602, %r4603, %r4604, %r4605 }; 2026-02-21T10:18:46.6211397Z // end inline asm 2026-02-21T10:18:46.6211522Z // begin inline asm 2026-02-21T10:18:46.6211660Z st.global.v4.b32 [ %rd78 + 0 ], { %r4606, %r4607, %r4608, %r4609 }; 2026-02-21T10:18:46.6211721Z // end inline asm 2026-02-21T10:18:46.6211783Z // begin inline asm 2026-02-21T10:18:46.6211908Z st.global.v4.b32 [ %rd79 + 0 ], { %r4610, %r4611, %r4612, %r4613 }; 2026-02-21T10:18:46.6211966Z // end inline asm 2026-02-21T10:18:46.6212028Z // begin inline asm 2026-02-21T10:18:46.6212144Z st.global.v4.b32 [ %rd80 + 0 ], { 
%r4614, %r4615, %r4616, %r4617 }; 2026-02-21T10:18:46.6212213Z // end inline asm 2026-02-21T10:18:46.6212273Z // begin inline asm 2026-02-21T10:18:46.6212385Z st.global.v4.b32 [ %rd81 + 0 ], { %r4618, %r4619, %r4620, %r4621 }; 2026-02-21T10:18:46.6212451Z // end inline asm 2026-02-21T10:18:46.6212513Z // begin inline asm 2026-02-21T10:18:46.6212629Z st.global.v4.b32 [ %rd82 + 0 ], { %r4622, %r4623, %r4624, %r4625 }; 2026-02-21T10:18:46.6212698Z // end inline asm 2026-02-21T10:18:46.6212758Z // begin inline asm 2026-02-21T10:18:46.6212874Z st.global.v4.b32 [ %rd83 + 0 ], { %r4626, %r4627, %r4628, %r4629 }; 2026-02-21T10:18:46.6212992Z // end inline asm 2026-02-21T10:18:46.6213067Z // begin inline asm 2026-02-21T10:18:46.6213184Z st.global.v4.b32 [ %rd84 + 0 ], { %r4630, %r4631, %r4632, %r4633 }; 2026-02-21T10:18:46.6213242Z // end inline asm 2026-02-21T10:18:46.6213309Z // begin inline asm 2026-02-21T10:18:46.6213469Z st.global.v4.b32 [ %rd85 + 0 ], { %r4634, %r4635, %r4636, %r4637 }; 2026-02-21T10:18:46.6213540Z // end inline asm 2026-02-21T10:18:46.6213607Z // begin inline asm 2026-02-21T10:18:46.6213726Z st.global.v4.b32 [ %rd86 + 0 ], { %r4638, %r4639, %r4640, %r4641 }; 2026-02-21T10:18:46.6213786Z // end inline asm 2026-02-21T10:18:46.6213845Z // begin inline asm 2026-02-21T10:18:46.6213964Z st.global.v4.b32 [ %rd87 + 0 ], { %r4642, %r4643, %r4644, %r4645 }; 2026-02-21T10:18:46.6214021Z // end inline asm 2026-02-21T10:18:46.6214084Z // begin inline asm 2026-02-21T10:18:46.6214204Z st.global.v4.b32 [ %rd88 + 0 ], { %r4646, %r4647, %r4648, %r4649 }; 2026-02-21T10:18:46.6214265Z // end inline asm 2026-02-21T10:18:46.6214330Z // begin inline asm 2026-02-21T10:18:46.6214446Z st.global.v4.b32 [ %rd89 + 0 ], { %r4650, %r4651, %r4652, %r4653 }; 2026-02-21T10:18:46.6214512Z // end inline asm 2026-02-21T10:18:46.6214572Z // begin inline asm 2026-02-21T10:18:46.6214687Z st.global.v4.b32 [ %rd90 + 0 ], { %r4654, %r4655, %r4656, %r4657 }; 2026-02-21T10:18:46.6214751Z // end 
inline asm 2026-02-21T10:18:46.6214811Z // begin inline asm 2026-02-21T10:18:46.6214928Z st.global.v4.b32 [ %rd91 + 0 ], { %r4658, %r4659, %r4660, %r4661 }; 2026-02-21T10:18:46.6214987Z // end inline asm 2026-02-21T10:18:46.6215231Z .loc 1 19 139 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:19:139 2026-02-21T10:18:46.6215302Z add.s32 %r4759, %r12337, 132; 2026-02-21T10:18:46.6215519Z .loc 1 25 35 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:25:35 2026-02-21T10:18:46.6215607Z shr.s32 %r4760, %r4759, 31; 2026-02-21T10:18:46.6215673Z shr.u32 %r4761, %r4760, 17; 2026-02-21T10:18:46.6215742Z add.s32 %r4762, %r4759, %r4761; 2026-02-21T10:18:46.6215813Z shr.s32 %r4763, %r4762, 15; 2026-02-21T10:18:46.6216021Z .loc 1 26 33 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:26:33 2026-02-21T10:18:46.6216089Z shl.b32 %r4764, %r4763, 6; 2026-02-21T10:18:46.6216291Z .loc 1 27 39 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:27:39 2026-02-21T10:18:46.6216361Z sub.s32 %r4765, 10, %r4764; 2026-02-21T10:18:46.6216697Z .loc 1 27 52 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:27:52 2026-02-21T10:18:46.6216764Z min.s32 %r4766, %r4765, 64; 2026-02-21T10:18:46.6216969Z .loc 1 28 45 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:28:45 2026-02-21T10:18:46.6217126Z and.b32 %r4767, %r4762, -32768; 2026-02-21T10:18:46.6217255Z sub.s32 %r4768, %r4759, %r4767; 2026-02-21T10:18:46.6217472Z .loc 1 29 51 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:29:51 2026-02-21T10:18:46.6217538Z div.s32 %r4769, %r4768, %r4766; 2026-02-21T10:18:46.6217741Z .loc 1 28 64 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:28:64 2026-02-21T10:18:46.6217822Z mul.lo.s32 %r4770, %r4769, %r4766; 2026-02-21T10:18:46.6217894Z sub.s32 %r4771, %r4768, %r4770; 2026-02-21T10:18:46.6218104Z .loc 1 28 30 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:28:30 2026-02-21T10:18:46.6218169Z add.s32 %r4772, %r4771, %r4764; 
2026-02-21T10:18:46.6218376Z .loc 1 30 27 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:30:27 2026-02-21T10:18:46.6218440Z shl.b32 %r4773, %r4772, 7; 2026-02-21T10:18:46.6218647Z .loc 1 31 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:31:32 2026-02-21T10:18:46.6218725Z or.b32 %r321, %r4773, %r28; 2026-02-21T10:18:46.6218986Z .loc 1 32 27 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:32:27 2026-02-21T10:18:46.6219053Z shl.b32 %r322, %r4769, 7; 2026-02-21T10:18:46.6219256Z .loc 1 40 93 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:40:93 2026-02-21T10:18:46.6219381Z or.b32 %r4774, %r60, %r322; 2026-02-21T10:18:46.6219448Z shl.b32 %r4775, %r4774, 13; 2026-02-21T10:18:46.6219519Z mul.wide.s32 %rd14, %r4775, 2; 2026-02-21T10:18:46.6219588Z shl.b32 %r4776, %r4769, 20; 2026-02-21T10:18:46.6219652Z or.b32 %r4777, %r61, %r4776; 2026-02-21T10:18:46.6219720Z mul.wide.s32 %rd15, %r4777, 2; 2026-02-21T10:18:46.6219802Z cvt.s64.s32 %rd93, %r321; 2026-02-21T10:18:46.6219870Z add.s64 %rd312, %rd4, %rd93; 2026-02-21T10:18:46.6219935Z mov.b32 %r12466, 0f00000000; 2026-02-21T10:18:46.6220005Z mov.b64 %rd314, -32; 2026-02-21T10:18:46.6220071Z mov.b64 %rd313, %rd3; 2026-02-21T10:18:46.6220135Z mov.b32 %r12467, %r12466; 2026-02-21T10:18:46.6220197Z mov.b32 %r12468, %r12466; 2026-02-21T10:18:46.6220268Z mov.b32 %r12469, %r12466; 2026-02-21T10:18:46.6220330Z mov.b32 %r12470, %r12466; 2026-02-21T10:18:46.6220392Z mov.b32 %r12471, %r12466; 2026-02-21T10:18:46.6220462Z mov.b32 %r12472, %r12466; 2026-02-21T10:18:46.6220533Z mov.b32 %r12473, %r12466; 2026-02-21T10:18:46.6220599Z mov.b32 %r12474, %r12466; 2026-02-21T10:18:46.6220662Z mov.b32 %r12475, %r12466; 2026-02-21T10:18:46.6220730Z mov.b32 %r12476, %r12466; 2026-02-21T10:18:46.6220792Z mov.b32 %r12477, %r12466; 2026-02-21T10:18:46.6220854Z mov.b32 %r12478, %r12466; 2026-02-21T10:18:46.6220923Z mov.b32 %r12479, %r12466; 2026-02-21T10:18:46.6220983Z mov.b32 %r12480, %r12466; 
2026-02-21T10:18:46.6221042Z mov.b32 %r12481, %r12466; 2026-02-21T10:18:46.6221104Z mov.b32 %r12482, %r12466; 2026-02-21T10:18:46.6221170Z mov.b32 %r12483, %r12466; 2026-02-21T10:18:46.6221234Z mov.b32 %r12484, %r12466; 2026-02-21T10:18:46.6221297Z mov.b32 %r12485, %r12466; 2026-02-21T10:18:46.6221365Z mov.b32 %r12486, %r12466; 2026-02-21T10:18:46.6221440Z mov.b32 %r12487, %r12466; 2026-02-21T10:18:46.6221504Z mov.b32 %r12488, %r12466; 2026-02-21T10:18:46.6221565Z mov.b32 %r12489, %r12466; 2026-02-21T10:18:46.6221632Z mov.b32 %r12490, %r12466; 2026-02-21T10:18:46.6221693Z mov.b32 %r12491, %r12466; 2026-02-21T10:18:46.6221755Z mov.b32 %r12492, %r12466; 2026-02-21T10:18:46.6221823Z mov.b32 %r12493, %r12466; 2026-02-21T10:18:46.6221884Z mov.b32 %r12494, %r12466; 2026-02-21T10:18:46.6221943Z mov.b32 %r12495, %r12466; 2026-02-21T10:18:46.6222004Z mov.b32 %r12496, %r12466; 2026-02-21T10:18:46.6222070Z mov.b32 %r12497, %r12466; 2026-02-21T10:18:46.6222132Z mov.b32 %r12498, %r12466; 2026-02-21T10:18:46.6222194Z mov.b32 %r12499, %r12466; 2026-02-21T10:18:46.6222262Z mov.b32 %r12500, %r12466; 2026-02-21T10:18:46.6222325Z mov.b32 %r12501, %r12466; 2026-02-21T10:18:46.6222457Z mov.b32 %r12502, %r12466; 2026-02-21T10:18:46.6222565Z mov.b32 %r12503, %r12466; 2026-02-21T10:18:46.6222633Z mov.b32 %r12504, %r12466; 2026-02-21T10:18:46.6222695Z mov.b32 %r12505, %r12466; 2026-02-21T10:18:46.6222756Z mov.b32 %r12506, %r12466; 2026-02-21T10:18:46.6222824Z mov.b32 %r12507, %r12466; 2026-02-21T10:18:46.6222884Z mov.b32 %r12508, %r12466; 2026-02-21T10:18:46.6222943Z mov.b32 %r12509, %r12466; 2026-02-21T10:18:46.6223012Z mov.b32 %r12510, %r12466; 2026-02-21T10:18:46.6223074Z mov.b32 %r12511, %r12466; 2026-02-21T10:18:46.6223134Z mov.b32 %r12512, %r12466; 2026-02-21T10:18:46.6223196Z mov.b32 %r12513, %r12466; 2026-02-21T10:18:46.6223261Z mov.b32 %r12514, %r12466; 2026-02-21T10:18:46.6223322Z mov.b32 %r12515, %r12466; 2026-02-21T10:18:46.6223381Z mov.b32 %r12516, %r12466; 
2026-02-21T10:18:46.6223447Z mov.b32 %r12517, %r12466; 2026-02-21T10:18:46.6223510Z mov.b32 %r12518, %r12466; 2026-02-21T10:18:46.6223572Z mov.b32 %r12519, %r12466; 2026-02-21T10:18:46.6223635Z mov.b32 %r12520, %r12466; 2026-02-21T10:18:46.6223708Z mov.b32 %r12521, %r12466; 2026-02-21T10:18:46.6223771Z mov.b32 %r12522, %r12466; 2026-02-21T10:18:46.6223891Z mov.b32 %r12523, %r12466; 2026-02-21T10:18:46.6223964Z mov.b32 %r12524, %r12466; 2026-02-21T10:18:46.6224025Z mov.b32 %r12525, %r12466; 2026-02-21T10:18:46.6224086Z mov.b32 %r12526, %r12466; 2026-02-21T10:18:46.6224146Z mov.b32 %r12527, %r12466; 2026-02-21T10:18:46.6224263Z mov.b32 %r12528, %r12466; 2026-02-21T10:18:46.6224326Z mov.b32 %r12529, %r12466; 2026-02-21T10:18:46.6224386Z mov.b32 %r12530, %r12466; 2026-02-21T10:18:46.6224451Z mov.b32 %r12531, %r12466; 2026-02-21T10:18:46.6224512Z mov.b32 %r12532, %r12466; 2026-02-21T10:18:46.6224573Z mov.b32 %r12533, %r12466; 2026-02-21T10:18:46.6224632Z mov.b32 %r12534, %r12466; 2026-02-21T10:18:46.6224699Z mov.b32 %r12535, %r12466; 2026-02-21T10:18:46.6224758Z mov.b32 %r12536, %r12466; 2026-02-21T10:18:46.6224819Z mov.b32 %r12537, %r12466; 2026-02-21T10:18:46.6224889Z mov.b32 %r12538, %r12466; 2026-02-21T10:18:46.6224954Z mov.b32 %r12539, %r12466; 2026-02-21T10:18:46.6225014Z mov.b32 %r12540, %r12466; 2026-02-21T10:18:46.6225076Z mov.b32 %r12541, %r12466; 2026-02-21T10:18:46.6225145Z mov.b32 %r12542, %r12466; 2026-02-21T10:18:46.6225205Z mov.b32 %r12543, %r12466; 2026-02-21T10:18:46.6225264Z mov.b32 %r12544, %r12466; 2026-02-21T10:18:46.6225330Z mov.b32 %r12545, %r12466; 2026-02-21T10:18:46.6225390Z mov.b32 %r12546, %r12466; 2026-02-21T10:18:46.6225449Z mov.b32 %r12547, %r12466; 2026-02-21T10:18:46.6225509Z mov.b32 %r12548, %r12466; 2026-02-21T10:18:46.6225575Z mov.b32 %r12549, %r12466; 2026-02-21T10:18:46.6225634Z mov.b32 %r12550, %r12466; 2026-02-21T10:18:46.6225695Z mov.b32 %r12551, %r12466; 2026-02-21T10:18:46.6225760Z mov.b32 %r12552, %r12466; 
2026-02-21T10:18:46.6225821Z mov.b32 %r12553, %r12466; 2026-02-21T10:18:46.6225881Z mov.b32 %r12554, %r12466; 2026-02-21T10:18:46.6225949Z mov.b32 %r12555, %r12466; 2026-02-21T10:18:46.6226011Z mov.b32 %r12556, %r12466; 2026-02-21T10:18:46.6226075Z mov.b32 %r12557, %r12466; 2026-02-21T10:18:46.6226136Z mov.b32 %r12558, %r12466; 2026-02-21T10:18:46.6226203Z mov.b32 %r12559, %r12466; 2026-02-21T10:18:46.6226264Z mov.b32 %r12560, %r12466; 2026-02-21T10:18:46.6226325Z mov.b32 %r12561, %r12466; 2026-02-21T10:18:46.6226388Z mov.b32 %r12562, %r12466; 2026-02-21T10:18:46.6226577Z mov.b32 %r12563, %r12466; 2026-02-21T10:18:46.6226647Z mov.b32 %r12564, %r12466; 2026-02-21T10:18:46.6226710Z mov.b32 %r12565, %r12466; 2026-02-21T10:18:46.6226778Z mov.b32 %r12566, %r12466; 2026-02-21T10:18:46.6226838Z mov.b32 %r12567, %r12466; 2026-02-21T10:18:46.6226898Z mov.b32 %r12568, %r12466; 2026-02-21T10:18:46.6226977Z mov.b32 %r12569, %r12466; 2026-02-21T10:18:46.6227039Z mov.b32 %r12570, %r12466; 2026-02-21T10:18:46.6227101Z mov.b32 %r12571, %r12466; 2026-02-21T10:18:46.6227163Z mov.b32 %r12572, %r12466; 2026-02-21T10:18:46.6227234Z mov.b32 %r12573, %r12466; 2026-02-21T10:18:46.6227375Z mov.b32 %r12574, %r12466; 2026-02-21T10:18:46.6227495Z mov.b32 %r12575, %r12466; 2026-02-21T10:18:46.6227562Z mov.b32 %r12576, %r12466; 2026-02-21T10:18:46.6227625Z mov.b32 %r12577, %r12466; 2026-02-21T10:18:46.6227686Z mov.b32 %r12578, %r12466; 2026-02-21T10:18:46.6227748Z mov.b32 %r12579, %r12466; 2026-02-21T10:18:46.6227818Z mov.b32 %r12580, %r12466; 2026-02-21T10:18:46.6227879Z mov.b32 %r12581, %r12466; 2026-02-21T10:18:46.6227942Z mov.b32 %r12582, %r12466; 2026-02-21T10:18:46.6228009Z mov.b32 %r12583, %r12466; 2026-02-21T10:18:46.6228073Z mov.b32 %r12584, %r12466; 2026-02-21T10:18:46.6228146Z mov.b32 %r12585, %r12466; 2026-02-21T10:18:46.6228211Z mov.b32 %r12586, %r12466; 2026-02-21T10:18:46.6228278Z mov.b32 %r12587, %r12466; 2026-02-21T10:18:46.6228339Z mov.b32 %r12588, %r12466; 
2026-02-21T10:18:46.6228401Z mov.b32 %r12589, %r12466; 2026-02-21T10:18:46.6228544Z mov.b32 %r12590, %r12466; 2026-02-21T10:18:46.6228611Z mov.b32 %r12591, %r12466; 2026-02-21T10:18:46.6228671Z mov.b32 %r12592, %r12466; 2026-02-21T10:18:46.6228739Z mov.b32 %r12593, %r12466; 2026-02-21T10:18:46.6228942Z $L__BB0_6: // Parent Loop BB0_3 Depth=1 2026-02-21T10:18:46.6229057Z // => This Inner Loop Header: Depth=2 2026-02-21T10:18:46.6229279Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6229435Z add.s64 %rd95, %rd313, %rd15; 2026-02-21T10:18:46.6229646Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6229717Z add.s64 %rd98, %rd313, %rd14; 2026-02-21T10:18:46.6229780Z // begin inline asm 2026-02-21T10:18:46.6229840Z mov.u64 %rd94, 0x0; 2026-02-21T10:18:46.6229977Z createpolicy.fractional.L2::evict_last.b64 %rd94, 1.0; 2026-02-21T10:18:46.6230045Z // end inline asm 2026-02-21T10:18:46.6230105Z // begin inline asm 2026-02-21T10:18:46.6230169Z mov.u32 %r4778, 0x0; 2026-02-21T10:18:46.6230234Z mov.u32 %r4779, 0x0; 2026-02-21T10:18:46.6230297Z mov.u32 %r4780, 0x0; 2026-02-21T10:18:46.6230357Z mov.u32 %r4781, 0x0; 2026-02-21T10:18:46.6230596Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r4778, %r4779, %r4780, %r4781 }, [ %rd95 + 0 ], %rd94; 2026-02-21T10:18:46.6230658Z // end inline asm 2026-02-21T10:18:46.6230720Z // begin inline asm 2026-02-21T10:18:46.6230781Z mov.u64 %rd97, 0x0; 2026-02-21T10:18:46.6230920Z createpolicy.fractional.L2::evict_last.b64 %rd97, 1.0; 2026-02-21T10:18:46.6230985Z // end inline asm 2026-02-21T10:18:46.6231049Z // begin inline asm 2026-02-21T10:18:46.6231116Z mov.u32 %r4782, 0x0; 2026-02-21T10:18:46.6231176Z mov.u32 %r4783, 0x0; 2026-02-21T10:18:46.6231234Z mov.u32 %r4784, 0x0; 2026-02-21T10:18:46.6231294Z mov.u32 %r4785, 0x0; 2026-02-21T10:18:46.6231515Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r4782, %r4783, %r4784, %r4785 }, [ %rd98 
+ 0 ], %rd97; 2026-02-21T10:18:46.6231577Z // end inline asm 2026-02-21T10:18:46.6231789Z .loc 1 52 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:52:32 2026-02-21T10:18:46.6231856Z bar.sync 0; 2026-02-21T10:18:46.6231942Z st.shared.v2.b32 [%r33], {%r4778, %r4779}; 2026-02-21T10:18:46.6232034Z st.shared.v2.b32 [%r33+2048], {%r4782, %r4783}; 2026-02-21T10:18:46.6232120Z st.shared.v2.b32 [%r34], {%r4780, %r4781}; 2026-02-21T10:18:46.6232209Z st.shared.v2.b32 [%r34+2048], {%r4784, %r4785}; 2026-02-21T10:18:46.6232267Z bar.sync 0; 2026-02-21T10:18:46.6232338Z ld.shared.b16 %rs257, [%r36]; 2026-02-21T10:18:46.6232417Z ld.shared.b16 %rs258, [%r36+256]; 2026-02-21T10:18:46.6232488Z ld.shared.b16 %rs259, [%r36+16]; 2026-02-21T10:18:46.6232555Z ld.shared.b16 %rs260, [%r36+272]; 2026-02-21T10:18:46.6232631Z ld.shared.b16 %rs261, [%r36+2048]; 2026-02-21T10:18:46.6232700Z ld.shared.b16 %rs262, [%r36+2304]; 2026-02-21T10:18:46.6232765Z ld.shared.b16 %rs263, [%r36+2064]; 2026-02-21T10:18:46.6232902Z ld.shared.b16 %rs264, [%r36+2320]; 2026-02-21T10:18:46.6232973Z ld.shared.b16 %rs265, [%r37]; 2026-02-21T10:18:46.6233088Z ld.shared.b16 %rs266, [%r37+256]; 2026-02-21T10:18:46.6233157Z ld.shared.b16 %rs267, [%r37+16]; 2026-02-21T10:18:46.6233229Z ld.shared.b16 %rs268, [%r37+272]; 2026-02-21T10:18:46.6233294Z ld.shared.b16 %rs269, [%r37+2048]; 2026-02-21T10:18:46.6233359Z ld.shared.b16 %rs270, [%r37+2304]; 2026-02-21T10:18:46.6233428Z ld.shared.b16 %rs271, [%r37+2064]; 2026-02-21T10:18:46.6233492Z ld.shared.b16 %rs272, [%r37+2320]; 2026-02-21T10:18:46.6233560Z cvt.f32.bf16 %r4916, %rs257; 2026-02-21T10:18:46.6233625Z cvt.f32.bf16 %r4917, %rs258; 2026-02-21T10:18:46.6233691Z cvt.f32.bf16 %r4918, %rs265; 2026-02-21T10:18:46.6233754Z cvt.f32.bf16 %r4919, %rs266; 2026-02-21T10:18:46.6233815Z cvt.f32.bf16 %r5048, %rs259; 2026-02-21T10:18:46.6233881Z cvt.f32.bf16 %r5049, %rs260; 2026-02-21T10:18:46.6233943Z cvt.f32.bf16 %r5050, %rs267; 2026-02-21T10:18:46.6234003Z 
cvt.f32.bf16 %r5051, %rs268; 2026-02-21T10:18:46.6234067Z cvt.f32.bf16 %r5180, %rs261; 2026-02-21T10:18:46.6234135Z cvt.f32.bf16 %r5181, %rs262; 2026-02-21T10:18:46.6234195Z cvt.f32.bf16 %r5182, %rs269; 2026-02-21T10:18:46.6234308Z cvt.f32.bf16 %r5183, %rs270; 2026-02-21T10:18:46.6234376Z cvt.f32.bf16 %r5312, %rs263; 2026-02-21T10:18:46.6234438Z cvt.f32.bf16 %r5313, %rs264; 2026-02-21T10:18:46.6234499Z cvt.f32.bf16 %r5314, %rs271; 2026-02-21T10:18:46.6234603Z cvt.f32.bf16 %r5315, %rs272; 2026-02-21T10:18:46.6234827Z .loc 1 54 87 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:54:87 2026-02-21T10:18:46.6234888Z // begin inline asm 2026-02-21T10:18:46.6234947Z mov.u32 %r4786, 0x0; 2026-02-21T10:18:46.6235010Z mov.u32 %r4787, 0x0; 2026-02-21T10:18:46.6235109Z ld.global.v2.b32 { %r4786, %r4787 }, [ %rd312 + 0 ]; 2026-02-21T10:18:46.6235169Z // end inline asm 2026-02-21T10:18:46.6235383Z .loc 1 62 28 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:62:28 2026-02-21T10:18:46.6235443Z bar.sync 0; 2026-02-21T10:18:46.6235512Z st.shared.b8 [%r38], %r4786; 2026-02-21T10:18:46.6235581Z prmt.b32 %r7978, %r4786, 0, 0x7771U; 2026-02-21T10:18:46.6235656Z st.shared.b8 [%r39], %r7978; 2026-02-21T10:18:46.6235724Z prmt.b32 %r7979, %r4786, 0, 0x7772U; 2026-02-21T10:18:46.6235790Z st.shared.b8 [%r40+256], %r7979; 2026-02-21T10:18:46.6235865Z prmt.b32 %r7980, %r4786, 0, 0x7773U; 2026-02-21T10:18:46.6235931Z st.shared.b8 [%r41+256], %r7980; 2026-02-21T10:18:46.6235996Z st.shared.b8 [%r42+512], %r4787; 2026-02-21T10:18:46.6236062Z prmt.b32 %r7981, %r4787, 0, 0x7771U; 2026-02-21T10:18:46.6236132Z st.shared.b8 [%r43+512], %r7981; 2026-02-21T10:18:46.6236197Z prmt.b32 %r7982, %r4787, 0, 0x7772U; 2026-02-21T10:18:46.6236262Z st.shared.b8 [%r44+768], %r7982; 2026-02-21T10:18:46.6236336Z prmt.b32 %r7983, %r4787, 0, 0x7773U; 2026-02-21T10:18:46.6236401Z st.shared.b8 [%r45+768], %r7983; 2026-02-21T10:18:46.6236604Z bar.sync 0; 2026-02-21T10:18:46.6236682Z ld.shared.b32 
%r7984, [%r46]; 2026-02-21T10:18:46.6236752Z prmt.b32 %r7985, %r7984, 0, 0x7770U; 2026-02-21T10:18:46.6236824Z cvt.u16.u32 %rs273, %r7985; 2026-02-21T10:18:46.6236892Z prmt.b32 %r7986, %r7984, 0, 0x7771U; 2026-02-21T10:18:46.6236957Z cvt.u16.u32 %rs274, %r7986; 2026-02-21T10:18:46.6237022Z prmt.b32 %r7987, %r7984, 0, 0x7772U; 2026-02-21T10:18:46.6237091Z cvt.u16.u32 %rs275, %r7987; 2026-02-21T10:18:46.6237157Z prmt.b32 %r7988, %r7984, 0, 0x7773U; 2026-02-21T10:18:46.6237225Z cvt.u16.u32 %rs276, %r7988; 2026-02-21T10:18:46.6237304Z ld.shared.b32 %r7989, [%r47]; 2026-02-21T10:18:46.6237377Z prmt.b32 %r7990, %r7989, 0, 0x7770U; 2026-02-21T10:18:46.6237442Z cvt.u16.u32 %rs277, %r7990; 2026-02-21T10:18:46.6237509Z prmt.b32 %r7991, %r7989, 0, 0x7771U; 2026-02-21T10:18:46.6237579Z cvt.u16.u32 %rs278, %r7991; 2026-02-21T10:18:46.6237644Z prmt.b32 %r7992, %r7989, 0, 0x7772U; 2026-02-21T10:18:46.6237707Z cvt.u16.u32 %rs279, %r7992; 2026-02-21T10:18:46.6237871Z prmt.b32 %r7993, %r7989, 0, 0x7773U; 2026-02-21T10:18:46.6237934Z cvt.u16.u32 %rs280, %r7993; 2026-02-21T10:18:46.6238217Z .loc 1 57 28 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:57:28 2026-02-21T10:18:46.6238293Z shl.b16 %rs281, %rs273, 4; 2026-02-21T10:18:46.6238358Z shl.b16 %rs282, %rs277, 4; 2026-02-21T10:18:46.6238419Z shl.b16 %rs283, %rs274, 4; 2026-02-21T10:18:46.6238480Z shl.b16 %rs284, %rs278, 4; 2026-02-21T10:18:46.6238550Z shl.b16 %rs285, %rs275, 4; 2026-02-21T10:18:46.6238623Z shl.b16 %rs286, %rs279, 4; 2026-02-21T10:18:46.6238686Z shl.b16 %rs287, %rs276, 4; 2026-02-21T10:18:46.6238753Z shl.b16 %rs288, %rs280, 4; 2026-02-21T10:18:46.6238958Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6239023Z cvt.s16.s8 %rs289, %rs281; 2026-02-21T10:18:46.6239084Z shr.s16 %rs290, %rs289, 4; 2026-02-21T10:18:46.6239150Z cvt.s16.s8 %rs291, %rs282; 2026-02-21T10:18:46.6239214Z shr.s16 %rs292, %rs291, 4; 2026-02-21T10:18:46.6239282Z prmt.b32 %r7994, %r7984, 
0, 0x8880U; 2026-02-21T10:18:46.6239353Z cvt.u16.u32 %rs293, %r7994; 2026-02-21T10:18:46.6239490Z shr.s16 %rs294, %rs293, 4; 2026-02-21T10:18:46.6239562Z prmt.b32 %r7995, %r7989, 0, 0x8880U; 2026-02-21T10:18:46.6239627Z cvt.u16.u32 %rs295, %r7995; 2026-02-21T10:18:46.6239693Z shr.s16 %rs296, %rs295, 4; 2026-02-21T10:18:46.6239960Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6240033Z cvt.rn.f32.s16 %r7996, %rs296; 2026-02-21T10:18:46.6240115Z cvt.rn.f32.s16 %r7997, %rs294; 2026-02-21T10:18:46.6240180Z cvt.rn.f32.s16 %r7998, %rs292; 2026-02-21T10:18:46.6240246Z cvt.rn.f32.s16 %r7999, %rs290; 2026-02-21T10:18:46.6240454Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6240518Z cvt.s16.s8 %rs297, %rs283; 2026-02-21T10:18:46.6240580Z shr.s16 %rs298, %rs297, 4; 2026-02-21T10:18:46.6240643Z cvt.s16.s8 %rs299, %rs284; 2026-02-21T10:18:46.6240711Z shr.s16 %rs300, %rs299, 4; 2026-02-21T10:18:46.6240780Z prmt.b32 %r8000, %r7984, 0, 0x9991U; 2026-02-21T10:18:46.6240842Z cvt.u16.u32 %rs301, %r8000; 2026-02-21T10:18:46.6240914Z shr.s16 %rs302, %rs301, 4; 2026-02-21T10:18:46.6240983Z prmt.b32 %r8001, %r7989, 0, 0x9991U; 2026-02-21T10:18:46.6241048Z cvt.u16.u32 %rs303, %r8001; 2026-02-21T10:18:46.6241113Z shr.s16 %rs304, %rs303, 4; 2026-02-21T10:18:46.6241321Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6241385Z cvt.rn.f32.s16 %r8002, %rs304; 2026-02-21T10:18:46.6241448Z cvt.rn.f32.s16 %r8003, %rs302; 2026-02-21T10:18:46.6241521Z cvt.rn.f32.s16 %r8004, %rs300; 2026-02-21T10:18:46.6241586Z cvt.rn.f32.s16 %r8005, %rs298; 2026-02-21T10:18:46.6241786Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6241856Z cvt.s16.s8 %rs305, %rs285; 2026-02-21T10:18:46.6241918Z shr.s16 %rs306, %rs305, 4; 2026-02-21T10:18:46.6241981Z cvt.s16.s8 %rs307, %rs286; 2026-02-21T10:18:46.6242045Z shr.s16 
%rs308, %rs307, 4; 2026-02-21T10:18:46.6242117Z prmt.b32 %r8006, %r7984, 0, 0xaaa2U; 2026-02-21T10:18:46.6242180Z cvt.u16.u32 %rs309, %r8006; 2026-02-21T10:18:46.6242242Z shr.s16 %rs310, %rs309, 4; 2026-02-21T10:18:46.6242314Z prmt.b32 %r8007, %r7989, 0, 0xaaa2U; 2026-02-21T10:18:46.6242379Z cvt.u16.u32 %rs311, %r8007; 2026-02-21T10:18:46.6242441Z shr.s16 %rs312, %rs311, 4; 2026-02-21T10:18:46.6242647Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6242712Z cvt.rn.f32.s16 %r8008, %rs312; 2026-02-21T10:18:46.6242775Z cvt.rn.f32.s16 %r8009, %rs310; 2026-02-21T10:18:46.6242840Z cvt.rn.f32.s16 %r8010, %rs308; 2026-02-21T10:18:46.6242909Z cvt.rn.f32.s16 %r8011, %rs306; 2026-02-21T10:18:46.6243110Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6243286Z cvt.s16.s8 %rs313, %rs287; 2026-02-21T10:18:46.6243355Z shr.s16 %rs314, %rs313, 4; 2026-02-21T10:18:46.6243419Z cvt.s16.s8 %rs315, %rs288; 2026-02-21T10:18:46.6243483Z shr.s16 %rs316, %rs315, 4; 2026-02-21T10:18:46.6243551Z prmt.b32 %r8012, %r7984, 0, 0xbbb3U; 2026-02-21T10:18:46.6243623Z cvt.u16.u32 %rs317, %r8012; 2026-02-21T10:18:46.6243688Z shr.s16 %rs318, %rs317, 4; 2026-02-21T10:18:46.6243758Z prmt.b32 %r8013, %r7989, 0, 0xbbb3U; 2026-02-21T10:18:46.6243826Z cvt.u16.u32 %rs319, %r8013; 2026-02-21T10:18:46.6243890Z shr.s16 %rs320, %rs319, 4; 2026-02-21T10:18:46.6244100Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6244170Z cvt.rn.f32.s16 %r8014, %rs320; 2026-02-21T10:18:46.6244237Z cvt.rn.f32.s16 %r8015, %rs318; 2026-02-21T10:18:46.6244304Z cvt.rn.f32.s16 %r8016, %rs316; 2026-02-21T10:18:46.6244373Z cvt.rn.f32.s16 %r8017, %rs314; 2026-02-21T10:18:46.6244437Z bar.sync 0; 2026-02-21T10:18:46.6244558Z st.shared.v4.b32 [%r48], {%r7999, %r7997, %r7998, %r7996}; 2026-02-21T10:18:46.6244731Z st.shared.v4.b32 [%r49], {%r8005, %r8003, %r8004, %r8002}; 
2026-02-21T10:18:46.6244850Z st.shared.v4.b32 [%r50], {%r8011, %r8009, %r8010, %r8008}; 2026-02-21T10:18:46.6244957Z st.shared.v4.b32 [%r51], {%r8017, %r8015, %r8016, %r8014}; 2026-02-21T10:18:46.6245016Z $L__tmp9: 2026-02-21T10:18:46.6245350Z .loc 2 291 36 // standard.py:291:36 @[ co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:84:40 ] 2026-02-21T10:18:46.6245419Z // begin inline asm 2026-02-21T10:18:46.6245503Z fence.proxy.async.shared::cta; 2026-02-21T10:18:46.6245564Z // end inline asm 2026-02-21T10:18:46.6245628Z bar.sync 0; 2026-02-21T10:18:46.6245716Z shfl.sync.idx.b32 %r8018, %r4, 0, 31, -1; 2026-02-21T10:18:46.6245795Z wgmma.fence.sync.aligned; 2026-02-21T10:18:46.6245867Z mov.pred %p19, -1; 2026-02-21T10:18:46.6245930Z // begin inline asm 2026-02-21T10:18:46.6247557Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483,%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529}, {%r4916,%r4917,%r4918,%r4919}, %rd101, %p19, 1, 1; 2026-02-21T10:18:46.6247633Z // end inline asm 2026-02-21T10:18:46.6247705Z // begin inline asm 2026-02-21T10:18:46.6249204Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483,%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529}, {%r5048,%r5049,%r5050,%r5051}, %rd102, %p19, 1, 1; 2026-02-21T10:18:46.6249279Z // end inline asm 2026-02-21T10:18:46.6249342Z // begin inline asm 2026-02-21T10:18:46.6250831Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548,%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588,%r12589,%r12590,%r12591,%r12592,%r12593}, {%r5180,%r5181,%r5182,%r5183}, %rd101, %p19, 1, 1; 2026-02-21T10:18:46.6251036Z // end inline asm 2026-02-21T10:18:46.6251098Z // begin inline asm 2026-02-21T10:18:46.6252584Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548,%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588,%r12589,%r12590,%r12591,%r12592,%r12593}, {%r5312,%r5313,%r5314,%r5315}, %rd102, %p19, 1, 1; 2026-02-21T10:18:46.6252647Z // end inline asm 2026-02-21T10:18:46.6252729Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:46.6252798Z mov.b32 %r7846, 0; 2026-02-21T10:18:46.6252862Z mov.b32 %r5444, %r5575; 2026-02-21T10:18:46.6252926Z mov.b32 %r5445, %r7846; 2026-02-21T10:18:46.6252991Z mov.b32 %r5446, %r7846; 2026-02-21T10:18:46.6253137Z // begin inline asm 2026-02-21T10:18:46.6255767Z // wait for regs: %r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483,%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548,%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588,%r12589,%r12590,%r12591,%r12592,%r1259
3,%r5444,%r5445,%r5446 2026-02-21T10:18:46.6255859Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:46.6255919Z // end inline asm 2026-02-21T10:18:46.6255981Z $L__tmp10: 2026-02-21T10:18:46.6256205Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6256276Z add.s64 %rd106, %rd95, 32; 2026-02-21T10:18:46.6256614Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6256693Z add.s64 %rd109, %rd98, 32; 2026-02-21T10:18:46.6256760Z // begin inline asm 2026-02-21T10:18:46.6256822Z mov.u64 %rd105, 0x0; 2026-02-21T10:18:46.6256958Z createpolicy.fractional.L2::evict_last.b64 %rd105, 1.0; 2026-02-21T10:18:46.6257019Z // end inline asm 2026-02-21T10:18:46.6257079Z // begin inline asm 2026-02-21T10:18:46.6257147Z mov.u32 %r5578, 0x0; 2026-02-21T10:18:46.6257208Z mov.u32 %r5579, 0x0; 2026-02-21T10:18:46.6257268Z mov.u32 %r5580, 0x0; 2026-02-21T10:18:46.6257327Z mov.u32 %r5581, 0x0; 2026-02-21T10:18:46.6257564Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r5578, %r5579, %r5580, %r5581 }, [ %rd106 + 0 ], %rd105; 2026-02-21T10:18:46.6257624Z // end inline asm 2026-02-21T10:18:46.6257686Z // begin inline asm 2026-02-21T10:18:46.6257766Z mov.u64 %rd108, 0x0; 2026-02-21T10:18:46.6257890Z createpolicy.fractional.L2::evict_last.b64 %rd108, 1.0; 2026-02-21T10:18:46.6257949Z // end inline asm 2026-02-21T10:18:46.6258012Z // begin inline asm 2026-02-21T10:18:46.6258077Z mov.u32 %r5582, 0x0; 2026-02-21T10:18:46.6258136Z mov.u32 %r5583, 0x0; 2026-02-21T10:18:46.6258195Z mov.u32 %r5584, 0x0; 2026-02-21T10:18:46.6258257Z mov.u32 %r5585, 0x0; 2026-02-21T10:18:46.6258476Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r5582, %r5583, %r5584, %r5585 }, [ %rd109 + 0 ], %rd108; 2026-02-21T10:18:46.6258691Z // end inline asm 2026-02-21T10:18:46.6258912Z .loc 1 52 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:52:32 2026-02-21T10:18:46.6258971Z bar.sync 0; 
2026-02-21T10:18:46.6259055Z st.shared.v2.b32 [%r33], {%r5578, %r5579}; 2026-02-21T10:18:46.6259146Z st.shared.v2.b32 [%r33+2048], {%r5582, %r5583}; 2026-02-21T10:18:46.6259232Z st.shared.v2.b32 [%r34], {%r5580, %r5581}; 2026-02-21T10:18:46.6259319Z st.shared.v2.b32 [%r34+2048], {%r5584, %r5585}; 2026-02-21T10:18:46.6259380Z bar.sync 0; 2026-02-21T10:18:46.6259459Z ld.shared.b16 %rs321, [%r36]; 2026-02-21T10:18:46.6259532Z ld.shared.b16 %rs322, [%r36+256]; 2026-02-21T10:18:46.6259605Z ld.shared.b16 %rs323, [%r36+16]; 2026-02-21T10:18:46.6259678Z ld.shared.b16 %rs324, [%r36+272]; 2026-02-21T10:18:46.6259749Z ld.shared.b16 %rs325, [%r36+2048]; 2026-02-21T10:18:46.6259816Z ld.shared.b16 %rs326, [%r36+2304]; 2026-02-21T10:18:46.6259885Z ld.shared.b16 %rs327, [%r36+2064]; 2026-02-21T10:18:46.6259966Z ld.shared.b16 %rs328, [%r36+2320]; 2026-02-21T10:18:46.6260106Z ld.shared.b16 %rs329, [%r37]; 2026-02-21T10:18:46.6260180Z ld.shared.b16 %rs330, [%r37+256]; 2026-02-21T10:18:46.6260255Z ld.shared.b16 %rs331, [%r37+16]; 2026-02-21T10:18:46.6260322Z ld.shared.b16 %rs332, [%r37+272]; 2026-02-21T10:18:46.6260388Z ld.shared.b16 %rs333, [%r37+2048]; 2026-02-21T10:18:46.6260511Z ld.shared.b16 %rs334, [%r37+2304]; 2026-02-21T10:18:46.6260583Z ld.shared.b16 %rs335, [%r37+2064]; 2026-02-21T10:18:46.6260650Z ld.shared.b16 %rs336, [%r37+2320]; 2026-02-21T10:18:46.6260718Z cvt.f32.bf16 %r5716, %rs321; 2026-02-21T10:18:46.6260787Z cvt.f32.bf16 %r5717, %rs322; 2026-02-21T10:18:46.6260851Z cvt.f32.bf16 %r5718, %rs329; 2026-02-21T10:18:46.6260915Z cvt.f32.bf16 %r5719, %rs330; 2026-02-21T10:18:46.6260977Z cvt.f32.bf16 %r5848, %rs323; 2026-02-21T10:18:46.6261045Z cvt.f32.bf16 %r5849, %rs324; 2026-02-21T10:18:46.6261112Z cvt.f32.bf16 %r5850, %rs331; 2026-02-21T10:18:46.6261177Z cvt.f32.bf16 %r5851, %rs332; 2026-02-21T10:18:46.6261245Z cvt.f32.bf16 %r5980, %rs325; 2026-02-21T10:18:46.6261309Z cvt.f32.bf16 %r5981, %rs326; 2026-02-21T10:18:46.6261373Z cvt.f32.bf16 %r5982, %rs333; 
2026-02-21T10:18:46.6261451Z cvt.f32.bf16 %r5983, %rs334; 2026-02-21T10:18:46.6261516Z cvt.f32.bf16 %r6112, %rs327; 2026-02-21T10:18:46.6261581Z cvt.f32.bf16 %r6113, %rs328; 2026-02-21T10:18:46.6261648Z cvt.f32.bf16 %r6114, %rs335; 2026-02-21T10:18:46.6261716Z cvt.f32.bf16 %r6115, %rs336; 2026-02-21T10:18:46.6261935Z .loc 1 54 87 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:54:87 2026-02-21T10:18:46.6262005Z add.s64 %rd111, %rd312, 10240; 2026-02-21T10:18:46.6262073Z // begin inline asm 2026-02-21T10:18:46.6262135Z mov.u32 %r5586, 0x0; 2026-02-21T10:18:46.6262195Z mov.u32 %r5587, 0x0; 2026-02-21T10:18:46.6262299Z ld.global.v2.b32 { %r5586, %r5587 }, [ %rd111 + 0 ]; 2026-02-21T10:18:46.6262368Z // end inline asm 2026-02-21T10:18:46.6262576Z .loc 1 62 28 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:62:28 2026-02-21T10:18:46.6262639Z bar.sync 0; 2026-02-21T10:18:46.6262716Z st.shared.b8 [%r38], %r5586; 2026-02-21T10:18:46.6262789Z prmt.b32 %r8019, %r5586, 0, 0x7771U; 2026-02-21T10:18:46.6262857Z st.shared.b8 [%r39], %r8019; 2026-02-21T10:18:46.6262933Z prmt.b32 %r8020, %r5586, 0, 0x7772U; 2026-02-21T10:18:46.6263002Z st.shared.b8 [%r40+256], %r8020; 2026-02-21T10:18:46.6263069Z prmt.b32 %r8021, %r5586, 0, 0x7773U; 2026-02-21T10:18:46.6263137Z st.shared.b8 [%r41+256], %r8021; 2026-02-21T10:18:46.6263209Z st.shared.b8 [%r42+512], %r5587; 2026-02-21T10:18:46.6263274Z prmt.b32 %r8022, %r5587, 0, 0x7771U; 2026-02-21T10:18:46.6263342Z st.shared.b8 [%r43+512], %r8022; 2026-02-21T10:18:46.6263416Z prmt.b32 %r8023, %r5587, 0, 0x7772U; 2026-02-21T10:18:46.6263482Z st.shared.b8 [%r44+768], %r8023; 2026-02-21T10:18:46.6263617Z prmt.b32 %r8024, %r5587, 0, 0x7773U; 2026-02-21T10:18:46.6263730Z st.shared.b8 [%r45+768], %r8024; 2026-02-21T10:18:46.6263793Z bar.sync 0; 2026-02-21T10:18:46.6263863Z ld.shared.b32 %r8025, [%r46]; 2026-02-21T10:18:46.6263929Z prmt.b32 %r8026, %r8025, 0, 0x7770U; 2026-02-21T10:18:46.6264000Z cvt.u16.u32 %rs337, %r8026; 
2026-02-21T10:18:46.6264071Z prmt.b32 %r8027, %r8025, 0, 0x7771U; 2026-02-21T10:18:46.6264137Z cvt.u16.u32 %rs338, %r8027; 2026-02-21T10:18:46.6264203Z prmt.b32 %r8028, %r8025, 0, 0x7772U; 2026-02-21T10:18:46.6264271Z cvt.u16.u32 %rs339, %r8028; 2026-02-21T10:18:46.6264335Z prmt.b32 %r8029, %r8025, 0, 0x7773U; 2026-02-21T10:18:46.6264396Z cvt.u16.u32 %rs340, %r8029; 2026-02-21T10:18:46.6264462Z ld.shared.b32 %r8030, [%r47]; 2026-02-21T10:18:46.6264528Z prmt.b32 %r8031, %r8030, 0, 0x7770U; 2026-02-21T10:18:46.6264589Z cvt.u16.u32 %rs341, %r8031; 2026-02-21T10:18:46.6264662Z prmt.b32 %r8032, %r8030, 0, 0x7771U; 2026-02-21T10:18:46.6264735Z cvt.u16.u32 %rs342, %r8032; 2026-02-21T10:18:46.6264802Z prmt.b32 %r8033, %r8030, 0, 0x7772U; 2026-02-21T10:18:46.6264867Z cvt.u16.u32 %rs343, %r8033; 2026-02-21T10:18:46.6264991Z prmt.b32 %r8034, %r8030, 0, 0x7773U; 2026-02-21T10:18:46.6265056Z cvt.u16.u32 %rs344, %r8034; 2026-02-21T10:18:46.6265263Z .loc 1 57 28 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:57:28 2026-02-21T10:18:46.6265333Z shl.b16 %rs345, %rs337, 4; 2026-02-21T10:18:46.6265443Z shl.b16 %rs346, %rs341, 4; 2026-02-21T10:18:46.6265507Z shl.b16 %rs347, %rs338, 4; 2026-02-21T10:18:46.6265568Z shl.b16 %rs348, %rs342, 4; 2026-02-21T10:18:46.6265645Z shl.b16 %rs349, %rs339, 4; 2026-02-21T10:18:46.6265708Z shl.b16 %rs350, %rs343, 4; 2026-02-21T10:18:46.6265770Z shl.b16 %rs351, %rs340, 4; 2026-02-21T10:18:46.6265837Z shl.b16 %rs352, %rs344, 4; 2026-02-21T10:18:46.6266042Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6266107Z cvt.s16.s8 %rs353, %rs345; 2026-02-21T10:18:46.6266176Z shr.s16 %rs354, %rs353, 4; 2026-02-21T10:18:46.6266240Z cvt.s16.s8 %rs355, %rs346; 2026-02-21T10:18:46.6266304Z shr.s16 %rs356, %rs355, 4; 2026-02-21T10:18:46.6266371Z prmt.b32 %r8035, %r8025, 0, 0x8880U; 2026-02-21T10:18:46.6266440Z cvt.u16.u32 %rs357, %r8035; 2026-02-21T10:18:46.6266627Z shr.s16 %rs358, %rs357, 4; 
2026-02-21T10:18:46.6266695Z prmt.b32 %r8036, %r8030, 0, 0x8880U; 2026-02-21T10:18:46.6266765Z cvt.u16.u32 %rs359, %r8036; 2026-02-21T10:18:46.6266828Z shr.s16 %rs360, %rs359, 4; 2026-02-21T10:18:46.6267031Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6267099Z cvt.rn.f32.s16 %r8037, %rs360; 2026-02-21T10:18:46.6267170Z cvt.rn.f32.s16 %r8038, %rs358; 2026-02-21T10:18:46.6267235Z cvt.rn.f32.s16 %r8039, %rs356; 2026-02-21T10:18:46.6267310Z cvt.rn.f32.s16 %r8040, %rs354; 2026-02-21T10:18:46.6267519Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6267588Z cvt.s16.s8 %rs361, %rs347; 2026-02-21T10:18:46.6267650Z shr.s16 %rs362, %rs361, 4; 2026-02-21T10:18:46.6267721Z cvt.s16.s8 %rs363, %rs348; 2026-02-21T10:18:46.6267784Z shr.s16 %rs364, %rs363, 4; 2026-02-21T10:18:46.6267851Z prmt.b32 %r8041, %r8025, 0, 0x9991U; 2026-02-21T10:18:46.6267914Z cvt.u16.u32 %rs365, %r8041; 2026-02-21T10:18:46.6267982Z shr.s16 %rs366, %rs365, 4; 2026-02-21T10:18:46.6268048Z prmt.b32 %r8042, %r8030, 0, 0x9991U; 2026-02-21T10:18:46.6268110Z cvt.u16.u32 %rs367, %r8042; 2026-02-21T10:18:46.6268176Z shr.s16 %rs368, %rs367, 4; 2026-02-21T10:18:46.6268375Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6268515Z cvt.rn.f32.s16 %r8043, %rs368; 2026-02-21T10:18:46.6268583Z cvt.rn.f32.s16 %r8044, %rs366; 2026-02-21T10:18:46.6268652Z cvt.rn.f32.s16 %r8045, %rs364; 2026-02-21T10:18:46.6268802Z cvt.rn.f32.s16 %r8046, %rs362; 2026-02-21T10:18:46.6269004Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6269134Z cvt.s16.s8 %rs369, %rs349; 2026-02-21T10:18:46.6269197Z shr.s16 %rs370, %rs369, 4; 2026-02-21T10:18:46.6269259Z cvt.s16.s8 %rs371, %rs350; 2026-02-21T10:18:46.6269325Z shr.s16 %rs372, %rs371, 4; 2026-02-21T10:18:46.6269392Z prmt.b32 %r8047, %r8025, 0, 0xaaa2U; 2026-02-21T10:18:46.6269455Z cvt.u16.u32 
%rs373, %r8047; 2026-02-21T10:18:46.6269517Z shr.s16 %rs374, %rs373, 4; 2026-02-21T10:18:46.6269587Z prmt.b32 %r8048, %r8030, 0, 0xaaa2U; 2026-02-21T10:18:46.6269651Z cvt.u16.u32 %rs375, %r8048; 2026-02-21T10:18:46.6269716Z shr.s16 %rs376, %rs375, 4; 2026-02-21T10:18:46.6269926Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6269990Z cvt.rn.f32.s16 %r8049, %rs376; 2026-02-21T10:18:46.6270053Z cvt.rn.f32.s16 %r8050, %rs374; 2026-02-21T10:18:46.6270118Z cvt.rn.f32.s16 %r8051, %rs372; 2026-02-21T10:18:46.6270186Z cvt.rn.f32.s16 %r8052, %rs370; 2026-02-21T10:18:46.6270453Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6270521Z cvt.s16.s8 %rs377, %rs351; 2026-02-21T10:18:46.6270595Z shr.s16 %rs378, %rs377, 4; 2026-02-21T10:18:46.6270665Z cvt.s16.s8 %rs379, %rs352; 2026-02-21T10:18:46.6270728Z shr.s16 %rs380, %rs379, 4; 2026-02-21T10:18:46.6270863Z prmt.b32 %r8053, %r8025, 0, 0xbbb3U; 2026-02-21T10:18:46.6270930Z cvt.u16.u32 %rs381, %r8053; 2026-02-21T10:18:46.6270992Z shr.s16 %rs382, %rs381, 4; 2026-02-21T10:18:46.6271058Z prmt.b32 %r8054, %r8030, 0, 0xbbb3U; 2026-02-21T10:18:46.6271126Z cvt.u16.u32 %rs383, %r8054; 2026-02-21T10:18:46.6271189Z shr.s16 %rs384, %rs383, 4; 2026-02-21T10:18:46.6271390Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6271464Z cvt.rn.f32.s16 %r8055, %rs384; 2026-02-21T10:18:46.6271540Z cvt.rn.f32.s16 %r8056, %rs382; 2026-02-21T10:18:46.6271608Z cvt.rn.f32.s16 %r8057, %rs380; 2026-02-21T10:18:46.6271676Z cvt.rn.f32.s16 %r8058, %rs378; 2026-02-21T10:18:46.6271734Z bar.sync 0; 2026-02-21T10:18:46.6271850Z st.shared.v4.b32 [%r48], {%r8040, %r8038, %r8039, %r8037}; 2026-02-21T10:18:46.6271959Z st.shared.v4.b32 [%r49], {%r8046, %r8044, %r8045, %r8043}; 2026-02-21T10:18:46.6272071Z st.shared.v4.b32 [%r50], {%r8052, %r8050, %r8051, %r8049}; 2026-02-21T10:18:46.6272171Z st.shared.v4.b32 [%r51], 
{%r8058, %r8056, %r8057, %r8055}; 2026-02-21T10:18:46.6272229Z $L__tmp11: 2026-02-21T10:18:46.6272522Z .loc 2 291 36 // standard.py:291:36 @[ co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:84:40 ] 2026-02-21T10:18:46.6272586Z // begin inline asm 2026-02-21T10:18:46.6272666Z fence.proxy.async.shared::cta; 2026-02-21T10:18:46.6272725Z // end inline asm 2026-02-21T10:18:46.6272785Z bar.sync 0; 2026-02-21T10:18:46.6272864Z wgmma.fence.sync.aligned; 2026-02-21T10:18:46.6272925Z // begin inline asm 2026-02-21T10:18:46.6274429Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483,%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529}, {%r5716,%r5717,%r5718,%r5719}, %rd101, %p19, 1, 1; 2026-02-21T10:18:46.6274490Z // end inline asm 2026-02-21T10:18:46.6274553Z // begin inline asm 2026-02-21T10:18:46.6276041Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483,%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529}, {%r5848,%r5849,%r5850,%r5851}, %rd102, %p19, 1, 1; 2026-02-21T10:18:46.6276228Z // end inline asm 2026-02-21T10:18:46.6276294Z // begin inline asm 
2026-02-21T10:18:46.6277916Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548,%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588,%r12589,%r12590,%r12591,%r12592,%r12593}, {%r5980,%r5981,%r5982,%r5983}, %rd101, %p19, 1, 1; 2026-02-21T10:18:46.6278066Z // end inline asm 2026-02-21T10:18:46.6278128Z // begin inline asm 2026-02-21T10:18:46.6279664Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548,%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588,%r12589,%r12590,%r12591,%r12592,%r12593}, {%r6112,%r6113,%r6114,%r6115}, %rd102, %p19, 1, 1; 2026-02-21T10:18:46.6279735Z // end inline asm 2026-02-21T10:18:46.6279817Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:46.6279881Z mov.b32 %r6244, %r5575; 2026-02-21T10:18:46.6279960Z mov.b32 %r6245, %r7846; 2026-02-21T10:18:46.6280022Z mov.b32 %r6246, %r7846; 2026-02-21T10:18:46.6280085Z // begin inline asm 2026-02-21T10:18:46.6282614Z // wait for regs: 
%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483,%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548,%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588,%r12589,%r12590,%r12591,%r12592,%r12593,%r6244,%r6245,%r6246 2026-02-21T10:18:46.6282697Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:46.6282755Z // end inline asm 2026-02-21T10:18:46.6282819Z $L__tmp12: 2026-02-21T10:18:46.6283031Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6283095Z add.s64 %rd117, %rd95, 64; 2026-02-21T10:18:46.6283303Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6283366Z add.s64 %rd120, %rd98, 64; 2026-02-21T10:18:46.6283426Z // begin inline asm 2026-02-21T10:18:46.6283491Z mov.u64 %rd116, 0x0; 2026-02-21T10:18:46.6283615Z createpolicy.fractional.L2::evict_last.b64 %rd116, 1.0; 2026-02-21T10:18:46.6283755Z // end inline asm 2026-02-21T10:18:46.6283875Z // begin inline asm 2026-02-21T10:18:46.6283934Z mov.u32 %r6378, 0x0; 2026-02-21T10:18:46.6283993Z mov.u32 %r6379, 0x0; 2026-02-21T10:18:46.6284051Z mov.u32 %r6380, 0x0; 2026-02-21T10:18:46.6284115Z mov.u32 %r6381, 0x0; 
2026-02-21T10:18:46.6284342Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r6378, %r6379, %r6380, %r6381 }, [ %rd117 + 0 ], %rd116; 2026-02-21T10:18:46.6284406Z // end inline asm 2026-02-21T10:18:46.6284469Z // begin inline asm 2026-02-21T10:18:46.6284529Z mov.u64 %rd119, 0x0; 2026-02-21T10:18:46.6284648Z createpolicy.fractional.L2::evict_last.b64 %rd119, 1.0; 2026-02-21T10:18:46.6284710Z // end inline asm 2026-02-21T10:18:46.6284775Z // begin inline asm 2026-02-21T10:18:46.6284832Z mov.u32 %r6382, 0x0; 2026-02-21T10:18:46.6284891Z mov.u32 %r6383, 0x0; 2026-02-21T10:18:46.6284953Z mov.u32 %r6384, 0x0; 2026-02-21T10:18:46.6285011Z mov.u32 %r6385, 0x0; 2026-02-21T10:18:46.6285227Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r6382, %r6383, %r6384, %r6385 }, [ %rd120 + 0 ], %rd119; 2026-02-21T10:18:46.6285288Z // end inline asm 2026-02-21T10:18:46.6285552Z .loc 1 52 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:52:32 2026-02-21T10:18:46.6285615Z bar.sync 0; 2026-02-21T10:18:46.6285697Z st.shared.v2.b32 [%r33], {%r6378, %r6379}; 2026-02-21T10:18:46.6285803Z st.shared.v2.b32 [%r33+2048], {%r6382, %r6383}; 2026-02-21T10:18:46.6285928Z st.shared.v2.b32 [%r34], {%r6380, %r6381}; 2026-02-21T10:18:46.6286016Z st.shared.v2.b32 [%r34+2048], {%r6384, %r6385}; 2026-02-21T10:18:46.6286079Z bar.sync 0; 2026-02-21T10:18:46.6286147Z ld.shared.b16 %rs385, [%r36]; 2026-02-21T10:18:46.6286217Z ld.shared.b16 %rs386, [%r36+256]; 2026-02-21T10:18:46.6286287Z ld.shared.b16 %rs387, [%r36+16]; 2026-02-21T10:18:46.6286357Z ld.shared.b16 %rs388, [%r36+272]; 2026-02-21T10:18:46.6286428Z ld.shared.b16 %rs389, [%r36+2048]; 2026-02-21T10:18:46.6286643Z ld.shared.b16 %rs390, [%r36+2304]; 2026-02-21T10:18:46.6286720Z ld.shared.b16 %rs391, [%r36+2064]; 2026-02-21T10:18:46.6286787Z ld.shared.b16 %rs392, [%r36+2320]; 2026-02-21T10:18:46.6286854Z ld.shared.b16 %rs393, [%r37]; 2026-02-21T10:18:46.6286922Z ld.shared.b16 %rs394, [%r37+256]; 2026-02-21T10:18:46.6286994Z 
ld.shared.b16 %rs395, [%r37+16]; 2026-02-21T10:18:46.6287059Z ld.shared.b16 %rs396, [%r37+272]; 2026-02-21T10:18:46.6287126Z ld.shared.b16 %rs397, [%r37+2048]; 2026-02-21T10:18:46.6287199Z ld.shared.b16 %rs398, [%r37+2304]; 2026-02-21T10:18:46.6287269Z ld.shared.b16 %rs399, [%r37+2064]; 2026-02-21T10:18:46.6287336Z ld.shared.b16 %rs400, [%r37+2320]; 2026-02-21T10:18:46.6287412Z cvt.f32.bf16 %r6516, %rs385; 2026-02-21T10:18:46.6287476Z cvt.f32.bf16 %r6517, %rs386; 2026-02-21T10:18:46.6287542Z cvt.f32.bf16 %r6518, %rs393; 2026-02-21T10:18:46.6287606Z cvt.f32.bf16 %r6519, %rs394; 2026-02-21T10:18:46.6287674Z cvt.f32.bf16 %r6648, %rs387; 2026-02-21T10:18:46.6287740Z cvt.f32.bf16 %r6649, %rs388; 2026-02-21T10:18:46.6287805Z cvt.f32.bf16 %r6650, %rs395; 2026-02-21T10:18:46.6287874Z cvt.f32.bf16 %r6651, %rs396; 2026-02-21T10:18:46.6287935Z cvt.f32.bf16 %r6780, %rs389; 2026-02-21T10:18:46.6288011Z cvt.f32.bf16 %r6781, %rs390; 2026-02-21T10:18:46.6288077Z cvt.f32.bf16 %r6782, %rs397; 2026-02-21T10:18:46.6288143Z cvt.f32.bf16 %r6783, %rs398; 2026-02-21T10:18:46.6288209Z cvt.f32.bf16 %r6912, %rs391; 2026-02-21T10:18:46.6288274Z cvt.f32.bf16 %r6913, %rs392; 2026-02-21T10:18:46.6288343Z cvt.f32.bf16 %r6914, %rs399; 2026-02-21T10:18:46.6288404Z cvt.f32.bf16 %r6915, %rs400; 2026-02-21T10:18:46.6288613Z .loc 1 54 87 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:54:87 2026-02-21T10:18:46.6288694Z add.s64 %rd122, %rd312, 20480; 2026-02-21T10:18:46.6288759Z // begin inline asm 2026-02-21T10:18:46.6288821Z mov.u32 %r6386, 0x0; 2026-02-21T10:18:46.6288881Z mov.u32 %r6387, 0x0; 2026-02-21T10:18:46.6288985Z ld.global.v2.b32 { %r6386, %r6387 }, [ %rd122 + 0 ]; 2026-02-21T10:18:46.6289128Z // end inline asm 2026-02-21T10:18:46.6289396Z .loc 1 62 28 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:62:28 2026-02-21T10:18:46.6289461Z bar.sync 0; 2026-02-21T10:18:46.6289527Z st.shared.b8 [%r38], %r6386; 2026-02-21T10:18:46.6289597Z prmt.b32 %r8059, %r6386, 0, 0x7771U; 
2026-02-21T10:18:46.6289664Z st.shared.b8 [%r39], %r8059; 2026-02-21T10:18:46.6289735Z prmt.b32 %r8060, %r6386, 0, 0x7772U; 2026-02-21T10:18:46.6289802Z st.shared.b8 [%r40+256], %r8060; 2026-02-21T10:18:46.6289867Z prmt.b32 %r8061, %r6386, 0, 0x7773U; 2026-02-21T10:18:46.6289936Z st.shared.b8 [%r41+256], %r8061; 2026-02-21T10:18:46.6290001Z st.shared.b8 [%r42+512], %r6387; 2026-02-21T10:18:46.6290068Z prmt.b32 %r8062, %r6387, 0, 0x7771U; 2026-02-21T10:18:46.6290140Z st.shared.b8 [%r43+512], %r8062; 2026-02-21T10:18:46.6290206Z prmt.b32 %r8063, %r6387, 0, 0x7772U; 2026-02-21T10:18:46.6290270Z st.shared.b8 [%r44+768], %r8063; 2026-02-21T10:18:46.6290335Z prmt.b32 %r8064, %r6387, 0, 0x7773U; 2026-02-21T10:18:46.6290404Z st.shared.b8 [%r45+768], %r8064; 2026-02-21T10:18:46.6290461Z bar.sync 0; 2026-02-21T10:18:46.6290593Z ld.shared.b32 %r8065, [%r46]; 2026-02-21T10:18:46.6290666Z prmt.b32 %r8066, %r8065, 0, 0x7770U; 2026-02-21T10:18:46.6290731Z cvt.u16.u32 %rs401, %r8066; 2026-02-21T10:18:46.6290795Z prmt.b32 %r8067, %r8065, 0, 0x7771U; 2026-02-21T10:18:46.6290858Z cvt.u16.u32 %rs402, %r8067; 2026-02-21T10:18:46.6290987Z prmt.b32 %r8068, %r8065, 0, 0x7772U; 2026-02-21T10:18:46.6291054Z cvt.u16.u32 %rs403, %r8068; 2026-02-21T10:18:46.6291119Z prmt.b32 %r8069, %r8065, 0, 0x7773U; 2026-02-21T10:18:46.6291198Z cvt.u16.u32 %rs404, %r8069; 2026-02-21T10:18:46.6291267Z ld.shared.b32 %r8070, [%r47]; 2026-02-21T10:18:46.6291331Z prmt.b32 %r8071, %r8070, 0, 0x7770U; 2026-02-21T10:18:46.6291393Z cvt.u16.u32 %rs405, %r8071; 2026-02-21T10:18:46.6291464Z prmt.b32 %r8072, %r8070, 0, 0x7771U; 2026-02-21T10:18:46.6291529Z cvt.u16.u32 %rs406, %r8072; 2026-02-21T10:18:46.6291594Z prmt.b32 %r8073, %r8070, 0, 0x7772U; 2026-02-21T10:18:46.6291665Z cvt.u16.u32 %rs407, %r8073; 2026-02-21T10:18:46.6291736Z prmt.b32 %r8074, %r8070, 0, 0x7773U; 2026-02-21T10:18:46.6291800Z cvt.u16.u32 %rs408, %r8074; 2026-02-21T10:18:46.6292010Z .loc 1 57 28 // 
co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:57:28 2026-02-21T10:18:46.6292076Z shl.b16 %rs409, %rs401, 4; 2026-02-21T10:18:46.6292141Z shl.b16 %rs410, %rs405, 4; 2026-02-21T10:18:46.6292204Z shl.b16 %rs411, %rs402, 4; 2026-02-21T10:18:46.6292274Z shl.b16 %rs412, %rs406, 4; 2026-02-21T10:18:46.6292336Z shl.b16 %rs413, %rs403, 4; 2026-02-21T10:18:46.6292397Z shl.b16 %rs414, %rs407, 4; 2026-02-21T10:18:46.6292462Z shl.b16 %rs415, %rs404, 4; 2026-02-21T10:18:46.6292525Z shl.b16 %rs416, %rs408, 4; 2026-02-21T10:18:46.6292726Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6292793Z cvt.s16.s8 %rs417, %rs409; 2026-02-21T10:18:46.6292861Z shr.s16 %rs418, %rs417, 4; 2026-02-21T10:18:46.6292925Z cvt.s16.s8 %rs419, %rs410; 2026-02-21T10:18:46.6292989Z shr.s16 %rs420, %rs419, 4; 2026-02-21T10:18:46.6293058Z prmt.b32 %r8075, %r8065, 0, 0x8880U; 2026-02-21T10:18:46.6293122Z cvt.u16.u32 %rs421, %r8075; 2026-02-21T10:18:46.6293188Z shr.s16 %rs422, %rs421, 4; 2026-02-21T10:18:46.6293259Z prmt.b32 %r8076, %r8070, 0, 0x8880U; 2026-02-21T10:18:46.6293324Z cvt.u16.u32 %rs423, %r8076; 2026-02-21T10:18:46.6293385Z shr.s16 %rs424, %rs423, 4; 2026-02-21T10:18:46.6293597Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6293672Z cvt.rn.f32.s16 %r8077, %rs424; 2026-02-21T10:18:46.6293739Z cvt.rn.f32.s16 %r8078, %rs422; 2026-02-21T10:18:46.6293802Z cvt.rn.f32.s16 %r8079, %rs420; 2026-02-21T10:18:46.6293871Z cvt.rn.f32.s16 %r8080, %rs418; 2026-02-21T10:18:46.6294076Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6294244Z cvt.s16.s8 %rs425, %rs411; 2026-02-21T10:18:46.6294308Z shr.s16 %rs426, %rs425, 4; 2026-02-21T10:18:46.6294375Z cvt.s16.s8 %rs427, %rs412; 2026-02-21T10:18:46.6294438Z shr.s16 %rs428, %rs427, 4; 2026-02-21T10:18:46.6294504Z prmt.b32 %r8081, %r8065, 0, 0x9991U; 2026-02-21T10:18:46.6294573Z cvt.u16.u32 %rs429, %r8081; 
2026-02-21T10:18:46.6294637Z shr.s16 %rs430, %rs429, 4; 2026-02-21T10:18:46.6294704Z prmt.b32 %r8082, %r8070, 0, 0x9991U; 2026-02-21T10:18:46.6294772Z cvt.u16.u32 %rs431, %r8082; 2026-02-21T10:18:46.6294835Z shr.s16 %rs432, %rs431, 4; 2026-02-21T10:18:46.6295039Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6295105Z cvt.rn.f32.s16 %r8083, %rs432; 2026-02-21T10:18:46.6295175Z cvt.rn.f32.s16 %r8084, %rs430; 2026-02-21T10:18:46.6295252Z cvt.rn.f32.s16 %r8085, %rs428; 2026-02-21T10:18:46.6295320Z cvt.rn.f32.s16 %r8086, %rs426; 2026-02-21T10:18:46.6295529Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6295645Z cvt.s16.s8 %rs433, %rs413; 2026-02-21T10:18:46.6295709Z shr.s16 %rs434, %rs433, 4; 2026-02-21T10:18:46.6295772Z cvt.s16.s8 %rs435, %rs414; 2026-02-21T10:18:46.6295837Z shr.s16 %rs436, %rs435, 4; 2026-02-21T10:18:46.6295905Z prmt.b32 %r8087, %r8065, 0, 0xaaa2U; 2026-02-21T10:18:46.6296015Z cvt.u16.u32 %rs437, %r8087; 2026-02-21T10:18:46.6296085Z shr.s16 %rs438, %rs437, 4; 2026-02-21T10:18:46.6296150Z prmt.b32 %r8088, %r8070, 0, 0xaaa2U; 2026-02-21T10:18:46.6296215Z cvt.u16.u32 %rs439, %r8088; 2026-02-21T10:18:46.6296285Z shr.s16 %rs440, %rs439, 4; 2026-02-21T10:18:46.6296617Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6296691Z cvt.rn.f32.s16 %r8089, %rs440; 2026-02-21T10:18:46.6296760Z cvt.rn.f32.s16 %r8090, %rs438; 2026-02-21T10:18:46.6296832Z cvt.rn.f32.s16 %r8091, %rs436; 2026-02-21T10:18:46.6296898Z cvt.rn.f32.s16 %r8092, %rs434; 2026-02-21T10:18:46.6297107Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6297188Z cvt.s16.s8 %rs441, %rs415; 2026-02-21T10:18:46.6297255Z shr.s16 %rs442, %rs441, 4; 2026-02-21T10:18:46.6297320Z cvt.s16.s8 %rs443, %rs416; 2026-02-21T10:18:46.6297385Z shr.s16 %rs444, %rs443, 4; 2026-02-21T10:18:46.6297460Z prmt.b32 %r8093, 
%r8065, 0, 0xbbb3U; 2026-02-21T10:18:46.6297524Z cvt.u16.u32 %rs445, %r8093; 2026-02-21T10:18:46.6297587Z shr.s16 %rs446, %rs445, 4; 2026-02-21T10:18:46.6297659Z prmt.b32 %r8094, %r8070, 0, 0xbbb3U; 2026-02-21T10:18:46.6297724Z cvt.u16.u32 %rs447, %r8094; 2026-02-21T10:18:46.6297789Z shr.s16 %rs448, %rs447, 4; 2026-02-21T10:18:46.6297995Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6298060Z cvt.rn.f32.s16 %r8095, %rs448; 2026-02-21T10:18:46.6298126Z cvt.rn.f32.s16 %r8096, %rs446; 2026-02-21T10:18:46.6298193Z cvt.rn.f32.s16 %r8097, %rs444; 2026-02-21T10:18:46.6298265Z cvt.rn.f32.s16 %r8098, %rs442; 2026-02-21T10:18:46.6298323Z bar.sync 0; 2026-02-21T10:18:46.6298435Z st.shared.v4.b32 [%r48], {%r8080, %r8078, %r8079, %r8077}; 2026-02-21T10:18:46.6298550Z st.shared.v4.b32 [%r49], {%r8086, %r8084, %r8085, %r8083}; 2026-02-21T10:18:46.6298656Z st.shared.v4.b32 [%r50], {%r8092, %r8090, %r8091, %r8089}; 2026-02-21T10:18:46.6298758Z st.shared.v4.b32 [%r51], {%r8098, %r8096, %r8097, %r8095}; 2026-02-21T10:18:46.6298819Z $L__tmp13: 2026-02-21T10:18:46.6299099Z .loc 2 291 36 // standard.py:291:36 @[ co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:84:40 ] 2026-02-21T10:18:46.6299162Z // begin inline asm 2026-02-21T10:18:46.6299242Z fence.proxy.async.shared::cta; 2026-02-21T10:18:46.6299306Z // end inline asm 2026-02-21T10:18:46.6299374Z bar.sync 0; 2026-02-21T10:18:46.6299550Z wgmma.fence.sync.aligned; 2026-02-21T10:18:46.6299677Z // begin inline asm 2026-02-21T10:18:46.6301172Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483,%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529}, {%r6516,%r6517,%r6518,%r6519}, %rd101, %p19, 1, 1; 2026-02-21T10:18:46.6301233Z // end inline asm 2026-02-21T10:18:46.6301300Z // begin inline asm 2026-02-21T10:18:46.6302855Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483,%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529}, {%r6648,%r6649,%r6650,%r6651}, %rd102, %p19, 1, 1; 2026-02-21T10:18:46.6302980Z // end inline asm 2026-02-21T10:18:46.6303043Z // begin inline asm 2026-02-21T10:18:46.6304529Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548,%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588,%r12589,%r12590,%r12591,%r12592,%r12593}, {%r6780,%r6781,%r6782,%r6783}, %rd101, %p19, 1, 1; 2026-02-21T10:18:46.6304608Z // end inline asm 2026-02-21T10:18:46.6304669Z // begin inline asm 2026-02-21T10:18:46.6306147Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548,%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588,%r12589,%r12590,%r12591,%r12592,%r12593}, {%r6912,%r6913,%r6914,%r6915}, %rd102, %p19, 1, 1; 2026-02-21T10:18:46.6306211Z // end inline asm 2026-02-21T10:18:46.6306293Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:46.6306358Z mov.b32 %r7045, %r7846; 2026-02-21T10:18:46.6306426Z mov.b32 %r7046, %r7846; 2026-02-21T10:18:46.6306607Z mov.b32 %r7044, %r5575; 2026-02-21T10:18:46.6306673Z // begin inline asm 2026-02-21T10:18:46.6309291Z // wait for regs: 
%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483,%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548,%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588,%r12589,%r12590,%r12591,%r12592,%r12593,%r7044,%r7045,%r7046 2026-02-21T10:18:46.6309506Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:46.6309570Z // end inline asm 2026-02-21T10:18:46.6309627Z $L__tmp14: 2026-02-21T10:18:46.6309845Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6309918Z add.s64 %rd128, %rd95, 96; 2026-02-21T10:18:46.6310124Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6310189Z add.s64 %rd131, %rd98, 96; 2026-02-21T10:18:46.6310251Z // begin inline asm 2026-02-21T10:18:46.6310318Z mov.u64 %rd127, 0x0; 2026-02-21T10:18:46.6310456Z createpolicy.fractional.L2::evict_last.b64 %rd127, 1.0; 2026-02-21T10:18:46.6310518Z // end inline asm 2026-02-21T10:18:46.6310585Z // begin inline asm 2026-02-21T10:18:46.6310646Z mov.u32 %r7178, 0x0; 2026-02-21T10:18:46.6310771Z mov.u32 %r7179, 0x0; 2026-02-21T10:18:46.6310839Z mov.u32 %r7180, 0x0; 2026-02-21T10:18:46.6310899Z mov.u32 %r7181, 0x0; 
2026-02-21T10:18:46.6311127Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r7178, %r7179, %r7180, %r7181 }, [ %rd128 + 0 ], %rd127; 2026-02-21T10:18:46.6311244Z // end inline asm 2026-02-21T10:18:46.6311313Z // begin inline asm 2026-02-21T10:18:46.6311373Z mov.u64 %rd130, 0x0; 2026-02-21T10:18:46.6311495Z createpolicy.fractional.L2::evict_last.b64 %rd130, 1.0; 2026-02-21T10:18:46.6311558Z // end inline asm 2026-02-21T10:18:46.6311622Z // begin inline asm 2026-02-21T10:18:46.6311681Z mov.u32 %r7182, 0x0; 2026-02-21T10:18:46.6311739Z mov.u32 %r7183, 0x0; 2026-02-21T10:18:46.6311803Z mov.u32 %r7184, 0x0; 2026-02-21T10:18:46.6311861Z mov.u32 %r7185, 0x0; 2026-02-21T10:18:46.6312081Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r7182, %r7183, %r7184, %r7185 }, [ %rd131 + 0 ], %rd130; 2026-02-21T10:18:46.6312147Z // end inline asm 2026-02-21T10:18:46.6312356Z .loc 1 52 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:52:32 2026-02-21T10:18:46.6312415Z bar.sync 0; 2026-02-21T10:18:46.6312504Z st.shared.v2.b32 [%r33], {%r7178, %r7179}; 2026-02-21T10:18:46.6312599Z st.shared.v2.b32 [%r33+2048], {%r7182, %r7183}; 2026-02-21T10:18:46.6312680Z st.shared.v2.b32 [%r34], {%r7180, %r7181}; 2026-02-21T10:18:46.6312771Z st.shared.v2.b32 [%r34+2048], {%r7184, %r7185}; 2026-02-21T10:18:46.6312829Z bar.sync 0; 2026-02-21T10:18:46.6312900Z ld.shared.b16 %rs449, [%r36]; 2026-02-21T10:18:46.6312972Z ld.shared.b16 %rs450, [%r36+256]; 2026-02-21T10:18:46.6313049Z ld.shared.b16 %rs451, [%r36+16]; 2026-02-21T10:18:46.6313116Z ld.shared.b16 %rs452, [%r36+272]; 2026-02-21T10:18:46.6313186Z ld.shared.b16 %rs453, [%r36+2048]; 2026-02-21T10:18:46.6313260Z ld.shared.b16 %rs454, [%r36+2304]; 2026-02-21T10:18:46.6313326Z ld.shared.b16 %rs455, [%r36+2064]; 2026-02-21T10:18:46.6313392Z ld.shared.b16 %rs456, [%r36+2320]; 2026-02-21T10:18:46.6313460Z ld.shared.b16 %rs457, [%r37]; 2026-02-21T10:18:46.6313542Z ld.shared.b16 %rs458, [%r37+256]; 2026-02-21T10:18:46.6313617Z 
ld.shared.b16 %rs459, [%r37+16]; 2026-02-21T10:18:46.6313684Z ld.shared.b16 %rs460, [%r37+272]; 2026-02-21T10:18:46.6313756Z ld.shared.b16 %rs461, [%r37+2048]; 2026-02-21T10:18:46.6313828Z ld.shared.b16 %rs462, [%r37+2304]; 2026-02-21T10:18:46.6313893Z ld.shared.b16 %rs463, [%r37+2064]; 2026-02-21T10:18:46.6313960Z ld.shared.b16 %rs464, [%r37+2320]; 2026-02-21T10:18:46.6314037Z cvt.f32.bf16 %r7316, %rs449; 2026-02-21T10:18:46.6314102Z cvt.f32.bf16 %r7317, %rs450; 2026-02-21T10:18:46.6314166Z cvt.f32.bf16 %r7318, %rs457; 2026-02-21T10:18:46.6314234Z cvt.f32.bf16 %r7319, %rs458; 2026-02-21T10:18:46.6314298Z cvt.f32.bf16 %r7448, %rs451; 2026-02-21T10:18:46.6314433Z cvt.f32.bf16 %r7449, %rs452; 2026-02-21T10:18:46.6314504Z cvt.f32.bf16 %r7450, %rs459; 2026-02-21T10:18:46.6314618Z cvt.f32.bf16 %r7451, %rs460; 2026-02-21T10:18:46.6314683Z cvt.f32.bf16 %r7580, %rs453; 2026-02-21T10:18:46.6314748Z cvt.f32.bf16 %r7581, %rs454; 2026-02-21T10:18:46.6314814Z cvt.f32.bf16 %r7582, %rs461; 2026-02-21T10:18:46.6314877Z cvt.f32.bf16 %r7583, %rs462; 2026-02-21T10:18:46.6314938Z cvt.f32.bf16 %r7712, %rs455; 2026-02-21T10:18:46.6315008Z cvt.f32.bf16 %r7713, %rs456; 2026-02-21T10:18:46.6315070Z cvt.f32.bf16 %r7714, %rs463; 2026-02-21T10:18:46.6315132Z cvt.f32.bf16 %r7715, %rs464; 2026-02-21T10:18:46.6315347Z .loc 1 54 87 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:54:87 2026-02-21T10:18:46.6315420Z add.s64 %rd133, %rd312, 30720; 2026-02-21T10:18:46.6315486Z // begin inline asm 2026-02-21T10:18:46.6315546Z mov.u32 %r7186, 0x0; 2026-02-21T10:18:46.6315610Z mov.u32 %r7187, 0x0; 2026-02-21T10:18:46.6315710Z ld.global.v2.b32 { %r7186, %r7187 }, [ %rd133 + 0 ]; 2026-02-21T10:18:46.6315772Z // end inline asm 2026-02-21T10:18:46.6316036Z .loc 1 62 28 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:62:28 2026-02-21T10:18:46.6316103Z bar.sync 0; 2026-02-21T10:18:46.6316169Z st.shared.b8 [%r38], %r7186; 2026-02-21T10:18:46.6316240Z prmt.b32 %r8099, %r7186, 0, 0x7771U; 
2026-02-21T10:18:46.6316311Z st.shared.b8 [%r39], %r8099; 2026-02-21T10:18:46.6316430Z prmt.b32 %r8100, %r7186, 0, 0x7772U; 2026-02-21T10:18:46.6316635Z st.shared.b8 [%r40+256], %r8100; 2026-02-21T10:18:46.6316711Z prmt.b32 %r8101, %r7186, 0, 0x7773U; 2026-02-21T10:18:46.6316778Z st.shared.b8 [%r41+256], %r8101; 2026-02-21T10:18:46.6316841Z st.shared.b8 [%r42+512], %r7187; 2026-02-21T10:18:46.6316907Z prmt.b32 %r8102, %r7187, 0, 0x7771U; 2026-02-21T10:18:46.6316978Z st.shared.b8 [%r43+512], %r8102; 2026-02-21T10:18:46.6317043Z prmt.b32 %r8103, %r7187, 0, 0x7772U; 2026-02-21T10:18:46.6317107Z st.shared.b8 [%r44+768], %r8103; 2026-02-21T10:18:46.6317180Z prmt.b32 %r8104, %r7187, 0, 0x7773U; 2026-02-21T10:18:46.6317247Z st.shared.b8 [%r45+768], %r8104; 2026-02-21T10:18:46.6317304Z bar.sync 0; 2026-02-21T10:18:46.6317375Z ld.shared.b32 %r8105, [%r46]; 2026-02-21T10:18:46.6317446Z prmt.b32 %r8106, %r8105, 0, 0x7770U; 2026-02-21T10:18:46.6317512Z cvt.u16.u32 %rs465, %r8106; 2026-02-21T10:18:46.6317576Z prmt.b32 %r8107, %r8105, 0, 0x7771U; 2026-02-21T10:18:46.6317647Z cvt.u16.u32 %rs466, %r8107; 2026-02-21T10:18:46.6317712Z prmt.b32 %r8108, %r8105, 0, 0x7772U; 2026-02-21T10:18:46.6317776Z cvt.u16.u32 %rs467, %r8108; 2026-02-21T10:18:46.6317847Z prmt.b32 %r8109, %r8105, 0, 0x7773U; 2026-02-21T10:18:46.6317910Z cvt.u16.u32 %rs468, %r8109; 2026-02-21T10:18:46.6317976Z ld.shared.b32 %r8110, [%r47]; 2026-02-21T10:18:46.6318041Z prmt.b32 %r8111, %r8110, 0, 0x7770U; 2026-02-21T10:18:46.6318108Z cvt.u16.u32 %rs469, %r8111; 2026-02-21T10:18:46.6318175Z prmt.b32 %r8112, %r8110, 0, 0x7771U; 2026-02-21T10:18:46.6318240Z cvt.u16.u32 %rs470, %r8112; 2026-02-21T10:18:46.6318323Z prmt.b32 %r8113, %r8110, 0, 0x7772U; 2026-02-21T10:18:46.6318390Z cvt.u16.u32 %rs471, %r8113; 2026-02-21T10:18:46.6318458Z prmt.b32 %r8114, %r8110, 0, 0x7773U; 2026-02-21T10:18:46.6318522Z cvt.u16.u32 %rs472, %r8114; 2026-02-21T10:18:46.6318745Z .loc 1 57 28 // 
co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:57:28 2026-02-21T10:18:46.6318811Z shl.b16 %rs473, %rs465, 4; 2026-02-21T10:18:46.6318878Z shl.b16 %rs474, %rs469, 4; 2026-02-21T10:18:46.6318945Z shl.b16 %rs475, %rs466, 4; 2026-02-21T10:18:46.6319009Z shl.b16 %rs476, %rs470, 4; 2026-02-21T10:18:46.6319071Z shl.b16 %rs477, %rs467, 4; 2026-02-21T10:18:46.6319141Z shl.b16 %rs478, %rs471, 4; 2026-02-21T10:18:46.6319204Z shl.b16 %rs479, %rs468, 4; 2026-02-21T10:18:46.6319267Z shl.b16 %rs480, %rs472, 4; 2026-02-21T10:18:46.6319471Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6319624Z cvt.s16.s8 %rs481, %rs473; 2026-02-21T10:18:46.6319753Z shr.s16 %rs482, %rs481, 4; 2026-02-21T10:18:46.6319816Z cvt.s16.s8 %rs483, %rs474; 2026-02-21T10:18:46.6319884Z shr.s16 %rs484, %rs483, 4; 2026-02-21T10:18:46.6319951Z prmt.b32 %r8115, %r8105, 0, 0x8880U; 2026-02-21T10:18:46.6320014Z cvt.u16.u32 %rs485, %r8115; 2026-02-21T10:18:46.6320076Z shr.s16 %rs486, %rs485, 4; 2026-02-21T10:18:46.6320146Z prmt.b32 %r8116, %r8110, 0, 0x8880U; 2026-02-21T10:18:46.6320225Z cvt.u16.u32 %rs487, %r8116; 2026-02-21T10:18:46.6320291Z shr.s16 %rs488, %rs487, 4; 2026-02-21T10:18:46.6320502Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6320570Z cvt.rn.f32.s16 %r8117, %rs488; 2026-02-21T10:18:46.6320637Z cvt.rn.f32.s16 %r8118, %rs486; 2026-02-21T10:18:46.6320707Z cvt.rn.f32.s16 %r8119, %rs484; 2026-02-21T10:18:46.6320771Z cvt.rn.f32.s16 %r8120, %rs482; 2026-02-21T10:18:46.6320975Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6321044Z cvt.s16.s8 %rs489, %rs475; 2026-02-21T10:18:46.6321180Z shr.s16 %rs490, %rs489, 4; 2026-02-21T10:18:46.6321248Z cvt.s16.s8 %rs491, %rs476; 2026-02-21T10:18:46.6321312Z shr.s16 %rs492, %rs491, 4; 2026-02-21T10:18:46.6321384Z prmt.b32 %r8121, %r8105, 0, 0x9991U; 2026-02-21T10:18:46.6321448Z cvt.u16.u32 %rs493, %r8121; 
2026-02-21T10:18:46.6321589Z shr.s16 %rs494, %rs493, 4; 2026-02-21T10:18:46.6321659Z prmt.b32 %r8122, %r8110, 0, 0x9991U; 2026-02-21T10:18:46.6321728Z cvt.u16.u32 %rs495, %r8122; 2026-02-21T10:18:46.6321793Z shr.s16 %rs496, %rs495, 4; 2026-02-21T10:18:46.6321994Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6322066Z cvt.rn.f32.s16 %r8123, %rs496; 2026-02-21T10:18:46.6322131Z cvt.rn.f32.s16 %r8124, %rs494; 2026-02-21T10:18:46.6322197Z cvt.rn.f32.s16 %r8125, %rs492; 2026-02-21T10:18:46.6322270Z cvt.rn.f32.s16 %r8126, %rs490; 2026-02-21T10:18:46.6322471Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6322540Z cvt.s16.s8 %rs497, %rs477; 2026-02-21T10:18:46.6322604Z shr.s16 %rs498, %rs497, 4; 2026-02-21T10:18:46.6322675Z cvt.s16.s8 %rs499, %rs478; 2026-02-21T10:18:46.6322738Z shr.s16 %rs500, %rs499, 4; 2026-02-21T10:18:46.6322804Z prmt.b32 %r8127, %r8105, 0, 0xaaa2U; 2026-02-21T10:18:46.6322879Z cvt.u16.u32 %rs501, %r8127; 2026-02-21T10:18:46.6322942Z shr.s16 %rs502, %rs501, 4; 2026-02-21T10:18:46.6323021Z prmt.b32 %r8128, %r8110, 0, 0xaaa2U; 2026-02-21T10:18:46.6323086Z cvt.u16.u32 %rs503, %r8128; 2026-02-21T10:18:46.6323154Z shr.s16 %rs504, %rs503, 4; 2026-02-21T10:18:46.6323355Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6323420Z cvt.rn.f32.s16 %r8129, %rs504; 2026-02-21T10:18:46.6323488Z cvt.rn.f32.s16 %r8130, %rs502; 2026-02-21T10:18:46.6323554Z cvt.rn.f32.s16 %r8131, %rs500; 2026-02-21T10:18:46.6323620Z cvt.rn.f32.s16 %r8132, %rs498; 2026-02-21T10:18:46.6323829Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6323892Z cvt.s16.s8 %rs505, %rs479; 2026-02-21T10:18:46.6323955Z shr.s16 %rs506, %rs505, 4; 2026-02-21T10:18:46.6324019Z cvt.s16.s8 %rs507, %rs480; 2026-02-21T10:18:46.6324087Z shr.s16 %rs508, %rs507, 4; 2026-02-21T10:18:46.6324152Z prmt.b32 %r8133, 
%r8105, 0, 0xbbb3U; 2026-02-21T10:18:46.6324216Z cvt.u16.u32 %rs509, %r8133; 2026-02-21T10:18:46.6324281Z shr.s16 %rs510, %rs509, 4; 2026-02-21T10:18:46.6324347Z prmt.b32 %r8134, %r8110, 0, 0xbbb3U; 2026-02-21T10:18:46.6324411Z cvt.u16.u32 %rs511, %r8134; 2026-02-21T10:18:46.6324472Z shr.s16 %rs512, %rs511, 4; 2026-02-21T10:18:46.6324682Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6324824Z cvt.rn.f32.s16 %r8135, %rs512; 2026-02-21T10:18:46.6324889Z cvt.rn.f32.s16 %r8136, %rs510; 2026-02-21T10:18:46.6325008Z cvt.rn.f32.s16 %r8137, %rs508; 2026-02-21T10:18:46.6325075Z cvt.rn.f32.s16 %r8138, %rs506; 2026-02-21T10:18:46.6325132Z bar.sync 0; 2026-02-21T10:18:46.6325266Z st.shared.v4.b32 [%r48], {%r8120, %r8118, %r8119, %r8117}; 2026-02-21T10:18:46.6325378Z st.shared.v4.b32 [%r49], {%r8126, %r8124, %r8125, %r8123}; 2026-02-21T10:18:46.6325483Z st.shared.v4.b32 [%r50], {%r8132, %r8130, %r8131, %r8129}; 2026-02-21T10:18:46.6325586Z st.shared.v4.b32 [%r51], {%r8138, %r8136, %r8137, %r8135}; 2026-02-21T10:18:46.6325646Z $L__tmp15: 2026-02-21T10:18:46.6325923Z .loc 2 291 36 // standard.py:291:36 @[ co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:84:40 ] 2026-02-21T10:18:46.6325986Z // begin inline asm 2026-02-21T10:18:46.6326072Z fence.proxy.async.shared::cta; 2026-02-21T10:18:46.6326130Z // end inline asm 2026-02-21T10:18:46.6326187Z bar.sync 0; 2026-02-21T10:18:46.6326268Z wgmma.fence.sync.aligned; 2026-02-21T10:18:46.6326330Z // begin inline asm 2026-02-21T10:18:46.6328085Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483,%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529}, {%r7316,%r7317,%r7318,%r7319}, %rd101, %p19, 1, 1; 2026-02-21T10:18:46.6328165Z // end inline asm 2026-02-21T10:18:46.6328229Z // begin inline asm 2026-02-21T10:18:46.6329733Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483,%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529}, {%r7448,%r7449,%r7450,%r7451}, %rd102, %p19, 1, 1; 2026-02-21T10:18:46.6329806Z // end inline asm 2026-02-21T10:18:46.6329869Z // begin inline asm 2026-02-21T10:18:46.6331364Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548,%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588,%r12589,%r12590,%r12591,%r12592,%r12593}, {%r7580,%r7581,%r7582,%r7583}, %rd101, %p19, 1, 1; 2026-02-21T10:18:46.6331432Z // end inline asm 2026-02-21T10:18:46.6331494Z // begin inline asm 2026-02-21T10:18:46.6332980Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548,%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588,%r12589,%r12590,%r12591,%r12592,%r12593}, {%r7712,%r7713,%r7714,%r7715}, %rd102, %p19, 1, 1; 2026-02-21T10:18:46.6333042Z // end inline asm 2026-02-21T10:18:46.6333191Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:46.6333262Z mov.b32 %r7844, %r5575; 2026-02-21T10:18:46.6333384Z mov.b32 %r7845, %r7846; 2026-02-21T10:18:46.6333449Z // begin inline asm 2026-02-21T10:18:46.6336018Z // wait for regs: 
%r12466,%r12467,%r12468,%r12469,%r12470,%r12471,%r12472,%r12473,%r12474,%r12475,%r12476,%r12477,%r12478,%r12479,%r12480,%r12481,%r12482,%r12483,%r12484,%r12485,%r12486,%r12487,%r12488,%r12489,%r12490,%r12491,%r12492,%r12493,%r12494,%r12495,%r12496,%r12497,%r12498,%r12499,%r12500,%r12501,%r12502,%r12503,%r12504,%r12505,%r12506,%r12507,%r12508,%r12509,%r12510,%r12511,%r12512,%r12513,%r12514,%r12515,%r12516,%r12517,%r12518,%r12519,%r12520,%r12521,%r12522,%r12523,%r12524,%r12525,%r12526,%r12527,%r12528,%r12529,%r12530,%r12531,%r12532,%r12533,%r12534,%r12535,%r12536,%r12537,%r12538,%r12539,%r12540,%r12541,%r12542,%r12543,%r12544,%r12545,%r12546,%r12547,%r12548,%r12549,%r12550,%r12551,%r12552,%r12553,%r12554,%r12555,%r12556,%r12557,%r12558,%r12559,%r12560,%r12561,%r12562,%r12563,%r12564,%r12565,%r12566,%r12567,%r12568,%r12569,%r12570,%r12571,%r12572,%r12573,%r12574,%r12575,%r12576,%r12577,%r12578,%r12579,%r12580,%r12581,%r12582,%r12583,%r12584,%r12585,%r12586,%r12587,%r12588,%r12589,%r12590,%r12591,%r12592,%r12593,%r7844,%r7845,%r7846 2026-02-21T10:18:46.6336105Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:46.6336171Z // end inline asm 2026-02-21T10:18:46.6336226Z $L__tmp16: 2026-02-21T10:18:46.6336635Z .loc 1 40 93 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:40:93 2026-02-21T10:18:46.6336722Z add.s64 %rd314, %rd314, 32; 2026-02-21T10:18:46.6336790Z add.s64 %rd313, %rd313, 128; 2026-02-21T10:18:46.6336856Z add.s64 %rd312, %rd312, 40960; 2026-02-21T10:18:46.6336938Z setp.lt.u64 %p35, %rd314, 4064; 2026-02-21T10:18:46.6337007Z @%p35 bra $L__BB0_6; 2026-02-21T10:18:46.6337121Z // %bb.7: // in Loop: Header=BB0_3 Depth=1 2026-02-21T10:18:46.6337334Z .loc 1 33 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:33:32 2026-02-21T10:18:46.6337411Z or.b32 %r8283, %r322, %r11; 2026-02-21T10:18:46.6337475Z or.b32 %r8284, %r322, %r12; 2026-02-21T10:18:46.6337538Z or.b32 %r8285, %r322, %r13; 2026-02-21T10:18:46.6337600Z or.b32 %r8286, %r322, %r14; 
2026-02-21T10:18:46.6337666Z or.b32 %r8287, %r322, %r15; 2026-02-21T10:18:46.6337731Z or.b32 %r8288, %r322, %r16; 2026-02-21T10:18:46.6337803Z or.b32 %r8289, %r322, %r17; 2026-02-21T10:18:46.6337874Z or.b32 %r8290, %r322, %r18; 2026-02-21T10:18:46.6337936Z or.b32 %r8291, %r322, %r19; 2026-02-21T10:18:46.6337998Z or.b32 %r8292, %r322, %r20; 2026-02-21T10:18:46.6338069Z or.b32 %r8293, %r322, %r21; 2026-02-21T10:18:46.6338129Z or.b32 %r8294, %r322, %r22; 2026-02-21T10:18:46.6338191Z or.b32 %r8295, %r322, %r23; 2026-02-21T10:18:46.6338251Z or.b32 %r8296, %r322, %r24; 2026-02-21T10:18:46.6338320Z or.b32 %r8297, %r322, %r25; 2026-02-21T10:18:46.6338380Z or.b32 %r8298, %r322, %r26; 2026-02-21T10:18:46.6338610Z .loc 1 87 28 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:87:28 2026-02-21T10:18:46.6338708Z cvt.rn.bf16x2.f32 %r8299, %r12467, %r12466; 2026-02-21T10:18:46.6338793Z cvt.rn.bf16x2.f32 %r8300, %r12469, %r12468; 2026-02-21T10:18:46.6338875Z cvt.rn.bf16x2.f32 %r8301, %r12471, %r12470; 2026-02-21T10:18:46.6338962Z cvt.rn.bf16x2.f32 %r8302, %r12473, %r12472; 2026-02-21T10:18:46.6339042Z cvt.rn.bf16x2.f32 %r8303, %r12475, %r12474; 2026-02-21T10:18:46.6339120Z cvt.rn.bf16x2.f32 %r8304, %r12477, %r12476; 2026-02-21T10:18:46.6339202Z cvt.rn.bf16x2.f32 %r8305, %r12479, %r12478; 2026-02-21T10:18:46.6339285Z cvt.rn.bf16x2.f32 %r8306, %r12481, %r12480; 2026-02-21T10:18:46.6339363Z cvt.rn.bf16x2.f32 %r8307, %r12483, %r12482; 2026-02-21T10:18:46.6339442Z cvt.rn.bf16x2.f32 %r8308, %r12485, %r12484; 2026-02-21T10:18:46.6339527Z cvt.rn.bf16x2.f32 %r8309, %r12487, %r12486; 2026-02-21T10:18:46.6339605Z cvt.rn.bf16x2.f32 %r8310, %r12489, %r12488; 2026-02-21T10:18:46.6339773Z cvt.rn.bf16x2.f32 %r8311, %r12491, %r12490; 2026-02-21T10:18:46.6340362Z [3119s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T10:18:46.6341650Z Config: @helion.kernel(config=helion.Config(block_sizes=[8, 128, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=1, num_stages=6, num_warps=4, pid_type='persistent_interleaved', range_flattens=[True, None], range_multi_buffers=[False, True], range_num_stages=[2, 4], range_unroll_factors=[2, 4], range_warp_specializes=[]), static_shapes=True) 2026-02-21T10:18:46.6341802Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T10:18:46.6341870Z `ptxas` stderr: 2026-02-21T10:18:46.6342330Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 545 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T10:18:46.6342435Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:18:46.6342443Z 2026-02-21T10:18:46.6343014Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp7d73yk9g.ptx -o /tmp/tmp7d73yk9g.ptx.o 2026-02-21T10:18:46.6343022Z 2026-02-21T10:18:46.6343179Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T10:18:46.6343314Z cvt.rn.bf16x2.f32 %r8312, %r12493, %r12492; 2026-02-21T10:18:46.6343403Z cvt.rn.bf16x2.f32 %r8313, %r12495, %r12494; 2026-02-21T10:18:46.6343484Z cvt.rn.bf16x2.f32 %r8314, %r12497, %r12496; 2026-02-21T10:18:46.6343575Z cvt.rn.bf16x2.f32 %r8315, %r12499, %r12498; 2026-02-21T10:18:46.6343665Z cvt.rn.bf16x2.f32 %r8316, %r12501, %r12500; 2026-02-21T10:18:46.6343741Z cvt.rn.bf16x2.f32 %r8317, %r12503, %r12502; 2026-02-21T10:18:46.6343818Z cvt.rn.bf16x2.f32 %r8318, %r12505, %r12504; 2026-02-21T10:18:46.6343899Z cvt.rn.bf16x2.f32 %r8319, %r12507, %r12506; 2026-02-21T10:18:46.6343977Z cvt.rn.bf16x2.f32 %r8320, %r12509, %r12508; 2026-02-21T10:18:46.6344057Z cvt.rn.bf16x2.f32 %r8321, %r12511, %r12510; 2026-02-21T10:18:46.6344136Z cvt.rn.bf16x2.f32 %r8322, %r12513, %r12512; 2026-02-21T10:18:46.6344218Z cvt.rn.bf16x2.f32 %r8323, %r12515, %r12514; 2026-02-21T10:18:46.6344293Z cvt.rn.bf16x2.f32 %r8324, %r12517, %r12516; 2026-02-21T10:18:46.6344371Z cvt.rn.bf16x2.f32 %r8325, %r12519, %r12518; 2026-02-21T10:18:46.6344453Z cvt.rn.bf16x2.f32 %r8326, %r12521, %r12520; 2026-02-21T10:18:46.6344528Z cvt.rn.bf16x2.f32 %r8327, %r12523, %r12522; 2026-02-21T10:18:46.6344604Z cvt.rn.bf16x2.f32 %r8328, %r12525, %r12524; 2026-02-21T10:18:46.6344681Z cvt.rn.bf16x2.f32 %r8329, %r12527, %r12526; 2026-02-21T10:18:46.6344763Z cvt.rn.bf16x2.f32 %r8330, %r12529, %r12528; 2026-02-21T10:18:46.6344839Z cvt.rn.bf16x2.f32 %r8331, %r12531, %r12530; 2026-02-21T10:18:46.6344916Z cvt.rn.bf16x2.f32 %r8332, %r12533, %r12532; 2026-02-21T10:18:46.6345000Z cvt.rn.bf16x2.f32 %r8333, %r12535, %r12534; 2026-02-21T10:18:46.6345078Z cvt.rn.bf16x2.f32 %r8334, %r12537, %r12536; 2026-02-21T10:18:46.6345156Z cvt.rn.bf16x2.f32 %r8335, %r12539, %r12538; 2026-02-21T10:18:46.6345243Z cvt.rn.bf16x2.f32 %r8336, %r12541, %r12540; 2026-02-21T10:18:46.6345330Z cvt.rn.bf16x2.f32 %r8337, %r12543, %r12542; 2026-02-21T10:18:46.6345407Z cvt.rn.bf16x2.f32 %r8338, %r12545, %r12544; 2026-02-21T10:18:46.6345488Z 
cvt.rn.bf16x2.f32 %r8339, %r12547, %r12546; 2026-02-21T10:18:46.6345570Z cvt.rn.bf16x2.f32 %r8340, %r12549, %r12548; 2026-02-21T10:18:46.6345646Z cvt.rn.bf16x2.f32 %r8341, %r12551, %r12550; 2026-02-21T10:18:46.6345723Z cvt.rn.bf16x2.f32 %r8342, %r12553, %r12552; 2026-02-21T10:18:46.6345805Z cvt.rn.bf16x2.f32 %r8343, %r12555, %r12554; 2026-02-21T10:18:46.6345880Z cvt.rn.bf16x2.f32 %r8344, %r12557, %r12556; 2026-02-21T10:18:46.6345958Z cvt.rn.bf16x2.f32 %r8345, %r12559, %r12558; 2026-02-21T10:18:46.6346041Z cvt.rn.bf16x2.f32 %r8346, %r12561, %r12560; 2026-02-21T10:18:46.6346197Z cvt.rn.bf16x2.f32 %r8347, %r12563, %r12562; 2026-02-21T10:18:46.6346321Z cvt.rn.bf16x2.f32 %r8348, %r12565, %r12564; 2026-02-21T10:18:46.6346399Z cvt.rn.bf16x2.f32 %r8349, %r12567, %r12566; 2026-02-21T10:18:46.6346616Z cvt.rn.bf16x2.f32 %r8350, %r12569, %r12568; 2026-02-21T10:18:46.6346700Z cvt.rn.bf16x2.f32 %r8351, %r12571, %r12570; 2026-02-21T10:18:46.6346776Z cvt.rn.bf16x2.f32 %r8352, %r12573, %r12572; 2026-02-21T10:18:46.6346864Z cvt.rn.bf16x2.f32 %r8353, %r12575, %r12574; 2026-02-21T10:18:46.6346943Z cvt.rn.bf16x2.f32 %r8354, %r12577, %r12576; 2026-02-21T10:18:46.6347020Z cvt.rn.bf16x2.f32 %r8355, %r12579, %r12578; 2026-02-21T10:18:46.6347101Z cvt.rn.bf16x2.f32 %r8356, %r12581, %r12580; 2026-02-21T10:18:46.6347180Z cvt.rn.bf16x2.f32 %r8357, %r12583, %r12582; 2026-02-21T10:18:46.6347255Z cvt.rn.bf16x2.f32 %r8358, %r12585, %r12584; 2026-02-21T10:18:46.6347331Z cvt.rn.bf16x2.f32 %r8359, %r12587, %r12586; 2026-02-21T10:18:46.6347414Z cvt.rn.bf16x2.f32 %r8360, %r12589, %r12588; 2026-02-21T10:18:46.6347491Z cvt.rn.bf16x2.f32 %r8361, %r12591, %r12590; 2026-02-21T10:18:46.6347571Z cvt.rn.bf16x2.f32 %r8362, %r12593, %r12592; 2026-02-21T10:18:46.6347903Z .loc 1 88 50 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:88:50 2026-02-21T10:18:46.6347986Z mad.lo.s32 %r8363, %r8283, 1280, %r321; 2026-02-21T10:18:46.6348060Z mad.lo.s32 %r8364, %r8284, 1280, %r321; 2026-02-21T10:18:46.6348191Z 
mad.lo.s32 %r8365, %r8285, 1280, %r321; 2026-02-21T10:18:46.6348263Z mad.lo.s32 %r8366, %r8286, 1280, %r321; 2026-02-21T10:18:46.6348331Z mad.lo.s32 %r8367, %r8287, 1280, %r321; 2026-02-21T10:18:46.6348399Z mad.lo.s32 %r8368, %r8288, 1280, %r321; 2026-02-21T10:18:46.6348549Z mad.lo.s32 %r8369, %r8289, 1280, %r321; 2026-02-21T10:18:46.6348618Z mad.lo.s32 %r8370, %r8290, 1280, %r321; 2026-02-21T10:18:46.6348687Z mad.lo.s32 %r8371, %r8291, 1280, %r321; 2026-02-21T10:18:46.6348760Z mad.lo.s32 %r8372, %r8292, 1280, %r321; 2026-02-21T10:18:46.6348832Z mad.lo.s32 %r8373, %r8293, 1280, %r321; 2026-02-21T10:18:46.6348902Z mad.lo.s32 %r8374, %r8294, 1280, %r321; 2026-02-21T10:18:46.6348978Z mad.lo.s32 %r8375, %r8295, 1280, %r321; 2026-02-21T10:18:46.6349053Z mad.lo.s32 %r8376, %r8296, 1280, %r321; 2026-02-21T10:18:46.6349123Z mad.lo.s32 %r8377, %r8297, 1280, %r321; 2026-02-21T10:18:46.6349192Z mad.lo.s32 %r8378, %r8298, 1280, %r321; 2026-02-21T10:18:46.6349414Z .loc 1 88 22 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:88:22 2026-02-21T10:18:46.6349489Z mad.wide.s32 %rd138, %r8363, 2, %rd27; 2026-02-21T10:18:46.6349560Z mad.wide.s32 %rd139, %r8364, 2, %rd27; 2026-02-21T10:18:46.6349635Z mad.wide.s32 %rd140, %r8365, 2, %rd27; 2026-02-21T10:18:46.6349704Z mad.wide.s32 %rd141, %r8366, 2, %rd27; 2026-02-21T10:18:46.6349772Z mad.wide.s32 %rd142, %r8367, 2, %rd27; 2026-02-21T10:18:46.6349847Z mad.wide.s32 %rd143, %r8368, 2, %rd27; 2026-02-21T10:18:46.6349915Z mad.wide.s32 %rd144, %r8369, 2, %rd27; 2026-02-21T10:18:46.6349985Z mad.wide.s32 %rd145, %r8370, 2, %rd27; 2026-02-21T10:18:46.6350055Z mad.wide.s32 %rd146, %r8371, 2, %rd27; 2026-02-21T10:18:46.6350130Z mad.wide.s32 %rd147, %r8372, 2, %rd27; 2026-02-21T10:18:46.6350200Z mad.wide.s32 %rd148, %r8373, 2, %rd27; 2026-02-21T10:18:46.6350269Z mad.wide.s32 %rd149, %r8374, 2, %rd27; 2026-02-21T10:18:46.6350344Z mad.wide.s32 %rd150, %r8375, 2, %rd27; 2026-02-21T10:18:46.6350415Z mad.wide.s32 %rd151, %r8376, 2, %rd27; 
2026-02-21T10:18:46.6350487Z mad.wide.s32 %rd152, %r8377, 2, %rd27; 2026-02-21T10:18:46.6350556Z mad.wide.s32 %rd153, %r8378, 2, %rd27; 2026-02-21T10:18:46.6350772Z .loc 1 88 81 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:88:81 2026-02-21T10:18:46.6350835Z bar.sync 0; 2026-02-21T10:18:46.6350951Z st.shared.v4.b32 [%r52], {%r8299, %r8301, %r8303, %r8305}; 2026-02-21T10:18:46.6351066Z st.shared.v4.b32 [%r53], {%r8307, %r8309, %r8311, %r8313}; 2026-02-21T10:18:46.6351262Z st.shared.v4.b32 [%r54], {%r8315, %r8317, %r8319, %r8321}; 2026-02-21T10:18:46.6351433Z st.shared.v4.b32 [%r55], {%r8323, %r8325, %r8327, %r8329}; 2026-02-21T10:18:46.6351502Z bar.sync 0; 2026-02-21T10:18:46.6351567Z // begin inline asm 2026-02-21T10:18:46.6351762Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8139, %r8140, %r8141, %r8142}, [%r4522]; 2026-02-21T10:18:46.6351823Z // end inline asm 2026-02-21T10:18:46.6351889Z // begin inline asm 2026-02-21T10:18:46.6352077Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8144, %r8145, %r8146, %r8147}, [%r4527]; 2026-02-21T10:18:46.6352135Z // end inline asm 2026-02-21T10:18:46.6352204Z // begin inline asm 2026-02-21T10:18:46.6352386Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8149, %r8150, %r8151, %r8152}, [%r4532]; 2026-02-21T10:18:46.6352443Z // end inline asm 2026-02-21T10:18:46.6352512Z // begin inline asm 2026-02-21T10:18:46.6352692Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8154, %r8155, %r8156, %r8157}, [%r4537]; 2026-02-21T10:18:46.6352753Z // end inline asm 2026-02-21T10:18:46.6352811Z bar.sync 0; 2026-02-21T10:18:46.6352924Z st.shared.v4.b32 [%r52], {%r8300, %r8302, %r8304, %r8306}; 2026-02-21T10:18:46.6353089Z st.shared.v4.b32 [%r53], {%r8308, %r8310, %r8312, %r8314}; 2026-02-21T10:18:46.6353200Z st.shared.v4.b32 [%r54], {%r8316, %r8318, %r8320, %r8322}; 2026-02-21T10:18:46.6353307Z st.shared.v4.b32 [%r55], {%r8324, %r8326, %r8328, %r8330}; 2026-02-21T10:18:46.6353364Z bar.sync 0; 2026-02-21T10:18:46.6353472Z // begin inline asm 
2026-02-21T10:18:46.6353655Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8159, %r8160, %r8161, %r8162}, [%r4522]; 2026-02-21T10:18:46.6353719Z // end inline asm 2026-02-21T10:18:46.6353779Z // begin inline asm 2026-02-21T10:18:46.6353958Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8164, %r8165, %r8166, %r8167}, [%r4527]; 2026-02-21T10:18:46.6354021Z // end inline asm 2026-02-21T10:18:46.6354081Z // begin inline asm 2026-02-21T10:18:46.6354261Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8169, %r8170, %r8171, %r8172}, [%r4532]; 2026-02-21T10:18:46.6354328Z // end inline asm 2026-02-21T10:18:46.6354388Z // begin inline asm 2026-02-21T10:18:46.6354571Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8174, %r8175, %r8176, %r8177}, [%r4537]; 2026-02-21T10:18:46.6354629Z // end inline asm 2026-02-21T10:18:46.6354702Z bar.sync 0; 2026-02-21T10:18:46.6354812Z st.shared.v4.b32 [%r52], {%r8331, %r8333, %r8335, %r8337}; 2026-02-21T10:18:46.6354920Z st.shared.v4.b32 [%r53], {%r8339, %r8341, %r8343, %r8345}; 2026-02-21T10:18:46.6355028Z st.shared.v4.b32 [%r54], {%r8347, %r8349, %r8351, %r8353}; 2026-02-21T10:18:46.6355131Z st.shared.v4.b32 [%r55], {%r8355, %r8357, %r8359, %r8361}; 2026-02-21T10:18:46.6355188Z bar.sync 0; 2026-02-21T10:18:46.6355252Z // begin inline asm 2026-02-21T10:18:46.6355434Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8179, %r8180, %r8181, %r8182}, [%r4522]; 2026-02-21T10:18:46.6355494Z // end inline asm 2026-02-21T10:18:46.6355554Z // begin inline asm 2026-02-21T10:18:46.6355741Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8184, %r8185, %r8186, %r8187}, [%r4527]; 2026-02-21T10:18:46.6355802Z // end inline asm 2026-02-21T10:18:46.6355862Z // begin inline asm 2026-02-21T10:18:46.6356049Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8189, %r8190, %r8191, %r8192}, [%r4532]; 2026-02-21T10:18:46.6356112Z // end inline asm 2026-02-21T10:18:46.6356173Z // begin inline asm 2026-02-21T10:18:46.6356354Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8194, %r8195, %r8196, 
%r8197}, [%r4537]; 2026-02-21T10:18:46.6356419Z // end inline asm 2026-02-21T10:18:46.6356612Z bar.sync 0; 2026-02-21T10:18:46.6356726Z st.shared.v4.b32 [%r52], {%r8332, %r8334, %r8336, %r8338}; 2026-02-21T10:18:46.6356837Z st.shared.v4.b32 [%r53], {%r8340, %r8342, %r8344, %r8346}; 2026-02-21T10:18:46.6356942Z st.shared.v4.b32 [%r54], {%r8348, %r8350, %r8352, %r8354}; 2026-02-21T10:18:46.6357044Z st.shared.v4.b32 [%r55], {%r8356, %r8358, %r8360, %r8362}; 2026-02-21T10:18:46.6357196Z bar.sync 0; 2026-02-21T10:18:46.6357258Z // begin inline asm 2026-02-21T10:18:46.6357508Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8199, %r8200, %r8201, %r8202}, [%r4522]; 2026-02-21T10:18:46.6357569Z // end inline asm 2026-02-21T10:18:46.6357635Z // begin inline asm 2026-02-21T10:18:46.6357821Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8204, %r8205, %r8206, %r8207}, [%r4527]; 2026-02-21T10:18:46.6357880Z // end inline asm 2026-02-21T10:18:46.6357946Z // begin inline asm 2026-02-21T10:18:46.6358130Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8209, %r8210, %r8211, %r8212}, [%r4532]; 2026-02-21T10:18:46.6358188Z // end inline asm 2026-02-21T10:18:46.6358252Z // begin inline asm 2026-02-21T10:18:46.6358434Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8214, %r8215, %r8216, %r8217}, [%r4537]; 2026-02-21T10:18:46.6358493Z // end inline asm 2026-02-21T10:18:46.6358565Z // begin inline asm 2026-02-21T10:18:46.6358713Z st.global.v4.b32 [ %rd138 + 0 ], { %r8139, %r8140, %r8141, %r8142 }; 2026-02-21T10:18:46.6358775Z // end inline asm 2026-02-21T10:18:46.6358836Z // begin inline asm 2026-02-21T10:18:46.6358967Z st.global.v4.b32 [ %rd139 + 0 ], { %r8159, %r8160, %r8161, %r8162 }; 2026-02-21T10:18:46.6359098Z // end inline asm 2026-02-21T10:18:46.6359165Z // begin inline asm 2026-02-21T10:18:46.6359291Z st.global.v4.b32 [ %rd140 + 0 ], { %r8144, %r8145, %r8146, %r8147 }; 2026-02-21T10:18:46.6359355Z // end inline asm 2026-02-21T10:18:46.6359416Z // begin inline asm 2026-02-21T10:18:46.6359597Z 
st.global.v4.b32 [ %rd141 + 0 ], { %r8164, %r8165, %r8166, %r8167 }; 2026-02-21T10:18:46.6359667Z // end inline asm 2026-02-21T10:18:46.6359729Z // begin inline asm 2026-02-21T10:18:46.6359846Z st.global.v4.b32 [ %rd142 + 0 ], { %r8149, %r8150, %r8151, %r8152 }; 2026-02-21T10:18:46.6359910Z // end inline asm 2026-02-21T10:18:46.6359970Z // begin inline asm 2026-02-21T10:18:46.6360087Z st.global.v4.b32 [ %rd143 + 0 ], { %r8169, %r8170, %r8171, %r8172 }; 2026-02-21T10:18:46.6360147Z // end inline asm 2026-02-21T10:18:46.6360215Z // begin inline asm 2026-02-21T10:18:46.6360332Z st.global.v4.b32 [ %rd144 + 0 ], { %r8154, %r8155, %r8156, %r8157 }; 2026-02-21T10:18:46.6360404Z // end inline asm 2026-02-21T10:18:46.6360475Z // begin inline asm 2026-02-21T10:18:46.6360593Z st.global.v4.b32 [ %rd145 + 0 ], { %r8174, %r8175, %r8176, %r8177 }; 2026-02-21T10:18:46.6360653Z // end inline asm 2026-02-21T10:18:46.6360712Z // begin inline asm 2026-02-21T10:18:46.6360834Z st.global.v4.b32 [ %rd146 + 0 ], { %r8179, %r8180, %r8181, %r8182 }; 2026-02-21T10:18:46.6360893Z // end inline asm 2026-02-21T10:18:46.6360954Z // begin inline asm 2026-02-21T10:18:46.6361075Z st.global.v4.b32 [ %rd147 + 0 ], { %r8199, %r8200, %r8201, %r8202 }; 2026-02-21T10:18:46.6361132Z // end inline asm 2026-02-21T10:18:46.6361193Z // begin inline asm 2026-02-21T10:18:46.6361314Z st.global.v4.b32 [ %rd148 + 0 ], { %r8184, %r8185, %r8186, %r8187 }; 2026-02-21T10:18:46.6361376Z // end inline asm 2026-02-21T10:18:46.6361435Z // begin inline asm 2026-02-21T10:18:46.6361552Z st.global.v4.b32 [ %rd149 + 0 ], { %r8204, %r8205, %r8206, %r8207 }; 2026-02-21T10:18:46.6361617Z // end inline asm 2026-02-21T10:18:46.6361680Z // begin inline asm 2026-02-21T10:18:46.6361799Z st.global.v4.b32 [ %rd150 + 0 ], { %r8189, %r8190, %r8191, %r8192 }; 2026-02-21T10:18:46.6361860Z // end inline asm 2026-02-21T10:18:46.6361920Z // begin inline asm 2026-02-21T10:18:46.6362037Z st.global.v4.b32 [ %rd151 + 0 ], { %r8209, %r8210, %r8211, 
%r8212 }; 2026-02-21T10:18:46.6362095Z // end inline asm 2026-02-21T10:18:46.6362159Z // begin inline asm 2026-02-21T10:18:46.6362279Z st.global.v4.b32 [ %rd152 + 0 ], { %r8194, %r8195, %r8196, %r8197 }; 2026-02-21T10:18:46.6362335Z // end inline asm 2026-02-21T10:18:46.6362399Z // begin inline asm 2026-02-21T10:18:46.6362514Z st.global.v4.b32 [ %rd153 + 0 ], { %r8214, %r8215, %r8216, %r8217 }; 2026-02-21T10:18:46.6362571Z // end inline asm 2026-02-21T10:18:46.6362798Z .loc 1 19 139 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:19:139 2026-02-21T10:18:46.6362980Z add.s32 %r12337, %r12337, 264; 2026-02-21T10:18:46.6363056Z setp.lt.s32 %p36, %r12337, %r12768; 2026-02-21T10:18:46.6363122Z @%p36 bra $L__BB0_3; 2026-02-21T10:18:46.6363187Z bra.uni $L__BB0_8; 2026-02-21T10:18:46.6363296Z $L__BB0_1: // %.._crit_edge_crit_edge 2026-02-21T10:18:46.6363511Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6363583Z and.b32 %r12594, %r3, 16; 2026-02-21T10:18:46.6363673Z $L__BB0_8: // %._crit_edge 2026-02-21T10:18:46.6363888Z .loc 1 19 139 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:19:139 2026-02-21T10:18:46.6363961Z sub.s32 %r8539, 5120, %r12768; 2026-02-21T10:18:46.6364035Z mul.hi.s32 %r8540, %r8539, 1041204193; 2026-02-21T10:18:46.6364100Z shr.u32 %r8541, %r8540, 31; 2026-02-21T10:18:46.6364164Z shr.s32 %r8542, %r8540, 5; 2026-02-21T10:18:46.6364236Z add.s32 %r581, %r8542, %r8541; 2026-02-21T10:18:46.6364302Z mul.lo.s32 %r8543, %r581, 132; 2026-02-21T10:18:46.6364426Z setp.ne.b32 %p37, %r8539, %r8543; 2026-02-21T10:18:46.6364503Z setp.gt.s32 %p38, %r8539, -1; 2026-02-21T10:18:46.6364572Z and.pred %p39, %p38, %p37; 2026-02-21T10:18:46.6364651Z selp.b32 %r582, 1, 0, %p39; 2026-02-21T10:18:46.6364718Z add.s32 %r583, %r581, %r582; 2026-02-21T10:18:46.6364838Z setp.lt.s32 %p40, %r583, 1; 2026-02-21T10:18:46.6364905Z setp.gt.s32 %p41, %r583, 0; 2026-02-21T10:18:46.6365115Z .loc 1 25 35 // 
co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:25:35 2026-02-21T10:18:46.6365185Z shr.s32 %r8544, %r12768, 31; 2026-02-21T10:18:46.6365249Z shr.u32 %r8545, %r8544, 17; 2026-02-21T10:18:46.6365317Z add.s32 %r8546, %r12768, %r8545; 2026-02-21T10:18:46.6365386Z shr.s32 %r8547, %r8546, 15; 2026-02-21T10:18:46.6365603Z .loc 1 26 33 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:26:33 2026-02-21T10:18:46.6365674Z shl.b32 %r584, %r8547, 6; 2026-02-21T10:18:46.6365882Z .loc 1 27 39 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:27:39 2026-02-21T10:18:46.6365953Z sub.s32 %r8548, 10, %r584; 2026-02-21T10:18:46.6366156Z .loc 1 27 52 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:27:52 2026-02-21T10:18:46.6366219Z min.s32 %r586, %r8548, 64; 2026-02-21T10:18:46.6366427Z .loc 1 28 45 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:28:45 2026-02-21T10:18:46.6366620Z and.b32 %r8549, %r8546, -32768; 2026-02-21T10:18:46.6366691Z sub.s32 %r585, %r12768, %r8549; 2026-02-21T10:18:46.6366904Z .loc 1 29 51 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:29:51 2026-02-21T10:18:46.6366969Z div.s32 %r587, %r585, %r586; 2026-02-21T10:18:46.6367172Z .loc 1 32 27 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:32:27 2026-02-21T10:18:46.6367238Z shl.b32 %r12599, %r587, 7; 2026-02-21T10:18:46.6367450Z .loc 1 33 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:33:32 2026-02-21T10:18:46.6367517Z or.b32 %r12769, %r12599, %r6; 2026-02-21T10:18:46.6367580Z or.b32 %r12770, %r12599, %r7; 2026-02-21T10:18:46.6367647Z or.b32 %r12771, %r12599, %r8; 2026-02-21T10:18:46.6367709Z or.b32 %r12772, %r12599, %r9; 2026-02-21T10:18:46.6367912Z .loc 1 48 53 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:53 2026-02-21T10:18:46.6367983Z shl.b32 %r8550, %r12769, 13; 2026-02-21T10:18:46.6368046Z shl.b32 %r8551, %r12770, 13; 2026-02-21T10:18:46.6368109Z shl.b32 %r8552, %r12771, 13; 2026-02-21T10:18:46.6368172Z 
shl.b32 %r8553, %r12772, 13; 2026-02-21T10:18:46.6368391Z .loc 1 48 60 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:60 2026-02-21T10:18:46.6368455Z or.b32 %r8554, %r8550, %r30; 2026-02-21T10:18:46.6368610Z or.b32 %r8555, %r8551, %r30; 2026-02-21T10:18:46.6368759Z or.b32 %r8556, %r8552, %r30; 2026-02-21T10:18:46.6368822Z or.b32 %r8557, %r8553, %r30; 2026-02-21T10:18:46.6369035Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6369118Z mad.wide.s32 %rd154, %r8554, 2, %rd25; 2026-02-21T10:18:46.6369193Z mad.wide.s32 %rd155, %r8555, 2, %rd25; 2026-02-21T10:18:46.6369265Z mad.wide.s32 %rd156, %r8556, 2, %rd25; 2026-02-21T10:18:46.6369334Z mad.wide.s32 %rd157, %r8557, 2, %rd25; 2026-02-21T10:18:46.6369550Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6369611Z bar.sync 0; 2026-02-21T10:18:46.6369674Z and.b32 %r8558, %r27, 888; 2026-02-21T10:18:46.6369749Z setp.eq.b32 %p42, %r12594, 0; 2026-02-21T10:18:46.6369815Z selp.b32 %r593, 0, 136, %p42; 2026-02-21T10:18:46.6369878Z xor.b32 %r594, %r593, %r8558; 2026-02-21T10:18:46.6369956Z add.s32 %r8379, %r5575, %r594; 2026-02-21T10:18:46.6370033Z selp.b32 %r8380, 8, 0, %p41; 2026-02-21T10:18:46.6370099Z // begin inline asm 2026-02-21T10:18:46.6370316Z cp.async.ca.shared.global [ %r8379 + 0 ], [ %rd154 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6370387Z // end inline asm 2026-02-21T10:18:46.6370454Z add.s32 %r8381, %r8379, 1024; 2026-02-21T10:18:46.6370517Z // begin inline asm 2026-02-21T10:18:46.6370723Z cp.async.ca.shared.global [ %r8381 + 0 ], [ %rd155 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6370787Z // end inline asm 2026-02-21T10:18:46.6370851Z add.s32 %r8383, %r8379, 2048; 2026-02-21T10:18:46.6370913Z // begin inline asm 2026-02-21T10:18:46.6371056Z cp.async.ca.shared.global [ %r8383 + 0 ], [ %rd156 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6371115Z // end inline asm 2026-02-21T10:18:46.6371178Z add.s32 %r8385, 
%r8379, 3072; 2026-02-21T10:18:46.6371244Z // begin inline asm 2026-02-21T10:18:46.6371379Z cp.async.ca.shared.global [ %r8385 + 0 ], [ %rd157 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6371439Z // end inline asm 2026-02-21T10:18:46.6371515Z cp.async.commit_group; 2026-02-21T10:18:46.6371740Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6371810Z cvt.s64.s32 %rd234, %r8550; 2026-02-21T10:18:46.6371875Z cvt.u64.u32 %rd235, %r30; 2026-02-21T10:18:46.6371947Z or.b64 %rd236, %rd234, %rd235; 2026-02-21T10:18:46.6372013Z shl.b64 %rd237, %rd236, 1; 2026-02-21T10:18:46.6372082Z add.s64 %rd238, %rd25, %rd237; 2026-02-21T10:18:46.6372154Z add.s64 %rd158, %rd238, 32; 2026-02-21T10:18:46.6372218Z cvt.s64.s32 %rd239, %r8551; 2026-02-21T10:18:46.6372283Z or.b64 %rd240, %rd239, %rd235; 2026-02-21T10:18:46.6372347Z shl.b64 %rd241, %rd240, 1; 2026-02-21T10:18:46.6372418Z add.s64 %rd242, %rd25, %rd241; 2026-02-21T10:18:46.6372481Z add.s64 %rd159, %rd242, 32; 2026-02-21T10:18:46.6372543Z cvt.s64.s32 %rd243, %r8552; 2026-02-21T10:18:46.6372616Z or.b64 %rd244, %rd243, %rd235; 2026-02-21T10:18:46.6372681Z shl.b64 %rd245, %rd244, 1; 2026-02-21T10:18:46.6372747Z add.s64 %rd246, %rd25, %rd245; 2026-02-21T10:18:46.6372812Z add.s64 %rd160, %rd246, 32; 2026-02-21T10:18:46.6372882Z cvt.s64.s32 %rd247, %r8553; 2026-02-21T10:18:46.6372947Z or.b64 %rd248, %rd247, %rd235; 2026-02-21T10:18:46.6373012Z shl.b64 %rd249, %rd248, 1; 2026-02-21T10:18:46.6373080Z add.s64 %rd250, %rd25, %rd249; 2026-02-21T10:18:46.6373144Z add.s64 %rd161, %rd250, 32; 2026-02-21T10:18:46.6373362Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6373433Z add.s32 %r8387, %r8379, 20480; 2026-02-21T10:18:46.6373497Z // begin inline asm 2026-02-21T10:18:46.6373635Z cp.async.ca.shared.global [ %r8387 + 0 ], [ %rd158 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6373695Z // end inline asm 2026-02-21T10:18:46.6373762Z add.s32 %r8389, %r8379, 
21504; 2026-02-21T10:18:46.6373824Z // begin inline asm 2026-02-21T10:18:46.6374020Z cp.async.ca.shared.global [ %r8389 + 0 ], [ %rd159 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6374130Z // end inline asm 2026-02-21T10:18:46.6374192Z add.s32 %r8391, %r8379, 22528; 2026-02-21T10:18:46.6374256Z // begin inline asm 2026-02-21T10:18:46.6374390Z cp.async.ca.shared.global [ %r8391 + 0 ], [ %rd160 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6374454Z // end inline asm 2026-02-21T10:18:46.6374517Z add.s32 %r8393, %r8379, 23552; 2026-02-21T10:18:46.6374576Z // begin inline asm 2026-02-21T10:18:46.6374722Z cp.async.ca.shared.global [ %r8393 + 0 ], [ %rd161 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6374788Z // end inline asm 2026-02-21T10:18:46.6374857Z cp.async.commit_group; 2026-02-21T10:18:46.6375068Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6375134Z add.s64 %rd162, %rd238, 64; 2026-02-21T10:18:46.6375197Z add.s64 %rd163, %rd242, 64; 2026-02-21T10:18:46.6375260Z add.s64 %rd164, %rd246, 64; 2026-02-21T10:18:46.6375330Z add.s64 %rd165, %rd250, 64; 2026-02-21T10:18:46.6375534Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6375657Z add.s32 %r8395, %r8379, 40960; 2026-02-21T10:18:46.6375728Z // begin inline asm 2026-02-21T10:18:46.6375866Z cp.async.ca.shared.global [ %r8395 + 0 ], [ %rd162 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6375925Z // end inline asm 2026-02-21T10:18:46.6376035Z add.s32 %r8397, %r8379, 41984; 2026-02-21T10:18:46.6376108Z // begin inline asm 2026-02-21T10:18:46.6376242Z cp.async.ca.shared.global [ %r8397 + 0 ], [ %rd163 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6376300Z // end inline asm 2026-02-21T10:18:46.6376379Z add.s32 %r8399, %r8379, 43008; 2026-02-21T10:18:46.6376443Z // begin inline asm 2026-02-21T10:18:46.6376703Z cp.async.ca.shared.global [ %r8399 + 0 ], [ %rd164 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6376770Z // end inline asm 
2026-02-21T10:18:46.6376837Z add.s32 %r8401, %r8379, 44032; 2026-02-21T10:18:46.6376905Z // begin inline asm 2026-02-21T10:18:46.6377039Z cp.async.ca.shared.global [ %r8401 + 0 ], [ %rd165 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6377110Z // end inline asm 2026-02-21T10:18:46.6377180Z cp.async.commit_group; 2026-02-21T10:18:46.6377386Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6377456Z add.s64 %rd166, %rd238, 96; 2026-02-21T10:18:46.6377522Z add.s64 %rd167, %rd242, 96; 2026-02-21T10:18:46.6377587Z add.s64 %rd168, %rd246, 96; 2026-02-21T10:18:46.6377650Z add.s64 %rd169, %rd250, 96; 2026-02-21T10:18:46.6377872Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6377937Z add.s32 %r8403, %r8379, 61440; 2026-02-21T10:18:46.6378009Z // begin inline asm 2026-02-21T10:18:46.6378157Z cp.async.ca.shared.global [ %r8403 + 0 ], [ %rd166 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6378215Z // end inline asm 2026-02-21T10:18:46.6384198Z add.s32 %r8405, %r8379, 62464; 2026-02-21T10:18:46.6384288Z // begin inline asm 2026-02-21T10:18:46.6384460Z cp.async.ca.shared.global [ %r8405 + 0 ], [ %rd167 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6384526Z // end inline asm 2026-02-21T10:18:46.6384597Z add.s32 %r8407, %r8379, 63488; 2026-02-21T10:18:46.6384661Z // begin inline asm 2026-02-21T10:18:46.6384813Z cp.async.ca.shared.global [ %r8407 + 0 ], [ %rd168 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6384886Z // end inline asm 2026-02-21T10:18:46.6384954Z add.s32 %r8409, %r8379, 64512; 2026-02-21T10:18:46.6385019Z // begin inline asm 2026-02-21T10:18:46.6385175Z cp.async.ca.shared.global [ %r8409 + 0 ], [ %rd169 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6385237Z // end inline asm 2026-02-21T10:18:46.6385306Z cp.async.commit_group; 2026-02-21T10:18:46.6385535Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6385611Z add.s64 %rd170, %rd238, 128; 
2026-02-21T10:18:46.6385804Z add.s64 %rd171, %rd242, 128; 2026-02-21T10:18:46.6385868Z add.s64 %rd172, %rd246, 128; 2026-02-21T10:18:46.6386014Z add.s64 %rd173, %rd250, 128; 2026-02-21T10:18:46.6386237Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6386302Z bar.sync 0; 2026-02-21T10:18:46.6386379Z add.s32 %r8411, %r8379, 4096; 2026-02-21T10:18:46.6386443Z // begin inline asm 2026-02-21T10:18:46.6386766Z cp.async.ca.shared.global [ %r8411 + 0 ], [ %rd170 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6386829Z // end inline asm 2026-02-21T10:18:46.6386900Z add.s32 %r8413, %r8379, 5120; 2026-02-21T10:18:46.6386972Z // begin inline asm 2026-02-21T10:18:46.6387112Z cp.async.ca.shared.global [ %r8413 + 0 ], [ %rd171 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6387178Z // end inline asm 2026-02-21T10:18:46.6387239Z add.s32 %r8415, %r8379, 6144; 2026-02-21T10:18:46.6387298Z // begin inline asm 2026-02-21T10:18:46.6387431Z cp.async.ca.shared.global [ %r8415 + 0 ], [ %rd172 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6387497Z // end inline asm 2026-02-21T10:18:46.6387561Z add.s32 %r8417, %r8379, 7168; 2026-02-21T10:18:46.6387702Z // begin inline asm 2026-02-21T10:18:46.6387844Z cp.async.ca.shared.global [ %r8417 + 0 ], [ %rd173 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6387912Z // end inline asm 2026-02-21T10:18:46.6387984Z cp.async.commit_group; 2026-02-21T10:18:46.6388271Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6388341Z add.s64 %rd174, %rd238, 160; 2026-02-21T10:18:46.6388403Z add.s64 %rd175, %rd242, 160; 2026-02-21T10:18:46.6388556Z add.s64 %rd176, %rd246, 160; 2026-02-21T10:18:46.6388627Z add.s64 %rd177, %rd250, 160; 2026-02-21T10:18:46.6388842Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6388909Z add.s32 %r8419, %r8379, 24576; 2026-02-21T10:18:46.6388974Z // begin inline asm 2026-02-21T10:18:46.6389119Z 
cp.async.ca.shared.global [ %r8419 + 0 ], [ %rd174 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6389178Z // end inline asm 2026-02-21T10:18:46.6389245Z add.s32 %r8421, %r8379, 25600; 2026-02-21T10:18:46.6389310Z // begin inline asm 2026-02-21T10:18:46.6389444Z cp.async.ca.shared.global [ %r8421 + 0 ], [ %rd175 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6389503Z // end inline asm 2026-02-21T10:18:46.6389570Z add.s32 %r8423, %r8379, 26624; 2026-02-21T10:18:46.6389630Z // begin inline asm 2026-02-21T10:18:46.6389764Z cp.async.ca.shared.global [ %r8423 + 0 ], [ %rd176 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6389827Z // end inline asm 2026-02-21T10:18:46.6389888Z add.s32 %r8425, %r8379, 27648; 2026-02-21T10:18:46.6389948Z // begin inline asm 2026-02-21T10:18:46.6390100Z cp.async.ca.shared.global [ %r8425 + 0 ], [ %rd177 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6390164Z // end inline asm 2026-02-21T10:18:46.6390230Z cp.async.commit_group; 2026-02-21T10:18:46.6390439Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6390515Z add.s64 %rd178, %rd238, 192; 2026-02-21T10:18:46.6390577Z add.s64 %rd179, %rd242, 192; 2026-02-21T10:18:46.6390639Z add.s64 %rd180, %rd246, 192; 2026-02-21T10:18:46.6390701Z add.s64 %rd181, %rd250, 192; 2026-02-21T10:18:46.6390907Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6390972Z add.s32 %r8427, %r8379, 45056; 2026-02-21T10:18:46.6391033Z // begin inline asm 2026-02-21T10:18:46.6391173Z cp.async.ca.shared.global [ %r8427 + 0 ], [ %rd178 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6391230Z // end inline asm 2026-02-21T10:18:46.6391291Z add.s32 %r8429, %r8379, 46080; 2026-02-21T10:18:46.6391356Z // begin inline asm 2026-02-21T10:18:46.6391486Z cp.async.ca.shared.global [ %r8429 + 0 ], [ %rd179 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6391545Z // end inline asm 2026-02-21T10:18:46.6391608Z add.s32 %r8431, %r8379, 47104; 2026-02-21T10:18:46.6391763Z // begin 
inline asm 2026-02-21T10:18:46.6391955Z cp.async.ca.shared.global [ %r8431 + 0 ], [ %rd180 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6392014Z // end inline asm 2026-02-21T10:18:46.6392080Z add.s32 %r8433, %r8379, 48128; 2026-02-21T10:18:46.6392142Z // begin inline asm 2026-02-21T10:18:46.6392274Z cp.async.ca.shared.global [ %r8433 + 0 ], [ %rd181 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6392335Z // end inline asm 2026-02-21T10:18:46.6392409Z cp.async.commit_group; 2026-02-21T10:18:46.6392615Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6392689Z add.s64 %rd182, %rd238, 224; 2026-02-21T10:18:46.6392762Z add.s64 %rd183, %rd242, 224; 2026-02-21T10:18:46.6392826Z add.s64 %rd184, %rd246, 224; 2026-02-21T10:18:46.6392892Z add.s64 %rd185, %rd250, 224; 2026-02-21T10:18:46.6393101Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6393167Z add.s32 %r8435, %r8379, 65536; 2026-02-21T10:18:46.6393230Z // begin inline asm 2026-02-21T10:18:46.6393418Z cp.async.ca.shared.global [ %r8435 + 0 ], [ %rd182 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6393486Z // end inline asm 2026-02-21T10:18:46.6393552Z add.s32 %r8437, %r8379, 66560; 2026-02-21T10:18:46.6393612Z // begin inline asm 2026-02-21T10:18:46.6393772Z cp.async.ca.shared.global [ %r8437 + 0 ], [ %rd183 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6393877Z // end inline asm 2026-02-21T10:18:46.6393945Z add.s32 %r8439, %r8379, 67584; 2026-02-21T10:18:46.6394013Z // begin inline asm 2026-02-21T10:18:46.6394150Z cp.async.ca.shared.global [ %r8439 + 0 ], [ %rd184 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6394210Z // end inline asm 2026-02-21T10:18:46.6394273Z add.s32 %r8441, %r8379, 68608; 2026-02-21T10:18:46.6394339Z // begin inline asm 2026-02-21T10:18:46.6394469Z cp.async.ca.shared.global [ %r8441 + 0 ], [ %rd185 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6394530Z // end inline asm 2026-02-21T10:18:46.6394606Z cp.async.commit_group; 
2026-02-21T10:18:46.6394822Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6394885Z add.s64 %rd186, %rd238, 256; 2026-02-21T10:18:46.6394951Z add.s64 %rd187, %rd242, 256; 2026-02-21T10:18:46.6395022Z add.s64 %rd188, %rd246, 256; 2026-02-21T10:18:46.6395083Z add.s64 %rd189, %rd250, 256; 2026-02-21T10:18:46.6395286Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6395355Z bar.sync 0; 2026-02-21T10:18:46.6395417Z add.s32 %r8443, %r8379, 8192; 2026-02-21T10:18:46.6395490Z // begin inline asm 2026-02-21T10:18:46.6395633Z cp.async.ca.shared.global [ %r8443 + 0 ], [ %rd186 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6395693Z // end inline asm 2026-02-21T10:18:46.6395754Z add.s32 %r8445, %r8379, 9216; 2026-02-21T10:18:46.6395813Z // begin inline asm 2026-02-21T10:18:46.6395958Z cp.async.ca.shared.global [ %r8445 + 0 ], [ %rd187 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6396016Z // end inline asm 2026-02-21T10:18:46.6396079Z add.s32 %r8447, %r8379, 10240; 2026-02-21T10:18:46.6396149Z // begin inline asm 2026-02-21T10:18:46.6396281Z cp.async.ca.shared.global [ %r8447 + 0 ], [ %rd188 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6396340Z // end inline asm 2026-02-21T10:18:46.6396402Z add.s32 %r8449, %r8379, 11264; 2026-02-21T10:18:46.6396599Z // begin inline asm 2026-02-21T10:18:46.6396740Z cp.async.ca.shared.global [ %r8449 + 0 ], [ %rd189 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6396797Z // end inline asm 2026-02-21T10:18:46.6396872Z cp.async.commit_group; 2026-02-21T10:18:46.6397073Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6397136Z add.s64 %rd190, %rd238, 288; 2026-02-21T10:18:46.6397203Z add.s64 %rd191, %rd242, 288; 2026-02-21T10:18:46.6397264Z add.s64 %rd192, %rd246, 288; 2026-02-21T10:18:46.6397443Z add.s64 %rd193, %rd250, 288; 2026-02-21T10:18:46.6397707Z .loc 1 48 80 // 
co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6397781Z add.s32 %r8451, %r8379, 28672; 2026-02-21T10:18:46.6397842Z // begin inline asm 2026-02-21T10:18:46.6397975Z cp.async.ca.shared.global [ %r8451 + 0 ], [ %rd190 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6398037Z // end inline asm 2026-02-21T10:18:46.6398100Z add.s32 %r8453, %r8379, 29696; 2026-02-21T10:18:46.6398160Z // begin inline asm 2026-02-21T10:18:46.6398290Z cp.async.ca.shared.global [ %r8453 + 0 ], [ %rd191 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6398354Z // end inline asm 2026-02-21T10:18:46.6398415Z add.s32 %r8455, %r8379, 30720; 2026-02-21T10:18:46.6398477Z // begin inline asm 2026-02-21T10:18:46.6398630Z cp.async.ca.shared.global [ %r8455 + 0 ], [ %rd192 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6398690Z // end inline asm 2026-02-21T10:18:46.6398752Z add.s32 %r8457, %r8379, 31744; 2026-02-21T10:18:46.6398819Z // begin inline asm 2026-02-21T10:18:46.6398962Z cp.async.ca.shared.global [ %r8457 + 0 ], [ %rd193 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6399094Z // end inline asm 2026-02-21T10:18:46.6399169Z cp.async.commit_group; 2026-02-21T10:18:46.6399395Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6399459Z add.s64 %rd194, %rd238, 320; 2026-02-21T10:18:46.6399582Z add.s64 %rd195, %rd242, 320; 2026-02-21T10:18:46.6399652Z add.s64 %rd196, %rd246, 320; 2026-02-21T10:18:46.6399715Z add.s64 %rd197, %rd250, 320; 2026-02-21T10:18:46.6399923Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6399993Z add.s32 %r8459, %r8379, 49152; 2026-02-21T10:18:46.6400056Z // begin inline asm 2026-02-21T10:18:46.6400195Z cp.async.ca.shared.global [ %r8459 + 0 ], [ %rd194 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6400252Z // end inline asm 2026-02-21T10:18:46.6400331Z add.s32 %r8461, %r8379, 50176; 2026-02-21T10:18:46.6400396Z // begin inline asm 2026-02-21T10:18:46.6400533Z 
cp.async.ca.shared.global [ %r8461 + 0 ], [ %rd195 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6400595Z // end inline asm 2026-02-21T10:18:46.6400656Z add.s32 %r8463, %r8379, 51200; 2026-02-21T10:18:46.6400715Z // begin inline asm 2026-02-21T10:18:46.6400848Z cp.async.ca.shared.global [ %r8463 + 0 ], [ %rd196 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6400912Z // end inline asm 2026-02-21T10:18:46.6400971Z add.s32 %r8465, %r8379, 52224; 2026-02-21T10:18:46.6401030Z // begin inline asm 2026-02-21T10:18:46.6401165Z cp.async.ca.shared.global [ %r8465 + 0 ], [ %rd197 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6401221Z // end inline asm 2026-02-21T10:18:46.6401287Z cp.async.commit_group; 2026-02-21T10:18:46.6401498Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6401563Z add.s64 %rd198, %rd238, 352; 2026-02-21T10:18:46.6401626Z add.s64 %rd199, %rd242, 352; 2026-02-21T10:18:46.6401691Z add.s64 %rd200, %rd246, 352; 2026-02-21T10:18:46.6401767Z add.s64 %rd201, %rd250, 352; 2026-02-21T10:18:46.6401975Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6402039Z add.s32 %r8467, %r8379, 69632; 2026-02-21T10:18:46.6402103Z // begin inline asm 2026-02-21T10:18:46.6402239Z cp.async.ca.shared.global [ %r8467 + 0 ], [ %rd198 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6402299Z // end inline asm 2026-02-21T10:18:46.6402361Z add.s32 %r8469, %r8379, 70656; 2026-02-21T10:18:46.6402430Z // begin inline asm 2026-02-21T10:18:46.6402574Z cp.async.ca.shared.global [ %r8469 + 0 ], [ %rd199 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6402634Z // end inline asm 2026-02-21T10:18:46.6402704Z add.s32 %r8471, %r8379, 71680; 2026-02-21T10:18:46.6402764Z // begin inline asm 2026-02-21T10:18:46.6402899Z cp.async.ca.shared.global [ %r8471 + 0 ], [ %rd200 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6403030Z // end inline asm 2026-02-21T10:18:46.6403138Z add.s32 %r8473, %r8379, 72704; 2026-02-21T10:18:46.6403197Z // begin 
inline asm 2026-02-21T10:18:46.6403341Z cp.async.ca.shared.global [ %r8473 + 0 ], [ %rd201 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6403408Z // end inline asm 2026-02-21T10:18:46.6403476Z cp.async.commit_group; 2026-02-21T10:18:46.6403683Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6403753Z add.s64 %rd202, %rd238, 384; 2026-02-21T10:18:46.6403816Z add.s64 %rd203, %rd242, 384; 2026-02-21T10:18:46.6403878Z add.s64 %rd204, %rd246, 384; 2026-02-21T10:18:46.6403942Z add.s64 %rd205, %rd250, 384; 2026-02-21T10:18:46.6404150Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6404210Z bar.sync 0; 2026-02-21T10:18:46.6404273Z add.s32 %r8475, %r8379, 12288; 2026-02-21T10:18:46.6404342Z // begin inline asm 2026-02-21T10:18:46.6404476Z cp.async.ca.shared.global [ %r8475 + 0 ], [ %rd202 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6404537Z // end inline asm 2026-02-21T10:18:46.6404667Z add.s32 %r8477, %r8379, 13312; 2026-02-21T10:18:46.6404735Z // begin inline asm 2026-02-21T10:18:46.6404870Z cp.async.ca.shared.global [ %r8477 + 0 ], [ %rd203 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6404929Z // end inline asm 2026-02-21T10:18:46.6405044Z add.s32 %r8479, %r8379, 14336; 2026-02-21T10:18:46.6405107Z // begin inline asm 2026-02-21T10:18:46.6405242Z cp.async.ca.shared.global [ %r8479 + 0 ], [ %rd204 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6405304Z // end inline asm 2026-02-21T10:18:46.6405364Z add.s32 %r8481, %r8379, 15360; 2026-02-21T10:18:46.6405422Z // begin inline asm 2026-02-21T10:18:46.6405553Z cp.async.ca.shared.global [ %r8481 + 0 ], [ %rd205 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6405618Z // end inline asm 2026-02-21T10:18:46.6405685Z cp.async.commit_group; 2026-02-21T10:18:46.6405890Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6405961Z add.s64 %rd206, %rd238, 416; 2026-02-21T10:18:46.6406024Z add.s64 %rd207, %rd242, 416; 
2026-02-21T10:18:46.6406085Z add.s64 %rd208, %rd246, 416; 2026-02-21T10:18:46.6406152Z add.s64 %rd209, %rd250, 416; 2026-02-21T10:18:46.6406367Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6406438Z add.s32 %r8483, %r8379, 32768; 2026-02-21T10:18:46.6406630Z // begin inline asm 2026-02-21T10:18:46.6406773Z cp.async.ca.shared.global [ %r8483 + 0 ], [ %rd206 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6406832Z // end inline asm 2026-02-21T10:18:46.6406894Z add.s32 %r8485, %r8379, 33792; 2026-02-21T10:18:46.6406960Z // begin inline asm 2026-02-21T10:18:46.6407092Z cp.async.ca.shared.global [ %r8485 + 0 ], [ %rd207 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6407151Z // end inline asm 2026-02-21T10:18:46.6407214Z add.s32 %r8487, %r8379, 34816; 2026-02-21T10:18:46.6407279Z // begin inline asm 2026-02-21T10:18:46.6407413Z cp.async.ca.shared.global [ %r8487 + 0 ], [ %rd208 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6407471Z // end inline asm 2026-02-21T10:18:46.6407540Z add.s32 %r8489, %r8379, 35840; 2026-02-21T10:18:46.6407598Z // begin inline asm 2026-02-21T10:18:46.6407729Z cp.async.ca.shared.global [ %r8489 + 0 ], [ %rd209 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6407793Z // end inline asm 2026-02-21T10:18:46.6407860Z cp.async.commit_group; 2026-02-21T10:18:46.6408061Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6408125Z add.s64 %rd210, %rd238, 448; 2026-02-21T10:18:46.6408195Z add.s64 %rd211, %rd242, 448; 2026-02-21T10:18:46.6408257Z add.s64 %rd212, %rd246, 448; 2026-02-21T10:18:46.6408319Z add.s64 %rd213, %rd250, 448; 2026-02-21T10:18:46.6408534Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6408689Z add.s32 %r8491, %r8379, 53248; 2026-02-21T10:18:46.6408808Z // begin inline asm 2026-02-21T10:18:46.6408951Z cp.async.ca.shared.global [ %r8491 + 0 ], [ %rd210 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6409010Z // end inline 
asm 2026-02-21T10:18:46.6409073Z add.s32 %r8493, %r8379, 54272; 2026-02-21T10:18:46.6409133Z // begin inline asm 2026-02-21T10:18:46.6409275Z cp.async.ca.shared.global [ %r8493 + 0 ], [ %rd211 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6409333Z // end inline asm 2026-02-21T10:18:46.6409394Z add.s32 %r8495, %r8379, 55296; 2026-02-21T10:18:46.6409460Z // begin inline asm 2026-02-21T10:18:46.6409591Z cp.async.ca.shared.global [ %r8495 + 0 ], [ %rd212 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6409647Z // end inline asm 2026-02-21T10:18:46.6409708Z add.s32 %r8497, %r8379, 56320; 2026-02-21T10:18:46.6409772Z // begin inline asm 2026-02-21T10:18:46.6409903Z cp.async.ca.shared.global [ %r8497 + 0 ], [ %rd213 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6409964Z // end inline asm 2026-02-21T10:18:46.6410039Z cp.async.commit_group; 2026-02-21T10:18:46.6410316Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6410384Z add.s64 %rd214, %rd238, 480; 2026-02-21T10:18:46.6410454Z add.s64 %rd215, %rd242, 480; 2026-02-21T10:18:46.6410517Z add.s64 %rd216, %rd246, 480; 2026-02-21T10:18:46.6410578Z add.s64 %rd217, %rd250, 480; 2026-02-21T10:18:46.6410845Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6410916Z add.s32 %r8499, %r8379, 73728; 2026-02-21T10:18:46.6410975Z // begin inline asm 2026-02-21T10:18:46.6411109Z cp.async.ca.shared.global [ %r8499 + 0 ], [ %rd214 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6411177Z // end inline asm 2026-02-21T10:18:46.6411247Z add.s32 %r8501, %r8379, 74752; 2026-02-21T10:18:46.6411307Z // begin inline asm 2026-02-21T10:18:46.6411443Z cp.async.ca.shared.global [ %r8501 + 0 ], [ %rd215 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6411509Z // end inline asm 2026-02-21T10:18:46.6411570Z add.s32 %r8503, %r8379, 75776; 2026-02-21T10:18:46.6411631Z // begin inline asm 2026-02-21T10:18:46.6411769Z cp.async.ca.shared.global [ %r8503 + 0 ], [ %rd216 + 0 ], 0x8, %r8380; 
2026-02-21T10:18:46.6411827Z // end inline asm 2026-02-21T10:18:46.6411889Z add.s32 %r8505, %r8379, 76800; 2026-02-21T10:18:46.6411955Z // begin inline asm 2026-02-21T10:18:46.6412087Z cp.async.ca.shared.global [ %r8505 + 0 ], [ %rd217 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6412144Z // end inline asm 2026-02-21T10:18:46.6412211Z cp.async.commit_group; 2026-02-21T10:18:46.6412420Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6412483Z add.s64 %rd218, %rd238, 512; 2026-02-21T10:18:46.6412544Z add.s64 %rd219, %rd242, 512; 2026-02-21T10:18:46.6412612Z add.s64 %rd220, %rd246, 512; 2026-02-21T10:18:46.6412676Z add.s64 %rd221, %rd250, 512; 2026-02-21T10:18:46.6412875Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6412937Z bar.sync 0; 2026-02-21T10:18:46.6413006Z add.s32 %r8507, %r8379, 16384; 2026-02-21T10:18:46.6413065Z // begin inline asm 2026-02-21T10:18:46.6413204Z cp.async.ca.shared.global [ %r8507 + 0 ], [ %rd218 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6413263Z // end inline asm 2026-02-21T10:18:46.6413325Z add.s32 %r8509, %r8379, 17408; 2026-02-21T10:18:46.6413384Z // begin inline asm 2026-02-21T10:18:46.6413533Z cp.async.ca.shared.global [ %r8509 + 0 ], [ %rd219 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6413591Z // end inline asm 2026-02-21T10:18:46.6413651Z add.s32 %r8511, %r8379, 18432; 2026-02-21T10:18:46.6413715Z // begin inline asm 2026-02-21T10:18:46.6413844Z cp.async.ca.shared.global [ %r8511 + 0 ], [ %rd220 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6413901Z // end inline asm 2026-02-21T10:18:46.6414022Z add.s32 %r8513, %r8379, 19456; 2026-02-21T10:18:46.6414087Z // begin inline asm 2026-02-21T10:18:46.6414262Z cp.async.ca.shared.global [ %r8513 + 0 ], [ %rd221 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6414321Z // end inline asm 2026-02-21T10:18:46.6414398Z cp.async.commit_group; 2026-02-21T10:18:46.6414598Z .loc 1 48 32 // 
co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6414674Z add.s64 %rd222, %rd238, 544; 2026-02-21T10:18:46.6414746Z add.s64 %rd223, %rd242, 544; 2026-02-21T10:18:46.6414808Z add.s64 %rd224, %rd246, 544; 2026-02-21T10:18:46.6414869Z add.s64 %rd225, %rd250, 544; 2026-02-21T10:18:46.6415069Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6415137Z add.s32 %r8515, %r8379, 36864; 2026-02-21T10:18:46.6415196Z // begin inline asm 2026-02-21T10:18:46.6415328Z cp.async.ca.shared.global [ %r8515 + 0 ], [ %rd222 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6415394Z // end inline asm 2026-02-21T10:18:46.6415455Z add.s32 %r8517, %r8379, 37888; 2026-02-21T10:18:46.6415517Z // begin inline asm 2026-02-21T10:18:46.6415696Z cp.async.ca.shared.global [ %r8517 + 0 ], [ %rd223 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6415760Z // end inline asm 2026-02-21T10:18:46.6415823Z add.s32 %r8519, %r8379, 38912; 2026-02-21T10:18:46.6415884Z // begin inline asm 2026-02-21T10:18:46.6416090Z cp.async.ca.shared.global [ %r8519 + 0 ], [ %rd224 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6416150Z // end inline asm 2026-02-21T10:18:46.6416212Z add.s32 %r8521, %r8379, 39936; 2026-02-21T10:18:46.6416277Z // begin inline asm 2026-02-21T10:18:46.6416409Z cp.async.ca.shared.global [ %r8521 + 0 ], [ %rd225 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6416593Z // end inline asm 2026-02-21T10:18:46.6416668Z cp.async.commit_group; 2026-02-21T10:18:46.6416892Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6416963Z add.s64 %rd226, %rd238, 576; 2026-02-21T10:18:46.6417026Z add.s64 %rd227, %rd242, 576; 2026-02-21T10:18:46.6417094Z add.s64 %rd228, %rd246, 576; 2026-02-21T10:18:46.6417157Z add.s64 %rd229, %rd250, 576; 2026-02-21T10:18:46.6417366Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6417430Z add.s32 %r8523, %r8379, 57344; 
2026-02-21T10:18:46.6417497Z // begin inline asm 2026-02-21T10:18:46.6417640Z cp.async.ca.shared.global [ %r8523 + 0 ], [ %rd226 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6417696Z // end inline asm 2026-02-21T10:18:46.6417763Z add.s32 %r8525, %r8379, 58368; 2026-02-21T10:18:46.6417834Z // begin inline asm 2026-02-21T10:18:46.6417970Z cp.async.ca.shared.global [ %r8525 + 0 ], [ %rd227 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6418030Z // end inline asm 2026-02-21T10:18:46.6418093Z add.s32 %r8527, %r8379, 59392; 2026-02-21T10:18:46.6418150Z // begin inline asm 2026-02-21T10:18:46.6418281Z cp.async.ca.shared.global [ %r8527 + 0 ], [ %rd228 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6418345Z // end inline asm 2026-02-21T10:18:46.6418406Z add.s32 %r8529, %r8379, 60416; 2026-02-21T10:18:46.6418467Z // begin inline asm 2026-02-21T10:18:46.6418610Z cp.async.ca.shared.global [ %r8529 + 0 ], [ %rd229 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6418667Z // end inline asm 2026-02-21T10:18:46.6418733Z cp.async.commit_group; 2026-02-21T10:18:46.6418940Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6419010Z add.s64 %rd230, %rd238, 608; 2026-02-21T10:18:46.6419072Z add.s64 %rd231, %rd242, 608; 2026-02-21T10:18:46.6419133Z add.s64 %rd232, %rd246, 608; 2026-02-21T10:18:46.6419198Z add.s64 %rd233, %rd250, 608; 2026-02-21T10:18:46.6419397Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6419458Z add.s32 %r8531, %r8379, 77824; 2026-02-21T10:18:46.6419521Z // begin inline asm 2026-02-21T10:18:46.6419748Z cp.async.ca.shared.global [ %r8531 + 0 ], [ %rd230 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6419867Z // end inline asm 2026-02-21T10:18:46.6419930Z add.s32 %r8533, %r8379, 78848; 2026-02-21T10:18:46.6419997Z // begin inline asm 2026-02-21T10:18:46.6420130Z cp.async.ca.shared.global [ %r8533 + 0 ], [ %rd231 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6420186Z // end inline asm 
2026-02-21T10:18:46.6420253Z add.s32 %r8535, %r8379, 79872; 2026-02-21T10:18:46.6420315Z // begin inline asm 2026-02-21T10:18:46.6420445Z cp.async.ca.shared.global [ %r8535 + 0 ], [ %rd232 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6420500Z // end inline asm 2026-02-21T10:18:46.6420568Z add.s32 %r8537, %r8379, 80896; 2026-02-21T10:18:46.6420625Z // begin inline asm 2026-02-21T10:18:46.6420754Z cp.async.ca.shared.global [ %r8537 + 0 ], [ %rd233 + 0 ], 0x8, %r8380; 2026-02-21T10:18:46.6420817Z // end inline asm 2026-02-21T10:18:46.6420881Z cp.async.commit_group; 2026-02-21T10:18:46.6421097Z .loc 1 19 139 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:19:139 2026-02-21T10:18:46.6421165Z @%p40 bra $L__BB0_15; 2026-02-21T10:18:46.6421323Z // %bb.9: // %.lr.ph40 2026-02-21T10:18:46.6421390Z shl.b32 %r8586, %r583, 7; 2026-02-21T10:18:46.6421594Z .loc 1 28 64 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:28:64 2026-02-21T10:18:46.6421734Z mul.lo.s32 %r8587, %r587, %r586; 2026-02-21T10:18:46.6421800Z sub.s32 %r8588, %r585, %r8587; 2026-02-21T10:18:46.6421999Z .loc 1 28 30 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:28:30 2026-02-21T10:18:46.6422071Z add.s32 %r8589, %r8588, %r584; 2026-02-21T10:18:46.6422267Z .loc 1 30 27 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:30:27 2026-02-21T10:18:46.6422337Z shl.b32 %r8590, %r8589, 7; 2026-02-21T10:18:46.6422534Z .loc 1 31 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:31:32 2026-02-21T10:18:46.6422601Z or.b32 %r12622, %r8590, %r28; 2026-02-21T10:18:46.6422672Z add.s32 %r596, %r8586, -5; 2026-02-21T10:18:46.6422737Z shl.b32 %r8592, %r12324, 4; 2026-02-21T10:18:46.6422802Z or.b32 %r8595, %r8592, %r12325; 2026-02-21T10:18:46.6422866Z or.b32 %r8596, %r8595, %r12326; 2026-02-21T10:18:46.6422934Z or.b32 %r597, %r8596, %r593; 2026-02-21T10:18:46.6422996Z xor.b32 %r598, %r597, 8; 2026-02-21T10:18:46.6423064Z selp.b32 %r8598, 0, 132, %p42; 2026-02-21T10:18:46.6423130Z or.b32 
%r8599, %r12327, %r8598; 2026-02-21T10:18:46.6423191Z or.b32 %r8600, %r8599, %r28; 2026-02-21T10:18:46.6423252Z add.s32 %r9449, %r5575, 81920; 2026-02-21T10:18:46.6423316Z add.s32 %r599, %r9449, %r8600; 2026-02-21T10:18:46.6423383Z xor.b32 %r8603, %r8600, 4; 2026-02-21T10:18:46.6423442Z add.s32 %r600, %r9449, %r8603; 2026-02-21T10:18:46.6423505Z xor.b32 %r8604, %r8600, 32; 2026-02-21T10:18:46.6423572Z add.s32 %r601, %r9449, %r8604; 2026-02-21T10:18:46.6423635Z xor.b32 %r8605, %r8600, 36; 2026-02-21T10:18:46.6423700Z add.s32 %r602, %r9449, %r8605; 2026-02-21T10:18:46.6423770Z xor.b32 %r8606, %r8600, 64; 2026-02-21T10:18:46.6423841Z add.s32 %r603, %r9449, %r8606; 2026-02-21T10:18:46.6423899Z xor.b32 %r8607, %r8600, 68; 2026-02-21T10:18:46.6423957Z add.s32 %r604, %r9449, %r8607; 2026-02-21T10:18:46.6424019Z xor.b32 %r8608, %r8600, 96; 2026-02-21T10:18:46.6424083Z add.s32 %r605, %r9449, %r8608; 2026-02-21T10:18:46.6424144Z xor.b32 %r8609, %r8600, 100; 2026-02-21T10:18:46.6424210Z add.s32 %r606, %r9449, %r8609; 2026-02-21T10:18:46.6424273Z mul.lo.s32 %r8613, %r12328, 144; 2026-02-21T10:18:46.6424335Z xor.b32 %r8614, %r8613, %r12329; 2026-02-21T10:18:46.6424396Z or.b32 %r8615, %r8614, %r12330; 2026-02-21T10:18:46.6424464Z add.s32 %r607, %r9449, %r8615; 2026-02-21T10:18:46.6424523Z xor.b32 %r8616, %r8615, 132; 2026-02-21T10:18:46.6424582Z add.s32 %r608, %r9449, %r8616; 2026-02-21T10:18:46.6424646Z and.b32 %r8618, %r12331, 8128; 2026-02-21T10:18:46.6424778Z shl.b32 %r8619, %r12328, 3; 2026-02-21T10:18:46.6424889Z or.b32 %r8620, %r8618, %r8619; 2026-02-21T10:18:46.6424951Z add.s32 %r609, %r9449, %r8620; 2026-02-21T10:18:46.6425018Z xor.b32 %r8621, %r8620, 16; 2026-02-21T10:18:46.6425078Z add.s32 %r610, %r9449, %r8621; 2026-02-21T10:18:46.6425138Z xor.b32 %r8622, %r8620, 32; 2026-02-21T10:18:46.6425211Z add.s32 %r611, %r9449, %r8622; 2026-02-21T10:18:46.6425277Z xor.b32 %r8623, %r8620, 48; 2026-02-21T10:18:46.6425337Z add.s32 %r612, %r9449, %r8623; 
2026-02-21T10:18:46.6425402Z bfe.u32 %r8624, %r9449, 4, 14; 2026-02-21T10:18:46.6425464Z cvt.u64.u32 %rd251, %r8624; 2026-02-21T10:18:46.6425550Z or.b64 %rd254, %rd251, -9223371899382267904; 2026-02-21T10:18:46.6425616Z add.s32 %r8625, %r5575, 81952; 2026-02-21T10:18:46.6425681Z bfe.u32 %r8626, %r8625, 4, 14; 2026-02-21T10:18:46.6425744Z cvt.u64.u32 %rd252, %r8626; 2026-02-21T10:18:46.6425823Z or.b64 %rd255, %rd252, -9223371899382267904; 2026-02-21T10:18:46.6425890Z shl.b32 %r8629, %r12329, 4; 2026-02-21T10:18:46.6425951Z and.b32 %r8631, %r12334, 16; 2026-02-21T10:18:46.6426015Z or.b32 %r8632, %r8629, %r8631; 2026-02-21T10:18:46.6426134Z or.b32 %r8633, %r8632, %r12332; 2026-02-21T10:18:46.6426207Z or.b32 %r8634, %r8633, %r12333; 2026-02-21T10:18:46.6426269Z add.s32 %r613, %r9449, %r8634; 2026-02-21T10:18:46.6426330Z xor.b32 %r8635, %r8634, 32; 2026-02-21T10:18:46.6426401Z add.s32 %r614, %r9449, %r8635; 2026-02-21T10:18:46.6426654Z xor.b32 %r8636, %r8634, 64; 2026-02-21T10:18:46.6426726Z add.s32 %r615, %r9449, %r8636; 2026-02-21T10:18:46.6426787Z xor.b32 %r8637, %r8634, 96; 2026-02-21T10:18:46.6426855Z add.s32 %r616, %r9449, %r8637; 2026-02-21T10:18:46.6426930Z and.b32 %r8639, %r12335, 6144; 2026-02-21T10:18:46.6426997Z or.b32 %r8641, %r8639, %r12333; 2026-02-21T10:18:46.6427066Z xor.b32 %r8642, %r8641, %r12336; 2026-02-21T10:18:46.6427128Z add.s32 %r12086, %r9449, %r8642; 2026-02-21T10:18:46.6427191Z add.s32 %r12091, %r12086, 512; 2026-02-21T10:18:46.6427262Z add.s32 %r12096, %r12086, 1024; 2026-02-21T10:18:46.6427323Z add.s32 %r12101, %r12086, 1536; 2026-02-21T10:18:46.6427547Z .loc 1 19 139 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:19:139 2026-02-21T10:18:46.6427610Z shl.b32 %r8643, %r581, 7; 2026-02-21T10:18:46.6427675Z shl.b32 %r8644, %r582, 7; 2026-02-21T10:18:46.6427736Z add.s32 %r621, %r8643, %r8644; 2026-02-21T10:18:46.6427799Z mov.b32 %r12635, 0f00000000; 2026-02-21T10:18:46.6427863Z mov.b32 %r12632, 4; 2026-02-21T10:18:46.6427923Z mov.b32 
%r12631, -1; 2026-02-21T10:18:46.6427979Z mov.b32 %r12629, 32; 2026-02-21T10:18:46.6428036Z mov.b32 %r12628, 64; 2026-02-21T10:18:46.6428097Z mov.b32 %r12627, 96; 2026-02-21T10:18:46.6428157Z mov.b32 %r12626, 128; 2026-02-21T10:18:46.6428214Z mov.b32 %r12621, 8; 2026-02-21T10:18:46.6428276Z mov.b32 %r12620, 40; 2026-02-21T10:18:46.6428332Z mov.b32 %r12619, 72; 2026-02-21T10:18:46.6428391Z mov.b32 %r12618, 104; 2026-02-21T10:18:46.6428518Z mov.b32 %r12617, 136; 2026-02-21T10:18:46.6428586Z mov.b32 %r12616, 16; 2026-02-21T10:18:46.6428643Z mov.b32 %r12615, 48; 2026-02-21T10:18:46.6428701Z mov.b32 %r12614, 80; 2026-02-21T10:18:46.6428763Z mov.b32 %r12613, 112; 2026-02-21T10:18:46.6428820Z mov.b32 %r12612, 144; 2026-02-21T10:18:46.6428877Z mov.b32 %r12611, 24; 2026-02-21T10:18:46.6428934Z mov.b32 %r12610, 56; 2026-02-21T10:18:46.6428994Z mov.b32 %r12609, 88; 2026-02-21T10:18:46.6429053Z mov.b32 %r12608, 120; 2026-02-21T10:18:46.6429111Z mov.b32 %r12607, 152; 2026-02-21T10:18:46.6429173Z mov.b32 %r12606, 0; 2026-02-21T10:18:46.6429230Z mov.b32 %r12605, 1; 2026-02-21T10:18:46.6429287Z mov.b32 %r12604, 2; 2026-02-21T10:18:46.6429344Z mov.b32 %r12603, 3; 2026-02-21T10:18:46.6429409Z mov.b32 %r12600, %r12599; 2026-02-21T10:18:46.6429468Z mov.b32 %r12601, %r12599; 2026-02-21T10:18:46.6429526Z mov.b32 %r12602, %r12599; 2026-02-21T10:18:46.6429590Z mov.b32 %r12623, %r12622; 2026-02-21T10:18:46.6429731Z mov.b32 %r12624, %r12622; 2026-02-21T10:18:46.6429789Z mov.b32 %r12625, %r12622; 2026-02-21T10:18:46.6429918Z mov.b32 %r12630, %r12606; 2026-02-21T10:18:46.6429988Z mov.b32 %r12633, %r12599; 2026-02-21T10:18:46.6430046Z mov.b32 %r12634, %r12622; 2026-02-21T10:18:46.6430105Z mov.b32 %r12636, %r12635; 2026-02-21T10:18:46.6430174Z mov.b32 %r12637, %r12635; 2026-02-21T10:18:46.6430232Z mov.b32 %r12638, %r12635; 2026-02-21T10:18:46.6430292Z mov.b32 %r12639, %r12635; 2026-02-21T10:18:46.6430361Z mov.b32 %r12640, %r12635; 2026-02-21T10:18:46.6430421Z mov.b32 %r12641, %r12635; 
2026-02-21T10:18:46.6430480Z mov.b32 %r12642, %r12635; 2026-02-21T10:18:46.6430539Z mov.b32 %r12643, %r12635; 2026-02-21T10:18:46.6430604Z mov.b32 %r12644, %r12635; 2026-02-21T10:18:46.6430662Z mov.b32 %r12645, %r12635; 2026-02-21T10:18:46.6430719Z mov.b32 %r12646, %r12635; 2026-02-21T10:18:46.6430784Z mov.b32 %r12647, %r12635; 2026-02-21T10:18:46.6430843Z mov.b32 %r12648, %r12635; 2026-02-21T10:18:46.6430902Z mov.b32 %r12649, %r12635; 2026-02-21T10:18:46.6430960Z mov.b32 %r12650, %r12635; 2026-02-21T10:18:46.6431025Z mov.b32 %r12651, %r12635; 2026-02-21T10:18:46.6431155Z mov.b32 %r12652, %r12635; 2026-02-21T10:18:46.6431216Z mov.b32 %r12653, %r12635; 2026-02-21T10:18:46.6431278Z mov.b32 %r12654, %r12635; 2026-02-21T10:18:46.6431336Z mov.b32 %r12655, %r12635; 2026-02-21T10:18:46.6431394Z mov.b32 %r12656, %r12635; 2026-02-21T10:18:46.6431502Z mov.b32 %r12657, %r12635; 2026-02-21T10:18:46.6431567Z mov.b32 %r12658, %r12635; 2026-02-21T10:18:46.6431624Z mov.b32 %r12659, %r12635; 2026-02-21T10:18:46.6431682Z mov.b32 %r12660, %r12635; 2026-02-21T10:18:46.6431746Z mov.b32 %r12661, %r12635; 2026-02-21T10:18:46.6431817Z mov.b32 %r12662, %r12635; 2026-02-21T10:18:46.6431876Z mov.b32 %r12663, %r12635; 2026-02-21T10:18:46.6431935Z mov.b32 %r12664, %r12635; 2026-02-21T10:18:46.6431997Z mov.b32 %r12665, %r12635; 2026-02-21T10:18:46.6432055Z mov.b32 %r12666, %r12635; 2026-02-21T10:18:46.6432115Z mov.b32 %r12667, %r12635; 2026-02-21T10:18:46.6432178Z mov.b32 %r12668, %r12635; 2026-02-21T10:18:46.6432237Z mov.b32 %r12669, %r12635; 2026-02-21T10:18:46.6432297Z mov.b32 %r12670, %r12635; 2026-02-21T10:18:46.6432354Z mov.b32 %r12671, %r12635; 2026-02-21T10:18:46.6432414Z mov.b32 %r12672, %r12635; 2026-02-21T10:18:46.6432473Z mov.b32 %r12673, %r12635; 2026-02-21T10:18:46.6432532Z mov.b32 %r12674, %r12635; 2026-02-21T10:18:46.6432594Z mov.b32 %r12675, %r12635; 2026-02-21T10:18:46.6432653Z mov.b32 %r12676, %r12635; 2026-02-21T10:18:46.6432711Z mov.b32 %r12677, %r12635; 
2026-02-21T10:18:46.6432769Z mov.b32 %r12678, %r12635; 2026-02-21T10:18:46.6432830Z mov.b32 %r12679, %r12635; 2026-02-21T10:18:46.6432887Z mov.b32 %r12680, %r12635; 2026-02-21T10:18:46.6432945Z mov.b32 %r12681, %r12635; 2026-02-21T10:18:46.6433006Z mov.b32 %r12682, %r12635; 2026-02-21T10:18:46.6433062Z mov.b32 %r12683, %r12635; 2026-02-21T10:18:46.6433119Z mov.b32 %r12684, %r12635; 2026-02-21T10:18:46.6433184Z mov.b32 %r12685, %r12635; 2026-02-21T10:18:46.6433242Z mov.b32 %r12686, %r12635; 2026-02-21T10:18:46.6433299Z mov.b32 %r12687, %r12635; 2026-02-21T10:18:46.6433357Z mov.b32 %r12688, %r12635; 2026-02-21T10:18:46.6433420Z mov.b32 %r12689, %r12635; 2026-02-21T10:18:46.6433478Z mov.b32 %r12690, %r12635; 2026-02-21T10:18:46.6433534Z mov.b32 %r12691, %r12635; 2026-02-21T10:18:46.6433597Z mov.b32 %r12692, %r12635; 2026-02-21T10:18:46.6433656Z mov.b32 %r12693, %r12635; 2026-02-21T10:18:46.6433715Z mov.b32 %r12694, %r12635; 2026-02-21T10:18:46.6433771Z mov.b32 %r12695, %r12635; 2026-02-21T10:18:46.6433832Z mov.b32 %r12696, %r12635; 2026-02-21T10:18:46.6433890Z mov.b32 %r12697, %r12635; 2026-02-21T10:18:46.6433946Z mov.b32 %r12698, %r12635; 2026-02-21T10:18:46.6434006Z mov.b32 %r12699, %r12635; 2026-02-21T10:18:46.6434064Z mov.b32 %r12700, %r12635; 2026-02-21T10:18:46.6434121Z mov.b32 %r12701, %r12635; 2026-02-21T10:18:46.6434177Z mov.b32 %r12702, %r12635; 2026-02-21T10:18:46.6434323Z mov.b32 %r12703, %r12635; 2026-02-21T10:18:46.6434429Z mov.b32 %r12704, %r12635; 2026-02-21T10:18:46.6434487Z mov.b32 %r12705, %r12635; 2026-02-21T10:18:46.6434550Z mov.b32 %r12706, %r12635; 2026-02-21T10:18:46.6434608Z mov.b32 %r12707, %r12635; 2026-02-21T10:18:46.6434665Z mov.b32 %r12708, %r12635; 2026-02-21T10:18:46.6434720Z mov.b32 %r12709, %r12635; 2026-02-21T10:18:46.6434781Z mov.b32 %r12710, %r12635; 2026-02-21T10:18:46.6434839Z mov.b32 %r12711, %r12635; 2026-02-21T10:18:46.6434897Z mov.b32 %r12712, %r12635; 2026-02-21T10:18:46.6434958Z mov.b32 %r12713, %r12635; 
2026-02-21T10:18:46.6435013Z mov.b32 %r12714, %r12635; 2026-02-21T10:18:46.6435071Z mov.b32 %r12715, %r12635; 2026-02-21T10:18:46.6435129Z mov.b32 %r12716, %r12635; 2026-02-21T10:18:46.6435191Z mov.b32 %r12717, %r12635; 2026-02-21T10:18:46.6435247Z mov.b32 %r12718, %r12635; 2026-02-21T10:18:46.6435315Z mov.b32 %r12719, %r12635; 2026-02-21T10:18:46.6435379Z mov.b32 %r12720, %r12635; 2026-02-21T10:18:46.6435438Z mov.b32 %r12721, %r12635; 2026-02-21T10:18:46.6435496Z mov.b32 %r12722, %r12635; 2026-02-21T10:18:46.6435554Z mov.b32 %r12723, %r12635; 2026-02-21T10:18:46.6435699Z mov.b32 %r12724, %r12635; 2026-02-21T10:18:46.6435761Z mov.b32 %r12725, %r12635; 2026-02-21T10:18:46.6435819Z mov.b32 %r12726, %r12635; 2026-02-21T10:18:46.6435880Z mov.b32 %r12727, %r12635; 2026-02-21T10:18:46.6435936Z mov.b32 %r12728, %r12635; 2026-02-21T10:18:46.6436041Z mov.b32 %r12729, %r12635; 2026-02-21T10:18:46.6436106Z mov.b32 %r12730, %r12635; 2026-02-21T10:18:46.6436164Z mov.b32 %r12731, %r12635; 2026-02-21T10:18:46.6436220Z mov.b32 %r12732, %r12635; 2026-02-21T10:18:46.6436276Z mov.b32 %r12733, %r12635; 2026-02-21T10:18:46.6436336Z mov.b32 %r12734, %r12635; 2026-02-21T10:18:46.6436393Z mov.b32 %r12735, %r12635; 2026-02-21T10:18:46.6436587Z mov.b32 %r12736, %r12635; 2026-02-21T10:18:46.6436654Z mov.b32 %r12737, %r12635; 2026-02-21T10:18:46.6436711Z mov.b32 %r12738, %r12635; 2026-02-21T10:18:46.6436771Z mov.b32 %r12739, %r12635; 2026-02-21T10:18:46.6436830Z mov.b32 %r12740, %r12635; 2026-02-21T10:18:46.6436891Z mov.b32 %r12741, %r12635; 2026-02-21T10:18:46.6436950Z mov.b32 %r12742, %r12635; 2026-02-21T10:18:46.6437007Z mov.b32 %r12743, %r12635; 2026-02-21T10:18:46.6437067Z mov.b32 %r12744, %r12635; 2026-02-21T10:18:46.6437123Z mov.b32 %r12745, %r12635; 2026-02-21T10:18:46.6437180Z mov.b32 %r12746, %r12635; 2026-02-21T10:18:46.6437240Z mov.b32 %r12747, %r12635; 2026-02-21T10:18:46.6437302Z mov.b32 %r12748, %r12635; 2026-02-21T10:18:46.6437360Z mov.b32 %r12749, %r12635; 
2026-02-21T10:18:46.6437416Z mov.b32 %r12750, %r12635; 2026-02-21T10:18:46.6437475Z mov.b32 %r12751, %r12635; 2026-02-21T10:18:46.6437532Z mov.b32 %r12752, %r12635; 2026-02-21T10:18:46.6437588Z mov.b32 %r12753, %r12635; 2026-02-21T10:18:46.6437646Z mov.b32 %r12754, %r12635; 2026-02-21T10:18:46.6437707Z mov.b32 %r12755, %r12635; 2026-02-21T10:18:46.6437768Z mov.b32 %r12756, %r12635; 2026-02-21T10:18:46.6437827Z mov.b32 %r12757, %r12635; 2026-02-21T10:18:46.6437891Z mov.b32 %r12758, %r12635; 2026-02-21T10:18:46.6437950Z mov.b32 %r12759, %r12635; 2026-02-21T10:18:46.6438008Z mov.b32 %r12760, %r12635; 2026-02-21T10:18:46.6438066Z mov.b32 %r12761, %r12635; 2026-02-21T10:18:46.6438128Z mov.b32 %r12762, %r12635; 2026-02-21T10:18:46.6438185Z mov.b32 %r12764, %r12632; 2026-02-21T10:18:46.6438242Z mov.b32 %r12765, %r12606; 2026-02-21T10:18:46.6438304Z mov.b32 %r12766, %r12634; 2026-02-21T10:18:46.6438361Z mov.b32 %r12767, %r12633; 2026-02-21T10:18:46.6438421Z bra.uni $L__BB0_10; 2026-02-21T10:18:46.6438545Z $L__BB0_14: // in Loop: Header=BB0_10 Depth=1 2026-02-21T10:18:46.6438781Z .loc 1 19 139 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:19:139 2026-02-21T10:18:46.6438845Z add.s32 %r12765, %r12765, 1; 2026-02-21T10:18:46.6438915Z setp.ne.b32 %p67, %r621, %r12765; 2026-02-21T10:18:46.6439071Z mov.b32 %r12599, %r12633; 2026-02-21T10:18:46.6439132Z mov.b32 %r12602, %r628; 2026-02-21T10:18:46.6439248Z mov.b32 %r12603, %r12764; 2026-02-21T10:18:46.6439311Z mov.b32 %r12606, %r632; 2026-02-21T10:18:46.6439370Z mov.b32 %r12611, %r637; 2026-02-21T10:18:46.6439426Z mov.b32 %r12616, %r642; 2026-02-21T10:18:46.6439485Z mov.b32 %r12621, %r647; 2026-02-21T10:18:46.6439545Z mov.b32 %r12622, %r12634; 2026-02-21T10:18:46.6439604Z mov.b32 %r12625, %r651; 2026-02-21T10:18:46.6439662Z mov.b32 %r12630, %r656; 2026-02-21T10:18:46.6439721Z mov.b32 %r12633, %r12767; 2026-02-21T10:18:46.6439778Z mov.b32 %r12634, %r12766; 2026-02-21T10:18:46.6439834Z mov.b32 %r12764, %r793; 
2026-02-21T10:18:46.6439894Z @%p67 bra $L__BB0_10; 2026-02-21T10:18:46.6439955Z bra.uni $L__BB0_15; 2026-02-21T10:18:46.6440075Z $L__BB0_10: // =>This Inner Loop Header: Depth=1 2026-02-21T10:18:46.6440282Z .loc 1 0 139 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:0:139 2026-02-21T10:18:46.6440347Z mov.b32 %r656, %r12629; 2026-02-21T10:18:46.6440407Z mov.b32 %r12629, %r12628; 2026-02-21T10:18:46.6440465Z mov.b32 %r12628, %r12627; 2026-02-21T10:18:46.6440594Z mov.b32 %r12627, %r12626; 2026-02-21T10:18:46.6440656Z mov.b32 %r651, %r12624; 2026-02-21T10:18:46.6440714Z mov.b32 %r12624, %r12623; 2026-02-21T10:18:46.6440770Z mov.b32 %r12623, %r12622; 2026-02-21T10:18:46.6440843Z mov.b32 %r647, %r12620; 2026-02-21T10:18:46.6440960Z mov.b32 %r12620, %r12619; 2026-02-21T10:18:46.6441020Z mov.b32 %r12619, %r12618; 2026-02-21T10:18:46.6441082Z mov.b32 %r12618, %r12617; 2026-02-21T10:18:46.6441138Z mov.b32 %r642, %r12615; 2026-02-21T10:18:46.6441195Z mov.b32 %r12615, %r12614; 2026-02-21T10:18:46.6441252Z mov.b32 %r12614, %r12613; 2026-02-21T10:18:46.6441314Z mov.b32 %r12613, %r12612; 2026-02-21T10:18:46.6441374Z mov.b32 %r637, %r12610; 2026-02-21T10:18:46.6441431Z mov.b32 %r12610, %r12609; 2026-02-21T10:18:46.6441492Z mov.b32 %r12609, %r12608; 2026-02-21T10:18:46.6441552Z mov.b32 %r12608, %r12607; 2026-02-21T10:18:46.6441613Z mov.b32 %r632, %r12605; 2026-02-21T10:18:46.6441669Z mov.b32 %r12605, %r12604; 2026-02-21T10:18:46.6441732Z mov.b32 %r12604, %r12603; 2026-02-21T10:18:46.6441790Z mov.b32 %r628, %r12601; 2026-02-21T10:18:46.6441847Z mov.b32 %r12601, %r12600; 2026-02-21T10:18:46.6441908Z mov.b32 %r12600, %r12599; 2026-02-21T10:18:46.6442119Z .loc 1 19 139 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:19:139 2026-02-21T10:18:46.6442183Z add.s32 %r8645, %r12764, 1; 2026-02-21T10:18:46.6442251Z setp.eq.b32 %p44, %r12764, 127; 2026-02-21T10:18:46.6442325Z selp.b32 %r793, 0, %r8645, %p44; 2026-02-21T10:18:46.6442392Z setp.ne.b32 %p45, %r793, 0; 
2026-02-21T10:18:46.6442453Z @%p45 bra $L__BB0_12; 2026-02-21T10:18:46.6442568Z // %bb.11: // in Loop: Header=BB0_10 Depth=1 2026-02-21T10:18:46.6442632Z add.s32 %r12768, %r12768, 132; 2026-02-21T10:18:46.6442837Z .loc 1 25 35 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:25:35 2026-02-21T10:18:46.6442903Z shr.s32 %r8646, %r12768, 31; 2026-02-21T10:18:46.6442965Z shr.u32 %r8647, %r8646, 17; 2026-02-21T10:18:46.6443027Z add.s32 %r8648, %r12768, %r8647; 2026-02-21T10:18:46.6443085Z shr.s32 %r8649, %r8648, 15; 2026-02-21T10:18:46.6443285Z .loc 1 26 33 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:26:33 2026-02-21T10:18:46.6443349Z shl.b32 %r8650, %r8649, 6; 2026-02-21T10:18:46.6443543Z .loc 1 27 39 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:27:39 2026-02-21T10:18:46.6443609Z sub.s32 %r8651, 10, %r8650; 2026-02-21T10:18:46.6443802Z .loc 1 27 52 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:27:52 2026-02-21T10:18:46.6443860Z min.s32 %r8652, %r8651, 64; 2026-02-21T10:18:46.6444058Z .loc 1 28 45 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:28:45 2026-02-21T10:18:46.6444191Z and.b32 %r8653, %r8648, -32768; 2026-02-21T10:18:46.6444300Z sub.s32 %r8654, %r12768, %r8653; 2026-02-21T10:18:46.6444504Z .loc 1 29 51 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:29:51 2026-02-21T10:18:46.6444570Z div.s32 %r8655, %r8654, %r8652; 2026-02-21T10:18:46.6444766Z .loc 1 28 64 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:28:64 2026-02-21T10:18:46.6444831Z mul.lo.s32 %r8656, %r8655, %r8652; 2026-02-21T10:18:46.6444896Z sub.s32 %r8657, %r8654, %r8656; 2026-02-21T10:18:46.6445089Z .loc 1 28 30 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:28:30 2026-02-21T10:18:46.6445151Z add.s32 %r8658, %r8657, %r8650; 2026-02-21T10:18:46.6445349Z .loc 1 30 27 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:30:27 2026-02-21T10:18:46.6445409Z shl.b32 %r8659, %r8658, 7; 
2026-02-21T10:18:46.6445604Z .loc 1 31 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:31:32 2026-02-21T10:18:46.6445670Z or.b32 %r12766, %r8659, %r28; 2026-02-21T10:18:46.6445910Z .loc 1 32 27 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:32:27 2026-02-21T10:18:46.6445974Z shl.b32 %r12767, %r8655, 7; 2026-02-21T10:18:46.6446169Z .loc 1 33 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:33:32 2026-02-21T10:18:46.6446277Z or.b32 %r12769, %r12767, %r6; 2026-02-21T10:18:46.6446339Z or.b32 %r12770, %r12767, %r7; 2026-02-21T10:18:46.6446399Z or.b32 %r12771, %r12767, %r8; 2026-02-21T10:18:46.6446580Z or.b32 %r12772, %r12767, %r9; 2026-02-21T10:18:46.6446780Z $L__BB0_12: // in Loop: Header=BB0_10 Depth=1 2026-02-21T10:18:46.6447221Z .loc 1 19 139 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:19:139 2026-02-21T10:18:46.6447597Z setp.eq.b32 %p62, %r793, 0; 2026-02-21T10:18:46.6447802Z setp.lt.s32 %p63, %r12765, %r596; 2026-02-21T10:18:46.6448000Z add.s32 %r11860, %r12631, 1; 2026-02-21T10:18:46.6448188Z setp.gt.s32 %p64, %r11860, 4; 2026-02-21T10:18:46.6448384Z selp.b32 %r12631, 0, %r11860, %p64; 2026-02-21T10:18:46.6448729Z .loc 1 41 35 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:41:35 2026-02-21T10:18:46.6449088Z add.s32 %r11861, %r12630, %r11; 2026-02-21T10:18:46.6449411Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6449768Z cp.async.wait_group 16; 2026-02-21T10:18:46.6449937Z bar.sync 0; 2026-02-21T10:18:46.6450084Z shl.b32 %r11862, %r12631, 12; 2026-02-21T10:18:46.6450266Z add.s32 %r11864, %r5575, %r11862; 2026-02-21T10:18:46.6450610Z .loc 1 52 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:52:32 2026-02-21T10:18:46.6450965Z add.s32 %r11865, %r11864, %r597; 2026-02-21T10:18:46.6451167Z ld.shared.b16 %rs513, [%r11865]; 2026-02-21T10:18:46.6451366Z ld.shared.b16 %rs514, [%r11865+256]; 2026-02-21T10:18:46.6451568Z ld.shared.b16 %rs515, 
[%r11865+16]; 2026-02-21T10:18:46.6451770Z ld.shared.b16 %rs516, [%r11865+272]; 2026-02-21T10:18:46.6451974Z ld.shared.b16 %rs517, [%r11865+2048]; 2026-02-21T10:18:46.6452182Z ld.shared.b16 %rs518, [%r11865+2304]; 2026-02-21T10:18:46.6452378Z ld.shared.b16 %rs519, [%r11865+2064]; 2026-02-21T10:18:46.6452577Z ld.shared.b16 %rs520, [%r11865+2320]; 2026-02-21T10:18:46.6452769Z add.s32 %r11866, %r11864, %r598; 2026-02-21T10:18:46.6452977Z ld.shared.b16 %rs521, [%r11866]; 2026-02-21T10:18:46.6453172Z ld.shared.b16 %rs522, [%r11866+256]; 2026-02-21T10:18:46.6453385Z ld.shared.b16 %rs523, [%r11866+16]; 2026-02-21T10:18:46.6453585Z ld.shared.b16 %rs524, [%r11866+272]; 2026-02-21T10:18:46.6453784Z ld.shared.b16 %rs525, [%r11866+2048]; 2026-02-21T10:18:46.6453984Z ld.shared.b16 %rs526, [%r11866+2304]; 2026-02-21T10:18:46.6454279Z ld.shared.b16 %rs527, [%r11866+2064]; 2026-02-21T10:18:46.6454482Z ld.shared.b16 %rs528, [%r11866+2320]; 2026-02-21T10:18:46.6454743Z cvt.f32.bf16 %r8790, %rs513; 2026-02-21T10:18:46.6454931Z cvt.f32.bf16 %r8791, %rs514; 2026-02-21T10:18:46.6455104Z cvt.f32.bf16 %r8792, %rs521; 2026-02-21T10:18:46.6455290Z cvt.f32.bf16 %r8793, %rs522; 2026-02-21T10:18:46.6455476Z cvt.f32.bf16 %r8922, %rs515; 2026-02-21T10:18:46.6455695Z cvt.f32.bf16 %r8923, %rs516; 2026-02-21T10:18:46.6455885Z cvt.f32.bf16 %r8924, %rs523; 2026-02-21T10:18:46.6456070Z cvt.f32.bf16 %r8925, %rs524; 2026-02-21T10:18:46.6456242Z cvt.f32.bf16 %r9054, %rs517; 2026-02-21T10:18:46.6456430Z cvt.f32.bf16 %r9055, %rs518; 2026-02-21T10:18:46.6456747Z cvt.f32.bf16 %r9056, %rs525; 2026-02-21T10:18:46.6456929Z cvt.f32.bf16 %r9057, %rs526; 2026-02-21T10:18:46.6457098Z cvt.f32.bf16 %r9186, %rs519; 2026-02-21T10:18:46.6457273Z cvt.f32.bf16 %r9187, %rs520; 2026-02-21T10:18:46.6457451Z cvt.f32.bf16 %r9188, %rs527; 2026-02-21T10:18:46.6457624Z cvt.f32.bf16 %r9189, %rs528; 2026-02-21T10:18:46.6457952Z .loc 1 54 62 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:54:62 
2026-02-21T10:18:46.6458417Z mad.lo.s32 %r11867, %r11861, 1280, %r12625; 2026-02-21T10:18:46.6458787Z .loc 1 54 34 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:54:34 2026-02-21T10:18:46.6459142Z cvt.s64.s32 %rd289, %r11867; 2026-02-21T10:18:46.6459402Z add.s64 %rd253, %rd26, %rd289; 2026-02-21T10:18:46.6459736Z .loc 1 54 87 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:54:87 2026-02-21T10:18:46.6460083Z // begin inline asm 2026-02-21T10:18:46.6460254Z mov.u32 %r8660, 0x0; 2026-02-21T10:18:46.6460410Z mov.u32 %r8661, 0x0; 2026-02-21T10:18:46.6460607Z ld.global.v2.b32 { %r8660, %r8661 }, [ %rd253 + 0 ]; 2026-02-21T10:18:46.6460837Z // end inline asm 2026-02-21T10:18:46.6461135Z .loc 1 62 28 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:62:28 2026-02-21T10:18:46.6461500Z st.shared.b8 [%r599], %r8660; 2026-02-21T10:18:46.6461699Z prmt.b32 %r11868, %r8660, 0, 0x7771U; 2026-02-21T10:18:46.6461913Z st.shared.b8 [%r600], %r11868; 2026-02-21T10:18:46.6462104Z prmt.b32 %r11869, %r8660, 0, 0x7772U; 2026-02-21T10:18:46.6462309Z st.shared.b8 [%r601+256], %r11869; 2026-02-21T10:18:46.6462504Z prmt.b32 %r11870, %r8660, 0, 0x7773U; 2026-02-21T10:18:46.6462716Z st.shared.b8 [%r602+256], %r11870; 2026-02-21T10:18:46.6462916Z st.shared.b8 [%r603+512], %r8661; 2026-02-21T10:18:46.6463115Z prmt.b32 %r11871, %r8661, 0, 0x7771U; 2026-02-21T10:18:46.6463308Z st.shared.b8 [%r604+512], %r11871; 2026-02-21T10:18:46.6463498Z prmt.b32 %r11872, %r8661, 0, 0x7772U; 2026-02-21T10:18:46.6463700Z st.shared.b8 [%r605+768], %r11872; 2026-02-21T10:18:46.6463889Z prmt.b32 %r11873, %r8661, 0, 0x7773U; 2026-02-21T10:18:46.6464086Z st.shared.b8 [%r606+768], %r11873; 2026-02-21T10:18:46.6464280Z bar.sync 0; 2026-02-21T10:18:46.6464438Z ld.shared.b32 %r11874, [%r607]; 2026-02-21T10:18:46.6464630Z prmt.b32 %r11875, %r11874, 0, 0x7770U; 2026-02-21T10:18:46.6464840Z cvt.u16.u32 %rs529, %r11875; 2026-02-21T10:18:46.6465024Z prmt.b32 %r11876, %r11874, 0, 0x7771U; 
2026-02-21T10:18:46.6465228Z cvt.u16.u32 %rs530, %r11876; 2026-02-21T10:18:46.6465409Z prmt.b32 %r11877, %r11874, 0, 0x7772U; 2026-02-21T10:18:46.6465609Z cvt.u16.u32 %rs531, %r11877; 2026-02-21T10:18:46.6465796Z prmt.b32 %r11878, %r11874, 0, 0x7773U; 2026-02-21T10:18:46.6466001Z cvt.u16.u32 %rs532, %r11878; 2026-02-21T10:18:46.6466188Z ld.shared.b32 %r11879, [%r608]; 2026-02-21T10:18:46.6466378Z prmt.b32 %r11880, %r11879, 0, 0x7770U; 2026-02-21T10:18:46.6466703Z cvt.u16.u32 %rs533, %r11880; 2026-02-21T10:18:46.6466882Z prmt.b32 %r11881, %r11879, 0, 0x7771U; 2026-02-21T10:18:46.6467082Z cvt.u16.u32 %rs534, %r11881; 2026-02-21T10:18:46.6467260Z prmt.b32 %r11882, %r11879, 0, 0x7772U; 2026-02-21T10:18:46.6467462Z cvt.u16.u32 %rs535, %r11882; 2026-02-21T10:18:46.6467740Z prmt.b32 %r11883, %r11879, 0, 0x7773U; 2026-02-21T10:18:46.6468016Z cvt.u16.u32 %rs536, %r11883; 2026-02-21T10:18:46.6468344Z .loc 1 57 28 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:57:28 2026-02-21T10:18:46.6468784Z shl.b16 %rs537, %rs529, 4; 2026-02-21T10:18:46.6468966Z shl.b16 %rs538, %rs533, 4; 2026-02-21T10:18:46.6469138Z shl.b16 %rs539, %rs530, 4; 2026-02-21T10:18:46.6469315Z shl.b16 %rs540, %rs534, 4; 2026-02-21T10:18:46.6469488Z shl.b16 %rs541, %rs531, 4; 2026-02-21T10:18:46.6469661Z shl.b16 %rs542, %rs535, 4; 2026-02-21T10:18:46.6469834Z shl.b16 %rs543, %rs532, 4; 2026-02-21T10:18:46.6470004Z shl.b16 %rs544, %rs536, 4; 2026-02-21T10:18:46.6470328Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6470700Z cvt.s16.s8 %rs545, %rs537; 2026-02-21T10:18:46.6470881Z shr.s16 %rs546, %rs545, 4; 2026-02-21T10:18:46.6471052Z cvt.s16.s8 %rs547, %rs538; 2026-02-21T10:18:46.6471228Z shr.s16 %rs548, %rs547, 4; 2026-02-21T10:18:46.6471407Z prmt.b32 %r11884, %r11874, 0, 0x8880U; 2026-02-21T10:18:46.6471614Z cvt.u16.u32 %rs549, %r11884; 2026-02-21T10:18:46.6471873Z shr.s16 %rs550, %rs549, 4; 2026-02-21T10:18:46.6472057Z prmt.b32 %r11885, %r11879, 
0, 0x8880U; 2026-02-21T10:18:46.6472257Z cvt.u16.u32 %rs551, %r11885; 2026-02-21T10:18:46.6472428Z shr.s16 %rs552, %rs551, 4; 2026-02-21T10:18:46.6472818Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6473178Z cvt.rn.f32.s16 %r11886, %rs552; 2026-02-21T10:18:46.6473369Z cvt.rn.f32.s16 %r11887, %rs550; 2026-02-21T10:18:46.6473552Z cvt.rn.f32.s16 %r11888, %rs548; 2026-02-21T10:18:46.6473740Z cvt.rn.f32.s16 %r11889, %rs546; 2026-02-21T10:18:46.6474069Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6474415Z cvt.s16.s8 %rs553, %rs539; 2026-02-21T10:18:46.6474593Z shr.s16 %rs554, %rs553, 4; 2026-02-21T10:18:46.6474760Z cvt.s16.s8 %rs555, %rs540; 2026-02-21T10:18:46.6474934Z shr.s16 %rs556, %rs555, 4; 2026-02-21T10:18:46.6475111Z prmt.b32 %r11890, %r11874, 0, 0x9991U; 2026-02-21T10:18:46.6475326Z cvt.u16.u32 %rs557, %r11890; 2026-02-21T10:18:46.6475502Z shr.s16 %rs558, %rs557, 4; 2026-02-21T10:18:46.6475682Z prmt.b32 %r11891, %r11879, 0, 0x9991U; 2026-02-21T10:18:46.6475883Z cvt.u16.u32 %rs559, %r11891; 2026-02-21T10:18:46.6476058Z shr.s16 %rs560, %rs559, 4; 2026-02-21T10:18:46.6476370Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6476906Z cvt.rn.f32.s16 %r11892, %rs560; 2026-02-21T10:18:46.6477098Z cvt.rn.f32.s16 %r11893, %rs558; 2026-02-21T10:18:46.6477280Z cvt.rn.f32.s16 %r11894, %rs556; 2026-02-21T10:18:46.6477466Z cvt.rn.f32.s16 %r11895, %rs554; 2026-02-21T10:18:46.6477785Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6478137Z cvt.s16.s8 %rs561, %rs541; 2026-02-21T10:18:46.6478317Z shr.s16 %rs562, %rs561, 4; 2026-02-21T10:18:46.6478486Z cvt.s16.s8 %rs563, %rs542; 2026-02-21T10:18:46.6478684Z shr.s16 %rs564, %rs563, 4; 2026-02-21T10:18:46.6478861Z prmt.b32 %r11896, %r11874, 0, 0xaaa2U; 2026-02-21T10:18:46.6479065Z cvt.u16.u32 %rs565, %r11896; 
2026-02-21T10:18:46.6479238Z shr.s16 %rs566, %rs565, 4; 2026-02-21T10:18:46.6479419Z prmt.b32 %r11897, %r11879, 0, 0xaaa2U; 2026-02-21T10:18:46.6479615Z cvt.u16.u32 %rs567, %r11897; 2026-02-21T10:18:46.6479796Z shr.s16 %rs568, %rs567, 4; 2026-02-21T10:18:46.6480113Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6480465Z cvt.rn.f32.s16 %r11898, %rs568; 2026-02-21T10:18:46.6480654Z cvt.rn.f32.s16 %r11899, %rs566; 2026-02-21T10:18:46.6480836Z cvt.rn.f32.s16 %r11900, %rs564; 2026-02-21T10:18:46.6481023Z cvt.rn.f32.s16 %r11901, %rs562; 2026-02-21T10:18:46.6481438Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6481871Z cvt.s16.s8 %rs569, %rs543; 2026-02-21T10:18:46.6482045Z shr.s16 %rs570, %rs569, 4; 2026-02-21T10:18:46.6482223Z cvt.s16.s8 %rs571, %rs544; 2026-02-21T10:18:46.6482403Z shr.s16 %rs572, %rs571, 4; 2026-02-21T10:18:46.6482579Z prmt.b32 %r11902, %r11874, 0, 0xbbb3U; 2026-02-21T10:18:46.6482782Z cvt.u16.u32 %rs573, %r11902; 2026-02-21T10:18:46.6482957Z shr.s16 %rs574, %rs573, 4; 2026-02-21T10:18:46.6483136Z prmt.b32 %r11903, %r11879, 0, 0xbbb3U; 2026-02-21T10:18:46.6483330Z cvt.u16.u32 %rs575, %r11903; 2026-02-21T10:18:46.6483508Z shr.s16 %rs576, %rs575, 4; 2026-02-21T10:18:46.6483814Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6484169Z cvt.rn.f32.s16 %r11904, %rs576; 2026-02-21T10:18:46.6484354Z cvt.rn.f32.s16 %r11905, %rs574; 2026-02-21T10:18:46.6484534Z cvt.rn.f32.s16 %r11906, %rs572; 2026-02-21T10:18:46.6484720Z cvt.rn.f32.s16 %r11907, %rs570; 2026-02-21T10:18:46.6484898Z bar.sync 0; 2026-02-21T10:18:46.6485186Z st.shared.v4.b32 [%r609], {%r11889, %r11887, %r11888, %r11886}; 2026-02-21T10:18:46.6485502Z st.shared.v4.b32 [%r610], {%r11895, %r11893, %r11894, %r11892}; 2026-02-21T10:18:46.6485807Z st.shared.v4.b32 [%r611], {%r11901, %r11899, %r11900, %r11898}; 2026-02-21T10:18:46.6486160Z 
st.shared.v4.b32 [%r612], {%r11907, %r11905, %r11906, %r11904}; 2026-02-21T10:18:46.6486412Z $L__tmp17: 2026-02-21T10:18:46.6486903Z .loc 2 291 36 // standard.py:291:36 @[ co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:84:40 ] 2026-02-21T10:18:46.6487320Z // begin inline asm 2026-02-21T10:18:46.6487511Z fence.proxy.async.shared::cta; 2026-02-21T10:18:46.6487700Z // end inline asm 2026-02-21T10:18:46.6487861Z bar.sync 0; 2026-02-21T10:18:46.6488028Z shfl.sync.idx.b32 %r11908, %r4, 0, 31, -1; 2026-02-21T10:18:46.6488263Z wgmma.fence.sync.aligned; 2026-02-21T10:18:46.6488461Z mov.pred %p46, -1; 2026-02-21T10:18:46.6488630Z // begin inline asm 2026-02-21T10:18:46.6490228Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655,%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679,%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698}, {%r8790,%r8791,%r8792,%r8793}, %rd254, %p46, 1, 1; 2026-02-21T10:18:46.6491857Z // end inline asm 2026-02-21T10:18:46.6492008Z // begin inline asm 2026-02-21T10:18:46.6493586Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655,%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679,%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698}, 
{%r8922,%r8923,%r8924,%r8925}, %rd255, %p46, 1, 1; 2026-02-21T10:18:46.6495200Z // end inline asm 2026-02-21T10:18:46.6495348Z // begin inline asm 2026-02-21T10:18:46.6497045Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719,%r12720,%r12721,%r12722,%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743,%r12744,%r12745,%r12746,%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762}, {%r9054,%r9055,%r9056,%r9057}, %rd254, %p46, 1, 1; 2026-02-21T10:18:46.6498832Z // end inline asm 2026-02-21T10:18:46.6498979Z // begin inline asm 2026-02-21T10:18:46.6500560Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719,%r12720,%r12721,%r12722,%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743,%r12744,%r12745,%r12746,%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762}, {%r9186,%r9187,%r9188,%r9189}, %rd255, %p46, 1, 1; 2026-02-21T10:18:46.6502178Z // end inline asm 2026-02-21T10:18:46.6502347Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:46.6502562Z mov.b32 %r11696, 0; 2026-02-21T10:18:46.6502726Z mov.b32 %r9318, %r9449; 2026-02-21T10:18:46.6502973Z mov.b32 %r9319, %r11696; 2026-02-21T10:18:46.6503144Z mov.b32 %r9320, %r11696; 2026-02-21T10:18:46.6503314Z // begin inline asm 2026-02-21T10:18:46.6506020Z // wait for regs: 
%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655,%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679,%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698,%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719,%r12720,%r12721,%r12722,%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743,%r12744,%r12745,%r12746,%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762,%r9318,%r9319,%r9320 2026-02-21T10:18:46.6509050Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:46.6509246Z // end inline asm 2026-02-21T10:18:46.6509400Z $L__tmp18: 2026-02-21T10:18:46.6509692Z .loc 1 41 35 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:41:35 2026-02-21T10:18:46.6510063Z add.s32 %r11909, %r12621, %r11; 2026-02-21T10:18:46.6510393Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6510750Z add.s32 %r11910, %r5575, 20480; 2026-02-21T10:18:46.6510942Z add.s32 %r11911, %r11910, %r11862; 2026-02-21T10:18:46.6511271Z .loc 1 52 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:52:32 2026-02-21T10:18:46.6511633Z add.s32 %r11912, %r11911, %r597; 2026-02-21T10:18:46.6511825Z ld.shared.b16 %rs577, [%r11912]; 2026-02-21T10:18:46.6512027Z ld.shared.b16 %rs578, [%r11912+256]; 2026-02-21T10:18:46.6512230Z ld.shared.b16 %rs579, [%r11912+16]; 2026-02-21T10:18:46.6512431Z ld.shared.b16 
%rs580, [%r11912+272]; 2026-02-21T10:18:46.6512630Z ld.shared.b16 %rs581, [%r11912+2048]; 2026-02-21T10:18:46.6512838Z ld.shared.b16 %rs582, [%r11912+2304]; 2026-02-21T10:18:46.6513040Z ld.shared.b16 %rs583, [%r11912+2064]; 2026-02-21T10:18:46.6513239Z ld.shared.b16 %rs584, [%r11912+2320]; 2026-02-21T10:18:46.6513442Z add.s32 %r11913, %r11911, %r598; 2026-02-21T10:18:46.6513630Z ld.shared.b16 %rs585, [%r11913]; 2026-02-21T10:18:46.6513830Z ld.shared.b16 %rs586, [%r11913+256]; 2026-02-21T10:18:46.6514048Z ld.shared.b16 %rs587, [%r11913+16]; 2026-02-21T10:18:46.6514247Z ld.shared.b16 %rs588, [%r11913+272]; 2026-02-21T10:18:46.6514538Z ld.shared.b16 %rs589, [%r11913+2048]; 2026-02-21T10:18:46.6514798Z ld.shared.b16 %rs590, [%r11913+2304]; 2026-02-21T10:18:46.6515002Z ld.shared.b16 %rs591, [%r11913+2064]; 2026-02-21T10:18:46.6515199Z ld.shared.b16 %rs592, [%r11913+2320]; 2026-02-21T10:18:46.6515405Z cvt.f32.bf16 %r9582, %rs577; 2026-02-21T10:18:46.6515587Z cvt.f32.bf16 %r9583, %rs578; 2026-02-21T10:18:46.6515769Z cvt.f32.bf16 %r9584, %rs585; 2026-02-21T10:18:46.6515950Z cvt.f32.bf16 %r9585, %rs586; 2026-02-21T10:18:46.6516123Z cvt.f32.bf16 %r9714, %rs579; 2026-02-21T10:18:46.6516301Z cvt.f32.bf16 %r9715, %rs580; 2026-02-21T10:18:46.6516607Z cvt.f32.bf16 %r9716, %rs587; 2026-02-21T10:18:46.6516792Z cvt.f32.bf16 %r9717, %rs588; 2026-02-21T10:18:46.6516963Z cvt.f32.bf16 %r9846, %rs581; 2026-02-21T10:18:46.6517148Z cvt.f32.bf16 %r9847, %rs582; 2026-02-21T10:18:46.6517322Z cvt.f32.bf16 %r9848, %rs589; 2026-02-21T10:18:46.6517501Z cvt.f32.bf16 %r9849, %rs590; 2026-02-21T10:18:46.6517674Z cvt.f32.bf16 %r9978, %rs583; 2026-02-21T10:18:46.6517849Z cvt.f32.bf16 %r9979, %rs584; 2026-02-21T10:18:46.6518044Z cvt.f32.bf16 %r9980, %rs591; 2026-02-21T10:18:46.6518295Z cvt.f32.bf16 %r9981, %rs592; 2026-02-21T10:18:46.6518627Z .loc 1 54 62 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:54:62 2026-02-21T10:18:46.6519001Z mad.lo.s32 %r11914, %r11909, 1280, %r12625; 
2026-02-21T10:18:46.6519434Z .loc 1 54 34 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:54:34 2026-02-21T10:18:46.6519787Z cvt.s64.s32 %rd290, %r11914; 2026-02-21T10:18:46.6519972Z add.s64 %rd258, %rd26, %rd290; 2026-02-21T10:18:46.6520302Z .loc 1 54 87 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:54:87 2026-02-21T10:18:46.6520650Z // begin inline asm 2026-02-21T10:18:46.6520817Z mov.u32 %r9452, 0x0; 2026-02-21T10:18:46.6520971Z mov.u32 %r9453, 0x0; 2026-02-21T10:18:46.6521163Z ld.global.v2.b32 { %r9452, %r9453 }, [ %rd258 + 0 ]; 2026-02-21T10:18:46.6521410Z // end inline asm 2026-02-21T10:18:46.6521710Z .loc 1 62 28 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:62:28 2026-02-21T10:18:46.6522054Z bar.sync 0; 2026-02-21T10:18:46.6522211Z st.shared.b8 [%r599], %r9452; 2026-02-21T10:18:46.6522410Z prmt.b32 %r11915, %r9452, 0, 0x7771U; 2026-02-21T10:18:46.6522612Z st.shared.b8 [%r600], %r11915; 2026-02-21T10:18:46.6522807Z prmt.b32 %r11916, %r9452, 0, 0x7772U; 2026-02-21T10:18:46.6523006Z st.shared.b8 [%r601+256], %r11916; 2026-02-21T10:18:46.6523210Z prmt.b32 %r11917, %r9452, 0, 0x7773U; 2026-02-21T10:18:46.6523405Z st.shared.b8 [%r602+256], %r11917; 2026-02-21T10:18:46.6523610Z st.shared.b8 [%r603+512], %r9453; 2026-02-21T10:18:46.6523805Z prmt.b32 %r11918, %r9453, 0, 0x7771U; 2026-02-21T10:18:46.6524004Z st.shared.b8 [%r604+512], %r11918; 2026-02-21T10:18:46.6524201Z prmt.b32 %r11919, %r9453, 0, 0x7772U; 2026-02-21T10:18:46.6524396Z st.shared.b8 [%r605+768], %r11919; 2026-02-21T10:18:46.6524601Z prmt.b32 %r11920, %r9453, 0, 0x7773U; 2026-02-21T10:18:46.6524800Z st.shared.b8 [%r606+768], %r11920; 2026-02-21T10:18:46.6525004Z bar.sync 0; 2026-02-21T10:18:46.6525161Z ld.shared.b32 %r11921, [%r607]; 2026-02-21T10:18:46.6525362Z prmt.b32 %r11922, %r11921, 0, 0x7770U; 2026-02-21T10:18:46.6525566Z cvt.u16.u32 %rs593, %r11922; 2026-02-21T10:18:46.6525769Z prmt.b32 %r11923, %r11921, 0, 0x7771U; 2026-02-21T10:18:46.6525972Z cvt.u16.u32 
%rs594, %r11923; 2026-02-21T10:18:46.6526158Z prmt.b32 %r11924, %r11921, 0, 0x7772U; 2026-02-21T10:18:46.6526361Z cvt.u16.u32 %rs595, %r11924; 2026-02-21T10:18:46.6526670Z prmt.b32 %r11925, %r11921, 0, 0x7773U; 2026-02-21T10:18:46.6526872Z cvt.u16.u32 %rs596, %r11925; 2026-02-21T10:18:46.6527052Z ld.shared.b32 %r11926, [%r608]; 2026-02-21T10:18:46.6527245Z prmt.b32 %r11927, %r11926, 0, 0x7770U; 2026-02-21T10:18:46.6527440Z cvt.u16.u32 %rs597, %r11927; 2026-02-21T10:18:46.6527626Z prmt.b32 %r11928, %r11926, 0, 0x7771U; 2026-02-21T10:18:46.6527938Z cvt.u16.u32 %rs598, %r11928; 2026-02-21T10:18:46.6528181Z prmt.b32 %r11929, %r11926, 0, 0x7772U; 2026-02-21T10:18:46.6528383Z cvt.u16.u32 %rs599, %r11929; 2026-02-21T10:18:46.6528574Z prmt.b32 %r11930, %r11926, 0, 0x7773U; 2026-02-21T10:18:46.6528773Z cvt.u16.u32 %rs600, %r11930; 2026-02-21T10:18:46.6529088Z .loc 1 57 28 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:57:28 2026-02-21T10:18:46.6529445Z shl.b16 %rs601, %rs593, 4; 2026-02-21T10:18:46.6529621Z shl.b16 %rs602, %rs597, 4; 2026-02-21T10:18:46.6529799Z shl.b16 %rs603, %rs594, 4; 2026-02-21T10:18:46.6529968Z shl.b16 %rs604, %rs598, 4; 2026-02-21T10:18:46.6530144Z shl.b16 %rs605, %rs595, 4; 2026-02-21T10:18:46.6530320Z shl.b16 %rs606, %rs599, 4; 2026-02-21T10:18:46.6530500Z shl.b16 %rs607, %rs596, 4; 2026-02-21T10:18:46.6530675Z shl.b16 %rs608, %rs600, 4; 2026-02-21T10:18:46.6530985Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6531342Z cvt.s16.s8 %rs609, %rs601; 2026-02-21T10:18:46.6531517Z shr.s16 %rs610, %rs609, 4; 2026-02-21T10:18:46.6531772Z cvt.s16.s8 %rs611, %rs602; 2026-02-21T10:18:46.6531954Z shr.s16 %rs612, %rs611, 4; 2026-02-21T10:18:46.6532133Z prmt.b32 %r11931, %r11921, 0, 0x8880U; 2026-02-21T10:18:46.6532337Z cvt.u16.u32 %rs613, %r11931; 2026-02-21T10:18:46.6532511Z shr.s16 %rs614, %rs613, 4; 2026-02-21T10:18:46.6532765Z prmt.b32 %r11932, %r11926, 0, 0x8880U; 2026-02-21T10:18:46.6532971Z 
cvt.u16.u32 %rs615, %r11932; 2026-02-21T10:18:46.6533151Z shr.s16 %rs616, %rs615, 4; 2026-02-21T10:18:46.6533464Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6533822Z cvt.rn.f32.s16 %r11933, %rs616; 2026-02-21T10:18:46.6534011Z cvt.rn.f32.s16 %r11934, %rs614; 2026-02-21T10:18:46.6534200Z cvt.rn.f32.s16 %r11935, %rs612; 2026-02-21T10:18:46.6534388Z cvt.rn.f32.s16 %r11936, %rs610; 2026-02-21T10:18:46.6534709Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6535070Z cvt.s16.s8 %rs617, %rs603; 2026-02-21T10:18:46.6535243Z shr.s16 %rs618, %rs617, 4; 2026-02-21T10:18:46.6535419Z cvt.s16.s8 %rs619, %rs604; 2026-02-21T10:18:46.6535603Z shr.s16 %rs620, %rs619, 4; 2026-02-21T10:18:46.6535791Z prmt.b32 %r11937, %r11921, 0, 0x9991U; 2026-02-21T10:18:46.6535990Z cvt.u16.u32 %rs621, %r11937; 2026-02-21T10:18:46.6536174Z shr.s16 %rs622, %rs621, 4; 2026-02-21T10:18:46.6536356Z prmt.b32 %r11938, %r11926, 0, 0x9991U; 2026-02-21T10:18:46.6536683Z cvt.u16.u32 %rs623, %r11938; 2026-02-21T10:18:46.6536866Z shr.s16 %rs624, %rs623, 4; 2026-02-21T10:18:46.6537181Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6537544Z cvt.rn.f32.s16 %r11939, %rs624; 2026-02-21T10:18:46.6537733Z cvt.rn.f32.s16 %r11940, %rs622; 2026-02-21T10:18:46.6537942Z cvt.rn.f32.s16 %r11941, %rs620; 2026-02-21T10:18:46.6538130Z cvt.rn.f32.s16 %r11942, %rs618; 2026-02-21T10:18:46.6538467Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6538820Z cvt.s16.s8 %rs625, %rs605; 2026-02-21T10:18:46.6538991Z shr.s16 %rs626, %rs625, 4; 2026-02-21T10:18:46.6539162Z cvt.s16.s8 %rs627, %rs606; 2026-02-21T10:18:46.6539339Z shr.s16 %rs628, %rs627, 4; 2026-02-21T10:18:46.6539528Z prmt.b32 %r11943, %r11921, 0, 0xaaa2U; 2026-02-21T10:18:46.6539728Z cvt.u16.u32 %rs629, %r11943; 2026-02-21T10:18:46.6539904Z shr.s16 %rs630, %rs629, 4; 
2026-02-21T10:18:46.6540081Z prmt.b32 %r11944, %r11926, 0, 0xaaa2U; 2026-02-21T10:18:46.6540280Z cvt.u16.u32 %rs631, %r11944; 2026-02-21T10:18:46.6540457Z shr.s16 %rs632, %rs631, 4; 2026-02-21T10:18:46.6540772Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6541131Z cvt.rn.f32.s16 %r11945, %rs632; 2026-02-21T10:18:46.6541407Z cvt.rn.f32.s16 %r11946, %rs630; 2026-02-21T10:18:46.6541654Z cvt.rn.f32.s16 %r11947, %rs628; 2026-02-21T10:18:46.6541835Z cvt.rn.f32.s16 %r11948, %rs626; 2026-02-21T10:18:46.6542159Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6542507Z cvt.s16.s8 %rs633, %rs607; 2026-02-21T10:18:46.6542683Z shr.s16 %rs634, %rs633, 4; 2026-02-21T10:18:46.6542857Z cvt.s16.s8 %rs635, %rs608; 2026-02-21T10:18:46.6543025Z shr.s16 %rs636, %rs635, 4; 2026-02-21T10:18:46.6543206Z prmt.b32 %r11949, %r11921, 0, 0xbbb3U; 2026-02-21T10:18:46.6543404Z cvt.u16.u32 %rs637, %r11949; 2026-02-21T10:18:46.6543580Z shr.s16 %rs638, %rs637, 4; 2026-02-21T10:18:46.6543769Z prmt.b32 %r11950, %r11926, 0, 0xbbb3U; 2026-02-21T10:18:46.6543970Z cvt.u16.u32 %rs639, %r11950; 2026-02-21T10:18:46.6544144Z shr.s16 %rs640, %rs639, 4; 2026-02-21T10:18:46.6544464Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6544821Z cvt.rn.f32.s16 %r11951, %rs640; 2026-02-21T10:18:46.6545006Z cvt.rn.f32.s16 %r11952, %rs638; 2026-02-21T10:18:46.6545282Z cvt.rn.f32.s16 %r11953, %rs636; 2026-02-21T10:18:46.6545476Z cvt.rn.f32.s16 %r11954, %rs634; 2026-02-21T10:18:46.6545658Z bar.sync 0; 2026-02-21T10:18:46.6545859Z st.shared.v4.b32 [%r609], {%r11936, %r11934, %r11935, %r11933}; 2026-02-21T10:18:46.6546233Z st.shared.v4.b32 [%r610], {%r11942, %r11940, %r11941, %r11939}; 2026-02-21T10:18:46.6546654Z st.shared.v4.b32 [%r611], {%r11948, %r11946, %r11947, %r11945}; 2026-02-21T10:18:46.6546964Z st.shared.v4.b32 [%r612], {%r11954, %r11952, %r11953, %r11951}; 
2026-02-21T10:18:46.6547224Z $L__tmp19: 2026-02-21T10:18:46.6547582Z .loc 2 291 36 // standard.py:291:36 @[ co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:84:40 ] 2026-02-21T10:18:46.6548005Z // begin inline asm 2026-02-21T10:18:46.6548193Z fence.proxy.async.shared::cta; 2026-02-21T10:18:46.6548392Z // end inline asm 2026-02-21T10:18:46.6548638Z bar.sync 0; 2026-02-21T10:18:46.6548797Z wgmma.fence.sync.aligned; 2026-02-21T10:18:46.6548979Z // begin inline asm 2026-02-21T10:18:46.6550571Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655,%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679,%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698}, {%r9582,%r9583,%r9584,%r9585}, %rd254, %p46, 1, 1; 2026-02-21T10:18:46.6552189Z // end inline asm 2026-02-21T10:18:46.6552359Z // begin inline asm 2026-02-21T10:18:46.6553936Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655,%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679,%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698}, {%r9714,%r9715,%r9716,%r9717}, %rd255, %p46, 1, 1; 2026-02-21T10:18:46.6555557Z // end inline asm 2026-02-21T10:18:46.6555708Z // begin inline asm 2026-02-21T10:18:46.6557407Z 
wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719,%r12720,%r12721,%r12722,%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743,%r12744,%r12745,%r12746,%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762}, {%r9846,%r9847,%r9848,%r9849}, %rd254, %p46, 1, 1; 2026-02-21T10:18:46.6559185Z // end inline asm 2026-02-21T10:18:46.6559340Z // begin inline asm 2026-02-21T10:18:46.6560910Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719,%r12720,%r12721,%r12722,%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743,%r12744,%r12745,%r12746,%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762}, {%r9978,%r9979,%r9980,%r9981}, %rd255, %p46, 1, 1; 2026-02-21T10:18:46.6562525Z // end inline asm 2026-02-21T10:18:46.6562707Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:46.6562974Z mov.b32 %r10110, %r9449; 2026-02-21T10:18:46.6563153Z mov.b32 %r10111, %r11696; 2026-02-21T10:18:46.6563322Z mov.b32 %r10112, %r11696; 2026-02-21T10:18:46.6563490Z // begin inline asm 2026-02-21T10:18:46.6566172Z // wait for regs: 
%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655,%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679,%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698,%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719,%r12720,%r12721,%r12722,%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743,%r12744,%r12745,%r12746,%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762,%r10110,%r10111,%r10112 2026-02-21T10:18:46.6569113Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:46.6569314Z // end inline asm 2026-02-21T10:18:46.6569457Z $L__tmp20: 2026-02-21T10:18:46.6569747Z .loc 1 41 35 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:41:35 2026-02-21T10:18:46.6570112Z add.s32 %r11955, %r12616, %r11; 2026-02-21T10:18:46.6570437Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6570791Z add.s32 %r11956, %r5575, 40960; 2026-02-21T10:18:46.6570972Z add.s32 %r11957, %r11956, %r11862; 2026-02-21T10:18:46.6571306Z .loc 1 52 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:52:32 2026-02-21T10:18:46.6571666Z add.s32 %r11958, %r11957, %r597; 2026-02-21T10:18:46.6571859Z ld.shared.b16 %rs641, [%r11958]; 2026-02-21T10:18:46.6572057Z ld.shared.b16 %rs642, [%r11958+256]; 2026-02-21T10:18:46.6572260Z ld.shared.b16 %rs643, [%r11958+16]; 2026-02-21T10:18:46.6572467Z 
ld.shared.b16 %rs644, [%r11958+272]; 2026-02-21T10:18:46.6572682Z ld.shared.b16 %rs645, [%r11958+2048]; 2026-02-21T10:18:46.6572892Z ld.shared.b16 %rs646, [%r11958+2304]; 2026-02-21T10:18:46.6573091Z ld.shared.b16 %rs647, [%r11958+2064]; 2026-02-21T10:18:46.6573296Z ld.shared.b16 %rs648, [%r11958+2320]; 2026-02-21T10:18:46.6573495Z add.s32 %r11959, %r11957, %r598; 2026-02-21T10:18:46.6573683Z ld.shared.b16 %rs649, [%r11959]; 2026-02-21T10:18:46.6573876Z ld.shared.b16 %rs650, [%r11959+256]; 2026-02-21T10:18:46.6574075Z ld.shared.b16 %rs651, [%r11959+16]; 2026-02-21T10:18:46.6574368Z ld.shared.b16 %rs652, [%r11959+272]; 2026-02-21T10:18:46.6578452Z ld.shared.b16 %rs653, [%r11959+2048]; 2026-02-21T10:18:46.6578768Z ld.shared.b16 %rs654, [%r11959+2304]; 2026-02-21T10:18:46.6579003Z ld.shared.b16 %rs655, [%r11959+2064]; 2026-02-21T10:18:46.6579226Z ld.shared.b16 %rs656, [%r11959+2320]; 2026-02-21T10:18:46.6579442Z cvt.f32.bf16 %r10374, %rs641; 2026-02-21T10:18:46.6579638Z cvt.f32.bf16 %r10375, %rs642; 2026-02-21T10:18:46.6579820Z cvt.f32.bf16 %r10376, %rs649; 2026-02-21T10:18:46.6580004Z cvt.f32.bf16 %r10377, %rs650; 2026-02-21T10:18:46.6580185Z cvt.f32.bf16 %r10506, %rs643; 2026-02-21T10:18:46.6580368Z cvt.f32.bf16 %r10507, %rs644; 2026-02-21T10:18:46.6580541Z cvt.f32.bf16 %r10508, %rs651; 2026-02-21T10:18:46.6580721Z cvt.f32.bf16 %r10509, %rs652; 2026-02-21T10:18:46.6580904Z cvt.f32.bf16 %r10638, %rs645; 2026-02-21T10:18:46.6581087Z cvt.f32.bf16 %r10639, %rs646; 2026-02-21T10:18:46.6581268Z cvt.f32.bf16 %r10640, %rs653; 2026-02-21T10:18:46.6581445Z cvt.f32.bf16 %r10641, %rs654; 2026-02-21T10:18:46.6581625Z cvt.f32.bf16 %r10770, %rs647; 2026-02-21T10:18:46.6581800Z cvt.f32.bf16 %r10771, %rs648; 2026-02-21T10:18:46.6582113Z cvt.f32.bf16 %r10772, %rs655; 2026-02-21T10:18:46.6582307Z cvt.f32.bf16 %r10773, %rs656; 2026-02-21T10:18:46.6582655Z .loc 1 54 62 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:54:62 2026-02-21T10:18:46.6583153Z mad.lo.s32 %r11960, 
%r11955, 1280, %r12625; 2026-02-21T10:18:46.6583546Z .loc 1 54 34 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:54:34 2026-02-21T10:18:46.6583925Z cvt.s64.s32 %rd291, %r11960; 2026-02-21T10:18:46.6584122Z add.s64 %rd263, %rd26, %rd291; 2026-02-21T10:18:46.6584465Z .loc 1 54 87 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:54:87 2026-02-21T10:18:46.6584821Z // begin inline asm 2026-02-21T10:18:46.6585001Z mov.u32 %r10244, 0x0; 2026-02-21T10:18:46.6585167Z mov.u32 %r10245, 0x0; 2026-02-21T10:18:46.6585374Z ld.global.v2.b32 { %r10244, %r10245 }, [ %rd263 + 0 ]; 2026-02-21T10:18:46.6585614Z // end inline asm 2026-02-21T10:18:46.6585921Z .loc 1 62 28 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:62:28 2026-02-21T10:18:46.6586275Z bar.sync 0; 2026-02-21T10:18:46.6586431Z st.shared.b8 [%r599], %r10244; 2026-02-21T10:18:46.6586797Z prmt.b32 %r11961, %r10244, 0, 0x7771U; 2026-02-21T10:18:46.6587009Z st.shared.b8 [%r600], %r11961; 2026-02-21T10:18:46.6587203Z prmt.b32 %r11962, %r10244, 0, 0x7772U; 2026-02-21T10:18:46.6587418Z st.shared.b8 [%r601+256], %r11962; 2026-02-21T10:18:46.6587629Z prmt.b32 %r11963, %r10244, 0, 0x7773U; 2026-02-21T10:18:46.6587834Z st.shared.b8 [%r602+256], %r11963; 2026-02-21T10:18:46.6588033Z st.shared.b8 [%r603+512], %r10245; 2026-02-21T10:18:46.6588232Z prmt.b32 %r11964, %r10245, 0, 0x7771U; 2026-02-21T10:18:46.6588505Z st.shared.b8 [%r604+512], %r11964; 2026-02-21T10:18:46.6588726Z prmt.b32 %r11965, %r10245, 0, 0x7772U; 2026-02-21T10:18:46.6588929Z st.shared.b8 [%r605+768], %r11965; 2026-02-21T10:18:46.6589133Z prmt.b32 %r11966, %r10245, 0, 0x7773U; 2026-02-21T10:18:46.6589333Z st.shared.b8 [%r606+768], %r11966; 2026-02-21T10:18:46.6589522Z bar.sync 0; 2026-02-21T10:18:46.6589677Z ld.shared.b32 %r11967, [%r607]; 2026-02-21T10:18:46.6589877Z prmt.b32 %r11968, %r11967, 0, 0x7770U; 2026-02-21T10:18:46.6590090Z cvt.u16.u32 %rs657, %r11968; 2026-02-21T10:18:46.6590299Z prmt.b32 %r11969, %r11967, 0, 0x7771U; 
2026-02-21T10:18:46.6590519Z cvt.u16.u32 %rs658, %r11969; 2026-02-21T10:18:46.6590708Z prmt.b32 %r11970, %r11967, 0, 0x7772U; 2026-02-21T10:18:46.6590914Z cvt.u16.u32 %rs659, %r11970; 2026-02-21T10:18:46.6591096Z prmt.b32 %r11971, %r11967, 0, 0x7773U; 2026-02-21T10:18:46.6591298Z cvt.u16.u32 %rs660, %r11971; 2026-02-21T10:18:46.6591481Z ld.shared.b32 %r11972, [%r608]; 2026-02-21T10:18:46.6591680Z prmt.b32 %r11973, %r11972, 0, 0x7770U; 2026-02-21T10:18:46.6591986Z cvt.u16.u32 %rs661, %r11973; 2026-02-21T10:18:46.6592173Z prmt.b32 %r11974, %r11972, 0, 0x7771U; 2026-02-21T10:18:46.6592452Z cvt.u16.u32 %rs662, %r11974; 2026-02-21T10:18:46.6592634Z prmt.b32 %r11975, %r11972, 0, 0x7772U; 2026-02-21T10:18:46.6592839Z cvt.u16.u32 %rs663, %r11975; 2026-02-21T10:18:46.6593020Z prmt.b32 %r11976, %r11972, 0, 0x7773U; 2026-02-21T10:18:46.6593226Z cvt.u16.u32 %rs664, %r11976; 2026-02-21T10:18:46.6593559Z .loc 1 57 28 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:57:28 2026-02-21T10:18:46.6593933Z shl.b16 %rs665, %rs657, 4; 2026-02-21T10:18:46.6594116Z shl.b16 %rs666, %rs661, 4; 2026-02-21T10:18:46.6594292Z shl.b16 %rs667, %rs658, 4; 2026-02-21T10:18:46.6594472Z shl.b16 %rs668, %rs662, 4; 2026-02-21T10:18:46.6594653Z shl.b16 %rs669, %rs659, 4; 2026-02-21T10:18:46.6594829Z shl.b16 %rs670, %rs663, 4; 2026-02-21T10:18:46.6595000Z shl.b16 %rs671, %rs660, 4; 2026-02-21T10:18:46.6595176Z shl.b16 %rs672, %rs664, 4; 2026-02-21T10:18:46.6595493Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6595856Z cvt.s16.s8 %rs673, %rs665; 2026-02-21T10:18:46.6596109Z shr.s16 %rs674, %rs673, 4; 2026-02-21T10:18:46.6596296Z cvt.s16.s8 %rs675, %rs666; 2026-02-21T10:18:46.6596605Z shr.s16 %rs676, %rs675, 4; 2026-02-21T10:18:46.6596813Z prmt.b32 %r11977, %r11967, 0, 0x8880U; 2026-02-21T10:18:46.6597024Z cvt.u16.u32 %rs677, %r11977; 2026-02-21T10:18:46.6597278Z shr.s16 %rs678, %rs677, 4; 2026-02-21T10:18:46.6597465Z prmt.b32 %r11978, %r11972, 
0, 0x8880U; 2026-02-21T10:18:46.6597664Z cvt.u16.u32 %rs679, %r11978; 2026-02-21T10:18:46.6597846Z shr.s16 %rs680, %rs679, 4; 2026-02-21T10:18:46.6598166Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6598241Z cvt.rn.f32.s16 %r11979, %rs680; 2026-02-21T10:18:46.6598308Z cvt.rn.f32.s16 %r11980, %rs678; 2026-02-21T10:18:46.6598376Z cvt.rn.f32.s16 %r11981, %rs676; 2026-02-21T10:18:46.6598455Z cvt.rn.f32.s16 %r11982, %rs674; 2026-02-21T10:18:46.6598676Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6598742Z cvt.s16.s8 %rs681, %rs667; 2026-02-21T10:18:46.6598810Z shr.s16 %rs682, %rs681, 4; 2026-02-21T10:18:46.6598873Z cvt.s16.s8 %rs683, %rs668; 2026-02-21T10:18:46.6598936Z shr.s16 %rs684, %rs683, 4; 2026-02-21T10:18:46.6599010Z prmt.b32 %r11983, %r11967, 0, 0x9991U; 2026-02-21T10:18:46.6599079Z cvt.u16.u32 %rs685, %r11983; 2026-02-21T10:18:46.6599140Z shr.s16 %rs686, %rs685, 4; 2026-02-21T10:18:46.6599212Z prmt.b32 %r11984, %r11972, 0, 0x9991U; 2026-02-21T10:18:46.6599288Z cvt.u16.u32 %rs687, %r11984; 2026-02-21T10:18:46.6599351Z shr.s16 %rs688, %rs687, 4; 2026-02-21T10:18:46.6599567Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6599636Z cvt.rn.f32.s16 %r11985, %rs688; 2026-02-21T10:18:46.6599706Z cvt.rn.f32.s16 %r11986, %rs686; 2026-02-21T10:18:46.6599782Z cvt.rn.f32.s16 %r11987, %rs684; 2026-02-21T10:18:46.6599847Z cvt.rn.f32.s16 %r11988, %rs682; 2026-02-21T10:18:46.6600060Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6600125Z cvt.s16.s8 %rs689, %rs669; 2026-02-21T10:18:46.6600188Z shr.s16 %rs690, %rs689, 4; 2026-02-21T10:18:46.6600256Z cvt.s16.s8 %rs691, %rs670; 2026-02-21T10:18:46.6600320Z shr.s16 %rs692, %rs691, 4; 2026-02-21T10:18:46.6600390Z prmt.b32 %r11989, %r11967, 0, 0xaaa2U; 2026-02-21T10:18:46.6600454Z cvt.u16.u32 %rs693, %r11989; 
2026-02-21T10:18:46.6600525Z shr.s16 %rs694, %rs693, 4; 2026-02-21T10:18:46.6600593Z prmt.b32 %r11990, %r11972, 0, 0xaaa2U; 2026-02-21T10:18:46.6600656Z cvt.u16.u32 %rs695, %r11990; 2026-02-21T10:18:46.6600721Z shr.s16 %rs696, %rs695, 4; 2026-02-21T10:18:46.6600922Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6601080Z cvt.rn.f32.s16 %r11991, %rs696; 2026-02-21T10:18:46.6601210Z cvt.rn.f32.s16 %r11992, %rs694; 2026-02-21T10:18:46.6601286Z cvt.rn.f32.s16 %r11993, %rs692; 2026-02-21T10:18:46.6601349Z cvt.rn.f32.s16 %r11994, %rs690; 2026-02-21T10:18:46.6601560Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6601632Z cvt.s16.s8 %rs697, %rs671; 2026-02-21T10:18:46.6601697Z shr.s16 %rs698, %rs697, 4; 2026-02-21T10:18:46.6601759Z cvt.s16.s8 %rs699, %rs672; 2026-02-21T10:18:46.6601826Z shr.s16 %rs700, %rs699, 4; 2026-02-21T10:18:46.6601900Z prmt.b32 %r11995, %r11967, 0, 0xbbb3U; 2026-02-21T10:18:46.6601966Z cvt.u16.u32 %rs701, %r11995; 2026-02-21T10:18:46.6602030Z shr.s16 %rs702, %rs701, 4; 2026-02-21T10:18:46.6602107Z prmt.b32 %r11996, %r11972, 0, 0xbbb3U; 2026-02-21T10:18:46.6602171Z cvt.u16.u32 %rs703, %r11996; 2026-02-21T10:18:46.6602237Z shr.s16 %rs704, %rs703, 4; 2026-02-21T10:18:46.6602451Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6602582Z cvt.rn.f32.s16 %r11997, %rs704; 2026-02-21T10:18:46.6602661Z cvt.rn.f32.s16 %r11998, %rs702; 2026-02-21T10:18:46.6602734Z cvt.rn.f32.s16 %r11999, %rs700; 2026-02-21T10:18:46.6602797Z cvt.rn.f32.s16 %r12000, %rs698; 2026-02-21T10:18:46.6602857Z bar.sync 0; 2026-02-21T10:18:46.6603032Z st.shared.v4.b32 [%r609], {%r11982, %r11980, %r11981, %r11979}; 2026-02-21T10:18:46.6603154Z st.shared.v4.b32 [%r610], {%r11988, %r11986, %r11987, %r11985}; 2026-02-21T10:18:46.6603265Z st.shared.v4.b32 [%r611], {%r11994, %r11992, %r11993, %r11991}; 2026-02-21T10:18:46.6603374Z 
st.shared.v4.b32 [%r612], {%r12000, %r11998, %r11999, %r11997}; 2026-02-21T10:18:46.6603436Z $L__tmp21: 2026-02-21T10:18:46.6603721Z .loc 2 291 36 // standard.py:291:36 @[ co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:84:40 ] 2026-02-21T10:18:46.6603787Z // begin inline asm 2026-02-21T10:18:46.6603882Z fence.proxy.async.shared::cta; 2026-02-21T10:18:46.6603944Z // end inline asm 2026-02-21T10:18:46.6604003Z bar.sync 0; 2026-02-21T10:18:46.6604078Z wgmma.fence.sync.aligned; 2026-02-21T10:18:46.6604147Z // begin inline asm 2026-02-21T10:18:46.6605615Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655,%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679,%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698}, {%r10374,%r10375,%r10376,%r10377}, %rd254, %p46, 1, 1; 2026-02-21T10:18:46.6605680Z // end inline asm 2026-02-21T10:18:46.6605744Z // begin inline asm 2026-02-21T10:18:46.6607342Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655,%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679,%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698}, {%r10506,%r10507,%r10508,%r10509}, %rd255, %p46, 1, 1; 2026-02-21T10:18:46.6607411Z // end inline asm 
2026-02-21T10:18:46.6607470Z // begin inline asm 2026-02-21T10:18:46.6608937Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719,%r12720,%r12721,%r12722,%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743,%r12744,%r12745,%r12746,%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762}, {%r10638,%r10639,%r10640,%r10641}, %rd254, %p46, 1, 1; 2026-02-21T10:18:46.6609143Z // end inline asm 2026-02-21T10:18:46.6609207Z // begin inline asm 2026-02-21T10:18:46.6610669Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719,%r12720,%r12721,%r12722,%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743,%r12744,%r12745,%r12746,%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762}, {%r10770,%r10771,%r10772,%r10773}, %rd255, %p46, 1, 1; 2026-02-21T10:18:46.6610731Z // end inline asm 2026-02-21T10:18:46.6610866Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:46.6610937Z mov.b32 %r10902, %r9449; 2026-02-21T10:18:46.6611000Z mov.b32 %r10903, %r11696; 2026-02-21T10:18:46.6611064Z mov.b32 %r10904, %r11696; 2026-02-21T10:18:46.6611125Z // begin inline asm 2026-02-21T10:18:46.6613662Z // wait for regs: 
%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655,%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679,%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698,%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719,%r12720,%r12721,%r12722,%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743,%r12744,%r12745,%r12746,%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762,%r10902,%r10903,%r10904 2026-02-21T10:18:46.6613752Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:46.6613811Z // end inline asm 2026-02-21T10:18:46.6613866Z $L__tmp22: 2026-02-21T10:18:46.6614081Z .loc 1 41 35 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:41:35 2026-02-21T10:18:46.6614152Z add.s32 %r12001, %r12611, %r11; 2026-02-21T10:18:46.6614357Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6614422Z add.s32 %r12002, %r5575, 61440; 2026-02-21T10:18:46.6614494Z add.s32 %r12003, %r12002, %r11862; 2026-02-21T10:18:46.6614695Z .loc 1 52 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:52:32 2026-02-21T10:18:46.6614760Z add.s32 %r12004, %r12003, %r597; 2026-02-21T10:18:46.6614833Z ld.shared.b16 %rs705, [%r12004]; 2026-02-21T10:18:46.6614905Z ld.shared.b16 %rs706, [%r12004+256]; 2026-02-21T10:18:46.6614978Z ld.shared.b16 %rs707, [%r12004+16]; 2026-02-21T10:18:46.6615046Z 
ld.shared.b16 %rs708, [%r12004+272]; 2026-02-21T10:18:46.6615125Z ld.shared.b16 %rs709, [%r12004+2048]; 2026-02-21T10:18:46.6615193Z ld.shared.b16 %rs710, [%r12004+2304]; 2026-02-21T10:18:46.6615260Z ld.shared.b16 %rs711, [%r12004+2064]; 2026-02-21T10:18:46.6615331Z ld.shared.b16 %rs712, [%r12004+2320]; 2026-02-21T10:18:46.6615397Z add.s32 %r12005, %r12003, %r598; 2026-02-21T10:18:46.6615463Z ld.shared.b16 %rs713, [%r12005]; 2026-02-21T10:18:46.6615599Z ld.shared.b16 %rs714, [%r12005+256]; 2026-02-21T10:18:46.6615670Z ld.shared.b16 %rs715, [%r12005+16]; 2026-02-21T10:18:46.6615790Z ld.shared.b16 %rs716, [%r12005+272]; 2026-02-21T10:18:46.6615860Z ld.shared.b16 %rs717, [%r12005+2048]; 2026-02-21T10:18:46.6615933Z ld.shared.b16 %rs718, [%r12005+2304]; 2026-02-21T10:18:46.6616002Z ld.shared.b16 %rs719, [%r12005+2064]; 2026-02-21T10:18:46.6616070Z ld.shared.b16 %rs720, [%r12005+2320]; 2026-02-21T10:18:46.6616145Z cvt.f32.bf16 %r11166, %rs705; 2026-02-21T10:18:46.6616209Z cvt.f32.bf16 %r11167, %rs706; 2026-02-21T10:18:46.6616271Z cvt.f32.bf16 %r11168, %rs713; 2026-02-21T10:18:46.6616334Z cvt.f32.bf16 %r11169, %rs714; 2026-02-21T10:18:46.6616401Z cvt.f32.bf16 %r11298, %rs707; 2026-02-21T10:18:46.6616588Z cvt.f32.bf16 %r11299, %rs708; 2026-02-21T10:18:46.6616655Z cvt.f32.bf16 %r11300, %rs715; 2026-02-21T10:18:46.6616722Z cvt.f32.bf16 %r11301, %rs716; 2026-02-21T10:18:46.6616783Z cvt.f32.bf16 %r11430, %rs709; 2026-02-21T10:18:46.6616850Z cvt.f32.bf16 %r11431, %rs710; 2026-02-21T10:18:46.6616911Z cvt.f32.bf16 %r11432, %rs717; 2026-02-21T10:18:46.6616980Z cvt.f32.bf16 %r11433, %rs718; 2026-02-21T10:18:46.6617137Z cvt.f32.bf16 %r11562, %rs711; 2026-02-21T10:18:46.6617203Z cvt.f32.bf16 %r11563, %rs712; 2026-02-21T10:18:46.6617274Z cvt.f32.bf16 %r11564, %rs719; 2026-02-21T10:18:46.6617344Z cvt.f32.bf16 %r11565, %rs720; 2026-02-21T10:18:46.6617614Z .loc 1 54 62 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:54:62 2026-02-21T10:18:46.6617701Z mad.lo.s32 %r12006, 
%r12001, 1280, %r12625; 2026-02-21T10:18:46.6617906Z .loc 1 54 34 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:54:34 2026-02-21T10:18:46.6617971Z cvt.s64.s32 %rd292, %r12006; 2026-02-21T10:18:46.6618043Z add.s64 %rd268, %rd26, %rd292; 2026-02-21T10:18:46.6618243Z .loc 1 54 87 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:54:87 2026-02-21T10:18:46.6618307Z // begin inline asm 2026-02-21T10:18:46.6618368Z mov.u32 %r11036, 0x0; 2026-02-21T10:18:46.6618433Z mov.u32 %r11037, 0x0; 2026-02-21T10:18:46.6618540Z ld.global.v2.b32 { %r11036, %r11037 }, [ %rd268 + 0 ]; 2026-02-21T10:18:46.6618598Z // end inline asm 2026-02-21T10:18:46.6618807Z .loc 1 62 28 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:62:28 2026-02-21T10:18:46.6618865Z bar.sync 0; 2026-02-21T10:18:46.6618933Z st.shared.b8 [%r599], %r11036; 2026-02-21T10:18:46.6619010Z prmt.b32 %r12007, %r11036, 0, 0x7771U; 2026-02-21T10:18:46.6619076Z st.shared.b8 [%r600], %r12007; 2026-02-21T10:18:46.6619145Z prmt.b32 %r12008, %r11036, 0, 0x7772U; 2026-02-21T10:18:46.6619213Z st.shared.b8 [%r601+256], %r12008; 2026-02-21T10:18:46.6619283Z prmt.b32 %r12009, %r11036, 0, 0x7773U; 2026-02-21T10:18:46.6619348Z st.shared.b8 [%r602+256], %r12009; 2026-02-21T10:18:46.6619413Z st.shared.b8 [%r603+512], %r11037; 2026-02-21T10:18:46.6619496Z prmt.b32 %r12010, %r11037, 0, 0x7771U; 2026-02-21T10:18:46.6619564Z st.shared.b8 [%r604+512], %r12010; 2026-02-21T10:18:46.6619634Z prmt.b32 %r12011, %r11037, 0, 0x7772U; 2026-02-21T10:18:46.6619700Z st.shared.b8 [%r605+768], %r12011; 2026-02-21T10:18:46.6619771Z prmt.b32 %r12012, %r11037, 0, 0x7773U; 2026-02-21T10:18:46.6619834Z st.shared.b8 [%r606+768], %r12012; 2026-02-21T10:18:46.6619889Z bar.sync 0; 2026-02-21T10:18:46.6619961Z ld.shared.b32 %r12013, [%r607]; 2026-02-21T10:18:46.6620028Z prmt.b32 %r12014, %r12013, 0, 0x7770U; 2026-02-21T10:18:46.6620094Z cvt.u16.u32 %rs721, %r12014; 2026-02-21T10:18:46.6620160Z prmt.b32 %r12015, %r12013, 0, 0x7771U; 
2026-02-21T10:18:46.6620227Z cvt.u16.u32 %rs722, %r12015; 2026-02-21T10:18:46.6620295Z prmt.b32 %r12016, %r12013, 0, 0x7772U; 2026-02-21T10:18:46.6620357Z cvt.u16.u32 %rs723, %r12016; 2026-02-21T10:18:46.6620426Z prmt.b32 %r12017, %r12013, 0, 0x7773U; 2026-02-21T10:18:46.6620486Z cvt.u16.u32 %rs724, %r12017; 2026-02-21T10:18:46.6620552Z ld.shared.b32 %r12018, [%r608]; 2026-02-21T10:18:46.6620695Z prmt.b32 %r12019, %r12018, 0, 0x7770U; 2026-02-21T10:18:46.6620817Z cvt.u16.u32 %rs725, %r12019; 2026-02-21T10:18:46.6620884Z prmt.b32 %r12020, %r12018, 0, 0x7771U; 2026-02-21T10:18:46.6620947Z cvt.u16.u32 %rs726, %r12020; 2026-02-21T10:18:46.6621017Z prmt.b32 %r12021, %r12018, 0, 0x7772U; 2026-02-21T10:18:46.6621078Z cvt.u16.u32 %rs727, %r12021; 2026-02-21T10:18:46.6621147Z prmt.b32 %r12022, %r12018, 0, 0x7773U; 2026-02-21T10:18:46.6621221Z cvt.u16.u32 %rs728, %r12022; 2026-02-21T10:18:46.6621438Z .loc 1 57 28 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:57:28 2026-02-21T10:18:46.6621506Z shl.b16 %rs729, %rs721, 4; 2026-02-21T10:18:46.6621569Z shl.b16 %rs730, %rs725, 4; 2026-02-21T10:18:46.6621636Z shl.b16 %rs731, %rs722, 4; 2026-02-21T10:18:46.6621697Z shl.b16 %rs732, %rs726, 4; 2026-02-21T10:18:46.6621757Z shl.b16 %rs733, %rs723, 4; 2026-02-21T10:18:46.6621822Z shl.b16 %rs734, %rs727, 4; 2026-02-21T10:18:46.6621884Z shl.b16 %rs735, %rs724, 4; 2026-02-21T10:18:46.6621946Z shl.b16 %rs736, %rs728, 4; 2026-02-21T10:18:46.6622204Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6622269Z cvt.s16.s8 %rs737, %rs729; 2026-02-21T10:18:46.6622330Z shr.s16 %rs738, %rs737, 4; 2026-02-21T10:18:46.6622391Z cvt.s16.s8 %rs739, %rs730; 2026-02-21T10:18:46.6622455Z shr.s16 %rs740, %rs739, 4; 2026-02-21T10:18:46.6622568Z prmt.b32 %r12023, %r12013, 0, 0x8880U; 2026-02-21T10:18:46.6622632Z cvt.u16.u32 %rs741, %r12023; 2026-02-21T10:18:46.6622696Z shr.s16 %rs742, %rs741, 4; 2026-02-21T10:18:46.6622765Z prmt.b32 %r12024, %r12018, 
0, 0x8880U; 2026-02-21T10:18:46.6622827Z cvt.u16.u32 %rs743, %r12024; 2026-02-21T10:18:46.6622887Z shr.s16 %rs744, %rs743, 4; 2026-02-21T10:18:46.6623095Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6623161Z cvt.rn.f32.s16 %r12025, %rs744; 2026-02-21T10:18:46.6623227Z cvt.rn.f32.s16 %r12026, %rs742; 2026-02-21T10:18:46.6623302Z cvt.rn.f32.s16 %r12027, %rs740; 2026-02-21T10:18:46.6623374Z cvt.rn.f32.s16 %r12028, %rs738; 2026-02-21T10:18:46.6623577Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6623644Z cvt.s16.s8 %rs745, %rs731; 2026-02-21T10:18:46.6623705Z shr.s16 %rs746, %rs745, 4; 2026-02-21T10:18:46.6623768Z cvt.s16.s8 %rs747, %rs732; 2026-02-21T10:18:46.6623829Z shr.s16 %rs748, %rs747, 4; 2026-02-21T10:18:46.6623901Z prmt.b32 %r12029, %r12013, 0, 0x9991U; 2026-02-21T10:18:46.6623963Z cvt.u16.u32 %rs749, %r12029; 2026-02-21T10:18:46.6624023Z shr.s16 %rs750, %rs749, 4; 2026-02-21T10:18:46.6624092Z prmt.b32 %r12030, %r12018, 0, 0x9991U; 2026-02-21T10:18:46.6624156Z cvt.u16.u32 %rs751, %r12030; 2026-02-21T10:18:46.6624215Z shr.s16 %rs752, %rs751, 4; 2026-02-21T10:18:46.6624415Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6624485Z cvt.rn.f32.s16 %r12031, %rs752; 2026-02-21T10:18:46.6624551Z cvt.rn.f32.s16 %r12032, %rs750; 2026-02-21T10:18:46.6624615Z cvt.rn.f32.s16 %r12033, %rs748; 2026-02-21T10:18:46.6624682Z cvt.rn.f32.s16 %r12034, %rs746; 2026-02-21T10:18:46.6624880Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6624942Z cvt.s16.s8 %rs753, %rs733; 2026-02-21T10:18:46.6625008Z shr.s16 %rs754, %rs753, 4; 2026-02-21T10:18:46.6625070Z cvt.s16.s8 %rs755, %rs734; 2026-02-21T10:18:46.6625131Z shr.s16 %rs756, %rs755, 4; 2026-02-21T10:18:46.6625198Z prmt.b32 %r12035, %r12013, 0, 0xaaa2U; 2026-02-21T10:18:46.6625265Z cvt.u16.u32 %rs757, %r12035; 
2026-02-21T10:18:46.6625326Z shr.s16 %rs758, %rs757, 4; 2026-02-21T10:18:46.6625392Z prmt.b32 %r12036, %r12018, 0, 0xaaa2U; 2026-02-21T10:18:46.6625455Z cvt.u16.u32 %rs759, %r12036; 2026-02-21T10:18:46.6625519Z shr.s16 %rs760, %rs759, 4; 2026-02-21T10:18:46.6625781Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6625894Z cvt.rn.f32.s16 %r12037, %rs760; 2026-02-21T10:18:46.6625955Z cvt.rn.f32.s16 %r12038, %rs758; 2026-02-21T10:18:46.6626015Z cvt.rn.f32.s16 %r12039, %rs756; 2026-02-21T10:18:46.6626081Z cvt.rn.f32.s16 %r12040, %rs754; 2026-02-21T10:18:46.6626279Z .loc 1 59 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:59:25 2026-02-21T10:18:46.6626340Z cvt.s16.s8 %rs761, %rs735; 2026-02-21T10:18:46.6626403Z shr.s16 %rs762, %rs761, 4; 2026-02-21T10:18:46.6626589Z cvt.s16.s8 %rs763, %rs736; 2026-02-21T10:18:46.6626655Z shr.s16 %rs764, %rs763, 4; 2026-02-21T10:18:46.6626725Z prmt.b32 %r12041, %r12013, 0, 0xbbb3U; 2026-02-21T10:18:46.6626786Z cvt.u16.u32 %rs765, %r12041; 2026-02-21T10:18:46.6626846Z shr.s16 %rs766, %rs765, 4; 2026-02-21T10:18:46.6626914Z prmt.b32 %r12042, %r12018, 0, 0xbbb3U; 2026-02-21T10:18:46.6626982Z cvt.u16.u32 %rs767, %r12042; 2026-02-21T10:18:46.6627046Z shr.s16 %rs768, %rs767, 4; 2026-02-21T10:18:46.6627315Z .loc 1 77 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:77:32 2026-02-21T10:18:46.6627383Z cvt.rn.f32.s16 %r12043, %rs768; 2026-02-21T10:18:46.6627445Z cvt.rn.f32.s16 %r12044, %rs766; 2026-02-21T10:18:46.6627506Z cvt.rn.f32.s16 %r12045, %rs764; 2026-02-21T10:18:46.6627631Z cvt.rn.f32.s16 %r12046, %rs762; 2026-02-21T10:18:46.6627688Z bar.sync 0; 2026-02-21T10:18:46.6627808Z st.shared.v4.b32 [%r609], {%r12028, %r12026, %r12027, %r12025}; 2026-02-21T10:18:46.6627923Z st.shared.v4.b32 [%r610], {%r12034, %r12032, %r12033, %r12031}; 2026-02-21T10:18:46.6628035Z st.shared.v4.b32 [%r611], {%r12040, %r12038, %r12039, %r12037}; 2026-02-21T10:18:46.6628144Z 
st.shared.v4.b32 [%r612], {%r12046, %r12044, %r12045, %r12043}; 2026-02-21T10:18:46.6628198Z $L__tmp23: 2026-02-21T10:18:46.6628558Z .loc 2 291 36 // standard.py:291:36 @[ co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:84:40 ] 2026-02-21T10:18:46.6628626Z // begin inline asm 2026-02-21T10:18:46.6628707Z fence.proxy.async.shared::cta; 2026-02-21T10:18:46.6628767Z // end inline asm 2026-02-21T10:18:46.6628820Z bar.sync 0; 2026-02-21T10:18:46.6628893Z wgmma.fence.sync.aligned; 2026-02-21T10:18:46.6628951Z // begin inline asm 2026-02-21T10:18:46.6630429Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655,%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679,%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698}, {%r11166,%r11167,%r11168,%r11169}, %rd254, %p46, 1, 1; 2026-02-21T10:18:46.6630488Z // end inline asm 2026-02-21T10:18:46.6630549Z // begin inline asm 2026-02-21T10:18:46.6632014Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655,%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679,%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698}, {%r11298,%r11299,%r11300,%r11301}, %rd255, %p46, 1, 1; 2026-02-21T10:18:46.6632072Z // end inline asm 
2026-02-21T10:18:46.6632133Z // begin inline asm 2026-02-21T10:18:46.6633589Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719,%r12720,%r12721,%r12722,%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743,%r12744,%r12745,%r12746,%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762}, {%r11430,%r11431,%r11432,%r11433}, %rd254, %p46, 1, 1; 2026-02-21T10:18:46.6633788Z // end inline asm 2026-02-21T10:18:46.6633846Z // begin inline asm 2026-02-21T10:18:46.6635344Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719,%r12720,%r12721,%r12722,%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743,%r12744,%r12745,%r12746,%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762}, {%r11562,%r11563,%r11564,%r11565}, %rd255, %p46, 1, 1; 2026-02-21T10:18:46.6635408Z // end inline asm 2026-02-21T10:18:46.6635486Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:46.6635546Z mov.b32 %r11694, %r9449; 2026-02-21T10:18:46.6635610Z mov.b32 %r11695, %r11696; 2026-02-21T10:18:46.6635712Z // begin inline asm 2026-02-21T10:18:46.6638306Z // wait for regs: 
%r12635,%r12636,%r12637,%r12638,%r12639,%r12640,%r12641,%r12642,%r12643,%r12644,%r12645,%r12646,%r12647,%r12648,%r12649,%r12650,%r12651,%r12652,%r12653,%r12654,%r12655,%r12656,%r12657,%r12658,%r12659,%r12660,%r12661,%r12662,%r12663,%r12664,%r12665,%r12666,%r12667,%r12668,%r12669,%r12670,%r12671,%r12672,%r12673,%r12674,%r12675,%r12676,%r12677,%r12678,%r12679,%r12680,%r12681,%r12682,%r12683,%r12684,%r12685,%r12686,%r12687,%r12688,%r12689,%r12690,%r12691,%r12692,%r12693,%r12694,%r12695,%r12696,%r12697,%r12698,%r12699,%r12700,%r12701,%r12702,%r12703,%r12704,%r12705,%r12706,%r12707,%r12708,%r12709,%r12710,%r12711,%r12712,%r12713,%r12714,%r12715,%r12716,%r12717,%r12718,%r12719,%r12720,%r12721,%r12722,%r12723,%r12724,%r12725,%r12726,%r12727,%r12728,%r12729,%r12730,%r12731,%r12732,%r12733,%r12734,%r12735,%r12736,%r12737,%r12738,%r12739,%r12740,%r12741,%r12742,%r12743,%r12744,%r12745,%r12746,%r12747,%r12748,%r12749,%r12750,%r12751,%r12752,%r12753,%r12754,%r12755,%r12756,%r12757,%r12758,%r12759,%r12760,%r12761,%r12762,%r11694,%r11695,%r11696 2026-02-21T10:18:46.6638391Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:46.6638448Z // end inline asm 2026-02-21T10:18:46.6638501Z $L__tmp24: 2026-02-21T10:18:46.6638731Z .loc 1 19 139 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:19:139 2026-02-21T10:18:46.6638793Z add.s32 %r12047, %r12627, 32; 2026-02-21T10:18:46.6638856Z add.s32 %r12048, %r12632, 1; 2026-02-21T10:18:46.6638927Z setp.gt.s32 %p65, %r12048, 4; 2026-02-21T10:18:46.6638997Z selp.b32 %r12632, 0, %r12048, %p65; 2026-02-21T10:18:46.6639065Z selp.b32 %r12626, 0, %r12047, %p62; 2026-02-21T10:18:46.6639275Z .loc 1 45 22 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:45:22 2026-02-21T10:18:46.6639337Z shl.b32 %r12049, %r12626, 1; 2026-02-21T10:18:46.6639538Z .loc 1 47 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:47:25 2026-02-21T10:18:46.6639613Z add.s32 %r12050, %r12049, %r30; 2026-02-21T10:18:46.6639821Z .loc 1 48 53 // 
co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:53 2026-02-21T10:18:46.6639881Z shl.b32 %r12051, %r12769, 13; 2026-02-21T10:18:46.6639940Z shl.b32 %r12052, %r12770, 13; 2026-02-21T10:18:46.6640001Z shl.b32 %r12053, %r12771, 13; 2026-02-21T10:18:46.6640061Z shl.b32 %r12054, %r12772, 13; 2026-02-21T10:18:46.6640255Z .loc 1 48 60 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:60 2026-02-21T10:18:46.6640409Z add.s32 %r12055, %r12051, %r12050; 2026-02-21T10:18:46.6640549Z add.s32 %r12056, %r12052, %r12050; 2026-02-21T10:18:46.6640611Z add.s32 %r12057, %r12053, %r12050; 2026-02-21T10:18:46.6640672Z add.s32 %r12058, %r12054, %r12050; 2026-02-21T10:18:46.6640873Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6640947Z mad.wide.s32 %rd273, %r12055, 2, %rd25; 2026-02-21T10:18:46.6641017Z mad.wide.s32 %rd274, %r12056, 2, %rd25; 2026-02-21T10:18:46.6641089Z mad.wide.s32 %rd275, %r12057, 2, %rd25; 2026-02-21T10:18:46.6641156Z mad.wide.s32 %rd276, %r12058, 2, %rd25; 2026-02-21T10:18:46.6641353Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6641419Z shl.b32 %r12059, %r12632, 12; 2026-02-21T10:18:46.6641484Z add.s32 %r12060, %r5575, %r12059; 2026-02-21T10:18:46.6641548Z add.s32 %r11828, %r12060, %r594; 2026-02-21T10:18:46.6641616Z selp.b32 %r11829, 8, 0, %p63; 2026-02-21T10:18:46.6641676Z // begin inline asm 2026-02-21T10:18:46.6641910Z cp.async.ca.shared.global [ %r11828 + 0 ], [ %rd273 + 0 ], 0x8, %r11829; 2026-02-21T10:18:46.6641972Z // end inline asm 2026-02-21T10:18:46.6642041Z add.s32 %r11830, %r11828, 1024; 2026-02-21T10:18:46.6642100Z // begin inline asm 2026-02-21T10:18:46.6642241Z cp.async.ca.shared.global [ %r11830 + 0 ], [ %rd274 + 0 ], 0x8, %r11829; 2026-02-21T10:18:46.6642362Z // end inline asm 2026-02-21T10:18:46.6642427Z add.s32 %r11832, %r11828, 2048; 2026-02-21T10:18:46.6642485Z // begin inline asm 2026-02-21T10:18:46.6642622Z 
cp.async.ca.shared.global [ %r11832 + 0 ], [ %rd275 + 0 ], 0x8, %r11829; 2026-02-21T10:18:46.6642682Z // end inline asm 2026-02-21T10:18:46.6642744Z add.s32 %r11834, %r11828, 3072; 2026-02-21T10:18:46.6642802Z // begin inline asm 2026-02-21T10:18:46.6642939Z cp.async.ca.shared.global [ %r11834 + 0 ], [ %rd276 + 0 ], 0x8, %r11829; 2026-02-21T10:18:46.6642999Z // end inline asm 2026-02-21T10:18:46.6643065Z cp.async.commit_group; 2026-02-21T10:18:46.6643275Z .loc 1 40 93 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:40:93 2026-02-21T10:18:46.6643340Z add.s32 %r12617, %r12626, 8; 2026-02-21T10:18:46.6643542Z .loc 1 45 22 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:45:22 2026-02-21T10:18:46.6643602Z shl.b32 %r12061, %r12617, 1; 2026-02-21T10:18:46.6643806Z .loc 1 47 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:47:25 2026-02-21T10:18:46.6643868Z add.s32 %r12062, %r12061, %r30; 2026-02-21T10:18:46.6644062Z .loc 1 48 60 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:60 2026-02-21T10:18:46.6644134Z add.s32 %r12063, %r12051, %r12062; 2026-02-21T10:18:46.6644203Z add.s32 %r12064, %r12052, %r12062; 2026-02-21T10:18:46.6644265Z add.s32 %r12065, %r12053, %r12062; 2026-02-21T10:18:46.6644332Z add.s32 %r12066, %r12054, %r12062; 2026-02-21T10:18:46.6644529Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6644602Z mad.wide.s32 %rd277, %r12063, 2, %rd25; 2026-02-21T10:18:46.6644673Z mad.wide.s32 %rd278, %r12064, 2, %rd25; 2026-02-21T10:18:46.6644745Z mad.wide.s32 %rd279, %r12065, 2, %rd25; 2026-02-21T10:18:46.6644814Z mad.wide.s32 %rd280, %r12066, 2, %rd25; 2026-02-21T10:18:46.6645012Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6645075Z add.s32 %r12067, %r11910, %r12059; 2026-02-21T10:18:46.6645138Z add.s32 %r11836, %r12067, %r594; 2026-02-21T10:18:46.6645197Z // begin inline asm 2026-02-21T10:18:46.6645340Z 
cp.async.ca.shared.global [ %r11836 + 0 ], [ %rd277 + 0 ], 0x8, %r11829; 2026-02-21T10:18:46.6645396Z // end inline asm 2026-02-21T10:18:46.6645456Z add.s32 %r11838, %r11836, 1024; 2026-02-21T10:18:46.6645514Z // begin inline asm 2026-02-21T10:18:46.6645723Z cp.async.ca.shared.global [ %r11838 + 0 ], [ %rd278 + 0 ], 0x8, %r11829; 2026-02-21T10:18:46.6645827Z // end inline asm 2026-02-21T10:18:46.6645891Z add.s32 %r11840, %r11836, 2048; 2026-02-21T10:18:46.6645954Z // begin inline asm 2026-02-21T10:18:46.6646090Z cp.async.ca.shared.global [ %r11840 + 0 ], [ %rd279 + 0 ], 0x8, %r11829; 2026-02-21T10:18:46.6646147Z // end inline asm 2026-02-21T10:18:46.6646207Z add.s32 %r11842, %r11836, 3072; 2026-02-21T10:18:46.6646269Z // begin inline asm 2026-02-21T10:18:46.6646404Z cp.async.ca.shared.global [ %r11842 + 0 ], [ %rd280 + 0 ], 0x8, %r11829; 2026-02-21T10:18:46.6646572Z // end inline asm 2026-02-21T10:18:46.6646651Z cp.async.commit_group; 2026-02-21T10:18:46.6646858Z .loc 1 40 93 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:40:93 2026-02-21T10:18:46.6646918Z add.s32 %r12612, %r12626, 16; 2026-02-21T10:18:46.6647120Z .loc 1 45 22 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:45:22 2026-02-21T10:18:46.6647182Z shl.b32 %r12068, %r12612, 1; 2026-02-21T10:18:46.6647455Z .loc 1 47 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:47:25 2026-02-21T10:18:46.6647521Z add.s32 %r12069, %r12068, %r30; 2026-02-21T10:18:46.6647717Z .loc 1 48 60 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:60 2026-02-21T10:18:46.6647842Z add.s32 %r12070, %r12051, %r12069; 2026-02-21T10:18:46.6647916Z add.s32 %r12071, %r12052, %r12069; 2026-02-21T10:18:46.6647983Z add.s32 %r12072, %r12053, %r12069; 2026-02-21T10:18:46.6648043Z add.s32 %r12073, %r12054, %r12069; 2026-02-21T10:18:46.6648241Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6648312Z mad.wide.s32 %rd281, %r12070, 2, %rd25; 
2026-02-21T10:18:46.6648388Z mad.wide.s32 %rd282, %r12071, 2, %rd25; 2026-02-21T10:18:46.6648455Z mad.wide.s32 %rd283, %r12072, 2, %rd25; 2026-02-21T10:18:46.6648529Z mad.wide.s32 %rd284, %r12073, 2, %rd25; 2026-02-21T10:18:46.6648729Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6648789Z add.s32 %r12074, %r11956, %r12059; 2026-02-21T10:18:46.6648851Z add.s32 %r11844, %r12074, %r594; 2026-02-21T10:18:46.6648911Z // begin inline asm 2026-02-21T10:18:46.6649050Z cp.async.ca.shared.global [ %r11844 + 0 ], [ %rd281 + 0 ], 0x8, %r11829; 2026-02-21T10:18:46.6649106Z // end inline asm 2026-02-21T10:18:46.6649170Z add.s32 %r11846, %r11844, 1024; 2026-02-21T10:18:46.6649239Z // begin inline asm 2026-02-21T10:18:46.6649379Z cp.async.ca.shared.global [ %r11846 + 0 ], [ %rd282 + 0 ], 0x8, %r11829; 2026-02-21T10:18:46.6649438Z // end inline asm 2026-02-21T10:18:46.6649503Z add.s32 %r11848, %r11844, 2048; 2026-02-21T10:18:46.6649561Z // begin inline asm 2026-02-21T10:18:46.6649698Z cp.async.ca.shared.global [ %r11848 + 0 ], [ %rd283 + 0 ], 0x8, %r11829; 2026-02-21T10:18:46.6649757Z // end inline asm 2026-02-21T10:18:46.6649819Z add.s32 %r11850, %r11844, 3072; 2026-02-21T10:18:46.6649879Z // begin inline asm 2026-02-21T10:18:46.6650016Z cp.async.ca.shared.global [ %r11850 + 0 ], [ %rd284 + 0 ], 0x8, %r11829; 2026-02-21T10:18:46.6650072Z // end inline asm 2026-02-21T10:18:46.6650137Z cp.async.commit_group; 2026-02-21T10:18:46.6650337Z .loc 1 40 93 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:40:93 2026-02-21T10:18:46.6650402Z add.s32 %r12607, %r12626, 24; 2026-02-21T10:18:46.6650599Z .loc 1 45 22 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:45:22 2026-02-21T10:18:46.6650660Z shl.b32 %r12075, %r12607, 1; 2026-02-21T10:18:46.6650860Z .loc 1 47 25 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:47:25 2026-02-21T10:18:46.6650921Z add.s32 %r12076, %r12075, %r30; 2026-02-21T10:18:46.6651117Z .loc 
1 48 60 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:60 2026-02-21T10:18:46.6651322Z add.s32 %r12077, %r12051, %r12076; 2026-02-21T10:18:46.6651384Z add.s32 %r12078, %r12052, %r12076; 2026-02-21T10:18:46.6651446Z add.s32 %r12079, %r12053, %r12076; 2026-02-21T10:18:46.6651509Z add.s32 %r12080, %r12054, %r12076; 2026-02-21T10:18:46.6651710Z .loc 1 48 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:32 2026-02-21T10:18:46.6651780Z mad.wide.s32 %rd285, %r12077, 2, %rd25; 2026-02-21T10:18:46.6651848Z mad.wide.s32 %rd286, %r12078, 2, %rd25; 2026-02-21T10:18:46.6651918Z mad.wide.s32 %rd287, %r12079, 2, %rd25; 2026-02-21T10:18:46.6651985Z mad.wide.s32 %rd288, %r12080, 2, %rd25; 2026-02-21T10:18:46.6652183Z .loc 1 48 80 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:48:80 2026-02-21T10:18:46.6652246Z add.s32 %r12081, %r12002, %r12059; 2026-02-21T10:18:46.6652306Z add.s32 %r11852, %r12081, %r594; 2026-02-21T10:18:46.6652366Z // begin inline asm 2026-02-21T10:18:46.6652503Z cp.async.ca.shared.global [ %r11852 + 0 ], [ %rd285 + 0 ], 0x8, %r11829; 2026-02-21T10:18:46.6652564Z // end inline asm 2026-02-21T10:18:46.6652677Z add.s32 %r11854, %r11852, 1024; 2026-02-21T10:18:46.6652737Z // begin inline asm 2026-02-21T10:18:46.6652875Z cp.async.ca.shared.global [ %r11854 + 0 ], [ %rd286 + 0 ], 0x8, %r11829; 2026-02-21T10:18:46.6652931Z // end inline asm 2026-02-21T10:18:46.6653033Z add.s32 %r11856, %r11852, 2048; 2026-02-21T10:18:46.6653097Z // begin inline asm 2026-02-21T10:18:46.6653243Z cp.async.ca.shared.global [ %r11856 + 0 ], [ %rd287 + 0 ], 0x8, %r11829; 2026-02-21T10:18:46.6653301Z // end inline asm 2026-02-21T10:18:46.6653362Z add.s32 %r11858, %r11852, 3072; 2026-02-21T10:18:46.6653423Z // begin inline asm 2026-02-21T10:18:46.6653561Z cp.async.ca.shared.global [ %r11858 + 0 ], [ %rd288 + 0 ], 0x8, %r11829; 2026-02-21T10:18:46.6653618Z // end inline asm 2026-02-21T10:18:46.6653683Z cp.async.commit_group; 
2026-02-21T10:18:46.6653894Z .loc 1 19 139 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:19:139 2026-02-21T10:18:46.6653963Z setp.ne.b32 %p66, %r12606, 127; 2026-02-21T10:18:46.6654024Z @%p66 bra $L__BB0_14; 2026-02-21T10:18:46.6654138Z // %bb.13: // in Loop: Header=BB0_10 Depth=1 2026-02-21T10:18:46.6654336Z .loc 1 33 32 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:33:32 2026-02-21T10:18:46.6654398Z add.s32 %r12227, %r12602, %r11; 2026-02-21T10:18:46.6654462Z add.s32 %r12228, %r12602, %r12; 2026-02-21T10:18:46.6654522Z add.s32 %r12229, %r12602, %r13; 2026-02-21T10:18:46.6654583Z add.s32 %r12230, %r12602, %r14; 2026-02-21T10:18:46.6654647Z add.s32 %r12231, %r12602, %r15; 2026-02-21T10:18:46.6654706Z add.s32 %r12232, %r12602, %r16; 2026-02-21T10:18:46.6654765Z add.s32 %r12233, %r12602, %r17; 2026-02-21T10:18:46.6654826Z add.s32 %r12234, %r12602, %r18; 2026-02-21T10:18:46.6654888Z add.s32 %r12235, %r12602, %r19; 2026-02-21T10:18:46.6654952Z add.s32 %r12236, %r12602, %r20; 2026-02-21T10:18:46.6655013Z add.s32 %r12237, %r12602, %r21; 2026-02-21T10:18:46.6655075Z add.s32 %r12238, %r12602, %r22; 2026-02-21T10:18:46.6655135Z add.s32 %r12239, %r12602, %r23; 2026-02-21T10:18:46.6655194Z add.s32 %r12240, %r12602, %r24; 2026-02-21T10:18:46.6655254Z add.s32 %r12241, %r12602, %r25; 2026-02-21T10:18:46.6655313Z add.s32 %r12242, %r12602, %r26; 2026-02-21T10:18:46.6655513Z .loc 1 87 28 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:87:28 2026-02-21T10:18:46.6655597Z cvt.rn.bf16x2.f32 %r12243, %r12636, %r12635; 2026-02-21T10:18:46.6655681Z cvt.rn.bf16x2.f32 %r12244, %r12638, %r12637; 2026-02-21T10:18:46.6655759Z cvt.rn.bf16x2.f32 %r12245, %r12640, %r12639; 2026-02-21T10:18:46.6655833Z cvt.rn.bf16x2.f32 %r12246, %r12642, %r12641; 2026-02-21T10:18:46.6655910Z cvt.rn.bf16x2.f32 %r12247, %r12644, %r12643; 2026-02-21T10:18:46.6655985Z cvt.rn.bf16x2.f32 %r12248, %r12646, %r12645; 2026-02-21T10:18:46.6656134Z cvt.rn.bf16x2.f32 %r12249, %r12648, 
%r12647; 2026-02-21T10:18:46.6656259Z cvt.rn.bf16x2.f32 %r12250, %r12650, %r12649; 2026-02-21T10:18:46.6656337Z cvt.rn.bf16x2.f32 %r12251, %r12652, %r12651; 2026-02-21T10:18:46.6656411Z cvt.rn.bf16x2.f32 %r12252, %r12654, %r12653; 2026-02-21T10:18:46.6656620Z cvt.rn.bf16x2.f32 %r12253, %r12656, %r12655; 2026-02-21T10:18:46.6656703Z cvt.rn.bf16x2.f32 %r12254, %r12658, %r12657; 2026-02-21T10:18:46.6656779Z cvt.rn.bf16x2.f32 %r12255, %r12660, %r12659; 2026-02-21T10:18:46.6656853Z cvt.rn.bf16x2.f32 %r12256, %r12662, %r12661; 2026-02-21T10:18:46.6656929Z cvt.rn.bf16x2.f32 %r12257, %r12664, %r12663; 2026-02-21T10:18:46.6657003Z cvt.rn.bf16x2.f32 %r12258, %r12666, %r12665; 2026-02-21T10:18:46.6657077Z cvt.rn.bf16x2.f32 %r12259, %r12668, %r12667; 2026-02-21T10:18:46.6657153Z cvt.rn.bf16x2.f32 %r12260, %r12670, %r12669; 2026-02-21T10:18:46.6657227Z cvt.rn.bf16x2.f32 %r12261, %r12672, %r12671; 2026-02-21T10:18:46.6657307Z cvt.rn.bf16x2.f32 %r12262, %r12674, %r12673; 2026-02-21T10:18:46.6657385Z cvt.rn.bf16x2.f32 %r12263, %r12676, %r12675; 2026-02-21T10:18:46.6657549Z cvt.rn.bf16x2.f32 %r12264, %r12678, %r12677; 2026-02-21T10:18:46.6657630Z cvt.rn.bf16x2.f32 %r12265, %r12680, %r12679; 2026-02-21T10:18:46.6657706Z cvt.rn.bf16x2.f32 %r12266, %r12682, %r12681; 2026-02-21T10:18:46.6657781Z cvt.rn.bf16x2.f32 %r12267, %r12684, %r12683; 2026-02-21T10:18:46.6657915Z cvt.rn.bf16x2.f32 %r12268, %r12686, %r12685; 2026-02-21T10:18:46.6657992Z cvt.rn.bf16x2.f32 %r12269, %r12688, %r12687; 2026-02-21T10:18:46.6658067Z cvt.rn.bf16x2.f32 %r12270, %r12690, %r12689; 2026-02-21T10:18:46.6658147Z cvt.rn.bf16x2.f32 %r12271, %r12692, %r12691; 2026-02-21T10:18:46.6658221Z cvt.rn.bf16x2.f32 %r12272, %r12694, %r12693; 2026-02-21T10:18:46.6658295Z cvt.rn.bf16x2.f32 %r12273, %r12696, %r12695; 2026-02-21T10:18:46.6658371Z cvt.rn.bf16x2.f32 %r12274, %r12698, %r12697; 2026-02-21T10:18:46.6658446Z cvt.rn.bf16x2.f32 %r12275, %r12700, %r12699; 2026-02-21T10:18:46.6658531Z cvt.rn.bf16x2.f32 %r12276, %r12702, 
%r12701; 2026-02-21T10:18:46.6658609Z cvt.rn.bf16x2.f32 %r12277, %r12704, %r12703; 2026-02-21T10:18:46.6658686Z cvt.rn.bf16x2.f32 %r12278, %r12706, %r12705; 2026-02-21T10:18:46.6658759Z cvt.rn.bf16x2.f32 %r12279, %r12708, %r12707; 2026-02-21T10:18:46.6658834Z cvt.rn.bf16x2.f32 %r12280, %r12710, %r12709; 2026-02-21T10:18:46.6658911Z cvt.rn.bf16x2.f32 %r12281, %r12712, %r12711; 2026-02-21T10:18:46.6658987Z cvt.rn.bf16x2.f32 %r12282, %r12714, %r12713; 2026-02-21T10:18:46.6659062Z cvt.rn.bf16x2.f32 %r12283, %r12716, %r12715; 2026-02-21T10:18:46.6659140Z cvt.rn.bf16x2.f32 %r12284, %r12718, %r12717; 2026-02-21T10:18:46.6659214Z cvt.rn.bf16x2.f32 %r12285, %r12720, %r12719; 2026-02-21T10:18:46.6659298Z cvt.rn.bf16x2.f32 %r12286, %r12722, %r12721; 2026-02-21T10:18:46.6659378Z cvt.rn.bf16x2.f32 %r12287, %r12724, %r12723; 2026-02-21T10:18:46.6659454Z cvt.rn.bf16x2.f32 %r12288, %r12726, %r12725; 2026-02-21T10:18:46.6659534Z cvt.rn.bf16x2.f32 %r12289, %r12728, %r12727; 2026-02-21T10:18:46.6659611Z cvt.rn.bf16x2.f32 %r12290, %r12730, %r12729; 2026-02-21T10:18:46.6659689Z cvt.rn.bf16x2.f32 %r12291, %r12732, %r12731; 2026-02-21T10:18:46.6659764Z cvt.rn.bf16x2.f32 %r12292, %r12734, %r12733; 2026-02-21T10:18:46.6659839Z cvt.rn.bf16x2.f32 %r12293, %r12736, %r12735; 2026-02-21T10:18:46.6659917Z cvt.rn.bf16x2.f32 %r12294, %r12738, %r12737; 2026-02-21T10:18:46.6659993Z cvt.rn.bf16x2.f32 %r12295, %r12740, %r12739; 2026-02-21T10:18:46.6660070Z cvt.rn.bf16x2.f32 %r12296, %r12742, %r12741; 2026-02-21T10:18:46.6660145Z cvt.rn.bf16x2.f32 %r12297, %r12744, %r12743; 2026-02-21T10:18:46.6660221Z cvt.rn.bf16x2.f32 %r12298, %r12746, %r12745; 2026-02-21T10:18:46.6660294Z cvt.rn.bf16x2.f32 %r12299, %r12748, %r12747; 2026-02-21T10:18:46.6660369Z cvt.rn.bf16x2.f32 %r12300, %r12750, %r12749; 2026-02-21T10:18:46.6660446Z cvt.rn.bf16x2.f32 %r12301, %r12752, %r12751; 2026-02-21T10:18:46.6660524Z cvt.rn.bf16x2.f32 %r12302, %r12754, %r12753; 2026-02-21T10:18:46.6660705Z cvt.rn.bf16x2.f32 %r12303, %r12756, 
%r12755; 2026-02-21T10:18:46.6660847Z cvt.rn.bf16x2.f32 %r12304, %r12758, %r12757; 2026-02-21T10:18:46.6660925Z cvt.rn.bf16x2.f32 %r12305, %r12760, %r12759; 2026-02-21T10:18:46.6661000Z cvt.rn.bf16x2.f32 %r12306, %r12762, %r12761; 2026-02-21T10:18:46.6661217Z .loc 1 88 50 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:88:50 2026-02-21T10:18:46.6661296Z mad.lo.s32 %r12307, %r12227, 1280, %r12625; 2026-02-21T10:18:46.6661371Z mad.lo.s32 %r12308, %r12228, 1280, %r12625; 2026-02-21T10:18:46.6661444Z mad.lo.s32 %r12309, %r12229, 1280, %r12625; 2026-02-21T10:18:46.6661531Z mad.lo.s32 %r12310, %r12230, 1280, %r12625; 2026-02-21T10:18:46.6661605Z mad.lo.s32 %r12311, %r12231, 1280, %r12625; 2026-02-21T10:18:46.6661677Z mad.lo.s32 %r12312, %r12232, 1280, %r12625; 2026-02-21T10:18:46.6661750Z mad.lo.s32 %r12313, %r12233, 1280, %r12625; 2026-02-21T10:18:46.6661823Z mad.lo.s32 %r12314, %r12234, 1280, %r12625; 2026-02-21T10:18:46.6661893Z mad.lo.s32 %r12315, %r12235, 1280, %r12625; 2026-02-21T10:18:46.6662023Z mad.lo.s32 %r12316, %r12236, 1280, %r12625; 2026-02-21T10:18:46.6662098Z mad.lo.s32 %r12317, %r12237, 1280, %r12625; 2026-02-21T10:18:46.6662170Z mad.lo.s32 %r12318, %r12238, 1280, %r12625; 2026-02-21T10:18:46.6662241Z mad.lo.s32 %r12319, %r12239, 1280, %r12625; 2026-02-21T10:18:46.6662358Z mad.lo.s32 %r12320, %r12240, 1280, %r12625; 2026-02-21T10:18:46.6662431Z mad.lo.s32 %r12321, %r12241, 1280, %r12625; 2026-02-21T10:18:46.6662501Z mad.lo.s32 %r12322, %r12242, 1280, %r12625; 2026-02-21T10:18:46.6662709Z .loc 1 88 22 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:88:22 2026-02-21T10:18:46.6662782Z mad.wide.s32 %rd293, %r12307, 2, %rd27; 2026-02-21T10:18:46.6662851Z mad.wide.s32 %rd294, %r12308, 2, %rd27; 2026-02-21T10:18:46.6662920Z mad.wide.s32 %rd295, %r12309, 2, %rd27; 2026-02-21T10:18:46.6662991Z mad.wide.s32 %rd296, %r12310, 2, %rd27; 2026-02-21T10:18:46.6663058Z mad.wide.s32 %rd297, %r12311, 2, %rd27; 2026-02-21T10:18:46.6663128Z mad.wide.s32 
%rd298, %r12312, 2, %rd27; 2026-02-21T10:18:46.6663200Z mad.wide.s32 %rd299, %r12313, 2, %rd27; 2026-02-21T10:18:46.6663268Z mad.wide.s32 %rd300, %r12314, 2, %rd27; 2026-02-21T10:18:46.6663334Z mad.wide.s32 %rd301, %r12315, 2, %rd27; 2026-02-21T10:18:46.6663403Z mad.wide.s32 %rd302, %r12316, 2, %rd27; 2026-02-21T10:18:46.6663471Z mad.wide.s32 %rd303, %r12317, 2, %rd27; 2026-02-21T10:18:46.6663538Z mad.wide.s32 %rd304, %r12318, 2, %rd27; 2026-02-21T10:18:46.6663608Z mad.wide.s32 %rd305, %r12319, 2, %rd27; 2026-02-21T10:18:46.6663673Z mad.wide.s32 %rd306, %r12320, 2, %rd27; 2026-02-21T10:18:46.6663740Z mad.wide.s32 %rd307, %r12321, 2, %rd27; 2026-02-21T10:18:46.6663805Z mad.wide.s32 %rd308, %r12322, 2, %rd27; 2026-02-21T10:18:46.6664007Z .loc 1 88 81 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:88:81 2026-02-21T10:18:46.6664066Z bar.sync 0; 2026-02-21T10:18:46.6664199Z st.shared.v4.b32 [%r613], {%r12243, %r12245, %r12247, %r12249}; 2026-02-21T10:18:46.6664317Z st.shared.v4.b32 [%r614], {%r12251, %r12253, %r12255, %r12257}; 2026-02-21T10:18:46.6664428Z st.shared.v4.b32 [%r615], {%r12259, %r12261, %r12263, %r12265}; 2026-02-21T10:18:46.6664535Z st.shared.v4.b32 [%r616], {%r12267, %r12269, %r12271, %r12273}; 2026-02-21T10:18:46.6664592Z bar.sync 0; 2026-02-21T10:18:46.6664655Z // begin inline asm 2026-02-21T10:18:46.6664862Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r12162, %r12163, %r12164, %r12165}, [%r12086]; 2026-02-21T10:18:46.6664919Z // end inline asm 2026-02-21T10:18:46.6664980Z // begin inline asm 2026-02-21T10:18:46.6665175Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r12170, %r12171, %r12172, %r12173}, [%r12091]; 2026-02-21T10:18:46.6665233Z // end inline asm 2026-02-21T10:18:46.6665294Z // begin inline asm 2026-02-21T10:18:46.6665486Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r12178, %r12179, %r12180, %r12181}, [%r12096]; 2026-02-21T10:18:46.6665599Z // end inline asm 2026-02-21T10:18:46.6665702Z // begin inline asm 
2026-02-21T10:18:46.6665894Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r12186, %r12187, %r12188, %r12189}, [%r12101]; 2026-02-21T10:18:46.6665951Z // end inline asm 2026-02-21T10:18:46.6666006Z bar.sync 0; 2026-02-21T10:18:46.6666117Z st.shared.v4.b32 [%r613], {%r12244, %r12246, %r12248, %r12250}; 2026-02-21T10:18:46.6666228Z st.shared.v4.b32 [%r614], {%r12252, %r12254, %r12256, %r12258}; 2026-02-21T10:18:46.6666335Z st.shared.v4.b32 [%r615], {%r12260, %r12262, %r12264, %r12266}; 2026-02-21T10:18:46.6666445Z st.shared.v4.b32 [%r616], {%r12268, %r12270, %r12272, %r12274}; 2026-02-21T10:18:46.6666620Z bar.sync 0; 2026-02-21T10:18:46.6666682Z // begin inline asm 2026-02-21T10:18:46.6666876Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r12166, %r12167, %r12168, %r12169}, [%r12086]; 2026-02-21T10:18:46.6666934Z // end inline asm 2026-02-21T10:18:46.6666994Z // begin inline asm 2026-02-21T10:18:46.6667187Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r12174, %r12175, %r12176, %r12177}, [%r12091]; 2026-02-21T10:18:46.6667249Z // end inline asm 2026-02-21T10:18:46.6667378Z // begin inline asm 2026-02-21T10:18:46.6667571Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r12182, %r12183, %r12184, %r12185}, [%r12096]; 2026-02-21T10:18:46.6667634Z // end inline asm 2026-02-21T10:18:46.6667700Z // begin inline asm 2026-02-21T10:18:46.6667949Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r12190, %r12191, %r12192, %r12193}, [%r12101]; 2026-02-21T10:18:46.6668007Z // end inline asm 2026-02-21T10:18:46.6668062Z bar.sync 0; 2026-02-21T10:18:46.6668172Z st.shared.v4.b32 [%r613], {%r12275, %r12277, %r12279, %r12281}; 2026-02-21T10:18:46.6668281Z st.shared.v4.b32 [%r614], {%r12283, %r12285, %r12287, %r12289}; 2026-02-21T10:18:46.6668391Z st.shared.v4.b32 [%r615], {%r12291, %r12293, %r12295, %r12297}; 2026-02-21T10:18:46.6668563Z st.shared.v4.b32 [%r616], {%r12299, %r12301, %r12303, %r12305}; 2026-02-21T10:18:46.6668623Z bar.sync 0; 2026-02-21T10:18:46.6668684Z // begin inline asm 
2026-02-21T10:18:46.6668879Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r12194, %r12195, %r12196, %r12197}, [%r12086]; 2026-02-21T10:18:46.6668935Z // end inline asm 2026-02-21T10:18:46.6668992Z // begin inline asm 2026-02-21T10:18:46.6669185Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r12202, %r12203, %r12204, %r12205}, [%r12091]; 2026-02-21T10:18:46.6669242Z // end inline asm 2026-02-21T10:18:46.6669299Z // begin inline asm 2026-02-21T10:18:46.6669490Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r12210, %r12211, %r12212, %r12213}, [%r12096]; 2026-02-21T10:18:46.6669547Z // end inline asm 2026-02-21T10:18:46.6669604Z // begin inline asm 2026-02-21T10:18:46.6669806Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r12218, %r12219, %r12220, %r12221}, [%r12101]; 2026-02-21T10:18:46.6669863Z // end inline asm 2026-02-21T10:18:46.6669918Z bar.sync 0; 2026-02-21T10:18:46.6670031Z st.shared.v4.b32 [%r613], {%r12276, %r12278, %r12280, %r12282}; 2026-02-21T10:18:46.6670145Z st.shared.v4.b32 [%r614], {%r12284, %r12286, %r12288, %r12290}; 2026-02-21T10:18:46.6670255Z st.shared.v4.b32 [%r615], {%r12292, %r12294, %r12296, %r12298}; 2026-02-21T10:18:46.6670364Z st.shared.v4.b32 [%r616], {%r12300, %r12302, %r12304, %r12306}; 2026-02-21T10:18:46.6670422Z bar.sync 0; 2026-02-21T10:18:46.6670479Z // begin inline asm 2026-02-21T10:18:46.6670673Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r12198, %r12199, %r12200, %r12201}, [%r12086]; 2026-02-21T10:18:46.6670732Z // end inline asm 2026-02-21T10:18:46.6670789Z // begin inline asm 2026-02-21T10:18:46.6670978Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r12206, %r12207, %r12208, %r12209}, [%r12091]; 2026-02-21T10:18:46.6671033Z // end inline asm 2026-02-21T10:18:46.6671096Z // begin inline asm 2026-02-21T10:18:46.6671287Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r12214, %r12215, %r12216, %r12217}, [%r12096]; 2026-02-21T10:18:46.6671422Z // end inline asm 2026-02-21T10:18:46.6671484Z // begin inline asm 2026-02-21T10:18:46.6671748Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r12222, %r12223, %r12224, %r12225}, [%r12101]; 2026-02-21T10:18:46.6671809Z // end inline asm 2026-02-21T10:18:46.6671868Z // begin inline asm 2026-02-21T10:18:46.6672005Z st.global.v4.b32 [ %rd293 + 0 ], { %r12162, %r12163, %r12164, %r12165 }; 2026-02-21T10:18:46.6672061Z // end inline asm 2026-02-21T10:18:46.6672119Z // begin inline asm 2026-02-21T10:18:46.6672250Z st.global.v4.b32 [ %rd294 + 0 ], { %r12166, %r12167, %r12168, %r12169 }; 2026-02-21T10:18:46.6672308Z // end inline asm 2026-02-21T10:18:46.6672366Z // begin inline asm 2026-02-21T10:18:46.6672498Z st.global.v4.b32 [ %rd295 + 0 ], { %r12170, %r12171, %r12172, %r12173 }; 2026-02-21T10:18:46.6672558Z // end inline asm 2026-02-21T10:18:46.6672617Z // begin inline asm 2026-02-21T10:18:46.6672737Z st.global.v4.b32 [ %rd296 + 0 ], { %r12174, %r12175, %r12176, %r12177 }; 2026-02-21T10:18:46.6672798Z // end inline asm 2026-02-21T10:18:46.6672856Z // begin inline asm 2026-02-21T10:18:46.6672977Z st.global.v4.b32 [ %rd297 + 0 ], { %r12178, %r12179, %r12180, %r12181 }; 2026-02-21T10:18:46.6673098Z // end inline asm 2026-02-21T10:18:46.6673163Z // begin inline asm 2026-02-21T10:18:46.6673282Z st.global.v4.b32 [ %rd298 + 0 ], { %r12182, %r12183, %r12184, %r12185 }; 2026-02-21T10:18:46.6673339Z // end inline asm 2026-02-21T10:18:46.6673399Z // begin inline asm 2026-02-21T10:18:46.6673565Z st.global.v4.b32 [ %rd299 + 0 ], { %r12186, %r12187, %r12188, %r12189 }; 2026-02-21T10:18:46.6673624Z // end inline asm 2026-02-21T10:18:46.6673688Z // begin inline asm 2026-02-21T10:18:46.6673807Z st.global.v4.b32 [ %rd300 + 0 ], { %r12190, %r12191, %r12192, %r12193 }; 2026-02-21T10:18:46.6673862Z // end inline asm 2026-02-21T10:18:46.6673923Z // begin inline asm 2026-02-21T10:18:46.6674041Z st.global.v4.b32 [ %rd301 + 0 ], { %r12194, %r12195, %r12196, %r12197 }; 2026-02-21T10:18:46.6674097Z // end inline asm 2026-02-21T10:18:46.6674161Z // begin inline asm 2026-02-21T10:18:46.6674285Z 
st.global.v4.b32 [ %rd302 + 0 ], { %r12198, %r12199, %r12200, %r12201 }; 2026-02-21T10:18:46.6674343Z // end inline asm 2026-02-21T10:18:46.6674401Z // begin inline asm 2026-02-21T10:18:46.6674524Z st.global.v4.b32 [ %rd303 + 0 ], { %r12202, %r12203, %r12204, %r12205 }; 2026-02-21T10:18:46.6674579Z // end inline asm 2026-02-21T10:18:46.6674636Z // begin inline asm 2026-02-21T10:18:46.6674755Z st.global.v4.b32 [ %rd304 + 0 ], { %r12206, %r12207, %r12208, %r12209 }; 2026-02-21T10:18:46.6674813Z // end inline asm 2026-02-21T10:18:46.6674872Z // begin inline asm 2026-02-21T10:18:46.6674991Z st.global.v4.b32 [ %rd305 + 0 ], { %r12210, %r12211, %r12212, %r12213 }; 2026-02-21T10:18:46.6675049Z // end inline asm 2026-02-21T10:18:46.6675118Z // begin inline asm 2026-02-21T10:18:46.6675241Z st.global.v4.b32 [ %rd306 + 0 ], { %r12214, %r12215, %r12216, %r12217 }; 2026-02-21T10:18:46.6675300Z // end inline asm 2026-02-21T10:18:46.6675359Z // begin inline asm 2026-02-21T10:18:46.6675478Z st.global.v4.b32 [ %rd307 + 0 ], { %r12218, %r12219, %r12220, %r12221 }; 2026-02-21T10:18:46.6675536Z // end inline asm 2026-02-21T10:18:46.6675599Z // begin inline asm 2026-02-21T10:18:46.6675716Z st.global.v4.b32 [ %rd308 + 0 ], { %r12222, %r12223, %r12224, %r12225 }; 2026-02-21T10:18:46.6675772Z // end inline asm 2026-02-21T10:18:46.6675837Z mov.b32 %r12635, 0f00000000; 2026-02-21T10:18:46.6675900Z mov.b32 %r12636, %r12635; 2026-02-21T10:18:46.6675960Z mov.b32 %r12637, %r12635; 2026-02-21T10:18:46.6676019Z mov.b32 %r12638, %r12635; 2026-02-21T10:18:46.6676081Z mov.b32 %r12639, %r12635; 2026-02-21T10:18:46.6676139Z mov.b32 %r12640, %r12635; 2026-02-21T10:18:46.6676196Z mov.b32 %r12641, %r12635; 2026-02-21T10:18:46.6676258Z mov.b32 %r12642, %r12635; 2026-02-21T10:18:46.6676317Z mov.b32 %r12643, %r12635; 2026-02-21T10:18:46.6676377Z mov.b32 %r12644, %r12635; 2026-02-21T10:18:46.6676436Z mov.b32 %r12645, %r12635; 2026-02-21T10:18:46.6676725Z mov.b32 %r12646, %r12635; 2026-02-21T10:18:46.6676789Z 
mov.b32 %r12647, %r12635; 2026-02-21T10:18:46.6676909Z mov.b32 %r12648, %r12635; 2026-02-21T10:18:46.6676974Z mov.b32 %r12649, %r12635; 2026-02-21T10:18:46.6677033Z mov.b32 %r12650, %r12635; 2026-02-21T10:18:46.6677093Z mov.b32 %r12651, %r12635; 2026-02-21T10:18:46.6677151Z mov.b32 %r12652, %r12635; 2026-02-21T10:18:46.6677213Z mov.b32 %r12653, %r12635; 2026-02-21T10:18:46.6677273Z mov.b32 %r12654, %r12635; 2026-02-21T10:18:46.6677331Z mov.b32 %r12655, %r12635; 2026-02-21T10:18:46.6677391Z mov.b32 %r12656, %r12635; 2026-02-21T10:18:46.6677449Z mov.b32 %r12657, %r12635; 2026-02-21T10:18:46.6677507Z mov.b32 %r12658, %r12635; 2026-02-21T10:18:46.6677570Z mov.b32 %r12659, %r12635; 2026-02-21T10:18:46.6677628Z mov.b32 %r12660, %r12635; 2026-02-21T10:18:46.6677688Z mov.b32 %r12661, %r12635; 2026-02-21T10:18:46.6677746Z mov.b32 %r12662, %r12635; 2026-02-21T10:18:46.6677809Z mov.b32 %r12663, %r12635; 2026-02-21T10:18:46.6677873Z mov.b32 %r12664, %r12635; 2026-02-21T10:18:46.6677932Z mov.b32 %r12665, %r12635; 2026-02-21T10:18:46.6677998Z mov.b32 %r12666, %r12635; 2026-02-21T10:18:46.6678129Z mov.b32 %r12667, %r12635; 2026-02-21T10:18:46.6678193Z mov.b32 %r12668, %r12635; 2026-02-21T10:18:46.6678250Z mov.b32 %r12669, %r12635; 2026-02-21T10:18:46.6678311Z mov.b32 %r12670, %r12635; 2026-02-21T10:18:46.6678369Z mov.b32 %r12671, %r12635; 2026-02-21T10:18:46.6678503Z mov.b32 %r12672, %r12635; 2026-02-21T10:18:46.6678576Z mov.b32 %r12673, %r12635; 2026-02-21T10:18:46.6678633Z mov.b32 %r12674, %r12635; 2026-02-21T10:18:46.6678691Z mov.b32 %r12675, %r12635; 2026-02-21T10:18:46.6678749Z mov.b32 %r12676, %r12635; 2026-02-21T10:18:46.6678809Z mov.b32 %r12677, %r12635; 2026-02-21T10:18:46.6678868Z mov.b32 %r12678, %r12635; 2026-02-21T10:18:46.6678927Z mov.b32 %r12679, %r12635; 2026-02-21T10:18:46.6678986Z mov.b32 %r12680, %r12635; 2026-02-21T10:18:46.6679043Z mov.b32 %r12681, %r12635; 2026-02-21T10:18:46.6679102Z mov.b32 %r12682, %r12635; 2026-02-21T10:18:46.6679160Z mov.b32 %r12683, 
%r12635; 2026-02-21T10:18:46.6679224Z mov.b32 %r12684, %r12635; 2026-02-21T10:18:46.6679283Z mov.b32 %r12685, %r12635; 2026-02-21T10:18:46.6679340Z mov.b32 %r12686, %r12635; 2026-02-21T10:18:46.6679400Z mov.b32 %r12687, %r12635; 2026-02-21T10:18:46.6679458Z mov.b32 %r12688, %r12635; 2026-02-21T10:18:46.6679515Z mov.b32 %r12689, %r12635; 2026-02-21T10:18:46.6679575Z mov.b32 %r12690, %r12635; 2026-02-21T10:18:46.6679640Z mov.b32 %r12691, %r12635; 2026-02-21T10:18:46.6679697Z mov.b32 %r12692, %r12635; 2026-02-21T10:18:46.6679757Z mov.b32 %r12693, %r12635; 2026-02-21T10:18:46.6679825Z mov.b32 %r12694, %r12635; 2026-02-21T10:18:46.6679890Z mov.b32 %r12695, %r12635; 2026-02-21T10:18:46.6679949Z mov.b32 %r12696, %r12635; 2026-02-21T10:18:46.6680008Z mov.b32 %r12697, %r12635; 2026-02-21T10:18:46.6680071Z mov.b32 %r12698, %r12635; 2026-02-21T10:18:46.6680130Z mov.b32 %r12699, %r12635; 2026-02-21T10:18:46.6680190Z mov.b32 %r12700, %r12635; 2026-02-21T10:18:46.6680253Z mov.b32 %r12701, %r12635; 2026-02-21T10:18:46.6680313Z mov.b32 %r12702, %r12635; 2026-02-21T10:18:46.6680372Z mov.b32 %r12703, %r12635; 2026-02-21T10:18:46.6680434Z mov.b32 %r12704, %r12635; 2026-02-21T10:18:46.6680492Z mov.b32 %r12705, %r12635; 2026-02-21T10:18:46.6680549Z mov.b32 %r12706, %r12635; 2026-02-21T10:18:46.6680607Z mov.b32 %r12707, %r12635; 2026-02-21T10:18:46.6680671Z mov.b32 %r12708, %r12635; 2026-02-21T10:18:46.6680728Z mov.b32 %r12709, %r12635; 2026-02-21T10:18:46.6680786Z mov.b32 %r12710, %r12635; 2026-02-21T10:18:46.6680847Z mov.b32 %r12711, %r12635; 2026-02-21T10:18:46.6680905Z mov.b32 %r12712, %r12635; 2026-02-21T10:18:46.6680963Z mov.b32 %r12713, %r12635; 2026-02-21T10:18:46.6681024Z mov.b32 %r12714, %r12635; 2026-02-21T10:18:46.6681085Z mov.b32 %r12715, %r12635; 2026-02-21T10:18:46.6681143Z mov.b32 %r12716, %r12635; 2026-02-21T10:18:46.6681201Z mov.b32 %r12717, %r12635; 2026-02-21T10:18:46.6681334Z mov.b32 %r12718, %r12635; 2026-02-21T10:18:46.6681393Z mov.b32 %r12719, %r12635; 
2026-02-21T10:18:46.6681497Z mov.b32 %r12720, %r12635; 2026-02-21T10:18:46.6681556Z mov.b32 %r12721, %r12635; 2026-02-21T10:18:46.6681618Z mov.b32 %r12722, %r12635; 2026-02-21T10:18:46.6681675Z mov.b32 %r12723, %r12635; 2026-02-21T10:18:46.6681732Z mov.b32 %r12724, %r12635; 2026-02-21T10:18:46.6681797Z mov.b32 %r12725, %r12635; 2026-02-21T10:18:46.6681858Z mov.b32 %r12726, %r12635; 2026-02-21T10:18:46.6681916Z mov.b32 %r12727, %r12635; 2026-02-21T10:18:46.6681973Z mov.b32 %r12728, %r12635; 2026-02-21T10:18:46.6682035Z mov.b32 %r12729, %r12635; 2026-02-21T10:18:46.6682091Z mov.b32 %r12730, %r12635; 2026-02-21T10:18:46.6682148Z mov.b32 %r12731, %r12635; 2026-02-21T10:18:46.6682210Z mov.b32 %r12732, %r12635; 2026-02-21T10:18:46.6682268Z mov.b32 %r12733, %r12635; 2026-02-21T10:18:46.6682325Z mov.b32 %r12734, %r12635; 2026-02-21T10:18:46.6682383Z mov.b32 %r12735, %r12635; 2026-02-21T10:18:46.6682446Z mov.b32 %r12736, %r12635; 2026-02-21T10:18:46.6682504Z mov.b32 %r12737, %r12635; 2026-02-21T10:18:46.6682563Z mov.b32 %r12738, %r12635; 2026-02-21T10:18:46.6682685Z mov.b32 %r12739, %r12635; 2026-02-21T10:18:46.6682749Z mov.b32 %r12740, %r12635; 2026-02-21T10:18:46.6682807Z mov.b32 %r12741, %r12635; 2026-02-21T10:18:46.6682866Z mov.b32 %r12742, %r12635; 2026-02-21T10:18:46.6682926Z mov.b32 %r12743, %r12635; 2026-02-21T10:18:46.6683028Z mov.b32 %r12744, %r12635; 2026-02-21T10:18:46.6683089Z mov.b32 %r12745, %r12635; 2026-02-21T10:18:46.6683153Z mov.b32 %r12746, %r12635; 2026-02-21T10:18:46.6683211Z mov.b32 %r12747, %r12635; 2026-02-21T10:18:46.6683268Z mov.b32 %r12748, %r12635; 2026-02-21T10:18:46.6683334Z mov.b32 %r12749, %r12635; 2026-02-21T10:18:46.6683392Z mov.b32 %r12750, %r12635; 2026-02-21T10:18:46.6683451Z mov.b32 %r12751, %r12635; 2026-02-21T10:18:46.6683509Z mov.b32 %r12752, %r12635; 2026-02-21T10:18:46.6683573Z mov.b32 %r12753, %r12635; 2026-02-21T10:18:46.6683633Z mov.b32 %r12754, %r12635; 2026-02-21T10:18:46.6683692Z mov.b32 %r12755, %r12635; 
2026-02-21T10:18:46.6683754Z mov.b32 %r12756, %r12635; 2026-02-21T10:18:46.6683812Z mov.b32 %r12757, %r12635; 2026-02-21T10:18:46.6683871Z mov.b32 %r12758, %r12635; 2026-02-21T10:18:46.6683928Z mov.b32 %r12759, %r12635; 2026-02-21T10:18:46.6683991Z mov.b32 %r12760, %r12635; 2026-02-21T10:18:46.6684052Z mov.b32 %r12761, %r12635; 2026-02-21T10:18:46.6684110Z mov.b32 %r12762, %r12635; 2026-02-21T10:18:46.6684174Z bra.uni $L__BB0_14; 2026-02-21T10:18:46.6684278Z $L__BB0_15: // %._crit_edge41 2026-02-21T10:18:46.6684520Z .loc 1 19 139 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:19:139 2026-02-21T10:18:46.6684594Z cp.async.wait_group 0; 2026-02-21T10:18:46.6684655Z bar.sync 0; 2026-02-21T10:18:46.6684864Z .loc 1 19 4 // co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py:19:4 2026-02-21T10:18:46.6684923Z ret; 2026-02-21T10:18:46.6684985Z $L__tmp25: 2026-02-21T10:18:46.6685044Z $L__func_end0: 2026-02-21T10:18:46.6685133Z // -- End function 2026-02-21T10:18:46.6685190Z } 2026-02-21T10:18:46.6685449Z .file 1 "/tmp/torchinductor_root/o2/co2o4qbpmrcw33gtajvlbgli7xw5nivaxfjjpvadkjaaset7dbyl.py" 2026-02-21T10:18:46.6685663Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T10:18:46.6685731Z .section .debug_abbrev 2026-02-21T10:18:46.6685786Z { 2026-02-21T10:18:46.6685885Z .b8 1 // Abbreviation Code 2026-02-21T10:18:46.6685981Z .b8 17 // DW_TAG_compile_unit 2026-02-21T10:18:46.6686066Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:18:46.6686151Z .b8 37 // DW_AT_producer 2026-02-21T10:18:46.6686230Z .b8 8 // DW_FORM_string 2026-02-21T10:18:46.6686371Z .b8 19 // DW_AT_language 2026-02-21T10:18:46.6686631Z .b8 5 // DW_FORM_data2 2026-02-21T10:18:46.6686718Z .b8 3 // DW_AT_name 2026-02-21T10:18:46.6686797Z .b8 8 // DW_FORM_string 2026-02-21T10:18:46.6686882Z .b8 16 // DW_AT_stmt_list 2026-02-21T10:18:46.6686962Z .b8 6 // DW_FORM_data4 2026-02-21T10:18:46.6687044Z .b8 27 // DW_AT_comp_dir 2026-02-21T10:18:46.6687126Z 
.b8 8 // DW_FORM_string 2026-02-21T10:18:46.6687201Z .b8 0 // EOM(1) 2026-02-21T10:18:46.6687273Z .b8 0 // EOM(2) 2026-02-21T10:18:46.6687365Z .b8 2 // Abbreviation Code 2026-02-21T10:18:46.6687455Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:18:46.6687536Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:18:46.6687614Z .b8 3 // DW_AT_name 2026-02-21T10:18:46.6687776Z .b8 8 // DW_FORM_string 2026-02-21T10:18:46.6687864Z .b8 32 // DW_AT_inline 2026-02-21T10:18:46.6687946Z .b8 11 // DW_FORM_data1 2026-02-21T10:18:46.6688021Z .b8 0 // EOM(1) 2026-02-21T10:18:46.6688156Z .b8 0 // EOM(2) 2026-02-21T10:18:46.6688244Z .b8 3 // Abbreviation Code 2026-02-21T10:18:46.6688332Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:18:46.6688421Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:18:46.6688502Z .b8 17 // DW_AT_low_pc 2026-02-21T10:18:46.6688578Z .b8 1 // DW_FORM_addr 2026-02-21T10:18:46.6688665Z .b8 18 // DW_AT_high_pc 2026-02-21T10:18:46.6688742Z .b8 1 // DW_FORM_addr 2026-02-21T10:18:46.6688837Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:18:46.6688919Z .b8 19 // DW_FORM_ref4 2026-02-21T10:18:46.6688988Z .b8 0 // EOM(1) 2026-02-21T10:18:46.6689056Z .b8 0 // EOM(2) 2026-02-21T10:18:46.6689144Z .b8 4 // Abbreviation Code 2026-02-21T10:18:46.6689255Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T10:18:46.6689338Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:18:46.6689434Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:18:46.6689510Z .b8 19 // DW_FORM_ref4 2026-02-21T10:18:46.6689587Z .b8 17 // DW_AT_low_pc 2026-02-21T10:18:46.6689666Z .b8 1 // DW_FORM_addr 2026-02-21T10:18:46.6689751Z .b8 18 // DW_AT_high_pc 2026-02-21T10:18:46.6689829Z .b8 1 // DW_FORM_addr 2026-02-21T10:18:46.6689910Z .b8 88 // DW_AT_call_file 2026-02-21T10:18:46.6689992Z .b8 11 // DW_FORM_data1 2026-02-21T10:18:46.6690075Z .b8 89 // DW_AT_call_line 2026-02-21T10:18:46.6690153Z .b8 11 // DW_FORM_data1 2026-02-21T10:18:46.6690250Z .b8 87 // DW_AT_call_column 2026-02-21T10:18:46.6690330Z .b8 11 // DW_FORM_data1 
2026-02-21T10:18:46.6690401Z .b8 0 // EOM(1) 2026-02-21T10:18:46.6690469Z .b8 0 // EOM(2) 2026-02-21T10:18:46.6690543Z .b8 0 // EOM(3) 2026-02-21T10:18:46.6690669Z } 2026-02-21T10:18:46.6690734Z .section .debug_info 2026-02-21T10:18:46.6690846Z { 2026-02-21T10:18:46.6690938Z .b32 178 // Length of Unit 2026-02-21T10:18:46.6691032Z .b8 2 // DWARF version number 2026-02-21T10:18:46.6691087Z .b8 0 2026-02-21T10:18:46.6691219Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T10:18:46.6691316Z .b8 8 // Address Size (in bytes) 2026-02-21T10:18:46.6691433Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T10:18:46.6691521Z .b8 116 // DW_AT_producer 2026-02-21T10:18:46.6691576Z .b8 114 2026-02-21T10:18:46.6691628Z .b8 105 2026-02-21T10:18:46.6691684Z .b8 116 2026-02-21T10:18:46.6691735Z .b8 111 2026-02-21T10:18:46.6691786Z .b8 110 2026-02-21T10:18:46.6691837Z .b8 0 2026-02-21T10:18:46.6691935Z .b8 2 // DW_AT_language 2026-02-21T10:18:46.6691990Z .b8 0 2026-02-21T10:18:46.6692071Z .b8 99 // DW_AT_name 2026-02-21T10:18:46.6692128Z .b8 111 2026-02-21T10:18:46.6692180Z .b8 50 2026-02-21T10:18:46.6692283Z .b8 111 2026-02-21T10:18:46.6692337Z .b8 52 2026-02-21T10:18:46.6692391Z .b8 113 2026-02-21T10:18:46.6692442Z .b8 98 2026-02-21T10:18:46.6692494Z .b8 112 2026-02-21T10:18:46.6692548Z .b8 109 2026-02-21T10:18:46.6692600Z .b8 114 2026-02-21T10:18:46.6692650Z .b8 99 2026-02-21T10:18:46.6692748Z .b8 119 2026-02-21T10:18:46.6692804Z .b8 51 2026-02-21T10:18:46.6692855Z .b8 51 2026-02-21T10:18:46.6692907Z .b8 103 2026-02-21T10:18:46.6692958Z .b8 116 2026-02-21T10:18:46.6693013Z .b8 97 2026-02-21T10:18:46.6693064Z .b8 106 2026-02-21T10:18:46.6693115Z .b8 118 2026-02-21T10:18:46.6693168Z .b8 108 2026-02-21T10:18:46.6693219Z .b8 98 2026-02-21T10:18:46.6693270Z .b8 103 2026-02-21T10:18:46.6693320Z .b8 108 2026-02-21T10:18:46.6693375Z .b8 105 2026-02-21T10:18:46.6693426Z .b8 55 2026-02-21T10:18:46.6693480Z .b8 120 2026-02-21T10:18:46.6693534Z .b8 119 
2026-02-21T10:18:46.6693597Z .b8 53 2026-02-21T10:18:46.6693653Z .b8 110 2026-02-21T10:18:46.6693706Z .b8 105 2026-02-21T10:18:46.6693761Z .b8 118 2026-02-21T10:18:46.6693814Z .b8 97 2026-02-21T10:18:46.6693867Z .b8 120 2026-02-21T10:18:46.6693922Z .b8 102 2026-02-21T10:18:46.6693973Z .b8 106 2026-02-21T10:18:46.6694026Z .b8 106 2026-02-21T10:18:46.6694078Z .b8 112 2026-02-21T10:18:46.6694134Z .b8 118 2026-02-21T10:18:46.6694184Z .b8 97 2026-02-21T10:18:46.6694237Z .b8 100 2026-02-21T10:18:46.6694288Z .b8 107 2026-02-21T10:18:46.6694344Z .b8 106 2026-02-21T10:18:46.6694395Z .b8 97 2026-02-21T10:18:46.6694445Z .b8 97 2026-02-21T10:18:46.6694498Z .b8 115 2026-02-21T10:18:46.6694550Z .b8 101 2026-02-21T10:18:46.6694602Z .b8 116 2026-02-21T10:18:46.6694653Z .b8 55 2026-02-21T10:18:46.6694707Z .b8 100 2026-02-21T10:18:46.6694758Z .b8 98 2026-02-21T10:18:46.6694811Z .b8 121 2026-02-21T10:18:46.6694866Z .b8 108 2026-02-21T10:18:46.6694916Z .b8 46 2026-02-21T10:18:46.6694969Z .b8 112 2026-02-21T10:18:46.6695021Z .b8 121 2026-02-21T10:18:46.6695074Z .b8 0 2026-02-21T10:18:46.6695177Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T10:18:46.6695261Z .b8 47 // DW_AT_comp_dir 2026-02-21T10:18:46.6695316Z .b8 116 2026-02-21T10:18:46.6695368Z .b8 109 2026-02-21T10:18:46.6695419Z .b8 112 2026-02-21T10:18:46.6695471Z .b8 47 2026-02-21T10:18:46.6695526Z .b8 116 2026-02-21T10:18:46.6695578Z .b8 111 2026-02-21T10:18:46.6695631Z .b8 114 2026-02-21T10:18:46.6695682Z .b8 99 2026-02-21T10:18:46.6695736Z .b8 104 2026-02-21T10:18:46.6695787Z .b8 105 2026-02-21T10:18:46.6695837Z .b8 110 2026-02-21T10:18:46.6695889Z .b8 100 2026-02-21T10:18:46.6695939Z .b8 117 2026-02-21T10:18:46.6695990Z .b8 99 2026-02-21T10:18:46.6696042Z .b8 116 2026-02-21T10:18:46.6696096Z .b8 111 2026-02-21T10:18:46.6696159Z .b8 114 2026-02-21T10:18:46.6696212Z .b8 95 2026-02-21T10:18:46.6696267Z .b8 114 2026-02-21T10:18:46.6696322Z .b8 111 2026-02-21T10:18:46.6696440Z .b8 111 2026-02-21T10:18:46.6696613Z .b8 116 
2026-02-21T10:18:46.6696670Z .b8 47 2026-02-21T10:18:46.6696803Z .b8 111 2026-02-21T10:18:46.6696854Z .b8 50 2026-02-21T10:18:46.6696908Z .b8 0 2026-02-21T10:18:46.6697022Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T10:18:46.6697100Z .b8 95 // DW_AT_name 2026-02-21T10:18:46.6697152Z .b8 104 2026-02-21T10:18:46.6697207Z .b8 101 2026-02-21T10:18:46.6697260Z .b8 108 2026-02-21T10:18:46.6697311Z .b8 105 2026-02-21T10:18:46.6697364Z .b8 111 2026-02-21T10:18:46.6697416Z .b8 110 2026-02-21T10:18:46.6697467Z .b8 95 2026-02-21T10:18:46.6697517Z .b8 109 2026-02-21T10:18:46.6697570Z .b8 97 2026-02-21T10:18:46.6697622Z .b8 116 2026-02-21T10:18:46.6697673Z .b8 109 2026-02-21T10:18:46.6697725Z .b8 117 2026-02-21T10:18:46.6697781Z .b8 108 2026-02-21T10:18:46.6697831Z .b8 95 2026-02-21T10:18:46.6697882Z .b8 98 2026-02-21T10:18:46.6697941Z .b8 102 2026-02-21T10:18:46.6697993Z .b8 49 2026-02-21T10:18:46.6698045Z .b8 54 2026-02-21T10:18:46.6698097Z .b8 95 2026-02-21T10:18:46.6698153Z .b8 105 2026-02-21T10:18:46.6698204Z .b8 110 2026-02-21T10:18:46.6698255Z .b8 116 2026-02-21T10:18:46.6698406Z .b8 52 2026-02-21T10:18:46.6698460Z .b8 0 2026-02-21T10:18:46.6698541Z .b8 1 // DW_AT_inline 2026-02-21T10:18:46.6698653Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T10:18:46.6698819Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T10:18:46.6698921Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T10:18:46.6699019Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:18:46.6699151Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T10:18:46.6699246Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:18:46.6699334Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T10:18:46.6699430Z .b64 $L__tmp24 // DW_AT_high_pc 2026-02-21T10:18:46.6699515Z .b8 1 // DW_AT_call_file 2026-02-21T10:18:46.6699602Z .b8 84 // DW_AT_call_line 2026-02-21T10:18:46.6699689Z .b8 40 // DW_AT_call_column 2026-02-21T10:18:46.6699782Z .b8 0 // End Of Children Mark 2026-02-21T10:18:46.6699883Z .b8 0 
// End Of Children Mark 2026-02-21T10:18:46.6699936Z } 2026-02-21T10:18:46.6700010Z .section .debug_macinfo { } 2026-02-21T10:18:46.6700016Z 2026-02-21T10:18:46.6700096Z ================================================================ 2026-02-21T10:18:46.6700216Z please share the reproducer above with Triton project. 2026-02-21T10:18:50.5271588Z [3123s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T10:18:50.5273671Z Config: @helion.kernel(config=helion.Config(block_sizes=[16, 128, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=64, num_stages=1, num_warps=4, pid_type='persistent_interleaved', range_flattens=[True, False], range_multi_buffers=[False, False], range_num_stages=[4, 2], range_unroll_factors=[1, 0], range_warp_specializes=[]), static_shapes=True) 2026-02-21T10:18:50.5275603Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T10:18:50.5275961Z `ptxas` stderr: 2026-02-21T10:18:50.5276851Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 1012 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T10:18:50.5277662Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:18:50.5277897Z 2026-02-21T10:18:50.5278536Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpfez8b4rr.ptx -o /tmp/tmpfez8b4rr.ptx.o 2026-02-21T10:18:50.5279928Z 2026-02-21T10:18:50.5280125Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T10:18:50.5280427Z 2026-02-21T10:18:50.5280431Z 2026-02-21T10:18:50.5280434Z 2026-02-21T10:18:50.5280536Z ================================================================ 2026-02-21T10:18:50.5280838Z Internal Triton PTX codegen error 2026-02-21T10:18:50.5281075Z `ptxas` stderr: 2026-02-21T10:18:50.5281776Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 1012 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T10:18:50.5282586Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:18:50.5282809Z 2026-02-21T10:18:50.5283437Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpfez8b4rr.ptx -o /tmp/tmpfez8b4rr.ptx.o 2026-02-21T10:18:50.5284171Z 2026-02-21T10:18:50.5284175Z 2026-02-21T10:18:50.5284242Z // 2026-02-21T10:18:50.5284428Z // Generated by LLVM NVPTX Back-End 2026-02-21T10:18:50.5284669Z // 2026-02-21T10:18:50.5284874Z 2026-02-21T10:18:50.5284953Z .version 8.7 2026-02-21T10:18:50.5285129Z .target sm_90a 2026-02-21T10:18:50.5285307Z .address_size 64 2026-02-21T10:18:50.5285419Z 2026-02-21T10:18:50.5285629Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T10:18:50.5286145Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T10:18:50.5286586Z // @_helion_matmul_bf16_int4 2026-02-21T10:18:50.5286918Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T10:18:50.5287284Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T10:18:50.5287721Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T10:18:50.5288153Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T10:18:50.5288582Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T10:18:50.5289013Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 
2026-02-21T10:18:50.5289341Z ) 2026-02-21T10:18:50.5289497Z .reqntid 128 2026-02-21T10:18:50.5289674Z .maxnreg 32 2026-02-21T10:18:50.5289829Z { 2026-02-21T10:18:50.5289990Z .reg .pred %p<64>; 2026-02-21T10:18:50.5290182Z .reg .b16 %rs<113>; 2026-02-21T10:18:50.5290379Z .reg .b32 %r<2657>; 2026-02-21T10:18:50.5290550Z .reg .b64 %rd<125>; 2026-02-21T10:18:50.5290857Z .loc 1 19 0 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:19:0 2026-02-21T10:18:50.5291209Z $L__func_begin0: 2026-02-21T10:18:50.5291496Z .loc 1 19 0 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:19:0 2026-02-21T10:18:50.5291782Z 2026-02-21T10:18:50.5291846Z // %bb.0: 2026-02-21T10:18:50.5292040Z ld.param.b64 %rd6, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T10:18:50.5292291Z $L__tmp0: 2026-02-21T10:18:50.5292566Z .loc 1 21 67 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:21:67 2026-02-21T10:18:50.5292938Z mov.u32 %r2520, %ctaid.x; 2026-02-21T10:18:50.5293164Z ld.param.b64 %rd9, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T10:18:50.5293419Z mov.u32 %r578, %ctaid.y; 2026-02-21T10:18:50.5293639Z ld.param.b64 %rd53, [_helion_matmul_bf16_int4_param_3]; 2026-02-21T10:18:50.5294056Z mov.u32 %r579, %ctaid.z; 2026-02-21T10:18:50.5294231Z mov.u32 %r580, %nctaid.x; 2026-02-21T10:18:50.5294414Z mov.u32 %r581, %nctaid.y; 2026-02-21T10:18:50.5294601Z mad.lo.s32 %r582, %r579, %r581, %r578; 2026-02-21T10:18:50.5294812Z mad.lo.s32 %r583, %r582, %r580, %r2520; 2026-02-21T10:18:50.5295017Z shl.b32 %r584, %r583, 7; 2026-02-21T10:18:50.5295184Z cvt.s64.s32 %rd54, %r584; 2026-02-21T10:18:50.5295364Z add.s64 %rd23, %rd53, %rd54; 2026-02-21T10:18:50.5295540Z mov.u32 %r2, %tid.x; 2026-02-21T10:18:50.5295710Z setp.lt.u32 %p1, %r2, 32; 2026-02-21T10:18:50.5295986Z shl.b32 %r585, %r2, 2; 2026-02-21T10:18:50.5296152Z mov.b32 %r586, global_smem; 2026-02-21T10:18:50.5296400Z add.s32 %r504, %r586, %r585; 2026-02-21T10:18:50.5296712Z mov.b32 %r2380, 0; 2026-02-21T10:18:50.5296873Z // begin 
inline asm 2026-02-21T10:18:50.5297058Z @%p1 st.shared.b32 [ %r504 + 0 ], %r2380; 2026-02-21T10:18:50.5297275Z // end inline asm 2026-02-21T10:18:50.5297431Z bar.warp.sync -1; 2026-02-21T10:18:50.5297602Z setp.eq.b32 %p61, %r2, 0; 2026-02-21T10:18:50.5297782Z cvt.u64.u32 %rd8, %r586; 2026-02-21T10:18:50.5297956Z // begin inline asm 2026-02-21T10:18:50.5298274Z @%p61 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd8 + 0 ], %rd9; 2026-02-21T10:18:50.5298616Z // end inline asm 2026-02-21T10:18:50.5298771Z // begin inline asm 2026-02-21T10:18:50.5299035Z @%p61 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1; 2026-02-21T10:18:50.5299345Z // end inline asm 2026-02-21T10:18:50.5299496Z mov.b32 %r506, 128; 2026-02-21T10:18:50.5299669Z // begin inline asm 2026-02-21T10:18:50.5299948Z @%p61 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r506; 2026-02-21T10:18:50.5300380Z // end inline asm 2026-02-21T10:18:50.5300538Z mov.b32 %r507, 16; 2026-02-21T10:18:50.5300688Z // begin inline asm 2026-02-21T10:18:50.5300966Z @%p61 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r507; 2026-02-21T10:18:50.5301288Z // end inline asm 2026-02-21T10:18:50.5301506Z mov.b32 %r508, 1280; 2026-02-21T10:18:50.5301672Z // begin inline asm 2026-02-21T10:18:50.5301968Z @%p61 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r508; 2026-02-21T10:18:50.5302314Z // end inline asm 2026-02-21T10:18:50.5302478Z mov.b32 %r509, 4096; 2026-02-21T10:18:50.5302639Z // begin inline asm 2026-02-21T10:18:50.5302928Z @%p61 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r509; 2026-02-21T10:18:50.5303265Z // end inline asm 2026-02-21T10:18:50.5303414Z mov.b64 %rd16, 1280; 2026-02-21T10:18:50.5303575Z // begin inline asm 2026-02-21T10:18:50.5303876Z @%p61 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd8 + 0 ], 0x0, %rd16; 2026-02-21T10:18:50.5304229Z // end 
inline asm 2026-02-21T10:18:50.5304379Z mov.b32 %r2379, 1; 2026-02-21T10:18:50.5304529Z // begin inline asm 2026-02-21T10:18:50.5304844Z @%p61 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r2379; 2026-02-21T10:18:50.5305196Z // end inline asm 2026-02-21T10:18:50.5305343Z // begin inline asm 2026-02-21T10:18:50.5305643Z @%p61 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r2379; 2026-02-21T10:18:50.5306000Z // end inline asm 2026-02-21T10:18:50.5306147Z // begin inline asm 2026-02-21T10:18:50.5306433Z @%p61 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T10:18:50.5306892Z // end inline asm 2026-02-21T10:18:50.5307037Z // begin inline asm 2026-02-21T10:18:50.5307342Z @%p61 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T10:18:50.5307688Z // end inline asm 2026-02-21T10:18:50.5307842Z // begin inline asm 2026-02-21T10:18:50.5308120Z @%p61 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x3; 2026-02-21T10:18:50.5308543Z // end inline asm 2026-02-21T10:18:50.5308694Z // begin inline asm 2026-02-21T10:18:50.5308967Z @%p61 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T10:18:50.5309291Z // end inline asm 2026-02-21T10:18:50.5309434Z // begin inline asm 2026-02-21T10:18:50.5309869Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd23 + 0 ], [ %rd8 + 0 ], 0x80; 2026-02-21T10:18:50.5310338Z // end inline asm 2026-02-21T10:18:50.5310486Z // begin inline asm 2026-02-21T10:18:50.5310730Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd23 + 0 ], 0x80; 2026-02-21T10:18:50.5311032Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T10:18:50.5311351Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T10:18:50.5311617Z // end inline asm 2026-02-21T10:18:50.5311770Z bar.sync 0; 2026-02-21T10:18:50.5311927Z cvta.global.u64 %rd34, 
%rd23; 2026-02-21T10:18:50.5312265Z .loc 1 38 45 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:38:45 2026-02-21T10:18:50.5312618Z bfe.u32 %r5, %r2, 3, 4; 2026-02-21T10:18:50.5312790Z or.b32 %r6, %r5, 16; 2026-02-21T10:18:50.5312953Z or.b32 %r7, %r5, 32; 2026-02-21T10:18:50.5313107Z or.b32 %r8, %r5, 48; 2026-02-21T10:18:50.5313264Z or.b32 %r9, %r5, 64; 2026-02-21T10:18:50.5313414Z or.b32 %r10, %r5, 80; 2026-02-21T10:18:50.5313584Z or.b32 %r11, %r5, 96; 2026-02-21T10:18:50.5313747Z or.b32 %r12, %r5, 112; 2026-02-21T10:18:50.5314065Z .loc 1 26 144 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:26:144 2026-02-21T10:18:50.5314433Z sub.s32 %r589, 5120, %r2520; 2026-02-21T10:18:50.5314632Z mul.hi.s32 %r590, %r589, 1041204193; 2026-02-21T10:18:50.5314842Z shr.u32 %r591, %r590, 31; 2026-02-21T10:18:50.5315017Z shr.s32 %r592, %r590, 11; 2026-02-21T10:18:50.5315195Z add.s32 %r30, %r592, %r591; 2026-02-21T10:18:50.5315466Z mul.lo.s32 %r593, %r30, 8448; 2026-02-21T10:18:50.5315671Z setp.ne.b32 %p28, %r589, %r593; 2026-02-21T10:18:50.5315868Z setp.lt.u32 %p29, %r2520, 5121; 2026-02-21T10:18:50.5316071Z and.pred %p30, %p29, %p28; 2026-02-21T10:18:50.5316334Z selp.b32 %r31, 1, 0, %p30; 2026-02-21T10:18:50.5316646Z add.s32 %r32, %r30, %r31; 2026-02-21T10:18:50.5316974Z .loc 1 53 38 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:53:38 2026-02-21T10:18:50.5317335Z and.b32 %r33, %r2, 7; 2026-02-21T10:18:50.5317504Z shl.b32 %r34, %r33, 2; 2026-02-21T10:18:50.5317821Z .loc 1 26 144 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:26:144 2026-02-21T10:18:50.5318183Z add.s32 %r512, %r586, 47104; 2026-02-21T10:18:50.5318371Z // begin inline asm 2026-02-21T10:18:50.5318578Z @%p61 mbarrier.init.shared::cta.b64 [%r512], 1; 2026-02-21T10:18:50.5318803Z // end inline asm 2026-02-21T10:18:50.5318955Z bar.sync 0; 2026-02-21T10:18:50.5319120Z add.s32 %r513, %r586, 47112; 2026-02-21T10:18:50.5326221Z // begin inline asm 2026-02-21T10:18:50.5326645Z 
@%p61 mbarrier.init.shared::cta.b64 [%r513], 1; 2026-02-21T10:18:50.5326913Z // end inline asm 2026-02-21T10:18:50.5327081Z bar.sync 0; 2026-02-21T10:18:50.5327256Z add.s32 %r514, %r586, 47120; 2026-02-21T10:18:50.5327444Z // begin inline asm 2026-02-21T10:18:50.5327656Z @%p61 mbarrier.init.shared::cta.b64 [%r514], 1; 2026-02-21T10:18:50.5327903Z // end inline asm 2026-02-21T10:18:50.5328078Z setp.lt.s32 %p31, %r32, 1; 2026-02-21T10:18:50.5328274Z setp.gt.s32 %p32, %r32, 0; 2026-02-21T10:18:50.5328607Z .loc 1 32 35 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:32:35 2026-02-21T10:18:50.5328983Z mul.hi.u32 %r594, %r2520, 1717986919; 2026-02-21T10:18:50.5329192Z shr.u32 %r595, %r594, 5; 2026-02-21T10:18:50.5329513Z .loc 1 33 33 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:33:33 2026-02-21T10:18:50.5329880Z shl.b32 %r596, %r595, 3; 2026-02-21T10:18:50.5330194Z .loc 1 34 39 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:34:39 2026-02-21T10:18:50.5330541Z sub.s32 %r597, 512, %r596; 2026-02-21T10:18:50.5330850Z .loc 1 34 52 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:34:52 2026-02-21T10:18:50.5331194Z min.s32 %r598, %r597, 8; 2026-02-21T10:18:50.5331504Z .loc 1 35 45 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:35:45 2026-02-21T10:18:50.5331866Z mul.lo.s32 %r599, %r595, 80; 2026-02-21T10:18:50.5332063Z sub.s32 %r600, %r2520, %r599; 2026-02-21T10:18:50.5332404Z .loc 1 36 51 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:36:51 2026-02-21T10:18:50.5332922Z div.s32 %r601, %r600, %r598; 2026-02-21T10:18:50.5333258Z .loc 1 35 64 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:35:64 2026-02-21T10:18:50.5333738Z mul.lo.s32 %r602, %r601, %r598; 2026-02-21T10:18:50.5333937Z sub.s32 %r603, %r600, %r602; 2026-02-21T10:18:50.5334270Z .loc 1 35 30 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:35:30 2026-02-21T10:18:50.5334624Z add.s32 %r604, %r603, %r596; 
2026-02-21T10:18:50.5334961Z .loc 1 37 27 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:37:27 2026-02-21T10:18:50.5335315Z shl.b32 %r2377, %r604, 7; 2026-02-21T10:18:50.5335625Z .loc 1 38 32 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:38:32 2026-02-21T10:18:50.5335978Z or.b32 %r2521, %r2377, %r5; 2026-02-21T10:18:50.5336157Z or.b32 %r2522, %r2377, %r6; 2026-02-21T10:18:50.5336339Z or.b32 %r2523, %r2377, %r7; 2026-02-21T10:18:50.5336645Z or.b32 %r2524, %r2377, %r8; 2026-02-21T10:18:50.5336836Z or.b32 %r2525, %r2377, %r9; 2026-02-21T10:18:50.5337011Z or.b32 %r2526, %r2377, %r10; 2026-02-21T10:18:50.5337192Z or.b32 %r2527, %r2377, %r11; 2026-02-21T10:18:50.5337459Z or.b32 %r2528, %r2377, %r12; 2026-02-21T10:18:50.5337780Z .loc 1 39 27 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:39:27 2026-02-21T10:18:50.5338134Z shl.b32 %r2375, %r601, 7; 2026-02-21T10:18:50.5338510Z .loc 1 54 53 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:54:53 2026-02-21T10:18:50.5338872Z shl.b32 %r605, %r2521, 13; 2026-02-21T10:18:50.5339050Z shl.b32 %r606, %r2522, 13; 2026-02-21T10:18:50.5339224Z shl.b32 %r607, %r2523, 13; 2026-02-21T10:18:50.5339402Z shl.b32 %r608, %r2524, 13; 2026-02-21T10:18:50.5339584Z shl.b32 %r609, %r2525, 13; 2026-02-21T10:18:50.5339766Z shl.b32 %r610, %r2526, 13; 2026-02-21T10:18:50.5339936Z shl.b32 %r611, %r2527, 13; 2026-02-21T10:18:50.5340112Z shl.b32 %r612, %r2528, 13; 2026-02-21T10:18:50.5340424Z .loc 1 54 60 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:54:60 2026-02-21T10:18:50.5340778Z or.b32 %r613, %r605, %r34; 2026-02-21T10:18:50.5340951Z or.b32 %r614, %r606, %r34; 2026-02-21T10:18:50.5341131Z or.b32 %r615, %r607, %r34; 2026-02-21T10:18:50.5341305Z or.b32 %r616, %r608, %r34; 2026-02-21T10:18:50.5341484Z or.b32 %r617, %r609, %r34; 2026-02-21T10:18:50.5341666Z or.b32 %r618, %r610, %r34; 2026-02-21T10:18:50.5341849Z or.b32 %r619, %r611, %r34; 2026-02-21T10:18:50.5342027Z or.b32 %r620, %r612, %r34; 
2026-02-21T10:18:50.5342334Z .loc 1 54 32 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:54:32 2026-02-21T10:18:50.5342694Z mad.wide.s32 %rd26, %r613, 2, %rd6; 2026-02-21T10:18:50.5342903Z mad.wide.s32 %rd27, %r614, 2, %rd6; 2026-02-21T10:18:50.5343108Z mad.wide.s32 %rd28, %r615, 2, %rd6; 2026-02-21T10:18:50.5343308Z mad.wide.s32 %rd29, %r616, 2, %rd6; 2026-02-21T10:18:50.5343511Z mad.wide.s32 %rd30, %r617, 2, %rd6; 2026-02-21T10:18:50.5343713Z mad.wide.s32 %rd31, %r618, 2, %rd6; 2026-02-21T10:18:50.5343908Z mad.wide.s32 %rd32, %r619, 2, %rd6; 2026-02-21T10:18:50.5344109Z mad.wide.s32 %rd33, %r620, 2, %rd6; 2026-02-21T10:18:50.5344432Z .loc 1 54 80 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:54:80 2026-02-21T10:18:50.5344795Z and.b32 %r45, %r2, 127; 2026-02-21T10:18:50.5344969Z shl.b32 %r621, %r45, 3; 2026-02-21T10:18:50.5345146Z shr.u32 %r622, %r2, 1; 2026-02-21T10:18:50.5345317Z and.b32 %r623, %r622, 24; 2026-02-21T10:18:50.5345491Z xor.b32 %r46, %r621, %r623; 2026-02-21T10:18:50.5345670Z add.s32 %r515, %r586, %r46; 2026-02-21T10:18:50.5345851Z selp.b32 %r516, 8, 0, %p32; 2026-02-21T10:18:50.5346033Z // begin inline asm 2026-02-21T10:18:50.5346274Z cp.async.ca.shared.global [ %r515 + 0 ], [ %rd26 + 0 ], 0x8, %r516; 2026-02-21T10:18:50.5346679Z // end inline asm 2026-02-21T10:18:50.5346834Z add.s32 %r517, %r515, 1024; 2026-02-21T10:18:50.5347108Z // begin inline asm 2026-02-21T10:18:50.5347402Z cp.async.ca.shared.global [ %r517 + 0 ], [ %rd27 + 0 ], 0x8, %r516; 2026-02-21T10:18:50.5347677Z // end inline asm 2026-02-21T10:18:50.5347835Z add.s32 %r519, %r515, 2048; 2026-02-21T10:18:50.5348010Z // begin inline asm 2026-02-21T10:18:50.5348238Z cp.async.ca.shared.global [ %r519 + 0 ], [ %rd28 + 0 ], 0x8, %r516; 2026-02-21T10:18:50.5348598Z // end inline asm 2026-02-21T10:18:50.5348757Z add.s32 %r521, %r515, 3072; 2026-02-21T10:18:50.5348930Z // begin inline asm 2026-02-21T10:18:50.5349159Z cp.async.ca.shared.global [ %r521 + 0 ], [ %rd29 + 0 
], 0x8, %r516; 2026-02-21T10:18:50.5349422Z // end inline asm 2026-02-21T10:18:50.5349576Z add.s32 %r523, %r515, 4096; 2026-02-21T10:18:50.5349770Z // begin inline asm 2026-02-21T10:18:50.5349994Z cp.async.ca.shared.global [ %r523 + 0 ], [ %rd30 + 0 ], 0x8, %r516; 2026-02-21T10:18:50.5350265Z // end inline asm 2026-02-21T10:18:50.5350414Z add.s32 %r525, %r515, 5120; 2026-02-21T10:18:50.5350594Z // begin inline asm 2026-02-21T10:18:50.5350825Z cp.async.ca.shared.global [ %r525 + 0 ], [ %rd31 + 0 ], 0x8, %r516; 2026-02-21T10:18:50.5351181Z // end inline asm 2026-02-21T10:18:50.5351347Z add.s32 %r527, %r515, 6144; 2026-02-21T10:18:50.5351532Z // begin inline asm 2026-02-21T10:18:50.5351756Z cp.async.ca.shared.global [ %r527 + 0 ], [ %rd32 + 0 ], 0x8, %r516; 2026-02-21T10:18:50.5352032Z // end inline asm 2026-02-21T10:18:50.5352258Z add.s32 %r529, %r515, 7168; 2026-02-21T10:18:50.5352441Z // begin inline asm 2026-02-21T10:18:50.5352669Z cp.async.ca.shared.global [ %r529 + 0 ], [ %rd33 + 0 ], 0x8, %r516; 2026-02-21T10:18:50.5352938Z // end inline asm 2026-02-21T10:18:50.5353102Z cp.async.commit_group; 2026-02-21T10:18:50.5353426Z .loc 1 26 144 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:26:144 2026-02-21T10:18:50.5353801Z bar.sync 0; 2026-02-21T10:18:50.5353963Z and.pred %p22, %p61, %p32; 2026-02-21T10:18:50.5354156Z // begin inline asm 2026-02-21T10:18:50.5354405Z @%p22 mbarrier.arrive.expect_tx.shared.b64 _, [%r512], 2048; 2026-02-21T10:18:50.5354672Z // end inline asm 2026-02-21T10:18:50.5354980Z .loc 1 60 33 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:60:33 2026-02-21T10:18:50.5355330Z bar.sync 0; 2026-02-21T10:18:50.5355496Z elect.sync %r624|%p33, -1; 2026-02-21T10:18:50.5355691Z and.pred %p34, %p32, %p33; 2026-02-21T10:18:50.5355886Z and.pred %p23, %p1, %p34; 2026-02-21T10:18:50.5356069Z add.s32 %r532, %r586, 40960; 2026-02-21T10:18:50.5356254Z // begin inline asm 2026-02-21T10:18:50.5356816Z @%p23 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r532], [%rd34, {%r2375, %r2380}], [%r512]; 2026-02-21T10:18:50.5357282Z // end inline asm 2026-02-21T10:18:50.5357583Z .loc 1 54 32 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:54:32 2026-02-21T10:18:50.5357940Z cvt.s64.s32 %rd55, %r605; 2026-02-21T10:18:50.5358132Z cvt.u64.u32 %rd56, %r34; 2026-02-21T10:18:50.5358307Z or.b64 %rd57, %rd55, %rd56; 2026-02-21T10:18:50.5358492Z shl.b64 %rd58, %rd57, 1; 2026-02-21T10:18:50.5358672Z add.s64 %rd59, %rd6, %rd58; 2026-02-21T10:18:50.5358856Z add.s64 %rd35, %rd59, 64; 2026-02-21T10:18:50.5359045Z cvt.s64.s32 %rd60, %r606; 2026-02-21T10:18:50.5359222Z or.b64 %rd61, %rd60, %rd56; 2026-02-21T10:18:50.5359417Z shl.b64 %rd62, %rd61, 1; 2026-02-21T10:18:50.5359592Z add.s64 %rd63, %rd6, %rd62; 2026-02-21T10:18:50.5359803Z add.s64 %rd36, %rd63, 64; 2026-02-21T10:18:50.5359976Z cvt.s64.s32 %rd64, %r607; 2026-02-21T10:18:50.5360157Z or.b64 %rd65, %rd64, %rd56; 2026-02-21T10:18:50.5360333Z shl.b64 %rd66, %rd65, 1; 2026-02-21T10:18:50.5360510Z add.s64 %rd67, %rd6, %rd66; 2026-02-21T10:18:50.5360691Z add.s64 %rd37, %rd67, 64; 2026-02-21T10:18:50.5360875Z cvt.s64.s32 %rd68, %r608; 2026-02-21T10:18:50.5361052Z or.b64 %rd69, %rd68, %rd56; 2026-02-21T10:18:50.5361227Z shl.b64 %rd70, %rd69, 1; 2026-02-21T10:18:50.5361492Z add.s64 %rd71, %rd6, %rd70; 2026-02-21T10:18:50.5361732Z add.s64 %rd38, %rd71, 64; 2026-02-21T10:18:50.5361904Z cvt.s64.s32 %rd72, %r609; 2026-02-21T10:18:50.5362089Z or.b64 %rd73, %rd72, %rd56; 2026-02-21T10:18:50.5362276Z shl.b64 %rd74, %rd73, 1; 2026-02-21T10:18:50.5362444Z add.s64 %rd75, %rd6, %rd74; 2026-02-21T10:18:50.5362626Z add.s64 %rd39, %rd75, 64; 2026-02-21T10:18:50.5362801Z cvt.s64.s32 %rd76, %r610; 2026-02-21T10:18:50.5362973Z or.b64 %rd77, %rd76, %rd56; 2026-02-21T10:18:50.5363152Z shl.b64 %rd78, %rd77, 1; 2026-02-21T10:18:50.5363321Z add.s64 %rd79, %rd6, %rd78; 2026-02-21T10:18:50.5363501Z add.s64 %rd40, %rd79, 64; 
2026-02-21T10:18:50.5363682Z cvt.s64.s32 %rd80, %r611; 2026-02-21T10:18:50.5363857Z or.b64 %rd81, %rd80, %rd56; 2026-02-21T10:18:50.5364040Z shl.b64 %rd82, %rd81, 1; 2026-02-21T10:18:50.5364207Z add.s64 %rd83, %rd6, %rd82; 2026-02-21T10:18:50.5364386Z add.s64 %rd41, %rd83, 64; 2026-02-21T10:18:50.5364554Z cvt.s64.s32 %rd84, %r612; 2026-02-21T10:18:50.5364735Z or.b64 %rd85, %rd84, %rd56; 2026-02-21T10:18:50.5364912Z shl.b64 %rd86, %rd85, 1; 2026-02-21T10:18:50.5365086Z add.s64 %rd87, %rd6, %rd86; 2026-02-21T10:18:50.5365340Z add.s64 %rd42, %rd87, 64; 2026-02-21T10:18:50.5365672Z .loc 1 54 80 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:54:80 2026-02-21T10:18:50.5366035Z add.s32 %r536, %r515, 8192; 2026-02-21T10:18:50.5366219Z // begin inline asm 2026-02-21T10:18:50.5366667Z cp.async.ca.shared.global [ %r536 + 0 ], [ %rd35 + 0 ], 0x8, %r516; 2026-02-21T10:18:50.5366958Z // end inline asm 2026-02-21T10:18:50.5367121Z add.s32 %r538, %r515, 9216; 2026-02-21T10:18:50.5367301Z // begin inline asm 2026-02-21T10:18:50.5367539Z cp.async.ca.shared.global [ %r538 + 0 ], [ %rd36 + 0 ], 0x8, %r516; 2026-02-21T10:18:50.5367810Z // end inline asm 2026-02-21T10:18:50.5367982Z add.s32 %r540, %r515, 10240; 2026-02-21T10:18:50.5368163Z // begin inline asm 2026-02-21T10:18:50.5368398Z cp.async.ca.shared.global [ %r540 + 0 ], [ %rd37 + 0 ], 0x8, %r516; 2026-02-21T10:18:50.5368672Z // end inline asm 2026-02-21T10:18:50.5368823Z add.s32 %r542, %r515, 11264; 2026-02-21T10:18:50.5369006Z // begin inline asm 2026-02-21T10:18:50.5369226Z cp.async.ca.shared.global [ %r542 + 0 ], [ %rd38 + 0 ], 0x8, %r516; 2026-02-21T10:18:50.5369497Z // end inline asm 2026-02-21T10:18:50.5369653Z add.s32 %r544, %r515, 12288; 2026-02-21T10:18:50.5369829Z // begin inline asm 2026-02-21T10:18:50.5370049Z cp.async.ca.shared.global [ %r544 + 0 ], [ %rd39 + 0 ], 0x8, %r516; 2026-02-21T10:18:50.5370318Z // end inline asm 2026-02-21T10:18:50.5370469Z add.s32 %r546, %r515, 13312; 
2026-02-21T10:18:50.5370641Z // begin inline asm 2026-02-21T10:18:50.5370870Z cp.async.ca.shared.global [ %r546 + 0 ], [ %rd40 + 0 ], 0x8, %r516; 2026-02-21T10:18:50.5371132Z // end inline asm 2026-02-21T10:18:50.5371287Z add.s32 %r548, %r515, 14336; 2026-02-21T10:18:50.5371457Z // begin inline asm 2026-02-21T10:18:50.5371686Z cp.async.ca.shared.global [ %r548 + 0 ], [ %rd41 + 0 ], 0x8, %r516; 2026-02-21T10:18:50.5371953Z // end inline asm 2026-02-21T10:18:50.5372106Z add.s32 %r550, %r515, 15360; 2026-02-21T10:18:50.5372277Z // begin inline asm 2026-02-21T10:18:50.5372504Z cp.async.ca.shared.global [ %r550 + 0 ], [ %rd42 + 0 ], 0x8, %r516; 2026-02-21T10:18:50.5372771Z // end inline asm 2026-02-21T10:18:50.5372934Z cp.async.commit_group; 2026-02-21T10:18:50.5373256Z .loc 1 26 144 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:26:144 2026-02-21T10:18:50.5373619Z bar.sync 0; 2026-02-21T10:18:50.5373766Z // begin inline asm 2026-02-21T10:18:50.5373995Z @%p22 mbarrier.arrive.expect_tx.shared.b64 _, [%r513], 2048; 2026-02-21T10:18:50.5374264Z // end inline asm 2026-02-21T10:18:50.5374566Z .loc 1 60 33 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:60:33 2026-02-21T10:18:50.5374918Z bar.sync 0; 2026-02-21T10:18:50.5375080Z elect.sync %r625|%p35, -1; 2026-02-21T10:18:50.5375377Z and.pred %p36, %p32, %p35; 2026-02-21T10:18:50.5375628Z and.pred %p25, %p1, %p36; 2026-02-21T10:18:50.5375816Z add.s32 %r553, %r586, 43008; 2026-02-21T10:18:50.5375994Z // begin inline asm 2026-02-21T10:18:50.5376409Z @%p25 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r553], [%rd34, {%r2375, %r507}], [%r513]; 2026-02-21T10:18:50.5376986Z // end inline asm 2026-02-21T10:18:50.5377287Z .loc 1 54 32 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:54:32 2026-02-21T10:18:50.5377648Z add.s64 %rd44, %rd59, 128; 2026-02-21T10:18:50.5377825Z add.s64 %rd45, %rd63, 128; 2026-02-21T10:18:50.5378004Z add.s64 %rd46, %rd67, 128; 
2026-02-21T10:18:50.5378179Z add.s64 %rd47, %rd71, 128; 2026-02-21T10:18:50.5378364Z add.s64 %rd48, %rd75, 128; 2026-02-21T10:18:50.5378539Z add.s64 %rd49, %rd79, 128; 2026-02-21T10:18:50.5378716Z add.s64 %rd50, %rd83, 128; 2026-02-21T10:18:50.5378887Z add.s64 %rd51, %rd87, 128; 2026-02-21T10:18:50.5379209Z .loc 1 54 80 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:54:80 2026-02-21T10:18:50.5379647Z add.s32 %r557, %r515, 16384; 2026-02-21T10:18:50.5379848Z // begin inline asm 2026-02-21T10:18:50.5380080Z cp.async.ca.shared.global [ %r557 + 0 ], [ %rd44 + 0 ], 0x8, %r516; 2026-02-21T10:18:50.5380352Z // end inline asm 2026-02-21T10:18:50.5380507Z add.s32 %r559, %r515, 17408; 2026-02-21T10:18:50.5380765Z // begin inline asm 2026-02-21T10:18:50.5381009Z cp.async.ca.shared.global [ %r559 + 0 ], [ %rd45 + 0 ], 0x8, %r516; 2026-02-21T10:18:50.5381279Z // end inline asm 2026-02-21T10:18:50.5381444Z add.s32 %r561, %r515, 18432; 2026-02-21T10:18:50.5381620Z // begin inline asm 2026-02-21T10:18:50.5381848Z cp.async.ca.shared.global [ %r561 + 0 ], [ %rd46 + 0 ], 0x8, %r516; 2026-02-21T10:18:50.5382121Z // end inline asm 2026-02-21T10:18:50.5382269Z add.s32 %r563, %r515, 19456; 2026-02-21T10:18:50.5382449Z // begin inline asm 2026-02-21T10:18:50.5382676Z cp.async.ca.shared.global [ %r563 + 0 ], [ %rd47 + 0 ], 0x8, %r516; 2026-02-21T10:18:50.5382950Z // end inline asm 2026-02-21T10:18:50.5383105Z add.s32 %r565, %r515, 20480; 2026-02-21T10:18:50.5383290Z // begin inline asm 2026-02-21T10:18:50.5383519Z cp.async.ca.shared.global [ %r565 + 0 ], [ %rd48 + 0 ], 0x8, %r516; 2026-02-21T10:18:50.5383786Z // end inline asm 2026-02-21T10:18:50.5383944Z add.s32 %r567, %r515, 21504; 2026-02-21T10:18:50.5384115Z // begin inline asm 2026-02-21T10:18:50.5384341Z cp.async.ca.shared.global [ %r567 + 0 ], [ %rd49 + 0 ], 0x8, %r516; 2026-02-21T10:18:50.5384621Z // end inline asm 2026-02-21T10:18:50.5384776Z add.s32 %r569, %r515, 22528; 2026-02-21T10:18:50.5384946Z // begin inline asm 
2026-02-21T10:18:50.5385168Z cp.async.ca.shared.global [ %r569 + 0 ], [ %rd50 + 0 ], 0x8, %r516; 2026-02-21T10:18:50.5385432Z // end inline asm 2026-02-21T10:18:50.5385581Z add.s32 %r571, %r515, 23552; 2026-02-21T10:18:50.5385750Z // begin inline asm 2026-02-21T10:18:50.5385976Z cp.async.ca.shared.global [ %r571 + 0 ], [ %rd51 + 0 ], 0x8, %r516; 2026-02-21T10:18:50.5386242Z // end inline asm 2026-02-21T10:18:50.5386396Z cp.async.commit_group; 2026-02-21T10:18:50.5386848Z .loc 1 26 144 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:26:144 2026-02-21T10:18:50.5387199Z bar.sync 0; 2026-02-21T10:18:50.5387344Z // begin inline asm 2026-02-21T10:18:50.5387576Z @%p22 mbarrier.arrive.expect_tx.shared.b64 _, [%r514], 2048; 2026-02-21T10:18:50.5387842Z // end inline asm 2026-02-21T10:18:50.5388126Z .loc 1 60 33 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:60:33 2026-02-21T10:18:50.5388544Z bar.sync 0; 2026-02-21T10:18:50.5388708Z elect.sync %r626|%p37, -1; 2026-02-21T10:18:50.5388890Z and.pred %p38, %p32, %p37; 2026-02-21T10:18:50.5389074Z and.pred %p27, %p1, %p38; 2026-02-21T10:18:50.5389251Z add.s32 %r574, %r586, 45056; 2026-02-21T10:18:50.5389425Z mov.b32 %r2381, 32; 2026-02-21T10:18:50.5389704Z // begin inline asm 2026-02-21T10:18:50.5390120Z @%p27 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r574], [%rd34, {%r2375, %r2381}], [%r514]; 2026-02-21T10:18:50.5390653Z // end inline asm 2026-02-21T10:18:50.5390956Z .loc 1 26 144 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:26:144 2026-02-21T10:18:50.5391313Z @%p31 bra $L__BB0_7; 2026-02-21T10:18:50.5391494Z // %bb.1: // %.lr.ph 2026-02-21T10:18:50.5391865Z .loc 1 0 144 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:0:144 2026-02-21T10:18:50.5392271Z ld.param.b64 %rd7, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T10:18:50.5392523Z shr.u32 %r3, %r2, 5; 2026-02-21T10:18:50.5392683Z and.b32 %r4, %r2, 120; 2026-02-21T10:18:50.5392852Z shr.u32 
%r587, %r2, 4; 2026-02-21T10:18:50.5393022Z bfe.u32 %r13, %r2, 4, 3; 2026-02-21T10:18:50.5393190Z or.b32 %r14, %r13, 8; 2026-02-21T10:18:50.5393359Z or.b32 %r15, %r13, 16; 2026-02-21T10:18:50.5393516Z or.b32 %r16, %r13, 24; 2026-02-21T10:18:50.5393679Z or.b32 %r17, %r13, 32; 2026-02-21T10:18:50.5393835Z or.b32 %r18, %r13, 40; 2026-02-21T10:18:50.5394075Z or.b32 %r19, %r13, 48; 2026-02-21T10:18:50.5394240Z or.b32 %r20, %r587, 56; 2026-02-21T10:18:50.5394404Z or.b32 %r21, %r13, 64; 2026-02-21T10:18:50.5394565Z or.b32 %r22, %r13, 72; 2026-02-21T10:18:50.5394723Z or.b32 %r23, %r13, 80; 2026-02-21T10:18:50.5394953Z or.b32 %r24, %r13, 88; 2026-02-21T10:18:50.5395115Z or.b32 %r25, %r13, 96; 2026-02-21T10:18:50.5395274Z or.b32 %r26, %r13, 104; 2026-02-21T10:18:50.5395435Z or.b32 %r27, %r13, 112; 2026-02-21T10:18:50.5395601Z or.b32 %r28, %r587, 120; 2026-02-21T10:18:50.5395767Z shl.b32 %r588, %r2, 3; 2026-02-21T10:18:50.5395934Z and.b32 %r29, %r588, 120; 2026-02-21T10:18:50.5396262Z .loc 1 26 144 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:26:144 2026-02-21T10:18:50.5396753Z shl.b32 %r633, %r32, 8; 2026-02-21T10:18:50.5396926Z add.s32 %r47, %r633, -3; 2026-02-21T10:18:50.5397094Z shl.b32 %r634, %r2, 5; 2026-02-21T10:18:50.5397262Z and.b32 %r635, %r634, 3072; 2026-02-21T10:18:50.5397439Z shl.b32 %r636, %r2, 4; 2026-02-21T10:18:50.5397617Z and.b32 %r637, %r636, 448; 2026-02-21T10:18:50.5397793Z and.b32 %r638, %r2, 3; 2026-02-21T10:18:50.5397955Z shl.b32 %r639, %r638, 1; 2026-02-21T10:18:50.5398115Z and.b32 %r640, %r2, 24; 2026-02-21T10:18:50.5398285Z or.b32 %r641, %r635, %r637; 2026-02-21T10:18:50.5398469Z or.b32 %r642, %r639, %r640; 2026-02-21T10:18:50.5398647Z or.b32 %r48, %r641, %r642; 2026-02-21T10:18:50.5398820Z xor.b32 %r49, %r48, 8; 2026-02-21T10:18:50.5398979Z xor.b32 %r50, %r48, 16; 2026-02-21T10:18:50.5399143Z xor.b32 %r51, %r48, 24; 2026-02-21T10:18:50.5399301Z shl.b32 %r643, %r45, 7; 2026-02-21T10:18:50.5399464Z shl.b32 %r644, %r33, 4; 
2026-02-21T10:18:50.5399622Z or.b32 %r645, %r643, %r644; 2026-02-21T10:18:50.5399798Z add.s32 %r647, %r586, 24576; 2026-02-21T10:18:50.5399977Z add.s32 %r52, %r647, %r645; 2026-02-21T10:18:50.5400157Z xor.b32 %r648, %r645, 16; 2026-02-21T10:18:50.5400323Z add.s32 %r53, %r647, %r648; 2026-02-21T10:18:50.5400500Z xor.b32 %r649, %r645, 32; 2026-02-21T10:18:50.5400672Z add.s32 %r54, %r647, %r649; 2026-02-21T10:18:50.5400841Z xor.b32 %r650, %r645, 48; 2026-02-21T10:18:50.5401013Z add.s32 %r55, %r647, %r650; 2026-02-21T10:18:50.5401182Z xor.b32 %r651, %r645, 64; 2026-02-21T10:18:50.5401354Z add.s32 %r56, %r647, %r651; 2026-02-21T10:18:50.5401525Z xor.b32 %r652, %r645, 80; 2026-02-21T10:18:50.5401692Z add.s32 %r57, %r647, %r652; 2026-02-21T10:18:50.5401859Z xor.b32 %r653, %r645, 96; 2026-02-21T10:18:50.5402027Z add.s32 %r58, %r647, %r653; 2026-02-21T10:18:50.5402207Z xor.b32 %r654, %r645, 112; 2026-02-21T10:18:50.5402383Z add.s32 %r59, %r647, %r654; 2026-02-21T10:18:50.5402558Z bfe.u32 %r655, %r647, 4, 14; 2026-02-21T10:18:50.5402728Z cvt.u64.u32 %rd88, %r655; 2026-02-21T10:18:50.5402996Z or.b64 %rd92, %rd88, 4611686293372403712; 2026-02-21T10:18:50.5403213Z add.s32 %r656, %r586, 24608; 2026-02-21T10:18:50.5403455Z bfe.u32 %r657, %r656, 4, 14; 2026-02-21T10:18:50.5403628Z cvt.u64.u32 %rd89, %r657; 2026-02-21T10:18:50.5403810Z or.b64 %rd93, %rd89, 4611686293372403712; 2026-02-21T10:18:50.5404009Z add.s32 %r658, %r586, 24640; 2026-02-21T10:18:50.5404183Z bfe.u32 %r659, %r658, 4, 14; 2026-02-21T10:18:50.5404360Z cvt.u64.u32 %rd90, %r659; 2026-02-21T10:18:50.5404552Z or.b64 %rd94, %rd90, 4611686293372403712; 2026-02-21T10:18:50.5404755Z add.s32 %r660, %r586, 24672; 2026-02-21T10:18:50.5404929Z bfe.u32 %r661, %r660, 4, 14; 2026-02-21T10:18:50.5405106Z cvt.u64.u32 %rd91, %r661; 2026-02-21T10:18:50.5405279Z or.b64 %rd95, %rd91, 4611686293372403712; 2026-02-21T10:18:50.5405480Z shl.b32 %r662, %r638, 11; 2026-02-21T10:18:50.5405644Z shl.b32 %r663, %r638, 5; 
2026-02-21T10:18:50.5405814Z shl.b32 %r664, %r4, 4; 2026-02-21T10:18:50.5405972Z and.b32 %r666, %r585, 16; 2026-02-21T10:18:50.5406144Z or.b32 %r667, %r664, %r666; 2026-02-21T10:18:50.5406316Z or.b32 %r668, %r667, %r662; 2026-02-21T10:18:50.5406607Z or.b32 %r669, %r668, %r663; 2026-02-21T10:18:50.5406878Z add.s32 %r60, %r647, %r669; 2026-02-21T10:18:50.5407054Z xor.b32 %r670, %r669, 32; 2026-02-21T10:18:50.5407226Z add.s32 %r61, %r647, %r670; 2026-02-21T10:18:50.5407393Z xor.b32 %r671, %r669, 64; 2026-02-21T10:18:50.5407561Z add.s32 %r62, %r647, %r671; 2026-02-21T10:18:50.5407794Z xor.b32 %r672, %r669, 96; 2026-02-21T10:18:50.5407965Z add.s32 %r63, %r647, %r672; 2026-02-21T10:18:50.5408132Z shl.b32 %r673, %r640, 8; 2026-02-21T10:18:50.5408299Z and.b32 %r674, %r585, 496; 2026-02-21T10:18:50.5408470Z or.b32 %r675, %r673, %r663; 2026-02-21T10:18:50.5408640Z xor.b32 %r676, %r675, %r674; 2026-02-21T10:18:50.5408826Z add.s32 %r2125, %r647, %r676; 2026-02-21T10:18:50.5409009Z add.s32 %r2130, %r2125, 512; 2026-02-21T10:18:50.5409185Z add.s32 %r2135, %r2125, 1024; 2026-02-21T10:18:50.5409360Z add.s32 %r2140, %r2125, 1536; 2026-02-21T10:18:50.5409534Z shl.b32 %r677, %r30, 8; 2026-02-21T10:18:50.5409697Z shl.b32 %r678, %r31, 8; 2026-02-21T10:18:50.5409867Z add.s32 %r68, %r677, %r678; 2026-02-21T10:18:50.5410046Z mov.b32 %r2387, 0f00000000; 2026-02-21T10:18:50.5410213Z mov.b32 %r2384, 2; 2026-02-21T10:18:50.5410370Z mov.b32 %r2383, -1; 2026-02-21T10:18:50.5410539Z mov.b32 %r2376, %r2375; 2026-02-21T10:18:50.5410704Z mov.b32 %r2378, %r2377; 2026-02-21T10:18:50.5410866Z mov.b32 %r2382, %r2380; 2026-02-21T10:18:50.5411024Z mov.b32 %r2385, %r2375; 2026-02-21T10:18:50.5411180Z mov.b32 %r2386, %r2377; 2026-02-21T10:18:50.5411344Z mov.b32 %r2388, %r2387; 2026-02-21T10:18:50.5411500Z mov.b32 %r2389, %r2387; 2026-02-21T10:18:50.5411661Z mov.b32 %r2390, %r2387; 2026-02-21T10:18:50.5411827Z mov.b32 %r2391, %r2387; 2026-02-21T10:18:50.5411995Z mov.b32 %r2392, %r2387; 
2026-02-21T10:18:50.5412160Z mov.b32 %r2393, %r2387; 2026-02-21T10:18:50.5412318Z mov.b32 %r2394, %r2387; 2026-02-21T10:18:50.5412479Z mov.b32 %r2395, %r2387; 2026-02-21T10:18:50.5412637Z mov.b32 %r2396, %r2387; 2026-02-21T10:18:50.5412797Z mov.b32 %r2397, %r2387; 2026-02-21T10:18:50.5412959Z mov.b32 %r2398, %r2387; 2026-02-21T10:18:50.5413122Z mov.b32 %r2399, %r2387; 2026-02-21T10:18:50.5413281Z mov.b32 %r2400, %r2387; 2026-02-21T10:18:50.5413443Z mov.b32 %r2401, %r2387; 2026-02-21T10:18:50.5413608Z mov.b32 %r2402, %r2387; 2026-02-21T10:18:50.5413764Z mov.b32 %r2403, %r2387; 2026-02-21T10:18:50.5413926Z mov.b32 %r2404, %r2387; 2026-02-21T10:18:50.5414084Z mov.b32 %r2405, %r2387; 2026-02-21T10:18:50.5414254Z mov.b32 %r2406, %r2387; 2026-02-21T10:18:50.5414426Z mov.b32 %r2407, %r2387; 2026-02-21T10:18:50.5414599Z mov.b32 %r2408, %r2387; 2026-02-21T10:18:50.5414761Z mov.b32 %r2409, %r2387; 2026-02-21T10:18:50.5414931Z mov.b32 %r2410, %r2387; 2026-02-21T10:18:50.5415097Z mov.b32 %r2411, %r2387; 2026-02-21T10:18:50.5415272Z mov.b32 %r2412, %r2387; 2026-02-21T10:18:50.5415443Z mov.b32 %r2413, %r2387; 2026-02-21T10:18:50.5415694Z mov.b32 %r2414, %r2387; 2026-02-21T10:18:50.5415925Z mov.b32 %r2415, %r2387; 2026-02-21T10:18:50.5416088Z mov.b32 %r2416, %r2387; 2026-02-21T10:18:50.5416259Z mov.b32 %r2417, %r2387; 2026-02-21T10:18:50.5416417Z mov.b32 %r2418, %r2387; 2026-02-21T10:18:50.5416703Z mov.b32 %r2419, %r2387; 2026-02-21T10:18:50.5416867Z mov.b32 %r2420, %r2387; 2026-02-21T10:18:50.5417033Z mov.b32 %r2421, %r2387; 2026-02-21T10:18:50.5417197Z mov.b32 %r2422, %r2387; 2026-02-21T10:18:50.5417364Z mov.b32 %r2423, %r2387; 2026-02-21T10:18:50.5417528Z mov.b32 %r2424, %r2387; 2026-02-21T10:18:50.5417689Z mov.b32 %r2425, %r2387; 2026-02-21T10:18:50.5417857Z mov.b32 %r2426, %r2387; 2026-02-21T10:18:50.5418015Z mov.b32 %r2427, %r2387; 2026-02-21T10:18:50.5418180Z mov.b32 %r2428, %r2387; 2026-02-21T10:18:50.5418357Z mov.b32 %r2429, %r2387; 2026-02-21T10:18:50.5418526Z mov.b32 
%r2430, %r2387; 2026-02-21T10:18:50.5418690Z mov.b32 %r2431, %r2387; 2026-02-21T10:18:50.5418858Z mov.b32 %r2432, %r2387; 2026-02-21T10:18:50.5419017Z mov.b32 %r2433, %r2387; 2026-02-21T10:18:50.5419187Z mov.b32 %r2434, %r2387; 2026-02-21T10:18:50.5419433Z mov.b32 %r2435, %r2387; 2026-02-21T10:18:50.5419596Z mov.b32 %r2436, %r2387; 2026-02-21T10:18:50.5419762Z mov.b32 %r2437, %r2387; 2026-02-21T10:18:50.5419927Z mov.b32 %r2438, %r2387; 2026-02-21T10:18:50.5420111Z mov.b32 %r2439, %r2387; 2026-02-21T10:18:50.5420275Z mov.b32 %r2440, %r2387; 2026-02-21T10:18:50.5420508Z mov.b32 %r2441, %r2387; 2026-02-21T10:18:50.5420673Z mov.b32 %r2442, %r2387; 2026-02-21T10:18:50.5420841Z mov.b32 %r2443, %r2387; 2026-02-21T10:18:50.5421004Z mov.b32 %r2444, %r2387; 2026-02-21T10:18:50.5421185Z mov.b32 %r2445, %r2387; 2026-02-21T10:18:50.5421353Z mov.b32 %r2446, %r2387; 2026-02-21T10:18:50.5421511Z mov.b32 %r2447, %r2387; 2026-02-21T10:18:50.5421680Z mov.b32 %r2448, %r2387; 2026-02-21T10:18:50.5421843Z mov.b32 %r2449, %r2387; 2026-02-21T10:18:50.5422014Z mov.b32 %r2450, %r2387; 2026-02-21T10:18:50.5422178Z mov.b32 %r2451, %r2387; 2026-02-21T10:18:50.5422346Z mov.b32 %r2452, %r2387; 2026-02-21T10:18:50.5422508Z mov.b32 %r2453, %r2387; 2026-02-21T10:18:50.5422675Z mov.b32 %r2454, %r2387; 2026-02-21T10:18:50.5422836Z mov.b32 %r2455, %r2387; 2026-02-21T10:18:50.5423003Z mov.b32 %r2456, %r2387; 2026-02-21T10:18:50.5423167Z mov.b32 %r2457, %r2387; 2026-02-21T10:18:50.5423330Z mov.b32 %r2458, %r2387; 2026-02-21T10:18:50.5423499Z mov.b32 %r2459, %r2387; 2026-02-21T10:18:50.5423662Z mov.b32 %r2460, %r2387; 2026-02-21T10:18:50.5423827Z mov.b32 %r2461, %r2387; 2026-02-21T10:18:50.5424002Z mov.b32 %r2462, %r2387; 2026-02-21T10:18:50.5424176Z mov.b32 %r2463, %r2387; 2026-02-21T10:18:50.5424337Z mov.b32 %r2464, %r2387; 2026-02-21T10:18:50.5424503Z mov.b32 %r2465, %r2387; 2026-02-21T10:18:50.5424664Z mov.b32 %r2466, %r2387; 2026-02-21T10:18:50.5424835Z mov.b32 %r2467, %r2387; 
2026-02-21T10:18:50.5425000Z mov.b32 %r2468, %r2387; 2026-02-21T10:18:50.5425165Z mov.b32 %r2469, %r2387; 2026-02-21T10:18:50.5425333Z mov.b32 %r2470, %r2387; 2026-02-21T10:18:50.5425496Z mov.b32 %r2471, %r2387; 2026-02-21T10:18:50.5425661Z mov.b32 %r2472, %r2387; 2026-02-21T10:18:50.5425823Z mov.b32 %r2473, %r2387; 2026-02-21T10:18:50.5425986Z mov.b32 %r2474, %r2387; 2026-02-21T10:18:50.5426146Z mov.b32 %r2475, %r2387; 2026-02-21T10:18:50.5426321Z mov.b32 %r2476, %r2387; 2026-02-21T10:18:50.5426601Z mov.b32 %r2477, %r2387; 2026-02-21T10:18:50.5426782Z mov.b32 %r2478, %r2387; 2026-02-21T10:18:50.5426948Z mov.b32 %r2479, %r2387; 2026-02-21T10:18:50.5427111Z mov.b32 %r2480, %r2387; 2026-02-21T10:18:50.5427273Z mov.b32 %r2481, %r2387; 2026-02-21T10:18:50.5427429Z mov.b32 %r2482, %r2387; 2026-02-21T10:18:50.5427591Z mov.b32 %r2483, %r2387; 2026-02-21T10:18:50.5427753Z mov.b32 %r2484, %r2387; 2026-02-21T10:18:50.5427922Z mov.b32 %r2485, %r2387; 2026-02-21T10:18:50.5428087Z mov.b32 %r2486, %r2387; 2026-02-21T10:18:50.5428253Z mov.b32 %r2487, %r2387; 2026-02-21T10:18:50.5428576Z mov.b32 %r2488, %r2387; 2026-02-21T10:18:50.5428743Z mov.b32 %r2489, %r2387; 2026-02-21T10:18:50.5428979Z mov.b32 %r2490, %r2387; 2026-02-21T10:18:50.5429146Z mov.b32 %r2491, %r2387; 2026-02-21T10:18:50.5429312Z mov.b32 %r2492, %r2387; 2026-02-21T10:18:50.5429483Z mov.b32 %r2493, %r2387; 2026-02-21T10:18:50.5429655Z mov.b32 %r2494, %r2387; 2026-02-21T10:18:50.5429826Z mov.b32 %r2495, %r2387; 2026-02-21T10:18:50.5429989Z mov.b32 %r2496, %r2387; 2026-02-21T10:18:50.5430152Z mov.b32 %r2497, %r2387; 2026-02-21T10:18:50.5430318Z mov.b32 %r2498, %r2387; 2026-02-21T10:18:50.5430477Z mov.b32 %r2499, %r2387; 2026-02-21T10:18:50.5430643Z mov.b32 %r2500, %r2387; 2026-02-21T10:18:50.5430805Z mov.b32 %r2501, %r2387; 2026-02-21T10:18:50.5430971Z mov.b32 %r2502, %r2387; 2026-02-21T10:18:50.5431137Z mov.b32 %r2503, %r2387; 2026-02-21T10:18:50.5431297Z mov.b32 %r2504, %r2387; 2026-02-21T10:18:50.5431464Z mov.b32 
%r2505, %r2387; 2026-02-21T10:18:50.5431626Z mov.b32 %r2506, %r2387; 2026-02-21T10:18:50.5431792Z mov.b32 %r2507, %r2387; 2026-02-21T10:18:50.5431956Z mov.b32 %r2508, %r2387; 2026-02-21T10:18:50.5432125Z mov.b32 %r2509, %r2387; 2026-02-21T10:18:50.5432361Z mov.b32 %r2510, %r2387; 2026-02-21T10:18:50.5432537Z mov.b32 %r2511, %r2387; 2026-02-21T10:18:50.5432713Z mov.b32 %r2512, %r2387; 2026-02-21T10:18:50.5432887Z mov.b32 %r2513, %r2387; 2026-02-21T10:18:50.5433056Z mov.b32 %r2514, %r2387; 2026-02-21T10:18:50.5433299Z mov.b32 %r2516, %r2384; 2026-02-21T10:18:50.5433470Z mov.b32 %r2517, %r2380; 2026-02-21T10:18:50.5433630Z mov.b32 %r2518, %r2386; 2026-02-21T10:18:50.5433795Z mov.b32 %r2519, %r2385; 2026-02-21T10:18:50.5433956Z bra.uni $L__BB0_2; 2026-02-21T10:18:50.5434176Z $L__BB0_6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:18:50.5434602Z .loc 1 26 144 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:26:144 2026-02-21T10:18:50.5434971Z add.s32 %r2517, %r2517, 1; 2026-02-21T10:18:50.5435163Z setp.ne.b32 %p60, %r68, %r2517; 2026-02-21T10:18:50.5435353Z mov.b32 %r2375, %r2385; 2026-02-21T10:18:50.5435524Z mov.b32 %r2376, %r77; 2026-02-21T10:18:50.5435685Z mov.b32 %r2377, %r2386; 2026-02-21T10:18:50.5435850Z mov.b32 %r2378, %r79; 2026-02-21T10:18:50.5436008Z mov.b32 %r2379, %r2516; 2026-02-21T10:18:50.5436173Z mov.b32 %r2380, %r81; 2026-02-21T10:18:50.5436328Z mov.b32 %r2385, %r2519; 2026-02-21T10:18:50.5436624Z mov.b32 %r2386, %r2518; 2026-02-21T10:18:50.5436803Z mov.b32 %r2516, %r220; 2026-02-21T10:18:50.5436978Z @%p60 bra $L__BB0_2; 2026-02-21T10:18:50.5437139Z bra.uni $L__BB0_7; 2026-02-21T10:18:50.5437351Z $L__BB0_2: // =>This Inner Loop Header: Depth=1 2026-02-21T10:18:50.5437763Z .loc 1 0 144 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:0:144 2026-02-21T10:18:50.5438116Z mov.b32 %r81, %r2379; 2026-02-21T10:18:50.5438282Z mov.b32 %r79, %r2377; 2026-02-21T10:18:50.5438437Z mov.b32 %r77, %r2375; 2026-02-21T10:18:50.5438741Z .loc 1 26 144 
// cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:26:144 2026-02-21T10:18:50.5439108Z add.s32 %r679, %r2516, 1; 2026-02-21T10:18:50.5439290Z setp.eq.b32 %p39, %r2516, 255; 2026-02-21T10:18:50.5439489Z selp.b32 %r220, 0, %r679, %p39; 2026-02-21T10:18:50.5439694Z setp.ne.b32 %p40, %r220, 0; 2026-02-21T10:18:50.5439883Z @%p40 bra $L__BB0_4; 2026-02-21T10:18:50.5440096Z // %bb.3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:18:50.5440365Z add.s32 %r2520, %r2520, 8448; 2026-02-21T10:18:50.5440687Z .loc 1 32 35 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:32:35 2026-02-21T10:18:50.5441052Z mul.hi.s32 %r680, %r2520, 1717986919; 2026-02-21T10:18:50.5441259Z shr.u32 %r681, %r680, 31; 2026-02-21T10:18:50.5441432Z shr.s32 %r682, %r680, 5; 2026-02-21T10:18:50.5441626Z add.s32 %r683, %r682, %r681; 2026-02-21T10:18:50.5441941Z .loc 1 33 33 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:33:33 2026-02-21T10:18:50.5442432Z shl.b32 %r684, %r683, 3; 2026-02-21T10:18:50.5442740Z .loc 1 34 39 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:34:39 2026-02-21T10:18:50.5443091Z sub.s32 %r685, 512, %r684; 2026-02-21T10:18:50.5443407Z .loc 1 34 52 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:34:52 2026-02-21T10:18:50.5443754Z min.s32 %r686, %r685, 8; 2026-02-21T10:18:50.5444061Z .loc 1 35 45 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:35:45 2026-02-21T10:18:50.5444406Z mul.lo.s32 %r687, %r683, 80; 2026-02-21T10:18:50.5444598Z sub.s32 %r688, %r2520, %r687; 2026-02-21T10:18:50.5444912Z .loc 1 36 51 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:36:51 2026-02-21T10:18:50.5445267Z div.s32 %r689, %r688, %r686; 2026-02-21T10:18:50.5445577Z .loc 1 35 64 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:35:64 2026-02-21T10:18:50.5445932Z mul.lo.s32 %r690, %r689, %r686; 2026-02-21T10:18:50.5446215Z sub.s32 %r691, %r688, %r690; 2026-02-21T10:18:50.5446650Z .loc 1 35 30 // 
cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:35:30 2026-02-21T10:18:50.5447008Z add.s32 %r692, %r691, %r684; 2026-02-21T10:18:50.5447392Z .loc 1 37 27 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:37:27 2026-02-21T10:18:50.5447756Z shl.b32 %r2518, %r692, 7; 2026-02-21T10:18:50.5448065Z .loc 1 38 32 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:38:32 2026-02-21T10:18:50.5448405Z or.b32 %r2521, %r2518, %r5; 2026-02-21T10:18:50.5448588Z or.b32 %r2522, %r2518, %r6; 2026-02-21T10:18:50.5448759Z or.b32 %r2523, %r2518, %r7; 2026-02-21T10:18:50.5448934Z or.b32 %r2524, %r2518, %r8; 2026-02-21T10:18:50.5449117Z or.b32 %r2525, %r2518, %r9; 2026-02-21T10:18:50.5449298Z or.b32 %r2526, %r2518, %r10; 2026-02-21T10:18:50.5449468Z or.b32 %r2527, %r2518, %r11; 2026-02-21T10:18:50.5449653Z or.b32 %r2528, %r2518, %r12; 2026-02-21T10:18:50.5449966Z .loc 1 39 27 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:39:27 2026-02-21T10:18:50.5450310Z shl.b32 %r2519, %r689, 7; 2026-02-21T10:18:50.5450536Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:18:50.5450954Z .loc 1 26 144 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:26:144 2026-02-21T10:18:50.5451324Z setp.eq.b32 %p51, %r220, 0; 2026-02-21T10:18:50.5451518Z setp.lt.s32 %p52, %r2517, %r47; 2026-02-21T10:18:50.5451714Z add.s32 %r2034, %r2383, 1; 2026-02-21T10:18:50.5451896Z setp.gt.s32 %p55, %r2034, 2; 2026-02-21T10:18:50.5452092Z selp.b32 %r2383, 0, %r2034, %p55; 2026-02-21T10:18:50.5452293Z selp.b32 %r2035, 1, 0, %p55; 2026-02-21T10:18:50.5452473Z xor.b32 %r2382, %r2382, %r2035; 2026-02-21T10:18:50.5452810Z .loc 1 54 80 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:54:80 2026-02-21T10:18:50.5453168Z cp.async.wait_group 2; 2026-02-21T10:18:50.5453354Z bar.sync 0; 2026-02-21T10:18:50.5453504Z shl.b32 %r2036, %r2383, 13; 2026-02-21T10:18:50.5453691Z add.s32 %r2038, %r586, %r2036; 2026-02-21T10:18:50.5454010Z .loc 1 58 32 // 
cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:58:32 2026-02-21T10:18:50.5454373Z add.s32 %r2039, %r2038, %r48; 2026-02-21T10:18:50.5454566Z ld.shared.b16 %rs1, [%r2039]; 2026-02-21T10:18:50.5454755Z ld.shared.b16 %rs2, [%r2039+512]; 2026-02-21T10:18:50.5454965Z ld.shared.b16 %rs3, [%r2039+32]; 2026-02-21T10:18:50.5455162Z ld.shared.b16 %rs4, [%r2039+544]; 2026-02-21T10:18:50.5455383Z ld.shared.b16 %rs5, [%r2039+4096]; 2026-02-21T10:18:50.5455584Z ld.shared.b16 %rs6, [%r2039+4608]; 2026-02-21T10:18:50.5455788Z ld.shared.b16 %rs7, [%r2039+4128]; 2026-02-21T10:18:50.5456074Z ld.shared.b16 %rs8, [%r2039+4640]; 2026-02-21T10:18:50.5456281Z add.s32 %r2040, %r2038, %r49; 2026-02-21T10:18:50.5456665Z ld.shared.b16 %rs9, [%r2040]; 2026-02-21T10:18:50.5456855Z ld.shared.b16 %rs10, [%r2040+512]; 2026-02-21T10:18:50.5457069Z ld.shared.b16 %rs11, [%r2040+32]; 2026-02-21T10:18:50.5457265Z ld.shared.b16 %rs12, [%r2040+544]; 2026-02-21T10:18:50.5457471Z ld.shared.b16 %rs13, [%r2040+4096]; 2026-02-21T10:18:50.5457679Z ld.shared.b16 %rs14, [%r2040+4608]; 2026-02-21T10:18:50.5457885Z ld.shared.b16 %rs15, [%r2040+4128]; 2026-02-21T10:18:50.5458086Z ld.shared.b16 %rs16, [%r2040+4640]; 2026-02-21T10:18:50.5458285Z add.s32 %r2041, %r2038, %r50; 2026-02-21T10:18:50.5458475Z ld.shared.b16 %rs17, [%r2041]; 2026-02-21T10:18:50.5458666Z ld.shared.b16 %rs18, [%r2041+512]; 2026-02-21T10:18:50.5458869Z ld.shared.b16 %rs19, [%r2041+32]; 2026-02-21T10:18:50.5459062Z ld.shared.b16 %rs20, [%r2041+544]; 2026-02-21T10:18:50.5459265Z ld.shared.b16 %rs21, [%r2041+4096]; 2026-02-21T10:18:50.5459468Z ld.shared.b16 %rs22, [%r2041+4608]; 2026-02-21T10:18:50.5459669Z ld.shared.b16 %rs23, [%r2041+4128]; 2026-02-21T10:18:50.5459868Z ld.shared.b16 %rs24, [%r2041+4640]; 2026-02-21T10:18:50.5460150Z add.s32 %r2042, %r2038, %r51; 2026-02-21T10:18:50.5460348Z ld.shared.b16 %rs25, [%r2042]; 2026-02-21T10:18:50.5460534Z ld.shared.b16 %rs26, [%r2042+512]; 2026-02-21T10:18:50.5460736Z ld.shared.b16 %rs27, 
[%r2042+32]; 2026-02-21T10:18:50.5460990Z ld.shared.b16 %rs28, [%r2042+544]; 2026-02-21T10:18:50.5461209Z ld.shared.b16 %rs29, [%r2042+4096]; 2026-02-21T10:18:50.5461416Z ld.shared.b16 %rs30, [%r2042+4608]; 2026-02-21T10:18:50.5461614Z ld.shared.b16 %rs31, [%r2042+4128]; 2026-02-21T10:18:50.5461812Z ld.shared.b16 %rs32, [%r2042+4640]; 2026-02-21T10:18:50.5462013Z cvt.f32.bf16 %r823, %rs1; 2026-02-21T10:18:50.5462189Z cvt.f32.bf16 %r824, %rs2; 2026-02-21T10:18:50.5462364Z cvt.f32.bf16 %r825, %rs9; 2026-02-21T10:18:50.5462543Z cvt.f32.bf16 %r826, %rs10; 2026-02-21T10:18:50.5462725Z cvt.f32.bf16 %r955, %rs17; 2026-02-21T10:18:50.5462905Z cvt.f32.bf16 %r956, %rs18; 2026-02-21T10:18:50.5463084Z cvt.f32.bf16 %r957, %rs25; 2026-02-21T10:18:50.5463263Z cvt.f32.bf16 %r958, %rs26; 2026-02-21T10:18:50.5463435Z cvt.f32.bf16 %r1087, %rs3; 2026-02-21T10:18:50.5463630Z cvt.f32.bf16 %r1088, %rs4; 2026-02-21T10:18:50.5463806Z cvt.f32.bf16 %r1089, %rs11; 2026-02-21T10:18:50.5463998Z cvt.f32.bf16 %r1090, %rs12; 2026-02-21T10:18:50.5464183Z cvt.f32.bf16 %r1219, %rs19; 2026-02-21T10:18:50.5464359Z cvt.f32.bf16 %r1220, %rs20; 2026-02-21T10:18:50.5464540Z cvt.f32.bf16 %r1221, %rs27; 2026-02-21T10:18:50.5464721Z cvt.f32.bf16 %r1222, %rs28; 2026-02-21T10:18:50.5464902Z cvt.f32.bf16 %r1351, %rs5; 2026-02-21T10:18:50.5465073Z cvt.f32.bf16 %r1352, %rs6; 2026-02-21T10:18:50.5465250Z cvt.f32.bf16 %r1353, %rs13; 2026-02-21T10:18:50.5465422Z cvt.f32.bf16 %r1354, %rs14; 2026-02-21T10:18:50.5465602Z cvt.f32.bf16 %r1483, %rs21; 2026-02-21T10:18:50.5465789Z cvt.f32.bf16 %r1484, %rs22; 2026-02-21T10:18:50.5465973Z cvt.f32.bf16 %r1485, %rs29; 2026-02-21T10:18:50.5466155Z cvt.f32.bf16 %r1486, %rs30; 2026-02-21T10:18:50.5466330Z cvt.f32.bf16 %r1615, %rs7; 2026-02-21T10:18:50.5466651Z cvt.f32.bf16 %r1616, %rs8; 2026-02-21T10:18:50.5466834Z cvt.f32.bf16 %r1617, %rs15; 2026-02-21T10:18:50.5467020Z cvt.f32.bf16 %r1618, %rs16; 2026-02-21T10:18:50.5467204Z cvt.f32.bf16 %r1747, %rs23; 
2026-02-21T10:18:50.5467388Z cvt.f32.bf16 %r1748, %rs24; 2026-02-21T10:18:50.5467565Z cvt.f32.bf16 %r1749, %rs31; 2026-02-21T10:18:50.5467746Z cvt.f32.bf16 %r1750, %rs32; 2026-02-21T10:18:50.5468073Z .loc 1 26 144 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:26:144 2026-02-21T10:18:50.5468493Z shl.b32 %r2043, %r2383, 3; 2026-02-21T10:18:50.5468686Z add.s32 %r693, %r512, %r2043; 2026-02-21T10:18:50.5468866Z // begin inline asm 2026-02-21T10:18:50.5469025Z 2026-02-21T10:18:50.5469146Z { 2026-02-21T10:18:50.5469282Z .reg .pred complete; 2026-02-21T10:18:50.5469533Z waitLoop: 2026-02-21T10:18:50.5469768Z mbarrier.try_wait.parity.shared.b64 complete, [%r693], %r2382; 2026-02-21T10:18:50.5470123Z @!complete bra.uni waitLoop; 2026-02-21T10:18:50.5470303Z } 2026-02-21T10:18:50.5470376Z 2026-02-21T10:18:50.5470441Z // end inline asm 2026-02-21T10:18:50.5470736Z .loc 1 60 33 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:60:33 2026-02-21T10:18:50.5471090Z shl.b32 %r2045, %r2383, 11; 2026-02-21T10:18:50.5471281Z add.s32 %r2047, %r532, %r2045; 2026-02-21T10:18:50.5471611Z .loc 1 78 58 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:78:58 2026-02-21T10:18:50.5471960Z add.s32 %r2048, %r2047, %r45; 2026-02-21T10:18:50.5472143Z xor.b32 %r2049, %r45, 16; 2026-02-21T10:18:50.5472320Z add.s32 %r2050, %r2047, %r2049; 2026-02-21T10:18:50.5472510Z xor.b32 %r2051, %r45, 32; 2026-02-21T10:18:50.5472686Z add.s32 %r2052, %r2047, %r2051; 2026-02-21T10:18:50.5472867Z xor.b32 %r2053, %r45, 48; 2026-02-21T10:18:50.5473042Z add.s32 %r2054, %r2047, %r2053; 2026-02-21T10:18:50.5473228Z xor.b32 %r2055, %r45, 64; 2026-02-21T10:18:50.5473403Z add.s32 %r2056, %r2047, %r2055; 2026-02-21T10:18:50.5473671Z xor.b32 %r2057, %r45, 80; 2026-02-21T10:18:50.5473855Z add.s32 %r2058, %r2047, %r2057; 2026-02-21T10:18:50.5474034Z xor.b32 %r2059, %r45, 96; 2026-02-21T10:18:50.5474209Z add.s32 %r2060, %r2047, %r2059; 2026-02-21T10:18:50.5474396Z xor.b32 %r2061, %r45, 112; 
2026-02-21T10:18:50.5474649Z add.s32 %r2062, %r2047, %r2061; 2026-02-21T10:18:50.5474980Z .loc 1 65 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:65:25 2026-02-21T10:18:50.5475332Z ld.shared.s8 %rs33, [%r2048]; 2026-02-21T10:18:50.5475659Z .loc 1 63 28 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:63:28 2026-02-21T10:18:50.5476010Z shl.b16 %rs34, %rs33, 4; 2026-02-21T10:18:50.5476326Z .loc 1 65 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:65:25 2026-02-21T10:18:50.5476814Z ld.shared.s8 %rs35, [%r2050+128]; 2026-02-21T10:18:50.5477145Z .loc 1 63 28 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:63:28 2026-02-21T10:18:50.5477494Z shl.b16 %rs36, %rs35, 4; 2026-02-21T10:18:50.5477797Z .loc 1 65 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:65:25 2026-02-21T10:18:50.5478152Z ld.shared.s8 %rs37, [%r2052+256]; 2026-02-21T10:18:50.5478481Z .loc 1 63 28 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:63:28 2026-02-21T10:18:50.5478845Z shl.b16 %rs38, %rs37, 4; 2026-02-21T10:18:50.5479161Z .loc 1 65 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:65:25 2026-02-21T10:18:50.5479512Z ld.shared.s8 %rs39, [%r2054+384]; 2026-02-21T10:18:50.5479847Z .loc 1 63 28 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:63:28 2026-02-21T10:18:50.5480196Z shl.b16 %rs40, %rs39, 4; 2026-02-21T10:18:50.5480504Z .loc 1 65 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:65:25 2026-02-21T10:18:50.5480856Z ld.shared.s8 %rs41, [%r2056+512]; 2026-02-21T10:18:50.5481187Z .loc 1 63 28 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:63:28 2026-02-21T10:18:50.5481540Z shl.b16 %rs42, %rs41, 4; 2026-02-21T10:18:50.5481849Z .loc 1 65 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:65:25 2026-02-21T10:18:50.5482217Z ld.shared.s8 %rs43, [%r2058+640]; 2026-02-21T10:18:50.5482542Z .loc 1 63 28 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:63:28 
2026-02-21T10:18:50.5482893Z shl.b16 %rs44, %rs43, 4; 2026-02-21T10:18:50.5483194Z .loc 1 65 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:65:25 2026-02-21T10:18:50.5483548Z ld.shared.s8 %rs45, [%r2060+768]; 2026-02-21T10:18:50.5483967Z .loc 1 63 28 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:63:28 2026-02-21T10:18:50.5484381Z shl.b16 %rs46, %rs45, 4; 2026-02-21T10:18:50.5484690Z .loc 1 65 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:65:25 2026-02-21T10:18:50.5485037Z ld.shared.s8 %rs47, [%r2062+896]; 2026-02-21T10:18:50.5485366Z .loc 1 63 28 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:63:28 2026-02-21T10:18:50.5485716Z shl.b16 %rs48, %rs47, 4; 2026-02-21T10:18:50.5486037Z .loc 1 65 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:65:25 2026-02-21T10:18:50.5486396Z ld.shared.s8 %rs49, [%r2048+1024]; 2026-02-21T10:18:50.5486853Z .loc 1 63 28 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:63:28 2026-02-21T10:18:50.5487205Z shl.b16 %rs50, %rs49, 4; 2026-02-21T10:18:50.5487506Z .loc 1 65 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:65:25 2026-02-21T10:18:50.5487866Z ld.shared.s8 %rs51, [%r2050+1152]; 2026-02-21T10:18:50.5488300Z .loc 1 63 28 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:63:28 2026-02-21T10:18:50.5488665Z shl.b16 %rs52, %rs51, 4; 2026-02-21T10:18:50.5488987Z .loc 1 65 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:65:25 2026-02-21T10:18:50.5489404Z ld.shared.s8 %rs53, [%r2052+1280]; 2026-02-21T10:18:50.5489751Z .loc 1 63 28 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:63:28 2026-02-21T10:18:50.5490094Z shl.b16 %rs54, %rs53, 4; 2026-02-21T10:18:50.5490404Z .loc 1 65 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:65:25 2026-02-21T10:18:50.5490760Z ld.shared.s8 %rs55, [%r2054+1408]; 2026-02-21T10:18:50.5491097Z .loc 1 63 28 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:63:28 
2026-02-21T10:18:50.5491455Z shl.b16 %rs56, %rs55, 4; 2026-02-21T10:18:50.5491758Z .loc 1 65 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:65:25 2026-02-21T10:18:50.5492116Z ld.shared.s8 %rs57, [%r2056+1536]; 2026-02-21T10:18:50.5492447Z .loc 1 63 28 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:63:28 2026-02-21T10:18:50.5492799Z shl.b16 %rs58, %rs57, 4; 2026-02-21T10:18:50.5493112Z .loc 1 65 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:65:25 2026-02-21T10:18:50.5493464Z ld.shared.s8 %rs59, [%r2058+1664]; 2026-02-21T10:18:50.5493795Z .loc 1 63 28 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:63:28 2026-02-21T10:18:50.5494141Z shl.b16 %rs60, %rs59, 4; 2026-02-21T10:18:50.5494453Z .loc 1 65 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:65:25 2026-02-21T10:18:50.5494799Z ld.shared.s8 %rs61, [%r2060+1792]; 2026-02-21T10:18:50.5495135Z .loc 1 63 28 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:63:28 2026-02-21T10:18:50.5495488Z shl.b16 %rs62, %rs61, 4; 2026-02-21T10:18:50.5495794Z .loc 1 65 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:65:25 2026-02-21T10:18:50.5496155Z ld.shared.s8 %rs63, [%r2062+1920]; 2026-02-21T10:18:50.5496624Z .loc 1 63 28 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:63:28 2026-02-21T10:18:50.5496981Z shl.b16 %rs64, %rs63, 4; 2026-02-21T10:18:50.5497284Z .loc 1 65 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:65:25 2026-02-21T10:18:50.5497644Z cvt.s16.s8 %rs65, %rs34; 2026-02-21T10:18:50.5497825Z shr.s16 %rs66, %rs65, 4; 2026-02-21T10:18:50.5497995Z cvt.s16.s8 %rs67, %rs36; 2026-02-21T10:18:50.5498171Z shr.s16 %rs68, %rs67, 4; 2026-02-21T10:18:50.5498352Z shr.s16 %rs69, %rs33, 4; 2026-02-21T10:18:50.5498527Z shr.s16 %rs70, %rs35, 4; 2026-02-21T10:18:50.5498922Z .loc 1 83 32 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:83:32 2026-02-21T10:18:50.5499347Z cvt.rn.f32.s16 %r2063, %rs70; 
2026-02-21T10:18:50.5499533Z cvt.rn.f32.s16 %r2064, %rs69; 2026-02-21T10:18:50.5499723Z cvt.rn.f32.s16 %r2065, %rs68; 2026-02-21T10:18:50.5499906Z cvt.rn.f32.s16 %r2066, %rs66; 2026-02-21T10:18:50.5500216Z .loc 1 65 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:65:25 2026-02-21T10:18:50.5500570Z cvt.s16.s8 %rs71, %rs38; 2026-02-21T10:18:50.5500739Z shr.s16 %rs72, %rs71, 4; 2026-02-21T10:18:50.5500927Z cvt.s16.s8 %rs73, %rs40; 2026-02-21T10:18:50.5501096Z shr.s16 %rs74, %rs73, 4; 2026-02-21T10:18:50.5501267Z shr.s16 %rs75, %rs37, 4; 2026-02-21T10:18:50.5501433Z shr.s16 %rs76, %rs39, 4; 2026-02-21T10:18:50.5501735Z .loc 1 83 32 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:83:32 2026-02-21T10:18:50.5502089Z cvt.rn.f32.s16 %r2067, %rs76; 2026-02-21T10:18:50.5502270Z cvt.rn.f32.s16 %r2068, %rs75; 2026-02-21T10:18:50.5502454Z cvt.rn.f32.s16 %r2069, %rs74; 2026-02-21T10:18:50.5502712Z cvt.rn.f32.s16 %r2070, %rs72; 2026-02-21T10:18:50.5503039Z .loc 1 65 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:65:25 2026-02-21T10:18:50.5503383Z cvt.s16.s8 %rs77, %rs42; 2026-02-21T10:18:50.5503559Z shr.s16 %rs78, %rs77, 4; 2026-02-21T10:18:50.5503794Z cvt.s16.s8 %rs79, %rs44; 2026-02-21T10:18:50.5503970Z shr.s16 %rs80, %rs79, 4; 2026-02-21T10:18:50.5504157Z shr.s16 %rs81, %rs41, 4; 2026-02-21T10:18:50.5504329Z shr.s16 %rs82, %rs43, 4; 2026-02-21T10:18:50.5504637Z .loc 1 83 32 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:83:32 2026-02-21T10:18:50.5504988Z cvt.rn.f32.s16 %r2071, %rs82; 2026-02-21T10:18:50.5505174Z cvt.rn.f32.s16 %r2072, %rs81; 2026-02-21T10:18:50.5505353Z cvt.rn.f32.s16 %r2073, %rs80; 2026-02-21T10:18:50.5505541Z cvt.rn.f32.s16 %r2074, %rs78; 2026-02-21T10:18:50.5505850Z .loc 1 65 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:65:25 2026-02-21T10:18:50.5506217Z cvt.s16.s8 %rs83, %rs46; 2026-02-21T10:18:50.5506399Z shr.s16 %rs84, %rs83, 4; 2026-02-21T10:18:50.5506696Z cvt.s16.s8 %rs85, %rs48; 
2026-02-21T10:18:50.5506873Z shr.s16 %rs86, %rs85, 4; 2026-02-21T10:18:50.5507055Z shr.s16 %rs87, %rs45, 4; 2026-02-21T10:18:50.5507231Z shr.s16 %rs88, %rs47, 4; 2026-02-21T10:18:50.5507534Z .loc 1 83 32 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:83:32 2026-02-21T10:18:50.5507887Z cvt.rn.f32.s16 %r2075, %rs88; 2026-02-21T10:18:50.5512249Z cvt.rn.f32.s16 %r2076, %rs87; 2026-02-21T10:18:50.5512526Z cvt.rn.f32.s16 %r2077, %rs86; 2026-02-21T10:18:50.5512726Z cvt.rn.f32.s16 %r2078, %rs84; 2026-02-21T10:18:50.5513070Z .loc 1 65 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:65:25 2026-02-21T10:18:50.5513455Z cvt.s16.s8 %rs89, %rs50; 2026-02-21T10:18:50.5513642Z shr.s16 %rs90, %rs89, 4; 2026-02-21T10:18:50.5513814Z cvt.s16.s8 %rs91, %rs52; 2026-02-21T10:18:50.5513991Z shr.s16 %rs92, %rs91, 4; 2026-02-21T10:18:50.5514159Z shr.s16 %rs93, %rs49, 4; 2026-02-21T10:18:50.5514343Z shr.s16 %rs94, %rs51, 4; 2026-02-21T10:18:50.5514676Z .loc 1 83 32 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:83:32 2026-02-21T10:18:50.5515053Z cvt.rn.f32.s16 %r2079, %rs94; 2026-02-21T10:18:50.5515266Z cvt.rn.f32.s16 %r2080, %rs93; 2026-02-21T10:18:50.5515460Z cvt.rn.f32.s16 %r2081, %rs92; 2026-02-21T10:18:50.5515643Z cvt.rn.f32.s16 %r2082, %rs90; 2026-02-21T10:18:50.5515983Z .loc 1 65 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:65:25 2026-02-21T10:18:50.5516357Z cvt.s16.s8 %rs95, %rs54; 2026-02-21T10:18:50.5516721Z shr.s16 %rs96, %rs95, 4; 2026-02-21T10:18:50.5516903Z cvt.s16.s8 %rs97, %rs56; 2026-02-21T10:18:50.5517236Z shr.s16 %rs98, %rs97, 4; 2026-02-21T10:18:50.5517405Z shr.s16 %rs99, %rs53, 4; 2026-02-21T10:18:50.5517650Z shr.s16 %rs100, %rs55, 4; 2026-02-21T10:18:50.5517990Z .loc 1 83 32 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:83:32 2026-02-21T10:18:50.5518354Z cvt.rn.f32.s16 %r2083, %rs100; 2026-02-21T10:18:50.5518551Z cvt.rn.f32.s16 %r2084, %rs99; 2026-02-21T10:18:50.5518737Z cvt.rn.f32.s16 %r2085, %rs98; 
2026-02-21T10:18:50.5518919Z cvt.rn.f32.s16 %r2086, %rs96; 2026-02-21T10:18:50.5519240Z .loc 1 65 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:65:25 2026-02-21T10:18:50.5519608Z cvt.s16.s8 %rs101, %rs58; 2026-02-21T10:18:50.5519789Z shr.s16 %rs102, %rs101, 4; 2026-02-21T10:18:50.5519984Z cvt.s16.s8 %rs103, %rs60; 2026-02-21T10:18:50.5520154Z shr.s16 %rs104, %rs103, 4; 2026-02-21T10:18:50.5520329Z shr.s16 %rs105, %rs57, 4; 2026-02-21T10:18:50.5520495Z shr.s16 %rs106, %rs59, 4; 2026-02-21T10:18:50.5520810Z .loc 1 83 32 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:83:32 2026-02-21T10:18:50.5521177Z cvt.rn.f32.s16 %r2087, %rs106; 2026-02-21T10:18:50.5521463Z cvt.rn.f32.s16 %r2088, %rs105; 2026-02-21T10:18:50.5521660Z cvt.rn.f32.s16 %r2089, %rs104; 2026-02-21T10:18:50.5521841Z cvt.rn.f32.s16 %r2090, %rs102; 2026-02-21T10:18:50.5522166Z .loc 1 65 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:65:25 2026-02-21T10:18:50.5522574Z cvt.s16.s8 %rs107, %rs62; 2026-02-21T10:18:50.5522767Z shr.s16 %rs108, %rs107, 4; 2026-02-21T10:18:50.5522944Z cvt.s16.s8 %rs109, %rs64; 2026-02-21T10:18:50.5523118Z shr.s16 %rs110, %rs109, 4; 2026-02-21T10:18:50.5523290Z shr.s16 %rs111, %rs61, 4; 2026-02-21T10:18:50.5523467Z shr.s16 %rs112, %rs63, 4; 2026-02-21T10:18:50.5523779Z .loc 1 83 32 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:83:32 2026-02-21T10:18:50.5524125Z cvt.rn.f32.s16 %r2091, %rs112; 2026-02-21T10:18:50.5524326Z cvt.rn.f32.s16 %r2092, %rs111; 2026-02-21T10:18:50.5524517Z cvt.rn.f32.s16 %r2093, %rs110; 2026-02-21T10:18:50.5524706Z cvt.rn.f32.s16 %r2094, %rs108; 2026-02-21T10:18:50.5524939Z st.shared.v4.b32 [%r52], {%r2066, %r2064, %r2065, %r2063}; 2026-02-21T10:18:50.5525247Z st.shared.v4.b32 [%r53], {%r2070, %r2068, %r2069, %r2067}; 2026-02-21T10:18:50.5525535Z st.shared.v4.b32 [%r54], {%r2074, %r2072, %r2073, %r2071}; 2026-02-21T10:18:50.5525821Z st.shared.v4.b32 [%r55], {%r2078, %r2076, %r2077, %r2075}; 
2026-02-21T10:18:50.5526107Z st.shared.v4.b32 [%r56], {%r2082, %r2080, %r2081, %r2079}; 2026-02-21T10:18:50.5526387Z st.shared.v4.b32 [%r57], {%r2086, %r2084, %r2085, %r2083}; 2026-02-21T10:18:50.5526801Z st.shared.v4.b32 [%r58], {%r2090, %r2088, %r2089, %r2087}; 2026-02-21T10:18:50.5527080Z st.shared.v4.b32 [%r59], {%r2094, %r2092, %r2093, %r2091}; 2026-02-21T10:18:50.5527318Z $L__tmp1: 2026-02-21T10:18:50.5527686Z .loc 2 291 36 // standard.py:291:36 @[ cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:90:40 ] 2026-02-21T10:18:50.5528119Z // begin inline asm 2026-02-21T10:18:50.5528309Z fence.proxy.async.shared::cta; 2026-02-21T10:18:50.5528511Z // end inline asm 2026-02-21T10:18:50.5528668Z bar.sync 0; 2026-02-21T10:18:50.5528840Z shfl.sync.idx.b32 %r2095, %r3, 0, 31, -1; 2026-02-21T10:18:50.5529076Z wgmma.fence.sync.aligned; 2026-02-21T10:18:50.5529269Z mov.pred %p41, -1; 2026-02-21T10:18:50.5529435Z // begin inline asm 2026-02-21T10:18:50.5530828Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447,%r2448,%r2449,%r2450}, {%r823,%r824,%r825,%r826}, %rd92, %p41, 1, 1; 2026-02-21T10:18:50.5532389Z // end inline asm 2026-02-21T10:18:50.5532549Z // begin inline asm 2026-02-21T10:18:50.5533905Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447,%r2448,%r2449,%r2450}, {%r955,%r956,%r957,%r958}, %rd93, %p41, 1, 1; 2026-02-21T10:18:50.5533967Z // end inline asm 2026-02-21T10:18:50.5534027Z // begin inline asm 2026-02-21T10:18:50.5535358Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447,%r2448,%r2449,%r2450}, {%r1087,%r1088,%r1089,%r1090}, %rd94, %p41, 1, 1; 2026-02-21T10:18:50.5535429Z // end inline asm 2026-02-21T10:18:50.5535492Z // begin inline asm 2026-02-21T10:18:50.5536940Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447,%r2448,%r2449,%r2450}, {%r1219,%r1220,%r1221,%r1222}, %rd95, %p41, 1, 1; 2026-02-21T10:18:50.5537010Z // end inline asm 2026-02-21T10:18:50.5537073Z // begin inline asm 2026-02-21T10:18:50.5538340Z 
wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r2512,%r2513,%r2514}, {%r1351,%r1352,%r1353,%r1354}, %rd92, %p41, 1, 1; 2026-02-21T10:18:50.5538399Z // end inline asm 2026-02-21T10:18:50.5538471Z // begin inline asm 2026-02-21T10:18:50.5539736Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r2512,%r2513,%r2514}, {%r1483,%r1484,%r1485,%r1486}, %rd93, %p41, 1, 1; 2026-02-21T10:18:50.5539798Z // end inline asm 2026-02-21T10:18:50.5539865Z // begin inline asm 2026-02-21T10:18:50.5541121Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r2512,%r2513,%r2514}, {%r1615,%r1616,%r1617,%r1618}, %rd94, %p41, 1, 1; 2026-02-21T10:18:50.5541261Z // end inline asm 2026-02-21T10:18:50.5541406Z // begin 
inline asm 2026-02-21T10:18:50.5542670Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r2512,%r2513,%r2514}, {%r1747,%r1748,%r1749,%r1750}, %rd95, %p41, 1, 1; 2026-02-21T10:18:50.5542737Z // end inline asm 2026-02-21T10:18:50.5542819Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:50.5542879Z mov.b32 %r1880, 0; 2026-02-21T10:18:50.5542947Z mov.b32 %r1879, %r647; 2026-02-21T10:18:50.5543010Z mov.b32 %r1881, %r1880; 2026-02-21T10:18:50.5543073Z // begin inline asm 2026-02-21T10:18:50.5545262Z // wait for regs: %r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447,%r2448,%r2449,%r2450,%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r2512,%r2513,%r2514,%r1879,%r1880,%r1881 2026-02-21T10:18:50.5545352Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:50.5545413Z // end inline asm 
2026-02-21T10:18:50.5545476Z $L__tmp2: 2026-02-21T10:18:50.5545703Z .loc 1 26 144 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:26:144 2026-02-21T10:18:50.5545770Z add.s32 %r2096, %r2381, 16; 2026-02-21T10:18:50.5545838Z add.s32 %r2097, %r2384, 1; 2026-02-21T10:18:50.5545907Z setp.gt.s32 %p56, %r2097, 2; 2026-02-21T10:18:50.5545976Z selp.b32 %r2384, 0, %r2097, %p56; 2026-02-21T10:18:50.5546043Z selp.b32 %r2381, 0, %r2096, %p51; 2026-02-21T10:18:50.5546255Z .loc 1 51 22 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:51:22 2026-02-21T10:18:50.5546319Z shl.b32 %r2098, %r2381, 1; 2026-02-21T10:18:50.5546639Z .loc 1 53 25 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:53:25 2026-02-21T10:18:50.5546721Z add.s32 %r2099, %r2098, %r34; 2026-02-21T10:18:50.5546921Z .loc 1 54 53 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:54:53 2026-02-21T10:18:50.5546990Z shl.b32 %r2100, %r2521, 13; 2026-02-21T10:18:50.5547057Z shl.b32 %r2101, %r2522, 13; 2026-02-21T10:18:50.5547120Z shl.b32 %r2102, %r2523, 13; 2026-02-21T10:18:50.5547181Z shl.b32 %r2103, %r2524, 13; 2026-02-21T10:18:50.5547242Z shl.b32 %r2104, %r2525, 13; 2026-02-21T10:18:50.5547307Z shl.b32 %r2105, %r2526, 13; 2026-02-21T10:18:50.5547369Z shl.b32 %r2106, %r2527, 13; 2026-02-21T10:18:50.5547431Z shl.b32 %r2107, %r2528, 13; 2026-02-21T10:18:50.5547631Z .loc 1 54 60 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:54:60 2026-02-21T10:18:50.5547700Z add.s32 %r2108, %r2100, %r2099; 2026-02-21T10:18:50.5547763Z add.s32 %r2109, %r2101, %r2099; 2026-02-21T10:18:50.5547831Z add.s32 %r2110, %r2102, %r2099; 2026-02-21T10:18:50.5547907Z add.s32 %r2111, %r2103, %r2099; 2026-02-21T10:18:50.5547974Z add.s32 %r2112, %r2104, %r2099; 2026-02-21T10:18:50.5548038Z add.s32 %r2113, %r2105, %r2099; 2026-02-21T10:18:50.5548185Z add.s32 %r2114, %r2106, %r2099; 2026-02-21T10:18:50.5548309Z add.s32 %r2115, %r2107, %r2099; 2026-02-21T10:18:50.5548584Z .loc 1 54 32 // 
cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:54:32 2026-02-21T10:18:50.5548668Z mad.wide.s32 %rd100, %r2108, 2, %rd6; 2026-02-21T10:18:50.5548739Z mad.wide.s32 %rd101, %r2109, 2, %rd6; 2026-02-21T10:18:50.5548807Z mad.wide.s32 %rd102, %r2110, 2, %rd6; 2026-02-21T10:18:50.5548882Z mad.wide.s32 %rd103, %r2111, 2, %rd6; 2026-02-21T10:18:50.5548950Z mad.wide.s32 %rd104, %r2112, 2, %rd6; 2026-02-21T10:18:50.5549016Z mad.wide.s32 %rd105, %r2113, 2, %rd6; 2026-02-21T10:18:50.5549085Z mad.wide.s32 %rd106, %r2114, 2, %rd6; 2026-02-21T10:18:50.5549158Z mad.wide.s32 %rd107, %r2115, 2, %rd6; 2026-02-21T10:18:50.5549355Z .loc 1 54 80 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:54:80 2026-02-21T10:18:50.5549418Z shl.b32 %r2116, %r2384, 13; 2026-02-21T10:18:50.5549492Z add.s32 %r2117, %r586, %r2116; 2026-02-21T10:18:50.5549555Z add.s32 %r2013, %r2117, %r46; 2026-02-21T10:18:50.5549626Z selp.b32 %r2014, 8, 0, %p52; 2026-02-21T10:18:50.5549763Z // begin inline asm 2026-02-21T10:18:50.5549923Z cp.async.ca.shared.global [ %r2013 + 0 ], [ %rd100 + 0 ], 0x8, %r2014; 2026-02-21T10:18:50.5549983Z // end inline asm 2026-02-21T10:18:50.5550045Z add.s32 %r2015, %r2013, 1024; 2026-02-21T10:18:50.5550112Z // begin inline asm 2026-02-21T10:18:50.5550310Z cp.async.ca.shared.global [ %r2015 + 0 ], [ %rd101 + 0 ], 0x8, %r2014; 2026-02-21T10:18:50.5550370Z // end inline asm 2026-02-21T10:18:50.5550437Z add.s32 %r2017, %r2013, 2048; 2026-02-21T10:18:50.5550501Z // begin inline asm 2026-02-21T10:18:50.5550633Z cp.async.ca.shared.global [ %r2017 + 0 ], [ %rd102 + 0 ], 0x8, %r2014; 2026-02-21T10:18:50.5550689Z // end inline asm 2026-02-21T10:18:50.5550757Z add.s32 %r2019, %r2013, 3072; 2026-02-21T10:18:50.5550819Z // begin inline asm 2026-02-21T10:18:50.5550958Z cp.async.ca.shared.global [ %r2019 + 0 ], [ %rd103 + 0 ], 0x8, %r2014; 2026-02-21T10:18:50.5551028Z // end inline asm 2026-02-21T10:18:50.5551100Z add.s32 %r2021, %r2013, 4096; 2026-02-21T10:18:50.5551163Z // begin 
inline asm 2026-02-21T10:18:50.5551299Z cp.async.ca.shared.global [ %r2021 + 0 ], [ %rd104 + 0 ], 0x8, %r2014; 2026-02-21T10:18:50.5551361Z // end inline asm 2026-02-21T10:18:50.5551426Z add.s32 %r2023, %r2013, 5120; 2026-02-21T10:18:50.5551484Z // begin inline asm 2026-02-21T10:18:50.5551623Z cp.async.ca.shared.global [ %r2023 + 0 ], [ %rd105 + 0 ], 0x8, %r2014; 2026-02-21T10:18:50.5551681Z // end inline asm 2026-02-21T10:18:50.5551742Z add.s32 %r2025, %r2013, 6144; 2026-02-21T10:18:50.5551802Z // begin inline asm 2026-02-21T10:18:50.5551936Z cp.async.ca.shared.global [ %r2025 + 0 ], [ %rd106 + 0 ], 0x8, %r2014; 2026-02-21T10:18:50.5551992Z // end inline asm 2026-02-21T10:18:50.5552054Z add.s32 %r2027, %r2013, 7168; 2026-02-21T10:18:50.5552119Z // begin inline asm 2026-02-21T10:18:50.5552251Z cp.async.ca.shared.global [ %r2027 + 0 ], [ %rd107 + 0 ], 0x8, %r2014; 2026-02-21T10:18:50.5552311Z // end inline asm 2026-02-21T10:18:50.5552382Z cp.async.commit_group; 2026-02-21T10:18:50.5552600Z .loc 1 26 144 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:26:144 2026-02-21T10:18:50.5552665Z shl.b32 %r2118, %r2384, 3; 2026-02-21T10:18:50.5552730Z add.s32 %r2029, %r512, %r2118; 2026-02-21T10:18:50.5552807Z and.pred %p49, %p61, %p52; 2026-02-21T10:18:50.5552867Z // begin inline asm 2026-02-21T10:18:50.5553003Z @%p49 mbarrier.arrive.expect_tx.shared.b64 _, [%r2029], 2048; 2026-02-21T10:18:50.5553069Z // end inline asm 2026-02-21T10:18:50.5553271Z .loc 1 60 33 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:60:33 2026-02-21T10:18:50.5553335Z shl.b32 %r2119, %r2384, 11; 2026-02-21T10:18:50.5553403Z add.s32 %r2030, %r532, %r2119; 2026-02-21T10:18:50.5553465Z bar.sync 0; 2026-02-21T10:18:50.5553536Z elect.sync %r2120|%p57, -1; 2026-02-21T10:18:50.5553675Z and.pred %p58, %p52, %p57; 2026-02-21T10:18:50.5553794Z and.pred %p50, %p1, %p58; 2026-02-21T10:18:50.5553854Z // begin inline asm 2026-02-21T10:18:50.5554179Z @%p50 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r2030], [%rd34, {%r2519, %r2381}], [%r2029]; 2026-02-21T10:18:50.5554241Z // end inline asm 2026-02-21T10:18:50.5554449Z .loc 1 26 144 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:26:144 2026-02-21T10:18:50.5554516Z setp.ne.b32 %p59, %r2380, 255; 2026-02-21T10:18:50.5554586Z @%p59 bra $L__BB0_6; 2026-02-21T10:18:50.5554700Z // %bb.5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:18:50.5554903Z .loc 1 38 32 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:38:32 2026-02-21T10:18:50.5554969Z add.s32 %r2266, %r2378, %r13; 2026-02-21T10:18:50.5555037Z add.s32 %r2267, %r14, %r2378; 2026-02-21T10:18:50.5555100Z add.s32 %r2268, %r15, %r2378; 2026-02-21T10:18:50.5555161Z add.s32 %r2269, %r16, %r2378; 2026-02-21T10:18:50.5555229Z add.s32 %r2270, %r17, %r2378; 2026-02-21T10:18:50.5555345Z add.s32 %r2271, %r18, %r2378; 2026-02-21T10:18:50.5555409Z add.s32 %r2272, %r19, %r2378; 2026-02-21T10:18:50.5555472Z add.s32 %r2273, %r2378, %r20; 2026-02-21T10:18:50.5555537Z add.s32 %r2274, %r21, %r2378; 2026-02-21T10:18:50.5555599Z add.s32 %r2275, %r22, %r2378; 2026-02-21T10:18:50.5555706Z add.s32 %r2276, %r23, %r2378; 2026-02-21T10:18:50.5555774Z add.s32 %r2277, %r24, %r2378; 2026-02-21T10:18:50.5555834Z add.s32 %r2278, %r25, %r2378; 2026-02-21T10:18:50.5555893Z add.s32 %r2279, %r26, %r2378; 2026-02-21T10:18:50.5555959Z add.s32 %r2280, %r27, %r2378; 2026-02-21T10:18:50.5556020Z add.s32 %r2281, %r2378, %r28; 2026-02-21T10:18:50.5556219Z .loc 1 40 32 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:40:32 2026-02-21T10:18:50.5556281Z add.s32 %r2282, %r2376, %r29; 2026-02-21T10:18:50.5556610Z .loc 1 93 28 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:93:28 2026-02-21T10:18:50.5556714Z cvt.rn.bf16x2.f32 %r2283, %r2388, %r2387; 2026-02-21T10:18:50.5556791Z cvt.rn.bf16x2.f32 %r2284, %r2390, %r2389; 2026-02-21T10:18:50.5556873Z cvt.rn.bf16x2.f32 %r2285, %r2392, %r2391; 
2026-02-21T10:18:50.5556945Z cvt.rn.bf16x2.f32 %r2286, %r2394, %r2393; 2026-02-21T10:18:50.5557019Z cvt.rn.bf16x2.f32 %r2287, %r2396, %r2395; 2026-02-21T10:18:50.5557096Z cvt.rn.bf16x2.f32 %r2288, %r2398, %r2397; 2026-02-21T10:18:50.5557167Z cvt.rn.bf16x2.f32 %r2289, %r2400, %r2399; 2026-02-21T10:18:50.5557237Z cvt.rn.bf16x2.f32 %r2290, %r2402, %r2401; 2026-02-21T10:18:50.5557308Z cvt.rn.bf16x2.f32 %r2291, %r2404, %r2403; 2026-02-21T10:18:50.5557385Z cvt.rn.bf16x2.f32 %r2292, %r2406, %r2405; 2026-02-21T10:18:50.5557456Z cvt.rn.bf16x2.f32 %r2293, %r2408, %r2407; 2026-02-21T10:18:50.5557528Z cvt.rn.bf16x2.f32 %r2294, %r2410, %r2409; 2026-02-21T10:18:50.5557604Z cvt.rn.bf16x2.f32 %r2295, %r2412, %r2411; 2026-02-21T10:18:50.5557678Z cvt.rn.bf16x2.f32 %r2296, %r2414, %r2413; 2026-02-21T10:18:50.5557752Z cvt.rn.bf16x2.f32 %r2297, %r2416, %r2415; 2026-02-21T10:18:50.5557827Z cvt.rn.bf16x2.f32 %r2298, %r2418, %r2417; 2026-02-21T10:18:50.5557899Z cvt.rn.bf16x2.f32 %r2299, %r2420, %r2419; 2026-02-21T10:18:50.5557969Z cvt.rn.bf16x2.f32 %r2300, %r2422, %r2421; 2026-02-21T10:18:50.5558047Z cvt.rn.bf16x2.f32 %r2301, %r2424, %r2423; 2026-02-21T10:18:50.5558118Z cvt.rn.bf16x2.f32 %r2302, %r2426, %r2425; 2026-02-21T10:18:50.5558198Z cvt.rn.bf16x2.f32 %r2303, %r2428, %r2427; 2026-02-21T10:18:50.5558271Z cvt.rn.bf16x2.f32 %r2304, %r2430, %r2429; 2026-02-21T10:18:50.5558346Z cvt.rn.bf16x2.f32 %r2305, %r2432, %r2431; 2026-02-21T10:18:50.5558419Z cvt.rn.bf16x2.f32 %r2306, %r2434, %r2433; 2026-02-21T10:18:50.5558492Z cvt.rn.bf16x2.f32 %r2307, %r2436, %r2435; 2026-02-21T10:18:50.5558568Z cvt.rn.bf16x2.f32 %r2308, %r2438, %r2437; 2026-02-21T10:18:50.5558720Z cvt.rn.bf16x2.f32 %r2309, %r2440, %r2439; 2026-02-21T10:18:50.5558791Z cvt.rn.bf16x2.f32 %r2310, %r2442, %r2441; 2026-02-21T10:18:50.5558924Z cvt.rn.bf16x2.f32 %r2311, %r2444, %r2443; 2026-02-21T10:18:50.5559001Z cvt.rn.bf16x2.f32 %r2312, %r2446, %r2445; 2026-02-21T10:18:50.5559071Z cvt.rn.bf16x2.f32 %r2313, %r2448, %r2447; 
2026-02-21T10:18:50.5559142Z cvt.rn.bf16x2.f32 %r2314, %r2450, %r2449; 2026-02-21T10:18:50.5559215Z cvt.rn.bf16x2.f32 %r2315, %r2452, %r2451; 2026-02-21T10:18:50.5559288Z cvt.rn.bf16x2.f32 %r2316, %r2454, %r2453; 2026-02-21T10:18:50.5559361Z cvt.rn.bf16x2.f32 %r2317, %r2456, %r2455; 2026-02-21T10:18:50.5559440Z cvt.rn.bf16x2.f32 %r2318, %r2458, %r2457; 2026-02-21T10:18:50.5559510Z cvt.rn.bf16x2.f32 %r2319, %r2460, %r2459; 2026-02-21T10:18:50.5559579Z cvt.rn.bf16x2.f32 %r2320, %r2462, %r2461; 2026-02-21T10:18:50.5559649Z cvt.rn.bf16x2.f32 %r2321, %r2464, %r2463; 2026-02-21T10:18:50.5559722Z cvt.rn.bf16x2.f32 %r2322, %r2466, %r2465; 2026-02-21T10:18:50.5559802Z cvt.rn.bf16x2.f32 %r2323, %r2468, %r2467; 2026-02-21T10:18:50.5559876Z cvt.rn.bf16x2.f32 %r2324, %r2470, %r2469; 2026-02-21T10:18:50.5559955Z cvt.rn.bf16x2.f32 %r2325, %r2472, %r2471; 2026-02-21T10:18:50.5560089Z cvt.rn.bf16x2.f32 %r2326, %r2474, %r2473; 2026-02-21T10:18:50.5560162Z cvt.rn.bf16x2.f32 %r2327, %r2476, %r2475; 2026-02-21T10:18:50.5560248Z cvt.rn.bf16x2.f32 %r2328, %r2478, %r2477; 2026-02-21T10:18:50.5560322Z cvt.rn.bf16x2.f32 %r2329, %r2480, %r2479; 2026-02-21T10:18:50.5560450Z cvt.rn.bf16x2.f32 %r2330, %r2482, %r2481; 2026-02-21T10:18:50.5560524Z cvt.rn.bf16x2.f32 %r2331, %r2484, %r2483; 2026-02-21T10:18:50.5560601Z cvt.rn.bf16x2.f32 %r2332, %r2486, %r2485; 2026-02-21T10:18:50.5560672Z cvt.rn.bf16x2.f32 %r2333, %r2488, %r2487; 2026-02-21T10:18:50.5560743Z cvt.rn.bf16x2.f32 %r2334, %r2490, %r2489; 2026-02-21T10:18:50.5560818Z cvt.rn.bf16x2.f32 %r2335, %r2492, %r2491; 2026-02-21T10:18:50.5560889Z cvt.rn.bf16x2.f32 %r2336, %r2494, %r2493; 2026-02-21T10:18:50.5560962Z cvt.rn.bf16x2.f32 %r2337, %r2496, %r2495; 2026-02-21T10:18:50.5561035Z cvt.rn.bf16x2.f32 %r2338, %r2498, %r2497; 2026-02-21T10:18:50.5561111Z cvt.rn.bf16x2.f32 %r2339, %r2500, %r2499; 2026-02-21T10:18:50.5561184Z cvt.rn.bf16x2.f32 %r2340, %r2502, %r2501; 2026-02-21T10:18:50.5561255Z cvt.rn.bf16x2.f32 %r2341, %r2504, %r2503; 
2026-02-21T10:18:50.5561328Z cvt.rn.bf16x2.f32 %r2342, %r2506, %r2505; 2026-02-21T10:18:50.5561397Z cvt.rn.bf16x2.f32 %r2343, %r2508, %r2507; 2026-02-21T10:18:50.5561468Z cvt.rn.bf16x2.f32 %r2344, %r2510, %r2509; 2026-02-21T10:18:50.5561545Z cvt.rn.bf16x2.f32 %r2345, %r2512, %r2511; 2026-02-21T10:18:50.5561617Z cvt.rn.bf16x2.f32 %r2346, %r2514, %r2513; 2026-02-21T10:18:50.5561816Z .loc 1 94 50 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:94:50 2026-02-21T10:18:50.5561888Z mad.lo.s32 %r2347, %r2266, 1280, %r2282; 2026-02-21T10:18:50.5561967Z mad.lo.s32 %r2348, %r2267, 1280, %r2282; 2026-02-21T10:18:50.5562035Z mad.lo.s32 %r2349, %r2268, 1280, %r2282; 2026-02-21T10:18:50.5562104Z mad.lo.s32 %r2350, %r2269, 1280, %r2282; 2026-02-21T10:18:50.5562181Z mad.lo.s32 %r2351, %r2270, 1280, %r2282; 2026-02-21T10:18:50.5562250Z mad.lo.s32 %r2352, %r2271, 1280, %r2282; 2026-02-21T10:18:50.5562318Z mad.lo.s32 %r2353, %r2272, 1280, %r2282; 2026-02-21T10:18:50.5562389Z mad.lo.s32 %r2354, %r2273, 1280, %r2282; 2026-02-21T10:18:50.5562457Z mad.lo.s32 %r2355, %r2274, 1280, %r2282; 2026-02-21T10:18:50.5562526Z mad.lo.s32 %r2356, %r2275, 1280, %r2282; 2026-02-21T10:18:50.5562594Z mad.lo.s32 %r2357, %r2276, 1280, %r2282; 2026-02-21T10:18:50.5562666Z mad.lo.s32 %r2358, %r2277, 1280, %r2282; 2026-02-21T10:18:50.5562744Z mad.lo.s32 %r2359, %r2278, 1280, %r2282; 2026-02-21T10:18:50.5562813Z mad.lo.s32 %r2360, %r2279, 1280, %r2282; 2026-02-21T10:18:50.5562884Z mad.lo.s32 %r2361, %r2280, 1280, %r2282; 2026-02-21T10:18:50.5562952Z mad.lo.s32 %r2362, %r2281, 1280, %r2282; 2026-02-21T10:18:50.5563147Z .loc 1 94 22 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:94:22 2026-02-21T10:18:50.5563341Z mad.wide.s32 %rd109, %r2347, 2, %rd7; 2026-02-21T10:18:50.5563412Z mad.wide.s32 %rd110, %r2348, 2, %rd7; 2026-02-21T10:18:50.5563478Z mad.wide.s32 %rd111, %r2349, 2, %rd7; 2026-02-21T10:18:50.5563545Z mad.wide.s32 %rd112, %r2350, 2, %rd7; 2026-02-21T10:18:50.5563614Z mad.wide.s32 
%rd113, %r2351, 2, %rd7; 2026-02-21T10:18:50.5563679Z mad.wide.s32 %rd114, %r2352, 2, %rd7; 2026-02-21T10:18:50.5563746Z mad.wide.s32 %rd115, %r2353, 2, %rd7; 2026-02-21T10:18:50.5563816Z mad.wide.s32 %rd116, %r2354, 2, %rd7; 2026-02-21T10:18:50.5563882Z mad.wide.s32 %rd117, %r2355, 2, %rd7; 2026-02-21T10:18:50.5563948Z mad.wide.s32 %rd118, %r2356, 2, %rd7; 2026-02-21T10:18:50.5564014Z mad.wide.s32 %rd119, %r2357, 2, %rd7; 2026-02-21T10:18:50.5564086Z mad.wide.s32 %rd120, %r2358, 2, %rd7; 2026-02-21T10:18:50.5564151Z mad.wide.s32 %rd121, %r2359, 2, %rd7; 2026-02-21T10:18:50.5564217Z mad.wide.s32 %rd122, %r2360, 2, %rd7; 2026-02-21T10:18:50.5564289Z mad.wide.s32 %rd123, %r2361, 2, %rd7; 2026-02-21T10:18:50.5564356Z mad.wide.s32 %rd124, %r2362, 2, %rd7; 2026-02-21T10:18:50.5564606Z .loc 1 94 81 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:94:81 2026-02-21T10:18:50.5564728Z st.shared.v4.b32 [%r60], {%r2283, %r2285, %r2287, %r2289}; 2026-02-21T10:18:50.5564834Z st.shared.v4.b32 [%r61], {%r2291, %r2293, %r2295, %r2297}; 2026-02-21T10:18:50.5564981Z st.shared.v4.b32 [%r62], {%r2299, %r2301, %r2303, %r2305}; 2026-02-21T10:18:50.5565093Z st.shared.v4.b32 [%r63], {%r2307, %r2309, %r2311, %r2313}; 2026-02-21T10:18:50.5565151Z bar.sync 0; 2026-02-21T10:18:50.5565212Z // begin inline asm 2026-02-21T10:18:50.5565412Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2201, %r2202, %r2203, %r2204}, [%r2125]; 2026-02-21T10:18:50.5565472Z // end inline asm 2026-02-21T10:18:50.5565531Z // begin inline asm 2026-02-21T10:18:50.5565719Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2209, %r2210, %r2211, %r2212}, [%r2130]; 2026-02-21T10:18:50.5565779Z // end inline asm 2026-02-21T10:18:50.5565838Z // begin inline asm 2026-02-21T10:18:50.5566024Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2217, %r2218, %r2219, %r2220}, [%r2135]; 2026-02-21T10:18:50.5566080Z // end inline asm 2026-02-21T10:18:50.5566137Z // begin inline asm 2026-02-21T10:18:50.5566314Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2225, %r2226, %r2227, %r2228}, [%r2140]; 2026-02-21T10:18:50.5566376Z // end inline asm 2026-02-21T10:18:50.5566435Z bar.sync 0; 2026-02-21T10:18:50.5566668Z st.shared.v4.b32 [%r60], {%r2284, %r2286, %r2288, %r2290}; 2026-02-21T10:18:50.5566779Z st.shared.v4.b32 [%r61], {%r2292, %r2294, %r2296, %r2298}; 2026-02-21T10:18:50.5566882Z st.shared.v4.b32 [%r62], {%r2300, %r2302, %r2304, %r2306}; 2026-02-21T10:18:50.5566980Z st.shared.v4.b32 [%r63], {%r2308, %r2310, %r2312, %r2314}; 2026-02-21T10:18:50.5567038Z bar.sync 0; 2026-02-21T10:18:50.5567096Z // begin inline asm 2026-02-21T10:18:50.5567278Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2205, %r2206, %r2207, %r2208}, [%r2125]; 2026-02-21T10:18:50.5567336Z // end inline asm 2026-02-21T10:18:50.5567402Z // begin inline asm 2026-02-21T10:18:50.5567583Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2213, %r2214, %r2215, %r2216}, [%r2130]; 2026-02-21T10:18:50.5567639Z // end inline asm 2026-02-21T10:18:50.5567699Z // begin inline asm 2026-02-21T10:18:50.5567881Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2221, %r2222, %r2223, %r2224}, [%r2135]; 2026-02-21T10:18:50.5567937Z // end inline asm 2026-02-21T10:18:50.5567993Z // begin inline asm 2026-02-21T10:18:50.5568173Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2229, %r2230, %r2231, %r2232}, [%r2140]; 2026-02-21T10:18:50.5568230Z // end inline asm 2026-02-21T10:18:50.5568286Z bar.sync 0; 2026-02-21T10:18:50.5568391Z st.shared.v4.b32 [%r60], {%r2315, %r2317, %r2319, %r2321}; 2026-02-21T10:18:50.5568492Z st.shared.v4.b32 [%r61], {%r2323, %r2325, %r2327, %r2329}; 2026-02-21T10:18:50.5568690Z st.shared.v4.b32 [%r62], {%r2331, %r2333, %r2335, %r2337}; 2026-02-21T10:18:50.5568854Z st.shared.v4.b32 [%r63], {%r2339, %r2341, %r2343, %r2345}; 2026-02-21T10:18:50.5568911Z bar.sync 0; 2026-02-21T10:18:50.5568968Z // begin inline asm 2026-02-21T10:18:50.5569146Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2233, %r2234, %r2235, %r2236}, 
[%r2125]; 2026-02-21T10:18:50.5569205Z // end inline asm 2026-02-21T10:18:50.5569263Z // begin inline asm 2026-02-21T10:18:50.5569442Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2241, %r2242, %r2243, %r2244}, [%r2130]; 2026-02-21T10:18:50.5569502Z // end inline asm 2026-02-21T10:18:50.5569570Z // begin inline asm 2026-02-21T10:18:50.5569759Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2249, %r2250, %r2251, %r2252}, [%r2135]; 2026-02-21T10:18:50.5569819Z // end inline asm 2026-02-21T10:18:50.5569877Z // begin inline asm 2026-02-21T10:18:50.5570054Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2257, %r2258, %r2259, %r2260}, [%r2140]; 2026-02-21T10:18:50.5570112Z // end inline asm 2026-02-21T10:18:50.5570170Z bar.sync 0; 2026-02-21T10:18:50.5570272Z st.shared.v4.b32 [%r60], {%r2316, %r2318, %r2320, %r2322}; 2026-02-21T10:18:50.5570435Z st.shared.v4.b32 [%r61], {%r2324, %r2326, %r2328, %r2330}; 2026-02-21T10:18:50.5570540Z st.shared.v4.b32 [%r62], {%r2332, %r2334, %r2336, %r2338}; 2026-02-21T10:18:50.5570638Z st.shared.v4.b32 [%r63], {%r2340, %r2342, %r2344, %r2346}; 2026-02-21T10:18:50.5570693Z bar.sync 0; 2026-02-21T10:18:50.5570812Z // begin inline asm 2026-02-21T10:18:50.5571012Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2237, %r2238, %r2239, %r2240}, [%r2125]; 2026-02-21T10:18:50.5571067Z // end inline asm 2026-02-21T10:18:50.5571126Z // begin inline asm 2026-02-21T10:18:50.5571307Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2245, %r2246, %r2247, %r2248}, [%r2130]; 2026-02-21T10:18:50.5571364Z // end inline asm 2026-02-21T10:18:50.5571420Z // begin inline asm 2026-02-21T10:18:50.5571601Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2253, %r2254, %r2255, %r2256}, [%r2135]; 2026-02-21T10:18:50.5571660Z // end inline asm 2026-02-21T10:18:50.5571718Z // begin inline asm 2026-02-21T10:18:50.5571896Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2261, %r2262, %r2263, %r2264}, [%r2140]; 2026-02-21T10:18:50.5571955Z // end inline asm 2026-02-21T10:18:50.5572013Z // begin 
inline asm 2026-02-21T10:18:50.5572138Z st.global.v4.b32 [ %rd109 + 0 ], { %r2201, %r2202, %r2203, %r2204 }; 2026-02-21T10:18:50.5572200Z // end inline asm 2026-02-21T10:18:50.5572257Z // begin inline asm 2026-02-21T10:18:50.5572375Z st.global.v4.b32 [ %rd110 + 0 ], { %r2205, %r2206, %r2207, %r2208 }; 2026-02-21T10:18:50.5572434Z // end inline asm 2026-02-21T10:18:50.5572493Z // begin inline asm 2026-02-21T10:18:50.5572606Z st.global.v4.b32 [ %rd111 + 0 ], { %r2209, %r2210, %r2211, %r2212 }; 2026-02-21T10:18:50.5572661Z // end inline asm 2026-02-21T10:18:50.5572724Z // begin inline asm 2026-02-21T10:18:50.5572836Z st.global.v4.b32 [ %rd112 + 0 ], { %r2213, %r2214, %r2215, %r2216 }; 2026-02-21T10:18:50.5572892Z // end inline asm 2026-02-21T10:18:50.5572955Z // begin inline asm 2026-02-21T10:18:50.5573071Z st.global.v4.b32 [ %rd113 + 0 ], { %r2217, %r2218, %r2219, %r2220 }; 2026-02-21T10:18:50.5573127Z // end inline asm 2026-02-21T10:18:50.5573185Z // begin inline asm 2026-02-21T10:18:50.5573301Z st.global.v4.b32 [ %rd114 + 0 ], { %r2221, %r2222, %r2223, %r2224 }; 2026-02-21T10:18:50.5573356Z // end inline asm 2026-02-21T10:18:50.5573414Z // begin inline asm 2026-02-21T10:18:50.5573530Z st.global.v4.b32 [ %rd115 + 0 ], { %r2225, %r2226, %r2227, %r2228 }; 2026-02-21T10:18:50.5573585Z // end inline asm 2026-02-21T10:18:50.5573642Z // begin inline asm 2026-02-21T10:18:50.5573754Z st.global.v4.b32 [ %rd116 + 0 ], { %r2229, %r2230, %r2231, %r2232 }; 2026-02-21T10:18:50.5573812Z // end inline asm 2026-02-21T10:18:50.5573871Z // begin inline asm 2026-02-21T10:18:50.5573983Z st.global.v4.b32 [ %rd117 + 0 ], { %r2233, %r2234, %r2235, %r2236 }; 2026-02-21T10:18:50.5574130Z // end inline asm 2026-02-21T10:18:50.5574188Z // begin inline asm 2026-02-21T10:18:50.5574352Z st.global.v4.b32 [ %rd118 + 0 ], { %r2237, %r2238, %r2239, %r2240 }; 2026-02-21T10:18:50.5574410Z // end inline asm 2026-02-21T10:18:50.5574469Z // begin inline asm 2026-02-21T10:18:50.5574581Z st.global.v4.b32 [ 
%rd119 + 0 ], { %r2241, %r2242, %r2243, %r2244 }; 2026-02-21T10:18:50.5574638Z // end inline asm 2026-02-21T10:18:50.5574699Z // begin inline asm 2026-02-21T10:18:50.5574812Z st.global.v4.b32 [ %rd120 + 0 ], { %r2245, %r2246, %r2247, %r2248 }; 2026-02-21T10:18:50.5574867Z // end inline asm 2026-02-21T10:18:50.5574929Z // begin inline asm 2026-02-21T10:18:50.5575043Z st.global.v4.b32 [ %rd121 + 0 ], { %r2249, %r2250, %r2251, %r2252 }; 2026-02-21T10:18:50.5575100Z // end inline asm 2026-02-21T10:18:50.5575157Z // begin inline asm 2026-02-21T10:18:50.5575273Z st.global.v4.b32 [ %rd122 + 0 ], { %r2253, %r2254, %r2255, %r2256 }; 2026-02-21T10:18:50.5575330Z // end inline asm 2026-02-21T10:18:50.5575389Z // begin inline asm 2026-02-21T10:18:50.5575516Z st.global.v4.b32 [ %rd123 + 0 ], { %r2257, %r2258, %r2259, %r2260 }; 2026-02-21T10:18:50.5575577Z // end inline asm 2026-02-21T10:18:50.5575693Z // begin inline asm 2026-02-21T10:18:50.5575818Z st.global.v4.b32 [ %rd124 + 0 ], { %r2261, %r2262, %r2263, %r2264 }; 2026-02-21T10:18:50.5575873Z // end inline asm 2026-02-21T10:18:50.5575937Z mov.b32 %r2387, 0f00000000; 2026-02-21T10:18:50.5576048Z mov.b32 %r2388, %r2387; 2026-02-21T10:18:50.5576114Z mov.b32 %r2389, %r2387; 2026-02-21T10:18:50.5576171Z mov.b32 %r2390, %r2387; 2026-02-21T10:18:50.5576229Z mov.b32 %r2391, %r2387; 2026-02-21T10:18:50.5576289Z mov.b32 %r2392, %r2387; 2026-02-21T10:18:50.5576345Z mov.b32 %r2393, %r2387; 2026-02-21T10:18:50.5576402Z mov.b32 %r2394, %r2387; 2026-02-21T10:18:50.5576577Z mov.b32 %r2395, %r2387; 2026-02-21T10:18:50.5576646Z mov.b32 %r2396, %r2387; 2026-02-21T10:18:50.5576703Z mov.b32 %r2397, %r2387; 2026-02-21T10:18:50.5576762Z mov.b32 %r2398, %r2387; 2026-02-21T10:18:50.5576823Z mov.b32 %r2399, %r2387; 2026-02-21T10:18:50.5576882Z mov.b32 %r2400, %r2387; 2026-02-21T10:18:50.5576942Z mov.b32 %r2401, %r2387; 2026-02-21T10:18:50.5576999Z mov.b32 %r2402, %r2387; 2026-02-21T10:18:50.5577059Z mov.b32 %r2403, %r2387; 
2026-02-21T10:18:50.5577119Z mov.b32 %r2404, %r2387; 2026-02-21T10:18:50.5577176Z mov.b32 %r2405, %r2387; 2026-02-21T10:18:50.5577239Z mov.b32 %r2406, %r2387; 2026-02-21T10:18:50.5577297Z mov.b32 %r2407, %r2387; 2026-02-21T10:18:50.5577353Z mov.b32 %r2408, %r2387; 2026-02-21T10:18:50.5577410Z mov.b32 %r2409, %r2387; 2026-02-21T10:18:50.5577470Z mov.b32 %r2410, %r2387; 2026-02-21T10:18:50.5577529Z mov.b32 %r2411, %r2387; 2026-02-21T10:18:50.5577586Z mov.b32 %r2412, %r2387; 2026-02-21T10:18:50.5577646Z mov.b32 %r2413, %r2387; 2026-02-21T10:18:50.5577703Z mov.b32 %r2414, %r2387; 2026-02-21T10:18:50.5577760Z mov.b32 %r2415, %r2387; 2026-02-21T10:18:50.5577819Z mov.b32 %r2416, %r2387; 2026-02-21T10:18:50.5577881Z mov.b32 %r2417, %r2387; 2026-02-21T10:18:50.5577939Z mov.b32 %r2418, %r2387; 2026-02-21T10:18:50.5577996Z mov.b32 %r2419, %r2387; 2026-02-21T10:18:50.5578058Z mov.b32 %r2420, %r2387; 2026-02-21T10:18:50.5578115Z mov.b32 %r2421, %r2387; 2026-02-21T10:18:50.5578174Z mov.b32 %r2422, %r2387; 2026-02-21T10:18:50.5578235Z mov.b32 %r2423, %r2387; 2026-02-21T10:18:50.5578293Z mov.b32 %r2424, %r2387; 2026-02-21T10:18:50.5578352Z mov.b32 %r2425, %r2387; 2026-02-21T10:18:50.5578409Z mov.b32 %r2426, %r2387; 2026-02-21T10:18:50.5578481Z mov.b32 %r2427, %r2387; 2026-02-21T10:18:50.5578543Z mov.b32 %r2428, %r2387; 2026-02-21T10:18:50.5578601Z mov.b32 %r2429, %r2387; 2026-02-21T10:18:50.5578662Z mov.b32 %r2430, %r2387; 2026-02-21T10:18:50.5578720Z mov.b32 %r2431, %r2387; 2026-02-21T10:18:50.5578777Z mov.b32 %r2432, %r2387; 2026-02-21T10:18:50.5578833Z mov.b32 %r2433, %r2387; 2026-02-21T10:18:50.5578894Z mov.b32 %r2434, %r2387; 2026-02-21T10:18:50.5579043Z mov.b32 %r2435, %r2387; 2026-02-21T10:18:50.5579101Z mov.b32 %r2436, %r2387; 2026-02-21T10:18:50.5579222Z mov.b32 %r2437, %r2387; 2026-02-21T10:18:50.5579280Z mov.b32 %r2438, %r2387; 2026-02-21T10:18:50.5579337Z mov.b32 %r2439, %r2387; 2026-02-21T10:18:50.5579394Z mov.b32 %r2440, %r2387; 2026-02-21T10:18:50.5579456Z mov.b32 
%r2441, %r2387; 2026-02-21T10:18:50.5579514Z mov.b32 %r2442, %r2387; 2026-02-21T10:18:50.5579572Z mov.b32 %r2443, %r2387; 2026-02-21T10:18:50.5579645Z mov.b32 %r2444, %r2387; 2026-02-21T10:18:50.5579704Z mov.b32 %r2445, %r2387; 2026-02-21T10:18:50.5579761Z mov.b32 %r2446, %r2387; 2026-02-21T10:18:50.5579818Z mov.b32 %r2447, %r2387; 2026-02-21T10:18:50.5579880Z mov.b32 %r2448, %r2387; 2026-02-21T10:18:50.5579939Z mov.b32 %r2449, %r2387; 2026-02-21T10:18:50.5579996Z mov.b32 %r2450, %r2387; 2026-02-21T10:18:50.5580056Z mov.b32 %r2451, %r2387; 2026-02-21T10:18:50.5580114Z mov.b32 %r2452, %r2387; 2026-02-21T10:18:50.5580170Z mov.b32 %r2453, %r2387; 2026-02-21T10:18:50.5580231Z mov.b32 %r2454, %r2387; 2026-02-21T10:18:50.5580290Z mov.b32 %r2455, %r2387; 2026-02-21T10:18:50.5580349Z mov.b32 %r2456, %r2387; 2026-02-21T10:18:50.5580484Z mov.b32 %r2457, %r2387; 2026-02-21T10:18:50.5580550Z mov.b32 %r2458, %r2387; 2026-02-21T10:18:50.5580607Z mov.b32 %r2459, %r2387; 2026-02-21T10:18:50.5580664Z mov.b32 %r2460, %r2387; 2026-02-21T10:18:50.5580720Z mov.b32 %r2461, %r2387; 2026-02-21T10:18:50.5580779Z mov.b32 %r2462, %r2387; 2026-02-21T10:18:50.5580913Z mov.b32 %r2463, %r2387; 2026-02-21T10:18:50.5580973Z mov.b32 %r2464, %r2387; 2026-02-21T10:18:50.5581034Z mov.b32 %r2465, %r2387; 2026-02-21T10:18:50.5581092Z mov.b32 %r2466, %r2387; 2026-02-21T10:18:50.5581149Z mov.b32 %r2467, %r2387; 2026-02-21T10:18:50.5581209Z mov.b32 %r2468, %r2387; 2026-02-21T10:18:50.5581266Z mov.b32 %r2469, %r2387; 2026-02-21T10:18:50.5581324Z mov.b32 %r2470, %r2387; 2026-02-21T10:18:50.5581383Z mov.b32 %r2471, %r2387; 2026-02-21T10:18:50.5581444Z mov.b32 %r2472, %r2387; 2026-02-21T10:18:50.5581504Z mov.b32 %r2473, %r2387; 2026-02-21T10:18:50.5581563Z mov.b32 %r2474, %r2387; 2026-02-21T10:18:50.5581623Z mov.b32 %r2475, %r2387; 2026-02-21T10:18:50.5581683Z mov.b32 %r2476, %r2387; 2026-02-21T10:18:50.5581754Z mov.b32 %r2477, %r2387; 2026-02-21T10:18:50.5581813Z mov.b32 %r2478, %r2387; 
2026-02-21T10:18:50.5581879Z mov.b32 %r2479, %r2387; 2026-02-21T10:18:50.5581937Z mov.b32 %r2480, %r2387; 2026-02-21T10:18:50.5581995Z mov.b32 %r2481, %r2387; 2026-02-21T10:18:50.5582056Z mov.b32 %r2482, %r2387; 2026-02-21T10:18:50.5582112Z mov.b32 %r2483, %r2387; 2026-02-21T10:18:50.5582169Z mov.b32 %r2484, %r2387; 2026-02-21T10:18:50.5582227Z mov.b32 %r2485, %r2387; 2026-02-21T10:18:50.5582288Z mov.b32 %r2486, %r2387; 2026-02-21T10:18:50.5582344Z mov.b32 %r2487, %r2387; 2026-02-21T10:18:50.5582401Z mov.b32 %r2488, %r2387; 2026-02-21T10:18:50.5582461Z mov.b32 %r2489, %r2387; 2026-02-21T10:18:50.5582519Z mov.b32 %r2490, %r2387; 2026-02-21T10:18:50.5582582Z mov.b32 %r2491, %r2387; 2026-02-21T10:18:50.5582639Z mov.b32 %r2492, %r2387; 2026-02-21T10:18:50.5582700Z mov.b32 %r2493, %r2387; 2026-02-21T10:18:50.5582757Z mov.b32 %r2494, %r2387; 2026-02-21T10:18:50.5582815Z mov.b32 %r2495, %r2387; 2026-02-21T10:18:50.5582875Z mov.b32 %r2496, %r2387; 2026-02-21T10:18:50.5582932Z mov.b32 %r2497, %r2387; 2026-02-21T10:18:50.5582990Z mov.b32 %r2498, %r2387; 2026-02-21T10:18:50.5583046Z mov.b32 %r2499, %r2387; 2026-02-21T10:18:50.5583108Z mov.b32 %r2500, %r2387; 2026-02-21T10:18:50.5583166Z mov.b32 %r2501, %r2387; 2026-02-21T10:18:50.5583223Z mov.b32 %r2502, %r2387; 2026-02-21T10:18:50.5583281Z mov.b32 %r2503, %r2387; 2026-02-21T10:18:50.5583338Z mov.b32 %r2504, %r2387; 2026-02-21T10:18:50.5583394Z mov.b32 %r2505, %r2387; 2026-02-21T10:18:50.5583463Z mov.b32 %r2506, %r2387; 2026-02-21T10:18:50.5583527Z mov.b32 %r2507, %r2387; 2026-02-21T10:18:50.5583586Z mov.b32 %r2508, %r2387; 2026-02-21T10:18:50.5583642Z mov.b32 %r2509, %r2387; 2026-02-21T10:18:50.5583763Z mov.b32 %r2510, %r2387; 2026-02-21T10:18:50.5583821Z mov.b32 %r2511, %r2387; 2026-02-21T10:18:50.5583930Z mov.b32 %r2512, %r2387; 2026-02-21T10:18:50.5583990Z mov.b32 %r2513, %r2387; 2026-02-21T10:18:50.5584054Z mov.b32 %r2514, %r2387; 2026-02-21T10:18:50.5584123Z bra.uni $L__BB0_6; 2026-02-21T10:18:50.5584216Z $L__BB0_7: // 
%._crit_edge 2026-02-21T10:18:50.5584439Z .loc 1 26 144 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:26:144 2026-02-21T10:18:50.5584508Z cp.async.wait_group 0; 2026-02-21T10:18:50.5584562Z bar.sync 0; 2026-02-21T10:18:50.5584622Z // begin inline asm 2026-02-21T10:18:50.5584719Z @%p61 mbarrier.inval.shared::cta.b64 [%r512]; 2026-02-21T10:18:50.5584774Z // end inline asm 2026-02-21T10:18:50.5584827Z bar.sync 0; 2026-02-21T10:18:50.5584888Z // begin inline asm 2026-02-21T10:18:50.5584974Z @%p61 mbarrier.inval.shared::cta.b64 [%r513]; 2026-02-21T10:18:50.5585029Z // end inline asm 2026-02-21T10:18:50.5585089Z bar.sync 0; 2026-02-21T10:18:50.5585146Z // begin inline asm 2026-02-21T10:18:50.5585231Z @%p61 mbarrier.inval.shared::cta.b64 [%r514]; 2026-02-21T10:18:50.5585338Z // end inline asm 2026-02-21T10:18:50.5585549Z .loc 1 26 4 // cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py:26:4 2026-02-21T10:18:50.5585601Z ret; 2026-02-21T10:18:50.5585656Z $L__tmp3: 2026-02-21T10:18:50.5585714Z $L__func_end0: 2026-02-21T10:18:50.5585845Z // -- End function 2026-02-21T10:18:50.5585900Z } 2026-02-21T10:18:50.5586147Z .file 1 "/tmp/torchinductor_root/d6/cd6frfkwg3sledm725kp7u232dbpnw4xhs3zdabdyn3um5fmskus.py" 2026-02-21T10:18:50.5586357Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T10:18:50.5586422Z .section .debug_abbrev 2026-02-21T10:18:50.5586590Z { 2026-02-21T10:18:50.5586695Z .b8 1 // Abbreviation Code 2026-02-21T10:18:50.5586794Z .b8 17 // DW_TAG_compile_unit 2026-02-21T10:18:50.5586880Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:18:50.5586981Z .b8 37 // DW_AT_producer 2026-02-21T10:18:50.5587066Z .b8 8 // DW_FORM_string 2026-02-21T10:18:50.5587145Z .b8 19 // DW_AT_language 2026-02-21T10:18:50.5587230Z .b8 5 // DW_FORM_data2 2026-02-21T10:18:50.5587309Z .b8 3 // DW_AT_name 2026-02-21T10:18:50.5587388Z .b8 8 // DW_FORM_string 2026-02-21T10:18:50.5587469Z .b8 16 // DW_AT_stmt_list 
2026-02-21T10:18:50.5587551Z .b8 6 // DW_FORM_data4 2026-02-21T10:18:50.5587627Z .b8 27 // DW_AT_comp_dir 2026-02-21T10:18:50.5587703Z .b8 8 // DW_FORM_string 2026-02-21T10:18:50.5587781Z .b8 0 // EOM(1) 2026-02-21T10:18:50.5587853Z .b8 0 // EOM(2) 2026-02-21T10:18:50.5587941Z .b8 2 // Abbreviation Code 2026-02-21T10:18:50.5588032Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:18:50.5588111Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:18:50.5588190Z .b8 3 // DW_AT_name 2026-02-21T10:18:50.5588270Z .b8 8 // DW_FORM_string 2026-02-21T10:18:50.5588350Z .b8 32 // DW_AT_inline 2026-02-21T10:18:50.5588492Z .b8 11 // DW_FORM_data1 2026-02-21T10:18:50.5588569Z .b8 0 // EOM(1) 2026-02-21T10:18:50.5588643Z .b8 0 // EOM(2) 2026-02-21T10:18:50.5588728Z .b8 3 // Abbreviation Code 2026-02-21T10:18:50.5588911Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:18:50.5589059Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:18:50.5589140Z .b8 17 // DW_AT_low_pc 2026-02-21T10:18:50.5589216Z .b8 1 // DW_FORM_addr 2026-02-21T10:18:50.5589300Z .b8 18 // DW_AT_high_pc 2026-02-21T10:18:50.5589376Z .b8 1 // DW_FORM_addr 2026-02-21T10:18:50.5589471Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:18:50.5589546Z .b8 19 // DW_FORM_ref4 2026-02-21T10:18:50.5589619Z .b8 0 // EOM(1) 2026-02-21T10:18:50.5589698Z .b8 0 // EOM(2) 2026-02-21T10:18:50.5589785Z .b8 4 // Abbreviation Code 2026-02-21T10:18:50.5589888Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T10:18:50.5589970Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:18:50.5590062Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:18:50.5590214Z .b8 19 // DW_FORM_ref4 2026-02-21T10:18:50.5590295Z .b8 17 // DW_AT_low_pc 2026-02-21T10:18:50.5590371Z .b8 1 // DW_FORM_addr 2026-02-21T10:18:50.5590507Z .b8 18 // DW_AT_high_pc 2026-02-21T10:18:50.5590585Z .b8 1 // DW_FORM_addr 2026-02-21T10:18:50.5590666Z .b8 88 // DW_AT_call_file 2026-02-21T10:18:50.5590744Z .b8 11 // DW_FORM_data1 2026-02-21T10:18:50.5590828Z .b8 89 // DW_AT_call_line 2026-02-21T10:18:50.5590906Z .b8 11 // DW_FORM_data1 
2026-02-21T10:18:50.5591000Z .b8 87 // DW_AT_call_column 2026-02-21T10:18:50.5591085Z .b8 11 // DW_FORM_data1 2026-02-21T10:18:50.5591161Z .b8 0 // EOM(1) 2026-02-21T10:18:50.5591230Z .b8 0 // EOM(2) 2026-02-21T10:18:50.5591300Z .b8 0 // EOM(3) 2026-02-21T10:18:50.5591352Z } 2026-02-21T10:18:50.5591414Z .section .debug_info 2026-02-21T10:18:50.5591465Z { 2026-02-21T10:18:50.5591556Z .b32 178 // Length of Unit 2026-02-21T10:18:50.5591648Z .b8 2 // DWARF version number 2026-02-21T10:18:50.5591699Z .b8 0 2026-02-21T10:18:50.5591830Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T10:18:50.5591926Z .b8 8 // Address Size (in bytes) 2026-02-21T10:18:50.5592038Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T10:18:50.5592124Z .b8 116 // DW_AT_producer 2026-02-21T10:18:50.5592181Z .b8 114 2026-02-21T10:18:50.5592233Z .b8 105 2026-02-21T10:18:50.5592284Z .b8 116 2026-02-21T10:18:50.5592340Z .b8 111 2026-02-21T10:18:50.5592392Z .b8 110 2026-02-21T10:18:50.5592444Z .b8 0 2026-02-21T10:18:50.5592521Z .b8 2 // DW_AT_language 2026-02-21T10:18:50.5592574Z .b8 0 2026-02-21T10:18:50.5592651Z .b8 99 // DW_AT_name 2026-02-21T10:18:50.5592704Z .b8 100 2026-02-21T10:18:50.5592758Z .b8 54 2026-02-21T10:18:50.5592810Z .b8 102 2026-02-21T10:18:50.5592860Z .b8 114 2026-02-21T10:18:50.5592909Z .b8 102 2026-02-21T10:18:50.5592963Z .b8 107 2026-02-21T10:18:50.5593014Z .b8 119 2026-02-21T10:18:50.5593066Z .b8 103 2026-02-21T10:18:50.5593119Z .b8 51 2026-02-21T10:18:50.5593169Z .b8 115 2026-02-21T10:18:50.5593220Z .b8 108 2026-02-21T10:18:50.5593270Z .b8 101 2026-02-21T10:18:50.5593323Z .b8 100 2026-02-21T10:18:50.5593374Z .b8 109 2026-02-21T10:18:50.5593497Z .b8 55 2026-02-21T10:18:50.5593550Z .b8 50 2026-02-21T10:18:50.5593600Z .b8 53 2026-02-21T10:18:50.5593698Z .b8 107 2026-02-21T10:18:50.5593749Z .b8 112 2026-02-21T10:18:50.5593801Z .b8 55 2026-02-21T10:18:50.5593853Z .b8 117 2026-02-21T10:18:50.5593902Z .b8 50 2026-02-21T10:18:50.5593952Z .b8 51 
2026-02-21T10:18:50.5594005Z .b8 50 2026-02-21T10:18:50.5594056Z .b8 100 2026-02-21T10:18:50.5594105Z .b8 98 2026-02-21T10:18:50.5594158Z .b8 112 2026-02-21T10:18:50.5594209Z .b8 110 2026-02-21T10:18:50.5594261Z .b8 119 2026-02-21T10:18:50.5594311Z .b8 52 2026-02-21T10:18:50.5594364Z .b8 120 2026-02-21T10:18:50.5594414Z .b8 104 2026-02-21T10:18:50.5594464Z .b8 115 2026-02-21T10:18:50.5594519Z .b8 51 2026-02-21T10:18:50.5594579Z .b8 122 2026-02-21T10:18:50.5594632Z .b8 100 2026-02-21T10:18:50.5594681Z .b8 97 2026-02-21T10:18:50.5594734Z .b8 98 2026-02-21T10:18:50.5594785Z .b8 100 2026-02-21T10:18:50.5594837Z .b8 121 2026-02-21T10:18:50.5594887Z .b8 110 2026-02-21T10:18:50.5594941Z .b8 51 2026-02-21T10:18:50.5594992Z .b8 117 2026-02-21T10:18:50.5595042Z .b8 109 2026-02-21T10:18:50.5595094Z .b8 53 2026-02-21T10:18:50.5595148Z .b8 102 2026-02-21T10:18:50.5595198Z .b8 109 2026-02-21T10:18:50.5595248Z .b8 115 2026-02-21T10:18:50.5595371Z .b8 107 2026-02-21T10:18:50.5595428Z .b8 117 2026-02-21T10:18:50.5595479Z .b8 115 2026-02-21T10:18:50.5595531Z .b8 46 2026-02-21T10:18:50.5595581Z .b8 112 2026-02-21T10:18:50.5595630Z .b8 121 2026-02-21T10:18:50.5595679Z .b8 0 2026-02-21T10:18:50.5595831Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T10:18:50.5595918Z .b8 47 // DW_AT_comp_dir 2026-02-21T10:18:50.5595972Z .b8 116 2026-02-21T10:18:50.5596026Z .b8 109 2026-02-21T10:18:50.5596078Z .b8 112 2026-02-21T10:18:50.5596128Z .b8 47 2026-02-21T10:18:50.5596179Z .b8 116 2026-02-21T10:18:50.5596237Z .b8 111 2026-02-21T10:18:50.5596300Z .b8 114 2026-02-21T10:18:50.5596353Z .b8 99 2026-02-21T10:18:50.5596408Z .b8 104 2026-02-21T10:18:50.5596577Z .b8 105 2026-02-21T10:18:50.5596636Z .b8 110 2026-02-21T10:18:50.5596689Z .b8 100 2026-02-21T10:18:50.5596751Z .b8 117 2026-02-21T10:18:50.5596809Z .b8 99 2026-02-21T10:18:50.5596863Z .b8 116 2026-02-21T10:18:50.5596915Z .b8 111 2026-02-21T10:18:50.5596974Z .b8 114 2026-02-21T10:18:50.5597025Z .b8 95 2026-02-21T10:18:50.5597079Z .b8 114 
2026-02-21T10:18:50.5597134Z .b8 111 2026-02-21T10:18:50.5597188Z .b8 111 2026-02-21T10:18:50.5597239Z .b8 116 2026-02-21T10:18:50.5597292Z .b8 47 2026-02-21T10:18:50.5597346Z .b8 100 2026-02-21T10:18:50.5597398Z .b8 54 2026-02-21T10:18:50.5597449Z .b8 0 2026-02-21T10:18:50.5597565Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T10:18:50.5597644Z .b8 95 // DW_AT_name 2026-02-21T10:18:50.5597697Z .b8 104 2026-02-21T10:18:50.5597746Z .b8 101 2026-02-21T10:18:50.5597802Z .b8 108 2026-02-21T10:18:50.5597855Z .b8 105 2026-02-21T10:18:50.5597906Z .b8 111 2026-02-21T10:18:50.5597963Z .b8 110 2026-02-21T10:18:50.5598013Z .b8 95 2026-02-21T10:18:50.5598067Z .b8 109 2026-02-21T10:18:50.5598117Z .b8 97 2026-02-21T10:18:50.5598174Z .b8 116 2026-02-21T10:18:50.5598227Z .b8 109 2026-02-21T10:18:50.5598278Z .b8 117 2026-02-21T10:18:50.5598331Z .b8 108 2026-02-21T10:18:50.5598387Z .b8 95 2026-02-21T10:18:50.5598438Z .b8 98 2026-02-21T10:18:50.5598491Z .b8 102 2026-02-21T10:18:50.5598546Z .b8 49 2026-02-21T10:18:50.5598597Z .b8 54 2026-02-21T10:18:50.5598648Z .b8 95 2026-02-21T10:18:50.5598699Z .b8 105 2026-02-21T10:18:50.5598757Z .b8 110 2026-02-21T10:18:50.5598809Z .b8 116 2026-02-21T10:18:50.5598860Z .b8 52 2026-02-21T10:18:50.5598914Z .b8 0 2026-02-21T10:18:50.5598992Z .b8 1 // DW_AT_inline 2026-02-21T10:18:50.5599097Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T10:18:50.5599191Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T10:18:50.5599288Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T10:18:50.5599385Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:18:50.5599605Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T10:18:50.5599770Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:18:50.5599865Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T10:18:50.5599952Z .b64 $L__tmp2 // DW_AT_high_pc 2026-02-21T10:18:50.5600041Z .b8 1 // DW_AT_call_file 2026-02-21T10:18:50.5600123Z .b8 90 // DW_AT_call_line 2026-02-21T10:18:50.5600209Z .b8 40 // 
DW_AT_call_column 2026-02-21T10:18:50.5600305Z .b8 0 // End Of Children Mark 2026-02-21T10:18:50.5600391Z .b8 0 // End Of Children Mark 2026-02-21T10:18:50.5600442Z } 2026-02-21T10:18:50.5600513Z .section .debug_macinfo { } 2026-02-21T10:18:50.5600519Z 2026-02-21T10:18:50.5600604Z ================================================================ 2026-02-21T10:18:50.5600722Z please share the reproducer above with Triton project. 2026-02-21T10:18:51.7220814Z 2026-02-21T10:18:51.7221024Z 2026-02-21T10:18:51.7221030Z 2026-02-21T10:18:51.7221172Z ================================================================ 2026-02-21T10:18:51.7221495Z Internal Triton PTX codegen error 2026-02-21T10:18:51.7221736Z `ptxas` stderr: 2026-02-21T10:18:51.7222492Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 1012 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T10:18:51.7223214Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:18:51.7223440Z 2026-02-21T10:18:51.7224016Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpsc4u94ut.ptx -o /tmp/tmpsc4u94ut.ptx.o 2026-02-21T10:18:51.7224670Z 2026-02-21T10:18:51.7224680Z 2026-02-21T10:18:51.7224749Z // 2026-02-21T10:18:51.7224909Z // Generated by LLVM NVPTX Back-End 2026-02-21T10:18:51.7225124Z // 2026-02-21T10:18:51.7225198Z 2026-02-21T10:18:51.7225263Z .version 8.7 2026-02-21T10:18:51.7225418Z .target sm_90a 2026-02-21T10:18:51.7225567Z .address_size 64 2026-02-21T10:18:51.7225669Z 2026-02-21T10:18:51.7225852Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T10:18:51.7226224Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T10:18:51.7226642Z // @_helion_matmul_bf16_int4 2026-02-21T10:18:51.7226927Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T10:18:51.7227551Z [3124s] Triton compile failed. 
This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T10:18:51.7229390Z Config: @helion.kernel(config=helion.Config(block_sizes=[16, 128, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=64, num_stages=1, num_warps=4, pid_type='persistent_interleaved', range_flattens=[True, False], range_multi_buffers=[True, False], range_num_stages=[4, 2], range_unroll_factors=[1, 0], range_warp_specializes=[]), static_shapes=True) 2026-02-21T10:18:51.7231046Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T10:18:51.7231333Z `ptxas` stderr: 2026-02-21T10:18:51.7231904Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 1012 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T10:18:51.7232561Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:18:51.7232745Z 2026-02-21T10:18:51.7233256Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpsc4u94ut.ptx -o /tmp/tmpsc4u94ut.ptx.o 2026-02-21T10:18:51.7233851Z 2026-02-21T10:18:51.7234004Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T10:18:51.7234581Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T10:18:51.7234946Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T10:18:51.7235314Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T10:18:51.7235664Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T10:18:51.7236023Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T10:18:51.7236291Z ) 2026-02-21T10:18:51.7236422Z .reqntid 128 2026-02-21T10:18:51.7236718Z .maxnreg 32 2026-02-21T10:18:51.7236859Z { 2026-02-21T10:18:51.7236996Z .reg .pred %p<64>; 2026-02-21T10:18:51.7237164Z .reg .b16 %rs<113>; 2026-02-21T10:18:51.7237331Z .reg .b32 %r<2657>; 2026-02-21T10:18:51.7237484Z .reg .b64 %rd<125>; 2026-02-21T10:18:51.7237799Z .loc 1 19 0 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:19:0 2026-02-21T10:18:51.7238165Z $L__func_begin0: 2026-02-21T10:18:51.7238465Z .loc 1 19 0 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:19:0 2026-02-21T10:18:51.7238858Z 2026-02-21T10:18:51.7238922Z // %bb.0: 2026-02-21T10:18:51.7239120Z ld.param.b64 %rd6, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T10:18:51.7239373Z $L__tmp0: 2026-02-21T10:18:51.7239726Z .loc 1 21 67 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:21:67 2026-02-21T10:18:51.7240111Z mov.u32 %r2520, %ctaid.x; 2026-02-21T10:18:51.7240339Z ld.param.b64 %rd9, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T10:18:51.7240595Z mov.u32 %r578, %ctaid.y; 2026-02-21T10:18:51.7240817Z ld.param.b64 %rd53, [_helion_matmul_bf16_int4_param_3]; 2026-02-21T10:18:51.7241065Z mov.u32 %r579, %ctaid.z; 2026-02-21T10:18:51.7241245Z mov.u32 %r580, %nctaid.x; 2026-02-21T10:18:51.7241414Z mov.u32 %r581, %nctaid.y; 2026-02-21T10:18:51.7241602Z mad.lo.s32 %r582, %r579, %r581, %r578; 2026-02-21T10:18:51.7241818Z mad.lo.s32 %r583, %r582, %r580, %r2520; 2026-02-21T10:18:51.7242030Z shl.b32 
%r584, %r583, 7; 2026-02-21T10:18:51.7242196Z cvt.s64.s32 %rd54, %r584; 2026-02-21T10:18:51.7242375Z add.s64 %rd23, %rd53, %rd54; 2026-02-21T10:18:51.7242556Z mov.u32 %r2, %tid.x; 2026-02-21T10:18:51.7242741Z setp.lt.u32 %p1, %r2, 32; 2026-02-21T10:18:51.7242915Z shl.b32 %r585, %r2, 2; 2026-02-21T10:18:51.7243082Z mov.b32 %r586, global_smem; 2026-02-21T10:18:51.7243272Z add.s32 %r504, %r586, %r585; 2026-02-21T10:18:51.7243442Z mov.b32 %r2380, 0; 2026-02-21T10:18:51.7243603Z // begin inline asm 2026-02-21T10:18:51.7243770Z @%p1 st.shared.b32 [ %r504 + 0 ], %r2380; 2026-02-21T10:18:51.7243977Z // end inline asm 2026-02-21T10:18:51.7244130Z bar.warp.sync -1; 2026-02-21T10:18:51.7244296Z setp.eq.b32 %p61, %r2, 0; 2026-02-21T10:18:51.7244470Z cvt.u64.u32 %rd8, %r586; 2026-02-21T10:18:51.7244641Z // begin inline asm 2026-02-21T10:18:51.7244951Z @%p61 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd8 + 0 ], %rd9; 2026-02-21T10:18:51.7245299Z // end inline asm 2026-02-21T10:18:51.7245452Z // begin inline asm 2026-02-21T10:18:51.7245717Z @%p61 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1; 2026-02-21T10:18:51.7246027Z // end inline asm 2026-02-21T10:18:51.7246172Z mov.b32 %r506, 128; 2026-02-21T10:18:51.7246332Z // begin inline asm 2026-02-21T10:18:51.7246757Z @%p61 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r506; 2026-02-21T10:18:51.7247106Z // end inline asm 2026-02-21T10:18:51.7247257Z mov.b32 %r507, 16; 2026-02-21T10:18:51.7247407Z // begin inline asm 2026-02-21T10:18:51.7247683Z @%p61 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r507; 2026-02-21T10:18:51.7248003Z // end inline asm 2026-02-21T10:18:51.7248154Z mov.b32 %r508, 1280; 2026-02-21T10:18:51.7248310Z // begin inline asm 2026-02-21T10:18:51.7248599Z @%p61 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r508; 2026-02-21T10:18:51.7249035Z // end inline asm 2026-02-21T10:18:51.7249254Z 
mov.b32 %r509, 4096; 2026-02-21T10:18:51.7249412Z // begin inline asm 2026-02-21T10:18:51.7249697Z @%p61 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r509; 2026-02-21T10:18:51.7250037Z // end inline asm 2026-02-21T10:18:51.7250180Z mov.b64 %rd16, 1280; 2026-02-21T10:18:51.7250337Z // begin inline asm 2026-02-21T10:18:51.7250636Z @%p61 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd8 + 0 ], 0x0, %rd16; 2026-02-21T10:18:51.7250992Z // end inline asm 2026-02-21T10:18:51.7251135Z mov.b32 %r2379, 1; 2026-02-21T10:18:51.7251295Z // begin inline asm 2026-02-21T10:18:51.7251613Z @%p61 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r2379; 2026-02-21T10:18:51.7251970Z // end inline asm 2026-02-21T10:18:51.7252122Z // begin inline asm 2026-02-21T10:18:51.7252422Z @%p61 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r2379; 2026-02-21T10:18:51.7252796Z // end inline asm 2026-02-21T10:18:51.7252943Z // begin inline asm 2026-02-21T10:18:51.7253294Z @%p61 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T10:18:51.7253622Z // end inline asm 2026-02-21T10:18:51.7253772Z // begin inline asm 2026-02-21T10:18:51.7254071Z @%p61 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T10:18:51.7254469Z // end inline asm 2026-02-21T10:18:51.7254626Z // begin inline asm 2026-02-21T10:18:51.7254902Z @%p61 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x3; 2026-02-21T10:18:51.7255228Z // end inline asm 2026-02-21T10:18:51.7255367Z // begin inline asm 2026-02-21T10:18:51.7255634Z @%p61 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T10:18:51.7255950Z // end inline asm 2026-02-21T10:18:51.7256092Z // begin inline asm 2026-02-21T10:18:51.7256649Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd23 + 0 ], 
[ %rd8 + 0 ], 0x80; 2026-02-21T10:18:51.7257142Z // end inline asm 2026-02-21T10:18:51.7257296Z // begin inline asm 2026-02-21T10:18:51.7257543Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd23 + 0 ], 0x80; 2026-02-21T10:18:51.7257855Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T10:18:51.7258075Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T10:18:51.7258286Z // end inline asm 2026-02-21T10:18:51.7258437Z bar.sync 0; 2026-02-21T10:18:51.7258599Z cvta.global.u64 %rd34, %rd23; 2026-02-21T10:18:51.7258942Z .loc 1 38 45 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:38:45 2026-02-21T10:18:51.7259306Z bfe.u32 %r5, %r2, 3, 4; 2026-02-21T10:18:51.7259478Z or.b32 %r6, %r5, 16; 2026-02-21T10:18:51.7259636Z or.b32 %r7, %r5, 32; 2026-02-21T10:18:51.7259788Z or.b32 %r8, %r5, 48; 2026-02-21T10:18:51.7259955Z or.b32 %r9, %r5, 64; 2026-02-21T10:18:51.7260114Z or.b32 %r10, %r5, 80; 2026-02-21T10:18:51.7260280Z or.b32 %r11, %r5, 96; 2026-02-21T10:18:51.7260439Z or.b32 %r12, %r5, 112; 2026-02-21T10:18:51.7260771Z .loc 1 26 145 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:26:145 2026-02-21T10:18:51.7261137Z sub.s32 %r589, 5120, %r2520; 2026-02-21T10:18:51.7261322Z mul.hi.s32 %r590, %r589, 1041204193; 2026-02-21T10:18:51.7261524Z shr.u32 %r591, %r590, 31; 2026-02-21T10:18:51.7261695Z shr.s32 %r592, %r590, 11; 2026-02-21T10:18:51.7261868Z add.s32 %r30, %r592, %r591; 2026-02-21T10:18:51.7262047Z mul.lo.s32 %r593, %r30, 8448; 2026-02-21T10:18:51.7262243Z setp.ne.b32 %p28, %r589, %r593; 2026-02-21T10:18:51.7262435Z setp.lt.u32 %p29, %r2520, 5121; 2026-02-21T10:18:51.7262631Z and.pred %p30, %p29, %p28; 2026-02-21T10:18:51.7262820Z selp.b32 %r31, 1, 0, %p30; 2026-02-21T10:18:51.7262996Z add.s32 %r32, %r30, %r31; 2026-02-21T10:18:51.7263318Z .loc 1 53 38 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:53:38 2026-02-21T10:18:51.7263761Z and.b32 %r33, %r2, 7; 2026-02-21T10:18:51.7263986Z shl.b32 %r34, %r33, 2; 2026-02-21T10:18:51.7264307Z .loc 1 
26 145 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:26:145 2026-02-21T10:18:51.7264667Z add.s32 %r512, %r586, 47104; 2026-02-21T10:18:51.7264845Z // begin inline asm 2026-02-21T10:18:51.7265049Z @%p61 mbarrier.init.shared::cta.b64 [%r512], 1; 2026-02-21T10:18:51.7265281Z // end inline asm 2026-02-21T10:18:51.7265429Z bar.sync 0; 2026-02-21T10:18:51.7265586Z add.s32 %r513, %r586, 47112; 2026-02-21T10:18:51.7265764Z // begin inline asm 2026-02-21T10:18:51.7265954Z @%p61 mbarrier.init.shared::cta.b64 [%r513], 1; 2026-02-21T10:18:51.7266172Z // end inline asm 2026-02-21T10:18:51.7266319Z bar.sync 0; 2026-02-21T10:18:51.7266582Z add.s32 %r514, %r586, 47120; 2026-02-21T10:18:51.7266778Z // begin inline asm 2026-02-21T10:18:51.7266964Z @%p61 mbarrier.init.shared::cta.b64 [%r514], 1; 2026-02-21T10:18:51.7267188Z // end inline asm 2026-02-21T10:18:51.7267347Z setp.lt.s32 %p31, %r32, 1; 2026-02-21T10:18:51.7267527Z setp.gt.s32 %p32, %r32, 0; 2026-02-21T10:18:51.7267928Z .loc 1 32 35 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:32:35 2026-02-21T10:18:51.7268292Z mul.hi.u32 %r594, %r2520, 1717986919; 2026-02-21T10:18:51.7268579Z shr.u32 %r595, %r594, 5; 2026-02-21T10:18:51.7268952Z .loc 1 33 33 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:33:33 2026-02-21T10:18:51.7269307Z shl.b32 %r596, %r595, 3; 2026-02-21T10:18:51.7269612Z .loc 1 34 39 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:34:39 2026-02-21T10:18:51.7269956Z sub.s32 %r597, 512, %r596; 2026-02-21T10:18:51.7270276Z .loc 1 34 52 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:34:52 2026-02-21T10:18:51.7270618Z min.s32 %r598, %r597, 8; 2026-02-21T10:18:51.7270920Z .loc 1 35 45 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:35:45 2026-02-21T10:18:51.7271270Z mul.lo.s32 %r599, %r595, 80; 2026-02-21T10:18:51.7271453Z sub.s32 %r600, %r2520, %r599; 2026-02-21T10:18:51.7271769Z .loc 1 36 51 // 
cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:36:51 2026-02-21T10:18:51.7272114Z div.s32 %r601, %r600, %r598; 2026-02-21T10:18:51.7272426Z .loc 1 35 64 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:35:64 2026-02-21T10:18:51.7272770Z mul.lo.s32 %r602, %r601, %r598; 2026-02-21T10:18:51.7272958Z sub.s32 %r603, %r600, %r602; 2026-02-21T10:18:51.7273261Z .loc 1 35 30 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:35:30 2026-02-21T10:18:51.7273611Z add.s32 %r604, %r603, %r596; 2026-02-21T10:18:51.7273923Z .loc 1 37 27 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:37:27 2026-02-21T10:18:51.7274263Z shl.b32 %r2377, %r604, 7; 2026-02-21T10:18:51.7274573Z .loc 1 38 32 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:38:32 2026-02-21T10:18:51.7274921Z or.b32 %r2521, %r2377, %r5; 2026-02-21T10:18:51.7275116Z or.b32 %r2522, %r2377, %r6; 2026-02-21T10:18:51.7275289Z or.b32 %r2523, %r2377, %r7; 2026-02-21T10:18:51.7275458Z or.b32 %r2524, %r2377, %r8; 2026-02-21T10:18:51.7275629Z or.b32 %r2525, %r2377, %r9; 2026-02-21T10:18:51.7275800Z or.b32 %r2526, %r2377, %r10; 2026-02-21T10:18:51.7275977Z or.b32 %r2527, %r2377, %r11; 2026-02-21T10:18:51.7276145Z or.b32 %r2528, %r2377, %r12; 2026-02-21T10:18:51.7276580Z .loc 1 39 27 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:39:27 2026-02-21T10:18:51.7276925Z shl.b32 %r2375, %r601, 7; 2026-02-21T10:18:51.7277231Z .loc 1 54 53 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:54:53 2026-02-21T10:18:51.7277578Z shl.b32 %r605, %r2521, 13; 2026-02-21T10:18:51.7277878Z shl.b32 %r606, %r2522, 13; 2026-02-21T10:18:51.7278048Z shl.b32 %r607, %r2523, 13; 2026-02-21T10:18:51.7278281Z shl.b32 %r608, %r2524, 13; 2026-02-21T10:18:51.7278452Z shl.b32 %r609, %r2525, 13; 2026-02-21T10:18:51.7278618Z shl.b32 %r610, %r2526, 13; 2026-02-21T10:18:51.7278789Z shl.b32 %r611, %r2527, 13; 2026-02-21T10:18:51.7278971Z shl.b32 %r612, %r2528, 13; 2026-02-21T10:18:51.7279282Z .loc 1 54 60 
// cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:54:60 2026-02-21T10:18:51.7279626Z or.b32 %r613, %r605, %r34; 2026-02-21T10:18:51.7279796Z or.b32 %r614, %r606, %r34; 2026-02-21T10:18:51.7279966Z or.b32 %r615, %r607, %r34; 2026-02-21T10:18:51.7280132Z or.b32 %r616, %r608, %r34; 2026-02-21T10:18:51.7280301Z or.b32 %r617, %r609, %r34; 2026-02-21T10:18:51.7280472Z or.b32 %r618, %r610, %r34; 2026-02-21T10:18:51.7280640Z or.b32 %r619, %r611, %r34; 2026-02-21T10:18:51.7280805Z or.b32 %r620, %r612, %r34; 2026-02-21T10:18:51.7281112Z .loc 1 54 32 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:54:32 2026-02-21T10:18:51.7281466Z mad.wide.s32 %rd26, %r613, 2, %rd6; 2026-02-21T10:18:51.7281747Z mad.wide.s32 %rd27, %r614, 2, %rd6; 2026-02-21T10:18:51.7281952Z mad.wide.s32 %rd28, %r615, 2, %rd6; 2026-02-21T10:18:51.7282148Z mad.wide.s32 %rd29, %r616, 2, %rd6; 2026-02-21T10:18:51.7282346Z mad.wide.s32 %rd30, %r617, 2, %rd6; 2026-02-21T10:18:51.7282551Z mad.wide.s32 %rd31, %r618, 2, %rd6; 2026-02-21T10:18:51.7282811Z mad.wide.s32 %rd32, %r619, 2, %rd6; 2026-02-21T10:18:51.7283010Z mad.wide.s32 %rd33, %r620, 2, %rd6; 2026-02-21T10:18:51.7283343Z .loc 1 54 80 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:54:80 2026-02-21T10:18:51.7283690Z and.b32 %r45, %r2, 127; 2026-02-21T10:18:51.7283861Z shl.b32 %r621, %r45, 3; 2026-02-21T10:18:51.7284031Z shr.u32 %r622, %r2, 1; 2026-02-21T10:18:51.7284195Z and.b32 %r623, %r622, 24; 2026-02-21T10:18:51.7284370Z xor.b32 %r46, %r621, %r623; 2026-02-21T10:18:51.7284544Z add.s32 %r515, %r586, %r46; 2026-02-21T10:18:51.7284726Z selp.b32 %r516, 8, 0, %p32; 2026-02-21T10:18:51.7284894Z // begin inline asm 2026-02-21T10:18:51.7285129Z cp.async.ca.shared.global [ %r515 + 0 ], [ %rd26 + 0 ], 0x8, %r516; 2026-02-21T10:18:51.7285399Z // end inline asm 2026-02-21T10:18:51.7285555Z add.s32 %r517, %r515, 1024; 2026-02-21T10:18:51.7285721Z // begin inline asm 2026-02-21T10:18:51.7285961Z cp.async.ca.shared.global [ %r517 + 0 ], 
[ %rd27 + 0 ], 0x8, %r516; 2026-02-21T10:18:51.7286234Z // end inline asm 2026-02-21T10:18:51.7286381Z add.s32 %r519, %r515, 2048; 2026-02-21T10:18:51.7286680Z // begin inline asm 2026-02-21T10:18:51.7286899Z cp.async.ca.shared.global [ %r519 + 0 ], [ %rd28 + 0 ], 0x8, %r516; 2026-02-21T10:18:51.7287163Z // end inline asm 2026-02-21T10:18:51.7287312Z add.s32 %r521, %r515, 3072; 2026-02-21T10:18:51.7287483Z // begin inline asm 2026-02-21T10:18:51.7287698Z cp.async.ca.shared.global [ %r521 + 0 ], [ %rd29 + 0 ], 0x8, %r516; 2026-02-21T10:18:51.7287983Z // end inline asm 2026-02-21T10:18:51.7288136Z add.s32 %r523, %r515, 4096; 2026-02-21T10:18:51.7288302Z // begin inline asm 2026-02-21T10:18:51.7288523Z cp.async.ca.shared.global [ %r523 + 0 ], [ %rd30 + 0 ], 0x8, %r516; 2026-02-21T10:18:51.7288783Z // end inline asm 2026-02-21T10:18:51.7288932Z add.s32 %r525, %r515, 5120; 2026-02-21T10:18:51.7289101Z // begin inline asm 2026-02-21T10:18:51.7289322Z cp.async.ca.shared.global [ %r525 + 0 ], [ %rd31 + 0 ], 0x8, %r516; 2026-02-21T10:18:51.7289581Z // end inline asm 2026-02-21T10:18:51.7289731Z add.s32 %r527, %r515, 6144; 2026-02-21T10:18:51.7289910Z // begin inline asm 2026-02-21T10:18:51.7290126Z cp.async.ca.shared.global [ %r527 + 0 ], [ %rd32 + 0 ], 0x8, %r516; 2026-02-21T10:18:51.7290393Z // end inline asm 2026-02-21T10:18:51.7290537Z add.s32 %r529, %r515, 7168; 2026-02-21T10:18:51.7290710Z // begin inline asm 2026-02-21T10:18:51.7290923Z cp.async.ca.shared.global [ %r529 + 0 ], [ %rd33 + 0 ], 0x8, %r516; 2026-02-21T10:18:51.7291284Z // end inline asm 2026-02-21T10:18:51.7291496Z cp.async.commit_group; 2026-02-21T10:18:51.7291834Z .loc 1 26 145 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:26:145 2026-02-21T10:18:51.7292196Z bar.sync 0; 2026-02-21T10:18:51.7292347Z and.pred %p22, %p61, %p32; 2026-02-21T10:18:51.7292529Z // begin inline asm 2026-02-21T10:18:51.7292743Z @%p22 mbarrier.arrive.expect_tx.shared.b64 _, [%r512], 2048; 2026-02-21T10:18:51.7293012Z 
// end inline asm 2026-02-21T10:18:51.7293302Z .loc 1 60 33 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:60:33 2026-02-21T10:18:51.7293647Z bar.sync 0; 2026-02-21T10:18:51.7293797Z elect.sync %r624|%p33, -1; 2026-02-21T10:18:51.7293981Z and.pred %p34, %p32, %p33; 2026-02-21T10:18:51.7294159Z and.pred %p23, %p1, %p34; 2026-02-21T10:18:51.7294338Z add.s32 %r532, %r586, 40960; 2026-02-21T10:18:51.7294511Z // begin inline asm 2026-02-21T10:18:51.7294922Z @%p23 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r532], [%rd34, {%r2375, %r2380}], [%r512]; 2026-02-21T10:18:51.7295383Z // end inline asm 2026-02-21T10:18:51.7295748Z .loc 1 54 32 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:54:32 2026-02-21T10:18:51.7296106Z cvt.s64.s32 %rd55, %r605; 2026-02-21T10:18:51.7296278Z cvt.u64.u32 %rd56, %r34; 2026-02-21T10:18:51.7296578Z or.b64 %rd57, %rd55, %rd56; 2026-02-21T10:18:51.7296845Z shl.b64 %rd58, %rd57, 1; 2026-02-21T10:18:51.7297024Z add.s64 %rd59, %rd6, %rd58; 2026-02-21T10:18:51.7297202Z add.s64 %rd35, %rd59, 64; 2026-02-21T10:18:51.7297376Z cvt.s64.s32 %rd60, %r606; 2026-02-21T10:18:51.7297547Z or.b64 %rd61, %rd60, %rd56; 2026-02-21T10:18:51.7297716Z shl.b64 %rd62, %rd61, 1; 2026-02-21T10:18:51.7297885Z add.s64 %rd63, %rd6, %rd62; 2026-02-21T10:18:51.7298067Z add.s64 %rd36, %rd63, 64; 2026-02-21T10:18:51.7298244Z cvt.s64.s32 %rd64, %r607; 2026-02-21T10:18:51.7298418Z or.b64 %rd65, %rd64, %rd56; 2026-02-21T10:18:51.7298595Z shl.b64 %rd66, %rd65, 1; 2026-02-21T10:18:51.7298766Z add.s64 %rd67, %rd6, %rd66; 2026-02-21T10:18:51.7298942Z add.s64 %rd37, %rd67, 64; 2026-02-21T10:18:51.7299116Z cvt.s64.s32 %rd68, %r608; 2026-02-21T10:18:51.7299280Z or.b64 %rd69, %rd68, %rd56; 2026-02-21T10:18:51.7299454Z shl.b64 %rd70, %rd69, 1; 2026-02-21T10:18:51.7299630Z add.s64 %rd71, %rd6, %rd70; 2026-02-21T10:18:51.7299808Z add.s64 %rd38, %rd71, 64; 2026-02-21T10:18:51.7299982Z cvt.s64.s32 %rd72, %r609; 2026-02-21T10:18:51.7300152Z 
or.b64 %rd73, %rd72, %rd56; 2026-02-21T10:18:51.7300325Z shl.b64 %rd74, %rd73, 1; 2026-02-21T10:18:51.7300494Z add.s64 %rd75, %rd6, %rd74; 2026-02-21T10:18:51.7300667Z add.s64 %rd39, %rd75, 64; 2026-02-21T10:18:51.7300831Z cvt.s64.s32 %rd76, %r610; 2026-02-21T10:18:51.7301001Z or.b64 %rd77, %rd76, %rd56; 2026-02-21T10:18:51.7301173Z shl.b64 %rd78, %rd77, 1; 2026-02-21T10:18:51.7301345Z add.s64 %rd79, %rd6, %rd78; 2026-02-21T10:18:51.7301514Z add.s64 %rd40, %rd79, 64; 2026-02-21T10:18:51.7301680Z cvt.s64.s32 %rd80, %r611; 2026-02-21T10:18:51.7301850Z or.b64 %rd81, %rd80, %rd56; 2026-02-21T10:18:51.7302025Z shl.b64 %rd82, %rd81, 1; 2026-02-21T10:18:51.7302196Z add.s64 %rd83, %rd6, %rd82; 2026-02-21T10:18:51.7302366Z add.s64 %rd41, %rd83, 64; 2026-02-21T10:18:51.7302537Z cvt.s64.s32 %rd84, %r612; 2026-02-21T10:18:51.7302704Z or.b64 %rd85, %rd84, %rd56; 2026-02-21T10:18:51.7302883Z shl.b64 %rd86, %rd85, 1; 2026-02-21T10:18:51.7303059Z add.s64 %rd87, %rd6, %rd86; 2026-02-21T10:18:51.7303238Z add.s64 %rd42, %rd87, 64; 2026-02-21T10:18:51.7310707Z .loc 1 54 80 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:54:80 2026-02-21T10:18:51.7311113Z add.s32 %r536, %r515, 8192; 2026-02-21T10:18:51.7311312Z // begin inline asm 2026-02-21T10:18:51.7311562Z cp.async.ca.shared.global [ %r536 + 0 ], [ %rd35 + 0 ], 0x8, %r516; 2026-02-21T10:18:51.7311847Z // end inline asm 2026-02-21T10:18:51.7312178Z add.s32 %r538, %r515, 9216; 2026-02-21T10:18:51.7312359Z // begin inline asm 2026-02-21T10:18:51.7312666Z cp.async.ca.shared.global [ %r538 + 0 ], [ %rd36 + 0 ], 0x8, %r516; 2026-02-21T10:18:51.7312942Z // end inline asm 2026-02-21T10:18:51.7313105Z add.s32 %r540, %r515, 10240; 2026-02-21T10:18:51.7313279Z // begin inline asm 2026-02-21T10:18:51.7313508Z cp.async.ca.shared.global [ %r540 + 0 ], [ %rd37 + 0 ], 0x8, %r516; 2026-02-21T10:18:51.7313779Z // end inline asm 2026-02-21T10:18:51.7313961Z add.s32 %r542, %r515, 11264; 2026-02-21T10:18:51.7314153Z // begin inline asm 
2026-02-21T10:18:51.7314392Z cp.async.ca.shared.global [ %r542 + 0 ], [ %rd38 + 0 ], 0x8, %r516; 2026-02-21T10:18:51.7314672Z // end inline asm 2026-02-21T10:18:51.7314828Z add.s32 %r544, %r515, 12288; 2026-02-21T10:18:51.7315005Z // begin inline asm 2026-02-21T10:18:51.7315226Z cp.async.ca.shared.global [ %r544 + 0 ], [ %rd39 + 0 ], 0x8, %r516; 2026-02-21T10:18:51.7315500Z // end inline asm 2026-02-21T10:18:51.7315651Z add.s32 %r546, %r515, 13312; 2026-02-21T10:18:51.7315832Z // begin inline asm 2026-02-21T10:18:51.7316051Z cp.async.ca.shared.global [ %r546 + 0 ], [ %rd40 + 0 ], 0x8, %r516; 2026-02-21T10:18:51.7316399Z // end inline asm 2026-02-21T10:18:51.7316724Z add.s32 %r548, %r515, 14336; 2026-02-21T10:18:51.7316906Z // begin inline asm 2026-02-21T10:18:51.7317156Z cp.async.ca.shared.global [ %r548 + 0 ], [ %rd41 + 0 ], 0x8, %r516; 2026-02-21T10:18:51.7317431Z // end inline asm 2026-02-21T10:18:51.7317661Z add.s32 %r550, %r515, 15360; 2026-02-21T10:18:51.7317840Z // begin inline asm 2026-02-21T10:18:51.7318067Z cp.async.ca.shared.global [ %r550 + 0 ], [ %rd42 + 0 ], 0x8, %r516; 2026-02-21T10:18:51.7318333Z // end inline asm 2026-02-21T10:18:51.7318492Z cp.async.commit_group; 2026-02-21T10:18:51.7318828Z .loc 1 26 145 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:26:145 2026-02-21T10:18:51.7319195Z bar.sync 0; 2026-02-21T10:18:51.7319355Z // begin inline asm 2026-02-21T10:18:51.7319602Z @%p22 mbarrier.arrive.expect_tx.shared.b64 _, [%r513], 2048; 2026-02-21T10:18:51.7319874Z // end inline asm 2026-02-21T10:18:51.7320176Z .loc 1 60 33 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:60:33 2026-02-21T10:18:51.7320526Z bar.sync 0; 2026-02-21T10:18:51.7320686Z elect.sync %r625|%p35, -1; 2026-02-21T10:18:51.7320886Z and.pred %p36, %p32, %p35; 2026-02-21T10:18:51.7321079Z and.pred %p25, %p1, %p36; 2026-02-21T10:18:51.7321263Z add.s32 %r553, %r586, 43008; 2026-02-21T10:18:51.7321444Z // begin inline asm 2026-02-21T10:18:51.7321862Z @%p25 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r553], [%rd34, {%r2375, %r507}], [%r513]; 2026-02-21T10:18:51.7322326Z // end inline asm 2026-02-21T10:18:51.7322626Z .loc 1 54 32 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:54:32 2026-02-21T10:18:51.7322990Z add.s64 %rd44, %rd59, 128; 2026-02-21T10:18:51.7323175Z add.s64 %rd45, %rd63, 128; 2026-02-21T10:18:51.7323347Z add.s64 %rd46, %rd67, 128; 2026-02-21T10:18:51.7323522Z add.s64 %rd47, %rd71, 128; 2026-02-21T10:18:51.7323693Z add.s64 %rd48, %rd75, 128; 2026-02-21T10:18:51.7323868Z add.s64 %rd49, %rd79, 128; 2026-02-21T10:18:51.7324036Z add.s64 %rd50, %rd83, 128; 2026-02-21T10:18:51.7324213Z add.s64 %rd51, %rd87, 128; 2026-02-21T10:18:51.7324523Z .loc 1 54 80 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:54:80 2026-02-21T10:18:51.7324879Z add.s32 %r557, %r515, 16384; 2026-02-21T10:18:51.7325061Z // begin inline asm 2026-02-21T10:18:51.7325300Z cp.async.ca.shared.global [ %r557 + 0 ], [ %rd44 + 0 ], 0x8, %r516; 2026-02-21T10:18:51.7325576Z // end inline asm 2026-02-21T10:18:51.7325728Z add.s32 %r559, %r515, 17408; 2026-02-21T10:18:51.7325902Z // begin inline asm 2026-02-21T10:18:51.7326140Z cp.async.ca.shared.global [ %r559 + 0 ], [ %rd45 + 0 ], 0x8, %r516; 2026-02-21T10:18:51.7326419Z // end inline asm 2026-02-21T10:18:51.7326805Z add.s32 %r561, %r515, 18432; 2026-02-21T10:18:51.7326983Z // begin inline asm 2026-02-21T10:18:51.7327294Z cp.async.ca.shared.global [ %r561 + 0 ], [ %rd46 + 0 ], 0x8, %r516; 2026-02-21T10:18:51.7327567Z // end inline asm 2026-02-21T10:18:51.7327719Z add.s32 %r563, %r515, 19456; 2026-02-21T10:18:51.7327889Z // begin inline asm 2026-02-21T10:18:51.7328114Z cp.async.ca.shared.global [ %r563 + 0 ], [ %rd47 + 0 ], 0x8, %r516; 2026-02-21T10:18:51.7328396Z // end inline asm 2026-02-21T10:18:51.7328556Z add.s32 %r565, %r515, 20480; 2026-02-21T10:18:51.7328732Z // begin inline asm 2026-02-21T10:18:51.7328960Z cp.async.ca.shared.global [ 
%r565 + 0 ], [ %rd48 + 0 ], 0x8, %r516; 2026-02-21T10:18:51.7329229Z // end inline asm 2026-02-21T10:18:51.7329382Z add.s32 %r567, %r515, 21504; 2026-02-21T10:18:51.7329559Z // begin inline asm 2026-02-21T10:18:51.7329776Z cp.async.ca.shared.global [ %r567 + 0 ], [ %rd49 + 0 ], 0x8, %r516; 2026-02-21T10:18:51.7330055Z // end inline asm 2026-02-21T10:18:51.7330205Z add.s32 %r569, %r515, 22528; 2026-02-21T10:18:51.7330384Z // begin inline asm 2026-02-21T10:18:51.7330677Z cp.async.ca.shared.global [ %r569 + 0 ], [ %rd50 + 0 ], 0x8, %r516; 2026-02-21T10:18:51.7330956Z // end inline asm 2026-02-21T10:18:51.7331102Z add.s32 %r571, %r515, 23552; 2026-02-21T10:18:51.7331276Z // begin inline asm 2026-02-21T10:18:51.7331500Z cp.async.ca.shared.global [ %r571 + 0 ], [ %rd51 + 0 ], 0x8, %r516; 2026-02-21T10:18:51.7331847Z // end inline asm 2026-02-21T10:18:51.7332009Z cp.async.commit_group; 2026-02-21T10:18:51.7332335Z .loc 1 26 145 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:26:145 2026-02-21T10:18:51.7332695Z bar.sync 0; 2026-02-21T10:18:51.7332837Z // begin inline asm 2026-02-21T10:18:51.7333064Z @%p22 mbarrier.arrive.expect_tx.shared.b64 _, [%r514], 2048; 2026-02-21T10:18:51.7333325Z // end inline asm 2026-02-21T10:18:51.7333621Z .loc 1 60 33 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:60:33 2026-02-21T10:18:51.7333972Z bar.sync 0; 2026-02-21T10:18:51.7334122Z elect.sync %r626|%p37, -1; 2026-02-21T10:18:51.7334314Z and.pred %p38, %p32, %p37; 2026-02-21T10:18:51.7334495Z and.pred %p27, %p1, %p38; 2026-02-21T10:18:51.7334679Z add.s32 %r574, %r586, 45056; 2026-02-21T10:18:51.7334853Z mov.b32 %r2381, 32; 2026-02-21T10:18:51.7335014Z // begin inline asm 2026-02-21T10:18:51.7335440Z @%p27 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r574], [%rd34, {%r2375, %r2381}], [%r514]; 2026-02-21T10:18:51.7335899Z // end inline asm 2026-02-21T10:18:51.7336206Z .loc 1 26 145 // 
cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:26:145 2026-02-21T10:18:51.7336703Z @%p31 bra $L__BB0_7; 2026-02-21T10:18:51.7336890Z // %bb.1: // %.lr.ph 2026-02-21T10:18:51.7337262Z .loc 1 0 145 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:0:145 2026-02-21T10:18:51.7337673Z ld.param.b64 %rd7, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T10:18:51.7337922Z shr.u32 %r3, %r2, 5; 2026-02-21T10:18:51.7338085Z and.b32 %r4, %r2, 120; 2026-02-21T10:18:51.7338258Z shr.u32 %r587, %r2, 4; 2026-02-21T10:18:51.7338436Z bfe.u32 %r13, %r2, 4, 3; 2026-02-21T10:18:51.7338611Z or.b32 %r14, %r13, 8; 2026-02-21T10:18:51.7338775Z or.b32 %r15, %r13, 16; 2026-02-21T10:18:51.7338944Z or.b32 %r16, %r13, 24; 2026-02-21T10:18:51.7339100Z or.b32 %r17, %r13, 32; 2026-02-21T10:18:51.7339260Z or.b32 %r18, %r13, 40; 2026-02-21T10:18:51.7339416Z or.b32 %r19, %r13, 48; 2026-02-21T10:18:51.7339581Z or.b32 %r20, %r587, 56; 2026-02-21T10:18:51.7339745Z or.b32 %r21, %r13, 64; 2026-02-21T10:18:51.7339905Z or.b32 %r22, %r13, 72; 2026-02-21T10:18:51.7340068Z or.b32 %r23, %r13, 80; 2026-02-21T10:18:51.7340229Z or.b32 %r24, %r13, 88; 2026-02-21T10:18:51.7340390Z or.b32 %r25, %r13, 96; 2026-02-21T10:18:51.7340548Z or.b32 %r26, %r13, 104; 2026-02-21T10:18:51.7340820Z or.b32 %r27, %r13, 112; 2026-02-21T10:18:51.7340981Z or.b32 %r28, %r587, 120; 2026-02-21T10:18:51.7341210Z shl.b32 %r588, %r2, 3; 2026-02-21T10:18:51.7341371Z and.b32 %r29, %r588, 120; 2026-02-21T10:18:51.7341699Z .loc 1 26 145 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:26:145 2026-02-21T10:18:51.7342053Z shl.b32 %r633, %r32, 8; 2026-02-21T10:18:51.7342223Z add.s32 %r47, %r633, -3; 2026-02-21T10:18:51.7342393Z shl.b32 %r634, %r2, 5; 2026-02-21T10:18:51.7342558Z and.b32 %r635, %r634, 3072; 2026-02-21T10:18:51.7342739Z shl.b32 %r636, %r2, 4; 2026-02-21T10:18:51.7342900Z and.b32 %r637, %r636, 448; 2026-02-21T10:18:51.7343072Z and.b32 %r638, %r2, 3; 2026-02-21T10:18:51.7343228Z shl.b32 %r639, %r638, 1; 
2026-02-21T10:18:51.7343392Z and.b32 %r640, %r2, 24; 2026-02-21T10:18:51.7343552Z or.b32 %r641, %r635, %r637; 2026-02-21T10:18:51.7343728Z or.b32 %r642, %r639, %r640; 2026-02-21T10:18:51.7343897Z or.b32 %r48, %r641, %r642; 2026-02-21T10:18:51.7344074Z xor.b32 %r49, %r48, 8; 2026-02-21T10:18:51.7344236Z xor.b32 %r50, %r48, 16; 2026-02-21T10:18:51.7344396Z xor.b32 %r51, %r48, 24; 2026-02-21T10:18:51.7344572Z shl.b32 %r643, %r45, 7; 2026-02-21T10:18:51.7344813Z shl.b32 %r644, %r33, 4; 2026-02-21T10:18:51.7344985Z or.b32 %r645, %r643, %r644; 2026-02-21T10:18:51.7345159Z add.s32 %r647, %r586, 24576; 2026-02-21T10:18:51.7345333Z add.s32 %r52, %r647, %r645; 2026-02-21T10:18:51.7345509Z xor.b32 %r648, %r645, 16; 2026-02-21T10:18:51.7345735Z add.s32 %r53, %r647, %r648; 2026-02-21T10:18:51.7345911Z xor.b32 %r649, %r645, 32; 2026-02-21T10:18:51.7346075Z add.s32 %r54, %r647, %r649; 2026-02-21T10:18:51.7346249Z xor.b32 %r650, %r645, 48; 2026-02-21T10:18:51.7346408Z add.s32 %r55, %r647, %r650; 2026-02-21T10:18:51.7346706Z xor.b32 %r651, %r645, 64; 2026-02-21T10:18:51.7346884Z add.s32 %r56, %r647, %r651; 2026-02-21T10:18:51.7347060Z xor.b32 %r652, %r645, 80; 2026-02-21T10:18:51.7347223Z add.s32 %r57, %r647, %r652; 2026-02-21T10:18:51.7347403Z xor.b32 %r653, %r645, 96; 2026-02-21T10:18:51.7347570Z add.s32 %r58, %r647, %r653; 2026-02-21T10:18:51.7347739Z xor.b32 %r654, %r645, 112; 2026-02-21T10:18:51.7347917Z add.s32 %r59, %r647, %r654; 2026-02-21T10:18:51.7348089Z bfe.u32 %r655, %r647, 4, 14; 2026-02-21T10:18:51.7348273Z cvt.u64.u32 %rd88, %r655; 2026-02-21T10:18:51.7348516Z or.b64 %rd92, %rd88, 4611686293372403712; 2026-02-21T10:18:51.7348729Z add.s32 %r656, %r586, 24608; 2026-02-21T10:18:51.7348903Z bfe.u32 %r657, %r656, 4, 14; 2026-02-21T10:18:51.7349076Z cvt.u64.u32 %rd89, %r657; 2026-02-21T10:18:51.7349254Z or.b64 %rd93, %rd89, 4611686293372403712; 2026-02-21T10:18:51.7349455Z add.s32 %r658, %r586, 24640; 2026-02-21T10:18:51.7349629Z bfe.u32 %r659, %r658, 4, 14; 
2026-02-21T10:18:51.7349793Z cvt.u64.u32 %rd90, %r659; 2026-02-21T10:18:51.7349976Z or.b64 %rd94, %rd90, 4611686293372403712; 2026-02-21T10:18:51.7350170Z add.s32 %r660, %r586, 24672; 2026-02-21T10:18:51.7350340Z bfe.u32 %r661, %r660, 4, 14; 2026-02-21T10:18:51.7350509Z cvt.u64.u32 %rd91, %r661; 2026-02-21T10:18:51.7350694Z or.b64 %rd95, %rd91, 4611686293372403712; 2026-02-21T10:18:51.7350888Z shl.b32 %r662, %r638, 11; 2026-02-21T10:18:51.7351056Z shl.b32 %r663, %r638, 5; 2026-02-21T10:18:51.7351226Z shl.b32 %r664, %r4, 4; 2026-02-21T10:18:51.7351386Z and.b32 %r666, %r585, 16; 2026-02-21T10:18:51.7351553Z or.b32 %r667, %r664, %r666; 2026-02-21T10:18:51.7351722Z or.b32 %r668, %r667, %r662; 2026-02-21T10:18:51.7351900Z or.b32 %r669, %r668, %r663; 2026-02-21T10:18:51.7352083Z add.s32 %r60, %r647, %r669; 2026-02-21T10:18:51.7352256Z xor.b32 %r670, %r669, 32; 2026-02-21T10:18:51.7352418Z add.s32 %r61, %r647, %r670; 2026-02-21T10:18:51.7352593Z xor.b32 %r671, %r669, 64; 2026-02-21T10:18:51.7352753Z add.s32 %r62, %r647, %r671; 2026-02-21T10:18:51.7352924Z xor.b32 %r672, %r669, 96; 2026-02-21T10:18:51.7353089Z add.s32 %r63, %r647, %r672; 2026-02-21T10:18:51.7353256Z shl.b32 %r673, %r640, 8; 2026-02-21T10:18:51.7353525Z and.b32 %r674, %r585, 496; 2026-02-21T10:18:51.7353693Z or.b32 %r675, %r673, %r663; 2026-02-21T10:18:51.7353930Z xor.b32 %r676, %r675, %r674; 2026-02-21T10:18:51.7354115Z add.s32 %r2125, %r647, %r676; 2026-02-21T10:18:51.7354295Z add.s32 %r2130, %r2125, 512; 2026-02-21T10:18:51.7354472Z add.s32 %r2135, %r2125, 1024; 2026-02-21T10:18:51.7354645Z add.s32 %r2140, %r2125, 1536; 2026-02-21T10:18:51.7354820Z shl.b32 %r677, %r30, 8; 2026-02-21T10:18:51.7354983Z shl.b32 %r678, %r31, 8; 2026-02-21T10:18:51.7355156Z add.s32 %r68, %r677, %r678; 2026-02-21T10:18:51.7355330Z mov.b32 %r2387, 0f00000000; 2026-02-21T10:18:51.7355496Z mov.b32 %r2384, 2; 2026-02-21T10:18:51.7355654Z mov.b32 %r2383, -1; 2026-02-21T10:18:51.7355812Z mov.b32 %r2376, %r2375; 
2026-02-21T10:18:51.7355978Z mov.b32 %r2378, %r2377; 2026-02-21T10:18:51.7356137Z mov.b32 %r2382, %r2380; 2026-02-21T10:18:51.7356298Z mov.b32 %r2385, %r2375; 2026-02-21T10:18:51.7356573Z mov.b32 %r2386, %r2377; 2026-02-21T10:18:51.7356760Z mov.b32 %r2388, %r2387; 2026-02-21T10:18:51.7356916Z mov.b32 %r2389, %r2387; 2026-02-21T10:18:51.7357080Z mov.b32 %r2390, %r2387; 2026-02-21T10:18:51.7357235Z mov.b32 %r2391, %r2387; 2026-02-21T10:18:51.7357476Z mov.b32 %r2392, %r2387; 2026-02-21T10:18:51.7357641Z mov.b32 %r2393, %r2387; 2026-02-21T10:18:51.7357797Z mov.b32 %r2394, %r2387; 2026-02-21T10:18:51.7357960Z mov.b32 %r2395, %r2387; 2026-02-21T10:18:51.7358115Z mov.b32 %r2396, %r2387; 2026-02-21T10:18:51.7358334Z mov.b32 %r2397, %r2387; 2026-02-21T10:18:51.7358497Z mov.b32 %r2398, %r2387; 2026-02-21T10:18:51.7358657Z mov.b32 %r2399, %r2387; 2026-02-21T10:18:51.7358813Z mov.b32 %r2400, %r2387; 2026-02-21T10:18:51.7358973Z mov.b32 %r2401, %r2387; 2026-02-21T10:18:51.7359134Z mov.b32 %r2402, %r2387; 2026-02-21T10:18:51.7359298Z mov.b32 %r2403, %r2387; 2026-02-21T10:18:51.7359457Z mov.b32 %r2404, %r2387; 2026-02-21T10:18:51.7359616Z mov.b32 %r2405, %r2387; 2026-02-21T10:18:51.7359777Z mov.b32 %r2406, %r2387; 2026-02-21T10:18:51.7359956Z mov.b32 %r2407, %r2387; 2026-02-21T10:18:51.7360117Z mov.b32 %r2408, %r2387; 2026-02-21T10:18:51.7360281Z mov.b32 %r2409, %r2387; 2026-02-21T10:18:51.7360444Z mov.b32 %r2410, %r2387; 2026-02-21T10:18:51.7360599Z mov.b32 %r2411, %r2387; 2026-02-21T10:18:51.7360765Z mov.b32 %r2412, %r2387; 2026-02-21T10:18:51.7360924Z mov.b32 %r2413, %r2387; 2026-02-21T10:18:51.7361083Z mov.b32 %r2414, %r2387; 2026-02-21T10:18:51.7361242Z mov.b32 %r2415, %r2387; 2026-02-21T10:18:51.7361402Z mov.b32 %r2416, %r2387; 2026-02-21T10:18:51.7361564Z mov.b32 %r2417, %r2387; 2026-02-21T10:18:51.7361720Z mov.b32 %r2418, %r2387; 2026-02-21T10:18:51.7361879Z mov.b32 %r2419, %r2387; 2026-02-21T10:18:51.7362034Z mov.b32 %r2420, %r2387; 2026-02-21T10:18:51.7362192Z mov.b32 
%r2421, %r2387; 2026-02-21T10:18:51.7362347Z mov.b32 %r2422, %r2387; 2026-02-21T10:18:51.7362507Z mov.b32 %r2423, %r2387; 2026-02-21T10:18:51.7362663Z mov.b32 %r2424, %r2387; 2026-02-21T10:18:51.7362826Z mov.b32 %r2425, %r2387; 2026-02-21T10:18:51.7362990Z mov.b32 %r2426, %r2387; 2026-02-21T10:18:51.7363148Z mov.b32 %r2427, %r2387; 2026-02-21T10:18:51.7363305Z mov.b32 %r2428, %r2387; 2026-02-21T10:18:51.7363466Z mov.b32 %r2429, %r2387; 2026-02-21T10:18:51.7363626Z mov.b32 %r2430, %r2387; 2026-02-21T10:18:51.7363797Z mov.b32 %r2431, %r2387; 2026-02-21T10:18:51.7363960Z mov.b32 %r2432, %r2387; 2026-02-21T10:18:51.7364114Z mov.b32 %r2433, %r2387; 2026-02-21T10:18:51.7364276Z mov.b32 %r2434, %r2387; 2026-02-21T10:18:51.7364433Z mov.b32 %r2435, %r2387; 2026-02-21T10:18:51.7364592Z mov.b32 %r2436, %r2387; 2026-02-21T10:18:51.7364749Z mov.b32 %r2437, %r2387; 2026-02-21T10:18:51.7364905Z mov.b32 %r2438, %r2387; 2026-02-21T10:18:51.7365064Z mov.b32 %r2439, %r2387; 2026-02-21T10:18:51.7365220Z mov.b32 %r2440, %r2387; 2026-02-21T10:18:51.7365377Z mov.b32 %r2441, %r2387; 2026-02-21T10:18:51.7365531Z mov.b32 %r2442, %r2387; 2026-02-21T10:18:51.7365693Z mov.b32 %r2443, %r2387; 2026-02-21T10:18:51.7365930Z mov.b32 %r2444, %r2387; 2026-02-21T10:18:51.7366088Z mov.b32 %r2445, %r2387; 2026-02-21T10:18:51.7366306Z mov.b32 %r2446, %r2387; 2026-02-21T10:18:51.7366589Z mov.b32 %r2447, %r2387; 2026-02-21T10:18:51.7366758Z mov.b32 %r2448, %r2387; 2026-02-21T10:18:51.7366914Z mov.b32 %r2449, %r2387; 2026-02-21T10:18:51.7367074Z mov.b32 %r2450, %r2387; 2026-02-21T10:18:51.7367228Z mov.b32 %r2451, %r2387; 2026-02-21T10:18:51.7367385Z mov.b32 %r2452, %r2387; 2026-02-21T10:18:51.7367545Z mov.b32 %r2453, %r2387; 2026-02-21T10:18:51.7367703Z mov.b32 %r2454, %r2387; 2026-02-21T10:18:51.7367857Z mov.b32 %r2455, %r2387; 2026-02-21T10:18:51.7368015Z mov.b32 %r2456, %r2387; 2026-02-21T10:18:51.7368172Z mov.b32 %r2457, %r2387; 2026-02-21T10:18:51.7368334Z mov.b32 %r2458, %r2387; 
2026-02-21T10:18:51.7368494Z mov.b32 %r2459, %r2387; 2026-02-21T10:18:51.7368650Z mov.b32 %r2460, %r2387; 2026-02-21T10:18:51.7368811Z mov.b32 %r2461, %r2387; 2026-02-21T10:18:51.7368970Z mov.b32 %r2462, %r2387; 2026-02-21T10:18:51.7369133Z mov.b32 %r2463, %r2387; 2026-02-21T10:18:51.7369289Z mov.b32 %r2464, %r2387; 2026-02-21T10:18:51.7369448Z mov.b32 %r2465, %r2387; 2026-02-21T10:18:51.7369677Z mov.b32 %r2466, %r2387; 2026-02-21T10:18:51.7369841Z mov.b32 %r2467, %r2387; 2026-02-21T10:18:51.7369997Z mov.b32 %r2468, %r2387; 2026-02-21T10:18:51.7370157Z mov.b32 %r2469, %r2387; 2026-02-21T10:18:51.7370316Z mov.b32 %r2470, %r2387; 2026-02-21T10:18:51.7370535Z mov.b32 %r2471, %r2387; 2026-02-21T10:18:51.7370699Z mov.b32 %r2472, %r2387; 2026-02-21T10:18:51.7370857Z mov.b32 %r2473, %r2387; 2026-02-21T10:18:51.7371018Z mov.b32 %r2474, %r2387; 2026-02-21T10:18:51.7371178Z mov.b32 %r2475, %r2387; 2026-02-21T10:18:51.7371352Z mov.b32 %r2476, %r2387; 2026-02-21T10:18:51.7371513Z mov.b32 %r2477, %r2387; 2026-02-21T10:18:51.7371673Z mov.b32 %r2478, %r2387; 2026-02-21T10:18:51.7371829Z mov.b32 %r2479, %r2387; 2026-02-21T10:18:51.7371989Z mov.b32 %r2480, %r2387; 2026-02-21T10:18:51.7372153Z mov.b32 %r2481, %r2387; 2026-02-21T10:18:51.7372308Z mov.b32 %r2482, %r2387; 2026-02-21T10:18:51.7372471Z mov.b32 %r2483, %r2387; 2026-02-21T10:18:51.7372625Z mov.b32 %r2484, %r2387; 2026-02-21T10:18:51.7372787Z mov.b32 %r2485, %r2387; 2026-02-21T10:18:51.7372942Z mov.b32 %r2486, %r2387; 2026-02-21T10:18:51.7373100Z mov.b32 %r2487, %r2387; 2026-02-21T10:18:51.7373255Z mov.b32 %r2488, %r2387; 2026-02-21T10:18:51.7373415Z mov.b32 %r2489, %r2387; 2026-02-21T10:18:51.7373573Z mov.b32 %r2490, %r2387; 2026-02-21T10:18:51.7373732Z mov.b32 %r2491, %r2387; 2026-02-21T10:18:51.7373892Z mov.b32 %r2492, %r2387; 2026-02-21T10:18:51.7374048Z mov.b32 %r2493, %r2387; 2026-02-21T10:18:51.7374208Z mov.b32 %r2494, %r2387; 2026-02-21T10:18:51.7374366Z mov.b32 %r2495, %r2387; 2026-02-21T10:18:51.7374524Z mov.b32 
%r2496, %r2387; 2026-02-21T10:18:51.7374680Z mov.b32 %r2497, %r2387; 2026-02-21T10:18:51.7374858Z mov.b32 %r2498, %r2387; 2026-02-21T10:18:51.7375020Z mov.b32 %r2499, %r2387; 2026-02-21T10:18:51.7375184Z mov.b32 %r2500, %r2387; 2026-02-21T10:18:51.7375340Z mov.b32 %r2501, %r2387; 2026-02-21T10:18:51.7375502Z mov.b32 %r2502, %r2387; 2026-02-21T10:18:51.7375665Z mov.b32 %r2503, %r2387; 2026-02-21T10:18:51.7375822Z mov.b32 %r2504, %r2387; 2026-02-21T10:18:51.7375981Z mov.b32 %r2505, %r2387; 2026-02-21T10:18:51.7376137Z mov.b32 %r2506, %r2387; 2026-02-21T10:18:51.7376297Z mov.b32 %r2507, %r2387; 2026-02-21T10:18:51.7376569Z mov.b32 %r2508, %r2387; 2026-02-21T10:18:51.7376736Z mov.b32 %r2509, %r2387; 2026-02-21T10:18:51.7376892Z mov.b32 %r2510, %r2387; 2026-02-21T10:18:51.7377055Z mov.b32 %r2511, %r2387; 2026-02-21T10:18:51.7377211Z mov.b32 %r2512, %r2387; 2026-02-21T10:18:51.7377370Z mov.b32 %r2513, %r2387; 2026-02-21T10:18:51.7377530Z mov.b32 %r2514, %r2387; 2026-02-21T10:18:51.7377686Z mov.b32 %r2516, %r2384; 2026-02-21T10:18:51.7377853Z mov.b32 %r2517, %r2380; 2026-02-21T10:18:51.7378015Z mov.b32 %r2518, %r2386; 2026-02-21T10:18:51.7378175Z mov.b32 %r2519, %r2385; 2026-02-21T10:18:51.7378414Z bra.uni $L__BB0_2; 2026-02-21T10:18:51.7378688Z $L__BB0_6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:18:51.7379097Z .loc 1 26 145 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:26:145 2026-02-21T10:18:51.7379460Z add.s32 %r2517, %r2517, 1; 2026-02-21T10:18:51.7379643Z setp.ne.b32 %p60, %r68, %r2517; 2026-02-21T10:18:51.7379830Z mov.b32 %r2375, %r2385; 2026-02-21T10:18:51.7379994Z mov.b32 %r2376, %r77; 2026-02-21T10:18:51.7380152Z mov.b32 %r2377, %r2386; 2026-02-21T10:18:51.7380311Z mov.b32 %r2378, %r79; 2026-02-21T10:18:51.7380467Z mov.b32 %r2379, %r2516; 2026-02-21T10:18:51.7380627Z mov.b32 %r2380, %r81; 2026-02-21T10:18:51.7380782Z mov.b32 %r2385, %r2519; 2026-02-21T10:18:51.7380944Z mov.b32 %r2386, %r2518; 2026-02-21T10:18:51.7381106Z mov.b32 %r2516, %r220; 
2026-02-21T10:18:51.7381282Z @%p60 bra $L__BB0_2; 2026-02-21T10:18:51.7381444Z bra.uni $L__BB0_7; 2026-02-21T10:18:51.7381653Z $L__BB0_2: // =>This Inner Loop Header: Depth=1 2026-02-21T10:18:51.7382147Z .loc 1 0 145 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:0:145 2026-02-21T10:18:51.7382499Z mov.b32 %r81, %r2379; 2026-02-21T10:18:51.7382657Z mov.b32 %r79, %r2377; 2026-02-21T10:18:51.7382808Z mov.b32 %r77, %r2375; 2026-02-21T10:18:51.7383173Z .loc 1 26 145 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:26:145 2026-02-21T10:18:51.7383534Z add.s32 %r679, %r2516, 1; 2026-02-21T10:18:51.7383711Z setp.eq.b32 %p39, %r2516, 255; 2026-02-21T10:18:51.7383907Z selp.b32 %r220, 0, %r679, %p39; 2026-02-21T10:18:51.7384108Z setp.ne.b32 %p40, %r220, 0; 2026-02-21T10:18:51.7384287Z @%p40 bra $L__BB0_4; 2026-02-21T10:18:51.7384490Z // %bb.3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:18:51.7384751Z add.s32 %r2520, %r2520, 8448; 2026-02-21T10:18:51.7385071Z .loc 1 32 35 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:32:35 2026-02-21T10:18:51.7385430Z mul.hi.s32 %r680, %r2520, 1717986919; 2026-02-21T10:18:51.7385632Z shr.u32 %r681, %r680, 31; 2026-02-21T10:18:51.7385803Z shr.s32 %r682, %r680, 5; 2026-02-21T10:18:51.7385974Z add.s32 %r683, %r682, %r681; 2026-02-21T10:18:51.7386284Z .loc 1 33 33 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:33:33 2026-02-21T10:18:51.7386745Z shl.b32 %r684, %r683, 3; 2026-02-21T10:18:51.7387048Z .loc 1 34 39 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:34:39 2026-02-21T10:18:51.7387393Z sub.s32 %r685, 512, %r684; 2026-02-21T10:18:51.7387706Z .loc 1 34 52 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:34:52 2026-02-21T10:18:51.7388045Z min.s32 %r686, %r685, 8; 2026-02-21T10:18:51.7388349Z .loc 1 35 45 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:35:45 2026-02-21T10:18:51.7388763Z mul.lo.s32 %r687, %r683, 80; 2026-02-21T10:18:51.7388946Z sub.s32 %r688, 
%r2520, %r687; 2026-02-21T10:18:51.7389258Z .loc 1 36 51 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:36:51 2026-02-21T10:18:51.7389602Z div.s32 %r689, %r688, %r686; 2026-02-21T10:18:51.7389912Z .loc 1 35 64 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:35:64 2026-02-21T10:18:51.7390269Z mul.lo.s32 %r690, %r689, %r686; 2026-02-21T10:18:51.7390461Z sub.s32 %r691, %r688, %r690; 2026-02-21T10:18:51.7390765Z .loc 1 35 30 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:35:30 2026-02-21T10:18:51.7391109Z add.s32 %r692, %r691, %r684; 2026-02-21T10:18:51.7391412Z .loc 1 37 27 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:37:27 2026-02-21T10:18:51.7391758Z shl.b32 %r2518, %r692, 7; 2026-02-21T10:18:51.7392068Z .loc 1 38 32 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:38:32 2026-02-21T10:18:51.7392493Z or.b32 %r2521, %r2518, %r5; 2026-02-21T10:18:51.7392732Z or.b32 %r2522, %r2518, %r6; 2026-02-21T10:18:51.7392901Z or.b32 %r2523, %r2518, %r7; 2026-02-21T10:18:51.7393079Z or.b32 %r2524, %r2518, %r8; 2026-02-21T10:18:51.7393244Z or.b32 %r2525, %r2518, %r9; 2026-02-21T10:18:51.7393416Z or.b32 %r2526, %r2518, %r10; 2026-02-21T10:18:51.7393582Z or.b32 %r2527, %r2518, %r11; 2026-02-21T10:18:51.7393759Z or.b32 %r2528, %r2518, %r12; 2026-02-21T10:18:51.7394066Z .loc 1 39 27 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:39:27 2026-02-21T10:18:51.7394406Z shl.b32 %r2519, %r689, 7; 2026-02-21T10:18:51.7394628Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:18:51.7395028Z .loc 1 26 145 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:26:145 2026-02-21T10:18:51.7395386Z setp.eq.b32 %p51, %r220, 0; 2026-02-21T10:18:51.7395585Z setp.lt.s32 %p52, %r2517, %r47; 2026-02-21T10:18:51.7395781Z add.s32 %r2034, %r2383, 1; 2026-02-21T10:18:51.7395957Z setp.gt.s32 %p55, %r2034, 2; 2026-02-21T10:18:51.7396216Z selp.b32 %r2383, 0, %r2034, %p55; 2026-02-21T10:18:51.7396421Z selp.b32 %r2035, 1, 0, %p55; 
2026-02-21T10:18:51.7396716Z xor.b32 %r2382, %r2382, %r2035; 2026-02-21T10:18:51.7397038Z .loc 1 54 80 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:54:80 2026-02-21T10:18:51.7397457Z cp.async.wait_group 2; 2026-02-21T10:18:51.7397633Z bar.sync 0; 2026-02-21T10:18:51.7397777Z shl.b32 %r2036, %r2383, 13; 2026-02-21T10:18:51.7397956Z add.s32 %r2038, %r586, %r2036; 2026-02-21T10:18:51.7398271Z .loc 1 58 32 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:58:32 2026-02-21T10:18:51.7398618Z add.s32 %r2039, %r2038, %r48; 2026-02-21T10:18:51.7398798Z ld.shared.b16 %rs1, [%r2039]; 2026-02-21T10:18:51.7398982Z ld.shared.b16 %rs2, [%r2039+512]; 2026-02-21T10:18:51.7399201Z ld.shared.b16 %rs3, [%r2039+32]; 2026-02-21T10:18:51.7399398Z ld.shared.b16 %rs4, [%r2039+544]; 2026-02-21T10:18:51.7399596Z ld.shared.b16 %rs5, [%r2039+4096]; 2026-02-21T10:18:51.7399792Z ld.shared.b16 %rs6, [%r2039+4608]; 2026-02-21T10:18:51.7399983Z ld.shared.b16 %rs7, [%r2039+4128]; 2026-02-21T10:18:51.7400185Z ld.shared.b16 %rs8, [%r2039+4640]; 2026-02-21T10:18:51.7400374Z add.s32 %r2040, %r2038, %r49; 2026-02-21T10:18:51.7400557Z ld.shared.b16 %rs9, [%r2040]; 2026-02-21T10:18:51.7400736Z ld.shared.b16 %rs10, [%r2040+512]; 2026-02-21T10:18:51.7400931Z ld.shared.b16 %rs11, [%r2040+32]; 2026-02-21T10:18:51.7401119Z ld.shared.b16 %rs12, [%r2040+544]; 2026-02-21T10:18:51.7401315Z ld.shared.b16 %rs13, [%r2040+4096]; 2026-02-21T10:18:51.7401511Z ld.shared.b16 %rs14, [%r2040+4608]; 2026-02-21T10:18:51.7401709Z ld.shared.b16 %rs15, [%r2040+4128]; 2026-02-21T10:18:51.7401899Z ld.shared.b16 %rs16, [%r2040+4640]; 2026-02-21T10:18:51.7402093Z add.s32 %r2041, %r2038, %r50; 2026-02-21T10:18:51.7402282Z ld.shared.b16 %rs17, [%r2041]; 2026-02-21T10:18:51.7402465Z ld.shared.b16 %rs18, [%r2041+512]; 2026-02-21T10:18:51.7402664Z ld.shared.b16 %rs19, [%r2041+32]; 2026-02-21T10:18:51.7402853Z ld.shared.b16 %rs20, [%r2041+544]; 2026-02-21T10:18:51.7403047Z ld.shared.b16 %rs21, [%r2041+4096]; 
2026-02-21T10:18:51.7403237Z ld.shared.b16 %rs22, [%r2041+4608]; 2026-02-21T10:18:51.7403446Z ld.shared.b16 %rs23, [%r2041+4128]; 2026-02-21T10:18:51.7403646Z ld.shared.b16 %rs24, [%r2041+4640]; 2026-02-21T10:18:51.7403838Z add.s32 %r2042, %r2038, %r51; 2026-02-21T10:18:51.7404022Z ld.shared.b16 %rs25, [%r2042]; 2026-02-21T10:18:51.7404205Z ld.shared.b16 %rs26, [%r2042+512]; 2026-02-21T10:18:51.7404404Z ld.shared.b16 %rs27, [%r2042+32]; 2026-02-21T10:18:51.7404592Z ld.shared.b16 %rs28, [%r2042+544]; 2026-02-21T10:18:51.7404790Z ld.shared.b16 %rs29, [%r2042+4096]; 2026-02-21T10:18:51.7404980Z ld.shared.b16 %rs30, [%r2042+4608]; 2026-02-21T10:18:51.7405255Z ld.shared.b16 %rs31, [%r2042+4128]; 2026-02-21T10:18:51.7405445Z ld.shared.b16 %rs32, [%r2042+4640]; 2026-02-21T10:18:51.7405702Z cvt.f32.bf16 %r823, %rs1; 2026-02-21T10:18:51.7405882Z cvt.f32.bf16 %r824, %rs2; 2026-02-21T10:18:51.7406075Z cvt.f32.bf16 %r825, %rs9; 2026-02-21T10:18:51.7406255Z cvt.f32.bf16 %r826, %rs10; 2026-02-21T10:18:51.7406430Z cvt.f32.bf16 %r955, %rs17; 2026-02-21T10:18:51.7406721Z cvt.f32.bf16 %r956, %rs18; 2026-02-21T10:18:51.7406897Z cvt.f32.bf16 %r957, %rs25; 2026-02-21T10:18:51.7407071Z cvt.f32.bf16 %r958, %rs26; 2026-02-21T10:18:51.7407240Z cvt.f32.bf16 %r1087, %rs3; 2026-02-21T10:18:51.7407419Z cvt.f32.bf16 %r1088, %rs4; 2026-02-21T10:18:51.7407590Z cvt.f32.bf16 %r1089, %rs11; 2026-02-21T10:18:51.7407780Z cvt.f32.bf16 %r1090, %rs12; 2026-02-21T10:18:51.7407956Z cvt.f32.bf16 %r1219, %rs19; 2026-02-21T10:18:51.7408138Z cvt.f32.bf16 %r1220, %rs20; 2026-02-21T10:18:51.7408311Z cvt.f32.bf16 %r1221, %rs27; 2026-02-21T10:18:51.7408479Z cvt.f32.bf16 %r1222, %rs28; 2026-02-21T10:18:51.7408663Z cvt.f32.bf16 %r1351, %rs5; 2026-02-21T10:18:51.7408835Z cvt.f32.bf16 %r1352, %rs6; 2026-02-21T10:18:51.7409011Z cvt.f32.bf16 %r1353, %rs13; 2026-02-21T10:18:51.7409270Z cvt.f32.bf16 %r1354, %rs14; 2026-02-21T10:18:51.7409458Z cvt.f32.bf16 %r1483, %rs21; 2026-02-21T10:18:51.7409627Z cvt.f32.bf16 %r1484, 
%rs22; 2026-02-21T10:18:51.7409802Z cvt.f32.bf16 %r1485, %rs29; 2026-02-21T10:18:51.7409976Z cvt.f32.bf16 %r1486, %rs30; 2026-02-21T10:18:51.7410209Z cvt.f32.bf16 %r1615, %rs7; 2026-02-21T10:18:51.7410384Z cvt.f32.bf16 %r1616, %rs8; 2026-02-21T10:18:51.7410555Z cvt.f32.bf16 %r1617, %rs15; 2026-02-21T10:18:51.7410732Z cvt.f32.bf16 %r1618, %rs16; 2026-02-21T10:18:51.7410905Z cvt.f32.bf16 %r1747, %rs23; 2026-02-21T10:18:51.7411075Z cvt.f32.bf16 %r1748, %rs24; 2026-02-21T10:18:51.7411243Z cvt.f32.bf16 %r1749, %rs31; 2026-02-21T10:18:51.7411414Z cvt.f32.bf16 %r1750, %rs32; 2026-02-21T10:18:51.7411734Z .loc 1 26 145 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:26:145 2026-02-21T10:18:51.7412102Z shl.b32 %r2043, %r2383, 3; 2026-02-21T10:18:51.7412270Z add.s32 %r693, %r512, %r2043; 2026-02-21T10:18:51.7412446Z // begin inline asm 2026-02-21T10:18:51.7412601Z 2026-02-21T10:18:51.7412721Z { 2026-02-21T10:18:51.7412857Z .reg .pred complete; 2026-02-21T10:18:51.7413014Z waitLoop: 2026-02-21T10:18:51.7413233Z mbarrier.try_wait.parity.shared.b64 complete, [%r693], %r2382; 2026-02-21T10:18:51.7413519Z @!complete bra.uni waitLoop; 2026-02-21T10:18:51.7413696Z } 2026-02-21T10:18:51.7413765Z 2026-02-21T10:18:51.7413830Z // end inline asm 2026-02-21T10:18:51.7414122Z .loc 1 60 33 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:60:33 2026-02-21T10:18:51.7414483Z shl.b32 %r2045, %r2383, 11; 2026-02-21T10:18:51.7414661Z add.s32 %r2047, %r532, %r2045; 2026-02-21T10:18:51.7414981Z .loc 1 78 58 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:78:58 2026-02-21T10:18:51.7415345Z add.s32 %r2048, %r2047, %r45; 2026-02-21T10:18:51.7415527Z xor.b32 %r2049, %r45, 16; 2026-02-21T10:18:51.7415699Z add.s32 %r2050, %r2047, %r2049; 2026-02-21T10:18:51.7415889Z xor.b32 %r2051, %r45, 32; 2026-02-21T10:18:51.7416060Z add.s32 %r2052, %r2047, %r2051; 2026-02-21T10:18:51.7416242Z xor.b32 %r2053, %r45, 48; 2026-02-21T10:18:51.7416411Z add.s32 %r2054, %r2047, %r2053; 
2026-02-21T10:18:51.7416708Z xor.b32 %r2055, %r45, 64; 2026-02-21T10:18:51.7416882Z add.s32 %r2056, %r2047, %r2055; 2026-02-21T10:18:51.7417054Z xor.b32 %r2057, %r45, 80; 2026-02-21T10:18:51.7417220Z add.s32 %r2058, %r2047, %r2057; 2026-02-21T10:18:51.7417393Z xor.b32 %r2059, %r45, 96; 2026-02-21T10:18:51.7417559Z add.s32 %r2060, %r2047, %r2059; 2026-02-21T10:18:51.7417737Z xor.b32 %r2061, %r45, 112; 2026-02-21T10:18:51.7417911Z add.s32 %r2062, %r2047, %r2061; 2026-02-21T10:18:51.7418228Z .loc 1 65 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:65:25 2026-02-21T10:18:51.7418661Z ld.shared.s8 %rs33, [%r2048]; 2026-02-21T10:18:51.7419041Z .loc 1 63 28 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:63:28 2026-02-21T10:18:51.7419385Z shl.b16 %rs34, %rs33, 4; 2026-02-21T10:18:51.7419694Z .loc 1 65 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:65:25 2026-02-21T10:18:51.7420041Z ld.shared.s8 %rs35, [%r2050+128]; 2026-02-21T10:18:51.7420373Z .loc 1 63 28 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:63:28 2026-02-21T10:18:51.7420718Z shl.b16 %rs36, %rs35, 4; 2026-02-21T10:18:51.7421031Z .loc 1 65 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:65:25 2026-02-21T10:18:51.7421387Z ld.shared.s8 %rs37, [%r2052+256]; 2026-02-21T10:18:51.7421708Z .loc 1 63 28 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:63:28 2026-02-21T10:18:51.7422055Z shl.b16 %rs38, %rs37, 4; 2026-02-21T10:18:51.7422360Z .loc 1 65 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:65:25 2026-02-21T10:18:51.7422780Z ld.shared.s8 %rs39, [%r2054+384]; 2026-02-21T10:18:51.7423106Z .loc 1 63 28 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:63:28 2026-02-21T10:18:51.7423448Z shl.b16 %rs40, %rs39, 4; 2026-02-21T10:18:51.7423813Z .loc 1 65 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:65:25 2026-02-21T10:18:51.7424162Z ld.shared.s8 %rs41, [%r2056+512]; 2026-02-21T10:18:51.7424483Z .loc 1 63 28 // 
cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:63:28 2026-02-21T10:18:51.7424828Z shl.b16 %rs42, %rs41, 4; 2026-02-21T10:18:51.7425128Z .loc 1 65 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:65:25 2026-02-21T10:18:51.7425479Z ld.shared.s8 %rs43, [%r2058+640]; 2026-02-21T10:18:51.7425799Z .loc 1 63 28 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:63:28 2026-02-21T10:18:51.7426145Z shl.b16 %rs44, %rs43, 4; 2026-02-21T10:18:51.7426444Z .loc 1 65 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:65:25 2026-02-21T10:18:51.7426926Z ld.shared.s8 %rs45, [%r2060+768]; 2026-02-21T10:18:51.7427250Z .loc 1 63 28 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:63:28 2026-02-21T10:18:51.7427597Z shl.b16 %rs46, %rs45, 4; 2026-02-21T10:18:51.7427902Z .loc 1 65 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:65:25 2026-02-21T10:18:51.7428245Z ld.shared.s8 %rs47, [%r2062+896]; 2026-02-21T10:18:51.7428656Z .loc 1 63 28 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:63:28 2026-02-21T10:18:51.7429001Z shl.b16 %rs48, %rs47, 4; 2026-02-21T10:18:51.7429304Z .loc 1 65 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:65:25 2026-02-21T10:18:51.7429661Z ld.shared.s8 %rs49, [%r2048+1024]; 2026-02-21T10:18:51.7429989Z .loc 1 63 28 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:63:28 2026-02-21T10:18:51.7430336Z shl.b16 %rs50, %rs49, 4; 2026-02-21T10:18:51.7430631Z .loc 1 65 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:65:25 2026-02-21T10:18:51.7430983Z ld.shared.s8 %rs51, [%r2050+1152]; 2026-02-21T10:18:51.7431308Z .loc 1 63 28 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:63:28 2026-02-21T10:18:51.7431656Z shl.b16 %rs52, %rs51, 4; 2026-02-21T10:18:51.7431959Z .loc 1 65 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:65:25 2026-02-21T10:18:51.7432301Z ld.shared.s8 %rs53, [%r2052+1280]; 2026-02-21T10:18:51.7432627Z .loc 1 63 28 // 
cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:63:28 2026-02-21T10:18:51.7433069Z shl.b16 %rs54, %rs53, 4; 2026-02-21T10:18:51.7433370Z .loc 1 65 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:65:25 2026-02-21T10:18:51.7433808Z ld.shared.s8 %rs55, [%r2054+1408]; 2026-02-21T10:18:51.7434129Z .loc 1 63 28 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:63:28 2026-02-21T10:18:51.7434472Z shl.b16 %rs56, %rs55, 4; 2026-02-21T10:18:51.7434768Z .loc 1 65 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:65:25 2026-02-21T10:18:51.7435115Z ld.shared.s8 %rs57, [%r2056+1536]; 2026-02-21T10:18:51.7435446Z .loc 1 63 28 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:63:28 2026-02-21T10:18:51.7435798Z shl.b16 %rs58, %rs57, 4; 2026-02-21T10:18:51.7436103Z .loc 1 65 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:65:25 2026-02-21T10:18:51.7436561Z ld.shared.s8 %rs59, [%r2058+1664]; 2026-02-21T10:18:51.7436892Z .loc 1 63 28 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:63:28 2026-02-21T10:18:51.7437234Z shl.b16 %rs60, %rs59, 4; 2026-02-21T10:18:51.7437619Z .loc 1 65 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:65:25 2026-02-21T10:18:51.7437971Z ld.shared.s8 %rs61, [%r2060+1792]; 2026-02-21T10:18:51.7438294Z .loc 1 63 28 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:63:28 2026-02-21T10:18:51.7438696Z shl.b16 %rs62, %rs61, 4; 2026-02-21T10:18:51.7438999Z .loc 1 65 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:65:25 2026-02-21T10:18:51.7439349Z ld.shared.s8 %rs63, [%r2062+1920]; 2026-02-21T10:18:51.7439667Z .loc 1 63 28 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:63:28 2026-02-21T10:18:51.7440015Z shl.b16 %rs64, %rs63, 4; 2026-02-21T10:18:51.7440313Z .loc 1 65 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:65:25 2026-02-21T10:18:51.7440661Z cvt.s16.s8 %rs65, %rs34; 2026-02-21T10:18:51.7440831Z shr.s16 %rs66, %rs65, 4; 
2026-02-21T10:18:51.7440995Z cvt.s16.s8 %rs67, %rs36; 2026-02-21T10:18:51.7441163Z shr.s16 %rs68, %rs67, 4; 2026-02-21T10:18:51.7441325Z shr.s16 %rs69, %rs33, 4; 2026-02-21T10:18:51.7441488Z shr.s16 %rs70, %rs35, 4; 2026-02-21T10:18:51.7441788Z .loc 1 83 32 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:83:32 2026-02-21T10:18:51.7442137Z cvt.rn.f32.s16 %r2063, %rs70; 2026-02-21T10:18:51.7442318Z cvt.rn.f32.s16 %r2064, %rs69; 2026-02-21T10:18:51.7442513Z cvt.rn.f32.s16 %r2065, %rs68; 2026-02-21T10:18:51.7442693Z cvt.rn.f32.s16 %r2066, %rs66; 2026-02-21T10:18:51.7443007Z .loc 1 65 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:65:25 2026-02-21T10:18:51.7443357Z cvt.s16.s8 %rs71, %rs38; 2026-02-21T10:18:51.7443520Z shr.s16 %rs72, %rs71, 4; 2026-02-21T10:18:51.7443689Z cvt.s16.s8 %rs73, %rs40; 2026-02-21T10:18:51.7443851Z shr.s16 %rs74, %rs73, 4; 2026-02-21T10:18:51.7444018Z shr.s16 %rs75, %rs37, 4; 2026-02-21T10:18:51.7444178Z shr.s16 %rs76, %rs39, 4; 2026-02-21T10:18:51.7444482Z .loc 1 83 32 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:83:32 2026-02-21T10:18:51.7444829Z cvt.rn.f32.s16 %r2067, %rs76; 2026-02-21T10:18:51.7445003Z cvt.rn.f32.s16 %r2068, %rs75; 2026-02-21T10:18:51.7445185Z cvt.rn.f32.s16 %r2069, %rs74; 2026-02-21T10:18:51.7445358Z cvt.rn.f32.s16 %r2070, %rs72; 2026-02-21T10:18:51.7445665Z .loc 1 65 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:65:25 2026-02-21T10:18:51.7446009Z cvt.s16.s8 %rs77, %rs42; 2026-02-21T10:18:51.7446176Z shr.s16 %rs78, %rs77, 4; 2026-02-21T10:18:51.7446337Z cvt.s16.s8 %rs79, %rs44; 2026-02-21T10:18:51.7446623Z shr.s16 %rs80, %rs79, 4; 2026-02-21T10:18:51.7446789Z shr.s16 %rs81, %rs41, 4; 2026-02-21T10:18:51.7446948Z shr.s16 %rs82, %rs43, 4; 2026-02-21T10:18:51.7447352Z .loc 1 83 32 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:83:32 2026-02-21T10:18:51.7447762Z cvt.rn.f32.s16 %r2071, %rs82; 2026-02-21T10:18:51.7447939Z cvt.rn.f32.s16 %r2072, %rs81; 
2026-02-21T10:18:51.7448113Z cvt.rn.f32.s16 %r2073, %rs80; 2026-02-21T10:18:51.7448287Z cvt.rn.f32.s16 %r2074, %rs78; 2026-02-21T10:18:51.7448598Z .loc 1 65 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:65:25 2026-02-21T10:18:51.7448944Z cvt.s16.s8 %rs83, %rs46; 2026-02-21T10:18:51.7449109Z shr.s16 %rs84, %rs83, 4; 2026-02-21T10:18:51.7449270Z cvt.s16.s8 %rs85, %rs48; 2026-02-21T10:18:51.7449436Z shr.s16 %rs86, %rs85, 4; 2026-02-21T10:18:51.7449596Z shr.s16 %rs87, %rs45, 4; 2026-02-21T10:18:51.7449760Z shr.s16 %rs88, %rs47, 4; 2026-02-21T10:18:51.7450056Z .loc 1 83 32 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:83:32 2026-02-21T10:18:51.7450405Z cvt.rn.f32.s16 %r2075, %rs88; 2026-02-21T10:18:51.7450580Z cvt.rn.f32.s16 %r2076, %rs87; 2026-02-21T10:18:51.7450759Z cvt.rn.f32.s16 %r2077, %rs86; 2026-02-21T10:18:51.7451012Z cvt.rn.f32.s16 %r2078, %rs84; 2026-02-21T10:18:51.7451319Z .loc 1 65 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:65:25 2026-02-21T10:18:51.7451666Z cvt.s16.s8 %rs89, %rs50; 2026-02-21T10:18:51.7451837Z shr.s16 %rs90, %rs89, 4; 2026-02-21T10:18:51.7452077Z cvt.s16.s8 %rs91, %rs52; 2026-02-21T10:18:51.7452243Z shr.s16 %rs92, %rs91, 4; 2026-02-21T10:18:51.7452407Z shr.s16 %rs93, %rs49, 4; 2026-02-21T10:18:51.7452567Z shr.s16 %rs94, %rs51, 4; 2026-02-21T10:18:51.7452873Z .loc 1 83 32 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:83:32 2026-02-21T10:18:51.7453236Z cvt.rn.f32.s16 %r2079, %rs94; 2026-02-21T10:18:51.7453418Z cvt.rn.f32.s16 %r2080, %rs93; 2026-02-21T10:18:51.7453597Z cvt.rn.f32.s16 %r2081, %rs92; 2026-02-21T10:18:51.7453770Z cvt.rn.f32.s16 %r2082, %rs90; 2026-02-21T10:18:51.7454084Z .loc 1 65 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:65:25 2026-02-21T10:18:51.7454434Z cvt.s16.s8 %rs95, %rs54; 2026-02-21T10:18:51.7454607Z shr.s16 %rs96, %rs95, 4; 2026-02-21T10:18:51.7454774Z cvt.s16.s8 %rs97, %rs56; 2026-02-21T10:18:51.7454938Z shr.s16 %rs98, %rs97, 4; 
2026-02-21T10:18:51.7455105Z shr.s16 %rs99, %rs53, 4; 2026-02-21T10:18:51.7455270Z shr.s16 %rs100, %rs55, 4; 2026-02-21T10:18:51.7455577Z .loc 1 83 32 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:83:32 2026-02-21T10:18:51.7455926Z cvt.rn.f32.s16 %r2083, %rs100; 2026-02-21T10:18:51.7456110Z cvt.rn.f32.s16 %r2084, %rs99; 2026-02-21T10:18:51.7456296Z cvt.rn.f32.s16 %r2085, %rs98; 2026-02-21T10:18:51.7456599Z cvt.rn.f32.s16 %r2086, %rs96; 2026-02-21T10:18:51.7456914Z .loc 1 65 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:65:25 2026-02-21T10:18:51.7457259Z cvt.s16.s8 %rs101, %rs58; 2026-02-21T10:18:51.7457431Z shr.s16 %rs102, %rs101, 4; 2026-02-21T10:18:51.7457603Z cvt.s16.s8 %rs103, %rs60; 2026-02-21T10:18:51.7457775Z shr.s16 %rs104, %rs103, 4; 2026-02-21T10:18:51.7457942Z shr.s16 %rs105, %rs57, 4; 2026-02-21T10:18:51.7458109Z shr.s16 %rs106, %rs59, 4; 2026-02-21T10:18:51.7458406Z .loc 1 83 32 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:83:32 2026-02-21T10:18:51.7458754Z cvt.rn.f32.s16 %r2087, %rs106; 2026-02-21T10:18:51.7458939Z cvt.rn.f32.s16 %r2088, %rs105; 2026-02-21T10:18:51.7459116Z cvt.rn.f32.s16 %r2089, %rs104; 2026-02-21T10:18:51.7459296Z cvt.rn.f32.s16 %r2090, %rs102; 2026-02-21T10:18:51.7459607Z .loc 1 65 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:65:25 2026-02-21T10:18:51.7459950Z cvt.s16.s8 %rs107, %rs62; 2026-02-21T10:18:51.7460129Z shr.s16 %rs108, %rs107, 4; 2026-02-21T10:18:51.7460302Z cvt.s16.s8 %rs109, %rs64; 2026-02-21T10:18:51.7460553Z shr.s16 %rs110, %rs109, 4; 2026-02-21T10:18:51.7460729Z shr.s16 %rs111, %rs61, 4; 2026-02-21T10:18:51.7460964Z shr.s16 %rs112, %rs63, 4; 2026-02-21T10:18:51.7461269Z .loc 1 83 32 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:83:32 2026-02-21T10:18:51.7461613Z cvt.rn.f32.s16 %r2091, %rs112; 2026-02-21T10:18:51.7461807Z cvt.rn.f32.s16 %r2092, %rs111; 2026-02-21T10:18:51.7461993Z cvt.rn.f32.s16 %r2093, %rs110; 2026-02-21T10:18:51.7462171Z 
cvt.rn.f32.s16 %r2094, %rs108; 2026-02-21T10:18:51.7462401Z st.shared.v4.b32 [%r52], {%r2066, %r2064, %r2065, %r2063}; 2026-02-21T10:18:51.7462690Z st.shared.v4.b32 [%r53], {%r2070, %r2068, %r2069, %r2067}; 2026-02-21T10:18:51.7462972Z st.shared.v4.b32 [%r54], {%r2074, %r2072, %r2073, %r2071}; 2026-02-21T10:18:51.7463252Z st.shared.v4.b32 [%r55], {%r2078, %r2076, %r2077, %r2075}; 2026-02-21T10:18:51.7463527Z st.shared.v4.b32 [%r56], {%r2082, %r2080, %r2081, %r2079}; 2026-02-21T10:18:51.7463804Z st.shared.v4.b32 [%r57], {%r2086, %r2084, %r2085, %r2083}; 2026-02-21T10:18:51.7464080Z st.shared.v4.b32 [%r58], {%r2090, %r2088, %r2089, %r2087}; 2026-02-21T10:18:51.7464426Z st.shared.v4.b32 [%r59], {%r2094, %r2092, %r2093, %r2091}; 2026-02-21T10:18:51.7464662Z $L__tmp1: 2026-02-21T10:18:51.7465021Z .loc 2 291 36 // standard.py:291:36 @[ cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:90:40 ] 2026-02-21T10:18:51.7465452Z // begin inline asm 2026-02-21T10:18:51.7465695Z fence.proxy.async.shared::cta; 2026-02-21T10:18:51.7465887Z // end inline asm 2026-02-21T10:18:51.7466031Z bar.sync 0; 2026-02-21T10:18:51.7466197Z shfl.sync.idx.b32 %r2095, %r3, 0, 31, -1; 2026-02-21T10:18:51.7466421Z wgmma.fence.sync.aligned; 2026-02-21T10:18:51.7466740Z mov.pred %p41, -1; 2026-02-21T10:18:51.7466898Z // begin inline asm 2026-02-21T10:18:51.7468260Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447,%r2448,%r2449,%r2450}, {%r823,%r824,%r825,%r826}, %rd92, %p41, 1, 1; 2026-02-21T10:18:51.7469756Z // end inline asm 2026-02-21T10:18:51.7469905Z // begin inline 
asm 2026-02-21T10:18:51.7471248Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447,%r2448,%r2449,%r2450}, {%r955,%r956,%r957,%r958}, %rd93, %p41, 1, 1; 2026-02-21T10:18:51.7471308Z // end inline asm 2026-02-21T10:18:51.7471372Z // begin inline asm 2026-02-21T10:18:51.7472632Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447,%r2448,%r2449,%r2450}, {%r1087,%r1088,%r1089,%r1090}, %rd94, %p41, 1, 1; 2026-02-21T10:18:51.7472696Z // end inline asm 2026-02-21T10:18:51.7472753Z // begin inline asm 2026-02-21T10:18:51.7474008Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447,%r2448,%r2449,%r2450}, {%r1219,%r1220,%r1221,%r1222}, %rd95, %p41, 1, 1; 2026-02-21T10:18:51.7474213Z // end inline asm 
2026-02-21T10:18:51.7474276Z // begin inline asm 2026-02-21T10:18:51.7475534Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r2512,%r2513,%r2514}, {%r1351,%r1352,%r1353,%r1354}, %rd92, %p41, 1, 1; 2026-02-21T10:18:51.7475602Z // end inline asm 2026-02-21T10:18:51.7475661Z // begin inline asm 2026-02-21T10:18:51.7477161Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r2512,%r2513,%r2514}, {%r1483,%r1484,%r1485,%r1486}, %rd93, %p41, 1, 1; 2026-02-21T10:18:51.7477228Z // end inline asm 2026-02-21T10:18:51.7477289Z // begin inline asm 2026-02-21T10:18:51.7478558Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r2512,%r2513,%r2514}, {%r1615,%r1616,%r1617,%r1618}, %rd94, %p41, 1, 1; 
2026-02-21T10:18:51.7478632Z // end inline asm 2026-02-21T10:18:51.7478694Z // begin inline asm 2026-02-21T10:18:51.7479956Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r2512,%r2513,%r2514}, {%r1747,%r1748,%r1749,%r1750}, %rd95, %p41, 1, 1; 2026-02-21T10:18:51.7480016Z // end inline asm 2026-02-21T10:18:51.7480095Z wgmma.commit_group.sync.aligned; 2026-02-21T10:18:51.7480158Z mov.b32 %r1880, 0; 2026-02-21T10:18:51.7480219Z mov.b32 %r1879, %r647; 2026-02-21T10:18:51.7480280Z mov.b32 %r1881, %r1880; 2026-02-21T10:18:51.7480344Z // begin inline asm 2026-02-21T10:18:51.7482426Z // wait for regs: %r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447,%r2448,%r2449,%r2450,%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r2512,%r2513,%r2514,%r1879,%r1880,%r1881 2026-02-21T10:18:51.7482647Z 
wgmma.wait_group.sync.aligned 0; 2026-02-21T10:18:51.7482702Z // end inline asm 2026-02-21T10:18:51.7482756Z $L__tmp2: 2026-02-21T10:18:51.7482980Z .loc 1 26 145 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:26:145 2026-02-21T10:18:51.7483043Z add.s32 %r2096, %r2381, 16; 2026-02-21T10:18:51.7483106Z add.s32 %r2097, %r2384, 1; 2026-02-21T10:18:51.7483175Z setp.gt.s32 %p56, %r2097, 2; 2026-02-21T10:18:51.7483259Z selp.b32 %r2384, 0, %r2097, %p56; 2026-02-21T10:18:51.7483328Z selp.b32 %r2381, 0, %r2096, %p51; 2026-02-21T10:18:51.7483542Z .loc 1 51 22 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:51:22 2026-02-21T10:18:51.7483612Z shl.b32 %r2098, %r2381, 1; 2026-02-21T10:18:51.7483864Z .loc 1 53 25 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:53:25 2026-02-21T10:18:51.7483931Z add.s32 %r2099, %r2098, %r34; 2026-02-21T10:18:51.7484133Z .loc 1 54 53 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:54:53 2026-02-21T10:18:51.7484194Z shl.b32 %r2100, %r2521, 13; 2026-02-21T10:18:51.7484298Z shl.b32 %r2101, %r2522, 13; 2026-02-21T10:18:51.7484360Z shl.b32 %r2102, %r2523, 13; 2026-02-21T10:18:51.7484422Z shl.b32 %r2103, %r2524, 13; 2026-02-21T10:18:51.7484483Z shl.b32 %r2104, %r2525, 13; 2026-02-21T10:18:51.7484541Z shl.b32 %r2105, %r2526, 13; 2026-02-21T10:18:51.7484604Z shl.b32 %r2106, %r2527, 13; 2026-02-21T10:18:51.7484661Z shl.b32 %r2107, %r2528, 13; 2026-02-21T10:18:51.7484855Z .loc 1 54 60 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:54:60 2026-02-21T10:18:51.7484932Z add.s32 %r2108, %r2100, %r2099; 2026-02-21T10:18:51.7485004Z add.s32 %r2109, %r2101, %r2099; 2026-02-21T10:18:51.7485064Z add.s32 %r2110, %r2102, %r2099; 2026-02-21T10:18:51.7485125Z add.s32 %r2111, %r2103, %r2099; 2026-02-21T10:18:51.7485189Z add.s32 %r2112, %r2104, %r2099; 2026-02-21T10:18:51.7485248Z add.s32 %r2113, %r2105, %r2099; 2026-02-21T10:18:51.7485309Z add.s32 %r2114, %r2106, %r2099; 2026-02-21T10:18:51.7485374Z add.s32 %r2115, 
%r2107, %r2099; 2026-02-21T10:18:51.7485570Z .loc 1 54 32 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:54:32 2026-02-21T10:18:51.7485644Z mad.wide.s32 %rd100, %r2108, 2, %rd6; 2026-02-21T10:18:51.7485712Z mad.wide.s32 %rd101, %r2109, 2, %rd6; 2026-02-21T10:18:51.7485780Z mad.wide.s32 %rd102, %r2110, 2, %rd6; 2026-02-21T10:18:51.7485844Z mad.wide.s32 %rd103, %r2111, 2, %rd6; 2026-02-21T10:18:51.7485909Z mad.wide.s32 %rd104, %r2112, 2, %rd6; 2026-02-21T10:18:51.7485980Z mad.wide.s32 %rd105, %r2113, 2, %rd6; 2026-02-21T10:18:51.7486049Z mad.wide.s32 %rd106, %r2114, 2, %rd6; 2026-02-21T10:18:51.7486117Z mad.wide.s32 %rd107, %r2115, 2, %rd6; 2026-02-21T10:18:51.7486323Z .loc 1 54 80 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:54:80 2026-02-21T10:18:51.7486384Z shl.b32 %r2116, %r2384, 13; 2026-02-21T10:18:51.7486563Z add.s32 %r2117, %r586, %r2116; 2026-02-21T10:18:51.7486628Z add.s32 %r2013, %r2117, %r46; 2026-02-21T10:18:51.7486700Z selp.b32 %r2014, 8, 0, %p52; 2026-02-21T10:18:51.7486763Z // begin inline asm 2026-02-21T10:18:51.7486912Z cp.async.ca.shared.global [ %r2013 + 0 ], [ %rd100 + 0 ], 0x8, %r2014; 2026-02-21T10:18:51.7486972Z // end inline asm 2026-02-21T10:18:51.7487033Z add.s32 %r2015, %r2013, 1024; 2026-02-21T10:18:51.7487092Z // begin inline asm 2026-02-21T10:18:51.7487228Z cp.async.ca.shared.global [ %r2015 + 0 ], [ %rd101 + 0 ], 0x8, %r2014; 2026-02-21T10:18:51.7487289Z // end inline asm 2026-02-21T10:18:51.7487429Z add.s32 %r2017, %r2013, 2048; 2026-02-21T10:18:51.7487500Z // begin inline asm 2026-02-21T10:18:51.7487704Z cp.async.ca.shared.global [ %r2017 + 0 ], [ %rd102 + 0 ], 0x8, %r2014; 2026-02-21T10:18:51.7487761Z // end inline asm 2026-02-21T10:18:51.7487821Z add.s32 %r2019, %r2013, 3072; 2026-02-21T10:18:51.7487883Z // begin inline asm 2026-02-21T10:18:51.7488015Z cp.async.ca.shared.global [ %r2019 + 0 ], [ %rd103 + 0 ], 0x8, %r2014; 2026-02-21T10:18:51.7488071Z // end inline asm 2026-02-21T10:18:51.7488133Z add.s32 
%r2021, %r2013, 4096; 2026-02-21T10:18:51.7488198Z // begin inline asm 2026-02-21T10:18:51.7488326Z cp.async.ca.shared.global [ %r2021 + 0 ], [ %rd104 + 0 ], 0x8, %r2014; 2026-02-21T10:18:51.7488382Z // end inline asm 2026-02-21T10:18:51.7488445Z add.s32 %r2023, %r2013, 5120; 2026-02-21T10:18:51.7488504Z // begin inline asm 2026-02-21T10:18:51.7488633Z cp.async.ca.shared.global [ %r2023 + 0 ], [ %rd105 + 0 ], 0x8, %r2014; 2026-02-21T10:18:51.7488688Z // end inline asm 2026-02-21T10:18:51.7488753Z add.s32 %r2025, %r2013, 6144; 2026-02-21T10:18:51.7488812Z // begin inline asm 2026-02-21T10:18:51.7488944Z cp.async.ca.shared.global [ %r2025 + 0 ], [ %rd106 + 0 ], 0x8, %r2014; 2026-02-21T10:18:51.7489069Z // end inline asm 2026-02-21T10:18:51.7489133Z add.s32 %r2027, %r2013, 7168; 2026-02-21T10:18:51.7489190Z // begin inline asm 2026-02-21T10:18:51.7489319Z cp.async.ca.shared.global [ %r2027 + 0 ], [ %rd107 + 0 ], 0x8, %r2014; 2026-02-21T10:18:51.7489442Z // end inline asm 2026-02-21T10:18:51.7489509Z cp.async.commit_group; 2026-02-21T10:18:51.7489720Z .loc 1 26 145 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:26:145 2026-02-21T10:18:51.7489786Z shl.b32 %r2118, %r2384, 3; 2026-02-21T10:18:51.7489849Z add.s32 %r2029, %r512, %r2118; 2026-02-21T10:18:51.7489916Z and.pred %p49, %p61, %p52; 2026-02-21T10:18:51.7489983Z // begin inline asm 2026-02-21T10:18:51.7490115Z @%p49 mbarrier.arrive.expect_tx.shared.b64 _, [%r2029], 2048; 2026-02-21T10:18:51.7490177Z // end inline asm 2026-02-21T10:18:51.7490373Z .loc 1 60 33 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:60:33 2026-02-21T10:18:51.7490442Z shl.b32 %r2119, %r2384, 11; 2026-02-21T10:18:51.7490503Z add.s32 %r2030, %r532, %r2119; 2026-02-21T10:18:51.7490559Z bar.sync 0; 2026-02-21T10:18:51.7490630Z elect.sync %r2120|%p57, -1; 2026-02-21T10:18:51.7490695Z and.pred %p58, %p52, %p57; 2026-02-21T10:18:51.7490761Z and.pred %p50, %p1, %p58; 2026-02-21T10:18:51.7490819Z // begin inline asm 
2026-02-21T10:18:51.7491143Z @%p50 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r2030], [%rd34, {%r2519, %r2381}], [%r2029]; 2026-02-21T10:18:51.7491200Z // end inline asm 2026-02-21T10:18:51.7491405Z .loc 1 26 145 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:26:145 2026-02-21T10:18:51.7491477Z setp.ne.b32 %p59, %r2380, 255; 2026-02-21T10:18:51.7491535Z @%p59 bra $L__BB0_6; 2026-02-21T10:18:51.7491646Z // %bb.5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:18:51.7491854Z .loc 1 38 32 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:38:32 2026-02-21T10:18:51.7491915Z add.s32 %r2266, %r2378, %r13; 2026-02-21T10:18:51.7491976Z add.s32 %r2267, %r14, %r2378; 2026-02-21T10:18:51.7492040Z add.s32 %r2268, %r15, %r2378; 2026-02-21T10:18:51.7492099Z add.s32 %r2269, %r16, %r2378; 2026-02-21T10:18:51.7492159Z add.s32 %r2270, %r17, %r2378; 2026-02-21T10:18:51.7492218Z add.s32 %r2271, %r18, %r2378; 2026-02-21T10:18:51.7492279Z add.s32 %r2272, %r19, %r2378; 2026-02-21T10:18:51.7492349Z add.s32 %r2273, %r2378, %r20; 2026-02-21T10:18:51.7492410Z add.s32 %r2274, %r21, %r2378; 2026-02-21T10:18:51.7492473Z add.s32 %r2275, %r22, %r2378; 2026-02-21T10:18:51.7492533Z add.s32 %r2276, %r23, %r2378; 2026-02-21T10:18:51.7492591Z add.s32 %r2277, %r24, %r2378; 2026-02-21T10:18:51.7492651Z add.s32 %r2278, %r25, %r2378; 2026-02-21T10:18:51.7492775Z add.s32 %r2279, %r26, %r2378; 2026-02-21T10:18:51.7492833Z add.s32 %r2280, %r27, %r2378; 2026-02-21T10:18:51.7492938Z add.s32 %r2281, %r2378, %r28; 2026-02-21T10:18:51.7493140Z .loc 1 40 32 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:40:32 2026-02-21T10:18:51.7493208Z add.s32 %r2282, %r2376, %r29; 2026-02-21T10:18:51.7493409Z .loc 1 93 28 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:93:28 2026-02-21T10:18:51.7493492Z cvt.rn.bf16x2.f32 %r2283, %r2388, %r2387; 2026-02-21T10:18:51.7493565Z cvt.rn.bf16x2.f32 %r2284, %r2390, %r2389; 2026-02-21T10:18:51.7493634Z 
cvt.rn.bf16x2.f32 %r2285, %r2392, %r2391; 2026-02-21T10:18:51.7493703Z cvt.rn.bf16x2.f32 %r2286, %r2394, %r2393; 2026-02-21T10:18:51.7493777Z cvt.rn.bf16x2.f32 %r2287, %r2396, %r2395; 2026-02-21T10:18:51.7493845Z cvt.rn.bf16x2.f32 %r2288, %r2398, %r2397; 2026-02-21T10:18:51.7493915Z cvt.rn.bf16x2.f32 %r2289, %r2400, %r2399; 2026-02-21T10:18:51.7493992Z cvt.rn.bf16x2.f32 %r2290, %r2402, %r2401; 2026-02-21T10:18:51.7494064Z cvt.rn.bf16x2.f32 %r2291, %r2404, %r2403; 2026-02-21T10:18:51.7494182Z cvt.rn.bf16x2.f32 %r2292, %r2406, %r2405; 2026-02-21T10:18:51.7494260Z cvt.rn.bf16x2.f32 %r2293, %r2408, %r2407; 2026-02-21T10:18:51.7494330Z cvt.rn.bf16x2.f32 %r2294, %r2410, %r2409; 2026-02-21T10:18:51.7494400Z cvt.rn.bf16x2.f32 %r2295, %r2412, %r2411; 2026-02-21T10:18:51.7494468Z cvt.rn.bf16x2.f32 %r2296, %r2414, %r2413; 2026-02-21T10:18:51.7494583Z cvt.rn.bf16x2.f32 %r2297, %r2416, %r2415; 2026-02-21T10:18:51.7494656Z cvt.rn.bf16x2.f32 %r2298, %r2418, %r2417; 2026-02-21T10:18:51.7494726Z cvt.rn.bf16x2.f32 %r2299, %r2420, %r2419; 2026-02-21T10:18:51.7494798Z cvt.rn.bf16x2.f32 %r2300, %r2422, %r2421; 2026-02-21T10:18:51.7494869Z cvt.rn.bf16x2.f32 %r2301, %r2424, %r2423; 2026-02-21T10:18:51.7494937Z cvt.rn.bf16x2.f32 %r2302, %r2426, %r2425; 2026-02-21T10:18:51.7495010Z cvt.rn.bf16x2.f32 %r2303, %r2428, %r2427; 2026-02-21T10:18:51.7495087Z cvt.rn.bf16x2.f32 %r2304, %r2430, %r2429; 2026-02-21T10:18:51.7495157Z cvt.rn.bf16x2.f32 %r2305, %r2432, %r2431; 2026-02-21T10:18:51.7495228Z cvt.rn.bf16x2.f32 %r2306, %r2434, %r2433; 2026-02-21T10:18:51.7495301Z cvt.rn.bf16x2.f32 %r2307, %r2436, %r2435; 2026-02-21T10:18:51.7495370Z cvt.rn.bf16x2.f32 %r2308, %r2438, %r2437; 2026-02-21T10:18:51.7495438Z cvt.rn.bf16x2.f32 %r2309, %r2440, %r2439; 2026-02-21T10:18:51.7495511Z cvt.rn.bf16x2.f32 %r2310, %r2442, %r2441; 2026-02-21T10:18:51.7495582Z cvt.rn.bf16x2.f32 %r2311, %r2444, %r2443; 2026-02-21T10:18:51.7495650Z cvt.rn.bf16x2.f32 %r2312, %r2446, %r2445; 2026-02-21T10:18:51.7495719Z 
cvt.rn.bf16x2.f32 %r2313, %r2448, %r2447; 2026-02-21T10:18:51.7495792Z cvt.rn.bf16x2.f32 %r2314, %r2450, %r2449; 2026-02-21T10:18:51.7495861Z cvt.rn.bf16x2.f32 %r2315, %r2452, %r2451; 2026-02-21T10:18:51.7495930Z cvt.rn.bf16x2.f32 %r2316, %r2454, %r2453; 2026-02-21T10:18:51.7496004Z cvt.rn.bf16x2.f32 %r2317, %r2456, %r2455; 2026-02-21T10:18:51.7496074Z cvt.rn.bf16x2.f32 %r2318, %r2458, %r2457; 2026-02-21T10:18:51.7496144Z cvt.rn.bf16x2.f32 %r2319, %r2460, %r2459; 2026-02-21T10:18:51.7496219Z cvt.rn.bf16x2.f32 %r2320, %r2462, %r2461; 2026-02-21T10:18:51.7496290Z cvt.rn.bf16x2.f32 %r2321, %r2464, %r2463; 2026-02-21T10:18:51.7496360Z cvt.rn.bf16x2.f32 %r2322, %r2466, %r2465; 2026-02-21T10:18:51.7500782Z cvt.rn.bf16x2.f32 %r2323, %r2468, %r2467; 2026-02-21T10:18:51.7500922Z cvt.rn.bf16x2.f32 %r2324, %r2470, %r2469; 2026-02-21T10:18:51.7501017Z cvt.rn.bf16x2.f32 %r2325, %r2472, %r2471; 2026-02-21T10:18:51.7501101Z cvt.rn.bf16x2.f32 %r2326, %r2474, %r2473; 2026-02-21T10:18:51.7501176Z cvt.rn.bf16x2.f32 %r2327, %r2476, %r2475; 2026-02-21T10:18:51.7501252Z cvt.rn.bf16x2.f32 %r2328, %r2478, %r2477; 2026-02-21T10:18:51.7501333Z cvt.rn.bf16x2.f32 %r2329, %r2480, %r2479; 2026-02-21T10:18:51.7501410Z cvt.rn.bf16x2.f32 %r2330, %r2482, %r2481; 2026-02-21T10:18:51.7501484Z cvt.rn.bf16x2.f32 %r2331, %r2484, %r2483; 2026-02-21T10:18:51.7501719Z cvt.rn.bf16x2.f32 %r2332, %r2486, %r2485; 2026-02-21T10:18:51.7501793Z cvt.rn.bf16x2.f32 %r2333, %r2488, %r2487; 2026-02-21T10:18:51.7501932Z cvt.rn.bf16x2.f32 %r2334, %r2490, %r2489; 2026-02-21T10:18:51.7502004Z cvt.rn.bf16x2.f32 %r2335, %r2492, %r2491; 2026-02-21T10:18:51.7502083Z cvt.rn.bf16x2.f32 %r2336, %r2494, %r2493; 2026-02-21T10:18:51.7502153Z cvt.rn.bf16x2.f32 %r2337, %r2496, %r2495; 2026-02-21T10:18:51.7502225Z cvt.rn.bf16x2.f32 %r2338, %r2498, %r2497; 2026-02-21T10:18:51.7502303Z cvt.rn.bf16x2.f32 %r2339, %r2500, %r2499; 2026-02-21T10:18:51.7502375Z cvt.rn.bf16x2.f32 %r2340, %r2502, %r2501; 2026-02-21T10:18:51.7502445Z 
cvt.rn.bf16x2.f32 %r2341, %r2504, %r2503; 2026-02-21T10:18:51.7502524Z cvt.rn.bf16x2.f32 %r2342, %r2506, %r2505; 2026-02-21T10:18:51.7502595Z cvt.rn.bf16x2.f32 %r2343, %r2508, %r2507; 2026-02-21T10:18:51.7502666Z cvt.rn.bf16x2.f32 %r2344, %r2510, %r2509; 2026-02-21T10:18:51.7502735Z cvt.rn.bf16x2.f32 %r2345, %r2512, %r2511; 2026-02-21T10:18:51.7502816Z cvt.rn.bf16x2.f32 %r2346, %r2514, %r2513; 2026-02-21T10:18:51.7503051Z .loc 1 94 50 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:94:50 2026-02-21T10:18:51.7503209Z mad.lo.s32 %r2347, %r2266, 1280, %r2282; 2026-02-21T10:18:51.7503296Z mad.lo.s32 %r2348, %r2267, 1280, %r2282; 2026-02-21T10:18:51.7503365Z mad.lo.s32 %r2349, %r2268, 1280, %r2282; 2026-02-21T10:18:51.7503432Z mad.lo.s32 %r2350, %r2269, 1280, %r2282; 2026-02-21T10:18:51.7503584Z mad.lo.s32 %r2351, %r2270, 1280, %r2282; 2026-02-21T10:18:51.7503656Z mad.lo.s32 %r2352, %r2271, 1280, %r2282; 2026-02-21T10:18:51.7503722Z mad.lo.s32 %r2353, %r2272, 1280, %r2282; 2026-02-21T10:18:51.7503790Z mad.lo.s32 %r2354, %r2273, 1280, %r2282; 2026-02-21T10:18:51.7503867Z mad.lo.s32 %r2355, %r2274, 1280, %r2282; 2026-02-21T10:18:51.7503933Z mad.lo.s32 %r2356, %r2275, 1280, %r2282; 2026-02-21T10:18:51.7504013Z mad.lo.s32 %r2357, %r2276, 1280, %r2282; 2026-02-21T10:18:51.7504088Z mad.lo.s32 %r2358, %r2277, 1280, %r2282; 2026-02-21T10:18:51.7504157Z mad.lo.s32 %r2359, %r2278, 1280, %r2282; 2026-02-21T10:18:51.7504224Z mad.lo.s32 %r2360, %r2279, 1280, %r2282; 2026-02-21T10:18:51.7504292Z mad.lo.s32 %r2361, %r2280, 1280, %r2282; 2026-02-21T10:18:51.7504367Z mad.lo.s32 %r2362, %r2281, 1280, %r2282; 2026-02-21T10:18:51.7504597Z .loc 1 94 22 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:94:22 2026-02-21T10:18:51.7504678Z mad.wide.s32 %rd109, %r2347, 2, %rd7; 2026-02-21T10:18:51.7504754Z mad.wide.s32 %rd110, %r2348, 2, %rd7; 2026-02-21T10:18:51.7504822Z mad.wide.s32 %rd111, %r2349, 2, %rd7; 2026-02-21T10:18:51.7504886Z mad.wide.s32 %rd112, %r2350, 2, %rd7; 
2026-02-21T10:18:51.7504958Z mad.wide.s32 %rd113, %r2351, 2, %rd7; 2026-02-21T10:18:51.7505028Z mad.wide.s32 %rd114, %r2352, 2, %rd7; 2026-02-21T10:18:51.7505093Z mad.wide.s32 %rd115, %r2353, 2, %rd7; 2026-02-21T10:18:51.7505160Z mad.wide.s32 %rd116, %r2354, 2, %rd7; 2026-02-21T10:18:51.7505236Z mad.wide.s32 %rd117, %r2355, 2, %rd7; 2026-02-21T10:18:51.7505302Z mad.wide.s32 %rd118, %r2356, 2, %rd7; 2026-02-21T10:18:51.7505370Z mad.wide.s32 %rd119, %r2357, 2, %rd7; 2026-02-21T10:18:51.7505443Z mad.wide.s32 %rd120, %r2358, 2, %rd7; 2026-02-21T10:18:51.7505510Z mad.wide.s32 %rd121, %r2359, 2, %rd7; 2026-02-21T10:18:51.7505576Z mad.wide.s32 %rd122, %r2360, 2, %rd7; 2026-02-21T10:18:51.7505648Z mad.wide.s32 %rd123, %r2361, 2, %rd7; 2026-02-21T10:18:51.7505713Z mad.wide.s32 %rd124, %r2362, 2, %rd7; 2026-02-21T10:18:51.7505929Z .loc 1 94 81 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:94:81 2026-02-21T10:18:51.7506051Z st.shared.v4.b32 [%r60], {%r2283, %r2285, %r2287, %r2289}; 2026-02-21T10:18:51.7506163Z st.shared.v4.b32 [%r61], {%r2291, %r2293, %r2295, %r2297}; 2026-02-21T10:18:51.7506265Z st.shared.v4.b32 [%r62], {%r2299, %r2301, %r2303, %r2305}; 2026-02-21T10:18:51.7506363Z st.shared.v4.b32 [%r63], {%r2307, %r2309, %r2311, %r2313}; 2026-02-21T10:18:51.7506802Z bar.sync 0; 2026-02-21T10:18:51.7506873Z // begin inline asm 2026-02-21T10:18:51.7507150Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2201, %r2202, %r2203, %r2204}, [%r2125]; 2026-02-21T10:18:51.7507219Z // end inline asm 2026-02-21T10:18:51.7507291Z // begin inline asm 2026-02-21T10:18:51.7507477Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2209, %r2210, %r2211, %r2212}, [%r2130]; 2026-02-21T10:18:51.7507536Z // end inline asm 2026-02-21T10:18:51.7507603Z // begin inline asm 2026-02-21T10:18:51.7507781Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2217, %r2218, %r2219, %r2220}, [%r2135]; 2026-02-21T10:18:51.7507839Z // end inline asm 2026-02-21T10:18:51.7507909Z // begin inline asm 
2026-02-21T10:18:51.7508098Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2225, %r2226, %r2227, %r2228}, [%r2140]; 2026-02-21T10:18:51.7508156Z // end inline asm 2026-02-21T10:18:51.7508213Z bar.sync 0; 2026-02-21T10:18:51.7508331Z st.shared.v4.b32 [%r60], {%r2284, %r2286, %r2288, %r2290}; 2026-02-21T10:18:51.7508518Z st.shared.v4.b32 [%r61], {%r2292, %r2294, %r2296, %r2298}; 2026-02-21T10:18:51.7508627Z st.shared.v4.b32 [%r62], {%r2300, %r2302, %r2304, %r2306}; 2026-02-21T10:18:51.7508823Z st.shared.v4.b32 [%r63], {%r2308, %r2310, %r2312, %r2314}; 2026-02-21T10:18:51.7508884Z bar.sync 0; 2026-02-21T10:18:51.7508945Z // begin inline asm 2026-02-21T10:18:51.7509136Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2205, %r2206, %r2207, %r2208}, [%r2125]; 2026-02-21T10:18:51.7509258Z // end inline asm 2026-02-21T10:18:51.7509320Z // begin inline asm 2026-02-21T10:18:51.7509498Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2213, %r2214, %r2215, %r2216}, [%r2130]; 2026-02-21T10:18:51.7509560Z // end inline asm 2026-02-21T10:18:51.7509619Z // begin inline asm 2026-02-21T10:18:51.7509797Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2221, %r2222, %r2223, %r2224}, [%r2135]; 2026-02-21T10:18:51.7509861Z // end inline asm 2026-02-21T10:18:51.7509921Z // begin inline asm 2026-02-21T10:18:51.7510098Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2229, %r2230, %r2231, %r2232}, [%r2140]; 2026-02-21T10:18:51.7510165Z // end inline asm 2026-02-21T10:18:51.7510221Z bar.sync 0; 2026-02-21T10:18:51.7510326Z st.shared.v4.b32 [%r60], {%r2315, %r2317, %r2319, %r2321}; 2026-02-21T10:18:51.7510428Z st.shared.v4.b32 [%r61], {%r2323, %r2325, %r2327, %r2329}; 2026-02-21T10:18:51.7510532Z st.shared.v4.b32 [%r62], {%r2331, %r2333, %r2335, %r2337}; 2026-02-21T10:18:51.7510634Z st.shared.v4.b32 [%r63], {%r2339, %r2341, %r2343, %r2345}; 2026-02-21T10:18:51.7510691Z bar.sync 0; 2026-02-21T10:18:51.7510756Z // begin inline asm 2026-02-21T10:18:51.7510933Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2233, 
%r2234, %r2235, %r2236}, [%r2125]; 2026-02-21T10:18:51.7510992Z // end inline asm 2026-02-21T10:18:51.7511055Z // begin inline asm 2026-02-21T10:18:51.7511236Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2241, %r2242, %r2243, %r2244}, [%r2130]; 2026-02-21T10:18:51.7511295Z // end inline asm 2026-02-21T10:18:51.7511354Z // begin inline asm 2026-02-21T10:18:51.7511545Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2249, %r2250, %r2251, %r2252}, [%r2135]; 2026-02-21T10:18:51.7511607Z // end inline asm 2026-02-21T10:18:51.7511668Z // begin inline asm 2026-02-21T10:18:51.7511860Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2257, %r2258, %r2259, %r2260}, [%r2140]; 2026-02-21T10:18:51.7511918Z // end inline asm 2026-02-21T10:18:51.7511976Z bar.sync 0; 2026-02-21T10:18:51.7512084Z st.shared.v4.b32 [%r60], {%r2316, %r2318, %r2320, %r2322}; 2026-02-21T10:18:51.7512195Z st.shared.v4.b32 [%r61], {%r2324, %r2326, %r2328, %r2330}; 2026-02-21T10:18:51.7512295Z st.shared.v4.b32 [%r62], {%r2332, %r2334, %r2336, %r2338}; 2026-02-21T10:18:51.7512397Z st.shared.v4.b32 [%r63], {%r2340, %r2342, %r2344, %r2346}; 2026-02-21T10:18:51.7512456Z bar.sync 0; 2026-02-21T10:18:51.7512517Z // begin inline asm 2026-02-21T10:18:51.7512699Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2237, %r2238, %r2239, %r2240}, [%r2125]; 2026-02-21T10:18:51.7512840Z // end inline asm 2026-02-21T10:18:51.7512899Z // begin inline asm 2026-02-21T10:18:51.7513125Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2245, %r2246, %r2247, %r2248}, [%r2130]; 2026-02-21T10:18:51.7513184Z // end inline asm 2026-02-21T10:18:51.7513247Z // begin inline asm 2026-02-21T10:18:51.7513423Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2253, %r2254, %r2255, %r2256}, [%r2135]; 2026-02-21T10:18:51.7513480Z // end inline asm 2026-02-21T10:18:51.7513563Z // begin inline asm 2026-02-21T10:18:51.7513743Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2261, %r2262, %r2263, %r2264}, [%r2140]; 2026-02-21T10:18:51.7513800Z // end inline asm 
2026-02-21T10:18:51.7513858Z // begin inline asm 2026-02-21T10:18:51.7513991Z st.global.v4.b32 [ %rd109 + 0 ], { %r2201, %r2202, %r2203, %r2204 }; 2026-02-21T10:18:51.7514048Z // end inline asm 2026-02-21T10:18:51.7514111Z // begin inline asm 2026-02-21T10:18:51.7514235Z st.global.v4.b32 [ %rd110 + 0 ], { %r2205, %r2206, %r2207, %r2208 }; 2026-02-21T10:18:51.7514296Z // end inline asm 2026-02-21T10:18:51.7514355Z // begin inline asm 2026-02-21T10:18:51.7514476Z st.global.v4.b32 [ %rd111 + 0 ], { %r2209, %r2210, %r2211, %r2212 }; 2026-02-21T10:18:51.7514585Z // end inline asm 2026-02-21T10:18:51.7514659Z // begin inline asm 2026-02-21T10:18:51.7514780Z st.global.v4.b32 [ %rd112 + 0 ], { %r2213, %r2214, %r2215, %r2216 }; 2026-02-21T10:18:51.7514842Z // end inline asm 2026-02-21T10:18:51.7514949Z // begin inline asm 2026-02-21T10:18:51.7515064Z st.global.v4.b32 [ %rd113 + 0 ], { %r2217, %r2218, %r2219, %r2220 }; 2026-02-21T10:18:51.7515126Z // end inline asm 2026-02-21T10:18:51.7515185Z // begin inline asm 2026-02-21T10:18:51.7515298Z st.global.v4.b32 [ %rd114 + 0 ], { %r2221, %r2222, %r2223, %r2224 }; 2026-02-21T10:18:51.7515354Z // end inline asm 2026-02-21T10:18:51.7515420Z // begin inline asm 2026-02-21T10:18:51.7515532Z st.global.v4.b32 [ %rd115 + 0 ], { %r2225, %r2226, %r2227, %r2228 }; 2026-02-21T10:18:51.7515592Z // end inline asm 2026-02-21T10:18:51.7515660Z // begin inline asm 2026-02-21T10:18:51.7515776Z st.global.v4.b32 [ %rd116 + 0 ], { %r2229, %r2230, %r2231, %r2232 }; 2026-02-21T10:18:51.7515836Z // end inline asm 2026-02-21T10:18:51.7515902Z // begin inline asm 2026-02-21T10:18:51.7516019Z st.global.v4.b32 [ %rd117 + 0 ], { %r2233, %r2234, %r2235, %r2236 }; 2026-02-21T10:18:51.7516076Z // end inline asm 2026-02-21T10:18:51.7516134Z // begin inline asm 2026-02-21T10:18:51.7516257Z st.global.v4.b32 [ %rd118 + 0 ], { %r2237, %r2238, %r2239, %r2240 }; 2026-02-21T10:18:51.7516314Z // end inline asm 2026-02-21T10:18:51.7516373Z // begin inline asm 
2026-02-21T10:18:51.7516623Z st.global.v4.b32 [ %rd119 + 0 ], { %r2241, %r2242, %r2243, %r2244 }; 2026-02-21T10:18:51.7516686Z // end inline asm 2026-02-21T10:18:51.7516746Z // begin inline asm 2026-02-21T10:18:51.7516861Z st.global.v4.b32 [ %rd120 + 0 ], { %r2245, %r2246, %r2247, %r2248 }; 2026-02-21T10:18:51.7516922Z // end inline asm 2026-02-21T10:18:51.7516984Z // begin inline asm 2026-02-21T10:18:51.7517100Z st.global.v4.b32 [ %rd121 + 0 ], { %r2249, %r2250, %r2251, %r2252 }; 2026-02-21T10:18:51.7517169Z // end inline asm 2026-02-21T10:18:51.7517229Z // begin inline asm 2026-02-21T10:18:51.7517343Z st.global.v4.b32 [ %rd122 + 0 ], { %r2253, %r2254, %r2255, %r2256 }; 2026-02-21T10:18:51.7517404Z // end inline asm 2026-02-21T10:18:51.7517462Z // begin inline asm 2026-02-21T10:18:51.7517579Z st.global.v4.b32 [ %rd123 + 0 ], { %r2257, %r2258, %r2259, %r2260 }; 2026-02-21T10:18:51.7517636Z // end inline asm 2026-02-21T10:18:51.7517700Z // begin inline asm 2026-02-21T10:18:51.7517812Z st.global.v4.b32 [ %rd124 + 0 ], { %r2261, %r2262, %r2263, %r2264 }; 2026-02-21T10:18:51.7517868Z // end inline asm 2026-02-21T10:18:51.7517939Z mov.b32 %r2387, 0f00000000; 2026-02-21T10:18:51.7518005Z mov.b32 %r2388, %r2387; 2026-02-21T10:18:51.7518066Z mov.b32 %r2389, %r2387; 2026-02-21T10:18:51.7518124Z mov.b32 %r2390, %r2387; 2026-02-21T10:18:51.7518186Z mov.b32 %r2391, %r2387; 2026-02-21T10:18:51.7518344Z mov.b32 %r2392, %r2387; 2026-02-21T10:18:51.7518461Z mov.b32 %r2393, %r2387; 2026-02-21T10:18:51.7518526Z mov.b32 %r2394, %r2387; 2026-02-21T10:18:51.7518587Z mov.b32 %r2395, %r2387; 2026-02-21T10:18:51.7518647Z mov.b32 %r2396, %r2387; 2026-02-21T10:18:51.7518706Z mov.b32 %r2397, %r2387; 2026-02-21T10:18:51.7518773Z mov.b32 %r2398, %r2387; 2026-02-21T10:18:51.7518832Z mov.b32 %r2399, %r2387; 2026-02-21T10:18:51.7518893Z mov.b32 %r2400, %r2387; 2026-02-21T10:18:51.7518954Z mov.b32 %r2401, %r2387; 2026-02-21T10:18:51.7519014Z mov.b32 %r2402, %r2387; 2026-02-21T10:18:51.7519073Z 
mov.b32 %r2403, %r2387; 2026-02-21T10:18:51.7519131Z mov.b32 %r2404, %r2387; 2026-02-21T10:18:51.7519195Z mov.b32 %r2405, %r2387; 2026-02-21T10:18:51.7519254Z mov.b32 %r2406, %r2387; 2026-02-21T10:18:51.7519312Z mov.b32 %r2407, %r2387; 2026-02-21T10:18:51.7519375Z mov.b32 %r2408, %r2387; 2026-02-21T10:18:51.7519434Z mov.b32 %r2409, %r2387; 2026-02-21T10:18:51.7519494Z mov.b32 %r2410, %r2387; 2026-02-21T10:18:51.7519559Z mov.b32 %r2411, %r2387; 2026-02-21T10:18:51.7519621Z mov.b32 %r2412, %r2387; 2026-02-21T10:18:51.7519748Z mov.b32 %r2413, %r2387; 2026-02-21T10:18:51.7519811Z mov.b32 %r2414, %r2387; 2026-02-21T10:18:51.7519874Z mov.b32 %r2415, %r2387; 2026-02-21T10:18:51.7519934Z mov.b32 %r2416, %r2387; 2026-02-21T10:18:51.7519994Z mov.b32 %r2417, %r2387; 2026-02-21T10:18:51.7520060Z mov.b32 %r2418, %r2387; 2026-02-21T10:18:51.7520179Z mov.b32 %r2419, %r2387; 2026-02-21T10:18:51.7520240Z mov.b32 %r2420, %r2387; 2026-02-21T10:18:51.7520298Z mov.b32 %r2421, %r2387; 2026-02-21T10:18:51.7520367Z mov.b32 %r2422, %r2387; 2026-02-21T10:18:51.7520426Z mov.b32 %r2423, %r2387; 2026-02-21T10:18:51.7520486Z mov.b32 %r2424, %r2387; 2026-02-21T10:18:51.7520547Z mov.b32 %r2425, %r2387; 2026-02-21T10:18:51.7520604Z mov.b32 %r2426, %r2387; 2026-02-21T10:18:51.7520663Z mov.b32 %r2427, %r2387; 2026-02-21T10:18:51.7520724Z mov.b32 %r2428, %r2387; 2026-02-21T10:18:51.7520801Z mov.b32 %r2429, %r2387; 2026-02-21T10:18:51.7520866Z mov.b32 %r2430, %r2387; 2026-02-21T10:18:51.7520925Z mov.b32 %r2431, %r2387; 2026-02-21T10:18:51.7520991Z mov.b32 %r2432, %r2387; 2026-02-21T10:18:51.7521050Z mov.b32 %r2433, %r2387; 2026-02-21T10:18:51.7521108Z mov.b32 %r2434, %r2387; 2026-02-21T10:18:51.7521166Z mov.b32 %r2435, %r2387; 2026-02-21T10:18:51.7521229Z mov.b32 %r2436, %r2387; 2026-02-21T10:18:51.7521290Z mov.b32 %r2437, %r2387; 2026-02-21T10:18:51.7521351Z mov.b32 %r2438, %r2387; 2026-02-21T10:18:51.7521419Z mov.b32 %r2439, %r2387; 2026-02-21T10:18:51.7521479Z mov.b32 %r2440, %r2387; 
2026-02-21T10:18:51.7521536Z mov.b32 %r2441, %r2387; 2026-02-21T10:18:51.7521594Z mov.b32 %r2442, %r2387; 2026-02-21T10:18:51.7521659Z mov.b32 %r2443, %r2387; 2026-02-21T10:18:51.7521718Z mov.b32 %r2444, %r2387; 2026-02-21T10:18:51.7521777Z mov.b32 %r2445, %r2387; 2026-02-21T10:18:51.7521846Z mov.b32 %r2446, %r2387; 2026-02-21T10:18:51.7521906Z mov.b32 %r2447, %r2387; 2026-02-21T10:18:51.7521965Z mov.b32 %r2448, %r2387; 2026-02-21T10:18:51.7522026Z mov.b32 %r2449, %r2387; 2026-02-21T10:18:51.7522090Z mov.b32 %r2450, %r2387; 2026-02-21T10:18:51.7522150Z mov.b32 %r2451, %r2387; 2026-02-21T10:18:51.7522210Z mov.b32 %r2452, %r2387; 2026-02-21T10:18:51.7522277Z mov.b32 %r2453, %r2387; 2026-02-21T10:18:51.7522336Z mov.b32 %r2454, %r2387; 2026-02-21T10:18:51.7522394Z mov.b32 %r2455, %r2387; 2026-02-21T10:18:51.7522460Z mov.b32 %r2456, %r2387; 2026-02-21T10:18:51.7522519Z mov.b32 %r2457, %r2387; 2026-02-21T10:18:51.7522577Z mov.b32 %r2458, %r2387; 2026-02-21T10:18:51.7522636Z mov.b32 %r2459, %r2387; 2026-02-21T10:18:51.7522700Z mov.b32 %r2460, %r2387; 2026-02-21T10:18:51.7522760Z mov.b32 %r2461, %r2387; 2026-02-21T10:18:51.7522819Z mov.b32 %r2462, %r2387; 2026-02-21T10:18:51.7522881Z mov.b32 %r2463, %r2387; 2026-02-21T10:18:51.7522941Z mov.b32 %r2464, %r2387; 2026-02-21T10:18:51.7522998Z mov.b32 %r2465, %r2387; 2026-02-21T10:18:51.7523120Z mov.b32 %r2466, %r2387; 2026-02-21T10:18:51.7523186Z mov.b32 %r2467, %r2387; 2026-02-21T10:18:51.7523290Z mov.b32 %r2468, %r2387; 2026-02-21T10:18:51.7523351Z mov.b32 %r2469, %r2387; 2026-02-21T10:18:51.7523414Z mov.b32 %r2470, %r2387; 2026-02-21T10:18:51.7523475Z mov.b32 %r2471, %r2387; 2026-02-21T10:18:51.7523534Z mov.b32 %r2472, %r2387; 2026-02-21T10:18:51.7523591Z mov.b32 %r2473, %r2387; 2026-02-21T10:18:51.7523656Z mov.b32 %r2474, %r2387; 2026-02-21T10:18:51.7523715Z mov.b32 %r2475, %r2387; 2026-02-21T10:18:51.7523774Z mov.b32 %r2476, %r2387; 2026-02-21T10:18:51.7523838Z mov.b32 %r2477, %r2387; 2026-02-21T10:18:51.7523897Z mov.b32 
%r2478, %r2387; 2026-02-21T10:18:51.7523956Z mov.b32 %r2479, %r2387; 2026-02-21T10:18:51.7524017Z mov.b32 %r2480, %r2387; 2026-02-21T10:18:51.7524081Z mov.b32 %r2481, %r2387; 2026-02-21T10:18:51.7524141Z mov.b32 %r2482, %r2387; 2026-02-21T10:18:51.7524198Z mov.b32 %r2483, %r2387; 2026-02-21T10:18:51.7524260Z mov.b32 %r2484, %r2387; 2026-02-21T10:18:51.7524321Z mov.b32 %r2485, %r2387; 2026-02-21T10:18:51.7524382Z mov.b32 %r2486, %r2387; 2026-02-21T10:18:51.7524440Z mov.b32 %r2487, %r2387; 2026-02-21T10:18:51.7524573Z mov.b32 %r2488, %r2387; 2026-02-21T10:18:51.7524634Z mov.b32 %r2489, %r2387; 2026-02-21T10:18:51.7524692Z mov.b32 %r2490, %r2387; 2026-02-21T10:18:51.7524756Z mov.b32 %r2491, %r2387; 2026-02-21T10:18:51.7524819Z mov.b32 %r2492, %r2387; 2026-02-21T10:18:51.7524924Z mov.b32 %r2493, %r2387; 2026-02-21T10:18:51.7524984Z mov.b32 %r2494, %r2387; 2026-02-21T10:18:51.7525061Z mov.b32 %r2495, %r2387; 2026-02-21T10:18:51.7525122Z mov.b32 %r2496, %r2387; 2026-02-21T10:18:51.7525182Z mov.b32 %r2497, %r2387; 2026-02-21T10:18:51.7525243Z mov.b32 %r2498, %r2387; 2026-02-21T10:18:51.7525301Z mov.b32 %r2499, %r2387; 2026-02-21T10:18:51.7525359Z mov.b32 %r2500, %r2387; 2026-02-21T10:18:51.7525418Z mov.b32 %r2501, %r2387; 2026-02-21T10:18:51.7525483Z mov.b32 %r2502, %r2387; 2026-02-21T10:18:51.7525547Z mov.b32 %r2503, %r2387; 2026-02-21T10:18:51.7525606Z mov.b32 %r2504, %r2387; 2026-02-21T10:18:51.7525673Z mov.b32 %r2505, %r2387; 2026-02-21T10:18:51.7525732Z mov.b32 %r2506, %r2387; 2026-02-21T10:18:51.7525793Z mov.b32 %r2507, %r2387; 2026-02-21T10:18:51.7525860Z mov.b32 %r2508, %r2387; 2026-02-21T10:18:51.7525917Z mov.b32 %r2509, %r2387; 2026-02-21T10:18:51.7525975Z mov.b32 %r2510, %r2387; 2026-02-21T10:18:51.7526040Z mov.b32 %r2511, %r2387; 2026-02-21T10:18:51.7526100Z mov.b32 %r2512, %r2387; 2026-02-21T10:18:51.7526159Z mov.b32 %r2513, %r2387; 2026-02-21T10:18:51.7526216Z mov.b32 %r2514, %r2387; 2026-02-21T10:18:51.7526280Z bra.uni $L__BB0_6; 
2026-02-21T10:18:51.7526372Z $L__BB0_7: // %._crit_edge 2026-02-21T10:18:51.7526726Z .loc 1 26 145 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:26:145 2026-02-21T10:18:51.7526804Z cp.async.wait_group 0; 2026-02-21T10:18:51.7526862Z bar.sync 0; 2026-02-21T10:18:51.7526924Z // begin inline asm 2026-02-21T10:18:51.7527028Z @%p61 mbarrier.inval.shared::cta.b64 [%r512]; 2026-02-21T10:18:51.7527092Z // end inline asm 2026-02-21T10:18:51.7527148Z bar.sync 0; 2026-02-21T10:18:51.7527208Z // begin inline asm 2026-02-21T10:18:51.7527302Z @%p61 mbarrier.inval.shared::cta.b64 [%r513]; 2026-02-21T10:18:51.7527361Z // end inline asm 2026-02-21T10:18:51.7527416Z bar.sync 0; 2026-02-21T10:18:51.7527475Z // begin inline asm 2026-02-21T10:18:51.7527567Z @%p61 mbarrier.inval.shared::cta.b64 [%r514]; 2026-02-21T10:18:51.7527625Z // end inline asm 2026-02-21T10:18:51.7527832Z .loc 1 26 4 // cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py:26:4 2026-02-21T10:18:51.7527890Z ret; 2026-02-21T10:18:51.7527946Z $L__tmp3: 2026-02-21T10:18:51.7528002Z $L__func_end0: 2026-02-21T10:18:51.7528094Z // -- End function 2026-02-21T10:18:51.7528148Z } 2026-02-21T10:18:51.7528397Z .file 1 "/tmp/torchinductor_root/n6/cn66s7sicjwqkgpmw3rqicb3htj62pik6istiwx4ivea5soknenl.py" 2026-02-21T10:18:51.7528695Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T10:18:51.7528835Z .section .debug_abbrev 2026-02-21T10:18:51.7528890Z { 2026-02-21T10:18:51.7528993Z .b8 1 // Abbreviation Code 2026-02-21T10:18:51.7529105Z .b8 17 // DW_TAG_compile_unit 2026-02-21T10:18:51.7529195Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:18:51.7529284Z .b8 37 // DW_AT_producer 2026-02-21T10:18:51.7529369Z .b8 8 // DW_FORM_string 2026-02-21T10:18:51.7529447Z .b8 19 // DW_AT_language 2026-02-21T10:18:51.7529530Z .b8 5 // DW_FORM_data2 2026-02-21T10:18:51.7529608Z .b8 3 // DW_AT_name 2026-02-21T10:18:51.7529692Z .b8 8 // DW_FORM_string 2026-02-21T10:18:51.7529781Z .b8 16 
// DW_AT_stmt_list 2026-02-21T10:18:51.7529861Z .b8 6 // DW_FORM_data4 2026-02-21T10:18:51.7530011Z .b8 27 // DW_AT_comp_dir 2026-02-21T10:18:51.7530098Z .b8 8 // DW_FORM_string 2026-02-21T10:18:51.7530175Z .b8 0 // EOM(1) 2026-02-21T10:18:51.7530305Z .b8 0 // EOM(2) 2026-02-21T10:18:51.7530398Z .b8 2 // Abbreviation Code 2026-02-21T10:18:51.7530486Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:18:51.7530563Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:18:51.7530641Z .b8 3 // DW_AT_name 2026-02-21T10:18:51.7530720Z .b8 8 // DW_FORM_string 2026-02-21T10:18:51.7530801Z .b8 32 // DW_AT_inline 2026-02-21T10:18:51.7530889Z .b8 11 // DW_FORM_data1 2026-02-21T10:18:51.7530961Z .b8 0 // EOM(1) 2026-02-21T10:18:51.7531031Z .b8 0 // EOM(2) 2026-02-21T10:18:51.7531121Z .b8 3 // Abbreviation Code 2026-02-21T10:18:51.7531207Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:18:51.7531291Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:18:51.7531374Z .b8 17 // DW_AT_low_pc 2026-02-21T10:18:51.7531450Z .b8 1 // DW_FORM_addr 2026-02-21T10:18:51.7531532Z .b8 18 // DW_AT_high_pc 2026-02-21T10:18:51.7531610Z .b8 1 // DW_FORM_addr 2026-02-21T10:18:51.7531706Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:18:51.7531796Z .b8 19 // DW_FORM_ref4 2026-02-21T10:18:51.7531876Z .b8 0 // EOM(1) 2026-02-21T10:18:51.7531952Z .b8 0 // EOM(2) 2026-02-21T10:18:51.7532043Z .b8 4 // Abbreviation Code 2026-02-21T10:18:51.7532143Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T10:18:51.7532229Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:18:51.7532322Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:18:51.7532398Z .b8 19 // DW_FORM_ref4 2026-02-21T10:18:51.7532475Z .b8 17 // DW_AT_low_pc 2026-02-21T10:18:51.7532553Z .b8 1 // DW_FORM_addr 2026-02-21T10:18:51.7532633Z .b8 18 // DW_AT_high_pc 2026-02-21T10:18:51.7532705Z .b8 1 // DW_FORM_addr 2026-02-21T10:18:51.7532790Z .b8 88 // DW_AT_call_file 2026-02-21T10:18:51.7532927Z .b8 11 // DW_FORM_data1 2026-02-21T10:18:51.7533054Z .b8 89 // DW_AT_call_line 2026-02-21T10:18:51.7533137Z .b8 
11 // DW_FORM_data1 2026-02-21T10:18:51.7533221Z .b8 87 // DW_AT_call_column 2026-02-21T10:18:51.7533297Z .b8 11 // DW_FORM_data1 2026-02-21T10:18:51.7533368Z .b8 0 // EOM(1) 2026-02-21T10:18:51.7533453Z .b8 0 // EOM(2) 2026-02-21T10:18:51.7533524Z .b8 0 // EOM(3) 2026-02-21T10:18:51.7533577Z } 2026-02-21T10:18:51.7533647Z .section .debug_info 2026-02-21T10:18:51.7533698Z { 2026-02-21T10:18:51.7533788Z .b32 178 // Length of Unit 2026-02-21T10:18:51.7533886Z .b8 2 // DWARF version number 2026-02-21T10:18:51.7533940Z .b8 0 2026-02-21T10:18:51.7534076Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T10:18:51.7534222Z .b8 8 // Address Size (in bytes) 2026-02-21T10:18:51.7534341Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T10:18:51.7534435Z .b8 116 // DW_AT_producer 2026-02-21T10:18:51.7534496Z .b8 114 2026-02-21T10:18:51.7534592Z .b8 105 2026-02-21T10:18:51.7534646Z .b8 116 2026-02-21T10:18:51.7534701Z .b8 111 2026-02-21T10:18:51.7534752Z .b8 110 2026-02-21T10:18:51.7534804Z .b8 0 2026-02-21T10:18:51.7534884Z .b8 2 // DW_AT_language 2026-02-21T10:18:51.7534939Z .b8 0 2026-02-21T10:18:51.7535022Z .b8 99 // DW_AT_name 2026-02-21T10:18:51.7535073Z .b8 110 2026-02-21T10:18:51.7535126Z .b8 54 2026-02-21T10:18:51.7535177Z .b8 54 2026-02-21T10:18:51.7535228Z .b8 115 2026-02-21T10:18:51.7535281Z .b8 55 2026-02-21T10:18:51.7535337Z .b8 115 2026-02-21T10:18:51.7535390Z .b8 105 2026-02-21T10:18:51.7535443Z .b8 99 2026-02-21T10:18:51.7535498Z .b8 106 2026-02-21T10:18:51.7535552Z .b8 119 2026-02-21T10:18:51.7535605Z .b8 113 2026-02-21T10:18:51.7535658Z .b8 107 2026-02-21T10:18:51.7535715Z .b8 103 2026-02-21T10:18:51.7535765Z .b8 112 2026-02-21T10:18:51.7535818Z .b8 109 2026-02-21T10:18:51.7535872Z .b8 119 2026-02-21T10:18:51.7535930Z .b8 51 2026-02-21T10:18:51.7535982Z .b8 114 2026-02-21T10:18:51.7536034Z .b8 113 2026-02-21T10:18:51.7536089Z .b8 105 2026-02-21T10:18:51.7536140Z .b8 99 2026-02-21T10:18:51.7536191Z .b8 98 2026-02-21T10:18:51.7536241Z .b8 
51 2026-02-21T10:18:51.7536299Z .b8 104 2026-02-21T10:18:51.7536351Z .b8 116 2026-02-21T10:18:51.7536401Z .b8 106 2026-02-21T10:18:51.7536573Z .b8 54 2026-02-21T10:18:51.7536628Z .b8 50 2026-02-21T10:18:51.7536679Z .b8 112 2026-02-21T10:18:51.7536730Z .b8 105 2026-02-21T10:18:51.7536798Z .b8 107 2026-02-21T10:18:51.7536849Z .b8 54 2026-02-21T10:18:51.7536904Z .b8 105 2026-02-21T10:18:51.7536959Z .b8 115 2026-02-21T10:18:51.7537010Z .b8 116 2026-02-21T10:18:51.7537063Z .b8 105 2026-02-21T10:18:51.7537113Z .b8 119 2026-02-21T10:18:51.7537168Z .b8 120 2026-02-21T10:18:51.7537220Z .b8 52 2026-02-21T10:18:51.7537270Z .b8 105 2026-02-21T10:18:51.7537320Z .b8 118 2026-02-21T10:18:51.7537375Z .b8 101 2026-02-21T10:18:51.7537425Z .b8 97 2026-02-21T10:18:51.7537476Z .b8 53 2026-02-21T10:18:51.7537531Z .b8 115 2026-02-21T10:18:51.7537582Z .b8 111 2026-02-21T10:18:51.7537634Z .b8 107 2026-02-21T10:18:51.7537684Z .b8 110 2026-02-21T10:18:51.7537740Z .b8 101 2026-02-21T10:18:51.7537791Z .b8 110 2026-02-21T10:18:51.7537842Z .b8 108 2026-02-21T10:18:51.7537897Z .b8 46 2026-02-21T10:18:51.7537948Z .b8 112 2026-02-21T10:18:51.7537998Z .b8 121 2026-02-21T10:18:51.7538053Z .b8 0 2026-02-21T10:18:51.7538161Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T10:18:51.7538242Z .b8 47 // DW_AT_comp_dir 2026-02-21T10:18:51.7538377Z .b8 116 2026-02-21T10:18:51.7538431Z .b8 109 2026-02-21T10:18:51.7538482Z .b8 112 2026-02-21T10:18:51.7538592Z .b8 47 2026-02-21T10:18:51.7538643Z .b8 116 2026-02-21T10:18:51.7538697Z .b8 111 2026-02-21T10:18:51.7538749Z .b8 114 2026-02-21T10:18:51.7538800Z .b8 99 2026-02-21T10:18:51.7538851Z .b8 104 2026-02-21T10:18:51.7538906Z .b8 105 2026-02-21T10:18:51.7538973Z .b8 110 2026-02-21T10:18:51.7539028Z .b8 100 2026-02-21T10:18:51.7539083Z .b8 117 2026-02-21T10:18:51.7539135Z .b8 99 2026-02-21T10:18:51.7539188Z .b8 116 2026-02-21T10:18:51.7539239Z .b8 111 2026-02-21T10:18:51.7539295Z .b8 114 2026-02-21T10:18:51.7539346Z .b8 95 2026-02-21T10:18:51.7539397Z .b8 114 
2026-02-21T10:18:51.7539452Z .b8 111 2026-02-21T10:18:51.7539504Z .b8 111 2026-02-21T10:18:51.7539554Z .b8 116 2026-02-21T10:18:51.7539604Z .b8 47 2026-02-21T10:18:51.7539660Z .b8 110 2026-02-21T10:18:51.7539710Z .b8 54 2026-02-21T10:18:51.7539764Z .b8 0 2026-02-21T10:18:51.7539884Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T10:18:51.7539964Z .b8 95 // DW_AT_name 2026-02-21T10:18:51.7540018Z .b8 104 2026-02-21T10:18:51.7540071Z .b8 101 2026-02-21T10:18:51.7540199Z .b8 108 2026-02-21T10:18:51.7540253Z .b8 105 2026-02-21T10:18:51.7540304Z .b8 111 2026-02-21T10:18:51.7540368Z .b8 110 2026-02-21T10:18:51.7540420Z .b8 95 2026-02-21T10:18:51.7540474Z .b8 109 2026-02-21T10:18:51.7540524Z .b8 97 2026-02-21T10:18:51.7540580Z .b8 116 2026-02-21T10:18:51.7540693Z .b8 109 2026-02-21T10:18:51.7540748Z .b8 117 2026-02-21T10:18:51.7540798Z .b8 108 2026-02-21T10:18:51.7540852Z .b8 95 2026-02-21T10:18:51.7540903Z .b8 98 2026-02-21T10:18:51.7540956Z .b8 102 2026-02-21T10:18:51.7541010Z .b8 49 2026-02-21T10:18:51.7541061Z .b8 54 2026-02-21T10:18:51.7541111Z .b8 95 2026-02-21T10:18:51.7541162Z .b8 105 2026-02-21T10:18:51.7541217Z .b8 110 2026-02-21T10:18:51.7541267Z .b8 116 2026-02-21T10:18:51.7541317Z .b8 52 2026-02-21T10:18:51.7541372Z .b8 0 2026-02-21T10:18:51.7541455Z .b8 1 // DW_AT_inline 2026-02-21T10:18:51.7541564Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T10:18:51.7541664Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T10:18:51.7541767Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T10:18:51.7541870Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:18:51.7542005Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T10:18:51.7542104Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:18:51.7542192Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T10:18:51.7542280Z .b64 $L__tmp2 // DW_AT_high_pc 2026-02-21T10:18:51.7542368Z .b8 1 // DW_AT_call_file 2026-02-21T10:18:51.7542450Z .b8 90 // DW_AT_call_line 2026-02-21T10:18:51.7542537Z .b8 40 // 
DW_AT_call_column 2026-02-21T10:18:51.7542634Z .b8 0 // End Of Children Mark 2026-02-21T10:18:51.7542734Z .b8 0 // End Of Children Mark 2026-02-21T10:18:51.7542786Z } 2026-02-21T10:18:51.7542856Z .section .debug_macinfo { } 2026-02-21T10:18:51.7542863Z 2026-02-21T10:18:51.7542948Z ================================================================ 2026-02-21T10:18:51.7543066Z please share the reproducer above with Triton project. 2026-02-21T10:18:55.7841258Z 2026-02-21T10:18:55.7842189Z Generation 14: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 58/58 2.0 configs/s 2026-02-21T10:18:58.0441227Z Generation 14: verifying top configs 100% ━━━━━━━━━━━━━━━━━━ 34/34 9.8 configs/s 2026-02-21T10:18:59.4923428Z [3132s] Generation 14 complete: 2026-02-21T10:18:59.4923877Z error=16 2026-02-21T10:18:59.4924180Z ok=44 2026-02-21T10:18:59.4924447Z min=6.3715 2026-02-21T10:18:59.4924715Z mid=12.7154 2026-02-21T10:18:59.4925011Z max=416.2498 2026-02-21T10:18:59.4925920Z best={'block_sizes': [16, 128, 128], 2026-02-21T10:18:59.4927138Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T10:18:59.4927740Z 'l2_groupings': [8], 2026-02-21T10:18:59.4928119Z 'load_eviction_policies': ['last', ''], 2026-02-21T10:18:59.4928551Z 'loop_orders': [[0, 1]], 2026-02-21T10:18:59.4928907Z 'maxnreg': 256, 2026-02-21T10:18:59.4929217Z 'num_sm_multiplier': 64, 2026-02-21T10:18:59.4929572Z 'num_stages': 1, 2026-02-21T10:18:59.4929883Z 'num_warps': 4, 2026-02-21T10:18:59.4930224Z 'pid_type': 'persistent_interleaved', 2026-02-21T10:18:59.4930667Z 'range_flattens': [True, False], 2026-02-21T10:18:59.4931064Z 'range_multi_buffers': [None, False], 2026-02-21T10:18:59.4931506Z 'range_num_stages': [4, 1], 2026-02-21T10:18:59.4931873Z 'range_unroll_factors': [1, 0], 2026-02-21T10:18:59.4932285Z 'range_warp_specializes': []} 2026-02-21T10:18:59.4984365Z [3132s] Fitting surrogate: 1262 points, 1262 targets 2026-02-21T10:19:00.4980856Z [3133s] Generation 15 starting: 57 neighbors, 3 
active search path(s) 2026-02-21T10:19:29.9553296Z Generation 15: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58/58 0.4 configs/s 2026-02-21T10:19:37.9712928Z 2026-02-21T10:19:37.9712944Z 2026-02-21T10:19:37.9713382Z ================================================================ 2026-02-21T10:19:37.9713768Z Internal Triton PTX codegen error 2026-02-21T10:19:37.9714045Z `ptxas` stderr: 2026-02-21T10:19:37.9715075Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 1161 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T10:19:37.9715940Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:19:37.9716171Z 2026-02-21T10:19:37.9717044Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp1igmrbwc.ptx -o /tmp/tmp1igmrbwc.ptx.o 2026-02-21T10:19:37.9717841Z 2026-02-21T10:19:37.9717861Z 2026-02-21T10:19:37.9717936Z // 2026-02-21T10:19:37.9718125Z // Generated by LLVM NVPTX Back-End 2026-02-21T10:19:37.9718377Z // 2026-02-21T10:19:37.9718464Z 2026-02-21T10:19:37.9718539Z .version 8.7 2026-02-21T10:19:37.9718724Z .target sm_90a 2026-02-21T10:19:37.9718898Z .address_size 64 2026-02-21T10:19:37.9719015Z 2026-02-21T10:19:37.9719238Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T10:19:37.9719671Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T10:19:37.9720005Z // @_helion_matmul_bf16_int4 2026-02-21T10:19:37.9720324Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T10:19:37.9720707Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T10:19:37.9721150Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T10:19:37.9721593Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T10:19:37.9722038Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 
2026-02-21T10:19:37.9722515Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T10:19:37.9722906Z ) 2026-02-21T10:19:37.9723079Z .reqntid 128 2026-02-21T10:19:37.9723298Z .maxnreg 32 2026-02-21T10:19:37.9723492Z { 2026-02-21T10:19:37.9723666Z .reg .pred %p<406>; 2026-02-21T10:19:37.9723905Z .reg .b16 %rs<3590>; 2026-02-21T10:19:37.9724129Z .reg .b32 %r<43497>; 2026-02-21T10:19:37.9724327Z .reg .b64 %rd<856>; 2026-02-21T10:19:37.9724713Z .loc 1 19 0 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:19:0 2026-02-21T10:19:37.9725203Z $L__func_begin0: 2026-02-21T10:19:37.9725682Z .loc 1 19 0 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:19:0 2026-02-21T10:19:37.9725993Z 2026-02-21T10:19:37.9726055Z // %bb.0: 2026-02-21T10:19:37.9726270Z ld.param.b64 %rd117, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T10:19:37.9726688Z $L__tmp0: 2026-02-21T10:19:37.9727156Z .loc 1 21 67 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:21:67 2026-02-21T10:19:37.9727593Z mov.u32 %r42467, %ctaid.x; 2026-02-21T10:19:37.9727832Z ld.param.b64 %rd119, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T10:19:37.9728127Z ld.param.b64 %rd137, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T10:19:37.9728384Z mov.u32 %r2210, %ctaid.y; 2026-02-21T10:19:37.9728606Z ld.param.b64 %rd154, [_helion_matmul_bf16_int4_param_3]; 2026-02-21T10:19:37.9728850Z mov.u32 %r2211, %ctaid.z; 2026-02-21T10:19:37.9729027Z mov.u32 %r2212, %nctaid.x; 2026-02-21T10:19:37.9729196Z mov.u32 %r2213, %nctaid.y; 2026-02-21T10:19:37.9729385Z mad.lo.s32 %r2214, %r2211, %r2213, %r2210; 2026-02-21T10:19:37.9729611Z mad.lo.s32 %r2215, %r2214, %r2212, %r42467; 2026-02-21T10:19:37.9729823Z shl.b32 %r2216, %r2215, 8; 2026-02-21T10:19:37.9730009Z cvt.s64.s32 %rd155, %r2216; 2026-02-21T10:19:37.9730200Z add.s64 %rd133, %rd154, %rd155; 2026-02-21T10:19:37.9730395Z mov.u32 %r2, %tid.x; 2026-02-21T10:19:37.9730558Z setp.lt.u32 %p1, %r2, 32; 2026-02-21T10:19:37.9730737Z shl.b32 %r2217, 
%r2, 2; 2026-02-21T10:19:37.9730906Z mov.b32 %r39931, global_smem; 2026-02-21T10:19:37.9731193Z add.s32 %r2194, %r39931, %r2217; 2026-02-21T10:19:37.9731391Z mov.b32 %r2203, 0; 2026-02-21T10:19:37.9731883Z // begin inline asm 2026-02-21T10:19:37.9732055Z @%p1 st.shared.b32 [ %r2194 + 0 ], %r2203; 2026-02-21T10:19:37.9732266Z // end inline asm 2026-02-21T10:19:37.9732501Z bar.warp.sync -1; 2026-02-21T10:19:37.9732675Z setp.eq.b32 %p313, %r2, 0; 2026-02-21T10:19:37.9732865Z cvt.u64.u32 %rd118, %r39931; 2026-02-21T10:19:37.9733050Z // begin inline asm 2026-02-21T10:19:37.9733390Z @%p313 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd118 + 0 ], %rd119; 2026-02-21T10:19:37.9733751Z // end inline asm 2026-02-21T10:19:37.9733905Z // begin inline asm 2026-02-21T10:19:37.9734185Z @%p313 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x1; 2026-02-21T10:19:37.9734533Z // end inline asm 2026-02-21T10:19:37.9734697Z mov.b32 %r2196, 128; 2026-02-21T10:19:37.9734867Z // begin inline asm 2026-02-21T10:19:37.9735181Z @%p313 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x0, %r2196; 2026-02-21T10:19:37.9735528Z // end inline asm 2026-02-21T10:19:37.9735697Z mov.b32 %r2197, 32; 2026-02-21T10:19:37.9735876Z // begin inline asm 2026-02-21T10:19:37.9736178Z @%p313 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x1, %r2197; 2026-02-21T10:19:37.9736675Z // end inline asm 2026-02-21T10:19:37.9736858Z mov.b32 %r2198, 1280; 2026-02-21T10:19:37.9737028Z // begin inline asm 2026-02-21T10:19:37.9737348Z @%p313 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x0, %r2198; 2026-02-21T10:19:37.9737721Z // end inline asm 2026-02-21T10:19:37.9737888Z mov.b32 %r2199, 4096; 2026-02-21T10:19:37.9738065Z // begin inline asm 2026-02-21T10:19:37.9738377Z @%p313 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x1, %r2199; 2026-02-21T10:19:37.9738740Z // end inline asm 
2026-02-21T10:19:37.9738895Z mov.b64 %rd126, 1280; 2026-02-21T10:19:37.9739057Z // begin inline asm 2026-02-21T10:19:37.9739379Z @%p313 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd118 + 0 ], 0x0, %rd126; 2026-02-21T10:19:37.9739742Z // end inline asm 2026-02-21T10:19:37.9739893Z mov.b32 %r2200, 1; 2026-02-21T10:19:37.9740044Z // begin inline asm 2026-02-21T10:19:37.9740365Z @%p313 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x0, %r2200; 2026-02-21T10:19:37.9740781Z // end inline asm 2026-02-21T10:19:37.9740975Z // begin inline asm 2026-02-21T10:19:37.9741329Z @%p313 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x1, %r2200; 2026-02-21T10:19:37.9741714Z // end inline asm 2026-02-21T10:19:37.9741886Z // begin inline asm 2026-02-21T10:19:37.9742178Z @%p313 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x0; 2026-02-21T10:19:37.9742634Z // end inline asm 2026-02-21T10:19:37.9742858Z // begin inline asm 2026-02-21T10:19:37.9743183Z @%p313 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x0; 2026-02-21T10:19:37.9743564Z // end inline asm 2026-02-21T10:19:37.9743720Z // begin inline asm 2026-02-21T10:19:37.9744047Z @%p313 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x3; 2026-02-21T10:19:37.9744403Z // end inline asm 2026-02-21T10:19:37.9744566Z // begin inline asm 2026-02-21T10:19:37.9744857Z @%p313 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x0; 2026-02-21T10:19:37.9745199Z // end inline asm 2026-02-21T10:19:37.9745383Z // begin inline asm 2026-02-21T10:19:37.9745836Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd133 + 0 ], [ %rd118 + 0 ], 0x80; 2026-02-21T10:19:37.9746348Z // end inline asm 2026-02-21T10:19:37.9746663Z // begin inline asm 2026-02-21T10:19:37.9746952Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd133 + 0 ], 
0x80; 2026-02-21T10:19:37.9747282Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T10:19:37.9747601Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T10:19:37.9747839Z // end inline asm 2026-02-21T10:19:37.9747993Z bar.sync 0; 2026-02-21T10:19:37.9748169Z cvta.global.u64 %rd822, %rd133; 2026-02-21T10:19:37.9748679Z .loc 1 23 68 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:23:68 2026-02-21T10:19:37.9749068Z add.s64 %rd151, %rd133, 128; 2026-02-21T10:19:37.9749258Z bar.sync 0; 2026-02-21T10:19:37.9749408Z // begin inline asm 2026-02-21T10:19:37.9749598Z @%p1 st.shared.b32 [ %r2194 + 0 ], %r2203; 2026-02-21T10:19:37.9749825Z // end inline asm 2026-02-21T10:19:37.9749988Z bar.warp.sync -1; 2026-02-21T10:19:37.9750156Z // begin inline asm 2026-02-21T10:19:37.9750508Z @%p313 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd118 + 0 ], %rd137; 2026-02-21T10:19:37.9750892Z // end inline asm 2026-02-21T10:19:37.9751056Z // begin inline asm 2026-02-21T10:19:37.9751336Z @%p313 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x1; 2026-02-21T10:19:37.9751700Z // end inline asm 2026-02-21T10:19:37.9751874Z mov.b32 %r2204, 64; 2026-02-21T10:19:37.9752051Z // begin inline asm 2026-02-21T10:19:37.9752347Z @%p313 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x0, %r2204; 2026-02-21T10:19:37.9752694Z // end inline asm 2026-02-21T10:19:37.9752848Z // begin inline asm 2026-02-21T10:19:37.9753135Z @%p313 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x1, %r2196; 2026-02-21T10:19:37.9753484Z // end inline asm 2026-02-21T10:19:37.9753653Z // begin inline asm 2026-02-21T10:19:37.9753966Z @%p313 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x0, %r2198; 2026-02-21T10:19:37.9754333Z // end inline asm 2026-02-21T10:19:37.9754497Z mov.b32 %r2207, 65536; 2026-02-21T10:19:37.9754674Z // begin inline asm 2026-02-21T10:19:37.9754983Z @%p313 
tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x1, %r2207; 2026-02-21T10:19:37.9755343Z // end inline asm 2026-02-21T10:19:37.9755507Z mov.b64 %rd144, 2560; 2026-02-21T10:19:37.9755678Z // begin inline asm 2026-02-21T10:19:37.9756020Z @%p313 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd118 + 0 ], 0x0, %rd144; 2026-02-21T10:19:37.9756392Z // end inline asm 2026-02-21T10:19:37.9756699Z // begin inline asm 2026-02-21T10:19:37.9757026Z @%p313 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x0, %r2200; 2026-02-21T10:19:37.9757425Z // end inline asm 2026-02-21T10:19:37.9757569Z // begin inline asm 2026-02-21T10:19:37.9757901Z @%p313 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x1, %r2200; 2026-02-21T10:19:37.9758285Z // end inline asm 2026-02-21T10:19:37.9758437Z // begin inline asm 2026-02-21T10:19:37.9758730Z @%p313 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd118 + 0 ], 0xa; 2026-02-21T10:19:37.9759169Z // end inline asm 2026-02-21T10:19:37.9759394Z // begin inline asm 2026-02-21T10:19:37.9759731Z @%p313 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x0; 2026-02-21T10:19:37.9760105Z // end inline asm 2026-02-21T10:19:37.9760256Z // begin inline asm 2026-02-21T10:19:37.9760555Z @%p313 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x3; 2026-02-21T10:19:37.9760902Z // end inline asm 2026-02-21T10:19:37.9761058Z // begin inline asm 2026-02-21T10:19:37.9761346Z @%p313 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x0; 2026-02-21T10:19:37.9761688Z // end inline asm 2026-02-21T10:19:37.9761861Z // begin inline asm 2026-02-21T10:19:37.9762312Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd151 + 0 ], [ %rd118 + 0 ], 0x80; 2026-02-21T10:19:37.9762809Z // end inline asm 2026-02-21T10:19:37.9762966Z // begin inline asm 
2026-02-21T10:19:37.9763228Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd151 + 0 ], 0x80; 2026-02-21T10:19:37.9763549Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T10:19:37.9763866Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T10:19:37.9764084Z // end inline asm 2026-02-21T10:19:37.9764243Z bar.sync 0; 2026-02-21T10:19:37.9764402Z cvta.global.u64 %rd497, %rd151; 2026-02-21T10:19:37.9764838Z .loc 1 0 0 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:0 2026-02-21T10:19:37.9765194Z sub.s32 %r2219, 9343, %r42467; 2026-02-21T10:19:37.9765540Z .loc 1 28 112 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:28:112 2026-02-21T10:19:37.9765949Z mul.hi.u32 %r2220, %r2219, 1041204193; 2026-02-21T10:19:37.9766178Z shr.u32 %r2221, %r2220, 10; 2026-02-21T10:19:37.9766385Z mul.hi.u32 %r2222, %r2221, 1431655766; 2026-02-21T10:19:37.9766744Z mad.lo.s32 %r43239, %r2222, 12672, %r42467; 2026-02-21T10:19:37.9767110Z .loc 1 41 45 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:41:45 2026-02-21T10:19:37.9767468Z shr.u32 %r4, %r2, 5; 2026-02-21T10:19:37.9767657Z shr.u32 %r2223, %r2, 3; 2026-02-21T10:19:37.9767826Z bfe.u32 %r5, %r2, 3, 4; 2026-02-21T10:19:37.9768007Z or.b32 %r6, %r2223, 112; 2026-02-21T10:19:37.9768325Z .loc 1 54 38 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:54:38 2026-02-21T10:19:37.9768674Z and.b32 %r7, %r2, 7; 2026-02-21T10:19:37.9768840Z shl.b32 %r8, %r7, 3; 2026-02-21T10:19:37.9769149Z .loc 1 28 112 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:28:112 2026-02-21T10:19:37.9769527Z setp.ge.s32 %p37, %r42467, %r43239; 2026-02-21T10:19:37.9769737Z and.b32 %r42451, %r2, 127; 2026-02-21T10:19:37.9769929Z and.b32 %r42452, %r2, 56; 2026-02-21T10:19:37.9770130Z shl.b32 %r42453, %r2, 6; 2026-02-21T10:19:37.9770306Z shl.b32 %r42454, %r2, 5; 2026-02-21T10:19:37.9770482Z shl.b32 %r42455, %r2, 1; 2026-02-21T10:19:37.9770648Z shl.b32 %r42456, %r7, 4; 2026-02-21T10:19:37.9770823Z shl.b32 %r42457, %r2, 7; 
2026-02-21T10:19:37.9770988Z and.b32 %r42458, %r2, 16; 2026-02-21T10:19:37.9771160Z or.b32 %r42459, %r5, 96; 2026-02-21T10:19:37.9771325Z or.b32 %r42460, %r5, 80; 2026-02-21T10:19:37.9771520Z or.b32 %r42461, %r5, 64; 2026-02-21T10:19:37.9771699Z or.b32 %r42462, %r5, 48; 2026-02-21T10:19:37.9771919Z or.b32 %r42463, %r5, 32; 2026-02-21T10:19:37.9772122Z or.b32 %r42464, %r5, 16; 2026-02-21T10:19:37.9772297Z shl.b32 %r42465, %r5, 13; 2026-02-21T10:19:37.9772487Z shl.b32 %r42466, %r6, 13; 2026-02-21T10:19:37.9772665Z setp.lt.u32 %p405, %r2, 64; 2026-02-21T10:19:37.9772892Z @%p37 bra $L__BB0_15; 2026-02-21T10:19:37.9773107Z // %bb.1: // %.lr.ph 2026-02-21T10:19:37.9773530Z .loc 1 0 112 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:0:112 2026-02-21T10:19:37.9773915Z shl.b32 %r2225, %r42451, 4; 2026-02-21T10:19:37.9774206Z xor.b32 %r2227, %r2225, %r42452; 2026-02-21T10:19:37.9774516Z add.s32 %r9, %r39931, %r2227; 2026-02-21T10:19:37.9774743Z xor.b32 %r2229, %r2227, 8; 2026-02-21T10:19:37.9774955Z add.s32 %r10, %r39931, %r2229; 2026-02-21T10:19:37.9775157Z and.b32 %r2231, %r42453, 6144; 2026-02-21T10:19:37.9775413Z and.b32 %r2233, %r42454, 896; 2026-02-21T10:19:37.9775619Z and.b32 %r2235, %r42455, 62; 2026-02-21T10:19:37.9775830Z or.b32 %r2236, %r2231, %r2233; 2026-02-21T10:19:37.9776012Z or.b32 %r2237, %r2236, %r2235; 2026-02-21T10:19:37.9776197Z add.s32 %r11, %r39931, %r2237; 2026-02-21T10:19:37.9776380Z xor.b32 %r2238, %r2237, 8; 2026-02-21T10:19:37.9776687Z add.s32 %r12, %r39931, %r2238; 2026-02-21T10:19:37.9776889Z xor.b32 %r2239, %r2237, 16; 2026-02-21T10:19:37.9777065Z add.s32 %r13, %r39931, %r2239; 2026-02-21T10:19:37.9777249Z xor.b32 %r2240, %r2237, 24; 2026-02-21T10:19:37.9777423Z add.s32 %r14, %r39931, %r2240; 2026-02-21T10:19:37.9777606Z xor.b32 %r2241, %r2237, 32; 2026-02-21T10:19:37.9777782Z add.s32 %r15, %r39931, %r2241; 2026-02-21T10:19:37.9777971Z xor.b32 %r2242, %r2237, 40; 2026-02-21T10:19:37.9778244Z add.s32 %r16, %r39931, %r2242; 
2026-02-21T10:19:37.9778437Z xor.b32 %r2243, %r2237, 48; 2026-02-21T10:19:37.9778626Z add.s32 %r17, %r39931, %r2243; 2026-02-21T10:19:37.9778813Z xor.b32 %r2244, %r2237, 56; 2026-02-21T10:19:37.9779002Z add.s32 %r18, %r39931, %r2244; 2026-02-21T10:19:37.9779254Z add.s32 %r19, %r39931, %r42451; 2026-02-21T10:19:37.9779465Z xor.b32 %r2245, %r42451, 16; 2026-02-21T10:19:37.9779651Z add.s32 %r20, %r39931, %r2245; 2026-02-21T10:19:37.9779836Z xor.b32 %r2246, %r42451, 32; 2026-02-21T10:19:37.9780011Z add.s32 %r21, %r39931, %r2246; 2026-02-21T10:19:37.9780193Z xor.b32 %r2247, %r42451, 48; 2026-02-21T10:19:37.9780372Z add.s32 %r22, %r39931, %r2247; 2026-02-21T10:19:37.9780552Z xor.b32 %r2248, %r42451, 64; 2026-02-21T10:19:37.9780730Z add.s32 %r23, %r39931, %r2248; 2026-02-21T10:19:37.9780911Z xor.b32 %r2249, %r42451, 80; 2026-02-21T10:19:37.9781091Z add.s32 %r24, %r39931, %r2249; 2026-02-21T10:19:37.9781270Z xor.b32 %r2250, %r42451, 96; 2026-02-21T10:19:37.9781449Z add.s32 %r25, %r39931, %r2250; 2026-02-21T10:19:37.9781628Z xor.b32 %r2251, %r42451, 112; 2026-02-21T10:19:37.9781810Z add.s32 %r26, %r39931, %r2251; 2026-02-21T10:19:37.9781994Z shl.b32 %r2252, %r42451, 7; 2026-02-21T10:19:37.9782174Z or.b32 %r2254, %r2252, %r42456; 2026-02-21T10:19:37.9782364Z add.s32 %r27, %r39931, %r2254; 2026-02-21T10:19:37.9782542Z xor.b32 %r2255, %r2254, 16; 2026-02-21T10:19:37.9782720Z add.s32 %r28, %r39931, %r2255; 2026-02-21T10:19:37.9782900Z xor.b32 %r2256, %r2254, 32; 2026-02-21T10:19:37.9783081Z add.s32 %r29, %r39931, %r2256; 2026-02-21T10:19:37.9783258Z xor.b32 %r2257, %r2254, 48; 2026-02-21T10:19:37.9783450Z add.s32 %r30, %r39931, %r2257; 2026-02-21T10:19:37.9783627Z xor.b32 %r2258, %r2254, 64; 2026-02-21T10:19:37.9783812Z add.s32 %r31, %r39931, %r2258; 2026-02-21T10:19:37.9783997Z xor.b32 %r2259, %r2254, 80; 2026-02-21T10:19:37.9784169Z add.s32 %r32, %r39931, %r2259; 2026-02-21T10:19:37.9784363Z xor.b32 %r2260, %r2254, 96; 2026-02-21T10:19:37.9784538Z add.s32 %r33, %r39931, 
%r2260; 2026-02-21T10:19:37.9784719Z xor.b32 %r2261, %r2254, 112; 2026-02-21T10:19:37.9784892Z add.s32 %r34, %r39931, %r2261; 2026-02-21T10:19:37.9785081Z bfe.u32 %r2262, %r39931, 4, 14; 2026-02-21T10:19:37.9785267Z cvt.u64.u32 %rd156, %r2262; 2026-02-21T10:19:37.9785465Z or.b64 %rd3, %rd156, 4611686293372403712; 2026-02-21T10:19:37.9785687Z add.s32 %r2263, %r39931, 32; 2026-02-21T10:19:37.9785866Z bfe.u32 %r2264, %r2263, 4, 14; 2026-02-21T10:19:37.9786055Z cvt.u64.u32 %rd157, %r2264; 2026-02-21T10:19:37.9786244Z or.b64 %rd4, %rd157, 4611686293372403712; 2026-02-21T10:19:37.9786593Z add.s32 %r2265, %r39931, 64; 2026-02-21T10:19:37.9786789Z bfe.u32 %r2266, %r2265, 4, 14; 2026-02-21T10:19:37.9786979Z cvt.u64.u32 %rd158, %r2266; 2026-02-21T10:19:37.9787169Z or.b64 %rd5, %rd158, 4611686293372403712; 2026-02-21T10:19:37.9787480Z add.s32 %r2267, %r39931, 96; 2026-02-21T10:19:37.9787729Z bfe.u32 %r2268, %r2267, 4, 14; 2026-02-21T10:19:37.9787921Z cvt.u64.u32 %rd159, %r2268; 2026-02-21T10:19:37.9788132Z or.b64 %rd6, %rd159, 4611686293372403712; 2026-02-21T10:19:37.9788340Z add.s32 %r2269, %r39931, 16384; 2026-02-21T10:19:37.9788620Z bfe.u32 %r2270, %r2269, 4, 14; 2026-02-21T10:19:37.9788800Z cvt.u64.u32 %rd160, %r2270; 2026-02-21T10:19:37.9788996Z or.b64 %rd7, %rd160, 4611686293372403712; 2026-02-21T10:19:37.9789203Z add.s32 %r2271, %r39931, 16416; 2026-02-21T10:19:37.9789392Z bfe.u32 %r2272, %r2271, 4, 14; 2026-02-21T10:19:37.9789575Z cvt.u64.u32 %rd161, %r2272; 2026-02-21T10:19:37.9789763Z or.b64 %rd8, %rd161, 4611686293372403712; 2026-02-21T10:19:37.9789969Z add.s32 %r2273, %r39931, 16448; 2026-02-21T10:19:37.9790147Z bfe.u32 %r2274, %r2273, 4, 14; 2026-02-21T10:19:37.9790334Z cvt.u64.u32 %rd162, %r2274; 2026-02-21T10:19:37.9790514Z or.b64 %rd9, %rd162, 4611686293372403712; 2026-02-21T10:19:37.9790720Z add.s32 %r2275, %r39931, 16480; 2026-02-21T10:19:37.9790902Z bfe.u32 %r2276, %r2275, 4, 14; 2026-02-21T10:19:37.9791172Z cvt.u64.u32 %rd163, %r2276; 
2026-02-21T10:19:37.9791367Z or.b64 %rd10, %rd163, 4611686293372403712; 2026-02-21T10:19:37.9791582Z and.b32 %r2278, %r42457, 1920; 2026-02-21T10:19:37.9791761Z or.b32 %r2280, %r2278, %r42456; 2026-02-21T10:19:37.9791949Z xor.b32 %r2281, %r2280, %r42458; 2026-02-21T10:19:37.9792205Z or.b32 %r2282, %r2281, %r2231; 2026-02-21T10:19:37.9792393Z add.s32 %r35, %r39931, %r2282; 2026-02-21T10:19:37.9792593Z add.s32 %r36, %r35, 16384; 2026-02-21T10:19:37.9792770Z add.s32 %r37, %r35, 8192; 2026-02-21T10:19:37.9792949Z add.s32 %r38, %r35, 24576; 2026-02-21T10:19:37.9793122Z xor.b32 %r2283, %r2282, 32; 2026-02-21T10:19:37.9793306Z add.s32 %r39, %r39931, %r2283; 2026-02-21T10:19:37.9793485Z add.s32 %r40, %r39, 16384; 2026-02-21T10:19:37.9793658Z add.s32 %r41, %r39, 8192; 2026-02-21T10:19:37.9793835Z add.s32 %r42, %r39, 24576; 2026-02-21T10:19:37.9794009Z xor.b32 %r2284, %r2282, 64; 2026-02-21T10:19:37.9794206Z add.s32 %r43, %r39931, %r2284; 2026-02-21T10:19:37.9794391Z add.s32 %r44, %r43, 16384; 2026-02-21T10:19:37.9794569Z add.s32 %r45, %r43, 8192; 2026-02-21T10:19:37.9794743Z add.s32 %r46, %r43, 24576; 2026-02-21T10:19:37.9794920Z xor.b32 %r2285, %r2282, 96; 2026-02-21T10:19:37.9795092Z add.s32 %r47, %r39931, %r2285; 2026-02-21T10:19:37.9795277Z add.s32 %r48, %r47, 16384; 2026-02-21T10:19:37.9795448Z add.s32 %r49, %r47, 8192; 2026-02-21T10:19:37.9795621Z add.s32 %r50, %r47, 24576; 2026-02-21T10:19:37.9795799Z or.b32 %r2286, %r2233, %r2235; 2026-02-21T10:19:37.9795978Z or.b32 %r2287, %r2286, %r2231; 2026-02-21T10:19:37.9796165Z add.s32 %r51, %r39931, %r2287; 2026-02-21T10:19:37.9796344Z xor.b32 %r2288, %r2287, 8; 2026-02-21T10:19:37.9796658Z add.s32 %r52, %r39931, %r2288; 2026-02-21T10:19:37.9825267Z xor.b32 %r2289, %r2287, 16; 2026-02-21T10:19:37.9825509Z add.s32 %r53, %r39931, %r2289; 2026-02-21T10:19:37.9825722Z xor.b32 %r2290, %r2287, 24; 2026-02-21T10:19:37.9825973Z add.s32 %r54, %r39931, %r2290; 2026-02-21T10:19:37.9826283Z xor.b32 %r2291, %r2287, 32; 
2026-02-21T10:19:37.9826629Z add.s32 %r55, %r39931, %r2291; 2026-02-21T10:19:37.9826842Z xor.b32 %r2292, %r2287, 40; 2026-02-21T10:19:37.9827035Z add.s32 %r56, %r39931, %r2292; 2026-02-21T10:19:37.9827221Z xor.b32 %r2293, %r2287, 48; 2026-02-21T10:19:37.9827425Z add.s32 %r57, %r39931, %r2293; 2026-02-21T10:19:37.9827613Z xor.b32 %r2294, %r2287, 56; 2026-02-21T10:19:37.9827797Z add.s32 %r58, %r39931, %r2294; 2026-02-21T10:19:37.9828156Z .loc 1 28 112 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:28:112 2026-02-21T10:19:37.9828655Z mad.wide.u32 %rd11, %r7, 16, %rd117; 2026-02-21T10:19:37.9829022Z .loc 1 48 93 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:48:93 2026-02-21T10:19:37.9829390Z or.b32 %r2296, %r42466, %r8; 2026-02-21T10:19:37.9829765Z or.b32 %r67, %r2296, 128; 2026-02-21T10:19:37.9830071Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T10:19:37.9830375Z // Child Loop BB0_3 Depth 2 2026-02-21T10:19:37.9830648Z // Child Loop BB0_5 Depth 2 2026-02-21T10:19:37.9830930Z // Child Loop BB0_7 Depth 2 2026-02-21T10:19:37.9831195Z // Child Loop BB0_9 Depth 2 2026-02-21T10:19:37.9831470Z // Child Loop BB0_11 Depth 2 2026-02-21T10:19:37.9831754Z // Child Loop BB0_13 Depth 2 2026-02-21T10:19:37.9832163Z .loc 1 34 35 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:34:35 2026-02-21T10:19:37.9832620Z shr.s32 %r2298, %r42467, 31; 2026-02-21T10:19:37.9832812Z shr.u32 %r2299, %r2298, 18; 2026-02-21T10:19:37.9833005Z add.s32 %r2300, %r42467, %r2299; 2026-02-21T10:19:37.9833199Z shr.s32 %r2301, %r2300, 14; 2026-02-21T10:19:37.9833622Z .loc 1 35 33 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:35:33 2026-02-21T10:19:37.9833990Z shl.b32 %r2302, %r2301, 5; 2026-02-21T10:19:37.9834304Z .loc 1 36 39 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:36:39 2026-02-21T10:19:37.9834723Z sub.s32 %r2303, 10, %r2302; 2026-02-21T10:19:37.9835033Z .loc 1 36 52 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:36:52 
2026-02-21T10:19:37.9835384Z min.s32 %r2304, %r2303, 32; 2026-02-21T10:19:37.9835690Z .loc 1 37 45 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:37:45 2026-02-21T10:19:37.9836049Z and.b32 %r2305, %r2300, -16384; 2026-02-21T10:19:37.9836254Z sub.s32 %r2306, %r42467, %r2305; 2026-02-21T10:19:37.9836752Z .loc 1 38 51 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:38:51 2026-02-21T10:19:37.9837147Z div.s32 %r2307, %r2306, %r2304; 2026-02-21T10:19:37.9837488Z .loc 1 37 64 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:37:64 2026-02-21T10:19:37.9837851Z mul.lo.s32 %r2308, %r2307, %r2304; 2026-02-21T10:19:37.9838052Z sub.s32 %r2309, %r2306, %r2308; 2026-02-21T10:19:37.9838383Z .loc 1 37 30 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:37:30 2026-02-21T10:19:37.9838738Z add.s32 %r2310, %r2309, %r2302; 2026-02-21T10:19:37.9839054Z .loc 1 39 27 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:39:27 2026-02-21T10:19:37.9839407Z shl.b32 %r9804, %r2310, 7; 2026-02-21T10:19:37.9839715Z .loc 1 40 27 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:40:27 2026-02-21T10:19:37.9840081Z shl.b32 %r12251, %r2307, 7; 2026-02-21T10:19:37.9840397Z .loc 1 48 93 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:48:93 2026-02-21T10:19:37.9840755Z or.b32 %r2311, %r42459, %r12251; 2026-02-21T10:19:37.9840957Z shl.b32 %r2312, %r2311, 13; 2026-02-21T10:19:37.9841149Z mul.wide.s32 %rd21, %r2312, 2; 2026-02-21T10:19:37.9841349Z or.b32 %r2313, %r42460, %r12251; 2026-02-21T10:19:37.9841538Z shl.b32 %r2314, %r2313, 13; 2026-02-21T10:19:37.9841731Z mul.wide.s32 %rd22, %r2314, 2; 2026-02-21T10:19:37.9841919Z or.b32 %r2315, %r42461, %r12251; 2026-02-21T10:19:37.9842112Z shl.b32 %r2316, %r2315, 13; 2026-02-21T10:19:37.9842307Z mul.wide.s32 %rd23, %r2316, 2; 2026-02-21T10:19:37.9842509Z or.b32 %r2317, %r42462, %r12251; 2026-02-21T10:19:37.9842707Z shl.b32 %r2318, %r2317, 13; 2026-02-21T10:19:37.9842894Z mul.wide.s32 %rd24, 
%r2318, 2; 2026-02-21T10:19:37.9843097Z or.b32 %r2319, %r42463, %r12251; 2026-02-21T10:19:37.9843285Z shl.b32 %r2320, %r2319, 13; 2026-02-21T10:19:37.9843475Z mul.wide.s32 %rd25, %r2320, 2; 2026-02-21T10:19:37.9843782Z or.b32 %r2321, %r42464, %r12251; 2026-02-21T10:19:37.9843988Z shl.b32 %r2322, %r2321, 13; 2026-02-21T10:19:37.9844244Z mul.wide.s32 %rd26, %r2322, 2; 2026-02-21T10:19:37.9844440Z shl.b32 %r2323, %r2307, 20; 2026-02-21T10:19:37.9844626Z or.b32 %r2324, %r42465, %r2323; 2026-02-21T10:19:37.9844835Z mul.wide.s32 %rd27, %r2324, 2; 2026-02-21T10:19:37.9845032Z or.b32 %r42468, %r67, %r2323; 2026-02-21T10:19:37.9845215Z or.b32 %r2325, %r42466, %r2323; 2026-02-21T10:19:37.9845410Z mul.wide.s32 %rd28, %r2325, 2; 2026-02-21T10:19:37.9845595Z mov.b32 %r42469, 0f00000000; 2026-02-21T10:19:37.9845782Z mov.b64 %rd841, -96; 2026-02-21T10:19:37.9845951Z mov.b64 %rd840, %rd11; 2026-02-21T10:19:37.9846126Z mov.b32 %r42470, %r42469; 2026-02-21T10:19:37.9846315Z mov.b32 %r42471, %r42469; 2026-02-21T10:19:37.9846621Z mov.b32 %r42472, %r42469; 2026-02-21T10:19:37.9846807Z mov.b32 %r42473, %r42469; 2026-02-21T10:19:37.9846991Z mov.b32 %r42474, %r42469; 2026-02-21T10:19:37.9847168Z mov.b32 %r42475, %r42469; 2026-02-21T10:19:37.9847339Z mov.b32 %r42476, %r42469; 2026-02-21T10:19:37.9847519Z mov.b32 %r42477, %r42469; 2026-02-21T10:19:37.9847686Z mov.b32 %r42478, %r42469; 2026-02-21T10:19:37.9847936Z mov.b32 %r42479, %r42469; 2026-02-21T10:19:37.9848116Z mov.b32 %r42480, %r42469; 2026-02-21T10:19:37.9848290Z mov.b32 %r42481, %r42469; 2026-02-21T10:19:37.9848453Z mov.b32 %r42482, %r42469; 2026-02-21T10:19:37.9848625Z mov.b32 %r42483, %r42469; 2026-02-21T10:19:37.9848870Z mov.b32 %r42484, %r42469; 2026-02-21T10:19:37.9849044Z mov.b32 %r42485, %r42469; 2026-02-21T10:19:37.9849217Z mov.b32 %r42486, %r42469; 2026-02-21T10:19:37.9849382Z mov.b32 %r42487, %r42469; 2026-02-21T10:19:37.9849557Z mov.b32 %r42488, %r42469; 2026-02-21T10:19:37.9849722Z mov.b32 %r42489, %r42469; 
2026-02-21T10:19:37.9849893Z mov.b32 %r42490, %r42469; 2026-02-21T10:19:37.9850063Z mov.b32 %r42491, %r42469; 2026-02-21T10:19:37.9850251Z mov.b32 %r42492, %r42469; 2026-02-21T10:19:37.9850421Z mov.b32 %r42493, %r42469; 2026-02-21T10:19:37.9850599Z mov.b32 %r42494, %r42469; 2026-02-21T10:19:37.9850780Z mov.b32 %r42495, %r42469; 2026-02-21T10:19:37.9850948Z mov.b32 %r42496, %r42469; 2026-02-21T10:19:37.9851127Z mov.b32 %r42497, %r42469; 2026-02-21T10:19:37.9851298Z mov.b32 %r42498, %r42469; 2026-02-21T10:19:37.9851476Z mov.b32 %r42499, %r42469; 2026-02-21T10:19:37.9851654Z mov.b32 %r42500, %r42469; 2026-02-21T10:19:37.9851827Z mov.b32 %r42501, %r42469; 2026-02-21T10:19:37.9851994Z mov.b32 %r42502, %r42469; 2026-02-21T10:19:37.9852170Z mov.b32 %r42503, %r42469; 2026-02-21T10:19:37.9852337Z mov.b32 %r42504, %r42469; 2026-02-21T10:19:37.9852512Z mov.b32 %r42505, %r42469; 2026-02-21T10:19:37.9852684Z mov.b32 %r42506, %r42469; 2026-02-21T10:19:37.9852862Z mov.b32 %r42507, %r42469; 2026-02-21T10:19:37.9853043Z mov.b32 %r42508, %r42469; 2026-02-21T10:19:37.9853209Z mov.b32 %r42509, %r42469; 2026-02-21T10:19:37.9853381Z mov.b32 %r42510, %r42469; 2026-02-21T10:19:37.9853545Z mov.b32 %r42511, %r42469; 2026-02-21T10:19:37.9853721Z mov.b32 %r42512, %r42469; 2026-02-21T10:19:37.9853890Z mov.b32 %r42513, %r42469; 2026-02-21T10:19:37.9854063Z mov.b32 %r42514, %r42469; 2026-02-21T10:19:37.9854233Z mov.b32 %r42515, %r42469; 2026-02-21T10:19:37.9854414Z mov.b32 %r42516, %r42469; 2026-02-21T10:19:37.9854593Z mov.b32 %r42517, %r42469; 2026-02-21T10:19:37.9854764Z mov.b32 %r42518, %r42469; 2026-02-21T10:19:37.9854938Z mov.b32 %r42519, %r42469; 2026-02-21T10:19:37.9855108Z mov.b32 %r42520, %r42469; 2026-02-21T10:19:37.9855289Z mov.b32 %r42521, %r42469; 2026-02-21T10:19:37.9855456Z mov.b32 %r42522, %r42469; 2026-02-21T10:19:37.9855628Z mov.b32 %r42523, %r42469; 2026-02-21T10:19:37.9855793Z mov.b32 %r42524, %r42469; 2026-02-21T10:19:37.9855971Z mov.b32 %r42525, %r42469; 
2026-02-21T10:19:37.9856141Z mov.b32 %r42526, %r42469; 2026-02-21T10:19:37.9856313Z mov.b32 %r42527, %r42469; 2026-02-21T10:19:37.9856613Z mov.b32 %r42528, %r42469; 2026-02-21T10:19:37.9856793Z mov.b32 %r42529, %r42469; 2026-02-21T10:19:37.9857061Z mov.b32 %r42530, %r42469; 2026-02-21T10:19:37.9857292Z mov.b32 %r42531, %r42469; 2026-02-21T10:19:37.9857464Z mov.b32 %r42532, %r42469; 2026-02-21T10:19:37.9857634Z mov.b32 %r42533, %r42469; 2026-02-21T10:19:37.9857809Z mov.b32 %r42534, %r42469; 2026-02-21T10:19:37.9857974Z mov.b32 %r42535, %r42469; 2026-02-21T10:19:37.9858152Z mov.b32 %r42536, %r42469; 2026-02-21T10:19:37.9858320Z mov.b32 %r42537, %r42469; 2026-02-21T10:19:37.9858496Z mov.b32 %r42538, %r42469; 2026-02-21T10:19:37.9858668Z mov.b32 %r42539, %r42469; 2026-02-21T10:19:37.9858848Z mov.b32 %r42540, %r42469; 2026-02-21T10:19:37.9859025Z mov.b32 %r42541, %r42469; 2026-02-21T10:19:37.9859193Z mov.b32 %r42542, %r42469; 2026-02-21T10:19:37.9859371Z mov.b32 %r42543, %r42469; 2026-02-21T10:19:37.9859544Z mov.b32 %r42544, %r42469; 2026-02-21T10:19:37.9859726Z mov.b32 %r42545, %r42469; 2026-02-21T10:19:37.9859912Z mov.b32 %r42546, %r42469; 2026-02-21T10:19:37.9860085Z mov.b32 %r42547, %r42469; 2026-02-21T10:19:37.9860265Z mov.b32 %r42548, %r42469; 2026-02-21T10:19:37.9860432Z mov.b32 %r42549, %r42469; 2026-02-21T10:19:37.9860609Z mov.b32 %r42550, %r42469; 2026-02-21T10:19:37.9860854Z mov.b32 %r42551, %r42469; 2026-02-21T10:19:37.9861029Z mov.b32 %r42552, %r42469; 2026-02-21T10:19:37.9861198Z mov.b32 %r42553, %r42469; 2026-02-21T10:19:37.9861384Z mov.b32 %r42554, %r42469; 2026-02-21T10:19:37.9861557Z mov.b32 %r42555, %r42469; 2026-02-21T10:19:37.9861795Z mov.b32 %r42556, %r42469; 2026-02-21T10:19:37.9861973Z mov.b32 %r42557, %r42469; 2026-02-21T10:19:37.9862140Z mov.b32 %r42558, %r42469; 2026-02-21T10:19:37.9862313Z mov.b32 %r42559, %r42469; 2026-02-21T10:19:37.9862481Z mov.b32 %r42560, %r42469; 2026-02-21T10:19:37.9862653Z mov.b32 %r42561, %r42469; 
2026-02-21T10:19:37.9862821Z mov.b32 %r42562, %r42469; 2026-02-21T10:19:37.9862993Z mov.b32 %r42563, %r42469; 2026-02-21T10:19:37.9863160Z mov.b32 %r42564, %r42469; 2026-02-21T10:19:37.9863337Z mov.b32 %r42565, %r42469; 2026-02-21T10:19:37.9863510Z mov.b32 %r42566, %r42469; 2026-02-21T10:19:37.9863690Z mov.b32 %r42567, %r42469; 2026-02-21T10:19:37.9863864Z mov.b32 %r42568, %r42469; 2026-02-21T10:19:37.9864031Z mov.b32 %r42569, %r42469; 2026-02-21T10:19:37.9864202Z mov.b32 %r42570, %r42469; 2026-02-21T10:19:37.9864367Z mov.b32 %r42571, %r42469; 2026-02-21T10:19:37.9864540Z mov.b32 %r42572, %r42469; 2026-02-21T10:19:37.9864709Z mov.b32 %r42573, %r42469; 2026-02-21T10:19:37.9864879Z mov.b32 %r42574, %r42469; 2026-02-21T10:19:37.9865048Z mov.b32 %r42575, %r42469; 2026-02-21T10:19:37.9865218Z mov.b32 %r42576, %r42469; 2026-02-21T10:19:37.9865383Z mov.b32 %r42577, %r42469; 2026-02-21T10:19:37.9865570Z mov.b32 %r42578, %r42469; 2026-02-21T10:19:37.9865740Z mov.b32 %r42579, %r42469; 2026-02-21T10:19:37.9865908Z mov.b32 %r42580, %r42469; 2026-02-21T10:19:37.9866087Z mov.b32 %r42581, %r42469; 2026-02-21T10:19:37.9866259Z mov.b32 %r42582, %r42469; 2026-02-21T10:19:37.9866433Z mov.b32 %r42583, %r42469; 2026-02-21T10:19:37.9866729Z mov.b32 %r42584, %r42469; 2026-02-21T10:19:37.9866918Z mov.b32 %r42585, %r42469; 2026-02-21T10:19:37.9867087Z mov.b32 %r42586, %r42469; 2026-02-21T10:19:37.9867260Z mov.b32 %r42587, %r42469; 2026-02-21T10:19:37.9867426Z mov.b32 %r42588, %r42469; 2026-02-21T10:19:37.9867598Z mov.b32 %r42589, %r42469; 2026-02-21T10:19:37.9867775Z mov.b32 %r42590, %r42469; 2026-02-21T10:19:37.9867940Z mov.b32 %r42591, %r42469; 2026-02-21T10:19:37.9868114Z mov.b32 %r42592, %r42469; 2026-02-21T10:19:37.9868289Z mov.b32 %r42593, %r42469; 2026-02-21T10:19:37.9868544Z mov.b32 %r42594, %r42469; 2026-02-21T10:19:37.9868721Z mov.b32 %r42595, %r42469; 2026-02-21T10:19:37.9868888Z mov.b32 %r42596, %r42469; 2026-02-21T10:19:37.9869125Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 
2026-02-21T10:19:37.9869434Z // => This Inner Loop Header: Depth=2 2026-02-21T10:19:37.9869840Z .loc 1 55 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:32 2026-02-21T10:19:37.9870358Z add.s64 %rd166, %rd840, %rd27; 2026-02-21T10:19:37.9870558Z add.s64 %rd169, %rd840, %rd26; 2026-02-21T10:19:37.9870752Z add.s64 %rd172, %rd840, %rd25; 2026-02-21T10:19:37.9870936Z add.s64 %rd175, %rd840, %rd24; 2026-02-21T10:19:37.9871145Z add.s64 %rd178, %rd840, %rd23; 2026-02-21T10:19:37.9871338Z add.s64 %rd181, %rd840, %rd22; 2026-02-21T10:19:37.9871533Z add.s64 %rd184, %rd840, %rd21; 2026-02-21T10:19:37.9871861Z .loc 1 55 80 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:80 2026-02-21T10:19:37.9872223Z add.s64 %rd187, %rd840, %rd28; 2026-02-21T10:19:37.9872414Z // begin inline asm 2026-02-21T10:19:37.9872576Z mov.u64 %rd165, 0x0; 2026-02-21T10:19:37.9872825Z createpolicy.fractional.L2::evict_first.b64 %rd165, 1.0; 2026-02-21T10:19:37.9873091Z // end inline asm 2026-02-21T10:19:37.9873261Z // begin inline asm 2026-02-21T10:19:37.9873428Z mov.u32 %r2326, 0x0; 2026-02-21T10:19:37.9873590Z mov.u32 %r2327, 0x0; 2026-02-21T10:19:37.9873745Z mov.u32 %r2328, 0x0; 2026-02-21T10:19:37.9873907Z mov.u32 %r2329, 0x0; 2026-02-21T10:19:37.9874364Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r2326, %r2327, %r2328, %r2329 }, [ %rd166 + 0 ], %rd165; 2026-02-21T10:19:37.9874752Z // end inline asm 2026-02-21T10:19:37.9874917Z // begin inline asm 2026-02-21T10:19:37.9875076Z mov.u64 %rd168, 0x0; 2026-02-21T10:19:37.9875381Z createpolicy.fractional.L2::evict_first.b64 %rd168, 1.0; 2026-02-21T10:19:37.9875656Z // end inline asm 2026-02-21T10:19:37.9875818Z // begin inline asm 2026-02-21T10:19:37.9875975Z mov.u32 %r2330, 0x0; 2026-02-21T10:19:37.9876133Z mov.u32 %r2331, 0x0; 2026-02-21T10:19:37.9876286Z mov.u32 %r2332, 0x0; 2026-02-21T10:19:37.9876570Z mov.u32 %r2333, 0x0; 2026-02-21T10:19:37.9876902Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r2330, 
%r2331, %r2332, %r2333 }, [ %rd169 + 0 ], %rd168; 2026-02-21T10:19:37.9877287Z // end inline asm 2026-02-21T10:19:37.9877446Z // begin inline asm 2026-02-21T10:19:37.9877602Z mov.u64 %rd171, 0x0; 2026-02-21T10:19:37.9877826Z createpolicy.fractional.L2::evict_first.b64 %rd171, 1.0; 2026-02-21T10:19:37.9878080Z // end inline asm 2026-02-21T10:19:37.9878236Z // begin inline asm 2026-02-21T10:19:37.9878388Z mov.u32 %r2334, 0x0; 2026-02-21T10:19:37.9878558Z mov.u32 %r2335, 0x0; 2026-02-21T10:19:37.9878729Z mov.u32 %r2336, 0x0; 2026-02-21T10:19:37.9878886Z mov.u32 %r2337, 0x0; 2026-02-21T10:19:37.9879204Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r2334, %r2335, %r2336, %r2337 }, [ %rd172 + 0 ], %rd171; 2026-02-21T10:19:37.9879567Z // end inline asm 2026-02-21T10:19:37.9879724Z // begin inline asm 2026-02-21T10:19:37.9879878Z mov.u64 %rd174, 0x0; 2026-02-21T10:19:37.9880103Z createpolicy.fractional.L2::evict_first.b64 %rd174, 1.0; 2026-02-21T10:19:37.9880356Z // end inline asm 2026-02-21T10:19:37.9880520Z // begin inline asm 2026-02-21T10:19:37.9880681Z mov.u32 %r2338, 0x0; 2026-02-21T10:19:37.9880835Z mov.u32 %r2339, 0x0; 2026-02-21T10:19:37.9881014Z mov.u32 %r2340, 0x0; 2026-02-21T10:19:37.9881169Z mov.u32 %r2341, 0x0; 2026-02-21T10:19:37.9881490Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r2338, %r2339, %r2340, %r2341 }, [ %rd175 + 0 ], %rd174; 2026-02-21T10:19:37.9881857Z // end inline asm 2026-02-21T10:19:37.9882021Z // begin inline asm 2026-02-21T10:19:37.9882178Z mov.u64 %rd177, 0x0; 2026-02-21T10:19:37.9882404Z createpolicy.fractional.L2::evict_first.b64 %rd177, 1.0; 2026-02-21T10:19:37.9882664Z // end inline asm 2026-02-21T10:19:37.9882816Z // begin inline asm 2026-02-21T10:19:37.9882975Z mov.u32 %r2342, 0x0; 2026-02-21T10:19:37.9883135Z mov.u32 %r2343, 0x0; 2026-02-21T10:19:37.9883297Z mov.u32 %r2344, 0x0; 2026-02-21T10:19:37.9883450Z mov.u32 %r2345, 0x0; 2026-02-21T10:19:37.9883786Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r2342, 
%r2343, %r2344, %r2345 }, [ %rd178 + 0 ], %rd177; 2026-02-21T10:19:37.9884158Z // end inline asm 2026-02-21T10:19:37.9884408Z // begin inline asm 2026-02-21T10:19:37.9884574Z mov.u64 %rd180, 0x0; 2026-02-21T10:19:37.9884852Z createpolicy.fractional.L2::evict_first.b64 %rd180, 1.0; 2026-02-21T10:19:37.9885111Z // end inline asm 2026-02-21T10:19:37.9885261Z // begin inline asm 2026-02-21T10:19:37.9885422Z mov.u32 %r2346, 0x0; 2026-02-21T10:19:37.9885576Z mov.u32 %r2347, 0x0; 2026-02-21T10:19:37.9885749Z mov.u32 %r2348, 0x0; 2026-02-21T10:19:37.9885905Z mov.u32 %r2349, 0x0; 2026-02-21T10:19:37.9886227Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r2346, %r2347, %r2348, %r2349 }, [ %rd181 + 0 ], %rd180; 2026-02-21T10:19:37.9886736Z // end inline asm 2026-02-21T10:19:37.9886907Z // begin inline asm 2026-02-21T10:19:37.9887070Z mov.u64 %rd183, 0x0; 2026-02-21T10:19:37.9887288Z createpolicy.fractional.L2::evict_first.b64 %rd183, 1.0; 2026-02-21T10:19:37.9887547Z // end inline asm 2026-02-21T10:19:37.9887697Z // begin inline asm 2026-02-21T10:19:37.9887862Z mov.u32 %r2350, 0x0; 2026-02-21T10:19:37.9888019Z mov.u32 %r2351, 0x0; 2026-02-21T10:19:37.9888177Z mov.u32 %r2352, 0x0; 2026-02-21T10:19:37.9888332Z mov.u32 %r2353, 0x0; 2026-02-21T10:19:37.9888746Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r2350, %r2351, %r2352, %r2353 }, [ %rd184 + 0 ], %rd183; 2026-02-21T10:19:37.9889127Z // end inline asm 2026-02-21T10:19:37.9889275Z // begin inline asm 2026-02-21T10:19:37.9889433Z mov.u64 %rd186, 0x0; 2026-02-21T10:19:37.9889715Z createpolicy.fractional.L2::evict_first.b64 %rd186, 1.0; 2026-02-21T10:19:37.9889978Z // end inline asm 2026-02-21T10:19:37.9890125Z // begin inline asm 2026-02-21T10:19:37.9890285Z mov.u32 %r2354, 0x0; 2026-02-21T10:19:37.9890438Z mov.u32 %r2355, 0x0; 2026-02-21T10:19:37.9890600Z mov.u32 %r2356, 0x0; 2026-02-21T10:19:37.9890757Z mov.u32 %r2357, 0x0; 2026-02-21T10:19:37.9891071Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r2354, 
%r2355, %r2356, %r2357 }, [ %rd187 + 0 ], %rd186; 2026-02-21T10:19:37.9891445Z // end inline asm 2026-02-21T10:19:37.9891760Z .loc 1 59 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:59:32 2026-02-21T10:19:37.9892127Z bar.sync 0; 2026-02-21T10:19:37.9892295Z st.shared.v2.b32 [%r9], {%r2326, %r2327}; 2026-02-21T10:19:37.9892543Z st.shared.v2.b32 [%r9+2048], {%r2330, %r2331}; 2026-02-21T10:19:37.9892794Z st.shared.v2.b32 [%r9+4096], {%r2334, %r2335}; 2026-02-21T10:19:37.9893037Z st.shared.v2.b32 [%r9+6144], {%r2338, %r2339}; 2026-02-21T10:19:37.9893280Z st.shared.v2.b32 [%r9+8192], {%r2342, %r2343}; 2026-02-21T10:19:37.9893526Z st.shared.v2.b32 [%r9+10240], {%r2346, %r2347}; 2026-02-21T10:19:37.9893777Z st.shared.v2.b32 [%r9+12288], {%r2350, %r2351}; 2026-02-21T10:19:37.9894018Z st.shared.v2.b32 [%r9+14336], {%r2354, %r2355}; 2026-02-21T10:19:37.9894267Z st.shared.v2.b32 [%r10], {%r2328, %r2329}; 2026-02-21T10:19:37.9894501Z st.shared.v2.b32 [%r10+2048], {%r2332, %r2333}; 2026-02-21T10:19:37.9894746Z st.shared.v2.b32 [%r10+4096], {%r2336, %r2337}; 2026-02-21T10:19:37.9894993Z st.shared.v2.b32 [%r10+6144], {%r2340, %r2341}; 2026-02-21T10:19:37.9895231Z st.shared.v2.b32 [%r10+8192], {%r2344, %r2345}; 2026-02-21T10:19:37.9895482Z st.shared.v2.b32 [%r10+10240], {%r2348, %r2349}; 2026-02-21T10:19:37.9895730Z st.shared.v2.b32 [%r10+12288], {%r2352, %r2353}; 2026-02-21T10:19:37.9895977Z st.shared.v2.b32 [%r10+14336], {%r2356, %r2357}; 2026-02-21T10:19:37.9896192Z bar.sync 0; 2026-02-21T10:19:37.9896352Z ld.shared.b16 %rs1, [%r11]; 2026-02-21T10:19:37.9896688Z ld.shared.b16 %rs2, [%r11+1024]; 2026-02-21T10:19:37.9896903Z ld.shared.b16 %rs3, [%r11+64]; 2026-02-21T10:19:37.9897101Z ld.shared.b16 %rs4, [%r11+1088]; 2026-02-21T10:19:37.9897294Z ld.shared.b16 %rs5, [%r11+8192]; 2026-02-21T10:19:37.9897489Z ld.shared.b16 %rs6, [%r11+9216]; 2026-02-21T10:19:37.9897676Z ld.shared.b16 %rs7, [%r11+8256]; 2026-02-21T10:19:37.9897869Z ld.shared.b16 %rs8, [%r11+9280]; 
2026-02-21T10:19:37.9898059Z ld.shared.b16 %rs9, [%r12]; 2026-02-21T10:19:37.9898250Z ld.shared.b16 %rs10, [%r12+1024]; 2026-02-21T10:19:37.9898549Z ld.shared.b16 %rs11, [%r12+64]; 2026-02-21T10:19:37.9898811Z ld.shared.b16 %rs12, [%r12+1088]; 2026-02-21T10:19:37.9899013Z ld.shared.b16 %rs13, [%r12+8192]; 2026-02-21T10:19:37.9899205Z ld.shared.b16 %rs14, [%r12+9216]; 2026-02-21T10:19:37.9899401Z ld.shared.b16 %rs15, [%r12+8256]; 2026-02-21T10:19:37.9899593Z ld.shared.b16 %rs16, [%r12+9280]; 2026-02-21T10:19:37.9899794Z ld.shared.b16 %rs17, [%r13]; 2026-02-21T10:19:37.9899982Z ld.shared.b16 %rs18, [%r13+1024]; 2026-02-21T10:19:37.9900179Z ld.shared.b16 %rs19, [%r13+64]; 2026-02-21T10:19:37.9900370Z ld.shared.b16 %rs20, [%r13+1088]; 2026-02-21T10:19:37.9900567Z ld.shared.b16 %rs21, [%r13+8192]; 2026-02-21T10:19:37.9900762Z ld.shared.b16 %rs22, [%r13+9216]; 2026-02-21T10:19:37.9900952Z ld.shared.b16 %rs23, [%r13+8256]; 2026-02-21T10:19:37.9901148Z ld.shared.b16 %rs24, [%r13+9280]; 2026-02-21T10:19:37.9901339Z ld.shared.b16 %rs25, [%r14]; 2026-02-21T10:19:37.9901527Z ld.shared.b16 %rs26, [%r14+1024]; 2026-02-21T10:19:37.9901722Z ld.shared.b16 %rs27, [%r14+64]; 2026-02-21T10:19:37.9901923Z ld.shared.b16 %rs28, [%r14+1088]; 2026-02-21T10:19:37.9902189Z ld.shared.b16 %rs29, [%r14+8192]; 2026-02-21T10:19:37.9902387Z ld.shared.b16 %rs30, [%r14+9216]; 2026-02-21T10:19:37.9902578Z ld.shared.b16 %rs31, [%r14+8256]; 2026-02-21T10:19:37.9902788Z ld.shared.b16 %rs32, [%r14+9280]; 2026-02-21T10:19:37.9902987Z ld.shared.b16 %rs33, [%r15]; 2026-02-21T10:19:37.9903242Z ld.shared.b16 %rs34, [%r15+1024]; 2026-02-21T10:19:37.9903450Z ld.shared.b16 %rs35, [%r15+64]; 2026-02-21T10:19:37.9903640Z ld.shared.b16 %rs36, [%r15+1088]; 2026-02-21T10:19:37.9903837Z ld.shared.b16 %rs37, [%r15+8192]; 2026-02-21T10:19:37.9904024Z ld.shared.b16 %rs38, [%r15+9216]; 2026-02-21T10:19:37.9904219Z ld.shared.b16 %rs39, [%r15+8256]; 2026-02-21T10:19:37.9904414Z ld.shared.b16 %rs40, [%r15+9280]; 
2026-02-21T10:19:37.9904611Z ld.shared.b16 %rs41, [%r16]; 2026-02-21T10:19:37.9904798Z ld.shared.b16 %rs42, [%r16+1024]; 2026-02-21T10:19:37.9904990Z ld.shared.b16 %rs43, [%r16+64]; 2026-02-21T10:19:37.9905182Z ld.shared.b16 %rs44, [%r16+1088]; 2026-02-21T10:19:37.9905373Z ld.shared.b16 %rs45, [%r16+8192]; 2026-02-21T10:19:37.9905567Z ld.shared.b16 %rs46, [%r16+9216]; 2026-02-21T10:19:37.9905770Z ld.shared.b16 %rs47, [%r16+8256]; 2026-02-21T10:19:37.9905965Z ld.shared.b16 %rs48, [%r16+9280]; 2026-02-21T10:19:37.9906153Z ld.shared.b16 %rs49, [%r17]; 2026-02-21T10:19:37.9906340Z ld.shared.b16 %rs50, [%r17+1024]; 2026-02-21T10:19:37.9906670Z ld.shared.b16 %rs51, [%r17+64]; 2026-02-21T10:19:37.9906862Z ld.shared.b16 %rs52, [%r17+1088]; 2026-02-21T10:19:37.9907056Z ld.shared.b16 %rs53, [%r17+8192]; 2026-02-21T10:19:37.9907259Z ld.shared.b16 %rs54, [%r17+9216]; 2026-02-21T10:19:37.9907456Z ld.shared.b16 %rs55, [%r17+8256]; 2026-02-21T10:19:37.9907647Z ld.shared.b16 %rs56, [%r17+9280]; 2026-02-21T10:19:37.9907839Z ld.shared.b16 %rs57, [%r18]; 2026-02-21T10:19:37.9908023Z ld.shared.b16 %rs58, [%r18+1024]; 2026-02-21T10:19:37.9908221Z ld.shared.b16 %rs59, [%r18+64]; 2026-02-21T10:19:37.9908479Z ld.shared.b16 %rs60, [%r18+1088]; 2026-02-21T10:19:37.9908679Z ld.shared.b16 %rs61, [%r18+8192]; 2026-02-21T10:19:37.9908873Z ld.shared.b16 %rs62, [%r18+9216]; 2026-02-21T10:19:37.9909066Z ld.shared.b16 %rs63, [%r18+8256]; 2026-02-21T10:19:37.9909262Z ld.shared.b16 %rs64, [%r18+9280]; 2026-02-21T10:19:37.9909452Z cvt.f32.bf16 %r2495, %rs1; 2026-02-21T10:19:37.9909650Z cvt.f32.bf16 %r2496, %rs2; 2026-02-21T10:19:37.9909824Z cvt.f32.bf16 %r2497, %rs9; 2026-02-21T10:19:37.9910005Z cvt.f32.bf16 %r2498, %rs10; 2026-02-21T10:19:37.9910185Z cvt.f32.bf16 %r2627, %rs17; 2026-02-21T10:19:37.9910379Z cvt.f32.bf16 %r2628, %rs18; 2026-02-21T10:19:37.9910570Z cvt.f32.bf16 %r2629, %rs25; 2026-02-21T10:19:37.9910744Z cvt.f32.bf16 %r2630, %rs26; 2026-02-21T10:19:37.9910925Z cvt.f32.bf16 %r2759, 
%rs33; 2026-02-21T10:19:37.9911098Z cvt.f32.bf16 %r2760, %rs34; 2026-02-21T10:19:37.9911377Z cvt.f32.bf16 %r2761, %rs41; 2026-02-21T10:19:37.9911551Z cvt.f32.bf16 %r2762, %rs42; 2026-02-21T10:19:37.9911799Z cvt.f32.bf16 %r2891, %rs49; 2026-02-21T10:19:37.9911975Z cvt.f32.bf16 %r2892, %rs50; 2026-02-21T10:19:37.9912160Z cvt.f32.bf16 %r2893, %rs57; 2026-02-21T10:19:37.9912332Z cvt.f32.bf16 %r2894, %rs58; 2026-02-21T10:19:37.9912515Z cvt.f32.bf16 %r3023, %rs3; 2026-02-21T10:19:37.9912702Z cvt.f32.bf16 %r3024, %rs4; 2026-02-21T10:19:37.9912876Z cvt.f32.bf16 %r3025, %rs11; 2026-02-21T10:19:37.9913072Z cvt.f32.bf16 %r3026, %rs12; 2026-02-21T10:19:37.9913250Z cvt.f32.bf16 %r3155, %rs19; 2026-02-21T10:19:37.9913430Z cvt.f32.bf16 %r3156, %rs20; 2026-02-21T10:19:37.9913607Z cvt.f32.bf16 %r3157, %rs27; 2026-02-21T10:19:37.9913788Z cvt.f32.bf16 %r3158, %rs28; 2026-02-21T10:19:37.9913962Z cvt.f32.bf16 %r3287, %rs35; 2026-02-21T10:19:37.9914141Z cvt.f32.bf16 %r3288, %rs36; 2026-02-21T10:19:37.9914318Z cvt.f32.bf16 %r3289, %rs43; 2026-02-21T10:19:37.9914491Z cvt.f32.bf16 %r3290, %rs44; 2026-02-21T10:19:37.9914672Z cvt.f32.bf16 %r3419, %rs51; 2026-02-21T10:19:37.9914846Z cvt.f32.bf16 %r3420, %rs52; 2026-02-21T10:19:37.9915022Z cvt.f32.bf16 %r3421, %rs59; 2026-02-21T10:19:37.9915275Z cvt.f32.bf16 %r3422, %rs60; 2026-02-21T10:19:37.9915464Z cvt.f32.bf16 %r3551, %rs5; 2026-02-21T10:19:37.9915635Z cvt.f32.bf16 %r3552, %rs6; 2026-02-21T10:19:37.9915815Z cvt.f32.bf16 %r3553, %rs13; 2026-02-21T10:19:37.9915985Z cvt.f32.bf16 %r3554, %rs14; 2026-02-21T10:19:37.9916228Z cvt.f32.bf16 %r3683, %rs21; 2026-02-21T10:19:37.9916411Z cvt.f32.bf16 %r3684, %rs22; 2026-02-21T10:19:37.9916729Z cvt.f32.bf16 %r3685, %rs29; 2026-02-21T10:19:37.9916912Z cvt.f32.bf16 %r3686, %rs30; 2026-02-21T10:19:37.9917083Z cvt.f32.bf16 %r3815, %rs37; 2026-02-21T10:19:37.9917275Z cvt.f32.bf16 %r3816, %rs38; 2026-02-21T10:19:37.9917444Z cvt.f32.bf16 %r3817, %rs45; 2026-02-21T10:19:37.9917623Z cvt.f32.bf16 %r3818, %rs46; 
2026-02-21T10:19:37.9917796Z cvt.f32.bf16 %r3947, %rs53; 2026-02-21T10:19:37.9917976Z cvt.f32.bf16 %r3948, %rs54; 2026-02-21T10:19:37.9918149Z cvt.f32.bf16 %r3949, %rs61; 2026-02-21T10:19:37.9918330Z cvt.f32.bf16 %r3950, %rs62; 2026-02-21T10:19:37.9918509Z cvt.f32.bf16 %r4079, %rs7; 2026-02-21T10:19:37.9918683Z cvt.f32.bf16 %r4080, %rs8; 2026-02-21T10:19:37.9918860Z cvt.f32.bf16 %r4081, %rs15; 2026-02-21T10:19:37.9919035Z cvt.f32.bf16 %r4082, %rs16; 2026-02-21T10:19:37.9919214Z cvt.f32.bf16 %r4211, %rs23; 2026-02-21T10:19:37.9919389Z cvt.f32.bf16 %r4212, %rs24; 2026-02-21T10:19:37.9919570Z cvt.f32.bf16 %r4213, %rs31; 2026-02-21T10:19:37.9919747Z cvt.f32.bf16 %r4214, %rs32; 2026-02-21T10:19:37.9919929Z cvt.f32.bf16 %r4343, %rs39; 2026-02-21T10:19:37.9920110Z cvt.f32.bf16 %r4344, %rs40; 2026-02-21T10:19:37.9920308Z cvt.f32.bf16 %r4345, %rs47; 2026-02-21T10:19:37.9920494Z cvt.f32.bf16 %r4346, %rs48; 2026-02-21T10:19:37.9920668Z cvt.f32.bf16 %r4475, %rs55; 2026-02-21T10:19:37.9920852Z cvt.f32.bf16 %r4476, %rs56; 2026-02-21T10:19:37.9921033Z cvt.f32.bf16 %r4477, %rs63; 2026-02-21T10:19:37.9921212Z cvt.f32.bf16 %r4478, %rs64; 2026-02-21T10:19:37.9921543Z .loc 1 61 33 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:61:33 2026-02-21T10:19:37.9921908Z bar.sync 0; 2026-02-21T10:19:37.9922063Z add.s32 %r29846, %r39931, 4096; 2026-02-21T10:19:37.9922255Z // begin inline asm 2026-02-21T10:19:37.9922466Z @%p313 mbarrier.init.shared::cta.b64 [%r29846], 1; 2026-02-21T10:19:37.9922722Z // end inline asm 2026-02-21T10:19:37.9922878Z bar.sync 0; 2026-02-21T10:19:37.9923028Z // begin inline asm 2026-02-21T10:19:37.9923267Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r29846], 4096; 2026-02-21T10:19:37.9923541Z // end inline asm 2026-02-21T10:19:37.9923700Z // begin inline asm 2026-02-21T10:19:37.9923886Z fence.proxy.async.shared::cta; 2026-02-21T10:19:37.9924092Z // end inline asm 2026-02-21T10:19:37.9924249Z bar.sync 0; 2026-02-21T10:19:37.9924401Z elect.sync 
%r9571|%p99, -1; 2026-02-21T10:19:37.9924680Z and.pred %p40, %p1, %p99; 2026-02-21T10:19:37.9924861Z add.s64 %rd31, %rd841, 96; 2026-02-21T10:19:37.9925129Z cvt.u32.u64 %r2362, %rd31; 2026-02-21T10:19:37.9925300Z // begin inline asm 2026-02-21T10:19:37.9925736Z @%p40 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39931], [%rd822, {%r9804, %r2362}], [%r29846]; 2026-02-21T10:19:37.9926210Z // end inline asm 2026-02-21T10:19:37.9926369Z bar.sync 0; 2026-02-21T10:19:37.9926679Z mov.b32 %r9439, 0; 2026-02-21T10:19:37.9926952Z // begin inline asm 2026-02-21T10:19:37.9927150Z 2026-02-21T10:19:37.9927277Z { 2026-02-21T10:19:37.9927420Z .reg .pred complete; 2026-02-21T10:19:37.9927581Z waitLoop: 2026-02-21T10:19:37.9927809Z mbarrier.try_wait.parity.shared.b64 complete, [%r29846], %r9439; 2026-02-21T10:19:37.9928107Z @!complete bra.uni waitLoop; 2026-02-21T10:19:37.9928289Z } 2026-02-21T10:19:37.9928362Z 2026-02-21T10:19:37.9928424Z // end inline asm 2026-02-21T10:19:37.9928585Z bar.sync 0; 2026-02-21T10:19:37.9928734Z // begin inline asm 2026-02-21T10:19:37.9928935Z @%p313 mbarrier.inval.shared::cta.b64 [%r29846]; 2026-02-21T10:19:37.9929183Z // end inline asm 2026-02-21T10:19:37.9929593Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9929965Z ld.shared.s8 %rs65, [%r19]; 2026-02-21T10:19:37.9930283Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9930703Z shl.b16 %rs66, %rs65, 4; 2026-02-21T10:19:37.9931020Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9931393Z ld.shared.s8 %rs67, [%r20+128]; 2026-02-21T10:19:37.9931743Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9932106Z shl.b16 %rs68, %rs67, 4; 2026-02-21T10:19:37.9932426Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9932784Z 
ld.shared.s8 %rs69, [%r21+256]; 2026-02-21T10:19:37.9933117Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9933471Z shl.b16 %rs70, %rs69, 4; 2026-02-21T10:19:37.9933775Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9934132Z ld.shared.s8 %rs71, [%r22+384]; 2026-02-21T10:19:37.9934454Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9934815Z shl.b16 %rs72, %rs71, 4; 2026-02-21T10:19:37.9935119Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9935474Z ld.shared.s8 %rs73, [%r23+512]; 2026-02-21T10:19:37.9935803Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9936155Z shl.b16 %rs74, %rs73, 4; 2026-02-21T10:19:37.9936614Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9936991Z ld.shared.s8 %rs75, [%r24+640]; 2026-02-21T10:19:37.9937321Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9937663Z shl.b16 %rs76, %rs75, 4; 2026-02-21T10:19:37.9937974Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9938337Z ld.shared.s8 %rs77, [%r25+768]; 2026-02-21T10:19:37.9938654Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9939008Z shl.b16 %rs78, %rs77, 4; 2026-02-21T10:19:37.9939305Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9939657Z ld.shared.s8 %rs79, [%r26+896]; 2026-02-21T10:19:37.9940082Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9940515Z shl.b16 %rs80, %rs79, 4; 2026-02-21T10:19:37.9940819Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9941185Z ld.shared.s8 %rs81, 
[%r19+1024]; 2026-02-21T10:19:37.9941522Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9941874Z shl.b16 %rs82, %rs81, 4; 2026-02-21T10:19:37.9942187Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9942537Z ld.shared.s8 %rs83, [%r20+1152]; 2026-02-21T10:19:37.9942866Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9943215Z shl.b16 %rs84, %rs83, 4; 2026-02-21T10:19:37.9943515Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9943873Z ld.shared.s8 %rs85, [%r21+1280]; 2026-02-21T10:19:37.9944265Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9944622Z shl.b16 %rs86, %rs85, 4; 2026-02-21T10:19:37.9944938Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9945297Z ld.shared.s8 %rs87, [%r22+1408]; 2026-02-21T10:19:37.9945697Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9946043Z shl.b16 %rs88, %rs87, 4; 2026-02-21T10:19:37.9946351Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9946840Z ld.shared.s8 %rs89, [%r23+1536]; 2026-02-21T10:19:37.9947175Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9947530Z shl.b16 %rs90, %rs89, 4; 2026-02-21T10:19:37.9947834Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9948195Z ld.shared.s8 %rs91, [%r24+1664]; 2026-02-21T10:19:37.9948610Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9948960Z shl.b16 %rs92, %rs91, 4; 2026-02-21T10:19:37.9949263Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9949615Z ld.shared.s8 %rs93, [%r25+1792]; 
2026-02-21T10:19:37.9949941Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9950287Z shl.b16 %rs94, %rs93, 4; 2026-02-21T10:19:37.9950598Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9950945Z ld.shared.s8 %rs95, [%r26+1920]; 2026-02-21T10:19:37.9951277Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9951621Z shl.b16 %rs96, %rs95, 4; 2026-02-21T10:19:37.9951933Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9952285Z ld.shared.s8 %rs97, [%r19+2048]; 2026-02-21T10:19:37.9952602Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9952955Z shl.b16 %rs98, %rs97, 4; 2026-02-21T10:19:37.9953272Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9953628Z ld.shared.s8 %rs99, [%r20+2176]; 2026-02-21T10:19:37.9953955Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9954306Z shl.b16 %rs100, %rs99, 4; 2026-02-21T10:19:37.9954621Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9955069Z ld.shared.s8 %rs101, [%r21+2304]; 2026-02-21T10:19:37.9955480Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9955831Z shl.b16 %rs102, %rs101, 4; 2026-02-21T10:19:37.9956154Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9956626Z ld.shared.s8 %rs103, [%r22+2432]; 2026-02-21T10:19:37.9956973Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9957344Z shl.b16 %rs104, %rs103, 4; 2026-02-21T10:19:37.9957652Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9958010Z ld.shared.s8 %rs105, [%r23+2560]; 
2026-02-21T10:19:37.9958331Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9958683Z shl.b16 %rs106, %rs105, 4; 2026-02-21T10:19:37.9959000Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9959454Z ld.shared.s8 %rs107, [%r24+2688]; 2026-02-21T10:19:37.9959796Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9960143Z shl.b16 %rs108, %rs107, 4; 2026-02-21T10:19:37.9960525Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9960884Z ld.shared.s8 %rs109, [%r25+2816]; 2026-02-21T10:19:37.9961213Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9961566Z shl.b16 %rs110, %rs109, 4; 2026-02-21T10:19:37.9961872Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9962224Z ld.shared.s8 %rs111, [%r26+2944]; 2026-02-21T10:19:37.9962549Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9962905Z shl.b16 %rs112, %rs111, 4; 2026-02-21T10:19:37.9963213Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9963570Z ld.shared.s8 %rs113, [%r19+3072]; 2026-02-21T10:19:37.9963903Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9964248Z shl.b16 %rs114, %rs113, 4; 2026-02-21T10:19:37.9964564Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9964911Z ld.shared.s8 %rs115, [%r20+3200]; 2026-02-21T10:19:37.9965245Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9965593Z shl.b16 %rs116, %rs115, 4; 2026-02-21T10:19:37.9965910Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9966273Z ld.shared.s8 %rs117, [%r21+3328]; 
2026-02-21T10:19:37.9966727Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9967100Z shl.b16 %rs118, %rs117, 4; 2026-02-21T10:19:37.9967410Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9967771Z ld.shared.s8 %rs119, [%r22+3456]; 2026-02-21T10:19:37.9968094Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9968451Z shl.b16 %rs120, %rs119, 4; 2026-02-21T10:19:37.9968764Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9969112Z ld.shared.s8 %rs121, [%r23+3584]; 2026-02-21T10:19:37.9969452Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9969880Z shl.b16 %rs122, %rs121, 4; 2026-02-21T10:19:37.9970255Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9970611Z ld.shared.s8 %rs123, [%r24+3712]; 2026-02-21T10:19:37.9970936Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9971289Z shl.b16 %rs124, %rs123, 4; 2026-02-21T10:19:37.9971601Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9971957Z ld.shared.s8 %rs125, [%r25+3840]; 2026-02-21T10:19:37.9972279Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9972631Z shl.b16 %rs126, %rs125, 4; 2026-02-21T10:19:37.9972941Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9973287Z ld.shared.s8 %rs127, [%r26+3968]; 2026-02-21T10:19:37.9973618Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:37.9973964Z shl.b16 %rs128, %rs127, 4; 2026-02-21T10:19:37.9974359Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9974713Z cvt.s16.s8 %rs129, %rs66; 
2026-02-21T10:19:37.9974892Z shr.s16 %rs130, %rs129, 4; 2026-02-21T10:19:37.9975146Z cvt.s16.s8 %rs131, %rs68; 2026-02-21T10:19:37.9975323Z shr.s16 %rs132, %rs131, 4; 2026-02-21T10:19:37.9975501Z shr.s16 %rs133, %rs65, 4; 2026-02-21T10:19:37.9975671Z shr.s16 %rs134, %rs67, 4; 2026-02-21T10:19:37.9975975Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:37.9976326Z cvt.rn.f32.s16 %r9572, %rs134; 2026-02-21T10:19:37.9976643Z cvt.rn.f32.s16 %r9573, %rs133; 2026-02-21T10:19:37.9976830Z cvt.rn.f32.s16 %r9574, %rs132; 2026-02-21T10:19:37.9977038Z cvt.rn.f32.s16 %r9575, %rs130; 2026-02-21T10:19:37.9977363Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9977718Z cvt.s16.s8 %rs135, %rs70; 2026-02-21T10:19:37.9977899Z shr.s16 %rs136, %rs135, 4; 2026-02-21T10:19:37.9978077Z cvt.s16.s8 %rs137, %rs72; 2026-02-21T10:19:37.9978266Z shr.s16 %rs138, %rs137, 4; 2026-02-21T10:19:37.9978444Z shr.s16 %rs139, %rs69, 4; 2026-02-21T10:19:37.9978620Z shr.s16 %rs140, %rs71, 4; 2026-02-21T10:19:37.9978922Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:37.9979281Z cvt.rn.f32.s16 %r9576, %rs140; 2026-02-21T10:19:37.9979473Z cvt.rn.f32.s16 %r9577, %rs139; 2026-02-21T10:19:37.9979658Z cvt.rn.f32.s16 %r9578, %rs138; 2026-02-21T10:19:37.9979849Z cvt.rn.f32.s16 %r9579, %rs136; 2026-02-21T10:19:37.9980161Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9980526Z cvt.s16.s8 %rs141, %rs74; 2026-02-21T10:19:37.9980701Z shr.s16 %rs142, %rs141, 4; 2026-02-21T10:19:37.9980887Z cvt.s16.s8 %rs143, %rs76; 2026-02-21T10:19:37.9981064Z shr.s16 %rs144, %rs143, 4; 2026-02-21T10:19:37.9981244Z shr.s16 %rs145, %rs73, 4; 2026-02-21T10:19:37.9981420Z shr.s16 %rs146, %rs75, 4; 2026-02-21T10:19:37.9981723Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:37.9982080Z cvt.rn.f32.s16 
%r9580, %rs146; 2026-02-21T10:19:37.9982266Z cvt.rn.f32.s16 %r9581, %rs145; 2026-02-21T10:19:37.9982458Z cvt.rn.f32.s16 %r9582, %rs144; 2026-02-21T10:19:37.9982642Z cvt.rn.f32.s16 %r9583, %rs142; 2026-02-21T10:19:37.9982966Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9983316Z cvt.s16.s8 %rs147, %rs78; 2026-02-21T10:19:37.9983509Z shr.s16 %rs148, %rs147, 4; 2026-02-21T10:19:37.9983696Z cvt.s16.s8 %rs149, %rs80; 2026-02-21T10:19:37.9983984Z shr.s16 %rs150, %rs149, 4; 2026-02-21T10:19:37.9984164Z shr.s16 %rs151, %rs77, 4; 2026-02-21T10:19:37.9984405Z shr.s16 %rs152, %rs79, 4; 2026-02-21T10:19:37.9984720Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:37.9985068Z cvt.rn.f32.s16 %r9584, %rs152; 2026-02-21T10:19:37.9985260Z cvt.rn.f32.s16 %r9585, %rs151; 2026-02-21T10:19:37.9985444Z cvt.rn.f32.s16 %r9586, %rs150; 2026-02-21T10:19:37.9985639Z cvt.rn.f32.s16 %r9587, %rs148; 2026-02-21T10:19:37.9985962Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9986308Z cvt.s16.s8 %rs153, %rs82; 2026-02-21T10:19:37.9986617Z shr.s16 %rs154, %rs153, 4; 2026-02-21T10:19:37.9986809Z cvt.s16.s8 %rs155, %rs84; 2026-02-21T10:19:37.9987005Z shr.s16 %rs156, %rs155, 4; 2026-02-21T10:19:37.9987178Z shr.s16 %rs157, %rs81, 4; 2026-02-21T10:19:37.9987352Z shr.s16 %rs158, %rs83, 4; 2026-02-21T10:19:37.9987658Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:37.9988026Z cvt.rn.f32.s16 %r9588, %rs158; 2026-02-21T10:19:37.9988295Z cvt.rn.f32.s16 %r9589, %rs157; 2026-02-21T10:19:37.9988550Z cvt.rn.f32.s16 %r9590, %rs156; 2026-02-21T10:19:37.9988738Z cvt.rn.f32.s16 %r9591, %rs154; 2026-02-21T10:19:37.9989128Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9989481Z cvt.s16.s8 %rs159, %rs86; 2026-02-21T10:19:37.9989652Z shr.s16 %rs160, %rs159, 4; 
2026-02-21T10:19:37.9989834Z cvt.s16.s8 %rs161, %rs88; 2026-02-21T10:19:37.9990005Z shr.s16 %rs162, %rs161, 4; 2026-02-21T10:19:37.9990186Z shr.s16 %rs163, %rs85, 4; 2026-02-21T10:19:37.9990365Z shr.s16 %rs164, %rs87, 4; 2026-02-21T10:19:37.9990676Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:37.9991032Z cvt.rn.f32.s16 %r9592, %rs164; 2026-02-21T10:19:37.9991217Z cvt.rn.f32.s16 %r9593, %rs163; 2026-02-21T10:19:37.9991408Z cvt.rn.f32.s16 %r9594, %rs162; 2026-02-21T10:19:37.9991592Z cvt.rn.f32.s16 %r9595, %rs160; 2026-02-21T10:19:37.9991916Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9992266Z cvt.s16.s8 %rs165, %rs90; 2026-02-21T10:19:37.9992439Z shr.s16 %rs166, %rs165, 4; 2026-02-21T10:19:37.9992624Z cvt.s16.s8 %rs167, %rs92; 2026-02-21T10:19:37.9992797Z shr.s16 %rs168, %rs167, 4; 2026-02-21T10:19:37.9992976Z shr.s16 %rs169, %rs89, 4; 2026-02-21T10:19:37.9993147Z shr.s16 %rs170, %rs91, 4; 2026-02-21T10:19:37.9993474Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:37.9993826Z cvt.rn.f32.s16 %r9596, %rs170; 2026-02-21T10:19:37.9994016Z cvt.rn.f32.s16 %r9597, %rs169; 2026-02-21T10:19:37.9994081Z cvt.rn.f32.s16 %r9598, %rs168; 2026-02-21T10:19:37.9994156Z cvt.rn.f32.s16 %r9599, %rs166; 2026-02-21T10:19:37.9994354Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9994423Z cvt.s16.s8 %rs171, %rs94; 2026-02-21T10:19:37.9994489Z shr.s16 %rs172, %rs171, 4; 2026-02-21T10:19:37.9994559Z cvt.s16.s8 %rs173, %rs96; 2026-02-21T10:19:37.9994622Z shr.s16 %rs174, %rs173, 4; 2026-02-21T10:19:37.9994687Z shr.s16 %rs175, %rs93, 4; 2026-02-21T10:19:37.9994760Z shr.s16 %rs176, %rs95, 4; 2026-02-21T10:19:37.9994957Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:37.9995024Z cvt.rn.f32.s16 %r9600, %rs176; 2026-02-21T10:19:37.9995089Z 
cvt.rn.f32.s16 %r9601, %rs175; 2026-02-21T10:19:37.9995162Z cvt.rn.f32.s16 %r9602, %rs174; 2026-02-21T10:19:37.9995229Z cvt.rn.f32.s16 %r9603, %rs172; 2026-02-21T10:19:37.9995423Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9995584Z cvt.s16.s8 %rs177, %rs98; 2026-02-21T10:19:37.9995714Z shr.s16 %rs178, %rs177, 4; 2026-02-21T10:19:37.9995780Z cvt.s16.s8 %rs179, %rs100; 2026-02-21T10:19:37.9995854Z shr.s16 %rs180, %rs179, 4; 2026-02-21T10:19:37.9995919Z shr.s16 %rs181, %rs97, 4; 2026-02-21T10:19:37.9995984Z shr.s16 %rs182, %rs99, 4; 2026-02-21T10:19:37.9996191Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:37.9996269Z cvt.rn.f32.s16 %r9604, %rs182; 2026-02-21T10:19:37.9996336Z cvt.rn.f32.s16 %r9605, %rs181; 2026-02-21T10:19:37.9996400Z cvt.rn.f32.s16 %r9606, %rs180; 2026-02-21T10:19:37.9996613Z cvt.rn.f32.s16 %r9607, %rs178; 2026-02-21T10:19:37.9996820Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9996884Z cvt.s16.s8 %rs183, %rs102; 2026-02-21T10:19:37.9996955Z shr.s16 %rs184, %rs183, 4; 2026-02-21T10:19:37.9997019Z cvt.s16.s8 %rs185, %rs104; 2026-02-21T10:19:37.9997085Z shr.s16 %rs186, %rs185, 4; 2026-02-21T10:19:37.9997149Z shr.s16 %rs187, %rs101, 4; 2026-02-21T10:19:37.9997227Z shr.s16 %rs188, %rs103, 4; 2026-02-21T10:19:37.9997501Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:37.9997573Z cvt.rn.f32.s16 %r9608, %rs188; 2026-02-21T10:19:37.9997647Z cvt.rn.f32.s16 %r9609, %rs187; 2026-02-21T10:19:37.9997775Z cvt.rn.f32.s16 %r9610, %rs186; 2026-02-21T10:19:37.9997841Z cvt.rn.f32.s16 %r9611, %rs184; 2026-02-21T10:19:37.9998036Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9998105Z cvt.s16.s8 %rs189, %rs106; 2026-02-21T10:19:37.9998167Z shr.s16 %rs190, %rs189, 4; 2026-02-21T10:19:37.9998228Z cvt.s16.s8 %rs191, 
%rs108; 2026-02-21T10:19:37.9998296Z shr.s16 %rs192, %rs191, 4; 2026-02-21T10:19:37.9998358Z shr.s16 %rs193, %rs105, 4; 2026-02-21T10:19:37.9998423Z shr.s16 %rs194, %rs107, 4; 2026-02-21T10:19:37.9998634Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:37.9998706Z cvt.rn.f32.s16 %r9612, %rs194; 2026-02-21T10:19:37.9998772Z cvt.rn.f32.s16 %r9613, %rs193; 2026-02-21T10:19:37.9998835Z cvt.rn.f32.s16 %r9614, %rs192; 2026-02-21T10:19:37.9998903Z cvt.rn.f32.s16 %r9615, %rs190; 2026-02-21T10:19:37.9999099Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:37.9999164Z cvt.s16.s8 %rs195, %rs110; 2026-02-21T10:19:37.9999232Z shr.s16 %rs196, %rs195, 4; 2026-02-21T10:19:37.9999294Z cvt.s16.s8 %rs197, %rs112; 2026-02-21T10:19:37.9999357Z shr.s16 %rs198, %rs197, 4; 2026-02-21T10:19:37.9999418Z shr.s16 %rs199, %rs109, 4; 2026-02-21T10:19:37.9999486Z shr.s16 %rs200, %rs111, 4; 2026-02-21T10:19:37.9999678Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:37.9999747Z cvt.rn.f32.s16 %r9616, %rs200; 2026-02-21T10:19:37.9999819Z cvt.rn.f32.s16 %r9617, %rs199; 2026-02-21T10:19:37.9999884Z cvt.rn.f32.s16 %r9618, %rs198; 2026-02-21T10:19:37.9999950Z cvt.rn.f32.s16 %r9619, %rs196; 2026-02-21T10:19:38.0000148Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0000212Z cvt.s16.s8 %rs201, %rs114; 2026-02-21T10:19:38.0000274Z shr.s16 %rs202, %rs201, 4; 2026-02-21T10:19:38.0000337Z cvt.s16.s8 %rs203, %rs116; 2026-02-21T10:19:38.0000405Z shr.s16 %rs204, %rs203, 4; 2026-02-21T10:19:38.0000468Z shr.s16 %rs205, %rs113, 4; 2026-02-21T10:19:38.0000530Z shr.s16 %rs206, %rs115, 4; 2026-02-21T10:19:38.0000728Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0000794Z cvt.rn.f32.s16 %r9620, %rs206; 2026-02-21T10:19:38.0000859Z cvt.rn.f32.s16 %r9621, %rs205; 
2026-02-21T10:19:38.0000929Z cvt.rn.f32.s16 %r9622, %rs204; 2026-02-21T10:19:38.0001081Z cvt.rn.f32.s16 %r9623, %rs202; 2026-02-21T10:19:38.0001338Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0001401Z cvt.s16.s8 %rs207, %rs118; 2026-02-21T10:19:38.0001471Z shr.s16 %rs208, %rs207, 4; 2026-02-21T10:19:38.0001535Z cvt.s16.s8 %rs209, %rs120; 2026-02-21T10:19:38.0001598Z shr.s16 %rs210, %rs209, 4; 2026-02-21T10:19:38.0001664Z shr.s16 %rs211, %rs117, 4; 2026-02-21T10:19:38.0001725Z shr.s16 %rs212, %rs119, 4; 2026-02-21T10:19:38.0001918Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0001986Z cvt.rn.f32.s16 %r9624, %rs212; 2026-02-21T10:19:38.0002050Z cvt.rn.f32.s16 %r9625, %rs211; 2026-02-21T10:19:38.0002113Z cvt.rn.f32.s16 %r9626, %rs210; 2026-02-21T10:19:38.0002177Z cvt.rn.f32.s16 %r9627, %rs208; 2026-02-21T10:19:38.0002372Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0002435Z cvt.s16.s8 %rs213, %rs122; 2026-02-21T10:19:38.0002498Z shr.s16 %rs214, %rs213, 4; 2026-02-21T10:19:38.0002613Z cvt.s16.s8 %rs215, %rs124; 2026-02-21T10:19:38.0002675Z shr.s16 %rs216, %rs215, 4; 2026-02-21T10:19:38.0002735Z shr.s16 %rs217, %rs121, 4; 2026-02-21T10:19:38.0002795Z shr.s16 %rs218, %rs123, 4; 2026-02-21T10:19:38.0003034Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0003100Z cvt.rn.f32.s16 %r9628, %rs218; 2026-02-21T10:19:38.0003164Z cvt.rn.f32.s16 %r9629, %rs217; 2026-02-21T10:19:38.0003228Z cvt.rn.f32.s16 %r9630, %rs216; 2026-02-21T10:19:38.0003292Z cvt.rn.f32.s16 %r9631, %rs214; 2026-02-21T10:19:38.0003488Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0003551Z cvt.s16.s8 %rs219, %rs126; 2026-02-21T10:19:38.0003623Z shr.s16 %rs220, %rs219, 4; 2026-02-21T10:19:38.0003690Z cvt.s16.s8 %rs221, %rs128; 
2026-02-21T10:19:38.0003755Z shr.s16 %rs222, %rs221, 4; 2026-02-21T10:19:38.0003820Z shr.s16 %rs223, %rs125, 4; 2026-02-21T10:19:38.0003880Z shr.s16 %rs224, %rs127, 4; 2026-02-21T10:19:38.0004073Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0004139Z cvt.rn.f32.s16 %r9632, %rs224; 2026-02-21T10:19:38.0004201Z cvt.rn.f32.s16 %r9633, %rs223; 2026-02-21T10:19:38.0004265Z cvt.rn.f32.s16 %r9634, %rs222; 2026-02-21T10:19:38.0004327Z cvt.rn.f32.s16 %r9635, %rs220; 2026-02-21T10:19:38.0004390Z bar.sync 0; 2026-02-21T10:19:38.0004509Z st.shared.v4.b32 [%r27], {%r9575, %r9573, %r9574, %r9572}; 2026-02-21T10:19:38.0004634Z st.shared.v4.b32 [%r27+16384], {%r9607, %r9605, %r9606, %r9604}; 2026-02-21T10:19:38.0004751Z st.shared.v4.b32 [%r28], {%r9579, %r9577, %r9578, %r9576}; 2026-02-21T10:19:38.0004868Z st.shared.v4.b32 [%r28+16384], {%r9611, %r9609, %r9610, %r9608}; 2026-02-21T10:19:38.0004975Z st.shared.v4.b32 [%r29], {%r9583, %r9581, %r9582, %r9580}; 2026-02-21T10:19:38.0005098Z st.shared.v4.b32 [%r29+16384], {%r9615, %r9613, %r9614, %r9612}; 2026-02-21T10:19:38.0005218Z st.shared.v4.b32 [%r30], {%r9587, %r9585, %r9586, %r9584}; 2026-02-21T10:19:38.0005337Z st.shared.v4.b32 [%r30+16384], {%r9619, %r9617, %r9618, %r9616}; 2026-02-21T10:19:38.0005441Z st.shared.v4.b32 [%r31], {%r9591, %r9589, %r9590, %r9588}; 2026-02-21T10:19:38.0005563Z st.shared.v4.b32 [%r31+16384], {%r9623, %r9621, %r9622, %r9620}; 2026-02-21T10:19:38.0005668Z st.shared.v4.b32 [%r32], {%r9595, %r9593, %r9594, %r9592}; 2026-02-21T10:19:38.0005782Z st.shared.v4.b32 [%r32+16384], {%r9627, %r9625, %r9626, %r9624}; 2026-02-21T10:19:38.0005891Z st.shared.v4.b32 [%r33], {%r9599, %r9597, %r9598, %r9596}; 2026-02-21T10:19:38.0006006Z st.shared.v4.b32 [%r33+16384], {%r9631, %r9629, %r9630, %r9628}; 2026-02-21T10:19:38.0006109Z st.shared.v4.b32 [%r34], {%r9603, %r9601, %r9602, %r9600}; 2026-02-21T10:19:38.0006287Z st.shared.v4.b32 [%r34+16384], {%r9635, %r9633, 
%r9634, %r9632}; 2026-02-21T10:19:38.0006390Z $L__tmp1: 2026-02-21T10:19:38.0006798Z .loc 2 291 36 // standard.py:291:36 @[ cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:91:40 ] 2026-02-21T10:19:38.0006881Z // begin inline asm 2026-02-21T10:19:38.0006969Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.0007029Z // end inline asm 2026-02-21T10:19:38.0007099Z bar.sync 0; 2026-02-21T10:19:38.0007192Z shfl.sync.idx.b32 %r9636, %r4, 0, 31, -1; 2026-02-21T10:19:38.0007269Z wgmma.fence.sync.aligned; 2026-02-21T10:19:38.0007336Z mov.pred %p42, -1; 2026-02-21T10:19:38.0007401Z // begin inline asm 2026-02-21T10:19:38.0008976Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r2495,%r2496,%r2497,%r2498}, %rd3, %p42, 1, 1; 2026-02-21T10:19:38.0009043Z // end inline asm 2026-02-21T10:19:38.0009120Z // begin inline asm 2026-02-21T10:19:38.0010696Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r2627,%r2628,%r2629,%r2630}, %rd4, %p42, 1, 1; 
2026-02-21T10:19:38.0010767Z // end inline asm 2026-02-21T10:19:38.0010827Z // begin inline asm 2026-02-21T10:19:38.0012301Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r2759,%r2760,%r2761,%r2762}, %rd5, %p42, 1, 1; 2026-02-21T10:19:38.0012372Z // end inline asm 2026-02-21T10:19:38.0012432Z // begin inline asm 2026-02-21T10:19:38.0013903Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r2891,%r2892,%r2893,%r2894}, %rd6, %p42, 1, 1; 2026-02-21T10:19:38.0013975Z // end inline asm 2026-02-21T10:19:38.0014035Z // begin inline asm 2026-02-21T10:19:38.0015526Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r3023,%r3024,%r3025,%r3026}, %rd7, %p42, 1, 1; 2026-02-21T10:19:38.0015708Z // end inline asm 2026-02-21T10:19:38.0015769Z // begin inline asm 2026-02-21T10:19:38.0017384Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r3155,%r3156,%r3157,%r3158}, %rd8, %p42, 1, 1; 2026-02-21T10:19:38.0017452Z // end inline asm 2026-02-21T10:19:38.0017512Z // begin inline asm 2026-02-21T10:19:38.0019121Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, 
{%r3287,%r3288,%r3289,%r3290}, %rd9, %p42, 1, 1; 2026-02-21T10:19:38.0019188Z // end inline asm 2026-02-21T10:19:38.0019254Z // begin inline asm 2026-02-21T10:19:38.0020733Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r3419,%r3420,%r3421,%r3422}, %rd10, %p42, 1, 1; 2026-02-21T10:19:38.0020796Z // end inline asm 2026-02-21T10:19:38.0020863Z // begin inline asm 2026-02-21T10:19:38.0022338Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r3551,%r3552,%r3553,%r3554}, %rd3, %p42, 1, 1; 2026-02-21T10:19:38.0022407Z // end inline asm 2026-02-21T10:19:38.0022468Z // begin inline asm 2026-02-21T10:19:38.0023948Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r3683,%r3684,%r3685,%r3686}, %rd4, %p42, 1, 1; 2026-02-21T10:19:38.0024019Z // end inline asm 2026-02-21T10:19:38.0024078Z // begin inline asm 2026-02-21T10:19:38.0025642Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r3815,%r3816,%r3817,%r3818}, %rd5, %p42, 1, 1; 2026-02-21T10:19:38.0025768Z // end inline asm 2026-02-21T10:19:38.0025830Z // begin inline asm 2026-02-21T10:19:38.0027554Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, 
{%r3947,%r3948,%r3949,%r3950}, %rd6, %p42, 1, 1; 2026-02-21T10:19:38.0027675Z // end inline asm 2026-02-21T10:19:38.0027768Z // begin inline asm 2026-02-21T10:19:38.0029440Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r4079,%r4080,%r4081,%r4082}, %rd7, %p42, 1, 1; 2026-02-21T10:19:38.0029512Z // end inline asm 2026-02-21T10:19:38.0029574Z // begin inline asm 2026-02-21T10:19:38.0031063Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r4211,%r4212,%r4213,%r4214}, %rd8, %p42, 1, 1; 2026-02-21T10:19:38.0031125Z // end inline asm 2026-02-21T10:19:38.0031191Z // begin inline asm 2026-02-21T10:19:38.0032669Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r4343,%r4344,%r4345,%r4346}, %rd9, %p42, 1, 1; 2026-02-21T10:19:38.0032742Z // end inline asm 2026-02-21T10:19:38.0032810Z // begin inline asm 2026-02-21T10:19:38.0034293Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r4475,%r4476,%r4477,%r4478}, %rd10, %p42, 1, 1; 2026-02-21T10:19:38.0034477Z // end inline asm 2026-02-21T10:19:38.0034569Z wgmma.commit_group.sync.aligned; 2026-02-21T10:19:38.0034635Z mov.b32 %r4608, %r9439; 2026-02-21T10:19:38.0034699Z mov.b32 %r4609, %r9439; 2026-02-21T10:19:38.0034770Z mov.b32 %r4607, %r39931; 2026-02-21T10:19:38.0034834Z // begin inline asm 2026-02-21T10:19:38.0037663Z // wait for regs: 
%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r4607,%r4608,%r4609 2026-02-21T10:19:38.0037761Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:19:38.0037822Z // end inline asm 2026-02-21T10:19:38.0037886Z $L__tmp2: 2026-02-21T10:19:38.0038105Z .loc 1 55 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:32 2026-02-21T10:19:38.0038176Z add.s32 %r9637, %r42468, -64; 2026-02-21T10:19:38.0045876Z add.s64 %rd207, %rd166, 128; 2026-02-21T10:19:38.0046000Z add.s64 %rd210, %rd169, 128; 2026-02-21T10:19:38.0046083Z add.s64 %rd213, %rd172, 128; 2026-02-21T10:19:38.0046152Z add.s64 %rd216, %rd175, 128; 2026-02-21T10:19:38.0046215Z add.s64 %rd219, %rd178, 128; 2026-02-21T10:19:38.0046287Z add.s64 %rd222, %rd181, 128; 2026-02-21T10:19:38.0046348Z add.s64 %rd225, %rd184, 128; 2026-02-21T10:19:38.0046432Z mad.wide.s32 %rd228, %r9637, 2, %rd117; 2026-02-21T10:19:38.0046870Z .loc 1 55 80 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:80 2026-02-21T10:19:38.0046956Z // begin inline asm 
2026-02-21T10:19:38.0047025Z mov.u64 %rd206, 0x0; 2026-02-21T10:19:38.0047174Z createpolicy.fractional.L2::evict_first.b64 %rd206, 1.0; 2026-02-21T10:19:38.0047242Z // end inline asm 2026-02-21T10:19:38.0047305Z // begin inline asm 2026-02-21T10:19:38.0047366Z mov.u32 %r4741, 0x0; 2026-02-21T10:19:38.0047429Z mov.u32 %r4742, 0x0; 2026-02-21T10:19:38.0047495Z mov.u32 %r4743, 0x0; 2026-02-21T10:19:38.0047556Z mov.u32 %r4744, 0x0; 2026-02-21T10:19:38.0047797Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r4741, %r4742, %r4743, %r4744 }, [ %rd207 + 0 ], %rd206; 2026-02-21T10:19:38.0047874Z // end inline asm 2026-02-21T10:19:38.0047940Z // begin inline asm 2026-02-21T10:19:38.0048003Z mov.u64 %rd209, 0x0; 2026-02-21T10:19:38.0048143Z createpolicy.fractional.L2::evict_first.b64 %rd209, 1.0; 2026-02-21T10:19:38.0048206Z // end inline asm 2026-02-21T10:19:38.0048269Z // begin inline asm 2026-02-21T10:19:38.0048333Z mov.u32 %r4745, 0x0; 2026-02-21T10:19:38.0048401Z mov.u32 %r4746, 0x0; 2026-02-21T10:19:38.0048460Z mov.u32 %r4747, 0x0; 2026-02-21T10:19:38.0048519Z mov.u32 %r4748, 0x0; 2026-02-21T10:19:38.0048766Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r4745, %r4746, %r4747, %r4748 }, [ %rd210 + 0 ], %rd209; 2026-02-21T10:19:38.0048839Z // end inline asm 2026-02-21T10:19:38.0048904Z // begin inline asm 2026-02-21T10:19:38.0048973Z mov.u64 %rd212, 0x0; 2026-02-21T10:19:38.0049270Z createpolicy.fractional.L2::evict_first.b64 %rd212, 1.0; 2026-02-21T10:19:38.0049395Z // end inline asm 2026-02-21T10:19:38.0049456Z // begin inline asm 2026-02-21T10:19:38.0049536Z mov.u32 %r4749, 0x0; 2026-02-21T10:19:38.0049599Z mov.u32 %r4750, 0x0; 2026-02-21T10:19:38.0049659Z mov.u32 %r4751, 0x0; 2026-02-21T10:19:38.0049722Z mov.u32 %r4752, 0x0; 2026-02-21T10:19:38.0049955Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r4749, %r4750, %r4751, %r4752 }, [ %rd213 + 0 ], %rd212; 2026-02-21T10:19:38.0050015Z // end inline asm 2026-02-21T10:19:38.0050076Z // begin inline asm 
2026-02-21T10:19:38.0050146Z mov.u64 %rd215, 0x0; 2026-02-21T10:19:38.0050269Z createpolicy.fractional.L2::evict_first.b64 %rd215, 1.0; 2026-02-21T10:19:38.0050330Z // end inline asm 2026-02-21T10:19:38.0050400Z // begin inline asm 2026-02-21T10:19:38.0050459Z mov.u32 %r4753, 0x0; 2026-02-21T10:19:38.0050517Z mov.u32 %r4754, 0x0; 2026-02-21T10:19:38.0050583Z mov.u32 %r4755, 0x0; 2026-02-21T10:19:38.0050645Z mov.u32 %r4756, 0x0; 2026-02-21T10:19:38.0050860Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r4753, %r4754, %r4755, %r4756 }, [ %rd216 + 0 ], %rd215; 2026-02-21T10:19:38.0051024Z // end inline asm 2026-02-21T10:19:38.0051108Z // begin inline asm 2026-02-21T10:19:38.0051174Z mov.u64 %rd218, 0x0; 2026-02-21T10:19:38.0051296Z createpolicy.fractional.L2::evict_first.b64 %rd218, 1.0; 2026-02-21T10:19:38.0051361Z // end inline asm 2026-02-21T10:19:38.0051484Z // begin inline asm 2026-02-21T10:19:38.0051546Z mov.u32 %r4757, 0x0; 2026-02-21T10:19:38.0051609Z mov.u32 %r4758, 0x0; 2026-02-21T10:19:38.0051669Z mov.u32 %r4759, 0x0; 2026-02-21T10:19:38.0051729Z mov.u32 %r4760, 0x0; 2026-02-21T10:19:38.0051949Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r4757, %r4758, %r4759, %r4760 }, [ %rd219 + 0 ], %rd218; 2026-02-21T10:19:38.0052014Z // end inline asm 2026-02-21T10:19:38.0052075Z // begin inline asm 2026-02-21T10:19:38.0052135Z mov.u64 %rd221, 0x0; 2026-02-21T10:19:38.0052268Z createpolicy.fractional.L2::evict_first.b64 %rd221, 1.0; 2026-02-21T10:19:38.0052328Z // end inline asm 2026-02-21T10:19:38.0052388Z // begin inline asm 2026-02-21T10:19:38.0052449Z mov.u32 %r4761, 0x0; 2026-02-21T10:19:38.0052514Z mov.u32 %r4762, 0x0; 2026-02-21T10:19:38.0052572Z mov.u32 %r4763, 0x0; 2026-02-21T10:19:38.0052631Z mov.u32 %r4764, 0x0; 2026-02-21T10:19:38.0052863Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r4761, %r4762, %r4763, %r4764 }, [ %rd222 + 0 ], %rd221; 2026-02-21T10:19:38.0052923Z // end inline asm 2026-02-21T10:19:38.0052983Z // begin inline asm 
2026-02-21T10:19:38.0053051Z mov.u64 %rd224, 0x0; 2026-02-21T10:19:38.0053172Z createpolicy.fractional.L2::evict_first.b64 %rd224, 1.0; 2026-02-21T10:19:38.0053233Z // end inline asm 2026-02-21T10:19:38.0053296Z // begin inline asm 2026-02-21T10:19:38.0053360Z mov.u32 %r4765, 0x0; 2026-02-21T10:19:38.0053430Z mov.u32 %r4766, 0x0; 2026-02-21T10:19:38.0053492Z mov.u32 %r4767, 0x0; 2026-02-21T10:19:38.0053562Z mov.u32 %r4768, 0x0; 2026-02-21T10:19:38.0053776Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r4765, %r4766, %r4767, %r4768 }, [ %rd225 + 0 ], %rd224; 2026-02-21T10:19:38.0053839Z // end inline asm 2026-02-21T10:19:38.0053910Z // begin inline asm 2026-02-21T10:19:38.0053970Z mov.u64 %rd227, 0x0; 2026-02-21T10:19:38.0054089Z createpolicy.fractional.L2::evict_first.b64 %rd227, 1.0; 2026-02-21T10:19:38.0054150Z // end inline asm 2026-02-21T10:19:38.0054214Z // begin inline asm 2026-02-21T10:19:38.0054282Z mov.u32 %r4769, 0x0; 2026-02-21T10:19:38.0054340Z mov.u32 %r4770, 0x0; 2026-02-21T10:19:38.0054404Z mov.u32 %r4771, 0x0; 2026-02-21T10:19:38.0054464Z mov.u32 %r4772, 0x0; 2026-02-21T10:19:38.0054676Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r4769, %r4770, %r4771, %r4772 }, [ %rd228 + 0 ], %rd227; 2026-02-21T10:19:38.0054734Z // end inline asm 2026-02-21T10:19:38.0054956Z .loc 1 59 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:59:32 2026-02-21T10:19:38.0055078Z bar.sync 0; 2026-02-21T10:19:38.0055162Z st.shared.v2.b32 [%r9], {%r4741, %r4742}; 2026-02-21T10:19:38.0055314Z st.shared.v2.b32 [%r9+2048], {%r4745, %r4746}; 2026-02-21T10:19:38.0055401Z st.shared.v2.b32 [%r9+4096], {%r4749, %r4750}; 2026-02-21T10:19:38.0055483Z st.shared.v2.b32 [%r9+6144], {%r4753, %r4754}; 2026-02-21T10:19:38.0055568Z st.shared.v2.b32 [%r9+8192], {%r4757, %r4758}; 2026-02-21T10:19:38.0055660Z st.shared.v2.b32 [%r9+10240], {%r4761, %r4762}; 2026-02-21T10:19:38.0055744Z st.shared.v2.b32 [%r9+12288], {%r4765, %r4766}; 2026-02-21T10:19:38.0055827Z 
st.shared.v2.b32 [%r9+14336], {%r4769, %r4770}; 2026-02-21T10:19:38.0055913Z st.shared.v2.b32 [%r10], {%r4743, %r4744}; 2026-02-21T10:19:38.0055996Z st.shared.v2.b32 [%r10+2048], {%r4747, %r4748}; 2026-02-21T10:19:38.0056076Z st.shared.v2.b32 [%r10+4096], {%r4751, %r4752}; 2026-02-21T10:19:38.0056165Z st.shared.v2.b32 [%r10+6144], {%r4755, %r4756}; 2026-02-21T10:19:38.0056247Z st.shared.v2.b32 [%r10+8192], {%r4759, %r4760}; 2026-02-21T10:19:38.0056338Z st.shared.v2.b32 [%r10+10240], {%r4763, %r4764}; 2026-02-21T10:19:38.0056431Z st.shared.v2.b32 [%r10+12288], {%r4767, %r4768}; 2026-02-21T10:19:38.0056727Z st.shared.v2.b32 [%r10+14336], {%r4771, %r4772}; 2026-02-21T10:19:38.0056804Z bar.sync 0; 2026-02-21T10:19:38.0056886Z ld.shared.b16 %rs225, [%r11]; 2026-02-21T10:19:38.0056969Z ld.shared.b16 %rs226, [%r11+1024]; 2026-02-21T10:19:38.0057040Z ld.shared.b16 %rs227, [%r11+64]; 2026-02-21T10:19:38.0057173Z ld.shared.b16 %rs228, [%r11+1088]; 2026-02-21T10:19:38.0057251Z ld.shared.b16 %rs229, [%r11+8192]; 2026-02-21T10:19:38.0057319Z ld.shared.b16 %rs230, [%r11+9216]; 2026-02-21T10:19:38.0057385Z ld.shared.b16 %rs231, [%r11+8256]; 2026-02-21T10:19:38.0057453Z ld.shared.b16 %rs232, [%r11+9280]; 2026-02-21T10:19:38.0057530Z ld.shared.b16 %rs233, [%r12]; 2026-02-21T10:19:38.0057597Z ld.shared.b16 %rs234, [%r12+1024]; 2026-02-21T10:19:38.0057665Z ld.shared.b16 %rs235, [%r12+64]; 2026-02-21T10:19:38.0057743Z ld.shared.b16 %rs236, [%r12+1088]; 2026-02-21T10:19:38.0057812Z ld.shared.b16 %rs237, [%r12+8192]; 2026-02-21T10:19:38.0057878Z ld.shared.b16 %rs238, [%r12+9216]; 2026-02-21T10:19:38.0057952Z ld.shared.b16 %rs239, [%r12+8256]; 2026-02-21T10:19:38.0058021Z ld.shared.b16 %rs240, [%r12+9280]; 2026-02-21T10:19:38.0058087Z ld.shared.b16 %rs241, [%r13]; 2026-02-21T10:19:38.0058153Z ld.shared.b16 %rs242, [%r13+1024]; 2026-02-21T10:19:38.0058229Z ld.shared.b16 %rs243, [%r13+64]; 2026-02-21T10:19:38.0058307Z ld.shared.b16 %rs244, [%r13+1088]; 2026-02-21T10:19:38.0058379Z 
ld.shared.b16 %rs245, [%r13+8192]; 2026-02-21T10:19:38.0058458Z ld.shared.b16 %rs246, [%r13+9216]; 2026-02-21T10:19:38.0058524Z ld.shared.b16 %rs247, [%r13+8256]; 2026-02-21T10:19:38.0058591Z ld.shared.b16 %rs248, [%r13+9280]; 2026-02-21T10:19:38.0058661Z ld.shared.b16 %rs249, [%r14]; 2026-02-21T10:19:38.0058734Z ld.shared.b16 %rs250, [%r14+1024]; 2026-02-21T10:19:38.0058801Z ld.shared.b16 %rs251, [%r14+64]; 2026-02-21T10:19:38.0058872Z ld.shared.b16 %rs252, [%r14+1088]; 2026-02-21T10:19:38.0058948Z ld.shared.b16 %rs253, [%r14+8192]; 2026-02-21T10:19:38.0059015Z ld.shared.b16 %rs254, [%r14+9216]; 2026-02-21T10:19:38.0059081Z ld.shared.b16 %rs255, [%r14+8256]; 2026-02-21T10:19:38.0059158Z ld.shared.b16 %rs256, [%r14+9280]; 2026-02-21T10:19:38.0059224Z ld.shared.b16 %rs257, [%r15]; 2026-02-21T10:19:38.0059292Z ld.shared.b16 %rs258, [%r15+1024]; 2026-02-21T10:19:38.0059360Z ld.shared.b16 %rs259, [%r15+64]; 2026-02-21T10:19:38.0059436Z ld.shared.b16 %rs260, [%r15+1088]; 2026-02-21T10:19:38.0059502Z ld.shared.b16 %rs261, [%r15+8192]; 2026-02-21T10:19:38.0059571Z ld.shared.b16 %rs262, [%r15+9216]; 2026-02-21T10:19:38.0059644Z ld.shared.b16 %rs263, [%r15+8256]; 2026-02-21T10:19:38.0059712Z ld.shared.b16 %rs264, [%r15+9280]; 2026-02-21T10:19:38.0059779Z ld.shared.b16 %rs265, [%r16]; 2026-02-21T10:19:38.0059845Z ld.shared.b16 %rs266, [%r16+1024]; 2026-02-21T10:19:38.0059920Z ld.shared.b16 %rs267, [%r16+64]; 2026-02-21T10:19:38.0060074Z ld.shared.b16 %rs268, [%r16+1088]; 2026-02-21T10:19:38.0060198Z ld.shared.b16 %rs269, [%r16+8192]; 2026-02-21T10:19:38.0060270Z ld.shared.b16 %rs270, [%r16+9216]; 2026-02-21T10:19:38.0060338Z ld.shared.b16 %rs271, [%r16+8256]; 2026-02-21T10:19:38.0060403Z ld.shared.b16 %rs272, [%r16+9280]; 2026-02-21T10:19:38.0060472Z ld.shared.b16 %rs273, [%r17]; 2026-02-21T10:19:38.0060543Z ld.shared.b16 %rs274, [%r17+1024]; 2026-02-21T10:19:38.0060611Z ld.shared.b16 %rs275, [%r17+64]; 2026-02-21T10:19:38.0060678Z ld.shared.b16 %rs276, [%r17+1088]; 
2026-02-21T10:19:38.0060749Z ld.shared.b16 %rs277, [%r17+8192]; 2026-02-21T10:19:38.0060826Z ld.shared.b16 %rs278, [%r17+9216]; 2026-02-21T10:19:38.0060895Z ld.shared.b16 %rs279, [%r17+8256]; 2026-02-21T10:19:38.0060968Z ld.shared.b16 %rs280, [%r17+9280]; 2026-02-21T10:19:38.0061033Z ld.shared.b16 %rs281, [%r18]; 2026-02-21T10:19:38.0061099Z ld.shared.b16 %rs282, [%r18+1024]; 2026-02-21T10:19:38.0061167Z ld.shared.b16 %rs283, [%r18+64]; 2026-02-21T10:19:38.0061241Z ld.shared.b16 %rs284, [%r18+1088]; 2026-02-21T10:19:38.0061313Z ld.shared.b16 %rs285, [%r18+8192]; 2026-02-21T10:19:38.0061437Z ld.shared.b16 %rs286, [%r18+9216]; 2026-02-21T10:19:38.0061516Z ld.shared.b16 %rs287, [%r18+8256]; 2026-02-21T10:19:38.0061584Z ld.shared.b16 %rs288, [%r18+9280]; 2026-02-21T10:19:38.0061652Z cvt.f32.bf16 %r4910, %rs225; 2026-02-21T10:19:38.0061764Z cvt.f32.bf16 %r4911, %rs226; 2026-02-21T10:19:38.0061835Z cvt.f32.bf16 %r4912, %rs233; 2026-02-21T10:19:38.0061900Z cvt.f32.bf16 %r4913, %rs234; 2026-02-21T10:19:38.0061966Z cvt.f32.bf16 %r5042, %rs241; 2026-02-21T10:19:38.0062035Z cvt.f32.bf16 %r5043, %rs242; 2026-02-21T10:19:38.0062106Z cvt.f32.bf16 %r5044, %rs249; 2026-02-21T10:19:38.0062171Z cvt.f32.bf16 %r5045, %rs250; 2026-02-21T10:19:38.0062234Z cvt.f32.bf16 %r5174, %rs257; 2026-02-21T10:19:38.0062306Z cvt.f32.bf16 %r5175, %rs258; 2026-02-21T10:19:38.0062369Z cvt.f32.bf16 %r5176, %rs265; 2026-02-21T10:19:38.0062436Z cvt.f32.bf16 %r5177, %rs266; 2026-02-21T10:19:38.0062507Z cvt.f32.bf16 %r5306, %rs273; 2026-02-21T10:19:38.0062572Z cvt.f32.bf16 %r5307, %rs274; 2026-02-21T10:19:38.0062636Z cvt.f32.bf16 %r5308, %rs281; 2026-02-21T10:19:38.0062705Z cvt.f32.bf16 %r5309, %rs282; 2026-02-21T10:19:38.0062770Z cvt.f32.bf16 %r5438, %rs227; 2026-02-21T10:19:38.0062832Z cvt.f32.bf16 %r5439, %rs228; 2026-02-21T10:19:38.0062894Z cvt.f32.bf16 %r5440, %rs235; 2026-02-21T10:19:38.0062965Z cvt.f32.bf16 %r5441, %rs236; 2026-02-21T10:19:38.0063030Z cvt.f32.bf16 %r5570, %rs243; 
2026-02-21T10:19:38.0063092Z cvt.f32.bf16 %r5571, %rs244; 2026-02-21T10:19:38.0063161Z cvt.f32.bf16 %r5572, %rs251; 2026-02-21T10:19:38.0063235Z cvt.f32.bf16 %r5573, %rs252; 2026-02-21T10:19:38.0063300Z cvt.f32.bf16 %r5702, %rs259; 2026-02-21T10:19:38.0063364Z cvt.f32.bf16 %r5703, %rs260; 2026-02-21T10:19:38.0063434Z cvt.f32.bf16 %r5704, %rs267; 2026-02-21T10:19:38.0063499Z cvt.f32.bf16 %r5705, %rs268; 2026-02-21T10:19:38.0063564Z cvt.f32.bf16 %r5834, %rs275; 2026-02-21T10:19:38.0063635Z cvt.f32.bf16 %r5835, %rs276; 2026-02-21T10:19:38.0063698Z cvt.f32.bf16 %r5836, %rs283; 2026-02-21T10:19:38.0063762Z cvt.f32.bf16 %r5837, %rs284; 2026-02-21T10:19:38.0063826Z cvt.f32.bf16 %r5966, %rs229; 2026-02-21T10:19:38.0063894Z cvt.f32.bf16 %r5967, %rs230; 2026-02-21T10:19:38.0063956Z cvt.f32.bf16 %r5968, %rs237; 2026-02-21T10:19:38.0064019Z cvt.f32.bf16 %r5969, %rs238; 2026-02-21T10:19:38.0064094Z cvt.f32.bf16 %r6098, %rs245; 2026-02-21T10:19:38.0064159Z cvt.f32.bf16 %r6099, %rs246; 2026-02-21T10:19:38.0064225Z cvt.f32.bf16 %r6100, %rs253; 2026-02-21T10:19:38.0064287Z cvt.f32.bf16 %r6101, %rs254; 2026-02-21T10:19:38.0064356Z cvt.f32.bf16 %r6230, %rs261; 2026-02-21T10:19:38.0064427Z cvt.f32.bf16 %r6231, %rs262; 2026-02-21T10:19:38.0064489Z cvt.f32.bf16 %r6232, %rs269; 2026-02-21T10:19:38.0064560Z cvt.f32.bf16 %r6233, %rs270; 2026-02-21T10:19:38.0064623Z cvt.f32.bf16 %r6362, %rs277; 2026-02-21T10:19:38.0064766Z cvt.f32.bf16 %r6363, %rs278; 2026-02-21T10:19:38.0064885Z cvt.f32.bf16 %r6364, %rs285; 2026-02-21T10:19:38.0064950Z cvt.f32.bf16 %r6365, %rs286; 2026-02-21T10:19:38.0065016Z cvt.f32.bf16 %r6494, %rs231; 2026-02-21T10:19:38.0065080Z cvt.f32.bf16 %r6495, %rs232; 2026-02-21T10:19:38.0065152Z cvt.f32.bf16 %r6496, %rs239; 2026-02-21T10:19:38.0065216Z cvt.f32.bf16 %r6497, %rs240; 2026-02-21T10:19:38.0065282Z cvt.f32.bf16 %r6626, %rs247; 2026-02-21T10:19:38.0065360Z cvt.f32.bf16 %r6627, %rs248; 2026-02-21T10:19:38.0065424Z cvt.f32.bf16 %r6628, %rs255; 
2026-02-21T10:19:38.0065486Z cvt.f32.bf16 %r6629, %rs256; 2026-02-21T10:19:38.0065548Z cvt.f32.bf16 %r6758, %rs263; 2026-02-21T10:19:38.0065617Z cvt.f32.bf16 %r6759, %rs264; 2026-02-21T10:19:38.0065679Z cvt.f32.bf16 %r6760, %rs271; 2026-02-21T10:19:38.0065742Z cvt.f32.bf16 %r6761, %rs272; 2026-02-21T10:19:38.0065812Z cvt.f32.bf16 %r6890, %rs279; 2026-02-21T10:19:38.0065874Z cvt.f32.bf16 %r6891, %rs280; 2026-02-21T10:19:38.0065944Z cvt.f32.bf16 %r6892, %rs287; 2026-02-21T10:19:38.0066010Z cvt.f32.bf16 %r6893, %rs288; 2026-02-21T10:19:38.0066294Z .loc 1 61 33 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:61:33 2026-02-21T10:19:38.0066358Z bar.sync 0; 2026-02-21T10:19:38.0066422Z // begin inline asm 2026-02-21T10:19:38.0066662Z @%p313 mbarrier.init.shared::cta.b64 [%r29846], 1; 2026-02-21T10:19:38.0066728Z // end inline asm 2026-02-21T10:19:38.0066864Z bar.sync 0; 2026-02-21T10:19:38.0066935Z // begin inline asm 2026-02-21T10:19:38.0067091Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r29846], 4096; 2026-02-21T10:19:38.0067151Z // end inline asm 2026-02-21T10:19:38.0067211Z // begin inline asm 2026-02-21T10:19:38.0067299Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.0067357Z // end inline asm 2026-02-21T10:19:38.0067415Z bar.sync 0; 2026-02-21T10:19:38.0067495Z elect.sync %r9638|%p100, -1; 2026-02-21T10:19:38.0067565Z and.pred %p60, %p1, %p100; 2026-02-21T10:19:38.0067638Z cvt.u32.u64 %r9639, %rd841; 2026-02-21T10:19:38.0067703Z add.s32 %r4777, %r9639, 128; 2026-02-21T10:19:38.0067770Z // begin inline asm 2026-02-21T10:19:38.0068107Z @%p60 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39931], [%rd822, {%r9804, %r4777}], [%r29846]; 2026-02-21T10:19:38.0068170Z // end inline asm 2026-02-21T10:19:38.0068234Z bar.sync 0; 2026-02-21T10:19:38.0068295Z // begin inline asm 2026-02-21T10:19:38.0068353Z 2026-02-21T10:19:38.0068486Z { 2026-02-21T10:19:38.0068561Z .reg .pred complete; 2026-02-21T10:19:38.0068621Z waitLoop: 
2026-02-21T10:19:38.0068770Z mbarrier.try_wait.parity.shared.b64 complete, [%r29846], %r9439; 2026-02-21T10:19:38.0068851Z @!complete bra.uni waitLoop; 2026-02-21T10:19:38.0068905Z } 2026-02-21T10:19:38.0068911Z 2026-02-21T10:19:38.0068972Z // end inline asm 2026-02-21T10:19:38.0069040Z bar.sync 0; 2026-02-21T10:19:38.0069101Z // begin inline asm 2026-02-21T10:19:38.0069201Z @%p313 mbarrier.inval.shared::cta.b64 [%r29846]; 2026-02-21T10:19:38.0069261Z // end inline asm 2026-02-21T10:19:38.0069483Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0069555Z ld.shared.s8 %rs289, [%r19]; 2026-02-21T10:19:38.0069763Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0069836Z shl.b16 %rs290, %rs289, 4; 2026-02-21T10:19:38.0070033Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0070107Z ld.shared.s8 %rs291, [%r20+128]; 2026-02-21T10:19:38.0070309Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0070377Z shl.b16 %rs292, %rs291, 4; 2026-02-21T10:19:38.0070567Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0070642Z ld.shared.s8 %rs293, [%r21+256]; 2026-02-21T10:19:38.0070930Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0071077Z shl.b16 %rs294, %rs293, 4; 2026-02-21T10:19:38.0071274Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0071349Z ld.shared.s8 %rs295, [%r22+384]; 2026-02-21T10:19:38.0071543Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0071607Z shl.b16 %rs296, %rs295, 4; 2026-02-21T10:19:38.0071806Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0071873Z ld.shared.s8 %rs297, [%r23+512]; 2026-02-21T10:19:38.0072066Z 
.loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0072139Z shl.b16 %rs298, %rs297, 4; 2026-02-21T10:19:38.0072333Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0072401Z ld.shared.s8 %rs299, [%r24+640]; 2026-02-21T10:19:38.0072663Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0072729Z shl.b16 %rs300, %rs299, 4; 2026-02-21T10:19:38.0072920Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0073032Z ld.shared.s8 %rs301, [%r25+768]; 2026-02-21T10:19:38.0073242Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0073309Z shl.b16 %rs302, %rs301, 4; 2026-02-21T10:19:38.0073500Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0073574Z ld.shared.s8 %rs303, [%r26+896]; 2026-02-21T10:19:38.0073763Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0073828Z shl.b16 %rs304, %rs303, 4; 2026-02-21T10:19:38.0074028Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0074098Z ld.shared.s8 %rs305, [%r19+1024]; 2026-02-21T10:19:38.0074294Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0074364Z shl.b16 %rs306, %rs305, 4; 2026-02-21T10:19:38.0074561Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0074631Z ld.shared.s8 %rs307, [%r20+1152]; 2026-02-21T10:19:38.0074822Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0074895Z shl.b16 %rs308, %rs307, 4; 2026-02-21T10:19:38.0075090Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0075165Z ld.shared.s8 %rs309, [%r21+1280]; 2026-02-21T10:19:38.0075371Z .loc 
1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0075440Z shl.b16 %rs310, %rs309, 4; 2026-02-21T10:19:38.0075637Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0075713Z ld.shared.s8 %rs311, [%r22+1408]; 2026-02-21T10:19:38.0075906Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0075970Z shl.b16 %rs312, %rs311, 4; 2026-02-21T10:19:38.0076168Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0076236Z ld.shared.s8 %rs313, [%r23+1536]; 2026-02-21T10:19:38.0076427Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0076631Z shl.b16 %rs314, %rs313, 4; 2026-02-21T10:19:38.0076927Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0077070Z ld.shared.s8 %rs315, [%r24+1664]; 2026-02-21T10:19:38.0077268Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0077341Z shl.b16 %rs316, %rs315, 4; 2026-02-21T10:19:38.0077535Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0077601Z ld.shared.s8 %rs317, [%r25+1792]; 2026-02-21T10:19:38.0077799Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0077864Z shl.b16 %rs318, %rs317, 4; 2026-02-21T10:19:38.0078055Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0078126Z ld.shared.s8 %rs319, [%r26+1920]; 2026-02-21T10:19:38.0078316Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0078385Z shl.b16 %rs320, %rs319, 4; 2026-02-21T10:19:38.0078638Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0078712Z ld.shared.s8 %rs321, [%r19+2048]; 2026-02-21T10:19:38.0078902Z .loc 1 
64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0079035Z shl.b16 %rs322, %rs321, 4; 2026-02-21T10:19:38.0079234Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0079299Z ld.shared.s8 %rs323, [%r20+2176]; 2026-02-21T10:19:38.0079487Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0079555Z shl.b16 %rs324, %rs323, 4; 2026-02-21T10:19:38.0079748Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0079818Z ld.shared.s8 %rs325, [%r21+2304]; 2026-02-21T10:19:38.0080016Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0080079Z shl.b16 %rs326, %rs325, 4; 2026-02-21T10:19:38.0080268Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0080339Z ld.shared.s8 %rs327, [%r22+2432]; 2026-02-21T10:19:38.0080529Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0080591Z shl.b16 %rs328, %rs327, 4; 2026-02-21T10:19:38.0080782Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0080853Z ld.shared.s8 %rs329, [%r23+2560]; 2026-02-21T10:19:38.0081044Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0081119Z shl.b16 %rs330, %rs329, 4; 2026-02-21T10:19:38.0081321Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0081392Z ld.shared.s8 %rs331, [%r24+2688]; 2026-02-21T10:19:38.0081583Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0081652Z shl.b16 %rs332, %rs331, 4; 2026-02-21T10:19:38.0081844Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0081911Z ld.shared.s8 %rs333, [%r25+2816]; 2026-02-21T10:19:38.0082104Z .loc 1 
64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0082170Z shl.b16 %rs334, %rs333, 4; 2026-02-21T10:19:38.0082359Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0082424Z ld.shared.s8 %rs335, [%r26+2944]; 2026-02-21T10:19:38.0082617Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0082781Z shl.b16 %rs336, %rs335, 4; 2026-02-21T10:19:38.0082973Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0083047Z ld.shared.s8 %rs337, [%r19+3072]; 2026-02-21T10:19:38.0083237Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0083300Z shl.b16 %rs338, %rs337, 4; 2026-02-21T10:19:38.0083495Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0083563Z ld.shared.s8 %rs339, [%r20+3200]; 2026-02-21T10:19:38.0083756Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0083824Z shl.b16 %rs340, %rs339, 4; 2026-02-21T10:19:38.0084015Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0084085Z ld.shared.s8 %rs341, [%r21+3328]; 2026-02-21T10:19:38.0084320Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0084392Z shl.b16 %rs342, %rs341, 4; 2026-02-21T10:19:38.0084582Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0084648Z ld.shared.s8 %rs343, [%r22+3456]; 2026-02-21T10:19:38.0084890Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0084959Z shl.b16 %rs344, %rs343, 4; 2026-02-21T10:19:38.0085157Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0085223Z ld.shared.s8 %rs345, [%r23+3584]; 2026-02-21T10:19:38.0085414Z .loc 1 
64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0085485Z shl.b16 %rs346, %rs345, 4; 2026-02-21T10:19:38.0085675Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0085746Z ld.shared.s8 %rs347, [%r24+3712]; 2026-02-21T10:19:38.0085942Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0086004Z shl.b16 %rs348, %rs347, 4; 2026-02-21T10:19:38.0086195Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0086265Z ld.shared.s8 %rs349, [%r25+3840]; 2026-02-21T10:19:38.0086585Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0086654Z shl.b16 %rs350, %rs349, 4; 2026-02-21T10:19:38.0086845Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0086926Z ld.shared.s8 %rs351, [%r26+3968]; 2026-02-21T10:19:38.0087122Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0087189Z shl.b16 %rs352, %rs351, 4; 2026-02-21T10:19:38.0087387Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0087451Z cvt.s16.s8 %rs353, %rs290; 2026-02-21T10:19:38.0087513Z shr.s16 %rs354, %rs353, 4; 2026-02-21T10:19:38.0087575Z cvt.s16.s8 %rs355, %rs292; 2026-02-21T10:19:38.0087645Z shr.s16 %rs356, %rs355, 4; 2026-02-21T10:19:38.0087707Z shr.s16 %rs357, %rs289, 4; 2026-02-21T10:19:38.0087768Z shr.s16 %rs358, %rs291, 4; 2026-02-21T10:19:38.0087963Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0088030Z cvt.rn.f32.s16 %r9640, %rs358; 2026-02-21T10:19:38.0088096Z cvt.rn.f32.s16 %r9641, %rs357; 2026-02-21T10:19:38.0088165Z cvt.rn.f32.s16 %r9642, %rs356; 2026-02-21T10:19:38.0088227Z cvt.rn.f32.s16 %r9643, %rs354; 2026-02-21T10:19:38.0088502Z .loc 1 66 25 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0088626Z cvt.s16.s8 %rs359, %rs294; 2026-02-21T10:19:38.0088694Z shr.s16 %rs360, %rs359, 4; 2026-02-21T10:19:38.0088756Z cvt.s16.s8 %rs361, %rs296; 2026-02-21T10:19:38.0088828Z shr.s16 %rs362, %rs361, 4; 2026-02-21T10:19:38.0088896Z shr.s16 %rs363, %rs293, 4; 2026-02-21T10:19:38.0088957Z shr.s16 %rs364, %rs295, 4; 2026-02-21T10:19:38.0089150Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0089220Z cvt.rn.f32.s16 %r9644, %rs364; 2026-02-21T10:19:38.0089282Z cvt.rn.f32.s16 %r9645, %rs363; 2026-02-21T10:19:38.0089347Z cvt.rn.f32.s16 %r9646, %rs362; 2026-02-21T10:19:38.0089409Z cvt.rn.f32.s16 %r9647, %rs360; 2026-02-21T10:19:38.0089605Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0089669Z cvt.s16.s8 %rs365, %rs298; 2026-02-21T10:19:38.0089732Z shr.s16 %rs366, %rs365, 4; 2026-02-21T10:19:38.0089799Z cvt.s16.s8 %rs367, %rs300; 2026-02-21T10:19:38.0089925Z shr.s16 %rs368, %rs367, 4; 2026-02-21T10:19:38.0090001Z shr.s16 %rs369, %rs297, 4; 2026-02-21T10:19:38.0090066Z shr.s16 %rs370, %rs299, 4; 2026-02-21T10:19:38.0090263Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0090390Z cvt.rn.f32.s16 %r9648, %rs370; 2026-02-21T10:19:38.0090456Z cvt.rn.f32.s16 %r9649, %rs369; 2026-02-21T10:19:38.0090528Z cvt.rn.f32.s16 %r9650, %rs368; 2026-02-21T10:19:38.0090589Z cvt.rn.f32.s16 %r9651, %rs366; 2026-02-21T10:19:38.0090780Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0090847Z cvt.s16.s8 %rs371, %rs302; 2026-02-21T10:19:38.0090913Z shr.s16 %rs372, %rs371, 4; 2026-02-21T10:19:38.0090977Z cvt.s16.s8 %rs373, %rs304; 2026-02-21T10:19:38.0091039Z shr.s16 %rs374, %rs373, 4; 2026-02-21T10:19:38.0091107Z shr.s16 %rs375, %rs301, 4; 2026-02-21T10:19:38.0091170Z shr.s16 %rs376, %rs303, 4; 
2026-02-21T10:19:38.0091370Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0091441Z cvt.rn.f32.s16 %r9652, %rs376; 2026-02-21T10:19:38.0091505Z cvt.rn.f32.s16 %r9653, %rs375; 2026-02-21T10:19:38.0091569Z cvt.rn.f32.s16 %r9654, %rs374; 2026-02-21T10:19:38.0091634Z cvt.rn.f32.s16 %r9655, %rs372; 2026-02-21T10:19:38.0091834Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0091897Z cvt.s16.s8 %rs377, %rs306; 2026-02-21T10:19:38.0091961Z shr.s16 %rs378, %rs377, 4; 2026-02-21T10:19:38.0092033Z cvt.s16.s8 %rs379, %rs308; 2026-02-21T10:19:38.0092094Z shr.s16 %rs380, %rs379, 4; 2026-02-21T10:19:38.0092156Z shr.s16 %rs381, %rs305, 4; 2026-02-21T10:19:38.0092222Z shr.s16 %rs382, %rs307, 4; 2026-02-21T10:19:38.0092416Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0092483Z cvt.rn.f32.s16 %r9656, %rs382; 2026-02-21T10:19:38.0092549Z cvt.rn.f32.s16 %r9657, %rs381; 2026-02-21T10:19:38.0092621Z cvt.rn.f32.s16 %r9658, %rs380; 2026-02-21T10:19:38.0092683Z cvt.rn.f32.s16 %r9659, %rs378; 2026-02-21T10:19:38.0092875Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0092944Z cvt.s16.s8 %rs383, %rs310; 2026-02-21T10:19:38.0093006Z shr.s16 %rs384, %rs383, 4; 2026-02-21T10:19:38.0093067Z cvt.s16.s8 %rs385, %rs312; 2026-02-21T10:19:38.0093133Z shr.s16 %rs386, %rs385, 4; 2026-02-21T10:19:38.0093195Z shr.s16 %rs387, %rs309, 4; 2026-02-21T10:19:38.0093257Z shr.s16 %rs388, %rs311, 4; 2026-02-21T10:19:38.0093448Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0093519Z cvt.rn.f32.s16 %r9660, %rs388; 2026-02-21T10:19:38.0093675Z cvt.rn.f32.s16 %r9661, %rs387; 2026-02-21T10:19:38.0093783Z cvt.rn.f32.s16 %r9662, %rs386; 2026-02-21T10:19:38.0093853Z cvt.rn.f32.s16 %r9663, %rs384; 2026-02-21T10:19:38.0094045Z .loc 1 66 25 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0094107Z cvt.s16.s8 %rs389, %rs314; 2026-02-21T10:19:38.0094170Z shr.s16 %rs390, %rs389, 4; 2026-02-21T10:19:38.0094240Z cvt.s16.s8 %rs391, %rs316; 2026-02-21T10:19:38.0094301Z shr.s16 %rs392, %rs391, 4; 2026-02-21T10:19:38.0094362Z shr.s16 %rs393, %rs313, 4; 2026-02-21T10:19:38.0094429Z shr.s16 %rs394, %rs315, 4; 2026-02-21T10:19:38.0094622Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0094687Z cvt.rn.f32.s16 %r9664, %rs394; 2026-02-21T10:19:38.0094758Z cvt.rn.f32.s16 %r9665, %rs393; 2026-02-21T10:19:38.0094825Z cvt.rn.f32.s16 %r9666, %rs392; 2026-02-21T10:19:38.0094893Z cvt.rn.f32.s16 %r9667, %rs390; 2026-02-21T10:19:38.0095087Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0095206Z cvt.s16.s8 %rs395, %rs318; 2026-02-21T10:19:38.0095271Z shr.s16 %rs396, %rs395, 4; 2026-02-21T10:19:38.0095334Z cvt.s16.s8 %rs397, %rs320; 2026-02-21T10:19:38.0095403Z shr.s16 %rs398, %rs397, 4; 2026-02-21T10:19:38.0095471Z shr.s16 %rs399, %rs317, 4; 2026-02-21T10:19:38.0095597Z shr.s16 %rs400, %rs319, 4; 2026-02-21T10:19:38.0095795Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0095866Z cvt.rn.f32.s16 %r9668, %rs400; 2026-02-21T10:19:38.0095932Z cvt.rn.f32.s16 %r9669, %rs399; 2026-02-21T10:19:38.0095995Z cvt.rn.f32.s16 %r9670, %rs398; 2026-02-21T10:19:38.0096064Z cvt.rn.f32.s16 %r9671, %rs396; 2026-02-21T10:19:38.0096258Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0096325Z cvt.s16.s8 %rs401, %rs322; 2026-02-21T10:19:38.0096406Z shr.s16 %rs402, %rs401, 4; 2026-02-21T10:19:38.0096603Z cvt.s16.s8 %rs403, %rs324; 2026-02-21T10:19:38.0096674Z shr.s16 %rs404, %rs403, 4; 2026-02-21T10:19:38.0096736Z shr.s16 %rs405, %rs321, 4; 2026-02-21T10:19:38.0096805Z shr.s16 %rs406, %rs323, 4; 
2026-02-21T10:19:38.0097012Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0097082Z cvt.rn.f32.s16 %r9672, %rs406; 2026-02-21T10:19:38.0097156Z cvt.rn.f32.s16 %r9673, %rs405; 2026-02-21T10:19:38.0097220Z cvt.rn.f32.s16 %r9674, %rs404; 2026-02-21T10:19:38.0097282Z cvt.rn.f32.s16 %r9675, %rs402; 2026-02-21T10:19:38.0097483Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0097546Z cvt.s16.s8 %rs407, %rs326; 2026-02-21T10:19:38.0097608Z shr.s16 %rs408, %rs407, 4; 2026-02-21T10:19:38.0097671Z cvt.s16.s8 %rs409, %rs328; 2026-02-21T10:19:38.0097742Z shr.s16 %rs410, %rs409, 4; 2026-02-21T10:19:38.0097806Z shr.s16 %rs411, %rs325, 4; 2026-02-21T10:19:38.0097868Z shr.s16 %rs412, %rs327, 4; 2026-02-21T10:19:38.0098065Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0098131Z cvt.rn.f32.s16 %r9676, %rs412; 2026-02-21T10:19:38.0098195Z cvt.rn.f32.s16 %r9677, %rs411; 2026-02-21T10:19:38.0098259Z cvt.rn.f32.s16 %r9678, %rs410; 2026-02-21T10:19:38.0098327Z cvt.rn.f32.s16 %r9679, %rs408; 2026-02-21T10:19:38.0098520Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0098587Z cvt.s16.s8 %rs413, %rs330; 2026-02-21T10:19:38.0098655Z shr.s16 %rs414, %rs413, 4; 2026-02-21T10:19:38.0098716Z cvt.s16.s8 %rs415, %rs332; 2026-02-21T10:19:38.0098777Z shr.s16 %rs416, %rs415, 4; 2026-02-21T10:19:38.0098842Z shr.s16 %rs417, %rs329, 4; 2026-02-21T10:19:38.0098989Z shr.s16 %rs418, %rs331, 4; 2026-02-21T10:19:38.0099194Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0099329Z cvt.rn.f32.s16 %r9680, %rs418; 2026-02-21T10:19:38.0099398Z cvt.rn.f32.s16 %r9681, %rs417; 2026-02-21T10:19:38.0099461Z cvt.rn.f32.s16 %r9682, %rs416; 2026-02-21T10:19:38.0099524Z cvt.rn.f32.s16 %r9683, %rs414; 2026-02-21T10:19:38.0099721Z .loc 1 66 25 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0099783Z cvt.s16.s8 %rs419, %rs334; 2026-02-21T10:19:38.0099846Z shr.s16 %rs420, %rs419, 4; 2026-02-21T10:19:38.0099915Z cvt.s16.s8 %rs421, %rs336; 2026-02-21T10:19:38.0099977Z shr.s16 %rs422, %rs421, 4; 2026-02-21T10:19:38.0100037Z shr.s16 %rs423, %rs333, 4; 2026-02-21T10:19:38.0100099Z shr.s16 %rs424, %rs335, 4; 2026-02-21T10:19:38.0100295Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0100362Z cvt.rn.f32.s16 %r9684, %rs424; 2026-02-21T10:19:38.0100426Z cvt.rn.f32.s16 %r9685, %rs423; 2026-02-21T10:19:38.0100494Z cvt.rn.f32.s16 %r9686, %rs422; 2026-02-21T10:19:38.0100628Z cvt.rn.f32.s16 %r9687, %rs420; 2026-02-21T10:19:38.0100822Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0100886Z cvt.s16.s8 %rs425, %rs338; 2026-02-21T10:19:38.0101015Z shr.s16 %rs426, %rs425, 4; 2026-02-21T10:19:38.0101078Z cvt.s16.s8 %rs427, %rs340; 2026-02-21T10:19:38.0101140Z shr.s16 %rs428, %rs427, 4; 2026-02-21T10:19:38.0101206Z shr.s16 %rs429, %rs337, 4; 2026-02-21T10:19:38.0101268Z shr.s16 %rs430, %rs339, 4; 2026-02-21T10:19:38.0101461Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0101532Z cvt.rn.f32.s16 %r9688, %rs430; 2026-02-21T10:19:38.0101596Z cvt.rn.f32.s16 %r9689, %rs429; 2026-02-21T10:19:38.0101659Z cvt.rn.f32.s16 %r9690, %rs428; 2026-02-21T10:19:38.0101725Z cvt.rn.f32.s16 %r9691, %rs426; 2026-02-21T10:19:38.0101925Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0101987Z cvt.s16.s8 %rs431, %rs342; 2026-02-21T10:19:38.0102054Z shr.s16 %rs432, %rs431, 4; 2026-02-21T10:19:38.0102137Z cvt.s16.s8 %rs433, %rs344; 2026-02-21T10:19:38.0102201Z shr.s16 %rs434, %rs433, 4; 2026-02-21T10:19:38.0102263Z shr.s16 %rs435, %rs341, 4; 2026-02-21T10:19:38.0102326Z shr.s16 %rs436, %rs343, 4; 
2026-02-21T10:19:38.0102524Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0102588Z cvt.rn.f32.s16 %r9692, %rs436; 2026-02-21T10:19:38.0102653Z cvt.rn.f32.s16 %r9693, %rs435; 2026-02-21T10:19:38.0102721Z cvt.rn.f32.s16 %r9694, %rs434; 2026-02-21T10:19:38.0102784Z cvt.rn.f32.s16 %r9695, %rs432; 2026-02-21T10:19:38.0102974Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0103045Z cvt.s16.s8 %rs437, %rs346; 2026-02-21T10:19:38.0103106Z shr.s16 %rs438, %rs437, 4; 2026-02-21T10:19:38.0103171Z cvt.s16.s8 %rs439, %rs348; 2026-02-21T10:19:38.0103234Z shr.s16 %rs440, %rs439, 4; 2026-02-21T10:19:38.0103301Z shr.s16 %rs441, %rs345, 4; 2026-02-21T10:19:38.0103362Z shr.s16 %rs442, %rs347, 4; 2026-02-21T10:19:38.0103554Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0103625Z cvt.rn.f32.s16 %r9696, %rs442; 2026-02-21T10:19:38.0103690Z cvt.rn.f32.s16 %r9697, %rs441; 2026-02-21T10:19:38.0103752Z cvt.rn.f32.s16 %r9698, %rs440; 2026-02-21T10:19:38.0103823Z cvt.rn.f32.s16 %r9699, %rs438; 2026-02-21T10:19:38.0104014Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0104077Z cvt.s16.s8 %rs443, %rs350; 2026-02-21T10:19:38.0104140Z shr.s16 %rs444, %rs443, 4; 2026-02-21T10:19:38.0104270Z cvt.s16.s8 %rs445, %rs352; 2026-02-21T10:19:38.0104377Z shr.s16 %rs446, %rs445, 4; 2026-02-21T10:19:38.0104441Z shr.s16 %rs447, %rs349, 4; 2026-02-21T10:19:38.0104510Z shr.s16 %rs448, %rs351, 4; 2026-02-21T10:19:38.0104701Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0104766Z cvt.rn.f32.s16 %r9700, %rs448; 2026-02-21T10:19:38.0104832Z cvt.rn.f32.s16 %r9701, %rs447; 2026-02-21T10:19:38.0104902Z cvt.rn.f32.s16 %r9702, %rs446; 2026-02-21T10:19:38.0104967Z cvt.rn.f32.s16 %r9703, %rs444; 2026-02-21T10:19:38.0105026Z bar.sync 0; 2026-02-21T10:19:38.0105154Z 
st.shared.v4.b32 [%r27], {%r9643, %r9641, %r9642, %r9640}; 2026-02-21T10:19:38.0105281Z st.shared.v4.b32 [%r27+16384], {%r9675, %r9673, %r9674, %r9672}; 2026-02-21T10:19:38.0105388Z st.shared.v4.b32 [%r28], {%r9647, %r9645, %r9646, %r9644}; 2026-02-21T10:19:38.0105511Z st.shared.v4.b32 [%r28+16384], {%r9679, %r9677, %r9678, %r9676}; 2026-02-21T10:19:38.0105618Z st.shared.v4.b32 [%r29], {%r9651, %r9649, %r9650, %r9648}; 2026-02-21T10:19:38.0105735Z st.shared.v4.b32 [%r29+16384], {%r9683, %r9681, %r9682, %r9680}; 2026-02-21T10:19:38.0105886Z st.shared.v4.b32 [%r30], {%r9655, %r9653, %r9654, %r9652}; 2026-02-21T10:19:38.0106011Z st.shared.v4.b32 [%r30+16384], {%r9687, %r9685, %r9686, %r9684}; 2026-02-21T10:19:38.0106125Z st.shared.v4.b32 [%r31], {%r9659, %r9657, %r9658, %r9656}; 2026-02-21T10:19:38.0106288Z st.shared.v4.b32 [%r31+16384], {%r9691, %r9689, %r9690, %r9688}; 2026-02-21T10:19:38.0106401Z st.shared.v4.b32 [%r32], {%r9663, %r9661, %r9662, %r9660}; 2026-02-21T10:19:38.0106639Z st.shared.v4.b32 [%r32+16384], {%r9695, %r9693, %r9694, %r9692}; 2026-02-21T10:19:38.0106747Z st.shared.v4.b32 [%r33], {%r9667, %r9665, %r9666, %r9664}; 2026-02-21T10:19:38.0106867Z st.shared.v4.b32 [%r33+16384], {%r9699, %r9697, %r9698, %r9696}; 2026-02-21T10:19:38.0106969Z st.shared.v4.b32 [%r34], {%r9671, %r9669, %r9670, %r9668}; 2026-02-21T10:19:38.0107097Z st.shared.v4.b32 [%r34+16384], {%r9703, %r9701, %r9702, %r9700}; 2026-02-21T10:19:38.0107162Z $L__tmp3: 2026-02-21T10:19:38.0107453Z .loc 2 291 36 // standard.py:291:36 @[ cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:91:40 ] 2026-02-21T10:19:38.0107518Z // begin inline asm 2026-02-21T10:19:38.0107600Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.0107673Z // end inline asm 2026-02-21T10:19:38.0107735Z bar.sync 0; 2026-02-21T10:19:38.0107811Z wgmma.fence.sync.aligned; 2026-02-21T10:19:38.0107877Z // begin inline asm 2026-02-21T10:19:38.0109454Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r4910,%r4911,%r4912,%r4913}, %rd3, %p42, 1, 1; 2026-02-21T10:19:38.0109522Z // end inline asm 2026-02-21T10:19:38.0109589Z // begin inline asm 2026-02-21T10:19:38.0111076Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r5042,%r5043,%r5044,%r5045}, %rd4, %p42, 1, 1; 2026-02-21T10:19:38.0111141Z // end inline asm 2026-02-21T10:19:38.0111294Z // begin inline asm 2026-02-21T10:19:38.0112780Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, 
{%r5174,%r5175,%r5176,%r5177}, %rd5, %p42, 1, 1; 2026-02-21T10:19:38.0112905Z // end inline asm 2026-02-21T10:19:38.0112966Z // begin inline asm 2026-02-21T10:19:38.0114495Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r5306,%r5307,%r5308,%r5309}, %rd6, %p42, 1, 1; 2026-02-21T10:19:38.0114567Z // end inline asm 2026-02-21T10:19:38.0114710Z // begin inline asm 2026-02-21T10:19:38.0116190Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r5438,%r5439,%r5440,%r5441}, %rd7, %p42, 1, 1; 2026-02-21T10:19:38.0116256Z // end inline asm 2026-02-21T10:19:38.0116317Z // begin inline asm 2026-02-21T10:19:38.0117950Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r5570,%r5571,%r5572,%r5573}, %rd8, %p42, 1, 1; 2026-02-21T10:19:38.0118014Z // end inline asm 2026-02-21T10:19:38.0118075Z // begin inline asm 2026-02-21T10:19:38.0119559Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r5702,%r5703,%r5704,%r5705}, %rd9, %p42, 1, 1; 2026-02-21T10:19:38.0119623Z // end inline asm 2026-02-21T10:19:38.0119688Z // begin inline asm 2026-02-21T10:19:38.0121163Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, 
{%r5834,%r5835,%r5836,%r5837}, %rd10, %p42, 1, 1; 2026-02-21T10:19:38.0121353Z // end inline asm 2026-02-21T10:19:38.0121428Z // begin inline asm 2026-02-21T10:19:38.0122904Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r5966,%r5967,%r5968,%r5969}, %rd3, %p42, 1, 1; 2026-02-21T10:19:38.0122969Z // end inline asm 2026-02-21T10:19:38.0123031Z // begin inline asm 2026-02-21T10:19:38.0124626Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r6098,%r6099,%r6100,%r6101}, %rd4, %p42, 1, 1; 2026-02-21T10:19:38.0124692Z // end inline asm 2026-02-21T10:19:38.0124752Z // begin inline asm 2026-02-21T10:19:38.0126228Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r6230,%r6231,%r6232,%r6233}, %rd5, %p42, 1, 1; 2026-02-21T10:19:38.0126295Z // end inline asm 2026-02-21T10:19:38.0126354Z // begin inline asm 2026-02-21T10:19:38.0127969Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r6362,%r6363,%r6364,%r6365}, %rd6, %p42, 1, 1; 2026-02-21T10:19:38.0128036Z // end inline asm 2026-02-21T10:19:38.0128096Z // begin inline asm 2026-02-21T10:19:38.0129747Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, 
{%r6494,%r6495,%r6496,%r6497}, %rd7, %p42, 1, 1; 2026-02-21T10:19:38.0129812Z // end inline asm 2026-02-21T10:19:38.0129976Z // begin inline asm 2026-02-21T10:19:38.0131455Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r6626,%r6627,%r6628,%r6629}, %rd8, %p42, 1, 1; 2026-02-21T10:19:38.0131579Z // end inline asm 2026-02-21T10:19:38.0131640Z // begin inline asm 2026-02-21T10:19:38.0133164Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r6758,%r6759,%r6760,%r6761}, %rd9, %p42, 1, 1; 2026-02-21T10:19:38.0133230Z // end inline asm 2026-02-21T10:19:38.0133343Z // begin inline asm 2026-02-21T10:19:38.0134829Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r6890,%r6891,%r6892,%r6893}, %rd10, %p42, 1, 1; 2026-02-21T10:19:38.0134893Z // end inline asm 2026-02-21T10:19:38.0134974Z wgmma.commit_group.sync.aligned; 2026-02-21T10:19:38.0135036Z mov.b32 %r7023, %r9439; 2026-02-21T10:19:38.0135096Z mov.b32 %r7024, %r9439; 2026-02-21T10:19:38.0135155Z mov.b32 %r7022, %r39931; 2026-02-21T10:19:38.0135213Z // begin inline asm 2026-02-21T10:19:38.0137874Z // wait for regs: %r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r7022,%r7023,%r7024 
2026-02-21T10:19:38.0137961Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:19:38.0138017Z // end inline asm 2026-02-21T10:19:38.0138077Z $L__tmp4: 2026-02-21T10:19:38.0138293Z .loc 1 55 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:32 2026-02-21T10:19:38.0138360Z add.s64 %rd248, %rd166, 256; 2026-02-21T10:19:38.0138426Z add.s64 %rd251, %rd169, 256; 2026-02-21T10:19:38.0138487Z add.s64 %rd254, %rd172, 256; 2026-02-21T10:19:38.0138657Z add.s64 %rd257, %rd175, 256; 2026-02-21T10:19:38.0138719Z add.s64 %rd260, %rd178, 256; 2026-02-21T10:19:38.0138848Z add.s64 %rd263, %rd181, 256; 2026-02-21T10:19:38.0138920Z add.s64 %rd266, %rd184, 256; 2026-02-21T10:19:38.0139000Z mad.wide.s32 %rd269, %r42468, 2, %rd117; 2026-02-21T10:19:38.0139209Z .loc 1 55 80 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:80 2026-02-21T10:19:38.0139271Z // begin inline asm 2026-02-21T10:19:38.0139332Z mov.u64 %rd247, 0x0; 2026-02-21T10:19:38.0139469Z createpolicy.fractional.L2::evict_first.b64 %rd247, 1.0; 2026-02-21T10:19:38.0139532Z // end inline asm 2026-02-21T10:19:38.0139591Z // begin inline asm 2026-02-21T10:19:38.0139652Z mov.u32 %r7156, 0x0; 2026-02-21T10:19:38.0139717Z mov.u32 %r7157, 0x0; 2026-02-21T10:19:38.0139776Z mov.u32 %r7158, 0x0; 2026-02-21T10:19:38.0139834Z mov.u32 %r7159, 0x0; 2026-02-21T10:19:38.0140073Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r7156, %r7157, %r7158, %r7159 }, [ %rd248 + 0 ], %rd247; 2026-02-21T10:19:38.0140134Z // end inline asm 2026-02-21T10:19:38.0140195Z // begin inline asm 2026-02-21T10:19:38.0140261Z mov.u64 %rd250, 0x0; 2026-02-21T10:19:38.0140462Z createpolicy.fractional.L2::evict_first.b64 %rd250, 1.0; 2026-02-21T10:19:38.0140526Z // end inline asm 2026-02-21T10:19:38.0140587Z // begin inline asm 2026-02-21T10:19:38.0140651Z mov.u32 %r7160, 0x0; 2026-02-21T10:19:38.0140709Z mov.u32 %r7161, 0x0; 2026-02-21T10:19:38.0140825Z mov.u32 %r7162, 0x0; 2026-02-21T10:19:38.0140889Z mov.u32 %r7163, 0x0; 
2026-02-21T10:19:38.0141113Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r7160, %r7161, %r7162, %r7163 }, [ %rd251 + 0 ], %rd250; 2026-02-21T10:19:38.0141171Z // end inline asm 2026-02-21T10:19:38.0141234Z // begin inline asm 2026-02-21T10:19:38.0141293Z mov.u64 %rd253, 0x0; 2026-02-21T10:19:38.0141413Z createpolicy.fractional.L2::evict_first.b64 %rd253, 1.0; 2026-02-21T10:19:38.0141471Z // end inline asm 2026-02-21T10:19:38.0141539Z // begin inline asm 2026-02-21T10:19:38.0141600Z mov.u32 %r7164, 0x0; 2026-02-21T10:19:38.0141658Z mov.u32 %r7165, 0x0; 2026-02-21T10:19:38.0141718Z mov.u32 %r7166, 0x0; 2026-02-21T10:19:38.0141778Z mov.u32 %r7167, 0x0; 2026-02-21T10:19:38.0142000Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r7164, %r7165, %r7166, %r7167 }, [ %rd254 + 0 ], %rd253; 2026-02-21T10:19:38.0142057Z // end inline asm 2026-02-21T10:19:38.0142122Z // begin inline asm 2026-02-21T10:19:38.0142182Z mov.u64 %rd256, 0x0; 2026-02-21T10:19:38.0142309Z createpolicy.fractional.L2::evict_first.b64 %rd256, 1.0; 2026-02-21T10:19:38.0142372Z // end inline asm 2026-02-21T10:19:38.0142430Z // begin inline asm 2026-02-21T10:19:38.0142488Z mov.u32 %r7168, 0x0; 2026-02-21T10:19:38.0142551Z mov.u32 %r7169, 0x0; 2026-02-21T10:19:38.0142607Z mov.u32 %r7170, 0x0; 2026-02-21T10:19:38.0142664Z mov.u32 %r7171, 0x0; 2026-02-21T10:19:38.0142879Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r7168, %r7169, %r7170, %r7171 }, [ %rd257 + 0 ], %rd256; 2026-02-21T10:19:38.0142953Z // end inline asm 2026-02-21T10:19:38.0143015Z // begin inline asm 2026-02-21T10:19:38.0143075Z mov.u64 %rd259, 0x0; 2026-02-21T10:19:38.0143201Z createpolicy.fractional.L2::evict_first.b64 %rd259, 1.0; 2026-02-21T10:19:38.0143260Z // end inline asm 2026-02-21T10:19:38.0143318Z // begin inline asm 2026-02-21T10:19:38.0143374Z mov.u32 %r7172, 0x0; 2026-02-21T10:19:38.0143434Z mov.u32 %r7173, 0x0; 2026-02-21T10:19:38.0143495Z mov.u32 %r7174, 0x0; 2026-02-21T10:19:38.0143554Z mov.u32 %r7175, 0x0; 
2026-02-21T10:19:38.0143778Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r7172, %r7173, %r7174, %r7175 }, [ %rd260 + 0 ], %rd259; 2026-02-21T10:19:38.0143836Z // end inline asm 2026-02-21T10:19:38.0143895Z // begin inline asm 2026-02-21T10:19:38.0143957Z mov.u64 %rd262, 0x0; 2026-02-21T10:19:38.0144075Z createpolicy.fractional.L2::evict_first.b64 %rd262, 1.0; 2026-02-21T10:19:38.0144133Z // end inline asm 2026-02-21T10:19:38.0144192Z // begin inline asm 2026-02-21T10:19:38.0144312Z mov.u32 %r7176, 0x0; 2026-02-21T10:19:38.0144389Z mov.u32 %r7177, 0x0; 2026-02-21T10:19:38.0144498Z mov.u32 %r7178, 0x0; 2026-02-21T10:19:38.0144560Z mov.u32 %r7179, 0x0; 2026-02-21T10:19:38.0144777Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r7176, %r7177, %r7178, %r7179 }, [ %rd263 + 0 ], %rd262; 2026-02-21T10:19:38.0144834Z // end inline asm 2026-02-21T10:19:38.0144898Z // begin inline asm 2026-02-21T10:19:38.0144956Z mov.u64 %rd265, 0x0; 2026-02-21T10:19:38.0145075Z createpolicy.fractional.L2::evict_first.b64 %rd265, 1.0; 2026-02-21T10:19:38.0145132Z // end inline asm 2026-02-21T10:19:38.0145195Z // begin inline asm 2026-02-21T10:19:38.0145253Z mov.u32 %r7180, 0x0; 2026-02-21T10:19:38.0145313Z mov.u32 %r7181, 0x0; 2026-02-21T10:19:38.0145374Z mov.u32 %r7182, 0x0; 2026-02-21T10:19:38.0145441Z mov.u32 %r7183, 0x0; 2026-02-21T10:19:38.0145657Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r7180, %r7181, %r7182, %r7183 }, [ %rd266 + 0 ], %rd265; 2026-02-21T10:19:38.0145722Z // end inline asm 2026-02-21T10:19:38.0145782Z // begin inline asm 2026-02-21T10:19:38.0145843Z mov.u64 %rd268, 0x0; 2026-02-21T10:19:38.0146029Z createpolicy.fractional.L2::evict_first.b64 %rd268, 1.0; 2026-02-21T10:19:38.0146094Z // end inline asm 2026-02-21T10:19:38.0146153Z // begin inline asm 2026-02-21T10:19:38.0146211Z mov.u32 %r7184, 0x0; 2026-02-21T10:19:38.0146272Z mov.u32 %r7185, 0x0; 2026-02-21T10:19:38.0146328Z mov.u32 %r7186, 0x0; 2026-02-21T10:19:38.0146434Z mov.u32 %r7187, 0x0; 
2026-02-21T10:19:38.0146785Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r7184, %r7185, %r7186, %r7187 }, [ %rd269 + 0 ], %rd268; 2026-02-21T10:19:38.0146852Z // end inline asm 2026-02-21T10:19:38.0147071Z .loc 1 59 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:59:32 2026-02-21T10:19:38.0147131Z bar.sync 0; 2026-02-21T10:19:38.0147222Z st.shared.v2.b32 [%r9], {%r7156, %r7157}; 2026-02-21T10:19:38.0147309Z st.shared.v2.b32 [%r9+2048], {%r7160, %r7161}; 2026-02-21T10:19:38.0147396Z st.shared.v2.b32 [%r9+4096], {%r7164, %r7165}; 2026-02-21T10:19:38.0147483Z st.shared.v2.b32 [%r9+6144], {%r7168, %r7169}; 2026-02-21T10:19:38.0147569Z st.shared.v2.b32 [%r9+8192], {%r7172, %r7173}; 2026-02-21T10:19:38.0147654Z st.shared.v2.b32 [%r9+10240], {%r7176, %r7177}; 2026-02-21T10:19:38.0147735Z st.shared.v2.b32 [%r9+12288], {%r7180, %r7181}; 2026-02-21T10:19:38.0147818Z st.shared.v2.b32 [%r9+14336], {%r7184, %r7185}; 2026-02-21T10:19:38.0147899Z st.shared.v2.b32 [%r10], {%r7158, %r7159}; 2026-02-21T10:19:38.0147992Z st.shared.v2.b32 [%r10+2048], {%r7162, %r7163}; 2026-02-21T10:19:38.0148081Z st.shared.v2.b32 [%r10+4096], {%r7166, %r7167}; 2026-02-21T10:19:38.0148165Z st.shared.v2.b32 [%r10+6144], {%r7170, %r7171}; 2026-02-21T10:19:38.0148245Z st.shared.v2.b32 [%r10+8192], {%r7174, %r7175}; 2026-02-21T10:19:38.0148340Z st.shared.v2.b32 [%r10+10240], {%r7178, %r7179}; 2026-02-21T10:19:38.0148504Z st.shared.v2.b32 [%r10+12288], {%r7182, %r7183}; 2026-02-21T10:19:38.0148592Z st.shared.v2.b32 [%r10+14336], {%r7186, %r7187}; 2026-02-21T10:19:38.0148655Z bar.sync 0; 2026-02-21T10:19:38.0148731Z ld.shared.b16 %rs449, [%r11]; 2026-02-21T10:19:38.0148805Z ld.shared.b16 %rs450, [%r11+1024]; 2026-02-21T10:19:38.0148875Z ld.shared.b16 %rs451, [%r11+64]; 2026-02-21T10:19:38.0148948Z ld.shared.b16 %rs452, [%r11+1088]; 2026-02-21T10:19:38.0149015Z ld.shared.b16 %rs453, [%r11+8192]; 2026-02-21T10:19:38.0149082Z ld.shared.b16 %rs454, [%r11+9216]; 2026-02-21T10:19:38.0149148Z 
ld.shared.b16 %rs455, [%r11+8256]; 2026-02-21T10:19:38.0149221Z ld.shared.b16 %rs456, [%r11+9280]; 2026-02-21T10:19:38.0149287Z ld.shared.b16 %rs457, [%r12]; 2026-02-21T10:19:38.0149352Z ld.shared.b16 %rs458, [%r12+1024]; 2026-02-21T10:19:38.0149424Z ld.shared.b16 %rs459, [%r12+64]; 2026-02-21T10:19:38.0149489Z ld.shared.b16 %rs460, [%r12+1088]; 2026-02-21T10:19:38.0149554Z ld.shared.b16 %rs461, [%r12+8192]; 2026-02-21T10:19:38.0149624Z ld.shared.b16 %rs462, [%r12+9216]; 2026-02-21T10:19:38.0149774Z ld.shared.b16 %rs463, [%r12+8256]; 2026-02-21T10:19:38.0149898Z ld.shared.b16 %rs464, [%r12+9280]; 2026-02-21T10:19:38.0149964Z ld.shared.b16 %rs465, [%r13]; 2026-02-21T10:19:38.0150034Z ld.shared.b16 %rs466, [%r13+1024]; 2026-02-21T10:19:38.0150101Z ld.shared.b16 %rs467, [%r13+64]; 2026-02-21T10:19:38.0150178Z ld.shared.b16 %rs468, [%r13+1088]; 2026-02-21T10:19:38.0150248Z ld.shared.b16 %rs469, [%r13+8192]; 2026-02-21T10:19:38.0150315Z ld.shared.b16 %rs470, [%r13+9216]; 2026-02-21T10:19:38.0150383Z ld.shared.b16 %rs471, [%r13+8256]; 2026-02-21T10:19:38.0150447Z ld.shared.b16 %rs472, [%r13+9280]; 2026-02-21T10:19:38.0150516Z ld.shared.b16 %rs473, [%r14]; 2026-02-21T10:19:38.0150581Z ld.shared.b16 %rs474, [%r14+1024]; 2026-02-21T10:19:38.0150645Z ld.shared.b16 %rs475, [%r14+64]; 2026-02-21T10:19:38.0150717Z ld.shared.b16 %rs476, [%r14+1088]; 2026-02-21T10:19:38.0150785Z ld.shared.b16 %rs477, [%r14+8192]; 2026-02-21T10:19:38.0150851Z ld.shared.b16 %rs478, [%r14+9216]; 2026-02-21T10:19:38.0150924Z ld.shared.b16 %rs479, [%r14+8256]; 2026-02-21T10:19:38.0150994Z ld.shared.b16 %rs480, [%r14+9280]; 2026-02-21T10:19:38.0151128Z ld.shared.b16 %rs481, [%r15]; 2026-02-21T10:19:38.0151197Z ld.shared.b16 %rs482, [%r15+1024]; 2026-02-21T10:19:38.0151268Z ld.shared.b16 %rs483, [%r15+64]; 2026-02-21T10:19:38.0151336Z ld.shared.b16 %rs484, [%r15+1088]; 2026-02-21T10:19:38.0151399Z ld.shared.b16 %rs485, [%r15+8192]; 2026-02-21T10:19:38.0151528Z ld.shared.b16 %rs486, [%r15+9216]; 
2026-02-21T10:19:38.0151596Z ld.shared.b16 %rs487, [%r15+8256]; 2026-02-21T10:19:38.0151660Z ld.shared.b16 %rs488, [%r15+9280]; 2026-02-21T10:19:38.0151721Z ld.shared.b16 %rs489, [%r16]; 2026-02-21T10:19:38.0151789Z ld.shared.b16 %rs490, [%r16+1024]; 2026-02-21T10:19:38.0151852Z ld.shared.b16 %rs491, [%r16+64]; 2026-02-21T10:19:38.0151917Z ld.shared.b16 %rs492, [%r16+1088]; 2026-02-21T10:19:38.0151986Z ld.shared.b16 %rs493, [%r16+8192]; 2026-02-21T10:19:38.0152053Z ld.shared.b16 %rs494, [%r16+9216]; 2026-02-21T10:19:38.0152118Z ld.shared.b16 %rs495, [%r16+8256]; 2026-02-21T10:19:38.0152185Z ld.shared.b16 %rs496, [%r16+9280]; 2026-02-21T10:19:38.0152257Z ld.shared.b16 %rs497, [%r17]; 2026-02-21T10:19:38.0152322Z ld.shared.b16 %rs498, [%r17+1024]; 2026-02-21T10:19:38.0152387Z ld.shared.b16 %rs499, [%r17+64]; 2026-02-21T10:19:38.0152458Z ld.shared.b16 %rs500, [%r17+1088]; 2026-02-21T10:19:38.0152524Z ld.shared.b16 %rs501, [%r17+8192]; 2026-02-21T10:19:38.0152588Z ld.shared.b16 %rs502, [%r17+9216]; 2026-02-21T10:19:38.0152659Z ld.shared.b16 %rs503, [%r17+8256]; 2026-02-21T10:19:38.0152724Z ld.shared.b16 %rs504, [%r17+9280]; 2026-02-21T10:19:38.0152789Z ld.shared.b16 %rs505, [%r18]; 2026-02-21T10:19:38.0152853Z ld.shared.b16 %rs506, [%r18+1024]; 2026-02-21T10:19:38.0152933Z ld.shared.b16 %rs507, [%r18+64]; 2026-02-21T10:19:38.0153004Z ld.shared.b16 %rs508, [%r18+1088]; 2026-02-21T10:19:38.0153069Z ld.shared.b16 %rs509, [%r18+8192]; 2026-02-21T10:19:38.0153141Z ld.shared.b16 %rs510, [%r18+9216]; 2026-02-21T10:19:38.0153209Z ld.shared.b16 %rs511, [%r18+8256]; 2026-02-21T10:19:38.0153276Z ld.shared.b16 %rs512, [%r18+9280]; 2026-02-21T10:19:38.0153345Z cvt.f32.bf16 %r7325, %rs449; 2026-02-21T10:19:38.0153413Z cvt.f32.bf16 %r7326, %rs450; 2026-02-21T10:19:38.0153475Z cvt.f32.bf16 %r7327, %rs457; 2026-02-21T10:19:38.0153538Z cvt.f32.bf16 %r7328, %rs458; 2026-02-21T10:19:38.0153605Z cvt.f32.bf16 %r7457, %rs465; 2026-02-21T10:19:38.0153668Z cvt.f32.bf16 %r7458, %rs466; 
2026-02-21T10:19:38.0153729Z cvt.f32.bf16 %r7459, %rs473; 2026-02-21T10:19:38.0153789Z cvt.f32.bf16 %r7460, %rs474; 2026-02-21T10:19:38.0153858Z cvt.f32.bf16 %r7589, %rs481; 2026-02-21T10:19:38.0153918Z cvt.f32.bf16 %r7590, %rs482; 2026-02-21T10:19:38.0153979Z cvt.f32.bf16 %r7591, %rs489; 2026-02-21T10:19:38.0154048Z cvt.f32.bf16 %r7592, %rs490; 2026-02-21T10:19:38.0154110Z cvt.f32.bf16 %r7721, %rs497; 2026-02-21T10:19:38.0154171Z cvt.f32.bf16 %r7722, %rs498; 2026-02-21T10:19:38.0154298Z cvt.f32.bf16 %r7723, %rs505; 2026-02-21T10:19:38.0154406Z cvt.f32.bf16 %r7724, %rs506; 2026-02-21T10:19:38.0154468Z cvt.f32.bf16 %r7853, %rs451; 2026-02-21T10:19:38.0154540Z cvt.f32.bf16 %r7854, %rs452; 2026-02-21T10:19:38.0154606Z cvt.f32.bf16 %r7855, %rs459; 2026-02-21T10:19:38.0154682Z cvt.f32.bf16 %r7856, %rs460; 2026-02-21T10:19:38.0154747Z cvt.f32.bf16 %r7985, %rs467; 2026-02-21T10:19:38.0154813Z cvt.f32.bf16 %r7986, %rs468; 2026-02-21T10:19:38.0154876Z cvt.f32.bf16 %r7987, %rs475; 2026-02-21T10:19:38.0154940Z cvt.f32.bf16 %r7988, %rs476; 2026-02-21T10:19:38.0155002Z cvt.f32.bf16 %r8117, %rs483; 2026-02-21T10:19:38.0155074Z cvt.f32.bf16 %r8118, %rs484; 2026-02-21T10:19:38.0155137Z cvt.f32.bf16 %r8119, %rs491; 2026-02-21T10:19:38.0155200Z cvt.f32.bf16 %r8120, %rs492; 2026-02-21T10:19:38.0155268Z cvt.f32.bf16 %r8249, %rs499; 2026-02-21T10:19:38.0155331Z cvt.f32.bf16 %r8250, %rs500; 2026-02-21T10:19:38.0155392Z cvt.f32.bf16 %r8251, %rs507; 2026-02-21T10:19:38.0155453Z cvt.f32.bf16 %r8252, %rs508; 2026-02-21T10:19:38.0155523Z cvt.f32.bf16 %r8381, %rs453; 2026-02-21T10:19:38.0155592Z cvt.f32.bf16 %r8382, %rs454; 2026-02-21T10:19:38.0155707Z cvt.f32.bf16 %r8383, %rs461; 2026-02-21T10:19:38.0155777Z cvt.f32.bf16 %r8384, %rs462; 2026-02-21T10:19:38.0155838Z cvt.f32.bf16 %r8513, %rs469; 2026-02-21T10:19:38.0155900Z cvt.f32.bf16 %r8514, %rs470; 2026-02-21T10:19:38.0155962Z cvt.f32.bf16 %r8515, %rs477; 2026-02-21T10:19:38.0156073Z cvt.f32.bf16 %r8516, %rs478; 
2026-02-21T10:19:38.0156138Z cvt.f32.bf16 %r8645, %rs485; 2026-02-21T10:19:38.0156199Z cvt.f32.bf16 %r8646, %rs486; 2026-02-21T10:19:38.0156265Z cvt.f32.bf16 %r8647, %rs493; 2026-02-21T10:19:38.0156326Z cvt.f32.bf16 %r8648, %rs494; 2026-02-21T10:19:38.0156387Z cvt.f32.bf16 %r8777, %rs501; 2026-02-21T10:19:38.0156582Z cvt.f32.bf16 %r8778, %rs502; 2026-02-21T10:19:38.0156650Z cvt.f32.bf16 %r8779, %rs509; 2026-02-21T10:19:38.0156714Z cvt.f32.bf16 %r8780, %rs510; 2026-02-21T10:19:38.0156778Z cvt.f32.bf16 %r8909, %rs455; 2026-02-21T10:19:38.0156846Z cvt.f32.bf16 %r8910, %rs456; 2026-02-21T10:19:38.0156908Z cvt.f32.bf16 %r8911, %rs463; 2026-02-21T10:19:38.0156972Z cvt.f32.bf16 %r8912, %rs464; 2026-02-21T10:19:38.0157038Z cvt.f32.bf16 %r9041, %rs471; 2026-02-21T10:19:38.0157111Z cvt.f32.bf16 %r9042, %rs472; 2026-02-21T10:19:38.0157173Z cvt.f32.bf16 %r9043, %rs479; 2026-02-21T10:19:38.0157235Z cvt.f32.bf16 %r9044, %rs480; 2026-02-21T10:19:38.0157302Z cvt.f32.bf16 %r9173, %rs487; 2026-02-21T10:19:38.0157363Z cvt.f32.bf16 %r9174, %rs488; 2026-02-21T10:19:38.0157424Z cvt.f32.bf16 %r9175, %rs495; 2026-02-21T10:19:38.0157491Z cvt.f32.bf16 %r9176, %rs496; 2026-02-21T10:19:38.0157553Z cvt.f32.bf16 %r9305, %rs503; 2026-02-21T10:19:38.0157613Z cvt.f32.bf16 %r9306, %rs504; 2026-02-21T10:19:38.0157675Z cvt.f32.bf16 %r9307, %rs511; 2026-02-21T10:19:38.0157745Z cvt.f32.bf16 %r9308, %rs512; 2026-02-21T10:19:38.0157955Z .loc 1 61 33 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:61:33 2026-02-21T10:19:38.0158015Z bar.sync 0; 2026-02-21T10:19:38.0158079Z // begin inline asm 2026-02-21T10:19:38.0158185Z @%p313 mbarrier.init.shared::cta.b64 [%r29846], 1; 2026-02-21T10:19:38.0158245Z // end inline asm 2026-02-21T10:19:38.0158304Z bar.sync 0; 2026-02-21T10:19:38.0158365Z // begin inline asm 2026-02-21T10:19:38.0158498Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r29846], 4096; 2026-02-21T10:19:38.0158557Z // end inline asm 2026-02-21T10:19:38.0158621Z // begin inline asm 
2026-02-21T10:19:38.0158700Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.0158756Z // end inline asm 2026-02-21T10:19:38.0158818Z bar.sync 0; 2026-02-21T10:19:38.0158896Z elect.sync %r9704|%p101, -1; 2026-02-21T10:19:38.0158969Z and.pred %p80, %p1, %p101; 2026-02-21T10:19:38.0159032Z add.s32 %r7192, %r9639, 160; 2026-02-21T10:19:38.0159096Z // begin inline asm 2026-02-21T10:19:38.0159433Z @%p80 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39931], [%rd822, {%r9804, %r7192}], [%r29846]; 2026-02-21T10:19:38.0159633Z // end inline asm 2026-02-21T10:19:38.0159694Z bar.sync 0; 2026-02-21T10:19:38.0159754Z // begin inline asm 2026-02-21T10:19:38.0159808Z 2026-02-21T10:19:38.0159862Z { 2026-02-21T10:19:38.0159928Z .reg .pred complete; 2026-02-21T10:19:38.0159984Z waitLoop: 2026-02-21T10:19:38.0160128Z mbarrier.try_wait.parity.shared.b64 complete, [%r29846], %r9439; 2026-02-21T10:19:38.0160205Z @!complete bra.uni waitLoop; 2026-02-21T10:19:38.0160260Z } 2026-02-21T10:19:38.0160265Z 2026-02-21T10:19:38.0160324Z // end inline asm 2026-02-21T10:19:38.0160386Z bar.sync 0; 2026-02-21T10:19:38.0160450Z // begin inline asm 2026-02-21T10:19:38.0160557Z @%p313 mbarrier.inval.shared::cta.b64 [%r29846]; 2026-02-21T10:19:38.0160617Z // end inline asm 2026-02-21T10:19:38.0160828Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0160900Z ld.shared.s8 %rs513, [%r19]; 2026-02-21T10:19:38.0161098Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0161244Z shl.b16 %rs514, %rs513, 4; 2026-02-21T10:19:38.0161440Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0161513Z ld.shared.s8 %rs515, [%r20+128]; 2026-02-21T10:19:38.0161771Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0161839Z shl.b16 %rs516, %rs515, 4; 2026-02-21T10:19:38.0162032Z .loc 1 66 25 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0162105Z ld.shared.s8 %rs517, [%r21+256]; 2026-02-21T10:19:38.0162296Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0162357Z shl.b16 %rs518, %rs517, 4; 2026-02-21T10:19:38.0162552Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0162626Z ld.shared.s8 %rs519, [%r22+384]; 2026-02-21T10:19:38.0162820Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0162884Z shl.b16 %rs520, %rs519, 4; 2026-02-21T10:19:38.0163082Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0163147Z ld.shared.s8 %rs521, [%r23+512]; 2026-02-21T10:19:38.0163337Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0163404Z shl.b16 %rs522, %rs521, 4; 2026-02-21T10:19:38.0163597Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0163665Z ld.shared.s8 %rs523, [%r24+640]; 2026-02-21T10:19:38.0163860Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0163926Z shl.b16 %rs524, %rs523, 4; 2026-02-21T10:19:38.0164117Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0164181Z ld.shared.s8 %rs525, [%r25+768]; 2026-02-21T10:19:38.0164377Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0164453Z shl.b16 %rs526, %rs525, 4; 2026-02-21T10:19:38.0164649Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0164719Z ld.shared.s8 %rs527, [%r26+896]; 2026-02-21T10:19:38.0164912Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0164975Z shl.b16 %rs528, %rs527, 4; 2026-02-21T10:19:38.0165175Z .loc 1 66 25 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0165300Z ld.shared.s8 %rs529, [%r19+1024]; 2026-02-21T10:19:38.0165564Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0165634Z shl.b16 %rs530, %rs529, 4; 2026-02-21T10:19:38.0165825Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0165894Z ld.shared.s8 %rs531, [%r20+1152]; 2026-02-21T10:19:38.0166088Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0166158Z shl.b16 %rs532, %rs531, 4; 2026-02-21T10:19:38.0166349Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0166415Z ld.shared.s8 %rs533, [%r21+1280]; 2026-02-21T10:19:38.0166751Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0166818Z shl.b16 %rs534, %rs533, 4; 2026-02-21T10:19:38.0167026Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0167173Z ld.shared.s8 %rs535, [%r22+1408]; 2026-02-21T10:19:38.0167369Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0167431Z shl.b16 %rs536, %rs535, 4; 2026-02-21T10:19:38.0167684Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0167752Z ld.shared.s8 %rs537, [%r23+1536]; 2026-02-21T10:19:38.0167945Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0168005Z shl.b16 %rs538, %rs537, 4; 2026-02-21T10:19:38.0168199Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0168264Z ld.shared.s8 %rs539, [%r24+1664]; 2026-02-21T10:19:38.0168454Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0168526Z shl.b16 %rs540, %rs539, 4; 2026-02-21T10:19:38.0168730Z .loc 1 66 25 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0168798Z ld.shared.s8 %rs541, [%r25+1792]; 2026-02-21T10:19:38.0168994Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0169056Z shl.b16 %rs542, %rs541, 4; 2026-02-21T10:19:38.0169249Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0169317Z ld.shared.s8 %rs543, [%r26+1920]; 2026-02-21T10:19:38.0169508Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0169570Z shl.b16 %rs544, %rs543, 4; 2026-02-21T10:19:38.0169762Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0169834Z ld.shared.s8 %rs545, [%r19+2048]; 2026-02-21T10:19:38.0170027Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0170091Z shl.b16 %rs546, %rs545, 4; 2026-02-21T10:19:38.0170287Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0170353Z ld.shared.s8 %rs547, [%r20+2176]; 2026-02-21T10:19:38.0170544Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0170610Z shl.b16 %rs548, %rs547, 4; 2026-02-21T10:19:38.0170801Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0170867Z ld.shared.s8 %rs549, [%r21+2304]; 2026-02-21T10:19:38.0171062Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0171211Z shl.b16 %rs550, %rs549, 4; 2026-02-21T10:19:38.0171403Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0171531Z ld.shared.s8 %rs551, [%r22+2432]; 2026-02-21T10:19:38.0171725Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0171787Z shl.b16 %rs552, %rs551, 4; 2026-02-21T10:19:38.0171978Z .loc 1 66 25 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0172048Z ld.shared.s8 %rs553, [%r23+2560]; 2026-02-21T10:19:38.0172239Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0172301Z shl.b16 %rs554, %rs553, 4; 2026-02-21T10:19:38.0172495Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0172561Z ld.shared.s8 %rs555, [%r24+2688]; 2026-02-21T10:19:38.0172755Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0172822Z shl.b16 %rs556, %rs555, 4; 2026-02-21T10:19:38.0173058Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0173126Z ld.shared.s8 %rs557, [%r25+2816]; 2026-02-21T10:19:38.0173364Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0173428Z shl.b16 %rs558, %rs557, 4; 2026-02-21T10:19:38.0173619Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0173684Z ld.shared.s8 %rs559, [%r26+2944]; 2026-02-21T10:19:38.0173881Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0173945Z shl.b16 %rs560, %rs559, 4; 2026-02-21T10:19:38.0174136Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0174211Z ld.shared.s8 %rs561, [%r19+3072]; 2026-02-21T10:19:38.0174413Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0174477Z shl.b16 %rs562, %rs561, 4; 2026-02-21T10:19:38.0174671Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0174739Z ld.shared.s8 %rs563, [%r20+3200]; 2026-02-21T10:19:38.0174930Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0174998Z shl.b16 %rs564, %rs563, 4; 2026-02-21T10:19:38.0175192Z .loc 1 66 25 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0175258Z ld.shared.s8 %rs565, [%r21+3328]; 2026-02-21T10:19:38.0175453Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0175522Z shl.b16 %rs566, %rs565, 4; 2026-02-21T10:19:38.0175712Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0175793Z ld.shared.s8 %rs567, [%r22+3456]; 2026-02-21T10:19:38.0175996Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0176059Z shl.b16 %rs568, %rs567, 4; 2026-02-21T10:19:38.0176250Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0176321Z ld.shared.s8 %rs569, [%r23+3584]; 2026-02-21T10:19:38.0176644Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0176710Z shl.b16 %rs570, %rs569, 4; 2026-02-21T10:19:38.0176918Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0176986Z ld.shared.s8 %rs571, [%r24+3712]; 2026-02-21T10:19:38.0177260Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0177390Z shl.b16 %rs572, %rs571, 4; 2026-02-21T10:19:38.0177590Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0177659Z ld.shared.s8 %rs573, [%r25+3840]; 2026-02-21T10:19:38.0177852Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0177919Z shl.b16 %rs574, %rs573, 4; 2026-02-21T10:19:38.0178114Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0178181Z ld.shared.s8 %rs575, [%r26+3968]; 2026-02-21T10:19:38.0178377Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0178440Z shl.b16 %rs576, %rs575, 4; 2026-02-21T10:19:38.0178631Z .loc 1 66 25 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0178703Z cvt.s16.s8 %rs577, %rs514; 2026-02-21T10:19:38.0178764Z shr.s16 %rs578, %rs577, 4; 2026-02-21T10:19:38.0178890Z cvt.s16.s8 %rs579, %rs516; 2026-02-21T10:19:38.0178954Z shr.s16 %rs580, %rs579, 4; 2026-02-21T10:19:38.0179020Z shr.s16 %rs581, %rs513, 4; 2026-02-21T10:19:38.0179081Z shr.s16 %rs582, %rs515, 4; 2026-02-21T10:19:38.0179328Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0179413Z cvt.rn.f32.s16 %r9705, %rs582; 2026-02-21T10:19:38.0179481Z cvt.rn.f32.s16 %r9706, %rs581; 2026-02-21T10:19:38.0179545Z cvt.rn.f32.s16 %r9707, %rs580; 2026-02-21T10:19:38.0179608Z cvt.rn.f32.s16 %r9708, %rs578; 2026-02-21T10:19:38.0179808Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0179872Z cvt.s16.s8 %rs583, %rs518; 2026-02-21T10:19:38.0179934Z shr.s16 %rs584, %rs583, 4; 2026-02-21T10:19:38.0180002Z cvt.s16.s8 %rs585, %rs520; 2026-02-21T10:19:38.0180063Z shr.s16 %rs586, %rs585, 4; 2026-02-21T10:19:38.0180123Z shr.s16 %rs587, %rs517, 4; 2026-02-21T10:19:38.0180193Z shr.s16 %rs588, %rs519, 4; 2026-02-21T10:19:38.0180384Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0180448Z cvt.rn.f32.s16 %r9709, %rs588; 2026-02-21T10:19:38.0180514Z cvt.rn.f32.s16 %r9710, %rs587; 2026-02-21T10:19:38.0180582Z cvt.rn.f32.s16 %r9711, %rs586; 2026-02-21T10:19:38.0180643Z cvt.rn.f32.s16 %r9712, %rs584; 2026-02-21T10:19:38.0180834Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0180898Z cvt.s16.s8 %rs589, %rs522; 2026-02-21T10:19:38.0180960Z shr.s16 %rs590, %rs589, 4; 2026-02-21T10:19:38.0181020Z cvt.s16.s8 %rs591, %rs524; 2026-02-21T10:19:38.0181081Z shr.s16 %rs592, %rs591, 4; 2026-02-21T10:19:38.0181149Z shr.s16 %rs593, %rs521, 4; 2026-02-21T10:19:38.0181209Z shr.s16 %rs594, %rs523, 4; 
2026-02-21T10:19:38.0181405Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0181474Z cvt.rn.f32.s16 %r9713, %rs594; 2026-02-21T10:19:38.0181538Z cvt.rn.f32.s16 %r9714, %rs593; 2026-02-21T10:19:38.0181602Z cvt.rn.f32.s16 %r9715, %rs592; 2026-02-21T10:19:38.0181668Z cvt.rn.f32.s16 %r9716, %rs590; 2026-02-21T10:19:38.0181862Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0181938Z cvt.s16.s8 %rs595, %rs526; 2026-02-21T10:19:38.0182003Z shr.s16 %rs596, %rs595, 4; 2026-02-21T10:19:38.0182069Z cvt.s16.s8 %rs597, %rs528; 2026-02-21T10:19:38.0182129Z shr.s16 %rs598, %rs597, 4; 2026-02-21T10:19:38.0182191Z shr.s16 %rs599, %rs525, 4; 2026-02-21T10:19:38.0182257Z shr.s16 %rs600, %rs527, 4; 2026-02-21T10:19:38.0182449Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0182612Z cvt.rn.f32.s16 %r9717, %rs600; 2026-02-21T10:19:38.0182680Z cvt.rn.f32.s16 %r9718, %rs599; 2026-02-21T10:19:38.0182744Z cvt.rn.f32.s16 %r9719, %rs598; 2026-02-21T10:19:38.0182808Z cvt.rn.f32.s16 %r9720, %rs596; 2026-02-21T10:19:38.0183000Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0183068Z cvt.s16.s8 %rs601, %rs530; 2026-02-21T10:19:38.0183129Z shr.s16 %rs602, %rs601, 4; 2026-02-21T10:19:38.0183192Z cvt.s16.s8 %rs603, %rs532; 2026-02-21T10:19:38.0183258Z shr.s16 %rs604, %rs603, 4; 2026-02-21T10:19:38.0183319Z shr.s16 %rs605, %rs529, 4; 2026-02-21T10:19:38.0183380Z shr.s16 %rs606, %rs531, 4; 2026-02-21T10:19:38.0183571Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0183640Z cvt.rn.f32.s16 %r9721, %rs606; 2026-02-21T10:19:38.0183705Z cvt.rn.f32.s16 %r9722, %rs605; 2026-02-21T10:19:38.0183770Z cvt.rn.f32.s16 %r9723, %rs604; 2026-02-21T10:19:38.0183841Z cvt.rn.f32.s16 %r9724, %rs602; 2026-02-21T10:19:38.0184087Z .loc 1 66 25 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0184163Z cvt.s16.s8 %rs607, %rs534; 2026-02-21T10:19:38.0184232Z shr.s16 %rs608, %rs607, 4; 2026-02-21T10:19:38.0184296Z cvt.s16.s8 %rs609, %rs536; 2026-02-21T10:19:38.0184399Z shr.s16 %rs610, %rs609, 4; 2026-02-21T10:19:38.0184462Z shr.s16 %rs611, %rs533, 4; 2026-02-21T10:19:38.0184537Z shr.s16 %rs612, %rs535, 4; 2026-02-21T10:19:38.0184734Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0184799Z cvt.rn.f32.s16 %r9725, %rs612; 2026-02-21T10:19:38.0184868Z cvt.rn.f32.s16 %r9726, %rs611; 2026-02-21T10:19:38.0184934Z cvt.rn.f32.s16 %r9727, %rs610; 2026-02-21T10:19:38.0184996Z cvt.rn.f32.s16 %r9728, %rs608; 2026-02-21T10:19:38.0185197Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0185261Z cvt.s16.s8 %rs613, %rs538; 2026-02-21T10:19:38.0185325Z shr.s16 %rs614, %rs613, 4; 2026-02-21T10:19:38.0185387Z cvt.s16.s8 %rs615, %rs540; 2026-02-21T10:19:38.0185455Z shr.s16 %rs616, %rs615, 4; 2026-02-21T10:19:38.0185517Z shr.s16 %rs617, %rs537, 4; 2026-02-21T10:19:38.0185579Z shr.s16 %rs618, %rs539, 4; 2026-02-21T10:19:38.0185779Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0185842Z cvt.rn.f32.s16 %r9729, %rs618; 2026-02-21T10:19:38.0185906Z cvt.rn.f32.s16 %r9730, %rs617; 2026-02-21T10:19:38.0185969Z cvt.rn.f32.s16 %r9731, %rs616; 2026-02-21T10:19:38.0186049Z cvt.rn.f32.s16 %r9732, %rs614; 2026-02-21T10:19:38.0186245Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0186308Z cvt.s16.s8 %rs619, %rs542; 2026-02-21T10:19:38.0186381Z shr.s16 %rs620, %rs619, 4; 2026-02-21T10:19:38.0186445Z cvt.s16.s8 %rs621, %rs544; 2026-02-21T10:19:38.0186688Z shr.s16 %rs622, %rs621, 4; 2026-02-21T10:19:38.0186760Z shr.s16 %rs623, %rs541, 4; 2026-02-21T10:19:38.0186822Z shr.s16 %rs624, %rs543, 4; 
2026-02-21T10:19:38.0187033Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0187101Z cvt.rn.f32.s16 %r9733, %rs624; 2026-02-21T10:19:38.0187172Z cvt.rn.f32.s16 %r9734, %rs623; 2026-02-21T10:19:38.0187235Z cvt.rn.f32.s16 %r9735, %rs622; 2026-02-21T10:19:38.0187299Z cvt.rn.f32.s16 %r9736, %rs620; 2026-02-21T10:19:38.0187499Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0187563Z cvt.s16.s8 %rs625, %rs546; 2026-02-21T10:19:38.0187625Z shr.s16 %rs626, %rs625, 4; 2026-02-21T10:19:38.0187688Z cvt.s16.s8 %rs627, %rs548; 2026-02-21T10:19:38.0187757Z shr.s16 %rs628, %rs627, 4; 2026-02-21T10:19:38.0187921Z shr.s16 %rs629, %rs545, 4; 2026-02-21T10:19:38.0188044Z shr.s16 %rs630, %rs547, 4; 2026-02-21T10:19:38.0188242Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0188306Z cvt.rn.f32.s16 %r9737, %rs630; 2026-02-21T10:19:38.0188427Z cvt.rn.f32.s16 %r9738, %rs629; 2026-02-21T10:19:38.0188507Z cvt.rn.f32.s16 %r9739, %rs628; 2026-02-21T10:19:38.0188572Z cvt.rn.f32.s16 %r9740, %rs626; 2026-02-21T10:19:38.0188766Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0188830Z cvt.s16.s8 %rs631, %rs550; 2026-02-21T10:19:38.0188897Z shr.s16 %rs632, %rs631, 4; 2026-02-21T10:19:38.0188960Z cvt.s16.s8 %rs633, %rs552; 2026-02-21T10:19:38.0189020Z shr.s16 %rs634, %rs633, 4; 2026-02-21T10:19:38.0189085Z shr.s16 %rs635, %rs549, 4; 2026-02-21T10:19:38.0189145Z shr.s16 %rs636, %rs551, 4; 2026-02-21T10:19:38.0189340Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0189412Z cvt.rn.f32.s16 %r9741, %rs636; 2026-02-21T10:19:38.0189547Z cvt.rn.f32.s16 %r9742, %rs635; 2026-02-21T10:19:38.0189613Z cvt.rn.f32.s16 %r9743, %rs634; 2026-02-21T10:19:38.0189677Z cvt.rn.f32.s16 %r9744, %rs632; 2026-02-21T10:19:38.0189875Z .loc 1 66 25 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0189997Z cvt.s16.s8 %rs637, %rs554; 2026-02-21T10:19:38.0190061Z shr.s16 %rs638, %rs637, 4; 2026-02-21T10:19:38.0190129Z cvt.s16.s8 %rs639, %rs556; 2026-02-21T10:19:38.0190191Z shr.s16 %rs640, %rs639, 4; 2026-02-21T10:19:38.0190251Z shr.s16 %rs641, %rs553, 4; 2026-02-21T10:19:38.0190311Z shr.s16 %rs642, %rs555, 4; 2026-02-21T10:19:38.0190509Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0190572Z cvt.rn.f32.s16 %r9745, %rs642; 2026-02-21T10:19:38.0190652Z cvt.rn.f32.s16 %r9746, %rs641; 2026-02-21T10:19:38.0190725Z cvt.rn.f32.s16 %r9747, %rs640; 2026-02-21T10:19:38.0190790Z cvt.rn.f32.s16 %r9748, %rs638; 2026-02-21T10:19:38.0190985Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0191053Z cvt.s16.s8 %rs643, %rs558; 2026-02-21T10:19:38.0191113Z shr.s16 %rs644, %rs643, 4; 2026-02-21T10:19:38.0191174Z cvt.s16.s8 %rs645, %rs560; 2026-02-21T10:19:38.0191237Z shr.s16 %rs646, %rs645, 4; 2026-02-21T10:19:38.0191303Z shr.s16 %rs647, %rs557, 4; 2026-02-21T10:19:38.0191364Z shr.s16 %rs648, %rs559, 4; 2026-02-21T10:19:38.0191556Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0191624Z cvt.rn.f32.s16 %r9749, %rs648; 2026-02-21T10:19:38.0191687Z cvt.rn.f32.s16 %r9750, %rs647; 2026-02-21T10:19:38.0191753Z cvt.rn.f32.s16 %r9751, %rs646; 2026-02-21T10:19:38.0191816Z cvt.rn.f32.s16 %r9752, %rs644; 2026-02-21T10:19:38.0192018Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0192083Z cvt.s16.s8 %rs649, %rs562; 2026-02-21T10:19:38.0192145Z shr.s16 %rs650, %rs649, 4; 2026-02-21T10:19:38.0192212Z cvt.s16.s8 %rs651, %rs564; 2026-02-21T10:19:38.0192274Z shr.s16 %rs652, %rs651, 4; 2026-02-21T10:19:38.0192335Z shr.s16 %rs653, %rs561, 4; 2026-02-21T10:19:38.0192403Z shr.s16 %rs654, %rs563, 4; 
2026-02-21T10:19:38.0192593Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0192659Z cvt.rn.f32.s16 %r9753, %rs654; 2026-02-21T10:19:38.0192723Z cvt.rn.f32.s16 %r9754, %rs653; 2026-02-21T10:19:38.0192794Z cvt.rn.f32.s16 %r9755, %rs652; 2026-02-21T10:19:38.0192859Z cvt.rn.f32.s16 %r9756, %rs650; 2026-02-21T10:19:38.0193052Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0193188Z cvt.s16.s8 %rs655, %rs566; 2026-02-21T10:19:38.0193250Z shr.s16 %rs656, %rs655, 4; 2026-02-21T10:19:38.0193358Z cvt.s16.s8 %rs657, %rs568; 2026-02-21T10:19:38.0193426Z shr.s16 %rs658, %rs657, 4; 2026-02-21T10:19:38.0193488Z shr.s16 %rs659, %rs565, 4; 2026-02-21T10:19:38.0193550Z shr.s16 %rs660, %rs567, 4; 2026-02-21T10:19:38.0193744Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0193817Z cvt.rn.f32.s16 %r9757, %rs660; 2026-02-21T10:19:38.0193881Z cvt.rn.f32.s16 %r9758, %rs659; 2026-02-21T10:19:38.0193945Z cvt.rn.f32.s16 %r9759, %rs658; 2026-02-21T10:19:38.0194017Z cvt.rn.f32.s16 %r9760, %rs656; 2026-02-21T10:19:38.0194210Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0194273Z cvt.s16.s8 %rs661, %rs570; 2026-02-21T10:19:38.0194337Z shr.s16 %rs662, %rs661, 4; 2026-02-21T10:19:38.0194406Z cvt.s16.s8 %rs663, %rs572; 2026-02-21T10:19:38.0194475Z shr.s16 %rs664, %rs663, 4; 2026-02-21T10:19:38.0194541Z shr.s16 %rs665, %rs569, 4; 2026-02-21T10:19:38.0194608Z shr.s16 %rs666, %rs571, 4; 2026-02-21T10:19:38.0194851Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0194918Z cvt.rn.f32.s16 %r9761, %rs666; 2026-02-21T10:19:38.0194989Z cvt.rn.f32.s16 %r9762, %rs665; 2026-02-21T10:19:38.0195051Z cvt.rn.f32.s16 %r9763, %rs664; 2026-02-21T10:19:38.0195158Z cvt.rn.f32.s16 %r9764, %rs662; 2026-02-21T10:19:38.0195352Z .loc 1 66 25 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0195419Z cvt.s16.s8 %rs667, %rs574; 2026-02-21T10:19:38.0195481Z shr.s16 %rs668, %rs667, 4; 2026-02-21T10:19:38.0195541Z cvt.s16.s8 %rs669, %rs576; 2026-02-21T10:19:38.0195607Z shr.s16 %rs670, %rs669, 4; 2026-02-21T10:19:38.0195666Z shr.s16 %rs671, %rs573, 4; 2026-02-21T10:19:38.0195728Z shr.s16 %rs672, %rs575, 4; 2026-02-21T10:19:38.0195922Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0195993Z cvt.rn.f32.s16 %r9765, %rs672; 2026-02-21T10:19:38.0196056Z cvt.rn.f32.s16 %r9766, %rs671; 2026-02-21T10:19:38.0196120Z cvt.rn.f32.s16 %r9767, %rs670; 2026-02-21T10:19:38.0196187Z cvt.rn.f32.s16 %r9768, %rs668; 2026-02-21T10:19:38.0196247Z bar.sync 0; 2026-02-21T10:19:38.0196364Z st.shared.v4.b32 [%r27], {%r9708, %r9706, %r9707, %r9705}; 2026-02-21T10:19:38.0196620Z st.shared.v4.b32 [%r27+16384], {%r9740, %r9738, %r9739, %r9737}; 2026-02-21T10:19:38.0196736Z st.shared.v4.b32 [%r28], {%r9712, %r9710, %r9711, %r9709}; 2026-02-21T10:19:38.0196852Z st.shared.v4.b32 [%r28+16384], {%r9744, %r9742, %r9743, %r9741}; 2026-02-21T10:19:38.0196955Z st.shared.v4.b32 [%r29], {%r9716, %r9714, %r9715, %r9713}; 2026-02-21T10:19:38.0197085Z st.shared.v4.b32 [%r29+16384], {%r9748, %r9746, %r9747, %r9745}; 2026-02-21T10:19:38.0197191Z st.shared.v4.b32 [%r30], {%r9720, %r9718, %r9719, %r9717}; 2026-02-21T10:19:38.0197308Z st.shared.v4.b32 [%r30+16384], {%r9752, %r9750, %r9751, %r9749}; 2026-02-21T10:19:38.0197420Z st.shared.v4.b32 [%r31], {%r9724, %r9722, %r9723, %r9721}; 2026-02-21T10:19:38.0197532Z st.shared.v4.b32 [%r31+16384], {%r9756, %r9754, %r9755, %r9753}; 2026-02-21T10:19:38.0197634Z st.shared.v4.b32 [%r32], {%r9728, %r9726, %r9727, %r9725}; 2026-02-21T10:19:38.0197752Z st.shared.v4.b32 [%r32+16384], {%r9760, %r9758, %r9759, %r9757}; 2026-02-21T10:19:38.0197856Z st.shared.v4.b32 [%r33], {%r9732, %r9730, %r9731, %r9729}; 2026-02-21T10:19:38.0197969Z 
st.shared.v4.b32 [%r33+16384], {%r9764, %r9762, %r9763, %r9761}; 2026-02-21T10:19:38.0198075Z st.shared.v4.b32 [%r34], {%r9736, %r9734, %r9735, %r9733}; 2026-02-21T10:19:38.0198187Z st.shared.v4.b32 [%r34+16384], {%r9768, %r9766, %r9767, %r9765}; 2026-02-21T10:19:38.0198245Z $L__tmp5: 2026-02-21T10:19:38.0198519Z .loc 2 291 36 // standard.py:291:36 @[ cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:91:40 ] 2026-02-21T10:19:38.0198721Z // begin inline asm 2026-02-21T10:19:38.0198801Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.0198859Z // end inline asm 2026-02-21T10:19:38.0198920Z bar.sync 0; 2026-02-21T10:19:38.0198993Z wgmma.fence.sync.aligned; 2026-02-21T10:19:38.0199053Z // begin inline asm 2026-02-21T10:19:38.0200550Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r7325,%r7326,%r7327,%r7328}, %rd3, %p42, 1, 1; 2026-02-21T10:19:38.0200610Z // end inline asm 2026-02-21T10:19:38.0200668Z // begin inline asm 2026-02-21T10:19:38.0202269Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r7457,%r7458,%r7459,%r7460}, %rd4, %p42, 1, 1; 2026-02-21T10:19:38.0202333Z // end inline asm 2026-02-21T10:19:38.0202397Z // begin inline asm 2026-02-21T10:19:38.0203868Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r7589,%r7590,%r7591,%r7592}, %rd5, %p42, 1, 1; 2026-02-21T10:19:38.0203931Z // end inline asm 2026-02-21T10:19:38.0203998Z // begin inline asm 2026-02-21T10:19:38.0205478Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, 
{%r7721,%r7722,%r7723,%r7724}, %rd6, %p42, 1, 1; 2026-02-21T10:19:38.0205550Z // end inline asm 2026-02-21T10:19:38.0205620Z // begin inline asm 2026-02-21T10:19:38.0207222Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r7853,%r7854,%r7855,%r7856}, %rd7, %p42, 1, 1; 2026-02-21T10:19:38.0207289Z // end inline asm 2026-02-21T10:19:38.0207348Z // begin inline asm 2026-02-21T10:19:38.0208910Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r7985,%r7986,%r7987,%r7988}, %rd8, %p42, 1, 1; 2026-02-21T10:19:38.0209041Z // end inline asm 2026-02-21T10:19:38.0209099Z // begin inline asm 2026-02-21T10:19:38.0210631Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r8117,%r8118,%r8119,%r8120}, %rd9, %p42, 1, 1; 2026-02-21T10:19:38.0210696Z // end inline asm 2026-02-21T10:19:38.0210755Z // begin inline asm 2026-02-21T10:19:38.0212326Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r8249,%r8250,%r8251,%r8252}, %rd10, %p42, 1, 1; 2026-02-21T10:19:38.0212391Z // end inline asm 2026-02-21T10:19:38.0212450Z // begin inline asm 2026-02-21T10:19:38.0213925Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, 
{%r8381,%r8382,%r8383,%r8384}, %rd3, %p42, 1, 1; 2026-02-21T10:19:38.0213983Z // end inline asm 2026-02-21T10:19:38.0214046Z // begin inline asm 2026-02-21T10:19:38.0215520Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r8513,%r8514,%r8515,%r8516}, %rd4, %p42, 1, 1; 2026-02-21T10:19:38.0215582Z // end inline asm 2026-02-21T10:19:38.0215647Z // begin inline asm 2026-02-21T10:19:38.0217794Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r8645,%r8646,%r8647,%r8648}, %rd5, %p42, 1, 1; 2026-02-21T10:19:38.0218003Z // end inline asm 2026-02-21T10:19:38.0218064Z // begin inline asm 2026-02-21T10:19:38.0219556Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r8777,%r8778,%r8779,%r8780}, %rd6, %p42, 1, 1; 2026-02-21T10:19:38.0219622Z // end inline asm 2026-02-21T10:19:38.0219682Z // begin inline asm 2026-02-21T10:19:38.0221284Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r8909,%r8910,%r8911,%r8912}, %rd7, %p42, 1, 1; 2026-02-21T10:19:38.0221356Z // end inline asm 2026-02-21T10:19:38.0221416Z // begin inline asm 2026-02-21T10:19:38.0222896Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, 
{%r9041,%r9042,%r9043,%r9044}, %rd8, %p42, 1, 1; 2026-02-21T10:19:38.0222962Z // end inline asm 2026-02-21T10:19:38.0223021Z // begin inline asm 2026-02-21T10:19:38.0224506Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r9173,%r9174,%r9175,%r9176}, %rd9, %p42, 1, 1; 2026-02-21T10:19:38.0224569Z // end inline asm 2026-02-21T10:19:38.0224631Z // begin inline asm 2026-02-21T10:19:38.0226109Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r9305,%r9306,%r9307,%r9308}, %rd10, %p42, 1, 1; 2026-02-21T10:19:38.0226173Z // end inline asm 2026-02-21T10:19:38.0226261Z wgmma.commit_group.sync.aligned; 2026-02-21T10:19:38.0226386Z mov.b32 %r9437, %r39931; 2026-02-21T10:19:38.0226616Z mov.b32 %r9438, %r9439; 2026-02-21T10:19:38.0226680Z // begin inline asm 2026-02-21T10:19:38.0229501Z // wait for regs: 
%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r9437,%r9438,%r9439 2026-02-21T10:19:38.0229603Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:19:38.0229664Z // end inline asm 2026-02-21T10:19:38.0229720Z $L__tmp6: 2026-02-21T10:19:38.0229993Z .loc 1 48 93 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:48:93 2026-02-21T10:19:38.0230081Z add.s64 %rd840, %rd840, 384; 2026-02-21T10:19:38.0230149Z add.s32 %r42468, %r42468, 192; 2026-02-21T10:19:38.0230220Z setp.lt.u64 %p102, %rd31, 3936; 2026-02-21T10:19:38.0230289Z mov.b64 %rd841, %rd31; 2026-02-21T10:19:38.0230352Z @%p102 bra $L__BB0_3; 2026-02-21T10:19:38.0230466Z // %bb.4: // %.preheader274.preheader 2026-02-21T10:19:38.0230570Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:19:38.0230646Z add.s64 %rd33, %rd28, 16128; 2026-02-21T10:19:38.0230712Z add.s64 %rd34, %rd21, 16128; 2026-02-21T10:19:38.0230773Z add.s64 %rd35, %rd22, 16128; 2026-02-21T10:19:38.0230843Z add.s64 %rd36, %rd23, 16128; 2026-02-21T10:19:38.0230906Z add.s64 %rd37, %rd24, 
16128; 2026-02-21T10:19:38.0230969Z add.s64 %rd38, %rd25, 16128; 2026-02-21T10:19:38.0231038Z add.s64 %rd39, %rd26, 16128; 2026-02-21T10:19:38.0231101Z add.s64 %rd40, %rd27, 16128; 2026-02-21T10:19:38.0231163Z mov.b64 %rd843, 4000; 2026-02-21T10:19:38.0231225Z mov.b64 %rd842, %rd11; 2026-02-21T10:19:38.0231346Z $L__BB0_5: // %.preheader274 2026-02-21T10:19:38.0231450Z // Parent Loop BB0_2 Depth=1 2026-02-21T10:19:38.0231557Z // => This Inner Loop Header: Depth=2 2026-02-21T10:19:38.0231766Z .loc 1 55 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:32 2026-02-21T10:19:38.0231837Z add.s64 %rd290, %rd842, %rd40; 2026-02-21T10:19:38.0231902Z add.s64 %rd293, %rd842, %rd39; 2026-02-21T10:19:38.0231973Z add.s64 %rd296, %rd842, %rd38; 2026-02-21T10:19:38.0232037Z add.s64 %rd299, %rd842, %rd37; 2026-02-21T10:19:38.0232100Z add.s64 %rd302, %rd842, %rd36; 2026-02-21T10:19:38.0232161Z add.s64 %rd305, %rd842, %rd35; 2026-02-21T10:19:38.0232230Z add.s64 %rd308, %rd842, %rd34; 2026-02-21T10:19:38.0232430Z .loc 1 55 80 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:80 2026-02-21T10:19:38.0232495Z add.s64 %rd311, %rd842, %rd33; 2026-02-21T10:19:38.0232561Z // begin inline asm 2026-02-21T10:19:38.0232623Z mov.u64 %rd289, 0x0; 2026-02-21T10:19:38.0232755Z createpolicy.fractional.L2::evict_first.b64 %rd289, 1.0; 2026-02-21T10:19:38.0232816Z // end inline asm 2026-02-21T10:19:38.0232881Z // begin inline asm 2026-02-21T10:19:38.0232940Z mov.u32 %r9769, 0x0; 2026-02-21T10:19:38.0232998Z mov.u32 %r9770, 0x0; 2026-02-21T10:19:38.0233149Z mov.u32 %r9771, 0x0; 2026-02-21T10:19:38.0233208Z mov.u32 %r9772, 0x0; 2026-02-21T10:19:38.0233508Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r9769, %r9770, %r9771, %r9772 }, [ %rd290 + 0 ], %rd289; 2026-02-21T10:19:38.0233572Z // end inline asm 2026-02-21T10:19:38.0233631Z // begin inline asm 2026-02-21T10:19:38.0233690Z mov.u64 %rd292, 0x0; 2026-02-21T10:19:38.0233812Z createpolicy.fractional.L2::evict_first.b64 
%rd292, 1.0; 2026-02-21T10:19:38.0233879Z // end inline asm 2026-02-21T10:19:38.0233939Z // begin inline asm 2026-02-21T10:19:38.0233996Z mov.u32 %r9773, 0x0; 2026-02-21T10:19:38.0234058Z mov.u32 %r9774, 0x0; 2026-02-21T10:19:38.0234116Z mov.u32 %r9775, 0x0; 2026-02-21T10:19:38.0234174Z mov.u32 %r9776, 0x0; 2026-02-21T10:19:38.0234393Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r9773, %r9774, %r9775, %r9776 }, [ %rd293 + 0 ], %rd292; 2026-02-21T10:19:38.0234459Z // end inline asm 2026-02-21T10:19:38.0234519Z // begin inline asm 2026-02-21T10:19:38.0234580Z mov.u64 %rd295, 0x0; 2026-02-21T10:19:38.0234706Z createpolicy.fractional.L2::evict_first.b64 %rd295, 1.0; 2026-02-21T10:19:38.0234765Z // end inline asm 2026-02-21T10:19:38.0234876Z // begin inline asm 2026-02-21T10:19:38.0234942Z mov.u32 %r9777, 0x0; 2026-02-21T10:19:38.0235005Z mov.u32 %r9778, 0x0; 2026-02-21T10:19:38.0235062Z mov.u32 %r9779, 0x0; 2026-02-21T10:19:38.0235121Z mov.u32 %r9780, 0x0; 2026-02-21T10:19:38.0235385Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r9777, %r9778, %r9779, %r9780 }, [ %rd296 + 0 ], %rd295; 2026-02-21T10:19:38.0235446Z // end inline asm 2026-02-21T10:19:38.0235506Z // begin inline asm 2026-02-21T10:19:38.0235570Z mov.u64 %rd298, 0x0; 2026-02-21T10:19:38.0235692Z createpolicy.fractional.L2::evict_first.b64 %rd298, 1.0; 2026-02-21T10:19:38.0235750Z // end inline asm 2026-02-21T10:19:38.0235814Z // begin inline asm 2026-02-21T10:19:38.0235871Z mov.u32 %r9781, 0x0; 2026-02-21T10:19:38.0235930Z mov.u32 %r9782, 0x0; 2026-02-21T10:19:38.0235988Z mov.u32 %r9783, 0x0; 2026-02-21T10:19:38.0236053Z mov.u32 %r9784, 0x0; 2026-02-21T10:19:38.0236268Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r9781, %r9782, %r9783, %r9784 }, [ %rd299 + 0 ], %rd298; 2026-02-21T10:19:38.0236328Z // end inline asm 2026-02-21T10:19:38.0236395Z // begin inline asm 2026-02-21T10:19:38.0236611Z mov.u64 %rd301, 0x0; 2026-02-21T10:19:38.0236740Z createpolicy.fractional.L2::evict_first.b64 
%rd301, 1.0; 2026-02-21T10:19:38.0236799Z // end inline asm 2026-02-21T10:19:38.0236866Z // begin inline asm 2026-02-21T10:19:38.0236925Z mov.u32 %r9785, 0x0; 2026-02-21T10:19:38.0236983Z mov.u32 %r9786, 0x0; 2026-02-21T10:19:38.0237046Z mov.u32 %r9787, 0x0; 2026-02-21T10:19:38.0237105Z mov.u32 %r9788, 0x0; 2026-02-21T10:19:38.0237336Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r9785, %r9786, %r9787, %r9788 }, [ %rd302 + 0 ], %rd301; 2026-02-21T10:19:38.0237400Z // end inline asm 2026-02-21T10:19:38.0237459Z // begin inline asm 2026-02-21T10:19:38.0237519Z mov.u64 %rd304, 0x0; 2026-02-21T10:19:38.0237641Z createpolicy.fractional.L2::evict_first.b64 %rd304, 1.0; 2026-02-21T10:19:38.0237706Z // end inline asm 2026-02-21T10:19:38.0237766Z // begin inline asm 2026-02-21T10:19:38.0237826Z mov.u32 %r9789, 0x0; 2026-02-21T10:19:38.0237890Z mov.u32 %r9790, 0x0; 2026-02-21T10:19:38.0237948Z mov.u32 %r9791, 0x0; 2026-02-21T10:19:38.0238007Z mov.u32 %r9792, 0x0; 2026-02-21T10:19:38.0238222Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r9789, %r9790, %r9791, %r9792 }, [ %rd305 + 0 ], %rd304; 2026-02-21T10:19:38.0238288Z // end inline asm 2026-02-21T10:19:38.0238347Z // begin inline asm 2026-02-21T10:19:38.0238407Z mov.u64 %rd307, 0x0; 2026-02-21T10:19:38.0238533Z createpolicy.fractional.L2::evict_first.b64 %rd307, 1.0; 2026-02-21T10:19:38.0238592Z // end inline asm 2026-02-21T10:19:38.0238652Z // begin inline asm 2026-02-21T10:19:38.0238715Z mov.u32 %r9793, 0x0; 2026-02-21T10:19:38.0238775Z mov.u32 %r9794, 0x0; 2026-02-21T10:19:38.0238834Z mov.u32 %r9795, 0x0; 2026-02-21T10:19:38.0238978Z mov.u32 %r9796, 0x0; 2026-02-21T10:19:38.0239200Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r9793, %r9794, %r9795, %r9796 }, [ %rd308 + 0 ], %rd307; 2026-02-21T10:19:38.0239319Z // end inline asm 2026-02-21T10:19:38.0239380Z // begin inline asm 2026-02-21T10:19:38.0239445Z mov.u64 %rd310, 0x0; 2026-02-21T10:19:38.0239564Z createpolicy.fractional.L2::evict_first.b64 
%rd310, 1.0; 2026-02-21T10:19:38.0239623Z // end inline asm 2026-02-21T10:19:38.0239688Z // begin inline asm 2026-02-21T10:19:38.0239748Z mov.u32 %r9797, 0x0; 2026-02-21T10:19:38.0239807Z mov.u32 %r9798, 0x0; 2026-02-21T10:19:38.0239866Z mov.u32 %r9799, 0x0; 2026-02-21T10:19:38.0239934Z mov.u32 %r9800, 0x0; 2026-02-21T10:19:38.0240170Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r9797, %r9798, %r9799, %r9800 }, [ %rd311 + 0 ], %rd310; 2026-02-21T10:19:38.0240234Z // end inline asm 2026-02-21T10:19:38.0240447Z .loc 1 59 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:59:32 2026-02-21T10:19:38.0240510Z bar.sync 0; 2026-02-21T10:19:38.0240593Z st.shared.v2.b32 [%r9], {%r9769, %r9770}; 2026-02-21T10:19:38.0240775Z st.shared.v2.b32 [%r9+2048], {%r9773, %r9774}; 2026-02-21T10:19:38.0240878Z st.shared.v2.b32 [%r9+4096], {%r9777, %r9778}; 2026-02-21T10:19:38.0240961Z st.shared.v2.b32 [%r9+6144], {%r9781, %r9782}; 2026-02-21T10:19:38.0241041Z st.shared.v2.b32 [%r9+8192], {%r9785, %r9786}; 2026-02-21T10:19:38.0241191Z st.shared.v2.b32 [%r9+10240], {%r9789, %r9790}; 2026-02-21T10:19:38.0241278Z st.shared.v2.b32 [%r9+12288], {%r9793, %r9794}; 2026-02-21T10:19:38.0241361Z st.shared.v2.b32 [%r9+14336], {%r9797, %r9798}; 2026-02-21T10:19:38.0241447Z st.shared.v2.b32 [%r10], {%r9771, %r9772}; 2026-02-21T10:19:38.0241529Z st.shared.v2.b32 [%r10+2048], {%r9775, %r9776}; 2026-02-21T10:19:38.0241612Z st.shared.v2.b32 [%r10+4096], {%r9779, %r9780}; 2026-02-21T10:19:38.0241693Z st.shared.v2.b32 [%r10+6144], {%r9783, %r9784}; 2026-02-21T10:19:38.0241790Z st.shared.v2.b32 [%r10+8192], {%r9787, %r9788}; 2026-02-21T10:19:38.0241880Z st.shared.v2.b32 [%r10+10240], {%r9791, %r9792}; 2026-02-21T10:19:38.0241968Z st.shared.v2.b32 [%r10+12288], {%r9795, %r9796}; 2026-02-21T10:19:38.0242061Z st.shared.v2.b32 [%r10+14336], {%r9799, %r9800}; 2026-02-21T10:19:38.0242120Z bar.sync 0; 2026-02-21T10:19:38.0242191Z ld.shared.b16 %rs673, [%r11]; 2026-02-21T10:19:38.0242271Z 
ld.shared.b16 %rs674, [%r11+1024]; 2026-02-21T10:19:38.0242342Z ld.shared.b16 %rs675, [%r11+64]; 2026-02-21T10:19:38.0242410Z ld.shared.b16 %rs676, [%r11+1088]; 2026-02-21T10:19:38.0242476Z ld.shared.b16 %rs677, [%r11+8192]; 2026-02-21T10:19:38.0242549Z ld.shared.b16 %rs678, [%r11+9216]; 2026-02-21T10:19:38.0242627Z ld.shared.b16 %rs679, [%r11+8256]; 2026-02-21T10:19:38.0242695Z ld.shared.b16 %rs680, [%r11+9280]; 2026-02-21T10:19:38.0242767Z ld.shared.b16 %rs681, [%r12]; 2026-02-21T10:19:38.0242834Z ld.shared.b16 %rs682, [%r12+1024]; 2026-02-21T10:19:38.0242900Z ld.shared.b16 %rs683, [%r12+64]; 2026-02-21T10:19:38.0242967Z ld.shared.b16 %rs684, [%r12+1088]; 2026-02-21T10:19:38.0243037Z ld.shared.b16 %rs685, [%r12+8192]; 2026-02-21T10:19:38.0243103Z ld.shared.b16 %rs686, [%r12+9216]; 2026-02-21T10:19:38.0243170Z ld.shared.b16 %rs687, [%r12+8256]; 2026-02-21T10:19:38.0243240Z ld.shared.b16 %rs688, [%r12+9280]; 2026-02-21T10:19:38.0243307Z ld.shared.b16 %rs689, [%r13]; 2026-02-21T10:19:38.0243373Z ld.shared.b16 %rs690, [%r13+1024]; 2026-02-21T10:19:38.0243446Z ld.shared.b16 %rs691, [%r13+64]; 2026-02-21T10:19:38.0243512Z ld.shared.b16 %rs692, [%r13+1088]; 2026-02-21T10:19:38.0243576Z ld.shared.b16 %rs693, [%r13+8192]; 2026-02-21T10:19:38.0243640Z ld.shared.b16 %rs694, [%r13+9216]; 2026-02-21T10:19:38.0243712Z ld.shared.b16 %rs695, [%r13+8256]; 2026-02-21T10:19:38.0243779Z ld.shared.b16 %rs696, [%r13+9280]; 2026-02-21T10:19:38.0243844Z ld.shared.b16 %rs697, [%r14]; 2026-02-21T10:19:38.0243913Z ld.shared.b16 %rs698, [%r14+1024]; 2026-02-21T10:19:38.0244038Z ld.shared.b16 %rs699, [%r14+64]; 2026-02-21T10:19:38.0244104Z ld.shared.b16 %rs700, [%r14+1088]; 2026-02-21T10:19:38.0244214Z ld.shared.b16 %rs701, [%r14+8192]; 2026-02-21T10:19:38.0244294Z ld.shared.b16 %rs702, [%r14+9216]; 2026-02-21T10:19:38.0244362Z ld.shared.b16 %rs703, [%r14+8256]; 2026-02-21T10:19:38.0244427Z ld.shared.b16 %rs704, [%r14+9280]; 2026-02-21T10:19:38.0244498Z ld.shared.b16 %rs705, [%r15]; 
2026-02-21T10:19:38.0244576Z ld.shared.b16 %rs706, [%r15+1024]; 2026-02-21T10:19:38.0244646Z ld.shared.b16 %rs707, [%r15+64]; 2026-02-21T10:19:38.0244718Z ld.shared.b16 %rs708, [%r15+1088]; 2026-02-21T10:19:38.0244783Z ld.shared.b16 %rs709, [%r15+8192]; 2026-02-21T10:19:38.0244849Z ld.shared.b16 %rs710, [%r15+9216]; 2026-02-21T10:19:38.0244915Z ld.shared.b16 %rs711, [%r15+8256]; 2026-02-21T10:19:38.0244989Z ld.shared.b16 %rs712, [%r15+9280]; 2026-02-21T10:19:38.0245055Z ld.shared.b16 %rs713, [%r16]; 2026-02-21T10:19:38.0245121Z ld.shared.b16 %rs714, [%r16+1024]; 2026-02-21T10:19:38.0245198Z ld.shared.b16 %rs715, [%r16+64]; 2026-02-21T10:19:38.0245265Z ld.shared.b16 %rs716, [%r16+1088]; 2026-02-21T10:19:38.0245333Z ld.shared.b16 %rs717, [%r16+8192]; 2026-02-21T10:19:38.0245460Z ld.shared.b16 %rs718, [%r16+9216]; 2026-02-21T10:19:38.0245536Z ld.shared.b16 %rs719, [%r16+8256]; 2026-02-21T10:19:38.0245602Z ld.shared.b16 %rs720, [%r16+9280]; 2026-02-21T10:19:38.0245669Z ld.shared.b16 %rs721, [%r17]; 2026-02-21T10:19:38.0245787Z ld.shared.b16 %rs722, [%r17+1024]; 2026-02-21T10:19:38.0245856Z ld.shared.b16 %rs723, [%r17+64]; 2026-02-21T10:19:38.0245922Z ld.shared.b16 %rs724, [%r17+1088]; 2026-02-21T10:19:38.0245988Z ld.shared.b16 %rs725, [%r17+8192]; 2026-02-21T10:19:38.0246072Z ld.shared.b16 %rs726, [%r17+9216]; 2026-02-21T10:19:38.0246138Z ld.shared.b16 %rs727, [%r17+8256]; 2026-02-21T10:19:38.0246205Z ld.shared.b16 %rs728, [%r17+9280]; 2026-02-21T10:19:38.0246277Z ld.shared.b16 %rs729, [%r18]; 2026-02-21T10:19:38.0246342Z ld.shared.b16 %rs730, [%r18+1024]; 2026-02-21T10:19:38.0246412Z ld.shared.b16 %rs731, [%r18+64]; 2026-02-21T10:19:38.0246642Z ld.shared.b16 %rs732, [%r18+1088]; 2026-02-21T10:19:38.0246716Z ld.shared.b16 %rs733, [%r18+8192]; 2026-02-21T10:19:38.0246784Z ld.shared.b16 %rs734, [%r18+9216]; 2026-02-21T10:19:38.0246852Z ld.shared.b16 %rs735, [%r18+8256]; 2026-02-21T10:19:38.0246922Z ld.shared.b16 %rs736, [%r18+9280]; 2026-02-21T10:19:38.0246990Z 
cvt.f32.bf16 %r9938, %rs673; 2026-02-21T10:19:38.0247069Z cvt.f32.bf16 %r9939, %rs674; 2026-02-21T10:19:38.0247144Z cvt.f32.bf16 %r9940, %rs681; 2026-02-21T10:19:38.0247212Z cvt.f32.bf16 %r9941, %rs682; 2026-02-21T10:19:38.0247277Z cvt.f32.bf16 %r10070, %rs689; 2026-02-21T10:19:38.0247343Z cvt.f32.bf16 %r10071, %rs690; 2026-02-21T10:19:38.0247410Z cvt.f32.bf16 %r10072, %rs697; 2026-02-21T10:19:38.0260592Z cvt.f32.bf16 %r10073, %rs698; 2026-02-21T10:19:38.0260701Z cvt.f32.bf16 %r10202, %rs705; 2026-02-21T10:19:38.0260768Z cvt.f32.bf16 %r10203, %rs706; 2026-02-21T10:19:38.0260843Z cvt.f32.bf16 %r10204, %rs713; 2026-02-21T10:19:38.0260914Z cvt.f32.bf16 %r10205, %rs714; 2026-02-21T10:19:38.0260979Z cvt.f32.bf16 %r10334, %rs721; 2026-02-21T10:19:38.0261042Z cvt.f32.bf16 %r10335, %rs722; 2026-02-21T10:19:38.0261112Z cvt.f32.bf16 %r10336, %rs729; 2026-02-21T10:19:38.0261179Z cvt.f32.bf16 %r10337, %rs730; 2026-02-21T10:19:38.0261240Z cvt.f32.bf16 %r10466, %rs675; 2026-02-21T10:19:38.0261309Z cvt.f32.bf16 %r10467, %rs676; 2026-02-21T10:19:38.0261374Z cvt.f32.bf16 %r10468, %rs683; 2026-02-21T10:19:38.0261436Z cvt.f32.bf16 %r10469, %rs684; 2026-02-21T10:19:38.0261497Z cvt.f32.bf16 %r10598, %rs691; 2026-02-21T10:19:38.0261565Z cvt.f32.bf16 %r10599, %rs692; 2026-02-21T10:19:38.0261627Z cvt.f32.bf16 %r10600, %rs699; 2026-02-21T10:19:38.0261701Z cvt.f32.bf16 %r10601, %rs700; 2026-02-21T10:19:38.0261771Z cvt.f32.bf16 %r10730, %rs707; 2026-02-21T10:19:38.0261832Z cvt.f32.bf16 %r10731, %rs708; 2026-02-21T10:19:38.0261894Z cvt.f32.bf16 %r10732, %rs715; 2026-02-21T10:19:38.0262104Z cvt.f32.bf16 %r10733, %rs716; 2026-02-21T10:19:38.0262242Z cvt.f32.bf16 %r10862, %rs723; 2026-02-21T10:19:38.0262304Z cvt.f32.bf16 %r10863, %rs724; 2026-02-21T10:19:38.0262367Z cvt.f32.bf16 %r10864, %rs731; 2026-02-21T10:19:38.0262435Z cvt.f32.bf16 %r10865, %rs732; 2026-02-21T10:19:38.0262497Z cvt.f32.bf16 %r10994, %rs677; 2026-02-21T10:19:38.0262557Z cvt.f32.bf16 %r10995, %rs678; 
2026-02-21T10:19:38.0262630Z cvt.f32.bf16 %r10996, %rs685; 2026-02-21T10:19:38.0262700Z cvt.f32.bf16 %r10997, %rs686; 2026-02-21T10:19:38.0262762Z cvt.f32.bf16 %r11126, %rs693; 2026-02-21T10:19:38.0262822Z cvt.f32.bf16 %r11127, %rs694; 2026-02-21T10:19:38.0262889Z cvt.f32.bf16 %r11128, %rs701; 2026-02-21T10:19:38.0262953Z cvt.f32.bf16 %r11129, %rs702; 2026-02-21T10:19:38.0263018Z cvt.f32.bf16 %r11258, %rs709; 2026-02-21T10:19:38.0263085Z cvt.f32.bf16 %r11259, %rs710; 2026-02-21T10:19:38.0263147Z cvt.f32.bf16 %r11260, %rs717; 2026-02-21T10:19:38.0263208Z cvt.f32.bf16 %r11261, %rs718; 2026-02-21T10:19:38.0263275Z cvt.f32.bf16 %r11390, %rs725; 2026-02-21T10:19:38.0263343Z cvt.f32.bf16 %r11391, %rs726; 2026-02-21T10:19:38.0263404Z cvt.f32.bf16 %r11392, %rs733; 2026-02-21T10:19:38.0263551Z cvt.f32.bf16 %r11393, %rs734; 2026-02-21T10:19:38.0263620Z cvt.f32.bf16 %r11522, %rs679; 2026-02-21T10:19:38.0263681Z cvt.f32.bf16 %r11523, %rs680; 2026-02-21T10:19:38.0263742Z cvt.f32.bf16 %r11524, %rs687; 2026-02-21T10:19:38.0263860Z cvt.f32.bf16 %r11525, %rs688; 2026-02-21T10:19:38.0263936Z cvt.f32.bf16 %r11654, %rs695; 2026-02-21T10:19:38.0263998Z cvt.f32.bf16 %r11655, %rs696; 2026-02-21T10:19:38.0264058Z cvt.f32.bf16 %r11656, %rs703; 2026-02-21T10:19:38.0264127Z cvt.f32.bf16 %r11657, %rs704; 2026-02-21T10:19:38.0264189Z cvt.f32.bf16 %r11786, %rs711; 2026-02-21T10:19:38.0264250Z cvt.f32.bf16 %r11787, %rs712; 2026-02-21T10:19:38.0264315Z cvt.f32.bf16 %r11788, %rs719; 2026-02-21T10:19:38.0264387Z cvt.f32.bf16 %r11789, %rs720; 2026-02-21T10:19:38.0264452Z cvt.f32.bf16 %r11918, %rs727; 2026-02-21T10:19:38.0264520Z cvt.f32.bf16 %r11919, %rs728; 2026-02-21T10:19:38.0264598Z cvt.f32.bf16 %r11920, %rs735; 2026-02-21T10:19:38.0264673Z cvt.f32.bf16 %r11921, %rs736; 2026-02-21T10:19:38.0264911Z .loc 1 61 33 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:61:33 2026-02-21T10:19:38.0264979Z bar.sync 0; 2026-02-21T10:19:38.0265045Z // begin inline asm 2026-02-21T10:19:38.0265166Z 
@%p313 mbarrier.init.shared::cta.b64 [%r29846], 1; 2026-02-21T10:19:38.0265226Z // end inline asm 2026-02-21T10:19:38.0265289Z bar.sync 0; 2026-02-21T10:19:38.0265350Z // begin inline asm 2026-02-21T10:19:38.0265493Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r29846], 4096; 2026-02-21T10:19:38.0265560Z // end inline asm 2026-02-21T10:19:38.0265621Z // begin inline asm 2026-02-21T10:19:38.0265702Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.0265760Z // end inline asm 2026-02-21T10:19:38.0265828Z bar.sync 0; 2026-02-21T10:19:38.0265903Z elect.sync %r12184|%p124, -1; 2026-02-21T10:19:38.0265975Z and.pred %p105, %p1, %p124; 2026-02-21T10:19:38.0266049Z add.s64 %rd843, %rd843, 32; 2026-02-21T10:19:38.0266115Z cvt.u32.u64 %r9805, %rd843; 2026-02-21T10:19:38.0266177Z // begin inline asm 2026-02-21T10:19:38.0266686Z @%p105 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39931], [%rd822, {%r9804, %r9805}], [%r29846]; 2026-02-21T10:19:38.0266753Z // end inline asm 2026-02-21T10:19:38.0266813Z bar.sync 0; 2026-02-21T10:19:38.0266875Z mov.b32 %r12051, 0; 2026-02-21T10:19:38.0266948Z // begin inline asm 2026-02-21T10:19:38.0267009Z 2026-02-21T10:19:38.0267063Z { 2026-02-21T10:19:38.0267136Z .reg .pred complete; 2026-02-21T10:19:38.0267197Z waitLoop: 2026-02-21T10:19:38.0267350Z mbarrier.try_wait.parity.shared.b64 complete, [%r29846], %r12051; 2026-02-21T10:19:38.0267426Z @!complete bra.uni waitLoop; 2026-02-21T10:19:38.0267486Z } 2026-02-21T10:19:38.0267492Z 2026-02-21T10:19:38.0267554Z // end inline asm 2026-02-21T10:19:38.0267697Z bar.sync 0; 2026-02-21T10:19:38.0267767Z // begin inline asm 2026-02-21T10:19:38.0267945Z @%p313 mbarrier.inval.shared::cta.b64 [%r29846]; 2026-02-21T10:19:38.0268010Z // end inline asm 2026-02-21T10:19:38.0268232Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0268311Z ld.shared.s8 %rs737, [%r19]; 2026-02-21T10:19:38.0268598Z .loc 1 64 28 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0268668Z shl.b16 %rs738, %rs737, 4; 2026-02-21T10:19:38.0268872Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0268946Z ld.shared.s8 %rs739, [%r20+128]; 2026-02-21T10:19:38.0269140Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0269212Z shl.b16 %rs740, %rs739, 4; 2026-02-21T10:19:38.0269415Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0269492Z ld.shared.s8 %rs741, [%r21+256]; 2026-02-21T10:19:38.0269793Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0269868Z shl.b16 %rs742, %rs741, 4; 2026-02-21T10:19:38.0270080Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0270213Z ld.shared.s8 %rs743, [%r22+384]; 2026-02-21T10:19:38.0270419Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0270484Z shl.b16 %rs744, %rs743, 4; 2026-02-21T10:19:38.0270677Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0270756Z ld.shared.s8 %rs745, [%r23+512]; 2026-02-21T10:19:38.0270953Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0271023Z shl.b16 %rs746, %rs745, 4; 2026-02-21T10:19:38.0271226Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0271296Z ld.shared.s8 %rs747, [%r24+640]; 2026-02-21T10:19:38.0271487Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0271559Z shl.b16 %rs748, %rs747, 4; 2026-02-21T10:19:38.0271763Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0271832Z ld.shared.s8 %rs749, [%r25+768]; 2026-02-21T10:19:38.0272034Z .loc 1 64 28 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0272110Z shl.b16 %rs750, %rs749, 4; 2026-02-21T10:19:38.0272313Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0272384Z ld.shared.s8 %rs751, [%r26+896]; 2026-02-21T10:19:38.0272588Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0272654Z shl.b16 %rs752, %rs751, 4; 2026-02-21T10:19:38.0272847Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0272927Z ld.shared.s8 %rs753, [%r19+1024]; 2026-02-21T10:19:38.0273122Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0273185Z shl.b16 %rs754, %rs753, 4; 2026-02-21T10:19:38.0273390Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0273464Z ld.shared.s8 %rs755, [%r20+1152]; 2026-02-21T10:19:38.0273660Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0273726Z shl.b16 %rs756, %rs755, 4; 2026-02-21T10:19:38.0274019Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0274134Z ld.shared.s8 %rs757, [%r21+1280]; 2026-02-21T10:19:38.0274327Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0274398Z shl.b16 %rs758, %rs757, 4; 2026-02-21T10:19:38.0274591Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0274658Z ld.shared.s8 %rs759, [%r22+1408]; 2026-02-21T10:19:38.0274856Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0274920Z shl.b16 %rs760, %rs759, 4; 2026-02-21T10:19:38.0275112Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0275182Z ld.shared.s8 %rs761, [%r23+1536]; 2026-02-21T10:19:38.0275373Z .loc 1 64 28 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0275439Z shl.b16 %rs762, %rs761, 4; 2026-02-21T10:19:38.0275680Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0275760Z ld.shared.s8 %rs763, [%r24+1664]; 2026-02-21T10:19:38.0275954Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0276063Z shl.b16 %rs764, %rs763, 4; 2026-02-21T10:19:38.0276262Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0276332Z ld.shared.s8 %rs765, [%r25+1792]; 2026-02-21T10:19:38.0276662Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0276737Z shl.b16 %rs766, %rs765, 4; 2026-02-21T10:19:38.0276928Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0277011Z ld.shared.s8 %rs767, [%r26+1920]; 2026-02-21T10:19:38.0277222Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0277287Z shl.b16 %rs768, %rs767, 4; 2026-02-21T10:19:38.0277479Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0277551Z ld.shared.s8 %rs769, [%r19+2048]; 2026-02-21T10:19:38.0277742Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0277806Z shl.b16 %rs770, %rs769, 4; 2026-02-21T10:19:38.0277998Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0278073Z ld.shared.s8 %rs771, [%r20+2176]; 2026-02-21T10:19:38.0278278Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0278343Z shl.b16 %rs772, %rs771, 4; 2026-02-21T10:19:38.0278547Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0278619Z ld.shared.s8 %rs773, [%r21+2304]; 2026-02-21T10:19:38.0278816Z .loc 1 64 28 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0278889Z shl.b16 %rs774, %rs773, 4; 2026-02-21T10:19:38.0279082Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0279150Z ld.shared.s8 %rs775, [%r22+2432]; 2026-02-21T10:19:38.0279349Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0279421Z shl.b16 %rs776, %rs775, 4; 2026-02-21T10:19:38.0279613Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0279680Z ld.shared.s8 %rs777, [%r23+2560]; 2026-02-21T10:19:38.0279878Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0280076Z shl.b16 %rs778, %rs777, 4; 2026-02-21T10:19:38.0280270Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0280343Z ld.shared.s8 %rs779, [%r24+2688]; 2026-02-21T10:19:38.0280535Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0280611Z shl.b16 %rs780, %rs779, 4; 2026-02-21T10:19:38.0280817Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0280886Z ld.shared.s8 %rs781, [%r25+2816]; 2026-02-21T10:19:38.0281079Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0281149Z shl.b16 %rs782, %rs781, 4; 2026-02-21T10:19:38.0281356Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0281427Z ld.shared.s8 %rs783, [%r26+2944]; 2026-02-21T10:19:38.0281691Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0281765Z shl.b16 %rs784, %rs783, 4; 2026-02-21T10:19:38.0281962Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0282030Z ld.shared.s8 %rs785, [%r19+3072]; 2026-02-21T10:19:38.0282287Z .loc 1 64 28 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0282353Z shl.b16 %rs786, %rs785, 4; 2026-02-21T10:19:38.0282543Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0282616Z ld.shared.s8 %rs787, [%r20+3200]; 2026-02-21T10:19:38.0282817Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0282885Z shl.b16 %rs788, %rs787, 4; 2026-02-21T10:19:38.0283086Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0283156Z ld.shared.s8 %rs789, [%r21+3328]; 2026-02-21T10:19:38.0283348Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0283412Z shl.b16 %rs790, %rs789, 4; 2026-02-21T10:19:38.0283617Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0283683Z ld.shared.s8 %rs791, [%r22+3456]; 2026-02-21T10:19:38.0283876Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0283948Z shl.b16 %rs792, %rs791, 4; 2026-02-21T10:19:38.0284141Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0284210Z ld.shared.s8 %rs793, [%r23+3584]; 2026-02-21T10:19:38.0284421Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0284485Z shl.b16 %rs794, %rs793, 4; 2026-02-21T10:19:38.0284678Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0284750Z ld.shared.s8 %rs795, [%r24+3712]; 2026-02-21T10:19:38.0284943Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0285009Z shl.b16 %rs796, %rs795, 4; 2026-02-21T10:19:38.0285200Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0285284Z ld.shared.s8 %rs797, [%r25+3840]; 2026-02-21T10:19:38.0285476Z .loc 1 64 28 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0285542Z shl.b16 %rs798, %rs797, 4; 2026-02-21T10:19:38.0285745Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0285912Z ld.shared.s8 %rs799, [%r26+3968]; 2026-02-21T10:19:38.0286106Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0286176Z shl.b16 %rs800, %rs799, 4; 2026-02-21T10:19:38.0286367Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0286434Z cvt.s16.s8 %rs801, %rs738; 2026-02-21T10:19:38.0286630Z shr.s16 %rs802, %rs801, 4; 2026-02-21T10:19:38.0286696Z cvt.s16.s8 %rs803, %rs740; 2026-02-21T10:19:38.0286768Z shr.s16 %rs804, %rs803, 4; 2026-02-21T10:19:38.0286833Z shr.s16 %rs805, %rs737, 4; 2026-02-21T10:19:38.0286901Z shr.s16 %rs806, %rs739, 4; 2026-02-21T10:19:38.0287108Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0287179Z cvt.rn.f32.s16 %r12185, %rs806; 2026-02-21T10:19:38.0287253Z cvt.rn.f32.s16 %r12186, %rs805; 2026-02-21T10:19:38.0287319Z cvt.rn.f32.s16 %r12187, %rs804; 2026-02-21T10:19:38.0287388Z cvt.rn.f32.s16 %r12188, %rs802; 2026-02-21T10:19:38.0287665Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0287744Z cvt.s16.s8 %rs807, %rs742; 2026-02-21T10:19:38.0287808Z shr.s16 %rs808, %rs807, 4; 2026-02-21T10:19:38.0287869Z cvt.s16.s8 %rs809, %rs744; 2026-02-21T10:19:38.0287998Z shr.s16 %rs810, %rs809, 4; 2026-02-21T10:19:38.0288065Z shr.s16 %rs811, %rs741, 4; 2026-02-21T10:19:38.0288130Z shr.s16 %rs812, %rs743, 4; 2026-02-21T10:19:38.0288344Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0288412Z cvt.rn.f32.s16 %r12189, %rs812; 2026-02-21T10:19:38.0288477Z cvt.rn.f32.s16 %r12190, %rs811; 2026-02-21T10:19:38.0288540Z cvt.rn.f32.s16 %r12191, %rs810; 
2026-02-21T10:19:38.0288612Z cvt.rn.f32.s16 %r12192, %rs808; 2026-02-21T10:19:38.0288814Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0288881Z cvt.s16.s8 %rs813, %rs746; 2026-02-21T10:19:38.0288953Z shr.s16 %rs814, %rs813, 4; 2026-02-21T10:19:38.0289016Z cvt.s16.s8 %rs815, %rs748; 2026-02-21T10:19:38.0289076Z shr.s16 %rs816, %rs815, 4; 2026-02-21T10:19:38.0289145Z shr.s16 %rs817, %rs745, 4; 2026-02-21T10:19:38.0289209Z shr.s16 %rs818, %rs747, 4; 2026-02-21T10:19:38.0289409Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0289474Z cvt.rn.f32.s16 %r12193, %rs818; 2026-02-21T10:19:38.0289545Z cvt.rn.f32.s16 %r12194, %rs817; 2026-02-21T10:19:38.0289608Z cvt.rn.f32.s16 %r12195, %rs816; 2026-02-21T10:19:38.0289672Z cvt.rn.f32.s16 %r12196, %rs814; 2026-02-21T10:19:38.0289872Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0289935Z cvt.s16.s8 %rs819, %rs750; 2026-02-21T10:19:38.0290001Z shr.s16 %rs820, %rs819, 4; 2026-02-21T10:19:38.0290069Z cvt.s16.s8 %rs821, %rs752; 2026-02-21T10:19:38.0290136Z shr.s16 %rs822, %rs821, 4; 2026-02-21T10:19:38.0290199Z shr.s16 %rs823, %rs749, 4; 2026-02-21T10:19:38.0290261Z shr.s16 %rs824, %rs751, 4; 2026-02-21T10:19:38.0290461Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0290527Z cvt.rn.f32.s16 %r12197, %rs824; 2026-02-21T10:19:38.0290592Z cvt.rn.f32.s16 %r12198, %rs823; 2026-02-21T10:19:38.0290662Z cvt.rn.f32.s16 %r12199, %rs822; 2026-02-21T10:19:38.0290727Z cvt.rn.f32.s16 %r12200, %rs820; 2026-02-21T10:19:38.0290920Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0290996Z cvt.s16.s8 %rs825, %rs754; 2026-02-21T10:19:38.0291069Z shr.s16 %rs826, %rs825, 4; 2026-02-21T10:19:38.0291132Z cvt.s16.s8 %rs827, %rs756; 2026-02-21T10:19:38.0291269Z shr.s16 %rs828, %rs827, 4; 
2026-02-21T10:19:38.0291338Z shr.s16 %rs829, %rs753, 4; 2026-02-21T10:19:38.0291460Z shr.s16 %rs830, %rs755, 4; 2026-02-21T10:19:38.0291657Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0291732Z cvt.rn.f32.s16 %r12201, %rs830; 2026-02-21T10:19:38.0291795Z cvt.rn.f32.s16 %r12202, %rs829; 2026-02-21T10:19:38.0291858Z cvt.rn.f32.s16 %r12203, %rs828; 2026-02-21T10:19:38.0291924Z cvt.rn.f32.s16 %r12204, %rs826; 2026-02-21T10:19:38.0292124Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0292186Z cvt.s16.s8 %rs831, %rs758; 2026-02-21T10:19:38.0292248Z shr.s16 %rs832, %rs831, 4; 2026-02-21T10:19:38.0292316Z cvt.s16.s8 %rs833, %rs760; 2026-02-21T10:19:38.0292377Z shr.s16 %rs834, %rs833, 4; 2026-02-21T10:19:38.0292439Z shr.s16 %rs835, %rs757, 4; 2026-02-21T10:19:38.0292503Z shr.s16 %rs836, %rs759, 4; 2026-02-21T10:19:38.0292705Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0292785Z cvt.rn.f32.s16 %r12205, %rs836; 2026-02-21T10:19:38.0292905Z cvt.rn.f32.s16 %r12206, %rs835; 2026-02-21T10:19:38.0292977Z cvt.rn.f32.s16 %r12207, %rs834; 2026-02-21T10:19:38.0293041Z cvt.rn.f32.s16 %r12208, %rs832; 2026-02-21T10:19:38.0293298Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0293371Z cvt.s16.s8 %rs837, %rs762; 2026-02-21T10:19:38.0293436Z shr.s16 %rs838, %rs837, 4; 2026-02-21T10:19:38.0293498Z cvt.s16.s8 %rs839, %rs764; 2026-02-21T10:19:38.0293562Z shr.s16 %rs840, %rs839, 4; 2026-02-21T10:19:38.0293632Z shr.s16 %rs841, %rs761, 4; 2026-02-21T10:19:38.0293692Z shr.s16 %rs842, %rs763, 4; 2026-02-21T10:19:38.0293888Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0293960Z cvt.rn.f32.s16 %r12209, %rs842; 2026-02-21T10:19:38.0294023Z cvt.rn.f32.s16 %r12210, %rs841; 2026-02-21T10:19:38.0294086Z cvt.rn.f32.s16 %r12211, %rs840; 
2026-02-21T10:19:38.0294158Z cvt.rn.f32.s16 %r12212, %rs838; 2026-02-21T10:19:38.0294350Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0294413Z cvt.s16.s8 %rs843, %rs766; 2026-02-21T10:19:38.0294491Z shr.s16 %rs844, %rs843, 4; 2026-02-21T10:19:38.0294558Z cvt.s16.s8 %rs845, %rs768; 2026-02-21T10:19:38.0294619Z shr.s16 %rs846, %rs845, 4; 2026-02-21T10:19:38.0294680Z shr.s16 %rs847, %rs765, 4; 2026-02-21T10:19:38.0294749Z shr.s16 %rs848, %rs767, 4; 2026-02-21T10:19:38.0294940Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0295004Z cvt.rn.f32.s16 %r12213, %rs848; 2026-02-21T10:19:38.0295072Z cvt.rn.f32.s16 %r12214, %rs847; 2026-02-21T10:19:38.0295136Z cvt.rn.f32.s16 %r12215, %rs846; 2026-02-21T10:19:38.0295201Z cvt.rn.f32.s16 %r12216, %rs844; 2026-02-21T10:19:38.0295400Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0295464Z cvt.s16.s8 %rs849, %rs770; 2026-02-21T10:19:38.0295527Z shr.s16 %rs850, %rs849, 4; 2026-02-21T10:19:38.0295589Z cvt.s16.s8 %rs851, %rs772; 2026-02-21T10:19:38.0295659Z shr.s16 %rs852, %rs851, 4; 2026-02-21T10:19:38.0295725Z shr.s16 %rs853, %rs769, 4; 2026-02-21T10:19:38.0295788Z shr.s16 %rs854, %rs771, 4; 2026-02-21T10:19:38.0295988Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0296056Z cvt.rn.f32.s16 %r12217, %rs854; 2026-02-21T10:19:38.0296123Z cvt.rn.f32.s16 %r12218, %rs853; 2026-02-21T10:19:38.0296187Z cvt.rn.f32.s16 %r12219, %rs852; 2026-02-21T10:19:38.0296258Z cvt.rn.f32.s16 %r12220, %rs850; 2026-02-21T10:19:38.0296577Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0296738Z cvt.s16.s8 %rs855, %rs774; 2026-02-21T10:19:38.0296870Z shr.s16 %rs856, %rs855, 4; 2026-02-21T10:19:38.0296935Z cvt.s16.s8 %rs857, %rs776; 2026-02-21T10:19:38.0297006Z shr.s16 %rs858, %rs857, 4; 
2026-02-21T10:19:38.0297076Z shr.s16 %rs859, %rs773, 4; 2026-02-21T10:19:38.0297137Z shr.s16 %rs860, %rs775, 4; 2026-02-21T10:19:38.0297339Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0297408Z cvt.rn.f32.s16 %r12221, %rs860; 2026-02-21T10:19:38.0297478Z cvt.rn.f32.s16 %r12222, %rs859; 2026-02-21T10:19:38.0297542Z cvt.rn.f32.s16 %r12223, %rs858; 2026-02-21T10:19:38.0297607Z cvt.rn.f32.s16 %r12224, %rs856; 2026-02-21T10:19:38.0297808Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0297870Z cvt.s16.s8 %rs861, %rs778; 2026-02-21T10:19:38.0297934Z shr.s16 %rs862, %rs861, 4; 2026-02-21T10:19:38.0298004Z cvt.s16.s8 %rs863, %rs780; 2026-02-21T10:19:38.0298065Z shr.s16 %rs864, %rs863, 4; 2026-02-21T10:19:38.0298128Z shr.s16 %rs865, %rs777, 4; 2026-02-21T10:19:38.0298258Z shr.s16 %rs866, %rs779, 4; 2026-02-21T10:19:38.0298468Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0298534Z cvt.rn.f32.s16 %r12225, %rs866; 2026-02-21T10:19:38.0298597Z cvt.rn.f32.s16 %r12226, %rs865; 2026-02-21T10:19:38.0298719Z cvt.rn.f32.s16 %r12227, %rs864; 2026-02-21T10:19:38.0298788Z cvt.rn.f32.s16 %r12228, %rs862; 2026-02-21T10:19:38.0298980Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0299043Z cvt.s16.s8 %rs867, %rs782; 2026-02-21T10:19:38.0299113Z shr.s16 %rs868, %rs867, 4; 2026-02-21T10:19:38.0299174Z cvt.s16.s8 %rs869, %rs784; 2026-02-21T10:19:38.0299234Z shr.s16 %rs870, %rs869, 4; 2026-02-21T10:19:38.0299300Z shr.s16 %rs871, %rs781, 4; 2026-02-21T10:19:38.0299362Z shr.s16 %rs872, %rs783, 4; 2026-02-21T10:19:38.0299554Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0299624Z cvt.rn.f32.s16 %r12229, %rs872; 2026-02-21T10:19:38.0299686Z cvt.rn.f32.s16 %r12230, %rs871; 2026-02-21T10:19:38.0299748Z cvt.rn.f32.s16 %r12231, %rs870; 
2026-02-21T10:19:38.0299824Z cvt.rn.f32.s16 %r12232, %rs868; 2026-02-21T10:19:38.0300024Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0300086Z cvt.s16.s8 %rs873, %rs786; 2026-02-21T10:19:38.0300149Z shr.s16 %rs874, %rs873, 4; 2026-02-21T10:19:38.0300213Z cvt.s16.s8 %rs875, %rs788; 2026-02-21T10:19:38.0300274Z shr.s16 %rs876, %rs875, 4; 2026-02-21T10:19:38.0300334Z shr.s16 %rs877, %rs785, 4; 2026-02-21T10:19:38.0300394Z shr.s16 %rs878, %rs787, 4; 2026-02-21T10:19:38.0300591Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0300656Z cvt.rn.f32.s16 %r12233, %rs878; 2026-02-21T10:19:38.0300720Z cvt.rn.f32.s16 %r12234, %rs877; 2026-02-21T10:19:38.0300789Z cvt.rn.f32.s16 %r12235, %rs876; 2026-02-21T10:19:38.0300853Z cvt.rn.f32.s16 %r12236, %rs874; 2026-02-21T10:19:38.0301042Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0301108Z cvt.s16.s8 %rs879, %rs790; 2026-02-21T10:19:38.0301171Z shr.s16 %rs880, %rs879, 4; 2026-02-21T10:19:38.0301232Z cvt.s16.s8 %rs881, %rs792; 2026-02-21T10:19:38.0301293Z shr.s16 %rs882, %rs881, 4; 2026-02-21T10:19:38.0301360Z shr.s16 %rs883, %rs789, 4; 2026-02-21T10:19:38.0301421Z shr.s16 %rs884, %rs791, 4; 2026-02-21T10:19:38.0301612Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0301683Z cvt.rn.f32.s16 %r12237, %rs884; 2026-02-21T10:19:38.0301747Z cvt.rn.f32.s16 %r12238, %rs883; 2026-02-21T10:19:38.0301869Z cvt.rn.f32.s16 %r12239, %rs882; 2026-02-21T10:19:38.0301984Z cvt.rn.f32.s16 %r12240, %rs880; 2026-02-21T10:19:38.0302175Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0302237Z cvt.s16.s8 %rs885, %rs794; 2026-02-21T10:19:38.0302299Z shr.s16 %rs886, %rs885, 4; 2026-02-21T10:19:38.0302366Z cvt.s16.s8 %rs887, %rs796; 2026-02-21T10:19:38.0302425Z shr.s16 %rs888, %rs887, 4; 
2026-02-21T10:19:38.0302489Z shr.s16 %rs889, %rs793, 4; 2026-02-21T10:19:38.0302550Z shr.s16 %rs890, %rs795, 4; 2026-02-21T10:19:38.0302739Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0302802Z cvt.rn.f32.s16 %r12241, %rs890; 2026-02-21T10:19:38.0302863Z cvt.rn.f32.s16 %r12242, %rs889; 2026-02-21T10:19:38.0302922Z cvt.rn.f32.s16 %r12243, %rs888; 2026-02-21T10:19:38.0302983Z cvt.rn.f32.s16 %r12244, %rs886; 2026-02-21T10:19:38.0303177Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0303242Z cvt.s16.s8 %rs891, %rs798; 2026-02-21T10:19:38.0303349Z shr.s16 %rs892, %rs891, 4; 2026-02-21T10:19:38.0303418Z cvt.s16.s8 %rs893, %rs800; 2026-02-21T10:19:38.0303480Z shr.s16 %rs894, %rs893, 4; 2026-02-21T10:19:38.0303539Z shr.s16 %rs895, %rs797, 4; 2026-02-21T10:19:38.0303601Z shr.s16 %rs896, %rs799, 4; 2026-02-21T10:19:38.0303837Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0303902Z cvt.rn.f32.s16 %r12245, %rs896; 2026-02-21T10:19:38.0303962Z cvt.rn.f32.s16 %r12246, %rs895; 2026-02-21T10:19:38.0304026Z cvt.rn.f32.s16 %r12247, %rs894; 2026-02-21T10:19:38.0304088Z cvt.rn.f32.s16 %r12248, %rs892; 2026-02-21T10:19:38.0304144Z bar.sync 0; 2026-02-21T10:19:38.0304273Z st.shared.v4.b32 [%r27], {%r12188, %r12186, %r12187, %r12185}; 2026-02-21T10:19:38.0304401Z st.shared.v4.b32 [%r27+16384], {%r12220, %r12218, %r12219, %r12217}; 2026-02-21T10:19:38.0304514Z st.shared.v4.b32 [%r28], {%r12192, %r12190, %r12191, %r12189}; 2026-02-21T10:19:38.0304643Z st.shared.v4.b32 [%r28+16384], {%r12224, %r12222, %r12223, %r12221}; 2026-02-21T10:19:38.0304751Z st.shared.v4.b32 [%r29], {%r12196, %r12194, %r12195, %r12193}; 2026-02-21T10:19:38.0304868Z st.shared.v4.b32 [%r29+16384], {%r12228, %r12226, %r12227, %r12225}; 2026-02-21T10:19:38.0304975Z st.shared.v4.b32 [%r30], {%r12200, %r12198, %r12199, %r12197}; 2026-02-21T10:19:38.0305096Z 
st.shared.v4.b32 [%r30+16384], {%r12232, %r12230, %r12231, %r12229}; 2026-02-21T10:19:38.0305200Z st.shared.v4.b32 [%r31], {%r12204, %r12202, %r12203, %r12201}; 2026-02-21T10:19:38.0305316Z st.shared.v4.b32 [%r31+16384], {%r12236, %r12234, %r12235, %r12233}; 2026-02-21T10:19:38.0305427Z st.shared.v4.b32 [%r32], {%r12208, %r12206, %r12207, %r12205}; 2026-02-21T10:19:38.0305540Z st.shared.v4.b32 [%r32+16384], {%r12240, %r12238, %r12239, %r12237}; 2026-02-21T10:19:38.0305648Z st.shared.v4.b32 [%r33], {%r12212, %r12210, %r12211, %r12209}; 2026-02-21T10:19:38.0305768Z st.shared.v4.b32 [%r33+16384], {%r12244, %r12242, %r12243, %r12241}; 2026-02-21T10:19:38.0305875Z st.shared.v4.b32 [%r34], {%r12216, %r12214, %r12215, %r12213}; 2026-02-21T10:19:38.0306002Z st.shared.v4.b32 [%r34+16384], {%r12248, %r12246, %r12247, %r12245}; 2026-02-21T10:19:38.0306062Z $L__tmp7: 2026-02-21T10:19:38.0306337Z .loc 2 291 36 // standard.py:291:36 @[ cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:91:40 ] 2026-02-21T10:19:38.0306400Z // begin inline asm 2026-02-21T10:19:38.0306602Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.0306674Z // end inline asm 2026-02-21T10:19:38.0306730Z bar.sync 0; 2026-02-21T10:19:38.0306816Z shfl.sync.idx.b32 %r12249, %r4, 0, 31, -1; 2026-02-21T10:19:38.0306891Z wgmma.fence.sync.aligned; 2026-02-21T10:19:38.0306968Z mov.pred %p107, -1; 2026-02-21T10:19:38.0307029Z // begin inline asm 2026-02-21T10:19:38.0308660Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r9938,%r9939,%r9940,%r9941}, %rd3, %p107, 1, 1; 2026-02-21T10:19:38.0308791Z // end inline asm 2026-02-21T10:19:38.0308852Z // begin inline asm 2026-02-21T10:19:38.0310378Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r10070,%r10071,%r10072,%r10073}, %rd4, %p107, 1, 1; 2026-02-21T10:19:38.0310443Z // end inline asm 2026-02-21T10:19:38.0310501Z // begin inline asm 2026-02-21T10:19:38.0312018Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r10202,%r10203,%r10204,%r10205}, %rd5, %p107, 1, 1; 2026-02-21T10:19:38.0312092Z // end inline asm 2026-02-21T10:19:38.0312156Z // begin inline asm 2026-02-21T10:19:38.0313615Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r10334,%r10335,%r10336,%r10337}, %rd6, %p107, 1, 1; 2026-02-21T10:19:38.0313676Z // end inline asm 2026-02-21T10:19:38.0313739Z // begin inline asm 2026-02-21T10:19:38.0315202Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r10466,%r10467,%r10468,%r10469}, %rd7, %p107, 1, 1; 2026-02-21T10:19:38.0315265Z // end inline asm 2026-02-21T10:19:38.0315323Z // begin inline asm 2026-02-21T10:19:38.0316939Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r10598,%r10599,%r10600,%r10601}, %rd8, %p107, 1, 1; 2026-02-21T10:19:38.0317138Z // end inline asm 2026-02-21T10:19:38.0317198Z // begin inline asm 2026-02-21T10:19:38.0318678Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r10730,%r10731,%r10732,%r10733}, %rd9, %p107, 1, 1; 2026-02-21T10:19:38.0318739Z // end inline asm 2026-02-21T10:19:38.0318798Z // begin inline asm 2026-02-21T10:19:38.0320392Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532}, {%r10862,%r10863,%r10864,%r10865}, %rd10, %p107, 1, 1; 2026-02-21T10:19:38.0320453Z // end inline asm 2026-02-21T10:19:38.0320510Z // begin inline asm 2026-02-21T10:19:38.0321969Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r10994,%r10995,%r10996,%r10997}, %rd3, %p107, 1, 1; 2026-02-21T10:19:38.0322030Z // end inline asm 2026-02-21T10:19:38.0322092Z // begin inline asm 2026-02-21T10:19:38.0323542Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r11126,%r11127,%r11128,%r11129}, %rd4, %p107, 1, 1; 2026-02-21T10:19:38.0323602Z // end inline asm 2026-02-21T10:19:38.0323664Z // begin inline asm 2026-02-21T10:19:38.0325127Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r11258,%r11259,%r11260,%r11261}, %rd5, %p107, 1, 1; 2026-02-21T10:19:38.0325189Z // end inline asm 2026-02-21T10:19:38.0325298Z // begin inline asm 2026-02-21T10:19:38.0326882Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r11390,%r11391,%r11392,%r11393}, %rd6, %p107, 1, 1; 2026-02-21T10:19:38.0327030Z // end inline asm 2026-02-21T10:19:38.0327089Z // begin inline asm 2026-02-21T10:19:38.0328601Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r11522,%r11523,%r11524,%r11525}, %rd7, %p107, 1, 1; 2026-02-21T10:19:38.0328670Z // end inline asm 2026-02-21T10:19:38.0328789Z // begin inline asm 2026-02-21T10:19:38.0330404Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r11654,%r11655,%r11656,%r11657}, %rd8, %p107, 1, 1; 2026-02-21T10:19:38.0330474Z // end inline asm 2026-02-21T10:19:38.0330535Z // begin inline asm 2026-02-21T10:19:38.0331998Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r11786,%r11787,%r11788,%r11789}, %rd9, %p107, 1, 1; 2026-02-21T10:19:38.0332055Z // end inline asm 2026-02-21T10:19:38.0332113Z // begin inline asm 2026-02-21T10:19:38.0333576Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596}, {%r11918,%r11919,%r11920,%r11921}, %rd10, %p107, 1, 1; 2026-02-21T10:19:38.0333637Z // end inline asm 2026-02-21T10:19:38.0333717Z wgmma.commit_group.sync.aligned; 2026-02-21T10:19:38.0333780Z mov.b32 %r12050, %r39931; 2026-02-21T10:19:38.0333838Z mov.b32 %r12052, %r12051; 2026-02-21T10:19:38.0333895Z // begin inline asm 2026-02-21T10:19:38.0336368Z // wait for regs: 
%r42469,%r42470,%r42471,%r42472,%r42473,%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r12050,%r12051,%r12052 2026-02-21T10:19:38.0336747Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:19:38.0336808Z // end inline asm 2026-02-21T10:19:38.0336877Z $L__tmp8: 2026-02-21T10:19:38.0337158Z .loc 1 48 93 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:48:93 2026-02-21T10:19:38.0337232Z add.s64 %rd842, %rd842, 128; 2026-02-21T10:19:38.0337303Z setp.lt.u64 %p125, %rd843, 4064; 2026-02-21T10:19:38.0337367Z @%p125 bra $L__BB0_5; 2026-02-21T10:19:38.0337557Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:19:38.0337775Z .loc 1 94 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:94:28 2026-02-21T10:19:38.0337864Z cvt.rn.bf16x2.f32 %r12254, %r42470, %r42469; 2026-02-21T10:19:38.0337946Z cvt.rn.bf16x2.f32 %r12255, %r42472, %r42471; 2026-02-21T10:19:38.0338023Z cvt.rn.bf16x2.f32 %r12256, %r42474, %r42473; 2026-02-21T10:19:38.0338098Z cvt.rn.bf16x2.f32 %r12257, %r42476, %r42475; 2026-02-21T10:19:38.0338173Z cvt.rn.bf16x2.f32 
%r12258, %r42478, %r42477; 2026-02-21T10:19:38.0338252Z cvt.rn.bf16x2.f32 %r12259, %r42480, %r42479; 2026-02-21T10:19:38.0338330Z cvt.rn.bf16x2.f32 %r12260, %r42482, %r42481; 2026-02-21T10:19:38.0338406Z cvt.rn.bf16x2.f32 %r12261, %r42484, %r42483; 2026-02-21T10:19:38.0338483Z cvt.rn.bf16x2.f32 %r12262, %r42486, %r42485; 2026-02-21T10:19:38.0338559Z cvt.rn.bf16x2.f32 %r12263, %r42488, %r42487; 2026-02-21T10:19:38.0338632Z cvt.rn.bf16x2.f32 %r12264, %r42490, %r42489; 2026-02-21T10:19:38.0338708Z cvt.rn.bf16x2.f32 %r12265, %r42492, %r42491; 2026-02-21T10:19:38.0338785Z cvt.rn.bf16x2.f32 %r12266, %r42494, %r42493; 2026-02-21T10:19:38.0338859Z cvt.rn.bf16x2.f32 %r12267, %r42496, %r42495; 2026-02-21T10:19:38.0338933Z cvt.rn.bf16x2.f32 %r12268, %r42498, %r42497; 2026-02-21T10:19:38.0339011Z cvt.rn.bf16x2.f32 %r12269, %r42500, %r42499; 2026-02-21T10:19:38.0339088Z cvt.rn.bf16x2.f32 %r12270, %r42502, %r42501; 2026-02-21T10:19:38.0339163Z cvt.rn.bf16x2.f32 %r12271, %r42504, %r42503; 2026-02-21T10:19:38.0339238Z cvt.rn.bf16x2.f32 %r12272, %r42506, %r42505; 2026-02-21T10:19:38.0339314Z cvt.rn.bf16x2.f32 %r12273, %r42508, %r42507; 2026-02-21T10:19:38.0339390Z cvt.rn.bf16x2.f32 %r12274, %r42510, %r42509; 2026-02-21T10:19:38.0339465Z cvt.rn.bf16x2.f32 %r12275, %r42512, %r42511; 2026-02-21T10:19:38.0339543Z cvt.rn.bf16x2.f32 %r12276, %r42514, %r42513; 2026-02-21T10:19:38.0339616Z cvt.rn.bf16x2.f32 %r12277, %r42516, %r42515; 2026-02-21T10:19:38.0339690Z cvt.rn.bf16x2.f32 %r12278, %r42518, %r42517; 2026-02-21T10:19:38.0339768Z cvt.rn.bf16x2.f32 %r12279, %r42520, %r42519; 2026-02-21T10:19:38.0339842Z cvt.rn.bf16x2.f32 %r12280, %r42522, %r42521; 2026-02-21T10:19:38.0339928Z cvt.rn.bf16x2.f32 %r12281, %r42524, %r42523; 2026-02-21T10:19:38.0340008Z cvt.rn.bf16x2.f32 %r12282, %r42526, %r42525; 2026-02-21T10:19:38.0340083Z cvt.rn.bf16x2.f32 %r12283, %r42528, %r42527; 2026-02-21T10:19:38.0340159Z cvt.rn.bf16x2.f32 %r12284, %r42530, %r42529; 2026-02-21T10:19:38.0340235Z cvt.rn.bf16x2.f32 
%r12285, %r42532, %r42531; 2026-02-21T10:19:38.0340402Z cvt.rn.bf16x2.f32 %r12286, %r42534, %r42533; 2026-02-21T10:19:38.0340526Z cvt.rn.bf16x2.f32 %r12287, %r42536, %r42535; 2026-02-21T10:19:38.0340600Z cvt.rn.bf16x2.f32 %r12288, %r42538, %r42537; 2026-02-21T10:19:38.0340678Z cvt.rn.bf16x2.f32 %r12289, %r42540, %r42539; 2026-02-21T10:19:38.0340752Z cvt.rn.bf16x2.f32 %r12290, %r42542, %r42541; 2026-02-21T10:19:38.0340827Z cvt.rn.bf16x2.f32 %r12291, %r42544, %r42543; 2026-02-21T10:19:38.0340906Z cvt.rn.bf16x2.f32 %r12292, %r42546, %r42545; 2026-02-21T10:19:38.0340981Z cvt.rn.bf16x2.f32 %r12293, %r42548, %r42547; 2026-02-21T10:19:38.0341054Z cvt.rn.bf16x2.f32 %r12294, %r42550, %r42549; 2026-02-21T10:19:38.0341129Z cvt.rn.bf16x2.f32 %r12295, %r42552, %r42551; 2026-02-21T10:19:38.0341206Z cvt.rn.bf16x2.f32 %r12296, %r42554, %r42553; 2026-02-21T10:19:38.0341280Z cvt.rn.bf16x2.f32 %r12297, %r42556, %r42555; 2026-02-21T10:19:38.0341352Z cvt.rn.bf16x2.f32 %r12298, %r42558, %r42557; 2026-02-21T10:19:38.0341429Z cvt.rn.bf16x2.f32 %r12299, %r42560, %r42559; 2026-02-21T10:19:38.0341505Z cvt.rn.bf16x2.f32 %r12300, %r42562, %r42561; 2026-02-21T10:19:38.0341580Z cvt.rn.bf16x2.f32 %r12301, %r42564, %r42563; 2026-02-21T10:19:38.0341704Z cvt.rn.bf16x2.f32 %r12302, %r42566, %r42565; 2026-02-21T10:19:38.0341781Z cvt.rn.bf16x2.f32 %r12303, %r42568, %r42567; 2026-02-21T10:19:38.0341858Z cvt.rn.bf16x2.f32 %r12304, %r42570, %r42569; 2026-02-21T10:19:38.0341932Z cvt.rn.bf16x2.f32 %r12305, %r42572, %r42571; 2026-02-21T10:19:38.0342062Z cvt.rn.bf16x2.f32 %r12306, %r42574, %r42573; 2026-02-21T10:19:38.0342138Z cvt.rn.bf16x2.f32 %r12307, %r42576, %r42575; 2026-02-21T10:19:38.0342213Z cvt.rn.bf16x2.f32 %r12308, %r42578, %r42577; 2026-02-21T10:19:38.0342290Z cvt.rn.bf16x2.f32 %r12309, %r42580, %r42579; 2026-02-21T10:19:38.0342365Z cvt.rn.bf16x2.f32 %r12310, %r42582, %r42581; 2026-02-21T10:19:38.0342440Z cvt.rn.bf16x2.f32 %r12311, %r42584, %r42583; 2026-02-21T10:19:38.0342521Z cvt.rn.bf16x2.f32 
%r12312, %r42586, %r42585; 2026-02-21T10:19:38.0342597Z cvt.rn.bf16x2.f32 %r12313, %r42588, %r42587; 2026-02-21T10:19:38.0342672Z cvt.rn.bf16x2.f32 %r12314, %r42590, %r42589; 2026-02-21T10:19:38.0342749Z cvt.rn.bf16x2.f32 %r12315, %r42592, %r42591; 2026-02-21T10:19:38.0342825Z cvt.rn.bf16x2.f32 %r12316, %r42594, %r42593; 2026-02-21T10:19:38.0342899Z cvt.rn.bf16x2.f32 %r12317, %r42596, %r42595; 2026-02-21T10:19:38.0343115Z .loc 1 95 43 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:95:43 2026-02-21T10:19:38.0343176Z bar.sync 0; 2026-02-21T10:19:38.0343374Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r35], {%r12254, %r12255, %r12256, %r12257}; 2026-02-21T10:19:38.0343565Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r36], {%r12270, %r12271, %r12272, %r12273}; 2026-02-21T10:19:38.0343752Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r37], {%r12286, %r12287, %r12288, %r12289}; 2026-02-21T10:19:38.0343933Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r38], {%r12302, %r12303, %r12304, %r12305}; 2026-02-21T10:19:38.0344114Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r39], {%r12258, %r12259, %r12260, %r12261}; 2026-02-21T10:19:38.0344299Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r40], {%r12274, %r12275, %r12276, %r12277}; 2026-02-21T10:19:38.0344480Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r41], {%r12290, %r12291, %r12292, %r12293}; 2026-02-21T10:19:38.0344661Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r42], {%r12306, %r12307, %r12308, %r12309}; 2026-02-21T10:19:38.0344841Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r43], {%r12262, %r12263, %r12264, %r12265}; 2026-02-21T10:19:38.0345024Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r44], {%r12278, %r12279, %r12280, %r12281}; 2026-02-21T10:19:38.0345202Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r45], {%r12294, %r12295, %r12296, %r12297}; 2026-02-21T10:19:38.0345382Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r46], {%r12310, %r12311, %r12312, %r12313}; 2026-02-21T10:19:38.0345565Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r47], {%r12266, %r12267, %r12268, %r12269}; 2026-02-21T10:19:38.0345802Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r48], {%r12282, %r12283, %r12284, %r12285}; 2026-02-21T10:19:38.0346060Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r49], {%r12298, %r12299, %r12300, %r12301}; 2026-02-21T10:19:38.0346246Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r50], {%r12314, %r12315, %r12316, %r12317}; 2026-02-21T10:19:38.0346307Z // begin inline asm 2026-02-21T10:19:38.0346392Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.0346581Z // end inline asm 2026-02-21T10:19:38.0346643Z bar.sync 0; 2026-02-21T10:19:38.0346715Z elect.sync %r12318|%p128, -1; 2026-02-21T10:19:38.0346799Z shfl.sync.idx.b32 %r12319, %r4, 0, 31, -1; 2026-02-21T10:19:38.0346872Z and.pred %p126, %p405, %p128; 2026-02-21T10:19:38.0346935Z and.b32 %r12320, %r12319, 1; 2026-02-21T10:19:38.0347006Z shl.b32 %r12321, %r12320, 14; 2026-02-21T10:19:38.0347077Z add.s32 %r22276, %r39931, %r12321; 2026-02-21T10:19:38.0347139Z shl.b32 %r639, %r12320, 6; 2026-02-21T10:19:38.0347204Z or.b32 %r12250, %r639, %r9804; 2026-02-21T10:19:38.0347265Z // begin inline asm 2026-02-21T10:19:38.0347597Z @%p126 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd497, {%r12250, %r12251}], [%r22276]; 2026-02-21T10:19:38.0347658Z // end inline asm 2026-02-21T10:19:38.0347732Z cp.async.bulk.commit_group; 2026-02-21T10:19:38.0347811Z cp.async.bulk.wait_group.read 0; 2026-02-21T10:19:38.0347866Z bar.sync 0; 2026-02-21T10:19:38.0348140Z .loc 1 28 112 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:28:112 2026-02-21T10:19:38.0348213Z add.s32 %r12323, %r42467, 4224; 2026-02-21T10:19:38.0348495Z .loc 1 34 35 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:34:35 2026-02-21T10:19:38.0348559Z shr.s32 %r12324, %r12323, 31; 2026-02-21T10:19:38.0348623Z shr.u32 %r12325, %r12324, 18; 2026-02-21T10:19:38.0348686Z add.s32 %r12326, %r12323, %r12325; 2026-02-21T10:19:38.0348745Z 
shr.s32 %r12327, %r12326, 14; 2026-02-21T10:19:38.0348941Z .loc 1 35 33 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:35:33 2026-02-21T10:19:38.0349010Z shl.b32 %r12328, %r12327, 5; 2026-02-21T10:19:38.0349202Z .loc 1 36 39 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:36:39 2026-02-21T10:19:38.0349264Z sub.s32 %r12329, 10, %r12328; 2026-02-21T10:19:38.0349457Z .loc 1 36 52 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:36:52 2026-02-21T10:19:38.0349516Z min.s32 %r12330, %r12329, 32; 2026-02-21T10:19:38.0349706Z .loc 1 37 45 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:37:45 2026-02-21T10:19:38.0349774Z and.b32 %r12331, %r12326, -16384; 2026-02-21T10:19:38.0349836Z sub.s32 %r12332, %r12323, %r12331; 2026-02-21T10:19:38.0350024Z .loc 1 38 51 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:38:51 2026-02-21T10:19:38.0350085Z div.s32 %r12333, %r12332, %r12330; 2026-02-21T10:19:38.0350280Z .loc 1 37 64 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:37:64 2026-02-21T10:19:38.0350352Z mul.lo.s32 %r12334, %r12333, %r12330; 2026-02-21T10:19:38.0350412Z sub.s32 %r12335, %r12332, %r12334; 2026-02-21T10:19:38.0350606Z .loc 1 37 30 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:37:30 2026-02-21T10:19:38.0350668Z add.s32 %r12336, %r12335, %r12328; 2026-02-21T10:19:38.0350859Z .loc 1 39 27 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:39:27 2026-02-21T10:19:38.0350922Z shl.b32 %r19829, %r12336, 7; 2026-02-21T10:19:38.0351109Z .loc 1 40 27 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:40:27 2026-02-21T10:19:38.0351168Z shl.b32 %r22275, %r12333, 7; 2026-02-21T10:19:38.0351360Z .loc 1 48 93 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:48:93 2026-02-21T10:19:38.0351498Z or.b32 %r12337, %r42459, %r22275; 2026-02-21T10:19:38.0351558Z shl.b32 %r12338, %r12337, 13; 2026-02-21T10:19:38.0351682Z mul.wide.s32 %rd45, %r12338, 2; 2026-02-21T10:19:38.0351747Z or.b32 
%r12339, %r42460, %r22275; 2026-02-21T10:19:38.0351805Z shl.b32 %r12340, %r12339, 13; 2026-02-21T10:19:38.0351868Z mul.wide.s32 %rd46, %r12340, 2; 2026-02-21T10:19:38.0351931Z or.b32 %r12341, %r42461, %r22275; 2026-02-21T10:19:38.0351991Z shl.b32 %r12342, %r12341, 13; 2026-02-21T10:19:38.0352055Z mul.wide.s32 %rd47, %r12342, 2; 2026-02-21T10:19:38.0352115Z or.b32 %r12343, %r42462, %r22275; 2026-02-21T10:19:38.0352176Z shl.b32 %r12344, %r12343, 13; 2026-02-21T10:19:38.0352239Z mul.wide.s32 %rd48, %r12344, 2; 2026-02-21T10:19:38.0352298Z or.b32 %r12345, %r42463, %r22275; 2026-02-21T10:19:38.0352360Z shl.b32 %r12346, %r12345, 13; 2026-02-21T10:19:38.0352423Z mul.wide.s32 %rd49, %r12346, 2; 2026-02-21T10:19:38.0352482Z or.b32 %r12347, %r42464, %r22275; 2026-02-21T10:19:38.0352545Z shl.b32 %r12348, %r12347, 13; 2026-02-21T10:19:38.0352608Z mul.wide.s32 %rd50, %r12348, 2; 2026-02-21T10:19:38.0352667Z shl.b32 %r12349, %r12333, 20; 2026-02-21T10:19:38.0352775Z or.b32 %r12350, %r42465, %r12349; 2026-02-21T10:19:38.0352842Z mul.wide.s32 %rd51, %r12350, 2; 2026-02-21T10:19:38.0352904Z or.b32 %r42725, %r67, %r12349; 2026-02-21T10:19:38.0352964Z or.b32 %r12351, %r42466, %r12349; 2026-02-21T10:19:38.0353033Z mul.wide.s32 %rd52, %r12351, 2; 2026-02-21T10:19:38.0353137Z mov.b32 %r42726, 0f00000000; 2026-02-21T10:19:38.0353200Z mov.b64 %rd845, -96; 2026-02-21T10:19:38.0353261Z mov.b64 %rd844, %rd11; 2026-02-21T10:19:38.0353324Z mov.b32 %r42727, %r42726; 2026-02-21T10:19:38.0353393Z mov.b32 %r42728, %r42726; 2026-02-21T10:19:38.0353453Z mov.b32 %r42729, %r42726; 2026-02-21T10:19:38.0353515Z mov.b32 %r42730, %r42726; 2026-02-21T10:19:38.0353573Z mov.b32 %r42731, %r42726; 2026-02-21T10:19:38.0353630Z mov.b32 %r42732, %r42726; 2026-02-21T10:19:38.0353687Z mov.b32 %r42733, %r42726; 2026-02-21T10:19:38.0353751Z mov.b32 %r42734, %r42726; 2026-02-21T10:19:38.0353811Z mov.b32 %r42735, %r42726; 2026-02-21T10:19:38.0353869Z mov.b32 %r42736, %r42726; 2026-02-21T10:19:38.0353931Z mov.b32 
%r42737, %r42726; 2026-02-21T10:19:38.0353990Z mov.b32 %r42738, %r42726; 2026-02-21T10:19:38.0354047Z mov.b32 %r42739, %r42726; 2026-02-21T10:19:38.0354105Z mov.b32 %r42740, %r42726; 2026-02-21T10:19:38.0354165Z mov.b32 %r42741, %r42726; 2026-02-21T10:19:38.0354224Z mov.b32 %r42742, %r42726; 2026-02-21T10:19:38.0354282Z mov.b32 %r42743, %r42726; 2026-02-21T10:19:38.0354342Z mov.b32 %r42744, %r42726; 2026-02-21T10:19:38.0354399Z mov.b32 %r42745, %r42726; 2026-02-21T10:19:38.0354456Z mov.b32 %r42746, %r42726; 2026-02-21T10:19:38.0354524Z mov.b32 %r42747, %r42726; 2026-02-21T10:19:38.0354580Z mov.b32 %r42748, %r42726; 2026-02-21T10:19:38.0354637Z mov.b32 %r42749, %r42726; 2026-02-21T10:19:38.0354695Z mov.b32 %r42750, %r42726; 2026-02-21T10:19:38.0354758Z mov.b32 %r42751, %r42726; 2026-02-21T10:19:38.0354818Z mov.b32 %r42752, %r42726; 2026-02-21T10:19:38.0354879Z mov.b32 %r42753, %r42726; 2026-02-21T10:19:38.0354940Z mov.b32 %r42754, %r42726; 2026-02-21T10:19:38.0354999Z mov.b32 %r42755, %r42726; 2026-02-21T10:19:38.0355058Z mov.b32 %r42756, %r42726; 2026-02-21T10:19:38.0355117Z mov.b32 %r42757, %r42726; 2026-02-21T10:19:38.0355177Z mov.b32 %r42758, %r42726; 2026-02-21T10:19:38.0355235Z mov.b32 %r42759, %r42726; 2026-02-21T10:19:38.0355294Z mov.b32 %r42760, %r42726; 2026-02-21T10:19:38.0355355Z mov.b32 %r42761, %r42726; 2026-02-21T10:19:38.0355412Z mov.b32 %r42762, %r42726; 2026-02-21T10:19:38.0355471Z mov.b32 %r42763, %r42726; 2026-02-21T10:19:38.0355527Z mov.b32 %r42764, %r42726; 2026-02-21T10:19:38.0355590Z mov.b32 %r42765, %r42726; 2026-02-21T10:19:38.0355646Z mov.b32 %r42766, %r42726; 2026-02-21T10:19:38.0355702Z mov.b32 %r42767, %r42726; 2026-02-21T10:19:38.0355763Z mov.b32 %r42768, %r42726; 2026-02-21T10:19:38.0355822Z mov.b32 %r42769, %r42726; 2026-02-21T10:19:38.0355952Z mov.b32 %r42770, %r42726; 2026-02-21T10:19:38.0356055Z mov.b32 %r42771, %r42726; 2026-02-21T10:19:38.0356118Z mov.b32 %r42772, %r42726; 2026-02-21T10:19:38.0356180Z mov.b32 %r42773, %r42726; 
2026-02-21T10:19:38.0356238Z mov.b32 %r42774, %r42726; 2026-02-21T10:19:38.0356298Z mov.b32 %r42775, %r42726; 2026-02-21T10:19:38.0356357Z mov.b32 %r42776, %r42726; 2026-02-21T10:19:38.0356413Z mov.b32 %r42777, %r42726; 2026-02-21T10:19:38.0356595Z mov.b32 %r42778, %r42726; 2026-02-21T10:19:38.0356663Z mov.b32 %r42779, %r42726; 2026-02-21T10:19:38.0356721Z mov.b32 %r42780, %r42726; 2026-02-21T10:19:38.0356778Z mov.b32 %r42781, %r42726; 2026-02-21T10:19:38.0356838Z mov.b32 %r42782, %r42726; 2026-02-21T10:19:38.0356896Z mov.b32 %r42783, %r42726; 2026-02-21T10:19:38.0356953Z mov.b32 %r42784, %r42726; 2026-02-21T10:19:38.0357022Z mov.b32 %r42785, %r42726; 2026-02-21T10:19:38.0357087Z mov.b32 %r42786, %r42726; 2026-02-21T10:19:38.0357144Z mov.b32 %r42787, %r42726; 2026-02-21T10:19:38.0357205Z mov.b32 %r42788, %r42726; 2026-02-21T10:19:38.0357268Z mov.b32 %r42789, %r42726; 2026-02-21T10:19:38.0357326Z mov.b32 %r42790, %r42726; 2026-02-21T10:19:38.0357459Z mov.b32 %r42791, %r42726; 2026-02-21T10:19:38.0357522Z mov.b32 %r42792, %r42726; 2026-02-21T10:19:38.0357580Z mov.b32 %r42793, %r42726; 2026-02-21T10:19:38.0357638Z mov.b32 %r42794, %r42726; 2026-02-21T10:19:38.0357694Z mov.b32 %r42795, %r42726; 2026-02-21T10:19:38.0357811Z mov.b32 %r42796, %r42726; 2026-02-21T10:19:38.0357881Z mov.b32 %r42797, %r42726; 2026-02-21T10:19:38.0357940Z mov.b32 %r42798, %r42726; 2026-02-21T10:19:38.0358002Z mov.b32 %r42799, %r42726; 2026-02-21T10:19:38.0358059Z mov.b32 %r42800, %r42726; 2026-02-21T10:19:38.0358116Z mov.b32 %r42801, %r42726; 2026-02-21T10:19:38.0358173Z mov.b32 %r42802, %r42726; 2026-02-21T10:19:38.0358234Z mov.b32 %r42803, %r42726; 2026-02-21T10:19:38.0358292Z mov.b32 %r42804, %r42726; 2026-02-21T10:19:38.0358349Z mov.b32 %r42805, %r42726; 2026-02-21T10:19:38.0358413Z mov.b32 %r42806, %r42726; 2026-02-21T10:19:38.0358471Z mov.b32 %r42807, %r42726; 2026-02-21T10:19:38.0358528Z mov.b32 %r42808, %r42726; 2026-02-21T10:19:38.0358587Z mov.b32 %r42809, %r42726; 
2026-02-21T10:19:38.0358646Z mov.b32 %r42810, %r42726; 2026-02-21T10:19:38.0358702Z mov.b32 %r42811, %r42726; 2026-02-21T10:19:38.0358759Z mov.b32 %r42812, %r42726; 2026-02-21T10:19:38.0358818Z mov.b32 %r42813, %r42726; 2026-02-21T10:19:38.0358876Z mov.b32 %r42814, %r42726; 2026-02-21T10:19:38.0358933Z mov.b32 %r42815, %r42726; 2026-02-21T10:19:38.0358990Z mov.b32 %r42816, %r42726; 2026-02-21T10:19:38.0359051Z mov.b32 %r42817, %r42726; 2026-02-21T10:19:38.0359109Z mov.b32 %r42818, %r42726; 2026-02-21T10:19:38.0359165Z mov.b32 %r42819, %r42726; 2026-02-21T10:19:38.0359228Z mov.b32 %r42820, %r42726; 2026-02-21T10:19:38.0359284Z mov.b32 %r42821, %r42726; 2026-02-21T10:19:38.0359341Z mov.b32 %r42822, %r42726; 2026-02-21T10:19:38.0359398Z mov.b32 %r42823, %r42726; 2026-02-21T10:19:38.0359460Z mov.b32 %r42824, %r42726; 2026-02-21T10:19:38.0359522Z mov.b32 %r42825, %r42726; 2026-02-21T10:19:38.0359592Z mov.b32 %r42826, %r42726; 2026-02-21T10:19:38.0359664Z mov.b32 %r42827, %r42726; 2026-02-21T10:19:38.0359721Z mov.b32 %r42828, %r42726; 2026-02-21T10:19:38.0359779Z mov.b32 %r42829, %r42726; 2026-02-21T10:19:38.0359835Z mov.b32 %r42830, %r42726; 2026-02-21T10:19:38.0359898Z mov.b32 %r42831, %r42726; 2026-02-21T10:19:38.0359957Z mov.b32 %r42832, %r42726; 2026-02-21T10:19:38.0360013Z mov.b32 %r42833, %r42726; 2026-02-21T10:19:38.0360073Z mov.b32 %r42834, %r42726; 2026-02-21T10:19:38.0360131Z mov.b32 %r42835, %r42726; 2026-02-21T10:19:38.0360188Z mov.b32 %r42836, %r42726; 2026-02-21T10:19:38.0360249Z mov.b32 %r42837, %r42726; 2026-02-21T10:19:38.0360306Z mov.b32 %r42838, %r42726; 2026-02-21T10:19:38.0360365Z mov.b32 %r42839, %r42726; 2026-02-21T10:19:38.0360420Z mov.b32 %r42840, %r42726; 2026-02-21T10:19:38.0360480Z mov.b32 %r42841, %r42726; 2026-02-21T10:19:38.0360624Z mov.b32 %r42842, %r42726; 2026-02-21T10:19:38.0360741Z mov.b32 %r42843, %r42726; 2026-02-21T10:19:38.0360801Z mov.b32 %r42844, %r42726; 2026-02-21T10:19:38.0360861Z mov.b32 %r42845, %r42726; 
2026-02-21T10:19:38.0360918Z mov.b32 %r42846, %r42726; 2026-02-21T10:19:38.0360975Z mov.b32 %r42847, %r42726; 2026-02-21T10:19:38.0361036Z mov.b32 %r42848, %r42726; 2026-02-21T10:19:38.0361093Z mov.b32 %r42849, %r42726; 2026-02-21T10:19:38.0361150Z mov.b32 %r42850, %r42726; 2026-02-21T10:19:38.0361211Z mov.b32 %r42851, %r42726; 2026-02-21T10:19:38.0361268Z mov.b32 %r42852, %r42726; 2026-02-21T10:19:38.0361329Z mov.b32 %r42853, %r42726; 2026-02-21T10:19:38.0361446Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T10:19:38.0361559Z // => This Inner Loop Header: Depth=2 2026-02-21T10:19:38.0361763Z .loc 1 55 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:32 2026-02-21T10:19:38.0361832Z add.s64 %rd333, %rd844, %rd51; 2026-02-21T10:19:38.0361915Z add.s64 %rd336, %rd844, %rd50; 2026-02-21T10:19:38.0362028Z add.s64 %rd339, %rd844, %rd49; 2026-02-21T10:19:38.0362092Z add.s64 %rd342, %rd844, %rd48; 2026-02-21T10:19:38.0362158Z add.s64 %rd345, %rd844, %rd47; 2026-02-21T10:19:38.0362219Z add.s64 %rd348, %rd844, %rd46; 2026-02-21T10:19:38.0362282Z add.s64 %rd351, %rd844, %rd45; 2026-02-21T10:19:38.0362527Z .loc 1 55 80 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:80 2026-02-21T10:19:38.0362603Z add.s64 %rd354, %rd844, %rd52; 2026-02-21T10:19:38.0362666Z // begin inline asm 2026-02-21T10:19:38.0362726Z mov.u64 %rd332, 0x0; 2026-02-21T10:19:38.0362862Z createpolicy.fractional.L2::evict_first.b64 %rd332, 1.0; 2026-02-21T10:19:38.0362921Z // end inline asm 2026-02-21T10:19:38.0362979Z // begin inline asm 2026-02-21T10:19:38.0363039Z mov.u32 %r12352, 0x0; 2026-02-21T10:19:38.0363103Z mov.u32 %r12353, 0x0; 2026-02-21T10:19:38.0363164Z mov.u32 %r12354, 0x0; 2026-02-21T10:19:38.0363223Z mov.u32 %r12355, 0x0; 2026-02-21T10:19:38.0363471Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r12352, %r12353, %r12354, %r12355 }, [ %rd333 + 0 ], %rd332; 2026-02-21T10:19:38.0363530Z // end inline asm 2026-02-21T10:19:38.0363588Z // begin inline asm 
2026-02-21T10:19:38.0363650Z mov.u64 %rd335, 0x0; 2026-02-21T10:19:38.0363772Z createpolicy.fractional.L2::evict_first.b64 %rd335, 1.0; 2026-02-21T10:19:38.0363833Z // end inline asm 2026-02-21T10:19:38.0363892Z // begin inline asm 2026-02-21T10:19:38.0363955Z mov.u32 %r12356, 0x0; 2026-02-21T10:19:38.0364012Z mov.u32 %r12357, 0x0; 2026-02-21T10:19:38.0364071Z mov.u32 %r12358, 0x0; 2026-02-21T10:19:38.0364131Z mov.u32 %r12359, 0x0; 2026-02-21T10:19:38.0364370Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r12356, %r12357, %r12358, %r12359 }, [ %rd336 + 0 ], %rd335; 2026-02-21T10:19:38.0364427Z // end inline asm 2026-02-21T10:19:38.0364491Z // begin inline asm 2026-02-21T10:19:38.0364552Z mov.u64 %rd338, 0x0; 2026-02-21T10:19:38.0364671Z createpolicy.fractional.L2::evict_first.b64 %rd338, 1.0; 2026-02-21T10:19:38.0364728Z // end inline asm 2026-02-21T10:19:38.0364794Z // begin inline asm 2026-02-21T10:19:38.0364854Z mov.u32 %r12360, 0x0; 2026-02-21T10:19:38.0364911Z mov.u32 %r12361, 0x0; 2026-02-21T10:19:38.0364971Z mov.u32 %r12362, 0x0; 2026-02-21T10:19:38.0365028Z mov.u32 %r12363, 0x0; 2026-02-21T10:19:38.0365252Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r12360, %r12361, %r12362, %r12363 }, [ %rd339 + 0 ], %rd338; 2026-02-21T10:19:38.0365310Z // end inline asm 2026-02-21T10:19:38.0365373Z // begin inline asm 2026-02-21T10:19:38.0365436Z mov.u64 %rd341, 0x0; 2026-02-21T10:19:38.0365559Z createpolicy.fractional.L2::evict_first.b64 %rd341, 1.0; 2026-02-21T10:19:38.0365619Z // end inline asm 2026-02-21T10:19:38.0365677Z // begin inline asm 2026-02-21T10:19:38.0365735Z mov.u32 %r12364, 0x0; 2026-02-21T10:19:38.0365796Z mov.u32 %r12365, 0x0; 2026-02-21T10:19:38.0365941Z mov.u32 %r12366, 0x0; 2026-02-21T10:19:38.0366044Z mov.u32 %r12367, 0x0; 2026-02-21T10:19:38.0366268Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r12364, %r12365, %r12366, %r12367 }, [ %rd342 + 0 ], %rd341; 2026-02-21T10:19:38.0366329Z // end inline asm 2026-02-21T10:19:38.0367606Z 
[3170s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config.
2026-02-21T10:19:38.0368905Z Config: @helion.kernel(config=helion.Config(block_sizes=[32, 128, 128], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['first', 'last'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=32, num_stages=4, num_warps=4, pid_type='persistent_interleaved', range_flattens=[True, None], range_multi_buffers=[None, True], range_num_stages=[2, 3], range_unroll_factors=[3, 3], range_warp_specializes=[]), static_shapes=True)
2026-02-21T10:19:38.0369053Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error
2026-02-21T10:19:38.0369115Z `ptxas` stderr:
2026-02-21T10:19:38.0369670Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 1161 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher.
2026-02-21T10:19:38.0369781Z ptxas fatal : Ptx assembly aborted due to errors
2026-02-21T10:19:38.0369787Z
2026-02-21T10:19:38.0370351Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp1igmrbwc.ptx -o /tmp/tmp1igmrbwc.ptx.o
2026-02-21T10:19:38.0370357Z
2026-02-21T10:19:38.0370515Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code.
2026-02-21T10:19:38.0370578Z // begin inline asm 2026-02-21T10:19:38.0370638Z mov.u64 %rd344, 0x0; 2026-02-21T10:19:38.0370766Z createpolicy.fractional.L2::evict_first.b64 %rd344, 1.0; 2026-02-21T10:19:38.0370825Z // end inline asm 2026-02-21T10:19:38.0370883Z // begin inline asm 2026-02-21T10:19:38.0370944Z mov.u32 %r12368, 0x0; 2026-02-21T10:19:38.0371006Z mov.u32 %r12369, 0x0; 2026-02-21T10:19:38.0371064Z mov.u32 %r12370, 0x0; 2026-02-21T10:19:38.0371121Z mov.u32 %r12371, 0x0; 2026-02-21T10:19:38.0371362Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r12368, %r12369, %r12370, %r12371 }, [ %rd345 + 0 ], %rd344; 2026-02-21T10:19:38.0371419Z // end inline asm 2026-02-21T10:19:38.0371477Z // begin inline asm 2026-02-21T10:19:38.0371539Z mov.u64 %rd347, 0x0; 2026-02-21T10:19:38.0371662Z createpolicy.fractional.L2::evict_first.b64 %rd347, 1.0; 2026-02-21T10:19:38.0371721Z // end inline asm 2026-02-21T10:19:38.0371778Z // begin inline asm 2026-02-21T10:19:38.0371839Z mov.u32 %r12372, 0x0; 2026-02-21T10:19:38.0371896Z mov.u32 %r12373, 0x0; 2026-02-21T10:19:38.0371953Z mov.u32 %r12374, 0x0; 2026-02-21T10:19:38.0372013Z mov.u32 %r12375, 0x0; 2026-02-21T10:19:38.0372240Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r12372, %r12373, %r12374, %r12375 }, [ %rd348 + 0 ], %rd347; 2026-02-21T10:19:38.0372299Z // end inline asm 2026-02-21T10:19:38.0372357Z // begin inline asm 2026-02-21T10:19:38.0372436Z mov.u64 %rd350, 0x0; 2026-02-21T10:19:38.0372557Z createpolicy.fractional.L2::evict_first.b64 %rd350, 1.0; 2026-02-21T10:19:38.0372614Z // end inline asm 2026-02-21T10:19:38.0372677Z // begin inline asm 2026-02-21T10:19:38.0372735Z mov.u32 %r12376, 0x0; 2026-02-21T10:19:38.0372793Z mov.u32 %r12377, 0x0; 2026-02-21T10:19:38.0372853Z mov.u32 %r12378, 0x0; 2026-02-21T10:19:38.0372912Z mov.u32 %r12379, 0x0; 2026-02-21T10:19:38.0373136Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r12376, %r12377, %r12378, %r12379 }, [ %rd351 + 0 ], %rd350; 
2026-02-21T10:19:38.0373194Z // end inline asm 2026-02-21T10:19:38.0373257Z // begin inline asm 2026-02-21T10:19:38.0373315Z mov.u64 %rd353, 0x0; 2026-02-21T10:19:38.0373435Z createpolicy.fractional.L2::evict_first.b64 %rd353, 1.0; 2026-02-21T10:19:38.0373503Z // end inline asm 2026-02-21T10:19:38.0373567Z // begin inline asm 2026-02-21T10:19:38.0373701Z mov.u32 %r12380, 0x0; 2026-02-21T10:19:38.0373760Z mov.u32 %r12381, 0x0; 2026-02-21T10:19:38.0373885Z mov.u32 %r12382, 0x0; 2026-02-21T10:19:38.0373942Z mov.u32 %r12383, 0x0; 2026-02-21T10:19:38.0374178Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r12380, %r12381, %r12382, %r12383 }, [ %rd354 + 0 ], %rd353; 2026-02-21T10:19:38.0374240Z // end inline asm 2026-02-21T10:19:38.0374458Z .loc 1 59 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:59:32 2026-02-21T10:19:38.0374516Z bar.sync 0; 2026-02-21T10:19:38.0374603Z st.shared.v2.b32 [%r9], {%r12352, %r12353}; 2026-02-21T10:19:38.0374695Z st.shared.v2.b32 [%r9+2048], {%r12356, %r12357}; 2026-02-21T10:19:38.0374781Z st.shared.v2.b32 [%r9+4096], {%r12360, %r12361}; 2026-02-21T10:19:38.0374869Z st.shared.v2.b32 [%r9+6144], {%r12364, %r12365}; 2026-02-21T10:19:38.0374953Z st.shared.v2.b32 [%r9+8192], {%r12368, %r12369}; 2026-02-21T10:19:38.0375043Z st.shared.v2.b32 [%r9+10240], {%r12372, %r12373}; 2026-02-21T10:19:38.0375132Z st.shared.v2.b32 [%r9+12288], {%r12376, %r12377}; 2026-02-21T10:19:38.0375227Z st.shared.v2.b32 [%r9+14336], {%r12380, %r12381}; 2026-02-21T10:19:38.0375367Z st.shared.v2.b32 [%r10], {%r12354, %r12355}; 2026-02-21T10:19:38.0375457Z st.shared.v2.b32 [%r10+2048], {%r12358, %r12359}; 2026-02-21T10:19:38.0375544Z st.shared.v2.b32 [%r10+4096], {%r12362, %r12363}; 2026-02-21T10:19:38.0375628Z st.shared.v2.b32 [%r10+6144], {%r12366, %r12367}; 2026-02-21T10:19:38.0375753Z st.shared.v2.b32 [%r10+8192], {%r12370, %r12371}; 2026-02-21T10:19:38.0375851Z st.shared.v2.b32 [%r10+10240], {%r12374, %r12375}; 2026-02-21T10:19:38.0375939Z 
st.shared.v2.b32 [%r10+12288], {%r12378, %r12379}; 2026-02-21T10:19:38.0376026Z st.shared.v2.b32 [%r10+14336], {%r12382, %r12383}; 2026-02-21T10:19:38.0376082Z bar.sync 0; 2026-02-21T10:19:38.0376160Z ld.shared.b16 %rs897, [%r51]; 2026-02-21T10:19:38.0376229Z ld.shared.b16 %rs898, [%r51+1024]; 2026-02-21T10:19:38.0376297Z ld.shared.b16 %rs899, [%r51+64]; 2026-02-21T10:19:38.0376370Z ld.shared.b16 %rs900, [%r51+1088]; 2026-02-21T10:19:38.0376437Z ld.shared.b16 %rs901, [%r51+8192]; 2026-02-21T10:19:38.0376629Z ld.shared.b16 %rs902, [%r51+9216]; 2026-02-21T10:19:38.0376696Z ld.shared.b16 %rs903, [%r51+8256]; 2026-02-21T10:19:38.0376761Z ld.shared.b16 %rs904, [%r51+9280]; 2026-02-21T10:19:38.0376829Z ld.shared.b16 %rs905, [%r52]; 2026-02-21T10:19:38.0376903Z ld.shared.b16 %rs906, [%r52+1024]; 2026-02-21T10:19:38.0376978Z ld.shared.b16 %rs907, [%r52+64]; 2026-02-21T10:19:38.0377041Z ld.shared.b16 %rs908, [%r52+1088]; 2026-02-21T10:19:38.0377106Z ld.shared.b16 %rs909, [%r52+8192]; 2026-02-21T10:19:38.0377169Z ld.shared.b16 %rs910, [%r52+9216]; 2026-02-21T10:19:38.0377234Z ld.shared.b16 %rs911, [%r52+8256]; 2026-02-21T10:19:38.0377296Z ld.shared.b16 %rs912, [%r52+9280]; 2026-02-21T10:19:38.0377359Z ld.shared.b16 %rs913, [%r53]; 2026-02-21T10:19:38.0377427Z ld.shared.b16 %rs914, [%r53+1024]; 2026-02-21T10:19:38.0377494Z ld.shared.b16 %rs915, [%r53+64]; 2026-02-21T10:19:38.0377557Z ld.shared.b16 %rs916, [%r53+1088]; 2026-02-21T10:19:38.0377624Z ld.shared.b16 %rs917, [%r53+8192]; 2026-02-21T10:19:38.0377687Z ld.shared.b16 %rs918, [%r53+9216]; 2026-02-21T10:19:38.0377751Z ld.shared.b16 %rs919, [%r53+8256]; 2026-02-21T10:19:38.0377814Z ld.shared.b16 %rs920, [%r53+9280]; 2026-02-21T10:19:38.0377881Z ld.shared.b16 %rs921, [%r54]; 2026-02-21T10:19:38.0377945Z ld.shared.b16 %rs922, [%r54+1024]; 2026-02-21T10:19:38.0378007Z ld.shared.b16 %rs923, [%r54+64]; 2026-02-21T10:19:38.0378072Z ld.shared.b16 %rs924, [%r54+1088]; 2026-02-21T10:19:38.0378133Z ld.shared.b16 %rs925, [%r54+8192]; 
2026-02-21T10:19:38.0378195Z ld.shared.b16 %rs926, [%r54+9216]; 2026-02-21T10:19:38.0378258Z ld.shared.b16 %rs927, [%r54+8256]; 2026-02-21T10:19:38.0378325Z ld.shared.b16 %rs928, [%r54+9280]; 2026-02-21T10:19:38.0378388Z ld.shared.b16 %rs929, [%r55]; 2026-02-21T10:19:38.0378451Z ld.shared.b16 %rs930, [%r55+1024]; 2026-02-21T10:19:38.0378603Z ld.shared.b16 %rs931, [%r55+64]; 2026-02-21T10:19:38.0378736Z ld.shared.b16 %rs932, [%r55+1088]; 2026-02-21T10:19:38.0378801Z ld.shared.b16 %rs933, [%r55+8192]; 2026-02-21T10:19:38.0378870Z ld.shared.b16 %rs934, [%r55+9216]; 2026-02-21T10:19:38.0378935Z ld.shared.b16 %rs935, [%r55+8256]; 2026-02-21T10:19:38.0378997Z ld.shared.b16 %rs936, [%r55+9280]; 2026-02-21T10:19:38.0379061Z ld.shared.b16 %rs937, [%r56]; 2026-02-21T10:19:38.0379131Z ld.shared.b16 %rs938, [%r56+1024]; 2026-02-21T10:19:38.0379196Z ld.shared.b16 %rs939, [%r56+64]; 2026-02-21T10:19:38.0379261Z ld.shared.b16 %rs940, [%r56+1088]; 2026-02-21T10:19:38.0379334Z ld.shared.b16 %rs941, [%r56+8192]; 2026-02-21T10:19:38.0379399Z ld.shared.b16 %rs942, [%r56+9216]; 2026-02-21T10:19:38.0379462Z ld.shared.b16 %rs943, [%r56+8256]; 2026-02-21T10:19:38.0379526Z ld.shared.b16 %rs944, [%r56+9280]; 2026-02-21T10:19:38.0379593Z ld.shared.b16 %rs945, [%r57]; 2026-02-21T10:19:38.0379657Z ld.shared.b16 %rs946, [%r57+1024]; 2026-02-21T10:19:38.0379724Z ld.shared.b16 %rs947, [%r57+64]; 2026-02-21T10:19:38.0379793Z ld.shared.b16 %rs948, [%r57+1088]; 2026-02-21T10:19:38.0379937Z ld.shared.b16 %rs949, [%r57+8192]; 2026-02-21T10:19:38.0380006Z ld.shared.b16 %rs950, [%r57+9216]; 2026-02-21T10:19:38.0380072Z ld.shared.b16 %rs951, [%r57+8256]; 2026-02-21T10:19:38.0380141Z ld.shared.b16 %rs952, [%r57+9280]; 2026-02-21T10:19:38.0380205Z ld.shared.b16 %rs953, [%r58]; 2026-02-21T10:19:38.0380324Z ld.shared.b16 %rs954, [%r58+1024]; 2026-02-21T10:19:38.0380394Z ld.shared.b16 %rs955, [%r58+64]; 2026-02-21T10:19:38.0380457Z ld.shared.b16 %rs956, [%r58+1088]; 2026-02-21T10:19:38.0380520Z ld.shared.b16 
%rs957, [%r58+8192]; 2026-02-21T10:19:38.0380586Z ld.shared.b16 %rs958, [%r58+9216]; 2026-02-21T10:19:38.0380647Z ld.shared.b16 %rs959, [%r58+8256]; 2026-02-21T10:19:38.0380710Z ld.shared.b16 %rs960, [%r58+9280]; 2026-02-21T10:19:38.0380771Z cvt.f32.bf16 %r12521, %rs897; 2026-02-21T10:19:38.0380835Z cvt.f32.bf16 %r12522, %rs898; 2026-02-21T10:19:38.0380898Z cvt.f32.bf16 %r12523, %rs905; 2026-02-21T10:19:38.0380959Z cvt.f32.bf16 %r12524, %rs906; 2026-02-21T10:19:38.0381037Z cvt.f32.bf16 %r12653, %rs913; 2026-02-21T10:19:38.0381099Z cvt.f32.bf16 %r12654, %rs914; 2026-02-21T10:19:38.0381160Z cvt.f32.bf16 %r12655, %rs921; 2026-02-21T10:19:38.0381221Z cvt.f32.bf16 %r12656, %rs922; 2026-02-21T10:19:38.0381286Z cvt.f32.bf16 %r12785, %rs929; 2026-02-21T10:19:38.0381348Z cvt.f32.bf16 %r12786, %rs930; 2026-02-21T10:19:38.0381409Z cvt.f32.bf16 %r12787, %rs937; 2026-02-21T10:19:38.0381475Z cvt.f32.bf16 %r12788, %rs938; 2026-02-21T10:19:38.0381536Z cvt.f32.bf16 %r12917, %rs945; 2026-02-21T10:19:38.0381598Z cvt.f32.bf16 %r12918, %rs946; 2026-02-21T10:19:38.0381657Z cvt.f32.bf16 %r12919, %rs953; 2026-02-21T10:19:38.0381721Z cvt.f32.bf16 %r12920, %rs954; 2026-02-21T10:19:38.0381781Z cvt.f32.bf16 %r13049, %rs899; 2026-02-21T10:19:38.0381839Z cvt.f32.bf16 %r13050, %rs900; 2026-02-21T10:19:38.0381909Z cvt.f32.bf16 %r13051, %rs907; 2026-02-21T10:19:38.0381969Z cvt.f32.bf16 %r13052, %rs908; 2026-02-21T10:19:38.0382031Z cvt.f32.bf16 %r13181, %rs915; 2026-02-21T10:19:38.0382098Z cvt.f32.bf16 %r13182, %rs916; 2026-02-21T10:19:38.0382157Z cvt.f32.bf16 %r13183, %rs923; 2026-02-21T10:19:38.0382216Z cvt.f32.bf16 %r13184, %rs924; 2026-02-21T10:19:38.0382277Z cvt.f32.bf16 %r13313, %rs931; 2026-02-21T10:19:38.0382338Z cvt.f32.bf16 %r13314, %rs932; 2026-02-21T10:19:38.0382399Z cvt.f32.bf16 %r13315, %rs939; 2026-02-21T10:19:38.0382459Z cvt.f32.bf16 %r13316, %rs940; 2026-02-21T10:19:38.0382521Z cvt.f32.bf16 %r13445, %rs947; 2026-02-21T10:19:38.0382579Z cvt.f32.bf16 %r13446, %rs948; 
2026-02-21T10:19:38.0382638Z cvt.f32.bf16 %r13447, %rs955; 2026-02-21T10:19:38.0382698Z cvt.f32.bf16 %r13448, %rs956; 2026-02-21T10:19:38.0382762Z cvt.f32.bf16 %r13577, %rs901; 2026-02-21T10:19:38.0382822Z cvt.f32.bf16 %r13578, %rs902; 2026-02-21T10:19:38.0382895Z cvt.f32.bf16 %r13579, %rs909; 2026-02-21T10:19:38.0383019Z cvt.f32.bf16 %r13580, %rs910; 2026-02-21T10:19:38.0383081Z cvt.f32.bf16 %r13709, %rs917; 2026-02-21T10:19:38.0383187Z cvt.f32.bf16 %r13710, %rs918; 2026-02-21T10:19:38.0383250Z cvt.f32.bf16 %r13711, %rs925; 2026-02-21T10:19:38.0383313Z cvt.f32.bf16 %r13712, %rs926; 2026-02-21T10:19:38.0383372Z cvt.f32.bf16 %r13841, %rs933; 2026-02-21T10:19:38.0383433Z cvt.f32.bf16 %r13842, %rs934; 2026-02-21T10:19:38.0383496Z cvt.f32.bf16 %r13843, %rs941; 2026-02-21T10:19:38.0383557Z cvt.f32.bf16 %r13844, %rs942; 2026-02-21T10:19:38.0383618Z cvt.f32.bf16 %r13973, %rs949; 2026-02-21T10:19:38.0383677Z cvt.f32.bf16 %r13974, %rs950; 2026-02-21T10:19:38.0383740Z cvt.f32.bf16 %r13975, %rs957; 2026-02-21T10:19:38.0383798Z cvt.f32.bf16 %r13976, %rs958; 2026-02-21T10:19:38.0383858Z cvt.f32.bf16 %r14105, %rs903; 2026-02-21T10:19:38.0383921Z cvt.f32.bf16 %r14106, %rs904; 2026-02-21T10:19:38.0383981Z cvt.f32.bf16 %r14107, %rs911; 2026-02-21T10:19:38.0384041Z cvt.f32.bf16 %r14108, %rs912; 2026-02-21T10:19:38.0384106Z cvt.f32.bf16 %r14237, %rs919; 2026-02-21T10:19:38.0384169Z cvt.f32.bf16 %r14238, %rs920; 2026-02-21T10:19:38.0384229Z cvt.f32.bf16 %r14239, %rs927; 2026-02-21T10:19:38.0384338Z cvt.f32.bf16 %r14240, %rs928; 2026-02-21T10:19:38.0384406Z cvt.f32.bf16 %r14369, %rs935; 2026-02-21T10:19:38.0384466Z cvt.f32.bf16 %r14370, %rs936; 2026-02-21T10:19:38.0384534Z cvt.f32.bf16 %r14371, %rs943; 2026-02-21T10:19:38.0384596Z cvt.f32.bf16 %r14372, %rs944; 2026-02-21T10:19:38.0384730Z cvt.f32.bf16 %r14501, %rs951; 2026-02-21T10:19:38.0384795Z cvt.f32.bf16 %r14502, %rs952; 2026-02-21T10:19:38.0384858Z cvt.f32.bf16 %r14503, %rs959; 2026-02-21T10:19:38.0384922Z cvt.f32.bf16 %r14504, 
%rs960; 2026-02-21T10:19:38.0385132Z .loc 1 61 33 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:61:33 2026-02-21T10:19:38.0385189Z bar.sync 0; 2026-02-21T10:19:38.0385253Z // begin inline asm 2026-02-21T10:19:38.0385353Z @%p313 mbarrier.init.shared::cta.b64 [%r29846], 1; 2026-02-21T10:19:38.0385414Z // end inline asm 2026-02-21T10:19:38.0385471Z bar.sync 0; 2026-02-21T10:19:38.0385535Z // begin inline asm 2026-02-21T10:19:38.0385674Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r29846], 4096; 2026-02-21T10:19:38.0385733Z // end inline asm 2026-02-21T10:19:38.0385795Z // begin inline asm 2026-02-21T10:19:38.0385868Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.0385924Z // end inline asm 2026-02-21T10:19:38.0385984Z bar.sync 0; 2026-02-21T10:19:38.0386052Z elect.sync %r19597|%p190, -1; 2026-02-21T10:19:38.0386121Z and.pred %p131, %p1, %p190; 2026-02-21T10:19:38.0386184Z add.s64 %rd55, %rd845, 96; 2026-02-21T10:19:38.0386250Z cvt.u32.u64 %r12388, %rd55; 2026-02-21T10:19:38.0386308Z // begin inline asm 2026-02-21T10:19:38.0386776Z @%p131 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39931], [%rd822, {%r19829, %r12388}], [%r29846]; 2026-02-21T10:19:38.0386847Z // end inline asm 2026-02-21T10:19:38.0386911Z bar.sync 0; 2026-02-21T10:19:38.0386980Z mov.b32 %r19465, 0; 2026-02-21T10:19:38.0387045Z // begin inline asm 2026-02-21T10:19:38.0387101Z 2026-02-21T10:19:38.0387152Z { 2026-02-21T10:19:38.0387217Z .reg .pred complete; 2026-02-21T10:19:38.0387277Z waitLoop: 2026-02-21T10:19:38.0387426Z mbarrier.try_wait.parity.shared.b64 complete, [%r29846], %r19465; 2026-02-21T10:19:38.0387495Z @!complete bra.uni waitLoop; 2026-02-21T10:19:38.0387547Z } 2026-02-21T10:19:38.0387556Z 2026-02-21T10:19:38.0387614Z // end inline asm 2026-02-21T10:19:38.0387670Z bar.sync 0; 2026-02-21T10:19:38.0387727Z // begin inline asm 2026-02-21T10:19:38.0387830Z @%p313 mbarrier.inval.shared::cta.b64 [%r29846]; 2026-02-21T10:19:38.0387889Z // end 
inline asm 2026-02-21T10:19:38.0388093Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0388169Z ld.shared.s8 %rs961, [%r19]; 2026-02-21T10:19:38.0388363Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0388632Z shl.b16 %rs962, %rs961, 4; 2026-02-21T10:19:38.0388858Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0388930Z ld.shared.s8 %rs963, [%r20+128]; 2026-02-21T10:19:38.0389134Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0389200Z shl.b16 %rs964, %rs963, 4; 2026-02-21T10:19:38.0389406Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0389474Z ld.shared.s8 %rs965, [%r21+256]; 2026-02-21T10:19:38.0389673Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0389739Z shl.b16 %rs966, %rs965, 4; 2026-02-21T10:19:38.0389932Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0390000Z ld.shared.s8 %rs967, [%r22+384]; 2026-02-21T10:19:38.0390201Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0390330Z shl.b16 %rs968, %rs967, 4; 2026-02-21T10:19:38.0390528Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0390613Z ld.shared.s8 %rs969, [%r23+512]; 2026-02-21T10:19:38.0390864Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0390929Z shl.b16 %rs970, %rs969, 4; 2026-02-21T10:19:38.0391119Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0391190Z ld.shared.s8 %rs971, [%r24+640]; 2026-02-21T10:19:38.0391377Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0391439Z shl.b16 %rs972, %rs971, 4; 
2026-02-21T10:19:38.0391634Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0391702Z ld.shared.s8 %rs973, [%r25+768]; 2026-02-21T10:19:38.0391892Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0391958Z shl.b16 %rs974, %rs973, 4; 2026-02-21T10:19:38.0392148Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0392212Z ld.shared.s8 %rs975, [%r26+896]; 2026-02-21T10:19:38.0392404Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0392465Z shl.b16 %rs976, %rs975, 4; 2026-02-21T10:19:38.0392657Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0392726Z ld.shared.s8 %rs977, [%r19+1024]; 2026-02-21T10:19:38.0392926Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0392991Z shl.b16 %rs978, %rs977, 4; 2026-02-21T10:19:38.0393187Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0393268Z ld.shared.s8 %rs979, [%r20+1152]; 2026-02-21T10:19:38.0393464Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0393527Z shl.b16 %rs980, %rs979, 4; 2026-02-21T10:19:38.0393723Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0393787Z ld.shared.s8 %rs981, [%r21+1280]; 2026-02-21T10:19:38.0393978Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0394042Z shl.b16 %rs982, %rs981, 4; 2026-02-21T10:19:38.0394230Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0394363Z ld.shared.s8 %rs983, [%r22+1408]; 2026-02-21T10:19:38.0394609Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0394673Z shl.b16 %rs984, %rs983, 4; 
2026-02-21T10:19:38.0394861Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0394925Z ld.shared.s8 %rs985, [%r23+1536]; 2026-02-21T10:19:38.0395120Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0395180Z shl.b16 %rs986, %rs985, 4; 2026-02-21T10:19:38.0395370Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0395438Z ld.shared.s8 %rs987, [%r24+1664]; 2026-02-21T10:19:38.0395627Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0395689Z shl.b16 %rs988, %rs987, 4; 2026-02-21T10:19:38.0395884Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0395996Z ld.shared.s8 %rs989, [%r25+1792]; 2026-02-21T10:19:38.0396189Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0396249Z shl.b16 %rs990, %rs989, 4; 2026-02-21T10:19:38.0396625Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0396703Z ld.shared.s8 %rs991, [%r26+1920]; 2026-02-21T10:19:38.0396896Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0396961Z shl.b16 %rs992, %rs991, 4; 2026-02-21T10:19:38.0397166Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0397233Z ld.shared.s8 %rs993, [%r19+2048]; 2026-02-21T10:19:38.0397430Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0397492Z shl.b16 %rs994, %rs993, 4; 2026-02-21T10:19:38.0397683Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0397751Z ld.shared.s8 %rs995, [%r20+2176]; 2026-02-21T10:19:38.0397939Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0398002Z shl.b16 %rs996, %rs995, 4; 
2026-02-21T10:19:38.0398191Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0398260Z ld.shared.s8 %rs997, [%r21+2304]; 2026-02-21T10:19:38.0398451Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0398511Z shl.b16 %rs998, %rs997, 4; 2026-02-21T10:19:38.0398702Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0398769Z ld.shared.s8 %rs999, [%r22+2432]; 2026-02-21T10:19:38.0398960Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0399029Z shl.b16 %rs1000, %rs999, 4; 2026-02-21T10:19:38.0399219Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0399287Z ld.shared.s8 %rs1001, [%r23+2560]; 2026-02-21T10:19:38.0399482Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0399545Z shl.b16 %rs1002, %rs1001, 4; 2026-02-21T10:19:38.0399733Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0399799Z ld.shared.s8 %rs1003, [%r24+2688]; 2026-02-21T10:19:38.0399993Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0400139Z shl.b16 %rs1004, %rs1003, 4; 2026-02-21T10:19:38.0400333Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0400472Z ld.shared.s8 %rs1005, [%r25+2816]; 2026-02-21T10:19:38.0400664Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0400728Z shl.b16 %rs1006, %rs1005, 4; 2026-02-21T10:19:38.0400924Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0400989Z ld.shared.s8 %rs1007, [%r26+2944]; 2026-02-21T10:19:38.0401179Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0401249Z shl.b16 %rs1008, 
%rs1007, 4; 2026-02-21T10:19:38.0401438Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0401502Z ld.shared.s8 %rs1009, [%r19+3072]; 2026-02-21T10:19:38.0401694Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0401765Z shl.b16 %rs1010, %rs1009, 4; 2026-02-21T10:19:38.0402016Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0402083Z ld.shared.s8 %rs1011, [%r20+3200]; 2026-02-21T10:19:38.0402322Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0402385Z shl.b16 %rs1012, %rs1011, 4; 2026-02-21T10:19:38.0402575Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0402643Z ld.shared.s8 %rs1013, [%r21+3328]; 2026-02-21T10:19:38.0402833Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0402895Z shl.b16 %rs1014, %rs1013, 4; 2026-02-21T10:19:38.0403089Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0403159Z ld.shared.s8 %rs1015, [%r22+3456]; 2026-02-21T10:19:38.0403349Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0403417Z shl.b16 %rs1016, %rs1015, 4; 2026-02-21T10:19:38.0403606Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0403672Z ld.shared.s8 %rs1017, [%r23+3584]; 2026-02-21T10:19:38.0403860Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0403927Z shl.b16 %rs1018, %rs1017, 4; 2026-02-21T10:19:38.0404115Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0404179Z ld.shared.s8 %rs1019, [%r24+3712]; 2026-02-21T10:19:38.0404392Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0404458Z 
shl.b16 %rs1020, %rs1019, 4; 2026-02-21T10:19:38.0404651Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0404720Z ld.shared.s8 %rs1021, [%r25+3840]; 2026-02-21T10:19:38.0404909Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0404973Z shl.b16 %rs1022, %rs1021, 4; 2026-02-21T10:19:38.0405168Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0405234Z ld.shared.s8 %rs1023, [%r26+3968]; 2026-02-21T10:19:38.0405423Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0405485Z shl.b16 %rs1024, %rs1023, 4; 2026-02-21T10:19:38.0405678Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0405813Z cvt.s16.s8 %rs1025, %rs962; 2026-02-21T10:19:38.0405878Z shr.s16 %rs1026, %rs1025, 4; 2026-02-21T10:19:38.0405991Z cvt.s16.s8 %rs1027, %rs964; 2026-02-21T10:19:38.0406053Z shr.s16 %rs1028, %rs1027, 4; 2026-02-21T10:19:38.0406113Z shr.s16 %rs1029, %rs961, 4; 2026-02-21T10:19:38.0406178Z shr.s16 %rs1030, %rs963, 4; 2026-02-21T10:19:38.0406369Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0406437Z cvt.rn.f32.s16 %r19598, %rs1030; 2026-02-21T10:19:38.0406617Z cvt.rn.f32.s16 %r19599, %rs1029; 2026-02-21T10:19:38.0406687Z cvt.rn.f32.s16 %r19600, %rs1028; 2026-02-21T10:19:38.0406749Z cvt.rn.f32.s16 %r19601, %rs1026; 2026-02-21T10:19:38.0406953Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0407024Z cvt.s16.s8 %rs1031, %rs966; 2026-02-21T10:19:38.0407085Z shr.s16 %rs1032, %rs1031, 4; 2026-02-21T10:19:38.0407146Z cvt.s16.s8 %rs1033, %rs968; 2026-02-21T10:19:38.0407209Z shr.s16 %rs1034, %rs1033, 4; 2026-02-21T10:19:38.0407276Z shr.s16 %rs1035, %rs965, 4; 2026-02-21T10:19:38.0407338Z shr.s16 %rs1036, %rs967, 4; 2026-02-21T10:19:38.0407621Z .loc 1 84 32 
// cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0407695Z cvt.rn.f32.s16 %r19602, %rs1036; 2026-02-21T10:19:38.0407757Z cvt.rn.f32.s16 %r19603, %rs1035; 2026-02-21T10:19:38.0407891Z cvt.rn.f32.s16 %r19604, %rs1034; 2026-02-21T10:19:38.0407959Z cvt.rn.f32.s16 %r19605, %rs1032; 2026-02-21T10:19:38.0408151Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0408212Z cvt.s16.s8 %rs1037, %rs970; 2026-02-21T10:19:38.0408278Z shr.s16 %rs1038, %rs1037, 4; 2026-02-21T10:19:38.0408344Z cvt.s16.s8 %rs1039, %rs972; 2026-02-21T10:19:38.0408405Z shr.s16 %rs1040, %rs1039, 4; 2026-02-21T10:19:38.0408465Z shr.s16 %rs1041, %rs969, 4; 2026-02-21T10:19:38.0408532Z shr.s16 %rs1042, %rs971, 4; 2026-02-21T10:19:38.0408734Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0408802Z cvt.rn.f32.s16 %r19606, %rs1042; 2026-02-21T10:19:38.0408873Z cvt.rn.f32.s16 %r19607, %rs1041; 2026-02-21T10:19:38.0408935Z cvt.rn.f32.s16 %r19608, %rs1040; 2026-02-21T10:19:38.0409000Z cvt.rn.f32.s16 %r19609, %rs1038; 2026-02-21T10:19:38.0409193Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0409263Z cvt.s16.s8 %rs1043, %rs974; 2026-02-21T10:19:38.0409326Z shr.s16 %rs1044, %rs1043, 4; 2026-02-21T10:19:38.0409387Z cvt.s16.s8 %rs1045, %rs976; 2026-02-21T10:19:38.0409452Z shr.s16 %rs1046, %rs1045, 4; 2026-02-21T10:19:38.0409513Z shr.s16 %rs1047, %rs973, 4; 2026-02-21T10:19:38.0409573Z shr.s16 %rs1048, %rs975, 4; 2026-02-21T10:19:38.0409767Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0409833Z cvt.rn.f32.s16 %r19610, %rs1048; 2026-02-21T10:19:38.0409898Z cvt.rn.f32.s16 %r19611, %rs1047; 2026-02-21T10:19:38.0409962Z cvt.rn.f32.s16 %r19612, %rs1046; 2026-02-21T10:19:38.0410028Z cvt.rn.f32.s16 %r19613, %rs1044; 2026-02-21T10:19:38.0410217Z .loc 1 66 25 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0410279Z cvt.s16.s8 %rs1049, %rs978; 2026-02-21T10:19:38.0410349Z shr.s16 %rs1050, %rs1049, 4; 2026-02-21T10:19:38.0410412Z cvt.s16.s8 %rs1051, %rs980; 2026-02-21T10:19:38.0410473Z shr.s16 %rs1052, %rs1051, 4; 2026-02-21T10:19:38.0410535Z shr.s16 %rs1053, %rs977, 4; 2026-02-21T10:19:38.0410601Z shr.s16 %rs1054, %rs979, 4; 2026-02-21T10:19:38.0410792Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0410855Z cvt.rn.f32.s16 %r19614, %rs1054; 2026-02-21T10:19:38.0410923Z cvt.rn.f32.s16 %r19615, %rs1053; 2026-02-21T10:19:38.0411076Z cvt.rn.f32.s16 %r19616, %rs1052; 2026-02-21T10:19:38.0411202Z cvt.rn.f32.s16 %r19617, %rs1050; 2026-02-21T10:19:38.0411401Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0411462Z cvt.s16.s8 %rs1055, %rs982; 2026-02-21T10:19:38.0411525Z shr.s16 %rs1056, %rs1055, 4; 2026-02-21T10:19:38.0411587Z cvt.s16.s8 %rs1057, %rs984; 2026-02-21T10:19:38.0411657Z shr.s16 %rs1058, %rs1057, 4; 2026-02-21T10:19:38.0411719Z shr.s16 %rs1059, %rs981, 4; 2026-02-21T10:19:38.0411779Z shr.s16 %rs1060, %rs983, 4; 2026-02-21T10:19:38.0411974Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0412037Z cvt.rn.f32.s16 %r19618, %rs1060; 2026-02-21T10:19:38.0412100Z cvt.rn.f32.s16 %r19619, %rs1059; 2026-02-21T10:19:38.0412168Z cvt.rn.f32.s16 %r19620, %rs1058; 2026-02-21T10:19:38.0412234Z cvt.rn.f32.s16 %r19621, %rs1056; 2026-02-21T10:19:38.0412427Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0412492Z cvt.s16.s8 %rs1061, %rs986; 2026-02-21T10:19:38.0412619Z shr.s16 %rs1062, %rs1061, 4; 2026-02-21T10:19:38.0412685Z cvt.s16.s8 %rs1063, %rs988; 2026-02-21T10:19:38.0412747Z shr.s16 %rs1064, %rs1063, 4; 2026-02-21T10:19:38.0412812Z shr.s16 %rs1065, %rs985, 4; 2026-02-21T10:19:38.0412874Z shr.s16 
%rs1066, %rs987, 4; 2026-02-21T10:19:38.0413113Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0413184Z cvt.rn.f32.s16 %r19622, %rs1066; 2026-02-21T10:19:38.0413247Z cvt.rn.f32.s16 %r19623, %rs1065; 2026-02-21T10:19:38.0413310Z cvt.rn.f32.s16 %r19624, %rs1064; 2026-02-21T10:19:38.0413372Z cvt.rn.f32.s16 %r19625, %rs1062; 2026-02-21T10:19:38.0413568Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0413632Z cvt.s16.s8 %rs1067, %rs990; 2026-02-21T10:19:38.0413694Z shr.s16 %rs1068, %rs1067, 4; 2026-02-21T10:19:38.0413759Z cvt.s16.s8 %rs1069, %rs992; 2026-02-21T10:19:38.0413821Z shr.s16 %rs1070, %rs1069, 4; 2026-02-21T10:19:38.0413884Z shr.s16 %rs1071, %rs989, 4; 2026-02-21T10:19:38.0413945Z shr.s16 %rs1072, %rs991, 4; 2026-02-21T10:19:38.0414139Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0414203Z cvt.rn.f32.s16 %r19626, %rs1072; 2026-02-21T10:19:38.0414266Z cvt.rn.f32.s16 %r19627, %rs1071; 2026-02-21T10:19:38.0414332Z cvt.rn.f32.s16 %r19628, %rs1070; 2026-02-21T10:19:38.0414394Z cvt.rn.f32.s16 %r19629, %rs1068; 2026-02-21T10:19:38.0414583Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0414648Z cvt.s16.s8 %rs1073, %rs994; 2026-02-21T10:19:38.0414709Z shr.s16 %rs1074, %rs1073, 4; 2026-02-21T10:19:38.0414771Z cvt.s16.s8 %rs1075, %rs996; 2026-02-21T10:19:38.0414835Z shr.s16 %rs1076, %rs1075, 4; 2026-02-21T10:19:38.0414904Z shr.s16 %rs1077, %rs993, 4; 2026-02-21T10:19:38.0414963Z shr.s16 %rs1078, %rs995, 4; 2026-02-21T10:19:38.0415154Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0415223Z cvt.rn.f32.s16 %r19630, %rs1078; 2026-02-21T10:19:38.0415285Z cvt.rn.f32.s16 %r19631, %rs1077; 2026-02-21T10:19:38.0415348Z cvt.rn.f32.s16 %r19632, %rs1076; 2026-02-21T10:19:38.0415413Z cvt.rn.f32.s16 %r19633, %rs1074; 
2026-02-21T10:19:38.0415607Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0415669Z cvt.s16.s8 %rs1079, %rs998; 2026-02-21T10:19:38.0415731Z shr.s16 %rs1080, %rs1079, 4; 2026-02-21T10:19:38.0415799Z cvt.s16.s8 %rs1081, %rs1000; 2026-02-21T10:19:38.0415860Z shr.s16 %rs1082, %rs1081, 4; 2026-02-21T10:19:38.0415931Z shr.s16 %rs1083, %rs997, 4; 2026-02-21T10:19:38.0416059Z shr.s16 %rs1084, %rs999, 4; 2026-02-21T10:19:38.0416252Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0416376Z cvt.rn.f32.s16 %r19634, %rs1084; 2026-02-21T10:19:38.0416442Z cvt.rn.f32.s16 %r19635, %rs1083; 2026-02-21T10:19:38.0416630Z cvt.rn.f32.s16 %r19636, %rs1082; 2026-02-21T10:19:38.0416693Z cvt.rn.f32.s16 %r19637, %rs1080; 2026-02-21T10:19:38.0416895Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0416965Z cvt.s16.s8 %rs1085, %rs1002; 2026-02-21T10:19:38.0417029Z shr.s16 %rs1086, %rs1085, 4; 2026-02-21T10:19:38.0417091Z cvt.s16.s8 %rs1087, %rs1004; 2026-02-21T10:19:38.0417157Z shr.s16 %rs1088, %rs1087, 4; 2026-02-21T10:19:38.0417216Z shr.s16 %rs1089, %rs1001, 4; 2026-02-21T10:19:38.0417275Z shr.s16 %rs1090, %rs1003, 4; 2026-02-21T10:19:38.0417466Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0417538Z cvt.rn.f32.s16 %r19638, %rs1090; 2026-02-21T10:19:38.0417602Z cvt.rn.f32.s16 %r19639, %rs1089; 2026-02-21T10:19:38.0417740Z cvt.rn.f32.s16 %r19640, %rs1088; 2026-02-21T10:19:38.0417810Z cvt.rn.f32.s16 %r19641, %rs1086; 2026-02-21T10:19:38.0418001Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0418063Z cvt.s16.s8 %rs1091, %rs1006; 2026-02-21T10:19:38.0418190Z shr.s16 %rs1092, %rs1091, 4; 2026-02-21T10:19:38.0418253Z cvt.s16.s8 %rs1093, %rs1008; 2026-02-21T10:19:38.0418316Z shr.s16 %rs1094, %rs1093, 4; 2026-02-21T10:19:38.0418376Z shr.s16 
%rs1095, %rs1005, 4; 2026-02-21T10:19:38.0418440Z shr.s16 %rs1096, %rs1007, 4; 2026-02-21T10:19:38.0418643Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0418710Z cvt.rn.f32.s16 %r19642, %rs1096; 2026-02-21T10:19:38.0418780Z cvt.rn.f32.s16 %r19643, %rs1095; 2026-02-21T10:19:38.0418844Z cvt.rn.f32.s16 %r19644, %rs1094; 2026-02-21T10:19:38.0418910Z cvt.rn.f32.s16 %r19645, %rs1092; 2026-02-21T10:19:38.0419110Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0419172Z cvt.s16.s8 %rs1097, %rs1010; 2026-02-21T10:19:38.0419232Z shr.s16 %rs1098, %rs1097, 4; 2026-02-21T10:19:38.0419294Z cvt.s16.s8 %rs1099, %rs1012; 2026-02-21T10:19:38.0419360Z shr.s16 %rs1100, %rs1099, 4; 2026-02-21T10:19:38.0419420Z shr.s16 %rs1101, %rs1009, 4; 2026-02-21T10:19:38.0419480Z shr.s16 %rs1102, %rs1011, 4; 2026-02-21T10:19:38.0419675Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0419739Z cvt.rn.f32.s16 %r19646, %rs1102; 2026-02-21T10:19:38.0419801Z cvt.rn.f32.s16 %r19647, %rs1101; 2026-02-21T10:19:38.0419868Z cvt.rn.f32.s16 %r19648, %rs1100; 2026-02-21T10:19:38.0419930Z cvt.rn.f32.s16 %r19649, %rs1098; 2026-02-21T10:19:38.0420126Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0420189Z cvt.s16.s8 %rs1103, %rs1014; 2026-02-21T10:19:38.0420258Z shr.s16 %rs1104, %rs1103, 4; 2026-02-21T10:19:38.0420319Z cvt.s16.s8 %rs1105, %rs1016; 2026-02-21T10:19:38.0420380Z shr.s16 %rs1106, %rs1105, 4; 2026-02-21T10:19:38.0420442Z shr.s16 %rs1107, %rs1013, 4; 2026-02-21T10:19:38.0420506Z shr.s16 %rs1108, %rs1015, 4; 2026-02-21T10:19:38.0420699Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0420763Z cvt.rn.f32.s16 %r19650, %rs1108; 2026-02-21T10:19:38.0420833Z cvt.rn.f32.s16 %r19651, %rs1107; 2026-02-21T10:19:38.0420898Z cvt.rn.f32.s16 %r19652, %rs1106; 
2026-02-21T10:19:38.0420960Z cvt.rn.f32.s16 %r19653, %rs1104; 2026-02-21T10:19:38.0421154Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0421301Z cvt.s16.s8 %rs1109, %rs1018; 2026-02-21T10:19:38.0421363Z shr.s16 %rs1110, %rs1109, 4; 2026-02-21T10:19:38.0421486Z cvt.s16.s8 %rs1111, %rs1020; 2026-02-21T10:19:38.0421547Z shr.s16 %rs1112, %rs1111, 4; 2026-02-21T10:19:38.0421609Z shr.s16 %rs1113, %rs1017, 4; 2026-02-21T10:19:38.0421670Z shr.s16 %rs1114, %rs1019, 4; 2026-02-21T10:19:38.0421868Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0421932Z cvt.rn.f32.s16 %r19654, %rs1114; 2026-02-21T10:19:38.0421996Z cvt.rn.f32.s16 %r19655, %rs1113; 2026-02-21T10:19:38.0422062Z cvt.rn.f32.s16 %r19656, %rs1112; 2026-02-21T10:19:38.0422124Z cvt.rn.f32.s16 %r19657, %rs1110; 2026-02-21T10:19:38.0422314Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0422381Z cvt.s16.s8 %rs1115, %rs1022; 2026-02-21T10:19:38.0422444Z shr.s16 %rs1116, %rs1115, 4; 2026-02-21T10:19:38.0422505Z cvt.s16.s8 %rs1117, %rs1024; 2026-02-21T10:19:38.0422569Z shr.s16 %rs1118, %rs1117, 4; 2026-02-21T10:19:38.0422635Z shr.s16 %rs1119, %rs1021, 4; 2026-02-21T10:19:38.0422697Z shr.s16 %rs1120, %rs1023, 4; 2026-02-21T10:19:38.0422941Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0423021Z cvt.rn.f32.s16 %r19658, %rs1120; 2026-02-21T10:19:38.0423085Z cvt.rn.f32.s16 %r19659, %rs1119; 2026-02-21T10:19:38.0423191Z cvt.rn.f32.s16 %r19660, %rs1118; 2026-02-21T10:19:38.0423263Z cvt.rn.f32.s16 %r19661, %rs1116; 2026-02-21T10:19:38.0423320Z bar.sync 0; 2026-02-21T10:19:38.0423442Z st.shared.v4.b32 [%r27], {%r19601, %r19599, %r19600, %r19598}; 2026-02-21T10:19:38.0423573Z st.shared.v4.b32 [%r27+16384], {%r19633, %r19631, %r19632, %r19630}; 2026-02-21T10:19:38.0423691Z st.shared.v4.b32 [%r28], {%r19605, %r19603, %r19604, 
%r19602}; 2026-02-21T10:19:38.0423811Z st.shared.v4.b32 [%r28+16384], {%r19637, %r19635, %r19636, %r19634}; 2026-02-21T10:19:38.0423921Z st.shared.v4.b32 [%r29], {%r19609, %r19607, %r19608, %r19606}; 2026-02-21T10:19:38.0424046Z st.shared.v4.b32 [%r29+16384], {%r19641, %r19639, %r19640, %r19638}; 2026-02-21T10:19:38.0424156Z st.shared.v4.b32 [%r30], {%r19613, %r19611, %r19612, %r19610}; 2026-02-21T10:19:38.0424272Z st.shared.v4.b32 [%r30+16384], {%r19645, %r19643, %r19644, %r19642}; 2026-02-21T10:19:38.0424382Z st.shared.v4.b32 [%r31], {%r19617, %r19615, %r19616, %r19614}; 2026-02-21T10:19:38.0424499Z st.shared.v4.b32 [%r31+16384], {%r19649, %r19647, %r19648, %r19646}; 2026-02-21T10:19:38.0424607Z st.shared.v4.b32 [%r32], {%r19621, %r19619, %r19620, %r19618}; 2026-02-21T10:19:38.0424723Z st.shared.v4.b32 [%r32+16384], {%r19653, %r19651, %r19652, %r19650}; 2026-02-21T10:19:38.0424834Z st.shared.v4.b32 [%r33], {%r19625, %r19623, %r19624, %r19622}; 2026-02-21T10:19:38.0424952Z st.shared.v4.b32 [%r33+16384], {%r19657, %r19655, %r19656, %r19654}; 2026-02-21T10:19:38.0425056Z st.shared.v4.b32 [%r34], {%r19629, %r19627, %r19628, %r19626}; 2026-02-21T10:19:38.0425179Z st.shared.v4.b32 [%r34+16384], {%r19661, %r19659, %r19660, %r19658}; 2026-02-21T10:19:38.0425237Z $L__tmp9: 2026-02-21T10:19:38.0425513Z .loc 2 291 36 // standard.py:291:36 @[ cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:91:40 ] 2026-02-21T10:19:38.0425581Z // begin inline asm 2026-02-21T10:19:38.0425675Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.0425734Z // end inline asm 2026-02-21T10:19:38.0425797Z bar.sync 0; 2026-02-21T10:19:38.0425870Z wgmma.fence.sync.aligned; 2026-02-21T10:19:38.0425935Z mov.pred %p133, -1; 2026-02-21T10:19:38.0425996Z // begin inline asm 2026-02-21T10:19:38.0427618Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r12521,%r12522,%r12523,%r12524}, %rd3, %p133, 1, 1; 2026-02-21T10:19:38.0427833Z // end inline asm 2026-02-21T10:19:38.0427899Z // begin inline asm 2026-02-21T10:19:38.0429462Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r12653,%r12654,%r12655,%r12656}, %rd4, %p133, 1, 1; 2026-02-21T10:19:38.0429525Z // end inline asm 2026-02-21T10:19:38.0429590Z // begin inline asm 2026-02-21T10:19:38.0431329Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r12785,%r12786,%r12787,%r12788}, %rd5, %p133, 1, 1; 2026-02-21T10:19:38.0431401Z // end inline asm 2026-02-21T10:19:38.0431460Z // begin inline asm 2026-02-21T10:19:38.0432947Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r12917,%r12918,%r12919,%r12920}, %rd6, %p133, 1, 1; 2026-02-21T10:19:38.0433012Z // end inline asm 2026-02-21T10:19:38.0433082Z // begin inline asm 2026-02-21T10:19:38.0434573Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r13049,%r13050,%r13051,%r13052}, %rd7, %p133, 1, 1; 2026-02-21T10:19:38.0434640Z // end inline asm 2026-02-21T10:19:38.0434699Z // begin inline asm 2026-02-21T10:19:38.0436182Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r13181,%r13182,%r13183,%r13184}, %rd8, %p133, 1, 1; 2026-02-21T10:19:38.0436290Z // end inline asm 2026-02-21T10:19:38.0436347Z // begin inline asm 2026-02-21T10:19:38.0438049Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r13313,%r13314,%r13315,%r13316}, %rd9, %p133, 1, 1; 2026-02-21T10:19:38.0438111Z // end inline asm 2026-02-21T10:19:38.0438168Z // begin inline asm 2026-02-21T10:19:38.0439725Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r13445,%r13446,%r13447,%r13448}, %rd10, %p133, 1, 1; 2026-02-21T10:19:38.0439789Z // end inline asm 2026-02-21T10:19:38.0439905Z // begin inline asm 2026-02-21T10:19:38.0441389Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r13577,%r13578,%r13579,%r13580}, %rd3, %p133, 1, 1; 2026-02-21T10:19:38.0441452Z // end inline asm 2026-02-21T10:19:38.0441513Z // begin inline asm 2026-02-21T10:19:38.0442993Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r13709,%r13710,%r13711,%r13712}, %rd4, %p133, 1, 1; 2026-02-21T10:19:38.0443051Z // end inline asm 2026-02-21T10:19:38.0443108Z // begin inline asm 2026-02-21T10:19:38.0444595Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r13841,%r13842,%r13843,%r13844}, %rd5, %p133, 1, 1; 2026-02-21T10:19:38.0444655Z // end inline asm 2026-02-21T10:19:38.0444714Z // begin inline asm 2026-02-21T10:19:38.0446189Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r13973,%r13974,%r13975,%r13976}, %rd6, %p133, 1, 1; 2026-02-21T10:19:38.0446381Z // end inline asm 2026-02-21T10:19:38.0446441Z // begin inline asm 2026-02-21T10:19:38.0448091Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r14105,%r14106,%r14107,%r14108}, %rd7, %p133, 1, 1; 2026-02-21T10:19:38.0448154Z // end inline asm 2026-02-21T10:19:38.0448213Z // begin inline asm 2026-02-21T10:19:38.0449824Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r14237,%r14238,%r14239,%r14240}, %rd8, %p133, 1, 1; 2026-02-21T10:19:38.0449889Z // end inline asm 2026-02-21T10:19:38.0449949Z // begin inline asm 2026-02-21T10:19:38.0451429Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r14369,%r14370,%r14371,%r14372}, %rd9, %p133, 1, 1; 2026-02-21T10:19:38.0451488Z // end inline asm 2026-02-21T10:19:38.0451550Z // begin inline asm 2026-02-21T10:19:38.0453030Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r14501,%r14502,%r14503,%r14504}, %rd10, %p133, 1, 1; 2026-02-21T10:19:38.0453091Z // end inline asm 2026-02-21T10:19:38.0453172Z wgmma.commit_group.sync.aligned; 2026-02-21T10:19:38.0453234Z mov.b32 %r14634, %r19465; 2026-02-21T10:19:38.0453294Z mov.b32 %r14635, %r19465; 2026-02-21T10:19:38.0453359Z mov.b32 %r14633, %r39931; 2026-02-21T10:19:38.0453419Z // begin inline asm 2026-02-21T10:19:38.0455953Z // wait for regs: 
%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r14633,%r14634,%r14635 2026-02-21T10:19:38.0456170Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:19:38.0456228Z // end inline asm 2026-02-21T10:19:38.0456288Z $L__tmp10: 2026-02-21T10:19:38.0456616Z .loc 1 55 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:32 2026-02-21T10:19:38.0456690Z add.s32 %r19662, %r42725, -64; 2026-02-21T10:19:38.0456760Z add.s64 %rd374, %rd333, 128; 2026-02-21T10:19:38.0456830Z add.s64 %rd377, %rd336, 128; 2026-02-21T10:19:38.0456968Z add.s64 %rd380, %rd339, 128; 2026-02-21T10:19:38.0457036Z add.s64 %rd383, %rd342, 128; 2026-02-21T10:19:38.0457103Z add.s64 %rd386, %rd345, 128; 2026-02-21T10:19:38.0457162Z add.s64 %rd389, %rd348, 128; 2026-02-21T10:19:38.0457222Z add.s64 %rd392, %rd351, 128; 2026-02-21T10:19:38.0457365Z mad.wide.s32 %rd395, %r19662, 2, %rd117; 2026-02-21T10:19:38.0457569Z .loc 1 55 80 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:80 2026-02-21T10:19:38.0457629Z // begin inline asm 
2026-02-21T10:19:38.0457689Z mov.u64 %rd373, 0x0; 2026-02-21T10:19:38.0457824Z createpolicy.fractional.L2::evict_first.b64 %rd373, 1.0; 2026-02-21T10:19:38.0457881Z // end inline asm 2026-02-21T10:19:38.0457938Z // begin inline asm 2026-02-21T10:19:38.0457999Z mov.u32 %r14767, 0x0; 2026-02-21T10:19:38.0458057Z mov.u32 %r14768, 0x0; 2026-02-21T10:19:38.0458117Z mov.u32 %r14769, 0x0; 2026-02-21T10:19:38.0458176Z mov.u32 %r14770, 0x0; 2026-02-21T10:19:38.0458420Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r14767, %r14768, %r14769, %r14770 }, [ %rd374 + 0 ], %rd373; 2026-02-21T10:19:38.0458478Z // end inline asm 2026-02-21T10:19:38.0458536Z // begin inline asm 2026-02-21T10:19:38.0458599Z mov.u64 %rd376, 0x0; 2026-02-21T10:19:38.0458724Z createpolicy.fractional.L2::evict_first.b64 %rd376, 1.0; 2026-02-21T10:19:38.0458781Z // end inline asm 2026-02-21T10:19:38.0458844Z // begin inline asm 2026-02-21T10:19:38.0458901Z mov.u32 %r14771, 0x0; 2026-02-21T10:19:38.0458959Z mov.u32 %r14772, 0x0; 2026-02-21T10:19:38.0459016Z mov.u32 %r14773, 0x0; 2026-02-21T10:19:38.0459087Z mov.u32 %r14774, 0x0; 2026-02-21T10:19:38.0459321Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r14771, %r14772, %r14773, %r14774 }, [ %rd377 + 0 ], %rd376; 2026-02-21T10:19:38.0459377Z // end inline asm 2026-02-21T10:19:38.0459444Z // begin inline asm 2026-02-21T10:19:38.0459503Z mov.u64 %rd379, 0x0; 2026-02-21T10:19:38.0459623Z createpolicy.fractional.L2::evict_first.b64 %rd379, 1.0; 2026-02-21T10:19:38.0459685Z // end inline asm 2026-02-21T10:19:38.0459744Z // begin inline asm 2026-02-21T10:19:38.0459803Z mov.u32 %r14775, 0x0; 2026-02-21T10:19:38.0459862Z mov.u32 %r14776, 0x0; 2026-02-21T10:19:38.0459922Z mov.u32 %r14777, 0x0; 2026-02-21T10:19:38.0459977Z mov.u32 %r14778, 0x0; 2026-02-21T10:19:38.0460203Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r14775, %r14776, %r14777, %r14778 }, [ %rd380 + 0 ], %rd379; 2026-02-21T10:19:38.0460265Z // end inline asm 2026-02-21T10:19:38.0460323Z 
// begin inline asm 2026-02-21T10:19:38.0460381Z mov.u64 %rd382, 0x0; 2026-02-21T10:19:38.0460498Z createpolicy.fractional.L2::evict_first.b64 %rd382, 1.0; 2026-02-21T10:19:38.0460560Z // end inline asm 2026-02-21T10:19:38.0460617Z // begin inline asm 2026-02-21T10:19:38.0460675Z mov.u32 %r14779, 0x0; 2026-02-21T10:19:38.0460834Z mov.u32 %r14780, 0x0; 2026-02-21T10:19:38.0460891Z mov.u32 %r14781, 0x0; 2026-02-21T10:19:38.0461010Z mov.u32 %r14782, 0x0; 2026-02-21T10:19:38.0461251Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r14779, %r14780, %r14781, %r14782 }, [ %rd383 + 0 ], %rd382; 2026-02-21T10:19:38.0461309Z // end inline asm 2026-02-21T10:19:38.0461369Z // begin inline asm 2026-02-21T10:19:38.0461428Z mov.u64 %rd385, 0x0; 2026-02-21T10:19:38.0461551Z createpolicy.fractional.L2::evict_first.b64 %rd385, 1.0; 2026-02-21T10:19:38.0461610Z // end inline asm 2026-02-21T10:19:38.0461673Z // begin inline asm 2026-02-21T10:19:38.0461733Z mov.u32 %r14783, 0x0; 2026-02-21T10:19:38.0461792Z mov.u32 %r14784, 0x0; 2026-02-21T10:19:38.0461850Z mov.u32 %r14785, 0x0; 2026-02-21T10:19:38.0461908Z mov.u32 %r14786, 0x0; 2026-02-21T10:19:38.0462136Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r14783, %r14784, %r14785, %r14786 }, [ %rd386 + 0 ], %rd385; 2026-02-21T10:19:38.0462195Z // end inline asm 2026-02-21T10:19:38.0462257Z // begin inline asm 2026-02-21T10:19:38.0462320Z mov.u64 %rd388, 0x0; 2026-02-21T10:19:38.0462438Z createpolicy.fractional.L2::evict_first.b64 %rd388, 1.0; 2026-02-21T10:19:38.0462559Z // end inline asm 2026-02-21T10:19:38.0462627Z // begin inline asm 2026-02-21T10:19:38.0462685Z mov.u32 %r14787, 0x0; 2026-02-21T10:19:38.0462743Z mov.u32 %r14788, 0x0; 2026-02-21T10:19:38.0462802Z mov.u32 %r14789, 0x0; 2026-02-21T10:19:38.0462864Z mov.u32 %r14790, 0x0; 2026-02-21T10:19:38.0463133Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r14787, %r14788, %r14789, %r14790 }, [ %rd389 + 0 ], %rd388; 2026-02-21T10:19:38.0463192Z // end inline asm 
2026-02-21T10:19:38.0466084Z // begin inline asm 2026-02-21T10:19:38.0466178Z mov.u64 %rd391, 0x0; 2026-02-21T10:19:38.0466332Z createpolicy.fractional.L2::evict_first.b64 %rd391, 1.0; 2026-02-21T10:19:38.0466394Z // end inline asm 2026-02-21T10:19:38.0466616Z // begin inline asm 2026-02-21T10:19:38.0466686Z mov.u32 %r14791, 0x0; 2026-02-21T10:19:38.0466751Z mov.u32 %r14792, 0x0; 2026-02-21T10:19:38.0466810Z mov.u32 %r14793, 0x0; 2026-02-21T10:19:38.0466869Z mov.u32 %r14794, 0x0; 2026-02-21T10:19:38.0467155Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r14791, %r14792, %r14793, %r14794 }, [ %rd392 + 0 ], %rd391; 2026-02-21T10:19:38.0467212Z // end inline asm 2026-02-21T10:19:38.0467269Z // begin inline asm 2026-02-21T10:19:38.0467333Z mov.u64 %rd394, 0x0; 2026-02-21T10:19:38.0467465Z createpolicy.fractional.L2::evict_first.b64 %rd394, 1.0; 2026-02-21T10:19:38.0467524Z // end inline asm 2026-02-21T10:19:38.0467591Z // begin inline asm 2026-02-21T10:19:38.0467651Z mov.u32 %r14795, 0x0; 2026-02-21T10:19:38.0467710Z mov.u32 %r14796, 0x0; 2026-02-21T10:19:38.0467766Z mov.u32 %r14797, 0x0; 2026-02-21T10:19:38.0467827Z mov.u32 %r14798, 0x0; 2026-02-21T10:19:38.0468071Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r14795, %r14796, %r14797, %r14798 }, [ %rd395 + 0 ], %rd394; 2026-02-21T10:19:38.0468130Z // end inline asm 2026-02-21T10:19:38.0468353Z .loc 1 59 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:59:32 2026-02-21T10:19:38.0468487Z bar.sync 0; 2026-02-21T10:19:38.0468577Z st.shared.v2.b32 [%r9], {%r14767, %r14768}; 2026-02-21T10:19:38.0468675Z st.shared.v2.b32 [%r9+2048], {%r14771, %r14772}; 2026-02-21T10:19:38.0468771Z st.shared.v2.b32 [%r9+4096], {%r14775, %r14776}; 2026-02-21T10:19:38.0468858Z st.shared.v2.b32 [%r9+6144], {%r14779, %r14780}; 2026-02-21T10:19:38.0468943Z st.shared.v2.b32 [%r9+8192], {%r14783, %r14784}; 2026-02-21T10:19:38.0469036Z st.shared.v2.b32 [%r9+10240], {%r14787, %r14788}; 2026-02-21T10:19:38.0469122Z 
st.shared.v2.b32 [%r9+12288], {%r14791, %r14792}; 2026-02-21T10:19:38.0469205Z st.shared.v2.b32 [%r9+14336], {%r14795, %r14796}; 2026-02-21T10:19:38.0469289Z st.shared.v2.b32 [%r10], {%r14769, %r14770}; 2026-02-21T10:19:38.0469373Z st.shared.v2.b32 [%r10+2048], {%r14773, %r14774}; 2026-02-21T10:19:38.0469456Z st.shared.v2.b32 [%r10+4096], {%r14777, %r14778}; 2026-02-21T10:19:38.0469674Z st.shared.v2.b32 [%r10+6144], {%r14781, %r14782}; 2026-02-21T10:19:38.0469824Z st.shared.v2.b32 [%r10+8192], {%r14785, %r14786}; 2026-02-21T10:19:38.0469918Z st.shared.v2.b32 [%r10+10240], {%r14789, %r14790}; 2026-02-21T10:19:38.0470008Z st.shared.v2.b32 [%r10+12288], {%r14793, %r14794}; 2026-02-21T10:19:38.0470099Z st.shared.v2.b32 [%r10+14336], {%r14797, %r14798}; 2026-02-21T10:19:38.0470157Z bar.sync 0; 2026-02-21T10:19:38.0470229Z ld.shared.b16 %rs1121, [%r51]; 2026-02-21T10:19:38.0470313Z ld.shared.b16 %rs1122, [%r51+1024]; 2026-02-21T10:19:38.0470384Z ld.shared.b16 %rs1123, [%r51+64]; 2026-02-21T10:19:38.0470452Z ld.shared.b16 %rs1124, [%r51+1088]; 2026-02-21T10:19:38.0470517Z ld.shared.b16 %rs1125, [%r51+8192]; 2026-02-21T10:19:38.0470588Z ld.shared.b16 %rs1126, [%r51+9216]; 2026-02-21T10:19:38.0470655Z ld.shared.b16 %rs1127, [%r51+8256]; 2026-02-21T10:19:38.0470717Z ld.shared.b16 %rs1128, [%r51+9280]; 2026-02-21T10:19:38.0470789Z ld.shared.b16 %rs1129, [%r52]; 2026-02-21T10:19:38.0470857Z ld.shared.b16 %rs1130, [%r52+1024]; 2026-02-21T10:19:38.0470925Z ld.shared.b16 %rs1131, [%r52+64]; 2026-02-21T10:19:38.0471061Z ld.shared.b16 %rs1132, [%r52+1088]; 2026-02-21T10:19:38.0471128Z ld.shared.b16 %rs1133, [%r52+8192]; 2026-02-21T10:19:38.0471191Z ld.shared.b16 %rs1134, [%r52+9216]; 2026-02-21T10:19:38.0471254Z ld.shared.b16 %rs1135, [%r52+8256]; 2026-02-21T10:19:38.0471321Z ld.shared.b16 %rs1136, [%r52+9280]; 2026-02-21T10:19:38.0471445Z ld.shared.b16 %rs1137, [%r53]; 2026-02-21T10:19:38.0471513Z ld.shared.b16 %rs1138, [%r53+1024]; 2026-02-21T10:19:38.0471578Z ld.shared.b16 
%rs1139, [%r53+64]; 2026-02-21T10:19:38.0471642Z ld.shared.b16 %rs1140, [%r53+1088]; 2026-02-21T10:19:38.0471705Z ld.shared.b16 %rs1141, [%r53+8192]; 2026-02-21T10:19:38.0471767Z ld.shared.b16 %rs1142, [%r53+9216]; 2026-02-21T10:19:38.0471834Z ld.shared.b16 %rs1143, [%r53+8256]; 2026-02-21T10:19:38.0471898Z ld.shared.b16 %rs1144, [%r53+9280]; 2026-02-21T10:19:38.0471963Z ld.shared.b16 %rs1145, [%r54]; 2026-02-21T10:19:38.0472031Z ld.shared.b16 %rs1146, [%r54+1024]; 2026-02-21T10:19:38.0472093Z ld.shared.b16 %rs1147, [%r54+64]; 2026-02-21T10:19:38.0472158Z ld.shared.b16 %rs1148, [%r54+1088]; 2026-02-21T10:19:38.0472226Z ld.shared.b16 %rs1149, [%r54+8192]; 2026-02-21T10:19:38.0472293Z ld.shared.b16 %rs1150, [%r54+9216]; 2026-02-21T10:19:38.0472356Z ld.shared.b16 %rs1151, [%r54+8256]; 2026-02-21T10:19:38.0472423Z ld.shared.b16 %rs1152, [%r54+9280]; 2026-02-21T10:19:38.0472505Z ld.shared.b16 %rs1153, [%r55]; 2026-02-21T10:19:38.0472575Z ld.shared.b16 %rs1154, [%r55+1024]; 2026-02-21T10:19:38.0472640Z ld.shared.b16 %rs1155, [%r55+64]; 2026-02-21T10:19:38.0472712Z ld.shared.b16 %rs1156, [%r55+1088]; 2026-02-21T10:19:38.0472776Z ld.shared.b16 %rs1157, [%r55+8192]; 2026-02-21T10:19:38.0472841Z ld.shared.b16 %rs1158, [%r55+9216]; 2026-02-21T10:19:38.0472906Z ld.shared.b16 %rs1159, [%r55+8256]; 2026-02-21T10:19:38.0472971Z ld.shared.b16 %rs1160, [%r55+9280]; 2026-02-21T10:19:38.0473038Z ld.shared.b16 %rs1161, [%r56]; 2026-02-21T10:19:38.0473103Z ld.shared.b16 %rs1162, [%r56+1024]; 2026-02-21T10:19:38.0473170Z ld.shared.b16 %rs1163, [%r56+64]; 2026-02-21T10:19:38.0473234Z ld.shared.b16 %rs1164, [%r56+1088]; 2026-02-21T10:19:38.0473297Z ld.shared.b16 %rs1165, [%r56+8192]; 2026-02-21T10:19:38.0473366Z ld.shared.b16 %rs1166, [%r56+9216]; 2026-02-21T10:19:38.0473431Z ld.shared.b16 %rs1167, [%r56+8256]; 2026-02-21T10:19:38.0473497Z ld.shared.b16 %rs1168, [%r56+9280]; 2026-02-21T10:19:38.0473559Z ld.shared.b16 %rs1169, [%r57]; 2026-02-21T10:19:38.0473629Z ld.shared.b16 %rs1170, 
[%r57+1024]; 2026-02-21T10:19:38.0473695Z ld.shared.b16 %rs1171, [%r57+64]; 2026-02-21T10:19:38.0473757Z ld.shared.b16 %rs1172, [%r57+1088]; 2026-02-21T10:19:38.0473826Z ld.shared.b16 %rs1173, [%r57+8192]; 2026-02-21T10:19:38.0473889Z ld.shared.b16 %rs1174, [%r57+9216]; 2026-02-21T10:19:38.0473962Z ld.shared.b16 %rs1175, [%r57+8256]; 2026-02-21T10:19:38.0474092Z ld.shared.b16 %rs1176, [%r57+9280]; 2026-02-21T10:19:38.0474208Z ld.shared.b16 %rs1177, [%r58]; 2026-02-21T10:19:38.0474274Z ld.shared.b16 %rs1178, [%r58+1024]; 2026-02-21T10:19:38.0474340Z ld.shared.b16 %rs1179, [%r58+64]; 2026-02-21T10:19:38.0474410Z ld.shared.b16 %rs1180, [%r58+1088]; 2026-02-21T10:19:38.0474480Z ld.shared.b16 %rs1181, [%r58+8192]; 2026-02-21T10:19:38.0474544Z ld.shared.b16 %rs1182, [%r58+9216]; 2026-02-21T10:19:38.0474613Z ld.shared.b16 %rs1183, [%r58+8256]; 2026-02-21T10:19:38.0474678Z ld.shared.b16 %rs1184, [%r58+9280]; 2026-02-21T10:19:38.0474741Z cvt.f32.bf16 %r14936, %rs1121; 2026-02-21T10:19:38.0474801Z cvt.f32.bf16 %r14937, %rs1122; 2026-02-21T10:19:38.0474865Z cvt.f32.bf16 %r14938, %rs1129; 2026-02-21T10:19:38.0474927Z cvt.f32.bf16 %r14939, %rs1130; 2026-02-21T10:19:38.0474988Z cvt.f32.bf16 %r15068, %rs1137; 2026-02-21T10:19:38.0475051Z cvt.f32.bf16 %r15069, %rs1138; 2026-02-21T10:19:38.0475113Z cvt.f32.bf16 %r15070, %rs1145; 2026-02-21T10:19:38.0475174Z cvt.f32.bf16 %r15071, %rs1146; 2026-02-21T10:19:38.0475236Z cvt.f32.bf16 %r15200, %rs1153; 2026-02-21T10:19:38.0475300Z cvt.f32.bf16 %r15201, %rs1154; 2026-02-21T10:19:38.0475422Z cvt.f32.bf16 %r15202, %rs1161; 2026-02-21T10:19:38.0475486Z cvt.f32.bf16 %r15203, %rs1162; 2026-02-21T10:19:38.0475553Z cvt.f32.bf16 %r15332, %rs1169; 2026-02-21T10:19:38.0475613Z cvt.f32.bf16 %r15333, %rs1170; 2026-02-21T10:19:38.0475715Z cvt.f32.bf16 %r15334, %rs1177; 2026-02-21T10:19:38.0475778Z cvt.f32.bf16 %r15335, %rs1178; 2026-02-21T10:19:38.0475842Z cvt.f32.bf16 %r15464, %rs1123; 2026-02-21T10:19:38.0475900Z cvt.f32.bf16 %r15465, %rs1124; 
2026-02-21T10:19:38.0475961Z cvt.f32.bf16 %r15466, %rs1131; 2026-02-21T10:19:38.0476025Z cvt.f32.bf16 %r15467, %rs1132; 2026-02-21T10:19:38.0476101Z cvt.f32.bf16 %r15596, %rs1139; 2026-02-21T10:19:38.0476164Z cvt.f32.bf16 %r15597, %rs1140; 2026-02-21T10:19:38.0476229Z cvt.f32.bf16 %r15598, %rs1147; 2026-02-21T10:19:38.0476292Z cvt.f32.bf16 %r15599, %rs1148; 2026-02-21T10:19:38.0476350Z cvt.f32.bf16 %r15728, %rs1155; 2026-02-21T10:19:38.0476411Z cvt.f32.bf16 %r15729, %rs1156; 2026-02-21T10:19:38.0476615Z cvt.f32.bf16 %r15730, %rs1163; 2026-02-21T10:19:38.0476682Z cvt.f32.bf16 %r15731, %rs1164; 2026-02-21T10:19:38.0476742Z cvt.f32.bf16 %r15860, %rs1171; 2026-02-21T10:19:38.0476807Z cvt.f32.bf16 %r15861, %rs1172; 2026-02-21T10:19:38.0476866Z cvt.f32.bf16 %r15862, %rs1179; 2026-02-21T10:19:38.0476927Z cvt.f32.bf16 %r15863, %rs1180; 2026-02-21T10:19:38.0476987Z cvt.f32.bf16 %r15992, %rs1125; 2026-02-21T10:19:38.0477050Z cvt.f32.bf16 %r15993, %rs1126; 2026-02-21T10:19:38.0477119Z cvt.f32.bf16 %r15994, %rs1133; 2026-02-21T10:19:38.0477182Z cvt.f32.bf16 %r15995, %rs1134; 2026-02-21T10:19:38.0477245Z cvt.f32.bf16 %r16124, %rs1141; 2026-02-21T10:19:38.0477308Z cvt.f32.bf16 %r16125, %rs1142; 2026-02-21T10:19:38.0477367Z cvt.f32.bf16 %r16126, %rs1149; 2026-02-21T10:19:38.0477426Z cvt.f32.bf16 %r16127, %rs1150; 2026-02-21T10:19:38.0477491Z cvt.f32.bf16 %r16256, %rs1157; 2026-02-21T10:19:38.0477554Z cvt.f32.bf16 %r16257, %rs1158; 2026-02-21T10:19:38.0477613Z cvt.f32.bf16 %r16258, %rs1165; 2026-02-21T10:19:38.0477681Z cvt.f32.bf16 %r16259, %rs1166; 2026-02-21T10:19:38.0477740Z cvt.f32.bf16 %r16388, %rs1173; 2026-02-21T10:19:38.0477798Z cvt.f32.bf16 %r16389, %rs1174; 2026-02-21T10:19:38.0477862Z cvt.f32.bf16 %r16390, %rs1181; 2026-02-21T10:19:38.0477923Z cvt.f32.bf16 %r16391, %rs1182; 2026-02-21T10:19:38.0477981Z cvt.f32.bf16 %r16520, %rs1127; 2026-02-21T10:19:38.0478040Z cvt.f32.bf16 %r16521, %rs1128; 2026-02-21T10:19:38.0478105Z cvt.f32.bf16 %r16522, %rs1135; 
2026-02-21T10:19:38.0478166Z cvt.f32.bf16 %r16523, %rs1136; 2026-02-21T10:19:38.0478225Z cvt.f32.bf16 %r16652, %rs1143; 2026-02-21T10:19:38.0478291Z cvt.f32.bf16 %r16653, %rs1144; 2026-02-21T10:19:38.0478350Z cvt.f32.bf16 %r16654, %rs1151; 2026-02-21T10:19:38.0478410Z cvt.f32.bf16 %r16655, %rs1152; 2026-02-21T10:19:38.0478562Z cvt.f32.bf16 %r16784, %rs1159; 2026-02-21T10:19:38.0478627Z cvt.f32.bf16 %r16785, %rs1160; 2026-02-21T10:19:38.0478751Z cvt.f32.bf16 %r16786, %rs1167; 2026-02-21T10:19:38.0478815Z cvt.f32.bf16 %r16787, %rs1168; 2026-02-21T10:19:38.0478881Z cvt.f32.bf16 %r16916, %rs1175; 2026-02-21T10:19:38.0478953Z cvt.f32.bf16 %r16917, %rs1176; 2026-02-21T10:19:38.0479013Z cvt.f32.bf16 %r16918, %rs1183; 2026-02-21T10:19:38.0479075Z cvt.f32.bf16 %r16919, %rs1184; 2026-02-21T10:19:38.0479309Z .loc 1 61 33 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:61:33 2026-02-21T10:19:38.0479368Z bar.sync 0; 2026-02-21T10:19:38.0479431Z // begin inline asm 2026-02-21T10:19:38.0479541Z @%p313 mbarrier.init.shared::cta.b64 [%r29846], 1; 2026-02-21T10:19:38.0479600Z // end inline asm 2026-02-21T10:19:38.0479655Z bar.sync 0; 2026-02-21T10:19:38.0479717Z // begin inline asm 2026-02-21T10:19:38.0479856Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r29846], 4096; 2026-02-21T10:19:38.0479914Z // end inline asm 2026-02-21T10:19:38.0479970Z // begin inline asm 2026-02-21T10:19:38.0480053Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.0480108Z // end inline asm 2026-02-21T10:19:38.0480242Z bar.sync 0; 2026-02-21T10:19:38.0480317Z elect.sync %r19663|%p191, -1; 2026-02-21T10:19:38.0480390Z and.pred %p151, %p1, %p191; 2026-02-21T10:19:38.0480454Z cvt.u32.u64 %r19664, %rd845; 2026-02-21T10:19:38.0480519Z add.s32 %r14803, %r19664, 128; 2026-02-21T10:19:38.0480667Z // begin inline asm 2026-02-21T10:19:38.0481017Z @%p151 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39931], [%rd822, {%r19829, %r14803}], [%r29846]; 
2026-02-21T10:19:38.0481086Z // end inline asm 2026-02-21T10:19:38.0481145Z bar.sync 0; 2026-02-21T10:19:38.0481205Z // begin inline asm 2026-02-21T10:19:38.0481258Z 2026-02-21T10:19:38.0481308Z { 2026-02-21T10:19:38.0481375Z .reg .pred complete; 2026-02-21T10:19:38.0481435Z waitLoop: 2026-02-21T10:19:38.0481585Z mbarrier.try_wait.parity.shared.b64 complete, [%r29846], %r19465; 2026-02-21T10:19:38.0481663Z @!complete bra.uni waitLoop; 2026-02-21T10:19:38.0481716Z } 2026-02-21T10:19:38.0481721Z 2026-02-21T10:19:38.0481781Z // end inline asm 2026-02-21T10:19:38.0481840Z bar.sync 0; 2026-02-21T10:19:38.0481899Z // begin inline asm 2026-02-21T10:19:38.0481997Z @%p313 mbarrier.inval.shared::cta.b64 [%r29846]; 2026-02-21T10:19:38.0482053Z // end inline asm 2026-02-21T10:19:38.0482264Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0482334Z ld.shared.s8 %rs1185, [%r19]; 2026-02-21T10:19:38.0482551Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0482619Z shl.b16 %rs1186, %rs1185, 4; 2026-02-21T10:19:38.0482811Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0482881Z ld.shared.s8 %rs1187, [%r20+128]; 2026-02-21T10:19:38.0483086Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0483151Z shl.b16 %rs1188, %rs1187, 4; 2026-02-21T10:19:38.0483341Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0483414Z ld.shared.s8 %rs1189, [%r21+256]; 2026-02-21T10:19:38.0483608Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0483669Z shl.b16 %rs1190, %rs1189, 4; 2026-02-21T10:19:38.0483860Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0483928Z ld.shared.s8 %rs1191, [%r22+384]; 2026-02-21T10:19:38.0484117Z .loc 1 64 28 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0484178Z shl.b16 %rs1192, %rs1191, 4; 2026-02-21T10:19:38.0484379Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0484561Z ld.shared.s8 %rs1193, [%r23+512]; 2026-02-21T10:19:38.0484754Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0484818Z shl.b16 %rs1194, %rs1193, 4; 2026-02-21T10:19:38.0485008Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0485072Z ld.shared.s8 %rs1195, [%r24+640]; 2026-02-21T10:19:38.0485265Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0485325Z shl.b16 %rs1196, %rs1195, 4; 2026-02-21T10:19:38.0485516Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0485581Z ld.shared.s8 %rs1197, [%r25+768]; 2026-02-21T10:19:38.0485774Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0485837Z shl.b16 %rs1198, %rs1197, 4; 2026-02-21T10:19:38.0486083Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0486156Z ld.shared.s8 %rs1199, [%r26+896]; 2026-02-21T10:19:38.0486347Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0486408Z shl.b16 %rs1200, %rs1199, 4; 2026-02-21T10:19:38.0486794Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0486875Z ld.shared.s8 %rs1201, [%r19+1024]; 2026-02-21T10:19:38.0487082Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0487150Z shl.b16 %rs1202, %rs1201, 4; 2026-02-21T10:19:38.0487344Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0487415Z ld.shared.s8 %rs1203, [%r20+1152]; 2026-02-21T10:19:38.0487604Z 
.loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0487673Z shl.b16 %rs1204, %rs1203, 4; 2026-02-21T10:19:38.0487862Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0487926Z ld.shared.s8 %rs1205, [%r21+1280]; 2026-02-21T10:19:38.0488119Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0488180Z shl.b16 %rs1206, %rs1205, 4; 2026-02-21T10:19:38.0488369Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0488439Z ld.shared.s8 %rs1207, [%r22+1408]; 2026-02-21T10:19:38.0488627Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0488689Z shl.b16 %rs1208, %rs1207, 4; 2026-02-21T10:19:38.0488886Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0488956Z ld.shared.s8 %rs1209, [%r23+1536]; 2026-02-21T10:19:38.0489154Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0489215Z shl.b16 %rs1210, %rs1209, 4; 2026-02-21T10:19:38.0489421Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0489489Z ld.shared.s8 %rs1211, [%r24+1664]; 2026-02-21T10:19:38.0489684Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0489751Z shl.b16 %rs1212, %rs1211, 4; 2026-02-21T10:19:38.0489942Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0490006Z ld.shared.s8 %rs1213, [%r25+1792]; 2026-02-21T10:19:38.0490199Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0490387Z shl.b16 %rs1214, %rs1213, 4; 2026-02-21T10:19:38.0490576Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0490645Z ld.shared.s8 %rs1215, [%r26+1920]; 
2026-02-21T10:19:38.0490834Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0490897Z shl.b16 %rs1216, %rs1215, 4; 2026-02-21T10:19:38.0491087Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0491156Z ld.shared.s8 %rs1217, [%r19+2048]; 2026-02-21T10:19:38.0491343Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0491403Z shl.b16 %rs1218, %rs1217, 4; 2026-02-21T10:19:38.0491594Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0491661Z ld.shared.s8 %rs1219, [%r20+2176]; 2026-02-21T10:19:38.0491915Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0491983Z shl.b16 %rs1220, %rs1219, 4; 2026-02-21T10:19:38.0492188Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0492253Z ld.shared.s8 %rs1221, [%r21+2304]; 2026-02-21T10:19:38.0492489Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0492551Z shl.b16 %rs1222, %rs1221, 4; 2026-02-21T10:19:38.0492738Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0492807Z ld.shared.s8 %rs1223, [%r22+2432]; 2026-02-21T10:19:38.0492994Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0493058Z shl.b16 %rs1224, %rs1223, 4; 2026-02-21T10:19:38.0493249Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0493319Z ld.shared.s8 %rs1225, [%r23+2560]; 2026-02-21T10:19:38.0493508Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0493566Z shl.b16 %rs1226, %rs1225, 4; 2026-02-21T10:19:38.0493759Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0493823Z ld.shared.s8 
%rs1227, [%r24+2688]; 2026-02-21T10:19:38.0494012Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0494074Z shl.b16 %rs1228, %rs1227, 4; 2026-02-21T10:19:38.0494265Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0494328Z ld.shared.s8 %rs1229, [%r25+2816]; 2026-02-21T10:19:38.0494523Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0494585Z shl.b16 %rs1230, %rs1229, 4; 2026-02-21T10:19:38.0494774Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0494843Z ld.shared.s8 %rs1231, [%r26+2944]; 2026-02-21T10:19:38.0495036Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0495096Z shl.b16 %rs1232, %rs1231, 4; 2026-02-21T10:19:38.0495284Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0495351Z ld.shared.s8 %rs1233, [%r19+3072]; 2026-02-21T10:19:38.0495542Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0495602Z shl.b16 %rs1234, %rs1233, 4; 2026-02-21T10:19:38.0495792Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0495965Z ld.shared.s8 %rs1235, [%r20+3200]; 2026-02-21T10:19:38.0496156Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0496221Z shl.b16 %rs1236, %rs1235, 4; 2026-02-21T10:19:38.0496410Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0496593Z ld.shared.s8 %rs1237, [%r21+3328]; 2026-02-21T10:19:38.0496799Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0496871Z shl.b16 %rs1238, %rs1237, 4; 2026-02-21T10:19:38.0497063Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 
2026-02-21T10:19:38.0497130Z ld.shared.s8 %rs1239, [%r22+3456]; 2026-02-21T10:19:38.0497320Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0497382Z shl.b16 %rs1240, %rs1239, 4; 2026-02-21T10:19:38.0497641Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0497713Z ld.shared.s8 %rs1241, [%r23+3584]; 2026-02-21T10:19:38.0497903Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0497963Z shl.b16 %rs1242, %rs1241, 4; 2026-02-21T10:19:38.0498215Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0498280Z ld.shared.s8 %rs1243, [%r24+3712]; 2026-02-21T10:19:38.0498469Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0498532Z shl.b16 %rs1244, %rs1243, 4; 2026-02-21T10:19:38.0498722Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0498787Z ld.shared.s8 %rs1245, [%r25+3840]; 2026-02-21T10:19:38.0498979Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0499042Z shl.b16 %rs1246, %rs1245, 4; 2026-02-21T10:19:38.0499233Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0499296Z ld.shared.s8 %rs1247, [%r26+3968]; 2026-02-21T10:19:38.0499490Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0499551Z shl.b16 %rs1248, %rs1247, 4; 2026-02-21T10:19:38.0499738Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0499814Z cvt.s16.s8 %rs1249, %rs1186; 2026-02-21T10:19:38.0499875Z shr.s16 %rs1250, %rs1249, 4; 2026-02-21T10:19:38.0499936Z cvt.s16.s8 %rs1251, %rs1188; 2026-02-21T10:19:38.0499998Z shr.s16 %rs1252, %rs1251, 4; 2026-02-21T10:19:38.0500059Z shr.s16 %rs1253, %rs1185, 4; 2026-02-21T10:19:38.0500121Z 
shr.s16 %rs1254, %rs1187, 4; 2026-02-21T10:19:38.0500315Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0500385Z cvt.rn.f32.s16 %r19665, %rs1254; 2026-02-21T10:19:38.0500446Z cvt.rn.f32.s16 %r19666, %rs1253; 2026-02-21T10:19:38.0500509Z cvt.rn.f32.s16 %r19667, %rs1252; 2026-02-21T10:19:38.0500571Z cvt.rn.f32.s16 %r19668, %rs1250; 2026-02-21T10:19:38.0500763Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0500823Z cvt.s16.s8 %rs1255, %rs1190; 2026-02-21T10:19:38.0500887Z shr.s16 %rs1256, %rs1255, 4; 2026-02-21T10:19:38.0500946Z cvt.s16.s8 %rs1257, %rs1192; 2026-02-21T10:19:38.0501007Z shr.s16 %rs1258, %rs1257, 4; 2026-02-21T10:19:38.0501065Z shr.s16 %rs1259, %rs1189, 4; 2026-02-21T10:19:38.0501132Z shr.s16 %rs1260, %rs1191, 4; 2026-02-21T10:19:38.0501320Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0501523Z cvt.rn.f32.s16 %r19669, %rs1260; 2026-02-21T10:19:38.0501592Z cvt.rn.f32.s16 %r19670, %rs1259; 2026-02-21T10:19:38.0501656Z cvt.rn.f32.s16 %r19671, %rs1258; 2026-02-21T10:19:38.0501715Z cvt.rn.f32.s16 %r19672, %rs1256; 2026-02-21T10:19:38.0501919Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0501982Z cvt.s16.s8 %rs1261, %rs1194; 2026-02-21T10:19:38.0502041Z shr.s16 %rs1262, %rs1261, 4; 2026-02-21T10:19:38.0502100Z cvt.s16.s8 %rs1263, %rs1196; 2026-02-21T10:19:38.0502162Z shr.s16 %rs1264, %rs1263, 4; 2026-02-21T10:19:38.0502221Z shr.s16 %rs1265, %rs1193, 4; 2026-02-21T10:19:38.0502282Z shr.s16 %rs1266, %rs1195, 4; 2026-02-21T10:19:38.0502476Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0502539Z cvt.rn.f32.s16 %r19673, %rs1266; 2026-02-21T10:19:38.0502602Z cvt.rn.f32.s16 %r19674, %rs1265; 2026-02-21T10:19:38.0502664Z cvt.rn.f32.s16 %r19675, %rs1264; 2026-02-21T10:19:38.0502728Z cvt.rn.f32.s16 %r19676, 
%rs1262; 2026-02-21T10:19:38.0502989Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0503054Z cvt.s16.s8 %rs1267, %rs1198; 2026-02-21T10:19:38.0503116Z shr.s16 %rs1268, %rs1267, 4; 2026-02-21T10:19:38.0503224Z cvt.s16.s8 %rs1269, %rs1200; 2026-02-21T10:19:38.0503285Z shr.s16 %rs1270, %rs1269, 4; 2026-02-21T10:19:38.0503348Z shr.s16 %rs1271, %rs1197, 4; 2026-02-21T10:19:38.0503408Z shr.s16 %rs1272, %rs1199, 4; 2026-02-21T10:19:38.0503598Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0503661Z cvt.rn.f32.s16 %r19677, %rs1272; 2026-02-21T10:19:38.0503723Z cvt.rn.f32.s16 %r19678, %rs1271; 2026-02-21T10:19:38.0503783Z cvt.rn.f32.s16 %r19679, %rs1270; 2026-02-21T10:19:38.0503846Z cvt.rn.f32.s16 %r19680, %rs1268; 2026-02-21T10:19:38.0504040Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0504105Z cvt.s16.s8 %rs1273, %rs1202; 2026-02-21T10:19:38.0504165Z shr.s16 %rs1274, %rs1273, 4; 2026-02-21T10:19:38.0504231Z cvt.s16.s8 %rs1275, %rs1204; 2026-02-21T10:19:38.0504290Z shr.s16 %rs1276, %rs1275, 4; 2026-02-21T10:19:38.0504348Z shr.s16 %rs1277, %rs1201, 4; 2026-02-21T10:19:38.0504408Z shr.s16 %rs1278, %rs1203, 4; 2026-02-21T10:19:38.0504607Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0504671Z cvt.rn.f32.s16 %r19681, %rs1278; 2026-02-21T10:19:38.0504730Z cvt.rn.f32.s16 %r19682, %rs1277; 2026-02-21T10:19:38.0504794Z cvt.rn.f32.s16 %r19683, %rs1276; 2026-02-21T10:19:38.0504854Z cvt.rn.f32.s16 %r19684, %rs1274; 2026-02-21T10:19:38.0505043Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0505110Z cvt.s16.s8 %rs1279, %rs1206; 2026-02-21T10:19:38.0505172Z shr.s16 %rs1280, %rs1279, 4; 2026-02-21T10:19:38.0505231Z cvt.s16.s8 %rs1281, %rs1208; 2026-02-21T10:19:38.0505292Z shr.s16 %rs1282, %rs1281, 4; 2026-02-21T10:19:38.0505349Z 
shr.s16 %rs1283, %rs1205, 4; 2026-02-21T10:19:38.0505407Z shr.s16 %rs1284, %rs1207, 4; 2026-02-21T10:19:38.0505598Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0505659Z cvt.rn.f32.s16 %r19685, %rs1284; 2026-02-21T10:19:38.0505720Z cvt.rn.f32.s16 %r19686, %rs1283; 2026-02-21T10:19:38.0505783Z cvt.rn.f32.s16 %r19687, %rs1282; 2026-02-21T10:19:38.0505843Z cvt.rn.f32.s16 %r19688, %rs1280; 2026-02-21T10:19:38.0506030Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0506089Z cvt.s16.s8 %rs1285, %rs1210; 2026-02-21T10:19:38.0506151Z shr.s16 %rs1286, %rs1285, 4; 2026-02-21T10:19:38.0506267Z cvt.s16.s8 %rs1287, %rs1212; 2026-02-21T10:19:38.0506369Z shr.s16 %rs1288, %rs1287, 4; 2026-02-21T10:19:38.0506429Z shr.s16 %rs1289, %rs1209, 4; 2026-02-21T10:19:38.0506614Z shr.s16 %rs1290, %rs1211, 4; 2026-02-21T10:19:38.0506810Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0506876Z cvt.rn.f32.s16 %r19689, %rs1290; 2026-02-21T10:19:38.0506953Z cvt.rn.f32.s16 %r19690, %rs1289; 2026-02-21T10:19:38.0507018Z cvt.rn.f32.s16 %r19691, %rs1288; 2026-02-21T10:19:38.0507078Z cvt.rn.f32.s16 %r19692, %rs1286; 2026-02-21T10:19:38.0507270Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0507333Z cvt.s16.s8 %rs1291, %rs1214; 2026-02-21T10:19:38.0507395Z shr.s16 %rs1292, %rs1291, 4; 2026-02-21T10:19:38.0507456Z cvt.s16.s8 %rs1293, %rs1216; 2026-02-21T10:19:38.0507514Z shr.s16 %rs1294, %rs1293, 4; 2026-02-21T10:19:38.0507573Z shr.s16 %rs1295, %rs1213, 4; 2026-02-21T10:19:38.0507635Z shr.s16 %rs1296, %rs1215, 4; 2026-02-21T10:19:38.0507902Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0507969Z cvt.rn.f32.s16 %r19693, %rs1296; 2026-02-21T10:19:38.0508029Z cvt.rn.f32.s16 %r19694, %rs1295; 2026-02-21T10:19:38.0508090Z cvt.rn.f32.s16 %r19695, %rs1294; 
2026-02-21T10:19:38.0508150Z cvt.rn.f32.s16 %r19696, %rs1292; 2026-02-21T10:19:38.0508478Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0508549Z cvt.s16.s8 %rs1297, %rs1218; 2026-02-21T10:19:38.0508609Z shr.s16 %rs1298, %rs1297, 4; 2026-02-21T10:19:38.0508668Z cvt.s16.s8 %rs1299, %rs1220; 2026-02-21T10:19:38.0508727Z shr.s16 %rs1300, %rs1299, 4; 2026-02-21T10:19:38.0508790Z shr.s16 %rs1301, %rs1217, 4; 2026-02-21T10:19:38.0508846Z shr.s16 %rs1302, %rs1219, 4; 2026-02-21T10:19:38.0509037Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0509103Z cvt.rn.f32.s16 %r19697, %rs1302; 2026-02-21T10:19:38.0509165Z cvt.rn.f32.s16 %r19698, %rs1301; 2026-02-21T10:19:38.0509226Z cvt.rn.f32.s16 %r19699, %rs1300; 2026-02-21T10:19:38.0509287Z cvt.rn.f32.s16 %r19700, %rs1298; 2026-02-21T10:19:38.0509477Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0509537Z cvt.s16.s8 %rs1303, %rs1222; 2026-02-21T10:19:38.0509596Z shr.s16 %rs1304, %rs1303, 4; 2026-02-21T10:19:38.0509657Z cvt.s16.s8 %rs1305, %rs1224; 2026-02-21T10:19:38.0509715Z shr.s16 %rs1306, %rs1305, 4; 2026-02-21T10:19:38.0509773Z shr.s16 %rs1307, %rs1221, 4; 2026-02-21T10:19:38.0509835Z shr.s16 %rs1308, %rs1223, 4; 2026-02-21T10:19:38.0510024Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0510085Z cvt.rn.f32.s16 %r19701, %rs1308; 2026-02-21T10:19:38.0510151Z cvt.rn.f32.s16 %r19702, %rs1307; 2026-02-21T10:19:38.0510213Z cvt.rn.f32.s16 %r19703, %rs1306; 2026-02-21T10:19:38.0510275Z cvt.rn.f32.s16 %r19704, %rs1304; 2026-02-21T10:19:38.0510463Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0510525Z cvt.s16.s8 %rs1309, %rs1226; 2026-02-21T10:19:38.0510584Z shr.s16 %rs1310, %rs1309, 4; 2026-02-21T10:19:38.0510644Z cvt.s16.s8 %rs1311, %rs1228; 2026-02-21T10:19:38.0510708Z shr.s16 
%rs1312, %rs1311, 4; 2026-02-21T10:19:38.0510768Z shr.s16 %rs1313, %rs1225, 4; 2026-02-21T10:19:38.0510838Z shr.s16 %rs1314, %rs1227, 4; 2026-02-21T10:19:38.0511032Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0511094Z cvt.rn.f32.s16 %r19705, %rs1314; 2026-02-21T10:19:38.0511155Z cvt.rn.f32.s16 %r19706, %rs1313; 2026-02-21T10:19:38.0511213Z cvt.rn.f32.s16 %r19707, %rs1312; 2026-02-21T10:19:38.0511357Z cvt.rn.f32.s16 %r19708, %rs1310; 2026-02-21T10:19:38.0511607Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0511669Z cvt.s16.s8 %rs1315, %rs1230; 2026-02-21T10:19:38.0511729Z shr.s16 %rs1316, %rs1315, 4; 2026-02-21T10:19:38.0511791Z cvt.s16.s8 %rs1317, %rs1232; 2026-02-21T10:19:38.0511850Z shr.s16 %rs1318, %rs1317, 4; 2026-02-21T10:19:38.0511909Z shr.s16 %rs1319, %rs1229, 4; 2026-02-21T10:19:38.0511973Z shr.s16 %rs1320, %rs1231, 4; 2026-02-21T10:19:38.0512161Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0512223Z cvt.rn.f32.s16 %r19709, %rs1320; 2026-02-21T10:19:38.0512286Z cvt.rn.f32.s16 %r19710, %rs1319; 2026-02-21T10:19:38.0512345Z cvt.rn.f32.s16 %r19711, %rs1318; 2026-02-21T10:19:38.0512405Z cvt.rn.f32.s16 %r19712, %rs1316; 2026-02-21T10:19:38.0512594Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0512656Z cvt.s16.s8 %rs1321, %rs1234; 2026-02-21T10:19:38.0512717Z shr.s16 %rs1322, %rs1321, 4; 2026-02-21T10:19:38.0512825Z cvt.s16.s8 %rs1323, %rs1236; 2026-02-21T10:19:38.0512888Z shr.s16 %rs1324, %rs1323, 4; 2026-02-21T10:19:38.0512946Z shr.s16 %rs1325, %rs1233, 4; 2026-02-21T10:19:38.0513004Z shr.s16 %rs1326, %rs1235, 4; 2026-02-21T10:19:38.0513241Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0513315Z cvt.rn.f32.s16 %r19713, %rs1326; 2026-02-21T10:19:38.0513379Z cvt.rn.f32.s16 %r19714, %rs1325; 
2026-02-21T10:19:38.0513441Z cvt.rn.f32.s16 %r19715, %rs1324; 2026-02-21T10:19:38.0513502Z cvt.rn.f32.s16 %r19716, %rs1322; 2026-02-21T10:19:38.0513694Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0513754Z cvt.s16.s8 %rs1327, %rs1238; 2026-02-21T10:19:38.0513819Z shr.s16 %rs1328, %rs1327, 4; 2026-02-21T10:19:38.0513880Z cvt.s16.s8 %rs1329, %rs1240; 2026-02-21T10:19:38.0513940Z shr.s16 %rs1330, %rs1329, 4; 2026-02-21T10:19:38.0514003Z shr.s16 %rs1331, %rs1237, 4; 2026-02-21T10:19:38.0514061Z shr.s16 %rs1332, %rs1239, 4; 2026-02-21T10:19:38.0514250Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0514318Z cvt.rn.f32.s16 %r19717, %rs1332; 2026-02-21T10:19:38.0514382Z cvt.rn.f32.s16 %r19718, %rs1331; 2026-02-21T10:19:38.0514444Z cvt.rn.f32.s16 %r19719, %rs1330; 2026-02-21T10:19:38.0514505Z cvt.rn.f32.s16 %r19720, %rs1328; 2026-02-21T10:19:38.0514707Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0514768Z cvt.s16.s8 %rs1333, %rs1242; 2026-02-21T10:19:38.0514827Z shr.s16 %rs1334, %rs1333, 4; 2026-02-21T10:19:38.0514889Z cvt.s16.s8 %rs1335, %rs1244; 2026-02-21T10:19:38.0514947Z shr.s16 %rs1336, %rs1335, 4; 2026-02-21T10:19:38.0515008Z shr.s16 %rs1337, %rs1241, 4; 2026-02-21T10:19:38.0515068Z shr.s16 %rs1338, %rs1243, 4; 2026-02-21T10:19:38.0515268Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0515331Z cvt.rn.f32.s16 %r19721, %rs1338; 2026-02-21T10:19:38.0515393Z cvt.rn.f32.s16 %r19722, %rs1337; 2026-02-21T10:19:38.0515456Z cvt.rn.f32.s16 %r19723, %rs1336; 2026-02-21T10:19:38.0515519Z cvt.rn.f32.s16 %r19724, %rs1334; 2026-02-21T10:19:38.0515709Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0515771Z cvt.s16.s8 %rs1339, %rs1246; 2026-02-21T10:19:38.0515830Z shr.s16 %rs1340, %rs1339, 4; 2026-02-21T10:19:38.0515889Z 
cvt.s16.s8 %rs1341, %rs1248; 2026-02-21T10:19:38.0515948Z shr.s16 %rs1342, %rs1341, 4; 2026-02-21T10:19:38.0516009Z shr.s16 %rs1343, %rs1245, 4; 2026-02-21T10:19:38.0516068Z shr.s16 %rs1344, %rs1247, 4; 2026-02-21T10:19:38.0516330Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0516440Z cvt.rn.f32.s16 %r19725, %rs1344; 2026-02-21T10:19:38.0516630Z cvt.rn.f32.s16 %r19726, %rs1343; 2026-02-21T10:19:38.0516695Z cvt.rn.f32.s16 %r19727, %rs1342; 2026-02-21T10:19:38.0516758Z cvt.rn.f32.s16 %r19728, %rs1340; 2026-02-21T10:19:38.0516815Z bar.sync 0; 2026-02-21T10:19:38.0516938Z st.shared.v4.b32 [%r27], {%r19668, %r19666, %r19667, %r19665}; 2026-02-21T10:19:38.0517069Z st.shared.v4.b32 [%r27+16384], {%r19700, %r19698, %r19699, %r19697}; 2026-02-21T10:19:38.0517195Z st.shared.v4.b32 [%r28], {%r19672, %r19670, %r19671, %r19669}; 2026-02-21T10:19:38.0517316Z st.shared.v4.b32 [%r28+16384], {%r19704, %r19702, %r19703, %r19701}; 2026-02-21T10:19:38.0517423Z st.shared.v4.b32 [%r29], {%r19676, %r19674, %r19675, %r19673}; 2026-02-21T10:19:38.0517542Z st.shared.v4.b32 [%r29+16384], {%r19708, %r19706, %r19707, %r19705}; 2026-02-21T10:19:38.0517651Z st.shared.v4.b32 [%r30], {%r19680, %r19678, %r19679, %r19677}; 2026-02-21T10:19:38.0517768Z st.shared.v4.b32 [%r30+16384], {%r19712, %r19710, %r19711, %r19709}; 2026-02-21T10:19:38.0517952Z st.shared.v4.b32 [%r31], {%r19684, %r19682, %r19683, %r19681}; 2026-02-21T10:19:38.0518071Z st.shared.v4.b32 [%r31+16384], {%r19716, %r19714, %r19715, %r19713}; 2026-02-21T10:19:38.0518178Z st.shared.v4.b32 [%r32], {%r19688, %r19686, %r19687, %r19685}; 2026-02-21T10:19:38.0518356Z st.shared.v4.b32 [%r32+16384], {%r19720, %r19718, %r19719, %r19717}; 2026-02-21T10:19:38.0518475Z st.shared.v4.b32 [%r33], {%r19692, %r19690, %r19691, %r19689}; 2026-02-21T10:19:38.0518595Z st.shared.v4.b32 [%r33+16384], {%r19724, %r19722, %r19723, %r19721}; 2026-02-21T10:19:38.0518701Z st.shared.v4.b32 [%r34], {%r19696, 
%r19694, %r19695, %r19693}; 2026-02-21T10:19:38.0518821Z st.shared.v4.b32 [%r34+16384], {%r19728, %r19726, %r19727, %r19725}; 2026-02-21T10:19:38.0518877Z $L__tmp11: 2026-02-21T10:19:38.0519148Z .loc 2 291 36 // standard.py:291:36 @[ cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:91:40 ] 2026-02-21T10:19:38.0519214Z // begin inline asm 2026-02-21T10:19:38.0519299Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.0519354Z // end inline asm 2026-02-21T10:19:38.0519410Z bar.sync 0; 2026-02-21T10:19:38.0519482Z wgmma.fence.sync.aligned; 2026-02-21T10:19:38.0519539Z // begin inline asm 2026-02-21T10:19:38.0521041Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r14936,%r14937,%r14938,%r14939}, %rd3, %p133, 1, 1; 2026-02-21T10:19:38.0521100Z // end inline asm 2026-02-21T10:19:38.0521159Z // begin inline asm 2026-02-21T10:19:38.0522656Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r15068,%r15069,%r15070,%r15071}, %rd4, %p133, 1, 1; 
2026-02-21T10:19:38.0522717Z // end inline asm 2026-02-21T10:19:38.0522775Z // begin inline asm 2026-02-21T10:19:38.0524258Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r15200,%r15201,%r15202,%r15203}, %rd5, %p133, 1, 1; 2026-02-21T10:19:38.0524464Z // end inline asm 2026-02-21T10:19:38.0524523Z // begin inline asm 2026-02-21T10:19:38.0526048Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r15332,%r15333,%r15334,%r15335}, %rd6, %p133, 1, 1; 2026-02-21T10:19:38.0526108Z // end inline asm 2026-02-21T10:19:38.0526166Z // begin inline asm 2026-02-21T10:19:38.0527859Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r15464,%r15465,%r15466,%r15467}, %rd7, %p133, 1, 1; 2026-02-21T10:19:38.0527926Z // end inline asm 2026-02-21T10:19:38.0527982Z // begin inline asm 2026-02-21T10:19:38.0529461Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r15596,%r15597,%r15598,%r15599}, %rd8, %p133, 1, 1; 2026-02-21T10:19:38.0529519Z // end inline asm 2026-02-21T10:19:38.0529576Z // begin inline asm 2026-02-21T10:19:38.0531057Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r15728,%r15729,%r15730,%r15731}, %rd9, %p133, 1, 1; 2026-02-21T10:19:38.0531119Z // end inline asm 2026-02-21T10:19:38.0531196Z // begin inline asm 2026-02-21T10:19:38.0532819Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r15860,%r15861,%r15862,%r15863}, %rd10, %p133, 1, 1; 2026-02-21T10:19:38.0533024Z // end inline asm 2026-02-21T10:19:38.0533082Z // begin inline asm 2026-02-21T10:19:38.0534563Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r15992,%r15993,%r15994,%r15995}, %rd3, %p133, 1, 1; 2026-02-21T10:19:38.0534622Z // end inline asm 2026-02-21T10:19:38.0534678Z // begin inline asm 2026-02-21T10:19:38.0536254Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r16124,%r16125,%r16126,%r16127}, %rd4, %p133, 1, 1; 2026-02-21T10:19:38.0536314Z // end inline asm 2026-02-21T10:19:38.0536372Z // begin inline asm 2026-02-21T10:19:38.0538004Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r16256,%r16257,%r16258,%r16259}, %rd5, %p133, 1, 1; 2026-02-21T10:19:38.0538067Z // end inline asm 2026-02-21T10:19:38.0538127Z // begin inline asm 2026-02-21T10:19:38.0539608Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r16388,%r16389,%r16390,%r16391}, %rd6, %p133, 1, 1; 2026-02-21T10:19:38.0539669Z // end inline asm 2026-02-21T10:19:38.0539727Z // begin inline asm 2026-02-21T10:19:38.0541203Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r16520,%r16521,%r16522,%r16523}, %rd7, %p133, 1, 1; 2026-02-21T10:19:38.0541261Z // end inline asm 2026-02-21T10:19:38.0541319Z // begin inline asm 2026-02-21T10:19:38.0542797Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r16652,%r16653,%r16654,%r16655}, %rd8, %p133, 1, 1; 2026-02-21T10:19:38.0542998Z // end inline asm 2026-02-21T10:19:38.0543055Z // begin inline asm 2026-02-21T10:19:38.0544595Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r16784,%r16785,%r16786,%r16787}, %rd9, %p133, 1, 1; 2026-02-21T10:19:38.0544656Z // end inline asm 2026-02-21T10:19:38.0544712Z // begin inline asm 2026-02-21T10:19:38.0546254Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r16916,%r16917,%r16918,%r16919}, %rd10, %p133, 1, 1; 2026-02-21T10:19:38.0546314Z // end inline asm 2026-02-21T10:19:38.0546392Z wgmma.commit_group.sync.aligned; 2026-02-21T10:19:38.0546573Z mov.b32 %r17049, %r19465; 2026-02-21T10:19:38.0546638Z mov.b32 %r17050, %r19465; 2026-02-21T10:19:38.0546696Z mov.b32 %r17048, %r39931; 2026-02-21T10:19:38.0546755Z // begin inline asm 2026-02-21T10:19:38.0549377Z // wait for regs: 
%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r17048,%r17049,%r17050 2026-02-21T10:19:38.0549463Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:19:38.0549520Z // end inline asm 2026-02-21T10:19:38.0549574Z $L__tmp12: 2026-02-21T10:19:38.0549781Z .loc 1 55 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:32 2026-02-21T10:19:38.0549848Z add.s64 %rd415, %rd333, 256; 2026-02-21T10:19:38.0549910Z add.s64 %rd418, %rd336, 256; 2026-02-21T10:19:38.0549970Z add.s64 %rd421, %rd339, 256; 2026-02-21T10:19:38.0550032Z add.s64 %rd424, %rd342, 256; 2026-02-21T10:19:38.0550092Z add.s64 %rd427, %rd345, 256; 2026-02-21T10:19:38.0550241Z add.s64 %rd430, %rd348, 256; 2026-02-21T10:19:38.0550378Z add.s64 %rd433, %rd351, 256; 2026-02-21T10:19:38.0550457Z mad.wide.s32 %rd436, %r42725, 2, %rd117; 2026-02-21T10:19:38.0550658Z .loc 1 55 80 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:80 2026-02-21T10:19:38.0550719Z // begin inline asm 2026-02-21T10:19:38.0550777Z mov.u64 %rd414, 0x0; 
2026-02-21T10:19:38.0550908Z createpolicy.fractional.L2::evict_first.b64 %rd414, 1.0; 2026-02-21T10:19:38.0550964Z // end inline asm 2026-02-21T10:19:38.0551024Z // begin inline asm 2026-02-21T10:19:38.0551082Z mov.u32 %r17182, 0x0; 2026-02-21T10:19:38.0551138Z mov.u32 %r17183, 0x0; 2026-02-21T10:19:38.0551195Z mov.u32 %r17184, 0x0; 2026-02-21T10:19:38.0551250Z mov.u32 %r17185, 0x0; 2026-02-21T10:19:38.0551486Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r17182, %r17183, %r17184, %r17185 }, [ %rd415 + 0 ], %rd414; 2026-02-21T10:19:38.0551543Z // end inline asm 2026-02-21T10:19:38.0551601Z // begin inline asm 2026-02-21T10:19:38.0551658Z mov.u64 %rd417, 0x0; 2026-02-21T10:19:38.0551779Z createpolicy.fractional.L2::evict_first.b64 %rd417, 1.0; 2026-02-21T10:19:38.0551911Z // end inline asm 2026-02-21T10:19:38.0551974Z // begin inline asm 2026-02-21T10:19:38.0552030Z mov.u32 %r17186, 0x0; 2026-02-21T10:19:38.0552088Z mov.u32 %r17187, 0x0; 2026-02-21T10:19:38.0552142Z mov.u32 %r17188, 0x0; 2026-02-21T10:19:38.0552199Z mov.u32 %r17189, 0x0; 2026-02-21T10:19:38.0552489Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r17186, %r17187, %r17188, %r17189 }, [ %rd418 + 0 ], %rd417; 2026-02-21T10:19:38.0552550Z // end inline asm 2026-02-21T10:19:38.0552607Z // begin inline asm 2026-02-21T10:19:38.0552664Z mov.u64 %rd420, 0x0; 2026-02-21T10:19:38.0552782Z createpolicy.fractional.L2::evict_first.b64 %rd420, 1.0; 2026-02-21T10:19:38.0552837Z // end inline asm 2026-02-21T10:19:38.0552892Z // begin inline asm 2026-02-21T10:19:38.0552949Z mov.u32 %r17190, 0x0; 2026-02-21T10:19:38.0553006Z mov.u32 %r17191, 0x0; 2026-02-21T10:19:38.0553060Z mov.u32 %r17192, 0x0; 2026-02-21T10:19:38.0553117Z mov.u32 %r17193, 0x0; 2026-02-21T10:19:38.0553344Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r17190, %r17191, %r17192, %r17193 }, [ %rd421 + 0 ], %rd420; 2026-02-21T10:19:38.0553400Z // end inline asm 2026-02-21T10:19:38.0553456Z // begin inline asm 2026-02-21T10:19:38.0553515Z 
mov.u64 %rd423, 0x0; 2026-02-21T10:19:38.0553632Z createpolicy.fractional.L2::evict_first.b64 %rd423, 1.0; 2026-02-21T10:19:38.0553687Z // end inline asm 2026-02-21T10:19:38.0553746Z // begin inline asm 2026-02-21T10:19:38.0553801Z mov.u32 %r17194, 0x0; 2026-02-21T10:19:38.0553856Z mov.u32 %r17195, 0x0; 2026-02-21T10:19:38.0553912Z mov.u32 %r17196, 0x0; 2026-02-21T10:19:38.0553971Z mov.u32 %r17197, 0x0; 2026-02-21T10:19:38.0554192Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r17194, %r17195, %r17196, %r17197 }, [ %rd424 + 0 ], %rd423; 2026-02-21T10:19:38.0554247Z // end inline asm 2026-02-21T10:19:38.0554319Z // begin inline asm 2026-02-21T10:19:38.0554377Z mov.u64 %rd426, 0x0; 2026-02-21T10:19:38.0554496Z createpolicy.fractional.L2::evict_first.b64 %rd426, 1.0; 2026-02-21T10:19:38.0554560Z // end inline asm 2026-02-21T10:19:38.0554619Z // begin inline asm 2026-02-21T10:19:38.0554675Z mov.u32 %r17198, 0x0; 2026-02-21T10:19:38.0554731Z mov.u32 %r17199, 0x0; 2026-02-21T10:19:38.0554790Z mov.u32 %r17200, 0x0; 2026-02-21T10:19:38.0554845Z mov.u32 %r17201, 0x0; 2026-02-21T10:19:38.0555069Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r17198, %r17199, %r17200, %r17201 }, [ %rd427 + 0 ], %rd426; 2026-02-21T10:19:38.0555127Z // end inline asm 2026-02-21T10:19:38.0555183Z // begin inline asm 2026-02-21T10:19:38.0555240Z mov.u64 %rd429, 0x0; 2026-02-21T10:19:38.0555352Z createpolicy.fractional.L2::evict_first.b64 %rd429, 1.0; 2026-02-21T10:19:38.0555410Z // end inline asm 2026-02-21T10:19:38.0555467Z // begin inline asm 2026-02-21T10:19:38.0555521Z mov.u32 %r17202, 0x0; 2026-02-21T10:19:38.0555667Z mov.u32 %r17203, 0x0; 2026-02-21T10:19:38.0555723Z mov.u32 %r17204, 0x0; 2026-02-21T10:19:38.0555826Z mov.u32 %r17205, 0x0; 2026-02-21T10:19:38.0556053Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r17202, %r17203, %r17204, %r17205 }, [ %rd430 + 0 ], %rd429; 2026-02-21T10:19:38.0556108Z // end inline asm 2026-02-21T10:19:38.0556165Z // begin inline asm 
2026-02-21T10:19:38.0556221Z mov.u64 %rd432, 0x0; 2026-02-21T10:19:38.0556337Z createpolicy.fractional.L2::evict_first.b64 %rd432, 1.0; 2026-02-21T10:19:38.0556392Z // end inline asm 2026-02-21T10:19:38.0556573Z // begin inline asm 2026-02-21T10:19:38.0556637Z mov.u32 %r17206, 0x0; 2026-02-21T10:19:38.0556693Z mov.u32 %r17207, 0x0; 2026-02-21T10:19:38.0556748Z mov.u32 %r17208, 0x0; 2026-02-21T10:19:38.0556802Z mov.u32 %r17209, 0x0; 2026-02-21T10:19:38.0557027Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r17206, %r17207, %r17208, %r17209 }, [ %rd433 + 0 ], %rd432; 2026-02-21T10:19:38.0557094Z // end inline asm 2026-02-21T10:19:38.0557155Z // begin inline asm 2026-02-21T10:19:38.0557215Z mov.u64 %rd435, 0x0; 2026-02-21T10:19:38.0557335Z createpolicy.fractional.L2::evict_first.b64 %rd435, 1.0; 2026-02-21T10:19:38.0557463Z // end inline asm 2026-02-21T10:19:38.0557526Z // begin inline asm 2026-02-21T10:19:38.0557582Z mov.u32 %r17210, 0x0; 2026-02-21T10:19:38.0557637Z mov.u32 %r17211, 0x0; 2026-02-21T10:19:38.0557692Z mov.u32 %r17212, 0x0; 2026-02-21T10:19:38.0557754Z mov.u32 %r17213, 0x0; 2026-02-21T10:19:38.0558035Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r17210, %r17211, %r17212, %r17213 }, [ %rd436 + 0 ], %rd435; 2026-02-21T10:19:38.0558092Z // end inline asm 2026-02-21T10:19:38.0558306Z .loc 1 59 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:59:32 2026-02-21T10:19:38.0558366Z bar.sync 0; 2026-02-21T10:19:38.0558449Z st.shared.v2.b32 [%r9], {%r17182, %r17183}; 2026-02-21T10:19:38.0558542Z st.shared.v2.b32 [%r9+2048], {%r17186, %r17187}; 2026-02-21T10:19:38.0558627Z st.shared.v2.b32 [%r9+4096], {%r17190, %r17191}; 2026-02-21T10:19:38.0558710Z st.shared.v2.b32 [%r9+6144], {%r17194, %r17195}; 2026-02-21T10:19:38.0558792Z st.shared.v2.b32 [%r9+8192], {%r17198, %r17199}; 2026-02-21T10:19:38.0558883Z st.shared.v2.b32 [%r9+10240], {%r17202, %r17203}; 2026-02-21T10:19:38.0558965Z st.shared.v2.b32 [%r9+12288], {%r17206, %r17207}; 
2026-02-21T10:19:38.0559046Z st.shared.v2.b32 [%r9+14336], {%r17210, %r17211}; 2026-02-21T10:19:38.0559128Z st.shared.v2.b32 [%r10], {%r17184, %r17185}; 2026-02-21T10:19:38.0559210Z st.shared.v2.b32 [%r10+2048], {%r17188, %r17189}; 2026-02-21T10:19:38.0559292Z st.shared.v2.b32 [%r10+4096], {%r17192, %r17193}; 2026-02-21T10:19:38.0559376Z st.shared.v2.b32 [%r10+6144], {%r17196, %r17197}; 2026-02-21T10:19:38.0559458Z st.shared.v2.b32 [%r10+8192], {%r17200, %r17201}; 2026-02-21T10:19:38.0559546Z st.shared.v2.b32 [%r10+10240], {%r17204, %r17205}; 2026-02-21T10:19:38.0559633Z st.shared.v2.b32 [%r10+12288], {%r17208, %r17209}; 2026-02-21T10:19:38.0559720Z st.shared.v2.b32 [%r10+14336], {%r17212, %r17213}; 2026-02-21T10:19:38.0559777Z bar.sync 0; 2026-02-21T10:19:38.0559846Z ld.shared.b16 %rs1345, [%r51]; 2026-02-21T10:19:38.0559918Z ld.shared.b16 %rs1346, [%r51+1024]; 2026-02-21T10:19:38.0559984Z ld.shared.b16 %rs1347, [%r51+64]; 2026-02-21T10:19:38.0560050Z ld.shared.b16 %rs1348, [%r51+1088]; 2026-02-21T10:19:38.0560113Z ld.shared.b16 %rs1349, [%r51+8192]; 2026-02-21T10:19:38.0560179Z ld.shared.b16 %rs1350, [%r51+9216]; 2026-02-21T10:19:38.0560243Z ld.shared.b16 %rs1351, [%r51+8256]; 2026-02-21T10:19:38.0560305Z ld.shared.b16 %rs1352, [%r51+9280]; 2026-02-21T10:19:38.0560371Z ld.shared.b16 %rs1353, [%r52]; 2026-02-21T10:19:38.0560433Z ld.shared.b16 %rs1354, [%r52+1024]; 2026-02-21T10:19:38.0560496Z ld.shared.b16 %rs1355, [%r52+64]; 2026-02-21T10:19:38.0560561Z ld.shared.b16 %rs1356, [%r52+1088]; 2026-02-21T10:19:38.0560624Z ld.shared.b16 %rs1357, [%r52+8192]; 2026-02-21T10:19:38.0560686Z ld.shared.b16 %rs1358, [%r52+9216]; 2026-02-21T10:19:38.0560835Z ld.shared.b16 %rs1359, [%r52+8256]; 2026-02-21T10:19:38.0560961Z ld.shared.b16 %rs1360, [%r52+9280]; 2026-02-21T10:19:38.0561026Z ld.shared.b16 %rs1361, [%r53]; 2026-02-21T10:19:38.0561089Z ld.shared.b16 %rs1362, [%r53+1024]; 2026-02-21T10:19:38.0561154Z ld.shared.b16 %rs1363, [%r53+64]; 2026-02-21T10:19:38.0561217Z 
ld.shared.b16 %rs1364, [%r53+1088]; 2026-02-21T10:19:38.0561280Z ld.shared.b16 %rs1365, [%r53+8192]; 2026-02-21T10:19:38.0561344Z ld.shared.b16 %rs1366, [%r53+9216]; 2026-02-21T10:19:38.0561412Z ld.shared.b16 %rs1367, [%r53+8256]; 2026-02-21T10:19:38.0561475Z ld.shared.b16 %rs1368, [%r53+9280]; 2026-02-21T10:19:38.0561537Z ld.shared.b16 %rs1369, [%r54]; 2026-02-21T10:19:38.0561603Z ld.shared.b16 %rs1370, [%r54+1024]; 2026-02-21T10:19:38.0561665Z ld.shared.b16 %rs1371, [%r54+64]; 2026-02-21T10:19:38.0561730Z ld.shared.b16 %rs1372, [%r54+1088]; 2026-02-21T10:19:38.0561795Z ld.shared.b16 %rs1373, [%r54+8192]; 2026-02-21T10:19:38.0561859Z ld.shared.b16 %rs1374, [%r54+9216]; 2026-02-21T10:19:38.0561922Z ld.shared.b16 %rs1375, [%r54+8256]; 2026-02-21T10:19:38.0561984Z ld.shared.b16 %rs1376, [%r54+9280]; 2026-02-21T10:19:38.0562097Z ld.shared.b16 %rs1377, [%r55]; 2026-02-21T10:19:38.0562164Z ld.shared.b16 %rs1378, [%r55+1024]; 2026-02-21T10:19:38.0562226Z ld.shared.b16 %rs1379, [%r55+64]; 2026-02-21T10:19:38.0562290Z ld.shared.b16 %rs1380, [%r55+1088]; 2026-02-21T10:19:38.0562400Z ld.shared.b16 %rs1381, [%r55+8192]; 2026-02-21T10:19:38.0562470Z ld.shared.b16 %rs1382, [%r55+9216]; 2026-02-21T10:19:38.0562534Z ld.shared.b16 %rs1383, [%r55+8256]; 2026-02-21T10:19:38.0562601Z ld.shared.b16 %rs1384, [%r55+9280]; 2026-02-21T10:19:38.0562667Z ld.shared.b16 %rs1385, [%r56]; 2026-02-21T10:19:38.0562731Z ld.shared.b16 %rs1386, [%r56+1024]; 2026-02-21T10:19:38.0562810Z ld.shared.b16 %rs1387, [%r56+64]; 2026-02-21T10:19:38.0562882Z ld.shared.b16 %rs1388, [%r56+1088]; 2026-02-21T10:19:38.0562946Z ld.shared.b16 %rs1389, [%r56+8192]; 2026-02-21T10:19:38.0563013Z ld.shared.b16 %rs1390, [%r56+9216]; 2026-02-21T10:19:38.0563080Z ld.shared.b16 %rs1391, [%r56+8256]; 2026-02-21T10:19:38.0563145Z ld.shared.b16 %rs1392, [%r56+9280]; 2026-02-21T10:19:38.0563210Z ld.shared.b16 %rs1393, [%r57]; 2026-02-21T10:19:38.0563275Z ld.shared.b16 %rs1394, [%r57+1024]; 2026-02-21T10:19:38.0563339Z 
ld.shared.b16 %rs1395, [%r57+64]; 2026-02-21T10:19:38.0563402Z ld.shared.b16 %rs1396, [%r57+1088]; 2026-02-21T10:19:38.0563468Z ld.shared.b16 %rs1397, [%r57+8192]; 2026-02-21T10:19:38.0563531Z ld.shared.b16 %rs1398, [%r57+9216]; 2026-02-21T10:19:38.0563596Z ld.shared.b16 %rs1399, [%r57+8256]; 2026-02-21T10:19:38.0563659Z ld.shared.b16 %rs1400, [%r57+9280]; 2026-02-21T10:19:38.0563723Z ld.shared.b16 %rs1401, [%r58]; 2026-02-21T10:19:38.0563787Z ld.shared.b16 %rs1402, [%r58+1024]; 2026-02-21T10:19:38.0563849Z ld.shared.b16 %rs1403, [%r58+64]; 2026-02-21T10:19:38.0563914Z ld.shared.b16 %rs1404, [%r58+1088]; 2026-02-21T10:19:38.0563980Z ld.shared.b16 %rs1405, [%r58+8192]; 2026-02-21T10:19:38.0564048Z ld.shared.b16 %rs1406, [%r58+9216]; 2026-02-21T10:19:38.0564114Z ld.shared.b16 %rs1407, [%r58+8256]; 2026-02-21T10:19:38.0564190Z ld.shared.b16 %rs1408, [%r58+9280]; 2026-02-21T10:19:38.0564255Z cvt.f32.bf16 %r17351, %rs1345; 2026-02-21T10:19:38.0564325Z cvt.f32.bf16 %r17352, %rs1346; 2026-02-21T10:19:38.0564389Z cvt.f32.bf16 %r17353, %rs1353; 2026-02-21T10:19:38.0564450Z cvt.f32.bf16 %r17354, %rs1354; 2026-02-21T10:19:38.0564510Z cvt.f32.bf16 %r17483, %rs1361; 2026-02-21T10:19:38.0564576Z cvt.f32.bf16 %r17484, %rs1362; 2026-02-21T10:19:38.0564635Z cvt.f32.bf16 %r17485, %rs1369; 2026-02-21T10:19:38.0564695Z cvt.f32.bf16 %r17486, %rs1370; 2026-02-21T10:19:38.0564755Z cvt.f32.bf16 %r17615, %rs1377; 2026-02-21T10:19:38.0564816Z cvt.f32.bf16 %r17616, %rs1378; 2026-02-21T10:19:38.0564885Z cvt.f32.bf16 %r17617, %rs1385; 2026-02-21T10:19:38.0564948Z cvt.f32.bf16 %r17618, %rs1386; 2026-02-21T10:19:38.0565068Z cvt.f32.bf16 %r17747, %rs1393; 2026-02-21T10:19:38.0565128Z cvt.f32.bf16 %r17748, %rs1394; 2026-02-21T10:19:38.0565233Z cvt.f32.bf16 %r17749, %rs1401; 2026-02-21T10:19:38.0565293Z cvt.f32.bf16 %r17750, %rs1402; 2026-02-21T10:19:38.0565356Z cvt.f32.bf16 %r17879, %rs1347; 2026-02-21T10:19:38.0565416Z cvt.f32.bf16 %r17880, %rs1348; 2026-02-21T10:19:38.0565475Z cvt.f32.bf16 
%r17881, %rs1355; 2026-02-21T10:19:38.0565550Z cvt.f32.bf16 %r17882, %rs1356; 2026-02-21T10:19:38.0565614Z cvt.f32.bf16 %r18011, %rs1363; 2026-02-21T10:19:38.0565675Z cvt.f32.bf16 %r18012, %rs1364; 2026-02-21T10:19:38.0565740Z cvt.f32.bf16 %r18013, %rs1371; 2026-02-21T10:19:38.0565800Z cvt.f32.bf16 %r18014, %rs1372; 2026-02-21T10:19:38.0565859Z cvt.f32.bf16 %r18143, %rs1379; 2026-02-21T10:19:38.0565917Z cvt.f32.bf16 %r18144, %rs1380; 2026-02-21T10:19:38.0565981Z cvt.f32.bf16 %r18145, %rs1387; 2026-02-21T10:19:38.0566042Z cvt.f32.bf16 %r18146, %rs1388; 2026-02-21T10:19:38.0566101Z cvt.f32.bf16 %r18275, %rs1395; 2026-02-21T10:19:38.0566166Z cvt.f32.bf16 %r18276, %rs1396; 2026-02-21T10:19:38.0566229Z cvt.f32.bf16 %r18277, %rs1403; 2026-02-21T10:19:38.0566288Z cvt.f32.bf16 %r18278, %rs1404; 2026-02-21T10:19:38.0566398Z cvt.f32.bf16 %r18407, %rs1349; 2026-02-21T10:19:38.0566575Z cvt.f32.bf16 %r18408, %rs1350; 2026-02-21T10:19:38.0566639Z cvt.f32.bf16 %r18409, %rs1357; 2026-02-21T10:19:38.0566709Z cvt.f32.bf16 %r18410, %rs1358; 2026-02-21T10:19:38.0566848Z cvt.f32.bf16 %r18539, %rs1365; 2026-02-21T10:19:38.0566920Z cvt.f32.bf16 %r18540, %rs1366; 2026-02-21T10:19:38.0566985Z cvt.f32.bf16 %r18541, %rs1373; 2026-02-21T10:19:38.0567045Z cvt.f32.bf16 %r18542, %rs1374; 2026-02-21T10:19:38.0567107Z cvt.f32.bf16 %r18671, %rs1381; 2026-02-21T10:19:38.0567168Z cvt.f32.bf16 %r18672, %rs1382; 2026-02-21T10:19:38.0567229Z cvt.f32.bf16 %r18673, %rs1389; 2026-02-21T10:19:38.0567291Z cvt.f32.bf16 %r18674, %rs1390; 2026-02-21T10:19:38.0567350Z cvt.f32.bf16 %r18803, %rs1397; 2026-02-21T10:19:38.0567412Z cvt.f32.bf16 %r18804, %rs1398; 2026-02-21T10:19:38.0567471Z cvt.f32.bf16 %r18805, %rs1405; 2026-02-21T10:19:38.0567535Z cvt.f32.bf16 %r18806, %rs1406; 2026-02-21T10:19:38.0567595Z cvt.f32.bf16 %r18935, %rs1351; 2026-02-21T10:19:38.0567654Z cvt.f32.bf16 %r18936, %rs1352; 2026-02-21T10:19:38.0567716Z cvt.f32.bf16 %r18937, %rs1359; 2026-02-21T10:19:38.0567774Z cvt.f32.bf16 %r18938, %rs1360; 
2026-02-21T10:19:38.0567833Z cvt.f32.bf16 %r19067, %rs1367; 2026-02-21T10:19:38.0567895Z cvt.f32.bf16 %r19068, %rs1368; 2026-02-21T10:19:38.0567955Z cvt.f32.bf16 %r19069, %rs1375; 2026-02-21T10:19:38.0568015Z cvt.f32.bf16 %r19070, %rs1376; 2026-02-21T10:19:38.0568074Z cvt.f32.bf16 %r19199, %rs1383; 2026-02-21T10:19:38.0568136Z cvt.f32.bf16 %r19200, %rs1384; 2026-02-21T10:19:38.0568195Z cvt.f32.bf16 %r19201, %rs1391; 2026-02-21T10:19:38.0568254Z cvt.f32.bf16 %r19202, %rs1392; 2026-02-21T10:19:38.0568316Z cvt.f32.bf16 %r19331, %rs1399; 2026-02-21T10:19:38.0568374Z cvt.f32.bf16 %r19332, %rs1400; 2026-02-21T10:19:38.0568436Z cvt.f32.bf16 %r19333, %rs1407; 2026-02-21T10:19:38.0568495Z cvt.f32.bf16 %r19334, %rs1408; 2026-02-21T10:19:38.0568709Z .loc 1 61 33 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:61:33 2026-02-21T10:19:38.0568768Z bar.sync 0; 2026-02-21T10:19:38.0568829Z // begin inline asm 2026-02-21T10:19:38.0568938Z @%p313 mbarrier.init.shared::cta.b64 [%r29846], 1; 2026-02-21T10:19:38.0568995Z // end inline asm 2026-02-21T10:19:38.0569052Z bar.sync 0; 2026-02-21T10:19:38.0569112Z // begin inline asm 2026-02-21T10:19:38.0569247Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r29846], 4096; 2026-02-21T10:19:38.0569303Z // end inline asm 2026-02-21T10:19:38.0569359Z // begin inline asm 2026-02-21T10:19:38.0569439Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.0569494Z // end inline asm 2026-02-21T10:19:38.0569548Z bar.sync 0; 2026-02-21T10:19:38.0569618Z elect.sync %r19729|%p192, -1; 2026-02-21T10:19:38.0569688Z and.pred %p171, %p1, %p192; 2026-02-21T10:19:38.0569839Z add.s32 %r17218, %r19664, 160; 2026-02-21T10:19:38.0569959Z // begin inline asm 2026-02-21T10:19:38.0570303Z @%p171 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39931], [%rd822, {%r19829, %r17218}], [%r29846]; 2026-02-21T10:19:38.0570360Z // end inline asm 2026-02-21T10:19:38.0570413Z bar.sync 0; 2026-02-21T10:19:38.0570473Z // begin inline asm 
2026-02-21T10:19:38.0570524Z 2026-02-21T10:19:38.0570575Z { 2026-02-21T10:19:38.0570637Z .reg .pred complete; 2026-02-21T10:19:38.0570695Z waitLoop: 2026-02-21T10:19:38.0570842Z mbarrier.try_wait.parity.shared.b64 complete, [%r29846], %r19465; 2026-02-21T10:19:38.0570910Z @!complete bra.uni waitLoop; 2026-02-21T10:19:38.0570962Z } 2026-02-21T10:19:38.0570968Z 2026-02-21T10:19:38.0571024Z // end inline asm 2026-02-21T10:19:38.0571077Z bar.sync 0; 2026-02-21T10:19:38.0571137Z // begin inline asm 2026-02-21T10:19:38.0571234Z @%p313 mbarrier.inval.shared::cta.b64 [%r29846]; 2026-02-21T10:19:38.0571290Z // end inline asm 2026-02-21T10:19:38.0571497Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0571636Z ld.shared.s8 %rs1409, [%r19]; 2026-02-21T10:19:38.0571847Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0571912Z shl.b16 %rs1410, %rs1409, 4; 2026-02-21T10:19:38.0572184Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0572260Z ld.shared.s8 %rs1411, [%r20+128]; 2026-02-21T10:19:38.0572454Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0572521Z shl.b16 %rs1412, %rs1411, 4; 2026-02-21T10:19:38.0572711Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0572778Z ld.shared.s8 %rs1413, [%r21+256]; 2026-02-21T10:19:38.0572977Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0573040Z shl.b16 %rs1414, %rs1413, 4; 2026-02-21T10:19:38.0573230Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0573293Z ld.shared.s8 %rs1415, [%r22+384]; 2026-02-21T10:19:38.0573497Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0573560Z shl.b16 %rs1416, %rs1415, 4; 2026-02-21T10:19:38.0573750Z .loc 1 66 
25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0573818Z ld.shared.s8 %rs1417, [%r23+512]; 2026-02-21T10:19:38.0574008Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0574068Z shl.b16 %rs1418, %rs1417, 4; 2026-02-21T10:19:38.0574261Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0574328Z ld.shared.s8 %rs1419, [%r24+640]; 2026-02-21T10:19:38.0574519Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0574582Z shl.b16 %rs1420, %rs1419, 4; 2026-02-21T10:19:38.0574772Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0574836Z ld.shared.s8 %rs1421, [%r25+768]; 2026-02-21T10:19:38.0575025Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0575090Z shl.b16 %rs1422, %rs1421, 4; 2026-02-21T10:19:38.0575277Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0575340Z ld.shared.s8 %rs1423, [%r26+896]; 2026-02-21T10:19:38.0575533Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0575651Z shl.b16 %rs1424, %rs1423, 4; 2026-02-21T10:19:38.0575886Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0575955Z ld.shared.s8 %rs1425, [%r19+1024]; 2026-02-21T10:19:38.0576144Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0576203Z shl.b16 %rs1426, %rs1425, 4; 2026-02-21T10:19:38.0576396Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0576586Z ld.shared.s8 %rs1427, [%r20+1152]; 2026-02-21T10:19:38.0576782Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0576843Z shl.b16 %rs1428, %rs1427, 4; 
2026-02-21T10:19:38.0577049Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0577118Z ld.shared.s8 %rs1429, [%r21+1280]; 2026-02-21T10:19:38.0577306Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0577447Z shl.b16 %rs1430, %rs1429, 4; 2026-02-21T10:19:38.0577638Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0577703Z ld.shared.s8 %rs1431, [%r22+1408]; 2026-02-21T10:19:38.0577950Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0578013Z shl.b16 %rs1432, %rs1431, 4; 2026-02-21T10:19:38.0582611Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0582692Z ld.shared.s8 %rs1433, [%r23+1536]; 2026-02-21T10:19:38.0582917Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0582988Z shl.b16 %rs1434, %rs1433, 4; 2026-02-21T10:19:38.0583199Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0583274Z ld.shared.s8 %rs1435, [%r24+1664]; 2026-02-21T10:19:38.0583474Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0583539Z shl.b16 %rs1436, %rs1435, 4; 2026-02-21T10:19:38.0583729Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0583797Z ld.shared.s8 %rs1437, [%r25+1792]; 2026-02-21T10:19:38.0583986Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0584064Z shl.b16 %rs1438, %rs1437, 4; 2026-02-21T10:19:38.0584249Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0584315Z ld.shared.s8 %rs1439, [%r26+1920]; 2026-02-21T10:19:38.0584508Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0584570Z shl.b16 
%rs1440, %rs1439, 4; 2026-02-21T10:19:38.0584759Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0584820Z ld.shared.s8 %rs1441, [%r19+2048]; 2026-02-21T10:19:38.0585005Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0585069Z shl.b16 %rs1442, %rs1441, 4; 2026-02-21T10:19:38.0585255Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0585318Z ld.shared.s8 %rs1443, [%r20+2176]; 2026-02-21T10:19:38.0585506Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0585566Z shl.b16 %rs1444, %rs1443, 4; 2026-02-21T10:19:38.0585750Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0585928Z ld.shared.s8 %rs1445, [%r21+2304]; 2026-02-21T10:19:38.0586122Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0586182Z shl.b16 %rs1446, %rs1445, 4; 2026-02-21T10:19:38.0586368Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0586433Z ld.shared.s8 %rs1447, [%r22+2432]; 2026-02-21T10:19:38.0586817Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0586881Z shl.b16 %rs1448, %rs1447, 4; 2026-02-21T10:19:38.0587085Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0587150Z ld.shared.s8 %rs1449, [%r23+2560]; 2026-02-21T10:19:38.0587337Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0587401Z shl.b16 %rs1450, %rs1449, 4; 2026-02-21T10:19:38.0587588Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0587732Z ld.shared.s8 %rs1451, [%r24+2688]; 2026-02-21T10:19:38.0587921Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 
2026-02-21T10:19:38.0587989Z shl.b16 %rs1452, %rs1451, 4; 2026-02-21T10:19:38.0588234Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0588300Z ld.shared.s8 %rs1453, [%r25+2816]; 2026-02-21T10:19:38.0588737Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0588799Z shl.b16 %rs1454, %rs1453, 4; 2026-02-21T10:19:38.0588989Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0589054Z ld.shared.s8 %rs1455, [%r26+2944]; 2026-02-21T10:19:38.0589240Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0589306Z shl.b16 %rs1456, %rs1455, 4; 2026-02-21T10:19:38.0589506Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0589575Z ld.shared.s8 %rs1457, [%r19+3072]; 2026-02-21T10:19:38.0589764Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0589826Z shl.b16 %rs1458, %rs1457, 4; 2026-02-21T10:19:38.0590017Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0590081Z ld.shared.s8 %rs1459, [%r20+3200]; 2026-02-21T10:19:38.0590267Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0590329Z shl.b16 %rs1460, %rs1459, 4; 2026-02-21T10:19:38.0590515Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0590579Z ld.shared.s8 %rs1461, [%r21+3328]; 2026-02-21T10:19:38.0590769Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0590829Z shl.b16 %rs1462, %rs1461, 4; 2026-02-21T10:19:38.0591015Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0591082Z ld.shared.s8 %rs1463, [%r22+3456]; 2026-02-21T10:19:38.0591281Z .loc 1 64 28 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0591344Z shl.b16 %rs1464, %rs1463, 4; 2026-02-21T10:19:38.0591532Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0591596Z ld.shared.s8 %rs1465, [%r23+3584]; 2026-02-21T10:19:38.0591780Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0591918Z shl.b16 %rs1466, %rs1465, 4; 2026-02-21T10:19:38.0592125Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0592191Z ld.shared.s8 %rs1467, [%r24+3712]; 2026-02-21T10:19:38.0592388Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0592453Z shl.b16 %rs1468, %rs1467, 4; 2026-02-21T10:19:38.0592642Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0592709Z ld.shared.s8 %rs1469, [%r25+3840]; 2026-02-21T10:19:38.0592908Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0592971Z shl.b16 %rs1470, %rs1469, 4; 2026-02-21T10:19:38.0593166Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0593238Z ld.shared.s8 %rs1471, [%r26+3968]; 2026-02-21T10:19:38.0593430Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0593556Z shl.b16 %rs1472, %rs1471, 4; 2026-02-21T10:19:38.0593749Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0593813Z cvt.s16.s8 %rs1473, %rs1410; 2026-02-21T10:19:38.0593872Z shr.s16 %rs1474, %rs1473, 4; 2026-02-21T10:19:38.0593975Z cvt.s16.s8 %rs1475, %rs1412; 2026-02-21T10:19:38.0594038Z shr.s16 %rs1476, %rs1475, 4; 2026-02-21T10:19:38.0594096Z shr.s16 %rs1477, %rs1409, 4; 2026-02-21T10:19:38.0594235Z shr.s16 %rs1478, %rs1411, 4; 2026-02-21T10:19:38.0594428Z .loc 1 84 32 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0594494Z cvt.rn.f32.s16 %r19730, %rs1478; 2026-02-21T10:19:38.0594555Z cvt.rn.f32.s16 %r19731, %rs1477; 2026-02-21T10:19:38.0594616Z cvt.rn.f32.s16 %r19732, %rs1476; 2026-02-21T10:19:38.0594681Z cvt.rn.f32.s16 %r19733, %rs1474; 2026-02-21T10:19:38.0594872Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0594933Z cvt.s16.s8 %rs1479, %rs1414; 2026-02-21T10:19:38.0594994Z shr.s16 %rs1480, %rs1479, 4; 2026-02-21T10:19:38.0595053Z cvt.s16.s8 %rs1481, %rs1416; 2026-02-21T10:19:38.0595110Z shr.s16 %rs1482, %rs1481, 4; 2026-02-21T10:19:38.0595169Z shr.s16 %rs1483, %rs1413, 4; 2026-02-21T10:19:38.0595230Z shr.s16 %rs1484, %rs1415, 4; 2026-02-21T10:19:38.0595415Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0595477Z cvt.rn.f32.s16 %r19734, %rs1484; 2026-02-21T10:19:38.0595539Z cvt.rn.f32.s16 %r19735, %rs1483; 2026-02-21T10:19:38.0595600Z cvt.rn.f32.s16 %r19736, %rs1482; 2026-02-21T10:19:38.0595660Z cvt.rn.f32.s16 %r19737, %rs1480; 2026-02-21T10:19:38.0595851Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0595912Z cvt.s16.s8 %rs1485, %rs1418; 2026-02-21T10:19:38.0595972Z shr.s16 %rs1486, %rs1485, 4; 2026-02-21T10:19:38.0596033Z cvt.s16.s8 %rs1487, %rs1420; 2026-02-21T10:19:38.0596094Z shr.s16 %rs1488, %rs1487, 4; 2026-02-21T10:19:38.0596154Z shr.s16 %rs1489, %rs1417, 4; 2026-02-21T10:19:38.0596212Z shr.s16 %rs1490, %rs1419, 4; 2026-02-21T10:19:38.0596404Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0596615Z cvt.rn.f32.s16 %r19738, %rs1490; 2026-02-21T10:19:38.0596700Z cvt.rn.f32.s16 %r19739, %rs1489; 2026-02-21T10:19:38.0596764Z cvt.rn.f32.s16 %r19740, %rs1488; 2026-02-21T10:19:38.0596824Z cvt.rn.f32.s16 %r19741, %rs1486; 2026-02-21T10:19:38.0597015Z .loc 1 66 25 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0597076Z cvt.s16.s8 %rs1491, %rs1422; 2026-02-21T10:19:38.0597234Z shr.s16 %rs1492, %rs1491, 4; 2026-02-21T10:19:38.0597296Z cvt.s16.s8 %rs1493, %rs1424; 2026-02-21T10:19:38.0597355Z shr.s16 %rs1494, %rs1493, 4; 2026-02-21T10:19:38.0597417Z shr.s16 %rs1495, %rs1421, 4; 2026-02-21T10:19:38.0597476Z shr.s16 %rs1496, %rs1423, 4; 2026-02-21T10:19:38.0597672Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0597740Z cvt.rn.f32.s16 %r19742, %rs1496; 2026-02-21T10:19:38.0597802Z cvt.rn.f32.s16 %r19743, %rs1495; 2026-02-21T10:19:38.0597862Z cvt.rn.f32.s16 %r19744, %rs1494; 2026-02-21T10:19:38.0597921Z cvt.rn.f32.s16 %r19745, %rs1492; 2026-02-21T10:19:38.0598115Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0598174Z cvt.s16.s8 %rs1497, %rs1426; 2026-02-21T10:19:38.0598233Z shr.s16 %rs1498, %rs1497, 4; 2026-02-21T10:19:38.0598297Z cvt.s16.s8 %rs1499, %rs1428; 2026-02-21T10:19:38.0598357Z shr.s16 %rs1500, %rs1499, 4; 2026-02-21T10:19:38.0598416Z shr.s16 %rs1501, %rs1425, 4; 2026-02-21T10:19:38.0598473Z shr.s16 %rs1502, %rs1427, 4; 2026-02-21T10:19:38.0598744Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0598811Z cvt.rn.f32.s16 %r19746, %rs1502; 2026-02-21T10:19:38.0598871Z cvt.rn.f32.s16 %r19747, %rs1501; 2026-02-21T10:19:38.0598935Z cvt.rn.f32.s16 %r19748, %rs1500; 2026-02-21T10:19:38.0599055Z cvt.rn.f32.s16 %r19749, %rs1498; 2026-02-21T10:19:38.0599246Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0599369Z cvt.s16.s8 %rs1503, %rs1430; 2026-02-21T10:19:38.0599429Z shr.s16 %rs1504, %rs1503, 4; 2026-02-21T10:19:38.0599487Z cvt.s16.s8 %rs1505, %rs1432; 2026-02-21T10:19:38.0599557Z shr.s16 %rs1506, %rs1505, 4; 2026-02-21T10:19:38.0599622Z shr.s16 %rs1507, %rs1429, 4; 
2026-02-21T10:19:38.0599684Z shr.s16 %rs1508, %rs1431, 4; 2026-02-21T10:19:38.0599874Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0599939Z cvt.rn.f32.s16 %r19750, %rs1508; 2026-02-21T10:19:38.0600001Z cvt.rn.f32.s16 %r19751, %rs1507; 2026-02-21T10:19:38.0600060Z cvt.rn.f32.s16 %r19752, %rs1506; 2026-02-21T10:19:38.0600122Z cvt.rn.f32.s16 %r19753, %rs1504; 2026-02-21T10:19:38.0600311Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0600371Z cvt.s16.s8 %rs1509, %rs1434; 2026-02-21T10:19:38.0600430Z shr.s16 %rs1510, %rs1509, 4; 2026-02-21T10:19:38.0600494Z cvt.s16.s8 %rs1511, %rs1436; 2026-02-21T10:19:38.0600551Z shr.s16 %rs1512, %rs1511, 4; 2026-02-21T10:19:38.0600609Z shr.s16 %rs1513, %rs1433, 4; 2026-02-21T10:19:38.0600670Z shr.s16 %rs1514, %rs1435, 4; 2026-02-21T10:19:38.0600857Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0600919Z cvt.rn.f32.s16 %r19754, %rs1514; 2026-02-21T10:19:38.0600980Z cvt.rn.f32.s16 %r19755, %rs1513; 2026-02-21T10:19:38.0601039Z cvt.rn.f32.s16 %r19756, %rs1512; 2026-02-21T10:19:38.0601100Z cvt.rn.f32.s16 %r19757, %rs1510; 2026-02-21T10:19:38.0601287Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0601351Z cvt.s16.s8 %rs1515, %rs1438; 2026-02-21T10:19:38.0601413Z shr.s16 %rs1516, %rs1515, 4; 2026-02-21T10:19:38.0601473Z cvt.s16.s8 %rs1517, %rs1440; 2026-02-21T10:19:38.0601547Z shr.s16 %rs1518, %rs1517, 4; 2026-02-21T10:19:38.0601610Z shr.s16 %rs1519, %rs1437, 4; 2026-02-21T10:19:38.0601671Z shr.s16 %rs1520, %rs1439, 4; 2026-02-21T10:19:38.0601863Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0601930Z cvt.rn.f32.s16 %r19758, %rs1520; 2026-02-21T10:19:38.0601990Z cvt.rn.f32.s16 %r19759, %rs1519; 2026-02-21T10:19:38.0602109Z cvt.rn.f32.s16 %r19760, %rs1518; 2026-02-21T10:19:38.0602173Z 
cvt.rn.f32.s16 %r19761, %rs1516; 2026-02-21T10:19:38.0602361Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0602422Z cvt.s16.s8 %rs1521, %rs1442; 2026-02-21T10:19:38.0602484Z shr.s16 %rs1522, %rs1521, 4; 2026-02-21T10:19:38.0602545Z cvt.s16.s8 %rs1523, %rs1444; 2026-02-21T10:19:38.0602604Z shr.s16 %rs1524, %rs1523, 4; 2026-02-21T10:19:38.0602664Z shr.s16 %rs1525, %rs1441, 4; 2026-02-21T10:19:38.0602727Z shr.s16 %rs1526, %rs1443, 4; 2026-02-21T10:19:38.0602927Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0602992Z cvt.rn.f32.s16 %r19762, %rs1526; 2026-02-21T10:19:38.0603055Z cvt.rn.f32.s16 %r19763, %rs1525; 2026-02-21T10:19:38.0603115Z cvt.rn.f32.s16 %r19764, %rs1524; 2026-02-21T10:19:38.0603173Z cvt.rn.f32.s16 %r19765, %rs1522; 2026-02-21T10:19:38.0603362Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0603423Z cvt.s16.s8 %rs1527, %rs1446; 2026-02-21T10:19:38.0603538Z shr.s16 %rs1528, %rs1527, 4; 2026-02-21T10:19:38.0603609Z cvt.s16.s8 %rs1529, %rs1448; 2026-02-21T10:19:38.0603675Z shr.s16 %rs1530, %rs1529, 4; 2026-02-21T10:19:38.0603736Z shr.s16 %rs1531, %rs1445, 4; 2026-02-21T10:19:38.0603795Z shr.s16 %rs1532, %rs1447, 4; 2026-02-21T10:19:38.0604033Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0604096Z cvt.rn.f32.s16 %r19766, %rs1532; 2026-02-21T10:19:38.0604211Z cvt.rn.f32.s16 %r19767, %rs1531; 2026-02-21T10:19:38.0604285Z cvt.rn.f32.s16 %r19768, %rs1530; 2026-02-21T10:19:38.0604347Z cvt.rn.f32.s16 %r19769, %rs1528; 2026-02-21T10:19:38.0604533Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0604595Z cvt.s16.s8 %rs1533, %rs1450; 2026-02-21T10:19:38.0604661Z shr.s16 %rs1534, %rs1533, 4; 2026-02-21T10:19:38.0604719Z cvt.s16.s8 %rs1535, %rs1452; 2026-02-21T10:19:38.0604781Z shr.s16 %rs1536, %rs1535, 4; 
2026-02-21T10:19:38.0604842Z shr.s16 %rs1537, %rs1449, 4; 2026-02-21T10:19:38.0604901Z shr.s16 %rs1538, %rs1451, 4; 2026-02-21T10:19:38.0605089Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0605152Z cvt.rn.f32.s16 %r19770, %rs1538; 2026-02-21T10:19:38.0605215Z cvt.rn.f32.s16 %r19771, %rs1537; 2026-02-21T10:19:38.0605276Z cvt.rn.f32.s16 %r19772, %rs1536; 2026-02-21T10:19:38.0605338Z cvt.rn.f32.s16 %r19773, %rs1534; 2026-02-21T10:19:38.0605530Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0605588Z cvt.s16.s8 %rs1539, %rs1454; 2026-02-21T10:19:38.0605647Z shr.s16 %rs1540, %rs1539, 4; 2026-02-21T10:19:38.0605709Z cvt.s16.s8 %rs1541, %rs1456; 2026-02-21T10:19:38.0605769Z shr.s16 %rs1542, %rs1541, 4; 2026-02-21T10:19:38.0605829Z shr.s16 %rs1543, %rs1453, 4; 2026-02-21T10:19:38.0605888Z shr.s16 %rs1544, %rs1455, 4; 2026-02-21T10:19:38.0606079Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0606140Z cvt.rn.f32.s16 %r19774, %rs1544; 2026-02-21T10:19:38.0606202Z cvt.rn.f32.s16 %r19775, %rs1543; 2026-02-21T10:19:38.0606263Z cvt.rn.f32.s16 %r19776, %rs1542; 2026-02-21T10:19:38.0606324Z cvt.rn.f32.s16 %r19777, %rs1540; 2026-02-21T10:19:38.0606649Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0606718Z cvt.s16.s8 %rs1545, %rs1458; 2026-02-21T10:19:38.0606779Z shr.s16 %rs1546, %rs1545, 4; 2026-02-21T10:19:38.0606837Z cvt.s16.s8 %rs1547, %rs1460; 2026-02-21T10:19:38.0606896Z shr.s16 %rs1548, %rs1547, 4; 2026-02-21T10:19:38.0606958Z shr.s16 %rs1549, %rs1457, 4; 2026-02-21T10:19:38.0607026Z shr.s16 %rs1550, %rs1459, 4; 2026-02-21T10:19:38.0607302Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0607368Z cvt.rn.f32.s16 %r19778, %rs1550; 2026-02-21T10:19:38.0607429Z cvt.rn.f32.s16 %r19779, %rs1549; 2026-02-21T10:19:38.0607488Z 
cvt.rn.f32.s16 %r19780, %rs1548; 2026-02-21T10:19:38.0607550Z cvt.rn.f32.s16 %r19781, %rs1546; 2026-02-21T10:19:38.0607737Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0607797Z cvt.s16.s8 %rs1551, %rs1462; 2026-02-21T10:19:38.0607857Z shr.s16 %rs1552, %rs1551, 4; 2026-02-21T10:19:38.0607936Z cvt.s16.s8 %rs1553, %rs1464; 2026-02-21T10:19:38.0607997Z shr.s16 %rs1554, %rs1553, 4; 2026-02-21T10:19:38.0608057Z shr.s16 %rs1555, %rs1461, 4; 2026-02-21T10:19:38.0608123Z shr.s16 %rs1556, %rs1463, 4; 2026-02-21T10:19:38.0608311Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0608374Z cvt.rn.f32.s16 %r19782, %rs1556; 2026-02-21T10:19:38.0608438Z cvt.rn.f32.s16 %r19783, %rs1555; 2026-02-21T10:19:38.0608500Z cvt.rn.f32.s16 %r19784, %rs1554; 2026-02-21T10:19:38.0608636Z cvt.rn.f32.s16 %r19785, %rs1552; 2026-02-21T10:19:38.0608829Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0608892Z cvt.s16.s8 %rs1557, %rs1466; 2026-02-21T10:19:38.0609013Z shr.s16 %rs1558, %rs1557, 4; 2026-02-21T10:19:38.0609075Z cvt.s16.s8 %rs1559, %rs1468; 2026-02-21T10:19:38.0609138Z shr.s16 %rs1560, %rs1559, 4; 2026-02-21T10:19:38.0609258Z shr.s16 %rs1561, %rs1465, 4; 2026-02-21T10:19:38.0609315Z shr.s16 %rs1562, %rs1467, 4; 2026-02-21T10:19:38.0609507Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0609568Z cvt.rn.f32.s16 %r19786, %rs1562; 2026-02-21T10:19:38.0609628Z cvt.rn.f32.s16 %r19787, %rs1561; 2026-02-21T10:19:38.0609691Z cvt.rn.f32.s16 %r19788, %rs1560; 2026-02-21T10:19:38.0609755Z cvt.rn.f32.s16 %r19789, %rs1558; 2026-02-21T10:19:38.0609945Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0610004Z cvt.s16.s8 %rs1563, %rs1470; 2026-02-21T10:19:38.0610068Z shr.s16 %rs1564, %rs1563, 4; 2026-02-21T10:19:38.0610127Z cvt.s16.s8 %rs1565, %rs1472; 
2026-02-21T10:19:38.0610186Z shr.s16 %rs1566, %rs1565, 4; 2026-02-21T10:19:38.0610249Z shr.s16 %rs1567, %rs1469, 4; 2026-02-21T10:19:38.0610307Z shr.s16 %rs1568, %rs1471, 4; 2026-02-21T10:19:38.0610494Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0610556Z cvt.rn.f32.s16 %r19790, %rs1568; 2026-02-21T10:19:38.0610618Z cvt.rn.f32.s16 %r19791, %rs1567; 2026-02-21T10:19:38.0610678Z cvt.rn.f32.s16 %r19792, %rs1566; 2026-02-21T10:19:38.0610738Z cvt.rn.f32.s16 %r19793, %rs1564; 2026-02-21T10:19:38.0610803Z bar.sync 0; 2026-02-21T10:19:38.0610925Z st.shared.v4.b32 [%r27], {%r19733, %r19731, %r19732, %r19730}; 2026-02-21T10:19:38.0611054Z st.shared.v4.b32 [%r27+16384], {%r19765, %r19763, %r19764, %r19762}; 2026-02-21T10:19:38.0611170Z st.shared.v4.b32 [%r28], {%r19737, %r19735, %r19736, %r19734}; 2026-02-21T10:19:38.0611289Z st.shared.v4.b32 [%r28+16384], {%r19769, %r19767, %r19768, %r19766}; 2026-02-21T10:19:38.0611397Z st.shared.v4.b32 [%r29], {%r19741, %r19739, %r19740, %r19738}; 2026-02-21T10:19:38.0611512Z st.shared.v4.b32 [%r29+16384], {%r19773, %r19771, %r19772, %r19770}; 2026-02-21T10:19:38.0611632Z st.shared.v4.b32 [%r30], {%r19745, %r19743, %r19744, %r19742}; 2026-02-21T10:19:38.0611753Z st.shared.v4.b32 [%r30+16384], {%r19777, %r19775, %r19776, %r19774}; 2026-02-21T10:19:38.0611859Z st.shared.v4.b32 [%r31], {%r19749, %r19747, %r19748, %r19746}; 2026-02-21T10:19:38.0611977Z st.shared.v4.b32 [%r31+16384], {%r19781, %r19779, %r19780, %r19778}; 2026-02-21T10:19:38.0612080Z st.shared.v4.b32 [%r32], {%r19753, %r19751, %r19752, %r19750}; 2026-02-21T10:19:38.0612263Z st.shared.v4.b32 [%r32+16384], {%r19785, %r19783, %r19784, %r19782}; 2026-02-21T10:19:38.0612375Z st.shared.v4.b32 [%r33], {%r19757, %r19755, %r19756, %r19754}; 2026-02-21T10:19:38.0612491Z st.shared.v4.b32 [%r33+16384], {%r19789, %r19787, %r19788, %r19786}; 2026-02-21T10:19:38.0612595Z st.shared.v4.b32 [%r34], {%r19761, %r19759, %r19760, %r19758}; 
2026-02-21T10:19:38.0612712Z st.shared.v4.b32 [%r34+16384], {%r19793, %r19791, %r19792, %r19790}; 2026-02-21T10:19:38.0612766Z $L__tmp13: 2026-02-21T10:19:38.0613041Z .loc 2 291 36 // standard.py:291:36 @[ cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:91:40 ] 2026-02-21T10:19:38.0613103Z // begin inline asm 2026-02-21T10:19:38.0613191Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.0613247Z // end inline asm 2026-02-21T10:19:38.0613301Z bar.sync 0; 2026-02-21T10:19:38.0613376Z wgmma.fence.sync.aligned; 2026-02-21T10:19:38.0613435Z // begin inline asm 2026-02-21T10:19:38.0615028Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r17351,%r17352,%r17353,%r17354}, %rd3, %p133, 1, 1; 2026-02-21T10:19:38.0615138Z // end inline asm 2026-02-21T10:19:38.0615196Z // begin inline asm 2026-02-21T10:19:38.0616806Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r17483,%r17484,%r17485,%r17486}, %rd4, %p133, 1, 1; 2026-02-21T10:19:38.0616872Z 
// end inline asm 2026-02-21T10:19:38.0616942Z // begin inline asm 2026-02-21T10:19:38.0618398Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r17615,%r17616,%r17617,%r17618}, %rd5, %p133, 1, 1; 2026-02-21T10:19:38.0618457Z // end inline asm 2026-02-21T10:19:38.0618514Z // begin inline asm 2026-02-21T10:19:38.0619969Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r17747,%r17748,%r17749,%r17750}, %rd6, %p133, 1, 1; 2026-02-21T10:19:38.0620028Z // end inline asm 2026-02-21T10:19:38.0620085Z // begin inline asm 2026-02-21T10:19:38.0621541Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r17879,%r17880,%r17881,%r17882}, %rd7, %p133, 1, 1; 2026-02-21T10:19:38.0621684Z // end inline asm 2026-02-21T10:19:38.0621746Z // begin inline asm 2026-02-21T10:19:38.0623256Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r18011,%r18012,%r18013,%r18014}, %rd8, %p133, 1, 1; 2026-02-21T10:19:38.0623316Z // end inline asm 2026-02-21T10:19:38.0623377Z // begin inline asm 2026-02-21T10:19:38.0624887Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r18143,%r18144,%r18145,%r18146}, %rd9, %p133, 1, 1; 2026-02-21T10:19:38.0625002Z // end inline asm 2026-02-21T10:19:38.0625059Z // begin inline asm 2026-02-21T10:19:38.0626634Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r18275,%r18276,%r18277,%r18278}, %rd10, %p133, 1, 1; 2026-02-21T10:19:38.0626703Z // end inline asm 2026-02-21T10:19:38.0626761Z // begin inline asm 2026-02-21T10:19:38.0628235Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r18407,%r18408,%r18409,%r18410}, %rd3, %p133, 1, 1; 2026-02-21T10:19:38.0628293Z // end inline asm 2026-02-21T10:19:38.0628349Z // begin inline asm 2026-02-21T10:19:38.0629871Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r18539,%r18540,%r18541,%r18542}, %rd4, %p133, 1, 1; 2026-02-21T10:19:38.0630003Z // end inline asm 2026-02-21T10:19:38.0630060Z // begin inline asm 2026-02-21T10:19:38.0631515Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r18671,%r18672,%r18673,%r18674}, %rd5, %p133, 1, 1; 2026-02-21T10:19:38.0631575Z // end inline asm 2026-02-21T10:19:38.0631636Z // begin inline asm 2026-02-21T10:19:38.0633364Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r18803,%r18804,%r18805,%r18806}, %rd6, %p133, 1, 1; 2026-02-21T10:19:38.0633485Z // end inline asm 2026-02-21T10:19:38.0633548Z // begin inline asm 2026-02-21T10:19:38.0635012Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r18935,%r18936,%r18937,%r18938}, %rd7, %p133, 1, 1; 2026-02-21T10:19:38.0635077Z // end inline asm 2026-02-21T10:19:38.0635135Z // begin inline asm 2026-02-21T10:19:38.0636761Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r19067,%r19068,%r19069,%r19070}, %rd8, %p133, 1, 1; 2026-02-21T10:19:38.0636830Z // end inline asm 2026-02-21T10:19:38.0636888Z // begin inline asm 2026-02-21T10:19:38.0638364Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r19199,%r19200,%r19201,%r19202}, %rd9, %p133, 1, 1; 2026-02-21T10:19:38.0638428Z // end inline asm 2026-02-21T10:19:38.0638484Z // begin inline asm 2026-02-21T10:19:38.0640025Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r19331,%r19332,%r19333,%r19334}, %rd10, %p133, 1, 1; 2026-02-21T10:19:38.0640095Z // end inline asm 2026-02-21T10:19:38.0640177Z wgmma.commit_group.sync.aligned; 2026-02-21T10:19:38.0640242Z mov.b32 %r19463, %r39931; 2026-02-21T10:19:38.0640300Z mov.b32 %r19464, %r19465; 2026-02-21T10:19:38.0640356Z // begin inline asm 2026-02-21T10:19:38.0642941Z // wait for regs: 
%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r19463,%r19464,%r19465 2026-02-21T10:19:38.0643091Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:19:38.0643149Z // end inline asm 2026-02-21T10:19:38.0643202Z $L__tmp14: 2026-02-21T10:19:38.0643413Z .loc 1 48 93 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:48:93 2026-02-21T10:19:38.0643480Z add.s64 %rd844, %rd844, 384; 2026-02-21T10:19:38.0643547Z add.s32 %r42725, %r42725, 192; 2026-02-21T10:19:38.0643614Z setp.lt.u64 %p193, %rd55, 3936; 2026-02-21T10:19:38.0643676Z mov.b64 %rd845, %rd55; 2026-02-21T10:19:38.0643739Z @%p193 bra $L__BB0_7; 2026-02-21T10:19:38.0643845Z // %bb.8: // %.preheader273.preheader 2026-02-21T10:19:38.0643948Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:19:38.0644014Z add.s64 %rd57, %rd52, 16128; 2026-02-21T10:19:38.0644076Z add.s64 %rd58, %rd45, 16128; 2026-02-21T10:19:38.0644133Z add.s64 %rd59, %rd46, 16128; 2026-02-21T10:19:38.0644191Z add.s64 %rd60, %rd47, 16128; 2026-02-21T10:19:38.0644255Z add.s64 %rd61, %rd48, 
16128; 2026-02-21T10:19:38.0644315Z add.s64 %rd62, %rd49, 16128; 2026-02-21T10:19:38.0644375Z add.s64 %rd63, %rd50, 16128; 2026-02-21T10:19:38.0644439Z add.s64 %rd64, %rd51, 16128; 2026-02-21T10:19:38.0644498Z mov.b64 %rd847, 4000; 2026-02-21T10:19:38.0644557Z mov.b64 %rd846, %rd11; 2026-02-21T10:19:38.0644664Z $L__BB0_9: // %.preheader273 2026-02-21T10:19:38.0644770Z // Parent Loop BB0_2 Depth=1 2026-02-21T10:19:38.0644872Z // => This Inner Loop Header: Depth=2 2026-02-21T10:19:38.0645077Z .loc 1 55 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:32 2026-02-21T10:19:38.0645148Z add.s64 %rd457, %rd846, %rd64; 2026-02-21T10:19:38.0645210Z add.s64 %rd460, %rd846, %rd63; 2026-02-21T10:19:38.0645271Z add.s64 %rd463, %rd846, %rd62; 2026-02-21T10:19:38.0645332Z add.s64 %rd466, %rd846, %rd61; 2026-02-21T10:19:38.0645462Z add.s64 %rd469, %rd846, %rd60; 2026-02-21T10:19:38.0645523Z add.s64 %rd472, %rd846, %rd59; 2026-02-21T10:19:38.0645582Z add.s64 %rd475, %rd846, %rd58; 2026-02-21T10:19:38.0645795Z .loc 1 55 80 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:80 2026-02-21T10:19:38.0645859Z add.s64 %rd478, %rd846, %rd57; 2026-02-21T10:19:38.0645919Z // begin inline asm 2026-02-21T10:19:38.0645980Z mov.u64 %rd456, 0x0; 2026-02-21T10:19:38.0646117Z createpolicy.fractional.L2::evict_first.b64 %rd456, 1.0; 2026-02-21T10:19:38.0646174Z // end inline asm 2026-02-21T10:19:38.0646231Z // begin inline asm 2026-02-21T10:19:38.0646294Z mov.u32 %r19794, 0x0; 2026-02-21T10:19:38.0646352Z mov.u32 %r19795, 0x0; 2026-02-21T10:19:38.0646407Z mov.u32 %r19796, 0x0; 2026-02-21T10:19:38.0646615Z mov.u32 %r19797, 0x0; 2026-02-21T10:19:38.0646879Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r19794, %r19795, %r19796, %r19797 }, [ %rd457 + 0 ], %rd456; 2026-02-21T10:19:38.0646944Z // end inline asm 2026-02-21T10:19:38.0647012Z // begin inline asm 2026-02-21T10:19:38.0647071Z mov.u64 %rd459, 0x0; 2026-02-21T10:19:38.0647300Z 
createpolicy.fractional.L2::evict_first.b64 %rd459, 1.0; 2026-02-21T10:19:38.0647365Z // end inline asm 2026-02-21T10:19:38.0647426Z // begin inline asm 2026-02-21T10:19:38.0647483Z mov.u32 %r19798, 0x0; 2026-02-21T10:19:38.0647541Z mov.u32 %r19799, 0x0; 2026-02-21T10:19:38.0647603Z mov.u32 %r19800, 0x0; 2026-02-21T10:19:38.0647721Z mov.u32 %r19801, 0x0; 2026-02-21T10:19:38.0647958Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r19798, %r19799, %r19800, %r19801 }, [ %rd460 + 0 ], %rd459; 2026-02-21T10:19:38.0648087Z // end inline asm 2026-02-21T10:19:38.0648146Z // begin inline asm 2026-02-21T10:19:38.0648203Z mov.u64 %rd462, 0x0; 2026-02-21T10:19:38.0648324Z createpolicy.fractional.L2::evict_first.b64 %rd462, 1.0; 2026-02-21T10:19:38.0648383Z // end inline asm 2026-02-21T10:19:38.0648439Z // begin inline asm 2026-02-21T10:19:38.0648497Z mov.u32 %r19802, 0x0; 2026-02-21T10:19:38.0648555Z mov.u32 %r19803, 0x0; 2026-02-21T10:19:38.0648612Z mov.u32 %r19804, 0x0; 2026-02-21T10:19:38.0648666Z mov.u32 %r19805, 0x0; 2026-02-21T10:19:38.0648889Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r19802, %r19803, %r19804, %r19805 }, [ %rd463 + 0 ], %rd462; 2026-02-21T10:19:38.0648957Z // end inline asm 2026-02-21T10:19:38.0649015Z // begin inline asm 2026-02-21T10:19:38.0649073Z mov.u64 %rd465, 0x0; 2026-02-21T10:19:38.0649198Z createpolicy.fractional.L2::evict_first.b64 %rd465, 1.0; 2026-02-21T10:19:38.0649254Z // end inline asm 2026-02-21T10:19:38.0649310Z // begin inline asm 2026-02-21T10:19:38.0649372Z mov.u32 %r19806, 0x0; 2026-02-21T10:19:38.0649427Z mov.u32 %r19807, 0x0; 2026-02-21T10:19:38.0649483Z mov.u32 %r19808, 0x0; 2026-02-21T10:19:38.0649542Z mov.u32 %r19809, 0x0; 2026-02-21T10:19:38.0649766Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r19806, %r19807, %r19808, %r19809 }, [ %rd466 + 0 ], %rd465; 2026-02-21T10:19:38.0649824Z // end inline asm 2026-02-21T10:19:38.0649882Z // begin inline asm 2026-02-21T10:19:38.0649942Z mov.u64 %rd468, 0x0; 
2026-02-21T10:19:38.0650060Z createpolicy.fractional.L2::evict_first.b64 %rd468, 1.0; 2026-02-21T10:19:38.0650115Z // end inline asm 2026-02-21T10:19:38.0650184Z // begin inline asm 2026-02-21T10:19:38.0650247Z mov.u32 %r19810, 0x0; 2026-02-21T10:19:38.0650302Z mov.u32 %r19811, 0x0; 2026-02-21T10:19:38.0650358Z mov.u32 %r19812, 0x0; 2026-02-21T10:19:38.0650423Z mov.u32 %r19813, 0x0; 2026-02-21T10:19:38.0650642Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r19810, %r19811, %r19812, %r19813 }, [ %rd469 + 0 ], %rd468; 2026-02-21T10:19:38.0650710Z // end inline asm 2026-02-21T10:19:38.0650772Z // begin inline asm 2026-02-21T10:19:38.0650831Z mov.u64 %rd471, 0x0; 2026-02-21T10:19:38.0650946Z createpolicy.fractional.L2::evict_first.b64 %rd471, 1.0; 2026-02-21T10:19:38.0651002Z // end inline asm 2026-02-21T10:19:38.0651061Z // begin inline asm 2026-02-21T10:19:38.0651196Z mov.u32 %r19814, 0x0; 2026-02-21T10:19:38.0651251Z mov.u32 %r19815, 0x0; 2026-02-21T10:19:38.0651308Z mov.u32 %r19816, 0x0; 2026-02-21T10:19:38.0651363Z mov.u32 %r19817, 0x0; 2026-02-21T10:19:38.0651583Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r19814, %r19815, %r19816, %r19817 }, [ %rd472 + 0 ], %rd471; 2026-02-21T10:19:38.0651644Z // end inline asm 2026-02-21T10:19:38.0651700Z // begin inline asm 2026-02-21T10:19:38.0651756Z mov.u64 %rd474, 0x0; 2026-02-21T10:19:38.0651875Z createpolicy.fractional.L2::evict_first.b64 %rd474, 1.0; 2026-02-21T10:19:38.0651934Z // end inline asm 2026-02-21T10:19:38.0651989Z // begin inline asm 2026-02-21T10:19:38.0652058Z mov.u32 %r19818, 0x0; 2026-02-21T10:19:38.0652118Z mov.u32 %r19819, 0x0; 2026-02-21T10:19:38.0652173Z mov.u32 %r19820, 0x0; 2026-02-21T10:19:38.0652227Z mov.u32 %r19821, 0x0; 2026-02-21T10:19:38.0652447Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r19818, %r19819, %r19820, %r19821 }, [ %rd475 + 0 ], %rd474; 2026-02-21T10:19:38.0652508Z // end inline asm 2026-02-21T10:19:38.0652564Z // begin inline asm 2026-02-21T10:19:38.0652622Z 
mov.u64 %rd477, 0x0; 2026-02-21T10:19:38.0652798Z createpolicy.fractional.L2::evict_first.b64 %rd477, 1.0; 2026-02-21T10:19:38.0652859Z // end inline asm 2026-02-21T10:19:38.0652918Z // begin inline asm 2026-02-21T10:19:38.0652976Z mov.u32 %r19822, 0x0; 2026-02-21T10:19:38.0653032Z mov.u32 %r19823, 0x0; 2026-02-21T10:19:38.0653086Z mov.u32 %r19824, 0x0; 2026-02-21T10:19:38.0653187Z mov.u32 %r19825, 0x0; 2026-02-21T10:19:38.0653414Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r19822, %r19823, %r19824, %r19825 }, [ %rd478 + 0 ], %rd477; 2026-02-21T10:19:38.0653514Z // end inline asm 2026-02-21T10:19:38.0653720Z .loc 1 59 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:59:32 2026-02-21T10:19:38.0653779Z bar.sync 0; 2026-02-21T10:19:38.0653862Z st.shared.v2.b32 [%r9], {%r19794, %r19795}; 2026-02-21T10:19:38.0653951Z st.shared.v2.b32 [%r9+2048], {%r19798, %r19799}; 2026-02-21T10:19:38.0654040Z st.shared.v2.b32 [%r9+4096], {%r19802, %r19803}; 2026-02-21T10:19:38.0654119Z st.shared.v2.b32 [%r9+6144], {%r19806, %r19807}; 2026-02-21T10:19:38.0654200Z st.shared.v2.b32 [%r9+8192], {%r19810, %r19811}; 2026-02-21T10:19:38.0654286Z st.shared.v2.b32 [%r9+10240], {%r19814, %r19815}; 2026-02-21T10:19:38.0654373Z st.shared.v2.b32 [%r9+12288], {%r19818, %r19819}; 2026-02-21T10:19:38.0654456Z st.shared.v2.b32 [%r9+14336], {%r19822, %r19823}; 2026-02-21T10:19:38.0654532Z st.shared.v2.b32 [%r10], {%r19796, %r19797}; 2026-02-21T10:19:38.0654616Z st.shared.v2.b32 [%r10+2048], {%r19800, %r19801}; 2026-02-21T10:19:38.0654700Z st.shared.v2.b32 [%r10+4096], {%r19804, %r19805}; 2026-02-21T10:19:38.0654780Z st.shared.v2.b32 [%r10+6144], {%r19808, %r19809}; 2026-02-21T10:19:38.0654865Z st.shared.v2.b32 [%r10+8192], {%r19812, %r19813}; 2026-02-21T10:19:38.0654956Z st.shared.v2.b32 [%r10+10240], {%r19816, %r19817}; 2026-02-21T10:19:38.0655053Z st.shared.v2.b32 [%r10+12288], {%r19820, %r19821}; 2026-02-21T10:19:38.0655143Z st.shared.v2.b32 [%r10+14336], {%r19824, %r19825}; 
2026-02-21T10:19:38.0655202Z bar.sync 0; 2026-02-21T10:19:38.0655271Z ld.shared.b16 %rs1569, [%r51]; 2026-02-21T10:19:38.0655341Z ld.shared.b16 %rs1570, [%r51+1024]; 2026-02-21T10:19:38.0655412Z ld.shared.b16 %rs1571, [%r51+64]; 2026-02-21T10:19:38.0655482Z ld.shared.b16 %rs1572, [%r51+1088]; 2026-02-21T10:19:38.0655545Z ld.shared.b16 %rs1573, [%r51+8192]; 2026-02-21T10:19:38.0655612Z ld.shared.b16 %rs1574, [%r51+9216]; 2026-02-21T10:19:38.0655679Z ld.shared.b16 %rs1575, [%r51+8256]; 2026-02-21T10:19:38.0655744Z ld.shared.b16 %rs1576, [%r51+9280]; 2026-02-21T10:19:38.0655809Z ld.shared.b16 %rs1577, [%r52]; 2026-02-21T10:19:38.0655876Z ld.shared.b16 %rs1578, [%r52+1024]; 2026-02-21T10:19:38.0655940Z ld.shared.b16 %rs1579, [%r52+64]; 2026-02-21T10:19:38.0656003Z ld.shared.b16 %rs1580, [%r52+1088]; 2026-02-21T10:19:38.0656071Z ld.shared.b16 %rs1581, [%r52+8192]; 2026-02-21T10:19:38.0656205Z ld.shared.b16 %rs1582, [%r52+9216]; 2026-02-21T10:19:38.0656268Z ld.shared.b16 %rs1583, [%r52+8256]; 2026-02-21T10:19:38.0656330Z ld.shared.b16 %rs1584, [%r52+9280]; 2026-02-21T10:19:38.0656400Z ld.shared.b16 %rs1585, [%r53]; 2026-02-21T10:19:38.0656595Z ld.shared.b16 %rs1586, [%r53+1024]; 2026-02-21T10:19:38.0656664Z ld.shared.b16 %rs1587, [%r53+64]; 2026-02-21T10:19:38.0656732Z ld.shared.b16 %rs1588, [%r53+1088]; 2026-02-21T10:19:38.0656796Z ld.shared.b16 %rs1589, [%r53+8192]; 2026-02-21T10:19:38.0656875Z ld.shared.b16 %rs1590, [%r53+9216]; 2026-02-21T10:19:38.0656940Z ld.shared.b16 %rs1591, [%r53+8256]; 2026-02-21T10:19:38.0657007Z ld.shared.b16 %rs1592, [%r53+9280]; 2026-02-21T10:19:38.0657070Z ld.shared.b16 %rs1593, [%r54]; 2026-02-21T10:19:38.0657132Z ld.shared.b16 %rs1594, [%r54+1024]; 2026-02-21T10:19:38.0657200Z ld.shared.b16 %rs1595, [%r54+64]; 2026-02-21T10:19:38.0657262Z ld.shared.b16 %rs1596, [%r54+1088]; 2026-02-21T10:19:38.0657328Z ld.shared.b16 %rs1597, [%r54+8192]; 2026-02-21T10:19:38.0657394Z ld.shared.b16 %rs1598, [%r54+9216]; 2026-02-21T10:19:38.0657456Z 
ld.shared.b16 %rs1599, [%r54+8256]; 2026-02-21T10:19:38.0657596Z ld.shared.b16 %rs1600, [%r54+9280]; 2026-02-21T10:19:38.0657663Z ld.shared.b16 %rs1601, [%r55]; 2026-02-21T10:19:38.0657730Z ld.shared.b16 %rs1602, [%r55+1024]; 2026-02-21T10:19:38.0657792Z ld.shared.b16 %rs1603, [%r55+64]; 2026-02-21T10:19:38.0657915Z ld.shared.b16 %rs1604, [%r55+1088]; 2026-02-21T10:19:38.0657983Z ld.shared.b16 %rs1605, [%r55+8192]; 2026-02-21T10:19:38.0658046Z ld.shared.b16 %rs1606, [%r55+9216]; 2026-02-21T10:19:38.0658182Z ld.shared.b16 %rs1607, [%r55+8256]; 2026-02-21T10:19:38.0658247Z ld.shared.b16 %rs1608, [%r55+9280]; 2026-02-21T10:19:38.0658314Z ld.shared.b16 %rs1609, [%r56]; 2026-02-21T10:19:38.0658378Z ld.shared.b16 %rs1610, [%r56+1024]; 2026-02-21T10:19:38.0658443Z ld.shared.b16 %rs1611, [%r56+64]; 2026-02-21T10:19:38.0658509Z ld.shared.b16 %rs1612, [%r56+1088]; 2026-02-21T10:19:38.0658575Z ld.shared.b16 %rs1613, [%r56+8192]; 2026-02-21T10:19:38.0658637Z ld.shared.b16 %rs1614, [%r56+9216]; 2026-02-21T10:19:38.0658704Z ld.shared.b16 %rs1615, [%r56+8256]; 2026-02-21T10:19:38.0658767Z ld.shared.b16 %rs1616, [%r56+9280]; 2026-02-21T10:19:38.0658829Z ld.shared.b16 %rs1617, [%r57]; 2026-02-21T10:19:38.0658891Z ld.shared.b16 %rs1618, [%r57+1024]; 2026-02-21T10:19:38.0658957Z ld.shared.b16 %rs1619, [%r57+64]; 2026-02-21T10:19:38.0659022Z ld.shared.b16 %rs1620, [%r57+1088]; 2026-02-21T10:19:38.0659086Z ld.shared.b16 %rs1621, [%r57+8192]; 2026-02-21T10:19:38.0659152Z ld.shared.b16 %rs1622, [%r57+9216]; 2026-02-21T10:19:38.0659229Z ld.shared.b16 %rs1623, [%r57+8256]; 2026-02-21T10:19:38.0659294Z ld.shared.b16 %rs1624, [%r57+9280]; 2026-02-21T10:19:38.0659357Z ld.shared.b16 %rs1625, [%r58]; 2026-02-21T10:19:38.0659422Z ld.shared.b16 %rs1626, [%r58+1024]; 2026-02-21T10:19:38.0659486Z ld.shared.b16 %rs1627, [%r58+64]; 2026-02-21T10:19:38.0659561Z ld.shared.b16 %rs1628, [%r58+1088]; 2026-02-21T10:19:38.0659632Z ld.shared.b16 %rs1629, [%r58+8192]; 2026-02-21T10:19:38.0659696Z 
ld.shared.b16 %rs1630, [%r58+9216]; 2026-02-21T10:19:38.0659761Z ld.shared.b16 %rs1631, [%r58+8256]; 2026-02-21T10:19:38.0659826Z ld.shared.b16 %rs1632, [%r58+9280]; 2026-02-21T10:19:38.0659888Z cvt.f32.bf16 %r19963, %rs1569; 2026-02-21T10:19:38.0659948Z cvt.f32.bf16 %r19964, %rs1570; 2026-02-21T10:19:38.0660008Z cvt.f32.bf16 %r19965, %rs1577; 2026-02-21T10:19:38.0660073Z cvt.f32.bf16 %r19966, %rs1578; 2026-02-21T10:19:38.0660133Z cvt.f32.bf16 %r20095, %rs1585; 2026-02-21T10:19:38.0660192Z cvt.f32.bf16 %r20096, %rs1586; 2026-02-21T10:19:38.0660255Z cvt.f32.bf16 %r20097, %rs1593; 2026-02-21T10:19:38.0660313Z cvt.f32.bf16 %r20098, %rs1594; 2026-02-21T10:19:38.0660371Z cvt.f32.bf16 %r20227, %rs1601; 2026-02-21T10:19:38.0660430Z cvt.f32.bf16 %r20228, %rs1602; 2026-02-21T10:19:38.0660492Z cvt.f32.bf16 %r20229, %rs1609; 2026-02-21T10:19:38.0660551Z cvt.f32.bf16 %r20230, %rs1610; 2026-02-21T10:19:38.0660690Z cvt.f32.bf16 %r20359, %rs1617; 2026-02-21T10:19:38.0660762Z cvt.f32.bf16 %r20360, %rs1618; 2026-02-21T10:19:38.0660826Z cvt.f32.bf16 %r20361, %rs1625; 2026-02-21T10:19:38.0660885Z cvt.f32.bf16 %r20362, %rs1626; 2026-02-21T10:19:38.0660944Z cvt.f32.bf16 %r20491, %rs1571; 2026-02-21T10:19:38.0661005Z cvt.f32.bf16 %r20492, %rs1572; 2026-02-21T10:19:38.0661064Z cvt.f32.bf16 %r20493, %rs1579; 2026-02-21T10:19:38.0661126Z cvt.f32.bf16 %r20494, %rs1580; 2026-02-21T10:19:38.0661188Z cvt.f32.bf16 %r20623, %rs1587; 2026-02-21T10:19:38.0661247Z cvt.f32.bf16 %r20624, %rs1588; 2026-02-21T10:19:38.0661312Z cvt.f32.bf16 %r20625, %rs1595; 2026-02-21T10:19:38.0661375Z cvt.f32.bf16 %r20626, %rs1596; 2026-02-21T10:19:38.0661434Z cvt.f32.bf16 %r20755, %rs1603; 2026-02-21T10:19:38.0661493Z cvt.f32.bf16 %r20756, %rs1604; 2026-02-21T10:19:38.0661551Z cvt.f32.bf16 %r20757, %rs1611; 2026-02-21T10:19:38.0661614Z cvt.f32.bf16 %r20758, %rs1612; 2026-02-21T10:19:38.0661673Z cvt.f32.bf16 %r20887, %rs1619; 2026-02-21T10:19:38.0661731Z cvt.f32.bf16 %r20888, %rs1620; 2026-02-21T10:19:38.0661793Z 
cvt.f32.bf16 %r20889, %rs1627; 2026-02-21T10:19:38.0661904Z cvt.f32.bf16 %r20890, %rs1628; 2026-02-21T10:19:38.0661978Z cvt.f32.bf16 %r21019, %rs1573; 2026-02-21T10:19:38.0662039Z cvt.f32.bf16 %r21020, %rs1574; 2026-02-21T10:19:38.0662099Z cvt.f32.bf16 %r21021, %rs1581; 2026-02-21T10:19:38.0662156Z cvt.f32.bf16 %r21022, %rs1582; 2026-02-21T10:19:38.0662258Z cvt.f32.bf16 %r21151, %rs1589; 2026-02-21T10:19:38.0662322Z cvt.f32.bf16 %r21152, %rs1590; 2026-02-21T10:19:38.0662383Z cvt.f32.bf16 %r21153, %rs1597; 2026-02-21T10:19:38.0662486Z cvt.f32.bf16 %r21154, %rs1598; 2026-02-21T10:19:38.0662545Z cvt.f32.bf16 %r21283, %rs1605; 2026-02-21T10:19:38.0662608Z cvt.f32.bf16 %r21284, %rs1606; 2026-02-21T10:19:38.0662665Z cvt.f32.bf16 %r21285, %rs1613; 2026-02-21T10:19:38.0662727Z cvt.f32.bf16 %r21286, %rs1614; 2026-02-21T10:19:38.0662788Z cvt.f32.bf16 %r21415, %rs1621; 2026-02-21T10:19:38.0662849Z cvt.f32.bf16 %r21416, %rs1622; 2026-02-21T10:19:38.0662907Z cvt.f32.bf16 %r21417, %rs1629; 2026-02-21T10:19:38.0662968Z cvt.f32.bf16 %r21418, %rs1630; 2026-02-21T10:19:38.0663042Z cvt.f32.bf16 %r21547, %rs1575; 2026-02-21T10:19:38.0663102Z cvt.f32.bf16 %r21548, %rs1576; 2026-02-21T10:19:38.0663163Z cvt.f32.bf16 %r21549, %rs1583; 2026-02-21T10:19:38.0663228Z cvt.f32.bf16 %r21550, %rs1584; 2026-02-21T10:19:38.0663290Z cvt.f32.bf16 %r21679, %rs1591; 2026-02-21T10:19:38.0663349Z cvt.f32.bf16 %r21680, %rs1592; 2026-02-21T10:19:38.0663411Z cvt.f32.bf16 %r21681, %rs1599; 2026-02-21T10:19:38.0663472Z cvt.f32.bf16 %r21682, %rs1600; 2026-02-21T10:19:38.0663531Z cvt.f32.bf16 %r21811, %rs1607; 2026-02-21T10:19:38.0663590Z cvt.f32.bf16 %r21812, %rs1608; 2026-02-21T10:19:38.0663651Z cvt.f32.bf16 %r21813, %rs1615; 2026-02-21T10:19:38.0663709Z cvt.f32.bf16 %r21814, %rs1616; 2026-02-21T10:19:38.0663767Z cvt.f32.bf16 %r21943, %rs1623; 2026-02-21T10:19:38.0663831Z cvt.f32.bf16 %r21944, %rs1624; 2026-02-21T10:19:38.0663888Z cvt.f32.bf16 %r21945, %rs1631; 2026-02-21T10:19:38.0663947Z cvt.f32.bf16 
%r21946, %rs1632; 2026-02-21T10:19:38.0664168Z .loc 1 61 33 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:61:33 2026-02-21T10:19:38.0664225Z bar.sync 0; 2026-02-21T10:19:38.0664292Z // begin inline asm 2026-02-21T10:19:38.0664394Z @%p313 mbarrier.init.shared::cta.b64 [%r29846], 1; 2026-02-21T10:19:38.0664455Z // end inline asm 2026-02-21T10:19:38.0664507Z bar.sync 0; 2026-02-21T10:19:38.0664564Z // begin inline asm 2026-02-21T10:19:38.0664700Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r29846], 4096; 2026-02-21T10:19:38.0664761Z // end inline asm 2026-02-21T10:19:38.0664817Z // begin inline asm 2026-02-21T10:19:38.0664892Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.0664950Z // end inline asm 2026-02-21T10:19:38.0665002Z bar.sync 0; 2026-02-21T10:19:38.0665071Z elect.sync %r22209|%p215, -1; 2026-02-21T10:19:38.0669120Z and.pred %p196, %p1, %p215; 2026-02-21T10:19:38.0669227Z add.s64 %rd847, %rd847, 32; 2026-02-21T10:19:38.0669300Z cvt.u32.u64 %r19830, %rd847; 2026-02-21T10:19:38.0669370Z // begin inline asm 2026-02-21T10:19:38.0669743Z @%p196 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39931], [%rd822, {%r19829, %r19830}], [%r29846]; 2026-02-21T10:19:38.0669810Z // end inline asm 2026-02-21T10:19:38.0669868Z bar.sync 0; 2026-02-21T10:19:38.0669931Z mov.b32 %r22077, 0; 2026-02-21T10:19:38.0669993Z // begin inline asm 2026-02-21T10:19:38.0670049Z 2026-02-21T10:19:38.0670099Z { 2026-02-21T10:19:38.0670166Z .reg .pred complete; 2026-02-21T10:19:38.0670225Z waitLoop: 2026-02-21T10:19:38.0670384Z mbarrier.try_wait.parity.shared.b64 complete, [%r29846], %r22077; 2026-02-21T10:19:38.0670459Z @!complete bra.uni waitLoop; 2026-02-21T10:19:38.0670510Z } 2026-02-21T10:19:38.0670516Z 2026-02-21T10:19:38.0670578Z // end inline asm 2026-02-21T10:19:38.0670647Z bar.sync 0; 2026-02-21T10:19:38.0670709Z // begin inline asm 2026-02-21T10:19:38.0670819Z @%p313 mbarrier.inval.shared::cta.b64 [%r29846]; 
2026-02-21T10:19:38.0670876Z // end inline asm 2026-02-21T10:19:38.0671221Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0671308Z ld.shared.s8 %rs1633, [%r19]; 2026-02-21T10:19:38.0671588Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0671656Z shl.b16 %rs1634, %rs1633, 4; 2026-02-21T10:19:38.0671851Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0671995Z ld.shared.s8 %rs1635, [%r20+128]; 2026-02-21T10:19:38.0672207Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0672273Z shl.b16 %rs1636, %rs1635, 4; 2026-02-21T10:19:38.0672484Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0672555Z ld.shared.s8 %rs1637, [%r21+256]; 2026-02-21T10:19:38.0672755Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0672823Z shl.b16 %rs1638, %rs1637, 4; 2026-02-21T10:19:38.0673018Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0673085Z ld.shared.s8 %rs1639, [%r22+384]; 2026-02-21T10:19:38.0673278Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0673343Z shl.b16 %rs1640, %rs1639, 4; 2026-02-21T10:19:38.0673529Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0673594Z ld.shared.s8 %rs1641, [%r23+512]; 2026-02-21T10:19:38.0673785Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0673851Z shl.b16 %rs1642, %rs1641, 4; 2026-02-21T10:19:38.0674039Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0674108Z ld.shared.s8 %rs1643, [%r24+640]; 2026-02-21T10:19:38.0674303Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 
2026-02-21T10:19:38.0674366Z shl.b16 %rs1644, %rs1643, 4; 2026-02-21T10:19:38.0674568Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0674635Z ld.shared.s8 %rs1645, [%r25+768]; 2026-02-21T10:19:38.0674827Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0674891Z shl.b16 %rs1646, %rs1645, 4; 2026-02-21T10:19:38.0675078Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0675231Z ld.shared.s8 %rs1647, [%r26+896]; 2026-02-21T10:19:38.0675419Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0675482Z shl.b16 %rs1648, %rs1647, 4; 2026-02-21T10:19:38.0675668Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0675747Z ld.shared.s8 %rs1649, [%r19+1024]; 2026-02-21T10:19:38.0675943Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0676003Z shl.b16 %rs1650, %rs1649, 4; 2026-02-21T10:19:38.0676190Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0676256Z ld.shared.s8 %rs1651, [%r20+1152]; 2026-02-21T10:19:38.0676444Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0676660Z shl.b16 %rs1652, %rs1651, 4; 2026-02-21T10:19:38.0676854Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0676917Z ld.shared.s8 %rs1653, [%r21+1280]; 2026-02-21T10:19:38.0677216Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0677282Z shl.b16 %rs1654, %rs1653, 4; 2026-02-21T10:19:38.0677530Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0677595Z ld.shared.s8 %rs1655, [%r22+1408]; 2026-02-21T10:19:38.0677780Z .loc 1 64 28 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0677907Z shl.b16 %rs1656, %rs1655, 4; 2026-02-21T10:19:38.0678093Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0678159Z ld.shared.s8 %rs1657, [%r23+1536]; 2026-02-21T10:19:38.0678355Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0678419Z shl.b16 %rs1658, %rs1657, 4; 2026-02-21T10:19:38.0678619Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0678687Z ld.shared.s8 %rs1659, [%r24+1664]; 2026-02-21T10:19:38.0678878Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0678943Z shl.b16 %rs1660, %rs1659, 4; 2026-02-21T10:19:38.0679130Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0679209Z ld.shared.s8 %rs1661, [%r25+1792]; 2026-02-21T10:19:38.0679399Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0679461Z shl.b16 %rs1662, %rs1661, 4; 2026-02-21T10:19:38.0679651Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0679717Z ld.shared.s8 %rs1663, [%r26+1920]; 2026-02-21T10:19:38.0679905Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0679968Z shl.b16 %rs1664, %rs1663, 4; 2026-02-21T10:19:38.0680156Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0680219Z ld.shared.s8 %rs1665, [%r19+2048]; 2026-02-21T10:19:38.0680411Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0680471Z shl.b16 %rs1666, %rs1665, 4; 2026-02-21T10:19:38.0680658Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0680724Z ld.shared.s8 %rs1667, [%r20+2176]; 2026-02-21T10:19:38.0680911Z 
.loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0681046Z shl.b16 %rs1668, %rs1667, 4; 2026-02-21T10:19:38.0681234Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0681301Z ld.shared.s8 %rs1669, [%r21+2304]; 2026-02-21T10:19:38.0681486Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0681545Z shl.b16 %rs1670, %rs1669, 4; 2026-02-21T10:19:38.0681736Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0681800Z ld.shared.s8 %rs1671, [%r22+2432]; 2026-02-21T10:19:38.0681992Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0682069Z shl.b16 %rs1672, %rs1671, 4; 2026-02-21T10:19:38.0682260Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0682324Z ld.shared.s8 %rs1673, [%r23+2560]; 2026-02-21T10:19:38.0682517Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0682577Z shl.b16 %rs1674, %rs1673, 4; 2026-02-21T10:19:38.0682816Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0682881Z ld.shared.s8 %rs1675, [%r24+2688]; 2026-02-21T10:19:38.0683119Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0683184Z shl.b16 %rs1676, %rs1675, 4; 2026-02-21T10:19:38.0683374Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0683492Z ld.shared.s8 %rs1677, [%r25+2816]; 2026-02-21T10:19:38.0683679Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0683744Z shl.b16 %rs1678, %rs1677, 4; 2026-02-21T10:19:38.0683936Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0684012Z ld.shared.s8 %rs1679, [%r26+2944]; 
2026-02-21T10:19:38.0684206Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0684272Z shl.b16 %rs1680, %rs1679, 4; 2026-02-21T10:19:38.0684462Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0684528Z ld.shared.s8 %rs1681, [%r19+3072]; 2026-02-21T10:19:38.0684714Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0684780Z shl.b16 %rs1682, %rs1681, 4; 2026-02-21T10:19:38.0684966Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0685031Z ld.shared.s8 %rs1683, [%r20+3200]; 2026-02-21T10:19:38.0685220Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0685283Z shl.b16 %rs1684, %rs1683, 4; 2026-02-21T10:19:38.0685472Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0685541Z ld.shared.s8 %rs1685, [%r21+3328]; 2026-02-21T10:19:38.0685727Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0685787Z shl.b16 %rs1686, %rs1685, 4; 2026-02-21T10:19:38.0685977Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0686040Z ld.shared.s8 %rs1687, [%r22+3456]; 2026-02-21T10:19:38.0686239Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0686302Z shl.b16 %rs1688, %rs1687, 4; 2026-02-21T10:19:38.0686692Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0686861Z ld.shared.s8 %rs1689, [%r23+3584]; 2026-02-21T10:19:38.0687074Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0687148Z shl.b16 %rs1690, %rs1689, 4; 2026-02-21T10:19:38.0687339Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0687405Z ld.shared.s8 
%rs1691, [%r24+3712]; 2026-02-21T10:19:38.0687597Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0687659Z shl.b16 %rs1692, %rs1691, 4; 2026-02-21T10:19:38.0687847Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0687917Z ld.shared.s8 %rs1693, [%r25+3840]; 2026-02-21T10:19:38.0688105Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0688167Z shl.b16 %rs1694, %rs1693, 4; 2026-02-21T10:19:38.0688357Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0688426Z ld.shared.s8 %rs1695, [%r26+3968]; 2026-02-21T10:19:38.0688681Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0688744Z shl.b16 %rs1696, %rs1695, 4; 2026-02-21T10:19:38.0688989Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0689054Z cvt.s16.s8 %rs1697, %rs1634; 2026-02-21T10:19:38.0689114Z shr.s16 %rs1698, %rs1697, 4; 2026-02-21T10:19:38.0689237Z cvt.s16.s8 %rs1699, %rs1636; 2026-02-21T10:19:38.0689309Z shr.s16 %rs1700, %rs1699, 4; 2026-02-21T10:19:38.0689370Z shr.s16 %rs1701, %rs1633, 4; 2026-02-21T10:19:38.0689432Z shr.s16 %rs1702, %rs1635, 4; 2026-02-21T10:19:38.0689623Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0689693Z cvt.rn.f32.s16 %r22210, %rs1702; 2026-02-21T10:19:38.0689755Z cvt.rn.f32.s16 %r22211, %rs1701; 2026-02-21T10:19:38.0689821Z cvt.rn.f32.s16 %r22212, %rs1700; 2026-02-21T10:19:38.0689884Z cvt.rn.f32.s16 %r22213, %rs1698; 2026-02-21T10:19:38.0690072Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0690134Z cvt.s16.s8 %rs1703, %rs1638; 2026-02-21T10:19:38.0690195Z shr.s16 %rs1704, %rs1703, 4; 2026-02-21T10:19:38.0690256Z cvt.s16.s8 %rs1705, %rs1640; 2026-02-21T10:19:38.0690320Z shr.s16 %rs1706, 
%rs1705, 4; 2026-02-21T10:19:38.0690383Z shr.s16 %rs1707, %rs1637, 4; 2026-02-21T10:19:38.0690444Z shr.s16 %rs1708, %rs1639, 4; 2026-02-21T10:19:38.0690632Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0690702Z cvt.rn.f32.s16 %r22214, %rs1708; 2026-02-21T10:19:38.0690762Z cvt.rn.f32.s16 %r22215, %rs1707; 2026-02-21T10:19:38.0690834Z cvt.rn.f32.s16 %r22216, %rs1706; 2026-02-21T10:19:38.0690901Z cvt.rn.f32.s16 %r22217, %rs1704; 2026-02-21T10:19:38.0691092Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0691152Z cvt.s16.s8 %rs1709, %rs1642; 2026-02-21T10:19:38.0691211Z shr.s16 %rs1710, %rs1709, 4; 2026-02-21T10:19:38.0691273Z cvt.s16.s8 %rs1711, %rs1644; 2026-02-21T10:19:38.0691332Z shr.s16 %rs1712, %rs1711, 4; 2026-02-21T10:19:38.0691391Z shr.s16 %rs1713, %rs1641, 4; 2026-02-21T10:19:38.0691452Z shr.s16 %rs1714, %rs1643, 4; 2026-02-21T10:19:38.0691638Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0691701Z cvt.rn.f32.s16 %r22218, %rs1714; 2026-02-21T10:19:38.0691767Z cvt.rn.f32.s16 %r22219, %rs1713; 2026-02-21T10:19:38.0691828Z cvt.rn.f32.s16 %r22220, %rs1712; 2026-02-21T10:19:38.0691888Z cvt.rn.f32.s16 %r22221, %rs1710; 2026-02-21T10:19:38.0692075Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0692209Z cvt.s16.s8 %rs1715, %rs1646; 2026-02-21T10:19:38.0692269Z shr.s16 %rs1716, %rs1715, 4; 2026-02-21T10:19:38.0692330Z cvt.s16.s8 %rs1717, %rs1648; 2026-02-21T10:19:38.0692392Z shr.s16 %rs1718, %rs1717, 4; 2026-02-21T10:19:38.0692450Z shr.s16 %rs1719, %rs1645, 4; 2026-02-21T10:19:38.0692508Z shr.s16 %rs1720, %rs1647, 4; 2026-02-21T10:19:38.0692695Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0692760Z cvt.rn.f32.s16 %r22222, %rs1720; 2026-02-21T10:19:38.0692822Z cvt.rn.f32.s16 %r22223, %rs1719; 
2026-02-21T10:19:38.0692896Z cvt.rn.f32.s16 %r22224, %rs1718; 2026-02-21T10:19:38.0692967Z cvt.rn.f32.s16 %r22225, %rs1716; 2026-02-21T10:19:38.0693157Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0693217Z cvt.s16.s8 %rs1721, %rs1650; 2026-02-21T10:19:38.0693281Z shr.s16 %rs1722, %rs1721, 4; 2026-02-21T10:19:38.0693342Z cvt.s16.s8 %rs1723, %rs1652; 2026-02-21T10:19:38.0693401Z shr.s16 %rs1724, %rs1723, 4; 2026-02-21T10:19:38.0693511Z shr.s16 %rs1725, %rs1649, 4; 2026-02-21T10:19:38.0693578Z shr.s16 %rs1726, %rs1651, 4; 2026-02-21T10:19:38.0693763Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0693824Z cvt.rn.f32.s16 %r22226, %rs1726; 2026-02-21T10:19:38.0693931Z cvt.rn.f32.s16 %r22227, %rs1725; 2026-02-21T10:19:38.0693996Z cvt.rn.f32.s16 %r22228, %rs1724; 2026-02-21T10:19:38.0694057Z cvt.rn.f32.s16 %r22229, %rs1722; 2026-02-21T10:19:38.0694309Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0694371Z cvt.s16.s8 %rs1727, %rs1654; 2026-02-21T10:19:38.0694432Z shr.s16 %rs1728, %rs1727, 4; 2026-02-21T10:19:38.0694492Z cvt.s16.s8 %rs1729, %rs1656; 2026-02-21T10:19:38.0694554Z shr.s16 %rs1730, %rs1729, 4; 2026-02-21T10:19:38.0694612Z shr.s16 %rs1731, %rs1653, 4; 2026-02-21T10:19:38.0694669Z shr.s16 %rs1732, %rs1655, 4; 2026-02-21T10:19:38.0694860Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0694920Z cvt.rn.f32.s16 %r22230, %rs1732; 2026-02-21T10:19:38.0694982Z cvt.rn.f32.s16 %r22231, %rs1731; 2026-02-21T10:19:38.0695044Z cvt.rn.f32.s16 %r22232, %rs1730; 2026-02-21T10:19:38.0695107Z cvt.rn.f32.s16 %r22233, %rs1728; 2026-02-21T10:19:38.0695292Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0695352Z cvt.s16.s8 %rs1733, %rs1658; 2026-02-21T10:19:38.0695414Z shr.s16 %rs1734, %rs1733, 4; 2026-02-21T10:19:38.0695474Z 
cvt.s16.s8 %rs1735, %rs1660; 2026-02-21T10:19:38.0695545Z shr.s16 %rs1736, %rs1735, 4; 2026-02-21T10:19:38.0695610Z shr.s16 %rs1737, %rs1657, 4; 2026-02-21T10:19:38.0695670Z shr.s16 %rs1738, %rs1659, 4; 2026-02-21T10:19:38.0695861Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0695927Z cvt.rn.f32.s16 %r22234, %rs1738; 2026-02-21T10:19:38.0695993Z cvt.rn.f32.s16 %r22235, %rs1737; 2026-02-21T10:19:38.0696058Z cvt.rn.f32.s16 %r22236, %rs1736; 2026-02-21T10:19:38.0696118Z cvt.rn.f32.s16 %r22237, %rs1734; 2026-02-21T10:19:38.0696312Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0696374Z cvt.s16.s8 %rs1739, %rs1662; 2026-02-21T10:19:38.0696436Z shr.s16 %rs1740, %rs1739, 4; 2026-02-21T10:19:38.0696651Z cvt.s16.s8 %rs1741, %rs1664; 2026-02-21T10:19:38.0696715Z shr.s16 %rs1742, %rs1741, 4; 2026-02-21T10:19:38.0696775Z shr.s16 %rs1743, %rs1661, 4; 2026-02-21T10:19:38.0696835Z shr.s16 %rs1744, %rs1663, 4; 2026-02-21T10:19:38.0697046Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0697213Z cvt.rn.f32.s16 %r22238, %rs1744; 2026-02-21T10:19:38.0697275Z cvt.rn.f32.s16 %r22239, %rs1743; 2026-02-21T10:19:38.0697339Z cvt.rn.f32.s16 %r22240, %rs1742; 2026-02-21T10:19:38.0697403Z cvt.rn.f32.s16 %r22241, %rs1740; 2026-02-21T10:19:38.0697591Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0697658Z cvt.s16.s8 %rs1745, %rs1666; 2026-02-21T10:19:38.0697721Z shr.s16 %rs1746, %rs1745, 4; 2026-02-21T10:19:38.0697780Z cvt.s16.s8 %rs1747, %rs1668; 2026-02-21T10:19:38.0697839Z shr.s16 %rs1748, %rs1747, 4; 2026-02-21T10:19:38.0697904Z shr.s16 %rs1749, %rs1665, 4; 2026-02-21T10:19:38.0697967Z shr.s16 %rs1750, %rs1667, 4; 2026-02-21T10:19:38.0698163Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0698231Z cvt.rn.f32.s16 %r22242, %rs1750; 
2026-02-21T10:19:38.0698293Z cvt.rn.f32.s16 %r22243, %rs1749; 2026-02-21T10:19:38.0698357Z cvt.rn.f32.s16 %r22244, %rs1748; 2026-02-21T10:19:38.0698419Z cvt.rn.f32.s16 %r22245, %rs1746; 2026-02-21T10:19:38.0698693Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0698757Z cvt.s16.s8 %rs1751, %rs1670; 2026-02-21T10:19:38.0698819Z shr.s16 %rs1752, %rs1751, 4; 2026-02-21T10:19:38.0698882Z cvt.s16.s8 %rs1753, %rs1672; 2026-02-21T10:19:38.0698942Z shr.s16 %rs1754, %rs1753, 4; 2026-02-21T10:19:38.0699057Z shr.s16 %rs1755, %rs1669, 4; 2026-02-21T10:19:38.0699120Z shr.s16 %rs1756, %rs1671, 4; 2026-02-21T10:19:38.0699317Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0699454Z cvt.rn.f32.s16 %r22246, %rs1756; 2026-02-21T10:19:38.0699520Z cvt.rn.f32.s16 %r22247, %rs1755; 2026-02-21T10:19:38.0699584Z cvt.rn.f32.s16 %r22248, %rs1754; 2026-02-21T10:19:38.0699643Z cvt.rn.f32.s16 %r22249, %rs1752; 2026-02-21T10:19:38.0699833Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0699900Z cvt.s16.s8 %rs1757, %rs1674; 2026-02-21T10:19:38.0699962Z shr.s16 %rs1758, %rs1757, 4; 2026-02-21T10:19:38.0700021Z cvt.s16.s8 %rs1759, %rs1676; 2026-02-21T10:19:38.0700084Z shr.s16 %rs1760, %rs1759, 4; 2026-02-21T10:19:38.0700143Z shr.s16 %rs1761, %rs1673, 4; 2026-02-21T10:19:38.0700202Z shr.s16 %rs1762, %rs1675, 4; 2026-02-21T10:19:38.0700391Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0700458Z cvt.rn.f32.s16 %r22250, %rs1762; 2026-02-21T10:19:38.0701941Z cvt.rn.f32.s16 %r22251, %rs1761; 2026-02-21T10:19:38.0702038Z cvt.rn.f32.s16 %r22252, %rs1760; 2026-02-21T10:19:38.0702104Z cvt.rn.f32.s16 %r22253, %rs1758; 2026-02-21T10:19:38.0702331Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0702400Z cvt.s16.s8 %rs1763, %rs1678; 2026-02-21T10:19:38.0702470Z 
shr.s16 %rs1764, %rs1763, 4; 2026-02-21T10:19:38.0702532Z cvt.s16.s8 %rs1765, %rs1680; 2026-02-21T10:19:38.0702592Z shr.s16 %rs1766, %rs1765, 4; 2026-02-21T10:19:38.0702653Z shr.s16 %rs1767, %rs1677, 4; 2026-02-21T10:19:38.0702711Z shr.s16 %rs1768, %rs1679, 4; 2026-02-21T10:19:38.0702921Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0702989Z cvt.rn.f32.s16 %r22254, %rs1768; 2026-02-21T10:19:38.0703050Z cvt.rn.f32.s16 %r22255, %rs1767; 2026-02-21T10:19:38.0703113Z cvt.rn.f32.s16 %r22256, %rs1766; 2026-02-21T10:19:38.0703176Z cvt.rn.f32.s16 %r22257, %rs1764; 2026-02-21T10:19:38.0703386Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0703451Z cvt.s16.s8 %rs1769, %rs1682; 2026-02-21T10:19:38.0703515Z shr.s16 %rs1770, %rs1769, 4; 2026-02-21T10:19:38.0703577Z cvt.s16.s8 %rs1771, %rs1684; 2026-02-21T10:19:38.0703715Z shr.s16 %rs1772, %rs1771, 4; 2026-02-21T10:19:38.0703782Z shr.s16 %rs1773, %rs1681, 4; 2026-02-21T10:19:38.0703840Z shr.s16 %rs1774, %rs1683, 4; 2026-02-21T10:19:38.0704033Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0704104Z cvt.rn.f32.s16 %r22258, %rs1774; 2026-02-21T10:19:38.0704174Z cvt.rn.f32.s16 %r22259, %rs1773; 2026-02-21T10:19:38.0704237Z cvt.rn.f32.s16 %r22260, %rs1772; 2026-02-21T10:19:38.0704300Z cvt.rn.f32.s16 %r22261, %rs1770; 2026-02-21T10:19:38.0704493Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0704556Z cvt.s16.s8 %rs1775, %rs1686; 2026-02-21T10:19:38.0704617Z shr.s16 %rs1776, %rs1775, 4; 2026-02-21T10:19:38.0704678Z cvt.s16.s8 %rs1777, %rs1688; 2026-02-21T10:19:38.0704738Z shr.s16 %rs1778, %rs1777, 4; 2026-02-21T10:19:38.0704797Z shr.s16 %rs1779, %rs1685, 4; 2026-02-21T10:19:38.0704860Z shr.s16 %rs1780, %rs1687, 4; 2026-02-21T10:19:38.0705050Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 
2026-02-21T10:19:38.0705167Z cvt.rn.f32.s16 %r22262, %rs1780; 2026-02-21T10:19:38.0705230Z cvt.rn.f32.s16 %r22263, %rs1779; 2026-02-21T10:19:38.0705297Z cvt.rn.f32.s16 %r22264, %rs1778; 2026-02-21T10:19:38.0705359Z cvt.rn.f32.s16 %r22265, %rs1776; 2026-02-21T10:19:38.0705548Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0705611Z cvt.s16.s8 %rs1781, %rs1690; 2026-02-21T10:19:38.0705670Z shr.s16 %rs1782, %rs1781, 4; 2026-02-21T10:19:38.0705793Z cvt.s16.s8 %rs1783, %rs1692; 2026-02-21T10:19:38.0705857Z shr.s16 %rs1784, %rs1783, 4; 2026-02-21T10:19:38.0705918Z shr.s16 %rs1785, %rs1689, 4; 2026-02-21T10:19:38.0705977Z shr.s16 %rs1786, %rs1691, 4; 2026-02-21T10:19:38.0706165Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0706234Z cvt.rn.f32.s16 %r22266, %rs1786; 2026-02-21T10:19:38.0706297Z cvt.rn.f32.s16 %r22267, %rs1785; 2026-02-21T10:19:38.0706359Z cvt.rn.f32.s16 %r22268, %rs1784; 2026-02-21T10:19:38.0706424Z cvt.rn.f32.s16 %r22269, %rs1782; 2026-02-21T10:19:38.0706783Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0706850Z cvt.s16.s8 %rs1787, %rs1694; 2026-02-21T10:19:38.0706914Z shr.s16 %rs1788, %rs1787, 4; 2026-02-21T10:19:38.0706983Z cvt.s16.s8 %rs1789, %rs1696; 2026-02-21T10:19:38.0707043Z shr.s16 %rs1790, %rs1789, 4; 2026-02-21T10:19:38.0707104Z shr.s16 %rs1791, %rs1693, 4; 2026-02-21T10:19:38.0707285Z shr.s16 %rs1792, %rs1695, 4; 2026-02-21T10:19:38.0707481Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0707544Z cvt.rn.f32.s16 %r22270, %rs1792; 2026-02-21T10:19:38.0707617Z cvt.rn.f32.s16 %r22271, %rs1791; 2026-02-21T10:19:38.0707687Z cvt.rn.f32.s16 %r22272, %rs1790; 2026-02-21T10:19:38.0707749Z cvt.rn.f32.s16 %r22273, %rs1788; 2026-02-21T10:19:38.0707805Z bar.sync 0; 2026-02-21T10:19:38.0707933Z st.shared.v4.b32 [%r27], {%r22213, %r22211, %r22212, 
%r22210}; 2026-02-21T10:19:38.0708063Z st.shared.v4.b32 [%r27+16384], {%r22245, %r22243, %r22244, %r22242}; 2026-02-21T10:19:38.0708181Z st.shared.v4.b32 [%r28], {%r22217, %r22215, %r22216, %r22214}; 2026-02-21T10:19:38.0708304Z st.shared.v4.b32 [%r28+16384], {%r22249, %r22247, %r22248, %r22246}; 2026-02-21T10:19:38.0708504Z st.shared.v4.b32 [%r29], {%r22221, %r22219, %r22220, %r22218}; 2026-02-21T10:19:38.0708630Z st.shared.v4.b32 [%r29+16384], {%r22253, %r22251, %r22252, %r22250}; 2026-02-21T10:19:38.0708738Z st.shared.v4.b32 [%r30], {%r22225, %r22223, %r22224, %r22222}; 2026-02-21T10:19:38.0708853Z st.shared.v4.b32 [%r30+16384], {%r22257, %r22255, %r22256, %r22254}; 2026-02-21T10:19:38.0708960Z st.shared.v4.b32 [%r31], {%r22229, %r22227, %r22228, %r22226}; 2026-02-21T10:19:38.0709165Z st.shared.v4.b32 [%r31+16384], {%r22261, %r22259, %r22260, %r22258}; 2026-02-21T10:19:38.0709274Z st.shared.v4.b32 [%r32], {%r22233, %r22231, %r22232, %r22230}; 2026-02-21T10:19:38.0709389Z st.shared.v4.b32 [%r32+16384], {%r22265, %r22263, %r22264, %r22262}; 2026-02-21T10:19:38.0709496Z st.shared.v4.b32 [%r33], {%r22237, %r22235, %r22236, %r22234}; 2026-02-21T10:19:38.0709616Z st.shared.v4.b32 [%r33+16384], {%r22269, %r22267, %r22268, %r22266}; 2026-02-21T10:19:38.0709722Z st.shared.v4.b32 [%r34], {%r22241, %r22239, %r22240, %r22238}; 2026-02-21T10:19:38.0709838Z st.shared.v4.b32 [%r34+16384], {%r22273, %r22271, %r22272, %r22270}; 2026-02-21T10:19:38.0709898Z $L__tmp15: 2026-02-21T10:19:38.0710169Z .loc 2 291 36 // standard.py:291:36 @[ cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:91:40 ] 2026-02-21T10:19:38.0710230Z // begin inline asm 2026-02-21T10:19:38.0710321Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.0710386Z // end inline asm 2026-02-21T10:19:38.0710439Z bar.sync 0; 2026-02-21T10:19:38.0710511Z wgmma.fence.sync.aligned; 2026-02-21T10:19:38.0710581Z mov.pred %p198, -1; 2026-02-21T10:19:38.0710708Z // begin inline asm 2026-02-21T10:19:38.0712196Z 
wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r19963,%r19964,%r19965,%r19966}, %rd3, %p198, 1, 1; 2026-02-21T10:19:38.0712318Z // end inline asm 2026-02-21T10:19:38.0712375Z // begin inline asm 2026-02-21T10:19:38.0713850Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r20095,%r20096,%r20097,%r20098}, %rd4, %p198, 1, 1; 2026-02-21T10:19:38.0713910Z // end inline asm 2026-02-21T10:19:38.0713972Z // begin inline asm 2026-02-21T10:19:38.0715498Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r20227,%r20228,%r20229,%r20230}, %rd5, %p198, 1, 1; 2026-02-21T10:19:38.0715561Z // end inline asm 2026-02-21T10:19:38.0715619Z // begin inline asm 2026-02-21T10:19:38.0717420Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r20359,%r20360,%r20361,%r20362}, %rd6, %p198, 1, 1; 2026-02-21T10:19:38.0717572Z // end inline asm 2026-02-21T10:19:38.0717634Z // begin inline asm 2026-02-21T10:19:38.0719088Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r20491,%r20492,%r20493,%r20494}, %rd7, %p198, 1, 1; 2026-02-21T10:19:38.0719150Z // end inline asm 2026-02-21T10:19:38.0719212Z // begin inline asm 2026-02-21T10:19:38.0720723Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r20623,%r20624,%r20625,%r20626}, %rd8, %p198, 1, 1; 2026-02-21T10:19:38.0720790Z // end inline asm 2026-02-21T10:19:38.0720847Z // begin inline asm 2026-02-21T10:19:38.0722394Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r20755,%r20756,%r20757,%r20758}, %rd9, %p198, 1, 1; 2026-02-21T10:19:38.0722454Z // end inline asm 2026-02-21T10:19:38.0722512Z // begin inline asm 2026-02-21T10:19:38.0724031Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789}, {%r20887,%r20888,%r20889,%r20890}, %rd10, %p198, 1, 1; 2026-02-21T10:19:38.0724094Z // end inline asm 2026-02-21T10:19:38.0724153Z // begin inline asm 2026-02-21T10:19:38.0725612Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r21019,%r21020,%r21021,%r21022}, %rd3, %p198, 1, 1; 2026-02-21T10:19:38.0725672Z // end inline asm 2026-02-21T10:19:38.0725731Z // begin inline asm 2026-02-21T10:19:38.0727330Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r21151,%r21152,%r21153,%r21154}, %rd4, %p198, 1, 1; 2026-02-21T10:19:38.0727469Z // end inline asm 2026-02-21T10:19:38.0727527Z // begin inline asm 2026-02-21T10:19:38.0728977Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r21283,%r21284,%r21285,%r21286}, %rd5, %p198, 1, 1; 2026-02-21T10:19:38.0729098Z // end inline asm 2026-02-21T10:19:38.0729158Z // begin inline asm 2026-02-21T10:19:38.0730612Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r21415,%r21416,%r21417,%r21418}, %rd6, %p198, 1, 1; 2026-02-21T10:19:38.0730729Z // end inline asm 2026-02-21T10:19:38.0730787Z // begin inline asm 2026-02-21T10:19:38.0732243Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r21547,%r21548,%r21549,%r21550}, %rd7, %p198, 1, 1; 2026-02-21T10:19:38.0732300Z // end inline asm 2026-02-21T10:19:38.0732414Z // begin inline asm 2026-02-21T10:19:38.0734038Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r21679,%r21680,%r21681,%r21682}, %rd8, %p198, 1, 1; 2026-02-21T10:19:38.0734105Z // end inline asm 2026-02-21T10:19:38.0734167Z // begin inline asm 2026-02-21T10:19:38.0735619Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r21811,%r21812,%r21813,%r21814}, %rd9, %p198, 1, 1; 2026-02-21T10:19:38.0735742Z // end inline asm 2026-02-21T10:19:38.0735804Z // begin inline asm 2026-02-21T10:19:38.0737410Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853}, {%r21943,%r21944,%r21945,%r21946}, %rd10, %p198, 1, 1; 2026-02-21T10:19:38.0737475Z // end inline asm 2026-02-21T10:19:38.0737554Z wgmma.commit_group.sync.aligned; 2026-02-21T10:19:38.0737617Z mov.b32 %r22075, %r39931; 2026-02-21T10:19:38.0737677Z mov.b32 %r22076, %r22077; 2026-02-21T10:19:38.0737738Z // begin inline asm 2026-02-21T10:19:38.0740271Z // wait for regs: 
%r42726,%r42727,%r42728,%r42729,%r42730,%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r22075,%r22076,%r22077 2026-02-21T10:19:38.0740426Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:19:38.0740483Z // end inline asm 2026-02-21T10:19:38.0740537Z $L__tmp16: 2026-02-21T10:19:38.0740758Z .loc 1 48 93 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:48:93 2026-02-21T10:19:38.0740825Z add.s64 %rd846, %rd846, 128; 2026-02-21T10:19:38.0740894Z setp.lt.u64 %p216, %rd847, 4064; 2026-02-21T10:19:38.0740958Z @%p216 bra $L__BB0_9; 2026-02-21T10:19:38.0741141Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:19:38.0741349Z .loc 1 94 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:94:28 2026-02-21T10:19:38.0741439Z cvt.rn.bf16x2.f32 %r22278, %r42727, %r42726; 2026-02-21T10:19:38.0741523Z cvt.rn.bf16x2.f32 %r22279, %r42729, %r42728; 2026-02-21T10:19:38.0741600Z cvt.rn.bf16x2.f32 %r22280, %r42731, %r42730; 2026-02-21T10:19:38.0741674Z cvt.rn.bf16x2.f32 %r22281, %r42733, %r42732; 2026-02-21T10:19:38.0741753Z cvt.rn.bf16x2.f32 
%r22282, %r42735, %r42734; 2026-02-21T10:19:38.0741827Z cvt.rn.bf16x2.f32 %r22283, %r42737, %r42736; 2026-02-21T10:19:38.0741903Z cvt.rn.bf16x2.f32 %r22284, %r42739, %r42738; 2026-02-21T10:19:38.0741980Z cvt.rn.bf16x2.f32 %r22285, %r42741, %r42740; 2026-02-21T10:19:38.0742053Z cvt.rn.bf16x2.f32 %r22286, %r42743, %r42742; 2026-02-21T10:19:38.0742126Z cvt.rn.bf16x2.f32 %r22287, %r42745, %r42744; 2026-02-21T10:19:38.0742199Z cvt.rn.bf16x2.f32 %r22288, %r42747, %r42746; 2026-02-21T10:19:38.0742279Z cvt.rn.bf16x2.f32 %r22289, %r42749, %r42748; 2026-02-21T10:19:38.0742353Z cvt.rn.bf16x2.f32 %r22290, %r42751, %r42750; 2026-02-21T10:19:38.0742426Z cvt.rn.bf16x2.f32 %r22291, %r42753, %r42752; 2026-02-21T10:19:38.0742507Z cvt.rn.bf16x2.f32 %r22292, %r42755, %r42754; 2026-02-21T10:19:38.0742665Z cvt.rn.bf16x2.f32 %r22293, %r42757, %r42756; 2026-02-21T10:19:38.0742744Z cvt.rn.bf16x2.f32 %r22294, %r42759, %r42758; 2026-02-21T10:19:38.0742824Z cvt.rn.bf16x2.f32 %r22295, %r42761, %r42760; 2026-02-21T10:19:38.0742899Z cvt.rn.bf16x2.f32 %r22296, %r42763, %r42762; 2026-02-21T10:19:38.0742972Z cvt.rn.bf16x2.f32 %r22297, %r42765, %r42764; 2026-02-21T10:19:38.0743049Z cvt.rn.bf16x2.f32 %r22298, %r42767, %r42766; 2026-02-21T10:19:38.0743127Z cvt.rn.bf16x2.f32 %r22299, %r42769, %r42768; 2026-02-21T10:19:38.0743200Z cvt.rn.bf16x2.f32 %r22300, %r42771, %r42770; 2026-02-21T10:19:38.0743275Z cvt.rn.bf16x2.f32 %r22301, %r42773, %r42772; 2026-02-21T10:19:38.0743355Z cvt.rn.bf16x2.f32 %r22302, %r42775, %r42774; 2026-02-21T10:19:38.0743430Z cvt.rn.bf16x2.f32 %r22303, %r42777, %r42776; 2026-02-21T10:19:38.0743504Z cvt.rn.bf16x2.f32 %r22304, %r42779, %r42778; 2026-02-21T10:19:38.0743581Z cvt.rn.bf16x2.f32 %r22305, %r42781, %r42780; 2026-02-21T10:19:38.0743654Z cvt.rn.bf16x2.f32 %r22306, %r42783, %r42782; 2026-02-21T10:19:38.0743728Z cvt.rn.bf16x2.f32 %r22307, %r42785, %r42784; 2026-02-21T10:19:38.0743801Z cvt.rn.bf16x2.f32 %r22308, %r42787, %r42786; 2026-02-21T10:19:38.0743931Z cvt.rn.bf16x2.f32 
%r22309, %r42789, %r42788; 2026-02-21T10:19:38.0744009Z cvt.rn.bf16x2.f32 %r22310, %r42791, %r42790; 2026-02-21T10:19:38.0744083Z cvt.rn.bf16x2.f32 %r22311, %r42793, %r42792; 2026-02-21T10:19:38.0744159Z cvt.rn.bf16x2.f32 %r22312, %r42795, %r42794; 2026-02-21T10:19:38.0744232Z cvt.rn.bf16x2.f32 %r22313, %r42797, %r42796; 2026-02-21T10:19:38.0744304Z cvt.rn.bf16x2.f32 %r22314, %r42799, %r42798; 2026-02-21T10:19:38.0744446Z cvt.rn.bf16x2.f32 %r22315, %r42801, %r42800; 2026-02-21T10:19:38.0744524Z cvt.rn.bf16x2.f32 %r22316, %r42803, %r42802; 2026-02-21T10:19:38.0744598Z cvt.rn.bf16x2.f32 %r22317, %r42805, %r42804; 2026-02-21T10:19:38.0744671Z cvt.rn.bf16x2.f32 %r22318, %r42807, %r42806; 2026-02-21T10:19:38.0744748Z cvt.rn.bf16x2.f32 %r22319, %r42809, %r42808; 2026-02-21T10:19:38.0744826Z cvt.rn.bf16x2.f32 %r22320, %r42811, %r42810; 2026-02-21T10:19:38.0744900Z cvt.rn.bf16x2.f32 %r22321, %r42813, %r42812; 2026-02-21T10:19:38.0744978Z cvt.rn.bf16x2.f32 %r22322, %r42815, %r42814; 2026-02-21T10:19:38.0745052Z cvt.rn.bf16x2.f32 %r22323, %r42817, %r42816; 2026-02-21T10:19:38.0745125Z cvt.rn.bf16x2.f32 %r22324, %r42819, %r42818; 2026-02-21T10:19:38.0745199Z cvt.rn.bf16x2.f32 %r22325, %r42821, %r42820; 2026-02-21T10:19:38.0745280Z cvt.rn.bf16x2.f32 %r22326, %r42823, %r42822; 2026-02-21T10:19:38.0745353Z cvt.rn.bf16x2.f32 %r22327, %r42825, %r42824; 2026-02-21T10:19:38.0745426Z cvt.rn.bf16x2.f32 %r22328, %r42827, %r42826; 2026-02-21T10:19:38.0745557Z cvt.rn.bf16x2.f32 %r22329, %r42829, %r42828; 2026-02-21T10:19:38.0745634Z cvt.rn.bf16x2.f32 %r22330, %r42831, %r42830; 2026-02-21T10:19:38.0745708Z cvt.rn.bf16x2.f32 %r22331, %r42833, %r42832; 2026-02-21T10:19:38.0745784Z cvt.rn.bf16x2.f32 %r22332, %r42835, %r42834; 2026-02-21T10:19:38.0745857Z cvt.rn.bf16x2.f32 %r22333, %r42837, %r42836; 2026-02-21T10:19:38.0745932Z cvt.rn.bf16x2.f32 %r22334, %r42839, %r42838; 2026-02-21T10:19:38.0746004Z cvt.rn.bf16x2.f32 %r22335, %r42841, %r42840; 2026-02-21T10:19:38.0746082Z cvt.rn.bf16x2.f32 
%r22336, %r42843, %r42842; 2026-02-21T10:19:38.0746155Z cvt.rn.bf16x2.f32 %r22337, %r42845, %r42844; 2026-02-21T10:19:38.0746227Z cvt.rn.bf16x2.f32 %r22338, %r42847, %r42846; 2026-02-21T10:19:38.0746303Z cvt.rn.bf16x2.f32 %r22339, %r42849, %r42848; 2026-02-21T10:19:38.0746376Z cvt.rn.bf16x2.f32 %r22340, %r42851, %r42850; 2026-02-21T10:19:38.0746573Z cvt.rn.bf16x2.f32 %r22341, %r42853, %r42852; 2026-02-21T10:19:38.0746787Z .loc 1 95 43 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:95:43 2026-02-21T10:19:38.0746845Z bar.sync 0; 2026-02-21T10:19:38.0747055Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r35], {%r22278, %r22279, %r22280, %r22281}; 2026-02-21T10:19:38.0747245Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r36], {%r22294, %r22295, %r22296, %r22297}; 2026-02-21T10:19:38.0747504Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r37], {%r22310, %r22311, %r22312, %r22313}; 2026-02-21T10:19:38.0747684Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r38], {%r22326, %r22327, %r22328, %r22329}; 2026-02-21T10:19:38.0747863Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r39], {%r22282, %r22283, %r22284, %r22285}; 2026-02-21T10:19:38.0748043Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r40], {%r22298, %r22299, %r22300, %r22301}; 2026-02-21T10:19:38.0748220Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r41], {%r22314, %r22315, %r22316, %r22317}; 2026-02-21T10:19:38.0748485Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r42], {%r22330, %r22331, %r22332, %r22333}; 2026-02-21T10:19:38.0748674Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r43], {%r22286, %r22287, %r22288, %r22289}; 2026-02-21T10:19:38.0748852Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r44], {%r22302, %r22303, %r22304, %r22305}; 2026-02-21T10:19:38.0749029Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r45], {%r22318, %r22319, %r22320, %r22321}; 2026-02-21T10:19:38.0749212Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r46], {%r22334, %r22335, %r22336, %r22337}; 2026-02-21T10:19:38.0749477Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r47], {%r22290, %r22291, %r22292, %r22293}; 2026-02-21T10:19:38.0749658Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r48], {%r22306, %r22307, %r22308, %r22309}; 2026-02-21T10:19:38.0749841Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r49], {%r22322, %r22323, %r22324, %r22325}; 2026-02-21T10:19:38.0750020Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r50], {%r22338, %r22339, %r22340, %r22341}; 2026-02-21T10:19:38.0750144Z // begin inline asm 2026-02-21T10:19:38.0750231Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.0750289Z // end inline asm 2026-02-21T10:19:38.0750344Z bar.sync 0; 2026-02-21T10:19:38.0750413Z elect.sync %r22342|%p219, -1; 2026-02-21T10:19:38.0750483Z and.pred %p217, %p405, %p219; 2026-02-21T10:19:38.0750546Z or.b32 %r22274, %r19829, %r639; 2026-02-21T10:19:38.0750606Z // begin inline asm 2026-02-21T10:19:38.0750847Z @%p217 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd497, {%r22274, %r22275}], [%r22276]; 2026-02-21T10:19:38.0750906Z // end inline asm 2026-02-21T10:19:38.0750978Z cp.async.bulk.commit_group; 2026-02-21T10:19:38.0751055Z cp.async.bulk.wait_group.read 0; 2026-02-21T10:19:38.0751119Z bar.sync 0; 2026-02-21T10:19:38.0751335Z .loc 1 28 112 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:28:112 2026-02-21T10:19:38.0751402Z add.s32 %r22343, %r42467, 8448; 2026-02-21T10:19:38.0751602Z .loc 1 34 35 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:34:35 2026-02-21T10:19:38.0751733Z shr.s32 %r22344, %r22343, 31; 2026-02-21T10:19:38.0751794Z shr.u32 %r22345, %r22344, 18; 2026-02-21T10:19:38.0751860Z add.s32 %r22346, %r22343, %r22345; 2026-02-21T10:19:38.0751919Z shr.s32 %r22347, %r22346, 14; 2026-02-21T10:19:38.0752109Z .loc 1 35 33 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:35:33 2026-02-21T10:19:38.0752175Z shl.b32 %r22348, %r22347, 5; 2026-02-21T10:19:38.0752366Z .loc 1 36 39 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:36:39 
2026-02-21T10:19:38.0752426Z sub.s32 %r22349, 10, %r22348; 2026-02-21T10:19:38.0752614Z .loc 1 36 52 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:36:52 2026-02-21T10:19:38.0752674Z min.s32 %r22350, %r22349, 32; 2026-02-21T10:19:38.0752861Z .loc 1 37 45 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:37:45 2026-02-21T10:19:38.0752929Z and.b32 %r22351, %r22346, -16384; 2026-02-21T10:19:38.0752996Z sub.s32 %r22352, %r22343, %r22351; 2026-02-21T10:19:38.0753188Z .loc 1 38 51 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:38:51 2026-02-21T10:19:38.0753261Z div.s32 %r22353, %r22352, %r22350; 2026-02-21T10:19:38.0753456Z .loc 1 37 64 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:37:64 2026-02-21T10:19:38.0753577Z mul.lo.s32 %r22354, %r22353, %r22350; 2026-02-21T10:19:38.0753638Z sub.s32 %r22355, %r22352, %r22354; 2026-02-21T10:19:38.0753830Z .loc 1 37 30 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:37:30 2026-02-21T10:19:38.0753889Z add.s32 %r22356, %r22355, %r22348; 2026-02-21T10:19:38.0754077Z .loc 1 39 27 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:39:27 2026-02-21T10:19:38.0754136Z shl.b32 %r29849, %r22356, 7; 2026-02-21T10:19:38.0754338Z .loc 1 40 27 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:40:27 2026-02-21T10:19:38.0754403Z shl.b32 %r32295, %r22353, 7; 2026-02-21T10:19:38.0754600Z .loc 1 48 93 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:48:93 2026-02-21T10:19:38.0754665Z or.b32 %r22357, %r42459, %r32295; 2026-02-21T10:19:38.0754724Z shl.b32 %r22358, %r22357, 13; 2026-02-21T10:19:38.0754790Z mul.wide.s32 %rd69, %r22358, 2; 2026-02-21T10:19:38.0754851Z or.b32 %r22359, %r42460, %r32295; 2026-02-21T10:19:38.0754963Z shl.b32 %r22360, %r22359, 13; 2026-02-21T10:19:38.0755031Z mul.wide.s32 %rd70, %r22360, 2; 2026-02-21T10:19:38.0755089Z or.b32 %r22361, %r42461, %r32295; 2026-02-21T10:19:38.0755149Z shl.b32 %r22362, %r22361, 13; 2026-02-21T10:19:38.0755210Z 
mul.wide.s32 %rd71, %r22362, 2; 2026-02-21T10:19:38.0755271Z or.b32 %r22363, %r42462, %r32295; 2026-02-21T10:19:38.0755332Z shl.b32 %r22364, %r22363, 13; 2026-02-21T10:19:38.0755393Z mul.wide.s32 %rd72, %r22364, 2; 2026-02-21T10:19:38.0755501Z or.b32 %r22365, %r42463, %r32295; 2026-02-21T10:19:38.0755560Z shl.b32 %r22366, %r22365, 13; 2026-02-21T10:19:38.0755633Z mul.wide.s32 %rd73, %r22366, 2; 2026-02-21T10:19:38.0755696Z or.b32 %r22367, %r42464, %r32295; 2026-02-21T10:19:38.0755755Z shl.b32 %r22368, %r22367, 13; 2026-02-21T10:19:38.0755819Z mul.wide.s32 %rd74, %r22368, 2; 2026-02-21T10:19:38.0755879Z shl.b32 %r22369, %r22353, 20; 2026-02-21T10:19:38.0755937Z or.b32 %r22370, %r42465, %r22369; 2026-02-21T10:19:38.0756006Z mul.wide.s32 %rd75, %r22370, 2; 2026-02-21T10:19:38.0756068Z or.b32 %r42982, %r67, %r22369; 2026-02-21T10:19:38.0756127Z or.b32 %r22371, %r42466, %r22369; 2026-02-21T10:19:38.0756189Z mul.wide.s32 %rd76, %r22371, 2; 2026-02-21T10:19:38.0756251Z mov.b32 %r42983, 0f00000000; 2026-02-21T10:19:38.0756310Z mov.b64 %rd849, -96; 2026-02-21T10:19:38.0756370Z mov.b64 %rd848, %rd11; 2026-02-21T10:19:38.0756433Z mov.b32 %r42984, %r42983; 2026-02-21T10:19:38.0756626Z mov.b32 %r42985, %r42983; 2026-02-21T10:19:38.0756773Z mov.b32 %r42986, %r42983; 2026-02-21T10:19:38.0756835Z mov.b32 %r42987, %r42983; 2026-02-21T10:19:38.0756896Z mov.b32 %r42988, %r42983; 2026-02-21T10:19:38.0756957Z mov.b32 %r42989, %r42983; 2026-02-21T10:19:38.0757025Z mov.b32 %r42990, %r42983; 2026-02-21T10:19:38.0757086Z mov.b32 %r42991, %r42983; 2026-02-21T10:19:38.0757147Z mov.b32 %r42992, %r42983; 2026-02-21T10:19:38.0757206Z mov.b32 %r42993, %r42983; 2026-02-21T10:19:38.0757263Z mov.b32 %r42994, %r42983; 2026-02-21T10:19:38.0757324Z mov.b32 %r42995, %r42983; 2026-02-21T10:19:38.0757382Z mov.b32 %r42996, %r42983; 2026-02-21T10:19:38.0757439Z mov.b32 %r42997, %r42983; 2026-02-21T10:19:38.0757497Z mov.b32 %r42998, %r42983; 2026-02-21T10:19:38.0757554Z mov.b32 %r42999, %r42983; 
2026-02-21T10:19:38.0757611Z mov.b32 %r43000, %r42983; 2026-02-21T10:19:38.0757668Z mov.b32 %r43001, %r42983; 2026-02-21T10:19:38.0757727Z mov.b32 %r43002, %r42983; 2026-02-21T10:19:38.0757787Z mov.b32 %r43003, %r42983; 2026-02-21T10:19:38.0757846Z mov.b32 %r43004, %r42983; 2026-02-21T10:19:38.0757907Z mov.b32 %r43005, %r42983; 2026-02-21T10:19:38.0757965Z mov.b32 %r43006, %r42983; 2026-02-21T10:19:38.0758022Z mov.b32 %r43007, %r42983; 2026-02-21T10:19:38.0758078Z mov.b32 %r43008, %r42983; 2026-02-21T10:19:38.0758139Z mov.b32 %r43009, %r42983; 2026-02-21T10:19:38.0758269Z mov.b32 %r43010, %r42983; 2026-02-21T10:19:38.0758326Z mov.b32 %r43011, %r42983; 2026-02-21T10:19:38.0758385Z mov.b32 %r43012, %r42983; 2026-02-21T10:19:38.0758443Z mov.b32 %r43013, %r42983; 2026-02-21T10:19:38.0758499Z mov.b32 %r43014, %r42983; 2026-02-21T10:19:38.0758555Z mov.b32 %r43015, %r42983; 2026-02-21T10:19:38.0758614Z mov.b32 %r43016, %r42983; 2026-02-21T10:19:38.0758669Z mov.b32 %r43017, %r42983; 2026-02-21T10:19:38.0758726Z mov.b32 %r43018, %r42983; 2026-02-21T10:19:38.0758784Z mov.b32 %r43019, %r42983; 2026-02-21T10:19:38.0758841Z mov.b32 %r43020, %r42983; 2026-02-21T10:19:38.0758898Z mov.b32 %r43021, %r42983; 2026-02-21T10:19:38.0758959Z mov.b32 %r43022, %r42983; 2026-02-21T10:19:38.0759017Z mov.b32 %r43023, %r42983; 2026-02-21T10:19:38.0759073Z mov.b32 %r43024, %r42983; 2026-02-21T10:19:38.0759129Z mov.b32 %r43025, %r42983; 2026-02-21T10:19:38.0759189Z mov.b32 %r43026, %r42983; 2026-02-21T10:19:38.0759246Z mov.b32 %r43027, %r42983; 2026-02-21T10:19:38.0759304Z mov.b32 %r43028, %r42983; 2026-02-21T10:19:38.0759362Z mov.b32 %r43029, %r42983; 2026-02-21T10:19:38.0759417Z mov.b32 %r43030, %r42983; 2026-02-21T10:19:38.0759540Z mov.b32 %r43031, %r42983; 2026-02-21T10:19:38.0759599Z mov.b32 %r43032, %r42983; 2026-02-21T10:19:38.0759666Z mov.b32 %r43033, %r42983; 2026-02-21T10:19:38.0759729Z mov.b32 %r43034, %r42983; 2026-02-21T10:19:38.0759787Z mov.b32 %r43035, %r42983; 
2026-02-21T10:19:38.0759845Z mov.b32 %r43036, %r42983; 2026-02-21T10:19:38.0759901Z mov.b32 %r43037, %r42983; 2026-02-21T10:19:38.0759957Z mov.b32 %r43038, %r42983; 2026-02-21T10:19:38.0760076Z mov.b32 %r43039, %r42983; 2026-02-21T10:19:38.0760137Z mov.b32 %r43040, %r42983; 2026-02-21T10:19:38.0760194Z mov.b32 %r43041, %r42983; 2026-02-21T10:19:38.0760250Z mov.b32 %r43042, %r42983; 2026-02-21T10:19:38.0760308Z mov.b32 %r43043, %r42983; 2026-02-21T10:19:38.0760365Z mov.b32 %r43044, %r42983; 2026-02-21T10:19:38.0760421Z mov.b32 %r43045, %r42983; 2026-02-21T10:19:38.0760479Z mov.b32 %r43046, %r42983; 2026-02-21T10:19:38.0760539Z mov.b32 %r43047, %r42983; 2026-02-21T10:19:38.0760595Z mov.b32 %r43048, %r42983; 2026-02-21T10:19:38.0760654Z mov.b32 %r43049, %r42983; 2026-02-21T10:19:38.0760713Z mov.b32 %r43050, %r42983; 2026-02-21T10:19:38.0760770Z mov.b32 %r43051, %r42983; 2026-02-21T10:19:38.0760827Z mov.b32 %r43052, %r42983; 2026-02-21T10:19:38.0760883Z mov.b32 %r43053, %r42983; 2026-02-21T10:19:38.0760943Z mov.b32 %r43054, %r42983; 2026-02-21T10:19:38.0760998Z mov.b32 %r43055, %r42983; 2026-02-21T10:19:38.0761056Z mov.b32 %r43056, %r42983; 2026-02-21T10:19:38.0761117Z mov.b32 %r43057, %r42983; 2026-02-21T10:19:38.0761224Z mov.b32 %r43058, %r42983; 2026-02-21T10:19:38.0761284Z mov.b32 %r43059, %r42983; 2026-02-21T10:19:38.0761341Z mov.b32 %r43060, %r42983; 2026-02-21T10:19:38.0761400Z mov.b32 %r43061, %r42983; 2026-02-21T10:19:38.0761457Z mov.b32 %r43062, %r42983; 2026-02-21T10:19:38.0761513Z mov.b32 %r43063, %r42983; 2026-02-21T10:19:38.0761574Z mov.b32 %r43064, %r42983; 2026-02-21T10:19:38.0761631Z mov.b32 %r43065, %r42983; 2026-02-21T10:19:38.0761689Z mov.b32 %r43066, %r42983; 2026-02-21T10:19:38.0761748Z mov.b32 %r43067, %r42983; 2026-02-21T10:19:38.0761806Z mov.b32 %r43068, %r42983; 2026-02-21T10:19:38.0761863Z mov.b32 %r43069, %r42983; 2026-02-21T10:19:38.0761919Z mov.b32 %r43070, %r42983; 2026-02-21T10:19:38.0761977Z mov.b32 %r43071, %r42983; 
2026-02-21T10:19:38.0762035Z mov.b32 %r43072, %r42983; 2026-02-21T10:19:38.0762092Z mov.b32 %r43073, %r42983; 2026-02-21T10:19:38.0762150Z mov.b32 %r43074, %r42983; 2026-02-21T10:19:38.0762208Z mov.b32 %r43075, %r42983; 2026-02-21T10:19:38.0762267Z mov.b32 %r43076, %r42983; 2026-02-21T10:19:38.0762323Z mov.b32 %r43077, %r42983; 2026-02-21T10:19:38.0762383Z mov.b32 %r43078, %r42983; 2026-02-21T10:19:38.0762440Z mov.b32 %r43079, %r42983; 2026-02-21T10:19:38.0762495Z mov.b32 %r43080, %r42983; 2026-02-21T10:19:38.0762553Z mov.b32 %r43081, %r42983; 2026-02-21T10:19:38.0762678Z mov.b32 %r43082, %r42983; 2026-02-21T10:19:38.0762736Z mov.b32 %r43083, %r42983; 2026-02-21T10:19:38.0762792Z mov.b32 %r43084, %r42983; 2026-02-21T10:19:38.0762853Z mov.b32 %r43085, %r42983; 2026-02-21T10:19:38.0762911Z mov.b32 %r43086, %r42983; 2026-02-21T10:19:38.0762968Z mov.b32 %r43087, %r42983; 2026-02-21T10:19:38.0763027Z mov.b32 %r43088, %r42983; 2026-02-21T10:19:38.0763083Z mov.b32 %r43089, %r42983; 2026-02-21T10:19:38.0763140Z mov.b32 %r43090, %r42983; 2026-02-21T10:19:38.0763196Z mov.b32 %r43091, %r42983; 2026-02-21T10:19:38.0763255Z mov.b32 %r43092, %r42983; 2026-02-21T10:19:38.0763314Z mov.b32 %r43093, %r42983; 2026-02-21T10:19:38.0763373Z mov.b32 %r43094, %r42983; 2026-02-21T10:19:38.0763432Z mov.b32 %r43095, %r42983; 2026-02-21T10:19:38.0763490Z mov.b32 %r43096, %r42983; 2026-02-21T10:19:38.0763548Z mov.b32 %r43097, %r42983; 2026-02-21T10:19:38.0763604Z mov.b32 %r43098, %r42983; 2026-02-21T10:19:38.0763664Z mov.b32 %r43099, %r42983; 2026-02-21T10:19:38.0763722Z mov.b32 %r43100, %r42983; 2026-02-21T10:19:38.0763778Z mov.b32 %r43101, %r42983; 2026-02-21T10:19:38.0763838Z mov.b32 %r43102, %r42983; 2026-02-21T10:19:38.0763954Z mov.b32 %r43103, %r42983; 2026-02-21T10:19:38.0764017Z mov.b32 %r43104, %r42983; 2026-02-21T10:19:38.0764076Z mov.b32 %r43105, %r42983; 2026-02-21T10:19:38.0764137Z mov.b32 %r43106, %r42983; 2026-02-21T10:19:38.0764192Z mov.b32 %r43107, %r42983; 
2026-02-21T10:19:38.0764249Z mov.b32 %r43108, %r42983; 2026-02-21T10:19:38.0764317Z mov.b32 %r43109, %r42983; 2026-02-21T10:19:38.0764373Z mov.b32 %r43110, %r42983; 2026-02-21T10:19:38.0764548Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T10:19:38.0764659Z // => This Inner Loop Header: Depth=2 2026-02-21T10:19:38.0764859Z .loc 1 55 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:32 2026-02-21T10:19:38.0764924Z add.s64 %rd500, %rd848, %rd75; 2026-02-21T10:19:38.0764987Z add.s64 %rd503, %rd848, %rd74; 2026-02-21T10:19:38.0765049Z add.s64 %rd506, %rd848, %rd73; 2026-02-21T10:19:38.0765112Z add.s64 %rd509, %rd848, %rd72; 2026-02-21T10:19:38.0765172Z add.s64 %rd512, %rd848, %rd71; 2026-02-21T10:19:38.0765245Z add.s64 %rd515, %rd848, %rd70; 2026-02-21T10:19:38.0765308Z add.s64 %rd518, %rd848, %rd69; 2026-02-21T10:19:38.0765506Z .loc 1 55 80 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:80 2026-02-21T10:19:38.0765570Z add.s64 %rd521, %rd848, %rd76; 2026-02-21T10:19:38.0765629Z // begin inline asm 2026-02-21T10:19:38.0765690Z mov.u64 %rd499, 0x0; 2026-02-21T10:19:38.0765869Z createpolicy.fractional.L2::evict_first.b64 %rd499, 1.0; 2026-02-21T10:19:38.0765929Z // end inline asm 2026-02-21T10:19:38.0765986Z // begin inline asm 2026-02-21T10:19:38.0766044Z mov.u32 %r22372, 0x0; 2026-02-21T10:19:38.0766112Z mov.u32 %r22373, 0x0; 2026-02-21T10:19:38.0766170Z mov.u32 %r22374, 0x0; 2026-02-21T10:19:38.0766229Z mov.u32 %r22375, 0x0; 2026-02-21T10:19:38.0766598Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r22372, %r22373, %r22374, %r22375 }, [ %rd500 + 0 ], %rd499; 2026-02-21T10:19:38.0766664Z // end inline asm 2026-02-21T10:19:38.0766723Z // begin inline asm 2026-02-21T10:19:38.0766781Z mov.u64 %rd502, 0x0; 2026-02-21T10:19:38.0766909Z createpolicy.fractional.L2::evict_first.b64 %rd502, 1.0; 2026-02-21T10:19:38.0766973Z // end inline asm 2026-02-21T10:19:38.0767030Z // begin inline asm 2026-02-21T10:19:38.0767089Z mov.u32 %r22376, 
0x0; 2026-02-21T10:19:38.0767145Z mov.u32 %r22377, 0x0; 2026-02-21T10:19:38.0767203Z mov.u32 %r22378, 0x0; 2026-02-21T10:19:38.0767260Z mov.u32 %r22379, 0x0; 2026-02-21T10:19:38.0767487Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r22376, %r22377, %r22378, %r22379 }, [ %rd503 + 0 ], %rd502; 2026-02-21T10:19:38.0767543Z // end inline asm 2026-02-21T10:19:38.0767602Z // begin inline asm 2026-02-21T10:19:38.0767660Z mov.u64 %rd505, 0x0; 2026-02-21T10:19:38.0767885Z createpolicy.fractional.L2::evict_first.b64 %rd505, 1.0; 2026-02-21T10:19:38.0767940Z // end inline asm 2026-02-21T10:19:38.0767998Z // begin inline asm 2026-02-21T10:19:38.0768058Z mov.u32 %r22380, 0x0; 2026-02-21T10:19:38.0768113Z mov.u32 %r22381, 0x0; 2026-02-21T10:19:38.0768170Z mov.u32 %r22382, 0x0; 2026-02-21T10:19:38.0768227Z mov.u32 %r22383, 0x0; 2026-02-21T10:19:38.0768446Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r22380, %r22381, %r22382, %r22383 }, [ %rd506 + 0 ], %rd505; 2026-02-21T10:19:38.0768502Z // end inline asm 2026-02-21T10:19:38.0768562Z // begin inline asm 2026-02-21T10:19:38.0768624Z mov.u64 %rd508, 0x0; 2026-02-21T10:19:38.0768754Z createpolicy.fractional.L2::evict_first.b64 %rd508, 1.0; 2026-02-21T10:19:38.0768814Z // end inline asm 2026-02-21T10:19:38.0768877Z // begin inline asm 2026-02-21T10:19:38.0768946Z mov.u32 %r22384, 0x0; 2026-02-21T10:19:38.0769006Z mov.u32 %r22385, 0x0; 2026-02-21T10:19:38.0769065Z mov.u32 %r22386, 0x0; 2026-02-21T10:19:38.0769125Z mov.u32 %r22387, 0x0; 2026-02-21T10:19:38.0769355Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r22384, %r22385, %r22386, %r22387 }, [ %rd509 + 0 ], %rd508; 2026-02-21T10:19:38.0769485Z // end inline asm 2026-02-21T10:19:38.0769549Z // begin inline asm 2026-02-21T10:19:38.0769608Z mov.u64 %rd511, 0x0; 2026-02-21T10:19:38.0769732Z createpolicy.fractional.L2::evict_first.b64 %rd511, 1.0; 2026-02-21T10:19:38.0769789Z // end inline asm 2026-02-21T10:19:38.0769847Z // begin inline asm 
2026-02-21T10:19:38.0769914Z mov.u32 %r22388, 0x0; 2026-02-21T10:19:38.0769974Z mov.u32 %r22389, 0x0; 2026-02-21T10:19:38.0770094Z mov.u32 %r22390, 0x0; 2026-02-21T10:19:38.0770151Z mov.u32 %r22391, 0x0; 2026-02-21T10:19:38.0770378Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r22388, %r22389, %r22390, %r22391 }, [ %rd512 + 0 ], %rd511; 2026-02-21T10:19:38.0770437Z // end inline asm 2026-02-21T10:19:38.0770494Z // begin inline asm 2026-02-21T10:19:38.0770551Z mov.u64 %rd514, 0x0; 2026-02-21T10:19:38.0770674Z createpolicy.fractional.L2::evict_first.b64 %rd514, 1.0; 2026-02-21T10:19:38.0770729Z // end inline asm 2026-02-21T10:19:38.0770787Z // begin inline asm 2026-02-21T10:19:38.0770846Z mov.u32 %r22392, 0x0; 2026-02-21T10:19:38.0770903Z mov.u32 %r22393, 0x0; 2026-02-21T10:19:38.0770960Z mov.u32 %r22394, 0x0; 2026-02-21T10:19:38.0771020Z mov.u32 %r22395, 0x0; 2026-02-21T10:19:38.0771247Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r22392, %r22393, %r22394, %r22395 }, [ %rd515 + 0 ], %rd514; 2026-02-21T10:19:38.0771314Z // end inline asm 2026-02-21T10:19:38.0771376Z // begin inline asm 2026-02-21T10:19:38.0771442Z mov.u64 %rd517, 0x0; 2026-02-21T10:19:38.0771629Z createpolicy.fractional.L2::evict_first.b64 %rd517, 1.0; 2026-02-21T10:19:38.0771688Z // end inline asm 2026-02-21T10:19:38.0771749Z // begin inline asm 2026-02-21T10:19:38.0771805Z mov.u32 %r22396, 0x0; 2026-02-21T10:19:38.0771873Z mov.u32 %r22397, 0x0; 2026-02-21T10:19:38.0771932Z mov.u32 %r22398, 0x0; 2026-02-21T10:19:38.0771994Z mov.u32 %r22399, 0x0; 2026-02-21T10:19:38.0772218Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r22396, %r22397, %r22398, %r22399 }, [ %rd518 + 0 ], %rd517; 2026-02-21T10:19:38.0772277Z // end inline asm 2026-02-21T10:19:38.0772336Z // begin inline asm 2026-02-21T10:19:38.0772394Z mov.u64 %rd520, 0x0; 2026-02-21T10:19:38.0772508Z createpolicy.fractional.L2::evict_first.b64 %rd520, 1.0; 2026-02-21T10:19:38.0772563Z // end inline asm 2026-02-21T10:19:38.0772624Z 
// begin inline asm 2026-02-21T10:19:38.0772681Z mov.u32 %r22400, 0x0; 2026-02-21T10:19:38.0772737Z mov.u32 %r22401, 0x0; 2026-02-21T10:19:38.0772812Z mov.u32 %r22402, 0x0; 2026-02-21T10:19:38.0772873Z mov.u32 %r22403, 0x0; 2026-02-21T10:19:38.0773096Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r22400, %r22401, %r22402, %r22403 }, [ %rd521 + 0 ], %rd520; 2026-02-21T10:19:38.0773153Z // end inline asm 2026-02-21T10:19:38.0773357Z .loc 1 59 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:59:32 2026-02-21T10:19:38.0773469Z bar.sync 0; 2026-02-21T10:19:38.0773553Z st.shared.v2.b32 [%r9], {%r22372, %r22373}; 2026-02-21T10:19:38.0773647Z st.shared.v2.b32 [%r9+2048], {%r22376, %r22377}; 2026-02-21T10:19:38.0773735Z st.shared.v2.b32 [%r9+4096], {%r22380, %r22381}; 2026-02-21T10:19:38.0773817Z st.shared.v2.b32 [%r9+6144], {%r22384, %r22385}; 2026-02-21T10:19:38.0773902Z st.shared.v2.b32 [%r9+8192], {%r22388, %r22389}; 2026-02-21T10:19:38.0773991Z st.shared.v2.b32 [%r9+10240], {%r22392, %r22393}; 2026-02-21T10:19:38.0774075Z st.shared.v2.b32 [%r9+12288], {%r22396, %r22397}; 2026-02-21T10:19:38.0774165Z st.shared.v2.b32 [%r9+14336], {%r22400, %r22401}; 2026-02-21T10:19:38.0774245Z st.shared.v2.b32 [%r10], {%r22374, %r22375}; 2026-02-21T10:19:38.0774328Z st.shared.v2.b32 [%r10+2048], {%r22378, %r22379}; 2026-02-21T10:19:38.0774410Z st.shared.v2.b32 [%r10+4096], {%r22382, %r22383}; 2026-02-21T10:19:38.0774495Z st.shared.v2.b32 [%r10+6144], {%r22386, %r22387}; 2026-02-21T10:19:38.0774581Z st.shared.v2.b32 [%r10+8192], {%r22390, %r22391}; 2026-02-21T10:19:38.0774673Z st.shared.v2.b32 [%r10+10240], {%r22394, %r22395}; 2026-02-21T10:19:38.0774819Z st.shared.v2.b32 [%r10+12288], {%r22398, %r22399}; 2026-02-21T10:19:38.0774906Z st.shared.v2.b32 [%r10+14336], {%r22402, %r22403}; 2026-02-21T10:19:38.0774960Z bar.sync 0; 2026-02-21T10:19:38.0775033Z ld.shared.b16 %rs1793, [%r51]; 2026-02-21T10:19:38.0775115Z ld.shared.b16 %rs1794, [%r51+1024]; 
2026-02-21T10:19:38.0775185Z ld.shared.b16 %rs1795, [%r51+64]; 2026-02-21T10:19:38.0775253Z ld.shared.b16 %rs1796, [%r51+1088]; 2026-02-21T10:19:38.0775367Z ld.shared.b16 %rs1797, [%r51+8192]; 2026-02-21T10:19:38.0775434Z ld.shared.b16 %rs1798, [%r51+9216]; 2026-02-21T10:19:38.0775499Z ld.shared.b16 %rs1799, [%r51+8256]; 2026-02-21T10:19:38.0775563Z ld.shared.b16 %rs1800, [%r51+9280]; 2026-02-21T10:19:38.0775630Z ld.shared.b16 %rs1801, [%r52]; 2026-02-21T10:19:38.0775695Z ld.shared.b16 %rs1802, [%r52+1024]; 2026-02-21T10:19:38.0775761Z ld.shared.b16 %rs1803, [%r52+64]; 2026-02-21T10:19:38.0775827Z ld.shared.b16 %rs1804, [%r52+1088]; 2026-02-21T10:19:38.0775892Z ld.shared.b16 %rs1805, [%r52+8192]; 2026-02-21T10:19:38.0775957Z ld.shared.b16 %rs1806, [%r52+9216]; 2026-02-21T10:19:38.0776024Z ld.shared.b16 %rs1807, [%r52+8256]; 2026-02-21T10:19:38.0776087Z ld.shared.b16 %rs1808, [%r52+9280]; 2026-02-21T10:19:38.0776151Z ld.shared.b16 %rs1809, [%r53]; 2026-02-21T10:19:38.0776217Z ld.shared.b16 %rs1810, [%r53+1024]; 2026-02-21T10:19:38.0776281Z ld.shared.b16 %rs1811, [%r53+64]; 2026-02-21T10:19:38.0776344Z ld.shared.b16 %rs1812, [%r53+1088]; 2026-02-21T10:19:38.0776623Z ld.shared.b16 %rs1813, [%r53+8192]; 2026-02-21T10:19:38.0776707Z ld.shared.b16 %rs1814, [%r53+9216]; 2026-02-21T10:19:38.0776773Z ld.shared.b16 %rs1815, [%r53+8256]; 2026-02-21T10:19:38.0776838Z ld.shared.b16 %rs1816, [%r53+9280]; 2026-02-21T10:19:38.0776906Z ld.shared.b16 %rs1817, [%r54]; 2026-02-21T10:19:38.0776973Z ld.shared.b16 %rs1818, [%r54+1024]; 2026-02-21T10:19:38.0777049Z ld.shared.b16 %rs1819, [%r54+64]; 2026-02-21T10:19:38.0777117Z ld.shared.b16 %rs1820, [%r54+1088]; 2026-02-21T10:19:38.0777185Z ld.shared.b16 %rs1821, [%r54+8192]; 2026-02-21T10:19:38.0777249Z ld.shared.b16 %rs1822, [%r54+9216]; 2026-02-21T10:19:38.0777313Z ld.shared.b16 %rs1823, [%r54+8256]; 2026-02-21T10:19:38.0777380Z ld.shared.b16 %rs1824, [%r54+9280]; 2026-02-21T10:19:38.0777444Z ld.shared.b16 %rs1825, [%r55]; 
2026-02-21T10:19:38.0777507Z ld.shared.b16 %rs1826, [%r55+1024]; 2026-02-21T10:19:38.0777572Z ld.shared.b16 %rs1827, [%r55+64]; 2026-02-21T10:19:38.0777639Z ld.shared.b16 %rs1828, [%r55+1088]; 2026-02-21T10:19:38.0777706Z ld.shared.b16 %rs1829, [%r55+8192]; 2026-02-21T10:19:38.0777770Z ld.shared.b16 %rs1830, [%r55+9216]; 2026-02-21T10:19:38.0777837Z ld.shared.b16 %rs1831, [%r55+8256]; 2026-02-21T10:19:38.0777899Z ld.shared.b16 %rs1832, [%r55+9280]; 2026-02-21T10:19:38.0777963Z ld.shared.b16 %rs1833, [%r56]; 2026-02-21T10:19:38.0778111Z ld.shared.b16 %rs1834, [%r56+1024]; 2026-02-21T10:19:38.0778175Z ld.shared.b16 %rs1835, [%r56+64]; 2026-02-21T10:19:38.0778252Z ld.shared.b16 %rs1836, [%r56+1088]; 2026-02-21T10:19:38.0778318Z ld.shared.b16 %rs1837, [%r56+8192]; 2026-02-21T10:19:38.0778384Z ld.shared.b16 %rs1838, [%r56+9216]; 2026-02-21T10:19:38.0778449Z ld.shared.b16 %rs1839, [%r56+8256]; 2026-02-21T10:19:38.0778514Z ld.shared.b16 %rs1840, [%r56+9280]; 2026-02-21T10:19:38.0778582Z ld.shared.b16 %rs1841, [%r57]; 2026-02-21T10:19:38.0778645Z ld.shared.b16 %rs1842, [%r57+1024]; 2026-02-21T10:19:38.0778709Z ld.shared.b16 %rs1843, [%r57+64]; 2026-02-21T10:19:38.0778779Z ld.shared.b16 %rs1844, [%r57+1088]; 2026-02-21T10:19:38.0778843Z ld.shared.b16 %rs1845, [%r57+8192]; 2026-02-21T10:19:38.0778908Z ld.shared.b16 %rs1846, [%r57+9216]; 2026-02-21T10:19:38.0778971Z ld.shared.b16 %rs1847, [%r57+8256]; 2026-02-21T10:19:38.0779037Z ld.shared.b16 %rs1848, [%r57+9280]; 2026-02-21T10:19:38.0779102Z ld.shared.b16 %rs1849, [%r58]; 2026-02-21T10:19:38.0779165Z ld.shared.b16 %rs1850, [%r58+1024]; 2026-02-21T10:19:38.0779229Z ld.shared.b16 %rs1851, [%r58+64]; 2026-02-21T10:19:38.0779364Z ld.shared.b16 %rs1852, [%r58+1088]; 2026-02-21T10:19:38.0779432Z ld.shared.b16 %rs1853, [%r58+8192]; 2026-02-21T10:19:38.0779496Z ld.shared.b16 %rs1854, [%r58+9216]; 2026-02-21T10:19:38.0779563Z ld.shared.b16 %rs1855, [%r58+8256]; 2026-02-21T10:19:38.0779639Z ld.shared.b16 %rs1856, [%r58+9280]; 
2026-02-21T10:19:38.0779704Z cvt.f32.bf16 %r22541, %rs1793; 2026-02-21T10:19:38.0779771Z cvt.f32.bf16 %r22542, %rs1794; 2026-02-21T10:19:38.0779897Z cvt.f32.bf16 %r22543, %rs1801; 2026-02-21T10:19:38.0779960Z cvt.f32.bf16 %r22544, %rs1802; 2026-02-21T10:19:38.0780022Z cvt.f32.bf16 %r22673, %rs1809; 2026-02-21T10:19:38.0780083Z cvt.f32.bf16 %r22674, %rs1810; 2026-02-21T10:19:38.0780143Z cvt.f32.bf16 %r22675, %rs1817; 2026-02-21T10:19:38.0780201Z cvt.f32.bf16 %r22676, %rs1818; 2026-02-21T10:19:38.0780267Z cvt.f32.bf16 %r22805, %rs1825; 2026-02-21T10:19:38.0780327Z cvt.f32.bf16 %r22806, %rs1826; 2026-02-21T10:19:38.0780388Z cvt.f32.bf16 %r22807, %rs1833; 2026-02-21T10:19:38.0780455Z cvt.f32.bf16 %r22808, %rs1834; 2026-02-21T10:19:38.0780515Z cvt.f32.bf16 %r22937, %rs1841; 2026-02-21T10:19:38.0780574Z cvt.f32.bf16 %r22938, %rs1842; 2026-02-21T10:19:38.0780633Z cvt.f32.bf16 %r22939, %rs1849; 2026-02-21T10:19:38.0780694Z cvt.f32.bf16 %r22940, %rs1850; 2026-02-21T10:19:38.0780764Z cvt.f32.bf16 %r23069, %rs1795; 2026-02-21T10:19:38.0780825Z cvt.f32.bf16 %r23070, %rs1796; 2026-02-21T10:19:38.0780886Z cvt.f32.bf16 %r23071, %rs1803; 2026-02-21T10:19:38.0781013Z cvt.f32.bf16 %r23072, %rs1804; 2026-02-21T10:19:38.0781075Z cvt.f32.bf16 %r23201, %rs1811; 2026-02-21T10:19:38.0781136Z cvt.f32.bf16 %r23202, %rs1812; 2026-02-21T10:19:38.0781198Z cvt.f32.bf16 %r23203, %rs1819; 2026-02-21T10:19:38.0781258Z cvt.f32.bf16 %r23204, %rs1820; 2026-02-21T10:19:38.0781317Z cvt.f32.bf16 %r23333, %rs1827; 2026-02-21T10:19:38.0781383Z cvt.f32.bf16 %r23334, %rs1828; 2026-02-21T10:19:38.0781443Z cvt.f32.bf16 %r23335, %rs1835; 2026-02-21T10:19:38.0781504Z cvt.f32.bf16 %r23336, %rs1836; 2026-02-21T10:19:38.0781565Z cvt.f32.bf16 %r23465, %rs1843; 2026-02-21T10:19:38.0781627Z cvt.f32.bf16 %r23466, %rs1844; 2026-02-21T10:19:38.0781686Z cvt.f32.bf16 %r23467, %rs1851; 2026-02-21T10:19:38.0781746Z cvt.f32.bf16 %r23468, %rs1852; 2026-02-21T10:19:38.0781809Z cvt.f32.bf16 %r23597, %rs1797; 
2026-02-21T10:19:38.0781870Z cvt.f32.bf16 %r23598, %rs1798; 2026-02-21T10:19:38.0781930Z cvt.f32.bf16 %r23599, %rs1805; 2026-02-21T10:19:38.0781993Z cvt.f32.bf16 %r23600, %rs1806; 2026-02-21T10:19:38.0782053Z cvt.f32.bf16 %r23729, %rs1813; 2026-02-21T10:19:38.0782113Z cvt.f32.bf16 %r23730, %rs1814; 2026-02-21T10:19:38.0782173Z cvt.f32.bf16 %r23731, %rs1821; 2026-02-21T10:19:38.0782235Z cvt.f32.bf16 %r23732, %rs1822; 2026-02-21T10:19:38.0782294Z cvt.f32.bf16 %r23861, %rs1829; 2026-02-21T10:19:38.0782420Z cvt.f32.bf16 %r23862, %rs1830; 2026-02-21T10:19:38.0782484Z cvt.f32.bf16 %r23863, %rs1837; 2026-02-21T10:19:38.0782545Z cvt.f32.bf16 %r23864, %rs1838; 2026-02-21T10:19:38.0782605Z cvt.f32.bf16 %r23993, %rs1845; 2026-02-21T10:19:38.0782665Z cvt.f32.bf16 %r23994, %rs1846; 2026-02-21T10:19:38.0782728Z cvt.f32.bf16 %r23995, %rs1853; 2026-02-21T10:19:38.0782789Z cvt.f32.bf16 %r23996, %rs1854; 2026-02-21T10:19:38.0782848Z cvt.f32.bf16 %r24125, %rs1799; 2026-02-21T10:19:38.0782909Z cvt.f32.bf16 %r24126, %rs1800; 2026-02-21T10:19:38.0782969Z cvt.f32.bf16 %r24127, %rs1807; 2026-02-21T10:19:38.0783028Z cvt.f32.bf16 %r24128, %rs1808; 2026-02-21T10:19:38.0783092Z cvt.f32.bf16 %r24257, %rs1815; 2026-02-21T10:19:38.0783155Z cvt.f32.bf16 %r24258, %rs1816; 2026-02-21T10:19:38.0783214Z cvt.f32.bf16 %r24259, %rs1823; 2026-02-21T10:19:38.0783272Z cvt.f32.bf16 %r24260, %rs1824; 2026-02-21T10:19:38.0783334Z cvt.f32.bf16 %r24389, %rs1831; 2026-02-21T10:19:38.0783394Z cvt.f32.bf16 %r24390, %rs1832; 2026-02-21T10:19:38.0783454Z cvt.f32.bf16 %r24391, %rs1839; 2026-02-21T10:19:38.0783515Z cvt.f32.bf16 %r24392, %rs1840; 2026-02-21T10:19:38.0783635Z cvt.f32.bf16 %r24521, %rs1847; 2026-02-21T10:19:38.0783700Z cvt.f32.bf16 %r24522, %rs1848; 2026-02-21T10:19:38.0783759Z cvt.f32.bf16 %r24523, %rs1855; 2026-02-21T10:19:38.0783822Z cvt.f32.bf16 %r24524, %rs1856; 2026-02-21T10:19:38.0784034Z .loc 1 61 33 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:61:33 2026-02-21T10:19:38.0784091Z 
bar.sync 0; 2026-02-21T10:19:38.0784151Z // begin inline asm 2026-02-21T10:19:38.0784254Z @%p313 mbarrier.init.shared::cta.b64 [%r29846], 1; 2026-02-21T10:19:38.0784365Z // end inline asm 2026-02-21T10:19:38.0784419Z bar.sync 0; 2026-02-21T10:19:38.0784480Z // begin inline asm 2026-02-21T10:19:38.0784624Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r29846], 4096; 2026-02-21T10:19:38.0784681Z // end inline asm 2026-02-21T10:19:38.0784740Z // begin inline asm 2026-02-21T10:19:38.0784820Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.0784875Z // end inline asm 2026-02-21T10:19:38.0784929Z bar.sync 0; 2026-02-21T10:19:38.0785003Z elect.sync %r29617|%p281, -1; 2026-02-21T10:19:38.0785072Z and.pred %p222, %p1, %p281; 2026-02-21T10:19:38.0785135Z add.s64 %rd79, %rd849, 96; 2026-02-21T10:19:38.0785199Z cvt.u32.u64 %r22408, %rd79; 2026-02-21T10:19:38.0785258Z // begin inline asm 2026-02-21T10:19:38.0785599Z @%p222 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39931], [%rd822, {%r29849, %r22408}], [%r29846]; 2026-02-21T10:19:38.0785660Z // end inline asm 2026-02-21T10:19:38.0785716Z bar.sync 0; 2026-02-21T10:19:38.0785848Z mov.b32 %r29485, 0; 2026-02-21T10:19:38.0785908Z // begin inline asm 2026-02-21T10:19:38.0785963Z 2026-02-21T10:19:38.0786012Z { 2026-02-21T10:19:38.0786074Z .reg .pred complete; 2026-02-21T10:19:38.0786131Z waitLoop: 2026-02-21T10:19:38.0786279Z mbarrier.try_wait.parity.shared.b64 complete, [%r29846], %r29485; 2026-02-21T10:19:38.0786349Z @!complete bra.uni waitLoop; 2026-02-21T10:19:38.0786399Z } 2026-02-21T10:19:38.0786405Z 2026-02-21T10:19:38.0786613Z // end inline asm 2026-02-21T10:19:38.0786676Z bar.sync 0; 2026-02-21T10:19:38.0786736Z // begin inline asm 2026-02-21T10:19:38.0786836Z @%p313 mbarrier.inval.shared::cta.b64 [%r29846]; 2026-02-21T10:19:38.0786891Z // end inline asm 2026-02-21T10:19:38.0787107Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 
2026-02-21T10:19:38.0787179Z ld.shared.s8 %rs1857, [%r19]; 2026-02-21T10:19:38.0787375Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0787444Z shl.b16 %rs1858, %rs1857, 4; 2026-02-21T10:19:38.0787635Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0787706Z ld.shared.s8 %rs1859, [%r20+128]; 2026-02-21T10:19:38.0787994Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0788057Z shl.b16 %rs1860, %rs1859, 4; 2026-02-21T10:19:38.0788250Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0788317Z ld.shared.s8 %rs1861, [%r21+256]; 2026-02-21T10:19:38.0788585Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0788650Z shl.b16 %rs1862, %rs1861, 4; 2026-02-21T10:19:38.0788838Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0788907Z ld.shared.s8 %rs1863, [%r22+384]; 2026-02-21T10:19:38.0789099Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0789161Z shl.b16 %rs1864, %rs1863, 4; 2026-02-21T10:19:38.0789349Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0789415Z ld.shared.s8 %rs1865, [%r23+512]; 2026-02-21T10:19:38.0789678Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0789743Z shl.b16 %rs1866, %rs1865, 4; 2026-02-21T10:19:38.0789931Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0789996Z ld.shared.s8 %rs1867, [%r24+640]; 2026-02-21T10:19:38.0790184Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0790317Z shl.b16 %rs1868, %rs1867, 4; 2026-02-21T10:19:38.0790515Z .loc 1 66 25 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0790578Z ld.shared.s8 %rs1869, [%r25+768]; 2026-02-21T10:19:38.0790765Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0790830Z shl.b16 %rs1870, %rs1869, 4; 2026-02-21T10:19:38.0791019Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0791084Z ld.shared.s8 %rs1871, [%r26+896]; 2026-02-21T10:19:38.0791274Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0791335Z shl.b16 %rs1872, %rs1871, 4; 2026-02-21T10:19:38.0791522Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0791589Z ld.shared.s8 %rs1873, [%r19+1024]; 2026-02-21T10:19:38.0791854Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0791920Z shl.b16 %rs1874, %rs1873, 4; 2026-02-21T10:19:38.0792116Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0792188Z ld.shared.s8 %rs1875, [%r20+1152]; 2026-02-21T10:19:38.0792386Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0792452Z shl.b16 %rs1876, %rs1875, 4; 2026-02-21T10:19:38.0792647Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0792713Z ld.shared.s8 %rs1877, [%r21+1280]; 2026-02-21T10:19:38.0792904Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0792969Z shl.b16 %rs1878, %rs1877, 4; 2026-02-21T10:19:38.0793159Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0793225Z ld.shared.s8 %rs1879, [%r22+1408]; 2026-02-21T10:19:38.0793415Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0793480Z shl.b16 %rs1880, %rs1879, 4; 2026-02-21T10:19:38.0793724Z 
.loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0793787Z ld.shared.s8 %rs1881, [%r23+1536]; 2026-02-21T10:19:38.0793981Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0794042Z shl.b16 %rs1882, %rs1881, 4; 2026-02-21T10:19:38.0794231Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0794296Z ld.shared.s8 %rs1883, [%r24+1664]; 2026-02-21T10:19:38.0794484Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0794549Z shl.b16 %rs1884, %rs1883, 4; 2026-02-21T10:19:38.0794741Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0794806Z ld.shared.s8 %rs1885, [%r25+1792]; 2026-02-21T10:19:38.0794994Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0795058Z shl.b16 %rs1886, %rs1885, 4; 2026-02-21T10:19:38.0795303Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0795368Z ld.shared.s8 %rs1887, [%r26+1920]; 2026-02-21T10:19:38.0795557Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0795621Z shl.b16 %rs1888, %rs1887, 4; 2026-02-21T10:19:38.0795809Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0795918Z ld.shared.s8 %rs1889, [%r19+2048]; 2026-02-21T10:19:38.0796114Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0796174Z shl.b16 %rs1890, %rs1889, 4; 2026-02-21T10:19:38.0796365Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0796437Z ld.shared.s8 %rs1891, [%r20+2176]; 2026-02-21T10:19:38.0796760Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0796826Z shl.b16 %rs1892, %rs1891, 4; 
2026-02-21T10:19:38.0797016Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0797082Z ld.shared.s8 %rs1893, [%r21+2304]; 2026-02-21T10:19:38.0797282Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0797347Z shl.b16 %rs1894, %rs1893, 4; 2026-02-21T10:19:38.0797615Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0797682Z ld.shared.s8 %rs1895, [%r22+2432]; 2026-02-21T10:19:38.0797872Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0797937Z shl.b16 %rs1896, %rs1895, 4; 2026-02-21T10:19:38.0798137Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0798205Z ld.shared.s8 %rs1897, [%r23+2560]; 2026-02-21T10:19:38.0798398Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0798459Z shl.b16 %rs1898, %rs1897, 4; 2026-02-21T10:19:38.0798648Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0798713Z ld.shared.s8 %rs1899, [%r24+2688]; 2026-02-21T10:19:38.0798904Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0798966Z shl.b16 %rs1900, %rs1899, 4; 2026-02-21T10:19:38.0799154Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0799221Z ld.shared.s8 %rs1901, [%r25+2816]; 2026-02-21T10:19:38.0799479Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0799541Z shl.b16 %rs1902, %rs1901, 4; 2026-02-21T10:19:38.0799732Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0799795Z ld.shared.s8 %rs1903, [%r26+2944]; 2026-02-21T10:19:38.0799981Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0800044Z shl.b16 
%rs1904, %rs1903, 4; 2026-02-21T10:19:38.0800232Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0800299Z ld.shared.s8 %rs1905, [%r19+3072]; 2026-02-21T10:19:38.0800489Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0800551Z shl.b16 %rs1906, %rs1905, 4; 2026-02-21T10:19:38.0800739Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0800805Z ld.shared.s8 %rs1907, [%r20+3200]; 2026-02-21T10:19:38.0801069Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0801135Z shl.b16 %rs1908, %rs1907, 4; 2026-02-21T10:19:38.0801324Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0801390Z ld.shared.s8 %rs1909, [%r21+3328]; 2026-02-21T10:19:38.0801576Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0801696Z shl.b16 %rs1910, %rs1909, 4; 2026-02-21T10:19:38.0801890Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0801952Z ld.shared.s8 %rs1911, [%r22+3456]; 2026-02-21T10:19:38.0802142Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0802208Z shl.b16 %rs1912, %rs1911, 4; 2026-02-21T10:19:38.0802395Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0802461Z ld.shared.s8 %rs1913, [%r23+3584]; 2026-02-21T10:19:38.0802650Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0802713Z shl.b16 %rs1914, %rs1913, 4; 2026-02-21T10:19:38.0802903Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0802966Z ld.shared.s8 %rs1915, [%r24+3712]; 2026-02-21T10:19:38.0803216Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 
2026-02-21T10:19:38.0803281Z shl.b16 %rs1916, %rs1915, 4; 2026-02-21T10:19:38.0803469Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0803535Z ld.shared.s8 %rs1917, [%r25+3840]; 2026-02-21T10:19:38.0803723Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0803786Z shl.b16 %rs1918, %rs1917, 4; 2026-02-21T10:19:38.0803978Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0804042Z ld.shared.s8 %rs1919, [%r26+3968]; 2026-02-21T10:19:38.0804235Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0804299Z shl.b16 %rs1920, %rs1919, 4; 2026-02-21T10:19:38.0804497Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0804561Z cvt.s16.s8 %rs1921, %rs1858; 2026-02-21T10:19:38.0804624Z shr.s16 %rs1922, %rs1921, 4; 2026-02-21T10:19:38.0804689Z cvt.s16.s8 %rs1923, %rs1860; 2026-02-21T10:19:38.0804749Z shr.s16 %rs1924, %rs1923, 4; 2026-02-21T10:19:38.0804808Z shr.s16 %rs1925, %rs1857, 4; 2026-02-21T10:19:38.0804935Z shr.s16 %rs1926, %rs1859, 4; 2026-02-21T10:19:38.0805141Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0805211Z cvt.rn.f32.s16 %r29618, %rs1926; 2026-02-21T10:19:38.0805275Z cvt.rn.f32.s16 %r29619, %rs1925; 2026-02-21T10:19:38.0805339Z cvt.rn.f32.s16 %r29620, %rs1924; 2026-02-21T10:19:38.0805400Z cvt.rn.f32.s16 %r29621, %rs1922; 2026-02-21T10:19:38.0805596Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0805661Z cvt.s16.s8 %rs1927, %rs1862; 2026-02-21T10:19:38.0805725Z shr.s16 %rs1928, %rs1927, 4; 2026-02-21T10:19:38.0805786Z cvt.s16.s8 %rs1929, %rs1864; 2026-02-21T10:19:38.0805850Z shr.s16 %rs1930, %rs1929, 4; 2026-02-21T10:19:38.0805912Z shr.s16 %rs1931, %rs1861, 4; 2026-02-21T10:19:38.0805972Z shr.s16 %rs1932, %rs1863, 4; 
2026-02-21T10:19:38.0806164Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0806231Z cvt.rn.f32.s16 %r29622, %rs1932; 2026-02-21T10:19:38.0806293Z cvt.rn.f32.s16 %r29623, %rs1931; 2026-02-21T10:19:38.0806413Z cvt.rn.f32.s16 %r29624, %rs1930; 2026-02-21T10:19:38.0806596Z cvt.rn.f32.s16 %r29625, %rs1928; 2026-02-21T10:19:38.0806794Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0806856Z cvt.s16.s8 %rs1933, %rs1866; 2026-02-21T10:19:38.0806919Z shr.s16 %rs1934, %rs1933, 4; 2026-02-21T10:19:38.0806979Z cvt.s16.s8 %rs1935, %rs1868; 2026-02-21T10:19:38.0807045Z shr.s16 %rs1936, %rs1935, 4; 2026-02-21T10:19:38.0807209Z shr.s16 %rs1937, %rs1865, 4; 2026-02-21T10:19:38.0807272Z shr.s16 %rs1938, %rs1867, 4; 2026-02-21T10:19:38.0807468Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0807534Z cvt.rn.f32.s16 %r29626, %rs1938; 2026-02-21T10:19:38.0807600Z cvt.rn.f32.s16 %r29627, %rs1937; 2026-02-21T10:19:38.0807664Z cvt.rn.f32.s16 %r29628, %rs1936; 2026-02-21T10:19:38.0807723Z cvt.rn.f32.s16 %r29629, %rs1934; 2026-02-21T10:19:38.0807918Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0807983Z cvt.s16.s8 %rs1939, %rs1870; 2026-02-21T10:19:38.0808044Z shr.s16 %rs1940, %rs1939, 4; 2026-02-21T10:19:38.0808105Z cvt.s16.s8 %rs1941, %rs1872; 2026-02-21T10:19:38.0808166Z shr.s16 %rs1942, %rs1941, 4; 2026-02-21T10:19:38.0808226Z shr.s16 %rs1943, %rs1869, 4; 2026-02-21T10:19:38.0808289Z shr.s16 %rs1944, %rs1871, 4; 2026-02-21T10:19:38.0808553Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0808618Z cvt.rn.f32.s16 %r29630, %rs1944; 2026-02-21T10:19:38.0808680Z cvt.rn.f32.s16 %r29631, %rs1943; 2026-02-21T10:19:38.0808743Z cvt.rn.f32.s16 %r29632, %rs1942; 2026-02-21T10:19:38.0808821Z cvt.rn.f32.s16 %r29633, %rs1940; 
2026-02-21T10:19:38.0809019Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0809082Z cvt.s16.s8 %rs1945, %rs1874; 2026-02-21T10:19:38.0809147Z shr.s16 %rs1946, %rs1945, 4; 2026-02-21T10:19:38.0809206Z cvt.s16.s8 %rs1947, %rs1876; 2026-02-21T10:19:38.0809265Z shr.s16 %rs1948, %rs1947, 4; 2026-02-21T10:19:38.0809326Z shr.s16 %rs1949, %rs1873, 4; 2026-02-21T10:19:38.0809385Z shr.s16 %rs1950, %rs1875, 4; 2026-02-21T10:19:38.0809575Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0809639Z cvt.rn.f32.s16 %r29634, %rs1950; 2026-02-21T10:19:38.0809709Z cvt.rn.f32.s16 %r29635, %rs1949; 2026-02-21T10:19:38.0809774Z cvt.rn.f32.s16 %r29636, %rs1948; 2026-02-21T10:19:38.0809835Z cvt.rn.f32.s16 %r29637, %rs1946; 2026-02-21T10:19:38.0810028Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0810159Z cvt.s16.s8 %rs1951, %rs1878; 2026-02-21T10:19:38.0810219Z shr.s16 %rs1952, %rs1951, 4; 2026-02-21T10:19:38.0810280Z cvt.s16.s8 %rs1953, %rs1880; 2026-02-21T10:19:38.0810356Z shr.s16 %rs1954, %rs1953, 4; 2026-02-21T10:19:38.0810418Z shr.s16 %rs1955, %rs1877, 4; 2026-02-21T10:19:38.0810478Z shr.s16 %rs1956, %rs1879, 4; 2026-02-21T10:19:38.0810671Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0810733Z cvt.rn.f32.s16 %r29638, %rs1956; 2026-02-21T10:19:38.0810794Z cvt.rn.f32.s16 %r29639, %rs1955; 2026-02-21T10:19:38.0810860Z cvt.rn.f32.s16 %r29640, %rs1954; 2026-02-21T10:19:38.0810923Z cvt.rn.f32.s16 %r29641, %rs1952; 2026-02-21T10:19:38.0811113Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0811174Z cvt.s16.s8 %rs1957, %rs1882; 2026-02-21T10:19:38.0811236Z shr.s16 %rs1958, %rs1957, 4; 2026-02-21T10:19:38.0811299Z cvt.s16.s8 %rs1959, %rs1884; 2026-02-21T10:19:38.0811358Z shr.s16 %rs1960, %rs1959, 4; 2026-02-21T10:19:38.0811421Z shr.s16 
%rs1961, %rs1881, 4; 2026-02-21T10:19:38.0811549Z shr.s16 %rs1962, %rs1883, 4; 2026-02-21T10:19:38.0811743Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0811808Z cvt.rn.f32.s16 %r29642, %rs1962; 2026-02-21T10:19:38.0811873Z cvt.rn.f32.s16 %r29643, %rs1961; 2026-02-21T10:19:38.0811933Z cvt.rn.f32.s16 %r29644, %rs1960; 2026-02-21T10:19:38.0811993Z cvt.rn.f32.s16 %r29645, %rs1958; 2026-02-21T10:19:38.0812186Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0812298Z cvt.s16.s8 %rs1963, %rs1886; 2026-02-21T10:19:38.0812359Z shr.s16 %rs1964, %rs1963, 4; 2026-02-21T10:19:38.0812423Z cvt.s16.s8 %rs1965, %rs1888; 2026-02-21T10:19:38.0812480Z shr.s16 %rs1966, %rs1965, 4; 2026-02-21T10:19:38.0812539Z shr.s16 %rs1967, %rs1885, 4; 2026-02-21T10:19:38.0812605Z shr.s16 %rs1968, %rs1887, 4; 2026-02-21T10:19:38.0812811Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0812875Z cvt.rn.f32.s16 %r29646, %rs1968; 2026-02-21T10:19:38.0812940Z cvt.rn.f32.s16 %r29647, %rs1967; 2026-02-21T10:19:38.0813004Z cvt.rn.f32.s16 %r29648, %rs1966; 2026-02-21T10:19:38.0813065Z cvt.rn.f32.s16 %r29649, %rs1964; 2026-02-21T10:19:38.0813254Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0813318Z cvt.s16.s8 %rs1969, %rs1890; 2026-02-21T10:19:38.0813427Z shr.s16 %rs1970, %rs1969, 4; 2026-02-21T10:19:38.0813489Z cvt.s16.s8 %rs1971, %rs1892; 2026-02-21T10:19:38.0813549Z shr.s16 %rs1972, %rs1971, 4; 2026-02-21T10:19:38.0813610Z shr.s16 %rs1973, %rs1889, 4; 2026-02-21T10:19:38.0813670Z shr.s16 %rs1974, %rs1891, 4; 2026-02-21T10:19:38.0813858Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0813926Z cvt.rn.f32.s16 %r29650, %rs1974; 2026-02-21T10:19:38.0813987Z cvt.rn.f32.s16 %r29651, %rs1973; 2026-02-21T10:19:38.0814050Z cvt.rn.f32.s16 %r29652, %rs1972; 
2026-02-21T10:19:38.0814116Z cvt.rn.f32.s16 %r29653, %rs1970; 2026-02-21T10:19:38.0814310Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0814371Z cvt.s16.s8 %rs1975, %rs1894; 2026-02-21T10:19:38.0814432Z shr.s16 %rs1976, %rs1975, 4; 2026-02-21T10:19:38.0814495Z cvt.s16.s8 %rs1977, %rs1896; 2026-02-21T10:19:38.0814555Z shr.s16 %rs1978, %rs1977, 4; 2026-02-21T10:19:38.0814617Z shr.s16 %rs1979, %rs1893, 4; 2026-02-21T10:19:38.0814679Z shr.s16 %rs1980, %rs1895, 4; 2026-02-21T10:19:38.0814867Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0814940Z cvt.rn.f32.s16 %r29654, %rs1980; 2026-02-21T10:19:38.0815062Z cvt.rn.f32.s16 %r29655, %rs1979; 2026-02-21T10:19:38.0815124Z cvt.rn.f32.s16 %r29656, %rs1978; 2026-02-21T10:19:38.0815185Z cvt.rn.f32.s16 %r29657, %rs1976; 2026-02-21T10:19:38.0815377Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0815442Z cvt.s16.s8 %rs1981, %rs1898; 2026-02-21T10:19:38.0815502Z shr.s16 %rs1982, %rs1981, 4; 2026-02-21T10:19:38.0815562Z cvt.s16.s8 %rs1983, %rs1900; 2026-02-21T10:19:38.0815624Z shr.s16 %rs1984, %rs1983, 4; 2026-02-21T10:19:38.0815683Z shr.s16 %rs1985, %rs1897, 4; 2026-02-21T10:19:38.0815743Z shr.s16 %rs1986, %rs1899, 4; 2026-02-21T10:19:38.0815937Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0816001Z cvt.rn.f32.s16 %r29658, %rs1986; 2026-02-21T10:19:38.0816062Z cvt.rn.f32.s16 %r29659, %rs1985; 2026-02-21T10:19:38.0816123Z cvt.rn.f32.s16 %r29660, %rs1984; 2026-02-21T10:19:38.0816192Z cvt.rn.f32.s16 %r29661, %rs1982; 2026-02-21T10:19:38.0816380Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0816649Z cvt.s16.s8 %rs1987, %rs1902; 2026-02-21T10:19:38.0816737Z shr.s16 %rs1988, %rs1987, 4; 2026-02-21T10:19:38.0816800Z cvt.s16.s8 %rs1989, %rs1904; 2026-02-21T10:19:38.0816861Z shr.s16 
%rs1990, %rs1989, 4; 2026-02-21T10:19:38.0816934Z shr.s16 %rs1991, %rs1901, 4; 2026-02-21T10:19:38.0816998Z shr.s16 %rs1992, %rs1903, 4; 2026-02-21T10:19:38.0817188Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0817324Z cvt.rn.f32.s16 %r29662, %rs1992; 2026-02-21T10:19:38.0817388Z cvt.rn.f32.s16 %r29663, %rs1991; 2026-02-21T10:19:38.0817449Z cvt.rn.f32.s16 %r29664, %rs1990; 2026-02-21T10:19:38.0817510Z cvt.rn.f32.s16 %r29665, %rs1988; 2026-02-21T10:19:38.0817700Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0817763Z cvt.s16.s8 %rs1993, %rs1906; 2026-02-21T10:19:38.0817822Z shr.s16 %rs1994, %rs1993, 4; 2026-02-21T10:19:38.0817882Z cvt.s16.s8 %rs1995, %rs1908; 2026-02-21T10:19:38.0817945Z shr.s16 %rs1996, %rs1995, 4; 2026-02-21T10:19:38.0818005Z shr.s16 %rs1997, %rs1905, 4; 2026-02-21T10:19:38.0818064Z shr.s16 %rs1998, %rs1907, 4; 2026-02-21T10:19:38.0818257Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0818319Z cvt.rn.f32.s16 %r29666, %rs1998; 2026-02-21T10:19:38.0818381Z cvt.rn.f32.s16 %r29667, %rs1997; 2026-02-21T10:19:38.0818448Z cvt.rn.f32.s16 %r29668, %rs1996; 2026-02-21T10:19:38.0818575Z cvt.rn.f32.s16 %r29669, %rs1994; 2026-02-21T10:19:38.0818768Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0818831Z cvt.s16.s8 %rs1999, %rs1910; 2026-02-21T10:19:38.0818894Z shr.s16 %rs2000, %rs1999, 4; 2026-02-21T10:19:38.0818955Z cvt.s16.s8 %rs2001, %rs1912; 2026-02-21T10:19:38.0819015Z shr.s16 %rs2002, %rs2001, 4; 2026-02-21T10:19:38.0819077Z shr.s16 %rs2003, %rs1909, 4; 2026-02-21T10:19:38.0819149Z shr.s16 %rs2004, %rs1911, 4; 2026-02-21T10:19:38.0819344Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0819409Z cvt.rn.f32.s16 %r29670, %rs2004; 2026-02-21T10:19:38.0819469Z cvt.rn.f32.s16 %r29671, %rs2003; 
2026-02-21T10:19:38.0819531Z cvt.rn.f32.s16 %r29672, %rs2002; 2026-02-21T10:19:38.0819592Z cvt.rn.f32.s16 %r29673, %rs2000; 2026-02-21T10:19:38.0819790Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0819852Z cvt.s16.s8 %rs2005, %rs1914; 2026-02-21T10:19:38.0819912Z shr.s16 %rs2006, %rs2005, 4; 2026-02-21T10:19:38.0819976Z cvt.s16.s8 %rs2007, %rs1916; 2026-02-21T10:19:38.0820046Z shr.s16 %rs2008, %rs2007, 4; 2026-02-21T10:19:38.0820180Z shr.s16 %rs2009, %rs1913, 4; 2026-02-21T10:19:38.0820241Z shr.s16 %rs2010, %rs1915, 4; 2026-02-21T10:19:38.0820436Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0820499Z cvt.rn.f32.s16 %r29674, %rs2010; 2026-02-21T10:19:38.0820561Z cvt.rn.f32.s16 %r29675, %rs2009; 2026-02-21T10:19:38.0820627Z cvt.rn.f32.s16 %r29676, %rs2008; 2026-02-21T10:19:38.0820687Z cvt.rn.f32.s16 %r29677, %rs2006; 2026-02-21T10:19:38.0820876Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0820937Z cvt.s16.s8 %rs2011, %rs1918; 2026-02-21T10:19:38.0821001Z shr.s16 %rs2012, %rs2011, 4; 2026-02-21T10:19:38.0821060Z cvt.s16.s8 %rs2013, %rs1920; 2026-02-21T10:19:38.0821122Z shr.s16 %rs2014, %rs2013, 4; 2026-02-21T10:19:38.0821185Z shr.s16 %rs2015, %rs1917, 4; 2026-02-21T10:19:38.0821245Z shr.s16 %rs2016, %rs1919, 4; 2026-02-21T10:19:38.0821434Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0821499Z cvt.rn.f32.s16 %r29678, %rs2016; 2026-02-21T10:19:38.0821614Z cvt.rn.f32.s16 %r29679, %rs2015; 2026-02-21T10:19:38.0821677Z cvt.rn.f32.s16 %r29680, %rs2014; 2026-02-21T10:19:38.0821740Z cvt.rn.f32.s16 %r29681, %rs2012; 2026-02-21T10:19:38.0821798Z bar.sync 0; 2026-02-21T10:19:38.0821918Z st.shared.v4.b32 [%r27], {%r29621, %r29619, %r29620, %r29618}; 2026-02-21T10:19:38.0822051Z st.shared.v4.b32 [%r27+16384], {%r29653, %r29651, %r29652, %r29650}; 
2026-02-21T10:19:38.0822168Z st.shared.v4.b32 [%r28], {%r29625, %r29623, %r29624, %r29622}; 2026-02-21T10:19:38.0822352Z st.shared.v4.b32 [%r28+16384], {%r29657, %r29655, %r29656, %r29654}; 2026-02-21T10:19:38.0822462Z st.shared.v4.b32 [%r29], {%r29629, %r29627, %r29628, %r29626}; 2026-02-21T10:19:38.0822582Z st.shared.v4.b32 [%r29+16384], {%r29661, %r29659, %r29660, %r29658}; 2026-02-21T10:19:38.0822691Z st.shared.v4.b32 [%r30], {%r29633, %r29631, %r29632, %r29630}; 2026-02-21T10:19:38.0822809Z st.shared.v4.b32 [%r30+16384], {%r29665, %r29663, %r29664, %r29662}; 2026-02-21T10:19:38.0822922Z st.shared.v4.b32 [%r31], {%r29637, %r29635, %r29636, %r29634}; 2026-02-21T10:19:38.0823041Z st.shared.v4.b32 [%r31+16384], {%r29669, %r29667, %r29668, %r29666}; 2026-02-21T10:19:38.0823148Z st.shared.v4.b32 [%r32], {%r29641, %r29639, %r29640, %r29638}; 2026-02-21T10:19:38.0823266Z st.shared.v4.b32 [%r32+16384], {%r29673, %r29671, %r29672, %r29670}; 2026-02-21T10:19:38.0823375Z st.shared.v4.b32 [%r33], {%r29645, %r29643, %r29644, %r29642}; 2026-02-21T10:19:38.0823537Z st.shared.v4.b32 [%r33+16384], {%r29677, %r29675, %r29676, %r29674}; 2026-02-21T10:19:38.0823648Z st.shared.v4.b32 [%r34], {%r29649, %r29647, %r29648, %r29646}; 2026-02-21T10:19:38.0823767Z st.shared.v4.b32 [%r34+16384], {%r29681, %r29679, %r29680, %r29678}; 2026-02-21T10:19:38.0823822Z $L__tmp17: 2026-02-21T10:19:38.0824095Z .loc 2 291 36 // standard.py:291:36 @[ cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:91:40 ] 2026-02-21T10:19:38.0824160Z // begin inline asm 2026-02-21T10:19:38.0824238Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.0824297Z // end inline asm 2026-02-21T10:19:38.0824354Z bar.sync 0; 2026-02-21T10:19:38.0824426Z wgmma.fence.sync.aligned; 2026-02-21T10:19:38.0824491Z mov.pred %p224, -1; 2026-02-21T10:19:38.0824571Z // begin inline asm 2026-02-21T10:19:38.0826064Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r22541,%r22542,%r22543,%r22544}, %rd3, %p224, 1, 1; 2026-02-21T10:19:38.0826177Z // end inline asm 2026-02-21T10:19:38.0826237Z // begin inline asm 2026-02-21T10:19:38.0827857Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r22673,%r22674,%r22675,%r22676}, %rd4, %p224, 1, 1; 2026-02-21T10:19:38.0827921Z // end inline asm 2026-02-21T10:19:38.0827982Z // begin inline asm 2026-02-21T10:19:38.0829612Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r22805,%r22806,%r22807,%r22808}, %rd5, %p224, 1, 1; 2026-02-21T10:19:38.0829690Z // end inline asm 2026-02-21T10:19:38.0829749Z // begin inline asm 2026-02-21T10:19:38.0831290Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r22937,%r22938,%r22939,%r22940}, %rd6, %p224, 1, 1; 2026-02-21T10:19:38.0831356Z // end inline asm 2026-02-21T10:19:38.0831415Z // begin inline asm 2026-02-21T10:19:38.0832947Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r23069,%r23070,%r23071,%r23072}, %rd7, %p224, 1, 1; 2026-02-21T10:19:38.0833016Z // end inline asm 2026-02-21T10:19:38.0833073Z // begin inline asm 2026-02-21T10:19:38.0834718Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r23201,%r23202,%r23203,%r23204}, %rd8, %p224, 1, 1; 2026-02-21T10:19:38.0834782Z // end inline asm 2026-02-21T10:19:38.0834839Z // begin inline asm 2026-02-21T10:19:38.0836317Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r23333,%r23334,%r23335,%r23336}, %rd9, %p224, 1, 1; 2026-02-21T10:19:38.0836616Z // end inline asm 2026-02-21T10:19:38.0836679Z // begin inline asm 2026-02-21T10:19:38.0838256Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r23465,%r23466,%r23467,%r23468}, %rd10, %p224, 1, 1; 2026-02-21T10:19:38.0838321Z // end inline asm 2026-02-21T10:19:38.0838382Z // begin inline asm 2026-02-21T10:19:38.0839860Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r23597,%r23598,%r23599,%r23600}, %rd3, %p224, 1, 1; 2026-02-21T10:19:38.0839976Z // end inline asm 2026-02-21T10:19:38.0840038Z // begin inline asm 2026-02-21T10:19:38.0841515Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r23729,%r23730,%r23731,%r23732}, %rd4, %p224, 1, 1; 2026-02-21T10:19:38.0841634Z // end inline asm 2026-02-21T10:19:38.0841694Z // begin inline asm 2026-02-21T10:19:38.0843176Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r23861,%r23862,%r23863,%r23864}, %rd5, %p224, 1, 1; 2026-02-21T10:19:38.0843236Z // end inline asm 2026-02-21T10:19:38.0843294Z // begin inline asm 2026-02-21T10:19:38.0844771Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r23993,%r23994,%r23995,%r23996}, %rd6, %p224, 1, 1; 2026-02-21T10:19:38.0844910Z // end inline asm 2026-02-21T10:19:38.0844971Z // begin inline asm 2026-02-21T10:19:38.0846596Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r24125,%r24126,%r24127,%r24128}, %rd7, %p224, 1, 1; 2026-02-21T10:19:38.0846670Z // end inline asm 2026-02-21T10:19:38.0846731Z // begin inline asm 2026-02-21T10:19:38.0848296Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r24257,%r24258,%r24259,%r24260}, %rd8, %p224, 1, 1; 2026-02-21T10:19:38.0848361Z // end inline asm 2026-02-21T10:19:38.0848479Z // begin inline asm 2026-02-21T10:19:38.0849978Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r24389,%r24390,%r24391,%r24392}, %rd9, %p224, 1, 1; 2026-02-21T10:19:38.0850038Z // end inline asm 2026-02-21T10:19:38.0850100Z // begin inline asm 2026-02-21T10:19:38.0851641Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r24521,%r24522,%r24523,%r24524}, %rd10, %p224, 1, 1; 2026-02-21T10:19:38.0851704Z // end inline asm 2026-02-21T10:19:38.0851783Z wgmma.commit_group.sync.aligned; 2026-02-21T10:19:38.0851848Z mov.b32 %r24654, %r29485; 2026-02-21T10:19:38.0851906Z mov.b32 %r24655, %r29485; 2026-02-21T10:19:38.0851968Z mov.b32 %r24653, %r39931; 2026-02-21T10:19:38.0852025Z // begin inline asm 2026-02-21T10:19:38.0854552Z // wait for regs: 
%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r24653,%r24654,%r24655 2026-02-21T10:19:38.0854699Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:19:38.0854757Z // end inline asm 2026-02-21T10:19:38.0854813Z $L__tmp18: 2026-02-21T10:19:38.0855022Z .loc 1 55 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:32 2026-02-21T10:19:38.0855089Z add.s32 %r29682, %r42982, -64; 2026-02-21T10:19:38.0855153Z add.s64 %rd541, %rd500, 128; 2026-02-21T10:19:38.0855216Z add.s64 %rd544, %rd503, 128; 2026-02-21T10:19:38.0855279Z add.s64 %rd547, %rd506, 128; 2026-02-21T10:19:38.0855343Z add.s64 %rd550, %rd509, 128; 2026-02-21T10:19:38.0855412Z add.s64 %rd553, %rd512, 128; 2026-02-21T10:19:38.0855478Z add.s64 %rd556, %rd515, 128; 2026-02-21T10:19:38.0855586Z add.s64 %rd559, %rd518, 128; 2026-02-21T10:19:38.0855675Z mad.wide.s32 %rd562, %r29682, 2, %rd117; 2026-02-21T10:19:38.0855884Z .loc 1 55 80 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:80 2026-02-21T10:19:38.0855944Z // begin inline asm 
2026-02-21T10:19:38.0856003Z mov.u64 %rd540, 0x0; 2026-02-21T10:19:38.0856137Z createpolicy.fractional.L2::evict_first.b64 %rd540, 1.0; 2026-02-21T10:19:38.0856194Z // end inline asm 2026-02-21T10:19:38.0856299Z // begin inline asm 2026-02-21T10:19:38.0856360Z mov.u32 %r24787, 0x0; 2026-02-21T10:19:38.0856418Z mov.u32 %r24788, 0x0; 2026-02-21T10:19:38.0856598Z mov.u32 %r24789, 0x0; 2026-02-21T10:19:38.0856660Z mov.u32 %r24790, 0x0; 2026-02-21T10:19:38.0856911Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r24787, %r24788, %r24789, %r24790 }, [ %rd541 + 0 ], %rd540; 2026-02-21T10:19:38.0856971Z // end inline asm 2026-02-21T10:19:38.0857028Z // begin inline asm 2026-02-21T10:19:38.0857088Z mov.u64 %rd543, 0x0; 2026-02-21T10:19:38.0857211Z createpolicy.fractional.L2::evict_first.b64 %rd543, 1.0; 2026-02-21T10:19:38.0857269Z // end inline asm 2026-02-21T10:19:38.0857331Z // begin inline asm 2026-02-21T10:19:38.0857389Z mov.u32 %r24791, 0x0; 2026-02-21T10:19:38.0857445Z mov.u32 %r24792, 0x0; 2026-02-21T10:19:38.0857502Z mov.u32 %r24793, 0x0; 2026-02-21T10:19:38.0857562Z mov.u32 %r24794, 0x0; 2026-02-21T10:19:38.0857790Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r24791, %r24792, %r24793, %r24794 }, [ %rd544 + 0 ], %rd543; 2026-02-21T10:19:38.0857927Z // end inline asm 2026-02-21T10:19:38.0858002Z // begin inline asm 2026-02-21T10:19:38.0858064Z mov.u64 %rd546, 0x0; 2026-02-21T10:19:38.0858184Z createpolicy.fractional.L2::evict_first.b64 %rd546, 1.0; 2026-02-21T10:19:38.0858241Z // end inline asm 2026-02-21T10:19:38.0858305Z // begin inline asm 2026-02-21T10:19:38.0858362Z mov.u32 %r24795, 0x0; 2026-02-21T10:19:38.0858418Z mov.u32 %r24796, 0x0; 2026-02-21T10:19:38.0858477Z mov.u32 %r24797, 0x0; 2026-02-21T10:19:38.0858535Z mov.u32 %r24798, 0x0; 2026-02-21T10:19:38.0858761Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r24795, %r24796, %r24797, %r24798 }, [ %rd547 + 0 ], %rd546; 2026-02-21T10:19:38.0858821Z // end inline asm 2026-02-21T10:19:38.0858879Z 
// begin inline asm 2026-02-21T10:19:38.0858937Z mov.u64 %rd549, 0x0; 2026-02-21T10:19:38.0859059Z createpolicy.fractional.L2::evict_first.b64 %rd549, 1.0; 2026-02-21T10:19:38.0859117Z // end inline asm 2026-02-21T10:19:38.0859176Z // begin inline asm 2026-02-21T10:19:38.0859233Z mov.u32 %r24799, 0x0; 2026-02-21T10:19:38.0859292Z mov.u32 %r24800, 0x0; 2026-02-21T10:19:38.0859350Z mov.u32 %r24801, 0x0; 2026-02-21T10:19:38.0859409Z mov.u32 %r24802, 0x0; 2026-02-21T10:19:38.0859646Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r24799, %r24800, %r24801, %r24802 }, [ %rd550 + 0 ], %rd549; 2026-02-21T10:19:38.0859800Z // end inline asm 2026-02-21T10:19:38.0859859Z // begin inline asm 2026-02-21T10:19:38.0859919Z mov.u64 %rd552, 0x0; 2026-02-21T10:19:38.0860039Z createpolicy.fractional.L2::evict_first.b64 %rd552, 1.0; 2026-02-21T10:19:38.0860096Z // end inline asm 2026-02-21T10:19:38.0860154Z // begin inline asm 2026-02-21T10:19:38.0860213Z mov.u32 %r24803, 0x0; 2026-02-21T10:19:38.0860270Z mov.u32 %r24804, 0x0; 2026-02-21T10:19:38.0860327Z mov.u32 %r24805, 0x0; 2026-02-21T10:19:38.0860383Z mov.u32 %r24806, 0x0; 2026-02-21T10:19:38.0860608Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r24803, %r24804, %r24805, %r24806 }, [ %rd553 + 0 ], %rd552; 2026-02-21T10:19:38.0860668Z // end inline asm 2026-02-21T10:19:38.0860725Z // begin inline asm 2026-02-21T10:19:38.0860785Z mov.u64 %rd555, 0x0; 2026-02-21T10:19:38.0860899Z createpolicy.fractional.L2::evict_first.b64 %rd555, 1.0; 2026-02-21T10:19:38.0860954Z // end inline asm 2026-02-21T10:19:38.0861017Z // begin inline asm 2026-02-21T10:19:38.0861073Z mov.u32 %r24807, 0x0; 2026-02-21T10:19:38.0861128Z mov.u32 %r24808, 0x0; 2026-02-21T10:19:38.0861183Z mov.u32 %r24809, 0x0; 2026-02-21T10:19:38.0861342Z mov.u32 %r24810, 0x0; 2026-02-21T10:19:38.0861570Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r24807, %r24808, %r24809, %r24810 }, [ %rd556 + 0 ], %rd555; 2026-02-21T10:19:38.0861628Z // end inline asm 
2026-02-21T10:19:38.0861689Z // begin inline asm 2026-02-21T10:19:38.0861747Z mov.u64 %rd558, 0x0; 2026-02-21T10:19:38.0861861Z createpolicy.fractional.L2::evict_first.b64 %rd558, 1.0; 2026-02-21T10:19:38.0861927Z // end inline asm 2026-02-21T10:19:38.0862056Z // begin inline asm 2026-02-21T10:19:38.0862116Z mov.u32 %r24811, 0x0; 2026-02-21T10:19:38.0862173Z mov.u32 %r24812, 0x0; 2026-02-21T10:19:38.0862234Z mov.u32 %r24813, 0x0; 2026-02-21T10:19:38.0862290Z mov.u32 %r24814, 0x0; 2026-02-21T10:19:38.0862521Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r24811, %r24812, %r24813, %r24814 }, [ %rd559 + 0 ], %rd558; 2026-02-21T10:19:38.0862584Z // end inline asm 2026-02-21T10:19:38.0862642Z // begin inline asm 2026-02-21T10:19:38.0862699Z mov.u64 %rd561, 0x0; 2026-02-21T10:19:38.0862821Z createpolicy.fractional.L2::evict_first.b64 %rd561, 1.0; 2026-02-21T10:19:38.0862878Z // end inline asm 2026-02-21T10:19:38.0862935Z // begin inline asm 2026-02-21T10:19:38.0862990Z mov.u32 %r24815, 0x0; 2026-02-21T10:19:38.0863049Z mov.u32 %r24816, 0x0; 2026-02-21T10:19:38.0863104Z mov.u32 %r24817, 0x0; 2026-02-21T10:19:38.0863162Z mov.u32 %r24818, 0x0; 2026-02-21T10:19:38.0863385Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r24815, %r24816, %r24817, %r24818 }, [ %rd562 + 0 ], %rd561; 2026-02-21T10:19:38.0863494Z // end inline asm 2026-02-21T10:19:38.0863702Z .loc 1 59 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:59:32 2026-02-21T10:19:38.0863761Z bar.sync 0; 2026-02-21T10:19:38.0863845Z st.shared.v2.b32 [%r9], {%r24787, %r24788}; 2026-02-21T10:19:38.0863937Z st.shared.v2.b32 [%r9+2048], {%r24791, %r24792}; 2026-02-21T10:19:38.0864023Z st.shared.v2.b32 [%r9+4096], {%r24795, %r24796}; 2026-02-21T10:19:38.0864109Z st.shared.v2.b32 [%r9+6144], {%r24799, %r24800}; 2026-02-21T10:19:38.0864190Z st.shared.v2.b32 [%r9+8192], {%r24803, %r24804}; 2026-02-21T10:19:38.0864277Z st.shared.v2.b32 [%r9+10240], {%r24807, %r24808}; 2026-02-21T10:19:38.0864371Z 
st.shared.v2.b32 [%r9+12288], {%r24811, %r24812}; 2026-02-21T10:19:38.0864464Z st.shared.v2.b32 [%r9+14336], {%r24815, %r24816}; 2026-02-21T10:19:38.0864545Z st.shared.v2.b32 [%r10], {%r24789, %r24790}; 2026-02-21T10:19:38.0864633Z st.shared.v2.b32 [%r10+2048], {%r24793, %r24794}; 2026-02-21T10:19:38.0864722Z st.shared.v2.b32 [%r10+4096], {%r24797, %r24798}; 2026-02-21T10:19:38.0864804Z st.shared.v2.b32 [%r10+6144], {%r24801, %r24802}; 2026-02-21T10:19:38.0864886Z st.shared.v2.b32 [%r10+8192], {%r24805, %r24806}; 2026-02-21T10:19:38.0864980Z st.shared.v2.b32 [%r10+10240], {%r24809, %r24810}; 2026-02-21T10:19:38.0865123Z st.shared.v2.b32 [%r10+12288], {%r24813, %r24814}; 2026-02-21T10:19:38.0865209Z st.shared.v2.b32 [%r10+14336], {%r24817, %r24818}; 2026-02-21T10:19:38.0865267Z bar.sync 0; 2026-02-21T10:19:38.0865337Z ld.shared.b16 %rs2017, [%r51]; 2026-02-21T10:19:38.0865409Z ld.shared.b16 %rs2018, [%r51+1024]; 2026-02-21T10:19:38.0865477Z ld.shared.b16 %rs2019, [%r51+64]; 2026-02-21T10:19:38.0865546Z ld.shared.b16 %rs2020, [%r51+1088]; 2026-02-21T10:19:38.0865613Z ld.shared.b16 %rs2021, [%r51+8192]; 2026-02-21T10:19:38.0865688Z ld.shared.b16 %rs2022, [%r51+9216]; 2026-02-21T10:19:38.0865758Z ld.shared.b16 %rs2023, [%r51+8256]; 2026-02-21T10:19:38.0865827Z ld.shared.b16 %rs2024, [%r51+9280]; 2026-02-21T10:19:38.0865893Z ld.shared.b16 %rs2025, [%r52]; 2026-02-21T10:19:38.0865959Z ld.shared.b16 %rs2026, [%r52+1024]; 2026-02-21T10:19:38.0866022Z ld.shared.b16 %rs2027, [%r52+64]; 2026-02-21T10:19:38.0866087Z ld.shared.b16 %rs2028, [%r52+1088]; 2026-02-21T10:19:38.0866153Z ld.shared.b16 %rs2029, [%r52+8192]; 2026-02-21T10:19:38.0866220Z ld.shared.b16 %rs2030, [%r52+9216]; 2026-02-21T10:19:38.0866285Z ld.shared.b16 %rs2031, [%r52+8256]; 2026-02-21T10:19:38.0866402Z ld.shared.b16 %rs2032, [%r52+9280]; 2026-02-21T10:19:38.0866603Z ld.shared.b16 %rs2033, [%r53]; 2026-02-21T10:19:38.0866675Z ld.shared.b16 %rs2034, [%r53+1024]; 2026-02-21T10:19:38.0866739Z ld.shared.b16 
%rs2035, [%r53+64]; 2026-02-21T10:19:38.0866807Z ld.shared.b16 %rs2036, [%r53+1088]; 2026-02-21T10:19:38.0866873Z ld.shared.b16 %rs2037, [%r53+8192]; 2026-02-21T10:19:38.0866938Z ld.shared.b16 %rs2038, [%r53+9216]; 2026-02-21T10:19:38.0867091Z ld.shared.b16 %rs2039, [%r53+8256]; 2026-02-21T10:19:38.0867164Z ld.shared.b16 %rs2040, [%r53+9280]; 2026-02-21T10:19:38.0867228Z ld.shared.b16 %rs2041, [%r54]; 2026-02-21T10:19:38.0867304Z ld.shared.b16 %rs2042, [%r54+1024]; 2026-02-21T10:19:38.0867373Z ld.shared.b16 %rs2043, [%r54+64]; 2026-02-21T10:19:38.0867438Z ld.shared.b16 %rs2044, [%r54+1088]; 2026-02-21T10:19:38.0867506Z ld.shared.b16 %rs2045, [%r54+8192]; 2026-02-21T10:19:38.0867569Z ld.shared.b16 %rs2046, [%r54+9216]; 2026-02-21T10:19:38.0867636Z ld.shared.b16 %rs2047, [%r54+8256]; 2026-02-21T10:19:38.0867702Z ld.shared.b16 %rs2048, [%r54+9280]; 2026-02-21T10:19:38.0867766Z ld.shared.b16 %rs2049, [%r55]; 2026-02-21T10:19:38.0867833Z ld.shared.b16 %rs2050, [%r55+1024]; 2026-02-21T10:19:38.0867896Z ld.shared.b16 %rs2051, [%r55+64]; 2026-02-21T10:19:38.0867961Z ld.shared.b16 %rs2052, [%r55+1088]; 2026-02-21T10:19:38.0868024Z ld.shared.b16 %rs2053, [%r55+8192]; 2026-02-21T10:19:38.0868092Z ld.shared.b16 %rs2054, [%r55+9216]; 2026-02-21T10:19:38.0868229Z ld.shared.b16 %rs2055, [%r55+8256]; 2026-02-21T10:19:38.0868295Z ld.shared.b16 %rs2056, [%r55+9280]; 2026-02-21T10:19:38.0868448Z ld.shared.b16 %rs2057, [%r56]; 2026-02-21T10:19:38.0868520Z ld.shared.b16 %rs2058, [%r56+1024]; 2026-02-21T10:19:38.0868588Z ld.shared.b16 %rs2059, [%r56+64]; 2026-02-21T10:19:38.0868660Z ld.shared.b16 %rs2060, [%r56+1088]; 2026-02-21T10:19:38.0868727Z ld.shared.b16 %rs2061, [%r56+8192]; 2026-02-21T10:19:38.0868791Z ld.shared.b16 %rs2062, [%r56+9216]; 2026-02-21T10:19:38.0868856Z ld.shared.b16 %rs2063, [%r56+8256]; 2026-02-21T10:19:38.0868922Z ld.shared.b16 %rs2064, [%r56+9280]; 2026-02-21T10:19:38.0868988Z ld.shared.b16 %rs2065, [%r57]; 2026-02-21T10:19:38.0869052Z ld.shared.b16 %rs2066, 
[%r57+1024]; 2026-02-21T10:19:38.0869117Z ld.shared.b16 %rs2067, [%r57+64]; 2026-02-21T10:19:38.0869182Z ld.shared.b16 %rs2068, [%r57+1088]; 2026-02-21T10:19:38.0869245Z ld.shared.b16 %rs2069, [%r57+8192]; 2026-02-21T10:19:38.0869311Z ld.shared.b16 %rs2070, [%r57+9216]; 2026-02-21T10:19:38.0869381Z ld.shared.b16 %rs2071, [%r57+8256]; 2026-02-21T10:19:38.0869450Z ld.shared.b16 %rs2072, [%r57+9280]; 2026-02-21T10:19:38.0872234Z ld.shared.b16 %rs2073, [%r58]; 2026-02-21T10:19:38.0872312Z ld.shared.b16 %rs2074, [%r58+1024]; 2026-02-21T10:19:38.0872382Z ld.shared.b16 %rs2075, [%r58+64]; 2026-02-21T10:19:38.0872582Z ld.shared.b16 %rs2076, [%r58+1088]; 2026-02-21T10:19:38.0872650Z ld.shared.b16 %rs2077, [%r58+8192]; 2026-02-21T10:19:38.0872720Z ld.shared.b16 %rs2078, [%r58+9216]; 2026-02-21T10:19:38.0872788Z ld.shared.b16 %rs2079, [%r58+8256]; 2026-02-21T10:19:38.0872854Z ld.shared.b16 %rs2080, [%r58+9280]; 2026-02-21T10:19:38.0872920Z cvt.f32.bf16 %r24956, %rs2017; 2026-02-21T10:19:38.0872982Z cvt.f32.bf16 %r24957, %rs2018; 2026-02-21T10:19:38.0873044Z cvt.f32.bf16 %r24958, %rs2025; 2026-02-21T10:19:38.0873104Z cvt.f32.bf16 %r24959, %rs2026; 2026-02-21T10:19:38.0873164Z cvt.f32.bf16 %r25088, %rs2033; 2026-02-21T10:19:38.0873230Z cvt.f32.bf16 %r25089, %rs2034; 2026-02-21T10:19:38.0873290Z cvt.f32.bf16 %r25090, %rs2041; 2026-02-21T10:19:38.0873350Z cvt.f32.bf16 %r25091, %rs2042; 2026-02-21T10:19:38.0873410Z cvt.f32.bf16 %r25220, %rs2049; 2026-02-21T10:19:38.0873471Z cvt.f32.bf16 %r25221, %rs2050; 2026-02-21T10:19:38.0873529Z cvt.f32.bf16 %r25222, %rs2057; 2026-02-21T10:19:38.0873591Z cvt.f32.bf16 %r25223, %rs2058; 2026-02-21T10:19:38.0873653Z cvt.f32.bf16 %r25352, %rs2065; 2026-02-21T10:19:38.0873712Z cvt.f32.bf16 %r25353, %rs2066; 2026-02-21T10:19:38.0873846Z cvt.f32.bf16 %r25354, %rs2073; 2026-02-21T10:19:38.0873915Z cvt.f32.bf16 %r25355, %rs2074; 2026-02-21T10:19:38.0873984Z cvt.f32.bf16 %r25484, %rs2019; 2026-02-21T10:19:38.0874046Z cvt.f32.bf16 %r25485, %rs2020; 
2026-02-21T10:19:38.0874108Z cvt.f32.bf16 %r25486, %rs2027; 2026-02-21T10:19:38.0874171Z cvt.f32.bf16 %r25487, %rs2028; 2026-02-21T10:19:38.0874230Z cvt.f32.bf16 %r25616, %rs2035; 2026-02-21T10:19:38.0874340Z cvt.f32.bf16 %r25617, %rs2036; 2026-02-21T10:19:38.0874402Z cvt.f32.bf16 %r25618, %rs2043; 2026-02-21T10:19:38.0874463Z cvt.f32.bf16 %r25619, %rs2044; 2026-02-21T10:19:38.0874522Z cvt.f32.bf16 %r25748, %rs2051; 2026-02-21T10:19:38.0874579Z cvt.f32.bf16 %r25749, %rs2052; 2026-02-21T10:19:38.0874639Z cvt.f32.bf16 %r25750, %rs2059; 2026-02-21T10:19:38.0874700Z cvt.f32.bf16 %r25751, %rs2060; 2026-02-21T10:19:38.0874761Z cvt.f32.bf16 %r25880, %rs2067; 2026-02-21T10:19:38.0874823Z cvt.f32.bf16 %r25881, %rs2068; 2026-02-21T10:19:38.0874884Z cvt.f32.bf16 %r25882, %rs2075; 2026-02-21T10:19:38.0874943Z cvt.f32.bf16 %r25883, %rs2076; 2026-02-21T10:19:38.0875005Z cvt.f32.bf16 %r26012, %rs2021; 2026-02-21T10:19:38.0875064Z cvt.f32.bf16 %r26013, %rs2022; 2026-02-21T10:19:38.0875122Z cvt.f32.bf16 %r26014, %rs2029; 2026-02-21T10:19:38.0875181Z cvt.f32.bf16 %r26015, %rs2030; 2026-02-21T10:19:38.0875242Z cvt.f32.bf16 %r26144, %rs2037; 2026-02-21T10:19:38.0875301Z cvt.f32.bf16 %r26145, %rs2038; 2026-02-21T10:19:38.0875410Z cvt.f32.bf16 %r26146, %rs2045; 2026-02-21T10:19:38.0875476Z cvt.f32.bf16 %r26147, %rs2046; 2026-02-21T10:19:38.0875538Z cvt.f32.bf16 %r26276, %rs2053; 2026-02-21T10:19:38.0875596Z cvt.f32.bf16 %r26277, %rs2054; 2026-02-21T10:19:38.0875654Z cvt.f32.bf16 %r26278, %rs2061; 2026-02-21T10:19:38.0875716Z cvt.f32.bf16 %r26279, %rs2062; 2026-02-21T10:19:38.0875776Z cvt.f32.bf16 %r26408, %rs2069; 2026-02-21T10:19:38.0875837Z cvt.f32.bf16 %r26409, %rs2070; 2026-02-21T10:19:38.0875909Z cvt.f32.bf16 %r26410, %rs2077; 2026-02-21T10:19:38.0875972Z cvt.f32.bf16 %r26411, %rs2078; 2026-02-21T10:19:38.0876031Z cvt.f32.bf16 %r26540, %rs2023; 2026-02-21T10:19:38.0876088Z cvt.f32.bf16 %r26541, %rs2024; 2026-02-21T10:19:38.0876152Z cvt.f32.bf16 %r26542, %rs2031; 
2026-02-21T10:19:38.0876210Z cvt.f32.bf16 %r26543, %rs2032; 2026-02-21T10:19:38.0876267Z cvt.f32.bf16 %r26672, %rs2039; 2026-02-21T10:19:38.0876328Z cvt.f32.bf16 %r26673, %rs2040; 2026-02-21T10:19:38.0876388Z cvt.f32.bf16 %r26674, %rs2047; 2026-02-21T10:19:38.0876609Z cvt.f32.bf16 %r26675, %rs2048; 2026-02-21T10:19:38.0876677Z cvt.f32.bf16 %r26804, %rs2055; 2026-02-21T10:19:38.0876738Z cvt.f32.bf16 %r26805, %rs2056; 2026-02-21T10:19:38.0876796Z cvt.f32.bf16 %r26806, %rs2063; 2026-02-21T10:19:38.0876855Z cvt.f32.bf16 %r26807, %rs2064; 2026-02-21T10:19:38.0876997Z cvt.f32.bf16 %r26936, %rs2071; 2026-02-21T10:19:38.0877057Z cvt.f32.bf16 %r26937, %rs2072; 2026-02-21T10:19:38.0877124Z cvt.f32.bf16 %r26938, %rs2079; 2026-02-21T10:19:38.0877192Z cvt.f32.bf16 %r26939, %rs2080; 2026-02-21T10:19:38.0877426Z .loc 1 61 33 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:61:33 2026-02-21T10:19:38.0877484Z bar.sync 0; 2026-02-21T10:19:38.0877544Z // begin inline asm 2026-02-21T10:19:38.0877658Z @%p313 mbarrier.init.shared::cta.b64 [%r29846], 1; 2026-02-21T10:19:38.0877717Z // end inline asm 2026-02-21T10:19:38.0877771Z bar.sync 0; 2026-02-21T10:19:38.0877831Z // begin inline asm 2026-02-21T10:19:38.0877980Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r29846], 4096; 2026-02-21T10:19:38.0878036Z // end inline asm 2026-02-21T10:19:38.0878093Z // begin inline asm 2026-02-21T10:19:38.0878175Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.0878229Z // end inline asm 2026-02-21T10:19:38.0878283Z bar.sync 0; 2026-02-21T10:19:38.0878359Z elect.sync %r29683|%p282, -1; 2026-02-21T10:19:38.0878427Z and.pred %p242, %p1, %p282; 2026-02-21T10:19:38.0878492Z cvt.u32.u64 %r29684, %rd849; 2026-02-21T10:19:38.0878635Z add.s32 %r24823, %r29684, 128; 2026-02-21T10:19:38.0878702Z // begin inline asm 2026-02-21T10:19:38.0879056Z @%p242 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39931], [%rd822, {%r29849, %r24823}], [%r29846]; 
2026-02-21T10:19:38.0879113Z // end inline asm 2026-02-21T10:19:38.0879171Z bar.sync 0; 2026-02-21T10:19:38.0879227Z // begin inline asm 2026-02-21T10:19:38.0879279Z 2026-02-21T10:19:38.0879330Z { 2026-02-21T10:19:38.0879470Z .reg .pred complete; 2026-02-21T10:19:38.0879527Z waitLoop: 2026-02-21T10:19:38.0879679Z mbarrier.try_wait.parity.shared.b64 complete, [%r29846], %r29485; 2026-02-21T10:19:38.0879753Z @!complete bra.uni waitLoop; 2026-02-21T10:19:38.0879801Z } 2026-02-21T10:19:38.0879807Z 2026-02-21T10:19:38.0879861Z // end inline asm 2026-02-21T10:19:38.0879921Z bar.sync 0; 2026-02-21T10:19:38.0879978Z // begin inline asm 2026-02-21T10:19:38.0880078Z @%p313 mbarrier.inval.shared::cta.b64 [%r29846]; 2026-02-21T10:19:38.0880135Z // end inline asm 2026-02-21T10:19:38.0880351Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0880419Z ld.shared.s8 %rs2081, [%r19]; 2026-02-21T10:19:38.0880616Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0880684Z shl.b16 %rs2082, %rs2081, 4; 2026-02-21T10:19:38.0880971Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0881048Z ld.shared.s8 %rs2083, [%r20+128]; 2026-02-21T10:19:38.0881255Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0881319Z shl.b16 %rs2084, %rs2083, 4; 2026-02-21T10:19:38.0881515Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0881586Z ld.shared.s8 %rs2085, [%r21+256]; 2026-02-21T10:19:38.0881778Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0881852Z shl.b16 %rs2086, %rs2085, 4; 2026-02-21T10:19:38.0882047Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0882117Z ld.shared.s8 %rs2087, [%r22+384]; 2026-02-21T10:19:38.0882306Z .loc 1 64 28 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0882370Z shl.b16 %rs2088, %rs2087, 4; 2026-02-21T10:19:38.0882560Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0882624Z ld.shared.s8 %rs2089, [%r23+512]; 2026-02-21T10:19:38.0882812Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0882939Z shl.b16 %rs2090, %rs2089, 4; 2026-02-21T10:19:38.0883142Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0883210Z ld.shared.s8 %rs2091, [%r24+640]; 2026-02-21T10:19:38.0883407Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0883469Z shl.b16 %rs2092, %rs2091, 4; 2026-02-21T10:19:38.0883660Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0883725Z ld.shared.s8 %rs2093, [%r25+768]; 2026-02-21T10:19:38.0883926Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0883988Z shl.b16 %rs2094, %rs2093, 4; 2026-02-21T10:19:38.0884184Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0884254Z ld.shared.s8 %rs2095, [%r26+896]; 2026-02-21T10:19:38.0884447Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0884567Z shl.b16 %rs2096, %rs2095, 4; 2026-02-21T10:19:38.0884760Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0884830Z ld.shared.s8 %rs2097, [%r19+1024]; 2026-02-21T10:19:38.0885017Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0885081Z shl.b16 %rs2098, %rs2097, 4; 2026-02-21T10:19:38.0885316Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0885381Z ld.shared.s8 %rs2099, [%r20+1152]; 2026-02-21T10:19:38.0885570Z 
.loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0885635Z shl.b16 %rs2100, %rs2099, 4; 2026-02-21T10:19:38.0885836Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0885901Z ld.shared.s8 %rs2101, [%r21+1280]; 2026-02-21T10:19:38.0886094Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0886154Z shl.b16 %rs2102, %rs2101, 4; 2026-02-21T10:19:38.0886343Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0886408Z ld.shared.s8 %rs2103, [%r22+1408]; 2026-02-21T10:19:38.0886721Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0886863Z shl.b16 %rs2104, %rs2103, 4; 2026-02-21T10:19:38.0887072Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0887138Z ld.shared.s8 %rs2105, [%r23+1536]; 2026-02-21T10:19:38.0887326Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0887389Z shl.b16 %rs2106, %rs2105, 4; 2026-02-21T10:19:38.0887582Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0887645Z ld.shared.s8 %rs2107, [%r24+1664]; 2026-02-21T10:19:38.0887830Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0887893Z shl.b16 %rs2108, %rs2107, 4; 2026-02-21T10:19:38.0888078Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0888145Z ld.shared.s8 %rs2109, [%r25+1792]; 2026-02-21T10:19:38.0888335Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0888394Z shl.b16 %rs2110, %rs2109, 4; 2026-02-21T10:19:38.0888578Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0888722Z ld.shared.s8 %rs2111, [%r26+1920]; 
2026-02-21T10:19:38.0888912Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0888973Z shl.b16 %rs2112, %rs2111, 4; 2026-02-21T10:19:38.0889171Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0889238Z ld.shared.s8 %rs2113, [%r19+2048]; 2026-02-21T10:19:38.0889432Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0889495Z shl.b16 %rs2114, %rs2113, 4; 2026-02-21T10:19:38.0889693Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0889756Z ld.shared.s8 %rs2115, [%r20+2176]; 2026-02-21T10:19:38.0889943Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0890008Z shl.b16 %rs2116, %rs2115, 4; 2026-02-21T10:19:38.0890195Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0890328Z ld.shared.s8 %rs2117, [%r21+2304]; 2026-02-21T10:19:38.0890532Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0890595Z shl.b16 %rs2118, %rs2117, 4; 2026-02-21T10:19:38.0890786Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0890855Z ld.shared.s8 %rs2119, [%r22+2432]; 2026-02-21T10:19:38.0891105Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0891167Z shl.b16 %rs2120, %rs2119, 4; 2026-02-21T10:19:38.0891354Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0891429Z ld.shared.s8 %rs2121, [%r23+2560]; 2026-02-21T10:19:38.0891622Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0891681Z shl.b16 %rs2122, %rs2121, 4; 2026-02-21T10:19:38.0891876Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0891938Z ld.shared.s8 
%rs2123, [%r24+2688]; 2026-02-21T10:19:38.0892125Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0892189Z shl.b16 %rs2124, %rs2123, 4; 2026-02-21T10:19:38.0892376Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0892489Z ld.shared.s8 %rs2125, [%r25+2816]; 2026-02-21T10:19:38.0892685Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0892745Z shl.b16 %rs2126, %rs2125, 4; 2026-02-21T10:19:38.0892931Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0892995Z ld.shared.s8 %rs2127, [%r26+2944]; 2026-02-21T10:19:38.0893184Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0893244Z shl.b16 %rs2128, %rs2127, 4; 2026-02-21T10:19:38.0893429Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0893503Z ld.shared.s8 %rs2129, [%r19+3072]; 2026-02-21T10:19:38.0893696Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0893758Z shl.b16 %rs2130, %rs2129, 4; 2026-02-21T10:19:38.0893949Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0894014Z ld.shared.s8 %rs2131, [%r20+3200]; 2026-02-21T10:19:38.0894200Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0894324Z shl.b16 %rs2132, %rs2131, 4; 2026-02-21T10:19:38.0894520Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0894583Z ld.shared.s8 %rs2133, [%r21+3328]; 2026-02-21T10:19:38.0894770Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0894832Z shl.b16 %rs2134, %rs2133, 4; 2026-02-21T10:19:38.0895018Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 
2026-02-21T10:19:38.0895080Z ld.shared.s8 %rs2135, [%r22+3456]; 2026-02-21T10:19:38.0895274Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0895333Z shl.b16 %rs2136, %rs2135, 4; 2026-02-21T10:19:38.0895519Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0895587Z ld.shared.s8 %rs2137, [%r23+3584]; 2026-02-21T10:19:38.0895774Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0895883Z shl.b16 %rs2138, %rs2137, 4; 2026-02-21T10:19:38.0896073Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0896136Z ld.shared.s8 %rs2139, [%r24+3712]; 2026-02-21T10:19:38.0896323Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0896382Z shl.b16 %rs2140, %rs2139, 4; 2026-02-21T10:19:38.0896753Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0896820Z ld.shared.s8 %rs2141, [%r25+3840]; 2026-02-21T10:19:38.0897021Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0897084Z shl.b16 %rs2142, %rs2141, 4; 2026-02-21T10:19:38.0897272Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0897335Z ld.shared.s8 %rs2143, [%r26+3968]; 2026-02-21T10:19:38.0897524Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0897584Z shl.b16 %rs2144, %rs2143, 4; 2026-02-21T10:19:38.0897772Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0897835Z cvt.s16.s8 %rs2145, %rs2082; 2026-02-21T10:19:38.0897896Z shr.s16 %rs2146, %rs2145, 4; 2026-02-21T10:19:38.0897959Z cvt.s16.s8 %rs2147, %rs2084; 2026-02-21T10:19:38.0898093Z shr.s16 %rs2148, %rs2147, 4; 2026-02-21T10:19:38.0898159Z shr.s16 %rs2149, %rs2081, 4; 2026-02-21T10:19:38.0898218Z 
shr.s16 %rs2150, %rs2083, 4; 2026-02-21T10:19:38.0898407Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0898481Z cvt.rn.f32.s16 %r29685, %rs2150; 2026-02-21T10:19:38.0898543Z cvt.rn.f32.s16 %r29686, %rs2149; 2026-02-21T10:19:38.0898605Z cvt.rn.f32.s16 %r29687, %rs2148; 2026-02-21T10:19:38.0898668Z cvt.rn.f32.s16 %r29688, %rs2146; 2026-02-21T10:19:38.0898857Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0898919Z cvt.s16.s8 %rs2151, %rs2086; 2026-02-21T10:19:38.0898978Z shr.s16 %rs2152, %rs2151, 4; 2026-02-21T10:19:38.0899040Z cvt.s16.s8 %rs2153, %rs2088; 2026-02-21T10:19:38.0899098Z shr.s16 %rs2154, %rs2153, 4; 2026-02-21T10:19:38.0899158Z shr.s16 %rs2155, %rs2085, 4; 2026-02-21T10:19:38.0899223Z shr.s16 %rs2156, %rs2087, 4; 2026-02-21T10:19:38.0899411Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0899472Z cvt.rn.f32.s16 %r29689, %rs2156; 2026-02-21T10:19:38.0899538Z cvt.rn.f32.s16 %r29690, %rs2155; 2026-02-21T10:19:38.0899673Z cvt.rn.f32.s16 %r29691, %rs2154; 2026-02-21T10:19:38.0899732Z cvt.rn.f32.s16 %r29692, %rs2152; 2026-02-21T10:19:38.0899920Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0899987Z cvt.s16.s8 %rs2157, %rs2090; 2026-02-21T10:19:38.0900058Z shr.s16 %rs2158, %rs2157, 4; 2026-02-21T10:19:38.0900120Z cvt.s16.s8 %rs2159, %rs2092; 2026-02-21T10:19:38.0900182Z shr.s16 %rs2160, %rs2159, 4; 2026-02-21T10:19:38.0900240Z shr.s16 %rs2161, %rs2089, 4; 2026-02-21T10:19:38.0900298Z shr.s16 %rs2162, %rs2091, 4; 2026-02-21T10:19:38.0900491Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0900561Z cvt.rn.f32.s16 %r29693, %rs2162; 2026-02-21T10:19:38.0900622Z cvt.rn.f32.s16 %r29694, %rs2161; 2026-02-21T10:19:38.0900681Z cvt.rn.f32.s16 %r29695, %rs2160; 2026-02-21T10:19:38.0900743Z cvt.rn.f32.s16 %r29696, 
%rs2158; 2026-02-21T10:19:38.0900940Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0901005Z cvt.s16.s8 %rs2163, %rs2094; 2026-02-21T10:19:38.0901135Z shr.s16 %rs2164, %rs2163, 4; 2026-02-21T10:19:38.0901199Z cvt.s16.s8 %rs2165, %rs2096; 2026-02-21T10:19:38.0901257Z shr.s16 %rs2166, %rs2165, 4; 2026-02-21T10:19:38.0901315Z shr.s16 %rs2167, %rs2093, 4; 2026-02-21T10:19:38.0901375Z shr.s16 %rs2168, %rs2095, 4; 2026-02-21T10:19:38.0901570Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0901631Z cvt.rn.f32.s16 %r29697, %rs2168; 2026-02-21T10:19:38.0901780Z cvt.rn.f32.s16 %r29698, %rs2167; 2026-02-21T10:19:38.0901840Z cvt.rn.f32.s16 %r29699, %rs2166; 2026-02-21T10:19:38.0901901Z cvt.rn.f32.s16 %r29700, %rs2164; 2026-02-21T10:19:38.0902092Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0902152Z cvt.s16.s8 %rs2169, %rs2098; 2026-02-21T10:19:38.0902212Z shr.s16 %rs2170, %rs2169, 4; 2026-02-21T10:19:38.0902270Z cvt.s16.s8 %rs2171, %rs2100; 2026-02-21T10:19:38.0902343Z shr.s16 %rs2172, %rs2171, 4; 2026-02-21T10:19:38.0902404Z shr.s16 %rs2173, %rs2097, 4; 2026-02-21T10:19:38.0902462Z shr.s16 %rs2174, %rs2099, 4; 2026-02-21T10:19:38.0902656Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0902717Z cvt.rn.f32.s16 %r29701, %rs2174; 2026-02-21T10:19:38.0902778Z cvt.rn.f32.s16 %r29702, %rs2173; 2026-02-21T10:19:38.0902842Z cvt.rn.f32.s16 %r29703, %rs2172; 2026-02-21T10:19:38.0902904Z cvt.rn.f32.s16 %r29704, %rs2170; 2026-02-21T10:19:38.0903144Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0903207Z cvt.s16.s8 %rs2175, %rs2102; 2026-02-21T10:19:38.0903268Z shr.s16 %rs2176, %rs2175, 4; 2026-02-21T10:19:38.0903327Z cvt.s16.s8 %rs2177, %rs2104; 2026-02-21T10:19:38.0903387Z shr.s16 %rs2178, %rs2177, 4; 2026-02-21T10:19:38.0903449Z 
shr.s16 %rs2179, %rs2101, 4; 2026-02-21T10:19:38.0903507Z shr.s16 %rs2180, %rs2103, 4; 2026-02-21T10:19:38.0903696Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0903760Z cvt.rn.f32.s16 %r29705, %rs2180; 2026-02-21T10:19:38.0903821Z cvt.rn.f32.s16 %r29706, %rs2179; 2026-02-21T10:19:38.0903880Z cvt.rn.f32.s16 %r29707, %rs2178; 2026-02-21T10:19:38.0903941Z cvt.rn.f32.s16 %r29708, %rs2176; 2026-02-21T10:19:38.0904142Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0904206Z cvt.s16.s8 %rs2181, %rs2106; 2026-02-21T10:19:38.0904265Z shr.s16 %rs2182, %rs2181, 4; 2026-02-21T10:19:38.0904335Z cvt.s16.s8 %rs2183, %rs2108; 2026-02-21T10:19:38.0904394Z shr.s16 %rs2184, %rs2183, 4; 2026-02-21T10:19:38.0904452Z shr.s16 %rs2185, %rs2105, 4; 2026-02-21T10:19:38.0904512Z shr.s16 %rs2186, %rs2107, 4; 2026-02-21T10:19:38.0904754Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0904817Z cvt.rn.f32.s16 %r29709, %rs2186; 2026-02-21T10:19:38.0904878Z cvt.rn.f32.s16 %r29710, %rs2185; 2026-02-21T10:19:38.0904943Z cvt.rn.f32.s16 %r29711, %rs2184; 2026-02-21T10:19:38.0905003Z cvt.rn.f32.s16 %r29712, %rs2182; 2026-02-21T10:19:38.0905190Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0905252Z cvt.s16.s8 %rs2187, %rs2110; 2026-02-21T10:19:38.0905311Z shr.s16 %rs2188, %rs2187, 4; 2026-02-21T10:19:38.0905371Z cvt.s16.s8 %rs2189, %rs2112; 2026-02-21T10:19:38.0905434Z shr.s16 %rs2190, %rs2189, 4; 2026-02-21T10:19:38.0905492Z shr.s16 %rs2191, %rs2109, 4; 2026-02-21T10:19:38.0905551Z shr.s16 %rs2192, %rs2111, 4; 2026-02-21T10:19:38.0905738Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0905805Z cvt.rn.f32.s16 %r29713, %rs2192; 2026-02-21T10:19:38.0905866Z cvt.rn.f32.s16 %r29714, %rs2191; 2026-02-21T10:19:38.0905926Z cvt.rn.f32.s16 %r29715, %rs2190; 
2026-02-21T10:19:38.0906039Z cvt.rn.f32.s16 %r29716, %rs2188; 2026-02-21T10:19:38.0906230Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0906301Z cvt.s16.s8 %rs2193, %rs2114; 2026-02-21T10:19:38.0906364Z shr.s16 %rs2194, %rs2193, 4; 2026-02-21T10:19:38.0906425Z cvt.s16.s8 %rs2195, %rs2116; 2026-02-21T10:19:38.0906598Z shr.s16 %rs2196, %rs2195, 4; 2026-02-21T10:19:38.0906749Z shr.s16 %rs2197, %rs2113, 4; 2026-02-21T10:19:38.0906815Z shr.s16 %rs2198, %rs2115, 4; 2026-02-21T10:19:38.0907028Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0907095Z cvt.rn.f32.s16 %r29717, %rs2198; 2026-02-21T10:19:38.0907159Z cvt.rn.f32.s16 %r29718, %rs2197; 2026-02-21T10:19:38.0907226Z cvt.rn.f32.s16 %r29719, %rs2196; 2026-02-21T10:19:38.0907287Z cvt.rn.f32.s16 %r29720, %rs2194; 2026-02-21T10:19:38.0907483Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0907543Z cvt.s16.s8 %rs2199, %rs2118; 2026-02-21T10:19:38.0907603Z shr.s16 %rs2200, %rs2199, 4; 2026-02-21T10:19:38.0907662Z cvt.s16.s8 %rs2201, %rs2120; 2026-02-21T10:19:38.0907723Z shr.s16 %rs2202, %rs2201, 4; 2026-02-21T10:19:38.0907780Z shr.s16 %rs2203, %rs2117, 4; 2026-02-21T10:19:38.0907839Z shr.s16 %rs2204, %rs2119, 4; 2026-02-21T10:19:38.0908098Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0908164Z cvt.rn.f32.s16 %r29721, %rs2204; 2026-02-21T10:19:38.0908226Z cvt.rn.f32.s16 %r29722, %rs2203; 2026-02-21T10:19:38.0908290Z cvt.rn.f32.s16 %r29723, %rs2202; 2026-02-21T10:19:38.0908353Z cvt.rn.f32.s16 %r29724, %rs2200; 2026-02-21T10:19:38.0908631Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0908693Z cvt.s16.s8 %rs2205, %rs2122; 2026-02-21T10:19:38.0908758Z shr.s16 %rs2206, %rs2205, 4; 2026-02-21T10:19:38.0908816Z cvt.s16.s8 %rs2207, %rs2124; 2026-02-21T10:19:38.0908875Z shr.s16 
%rs2208, %rs2207, 4; 2026-02-21T10:19:38.0908935Z shr.s16 %rs2209, %rs2121, 4; 2026-02-21T10:19:38.0908998Z shr.s16 %rs2210, %rs2123, 4; 2026-02-21T10:19:38.0909188Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0909249Z cvt.rn.f32.s16 %r29725, %rs2210; 2026-02-21T10:19:38.0909315Z cvt.rn.f32.s16 %r29726, %rs2209; 2026-02-21T10:19:38.0909375Z cvt.rn.f32.s16 %r29727, %rs2208; 2026-02-21T10:19:38.0909434Z cvt.rn.f32.s16 %r29728, %rs2206; 2026-02-21T10:19:38.0909625Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0909683Z cvt.s16.s8 %rs2211, %rs2126; 2026-02-21T10:19:38.0909820Z shr.s16 %rs2212, %rs2211, 4; 2026-02-21T10:19:38.0909884Z cvt.s16.s8 %rs2213, %rs2128; 2026-02-21T10:19:38.0909942Z shr.s16 %rs2214, %rs2213, 4; 2026-02-21T10:19:38.0910000Z shr.s16 %rs2215, %rs2125, 4; 2026-02-21T10:19:38.0910058Z shr.s16 %rs2216, %rs2127, 4; 2026-02-21T10:19:38.0910250Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0910312Z cvt.rn.f32.s16 %r29729, %rs2216; 2026-02-21T10:19:38.0910371Z cvt.rn.f32.s16 %r29730, %rs2215; 2026-02-21T10:19:38.0910432Z cvt.rn.f32.s16 %r29731, %rs2214; 2026-02-21T10:19:38.0910492Z cvt.rn.f32.s16 %r29732, %rs2212; 2026-02-21T10:19:38.0910682Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0910743Z cvt.s16.s8 %rs2217, %rs2130; 2026-02-21T10:19:38.0910803Z shr.s16 %rs2218, %rs2217, 4; 2026-02-21T10:19:38.0910861Z cvt.s16.s8 %rs2219, %rs2132; 2026-02-21T10:19:38.0910920Z shr.s16 %rs2220, %rs2219, 4; 2026-02-21T10:19:38.0910980Z shr.s16 %rs2221, %rs2129, 4; 2026-02-21T10:19:38.0911038Z shr.s16 %rs2222, %rs2131, 4; 2026-02-21T10:19:38.0911291Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0911359Z cvt.rn.f32.s16 %r29733, %rs2222; 2026-02-21T10:19:38.0911419Z cvt.rn.f32.s16 %r29734, %rs2221; 
2026-02-21T10:19:38.0911479Z cvt.rn.f32.s16 %r29735, %rs2220; 2026-02-21T10:19:38.0911539Z cvt.rn.f32.s16 %r29736, %rs2218; 2026-02-21T10:19:38.0911729Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0911836Z cvt.s16.s8 %rs2223, %rs2134; 2026-02-21T10:19:38.0911901Z shr.s16 %rs2224, %rs2223, 4; 2026-02-21T10:19:38.0911960Z cvt.s16.s8 %rs2225, %rs2136; 2026-02-21T10:19:38.0912018Z shr.s16 %rs2226, %rs2225, 4; 2026-02-21T10:19:38.0912078Z shr.s16 %rs2227, %rs2133, 4; 2026-02-21T10:19:38.0912139Z shr.s16 %rs2228, %rs2135, 4; 2026-02-21T10:19:38.0912328Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0912392Z cvt.rn.f32.s16 %r29737, %rs2228; 2026-02-21T10:19:38.0912453Z cvt.rn.f32.s16 %r29738, %rs2227; 2026-02-21T10:19:38.0912512Z cvt.rn.f32.s16 %r29739, %rs2226; 2026-02-21T10:19:38.0912570Z cvt.rn.f32.s16 %r29740, %rs2224; 2026-02-21T10:19:38.0912757Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0912816Z cvt.s16.s8 %rs2229, %rs2138; 2026-02-21T10:19:38.0912875Z shr.s16 %rs2230, %rs2229, 4; 2026-02-21T10:19:38.0912986Z cvt.s16.s8 %rs2231, %rs2140; 2026-02-21T10:19:38.0913046Z shr.s16 %rs2232, %rs2231, 4; 2026-02-21T10:19:38.0913104Z shr.s16 %rs2233, %rs2137, 4; 2026-02-21T10:19:38.0913161Z shr.s16 %rs2234, %rs2139, 4; 2026-02-21T10:19:38.0913350Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0913413Z cvt.rn.f32.s16 %r29741, %rs2234; 2026-02-21T10:19:38.0913471Z cvt.rn.f32.s16 %r29742, %rs2233; 2026-02-21T10:19:38.0913538Z cvt.rn.f32.s16 %r29743, %rs2232; 2026-02-21T10:19:38.0913598Z cvt.rn.f32.s16 %r29744, %rs2230; 2026-02-21T10:19:38.0913787Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0913849Z cvt.s16.s8 %rs2235, %rs2142; 2026-02-21T10:19:38.0913906Z shr.s16 %rs2236, %rs2235, 4; 2026-02-21T10:19:38.0913963Z 
cvt.s16.s8 %rs2237, %rs2144; 2026-02-21T10:19:38.0914021Z shr.s16 %rs2238, %rs2237, 4; 2026-02-21T10:19:38.0914083Z shr.s16 %rs2239, %rs2141, 4; 2026-02-21T10:19:38.0914141Z shr.s16 %rs2240, %rs2143, 4; 2026-02-21T10:19:38.0914328Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0914392Z cvt.rn.f32.s16 %r29745, %rs2240; 2026-02-21T10:19:38.0914451Z cvt.rn.f32.s16 %r29746, %rs2239; 2026-02-21T10:19:38.0914577Z cvt.rn.f32.s16 %r29747, %rs2238; 2026-02-21T10:19:38.0914639Z cvt.rn.f32.s16 %r29748, %rs2236; 2026-02-21T10:19:38.0914694Z bar.sync 0; 2026-02-21T10:19:38.0914817Z st.shared.v4.b32 [%r27], {%r29688, %r29686, %r29687, %r29685}; 2026-02-21T10:19:38.0914943Z st.shared.v4.b32 [%r27+16384], {%r29720, %r29718, %r29719, %r29717}; 2026-02-21T10:19:38.0915055Z st.shared.v4.b32 [%r28], {%r29692, %r29690, %r29691, %r29689}; 2026-02-21T10:19:38.0915175Z st.shared.v4.b32 [%r28+16384], {%r29724, %r29722, %r29723, %r29721}; 2026-02-21T10:19:38.0915280Z st.shared.v4.b32 [%r29], {%r29696, %r29694, %r29695, %r29693}; 2026-02-21T10:19:38.0915412Z st.shared.v4.b32 [%r29+16384], {%r29728, %r29726, %r29727, %r29725}; 2026-02-21T10:19:38.0915521Z st.shared.v4.b32 [%r30], {%r29700, %r29698, %r29699, %r29697}; 2026-02-21T10:19:38.0915636Z st.shared.v4.b32 [%r30+16384], {%r29732, %r29730, %r29731, %r29729}; 2026-02-21T10:19:38.0915742Z st.shared.v4.b32 [%r31], {%r29704, %r29702, %r29703, %r29701}; 2026-02-21T10:19:38.0915859Z st.shared.v4.b32 [%r31+16384], {%r29736, %r29734, %r29735, %r29733}; 2026-02-21T10:19:38.0916018Z st.shared.v4.b32 [%r32], {%r29708, %r29706, %r29707, %r29705}; 2026-02-21T10:19:38.0916138Z st.shared.v4.b32 [%r32+16384], {%r29740, %r29738, %r29739, %r29737}; 2026-02-21T10:19:38.0916243Z st.shared.v4.b32 [%r33], {%r29712, %r29710, %r29711, %r29709}; 2026-02-21T10:19:38.0916356Z st.shared.v4.b32 [%r33+16384], {%r29744, %r29742, %r29743, %r29741}; 2026-02-21T10:19:38.0916587Z st.shared.v4.b32 [%r34], {%r29716, 
%r29714, %r29715, %r29713}; 2026-02-21T10:19:38.0916712Z st.shared.v4.b32 [%r34+16384], {%r29748, %r29746, %r29747, %r29745}; 2026-02-21T10:19:38.0916852Z $L__tmp19: 2026-02-21T10:19:38.0917129Z .loc 2 291 36 // standard.py:291:36 @[ cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:91:40 ] 2026-02-21T10:19:38.0917202Z // begin inline asm 2026-02-21T10:19:38.0917285Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.0917343Z // end inline asm 2026-02-21T10:19:38.0917398Z bar.sync 0; 2026-02-21T10:19:38.0917469Z wgmma.fence.sync.aligned; 2026-02-21T10:19:38.0917525Z // begin inline asm 2026-02-21T10:19:38.0919058Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r24956,%r24957,%r24958,%r24959}, %rd3, %p224, 1, 1; 2026-02-21T10:19:38.0919122Z // end inline asm 2026-02-21T10:19:38.0919178Z // begin inline asm 2026-02-21T10:19:38.0920649Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r25088,%r25089,%r25090,%r25091}, %rd4, %p224, 1, 1; 
2026-02-21T10:19:38.0920706Z // end inline asm 2026-02-21T10:19:38.0920763Z // begin inline asm 2026-02-21T10:19:38.0922237Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r25220,%r25221,%r25222,%r25223}, %rd5, %p224, 1, 1; 2026-02-21T10:19:38.0922358Z // end inline asm 2026-02-21T10:19:38.0922416Z // begin inline asm 2026-02-21T10:19:38.0923893Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r25352,%r25353,%r25354,%r25355}, %rd6, %p224, 1, 1; 2026-02-21T10:19:38.0923951Z // end inline asm 2026-02-21T10:19:38.0924010Z // begin inline asm 2026-02-21T10:19:38.0925575Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r25484,%r25485,%r25486,%r25487}, %rd7, %p224, 1, 1; 2026-02-21T10:19:38.0925684Z // end inline asm 2026-02-21T10:19:38.0925745Z // begin inline asm 2026-02-21T10:19:38.0927375Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r25616,%r25617,%r25618,%r25619}, %rd8, %p224, 1, 1; 2026-02-21T10:19:38.0927440Z // end inline asm 2026-02-21T10:19:38.0927498Z // begin inline asm 2026-02-21T10:19:38.0929047Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r25748,%r25749,%r25750,%r25751}, %rd9, %p224, 1, 1; 2026-02-21T10:19:38.0929113Z // end inline asm 2026-02-21T10:19:38.0929168Z // begin inline asm 2026-02-21T10:19:38.0930649Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r25880,%r25881,%r25882,%r25883}, %rd10, %p224, 1, 1; 2026-02-21T10:19:38.0930707Z // end inline asm 2026-02-21T10:19:38.0930824Z // begin inline asm 2026-02-21T10:19:38.0932323Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r26012,%r26013,%r26014,%r26015}, %rd3, %p224, 1, 1; 2026-02-21T10:19:38.0932383Z // end inline asm 2026-02-21T10:19:38.0932441Z // begin inline asm 2026-02-21T10:19:38.0933990Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r26144,%r26145,%r26146,%r26147}, %rd4, %p224, 1, 1; 2026-02-21T10:19:38.0934051Z // end inline asm 2026-02-21T10:19:38.0934118Z // begin inline asm 2026-02-21T10:19:38.0935753Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r26276,%r26277,%r26278,%r26279}, %rd5, %p224, 1, 1; 2026-02-21T10:19:38.0935909Z // end inline asm 2026-02-21T10:19:38.0935973Z // begin inline asm 2026-02-21T10:19:38.0937642Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r26408,%r26409,%r26410,%r26411}, %rd6, %p224, 1, 1; 2026-02-21T10:19:38.0937714Z // end inline asm 2026-02-21T10:19:38.0937773Z // begin inline asm 2026-02-21T10:19:38.0939250Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r26540,%r26541,%r26542,%r26543}, %rd7, %p224, 1, 1; 2026-02-21T10:19:38.0939314Z // end inline asm 2026-02-21T10:19:38.0939372Z // begin inline asm 2026-02-21T10:19:38.0940831Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r26672,%r26673,%r26674,%r26675}, %rd8, %p224, 1, 1; 2026-02-21T10:19:38.0940957Z // end inline asm 2026-02-21T10:19:38.0941016Z // begin inline asm 2026-02-21T10:19:38.0942469Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r26804,%r26805,%r26806,%r26807}, %rd9, %p224, 1, 1; 2026-02-21T10:19:38.0942528Z // end inline asm 2026-02-21T10:19:38.0942584Z // begin inline asm 2026-02-21T10:19:38.0944135Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r26936,%r26937,%r26938,%r26939}, %rd10, %p224, 1, 1; 2026-02-21T10:19:38.0944259Z // end inline asm 2026-02-21T10:19:38.0944340Z wgmma.commit_group.sync.aligned; 2026-02-21T10:19:38.0944402Z mov.b32 %r27069, %r29485; 2026-02-21T10:19:38.0944461Z mov.b32 %r27070, %r29485; 2026-02-21T10:19:38.0944519Z mov.b32 %r27068, %r39931; 2026-02-21T10:19:38.0944581Z // begin inline asm 2026-02-21T10:19:38.0947277Z // wait for regs: 
%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r27068,%r27069,%r27070 2026-02-21T10:19:38.0947372Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:19:38.0947428Z // end inline asm 2026-02-21T10:19:38.0947482Z $L__tmp20: 2026-02-21T10:19:38.0947698Z .loc 1 55 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:32 2026-02-21T10:19:38.0947776Z add.s64 %rd582, %rd500, 256; 2026-02-21T10:19:38.0947838Z add.s64 %rd585, %rd503, 256; 2026-02-21T10:19:38.0947902Z add.s64 %rd588, %rd506, 256; 2026-02-21T10:19:38.0947964Z add.s64 %rd591, %rd509, 256; 2026-02-21T10:19:38.0948023Z add.s64 %rd594, %rd512, 256; 2026-02-21T10:19:38.0948081Z add.s64 %rd597, %rd515, 256; 2026-02-21T10:19:38.0948144Z add.s64 %rd600, %rd518, 256; 2026-02-21T10:19:38.0948222Z mad.wide.s32 %rd603, %r42982, 2, %rd117; 2026-02-21T10:19:38.0948488Z .loc 1 55 80 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:80 2026-02-21T10:19:38.0948639Z // begin inline asm 2026-02-21T10:19:38.0948699Z mov.u64 %rd581, 0x0; 
2026-02-21T10:19:38.0948835Z createpolicy.fractional.L2::evict_first.b64 %rd581, 1.0; 2026-02-21T10:19:38.0948891Z // end inline asm 2026-02-21T10:19:38.0948950Z // begin inline asm 2026-02-21T10:19:38.0949011Z mov.u32 %r27202, 0x0; 2026-02-21T10:19:38.0949077Z mov.u32 %r27203, 0x0; 2026-02-21T10:19:38.0949137Z mov.u32 %r27204, 0x0; 2026-02-21T10:19:38.0949191Z mov.u32 %r27205, 0x0; 2026-02-21T10:19:38.0949429Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r27202, %r27203, %r27204, %r27205 }, [ %rd582 + 0 ], %rd581; 2026-02-21T10:19:38.0949491Z // end inline asm 2026-02-21T10:19:38.0949547Z // begin inline asm 2026-02-21T10:19:38.0949603Z mov.u64 %rd584, 0x0; 2026-02-21T10:19:38.0949725Z createpolicy.fractional.L2::evict_first.b64 %rd584, 1.0; 2026-02-21T10:19:38.0949783Z // end inline asm 2026-02-21T10:19:38.0949842Z // begin inline asm 2026-02-21T10:19:38.0949898Z mov.u32 %r27206, 0x0; 2026-02-21T10:19:38.0949954Z mov.u32 %r27207, 0x0; 2026-02-21T10:19:38.0950009Z mov.u32 %r27208, 0x0; 2026-02-21T10:19:38.0950137Z mov.u32 %r27209, 0x0; 2026-02-21T10:19:38.0950368Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r27206, %r27207, %r27208, %r27209 }, [ %rd585 + 0 ], %rd584; 2026-02-21T10:19:38.0950425Z // end inline asm 2026-02-21T10:19:38.0950481Z // begin inline asm 2026-02-21T10:19:38.0950548Z mov.u64 %rd587, 0x0; 2026-02-21T10:19:38.0950670Z createpolicy.fractional.L2::evict_first.b64 %rd587, 1.0; 2026-02-21T10:19:38.0950725Z // end inline asm 2026-02-21T10:19:38.0950845Z // begin inline asm 2026-02-21T10:19:38.0950903Z mov.u32 %r27210, 0x0; 2026-02-21T10:19:38.0950958Z mov.u32 %r27211, 0x0; 2026-02-21T10:19:38.0951014Z mov.u32 %r27212, 0x0; 2026-02-21T10:19:38.0951068Z mov.u32 %r27213, 0x0; 2026-02-21T10:19:38.0951306Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r27210, %r27211, %r27212, %r27213 }, [ %rd588 + 0 ], %rd587; 2026-02-21T10:19:38.0951363Z // end inline asm 2026-02-21T10:19:38.0951422Z // begin inline asm 2026-02-21T10:19:38.0951482Z 
mov.u64 %rd590, 0x0; 2026-02-21T10:19:38.0951606Z createpolicy.fractional.L2::evict_first.b64 %rd590, 1.0; 2026-02-21T10:19:38.0951661Z // end inline asm 2026-02-21T10:19:38.0951720Z // begin inline asm 2026-02-21T10:19:38.0951776Z mov.u32 %r27214, 0x0; 2026-02-21T10:19:38.0951830Z mov.u32 %r27215, 0x0; 2026-02-21T10:19:38.0951887Z mov.u32 %r27216, 0x0; 2026-02-21T10:19:38.0951944Z mov.u32 %r27217, 0x0; 2026-02-21T10:19:38.0952224Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r27214, %r27215, %r27216, %r27217 }, [ %rd591 + 0 ], %rd590; 2026-02-21T10:19:38.0952283Z // end inline asm 2026-02-21T10:19:38.0952341Z // begin inline asm 2026-02-21T10:19:38.0952396Z mov.u64 %rd593, 0x0; 2026-02-21T10:19:38.0952516Z createpolicy.fractional.L2::evict_first.b64 %rd593, 1.0; 2026-02-21T10:19:38.0952581Z // end inline asm 2026-02-21T10:19:38.0952644Z // begin inline asm 2026-02-21T10:19:38.0952700Z mov.u32 %r27218, 0x0; 2026-02-21T10:19:38.0952755Z mov.u32 %r27219, 0x0; 2026-02-21T10:19:38.0952812Z mov.u32 %r27220, 0x0; 2026-02-21T10:19:38.0952867Z mov.u32 %r27221, 0x0; 2026-02-21T10:19:38.0953092Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r27218, %r27219, %r27220, %r27221 }, [ %rd594 + 0 ], %rd593; 2026-02-21T10:19:38.0953152Z // end inline asm 2026-02-21T10:19:38.0953208Z // begin inline asm 2026-02-21T10:19:38.0953264Z mov.u64 %rd596, 0x0; 2026-02-21T10:19:38.0953381Z createpolicy.fractional.L2::evict_first.b64 %rd596, 1.0; 2026-02-21T10:19:38.0953438Z // end inline asm 2026-02-21T10:19:38.0953495Z // begin inline asm 2026-02-21T10:19:38.0953551Z mov.u32 %r27222, 0x0; 2026-02-21T10:19:38.0953607Z mov.u32 %r27223, 0x0; 2026-02-21T10:19:38.0953662Z mov.u32 %r27224, 0x0; 2026-02-21T10:19:38.0953717Z mov.u32 %r27225, 0x0; 2026-02-21T10:19:38.0953940Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r27222, %r27223, %r27224, %r27225 }, [ %rd597 + 0 ], %rd596; 2026-02-21T10:19:38.0954082Z // end inline asm 2026-02-21T10:19:38.0954138Z // begin inline asm 
2026-02-21T10:19:38.0954193Z mov.u64 %rd599, 0x0; 2026-02-21T10:19:38.0954312Z createpolicy.fractional.L2::evict_first.b64 %rd599, 1.0; 2026-02-21T10:19:38.0954367Z // end inline asm 2026-02-21T10:19:38.0954423Z // begin inline asm 2026-02-21T10:19:38.0954491Z mov.u32 %r27226, 0x0; 2026-02-21T10:19:38.0954549Z mov.u32 %r27227, 0x0; 2026-02-21T10:19:38.0954604Z mov.u32 %r27228, 0x0; 2026-02-21T10:19:38.0954659Z mov.u32 %r27229, 0x0; 2026-02-21T10:19:38.0954885Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r27226, %r27227, %r27228, %r27229 }, [ %rd600 + 0 ], %rd599; 2026-02-21T10:19:38.0954942Z // end inline asm 2026-02-21T10:19:38.0954998Z // begin inline asm 2026-02-21T10:19:38.0955056Z mov.u64 %rd602, 0x0; 2026-02-21T10:19:38.0955172Z createpolicy.fractional.L2::evict_first.b64 %rd602, 1.0; 2026-02-21T10:19:38.0955226Z // end inline asm 2026-02-21T10:19:38.0955286Z // begin inline asm 2026-02-21T10:19:38.0955342Z mov.u32 %r27230, 0x0; 2026-02-21T10:19:38.0955397Z mov.u32 %r27231, 0x0; 2026-02-21T10:19:38.0955451Z mov.u32 %r27232, 0x0; 2026-02-21T10:19:38.0955561Z mov.u32 %r27233, 0x0; 2026-02-21T10:19:38.0955784Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r27230, %r27231, %r27232, %r27233 }, [ %rd603 + 0 ], %rd602; 2026-02-21T10:19:38.0955839Z // end inline asm 2026-02-21T10:19:38.0956044Z .loc 1 59 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:59:32 2026-02-21T10:19:38.0956100Z bar.sync 0; 2026-02-21T10:19:38.0956182Z st.shared.v2.b32 [%r9], {%r27202, %r27203}; 2026-02-21T10:19:38.0956334Z st.shared.v2.b32 [%r9+2048], {%r27206, %r27207}; 2026-02-21T10:19:38.0956419Z st.shared.v2.b32 [%r9+4096], {%r27210, %r27211}; 2026-02-21T10:19:38.0956631Z st.shared.v2.b32 [%r9+6144], {%r27214, %r27215}; 2026-02-21T10:19:38.0956718Z st.shared.v2.b32 [%r9+8192], {%r27218, %r27219}; 2026-02-21T10:19:38.0956813Z st.shared.v2.b32 [%r9+10240], {%r27222, %r27223}; 2026-02-21T10:19:38.0956897Z st.shared.v2.b32 [%r9+12288], {%r27226, %r27227}; 
2026-02-21T10:19:38.0956983Z st.shared.v2.b32 [%r9+14336], {%r27230, %r27231}; 2026-02-21T10:19:38.0957076Z st.shared.v2.b32 [%r10], {%r27204, %r27205}; 2026-02-21T10:19:38.0957162Z st.shared.v2.b32 [%r10+2048], {%r27208, %r27209}; 2026-02-21T10:19:38.0957242Z st.shared.v2.b32 [%r10+4096], {%r27212, %r27213}; 2026-02-21T10:19:38.0957328Z st.shared.v2.b32 [%r10+6144], {%r27216, %r27217}; 2026-02-21T10:19:38.0957408Z st.shared.v2.b32 [%r10+8192], {%r27220, %r27221}; 2026-02-21T10:19:38.0957498Z st.shared.v2.b32 [%r10+10240], {%r27224, %r27225}; 2026-02-21T10:19:38.0957660Z st.shared.v2.b32 [%r10+12288], {%r27228, %r27229}; 2026-02-21T10:19:38.0957752Z st.shared.v2.b32 [%r10+14336], {%r27232, %r27233}; 2026-02-21T10:19:38.0957807Z bar.sync 0; 2026-02-21T10:19:38.0957876Z ld.shared.b16 %rs2241, [%r51]; 2026-02-21T10:19:38.0957947Z ld.shared.b16 %rs2242, [%r51+1024]; 2026-02-21T10:19:38.0958015Z ld.shared.b16 %rs2243, [%r51+64]; 2026-02-21T10:19:38.0958080Z ld.shared.b16 %rs2244, [%r51+1088]; 2026-02-21T10:19:38.0958157Z ld.shared.b16 %rs2245, [%r51+8192]; 2026-02-21T10:19:38.0958224Z ld.shared.b16 %rs2246, [%r51+9216]; 2026-02-21T10:19:38.0958288Z ld.shared.b16 %rs2247, [%r51+8256]; 2026-02-21T10:19:38.0958350Z ld.shared.b16 %rs2248, [%r51+9280]; 2026-02-21T10:19:38.0958418Z ld.shared.b16 %rs2249, [%r52]; 2026-02-21T10:19:38.0958483Z ld.shared.b16 %rs2250, [%r52+1024]; 2026-02-21T10:19:38.0958546Z ld.shared.b16 %rs2251, [%r52+64]; 2026-02-21T10:19:38.0958610Z ld.shared.b16 %rs2252, [%r52+1088]; 2026-02-21T10:19:38.0958676Z ld.shared.b16 %rs2253, [%r52+8192]; 2026-02-21T10:19:38.0958740Z ld.shared.b16 %rs2254, [%r52+9216]; 2026-02-21T10:19:38.0958802Z ld.shared.b16 %rs2255, [%r52+8256]; 2026-02-21T10:19:38.0958867Z ld.shared.b16 %rs2256, [%r52+9280]; 2026-02-21T10:19:38.0958929Z ld.shared.b16 %rs2257, [%r53]; 2026-02-21T10:19:38.0959067Z ld.shared.b16 %rs2258, [%r53+1024]; 2026-02-21T10:19:38.0959132Z ld.shared.b16 %rs2259, [%r53+64]; 2026-02-21T10:19:38.0959197Z 
ld.shared.b16 %rs2260, [%r53+1088]; 2026-02-21T10:19:38.0959260Z ld.shared.b16 %rs2261, [%r53+8192]; 2026-02-21T10:19:38.0959323Z ld.shared.b16 %rs2262, [%r53+9216]; 2026-02-21T10:19:38.0959388Z ld.shared.b16 %rs2263, [%r53+8256]; 2026-02-21T10:19:38.0959449Z ld.shared.b16 %rs2264, [%r53+9280]; 2026-02-21T10:19:38.0959510Z ld.shared.b16 %rs2265, [%r54]; 2026-02-21T10:19:38.0959576Z ld.shared.b16 %rs2266, [%r54+1024]; 2026-02-21T10:19:38.0959639Z ld.shared.b16 %rs2267, [%r54+64]; 2026-02-21T10:19:38.0959702Z ld.shared.b16 %rs2268, [%r54+1088]; 2026-02-21T10:19:38.0959768Z ld.shared.b16 %rs2269, [%r54+8192]; 2026-02-21T10:19:38.0959831Z ld.shared.b16 %rs2270, [%r54+9216]; 2026-02-21T10:19:38.0959893Z ld.shared.b16 %rs2271, [%r54+8256]; 2026-02-21T10:19:38.0959954Z ld.shared.b16 %rs2272, [%r54+9280]; 2026-02-21T10:19:38.0960019Z ld.shared.b16 %rs2273, [%r55]; 2026-02-21T10:19:38.0960082Z ld.shared.b16 %rs2274, [%r55+1024]; 2026-02-21T10:19:38.0960144Z ld.shared.b16 %rs2275, [%r55+64]; 2026-02-21T10:19:38.0960285Z ld.shared.b16 %rs2276, [%r55+1088]; 2026-02-21T10:19:38.0960356Z ld.shared.b16 %rs2277, [%r55+8192]; 2026-02-21T10:19:38.0960420Z ld.shared.b16 %rs2278, [%r55+9216]; 2026-02-21T10:19:38.0960484Z ld.shared.b16 %rs2279, [%r55+8256]; 2026-02-21T10:19:38.0960548Z ld.shared.b16 %rs2280, [%r55+9280]; 2026-02-21T10:19:38.0960610Z ld.shared.b16 %rs2281, [%r56]; 2026-02-21T10:19:38.0960675Z ld.shared.b16 %rs2282, [%r56+1024]; 2026-02-21T10:19:38.0960740Z ld.shared.b16 %rs2283, [%r56+64]; 2026-02-21T10:19:38.0960865Z ld.shared.b16 %rs2284, [%r56+1088]; 2026-02-21T10:19:38.0960927Z ld.shared.b16 %rs2285, [%r56+8192]; 2026-02-21T10:19:38.0960990Z ld.shared.b16 %rs2286, [%r56+9216]; 2026-02-21T10:19:38.0961052Z ld.shared.b16 %rs2287, [%r56+8256]; 2026-02-21T10:19:38.0961114Z ld.shared.b16 %rs2288, [%r56+9280]; 2026-02-21T10:19:38.0961178Z ld.shared.b16 %rs2289, [%r57]; 2026-02-21T10:19:38.0961244Z ld.shared.b16 %rs2290, [%r57+1024]; 2026-02-21T10:19:38.0961306Z 
ld.shared.b16 %rs2291, [%r57+64]; 2026-02-21T10:19:38.0961369Z ld.shared.b16 %rs2292, [%r57+1088]; 2026-02-21T10:19:38.0961433Z ld.shared.b16 %rs2293, [%r57+8192]; 2026-02-21T10:19:38.0961495Z ld.shared.b16 %rs2294, [%r57+9216]; 2026-02-21T10:19:38.0961556Z ld.shared.b16 %rs2295, [%r57+8256]; 2026-02-21T10:19:38.0961618Z ld.shared.b16 %rs2296, [%r57+9280]; 2026-02-21T10:19:38.0961681Z ld.shared.b16 %rs2297, [%r58]; 2026-02-21T10:19:38.0961743Z ld.shared.b16 %rs2298, [%r58+1024]; 2026-02-21T10:19:38.0961807Z ld.shared.b16 %rs2299, [%r58+64]; 2026-02-21T10:19:38.0961932Z ld.shared.b16 %rs2300, [%r58+1088]; 2026-02-21T10:19:38.0962000Z ld.shared.b16 %rs2301, [%r58+8192]; 2026-02-21T10:19:38.0962065Z ld.shared.b16 %rs2302, [%r58+9216]; 2026-02-21T10:19:38.0962128Z ld.shared.b16 %rs2303, [%r58+8256]; 2026-02-21T10:19:38.0962189Z ld.shared.b16 %rs2304, [%r58+9280]; 2026-02-21T10:19:38.0962251Z cvt.f32.bf16 %r27371, %rs2241; 2026-02-21T10:19:38.0962311Z cvt.f32.bf16 %r27372, %rs2242; 2026-02-21T10:19:38.0962374Z cvt.f32.bf16 %r27373, %rs2249; 2026-02-21T10:19:38.0962433Z cvt.f32.bf16 %r27374, %rs2250; 2026-02-21T10:19:38.0962491Z cvt.f32.bf16 %r27503, %rs2257; 2026-02-21T10:19:38.0962554Z cvt.f32.bf16 %r27504, %rs2258; 2026-02-21T10:19:38.0962612Z cvt.f32.bf16 %r27505, %rs2265; 2026-02-21T10:19:38.0962669Z cvt.f32.bf16 %r27506, %rs2266; 2026-02-21T10:19:38.0962726Z cvt.f32.bf16 %r27635, %rs2273; 2026-02-21T10:19:38.0962786Z cvt.f32.bf16 %r27636, %rs2274; 2026-02-21T10:19:38.0962846Z cvt.f32.bf16 %r27637, %rs2281; 2026-02-21T10:19:38.0962906Z cvt.f32.bf16 %r27638, %rs2282; 2026-02-21T10:19:38.0962966Z cvt.f32.bf16 %r27767, %rs2289; 2026-02-21T10:19:38.0963023Z cvt.f32.bf16 %r27768, %rs2290; 2026-02-21T10:19:38.0963093Z cvt.f32.bf16 %r27769, %rs2297; 2026-02-21T10:19:38.0963154Z cvt.f32.bf16 %r27770, %rs2298; 2026-02-21T10:19:38.0963268Z cvt.f32.bf16 %r27899, %rs2243; 2026-02-21T10:19:38.0963327Z cvt.f32.bf16 %r27900, %rs2244; 2026-02-21T10:19:38.0963385Z cvt.f32.bf16 
%r27901, %rs2251; 2026-02-21T10:19:38.0963448Z cvt.f32.bf16 %r27902, %rs2252; 2026-02-21T10:19:38.0963507Z cvt.f32.bf16 %r28031, %rs2259; 2026-02-21T10:19:38.0963565Z cvt.f32.bf16 %r28032, %rs2260; 2026-02-21T10:19:38.0963625Z cvt.f32.bf16 %r28033, %rs2267; 2026-02-21T10:19:38.0963684Z cvt.f32.bf16 %r28034, %rs2268; 2026-02-21T10:19:38.0963741Z cvt.f32.bf16 %r28163, %rs2275; 2026-02-21T10:19:38.0963797Z cvt.f32.bf16 %r28164, %rs2276; 2026-02-21T10:19:38.0963857Z cvt.f32.bf16 %r28165, %rs2283; 2026-02-21T10:19:38.0963918Z cvt.f32.bf16 %r28166, %rs2284; 2026-02-21T10:19:38.0963977Z cvt.f32.bf16 %r28295, %rs2291; 2026-02-21T10:19:38.0964037Z cvt.f32.bf16 %r28296, %rs2292; 2026-02-21T10:19:38.0964095Z cvt.f32.bf16 %r28297, %rs2299; 2026-02-21T10:19:38.0964153Z cvt.f32.bf16 %r28298, %rs2300; 2026-02-21T10:19:38.0964210Z cvt.f32.bf16 %r28427, %rs2245; 2026-02-21T10:19:38.0964271Z cvt.f32.bf16 %r28428, %rs2246; 2026-02-21T10:19:38.0964329Z cvt.f32.bf16 %r28429, %rs2253; 2026-02-21T10:19:38.0964437Z cvt.f32.bf16 %r28430, %rs2254; 2026-02-21T10:19:38.0964499Z cvt.f32.bf16 %r28559, %rs2261; 2026-02-21T10:19:38.0964556Z cvt.f32.bf16 %r28560, %rs2262; 2026-02-21T10:19:38.0964613Z cvt.f32.bf16 %r28561, %rs2269; 2026-02-21T10:19:38.0964671Z cvt.f32.bf16 %r28562, %rs2270; 2026-02-21T10:19:38.0964734Z cvt.f32.bf16 %r28691, %rs2277; 2026-02-21T10:19:38.0964791Z cvt.f32.bf16 %r28692, %rs2278; 2026-02-21T10:19:38.0964847Z cvt.f32.bf16 %r28693, %rs2285; 2026-02-21T10:19:38.0964963Z cvt.f32.bf16 %r28694, %rs2286; 2026-02-21T10:19:38.0965024Z cvt.f32.bf16 %r28823, %rs2293; 2026-02-21T10:19:38.0965082Z cvt.f32.bf16 %r28824, %rs2294; 2026-02-21T10:19:38.0965142Z cvt.f32.bf16 %r28825, %rs2301; 2026-02-21T10:19:38.0965203Z cvt.f32.bf16 %r28826, %rs2302; 2026-02-21T10:19:38.0965260Z cvt.f32.bf16 %r28955, %rs2247; 2026-02-21T10:19:38.0965319Z cvt.f32.bf16 %r28956, %rs2248; 2026-02-21T10:19:38.0965377Z cvt.f32.bf16 %r28957, %rs2255; 2026-02-21T10:19:38.0965436Z cvt.f32.bf16 %r28958, %rs2256; 
2026-02-21T10:19:38.0965496Z cvt.f32.bf16 %r29087, %rs2263; 2026-02-21T10:19:38.0965557Z cvt.f32.bf16 %r29088, %rs2264; 2026-02-21T10:19:38.0965616Z cvt.f32.bf16 %r29089, %rs2271; 2026-02-21T10:19:38.0965675Z cvt.f32.bf16 %r29090, %rs2272; 2026-02-21T10:19:38.0965734Z cvt.f32.bf16 %r29219, %rs2279; 2026-02-21T10:19:38.0965795Z cvt.f32.bf16 %r29220, %rs2280; 2026-02-21T10:19:38.0965851Z cvt.f32.bf16 %r29221, %rs2287; 2026-02-21T10:19:38.0965910Z cvt.f32.bf16 %r29222, %rs2288; 2026-02-21T10:19:38.0966025Z cvt.f32.bf16 %r29351, %rs2295; 2026-02-21T10:19:38.0966086Z cvt.f32.bf16 %r29352, %rs2296; 2026-02-21T10:19:38.0966142Z cvt.f32.bf16 %r29353, %rs2303; 2026-02-21T10:19:38.0966199Z cvt.f32.bf16 %r29354, %rs2304; 2026-02-21T10:19:38.0966406Z .loc 1 61 33 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:61:33 2026-02-21T10:19:38.0966586Z bar.sync 0; 2026-02-21T10:19:38.0966649Z // begin inline asm 2026-02-21T10:19:38.0966755Z @%p313 mbarrier.init.shared::cta.b64 [%r29846], 1; 2026-02-21T10:19:38.0966812Z // end inline asm 2026-02-21T10:19:38.0966866Z bar.sync 0; 2026-02-21T10:19:38.0966925Z // begin inline asm 2026-02-21T10:19:38.0967071Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r29846], 4096; 2026-02-21T10:19:38.0967129Z // end inline asm 2026-02-21T10:19:38.0967187Z // begin inline asm 2026-02-21T10:19:38.0967266Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.0967319Z // end inline asm 2026-02-21T10:19:38.0967374Z bar.sync 0; 2026-02-21T10:19:38.0967442Z elect.sync %r29749|%p283, -1; 2026-02-21T10:19:38.0967511Z and.pred %p262, %p1, %p283; 2026-02-21T10:19:38.0967573Z add.s32 %r27238, %r29684, 160; 2026-02-21T10:19:38.0967628Z // begin inline asm 2026-02-21T10:19:38.0967967Z @%p262 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39931], [%rd822, {%r29849, %r27238}], [%r29846]; 2026-02-21T10:19:38.0968113Z // end inline asm 2026-02-21T10:19:38.0968165Z bar.sync 0; 2026-02-21T10:19:38.0968225Z // begin inline asm 
2026-02-21T10:19:38.0968276Z 2026-02-21T10:19:38.0968325Z { 2026-02-21T10:19:38.0968391Z .reg .pred complete; 2026-02-21T10:19:38.0968446Z waitLoop: 2026-02-21T10:19:38.0968590Z mbarrier.try_wait.parity.shared.b64 complete, [%r29846], %r29485; 2026-02-21T10:19:38.0968658Z @!complete bra.uni waitLoop; 2026-02-21T10:19:38.0968709Z } 2026-02-21T10:19:38.0968714Z 2026-02-21T10:19:38.0968769Z // end inline asm 2026-02-21T10:19:38.0968822Z bar.sync 0; 2026-02-21T10:19:38.0968886Z // begin inline asm 2026-02-21T10:19:38.0968984Z @%p313 mbarrier.inval.shared::cta.b64 [%r29846]; 2026-02-21T10:19:38.0969040Z // end inline asm 2026-02-21T10:19:38.0969238Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0969306Z ld.shared.s8 %rs2305, [%r19]; 2026-02-21T10:19:38.0969502Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0969565Z shl.b16 %rs2306, %rs2305, 4; 2026-02-21T10:19:38.0969838Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0969908Z ld.shared.s8 %rs2307, [%r20+128]; 2026-02-21T10:19:38.0970096Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0970161Z shl.b16 %rs2308, %rs2307, 4; 2026-02-21T10:19:38.0970349Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0970485Z ld.shared.s8 %rs2309, [%r21+256]; 2026-02-21T10:19:38.0970684Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0970746Z shl.b16 %rs2310, %rs2309, 4; 2026-02-21T10:19:38.0970936Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0971000Z ld.shared.s8 %rs2311, [%r22+384]; 2026-02-21T10:19:38.0971191Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0971252Z shl.b16 %rs2312, %rs2311, 4; 2026-02-21T10:19:38.0971438Z .loc 1 66 
25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0971505Z ld.shared.s8 %rs2313, [%r23+512]; 2026-02-21T10:19:38.0971691Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0971834Z shl.b16 %rs2314, %rs2313, 4; 2026-02-21T10:19:38.0972031Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0972093Z ld.shared.s8 %rs2315, [%r24+640]; 2026-02-21T10:19:38.0972279Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0972354Z shl.b16 %rs2316, %rs2315, 4; 2026-02-21T10:19:38.0972553Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0972621Z ld.shared.s8 %rs2317, [%r25+768]; 2026-02-21T10:19:38.0972813Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0972878Z shl.b16 %rs2318, %rs2317, 4; 2026-02-21T10:19:38.0973066Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0973131Z ld.shared.s8 %rs2319, [%r26+896]; 2026-02-21T10:19:38.0973326Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0973386Z shl.b16 %rs2320, %rs2319, 4; 2026-02-21T10:19:38.0973571Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0973697Z ld.shared.s8 %rs2321, [%r19+1024]; 2026-02-21T10:19:38.0973884Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0973944Z shl.b16 %rs2322, %rs2321, 4; 2026-02-21T10:19:38.0974140Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0974208Z ld.shared.s8 %rs2323, [%r20+1152]; 2026-02-21T10:19:38.0974403Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0974466Z shl.b16 %rs2324, %rs2323, 4; 
2026-02-21T10:19:38.0974664Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0974728Z ld.shared.s8 %rs2325, [%r21+1280]; 2026-02-21T10:19:38.0974928Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0974994Z shl.b16 %rs2326, %rs2325, 4; 2026-02-21T10:19:38.0975185Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0975250Z ld.shared.s8 %rs2327, [%r22+1408]; 2026-02-21T10:19:38.0975495Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0975557Z shl.b16 %rs2328, %rs2327, 4; 2026-02-21T10:19:38.0975746Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0975810Z ld.shared.s8 %rs2329, [%r23+1536]; 2026-02-21T10:19:38.0975999Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0976109Z shl.b16 %rs2330, %rs2329, 4; 2026-02-21T10:19:38.0976298Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0976365Z ld.shared.s8 %rs2331, [%r24+1664]; 2026-02-21T10:19:38.0976686Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0976753Z shl.b16 %rs2332, %rs2331, 4; 2026-02-21T10:19:38.0976960Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0977026Z ld.shared.s8 %rs2333, [%r25+1792]; 2026-02-21T10:19:38.0977214Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0977278Z shl.b16 %rs2334, %rs2333, 4; 2026-02-21T10:19:38.0977466Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0977605Z ld.shared.s8 %rs2335, [%r26+1920]; 2026-02-21T10:19:38.0977800Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0977860Z shl.b16 
%rs2336, %rs2335, 4; 2026-02-21T10:19:38.0978049Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0978116Z ld.shared.s8 %rs2337, [%r19+2048]; 2026-02-21T10:19:38.0978305Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0978368Z shl.b16 %rs2338, %rs2337, 4; 2026-02-21T10:19:38.0978556Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0978623Z ld.shared.s8 %rs2339, [%r20+2176]; 2026-02-21T10:19:38.0978811Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0978870Z shl.b16 %rs2340, %rs2339, 4; 2026-02-21T10:19:38.0979080Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0979145Z ld.shared.s8 %rs2341, [%r21+2304]; 2026-02-21T10:19:38.0979331Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0979467Z shl.b16 %rs2342, %rs2341, 4; 2026-02-21T10:19:38.0979654Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0979718Z ld.shared.s8 %rs2343, [%r22+2432]; 2026-02-21T10:19:38.0979907Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0979966Z shl.b16 %rs2344, %rs2343, 4; 2026-02-21T10:19:38.0980156Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0980220Z ld.shared.s8 %rs2345, [%r23+2560]; 2026-02-21T10:19:38.0980411Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0980471Z shl.b16 %rs2346, %rs2345, 4; 2026-02-21T10:19:38.0980658Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0980724Z ld.shared.s8 %rs2347, [%r24+2688]; 2026-02-21T10:19:38.0980913Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 
2026-02-21T10:19:38.0980973Z shl.b16 %rs2348, %rs2347, 4; 2026-02-21T10:19:38.0981227Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0981293Z ld.shared.s8 %rs2349, [%r25+2816]; 2026-02-21T10:19:38.0981480Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0981543Z shl.b16 %rs2350, %rs2349, 4; 2026-02-21T10:19:38.0981728Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0981863Z ld.shared.s8 %rs2351, [%r26+2944]; 2026-02-21T10:19:38.0982051Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0982114Z shl.b16 %rs2352, %rs2351, 4; 2026-02-21T10:19:38.0982314Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0982381Z ld.shared.s8 %rs2353, [%r19+3072]; 2026-02-21T10:19:38.0982575Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0982635Z shl.b16 %rs2354, %rs2353, 4; 2026-02-21T10:19:38.0982822Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0982888Z ld.shared.s8 %rs2355, [%r20+3200]; 2026-02-21T10:19:38.0983074Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0983182Z shl.b16 %rs2356, %rs2355, 4; 2026-02-21T10:19:38.0983377Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0983441Z ld.shared.s8 %rs2357, [%r21+3328]; 2026-02-21T10:19:38.0983629Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0983691Z shl.b16 %rs2358, %rs2357, 4; 2026-02-21T10:19:38.0983883Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0983957Z ld.shared.s8 %rs2359, [%r22+3456]; 2026-02-21T10:19:38.0984147Z .loc 1 64 28 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0984208Z shl.b16 %rs2360, %rs2359, 4; 2026-02-21T10:19:38.0984395Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0984458Z ld.shared.s8 %rs2361, [%r23+3584]; 2026-02-21T10:19:38.0984648Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0984708Z shl.b16 %rs2362, %rs2361, 4; 2026-02-21T10:19:38.0984898Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0985013Z ld.shared.s8 %rs2363, [%r24+3712]; 2026-02-21T10:19:38.0985202Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0985262Z shl.b16 %rs2364, %rs2363, 4; 2026-02-21T10:19:38.0985449Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0985514Z ld.shared.s8 %rs2365, [%r25+3840]; 2026-02-21T10:19:38.0985703Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0985761Z shl.b16 %rs2366, %rs2365, 4; 2026-02-21T10:19:38.0985968Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0986032Z ld.shared.s8 %rs2367, [%r26+3968]; 2026-02-21T10:19:38.0986220Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.0986283Z shl.b16 %rs2368, %rs2367, 4; 2026-02-21T10:19:38.0986587Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0986654Z cvt.s16.s8 %rs2369, %rs2306; 2026-02-21T10:19:38.0986797Z shr.s16 %rs2370, %rs2369, 4; 2026-02-21T10:19:38.0986863Z cvt.s16.s8 %rs2371, %rs2308; 2026-02-21T10:19:38.0986935Z shr.s16 %rs2372, %rs2371, 4; 2026-02-21T10:19:38.0986996Z shr.s16 %rs2373, %rs2305, 4; 2026-02-21T10:19:38.0987056Z shr.s16 %rs2374, %rs2307, 4; 2026-02-21T10:19:38.0987252Z .loc 1 84 32 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0987318Z cvt.rn.f32.s16 %r29750, %rs2374; 2026-02-21T10:19:38.0987450Z cvt.rn.f32.s16 %r29751, %rs2373; 2026-02-21T10:19:38.0987511Z cvt.rn.f32.s16 %r29752, %rs2372; 2026-02-21T10:19:38.0987571Z cvt.rn.f32.s16 %r29753, %rs2370; 2026-02-21T10:19:38.0987771Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0987834Z cvt.s16.s8 %rs2375, %rs2310; 2026-02-21T10:19:38.0987893Z shr.s16 %rs2376, %rs2375, 4; 2026-02-21T10:19:38.0987952Z cvt.s16.s8 %rs2377, %rs2312; 2026-02-21T10:19:38.0988016Z shr.s16 %rs2378, %rs2377, 4; 2026-02-21T10:19:38.0988076Z shr.s16 %rs2379, %rs2309, 4; 2026-02-21T10:19:38.0988135Z shr.s16 %rs2380, %rs2311, 4; 2026-02-21T10:19:38.0988327Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0988469Z cvt.rn.f32.s16 %r29754, %rs2380; 2026-02-21T10:19:38.0988534Z cvt.rn.f32.s16 %r29755, %rs2379; 2026-02-21T10:19:38.0988595Z cvt.rn.f32.s16 %r29756, %rs2378; 2026-02-21T10:19:38.0988729Z cvt.rn.f32.s16 %r29757, %rs2376; 2026-02-21T10:19:38.0988923Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0988984Z cvt.s16.s8 %rs2381, %rs2314; 2026-02-21T10:19:38.0989045Z shr.s16 %rs2382, %rs2381, 4; 2026-02-21T10:19:38.0989104Z cvt.s16.s8 %rs2383, %rs2316; 2026-02-21T10:19:38.0989166Z shr.s16 %rs2384, %rs2383, 4; 2026-02-21T10:19:38.0989225Z shr.s16 %rs2385, %rs2313, 4; 2026-02-21T10:19:38.0989284Z shr.s16 %rs2386, %rs2315, 4; 2026-02-21T10:19:38.0989474Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0989535Z cvt.rn.f32.s16 %r29758, %rs2386; 2026-02-21T10:19:38.0989599Z cvt.rn.f32.s16 %r29759, %rs2385; 2026-02-21T10:19:38.0989659Z cvt.rn.f32.s16 %r29760, %rs2384; 2026-02-21T10:19:38.0989720Z cvt.rn.f32.s16 %r29761, %rs2382; 2026-02-21T10:19:38.0989909Z .loc 1 66 25 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0989971Z cvt.s16.s8 %rs2387, %rs2318; 2026-02-21T10:19:38.0990032Z shr.s16 %rs2388, %rs2387, 4; 2026-02-21T10:19:38.0990096Z cvt.s16.s8 %rs2389, %rs2320; 2026-02-21T10:19:38.0990155Z shr.s16 %rs2390, %rs2389, 4; 2026-02-21T10:19:38.0990213Z shr.s16 %rs2391, %rs2317, 4; 2026-02-21T10:19:38.0990341Z shr.s16 %rs2392, %rs2319, 4; 2026-02-21T10:19:38.0990532Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0990595Z cvt.rn.f32.s16 %r29762, %rs2392; 2026-02-21T10:19:38.0990655Z cvt.rn.f32.s16 %r29763, %rs2391; 2026-02-21T10:19:38.0990719Z cvt.rn.f32.s16 %r29764, %rs2390; 2026-02-21T10:19:38.0990779Z cvt.rn.f32.s16 %r29765, %rs2388; 2026-02-21T10:19:38.0990965Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0991027Z cvt.s16.s8 %rs2393, %rs2322; 2026-02-21T10:19:38.0991088Z shr.s16 %rs2394, %rs2393, 4; 2026-02-21T10:19:38.0991147Z cvt.s16.s8 %rs2395, %rs2324; 2026-02-21T10:19:38.0991207Z shr.s16 %rs2396, %rs2395, 4; 2026-02-21T10:19:38.0991278Z shr.s16 %rs2397, %rs2321, 4; 2026-02-21T10:19:38.0991342Z shr.s16 %rs2398, %rs2323, 4; 2026-02-21T10:19:38.0991533Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0991600Z cvt.rn.f32.s16 %r29766, %rs2398; 2026-02-21T10:19:38.0991660Z cvt.rn.f32.s16 %r29767, %rs2397; 2026-02-21T10:19:38.0991774Z cvt.rn.f32.s16 %r29768, %rs2396; 2026-02-21T10:19:38.0991837Z cvt.rn.f32.s16 %r29769, %rs2394; 2026-02-21T10:19:38.0992029Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0992089Z cvt.s16.s8 %rs2399, %rs2326; 2026-02-21T10:19:38.0992149Z shr.s16 %rs2400, %rs2399, 4; 2026-02-21T10:19:38.0992211Z cvt.s16.s8 %rs2401, %rs2328; 2026-02-21T10:19:38.0992270Z shr.s16 %rs2402, %rs2401, 4; 2026-02-21T10:19:38.0992397Z shr.s16 %rs2403, %rs2325, 4; 
2026-02-21T10:19:38.0992458Z shr.s16 %rs2404, %rs2327, 4; 2026-02-21T10:19:38.0992647Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0992708Z cvt.rn.f32.s16 %r29770, %rs2404; 2026-02-21T10:19:38.0992770Z cvt.rn.f32.s16 %r29771, %rs2403; 2026-02-21T10:19:38.0992835Z cvt.rn.f32.s16 %r29772, %rs2402; 2026-02-21T10:19:38.0992895Z cvt.rn.f32.s16 %r29773, %rs2400; 2026-02-21T10:19:38.0993082Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0993146Z cvt.s16.s8 %rs2405, %rs2330; 2026-02-21T10:19:38.0993217Z shr.s16 %rs2406, %rs2405, 4; 2026-02-21T10:19:38.0993279Z cvt.s16.s8 %rs2407, %rs2332; 2026-02-21T10:19:38.0993341Z shr.s16 %rs2408, %rs2407, 4; 2026-02-21T10:19:38.0993400Z shr.s16 %rs2409, %rs2329, 4; 2026-02-21T10:19:38.0993461Z shr.s16 %rs2410, %rs2331, 4; 2026-02-21T10:19:38.0993702Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0993769Z cvt.rn.f32.s16 %r29774, %rs2410; 2026-02-21T10:19:38.0993830Z cvt.rn.f32.s16 %r29775, %rs2409; 2026-02-21T10:19:38.0993890Z cvt.rn.f32.s16 %r29776, %rs2408; 2026-02-21T10:19:38.0993953Z cvt.rn.f32.s16 %r29777, %rs2406; 2026-02-21T10:19:38.0994144Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0994204Z cvt.s16.s8 %rs2411, %rs2334; 2026-02-21T10:19:38.0994268Z shr.s16 %rs2412, %rs2411, 4; 2026-02-21T10:19:38.0994329Z cvt.s16.s8 %rs2413, %rs2336; 2026-02-21T10:19:38.0994388Z shr.s16 %rs2414, %rs2413, 4; 2026-02-21T10:19:38.0994446Z shr.s16 %rs2415, %rs2333, 4; 2026-02-21T10:19:38.0994505Z shr.s16 %rs2416, %rs2335, 4; 2026-02-21T10:19:38.0994704Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0994771Z cvt.rn.f32.s16 %r29778, %rs2416; 2026-02-21T10:19:38.0994837Z cvt.rn.f32.s16 %r29779, %rs2415; 2026-02-21T10:19:38.0994899Z cvt.rn.f32.s16 %r29780, %rs2414; 2026-02-21T10:19:38.0994958Z 
cvt.rn.f32.s16 %r29781, %rs2412; 2026-02-21T10:19:38.0995165Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0995283Z cvt.s16.s8 %rs2417, %rs2338; 2026-02-21T10:19:38.0995343Z shr.s16 %rs2418, %rs2417, 4; 2026-02-21T10:19:38.0995402Z cvt.s16.s8 %rs2419, %rs2340; 2026-02-21T10:19:38.0995465Z shr.s16 %rs2420, %rs2419, 4; 2026-02-21T10:19:38.0995523Z shr.s16 %rs2421, %rs2337, 4; 2026-02-21T10:19:38.0995581Z shr.s16 %rs2422, %rs2339, 4; 2026-02-21T10:19:38.0995773Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0995836Z cvt.rn.f32.s16 %r29782, %rs2422; 2026-02-21T10:19:38.0995897Z cvt.rn.f32.s16 %r29783, %rs2421; 2026-02-21T10:19:38.0995957Z cvt.rn.f32.s16 %r29784, %rs2420; 2026-02-21T10:19:38.0996022Z cvt.rn.f32.s16 %r29785, %rs2418; 2026-02-21T10:19:38.0996210Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0996271Z cvt.s16.s8 %rs2423, %rs2342; 2026-02-21T10:19:38.0996334Z shr.s16 %rs2424, %rs2423, 4; 2026-02-21T10:19:38.0996395Z cvt.s16.s8 %rs2425, %rs2344; 2026-02-21T10:19:38.0996579Z shr.s16 %rs2426, %rs2425, 4; 2026-02-21T10:19:38.0996660Z shr.s16 %rs2427, %rs2341, 4; 2026-02-21T10:19:38.0996798Z shr.s16 %rs2428, %rs2343, 4; 2026-02-21T10:19:38.0996993Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0997068Z cvt.rn.f32.s16 %r29786, %rs2428; 2026-02-21T10:19:38.0997133Z cvt.rn.f32.s16 %r29787, %rs2427; 2026-02-21T10:19:38.0997193Z cvt.rn.f32.s16 %r29788, %rs2426; 2026-02-21T10:19:38.0997253Z cvt.rn.f32.s16 %r29789, %rs2424; 2026-02-21T10:19:38.0997447Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0997576Z cvt.s16.s8 %rs2429, %rs2346; 2026-02-21T10:19:38.0997635Z shr.s16 %rs2430, %rs2429, 4; 2026-02-21T10:19:38.0997697Z cvt.s16.s8 %rs2431, %rs2348; 2026-02-21T10:19:38.0997757Z shr.s16 %rs2432, %rs2431, 4; 
2026-02-21T10:19:38.0997816Z shr.s16 %rs2433, %rs2345, 4; 2026-02-21T10:19:38.0997878Z shr.s16 %rs2434, %rs2347, 4; 2026-02-21T10:19:38.0998068Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0998131Z cvt.rn.f32.s16 %r29790, %rs2434; 2026-02-21T10:19:38.0998192Z cvt.rn.f32.s16 %r29791, %rs2433; 2026-02-21T10:19:38.0998255Z cvt.rn.f32.s16 %r29792, %rs2432; 2026-02-21T10:19:38.0998315Z cvt.rn.f32.s16 %r29793, %rs2430; 2026-02-21T10:19:38.0998506Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0998568Z cvt.s16.s8 %rs2435, %rs2350; 2026-02-21T10:19:38.0998630Z shr.s16 %rs2436, %rs2435, 4; 2026-02-21T10:19:38.0998754Z cvt.s16.s8 %rs2437, %rs2352; 2026-02-21T10:19:38.0998815Z shr.s16 %rs2438, %rs2437, 4; 2026-02-21T10:19:38.0998878Z shr.s16 %rs2439, %rs2349, 4; 2026-02-21T10:19:38.0998936Z shr.s16 %rs2440, %rs2351, 4; 2026-02-21T10:19:38.0999125Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.0999192Z cvt.rn.f32.s16 %r29794, %rs2440; 2026-02-21T10:19:38.0999254Z cvt.rn.f32.s16 %r29795, %rs2439; 2026-02-21T10:19:38.0999315Z cvt.rn.f32.s16 %r29796, %rs2438; 2026-02-21T10:19:38.0999378Z cvt.rn.f32.s16 %r29797, %rs2436; 2026-02-21T10:19:38.0999570Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.0999629Z cvt.s16.s8 %rs2441, %rs2354; 2026-02-21T10:19:38.0999687Z shr.s16 %rs2442, %rs2441, 4; 2026-02-21T10:19:38.0999748Z cvt.s16.s8 %rs2443, %rs2356; 2026-02-21T10:19:38.0999807Z shr.s16 %rs2444, %rs2443, 4; 2026-02-21T10:19:38.0999869Z shr.s16 %rs2445, %rs2353, 4; 2026-02-21T10:19:38.0999931Z shr.s16 %rs2446, %rs2355, 4; 2026-02-21T10:19:38.1000118Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1000179Z cvt.rn.f32.s16 %r29798, %rs2446; 2026-02-21T10:19:38.1000310Z cvt.rn.f32.s16 %r29799, %rs2445; 2026-02-21T10:19:38.1000374Z 
cvt.rn.f32.s16 %r29800, %rs2444; 2026-02-21T10:19:38.1000434Z cvt.rn.f32.s16 %r29801, %rs2442; 2026-02-21T10:19:38.1000625Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1000688Z cvt.s16.s8 %rs2447, %rs2358; 2026-02-21T10:19:38.1000747Z shr.s16 %rs2448, %rs2447, 4; 2026-02-21T10:19:38.1000807Z cvt.s16.s8 %rs2449, %rs2360; 2026-02-21T10:19:38.1000868Z shr.s16 %rs2450, %rs2449, 4; 2026-02-21T10:19:38.1000925Z shr.s16 %rs2451, %rs2357, 4; 2026-02-21T10:19:38.1000984Z shr.s16 %rs2452, %rs2359, 4; 2026-02-21T10:19:38.1001178Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1001244Z cvt.rn.f32.s16 %r29802, %rs2452; 2026-02-21T10:19:38.1001306Z cvt.rn.f32.s16 %r29803, %rs2451; 2026-02-21T10:19:38.1001370Z cvt.rn.f32.s16 %r29804, %rs2450; 2026-02-21T10:19:38.1001433Z cvt.rn.f32.s16 %r29805, %rs2448; 2026-02-21T10:19:38.1001624Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1001734Z cvt.s16.s8 %rs2453, %rs2362; 2026-02-21T10:19:38.1001800Z shr.s16 %rs2454, %rs2453, 4; 2026-02-21T10:19:38.1001859Z cvt.s16.s8 %rs2455, %rs2364; 2026-02-21T10:19:38.1001918Z shr.s16 %rs2456, %rs2455, 4; 2026-02-21T10:19:38.1001975Z shr.s16 %rs2457, %rs2361, 4; 2026-02-21T10:19:38.1002036Z shr.s16 %rs2458, %rs2363, 4; 2026-02-21T10:19:38.1002225Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1002349Z cvt.rn.f32.s16 %r29806, %rs2458; 2026-02-21T10:19:38.1002416Z cvt.rn.f32.s16 %r29807, %rs2457; 2026-02-21T10:19:38.1002477Z cvt.rn.f32.s16 %r29808, %rs2456; 2026-02-21T10:19:38.1002536Z cvt.rn.f32.s16 %r29809, %rs2454; 2026-02-21T10:19:38.1002725Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1002790Z cvt.s16.s8 %rs2459, %rs2366; 2026-02-21T10:19:38.1002848Z shr.s16 %rs2460, %rs2459, 4; 2026-02-21T10:19:38.1002907Z cvt.s16.s8 %rs2461, %rs2368; 
2026-02-21T10:19:38.1002969Z shr.s16 %rs2462, %rs2461, 4; 2026-02-21T10:19:38.1003028Z shr.s16 %rs2463, %rs2365, 4; 2026-02-21T10:19:38.1003086Z shr.s16 %rs2464, %rs2367, 4; 2026-02-21T10:19:38.1003275Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1003337Z cvt.rn.f32.s16 %r29810, %rs2464; 2026-02-21T10:19:38.1003398Z cvt.rn.f32.s16 %r29811, %rs2463; 2026-02-21T10:19:38.1003461Z cvt.rn.f32.s16 %r29812, %rs2462; 2026-02-21T10:19:38.1003576Z cvt.rn.f32.s16 %r29813, %rs2460; 2026-02-21T10:19:38.1003632Z bar.sync 0; 2026-02-21T10:19:38.1003754Z st.shared.v4.b32 [%r27], {%r29753, %r29751, %r29752, %r29750}; 2026-02-21T10:19:38.1003885Z st.shared.v4.b32 [%r27+16384], {%r29785, %r29783, %r29784, %r29782}; 2026-02-21T10:19:38.1003999Z st.shared.v4.b32 [%r28], {%r29757, %r29755, %r29756, %r29754}; 2026-02-21T10:19:38.1004119Z st.shared.v4.b32 [%r28+16384], {%r29789, %r29787, %r29788, %r29786}; 2026-02-21T10:19:38.1004230Z st.shared.v4.b32 [%r29], {%r29761, %r29759, %r29760, %r29758}; 2026-02-21T10:19:38.1004355Z st.shared.v4.b32 [%r29+16384], {%r29793, %r29791, %r29792, %r29790}; 2026-02-21T10:19:38.1004462Z st.shared.v4.b32 [%r30], {%r29765, %r29763, %r29764, %r29762}; 2026-02-21T10:19:38.1004577Z st.shared.v4.b32 [%r30+16384], {%r29797, %r29795, %r29796, %r29794}; 2026-02-21T10:19:38.1004686Z st.shared.v4.b32 [%r31], {%r29769, %r29767, %r29768, %r29766}; 2026-02-21T10:19:38.1004805Z st.shared.v4.b32 [%r31+16384], {%r29801, %r29799, %r29800, %r29798}; 2026-02-21T10:19:38.1004911Z st.shared.v4.b32 [%r32], {%r29773, %r29771, %r29772, %r29770}; 2026-02-21T10:19:38.1005028Z st.shared.v4.b32 [%r32+16384], {%r29805, %r29803, %r29804, %r29802}; 2026-02-21T10:19:38.1005134Z st.shared.v4.b32 [%r33], {%r29777, %r29775, %r29776, %r29774}; 2026-02-21T10:19:38.1005316Z st.shared.v4.b32 [%r33+16384], {%r29809, %r29807, %r29808, %r29806}; 2026-02-21T10:19:38.1005427Z st.shared.v4.b32 [%r34], {%r29781, %r29779, %r29780, %r29778}; 
2026-02-21T10:19:38.1005543Z st.shared.v4.b32 [%r34+16384], {%r29813, %r29811, %r29812, %r29810}; 2026-02-21T10:19:38.1005598Z $L__tmp21: 2026-02-21T10:19:38.1005872Z .loc 2 291 36 // standard.py:291:36 @[ cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:91:40 ] 2026-02-21T10:19:38.1005936Z // begin inline asm 2026-02-21T10:19:38.1006016Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.1006071Z // end inline asm 2026-02-21T10:19:38.1006129Z bar.sync 0; 2026-02-21T10:19:38.1006201Z wgmma.fence.sync.aligned; 2026-02-21T10:19:38.1006258Z // begin inline asm 2026-02-21T10:19:38.1008008Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r27371,%r27372,%r27373,%r27374}, %rd3, %p224, 1, 1; 2026-02-21T10:19:38.1008074Z // end inline asm 2026-02-21T10:19:38.1008132Z // begin inline asm 2026-02-21T10:19:38.1009612Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r27503,%r27504,%r27505,%r27506}, %rd4, %p224, 1, 1; 2026-02-21T10:19:38.1009733Z 
// end inline asm 2026-02-21T10:19:38.1009792Z // begin inline asm 2026-02-21T10:19:38.1011337Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r27635,%r27636,%r27637,%r27638}, %rd5, %p224, 1, 1; 2026-02-21T10:19:38.1011399Z // end inline asm 2026-02-21T10:19:38.1011459Z // begin inline asm 2026-02-21T10:19:38.1012937Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r27767,%r27768,%r27769,%r27770}, %rd6, %p224, 1, 1; 2026-02-21T10:19:38.1012998Z // end inline asm 2026-02-21T10:19:38.1013057Z // begin inline asm 2026-02-21T10:19:38.1014557Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r27899,%r27900,%r27901,%r27902}, %rd7, %p224, 1, 1; 2026-02-21T10:19:38.1014676Z // end inline asm 2026-02-21T10:19:38.1014733Z // begin inline asm 2026-02-21T10:19:38.1016214Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r28031,%r28032,%r28033,%r28034}, %rd8, %p224, 1, 1; 2026-02-21T10:19:38.1016276Z // end inline asm 2026-02-21T10:19:38.1016334Z // begin inline asm 2026-02-21T10:19:38.1018076Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r28163,%r28164,%r28165,%r28166}, %rd9, %p224, 1, 1; 2026-02-21T10:19:38.1018201Z // end inline asm 2026-02-21T10:19:38.1018260Z // begin inline asm 2026-02-21T10:19:38.1019751Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r28295,%r28296,%r28297,%r28298}, %rd10, %p224, 1, 1; 2026-02-21T10:19:38.1019810Z // end inline asm 2026-02-21T10:19:38.1019868Z // begin inline asm 2026-02-21T10:19:38.1021417Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r28427,%r28428,%r28429,%r28430}, %rd3, %p224, 1, 1; 2026-02-21T10:19:38.1021477Z // end inline asm 2026-02-21T10:19:38.1021539Z // begin inline asm 2026-02-21T10:19:38.1023019Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r28559,%r28560,%r28561,%r28562}, %rd4, %p224, 1, 1; 2026-02-21T10:19:38.1023150Z // end inline asm 2026-02-21T10:19:38.1023213Z // begin inline asm 2026-02-21T10:19:38.1024706Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r28691,%r28692,%r28693,%r28694}, %rd5, %p224, 1, 1; 2026-02-21T10:19:38.1024767Z // end inline asm 2026-02-21T10:19:38.1024826Z // begin inline asm 2026-02-21T10:19:38.1026345Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r28823,%r28824,%r28825,%r28826}, %rd6, %p224, 1, 1; 2026-02-21T10:19:38.1026409Z // end inline asm 2026-02-21T10:19:38.1026594Z // begin inline asm 2026-02-21T10:19:38.1028094Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r28955,%r28956,%r28957,%r28958}, %rd7, %p224, 1, 1; 2026-02-21T10:19:38.1028227Z // end inline asm 2026-02-21T10:19:38.1028285Z // begin inline asm 2026-02-21T10:19:38.1029896Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r29087,%r29088,%r29089,%r29090}, %rd8, %p224, 1, 1; 2026-02-21T10:19:38.1029959Z // end inline asm 2026-02-21T10:19:38.1030032Z // begin inline asm 2026-02-21T10:19:38.1031510Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r29219,%r29220,%r29221,%r29222}, %rd9, %p224, 1, 1; 2026-02-21T10:19:38.1031569Z // end inline asm 2026-02-21T10:19:38.1031629Z // begin inline asm 2026-02-21T10:19:38.1033105Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r29351,%r29352,%r29353,%r29354}, %rd10, %p224, 1, 1; 2026-02-21T10:19:38.1033230Z // end inline asm 2026-02-21T10:19:38.1033311Z wgmma.commit_group.sync.aligned; 2026-02-21T10:19:38.1033374Z mov.b32 %r29483, %r39931; 2026-02-21T10:19:38.1033433Z mov.b32 %r29484, %r29485; 2026-02-21T10:19:38.1033497Z // begin inline asm 2026-02-21T10:19:38.1036215Z // wait for regs: 
%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r29483,%r29484,%r29485 2026-02-21T10:19:38.1036348Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:19:38.1036407Z // end inline asm 2026-02-21T10:19:38.1036614Z $L__tmp22: 2026-02-21T10:19:38.1036835Z .loc 1 48 93 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:48:93 2026-02-21T10:19:38.1036903Z add.s64 %rd848, %rd848, 384; 2026-02-21T10:19:38.1036968Z add.s32 %r42982, %r42982, 192; 2026-02-21T10:19:38.1037046Z setp.lt.u64 %p284, %rd79, 3936; 2026-02-21T10:19:38.1037112Z mov.b64 %rd849, %rd79; 2026-02-21T10:19:38.1037172Z @%p284 bra $L__BB0_11; 2026-02-21T10:19:38.1037282Z // %bb.12: // %.preheader272.preheader 2026-02-21T10:19:38.1037387Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:19:38.1037448Z add.s64 %rd81, %rd76, 16128; 2026-02-21T10:19:38.1037508Z add.s64 %rd82, %rd69, 16128; 2026-02-21T10:19:38.1037569Z add.s64 %rd83, %rd70, 16128; 2026-02-21T10:19:38.1037706Z add.s64 %rd84, %rd71, 16128; 2026-02-21T10:19:38.1037768Z add.s64 %rd85, %rd72, 
16128; 2026-02-21T10:19:38.1037826Z add.s64 %rd86, %rd73, 16128; 2026-02-21T10:19:38.1037887Z add.s64 %rd87, %rd74, 16128; 2026-02-21T10:19:38.1037947Z add.s64 %rd88, %rd75, 16128; 2026-02-21T10:19:38.1038007Z mov.b64 %rd851, 4000; 2026-02-21T10:19:38.1038064Z mov.b64 %rd850, %rd11; 2026-02-21T10:19:38.1038162Z $L__BB0_13: // %.preheader272 2026-02-21T10:19:38.1038263Z // Parent Loop BB0_2 Depth=1 2026-02-21T10:19:38.1038366Z // => This Inner Loop Header: Depth=2 2026-02-21T10:19:38.1038584Z .loc 1 55 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:32 2026-02-21T10:19:38.1038650Z add.s64 %rd624, %rd850, %rd88; 2026-02-21T10:19:38.1038711Z add.s64 %rd627, %rd850, %rd87; 2026-02-21T10:19:38.1038775Z add.s64 %rd630, %rd850, %rd86; 2026-02-21T10:19:38.1038836Z add.s64 %rd633, %rd850, %rd85; 2026-02-21T10:19:38.1038895Z add.s64 %rd636, %rd850, %rd84; 2026-02-21T10:19:38.1038953Z add.s64 %rd639, %rd850, %rd83; 2026-02-21T10:19:38.1039027Z add.s64 %rd642, %rd850, %rd82; 2026-02-21T10:19:38.1039225Z .loc 1 55 80 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:80 2026-02-21T10:19:38.1039355Z add.s64 %rd645, %rd850, %rd81; 2026-02-21T10:19:38.1039415Z // begin inline asm 2026-02-21T10:19:38.1039475Z mov.u64 %rd623, 0x0; 2026-02-21T10:19:38.1039616Z createpolicy.fractional.L2::evict_first.b64 %rd623, 1.0; 2026-02-21T10:19:38.1039677Z // end inline asm 2026-02-21T10:19:38.1039734Z // begin inline asm 2026-02-21T10:19:38.1039791Z mov.u32 %r29814, 0x0; 2026-02-21T10:19:38.1039848Z mov.u32 %r29815, 0x0; 2026-02-21T10:19:38.1039907Z mov.u32 %r29816, 0x0; 2026-02-21T10:19:38.1039962Z mov.u32 %r29817, 0x0; 2026-02-21T10:19:38.1040203Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r29814, %r29815, %r29816, %r29817 }, [ %rd624 + 0 ], %rd623; 2026-02-21T10:19:38.1040267Z // end inline asm 2026-02-21T10:19:38.1040324Z // begin inline asm 2026-02-21T10:19:38.1040382Z mov.u64 %rd626, 0x0; 2026-02-21T10:19:38.1040505Z 
createpolicy.fractional.L2::evict_first.b64 %rd626, 1.0; 2026-02-21T10:19:38.1040562Z // end inline asm 2026-02-21T10:19:38.1040619Z // begin inline asm 2026-02-21T10:19:38.1040678Z mov.u32 %r29818, 0x0; 2026-02-21T10:19:38.1040737Z mov.u32 %r29819, 0x0; 2026-02-21T10:19:38.1040869Z mov.u32 %r29820, 0x0; 2026-02-21T10:19:38.1040932Z mov.u32 %r29821, 0x0; 2026-02-21T10:19:38.1041168Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r29818, %r29819, %r29820, %r29821 }, [ %rd627 + 0 ], %rd626; 2026-02-21T10:19:38.1041224Z // end inline asm 2026-02-21T10:19:38.1041281Z // begin inline asm 2026-02-21T10:19:38.1041338Z mov.u64 %rd629, 0x0; 2026-02-21T10:19:38.1041456Z createpolicy.fractional.L2::evict_first.b64 %rd629, 1.0; 2026-02-21T10:19:38.1041574Z // end inline asm 2026-02-21T10:19:38.1041633Z // begin inline asm 2026-02-21T10:19:38.1041693Z mov.u32 %r29822, 0x0; 2026-02-21T10:19:38.1041749Z mov.u32 %r29823, 0x0; 2026-02-21T10:19:38.1041805Z mov.u32 %r29824, 0x0; 2026-02-21T10:19:38.1041862Z mov.u32 %r29825, 0x0; 2026-02-21T10:19:38.1042085Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r29822, %r29823, %r29824, %r29825 }, [ %rd630 + 0 ], %rd629; 2026-02-21T10:19:38.1042144Z // end inline asm 2026-02-21T10:19:38.1042202Z // begin inline asm 2026-02-21T10:19:38.1042265Z mov.u64 %rd632, 0x0; 2026-02-21T10:19:38.1042382Z createpolicy.fractional.L2::evict_first.b64 %rd632, 1.0; 2026-02-21T10:19:38.1042437Z // end inline asm 2026-02-21T10:19:38.1042498Z // begin inline asm 2026-02-21T10:19:38.1042555Z mov.u32 %r29826, 0x0; 2026-02-21T10:19:38.1042612Z mov.u32 %r29827, 0x0; 2026-02-21T10:19:38.1042668Z mov.u32 %r29828, 0x0; 2026-02-21T10:19:38.1042726Z mov.u32 %r29829, 0x0; 2026-02-21T10:19:38.1043000Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r29826, %r29827, %r29828, %r29829 }, [ %rd633 + 0 ], %rd632; 2026-02-21T10:19:38.1043061Z // end inline asm 2026-02-21T10:19:38.1043123Z // begin inline asm 2026-02-21T10:19:38.1043179Z mov.u64 %rd635, 0x0; 
2026-02-21T10:19:38.1043295Z createpolicy.fractional.L2::evict_first.b64 %rd635, 1.0; 2026-02-21T10:19:38.1043357Z // end inline asm 2026-02-21T10:19:38.1043413Z // begin inline asm 2026-02-21T10:19:38.1043470Z mov.u32 %r29830, 0x0; 2026-02-21T10:19:38.1043528Z mov.u32 %r29831, 0x0; 2026-02-21T10:19:38.1043593Z mov.u32 %r29832, 0x0; 2026-02-21T10:19:38.1043649Z mov.u32 %r29833, 0x0; 2026-02-21T10:19:38.1043870Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r29830, %r29831, %r29832, %r29833 }, [ %rd636 + 0 ], %rd635; 2026-02-21T10:19:38.1043930Z // end inline asm 2026-02-21T10:19:38.1043986Z // begin inline asm 2026-02-21T10:19:38.1044043Z mov.u64 %rd638, 0x0; 2026-02-21T10:19:38.1044160Z createpolicy.fractional.L2::evict_first.b64 %rd638, 1.0; 2026-02-21T10:19:38.1044216Z // end inline asm 2026-02-21T10:19:38.1044274Z // begin inline asm 2026-02-21T10:19:38.1044332Z mov.u32 %r29834, 0x0; 2026-02-21T10:19:38.1044389Z mov.u32 %r29835, 0x0; 2026-02-21T10:19:38.1044444Z mov.u32 %r29836, 0x0; 2026-02-21T10:19:38.1044499Z mov.u32 %r29837, 0x0; 2026-02-21T10:19:38.1044721Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r29834, %r29835, %r29836, %r29837 }, [ %rd639 + 0 ], %rd638; 2026-02-21T10:19:38.1044863Z // end inline asm 2026-02-21T10:19:38.1044921Z // begin inline asm 2026-02-21T10:19:38.1044981Z mov.u64 %rd641, 0x0; 2026-02-21T10:19:38.1045100Z createpolicy.fractional.L2::evict_first.b64 %rd641, 1.0; 2026-02-21T10:19:38.1045157Z // end inline asm 2026-02-21T10:19:38.1045215Z // begin inline asm 2026-02-21T10:19:38.1045273Z mov.u32 %r29838, 0x0; 2026-02-21T10:19:38.1045330Z mov.u32 %r29839, 0x0; 2026-02-21T10:19:38.1045385Z mov.u32 %r29840, 0x0; 2026-02-21T10:19:38.1045442Z mov.u32 %r29841, 0x0; 2026-02-21T10:19:38.1045664Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r29838, %r29839, %r29840, %r29841 }, [ %rd642 + 0 ], %rd641; 2026-02-21T10:19:38.1045723Z // end inline asm 2026-02-21T10:19:38.1045778Z // begin inline asm 2026-02-21T10:19:38.1045837Z 
mov.u64 %rd644, 0x0; 2026-02-21T10:19:38.1045954Z createpolicy.fractional.L2::evict_first.b64 %rd644, 1.0; 2026-02-21T10:19:38.1046010Z // end inline asm 2026-02-21T10:19:38.1046069Z // begin inline asm 2026-02-21T10:19:38.1046127Z mov.u32 %r29842, 0x0; 2026-02-21T10:19:38.1046185Z mov.u32 %r29843, 0x0; 2026-02-21T10:19:38.1046296Z mov.u32 %r29844, 0x0; 2026-02-21T10:19:38.1046353Z mov.u32 %r29845, 0x0; 2026-02-21T10:19:38.1046747Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r29842, %r29843, %r29844, %r29845 }, [ %rd645 + 0 ], %rd644; 2026-02-21T10:19:38.1046809Z // end inline asm 2026-02-21T10:19:38.1047026Z .loc 1 59 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:59:32 2026-02-21T10:19:38.1047082Z bar.sync 0; 2026-02-21T10:19:38.1047243Z st.shared.v2.b32 [%r9], {%r29814, %r29815}; 2026-02-21T10:19:38.1047336Z st.shared.v2.b32 [%r9+2048], {%r29818, %r29819}; 2026-02-21T10:19:38.1047423Z st.shared.v2.b32 [%r9+4096], {%r29822, %r29823}; 2026-02-21T10:19:38.1047505Z st.shared.v2.b32 [%r9+6144], {%r29826, %r29827}; 2026-02-21T10:19:38.1047587Z st.shared.v2.b32 [%r9+8192], {%r29830, %r29831}; 2026-02-21T10:19:38.1047676Z st.shared.v2.b32 [%r9+10240], {%r29834, %r29835}; 2026-02-21T10:19:38.1047770Z st.shared.v2.b32 [%r9+12288], {%r29838, %r29839}; 2026-02-21T10:19:38.1047857Z st.shared.v2.b32 [%r9+14336], {%r29842, %r29843}; 2026-02-21T10:19:38.1047941Z st.shared.v2.b32 [%r10], {%r29816, %r29817}; 2026-02-21T10:19:38.1048023Z st.shared.v2.b32 [%r10+2048], {%r29820, %r29821}; 2026-02-21T10:19:38.1048105Z st.shared.v2.b32 [%r10+4096], {%r29824, %r29825}; 2026-02-21T10:19:38.1048190Z st.shared.v2.b32 [%r10+6144], {%r29828, %r29829}; 2026-02-21T10:19:38.1048270Z st.shared.v2.b32 [%r10+8192], {%r29832, %r29833}; 2026-02-21T10:19:38.1048441Z st.shared.v2.b32 [%r10+10240], {%r29836, %r29837}; 2026-02-21T10:19:38.1048535Z st.shared.v2.b32 [%r10+12288], {%r29840, %r29841}; 2026-02-21T10:19:38.1048621Z st.shared.v2.b32 [%r10+14336], {%r29844, %r29845}; 
2026-02-21T10:19:38.1048675Z bar.sync 0; 2026-02-21T10:19:38.1048747Z ld.shared.b16 %rs2465, [%r51]; 2026-02-21T10:19:38.1048822Z ld.shared.b16 %rs2466, [%r51+1024]; 2026-02-21T10:19:38.1048889Z ld.shared.b16 %rs2467, [%r51+64]; 2026-02-21T10:19:38.1048957Z ld.shared.b16 %rs2468, [%r51+1088]; 2026-02-21T10:19:38.1049025Z ld.shared.b16 %rs2469, [%r51+8192]; 2026-02-21T10:19:38.1049087Z ld.shared.b16 %rs2470, [%r51+9216]; 2026-02-21T10:19:38.1049150Z ld.shared.b16 %rs2471, [%r51+8256]; 2026-02-21T10:19:38.1049213Z ld.shared.b16 %rs2472, [%r51+9280]; 2026-02-21T10:19:38.1049280Z ld.shared.b16 %rs2473, [%r52]; 2026-02-21T10:19:38.1049345Z ld.shared.b16 %rs2474, [%r52+1024]; 2026-02-21T10:19:38.1049410Z ld.shared.b16 %rs2475, [%r52+64]; 2026-02-21T10:19:38.1049481Z ld.shared.b16 %rs2476, [%r52+1088]; 2026-02-21T10:19:38.1049544Z ld.shared.b16 %rs2477, [%r52+8192]; 2026-02-21T10:19:38.1049609Z ld.shared.b16 %rs2478, [%r52+9216]; 2026-02-21T10:19:38.1049675Z ld.shared.b16 %rs2479, [%r52+8256]; 2026-02-21T10:19:38.1049739Z ld.shared.b16 %rs2480, [%r52+9280]; 2026-02-21T10:19:38.1049801Z ld.shared.b16 %rs2481, [%r53]; 2026-02-21T10:19:38.1049948Z ld.shared.b16 %rs2482, [%r53+1024]; 2026-02-21T10:19:38.1050015Z ld.shared.b16 %rs2483, [%r53+64]; 2026-02-21T10:19:38.1050080Z ld.shared.b16 %rs2484, [%r53+1088]; 2026-02-21T10:19:38.1050143Z ld.shared.b16 %rs2485, [%r53+8192]; 2026-02-21T10:19:38.1050208Z ld.shared.b16 %rs2486, [%r53+9216]; 2026-02-21T10:19:38.1050271Z ld.shared.b16 %rs2487, [%r53+8256]; 2026-02-21T10:19:38.1050333Z ld.shared.b16 %rs2488, [%r53+9280]; 2026-02-21T10:19:38.1050396Z ld.shared.b16 %rs2489, [%r54]; 2026-02-21T10:19:38.1050462Z ld.shared.b16 %rs2490, [%r54+1024]; 2026-02-21T10:19:38.1050525Z ld.shared.b16 %rs2491, [%r54+64]; 2026-02-21T10:19:38.1050592Z ld.shared.b16 %rs2492, [%r54+1088]; 2026-02-21T10:19:38.1050657Z ld.shared.b16 %rs2493, [%r54+8192]; 2026-02-21T10:19:38.1050720Z ld.shared.b16 %rs2494, [%r54+9216]; 2026-02-21T10:19:38.1050783Z 
ld.shared.b16 %rs2495, [%r54+8256]; 2026-02-21T10:19:38.1050850Z ld.shared.b16 %rs2496, [%r54+9280]; 2026-02-21T10:19:38.1050915Z ld.shared.b16 %rs2497, [%r55]; 2026-02-21T10:19:38.1050981Z ld.shared.b16 %rs2498, [%r55+1024]; 2026-02-21T10:19:38.1051043Z ld.shared.b16 %rs2499, [%r55+64]; 2026-02-21T10:19:38.1051185Z ld.shared.b16 %rs2500, [%r55+1088]; 2026-02-21T10:19:38.1051254Z ld.shared.b16 %rs2501, [%r55+8192]; 2026-02-21T10:19:38.1051319Z ld.shared.b16 %rs2502, [%r55+9216]; 2026-02-21T10:19:38.1051384Z ld.shared.b16 %rs2503, [%r55+8256]; 2026-02-21T10:19:38.1051446Z ld.shared.b16 %rs2504, [%r55+9280]; 2026-02-21T10:19:38.1051509Z ld.shared.b16 %rs2505, [%r56]; 2026-02-21T10:19:38.1051572Z ld.shared.b16 %rs2506, [%r56+1024]; 2026-02-21T10:19:38.1051699Z ld.shared.b16 %rs2507, [%r56+64]; 2026-02-21T10:19:38.1051765Z ld.shared.b16 %rs2508, [%r56+1088]; 2026-02-21T10:19:38.1051831Z ld.shared.b16 %rs2509, [%r56+8192]; 2026-02-21T10:19:38.1051898Z ld.shared.b16 %rs2510, [%r56+9216]; 2026-02-21T10:19:38.1051962Z ld.shared.b16 %rs2511, [%r56+8256]; 2026-02-21T10:19:38.1052027Z ld.shared.b16 %rs2512, [%r56+9280]; 2026-02-21T10:19:38.1052097Z ld.shared.b16 %rs2513, [%r57]; 2026-02-21T10:19:38.1052161Z ld.shared.b16 %rs2514, [%r57+1024]; 2026-02-21T10:19:38.1052225Z ld.shared.b16 %rs2515, [%r57+64]; 2026-02-21T10:19:38.1052288Z ld.shared.b16 %rs2516, [%r57+1088]; 2026-02-21T10:19:38.1052356Z ld.shared.b16 %rs2517, [%r57+8192]; 2026-02-21T10:19:38.1052420Z ld.shared.b16 %rs2518, [%r57+9216]; 2026-02-21T10:19:38.1052484Z ld.shared.b16 %rs2519, [%r57+8256]; 2026-02-21T10:19:38.1052559Z ld.shared.b16 %rs2520, [%r57+9280]; 2026-02-21T10:19:38.1052624Z ld.shared.b16 %rs2521, [%r58]; 2026-02-21T10:19:38.1052690Z ld.shared.b16 %rs2522, [%r58+1024]; 2026-02-21T10:19:38.1052803Z ld.shared.b16 %rs2523, [%r58+64]; 2026-02-21T10:19:38.1052873Z ld.shared.b16 %rs2524, [%r58+1088]; 2026-02-21T10:19:38.1052936Z ld.shared.b16 %rs2525, [%r58+8192]; 2026-02-21T10:19:38.1052999Z 
ld.shared.b16 %rs2526, [%r58+9216]; 2026-02-21T10:19:38.1053064Z ld.shared.b16 %rs2527, [%r58+8256]; 2026-02-21T10:19:38.1053129Z ld.shared.b16 %rs2528, [%r58+9280]; 2026-02-21T10:19:38.1053191Z cvt.f32.bf16 %r29983, %rs2465; 2026-02-21T10:19:38.1053255Z cvt.f32.bf16 %r29984, %rs2466; 2026-02-21T10:19:38.1053317Z cvt.f32.bf16 %r29985, %rs2473; 2026-02-21T10:19:38.1053376Z cvt.f32.bf16 %r29986, %rs2474; 2026-02-21T10:19:38.1053436Z cvt.f32.bf16 %r30115, %rs2481; 2026-02-21T10:19:38.1053498Z cvt.f32.bf16 %r30116, %rs2482; 2026-02-21T10:19:38.1053556Z cvt.f32.bf16 %r30117, %rs2489; 2026-02-21T10:19:38.1053615Z cvt.f32.bf16 %r30118, %rs2490; 2026-02-21T10:19:38.1053677Z cvt.f32.bf16 %r30247, %rs2497; 2026-02-21T10:19:38.1053736Z cvt.f32.bf16 %r30248, %rs2498; 2026-02-21T10:19:38.1053798Z cvt.f32.bf16 %r30249, %rs2505; 2026-02-21T10:19:38.1053857Z cvt.f32.bf16 %r30250, %rs2506; 2026-02-21T10:19:38.1053920Z cvt.f32.bf16 %r30379, %rs2513; 2026-02-21T10:19:38.1053980Z cvt.f32.bf16 %r30380, %rs2514; 2026-02-21T10:19:38.1054039Z cvt.f32.bf16 %r30381, %rs2521; 2026-02-21T10:19:38.1054156Z cvt.f32.bf16 %r30382, %rs2522; 2026-02-21T10:19:38.1054215Z cvt.f32.bf16 %r30511, %rs2467; 2026-02-21T10:19:38.1054274Z cvt.f32.bf16 %r30512, %rs2468; 2026-02-21T10:19:38.1054336Z cvt.f32.bf16 %r30513, %rs2475; 2026-02-21T10:19:38.1054396Z cvt.f32.bf16 %r30514, %rs2476; 2026-02-21T10:19:38.1054456Z cvt.f32.bf16 %r30643, %rs2483; 2026-02-21T10:19:38.1054514Z cvt.f32.bf16 %r30644, %rs2484; 2026-02-21T10:19:38.1054575Z cvt.f32.bf16 %r30645, %rs2491; 2026-02-21T10:19:38.1054636Z cvt.f32.bf16 %r30646, %rs2492; 2026-02-21T10:19:38.1054694Z cvt.f32.bf16 %r30775, %rs2499; 2026-02-21T10:19:38.1054766Z cvt.f32.bf16 %r30776, %rs2500; 2026-02-21T10:19:38.1054828Z cvt.f32.bf16 %r30777, %rs2507; 2026-02-21T10:19:38.1054890Z cvt.f32.bf16 %r30778, %rs2508; 2026-02-21T10:19:38.1054949Z cvt.f32.bf16 %r30907, %rs2515; 2026-02-21T10:19:38.1055011Z cvt.f32.bf16 %r30908, %rs2516; 2026-02-21T10:19:38.1055073Z 
cvt.f32.bf16 %r30909, %rs2523; 2026-02-21T10:19:38.1055131Z cvt.f32.bf16 %r30910, %rs2524; 2026-02-21T10:19:38.1055194Z cvt.f32.bf16 %r31039, %rs2469; 2026-02-21T10:19:38.1055253Z cvt.f32.bf16 %r31040, %rs2470; 2026-02-21T10:19:38.1055311Z cvt.f32.bf16 %r31041, %rs2477; 2026-02-21T10:19:38.1055422Z cvt.f32.bf16 %r31042, %rs2478; 2026-02-21T10:19:38.1055487Z cvt.f32.bf16 %r31171, %rs2485; 2026-02-21T10:19:38.1055546Z cvt.f32.bf16 %r31172, %rs2486; 2026-02-21T10:19:38.1055605Z cvt.f32.bf16 %r31173, %rs2493; 2026-02-21T10:19:38.1055667Z cvt.f32.bf16 %r31174, %rs2494; 2026-02-21T10:19:38.1055727Z cvt.f32.bf16 %r31303, %rs2501; 2026-02-21T10:19:38.1055785Z cvt.f32.bf16 %r31304, %rs2502; 2026-02-21T10:19:38.1055843Z cvt.f32.bf16 %r31305, %rs2509; 2026-02-21T10:19:38.1055954Z cvt.f32.bf16 %r31306, %rs2510; 2026-02-21T10:19:38.1056013Z cvt.f32.bf16 %r31435, %rs2517; 2026-02-21T10:19:38.1056071Z cvt.f32.bf16 %r31436, %rs2518; 2026-02-21T10:19:38.1056131Z cvt.f32.bf16 %r31437, %rs2525; 2026-02-21T10:19:38.1056190Z cvt.f32.bf16 %r31438, %rs2526; 2026-02-21T10:19:38.1056251Z cvt.f32.bf16 %r31567, %rs2471; 2026-02-21T10:19:38.1056310Z cvt.f32.bf16 %r31568, %rs2472; 2026-02-21T10:19:38.1056371Z cvt.f32.bf16 %r31569, %rs2479; 2026-02-21T10:19:38.1056431Z cvt.f32.bf16 %r31570, %rs2480; 2026-02-21T10:19:38.1056618Z cvt.f32.bf16 %r31699, %rs2487; 2026-02-21T10:19:38.1056685Z cvt.f32.bf16 %r31700, %rs2488; 2026-02-21T10:19:38.1056749Z cvt.f32.bf16 %r31701, %rs2495; 2026-02-21T10:19:38.1056808Z cvt.f32.bf16 %r31702, %rs2496; 2026-02-21T10:19:38.1056879Z cvt.f32.bf16 %r31831, %rs2503; 2026-02-21T10:19:38.1056941Z cvt.f32.bf16 %r31832, %rs2504; 2026-02-21T10:19:38.1057000Z cvt.f32.bf16 %r31833, %rs2511; 2026-02-21T10:19:38.1057061Z cvt.f32.bf16 %r31834, %rs2512; 2026-02-21T10:19:38.1057195Z cvt.f32.bf16 %r31963, %rs2519; 2026-02-21T10:19:38.1057258Z cvt.f32.bf16 %r31964, %rs2520; 2026-02-21T10:19:38.1057317Z cvt.f32.bf16 %r31965, %rs2527; 2026-02-21T10:19:38.1057379Z cvt.f32.bf16 
%r31966, %rs2528; 2026-02-21T10:19:38.1057589Z .loc 1 61 33 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:61:33 2026-02-21T10:19:38.1057647Z bar.sync 0; 2026-02-21T10:19:38.1057705Z // begin inline asm 2026-02-21T10:19:38.1057811Z @%p313 mbarrier.init.shared::cta.b64 [%r29846], 1; 2026-02-21T10:19:38.1057868Z // end inline asm 2026-02-21T10:19:38.1057921Z bar.sync 0; 2026-02-21T10:19:38.1057980Z // begin inline asm 2026-02-21T10:19:38.1058115Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r29846], 4096; 2026-02-21T10:19:38.1058171Z // end inline asm 2026-02-21T10:19:38.1058227Z // begin inline asm 2026-02-21T10:19:38.1058307Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.1058364Z // end inline asm 2026-02-21T10:19:38.1058418Z bar.sync 0; 2026-02-21T10:19:38.1058491Z elect.sync %r32229|%p306, -1; 2026-02-21T10:19:38.1058559Z and.pred %p287, %p1, %p306; 2026-02-21T10:19:38.1058621Z add.s64 %rd851, %rd851, 32; 2026-02-21T10:19:38.1058687Z cvt.u32.u64 %r29850, %rd851; 2026-02-21T10:19:38.1058745Z // begin inline asm 2026-02-21T10:19:38.1059156Z @%p287 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39931], [%rd822, {%r29849, %r29850}], [%r29846]; 2026-02-21T10:19:38.1059214Z // end inline asm 2026-02-21T10:19:38.1059271Z bar.sync 0; 2026-02-21T10:19:38.1059329Z mov.b32 %r32097, 0; 2026-02-21T10:19:38.1059387Z // begin inline asm 2026-02-21T10:19:38.1059440Z 2026-02-21T10:19:38.1059490Z { 2026-02-21T10:19:38.1059551Z .reg .pred complete; 2026-02-21T10:19:38.1059617Z waitLoop: 2026-02-21T10:19:38.1059766Z mbarrier.try_wait.parity.shared.b64 complete, [%r29846], %r32097; 2026-02-21T10:19:38.1059834Z @!complete bra.uni waitLoop; 2026-02-21T10:19:38.1059885Z } 2026-02-21T10:19:38.1059892Z 2026-02-21T10:19:38.1059954Z // end inline asm 2026-02-21T10:19:38.1060008Z bar.sync 0; 2026-02-21T10:19:38.1060066Z // begin inline asm 2026-02-21T10:19:38.1060167Z @%p313 mbarrier.inval.shared::cta.b64 [%r29846]; 
2026-02-21T10:19:38.1060223Z // end inline asm 2026-02-21T10:19:38.1060424Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1060493Z ld.shared.s8 %rs2529, [%r19]; 2026-02-21T10:19:38.1060778Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1060845Z shl.b16 %rs2530, %rs2529, 4; 2026-02-21T10:19:38.1061037Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1061106Z ld.shared.s8 %rs2531, [%r20+128]; 2026-02-21T10:19:38.1061295Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1061418Z shl.b16 %rs2532, %rs2531, 4; 2026-02-21T10:19:38.1061608Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1061672Z ld.shared.s8 %rs2533, [%r21+256]; 2026-02-21T10:19:38.1061860Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1061925Z shl.b16 %rs2534, %rs2533, 4; 2026-02-21T10:19:38.1062116Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1062179Z ld.shared.s8 %rs2535, [%r22+384]; 2026-02-21T10:19:38.1062367Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1062432Z shl.b16 %rs2536, %rs2535, 4; 2026-02-21T10:19:38.1062622Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1062684Z ld.shared.s8 %rs2537, [%r23+512]; 2026-02-21T10:19:38.1062943Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1063007Z shl.b16 %rs2538, %rs2537, 4; 2026-02-21T10:19:38.1063194Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1063261Z ld.shared.s8 %rs2539, [%r24+640]; 2026-02-21T10:19:38.1063450Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 
2026-02-21T10:19:38.1063511Z shl.b16 %rs2540, %rs2539, 4; 2026-02-21T10:19:38.1063701Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1063766Z ld.shared.s8 %rs2541, [%r25+768]; 2026-02-21T10:19:38.1063954Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1064014Z shl.b16 %rs2542, %rs2541, 4; 2026-02-21T10:19:38.1064208Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1064272Z ld.shared.s8 %rs2543, [%r26+896]; 2026-02-21T10:19:38.1064460Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1064522Z shl.b16 %rs2544, %rs2543, 4; 2026-02-21T10:19:38.1064775Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1064844Z ld.shared.s8 %rs2545, [%r19+1024]; 2026-02-21T10:19:38.1065036Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1065097Z shl.b16 %rs2546, %rs2545, 4; 2026-02-21T10:19:38.1065285Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1065352Z ld.shared.s8 %rs2547, [%r20+1152]; 2026-02-21T10:19:38.1065541Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1065603Z shl.b16 %rs2548, %rs2547, 4; 2026-02-21T10:19:38.1065792Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1065859Z ld.shared.s8 %rs2549, [%r21+1280]; 2026-02-21T10:19:38.1066047Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1066109Z shl.b16 %rs2550, %rs2549, 4; 2026-02-21T10:19:38.1066361Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1066428Z ld.shared.s8 %rs2551, [%r22+1408]; 2026-02-21T10:19:38.1066744Z .loc 1 64 28 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1066812Z shl.b16 %rs2552, %rs2551, 4; 2026-02-21T10:19:38.1067016Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1067155Z ld.shared.s8 %rs2553, [%r23+1536]; 2026-02-21T10:19:38.1067350Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1067413Z shl.b16 %rs2554, %rs2553, 4; 2026-02-21T10:19:38.1067603Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1067679Z ld.shared.s8 %rs2555, [%r24+1664]; 2026-02-21T10:19:38.1067877Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1067938Z shl.b16 %rs2556, %rs2555, 4; 2026-02-21T10:19:38.1068126Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1068191Z ld.shared.s8 %rs2557, [%r25+1792]; 2026-02-21T10:19:38.1068459Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1068522Z shl.b16 %rs2558, %rs2557, 4; 2026-02-21T10:19:38.1068787Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1068852Z ld.shared.s8 %rs2559, [%r26+1920]; 2026-02-21T10:19:38.1069041Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1069105Z shl.b16 %rs2560, %rs2559, 4; 2026-02-21T10:19:38.1069292Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1069355Z ld.shared.s8 %rs2561, [%r19+2048]; 2026-02-21T10:19:38.1069546Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1069607Z shl.b16 %rs2562, %rs2561, 4; 2026-02-21T10:19:38.1069795Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1069857Z ld.shared.s8 %rs2563, [%r20+2176]; 2026-02-21T10:19:38.1070053Z 
.loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1070114Z shl.b16 %rs2564, %rs2563, 4; 2026-02-21T10:19:38.1070302Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1070368Z ld.shared.s8 %rs2565, [%r21+2304]; 2026-02-21T10:19:38.1070626Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1070686Z shl.b16 %rs2566, %rs2565, 4; 2026-02-21T10:19:38.1070878Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1070944Z ld.shared.s8 %rs2567, [%r22+2432]; 2026-02-21T10:19:38.1073618Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1073705Z shl.b16 %rs2568, %rs2567, 4; 2026-02-21T10:19:38.1073928Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1074009Z ld.shared.s8 %rs2569, [%r23+2560]; 2026-02-21T10:19:38.1074223Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1074288Z shl.b16 %rs2570, %rs2569, 4; 2026-02-21T10:19:38.1074500Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1074571Z ld.shared.s8 %rs2571, [%r24+2688]; 2026-02-21T10:19:38.1074879Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1074949Z shl.b16 %rs2572, %rs2571, 4; 2026-02-21T10:19:38.1075150Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1075216Z ld.shared.s8 %rs2573, [%r25+2816]; 2026-02-21T10:19:38.1075414Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1075533Z shl.b16 %rs2574, %rs2573, 4; 2026-02-21T10:19:38.1075739Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1075809Z ld.shared.s8 %rs2575, [%r26+2944]; 
2026-02-21T10:19:38.1076007Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1076071Z shl.b16 %rs2576, %rs2575, 4; 2026-02-21T10:19:38.1076270Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1076341Z ld.shared.s8 %rs2577, [%r19+3072]; 2026-02-21T10:19:38.1076711Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1076779Z shl.b16 %rs2578, %rs2577, 4; 2026-02-21T10:19:38.1076982Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1077052Z ld.shared.s8 %rs2579, [%r20+3200]; 2026-02-21T10:19:38.1077335Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1077400Z shl.b16 %rs2580, %rs2579, 4; 2026-02-21T10:19:38.1077595Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1077661Z ld.shared.s8 %rs2581, [%r21+3328]; 2026-02-21T10:19:38.1077848Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1077911Z shl.b16 %rs2582, %rs2581, 4; 2026-02-21T10:19:38.1078099Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1078168Z ld.shared.s8 %rs2583, [%r22+3456]; 2026-02-21T10:19:38.1078361Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1078422Z shl.b16 %rs2584, %rs2583, 4; 2026-02-21T10:19:38.1078613Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1078679Z ld.shared.s8 %rs2585, [%r23+3584]; 2026-02-21T10:19:38.1078872Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1078931Z shl.b16 %rs2586, %rs2585, 4; 2026-02-21T10:19:38.1079188Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1079256Z ld.shared.s8 
%rs2587, [%r24+3712]; 2026-02-21T10:19:38.1079441Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1079500Z shl.b16 %rs2588, %rs2587, 4; 2026-02-21T10:19:38.1079687Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1079749Z ld.shared.s8 %rs2589, [%r25+3840]; 2026-02-21T10:19:38.1079935Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1079999Z shl.b16 %rs2590, %rs2589, 4; 2026-02-21T10:19:38.1080188Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1080249Z ld.shared.s8 %rs2591, [%r26+3968]; 2026-02-21T10:19:38.1080436Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1080498Z shl.b16 %rs2592, %rs2591, 4; 2026-02-21T10:19:38.1080765Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1080830Z cvt.s16.s8 %rs2593, %rs2530; 2026-02-21T10:19:38.1080889Z shr.s16 %rs2594, %rs2593, 4; 2026-02-21T10:19:38.1080947Z cvt.s16.s8 %rs2595, %rs2532; 2026-02-21T10:19:38.1081006Z shr.s16 %rs2596, %rs2595, 4; 2026-02-21T10:19:38.1081065Z shr.s16 %rs2597, %rs2529, 4; 2026-02-21T10:19:38.1081122Z shr.s16 %rs2598, %rs2531, 4; 2026-02-21T10:19:38.1081373Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1081438Z cvt.rn.f32.s16 %r32230, %rs2598; 2026-02-21T10:19:38.1081503Z cvt.rn.f32.s16 %r32231, %rs2597; 2026-02-21T10:19:38.1081562Z cvt.rn.f32.s16 %r32232, %rs2596; 2026-02-21T10:19:38.1081620Z cvt.rn.f32.s16 %r32233, %rs2594; 2026-02-21T10:19:38.1081814Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1081873Z cvt.s16.s8 %rs2599, %rs2534; 2026-02-21T10:19:38.1081932Z shr.s16 %rs2600, %rs2599, 4; 2026-02-21T10:19:38.1081992Z cvt.s16.s8 %rs2601, %rs2536; 2026-02-21T10:19:38.1082049Z shr.s16 %rs2602, 
%rs2601, 4; 2026-02-21T10:19:38.1082105Z shr.s16 %rs2603, %rs2533, 4; 2026-02-21T10:19:38.1082163Z shr.s16 %rs2604, %rs2535, 4; 2026-02-21T10:19:38.1082356Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1082417Z cvt.rn.f32.s16 %r32234, %rs2604; 2026-02-21T10:19:38.1082532Z cvt.rn.f32.s16 %r32235, %rs2603; 2026-02-21T10:19:38.1082607Z cvt.rn.f32.s16 %r32236, %rs2602; 2026-02-21T10:19:38.1082670Z cvt.rn.f32.s16 %r32237, %rs2600; 2026-02-21T10:19:38.1082867Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1082941Z cvt.s16.s8 %rs2605, %rs2538; 2026-02-21T10:19:38.1083001Z shr.s16 %rs2606, %rs2605, 4; 2026-02-21T10:19:38.1083060Z cvt.s16.s8 %rs2607, %rs2540; 2026-02-21T10:19:38.1083120Z shr.s16 %rs2608, %rs2607, 4; 2026-02-21T10:19:38.1083181Z shr.s16 %rs2609, %rs2537, 4; 2026-02-21T10:19:38.1083240Z shr.s16 %rs2610, %rs2539, 4; 2026-02-21T10:19:38.1083430Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1083499Z cvt.rn.f32.s16 %r32238, %rs2610; 2026-02-21T10:19:38.1083561Z cvt.rn.f32.s16 %r32239, %rs2609; 2026-02-21T10:19:38.1083620Z cvt.rn.f32.s16 %r32240, %rs2608; 2026-02-21T10:19:38.1083685Z cvt.rn.f32.s16 %r32241, %rs2606; 2026-02-21T10:19:38.1083875Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1083935Z cvt.s16.s8 %rs2611, %rs2542; 2026-02-21T10:19:38.1083993Z shr.s16 %rs2612, %rs2611, 4; 2026-02-21T10:19:38.1084058Z cvt.s16.s8 %rs2613, %rs2544; 2026-02-21T10:19:38.1084184Z shr.s16 %rs2614, %rs2613, 4; 2026-02-21T10:19:38.1084242Z shr.s16 %rs2615, %rs2541, 4; 2026-02-21T10:19:38.1084304Z shr.s16 %rs2616, %rs2543, 4; 2026-02-21T10:19:38.1084501Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1084562Z cvt.rn.f32.s16 %r32242, %rs2616; 2026-02-21T10:19:38.1084622Z cvt.rn.f32.s16 %r32243, %rs2615; 
2026-02-21T10:19:38.1084684Z cvt.rn.f32.s16 %r32244, %rs2614; 2026-02-21T10:19:38.1084742Z cvt.rn.f32.s16 %r32245, %rs2612; 2026-02-21T10:19:38.1084929Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1084995Z cvt.s16.s8 %rs2617, %rs2546; 2026-02-21T10:19:38.1085053Z shr.s16 %rs2618, %rs2617, 4; 2026-02-21T10:19:38.1085112Z cvt.s16.s8 %rs2619, %rs2548; 2026-02-21T10:19:38.1085172Z shr.s16 %rs2620, %rs2619, 4; 2026-02-21T10:19:38.1085230Z shr.s16 %rs2621, %rs2545, 4; 2026-02-21T10:19:38.1085291Z shr.s16 %rs2622, %rs2547, 4; 2026-02-21T10:19:38.1085476Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1085603Z cvt.rn.f32.s16 %r32246, %rs2622; 2026-02-21T10:19:38.1085668Z cvt.rn.f32.s16 %r32247, %rs2621; 2026-02-21T10:19:38.1085729Z cvt.rn.f32.s16 %r32248, %rs2620; 2026-02-21T10:19:38.1085792Z cvt.rn.f32.s16 %r32249, %rs2618; 2026-02-21T10:19:38.1085980Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1086039Z cvt.s16.s8 %rs2623, %rs2550; 2026-02-21T10:19:38.1086167Z shr.s16 %rs2624, %rs2623, 4; 2026-02-21T10:19:38.1086226Z cvt.s16.s8 %rs2625, %rs2552; 2026-02-21T10:19:38.1086285Z shr.s16 %rs2626, %rs2625, 4; 2026-02-21T10:19:38.1086342Z shr.s16 %rs2627, %rs2549, 4; 2026-02-21T10:19:38.1086406Z shr.s16 %rs2628, %rs2551, 4; 2026-02-21T10:19:38.1086732Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1086801Z cvt.rn.f32.s16 %r32250, %rs2628; 2026-02-21T10:19:38.1086866Z cvt.rn.f32.s16 %r32251, %rs2627; 2026-02-21T10:19:38.1086927Z cvt.rn.f32.s16 %r32252, %rs2626; 2026-02-21T10:19:38.1086997Z cvt.rn.f32.s16 %r32253, %rs2624; 2026-02-21T10:19:38.1087192Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1087252Z cvt.s16.s8 %rs2629, %rs2554; 2026-02-21T10:19:38.1087311Z shr.s16 %rs2630, %rs2629, 4; 2026-02-21T10:19:38.1087369Z 
cvt.s16.s8 %rs2631, %rs2556; 2026-02-21T10:19:38.1087431Z shr.s16 %rs2632, %rs2631, 4; 2026-02-21T10:19:38.1087567Z shr.s16 %rs2633, %rs2553, 4; 2026-02-21T10:19:38.1087629Z shr.s16 %rs2634, %rs2555, 4; 2026-02-21T10:19:38.1087821Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1087882Z cvt.rn.f32.s16 %r32254, %rs2634; 2026-02-21T10:19:38.1087942Z cvt.rn.f32.s16 %r32255, %rs2633; 2026-02-21T10:19:38.1088005Z cvt.rn.f32.s16 %r32256, %rs2632; 2026-02-21T10:19:38.1088067Z cvt.rn.f32.s16 %r32257, %rs2630; 2026-02-21T10:19:38.1088255Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1088314Z cvt.s16.s8 %rs2635, %rs2558; 2026-02-21T10:19:38.1088377Z shr.s16 %rs2636, %rs2635, 4; 2026-02-21T10:19:38.1088448Z cvt.s16.s8 %rs2637, %rs2560; 2026-02-21T10:19:38.1088508Z shr.s16 %rs2638, %rs2637, 4; 2026-02-21T10:19:38.1088569Z shr.s16 %rs2639, %rs2557, 4; 2026-02-21T10:19:38.1088626Z shr.s16 %rs2640, %rs2559, 4; 2026-02-21T10:19:38.1088816Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1088878Z cvt.rn.f32.s16 %r32258, %rs2640; 2026-02-21T10:19:38.1088939Z cvt.rn.f32.s16 %r32259, %rs2639; 2026-02-21T10:19:38.1088998Z cvt.rn.f32.s16 %r32260, %rs2638; 2026-02-21T10:19:38.1089057Z cvt.rn.f32.s16 %r32261, %rs2636; 2026-02-21T10:19:38.1089321Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1089383Z cvt.s16.s8 %rs2641, %rs2562; 2026-02-21T10:19:38.1089442Z shr.s16 %rs2642, %rs2641, 4; 2026-02-21T10:19:38.1089503Z cvt.s16.s8 %rs2643, %rs2564; 2026-02-21T10:19:38.1089563Z shr.s16 %rs2644, %rs2643, 4; 2026-02-21T10:19:38.1089622Z shr.s16 %rs2645, %rs2561, 4; 2026-02-21T10:19:38.1089680Z shr.s16 %rs2646, %rs2563, 4; 2026-02-21T10:19:38.1089869Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1089933Z cvt.rn.f32.s16 %r32262, %rs2646; 
2026-02-21T10:19:38.1090007Z cvt.rn.f32.s16 %r32263, %rs2645; 2026-02-21T10:19:38.1090072Z cvt.rn.f32.s16 %r32264, %rs2644; 2026-02-21T10:19:38.1090135Z cvt.rn.f32.s16 %r32265, %rs2642; 2026-02-21T10:19:38.1090321Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1090387Z cvt.s16.s8 %rs2647, %rs2566; 2026-02-21T10:19:38.1090446Z shr.s16 %rs2648, %rs2647, 4; 2026-02-21T10:19:38.1090504Z cvt.s16.s8 %rs2649, %rs2568; 2026-02-21T10:19:38.1090630Z shr.s16 %rs2650, %rs2649, 4; 2026-02-21T10:19:38.1090695Z shr.s16 %rs2651, %rs2565, 4; 2026-02-21T10:19:38.1090752Z shr.s16 %rs2652, %rs2567, 4; 2026-02-21T10:19:38.1090938Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1091002Z cvt.rn.f32.s16 %r32266, %rs2652; 2026-02-21T10:19:38.1091064Z cvt.rn.f32.s16 %r32267, %rs2651; 2026-02-21T10:19:38.1091129Z cvt.rn.f32.s16 %r32268, %rs2650; 2026-02-21T10:19:38.1091253Z cvt.rn.f32.s16 %r32269, %rs2648; 2026-02-21T10:19:38.1091443Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1091502Z cvt.s16.s8 %rs2653, %rs2570; 2026-02-21T10:19:38.1091571Z shr.s16 %rs2654, %rs2653, 4; 2026-02-21T10:19:38.1091637Z cvt.s16.s8 %rs2655, %rs2572; 2026-02-21T10:19:38.1091695Z shr.s16 %rs2656, %rs2655, 4; 2026-02-21T10:19:38.1091755Z shr.s16 %rs2657, %rs2569, 4; 2026-02-21T10:19:38.1091816Z shr.s16 %rs2658, %rs2571, 4; 2026-02-21T10:19:38.1092004Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1092064Z cvt.rn.f32.s16 %r32270, %rs2658; 2026-02-21T10:19:38.1092123Z cvt.rn.f32.s16 %r32271, %rs2657; 2026-02-21T10:19:38.1092185Z cvt.rn.f32.s16 %r32272, %rs2656; 2026-02-21T10:19:38.1092245Z cvt.rn.f32.s16 %r32273, %rs2654; 2026-02-21T10:19:38.1092483Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1092550Z cvt.s16.s8 %rs2659, %rs2574; 2026-02-21T10:19:38.1092608Z 
shr.s16 %rs2660, %rs2659, 4; 2026-02-21T10:19:38.1092668Z cvt.s16.s8 %rs2661, %rs2576; 2026-02-21T10:19:38.1092730Z shr.s16 %rs2662, %rs2661, 4; 2026-02-21T10:19:38.1092799Z shr.s16 %rs2663, %rs2573, 4; 2026-02-21T10:19:38.1092863Z shr.s16 %rs2664, %rs2575, 4; 2026-02-21T10:19:38.1093050Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1093115Z cvt.rn.f32.s16 %r32274, %rs2664; 2026-02-21T10:19:38.1093174Z cvt.rn.f32.s16 %r32275, %rs2663; 2026-02-21T10:19:38.1093233Z cvt.rn.f32.s16 %r32276, %rs2662; 2026-02-21T10:19:38.1093296Z cvt.rn.f32.s16 %r32277, %rs2660; 2026-02-21T10:19:38.1093482Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1093543Z cvt.s16.s8 %rs2665, %rs2578; 2026-02-21T10:19:38.1093608Z shr.s16 %rs2666, %rs2665, 4; 2026-02-21T10:19:38.1093668Z cvt.s16.s8 %rs2667, %rs2580; 2026-02-21T10:19:38.1093726Z shr.s16 %rs2668, %rs2667, 4; 2026-02-21T10:19:38.1093783Z shr.s16 %rs2669, %rs2577, 4; 2026-02-21T10:19:38.1093845Z shr.s16 %rs2670, %rs2579, 4; 2026-02-21T10:19:38.1094034Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1094152Z cvt.rn.f32.s16 %r32278, %rs2670; 2026-02-21T10:19:38.1094220Z cvt.rn.f32.s16 %r32279, %rs2669; 2026-02-21T10:19:38.1094289Z cvt.rn.f32.s16 %r32280, %rs2668; 2026-02-21T10:19:38.1094357Z cvt.rn.f32.s16 %r32281, %rs2666; 2026-02-21T10:19:38.1094543Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1094606Z cvt.s16.s8 %rs2671, %rs2582; 2026-02-21T10:19:38.1094665Z shr.s16 %rs2672, %rs2671, 4; 2026-02-21T10:19:38.1094723Z cvt.s16.s8 %rs2673, %rs2584; 2026-02-21T10:19:38.1094786Z shr.s16 %rs2674, %rs2673, 4; 2026-02-21T10:19:38.1094847Z shr.s16 %rs2675, %rs2581, 4; 2026-02-21T10:19:38.1094906Z shr.s16 %rs2676, %rs2583, 4; 2026-02-21T10:19:38.1095094Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 
2026-02-21T10:19:38.1095155Z cvt.rn.f32.s16 %r32282, %rs2676; 2026-02-21T10:19:38.1095217Z cvt.rn.f32.s16 %r32283, %rs2675; 2026-02-21T10:19:38.1095277Z cvt.rn.f32.s16 %r32284, %rs2674; 2026-02-21T10:19:38.1095339Z cvt.rn.f32.s16 %r32285, %rs2672; 2026-02-21T10:19:38.1095579Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1095641Z cvt.s16.s8 %rs2677, %rs2586; 2026-02-21T10:19:38.1095703Z shr.s16 %rs2678, %rs2677, 4; 2026-02-21T10:19:38.1095761Z cvt.s16.s8 %rs2679, %rs2588; 2026-02-21T10:19:38.1095819Z shr.s16 %rs2680, %rs2679, 4; 2026-02-21T10:19:38.1095879Z shr.s16 %rs2681, %rs2585, 4; 2026-02-21T10:19:38.1095936Z shr.s16 %rs2682, %rs2587, 4; 2026-02-21T10:19:38.1096168Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1096230Z cvt.rn.f32.s16 %r32286, %rs2682; 2026-02-21T10:19:38.1096294Z cvt.rn.f32.s16 %r32287, %rs2681; 2026-02-21T10:19:38.1096353Z cvt.rn.f32.s16 %r32288, %rs2680; 2026-02-21T10:19:38.1096412Z cvt.rn.f32.s16 %r32289, %rs2678; 2026-02-21T10:19:38.1096738Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1096805Z cvt.s16.s8 %rs2683, %rs2590; 2026-02-21T10:19:38.1096874Z shr.s16 %rs2684, %rs2683, 4; 2026-02-21T10:19:38.1096938Z cvt.s16.s8 %rs2685, %rs2592; 2026-02-21T10:19:38.1096997Z shr.s16 %rs2686, %rs2685, 4; 2026-02-21T10:19:38.1097056Z shr.s16 %rs2687, %rs2589, 4; 2026-02-21T10:19:38.1097114Z shr.s16 %rs2688, %rs2591, 4; 2026-02-21T10:19:38.1097306Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1097452Z cvt.rn.f32.s16 %r32290, %rs2688; 2026-02-21T10:19:38.1097518Z cvt.rn.f32.s16 %r32291, %rs2687; 2026-02-21T10:19:38.1097579Z cvt.rn.f32.s16 %r32292, %rs2686; 2026-02-21T10:19:38.1097639Z cvt.rn.f32.s16 %r32293, %rs2684; 2026-02-21T10:19:38.1097695Z bar.sync 0; 2026-02-21T10:19:38.1097820Z st.shared.v4.b32 [%r27], {%r32233, %r32231, %r32232, 
%r32230}; 2026-02-21T10:19:38.1097954Z st.shared.v4.b32 [%r27+16384], {%r32265, %r32263, %r32264, %r32262}; 2026-02-21T10:19:38.1098066Z st.shared.v4.b32 [%r28], {%r32237, %r32235, %r32236, %r32234}; 2026-02-21T10:19:38.1098185Z st.shared.v4.b32 [%r28+16384], {%r32269, %r32267, %r32268, %r32266}; 2026-02-21T10:19:38.1098297Z st.shared.v4.b32 [%r29], {%r32241, %r32239, %r32240, %r32238}; 2026-02-21T10:19:38.1098412Z st.shared.v4.b32 [%r29+16384], {%r32273, %r32271, %r32272, %r32270}; 2026-02-21T10:19:38.1098517Z st.shared.v4.b32 [%r30], {%r32245, %r32243, %r32244, %r32242}; 2026-02-21T10:19:38.1098635Z st.shared.v4.b32 [%r30+16384], {%r32277, %r32275, %r32276, %r32274}; 2026-02-21T10:19:38.1098745Z st.shared.v4.b32 [%r31], {%r32249, %r32247, %r32248, %r32246}; 2026-02-21T10:19:38.1098860Z st.shared.v4.b32 [%r31+16384], {%r32281, %r32279, %r32280, %r32278}; 2026-02-21T10:19:38.1098967Z st.shared.v4.b32 [%r32], {%r32253, %r32251, %r32252, %r32250}; 2026-02-21T10:19:38.1099152Z st.shared.v4.b32 [%r32+16384], {%r32285, %r32283, %r32284, %r32282}; 2026-02-21T10:19:38.1099258Z st.shared.v4.b32 [%r33], {%r32257, %r32255, %r32256, %r32254}; 2026-02-21T10:19:38.1099375Z st.shared.v4.b32 [%r33+16384], {%r32289, %r32287, %r32288, %r32286}; 2026-02-21T10:19:38.1099482Z st.shared.v4.b32 [%r34], {%r32261, %r32259, %r32260, %r32258}; 2026-02-21T10:19:38.1099596Z st.shared.v4.b32 [%r34+16384], {%r32293, %r32291, %r32292, %r32290}; 2026-02-21T10:19:38.1099650Z $L__tmp23: 2026-02-21T10:19:38.1099932Z .loc 2 291 36 // standard.py:291:36 @[ cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:91:40 ] 2026-02-21T10:19:38.1100000Z // begin inline asm 2026-02-21T10:19:38.1100085Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.1100143Z // end inline asm 2026-02-21T10:19:38.1100198Z bar.sync 0; 2026-02-21T10:19:38.1100269Z wgmma.fence.sync.aligned; 2026-02-21T10:19:38.1100335Z mov.pred %p289, -1; 2026-02-21T10:19:38.1100396Z // begin inline asm 2026-02-21T10:19:38.1101973Z 
wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r29983,%r29984,%r29985,%r29986}, %rd3, %p289, 1, 1; 2026-02-21T10:19:38.1102093Z // end inline asm 2026-02-21T10:19:38.1102152Z // begin inline asm 2026-02-21T10:19:38.1103644Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r30115,%r30116,%r30117,%r30118}, %rd4, %p289, 1, 1; 2026-02-21T10:19:38.1103708Z // end inline asm 2026-02-21T10:19:38.1103766Z // begin inline asm 2026-02-21T10:19:38.1105299Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r30247,%r30248,%r30249,%r30250}, %rd5, %p289, 1, 1; 2026-02-21T10:19:38.1105364Z // end inline asm 2026-02-21T10:19:38.1105421Z // begin inline asm 2026-02-21T10:19:38.1107058Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r30379,%r30380,%r30381,%r30382}, %rd6, %p289, 1, 1; 2026-02-21T10:19:38.1107122Z // end inline asm 2026-02-21T10:19:38.1107179Z // begin inline asm 2026-02-21T10:19:38.1108724Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r30511,%r30512,%r30513,%r30514}, %rd7, %p289, 1, 1; 2026-02-21T10:19:38.1108863Z // end inline asm 2026-02-21T10:19:38.1108920Z // begin inline asm 2026-02-21T10:19:38.1110491Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r30643,%r30644,%r30645,%r30646}, %rd8, %p289, 1, 1; 2026-02-21T10:19:38.1110554Z // end inline asm 2026-02-21T10:19:38.1110623Z // begin inline asm 2026-02-21T10:19:38.1112110Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r30775,%r30776,%r30777,%r30778}, %rd9, %p289, 1, 1; 2026-02-21T10:19:38.1112229Z // end inline asm 2026-02-21T10:19:38.1112288Z // begin inline asm 2026-02-21T10:19:38.1113831Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046}, {%r30907,%r30908,%r30909,%r30910}, %rd10, %p289, 1, 1; 2026-02-21T10:19:38.1113894Z // end inline asm 2026-02-21T10:19:38.1113951Z // begin inline asm 2026-02-21T10:19:38.1115431Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r31039,%r31040,%r31041,%r31042}, %rd3, %p289, 1, 1; 2026-02-21T10:19:38.1115493Z // end inline asm 2026-02-21T10:19:38.1115549Z // begin inline asm 2026-02-21T10:19:38.1117176Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r31171,%r31172,%r31173,%r31174}, %rd4, %p289, 1, 1; 2026-02-21T10:19:38.1117316Z // end inline asm 2026-02-21T10:19:38.1117376Z // begin inline asm 2026-02-21T10:19:38.1118878Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r31303,%r31304,%r31305,%r31306}, %rd5, %p289, 1, 1; 2026-02-21T10:19:38.1118938Z // end inline asm 2026-02-21T10:19:38.1119000Z // begin inline asm 2026-02-21T10:19:38.1120541Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r31435,%r31436,%r31437,%r31438}, %rd6, %p289, 1, 1; 2026-02-21T10:19:38.1120654Z // end inline asm 2026-02-21T10:19:38.1120715Z // begin inline asm 2026-02-21T10:19:38.1122192Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r31567,%r31568,%r31569,%r31570}, %rd7, %p289, 1, 1; 2026-02-21T10:19:38.1122252Z // end inline asm 2026-02-21T10:19:38.1122312Z // begin inline asm 2026-02-21T10:19:38.1123848Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r31699,%r31700,%r31701,%r31702}, %rd8, %p289, 1, 1; 2026-02-21T10:19:38.1123911Z // end inline asm 2026-02-21T10:19:38.1123969Z // begin inline asm 2026-02-21T10:19:38.1125443Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r31831,%r31832,%r31833,%r31834}, %rd9, %p289, 1, 1; 2026-02-21T10:19:38.1125503Z // end inline asm 2026-02-21T10:19:38.1125560Z // begin inline asm 2026-02-21T10:19:38.1127230Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110}, {%r31963,%r31964,%r31965,%r31966}, %rd10, %p289, 1, 1; 2026-02-21T10:19:38.1127293Z // end inline asm 2026-02-21T10:19:38.1127372Z wgmma.commit_group.sync.aligned; 2026-02-21T10:19:38.1127436Z mov.b32 %r32095, %r39931; 2026-02-21T10:19:38.1127493Z mov.b32 %r32096, %r32097; 2026-02-21T10:19:38.1127549Z // begin inline asm 2026-02-21T10:19:38.1130151Z // wait for regs: 
%r42983,%r42984,%r42985,%r42986,%r42987,%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r32095,%r32096,%r32097 2026-02-21T10:19:38.1130288Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:19:38.1130344Z // end inline asm 2026-02-21T10:19:38.1130402Z $L__tmp24: 2026-02-21T10:19:38.1130615Z .loc 1 48 93 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:48:93 2026-02-21T10:19:38.1130681Z add.s64 %rd850, %rd850, 128; 2026-02-21T10:19:38.1130754Z setp.lt.u64 %p307, %rd851, 4064; 2026-02-21T10:19:38.1130815Z @%p307 bra $L__BB0_13; 2026-02-21T10:19:38.1130927Z // %bb.14: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:19:38.1131131Z .loc 1 94 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:94:28 2026-02-21T10:19:38.1131280Z cvt.rn.bf16x2.f32 %r32297, %r42984, %r42983; 2026-02-21T10:19:38.1131361Z cvt.rn.bf16x2.f32 %r32298, %r42986, %r42985; 2026-02-21T10:19:38.1131436Z cvt.rn.bf16x2.f32 %r32299, %r42988, %r42987; 2026-02-21T10:19:38.1131513Z cvt.rn.bf16x2.f32 %r32300, %r42990, %r42989; 2026-02-21T10:19:38.1131586Z cvt.rn.bf16x2.f32 
%r32301, %r42992, %r42991; 2026-02-21T10:19:38.1131661Z cvt.rn.bf16x2.f32 %r32302, %r42994, %r42993; 2026-02-21T10:19:38.1131737Z cvt.rn.bf16x2.f32 %r32303, %r42996, %r42995; 2026-02-21T10:19:38.1131824Z cvt.rn.bf16x2.f32 %r32304, %r42998, %r42997; 2026-02-21T10:19:38.1131903Z cvt.rn.bf16x2.f32 %r32305, %r43000, %r42999; 2026-02-21T10:19:38.1131979Z cvt.rn.bf16x2.f32 %r32306, %r43002, %r43001; 2026-02-21T10:19:38.1132052Z cvt.rn.bf16x2.f32 %r32307, %r43004, %r43003; 2026-02-21T10:19:38.1132124Z cvt.rn.bf16x2.f32 %r32308, %r43006, %r43005; 2026-02-21T10:19:38.1132197Z cvt.rn.bf16x2.f32 %r32309, %r43008, %r43007; 2026-02-21T10:19:38.1132272Z cvt.rn.bf16x2.f32 %r32310, %r43010, %r43009; 2026-02-21T10:19:38.1132348Z cvt.rn.bf16x2.f32 %r32311, %r43012, %r43011; 2026-02-21T10:19:38.1132420Z cvt.rn.bf16x2.f32 %r32312, %r43014, %r43013; 2026-02-21T10:19:38.1132496Z cvt.rn.bf16x2.f32 %r32313, %r43016, %r43015; 2026-02-21T10:19:38.1132569Z cvt.rn.bf16x2.f32 %r32314, %r43018, %r43017; 2026-02-21T10:19:38.1132710Z cvt.rn.bf16x2.f32 %r32315, %r43020, %r43019; 2026-02-21T10:19:38.1132785Z cvt.rn.bf16x2.f32 %r32316, %r43022, %r43021; 2026-02-21T10:19:38.1132860Z cvt.rn.bf16x2.f32 %r32317, %r43024, %r43023; 2026-02-21T10:19:38.1132933Z cvt.rn.bf16x2.f32 %r32318, %r43026, %r43025; 2026-02-21T10:19:38.1133006Z cvt.rn.bf16x2.f32 %r32319, %r43028, %r43027; 2026-02-21T10:19:38.1133083Z cvt.rn.bf16x2.f32 %r32320, %r43030, %r43029; 2026-02-21T10:19:38.1133155Z cvt.rn.bf16x2.f32 %r32321, %r43032, %r43031; 2026-02-21T10:19:38.1133228Z cvt.rn.bf16x2.f32 %r32322, %r43034, %r43033; 2026-02-21T10:19:38.1133304Z cvt.rn.bf16x2.f32 %r32323, %r43036, %r43035; 2026-02-21T10:19:38.1133379Z cvt.rn.bf16x2.f32 %r32324, %r43038, %r43037; 2026-02-21T10:19:38.1133452Z cvt.rn.bf16x2.f32 %r32325, %r43040, %r43039; 2026-02-21T10:19:38.1133525Z cvt.rn.bf16x2.f32 %r32326, %r43042, %r43041; 2026-02-21T10:19:38.1133597Z cvt.rn.bf16x2.f32 %r32327, %r43044, %r43043; 2026-02-21T10:19:38.1133683Z cvt.rn.bf16x2.f32 
%r32328, %r43046, %r43045; 2026-02-21T10:19:38.1133762Z cvt.rn.bf16x2.f32 %r32329, %r43048, %r43047; 2026-02-21T10:19:38.1133837Z cvt.rn.bf16x2.f32 %r32330, %r43050, %r43049; 2026-02-21T10:19:38.1133962Z cvt.rn.bf16x2.f32 %r32331, %r43052, %r43051; 2026-02-21T10:19:38.1134038Z cvt.rn.bf16x2.f32 %r32332, %r43054, %r43053; 2026-02-21T10:19:38.1134112Z cvt.rn.bf16x2.f32 %r32333, %r43056, %r43055; 2026-02-21T10:19:38.1134184Z cvt.rn.bf16x2.f32 %r32334, %r43058, %r43057; 2026-02-21T10:19:38.1134257Z cvt.rn.bf16x2.f32 %r32335, %r43060, %r43059; 2026-02-21T10:19:38.1134337Z cvt.rn.bf16x2.f32 %r32336, %r43062, %r43061; 2026-02-21T10:19:38.1134410Z cvt.rn.bf16x2.f32 %r32337, %r43064, %r43063; 2026-02-21T10:19:38.1134535Z cvt.rn.bf16x2.f32 %r32338, %r43066, %r43065; 2026-02-21T10:19:38.1134608Z cvt.rn.bf16x2.f32 %r32339, %r43068, %r43067; 2026-02-21T10:19:38.1134683Z cvt.rn.bf16x2.f32 %r32340, %r43070, %r43069; 2026-02-21T10:19:38.1134755Z cvt.rn.bf16x2.f32 %r32341, %r43072, %r43071; 2026-02-21T10:19:38.1134830Z cvt.rn.bf16x2.f32 %r32342, %r43074, %r43073; 2026-02-21T10:19:38.1134905Z cvt.rn.bf16x2.f32 %r32343, %r43076, %r43075; 2026-02-21T10:19:38.1134976Z cvt.rn.bf16x2.f32 %r32344, %r43078, %r43077; 2026-02-21T10:19:38.1135050Z cvt.rn.bf16x2.f32 %r32345, %r43080, %r43079; 2026-02-21T10:19:38.1135126Z cvt.rn.bf16x2.f32 %r32346, %r43082, %r43081; 2026-02-21T10:19:38.1135199Z cvt.rn.bf16x2.f32 %r32347, %r43084, %r43083; 2026-02-21T10:19:38.1135272Z cvt.rn.bf16x2.f32 %r32348, %r43086, %r43085; 2026-02-21T10:19:38.1135345Z cvt.rn.bf16x2.f32 %r32349, %r43088, %r43087; 2026-02-21T10:19:38.1135422Z cvt.rn.bf16x2.f32 %r32350, %r43090, %r43089; 2026-02-21T10:19:38.1135550Z cvt.rn.bf16x2.f32 %r32351, %r43092, %r43091; 2026-02-21T10:19:38.1135676Z cvt.rn.bf16x2.f32 %r32352, %r43094, %r43093; 2026-02-21T10:19:38.1135821Z cvt.rn.bf16x2.f32 %r32353, %r43096, %r43095; 2026-02-21T10:19:38.1135931Z cvt.rn.bf16x2.f32 %r32354, %r43098, %r43097; 2026-02-21T10:19:38.1136010Z cvt.rn.bf16x2.f32 
%r32355, %r43100, %r43099; 2026-02-21T10:19:38.1136096Z cvt.rn.bf16x2.f32 %r32356, %r43102, %r43101; 2026-02-21T10:19:38.1136170Z cvt.rn.bf16x2.f32 %r32357, %r43104, %r43103; 2026-02-21T10:19:38.1136245Z cvt.rn.bf16x2.f32 %r32358, %r43106, %r43105; 2026-02-21T10:19:38.1136322Z cvt.rn.bf16x2.f32 %r32359, %r43108, %r43107; 2026-02-21T10:19:38.1136396Z cvt.rn.bf16x2.f32 %r32360, %r43110, %r43109; 2026-02-21T10:19:38.1136708Z .loc 1 95 43 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:95:43 2026-02-21T10:19:38.1136772Z bar.sync 0; 2026-02-21T10:19:38.1136984Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r35], {%r32297, %r32298, %r32299, %r32300}; 2026-02-21T10:19:38.1137174Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r36], {%r32313, %r32314, %r32315, %r32316}; 2026-02-21T10:19:38.1137359Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r37], {%r32329, %r32330, %r32331, %r32332}; 2026-02-21T10:19:38.1137537Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r38], {%r32345, %r32346, %r32347, %r32348}; 2026-02-21T10:19:38.1137824Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r39], {%r32301, %r32302, %r32303, %r32304}; 2026-02-21T10:19:38.1138005Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r40], {%r32317, %r32318, %r32319, %r32320}; 2026-02-21T10:19:38.1138184Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r41], {%r32333, %r32334, %r32335, %r32336}; 2026-02-21T10:19:38.1138367Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r42], {%r32349, %r32350, %r32351, %r32352}; 2026-02-21T10:19:38.1138549Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r43], {%r32305, %r32306, %r32307, %r32308}; 2026-02-21T10:19:38.1138727Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r44], {%r32321, %r32322, %r32323, %r32324}; 2026-02-21T10:19:38.1138909Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r45], {%r32337, %r32338, %r32339, %r32340}; 2026-02-21T10:19:38.1139092Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r46], {%r32353, %r32354, %r32355, %r32356}; 2026-02-21T10:19:38.1139270Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r47], {%r32309, %r32310, %r32311, %r32312}; 2026-02-21T10:19:38.1139449Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r48], {%r32325, %r32326, %r32327, %r32328}; 2026-02-21T10:19:38.1139697Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r49], {%r32341, %r32342, %r32343, %r32344}; 2026-02-21T10:19:38.1139894Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r50], {%r32357, %r32358, %r32359, %r32360}; 2026-02-21T10:19:38.1139956Z // begin inline asm 2026-02-21T10:19:38.1140038Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.1140097Z // end inline asm 2026-02-21T10:19:38.1140151Z bar.sync 0; 2026-02-21T10:19:38.1140219Z elect.sync %r32361|%p310, -1; 2026-02-21T10:19:38.1140351Z and.pred %p308, %p405, %p310; 2026-02-21T10:19:38.1140417Z or.b32 %r32294, %r29849, %r639; 2026-02-21T10:19:38.1140475Z // begin inline asm 2026-02-21T10:19:38.1140719Z @%p308 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd497, {%r32294, %r32295}], [%r22276]; 2026-02-21T10:19:38.1140778Z // end inline asm 2026-02-21T10:19:38.1140851Z cp.async.bulk.commit_group; 2026-02-21T10:19:38.1140925Z cp.async.bulk.wait_group.read 0; 2026-02-21T10:19:38.1140980Z bar.sync 0; 2026-02-21T10:19:38.1141194Z .loc 1 28 112 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:28:112 2026-02-21T10:19:38.1141257Z add.s32 %r42467, %r42467, 12672; 2026-02-21T10:19:38.1141329Z setp.lt.s32 %p311, %r42467, %r43239; 2026-02-21T10:19:38.1141388Z @%p311 bra $L__BB0_2; 2026-02-21T10:19:38.1141495Z $L__BB0_15: // %.preheader271 2026-02-21T10:19:38.1141565Z setp.gt.s32 %p312, %r43239, 5119; 2026-02-21T10:19:38.1141628Z @%p312 bra $L__BB0_22; 2026-02-21T10:19:38.1141780Z // %bb.16: // %.lr.ph282 2026-02-21T10:19:38.1141985Z .loc 1 0 112 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:0:112 2026-02-21T10:19:38.1142049Z shl.b32 %r32363, %r42451, 4; 2026-02-21T10:19:38.1142113Z xor.b32 %r32365, %r32363, %r42452; 2026-02-21T10:19:38.1142176Z add.s32 %r68, %r39931, 
%r32365; 2026-02-21T10:19:38.1142247Z xor.b32 %r32367, %r32365, 8; 2026-02-21T10:19:38.1142312Z add.s32 %r69, %r39931, %r32367; 2026-02-21T10:19:38.1142372Z and.b32 %r32369, %r42453, 6144; 2026-02-21T10:19:38.1142433Z and.b32 %r32371, %r42454, 896; 2026-02-21T10:19:38.1142496Z and.b32 %r32373, %r42455, 62; 2026-02-21T10:19:38.1142556Z or.b32 %r32374, %r32371, %r32373; 2026-02-21T10:19:38.1142615Z or.b32 %r32375, %r32374, %r32369; 2026-02-21T10:19:38.1142675Z add.s32 %r70, %r39931, %r32375; 2026-02-21T10:19:38.1142735Z xor.b32 %r32376, %r32375, 8; 2026-02-21T10:19:38.1142793Z add.s32 %r71, %r39931, %r32376; 2026-02-21T10:19:38.1142856Z xor.b32 %r32377, %r32375, 16; 2026-02-21T10:19:38.1142918Z add.s32 %r72, %r39931, %r32377; 2026-02-21T10:19:38.1142976Z xor.b32 %r32378, %r32375, 24; 2026-02-21T10:19:38.1143036Z add.s32 %r73, %r39931, %r32378; 2026-02-21T10:19:38.1143097Z xor.b32 %r32379, %r32375, 32; 2026-02-21T10:19:38.1143156Z add.s32 %r74, %r39931, %r32379; 2026-02-21T10:19:38.1143269Z xor.b32 %r32380, %r32375, 40; 2026-02-21T10:19:38.1143328Z add.s32 %r75, %r39931, %r32380; 2026-02-21T10:19:38.1143390Z xor.b32 %r32381, %r32375, 48; 2026-02-21T10:19:38.1143448Z add.s32 %r76, %r39931, %r32381; 2026-02-21T10:19:38.1143504Z xor.b32 %r32382, %r32375, 56; 2026-02-21T10:19:38.1143564Z add.s32 %r77, %r39931, %r32382; 2026-02-21T10:19:38.1143621Z add.s32 %r78, %r39931, %r42451; 2026-02-21T10:19:38.1143678Z xor.b32 %r32383, %r42451, 16; 2026-02-21T10:19:38.1143737Z add.s32 %r79, %r39931, %r32383; 2026-02-21T10:19:38.1143798Z xor.b32 %r32384, %r42451, 32; 2026-02-21T10:19:38.1143857Z add.s32 %r80, %r39931, %r32384; 2026-02-21T10:19:38.1143915Z xor.b32 %r32385, %r42451, 48; 2026-02-21T10:19:38.1143975Z add.s32 %r81, %r39931, %r32385; 2026-02-21T10:19:38.1144032Z xor.b32 %r32386, %r42451, 64; 2026-02-21T10:19:38.1144089Z add.s32 %r82, %r39931, %r32386; 2026-02-21T10:19:38.1144149Z xor.b32 %r32387, %r42451, 80; 2026-02-21T10:19:38.1144210Z add.s32 %r83, %r39931, %r32387; 
2026-02-21T10:19:38.1144268Z xor.b32 %r32388, %r42451, 96; 2026-02-21T10:19:38.1144325Z add.s32 %r84, %r39931, %r32388; 2026-02-21T10:19:38.1144440Z xor.b32 %r32389, %r42451, 112; 2026-02-21T10:19:38.1144502Z add.s32 %r85, %r39931, %r32389; 2026-02-21T10:19:38.1144560Z shl.b32 %r32390, %r42451, 7; 2026-02-21T10:19:38.1144620Z or.b32 %r32392, %r32390, %r42456; 2026-02-21T10:19:38.1144679Z add.s32 %r86, %r39931, %r32392; 2026-02-21T10:19:38.1144735Z xor.b32 %r32393, %r32392, 16; 2026-02-21T10:19:38.1144793Z add.s32 %r87, %r39931, %r32393; 2026-02-21T10:19:38.1144851Z xor.b32 %r32394, %r32392, 32; 2026-02-21T10:19:38.1144958Z add.s32 %r88, %r39931, %r32394; 2026-02-21T10:19:38.1145015Z xor.b32 %r32395, %r32392, 48; 2026-02-21T10:19:38.1145077Z add.s32 %r89, %r39931, %r32395; 2026-02-21T10:19:38.1145146Z xor.b32 %r32396, %r32392, 64; 2026-02-21T10:19:38.1145207Z add.s32 %r90, %r39931, %r32396; 2026-02-21T10:19:38.1145264Z xor.b32 %r32397, %r32392, 80; 2026-02-21T10:19:38.1145327Z add.s32 %r91, %r39931, %r32397; 2026-02-21T10:19:38.1145385Z xor.b32 %r32398, %r32392, 96; 2026-02-21T10:19:38.1145444Z add.s32 %r92, %r39931, %r32398; 2026-02-21T10:19:38.1145505Z xor.b32 %r32399, %r32392, 112; 2026-02-21T10:19:38.1145563Z add.s32 %r93, %r39931, %r32399; 2026-02-21T10:19:38.1145625Z bfe.u32 %r32400, %r39931, 4, 14; 2026-02-21T10:19:38.1145688Z cvt.u64.u32 %rd665, %r32400; 2026-02-21T10:19:38.1145765Z or.b64 %rd12, %rd665, 4611686293372403712; 2026-02-21T10:19:38.1145826Z add.s32 %r32401, %r39931, 32; 2026-02-21T10:19:38.1145885Z bfe.u32 %r32402, %r32401, 4, 14; 2026-02-21T10:19:38.1146008Z cvt.u64.u32 %rd666, %r32402; 2026-02-21T10:19:38.1146083Z or.b64 %rd13, %rd666, 4611686293372403712; 2026-02-21T10:19:38.1146141Z add.s32 %r32403, %r39931, 64; 2026-02-21T10:19:38.1146202Z bfe.u32 %r32404, %r32403, 4, 14; 2026-02-21T10:19:38.1146260Z cvt.u64.u32 %rd667, %r32404; 2026-02-21T10:19:38.1146327Z or.b64 %rd14, %rd667, 4611686293372403712; 2026-02-21T10:19:38.1146386Z add.s32 
%r32405, %r39931, 96; 2026-02-21T10:19:38.1146569Z bfe.u32 %r32406, %r32405, 4, 14; 2026-02-21T10:19:38.1146636Z cvt.u64.u32 %rd668, %r32406; 2026-02-21T10:19:38.1146706Z or.b64 %rd15, %rd668, 4611686293372403712; 2026-02-21T10:19:38.1146768Z add.s32 %r32407, %r39931, 16384; 2026-02-21T10:19:38.1146836Z bfe.u32 %r32408, %r32407, 4, 14; 2026-02-21T10:19:38.1146897Z cvt.u64.u32 %rd669, %r32408; 2026-02-21T10:19:38.1146967Z or.b64 %rd16, %rd669, 4611686293372403712; 2026-02-21T10:19:38.1147025Z add.s32 %r32409, %r39931, 16416; 2026-02-21T10:19:38.1147083Z bfe.u32 %r32410, %r32409, 4, 14; 2026-02-21T10:19:38.1147143Z cvt.u64.u32 %rd670, %r32410; 2026-02-21T10:19:38.1147212Z or.b64 %rd17, %rd670, 4611686293372403712; 2026-02-21T10:19:38.1147271Z add.s32 %r32411, %r39931, 16448; 2026-02-21T10:19:38.1147328Z bfe.u32 %r32412, %r32411, 4, 14; 2026-02-21T10:19:38.1147388Z cvt.u64.u32 %rd671, %r32412; 2026-02-21T10:19:38.1147546Z or.b64 %rd18, %rd671, 4611686293372403712; 2026-02-21T10:19:38.1147607Z add.s32 %r32413, %r39931, 16480; 2026-02-21T10:19:38.1147666Z bfe.u32 %r32414, %r32413, 4, 14; 2026-02-21T10:19:38.1147728Z cvt.u64.u32 %rd672, %r32414; 2026-02-21T10:19:38.1147798Z or.b64 %rd19, %rd672, 4611686293372403712; 2026-02-21T10:19:38.1147858Z and.b32 %r32416, %r42457, 1920; 2026-02-21T10:19:38.1147920Z or.b32 %r32418, %r32416, %r42456; 2026-02-21T10:19:38.1147981Z xor.b32 %r32419, %r32418, %r42458; 2026-02-21T10:19:38.1148040Z or.b32 %r32420, %r32419, %r32369; 2026-02-21T10:19:38.1148100Z add.s32 %r94, %r39931, %r32420; 2026-02-21T10:19:38.1148168Z add.s32 %r95, %r94, 16384; 2026-02-21T10:19:38.1148228Z add.s32 %r96, %r94, 8192; 2026-02-21T10:19:38.1148285Z add.s32 %r97, %r94, 24576; 2026-02-21T10:19:38.1148345Z xor.b32 %r32421, %r32420, 32; 2026-02-21T10:19:38.1148506Z add.s32 %r98, %r39931, %r32421; 2026-02-21T10:19:38.1148567Z add.s32 %r99, %r98, 16384; 2026-02-21T10:19:38.1148630Z add.s32 %r100, %r98, 8192; 2026-02-21T10:19:38.1148690Z add.s32 %r101, %r98, 24576; 
2026-02-21T10:19:38.1148747Z xor.b32 %r32422, %r32420, 64; 2026-02-21T10:19:38.1148881Z add.s32 %r102, %r39931, %r32422; 2026-02-21T10:19:38.1148946Z add.s32 %r103, %r102, 16384; 2026-02-21T10:19:38.1149004Z add.s32 %r104, %r102, 8192; 2026-02-21T10:19:38.1149062Z add.s32 %r105, %r102, 24576; 2026-02-21T10:19:38.1149122Z xor.b32 %r32423, %r32420, 96; 2026-02-21T10:19:38.1149181Z add.s32 %r106, %r39931, %r32423; 2026-02-21T10:19:38.1149237Z add.s32 %r107, %r106, 16384; 2026-02-21T10:19:38.1149294Z add.s32 %r108, %r106, 8192; 2026-02-21T10:19:38.1149416Z add.s32 %r109, %r106, 24576; 2026-02-21T10:19:38.1149644Z .loc 1 28 112 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:28:112 2026-02-21T10:19:38.1149713Z mad.wide.u32 %rd20, %r7, 16, %rd117; 2026-02-21T10:19:38.1149913Z .loc 1 48 93 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:48:93 2026-02-21T10:19:38.1149973Z or.b32 %r32425, %r42466, %r8; 2026-02-21T10:19:38.1150030Z or.b32 %r118, %r32425, 128; 2026-02-21T10:19:38.1150145Z $L__BB0_17: // =>This Loop Header: Depth=1 2026-02-21T10:19:38.1150242Z // Child Loop BB0_18 Depth 2 2026-02-21T10:19:38.1150334Z // Child Loop BB0_20 Depth 2 2026-02-21T10:19:38.1150530Z .loc 1 34 35 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:34:35 2026-02-21T10:19:38.1150592Z shr.s32 %r32427, %r43239, 31; 2026-02-21T10:19:38.1150650Z shr.u32 %r32428, %r32427, 18; 2026-02-21T10:19:38.1150779Z add.s32 %r32429, %r43239, %r32428; 2026-02-21T10:19:38.1150843Z shr.s32 %r32430, %r32429, 14; 2026-02-21T10:19:38.1151033Z .loc 1 35 33 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:35:33 2026-02-21T10:19:38.1151092Z shl.b32 %r32431, %r32430, 5; 2026-02-21T10:19:38.1151284Z .loc 1 36 39 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:36:39 2026-02-21T10:19:38.1151343Z sub.s32 %r32432, 10, %r32431; 2026-02-21T10:19:38.1151534Z .loc 1 36 52 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:36:52 2026-02-21T10:19:38.1151593Z min.u32 
%r32433, %r32432, 32; 2026-02-21T10:19:38.1151781Z .loc 1 37 45 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:37:45 2026-02-21T10:19:38.1151843Z and.b32 %r32434, %r32429, 49152; 2026-02-21T10:19:38.1151905Z sub.s32 %r32435, %r43239, %r32434; 2026-02-21T10:19:38.1152112Z .loc 1 37 64 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:37:64 2026-02-21T10:19:38.1152175Z cvt.u16.u32 %rs2689, %r32435; 2026-02-21T10:19:38.1152235Z cvt.u16.u32 %rs2690, %r32433; 2026-02-21T10:19:38.1152426Z .loc 1 38 51 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:38:51 2026-02-21T10:19:38.1152541Z div.s16 %rs2691, %rs2689, %rs2690; 2026-02-21T10:19:38.1152728Z .loc 1 37 64 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:37:64 2026-02-21T10:19:38.1152800Z mul.lo.s16 %rs2692, %rs2691, %rs2690; 2026-02-21T10:19:38.1152859Z sub.s16 %rs2693, %rs2689, %rs2692; 2026-02-21T10:19:38.1152917Z cvt.s32.s16 %r32436, %rs2693; 2026-02-21T10:19:38.1153105Z .loc 1 37 30 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:37:30 2026-02-21T10:19:38.1153169Z add.s32 %r32437, %r32431, %r32436; 2026-02-21T10:19:38.1153356Z .loc 1 38 51 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:38:51 2026-02-21T10:19:38.1153418Z cvt.u32.u16 %r32438, %rs2691; 2026-02-21T10:19:38.1153607Z .loc 1 39 27 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:39:27 2026-02-21T10:19:38.1153665Z shl.b32 %r39932, %r32437, 7; 2026-02-21T10:19:38.1153852Z .loc 1 40 27 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:40:27 2026-02-21T10:19:38.1153924Z mul.wide.s16 %r42379, %rs2691, 128; 2026-02-21T10:19:38.1154162Z .loc 1 48 93 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:48:93 2026-02-21T10:19:38.1154224Z or.b32 %r32439, %r42459, %r42379; 2026-02-21T10:19:38.1154283Z shl.b32 %r32440, %r32439, 13; 2026-02-21T10:19:38.1154347Z mul.wide.s32 %rd93, %r32440, 2; 2026-02-21T10:19:38.1154405Z or.b32 %r32441, %r42460, %r42379; 
2026-02-21T10:19:38.1154462Z shl.b32 %r32442, %r32441, 13; 2026-02-21T10:19:38.1154526Z mul.wide.s32 %rd94, %r32442, 2; 2026-02-21T10:19:38.1154636Z or.b32 %r32443, %r42461, %r42379; 2026-02-21T10:19:38.1154694Z shl.b32 %r32444, %r32443, 13; 2026-02-21T10:19:38.1154758Z mul.wide.s32 %rd95, %r32444, 2; 2026-02-21T10:19:38.1154816Z or.b32 %r32445, %r42462, %r42379; 2026-02-21T10:19:38.1154873Z shl.b32 %r32446, %r32445, 13; 2026-02-21T10:19:38.1154947Z mul.wide.s32 %rd96, %r32446, 2; 2026-02-21T10:19:38.1155012Z or.b32 %r32447, %r42463, %r42379; 2026-02-21T10:19:38.1155068Z shl.b32 %r32448, %r32447, 13; 2026-02-21T10:19:38.1155129Z mul.wide.s32 %rd97, %r32448, 2; 2026-02-21T10:19:38.1155192Z or.b32 %r32449, %r42464, %r42379; 2026-02-21T10:19:38.1155248Z shl.b32 %r32450, %r32449, 13; 2026-02-21T10:19:38.1155310Z mul.wide.s32 %rd98, %r32450, 2; 2026-02-21T10:19:38.1155369Z shl.b32 %r32451, %r32438, 20; 2026-02-21T10:19:38.1155427Z or.b32 %r32452, %r42465, %r32451; 2026-02-21T10:19:38.1155488Z mul.wide.s32 %rd99, %r32452, 2; 2026-02-21T10:19:38.1155547Z or.b32 %r43240, %r118, %r32451; 2026-02-21T10:19:38.1155609Z or.b32 %r32453, %r42466, %r32451; 2026-02-21T10:19:38.1155744Z mul.wide.s32 %rd100, %r32453, 2; 2026-02-21T10:19:38.1155807Z mov.b32 %r43241, 0f00000000; 2026-02-21T10:19:38.1155871Z mov.b64 %rd853, -96; 2026-02-21T10:19:38.1155931Z mov.b64 %rd852, %rd20; 2026-02-21T10:19:38.1155989Z mov.b32 %r43242, %r43241; 2026-02-21T10:19:38.1156046Z mov.b32 %r43243, %r43241; 2026-02-21T10:19:38.1156107Z mov.b32 %r43244, %r43241; 2026-02-21T10:19:38.1156166Z mov.b32 %r43245, %r43241; 2026-02-21T10:19:38.1156222Z mov.b32 %r43246, %r43241; 2026-02-21T10:19:38.1156281Z mov.b32 %r43247, %r43241; 2026-02-21T10:19:38.1156337Z mov.b32 %r43248, %r43241; 2026-02-21T10:19:38.1156392Z mov.b32 %r43249, %r43241; 2026-02-21T10:19:38.1156568Z mov.b32 %r43250, %r43241; 2026-02-21T10:19:38.1156632Z mov.b32 %r43251, %r43241; 2026-02-21T10:19:38.1156687Z mov.b32 %r43252, %r43241; 
2026-02-21T10:19:38.1156742Z mov.b32 %r43253, %r43241; 2026-02-21T10:19:38.1156800Z mov.b32 %r43254, %r43241; 2026-02-21T10:19:38.1156871Z mov.b32 %r43255, %r43241; 2026-02-21T10:19:38.1156930Z mov.b32 %r43256, %r43241; 2026-02-21T10:19:38.1156986Z mov.b32 %r43257, %r43241; 2026-02-21T10:19:38.1157055Z mov.b32 %r43258, %r43241; 2026-02-21T10:19:38.1157112Z mov.b32 %r43259, %r43241; 2026-02-21T10:19:38.1157169Z mov.b32 %r43260, %r43241; 2026-02-21T10:19:38.1157226Z mov.b32 %r43261, %r43241; 2026-02-21T10:19:38.1157360Z mov.b32 %r43262, %r43241; 2026-02-21T10:19:38.1157415Z mov.b32 %r43263, %r43241; 2026-02-21T10:19:38.1157471Z mov.b32 %r43264, %r43241; 2026-02-21T10:19:38.1157531Z mov.b32 %r43265, %r43241; 2026-02-21T10:19:38.1157586Z mov.b32 %r43266, %r43241; 2026-02-21T10:19:38.1157643Z mov.b32 %r43267, %r43241; 2026-02-21T10:19:38.1157700Z mov.b32 %r43268, %r43241; 2026-02-21T10:19:38.1157755Z mov.b32 %r43269, %r43241; 2026-02-21T10:19:38.1157811Z mov.b32 %r43270, %r43241; 2026-02-21T10:19:38.1157869Z mov.b32 %r43271, %r43241; 2026-02-21T10:19:38.1157924Z mov.b32 %r43272, %r43241; 2026-02-21T10:19:38.1157981Z mov.b32 %r43273, %r43241; 2026-02-21T10:19:38.1158038Z mov.b32 %r43274, %r43241; 2026-02-21T10:19:38.1158096Z mov.b32 %r43275, %r43241; 2026-02-21T10:19:38.1158153Z mov.b32 %r43276, %r43241; 2026-02-21T10:19:38.1158207Z mov.b32 %r43277, %r43241; 2026-02-21T10:19:38.1158264Z mov.b32 %r43278, %r43241; 2026-02-21T10:19:38.1158318Z mov.b32 %r43279, %r43241; 2026-02-21T10:19:38.1158374Z mov.b32 %r43280, %r43241; 2026-02-21T10:19:38.1158430Z mov.b32 %r43281, %r43241; 2026-02-21T10:19:38.1158489Z mov.b32 %r43282, %r43241; 2026-02-21T10:19:38.1158621Z mov.b32 %r43283, %r43241; 2026-02-21T10:19:38.1158685Z mov.b32 %r43284, %r43241; 2026-02-21T10:19:38.1158742Z mov.b32 %r43285, %r43241; 2026-02-21T10:19:38.1158798Z mov.b32 %r43286, %r43241; 2026-02-21T10:19:38.1158854Z mov.b32 %r43287, %r43241; 2026-02-21T10:19:38.1158910Z mov.b32 %r43288, %r43241; 
2026-02-21T10:19:38.1158968Z mov.b32 %r43289, %r43241; 2026-02-21T10:19:38.1159023Z mov.b32 %r43290, %r43241; 2026-02-21T10:19:38.1159140Z mov.b32 %r43291, %r43241; 2026-02-21T10:19:38.1159199Z mov.b32 %r43292, %r43241; 2026-02-21T10:19:38.1159255Z mov.b32 %r43293, %r43241; 2026-02-21T10:19:38.1159310Z mov.b32 %r43294, %r43241; 2026-02-21T10:19:38.1159366Z mov.b32 %r43295, %r43241; 2026-02-21T10:19:38.1159424Z mov.b32 %r43296, %r43241; 2026-02-21T10:19:38.1159479Z mov.b32 %r43297, %r43241; 2026-02-21T10:19:38.1159536Z mov.b32 %r43298, %r43241; 2026-02-21T10:19:38.1159595Z mov.b32 %r43299, %r43241; 2026-02-21T10:19:38.1159651Z mov.b32 %r43300, %r43241; 2026-02-21T10:19:38.1159707Z mov.b32 %r43301, %r43241; 2026-02-21T10:19:38.1159763Z mov.b32 %r43302, %r43241; 2026-02-21T10:19:38.1159822Z mov.b32 %r43303, %r43241; 2026-02-21T10:19:38.1159878Z mov.b32 %r43304, %r43241; 2026-02-21T10:19:38.1159935Z mov.b32 %r43305, %r43241; 2026-02-21T10:19:38.1159994Z mov.b32 %r43306, %r43241; 2026-02-21T10:19:38.1160050Z mov.b32 %r43307, %r43241; 2026-02-21T10:19:38.1160106Z mov.b32 %r43308, %r43241; 2026-02-21T10:19:38.1160164Z mov.b32 %r43309, %r43241; 2026-02-21T10:19:38.1160296Z mov.b32 %r43310, %r43241; 2026-02-21T10:19:38.1160355Z mov.b32 %r43311, %r43241; 2026-02-21T10:19:38.1160411Z mov.b32 %r43312, %r43241; 2026-02-21T10:19:38.1160469Z mov.b32 %r43313, %r43241; 2026-02-21T10:19:38.1160526Z mov.b32 %r43314, %r43241; 2026-02-21T10:19:38.1160582Z mov.b32 %r43315, %r43241; 2026-02-21T10:19:38.1160643Z mov.b32 %r43316, %r43241; 2026-02-21T10:19:38.1160697Z mov.b32 %r43317, %r43241; 2026-02-21T10:19:38.1160753Z mov.b32 %r43318, %r43241; 2026-02-21T10:19:38.1160810Z mov.b32 %r43319, %r43241; 2026-02-21T10:19:38.1160868Z mov.b32 %r43320, %r43241; 2026-02-21T10:19:38.1160925Z mov.b32 %r43321, %r43241; 2026-02-21T10:19:38.1160980Z mov.b32 %r43322, %r43241; 2026-02-21T10:19:38.1161037Z mov.b32 %r43323, %r43241; 2026-02-21T10:19:38.1161092Z mov.b32 %r43324, %r43241; 
2026-02-21T10:19:38.1161159Z mov.b32 %r43325, %r43241; 2026-02-21T10:19:38.1161217Z mov.b32 %r43326, %r43241; 2026-02-21T10:19:38.1161278Z mov.b32 %r43327, %r43241; 2026-02-21T10:19:38.1161336Z mov.b32 %r43328, %r43241; 2026-02-21T10:19:38.1161393Z mov.b32 %r43329, %r43241; 2026-02-21T10:19:38.1161453Z mov.b32 %r43330, %r43241; 2026-02-21T10:19:38.1161509Z mov.b32 %r43331, %r43241; 2026-02-21T10:19:38.1161564Z mov.b32 %r43332, %r43241; 2026-02-21T10:19:38.1161677Z mov.b32 %r43333, %r43241; 2026-02-21T10:19:38.1161736Z mov.b32 %r43334, %r43241; 2026-02-21T10:19:38.1161791Z mov.b32 %r43335, %r43241; 2026-02-21T10:19:38.1161848Z mov.b32 %r43336, %r43241; 2026-02-21T10:19:38.1161908Z mov.b32 %r43337, %r43241; 2026-02-21T10:19:38.1161965Z mov.b32 %r43338, %r43241; 2026-02-21T10:19:38.1162020Z mov.b32 %r43339, %r43241; 2026-02-21T10:19:38.1162075Z mov.b32 %r43340, %r43241; 2026-02-21T10:19:38.1162132Z mov.b32 %r43341, %r43241; 2026-02-21T10:19:38.1162187Z mov.b32 %r43342, %r43241; 2026-02-21T10:19:38.1162242Z mov.b32 %r43343, %r43241; 2026-02-21T10:19:38.1162299Z mov.b32 %r43344, %r43241; 2026-02-21T10:19:38.1162358Z mov.b32 %r43345, %r43241; 2026-02-21T10:19:38.1162415Z mov.b32 %r43346, %r43241; 2026-02-21T10:19:38.1162482Z mov.b32 %r43347, %r43241; 2026-02-21T10:19:38.1162542Z mov.b32 %r43348, %r43241; 2026-02-21T10:19:38.1162598Z mov.b32 %r43349, %r43241; 2026-02-21T10:19:38.1162655Z mov.b32 %r43350, %r43241; 2026-02-21T10:19:38.1162715Z mov.b32 %r43351, %r43241; 2026-02-21T10:19:38.1162771Z mov.b32 %r43352, %r43241; 2026-02-21T10:19:38.1162827Z mov.b32 %r43353, %r43241; 2026-02-21T10:19:38.1162882Z mov.b32 %r43354, %r43241; 2026-02-21T10:19:38.1162994Z mov.b32 %r43355, %r43241; 2026-02-21T10:19:38.1163062Z mov.b32 %r43356, %r43241; 2026-02-21T10:19:38.1163121Z mov.b32 %r43357, %r43241; 2026-02-21T10:19:38.1163180Z mov.b32 %r43358, %r43241; 2026-02-21T10:19:38.1163236Z mov.b32 %r43359, %r43241; 2026-02-21T10:19:38.1163290Z mov.b32 %r43360, %r43241; 
2026-02-21T10:19:38.1163345Z mov.b32 %r43361, %r43241; 2026-02-21T10:19:38.1163403Z mov.b32 %r43362, %r43241; 2026-02-21T10:19:38.1163506Z mov.b32 %r43363, %r43241; 2026-02-21T10:19:38.1163563Z mov.b32 %r43364, %r43241; 2026-02-21T10:19:38.1163621Z mov.b32 %r43365, %r43241; 2026-02-21T10:19:38.1163676Z mov.b32 %r43366, %r43241; 2026-02-21T10:19:38.1163731Z mov.b32 %r43367, %r43241; 2026-02-21T10:19:38.1163788Z mov.b32 %r43368, %r43241; 2026-02-21T10:19:38.1163905Z $L__BB0_18: // Parent Loop BB0_17 Depth=1 2026-02-21T10:19:38.1164011Z // => This Inner Loop Header: Depth=2 2026-02-21T10:19:38.1164212Z .loc 1 55 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:32 2026-02-21T10:19:38.1164280Z add.s64 %rd675, %rd852, %rd99; 2026-02-21T10:19:38.1164349Z add.s64 %rd678, %rd852, %rd98; 2026-02-21T10:19:38.1164409Z add.s64 %rd681, %rd852, %rd97; 2026-02-21T10:19:38.1164470Z add.s64 %rd684, %rd852, %rd96; 2026-02-21T10:19:38.1164529Z add.s64 %rd687, %rd852, %rd95; 2026-02-21T10:19:38.1164587Z add.s64 %rd690, %rd852, %rd94; 2026-02-21T10:19:38.1164715Z add.s64 %rd693, %rd852, %rd93; 2026-02-21T10:19:38.1164913Z .loc 1 55 80 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:80 2026-02-21T10:19:38.1164976Z add.s64 %rd696, %rd852, %rd100; 2026-02-21T10:19:38.1165035Z // begin inline asm 2026-02-21T10:19:38.1165096Z mov.u64 %rd674, 0x0; 2026-02-21T10:19:38.1165226Z createpolicy.fractional.L2::evict_first.b64 %rd674, 1.0; 2026-02-21T10:19:38.1165282Z // end inline asm 2026-02-21T10:19:38.1165342Z // begin inline asm 2026-02-21T10:19:38.1165412Z mov.u32 %r32454, 0x0; 2026-02-21T10:19:38.1165470Z mov.u32 %r32455, 0x0; 2026-02-21T10:19:38.1165527Z mov.u32 %r32456, 0x0; 2026-02-21T10:19:38.1165584Z mov.u32 %r32457, 0x0; 2026-02-21T10:19:38.1165824Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r32454, %r32455, %r32456, %r32457 }, [ %rd675 + 0 ], %rd674; 2026-02-21T10:19:38.1165879Z // end inline asm 2026-02-21T10:19:38.1165939Z // begin inline asm 
2026-02-21T10:19:38.1165997Z mov.u64 %rd677, 0x0; 2026-02-21T10:19:38.1166118Z createpolicy.fractional.L2::evict_first.b64 %rd677, 1.0; 2026-02-21T10:19:38.1166175Z // end inline asm 2026-02-21T10:19:38.1166232Z // begin inline asm 2026-02-21T10:19:38.1166286Z mov.u32 %r32458, 0x0; 2026-02-21T10:19:38.1166341Z mov.u32 %r32459, 0x0; 2026-02-21T10:19:38.1166575Z mov.u32 %r32460, 0x0; 2026-02-21T10:19:38.1166646Z mov.u32 %r32461, 0x0; 2026-02-21T10:19:38.1166890Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r32458, %r32459, %r32460, %r32461 }, [ %rd678 + 0 ], %rd677; 2026-02-21T10:19:38.1166949Z // end inline asm 2026-02-21T10:19:38.1167006Z // begin inline asm 2026-02-21T10:19:38.1167062Z mov.u64 %rd680, 0x0; 2026-02-21T10:19:38.1167181Z createpolicy.fractional.L2::evict_first.b64 %rd680, 1.0; 2026-02-21T10:19:38.1167235Z // end inline asm 2026-02-21T10:19:38.1167290Z // begin inline asm 2026-02-21T10:19:38.1167346Z mov.u32 %r32462, 0x0; 2026-02-21T10:19:38.1167403Z mov.u32 %r32463, 0x0; 2026-02-21T10:19:38.1167460Z mov.u32 %r32464, 0x0; 2026-02-21T10:19:38.1167517Z mov.u32 %r32465, 0x0; 2026-02-21T10:19:38.1167742Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r32462, %r32463, %r32464, %r32465 }, [ %rd681 + 0 ], %rd680; 2026-02-21T10:19:38.1167799Z // end inline asm 2026-02-21T10:19:38.1167855Z // begin inline asm 2026-02-21T10:19:38.1167914Z mov.u64 %rd683, 0x0; 2026-02-21T10:19:38.1168032Z createpolicy.fractional.L2::evict_first.b64 %rd683, 1.0; 2026-02-21T10:19:38.1168085Z // end inline asm 2026-02-21T10:19:38.1168225Z // begin inline asm 2026-02-21T10:19:38.1168289Z mov.u32 %r32466, 0x0; 2026-02-21T10:19:38.1168344Z mov.u32 %r32467, 0x0; 2026-02-21T10:19:38.1168399Z mov.u32 %r32468, 0x0; 2026-02-21T10:19:38.1168454Z mov.u32 %r32469, 0x0; 2026-02-21T10:19:38.1168688Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r32466, %r32467, %r32468, %r32469 }, [ %rd684 + 0 ], %rd683; 2026-02-21T10:19:38.1168741Z // end inline asm 2026-02-21T10:19:38.1168796Z 
// begin inline asm 2026-02-21T10:19:38.1168931Z mov.u64 %rd686, 0x0; 2026-02-21T10:19:38.1169048Z createpolicy.fractional.L2::evict_first.b64 %rd686, 1.0; 2026-02-21T10:19:38.1169103Z // end inline asm 2026-02-21T10:19:38.1169173Z // begin inline asm 2026-02-21T10:19:38.1169229Z mov.u32 %r32470, 0x0; 2026-02-21T10:19:38.1169284Z mov.u32 %r32471, 0x0; 2026-02-21T10:19:38.1169342Z mov.u32 %r32472, 0x0; 2026-02-21T10:19:38.1169398Z mov.u32 %r32473, 0x0; 2026-02-21T10:19:38.1169620Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r32470, %r32471, %r32472, %r32473 }, [ %rd687 + 0 ], %rd686; 2026-02-21T10:19:38.1169676Z // end inline asm 2026-02-21T10:19:38.1169734Z // begin inline asm 2026-02-21T10:19:38.1169791Z mov.u64 %rd689, 0x0; 2026-02-21T10:19:38.1169905Z createpolicy.fractional.L2::evict_first.b64 %rd689, 1.0; 2026-02-21T10:19:38.1169961Z // end inline asm 2026-02-21T10:19:38.1170017Z // begin inline asm 2026-02-21T10:19:38.1170073Z mov.u32 %r32474, 0x0; 2026-02-21T10:19:38.1170128Z mov.u32 %r32475, 0x0; 2026-02-21T10:19:38.1170186Z mov.u32 %r32476, 0x0; 2026-02-21T10:19:38.1170308Z mov.u32 %r32477, 0x0; 2026-02-21T10:19:38.1170533Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r32474, %r32475, %r32476, %r32477 }, [ %rd690 + 0 ], %rd689; 2026-02-21T10:19:38.1170589Z // end inline asm 2026-02-21T10:19:38.1170650Z // begin inline asm 2026-02-21T10:19:38.1170708Z mov.u64 %rd692, 0x0; 2026-02-21T10:19:38.1170826Z createpolicy.fractional.L2::evict_first.b64 %rd692, 1.0; 2026-02-21T10:19:38.1170880Z // end inline asm 2026-02-21T10:19:38.1170950Z // begin inline asm 2026-02-21T10:19:38.1171008Z mov.u32 %r32478, 0x0; 2026-02-21T10:19:38.1171070Z mov.u32 %r32479, 0x0; 2026-02-21T10:19:38.1171125Z mov.u32 %r32480, 0x0; 2026-02-21T10:19:38.1171180Z mov.u32 %r32481, 0x0; 2026-02-21T10:19:38.1171403Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r32478, %r32479, %r32480, %r32481 }, [ %rd693 + 0 ], %rd692; 2026-02-21T10:19:38.1171459Z // end inline asm 
2026-02-21T10:19:38.1171514Z // begin inline asm 2026-02-21T10:19:38.1171572Z mov.u64 %rd695, 0x0; 2026-02-21T10:19:38.1171693Z createpolicy.fractional.L2::evict_first.b64 %rd695, 1.0; 2026-02-21T10:19:38.1171748Z // end inline asm 2026-02-21T10:19:38.1171803Z // begin inline asm 2026-02-21T10:19:38.1171861Z mov.u32 %r32482, 0x0; 2026-02-21T10:19:38.1171916Z mov.u32 %r32483, 0x0; 2026-02-21T10:19:38.1172042Z mov.u32 %r32484, 0x0; 2026-02-21T10:19:38.1172099Z mov.u32 %r32485, 0x0; 2026-02-21T10:19:38.1172323Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r32482, %r32483, %r32484, %r32485 }, [ %rd696 + 0 ], %rd695; 2026-02-21T10:19:38.1172377Z // end inline asm 2026-02-21T10:19:38.1172576Z .loc 1 59 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:59:32 2026-02-21T10:19:38.1172633Z bar.sync 0; 2026-02-21T10:19:38.1172715Z st.shared.v2.b32 [%r68], {%r32454, %r32455}; 2026-02-21T10:19:38.1172804Z st.shared.v2.b32 [%r68+2048], {%r32458, %r32459}; 2026-02-21T10:19:38.1172891Z st.shared.v2.b32 [%r68+4096], {%r32462, %r32463}; 2026-02-21T10:19:38.1172978Z st.shared.v2.b32 [%r68+6144], {%r32466, %r32467}; 2026-02-21T10:19:38.1173060Z st.shared.v2.b32 [%r68+8192], {%r32470, %r32471}; 2026-02-21T10:19:38.1173153Z st.shared.v2.b32 [%r68+10240], {%r32474, %r32475}; 2026-02-21T10:19:38.1173237Z st.shared.v2.b32 [%r68+12288], {%r32478, %r32479}; 2026-02-21T10:19:38.1173323Z st.shared.v2.b32 [%r68+14336], {%r32482, %r32483}; 2026-02-21T10:19:38.1173400Z st.shared.v2.b32 [%r69], {%r32456, %r32457}; 2026-02-21T10:19:38.1173533Z st.shared.v2.b32 [%r69+2048], {%r32460, %r32461}; 2026-02-21T10:19:38.1173619Z st.shared.v2.b32 [%r69+4096], {%r32464, %r32465}; 2026-02-21T10:19:38.1173701Z st.shared.v2.b32 [%r69+6144], {%r32468, %r32469}; 2026-02-21T10:19:38.1173784Z st.shared.v2.b32 [%r69+8192], {%r32472, %r32473}; 2026-02-21T10:19:38.1173881Z st.shared.v2.b32 [%r69+10240], {%r32476, %r32477}; 2026-02-21T10:19:38.1173974Z st.shared.v2.b32 [%r69+12288], {%r32480, 
%r32481}; 2026-02-21T10:19:38.1174129Z st.shared.v2.b32 [%r69+14336], {%r32484, %r32485}; 2026-02-21T10:19:38.1174187Z bar.sync 0; 2026-02-21T10:19:38.1174255Z ld.shared.b16 %rs2694, [%r70]; 2026-02-21T10:19:38.1174326Z ld.shared.b16 %rs2695, [%r70+1024]; 2026-02-21T10:19:38.1174397Z ld.shared.b16 %rs2696, [%r70+64]; 2026-02-21T10:19:38.1174462Z ld.shared.b16 %rs2697, [%r70+1088]; 2026-02-21T10:19:38.1174527Z ld.shared.b16 %rs2698, [%r70+8192]; 2026-02-21T10:19:38.1174591Z ld.shared.b16 %rs2699, [%r70+9216]; 2026-02-21T10:19:38.1174654Z ld.shared.b16 %rs2700, [%r70+8256]; 2026-02-21T10:19:38.1174719Z ld.shared.b16 %rs2701, [%r70+9280]; 2026-02-21T10:19:38.1174784Z ld.shared.b16 %rs2702, [%r71]; 2026-02-21T10:19:38.1174851Z ld.shared.b16 %rs2703, [%r71+1024]; 2026-02-21T10:19:38.1174914Z ld.shared.b16 %rs2704, [%r71+64]; 2026-02-21T10:19:38.1174977Z ld.shared.b16 %rs2705, [%r71+1088]; 2026-02-21T10:19:38.1175042Z ld.shared.b16 %rs2706, [%r71+8192]; 2026-02-21T10:19:38.1175105Z ld.shared.b16 %rs2707, [%r71+9216]; 2026-02-21T10:19:38.1175227Z ld.shared.b16 %rs2708, [%r71+8256]; 2026-02-21T10:19:38.1175300Z ld.shared.b16 %rs2709, [%r71+9280]; 2026-02-21T10:19:38.1175365Z ld.shared.b16 %rs2710, [%r72]; 2026-02-21T10:19:38.1175429Z ld.shared.b16 %rs2711, [%r72+1024]; 2026-02-21T10:19:38.1175491Z ld.shared.b16 %rs2712, [%r72+64]; 2026-02-21T10:19:38.1175558Z ld.shared.b16 %rs2713, [%r72+1088]; 2026-02-21T10:19:38.1175624Z ld.shared.b16 %rs2714, [%r72+8192]; 2026-02-21T10:19:38.1175687Z ld.shared.b16 %rs2715, [%r72+9216]; 2026-02-21T10:19:38.1175755Z ld.shared.b16 %rs2716, [%r72+8256]; 2026-02-21T10:19:38.1175818Z ld.shared.b16 %rs2717, [%r72+9280]; 2026-02-21T10:19:38.1175880Z ld.shared.b16 %rs2718, [%r73]; 2026-02-21T10:19:38.1175943Z ld.shared.b16 %rs2719, [%r73+1024]; 2026-02-21T10:19:38.1176007Z ld.shared.b16 %rs2720, [%r73+64]; 2026-02-21T10:19:38.1176070Z ld.shared.b16 %rs2721, [%r73+1088]; 2026-02-21T10:19:38.1176133Z ld.shared.b16 %rs2722, [%r73+8192]; 
2026-02-21T10:19:38.1176197Z ld.shared.b16 %rs2723, [%r73+9216]; 2026-02-21T10:19:38.1176266Z ld.shared.b16 %rs2724, [%r73+8256]; 2026-02-21T10:19:38.1176328Z ld.shared.b16 %rs2725, [%r73+9280]; 2026-02-21T10:19:38.1176394Z ld.shared.b16 %rs2726, [%r74]; 2026-02-21T10:19:38.1176589Z ld.shared.b16 %rs2727, [%r74+1024]; 2026-02-21T10:19:38.1176658Z ld.shared.b16 %rs2728, [%r74+64]; 2026-02-21T10:19:38.1176812Z ld.shared.b16 %rs2729, [%r74+1088]; 2026-02-21T10:19:38.1176889Z ld.shared.b16 %rs2730, [%r74+8192]; 2026-02-21T10:19:38.1176954Z ld.shared.b16 %rs2731, [%r74+9216]; 2026-02-21T10:19:38.1177019Z ld.shared.b16 %rs2732, [%r74+8256]; 2026-02-21T10:19:38.1177084Z ld.shared.b16 %rs2733, [%r74+9280]; 2026-02-21T10:19:38.1177146Z ld.shared.b16 %rs2734, [%r75]; 2026-02-21T10:19:38.1177211Z ld.shared.b16 %rs2735, [%r75+1024]; 2026-02-21T10:19:38.1177274Z ld.shared.b16 %rs2736, [%r75+64]; 2026-02-21T10:19:38.1177338Z ld.shared.b16 %rs2737, [%r75+1088]; 2026-02-21T10:19:38.1177402Z ld.shared.b16 %rs2738, [%r75+8192]; 2026-02-21T10:19:38.1177469Z ld.shared.b16 %rs2739, [%r75+9216]; 2026-02-21T10:19:38.1177536Z ld.shared.b16 %rs2740, [%r75+8256]; 2026-02-21T10:19:38.1177599Z ld.shared.b16 %rs2741, [%r75+9280]; 2026-02-21T10:19:38.1177661Z ld.shared.b16 %rs2742, [%r76]; 2026-02-21T10:19:38.1177726Z ld.shared.b16 %rs2743, [%r76+1024]; 2026-02-21T10:19:38.1177791Z ld.shared.b16 %rs2744, [%r76+64]; 2026-02-21T10:19:38.1177857Z ld.shared.b16 %rs2745, [%r76+1088]; 2026-02-21T10:19:38.1177921Z ld.shared.b16 %rs2746, [%r76+8192]; 2026-02-21T10:19:38.1178061Z ld.shared.b16 %rs2747, [%r76+9216]; 2026-02-21T10:19:38.1178132Z ld.shared.b16 %rs2748, [%r76+8256]; 2026-02-21T10:19:38.1178196Z ld.shared.b16 %rs2749, [%r76+9280]; 2026-02-21T10:19:38.1178260Z ld.shared.b16 %rs2750, [%r77]; 2026-02-21T10:19:38.1178324Z ld.shared.b16 %rs2751, [%r77+1024]; 2026-02-21T10:19:38.1178386Z ld.shared.b16 %rs2752, [%r77+64]; 2026-02-21T10:19:38.1178449Z ld.shared.b16 %rs2753, [%r77+1088]; 
2026-02-21T10:19:38.1178515Z ld.shared.b16 %rs2754, [%r77+8192]; 2026-02-21T10:19:38.1178644Z ld.shared.b16 %rs2755, [%r77+9216]; 2026-02-21T10:19:38.1178707Z ld.shared.b16 %rs2756, [%r77+8256]; 2026-02-21T10:19:38.1178771Z ld.shared.b16 %rs2757, [%r77+9280]; 2026-02-21T10:19:38.1178832Z cvt.f32.bf16 %r32623, %rs2694; 2026-02-21T10:19:38.1178893Z cvt.f32.bf16 %r32624, %rs2695; 2026-02-21T10:19:38.1178958Z cvt.f32.bf16 %r32625, %rs2702; 2026-02-21T10:19:38.1179016Z cvt.f32.bf16 %r32626, %rs2703; 2026-02-21T10:19:38.1179075Z cvt.f32.bf16 %r32755, %rs2710; 2026-02-21T10:19:38.1179135Z cvt.f32.bf16 %r32756, %rs2711; 2026-02-21T10:19:38.1179198Z cvt.f32.bf16 %r32757, %rs2718; 2026-02-21T10:19:38.1179256Z cvt.f32.bf16 %r32758, %rs2719; 2026-02-21T10:19:38.1179314Z cvt.f32.bf16 %r32887, %rs2726; 2026-02-21T10:19:38.1179379Z cvt.f32.bf16 %r32888, %rs2727; 2026-02-21T10:19:38.1179445Z cvt.f32.bf16 %r32889, %rs2734; 2026-02-21T10:19:38.1179505Z cvt.f32.bf16 %r32890, %rs2735; 2026-02-21T10:19:38.1179566Z cvt.f32.bf16 %r33019, %rs2742; 2026-02-21T10:19:38.1179695Z cvt.f32.bf16 %r33020, %rs2743; 2026-02-21T10:19:38.1179757Z cvt.f32.bf16 %r33021, %rs2750; 2026-02-21T10:19:38.1179816Z cvt.f32.bf16 %r33022, %rs2751; 2026-02-21T10:19:38.1179875Z cvt.f32.bf16 %r33151, %rs2696; 2026-02-21T10:19:38.1179934Z cvt.f32.bf16 %r33152, %rs2697; 2026-02-21T10:19:38.1179991Z cvt.f32.bf16 %r33153, %rs2704; 2026-02-21T10:19:38.1180053Z cvt.f32.bf16 %r33154, %rs2705; 2026-02-21T10:19:38.1180114Z cvt.f32.bf16 %r33283, %rs2712; 2026-02-21T10:19:38.1180174Z cvt.f32.bf16 %r33284, %rs2713; 2026-02-21T10:19:38.1180233Z cvt.f32.bf16 %r33285, %rs2720; 2026-02-21T10:19:38.1180293Z cvt.f32.bf16 %r33286, %rs2721; 2026-02-21T10:19:38.1180352Z cvt.f32.bf16 %r33415, %rs2728; 2026-02-21T10:19:38.1180410Z cvt.f32.bf16 %r33416, %rs2729; 2026-02-21T10:19:38.1180470Z cvt.f32.bf16 %r33417, %rs2736; 2026-02-21T10:19:38.1180528Z cvt.f32.bf16 %r33418, %rs2737; 2026-02-21T10:19:38.1180586Z cvt.f32.bf16 %r33547, %rs2744; 
2026-02-21T10:19:38.1180647Z cvt.f32.bf16 %r33548, %rs2745; 2026-02-21T10:19:38.1180709Z cvt.f32.bf16 %r33549, %rs2752; 2026-02-21T10:19:38.1180768Z cvt.f32.bf16 %r33550, %rs2753; 2026-02-21T10:19:38.1180827Z cvt.f32.bf16 %r33679, %rs2698; 2026-02-21T10:19:38.1180887Z cvt.f32.bf16 %r33680, %rs2699; 2026-02-21T10:19:38.1180946Z cvt.f32.bf16 %r33681, %rs2706; 2026-02-21T10:19:38.1181062Z cvt.f32.bf16 %r33682, %rs2707; 2026-02-21T10:19:38.1181121Z cvt.f32.bf16 %r33811, %rs2714; 2026-02-21T10:19:38.1181181Z cvt.f32.bf16 %r33812, %rs2715; 2026-02-21T10:19:38.1181240Z cvt.f32.bf16 %r33813, %rs2722; 2026-02-21T10:19:38.1181299Z cvt.f32.bf16 %r33814, %rs2723; 2026-02-21T10:19:38.1181360Z cvt.f32.bf16 %r33943, %rs2730; 2026-02-21T10:19:38.1181419Z cvt.f32.bf16 %r33944, %rs2731; 2026-02-21T10:19:38.1181477Z cvt.f32.bf16 %r33945, %rs2738; 2026-02-21T10:19:38.1181536Z cvt.f32.bf16 %r33946, %rs2739; 2026-02-21T10:19:38.1181596Z cvt.f32.bf16 %r34075, %rs2746; 2026-02-21T10:19:38.1181655Z cvt.f32.bf16 %r34076, %rs2747; 2026-02-21T10:19:38.1181717Z cvt.f32.bf16 %r34077, %rs2754; 2026-02-21T10:19:38.1181784Z cvt.f32.bf16 %r34078, %rs2755; 2026-02-21T10:19:38.1181851Z cvt.f32.bf16 %r34207, %rs2700; 2026-02-21T10:19:38.1181910Z cvt.f32.bf16 %r34208, %rs2701; 2026-02-21T10:19:38.1181969Z cvt.f32.bf16 %r34209, %rs2708; 2026-02-21T10:19:38.1182029Z cvt.f32.bf16 %r34210, %rs2709; 2026-02-21T10:19:38.1182090Z cvt.f32.bf16 %r34339, %rs2716; 2026-02-21T10:19:38.1182149Z cvt.f32.bf16 %r34340, %rs2717; 2026-02-21T10:19:38.1182263Z cvt.f32.bf16 %r34341, %rs2724; 2026-02-21T10:19:38.1182324Z cvt.f32.bf16 %r34342, %rs2725; 2026-02-21T10:19:38.1182382Z cvt.f32.bf16 %r34471, %rs2732; 2026-02-21T10:19:38.1182443Z cvt.f32.bf16 %r34472, %rs2733; 2026-02-21T10:19:38.1182502Z cvt.f32.bf16 %r34473, %rs2740; 2026-02-21T10:19:38.1182561Z cvt.f32.bf16 %r34474, %rs2741; 2026-02-21T10:19:38.1182619Z cvt.f32.bf16 %r34603, %rs2748; 2026-02-21T10:19:38.1182680Z cvt.f32.bf16 %r34604, %rs2749; 
2026-02-21T10:19:38.1182785Z cvt.f32.bf16 %r34605, %rs2756; 2026-02-21T10:19:38.1182844Z cvt.f32.bf16 %r34606, %rs2757; 2026-02-21T10:19:38.1183053Z .loc 1 61 33 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:61:33 2026-02-21T10:19:38.1183107Z bar.sync 0; 2026-02-21T10:19:38.1183168Z add.s32 %r39929, %r39931, 4096; 2026-02-21T10:19:38.1183229Z // begin inline asm 2026-02-21T10:19:38.1183335Z @%p313 mbarrier.init.shared::cta.b64 [%r39929], 1; 2026-02-21T10:19:38.1183391Z // end inline asm 2026-02-21T10:19:38.1183455Z bar.sync 0; 2026-02-21T10:19:38.1183518Z // begin inline asm 2026-02-21T10:19:38.1183657Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r39929], 4096; 2026-02-21T10:19:38.1183712Z // end inline asm 2026-02-21T10:19:38.1183771Z // begin inline asm 2026-02-21T10:19:38.1183846Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.1183900Z // end inline asm 2026-02-21T10:19:38.1183952Z bar.sync 0; 2026-02-21T10:19:38.1184020Z elect.sync %r39699|%p374, -1; 2026-02-21T10:19:38.1184089Z and.pred %p315, %p1, %p374; 2026-02-21T10:19:38.1184204Z add.s64 %rd103, %rd853, 96; 2026-02-21T10:19:38.1184272Z cvt.u32.u64 %r32490, %rd103; 2026-02-21T10:19:38.1184329Z // begin inline asm 2026-02-21T10:19:38.1184668Z @%p315 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39931], [%rd822, {%r39932, %r32490}], [%r39929]; 2026-02-21T10:19:38.1184725Z // end inline asm 2026-02-21T10:19:38.1184782Z bar.sync 0; 2026-02-21T10:19:38.1184838Z mov.b32 %r39567, 0; 2026-02-21T10:19:38.1184897Z // begin inline asm 2026-02-21T10:19:38.1184950Z 2026-02-21T10:19:38.1184999Z { 2026-02-21T10:19:38.1185061Z .reg .pred complete; 2026-02-21T10:19:38.1185115Z waitLoop: 2026-02-21T10:19:38.1185263Z mbarrier.try_wait.parity.shared.b64 complete, [%r39929], %r39567; 2026-02-21T10:19:38.1185333Z @!complete bra.uni waitLoop; 2026-02-21T10:19:38.1185382Z } 2026-02-21T10:19:38.1185387Z 2026-02-21T10:19:38.1185444Z // end inline asm 2026-02-21T10:19:38.1185512Z 
bar.sync 0; 2026-02-21T10:19:38.1185574Z // begin inline asm 2026-02-21T10:19:38.1185673Z @%p313 mbarrier.inval.shared::cta.b64 [%r39929]; 2026-02-21T10:19:38.1185729Z // end inline asm 2026-02-21T10:19:38.1185930Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1185996Z ld.shared.s8 %rs2758, [%r78]; 2026-02-21T10:19:38.1186261Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1186327Z shl.b16 %rs2759, %rs2758, 4; 2026-02-21T10:19:38.1186633Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1186706Z ld.shared.s8 %rs2760, [%r79+128]; 2026-02-21T10:19:38.1186905Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1186967Z shl.b16 %rs2761, %rs2760, 4; 2026-02-21T10:19:38.1187173Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1187242Z ld.shared.s8 %rs2762, [%r80+256]; 2026-02-21T10:19:38.1187432Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1187493Z shl.b16 %rs2763, %rs2762, 4; 2026-02-21T10:19:38.1187682Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1187744Z ld.shared.s8 %rs2764, [%r81+384]; 2026-02-21T10:19:38.1188015Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1188084Z shl.b16 %rs2765, %rs2764, 4; 2026-02-21T10:19:38.1188280Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1188344Z ld.shared.s8 %rs2766, [%r82+512]; 2026-02-21T10:19:38.1188614Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1188747Z shl.b16 %rs2767, %rs2766, 4; 2026-02-21T10:19:38.1188935Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1189001Z 
ld.shared.s8 %rs2768, [%r83+640]; 2026-02-21T10:19:38.1189188Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1189250Z shl.b16 %rs2769, %rs2768, 4; 2026-02-21T10:19:38.1189444Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1189507Z ld.shared.s8 %rs2770, [%r84+768]; 2026-02-21T10:19:38.1189695Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1189755Z shl.b16 %rs2771, %rs2770, 4; 2026-02-21T10:19:38.1189943Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1190007Z ld.shared.s8 %rs2772, [%r85+896]; 2026-02-21T10:19:38.1190259Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1190324Z shl.b16 %rs2773, %rs2772, 4; 2026-02-21T10:19:38.1190512Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1190581Z ld.shared.s8 %rs2774, [%r78+1024]; 2026-02-21T10:19:38.1190771Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1190834Z shl.b16 %rs2775, %rs2774, 4; 2026-02-21T10:19:38.1191021Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1191087Z ld.shared.s8 %rs2776, [%r79+1152]; 2026-02-21T10:19:38.1191274Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1191333Z shl.b16 %rs2777, %rs2776, 4; 2026-02-21T10:19:38.1191524Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1191590Z ld.shared.s8 %rs2778, [%r80+1280]; 2026-02-21T10:19:38.1191776Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1191835Z shl.b16 %rs2779, %rs2778, 4; 2026-02-21T10:19:38.1192091Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 
2026-02-21T10:19:38.1192156Z ld.shared.s8 %rs2780, [%r81+1408]; 2026-02-21T10:19:38.1192344Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1192406Z shl.b16 %rs2781, %rs2780, 4; 2026-02-21T10:19:38.1192605Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1192670Z ld.shared.s8 %rs2782, [%r82+1536]; 2026-02-21T10:19:38.1192862Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1192923Z shl.b16 %rs2783, %rs2782, 4; 2026-02-21T10:19:38.1193110Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1193175Z ld.shared.s8 %rs2784, [%r83+1664]; 2026-02-21T10:19:38.1193361Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1193423Z shl.b16 %rs2785, %rs2784, 4; 2026-02-21T10:19:38.1193677Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1193749Z ld.shared.s8 %rs2786, [%r84+1792]; 2026-02-21T10:19:38.1193936Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1193997Z shl.b16 %rs2787, %rs2786, 4; 2026-02-21T10:19:38.1194188Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1194301Z ld.shared.s8 %rs2788, [%r85+1920]; 2026-02-21T10:19:38.1194488Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1194557Z shl.b16 %rs2789, %rs2788, 4; 2026-02-21T10:19:38.1194743Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1194809Z ld.shared.s8 %rs2790, [%r78+2048]; 2026-02-21T10:19:38.1194999Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1195059Z shl.b16 %rs2791, %rs2790, 4; 2026-02-21T10:19:38.1195246Z .loc 1 66 25 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1195311Z ld.shared.s8 %rs2792, [%r79+2176]; 2026-02-21T10:19:38.1195499Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1195559Z shl.b16 %rs2793, %rs2792, 4; 2026-02-21T10:19:38.1195796Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1195863Z ld.shared.s8 %rs2794, [%r80+2304]; 2026-02-21T10:19:38.1196050Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1196111Z shl.b16 %rs2795, %rs2794, 4; 2026-02-21T10:19:38.1196300Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1196365Z ld.shared.s8 %rs2796, [%r81+2432]; 2026-02-21T10:19:38.1196687Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1196752Z shl.b16 %rs2797, %rs2796, 4; 2026-02-21T10:19:38.1196940Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1197003Z ld.shared.s8 %rs2798, [%r82+2560]; 2026-02-21T10:19:38.1197208Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1197270Z shl.b16 %rs2799, %rs2798, 4; 2026-02-21T10:19:38.1197458Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1197521Z ld.shared.s8 %rs2800, [%r83+2688]; 2026-02-21T10:19:38.1197787Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1197848Z shl.b16 %rs2801, %rs2800, 4; 2026-02-21T10:19:38.1198036Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1198100Z ld.shared.s8 %rs2802, [%r84+2816]; 2026-02-21T10:19:38.1198287Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1198346Z shl.b16 %rs2803, %rs2802, 4; 2026-02-21T10:19:38.1198539Z 
.loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1198607Z ld.shared.s8 %rs2804, [%r85+2944]; 2026-02-21T10:19:38.1198796Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1198856Z shl.b16 %rs2805, %rs2804, 4; 2026-02-21T10:19:38.1199046Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1199110Z ld.shared.s8 %rs2806, [%r78+3072]; 2026-02-21T10:19:38.1199361Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1199425Z shl.b16 %rs2807, %rs2806, 4; 2026-02-21T10:19:38.1199613Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1199675Z ld.shared.s8 %rs2808, [%r79+3200]; 2026-02-21T10:19:38.1199870Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1199998Z shl.b16 %rs2809, %rs2808, 4; 2026-02-21T10:19:38.1200193Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1200265Z ld.shared.s8 %rs2810, [%r80+3328]; 2026-02-21T10:19:38.1200452Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1200513Z shl.b16 %rs2811, %rs2810, 4; 2026-02-21T10:19:38.1200701Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1200766Z ld.shared.s8 %rs2812, [%r81+3456]; 2026-02-21T10:19:38.1200952Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1201011Z shl.b16 %rs2813, %rs2812, 4; 2026-02-21T10:19:38.1201200Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1201264Z ld.shared.s8 %rs2814, [%r82+3584]; 2026-02-21T10:19:38.1201524Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1201591Z shl.b16 %rs2815, %rs2814, 4; 
2026-02-21T10:19:38.1201781Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1201846Z ld.shared.s8 %rs2816, [%r83+3712]; 2026-02-21T10:19:38.1202037Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1202098Z shl.b16 %rs2817, %rs2816, 4; 2026-02-21T10:19:38.1202286Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1202350Z ld.shared.s8 %rs2818, [%r84+3840]; 2026-02-21T10:19:38.1202538Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1202596Z shl.b16 %rs2819, %rs2818, 4; 2026-02-21T10:19:38.1202786Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1202853Z ld.shared.s8 %rs2820, [%r85+3968]; 2026-02-21T10:19:38.1203039Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1203099Z shl.b16 %rs2821, %rs2820, 4; 2026-02-21T10:19:38.1203344Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1203405Z cvt.s16.s8 %rs2822, %rs2759; 2026-02-21T10:19:38.1203464Z shr.s16 %rs2823, %rs2822, 4; 2026-02-21T10:19:38.1203526Z cvt.s16.s8 %rs2824, %rs2761; 2026-02-21T10:19:38.1203585Z shr.s16 %rs2825, %rs2824, 4; 2026-02-21T10:19:38.1203643Z shr.s16 %rs2826, %rs2758, 4; 2026-02-21T10:19:38.1203701Z shr.s16 %rs2827, %rs2760, 4; 2026-02-21T10:19:38.1203891Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1203958Z cvt.rn.f32.s16 %r39700, %rs2827; 2026-02-21T10:19:38.1204024Z cvt.rn.f32.s16 %r39701, %rs2826; 2026-02-21T10:19:38.1204090Z cvt.rn.f32.s16 %r39702, %rs2825; 2026-02-21T10:19:38.1204150Z cvt.rn.f32.s16 %r39703, %rs2823; 2026-02-21T10:19:38.1204345Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1204409Z cvt.s16.s8 %rs2828, %rs2763; 
2026-02-21T10:19:38.1204469Z shr.s16 %rs2829, %rs2828, 4; 2026-02-21T10:19:38.1204528Z cvt.s16.s8 %rs2830, %rs2765; 2026-02-21T10:19:38.1204636Z shr.s16 %rs2831, %rs2830, 4; 2026-02-21T10:19:38.1204701Z shr.s16 %rs2832, %rs2762, 4; 2026-02-21T10:19:38.1204759Z shr.s16 %rs2833, %rs2764, 4; 2026-02-21T10:19:38.1204946Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1205010Z cvt.rn.f32.s16 %r39704, %rs2833; 2026-02-21T10:19:38.1205071Z cvt.rn.f32.s16 %r39705, %rs2832; 2026-02-21T10:19:38.1205142Z cvt.rn.f32.s16 %r39706, %rs2831; 2026-02-21T10:19:38.1205252Z cvt.rn.f32.s16 %r39707, %rs2829; 2026-02-21T10:19:38.1205445Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1205505Z cvt.s16.s8 %rs2834, %rs2767; 2026-02-21T10:19:38.1205566Z shr.s16 %rs2835, %rs2834, 4; 2026-02-21T10:19:38.1205629Z cvt.s16.s8 %rs2836, %rs2769; 2026-02-21T10:19:38.1205687Z shr.s16 %rs2837, %rs2836, 4; 2026-02-21T10:19:38.1205746Z shr.s16 %rs2838, %rs2766, 4; 2026-02-21T10:19:38.1205807Z shr.s16 %rs2839, %rs2768, 4; 2026-02-21T10:19:38.1205995Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1206057Z cvt.rn.f32.s16 %r39708, %rs2839; 2026-02-21T10:19:38.1206118Z cvt.rn.f32.s16 %r39709, %rs2838; 2026-02-21T10:19:38.1206180Z cvt.rn.f32.s16 %r39710, %rs2837; 2026-02-21T10:19:38.1206240Z cvt.rn.f32.s16 %r39711, %rs2835; 2026-02-21T10:19:38.1206607Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1206682Z cvt.s16.s8 %rs2840, %rs2771; 2026-02-21T10:19:38.1206743Z shr.s16 %rs2841, %rs2840, 4; 2026-02-21T10:19:38.1206801Z cvt.s16.s8 %rs2842, %rs2773; 2026-02-21T10:19:38.1206862Z shr.s16 %rs2843, %rs2842, 4; 2026-02-21T10:19:38.1206921Z shr.s16 %rs2844, %rs2770, 4; 2026-02-21T10:19:38.1206981Z shr.s16 %rs2845, %rs2772, 4; 2026-02-21T10:19:38.1207185Z .loc 1 84 32 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1207251Z cvt.rn.f32.s16 %r39712, %rs2845; 2026-02-21T10:19:38.1207313Z cvt.rn.f32.s16 %r39713, %rs2844; 2026-02-21T10:19:38.1207375Z cvt.rn.f32.s16 %r39714, %rs2843; 2026-02-21T10:19:38.1207440Z cvt.rn.f32.s16 %r39715, %rs2841; 2026-02-21T10:19:38.1207627Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1207686Z cvt.s16.s8 %rs2846, %rs2775; 2026-02-21T10:19:38.1207751Z shr.s16 %rs2847, %rs2846, 4; 2026-02-21T10:19:38.1207810Z cvt.s16.s8 %rs2848, %rs2777; 2026-02-21T10:19:38.1207869Z shr.s16 %rs2849, %rs2848, 4; 2026-02-21T10:19:38.1207926Z shr.s16 %rs2850, %rs2774, 4; 2026-02-21T10:19:38.1207986Z shr.s16 %rs2851, %rs2776, 4; 2026-02-21T10:19:38.1208176Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1208315Z cvt.rn.f32.s16 %r39716, %rs2851; 2026-02-21T10:19:38.1208377Z cvt.rn.f32.s16 %r39717, %rs2850; 2026-02-21T10:19:38.1208438Z cvt.rn.f32.s16 %r39718, %rs2849; 2026-02-21T10:19:38.1208497Z cvt.rn.f32.s16 %r39719, %rs2847; 2026-02-21T10:19:38.1208687Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1208747Z cvt.s16.s8 %rs2852, %rs2779; 2026-02-21T10:19:38.1208817Z shr.s16 %rs2853, %rs2852, 4; 2026-02-21T10:19:38.1208878Z cvt.s16.s8 %rs2854, %rs2781; 2026-02-21T10:19:38.1208941Z shr.s16 %rs2855, %rs2854, 4; 2026-02-21T10:19:38.1209003Z shr.s16 %rs2856, %rs2778, 4; 2026-02-21T10:19:38.1209061Z shr.s16 %rs2857, %rs2780, 4; 2026-02-21T10:19:38.1209255Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1209318Z cvt.rn.f32.s16 %r39720, %rs2857; 2026-02-21T10:19:38.1209380Z cvt.rn.f32.s16 %r39721, %rs2856; 2026-02-21T10:19:38.1209441Z cvt.rn.f32.s16 %r39722, %rs2855; 2026-02-21T10:19:38.1209502Z cvt.rn.f32.s16 %r39723, %rs2853; 2026-02-21T10:19:38.1209761Z .loc 1 66 25 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1209837Z cvt.s16.s8 %rs2858, %rs2783; 2026-02-21T10:19:38.1209901Z shr.s16 %rs2859, %rs2858, 4; 2026-02-21T10:19:38.1209960Z cvt.s16.s8 %rs2860, %rs2785; 2026-02-21T10:19:38.1210018Z shr.s16 %rs2861, %rs2860, 4; 2026-02-21T10:19:38.1210078Z shr.s16 %rs2862, %rs2782, 4; 2026-02-21T10:19:38.1210138Z shr.s16 %rs2863, %rs2784, 4; 2026-02-21T10:19:38.1210390Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1210452Z cvt.rn.f32.s16 %r39724, %rs2863; 2026-02-21T10:19:38.1210514Z cvt.rn.f32.s16 %r39725, %rs2862; 2026-02-21T10:19:38.1210574Z cvt.rn.f32.s16 %r39726, %rs2861; 2026-02-21T10:19:38.1210634Z cvt.rn.f32.s16 %r39727, %rs2859; 2026-02-21T10:19:38.1210826Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1210887Z cvt.s16.s8 %rs2864, %rs2787; 2026-02-21T10:19:38.1210946Z shr.s16 %rs2865, %rs2864, 4; 2026-02-21T10:19:38.1211007Z cvt.s16.s8 %rs2866, %rs2789; 2026-02-21T10:19:38.1211066Z shr.s16 %rs2867, %rs2866, 4; 2026-02-21T10:19:38.1211125Z shr.s16 %rs2868, %rs2786, 4; 2026-02-21T10:19:38.1211187Z shr.s16 %rs2869, %rs2788, 4; 2026-02-21T10:19:38.1211380Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1211442Z cvt.rn.f32.s16 %r39728, %rs2869; 2026-02-21T10:19:38.1211569Z cvt.rn.f32.s16 %r39729, %rs2868; 2026-02-21T10:19:38.1211634Z cvt.rn.f32.s16 %r39730, %rs2867; 2026-02-21T10:19:38.1211694Z cvt.rn.f32.s16 %r39731, %rs2865; 2026-02-21T10:19:38.1211883Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1211947Z cvt.s16.s8 %rs2870, %rs2791; 2026-02-21T10:19:38.1212005Z shr.s16 %rs2871, %rs2870, 4; 2026-02-21T10:19:38.1212063Z cvt.s16.s8 %rs2872, %rs2793; 2026-02-21T10:19:38.1212125Z shr.s16 %rs2873, %rs2872, 4; 2026-02-21T10:19:38.1212191Z shr.s16 %rs2874, %rs2790, 4; 
2026-02-21T10:19:38.1212249Z shr.s16 %rs2875, %rs2792, 4; 2026-02-21T10:19:38.1212452Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1212517Z cvt.rn.f32.s16 %r39732, %rs2875; 2026-02-21T10:19:38.1212578Z cvt.rn.f32.s16 %r39733, %rs2874; 2026-02-21T10:19:38.1212640Z cvt.rn.f32.s16 %r39734, %rs2873; 2026-02-21T10:19:38.1212702Z cvt.rn.f32.s16 %r39735, %rs2871; 2026-02-21T10:19:38.1212892Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1212951Z cvt.s16.s8 %rs2876, %rs2795; 2026-02-21T10:19:38.1213009Z shr.s16 %rs2877, %rs2876, 4; 2026-02-21T10:19:38.1213146Z cvt.s16.s8 %rs2878, %rs2797; 2026-02-21T10:19:38.1213205Z shr.s16 %rs2879, %rs2878, 4; 2026-02-21T10:19:38.1213263Z shr.s16 %rs2880, %rs2794, 4; 2026-02-21T10:19:38.1213326Z shr.s16 %rs2881, %rs2796, 4; 2026-02-21T10:19:38.1213515Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1213578Z cvt.rn.f32.s16 %r39736, %rs2881; 2026-02-21T10:19:38.1213637Z cvt.rn.f32.s16 %r39737, %rs2880; 2026-02-21T10:19:38.1213700Z cvt.rn.f32.s16 %r39738, %rs2879; 2026-02-21T10:19:38.1213759Z cvt.rn.f32.s16 %r39739, %rs2877; 2026-02-21T10:19:38.1213948Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1214011Z cvt.s16.s8 %rs2882, %rs2799; 2026-02-21T10:19:38.1214072Z shr.s16 %rs2883, %rs2882, 4; 2026-02-21T10:19:38.1214129Z cvt.s16.s8 %rs2884, %rs2801; 2026-02-21T10:19:38.1214190Z shr.s16 %rs2885, %rs2884, 4; 2026-02-21T10:19:38.1214252Z shr.s16 %rs2886, %rs2798, 4; 2026-02-21T10:19:38.1214312Z shr.s16 %rs2887, %rs2800, 4; 2026-02-21T10:19:38.1214562Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1214630Z cvt.rn.f32.s16 %r39740, %rs2887; 2026-02-21T10:19:38.1214689Z cvt.rn.f32.s16 %r39741, %rs2886; 2026-02-21T10:19:38.1214749Z cvt.rn.f32.s16 %r39742, %rs2885; 2026-02-21T10:19:38.1214812Z 
cvt.rn.f32.s16 %r39743, %rs2883; 2026-02-21T10:19:38.1215003Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1215074Z cvt.s16.s8 %rs2888, %rs2803; 2026-02-21T10:19:38.1215188Z shr.s16 %rs2889, %rs2888, 4; 2026-02-21T10:19:38.1215250Z cvt.s16.s8 %rs2890, %rs2805; 2026-02-21T10:19:38.1215309Z shr.s16 %rs2891, %rs2890, 4; 2026-02-21T10:19:38.1215366Z shr.s16 %rs2892, %rs2802, 4; 2026-02-21T10:19:38.1215427Z shr.s16 %rs2893, %rs2804, 4; 2026-02-21T10:19:38.1215616Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1215680Z cvt.rn.f32.s16 %r39744, %rs2893; 2026-02-21T10:19:38.1215753Z cvt.rn.f32.s16 %r39745, %rs2892; 2026-02-21T10:19:38.1215817Z cvt.rn.f32.s16 %r39746, %rs2891; 2026-02-21T10:19:38.1215877Z cvt.rn.f32.s16 %r39747, %rs2889; 2026-02-21T10:19:38.1216067Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1216129Z cvt.s16.s8 %rs2894, %rs2807; 2026-02-21T10:19:38.1216190Z shr.s16 %rs2895, %rs2894, 4; 2026-02-21T10:19:38.1216248Z cvt.s16.s8 %rs2896, %rs2809; 2026-02-21T10:19:38.1216310Z shr.s16 %rs2897, %rs2896, 4; 2026-02-21T10:19:38.1216423Z shr.s16 %rs2898, %rs2806, 4; 2026-02-21T10:19:38.1216607Z shr.s16 %rs2899, %rs2808, 4; 2026-02-21T10:19:38.1216808Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1216880Z cvt.rn.f32.s16 %r39748, %rs2899; 2026-02-21T10:19:38.1216947Z cvt.rn.f32.s16 %r39749, %rs2898; 2026-02-21T10:19:38.1217007Z cvt.rn.f32.s16 %r39750, %rs2897; 2026-02-21T10:19:38.1217071Z cvt.rn.f32.s16 %r39751, %rs2895; 2026-02-21T10:19:38.1217260Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1217319Z cvt.s16.s8 %rs2900, %rs2811; 2026-02-21T10:19:38.1217379Z shr.s16 %rs2901, %rs2900, 4; 2026-02-21T10:19:38.1217437Z cvt.s16.s8 %rs2902, %rs2813; 2026-02-21T10:19:38.1217497Z shr.s16 %rs2903, %rs2902, 4; 
2026-02-21T10:19:38.1217557Z shr.s16 %rs2904, %rs2810, 4; 2026-02-21T10:19:38.1217616Z shr.s16 %rs2905, %rs2812, 4; 2026-02-21T10:19:38.1217806Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1217868Z cvt.rn.f32.s16 %r39752, %rs2905; 2026-02-21T10:19:38.1217931Z cvt.rn.f32.s16 %r39753, %rs2904; 2026-02-21T10:19:38.1217991Z cvt.rn.f32.s16 %r39754, %rs2903; 2026-02-21T10:19:38.1218131Z cvt.rn.f32.s16 %r39755, %rs2901; 2026-02-21T10:19:38.1218322Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1218383Z cvt.s16.s8 %rs2906, %rs2815; 2026-02-21T10:19:38.1218452Z shr.s16 %rs2907, %rs2906, 4; 2026-02-21T10:19:38.1218512Z cvt.s16.s8 %rs2908, %rs2817; 2026-02-21T10:19:38.1218574Z shr.s16 %rs2909, %rs2908, 4; 2026-02-21T10:19:38.1218632Z shr.s16 %rs2910, %rs2814, 4; 2026-02-21T10:19:38.1218690Z shr.s16 %rs2911, %rs2816, 4; 2026-02-21T10:19:38.1218881Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1218945Z cvt.rn.f32.s16 %r39756, %rs2911; 2026-02-21T10:19:38.1219006Z cvt.rn.f32.s16 %r39757, %rs2910; 2026-02-21T10:19:38.1219069Z cvt.rn.f32.s16 %r39758, %rs2909; 2026-02-21T10:19:38.1219128Z cvt.rn.f32.s16 %r39759, %rs2907; 2026-02-21T10:19:38.1219317Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1219378Z cvt.s16.s8 %rs2912, %rs2819; 2026-02-21T10:19:38.1219440Z shr.s16 %rs2913, %rs2912, 4; 2026-02-21T10:19:38.1219566Z cvt.s16.s8 %rs2914, %rs2821; 2026-02-21T10:19:38.1219629Z shr.s16 %rs2915, %rs2914, 4; 2026-02-21T10:19:38.1219698Z shr.s16 %rs2916, %rs2818, 4; 2026-02-21T10:19:38.1219764Z shr.s16 %rs2917, %rs2820, 4; 2026-02-21T10:19:38.1219953Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1220018Z cvt.rn.f32.s16 %r39760, %rs2917; 2026-02-21T10:19:38.1220078Z cvt.rn.f32.s16 %r39761, %rs2916; 2026-02-21T10:19:38.1220203Z 
cvt.rn.f32.s16 %r39762, %rs2915; 2026-02-21T10:19:38.1220265Z cvt.rn.f32.s16 %r39763, %rs2913; 2026-02-21T10:19:38.1220323Z bar.sync 0; 2026-02-21T10:19:38.1220443Z st.shared.v4.b32 [%r86], {%r39703, %r39701, %r39702, %r39700}; 2026-02-21T10:19:38.1220571Z st.shared.v4.b32 [%r86+16384], {%r39735, %r39733, %r39734, %r39732}; 2026-02-21T10:19:38.1220686Z st.shared.v4.b32 [%r87], {%r39707, %r39705, %r39706, %r39704}; 2026-02-21T10:19:38.1220805Z st.shared.v4.b32 [%r87+16384], {%r39739, %r39737, %r39738, %r39736}; 2026-02-21T10:19:38.1220914Z st.shared.v4.b32 [%r88], {%r39711, %r39709, %r39710, %r39708}; 2026-02-21T10:19:38.1221034Z st.shared.v4.b32 [%r88+16384], {%r39743, %r39741, %r39742, %r39740}; 2026-02-21T10:19:38.1221138Z st.shared.v4.b32 [%r89], {%r39715, %r39713, %r39714, %r39712}; 2026-02-21T10:19:38.1221253Z st.shared.v4.b32 [%r89+16384], {%r39747, %r39745, %r39746, %r39744}; 2026-02-21T10:19:38.1221359Z st.shared.v4.b32 [%r90], {%r39719, %r39717, %r39718, %r39716}; 2026-02-21T10:19:38.1221553Z st.shared.v4.b32 [%r90+16384], {%r39751, %r39749, %r39750, %r39748}; 2026-02-21T10:19:38.1221665Z st.shared.v4.b32 [%r91], {%r39723, %r39721, %r39722, %r39720}; 2026-02-21T10:19:38.1221782Z st.shared.v4.b32 [%r91+16384], {%r39755, %r39753, %r39754, %r39752}; 2026-02-21T10:19:38.1221891Z st.shared.v4.b32 [%r92], {%r39727, %r39725, %r39726, %r39724}; 2026-02-21T10:19:38.1222008Z st.shared.v4.b32 [%r92+16384], {%r39759, %r39757, %r39758, %r39756}; 2026-02-21T10:19:38.1222114Z st.shared.v4.b32 [%r93], {%r39731, %r39729, %r39730, %r39728}; 2026-02-21T10:19:38.1222232Z st.shared.v4.b32 [%r93+16384], {%r39763, %r39761, %r39762, %r39760}; 2026-02-21T10:19:38.1222286Z $L__tmp25: 2026-02-21T10:19:38.1222555Z .loc 2 291 36 // standard.py:291:36 @[ cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:91:40 ] 2026-02-21T10:19:38.1222616Z // begin inline asm 2026-02-21T10:19:38.1222693Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.1222751Z // end inline asm 
2026-02-21T10:19:38.1222806Z bar.sync 0; 2026-02-21T10:19:38.1222892Z shfl.sync.idx.b32 %r39764, %r4, 0, 31, -1; 2026-02-21T10:19:38.1222963Z wgmma.fence.sync.aligned; 2026-02-21T10:19:38.1223027Z mov.pred %p317, -1; 2026-02-21T10:19:38.1223087Z // begin inline asm 2026-02-21T10:19:38.1224583Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r32623,%r32624,%r32625,%r32626}, %rd12, %p317, 1, 1; 2026-02-21T10:19:38.1224696Z // end inline asm 2026-02-21T10:19:38.1224757Z // begin inline asm 2026-02-21T10:19:38.1226289Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r32755,%r32756,%r32757,%r32758}, %rd13, %p317, 1, 1; 2026-02-21T10:19:38.1226351Z // end inline asm 2026-02-21T10:19:38.1226408Z // begin inline asm 2026-02-21T10:19:38.1228034Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r32887,%r32888,%r32889,%r32890}, %rd14, %p317, 1, 1; 2026-02-21T10:19:38.1228182Z // end inline asm 2026-02-21T10:19:38.1228246Z // begin inline asm 2026-02-21T10:19:38.1229921Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r33019,%r33020,%r33021,%r33022}, %rd15, %p317, 1, 1; 2026-02-21T10:19:38.1229990Z // end inline asm 2026-02-21T10:19:38.1230049Z // begin inline asm 2026-02-21T10:19:38.1231542Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r33151,%r33152,%r33153,%r33154}, %rd16, %p317, 1, 1; 2026-02-21T10:19:38.1231603Z // end inline asm 2026-02-21T10:19:38.1231660Z // begin inline asm 2026-02-21T10:19:38.1233151Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r33283,%r33284,%r33285,%r33286}, %rd17, %p317, 1, 1; 2026-02-21T10:19:38.1233280Z // end inline asm 2026-02-21T10:19:38.1233336Z // begin inline asm 2026-02-21T10:19:38.1234820Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r33415,%r33416,%r33417,%r33418}, %rd18, %p317, 1, 1; 2026-02-21T10:19:38.1234886Z // end inline asm 2026-02-21T10:19:38.1234947Z // begin inline asm 2026-02-21T10:19:38.1236724Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r33547,%r33548,%r33549,%r33550}, %rd19, %p317, 1, 1; 2026-02-21T10:19:38.1236858Z // end inline asm 2026-02-21T10:19:38.1236922Z // begin inline asm 2026-02-21T10:19:38.1238436Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r33679,%r33680,%r33681,%r33682}, %rd12, %p317, 1, 1; 2026-02-21T10:19:38.1238501Z // end inline asm 2026-02-21T10:19:38.1238558Z // begin inline asm 2026-02-21T10:19:38.1240137Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r33811,%r33812,%r33813,%r33814}, %rd13, %p317, 1, 1; 2026-02-21T10:19:38.1240207Z // end inline asm 2026-02-21T10:19:38.1240265Z // begin inline asm 2026-02-21T10:19:38.1241764Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r33943,%r33944,%r33945,%r33946}, %rd14, %p317, 1, 1; 2026-02-21T10:19:38.1241822Z // end inline asm 2026-02-21T10:19:38.1241880Z // begin inline asm 2026-02-21T10:19:38.1243429Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r34075,%r34076,%r34077,%r34078}, %rd15, %p317, 1, 1; 2026-02-21T10:19:38.1243489Z // end inline asm 2026-02-21T10:19:38.1243548Z // begin inline asm 2026-02-21T10:19:38.1245085Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r34207,%r34208,%r34209,%r34210}, %rd16, %p317, 1, 1; 2026-02-21T10:19:38.1245146Z // end inline asm 2026-02-21T10:19:38.1245205Z // begin inline asm 2026-02-21T10:19:38.1246871Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r34339,%r34340,%r34341,%r34342}, %rd17, %p317, 1, 1; 2026-02-21T10:19:38.1247009Z // end inline asm 2026-02-21T10:19:38.1247069Z // begin inline asm 2026-02-21T10:19:38.1248614Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r34471,%r34472,%r34473,%r34474}, %rd18, %p317, 1, 1; 2026-02-21T10:19:38.1248680Z // end inline asm 2026-02-21T10:19:38.1248746Z // begin inline asm 2026-02-21T10:19:38.1250230Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r34603,%r34604,%r34605,%r34606}, %rd19, %p317, 1, 1; 2026-02-21T10:19:38.1250291Z // end inline asm 2026-02-21T10:19:38.1250370Z wgmma.commit_group.sync.aligned; 2026-02-21T10:19:38.1250434Z mov.b32 %r34736, %r39567; 2026-02-21T10:19:38.1250495Z mov.b32 %r34737, %r39567; 2026-02-21T10:19:38.1250552Z mov.b32 %r34735, %r39931; 2026-02-21T10:19:38.1250609Z // begin inline asm 2026-02-21T10:19:38.1253123Z // wait for regs: 
%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r34735,%r34736,%r34737 2026-02-21T10:19:38.1253270Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:19:38.1253326Z // end inline asm 2026-02-21T10:19:38.1253383Z $L__tmp26: 2026-02-21T10:19:38.1253653Z .loc 1 55 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:32 2026-02-21T10:19:38.1253723Z add.s32 %r39765, %r43240, -64; 2026-02-21T10:19:38.1253789Z add.s64 %rd716, %rd675, 128; 2026-02-21T10:19:38.1253850Z add.s64 %rd719, %rd678, 128; 2026-02-21T10:19:38.1253912Z add.s64 %rd722, %rd681, 128; 2026-02-21T10:19:38.1253972Z add.s64 %rd725, %rd684, 128; 2026-02-21T10:19:38.1254029Z add.s64 %rd728, %rd687, 128; 2026-02-21T10:19:38.1254087Z add.s64 %rd731, %rd690, 128; 2026-02-21T10:19:38.1254195Z add.s64 %rd734, %rd693, 128; 2026-02-21T10:19:38.1254274Z mad.wide.s32 %rd737, %r39765, 2, %rd117; 2026-02-21T10:19:38.1254475Z .loc 1 55 80 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:80 2026-02-21T10:19:38.1254533Z // begin inline asm 
2026-02-21T10:19:38.1254606Z mov.u64 %rd715, 0x0; 2026-02-21T10:19:38.1254743Z createpolicy.fractional.L2::evict_first.b64 %rd715, 1.0; 2026-02-21T10:19:38.1254800Z // end inline asm 2026-02-21T10:19:38.1254861Z // begin inline asm 2026-02-21T10:19:38.1254920Z mov.u32 %r34869, 0x0; 2026-02-21T10:19:38.1254977Z mov.u32 %r34870, 0x0; 2026-02-21T10:19:38.1255034Z mov.u32 %r34871, 0x0; 2026-02-21T10:19:38.1255103Z mov.u32 %r34872, 0x0; 2026-02-21T10:19:38.1255345Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r34869, %r34870, %r34871, %r34872 }, [ %rd716 + 0 ], %rd715; 2026-02-21T10:19:38.1255401Z // end inline asm 2026-02-21T10:19:38.1255459Z // begin inline asm 2026-02-21T10:19:38.1255520Z mov.u64 %rd718, 0x0; 2026-02-21T10:19:38.1255701Z createpolicy.fractional.L2::evict_first.b64 %rd718, 1.0; 2026-02-21T10:19:38.1255759Z // end inline asm 2026-02-21T10:19:38.1255819Z // begin inline asm 2026-02-21T10:19:38.1255876Z mov.u32 %r34873, 0x0; 2026-02-21T10:19:38.1255933Z mov.u32 %r34874, 0x0; 2026-02-21T10:19:38.1255991Z mov.u32 %r34875, 0x0; 2026-02-21T10:19:38.1256049Z mov.u32 %r34876, 0x0; 2026-02-21T10:19:38.1256287Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r34873, %r34874, %r34875, %r34876 }, [ %rd719 + 0 ], %rd718; 2026-02-21T10:19:38.1256344Z // end inline asm 2026-02-21T10:19:38.1256400Z // begin inline asm 2026-02-21T10:19:38.1256605Z mov.u64 %rd721, 0x0; 2026-02-21T10:19:38.1256742Z createpolicy.fractional.L2::evict_first.b64 %rd721, 1.0; 2026-02-21T10:19:38.1256802Z // end inline asm 2026-02-21T10:19:38.1256859Z // begin inline asm 2026-02-21T10:19:38.1256926Z mov.u32 %r34877, 0x0; 2026-02-21T10:19:38.1256985Z mov.u32 %r34878, 0x0; 2026-02-21T10:19:38.1257044Z mov.u32 %r34879, 0x0; 2026-02-21T10:19:38.1257100Z mov.u32 %r34880, 0x0; 2026-02-21T10:19:38.1257330Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r34877, %r34878, %r34879, %r34880 }, [ %rd722 + 0 ], %rd721; 2026-02-21T10:19:38.1257388Z // end inline asm 2026-02-21T10:19:38.1257444Z 
// begin inline asm 2026-02-21T10:19:38.1257500Z mov.u64 %rd724, 0x0; 2026-02-21T10:19:38.1257717Z createpolicy.fractional.L2::evict_first.b64 %rd724, 1.0; 2026-02-21T10:19:38.1257772Z // end inline asm 2026-02-21T10:19:38.1257829Z // begin inline asm 2026-02-21T10:19:38.1257899Z mov.u32 %r34881, 0x0; 2026-02-21T10:19:38.1257956Z mov.u32 %r34882, 0x0; 2026-02-21T10:19:38.1258012Z mov.u32 %r34883, 0x0; 2026-02-21T10:19:38.1258069Z mov.u32 %r34884, 0x0; 2026-02-21T10:19:38.1258293Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r34881, %r34882, %r34883, %r34884 }, [ %rd725 + 0 ], %rd724; 2026-02-21T10:19:38.1258351Z // end inline asm 2026-02-21T10:19:38.1258407Z // begin inline asm 2026-02-21T10:19:38.1258467Z mov.u64 %rd727, 0x0; 2026-02-21T10:19:38.1258583Z createpolicy.fractional.L2::evict_first.b64 %rd727, 1.0; 2026-02-21T10:19:38.1258639Z // end inline asm 2026-02-21T10:19:38.1258697Z // begin inline asm 2026-02-21T10:19:38.1258752Z mov.u32 %r34885, 0x0; 2026-02-21T10:19:38.1258807Z mov.u32 %r34886, 0x0; 2026-02-21T10:19:38.1258862Z mov.u32 %r34887, 0x0; 2026-02-21T10:19:38.1258922Z mov.u32 %r34888, 0x0; 2026-02-21T10:19:38.1259228Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r34885, %r34886, %r34887, %r34888 }, [ %rd728 + 0 ], %rd727; 2026-02-21T10:19:38.1259290Z // end inline asm 2026-02-21T10:19:38.1259349Z // begin inline asm 2026-02-21T10:19:38.1259406Z mov.u64 %rd730, 0x0; 2026-02-21T10:19:38.1259522Z createpolicy.fractional.L2::evict_first.b64 %rd730, 1.0; 2026-02-21T10:19:38.1259578Z // end inline asm 2026-02-21T10:19:38.1259635Z // begin inline asm 2026-02-21T10:19:38.1259691Z mov.u32 %r34889, 0x0; 2026-02-21T10:19:38.1259745Z mov.u32 %r34890, 0x0; 2026-02-21T10:19:38.1259868Z mov.u32 %r34891, 0x0; 2026-02-21T10:19:38.1259925Z mov.u32 %r34892, 0x0; 2026-02-21T10:19:38.1260146Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r34889, %r34890, %r34891, %r34892 }, [ %rd731 + 0 ], %rd730; 2026-02-21T10:19:38.1260203Z // end inline asm 
2026-02-21T10:19:38.1260259Z // begin inline asm 2026-02-21T10:19:38.1260315Z mov.u64 %rd733, 0x0; 2026-02-21T10:19:38.1260432Z createpolicy.fractional.L2::evict_first.b64 %rd733, 1.0; 2026-02-21T10:19:38.1260489Z // end inline asm 2026-02-21T10:19:38.1260547Z // begin inline asm 2026-02-21T10:19:38.1260603Z mov.u32 %r34893, 0x0; 2026-02-21T10:19:38.1260660Z mov.u32 %r34894, 0x0; 2026-02-21T10:19:38.1260714Z mov.u32 %r34895, 0x0; 2026-02-21T10:19:38.1260768Z mov.u32 %r34896, 0x0; 2026-02-21T10:19:38.1260992Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r34893, %r34894, %r34895, %r34896 }, [ %rd734 + 0 ], %rd733; 2026-02-21T10:19:38.1261046Z // end inline asm 2026-02-21T10:19:38.1261102Z // begin inline asm 2026-02-21T10:19:38.1261159Z mov.u64 %rd736, 0x0; 2026-02-21T10:19:38.1261341Z createpolicy.fractional.L2::evict_first.b64 %rd736, 1.0; 2026-02-21T10:19:38.1261399Z // end inline asm 2026-02-21T10:19:38.1261454Z // begin inline asm 2026-02-21T10:19:38.1261513Z mov.u32 %r34897, 0x0; 2026-02-21T10:19:38.1261579Z mov.u32 %r34898, 0x0; 2026-02-21T10:19:38.1261636Z mov.u32 %r34899, 0x0; 2026-02-21T10:19:38.1261695Z mov.u32 %r34900, 0x0; 2026-02-21T10:19:38.1261921Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r34897, %r34898, %r34899, %r34900 }, [ %rd737 + 0 ], %rd736; 2026-02-21T10:19:38.1261978Z // end inline asm 2026-02-21T10:19:38.1262184Z .loc 1 59 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:59:32 2026-02-21T10:19:38.1262242Z bar.sync 0; 2026-02-21T10:19:38.1262324Z st.shared.v2.b32 [%r68], {%r34869, %r34870}; 2026-02-21T10:19:38.1262415Z st.shared.v2.b32 [%r68+2048], {%r34873, %r34874}; 2026-02-21T10:19:38.1262503Z st.shared.v2.b32 [%r68+4096], {%r34877, %r34878}; 2026-02-21T10:19:38.1262592Z st.shared.v2.b32 [%r68+6144], {%r34881, %r34882}; 2026-02-21T10:19:38.1262674Z st.shared.v2.b32 [%r68+8192], {%r34885, %r34886}; 2026-02-21T10:19:38.1262763Z st.shared.v2.b32 [%r68+10240], {%r34889, %r34890}; 2026-02-21T10:19:38.1262851Z 
st.shared.v2.b32 [%r68+12288], {%r34893, %r34894}; 2026-02-21T10:19:38.1262992Z st.shared.v2.b32 [%r68+14336], {%r34897, %r34898}; 2026-02-21T10:19:38.1263070Z st.shared.v2.b32 [%r69], {%r34871, %r34872}; 2026-02-21T10:19:38.1263157Z st.shared.v2.b32 [%r69+2048], {%r34875, %r34876}; 2026-02-21T10:19:38.1263238Z st.shared.v2.b32 [%r69+4096], {%r34879, %r34880}; 2026-02-21T10:19:38.1263320Z st.shared.v2.b32 [%r69+6144], {%r34883, %r34884}; 2026-02-21T10:19:38.1263404Z st.shared.v2.b32 [%r69+8192], {%r34887, %r34888}; 2026-02-21T10:19:38.1263488Z st.shared.v2.b32 [%r69+10240], {%r34891, %r34892}; 2026-02-21T10:19:38.1263572Z st.shared.v2.b32 [%r69+12288], {%r34895, %r34896}; 2026-02-21T10:19:38.1263655Z st.shared.v2.b32 [%r69+14336], {%r34899, %r34900}; 2026-02-21T10:19:38.1263715Z bar.sync 0; 2026-02-21T10:19:38.1263794Z ld.shared.b16 %rs2918, [%r70]; 2026-02-21T10:19:38.1263866Z ld.shared.b16 %rs2919, [%r70+1024]; 2026-02-21T10:19:38.1263936Z ld.shared.b16 %rs2920, [%r70+64]; 2026-02-21T10:19:38.1264003Z ld.shared.b16 %rs2921, [%r70+1088]; 2026-02-21T10:19:38.1264069Z ld.shared.b16 %rs2922, [%r70+8192]; 2026-02-21T10:19:38.1264134Z ld.shared.b16 %rs2923, [%r70+9216]; 2026-02-21T10:19:38.1264197Z ld.shared.b16 %rs2924, [%r70+8256]; 2026-02-21T10:19:38.1264313Z ld.shared.b16 %rs2925, [%r70+9280]; 2026-02-21T10:19:38.1264379Z ld.shared.b16 %rs2926, [%r71]; 2026-02-21T10:19:38.1264445Z ld.shared.b16 %rs2927, [%r71+1024]; 2026-02-21T10:19:38.1264509Z ld.shared.b16 %rs2928, [%r71+64]; 2026-02-21T10:19:38.1264572Z ld.shared.b16 %rs2929, [%r71+1088]; 2026-02-21T10:19:38.1264638Z ld.shared.b16 %rs2930, [%r71+8192]; 2026-02-21T10:19:38.1264701Z ld.shared.b16 %rs2931, [%r71+9216]; 2026-02-21T10:19:38.1264840Z ld.shared.b16 %rs2932, [%r71+8256]; 2026-02-21T10:19:38.1264905Z ld.shared.b16 %rs2933, [%r71+9280]; 2026-02-21T10:19:38.1264973Z ld.shared.b16 %rs2934, [%r72]; 2026-02-21T10:19:38.1265039Z ld.shared.b16 %rs2935, [%r72+1024]; 2026-02-21T10:19:38.1265101Z ld.shared.b16 
%rs2936, [%r72+64]; 2026-02-21T10:19:38.1265167Z ld.shared.b16 %rs2937, [%r72+1088]; 2026-02-21T10:19:38.1265232Z ld.shared.b16 %rs2938, [%r72+8192]; 2026-02-21T10:19:38.1265296Z ld.shared.b16 %rs2939, [%r72+9216]; 2026-02-21T10:19:38.1265366Z ld.shared.b16 %rs2940, [%r72+8256]; 2026-02-21T10:19:38.1267981Z ld.shared.b16 %rs2941, [%r72+9280]; 2026-02-21T10:19:38.1268095Z ld.shared.b16 %rs2942, [%r73]; 2026-02-21T10:19:38.1268172Z ld.shared.b16 %rs2943, [%r73+1024]; 2026-02-21T10:19:38.1268242Z ld.shared.b16 %rs2944, [%r73+64]; 2026-02-21T10:19:38.1268309Z ld.shared.b16 %rs2945, [%r73+1088]; 2026-02-21T10:19:38.1268455Z ld.shared.b16 %rs2946, [%r73+8192]; 2026-02-21T10:19:38.1268525Z ld.shared.b16 %rs2947, [%r73+9216]; 2026-02-21T10:19:38.1268720Z ld.shared.b16 %rs2948, [%r73+8256]; 2026-02-21T10:19:38.1268796Z ld.shared.b16 %rs2949, [%r73+9280]; 2026-02-21T10:19:38.1268864Z ld.shared.b16 %rs2950, [%r74]; 2026-02-21T10:19:38.1268926Z ld.shared.b16 %rs2951, [%r74+1024]; 2026-02-21T10:19:38.1268992Z ld.shared.b16 %rs2952, [%r74+64]; 2026-02-21T10:19:38.1269058Z ld.shared.b16 %rs2953, [%r74+1088]; 2026-02-21T10:19:38.1269122Z ld.shared.b16 %rs2954, [%r74+8192]; 2026-02-21T10:19:38.1269185Z ld.shared.b16 %rs2955, [%r74+9216]; 2026-02-21T10:19:38.1269254Z ld.shared.b16 %rs2956, [%r74+8256]; 2026-02-21T10:19:38.1269319Z ld.shared.b16 %rs2957, [%r74+9280]; 2026-02-21T10:19:38.1269382Z ld.shared.b16 %rs2958, [%r75]; 2026-02-21T10:19:38.1269447Z ld.shared.b16 %rs2959, [%r75+1024]; 2026-02-21T10:19:38.1269509Z ld.shared.b16 %rs2960, [%r75+64]; 2026-02-21T10:19:38.1269571Z ld.shared.b16 %rs2961, [%r75+1088]; 2026-02-21T10:19:38.1269635Z ld.shared.b16 %rs2962, [%r75+8192]; 2026-02-21T10:19:38.1269702Z ld.shared.b16 %rs2963, [%r75+9216]; 2026-02-21T10:19:38.1269767Z ld.shared.b16 %rs2964, [%r75+8256]; 2026-02-21T10:19:38.1269836Z ld.shared.b16 %rs2965, [%r75+9280]; 2026-02-21T10:19:38.1269905Z ld.shared.b16 %rs2966, [%r76]; 2026-02-21T10:19:38.1269971Z ld.shared.b16 %rs2967, 
[%r76+1024]; 2026-02-21T10:19:38.1270034Z ld.shared.b16 %rs2968, [%r76+64]; 2026-02-21T10:19:38.1270191Z ld.shared.b16 %rs2969, [%r76+1088]; 2026-02-21T10:19:38.1270256Z ld.shared.b16 %rs2970, [%r76+8192]; 2026-02-21T10:19:38.1270321Z ld.shared.b16 %rs2971, [%r76+9216]; 2026-02-21T10:19:38.1270383Z ld.shared.b16 %rs2972, [%r76+8256]; 2026-02-21T10:19:38.1270448Z ld.shared.b16 %rs2973, [%r76+9280]; 2026-02-21T10:19:38.1270511Z ld.shared.b16 %rs2974, [%r77]; 2026-02-21T10:19:38.1270573Z ld.shared.b16 %rs2975, [%r77+1024]; 2026-02-21T10:19:38.1270638Z ld.shared.b16 %rs2976, [%r77+64]; 2026-02-21T10:19:38.1270703Z ld.shared.b16 %rs2977, [%r77+1088]; 2026-02-21T10:19:38.1270764Z ld.shared.b16 %rs2978, [%r77+8192]; 2026-02-21T10:19:38.1270830Z ld.shared.b16 %rs2979, [%r77+9216]; 2026-02-21T10:19:38.1270898Z ld.shared.b16 %rs2980, [%r77+8256]; 2026-02-21T10:19:38.1270963Z ld.shared.b16 %rs2981, [%r77+9280]; 2026-02-21T10:19:38.1271023Z cvt.f32.bf16 %r35038, %rs2918; 2026-02-21T10:19:38.1271086Z cvt.f32.bf16 %r35039, %rs2919; 2026-02-21T10:19:38.1271147Z cvt.f32.bf16 %r35040, %rs2926; 2026-02-21T10:19:38.1271206Z cvt.f32.bf16 %r35041, %rs2927; 2026-02-21T10:19:38.1271267Z cvt.f32.bf16 %r35170, %rs2934; 2026-02-21T10:19:38.1271412Z cvt.f32.bf16 %r35171, %rs2935; 2026-02-21T10:19:38.1271477Z cvt.f32.bf16 %r35172, %rs2942; 2026-02-21T10:19:38.1271537Z cvt.f32.bf16 %r35173, %rs2943; 2026-02-21T10:19:38.1271600Z cvt.f32.bf16 %r35302, %rs2950; 2026-02-21T10:19:38.1271661Z cvt.f32.bf16 %r35303, %rs2951; 2026-02-21T10:19:38.1271720Z cvt.f32.bf16 %r35304, %rs2958; 2026-02-21T10:19:38.1271782Z cvt.f32.bf16 %r35305, %rs2959; 2026-02-21T10:19:38.1271840Z cvt.f32.bf16 %r35434, %rs2966; 2026-02-21T10:19:38.1271968Z cvt.f32.bf16 %r35435, %rs2967; 2026-02-21T10:19:38.1272027Z cvt.f32.bf16 %r35436, %rs2974; 2026-02-21T10:19:38.1272090Z cvt.f32.bf16 %r35437, %rs2975; 2026-02-21T10:19:38.1272147Z cvt.f32.bf16 %r35566, %rs2920; 2026-02-21T10:19:38.1272206Z cvt.f32.bf16 %r35567, %rs2921; 
2026-02-21T10:19:38.1272266Z cvt.f32.bf16 %r35568, %rs2928; 2026-02-21T10:19:38.1272326Z cvt.f32.bf16 %r35569, %rs2929; 2026-02-21T10:19:38.1272384Z cvt.f32.bf16 %r35698, %rs2936; 2026-02-21T10:19:38.1272441Z cvt.f32.bf16 %r35699, %rs2937; 2026-02-21T10:19:38.1272504Z cvt.f32.bf16 %r35700, %rs2944; 2026-02-21T10:19:38.1272562Z cvt.f32.bf16 %r35701, %rs2945; 2026-02-21T10:19:38.1272620Z cvt.f32.bf16 %r35830, %rs2952; 2026-02-21T10:19:38.1272680Z cvt.f32.bf16 %r35831, %rs2953; 2026-02-21T10:19:38.1272739Z cvt.f32.bf16 %r35832, %rs2960; 2026-02-21T10:19:38.1272798Z cvt.f32.bf16 %r35833, %rs2961; 2026-02-21T10:19:38.1272857Z cvt.f32.bf16 %r35962, %rs2968; 2026-02-21T10:19:38.1272922Z cvt.f32.bf16 %r35963, %rs2969; 2026-02-21T10:19:38.1273030Z cvt.f32.bf16 %r35964, %rs2976; 2026-02-21T10:19:38.1273091Z cvt.f32.bf16 %r35965, %rs2977; 2026-02-21T10:19:38.1273163Z cvt.f32.bf16 %r36094, %rs2922; 2026-02-21T10:19:38.1273223Z cvt.f32.bf16 %r36095, %rs2923; 2026-02-21T10:19:38.1273282Z cvt.f32.bf16 %r36096, %rs2930; 2026-02-21T10:19:38.1273347Z cvt.f32.bf16 %r36097, %rs2931; 2026-02-21T10:19:38.1273406Z cvt.f32.bf16 %r36226, %rs2938; 2026-02-21T10:19:38.1273464Z cvt.f32.bf16 %r36227, %rs2939; 2026-02-21T10:19:38.1273524Z cvt.f32.bf16 %r36228, %rs2946; 2026-02-21T10:19:38.1273586Z cvt.f32.bf16 %r36229, %rs2947; 2026-02-21T10:19:38.1273645Z cvt.f32.bf16 %r36358, %rs2954; 2026-02-21T10:19:38.1273703Z cvt.f32.bf16 %r36359, %rs2955; 2026-02-21T10:19:38.1273763Z cvt.f32.bf16 %r36360, %rs2962; 2026-02-21T10:19:38.1273821Z cvt.f32.bf16 %r36361, %rs2963; 2026-02-21T10:19:38.1273878Z cvt.f32.bf16 %r36490, %rs2970; 2026-02-21T10:19:38.1273937Z cvt.f32.bf16 %r36491, %rs2971; 2026-02-21T10:19:38.1274002Z cvt.f32.bf16 %r36492, %rs2978; 2026-02-21T10:19:38.1274061Z cvt.f32.bf16 %r36493, %rs2979; 2026-02-21T10:19:38.1274119Z cvt.f32.bf16 %r36622, %rs2924; 2026-02-21T10:19:38.1274180Z cvt.f32.bf16 %r36623, %rs2925; 2026-02-21T10:19:38.1274238Z cvt.f32.bf16 %r36624, %rs2932; 
2026-02-21T10:19:38.1274306Z cvt.f32.bf16 %r36625, %rs2933; 2026-02-21T10:19:38.1274423Z cvt.f32.bf16 %r36754, %rs2940; 2026-02-21T10:19:38.1274484Z cvt.f32.bf16 %r36755, %rs2941; 2026-02-21T10:19:38.1274550Z cvt.f32.bf16 %r36756, %rs2948; 2026-02-21T10:19:38.1274609Z cvt.f32.bf16 %r36757, %rs2949; 2026-02-21T10:19:38.1274671Z cvt.f32.bf16 %r36886, %rs2956; 2026-02-21T10:19:38.1274730Z cvt.f32.bf16 %r36887, %rs2957; 2026-02-21T10:19:38.1274787Z cvt.f32.bf16 %r36888, %rs2964; 2026-02-21T10:19:38.1274847Z cvt.f32.bf16 %r36889, %rs2965; 2026-02-21T10:19:38.1274907Z cvt.f32.bf16 %r37018, %rs2972; 2026-02-21T10:19:38.1274966Z cvt.f32.bf16 %r37019, %rs2973; 2026-02-21T10:19:38.1275028Z cvt.f32.bf16 %r37020, %rs2980; 2026-02-21T10:19:38.1275091Z cvt.f32.bf16 %r37021, %rs2981; 2026-02-21T10:19:38.1275322Z .loc 1 61 33 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:61:33 2026-02-21T10:19:38.1275381Z bar.sync 0; 2026-02-21T10:19:38.1275444Z // begin inline asm 2026-02-21T10:19:38.1275558Z @%p313 mbarrier.init.shared::cta.b64 [%r39929], 1; 2026-02-21T10:19:38.1275619Z // end inline asm 2026-02-21T10:19:38.1275676Z bar.sync 0; 2026-02-21T10:19:38.1275737Z // begin inline asm 2026-02-21T10:19:38.1275948Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r39929], 4096; 2026-02-21T10:19:38.1276006Z // end inline asm 2026-02-21T10:19:38.1276066Z // begin inline asm 2026-02-21T10:19:38.1276145Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.1276199Z // end inline asm 2026-02-21T10:19:38.1276252Z bar.sync 0; 2026-02-21T10:19:38.1276334Z elect.sync %r39766|%p375, -1; 2026-02-21T10:19:38.1276405Z and.pred %p335, %p1, %p375; 2026-02-21T10:19:38.1276653Z cvt.u32.u64 %r39767, %rd853; 2026-02-21T10:19:38.1276734Z add.s32 %r34905, %r39767, 128; 2026-02-21T10:19:38.1276796Z // begin inline asm 2026-02-21T10:19:38.1277157Z @%p335 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39931], [%rd822, {%r39932, %r34905}], [%r39929]; 
2026-02-21T10:19:38.1277218Z // end inline asm 2026-02-21T10:19:38.1277275Z bar.sync 0; 2026-02-21T10:19:38.1277334Z // begin inline asm 2026-02-21T10:19:38.1277387Z 2026-02-21T10:19:38.1277439Z { 2026-02-21T10:19:38.1277504Z .reg .pred complete; 2026-02-21T10:19:38.1277559Z waitLoop: 2026-02-21T10:19:38.1277713Z mbarrier.try_wait.parity.shared.b64 complete, [%r39929], %r39567; 2026-02-21T10:19:38.1277782Z @!complete bra.uni waitLoop; 2026-02-21T10:19:38.1277831Z } 2026-02-21T10:19:38.1277837Z 2026-02-21T10:19:38.1277893Z // end inline asm 2026-02-21T10:19:38.1277949Z bar.sync 0; 2026-02-21T10:19:38.1278007Z // begin inline asm 2026-02-21T10:19:38.1278107Z @%p313 mbarrier.inval.shared::cta.b64 [%r39929]; 2026-02-21T10:19:38.1278257Z // end inline asm 2026-02-21T10:19:38.1278479Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1278548Z ld.shared.s8 %rs2982, [%r78]; 2026-02-21T10:19:38.1278754Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1278823Z shl.b16 %rs2983, %rs2982, 4; 2026-02-21T10:19:38.1279017Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1279087Z ld.shared.s8 %rs2984, [%r79+128]; 2026-02-21T10:19:38.1279277Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1279338Z shl.b16 %rs2985, %rs2984, 4; 2026-02-21T10:19:38.1279525Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1279592Z ld.shared.s8 %rs2986, [%r80+256]; 2026-02-21T10:19:38.1279783Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1279843Z shl.b16 %rs2987, %rs2986, 4; 2026-02-21T10:19:38.1280035Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1280180Z ld.shared.s8 %rs2988, [%r81+384]; 2026-02-21T10:19:38.1280373Z .loc 1 64 28 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1280436Z shl.b16 %rs2989, %rs2988, 4; 2026-02-21T10:19:38.1280623Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1280686Z ld.shared.s8 %rs2990, [%r82+512]; 2026-02-21T10:19:38.1280877Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1280938Z shl.b16 %rs2991, %rs2990, 4; 2026-02-21T10:19:38.1281126Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1281193Z ld.shared.s8 %rs2992, [%r83+640]; 2026-02-21T10:19:38.1281385Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1281445Z shl.b16 %rs2993, %rs2992, 4; 2026-02-21T10:19:38.1281634Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1281701Z ld.shared.s8 %rs2994, [%r84+768]; 2026-02-21T10:19:38.1281961Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1282026Z shl.b16 %rs2995, %rs2994, 4; 2026-02-21T10:19:38.1282226Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1282289Z ld.shared.s8 %rs2996, [%r85+896]; 2026-02-21T10:19:38.1282476Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1282601Z shl.b16 %rs2997, %rs2996, 4; 2026-02-21T10:19:38.1282790Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1282856Z ld.shared.s8 %rs2998, [%r78+1024]; 2026-02-21T10:19:38.1283043Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1283106Z shl.b16 %rs2999, %rs2998, 4; 2026-02-21T10:19:38.1283293Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1283369Z ld.shared.s8 %rs3000, [%r79+1152]; 2026-02-21T10:19:38.1283562Z 
.loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1283622Z shl.b16 %rs3001, %rs3000, 4; 2026-02-21T10:19:38.1283809Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1283927Z ld.shared.s8 %rs3002, [%r80+1280]; 2026-02-21T10:19:38.1284115Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1284174Z shl.b16 %rs3003, %rs3002, 4; 2026-02-21T10:19:38.1284370Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1284435Z ld.shared.s8 %rs3004, [%r81+1408]; 2026-02-21T10:19:38.1284622Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1284684Z shl.b16 %rs3005, %rs3004, 4; 2026-02-21T10:19:38.1284879Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1284952Z ld.shared.s8 %rs3006, [%r82+1536]; 2026-02-21T10:19:38.1285145Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1285209Z shl.b16 %rs3007, %rs3006, 4; 2026-02-21T10:19:38.1285400Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1285463Z ld.shared.s8 %rs3008, [%r83+1664]; 2026-02-21T10:19:38.1285653Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1285766Z shl.b16 %rs3009, %rs3008, 4; 2026-02-21T10:19:38.1285955Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1286033Z ld.shared.s8 %rs3010, [%r84+1792]; 2026-02-21T10:19:38.1286224Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1286286Z shl.b16 %rs3011, %rs3010, 4; 2026-02-21T10:19:38.1286604Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1286679Z ld.shared.s8 %rs3012, [%r85+1920]; 
2026-02-21T10:19:38.1286885Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1286960Z shl.b16 %rs3013, %rs3012, 4; 2026-02-21T10:19:38.1287167Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1287233Z ld.shared.s8 %rs3014, [%r78+2048]; 2026-02-21T10:19:38.1287429Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1287493Z shl.b16 %rs3015, %rs3014, 4; 2026-02-21T10:19:38.1287786Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1287855Z ld.shared.s8 %rs3016, [%r79+2176]; 2026-02-21T10:19:38.1288050Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1288109Z shl.b16 %rs3017, %rs3016, 4; 2026-02-21T10:19:38.1288299Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1288431Z ld.shared.s8 %rs3018, [%r80+2304]; 2026-02-21T10:19:38.1288622Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1288694Z shl.b16 %rs3019, %rs3018, 4; 2026-02-21T10:19:38.1288885Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1288955Z ld.shared.s8 %rs3020, [%r81+2432]; 2026-02-21T10:19:38.1289142Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1289202Z shl.b16 %rs3021, %rs3020, 4; 2026-02-21T10:19:38.1289390Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1289455Z ld.shared.s8 %rs3022, [%r82+2560]; 2026-02-21T10:19:38.1289640Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1289768Z shl.b16 %rs3023, %rs3022, 4; 2026-02-21T10:19:38.1289958Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1290020Z ld.shared.s8 
%rs3024, [%r83+2688]; 2026-02-21T10:19:38.1290209Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1290270Z shl.b16 %rs3025, %rs3024, 4; 2026-02-21T10:19:38.1290458Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1290522Z ld.shared.s8 %rs3026, [%r84+2816]; 2026-02-21T10:19:38.1290710Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1290769Z shl.b16 %rs3027, %rs3026, 4; 2026-02-21T10:19:38.1290970Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1291036Z ld.shared.s8 %rs3028, [%r85+2944]; 2026-02-21T10:19:38.1291225Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1291286Z shl.b16 %rs3029, %rs3028, 4; 2026-02-21T10:19:38.1291477Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1291610Z ld.shared.s8 %rs3030, [%r78+3072]; 2026-02-21T10:19:38.1291797Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1291862Z shl.b16 %rs3031, %rs3030, 4; 2026-02-21T10:19:38.1292049Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1292112Z ld.shared.s8 %rs3032, [%r79+3200]; 2026-02-21T10:19:38.1292301Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1292365Z shl.b16 %rs3033, %rs3032, 4; 2026-02-21T10:19:38.1292553Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1292617Z ld.shared.s8 %rs3034, [%r80+3328]; 2026-02-21T10:19:38.1292806Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1292865Z shl.b16 %rs3035, %rs3034, 4; 2026-02-21T10:19:38.1293055Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 
2026-02-21T10:19:38.1293122Z ld.shared.s8 %rs3036, [%r81+3456]; 2026-02-21T10:19:38.1293358Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1293419Z shl.b16 %rs3037, %rs3036, 4; 2026-02-21T10:19:38.1293611Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1293676Z ld.shared.s8 %rs3038, [%r82+3584]; 2026-02-21T10:19:38.1293863Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1293970Z shl.b16 %rs3039, %rs3038, 4; 2026-02-21T10:19:38.1294159Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1294221Z ld.shared.s8 %rs3040, [%r83+3712]; 2026-02-21T10:19:38.1294417Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1294492Z shl.b16 %rs3041, %rs3040, 4; 2026-02-21T10:19:38.1294685Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1294748Z ld.shared.s8 %rs3042, [%r84+3840]; 2026-02-21T10:19:38.1294936Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1294994Z shl.b16 %rs3043, %rs3042, 4; 2026-02-21T10:19:38.1295182Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1295299Z ld.shared.s8 %rs3044, [%r85+3968]; 2026-02-21T10:19:38.1295488Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1295547Z shl.b16 %rs3045, %rs3044, 4; 2026-02-21T10:19:38.1295734Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1295799Z cvt.s16.s8 %rs3046, %rs2983; 2026-02-21T10:19:38.1295857Z shr.s16 %rs3047, %rs3046, 4; 2026-02-21T10:19:38.1295916Z cvt.s16.s8 %rs3048, %rs2985; 2026-02-21T10:19:38.1295979Z shr.s16 %rs3049, %rs3048, 4; 2026-02-21T10:19:38.1296038Z shr.s16 %rs3050, %rs2982, 4; 2026-02-21T10:19:38.1296096Z 
shr.s16 %rs3051, %rs2984, 4; 2026-02-21T10:19:38.1296285Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1296351Z cvt.rn.f32.s16 %r39768, %rs3051; 2026-02-21T10:19:38.1296414Z cvt.rn.f32.s16 %r39769, %rs3050; 2026-02-21T10:19:38.1296601Z cvt.rn.f32.s16 %r39770, %rs3049; 2026-02-21T10:19:38.1296671Z cvt.rn.f32.s16 %r39771, %rs3047; 2026-02-21T10:19:38.1296874Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1296936Z cvt.s16.s8 %rs3052, %rs2987; 2026-02-21T10:19:38.1296999Z shr.s16 %rs3053, %rs3052, 4; 2026-02-21T10:19:38.1297134Z cvt.s16.s8 %rs3054, %rs2989; 2026-02-21T10:19:38.1297193Z shr.s16 %rs3055, %rs3054, 4; 2026-02-21T10:19:38.1297256Z shr.s16 %rs3056, %rs2986, 4; 2026-02-21T10:19:38.1297315Z shr.s16 %rs3057, %rs2988, 4; 2026-02-21T10:19:38.1297503Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1297566Z cvt.rn.f32.s16 %r39772, %rs3057; 2026-02-21T10:19:38.1297629Z cvt.rn.f32.s16 %r39773, %rs3056; 2026-02-21T10:19:38.1297689Z cvt.rn.f32.s16 %r39774, %rs3055; 2026-02-21T10:19:38.1297749Z cvt.rn.f32.s16 %r39775, %rs3053; 2026-02-21T10:19:38.1297939Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1298000Z cvt.s16.s8 %rs3058, %rs2991; 2026-02-21T10:19:38.1298059Z shr.s16 %rs3059, %rs3058, 4; 2026-02-21T10:19:38.1298129Z cvt.s16.s8 %rs3060, %rs2993; 2026-02-21T10:19:38.1298195Z shr.s16 %rs3061, %rs3060, 4; 2026-02-21T10:19:38.1298256Z shr.s16 %rs3062, %rs2990, 4; 2026-02-21T10:19:38.1298315Z shr.s16 %rs3063, %rs2992, 4; 2026-02-21T10:19:38.1298572Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1298636Z cvt.rn.f32.s16 %r39776, %rs3063; 2026-02-21T10:19:38.1298699Z cvt.rn.f32.s16 %r39777, %rs3062; 2026-02-21T10:19:38.1298762Z cvt.rn.f32.s16 %r39778, %rs3061; 2026-02-21T10:19:38.1298822Z cvt.rn.f32.s16 %r39779, 
%rs3059; 2026-02-21T10:19:38.1299009Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1299070Z cvt.s16.s8 %rs3064, %rs2995; 2026-02-21T10:19:38.1299196Z shr.s16 %rs3065, %rs3064, 4; 2026-02-21T10:19:38.1299255Z cvt.s16.s8 %rs3066, %rs2997; 2026-02-21T10:19:38.1299315Z shr.s16 %rs3067, %rs3066, 4; 2026-02-21T10:19:38.1299376Z shr.s16 %rs3068, %rs2994, 4; 2026-02-21T10:19:38.1299434Z shr.s16 %rs3069, %rs2996, 4; 2026-02-21T10:19:38.1299623Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1299689Z cvt.rn.f32.s16 %r39780, %rs3069; 2026-02-21T10:19:38.1299750Z cvt.rn.f32.s16 %r39781, %rs3068; 2026-02-21T10:19:38.1299810Z cvt.rn.f32.s16 %r39782, %rs3067; 2026-02-21T10:19:38.1299871Z cvt.rn.f32.s16 %r39783, %rs3065; 2026-02-21T10:19:38.1300057Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1300116Z cvt.s16.s8 %rs3070, %rs2999; 2026-02-21T10:19:38.1300177Z shr.s16 %rs3071, %rs3070, 4; 2026-02-21T10:19:38.1300238Z cvt.s16.s8 %rs3072, %rs3001; 2026-02-21T10:19:38.1300299Z shr.s16 %rs3073, %rs3072, 4; 2026-02-21T10:19:38.1300431Z shr.s16 %rs3074, %rs2998, 4; 2026-02-21T10:19:38.1300498Z shr.s16 %rs3075, %rs3000, 4; 2026-02-21T10:19:38.1300688Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1300751Z cvt.rn.f32.s16 %r39784, %rs3075; 2026-02-21T10:19:38.1300815Z cvt.rn.f32.s16 %r39785, %rs3074; 2026-02-21T10:19:38.1300875Z cvt.rn.f32.s16 %r39786, %rs3073; 2026-02-21T10:19:38.1300935Z cvt.rn.f32.s16 %r39787, %rs3071; 2026-02-21T10:19:38.1301124Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1301186Z cvt.s16.s8 %rs3076, %rs3003; 2026-02-21T10:19:38.1301244Z shr.s16 %rs3077, %rs3076, 4; 2026-02-21T10:19:38.1301304Z cvt.s16.s8 %rs3078, %rs3005; 2026-02-21T10:19:38.1301366Z shr.s16 %rs3079, %rs3078, 4; 2026-02-21T10:19:38.1301424Z 
shr.s16 %rs3080, %rs3002, 4; 2026-02-21T10:19:38.1301483Z shr.s16 %rs3081, %rs3004, 4; 2026-02-21T10:19:38.1301677Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1301739Z cvt.rn.f32.s16 %r39788, %rs3081; 2026-02-21T10:19:38.1301800Z cvt.rn.f32.s16 %r39789, %rs3080; 2026-02-21T10:19:38.1301860Z cvt.rn.f32.s16 %r39790, %rs3079; 2026-02-21T10:19:38.1301977Z cvt.rn.f32.s16 %r39791, %rs3077; 2026-02-21T10:19:38.1302165Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1302225Z cvt.s16.s8 %rs3082, %rs3007; 2026-02-21T10:19:38.1302288Z shr.s16 %rs3083, %rs3082, 4; 2026-02-21T10:19:38.1302347Z cvt.s16.s8 %rs3084, %rs3009; 2026-02-21T10:19:38.1302405Z shr.s16 %rs3085, %rs3084, 4; 2026-02-21T10:19:38.1302463Z shr.s16 %rs3086, %rs3006, 4; 2026-02-21T10:19:38.1302523Z shr.s16 %rs3087, %rs3008, 4; 2026-02-21T10:19:38.1302709Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1302775Z cvt.rn.f32.s16 %r39792, %rs3087; 2026-02-21T10:19:38.1302838Z cvt.rn.f32.s16 %r39793, %rs3086; 2026-02-21T10:19:38.1302898Z cvt.rn.f32.s16 %r39794, %rs3085; 2026-02-21T10:19:38.1302969Z cvt.rn.f32.s16 %r39795, %rs3083; 2026-02-21T10:19:38.1303162Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1303227Z cvt.s16.s8 %rs3088, %rs3011; 2026-02-21T10:19:38.1303288Z shr.s16 %rs3089, %rs3088, 4; 2026-02-21T10:19:38.1303401Z cvt.s16.s8 %rs3090, %rs3013; 2026-02-21T10:19:38.1303464Z shr.s16 %rs3091, %rs3090, 4; 2026-02-21T10:19:38.1303522Z shr.s16 %rs3092, %rs3010, 4; 2026-02-21T10:19:38.1303580Z shr.s16 %rs3093, %rs3012, 4; 2026-02-21T10:19:38.1303771Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1303831Z cvt.rn.f32.s16 %r39796, %rs3093; 2026-02-21T10:19:38.1303891Z cvt.rn.f32.s16 %r39797, %rs3092; 2026-02-21T10:19:38.1304003Z cvt.rn.f32.s16 %r39798, %rs3091; 
2026-02-21T10:19:38.1304062Z cvt.rn.f32.s16 %r39799, %rs3089; 2026-02-21T10:19:38.1304250Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1304310Z cvt.s16.s8 %rs3094, %rs3015; 2026-02-21T10:19:38.1304372Z shr.s16 %rs3095, %rs3094, 4; 2026-02-21T10:19:38.1304433Z cvt.s16.s8 %rs3096, %rs3017; 2026-02-21T10:19:38.1304492Z shr.s16 %rs3097, %rs3096, 4; 2026-02-21T10:19:38.1304553Z shr.s16 %rs3098, %rs3014, 4; 2026-02-21T10:19:38.1304612Z shr.s16 %rs3099, %rs3016, 4; 2026-02-21T10:19:38.1304799Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1304863Z cvt.rn.f32.s16 %r39800, %rs3099; 2026-02-21T10:19:38.1304923Z cvt.rn.f32.s16 %r39801, %rs3098; 2026-02-21T10:19:38.1304984Z cvt.rn.f32.s16 %r39802, %rs3097; 2026-02-21T10:19:38.1305043Z cvt.rn.f32.s16 %r39803, %rs3095; 2026-02-21T10:19:38.1305285Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1305347Z cvt.s16.s8 %rs3100, %rs3019; 2026-02-21T10:19:38.1305406Z shr.s16 %rs3101, %rs3100, 4; 2026-02-21T10:19:38.1305466Z cvt.s16.s8 %rs3102, %rs3021; 2026-02-21T10:19:38.1305524Z shr.s16 %rs3103, %rs3102, 4; 2026-02-21T10:19:38.1305584Z shr.s16 %rs3104, %rs3018, 4; 2026-02-21T10:19:38.1305643Z shr.s16 %rs3105, %rs3020, 4; 2026-02-21T10:19:38.1305837Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1305899Z cvt.rn.f32.s16 %r39804, %rs3105; 2026-02-21T10:19:38.1305959Z cvt.rn.f32.s16 %r39805, %rs3104; 2026-02-21T10:19:38.1306021Z cvt.rn.f32.s16 %r39806, %rs3103; 2026-02-21T10:19:38.1306081Z cvt.rn.f32.s16 %r39807, %rs3101; 2026-02-21T10:19:38.1306270Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1306334Z cvt.s16.s8 %rs3106, %rs3023; 2026-02-21T10:19:38.1306393Z shr.s16 %rs3107, %rs3106, 4; 2026-02-21T10:19:38.1306580Z cvt.s16.s8 %rs3108, %rs3025; 2026-02-21T10:19:38.1306645Z shr.s16 
%rs3109, %rs3108, 4; 2026-02-21T10:19:38.1306709Z shr.s16 %rs3110, %rs3022, 4; 2026-02-21T10:19:38.1306768Z shr.s16 %rs3111, %rs3024, 4; 2026-02-21T10:19:38.1306975Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1307131Z cvt.rn.f32.s16 %r39808, %rs3111; 2026-02-21T10:19:38.1307192Z cvt.rn.f32.s16 %r39809, %rs3110; 2026-02-21T10:19:38.1307256Z cvt.rn.f32.s16 %r39810, %rs3109; 2026-02-21T10:19:38.1307316Z cvt.rn.f32.s16 %r39811, %rs3107; 2026-02-21T10:19:38.1307502Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1307563Z cvt.s16.s8 %rs3112, %rs3027; 2026-02-21T10:19:38.1307621Z shr.s16 %rs3113, %rs3112, 4; 2026-02-21T10:19:38.1307680Z cvt.s16.s8 %rs3114, %rs3029; 2026-02-21T10:19:38.1307743Z shr.s16 %rs3115, %rs3114, 4; 2026-02-21T10:19:38.1307805Z shr.s16 %rs3116, %rs3026, 4; 2026-02-21T10:19:38.1307862Z shr.s16 %rs3117, %rs3028, 4; 2026-02-21T10:19:38.1308050Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1308115Z cvt.rn.f32.s16 %r39812, %rs3117; 2026-02-21T10:19:38.1308176Z cvt.rn.f32.s16 %r39813, %rs3116; 2026-02-21T10:19:38.1308235Z cvt.rn.f32.s16 %r39814, %rs3115; 2026-02-21T10:19:38.1308472Z cvt.rn.f32.s16 %r39815, %rs3113; 2026-02-21T10:19:38.1308679Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1308740Z cvt.s16.s8 %rs3118, %rs3031; 2026-02-21T10:19:38.1308801Z shr.s16 %rs3119, %rs3118, 4; 2026-02-21T10:19:38.1308864Z cvt.s16.s8 %rs3120, %rs3033; 2026-02-21T10:19:38.1308923Z shr.s16 %rs3121, %rs3120, 4; 2026-02-21T10:19:38.1308982Z shr.s16 %rs3122, %rs3030, 4; 2026-02-21T10:19:38.1309111Z shr.s16 %rs3123, %rs3032, 4; 2026-02-21T10:19:38.1309303Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1309365Z cvt.rn.f32.s16 %r39816, %rs3123; 2026-02-21T10:19:38.1309425Z cvt.rn.f32.s16 %r39817, %rs3122; 
2026-02-21T10:19:38.1309486Z cvt.rn.f32.s16 %r39818, %rs3121; 2026-02-21T10:19:38.1309547Z cvt.rn.f32.s16 %r39819, %rs3119; 2026-02-21T10:19:38.1309736Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1309801Z cvt.s16.s8 %rs3124, %rs3035; 2026-02-21T10:19:38.1309860Z shr.s16 %rs3125, %rs3124, 4; 2026-02-21T10:19:38.1309918Z cvt.s16.s8 %rs3126, %rs3037; 2026-02-21T10:19:38.1309977Z shr.s16 %rs3127, %rs3126, 4; 2026-02-21T10:19:38.1310046Z shr.s16 %rs3128, %rs3034, 4; 2026-02-21T10:19:38.1310107Z shr.s16 %rs3129, %rs3036, 4; 2026-02-21T10:19:38.1310296Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1310428Z cvt.rn.f32.s16 %r39820, %rs3129; 2026-02-21T10:19:38.1310491Z cvt.rn.f32.s16 %r39821, %rs3128; 2026-02-21T10:19:38.1310550Z cvt.rn.f32.s16 %r39822, %rs3127; 2026-02-21T10:19:38.1310612Z cvt.rn.f32.s16 %r39823, %rs3125; 2026-02-21T10:19:38.1310799Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1310860Z cvt.s16.s8 %rs3130, %rs3039; 2026-02-21T10:19:38.1310923Z shr.s16 %rs3131, %rs3130, 4; 2026-02-21T10:19:38.1310984Z cvt.s16.s8 %rs3132, %rs3041; 2026-02-21T10:19:38.1311043Z shr.s16 %rs3133, %rs3132, 4; 2026-02-21T10:19:38.1311101Z shr.s16 %rs3134, %rs3038, 4; 2026-02-21T10:19:38.1311162Z shr.s16 %rs3135, %rs3040, 4; 2026-02-21T10:19:38.1311348Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1311408Z cvt.rn.f32.s16 %r39824, %rs3135; 2026-02-21T10:19:38.1311471Z cvt.rn.f32.s16 %r39825, %rs3134; 2026-02-21T10:19:38.1311536Z cvt.rn.f32.s16 %r39826, %rs3133; 2026-02-21T10:19:38.1311597Z cvt.rn.f32.s16 %r39827, %rs3131; 2026-02-21T10:19:38.1311785Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1311847Z cvt.s16.s8 %rs3136, %rs3043; 2026-02-21T10:19:38.1311974Z shr.s16 %rs3137, %rs3136, 4; 2026-02-21T10:19:38.1312033Z 
cvt.s16.s8 %rs3138, %rs3045; 2026-02-21T10:19:38.1312095Z shr.s16 %rs3139, %rs3138, 4; 2026-02-21T10:19:38.1312155Z shr.s16 %rs3140, %rs3042, 4; 2026-02-21T10:19:38.1312212Z shr.s16 %rs3141, %rs3044, 4; 2026-02-21T10:19:38.1312401Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1312463Z cvt.rn.f32.s16 %r39828, %rs3141; 2026-02-21T10:19:38.1312524Z cvt.rn.f32.s16 %r39829, %rs3140; 2026-02-21T10:19:38.1312584Z cvt.rn.f32.s16 %r39830, %rs3139; 2026-02-21T10:19:38.1312645Z cvt.rn.f32.s16 %r39831, %rs3137; 2026-02-21T10:19:38.1312703Z bar.sync 0; 2026-02-21T10:19:38.1312826Z st.shared.v4.b32 [%r86], {%r39771, %r39769, %r39770, %r39768}; 2026-02-21T10:19:38.1312956Z st.shared.v4.b32 [%r86+16384], {%r39803, %r39801, %r39802, %r39800}; 2026-02-21T10:19:38.1313074Z st.shared.v4.b32 [%r87], {%r39775, %r39773, %r39774, %r39772}; 2026-02-21T10:19:38.1313198Z st.shared.v4.b32 [%r87+16384], {%r39807, %r39805, %r39806, %r39804}; 2026-02-21T10:19:38.1313312Z st.shared.v4.b32 [%r88], {%r39779, %r39777, %r39778, %r39776}; 2026-02-21T10:19:38.1313488Z st.shared.v4.b32 [%r88+16384], {%r39811, %r39809, %r39810, %r39808}; 2026-02-21T10:19:38.1313603Z st.shared.v4.b32 [%r89], {%r39783, %r39781, %r39782, %r39780}; 2026-02-21T10:19:38.1313721Z st.shared.v4.b32 [%r89+16384], {%r39815, %r39813, %r39814, %r39812}; 2026-02-21T10:19:38.1313830Z st.shared.v4.b32 [%r90], {%r39787, %r39785, %r39786, %r39784}; 2026-02-21T10:19:38.1313946Z st.shared.v4.b32 [%r90+16384], {%r39819, %r39817, %r39818, %r39816}; 2026-02-21T10:19:38.1314100Z st.shared.v4.b32 [%r91], {%r39791, %r39789, %r39790, %r39788}; 2026-02-21T10:19:38.1314216Z st.shared.v4.b32 [%r91+16384], {%r39823, %r39821, %r39822, %r39820}; 2026-02-21T10:19:38.1314321Z st.shared.v4.b32 [%r92], {%r39795, %r39793, %r39794, %r39792}; 2026-02-21T10:19:38.1314435Z st.shared.v4.b32 [%r92+16384], {%r39827, %r39825, %r39826, %r39824}; 2026-02-21T10:19:38.1314549Z st.shared.v4.b32 [%r93], {%r39799, 
%r39797, %r39798, %r39796}; 2026-02-21T10:19:38.1314663Z st.shared.v4.b32 [%r93+16384], {%r39831, %r39829, %r39830, %r39828}; 2026-02-21T10:19:38.1314717Z $L__tmp27: 2026-02-21T10:19:38.1314996Z .loc 2 291 36 // standard.py:291:36 @[ cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:91:40 ] 2026-02-21T10:19:38.1315056Z // begin inline asm 2026-02-21T10:19:38.1315136Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.1315191Z // end inline asm 2026-02-21T10:19:38.1315247Z bar.sync 0; 2026-02-21T10:19:38.1315316Z wgmma.fence.sync.aligned; 2026-02-21T10:19:38.1315376Z // begin inline asm 2026-02-21T10:19:38.1317065Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r35038,%r35039,%r35040,%r35041}, %rd12, %p317, 1, 1; 2026-02-21T10:19:38.1317146Z // end inline asm 2026-02-21T10:19:38.1317207Z // begin inline asm 2026-02-21T10:19:38.1318682Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r35170,%r35171,%r35172,%r35173}, %rd13, %p317, 1, 1; 
2026-02-21T10:19:38.1318806Z // end inline asm 2026-02-21T10:19:38.1318865Z // begin inline asm 2026-02-21T10:19:38.1320330Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r35302,%r35303,%r35304,%r35305}, %rd14, %p317, 1, 1; 2026-02-21T10:19:38.1320390Z // end inline asm 2026-02-21T10:19:38.1320450Z // begin inline asm 2026-02-21T10:19:38.1321959Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r35434,%r35435,%r35436,%r35437}, %rd15, %p317, 1, 1; 2026-02-21T10:19:38.1322021Z // end inline asm 2026-02-21T10:19:38.1322078Z // begin inline asm 2026-02-21T10:19:38.1323589Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r35566,%r35567,%r35568,%r35569}, %rd16, %p317, 1, 1; 2026-02-21T10:19:38.1323649Z // end inline asm 2026-02-21T10:19:38.1323706Z // begin inline asm 2026-02-21T10:19:38.1325206Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r35698,%r35699,%r35700,%r35701}, %rd17, %p317, 1, 1; 2026-02-21T10:19:38.1325280Z // end inline asm 2026-02-21T10:19:38.1325340Z // begin inline asm 2026-02-21T10:19:38.1326939Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r35830,%r35831,%r35832,%r35833}, %rd18, %p317, 1, 1; 2026-02-21T10:19:38.1327004Z // end inline asm 2026-02-21T10:19:38.1327063Z // begin inline asm 2026-02-21T10:19:38.1328517Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r35962,%r35963,%r35964,%r35965}, %rd19, %p317, 1, 1; 2026-02-21T10:19:38.1328664Z // end inline asm 2026-02-21T10:19:38.1328725Z // begin inline asm 2026-02-21T10:19:38.1330182Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r36094,%r36095,%r36096,%r36097}, %rd12, %p317, 1, 1; 2026-02-21T10:19:38.1330306Z // end inline asm 2026-02-21T10:19:38.1330368Z // begin inline asm 2026-02-21T10:19:38.1331825Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r36226,%r36227,%r36228,%r36229}, %rd13, %p317, 1, 1; 2026-02-21T10:19:38.1331944Z // end inline asm 2026-02-21T10:19:38.1332004Z // begin inline asm 2026-02-21T10:19:38.1333460Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r36358,%r36359,%r36360,%r36361}, %rd14, %p317, 1, 1; 2026-02-21T10:19:38.1333597Z // end inline asm 2026-02-21T10:19:38.1333658Z // begin inline asm 2026-02-21T10:19:38.1335118Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r36490,%r36491,%r36492,%r36493}, %rd15, %p317, 1, 1; 2026-02-21T10:19:38.1335179Z // end inline asm 2026-02-21T10:19:38.1335238Z // begin inline asm 2026-02-21T10:19:38.1336823Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r36622,%r36623,%r36624,%r36625}, %rd16, %p317, 1, 1; 2026-02-21T10:19:38.1336967Z // end inline asm 2026-02-21T10:19:38.1337027Z // begin inline asm 2026-02-21T10:19:38.1338640Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r36754,%r36755,%r36756,%r36757}, %rd17, %p317, 1, 1; 2026-02-21T10:19:38.1338703Z // end inline asm 2026-02-21T10:19:38.1338760Z // begin inline asm 2026-02-21T10:19:38.1340295Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r36886,%r36887,%r36888,%r36889}, %rd18, %p317, 1, 1; 2026-02-21T10:19:38.1340358Z // end inline asm 2026-02-21T10:19:38.1340476Z // begin inline asm 2026-02-21T10:19:38.1341935Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r37018,%r37019,%r37020,%r37021}, %rd19, %p317, 1, 1; 2026-02-21T10:19:38.1341993Z // end inline asm 2026-02-21T10:19:38.1342073Z wgmma.commit_group.sync.aligned; 2026-02-21T10:19:38.1342135Z mov.b32 %r37151, %r39567; 2026-02-21T10:19:38.1342204Z mov.b32 %r37152, %r39567; 2026-02-21T10:19:38.1342265Z mov.b32 %r37150, %r39931; 2026-02-21T10:19:38.1342324Z // begin inline asm 2026-02-21T10:19:38.1344861Z // wait for regs: 
%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r37150,%r37151,%r37152 2026-02-21T10:19:38.1344946Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:19:38.1345001Z // end inline asm 2026-02-21T10:19:38.1345055Z $L__tmp28: 2026-02-21T10:19:38.1345262Z .loc 1 55 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:32 2026-02-21T10:19:38.1345377Z add.s64 %rd757, %rd675, 256; 2026-02-21T10:19:38.1345438Z add.s64 %rd760, %rd678, 256; 2026-02-21T10:19:38.1345500Z add.s64 %rd763, %rd681, 256; 2026-02-21T10:19:38.1345560Z add.s64 %rd766, %rd684, 256; 2026-02-21T10:19:38.1345618Z add.s64 %rd769, %rd687, 256; 2026-02-21T10:19:38.1345680Z add.s64 %rd772, %rd690, 256; 2026-02-21T10:19:38.1345737Z add.s64 %rd775, %rd693, 256; 2026-02-21T10:19:38.1345824Z mad.wide.s32 %rd778, %r43240, 2, %rd117; 2026-02-21T10:19:38.1346024Z .loc 1 55 80 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:80 2026-02-21T10:19:38.1346088Z // begin inline asm 2026-02-21T10:19:38.1346149Z mov.u64 %rd756, 0x0; 
2026-02-21T10:19:38.1346279Z createpolicy.fractional.L2::evict_first.b64 %rd756, 1.0; 2026-02-21T10:19:38.1346338Z // end inline asm 2026-02-21T10:19:38.1346396Z // begin inline asm 2026-02-21T10:19:38.1346576Z mov.u32 %r37284, 0x0; 2026-02-21T10:19:38.1346641Z mov.u32 %r37285, 0x0; 2026-02-21T10:19:38.1346699Z mov.u32 %r37286, 0x0; 2026-02-21T10:19:38.1346755Z mov.u32 %r37287, 0x0; 2026-02-21T10:19:38.1347080Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r37284, %r37285, %r37286, %r37287 }, [ %rd757 + 0 ], %rd756; 2026-02-21T10:19:38.1347143Z // end inline asm 2026-02-21T10:19:38.1347200Z // begin inline asm 2026-02-21T10:19:38.1347257Z mov.u64 %rd759, 0x0; 2026-02-21T10:19:38.1347380Z createpolicy.fractional.L2::evict_first.b64 %rd759, 1.0; 2026-02-21T10:19:38.1347436Z // end inline asm 2026-02-21T10:19:38.1347493Z // begin inline asm 2026-02-21T10:19:38.1347550Z mov.u32 %r37288, 0x0; 2026-02-21T10:19:38.1347605Z mov.u32 %r37289, 0x0; 2026-02-21T10:19:38.1347725Z mov.u32 %r37290, 0x0; 2026-02-21T10:19:38.1347781Z mov.u32 %r37291, 0x0; 2026-02-21T10:19:38.1348011Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r37288, %r37289, %r37290, %r37291 }, [ %rd760 + 0 ], %rd759; 2026-02-21T10:19:38.1348068Z // end inline asm 2026-02-21T10:19:38.1348124Z // begin inline asm 2026-02-21T10:19:38.1348185Z mov.u64 %rd762, 0x0; 2026-02-21T10:19:38.1348302Z createpolicy.fractional.L2::evict_first.b64 %rd762, 1.0; 2026-02-21T10:19:38.1348358Z // end inline asm 2026-02-21T10:19:38.1348491Z // begin inline asm 2026-02-21T10:19:38.1348552Z mov.u32 %r37292, 0x0; 2026-02-21T10:19:38.1348608Z mov.u32 %r37293, 0x0; 2026-02-21T10:19:38.1348664Z mov.u32 %r37294, 0x0; 2026-02-21T10:19:38.1348721Z mov.u32 %r37295, 0x0; 2026-02-21T10:19:38.1348945Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r37292, %r37293, %r37294, %r37295 }, [ %rd763 + 0 ], %rd762; 2026-02-21T10:19:38.1349000Z // end inline asm 2026-02-21T10:19:38.1349061Z // begin inline asm 2026-02-21T10:19:38.1349196Z 
mov.u64 %rd765, 0x0; 2026-02-21T10:19:38.1349320Z createpolicy.fractional.L2::evict_first.b64 %rd765, 1.0; 2026-02-21T10:19:38.1349376Z // end inline asm 2026-02-21T10:19:38.1349436Z // begin inline asm 2026-02-21T10:19:38.1349490Z mov.u32 %r37296, 0x0; 2026-02-21T10:19:38.1349545Z mov.u32 %r37297, 0x0; 2026-02-21T10:19:38.1349604Z mov.u32 %r37298, 0x0; 2026-02-21T10:19:38.1349658Z mov.u32 %r37299, 0x0; 2026-02-21T10:19:38.1349880Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r37296, %r37297, %r37298, %r37299 }, [ %rd766 + 0 ], %rd765; 2026-02-21T10:19:38.1349937Z // end inline asm 2026-02-21T10:19:38.1349994Z // begin inline asm 2026-02-21T10:19:38.1350050Z mov.u64 %rd768, 0x0; 2026-02-21T10:19:38.1350164Z createpolicy.fractional.L2::evict_first.b64 %rd768, 1.0; 2026-02-21T10:19:38.1350221Z // end inline asm 2026-02-21T10:19:38.1350277Z // begin inline asm 2026-02-21T10:19:38.1350332Z mov.u32 %r37300, 0x0; 2026-02-21T10:19:38.1350392Z mov.u32 %r37301, 0x0; 2026-02-21T10:19:38.1350448Z mov.u32 %r37302, 0x0; 2026-02-21T10:19:38.1350503Z mov.u32 %r37303, 0x0; 2026-02-21T10:19:38.1350723Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r37300, %r37301, %r37302, %r37303 }, [ %rd769 + 0 ], %rd768; 2026-02-21T10:19:38.1350780Z // end inline asm 2026-02-21T10:19:38.1350836Z // begin inline asm 2026-02-21T10:19:38.1350977Z mov.u64 %rd771, 0x0; 2026-02-21T10:19:38.1351096Z createpolicy.fractional.L2::evict_first.b64 %rd771, 1.0; 2026-02-21T10:19:38.1351150Z // end inline asm 2026-02-21T10:19:38.1351208Z // begin inline asm 2026-02-21T10:19:38.1351265Z mov.u32 %r37304, 0x0; 2026-02-21T10:19:38.1351320Z mov.u32 %r37305, 0x0; 2026-02-21T10:19:38.1351374Z mov.u32 %r37306, 0x0; 2026-02-21T10:19:38.1351429Z mov.u32 %r37307, 0x0; 2026-02-21T10:19:38.1351649Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r37304, %r37305, %r37306, %r37307 }, [ %rd772 + 0 ], %rd771; 2026-02-21T10:19:38.1351704Z // end inline asm 2026-02-21T10:19:38.1351763Z // begin inline asm 
2026-02-21T10:19:38.1351823Z mov.u64 %rd774, 0x0; 2026-02-21T10:19:38.1351936Z createpolicy.fractional.L2::evict_first.b64 %rd774, 1.0; 2026-02-21T10:19:38.1351991Z // end inline asm 2026-02-21T10:19:38.1352046Z // begin inline asm 2026-02-21T10:19:38.1352103Z mov.u32 %r37308, 0x0; 2026-02-21T10:19:38.1352158Z mov.u32 %r37309, 0x0; 2026-02-21T10:19:38.1352214Z mov.u32 %r37310, 0x0; 2026-02-21T10:19:38.1352271Z mov.u32 %r37311, 0x0; 2026-02-21T10:19:38.1352551Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r37308, %r37309, %r37310, %r37311 }, [ %rd775 + 0 ], %rd774; 2026-02-21T10:19:38.1352611Z // end inline asm 2026-02-21T10:19:38.1352670Z // begin inline asm 2026-02-21T10:19:38.1352726Z mov.u64 %rd777, 0x0; 2026-02-21T10:19:38.1352841Z createpolicy.fractional.L2::evict_first.b64 %rd777, 1.0; 2026-02-21T10:19:38.1352895Z // end inline asm 2026-02-21T10:19:38.1352954Z // begin inline asm 2026-02-21T10:19:38.1353009Z mov.u32 %r37312, 0x0; 2026-02-21T10:19:38.1353112Z mov.u32 %r37313, 0x0; 2026-02-21T10:19:38.1353171Z mov.u32 %r37314, 0x0; 2026-02-21T10:19:38.1353226Z mov.u32 %r37315, 0x0; 2026-02-21T10:19:38.1353448Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r37312, %r37313, %r37314, %r37315 }, [ %rd778 + 0 ], %rd777; 2026-02-21T10:19:38.1353504Z // end inline asm 2026-02-21T10:19:38.1353704Z .loc 1 59 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:59:32 2026-02-21T10:19:38.1353762Z bar.sync 0; 2026-02-21T10:19:38.1353848Z st.shared.v2.b32 [%r68], {%r37284, %r37285}; 2026-02-21T10:19:38.1353942Z st.shared.v2.b32 [%r68+2048], {%r37288, %r37289}; 2026-02-21T10:19:38.1354027Z st.shared.v2.b32 [%r68+4096], {%r37292, %r37293}; 2026-02-21T10:19:38.1354108Z st.shared.v2.b32 [%r68+6144], {%r37296, %r37297}; 2026-02-21T10:19:38.1354191Z st.shared.v2.b32 [%r68+8192], {%r37300, %r37301}; 2026-02-21T10:19:38.1354281Z st.shared.v2.b32 [%r68+10240], {%r37304, %r37305}; 2026-02-21T10:19:38.1354365Z st.shared.v2.b32 [%r68+12288], {%r37308, %r37309}; 
2026-02-21T10:19:38.1354521Z st.shared.v2.b32 [%r68+14336], {%r37312, %r37313}; 2026-02-21T10:19:38.1354606Z st.shared.v2.b32 [%r69], {%r37286, %r37287}; 2026-02-21T10:19:38.1354691Z st.shared.v2.b32 [%r69+2048], {%r37290, %r37291}; 2026-02-21T10:19:38.1354774Z st.shared.v2.b32 [%r69+4096], {%r37294, %r37295}; 2026-02-21T10:19:38.1354860Z st.shared.v2.b32 [%r69+6144], {%r37298, %r37299}; 2026-02-21T10:19:38.1354943Z st.shared.v2.b32 [%r69+8192], {%r37302, %r37303}; 2026-02-21T10:19:38.1355029Z st.shared.v2.b32 [%r69+10240], {%r37306, %r37307}; 2026-02-21T10:19:38.1355117Z st.shared.v2.b32 [%r69+12288], {%r37310, %r37311}; 2026-02-21T10:19:38.1355201Z st.shared.v2.b32 [%r69+14336], {%r37314, %r37315}; 2026-02-21T10:19:38.1355256Z bar.sync 0; 2026-02-21T10:19:38.1355326Z ld.shared.b16 %rs3142, [%r70]; 2026-02-21T10:19:38.1355398Z ld.shared.b16 %rs3143, [%r70+1024]; 2026-02-21T10:19:38.1355464Z ld.shared.b16 %rs3144, [%r70+64]; 2026-02-21T10:19:38.1355540Z ld.shared.b16 %rs3145, [%r70+1088]; 2026-02-21T10:19:38.1355612Z ld.shared.b16 %rs3146, [%r70+8192]; 2026-02-21T10:19:38.1355676Z ld.shared.b16 %rs3147, [%r70+9216]; 2026-02-21T10:19:38.1355739Z ld.shared.b16 %rs3148, [%r70+8256]; 2026-02-21T10:19:38.1355805Z ld.shared.b16 %rs3149, [%r70+9280]; 2026-02-21T10:19:38.1355871Z ld.shared.b16 %rs3150, [%r71]; 2026-02-21T10:19:38.1355996Z ld.shared.b16 %rs3151, [%r71+1024]; 2026-02-21T10:19:38.1356059Z ld.shared.b16 %rs3152, [%r71+64]; 2026-02-21T10:19:38.1356126Z ld.shared.b16 %rs3153, [%r71+1088]; 2026-02-21T10:19:38.1356190Z ld.shared.b16 %rs3154, [%r71+8192]; 2026-02-21T10:19:38.1356252Z ld.shared.b16 %rs3155, [%r71+9216]; 2026-02-21T10:19:38.1356317Z ld.shared.b16 %rs3156, [%r71+8256]; 2026-02-21T10:19:38.1356380Z ld.shared.b16 %rs3157, [%r71+9280]; 2026-02-21T10:19:38.1356442Z ld.shared.b16 %rs3158, [%r72]; 2026-02-21T10:19:38.1356639Z ld.shared.b16 %rs3159, [%r72+1024]; 2026-02-21T10:19:38.1356706Z ld.shared.b16 %rs3160, [%r72+64]; 2026-02-21T10:19:38.1356774Z 
ld.shared.b16 %rs3161, [%r72+1088]; 2026-02-21T10:19:38.1356837Z ld.shared.b16 %rs3162, [%r72+8192]; 2026-02-21T10:19:38.1356903Z ld.shared.b16 %rs3163, [%r72+9216]; 2026-02-21T10:19:38.1356966Z ld.shared.b16 %rs3164, [%r72+8256]; 2026-02-21T10:19:38.1357042Z ld.shared.b16 %rs3165, [%r72+9280]; 2026-02-21T10:19:38.1357109Z ld.shared.b16 %rs3166, [%r73]; 2026-02-21T10:19:38.1357171Z ld.shared.b16 %rs3167, [%r73+1024]; 2026-02-21T10:19:38.1357234Z ld.shared.b16 %rs3168, [%r73+64]; 2026-02-21T10:19:38.1357371Z ld.shared.b16 %rs3169, [%r73+1088]; 2026-02-21T10:19:38.1357440Z ld.shared.b16 %rs3170, [%r73+8192]; 2026-02-21T10:19:38.1357503Z ld.shared.b16 %rs3171, [%r73+9216]; 2026-02-21T10:19:38.1357565Z ld.shared.b16 %rs3172, [%r73+8256]; 2026-02-21T10:19:38.1357630Z ld.shared.b16 %rs3173, [%r73+9280]; 2026-02-21T10:19:38.1357703Z ld.shared.b16 %rs3174, [%r74]; 2026-02-21T10:19:38.1357767Z ld.shared.b16 %rs3175, [%r74+1024]; 2026-02-21T10:19:38.1357915Z ld.shared.b16 %rs3176, [%r74+64]; 2026-02-21T10:19:38.1357980Z ld.shared.b16 %rs3177, [%r74+1088]; 2026-02-21T10:19:38.1358044Z ld.shared.b16 %rs3178, [%r74+8192]; 2026-02-21T10:19:38.1358108Z ld.shared.b16 %rs3179, [%r74+9216]; 2026-02-21T10:19:38.1358174Z ld.shared.b16 %rs3180, [%r74+8256]; 2026-02-21T10:19:38.1358236Z ld.shared.b16 %rs3181, [%r74+9280]; 2026-02-21T10:19:38.1358311Z ld.shared.b16 %rs3182, [%r75]; 2026-02-21T10:19:38.1358379Z ld.shared.b16 %rs3183, [%r75+1024]; 2026-02-21T10:19:38.1358444Z ld.shared.b16 %rs3184, [%r75+64]; 2026-02-21T10:19:38.1358508Z ld.shared.b16 %rs3185, [%r75+1088]; 2026-02-21T10:19:38.1358570Z ld.shared.b16 %rs3186, [%r75+8192]; 2026-02-21T10:19:38.1358636Z ld.shared.b16 %rs3187, [%r75+9216]; 2026-02-21T10:19:38.1358700Z ld.shared.b16 %rs3188, [%r75+8256]; 2026-02-21T10:19:38.1358764Z ld.shared.b16 %rs3189, [%r75+9280]; 2026-02-21T10:19:38.1358829Z ld.shared.b16 %rs3190, [%r76]; 2026-02-21T10:19:38.1358891Z ld.shared.b16 %rs3191, [%r76+1024]; 2026-02-21T10:19:38.1359019Z 
ld.shared.b16 %rs3192, [%r76+64]; 2026-02-21T10:19:38.1359088Z ld.shared.b16 %rs3193, [%r76+1088]; 2026-02-21T10:19:38.1359151Z ld.shared.b16 %rs3194, [%r76+8192]; 2026-02-21T10:19:38.1359213Z ld.shared.b16 %rs3195, [%r76+9216]; 2026-02-21T10:19:38.1359277Z ld.shared.b16 %rs3196, [%r76+8256]; 2026-02-21T10:19:38.1359344Z ld.shared.b16 %rs3197, [%r76+9280]; 2026-02-21T10:19:38.1359405Z ld.shared.b16 %rs3198, [%r77]; 2026-02-21T10:19:38.1359470Z ld.shared.b16 %rs3199, [%r77+1024]; 2026-02-21T10:19:38.1359535Z ld.shared.b16 %rs3200, [%r77+64]; 2026-02-21T10:19:38.1359608Z ld.shared.b16 %rs3201, [%r77+1088]; 2026-02-21T10:19:38.1359672Z ld.shared.b16 %rs3202, [%r77+8192]; 2026-02-21T10:19:38.1359736Z ld.shared.b16 %rs3203, [%r77+9216]; 2026-02-21T10:19:38.1359800Z ld.shared.b16 %rs3204, [%r77+8256]; 2026-02-21T10:19:38.1359862Z ld.shared.b16 %rs3205, [%r77+9280]; 2026-02-21T10:19:38.1359922Z cvt.f32.bf16 %r37453, %rs3142; 2026-02-21T10:19:38.1359985Z cvt.f32.bf16 %r37454, %rs3143; 2026-02-21T10:19:38.1360045Z cvt.f32.bf16 %r37455, %rs3150; 2026-02-21T10:19:38.1360104Z cvt.f32.bf16 %r37456, %rs3151; 2026-02-21T10:19:38.1360163Z cvt.f32.bf16 %r37585, %rs3158; 2026-02-21T10:19:38.1360224Z cvt.f32.bf16 %r37586, %rs3159; 2026-02-21T10:19:38.1360282Z cvt.f32.bf16 %r37587, %rs3166; 2026-02-21T10:19:38.1360413Z cvt.f32.bf16 %r37588, %rs3167; 2026-02-21T10:19:38.1360473Z cvt.f32.bf16 %r37717, %rs3174; 2026-02-21T10:19:38.1360532Z cvt.f32.bf16 %r37718, %rs3175; 2026-02-21T10:19:38.1360592Z cvt.f32.bf16 %r37719, %rs3182; 2026-02-21T10:19:38.1360652Z cvt.f32.bf16 %r37720, %rs3183; 2026-02-21T10:19:38.1360711Z cvt.f32.bf16 %r37849, %rs3190; 2026-02-21T10:19:38.1360770Z cvt.f32.bf16 %r37850, %rs3191; 2026-02-21T10:19:38.1360828Z cvt.f32.bf16 %r37851, %rs3198; 2026-02-21T10:19:38.1360889Z cvt.f32.bf16 %r37852, %rs3199; 2026-02-21T10:19:38.1360946Z cvt.f32.bf16 %r37981, %rs3144; 2026-02-21T10:19:38.1361003Z cvt.f32.bf16 %r37982, %rs3145; 2026-02-21T10:19:38.1361068Z cvt.f32.bf16 
%r37983, %rs3152; 2026-02-21T10:19:38.1361129Z cvt.f32.bf16 %r37984, %rs3153; 2026-02-21T10:19:38.1361188Z cvt.f32.bf16 %r38113, %rs3160; 2026-02-21T10:19:38.1361246Z cvt.f32.bf16 %r38114, %rs3161; 2026-02-21T10:19:38.1361308Z cvt.f32.bf16 %r38115, %rs3168; 2026-02-21T10:19:38.1361369Z cvt.f32.bf16 %r38116, %rs3169; 2026-02-21T10:19:38.1361431Z cvt.f32.bf16 %r38245, %rs3176; 2026-02-21T10:19:38.1361493Z cvt.f32.bf16 %r38246, %rs3177; 2026-02-21T10:19:38.1361552Z cvt.f32.bf16 %r38247, %rs3184; 2026-02-21T10:19:38.1361663Z cvt.f32.bf16 %r38248, %rs3185; 2026-02-21T10:19:38.1361736Z cvt.f32.bf16 %r38377, %rs3192; 2026-02-21T10:19:38.1361800Z cvt.f32.bf16 %r38378, %rs3193; 2026-02-21T10:19:38.1361858Z cvt.f32.bf16 %r38379, %rs3200; 2026-02-21T10:19:38.1361918Z cvt.f32.bf16 %r38380, %rs3201; 2026-02-21T10:19:38.1361978Z cvt.f32.bf16 %r38509, %rs3146; 2026-02-21T10:19:38.1362037Z cvt.f32.bf16 %r38510, %rs3147; 2026-02-21T10:19:38.1362143Z cvt.f32.bf16 %r38511, %rs3154; 2026-02-21T10:19:38.1362203Z cvt.f32.bf16 %r38512, %rs3155; 2026-02-21T10:19:38.1362261Z cvt.f32.bf16 %r38641, %rs3162; 2026-02-21T10:19:38.1362319Z cvt.f32.bf16 %r38642, %rs3163; 2026-02-21T10:19:38.1362376Z cvt.f32.bf16 %r38643, %rs3170; 2026-02-21T10:19:38.1362438Z cvt.f32.bf16 %r38644, %rs3171; 2026-02-21T10:19:38.1362498Z cvt.f32.bf16 %r38773, %rs3178; 2026-02-21T10:19:38.1362557Z cvt.f32.bf16 %r38774, %rs3179; 2026-02-21T10:19:38.1362619Z cvt.f32.bf16 %r38775, %rs3186; 2026-02-21T10:19:38.1362678Z cvt.f32.bf16 %r38776, %rs3187; 2026-02-21T10:19:38.1362737Z cvt.f32.bf16 %r38905, %rs3194; 2026-02-21T10:19:38.1362795Z cvt.f32.bf16 %r38906, %rs3195; 2026-02-21T10:19:38.1362869Z cvt.f32.bf16 %r38907, %rs3202; 2026-02-21T10:19:38.1362930Z cvt.f32.bf16 %r38908, %rs3203; 2026-02-21T10:19:38.1362989Z cvt.f32.bf16 %r39037, %rs3148; 2026-02-21T10:19:38.1363049Z cvt.f32.bf16 %r39038, %rs3149; 2026-02-21T10:19:38.1363107Z cvt.f32.bf16 %r39039, %rs3156; 2026-02-21T10:19:38.1363219Z cvt.f32.bf16 %r39040, %rs3157; 
2026-02-21T10:19:38.1363280Z cvt.f32.bf16 %r39169, %rs3164; 2026-02-21T10:19:38.1363341Z cvt.f32.bf16 %r39170, %rs3165; 2026-02-21T10:19:38.1363399Z cvt.f32.bf16 %r39171, %rs3172; 2026-02-21T10:19:38.1363457Z cvt.f32.bf16 %r39172, %rs3173; 2026-02-21T10:19:38.1363518Z cvt.f32.bf16 %r39301, %rs3180; 2026-02-21T10:19:38.1363578Z cvt.f32.bf16 %r39302, %rs3181; 2026-02-21T10:19:38.1363636Z cvt.f32.bf16 %r39303, %rs3188; 2026-02-21T10:19:38.1363695Z cvt.f32.bf16 %r39304, %rs3189; 2026-02-21T10:19:38.1363757Z cvt.f32.bf16 %r39433, %rs3196; 2026-02-21T10:19:38.1363819Z cvt.f32.bf16 %r39434, %rs3197; 2026-02-21T10:19:38.1363878Z cvt.f32.bf16 %r39435, %rs3204; 2026-02-21T10:19:38.1363940Z cvt.f32.bf16 %r39436, %rs3205; 2026-02-21T10:19:38.1364147Z .loc 1 61 33 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:61:33 2026-02-21T10:19:38.1364202Z bar.sync 0; 2026-02-21T10:19:38.1364264Z // begin inline asm 2026-02-21T10:19:38.1364373Z @%p313 mbarrier.init.shared::cta.b64 [%r39929], 1; 2026-02-21T10:19:38.1364430Z // end inline asm 2026-02-21T10:19:38.1364483Z bar.sync 0; 2026-02-21T10:19:38.1364542Z // begin inline asm 2026-02-21T10:19:38.1364694Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r39929], 4096; 2026-02-21T10:19:38.1364752Z // end inline asm 2026-02-21T10:19:38.1364870Z // begin inline asm 2026-02-21T10:19:38.1364950Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.1365003Z // end inline asm 2026-02-21T10:19:38.1365069Z bar.sync 0; 2026-02-21T10:19:38.1365139Z elect.sync %r39832|%p376, -1; 2026-02-21T10:19:38.1365207Z and.pred %p355, %p1, %p376; 2026-02-21T10:19:38.1365268Z add.s32 %r37320, %r39767, 160; 2026-02-21T10:19:38.1365327Z // begin inline asm 2026-02-21T10:19:38.1365673Z @%p355 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39931], [%rd822, {%r39932, %r37320}], [%r39929]; 2026-02-21T10:19:38.1365728Z // end inline asm 2026-02-21T10:19:38.1365786Z bar.sync 0; 2026-02-21T10:19:38.1365844Z // begin inline asm 
2026-02-21T10:19:38.1365898Z 2026-02-21T10:19:38.1365947Z { 2026-02-21T10:19:38.1366013Z .reg .pred complete; 2026-02-21T10:19:38.1366067Z waitLoop: 2026-02-21T10:19:38.1366214Z mbarrier.try_wait.parity.shared.b64 complete, [%r39929], %r39567; 2026-02-21T10:19:38.1366287Z @!complete bra.uni waitLoop; 2026-02-21T10:19:38.1366336Z } 2026-02-21T10:19:38.1366342Z 2026-02-21T10:19:38.1366400Z // end inline asm 2026-02-21T10:19:38.1366581Z bar.sync 0; 2026-02-21T10:19:38.1366721Z // begin inline asm 2026-02-21T10:19:38.1366830Z @%p313 mbarrier.inval.shared::cta.b64 [%r39929]; 2026-02-21T10:19:38.1366892Z // end inline asm 2026-02-21T10:19:38.1367103Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1367168Z ld.shared.s8 %rs3206, [%r78]; 2026-02-21T10:19:38.1367362Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1367498Z shl.b16 %rs3207, %rs3206, 4; 2026-02-21T10:19:38.1367693Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1367761Z ld.shared.s8 %rs3208, [%r79+128]; 2026-02-21T10:19:38.1367949Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1368015Z shl.b16 %rs3209, %rs3208, 4; 2026-02-21T10:19:38.1368206Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1368269Z ld.shared.s8 %rs3210, [%r80+256]; 2026-02-21T10:19:38.1368459Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1368520Z shl.b16 %rs3211, %rs3210, 4; 2026-02-21T10:19:38.1368708Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1368777Z ld.shared.s8 %rs3212, [%r81+384]; 2026-02-21T10:19:38.1369032Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1369095Z shl.b16 %rs3213, %rs3212, 4; 2026-02-21T10:19:38.1369287Z .loc 1 66 
25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1369352Z ld.shared.s8 %rs3214, [%r82+512]; 2026-02-21T10:19:38.1369538Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1369601Z shl.b16 %rs3215, %rs3214, 4; 2026-02-21T10:19:38.1369792Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1369854Z ld.shared.s8 %rs3216, [%r83+640]; 2026-02-21T10:19:38.1370049Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1370110Z shl.b16 %rs3217, %rs3216, 4; 2026-02-21T10:19:38.1370300Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1370363Z ld.shared.s8 %rs3218, [%r84+768]; 2026-02-21T10:19:38.1370554Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1370614Z shl.b16 %rs3219, %rs3218, 4; 2026-02-21T10:19:38.1370882Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1370948Z ld.shared.s8 %rs3220, [%r85+896]; 2026-02-21T10:19:38.1371136Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1371196Z shl.b16 %rs3221, %rs3220, 4; 2026-02-21T10:19:38.1371384Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1371450Z ld.shared.s8 %rs3222, [%r78+1024]; 2026-02-21T10:19:38.1371641Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1371704Z shl.b16 %rs3223, %rs3222, 4; 2026-02-21T10:19:38.1371892Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1371958Z ld.shared.s8 %rs3224, [%r79+1152]; 2026-02-21T10:19:38.1372145Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1372217Z shl.b16 %rs3225, %rs3224, 4; 
2026-02-21T10:19:38.1372466Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1372533Z ld.shared.s8 %rs3226, [%r80+1280]; 2026-02-21T10:19:38.1372722Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1372782Z shl.b16 %rs3227, %rs3226, 4; 2026-02-21T10:19:38.1372968Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1373084Z ld.shared.s8 %rs3228, [%r81+1408]; 2026-02-21T10:19:38.1373275Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1373334Z shl.b16 %rs3229, %rs3228, 4; 2026-02-21T10:19:38.1373522Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1373590Z ld.shared.s8 %rs3230, [%r82+1536]; 2026-02-21T10:19:38.1373780Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1373840Z shl.b16 %rs3231, %rs3230, 4; 2026-02-21T10:19:38.1374030Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1374092Z ld.shared.s8 %rs3232, [%r83+1664]; 2026-02-21T10:19:38.1374284Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1374347Z shl.b16 %rs3233, %rs3232, 4; 2026-02-21T10:19:38.1374584Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1374650Z ld.shared.s8 %rs3234, [%r84+1792]; 2026-02-21T10:19:38.1374841Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1374903Z shl.b16 %rs3235, %rs3234, 4; 2026-02-21T10:19:38.1375091Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1375155Z ld.shared.s8 %rs3236, [%r85+1920]; 2026-02-21T10:19:38.1375346Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1375405Z shl.b16 
%rs3237, %rs3236, 4; 2026-02-21T10:19:38.1375593Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1375658Z ld.shared.s8 %rs3238, [%r78+2048]; 2026-02-21T10:19:38.1375848Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1375907Z shl.b16 %rs3239, %rs3238, 4; 2026-02-21T10:19:38.1376097Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1376159Z ld.shared.s8 %rs3240, [%r79+2176]; 2026-02-21T10:19:38.1376398Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1376586Z shl.b16 %rs3241, %rs3240, 4; 2026-02-21T10:19:38.1376781Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1376850Z ld.shared.s8 %rs3242, [%r80+2304]; 2026-02-21T10:19:38.1377047Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1377109Z shl.b16 %rs3243, %rs3242, 4; 2026-02-21T10:19:38.1377297Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1377362Z ld.shared.s8 %rs3244, [%r81+2432]; 2026-02-21T10:19:38.1377552Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1377612Z shl.b16 %rs3245, %rs3244, 4; 2026-02-21T10:19:38.1377799Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1377866Z ld.shared.s8 %rs3246, [%r82+2560]; 2026-02-21T10:19:38.1378152Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1378218Z shl.b16 %rs3247, %rs3246, 4; 2026-02-21T10:19:38.1378410Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1378473Z ld.shared.s8 %rs3248, [%r83+2688]; 2026-02-21T10:19:38.1378661Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 
2026-02-21T10:19:38.1378788Z shl.b16 %rs3249, %rs3248, 4; 2026-02-21T10:19:38.1378979Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1379042Z ld.shared.s8 %rs3250, [%r84+2816]; 2026-02-21T10:19:38.1379230Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1379293Z shl.b16 %rs3251, %rs3250, 4; 2026-02-21T10:19:38.1379482Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1379544Z ld.shared.s8 %rs3252, [%r85+2944]; 2026-02-21T10:19:38.1379732Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1379792Z shl.b16 %rs3253, %rs3252, 4; 2026-02-21T10:19:38.1379978Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1380043Z ld.shared.s8 %rs3254, [%r78+3072]; 2026-02-21T10:19:38.1380295Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1380359Z shl.b16 %rs3255, %rs3254, 4; 2026-02-21T10:19:38.1380549Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1380614Z ld.shared.s8 %rs3256, [%r79+3200]; 2026-02-21T10:19:38.1380800Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1380861Z shl.b16 %rs3257, %rs3256, 4; 2026-02-21T10:19:38.1381053Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1381114Z ld.shared.s8 %rs3258, [%r80+3328]; 2026-02-21T10:19:38.1381300Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1381362Z shl.b16 %rs3259, %rs3258, 4; 2026-02-21T10:19:38.1381554Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1381616Z ld.shared.s8 %rs3260, [%r81+3456]; 2026-02-21T10:19:38.1381805Z .loc 1 64 28 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1381864Z shl.b16 %rs3261, %rs3260, 4; 2026-02-21T10:19:38.1382129Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1382201Z ld.shared.s8 %rs3262, [%r82+3584]; 2026-02-21T10:19:38.1382393Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1382451Z shl.b16 %rs3263, %rs3262, 4; 2026-02-21T10:19:38.1382637Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1382702Z ld.shared.s8 %rs3264, [%r83+3712]; 2026-02-21T10:19:38.1382889Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1382951Z shl.b16 %rs3265, %rs3264, 4; 2026-02-21T10:19:38.1383140Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1383203Z ld.shared.s8 %rs3266, [%r84+3840]; 2026-02-21T10:19:38.1383391Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1383454Z shl.b16 %rs3267, %rs3266, 4; 2026-02-21T10:19:38.1383691Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1383755Z ld.shared.s8 %rs3268, [%r85+3968]; 2026-02-21T10:19:38.1383945Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1384004Z shl.b16 %rs3269, %rs3268, 4; 2026-02-21T10:19:38.1384192Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1384300Z cvt.s16.s8 %rs3270, %rs3207; 2026-02-21T10:19:38.1384362Z shr.s16 %rs3271, %rs3270, 4; 2026-02-21T10:19:38.1384421Z cvt.s16.s8 %rs3272, %rs3209; 2026-02-21T10:19:38.1384480Z shr.s16 %rs3273, %rs3272, 4; 2026-02-21T10:19:38.1384541Z shr.s16 %rs3274, %rs3206, 4; 2026-02-21T10:19:38.1384598Z shr.s16 %rs3275, %rs3208, 4; 2026-02-21T10:19:38.1384788Z .loc 1 84 32 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1384856Z cvt.rn.f32.s16 %r39833, %rs3275; 2026-02-21T10:19:38.1384918Z cvt.rn.f32.s16 %r39834, %rs3274; 2026-02-21T10:19:38.1384978Z cvt.rn.f32.s16 %r39835, %rs3273; 2026-02-21T10:19:38.1385051Z cvt.rn.f32.s16 %r39836, %rs3271; 2026-02-21T10:19:38.1385244Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1385304Z cvt.s16.s8 %rs3276, %rs3211; 2026-02-21T10:19:38.1385364Z shr.s16 %rs3277, %rs3276, 4; 2026-02-21T10:19:38.1385426Z cvt.s16.s8 %rs3278, %rs3213; 2026-02-21T10:19:38.1385535Z shr.s16 %rs3279, %rs3278, 4; 2026-02-21T10:19:38.1385595Z shr.s16 %rs3280, %rs3210, 4; 2026-02-21T10:19:38.1385653Z shr.s16 %rs3281, %rs3212, 4; 2026-02-21T10:19:38.1385844Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1385908Z cvt.rn.f32.s16 %r39837, %rs3281; 2026-02-21T10:19:38.1385970Z cvt.rn.f32.s16 %r39838, %rs3280; 2026-02-21T10:19:38.1386034Z cvt.rn.f32.s16 %r39839, %rs3279; 2026-02-21T10:19:38.1386095Z cvt.rn.f32.s16 %r39840, %rs3277; 2026-02-21T10:19:38.1386283Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1386345Z cvt.s16.s8 %rs3282, %rs3215; 2026-02-21T10:19:38.1386404Z shr.s16 %rs3283, %rs3282, 4; 2026-02-21T10:19:38.1386581Z cvt.s16.s8 %rs3284, %rs3217; 2026-02-21T10:19:38.1386646Z shr.s16 %rs3285, %rs3284, 4; 2026-02-21T10:19:38.1386712Z shr.s16 %rs3286, %rs3214, 4; 2026-02-21T10:19:38.1386772Z shr.s16 %rs3287, %rs3216, 4; 2026-02-21T10:19:38.1386978Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1387044Z cvt.rn.f32.s16 %r39841, %rs3287; 2026-02-21T10:19:38.1387105Z cvt.rn.f32.s16 %r39842, %rs3286; 2026-02-21T10:19:38.1387251Z cvt.rn.f32.s16 %r39843, %rs3285; 2026-02-21T10:19:38.1387315Z cvt.rn.f32.s16 %r39844, %rs3283; 2026-02-21T10:19:38.1387506Z .loc 1 66 25 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1387566Z cvt.s16.s8 %rs3288, %rs3219; 2026-02-21T10:19:38.1387626Z shr.s16 %rs3289, %rs3288, 4; 2026-02-21T10:19:38.1387686Z cvt.s16.s8 %rs3290, %rs3221; 2026-02-21T10:19:38.1387745Z shr.s16 %rs3291, %rs3290, 4; 2026-02-21T10:19:38.1387803Z shr.s16 %rs3292, %rs3218, 4; 2026-02-21T10:19:38.1387875Z shr.s16 %rs3293, %rs3220, 4; 2026-02-21T10:19:38.1388071Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1388136Z cvt.rn.f32.s16 %r39845, %rs3293; 2026-02-21T10:19:38.1388199Z cvt.rn.f32.s16 %r39846, %rs3292; 2026-02-21T10:19:38.1388260Z cvt.rn.f32.s16 %r39847, %rs3291; 2026-02-21T10:19:38.1388319Z cvt.rn.f32.s16 %r39848, %rs3289; 2026-02-21T10:19:38.1388599Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1388664Z cvt.s16.s8 %rs3294, %rs3223; 2026-02-21T10:19:38.1388795Z shr.s16 %rs3295, %rs3294, 4; 2026-02-21T10:19:38.1388858Z cvt.s16.s8 %rs3296, %rs3225; 2026-02-21T10:19:38.1388924Z shr.s16 %rs3297, %rs3296, 4; 2026-02-21T10:19:38.1388982Z shr.s16 %rs3298, %rs3222, 4; 2026-02-21T10:19:38.1389041Z shr.s16 %rs3299, %rs3224, 4; 2026-02-21T10:19:38.1389232Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1389294Z cvt.rn.f32.s16 %r39849, %rs3299; 2026-02-21T10:19:38.1389419Z cvt.rn.f32.s16 %r39850, %rs3298; 2026-02-21T10:19:38.1389480Z cvt.rn.f32.s16 %r39851, %rs3297; 2026-02-21T10:19:38.1389544Z cvt.rn.f32.s16 %r39852, %rs3295; 2026-02-21T10:19:38.1389735Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1389795Z cvt.s16.s8 %rs3300, %rs3227; 2026-02-21T10:19:38.1389857Z shr.s16 %rs3301, %rs3300, 4; 2026-02-21T10:19:38.1389915Z cvt.s16.s8 %rs3302, %rs3229; 2026-02-21T10:19:38.1389974Z shr.s16 %rs3303, %rs3302, 4; 2026-02-21T10:19:38.1390033Z shr.s16 %rs3304, %rs3226, 4; 
2026-02-21T10:19:38.1390094Z shr.s16 %rs3305, %rs3228, 4; 2026-02-21T10:19:38.1390285Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1390346Z cvt.rn.f32.s16 %r39853, %rs3305; 2026-02-21T10:19:38.1390417Z cvt.rn.f32.s16 %r39854, %rs3304; 2026-02-21T10:19:38.1390482Z cvt.rn.f32.s16 %r39855, %rs3303; 2026-02-21T10:19:38.1390545Z cvt.rn.f32.s16 %r39856, %rs3301; 2026-02-21T10:19:38.1390807Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1390869Z cvt.s16.s8 %rs3306, %rs3231; 2026-02-21T10:19:38.1390929Z shr.s16 %rs3307, %rs3306, 4; 2026-02-21T10:19:38.1390987Z cvt.s16.s8 %rs3308, %rs3233; 2026-02-21T10:19:38.1391051Z shr.s16 %rs3309, %rs3308, 4; 2026-02-21T10:19:38.1391111Z shr.s16 %rs3310, %rs3230, 4; 2026-02-21T10:19:38.1391170Z shr.s16 %rs3311, %rs3232, 4; 2026-02-21T10:19:38.1391369Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1391437Z cvt.rn.f32.s16 %r39857, %rs3311; 2026-02-21T10:19:38.1391499Z cvt.rn.f32.s16 %r39858, %rs3310; 2026-02-21T10:19:38.1391561Z cvt.rn.f32.s16 %r39859, %rs3309; 2026-02-21T10:19:38.1391622Z cvt.rn.f32.s16 %r39860, %rs3307; 2026-02-21T10:19:38.1391811Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1391874Z cvt.s16.s8 %rs3312, %rs3235; 2026-02-21T10:19:38.1391936Z shr.s16 %rs3313, %rs3312, 4; 2026-02-21T10:19:38.1391995Z cvt.s16.s8 %rs3314, %rs3237; 2026-02-21T10:19:38.1392053Z shr.s16 %rs3315, %rs3314, 4; 2026-02-21T10:19:38.1392114Z shr.s16 %rs3316, %rs3234, 4; 2026-02-21T10:19:38.1392228Z shr.s16 %rs3317, %rs3236, 4; 2026-02-21T10:19:38.1392417Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1392484Z cvt.rn.f32.s16 %r39861, %rs3317; 2026-02-21T10:19:38.1392546Z cvt.rn.f32.s16 %r39862, %rs3316; 2026-02-21T10:19:38.1392606Z cvt.rn.f32.s16 %r39863, %rs3315; 2026-02-21T10:19:38.1392665Z 
cvt.rn.f32.s16 %r39864, %rs3313; 2026-02-21T10:19:38.1392854Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1392916Z cvt.s16.s8 %rs3318, %rs3239; 2026-02-21T10:19:38.1392975Z shr.s16 %rs3319, %rs3318, 4; 2026-02-21T10:19:38.1393040Z cvt.s16.s8 %rs3320, %rs3241; 2026-02-21T10:19:38.1393098Z shr.s16 %rs3321, %rs3320, 4; 2026-02-21T10:19:38.1393156Z shr.s16 %rs3322, %rs3238, 4; 2026-02-21T10:19:38.1393215Z shr.s16 %rs3323, %rs3240, 4; 2026-02-21T10:19:38.1393406Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1393469Z cvt.rn.f32.s16 %r39865, %rs3323; 2026-02-21T10:19:38.1393529Z cvt.rn.f32.s16 %r39866, %rs3322; 2026-02-21T10:19:38.1393640Z cvt.rn.f32.s16 %r39867, %rs3321; 2026-02-21T10:19:38.1393703Z cvt.rn.f32.s16 %r39868, %rs3319; 2026-02-21T10:19:38.1393891Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1393953Z cvt.s16.s8 %rs3324, %rs3243; 2026-02-21T10:19:38.1394013Z shr.s16 %rs3325, %rs3324, 4; 2026-02-21T10:19:38.1394072Z cvt.s16.s8 %rs3326, %rs3245; 2026-02-21T10:19:38.1394130Z shr.s16 %rs3327, %rs3326, 4; 2026-02-21T10:19:38.1394239Z shr.s16 %rs3328, %rs3242, 4; 2026-02-21T10:19:38.1394298Z shr.s16 %rs3329, %rs3244, 4; 2026-02-21T10:19:38.1394488Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1394559Z cvt.rn.f32.s16 %r39869, %rs3329; 2026-02-21T10:19:38.1394619Z cvt.rn.f32.s16 %r39870, %rs3328; 2026-02-21T10:19:38.1394681Z cvt.rn.f32.s16 %r39871, %rs3327; 2026-02-21T10:19:38.1394744Z cvt.rn.f32.s16 %r39872, %rs3325; 2026-02-21T10:19:38.1394933Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1395004Z cvt.s16.s8 %rs3330, %rs3247; 2026-02-21T10:19:38.1395066Z shr.s16 %rs3331, %rs3330, 4; 2026-02-21T10:19:38.1395128Z cvt.s16.s8 %rs3332, %rs3249; 2026-02-21T10:19:38.1395187Z shr.s16 %rs3333, %rs3332, 4; 
2026-02-21T10:19:38.1395244Z shr.s16 %rs3334, %rs3246, 4; 2026-02-21T10:19:38.1395306Z shr.s16 %rs3335, %rs3248, 4; 2026-02-21T10:19:38.1395543Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1395610Z cvt.rn.f32.s16 %r39873, %rs3335; 2026-02-21T10:19:38.1395672Z cvt.rn.f32.s16 %r39874, %rs3334; 2026-02-21T10:19:38.1395732Z cvt.rn.f32.s16 %r39875, %rs3333; 2026-02-21T10:19:38.1395793Z cvt.rn.f32.s16 %r39876, %rs3331; 2026-02-21T10:19:38.1395986Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1396049Z cvt.s16.s8 %rs3336, %rs3251; 2026-02-21T10:19:38.1396112Z shr.s16 %rs3337, %rs3336, 4; 2026-02-21T10:19:38.1396171Z cvt.s16.s8 %rs3338, %rs3253; 2026-02-21T10:19:38.1396232Z shr.s16 %rs3339, %rs3338, 4; 2026-02-21T10:19:38.1396291Z shr.s16 %rs3340, %rs3250, 4; 2026-02-21T10:19:38.1396349Z shr.s16 %rs3341, %rs3252, 4; 2026-02-21T10:19:38.1396664Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1396732Z cvt.rn.f32.s16 %r39877, %rs3341; 2026-02-21T10:19:38.1396798Z cvt.rn.f32.s16 %r39878, %rs3340; 2026-02-21T10:19:38.1396858Z cvt.rn.f32.s16 %r39879, %rs3339; 2026-02-21T10:19:38.1396921Z cvt.rn.f32.s16 %r39880, %rs3337; 2026-02-21T10:19:38.1397123Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1397277Z cvt.s16.s8 %rs3342, %rs3255; 2026-02-21T10:19:38.1397339Z shr.s16 %rs3343, %rs3342, 4; 2026-02-21T10:19:38.1397398Z cvt.s16.s8 %rs3344, %rs3257; 2026-02-21T10:19:38.1397458Z shr.s16 %rs3345, %rs3344, 4; 2026-02-21T10:19:38.1397518Z shr.s16 %rs3346, %rs3254, 4; 2026-02-21T10:19:38.1397578Z shr.s16 %rs3347, %rs3256, 4; 2026-02-21T10:19:38.1397768Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1397830Z cvt.rn.f32.s16 %r39881, %rs3347; 2026-02-21T10:19:38.1397893Z cvt.rn.f32.s16 %r39882, %rs3346; 2026-02-21T10:19:38.1397953Z 
cvt.rn.f32.s16 %r39883, %rs3345; 2026-02-21T10:19:38.1398028Z cvt.rn.f32.s16 %r39884, %rs3343; 2026-02-21T10:19:38.1398223Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1398282Z cvt.s16.s8 %rs3348, %rs3259; 2026-02-21T10:19:38.1398341Z shr.s16 %rs3349, %rs3348, 4; 2026-02-21T10:19:38.1398399Z cvt.s16.s8 %rs3350, %rs3261; 2026-02-21T10:19:38.1398461Z shr.s16 %rs3351, %rs3350, 4; 2026-02-21T10:19:38.1398519Z shr.s16 %rs3352, %rs3258, 4; 2026-02-21T10:19:38.1398578Z shr.s16 %rs3353, %rs3260, 4; 2026-02-21T10:19:38.1398835Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1398900Z cvt.rn.f32.s16 %r39885, %rs3353; 2026-02-21T10:19:38.1398960Z cvt.rn.f32.s16 %r39886, %rs3352; 2026-02-21T10:19:38.1399025Z cvt.rn.f32.s16 %r39887, %rs3351; 2026-02-21T10:19:38.1399084Z cvt.rn.f32.s16 %r39888, %rs3349; 2026-02-21T10:19:38.1399274Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1399410Z cvt.s16.s8 %rs3354, %rs3263; 2026-02-21T10:19:38.1399474Z shr.s16 %rs3355, %rs3354, 4; 2026-02-21T10:19:38.1399533Z cvt.s16.s8 %rs3356, %rs3265; 2026-02-21T10:19:38.1399592Z shr.s16 %rs3357, %rs3356, 4; 2026-02-21T10:19:38.1399652Z shr.s16 %rs3358, %rs3262, 4; 2026-02-21T10:19:38.1399711Z shr.s16 %rs3359, %rs3264, 4; 2026-02-21T10:19:38.1399901Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1399963Z cvt.rn.f32.s16 %r39889, %rs3359; 2026-02-21T10:19:38.1400026Z cvt.rn.f32.s16 %r39890, %rs3358; 2026-02-21T10:19:38.1400087Z cvt.rn.f32.s16 %r39891, %rs3357; 2026-02-21T10:19:38.1400148Z cvt.rn.f32.s16 %r39892, %rs3355; 2026-02-21T10:19:38.1400340Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1400400Z cvt.s16.s8 %rs3360, %rs3267; 2026-02-21T10:19:38.1400458Z shr.s16 %rs3361, %rs3360, 4; 2026-02-21T10:19:38.1400594Z cvt.s16.s8 %rs3362, %rs3269; 
2026-02-21T10:19:38.1400659Z shr.s16 %rs3363, %rs3362, 4; 2026-02-21T10:19:38.1400718Z shr.s16 %rs3364, %rs3266, 4; 2026-02-21T10:19:38.1400776Z shr.s16 %rs3365, %rs3268, 4; 2026-02-21T10:19:38.1400968Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1401032Z cvt.rn.f32.s16 %r39893, %rs3365; 2026-02-21T10:19:38.1401093Z cvt.rn.f32.s16 %r39894, %rs3364; 2026-02-21T10:19:38.1401158Z cvt.rn.f32.s16 %r39895, %rs3363; 2026-02-21T10:19:38.1401218Z cvt.rn.f32.s16 %r39896, %rs3361; 2026-02-21T10:19:38.1401273Z bar.sync 0; 2026-02-21T10:19:38.1401395Z st.shared.v4.b32 [%r86], {%r39836, %r39834, %r39835, %r39833}; 2026-02-21T10:19:38.1401522Z st.shared.v4.b32 [%r86+16384], {%r39868, %r39866, %r39867, %r39865}; 2026-02-21T10:19:38.1401631Z st.shared.v4.b32 [%r87], {%r39840, %r39838, %r39839, %r39837}; 2026-02-21T10:19:38.1401750Z st.shared.v4.b32 [%r87+16384], {%r39872, %r39870, %r39871, %r39869}; 2026-02-21T10:19:38.1401863Z st.shared.v4.b32 [%r88], {%r39844, %r39842, %r39843, %r39841}; 2026-02-21T10:19:38.1401979Z st.shared.v4.b32 [%r88+16384], {%r39876, %r39874, %r39875, %r39873}; 2026-02-21T10:19:38.1402085Z st.shared.v4.b32 [%r89], {%r39848, %r39846, %r39847, %r39845}; 2026-02-21T10:19:38.1402259Z st.shared.v4.b32 [%r89+16384], {%r39880, %r39878, %r39879, %r39877}; 2026-02-21T10:19:38.1402363Z st.shared.v4.b32 [%r90], {%r39852, %r39850, %r39851, %r39849}; 2026-02-21T10:19:38.1402478Z st.shared.v4.b32 [%r90+16384], {%r39884, %r39882, %r39883, %r39881}; 2026-02-21T10:19:38.1402588Z st.shared.v4.b32 [%r91], {%r39856, %r39854, %r39855, %r39853}; 2026-02-21T10:19:38.1402702Z st.shared.v4.b32 [%r91+16384], {%r39888, %r39886, %r39887, %r39885}; 2026-02-21T10:19:38.1402806Z st.shared.v4.b32 [%r92], {%r39860, %r39858, %r39859, %r39857}; 2026-02-21T10:19:38.1402922Z st.shared.v4.b32 [%r92+16384], {%r39892, %r39890, %r39891, %r39889}; 2026-02-21T10:19:38.1403030Z st.shared.v4.b32 [%r93], {%r39864, %r39862, %r39863, %r39861}; 
2026-02-21T10:19:38.1403146Z st.shared.v4.b32 [%r93+16384], {%r39896, %r39894, %r39895, %r39893}; 2026-02-21T10:19:38.1403200Z $L__tmp29: 2026-02-21T10:19:38.1403473Z .loc 2 291 36 // standard.py:291:36 @[ cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:91:40 ] 2026-02-21T10:19:38.1403534Z // begin inline asm 2026-02-21T10:19:38.1403612Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.1403670Z // end inline asm 2026-02-21T10:19:38.1403787Z bar.sync 0; 2026-02-21T10:19:38.1403863Z wgmma.fence.sync.aligned; 2026-02-21T10:19:38.1403923Z // begin inline asm 2026-02-21T10:19:38.1405410Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r37453,%r37454,%r37455,%r37456}, %rd12, %p317, 1, 1; 2026-02-21T10:19:38.1405515Z // end inline asm 2026-02-21T10:19:38.1405575Z // begin inline asm 2026-02-21T10:19:38.1407246Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r37585,%r37586,%r37587,%r37588}, %rd13, %p317, 1, 1; 2026-02-21T10:19:38.1407316Z 
// end inline asm 2026-02-21T10:19:38.1407374Z // begin inline asm 2026-02-21T10:19:38.1408828Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r37717,%r37718,%r37719,%r37720}, %rd14, %p317, 1, 1; 2026-02-21T10:19:38.1408887Z // end inline asm 2026-02-21T10:19:38.1408944Z // begin inline asm 2026-02-21T10:19:38.1410397Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r37849,%r37850,%r37851,%r37852}, %rd15, %p317, 1, 1; 2026-02-21T10:19:38.1410537Z // end inline asm 2026-02-21T10:19:38.1410595Z // begin inline asm 2026-02-21T10:19:38.1412060Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r37981,%r37982,%r37983,%r37984}, %rd16, %p317, 1, 1; 2026-02-21T10:19:38.1412117Z // end inline asm 2026-02-21T10:19:38.1412174Z // begin inline asm 2026-02-21T10:19:38.1413697Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r38113,%r38114,%r38115,%r38116}, %rd17, %p317, 1, 1; 2026-02-21T10:19:38.1413811Z // end inline asm 2026-02-21T10:19:38.1413869Z // begin inline asm 2026-02-21T10:19:38.1415323Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r38245,%r38246,%r38247,%r38248}, %rd18, %p317, 1, 1; 2026-02-21T10:19:38.1415383Z // end inline asm 2026-02-21T10:19:38.1415441Z // begin inline asm 2026-02-21T10:19:38.1417104Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r38377,%r38378,%r38379,%r38380}, %rd19, %p317, 1, 1; 2026-02-21T10:19:38.1417172Z // end inline asm 2026-02-21T10:19:38.1417236Z // begin inline asm 2026-02-21T10:19:38.1418690Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r38509,%r38510,%r38511,%r38512}, %rd12, %p317, 1, 1; 2026-02-21T10:19:38.1418752Z // end inline asm 2026-02-21T10:19:38.1418809Z // begin inline asm 2026-02-21T10:19:38.1420261Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r38641,%r38642,%r38643,%r38644}, %rd13, %p317, 1, 1; 2026-02-21T10:19:38.1420383Z // end inline asm 2026-02-21T10:19:38.1420443Z // begin inline asm 2026-02-21T10:19:38.1421965Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r38773,%r38774,%r38775,%r38776}, %rd14, %p317, 1, 1; 2026-02-21T10:19:38.1422032Z // end inline asm 2026-02-21T10:19:38.1422089Z // begin inline asm 2026-02-21T10:19:38.1423544Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r38905,%r38906,%r38907,%r38908}, %rd15, %p317, 1, 1; 2026-02-21T10:19:38.1423664Z // end inline asm 2026-02-21T10:19:38.1423721Z // begin inline asm 2026-02-21T10:19:38.1425246Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r39037,%r39038,%r39039,%r39040}, %rd16, %p317, 1, 1; 2026-02-21T10:19:38.1425305Z // end inline asm 2026-02-21T10:19:38.1425362Z // begin inline asm 2026-02-21T10:19:38.1426949Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r39169,%r39170,%r39171,%r39172}, %rd17, %p317, 1, 1; 2026-02-21T10:19:38.1427011Z // end inline asm 2026-02-21T10:19:38.1427071Z // begin inline asm 2026-02-21T10:19:38.1428586Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r39301,%r39302,%r39303,%r39304}, %rd18, %p317, 1, 1; 2026-02-21T10:19:38.1428723Z // end inline asm 2026-02-21T10:19:38.1428783Z // begin inline asm 2026-02-21T10:19:38.1430245Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r39433,%r39434,%r39435,%r39436}, %rd19, %p317, 1, 1; 2026-02-21T10:19:38.1430307Z // end inline asm 2026-02-21T10:19:38.1430384Z wgmma.commit_group.sync.aligned; 2026-02-21T10:19:38.1430446Z mov.b32 %r39565, %r39931; 2026-02-21T10:19:38.1430503Z mov.b32 %r39566, %r39567; 2026-02-21T10:19:38.1430626Z // begin inline asm 2026-02-21T10:19:38.1433094Z // wait for regs: 
%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r39565,%r39566,%r39567 2026-02-21T10:19:38.1433234Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:19:38.1433288Z // end inline asm 2026-02-21T10:19:38.1433342Z $L__tmp30: 2026-02-21T10:19:38.1433559Z .loc 1 48 93 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:48:93 2026-02-21T10:19:38.1433626Z add.s64 %rd852, %rd852, 384; 2026-02-21T10:19:38.1433751Z add.s32 %r43240, %r43240, 192; 2026-02-21T10:19:38.1433823Z setp.lt.u64 %p377, %rd103, 3936; 2026-02-21T10:19:38.1433884Z mov.b64 %rd853, %rd103; 2026-02-21T10:19:38.1433945Z @%p377 bra $L__BB0_18; 2026-02-21T10:19:38.1434048Z // %bb.19: // %.preheader.preheader 2026-02-21T10:19:38.1434154Z // in Loop: Header=BB0_17 Depth=1 2026-02-21T10:19:38.1434218Z add.s64 %rd105, %rd100, 16128; 2026-02-21T10:19:38.1434280Z add.s64 %rd106, %rd93, 16128; 2026-02-21T10:19:38.1434344Z add.s64 %rd107, %rd94, 16128; 2026-02-21T10:19:38.1434404Z add.s64 %rd108, %rd95, 16128; 2026-02-21T10:19:38.1434463Z add.s64 %rd109, 
%rd96, 16128; 2026-02-21T10:19:38.1434521Z add.s64 %rd110, %rd97, 16128; 2026-02-21T10:19:38.1434581Z add.s64 %rd111, %rd98, 16128; 2026-02-21T10:19:38.1434639Z add.s64 %rd112, %rd99, 16128; 2026-02-21T10:19:38.1434696Z mov.b64 %rd855, 4000; 2026-02-21T10:19:38.1434760Z mov.b64 %rd854, %rd20; 2026-02-21T10:19:38.1434852Z $L__BB0_20: // %.preheader 2026-02-21T10:19:38.1434953Z // Parent Loop BB0_17 Depth=1 2026-02-21T10:19:38.1435060Z // => This Inner Loop Header: Depth=2 2026-02-21T10:19:38.1435262Z .loc 1 55 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:32 2026-02-21T10:19:38.1435396Z add.s64 %rd799, %rd854, %rd112; 2026-02-21T10:19:38.1435461Z add.s64 %rd802, %rd854, %rd111; 2026-02-21T10:19:38.1435525Z add.s64 %rd805, %rd854, %rd110; 2026-02-21T10:19:38.1435585Z add.s64 %rd808, %rd854, %rd109; 2026-02-21T10:19:38.1435645Z add.s64 %rd811, %rd854, %rd108; 2026-02-21T10:19:38.1435707Z add.s64 %rd814, %rd854, %rd107; 2026-02-21T10:19:38.1435767Z add.s64 %rd817, %rd854, %rd106; 2026-02-21T10:19:38.1435961Z .loc 1 55 80 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:55:80 2026-02-21T10:19:38.1436027Z add.s64 %rd820, %rd854, %rd105; 2026-02-21T10:19:38.1436087Z // begin inline asm 2026-02-21T10:19:38.1436145Z mov.u64 %rd798, 0x0; 2026-02-21T10:19:38.1436277Z createpolicy.fractional.L2::evict_first.b64 %rd798, 1.0; 2026-02-21T10:19:38.1436334Z // end inline asm 2026-02-21T10:19:38.1436392Z // begin inline asm 2026-02-21T10:19:38.1436578Z mov.u32 %r39897, 0x0; 2026-02-21T10:19:38.1436650Z mov.u32 %r39898, 0x0; 2026-02-21T10:19:38.1436706Z mov.u32 %r39899, 0x0; 2026-02-21T10:19:38.1436762Z mov.u32 %r39900, 0x0; 2026-02-21T10:19:38.1437090Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r39897, %r39898, %r39899, %r39900 }, [ %rd799 + 0 ], %rd798; 2026-02-21T10:19:38.1437153Z // end inline asm 2026-02-21T10:19:38.1437212Z // begin inline asm 2026-02-21T10:19:38.1437271Z mov.u64 %rd801, 0x0; 2026-02-21T10:19:38.1437394Z 
createpolicy.fractional.L2::evict_first.b64 %rd801, 1.0; 2026-02-21T10:19:38.1437451Z // end inline asm 2026-02-21T10:19:38.1437509Z // begin inline asm 2026-02-21T10:19:38.1437631Z mov.u32 %r39901, 0x0; 2026-02-21T10:19:38.1437687Z mov.u32 %r39902, 0x0; 2026-02-21T10:19:38.1437742Z mov.u32 %r39903, 0x0; 2026-02-21T10:19:38.1437797Z mov.u32 %r39904, 0x0; 2026-02-21T10:19:38.1438146Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r39901, %r39902, %r39903, %r39904 }, [ %rd802 + 0 ], %rd801; 2026-02-21T10:19:38.1438210Z // end inline asm 2026-02-21T10:19:38.1438267Z // begin inline asm 2026-02-21T10:19:38.1438327Z mov.u64 %rd804, 0x0; 2026-02-21T10:19:38.1438450Z createpolicy.fractional.L2::evict_first.b64 %rd804, 1.0; 2026-02-21T10:19:38.1438507Z // end inline asm 2026-02-21T10:19:38.1438565Z // begin inline asm 2026-02-21T10:19:38.1438623Z mov.u32 %r39905, 0x0; 2026-02-21T10:19:38.1438677Z mov.u32 %r39906, 0x0; 2026-02-21T10:19:38.1438733Z mov.u32 %r39907, 0x0; 2026-02-21T10:19:38.1438790Z mov.u32 %r39908, 0x0; 2026-02-21T10:19:38.1439012Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r39905, %r39906, %r39907, %r39908 }, [ %rd805 + 0 ], %rd804; 2026-02-21T10:19:38.1439070Z // end inline asm 2026-02-21T10:19:38.1439217Z // begin inline asm 2026-02-21T10:19:38.1439280Z mov.u64 %rd807, 0x0; 2026-02-21T10:19:38.1439399Z createpolicy.fractional.L2::evict_first.b64 %rd807, 1.0; 2026-02-21T10:19:38.1439457Z // end inline asm 2026-02-21T10:19:38.1439515Z // begin inline asm 2026-02-21T10:19:38.1439572Z mov.u32 %r39909, 0x0; 2026-02-21T10:19:38.1439628Z mov.u32 %r39910, 0x0; 2026-02-21T10:19:38.1439685Z mov.u32 %r39911, 0x0; 2026-02-21T10:19:38.1439741Z mov.u32 %r39912, 0x0; 2026-02-21T10:19:38.1439962Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r39909, %r39910, %r39911, %r39912 }, [ %rd808 + 0 ], %rd807; 2026-02-21T10:19:38.1440022Z // end inline asm 2026-02-21T10:19:38.1440078Z // begin inline asm 2026-02-21T10:19:38.1440134Z mov.u64 %rd810, 0x0; 
2026-02-21T10:19:38.1440251Z createpolicy.fractional.L2::evict_first.b64 %rd810, 1.0; 2026-02-21T10:19:38.1440307Z // end inline asm 2026-02-21T10:19:38.1440364Z // begin inline asm 2026-02-21T10:19:38.1440423Z mov.u32 %r39913, 0x0; 2026-02-21T10:19:38.1440484Z mov.u32 %r39914, 0x0; 2026-02-21T10:19:38.1440539Z mov.u32 %r39915, 0x0; 2026-02-21T10:19:38.1440593Z mov.u32 %r39916, 0x0; 2026-02-21T10:19:38.1440816Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r39913, %r39914, %r39915, %r39916 }, [ %rd811 + 0 ], %rd810; 2026-02-21T10:19:38.1440943Z // end inline asm 2026-02-21T10:19:38.1440999Z // begin inline asm 2026-02-21T10:19:38.1441057Z mov.u64 %rd813, 0x0; 2026-02-21T10:19:38.1441188Z createpolicy.fractional.L2::evict_first.b64 %rd813, 1.0; 2026-02-21T10:19:38.1441258Z // end inline asm 2026-02-21T10:19:38.1441320Z // begin inline asm 2026-02-21T10:19:38.1441382Z mov.u32 %r39917, 0x0; 2026-02-21T10:19:38.1441439Z mov.u32 %r39918, 0x0; 2026-02-21T10:19:38.1441495Z mov.u32 %r39919, 0x0; 2026-02-21T10:19:38.1441553Z mov.u32 %r39920, 0x0; 2026-02-21T10:19:38.1441785Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r39917, %r39918, %r39919, %r39920 }, [ %rd814 + 0 ], %rd813; 2026-02-21T10:19:38.1441846Z // end inline asm 2026-02-21T10:19:38.1441907Z // begin inline asm 2026-02-21T10:19:38.1441967Z mov.u64 %rd816, 0x0; 2026-02-21T10:19:38.1442091Z createpolicy.fractional.L2::evict_first.b64 %rd816, 1.0; 2026-02-21T10:19:38.1442149Z // end inline asm 2026-02-21T10:19:38.1442209Z // begin inline asm 2026-02-21T10:19:38.1442266Z mov.u32 %r39921, 0x0; 2026-02-21T10:19:38.1442322Z mov.u32 %r39922, 0x0; 2026-02-21T10:19:38.1442379Z mov.u32 %r39923, 0x0; 2026-02-21T10:19:38.1442437Z mov.u32 %r39924, 0x0; 2026-02-21T10:19:38.1442718Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r39921, %r39922, %r39923, %r39924 }, [ %rd817 + 0 ], %rd816; 2026-02-21T10:19:38.1442776Z // end inline asm 2026-02-21T10:19:38.1442838Z // begin inline asm 2026-02-21T10:19:38.1442894Z 
mov.u64 %rd819, 0x0; 2026-02-21T10:19:38.1443012Z createpolicy.fractional.L2::evict_first.b64 %rd819, 1.0; 2026-02-21T10:19:38.1443069Z // end inline asm 2026-02-21T10:19:38.1443127Z // begin inline asm 2026-02-21T10:19:38.1443231Z mov.u32 %r39925, 0x0; 2026-02-21T10:19:38.1443288Z mov.u32 %r39926, 0x0; 2026-02-21T10:19:38.1443348Z mov.u32 %r39927, 0x0; 2026-02-21T10:19:38.1443405Z mov.u32 %r39928, 0x0; 2026-02-21T10:19:38.1443637Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r39925, %r39926, %r39927, %r39928 }, [ %rd820 + 0 ], %rd819; 2026-02-21T10:19:38.1443701Z // end inline asm 2026-02-21T10:19:38.1443907Z .loc 1 59 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:59:32 2026-02-21T10:19:38.1443965Z bar.sync 0; 2026-02-21T10:19:38.1444053Z st.shared.v2.b32 [%r68], {%r39897, %r39898}; 2026-02-21T10:19:38.1444145Z st.shared.v2.b32 [%r68+2048], {%r39901, %r39902}; 2026-02-21T10:19:38.1444229Z st.shared.v2.b32 [%r68+4096], {%r39905, %r39906}; 2026-02-21T10:19:38.1444321Z st.shared.v2.b32 [%r68+6144], {%r39909, %r39910}; 2026-02-21T10:19:38.1444405Z st.shared.v2.b32 [%r68+8192], {%r39913, %r39914}; 2026-02-21T10:19:38.1444496Z st.shared.v2.b32 [%r68+10240], {%r39917, %r39918}; 2026-02-21T10:19:38.1444635Z st.shared.v2.b32 [%r68+12288], {%r39921, %r39922}; 2026-02-21T10:19:38.1444727Z st.shared.v2.b32 [%r68+14336], {%r39925, %r39926}; 2026-02-21T10:19:38.1444803Z st.shared.v2.b32 [%r69], {%r39899, %r39900}; 2026-02-21T10:19:38.1444887Z st.shared.v2.b32 [%r69+2048], {%r39903, %r39904}; 2026-02-21T10:19:38.1444973Z st.shared.v2.b32 [%r69+4096], {%r39907, %r39908}; 2026-02-21T10:19:38.1445055Z st.shared.v2.b32 [%r69+6144], {%r39911, %r39912}; 2026-02-21T10:19:38.1445137Z st.shared.v2.b32 [%r69+8192], {%r39915, %r39916}; 2026-02-21T10:19:38.1445222Z st.shared.v2.b32 [%r69+10240], {%r39919, %r39920}; 2026-02-21T10:19:38.1445309Z st.shared.v2.b32 [%r69+12288], {%r39923, %r39924}; 2026-02-21T10:19:38.1445404Z st.shared.v2.b32 [%r69+14336], {%r39927, 
%r39928}; 2026-02-21T10:19:38.1445462Z bar.sync 0; 2026-02-21T10:19:38.1445536Z ld.shared.b16 %rs3366, [%r70]; 2026-02-21T10:19:38.1445609Z ld.shared.b16 %rs3367, [%r70+1024]; 2026-02-21T10:19:38.1445679Z ld.shared.b16 %rs3368, [%r70+64]; 2026-02-21T10:19:38.1445749Z ld.shared.b16 %rs3369, [%r70+1088]; 2026-02-21T10:19:38.1445813Z ld.shared.b16 %rs3370, [%r70+8192]; 2026-02-21T10:19:38.1445877Z ld.shared.b16 %rs3371, [%r70+9216]; 2026-02-21T10:19:38.1445942Z ld.shared.b16 %rs3372, [%r70+8256]; 2026-02-21T10:19:38.1446010Z ld.shared.b16 %rs3373, [%r70+9280]; 2026-02-21T10:19:38.1446132Z ld.shared.b16 %rs3374, [%r71]; 2026-02-21T10:19:38.1446197Z ld.shared.b16 %rs3375, [%r71+1024]; 2026-02-21T10:19:38.1446264Z ld.shared.b16 %rs3376, [%r71+64]; 2026-02-21T10:19:38.1446328Z ld.shared.b16 %rs3377, [%r71+1088]; 2026-02-21T10:19:38.1446403Z ld.shared.b16 %rs3378, [%r71+8192]; 2026-02-21T10:19:38.1446619Z ld.shared.b16 %rs3379, [%r71+9216]; 2026-02-21T10:19:38.1446694Z ld.shared.b16 %rs3380, [%r71+8256]; 2026-02-21T10:19:38.1446758Z ld.shared.b16 %rs3381, [%r71+9280]; 2026-02-21T10:19:38.1446823Z ld.shared.b16 %rs3382, [%r72]; 2026-02-21T10:19:38.1446900Z ld.shared.b16 %rs3383, [%r72+1024]; 2026-02-21T10:19:38.1446970Z ld.shared.b16 %rs3384, [%r72+64]; 2026-02-21T10:19:38.1447038Z ld.shared.b16 %rs3385, [%r72+1088]; 2026-02-21T10:19:38.1447108Z ld.shared.b16 %rs3386, [%r72+8192]; 2026-02-21T10:19:38.1447172Z ld.shared.b16 %rs3387, [%r72+9216]; 2026-02-21T10:19:38.1447235Z ld.shared.b16 %rs3388, [%r72+8256]; 2026-02-21T10:19:38.1447302Z ld.shared.b16 %rs3389, [%r72+9280]; 2026-02-21T10:19:38.1447370Z ld.shared.b16 %rs3390, [%r73]; 2026-02-21T10:19:38.1447434Z ld.shared.b16 %rs3391, [%r73+1024]; 2026-02-21T10:19:38.1447586Z ld.shared.b16 %rs3392, [%r73+64]; 2026-02-21T10:19:38.1447660Z ld.shared.b16 %rs3393, [%r73+1088]; 2026-02-21T10:19:38.1447733Z ld.shared.b16 %rs3394, [%r73+8192]; 2026-02-21T10:19:38.1447798Z ld.shared.b16 %rs3395, [%r73+9216]; 
2026-02-21T10:19:38.1447862Z ld.shared.b16 %rs3396, [%r73+8256]; 2026-02-21T10:19:38.1447927Z ld.shared.b16 %rs3397, [%r73+9280]; 2026-02-21T10:19:38.1447991Z ld.shared.b16 %rs3398, [%r74]; 2026-02-21T10:19:38.1448148Z ld.shared.b16 %rs3399, [%r74+1024]; 2026-02-21T10:19:38.1448217Z ld.shared.b16 %rs3400, [%r74+64]; 2026-02-21T10:19:38.1448283Z ld.shared.b16 %rs3401, [%r74+1088]; 2026-02-21T10:19:38.1448349Z ld.shared.b16 %rs3402, [%r74+8192]; 2026-02-21T10:19:38.1448413Z ld.shared.b16 %rs3403, [%r74+9216]; 2026-02-21T10:19:38.1448477Z ld.shared.b16 %rs3404, [%r74+8256]; 2026-02-21T10:19:38.1448542Z ld.shared.b16 %rs3405, [%r74+9280]; 2026-02-21T10:19:38.1448604Z ld.shared.b16 %rs3406, [%r75]; 2026-02-21T10:19:38.1448672Z ld.shared.b16 %rs3407, [%r75+1024]; 2026-02-21T10:19:38.1448736Z ld.shared.b16 %rs3408, [%r75+64]; 2026-02-21T10:19:38.1448799Z ld.shared.b16 %rs3409, [%r75+1088]; 2026-02-21T10:19:38.1448863Z ld.shared.b16 %rs3410, [%r75+8192]; 2026-02-21T10:19:38.1448927Z ld.shared.b16 %rs3411, [%r75+9216]; 2026-02-21T10:19:38.1448991Z ld.shared.b16 %rs3412, [%r75+8256]; 2026-02-21T10:19:38.1449054Z ld.shared.b16 %rs3413, [%r75+9280]; 2026-02-21T10:19:38.1449119Z ld.shared.b16 %rs3414, [%r76]; 2026-02-21T10:19:38.1449253Z ld.shared.b16 %rs3415, [%r76+1024]; 2026-02-21T10:19:38.1449321Z ld.shared.b16 %rs3416, [%r76+64]; 2026-02-21T10:19:38.1449388Z ld.shared.b16 %rs3417, [%r76+1088]; 2026-02-21T10:19:38.1449451Z ld.shared.b16 %rs3418, [%r76+8192]; 2026-02-21T10:19:38.1449514Z ld.shared.b16 %rs3419, [%r76+9216]; 2026-02-21T10:19:38.1449581Z ld.shared.b16 %rs3420, [%r76+8256]; 2026-02-21T10:19:38.1449644Z ld.shared.b16 %rs3421, [%r76+9280]; 2026-02-21T10:19:38.1449708Z ld.shared.b16 %rs3422, [%r77]; 2026-02-21T10:19:38.1449772Z ld.shared.b16 %rs3423, [%r77+1024]; 2026-02-21T10:19:38.1449836Z ld.shared.b16 %rs3424, [%r77+64]; 2026-02-21T10:19:38.1449900Z ld.shared.b16 %rs3425, [%r77+1088]; 2026-02-21T10:19:38.1449963Z ld.shared.b16 %rs3426, [%r77+8192]; 
2026-02-21T10:19:38.1450028Z ld.shared.b16 %rs3427, [%r77+9216]; 2026-02-21T10:19:38.1450091Z ld.shared.b16 %rs3428, [%r77+8256]; 2026-02-21T10:19:38.1450155Z ld.shared.b16 %rs3429, [%r77+9280]; 2026-02-21T10:19:38.1450219Z cvt.f32.bf16 %r40066, %rs3366; 2026-02-21T10:19:38.1450284Z cvt.f32.bf16 %r40067, %rs3367; 2026-02-21T10:19:38.1450345Z cvt.f32.bf16 %r40068, %rs3374; 2026-02-21T10:19:38.1450405Z cvt.f32.bf16 %r40069, %rs3375; 2026-02-21T10:19:38.1450467Z cvt.f32.bf16 %r40198, %rs3382; 2026-02-21T10:19:38.1450527Z cvt.f32.bf16 %r40199, %rs3383; 2026-02-21T10:19:38.1450656Z cvt.f32.bf16 %r40200, %rs3390; 2026-02-21T10:19:38.1450716Z cvt.f32.bf16 %r40201, %rs3391; 2026-02-21T10:19:38.1450779Z cvt.f32.bf16 %r40330, %rs3398; 2026-02-21T10:19:38.1450841Z cvt.f32.bf16 %r40331, %rs3399; 2026-02-21T10:19:38.1450901Z cvt.f32.bf16 %r40332, %rs3406; 2026-02-21T10:19:38.1450962Z cvt.f32.bf16 %r40333, %rs3407; 2026-02-21T10:19:38.1451020Z cvt.f32.bf16 %r40462, %rs3414; 2026-02-21T10:19:38.1451079Z cvt.f32.bf16 %r40463, %rs3415; 2026-02-21T10:19:38.1451140Z cvt.f32.bf16 %r40464, %rs3422; 2026-02-21T10:19:38.1451201Z cvt.f32.bf16 %r40465, %rs3423; 2026-02-21T10:19:38.1451260Z cvt.f32.bf16 %r40594, %rs3368; 2026-02-21T10:19:38.1451323Z cvt.f32.bf16 %r40595, %rs3369; 2026-02-21T10:19:38.1451384Z cvt.f32.bf16 %r40596, %rs3376; 2026-02-21T10:19:38.1451444Z cvt.f32.bf16 %r40597, %rs3377; 2026-02-21T10:19:38.1451515Z cvt.f32.bf16 %r40726, %rs3384; 2026-02-21T10:19:38.1451581Z cvt.f32.bf16 %r40727, %rs3385; 2026-02-21T10:19:38.1451643Z cvt.f32.bf16 %r40728, %rs3392; 2026-02-21T10:19:38.1451701Z cvt.f32.bf16 %r40729, %rs3393; 2026-02-21T10:19:38.1451761Z cvt.f32.bf16 %r40858, %rs3400; 2026-02-21T10:19:38.1451874Z cvt.f32.bf16 %r40859, %rs3401; 2026-02-21T10:19:38.1451937Z cvt.f32.bf16 %r40860, %rs3408; 2026-02-21T10:19:38.1451996Z cvt.f32.bf16 %r40861, %rs3409; 2026-02-21T10:19:38.1452057Z cvt.f32.bf16 %r40990, %rs3416; 2026-02-21T10:19:38.1452116Z cvt.f32.bf16 %r40991, %rs3417; 
2026-02-21T10:19:38.1452175Z cvt.f32.bf16 %r40992, %rs3424; 2026-02-21T10:19:38.1452236Z cvt.f32.bf16 %r40993, %rs3425; 2026-02-21T10:19:38.1452297Z cvt.f32.bf16 %r41122, %rs3370; 2026-02-21T10:19:38.1452405Z cvt.f32.bf16 %r41123, %rs3371; 2026-02-21T10:19:38.1452465Z cvt.f32.bf16 %r41124, %rs3378; 2026-02-21T10:19:38.1452527Z cvt.f32.bf16 %r41125, %rs3379; 2026-02-21T10:19:38.1452586Z cvt.f32.bf16 %r41254, %rs3386; 2026-02-21T10:19:38.1452645Z cvt.f32.bf16 %r41255, %rs3387; 2026-02-21T10:19:38.1452707Z cvt.f32.bf16 %r41256, %rs3394; 2026-02-21T10:19:38.1452769Z cvt.f32.bf16 %r41257, %rs3395; 2026-02-21T10:19:38.1452827Z cvt.f32.bf16 %r41386, %rs3402; 2026-02-21T10:19:38.1452888Z cvt.f32.bf16 %r41387, %rs3403; 2026-02-21T10:19:38.1452951Z cvt.f32.bf16 %r41388, %rs3410; 2026-02-21T10:19:38.1453011Z cvt.f32.bf16 %r41389, %rs3411; 2026-02-21T10:19:38.1453072Z cvt.f32.bf16 %r41518, %rs3418; 2026-02-21T10:19:38.1453134Z cvt.f32.bf16 %r41519, %rs3419; 2026-02-21T10:19:38.1453198Z cvt.f32.bf16 %r41520, %rs3426; 2026-02-21T10:19:38.1453257Z cvt.f32.bf16 %r41521, %rs3427; 2026-02-21T10:19:38.1453316Z cvt.f32.bf16 %r41650, %rs3372; 2026-02-21T10:19:38.1453379Z cvt.f32.bf16 %r41651, %rs3373; 2026-02-21T10:19:38.1453490Z cvt.f32.bf16 %r41652, %rs3380; 2026-02-21T10:19:38.1453553Z cvt.f32.bf16 %r41653, %rs3381; 2026-02-21T10:19:38.1453619Z cvt.f32.bf16 %r41782, %rs3388; 2026-02-21T10:19:38.1453678Z cvt.f32.bf16 %r41783, %rs3389; 2026-02-21T10:19:38.1453737Z cvt.f32.bf16 %r41784, %rs3396; 2026-02-21T10:19:38.1453797Z cvt.f32.bf16 %r41785, %rs3397; 2026-02-21T10:19:38.1453860Z cvt.f32.bf16 %r41914, %rs3404; 2026-02-21T10:19:38.1453920Z cvt.f32.bf16 %r41915, %rs3405; 2026-02-21T10:19:38.1453979Z cvt.f32.bf16 %r41916, %rs3412; 2026-02-21T10:19:38.1454042Z cvt.f32.bf16 %r41917, %rs3413; 2026-02-21T10:19:38.1454114Z cvt.f32.bf16 %r42046, %rs3420; 2026-02-21T10:19:38.1454176Z cvt.f32.bf16 %r42047, %rs3421; 2026-02-21T10:19:38.1454238Z cvt.f32.bf16 %r42048, %rs3428; 
2026-02-21T10:19:38.1454299Z cvt.f32.bf16 %r42049, %rs3429; 2026-02-21T10:19:38.1454507Z .loc 1 61 33 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:61:33 2026-02-21T10:19:38.1454565Z bar.sync 0; 2026-02-21T10:19:38.1454628Z // begin inline asm 2026-02-21T10:19:38.1454728Z @%p313 mbarrier.init.shared::cta.b64 [%r39929], 1; 2026-02-21T10:19:38.1454784Z // end inline asm 2026-02-21T10:19:38.1454842Z bar.sync 0; 2026-02-21T10:19:38.1454898Z // begin inline asm 2026-02-21T10:19:38.1455033Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r39929], 4096; 2026-02-21T10:19:38.1455146Z // end inline asm 2026-02-21T10:19:38.1455205Z // begin inline asm 2026-02-21T10:19:38.1455281Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.1455338Z // end inline asm 2026-02-21T10:19:38.1455393Z bar.sync 0; 2026-02-21T10:19:38.1455460Z elect.sync %r42312|%p399, -1; 2026-02-21T10:19:38.1455527Z and.pred %p380, %p1, %p399; 2026-02-21T10:19:38.1455588Z add.s64 %rd855, %rd855, 32; 2026-02-21T10:19:38.1455652Z cvt.u32.u64 %r39933, %rd855; 2026-02-21T10:19:38.1455710Z // begin inline asm 2026-02-21T10:19:38.1456063Z @%p380 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39931], [%rd822, {%r39932, %r39933}], [%r39929]; 2026-02-21T10:19:38.1456125Z // end inline asm 2026-02-21T10:19:38.1456183Z bar.sync 0; 2026-02-21T10:19:38.1456240Z mov.b32 %r42180, 0; 2026-02-21T10:19:38.1456300Z // begin inline asm 2026-02-21T10:19:38.1456351Z 2026-02-21T10:19:38.1456401Z { 2026-02-21T10:19:38.1456617Z .reg .pred complete; 2026-02-21T10:19:38.1456684Z waitLoop: 2026-02-21T10:19:38.1456842Z mbarrier.try_wait.parity.shared.b64 complete, [%r39929], %r42180; 2026-02-21T10:19:38.1456993Z @!complete bra.uni waitLoop; 2026-02-21T10:19:38.1457046Z } 2026-02-21T10:19:38.1457053Z 2026-02-21T10:19:38.1457109Z // end inline asm 2026-02-21T10:19:38.1457164Z bar.sync 0; 2026-02-21T10:19:38.1457221Z // begin inline asm 2026-02-21T10:19:38.1457320Z @%p313 
mbarrier.inval.shared::cta.b64 [%r39929]; 2026-02-21T10:19:38.1457376Z // end inline asm 2026-02-21T10:19:38.1457575Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1457714Z ld.shared.s8 %rs3430, [%r78]; 2026-02-21T10:19:38.1457907Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1457971Z shl.b16 %rs3431, %rs3430, 4; 2026-02-21T10:19:38.1458164Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1458234Z ld.shared.s8 %rs3432, [%r79+128]; 2026-02-21T10:19:38.1458425Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1458489Z shl.b16 %rs3433, %rs3432, 4; 2026-02-21T10:19:38.1458678Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1458743Z ld.shared.s8 %rs3434, [%r80+256]; 2026-02-21T10:19:38.1458932Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1458997Z shl.b16 %rs3435, %rs3434, 4; 2026-02-21T10:19:38.1459254Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1459321Z ld.shared.s8 %rs3436, [%r81+384]; 2026-02-21T10:19:38.1459512Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1459589Z shl.b16 %rs3437, %rs3436, 4; 2026-02-21T10:19:38.1459782Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1459850Z ld.shared.s8 %rs3438, [%r82+512]; 2026-02-21T10:19:38.1460042Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1460103Z shl.b16 %rs3439, %rs3438, 4; 2026-02-21T10:19:38.1460294Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1460358Z ld.shared.s8 %rs3440, [%r83+640]; 2026-02-21T10:19:38.1460549Z .loc 1 64 28 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1460613Z shl.b16 %rs3441, %rs3440, 4; 2026-02-21T10:19:38.1460803Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1460866Z ld.shared.s8 %rs3442, [%r84+768]; 2026-02-21T10:19:38.1461144Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1461207Z shl.b16 %rs3443, %rs3442, 4; 2026-02-21T10:19:38.1461400Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1461466Z ld.shared.s8 %rs3444, [%r85+896]; 2026-02-21T10:19:38.1461663Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1461724Z shl.b16 %rs3445, %rs3444, 4; 2026-02-21T10:19:38.1461916Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1461987Z ld.shared.s8 %rs3446, [%r78+1024]; 2026-02-21T10:19:38.1462176Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1462237Z shl.b16 %rs3447, %rs3446, 4; 2026-02-21T10:19:38.1462425Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1462496Z ld.shared.s8 %rs3448, [%r79+1152]; 2026-02-21T10:19:38.1462741Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1462807Z shl.b16 %rs3449, %rs3448, 4; 2026-02-21T10:19:38.1463000Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1463062Z ld.shared.s8 %rs3450, [%r80+1280]; 2026-02-21T10:19:38.1463251Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1463364Z shl.b16 %rs3451, %rs3450, 4; 2026-02-21T10:19:38.1463553Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1466000Z ld.shared.s8 %rs3452, [%r81+1408]; 2026-02-21T10:19:38.1466269Z 
.loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1466343Z shl.b16 %rs3453, %rs3452, 4; 2026-02-21T10:19:38.1466716Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1466793Z ld.shared.s8 %rs3454, [%r82+1536]; 2026-02-21T10:19:38.1467016Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1467085Z shl.b16 %rs3455, %rs3454, 4; 2026-02-21T10:19:38.1467282Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1467349Z ld.shared.s8 %rs3456, [%r83+1664]; 2026-02-21T10:19:38.1467655Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1467726Z shl.b16 %rs3457, %rs3456, 4; 2026-02-21T10:19:38.1467914Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1467980Z ld.shared.s8 %rs3458, [%r84+1792]; 2026-02-21T10:19:38.1468173Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1468234Z shl.b16 %rs3459, %rs3458, 4; 2026-02-21T10:19:38.1468498Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1468568Z ld.shared.s8 %rs3460, [%r85+1920]; 2026-02-21T10:19:38.1468756Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1468817Z shl.b16 %rs3461, %rs3460, 4; 2026-02-21T10:19:38.1469019Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1469087Z ld.shared.s8 %rs3462, [%r78+2048]; 2026-02-21T10:19:38.1469280Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1469342Z shl.b16 %rs3463, %rs3462, 4; 2026-02-21T10:19:38.1469610Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1469675Z ld.shared.s8 %rs3464, [%r79+2176]; 
2026-02-21T10:19:38.1469860Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1469925Z shl.b16 %rs3465, %rs3464, 4; 2026-02-21T10:19:38.1470112Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1470174Z ld.shared.s8 %rs3466, [%r80+2304]; 2026-02-21T10:19:38.1470365Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1470430Z shl.b16 %rs3467, %rs3466, 4; 2026-02-21T10:19:38.1470626Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1470705Z ld.shared.s8 %rs3468, [%r81+2432]; 2026-02-21T10:19:38.1470898Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1470961Z shl.b16 %rs3469, %rs3468, 4; 2026-02-21T10:19:38.1471240Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1471308Z ld.shared.s8 %rs3470, [%r82+2560]; 2026-02-21T10:19:38.1471495Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1471555Z shl.b16 %rs3471, %rs3470, 4; 2026-02-21T10:19:38.1471744Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1471873Z ld.shared.s8 %rs3472, [%r83+2688]; 2026-02-21T10:19:38.1472067Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1472143Z shl.b16 %rs3473, %rs3472, 4; 2026-02-21T10:19:38.1472342Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1472412Z ld.shared.s8 %rs3474, [%r84+2816]; 2026-02-21T10:19:38.1472608Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1472670Z shl.b16 %rs3475, %rs3474, 4; 2026-02-21T10:19:38.1472861Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1472929Z ld.shared.s8 
%rs3476, [%r85+2944]; 2026-02-21T10:19:38.1473122Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1473184Z shl.b16 %rs3477, %rs3476, 4; 2026-02-21T10:19:38.1473437Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1473507Z ld.shared.s8 %rs3478, [%r78+3072]; 2026-02-21T10:19:38.1473694Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1473755Z shl.b16 %rs3479, %rs3478, 4; 2026-02-21T10:19:38.1473942Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1474006Z ld.shared.s8 %rs3480, [%r79+3200]; 2026-02-21T10:19:38.1474192Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1474254Z shl.b16 %rs3481, %rs3480, 4; 2026-02-21T10:19:38.1474454Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1474518Z ld.shared.s8 %rs3482, [%r80+3328]; 2026-02-21T10:19:38.1474719Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1474778Z shl.b16 %rs3483, %rs3482, 4; 2026-02-21T10:19:38.1474966Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1475030Z ld.shared.s8 %rs3484, [%r81+3456]; 2026-02-21T10:19:38.1475273Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1475333Z shl.b16 %rs3485, %rs3484, 4; 2026-02-21T10:19:38.1475519Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1475585Z ld.shared.s8 %rs3486, [%r82+3584]; 2026-02-21T10:19:38.1475771Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1475831Z shl.b16 %rs3487, %rs3486, 4; 2026-02-21T10:19:38.1476022Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 
2026-02-21T10:19:38.1476088Z ld.shared.s8 %rs3488, [%r83+3712]; 2026-02-21T10:19:38.1476276Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1476337Z shl.b16 %rs3489, %rs3488, 4; 2026-02-21T10:19:38.1476659Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1476731Z ld.shared.s8 %rs3490, [%r84+3840]; 2026-02-21T10:19:38.1476990Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1477055Z shl.b16 %rs3491, %rs3490, 4; 2026-02-21T10:19:38.1477258Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1477321Z ld.shared.s8 %rs3492, [%r85+3968]; 2026-02-21T10:19:38.1477516Z .loc 1 64 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:64:28 2026-02-21T10:19:38.1477644Z shl.b16 %rs3493, %rs3492, 4; 2026-02-21T10:19:38.1477843Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1477908Z cvt.s16.s8 %rs3494, %rs3431; 2026-02-21T10:19:38.1477967Z shr.s16 %rs3495, %rs3494, 4; 2026-02-21T10:19:38.1478026Z cvt.s16.s8 %rs3496, %rs3433; 2026-02-21T10:19:38.1478090Z shr.s16 %rs3497, %rs3496, 4; 2026-02-21T10:19:38.1478148Z shr.s16 %rs3498, %rs3430, 4; 2026-02-21T10:19:38.1478206Z shr.s16 %rs3499, %rs3432, 4; 2026-02-21T10:19:38.1478400Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1478470Z cvt.rn.f32.s16 %r42313, %rs3499; 2026-02-21T10:19:38.1478531Z cvt.rn.f32.s16 %r42314, %rs3498; 2026-02-21T10:19:38.1478592Z cvt.rn.f32.s16 %r42315, %rs3497; 2026-02-21T10:19:38.1478654Z cvt.rn.f32.s16 %r42316, %rs3495; 2026-02-21T10:19:38.1478923Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1478990Z cvt.s16.s8 %rs3500, %rs3435; 2026-02-21T10:19:38.1479050Z shr.s16 %rs3501, %rs3500, 4; 2026-02-21T10:19:38.1479113Z cvt.s16.s8 %rs3502, %rs3437; 
2026-02-21T10:19:38.1479174Z shr.s16 %rs3503, %rs3502, 4; 2026-02-21T10:19:38.1479232Z shr.s16 %rs3504, %rs3434, 4; 2026-02-21T10:19:38.1479296Z shr.s16 %rs3505, %rs3436, 4; 2026-02-21T10:19:38.1479490Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1479554Z cvt.rn.f32.s16 %r42317, %rs3505; 2026-02-21T10:19:38.1479618Z cvt.rn.f32.s16 %r42318, %rs3504; 2026-02-21T10:19:38.1479678Z cvt.rn.f32.s16 %r42319, %rs3503; 2026-02-21T10:19:38.1479738Z cvt.rn.f32.s16 %r42320, %rs3501; 2026-02-21T10:19:38.1479929Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1479992Z cvt.s16.s8 %rs3506, %rs3439; 2026-02-21T10:19:38.1480053Z shr.s16 %rs3507, %rs3506, 4; 2026-02-21T10:19:38.1480124Z cvt.s16.s8 %rs3508, %rs3441; 2026-02-21T10:19:38.1480189Z shr.s16 %rs3509, %rs3508, 4; 2026-02-21T10:19:38.1480250Z shr.s16 %rs3510, %rs3438, 4; 2026-02-21T10:19:38.1480310Z shr.s16 %rs3511, %rs3440, 4; 2026-02-21T10:19:38.1480503Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1480637Z cvt.rn.f32.s16 %r42321, %rs3511; 2026-02-21T10:19:38.1480698Z cvt.rn.f32.s16 %r42322, %rs3510; 2026-02-21T10:19:38.1480760Z cvt.rn.f32.s16 %r42323, %rs3509; 2026-02-21T10:19:38.1480823Z cvt.rn.f32.s16 %r42324, %rs3507; 2026-02-21T10:19:38.1481014Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1481074Z cvt.s16.s8 %rs3512, %rs3443; 2026-02-21T10:19:38.1481134Z shr.s16 %rs3513, %rs3512, 4; 2026-02-21T10:19:38.1481194Z cvt.s16.s8 %rs3514, %rs3445; 2026-02-21T10:19:38.1481252Z shr.s16 %rs3515, %rs3514, 4; 2026-02-21T10:19:38.1481316Z shr.s16 %rs3516, %rs3442, 4; 2026-02-21T10:19:38.1481378Z shr.s16 %rs3517, %rs3444, 4; 2026-02-21T10:19:38.1481567Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1481627Z cvt.rn.f32.s16 %r42325, %rs3517; 2026-02-21T10:19:38.1481693Z 
cvt.rn.f32.s16 %r42326, %rs3516; 2026-02-21T10:19:38.1481756Z cvt.rn.f32.s16 %r42327, %rs3515; 2026-02-21T10:19:38.1481815Z cvt.rn.f32.s16 %r42328, %rs3513; 2026-02-21T10:19:38.1482063Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1482127Z cvt.s16.s8 %rs3518, %rs3447; 2026-02-21T10:19:38.1482186Z shr.s16 %rs3519, %rs3518, 4; 2026-02-21T10:19:38.1482246Z cvt.s16.s8 %rs3520, %rs3449; 2026-02-21T10:19:38.1482307Z shr.s16 %rs3521, %rs3520, 4; 2026-02-21T10:19:38.1482366Z shr.s16 %rs3522, %rs3446, 4; 2026-02-21T10:19:38.1482426Z shr.s16 %rs3523, %rs3448, 4; 2026-02-21T10:19:38.1482667Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1482729Z cvt.rn.f32.s16 %r42329, %rs3523; 2026-02-21T10:19:38.1482789Z cvt.rn.f32.s16 %r42330, %rs3522; 2026-02-21T10:19:38.1482852Z cvt.rn.f32.s16 %r42331, %rs3521; 2026-02-21T10:19:38.1482911Z cvt.rn.f32.s16 %r42332, %rs3519; 2026-02-21T10:19:38.1483099Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1483160Z cvt.s16.s8 %rs3524, %rs3451; 2026-02-21T10:19:38.1483222Z shr.s16 %rs3525, %rs3524, 4; 2026-02-21T10:19:38.1483279Z cvt.s16.s8 %rs3526, %rs3453; 2026-02-21T10:19:38.1483338Z shr.s16 %rs3527, %rs3526, 4; 2026-02-21T10:19:38.1483398Z shr.s16 %rs3528, %rs3450, 4; 2026-02-21T10:19:38.1483457Z shr.s16 %rs3529, %rs3452, 4; 2026-02-21T10:19:38.1483646Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1483714Z cvt.rn.f32.s16 %r42333, %rs3529; 2026-02-21T10:19:38.1483831Z cvt.rn.f32.s16 %r42334, %rs3528; 2026-02-21T10:19:38.1483895Z cvt.rn.f32.s16 %r42335, %rs3527; 2026-02-21T10:19:38.1483966Z cvt.rn.f32.s16 %r42336, %rs3525; 2026-02-21T10:19:38.1484170Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1484234Z cvt.s16.s8 %rs3530, %rs3455; 2026-02-21T10:19:38.1484293Z shr.s16 %rs3531, %rs3530, 4; 
2026-02-21T10:19:38.1484362Z cvt.s16.s8 %rs3532, %rs3457; 2026-02-21T10:19:38.1484422Z shr.s16 %rs3533, %rs3532, 4; 2026-02-21T10:19:38.1484481Z shr.s16 %rs3534, %rs3454, 4; 2026-02-21T10:19:38.1484541Z shr.s16 %rs3535, %rs3456, 4; 2026-02-21T10:19:38.1484738Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1484802Z cvt.rn.f32.s16 %r42337, %rs3535; 2026-02-21T10:19:38.1484865Z cvt.rn.f32.s16 %r42338, %rs3534; 2026-02-21T10:19:38.1484927Z cvt.rn.f32.s16 %r42339, %rs3533; 2026-02-21T10:19:38.1484989Z cvt.rn.f32.s16 %r42340, %rs3531; 2026-02-21T10:19:38.1485180Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1485242Z cvt.s16.s8 %rs3536, %rs3459; 2026-02-21T10:19:38.1485301Z shr.s16 %rs3537, %rs3536, 4; 2026-02-21T10:19:38.1485428Z cvt.s16.s8 %rs3538, %rs3461; 2026-02-21T10:19:38.1485488Z shr.s16 %rs3539, %rs3538, 4; 2026-02-21T10:19:38.1485549Z shr.s16 %rs3540, %rs3458, 4; 2026-02-21T10:19:38.1485609Z shr.s16 %rs3541, %rs3460, 4; 2026-02-21T10:19:38.1485799Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1485863Z cvt.rn.f32.s16 %r42341, %rs3541; 2026-02-21T10:19:38.1485925Z cvt.rn.f32.s16 %r42342, %rs3540; 2026-02-21T10:19:38.1485985Z cvt.rn.f32.s16 %r42343, %rs3539; 2026-02-21T10:19:38.1486047Z cvt.rn.f32.s16 %r42344, %rs3537; 2026-02-21T10:19:38.1486235Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1486307Z cvt.s16.s8 %rs3542, %rs3463; 2026-02-21T10:19:38.1486369Z shr.s16 %rs3543, %rs3542, 4; 2026-02-21T10:19:38.1486431Z cvt.s16.s8 %rs3544, %rs3465; 2026-02-21T10:19:38.1486621Z shr.s16 %rs3545, %rs3544, 4; 2026-02-21T10:19:38.1486685Z shr.s16 %rs3546, %rs3462, 4; 2026-02-21T10:19:38.1486750Z shr.s16 %rs3547, %rs3464, 4; 2026-02-21T10:19:38.1487023Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1487092Z 
cvt.rn.f32.s16 %r42345, %rs3547; 2026-02-21T10:19:38.1487155Z cvt.rn.f32.s16 %r42346, %rs3546; 2026-02-21T10:19:38.1487215Z cvt.rn.f32.s16 %r42347, %rs3545; 2026-02-21T10:19:38.1487275Z cvt.rn.f32.s16 %r42348, %rs3543; 2026-02-21T10:19:38.1487461Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1487523Z cvt.s16.s8 %rs3548, %rs3467; 2026-02-21T10:19:38.1487650Z shr.s16 %rs3549, %rs3548, 4; 2026-02-21T10:19:38.1487709Z cvt.s16.s8 %rs3550, %rs3469; 2026-02-21T10:19:38.1487778Z shr.s16 %rs3551, %rs3550, 4; 2026-02-21T10:19:38.1487844Z shr.s16 %rs3552, %rs3466, 4; 2026-02-21T10:19:38.1487903Z shr.s16 %rs3553, %rs3468, 4; 2026-02-21T10:19:38.1488091Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1488158Z cvt.rn.f32.s16 %r42349, %rs3553; 2026-02-21T10:19:38.1488218Z cvt.rn.f32.s16 %r42350, %rs3552; 2026-02-21T10:19:38.1488279Z cvt.rn.f32.s16 %r42351, %rs3551; 2026-02-21T10:19:38.1488343Z cvt.rn.f32.s16 %r42352, %rs3549; 2026-02-21T10:19:38.1488530Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1488589Z cvt.s16.s8 %rs3554, %rs3471; 2026-02-21T10:19:38.1488650Z shr.s16 %rs3555, %rs3554, 4; 2026-02-21T10:19:38.1488708Z cvt.s16.s8 %rs3556, %rs3473; 2026-02-21T10:19:38.1488771Z shr.s16 %rs3557, %rs3556, 4; 2026-02-21T10:19:38.1488905Z shr.s16 %rs3558, %rs3470, 4; 2026-02-21T10:19:38.1488972Z shr.s16 %rs3559, %rs3472, 4; 2026-02-21T10:19:38.1489161Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1489223Z cvt.rn.f32.s16 %r42353, %rs3559; 2026-02-21T10:19:38.1489291Z cvt.rn.f32.s16 %r42354, %rs3558; 2026-02-21T10:19:38.1489351Z cvt.rn.f32.s16 %r42355, %rs3557; 2026-02-21T10:19:38.1489411Z cvt.rn.f32.s16 %r42356, %rs3555; 2026-02-21T10:19:38.1489600Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1489659Z cvt.s16.s8 %rs3560, 
%rs3475; 2026-02-21T10:19:38.1489718Z shr.s16 %rs3561, %rs3560, 4; 2026-02-21T10:19:38.1489776Z cvt.s16.s8 %rs3562, %rs3477; 2026-02-21T10:19:38.1489837Z shr.s16 %rs3563, %rs3562, 4; 2026-02-21T10:19:38.1489896Z shr.s16 %rs3564, %rs3474, 4; 2026-02-21T10:19:38.1489954Z shr.s16 %rs3565, %rs3476, 4; 2026-02-21T10:19:38.1490149Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1490211Z cvt.rn.f32.s16 %r42357, %rs3565; 2026-02-21T10:19:38.1490271Z cvt.rn.f32.s16 %r42358, %rs3564; 2026-02-21T10:19:38.1490334Z cvt.rn.f32.s16 %r42359, %rs3563; 2026-02-21T10:19:38.1490395Z cvt.rn.f32.s16 %r42360, %rs3561; 2026-02-21T10:19:38.1490670Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1490742Z cvt.s16.s8 %rs3566, %rs3479; 2026-02-21T10:19:38.1490807Z shr.s16 %rs3567, %rs3566, 4; 2026-02-21T10:19:38.1490866Z cvt.s16.s8 %rs3568, %rs3481; 2026-02-21T10:19:38.1490923Z shr.s16 %rs3569, %rs3568, 4; 2026-02-21T10:19:38.1490984Z shr.s16 %rs3570, %rs3478, 4; 2026-02-21T10:19:38.1491045Z shr.s16 %rs3571, %rs3480, 4; 2026-02-21T10:19:38.1491232Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1491297Z cvt.rn.f32.s16 %r42361, %rs3571; 2026-02-21T10:19:38.1491358Z cvt.rn.f32.s16 %r42362, %rs3570; 2026-02-21T10:19:38.1491418Z cvt.rn.f32.s16 %r42363, %rs3569; 2026-02-21T10:19:38.1491477Z cvt.rn.f32.s16 %r42364, %rs3567; 2026-02-21T10:19:38.1491666Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1491727Z cvt.s16.s8 %rs3572, %rs3483; 2026-02-21T10:19:38.1491785Z shr.s16 %rs3573, %rs3572, 4; 2026-02-21T10:19:38.1491847Z cvt.s16.s8 %rs3574, %rs3485; 2026-02-21T10:19:38.1491956Z shr.s16 %rs3575, %rs3574, 4; 2026-02-21T10:19:38.1492016Z shr.s16 %rs3576, %rs3482, 4; 2026-02-21T10:19:38.1492075Z shr.s16 %rs3577, %rs3484, 4; 2026-02-21T10:19:38.1492270Z .loc 1 84 32 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1492333Z cvt.rn.f32.s16 %r42365, %rs3577; 2026-02-21T10:19:38.1492395Z cvt.rn.f32.s16 %r42366, %rs3576; 2026-02-21T10:19:38.1492504Z cvt.rn.f32.s16 %r42367, %rs3575; 2026-02-21T10:19:38.1492565Z cvt.rn.f32.s16 %r42368, %rs3573; 2026-02-21T10:19:38.1492754Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1492816Z cvt.s16.s8 %rs3578, %rs3487; 2026-02-21T10:19:38.1492875Z shr.s16 %rs3579, %rs3578, 4; 2026-02-21T10:19:38.1492935Z cvt.s16.s8 %rs3580, %rs3489; 2026-02-21T10:19:38.1492995Z shr.s16 %rs3581, %rs3580, 4; 2026-02-21T10:19:38.1493057Z shr.s16 %rs3582, %rs3486, 4; 2026-02-21T10:19:38.1493116Z shr.s16 %rs3583, %rs3488, 4; 2026-02-21T10:19:38.1493302Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1493366Z cvt.rn.f32.s16 %r42369, %rs3583; 2026-02-21T10:19:38.1493426Z cvt.rn.f32.s16 %r42370, %rs3582; 2026-02-21T10:19:38.1493487Z cvt.rn.f32.s16 %r42371, %rs3581; 2026-02-21T10:19:38.1493549Z cvt.rn.f32.s16 %r42372, %rs3579; 2026-02-21T10:19:38.1493788Z .loc 1 66 25 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:66:25 2026-02-21T10:19:38.1493850Z cvt.s16.s8 %rs3584, %rs3491; 2026-02-21T10:19:38.1493908Z shr.s16 %rs3585, %rs3584, 4; 2026-02-21T10:19:38.1493969Z cvt.s16.s8 %rs3586, %rs3493; 2026-02-21T10:19:38.1494027Z shr.s16 %rs3587, %rs3586, 4; 2026-02-21T10:19:38.1494084Z shr.s16 %rs3588, %rs3490, 4; 2026-02-21T10:19:38.1494146Z shr.s16 %rs3589, %rs3492, 4; 2026-02-21T10:19:38.1494337Z .loc 1 84 32 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:84:32 2026-02-21T10:19:38.1494398Z cvt.rn.f32.s16 %r42373, %rs3589; 2026-02-21T10:19:38.1494461Z cvt.rn.f32.s16 %r42374, %rs3588; 2026-02-21T10:19:38.1494522Z cvt.rn.f32.s16 %r42375, %rs3587; 2026-02-21T10:19:38.1494587Z cvt.rn.f32.s16 %r42376, %rs3585; 2026-02-21T10:19:38.1494645Z bar.sync 0; 2026-02-21T10:19:38.1494782Z 
st.shared.v4.b32 [%r86], {%r42316, %r42314, %r42315, %r42313}; 2026-02-21T10:19:38.1494910Z st.shared.v4.b32 [%r86+16384], {%r42348, %r42346, %r42347, %r42345}; 2026-02-21T10:19:38.1495027Z st.shared.v4.b32 [%r87], {%r42320, %r42318, %r42319, %r42317}; 2026-02-21T10:19:38.1495147Z st.shared.v4.b32 [%r87+16384], {%r42352, %r42350, %r42351, %r42349}; 2026-02-21T10:19:38.1495254Z st.shared.v4.b32 [%r88], {%r42324, %r42322, %r42323, %r42321}; 2026-02-21T10:19:38.1495426Z st.shared.v4.b32 [%r88+16384], {%r42356, %r42354, %r42355, %r42353}; 2026-02-21T10:19:38.1495536Z st.shared.v4.b32 [%r89], {%r42328, %r42326, %r42327, %r42325}; 2026-02-21T10:19:38.1495652Z st.shared.v4.b32 [%r89+16384], {%r42360, %r42358, %r42359, %r42357}; 2026-02-21T10:19:38.1495756Z st.shared.v4.b32 [%r90], {%r42332, %r42330, %r42331, %r42329}; 2026-02-21T10:19:38.1495871Z st.shared.v4.b32 [%r90+16384], {%r42364, %r42362, %r42363, %r42361}; 2026-02-21T10:19:38.1495978Z st.shared.v4.b32 [%r91], {%r42336, %r42334, %r42335, %r42333}; 2026-02-21T10:19:38.1496091Z st.shared.v4.b32 [%r91+16384], {%r42368, %r42366, %r42367, %r42365}; 2026-02-21T10:19:38.1496200Z st.shared.v4.b32 [%r92], {%r42340, %r42338, %r42339, %r42337}; 2026-02-21T10:19:38.1496316Z st.shared.v4.b32 [%r92+16384], {%r42372, %r42370, %r42371, %r42369}; 2026-02-21T10:19:38.1496421Z st.shared.v4.b32 [%r93], {%r42344, %r42342, %r42343, %r42341}; 2026-02-21T10:19:38.1496664Z st.shared.v4.b32 [%r93+16384], {%r42376, %r42374, %r42375, %r42373}; 2026-02-21T10:19:38.1496727Z $L__tmp31: 2026-02-21T10:19:38.1497088Z .loc 2 291 36 // standard.py:291:36 @[ cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:91:40 ] 2026-02-21T10:19:38.1497153Z // begin inline asm 2026-02-21T10:19:38.1497240Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.1497297Z // end inline asm 2026-02-21T10:19:38.1497352Z bar.sync 0; 2026-02-21T10:19:38.1497436Z shfl.sync.idx.b32 %r42377, %r4, 0, 31, -1; 2026-02-21T10:19:38.1497511Z wgmma.fence.sync.aligned; 
2026-02-21T10:19:38.1497577Z mov.pred %p382, -1; 2026-02-21T10:19:38.1497636Z // begin inline asm 2026-02-21T10:19:38.1499183Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r40066,%r40067,%r40068,%r40069}, %rd12, %p382, 1, 1; 2026-02-21T10:19:38.1499242Z // end inline asm 2026-02-21T10:19:38.1499302Z // begin inline asm 2026-02-21T10:19:38.1500819Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r40198,%r40199,%r40200,%r40201}, %rd13, %p382, 1, 1; 2026-02-21T10:19:38.1500880Z // end inline asm 2026-02-21T10:19:38.1500954Z // begin inline asm 2026-02-21T10:19:38.1502412Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r40330,%r40331,%r40332,%r40333}, %rd14, %p382, 1, 1; 2026-02-21T10:19:38.1502472Z // end inline asm 2026-02-21T10:19:38.1502529Z // begin inline asm 2026-02-21T10:19:38.1503982Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r40462,%r40463,%r40464,%r40465}, %rd15, %p382, 1, 1; 2026-02-21T10:19:38.1504102Z // end inline asm 2026-02-21T10:19:38.1504160Z // begin inline asm 2026-02-21T10:19:38.1505665Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r40594,%r40595,%r40596,%r40597}, %rd16, %p382, 1, 1; 2026-02-21T10:19:38.1505730Z // end inline asm 2026-02-21T10:19:38.1505791Z // begin inline asm 2026-02-21T10:19:38.1507386Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r40726,%r40727,%r40728,%r40729}, %rd17, %p382, 1, 1; 2026-02-21T10:19:38.1507520Z // end inline asm 2026-02-21T10:19:38.1507580Z // begin inline asm 2026-02-21T10:19:38.1509107Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r40858,%r40859,%r40860,%r40861}, %rd18, %p382, 1, 1; 2026-02-21T10:19:38.1509230Z // end inline asm 2026-02-21T10:19:38.1509290Z // begin inline asm 2026-02-21T10:19:38.1510749Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304}, {%r40990,%r40991,%r40992,%r40993}, %rd19, %p382, 1, 1; 2026-02-21T10:19:38.1510808Z // end inline asm 2026-02-21T10:19:38.1510867Z // begin inline asm 2026-02-21T10:19:38.1512320Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r41122,%r41123,%r41124,%r41125}, %rd12, %p382, 1, 1; 2026-02-21T10:19:38.1512440Z // end inline asm 2026-02-21T10:19:38.1512502Z // begin inline asm 2026-02-21T10:19:38.1513949Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r41254,%r41255,%r41256,%r41257}, %rd13, %p382, 1, 1; 2026-02-21T10:19:38.1514019Z // end inline asm 2026-02-21T10:19:38.1514076Z // begin inline asm 2026-02-21T10:19:38.1515604Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r41386,%r41387,%r41388,%r41389}, %rd14, %p382, 1, 1; 2026-02-21T10:19:38.1515665Z // end inline asm 2026-02-21T10:19:38.1515767Z // begin inline asm 2026-02-21T10:19:38.1517361Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r41518,%r41519,%r41520,%r41521}, %rd15, %p382, 1, 1; 2026-02-21T10:19:38.1517425Z // end inline asm 2026-02-21T10:19:38.1517482Z // begin inline asm 2026-02-21T10:19:38.1519031Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r41650,%r41651,%r41652,%r41653}, %rd16, %p382, 1, 1; 2026-02-21T10:19:38.1519095Z // end inline asm 2026-02-21T10:19:38.1519152Z // begin inline asm 2026-02-21T10:19:38.1520618Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r41782,%r41783,%r41784,%r41785}, %rd17, %p382, 1, 1; 2026-02-21T10:19:38.1520676Z // end inline asm 2026-02-21T10:19:38.1520731Z // begin inline asm 2026-02-21T10:19:38.1522189Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r41914,%r41915,%r41916,%r41917}, %rd18, %p382, 1, 1; 2026-02-21T10:19:38.1522309Z // end inline asm 2026-02-21T10:19:38.1522367Z // begin inline asm 2026-02-21T10:19:38.1523889Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368}, {%r42046,%r42047,%r42048,%r42049}, %rd19, %p382, 1, 1; 2026-02-21T10:19:38.1523950Z // end inline asm 2026-02-21T10:19:38.1524032Z wgmma.commit_group.sync.aligned; 2026-02-21T10:19:38.1524095Z mov.b32 %r42178, %r39931; 2026-02-21T10:19:38.1524154Z mov.b32 %r42179, %r42180; 2026-02-21T10:19:38.1524212Z // begin inline asm 2026-02-21T10:19:38.1526806Z // wait for regs: 
%r43241,%r43242,%r43243,%r43244,%r43245,%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r42178,%r42179,%r42180 2026-02-21T10:19:38.1526980Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:19:38.1527096Z // end inline asm 2026-02-21T10:19:38.1527153Z $L__tmp32: 2026-02-21T10:19:38.1527366Z .loc 1 48 93 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:48:93 2026-02-21T10:19:38.1527431Z add.s64 %rd854, %rd854, 128; 2026-02-21T10:19:38.1527497Z setp.lt.u64 %p400, %rd855, 4064; 2026-02-21T10:19:38.1527559Z @%p400 bra $L__BB0_20; 2026-02-21T10:19:38.1527675Z // %bb.21: // in Loop: Header=BB0_17 Depth=1 2026-02-21T10:19:38.1527874Z .loc 1 94 28 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:94:28 2026-02-21T10:19:38.1527960Z cvt.rn.bf16x2.f32 %r42381, %r43242, %r43241; 2026-02-21T10:19:38.1528040Z cvt.rn.bf16x2.f32 %r42382, %r43244, %r43243; 2026-02-21T10:19:38.1528115Z cvt.rn.bf16x2.f32 %r42383, %r43246, %r43245; 2026-02-21T10:19:38.1528189Z cvt.rn.bf16x2.f32 %r42384, %r43248, %r43247; 2026-02-21T10:19:38.1528265Z cvt.rn.bf16x2.f32 
%r42385, %r43250, %r43249; 2026-02-21T10:19:38.1528343Z cvt.rn.bf16x2.f32 %r42386, %r43252, %r43251; 2026-02-21T10:19:38.1528417Z cvt.rn.bf16x2.f32 %r42387, %r43254, %r43253; 2026-02-21T10:19:38.1528493Z cvt.rn.bf16x2.f32 %r42388, %r43256, %r43255; 2026-02-21T10:19:38.1528567Z cvt.rn.bf16x2.f32 %r42389, %r43258, %r43257; 2026-02-21T10:19:38.1528641Z cvt.rn.bf16x2.f32 %r42390, %r43260, %r43259; 2026-02-21T10:19:38.1528786Z cvt.rn.bf16x2.f32 %r42391, %r43262, %r43261; 2026-02-21T10:19:38.1528862Z cvt.rn.bf16x2.f32 %r42392, %r43264, %r43263; 2026-02-21T10:19:38.1528937Z cvt.rn.bf16x2.f32 %r42393, %r43266, %r43265; 2026-02-21T10:19:38.1529010Z cvt.rn.bf16x2.f32 %r42394, %r43268, %r43267; 2026-02-21T10:19:38.1529085Z cvt.rn.bf16x2.f32 %r42395, %r43270, %r43269; 2026-02-21T10:19:38.1529159Z cvt.rn.bf16x2.f32 %r42396, %r43272, %r43271; 2026-02-21T10:19:38.1529232Z cvt.rn.bf16x2.f32 %r42397, %r43274, %r43273; 2026-02-21T10:19:38.1529307Z cvt.rn.bf16x2.f32 %r42398, %r43276, %r43275; 2026-02-21T10:19:38.1529388Z cvt.rn.bf16x2.f32 %r42399, %r43278, %r43277; 2026-02-21T10:19:38.1529462Z cvt.rn.bf16x2.f32 %r42400, %r43280, %r43279; 2026-02-21T10:19:38.1529536Z cvt.rn.bf16x2.f32 %r42401, %r43282, %r43281; 2026-02-21T10:19:38.1529611Z cvt.rn.bf16x2.f32 %r42402, %r43284, %r43283; 2026-02-21T10:19:38.1529684Z cvt.rn.bf16x2.f32 %r42403, %r43286, %r43285; 2026-02-21T10:19:38.1529760Z cvt.rn.bf16x2.f32 %r42404, %r43288, %r43287; 2026-02-21T10:19:38.1529835Z cvt.rn.bf16x2.f32 %r42405, %r43290, %r43289; 2026-02-21T10:19:38.1529972Z cvt.rn.bf16x2.f32 %r42406, %r43292, %r43291; 2026-02-21T10:19:38.1530055Z cvt.rn.bf16x2.f32 %r42407, %r43294, %r43293; 2026-02-21T10:19:38.1530130Z cvt.rn.bf16x2.f32 %r42408, %r43296, %r43295; 2026-02-21T10:19:38.1530202Z cvt.rn.bf16x2.f32 %r42409, %r43298, %r43297; 2026-02-21T10:19:38.1530274Z cvt.rn.bf16x2.f32 %r42410, %r43300, %r43299; 2026-02-21T10:19:38.1530349Z cvt.rn.bf16x2.f32 %r42411, %r43302, %r43301; 2026-02-21T10:19:38.1530421Z cvt.rn.bf16x2.f32 
%r42412, %r43304, %r43303; 2026-02-21T10:19:38.1530546Z cvt.rn.bf16x2.f32 %r42413, %r43306, %r43305; 2026-02-21T10:19:38.1530622Z cvt.rn.bf16x2.f32 %r42414, %r43308, %r43307; 2026-02-21T10:19:38.1530695Z cvt.rn.bf16x2.f32 %r42415, %r43310, %r43309; 2026-02-21T10:19:38.1530768Z cvt.rn.bf16x2.f32 %r42416, %r43312, %r43311; 2026-02-21T10:19:38.1530844Z cvt.rn.bf16x2.f32 %r42417, %r43314, %r43313; 2026-02-21T10:19:38.1530921Z cvt.rn.bf16x2.f32 %r42418, %r43316, %r43315; 2026-02-21T10:19:38.1530994Z cvt.rn.bf16x2.f32 %r42419, %r43318, %r43317; 2026-02-21T10:19:38.1531070Z cvt.rn.bf16x2.f32 %r42420, %r43320, %r43319; 2026-02-21T10:19:38.1531149Z cvt.rn.bf16x2.f32 %r42421, %r43322, %r43321; 2026-02-21T10:19:38.1531223Z cvt.rn.bf16x2.f32 %r42422, %r43324, %r43323; 2026-02-21T10:19:38.1531296Z cvt.rn.bf16x2.f32 %r42423, %r43326, %r43325; 2026-02-21T10:19:38.1531371Z cvt.rn.bf16x2.f32 %r42424, %r43328, %r43327; 2026-02-21T10:19:38.1531445Z cvt.rn.bf16x2.f32 %r42425, %r43330, %r43329; 2026-02-21T10:19:38.1531569Z cvt.rn.bf16x2.f32 %r42426, %r43332, %r43331; 2026-02-21T10:19:38.1531647Z cvt.rn.bf16x2.f32 %r42427, %r43334, %r43333; 2026-02-21T10:19:38.1531721Z cvt.rn.bf16x2.f32 %r42428, %r43336, %r43335; 2026-02-21T10:19:38.1531793Z cvt.rn.bf16x2.f32 %r42429, %r43338, %r43337; 2026-02-21T10:19:38.1531866Z cvt.rn.bf16x2.f32 %r42430, %r43340, %r43339; 2026-02-21T10:19:38.1531944Z cvt.rn.bf16x2.f32 %r42431, %r43342, %r43341; 2026-02-21T10:19:38.1532016Z cvt.rn.bf16x2.f32 %r42432, %r43344, %r43343; 2026-02-21T10:19:38.1532090Z cvt.rn.bf16x2.f32 %r42433, %r43346, %r43345; 2026-02-21T10:19:38.1532165Z cvt.rn.bf16x2.f32 %r42434, %r43348, %r43347; 2026-02-21T10:19:38.1532239Z cvt.rn.bf16x2.f32 %r42435, %r43350, %r43349; 2026-02-21T10:19:38.1532312Z cvt.rn.bf16x2.f32 %r42436, %r43352, %r43351; 2026-02-21T10:19:38.1532389Z cvt.rn.bf16x2.f32 %r42437, %r43354, %r43353; 2026-02-21T10:19:38.1532462Z cvt.rn.bf16x2.f32 %r42438, %r43356, %r43355; 2026-02-21T10:19:38.1532535Z cvt.rn.bf16x2.f32 
%r42439, %r43358, %r43357; 2026-02-21T10:19:38.1532611Z cvt.rn.bf16x2.f32 %r42440, %r43360, %r43359; 2026-02-21T10:19:38.1532689Z cvt.rn.bf16x2.f32 %r42441, %r43362, %r43361; 2026-02-21T10:19:38.1532763Z cvt.rn.bf16x2.f32 %r42442, %r43364, %r43363; 2026-02-21T10:19:38.1532848Z cvt.rn.bf16x2.f32 %r42443, %r43366, %r43365; 2026-02-21T10:19:38.1532980Z cvt.rn.bf16x2.f32 %r42444, %r43368, %r43367; 2026-02-21T10:19:38.1533178Z .loc 1 95 43 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:95:43 2026-02-21T10:19:38.1533235Z bar.sync 0; 2026-02-21T10:19:38.1533431Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r94], {%r42381, %r42382, %r42383, %r42384}; 2026-02-21T10:19:38.1533615Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r95], {%r42397, %r42398, %r42399, %r42400}; 2026-02-21T10:19:38.1533797Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r96], {%r42413, %r42414, %r42415, %r42416}; 2026-02-21T10:19:38.1533986Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r97], {%r42429, %r42430, %r42431, %r42432}; 2026-02-21T10:19:38.1534172Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r98], {%r42385, %r42386, %r42387, %r42388}; 2026-02-21T10:19:38.1534359Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r99], {%r42401, %r42402, %r42403, %r42404}; 2026-02-21T10:19:38.1534546Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r100], {%r42417, %r42418, %r42419, %r42420}; 2026-02-21T10:19:38.1534735Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r101], {%r42433, %r42434, %r42435, %r42436}; 2026-02-21T10:19:38.1534978Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r102], {%r42389, %r42390, %r42391, %r42392}; 2026-02-21T10:19:38.1535168Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r103], {%r42405, %r42406, %r42407, %r42408}; 2026-02-21T10:19:38.1535353Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r104], {%r42421, %r42422, %r42423, %r42424}; 2026-02-21T10:19:38.1535534Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r105], {%r42437, %r42438, %r42439, %r42440}; 2026-02-21T10:19:38.1535717Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r106], {%r42393, %r42394, %r42395, %r42396}; 2026-02-21T10:19:38.1535948Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r107], {%r42409, %r42410, %r42411, %r42412}; 2026-02-21T10:19:38.1536132Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r108], {%r42425, %r42426, %r42427, %r42428}; 2026-02-21T10:19:38.1536314Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r109], {%r42441, %r42442, %r42443, %r42444}; 2026-02-21T10:19:38.1536377Z // begin inline asm 2026-02-21T10:19:38.1536580Z fence.proxy.async.shared::cta; 2026-02-21T10:19:38.1536644Z // end inline asm 2026-02-21T10:19:38.1536698Z bar.sync 0; 2026-02-21T10:19:38.1536768Z elect.sync %r42445|%p403, -1; 2026-02-21T10:19:38.1536851Z shfl.sync.idx.b32 %r42446, %r4, 0, 31, -1; 2026-02-21T10:19:38.1536930Z and.pred %p401, %p405, %p403; 2026-02-21T10:19:38.1536996Z and.b32 %r42447, %r42446, 1; 2026-02-21T10:19:38.1537057Z shl.b32 %r42448, %r42447, 14; 2026-02-21T10:19:38.1537122Z add.s32 %r42380, %r39931, %r42448; 2026-02-21T10:19:38.1537185Z shl.b32 %r42450, %r42447, 6; 2026-02-21T10:19:38.1537320Z or.b32 %r42378, %r42450, %r39932; 2026-02-21T10:19:38.1537380Z // begin inline asm 2026-02-21T10:19:38.1537613Z @%p401 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd497, {%r42378, %r42379}], [%r42380]; 2026-02-21T10:19:38.1537671Z // end inline asm 2026-02-21T10:19:38.1537744Z cp.async.bulk.commit_group; 2026-02-21T10:19:38.1537819Z cp.async.bulk.wait_group.read 0; 2026-02-21T10:19:38.1537876Z bar.sync 0; 2026-02-21T10:19:38.1538086Z .loc 1 28 112 // cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:28:112 2026-02-21T10:19:38.1538150Z add.s32 %r2193, %r43239, 4224; 2026-02-21T10:19:38.1538215Z setp.lt.s32 %p404, %r43239, 896; 2026-02-21T10:19:38.1538279Z mov.b32 %r43239, %r2193; 2026-02-21T10:19:38.1538340Z @%p404 bra $L__BB0_17; 2026-02-21T10:19:38.1538427Z $L__BB0_22: // %._crit_edge 2026-02-21T10:19:38.1538687Z .loc 1 28 4 // 
cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py:28:4 2026-02-21T10:19:38.1538783Z ret; 2026-02-21T10:19:38.1538877Z $L__tmp33: 2026-02-21T10:19:38.1538937Z $L__func_end0: 2026-02-21T10:19:38.1539026Z // -- End function 2026-02-21T10:19:38.1539078Z } 2026-02-21T10:19:38.1539321Z .file 1 "/tmp/torchinductor_root/gc/cgc5n2dco6k4ksfpdal4efl4is3hsy475epaey2o7l5anka4qf57.py" 2026-02-21T10:19:38.1539625Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T10:19:38.1539691Z .section .debug_abbrev 2026-02-21T10:19:38.1539740Z { 2026-02-21T10:19:38.1539837Z .b8 1 // Abbreviation Code 2026-02-21T10:19:38.1539932Z .b8 17 // DW_TAG_compile_unit 2026-02-21T10:19:38.1540015Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:19:38.1540100Z .b8 37 // DW_AT_producer 2026-02-21T10:19:38.1540178Z .b8 8 // DW_FORM_string 2026-02-21T10:19:38.1540259Z .b8 19 // DW_AT_language 2026-02-21T10:19:38.1540339Z .b8 5 // DW_FORM_data2 2026-02-21T10:19:38.1540418Z .b8 3 // DW_AT_name 2026-02-21T10:19:38.1540495Z .b8 8 // DW_FORM_string 2026-02-21T10:19:38.1540578Z .b8 16 // DW_AT_stmt_list 2026-02-21T10:19:38.1540657Z .b8 6 // DW_FORM_data4 2026-02-21T10:19:38.1540804Z .b8 27 // DW_AT_comp_dir 2026-02-21T10:19:38.1540883Z .b8 8 // DW_FORM_string 2026-02-21T10:19:38.1540956Z .b8 0 // EOM(1) 2026-02-21T10:19:38.1541036Z .b8 0 // EOM(2) 2026-02-21T10:19:38.1541124Z .b8 2 // Abbreviation Code 2026-02-21T10:19:38.1541211Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:19:38.1541375Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:19:38.1541449Z .b8 3 // DW_AT_name 2026-02-21T10:19:38.1541525Z .b8 8 // DW_FORM_string 2026-02-21T10:19:38.1541605Z .b8 32 // DW_AT_inline 2026-02-21T10:19:38.1541685Z .b8 11 // DW_FORM_data1 2026-02-21T10:19:38.1541754Z .b8 0 // EOM(1) 2026-02-21T10:19:38.1541825Z .b8 0 // EOM(2) 2026-02-21T10:19:38.1541908Z .b8 3 // Abbreviation Code 2026-02-21T10:19:38.1541993Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:19:38.1542075Z .b8 1 // 
DW_CHILDREN_yes 2026-02-21T10:19:38.1542153Z .b8 17 // DW_AT_low_pc 2026-02-21T10:19:38.1542229Z .b8 1 // DW_FORM_addr 2026-02-21T10:19:38.1542360Z .b8 18 // DW_AT_high_pc 2026-02-21T10:19:38.1542438Z .b8 1 // DW_FORM_addr 2026-02-21T10:19:38.1542530Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:19:38.1542604Z .b8 19 // DW_FORM_ref4 2026-02-21T10:19:38.1542676Z .b8 0 // EOM(1) 2026-02-21T10:19:38.1542743Z .b8 0 // EOM(2) 2026-02-21T10:19:38.1542826Z .b8 4 // Abbreviation Code 2026-02-21T10:19:38.1542929Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T10:19:38.1543008Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:19:38.1543096Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:19:38.1543169Z .b8 19 // DW_FORM_ref4 2026-02-21T10:19:38.1543260Z .b8 17 // DW_AT_low_pc 2026-02-21T10:19:38.1543337Z .b8 1 // DW_FORM_addr 2026-02-21T10:19:38.1543415Z .b8 18 // DW_AT_high_pc 2026-02-21T10:19:38.1543491Z .b8 1 // DW_FORM_addr 2026-02-21T10:19:38.1543625Z .b8 88 // DW_AT_call_file 2026-02-21T10:19:38.1543700Z .b8 11 // DW_FORM_data1 2026-02-21T10:19:38.1543783Z .b8 89 // DW_AT_call_line 2026-02-21T10:19:38.1543858Z .b8 11 // DW_FORM_data1 2026-02-21T10:19:38.1543939Z .b8 87 // DW_AT_call_column 2026-02-21T10:19:38.1544013Z .b8 11 // DW_FORM_data1 2026-02-21T10:19:38.1544083Z .b8 0 // EOM(1) 2026-02-21T10:19:38.1544150Z .b8 0 // EOM(2) 2026-02-21T10:19:38.1544219Z .b8 0 // EOM(3) 2026-02-21T10:19:38.1544272Z } 2026-02-21T10:19:38.1544334Z .section .debug_info 2026-02-21T10:19:38.1544382Z { 2026-02-21T10:19:38.1544471Z .b32 178 // Length of Unit 2026-02-21T10:19:38.1544569Z .b8 2 // DWARF version number 2026-02-21T10:19:38.1544621Z .b8 0 2026-02-21T10:19:38.1544750Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T10:19:38.1544896Z .b8 8 // Address Size (in bytes) 2026-02-21T10:19:38.1545010Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T10:19:38.1545094Z .b8 116 // DW_AT_producer 2026-02-21T10:19:38.1545154Z .b8 114 2026-02-21T10:19:38.1545212Z .b8 105 2026-02-21T10:19:38.1545263Z .b8 116 2026-02-21T10:19:38.1545312Z .b8 111 2026-02-21T10:19:38.1545363Z .b8 110 2026-02-21T10:19:38.1545412Z .b8 0 2026-02-21T10:19:38.1545542Z .b8 2 // DW_AT_language 2026-02-21T10:19:38.1545594Z .b8 0 2026-02-21T10:19:38.1545671Z .b8 99 // DW_AT_name 2026-02-21T10:19:38.1545724Z .b8 103 2026-02-21T10:19:38.1545774Z .b8 99 2026-02-21T10:19:38.1545825Z .b8 53 2026-02-21T10:19:38.1545876Z .b8 110 2026-02-21T10:19:38.1545927Z .b8 50 2026-02-21T10:19:38.1545980Z .b8 100 2026-02-21T10:19:38.1546028Z .b8 99 2026-02-21T10:19:38.1546078Z .b8 111 2026-02-21T10:19:38.1546126Z .b8 54 2026-02-21T10:19:38.1546179Z .b8 107 2026-02-21T10:19:38.1546230Z .b8 52 2026-02-21T10:19:38.1546280Z .b8 107 2026-02-21T10:19:38.1546342Z .b8 115 2026-02-21T10:19:38.1546395Z .b8 102 2026-02-21T10:19:38.1546562Z .b8 112 2026-02-21T10:19:38.1546628Z .b8 100 2026-02-21T10:19:38.1546682Z .b8 97 2026-02-21T10:19:38.1546732Z .b8 108 2026-02-21T10:19:38.1546781Z .b8 52 2026-02-21T10:19:38.1546832Z .b8 101 2026-02-21T10:19:38.1546881Z .b8 102 2026-02-21T10:19:38.1546942Z .b8 108 2026-02-21T10:19:38.1546996Z .b8 52 2026-02-21T10:19:38.1547050Z .b8 105 2026-02-21T10:19:38.1547176Z .b8 115 2026-02-21T10:19:38.1547227Z .b8 51 2026-02-21T10:19:38.1547277Z .b8 104 2026-02-21T10:19:38.1547329Z .b8 115 2026-02-21T10:19:38.1547378Z .b8 121 2026-02-21T10:19:38.1547427Z .b8 52 2026-02-21T10:19:38.1547478Z .b8 55 2026-02-21T10:19:38.1547538Z .b8 53 2026-02-21T10:19:38.1547592Z .b8 101 2026-02-21T10:19:38.1547642Z .b8 112 2026-02-21T10:19:38.1547694Z .b8 97 2026-02-21T10:19:38.1547744Z .b8 101 2026-02-21T10:19:38.1547793Z .b8 121 2026-02-21T10:19:38.1547845Z .b8 50 2026-02-21T10:19:38.1547897Z .b8 111 
2026-02-21T10:19:38.1547946Z .b8 55 2026-02-21T10:19:38.1547996Z .b8 108 2026-02-21T10:19:38.1548047Z .b8 53 2026-02-21T10:19:38.1548096Z .b8 97 2026-02-21T10:19:38.1548146Z .b8 110 2026-02-21T10:19:38.1548197Z .b8 107 2026-02-21T10:19:38.1548248Z .b8 97 2026-02-21T10:19:38.1548297Z .b8 52 2026-02-21T10:19:38.1548346Z .b8 113 2026-02-21T10:19:38.1548480Z .b8 102 2026-02-21T10:19:38.1548533Z .b8 53 2026-02-21T10:19:38.1548583Z .b8 55 2026-02-21T10:19:38.1548631Z .b8 46 2026-02-21T10:19:38.1548685Z .b8 112 2026-02-21T10:19:38.1548735Z .b8 121 2026-02-21T10:19:38.1548784Z .b8 0 2026-02-21T10:19:38.1548889Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T10:19:38.1548973Z .b8 47 // DW_AT_comp_dir 2026-02-21T10:19:38.1549104Z .b8 116 2026-02-21T10:19:38.1549153Z .b8 109 2026-02-21T10:19:38.1549206Z .b8 112 2026-02-21T10:19:38.1549255Z .b8 47 2026-02-21T10:19:38.1549304Z .b8 116 2026-02-21T10:19:38.1549354Z .b8 111 2026-02-21T10:19:38.1549405Z .b8 114 2026-02-21T10:19:38.1549454Z .b8 99 2026-02-21T10:19:38.1549503Z .b8 104 2026-02-21T10:19:38.1549554Z .b8 105 2026-02-21T10:19:38.1549604Z .b8 110 2026-02-21T10:19:38.1549652Z .b8 100 2026-02-21T10:19:38.1549706Z .b8 117 2026-02-21T10:19:38.1549754Z .b8 99 2026-02-21T10:19:38.1549804Z .b8 116 2026-02-21T10:19:38.1549853Z .b8 111 2026-02-21T10:19:38.1549905Z .b8 114 2026-02-21T10:19:38.1549953Z .b8 95 2026-02-21T10:19:38.1550004Z .b8 114 2026-02-21T10:19:38.1550053Z .b8 111 2026-02-21T10:19:38.1550108Z .b8 111 2026-02-21T10:19:38.1550157Z .b8 116 2026-02-21T10:19:38.1550206Z .b8 47 2026-02-21T10:19:38.1550257Z .b8 103 2026-02-21T10:19:38.1550305Z .b8 99 2026-02-21T10:19:38.1550355Z .b8 0 2026-02-21T10:19:38.1550468Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T10:19:38.1550549Z .b8 95 // DW_AT_name 2026-02-21T10:19:38.1550599Z .b8 104 2026-02-21T10:19:38.1550648Z .b8 101 2026-02-21T10:19:38.1550777Z .b8 108 2026-02-21T10:19:38.1550833Z .b8 105 2026-02-21T10:19:38.1550883Z .b8 111 2026-02-21T10:19:38.1550932Z .b8 
110 2026-02-21T10:19:38.1550983Z .b8 95 2026-02-21T10:19:38.1551032Z .b8 109 2026-02-21T10:19:38.1551081Z .b8 97 2026-02-21T10:19:38.1551132Z .b8 116 2026-02-21T10:19:38.1551183Z .b8 109 2026-02-21T10:19:38.1551232Z .b8 117 2026-02-21T10:19:38.1551280Z .b8 108 2026-02-21T10:19:38.1551332Z .b8 95 2026-02-21T10:19:38.1551380Z .b8 98 2026-02-21T10:19:38.1551493Z .b8 102 2026-02-21T10:19:38.1551544Z .b8 49 2026-02-21T10:19:38.1551595Z .b8 54 2026-02-21T10:19:38.1551645Z .b8 95 2026-02-21T10:19:38.1551694Z .b8 105 2026-02-21T10:19:38.1551745Z .b8 110 2026-02-21T10:19:38.1551794Z .b8 116 2026-02-21T10:19:38.1551842Z .b8 52 2026-02-21T10:19:38.1551891Z .b8 0 2026-02-21T10:19:38.1551971Z .b8 1 // DW_AT_inline 2026-02-21T10:19:38.1552080Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T10:19:38.1552173Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T10:19:38.1552270Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T10:19:38.1552369Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:19:38.1552498Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T10:19:38.1552596Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:19:38.1552683Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T10:19:38.1552825Z .b64 $L__tmp32 // DW_AT_high_pc 2026-02-21T10:19:38.1552913Z .b8 1 // DW_AT_call_file 2026-02-21T10:19:38.1552993Z .b8 91 // DW_AT_call_line 2026-02-21T10:19:38.1553080Z .b8 40 // DW_AT_call_column 2026-02-21T10:19:38.1553181Z .b8 0 // End Of Children Mark 2026-02-21T10:19:38.1553273Z .b8 0 // End Of Children Mark 2026-02-21T10:19:38.1553323Z } 2026-02-21T10:19:38.1553393Z .section .debug_macinfo { } 2026-02-21T10:19:38.1553398Z 2026-02-21T10:19:38.1553479Z ================================================================ 2026-02-21T10:19:38.1553596Z please share the reproducer above with Triton project. 
2026-02-21T10:19:58.0619741Z 2026-02-21T10:19:58.0620175Z 2026-02-21T10:19:58.0620198Z 2026-02-21T10:19:58.0620755Z ================================================================ 2026-02-21T10:19:58.0621435Z Internal Triton PTX codegen error 2026-02-21T10:19:58.0621785Z `ptxas` stderr: 2026-02-21T10:19:58.0622542Z ptxas fatal : (C7602) Insufficient registers (64) to compile instruction at line 1012 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T10:19:58.0623889Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:19:58.0624124Z 2026-02-21T10:19:58.0624794Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpxertqlv9.ptx -o /tmp/tmpxertqlv9.ptx.o 2026-02-21T10:19:58.0625547Z 2026-02-21T10:19:58.0625552Z 2026-02-21T10:19:58.0625622Z // 2026-02-21T10:19:58.0625817Z // Generated by LLVM NVPTX Back-End 2026-02-21T10:19:58.0626064Z // 2026-02-21T10:19:58.0626163Z 2026-02-21T10:19:58.0626236Z .version 8.7 2026-02-21T10:19:58.0626414Z .target sm_90a 2026-02-21T10:19:58.0626790Z .address_size 64 2026-02-21T10:19:58.0626912Z 2026-02-21T10:19:58.0627159Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T10:19:58.0627593Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T10:19:58.0627916Z // @_helion_matmul_bf16_int4 2026-02-21T10:19:58.0628236Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T10:19:58.0628673Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T10:19:58.0629291Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T10:19:58.0629724Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T10:19:58.0630145Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T10:19:58.0630604Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 
2026-02-21T10:19:58.0630926Z ) 2026-02-21T10:19:58.0631059Z .reqntid 128 2026-02-21T10:19:58.0631210Z .maxnreg 64 2026-02-21T10:19:58.0631469Z { 2026-02-21T10:19:58.0631624Z .reg .pred %p<64>; 2026-02-21T10:19:58.0631801Z .reg .b16 %rs<113>; 2026-02-21T10:19:58.0631977Z .reg .b32 %r<2657>; 2026-02-21T10:19:58.0632141Z .reg .b64 %rd<125>; 2026-02-21T10:19:58.0632483Z .loc 1 19 0 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:19:0 2026-02-21T10:19:58.0632893Z $L__func_begin0: 2026-02-21T10:19:58.0633221Z .loc 1 19 0 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:19:0 2026-02-21T10:19:58.0633570Z 2026-02-21T10:19:58.0633637Z // %bb.0: 2026-02-21T10:19:58.0633848Z ld.param.b64 %rd6, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T10:19:58.0634123Z $L__tmp0: 2026-02-21T10:19:58.0634437Z .loc 1 21 67 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:21:67 2026-02-21T10:19:58.0634842Z mov.u32 %r2520, %ctaid.x; 2026-02-21T10:19:58.0635101Z ld.param.b64 %rd9, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T10:19:58.0635368Z mov.u32 %r578, %ctaid.y; 2026-02-21T10:19:58.0635708Z ld.param.b64 %rd53, [_helion_matmul_bf16_int4_param_3]; 2026-02-21T10:19:58.0635980Z mov.u32 %r579, %ctaid.z; 2026-02-21T10:19:58.0636164Z mov.u32 %r580, %nctaid.x; 2026-02-21T10:19:58.0636345Z mov.u32 %r581, %nctaid.y; 2026-02-21T10:19:58.0636665Z mad.lo.s32 %r582, %r579, %r581, %r578; 2026-02-21T10:19:58.0636893Z mad.lo.s32 %r583, %r582, %r580, %r2520; 2026-02-21T10:19:58.0637131Z shl.b32 %r584, %r583, 7; 2026-02-21T10:19:58.0637317Z cvt.s64.s32 %rd54, %r584; 2026-02-21T10:19:58.0637504Z add.s64 %rd23, %rd53, %rd54; 2026-02-21T10:19:58.0637697Z mov.u32 %r2, %tid.x; 2026-02-21T10:19:58.0637869Z setp.lt.u32 %p1, %r2, 32; 2026-02-21T10:19:58.0638061Z shl.b32 %r585, %r2, 2; 2026-02-21T10:19:58.0638241Z mov.b32 %r586, global_smem; 2026-02-21T10:19:58.0638441Z add.s32 %r504, %r586, %r585; 2026-02-21T10:19:58.0638626Z mov.b32 %r2380, 0; 2026-02-21T10:19:58.0638872Z // begin 
inline asm 2026-02-21T10:19:58.0639104Z @%p1 st.shared.b32 [ %r504 + 0 ], %r2380; 2026-02-21T10:19:58.0639334Z // end inline asm 2026-02-21T10:19:58.0639504Z bar.warp.sync -1; 2026-02-21T10:19:58.0639674Z setp.eq.b32 %p61, %r2, 0; 2026-02-21T10:19:58.0639867Z cvt.u64.u32 %rd8, %r586; 2026-02-21T10:19:58.0640049Z // begin inline asm 2026-02-21T10:19:58.0640399Z @%p61 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd8 + 0 ], %rd9; 2026-02-21T10:19:58.0640889Z // end inline asm 2026-02-21T10:19:58.0641054Z // begin inline asm 2026-02-21T10:19:58.0641339Z @%p61 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1; 2026-02-21T10:19:58.0641675Z // end inline asm 2026-02-21T10:19:58.0641830Z mov.b32 %r506, 128; 2026-02-21T10:19:58.0641989Z // begin inline asm 2026-02-21T10:19:58.0642501Z [3191s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T10:19:58.0644085Z Config: @helion.kernel(config=helion.Config(block_sizes=[16, 128, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=64, num_sm_multiplier=64, num_stages=1, num_warps=4, pid_type='persistent_interleaved', range_flattens=[True, False], range_multi_buffers=[None, False], range_num_stages=[4, 1], range_unroll_factors=[1, 0], range_warp_specializes=[]), static_shapes=True) 2026-02-21T10:19:58.0645619Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T10:19:58.0645909Z `ptxas` stderr: 2026-02-21T10:19:58.0646711Z ptxas fatal : (C7602) Insufficient registers (64) to compile instruction at line 1012 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 
2026-02-21T10:19:58.0647412Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:19:58.0647599Z 2026-02-21T10:19:58.0648116Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpxertqlv9.ptx -o /tmp/tmpxertqlv9.ptx.o 2026-02-21T10:19:58.0648769Z 2026-02-21T10:19:58.0648926Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T10:19:58.0649380Z @%p61 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r506; 2026-02-21T10:19:58.0649729Z // end inline asm 2026-02-21T10:19:58.0649882Z mov.b32 %r507, 16; 2026-02-21T10:19:58.0650045Z // begin inline asm 2026-02-21T10:19:58.0650326Z @%p61 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r507; 2026-02-21T10:19:58.0650670Z // end inline asm 2026-02-21T10:19:58.0650823Z mov.b32 %r508, 1280; 2026-02-21T10:19:58.0650990Z // begin inline asm 2026-02-21T10:19:58.0651288Z @%p61 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r508; 2026-02-21T10:19:58.0651638Z // end inline asm 2026-02-21T10:19:58.0651797Z mov.b32 %r509, 4096; 2026-02-21T10:19:58.0651953Z // begin inline asm 2026-02-21T10:19:58.0652248Z @%p61 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r509; 2026-02-21T10:19:58.0652597Z // end inline asm 2026-02-21T10:19:58.0652824Z mov.b64 %rd16, 1280; 2026-02-21T10:19:58.0652985Z // begin inline asm 2026-02-21T10:19:58.0653313Z @%p61 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd8 + 0 ], 0x0, %rd16; 2026-02-21T10:19:58.0653669Z // end inline asm 2026-02-21T10:19:58.0653827Z mov.b32 %r2379, 1; 2026-02-21T10:19:58.0653992Z // begin inline asm 2026-02-21T10:19:58.0654306Z @%p61 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0, %r2379; 2026-02-21T10:19:58.0654675Z // end inline asm 2026-02-21T10:19:58.0654822Z // begin inline asm 2026-02-21T10:19:58.0655135Z 
@%p61 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x1, %r2379; 2026-02-21T10:19:58.0655491Z // end inline asm 2026-02-21T10:19:58.0655642Z // begin inline asm 2026-02-21T10:19:58.0655920Z @%p61 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T10:19:58.0656245Z // end inline asm 2026-02-21T10:19:58.0656399Z // begin inline asm 2026-02-21T10:19:58.0656869Z @%p61 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T10:19:58.0657230Z // end inline asm 2026-02-21T10:19:58.0657377Z // begin inline asm 2026-02-21T10:19:58.0657664Z @%p61 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x3; 2026-02-21T10:19:58.0658094Z // end inline asm 2026-02-21T10:19:58.0658244Z // begin inline asm 2026-02-21T10:19:58.0658522Z @%p61 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd8 + 0 ], 0x0; 2026-02-21T10:19:58.0658840Z // end inline asm 2026-02-21T10:19:58.0658992Z // begin inline asm 2026-02-21T10:19:58.0659418Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd23 + 0 ], [ %rd8 + 0 ], 0x80; 2026-02-21T10:19:58.0659901Z // end inline asm 2026-02-21T10:19:58.0660048Z // begin inline asm 2026-02-21T10:19:58.0660297Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd23 + 0 ], 0x80; 2026-02-21T10:19:58.0660621Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T10:19:58.0660839Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T10:19:58.0661050Z // end inline asm 2026-02-21T10:19:58.0661207Z bar.sync 0; 2026-02-21T10:19:58.0661377Z cvta.global.u64 %rd34, %rd23; 2026-02-21T10:19:58.0661722Z .loc 1 38 45 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:38:45 2026-02-21T10:19:58.0662099Z bfe.u32 %r5, %r2, 3, 4; 2026-02-21T10:19:58.0662268Z or.b32 %r6, %r5, 16; 2026-02-21T10:19:58.0662506Z or.b32 %r7, %r5, 32; 2026-02-21T10:19:58.0662675Z or.b32 %r8, %r5, 48; 2026-02-21T10:19:58.0662831Z or.b32 %r9, 
%r5, 64; 2026-02-21T10:19:58.0662992Z or.b32 %r10, %r5, 80; 2026-02-21T10:19:58.0663150Z or.b32 %r11, %r5, 96; 2026-02-21T10:19:58.0663314Z or.b32 %r12, %r5, 112; 2026-02-21T10:19:58.0663642Z .loc 1 26 112 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:26:112 2026-02-21T10:19:58.0664014Z sub.s32 %r589, 5120, %r2520; 2026-02-21T10:19:58.0664292Z mul.hi.s32 %r590, %r589, 1041204193; 2026-02-21T10:19:58.0664501Z shr.u32 %r591, %r590, 31; 2026-02-21T10:19:58.0664684Z shr.s32 %r592, %r590, 11; 2026-02-21T10:19:58.0664855Z add.s32 %r30, %r592, %r591; 2026-02-21T10:19:58.0665039Z mul.lo.s32 %r593, %r30, 8448; 2026-02-21T10:19:58.0665225Z setp.ne.b32 %p28, %r589, %r593; 2026-02-21T10:19:58.0665429Z setp.lt.u32 %p29, %r2520, 5121; 2026-02-21T10:19:58.0665620Z and.pred %p30, %p29, %p28; 2026-02-21T10:19:58.0665807Z selp.b32 %r31, 1, 0, %p30; 2026-02-21T10:19:58.0665979Z add.s32 %r32, %r30, %r31; 2026-02-21T10:19:58.0666304Z .loc 1 53 38 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:53:38 2026-02-21T10:19:58.0666827Z and.b32 %r33, %r2, 7; 2026-02-21T10:19:58.0666998Z shl.b32 %r34, %r33, 2; 2026-02-21T10:19:58.0667326Z .loc 1 26 112 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:26:112 2026-02-21T10:19:58.0667697Z add.s32 %r512, %r586, 47104; 2026-02-21T10:19:58.0667971Z // begin inline asm 2026-02-21T10:19:58.0668176Z @%p61 mbarrier.init.shared::cta.b64 [%r512], 1; 2026-02-21T10:19:58.0668467Z // end inline asm 2026-02-21T10:19:58.0668619Z bar.sync 0; 2026-02-21T10:19:58.0668772Z add.s32 %r513, %r586, 47112; 2026-02-21T10:19:58.0668949Z // begin inline asm 2026-02-21T10:19:58.0669144Z @%p61 mbarrier.init.shared::cta.b64 [%r513], 1; 2026-02-21T10:19:58.0669368Z // end inline asm 2026-02-21T10:19:58.0669511Z bar.sync 0; 2026-02-21T10:19:58.0669665Z add.s32 %r514, %r586, 47120; 2026-02-21T10:19:58.0669837Z // begin inline asm 2026-02-21T10:19:58.0670026Z @%p61 mbarrier.init.shared::cta.b64 [%r514], 1; 2026-02-21T10:19:58.0670246Z // end 
inline asm 2026-02-21T10:19:58.0670407Z setp.lt.s32 %p31, %r32, 1; 2026-02-21T10:19:58.0670590Z setp.gt.s32 %p32, %r32, 0; 2026-02-21T10:19:58.0670929Z .loc 1 32 35 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:32:35 2026-02-21T10:19:58.0671313Z mul.hi.u32 %r594, %r2520, 1717986919; 2026-02-21T10:19:58.0671516Z shr.u32 %r595, %r594, 5; 2026-02-21T10:19:58.0671851Z .loc 1 33 33 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:33:33 2026-02-21T10:19:58.0672209Z shl.b32 %r596, %r595, 3; 2026-02-21T10:19:58.0672528Z .loc 1 34 39 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:34:39 2026-02-21T10:19:58.0672982Z sub.s32 %r597, 512, %r596; 2026-02-21T10:19:58.0673325Z .loc 1 34 52 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:34:52 2026-02-21T10:19:58.0673687Z min.s32 %r598, %r597, 8; 2026-02-21T10:19:58.0673998Z .loc 1 35 45 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:35:45 2026-02-21T10:19:58.0674360Z mul.lo.s32 %r599, %r595, 80; 2026-02-21T10:19:58.0674541Z sub.s32 %r600, %r2520, %r599; 2026-02-21T10:19:58.0674872Z .loc 1 36 51 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:36:51 2026-02-21T10:19:58.0675229Z div.s32 %r601, %r600, %r598; 2026-02-21T10:19:58.0675555Z .loc 1 35 64 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:35:64 2026-02-21T10:19:58.0675920Z mul.lo.s32 %r602, %r601, %r598; 2026-02-21T10:19:58.0676110Z sub.s32 %r603, %r600, %r602; 2026-02-21T10:19:58.0676437Z .loc 1 35 30 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:35:30 2026-02-21T10:19:58.0676926Z add.s32 %r604, %r603, %r596; 2026-02-21T10:19:58.0677347Z .loc 1 37 27 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:37:27 2026-02-21T10:19:58.0677720Z shl.b32 %r2377, %r604, 7; 2026-02-21T10:19:58.0678042Z .loc 1 38 32 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:38:32 2026-02-21T10:19:58.0678400Z or.b32 %r2521, %r2377, %r5; 2026-02-21T10:19:58.0678574Z or.b32 %r2522, %r2377, %r6; 
2026-02-21T10:19:58.0678753Z or.b32 %r2523, %r2377, %r7; 2026-02-21T10:19:58.0679020Z or.b32 %r2524, %r2377, %r8; 2026-02-21T10:19:58.0679194Z or.b32 %r2525, %r2377, %r9; 2026-02-21T10:19:58.0679376Z or.b32 %r2526, %r2377, %r10; 2026-02-21T10:19:58.0679556Z or.b32 %r2527, %r2377, %r11; 2026-02-21T10:19:58.0679728Z or.b32 %r2528, %r2377, %r12; 2026-02-21T10:19:58.0680048Z .loc 1 39 27 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:39:27 2026-02-21T10:19:58.0680409Z shl.b32 %r2375, %r601, 7; 2026-02-21T10:19:58.0680725Z .loc 1 54 53 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:54:53 2026-02-21T10:19:58.0681093Z shl.b32 %r605, %r2521, 13; 2026-02-21T10:19:58.0681265Z shl.b32 %r606, %r2522, 13; 2026-02-21T10:19:58.0681439Z shl.b32 %r607, %r2523, 13; 2026-02-21T10:19:58.0681605Z shl.b32 %r608, %r2524, 13; 2026-02-21T10:19:58.0681780Z shl.b32 %r609, %r2525, 13; 2026-02-21T10:19:58.0681948Z shl.b32 %r610, %r2526, 13; 2026-02-21T10:19:58.0682123Z shl.b32 %r611, %r2527, 13; 2026-02-21T10:19:58.0682379Z shl.b32 %r612, %r2528, 13; 2026-02-21T10:19:58.0682703Z .loc 1 54 60 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:54:60 2026-02-21T10:19:58.0683077Z or.b32 %r613, %r605, %r34; 2026-02-21T10:19:58.0683251Z or.b32 %r614, %r606, %r34; 2026-02-21T10:19:58.0683426Z or.b32 %r615, %r607, %r34; 2026-02-21T10:19:58.0683602Z or.b32 %r616, %r608, %r34; 2026-02-21T10:19:58.0683779Z or.b32 %r617, %r609, %r34; 2026-02-21T10:19:58.0683954Z or.b32 %r618, %r610, %r34; 2026-02-21T10:19:58.0684131Z or.b32 %r619, %r611, %r34; 2026-02-21T10:19:58.0684300Z or.b32 %r620, %r612, %r34; 2026-02-21T10:19:58.0684620Z .loc 1 54 32 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:54:32 2026-02-21T10:19:58.0684990Z mad.wide.s32 %rd26, %r613, 2, %rd6; 2026-02-21T10:19:58.0685198Z mad.wide.s32 %rd27, %r614, 2, %rd6; 2026-02-21T10:19:58.0685402Z mad.wide.s32 %rd28, %r615, 2, %rd6; 2026-02-21T10:19:58.0685603Z mad.wide.s32 %rd29, %r616, 2, %rd6; 
2026-02-21T10:19:58.0685811Z mad.wide.s32 %rd30, %r617, 2, %rd6; 2026-02-21T10:19:58.0686006Z mad.wide.s32 %rd31, %r618, 2, %rd6; 2026-02-21T10:19:58.0686206Z mad.wide.s32 %rd32, %r619, 2, %rd6; 2026-02-21T10:19:58.0686409Z mad.wide.s32 %rd33, %r620, 2, %rd6; 2026-02-21T10:19:58.0686890Z .loc 1 54 80 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:54:80 2026-02-21T10:19:58.0687343Z and.b32 %r45, %r2, 127; 2026-02-21T10:19:58.0687517Z shl.b32 %r621, %r45, 3; 2026-02-21T10:19:58.0687690Z shr.u32 %r622, %r2, 1; 2026-02-21T10:19:58.0687876Z and.b32 %r623, %r622, 24; 2026-02-21T10:19:58.0688057Z xor.b32 %r46, %r621, %r623; 2026-02-21T10:19:58.0688236Z add.s32 %r515, %r586, %r46; 2026-02-21T10:19:58.0688420Z selp.b32 %r516, 8, 0, %p32; 2026-02-21T10:19:58.0688605Z // begin inline asm 2026-02-21T10:19:58.0688844Z cp.async.ca.shared.global [ %r515 + 0 ], [ %rd26 + 0 ], 0x8, %r516; 2026-02-21T10:19:58.0689129Z // end inline asm 2026-02-21T10:19:58.0689283Z add.s32 %r517, %r515, 1024; 2026-02-21T10:19:58.0689464Z // begin inline asm 2026-02-21T10:19:58.0689692Z cp.async.ca.shared.global [ %r517 + 0 ], [ %rd27 + 0 ], 0x8, %r516; 2026-02-21T10:19:58.0689966Z // end inline asm 2026-02-21T10:19:58.0690116Z add.s32 %r519, %r515, 2048; 2026-02-21T10:19:58.0690295Z // begin inline asm 2026-02-21T10:19:58.0690528Z cp.async.ca.shared.global [ %r519 + 0 ], [ %rd28 + 0 ], 0x8, %r516; 2026-02-21T10:19:58.0690806Z // end inline asm 2026-02-21T10:19:58.0691037Z add.s32 %r521, %r515, 3072; 2026-02-21T10:19:58.0691211Z // begin inline asm 2026-02-21T10:19:58.0691438Z cp.async.ca.shared.global [ %r521 + 0 ], [ %rd29 + 0 ], 0x8, %r516; 2026-02-21T10:19:58.0691704Z // end inline asm 2026-02-21T10:19:58.0691853Z add.s32 %r523, %r515, 4096; 2026-02-21T10:19:58.0692022Z // begin inline asm 2026-02-21T10:19:58.0692248Z cp.async.ca.shared.global [ %r523 + 0 ], [ %rd30 + 0 ], 0x8, %r516; 2026-02-21T10:19:58.0692580Z // end inline asm 2026-02-21T10:19:58.0692756Z add.s32 %r525, %r515, 
5120; 2026-02-21T10:19:58.0692935Z // begin inline asm 2026-02-21T10:19:58.0693158Z cp.async.ca.shared.global [ %r525 + 0 ], [ %rd31 + 0 ], 0x8, %r516; 2026-02-21T10:19:58.0693429Z // end inline asm 2026-02-21T10:19:58.0693579Z add.s32 %r527, %r515, 6144; 2026-02-21T10:19:58.0693760Z // begin inline asm 2026-02-21T10:19:58.0693979Z cp.async.ca.shared.global [ %r527 + 0 ], [ %rd32 + 0 ], 0x8, %r516; 2026-02-21T10:19:58.0694251Z // end inline asm 2026-02-21T10:19:58.0694403Z add.s32 %r529, %r515, 7168; 2026-02-21T10:19:58.0694585Z // begin inline asm 2026-02-21T10:19:58.0694808Z cp.async.ca.shared.global [ %r529 + 0 ], [ %rd33 + 0 ], 0x8, %r516; 2026-02-21T10:19:58.0695083Z // end inline asm 2026-02-21T10:19:58.0695243Z cp.async.commit_group; 2026-02-21T10:19:58.0695579Z .loc 1 26 112 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:26:112 2026-02-21T10:19:58.0695955Z bar.sync 0; 2026-02-21T10:19:58.0696111Z and.pred %p22, %p61, %p32; 2026-02-21T10:19:58.0696372Z // begin inline asm 2026-02-21T10:19:58.0696726Z @%p22 mbarrier.arrive.expect_tx.shared.b64 _, [%r512], 2048; 2026-02-21T10:19:58.0697010Z // end inline asm 2026-02-21T10:19:58.0697309Z .loc 1 60 33 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:60:33 2026-02-21T10:19:58.0697668Z bar.sync 0; 2026-02-21T10:19:58.0697828Z elect.sync %r624|%p33, -1; 2026-02-21T10:19:58.0698016Z and.pred %p34, %p32, %p33; 2026-02-21T10:19:58.0698209Z and.pred %p23, %p1, %p34; 2026-02-21T10:19:58.0698389Z add.s32 %r532, %r586, 40960; 2026-02-21T10:19:58.0698572Z // begin inline asm 2026-02-21T10:19:58.0698989Z @%p23 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r532], [%rd34, {%r2375, %r2380}], [%r512]; 2026-02-21T10:19:58.0699454Z // end inline asm 2026-02-21T10:19:58.0699764Z .loc 1 54 32 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:54:32 2026-02-21T10:19:58.0700129Z cvt.s64.s32 %rd55, %r605; 2026-02-21T10:19:58.0700313Z cvt.u64.u32 %rd56, %r34; 
2026-02-21T10:19:58.0700494Z or.b64 %rd57, %rd55, %rd56; 2026-02-21T10:19:58.0700690Z shl.b64 %rd58, %rd57, 1; 2026-02-21T10:19:58.0700865Z add.s64 %rd59, %rd6, %rd58; 2026-02-21T10:19:58.0701046Z add.s64 %rd35, %rd59, 64; 2026-02-21T10:19:58.0701309Z cvt.s64.s32 %rd60, %r606; 2026-02-21T10:19:58.0701486Z or.b64 %rd61, %rd60, %rd56; 2026-02-21T10:19:58.0701664Z shl.b64 %rd62, %rd61, 1; 2026-02-21T10:19:58.0701852Z add.s64 %rd63, %rd6, %rd62; 2026-02-21T10:19:58.0702034Z add.s64 %rd36, %rd63, 64; 2026-02-21T10:19:58.0702200Z cvt.s64.s32 %rd64, %r607; 2026-02-21T10:19:58.0702372Z or.b64 %rd65, %rd64, %rd56; 2026-02-21T10:19:58.0702546Z shl.b64 %rd66, %rd65, 1; 2026-02-21T10:19:58.0702717Z add.s64 %rd67, %rd6, %rd66; 2026-02-21T10:19:58.0702888Z add.s64 %rd37, %rd67, 64; 2026-02-21T10:19:58.0703062Z cvt.s64.s32 %rd68, %r608; 2026-02-21T10:19:58.0703232Z or.b64 %rd69, %rd68, %rd56; 2026-02-21T10:19:58.0703414Z shl.b64 %rd70, %rd69, 1; 2026-02-21T10:19:58.0703588Z add.s64 %rd71, %rd6, %rd70; 2026-02-21T10:19:58.0703762Z add.s64 %rd38, %rd71, 64; 2026-02-21T10:19:58.0703933Z cvt.s64.s32 %rd72, %r609; 2026-02-21T10:19:58.0704106Z or.b64 %rd73, %rd72, %rd56; 2026-02-21T10:19:58.0704288Z shl.b64 %rd74, %rd73, 1; 2026-02-21T10:19:58.0704474Z add.s64 %rd75, %rd6, %rd74; 2026-02-21T10:19:58.0704656Z add.s64 %rd39, %rd75, 64; 2026-02-21T10:19:58.0704823Z cvt.s64.s32 %rd76, %r610; 2026-02-21T10:19:58.0705073Z or.b64 %rd77, %rd76, %rd56; 2026-02-21T10:19:58.0705252Z shl.b64 %rd78, %rd77, 1; 2026-02-21T10:19:58.0705440Z add.s64 %rd79, %rd6, %rd78; 2026-02-21T10:19:58.0705621Z add.s64 %rd40, %rd79, 64; 2026-02-21T10:19:58.0705787Z cvt.s64.s32 %rd80, %r611; 2026-02-21T10:19:58.0705964Z or.b64 %rd81, %rd80, %rd56; 2026-02-21T10:19:58.0706139Z shl.b64 %rd82, %rd81, 1; 2026-02-21T10:19:58.0706313Z add.s64 %rd83, %rd6, %rd82; 2026-02-21T10:19:58.0706673Z add.s64 %rd41, %rd83, 64; 2026-02-21T10:19:58.0706851Z cvt.s64.s32 %rd84, %r612; 2026-02-21T10:19:58.0707039Z or.b64 %rd85, %rd84, 
%rd56; 2026-02-21T10:19:58.0707218Z shl.b64 %rd86, %rd85, 1; 2026-02-21T10:19:58.0707388Z add.s64 %rd87, %rd6, %rd86; 2026-02-21T10:19:58.0707567Z add.s64 %rd42, %rd87, 64; 2026-02-21T10:19:58.0707888Z .loc 1 54 80 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:54:80 2026-02-21T10:19:58.0708246Z add.s32 %r536, %r515, 8192; 2026-02-21T10:19:58.0708508Z // begin inline asm 2026-02-21T10:19:58.0708745Z cp.async.ca.shared.global [ %r536 + 0 ], [ %rd35 + 0 ], 0x8, %r516; 2026-02-21T10:19:58.0709025Z // end inline asm 2026-02-21T10:19:58.0709189Z add.s32 %r538, %r515, 9216; 2026-02-21T10:19:58.0709366Z // begin inline asm 2026-02-21T10:19:58.0709593Z cp.async.ca.shared.global [ %r538 + 0 ], [ %rd36 + 0 ], 0x8, %r516; 2026-02-21T10:19:58.0709869Z // end inline asm 2026-02-21T10:19:58.0710027Z add.s32 %r540, %r515, 10240; 2026-02-21T10:19:58.0710206Z // begin inline asm 2026-02-21T10:19:58.0710523Z cp.async.ca.shared.global [ %r540 + 0 ], [ %rd37 + 0 ], 0x8, %r516; 2026-02-21T10:19:58.0710810Z // end inline asm 2026-02-21T10:19:58.0710969Z add.s32 %r542, %r515, 11264; 2026-02-21T10:19:58.0711143Z // begin inline asm 2026-02-21T10:19:58.0711372Z cp.async.ca.shared.global [ %r542 + 0 ], [ %rd38 + 0 ], 0x8, %r516; 2026-02-21T10:19:58.0711641Z // end inline asm 2026-02-21T10:19:58.0711794Z add.s32 %r544, %r515, 12288; 2026-02-21T10:19:58.0711973Z // begin inline asm 2026-02-21T10:19:58.0712194Z cp.async.ca.shared.global [ %r544 + 0 ], [ %rd39 + 0 ], 0x8, %r516; 2026-02-21T10:19:58.0712465Z // end inline asm 2026-02-21T10:19:58.0712614Z add.s32 %r546, %r515, 13312; 2026-02-21T10:19:58.0712790Z // begin inline asm 2026-02-21T10:19:58.0713012Z cp.async.ca.shared.global [ %r546 + 0 ], [ %rd40 + 0 ], 0x8, %r516; 2026-02-21T10:19:58.0713282Z // end inline asm 2026-02-21T10:19:58.0713429Z add.s32 %r548, %r515, 14336; 2026-02-21T10:19:58.0713608Z // begin inline asm 2026-02-21T10:19:58.0713839Z cp.async.ca.shared.global [ %r548 + 0 ], [ %rd41 + 0 ], 0x8, %r516; 
2026-02-21T10:19:58.0714107Z // end inline asm 2026-02-21T10:19:58.0714258Z add.s32 %r550, %r515, 15360; 2026-02-21T10:19:58.0714432Z // begin inline asm 2026-02-21T10:19:58.0714657Z cp.async.ca.shared.global [ %r550 + 0 ], [ %rd42 + 0 ], 0x8, %r516; 2026-02-21T10:19:58.0715016Z // end inline asm 2026-02-21T10:19:58.0715177Z cp.async.commit_group; 2026-02-21T10:19:58.0715506Z .loc 1 26 112 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:26:112 2026-02-21T10:19:58.0715876Z bar.sync 0; 2026-02-21T10:19:58.0716021Z // begin inline asm 2026-02-21T10:19:58.0716242Z @%p22 mbarrier.arrive.expect_tx.shared.b64 _, [%r513], 2048; 2026-02-21T10:19:58.0716643Z // end inline asm 2026-02-21T10:19:58.0716948Z .loc 1 60 33 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:60:33 2026-02-21T10:19:58.0717324Z bar.sync 0; 2026-02-21T10:19:58.0717477Z elect.sync %r625|%p35, -1; 2026-02-21T10:19:58.0717667Z and.pred %p36, %p32, %p35; 2026-02-21T10:19:58.0717853Z and.pred %p25, %p1, %p36; 2026-02-21T10:19:58.0718037Z add.s32 %r553, %r586, 43008; 2026-02-21T10:19:58.0718211Z // begin inline asm 2026-02-21T10:19:58.0718628Z @%p25 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r553], [%rd34, {%r2375, %r507}], [%r513]; 2026-02-21T10:19:58.0719089Z // end inline asm 2026-02-21T10:19:58.0719471Z .loc 1 54 32 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:54:32 2026-02-21T10:19:58.0719845Z add.s64 %rd44, %rd59, 128; 2026-02-21T10:19:58.0720019Z add.s64 %rd45, %rd63, 128; 2026-02-21T10:19:58.0720205Z add.s64 %rd46, %rd67, 128; 2026-02-21T10:19:58.0720380Z add.s64 %rd47, %rd71, 128; 2026-02-21T10:19:58.0720561Z add.s64 %rd48, %rd75, 128; 2026-02-21T10:19:58.0729116Z add.s64 %rd49, %rd79, 128; 2026-02-21T10:19:58.0729344Z add.s64 %rd50, %rd83, 128; 2026-02-21T10:19:58.0729689Z add.s64 %rd51, %rd87, 128; 2026-02-21T10:19:58.0730053Z .loc 1 54 80 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:54:80 
2026-02-21T10:19:58.0730459Z add.s32 %r557, %r515, 16384; 2026-02-21T10:19:58.0730652Z // begin inline asm 2026-02-21T10:19:58.0730915Z cp.async.ca.shared.global [ %r557 + 0 ], [ %rd44 + 0 ], 0x8, %r516; 2026-02-21T10:19:58.0731208Z // end inline asm 2026-02-21T10:19:58.0731380Z add.s32 %r559, %r515, 17408; 2026-02-21T10:19:58.0731580Z // begin inline asm 2026-02-21T10:19:58.0731831Z cp.async.ca.shared.global [ %r559 + 0 ], [ %rd45 + 0 ], 0x8, %r516; 2026-02-21T10:19:58.0732113Z // end inline asm 2026-02-21T10:19:58.0732268Z add.s32 %r561, %r515, 18432; 2026-02-21T10:19:58.0732450Z // begin inline asm 2026-02-21T10:19:58.0732674Z cp.async.ca.shared.global [ %r561 + 0 ], [ %rd46 + 0 ], 0x8, %r516; 2026-02-21T10:19:58.0732952Z // end inline asm 2026-02-21T10:19:58.0733105Z add.s32 %r563, %r515, 19456; 2026-02-21T10:19:58.0733376Z // begin inline asm 2026-02-21T10:19:58.0733628Z cp.async.ca.shared.global [ %r563 + 0 ], [ %rd47 + 0 ], 0x8, %r516; 2026-02-21T10:19:58.0733908Z // end inline asm 2026-02-21T10:19:58.0734081Z add.s32 %r565, %r515, 20480; 2026-02-21T10:19:58.0734280Z // begin inline asm 2026-02-21T10:19:58.0734520Z cp.async.ca.shared.global [ %r565 + 0 ], [ %rd48 + 0 ], 0x8, %r516; 2026-02-21T10:19:58.0734798Z // end inline asm 2026-02-21T10:19:58.0734962Z add.s32 %r567, %r515, 21504; 2026-02-21T10:19:58.0735143Z // begin inline asm 2026-02-21T10:19:58.0735377Z cp.async.ca.shared.global [ %r567 + 0 ], [ %rd49 + 0 ], 0x8, %r516; 2026-02-21T10:19:58.0735646Z // end inline asm 2026-02-21T10:19:58.0735808Z add.s32 %r569, %r515, 22528; 2026-02-21T10:19:58.0735990Z // begin inline asm 2026-02-21T10:19:58.0736214Z cp.async.ca.shared.global [ %r569 + 0 ], [ %rd50 + 0 ], 0x8, %r516; 2026-02-21T10:19:58.0736647Z // end inline asm 2026-02-21T10:19:58.0736823Z add.s32 %r571, %r515, 23552; 2026-02-21T10:19:58.0737028Z // begin inline asm 2026-02-21T10:19:58.0737250Z cp.async.ca.shared.global [ %r571 + 0 ], [ %rd51 + 0 ], 0x8, %r516; 2026-02-21T10:19:58.0737521Z // end 
inline asm 2026-02-21T10:19:58.0737679Z cp.async.commit_group; 2026-02-21T10:19:58.0738026Z .loc 1 26 112 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:26:112 2026-02-21T10:19:58.0738504Z bar.sync 0; 2026-02-21T10:19:58.0738651Z // begin inline asm 2026-02-21T10:19:58.0738886Z @%p22 mbarrier.arrive.expect_tx.shared.b64 _, [%r514], 2048; 2026-02-21T10:19:58.0739151Z // end inline asm 2026-02-21T10:19:58.0739482Z .loc 1 60 33 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:60:33 2026-02-21T10:19:58.0739994Z bar.sync 0; 2026-02-21T10:19:58.0740171Z elect.sync %r626|%p37, -1; 2026-02-21T10:19:58.0740368Z and.pred %p38, %p32, %p37; 2026-02-21T10:19:58.0740573Z and.pred %p27, %p1, %p38; 2026-02-21T10:19:58.0740765Z add.s32 %r574, %r586, 45056; 2026-02-21T10:19:58.0740949Z mov.b32 %r2381, 32; 2026-02-21T10:19:58.0741118Z // begin inline asm 2026-02-21T10:19:58.0741541Z @%p27 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r574], [%rd34, {%r2375, %r2381}], [%r514]; 2026-02-21T10:19:58.0742023Z // end inline asm 2026-02-21T10:19:58.0742343Z .loc 1 26 112 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:26:112 2026-02-21T10:19:58.0742724Z @%p31 bra $L__BB0_7; 2026-02-21T10:19:58.0743025Z // %bb.1: // %.lr.ph 2026-02-21T10:19:58.0743424Z .loc 1 0 112 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:0:112 2026-02-21T10:19:58.0743860Z ld.param.b64 %rd7, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T10:19:58.0744114Z shr.u32 %r3, %r2, 5; 2026-02-21T10:19:58.0744286Z and.b32 %r4, %r2, 120; 2026-02-21T10:19:58.0744466Z shr.u32 %r587, %r2, 4; 2026-02-21T10:19:58.0744651Z bfe.u32 %r13, %r2, 4, 3; 2026-02-21T10:19:58.0744896Z or.b32 %r14, %r13, 8; 2026-02-21T10:19:58.0745071Z or.b32 %r15, %r13, 16; 2026-02-21T10:19:58.0745242Z or.b32 %r16, %r13, 24; 2026-02-21T10:19:58.0745404Z or.b32 %r17, %r13, 32; 2026-02-21T10:19:58.0745573Z or.b32 %r18, %r13, 40; 2026-02-21T10:19:58.0745733Z or.b32 %r19, %r13, 48; 
2026-02-21T10:19:58.0745918Z or.b32 %r20, %r587, 56; 2026-02-21T10:19:58.0746091Z or.b32 %r21, %r13, 64; 2026-02-21T10:19:58.0746260Z or.b32 %r22, %r13, 72; 2026-02-21T10:19:58.0746422Z or.b32 %r23, %r13, 80; 2026-02-21T10:19:58.0746731Z or.b32 %r24, %r13, 88; 2026-02-21T10:19:58.0746915Z or.b32 %r25, %r13, 96; 2026-02-21T10:19:58.0747088Z or.b32 %r26, %r13, 104; 2026-02-21T10:19:58.0747259Z or.b32 %r27, %r13, 112; 2026-02-21T10:19:58.0747425Z or.b32 %r28, %r587, 120; 2026-02-21T10:19:58.0747605Z shl.b32 %r588, %r2, 3; 2026-02-21T10:19:58.0747770Z and.b32 %r29, %r588, 120; 2026-02-21T10:19:58.0748107Z .loc 1 26 112 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:26:112 2026-02-21T10:19:58.0748635Z shl.b32 %r633, %r32, 8; 2026-02-21T10:19:58.0748824Z add.s32 %r47, %r633, -3; 2026-02-21T10:19:58.0748995Z shl.b32 %r634, %r2, 5; 2026-02-21T10:19:58.0749168Z and.b32 %r635, %r634, 3072; 2026-02-21T10:19:58.0749345Z shl.b32 %r636, %r2, 4; 2026-02-21T10:19:58.0749516Z and.b32 %r637, %r636, 448; 2026-02-21T10:19:58.0749697Z and.b32 %r638, %r2, 3; 2026-02-21T10:19:58.0749858Z shl.b32 %r639, %r638, 1; 2026-02-21T10:19:58.0750039Z and.b32 %r640, %r2, 24; 2026-02-21T10:19:58.0750214Z or.b32 %r641, %r635, %r637; 2026-02-21T10:19:58.0750400Z or.b32 %r642, %r639, %r640; 2026-02-21T10:19:58.0750574Z or.b32 %r48, %r641, %r642; 2026-02-21T10:19:58.0750762Z xor.b32 %r49, %r48, 8; 2026-02-21T10:19:58.0750926Z xor.b32 %r50, %r48, 16; 2026-02-21T10:19:58.0751093Z xor.b32 %r51, %r48, 24; 2026-02-21T10:19:58.0751255Z shl.b32 %r643, %r45, 7; 2026-02-21T10:19:58.0751435Z shl.b32 %r644, %r33, 4; 2026-02-21T10:19:58.0751610Z or.b32 %r645, %r643, %r644; 2026-02-21T10:19:58.0751789Z add.s32 %r647, %r586, 24576; 2026-02-21T10:19:58.0751980Z add.s32 %r52, %r647, %r645; 2026-02-21T10:19:58.0752166Z xor.b32 %r648, %r645, 16; 2026-02-21T10:19:58.0752350Z add.s32 %r53, %r647, %r648; 2026-02-21T10:19:58.0752528Z xor.b32 %r649, %r645, 32; 2026-02-21T10:19:58.0752708Z add.s32 %r54, %r647, 
%r649; 2026-02-21T10:19:58.0752965Z xor.b32 %r650, %r645, 48; 2026-02-21T10:19:58.0753159Z add.s32 %r55, %r647, %r650; 2026-02-21T10:19:58.0753343Z xor.b32 %r651, %r645, 64; 2026-02-21T10:19:58.0753514Z add.s32 %r56, %r647, %r651; 2026-02-21T10:19:58.0753698Z xor.b32 %r652, %r645, 80; 2026-02-21T10:19:58.0753868Z add.s32 %r57, %r647, %r652; 2026-02-21T10:19:58.0754049Z xor.b32 %r653, %r645, 96; 2026-02-21T10:19:58.0754221Z add.s32 %r58, %r647, %r653; 2026-02-21T10:19:58.0754402Z xor.b32 %r654, %r645, 112; 2026-02-21T10:19:58.0754572Z add.s32 %r59, %r647, %r654; 2026-02-21T10:19:58.0754762Z bfe.u32 %r655, %r647, 4, 14; 2026-02-21T10:19:58.0754946Z cvt.u64.u32 %rd88, %r655; 2026-02-21T10:19:58.0755155Z or.b64 %rd92, %rd88, 4611686293372403712; 2026-02-21T10:19:58.0755377Z add.s32 %r656, %r586, 24608; 2026-02-21T10:19:58.0755554Z bfe.u32 %r657, %r656, 4, 14; 2026-02-21T10:19:58.0755740Z cvt.u64.u32 %rd89, %r657; 2026-02-21T10:19:58.0755923Z or.b64 %rd93, %rd89, 4611686293372403712; 2026-02-21T10:19:58.0756135Z add.s32 %r658, %r586, 24640; 2026-02-21T10:19:58.0756309Z bfe.u32 %r659, %r658, 4, 14; 2026-02-21T10:19:58.0756647Z cvt.u64.u32 %rd90, %r659; 2026-02-21T10:19:58.0756919Z or.b64 %rd94, %rd90, 4611686293372403712; 2026-02-21T10:19:58.0757147Z add.s32 %r660, %r586, 24672; 2026-02-21T10:19:58.0757323Z bfe.u32 %r661, %r660, 4, 14; 2026-02-21T10:19:58.0757502Z cvt.u64.u32 %rd91, %r661; 2026-02-21T10:19:58.0757690Z or.b64 %rd95, %rd91, 4611686293372403712; 2026-02-21T10:19:58.0757891Z shl.b32 %r662, %r638, 11; 2026-02-21T10:19:58.0758071Z shl.b32 %r663, %r638, 5; 2026-02-21T10:19:58.0758238Z shl.b32 %r664, %r4, 4; 2026-02-21T10:19:58.0758491Z and.b32 %r666, %r585, 16; 2026-02-21T10:19:58.0758664Z or.b32 %r667, %r664, %r666; 2026-02-21T10:19:58.0758846Z or.b32 %r668, %r667, %r662; 2026-02-21T10:19:58.0759019Z or.b32 %r669, %r668, %r663; 2026-02-21T10:19:58.0759200Z add.s32 %r60, %r647, %r669; 2026-02-21T10:19:58.0759381Z xor.b32 %r670, %r669, 32; 
2026-02-21T10:19:58.0759566Z add.s32 %r61, %r647, %r670; 2026-02-21T10:19:58.0759748Z xor.b32 %r671, %r669, 64; 2026-02-21T10:19:58.0759917Z add.s32 %r62, %r647, %r671; 2026-02-21T10:19:58.0760100Z xor.b32 %r672, %r669, 96; 2026-02-21T10:19:58.0760270Z add.s32 %r63, %r647, %r672; 2026-02-21T10:19:58.0760449Z shl.b32 %r673, %r640, 8; 2026-02-21T10:19:58.0760616Z and.b32 %r674, %r585, 496; 2026-02-21T10:19:58.0760799Z or.b32 %r675, %r673, %r663; 2026-02-21T10:19:58.0760973Z xor.b32 %r676, %r675, %r674; 2026-02-21T10:19:58.0761158Z add.s32 %r2125, %r647, %r676; 2026-02-21T10:19:58.0761351Z add.s32 %r2130, %r2125, 512; 2026-02-21T10:19:58.0761530Z add.s32 %r2135, %r2125, 1024; 2026-02-21T10:19:58.0761803Z add.s32 %r2140, %r2125, 1536; 2026-02-21T10:19:58.0761998Z shl.b32 %r677, %r30, 8; 2026-02-21T10:19:58.0762178Z shl.b32 %r678, %r31, 8; 2026-02-21T10:19:58.0762347Z add.s32 %r68, %r677, %r678; 2026-02-21T10:19:58.0762537Z mov.b32 %r2387, 0f00000000; 2026-02-21T10:19:58.0762715Z mov.b32 %r2384, 2; 2026-02-21T10:19:58.0762876Z mov.b32 %r2383, -1; 2026-02-21T10:19:58.0763055Z mov.b32 %r2376, %r2375; 2026-02-21T10:19:58.0763229Z mov.b32 %r2378, %r2377; 2026-02-21T10:19:58.0763409Z mov.b32 %r2382, %r2380; 2026-02-21T10:19:58.0763575Z mov.b32 %r2385, %r2375; 2026-02-21T10:19:58.0763744Z mov.b32 %r2386, %r2377; 2026-02-21T10:19:58.0763910Z mov.b32 %r2388, %r2387; 2026-02-21T10:19:58.0764079Z mov.b32 %r2389, %r2387; 2026-02-21T10:19:58.0764243Z mov.b32 %r2390, %r2387; 2026-02-21T10:19:58.0764409Z mov.b32 %r2391, %r2387; 2026-02-21T10:19:58.0764578Z mov.b32 %r2392, %r2387; 2026-02-21T10:19:58.0764742Z mov.b32 %r2393, %r2387; 2026-02-21T10:19:58.0764916Z mov.b32 %r2394, %r2387; 2026-02-21T10:19:58.0765080Z mov.b32 %r2395, %r2387; 2026-02-21T10:19:58.0765251Z mov.b32 %r2396, %r2387; 2026-02-21T10:19:58.0765413Z mov.b32 %r2397, %r2387; 2026-02-21T10:19:58.0765584Z mov.b32 %r2398, %r2387; 2026-02-21T10:19:58.0765745Z mov.b32 %r2399, %r2387; 2026-02-21T10:19:58.0766006Z mov.b32 
%r2400, %r2387; 2026-02-21T10:19:58.0766182Z mov.b32 %r2401, %r2387; 2026-02-21T10:19:58.0766356Z mov.b32 %r2402, %r2387; 2026-02-21T10:19:58.0766649Z mov.b32 %r2403, %r2387; 2026-02-21T10:19:58.0766814Z mov.b32 %r2404, %r2387; 2026-02-21T10:19:58.0766993Z mov.b32 %r2405, %r2387; 2026-02-21T10:19:58.0767163Z mov.b32 %r2406, %r2387; 2026-02-21T10:19:58.0767333Z mov.b32 %r2407, %r2387; 2026-02-21T10:19:58.0767494Z mov.b32 %r2408, %r2387; 2026-02-21T10:19:58.0767658Z mov.b32 %r2409, %r2387; 2026-02-21T10:19:58.0767817Z mov.b32 %r2410, %r2387; 2026-02-21T10:19:58.0767982Z mov.b32 %r2411, %r2387; 2026-02-21T10:19:58.0768146Z mov.b32 %r2412, %r2387; 2026-02-21T10:19:58.0768311Z mov.b32 %r2413, %r2387; 2026-02-21T10:19:58.0768479Z mov.b32 %r2414, %r2387; 2026-02-21T10:19:58.0768639Z mov.b32 %r2415, %r2387; 2026-02-21T10:19:58.0768804Z mov.b32 %r2416, %r2387; 2026-02-21T10:19:58.0768964Z mov.b32 %r2417, %r2387; 2026-02-21T10:19:58.0769133Z mov.b32 %r2418, %r2387; 2026-02-21T10:19:58.0769297Z mov.b32 %r2419, %r2387; 2026-02-21T10:19:58.0769465Z mov.b32 %r2420, %r2387; 2026-02-21T10:19:58.0769626Z mov.b32 %r2421, %r2387; 2026-02-21T10:19:58.0769876Z mov.b32 %r2422, %r2387; 2026-02-21T10:19:58.0770043Z mov.b32 %r2423, %r2387; 2026-02-21T10:19:58.0770209Z mov.b32 %r2424, %r2387; 2026-02-21T10:19:58.0770372Z mov.b32 %r2425, %r2387; 2026-02-21T10:19:58.0770536Z mov.b32 %r2426, %r2387; 2026-02-21T10:19:58.0770702Z mov.b32 %r2427, %r2387; 2026-02-21T10:19:58.0770862Z mov.b32 %r2428, %r2387; 2026-02-21T10:19:58.0771022Z mov.b32 %r2429, %r2387; 2026-02-21T10:19:58.0771184Z mov.b32 %r2430, %r2387; 2026-02-21T10:19:58.0771429Z mov.b32 %r2431, %r2387; 2026-02-21T10:19:58.0771591Z mov.b32 %r2432, %r2387; 2026-02-21T10:19:58.0771748Z mov.b32 %r2433, %r2387; 2026-02-21T10:19:58.0771906Z mov.b32 %r2434, %r2387; 2026-02-21T10:19:58.0772063Z mov.b32 %r2435, %r2387; 2026-02-21T10:19:58.0772220Z mov.b32 %r2436, %r2387; 2026-02-21T10:19:58.0772380Z mov.b32 %r2437, %r2387; 
2026-02-21T10:19:58.0772544Z mov.b32 %r2438, %r2387; 2026-02-21T10:19:58.0772702Z mov.b32 %r2439, %r2387; 2026-02-21T10:19:58.0772859Z mov.b32 %r2440, %r2387; 2026-02-21T10:19:58.0773022Z mov.b32 %r2441, %r2387; 2026-02-21T10:19:58.0773181Z mov.b32 %r2442, %r2387; 2026-02-21T10:19:58.0773342Z mov.b32 %r2443, %r2387; 2026-02-21T10:19:58.0773498Z mov.b32 %r2444, %r2387; 2026-02-21T10:19:58.0773657Z mov.b32 %r2445, %r2387; 2026-02-21T10:19:58.0773816Z mov.b32 %r2446, %r2387; 2026-02-21T10:19:58.0773975Z mov.b32 %r2447, %r2387; 2026-02-21T10:19:58.0774136Z mov.b32 %r2448, %r2387; 2026-02-21T10:19:58.0774295Z mov.b32 %r2449, %r2387; 2026-02-21T10:19:58.0774546Z mov.b32 %r2450, %r2387; 2026-02-21T10:19:58.0774711Z mov.b32 %r2451, %r2387; 2026-02-21T10:19:58.0774869Z mov.b32 %r2452, %r2387; 2026-02-21T10:19:58.0775024Z mov.b32 %r2453, %r2387; 2026-02-21T10:19:58.0775186Z mov.b32 %r2454, %r2387; 2026-02-21T10:19:58.0775342Z mov.b32 %r2455, %r2387; 2026-02-21T10:19:58.0775507Z mov.b32 %r2456, %r2387; 2026-02-21T10:19:58.0775664Z mov.b32 %r2457, %r2387; 2026-02-21T10:19:58.0775825Z mov.b32 %r2458, %r2387; 2026-02-21T10:19:58.0775983Z mov.b32 %r2459, %r2387; 2026-02-21T10:19:58.0776145Z mov.b32 %r2460, %r2387; 2026-02-21T10:19:58.0776304Z mov.b32 %r2461, %r2387; 2026-02-21T10:19:58.0776591Z mov.b32 %r2462, %r2387; 2026-02-21T10:19:58.0776760Z mov.b32 %r2463, %r2387; 2026-02-21T10:19:58.0776934Z mov.b32 %r2464, %r2387; 2026-02-21T10:19:58.0777098Z mov.b32 %r2465, %r2387; 2026-02-21T10:19:58.0777256Z mov.b32 %r2466, %r2387; 2026-02-21T10:19:58.0777419Z mov.b32 %r2467, %r2387; 2026-02-21T10:19:58.0777580Z mov.b32 %r2468, %r2387; 2026-02-21T10:19:58.0777740Z mov.b32 %r2469, %r2387; 2026-02-21T10:19:58.0777899Z mov.b32 %r2470, %r2387; 2026-02-21T10:19:58.0778058Z mov.b32 %r2471, %r2387; 2026-02-21T10:19:58.0778219Z mov.b32 %r2472, %r2387; 2026-02-21T10:19:58.0778379Z mov.b32 %r2473, %r2387; 2026-02-21T10:19:58.0778540Z mov.b32 %r2474, %r2387; 2026-02-21T10:19:58.0778813Z mov.b32 
%r2475, %r2387; 2026-02-21T10:19:58.0778977Z mov.b32 %r2476, %r2387; 2026-02-21T10:19:58.0779135Z mov.b32 %r2477, %r2387; 2026-02-21T10:19:58.0779296Z mov.b32 %r2478, %r2387; 2026-02-21T10:19:58.0779453Z mov.b32 %r2479, %r2387; 2026-02-21T10:19:58.0779614Z mov.b32 %r2480, %r2387; 2026-02-21T10:19:58.0779774Z mov.b32 %r2481, %r2387; 2026-02-21T10:19:58.0779935Z mov.b32 %r2482, %r2387; 2026-02-21T10:19:58.0780100Z mov.b32 %r2483, %r2387; 2026-02-21T10:19:58.0780271Z mov.b32 %r2484, %r2387; 2026-02-21T10:19:58.0780433Z mov.b32 %r2485, %r2387; 2026-02-21T10:19:58.0780589Z mov.b32 %r2486, %r2387; 2026-02-21T10:19:58.0780749Z mov.b32 %r2487, %r2387; 2026-02-21T10:19:58.0780909Z mov.b32 %r2488, %r2387; 2026-02-21T10:19:58.0781068Z mov.b32 %r2489, %r2387; 2026-02-21T10:19:58.0781223Z mov.b32 %r2490, %r2387; 2026-02-21T10:19:58.0781385Z mov.b32 %r2491, %r2387; 2026-02-21T10:19:58.0781542Z mov.b32 %r2492, %r2387; 2026-02-21T10:19:58.0781704Z mov.b32 %r2493, %r2387; 2026-02-21T10:19:58.0781863Z mov.b32 %r2494, %r2387; 2026-02-21T10:19:58.0782023Z mov.b32 %r2495, %r2387; 2026-02-21T10:19:58.0782181Z mov.b32 %r2496, %r2387; 2026-02-21T10:19:58.0782422Z mov.b32 %r2497, %r2387; 2026-02-21T10:19:58.0782589Z mov.b32 %r2498, %r2387; 2026-02-21T10:19:58.0782751Z mov.b32 %r2499, %r2387; 2026-02-21T10:19:58.0782910Z mov.b32 %r2500, %r2387; 2026-02-21T10:19:58.0783068Z mov.b32 %r2501, %r2387; 2026-02-21T10:19:58.0783232Z mov.b32 %r2502, %r2387; 2026-02-21T10:19:58.0783390Z mov.b32 %r2503, %r2387; 2026-02-21T10:19:58.0783549Z mov.b32 %r2504, %r2387; 2026-02-21T10:19:58.0783778Z mov.b32 %r2505, %r2387; 2026-02-21T10:19:58.0783936Z mov.b32 %r2506, %r2387; 2026-02-21T10:19:58.0784098Z mov.b32 %r2507, %r2387; 2026-02-21T10:19:58.0784258Z mov.b32 %r2508, %r2387; 2026-02-21T10:19:58.0784423Z mov.b32 %r2509, %r2387; 2026-02-21T10:19:58.0784579Z mov.b32 %r2510, %r2387; 2026-02-21T10:19:58.0784741Z mov.b32 %r2511, %r2387; 2026-02-21T10:19:58.0784898Z mov.b32 %r2512, %r2387; 
2026-02-21T10:19:58.0785072Z mov.b32 %r2513, %r2387; 2026-02-21T10:19:58.0785230Z mov.b32 %r2514, %r2387; 2026-02-21T10:19:58.0785393Z mov.b32 %r2516, %r2384; 2026-02-21T10:19:58.0785556Z mov.b32 %r2517, %r2380; 2026-02-21T10:19:58.0785713Z mov.b32 %r2518, %r2386; 2026-02-21T10:19:58.0785879Z mov.b32 %r2519, %r2385; 2026-02-21T10:19:58.0786038Z bra.uni $L__BB0_2; 2026-02-21T10:19:58.0786255Z $L__BB0_6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:19:58.0786981Z .loc 1 26 112 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:26:112 2026-02-21T10:19:58.0787442Z add.s32 %r2517, %r2517, 1; 2026-02-21T10:19:58.0787632Z setp.ne.b32 %p60, %r68, %r2517; 2026-02-21T10:19:58.0787823Z mov.b32 %r2375, %r2385; 2026-02-21T10:19:58.0787987Z mov.b32 %r2376, %r77; 2026-02-21T10:19:58.0788147Z mov.b32 %r2377, %r2386; 2026-02-21T10:19:58.0788309Z mov.b32 %r2378, %r79; 2026-02-21T10:19:58.0788556Z mov.b32 %r2379, %r2516; 2026-02-21T10:19:58.0788723Z mov.b32 %r2380, %r81; 2026-02-21T10:19:58.0788876Z mov.b32 %r2385, %r2519; 2026-02-21T10:19:58.0789034Z mov.b32 %r2386, %r2518; 2026-02-21T10:19:58.0789197Z mov.b32 %r2516, %r220; 2026-02-21T10:19:58.0789364Z @%p60 bra $L__BB0_2; 2026-02-21T10:19:58.0789519Z bra.uni $L__BB0_7; 2026-02-21T10:19:58.0789733Z $L__BB0_2: // =>This Inner Loop Header: Depth=1 2026-02-21T10:19:58.0790150Z .loc 1 0 112 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:0:112 2026-02-21T10:19:58.0790513Z mov.b32 %r81, %r2379; 2026-02-21T10:19:58.0790685Z mov.b32 %r79, %r2377; 2026-02-21T10:19:58.0790840Z mov.b32 %r77, %r2375; 2026-02-21T10:19:58.0791151Z .loc 1 26 112 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:26:112 2026-02-21T10:19:58.0791513Z add.s32 %r679, %r2516, 1; 2026-02-21T10:19:58.0791696Z setp.eq.b32 %p39, %r2516, 255; 2026-02-21T10:19:58.0791978Z selp.b32 %r220, 0, %r679, %p39; 2026-02-21T10:19:58.0792180Z setp.ne.b32 %p40, %r220, 0; 2026-02-21T10:19:58.0792365Z @%p40 bra $L__BB0_4; 2026-02-21T10:19:58.0792574Z // 
%bb.3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:19:58.0792831Z add.s32 %r2520, %r2520, 8448; 2026-02-21T10:19:58.0793158Z .loc 1 32 35 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:32:35 2026-02-21T10:19:58.0793522Z mul.hi.s32 %r680, %r2520, 1717986919; 2026-02-21T10:19:58.0793723Z shr.u32 %r681, %r680, 31; 2026-02-21T10:19:58.0793899Z shr.s32 %r682, %r680, 5; 2026-02-21T10:19:58.0794069Z add.s32 %r683, %r682, %r681; 2026-02-21T10:19:58.0794398Z .loc 1 33 33 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:33:33 2026-02-21T10:19:58.0794752Z shl.b32 %r684, %r683, 3; 2026-02-21T10:19:58.0795058Z .loc 1 34 39 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:34:39 2026-02-21T10:19:58.0795414Z sub.s32 %r685, 512, %r684; 2026-02-21T10:19:58.0795725Z .loc 1 34 52 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:34:52 2026-02-21T10:19:58.0796176Z min.s32 %r686, %r685, 8; 2026-02-21T10:19:58.0796624Z .loc 1 35 45 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:35:45 2026-02-21T10:19:58.0796993Z mul.lo.s32 %r687, %r683, 80; 2026-02-21T10:19:58.0797190Z sub.s32 %r688, %r2520, %r687; 2026-02-21T10:19:58.0797532Z .loc 1 36 51 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:36:51 2026-02-21T10:19:58.0797886Z div.s32 %r689, %r688, %r686; 2026-02-21T10:19:58.0798297Z .loc 1 35 64 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:35:64 2026-02-21T10:19:58.0798664Z mul.lo.s32 %r690, %r689, %r686; 2026-02-21T10:19:58.0798850Z sub.s32 %r691, %r688, %r690; 2026-02-21T10:19:58.0799170Z .loc 1 35 30 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:35:30 2026-02-21T10:19:58.0799522Z add.s32 %r692, %r691, %r684; 2026-02-21T10:19:58.0799837Z .loc 1 37 27 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:37:27 2026-02-21T10:19:58.0800188Z shl.b32 %r2518, %r692, 7; 2026-02-21T10:19:58.0800498Z .loc 1 38 32 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:38:32 
2026-02-21T10:19:58.0800849Z or.b32 %r2521, %r2518, %r5; 2026-02-21T10:19:58.0801024Z or.b32 %r2522, %r2518, %r6; 2026-02-21T10:19:58.0801197Z or.b32 %r2523, %r2518, %r7; 2026-02-21T10:19:58.0801365Z or.b32 %r2524, %r2518, %r8; 2026-02-21T10:19:58.0801534Z or.b32 %r2525, %r2518, %r9; 2026-02-21T10:19:58.0801786Z or.b32 %r2526, %r2518, %r10; 2026-02-21T10:19:58.0801964Z or.b32 %r2527, %r2518, %r11; 2026-02-21T10:19:58.0802137Z or.b32 %r2528, %r2518, %r12; 2026-02-21T10:19:58.0802457Z .loc 1 39 27 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:39:27 2026-02-21T10:19:58.0802814Z shl.b32 %r2519, %r689, 7; 2026-02-21T10:19:58.0803032Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:19:58.0803442Z .loc 1 26 112 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:26:112 2026-02-21T10:19:58.0803810Z setp.eq.b32 %p51, %r220, 0; 2026-02-21T10:19:58.0803994Z setp.lt.s32 %p52, %r2517, %r47; 2026-02-21T10:19:58.0804197Z add.s32 %r2034, %r2383, 1; 2026-02-21T10:19:58.0804376Z setp.gt.s32 %p55, %r2034, 2; 2026-02-21T10:19:58.0804564Z selp.b32 %r2383, 0, %r2034, %p55; 2026-02-21T10:19:58.0804756Z selp.b32 %r2035, 1, 0, %p55; 2026-02-21T10:19:58.0804935Z xor.b32 %r2382, %r2382, %r2035; 2026-02-21T10:19:58.0805265Z .loc 1 54 80 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:54:80 2026-02-21T10:19:58.0805623Z cp.async.wait_group 2; 2026-02-21T10:19:58.0805792Z bar.sync 0; 2026-02-21T10:19:58.0805938Z shl.b32 %r2036, %r2383, 13; 2026-02-21T10:19:58.0806197Z add.s32 %r2038, %r586, %r2036; 2026-02-21T10:19:58.0806646Z .loc 1 58 32 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:58:32 2026-02-21T10:19:58.0807019Z add.s32 %r2039, %r2038, %r48; 2026-02-21T10:19:58.0807203Z ld.shared.b16 %rs1, [%r2039]; 2026-02-21T10:19:58.0807389Z ld.shared.b16 %rs2, [%r2039+512]; 2026-02-21T10:19:58.0807587Z ld.shared.b16 %rs3, [%r2039+32]; 2026-02-21T10:19:58.0807782Z ld.shared.b16 %rs4, [%r2039+544]; 2026-02-21T10:19:58.0807977Z ld.shared.b16 %rs5, 
[%r2039+4096]; 2026-02-21T10:19:58.0808171Z ld.shared.b16 %rs6, [%r2039+4608]; 2026-02-21T10:19:58.0808365Z ld.shared.b16 %rs7, [%r2039+4128]; 2026-02-21T10:19:58.0808558Z ld.shared.b16 %rs8, [%r2039+4640]; 2026-02-21T10:19:58.0808753Z add.s32 %r2040, %r2038, %r49; 2026-02-21T10:19:58.0808931Z ld.shared.b16 %rs9, [%r2040]; 2026-02-21T10:19:58.0809115Z ld.shared.b16 %rs10, [%r2040+512]; 2026-02-21T10:19:58.0809312Z ld.shared.b16 %rs11, [%r2040+32]; 2026-02-21T10:19:58.0809506Z ld.shared.b16 %rs12, [%r2040+544]; 2026-02-21T10:19:58.0809703Z ld.shared.b16 %rs13, [%r2040+4096]; 2026-02-21T10:19:58.0809918Z ld.shared.b16 %rs14, [%r2040+4608]; 2026-02-21T10:19:58.0810195Z ld.shared.b16 %rs15, [%r2040+4128]; 2026-02-21T10:19:58.0810396Z ld.shared.b16 %rs16, [%r2040+4640]; 2026-02-21T10:19:58.0810590Z add.s32 %r2041, %r2038, %r50; 2026-02-21T10:19:58.0810768Z ld.shared.b16 %rs17, [%r2041]; 2026-02-21T10:19:58.0810955Z ld.shared.b16 %rs18, [%r2041+512]; 2026-02-21T10:19:58.0811148Z ld.shared.b16 %rs19, [%r2041+32]; 2026-02-21T10:19:58.0811357Z ld.shared.b16 %rs20, [%r2041+544]; 2026-02-21T10:19:58.0811621Z ld.shared.b16 %rs21, [%r2041+4096]; 2026-02-21T10:19:58.0811813Z ld.shared.b16 %rs22, [%r2041+4608]; 2026-02-21T10:19:58.0812010Z ld.shared.b16 %rs23, [%r2041+4128]; 2026-02-21T10:19:58.0812202Z ld.shared.b16 %rs24, [%r2041+4640]; 2026-02-21T10:19:58.0812391Z add.s32 %r2042, %r2038, %r51; 2026-02-21T10:19:58.0812567Z ld.shared.b16 %rs25, [%r2042]; 2026-02-21T10:19:58.0812759Z ld.shared.b16 %rs26, [%r2042+512]; 2026-02-21T10:19:58.0812960Z ld.shared.b16 %rs27, [%r2042+32]; 2026-02-21T10:19:58.0813159Z ld.shared.b16 %rs28, [%r2042+544]; 2026-02-21T10:19:58.0813354Z ld.shared.b16 %rs29, [%r2042+4096]; 2026-02-21T10:19:58.0813550Z ld.shared.b16 %rs30, [%r2042+4608]; 2026-02-21T10:19:58.0813756Z ld.shared.b16 %rs31, [%r2042+4128]; 2026-02-21T10:19:58.0813956Z ld.shared.b16 %rs32, [%r2042+4640]; 2026-02-21T10:19:58.0814174Z cvt.f32.bf16 %r823, %rs1; 2026-02-21T10:19:58.0814352Z 
cvt.f32.bf16 %r824, %rs2; 2026-02-21T10:19:58.0814534Z cvt.f32.bf16 %r825, %rs9; 2026-02-21T10:19:58.0814716Z cvt.f32.bf16 %r826, %rs10; 2026-02-21T10:19:58.0815004Z cvt.f32.bf16 %r955, %rs17; 2026-02-21T10:19:58.0815197Z cvt.f32.bf16 %r956, %rs18; 2026-02-21T10:19:58.0815378Z cvt.f32.bf16 %r957, %rs25; 2026-02-21T10:19:58.0815566Z cvt.f32.bf16 %r958, %rs26; 2026-02-21T10:19:58.0815748Z cvt.f32.bf16 %r1087, %rs3; 2026-02-21T10:19:58.0815933Z cvt.f32.bf16 %r1088, %rs4; 2026-02-21T10:19:58.0816116Z cvt.f32.bf16 %r1089, %rs11; 2026-02-21T10:19:58.0816311Z cvt.f32.bf16 %r1090, %rs12; 2026-02-21T10:19:58.0816616Z cvt.f32.bf16 %r1219, %rs19; 2026-02-21T10:19:58.0816821Z cvt.f32.bf16 %r1220, %rs20; 2026-02-21T10:19:58.0817009Z cvt.f32.bf16 %r1221, %rs27; 2026-02-21T10:19:58.0817200Z cvt.f32.bf16 %r1222, %rs28; 2026-02-21T10:19:58.0817388Z cvt.f32.bf16 %r1351, %rs5; 2026-02-21T10:19:58.0817567Z cvt.f32.bf16 %r1352, %rs6; 2026-02-21T10:19:58.0817757Z cvt.f32.bf16 %r1353, %rs13; 2026-02-21T10:19:58.0817931Z cvt.f32.bf16 %r1354, %rs14; 2026-02-21T10:19:58.0818114Z cvt.f32.bf16 %r1483, %rs21; 2026-02-21T10:19:58.0818301Z cvt.f32.bf16 %r1484, %rs22; 2026-02-21T10:19:58.0818478Z cvt.f32.bf16 %r1485, %rs29; 2026-02-21T10:19:58.0818650Z cvt.f32.bf16 %r1486, %rs30; 2026-02-21T10:19:58.0818824Z cvt.f32.bf16 %r1615, %rs7; 2026-02-21T10:19:58.0818994Z cvt.f32.bf16 %r1616, %rs8; 2026-02-21T10:19:58.0819167Z cvt.f32.bf16 %r1617, %rs15; 2026-02-21T10:19:58.0819443Z cvt.f32.bf16 %r1618, %rs16; 2026-02-21T10:19:58.0819615Z cvt.f32.bf16 %r1747, %rs23; 2026-02-21T10:19:58.0819788Z cvt.f32.bf16 %r1748, %rs24; 2026-02-21T10:19:58.0819959Z cvt.f32.bf16 %r1749, %rs31; 2026-02-21T10:19:58.0820131Z cvt.f32.bf16 %r1750, %rs32; 2026-02-21T10:19:58.0820460Z .loc 1 26 112 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:26:112 2026-02-21T10:19:58.0820827Z shl.b32 %r2043, %r2383, 3; 2026-02-21T10:19:58.0821002Z add.s32 %r693, %r512, %r2043; 2026-02-21T10:19:58.0821181Z // begin inline asm 
2026-02-21T10:19:58.0821337Z 2026-02-21T10:19:58.0821458Z { 2026-02-21T10:19:58.0821592Z .reg .pred complete; 2026-02-21T10:19:58.0821752Z waitLoop: 2026-02-21T10:19:58.0821974Z mbarrier.try_wait.parity.shared.b64 complete, [%r693], %r2382; 2026-02-21T10:19:58.0822263Z @!complete bra.uni waitLoop; 2026-02-21T10:19:58.0822446Z } 2026-02-21T10:19:58.0822519Z 2026-02-21T10:19:58.0822580Z // end inline asm 2026-02-21T10:19:58.0822889Z .loc 1 60 33 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:60:33 2026-02-21T10:19:58.0823250Z shl.b32 %r2045, %r2383, 11; 2026-02-21T10:19:58.0823499Z add.s32 %r2047, %r532, %r2045; 2026-02-21T10:19:58.0823830Z .loc 1 78 58 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:78:58 2026-02-21T10:19:58.0824183Z add.s32 %r2048, %r2047, %r45; 2026-02-21T10:19:58.0824375Z xor.b32 %r2049, %r45, 16; 2026-02-21T10:19:58.0824551Z add.s32 %r2050, %r2047, %r2049; 2026-02-21T10:19:58.0824746Z xor.b32 %r2051, %r45, 32; 2026-02-21T10:19:58.0824916Z add.s32 %r2052, %r2047, %r2051; 2026-02-21T10:19:58.0825167Z xor.b32 %r2053, %r45, 48; 2026-02-21T10:19:58.0825334Z add.s32 %r2054, %r2047, %r2053; 2026-02-21T10:19:58.0825515Z xor.b32 %r2055, %r45, 64; 2026-02-21T10:19:58.0825690Z add.s32 %r2056, %r2047, %r2055; 2026-02-21T10:19:58.0825866Z xor.b32 %r2057, %r45, 80; 2026-02-21T10:19:58.0826040Z add.s32 %r2058, %r2047, %r2057; 2026-02-21T10:19:58.0826219Z xor.b32 %r2059, %r45, 96; 2026-02-21T10:19:58.0826388Z add.s32 %r2060, %r2047, %r2059; 2026-02-21T10:19:58.0826693Z xor.b32 %r2061, %r45, 112; 2026-02-21T10:19:58.0826883Z add.s32 %r2062, %r2047, %r2061; 2026-02-21T10:19:58.0827208Z .loc 1 65 25 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:65:25 2026-02-21T10:19:58.0827566Z ld.shared.s8 %rs33, [%r2048]; 2026-02-21T10:19:58.0827894Z .loc 1 63 28 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:63:28 2026-02-21T10:19:58.0828244Z shl.b16 %rs34, %rs33, 4; 2026-02-21T10:19:58.0828722Z .loc 1 65 25 // 
chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:65:25 2026-02-21T10:19:58.0829083Z ld.shared.s8 %rs35, [%r2050+128]; 2026-02-21T10:19:58.0829421Z .loc 1 63 28 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:63:28 2026-02-21T10:19:58.0829770Z shl.b16 %rs36, %rs35, 4; 2026-02-21T10:19:58.0830082Z .loc 1 65 25 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:65:25 2026-02-21T10:19:58.0830439Z ld.shared.s8 %rs37, [%r2052+256]; 2026-02-21T10:19:58.0830779Z .loc 1 63 28 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:63:28 2026-02-21T10:19:58.0831135Z shl.b16 %rs38, %rs37, 4; 2026-02-21T10:19:58.0831438Z .loc 1 65 25 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:65:25 2026-02-21T10:19:58.0831794Z ld.shared.s8 %rs39, [%r2054+384]; 2026-02-21T10:19:58.0832126Z .loc 1 63 28 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:63:28 2026-02-21T10:19:58.0832478Z shl.b16 %rs40, %rs39, 4; 2026-02-21T10:19:58.0832786Z .loc 1 65 25 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:65:25 2026-02-21T10:19:58.0833143Z ld.shared.s8 %rs41, [%r2056+512]; 2026-02-21T10:19:58.0833472Z .loc 1 63 28 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:63:28 2026-02-21T10:19:58.0833908Z shl.b16 %rs42, %rs41, 4; 2026-02-21T10:19:58.0834218Z .loc 1 65 25 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:65:25 2026-02-21T10:19:58.0834577Z ld.shared.s8 %rs43, [%r2058+640]; 2026-02-21T10:19:58.0834906Z .loc 1 63 28 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:63:28 2026-02-21T10:19:58.0835255Z shl.b16 %rs44, %rs43, 4; 2026-02-21T10:19:58.0835561Z .loc 1 65 25 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:65:25 2026-02-21T10:19:58.0835918Z ld.shared.s8 %rs45, [%r2060+768]; 2026-02-21T10:19:58.0836243Z .loc 1 63 28 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:63:28 2026-02-21T10:19:58.0836732Z shl.b16 %rs46, %rs45, 4; 2026-02-21T10:19:58.0837042Z .loc 1 65 25 // 
chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:65:25 2026-02-21T10:19:58.0837414Z ld.shared.s8 %rs47, [%r2062+896]; 2026-02-21T10:19:58.0837745Z .loc 1 63 28 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:63:28 2026-02-21T10:19:58.0838169Z shl.b16 %rs48, %rs47, 4; 2026-02-21T10:19:58.0838482Z .loc 1 65 25 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:65:25 2026-02-21T10:19:58.0838836Z ld.shared.s8 %rs49, [%r2048+1024]; 2026-02-21T10:19:58.0839172Z .loc 1 63 28 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:63:28 2026-02-21T10:19:58.0839520Z shl.b16 %rs50, %rs49, 4; 2026-02-21T10:19:58.0839909Z .loc 1 65 25 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:65:25 2026-02-21T10:19:58.0840317Z ld.shared.s8 %rs51, [%r2050+1152]; 2026-02-21T10:19:58.0840739Z .loc 1 63 28 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:63:28 2026-02-21T10:19:58.0841109Z shl.b16 %rs52, %rs51, 4; 2026-02-21T10:19:58.0841424Z .loc 1 65 25 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:65:25 2026-02-21T10:19:58.0841786Z ld.shared.s8 %rs53, [%r2052+1280]; 2026-02-21T10:19:58.0842122Z .loc 1 63 28 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:63:28 2026-02-21T10:19:58.0842471Z shl.b16 %rs54, %rs53, 4; 2026-02-21T10:19:58.0842781Z .loc 1 65 25 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:65:25 2026-02-21T10:19:58.0843133Z ld.shared.s8 %rs55, [%r2054+1408]; 2026-02-21T10:19:58.0843552Z .loc 1 63 28 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:63:28 2026-02-21T10:19:58.0843914Z shl.b16 %rs56, %rs55, 4; 2026-02-21T10:19:58.0844220Z .loc 1 65 25 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:65:25 2026-02-21T10:19:58.0844581Z ld.shared.s8 %rs57, [%r2056+1536]; 2026-02-21T10:19:58.0844913Z .loc 1 63 28 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:63:28 2026-02-21T10:19:58.0845267Z shl.b16 %rs58, %rs57, 4; 2026-02-21T10:19:58.0845574Z .loc 1 65 25 // 
chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:65:25 2026-02-21T10:19:58.0845932Z ld.shared.s8 %rs59, [%r2058+1664]; 2026-02-21T10:19:58.0846264Z .loc 1 63 28 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:63:28 2026-02-21T10:19:58.0846776Z shl.b16 %rs60, %rs59, 4; 2026-02-21T10:19:58.0847100Z .loc 1 65 25 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:65:25 2026-02-21T10:19:58.0847455Z ld.shared.s8 %rs61, [%r2060+1792]; 2026-02-21T10:19:58.0847784Z .loc 1 63 28 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:63:28 2026-02-21T10:19:58.0848129Z shl.b16 %rs62, %rs61, 4; 2026-02-21T10:19:58.0848439Z .loc 1 65 25 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:65:25 2026-02-21T10:19:58.0848879Z ld.shared.s8 %rs63, [%r2062+1920]; 2026-02-21T10:19:58.0849208Z .loc 1 63 28 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:63:28 2026-02-21T10:19:58.0849560Z shl.b16 %rs64, %rs63, 4; 2026-02-21T10:19:58.0849868Z .loc 1 65 25 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:65:25 2026-02-21T10:19:58.0850233Z cvt.s16.s8 %rs65, %rs34; 2026-02-21T10:19:58.0850400Z shr.s16 %rs66, %rs65, 4; 2026-02-21T10:19:58.0850565Z cvt.s16.s8 %rs67, %rs36; 2026-02-21T10:19:58.0850731Z shr.s16 %rs68, %rs67, 4; 2026-02-21T10:19:58.0850898Z shr.s16 %rs69, %rs33, 4; 2026-02-21T10:19:58.0851067Z shr.s16 %rs70, %rs35, 4; 2026-02-21T10:19:58.0851371Z .loc 1 83 32 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:83:32 2026-02-21T10:19:58.0851727Z cvt.rn.f32.s16 %r2063, %rs70; 2026-02-21T10:19:58.0851913Z cvt.rn.f32.s16 %r2064, %rs69; 2026-02-21T10:19:58.0852095Z cvt.rn.f32.s16 %r2065, %rs68; 2026-02-21T10:19:58.0852268Z cvt.rn.f32.s16 %r2066, %rs66; 2026-02-21T10:19:58.0852666Z .loc 1 65 25 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:65:25 2026-02-21T10:19:58.0853026Z cvt.s16.s8 %rs71, %rs38; 2026-02-21T10:19:58.0853192Z shr.s16 %rs72, %rs71, 4; 2026-02-21T10:19:58.0853359Z cvt.s16.s8 %rs73, %rs40; 
2026-02-21T10:19:58.0853529Z shr.s16 %rs74, %rs73, 4; 2026-02-21T10:19:58.0853695Z shr.s16 %rs75, %rs37, 4; 2026-02-21T10:19:58.0853857Z shr.s16 %rs76, %rs39, 4; 2026-02-21T10:19:58.0854167Z .loc 1 83 32 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:83:32 2026-02-21T10:19:58.0854586Z cvt.rn.f32.s16 %r2067, %rs76; 2026-02-21T10:19:58.0854764Z cvt.rn.f32.s16 %r2068, %rs75; 2026-02-21T10:19:58.0854940Z cvt.rn.f32.s16 %r2069, %rs74; 2026-02-21T10:19:58.0855112Z cvt.rn.f32.s16 %r2070, %rs72; 2026-02-21T10:19:58.0855426Z .loc 1 65 25 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:65:25 2026-02-21T10:19:58.0855776Z cvt.s16.s8 %rs77, %rs42; 2026-02-21T10:19:58.0855949Z shr.s16 %rs78, %rs77, 4; 2026-02-21T10:19:58.0856114Z cvt.s16.s8 %rs79, %rs44; 2026-02-21T10:19:58.0856298Z shr.s16 %rs80, %rs79, 4; 2026-02-21T10:19:58.0856617Z shr.s16 %rs81, %rs41, 4; 2026-02-21T10:19:58.0856804Z shr.s16 %rs82, %rs43, 4; 2026-02-21T10:19:58.0857129Z .loc 1 83 32 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:83:32 2026-02-21T10:19:58.0857476Z cvt.rn.f32.s16 %r2071, %rs82; 2026-02-21T10:19:58.0857659Z cvt.rn.f32.s16 %r2072, %rs81; 2026-02-21T10:19:58.0857836Z cvt.rn.f32.s16 %r2073, %rs80; 2026-02-21T10:19:58.0858104Z cvt.rn.f32.s16 %r2074, %rs78; 2026-02-21T10:19:58.0858425Z .loc 1 65 25 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:65:25 2026-02-21T10:19:58.0858777Z cvt.s16.s8 %rs83, %rs46; 2026-02-21T10:19:58.0858942Z shr.s16 %rs84, %rs83, 4; 2026-02-21T10:19:58.0859115Z cvt.s16.s8 %rs85, %rs48; 2026-02-21T10:19:58.0859285Z shr.s16 %rs86, %rs85, 4; 2026-02-21T10:19:58.0859449Z shr.s16 %rs87, %rs45, 4; 2026-02-21T10:19:58.0859620Z shr.s16 %rs88, %rs47, 4; 2026-02-21T10:19:58.0859935Z .loc 1 83 32 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:83:32 2026-02-21T10:19:58.0860289Z cvt.rn.f32.s16 %r2075, %rs88; 2026-02-21T10:19:58.0860465Z cvt.rn.f32.s16 %r2076, %rs87; 2026-02-21T10:19:58.0860654Z cvt.rn.f32.s16 %r2077, %rs86; 
2026-02-21T10:19:58.0860824Z cvt.rn.f32.s16 %r2078, %rs84; 2026-02-21T10:19:58.0861139Z .loc 1 65 25 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:65:25 2026-02-21T10:19:58.0861493Z cvt.s16.s8 %rs89, %rs50; 2026-02-21T10:19:58.0861656Z shr.s16 %rs90, %rs89, 4; 2026-02-21T10:19:58.0861824Z cvt.s16.s8 %rs91, %rs52; 2026-02-21T10:19:58.0861985Z shr.s16 %rs92, %rs91, 4; 2026-02-21T10:19:58.0862153Z shr.s16 %rs93, %rs49, 4; 2026-02-21T10:19:58.0862316Z shr.s16 %rs94, %rs51, 4; 2026-02-21T10:19:58.0862704Z .loc 1 83 32 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:83:32 2026-02-21T10:19:58.0863055Z cvt.rn.f32.s16 %r2079, %rs94; 2026-02-21T10:19:58.0863239Z cvt.rn.f32.s16 %r2080, %rs93; 2026-02-21T10:19:58.0863418Z cvt.rn.f32.s16 %r2081, %rs92; 2026-02-21T10:19:58.0863589Z cvt.rn.f32.s16 %r2082, %rs90; 2026-02-21T10:19:58.0863905Z .loc 1 65 25 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:65:25 2026-02-21T10:19:58.0864252Z cvt.s16.s8 %rs95, %rs54; 2026-02-21T10:19:58.0864432Z shr.s16 %rs96, %rs95, 4; 2026-02-21T10:19:58.0864600Z cvt.s16.s8 %rs97, %rs56; 2026-02-21T10:19:58.0864768Z shr.s16 %rs98, %rs97, 4; 2026-02-21T10:19:58.0864932Z shr.s16 %rs99, %rs53, 4; 2026-02-21T10:19:58.0865113Z shr.s16 %rs100, %rs55, 4; 2026-02-21T10:19:58.0865439Z .loc 1 83 32 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:83:32 2026-02-21T10:19:58.0865798Z cvt.rn.f32.s16 %r2083, %rs100; 2026-02-21T10:19:58.0866007Z cvt.rn.f32.s16 %r2084, %rs99; 2026-02-21T10:19:58.0866186Z cvt.rn.f32.s16 %r2085, %rs98; 2026-02-21T10:19:58.0866442Z cvt.rn.f32.s16 %r2086, %rs96; 2026-02-21T10:19:58.0866909Z .loc 1 65 25 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:65:25 2026-02-21T10:19:58.0867282Z cvt.s16.s8 %rs101, %rs58; 2026-02-21T10:19:58.0867460Z shr.s16 %rs102, %rs101, 4; 2026-02-21T10:19:58.0867650Z cvt.s16.s8 %rs103, %rs60; 2026-02-21T10:19:58.0867831Z shr.s16 %rs104, %rs103, 4; 2026-02-21T10:19:58.0868007Z shr.s16 %rs105, %rs57, 4; 
2026-02-21T10:19:58.0868264Z shr.s16 %rs106, %rs59, 4; 2026-02-21T10:19:58.0868659Z .loc 1 83 32 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:83:32 2026-02-21T10:19:58.0869022Z cvt.rn.f32.s16 %r2087, %rs106; 2026-02-21T10:19:58.0869209Z cvt.rn.f32.s16 %r2088, %rs105; 2026-02-21T10:19:58.0869399Z cvt.rn.f32.s16 %r2089, %rs104; 2026-02-21T10:19:58.0869583Z cvt.rn.f32.s16 %r2090, %rs102; 2026-02-21T10:19:58.0869912Z .loc 1 65 25 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:65:25 2026-02-21T10:19:58.0870270Z cvt.s16.s8 %rs107, %rs62; 2026-02-21T10:19:58.0870453Z shr.s16 %rs108, %rs107, 4; 2026-02-21T10:19:58.0870638Z cvt.s16.s8 %rs109, %rs64; 2026-02-21T10:19:58.0870817Z shr.s16 %rs110, %rs109, 4; 2026-02-21T10:19:58.0870995Z shr.s16 %rs111, %rs61, 4; 2026-02-21T10:19:58.0871164Z shr.s16 %rs112, %rs63, 4; 2026-02-21T10:19:58.0871493Z .loc 1 83 32 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:83:32 2026-02-21T10:19:58.0871934Z cvt.rn.f32.s16 %r2091, %rs112; 2026-02-21T10:19:58.0872123Z cvt.rn.f32.s16 %r2092, %rs111; 2026-02-21T10:19:58.0872309Z cvt.rn.f32.s16 %r2093, %rs110; 2026-02-21T10:19:58.0872491Z cvt.rn.f32.s16 %r2094, %rs108; 2026-02-21T10:19:58.0872744Z st.shared.v4.b32 [%r52], {%r2066, %r2064, %r2065, %r2063}; 2026-02-21T10:19:58.0873045Z st.shared.v4.b32 [%r53], {%r2070, %r2068, %r2069, %r2067}; 2026-02-21T10:19:58.0873340Z st.shared.v4.b32 [%r54], {%r2074, %r2072, %r2073, %r2071}; 2026-02-21T10:19:58.0873624Z st.shared.v4.b32 [%r55], {%r2078, %r2076, %r2077, %r2075}; 2026-02-21T10:19:58.0873908Z st.shared.v4.b32 [%r56], {%r2082, %r2080, %r2081, %r2079}; 2026-02-21T10:19:58.0874195Z st.shared.v4.b32 [%r57], {%r2086, %r2084, %r2085, %r2083}; 2026-02-21T10:19:58.0874473Z st.shared.v4.b32 [%r58], {%r2090, %r2088, %r2089, %r2087}; 2026-02-21T10:19:58.0874756Z st.shared.v4.b32 [%r59], {%r2094, %r2092, %r2093, %r2091}; 2026-02-21T10:19:58.0874993Z $L__tmp1: 2026-02-21T10:19:58.0875366Z .loc 2 291 36 // standard.py:291:36 @[ 
chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:90:40 ] 2026-02-21T10:19:58.0875796Z // begin inline asm 2026-02-21T10:19:58.0875989Z fence.proxy.async.shared::cta; 2026-02-21T10:19:58.0876185Z // end inline asm 2026-02-21T10:19:58.0876338Z bar.sync 0; 2026-02-21T10:19:58.0876706Z shfl.sync.idx.b32 %r2095, %r3, 0, 31, -1; 2026-02-21T10:19:58.0876938Z wgmma.fence.sync.aligned; 2026-02-21T10:19:58.0877142Z mov.pred %p41, -1; 2026-02-21T10:19:58.0877309Z // begin inline asm 2026-02-21T10:19:58.0878678Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447,%r2448,%r2449,%r2450}, {%r823,%r824,%r825,%r826}, %rd92, %p41, 1, 1; 2026-02-21T10:19:58.0880089Z // end inline asm 2026-02-21T10:19:58.0880243Z // begin inline asm 2026-02-21T10:19:58.0881684Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447,%r2448,%r2449,%r2450}, {%r955,%r956,%r957,%r958}, %rd93, %p41, 1, 1; 2026-02-21T10:19:58.0881763Z // end inline asm 2026-02-21T10:19:58.0881826Z // begin inline asm 2026-02-21T10:19:58.0883091Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447,%r2448,%r2449,%r2450}, {%r1087,%r1088,%r1089,%r1090}, %rd94, %p41, 1, 1; 2026-02-21T10:19:58.0883256Z // end inline asm 2026-02-21T10:19:58.0883319Z // begin inline asm 2026-02-21T10:19:58.0884633Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447,%r2448,%r2449,%r2450}, {%r1219,%r1220,%r1221,%r1222}, %rd95, %p41, 1, 1; 2026-02-21T10:19:58.0884704Z // end inline asm 2026-02-21T10:19:58.0884768Z // begin inline asm 2026-02-21T10:19:58.0886026Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r2512,%r2513,%r2514}, {%r1351,%r1352,%r1353,%r1354}, %rd92, %p41, 1, 1; 2026-02-21T10:19:58.0886089Z // end inline asm 2026-02-21T10:19:58.0886149Z // begin inline asm 2026-02-21T10:19:58.0887549Z 
wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r2512,%r2513,%r2514}, {%r1483,%r1484,%r1485,%r1486}, %rd93, %p41, 1, 1; 2026-02-21T10:19:58.0887699Z // end inline asm 2026-02-21T10:19:58.0887761Z // begin inline asm 2026-02-21T10:19:58.0889024Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r2512,%r2513,%r2514}, {%r1615,%r1616,%r1617,%r1618}, %rd94, %p41, 1, 1; 2026-02-21T10:19:58.0889087Z // end inline asm 2026-02-21T10:19:58.0889150Z // begin inline asm 2026-02-21T10:19:58.0890471Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r2512,%r2513,%r2514}, {%r1747,%r1748,%r1749,%r1750}, %rd95, %p41, 1, 1; 2026-02-21T10:19:58.0890535Z // end inline asm 2026-02-21T10:19:58.0890620Z 
wgmma.commit_group.sync.aligned; 2026-02-21T10:19:58.0890681Z mov.b32 %r1880, 0; 2026-02-21T10:19:58.0890745Z mov.b32 %r1879, %r647; 2026-02-21T10:19:58.0890869Z mov.b32 %r1881, %r1880; 2026-02-21T10:19:58.0890937Z // begin inline asm 2026-02-21T10:19:58.0893014Z // wait for regs: %r2387,%r2388,%r2389,%r2390,%r2391,%r2392,%r2393,%r2394,%r2395,%r2396,%r2397,%r2398,%r2399,%r2400,%r2401,%r2402,%r2403,%r2404,%r2405,%r2406,%r2407,%r2408,%r2409,%r2410,%r2411,%r2412,%r2413,%r2414,%r2415,%r2416,%r2417,%r2418,%r2419,%r2420,%r2421,%r2422,%r2423,%r2424,%r2425,%r2426,%r2427,%r2428,%r2429,%r2430,%r2431,%r2432,%r2433,%r2434,%r2435,%r2436,%r2437,%r2438,%r2439,%r2440,%r2441,%r2442,%r2443,%r2444,%r2445,%r2446,%r2447,%r2448,%r2449,%r2450,%r2451,%r2452,%r2453,%r2454,%r2455,%r2456,%r2457,%r2458,%r2459,%r2460,%r2461,%r2462,%r2463,%r2464,%r2465,%r2466,%r2467,%r2468,%r2469,%r2470,%r2471,%r2472,%r2473,%r2474,%r2475,%r2476,%r2477,%r2478,%r2479,%r2480,%r2481,%r2482,%r2483,%r2484,%r2485,%r2486,%r2487,%r2488,%r2489,%r2490,%r2491,%r2492,%r2493,%r2494,%r2495,%r2496,%r2497,%r2498,%r2499,%r2500,%r2501,%r2502,%r2503,%r2504,%r2505,%r2506,%r2507,%r2508,%r2509,%r2510,%r2511,%r2512,%r2513,%r2514,%r1879,%r1880,%r1881 2026-02-21T10:19:58.0893161Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:19:58.0893224Z // end inline asm 2026-02-21T10:19:58.0893282Z $L__tmp2: 2026-02-21T10:19:58.0893514Z .loc 1 26 112 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:26:112 2026-02-21T10:19:58.0893578Z add.s32 %r2096, %r2381, 16; 2026-02-21T10:19:58.0893645Z add.s32 %r2097, %r2384, 1; 2026-02-21T10:19:58.0893714Z setp.gt.s32 %p56, %r2097, 2; 2026-02-21T10:19:58.0893791Z selp.b32 %r2384, 0, %r2097, %p56; 2026-02-21T10:19:58.0893858Z selp.b32 %r2381, 0, %r2096, %p51; 2026-02-21T10:19:58.0894073Z .loc 1 51 22 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:51:22 2026-02-21T10:19:58.0894143Z shl.b32 %r2098, %r2381, 1; 2026-02-21T10:19:58.0894350Z .loc 1 53 25 // 
chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:53:25 2026-02-21T10:19:58.0894416Z add.s32 %r2099, %r2098, %r34; 2026-02-21T10:19:58.0894642Z .loc 1 54 53 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:54:53 2026-02-21T10:19:58.0894708Z shl.b32 %r2100, %r2521, 13; 2026-02-21T10:19:58.0894768Z shl.b32 %r2101, %r2522, 13; 2026-02-21T10:19:58.0894834Z shl.b32 %r2102, %r2523, 13; 2026-02-21T10:19:58.0894896Z shl.b32 %r2103, %r2524, 13; 2026-02-21T10:19:58.0895011Z shl.b32 %r2104, %r2525, 13; 2026-02-21T10:19:58.0895074Z shl.b32 %r2105, %r2526, 13; 2026-02-21T10:19:58.0895140Z shl.b32 %r2106, %r2527, 13; 2026-02-21T10:19:58.0895201Z shl.b32 %r2107, %r2528, 13; 2026-02-21T10:19:58.0895403Z .loc 1 54 60 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:54:60 2026-02-21T10:19:58.0895473Z add.s32 %r2108, %r2100, %r2099; 2026-02-21T10:19:58.0895536Z add.s32 %r2109, %r2101, %r2099; 2026-02-21T10:19:58.0895598Z add.s32 %r2110, %r2102, %r2099; 2026-02-21T10:19:58.0895675Z add.s32 %r2111, %r2103, %r2099; 2026-02-21T10:19:58.0895742Z add.s32 %r2112, %r2104, %r2099; 2026-02-21T10:19:58.0895808Z add.s32 %r2113, %r2105, %r2099; 2026-02-21T10:19:58.0895871Z add.s32 %r2114, %r2106, %r2099; 2026-02-21T10:19:58.0895938Z add.s32 %r2115, %r2107, %r2099; 2026-02-21T10:19:58.0896139Z .loc 1 54 32 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:54:32 2026-02-21T10:19:58.0896215Z mad.wide.s32 %rd100, %r2108, 2, %rd6; 2026-02-21T10:19:58.0896293Z mad.wide.s32 %rd101, %r2109, 2, %rd6; 2026-02-21T10:19:58.0896362Z mad.wide.s32 %rd102, %r2110, 2, %rd6; 2026-02-21T10:19:58.0896607Z mad.wide.s32 %rd103, %r2111, 2, %rd6; 2026-02-21T10:19:58.0896690Z mad.wide.s32 %rd104, %r2112, 2, %rd6; 2026-02-21T10:19:58.0896766Z mad.wide.s32 %rd105, %r2113, 2, %rd6; 2026-02-21T10:19:58.0896834Z mad.wide.s32 %rd106, %r2114, 2, %rd6; 2026-02-21T10:19:58.0896902Z mad.wide.s32 %rd107, %r2115, 2, %rd6; 2026-02-21T10:19:58.0897134Z .loc 1 54 80 // 
chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:54:80 2026-02-21T10:19:58.0897269Z shl.b32 %r2116, %r2384, 13; 2026-02-21T10:19:58.0897340Z add.s32 %r2117, %r586, %r2116; 2026-02-21T10:19:58.0897412Z add.s32 %r2013, %r2117, %r46; 2026-02-21T10:19:58.0897479Z selp.b32 %r2014, 8, 0, %p52; 2026-02-21T10:19:58.0897541Z // begin inline asm 2026-02-21T10:19:58.0897699Z cp.async.ca.shared.global [ %r2013 + 0 ], [ %rd100 + 0 ], 0x8, %r2014; 2026-02-21T10:19:58.0897768Z // end inline asm 2026-02-21T10:19:58.0897832Z add.s32 %r2015, %r2013, 1024; 2026-02-21T10:19:58.0897893Z // begin inline asm 2026-02-21T10:19:58.0898041Z cp.async.ca.shared.global [ %r2015 + 0 ], [ %rd101 + 0 ], 0x8, %r2014; 2026-02-21T10:19:58.0898101Z // end inline asm 2026-02-21T10:19:58.0898164Z add.s32 %r2017, %r2013, 2048; 2026-02-21T10:19:58.0898225Z // begin inline asm 2026-02-21T10:19:58.0898365Z cp.async.ca.shared.global [ %r2017 + 0 ], [ %rd102 + 0 ], 0x8, %r2014; 2026-02-21T10:19:58.0898423Z // end inline asm 2026-02-21T10:19:58.0898486Z add.s32 %r2019, %r2013, 3072; 2026-02-21T10:19:58.0898553Z // begin inline asm 2026-02-21T10:19:58.0898750Z cp.async.ca.shared.global [ %r2019 + 0 ], [ %rd103 + 0 ], 0x8, %r2014; 2026-02-21T10:19:58.0898813Z // end inline asm 2026-02-21T10:19:58.0898886Z add.s32 %r2021, %r2013, 4096; 2026-02-21T10:19:58.0898955Z // begin inline asm 2026-02-21T10:19:58.0899090Z cp.async.ca.shared.global [ %r2021 + 0 ], [ %rd104 + 0 ], 0x8, %r2014; 2026-02-21T10:19:58.0899153Z // end inline asm 2026-02-21T10:19:58.0899221Z add.s32 %r2023, %r2013, 5120; 2026-02-21T10:19:58.0899281Z // begin inline asm 2026-02-21T10:19:58.0899419Z cp.async.ca.shared.global [ %r2023 + 0 ], [ %rd105 + 0 ], 0x8, %r2014; 2026-02-21T10:19:58.0899484Z // end inline asm 2026-02-21T10:19:58.0899548Z add.s32 %r2025, %r2013, 6144; 2026-02-21T10:19:58.0899608Z // begin inline asm 2026-02-21T10:19:58.0899741Z cp.async.ca.shared.global [ %r2025 + 0 ], [ %rd106 + 0 ], 0x8, %r2014; 
2026-02-21T10:19:58.0899808Z // end inline asm 2026-02-21T10:19:58.0899872Z add.s32 %r2027, %r2013, 7168; 2026-02-21T10:19:58.0899938Z // begin inline asm 2026-02-21T10:19:58.0900078Z cp.async.ca.shared.global [ %r2027 + 0 ], [ %rd107 + 0 ], 0x8, %r2014; 2026-02-21T10:19:58.0900141Z // end inline asm 2026-02-21T10:19:58.0900210Z cp.async.commit_group; 2026-02-21T10:19:58.0900441Z .loc 1 26 112 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:26:112 2026-02-21T10:19:58.0900588Z shl.b32 %r2118, %r2384, 3; 2026-02-21T10:19:58.0900655Z add.s32 %r2029, %r512, %r2118; 2026-02-21T10:19:58.0900729Z and.pred %p49, %p61, %p52; 2026-02-21T10:19:58.0900800Z // begin inline asm 2026-02-21T10:19:58.0900937Z @%p49 mbarrier.arrive.expect_tx.shared.b64 _, [%r2029], 2048; 2026-02-21T10:19:58.0900998Z // end inline asm 2026-02-21T10:19:58.0901216Z .loc 1 60 33 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:60:33 2026-02-21T10:19:58.0901281Z shl.b32 %r2119, %r2384, 11; 2026-02-21T10:19:58.0901356Z add.s32 %r2030, %r532, %r2119; 2026-02-21T10:19:58.0901423Z bar.sync 0; 2026-02-21T10:19:58.0901501Z elect.sync %r2120|%p57, -1; 2026-02-21T10:19:58.0901569Z and.pred %p58, %p52, %p57; 2026-02-21T10:19:58.0901637Z and.pred %p50, %p1, %p58; 2026-02-21T10:19:58.0901711Z // begin inline asm 2026-02-21T10:19:58.0902040Z @%p50 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r2030], [%rd34, {%r2519, %r2381}], [%r2029]; 2026-02-21T10:19:58.0902103Z // end inline asm 2026-02-21T10:19:58.0902391Z .loc 1 26 112 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:26:112 2026-02-21T10:19:58.0902463Z setp.ne.b32 %p59, %r2380, 255; 2026-02-21T10:19:58.0902525Z @%p59 bra $L__BB0_6; 2026-02-21T10:19:58.0902639Z // %bb.5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:19:58.0902851Z .loc 1 38 32 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:38:32 2026-02-21T10:19:58.0902917Z add.s32 %r2266, %r2378, %r13; 2026-02-21T10:19:58.0903029Z 
add.s32 %r2267, %r14, %r2378; 2026-02-21T10:19:58.0903099Z add.s32 %r2268, %r15, %r2378; 2026-02-21T10:19:58.0903163Z add.s32 %r2269, %r16, %r2378; 2026-02-21T10:19:58.0903223Z add.s32 %r2270, %r17, %r2378; 2026-02-21T10:19:58.0903288Z add.s32 %r2271, %r18, %r2378; 2026-02-21T10:19:58.0903348Z add.s32 %r2272, %r19, %r2378; 2026-02-21T10:19:58.0903412Z add.s32 %r2273, %r2378, %r20; 2026-02-21T10:19:58.0903473Z add.s32 %r2274, %r21, %r2378; 2026-02-21T10:19:58.0903539Z add.s32 %r2275, %r22, %r2378; 2026-02-21T10:19:58.0903602Z add.s32 %r2276, %r23, %r2378; 2026-02-21T10:19:58.0903663Z add.s32 %r2277, %r24, %r2378; 2026-02-21T10:19:58.0903727Z add.s32 %r2278, %r25, %r2378; 2026-02-21T10:19:58.0903789Z add.s32 %r2279, %r26, %r2378; 2026-02-21T10:19:58.0903852Z add.s32 %r2280, %r27, %r2378; 2026-02-21T10:19:58.0903913Z add.s32 %r2281, %r2378, %r28; 2026-02-21T10:19:58.0904121Z .loc 1 40 32 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:40:32 2026-02-21T10:19:58.0904242Z add.s32 %r2282, %r2376, %r29; 2026-02-21T10:19:58.0904447Z .loc 1 93 28 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:93:28 2026-02-21T10:19:58.0904535Z cvt.rn.bf16x2.f32 %r2283, %r2388, %r2387; 2026-02-21T10:19:58.0904613Z cvt.rn.bf16x2.f32 %r2284, %r2390, %r2389; 2026-02-21T10:19:58.0904688Z cvt.rn.bf16x2.f32 %r2285, %r2392, %r2391; 2026-02-21T10:19:58.0904763Z cvt.rn.bf16x2.f32 %r2286, %r2394, %r2393; 2026-02-21T10:19:58.0904835Z cvt.rn.bf16x2.f32 %r2287, %r2396, %r2395; 2026-02-21T10:19:58.0904908Z cvt.rn.bf16x2.f32 %r2288, %r2398, %r2397; 2026-02-21T10:19:58.0904981Z cvt.rn.bf16x2.f32 %r2289, %r2400, %r2399; 2026-02-21T10:19:58.0905059Z cvt.rn.bf16x2.f32 %r2290, %r2402, %r2401; 2026-02-21T10:19:58.0905131Z cvt.rn.bf16x2.f32 %r2291, %r2404, %r2403; 2026-02-21T10:19:58.0905202Z cvt.rn.bf16x2.f32 %r2292, %r2406, %r2405; 2026-02-21T10:19:58.0905282Z cvt.rn.bf16x2.f32 %r2293, %r2408, %r2407; 2026-02-21T10:19:58.0905355Z cvt.rn.bf16x2.f32 %r2294, %r2410, %r2409; 
2026-02-21T10:19:58.0905441Z cvt.rn.bf16x2.f32 %r2295, %r2412, %r2411; 2026-02-21T10:19:58.0905525Z cvt.rn.bf16x2.f32 %r2296, %r2414, %r2413; 2026-02-21T10:19:58.0905598Z cvt.rn.bf16x2.f32 %r2297, %r2416, %r2415; 2026-02-21T10:19:58.0905672Z cvt.rn.bf16x2.f32 %r2298, %r2418, %r2417; 2026-02-21T10:19:58.0905802Z cvt.rn.bf16x2.f32 %r2299, %r2420, %r2419; 2026-02-21T10:19:58.0905880Z cvt.rn.bf16x2.f32 %r2300, %r2422, %r2421; 2026-02-21T10:19:58.0905952Z cvt.rn.bf16x2.f32 %r2301, %r2424, %r2423; 2026-02-21T10:19:58.0906026Z cvt.rn.bf16x2.f32 %r2302, %r2426, %r2425; 2026-02-21T10:19:58.0906104Z cvt.rn.bf16x2.f32 %r2303, %r2428, %r2427; 2026-02-21T10:19:58.0906177Z cvt.rn.bf16x2.f32 %r2304, %r2430, %r2429; 2026-02-21T10:19:58.0906250Z cvt.rn.bf16x2.f32 %r2305, %r2432, %r2431; 2026-02-21T10:19:58.0906323Z cvt.rn.bf16x2.f32 %r2306, %r2434, %r2433; 2026-02-21T10:19:58.0906402Z cvt.rn.bf16x2.f32 %r2307, %r2436, %r2435; 2026-02-21T10:19:58.0906599Z cvt.rn.bf16x2.f32 %r2308, %r2438, %r2437; 2026-02-21T10:19:58.0906682Z cvt.rn.bf16x2.f32 %r2309, %r2440, %r2439; 2026-02-21T10:19:58.0906764Z cvt.rn.bf16x2.f32 %r2310, %r2442, %r2441; 2026-02-21T10:19:58.0906836Z cvt.rn.bf16x2.f32 %r2311, %r2444, %r2443; 2026-02-21T10:19:58.0906912Z cvt.rn.bf16x2.f32 %r2312, %r2446, %r2445; 2026-02-21T10:19:58.0907000Z cvt.rn.bf16x2.f32 %r2313, %r2448, %r2447; 2026-02-21T10:19:58.0907074Z cvt.rn.bf16x2.f32 %r2314, %r2450, %r2449; 2026-02-21T10:19:58.0907146Z cvt.rn.bf16x2.f32 %r2315, %r2452, %r2451; 2026-02-21T10:19:58.0907318Z cvt.rn.bf16x2.f32 %r2316, %r2454, %r2453; 2026-02-21T10:19:58.0907399Z cvt.rn.bf16x2.f32 %r2317, %r2456, %r2455; 2026-02-21T10:19:58.0907476Z cvt.rn.bf16x2.f32 %r2318, %r2458, %r2457; 2026-02-21T10:19:58.0907548Z cvt.rn.bf16x2.f32 %r2319, %r2460, %r2459; 2026-02-21T10:19:58.0907627Z cvt.rn.bf16x2.f32 %r2320, %r2462, %r2461; 2026-02-21T10:19:58.0907699Z cvt.rn.bf16x2.f32 %r2321, %r2464, %r2463; 2026-02-21T10:19:58.0907835Z cvt.rn.bf16x2.f32 %r2322, %r2466, %r2465; 
2026-02-21T10:19:58.0907915Z cvt.rn.bf16x2.f32 %r2323, %r2468, %r2467; 2026-02-21T10:19:58.0907986Z cvt.rn.bf16x2.f32 %r2324, %r2470, %r2469; 2026-02-21T10:19:58.0908059Z cvt.rn.bf16x2.f32 %r2325, %r2472, %r2471; 2026-02-21T10:19:58.0908130Z cvt.rn.bf16x2.f32 %r2326, %r2474, %r2473; 2026-02-21T10:19:58.0908210Z cvt.rn.bf16x2.f32 %r2327, %r2476, %r2475; 2026-02-21T10:19:58.0908281Z cvt.rn.bf16x2.f32 %r2328, %r2478, %r2477; 2026-02-21T10:19:58.0908352Z cvt.rn.bf16x2.f32 %r2329, %r2480, %r2479; 2026-02-21T10:19:58.0908511Z cvt.rn.bf16x2.f32 %r2330, %r2482, %r2481; 2026-02-21T10:19:58.0908588Z cvt.rn.bf16x2.f32 %r2331, %r2484, %r2483; 2026-02-21T10:19:58.0908662Z cvt.rn.bf16x2.f32 %r2332, %r2486, %r2485; 2026-02-21T10:19:58.0908735Z cvt.rn.bf16x2.f32 %r2333, %r2488, %r2487; 2026-02-21T10:19:58.0908819Z cvt.rn.bf16x2.f32 %r2334, %r2490, %r2489; 2026-02-21T10:19:58.0908892Z cvt.rn.bf16x2.f32 %r2335, %r2492, %r2491; 2026-02-21T10:19:58.0908969Z cvt.rn.bf16x2.f32 %r2336, %r2494, %r2493; 2026-02-21T10:19:58.0909117Z cvt.rn.bf16x2.f32 %r2337, %r2496, %r2495; 2026-02-21T10:19:58.0909192Z cvt.rn.bf16x2.f32 %r2338, %r2498, %r2497; 2026-02-21T10:19:58.0909263Z cvt.rn.bf16x2.f32 %r2339, %r2500, %r2499; 2026-02-21T10:19:58.0909340Z cvt.rn.bf16x2.f32 %r2340, %r2502, %r2501; 2026-02-21T10:19:58.0909414Z cvt.rn.bf16x2.f32 %r2341, %r2504, %r2503; 2026-02-21T10:19:58.0909488Z cvt.rn.bf16x2.f32 %r2342, %r2506, %r2505; 2026-02-21T10:19:58.0909560Z cvt.rn.bf16x2.f32 %r2343, %r2508, %r2507; 2026-02-21T10:19:58.0909654Z cvt.rn.bf16x2.f32 %r2344, %r2510, %r2509; 2026-02-21T10:19:58.0909726Z cvt.rn.bf16x2.f32 %r2345, %r2512, %r2511; 2026-02-21T10:19:58.0909798Z cvt.rn.bf16x2.f32 %r2346, %r2514, %r2513; 2026-02-21T10:19:58.0910018Z .loc 1 94 50 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:94:50 2026-02-21T10:19:58.0910095Z mad.lo.s32 %r2347, %r2266, 1280, %r2282; 2026-02-21T10:19:58.0910167Z mad.lo.s32 %r2348, %r2267, 1280, %r2282; 2026-02-21T10:19:58.0910247Z mad.lo.s32 %r2349, 
%r2268, 1280, %r2282; 2026-02-21T10:19:58.0910322Z mad.lo.s32 %r2350, %r2269, 1280, %r2282; 2026-02-21T10:19:58.0910391Z mad.lo.s32 %r2351, %r2270, 1280, %r2282; 2026-02-21T10:19:58.0910459Z mad.lo.s32 %r2352, %r2271, 1280, %r2282; 2026-02-21T10:19:58.0910533Z mad.lo.s32 %r2353, %r2272, 1280, %r2282; 2026-02-21T10:19:58.0910684Z mad.lo.s32 %r2354, %r2273, 1280, %r2282; 2026-02-21T10:19:58.0910752Z mad.lo.s32 %r2355, %r2274, 1280, %r2282; 2026-02-21T10:19:58.0910828Z mad.lo.s32 %r2356, %r2275, 1280, %r2282; 2026-02-21T10:19:58.0910899Z mad.lo.s32 %r2357, %r2276, 1280, %r2282; 2026-02-21T10:19:58.0910970Z mad.lo.s32 %r2358, %r2277, 1280, %r2282; 2026-02-21T10:19:58.0911042Z mad.lo.s32 %r2359, %r2278, 1280, %r2282; 2026-02-21T10:19:58.0911111Z mad.lo.s32 %r2360, %r2279, 1280, %r2282; 2026-02-21T10:19:58.0911179Z mad.lo.s32 %r2361, %r2280, 1280, %r2282; 2026-02-21T10:19:58.0911246Z mad.lo.s32 %r2362, %r2281, 1280, %r2282; 2026-02-21T10:19:58.0911464Z .loc 1 94 22 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:94:22 2026-02-21T10:19:58.0911537Z mad.wide.s32 %rd109, %r2347, 2, %rd7; 2026-02-21T10:19:58.0911606Z mad.wide.s32 %rd110, %r2348, 2, %rd7; 2026-02-21T10:19:58.0911679Z mad.wide.s32 %rd111, %r2349, 2, %rd7; 2026-02-21T10:19:58.0911749Z mad.wide.s32 %rd112, %r2350, 2, %rd7; 2026-02-21T10:19:58.0911816Z mad.wide.s32 %rd113, %r2351, 2, %rd7; 2026-02-21T10:19:58.0911887Z mad.wide.s32 %rd114, %r2352, 2, %rd7; 2026-02-21T10:19:58.0912013Z mad.wide.s32 %rd115, %r2353, 2, %rd7; 2026-02-21T10:19:58.0912082Z mad.wide.s32 %rd116, %r2354, 2, %rd7; 2026-02-21T10:19:58.0912148Z mad.wide.s32 %rd117, %r2355, 2, %rd7; 2026-02-21T10:19:58.0912224Z mad.wide.s32 %rd118, %r2356, 2, %rd7; 2026-02-21T10:19:58.0912288Z mad.wide.s32 %rd119, %r2357, 2, %rd7; 2026-02-21T10:19:58.0912353Z mad.wide.s32 %rd120, %r2358, 2, %rd7; 2026-02-21T10:19:58.0912420Z mad.wide.s32 %rd121, %r2359, 2, %rd7; 2026-02-21T10:19:58.0912534Z mad.wide.s32 %rd122, %r2360, 2, %rd7; 
2026-02-21T10:19:58.0912598Z mad.wide.s32 %rd123, %r2361, 2, %rd7; 2026-02-21T10:19:58.0912665Z mad.wide.s32 %rd124, %r2362, 2, %rd7; 2026-02-21T10:19:58.0912875Z .loc 1 94 81 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:94:81 2026-02-21T10:19:58.0912991Z st.shared.v4.b32 [%r60], {%r2283, %r2285, %r2287, %r2289}; 2026-02-21T10:19:58.0913099Z st.shared.v4.b32 [%r61], {%r2291, %r2293, %r2295, %r2297}; 2026-02-21T10:19:58.0913223Z st.shared.v4.b32 [%r62], {%r2299, %r2301, %r2303, %r2305}; 2026-02-21T10:19:58.0913330Z st.shared.v4.b32 [%r63], {%r2307, %r2309, %r2311, %r2313}; 2026-02-21T10:19:58.0913388Z bar.sync 0; 2026-02-21T10:19:58.0913457Z // begin inline asm 2026-02-21T10:19:58.0913654Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2201, %r2202, %r2203, %r2204}, [%r2125]; 2026-02-21T10:19:58.0913715Z // end inline asm 2026-02-21T10:19:58.0913776Z // begin inline asm 2026-02-21T10:19:58.0914020Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2209, %r2210, %r2211, %r2212}, [%r2130]; 2026-02-21T10:19:58.0914082Z // end inline asm 2026-02-21T10:19:58.0914141Z // begin inline asm 2026-02-21T10:19:58.0914327Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2217, %r2218, %r2219, %r2220}, [%r2135]; 2026-02-21T10:19:58.0914384Z // end inline asm 2026-02-21T10:19:58.0914445Z // begin inline asm 2026-02-21T10:19:58.0914626Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2225, %r2226, %r2227, %r2228}, [%r2140]; 2026-02-21T10:19:58.0914685Z // end inline asm 2026-02-21T10:19:58.0914739Z bar.sync 0; 2026-02-21T10:19:58.0914844Z st.shared.v4.b32 [%r60], {%r2284, %r2286, %r2288, %r2290}; 2026-02-21T10:19:58.0914965Z st.shared.v4.b32 [%r61], {%r2292, %r2294, %r2296, %r2298}; 2026-02-21T10:19:58.0915068Z st.shared.v4.b32 [%r62], {%r2300, %r2302, %r2304, %r2306}; 2026-02-21T10:19:58.0915168Z st.shared.v4.b32 [%r63], {%r2308, %r2310, %r2312, %r2314}; 2026-02-21T10:19:58.0915227Z bar.sync 0; 2026-02-21T10:19:58.0915290Z // begin inline asm 2026-02-21T10:19:58.0915470Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2205, %r2206, %r2207, %r2208}, [%r2125]; 2026-02-21T10:19:58.0915534Z // end inline asm 2026-02-21T10:19:58.0915594Z // begin inline asm 2026-02-21T10:19:58.0915772Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2213, %r2214, %r2215, %r2216}, [%r2130]; 2026-02-21T10:19:58.0915892Z // end inline asm 2026-02-21T10:19:58.0915959Z // begin inline asm 2026-02-21T10:19:58.0916142Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2221, %r2222, %r2223, %r2224}, [%r2135]; 2026-02-21T10:19:58.0916199Z // end inline asm 2026-02-21T10:19:58.0916265Z // begin inline asm 2026-02-21T10:19:58.0916442Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2229, %r2230, %r2231, %r2232}, [%r2140]; 2026-02-21T10:19:58.0916711Z // end inline asm 2026-02-21T10:19:58.0916772Z bar.sync 0; 2026-02-21T10:19:58.0916886Z st.shared.v4.b32 [%r60], {%r2315, %r2317, %r2319, %r2321}; 2026-02-21T10:19:58.0916997Z st.shared.v4.b32 [%r61], {%r2323, %r2325, %r2327, %r2329}; 2026-02-21T10:19:58.0917102Z st.shared.v4.b32 [%r62], {%r2331, %r2333, %r2335, %r2337}; 2026-02-21T10:19:58.0917225Z st.shared.v4.b32 [%r63], {%r2339, %r2341, %r2343, %r2345}; 2026-02-21T10:19:58.0917283Z bar.sync 0; 2026-02-21T10:19:58.0917345Z // begin inline asm 2026-02-21T10:19:58.0917531Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2233, %r2234, %r2235, %r2236}, [%r2125]; 2026-02-21T10:19:58.0917591Z // end inline asm 2026-02-21T10:19:58.0917652Z // begin inline asm 2026-02-21T10:19:58.0917905Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2241, %r2242, %r2243, %r2244}, [%r2130]; 2026-02-21T10:19:58.0917974Z // end inline asm 2026-02-21T10:19:58.0921807Z // begin inline asm 2026-02-21T10:19:58.0922063Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2249, %r2250, %r2251, %r2252}, [%r2135]; 2026-02-21T10:19:58.0922129Z // end inline asm 2026-02-21T10:19:58.0922192Z // begin inline asm 2026-02-21T10:19:58.0922407Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2257, %r2258, %r2259, %r2260}, [%r2140]; 
2026-02-21T10:19:58.0922602Z // end inline asm 2026-02-21T10:19:58.0922662Z bar.sync 0; 2026-02-21T10:19:58.0922794Z st.shared.v4.b32 [%r60], {%r2316, %r2318, %r2320, %r2322}; 2026-02-21T10:19:58.0922910Z st.shared.v4.b32 [%r61], {%r2324, %r2326, %r2328, %r2330}; 2026-02-21T10:19:58.0923024Z st.shared.v4.b32 [%r62], {%r2332, %r2334, %r2336, %r2338}; 2026-02-21T10:19:58.0923125Z st.shared.v4.b32 [%r63], {%r2340, %r2342, %r2344, %r2346}; 2026-02-21T10:19:58.0923186Z bar.sync 0; 2026-02-21T10:19:58.0923252Z // begin inline asm 2026-02-21T10:19:58.0923445Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2237, %r2238, %r2239, %r2240}, [%r2125]; 2026-02-21T10:19:58.0923509Z // end inline asm 2026-02-21T10:19:58.0923570Z // begin inline asm 2026-02-21T10:19:58.0923758Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2245, %r2246, %r2247, %r2248}, [%r2130]; 2026-02-21T10:19:58.0923821Z // end inline asm 2026-02-21T10:19:58.0923882Z // begin inline asm 2026-02-21T10:19:58.0924143Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2253, %r2254, %r2255, %r2256}, [%r2135]; 2026-02-21T10:19:58.0924205Z // end inline asm 2026-02-21T10:19:58.0924272Z // begin inline asm 2026-02-21T10:19:58.0924450Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2261, %r2262, %r2263, %r2264}, [%r2140]; 2026-02-21T10:19:58.0924509Z // end inline asm 2026-02-21T10:19:58.0924570Z // begin inline asm 2026-02-21T10:19:58.0924701Z st.global.v4.b32 [ %rd109 + 0 ], { %r2201, %r2202, %r2203, %r2204 }; 2026-02-21T10:19:58.0924760Z // end inline asm 2026-02-21T10:19:58.0924832Z // begin inline asm 2026-02-21T10:19:58.0924960Z st.global.v4.b32 [ %rd110 + 0 ], { %r2205, %r2206, %r2207, %r2208 }; 2026-02-21T10:19:58.0925017Z // end inline asm 2026-02-21T10:19:58.0925077Z // begin inline asm 2026-02-21T10:19:58.0925199Z st.global.v4.b32 [ %rd111 + 0 ], { %r2209, %r2210, %r2211, %r2212 }; 2026-02-21T10:19:58.0925257Z // end inline asm 2026-02-21T10:19:58.0925316Z // begin inline asm 2026-02-21T10:19:58.0925438Z st.global.v4.b32 [ 
%rd112 + 0 ], { %r2213, %r2214, %r2215, %r2216 }; 2026-02-21T10:19:58.0925493Z // end inline asm 2026-02-21T10:19:58.0925552Z // begin inline asm 2026-02-21T10:19:58.0925665Z st.global.v4.b32 [ %rd113 + 0 ], { %r2217, %r2218, %r2219, %r2220 }; 2026-02-21T10:19:58.0925726Z // end inline asm 2026-02-21T10:19:58.0925872Z // begin inline asm 2026-02-21T10:19:58.0925991Z st.global.v4.b32 [ %rd114 + 0 ], { %r2221, %r2222, %r2223, %r2224 }; 2026-02-21T10:19:58.0926050Z // end inline asm 2026-02-21T10:19:58.0926112Z // begin inline asm 2026-02-21T10:19:58.0926226Z st.global.v4.b32 [ %rd115 + 0 ], { %r2225, %r2226, %r2227, %r2228 }; 2026-02-21T10:19:58.0926283Z // end inline asm 2026-02-21T10:19:58.0926345Z // begin inline asm 2026-02-21T10:19:58.0926643Z st.global.v4.b32 [ %rd116 + 0 ], { %r2229, %r2230, %r2231, %r2232 }; 2026-02-21T10:19:58.0926710Z // end inline asm 2026-02-21T10:19:58.0926776Z // begin inline asm 2026-02-21T10:19:58.0926895Z st.global.v4.b32 [ %rd117 + 0 ], { %r2233, %r2234, %r2235, %r2236 }; 2026-02-21T10:19:58.0926955Z // end inline asm 2026-02-21T10:19:58.0927018Z // begin inline asm 2026-02-21T10:19:58.0927150Z st.global.v4.b32 [ %rd118 + 0 ], { %r2237, %r2238, %r2239, %r2240 }; 2026-02-21T10:19:58.0927212Z // end inline asm 2026-02-21T10:19:58.0927271Z // begin inline asm 2026-02-21T10:19:58.0927401Z st.global.v4.b32 [ %rd119 + 0 ], { %r2241, %r2242, %r2243, %r2244 }; 2026-02-21T10:19:58.0927457Z // end inline asm 2026-02-21T10:19:58.0927514Z // begin inline asm 2026-02-21T10:19:58.0927724Z st.global.v4.b32 [ %rd120 + 0 ], { %r2245, %r2246, %r2247, %r2248 }; 2026-02-21T10:19:58.0927787Z // end inline asm 2026-02-21T10:19:58.0927846Z // begin inline asm 2026-02-21T10:19:58.0927962Z st.global.v4.b32 [ %rd121 + 0 ], { %r2249, %r2250, %r2251, %r2252 }; 2026-02-21T10:19:58.0928020Z // end inline asm 2026-02-21T10:19:58.0928078Z // begin inline asm 2026-02-21T10:19:58.0928191Z st.global.v4.b32 [ %rd122 + 0 ], { %r2253, %r2254, %r2255, %r2256 }; 
2026-02-21T10:19:58.0928316Z // end inline asm 2026-02-21T10:19:58.0928374Z // begin inline asm 2026-02-21T10:19:58.0928485Z st.global.v4.b32 [ %rd123 + 0 ], { %r2257, %r2258, %r2259, %r2260 }; 2026-02-21T10:19:58.0928547Z // end inline asm 2026-02-21T10:19:58.0928606Z // begin inline asm 2026-02-21T10:19:58.0928720Z st.global.v4.b32 [ %rd124 + 0 ], { %r2261, %r2262, %r2263, %r2264 }; 2026-02-21T10:19:58.0928777Z // end inline asm 2026-02-21T10:19:58.0928847Z mov.b32 %r2387, 0f00000000; 2026-02-21T10:19:58.0928914Z mov.b32 %r2388, %r2387; 2026-02-21T10:19:58.0928975Z mov.b32 %r2389, %r2387; 2026-02-21T10:19:58.0929039Z mov.b32 %r2390, %r2387; 2026-02-21T10:19:58.0929099Z mov.b32 %r2391, %r2387; 2026-02-21T10:19:58.0929158Z mov.b32 %r2392, %r2387; 2026-02-21T10:19:58.0929213Z mov.b32 %r2393, %r2387; 2026-02-21T10:19:58.0929287Z mov.b32 %r2394, %r2387; 2026-02-21T10:19:58.0929349Z mov.b32 %r2395, %r2387; 2026-02-21T10:19:58.0929409Z mov.b32 %r2396, %r2387; 2026-02-21T10:19:58.0929473Z mov.b32 %r2397, %r2387; 2026-02-21T10:19:58.0929597Z mov.b32 %r2398, %r2387; 2026-02-21T10:19:58.0929658Z mov.b32 %r2399, %r2387; 2026-02-21T10:19:58.0929716Z mov.b32 %r2400, %r2387; 2026-02-21T10:19:58.0929780Z mov.b32 %r2401, %r2387; 2026-02-21T10:19:58.0929838Z mov.b32 %r2402, %r2387; 2026-02-21T10:19:58.0929895Z mov.b32 %r2403, %r2387; 2026-02-21T10:19:58.0929961Z mov.b32 %r2404, %r2387; 2026-02-21T10:19:58.0930019Z mov.b32 %r2405, %r2387; 2026-02-21T10:19:58.0930076Z mov.b32 %r2406, %r2387; 2026-02-21T10:19:58.0930135Z mov.b32 %r2407, %r2387; 2026-02-21T10:19:58.0930196Z mov.b32 %r2408, %r2387; 2026-02-21T10:19:58.0930253Z mov.b32 %r2409, %r2387; 2026-02-21T10:19:58.0930311Z mov.b32 %r2410, %r2387; 2026-02-21T10:19:58.0930384Z mov.b32 %r2411, %r2387; 2026-02-21T10:19:58.0930451Z mov.b32 %r2412, %r2387; 2026-02-21T10:19:58.0930510Z mov.b32 %r2413, %r2387; 2026-02-21T10:19:58.0930568Z mov.b32 %r2414, %r2387; 2026-02-21T10:19:58.0930634Z mov.b32 %r2415, %r2387; 
2026-02-21T10:19:58.0930697Z mov.b32 %r2416, %r2387; 2026-02-21T10:19:58.0930756Z mov.b32 %r2417, %r2387; 2026-02-21T10:19:58.0930818Z mov.b32 %r2418, %r2387; 2026-02-21T10:19:58.0930876Z mov.b32 %r2419, %r2387; 2026-02-21T10:19:58.0930933Z mov.b32 %r2420, %r2387; 2026-02-21T10:19:58.0930994Z mov.b32 %r2421, %r2387; 2026-02-21T10:19:58.0931150Z mov.b32 %r2422, %r2387; 2026-02-21T10:19:58.0931209Z mov.b32 %r2423, %r2387; 2026-02-21T10:19:58.0931268Z mov.b32 %r2424, %r2387; 2026-02-21T10:19:58.0931332Z mov.b32 %r2425, %r2387; 2026-02-21T10:19:58.0931391Z mov.b32 %r2426, %r2387; 2026-02-21T10:19:58.0931448Z mov.b32 %r2427, %r2387; 2026-02-21T10:19:58.0931510Z mov.b32 %r2428, %r2387; 2026-02-21T10:19:58.0931568Z mov.b32 %r2429, %r2387; 2026-02-21T10:19:58.0931636Z mov.b32 %r2430, %r2387; 2026-02-21T10:19:58.0931696Z mov.b32 %r2431, %r2387; 2026-02-21T10:19:58.0931760Z mov.b32 %r2432, %r2387; 2026-02-21T10:19:58.0931819Z mov.b32 %r2433, %r2387; 2026-02-21T10:19:58.0931879Z mov.b32 %r2434, %r2387; 2026-02-21T10:19:58.0931944Z mov.b32 %r2435, %r2387; 2026-02-21T10:19:58.0932003Z mov.b32 %r2436, %r2387; 2026-02-21T10:19:58.0932061Z mov.b32 %r2437, %r2387; 2026-02-21T10:19:58.0932120Z mov.b32 %r2438, %r2387; 2026-02-21T10:19:58.0932181Z mov.b32 %r2439, %r2387; 2026-02-21T10:19:58.0932238Z mov.b32 %r2440, %r2387; 2026-02-21T10:19:58.0932297Z mov.b32 %r2441, %r2387; 2026-02-21T10:19:58.0932362Z mov.b32 %r2442, %r2387; 2026-02-21T10:19:58.0932419Z mov.b32 %r2443, %r2387; 2026-02-21T10:19:58.0932539Z mov.b32 %r2444, %r2387; 2026-02-21T10:19:58.0932602Z mov.b32 %r2445, %r2387; 2026-02-21T10:19:58.0932666Z mov.b32 %r2446, %r2387; 2026-02-21T10:19:58.0932725Z mov.b32 %r2447, %r2387; 2026-02-21T10:19:58.0932783Z mov.b32 %r2448, %r2387; 2026-02-21T10:19:58.0932843Z mov.b32 %r2449, %r2387; 2026-02-21T10:19:58.0932902Z mov.b32 %r2450, %r2387; 2026-02-21T10:19:58.0932959Z mov.b32 %r2451, %r2387; 2026-02-21T10:19:58.0933018Z mov.b32 %r2452, %r2387; 2026-02-21T10:19:58.0933128Z mov.b32 
%r2453, %r2387; 2026-02-21T10:19:58.0933188Z mov.b32 %r2454, %r2387; 2026-02-21T10:19:58.0933246Z mov.b32 %r2455, %r2387; 2026-02-21T10:19:58.0933308Z mov.b32 %r2456, %r2387; 2026-02-21T10:19:58.0933366Z mov.b32 %r2457, %r2387; 2026-02-21T10:19:58.0933423Z mov.b32 %r2458, %r2387; 2026-02-21T10:19:58.0933483Z mov.b32 %r2459, %r2387; 2026-02-21T10:19:58.0933544Z mov.b32 %r2460, %r2387; 2026-02-21T10:19:58.0933601Z mov.b32 %r2461, %r2387; 2026-02-21T10:19:58.0933659Z mov.b32 %r2462, %r2387; 2026-02-21T10:19:58.0933722Z mov.b32 %r2463, %r2387; 2026-02-21T10:19:58.0933781Z mov.b32 %r2464, %r2387; 2026-02-21T10:19:58.0933838Z mov.b32 %r2465, %r2387; 2026-02-21T10:19:58.0933897Z mov.b32 %r2466, %r2387; 2026-02-21T10:19:58.0933958Z mov.b32 %r2467, %r2387; 2026-02-21T10:19:58.0934015Z mov.b32 %r2468, %r2387; 2026-02-21T10:19:58.0934073Z mov.b32 %r2469, %r2387; 2026-02-21T10:19:58.0934147Z mov.b32 %r2470, %r2387; 2026-02-21T10:19:58.0934211Z mov.b32 %r2471, %r2387; 2026-02-21T10:19:58.0934321Z mov.b32 %r2472, %r2387; 2026-02-21T10:19:58.0934386Z mov.b32 %r2473, %r2387; 2026-02-21T10:19:58.0934443Z mov.b32 %r2474, %r2387; 2026-02-21T10:19:58.0934501Z mov.b32 %r2475, %r2387; 2026-02-21T10:19:58.0934559Z mov.b32 %r2476, %r2387; 2026-02-21T10:19:58.0934621Z mov.b32 %r2477, %r2387; 2026-02-21T10:19:58.0934682Z mov.b32 %r2478, %r2387; 2026-02-21T10:19:58.0934739Z mov.b32 %r2479, %r2387; 2026-02-21T10:19:58.0934800Z mov.b32 %r2480, %r2387; 2026-02-21T10:19:58.0934858Z mov.b32 %r2481, %r2387; 2026-02-21T10:19:58.0934916Z mov.b32 %r2482, %r2387; 2026-02-21T10:19:58.0934974Z mov.b32 %r2483, %r2387; 2026-02-21T10:19:58.0935038Z mov.b32 %r2484, %r2387; 2026-02-21T10:19:58.0935095Z mov.b32 %r2485, %r2387; 2026-02-21T10:19:58.0935154Z mov.b32 %r2486, %r2387; 2026-02-21T10:19:58.0935218Z mov.b32 %r2487, %r2387; 2026-02-21T10:19:58.0935283Z mov.b32 %r2488, %r2387; 2026-02-21T10:19:58.0935341Z mov.b32 %r2489, %r2387; 2026-02-21T10:19:58.0935402Z mov.b32 %r2490, %r2387; 
2026-02-21T10:19:58.0935464Z mov.b32 %r2491, %r2387; 2026-02-21T10:19:58.0935522Z mov.b32 %r2492, %r2387; 2026-02-21T10:19:58.0935579Z mov.b32 %r2493, %r2387; 2026-02-21T10:19:58.0935641Z mov.b32 %r2494, %r2387; 2026-02-21T10:19:58.0935698Z mov.b32 %r2495, %r2387; 2026-02-21T10:19:58.0935755Z mov.b32 %r2496, %r2387; 2026-02-21T10:19:58.0935867Z mov.b32 %r2497, %r2387; 2026-02-21T10:19:58.0935929Z mov.b32 %r2498, %r2387; 2026-02-21T10:19:58.0935996Z mov.b32 %r2499, %r2387; 2026-02-21T10:19:58.0936059Z mov.b32 %r2500, %r2387; 2026-02-21T10:19:58.0936123Z mov.b32 %r2501, %r2387; 2026-02-21T10:19:58.0936181Z mov.b32 %r2502, %r2387; 2026-02-21T10:19:58.0936238Z mov.b32 %r2503, %r2387; 2026-02-21T10:19:58.0936295Z mov.b32 %r2504, %r2387; 2026-02-21T10:19:58.0936358Z mov.b32 %r2505, %r2387; 2026-02-21T10:19:58.0936417Z mov.b32 %r2506, %r2387; 2026-02-21T10:19:58.0936602Z mov.b32 %r2507, %r2387; 2026-02-21T10:19:58.0936670Z mov.b32 %r2508, %r2387; 2026-02-21T10:19:58.0936733Z mov.b32 %r2509, %r2387; 2026-02-21T10:19:58.0936791Z mov.b32 %r2510, %r2387; 2026-02-21T10:19:58.0936848Z mov.b32 %r2511, %r2387; 2026-02-21T10:19:58.0936910Z mov.b32 %r2512, %r2387; 2026-02-21T10:19:58.0936979Z mov.b32 %r2513, %r2387; 2026-02-21T10:19:58.0937039Z mov.b32 %r2514, %r2387; 2026-02-21T10:19:58.0937107Z bra.uni $L__BB0_6; 2026-02-21T10:19:58.0937202Z $L__BB0_7: // %._crit_edge 2026-02-21T10:19:58.0937518Z .loc 1 26 112 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:26:112 2026-02-21T10:19:58.0937597Z cp.async.wait_group 0; 2026-02-21T10:19:58.0937654Z bar.sync 0; 2026-02-21T10:19:58.0937714Z // begin inline asm 2026-02-21T10:19:58.0937819Z @%p61 mbarrier.inval.shared::cta.b64 [%r512]; 2026-02-21T10:19:58.0937881Z // end inline asm 2026-02-21T10:19:58.0937939Z bar.sync 0; 2026-02-21T10:19:58.0937999Z // begin inline asm 2026-02-21T10:19:58.0938095Z @%p61 mbarrier.inval.shared::cta.b64 [%r513]; 2026-02-21T10:19:58.0938218Z // end inline asm 2026-02-21T10:19:58.0938275Z bar.sync 
0; 2026-02-21T10:19:58.0938337Z // begin inline asm 2026-02-21T10:19:58.0938430Z @%p61 mbarrier.inval.shared::cta.b64 [%r514]; 2026-02-21T10:19:58.0938499Z // end inline asm 2026-02-21T10:19:58.0938717Z .loc 1 26 4 // chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py:26:4 2026-02-21T10:19:58.0938777Z ret; 2026-02-21T10:19:58.0938833Z $L__tmp3: 2026-02-21T10:19:58.0938890Z $L__func_end0: 2026-02-21T10:19:58.0938984Z // -- End function 2026-02-21T10:19:58.0939038Z } 2026-02-21T10:19:58.0939291Z .file 1 "/tmp/torchinductor_root/hs/chsgbaotruecb7zsm2ofakhvlwr7cejizgdkwyyaqmz7llbe5bmn.py" 2026-02-21T10:19:58.0939502Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T10:19:58.0939571Z .section .debug_abbrev 2026-02-21T10:19:58.0939626Z { 2026-02-21T10:19:58.0939720Z .b8 1 // Abbreviation Code 2026-02-21T10:19:58.0939894Z .b8 17 // DW_TAG_compile_unit 2026-02-21T10:19:58.0939982Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:19:58.0940067Z .b8 37 // DW_AT_producer 2026-02-21T10:19:58.0940161Z .b8 8 // DW_FORM_string 2026-02-21T10:19:58.0940245Z .b8 19 // DW_AT_language 2026-02-21T10:19:58.0940329Z .b8 5 // DW_FORM_data2 2026-02-21T10:19:58.0940409Z .b8 3 // DW_AT_name 2026-02-21T10:19:58.0940490Z .b8 8 // DW_FORM_string 2026-02-21T10:19:58.0940571Z .b8 16 // DW_AT_stmt_list 2026-02-21T10:19:58.0940657Z .b8 6 // DW_FORM_data4 2026-02-21T10:19:58.0940741Z .b8 27 // DW_AT_comp_dir 2026-02-21T10:19:58.0940821Z .b8 8 // DW_FORM_string 2026-02-21T10:19:58.0940896Z .b8 0 // EOM(1) 2026-02-21T10:19:58.0940999Z .b8 0 // EOM(2) 2026-02-21T10:19:58.0941159Z .b8 2 // Abbreviation Code 2026-02-21T10:19:58.0941295Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:19:58.0941485Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:19:58.0941572Z .b8 3 // DW_AT_name 2026-02-21T10:19:58.0941649Z .b8 8 // DW_FORM_string 2026-02-21T10:19:58.0941731Z .b8 32 // DW_AT_inline 2026-02-21T10:19:58.0941813Z .b8 11 // DW_FORM_data1 2026-02-21T10:19:58.0941882Z .b8 0 // 
EOM(1) 2026-02-21T10:19:58.0941950Z .b8 0 // EOM(2) 2026-02-21T10:19:58.0942042Z .b8 3 // Abbreviation Code 2026-02-21T10:19:58.0942126Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:19:58.0942207Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:19:58.0942291Z .b8 17 // DW_AT_low_pc 2026-02-21T10:19:58.0942366Z .b8 1 // DW_FORM_addr 2026-02-21T10:19:58.0942449Z .b8 18 // DW_AT_high_pc 2026-02-21T10:19:58.0942584Z .b8 1 // DW_FORM_addr 2026-02-21T10:19:58.0942688Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:19:58.0942765Z .b8 19 // DW_FORM_ref4 2026-02-21T10:19:58.0942832Z .b8 0 // EOM(1) 2026-02-21T10:19:58.0942906Z .b8 0 // EOM(2) 2026-02-21T10:19:58.0942990Z .b8 4 // Abbreviation Code 2026-02-21T10:19:58.0943138Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T10:19:58.0943224Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:19:58.0943317Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:19:58.0943399Z .b8 19 // DW_FORM_ref4 2026-02-21T10:19:58.0943481Z .b8 17 // DW_AT_low_pc 2026-02-21T10:19:58.0943559Z .b8 1 // DW_FORM_addr 2026-02-21T10:19:58.0943649Z .b8 18 // DW_AT_high_pc 2026-02-21T10:19:58.0943724Z .b8 1 // DW_FORM_addr 2026-02-21T10:19:58.0943815Z .b8 88 // DW_AT_call_file 2026-02-21T10:19:58.0943905Z .b8 11 // DW_FORM_data1 2026-02-21T10:19:58.0943989Z .b8 89 // DW_AT_call_line 2026-02-21T10:19:58.0944071Z .b8 11 // DW_FORM_data1 2026-02-21T10:19:58.0944212Z .b8 87 // DW_AT_call_column 2026-02-21T10:19:58.0944292Z .b8 11 // DW_FORM_data1 2026-02-21T10:19:58.0944365Z .b8 0 // EOM(1) 2026-02-21T10:19:58.0944440Z .b8 0 // EOM(2) 2026-02-21T10:19:58.0944510Z .b8 0 // EOM(3) 2026-02-21T10:19:58.0944562Z } 2026-02-21T10:19:58.0944629Z .section .debug_info 2026-02-21T10:19:58.0944680Z { 2026-02-21T10:19:58.0944772Z .b32 178 // Length of Unit 2026-02-21T10:19:58.0944872Z .b8 2 // DWARF version number 2026-02-21T10:19:58.0944926Z .b8 0 2026-02-21T10:19:58.0945059Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T10:19:58.0945156Z .b8 8 // Address Size (in bytes) 2026-02-21T10:19:58.0945275Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T10:19:58.0945363Z .b8 116 // DW_AT_producer 2026-02-21T10:19:58.0945418Z .b8 114 2026-02-21T10:19:58.0945474Z .b8 105 2026-02-21T10:19:58.0945525Z .b8 116 2026-02-21T10:19:58.0945577Z .b8 111 2026-02-21T10:19:58.0945628Z .b8 110 2026-02-21T10:19:58.0945684Z .b8 0 2026-02-21T10:19:58.0945827Z .b8 2 // DW_AT_language 2026-02-21T10:19:58.0945882Z .b8 0 2026-02-21T10:19:58.0945964Z .b8 99 // DW_AT_name 2026-02-21T10:19:58.0946018Z .b8 104 2026-02-21T10:19:58.0946070Z .b8 115 2026-02-21T10:19:58.0946120Z .b8 103 2026-02-21T10:19:58.0946178Z .b8 98 2026-02-21T10:19:58.0946230Z .b8 97 2026-02-21T10:19:58.0946281Z .b8 111 2026-02-21T10:19:58.0946334Z .b8 116 2026-02-21T10:19:58.0946386Z .b8 114 2026-02-21T10:19:58.0946438Z .b8 117 2026-02-21T10:19:58.0946628Z .b8 101 2026-02-21T10:19:58.0946683Z .b8 99 2026-02-21T10:19:58.0946734Z .b8 98 2026-02-21T10:19:58.0946786Z .b8 55 2026-02-21T10:19:58.0946841Z .b8 122 2026-02-21T10:19:58.0946904Z .b8 115 2026-02-21T10:19:58.0946959Z .b8 109 2026-02-21T10:19:58.0947011Z .b8 50 2026-02-21T10:19:58.0947067Z .b8 111 2026-02-21T10:19:58.0947117Z .b8 102 2026-02-21T10:19:58.0947169Z .b8 97 2026-02-21T10:19:58.0947224Z .b8 107 2026-02-21T10:19:58.0947275Z .b8 104 2026-02-21T10:19:58.0947330Z .b8 118 2026-02-21T10:19:58.0947380Z .b8 108 2026-02-21T10:19:58.0947435Z .b8 119 2026-02-21T10:19:58.0947484Z .b8 114 2026-02-21T10:19:58.0947534Z .b8 55 2026-02-21T10:19:58.0947587Z .b8 99 2026-02-21T10:19:58.0947717Z .b8 101 2026-02-21T10:19:58.0947772Z .b8 106 2026-02-21T10:19:58.0947821Z .b8 105 2026-02-21T10:19:58.0947881Z .b8 122 2026-02-21T10:19:58.0947932Z .b8 103 2026-02-21T10:19:58.0947984Z .b8 100 2026-02-21T10:19:58.0948038Z .b8 107 2026-02-21T10:19:58.0948100Z .b8 119 2026-02-21T10:19:58.0948155Z .b8 121 2026-02-21T10:19:58.0948207Z .b8 121 2026-02-21T10:19:58.0948260Z .b8 97 
2026-02-21T10:19:58.0948310Z .b8 113 2026-02-21T10:19:58.0948507Z .b8 109 2026-02-21T10:19:58.0948559Z .b8 122 2026-02-21T10:19:58.0948615Z .b8 55 2026-02-21T10:19:58.0948667Z .b8 108 2026-02-21T10:19:58.0948719Z .b8 108 2026-02-21T10:19:58.0948776Z .b8 98 2026-02-21T10:19:58.0948826Z .b8 101 2026-02-21T10:19:58.0948877Z .b8 53 2026-02-21T10:19:58.0948927Z .b8 98 2026-02-21T10:19:58.0948981Z .b8 109 2026-02-21T10:19:58.0949033Z .b8 110 2026-02-21T10:19:58.0949082Z .b8 46 2026-02-21T10:19:58.0949136Z .b8 112 2026-02-21T10:19:58.0949188Z .b8 121 2026-02-21T10:19:58.0949237Z .b8 0 2026-02-21T10:19:58.0949340Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T10:19:58.0949429Z .b8 47 // DW_AT_comp_dir 2026-02-21T10:19:58.0949484Z .b8 116 2026-02-21T10:19:58.0949536Z .b8 109 2026-02-21T10:19:58.0949591Z .b8 112 2026-02-21T10:19:58.0949642Z .b8 47 2026-02-21T10:19:58.0949693Z .b8 116 2026-02-21T10:19:58.0949744Z .b8 111 2026-02-21T10:19:58.0949798Z .b8 114 2026-02-21T10:19:58.0949847Z .b8 99 2026-02-21T10:19:58.0949899Z .b8 104 2026-02-21T10:19:58.0949950Z .b8 105 2026-02-21T10:19:58.0950102Z .b8 110 2026-02-21T10:19:58.0950158Z .b8 100 2026-02-21T10:19:58.0950210Z .b8 117 2026-02-21T10:19:58.0950261Z .b8 99 2026-02-21T10:19:58.0950311Z .b8 116 2026-02-21T10:19:58.0950362Z .b8 111 2026-02-21T10:19:58.0950412Z .b8 114 2026-02-21T10:19:58.0950467Z .b8 95 2026-02-21T10:19:58.0950520Z .b8 114 2026-02-21T10:19:58.0950571Z .b8 111 2026-02-21T10:19:58.0950628Z .b8 111 2026-02-21T10:19:58.0950678Z .b8 116 2026-02-21T10:19:58.0950728Z .b8 47 2026-02-21T10:19:58.0950779Z .b8 104 2026-02-21T10:19:58.0950835Z .b8 115 2026-02-21T10:19:58.0950886Z .b8 0 2026-02-21T10:19:58.0950999Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T10:19:58.0951082Z .b8 95 // DW_AT_name 2026-02-21T10:19:58.0951132Z .b8 104 2026-02-21T10:19:58.0951184Z .b8 101 2026-02-21T10:19:58.0951235Z .b8 108 2026-02-21T10:19:58.0951290Z .b8 105 2026-02-21T10:19:58.0951341Z .b8 111 2026-02-21T10:19:58.0951394Z 
.b8 110 2026-02-21T10:19:58.0951449Z .b8 95 2026-02-21T10:19:58.0951502Z .b8 109 2026-02-21T10:19:58.0951552Z .b8 97 2026-02-21T10:19:58.0951603Z .b8 116 2026-02-21T10:19:58.0951656Z .b8 109 2026-02-21T10:19:58.0951705Z .b8 117 2026-02-21T10:19:58.0951765Z .b8 108 2026-02-21T10:19:58.0951816Z .b8 95 2026-02-21T10:19:58.0951866Z .b8 98 2026-02-21T10:19:58.0951997Z .b8 102 2026-02-21T10:19:58.0952051Z .b8 49 2026-02-21T10:19:58.0952101Z .b8 54 2026-02-21T10:19:58.0952150Z .b8 95 2026-02-21T10:19:58.0952205Z .b8 105 2026-02-21T10:19:58.0952259Z .b8 110 2026-02-21T10:19:58.0952310Z .b8 116 2026-02-21T10:19:58.0952359Z .b8 52 2026-02-21T10:19:58.0952411Z .b8 0 2026-02-21T10:19:58.0952491Z .b8 1 // DW_AT_inline 2026-02-21T10:19:58.0952598Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T10:19:58.0952696Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T10:19:58.0952794Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T10:19:58.0952900Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:19:58.0953042Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T10:19:58.0953147Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:19:58.0953240Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T10:19:58.0953331Z .b64 $L__tmp2 // DW_AT_high_pc 2026-02-21T10:19:58.0953473Z .b8 1 // DW_AT_call_file 2026-02-21T10:19:58.0953557Z .b8 90 // DW_AT_call_line 2026-02-21T10:19:58.0953642Z .b8 40 // DW_AT_call_column 2026-02-21T10:19:58.0953735Z .b8 0 // End Of Children Mark 2026-02-21T10:19:58.0953821Z .b8 0 // End Of Children Mark 2026-02-21T10:19:58.0953873Z } 2026-02-21T10:19:58.0953947Z .section .debug_macinfo { } 2026-02-21T10:19:58.0954001Z 2026-02-21T10:19:58.0954084Z ================================================================ 2026-02-21T10:19:58.0954202Z please share the reproducer above with Triton project. 
2026-02-21T10:19:58.9617955Z 2026-02-21T10:19:58.9619900Z Generation 15: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 58/58 2.0 configs/s 2026-02-21T10:20:00.9263868Z Generation 15: verifying top configs 100% ━━━━━━━━━━━━━━━━━ 34/34 10.7 configs/s 2026-02-21T10:20:02.3486918Z [3195s] Generation 15 complete: 2026-02-21T10:20:02.3487189Z error=18 2026-02-21T10:20:02.3487339Z ok=42 2026-02-21T10:20:02.3487486Z min=6.3389 2026-02-21T10:20:02.3487636Z mid=11.4968 2026-02-21T10:20:02.3487789Z max=872.1146 2026-02-21T10:20:02.3487961Z best={'block_sizes': [16, 128, 128], 2026-02-21T10:20:02.3488301Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T10:20:02.3488644Z 'l2_groupings': [8], 2026-02-21T10:20:02.3488849Z 'load_eviction_policies': ['last', ''], 2026-02-21T10:20:02.3489105Z 'loop_orders': [[1, 0]], 2026-02-21T10:20:02.3489529Z 'maxnreg': 256, 2026-02-21T10:20:02.3489720Z 'num_sm_multiplier': 64, 2026-02-21T10:20:02.3489916Z 'num_stages': 1, 2026-02-21T10:20:02.3490088Z 'num_warps': 4, 2026-02-21T10:20:02.3543812Z 'pid_type': 'persistent_interleaved', 2026-02-21T10:20:02.3544060Z 'range_flattens': [True, False], 2026-02-21T10:20:02.3544292Z 'range_multi_buffers': [None, False], 2026-02-21T10:20:02.3544513Z 'range_num_stages': [4, 1], 2026-02-21T10:20:02.3544715Z 'range_unroll_factors': [1, 1], 2026-02-21T10:20:02.3544928Z 'range_warp_specializes': []} 2026-02-21T10:20:02.3545173Z [3195s] Fitting surrogate: 1322 points, 1322 targets 2026-02-21T10:20:03.2697834Z [3196s] Generation 16 starting: 52 neighbors, 3 active search path(s) 2026-02-21T10:20:22.7388169Z Generation 16: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 53/53 2.1 configs/s 2026-02-21T10:20:51.5778505Z Generation 16: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 53/53 1.8 configs/s 2026-02-21T10:20:54.2767174Z Generation 16: verifying top configs 100% ━━━━━━━━━━━━━━━━━━ 34/34 8.8 configs/s 2026-02-21T10:20:55.6988165Z [3248s] Generation 16 complete: 
2026-02-21T10:20:55.6988552Z error=10 2026-02-21T10:20:55.6988736Z ok=45 2026-02-21T10:20:55.6988909Z min=6.3864 2026-02-21T10:20:55.6989080Z mid=11.5999 2026-02-21T10:20:55.6989253Z max=504.6950 2026-02-21T10:20:55.6989865Z best={'block_sizes': [16, 128, 128], 2026-02-21T10:20:55.6990238Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T10:20:55.6990629Z 'l2_groupings': [8], 2026-02-21T10:20:55.6990884Z 'load_eviction_policies': ['last', ''], 2026-02-21T10:20:55.6991163Z 'loop_orders': [[1, 0]], 2026-02-21T10:20:55.6991373Z 'maxnreg': 256, 2026-02-21T10:20:55.6991571Z 'num_sm_multiplier': 64, 2026-02-21T10:20:55.6991786Z 'num_stages': 1, 2026-02-21T10:20:55.6991982Z 'num_warps': 4, 2026-02-21T10:20:55.6992187Z 'pid_type': 'persistent_blocked', 2026-02-21T10:20:55.6992449Z 'range_flattens': [True, False], 2026-02-21T10:20:55.6992709Z 'range_multi_buffers': [None, False], 2026-02-21T10:20:55.6992972Z 'range_num_stages': [4, 1], 2026-02-21T10:20:55.6993202Z 'range_unroll_factors': [1, 1], 2026-02-21T10:20:55.6993477Z 'range_warp_specializes': []} 2026-02-21T10:20:55.7050150Z [3248s] Fitting surrogate: 1377 points, 1377 targets 2026-02-21T10:20:56.7123210Z [3249s] Generation 17 starting: 58 neighbors, 3 active search path(s) 2026-02-21T10:21:16.7482920Z Generation 17: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 59/59 3.6 configs/s 2026-02-21T10:21:49.1312529Z Generation 17: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 59/59 1.8 configs/s 2026-02-21T10:21:51.8797061Z Generation 17: verifying top configs 100% ━━━━━━━━━━━━━━━━━━ 34/34 8.7 configs/s 2026-02-21T10:21:53.3271962Z [3306s] Generation 17 complete: 2026-02-21T10:21:53.3272422Z error=14 2026-02-21T10:21:53.3272700Z ok=47 2026-02-21T10:21:53.3272964Z min=6.0923 2026-02-21T10:21:53.3273230Z mid=11.5062 2026-02-21T10:21:53.3273498Z max=466.2318 2026-02-21T10:21:53.3274417Z best={'block_sizes': [16, 128, 128], 2026-02-21T10:21:53.3275024Z 'indexing': ['tensor_descriptor', 
'tensor_descriptor', 'pointer'], 2026-02-21T10:21:53.3275825Z 'l2_groupings': [8], 2026-02-21T10:21:53.3276318Z 'load_eviction_policies': ['last', ''], 2026-02-21T10:21:53.3277202Z 'loop_orders': [[1, 0]], 2026-02-21T10:21:53.3277451Z 'maxnreg': 256, 2026-02-21T10:21:53.3277770Z 'num_sm_multiplier': 64, 2026-02-21T10:21:53.3278002Z 'num_stages': 1, 2026-02-21T10:21:53.3278269Z 'num_warps': 4, 2026-02-21T10:21:53.3278501Z 'pid_type': 'persistent_blocked', 2026-02-21T10:21:53.3278831Z 'range_flattens': [True, False], 2026-02-21T10:21:53.3279120Z 'range_multi_buffers': [None, False], 2026-02-21T10:21:53.3279447Z 'range_num_stages': [3, 1], 2026-02-21T10:21:53.3279702Z 'range_unroll_factors': [1, 1], 2026-02-21T10:21:53.3339782Z 'range_warp_specializes': []} 2026-02-21T10:21:53.3340090Z [3306s] Fitting surrogate: 1438 points, 1438 targets 2026-02-21T10:21:54.3928248Z [3307s] Generation 18 starting: 60 neighbors, 3 active search path(s) 2026-02-21T10:22:20.2213648Z Generation 18: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 61/61 1.0 configs/s 2026-02-21T10:22:24.2149611Z 2026-02-21T10:22:24.2149630Z 2026-02-21T10:22:24.2150051Z ================================================================ 2026-02-21T10:22:24.2150466Z Internal Triton PTX codegen error 2026-02-21T10:22:24.2151236Z `ptxas` stderr: 2026-02-21T10:22:24.2152013Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 1166 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 
2026-02-21T10:22:24.2152879Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:22:24.2153175Z 2026-02-21T10:22:24.2153854Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpy47jx0jh.ptx -o /tmp/tmpy47jx0jh.ptx.o 2026-02-21T10:22:24.2154581Z 2026-02-21T10:22:24.2154587Z 2026-02-21T10:22:24.2154674Z // 2026-02-21T10:22:24.2154868Z // Generated by LLVM NVPTX Back-End 2026-02-21T10:22:24.2155122Z // 2026-02-21T10:22:24.2155214Z 2026-02-21T10:22:24.2155284Z .version 8.7 2026-02-21T10:22:24.2155469Z .target sm_90a 2026-02-21T10:22:24.2155654Z .address_size 64 2026-02-21T10:22:24.2155800Z 2026-02-21T10:22:24.2156072Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T10:22:24.2156792Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T10:22:24.2157302Z // @_helion_matmul_bf16_int4 2026-02-21T10:22:24.2157645Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T10:22:24.2158014Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T10:22:24.2158460Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T10:22:24.2158885Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T10:22:24.2159311Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T10:22:24.2159748Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T10:22:24.2160081Z ) 2026-02-21T10:22:24.2160240Z .reqntid 128 2026-02-21T10:22:24.2160407Z .maxnreg 32 2026-02-21T10:22:24.2160572Z { 2026-02-21T10:22:24.2160728Z .reg .pred %p<406>; 2026-02-21T10:22:24.2160929Z .reg .b16 %rs<3585>; 2026-02-21T10:22:24.2161124Z .reg .b32 %r<43502>; 2026-02-21T10:22:24.2161316Z .reg .b64 %rd<856>; 2026-02-21T10:22:24.2161692Z .loc 1 19 0 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:19:0 2026-02-21T10:22:24.2162145Z $L__func_begin0: 
2026-02-21T10:22:24.2162519Z .loc 1 19 0 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:19:0 2026-02-21T10:22:24.2162908Z 2026-02-21T10:22:24.2162984Z // %bb.0: 2026-02-21T10:22:24.2163278Z ld.param.b64 %rd117, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T10:22:24.2163671Z ld.param.b64 %rd119, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T10:22:24.2164179Z $L__tmp0: 2026-02-21T10:22:24.2164558Z .loc 1 21 67 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:21:67 2026-02-21T10:22:24.2165055Z mov.u32 %r2212, %ctaid.x; 2026-02-21T10:22:24.2165342Z ld.param.b64 %rd137, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T10:22:24.2165647Z mov.u32 %r2213, %ctaid.y; 2026-02-21T10:22:24.2165921Z ld.param.b64 %rd154, [_helion_matmul_bf16_int4_param_3]; 2026-02-21T10:22:24.2166230Z mov.u32 %r2214, %ctaid.z; 2026-02-21T10:22:24.2166449Z mov.u32 %r2215, %nctaid.x; 2026-02-21T10:22:24.2166878Z mov.u32 %r2216, %nctaid.y; 2026-02-21T10:22:24.2167139Z mad.lo.s32 %r2217, %r2214, %r2216, %r2213; 2026-02-21T10:22:24.2167407Z mad.lo.s32 %r2218, %r2217, %r2215, %r2212; 2026-02-21T10:22:24.2167637Z shl.b32 %r2219, %r2218, 8; 2026-02-21T10:22:24.2167836Z cvt.s64.s32 %rd155, %r2219; 2026-02-21T10:22:24.2168089Z add.s64 %rd133, %rd154, %rd155; 2026-02-21T10:22:24.2168299Z mov.u32 %r1, %tid.x; 2026-02-21T10:22:24.2168597Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T10:22:24.2168804Z shl.b32 %r2220, %r1, 2; 2026-02-21T10:22:24.2168981Z mov.b32 %r39936, global_smem; 2026-02-21T10:22:24.2169175Z add.s32 %r2196, %r39936, %r2220; 2026-02-21T10:22:24.2169364Z mov.b32 %r2205, 0; 2026-02-21T10:22:24.2169519Z // begin inline asm 2026-02-21T10:22:24.2169773Z @%p1 st.shared.b32 [ %r2196 + 0 ], %r2205; 2026-02-21T10:22:24.2169984Z // end inline asm 2026-02-21T10:22:24.2170151Z bar.warp.sync -1; 2026-02-21T10:22:24.2170321Z setp.eq.b32 %p313, %r1, 0; 2026-02-21T10:22:24.2170508Z cvt.u64.u32 %rd118, %r39936; 2026-02-21T10:22:24.2170698Z // begin inline asm 2026-02-21T10:22:24.2171045Z @%p313 
tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd118 + 0 ], %rd119; 2026-02-21T10:22:24.2171407Z // end inline asm 2026-02-21T10:22:24.2171556Z // begin inline asm 2026-02-21T10:22:24.2171835Z @%p313 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x1; 2026-02-21T10:22:24.2172187Z // end inline asm 2026-02-21T10:22:24.2172406Z mov.b32 %r2198, 128; 2026-02-21T10:22:24.2172578Z // begin inline asm 2026-02-21T10:22:24.2172913Z @%p313 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x0, %r2198; 2026-02-21T10:22:24.2173268Z // end inline asm 2026-02-21T10:22:24.2173422Z mov.b32 %r2199, 32; 2026-02-21T10:22:24.2173592Z // begin inline asm 2026-02-21T10:22:24.2173876Z @%p313 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x1, %r2199; 2026-02-21T10:22:24.2174307Z // end inline asm 2026-02-21T10:22:24.2176312Z mov.b32 %r2200, 1280; 2026-02-21T10:22:24.2176665Z // begin inline asm 2026-02-21T10:22:24.2177000Z @%p313 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x0, %r2200; 2026-02-21T10:22:24.2177361Z // end inline asm 2026-02-21T10:22:24.2177526Z mov.b32 %r2201, 4096; 2026-02-21T10:22:24.2177693Z // begin inline asm 2026-02-21T10:22:24.2178011Z @%p313 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x1, %r2201; 2026-02-21T10:22:24.2178400Z // end inline asm 2026-02-21T10:22:24.2178569Z mov.b64 %rd126, 1280; 2026-02-21T10:22:24.2178751Z // begin inline asm 2026-02-21T10:22:24.2179079Z @%p313 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd118 + 0 ], 0x0, %rd126; 2026-02-21T10:22:24.2179451Z // end inline asm 2026-02-21T10:22:24.2179608Z mov.b32 %r2202, 1; 2026-02-21T10:22:24.2179763Z // begin inline asm 2026-02-21T10:22:24.2180089Z @%p313 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x0, %r2202; 2026-02-21T10:22:24.2180507Z // end inline asm 2026-02-21T10:22:24.2180679Z // begin inline asm 
2026-02-21T10:22:24.2181001Z @%p313 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x1, %r2202; 2026-02-21T10:22:24.2181390Z // end inline asm 2026-02-21T10:22:24.2181548Z // begin inline asm 2026-02-21T10:22:24.2181835Z @%p313 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x0; 2026-02-21T10:22:24.2182315Z // end inline asm 2026-02-21T10:22:24.2182503Z // begin inline asm 2026-02-21T10:22:24.2182844Z @%p313 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x0; 2026-02-21T10:22:24.2183226Z // end inline asm 2026-02-21T10:22:24.2183399Z // begin inline asm 2026-02-21T10:22:24.2183697Z @%p313 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x3; 2026-02-21T10:22:24.2184039Z // end inline asm 2026-02-21T10:22:24.2184204Z // begin inline asm 2026-02-21T10:22:24.2184487Z @%p313 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x0; 2026-02-21T10:22:24.2184850Z // end inline asm 2026-02-21T10:22:24.2184997Z // begin inline asm 2026-02-21T10:22:24.2185438Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd133 + 0 ], [ %rd118 + 0 ], 0x80; 2026-02-21T10:22:24.2185922Z // end inline asm 2026-02-21T10:22:24.2186094Z // begin inline asm 2026-02-21T10:22:24.2186614Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd133 + 0 ], 0x80; 2026-02-21T10:22:24.2186961Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T10:22:24.2187194Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T10:22:24.2187407Z // end inline asm 2026-02-21T10:22:24.2187619Z bar.sync 0; 2026-02-21T10:22:24.2187802Z cvta.global.u64 %rd822, %rd133; 2026-02-21T10:22:24.2188273Z .loc 1 23 68 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:23:68 2026-02-21T10:22:24.2188743Z add.s64 %rd151, %rd133, 128; 2026-02-21T10:22:24.2188928Z bar.sync 0; 2026-02-21T10:22:24.2189099Z // begin inline asm 2026-02-21T10:22:24.2189273Z @%p1 st.shared.b32 
[ %r2196 + 0 ], %r2205; 2026-02-21T10:22:24.2189489Z // end inline asm 2026-02-21T10:22:24.2189639Z bar.warp.sync -1; 2026-02-21T10:22:24.2189808Z // begin inline asm 2026-02-21T10:22:24.2190153Z @%p313 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd118 + 0 ], %rd137; 2026-02-21T10:22:24.2190515Z // end inline asm 2026-02-21T10:22:24.2190681Z // begin inline asm 2026-02-21T10:22:24.2190988Z @%p313 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x1; 2026-02-21T10:22:24.2191360Z // end inline asm 2026-02-21T10:22:24.2191507Z mov.b32 %r2206, 64; 2026-02-21T10:22:24.2191676Z // begin inline asm 2026-02-21T10:22:24.2191988Z @%p313 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x0, %r2206; 2026-02-21T10:22:24.2192349Z // end inline asm 2026-02-21T10:22:24.2192505Z // begin inline asm 2026-02-21T10:22:24.2192872Z @%p313 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x1, %r2198; 2026-02-21T10:22:24.2193224Z // end inline asm 2026-02-21T10:22:24.2193375Z // begin inline asm 2026-02-21T10:22:24.2193680Z @%p313 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x0, %r2200; 2026-02-21T10:22:24.2194039Z // end inline asm 2026-02-21T10:22:24.2194194Z mov.b32 %r2209, 65536; 2026-02-21T10:22:24.2194366Z // begin inline asm 2026-02-21T10:22:24.2194666Z @%p313 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x1, %r2209; 2026-02-21T10:22:24.2195030Z // end inline asm 2026-02-21T10:22:24.2195182Z mov.b64 %rd144, 2560; 2026-02-21T10:22:24.2195363Z // begin inline asm 2026-02-21T10:22:24.2195679Z @%p313 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd118 + 0 ], 0x0, %rd144; 2026-02-21T10:22:24.2196047Z // end inline asm 2026-02-21T10:22:24.2196206Z // begin inline asm 2026-02-21T10:22:24.2196672Z @%p313 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x0, %r2202; 2026-02-21T10:22:24.2197049Z // end inline asm 
2026-02-21T10:22:24.2197193Z // begin inline asm 2026-02-21T10:22:24.2197509Z @%p313 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x1, %r2202; 2026-02-21T10:22:24.2197864Z // end inline asm 2026-02-21T10:22:24.2198012Z // begin inline asm 2026-02-21T10:22:24.2198297Z @%p313 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd118 + 0 ], 0xa; 2026-02-21T10:22:24.2198771Z // end inline asm 2026-02-21T10:22:24.2198921Z // begin inline asm 2026-02-21T10:22:24.2199225Z @%p313 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x0; 2026-02-21T10:22:24.2199582Z // end inline asm 2026-02-21T10:22:24.2199728Z // begin inline asm 2026-02-21T10:22:24.2200016Z @%p313 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x3; 2026-02-21T10:22:24.2200361Z // end inline asm 2026-02-21T10:22:24.2200514Z // begin inline asm 2026-02-21T10:22:24.2200793Z @%p313 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd118 + 0 ], 0x0; 2026-02-21T10:22:24.2201115Z // end inline asm 2026-02-21T10:22:24.2201269Z // begin inline asm 2026-02-21T10:22:24.2201693Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd151 + 0 ], [ %rd118 + 0 ], 0x80; 2026-02-21T10:22:24.2202175Z // end inline asm 2026-02-21T10:22:24.2202320Z // begin inline asm 2026-02-21T10:22:24.2202655Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd151 + 0 ], 0x80; 2026-02-21T10:22:24.2202975Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T10:22:24.2203190Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T10:22:24.2203402Z // end inline asm 2026-02-21T10:22:24.2203545Z bar.sync 0; 2026-02-21T10:22:24.2203770Z cvta.global.u64 %rd497, %rd151; 2026-02-21T10:22:24.2204104Z .loc 1 29 35 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:29:35 2026-02-21T10:22:24.2204464Z shl.b32 %r42472, %r2212, 1; 2026-02-21T10:22:24.2204799Z .loc 1 30 37 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:30:37 2026-02-21T10:22:24.2205144Z add.s32 %r2222, %r42472, 2; 2026-02-21T10:22:24.2205460Z .loc 1 30 49 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:30:49 2026-02-21T10:22:24.2205822Z min.s32 %r3, %r2222, 5120; 2026-02-21T10:22:24.2206150Z .loc 1 31 88 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:31:88 2026-02-21T10:22:24.2206637Z sub.s32 %r2223, %r3, %r42472; 2026-02-21T10:22:24.2206844Z mul.hi.s32 %r2224, %r2223, 1431655766; 2026-02-21T10:22:24.2207047Z shr.u32 %r2225, %r2224, 31; 2026-02-21T10:22:24.2207236Z add.s32 %r2226, %r2224, %r2225; 2026-02-21T10:22:24.2207440Z mad.lo.s32 %r43244, %r2226, 3, %r42472; 2026-02-21T10:22:24.2207780Z .loc 1 44 45 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:44:45 2026-02-21T10:22:24.2208222Z shr.u32 %r5, %r1, 5; 2026-02-21T10:22:24.2208402Z shr.u32 %r2227, %r1, 3; 2026-02-21T10:22:24.2208580Z bfe.u32 %r6, %r1, 3, 4; 2026-02-21T10:22:24.2208749Z or.b32 %r7, %r2227, 112; 2026-02-21T10:22:24.2209058Z .loc 1 57 38 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:57:38 2026-02-21T10:22:24.2209408Z and.b32 %r8, %r1, 7; 2026-02-21T10:22:24.2209570Z shl.b32 %r9, %r8, 3; 2026-02-21T10:22:24.2209873Z .loc 1 31 88 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:31:88 2026-02-21T10:22:24.2210230Z setp.ge.s32 %p37, %r42472, %r43244; 2026-02-21T10:22:24.2210438Z and.b32 %r42456, %r1, 127; 2026-02-21T10:22:24.2210615Z and.b32 %r42457, %r1, 56; 2026-02-21T10:22:24.2210804Z shl.b32 %r42458, %r1, 6; 2026-02-21T10:22:24.2210974Z shl.b32 %r42459, %r1, 5; 2026-02-21T10:22:24.2211142Z shl.b32 %r42460, %r1, 1; 2026-02-21T10:22:24.2211313Z shl.b32 %r42461, %r8, 4; 2026-02-21T10:22:24.2211476Z shl.b32 %r42462, %r1, 7; 2026-02-21T10:22:24.2211650Z and.b32 %r42463, %r1, 16; 2026-02-21T10:22:24.2211813Z or.b32 %r42464, %r6, 96; 2026-02-21T10:22:24.2211979Z or.b32 %r42465, %r6, 80; 2026-02-21T10:22:24.2212139Z or.b32 %r42466, %r6, 64; 
2026-02-21T10:22:24.2212303Z or.b32 %r42467, %r6, 48; 2026-02-21T10:22:24.2212474Z or.b32 %r42468, %r6, 32; 2026-02-21T10:22:24.2212643Z or.b32 %r42469, %r6, 16; 2026-02-21T10:22:24.2212804Z shl.b32 %r42470, %r6, 13; 2026-02-21T10:22:24.2213056Z shl.b32 %r42471, %r7, 13; 2026-02-21T10:22:24.2213233Z setp.lt.u32 %p405, %r1, 64; 2026-02-21T10:22:24.2213411Z @%p37 bra $L__BB0_15; 2026-02-21T10:22:24.2213601Z // %bb.1: // %.lr.ph 2026-02-21T10:22:24.2213956Z .loc 1 0 88 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:0:88 2026-02-21T10:22:24.2214310Z shl.b32 %r2229, %r42456, 4; 2026-02-21T10:22:24.2214491Z xor.b32 %r2231, %r2229, %r42457; 2026-02-21T10:22:24.2214689Z add.s32 %r10, %r39936, %r2231; 2026-02-21T10:22:24.2214870Z xor.b32 %r2233, %r2231, 8; 2026-02-21T10:22:24.2215053Z add.s32 %r11, %r39936, %r2233; 2026-02-21T10:22:24.2215237Z and.b32 %r2235, %r42458, 6144; 2026-02-21T10:22:24.2215417Z and.b32 %r2237, %r42459, 896; 2026-02-21T10:22:24.2215615Z and.b32 %r2239, %r42460, 62; 2026-02-21T10:22:24.2215800Z or.b32 %r2240, %r2235, %r2237; 2026-02-21T10:22:24.2215981Z or.b32 %r2241, %r2240, %r2239; 2026-02-21T10:22:24.2216163Z add.s32 %r12, %r39936, %r2241; 2026-02-21T10:22:24.2216430Z xor.b32 %r2242, %r2241, 8; 2026-02-21T10:22:24.2216732Z add.s32 %r13, %r39936, %r2242; 2026-02-21T10:22:24.2216916Z xor.b32 %r2243, %r2241, 16; 2026-02-21T10:22:24.2217092Z add.s32 %r14, %r39936, %r2243; 2026-02-21T10:22:24.2217277Z xor.b32 %r2244, %r2241, 24; 2026-02-21T10:22:24.2217453Z add.s32 %r15, %r39936, %r2244; 2026-02-21T10:22:24.2217727Z xor.b32 %r2245, %r2241, 32; 2026-02-21T10:22:24.2217904Z add.s32 %r16, %r39936, %r2245; 2026-02-21T10:22:24.2218080Z xor.b32 %r2246, %r2241, 40; 2026-02-21T10:22:24.2218258Z add.s32 %r17, %r39936, %r2246; 2026-02-21T10:22:24.2218435Z xor.b32 %r2247, %r2241, 48; 2026-02-21T10:22:24.2218616Z add.s32 %r18, %r39936, %r2247; 2026-02-21T10:22:24.2218795Z xor.b32 %r2248, %r2241, 56; 2026-02-21T10:22:24.2218988Z add.s32 %r19, %r39936, 
%r2248; 2026-02-21T10:22:24.2219167Z add.s32 %r20, %r39936, %r42456; 2026-02-21T10:22:24.2219360Z xor.b32 %r2249, %r42456, 16; 2026-02-21T10:22:24.2219547Z add.s32 %r21, %r39936, %r2249; 2026-02-21T10:22:24.2219739Z xor.b32 %r2250, %r42456, 32; 2026-02-21T10:22:24.2219929Z add.s32 %r22, %r39936, %r2250; 2026-02-21T10:22:24.2220121Z xor.b32 %r2251, %r42456, 48; 2026-02-21T10:22:24.2220305Z add.s32 %r23, %r39936, %r2251; 2026-02-21T10:22:24.2220488Z xor.b32 %r2252, %r42456, 64; 2026-02-21T10:22:24.2220668Z add.s32 %r24, %r39936, %r2252; 2026-02-21T10:22:24.2220846Z xor.b32 %r2253, %r42456, 80; 2026-02-21T10:22:24.2221024Z add.s32 %r25, %r39936, %r2253; 2026-02-21T10:22:24.2221283Z xor.b32 %r2254, %r42456, 96; 2026-02-21T10:22:24.2221462Z add.s32 %r26, %r39936, %r2254; 2026-02-21T10:22:24.2221643Z xor.b32 %r2255, %r42456, 112; 2026-02-21T10:22:24.2221822Z add.s32 %r27, %r39936, %r2255; 2026-02-21T10:22:24.2222003Z shl.b32 %r2256, %r42456, 7; 2026-02-21T10:22:24.2222182Z or.b32 %r2258, %r2256, %r42461; 2026-02-21T10:22:24.2222369Z add.s32 %r28, %r39936, %r2258; 2026-02-21T10:22:24.2222545Z xor.b32 %r2259, %r2258, 16; 2026-02-21T10:22:24.2222742Z add.s32 %r29, %r39936, %r2259; 2026-02-21T10:22:24.2222920Z xor.b32 %r2260, %r2258, 32; 2026-02-21T10:22:24.2223100Z add.s32 %r30, %r39936, %r2260; 2026-02-21T10:22:24.2223279Z xor.b32 %r2261, %r2258, 48; 2026-02-21T10:22:24.2223449Z add.s32 %r31, %r39936, %r2261; 2026-02-21T10:22:24.2223628Z xor.b32 %r2262, %r2258, 64; 2026-02-21T10:22:24.2223804Z add.s32 %r32, %r39936, %r2262; 2026-02-21T10:22:24.2223988Z xor.b32 %r2263, %r2258, 80; 2026-02-21T10:22:24.2224162Z add.s32 %r33, %r39936, %r2263; 2026-02-21T10:22:24.2224347Z xor.b32 %r2264, %r2258, 96; 2026-02-21T10:22:24.2224522Z add.s32 %r34, %r39936, %r2264; 2026-02-21T10:22:24.2224705Z xor.b32 %r2265, %r2258, 112; 2026-02-21T10:22:24.2224902Z add.s32 %r35, %r39936, %r2265; 2026-02-21T10:22:24.2225084Z bfe.u32 %r2266, %r39936, 4, 14; 2026-02-21T10:22:24.2225279Z cvt.u64.u32 
%rd156, %r2266; 2026-02-21T10:22:24.2225469Z or.b64 %rd3, %rd156, 4611686293372403712; 2026-02-21T10:22:24.2225688Z add.s32 %r2267, %r39936, 32; 2026-02-21T10:22:24.2225959Z bfe.u32 %r2268, %r2267, 4, 14; 2026-02-21T10:22:24.2226160Z cvt.u64.u32 %rd157, %r2268; 2026-02-21T10:22:24.2226348Z or.b64 %rd4, %rd157, 4611686293372403712; 2026-02-21T10:22:24.2226684Z add.s32 %r2269, %r39936, 64; 2026-02-21T10:22:24.2226862Z bfe.u32 %r2270, %r2269, 4, 14; 2026-02-21T10:22:24.2227049Z cvt.u64.u32 %rd158, %r2270; 2026-02-21T10:22:24.2227239Z or.b64 %rd5, %rd158, 4611686293372403712; 2026-02-21T10:22:24.2227440Z add.s32 %r2271, %r39936, 96; 2026-02-21T10:22:24.2227622Z bfe.u32 %r2272, %r2271, 4, 14; 2026-02-21T10:22:24.2249759Z cvt.u64.u32 %rd159, %r2272; 2026-02-21T10:22:24.2250053Z or.b64 %rd6, %rd159, 4611686293372403712; 2026-02-21T10:22:24.2250297Z add.s32 %r2273, %r39936, 16384; 2026-02-21T10:22:24.2250510Z bfe.u32 %r2274, %r2273, 4, 14; 2026-02-21T10:22:24.2250720Z cvt.u64.u32 %rd160, %r2274; 2026-02-21T10:22:24.2250923Z or.b64 %rd7, %rd160, 4611686293372403712; 2026-02-21T10:22:24.2251148Z add.s32 %r2275, %r39936, 16416; 2026-02-21T10:22:24.2251348Z bfe.u32 %r2276, %r2275, 4, 14; 2026-02-21T10:22:24.2251673Z cvt.u64.u32 %rd161, %r2276; 2026-02-21T10:22:24.2251885Z or.b64 %rd8, %rd161, 4611686293372403712; 2026-02-21T10:22:24.2252108Z add.s32 %r2277, %r39936, 16448; 2026-02-21T10:22:24.2252316Z bfe.u32 %r2278, %r2277, 4, 14; 2026-02-21T10:22:24.2252504Z cvt.u64.u32 %rd162, %r2278; 2026-02-21T10:22:24.2252835Z or.b64 %rd9, %rd162, 4611686293372403712; 2026-02-21T10:22:24.2253049Z add.s32 %r2279, %r39936, 16480; 2026-02-21T10:22:24.2253245Z bfe.u32 %r2280, %r2279, 4, 14; 2026-02-21T10:22:24.2253425Z cvt.u64.u32 %rd163, %r2280; 2026-02-21T10:22:24.2253631Z or.b64 %rd10, %rd163, 4611686293372403712; 2026-02-21T10:22:24.2253849Z and.b32 %r2282, %r42462, 1920; 2026-02-21T10:22:24.2254035Z or.b32 %r2284, %r2282, %r42461; 2026-02-21T10:22:24.2254241Z xor.b32 %r2285, %r2284, 
%r42463; 2026-02-21T10:22:24.2254433Z or.b32 %r2286, %r2285, %r2235; 2026-02-21T10:22:24.2254625Z add.s32 %r36, %r39936, %r2286; 2026-02-21T10:22:24.2254811Z add.s32 %r37, %r36, 16384; 2026-02-21T10:22:24.2255004Z add.s32 %r38, %r36, 8192; 2026-02-21T10:22:24.2255185Z add.s32 %r39, %r36, 24576; 2026-02-21T10:22:24.2255371Z xor.b32 %r2287, %r2286, 32; 2026-02-21T10:22:24.2255559Z add.s32 %r40, %r39936, %r2287; 2026-02-21T10:22:24.2255742Z add.s32 %r41, %r40, 16384; 2026-02-21T10:22:24.2255940Z add.s32 %r42, %r40, 8192; 2026-02-21T10:22:24.2256121Z add.s32 %r43, %r40, 24576; 2026-02-21T10:22:24.2256302Z xor.b32 %r2288, %r2286, 64; 2026-02-21T10:22:24.2256706Z add.s32 %r44, %r39936, %r2288; 2026-02-21T10:22:24.2256929Z add.s32 %r45, %r44, 16384; 2026-02-21T10:22:24.2257106Z add.s32 %r46, %r44, 8192; 2026-02-21T10:22:24.2257287Z add.s32 %r47, %r44, 24576; 2026-02-21T10:22:24.2257462Z xor.b32 %r2289, %r2286, 96; 2026-02-21T10:22:24.2257666Z add.s32 %r48, %r39936, %r2289; 2026-02-21T10:22:24.2257857Z add.s32 %r49, %r48, 16384; 2026-02-21T10:22:24.2258035Z add.s32 %r50, %r48, 8192; 2026-02-21T10:22:24.2258215Z add.s32 %r51, %r48, 24576; 2026-02-21T10:22:24.2258404Z or.b32 %r2290, %r2237, %r2239; 2026-02-21T10:22:24.2258596Z or.b32 %r2291, %r2290, %r2235; 2026-02-21T10:22:24.2258788Z add.s32 %r52, %r39936, %r2291; 2026-02-21T10:22:24.2258977Z xor.b32 %r2292, %r2291, 8; 2026-02-21T10:22:24.2259154Z add.s32 %r53, %r39936, %r2292; 2026-02-21T10:22:24.2259344Z xor.b32 %r2293, %r2291, 16; 2026-02-21T10:22:24.2259533Z add.s32 %r54, %r39936, %r2293; 2026-02-21T10:22:24.2259724Z xor.b32 %r2294, %r2291, 24; 2026-02-21T10:22:24.2259908Z add.s32 %r55, %r39936, %r2294; 2026-02-21T10:22:24.2260089Z xor.b32 %r2295, %r2291, 32; 2026-02-21T10:22:24.2260280Z add.s32 %r56, %r39936, %r2295; 2026-02-21T10:22:24.2260470Z xor.b32 %r2296, %r2291, 40; 2026-02-21T10:22:24.2260661Z add.s32 %r57, %r39936, %r2296; 2026-02-21T10:22:24.2260847Z xor.b32 %r2297, %r2291, 48; 2026-02-21T10:22:24.2261032Z 
add.s32 %r58, %r39936, %r2297; 2026-02-21T10:22:24.2261223Z xor.b32 %r2298, %r2291, 56; 2026-02-21T10:22:24.2261494Z add.s32 %r59, %r39936, %r2298; 2026-02-21T10:22:24.2261840Z .loc 1 31 88 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:31:88 2026-02-21T10:22:24.2262216Z mad.wide.u32 %rd11, %r8, 16, %rd117; 2026-02-21T10:22:24.2262585Z .loc 1 51 126 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:51:126 2026-02-21T10:22:24.2262967Z or.b32 %r2300, %r42471, %r9; 2026-02-21T10:22:24.2263174Z or.b32 %r68, %r2300, 128; 2026-02-21T10:22:24.2263405Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T10:22:24.2263701Z // Child Loop BB0_3 Depth 2 2026-02-21T10:22:24.2263975Z // Child Loop BB0_5 Depth 2 2026-02-21T10:22:24.2264235Z // Child Loop BB0_7 Depth 2 2026-02-21T10:22:24.2264503Z // Child Loop BB0_9 Depth 2 2026-02-21T10:22:24.2264842Z // Child Loop BB0_11 Depth 2 2026-02-21T10:22:24.2265127Z // Child Loop BB0_13 Depth 2 2026-02-21T10:22:24.2265517Z .loc 1 37 35 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:37:35 2026-02-21T10:22:24.2265886Z shr.s32 %r2302, %r42472, 31; 2026-02-21T10:22:24.2266147Z shr.u32 %r2303, %r2302, 18; 2026-02-21T10:22:24.2266332Z add.s32 %r2304, %r42472, %r2303; 2026-02-21T10:22:24.2266646Z shr.s32 %r2305, %r2304, 14; 2026-02-21T10:22:24.2266970Z .loc 1 38 33 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:38:33 2026-02-21T10:22:24.2267326Z shl.b32 %r2306, %r2305, 5; 2026-02-21T10:22:24.2267636Z .loc 1 39 39 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:39:39 2026-02-21T10:22:24.2267987Z sub.s32 %r2307, 10, %r2306; 2026-02-21T10:22:24.2268307Z .loc 1 39 52 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:39:52 2026-02-21T10:22:24.2268743Z min.s32 %r2308, %r2307, 32; 2026-02-21T10:22:24.2269059Z .loc 1 40 45 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:40:45 2026-02-21T10:22:24.2269412Z and.b32 %r2309, %r2304, -16384; 2026-02-21T10:22:24.2269612Z sub.s32 
%r2310, %r42472, %r2309; 2026-02-21T10:22:24.2269935Z .loc 1 41 51 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:41:51 2026-02-21T10:22:24.2270288Z div.s32 %r2311, %r2310, %r2308; 2026-02-21T10:22:24.2270705Z .loc 1 40 64 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:40:64 2026-02-21T10:22:24.2271067Z mul.lo.s32 %r2312, %r2311, %r2308; 2026-02-21T10:22:24.2271271Z sub.s32 %r2313, %r2310, %r2312; 2026-02-21T10:22:24.2271589Z .loc 1 40 30 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:40:30 2026-02-21T10:22:24.2271940Z add.s32 %r2314, %r2313, %r2306; 2026-02-21T10:22:24.2272262Z .loc 1 42 27 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:42:27 2026-02-21T10:22:24.2272630Z shl.b32 %r9808, %r2314, 7; 2026-02-21T10:22:24.2272946Z .loc 1 43 27 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:43:27 2026-02-21T10:22:24.2273286Z shl.b32 %r12255, %r2311, 7; 2026-02-21T10:22:24.2273625Z .loc 1 51 126 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:51:126 2026-02-21T10:22:24.2273986Z or.b32 %r2315, %r42464, %r12255; 2026-02-21T10:22:24.2274185Z shl.b32 %r2316, %r2315, 13; 2026-02-21T10:22:24.2274368Z mul.wide.s32 %rd21, %r2316, 2; 2026-02-21T10:22:24.2274578Z or.b32 %r2317, %r42465, %r12255; 2026-02-21T10:22:24.2274767Z shl.b32 %r2318, %r2317, 13; 2026-02-21T10:22:24.2274961Z mul.wide.s32 %rd22, %r2318, 2; 2026-02-21T10:22:24.2275155Z or.b32 %r2319, %r42466, %r12255; 2026-02-21T10:22:24.2275338Z shl.b32 %r2320, %r2319, 13; 2026-02-21T10:22:24.2275528Z mul.wide.s32 %rd23, %r2320, 2; 2026-02-21T10:22:24.2275806Z or.b32 %r2321, %r42467, %r12255; 2026-02-21T10:22:24.2276001Z shl.b32 %r2322, %r2321, 13; 2026-02-21T10:22:24.2276177Z mul.wide.s32 %rd24, %r2322, 2; 2026-02-21T10:22:24.2276369Z or.b32 %r2323, %r42468, %r12255; 2026-02-21T10:22:24.2276688Z shl.b32 %r2324, %r2323, 13; 2026-02-21T10:22:24.2276881Z mul.wide.s32 %rd25, %r2324, 2; 2026-02-21T10:22:24.2277075Z or.b32 %r2325, %r42469, %r12255; 
2026-02-21T10:22:24.2277259Z shl.b32 %r2326, %r2325, 13; 2026-02-21T10:22:24.2277446Z mul.wide.s32 %rd26, %r2326, 2; 2026-02-21T10:22:24.2277629Z shl.b32 %r2327, %r2311, 20; 2026-02-21T10:22:24.2277807Z or.b32 %r2328, %r42470, %r2327; 2026-02-21T10:22:24.2278007Z mul.wide.s32 %rd27, %r2328, 2; 2026-02-21T10:22:24.2278204Z or.b32 %r42473, %r68, %r2327; 2026-02-21T10:22:24.2278385Z or.b32 %r2329, %r42471, %r2327; 2026-02-21T10:22:24.2278577Z mul.wide.s32 %rd28, %r2329, 2; 2026-02-21T10:22:24.2278766Z mov.b32 %r42474, 0f00000000; 2026-02-21T10:22:24.2278955Z mov.b64 %rd841, -96; 2026-02-21T10:22:24.2279212Z mov.b64 %rd840, %rd11; 2026-02-21T10:22:24.2279393Z mov.b32 %r42475, %r42474; 2026-02-21T10:22:24.2279577Z mov.b32 %r42476, %r42474; 2026-02-21T10:22:24.2279747Z mov.b32 %r42477, %r42474; 2026-02-21T10:22:24.2279922Z mov.b32 %r42478, %r42474; 2026-02-21T10:22:24.2280087Z mov.b32 %r42479, %r42474; 2026-02-21T10:22:24.2280333Z mov.b32 %r42480, %r42474; 2026-02-21T10:22:24.2280501Z mov.b32 %r42481, %r42474; 2026-02-21T10:22:24.2280674Z mov.b32 %r42482, %r42474; 2026-02-21T10:22:24.2280843Z mov.b32 %r42483, %r42474; 2026-02-21T10:22:24.2281015Z mov.b32 %r42484, %r42474; 2026-02-21T10:22:24.2281187Z mov.b32 %r42485, %r42474; 2026-02-21T10:22:24.2281355Z mov.b32 %r42486, %r42474; 2026-02-21T10:22:24.2281530Z mov.b32 %r42487, %r42474; 2026-02-21T10:22:24.2281695Z mov.b32 %r42488, %r42474; 2026-02-21T10:22:24.2281874Z mov.b32 %r42489, %r42474; 2026-02-21T10:22:24.2282037Z mov.b32 %r42490, %r42474; 2026-02-21T10:22:24.2282210Z mov.b32 %r42491, %r42474; 2026-02-21T10:22:24.2282377Z mov.b32 %r42492, %r42474; 2026-02-21T10:22:24.2282550Z mov.b32 %r42493, %r42474; 2026-02-21T10:22:24.2282716Z mov.b32 %r42494, %r42474; 2026-02-21T10:22:24.2282889Z mov.b32 %r42495, %r42474; 2026-02-21T10:22:24.2283063Z mov.b32 %r42496, %r42474; 2026-02-21T10:22:24.2283229Z mov.b32 %r42497, %r42474; 2026-02-21T10:22:24.2283408Z mov.b32 %r42498, %r42474; 2026-02-21T10:22:24.2283571Z mov.b32 
%r42499, %r42474; 2026-02-21T10:22:24.2283746Z mov.b32 %r42500, %r42474; 2026-02-21T10:22:24.2283998Z mov.b32 %r42501, %r42474; 2026-02-21T10:22:24.2284187Z mov.b32 %r42502, %r42474; 2026-02-21T10:22:24.2284359Z mov.b32 %r42503, %r42474; 2026-02-21T10:22:24.2284530Z mov.b32 %r42504, %r42474; 2026-02-21T10:22:24.2284707Z mov.b32 %r42505, %r42474; 2026-02-21T10:22:24.2284873Z mov.b32 %r42506, %r42474; 2026-02-21T10:22:24.2285047Z mov.b32 %r42507, %r42474; 2026-02-21T10:22:24.2285210Z mov.b32 %r42508, %r42474; 2026-02-21T10:22:24.2285383Z mov.b32 %r42509, %r42474; 2026-02-21T10:22:24.2285548Z mov.b32 %r42510, %r42474; 2026-02-21T10:22:24.2285721Z mov.b32 %r42511, %r42474; 2026-02-21T10:22:24.2285885Z mov.b32 %r42512, %r42474; 2026-02-21T10:22:24.2286070Z mov.b32 %r42513, %r42474; 2026-02-21T10:22:24.2286245Z mov.b32 %r42514, %r42474; 2026-02-21T10:22:24.2286415Z mov.b32 %r42515, %r42474; 2026-02-21T10:22:24.2286732Z mov.b32 %r42516, %r42474; 2026-02-21T10:22:24.2286900Z mov.b32 %r42517, %r42474; 2026-02-21T10:22:24.2287075Z mov.b32 %r42518, %r42474; 2026-02-21T10:22:24.2287242Z mov.b32 %r42519, %r42474; 2026-02-21T10:22:24.2287414Z mov.b32 %r42520, %r42474; 2026-02-21T10:22:24.2287580Z mov.b32 %r42521, %r42474; 2026-02-21T10:22:24.2287753Z mov.b32 %r42522, %r42474; 2026-02-21T10:22:24.2287921Z mov.b32 %r42523, %r42474; 2026-02-21T10:22:24.2288098Z mov.b32 %r42524, %r42474; 2026-02-21T10:22:24.2288275Z mov.b32 %r42525, %r42474; 2026-02-21T10:22:24.2288451Z mov.b32 %r42526, %r42474; 2026-02-21T10:22:24.2288703Z mov.b32 %r42527, %r42474; 2026-02-21T10:22:24.2288869Z mov.b32 %r42528, %r42474; 2026-02-21T10:22:24.2289038Z mov.b32 %r42529, %r42474; 2026-02-21T10:22:24.2289203Z mov.b32 %r42530, %r42474; 2026-02-21T10:22:24.2289374Z mov.b32 %r42531, %r42474; 2026-02-21T10:22:24.2289538Z mov.b32 %r42532, %r42474; 2026-02-21T10:22:24.2289706Z mov.b32 %r42533, %r42474; 2026-02-21T10:22:24.2289874Z mov.b32 %r42534, %r42474; 2026-02-21T10:22:24.2290046Z mov.b32 %r42535, %r42474; 
2026-02-21T10:22:24.2290218Z mov.b32 %r42536, %r42474; 2026-02-21T10:22:24.2290385Z mov.b32 %r42537, %r42474; 2026-02-21T10:22:24.2290555Z mov.b32 %r42538, %r42474; 2026-02-21T10:22:24.2290722Z mov.b32 %r42539, %r42474; 2026-02-21T10:22:24.2290891Z mov.b32 %r42540, %r42474; 2026-02-21T10:22:24.2291055Z mov.b32 %r42541, %r42474; 2026-02-21T10:22:24.2291223Z mov.b32 %r42542, %r42474; 2026-02-21T10:22:24.2291388Z mov.b32 %r42543, %r42474; 2026-02-21T10:22:24.2291558Z mov.b32 %r42544, %r42474; 2026-02-21T10:22:24.2291726Z mov.b32 %r42545, %r42474; 2026-02-21T10:22:24.2291980Z mov.b32 %r42546, %r42474; 2026-02-21T10:22:24.2292162Z mov.b32 %r42547, %r42474; 2026-02-21T10:22:24.2292327Z mov.b32 %r42548, %r42474; 2026-02-21T10:22:24.2292497Z mov.b32 %r42549, %r42474; 2026-02-21T10:22:24.2292663Z mov.b32 %r42550, %r42474; 2026-02-21T10:22:24.2292831Z mov.b32 %r42551, %r42474; 2026-02-21T10:22:24.2293074Z mov.b32 %r42552, %r42474; 2026-02-21T10:22:24.2293233Z mov.b32 %r42553, %r42474; 2026-02-21T10:22:24.2293397Z mov.b32 %r42554, %r42474; 2026-02-21T10:22:24.2293560Z mov.b32 %r42555, %r42474; 2026-02-21T10:22:24.2293725Z mov.b32 %r42556, %r42474; 2026-02-21T10:22:24.2293888Z mov.b32 %r42557, %r42474; 2026-02-21T10:22:24.2294056Z mov.b32 %r42558, %r42474; 2026-02-21T10:22:24.2294237Z mov.b32 %r42559, %r42474; 2026-02-21T10:22:24.2294404Z mov.b32 %r42560, %r42474; 2026-02-21T10:22:24.2294581Z mov.b32 %r42561, %r42474; 2026-02-21T10:22:24.2294745Z mov.b32 %r42562, %r42474; 2026-02-21T10:22:24.2294914Z mov.b32 %r42563, %r42474; 2026-02-21T10:22:24.2295078Z mov.b32 %r42564, %r42474; 2026-02-21T10:22:24.2295250Z mov.b32 %r42565, %r42474; 2026-02-21T10:22:24.2295412Z mov.b32 %r42566, %r42474; 2026-02-21T10:22:24.2295579Z mov.b32 %r42567, %r42474; 2026-02-21T10:22:24.2295738Z mov.b32 %r42568, %r42474; 2026-02-21T10:22:24.2295926Z mov.b32 %r42569, %r42474; 2026-02-21T10:22:24.2296101Z mov.b32 %r42570, %r42474; 2026-02-21T10:22:24.2296271Z mov.b32 %r42571, %r42474; 
2026-02-21T10:22:24.2296440Z mov.b32 %r42572, %r42474; 2026-02-21T10:22:24.2296814Z mov.b32 %r42573, %r42474; 2026-02-21T10:22:24.2296990Z mov.b32 %r42574, %r42474; 2026-02-21T10:22:24.2297154Z mov.b32 %r42575, %r42474; 2026-02-21T10:22:24.2297319Z mov.b32 %r42576, %r42474; 2026-02-21T10:22:24.2297482Z mov.b32 %r42577, %r42474; 2026-02-21T10:22:24.2297648Z mov.b32 %r42578, %r42474; 2026-02-21T10:22:24.2297817Z mov.b32 %r42579, %r42474; 2026-02-21T10:22:24.2297983Z mov.b32 %r42580, %r42474; 2026-02-21T10:22:24.2298157Z mov.b32 %r42581, %r42474; 2026-02-21T10:22:24.2298324Z mov.b32 %r42582, %r42474; 2026-02-21T10:22:24.2298496Z mov.b32 %r42583, %r42474; 2026-02-21T10:22:24.2298657Z mov.b32 %r42584, %r42474; 2026-02-21T10:22:24.2298830Z mov.b32 %r42585, %r42474; 2026-02-21T10:22:24.2298993Z mov.b32 %r42586, %r42474; 2026-02-21T10:22:24.2299172Z mov.b32 %r42587, %r42474; 2026-02-21T10:22:24.2299338Z mov.b32 %r42588, %r42474; 2026-02-21T10:22:24.2299511Z mov.b32 %r42589, %r42474; 2026-02-21T10:22:24.2299677Z mov.b32 %r42590, %r42474; 2026-02-21T10:22:24.2299850Z mov.b32 %r42591, %r42474; 2026-02-21T10:22:24.2300024Z mov.b32 %r42592, %r42474; 2026-02-21T10:22:24.2300190Z mov.b32 %r42593, %r42474; 2026-02-21T10:22:24.2300357Z mov.b32 %r42594, %r42474; 2026-02-21T10:22:24.2300521Z mov.b32 %r42595, %r42474; 2026-02-21T10:22:24.2300687Z mov.b32 %r42596, %r42474; 2026-02-21T10:22:24.2300857Z mov.b32 %r42597, %r42474; 2026-02-21T10:22:24.2301022Z mov.b32 %r42598, %r42474; 2026-02-21T10:22:24.2301270Z mov.b32 %r42599, %r42474; 2026-02-21T10:22:24.2301448Z mov.b32 %r42600, %r42474; 2026-02-21T10:22:24.2301614Z mov.b32 %r42601, %r42474; 2026-02-21T10:22:24.2301839Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T10:22:24.2302140Z // => This Inner Loop Header: Depth=2 2026-02-21T10:22:24.2302536Z .loc 1 58 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:32 2026-02-21T10:22:24.2302899Z add.s64 %rd166, %rd840, %rd27; 2026-02-21T10:22:24.2303086Z add.s64 %rd169, 
%rd840, %rd26; 2026-02-21T10:22:24.2303279Z add.s64 %rd172, %rd840, %rd25; 2026-02-21T10:22:24.2303459Z add.s64 %rd175, %rd840, %rd24; 2026-02-21T10:22:24.2303637Z add.s64 %rd178, %rd840, %rd23; 2026-02-21T10:22:24.2303817Z add.s64 %rd181, %rd840, %rd22; 2026-02-21T10:22:24.2303993Z add.s64 %rd184, %rd840, %rd21; 2026-02-21T10:22:24.2304307Z .loc 1 58 80 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:80 2026-02-21T10:22:24.2304765Z add.s64 %rd187, %rd840, %rd28; 2026-02-21T10:22:24.2304955Z // begin inline asm 2026-02-21T10:22:24.2305114Z mov.u64 %rd165, 0x0; 2026-02-21T10:22:24.2305351Z createpolicy.fractional.L2::evict_first.b64 %rd165, 1.0; 2026-02-21T10:22:24.2305609Z // end inline asm 2026-02-21T10:22:24.2305835Z // begin inline asm 2026-02-21T10:22:24.2305989Z mov.u32 %r2330, 0x0; 2026-02-21T10:22:24.2306138Z mov.u32 %r2331, 0x0; 2026-02-21T10:22:24.2306287Z mov.u32 %r2332, 0x0; 2026-02-21T10:22:24.2306435Z mov.u32 %r2333, 0x0; 2026-02-21T10:22:24.2306889Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r2330, %r2331, %r2332, %r2333 }, [ %rd166 + 0 ], %rd165; 2026-02-21T10:22:24.2307258Z // end inline asm 2026-02-21T10:22:24.2307420Z // begin inline asm 2026-02-21T10:22:24.2307569Z mov.u64 %rd168, 0x0; 2026-02-21T10:22:24.2307785Z createpolicy.fractional.L2::evict_first.b64 %rd168, 1.0; 2026-02-21T10:22:24.2308038Z // end inline asm 2026-02-21T10:22:24.2308187Z // begin inline asm 2026-02-21T10:22:24.2308352Z mov.u32 %r2334, 0x0; 2026-02-21T10:22:24.2308577Z mov.u32 %r2335, 0x0; 2026-02-21T10:22:24.2308731Z mov.u32 %r2336, 0x0; 2026-02-21T10:22:24.2308878Z mov.u32 %r2337, 0x0; 2026-02-21T10:22:24.2309190Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r2334, %r2335, %r2336, %r2337 }, [ %rd169 + 0 ], %rd168; 2026-02-21T10:22:24.2309556Z // end inline asm 2026-02-21T10:22:24.2309702Z // begin inline asm 2026-02-21T10:22:24.2309852Z mov.u64 %rd171, 0x0; 2026-02-21T10:22:24.2310166Z createpolicy.fractional.L2::evict_first.b64 %rd171, 1.0; 
2026-02-21T10:22:24.2310426Z // end inline asm 2026-02-21T10:22:24.2310568Z // begin inline asm 2026-02-21T10:22:24.2310734Z mov.u32 %r2338, 0x0; 2026-02-21T10:22:24.2310885Z mov.u32 %r2339, 0x0; 2026-02-21T10:22:24.2311034Z mov.u32 %r2340, 0x0; 2026-02-21T10:22:24.2311181Z mov.u32 %r2341, 0x0; 2026-02-21T10:22:24.2311505Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r2338, %r2339, %r2340, %r2341 }, [ %rd172 + 0 ], %rd171; 2026-02-21T10:22:24.2311875Z // end inline asm 2026-02-21T10:22:24.2312023Z // begin inline asm 2026-02-21T10:22:24.2312175Z mov.u64 %rd174, 0x0; 2026-02-21T10:22:24.2312389Z createpolicy.fractional.L2::evict_first.b64 %rd174, 1.0; 2026-02-21T10:22:24.2312646Z // end inline asm 2026-02-21T10:22:24.2312790Z // begin inline asm 2026-02-21T10:22:24.2312944Z mov.u32 %r2342, 0x0; 2026-02-21T10:22:24.2313094Z mov.u32 %r2343, 0x0; 2026-02-21T10:22:24.2313258Z mov.u32 %r2344, 0x0; 2026-02-21T10:22:24.2313413Z mov.u32 %r2345, 0x0; 2026-02-21T10:22:24.2313728Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r2342, %r2343, %r2344, %r2345 }, [ %rd175 + 0 ], %rd174; 2026-02-21T10:22:24.2314087Z // end inline asm 2026-02-21T10:22:24.2314230Z // begin inline asm 2026-02-21T10:22:24.2314383Z mov.u64 %rd177, 0x0; 2026-02-21T10:22:24.2314595Z createpolicy.fractional.L2::evict_first.b64 %rd177, 1.0; 2026-02-21T10:22:24.2314846Z // end inline asm 2026-02-21T10:22:24.2315091Z // begin inline asm 2026-02-21T10:22:24.2315243Z mov.u32 %r2346, 0x0; 2026-02-21T10:22:24.2315393Z mov.u32 %r2347, 0x0; 2026-02-21T10:22:24.2315543Z mov.u32 %r2348, 0x0; 2026-02-21T10:22:24.2315699Z mov.u32 %r2349, 0x0; 2026-02-21T10:22:24.2316004Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r2346, %r2347, %r2348, %r2349 }, [ %rd178 + 0 ], %rd177; 2026-02-21T10:22:24.2316364Z // end inline asm 2026-02-21T10:22:24.2316639Z // begin inline asm 2026-02-21T10:22:24.2316793Z mov.u64 %rd180, 0x0; 2026-02-21T10:22:24.2317009Z createpolicy.fractional.L2::evict_first.b64 %rd180, 1.0; 
2026-02-21T10:22:24.2317260Z // end inline asm 2026-02-21T10:22:24.2317402Z // begin inline asm 2026-02-21T10:22:24.2317566Z mov.u32 %r2350, 0x0; 2026-02-21T10:22:24.2317715Z mov.u32 %r2351, 0x0; 2026-02-21T10:22:24.2317861Z mov.u32 %r2352, 0x0; 2026-02-21T10:22:24.2318012Z mov.u32 %r2353, 0x0; 2026-02-21T10:22:24.2318316Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r2350, %r2351, %r2352, %r2353 }, [ %rd181 + 0 ], %rd180; 2026-02-21T10:22:24.2318757Z // end inline asm 2026-02-21T10:22:24.2318907Z // begin inline asm 2026-02-21T10:22:24.2319058Z mov.u64 %rd183, 0x0; 2026-02-21T10:22:24.2319265Z createpolicy.fractional.L2::evict_first.b64 %rd183, 1.0; 2026-02-21T10:22:24.2319525Z // end inline asm 2026-02-21T10:22:24.2319678Z // begin inline asm 2026-02-21T10:22:24.2319895Z mov.u32 %r2354, 0x0; 2026-02-21T10:22:24.2320046Z mov.u32 %r2355, 0x0; 2026-02-21T10:22:24.2320190Z mov.u32 %r2356, 0x0; 2026-02-21T10:22:24.2320339Z mov.u32 %r2357, 0x0; 2026-02-21T10:22:24.2320647Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r2354, %r2355, %r2356, %r2357 }, [ %rd184 + 0 ], %rd183; 2026-02-21T10:22:24.2321022Z // end inline asm 2026-02-21T10:22:24.2321167Z // begin inline asm 2026-02-21T10:22:24.2321318Z mov.u64 %rd186, 0x0; 2026-02-21T10:22:24.2321540Z createpolicy.fractional.L2::evict_first.b64 %rd186, 1.0; 2026-02-21T10:22:24.2321793Z // end inline asm 2026-02-21T10:22:24.2321943Z // begin inline asm 2026-02-21T10:22:24.2322096Z mov.u32 %r2358, 0x0; 2026-02-21T10:22:24.2322250Z mov.u32 %r2359, 0x0; 2026-02-21T10:22:24.2322399Z mov.u32 %r2360, 0x0; 2026-02-21T10:22:24.2322549Z mov.u32 %r2361, 0x0; 2026-02-21T10:22:24.2322850Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r2358, %r2359, %r2360, %r2361 }, [ %rd187 + 0 ], %rd186; 2026-02-21T10:22:24.2323207Z // end inline asm 2026-02-21T10:22:24.2323501Z .loc 1 62 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:62:32 2026-02-21T10:22:24.2323845Z bar.sync 0; 2026-02-21T10:22:24.2324093Z 
st.shared.v2.b32 [%r10], {%r2330, %r2331}; 2026-02-21T10:22:24.2324337Z st.shared.v2.b32 [%r10+2048], {%r2334, %r2335}; 2026-02-21T10:22:24.2324582Z st.shared.v2.b32 [%r10+4096], {%r2338, %r2339}; 2026-02-21T10:22:24.2324816Z st.shared.v2.b32 [%r10+6144], {%r2342, %r2343}; 2026-02-21T10:22:24.2325048Z st.shared.v2.b32 [%r10+8192], {%r2346, %r2347}; 2026-02-21T10:22:24.2325288Z st.shared.v2.b32 [%r10+10240], {%r2350, %r2351}; 2026-02-21T10:22:24.2325554Z st.shared.v2.b32 [%r10+12288], {%r2354, %r2355}; 2026-02-21T10:22:24.2325803Z st.shared.v2.b32 [%r10+14336], {%r2358, %r2359}; 2026-02-21T10:22:24.2326034Z st.shared.v2.b32 [%r11], {%r2332, %r2333}; 2026-02-21T10:22:24.2326263Z st.shared.v2.b32 [%r11+2048], {%r2336, %r2337}; 2026-02-21T10:22:24.2326668Z st.shared.v2.b32 [%r11+4096], {%r2340, %r2341}; 2026-02-21T10:22:24.2326928Z st.shared.v2.b32 [%r11+6144], {%r2344, %r2345}; 2026-02-21T10:22:24.2327166Z st.shared.v2.b32 [%r11+8192], {%r2348, %r2349}; 2026-02-21T10:22:24.2327406Z st.shared.v2.b32 [%r11+10240], {%r2352, %r2353}; 2026-02-21T10:22:24.2327643Z st.shared.v2.b32 [%r11+12288], {%r2356, %r2357}; 2026-02-21T10:22:24.2327880Z st.shared.v2.b32 [%r11+14336], {%r2360, %r2361}; 2026-02-21T10:22:24.2328094Z bar.sync 0; 2026-02-21T10:22:24.2328241Z ld.shared.b16 %rs1, [%r12]; 2026-02-21T10:22:24.2328433Z ld.shared.b16 %rs2, [%r12+1024]; 2026-02-21T10:22:24.2328627Z ld.shared.b16 %rs3, [%r12+64]; 2026-02-21T10:22:24.2328916Z ld.shared.b16 %rs4, [%r12+1088]; 2026-02-21T10:22:24.2329103Z ld.shared.b16 %rs5, [%r12+8192]; 2026-02-21T10:22:24.2329292Z ld.shared.b16 %rs6, [%r12+9216]; 2026-02-21T10:22:24.2329479Z ld.shared.b16 %rs7, [%r12+8256]; 2026-02-21T10:22:24.2329672Z ld.shared.b16 %rs8, [%r12+9280]; 2026-02-21T10:22:24.2329860Z ld.shared.b16 %rs9, [%r13]; 2026-02-21T10:22:24.2330044Z ld.shared.b16 %rs10, [%r13+1024]; 2026-02-21T10:22:24.2330243Z ld.shared.b16 %rs11, [%r13+64]; 2026-02-21T10:22:24.2330433Z ld.shared.b16 %rs12, [%r13+1088]; 
2026-02-21T10:22:24.2330625Z ld.shared.b16 %rs13, [%r13+8192]; 2026-02-21T10:22:24.2330810Z ld.shared.b16 %rs14, [%r13+9216]; 2026-02-21T10:22:24.2330999Z ld.shared.b16 %rs15, [%r13+8256]; 2026-02-21T10:22:24.2331183Z ld.shared.b16 %rs16, [%r13+9280]; 2026-02-21T10:22:24.2331372Z ld.shared.b16 %rs17, [%r14]; 2026-02-21T10:22:24.2331556Z ld.shared.b16 %rs18, [%r14+1024]; 2026-02-21T10:22:24.2331741Z ld.shared.b16 %rs19, [%r14+64]; 2026-02-21T10:22:24.2332012Z ld.shared.b16 %rs20, [%r14+1088]; 2026-02-21T10:22:24.2332205Z ld.shared.b16 %rs21, [%r14+8192]; 2026-02-21T10:22:24.2332394Z ld.shared.b16 %rs22, [%r14+9216]; 2026-02-21T10:22:24.2332577Z ld.shared.b16 %rs23, [%r14+8256]; 2026-02-21T10:22:24.2332767Z ld.shared.b16 %rs24, [%r14+9280]; 2026-02-21T10:22:24.2332953Z ld.shared.b16 %rs25, [%r15]; 2026-02-21T10:22:24.2333215Z ld.shared.b16 %rs26, [%r15+1024]; 2026-02-21T10:22:24.2333405Z ld.shared.b16 %rs27, [%r15+64]; 2026-02-21T10:22:24.2333592Z ld.shared.b16 %rs28, [%r15+1088]; 2026-02-21T10:22:24.2333782Z ld.shared.b16 %rs29, [%r15+8192]; 2026-02-21T10:22:24.2333966Z ld.shared.b16 %rs30, [%r15+9216]; 2026-02-21T10:22:24.2334160Z ld.shared.b16 %rs31, [%r15+8256]; 2026-02-21T10:22:24.2334341Z ld.shared.b16 %rs32, [%r15+9280]; 2026-02-21T10:22:24.2334527Z ld.shared.b16 %rs33, [%r16]; 2026-02-21T10:22:24.2334714Z ld.shared.b16 %rs34, [%r16+1024]; 2026-02-21T10:22:24.2334901Z ld.shared.b16 %rs35, [%r16+64]; 2026-02-21T10:22:24.2335092Z ld.shared.b16 %rs36, [%r16+1088]; 2026-02-21T10:22:24.2335277Z ld.shared.b16 %rs37, [%r16+8192]; 2026-02-21T10:22:24.2335462Z ld.shared.b16 %rs38, [%r16+9216]; 2026-02-21T10:22:24.2335647Z ld.shared.b16 %rs39, [%r16+8256]; 2026-02-21T10:22:24.2335831Z ld.shared.b16 %rs40, [%r16+9280]; 2026-02-21T10:22:24.2336020Z ld.shared.b16 %rs41, [%r17]; 2026-02-21T10:22:24.2336205Z ld.shared.b16 %rs42, [%r17+1024]; 2026-02-21T10:22:24.2336390Z ld.shared.b16 %rs43, [%r17+64]; 2026-02-21T10:22:24.2336800Z ld.shared.b16 %rs44, [%r17+1088]; 
2026-02-21T10:22:24.2336997Z ld.shared.b16 %rs45, [%r17+8192]; 2026-02-21T10:22:24.2337196Z ld.shared.b16 %rs46, [%r17+9216]; 2026-02-21T10:22:24.2337393Z ld.shared.b16 %rs47, [%r17+8256]; 2026-02-21T10:22:24.2337597Z ld.shared.b16 %rs48, [%r17+9280]; 2026-02-21T10:22:24.2337794Z ld.shared.b16 %rs49, [%r18]; 2026-02-21T10:22:24.2337970Z ld.shared.b16 %rs50, [%r18+1024]; 2026-02-21T10:22:24.2338169Z ld.shared.b16 %rs51, [%r18+64]; 2026-02-21T10:22:24.2338361Z ld.shared.b16 %rs52, [%r18+1088]; 2026-02-21T10:22:24.2338555Z ld.shared.b16 %rs53, [%r18+8192]; 2026-02-21T10:22:24.2338742Z ld.shared.b16 %rs54, [%r18+9216]; 2026-02-21T10:22:24.2338932Z ld.shared.b16 %rs55, [%r18+8256]; 2026-02-21T10:22:24.2339123Z ld.shared.b16 %rs56, [%r18+9280]; 2026-02-21T10:22:24.2339312Z ld.shared.b16 %rs57, [%r19]; 2026-02-21T10:22:24.2339496Z ld.shared.b16 %rs58, [%r19+1024]; 2026-02-21T10:22:24.2339686Z ld.shared.b16 %rs59, [%r19+64]; 2026-02-21T10:22:24.2339884Z ld.shared.b16 %rs60, [%r19+1088]; 2026-02-21T10:22:24.2340078Z ld.shared.b16 %rs61, [%r19+8192]; 2026-02-21T10:22:24.2340277Z ld.shared.b16 %rs62, [%r19+9216]; 2026-02-21T10:22:24.2340463Z ld.shared.b16 %rs63, [%r19+8256]; 2026-02-21T10:22:24.2340670Z ld.shared.b16 %rs64, [%r19+9280]; 2026-02-21T10:22:24.2340864Z cvt.f32.bf16 %r2499, %rs1; 2026-02-21T10:22:24.2341043Z cvt.f32.bf16 %r2500, %rs2; 2026-02-21T10:22:24.2341222Z cvt.f32.bf16 %r2501, %rs9; 2026-02-21T10:22:24.2341487Z cvt.f32.bf16 %r2502, %rs10; 2026-02-21T10:22:24.2341666Z cvt.f32.bf16 %r2631, %rs17; 2026-02-21T10:22:24.2341853Z cvt.f32.bf16 %r2632, %rs18; 2026-02-21T10:22:24.2342031Z cvt.f32.bf16 %r2633, %rs25; 2026-02-21T10:22:24.2342200Z cvt.f32.bf16 %r2634, %rs26; 2026-02-21T10:22:24.2342377Z cvt.f32.bf16 %r2763, %rs33; 2026-02-21T10:22:24.2342550Z cvt.f32.bf16 %r2764, %rs34; 2026-02-21T10:22:24.2342725Z cvt.f32.bf16 %r2765, %rs41; 2026-02-21T10:22:24.2342899Z cvt.f32.bf16 %r2766, %rs42; 2026-02-21T10:22:24.2343073Z cvt.f32.bf16 %r2895, %rs49; 
2026-02-21T10:22:24.2343250Z cvt.f32.bf16 %r2896, %rs50; 2026-02-21T10:22:24.2343420Z cvt.f32.bf16 %r2897, %rs57; 2026-02-21T10:22:24.2343597Z cvt.f32.bf16 %r2898, %rs58; 2026-02-21T10:22:24.2343775Z cvt.f32.bf16 %r3027, %rs3; 2026-02-21T10:22:24.2343962Z cvt.f32.bf16 %r3028, %rs4; 2026-02-21T10:22:24.2344138Z cvt.f32.bf16 %r3029, %rs11; 2026-02-21T10:22:24.2344320Z cvt.f32.bf16 %r3030, %rs12; 2026-02-21T10:22:24.2344496Z cvt.f32.bf16 %r3159, %rs19; 2026-02-21T10:22:24.2344761Z cvt.f32.bf16 %r3160, %rs20; 2026-02-21T10:22:24.2344947Z cvt.f32.bf16 %r3161, %rs27; 2026-02-21T10:22:24.2345118Z cvt.f32.bf16 %r3162, %rs28; 2026-02-21T10:22:24.2345298Z cvt.f32.bf16 %r3291, %rs35; 2026-02-21T10:22:24.2345474Z cvt.f32.bf16 %r3292, %rs36; 2026-02-21T10:22:24.2345653Z cvt.f32.bf16 %r3293, %rs43; 2026-02-21T10:22:24.2345908Z cvt.f32.bf16 %r3294, %rs44; 2026-02-21T10:22:24.2346095Z cvt.f32.bf16 %r3423, %rs51; 2026-02-21T10:22:24.2346269Z cvt.f32.bf16 %r3424, %rs52; 2026-02-21T10:22:24.2346444Z cvt.f32.bf16 %r3425, %rs59; 2026-02-21T10:22:24.2346752Z cvt.f32.bf16 %r3426, %rs60; 2026-02-21T10:22:24.2346950Z cvt.f32.bf16 %r3555, %rs5; 2026-02-21T10:22:24.2347125Z cvt.f32.bf16 %r3556, %rs6; 2026-02-21T10:22:24.2347296Z cvt.f32.bf16 %r3557, %rs13; 2026-02-21T10:22:24.2347479Z cvt.f32.bf16 %r3558, %rs14; 2026-02-21T10:22:24.2347656Z cvt.f32.bf16 %r3687, %rs21; 2026-02-21T10:22:24.2347837Z cvt.f32.bf16 %r3688, %rs22; 2026-02-21T10:22:24.2348010Z cvt.f32.bf16 %r3689, %rs29; 2026-02-21T10:22:24.2348186Z cvt.f32.bf16 %r3690, %rs30; 2026-02-21T10:22:24.2348355Z cvt.f32.bf16 %r3819, %rs37; 2026-02-21T10:22:24.2348600Z cvt.f32.bf16 %r3820, %rs38; 2026-02-21T10:22:24.2348786Z cvt.f32.bf16 %r3821, %rs45; 2026-02-21T10:22:24.2348955Z cvt.f32.bf16 %r3822, %rs46; 2026-02-21T10:22:24.2349137Z cvt.f32.bf16 %r3951, %rs53; 2026-02-21T10:22:24.2349312Z cvt.f32.bf16 %r3952, %rs54; 2026-02-21T10:22:24.2349488Z cvt.f32.bf16 %r3953, %rs61; 2026-02-21T10:22:24.2349744Z cvt.f32.bf16 %r3954, %rs62; 
2026-02-21T10:22:24.2349925Z cvt.f32.bf16 %r4083, %rs7; 2026-02-21T10:22:24.2350092Z cvt.f32.bf16 %r4084, %rs8; 2026-02-21T10:22:24.2350264Z cvt.f32.bf16 %r4085, %rs15; 2026-02-21T10:22:24.2350450Z cvt.f32.bf16 %r4086, %rs16; 2026-02-21T10:22:24.2350629Z cvt.f32.bf16 %r4215, %rs23; 2026-02-21T10:22:24.2350810Z cvt.f32.bf16 %r4216, %rs24; 2026-02-21T10:22:24.2350980Z cvt.f32.bf16 %r4217, %rs31; 2026-02-21T10:22:24.2351158Z cvt.f32.bf16 %r4218, %rs32; 2026-02-21T10:22:24.2351331Z cvt.f32.bf16 %r4347, %rs39; 2026-02-21T10:22:24.2351507Z cvt.f32.bf16 %r4348, %rs40; 2026-02-21T10:22:24.2351677Z cvt.f32.bf16 %r4349, %rs47; 2026-02-21T10:22:24.2351851Z cvt.f32.bf16 %r4350, %rs48; 2026-02-21T10:22:24.2352025Z cvt.f32.bf16 %r4479, %rs55; 2026-02-21T10:22:24.2352204Z cvt.f32.bf16 %r4480, %rs56; 2026-02-21T10:22:24.2352377Z cvt.f32.bf16 %r4481, %rs63; 2026-02-21T10:22:24.2352558Z cvt.f32.bf16 %r4482, %rs64; 2026-02-21T10:22:24.2352891Z .loc 1 64 33 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:64:33 2026-02-21T10:22:24.2353244Z bar.sync 0; 2026-02-21T10:22:24.2353409Z add.s32 %r29850, %r39936, 4096; 2026-02-21T10:22:24.2353595Z // begin inline asm 2026-02-21T10:22:24.2353801Z @%p313 mbarrier.init.shared::cta.b64 [%r29850], 1; 2026-02-21T10:22:24.2354033Z // end inline asm 2026-02-21T10:22:24.2354188Z bar.sync 0; 2026-02-21T10:22:24.2354331Z // begin inline asm 2026-02-21T10:22:24.2354689Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r29850], 4096; 2026-02-21T10:22:24.2354978Z // end inline asm 2026-02-21T10:22:24.2355128Z // begin inline asm 2026-02-21T10:22:24.2355309Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.2355515Z // end inline asm 2026-02-21T10:22:24.2355668Z bar.sync 0; 2026-02-21T10:22:24.2355826Z elect.sync %r9575|%p99, -1; 2026-02-21T10:22:24.2356027Z and.pred %p40, %p1, %p99; 2026-02-21T10:22:24.2356207Z add.s64 %rd31, %rd841, 96; 2026-02-21T10:22:24.2356389Z cvt.u32.u64 %r2366, %rd31; 2026-02-21T10:22:24.2356683Z // begin inline asm 
2026-02-21T10:22:24.2357120Z @%p40 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39936], [%rd822, {%r9808, %r2366}], [%r29850]; 2026-02-21T10:22:24.2357605Z // end inline asm 2026-02-21T10:22:24.2357758Z bar.sync 0; 2026-02-21T10:22:24.2357904Z mov.b32 %r9443, 0; 2026-02-21T10:22:24.2358058Z // begin inline asm 2026-02-21T10:22:24.2358210Z 2026-02-21T10:22:24.2358348Z { 2026-02-21T10:22:24.2358580Z .reg .pred complete; 2026-02-21T10:22:24.2358751Z waitLoop: 2026-02-21T10:22:24.2358991Z mbarrier.try_wait.parity.shared.b64 complete, [%r29850], %r9443; 2026-02-21T10:22:24.2359303Z @!complete bra.uni waitLoop; 2026-02-21T10:22:24.2359483Z } 2026-02-21T10:22:24.2359555Z 2026-02-21T10:22:24.2359620Z // end inline asm 2026-02-21T10:22:24.2359838Z bar.sync 0; 2026-02-21T10:22:24.2359986Z // begin inline asm 2026-02-21T10:22:24.2360186Z @%p313 mbarrier.inval.shared::cta.b64 [%r29850]; 2026-02-21T10:22:24.2360427Z // end inline asm 2026-02-21T10:22:24.2360735Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2361110Z ld.shared.s8 %rs65, [%r20]; 2026-02-21T10:22:24.2361438Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2361785Z shl.b16 %rs66, %rs65, 4; 2026-02-21T10:22:24.2362095Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2362450Z ld.shared.s8 %rs67, [%r21+128]; 2026-02-21T10:22:24.2362791Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2363136Z shl.b16 %rs68, %rs67, 4; 2026-02-21T10:22:24.2363442Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2363797Z ld.shared.s8 %rs69, [%r22+256]; 2026-02-21T10:22:24.2364185Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2364531Z shl.b16 %rs70, %rs69, 4; 2026-02-21T10:22:24.2364843Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2365196Z ld.shared.s8 %rs71, [%r23+384]; 2026-02-21T10:22:24.2365512Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2365873Z shl.b16 %rs72, %rs71, 4; 2026-02-21T10:22:24.2366175Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2366650Z ld.shared.s8 %rs73, [%r24+512]; 2026-02-21T10:22:24.2366986Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2367336Z shl.b16 %rs74, %rs73, 4; 2026-02-21T10:22:24.2367644Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2367994Z ld.shared.s8 %rs75, [%r25+640]; 2026-02-21T10:22:24.2368309Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2368658Z shl.b16 %rs76, %rs75, 4; 2026-02-21T10:22:24.2368955Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2369301Z ld.shared.s8 %rs77, [%r26+768]; 2026-02-21T10:22:24.2369704Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2370048Z shl.b16 %rs78, %rs77, 4; 2026-02-21T10:22:24.2370364Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2370715Z ld.shared.s8 %rs79, [%r27+896]; 2026-02-21T10:22:24.2371039Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2371384Z shl.b16 %rs80, %rs79, 4; 2026-02-21T10:22:24.2371700Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2372047Z ld.shared.s8 %rs81, [%r20+1024]; 2026-02-21T10:22:24.2372376Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2372721Z shl.b16 %rs82, %rs81, 4; 2026-02-21T10:22:24.2373089Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2373459Z ld.shared.s8 %rs83, [%r21+1152]; 2026-02-21T10:22:24.2373778Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2374137Z shl.b16 %rs84, %rs83, 4; 2026-02-21T10:22:24.2374439Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2374865Z ld.shared.s8 %rs85, [%r22+1280]; 2026-02-21T10:22:24.2375206Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2375551Z shl.b16 %rs86, %rs85, 4; 2026-02-21T10:22:24.2375854Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2376204Z ld.shared.s8 %rs87, [%r23+1408]; 2026-02-21T10:22:24.2376664Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2377021Z shl.b16 %rs88, %rs87, 4; 2026-02-21T10:22:24.2377322Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2377672Z ld.shared.s8 %rs89, [%r24+1536]; 2026-02-21T10:22:24.2377989Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2378343Z shl.b16 %rs90, %rs89, 4; 2026-02-21T10:22:24.2378736Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2379093Z ld.shared.s8 %rs91, [%r25+1664]; 2026-02-21T10:22:24.2379410Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2379760Z shl.b16 %rs92, %rs91, 4; 2026-02-21T10:22:24.2380066Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2380414Z ld.shared.s8 %rs93, [%r26+1792]; 2026-02-21T10:22:24.2380740Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2381096Z shl.b16 %rs94, %rs93, 4; 2026-02-21T10:22:24.2381402Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2381757Z ld.shared.s8 %rs95, [%r27+1920]; 2026-02-21T10:22:24.2382075Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2382421Z shl.b16 %rs96, %rs95, 4; 2026-02-21T10:22:24.2382731Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2383077Z ld.shared.s8 %rs97, [%r20+2048]; 2026-02-21T10:22:24.2383409Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2383764Z shl.b16 %rs98, %rs97, 4; 2026-02-21T10:22:24.2384070Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2384492Z ld.shared.s8 %rs99, [%r21+2176]; 2026-02-21T10:22:24.2384819Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2385164Z shl.b16 %rs100, %rs99, 4; 2026-02-21T10:22:24.2385479Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2385833Z ld.shared.s8 %rs101, [%r22+2304]; 2026-02-21T10:22:24.2386178Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2386646Z shl.b16 %rs102, %rs101, 4; 2026-02-21T10:22:24.2386957Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2387325Z ld.shared.s8 %rs103, [%r23+2432]; 2026-02-21T10:22:24.2387649Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2388081Z shl.b16 %rs104, %rs103, 4; 2026-02-21T10:22:24.2388390Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2388832Z ld.shared.s8 %rs105, [%r24+2560]; 2026-02-21T10:22:24.2389159Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2389570Z shl.b16 %rs106, %rs105, 4; 2026-02-21T10:22:24.2389883Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2390229Z ld.shared.s8 %rs107, [%r25+2688]; 2026-02-21T10:22:24.2390553Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2390898Z shl.b16 %rs108, %rs107, 4; 2026-02-21T10:22:24.2391220Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2391573Z ld.shared.s8 %rs109, [%r26+2816]; 2026-02-21T10:22:24.2391897Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2392246Z shl.b16 %rs110, %rs109, 4; 2026-02-21T10:22:24.2392554Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2392903Z ld.shared.s8 %rs111, [%r27+2944]; 2026-02-21T10:22:24.2393230Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2393665Z shl.b16 %rs112, %rs111, 4; 2026-02-21T10:22:24.2393981Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2394328Z ld.shared.s8 %rs113, [%r20+3072]; 2026-02-21T10:22:24.2394652Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2394995Z shl.b16 %rs114, %rs113, 4; 2026-02-21T10:22:24.2395308Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2395663Z ld.shared.s8 %rs115, [%r21+3200]; 2026-02-21T10:22:24.2395983Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2396335Z shl.b16 %rs116, %rs115, 4; 2026-02-21T10:22:24.2396774Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2397127Z ld.shared.s8 %rs117, [%r22+3328]; 2026-02-21T10:22:24.2397453Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2397803Z shl.b16 %rs118, %rs117, 4; 2026-02-21T10:22:24.2398113Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2398459Z ld.shared.s8 %rs119, [%r23+3456]; 2026-02-21T10:22:24.2398789Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2399229Z shl.b16 %rs120, %rs119, 4; 2026-02-21T10:22:24.2399542Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2399888Z ld.shared.s8 %rs121, [%r24+3584]; 2026-02-21T10:22:24.2400211Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2400565Z shl.b16 %rs122, %rs121, 4; 2026-02-21T10:22:24.2400874Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2401229Z ld.shared.s8 %rs123, [%r25+3712]; 2026-02-21T10:22:24.2401550Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2401901Z shl.b16 %rs124, %rs123, 4; 2026-02-21T10:22:24.2402205Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2402563Z ld.shared.s8 %rs125, [%r26+3840]; 2026-02-21T10:22:24.2402970Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2403323Z shl.b16 %rs126, %rs125, 4; 2026-02-21T10:22:24.2403635Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2404044Z ld.shared.s8 %rs127, [%r27+3968]; 2026-02-21T10:22:24.2404369Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2404722Z shl.b16 %rs128, %rs127, 4; 2026-02-21T10:22:24.2405026Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2405379Z cvt.s16.s8 %rs129, %rs66; 2026-02-21T10:22:24.2405557Z shr.s16 %rs130, %rs129, 4; 2026-02-21T10:22:24.2405736Z cvt.s16.s8 %rs131, %rs68; 2026-02-21T10:22:24.2405906Z shr.s16 %rs132, %rs131, 4; 2026-02-21T10:22:24.2406082Z shr.s16 %rs133, 
%rs65, 4; 2026-02-21T10:22:24.2406255Z shr.s16 %rs134, %rs67, 4; 2026-02-21T10:22:24.2406699Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2407057Z cvt.rn.f32.s16 %r9576, %rs134; 2026-02-21T10:22:24.2407247Z cvt.rn.f32.s16 %r9577, %rs133; 2026-02-21T10:22:24.2407437Z cvt.rn.f32.s16 %r9578, %rs132; 2026-02-21T10:22:24.2407620Z cvt.rn.f32.s16 %r9579, %rs130; 2026-02-21T10:22:24.2408040Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2408393Z cvt.s16.s8 %rs135, %rs70; 2026-02-21T10:22:24.2408568Z shr.s16 %rs136, %rs135, 4; 2026-02-21T10:22:24.2408741Z cvt.s16.s8 %rs137, %rs72; 2026-02-21T10:22:24.2408914Z shr.s16 %rs138, %rs137, 4; 2026-02-21T10:22:24.2409093Z shr.s16 %rs139, %rs69, 4; 2026-02-21T10:22:24.2409265Z shr.s16 %rs140, %rs71, 4; 2026-02-21T10:22:24.2409573Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2409935Z cvt.rn.f32.s16 %r9580, %rs140; 2026-02-21T10:22:24.2410129Z cvt.rn.f32.s16 %r9581, %rs139; 2026-02-21T10:22:24.2410312Z cvt.rn.f32.s16 %r9582, %rs138; 2026-02-21T10:22:24.2410499Z cvt.rn.f32.s16 %r9583, %rs136; 2026-02-21T10:22:24.2410810Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2411162Z cvt.s16.s8 %rs141, %rs74; 2026-02-21T10:22:24.2411341Z shr.s16 %rs142, %rs141, 4; 2026-02-21T10:22:24.2411518Z cvt.s16.s8 %rs143, %rs76; 2026-02-21T10:22:24.2411694Z shr.s16 %rs144, %rs143, 4; 2026-02-21T10:22:24.2411869Z shr.s16 %rs145, %rs73, 4; 2026-02-21T10:22:24.2412043Z shr.s16 %rs146, %rs75, 4; 2026-02-21T10:22:24.2412341Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2412693Z cvt.rn.f32.s16 %r9584, %rs146; 2026-02-21T10:22:24.2412877Z cvt.rn.f32.s16 %r9585, %rs145; 2026-02-21T10:22:24.2413159Z cvt.rn.f32.s16 %r9586, %rs144; 2026-02-21T10:22:24.2413348Z cvt.rn.f32.s16 %r9587, %rs142; 
2026-02-21T10:22:24.2413660Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2414018Z cvt.s16.s8 %rs147, %rs78; 2026-02-21T10:22:24.2414196Z shr.s16 %rs148, %rs147, 4; 2026-02-21T10:22:24.2414383Z cvt.s16.s8 %rs149, %rs80; 2026-02-21T10:22:24.2414557Z shr.s16 %rs150, %rs149, 4; 2026-02-21T10:22:24.2414741Z shr.s16 %rs151, %rs77, 4; 2026-02-21T10:22:24.2414912Z shr.s16 %rs152, %rs79, 4; 2026-02-21T10:22:24.2415222Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2415575Z cvt.rn.f32.s16 %r9588, %rs152; 2026-02-21T10:22:24.2415760Z cvt.rn.f32.s16 %r9589, %rs151; 2026-02-21T10:22:24.2415954Z cvt.rn.f32.s16 %r9590, %rs150; 2026-02-21T10:22:24.2416152Z cvt.rn.f32.s16 %r9591, %rs148; 2026-02-21T10:22:24.2416674Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2417039Z cvt.s16.s8 %rs153, %rs82; 2026-02-21T10:22:24.2417217Z shr.s16 %rs154, %rs153, 4; 2026-02-21T10:22:24.2417391Z cvt.s16.s8 %rs155, %rs84; 2026-02-21T10:22:24.2417570Z shr.s16 %rs156, %rs155, 4; 2026-02-21T10:22:24.2417749Z shr.s16 %rs157, %rs81, 4; 2026-02-21T10:22:24.2417984Z shr.s16 %rs158, %rs83, 4; 2026-02-21T10:22:24.2418290Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2418636Z cvt.rn.f32.s16 %r9592, %rs158; 2026-02-21T10:22:24.2418839Z cvt.rn.f32.s16 %r9593, %rs157; 2026-02-21T10:22:24.2419026Z cvt.rn.f32.s16 %r9594, %rs156; 2026-02-21T10:22:24.2419223Z cvt.rn.f32.s16 %r9595, %rs154; 2026-02-21T10:22:24.2419544Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2419903Z cvt.s16.s8 %rs159, %rs86; 2026-02-21T10:22:24.2420091Z shr.s16 %rs160, %rs159, 4; 2026-02-21T10:22:24.2420272Z cvt.s16.s8 %rs161, %rs88; 2026-02-21T10:22:24.2420453Z shr.s16 %rs162, %rs161, 4; 2026-02-21T10:22:24.2420630Z shr.s16 %rs163, %rs85, 4; 2026-02-21T10:22:24.2420809Z shr.s16 
%rs164, %rs87, 4; 2026-02-21T10:22:24.2421119Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2421481Z cvt.rn.f32.s16 %r9596, %rs164; 2026-02-21T10:22:24.2421677Z cvt.rn.f32.s16 %r9597, %rs163; 2026-02-21T10:22:24.2421944Z cvt.rn.f32.s16 %r9598, %rs162; 2026-02-21T10:22:24.2422136Z cvt.rn.f32.s16 %r9599, %rs160; 2026-02-21T10:22:24.2422453Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2422809Z cvt.s16.s8 %rs165, %rs90; 2026-02-21T10:22:24.2422983Z shr.s16 %rs166, %rs165, 4; 2026-02-21T10:22:24.2423165Z cvt.s16.s8 %rs167, %rs92; 2026-02-21T10:22:24.2423346Z shr.s16 %rs168, %rs167, 4; 2026-02-21T10:22:24.2423421Z shr.s16 %rs169, %rs89, 4; 2026-02-21T10:22:24.2423487Z shr.s16 %rs170, %rs91, 4; 2026-02-21T10:22:24.2423687Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2423761Z cvt.rn.f32.s16 %r9600, %rs170; 2026-02-21T10:22:24.2423827Z cvt.rn.f32.s16 %r9601, %rs169; 2026-02-21T10:22:24.2423894Z cvt.rn.f32.s16 %r9602, %rs168; 2026-02-21T10:22:24.2423959Z cvt.rn.f32.s16 %r9603, %rs166; 2026-02-21T10:22:24.2424162Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2424226Z cvt.s16.s8 %rs171, %rs94; 2026-02-21T10:22:24.2424290Z shr.s16 %rs172, %rs171, 4; 2026-02-21T10:22:24.2424359Z cvt.s16.s8 %rs173, %rs96; 2026-02-21T10:22:24.2424422Z shr.s16 %rs174, %rs173, 4; 2026-02-21T10:22:24.2424484Z shr.s16 %rs175, %rs93, 4; 2026-02-21T10:22:24.2424553Z shr.s16 %rs176, %rs95, 4; 2026-02-21T10:22:24.2424745Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2424893Z cvt.rn.f32.s16 %r9604, %rs176; 2026-02-21T10:22:24.2424959Z cvt.rn.f32.s16 %r9605, %rs175; 2026-02-21T10:22:24.2425031Z cvt.rn.f32.s16 %r9606, %rs174; 2026-02-21T10:22:24.2425094Z cvt.rn.f32.s16 %r9607, %rs172; 2026-02-21T10:22:24.2425299Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2425376Z cvt.s16.s8 %rs177, %rs98; 2026-02-21T10:22:24.2425443Z shr.s16 %rs178, %rs177, 4; 2026-02-21T10:22:24.2425513Z cvt.s16.s8 %rs179, %rs100; 2026-02-21T10:22:24.2425579Z shr.s16 %rs180, %rs179, 4; 2026-02-21T10:22:24.2425650Z shr.s16 %rs181, %rs97, 4; 2026-02-21T10:22:24.2425714Z shr.s16 %rs182, %rs99, 4; 2026-02-21T10:22:24.2425911Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2425995Z cvt.rn.f32.s16 %r9608, %rs182; 2026-02-21T10:22:24.2426071Z cvt.rn.f32.s16 %r9609, %rs181; 2026-02-21T10:22:24.2426197Z cvt.rn.f32.s16 %r9610, %rs180; 2026-02-21T10:22:24.2426270Z cvt.rn.f32.s16 %r9611, %rs178; 2026-02-21T10:22:24.2426585Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2426655Z cvt.s16.s8 %rs183, %rs102; 2026-02-21T10:22:24.2426721Z shr.s16 %rs184, %rs183, 4; 2026-02-21T10:22:24.2426891Z cvt.s16.s8 %rs185, %rs104; 2026-02-21T10:22:24.2426956Z shr.s16 %rs186, %rs185, 4; 2026-02-21T10:22:24.2427020Z shr.s16 %rs187, %rs101, 4; 2026-02-21T10:22:24.2427092Z shr.s16 %rs188, %rs103, 4; 2026-02-21T10:22:24.2427286Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2427352Z cvt.rn.f32.s16 %r9612, %rs188; 2026-02-21T10:22:24.2427425Z cvt.rn.f32.s16 %r9613, %rs187; 2026-02-21T10:22:24.2427489Z cvt.rn.f32.s16 %r9614, %rs186; 2026-02-21T10:22:24.2427556Z cvt.rn.f32.s16 %r9615, %rs184; 2026-02-21T10:22:24.2427754Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2427828Z cvt.s16.s8 %rs189, %rs106; 2026-02-21T10:22:24.2427893Z shr.s16 %rs190, %rs189, 4; 2026-02-21T10:22:24.2427957Z cvt.s16.s8 %rs191, %rs108; 2026-02-21T10:22:24.2428026Z shr.s16 %rs192, %rs191, 4; 2026-02-21T10:22:24.2428093Z shr.s16 %rs193, %rs105, 4; 2026-02-21T10:22:24.2428154Z shr.s16 %rs194, %rs107, 4; 
2026-02-21T10:22:24.2428498Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2428580Z cvt.rn.f32.s16 %r9616, %rs194; 2026-02-21T10:22:24.2428647Z cvt.rn.f32.s16 %r9617, %rs193; 2026-02-21T10:22:24.2428712Z cvt.rn.f32.s16 %r9618, %rs192; 2026-02-21T10:22:24.2428780Z cvt.rn.f32.s16 %r9619, %rs190; 2026-02-21T10:22:24.2428992Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2429061Z cvt.s16.s8 %rs195, %rs110; 2026-02-21T10:22:24.2429133Z shr.s16 %rs196, %rs195, 4; 2026-02-21T10:22:24.2429198Z cvt.s16.s8 %rs197, %rs112; 2026-02-21T10:22:24.2429261Z shr.s16 %rs198, %rs197, 4; 2026-02-21T10:22:24.2429323Z shr.s16 %rs199, %rs109, 4; 2026-02-21T10:22:24.2429395Z shr.s16 %rs200, %rs111, 4; 2026-02-21T10:22:24.2429593Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2429660Z cvt.rn.f32.s16 %r9620, %rs200; 2026-02-21T10:22:24.2429726Z cvt.rn.f32.s16 %r9621, %rs199; 2026-02-21T10:22:24.2429789Z cvt.rn.f32.s16 %r9622, %rs198; 2026-02-21T10:22:24.2429850Z cvt.rn.f32.s16 %r9623, %rs196; 2026-02-21T10:22:24.2430043Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2430108Z cvt.s16.s8 %rs201, %rs114; 2026-02-21T10:22:24.2430169Z shr.s16 %rs202, %rs201, 4; 2026-02-21T10:22:24.2430229Z cvt.s16.s8 %rs203, %rs116; 2026-02-21T10:22:24.2430292Z shr.s16 %rs204, %rs203, 4; 2026-02-21T10:22:24.2430444Z shr.s16 %rs205, %rs113, 4; 2026-02-21T10:22:24.2430505Z shr.s16 %rs206, %rs115, 4; 2026-02-21T10:22:24.2430714Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2430780Z cvt.rn.f32.s16 %r9624, %rs206; 2026-02-21T10:22:24.2430841Z cvt.rn.f32.s16 %r9625, %rs205; 2026-02-21T10:22:24.2430906Z cvt.rn.f32.s16 %r9626, %rs204; 2026-02-21T10:22:24.2430969Z cvt.rn.f32.s16 %r9627, %rs202; 2026-02-21T10:22:24.2431170Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2431234Z cvt.s16.s8 %rs207, %rs118; 2026-02-21T10:22:24.2431300Z shr.s16 %rs208, %rs207, 4; 2026-02-21T10:22:24.2431360Z cvt.s16.s8 %rs209, %rs120; 2026-02-21T10:22:24.2431419Z shr.s16 %rs210, %rs209, 4; 2026-02-21T10:22:24.2431483Z shr.s16 %rs211, %rs117, 4; 2026-02-21T10:22:24.2431544Z shr.s16 %rs212, %rs119, 4; 2026-02-21T10:22:24.2431801Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2431873Z cvt.rn.f32.s16 %r9628, %rs212; 2026-02-21T10:22:24.2431939Z cvt.rn.f32.s16 %r9629, %rs211; 2026-02-21T10:22:24.2432003Z cvt.rn.f32.s16 %r9630, %rs210; 2026-02-21T10:22:24.2432064Z cvt.rn.f32.s16 %r9631, %rs208; 2026-02-21T10:22:24.2432321Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2432385Z cvt.s16.s8 %rs213, %rs122; 2026-02-21T10:22:24.2432448Z shr.s16 %rs214, %rs213, 4; 2026-02-21T10:22:24.2432510Z cvt.s16.s8 %rs215, %rs124; 2026-02-21T10:22:24.2432572Z shr.s16 %rs216, %rs215, 4; 2026-02-21T10:22:24.2432635Z shr.s16 %rs217, %rs121, 4; 2026-02-21T10:22:24.2432694Z shr.s16 %rs218, %rs123, 4; 2026-02-21T10:22:24.2432889Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2432951Z cvt.rn.f32.s16 %r9632, %rs218; 2026-02-21T10:22:24.2433016Z cvt.rn.f32.s16 %r9633, %rs217; 2026-02-21T10:22:24.2433094Z cvt.rn.f32.s16 %r9634, %rs216; 2026-02-21T10:22:24.2433158Z cvt.rn.f32.s16 %r9635, %rs214; 2026-02-21T10:22:24.2433350Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2433413Z cvt.s16.s8 %rs219, %rs126; 2026-02-21T10:22:24.2433479Z shr.s16 %rs220, %rs219, 4; 2026-02-21T10:22:24.2433540Z cvt.s16.s8 %rs221, %rs128; 2026-02-21T10:22:24.2433601Z shr.s16 %rs222, %rs221, 4; 2026-02-21T10:22:24.2433719Z shr.s16 %rs223, %rs125, 4; 2026-02-21T10:22:24.2433781Z shr.s16 %rs224, %rs127, 4; 
2026-02-21T10:22:24.2433971Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2434036Z cvt.rn.f32.s16 %r9636, %rs224; 2026-02-21T10:22:24.2434098Z cvt.rn.f32.s16 %r9637, %rs223; 2026-02-21T10:22:24.2434160Z cvt.rn.f32.s16 %r9638, %rs222; 2026-02-21T10:22:24.2434220Z cvt.rn.f32.s16 %r9639, %rs220; 2026-02-21T10:22:24.2434282Z bar.sync 0; 2026-02-21T10:22:24.2434411Z st.shared.v4.b32 [%r28], {%r9579, %r9577, %r9578, %r9576}; 2026-02-21T10:22:24.2434537Z st.shared.v4.b32 [%r28+16384], {%r9611, %r9609, %r9610, %r9608}; 2026-02-21T10:22:24.2434651Z st.shared.v4.b32 [%r29], {%r9583, %r9581, %r9582, %r9580}; 2026-02-21T10:22:24.2441401Z st.shared.v4.b32 [%r29+16384], {%r9615, %r9613, %r9614, %r9612}; 2026-02-21T10:22:24.2441590Z st.shared.v4.b32 [%r30], {%r9587, %r9585, %r9586, %r9584}; 2026-02-21T10:22:24.2441734Z st.shared.v4.b32 [%r30+16384], {%r9619, %r9617, %r9618, %r9616}; 2026-02-21T10:22:24.2441856Z st.shared.v4.b32 [%r31], {%r9591, %r9589, %r9590, %r9588}; 2026-02-21T10:22:24.2441979Z st.shared.v4.b32 [%r31+16384], {%r9623, %r9621, %r9622, %r9620}; 2026-02-21T10:22:24.2442084Z st.shared.v4.b32 [%r32], {%r9595, %r9593, %r9594, %r9592}; 2026-02-21T10:22:24.2442195Z st.shared.v4.b32 [%r32+16384], {%r9627, %r9625, %r9626, %r9624}; 2026-02-21T10:22:24.2442295Z st.shared.v4.b32 [%r33], {%r9599, %r9597, %r9598, %r9596}; 2026-02-21T10:22:24.2442548Z st.shared.v4.b32 [%r33+16384], {%r9631, %r9629, %r9630, %r9628}; 2026-02-21T10:22:24.2442653Z st.shared.v4.b32 [%r34], {%r9603, %r9601, %r9602, %r9600}; 2026-02-21T10:22:24.2442773Z st.shared.v4.b32 [%r34+16384], {%r9635, %r9633, %r9634, %r9632}; 2026-02-21T10:22:24.2442882Z st.shared.v4.b32 [%r35], {%r9607, %r9605, %r9606, %r9604}; 2026-02-21T10:22:24.2443003Z st.shared.v4.b32 [%r35+16384], {%r9639, %r9637, %r9638, %r9636}; 2026-02-21T10:22:24.2443062Z $L__tmp1: 2026-02-21T10:22:24.2443356Z .loc 2 291 36 // standard.py:291:36 @[ 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:94:40 ] 2026-02-21T10:22:24.2443420Z // begin inline asm 2026-02-21T10:22:24.2443507Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.2443571Z // end inline asm 2026-02-21T10:22:24.2443629Z bar.sync 0; 2026-02-21T10:22:24.2443726Z shfl.sync.idx.b32 %r9640, %r5, 0, 31, -1; 2026-02-21T10:22:24.2443804Z wgmma.fence.sync.aligned; 2026-02-21T10:22:24.2443950Z mov.pred %p42, -1; 2026-02-21T10:22:24.2444015Z // begin inline asm 2026-02-21T10:22:24.2445484Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r2499,%r2500,%r2501,%r2502}, %rd3, %p42, 1, 1; 2026-02-21T10:22:24.2445638Z // end inline asm 2026-02-21T10:22:24.2445699Z // begin inline asm 2026-02-21T10:22:24.2447322Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r2631,%r2632,%r2633,%r2634}, %rd4, %p42, 1, 1; 2026-02-21T10:22:24.2447393Z // end inline asm 2026-02-21T10:22:24.2447532Z // begin inline asm 2026-02-21T10:22:24.2449014Z 
wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r2763,%r2764,%r2765,%r2766}, %rd5, %p42, 1, 1; 2026-02-21T10:22:24.2449076Z // end inline asm 2026-02-21T10:22:24.2449134Z // begin inline asm 2026-02-21T10:22:24.2450595Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r2895,%r2896,%r2897,%r2898}, %rd6, %p42, 1, 1; 2026-02-21T10:22:24.2450657Z // end inline asm 2026-02-21T10:22:24.2450787Z // begin inline asm 2026-02-21T10:22:24.2452241Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r3027,%r3028,%r3029,%r3030}, %rd7, %p42, 1, 1; 2026-02-21T10:22:24.2452304Z // end inline asm 2026-02-21T10:22:24.2452380Z // begin inline asm 2026-02-21T10:22:24.2453887Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r3159,%r3160,%r3161,%r3162}, %rd8, %p42, 1, 1; 2026-02-21T10:22:24.2454009Z // end inline asm 2026-02-21T10:22:24.2454068Z // begin inline asm 2026-02-21T10:22:24.2455513Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, 
{%r3291,%r3292,%r3293,%r3294}, %rd9, %p42, 1, 1; 2026-02-21T10:22:24.2455581Z // end inline asm 2026-02-21T10:22:24.2455640Z // begin inline asm 2026-02-21T10:22:24.2457294Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r3423,%r3424,%r3425,%r3426}, %rd10, %p42, 1, 1; 2026-02-21T10:22:24.2457370Z // end inline asm 2026-02-21T10:22:24.2457432Z // begin inline asm 2026-02-21T10:22:24.2458897Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r3555,%r3556,%r3557,%r3558}, %rd3, %p42, 1, 1; 2026-02-21T10:22:24.2458961Z // end inline asm 2026-02-21T10:22:24.2459022Z // begin inline asm 2026-02-21T10:22:24.2460468Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r3687,%r3688,%r3689,%r3690}, %rd4, %p42, 1, 1; 2026-02-21T10:22:24.2460591Z // end inline asm 2026-02-21T10:22:24.2460653Z // begin inline asm 2026-02-21T10:22:24.2462103Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r3819,%r3820,%r3821,%r3822}, %rd5, %p42, 1, 1; 2026-02-21T10:22:24.2462232Z // end inline asm 2026-02-21T10:22:24.2462296Z // begin inline asm 2026-02-21T10:22:24.2463753Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, 
{%r3951,%r3952,%r3953,%r3954}, %rd6, %p42, 1, 1; 2026-02-21T10:22:24.2463871Z // end inline asm 2026-02-21T10:22:24.2463935Z // begin inline asm 2026-02-21T10:22:24.2465380Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r4083,%r4084,%r4085,%r4086}, %rd7, %p42, 1, 1; 2026-02-21T10:22:24.2465446Z // end inline asm 2026-02-21T10:22:24.2465556Z // begin inline asm 2026-02-21T10:22:24.2467135Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r4215,%r4216,%r4217,%r4218}, %rd8, %p42, 1, 1; 2026-02-21T10:22:24.2467208Z // end inline asm 2026-02-21T10:22:24.2467267Z // begin inline asm 2026-02-21T10:22:24.2468790Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r4347,%r4348,%r4349,%r4350}, %rd9, %p42, 1, 1; 2026-02-21T10:22:24.2468859Z // end inline asm 2026-02-21T10:22:24.2469007Z // begin inline asm 2026-02-21T10:22:24.2470460Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r4479,%r4480,%r4481,%r4482}, %rd10, %p42, 1, 1; 2026-02-21T10:22:24.2470522Z // end inline asm 2026-02-21T10:22:24.2470602Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:24.2470668Z mov.b32 %r4612, %r9443; 2026-02-21T10:22:24.2470729Z mov.b32 %r4613, %r9443; 2026-02-21T10:22:24.2470791Z mov.b32 %r4611, %r39936; 2026-02-21T10:22:24.2470852Z // begin inline asm 2026-02-21T10:22:24.2473386Z // wait for regs: 
%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601,%r4611,%r4612,%r4613 2026-02-21T10:22:24.2473534Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:24.2473591Z // end inline asm 2026-02-21T10:22:24.2473649Z $L__tmp2: 2026-02-21T10:22:24.2473865Z .loc 1 58 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:32 2026-02-21T10:22:24.2473941Z add.s32 %r9641, %r42473, -64; 2026-02-21T10:22:24.2474006Z add.s64 %rd207, %rd166, 128; 2026-02-21T10:22:24.2474069Z add.s64 %rd210, %rd169, 128; 2026-02-21T10:22:24.2474203Z add.s64 %rd213, %rd172, 128; 2026-02-21T10:22:24.2474278Z add.s64 %rd216, %rd175, 128; 2026-02-21T10:22:24.2474342Z add.s64 %rd219, %rd178, 128; 2026-02-21T10:22:24.2474402Z add.s64 %rd222, %rd181, 128; 2026-02-21T10:22:24.2474468Z add.s64 %rd225, %rd184, 128; 2026-02-21T10:22:24.2474549Z mad.wide.s32 %rd228, %r9641, 2, %rd117; 2026-02-21T10:22:24.2474754Z .loc 1 58 80 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:80 2026-02-21T10:22:24.2474824Z // begin inline asm 
2026-02-21T10:22:24.2474889Z mov.u64 %rd206, 0x0; 2026-02-21T10:22:24.2475022Z createpolicy.fractional.L2::evict_first.b64 %rd206, 1.0; 2026-02-21T10:22:24.2475087Z // end inline asm 2026-02-21T10:22:24.2475147Z // begin inline asm 2026-02-21T10:22:24.2475207Z mov.u32 %r4745, 0x0; 2026-02-21T10:22:24.2475265Z mov.u32 %r4746, 0x0; 2026-02-21T10:22:24.2475329Z mov.u32 %r4747, 0x0; 2026-02-21T10:22:24.2475386Z mov.u32 %r4748, 0x0; 2026-02-21T10:22:24.2475615Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r4745, %r4746, %r4747, %r4748 }, [ %rd207 + 0 ], %rd206; 2026-02-21T10:22:24.2475678Z // end inline asm 2026-02-21T10:22:24.2475738Z // begin inline asm 2026-02-21T10:22:24.2475808Z mov.u64 %rd209, 0x0; 2026-02-21T10:22:24.2475937Z createpolicy.fractional.L2::evict_first.b64 %rd209, 1.0; 2026-02-21T10:22:24.2475996Z // end inline asm 2026-02-21T10:22:24.2476055Z // begin inline asm 2026-02-21T10:22:24.2476113Z mov.u32 %r4749, 0x0; 2026-02-21T10:22:24.2476241Z mov.u32 %r4750, 0x0; 2026-02-21T10:22:24.2476301Z mov.u32 %r4751, 0x0; 2026-02-21T10:22:24.2476358Z mov.u32 %r4752, 0x0; 2026-02-21T10:22:24.2476713Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r4749, %r4750, %r4751, %r4752 }, [ %rd210 + 0 ], %rd209; 2026-02-21T10:22:24.2476777Z // end inline asm 2026-02-21T10:22:24.2476836Z // begin inline asm 2026-02-21T10:22:24.2476896Z mov.u64 %rd212, 0x0; 2026-02-21T10:22:24.2477030Z createpolicy.fractional.L2::evict_first.b64 %rd212, 1.0; 2026-02-21T10:22:24.2477089Z // end inline asm 2026-02-21T10:22:24.2477149Z // begin inline asm 2026-02-21T10:22:24.2477213Z mov.u32 %r4753, 0x0; 2026-02-21T10:22:24.2477270Z mov.u32 %r4754, 0x0; 2026-02-21T10:22:24.2477328Z mov.u32 %r4755, 0x0; 2026-02-21T10:22:24.2477390Z mov.u32 %r4756, 0x0; 2026-02-21T10:22:24.2477606Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r4753, %r4754, %r4755, %r4756 }, [ %rd213 + 0 ], %rd212; 2026-02-21T10:22:24.2477662Z // end inline asm 2026-02-21T10:22:24.2477724Z // begin inline asm 
2026-02-21T10:22:24.2477884Z mov.u64 %rd215, 0x0; 2026-02-21T10:22:24.2478004Z createpolicy.fractional.L2::evict_first.b64 %rd215, 1.0; 2026-02-21T10:22:24.2478060Z // end inline asm 2026-02-21T10:22:24.2478124Z // begin inline asm 2026-02-21T10:22:24.2478180Z mov.u32 %r4757, 0x0; 2026-02-21T10:22:24.2478236Z mov.u32 %r4758, 0x0; 2026-02-21T10:22:24.2478358Z mov.u32 %r4759, 0x0; 2026-02-21T10:22:24.2478430Z mov.u32 %r4760, 0x0; 2026-02-21T10:22:24.2478646Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r4757, %r4758, %r4759, %r4760 }, [ %rd216 + 0 ], %rd215; 2026-02-21T10:22:24.2478705Z // end inline asm 2026-02-21T10:22:24.2478768Z // begin inline asm 2026-02-21T10:22:24.2478826Z mov.u64 %rd218, 0x0; 2026-02-21T10:22:24.2478944Z createpolicy.fractional.L2::evict_first.b64 %rd218, 1.0; 2026-02-21T10:22:24.2479009Z // end inline asm 2026-02-21T10:22:24.2479068Z // begin inline asm 2026-02-21T10:22:24.2479125Z mov.u32 %r4761, 0x0; 2026-02-21T10:22:24.2479183Z mov.u32 %r4762, 0x0; 2026-02-21T10:22:24.2479253Z mov.u32 %r4763, 0x0; 2026-02-21T10:22:24.2479309Z mov.u32 %r4764, 0x0; 2026-02-21T10:22:24.2479520Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r4761, %r4762, %r4763, %r4764 }, [ %rd219 + 0 ], %rd218; 2026-02-21T10:22:24.2479582Z // end inline asm 2026-02-21T10:22:24.2479640Z // begin inline asm 2026-02-21T10:22:24.2479702Z mov.u64 %rd221, 0x0; 2026-02-21T10:22:24.2479832Z createpolicy.fractional.L2::evict_first.b64 %rd221, 1.0; 2026-02-21T10:22:24.2479894Z // end inline asm 2026-02-21T10:22:24.2480021Z // begin inline asm 2026-02-21T10:22:24.2480082Z mov.u32 %r4765, 0x0; 2026-02-21T10:22:24.2480145Z mov.u32 %r4766, 0x0; 2026-02-21T10:22:24.2480205Z mov.u32 %r4767, 0x0; 2026-02-21T10:22:24.2480262Z mov.u32 %r4768, 0x0; 2026-02-21T10:22:24.2480480Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r4765, %r4766, %r4767, %r4768 }, [ %rd222 + 0 ], %rd221; 2026-02-21T10:22:24.2480537Z // end inline asm 2026-02-21T10:22:24.2480596Z // begin inline asm 
2026-02-21T10:22:24.2480659Z mov.u64 %rd224, 0x0; 2026-02-21T10:22:24.2480794Z createpolicy.fractional.L2::evict_first.b64 %rd224, 1.0; 2026-02-21T10:22:24.2480853Z // end inline asm 2026-02-21T10:22:24.2480914Z // begin inline asm 2026-02-21T10:22:24.2480978Z mov.u32 %r4769, 0x0; 2026-02-21T10:22:24.2481036Z mov.u32 %r4770, 0x0; 2026-02-21T10:22:24.2481096Z mov.u32 %r4771, 0x0; 2026-02-21T10:22:24.2481157Z mov.u32 %r4772, 0x0; 2026-02-21T10:22:24.2481373Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r4769, %r4770, %r4771, %r4772 }, [ %rd225 + 0 ], %rd224; 2026-02-21T10:22:24.2481432Z // end inline asm 2026-02-21T10:22:24.2481495Z // begin inline asm 2026-02-21T10:22:24.2481558Z mov.u64 %rd227, 0x0; 2026-02-21T10:22:24.2481674Z createpolicy.fractional.L2::evict_first.b64 %rd227, 1.0; 2026-02-21T10:22:24.2481732Z // end inline asm 2026-02-21T10:22:24.2481796Z // begin inline asm 2026-02-21T10:22:24.2481854Z mov.u32 %r4773, 0x0; 2026-02-21T10:22:24.2481910Z mov.u32 %r4774, 0x0; 2026-02-21T10:22:24.2482048Z mov.u32 %r4775, 0x0; 2026-02-21T10:22:24.2482117Z mov.u32 %r4776, 0x0; 2026-02-21T10:22:24.2482332Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r4773, %r4774, %r4775, %r4776 }, [ %rd228 + 0 ], %rd227; 2026-02-21T10:22:24.2482390Z // end inline asm 2026-02-21T10:22:24.2482598Z .loc 1 62 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:62:32 2026-02-21T10:22:24.2482659Z bar.sync 0; 2026-02-21T10:22:24.2482744Z st.shared.v2.b32 [%r10], {%r4745, %r4746}; 2026-02-21T10:22:24.2482851Z st.shared.v2.b32 [%r10+2048], {%r4749, %r4750}; 2026-02-21T10:22:24.2482939Z st.shared.v2.b32 [%r10+4096], {%r4753, %r4754}; 2026-02-21T10:22:24.2483019Z st.shared.v2.b32 [%r10+6144], {%r4757, %r4758}; 2026-02-21T10:22:24.2483102Z st.shared.v2.b32 [%r10+8192], {%r4761, %r4762}; 2026-02-21T10:22:24.2483191Z st.shared.v2.b32 [%r10+10240], {%r4765, %r4766}; 2026-02-21T10:22:24.2483274Z st.shared.v2.b32 [%r10+12288], {%r4769, %r4770}; 2026-02-21T10:22:24.2483410Z 
st.shared.v2.b32 [%r10+14336], {%r4773, %r4774}; 2026-02-21T10:22:24.2483494Z st.shared.v2.b32 [%r11], {%r4747, %r4748}; 2026-02-21T10:22:24.2483578Z st.shared.v2.b32 [%r11+2048], {%r4751, %r4752}; 2026-02-21T10:22:24.2483659Z st.shared.v2.b32 [%r11+4096], {%r4755, %r4756}; 2026-02-21T10:22:24.2483741Z st.shared.v2.b32 [%r11+6144], {%r4759, %r4760}; 2026-02-21T10:22:24.2483883Z st.shared.v2.b32 [%r11+8192], {%r4763, %r4764}; 2026-02-21T10:22:24.2483966Z st.shared.v2.b32 [%r11+10240], {%r4767, %r4768}; 2026-02-21T10:22:24.2484065Z st.shared.v2.b32 [%r11+12288], {%r4771, %r4772}; 2026-02-21T10:22:24.2484151Z st.shared.v2.b32 [%r11+14336], {%r4775, %r4776}; 2026-02-21T10:22:24.2484207Z bar.sync 0; 2026-02-21T10:22:24.2484277Z ld.shared.b16 %rs225, [%r12]; 2026-02-21T10:22:24.2484352Z ld.shared.b16 %rs226, [%r12+1024]; 2026-02-21T10:22:24.2484425Z ld.shared.b16 %rs227, [%r12+64]; 2026-02-21T10:22:24.2484490Z ld.shared.b16 %rs228, [%r12+1088]; 2026-02-21T10:22:24.2484569Z ld.shared.b16 %rs229, [%r12+8192]; 2026-02-21T10:22:24.2484634Z ld.shared.b16 %rs230, [%r12+9216]; 2026-02-21T10:22:24.2484697Z ld.shared.b16 %rs231, [%r12+8256]; 2026-02-21T10:22:24.2484768Z ld.shared.b16 %rs232, [%r12+9280]; 2026-02-21T10:22:24.2484838Z ld.shared.b16 %rs233, [%r13]; 2026-02-21T10:22:24.2484902Z ld.shared.b16 %rs234, [%r13+1024]; 2026-02-21T10:22:24.2484970Z ld.shared.b16 %rs235, [%r13+64]; 2026-02-21T10:22:24.2485037Z ld.shared.b16 %rs236, [%r13+1088]; 2026-02-21T10:22:24.2485102Z ld.shared.b16 %rs237, [%r13+8192]; 2026-02-21T10:22:24.2485219Z ld.shared.b16 %rs238, [%r13+9216]; 2026-02-21T10:22:24.2485286Z ld.shared.b16 %rs239, [%r13+8256]; 2026-02-21T10:22:24.2485354Z ld.shared.b16 %rs240, [%r13+9280]; 2026-02-21T10:22:24.2485418Z ld.shared.b16 %rs241, [%r14]; 2026-02-21T10:22:24.2485480Z ld.shared.b16 %rs242, [%r14+1024]; 2026-02-21T10:22:24.2485550Z ld.shared.b16 %rs243, [%r14+64]; 2026-02-21T10:22:24.2485612Z ld.shared.b16 %rs244, [%r14+1088]; 2026-02-21T10:22:24.2485677Z 
ld.shared.b16 %rs245, [%r14+8192]; 2026-02-21T10:22:24.2485751Z ld.shared.b16 %rs246, [%r14+9216]; 2026-02-21T10:22:24.2485824Z ld.shared.b16 %rs247, [%r14+8256]; 2026-02-21T10:22:24.2485887Z ld.shared.b16 %rs248, [%r14+9280]; 2026-02-21T10:22:24.2485950Z ld.shared.b16 %rs249, [%r15]; 2026-02-21T10:22:24.2486017Z ld.shared.b16 %rs250, [%r15+1024]; 2026-02-21T10:22:24.2486083Z ld.shared.b16 %rs251, [%r15+64]; 2026-02-21T10:22:24.2486149Z ld.shared.b16 %rs252, [%r15+1088]; 2026-02-21T10:22:24.2486216Z ld.shared.b16 %rs253, [%r15+8192]; 2026-02-21T10:22:24.2486280Z ld.shared.b16 %rs254, [%r15+9216]; 2026-02-21T10:22:24.2486342Z ld.shared.b16 %rs255, [%r15+8256]; 2026-02-21T10:22:24.2486408Z ld.shared.b16 %rs256, [%r15+9280]; 2026-02-21T10:22:24.2486602Z ld.shared.b16 %rs257, [%r16]; 2026-02-21T10:22:24.2486671Z ld.shared.b16 %rs258, [%r16+1024]; 2026-02-21T10:22:24.2486736Z ld.shared.b16 %rs259, [%r16+64]; 2026-02-21T10:22:24.2486804Z ld.shared.b16 %rs260, [%r16+1088]; 2026-02-21T10:22:24.2486962Z ld.shared.b16 %rs261, [%r16+8192]; 2026-02-21T10:22:24.2487025Z ld.shared.b16 %rs262, [%r16+9216]; 2026-02-21T10:22:24.2487097Z ld.shared.b16 %rs263, [%r16+8256]; 2026-02-21T10:22:24.2487162Z ld.shared.b16 %rs264, [%r16+9280]; 2026-02-21T10:22:24.2487224Z ld.shared.b16 %rs265, [%r17]; 2026-02-21T10:22:24.2487289Z ld.shared.b16 %rs266, [%r17+1024]; 2026-02-21T10:22:24.2487364Z ld.shared.b16 %rs267, [%r17+64]; 2026-02-21T10:22:24.2487429Z ld.shared.b16 %rs268, [%r17+1088]; 2026-02-21T10:22:24.2487496Z ld.shared.b16 %rs269, [%r17+8192]; 2026-02-21T10:22:24.2487566Z ld.shared.b16 %rs270, [%r17+9216]; 2026-02-21T10:22:24.2487630Z ld.shared.b16 %rs271, [%r17+8256]; 2026-02-21T10:22:24.2487694Z ld.shared.b16 %rs272, [%r17+9280]; 2026-02-21T10:22:24.2487758Z ld.shared.b16 %rs273, [%r18]; 2026-02-21T10:22:24.2487826Z ld.shared.b16 %rs274, [%r18+1024]; 2026-02-21T10:22:24.2487890Z ld.shared.b16 %rs275, [%r18+64]; 2026-02-21T10:22:24.2487954Z ld.shared.b16 %rs276, [%r18+1088]; 
2026-02-21T10:22:24.2488093Z ld.shared.b16 %rs277, [%r18+8192]; 2026-02-21T10:22:24.2488161Z ld.shared.b16 %rs278, [%r18+9216]; 2026-02-21T10:22:24.2488223Z ld.shared.b16 %rs279, [%r18+8256]; 2026-02-21T10:22:24.2488287Z ld.shared.b16 %rs280, [%r18+9280]; 2026-02-21T10:22:24.2488355Z ld.shared.b16 %rs281, [%r19]; 2026-02-21T10:22:24.2488478Z ld.shared.b16 %rs282, [%r19+1024]; 2026-02-21T10:22:24.2488541Z ld.shared.b16 %rs283, [%r19+64]; 2026-02-21T10:22:24.2488609Z ld.shared.b16 %rs284, [%r19+1088]; 2026-02-21T10:22:24.2488673Z ld.shared.b16 %rs285, [%r19+8192]; 2026-02-21T10:22:24.2488738Z ld.shared.b16 %rs286, [%r19+9216]; 2026-02-21T10:22:24.2488808Z ld.shared.b16 %rs287, [%r19+8256]; 2026-02-21T10:22:24.2488871Z ld.shared.b16 %rs288, [%r19+9280]; 2026-02-21T10:22:24.2488938Z cvt.f32.bf16 %r4914, %rs225; 2026-02-21T10:22:24.2489000Z cvt.f32.bf16 %r4915, %rs226; 2026-02-21T10:22:24.2489066Z cvt.f32.bf16 %r4916, %rs233; 2026-02-21T10:22:24.2489127Z cvt.f32.bf16 %r4917, %rs234; 2026-02-21T10:22:24.2489192Z cvt.f32.bf16 %r5046, %rs241; 2026-02-21T10:22:24.2489259Z cvt.f32.bf16 %r5047, %rs242; 2026-02-21T10:22:24.2489319Z cvt.f32.bf16 %r5048, %rs249; 2026-02-21T10:22:24.2489378Z cvt.f32.bf16 %r5049, %rs250; 2026-02-21T10:22:24.2489438Z cvt.f32.bf16 %r5178, %rs257; 2026-02-21T10:22:24.2489505Z cvt.f32.bf16 %r5179, %rs258; 2026-02-21T10:22:24.2489582Z cvt.f32.bf16 %r5180, %rs265; 2026-02-21T10:22:24.2489644Z cvt.f32.bf16 %r5181, %rs266; 2026-02-21T10:22:24.2489711Z cvt.f32.bf16 %r5310, %rs273; 2026-02-21T10:22:24.2489841Z cvt.f32.bf16 %r5311, %rs274; 2026-02-21T10:22:24.2489906Z cvt.f32.bf16 %r5312, %rs281; 2026-02-21T10:22:24.2489967Z cvt.f32.bf16 %r5313, %rs282; 2026-02-21T10:22:24.2490032Z cvt.f32.bf16 %r5442, %rs227; 2026-02-21T10:22:24.2490093Z cvt.f32.bf16 %r5443, %rs228; 2026-02-21T10:22:24.2490154Z cvt.f32.bf16 %r5444, %rs235; 2026-02-21T10:22:24.2490221Z cvt.f32.bf16 %r5445, %rs236; 2026-02-21T10:22:24.2490281Z cvt.f32.bf16 %r5574, %rs243; 
2026-02-21T10:22:24.2490346Z cvt.f32.bf16 %r5575, %rs244; 2026-02-21T10:22:24.2490414Z cvt.f32.bf16 %r5576, %rs251; 2026-02-21T10:22:24.2490474Z cvt.f32.bf16 %r5577, %rs252; 2026-02-21T10:22:24.2490533Z cvt.f32.bf16 %r5706, %rs259; 2026-02-21T10:22:24.2490596Z cvt.f32.bf16 %r5707, %rs260; 2026-02-21T10:22:24.2490659Z cvt.f32.bf16 %r5708, %rs267; 2026-02-21T10:22:24.2490723Z cvt.f32.bf16 %r5709, %rs268; 2026-02-21T10:22:24.2490784Z cvt.f32.bf16 %r5838, %rs275; 2026-02-21T10:22:24.2490846Z cvt.f32.bf16 %r5839, %rs276; 2026-02-21T10:22:24.2490909Z cvt.f32.bf16 %r5840, %rs283; 2026-02-21T10:22:24.2490967Z cvt.f32.bf16 %r5841, %rs284; 2026-02-21T10:22:24.2491032Z cvt.f32.bf16 %r5970, %rs229; 2026-02-21T10:22:24.2491093Z cvt.f32.bf16 %r5971, %rs230; 2026-02-21T10:22:24.2491153Z cvt.f32.bf16 %r5972, %rs237; 2026-02-21T10:22:24.2491217Z cvt.f32.bf16 %r5973, %rs238; 2026-02-21T10:22:24.2491278Z cvt.f32.bf16 %r6102, %rs245; 2026-02-21T10:22:24.2491337Z cvt.f32.bf16 %r6103, %rs246; 2026-02-21T10:22:24.2491470Z cvt.f32.bf16 %r6104, %rs253; 2026-02-21T10:22:24.2491535Z cvt.f32.bf16 %r6105, %rs254; 2026-02-21T10:22:24.2491595Z cvt.f32.bf16 %r6234, %rs261; 2026-02-21T10:22:24.2491655Z cvt.f32.bf16 %r6235, %rs262; 2026-02-21T10:22:24.2491720Z cvt.f32.bf16 %r6236, %rs269; 2026-02-21T10:22:24.2491780Z cvt.f32.bf16 %r6237, %rs270; 2026-02-21T10:22:24.2491843Z cvt.f32.bf16 %r6366, %rs277; 2026-02-21T10:22:24.2491907Z cvt.f32.bf16 %r6367, %rs278; 2026-02-21T10:22:24.2491968Z cvt.f32.bf16 %r6368, %rs285; 2026-02-21T10:22:24.2492032Z cvt.f32.bf16 %r6369, %rs286; 2026-02-21T10:22:24.2492092Z cvt.f32.bf16 %r6498, %rs231; 2026-02-21T10:22:24.2492157Z cvt.f32.bf16 %r6499, %rs232; 2026-02-21T10:22:24.2492218Z cvt.f32.bf16 %r6500, %rs239; 2026-02-21T10:22:24.2492280Z cvt.f32.bf16 %r6501, %rs240; 2026-02-21T10:22:24.2492343Z cvt.f32.bf16 %r6630, %rs247; 2026-02-21T10:22:24.2492404Z cvt.f32.bf16 %r6631, %rs248; 2026-02-21T10:22:24.2492463Z cvt.f32.bf16 %r6632, %rs255; 
2026-02-21T10:22:24.2492579Z cvt.f32.bf16 %r6633, %rs256; 2026-02-21T10:22:24.2492647Z cvt.f32.bf16 %r6762, %rs263; 2026-02-21T10:22:24.2492706Z cvt.f32.bf16 %r6763, %rs264; 2026-02-21T10:22:24.2492766Z cvt.f32.bf16 %r6764, %rs271; 2026-02-21T10:22:24.2492830Z cvt.f32.bf16 %r6765, %rs272; 2026-02-21T10:22:24.2492890Z cvt.f32.bf16 %r6894, %rs279; 2026-02-21T10:22:24.2492998Z cvt.f32.bf16 %r6895, %rs280; 2026-02-21T10:22:24.2493060Z cvt.f32.bf16 %r6896, %rs287; 2026-02-21T10:22:24.2493126Z cvt.f32.bf16 %r6897, %rs288; 2026-02-21T10:22:24.2493350Z .loc 1 64 33 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:64:33 2026-02-21T10:22:24.2493409Z bar.sync 0; 2026-02-21T10:22:24.2493478Z // begin inline asm 2026-02-21T10:22:24.2493582Z @%p313 mbarrier.init.shared::cta.b64 [%r29850], 1; 2026-02-21T10:22:24.2493640Z // end inline asm 2026-02-21T10:22:24.2493699Z bar.sync 0; 2026-02-21T10:22:24.2493757Z // begin inline asm 2026-02-21T10:22:24.2493898Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r29850], 4096; 2026-02-21T10:22:24.2493956Z // end inline asm 2026-02-21T10:22:24.2494019Z // begin inline asm 2026-02-21T10:22:24.2494095Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.2494152Z // end inline asm 2026-02-21T10:22:24.2494210Z bar.sync 0; 2026-02-21T10:22:24.2494278Z elect.sync %r9642|%p100, -1; 2026-02-21T10:22:24.2494349Z and.pred %p60, %p1, %p100; 2026-02-21T10:22:24.2494413Z cvt.u32.u64 %r9643, %rd841; 2026-02-21T10:22:24.2494475Z add.s32 %r4781, %r9643, 128; 2026-02-21T10:22:24.2494589Z // begin inline asm 2026-02-21T10:22:24.2494936Z @%p60 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39936], [%rd822, {%r9808, %r4781}], [%r29850]; 2026-02-21T10:22:24.2494998Z // end inline asm 2026-02-21T10:22:24.2495054Z bar.sync 0; 2026-02-21T10:22:24.2495114Z // begin inline asm 2026-02-21T10:22:24.2495171Z 2026-02-21T10:22:24.2495222Z { 2026-02-21T10:22:24.2495287Z .reg .pred complete; 2026-02-21T10:22:24.2495345Z waitLoop: 
2026-02-21T10:22:24.2495499Z mbarrier.try_wait.parity.shared.b64 complete, [%r29850], %r9443; 2026-02-21T10:22:24.2495569Z @!complete bra.uni waitLoop; 2026-02-21T10:22:24.2495620Z } 2026-02-21T10:22:24.2495626Z 2026-02-21T10:22:24.2495686Z // end inline asm 2026-02-21T10:22:24.2495742Z bar.sync 0; 2026-02-21T10:22:24.2495803Z // begin inline asm 2026-02-21T10:22:24.2495899Z @%p313 mbarrier.inval.shared::cta.b64 [%r29850]; 2026-02-21T10:22:24.2495968Z // end inline asm 2026-02-21T10:22:24.2496173Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2496242Z ld.shared.s8 %rs289, [%r20]; 2026-02-21T10:22:24.2496439Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2496630Z shl.b16 %rs290, %rs289, 4; 2026-02-21T10:22:24.2496825Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2497009Z ld.shared.s8 %rs291, [%r21+128]; 2026-02-21T10:22:24.2497203Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2497265Z shl.b16 %rs292, %rs291, 4; 2026-02-21T10:22:24.2497455Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2497524Z ld.shared.s8 %rs293, [%r22+256]; 2026-02-21T10:22:24.2497716Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2497779Z shl.b16 %rs294, %rs293, 4; 2026-02-21T10:22:24.2497972Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2498035Z ld.shared.s8 %rs295, [%r23+384]; 2026-02-21T10:22:24.2498224Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2498289Z shl.b16 %rs296, %rs295, 4; 2026-02-21T10:22:24.2498547Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2498615Z ld.shared.s8 %rs297, [%r24+512]; 2026-02-21T10:22:24.2498808Z 
.loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2498932Z shl.b16 %rs298, %rs297, 4; 2026-02-21T10:22:24.2499120Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2499191Z ld.shared.s8 %rs299, [%r25+640]; 2026-02-21T10:22:24.2499381Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2499443Z shl.b16 %rs300, %rs299, 4; 2026-02-21T10:22:24.2499631Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2499701Z ld.shared.s8 %rs301, [%r26+768]; 2026-02-21T10:22:24.2499893Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2499963Z shl.b16 %rs302, %rs301, 4; 2026-02-21T10:22:24.2500151Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2500214Z ld.shared.s8 %rs303, [%r27+896]; 2026-02-21T10:22:24.2500411Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2500471Z shl.b16 %rs304, %rs303, 4; 2026-02-21T10:22:24.2500724Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2500798Z ld.shared.s8 %rs305, [%r20+1024]; 2026-02-21T10:22:24.2500988Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2501049Z shl.b16 %rs306, %rs305, 4; 2026-02-21T10:22:24.2501242Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2501313Z ld.shared.s8 %rs307, [%r21+1152]; 2026-02-21T10:22:24.2501501Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2501562Z shl.b16 %rs308, %rs307, 4; 2026-02-21T10:22:24.2501755Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2501822Z ld.shared.s8 %rs309, [%r22+1280]; 2026-02-21T10:22:24.2502013Z .loc 
1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2502078Z shl.b16 %rs310, %rs309, 4; 2026-02-21T10:22:24.2502268Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2502331Z ld.shared.s8 %rs311, [%r23+1408]; 2026-02-21T10:22:24.2502523Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2502635Z shl.b16 %rs312, %rs311, 4; 2026-02-21T10:22:24.2502826Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2502905Z ld.shared.s8 %rs313, [%r24+1536]; 2026-02-21T10:22:24.2503098Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2503163Z shl.b16 %rs314, %rs313, 4; 2026-02-21T10:22:24.2503350Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2503416Z ld.shared.s8 %rs315, [%r25+1664]; 2026-02-21T10:22:24.2503606Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2503667Z shl.b16 %rs316, %rs315, 4; 2026-02-21T10:22:24.2503858Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2503922Z ld.shared.s8 %rs317, [%r26+1792]; 2026-02-21T10:22:24.2504160Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2504225Z shl.b16 %rs318, %rs317, 4; 2026-02-21T10:22:24.2504413Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2504478Z ld.shared.s8 %rs319, [%r27+1920]; 2026-02-21T10:22:24.2504717Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2504778Z shl.b16 %rs320, %rs319, 4; 2026-02-21T10:22:24.2504967Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2505030Z ld.shared.s8 %rs321, [%r20+2048]; 2026-02-21T10:22:24.2505221Z .loc 1 
67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2505282Z shl.b16 %rs322, %rs321, 4; 2026-02-21T10:22:24.2505469Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2505543Z ld.shared.s8 %rs323, [%r21+2176]; 2026-02-21T10:22:24.2505733Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2505793Z shl.b16 %rs324, %rs323, 4; 2026-02-21T10:22:24.2505985Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2506052Z ld.shared.s8 %rs325, [%r22+2304]; 2026-02-21T10:22:24.2506297Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2506363Z shl.b16 %rs326, %rs325, 4; 2026-02-21T10:22:24.2506689Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2506755Z ld.shared.s8 %rs327, [%r23+2432]; 2026-02-21T10:22:24.2506945Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2507014Z shl.b16 %rs328, %rs327, 4; 2026-02-21T10:22:24.2507205Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2507269Z ld.shared.s8 %rs329, [%r24+2560]; 2026-02-21T10:22:24.2507461Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2507525Z shl.b16 %rs330, %rs329, 4; 2026-02-21T10:22:24.2507716Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2507790Z ld.shared.s8 %rs331, [%r25+2688]; 2026-02-21T10:22:24.2507989Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2508051Z shl.b16 %rs332, %rs331, 4; 2026-02-21T10:22:24.2508242Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2508305Z ld.shared.s8 %rs333, [%r26+2816]; 2026-02-21T10:22:24.2508652Z .loc 1 
67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2508722Z shl.b16 %rs334, %rs333, 4; 2026-02-21T10:22:24.2508916Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2508978Z ld.shared.s8 %rs335, [%r27+2944]; 2026-02-21T10:22:24.2509170Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2509236Z shl.b16 %rs336, %rs335, 4; 2026-02-21T10:22:24.2509425Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2509487Z ld.shared.s8 %rs337, [%r20+3072]; 2026-02-21T10:22:24.2509676Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2509737Z shl.b16 %rs338, %rs337, 4; 2026-02-21T10:22:24.2509999Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2510070Z ld.shared.s8 %rs339, [%r21+3200]; 2026-02-21T10:22:24.2510258Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2510318Z shl.b16 %rs340, %rs339, 4; 2026-02-21T10:22:24.2510507Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2510640Z ld.shared.s8 %rs341, [%r22+3328]; 2026-02-21T10:22:24.2510830Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2510893Z shl.b16 %rs342, %rs341, 4; 2026-02-21T10:22:24.2511086Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2511150Z ld.shared.s8 %rs343, [%r23+3456]; 2026-02-21T10:22:24.2511339Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2511408Z shl.b16 %rs344, %rs343, 4; 2026-02-21T10:22:24.2511598Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2511662Z ld.shared.s8 %rs345, [%r24+3584]; 2026-02-21T10:22:24.2511854Z .loc 1 
67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2511918Z shl.b16 %rs346, %rs345, 4; 2026-02-21T10:22:24.2512168Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2512235Z ld.shared.s8 %rs347, [%r25+3712]; 2026-02-21T10:22:24.2512426Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2512487Z shl.b16 %rs348, %rs347, 4; 2026-02-21T10:22:24.2512675Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2512743Z ld.shared.s8 %rs349, [%r26+3840]; 2026-02-21T10:22:24.2512937Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2512997Z shl.b16 %rs350, %rs349, 4; 2026-02-21T10:22:24.2513189Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2513256Z ld.shared.s8 %rs351, [%r27+3968]; 2026-02-21T10:22:24.2513443Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2513508Z shl.b16 %rs352, %rs351, 4; 2026-02-21T10:22:24.2513697Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2513758Z cvt.s16.s8 %rs353, %rs290; 2026-02-21T10:22:24.2513818Z shr.s16 %rs354, %rs353, 4; 2026-02-21T10:22:24.2513880Z cvt.s16.s8 %rs355, %rs292; 2026-02-21T10:22:24.2513940Z shr.s16 %rs356, %rs355, 4; 2026-02-21T10:22:24.2513999Z shr.s16 %rs357, %rs289, 4; 2026-02-21T10:22:24.2514127Z shr.s16 %rs358, %rs291, 4; 2026-02-21T10:22:24.2514319Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2514383Z cvt.rn.f32.s16 %r9644, %rs358; 2026-02-21T10:22:24.2514451Z cvt.rn.f32.s16 %r9645, %rs357; 2026-02-21T10:22:24.2514512Z cvt.rn.f32.s16 %r9646, %rs356; 2026-02-21T10:22:24.2514577Z cvt.rn.f32.s16 %r9647, %rs354; 2026-02-21T10:22:24.2514768Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2514835Z cvt.s16.s8 %rs359, %rs294; 2026-02-21T10:22:24.2514897Z shr.s16 %rs360, %rs359, 4; 2026-02-21T10:22:24.2514956Z cvt.s16.s8 %rs361, %rs296; 2026-02-21T10:22:24.2515018Z shr.s16 %rs362, %rs361, 4; 2026-02-21T10:22:24.2515076Z shr.s16 %rs363, %rs293, 4; 2026-02-21T10:22:24.2515135Z shr.s16 %rs364, %rs295, 4; 2026-02-21T10:22:24.2515324Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2515443Z cvt.rn.f32.s16 %r9648, %rs364; 2026-02-21T10:22:24.2515506Z cvt.rn.f32.s16 %r9649, %rs363; 2026-02-21T10:22:24.2515566Z cvt.rn.f32.s16 %r9650, %rs362; 2026-02-21T10:22:24.2515629Z cvt.rn.f32.s16 %r9651, %rs360; 2026-02-21T10:22:24.2515818Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2515925Z cvt.s16.s8 %rs365, %rs298; 2026-02-21T10:22:24.2515989Z shr.s16 %rs366, %rs365, 4; 2026-02-21T10:22:24.2516048Z cvt.s16.s8 %rs367, %rs300; 2026-02-21T10:22:24.2516106Z shr.s16 %rs368, %rs367, 4; 2026-02-21T10:22:24.2516165Z shr.s16 %rs369, %rs297, 4; 2026-02-21T10:22:24.2516236Z shr.s16 %rs370, %rs299, 4; 2026-02-21T10:22:24.2516424Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2516601Z cvt.rn.f32.s16 %r9652, %rs370; 2026-02-21T10:22:24.2516672Z cvt.rn.f32.s16 %r9653, %rs369; 2026-02-21T10:22:24.2516737Z cvt.rn.f32.s16 %r9654, %rs368; 2026-02-21T10:22:24.2516799Z cvt.rn.f32.s16 %r9655, %rs366; 2026-02-21T10:22:24.2516995Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2517056Z cvt.s16.s8 %rs371, %rs302; 2026-02-21T10:22:24.2517116Z shr.s16 %rs372, %rs371, 4; 2026-02-21T10:22:24.2517180Z cvt.s16.s8 %rs373, %rs304; 2026-02-21T10:22:24.2517242Z shr.s16 %rs374, %rs373, 4; 2026-02-21T10:22:24.2517303Z shr.s16 %rs375, %rs301, 4; 2026-02-21T10:22:24.2517467Z shr.s16 %rs376, %rs303, 4; 
2026-02-21T10:22:24.2517669Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2517733Z cvt.rn.f32.s16 %r9656, %rs376; 2026-02-21T10:22:24.2517793Z cvt.rn.f32.s16 %r9657, %rs375; 2026-02-21T10:22:24.2517856Z cvt.rn.f32.s16 %r9658, %rs374; 2026-02-21T10:22:24.2517921Z cvt.rn.f32.s16 %r9659, %rs372; 2026-02-21T10:22:24.2518111Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2518175Z cvt.s16.s8 %rs377, %rs306; 2026-02-21T10:22:24.2518238Z shr.s16 %rs378, %rs377, 4; 2026-02-21T10:22:24.2518296Z cvt.s16.s8 %rs379, %rs308; 2026-02-21T10:22:24.2518356Z shr.s16 %rs380, %rs379, 4; 2026-02-21T10:22:24.2518417Z shr.s16 %rs381, %rs305, 4; 2026-02-21T10:22:24.2518480Z shr.s16 %rs382, %rs307, 4; 2026-02-21T10:22:24.2518668Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2518734Z cvt.rn.f32.s16 %r9660, %rs382; 2026-02-21T10:22:24.2518800Z cvt.rn.f32.s16 %r9661, %rs381; 2026-02-21T10:22:24.2518861Z cvt.rn.f32.s16 %r9662, %rs380; 2026-02-21T10:22:24.2518921Z cvt.rn.f32.s16 %r9663, %rs378; 2026-02-21T10:22:24.2519113Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2519174Z cvt.s16.s8 %rs383, %rs310; 2026-02-21T10:22:24.2519305Z shr.s16 %rs384, %rs383, 4; 2026-02-21T10:22:24.2519371Z cvt.s16.s8 %rs385, %rs312; 2026-02-21T10:22:24.2519432Z shr.s16 %rs386, %rs385, 4; 2026-02-21T10:22:24.2519492Z shr.s16 %rs387, %rs309, 4; 2026-02-21T10:22:24.2519552Z shr.s16 %rs388, %rs311, 4; 2026-02-21T10:22:24.2519744Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2519808Z cvt.rn.f32.s16 %r9664, %rs388; 2026-02-21T10:22:24.2519869Z cvt.rn.f32.s16 %r9665, %rs387; 2026-02-21T10:22:24.2519933Z cvt.rn.f32.s16 %r9666, %rs386; 2026-02-21T10:22:24.2520006Z cvt.rn.f32.s16 %r9667, %rs384; 2026-02-21T10:22:24.2520197Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2520257Z cvt.s16.s8 %rs389, %rs314; 2026-02-21T10:22:24.2520319Z shr.s16 %rs390, %rs389, 4; 2026-02-21T10:22:24.2520378Z cvt.s16.s8 %rs391, %rs316; 2026-02-21T10:22:24.2520437Z shr.s16 %rs392, %rs391, 4; 2026-02-21T10:22:24.2520504Z shr.s16 %rs393, %rs313, 4; 2026-02-21T10:22:24.2520630Z shr.s16 %rs394, %rs315, 4; 2026-02-21T10:22:24.2520824Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2520889Z cvt.rn.f32.s16 %r9668, %rs394; 2026-02-21T10:22:24.2520949Z cvt.rn.f32.s16 %r9669, %rs393; 2026-02-21T10:22:24.2521071Z cvt.rn.f32.s16 %r9670, %rs392; 2026-02-21T10:22:24.2521131Z cvt.rn.f32.s16 %r9671, %rs390; 2026-02-21T10:22:24.2521326Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2521387Z cvt.s16.s8 %rs395, %rs318; 2026-02-21T10:22:24.2521448Z shr.s16 %rs396, %rs395, 4; 2026-02-21T10:22:24.2521510Z cvt.s16.s8 %rs397, %rs320; 2026-02-21T10:22:24.2521570Z shr.s16 %rs398, %rs397, 4; 2026-02-21T10:22:24.2521628Z shr.s16 %rs399, %rs317, 4; 2026-02-21T10:22:24.2521687Z shr.s16 %rs400, %rs319, 4; 2026-02-21T10:22:24.2521884Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2521962Z cvt.rn.f32.s16 %r9672, %rs400; 2026-02-21T10:22:24.2522025Z cvt.rn.f32.s16 %r9673, %rs399; 2026-02-21T10:22:24.2522089Z cvt.rn.f32.s16 %r9674, %rs398; 2026-02-21T10:22:24.2522152Z cvt.rn.f32.s16 %r9675, %rs396; 2026-02-21T10:22:24.2522341Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2522410Z cvt.s16.s8 %rs401, %rs322; 2026-02-21T10:22:24.2522470Z shr.s16 %rs402, %rs401, 4; 2026-02-21T10:22:24.2522582Z cvt.s16.s8 %rs403, %rs324; 2026-02-21T10:22:24.2522645Z shr.s16 %rs404, %rs403, 4; 2026-02-21T10:22:24.2522710Z shr.s16 %rs405, %rs321, 4; 2026-02-21T10:22:24.2522770Z shr.s16 %rs406, %rs323, 4; 
2026-02-21T10:22:24.2522959Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2523023Z cvt.rn.f32.s16 %r9676, %rs406; 2026-02-21T10:22:24.2523085Z cvt.rn.f32.s16 %r9677, %rs405; 2026-02-21T10:22:24.2523151Z cvt.rn.f32.s16 %r9678, %rs404; 2026-02-21T10:22:24.2523214Z cvt.rn.f32.s16 %r9679, %rs402; 2026-02-21T10:22:24.2523421Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2523482Z cvt.s16.s8 %rs407, %rs326; 2026-02-21T10:22:24.2523545Z shr.s16 %rs408, %rs407, 4; 2026-02-21T10:22:24.2523608Z cvt.s16.s8 %rs409, %rs328; 2026-02-21T10:22:24.2523667Z shr.s16 %rs410, %rs409, 4; 2026-02-21T10:22:24.2523729Z shr.s16 %rs411, %rs325, 4; 2026-02-21T10:22:24.2523792Z shr.s16 %rs412, %rs327, 4; 2026-02-21T10:22:24.2523982Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2524044Z cvt.rn.f32.s16 %r9680, %rs412; 2026-02-21T10:22:24.2524106Z cvt.rn.f32.s16 %r9681, %rs411; 2026-02-21T10:22:24.2524169Z cvt.rn.f32.s16 %r9682, %rs410; 2026-02-21T10:22:24.2524231Z cvt.rn.f32.s16 %r9683, %rs408; 2026-02-21T10:22:24.2524480Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2524545Z cvt.s16.s8 %rs413, %rs330; 2026-02-21T10:22:24.2524604Z shr.s16 %rs414, %rs413, 4; 2026-02-21T10:22:24.2524662Z cvt.s16.s8 %rs415, %rs332; 2026-02-21T10:22:24.2524725Z shr.s16 %rs416, %rs415, 4; 2026-02-21T10:22:24.2524789Z shr.s16 %rs417, %rs329, 4; 2026-02-21T10:22:24.2524848Z shr.s16 %rs418, %rs331, 4; 2026-02-21T10:22:24.2525038Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2525104Z cvt.rn.f32.s16 %r9684, %rs418; 2026-02-21T10:22:24.2525165Z cvt.rn.f32.s16 %r9685, %rs417; 2026-02-21T10:22:24.2525228Z cvt.rn.f32.s16 %r9686, %rs416; 2026-02-21T10:22:24.2525292Z cvt.rn.f32.s16 %r9687, %rs414; 2026-02-21T10:22:24.2525484Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2525544Z cvt.s16.s8 %rs419, %rs334; 2026-02-21T10:22:24.2525654Z shr.s16 %rs420, %rs419, 4; 2026-02-21T10:22:24.2525720Z cvt.s16.s8 %rs421, %rs336; 2026-02-21T10:22:24.2525779Z shr.s16 %rs422, %rs421, 4; 2026-02-21T10:22:24.2525838Z shr.s16 %rs423, %rs333, 4; 2026-02-21T10:22:24.2525899Z shr.s16 %rs424, %rs335, 4; 2026-02-21T10:22:24.2526096Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2526219Z cvt.rn.f32.s16 %r9688, %rs424; 2026-02-21T10:22:24.2526284Z cvt.rn.f32.s16 %r9689, %rs423; 2026-02-21T10:22:24.2526347Z cvt.rn.f32.s16 %r9690, %rs422; 2026-02-21T10:22:24.2526407Z cvt.rn.f32.s16 %r9691, %rs420; 2026-02-21T10:22:24.2526727Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2526795Z cvt.s16.s8 %rs425, %rs338; 2026-02-21T10:22:24.2526855Z shr.s16 %rs426, %rs425, 4; 2026-02-21T10:22:24.2526914Z cvt.s16.s8 %rs427, %rs340; 2026-02-21T10:22:24.2526976Z shr.s16 %rs428, %rs427, 4; 2026-02-21T10:22:24.2527040Z shr.s16 %rs429, %rs337, 4; 2026-02-21T10:22:24.2527101Z shr.s16 %rs430, %rs339, 4; 2026-02-21T10:22:24.2527293Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2527354Z cvt.rn.f32.s16 %r9692, %rs430; 2026-02-21T10:22:24.2527418Z cvt.rn.f32.s16 %r9693, %rs429; 2026-02-21T10:22:24.2527479Z cvt.rn.f32.s16 %r9694, %rs428; 2026-02-21T10:22:24.2527542Z cvt.rn.f32.s16 %r9695, %rs426; 2026-02-21T10:22:24.2527808Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2527872Z cvt.s16.s8 %rs431, %rs342; 2026-02-21T10:22:24.2527944Z shr.s16 %rs432, %rs431, 4; 2026-02-21T10:22:24.2528008Z cvt.s16.s8 %rs433, %rs344; 2026-02-21T10:22:24.2528073Z shr.s16 %rs434, %rs433, 4; 2026-02-21T10:22:24.2528132Z shr.s16 %rs435, %rs341, 4; 2026-02-21T10:22:24.2528196Z shr.s16 %rs436, %rs343, 4; 
2026-02-21T10:22:24.2528400Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2528468Z cvt.rn.f32.s16 %r9696, %rs436; 2026-02-21T10:22:24.2528535Z cvt.rn.f32.s16 %r9697, %rs435; 2026-02-21T10:22:24.2528597Z cvt.rn.f32.s16 %r9698, %rs434; 2026-02-21T10:22:24.2528658Z cvt.rn.f32.s16 %r9699, %rs432; 2026-02-21T10:22:24.2528862Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2528923Z cvt.s16.s8 %rs437, %rs346; 2026-02-21T10:22:24.2528985Z shr.s16 %rs438, %rs437, 4; 2026-02-21T10:22:24.2529045Z cvt.s16.s8 %rs439, %rs348; 2026-02-21T10:22:24.2529107Z shr.s16 %rs440, %rs439, 4; 2026-02-21T10:22:24.2529167Z shr.s16 %rs441, %rs345, 4; 2026-02-21T10:22:24.2529226Z shr.s16 %rs442, %rs347, 4; 2026-02-21T10:22:24.2529420Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2529483Z cvt.rn.f32.s16 %r9700, %rs442; 2026-02-21T10:22:24.2529619Z cvt.rn.f32.s16 %r9701, %rs441; 2026-02-21T10:22:24.2529682Z cvt.rn.f32.s16 %r9702, %rs440; 2026-02-21T10:22:24.2529748Z cvt.rn.f32.s16 %r9703, %rs438; 2026-02-21T10:22:24.2529937Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2529997Z cvt.s16.s8 %rs443, %rs350; 2026-02-21T10:22:24.2530064Z shr.s16 %rs444, %rs443, 4; 2026-02-21T10:22:24.2530124Z cvt.s16.s8 %rs445, %rs352; 2026-02-21T10:22:24.2530184Z shr.s16 %rs446, %rs445, 4; 2026-02-21T10:22:24.2530248Z shr.s16 %rs447, %rs349, 4; 2026-02-21T10:22:24.2530307Z shr.s16 %rs448, %rs351, 4; 2026-02-21T10:22:24.2530497Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2530559Z cvt.rn.f32.s16 %r9704, %rs448; 2026-02-21T10:22:24.2530625Z cvt.rn.f32.s16 %r9705, %rs447; 2026-02-21T10:22:24.2530686Z cvt.rn.f32.s16 %r9706, %rs446; 2026-02-21T10:22:24.2530747Z cvt.rn.f32.s16 %r9707, %rs444; 2026-02-21T10:22:24.2530873Z bar.sync 0; 2026-02-21T10:22:24.2530995Z 
st.shared.v4.b32 [%r28], {%r9647, %r9645, %r9646, %r9644}; 2026-02-21T10:22:24.2531117Z st.shared.v4.b32 [%r28+16384], {%r9679, %r9677, %r9678, %r9676}; 2026-02-21T10:22:24.2531224Z st.shared.v4.b32 [%r29], {%r9651, %r9649, %r9650, %r9648}; 2026-02-21T10:22:24.2531398Z st.shared.v4.b32 [%r29+16384], {%r9683, %r9681, %r9682, %r9680}; 2026-02-21T10:22:24.2531501Z st.shared.v4.b32 [%r30], {%r9655, %r9653, %r9654, %r9652}; 2026-02-21T10:22:24.2531614Z st.shared.v4.b32 [%r30+16384], {%r9687, %r9685, %r9686, %r9684}; 2026-02-21T10:22:24.2531718Z st.shared.v4.b32 [%r31], {%r9659, %r9657, %r9658, %r9656}; 2026-02-21T10:22:24.2531840Z st.shared.v4.b32 [%r31+16384], {%r9691, %r9689, %r9690, %r9688}; 2026-02-21T10:22:24.2531945Z st.shared.v4.b32 [%r32], {%r9663, %r9661, %r9662, %r9660}; 2026-02-21T10:22:24.2532058Z st.shared.v4.b32 [%r32+16384], {%r9695, %r9693, %r9694, %r9692}; 2026-02-21T10:22:24.2532164Z st.shared.v4.b32 [%r33], {%r9667, %r9665, %r9666, %r9664}; 2026-02-21T10:22:24.2532276Z st.shared.v4.b32 [%r33+16384], {%r9699, %r9697, %r9698, %r9696}; 2026-02-21T10:22:24.2532376Z st.shared.v4.b32 [%r34], {%r9671, %r9669, %r9670, %r9668}; 2026-02-21T10:22:24.2532489Z st.shared.v4.b32 [%r34+16384], {%r9703, %r9701, %r9702, %r9700}; 2026-02-21T10:22:24.2532591Z st.shared.v4.b32 [%r35], {%r9675, %r9673, %r9674, %r9672}; 2026-02-21T10:22:24.2532706Z st.shared.v4.b32 [%r35+16384], {%r9707, %r9705, %r9706, %r9704}; 2026-02-21T10:22:24.2532822Z $L__tmp3: 2026-02-21T10:22:24.2533098Z .loc 2 291 36 // standard.py:291:36 @[ c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:94:40 ] 2026-02-21T10:22:24.2533162Z // begin inline asm 2026-02-21T10:22:24.2533245Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.2533302Z // end inline asm 2026-02-21T10:22:24.2533358Z bar.sync 0; 2026-02-21T10:22:24.2533435Z wgmma.fence.sync.aligned; 2026-02-21T10:22:24.2533494Z // begin inline asm 2026-02-21T10:22:24.2534987Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r4914,%r4915,%r4916,%r4917}, %rd3, %p42, 1, 1; 2026-02-21T10:22:24.2535053Z // end inline asm 2026-02-21T10:22:24.2535110Z // begin inline asm 2026-02-21T10:22:24.2536724Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r5046,%r5047,%r5048,%r5049}, %rd4, %p42, 1, 1; 2026-02-21T10:22:24.2536872Z // end inline asm 2026-02-21T10:22:24.2536931Z // begin inline asm 2026-02-21T10:22:24.2538475Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, 
{%r5178,%r5179,%r5180,%r5181}, %rd5, %p42, 1, 1; 2026-02-21T10:22:24.2538540Z // end inline asm 2026-02-21T10:22:24.2538599Z // begin inline asm 2026-02-21T10:22:24.2540073Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r5310,%r5311,%r5312,%r5313}, %rd6, %p42, 1, 1; 2026-02-21T10:22:24.2540216Z // end inline asm 2026-02-21T10:22:24.2540275Z // begin inline asm 2026-02-21T10:22:24.2541819Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r5442,%r5443,%r5444,%r5445}, %rd7, %p42, 1, 1; 2026-02-21T10:22:24.2541882Z // end inline asm 2026-02-21T10:22:24.2541953Z // begin inline asm 2026-02-21T10:22:24.2543427Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r5574,%r5575,%r5576,%r5577}, %rd8, %p42, 1, 1; 2026-02-21T10:22:24.2543486Z // end inline asm 2026-02-21T10:22:24.2543549Z // begin inline asm 2026-02-21T10:22:24.2545024Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r5706,%r5707,%r5708,%r5709}, %rd9, %p42, 1, 1; 2026-02-21T10:22:24.2545132Z // end inline asm 2026-02-21T10:22:24.2545192Z // begin inline asm 2026-02-21T10:22:24.2546797Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, 
{%r5838,%r5839,%r5840,%r5841}, %rd10, %p42, 1, 1; 2026-02-21T10:22:24.2546863Z // end inline asm 2026-02-21T10:22:24.2546920Z // begin inline asm 2026-02-21T10:22:24.2548535Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r5970,%r5971,%r5972,%r5973}, %rd3, %p42, 1, 1; 2026-02-21T10:22:24.2548667Z // end inline asm 2026-02-21T10:22:24.2548730Z // begin inline asm 2026-02-21T10:22:24.2550209Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r6102,%r6103,%r6104,%r6105}, %rd4, %p42, 1, 1; 2026-02-21T10:22:24.2550268Z // end inline asm 2026-02-21T10:22:24.2550325Z // begin inline asm 2026-02-21T10:22:24.2551852Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r6234,%r6235,%r6236,%r6237}, %rd5, %p42, 1, 1; 2026-02-21T10:22:24.2551914Z // end inline asm 2026-02-21T10:22:24.2551974Z // begin inline asm 2026-02-21T10:22:24.2553447Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r6366,%r6367,%r6368,%r6369}, %rd6, %p42, 1, 1; 2026-02-21T10:22:24.2553506Z // end inline asm 2026-02-21T10:22:24.2553566Z // begin inline asm 2026-02-21T10:22:24.2555029Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, 
{%r6498,%r6499,%r6500,%r6501}, %rd7, %p42, 1, 1; 2026-02-21T10:22:24.2555161Z // end inline asm 2026-02-21T10:22:24.2555222Z // begin inline asm 2026-02-21T10:22:24.2556893Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r6630,%r6631,%r6632,%r6633}, %rd8, %p42, 1, 1; 2026-02-21T10:22:24.2556974Z // end inline asm 2026-02-21T10:22:24.2557035Z // begin inline asm 2026-02-21T10:22:24.2558503Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r6762,%r6763,%r6764,%r6765}, %rd9, %p42, 1, 1; 2026-02-21T10:22:24.2558624Z // end inline asm 2026-02-21T10:22:24.2558682Z // begin inline asm 2026-02-21T10:22:24.2560224Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r6894,%r6895,%r6896,%r6897}, %rd10, %p42, 1, 1; 2026-02-21T10:22:24.2560297Z // end inline asm 2026-02-21T10:22:24.2560377Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:24.2560441Z mov.b32 %r7027, %r9443; 2026-02-21T10:22:24.2560501Z mov.b32 %r7028, %r9443; 2026-02-21T10:22:24.2560560Z mov.b32 %r7026, %r39936; 2026-02-21T10:22:24.2560617Z // begin inline asm 2026-02-21T10:22:24.2563125Z // wait for regs: %r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601,%r7026,%r7027,%r7028 
2026-02-21T10:22:24.2563206Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:24.2563325Z // end inline asm 2026-02-21T10:22:24.2563383Z $L__tmp4: 2026-02-21T10:22:24.2563593Z .loc 1 58 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:32 2026-02-21T10:22:24.2563661Z add.s64 %rd248, %rd166, 256; 2026-02-21T10:22:24.2563722Z add.s64 %rd251, %rd169, 256; 2026-02-21T10:22:24.2563784Z add.s64 %rd254, %rd172, 256; 2026-02-21T10:22:24.2563847Z add.s64 %rd257, %rd175, 256; 2026-02-21T10:22:24.2563907Z add.s64 %rd260, %rd178, 256; 2026-02-21T10:22:24.2563969Z add.s64 %rd263, %rd181, 256; 2026-02-21T10:22:24.2564040Z add.s64 %rd266, %rd184, 256; 2026-02-21T10:22:24.2564121Z mad.wide.s32 %rd269, %r42473, 2, %rd117; 2026-02-21T10:22:24.2564318Z .loc 1 58 80 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:80 2026-02-21T10:22:24.2564377Z // begin inline asm 2026-02-21T10:22:24.2564438Z mov.u64 %rd247, 0x0; 2026-02-21T10:22:24.2564570Z createpolicy.fractional.L2::evict_first.b64 %rd247, 1.0; 2026-02-21T10:22:24.2564682Z // end inline asm 2026-02-21T10:22:24.2564742Z // begin inline asm 2026-02-21T10:22:24.2564807Z mov.u32 %r7160, 0x0; 2026-02-21T10:22:24.2564862Z mov.u32 %r7161, 0x0; 2026-02-21T10:22:24.2564919Z mov.u32 %r7162, 0x0; 2026-02-21T10:22:24.2564978Z mov.u32 %r7163, 0x0; 2026-02-21T10:22:24.2565206Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r7160, %r7161, %r7162, %r7163 }, [ %rd248 + 0 ], %rd247; 2026-02-21T10:22:24.2565312Z // end inline asm 2026-02-21T10:22:24.2565371Z // begin inline asm 2026-02-21T10:22:24.2565429Z mov.u64 %rd250, 0x0; 2026-02-21T10:22:24.2565550Z createpolicy.fractional.L2::evict_first.b64 %rd250, 1.0; 2026-02-21T10:22:24.2565617Z // end inline asm 2026-02-21T10:22:24.2565680Z // begin inline asm 2026-02-21T10:22:24.2565738Z mov.u32 %r7164, 0x0; 2026-02-21T10:22:24.2565793Z mov.u32 %r7165, 0x0; 2026-02-21T10:22:24.2565851Z mov.u32 %r7166, 0x0; 2026-02-21T10:22:24.2565907Z mov.u32 %r7167, 0x0; 
2026-02-21T10:22:24.2566145Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r7164, %r7165, %r7166, %r7167 }, [ %rd251 + 0 ], %rd250; 2026-02-21T10:22:24.2566205Z // end inline asm 2026-02-21T10:22:24.2566262Z // begin inline asm 2026-02-21T10:22:24.2566318Z mov.u64 %rd253, 0x0; 2026-02-21T10:22:24.2566437Z createpolicy.fractional.L2::evict_first.b64 %rd253, 1.0; 2026-02-21T10:22:24.2566610Z // end inline asm 2026-02-21T10:22:24.2566670Z // begin inline asm 2026-02-21T10:22:24.2566727Z mov.u32 %r7168, 0x0; 2026-02-21T10:22:24.2566786Z mov.u32 %r7169, 0x0; 2026-02-21T10:22:24.2566926Z mov.u32 %r7170, 0x0; 2026-02-21T10:22:24.2566988Z mov.u32 %r7171, 0x0; 2026-02-21T10:22:24.2567202Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r7168, %r7169, %r7170, %r7171 }, [ %rd254 + 0 ], %rd253; 2026-02-21T10:22:24.2567264Z // end inline asm 2026-02-21T10:22:24.2567326Z // begin inline asm 2026-02-21T10:22:24.2567383Z mov.u64 %rd256, 0x0; 2026-02-21T10:22:24.2567502Z createpolicy.fractional.L2::evict_first.b64 %rd256, 1.0; 2026-02-21T10:22:24.2567562Z // end inline asm 2026-02-21T10:22:24.2567622Z // begin inline asm 2026-02-21T10:22:24.2567681Z mov.u32 %r7172, 0x0; 2026-02-21T10:22:24.2567736Z mov.u32 %r7173, 0x0; 2026-02-21T10:22:24.2567790Z mov.u32 %r7174, 0x0; 2026-02-21T10:22:24.2567846Z mov.u32 %r7175, 0x0; 2026-02-21T10:22:24.2568060Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r7172, %r7173, %r7174, %r7175 }, [ %rd257 + 0 ], %rd256; 2026-02-21T10:22:24.2568120Z // end inline asm 2026-02-21T10:22:24.2568177Z // begin inline asm 2026-02-21T10:22:24.2568239Z mov.u64 %rd259, 0x0; 2026-02-21T10:22:24.2568357Z createpolicy.fractional.L2::evict_first.b64 %rd259, 1.0; 2026-02-21T10:22:24.2568416Z // end inline asm 2026-02-21T10:22:24.2568473Z // begin inline asm 2026-02-21T10:22:24.2568535Z mov.u32 %r7176, 0x0; 2026-02-21T10:22:24.2568603Z mov.u32 %r7177, 0x0; 2026-02-21T10:22:24.2568661Z mov.u32 %r7178, 0x0; 2026-02-21T10:22:24.2568722Z mov.u32 %r7179, 0x0; 
2026-02-21T10:22:24.2568936Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r7176, %r7177, %r7178, %r7179 }, [ %rd260 + 0 ], %rd259; 2026-02-21T10:22:24.2569068Z // end inline asm 2026-02-21T10:22:24.2569130Z // begin inline asm 2026-02-21T10:22:24.2569188Z mov.u64 %rd262, 0x0; 2026-02-21T10:22:24.2569303Z createpolicy.fractional.L2::evict_first.b64 %rd262, 1.0; 2026-02-21T10:22:24.2569360Z // end inline asm 2026-02-21T10:22:24.2569424Z // begin inline asm 2026-02-21T10:22:24.2569481Z mov.u32 %r7180, 0x0; 2026-02-21T10:22:24.2569536Z mov.u32 %r7181, 0x0; 2026-02-21T10:22:24.2569597Z mov.u32 %r7182, 0x0; 2026-02-21T10:22:24.2569654Z mov.u32 %r7183, 0x0; 2026-02-21T10:22:24.2569865Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r7180, %r7181, %r7182, %r7183 }, [ %rd263 + 0 ], %rd262; 2026-02-21T10:22:24.2569925Z // end inline asm 2026-02-21T10:22:24.2569983Z // begin inline asm 2026-02-21T10:22:24.2570041Z mov.u64 %rd265, 0x0; 2026-02-21T10:22:24.2570157Z createpolicy.fractional.L2::evict_first.b64 %rd265, 1.0; 2026-02-21T10:22:24.2570221Z // end inline asm 2026-02-21T10:22:24.2570370Z // begin inline asm 2026-02-21T10:22:24.2570432Z mov.u32 %r7184, 0x0; 2026-02-21T10:22:24.2570494Z mov.u32 %r7185, 0x0; 2026-02-21T10:22:24.2570549Z mov.u32 %r7186, 0x0; 2026-02-21T10:22:24.2570604Z mov.u32 %r7187, 0x0; 2026-02-21T10:22:24.2570817Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r7184, %r7185, %r7186, %r7187 }, [ %rd266 + 0 ], %rd265; 2026-02-21T10:22:24.2570941Z // end inline asm 2026-02-21T10:22:24.2571001Z // begin inline asm 2026-02-21T10:22:24.2571058Z mov.u64 %rd268, 0x0; 2026-02-21T10:22:24.2571180Z createpolicy.fractional.L2::evict_first.b64 %rd268, 1.0; 2026-02-21T10:22:24.2571237Z // end inline asm 2026-02-21T10:22:24.2571293Z // begin inline asm 2026-02-21T10:22:24.2571353Z mov.u32 %r7188, 0x0; 2026-02-21T10:22:24.2571409Z mov.u32 %r7189, 0x0; 2026-02-21T10:22:24.2571464Z mov.u32 %r7190, 0x0; 2026-02-21T10:22:24.2571519Z mov.u32 %r7191, 0x0; 
2026-02-21T10:22:24.2571737Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r7188, %r7189, %r7190, %r7191 }, [ %rd269 + 0 ], %rd268; 2026-02-21T10:22:24.2571797Z // end inline asm 2026-02-21T10:22:24.2571995Z .loc 1 62 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:62:32 2026-02-21T10:22:24.2572056Z bar.sync 0; 2026-02-21T10:22:24.2572137Z st.shared.v2.b32 [%r10], {%r7160, %r7161}; 2026-02-21T10:22:24.2572226Z st.shared.v2.b32 [%r10+2048], {%r7164, %r7165}; 2026-02-21T10:22:24.2572311Z st.shared.v2.b32 [%r10+4096], {%r7168, %r7169}; 2026-02-21T10:22:24.2572442Z st.shared.v2.b32 [%r10+6144], {%r7172, %r7173}; 2026-02-21T10:22:24.2572525Z st.shared.v2.b32 [%r10+8192], {%r7176, %r7177}; 2026-02-21T10:22:24.2572612Z st.shared.v2.b32 [%r10+10240], {%r7180, %r7181}; 2026-02-21T10:22:24.2572700Z st.shared.v2.b32 [%r10+12288], {%r7184, %r7185}; 2026-02-21T10:22:24.2572781Z st.shared.v2.b32 [%r10+14336], {%r7188, %r7189}; 2026-02-21T10:22:24.2572855Z st.shared.v2.b32 [%r11], {%r7162, %r7163}; 2026-02-21T10:22:24.2572943Z st.shared.v2.b32 [%r11+2048], {%r7166, %r7167}; 2026-02-21T10:22:24.2573026Z st.shared.v2.b32 [%r11+4096], {%r7170, %r7171}; 2026-02-21T10:22:24.2573105Z st.shared.v2.b32 [%r11+6144], {%r7174, %r7175}; 2026-02-21T10:22:24.2573186Z st.shared.v2.b32 [%r11+8192], {%r7178, %r7179}; 2026-02-21T10:22:24.2573267Z st.shared.v2.b32 [%r11+10240], {%r7182, %r7183}; 2026-02-21T10:22:24.2573355Z st.shared.v2.b32 [%r11+12288], {%r7186, %r7187}; 2026-02-21T10:22:24.2573448Z st.shared.v2.b32 [%r11+14336], {%r7190, %r7191}; 2026-02-21T10:22:24.2573511Z bar.sync 0; 2026-02-21T10:22:24.2573581Z ld.shared.b16 %rs449, [%r12]; 2026-02-21T10:22:24.2573651Z ld.shared.b16 %rs450, [%r12+1024]; 2026-02-21T10:22:24.2573722Z ld.shared.b16 %rs451, [%r12+64]; 2026-02-21T10:22:24.2573787Z ld.shared.b16 %rs452, [%r12+1088]; 2026-02-21T10:22:24.2573850Z ld.shared.b16 %rs453, [%r12+8192]; 2026-02-21T10:22:24.2573913Z ld.shared.b16 %rs454, [%r12+9216]; 
2026-02-21T10:22:24.2573980Z ld.shared.b16 %rs455, [%r12+8256]; 2026-02-21T10:22:24.2574112Z ld.shared.b16 %rs456, [%r12+9280]; 2026-02-21T10:22:24.2574178Z ld.shared.b16 %rs457, [%r13]; 2026-02-21T10:22:24.2574245Z ld.shared.b16 %rs458, [%r13+1024]; 2026-02-21T10:22:24.2574310Z ld.shared.b16 %rs459, [%r13+64]; 2026-02-21T10:22:24.2574374Z ld.shared.b16 %rs460, [%r13+1088]; 2026-02-21T10:22:24.2574440Z ld.shared.b16 %rs461, [%r13+8192]; 2026-02-21T10:22:24.2574506Z ld.shared.b16 %rs462, [%r13+9216]; 2026-02-21T10:22:24.2574569Z ld.shared.b16 %rs463, [%r13+8256]; 2026-02-21T10:22:24.2574634Z ld.shared.b16 %rs464, [%r13+9280]; 2026-02-21T10:22:24.2574702Z ld.shared.b16 %rs465, [%r14]; 2026-02-21T10:22:24.2574766Z ld.shared.b16 %rs466, [%r14+1024]; 2026-02-21T10:22:24.2574829Z ld.shared.b16 %rs467, [%r14+64]; 2026-02-21T10:22:24.2574895Z ld.shared.b16 %rs468, [%r14+1088]; 2026-02-21T10:22:24.2574958Z ld.shared.b16 %rs469, [%r14+8192]; 2026-02-21T10:22:24.2575020Z ld.shared.b16 %rs470, [%r14+9216]; 2026-02-21T10:22:24.2575084Z ld.shared.b16 %rs471, [%r14+8256]; 2026-02-21T10:22:24.2575213Z ld.shared.b16 %rs472, [%r14+9280]; 2026-02-21T10:22:24.2575281Z ld.shared.b16 %rs473, [%r15]; 2026-02-21T10:22:24.2575346Z ld.shared.b16 %rs474, [%r15+1024]; 2026-02-21T10:22:24.2575412Z ld.shared.b16 %rs475, [%r15+64]; 2026-02-21T10:22:24.2575474Z ld.shared.b16 %rs476, [%r15+1088]; 2026-02-21T10:22:24.2575585Z ld.shared.b16 %rs477, [%r15+8192]; 2026-02-21T10:22:24.2575652Z ld.shared.b16 %rs478, [%r15+9216]; 2026-02-21T10:22:24.2575715Z ld.shared.b16 %rs479, [%r15+8256]; 2026-02-21T10:22:24.2575780Z ld.shared.b16 %rs480, [%r15+9280]; 2026-02-21T10:22:24.2575843Z ld.shared.b16 %rs481, [%r16]; 2026-02-21T10:22:24.2575908Z ld.shared.b16 %rs482, [%r16+1024]; 2026-02-21T10:22:24.2575979Z ld.shared.b16 %rs483, [%r16+64]; 2026-02-21T10:22:24.2576042Z ld.shared.b16 %rs484, [%r16+1088]; 2026-02-21T10:22:24.2576109Z ld.shared.b16 %rs485, [%r16+8192]; 2026-02-21T10:22:24.2576172Z ld.shared.b16 
%rs486, [%r16+9216]; 2026-02-21T10:22:24.2576239Z ld.shared.b16 %rs487, [%r16+8256]; 2026-02-21T10:22:24.2576306Z ld.shared.b16 %rs488, [%r16+9280]; 2026-02-21T10:22:24.2576371Z ld.shared.b16 %rs489, [%r17]; 2026-02-21T10:22:24.2576434Z ld.shared.b16 %rs490, [%r17+1024]; 2026-02-21T10:22:24.2576629Z ld.shared.b16 %rs491, [%r17+64]; 2026-02-21T10:22:24.2576698Z ld.shared.b16 %rs492, [%r17+1088]; 2026-02-21T10:22:24.2576766Z ld.shared.b16 %rs493, [%r17+8192]; 2026-02-21T10:22:24.2576829Z ld.shared.b16 %rs494, [%r17+9216]; 2026-02-21T10:22:24.2576892Z ld.shared.b16 %rs495, [%r17+8256]; 2026-02-21T10:22:24.2577049Z ld.shared.b16 %rs496, [%r17+9280]; 2026-02-21T10:22:24.2577115Z ld.shared.b16 %rs497, [%r18]; 2026-02-21T10:22:24.2577179Z ld.shared.b16 %rs498, [%r18+1024]; 2026-02-21T10:22:24.2577245Z ld.shared.b16 %rs499, [%r18+64]; 2026-02-21T10:22:24.2577309Z ld.shared.b16 %rs500, [%r18+1088]; 2026-02-21T10:22:24.2577373Z ld.shared.b16 %rs501, [%r18+8192]; 2026-02-21T10:22:24.2577440Z ld.shared.b16 %rs502, [%r18+9216]; 2026-02-21T10:22:24.2577508Z ld.shared.b16 %rs503, [%r18+8256]; 2026-02-21T10:22:24.2577573Z ld.shared.b16 %rs504, [%r18+9280]; 2026-02-21T10:22:24.2577637Z ld.shared.b16 %rs505, [%r19]; 2026-02-21T10:22:24.2577704Z ld.shared.b16 %rs506, [%r19+1024]; 2026-02-21T10:22:24.2577767Z ld.shared.b16 %rs507, [%r19+64]; 2026-02-21T10:22:24.2577830Z ld.shared.b16 %rs508, [%r19+1088]; 2026-02-21T10:22:24.2577899Z ld.shared.b16 %rs509, [%r19+8192]; 2026-02-21T10:22:24.2577961Z ld.shared.b16 %rs510, [%r19+9216]; 2026-02-21T10:22:24.2578026Z ld.shared.b16 %rs511, [%r19+8256]; 2026-02-21T10:22:24.2578089Z ld.shared.b16 %rs512, [%r19+9280]; 2026-02-21T10:22:24.2578168Z cvt.f32.bf16 %r7329, %rs449; 2026-02-21T10:22:24.2578233Z cvt.f32.bf16 %r7330, %rs450; 2026-02-21T10:22:24.2578294Z cvt.f32.bf16 %r7331, %rs457; 2026-02-21T10:22:24.2578358Z cvt.f32.bf16 %r7332, %rs458; 2026-02-21T10:22:24.2578420Z cvt.f32.bf16 %r7461, %rs465; 2026-02-21T10:22:24.2578480Z cvt.f32.bf16 
%r7462, %rs466; 2026-02-21T10:22:24.2578611Z cvt.f32.bf16 %r7463, %rs473; 2026-02-21T10:22:24.2578676Z cvt.f32.bf16 %r7464, %rs474; 2026-02-21T10:22:24.2578737Z cvt.f32.bf16 %r7593, %rs481; 2026-02-21T10:22:24.2578796Z cvt.f32.bf16 %r7594, %rs482; 2026-02-21T10:22:24.2578858Z cvt.f32.bf16 %r7595, %rs489; 2026-02-21T10:22:24.2578919Z cvt.f32.bf16 %r7596, %rs490; 2026-02-21T10:22:24.2578981Z cvt.f32.bf16 %r7725, %rs497; 2026-02-21T10:22:24.2579044Z cvt.f32.bf16 %r7726, %rs498; 2026-02-21T10:22:24.2579104Z cvt.f32.bf16 %r7727, %rs505; 2026-02-21T10:22:24.2579165Z cvt.f32.bf16 %r7728, %rs506; 2026-02-21T10:22:24.2579224Z cvt.f32.bf16 %r7857, %rs451; 2026-02-21T10:22:24.2579287Z cvt.f32.bf16 %r7858, %rs452; 2026-02-21T10:22:24.2579348Z cvt.f32.bf16 %r7859, %rs459; 2026-02-21T10:22:24.2579409Z cvt.f32.bf16 %r7860, %rs460; 2026-02-21T10:22:24.2579471Z cvt.f32.bf16 %r7989, %rs467; 2026-02-21T10:22:24.2579531Z cvt.f32.bf16 %r7990, %rs468; 2026-02-21T10:22:24.2579590Z cvt.f32.bf16 %r7991, %rs475; 2026-02-21T10:22:24.2579652Z cvt.f32.bf16 %r7992, %rs476; 2026-02-21T10:22:24.2579788Z cvt.f32.bf16 %r8121, %rs483; 2026-02-21T10:22:24.2579851Z cvt.f32.bf16 %r8122, %rs484; 2026-02-21T10:22:24.2579911Z cvt.f32.bf16 %r8123, %rs491; 2026-02-21T10:22:24.2579974Z cvt.f32.bf16 %r8124, %rs492; 2026-02-21T10:22:24.2580033Z cvt.f32.bf16 %r8253, %rs499; 2026-02-21T10:22:24.2580153Z cvt.f32.bf16 %r8254, %rs500; 2026-02-21T10:22:24.2580214Z cvt.f32.bf16 %r8255, %rs507; 2026-02-21T10:22:24.2580279Z cvt.f32.bf16 %r8256, %rs508; 2026-02-21T10:22:24.2580342Z cvt.f32.bf16 %r8385, %rs453; 2026-02-21T10:22:24.2580402Z cvt.f32.bf16 %r8386, %rs454; 2026-02-21T10:22:24.2580465Z cvt.f32.bf16 %r8387, %rs461; 2026-02-21T10:22:24.2580524Z cvt.f32.bf16 %r8388, %rs462; 2026-02-21T10:22:24.2580588Z cvt.f32.bf16 %r8517, %rs469; 2026-02-21T10:22:24.2580648Z cvt.f32.bf16 %r8518, %rs470; 2026-02-21T10:22:24.2580712Z cvt.f32.bf16 %r8519, %rs477; 2026-02-21T10:22:24.2580783Z cvt.f32.bf16 %r8520, %rs478; 
2026-02-21T10:22:24.2580847Z cvt.f32.bf16 %r8649, %rs485; 2026-02-21T10:22:24.2580912Z cvt.f32.bf16 %r8650, %rs486; 2026-02-21T10:22:24.2580973Z cvt.f32.bf16 %r8651, %rs493; 2026-02-21T10:22:24.2581034Z cvt.f32.bf16 %r8652, %rs494; 2026-02-21T10:22:24.2581099Z cvt.f32.bf16 %r8781, %rs501; 2026-02-21T10:22:24.2581160Z cvt.f32.bf16 %r8782, %rs502; 2026-02-21T10:22:24.2581220Z cvt.f32.bf16 %r8783, %rs509; 2026-02-21T10:22:24.2581287Z cvt.f32.bf16 %r8784, %rs510; 2026-02-21T10:22:24.2581352Z cvt.f32.bf16 %r8913, %rs455; 2026-02-21T10:22:24.2581463Z cvt.f32.bf16 %r8914, %rs456; 2026-02-21T10:22:24.2581526Z cvt.f32.bf16 %r8915, %rs463; 2026-02-21T10:22:24.2581589Z cvt.f32.bf16 %r8916, %rs464; 2026-02-21T10:22:24.2581648Z cvt.f32.bf16 %r9045, %rs471; 2026-02-21T10:22:24.2581708Z cvt.f32.bf16 %r9046, %rs472; 2026-02-21T10:22:24.2581768Z cvt.f32.bf16 %r9047, %rs479; 2026-02-21T10:22:24.2581830Z cvt.f32.bf16 %r9048, %rs480; 2026-02-21T10:22:24.2581890Z cvt.f32.bf16 %r9177, %rs487; 2026-02-21T10:22:24.2581949Z cvt.f32.bf16 %r9178, %rs488; 2026-02-21T10:22:24.2582018Z cvt.f32.bf16 %r9179, %rs495; 2026-02-21T10:22:24.2582079Z cvt.f32.bf16 %r9180, %rs496; 2026-02-21T10:22:24.2582139Z cvt.f32.bf16 %r9309, %rs503; 2026-02-21T10:22:24.2582199Z cvt.f32.bf16 %r9310, %rs504; 2026-02-21T10:22:24.2582262Z cvt.f32.bf16 %r9311, %rs511; 2026-02-21T10:22:24.2582321Z cvt.f32.bf16 %r9312, %rs512; 2026-02-21T10:22:24.2582534Z .loc 1 64 33 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:64:33 2026-02-21T10:22:24.2582592Z bar.sync 0; 2026-02-21T10:22:24.2582654Z // begin inline asm 2026-02-21T10:22:24.2582759Z @%p313 mbarrier.init.shared::cta.b64 [%r29850], 1; 2026-02-21T10:22:24.2582822Z // end inline asm 2026-02-21T10:22:24.2582885Z bar.sync 0; 2026-02-21T10:22:24.2582946Z // begin inline asm 2026-02-21T10:22:24.2583081Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r29850], 4096; 2026-02-21T10:22:24.2583141Z // end inline asm 2026-02-21T10:22:24.2583197Z // begin inline asm 
2026-02-21T10:22:24.2583333Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.2583392Z // end inline asm 2026-02-21T10:22:24.2583447Z bar.sync 0; 2026-02-21T10:22:24.2583515Z elect.sync %r9708|%p101, -1; 2026-02-21T10:22:24.2583583Z and.pred %p80, %p1, %p101; 2026-02-21T10:22:24.2583656Z add.s32 %r7196, %r9643, 160; 2026-02-21T10:22:24.2583720Z // begin inline asm 2026-02-21T10:22:24.2584052Z @%p80 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39936], [%rd822, {%r9808, %r7196}], [%r29850]; 2026-02-21T10:22:24.2584113Z // end inline asm 2026-02-21T10:22:24.2584168Z bar.sync 0; 2026-02-21T10:22:24.2584225Z // begin inline asm 2026-02-21T10:22:24.2584280Z 2026-02-21T10:22:24.2584329Z { 2026-02-21T10:22:24.2584392Z .reg .pred complete; 2026-02-21T10:22:24.2584448Z waitLoop: 2026-02-21T10:22:24.2584595Z mbarrier.try_wait.parity.shared.b64 complete, [%r29850], %r9443; 2026-02-21T10:22:24.2584665Z @!complete bra.uni waitLoop; 2026-02-21T10:22:24.2584714Z } 2026-02-21T10:22:24.2584721Z 2026-02-21T10:22:24.2584834Z // end inline asm 2026-02-21T10:22:24.2584891Z bar.sync 0; 2026-02-21T10:22:24.2584949Z // begin inline asm 2026-02-21T10:22:24.2585044Z @%p313 mbarrier.inval.shared::cta.b64 [%r29850]; 2026-02-21T10:22:24.2585104Z // end inline asm 2026-02-21T10:22:24.2585306Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2585421Z ld.shared.s8 %rs513, [%r20]; 2026-02-21T10:22:24.2585618Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2585681Z shl.b16 %rs514, %rs513, 4; 2026-02-21T10:22:24.2585872Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2585943Z ld.shared.s8 %rs515, [%r21+128]; 2026-02-21T10:22:24.2586140Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2586203Z shl.b16 %rs516, %rs515, 4; 2026-02-21T10:22:24.2586398Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2586587Z ld.shared.s8 %rs517, [%r22+256]; 2026-02-21T10:22:24.2586790Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2586858Z shl.b16 %rs518, %rs517, 4; 2026-02-21T10:22:24.2587054Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2587201Z ld.shared.s8 %rs519, [%r23+384]; 2026-02-21T10:22:24.2587407Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2587473Z shl.b16 %rs520, %rs519, 4; 2026-02-21T10:22:24.2587667Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2587733Z ld.shared.s8 %rs521, [%r24+512]; 2026-02-21T10:22:24.2587934Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2587996Z shl.b16 %rs522, %rs521, 4; 2026-02-21T10:22:24.2588184Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2588251Z ld.shared.s8 %rs523, [%r25+640]; 2026-02-21T10:22:24.2588526Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2588589Z shl.b16 %rs524, %rs523, 4; 2026-02-21T10:22:24.2588783Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2588849Z ld.shared.s8 %rs525, [%r26+768]; 2026-02-21T10:22:24.2589039Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2589099Z shl.b16 %rs526, %rs525, 4; 2026-02-21T10:22:24.2589293Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2589464Z ld.shared.s8 %rs527, [%r27+896]; 2026-02-21T10:22:24.2589655Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2589718Z shl.b16 %rs528, %rs527, 4; 2026-02-21T10:22:24.2589909Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2589980Z ld.shared.s8 %rs529, [%r20+1024]; 2026-02-21T10:22:24.2590180Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2590242Z shl.b16 %rs530, %rs529, 4; 2026-02-21T10:22:24.2590433Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2590500Z ld.shared.s8 %rs531, [%r21+1152]; 2026-02-21T10:22:24.2590691Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2590757Z shl.b16 %rs532, %rs531, 4; 2026-02-21T10:22:24.2591013Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2591084Z ld.shared.s8 %rs533, [%r22+1280]; 2026-02-21T10:22:24.2591275Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2591399Z shl.b16 %rs534, %rs533, 4; 2026-02-21T10:22:24.2591594Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2591669Z ld.shared.s8 %rs535, [%r23+1408]; 2026-02-21T10:22:24.2591861Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2591925Z shl.b16 %rs536, %rs535, 4; 2026-02-21T10:22:24.2592112Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2592174Z ld.shared.s8 %rs537, [%r24+1536]; 2026-02-21T10:22:24.2592366Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2592430Z shl.b16 %rs538, %rs537, 4; 2026-02-21T10:22:24.2592618Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2592683Z ld.shared.s8 %rs539, [%r25+1664]; 2026-02-21T10:22:24.2592879Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2592995Z shl.b16 %rs540, %rs539, 4; 2026-02-21T10:22:24.2593186Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2593253Z ld.shared.s8 %rs541, [%r26+1792]; 2026-02-21T10:22:24.2593443Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2593503Z shl.b16 %rs542, %rs541, 4; 2026-02-21T10:22:24.2593697Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2593764Z ld.shared.s8 %rs543, [%r27+1920]; 2026-02-21T10:22:24.2593953Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2594016Z shl.b16 %rs544, %rs543, 4; 2026-02-21T10:22:24.2594204Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2594272Z ld.shared.s8 %rs545, [%r20+2048]; 2026-02-21T10:22:24.2594463Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2594529Z shl.b16 %rs546, %rs545, 4; 2026-02-21T10:22:24.2594720Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2594783Z ld.shared.s8 %rs547, [%r21+2176]; 2026-02-21T10:22:24.2594982Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2595109Z shl.b16 %rs548, %rs547, 4; 2026-02-21T10:22:24.2595301Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2595366Z ld.shared.s8 %rs549, [%r22+2304]; 2026-02-21T10:22:24.2595556Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2595619Z shl.b16 %rs550, %rs549, 4; 2026-02-21T10:22:24.2595810Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2595877Z ld.shared.s8 %rs551, [%r23+2432]; 2026-02-21T10:22:24.2596067Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2596128Z shl.b16 %rs552, %rs551, 4; 2026-02-21T10:22:24.2596320Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2596384Z ld.shared.s8 %rs553, [%r24+2560]; 2026-02-21T10:22:24.2596789Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2596874Z shl.b16 %rs554, %rs553, 4; 2026-02-21T10:22:24.2597068Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2597197Z ld.shared.s8 %rs555, [%r25+2688]; 2026-02-21T10:22:24.2597390Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2597455Z shl.b16 %rs556, %rs555, 4; 2026-02-21T10:22:24.2597645Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2597708Z ld.shared.s8 %rs557, [%r26+2816]; 2026-02-21T10:22:24.2597900Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2597961Z shl.b16 %rs558, %rs557, 4; 2026-02-21T10:22:24.2598153Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2598221Z ld.shared.s8 %rs559, [%r27+2944]; 2026-02-21T10:22:24.2598409Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2598471Z shl.b16 %rs560, %rs559, 4; 2026-02-21T10:22:24.2598668Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2598731Z ld.shared.s8 %rs561, [%r20+3072]; 2026-02-21T10:22:24.2598986Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2599053Z shl.b16 %rs562, %rs561, 4; 2026-02-21T10:22:24.2599241Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2599304Z ld.shared.s8 %rs563, [%r21+3200]; 2026-02-21T10:22:24.2599492Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2599560Z shl.b16 %rs564, %rs563, 4; 2026-02-21T10:22:24.2599750Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2599813Z ld.shared.s8 %rs565, [%r22+3328]; 2026-02-21T10:22:24.2600005Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2600068Z shl.b16 %rs566, %rs565, 4; 2026-02-21T10:22:24.2600259Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2600327Z ld.shared.s8 %rs567, [%r23+3456]; 2026-02-21T10:22:24.2600515Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2600576Z shl.b16 %rs568, %rs567, 4; 2026-02-21T10:22:24.2600767Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2600901Z ld.shared.s8 %rs569, [%r24+3584]; 2026-02-21T10:22:24.2601094Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2601159Z shl.b16 %rs570, %rs569, 4; 2026-02-21T10:22:24.2601346Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2601411Z ld.shared.s8 %rs571, [%r25+3712]; 2026-02-21T10:22:24.2601602Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2601679Z shl.b16 %rs572, %rs571, 4; 2026-02-21T10:22:24.2601870Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2601935Z ld.shared.s8 %rs573, [%r26+3840]; 2026-02-21T10:22:24.2602127Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2602188Z shl.b16 %rs574, %rs573, 4; 2026-02-21T10:22:24.2602430Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2602503Z ld.shared.s8 %rs575, [%r27+3968]; 2026-02-21T10:22:24.2602692Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2602756Z shl.b16 %rs576, %rs575, 4; 2026-02-21T10:22:24.2603008Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2603077Z cvt.s16.s8 %rs577, %rs514; 2026-02-21T10:22:24.2603138Z shr.s16 %rs578, %rs577, 4; 2026-02-21T10:22:24.2603197Z cvt.s16.s8 %rs579, %rs516; 2026-02-21T10:22:24.2603261Z shr.s16 %rs580, %rs579, 4; 2026-02-21T10:22:24.2603323Z shr.s16 %rs581, %rs513, 4; 2026-02-21T10:22:24.2603385Z shr.s16 %rs582, %rs515, 4; 2026-02-21T10:22:24.2603580Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2603648Z cvt.rn.f32.s16 %r9709, %rs582; 2026-02-21T10:22:24.2603715Z cvt.rn.f32.s16 %r9710, %rs581; 2026-02-21T10:22:24.2603776Z cvt.rn.f32.s16 %r9711, %rs580; 2026-02-21T10:22:24.2603841Z cvt.rn.f32.s16 %r9712, %rs578; 2026-02-21T10:22:24.2604030Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2604093Z cvt.s16.s8 %rs583, %rs518; 2026-02-21T10:22:24.2604158Z shr.s16 %rs584, %rs583, 4; 2026-02-21T10:22:24.2604219Z cvt.s16.s8 %rs585, %rs520; 2026-02-21T10:22:24.2604329Z shr.s16 %rs586, %rs585, 4; 2026-02-21T10:22:24.2604396Z shr.s16 %rs587, %rs517, 4; 2026-02-21T10:22:24.2604455Z shr.s16 %rs588, %rs519, 4; 2026-02-21T10:22:24.2604644Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2604708Z cvt.rn.f32.s16 %r9713, %rs588; 2026-02-21T10:22:24.2604776Z cvt.rn.f32.s16 %r9714, %rs587; 2026-02-21T10:22:24.2604839Z cvt.rn.f32.s16 %r9715, %rs586; 2026-02-21T10:22:24.2604903Z cvt.rn.f32.s16 %r9716, %rs584; 2026-02-21T10:22:24.2605101Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2605166Z cvt.s16.s8 %rs589, %rs522; 2026-02-21T10:22:24.2605228Z shr.s16 %rs590, %rs589, 4; 2026-02-21T10:22:24.2605287Z cvt.s16.s8 %rs591, %rs524; 2026-02-21T10:22:24.2605354Z shr.s16 %rs592, %rs591, 4; 2026-02-21T10:22:24.2605412Z shr.s16 %rs593, %rs521, 4; 2026-02-21T10:22:24.2605470Z shr.s16 %rs594, %rs523, 4; 
2026-02-21T10:22:24.2605667Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2605730Z cvt.rn.f32.s16 %r9717, %rs594; 2026-02-21T10:22:24.2605792Z cvt.rn.f32.s16 %r9718, %rs593; 2026-02-21T10:22:24.2605855Z cvt.rn.f32.s16 %r9719, %rs592; 2026-02-21T10:22:24.2605917Z cvt.rn.f32.s16 %r9720, %rs590; 2026-02-21T10:22:24.2606127Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2606244Z cvt.s16.s8 %rs595, %rs526; 2026-02-21T10:22:24.2606309Z shr.s16 %rs596, %rs595, 4; 2026-02-21T10:22:24.2606370Z cvt.s16.s8 %rs597, %rs528; 2026-02-21T10:22:24.2606430Z shr.s16 %rs598, %rs597, 4; 2026-02-21T10:22:24.2606617Z shr.s16 %rs599, %rs525, 4; 2026-02-21T10:22:24.2606685Z shr.s16 %rs600, %rs527, 4; 2026-02-21T10:22:24.2606881Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2606947Z cvt.rn.f32.s16 %r9721, %rs600; 2026-02-21T10:22:24.2607010Z cvt.rn.f32.s16 %r9722, %rs599; 2026-02-21T10:22:24.2607072Z cvt.rn.f32.s16 %r9723, %rs598; 2026-02-21T10:22:24.2607133Z cvt.rn.f32.s16 %r9724, %rs596; 2026-02-21T10:22:24.2607325Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2607385Z cvt.s16.s8 %rs601, %rs530; 2026-02-21T10:22:24.2607445Z shr.s16 %rs602, %rs601, 4; 2026-02-21T10:22:24.2607513Z cvt.s16.s8 %rs603, %rs532; 2026-02-21T10:22:24.2607647Z shr.s16 %rs604, %rs603, 4; 2026-02-21T10:22:24.2607710Z shr.s16 %rs605, %rs529, 4; 2026-02-21T10:22:24.2607771Z shr.s16 %rs606, %rs531, 4; 2026-02-21T10:22:24.2607963Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2608086Z cvt.rn.f32.s16 %r9725, %rs606; 2026-02-21T10:22:24.2608148Z cvt.rn.f32.s16 %r9726, %rs605; 2026-02-21T10:22:24.2608213Z cvt.rn.f32.s16 %r9727, %rs604; 2026-02-21T10:22:24.2608277Z cvt.rn.f32.s16 %r9728, %rs602; 2026-02-21T10:22:24.2608476Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2608542Z cvt.s16.s8 %rs607, %rs534; 2026-02-21T10:22:24.2608604Z shr.s16 %rs608, %rs607, 4; 2026-02-21T10:22:24.2608665Z cvt.s16.s8 %rs609, %rs536; 2026-02-21T10:22:24.2608725Z shr.s16 %rs610, %rs609, 4; 2026-02-21T10:22:24.2608789Z shr.s16 %rs611, %rs533, 4; 2026-02-21T10:22:24.2608854Z shr.s16 %rs612, %rs535, 4; 2026-02-21T10:22:24.2609049Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2609116Z cvt.rn.f32.s16 %r9729, %rs612; 2026-02-21T10:22:24.2609179Z cvt.rn.f32.s16 %r9730, %rs611; 2026-02-21T10:22:24.2609242Z cvt.rn.f32.s16 %r9731, %rs610; 2026-02-21T10:22:24.2609306Z cvt.rn.f32.s16 %r9732, %rs608; 2026-02-21T10:22:24.2609501Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2609654Z cvt.s16.s8 %rs613, %rs538; 2026-02-21T10:22:24.2609721Z shr.s16 %rs614, %rs613, 4; 2026-02-21T10:22:24.2609786Z cvt.s16.s8 %rs615, %rs540; 2026-02-21T10:22:24.2609846Z shr.s16 %rs616, %rs615, 4; 2026-02-21T10:22:24.2609906Z shr.s16 %rs617, %rs537, 4; 2026-02-21T10:22:24.2609971Z shr.s16 %rs618, %rs539, 4; 2026-02-21T10:22:24.2610165Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2610232Z cvt.rn.f32.s16 %r9733, %rs618; 2026-02-21T10:22:24.2610294Z cvt.rn.f32.s16 %r9734, %rs617; 2026-02-21T10:22:24.2610358Z cvt.rn.f32.s16 %r9735, %rs616; 2026-02-21T10:22:24.2610419Z cvt.rn.f32.s16 %r9736, %rs614; 2026-02-21T10:22:24.2610609Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2610676Z cvt.s16.s8 %rs619, %rs542; 2026-02-21T10:22:24.2610738Z shr.s16 %rs620, %rs619, 4; 2026-02-21T10:22:24.2610799Z cvt.s16.s8 %rs621, %rs544; 2026-02-21T10:22:24.2610865Z shr.s16 %rs622, %rs621, 4; 2026-02-21T10:22:24.2610925Z shr.s16 %rs623, %rs541, 4; 2026-02-21T10:22:24.2610986Z shr.s16 %rs624, %rs543, 4; 
2026-02-21T10:22:24.2611174Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2611240Z cvt.rn.f32.s16 %r9737, %rs624; 2026-02-21T10:22:24.2611301Z cvt.rn.f32.s16 %r9738, %rs623; 2026-02-21T10:22:24.2611437Z cvt.rn.f32.s16 %r9739, %rs622; 2026-02-21T10:22:24.2611507Z cvt.rn.f32.s16 %r9740, %rs620; 2026-02-21T10:22:24.2611707Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2611768Z cvt.s16.s8 %rs625, %rs546; 2026-02-21T10:22:24.2611832Z shr.s16 %rs626, %rs625, 4; 2026-02-21T10:22:24.2611898Z cvt.s16.s8 %rs627, %rs548; 2026-02-21T10:22:24.2611957Z shr.s16 %rs628, %rs627, 4; 2026-02-21T10:22:24.2612017Z shr.s16 %rs629, %rs545, 4; 2026-02-21T10:22:24.2612082Z shr.s16 %rs630, %rs547, 4; 2026-02-21T10:22:24.2612274Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2612338Z cvt.rn.f32.s16 %r9741, %rs630; 2026-02-21T10:22:24.2612406Z cvt.rn.f32.s16 %r9742, %rs629; 2026-02-21T10:22:24.2612468Z cvt.rn.f32.s16 %r9743, %rs628; 2026-02-21T10:22:24.2612530Z cvt.rn.f32.s16 %r9744, %rs626; 2026-02-21T10:22:24.2612770Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2612842Z cvt.s16.s8 %rs631, %rs550; 2026-02-21T10:22:24.2612903Z shr.s16 %rs632, %rs631, 4; 2026-02-21T10:22:24.2612974Z cvt.s16.s8 %rs633, %rs552; 2026-02-21T10:22:24.2613040Z shr.s16 %rs634, %rs633, 4; 2026-02-21T10:22:24.2613101Z shr.s16 %rs635, %rs549, 4; 2026-02-21T10:22:24.2613209Z shr.s16 %rs636, %rs551, 4; 2026-02-21T10:22:24.2613400Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2613470Z cvt.rn.f32.s16 %r9745, %rs636; 2026-02-21T10:22:24.2613532Z cvt.rn.f32.s16 %r9746, %rs635; 2026-02-21T10:22:24.2613593Z cvt.rn.f32.s16 %r9747, %rs634; 2026-02-21T10:22:24.2613662Z cvt.rn.f32.s16 %r9748, %rs632; 2026-02-21T10:22:24.2613851Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2613911Z cvt.s16.s8 %rs637, %rs554; 2026-02-21T10:22:24.2613976Z shr.s16 %rs638, %rs637, 4; 2026-02-21T10:22:24.2614043Z cvt.s16.s8 %rs639, %rs556; 2026-02-21T10:22:24.2614103Z shr.s16 %rs640, %rs639, 4; 2026-02-21T10:22:24.2614162Z shr.s16 %rs641, %rs553, 4; 2026-02-21T10:22:24.2614226Z shr.s16 %rs642, %rs555, 4; 2026-02-21T10:22:24.2614416Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2614484Z cvt.rn.f32.s16 %r9749, %rs642; 2026-02-21T10:22:24.2614554Z cvt.rn.f32.s16 %r9750, %rs641; 2026-02-21T10:22:24.2614616Z cvt.rn.f32.s16 %r9751, %rs640; 2026-02-21T10:22:24.2614729Z cvt.rn.f32.s16 %r9752, %rs638; 2026-02-21T10:22:24.2614926Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2614986Z cvt.s16.s8 %rs643, %rs558; 2026-02-21T10:22:24.2615045Z shr.s16 %rs644, %rs643, 4; 2026-02-21T10:22:24.2615106Z cvt.s16.s8 %rs645, %rs560; 2026-02-21T10:22:24.2615169Z shr.s16 %rs646, %rs645, 4; 2026-02-21T10:22:24.2615228Z shr.s16 %rs647, %rs557, 4; 2026-02-21T10:22:24.2615294Z shr.s16 %rs648, %rs559, 4; 2026-02-21T10:22:24.2615487Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2615549Z cvt.rn.f32.s16 %r9753, %rs648; 2026-02-21T10:22:24.2615611Z cvt.rn.f32.s16 %r9754, %rs647; 2026-02-21T10:22:24.2615674Z cvt.rn.f32.s16 %r9755, %rs646; 2026-02-21T10:22:24.2615738Z cvt.rn.f32.s16 %r9756, %rs644; 2026-02-21T10:22:24.2615929Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2615990Z cvt.s16.s8 %rs649, %rs562; 2026-02-21T10:22:24.2616074Z shr.s16 %rs650, %rs649, 4; 2026-02-21T10:22:24.2616137Z cvt.s16.s8 %rs651, %rs564; 2026-02-21T10:22:24.2616197Z shr.s16 %rs652, %rs651, 4; 2026-02-21T10:22:24.2616263Z shr.s16 %rs653, %rs561, 4; 2026-02-21T10:22:24.2616322Z shr.s16 %rs654, %rs563, 4; 
2026-02-21T10:22:24.2616629Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2616784Z cvt.rn.f32.s16 %r9757, %rs654; 2026-02-21T10:22:24.2616849Z cvt.rn.f32.s16 %r9758, %rs653; 2026-02-21T10:22:24.2616911Z cvt.rn.f32.s16 %r9759, %rs652; 2026-02-21T10:22:24.2616973Z cvt.rn.f32.s16 %r9760, %rs650; 2026-02-21T10:22:24.2617170Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2617234Z cvt.s16.s8 %rs655, %rs566; 2026-02-21T10:22:24.2617295Z shr.s16 %rs656, %rs655, 4; 2026-02-21T10:22:24.2617361Z cvt.s16.s8 %rs657, %rs568; 2026-02-21T10:22:24.2617425Z shr.s16 %rs658, %rs657, 4; 2026-02-21T10:22:24.2617485Z shr.s16 %rs659, %rs565, 4; 2026-02-21T10:22:24.2617545Z shr.s16 %rs660, %rs567, 4; 2026-02-21T10:22:24.2617740Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2617802Z cvt.rn.f32.s16 %r9761, %rs660; 2026-02-21T10:22:24.2617863Z cvt.rn.f32.s16 %r9762, %rs659; 2026-02-21T10:22:24.2617998Z cvt.rn.f32.s16 %r9763, %rs658; 2026-02-21T10:22:24.2618064Z cvt.rn.f32.s16 %r9764, %rs656; 2026-02-21T10:22:24.2618256Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2618317Z cvt.s16.s8 %rs661, %rs570; 2026-02-21T10:22:24.2618392Z shr.s16 %rs662, %rs661, 4; 2026-02-21T10:22:24.2618520Z cvt.s16.s8 %rs663, %rs572; 2026-02-21T10:22:24.2618581Z shr.s16 %rs664, %rs663, 4; 2026-02-21T10:22:24.2618645Z shr.s16 %rs665, %rs569, 4; 2026-02-21T10:22:24.2618708Z shr.s16 %rs666, %rs571, 4; 2026-02-21T10:22:24.2618900Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2618965Z cvt.rn.f32.s16 %r9765, %rs666; 2026-02-21T10:22:24.2619028Z cvt.rn.f32.s16 %r9766, %rs665; 2026-02-21T10:22:24.2619089Z cvt.rn.f32.s16 %r9767, %rs664; 2026-02-21T10:22:24.2619150Z cvt.rn.f32.s16 %r9768, %rs662; 2026-02-21T10:22:24.2619349Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2619409Z cvt.s16.s8 %rs667, %rs574; 2026-02-21T10:22:24.2619469Z shr.s16 %rs668, %rs667, 4; 2026-02-21T10:22:24.2619532Z cvt.s16.s8 %rs669, %rs576; 2026-02-21T10:22:24.2619593Z shr.s16 %rs670, %rs669, 4; 2026-02-21T10:22:24.2619653Z shr.s16 %rs671, %rs573, 4; 2026-02-21T10:22:24.2619717Z shr.s16 %rs672, %rs575, 4; 2026-02-21T10:22:24.2619990Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2620055Z cvt.rn.f32.s16 %r9769, %rs672; 2026-02-21T10:22:24.2620117Z cvt.rn.f32.s16 %r9770, %rs671; 2026-02-21T10:22:24.2620182Z cvt.rn.f32.s16 %r9771, %rs670; 2026-02-21T10:22:24.2620242Z cvt.rn.f32.s16 %r9772, %rs668; 2026-02-21T10:22:24.2620298Z bar.sync 0; 2026-02-21T10:22:24.2620428Z st.shared.v4.b32 [%r28], {%r9712, %r9710, %r9711, %r9709}; 2026-02-21T10:22:24.2620550Z st.shared.v4.b32 [%r28+16384], {%r9744, %r9742, %r9743, %r9741}; 2026-02-21T10:22:24.2620661Z st.shared.v4.b32 [%r29], {%r9716, %r9714, %r9715, %r9713}; 2026-02-21T10:22:24.2620775Z st.shared.v4.b32 [%r29+16384], {%r9748, %r9746, %r9747, %r9745}; 2026-02-21T10:22:24.2620883Z st.shared.v4.b32 [%r30], {%r9720, %r9718, %r9719, %r9717}; 2026-02-21T10:22:24.2620995Z st.shared.v4.b32 [%r30+16384], {%r9752, %r9750, %r9751, %r9749}; 2026-02-21T10:22:24.2621100Z st.shared.v4.b32 [%r31], {%r9724, %r9722, %r9723, %r9721}; 2026-02-21T10:22:24.2621218Z st.shared.v4.b32 [%r31+16384], {%r9756, %r9754, %r9755, %r9753}; 2026-02-21T10:22:24.2621319Z st.shared.v4.b32 [%r32], {%r9728, %r9726, %r9727, %r9725}; 2026-02-21T10:22:24.2621429Z st.shared.v4.b32 [%r32+16384], {%r9760, %r9758, %r9759, %r9757}; 2026-02-21T10:22:24.2621534Z st.shared.v4.b32 [%r33], {%r9732, %r9730, %r9731, %r9729}; 2026-02-21T10:22:24.2621645Z st.shared.v4.b32 [%r33+16384], {%r9764, %r9762, %r9763, %r9761}; 2026-02-21T10:22:24.2621745Z st.shared.v4.b32 [%r34], {%r9736, %r9734, %r9735, %r9733}; 2026-02-21T10:22:24.2621927Z 
st.shared.v4.b32 [%r34+16384], {%r9768, %r9766, %r9767, %r9765}; 2026-02-21T10:22:24.2622029Z st.shared.v4.b32 [%r35], {%r9740, %r9738, %r9739, %r9737}; 2026-02-21T10:22:24.2622140Z st.shared.v4.b32 [%r35+16384], {%r9772, %r9770, %r9771, %r9769}; 2026-02-21T10:22:24.2622195Z $L__tmp5: 2026-02-21T10:22:24.2622471Z .loc 2 291 36 // standard.py:291:36 @[ c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:94:40 ] 2026-02-21T10:22:24.2622531Z // begin inline asm 2026-02-21T10:22:24.2622613Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.2622673Z // end inline asm 2026-02-21T10:22:24.2622728Z bar.sync 0; 2026-02-21T10:22:24.2622801Z wgmma.fence.sync.aligned; 2026-02-21T10:22:24.2622861Z // begin inline asm 2026-02-21T10:22:24.2624404Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r7329,%r7330,%r7331,%r7332}, %rd3, %p42, 1, 1; 2026-02-21T10:22:24.2624511Z // end inline asm 2026-02-21T10:22:24.2624576Z // begin inline asm 2026-02-21T10:22:24.2626071Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r7461,%r7462,%r7463,%r7464}, %rd4, %p42, 1, 1; 2026-02-21T10:22:24.2626132Z // end inline asm 2026-02-21T10:22:24.2626195Z // begin inline asm 2026-02-21T10:22:24.2627873Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r7593,%r7594,%r7595,%r7596}, %rd5, %p42, 1, 1; 2026-02-21T10:22:24.2627952Z // end inline asm 2026-02-21T10:22:24.2628013Z // begin inline asm 2026-02-21T10:22:24.2629574Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, 
{%r7725,%r7726,%r7727,%r7728}, %rd6, %p42, 1, 1; 2026-02-21T10:22:24.2629637Z // end inline asm 2026-02-21T10:22:24.2629695Z // begin inline asm 2026-02-21T10:22:24.2631163Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r7857,%r7858,%r7859,%r7860}, %rd7, %p42, 1, 1; 2026-02-21T10:22:24.2631291Z // end inline asm 2026-02-21T10:22:24.2631360Z // begin inline asm 2026-02-21T10:22:24.2632900Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r7989,%r7990,%r7991,%r7992}, %rd8, %p42, 1, 1; 2026-02-21T10:22:24.2632963Z // end inline asm 2026-02-21T10:22:24.2633022Z // begin inline asm 2026-02-21T10:22:24.2634499Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r8121,%r8122,%r8123,%r8124}, %rd9, %p42, 1, 1; 2026-02-21T10:22:24.2634634Z // end inline asm 2026-02-21T10:22:24.2634694Z // begin inline asm 2026-02-21T10:22:24.2636173Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r8253,%r8254,%r8255,%r8256}, %rd10, %p42, 1, 1; 2026-02-21T10:22:24.2636282Z // end inline asm 2026-02-21T10:22:24.2636356Z // begin inline asm 2026-02-21T10:22:24.2637950Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, 
{%r8385,%r8386,%r8387,%r8388}, %rd3, %p42, 1, 1; 2026-02-21T10:22:24.2638019Z // end inline asm 2026-02-21T10:22:24.2638077Z // begin inline asm 2026-02-21T10:22:24.2639551Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r8517,%r8518,%r8519,%r8520}, %rd4, %p42, 1, 1; 2026-02-21T10:22:24.2639693Z // end inline asm 2026-02-21T10:22:24.2639755Z // begin inline asm 2026-02-21T10:22:24.2641226Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r8649,%r8650,%r8651,%r8652}, %rd5, %p42, 1, 1; 2026-02-21T10:22:24.2641290Z // end inline asm 2026-02-21T10:22:24.2641349Z // begin inline asm 2026-02-21T10:22:24.2642878Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r8781,%r8782,%r8783,%r8784}, %rd6, %p42, 1, 1; 2026-02-21T10:22:24.2642997Z // end inline asm 2026-02-21T10:22:24.2643054Z // begin inline asm 2026-02-21T10:22:24.2644529Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r8913,%r8914,%r8915,%r8916}, %rd7, %p42, 1, 1; 2026-02-21T10:22:24.2644589Z // end inline asm 2026-02-21T10:22:24.2644650Z // begin inline asm 2026-02-21T10:22:24.2646185Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, 
{%r9045,%r9046,%r9047,%r9048}, %rd8, %p42, 1, 1; 2026-02-21T10:22:24.2646248Z // end inline asm 2026-02-21T10:22:24.2646322Z // begin inline asm 2026-02-21T10:22:24.2647922Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r9177,%r9178,%r9179,%r9180}, %rd9, %p42, 1, 1; 2026-02-21T10:22:24.2647985Z // end inline asm 2026-02-21T10:22:24.2648048Z // begin inline asm 2026-02-21T10:22:24.2649521Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r9309,%r9310,%r9311,%r9312}, %rd10, %p42, 1, 1; 2026-02-21T10:22:24.2649655Z // end inline asm 2026-02-21T10:22:24.2649733Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:24.2649797Z mov.b32 %r9442, %r9443; 2026-02-21T10:22:24.2649862Z mov.b32 %r9441, %r39936; 2026-02-21T10:22:24.2649932Z // begin inline asm 2026-02-21T10:22:24.2652500Z // wait for regs: 
%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601,%r9441,%r9442,%r9443 2026-02-21T10:22:24.2652648Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:24.2652706Z // end inline asm 2026-02-21T10:22:24.2652772Z $L__tmp6: 2026-02-21T10:22:24.2653006Z .loc 1 51 126 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:51:126 2026-02-21T10:22:24.2653075Z add.s64 %rd840, %rd840, 384; 2026-02-21T10:22:24.2653143Z add.s32 %r42473, %r42473, 192; 2026-02-21T10:22:24.2653215Z setp.lt.u64 %p102, %rd31, 3936; 2026-02-21T10:22:24.2653277Z mov.b64 %rd841, %rd31; 2026-02-21T10:22:24.2653339Z @%p102 bra $L__BB0_3; 2026-02-21T10:22:24.2653451Z // %bb.4: // %.preheader271.preheader 2026-02-21T10:22:24.2653563Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:22:24.2653626Z add.s64 %rd33, %rd28, 16128; 2026-02-21T10:22:24.2653687Z add.s64 %rd34, %rd21, 16128; 2026-02-21T10:22:24.2653817Z add.s64 %rd35, %rd22, 16128; 2026-02-21T10:22:24.2653880Z add.s64 %rd36, %rd23, 16128; 2026-02-21T10:22:24.2653940Z add.s64 %rd37, %rd24, 
16128; 2026-02-21T10:22:24.2654002Z add.s64 %rd38, %rd25, 16128; 2026-02-21T10:22:24.2654062Z add.s64 %rd39, %rd26, 16128; 2026-02-21T10:22:24.2654123Z add.s64 %rd40, %rd27, 16128; 2026-02-21T10:22:24.2654182Z mov.b64 %rd843, 4000; 2026-02-21T10:22:24.2654244Z mov.b64 %rd842, %rd11; 2026-02-21T10:22:24.2654348Z $L__BB0_5: // %.preheader271 2026-02-21T10:22:24.2654448Z // Parent Loop BB0_2 Depth=1 2026-02-21T10:22:24.2654554Z // => This Inner Loop Header: Depth=2 2026-02-21T10:22:24.2654761Z .loc 1 58 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:32 2026-02-21T10:22:24.2654830Z add.s64 %rd290, %rd842, %rd40; 2026-02-21T10:22:24.2654892Z add.s64 %rd293, %rd842, %rd39; 2026-02-21T10:22:24.2654959Z add.s64 %rd296, %rd842, %rd38; 2026-02-21T10:22:24.2655021Z add.s64 %rd299, %rd842, %rd37; 2026-02-21T10:22:24.2655095Z add.s64 %rd302, %rd842, %rd36; 2026-02-21T10:22:24.2655160Z add.s64 %rd305, %rd842, %rd35; 2026-02-21T10:22:24.2655222Z add.s64 %rd308, %rd842, %rd34; 2026-02-21T10:22:24.2655419Z .loc 1 58 80 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:80 2026-02-21T10:22:24.2655539Z add.s64 %rd311, %rd842, %rd33; 2026-02-21T10:22:24.2655611Z // begin inline asm 2026-02-21T10:22:24.2655672Z mov.u64 %rd289, 0x0; 2026-02-21T10:22:24.2655801Z createpolicy.fractional.L2::evict_first.b64 %rd289, 1.0; 2026-02-21T10:22:24.2655862Z // end inline asm 2026-02-21T10:22:24.2655919Z // begin inline asm 2026-02-21T10:22:24.2655976Z mov.u32 %r9773, 0x0; 2026-02-21T10:22:24.2656042Z mov.u32 %r9774, 0x0; 2026-02-21T10:22:24.2656099Z mov.u32 %r9775, 0x0; 2026-02-21T10:22:24.2656155Z mov.u32 %r9776, 0x0; 2026-02-21T10:22:24.2656394Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r9773, %r9774, %r9775, %r9776 }, [ %rd290 + 0 ], %rd289; 2026-02-21T10:22:24.2656578Z // end inline asm 2026-02-21T10:22:24.2656642Z // begin inline asm 2026-02-21T10:22:24.2656700Z mov.u64 %rd292, 0x0; 2026-02-21T10:22:24.2656828Z createpolicy.fractional.L2::evict_first.b64 
%rd292, 1.0; 2026-02-21T10:22:24.2656886Z // end inline asm 2026-02-21T10:22:24.2656944Z // begin inline asm 2026-02-21T10:22:24.2657010Z mov.u32 %r9777, 0x0; 2026-02-21T10:22:24.2657139Z mov.u32 %r9778, 0x0; 2026-02-21T10:22:24.2657200Z mov.u32 %r9779, 0x0; 2026-02-21T10:22:24.2657255Z mov.u32 %r9780, 0x0; 2026-02-21T10:22:24.2657478Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r9777, %r9778, %r9779, %r9780 }, [ %rd293 + 0 ], %rd292; 2026-02-21T10:22:24.2657535Z // end inline asm 2026-02-21T10:22:24.2657669Z // begin inline asm 2026-02-21T10:22:24.2657730Z mov.u64 %rd295, 0x0; 2026-02-21T10:22:24.2657848Z createpolicy.fractional.L2::evict_first.b64 %rd295, 1.0; 2026-02-21T10:22:24.2657907Z // end inline asm 2026-02-21T10:22:24.2657968Z // begin inline asm 2026-02-21T10:22:24.2658025Z mov.u32 %r9781, 0x0; 2026-02-21T10:22:24.2658080Z mov.u32 %r9782, 0x0; 2026-02-21T10:22:24.2658136Z mov.u32 %r9783, 0x0; 2026-02-21T10:22:24.2658195Z mov.u32 %r9784, 0x0; 2026-02-21T10:22:24.2658407Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r9781, %r9782, %r9783, %r9784 }, [ %rd296 + 0 ], %rd295; 2026-02-21T10:22:24.2658467Z // end inline asm 2026-02-21T10:22:24.2658530Z // begin inline asm 2026-02-21T10:22:24.2658588Z mov.u64 %rd298, 0x0; 2026-02-21T10:22:24.2658703Z createpolicy.fractional.L2::evict_first.b64 %rd298, 1.0; 2026-02-21T10:22:24.2658761Z // end inline asm 2026-02-21T10:22:24.2658819Z // begin inline asm 2026-02-21T10:22:24.2658877Z mov.u32 %r9785, 0x0; 2026-02-21T10:22:24.2658937Z mov.u32 %r9786, 0x0; 2026-02-21T10:22:24.2658995Z mov.u32 %r9787, 0x0; 2026-02-21T10:22:24.2659051Z mov.u32 %r9788, 0x0; 2026-02-21T10:22:24.2659343Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r9785, %r9786, %r9787, %r9788 }, [ %rd299 + 0 ], %rd298; 2026-02-21T10:22:24.2659407Z // end inline asm 2026-02-21T10:22:24.2659467Z // begin inline asm 2026-02-21T10:22:24.2659525Z mov.u64 %rd301, 0x0; 2026-02-21T10:22:24.2659643Z createpolicy.fractional.L2::evict_first.b64 
%rd301, 1.0; 2026-02-21T10:22:24.2659703Z // end inline asm 2026-02-21T10:22:24.2659761Z // begin inline asm 2026-02-21T10:22:24.2659818Z mov.u32 %r9789, 0x0; 2026-02-21T10:22:24.2659883Z mov.u32 %r9790, 0x0; 2026-02-21T10:22:24.2659940Z mov.u32 %r9791, 0x0; 2026-02-21T10:22:24.2659998Z mov.u32 %r9792, 0x0; 2026-02-21T10:22:24.2660210Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r9789, %r9790, %r9791, %r9792 }, [ %rd302 + 0 ], %rd301; 2026-02-21T10:22:24.2660268Z // end inline asm 2026-02-21T10:22:24.2660340Z // begin inline asm 2026-02-21T10:22:24.2660400Z mov.u64 %rd304, 0x0; 2026-02-21T10:22:24.2660521Z createpolicy.fractional.L2::evict_first.b64 %rd304, 1.0; 2026-02-21T10:22:24.2660580Z // end inline asm 2026-02-21T10:22:24.2660637Z // begin inline asm 2026-02-21T10:22:24.2660695Z mov.u32 %r9793, 0x0; 2026-02-21T10:22:24.2660752Z mov.u32 %r9794, 0x0; 2026-02-21T10:22:24.2660809Z mov.u32 %r9795, 0x0; 2026-02-21T10:22:24.2660867Z mov.u32 %r9796, 0x0; 2026-02-21T10:22:24.2661082Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r9793, %r9794, %r9795, %r9796 }, [ %rd305 + 0 ], %rd304; 2026-02-21T10:22:24.2661137Z // end inline asm 2026-02-21T10:22:24.2661265Z // begin inline asm 2026-02-21T10:22:24.2661329Z mov.u64 %rd307, 0x0; 2026-02-21T10:22:24.2661445Z createpolicy.fractional.L2::evict_first.b64 %rd307, 1.0; 2026-02-21T10:22:24.2661501Z // end inline asm 2026-02-21T10:22:24.2661564Z // begin inline asm 2026-02-21T10:22:24.2661621Z mov.u32 %r9797, 0x0; 2026-02-21T10:22:24.2661680Z mov.u32 %r9798, 0x0; 2026-02-21T10:22:24.2661748Z mov.u32 %r9799, 0x0; 2026-02-21T10:22:24.2661809Z mov.u32 %r9800, 0x0; 2026-02-21T10:22:24.2662024Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r9797, %r9798, %r9799, %r9800 }, [ %rd308 + 0 ], %rd307; 2026-02-21T10:22:24.2662081Z // end inline asm 2026-02-21T10:22:24.2662141Z // begin inline asm 2026-02-21T10:22:24.2662198Z mov.u64 %rd310, 0x0; 2026-02-21T10:22:24.2662313Z createpolicy.fractional.L2::evict_first.b64 
%rd310, 1.0; 2026-02-21T10:22:24.2662372Z // end inline asm 2026-02-21T10:22:24.2662432Z // begin inline asm 2026-02-21T10:22:24.2662487Z mov.u32 %r9801, 0x0; 2026-02-21T10:22:24.2662546Z mov.u32 %r9802, 0x0; 2026-02-21T10:22:24.2662677Z mov.u32 %r9803, 0x0; 2026-02-21T10:22:24.2662736Z mov.u32 %r9804, 0x0; 2026-02-21T10:22:24.2662946Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r9801, %r9802, %r9803, %r9804 }, [ %rd311 + 0 ], %rd310; 2026-02-21T10:22:24.2663005Z // end inline asm 2026-02-21T10:22:24.2663208Z .loc 1 62 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:62:32 2026-02-21T10:22:24.2663318Z bar.sync 0; 2026-02-21T10:22:24.2663408Z st.shared.v2.b32 [%r10], {%r9773, %r9774}; 2026-02-21T10:22:24.2663519Z st.shared.v2.b32 [%r10+2048], {%r9777, %r9778}; 2026-02-21T10:22:24.2663605Z st.shared.v2.b32 [%r10+4096], {%r9781, %r9782}; 2026-02-21T10:22:24.2682822Z st.shared.v2.b32 [%r10+6144], {%r9785, %r9786}; 2026-02-21T10:22:24.2682959Z st.shared.v2.b32 [%r10+8192], {%r9789, %r9790}; 2026-02-21T10:22:24.2683060Z st.shared.v2.b32 [%r10+10240], {%r9793, %r9794}; 2026-02-21T10:22:24.2683152Z st.shared.v2.b32 [%r10+12288], {%r9797, %r9798}; 2026-02-21T10:22:24.2683243Z st.shared.v2.b32 [%r10+14336], {%r9801, %r9802}; 2026-02-21T10:22:24.2683330Z st.shared.v2.b32 [%r11], {%r9775, %r9776}; 2026-02-21T10:22:24.2683411Z st.shared.v2.b32 [%r11+2048], {%r9779, %r9780}; 2026-02-21T10:22:24.2683491Z st.shared.v2.b32 [%r11+4096], {%r9783, %r9784}; 2026-02-21T10:22:24.2683570Z st.shared.v2.b32 [%r11+6144], {%r9787, %r9788}; 2026-02-21T10:22:24.2683646Z st.shared.v2.b32 [%r11+8192], {%r9791, %r9792}; 2026-02-21T10:22:24.2683858Z st.shared.v2.b32 [%r11+10240], {%r9795, %r9796}; 2026-02-21T10:22:24.2683943Z st.shared.v2.b32 [%r11+12288], {%r9799, %r9800}; 2026-02-21T10:22:24.2684020Z st.shared.v2.b32 [%r11+14336], {%r9803, %r9804}; 2026-02-21T10:22:24.2684080Z bar.sync 0; 2026-02-21T10:22:24.2684151Z ld.shared.b16 %rs673, [%r12]; 2026-02-21T10:22:24.2684218Z 
ld.shared.b16 %rs674, [%r12+1024]; 2026-02-21T10:22:24.2684283Z ld.shared.b16 %rs675, [%r12+64]; 2026-02-21T10:22:24.2684348Z ld.shared.b16 %rs676, [%r12+1088]; 2026-02-21T10:22:24.2684414Z ld.shared.b16 %rs677, [%r12+8192]; 2026-02-21T10:22:24.2684476Z ld.shared.b16 %rs678, [%r12+9216]; 2026-02-21T10:22:24.2684542Z ld.shared.b16 %rs679, [%r12+8256]; 2026-02-21T10:22:24.2684602Z ld.shared.b16 %rs680, [%r12+9280]; 2026-02-21T10:22:24.2684664Z ld.shared.b16 %rs681, [%r13]; 2026-02-21T10:22:24.2684730Z ld.shared.b16 %rs682, [%r13+1024]; 2026-02-21T10:22:24.2684809Z ld.shared.b16 %rs683, [%r13+64]; 2026-02-21T10:22:24.2684870Z ld.shared.b16 %rs684, [%r13+1088]; 2026-02-21T10:22:24.2684934Z ld.shared.b16 %rs685, [%r13+8192]; 2026-02-21T10:22:24.2684996Z ld.shared.b16 %rs686, [%r13+9216]; 2026-02-21T10:22:24.2685055Z ld.shared.b16 %rs687, [%r13+8256]; 2026-02-21T10:22:24.2685115Z ld.shared.b16 %rs688, [%r13+9280]; 2026-02-21T10:22:24.2685177Z ld.shared.b16 %rs689, [%r14]; 2026-02-21T10:22:24.2685238Z ld.shared.b16 %rs690, [%r14+1024]; 2026-02-21T10:22:24.2685299Z ld.shared.b16 %rs691, [%r14+64]; 2026-02-21T10:22:24.2685359Z ld.shared.b16 %rs692, [%r14+1088]; 2026-02-21T10:22:24.2685503Z ld.shared.b16 %rs693, [%r14+8192]; 2026-02-21T10:22:24.2685563Z ld.shared.b16 %rs694, [%r14+9216]; 2026-02-21T10:22:24.2685626Z ld.shared.b16 %rs695, [%r14+8256]; 2026-02-21T10:22:24.2685689Z ld.shared.b16 %rs696, [%r14+9280]; 2026-02-21T10:22:24.2685751Z ld.shared.b16 %rs697, [%r15]; 2026-02-21T10:22:24.2685827Z ld.shared.b16 %rs698, [%r15+1024]; 2026-02-21T10:22:24.2685891Z ld.shared.b16 %rs699, [%r15+64]; 2026-02-21T10:22:24.2685959Z ld.shared.b16 %rs700, [%r15+1088]; 2026-02-21T10:22:24.2686022Z ld.shared.b16 %rs701, [%r15+8192]; 2026-02-21T10:22:24.2686082Z ld.shared.b16 %rs702, [%r15+9216]; 2026-02-21T10:22:24.2686147Z ld.shared.b16 %rs703, [%r15+8256]; 2026-02-21T10:22:24.2686208Z ld.shared.b16 %rs704, [%r15+9280]; 2026-02-21T10:22:24.2686271Z ld.shared.b16 %rs705, [%r16]; 
2026-02-21T10:22:24.2686333Z ld.shared.b16 %rs706, [%r16+1024]; 2026-02-21T10:22:24.2686399Z ld.shared.b16 %rs707, [%r16+64]; 2026-02-21T10:22:24.2686615Z ld.shared.b16 %rs708, [%r16+1088]; 2026-02-21T10:22:24.2686760Z ld.shared.b16 %rs709, [%r16+8192]; 2026-02-21T10:22:24.2686830Z ld.shared.b16 %rs710, [%r16+9216]; 2026-02-21T10:22:24.2686890Z ld.shared.b16 %rs711, [%r16+8256]; 2026-02-21T10:22:24.2686951Z ld.shared.b16 %rs712, [%r16+9280]; 2026-02-21T10:22:24.2687015Z ld.shared.b16 %rs713, [%r17]; 2026-02-21T10:22:24.2687141Z ld.shared.b16 %rs714, [%r17+1024]; 2026-02-21T10:22:24.2687205Z ld.shared.b16 %rs715, [%r17+64]; 2026-02-21T10:22:24.2687273Z ld.shared.b16 %rs716, [%r17+1088]; 2026-02-21T10:22:24.2687350Z ld.shared.b16 %rs717, [%r17+8192]; 2026-02-21T10:22:24.2687411Z ld.shared.b16 %rs718, [%r17+9216]; 2026-02-21T10:22:24.2687472Z ld.shared.b16 %rs719, [%r17+8256]; 2026-02-21T10:22:24.2687533Z ld.shared.b16 %rs720, [%r17+9280]; 2026-02-21T10:22:24.2687597Z ld.shared.b16 %rs721, [%r18]; 2026-02-21T10:22:24.2687659Z ld.shared.b16 %rs722, [%r18+1024]; 2026-02-21T10:22:24.2687722Z ld.shared.b16 %rs723, [%r18+64]; 2026-02-21T10:22:24.2687792Z ld.shared.b16 %rs724, [%r18+1088]; 2026-02-21T10:22:24.2687852Z ld.shared.b16 %rs725, [%r18+8192]; 2026-02-21T10:22:24.2687912Z ld.shared.b16 %rs726, [%r18+9216]; 2026-02-21T10:22:24.2687975Z ld.shared.b16 %rs727, [%r18+8256]; 2026-02-21T10:22:24.2688046Z ld.shared.b16 %rs728, [%r18+9280]; 2026-02-21T10:22:24.2688108Z ld.shared.b16 %rs729, [%r19]; 2026-02-21T10:22:24.2688175Z ld.shared.b16 %rs730, [%r19+1024]; 2026-02-21T10:22:24.2688238Z ld.shared.b16 %rs731, [%r19+64]; 2026-02-21T10:22:24.2688369Z ld.shared.b16 %rs732, [%r19+1088]; 2026-02-21T10:22:24.2688433Z ld.shared.b16 %rs733, [%r19+8192]; 2026-02-21T10:22:24.2688498Z ld.shared.b16 %rs734, [%r19+9216]; 2026-02-21T10:22:24.2688561Z ld.shared.b16 %rs735, [%r19+8256]; 2026-02-21T10:22:24.2688624Z ld.shared.b16 %rs736, [%r19+9280]; 2026-02-21T10:22:24.2688691Z 
cvt.f32.bf16 %r9942, %rs673; 2026-02-21T10:22:24.2688750Z cvt.f32.bf16 %r9943, %rs674; 2026-02-21T10:22:24.2688809Z cvt.f32.bf16 %r9944, %rs681; 2026-02-21T10:22:24.2688871Z cvt.f32.bf16 %r9945, %rs682; 2026-02-21T10:22:24.2688933Z cvt.f32.bf16 %r10074, %rs689; 2026-02-21T10:22:24.2688990Z cvt.f32.bf16 %r10075, %rs690; 2026-02-21T10:22:24.2689048Z cvt.f32.bf16 %r10076, %rs697; 2026-02-21T10:22:24.2689107Z cvt.f32.bf16 %r10077, %rs698; 2026-02-21T10:22:24.2689164Z cvt.f32.bf16 %r10206, %rs705; 2026-02-21T10:22:24.2689222Z cvt.f32.bf16 %r10207, %rs706; 2026-02-21T10:22:24.2689279Z cvt.f32.bf16 %r10208, %rs713; 2026-02-21T10:22:24.2689338Z cvt.f32.bf16 %r10209, %rs714; 2026-02-21T10:22:24.2689397Z cvt.f32.bf16 %r10338, %rs721; 2026-02-21T10:22:24.2689454Z cvt.f32.bf16 %r10339, %rs722; 2026-02-21T10:22:24.2689513Z cvt.f32.bf16 %r10340, %rs729; 2026-02-21T10:22:24.2689581Z cvt.f32.bf16 %r10341, %rs730; 2026-02-21T10:22:24.2689640Z cvt.f32.bf16 %r10470, %rs675; 2026-02-21T10:22:24.2689698Z cvt.f32.bf16 %r10471, %rs676; 2026-02-21T10:22:24.2689760Z cvt.f32.bf16 %r10472, %rs683; 2026-02-21T10:22:24.2689817Z cvt.f32.bf16 %r10473, %rs684; 2026-02-21T10:22:24.2689952Z cvt.f32.bf16 %r10602, %rs691; 2026-02-21T10:22:24.2690014Z cvt.f32.bf16 %r10603, %rs692; 2026-02-21T10:22:24.2690071Z cvt.f32.bf16 %r10604, %rs699; 2026-02-21T10:22:24.2690127Z cvt.f32.bf16 %r10605, %rs700; 2026-02-21T10:22:24.2690189Z cvt.f32.bf16 %r10734, %rs707; 2026-02-21T10:22:24.2690245Z cvt.f32.bf16 %r10735, %rs708; 2026-02-21T10:22:24.2690304Z cvt.f32.bf16 %r10736, %rs715; 2026-02-21T10:22:24.2690360Z cvt.f32.bf16 %r10737, %rs716; 2026-02-21T10:22:24.2690423Z cvt.f32.bf16 %r10866, %rs723; 2026-02-21T10:22:24.2690480Z cvt.f32.bf16 %r10867, %rs724; 2026-02-21T10:22:24.2690549Z cvt.f32.bf16 %r10868, %rs731; 2026-02-21T10:22:24.2690610Z cvt.f32.bf16 %r10869, %rs732; 2026-02-21T10:22:24.2690669Z cvt.f32.bf16 %r10998, %rs677; 2026-02-21T10:22:24.2690726Z cvt.f32.bf16 %r10999, %rs678; 
2026-02-21T10:22:24.2690785Z cvt.f32.bf16 %r11000, %rs685; 2026-02-21T10:22:24.2690847Z cvt.f32.bf16 %r11001, %rs686; 2026-02-21T10:22:24.2690907Z cvt.f32.bf16 %r11130, %rs693; 2026-02-21T10:22:24.2691018Z cvt.f32.bf16 %r11131, %rs694; 2026-02-21T10:22:24.2691082Z cvt.f32.bf16 %r11132, %rs701; 2026-02-21T10:22:24.2691141Z cvt.f32.bf16 %r11133, %rs702; 2026-02-21T10:22:24.2691197Z cvt.f32.bf16 %r11262, %rs709; 2026-02-21T10:22:24.2691254Z cvt.f32.bf16 %r11263, %rs710; 2026-02-21T10:22:24.2691361Z cvt.f32.bf16 %r11264, %rs717; 2026-02-21T10:22:24.2691417Z cvt.f32.bf16 %r11265, %rs718; 2026-02-21T10:22:24.2691474Z cvt.f32.bf16 %r11394, %rs725; 2026-02-21T10:22:24.2691533Z cvt.f32.bf16 %r11395, %rs726; 2026-02-21T10:22:24.2691591Z cvt.f32.bf16 %r11396, %rs733; 2026-02-21T10:22:24.2691648Z cvt.f32.bf16 %r11397, %rs734; 2026-02-21T10:22:24.2691708Z cvt.f32.bf16 %r11526, %rs679; 2026-02-21T10:22:24.2691766Z cvt.f32.bf16 %r11527, %rs680; 2026-02-21T10:22:24.2691822Z cvt.f32.bf16 %r11528, %rs687; 2026-02-21T10:22:24.2691879Z cvt.f32.bf16 %r11529, %rs688; 2026-02-21T10:22:24.2691937Z cvt.f32.bf16 %r11658, %rs695; 2026-02-21T10:22:24.2691996Z cvt.f32.bf16 %r11659, %rs696; 2026-02-21T10:22:24.2692054Z cvt.f32.bf16 %r11660, %rs703; 2026-02-21T10:22:24.2692114Z cvt.f32.bf16 %r11661, %rs704; 2026-02-21T10:22:24.2692170Z cvt.f32.bf16 %r11790, %rs711; 2026-02-21T10:22:24.2692226Z cvt.f32.bf16 %r11791, %rs712; 2026-02-21T10:22:24.2692282Z cvt.f32.bf16 %r11792, %rs719; 2026-02-21T10:22:24.2692344Z cvt.f32.bf16 %r11793, %rs720; 2026-02-21T10:22:24.2692401Z cvt.f32.bf16 %r11922, %rs727; 2026-02-21T10:22:24.2692456Z cvt.f32.bf16 %r11923, %rs728; 2026-02-21T10:22:24.2692568Z cvt.f32.bf16 %r11924, %rs735; 2026-02-21T10:22:24.2692627Z cvt.f32.bf16 %r11925, %rs736; 2026-02-21T10:22:24.2692851Z .loc 1 64 33 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:64:33 2026-02-21T10:22:24.2692908Z bar.sync 0; 2026-02-21T10:22:24.2692981Z // begin inline asm 2026-02-21T10:22:24.2693093Z 
@%p313 mbarrier.init.shared::cta.b64 [%r29850], 1; 2026-02-21T10:22:24.2693150Z // end inline asm 2026-02-21T10:22:24.2693208Z bar.sync 0; 2026-02-21T10:22:24.2693265Z // begin inline asm 2026-02-21T10:22:24.2693401Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r29850], 4096; 2026-02-21T10:22:24.2693457Z // end inline asm 2026-02-21T10:22:24.2693513Z // begin inline asm 2026-02-21T10:22:24.2693587Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.2693642Z // end inline asm 2026-02-21T10:22:24.2693699Z bar.sync 0; 2026-02-21T10:22:24.2693764Z elect.sync %r12188|%p124, -1; 2026-02-21T10:22:24.2693830Z and.pred %p105, %p1, %p124; 2026-02-21T10:22:24.2693895Z add.s64 %rd843, %rd843, 32; 2026-02-21T10:22:24.2693954Z cvt.u32.u64 %r9809, %rd843; 2026-02-21T10:22:24.2694022Z // begin inline asm 2026-02-21T10:22:24.2694360Z @%p105 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39936], [%rd822, {%r9808, %r9809}], [%r29850]; 2026-02-21T10:22:24.2694417Z // end inline asm 2026-02-21T10:22:24.2694469Z bar.sync 0; 2026-02-21T10:22:24.2694526Z mov.b32 %r12056, 0; 2026-02-21T10:22:24.2694647Z // begin inline asm 2026-02-21T10:22:24.2694705Z 2026-02-21T10:22:24.2694753Z { 2026-02-21T10:22:24.2694815Z .reg .pred complete; 2026-02-21T10:22:24.2694872Z waitLoop: 2026-02-21T10:22:24.2695021Z mbarrier.try_wait.parity.shared.b64 complete, [%r29850], %r12056; 2026-02-21T10:22:24.2695089Z @!complete bra.uni waitLoop; 2026-02-21T10:22:24.2695145Z } 2026-02-21T10:22:24.2695151Z 2026-02-21T10:22:24.2695207Z // end inline asm 2026-02-21T10:22:24.2695258Z bar.sync 0; 2026-02-21T10:22:24.2695317Z // begin inline asm 2026-02-21T10:22:24.2695414Z @%p313 mbarrier.inval.shared::cta.b64 [%r29850]; 2026-02-21T10:22:24.2695468Z // end inline asm 2026-02-21T10:22:24.2695680Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2695752Z ld.shared.s8 %rs737, [%r20]; 2026-02-21T10:22:24.2695959Z .loc 1 67 28 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2696026Z shl.b16 %rs738, %rs737, 4; 2026-02-21T10:22:24.2696286Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2696366Z ld.shared.s8 %rs739, [%r21+128]; 2026-02-21T10:22:24.2696695Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2696870Z shl.b16 %rs740, %rs739, 4; 2026-02-21T10:22:24.2697066Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2697135Z ld.shared.s8 %rs741, [%r22+256]; 2026-02-21T10:22:24.2697328Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2697387Z shl.b16 %rs742, %rs741, 4; 2026-02-21T10:22:24.2697574Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2697639Z ld.shared.s8 %rs743, [%r23+384]; 2026-02-21T10:22:24.2697833Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2697891Z shl.b16 %rs744, %rs743, 4; 2026-02-21T10:22:24.2698077Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2698142Z ld.shared.s8 %rs745, [%r24+512]; 2026-02-21T10:22:24.2698333Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2698392Z shl.b16 %rs746, %rs745, 4; 2026-02-21T10:22:24.2698652Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2698717Z ld.shared.s8 %rs747, [%r25+640]; 2026-02-21T10:22:24.2698904Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2698964Z shl.b16 %rs748, %rs747, 4; 2026-02-21T10:22:24.2699160Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2699229Z ld.shared.s8 %rs749, [%r26+768]; 2026-02-21T10:22:24.2699418Z .loc 1 67 28 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2699479Z shl.b16 %rs750, %rs749, 4; 2026-02-21T10:22:24.2699666Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2699730Z ld.shared.s8 %rs751, [%r27+896]; 2026-02-21T10:22:24.2699923Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2699981Z shl.b16 %rs752, %rs751, 4; 2026-02-21T10:22:24.2700166Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2700236Z ld.shared.s8 %rs753, [%r20+1024]; 2026-02-21T10:22:24.2700423Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2700548Z shl.b16 %rs754, %rs753, 4; 2026-02-21T10:22:24.2700739Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2700802Z ld.shared.s8 %rs755, [%r21+1152]; 2026-02-21T10:22:24.2700987Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2701049Z shl.b16 %rs756, %rs755, 4; 2026-02-21T10:22:24.2701246Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2701310Z ld.shared.s8 %rs757, [%r22+1280]; 2026-02-21T10:22:24.2701494Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2701556Z shl.b16 %rs758, %rs757, 4; 2026-02-21T10:22:24.2701741Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2701803Z ld.shared.s8 %rs759, [%r23+1408]; 2026-02-21T10:22:24.2702060Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2702121Z shl.b16 %rs760, %rs759, 4; 2026-02-21T10:22:24.2702307Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2702382Z ld.shared.s8 %rs761, [%r24+1536]; 2026-02-21T10:22:24.2702621Z .loc 1 67 28 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2702682Z shl.b16 %rs762, %rs761, 4; 2026-02-21T10:22:24.2702870Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2702934Z ld.shared.s8 %rs763, [%r25+1664]; 2026-02-21T10:22:24.2703121Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2703178Z shl.b16 %rs764, %rs763, 4; 2026-02-21T10:22:24.2703370Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2703433Z ld.shared.s8 %rs765, [%r26+1792]; 2026-02-21T10:22:24.2703620Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2703681Z shl.b16 %rs766, %rs765, 4; 2026-02-21T10:22:24.2703868Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2703933Z ld.shared.s8 %rs767, [%r27+1920]; 2026-02-21T10:22:24.2704180Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2704243Z shl.b16 %rs768, %rs767, 4; 2026-02-21T10:22:24.2704431Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2704491Z ld.shared.s8 %rs769, [%r20+2048]; 2026-02-21T10:22:24.2704681Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2704742Z shl.b16 %rs770, %rs769, 4; 2026-02-21T10:22:24.2704929Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2704994Z ld.shared.s8 %rs771, [%r21+2176]; 2026-02-21T10:22:24.2705179Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2705241Z shl.b16 %rs772, %rs771, 4; 2026-02-21T10:22:24.2705435Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2705496Z ld.shared.s8 %rs773, [%r22+2304]; 2026-02-21T10:22:24.2705683Z .loc 1 67 28 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2705742Z shl.b16 %rs774, %rs773, 4; 2026-02-21T10:22:24.2705926Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2705991Z ld.shared.s8 %rs775, [%r23+2432]; 2026-02-21T10:22:24.2706237Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2706297Z shl.b16 %rs776, %rs775, 4; 2026-02-21T10:22:24.2706613Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2706683Z ld.shared.s8 %rs777, [%r24+2560]; 2026-02-21T10:22:24.2706875Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2706936Z shl.b16 %rs778, %rs777, 4; 2026-02-21T10:22:24.2707123Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2707187Z ld.shared.s8 %rs779, [%r25+2688]; 2026-02-21T10:22:24.2707373Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2707431Z shl.b16 %rs780, %rs779, 4; 2026-02-21T10:22:24.2707710Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2707784Z ld.shared.s8 %rs781, [%r26+2816]; 2026-02-21T10:22:24.2707988Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2708052Z shl.b16 %rs782, %rs781, 4; 2026-02-21T10:22:24.2708318Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2708393Z ld.shared.s8 %rs783, [%r27+2944]; 2026-02-21T10:22:24.2708677Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2708741Z shl.b16 %rs784, %rs783, 4; 2026-02-21T10:22:24.2708935Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2708996Z ld.shared.s8 %rs785, [%r20+3072]; 2026-02-21T10:22:24.2709188Z .loc 1 67 28 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2709253Z shl.b16 %rs786, %rs785, 4; 2026-02-21T10:22:24.2709441Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2709506Z ld.shared.s8 %rs787, [%r21+3200]; 2026-02-21T10:22:24.2709694Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2709756Z shl.b16 %rs788, %rs787, 4; 2026-02-21T10:22:24.2710020Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2710085Z ld.shared.s8 %rs789, [%r22+3328]; 2026-02-21T10:22:24.2710274Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2710332Z shl.b16 %rs790, %rs789, 4; 2026-02-21T10:22:24.2710520Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2710583Z ld.shared.s8 %rs791, [%r23+3456]; 2026-02-21T10:22:24.2710771Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2710833Z shl.b16 %rs792, %rs791, 4; 2026-02-21T10:22:24.2711018Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2711083Z ld.shared.s8 %rs793, [%r24+3584]; 2026-02-21T10:22:24.2711271Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2711332Z shl.b16 %rs794, %rs793, 4; 2026-02-21T10:22:24.2711516Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2711592Z ld.shared.s8 %rs795, [%r25+3712]; 2026-02-21T10:22:24.2711779Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2711839Z shl.b16 %rs796, %rs795, 4; 2026-02-21T10:22:24.2712099Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2712163Z ld.shared.s8 %rs797, [%r26+3840]; 2026-02-21T10:22:24.2712349Z .loc 1 67 28 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2712417Z shl.b16 %rs798, %rs797, 4; 2026-02-21T10:22:24.2712609Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2712673Z ld.shared.s8 %rs799, [%r27+3968]; 2026-02-21T10:22:24.2712858Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2712919Z shl.b16 %rs800, %rs799, 4; 2026-02-21T10:22:24.2713104Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2713167Z cvt.s16.s8 %rs801, %rs738; 2026-02-21T10:22:24.2713227Z shr.s16 %rs802, %rs801, 4; 2026-02-21T10:22:24.2713288Z cvt.s16.s8 %rs803, %rs740; 2026-02-21T10:22:24.2713396Z shr.s16 %rs804, %rs803, 4; 2026-02-21T10:22:24.2713456Z shr.s16 %rs805, %rs737, 4; 2026-02-21T10:22:24.2713518Z shr.s16 %rs806, %rs739, 4; 2026-02-21T10:22:24.2713715Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2713840Z cvt.rn.f32.s16 %r12189, %rs806; 2026-02-21T10:22:24.2713908Z cvt.rn.f32.s16 %r12190, %rs805; 2026-02-21T10:22:24.2713967Z cvt.rn.f32.s16 %r12191, %rs804; 2026-02-21T10:22:24.2714027Z cvt.rn.f32.s16 %r12192, %rs802; 2026-02-21T10:22:24.2714235Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2714300Z cvt.s16.s8 %rs807, %rs742; 2026-02-21T10:22:24.2714358Z shr.s16 %rs808, %rs807, 4; 2026-02-21T10:22:24.2714417Z cvt.s16.s8 %rs809, %rs744; 2026-02-21T10:22:24.2714476Z shr.s16 %rs810, %rs809, 4; 2026-02-21T10:22:24.2714536Z shr.s16 %rs811, %rs741, 4; 2026-02-21T10:22:24.2714595Z shr.s16 %rs812, %rs743, 4; 2026-02-21T10:22:24.2714789Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2714850Z cvt.rn.f32.s16 %r12193, %rs812; 2026-02-21T10:22:24.2714909Z cvt.rn.f32.s16 %r12194, %rs811; 2026-02-21T10:22:24.2714968Z cvt.rn.f32.s16 %r12195, %rs810; 
2026-02-21T10:22:24.2715033Z cvt.rn.f32.s16 %r12196, %rs808; 2026-02-21T10:22:24.2715276Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2715338Z cvt.s16.s8 %rs813, %rs746; 2026-02-21T10:22:24.2715397Z shr.s16 %rs814, %rs813, 4; 2026-02-21T10:22:24.2715455Z cvt.s16.s8 %rs815, %rs748; 2026-02-21T10:22:24.2715512Z shr.s16 %rs816, %rs815, 4; 2026-02-21T10:22:24.2715570Z shr.s16 %rs817, %rs745, 4; 2026-02-21T10:22:24.2715630Z shr.s16 %rs818, %rs747, 4; 2026-02-21T10:22:24.2715819Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2715884Z cvt.rn.f32.s16 %r12197, %rs818; 2026-02-21T10:22:24.2715947Z cvt.rn.f32.s16 %r12198, %rs817; 2026-02-21T10:22:24.2716005Z cvt.rn.f32.s16 %r12199, %rs816; 2026-02-21T10:22:24.2716063Z cvt.rn.f32.s16 %r12200, %rs814; 2026-02-21T10:22:24.2716253Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2716316Z cvt.s16.s8 %rs819, %rs750; 2026-02-21T10:22:24.2716374Z shr.s16 %rs820, %rs819, 4; 2026-02-21T10:22:24.2716432Z cvt.s16.s8 %rs821, %rs752; 2026-02-21T10:22:24.2716628Z shr.s16 %rs822, %rs821, 4; 2026-02-21T10:22:24.2716691Z shr.s16 %rs823, %rs749, 4; 2026-02-21T10:22:24.2716749Z shr.s16 %rs824, %rs751, 4; 2026-02-21T10:22:24.2716939Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2717000Z cvt.rn.f32.s16 %r12201, %rs824; 2026-02-21T10:22:24.2717059Z cvt.rn.f32.s16 %r12202, %rs823; 2026-02-21T10:22:24.2717215Z cvt.rn.f32.s16 %r12203, %rs822; 2026-02-21T10:22:24.2717277Z cvt.rn.f32.s16 %r12204, %rs820; 2026-02-21T10:22:24.2717467Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2717527Z cvt.s16.s8 %rs825, %rs754; 2026-02-21T10:22:24.2717589Z shr.s16 %rs826, %rs825, 4; 2026-02-21T10:22:24.2717650Z cvt.s16.s8 %rs827, %rs756; 2026-02-21T10:22:24.2717708Z shr.s16 %rs828, %rs827, 4; 
2026-02-21T10:22:24.2717769Z shr.s16 %rs829, %rs753, 4; 2026-02-21T10:22:24.2717830Z shr.s16 %rs830, %rs755, 4; 2026-02-21T10:22:24.2718018Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2718077Z cvt.rn.f32.s16 %r12205, %rs830; 2026-02-21T10:22:24.2718138Z cvt.rn.f32.s16 %r12206, %rs829; 2026-02-21T10:22:24.2718197Z cvt.rn.f32.s16 %r12207, %rs828; 2026-02-21T10:22:24.2718255Z cvt.rn.f32.s16 %r12208, %rs826; 2026-02-21T10:22:24.2718526Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2718590Z cvt.s16.s8 %rs831, %rs758; 2026-02-21T10:22:24.2718648Z shr.s16 %rs832, %rs831, 4; 2026-02-21T10:22:24.2718708Z cvt.s16.s8 %rs833, %rs760; 2026-02-21T10:22:24.2718767Z shr.s16 %rs834, %rs833, 4; 2026-02-21T10:22:24.2718838Z shr.s16 %rs835, %rs757, 4; 2026-02-21T10:22:24.2718958Z shr.s16 %rs836, %rs759, 4; 2026-02-21T10:22:24.2719153Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2719216Z cvt.rn.f32.s16 %r12209, %rs836; 2026-02-21T10:22:24.2719276Z cvt.rn.f32.s16 %r12210, %rs835; 2026-02-21T10:22:24.2719338Z cvt.rn.f32.s16 %r12211, %rs834; 2026-02-21T10:22:24.2719397Z cvt.rn.f32.s16 %r12212, %rs832; 2026-02-21T10:22:24.2719584Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2719645Z cvt.s16.s8 %rs837, %rs762; 2026-02-21T10:22:24.2719705Z shr.s16 %rs838, %rs837, 4; 2026-02-21T10:22:24.2719766Z cvt.s16.s8 %rs839, %rs764; 2026-02-21T10:22:24.2719824Z shr.s16 %rs840, %rs839, 4; 2026-02-21T10:22:24.2719886Z shr.s16 %rs841, %rs761, 4; 2026-02-21T10:22:24.2719945Z shr.s16 %rs842, %rs763, 4; 2026-02-21T10:22:24.2720132Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2720197Z cvt.rn.f32.s16 %r12213, %rs842; 2026-02-21T10:22:24.2720255Z cvt.rn.f32.s16 %r12214, %rs841; 2026-02-21T10:22:24.2720382Z cvt.rn.f32.s16 %r12215, %rs840; 
2026-02-21T10:22:24.2720445Z cvt.rn.f32.s16 %r12216, %rs838; 2026-02-21T10:22:24.2720635Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2720694Z cvt.s16.s8 %rs843, %rs766; 2026-02-21T10:22:24.2720752Z shr.s16 %rs844, %rs843, 4; 2026-02-21T10:22:24.2720817Z cvt.s16.s8 %rs845, %rs768; 2026-02-21T10:22:24.2720875Z shr.s16 %rs846, %rs845, 4; 2026-02-21T10:22:24.2720950Z shr.s16 %rs847, %rs765, 4; 2026-02-21T10:22:24.2721010Z shr.s16 %rs848, %rs767, 4; 2026-02-21T10:22:24.2721198Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2721258Z cvt.rn.f32.s16 %r12217, %rs848; 2026-02-21T10:22:24.2721318Z cvt.rn.f32.s16 %r12218, %rs847; 2026-02-21T10:22:24.2721380Z cvt.rn.f32.s16 %r12219, %rs846; 2026-02-21T10:22:24.2721439Z cvt.rn.f32.s16 %r12220, %rs844; 2026-02-21T10:22:24.2721629Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2721691Z cvt.s16.s8 %rs849, %rs770; 2026-02-21T10:22:24.2721749Z shr.s16 %rs850, %rs849, 4; 2026-02-21T10:22:24.2721807Z cvt.s16.s8 %rs851, %rs772; 2026-02-21T10:22:24.2721866Z shr.s16 %rs852, %rs851, 4; 2026-02-21T10:22:24.2721923Z shr.s16 %rs853, %rs769, 4; 2026-02-21T10:22:24.2721980Z shr.s16 %rs854, %rs771, 4; 2026-02-21T10:22:24.2722168Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2722299Z cvt.rn.f32.s16 %r12221, %rs854; 2026-02-21T10:22:24.2722359Z cvt.rn.f32.s16 %r12222, %rs853; 2026-02-21T10:22:24.2722419Z cvt.rn.f32.s16 %r12223, %rs852; 2026-02-21T10:22:24.2722480Z cvt.rn.f32.s16 %r12224, %rs850; 2026-02-21T10:22:24.2722670Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2722731Z cvt.s16.s8 %rs855, %rs774; 2026-02-21T10:22:24.2722795Z shr.s16 %rs856, %rs855, 4; 2026-02-21T10:22:24.2722854Z cvt.s16.s8 %rs857, %rs776; 2026-02-21T10:22:24.2722910Z shr.s16 %rs858, %rs857, 4; 
2026-02-21T10:22:24.2722967Z shr.s16 %rs859, %rs773, 4; 2026-02-21T10:22:24.2723027Z shr.s16 %rs860, %rs775, 4; 2026-02-21T10:22:24.2723213Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2723274Z cvt.rn.f32.s16 %r12225, %rs860; 2026-02-21T10:22:24.2723391Z cvt.rn.f32.s16 %r12226, %rs859; 2026-02-21T10:22:24.2723454Z cvt.rn.f32.s16 %r12227, %rs858; 2026-02-21T10:22:24.2723514Z cvt.rn.f32.s16 %r12228, %rs856; 2026-02-21T10:22:24.2723700Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2723760Z cvt.s16.s8 %rs861, %rs778; 2026-02-21T10:22:24.2723866Z shr.s16 %rs862, %rs861, 4; 2026-02-21T10:22:24.2723923Z cvt.s16.s8 %rs863, %rs780; 2026-02-21T10:22:24.2723981Z shr.s16 %rs864, %rs863, 4; 2026-02-21T10:22:24.2724041Z shr.s16 %rs865, %rs777, 4; 2026-02-21T10:22:24.2724099Z shr.s16 %rs866, %rs779, 4; 2026-02-21T10:22:24.2724288Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2724348Z cvt.rn.f32.s16 %r12229, %rs866; 2026-02-21T10:22:24.2724409Z cvt.rn.f32.s16 %r12230, %rs865; 2026-02-21T10:22:24.2724467Z cvt.rn.f32.s16 %r12231, %rs864; 2026-02-21T10:22:24.2724530Z cvt.rn.f32.s16 %r12232, %rs862; 2026-02-21T10:22:24.2724721Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2724780Z cvt.s16.s8 %rs867, %rs782; 2026-02-21T10:22:24.2724840Z shr.s16 %rs868, %rs867, 4; 2026-02-21T10:22:24.2724898Z cvt.s16.s8 %rs869, %rs784; 2026-02-21T10:22:24.2724957Z shr.s16 %rs870, %rs869, 4; 2026-02-21T10:22:24.2725017Z shr.s16 %rs871, %rs781, 4; 2026-02-21T10:22:24.2725077Z shr.s16 %rs872, %rs783, 4; 2026-02-21T10:22:24.2725315Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2725378Z cvt.rn.f32.s16 %r12233, %rs872; 2026-02-21T10:22:24.2725438Z cvt.rn.f32.s16 %r12234, %rs871; 2026-02-21T10:22:24.2725496Z cvt.rn.f32.s16 %r12235, %rs870; 
2026-02-21T10:22:24.2725554Z cvt.rn.f32.s16 %r12236, %rs868; 2026-02-21T10:22:24.2725743Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2725804Z cvt.s16.s8 %rs873, %rs786; 2026-02-21T10:22:24.2725864Z shr.s16 %rs874, %rs873, 4; 2026-02-21T10:22:24.2725922Z cvt.s16.s8 %rs875, %rs788; 2026-02-21T10:22:24.2725983Z shr.s16 %rs876, %rs875, 4; 2026-02-21T10:22:24.2726040Z shr.s16 %rs877, %rs785, 4; 2026-02-21T10:22:24.2726096Z shr.s16 %rs878, %rs787, 4; 2026-02-21T10:22:24.2726300Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2726362Z cvt.rn.f32.s16 %r12237, %rs878; 2026-02-21T10:22:24.2726425Z cvt.rn.f32.s16 %r12238, %rs877; 2026-02-21T10:22:24.2726609Z cvt.rn.f32.s16 %r12239, %rs876; 2026-02-21T10:22:24.2726671Z cvt.rn.f32.s16 %r12240, %rs874; 2026-02-21T10:22:24.2726859Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2726918Z cvt.s16.s8 %rs879, %rs790; 2026-02-21T10:22:24.2726990Z shr.s16 %rs880, %rs879, 4; 2026-02-21T10:22:24.2727050Z cvt.s16.s8 %rs881, %rs792; 2026-02-21T10:22:24.2727190Z shr.s16 %rs882, %rs881, 4; 2026-02-21T10:22:24.2727251Z shr.s16 %rs883, %rs789, 4; 2026-02-21T10:22:24.2727309Z shr.s16 %rs884, %rs791, 4; 2026-02-21T10:22:24.2727497Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2727556Z cvt.rn.f32.s16 %r12241, %rs884; 2026-02-21T10:22:24.2727621Z cvt.rn.f32.s16 %r12242, %rs883; 2026-02-21T10:22:24.2727680Z cvt.rn.f32.s16 %r12243, %rs882; 2026-02-21T10:22:24.2727740Z cvt.rn.f32.s16 %r12244, %rs880; 2026-02-21T10:22:24.2727930Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2727988Z cvt.s16.s8 %rs885, %rs794; 2026-02-21T10:22:24.2728045Z shr.s16 %rs886, %rs885, 4; 2026-02-21T10:22:24.2728105Z cvt.s16.s8 %rs887, %rs796; 2026-02-21T10:22:24.2728162Z shr.s16 %rs888, %rs887, 4; 
2026-02-21T10:22:24.2728218Z shr.s16 %rs889, %rs793, 4; 2026-02-21T10:22:24.2728277Z shr.s16 %rs890, %rs795, 4; 2026-02-21T10:22:24.2728544Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2728611Z cvt.rn.f32.s16 %r12245, %rs890; 2026-02-21T10:22:24.2728674Z cvt.rn.f32.s16 %r12246, %rs889; 2026-02-21T10:22:24.2728736Z cvt.rn.f32.s16 %r12247, %rs888; 2026-02-21T10:22:24.2728856Z cvt.rn.f32.s16 %r12248, %rs886; 2026-02-21T10:22:24.2729053Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2729116Z cvt.s16.s8 %rs891, %rs798; 2026-02-21T10:22:24.2729174Z shr.s16 %rs892, %rs891, 4; 2026-02-21T10:22:24.2729231Z cvt.s16.s8 %rs893, %rs800; 2026-02-21T10:22:24.2729288Z shr.s16 %rs894, %rs893, 4; 2026-02-21T10:22:24.2729349Z shr.s16 %rs895, %rs797, 4; 2026-02-21T10:22:24.2729406Z shr.s16 %rs896, %rs799, 4; 2026-02-21T10:22:24.2729594Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2729660Z cvt.rn.f32.s16 %r12249, %rs896; 2026-02-21T10:22:24.2729721Z cvt.rn.f32.s16 %r12250, %rs895; 2026-02-21T10:22:24.2729779Z cvt.rn.f32.s16 %r12251, %rs894; 2026-02-21T10:22:24.2729838Z cvt.rn.f32.s16 %r12252, %rs892; 2026-02-21T10:22:24.2729895Z bar.sync 0; 2026-02-21T10:22:24.2730017Z st.shared.v4.b32 [%r28], {%r12192, %r12190, %r12191, %r12189}; 2026-02-21T10:22:24.2730150Z st.shared.v4.b32 [%r28+16384], {%r12224, %r12222, %r12223, %r12221}; 2026-02-21T10:22:24.2730332Z st.shared.v4.b32 [%r29], {%r12196, %r12194, %r12195, %r12193}; 2026-02-21T10:22:24.2730458Z st.shared.v4.b32 [%r29+16384], {%r12228, %r12226, %r12227, %r12225}; 2026-02-21T10:22:24.2730576Z st.shared.v4.b32 [%r30], {%r12200, %r12198, %r12199, %r12197}; 2026-02-21T10:22:24.2730695Z st.shared.v4.b32 [%r30+16384], {%r12232, %r12230, %r12231, %r12229}; 2026-02-21T10:22:24.2730802Z st.shared.v4.b32 [%r31], {%r12204, %r12202, %r12203, %r12201}; 2026-02-21T10:22:24.2730917Z 
st.shared.v4.b32 [%r31+16384], {%r12236, %r12234, %r12235, %r12233}; 2026-02-21T10:22:24.2731029Z st.shared.v4.b32 [%r32], {%r12208, %r12206, %r12207, %r12205}; 2026-02-21T10:22:24.2731144Z st.shared.v4.b32 [%r32+16384], {%r12240, %r12238, %r12239, %r12237}; 2026-02-21T10:22:24.2731247Z st.shared.v4.b32 [%r33], {%r12212, %r12210, %r12211, %r12209}; 2026-02-21T10:22:24.2731363Z st.shared.v4.b32 [%r33+16384], {%r12244, %r12242, %r12243, %r12241}; 2026-02-21T10:22:24.2731469Z st.shared.v4.b32 [%r34], {%r12216, %r12214, %r12215, %r12213}; 2026-02-21T10:22:24.2731584Z st.shared.v4.b32 [%r34+16384], {%r12248, %r12246, %r12247, %r12245}; 2026-02-21T10:22:24.2731688Z st.shared.v4.b32 [%r35], {%r12220, %r12218, %r12219, %r12217}; 2026-02-21T10:22:24.2731803Z st.shared.v4.b32 [%r35+16384], {%r12252, %r12250, %r12251, %r12249}; 2026-02-21T10:22:24.2731858Z $L__tmp7: 2026-02-21T10:22:24.2732127Z .loc 2 291 36 // standard.py:291:36 @[ c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:94:40 ] 2026-02-21T10:22:24.2732256Z // begin inline asm 2026-02-21T10:22:24.2732339Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.2732393Z // end inline asm 2026-02-21T10:22:24.2732448Z bar.sync 0; 2026-02-21T10:22:24.2732533Z shfl.sync.idx.b32 %r12253, %r5, 0, 31, -1; 2026-02-21T10:22:24.2732605Z wgmma.fence.sync.aligned; 2026-02-21T10:22:24.2732668Z mov.pred %p107, -1; 2026-02-21T10:22:24.2732730Z // begin inline asm 2026-02-21T10:22:24.2734238Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r9942,%r9943,%r9944,%r9945}, %rd3, %p107, 1, 1; 2026-02-21T10:22:24.2734349Z // end inline asm 2026-02-21T10:22:24.2734409Z // begin inline asm 2026-02-21T10:22:24.2735902Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r10074,%r10075,%r10076,%r10077}, %rd4, %p107, 1, 1; 2026-02-21T10:22:24.2736013Z // end inline asm 2026-02-21T10:22:24.2736072Z // begin inline asm 2026-02-21T10:22:24.2737681Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r10206,%r10207,%r10208,%r10209}, %rd5, %p107, 1, 1; 2026-02-21T10:22:24.2737833Z // end inline asm 2026-02-21T10:22:24.2737893Z // begin inline asm 2026-02-21T10:22:24.2739388Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r10338,%r10339,%r10340,%r10341}, %rd6, %p107, 1, 1; 2026-02-21T10:22:24.2739447Z // end inline asm 2026-02-21T10:22:24.2739503Z // begin inline asm 2026-02-21T10:22:24.2740988Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r10470,%r10471,%r10472,%r10473}, %rd7, %p107, 1, 1; 2026-02-21T10:22:24.2741129Z // end inline asm 2026-02-21T10:22:24.2741187Z // begin inline asm 2026-02-21T10:22:24.2742665Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r10602,%r10603,%r10604,%r10605}, %rd8, %p107, 1, 1; 2026-02-21T10:22:24.2742722Z // end inline asm 2026-02-21T10:22:24.2742783Z // begin inline asm 2026-02-21T10:22:24.2744326Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r10734,%r10735,%r10736,%r10737}, %rd9, %p107, 1, 1; 2026-02-21T10:22:24.2744446Z // end inline asm 2026-02-21T10:22:24.2744505Z // begin inline asm 2026-02-21T10:22:24.2745988Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537}, {%r10866,%r10867,%r10868,%r10869}, %rd10, %p107, 1, 1; 2026-02-21T10:22:24.2746050Z // end inline asm 2026-02-21T10:22:24.2746106Z // begin inline asm 2026-02-21T10:22:24.2747825Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r10998,%r10999,%r11000,%r11001}, %rd3, %p107, 1, 1; 2026-02-21T10:22:24.2747907Z // end inline asm 2026-02-21T10:22:24.2747965Z // begin inline asm 2026-02-21T10:22:24.2749550Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r11130,%r11131,%r11132,%r11133}, %rd4, %p107, 1, 1; 2026-02-21T10:22:24.2749613Z // end inline asm 2026-02-21T10:22:24.2749671Z // begin inline asm 2026-02-21T10:22:24.2751159Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r11262,%r11263,%r11264,%r11265}, %rd5, %p107, 1, 1; 2026-02-21T10:22:24.2751290Z // end inline asm 2026-02-21T10:22:24.2751346Z // begin inline asm 2026-02-21T10:22:24.2752883Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r11394,%r11395,%r11396,%r11397}, %rd6, %p107, 1, 1; 2026-02-21T10:22:24.2752944Z // end inline asm 2026-02-21T10:22:24.2753000Z // begin inline asm 2026-02-21T10:22:24.2754475Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r11526,%r11527,%r11528,%r11529}, %rd7, %p107, 1, 1; 2026-02-21T10:22:24.2754591Z // end inline asm 2026-02-21T10:22:24.2754648Z // begin inline asm 2026-02-21T10:22:24.2756173Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r11658,%r11659,%r11660,%r11661}, %rd8, %p107, 1, 1; 2026-02-21T10:22:24.2756243Z // end inline asm 2026-02-21T10:22:24.2756314Z // begin inline asm 2026-02-21T10:22:24.2757918Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r11790,%r11791,%r11792,%r11793}, %rd9, %p107, 1, 1; 2026-02-21T10:22:24.2757983Z // end inline asm 2026-02-21T10:22:24.2758042Z // begin inline asm 2026-02-21T10:22:24.2759530Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601}, {%r11922,%r11923,%r11924,%r11925}, %rd10, %p107, 1, 1; 2026-02-21T10:22:24.2759673Z // end inline asm 2026-02-21T10:22:24.2759748Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:24.2759810Z mov.b32 %r12054, %r39936; 2026-02-21T10:22:24.2759872Z mov.b32 %r12055, %r12056; 2026-02-21T10:22:24.2759929Z // begin inline asm 2026-02-21T10:22:24.2762527Z // wait for regs: 
%r42474,%r42475,%r42476,%r42477,%r42478,%r42479,%r42480,%r42481,%r42482,%r42483,%r42484,%r42485,%r42486,%r42487,%r42488,%r42489,%r42490,%r42491,%r42492,%r42493,%r42494,%r42495,%r42496,%r42497,%r42498,%r42499,%r42500,%r42501,%r42502,%r42503,%r42504,%r42505,%r42506,%r42507,%r42508,%r42509,%r42510,%r42511,%r42512,%r42513,%r42514,%r42515,%r42516,%r42517,%r42518,%r42519,%r42520,%r42521,%r42522,%r42523,%r42524,%r42525,%r42526,%r42527,%r42528,%r42529,%r42530,%r42531,%r42532,%r42533,%r42534,%r42535,%r42536,%r42537,%r42538,%r42539,%r42540,%r42541,%r42542,%r42543,%r42544,%r42545,%r42546,%r42547,%r42548,%r42549,%r42550,%r42551,%r42552,%r42553,%r42554,%r42555,%r42556,%r42557,%r42558,%r42559,%r42560,%r42561,%r42562,%r42563,%r42564,%r42565,%r42566,%r42567,%r42568,%r42569,%r42570,%r42571,%r42572,%r42573,%r42574,%r42575,%r42576,%r42577,%r42578,%r42579,%r42580,%r42581,%r42582,%r42583,%r42584,%r42585,%r42586,%r42587,%r42588,%r42589,%r42590,%r42591,%r42592,%r42593,%r42594,%r42595,%r42596,%r42597,%r42598,%r42599,%r42600,%r42601,%r12054,%r12055,%r12056 2026-02-21T10:22:24.2762676Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:24.2762731Z // end inline asm 2026-02-21T10:22:24.2762787Z $L__tmp8: 2026-02-21T10:22:24.2763004Z .loc 1 51 126 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:51:126 2026-02-21T10:22:24.2763070Z add.s64 %rd842, %rd842, 128; 2026-02-21T10:22:24.2763137Z setp.lt.u64 %p125, %rd843, 4064; 2026-02-21T10:22:24.2763198Z @%p125 bra $L__BB0_5; 2026-02-21T10:22:24.2763307Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:22:24.2763509Z .loc 1 97 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:97:28 2026-02-21T10:22:24.2763604Z cvt.rn.bf16x2.f32 %r12258, %r42475, %r42474; 2026-02-21T10:22:24.2763685Z cvt.rn.bf16x2.f32 %r12259, %r42477, %r42476; 2026-02-21T10:22:24.2763759Z cvt.rn.bf16x2.f32 %r12260, %r42479, %r42478; 2026-02-21T10:22:24.2763834Z cvt.rn.bf16x2.f32 %r12261, %r42481, %r42480; 2026-02-21T10:22:24.2763910Z cvt.rn.bf16x2.f32 
%r12262, %r42483, %r42482; 2026-02-21T10:22:24.2763985Z cvt.rn.bf16x2.f32 %r12263, %r42485, %r42484; 2026-02-21T10:22:24.2764126Z cvt.rn.bf16x2.f32 %r12264, %r42487, %r42486; 2026-02-21T10:22:24.2764206Z cvt.rn.bf16x2.f32 %r12265, %r42489, %r42488; 2026-02-21T10:22:24.2764292Z cvt.rn.bf16x2.f32 %r12266, %r42491, %r42490; 2026-02-21T10:22:24.2764369Z cvt.rn.bf16x2.f32 %r12267, %r42493, %r42492; 2026-02-21T10:22:24.2764445Z cvt.rn.bf16x2.f32 %r12268, %r42495, %r42494; 2026-02-21T10:22:24.2764519Z cvt.rn.bf16x2.f32 %r12269, %r42497, %r42496; 2026-02-21T10:22:24.2764592Z cvt.rn.bf16x2.f32 %r12270, %r42499, %r42498; 2026-02-21T10:22:24.2764676Z cvt.rn.bf16x2.f32 %r12271, %r42501, %r42500; 2026-02-21T10:22:24.2764751Z cvt.rn.bf16x2.f32 %r12272, %r42503, %r42502; 2026-02-21T10:22:24.2764823Z cvt.rn.bf16x2.f32 %r12273, %r42505, %r42504; 2026-02-21T10:22:24.2764896Z cvt.rn.bf16x2.f32 %r12274, %r42507, %r42506; 2026-02-21T10:22:24.2764972Z cvt.rn.bf16x2.f32 %r12275, %r42509, %r42508; 2026-02-21T10:22:24.2765047Z cvt.rn.bf16x2.f32 %r12276, %r42511, %r42510; 2026-02-21T10:22:24.2765119Z cvt.rn.bf16x2.f32 %r12277, %r42513, %r42512; 2026-02-21T10:22:24.2765196Z cvt.rn.bf16x2.f32 %r12278, %r42515, %r42514; 2026-02-21T10:22:24.2765269Z cvt.rn.bf16x2.f32 %r12279, %r42517, %r42516; 2026-02-21T10:22:24.2765341Z cvt.rn.bf16x2.f32 %r12280, %r42519, %r42518; 2026-02-21T10:22:24.2765416Z cvt.rn.bf16x2.f32 %r12281, %r42521, %r42520; 2026-02-21T10:22:24.2765489Z cvt.rn.bf16x2.f32 %r12282, %r42523, %r42522; 2026-02-21T10:22:24.2765562Z cvt.rn.bf16x2.f32 %r12283, %r42525, %r42524; 2026-02-21T10:22:24.2765708Z cvt.rn.bf16x2.f32 %r12284, %r42527, %r42526; 2026-02-21T10:22:24.2765785Z cvt.rn.bf16x2.f32 %r12285, %r42529, %r42528; 2026-02-21T10:22:24.2765857Z cvt.rn.bf16x2.f32 %r12286, %r42531, %r42530; 2026-02-21T10:22:24.2765931Z cvt.rn.bf16x2.f32 %r12287, %r42533, %r42532; 2026-02-21T10:22:24.2766013Z cvt.rn.bf16x2.f32 %r12288, %r42535, %r42534; 2026-02-21T10:22:24.2766088Z cvt.rn.bf16x2.f32 
%r12289, %r42537, %r42536; 2026-02-21T10:22:24.2766161Z cvt.rn.bf16x2.f32 %r12290, %r42539, %r42538; 2026-02-21T10:22:24.2766238Z cvt.rn.bf16x2.f32 %r12291, %r42541, %r42540; 2026-02-21T10:22:24.2766312Z cvt.rn.bf16x2.f32 %r12292, %r42543, %r42542; 2026-02-21T10:22:24.2766383Z cvt.rn.bf16x2.f32 %r12293, %r42545, %r42544; 2026-02-21T10:22:24.2766721Z cvt.rn.bf16x2.f32 %r12294, %r42547, %r42546; 2026-02-21T10:22:24.2766804Z cvt.rn.bf16x2.f32 %r12295, %r42549, %r42548; 2026-02-21T10:22:24.2766877Z cvt.rn.bf16x2.f32 %r12296, %r42551, %r42550; 2026-02-21T10:22:24.2766949Z cvt.rn.bf16x2.f32 %r12297, %r42553, %r42552; 2026-02-21T10:22:24.2767114Z cvt.rn.bf16x2.f32 %r12298, %r42555, %r42554; 2026-02-21T10:22:24.2767192Z cvt.rn.bf16x2.f32 %r12299, %r42557, %r42556; 2026-02-21T10:22:24.2767265Z cvt.rn.bf16x2.f32 %r12300, %r42559, %r42558; 2026-02-21T10:22:24.2767342Z cvt.rn.bf16x2.f32 %r12301, %r42561, %r42560; 2026-02-21T10:22:24.2767415Z cvt.rn.bf16x2.f32 %r12302, %r42563, %r42562; 2026-02-21T10:22:24.2767555Z cvt.rn.bf16x2.f32 %r12303, %r42565, %r42564; 2026-02-21T10:22:24.2767630Z cvt.rn.bf16x2.f32 %r12304, %r42567, %r42566; 2026-02-21T10:22:24.2767707Z cvt.rn.bf16x2.f32 %r12305, %r42569, %r42568; 2026-02-21T10:22:24.2767794Z cvt.rn.bf16x2.f32 %r12306, %r42571, %r42570; 2026-02-21T10:22:24.2767870Z cvt.rn.bf16x2.f32 %r12307, %r42573, %r42572; 2026-02-21T10:22:24.2767945Z cvt.rn.bf16x2.f32 %r12308, %r42575, %r42574; 2026-02-21T10:22:24.2768016Z cvt.rn.bf16x2.f32 %r12309, %r42577, %r42576; 2026-02-21T10:22:24.2768089Z cvt.rn.bf16x2.f32 %r12310, %r42579, %r42578; 2026-02-21T10:22:24.2768165Z cvt.rn.bf16x2.f32 %r12311, %r42581, %r42580; 2026-02-21T10:22:24.2768241Z cvt.rn.bf16x2.f32 %r12312, %r42583, %r42582; 2026-02-21T10:22:24.2768313Z cvt.rn.bf16x2.f32 %r12313, %r42585, %r42584; 2026-02-21T10:22:24.2768385Z cvt.rn.bf16x2.f32 %r12314, %r42587, %r42586; 2026-02-21T10:22:24.2768461Z cvt.rn.bf16x2.f32 %r12315, %r42589, %r42588; 2026-02-21T10:22:24.2768537Z cvt.rn.bf16x2.f32 
%r12316, %r42591, %r42590; 2026-02-21T10:22:24.2768610Z cvt.rn.bf16x2.f32 %r12317, %r42593, %r42592; 2026-02-21T10:22:24.2768767Z cvt.rn.bf16x2.f32 %r12318, %r42595, %r42594; 2026-02-21T10:22:24.2768843Z cvt.rn.bf16x2.f32 %r12319, %r42597, %r42596; 2026-02-21T10:22:24.2768915Z cvt.rn.bf16x2.f32 %r12320, %r42599, %r42598; 2026-02-21T10:22:24.2768987Z cvt.rn.bf16x2.f32 %r12321, %r42601, %r42600; 2026-02-21T10:22:24.2769196Z .loc 1 98 43 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:98:43 2026-02-21T10:22:24.2769254Z bar.sync 0; 2026-02-21T10:22:24.2769454Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r36], {%r12258, %r12259, %r12260, %r12261}; 2026-02-21T10:22:24.2769644Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r37], {%r12274, %r12275, %r12276, %r12277}; 2026-02-21T10:22:24.2769833Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r38], {%r12290, %r12291, %r12292, %r12293}; 2026-02-21T10:22:24.2770014Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r39], {%r12306, %r12307, %r12308, %r12309}; 2026-02-21T10:22:24.2770199Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r40], {%r12262, %r12263, %r12264, %r12265}; 2026-02-21T10:22:24.2770379Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r41], {%r12278, %r12279, %r12280, %r12281}; 2026-02-21T10:22:24.2770558Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r42], {%r12294, %r12295, %r12296, %r12297}; 2026-02-21T10:22:24.2770738Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r43], {%r12310, %r12311, %r12312, %r12313}; 2026-02-21T10:22:24.2770915Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r44], {%r12266, %r12267, %r12268, %r12269}; 2026-02-21T10:22:24.2771167Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r45], {%r12282, %r12283, %r12284, %r12285}; 2026-02-21T10:22:24.2771347Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r46], {%r12298, %r12299, %r12300, %r12301}; 2026-02-21T10:22:24.2771524Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r47], {%r12314, %r12315, %r12316, %r12317}; 2026-02-21T10:22:24.2771706Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r48], {%r12270, %r12271, %r12272, %r12273}; 2026-02-21T10:22:24.2771888Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r49], {%r12286, %r12287, %r12288, %r12289}; 2026-02-21T10:22:24.2772064Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r50], {%r12302, %r12303, %r12304, %r12305}; 2026-02-21T10:22:24.2772243Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r51], {%r12318, %r12319, %r12320, %r12321}; 2026-02-21T10:22:24.2772302Z // begin inline asm 2026-02-21T10:22:24.2772385Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.2772440Z // end inline asm 2026-02-21T10:22:24.2772496Z bar.sync 0; 2026-02-21T10:22:24.2772623Z elect.sync %r12322|%p128, -1; 2026-02-21T10:22:24.2772706Z shfl.sync.idx.b32 %r12323, %r5, 0, 31, -1; 2026-02-21T10:22:24.2772772Z and.pred %p126, %p405, %p128; 2026-02-21T10:22:24.2772836Z and.b32 %r12324, %r12323, 1; 2026-02-21T10:22:24.2772894Z shl.b32 %r12325, %r12324, 14; 2026-02-21T10:22:24.2772958Z add.s32 %r22280, %r39936, %r12325; 2026-02-21T10:22:24.2773066Z shl.b32 %r641, %r12324, 6; 2026-02-21T10:22:24.2773128Z or.b32 %r12254, %r641, %r9808; 2026-02-21T10:22:24.2773186Z // begin inline asm 2026-02-21T10:22:24.2773429Z @%p126 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd497, {%r12254, %r12255}], [%r22280]; 2026-02-21T10:22:24.2773486Z // end inline asm 2026-02-21T10:22:24.2773570Z cp.async.bulk.commit_group; 2026-02-21T10:22:24.2773644Z cp.async.bulk.wait_group.read 0; 2026-02-21T10:22:24.2773698Z bar.sync 0; 2026-02-21T10:22:24.2773898Z .loc 1 31 88 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:31:88 2026-02-21T10:22:24.2773963Z add.s32 %r12327, %r42472, 1; 2026-02-21T10:22:24.2774154Z .loc 1 37 35 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:37:35 2026-02-21T10:22:24.2774216Z shr.s32 %r12328, %r12327, 31; 2026-02-21T10:22:24.2774274Z shr.u32 %r12329, %r12328, 18; 2026-02-21T10:22:24.2774335Z add.s32 %r12330, %r12327, %r12329; 2026-02-21T10:22:24.2774398Z shr.s32 
%r12331, %r12330, 14; 2026-02-21T10:22:24.2774639Z .loc 1 38 33 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:38:33 2026-02-21T10:22:24.2774699Z shl.b32 %r12332, %r12331, 5; 2026-02-21T10:22:24.2774888Z .loc 1 39 39 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:39:39 2026-02-21T10:22:24.2774950Z sub.s32 %r12333, 10, %r12332; 2026-02-21T10:22:24.2775137Z .loc 1 39 52 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:39:52 2026-02-21T10:22:24.2775193Z min.s32 %r12334, %r12333, 32; 2026-02-21T10:22:24.2775388Z .loc 1 40 45 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:40:45 2026-02-21T10:22:24.2775451Z and.b32 %r12335, %r12330, -16384; 2026-02-21T10:22:24.2775510Z sub.s32 %r12336, %r12327, %r12335; 2026-02-21T10:22:24.2775700Z .loc 1 41 51 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:41:51 2026-02-21T10:22:24.2775761Z div.s32 %r12337, %r12336, %r12334; 2026-02-21T10:22:24.2775950Z .loc 1 40 64 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:40:64 2026-02-21T10:22:24.2776026Z mul.lo.s32 %r12338, %r12337, %r12334; 2026-02-21T10:22:24.2776084Z sub.s32 %r12339, %r12336, %r12338; 2026-02-21T10:22:24.2776272Z .loc 1 40 30 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:40:30 2026-02-21T10:22:24.2776331Z add.s32 %r12340, %r12339, %r12332; 2026-02-21T10:22:24.2776649Z .loc 1 42 27 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:42:27 2026-02-21T10:22:24.2776803Z shl.b32 %r19833, %r12340, 7; 2026-02-21T10:22:24.2777001Z .loc 1 43 27 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:43:27 2026-02-21T10:22:24.2777061Z shl.b32 %r22279, %r12337, 7; 2026-02-21T10:22:24.2777264Z .loc 1 51 126 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:51:126 2026-02-21T10:22:24.2777325Z or.b32 %r12341, %r42464, %r22279; 2026-02-21T10:22:24.2777385Z shl.b32 %r12342, %r12341, 13; 2026-02-21T10:22:24.2777451Z mul.wide.s32 %rd45, %r12342, 2; 2026-02-21T10:22:24.2777509Z or.b32 %r12343, 
%r42465, %r22279; 2026-02-21T10:22:24.2777568Z shl.b32 %r12344, %r12343, 13; 2026-02-21T10:22:24.2777634Z mul.wide.s32 %rd46, %r12344, 2; 2026-02-21T10:22:24.2777694Z or.b32 %r12345, %r42466, %r22279; 2026-02-21T10:22:24.2777753Z shl.b32 %r12346, %r12345, 13; 2026-02-21T10:22:24.2777819Z mul.wide.s32 %rd47, %r12346, 2; 2026-02-21T10:22:24.2777875Z or.b32 %r12347, %r42467, %r22279; 2026-02-21T10:22:24.2778002Z shl.b32 %r12348, %r12347, 13; 2026-02-21T10:22:24.2778069Z mul.wide.s32 %rd48, %r12348, 2; 2026-02-21T10:22:24.2778126Z or.b32 %r12349, %r42468, %r22279; 2026-02-21T10:22:24.2778183Z shl.b32 %r12350, %r12349, 13; 2026-02-21T10:22:24.2778243Z mul.wide.s32 %rd49, %r12350, 2; 2026-02-21T10:22:24.2778368Z or.b32 %r12351, %r42469, %r22279; 2026-02-21T10:22:24.2778438Z shl.b32 %r12352, %r12351, 13; 2026-02-21T10:22:24.2778501Z mul.wide.s32 %rd50, %r12352, 2; 2026-02-21T10:22:24.2778563Z shl.b32 %r12353, %r12337, 20; 2026-02-21T10:22:24.2778622Z or.b32 %r12354, %r42470, %r12353; 2026-02-21T10:22:24.2778682Z mul.wide.s32 %rd51, %r12354, 2; 2026-02-21T10:22:24.2778743Z or.b32 %r42730, %r68, %r12353; 2026-02-21T10:22:24.2778806Z or.b32 %r12355, %r42471, %r12353; 2026-02-21T10:22:24.2778865Z mul.wide.s32 %rd52, %r12355, 2; 2026-02-21T10:22:24.2778925Z mov.b32 %r42731, 0f00000000; 2026-02-21T10:22:24.2778988Z mov.b64 %rd845, -96; 2026-02-21T10:22:24.2779050Z mov.b64 %rd844, %rd11; 2026-02-21T10:22:24.2779111Z mov.b32 %r42732, %r42731; 2026-02-21T10:22:24.2779168Z mov.b32 %r42733, %r42731; 2026-02-21T10:22:24.2779230Z mov.b32 %r42734, %r42731; 2026-02-21T10:22:24.2779285Z mov.b32 %r42735, %r42731; 2026-02-21T10:22:24.2779353Z mov.b32 %r42736, %r42731; 2026-02-21T10:22:24.2779418Z mov.b32 %r42737, %r42731; 2026-02-21T10:22:24.2779476Z mov.b32 %r42738, %r42731; 2026-02-21T10:22:24.2779533Z mov.b32 %r42739, %r42731; 2026-02-21T10:22:24.2779594Z mov.b32 %r42740, %r42731; 2026-02-21T10:22:24.2779725Z mov.b32 %r42741, %r42731; 2026-02-21T10:22:24.2779783Z mov.b32 %r42742, 
%r42731; 2026-02-21T10:22:24.2779838Z mov.b32 %r42743, %r42731; 2026-02-21T10:22:24.2779896Z mov.b32 %r42744, %r42731; 2026-02-21T10:22:24.2779953Z mov.b32 %r42745, %r42731; 2026-02-21T10:22:24.2780008Z mov.b32 %r42746, %r42731; 2026-02-21T10:22:24.2780067Z mov.b32 %r42747, %r42731; 2026-02-21T10:22:24.2780124Z mov.b32 %r42748, %r42731; 2026-02-21T10:22:24.2780182Z mov.b32 %r42749, %r42731; 2026-02-21T10:22:24.2780239Z mov.b32 %r42750, %r42731; 2026-02-21T10:22:24.2780298Z mov.b32 %r42751, %r42731; 2026-02-21T10:22:24.2780354Z mov.b32 %r42752, %r42731; 2026-02-21T10:22:24.2780409Z mov.b32 %r42753, %r42731; 2026-02-21T10:22:24.2780470Z mov.b32 %r42754, %r42731; 2026-02-21T10:22:24.2780528Z mov.b32 %r42755, %r42731; 2026-02-21T10:22:24.2780583Z mov.b32 %r42756, %r42731; 2026-02-21T10:22:24.2780638Z mov.b32 %r42757, %r42731; 2026-02-21T10:22:24.2780697Z mov.b32 %r42758, %r42731; 2026-02-21T10:22:24.2780755Z mov.b32 %r42759, %r42731; 2026-02-21T10:22:24.2780811Z mov.b32 %r42760, %r42731; 2026-02-21T10:22:24.2780869Z mov.b32 %r42761, %r42731; 2026-02-21T10:22:24.2780925Z mov.b32 %r42762, %r42731; 2026-02-21T10:22:24.2780981Z mov.b32 %r42763, %r42731; 2026-02-21T10:22:24.2781037Z mov.b32 %r42764, %r42731; 2026-02-21T10:22:24.2781099Z mov.b32 %r42765, %r42731; 2026-02-21T10:22:24.2781156Z mov.b32 %r42766, %r42731; 2026-02-21T10:22:24.2781271Z mov.b32 %r42767, %r42731; 2026-02-21T10:22:24.2781330Z mov.b32 %r42768, %r42731; 2026-02-21T10:22:24.2781386Z mov.b32 %r42769, %r42731; 2026-02-21T10:22:24.2781453Z mov.b32 %r42770, %r42731; 2026-02-21T10:22:24.2781510Z mov.b32 %r42771, %r42731; 2026-02-21T10:22:24.2781570Z mov.b32 %r42772, %r42731; 2026-02-21T10:22:24.2781634Z mov.b32 %r42773, %r42731; 2026-02-21T10:22:24.2781689Z mov.b32 %r42774, %r42731; 2026-02-21T10:22:24.2781749Z mov.b32 %r42775, %r42731; 2026-02-21T10:22:24.2781808Z mov.b32 %r42776, %r42731; 2026-02-21T10:22:24.2781866Z mov.b32 %r42777, %r42731; 2026-02-21T10:22:24.2781922Z mov.b32 %r42778, %r42731; 
2026-02-21T10:22:24.2781982Z mov.b32 %r42779, %r42731; 2026-02-21T10:22:24.2782038Z mov.b32 %r42780, %r42731; 2026-02-21T10:22:24.2782093Z mov.b32 %r42781, %r42731; 2026-02-21T10:22:24.2782152Z mov.b32 %r42782, %r42731; 2026-02-21T10:22:24.2782209Z mov.b32 %r42783, %r42731; 2026-02-21T10:22:24.2782266Z mov.b32 %r42784, %r42731; 2026-02-21T10:22:24.2782325Z mov.b32 %r42785, %r42731; 2026-02-21T10:22:24.2782437Z mov.b32 %r42786, %r42731; 2026-02-21T10:22:24.2782496Z mov.b32 %r42787, %r42731; 2026-02-21T10:22:24.2782551Z mov.b32 %r42788, %r42731; 2026-02-21T10:22:24.2782610Z mov.b32 %r42789, %r42731; 2026-02-21T10:22:24.2782665Z mov.b32 %r42790, %r42731; 2026-02-21T10:22:24.2782780Z mov.b32 %r42791, %r42731; 2026-02-21T10:22:24.2782839Z mov.b32 %r42792, %r42731; 2026-02-21T10:22:24.2782896Z mov.b32 %r42793, %r42731; 2026-02-21T10:22:24.2782954Z mov.b32 %r42794, %r42731; 2026-02-21T10:22:24.2783011Z mov.b32 %r42795, %r42731; 2026-02-21T10:22:24.2783069Z mov.b32 %r42796, %r42731; 2026-02-21T10:22:24.2783125Z mov.b32 %r42797, %r42731; 2026-02-21T10:22:24.2783181Z mov.b32 %r42798, %r42731; 2026-02-21T10:22:24.2783238Z mov.b32 %r42799, %r42731; 2026-02-21T10:22:24.2783294Z mov.b32 %r42800, %r42731; 2026-02-21T10:22:24.2783349Z mov.b32 %r42801, %r42731; 2026-02-21T10:22:24.2783405Z mov.b32 %r42802, %r42731; 2026-02-21T10:22:24.2783469Z mov.b32 %r42803, %r42731; 2026-02-21T10:22:24.2783526Z mov.b32 %r42804, %r42731; 2026-02-21T10:22:24.2783581Z mov.b32 %r42805, %r42731; 2026-02-21T10:22:24.2783638Z mov.b32 %r42806, %r42731; 2026-02-21T10:22:24.2783694Z mov.b32 %r42807, %r42731; 2026-02-21T10:22:24.2783750Z mov.b32 %r42808, %r42731; 2026-02-21T10:22:24.2783808Z mov.b32 %r42809, %r42731; 2026-02-21T10:22:24.2783866Z mov.b32 %r42810, %r42731; 2026-02-21T10:22:24.2783921Z mov.b32 %r42811, %r42731; 2026-02-21T10:22:24.2784037Z mov.b32 %r42812, %r42731; 2026-02-21T10:22:24.2784100Z mov.b32 %r42813, %r42731; 2026-02-21T10:22:24.2784156Z mov.b32 %r42814, %r42731; 
2026-02-21T10:22:24.2784211Z mov.b32 %r42815, %r42731; 2026-02-21T10:22:24.2784268Z mov.b32 %r42816, %r42731; 2026-02-21T10:22:24.2784326Z mov.b32 %r42817, %r42731; 2026-02-21T10:22:24.2784383Z mov.b32 %r42818, %r42731; 2026-02-21T10:22:24.2784439Z mov.b32 %r42819, %r42731; 2026-02-21T10:22:24.2784499Z mov.b32 %r42820, %r42731; 2026-02-21T10:22:24.2784570Z mov.b32 %r42821, %r42731; 2026-02-21T10:22:24.2784628Z mov.b32 %r42822, %r42731; 2026-02-21T10:22:24.2784686Z mov.b32 %r42823, %r42731; 2026-02-21T10:22:24.2784743Z mov.b32 %r42824, %r42731; 2026-02-21T10:22:24.2784798Z mov.b32 %r42825, %r42731; 2026-02-21T10:22:24.2784853Z mov.b32 %r42826, %r42731; 2026-02-21T10:22:24.2784914Z mov.b32 %r42827, %r42731; 2026-02-21T10:22:24.2784969Z mov.b32 %r42828, %r42731; 2026-02-21T10:22:24.2785025Z mov.b32 %r42829, %r42731; 2026-02-21T10:22:24.2785084Z mov.b32 %r42830, %r42731; 2026-02-21T10:22:24.2785145Z mov.b32 %r42831, %r42731; 2026-02-21T10:22:24.2785200Z mov.b32 %r42832, %r42731; 2026-02-21T10:22:24.2785256Z mov.b32 %r42833, %r42731; 2026-02-21T10:22:24.2785315Z mov.b32 %r42834, %r42731; 2026-02-21T10:22:24.2785371Z mov.b32 %r42835, %r42731; 2026-02-21T10:22:24.2785427Z mov.b32 %r42836, %r42731; 2026-02-21T10:22:24.2785485Z mov.b32 %r42837, %r42731; 2026-02-21T10:22:24.2785542Z mov.b32 %r42838, %r42731; 2026-02-21T10:22:24.2785668Z mov.b32 %r42839, %r42731; 2026-02-21T10:22:24.2785726Z mov.b32 %r42840, %r42731; 2026-02-21T10:22:24.2785784Z mov.b32 %r42841, %r42731; 2026-02-21T10:22:24.2785839Z mov.b32 %r42842, %r42731; 2026-02-21T10:22:24.2785894Z mov.b32 %r42843, %r42731; 2026-02-21T10:22:24.2785953Z mov.b32 %r42844, %r42731; 2026-02-21T10:22:24.2786010Z mov.b32 %r42845, %r42731; 2026-02-21T10:22:24.2786067Z mov.b32 %r42846, %r42731; 2026-02-21T10:22:24.2786122Z mov.b32 %r42847, %r42731; 2026-02-21T10:22:24.2786182Z mov.b32 %r42848, %r42731; 2026-02-21T10:22:24.2786236Z mov.b32 %r42849, %r42731; 2026-02-21T10:22:24.2786300Z mov.b32 %r42850, %r42731; 
2026-02-21T10:22:24.2786359Z mov.b32 %r42851, %r42731; 2026-02-21T10:22:24.2786415Z mov.b32 %r42852, %r42731; 2026-02-21T10:22:24.2786586Z mov.b32 %r42853, %r42731; 2026-02-21T10:22:24.2786647Z mov.b32 %r42854, %r42731; 2026-02-21T10:22:24.2786705Z mov.b32 %r42855, %r42731; 2026-02-21T10:22:24.2786761Z mov.b32 %r42856, %r42731; 2026-02-21T10:22:24.2786896Z mov.b32 %r42857, %r42731; 2026-02-21T10:22:24.2786963Z mov.b32 %r42858, %r42731; 2026-02-21T10:22:24.2787078Z $L__BB0_7: // Parent Loop BB0_2 Depth=1 2026-02-21T10:22:24.2787187Z // => This Inner Loop Header: Depth=2 2026-02-21T10:22:24.2787395Z .loc 1 58 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:32 2026-02-21T10:22:24.2787553Z add.s64 %rd333, %rd844, %rd51; 2026-02-21T10:22:24.2787618Z add.s64 %rd336, %rd844, %rd50; 2026-02-21T10:22:24.2787681Z add.s64 %rd339, %rd844, %rd49; 2026-02-21T10:22:24.2787743Z add.s64 %rd342, %rd844, %rd48; 2026-02-21T10:22:24.2787801Z add.s64 %rd345, %rd844, %rd47; 2026-02-21T10:22:24.2787862Z add.s64 %rd348, %rd844, %rd46; 2026-02-21T10:22:24.2787925Z add.s64 %rd351, %rd844, %rd45; 2026-02-21T10:22:24.2788119Z .loc 1 58 80 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:80 2026-02-21T10:22:24.2788185Z add.s64 %rd354, %rd844, %rd52; 2026-02-21T10:22:24.2788243Z // begin inline asm 2026-02-21T10:22:24.2788304Z mov.u64 %rd332, 0x0; 2026-02-21T10:22:24.2788502Z createpolicy.fractional.L2::evict_first.b64 %rd332, 1.0; 2026-02-21T10:22:24.2788563Z // end inline asm 2026-02-21T10:22:24.2788625Z // begin inline asm 2026-02-21T10:22:24.2788686Z mov.u32 %r12356, 0x0; 2026-02-21T10:22:24.2788741Z mov.u32 %r12357, 0x0; 2026-02-21T10:22:24.2788799Z mov.u32 %r12358, 0x0; 2026-02-21T10:22:24.2788856Z mov.u32 %r12359, 0x0; 2026-02-21T10:22:24.2789169Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r12356, %r12357, %r12358, %r12359 }, [ %rd333 + 0 ], %rd332; 2026-02-21T10:22:24.2789240Z // end inline asm 2026-02-21T10:22:24.2789302Z // begin inline asm 
2026-02-21T10:22:24.2789360Z mov.u64 %rd335, 0x0; 2026-02-21T10:22:24.2789486Z createpolicy.fractional.L2::evict_first.b64 %rd335, 1.0; 2026-02-21T10:22:24.2789545Z // end inline asm 2026-02-21T10:22:24.2789603Z // begin inline asm 2026-02-21T10:22:24.2789666Z mov.u32 %r12360, 0x0; 2026-02-21T10:22:24.2789724Z mov.u32 %r12361, 0x0; 2026-02-21T10:22:24.2789782Z mov.u32 %r12362, 0x0; 2026-02-21T10:22:24.2789838Z mov.u32 %r12363, 0x0; 2026-02-21T10:22:24.2790071Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r12360, %r12361, %r12362, %r12363 }, [ %rd336 + 0 ], %rd335; 2026-02-21T10:22:24.2790132Z // end inline asm 2026-02-21T10:22:24.2790189Z // begin inline asm 2026-02-21T10:22:24.2790245Z mov.u64 %rd338, 0x0; 2026-02-21T10:22:24.2790370Z createpolicy.fractional.L2::evict_first.b64 %rd338, 1.0; 2026-02-21T10:22:24.2790425Z // end inline asm 2026-02-21T10:22:24.2790481Z // begin inline asm 2026-02-21T10:22:24.2790536Z mov.u32 %r12364, 0x0; 2026-02-21T10:22:24.2790595Z mov.u32 %r12365, 0x0; 2026-02-21T10:22:24.2790651Z mov.u32 %r12366, 0x0; 2026-02-21T10:22:24.2790705Z mov.u32 %r12367, 0x0; 2026-02-21T10:22:24.2790936Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r12364, %r12365, %r12366, %r12367 }, [ %rd339 + 0 ], %rd338; 2026-02-21T10:22:24.2791080Z // end inline asm 2026-02-21T10:22:24.2791137Z // begin inline asm 2026-02-21T10:22:24.2791196Z mov.u64 %rd341, 0x0; 2026-02-21T10:22:24.2791313Z createpolicy.fractional.L2::evict_first.b64 %rd341, 1.0; 2026-02-21T10:22:24.2791368Z // end inline asm 2026-02-21T10:22:24.2791423Z // begin inline asm 2026-02-21T10:22:24.2791484Z mov.u32 %r12368, 0x0; 2026-02-21T10:22:24.2791540Z mov.u32 %r12369, 0x0; 2026-02-21T10:22:24.2791594Z mov.u32 %r12370, 0x0; 2026-02-21T10:22:24.2791652Z mov.u32 %r12371, 0x0; 2026-02-21T10:22:24.2791876Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r12368, %r12369, %r12370, %r12371 }, [ %rd342 + 0 ], %rd341; 2026-02-21T10:22:24.2791934Z // end inline asm 2026-02-21T10:22:24.2791990Z 
// begin inline asm 2026-02-21T10:22:24.2792048Z mov.u64 %rd344, 0x0; 2026-02-21T10:22:24.2792167Z createpolicy.fractional.L2::evict_first.b64 %rd344, 1.0; 2026-02-21T10:22:24.2792221Z // end inline asm 2026-02-21T10:22:24.2792280Z // begin inline asm 2026-02-21T10:22:24.2792398Z mov.u32 %r12372, 0x0; 2026-02-21T10:22:24.2792459Z mov.u32 %r12373, 0x0; 2026-02-21T10:22:24.2792517Z mov.u32 %r12374, 0x0; 2026-02-21T10:22:24.2792572Z mov.u32 %r12375, 0x0; 2026-02-21T10:22:24.2792793Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r12372, %r12373, %r12374, %r12375 }, [ %rd345 + 0 ], %rd344; 2026-02-21T10:22:24.2792895Z // end inline asm 2026-02-21T10:22:24.2792955Z // begin inline asm 2026-02-21T10:22:24.2793010Z mov.u64 %rd347, 0x0; 2026-02-21T10:22:24.2793127Z createpolicy.fractional.L2::evict_first.b64 %rd347, 1.0; 2026-02-21T10:22:24.2793186Z // end inline asm 2026-02-21T10:22:24.2793243Z // begin inline asm 2026-02-21T10:22:24.2793299Z mov.u32 %r12376, 0x0; 2026-02-21T10:22:24.2793355Z mov.u32 %r12377, 0x0; 2026-02-21T10:22:24.2793416Z mov.u32 %r12378, 0x0; 2026-02-21T10:22:24.2793471Z mov.u32 %r12379, 0x0; 2026-02-21T10:22:24.2793688Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r12376, %r12377, %r12378, %r12379 }, [ %rd348 + 0 ], %rd347; 2026-02-21T10:22:24.2793753Z // end inline asm 2026-02-21T10:22:24.2793809Z // begin inline asm 2026-02-21T10:22:24.2793864Z mov.u64 %rd350, 0x0; 2026-02-21T10:22:24.2793982Z createpolicy.fractional.L2::evict_first.b64 %rd350, 1.0; 2026-02-21T10:22:24.2794037Z // end inline asm 2026-02-21T10:22:24.2794094Z // begin inline asm 2026-02-21T10:22:24.2794153Z mov.u32 %r12380, 0x0; 2026-02-21T10:22:24.2794212Z mov.u32 %r12381, 0x0; 2026-02-21T10:22:24.2794269Z mov.u32 %r12382, 0x0; 2026-02-21T10:22:24.2794375Z mov.u32 %r12383, 0x0; 2026-02-21T10:22:24.2794599Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r12380, %r12381, %r12382, %r12383 }, [ %rd351 + 0 ], %rd350; 2026-02-21T10:22:24.2794668Z // end inline asm 
2026-02-21T10:22:24.2794727Z // begin inline asm 2026-02-21T10:22:24.2794786Z mov.u64 %rd353, 0x0; 2026-02-21T10:22:24.2794900Z createpolicy.fractional.L2::evict_first.b64 %rd353, 1.0; 2026-02-21T10:22:24.2794954Z // end inline asm 2026-02-21T10:22:24.2795012Z // begin inline asm 2026-02-21T10:22:24.2795073Z mov.u32 %r12384, 0x0; 2026-02-21T10:22:24.2795129Z mov.u32 %r12385, 0x0; 2026-02-21T10:22:24.2795184Z mov.u32 %r12386, 0x0; 2026-02-21T10:22:24.2795241Z mov.u32 %r12387, 0x0; 2026-02-21T10:22:24.2795458Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r12384, %r12385, %r12386, %r12387 }, [ %rd354 + 0 ], %rd353; 2026-02-21T10:22:24.2795515Z // end inline asm 2026-02-21T10:22:24.2795724Z .loc 1 62 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:62:32 2026-02-21T10:22:24.2795782Z bar.sync 0; 2026-02-21T10:22:24.2795866Z st.shared.v2.b32 [%r10], {%r12356, %r12357}; 2026-02-21T10:22:24.2795957Z st.shared.v2.b32 [%r10+2048], {%r12360, %r12361}; 2026-02-21T10:22:24.2796044Z st.shared.v2.b32 [%r10+4096], {%r12364, %r12365}; 2026-02-21T10:22:24.2796137Z st.shared.v2.b32 [%r10+6144], {%r12368, %r12369}; 2026-02-21T10:22:24.2796221Z st.shared.v2.b32 [%r10+8192], {%r12372, %r12373}; 2026-02-21T10:22:24.2796371Z st.shared.v2.b32 [%r10+10240], {%r12376, %r12377}; 2026-02-21T10:22:24.2796588Z st.shared.v2.b32 [%r10+12288], {%r12380, %r12381}; 2026-02-21T10:22:24.2796680Z st.shared.v2.b32 [%r10+14336], {%r12384, %r12385}; 2026-02-21T10:22:24.2796759Z st.shared.v2.b32 [%r11], {%r12358, %r12359}; 2026-02-21T10:22:24.2796843Z st.shared.v2.b32 [%r11+2048], {%r12362, %r12363}; 2026-02-21T10:22:24.2796928Z st.shared.v2.b32 [%r11+4096], {%r12366, %r12367}; 2026-02-21T10:22:24.2797010Z st.shared.v2.b32 [%r11+6144], {%r12370, %r12371}; 2026-02-21T10:22:24.2797109Z st.shared.v2.b32 [%r11+8192], {%r12374, %r12375}; 2026-02-21T10:22:24.2797194Z st.shared.v2.b32 [%r11+10240], {%r12378, %r12379}; 2026-02-21T10:22:24.2797278Z st.shared.v2.b32 [%r11+12288], {%r12382, 
%r12383}; 2026-02-21T10:22:24.2797365Z st.shared.v2.b32 [%r11+14336], {%r12386, %r12387}; 2026-02-21T10:22:24.2797418Z bar.sync 0; 2026-02-21T10:22:24.2797488Z ld.shared.b16 %rs897, [%r52]; 2026-02-21T10:22:24.2797555Z ld.shared.b16 %rs898, [%r52+1024]; 2026-02-21T10:22:24.2797700Z ld.shared.b16 %rs899, [%r52+64]; 2026-02-21T10:22:24.2797768Z ld.shared.b16 %rs900, [%r52+1088]; 2026-02-21T10:22:24.2797831Z ld.shared.b16 %rs901, [%r52+8192]; 2026-02-21T10:22:24.2797897Z ld.shared.b16 %rs902, [%r52+9216]; 2026-02-21T10:22:24.2797958Z ld.shared.b16 %rs903, [%r52+8256]; 2026-02-21T10:22:24.2798086Z ld.shared.b16 %rs904, [%r52+9280]; 2026-02-21T10:22:24.2798156Z ld.shared.b16 %rs905, [%r53]; 2026-02-21T10:22:24.2798229Z ld.shared.b16 %rs906, [%r53+1024]; 2026-02-21T10:22:24.2798295Z ld.shared.b16 %rs907, [%r53+64]; 2026-02-21T10:22:24.2798357Z ld.shared.b16 %rs908, [%r53+1088]; 2026-02-21T10:22:24.2798423Z ld.shared.b16 %rs909, [%r53+8192]; 2026-02-21T10:22:24.2798485Z ld.shared.b16 %rs910, [%r53+9216]; 2026-02-21T10:22:24.2798546Z ld.shared.b16 %rs911, [%r53+8256]; 2026-02-21T10:22:24.2798609Z ld.shared.b16 %rs912, [%r53+9280]; 2026-02-21T10:22:24.2798671Z ld.shared.b16 %rs913, [%r54]; 2026-02-21T10:22:24.2798732Z ld.shared.b16 %rs914, [%r54+1024]; 2026-02-21T10:22:24.2798798Z ld.shared.b16 %rs915, [%r54+64]; 2026-02-21T10:22:24.2798863Z ld.shared.b16 %rs916, [%r54+1088]; 2026-02-21T10:22:24.2798925Z ld.shared.b16 %rs917, [%r54+8192]; 2026-02-21T10:22:24.2798986Z ld.shared.b16 %rs918, [%r54+9216]; 2026-02-21T10:22:24.2799048Z ld.shared.b16 %rs919, [%r54+8256]; 2026-02-21T10:22:24.2799115Z ld.shared.b16 %rs920, [%r54+9280]; 2026-02-21T10:22:24.2799178Z ld.shared.b16 %rs921, [%r55]; 2026-02-21T10:22:24.2799244Z ld.shared.b16 %rs922, [%r55+1024]; 2026-02-21T10:22:24.2799372Z ld.shared.b16 %rs923, [%r55+64]; 2026-02-21T10:22:24.2799438Z ld.shared.b16 %rs924, [%r55+1088]; 2026-02-21T10:22:24.2799500Z ld.shared.b16 %rs925, [%r55+8192]; 2026-02-21T10:22:24.2799563Z 
ld.shared.b16 %rs926, [%r55+9216]; 2026-02-21T10:22:24.2799625Z ld.shared.b16 %rs927, [%r55+8256]; 2026-02-21T10:22:24.2799686Z ld.shared.b16 %rs928, [%r55+9280]; 2026-02-21T10:22:24.2799748Z ld.shared.b16 %rs929, [%r56]; 2026-02-21T10:22:24.2799812Z ld.shared.b16 %rs930, [%r56+1024]; 2026-02-21T10:22:24.2799875Z ld.shared.b16 %rs931, [%r56+64]; 2026-02-21T10:22:24.2799936Z ld.shared.b16 %rs932, [%r56+1088]; 2026-02-21T10:22:24.2800001Z ld.shared.b16 %rs933, [%r56+8192]; 2026-02-21T10:22:24.2800073Z ld.shared.b16 %rs934, [%r56+9216]; 2026-02-21T10:22:24.2800137Z ld.shared.b16 %rs935, [%r56+8256]; 2026-02-21T10:22:24.2800205Z ld.shared.b16 %rs936, [%r56+9280]; 2026-02-21T10:22:24.2800266Z ld.shared.b16 %rs937, [%r57]; 2026-02-21T10:22:24.2800327Z ld.shared.b16 %rs938, [%r57+1024]; 2026-02-21T10:22:24.2800393Z ld.shared.b16 %rs939, [%r57+64]; 2026-02-21T10:22:24.2800457Z ld.shared.b16 %rs940, [%r57+1088]; 2026-02-21T10:22:24.2800519Z ld.shared.b16 %rs941, [%r57+8192]; 2026-02-21T10:22:24.2800580Z ld.shared.b16 %rs942, [%r57+9216]; 2026-02-21T10:22:24.2800644Z ld.shared.b16 %rs943, [%r57+8256]; 2026-02-21T10:22:24.2800704Z ld.shared.b16 %rs944, [%r57+9280]; 2026-02-21T10:22:24.2800765Z ld.shared.b16 %rs945, [%r58]; 2026-02-21T10:22:24.2800901Z ld.shared.b16 %rs946, [%r58+1024]; 2026-02-21T10:22:24.2800963Z ld.shared.b16 %rs947, [%r58+64]; 2026-02-21T10:22:24.2801024Z ld.shared.b16 %rs948, [%r58+1088]; 2026-02-21T10:22:24.2801084Z ld.shared.b16 %rs949, [%r58+8192]; 2026-02-21T10:22:24.2801149Z ld.shared.b16 %rs950, [%r58+9216]; 2026-02-21T10:22:24.2801209Z ld.shared.b16 %rs951, [%r58+8256]; 2026-02-21T10:22:24.2801272Z ld.shared.b16 %rs952, [%r58+9280]; 2026-02-21T10:22:24.2801334Z ld.shared.b16 %rs953, [%r59]; 2026-02-21T10:22:24.2801398Z ld.shared.b16 %rs954, [%r59+1024]; 2026-02-21T10:22:24.2801459Z ld.shared.b16 %rs955, [%r59+64]; 2026-02-21T10:22:24.2801521Z ld.shared.b16 %rs956, [%r59+1088]; 2026-02-21T10:22:24.2801586Z ld.shared.b16 %rs957, [%r59+8192]; 
2026-02-21T10:22:24.2801647Z ld.shared.b16 %rs958, [%r59+9216]; 2026-02-21T10:22:24.2801708Z ld.shared.b16 %rs959, [%r59+8256]; 2026-02-21T10:22:24.2801774Z ld.shared.b16 %rs960, [%r59+9280]; 2026-02-21T10:22:24.2801835Z cvt.f32.bf16 %r12525, %rs897; 2026-02-21T10:22:24.2801957Z cvt.f32.bf16 %r12526, %rs898; 2026-02-21T10:22:24.2802025Z cvt.f32.bf16 %r12527, %rs905; 2026-02-21T10:22:24.2802083Z cvt.f32.bf16 %r12528, %rs906; 2026-02-21T10:22:24.2802142Z cvt.f32.bf16 %r12657, %rs913; 2026-02-21T10:22:24.2802199Z cvt.f32.bf16 %r12658, %rs914; 2026-02-21T10:22:24.2802261Z cvt.f32.bf16 %r12659, %rs921; 2026-02-21T10:22:24.2802367Z cvt.f32.bf16 %r12660, %rs922; 2026-02-21T10:22:24.2802425Z cvt.f32.bf16 %r12789, %rs929; 2026-02-21T10:22:24.2802485Z cvt.f32.bf16 %r12790, %rs930; 2026-02-21T10:22:24.2802545Z cvt.f32.bf16 %r12791, %rs937; 2026-02-21T10:22:24.2802603Z cvt.f32.bf16 %r12792, %rs938; 2026-02-21T10:22:24.2802661Z cvt.f32.bf16 %r12921, %rs945; 2026-02-21T10:22:24.2802722Z cvt.f32.bf16 %r12922, %rs946; 2026-02-21T10:22:24.2802779Z cvt.f32.bf16 %r12923, %rs953; 2026-02-21T10:22:24.2802837Z cvt.f32.bf16 %r12924, %rs954; 2026-02-21T10:22:24.2802897Z cvt.f32.bf16 %r13053, %rs899; 2026-02-21T10:22:24.2802956Z cvt.f32.bf16 %r13054, %rs900; 2026-02-21T10:22:24.2803020Z cvt.f32.bf16 %r13055, %rs907; 2026-02-21T10:22:24.2803079Z cvt.f32.bf16 %r13056, %rs908; 2026-02-21T10:22:24.2803139Z cvt.f32.bf16 %r13185, %rs915; 2026-02-21T10:22:24.2803196Z cvt.f32.bf16 %r13186, %rs916; 2026-02-21T10:22:24.2803254Z cvt.f32.bf16 %r13187, %rs923; 2026-02-21T10:22:24.2803314Z cvt.f32.bf16 %r13188, %rs924; 2026-02-21T10:22:24.2803376Z cvt.f32.bf16 %r13317, %rs931; 2026-02-21T10:22:24.2803434Z cvt.f32.bf16 %r13318, %rs932; 2026-02-21T10:22:24.2803553Z cvt.f32.bf16 %r13319, %rs939; 2026-02-21T10:22:24.2803620Z cvt.f32.bf16 %r13320, %rs940; 2026-02-21T10:22:24.2803677Z cvt.f32.bf16 %r13449, %rs947; 2026-02-21T10:22:24.2803736Z cvt.f32.bf16 %r13450, %rs948; 2026-02-21T10:22:24.2803796Z 
cvt.f32.bf16 %r13451, %rs955; 2026-02-21T10:22:24.2803853Z cvt.f32.bf16 %r13452, %rs956; 2026-02-21T10:22:24.2803911Z cvt.f32.bf16 %r13581, %rs901; 2026-02-21T10:22:24.2803972Z cvt.f32.bf16 %r13582, %rs902; 2026-02-21T10:22:24.2804032Z cvt.f32.bf16 %r13583, %rs909; 2026-02-21T10:22:24.2804092Z cvt.f32.bf16 %r13584, %rs910; 2026-02-21T10:22:24.2804150Z cvt.f32.bf16 %r13713, %rs917; 2026-02-21T10:22:24.2804219Z cvt.f32.bf16 %r13714, %rs918; 2026-02-21T10:22:24.2804283Z cvt.f32.bf16 %r13715, %rs925; 2026-02-21T10:22:24.2804342Z cvt.f32.bf16 %r13716, %rs926; 2026-02-21T10:22:24.2804405Z cvt.f32.bf16 %r13845, %rs933; 2026-02-21T10:22:24.2804463Z cvt.f32.bf16 %r13846, %rs934; 2026-02-21T10:22:24.2804522Z cvt.f32.bf16 %r13847, %rs941; 2026-02-21T10:22:24.2804584Z cvt.f32.bf16 %r13848, %rs942; 2026-02-21T10:22:24.2804645Z cvt.f32.bf16 %r13977, %rs949; 2026-02-21T10:22:24.2804705Z cvt.f32.bf16 %r13978, %rs950; 2026-02-21T10:22:24.2804762Z cvt.f32.bf16 %r13979, %rs957; 2026-02-21T10:22:24.2804823Z cvt.f32.bf16 %r13980, %rs958; 2026-02-21T10:22:24.2804882Z cvt.f32.bf16 %r14109, %rs903; 2026-02-21T10:22:24.2804940Z cvt.f32.bf16 %r14110, %rs904; 2026-02-21T10:22:24.2804999Z cvt.f32.bf16 %r14111, %rs911; 2026-02-21T10:22:24.2805131Z cvt.f32.bf16 %r14112, %rs912; 2026-02-21T10:22:24.2805192Z cvt.f32.bf16 %r14241, %rs919; 2026-02-21T10:22:24.2805250Z cvt.f32.bf16 %r14242, %rs920; 2026-02-21T10:22:24.2805310Z cvt.f32.bf16 %r14243, %rs927; 2026-02-21T10:22:24.2805369Z cvt.f32.bf16 %r14244, %rs928; 2026-02-21T10:22:24.2805426Z cvt.f32.bf16 %r14373, %rs935; 2026-02-21T10:22:24.2805489Z cvt.f32.bf16 %r14374, %rs936; 2026-02-21T10:22:24.2805548Z cvt.f32.bf16 %r14375, %rs943; 2026-02-21T10:22:24.2805606Z cvt.f32.bf16 %r14376, %rs944; 2026-02-21T10:22:24.2805668Z cvt.f32.bf16 %r14505, %rs951; 2026-02-21T10:22:24.2805729Z cvt.f32.bf16 %r14506, %rs952; 2026-02-21T10:22:24.2805787Z cvt.f32.bf16 %r14507, %rs959; 2026-02-21T10:22:24.2805844Z cvt.f32.bf16 %r14508, %rs960; 
2026-02-21T10:22:24.2806058Z .loc 1 64 33 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:64:33 2026-02-21T10:22:24.2806112Z bar.sync 0; 2026-02-21T10:22:24.2806180Z // begin inline asm 2026-02-21T10:22:24.2806353Z @%p313 mbarrier.init.shared::cta.b64 [%r29850], 1; 2026-02-21T10:22:24.2806414Z // end inline asm 2026-02-21T10:22:24.2806593Z bar.sync 0; 2026-02-21T10:22:24.2806655Z // begin inline asm 2026-02-21T10:22:24.2806793Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r29850], 4096; 2026-02-21T10:22:24.2806850Z // end inline asm 2026-02-21T10:22:24.2806999Z // begin inline asm 2026-02-21T10:22:24.2807075Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.2807132Z // end inline asm 2026-02-21T10:22:24.2807184Z bar.sync 0; 2026-02-21T10:22:24.2807250Z elect.sync %r19601|%p190, -1; 2026-02-21T10:22:24.2807321Z and.pred %p131, %p1, %p190; 2026-02-21T10:22:24.2807383Z add.s64 %rd55, %rd845, 96; 2026-02-21T10:22:24.2807444Z cvt.u32.u64 %r12392, %rd55; 2026-02-21T10:22:24.2807502Z // begin inline asm 2026-02-21T10:22:24.2807845Z @%p131 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39936], [%rd822, {%r19833, %r12392}], [%r29850]; 2026-02-21T10:22:24.2807903Z // end inline asm 2026-02-21T10:22:24.2807957Z bar.sync 0; 2026-02-21T10:22:24.2808017Z mov.b32 %r19468, 0; 2026-02-21T10:22:24.2808072Z // begin inline asm 2026-02-21T10:22:24.2808123Z 2026-02-21T10:22:24.2808173Z { 2026-02-21T10:22:24.2808236Z .reg .pred complete; 2026-02-21T10:22:24.2808291Z waitLoop: 2026-02-21T10:22:24.2808438Z mbarrier.try_wait.parity.shared.b64 complete, [%r29850], %r19468; 2026-02-21T10:22:24.2808521Z @!complete bra.uni waitLoop; 2026-02-21T10:22:24.2808574Z } 2026-02-21T10:22:24.2808579Z 2026-02-21T10:22:24.2808705Z // end inline asm 2026-02-21T10:22:24.2808766Z bar.sync 0; 2026-02-21T10:22:24.2808823Z // begin inline asm 2026-02-21T10:22:24.2808921Z @%p313 mbarrier.inval.shared::cta.b64 [%r29850]; 2026-02-21T10:22:24.2808978Z // end inline 
asm 2026-02-21T10:22:24.2809181Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2809248Z ld.shared.s8 %rs961, [%r20]; 2026-02-21T10:22:24.2809447Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2809515Z shl.b16 %rs962, %rs961, 4; 2026-02-21T10:22:24.2809702Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2809778Z ld.shared.s8 %rs963, [%r21+128]; 2026-02-21T10:22:24.2809974Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2810034Z shl.b16 %rs964, %rs963, 4; 2026-02-21T10:22:24.2810224Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2810290Z ld.shared.s8 %rs965, [%r22+256]; 2026-02-21T10:22:24.2810477Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2810536Z shl.b16 %rs966, %rs965, 4; 2026-02-21T10:22:24.2810726Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2810871Z ld.shared.s8 %rs967, [%r23+384]; 2026-02-21T10:22:24.2811060Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2811119Z shl.b16 %rs968, %rs967, 4; 2026-02-21T10:22:24.2811307Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2811372Z ld.shared.s8 %rs969, [%r24+512]; 2026-02-21T10:22:24.2811561Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2811624Z shl.b16 %rs970, %rs969, 4; 2026-02-21T10:22:24.2811810Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2811875Z ld.shared.s8 %rs971, [%r25+640]; 2026-02-21T10:22:24.2812062Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2812124Z shl.b16 %rs972, %rs971, 4; 
2026-02-21T10:22:24.2812376Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2812444Z ld.shared.s8 %rs973, [%r26+768]; 2026-02-21T10:22:24.2812635Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2812745Z shl.b16 %rs974, %rs973, 4; 2026-02-21T10:22:24.2812933Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2813016Z ld.shared.s8 %rs975, [%r27+896]; 2026-02-21T10:22:24.2813206Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2813269Z shl.b16 %rs976, %rs975, 4; 2026-02-21T10:22:24.2813466Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2813537Z ld.shared.s8 %rs977, [%r20+1024]; 2026-02-21T10:22:24.2813732Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2813801Z shl.b16 %rs978, %rs977, 4; 2026-02-21T10:22:24.2813994Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2814062Z ld.shared.s8 %rs979, [%r21+1152]; 2026-02-21T10:22:24.2814262Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2814378Z shl.b16 %rs980, %rs979, 4; 2026-02-21T10:22:24.2814570Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2814639Z ld.shared.s8 %rs981, [%r22+1280]; 2026-02-21T10:22:24.2814835Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2814896Z shl.b16 %rs982, %rs981, 4; 2026-02-21T10:22:24.2815089Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2815166Z ld.shared.s8 %rs983, [%r23+1408]; 2026-02-21T10:22:24.2815357Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2815420Z shl.b16 %rs984, %rs983, 4; 
2026-02-21T10:22:24.2815615Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2815685Z ld.shared.s8 %rs985, [%r24+1536]; 2026-02-21T10:22:24.2815878Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2815945Z shl.b16 %rs986, %rs985, 4; 2026-02-21T10:22:24.2816133Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2816199Z ld.shared.s8 %rs987, [%r25+1664]; 2026-02-21T10:22:24.2816387Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2816639Z shl.b16 %rs988, %rs987, 4; 2026-02-21T10:22:24.2816843Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2816911Z ld.shared.s8 %rs989, [%r26+1792]; 2026-02-21T10:22:24.2817111Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2817174Z shl.b16 %rs990, %rs989, 4; 2026-02-21T10:22:24.2817365Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2817434Z ld.shared.s8 %rs991, [%r27+1920]; 2026-02-21T10:22:24.2817623Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2817685Z shl.b16 %rs992, %rs991, 4; 2026-02-21T10:22:24.2817879Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2817957Z ld.shared.s8 %rs993, [%r20+2048]; 2026-02-21T10:22:24.2818227Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2818291Z shl.b16 %rs994, %rs993, 4; 2026-02-21T10:22:24.2818488Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2818617Z ld.shared.s8 %rs995, [%r21+2176]; 2026-02-21T10:22:24.2818807Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2818876Z shl.b16 %rs996, %rs995, 4; 
2026-02-21T10:22:24.2819065Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2819131Z ld.shared.s8 %rs997, [%r22+2304]; 2026-02-21T10:22:24.2819325Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2819387Z shl.b16 %rs998, %rs997, 4; 2026-02-21T10:22:24.2819578Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2819650Z ld.shared.s8 %rs999, [%r23+2432]; 2026-02-21T10:22:24.2819840Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2819904Z shl.b16 %rs1000, %rs999, 4; 2026-02-21T10:22:24.2820098Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2820174Z ld.shared.s8 %rs1001, [%r24+2560]; 2026-02-21T10:22:24.2820443Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2820512Z shl.b16 %rs1002, %rs1001, 4; 2026-02-21T10:22:24.2820709Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2820776Z ld.shared.s8 %rs1003, [%r25+2688]; 2026-02-21T10:22:24.2820966Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2821051Z shl.b16 %rs1004, %rs1003, 4; 2026-02-21T10:22:24.2821246Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2821313Z ld.shared.s8 %rs1005, [%r26+2816]; 2026-02-21T10:22:24.2821510Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2821575Z shl.b16 %rs1006, %rs1005, 4; 2026-02-21T10:22:24.2821767Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2821838Z ld.shared.s8 %rs1007, [%r27+2944]; 2026-02-21T10:22:24.2822037Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2822099Z shl.b16 %rs1008, 
%rs1007, 4; 2026-02-21T10:22:24.2822291Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2822429Z ld.shared.s8 %rs1009, [%r20+3072]; 2026-02-21T10:22:24.2822621Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2822683Z shl.b16 %rs1010, %rs1009, 4; 2026-02-21T10:22:24.2822883Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2822952Z ld.shared.s8 %rs1011, [%r21+3200]; 2026-02-21T10:22:24.2823154Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2823297Z shl.b16 %rs1012, %rs1011, 4; 2026-02-21T10:22:24.2823537Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2823639Z ld.shared.s8 %rs1013, [%r22+3328]; 2026-02-21T10:22:24.2823953Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2824109Z shl.b16 %rs1014, %rs1013, 4; 2026-02-21T10:22:24.2824397Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2824516Z ld.shared.s8 %rs1015, [%r23+3456]; 2026-02-21T10:22:24.2824781Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2824908Z shl.b16 %rs1016, %rs1015, 4; 2026-02-21T10:22:24.2825185Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2825342Z ld.shared.s8 %rs1017, [%r24+3584]; 2026-02-21T10:22:24.2825592Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2825691Z shl.b16 %rs1018, %rs1017, 4; 2026-02-21T10:22:24.2825951Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2826031Z ld.shared.s8 %rs1019, [%r25+3712]; 2026-02-21T10:22:24.2826311Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2826622Z 
shl.b16 %rs1020, %rs1019, 4; 2026-02-21T10:22:24.2826879Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2826981Z ld.shared.s8 %rs1021, [%r26+3840]; 2026-02-21T10:22:24.2827214Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2827333Z shl.b16 %rs1022, %rs1021, 4; 2026-02-21T10:22:24.2827705Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2827830Z ld.shared.s8 %rs1023, [%r27+3968]; 2026-02-21T10:22:24.2828128Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2828223Z shl.b16 %rs1024, %rs1023, 4; 2026-02-21T10:22:24.2828517Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2828645Z cvt.s16.s8 %rs1025, %rs962; 2026-02-21T10:22:24.2828799Z shr.s16 %rs1026, %rs1025, 4; 2026-02-21T10:22:24.2828928Z cvt.s16.s8 %rs1027, %rs964; 2026-02-21T10:22:24.2829060Z shr.s16 %rs1028, %rs1027, 4; 2026-02-21T10:22:24.2829155Z shr.s16 %rs1029, %rs961, 4; 2026-02-21T10:22:24.2829251Z shr.s16 %rs1030, %rs963, 4; 2026-02-21T10:22:24.2829479Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2829653Z cvt.rn.f32.s16 %r19602, %rs1030; 2026-02-21T10:22:24.2829785Z cvt.rn.f32.s16 %r19603, %rs1029; 2026-02-21T10:22:24.2829884Z cvt.rn.f32.s16 %r19604, %rs1028; 2026-02-21T10:22:24.2830014Z cvt.rn.f32.s16 %r19605, %rs1026; 2026-02-21T10:22:24.2830240Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2830335Z cvt.s16.s8 %rs1031, %rs966; 2026-02-21T10:22:24.2830497Z shr.s16 %rs1032, %rs1031, 4; 2026-02-21T10:22:24.2830711Z cvt.s16.s8 %rs1033, %rs968; 2026-02-21T10:22:24.2830818Z shr.s16 %rs1034, %rs1033, 4; 2026-02-21T10:22:24.2830912Z shr.s16 %rs1035, %rs965, 4; 2026-02-21T10:22:24.2831050Z shr.s16 %rs1036, %rs967, 4; 2026-02-21T10:22:24.2831277Z .loc 1 87 32 
// c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2831362Z cvt.rn.f32.s16 %r19606, %rs1036; 2026-02-21T10:22:24.2831579Z cvt.rn.f32.s16 %r19607, %rs1035; 2026-02-21T10:22:24.2831681Z cvt.rn.f32.s16 %r19608, %rs1034; 2026-02-21T10:22:24.2831780Z cvt.rn.f32.s16 %r19609, %rs1032; 2026-02-21T10:22:24.2832043Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2832139Z cvt.s16.s8 %rs1037, %rs970; 2026-02-21T10:22:24.2832233Z shr.s16 %rs1038, %rs1037, 4; 2026-02-21T10:22:24.2832373Z cvt.s16.s8 %rs1039, %rs972; 2026-02-21T10:22:24.2832525Z shr.s16 %rs1040, %rs1039, 4; 2026-02-21T10:22:24.2832622Z shr.s16 %rs1041, %rs969, 4; 2026-02-21T10:22:24.2832790Z shr.s16 %rs1042, %rs971, 4; 2026-02-21T10:22:24.2833078Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2833161Z cvt.rn.f32.s16 %r19610, %rs1042; 2026-02-21T10:22:24.2833303Z cvt.rn.f32.s16 %r19611, %rs1041; 2026-02-21T10:22:24.2833526Z cvt.rn.f32.s16 %r19612, %rs1040; 2026-02-21T10:22:24.2833619Z cvt.rn.f32.s16 %r19613, %rs1038; 2026-02-21T10:22:24.2833848Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2833996Z cvt.s16.s8 %rs1043, %rs974; 2026-02-21T10:22:24.2834077Z shr.s16 %rs1044, %rs1043, 4; 2026-02-21T10:22:24.2834220Z cvt.s16.s8 %rs1045, %rs976; 2026-02-21T10:22:24.2834342Z shr.s16 %rs1046, %rs1045, 4; 2026-02-21T10:22:24.2834477Z shr.s16 %rs1047, %rs973, 4; 2026-02-21T10:22:24.2834592Z shr.s16 %rs1048, %rs975, 4; 2026-02-21T10:22:24.2834827Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2834948Z cvt.rn.f32.s16 %r19614, %rs1048; 2026-02-21T10:22:24.2835097Z cvt.rn.f32.s16 %r19615, %rs1047; 2026-02-21T10:22:24.2835211Z cvt.rn.f32.s16 %r19616, %rs1046; 2026-02-21T10:22:24.2835346Z cvt.rn.f32.s16 %r19617, %rs1044; 2026-02-21T10:22:24.2835599Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2835707Z cvt.s16.s8 %rs1049, %rs978; 2026-02-21T10:22:24.2835864Z shr.s16 %rs1050, %rs1049, 4; 2026-02-21T10:22:24.2836032Z cvt.s16.s8 %rs1051, %rs980; 2026-02-21T10:22:24.2836142Z shr.s16 %rs1052, %rs1051, 4; 2026-02-21T10:22:24.2836240Z shr.s16 %rs1053, %rs977, 4; 2026-02-21T10:22:24.2836384Z shr.s16 %rs1054, %rs979, 4; 2026-02-21T10:22:24.2836761Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2836862Z cvt.rn.f32.s16 %r19618, %rs1054; 2026-02-21T10:22:24.2837038Z cvt.rn.f32.s16 %r19619, %rs1053; 2026-02-21T10:22:24.2837156Z cvt.rn.f32.s16 %r19620, %rs1052; 2026-02-21T10:22:24.2837278Z cvt.rn.f32.s16 %r19621, %rs1050; 2026-02-21T10:22:24.2837514Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2837654Z cvt.s16.s8 %rs1055, %rs982; 2026-02-21T10:22:24.2837754Z shr.s16 %rs1056, %rs1055, 4; 2026-02-21T10:22:24.2837835Z cvt.s16.s8 %rs1057, %rs984; 2026-02-21T10:22:24.2838027Z shr.s16 %rs1058, %rs1057, 4; 2026-02-21T10:22:24.2838146Z shr.s16 %rs1059, %rs981, 4; 2026-02-21T10:22:24.2838242Z shr.s16 %rs1060, %rs983, 4; 2026-02-21T10:22:24.2838507Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2838603Z cvt.rn.f32.s16 %r19622, %rs1060; 2026-02-21T10:22:24.2838684Z cvt.rn.f32.s16 %r19623, %rs1059; 2026-02-21T10:22:24.2838825Z cvt.rn.f32.s16 %r19624, %rs1058; 2026-02-21T10:22:24.2839088Z cvt.rn.f32.s16 %r19625, %rs1056; 2026-02-21T10:22:24.2839324Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2839420Z cvt.s16.s8 %rs1061, %rs986; 2026-02-21T10:22:24.2839554Z shr.s16 %rs1062, %rs1061, 4; 2026-02-21T10:22:24.2839632Z cvt.s16.s8 %rs1063, %rs988; 2026-02-21T10:22:24.2839778Z shr.s16 %rs1064, %rs1063, 4; 2026-02-21T10:22:24.2839950Z shr.s16 %rs1065, %rs985, 4; 2026-02-21T10:22:24.2840045Z shr.s16 
%rs1066, %rs987, 4; 2026-02-21T10:22:24.2840275Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2840370Z cvt.rn.f32.s16 %r19626, %rs1066; 2026-02-21T10:22:24.2840489Z cvt.rn.f32.s16 %r19627, %rs1065; 2026-02-21T10:22:24.2840647Z cvt.rn.f32.s16 %r19628, %rs1064; 2026-02-21T10:22:24.2840756Z cvt.rn.f32.s16 %r19629, %rs1062; 2026-02-21T10:22:24.2841018Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2841197Z cvt.s16.s8 %rs1067, %rs990; 2026-02-21T10:22:24.2841300Z shr.s16 %rs1068, %rs1067, 4; 2026-02-21T10:22:24.2841433Z cvt.s16.s8 %rs1069, %rs992; 2026-02-21T10:22:24.2841579Z shr.s16 %rs1070, %rs1069, 4; 2026-02-21T10:22:24.2841690Z shr.s16 %rs1071, %rs989, 4; 2026-02-21T10:22:24.2841821Z shr.s16 %rs1072, %rs991, 4; 2026-02-21T10:22:24.2842119Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2842217Z cvt.rn.f32.s16 %r19630, %rs1072; 2026-02-21T10:22:24.2842333Z cvt.rn.f32.s16 %r19631, %rs1071; 2026-02-21T10:22:24.2842617Z cvt.rn.f32.s16 %r19632, %rs1070; 2026-02-21T10:22:24.2842732Z cvt.rn.f32.s16 %r19633, %rs1068; 2026-02-21T10:22:24.2842961Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2843096Z cvt.s16.s8 %rs1073, %rs994; 2026-02-21T10:22:24.2843210Z shr.s16 %rs1074, %rs1073, 4; 2026-02-21T10:22:24.2843306Z cvt.s16.s8 %rs1075, %rs996; 2026-02-21T10:22:24.2843466Z shr.s16 %rs1076, %rs1075, 4; 2026-02-21T10:22:24.2843577Z shr.s16 %rs1077, %rs993, 4; 2026-02-21T10:22:24.2843673Z shr.s16 %rs1078, %rs995, 4; 2026-02-21T10:22:24.2843898Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2844053Z cvt.rn.f32.s16 %r19634, %rs1078; 2026-02-21T10:22:24.2844149Z cvt.rn.f32.s16 %r19635, %rs1077; 2026-02-21T10:22:24.2844229Z cvt.rn.f32.s16 %r19636, %rs1076; 2026-02-21T10:22:24.2844493Z cvt.rn.f32.s16 %r19637, %rs1074; 
2026-02-21T10:22:24.2844736Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2844833Z cvt.s16.s8 %rs1079, %rs998; 2026-02-21T10:22:24.2844979Z shr.s16 %rs1080, %rs1079, 4; 2026-02-21T10:22:24.2845077Z cvt.s16.s8 %rs1081, %rs1000; 2026-02-21T10:22:24.2845155Z shr.s16 %rs1082, %rs1081, 4; 2026-02-21T10:22:24.2845303Z shr.s16 %rs1083, %rs997, 4; 2026-02-21T10:22:24.2845461Z shr.s16 %rs1084, %rs999, 4; 2026-02-21T10:22:24.2845699Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2845807Z cvt.rn.f32.s16 %r19638, %rs1084; 2026-02-21T10:22:24.2845943Z cvt.rn.f32.s16 %r19639, %rs1083; 2026-02-21T10:22:24.2846034Z cvt.rn.f32.s16 %r19640, %rs1082; 2026-02-21T10:22:24.2846171Z cvt.rn.f32.s16 %r19641, %rs1080; 2026-02-21T10:22:24.2846595Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2846715Z cvt.s16.s8 %rs1085, %rs1002; 2026-02-21T10:22:24.2846812Z shr.s16 %rs1086, %rs1085, 4; 2026-02-21T10:22:24.2846907Z cvt.s16.s8 %rs1087, %rs1004; 2026-02-21T10:22:24.2847024Z shr.s16 %rs1088, %rs1087, 4; 2026-02-21T10:22:24.2847177Z shr.s16 %rs1089, %rs1001, 4; 2026-02-21T10:22:24.2847290Z shr.s16 %rs1090, %rs1003, 4; 2026-02-21T10:22:24.2847576Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2847770Z cvt.rn.f32.s16 %r19642, %rs1090; 2026-02-21T10:22:24.2847867Z cvt.rn.f32.s16 %r19643, %rs1089; 2026-02-21T10:22:24.2847986Z cvt.rn.f32.s16 %r19644, %rs1088; 2026-02-21T10:22:24.2848135Z cvt.rn.f32.s16 %r19645, %rs1086; 2026-02-21T10:22:24.2848395Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2848529Z cvt.s16.s8 %rs1091, %rs1006; 2026-02-21T10:22:24.2848628Z shr.s16 %rs1092, %rs1091, 4; 2026-02-21T10:22:24.2848719Z cvt.s16.s8 %rs1093, %rs1008; 2026-02-21T10:22:24.2848813Z shr.s16 %rs1094, %rs1093, 4; 2026-02-21T10:22:24.2848978Z shr.s16 
%rs1095, %rs1005, 4; 2026-02-21T10:22:24.2849105Z shr.s16 %rs1096, %rs1007, 4; 2026-02-21T10:22:24.2849343Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2849482Z cvt.rn.f32.s16 %r19646, %rs1096; 2026-02-21T10:22:24.2849584Z cvt.rn.f32.s16 %r19647, %rs1095; 2026-02-21T10:22:24.2849755Z cvt.rn.f32.s16 %r19648, %rs1094; 2026-02-21T10:22:24.2849952Z cvt.rn.f32.s16 %r19649, %rs1092; 2026-02-21T10:22:24.2850203Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2850297Z cvt.s16.s8 %rs1097, %rs1010; 2026-02-21T10:22:24.2850464Z shr.s16 %rs1098, %rs1097, 4; 2026-02-21T10:22:24.2850597Z cvt.s16.s8 %rs1099, %rs1012; 2026-02-21T10:22:24.2850690Z shr.s16 %rs1100, %rs1099, 4; 2026-02-21T10:22:24.2850786Z shr.s16 %rs1101, %rs1009, 4; 2026-02-21T10:22:24.2850978Z shr.s16 %rs1102, %rs1011, 4; 2026-02-21T10:22:24.2851203Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2851297Z cvt.rn.f32.s16 %r19650, %rs1102; 2026-02-21T10:22:24.2851430Z cvt.rn.f32.s16 %r19651, %rs1101; 2026-02-21T10:22:24.2851538Z cvt.rn.f32.s16 %r19652, %rs1100; 2026-02-21T10:22:24.2851620Z cvt.rn.f32.s16 %r19653, %rs1098; 2026-02-21T10:22:24.2851898Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2852049Z cvt.s16.s8 %rs1103, %rs1014; 2026-02-21T10:22:24.2852147Z shr.s16 %rs1104, %rs1103, 4; 2026-02-21T10:22:24.2852244Z cvt.s16.s8 %rs1105, %rs1016; 2026-02-21T10:22:24.2852399Z shr.s16 %rs1106, %rs1105, 4; 2026-02-21T10:22:24.2852489Z shr.s16 %rs1107, %rs1013, 4; 2026-02-21T10:22:24.2852628Z shr.s16 %rs1108, %rs1015, 4; 2026-02-21T10:22:24.2852994Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2853096Z cvt.rn.f32.s16 %r19654, %rs1108; 2026-02-21T10:22:24.2853193Z cvt.rn.f32.s16 %r19655, %rs1107; 2026-02-21T10:22:24.2853307Z cvt.rn.f32.s16 %r19656, %rs1106; 
2026-02-21T10:22:24.2853425Z cvt.rn.f32.s16 %r19657, %rs1104; 2026-02-21T10:22:24.2853699Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2853821Z cvt.s16.s8 %rs1109, %rs1018; 2026-02-21T10:22:24.2853955Z shr.s16 %rs1110, %rs1109, 4; 2026-02-21T10:22:24.2854065Z cvt.s16.s8 %rs1111, %rs1020; 2026-02-21T10:22:24.2854160Z shr.s16 %rs1112, %rs1111, 4; 2026-02-21T10:22:24.2854277Z shr.s16 %rs1113, %rs1017, 4; 2026-02-21T10:22:24.2854411Z shr.s16 %rs1114, %rs1019, 4; 2026-02-21T10:22:24.2854654Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2854800Z cvt.rn.f32.s16 %r19658, %rs1114; 2026-02-21T10:22:24.2854914Z cvt.rn.f32.s16 %r19659, %rs1113; 2026-02-21T10:22:24.2855008Z cvt.rn.f32.s16 %r19660, %rs1112; 2026-02-21T10:22:24.2855103Z cvt.rn.f32.s16 %r19661, %rs1110; 2026-02-21T10:22:24.2855397Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2855510Z cvt.s16.s8 %rs1115, %rs1022; 2026-02-21T10:22:24.2855603Z shr.s16 %rs1116, %rs1115, 4; 2026-02-21T10:22:24.2855814Z cvt.s16.s8 %rs1117, %rs1024; 2026-02-21T10:22:24.2855908Z shr.s16 %rs1118, %rs1117, 4; 2026-02-21T10:22:24.2856001Z shr.s16 %rs1119, %rs1021, 4; 2026-02-21T10:22:24.2856170Z shr.s16 %rs1120, %rs1023, 4; 2026-02-21T10:22:24.2856414Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2856670Z cvt.rn.f32.s16 %r19662, %rs1120; 2026-02-21T10:22:24.2856774Z cvt.rn.f32.s16 %r19663, %rs1119; 2026-02-21T10:22:24.2856911Z cvt.rn.f32.s16 %r19664, %rs1118; 2026-02-21T10:22:24.2857007Z cvt.rn.f32.s16 %r19665, %rs1116; 2026-02-21T10:22:24.2857082Z bar.sync 0; 2026-02-21T10:22:24.2857350Z st.shared.v4.b32 [%r28], {%r19605, %r19603, %r19604, %r19602}; 2026-02-21T10:22:24.2857531Z st.shared.v4.b32 [%r28+16384], {%r19637, %r19635, %r19636, %r19634}; 2026-02-21T10:22:24.2857676Z st.shared.v4.b32 [%r29], {%r19609, %r19607, %r19608, 
%r19606}; 2026-02-21T10:22:24.2857865Z st.shared.v4.b32 [%r29+16384], {%r19641, %r19639, %r19640, %r19638}; 2026-02-21T10:22:24.2858100Z st.shared.v4.b32 [%r30], {%r19613, %r19611, %r19612, %r19610}; 2026-02-21T10:22:24.2858253Z st.shared.v4.b32 [%r30+16384], {%r19645, %r19643, %r19644, %r19642}; 2026-02-21T10:22:24.2858513Z st.shared.v4.b32 [%r31], {%r19617, %r19615, %r19616, %r19614}; 2026-02-21T10:22:24.2858666Z st.shared.v4.b32 [%r31+16384], {%r19649, %r19647, %r19648, %r19646}; 2026-02-21T10:22:24.2858897Z st.shared.v4.b32 [%r32], {%r19621, %r19619, %r19620, %r19618}; 2026-02-21T10:22:24.2859054Z st.shared.v4.b32 [%r32+16384], {%r19653, %r19651, %r19652, %r19650}; 2026-02-21T10:22:24.2859235Z st.shared.v4.b32 [%r33], {%r19625, %r19623, %r19624, %r19622}; 2026-02-21T10:22:24.2859368Z st.shared.v4.b32 [%r33+16384], {%r19657, %r19655, %r19656, %r19654}; 2026-02-21T10:22:24.2859582Z st.shared.v4.b32 [%r34], {%r19629, %r19627, %r19628, %r19626}; 2026-02-21T10:22:24.2859793Z st.shared.v4.b32 [%r34+16384], {%r19661, %r19659, %r19660, %r19658}; 2026-02-21T10:22:24.2859944Z st.shared.v4.b32 [%r35], {%r19633, %r19631, %r19632, %r19630}; 2026-02-21T10:22:24.2860095Z st.shared.v4.b32 [%r35+16384], {%r19665, %r19663, %r19664, %r19662}; 2026-02-21T10:22:24.2860230Z $L__tmp9: 2026-02-21T10:22:24.2860523Z .loc 2 291 36 // standard.py:291:36 @[ c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:94:40 ] 2026-02-21T10:22:24.2860685Z // begin inline asm 2026-02-21T10:22:24.2860852Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.2860944Z // end inline asm 2026-02-21T10:22:24.2861118Z bar.sync 0; 2026-02-21T10:22:24.2861228Z wgmma.fence.sync.aligned; 2026-02-21T10:22:24.2861364Z mov.pred %p133, -1; 2026-02-21T10:22:24.2861509Z // begin inline asm 2026-02-21T10:22:24.2863054Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r12525,%r12526,%r12527,%r12528}, %rd3, %p133, 1, 1; 2026-02-21T10:22:24.2863188Z // end inline asm 2026-02-21T10:22:24.2863279Z // begin inline asm 2026-02-21T10:22:24.2864848Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r12657,%r12658,%r12659,%r12660}, %rd4, %p133, 1, 1; 2026-02-21T10:22:24.2864999Z // end inline asm 2026-02-21T10:22:24.2865144Z // begin inline asm 2026-02-21T10:22:24.2866853Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r12789,%r12790,%r12791,%r12792}, %rd5, %p133, 1, 1; 2026-02-21T10:22:24.2866953Z // end inline asm 2026-02-21T10:22:24.2867098Z // begin inline asm 2026-02-21T10:22:24.2868763Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r12921,%r12922,%r12923,%r12924}, %rd6, %p133, 1, 1; 2026-02-21T10:22:24.2868909Z // end inline asm 2026-02-21T10:22:24.2869121Z // begin inline asm 2026-02-21T10:22:24.2870635Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r13053,%r13054,%r13055,%r13056}, %rd7, %p133, 1, 1; 2026-02-21T10:22:24.2870791Z // end inline asm 2026-02-21T10:22:24.2870888Z // begin inline asm 2026-02-21T10:22:24.2872468Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r13185,%r13186,%r13187,%r13188}, %rd8, %p133, 1, 1; 2026-02-21T10:22:24.2872640Z // end inline asm 2026-02-21T10:22:24.2872751Z // begin inline asm 2026-02-21T10:22:24.2874285Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r13317,%r13318,%r13319,%r13320}, %rd9, %p133, 1, 1; 2026-02-21T10:22:24.2874415Z // end inline asm 2026-02-21T10:22:24.2874507Z // begin inline asm 2026-02-21T10:22:24.2876054Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r13449,%r13450,%r13451,%r13452}, %rd10, %p133, 1, 1; 2026-02-21T10:22:24.2876288Z // end inline asm 2026-02-21T10:22:24.2876418Z // begin inline asm 2026-02-21T10:22:24.2878196Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r13581,%r13582,%r13583,%r13584}, %rd3, %p133, 1, 1; 2026-02-21T10:22:24.2878301Z // end inline asm 2026-02-21T10:22:24.2878430Z // begin inline asm 2026-02-21T10:22:24.2879928Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r13713,%r13714,%r13715,%r13716}, %rd4, %p133, 1, 1; 2026-02-21T10:22:24.2880164Z // end inline asm 2026-02-21T10:22:24.2880313Z // begin inline asm 2026-02-21T10:22:24.2881906Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r13845,%r13846,%r13847,%r13848}, %rd5, %p133, 1, 1; 2026-02-21T10:22:24.2882041Z // end inline asm 2026-02-21T10:22:24.2882135Z // begin inline asm 2026-02-21T10:22:24.2883654Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r13977,%r13978,%r13979,%r13980}, %rd6, %p133, 1, 1; 2026-02-21T10:22:24.2883851Z // end inline asm 2026-02-21T10:22:24.2883946Z // begin inline asm 2026-02-21T10:22:24.2885463Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r14109,%r14110,%r14111,%r14112}, %rd7, %p133, 1, 1; 2026-02-21T10:22:24.2885660Z // end inline asm 2026-02-21T10:22:24.2885767Z // begin inline asm 2026-02-21T10:22:24.2887502Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r14241,%r14242,%r14243,%r14244}, %rd8, %p133, 1, 1; 2026-02-21T10:22:24.2887620Z // end inline asm 2026-02-21T10:22:24.2887714Z // begin inline asm 2026-02-21T10:22:24.2889357Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r14373,%r14374,%r14375,%r14376}, %rd9, %p133, 1, 1; 2026-02-21T10:22:24.2889532Z // end inline asm 2026-02-21T10:22:24.2889651Z // begin inline asm 2026-02-21T10:22:24.2891233Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r14505,%r14506,%r14507,%r14508}, %rd10, %p133, 1, 1; 2026-02-21T10:22:24.2891345Z // end inline asm 2026-02-21T10:22:24.2891499Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:24.2891613Z mov.b32 %r14638, %r19468; 2026-02-21T10:22:24.2891771Z mov.b32 %r14639, %r19468; 2026-02-21T10:22:24.2891898Z mov.b32 %r14637, %r39936; 2026-02-21T10:22:24.2892044Z // begin inline asm 2026-02-21T10:22:24.2894618Z // wait for regs: 
%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858,%r14637,%r14638,%r14639 2026-02-21T10:22:24.2894793Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:24.2894892Z // end inline asm 2026-02-21T10:22:24.2895021Z $L__tmp10: 2026-02-21T10:22:24.2895250Z .loc 1 58 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:32 2026-02-21T10:22:24.2895471Z add.s32 %r19666, %r42730, -64; 2026-02-21T10:22:24.2895589Z add.s64 %rd374, %rd333, 128; 2026-02-21T10:22:24.2895736Z add.s64 %rd377, %rd336, 128; 2026-02-21T10:22:24.2895832Z add.s64 %rd380, %rd339, 128; 2026-02-21T10:22:24.2895925Z add.s64 %rd383, %rd342, 128; 2026-02-21T10:22:24.2896054Z add.s64 %rd386, %rd345, 128; 2026-02-21T10:22:24.2896207Z add.s64 %rd389, %rd348, 128; 2026-02-21T10:22:24.2896345Z add.s64 %rd392, %rd351, 128; 2026-02-21T10:22:24.2896626Z mad.wide.s32 %rd395, %r19666, 2, %rd117; 2026-02-21T10:22:24.2896869Z .loc 1 58 80 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:80 2026-02-21T10:22:24.2896968Z // begin inline asm 
2026-02-21T10:22:24.2897060Z mov.u64 %rd373, 0x0; 2026-02-21T10:22:24.2904327Z createpolicy.fractional.L2::evict_first.b64 %rd373, 1.0; 2026-02-21T10:22:24.2904492Z // end inline asm 2026-02-21T10:22:24.2904566Z // begin inline asm 2026-02-21T10:22:24.2904632Z mov.u32 %r14771, 0x0; 2026-02-21T10:22:24.2904836Z mov.u32 %r14772, 0x0; 2026-02-21T10:22:24.2904921Z mov.u32 %r14773, 0x0; 2026-02-21T10:22:24.2904994Z mov.u32 %r14774, 0x0; 2026-02-21T10:22:24.2905286Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r14771, %r14772, %r14773, %r14774 }, [ %rd374 + 0 ], %rd373; 2026-02-21T10:22:24.2905355Z // end inline asm 2026-02-21T10:22:24.2905493Z // begin inline asm 2026-02-21T10:22:24.2905557Z mov.u64 %rd376, 0x0; 2026-02-21T10:22:24.2905709Z createpolicy.fractional.L2::evict_first.b64 %rd376, 1.0; 2026-02-21T10:22:24.2905775Z // end inline asm 2026-02-21T10:22:24.2905840Z // begin inline asm 2026-02-21T10:22:24.2905903Z mov.u32 %r14775, 0x0; 2026-02-21T10:22:24.2905970Z mov.u32 %r14776, 0x0; 2026-02-21T10:22:24.2906030Z mov.u32 %r14777, 0x0; 2026-02-21T10:22:24.2906088Z mov.u32 %r14778, 0x0; 2026-02-21T10:22:24.2906349Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r14775, %r14776, %r14777, %r14778 }, [ %rd377 + 0 ], %rd376; 2026-02-21T10:22:24.2906411Z // end inline asm 2026-02-21T10:22:24.2906637Z // begin inline asm 2026-02-21T10:22:24.2906705Z mov.u64 %rd379, 0x0; 2026-02-21T10:22:24.2906844Z createpolicy.fractional.L2::evict_first.b64 %rd379, 1.0; 2026-02-21T10:22:24.2906908Z // end inline asm 2026-02-21T10:22:24.2906972Z // begin inline asm 2026-02-21T10:22:24.2907036Z mov.u32 %r14779, 0x0; 2026-02-21T10:22:24.2907099Z mov.u32 %r14780, 0x0; 2026-02-21T10:22:24.2907157Z mov.u32 %r14781, 0x0; 2026-02-21T10:22:24.2907215Z mov.u32 %r14782, 0x0; 2026-02-21T10:22:24.2907557Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r14779, %r14780, %r14781, %r14782 }, [ %rd380 + 0 ], %rd379; 2026-02-21T10:22:24.2907623Z // end inline asm 2026-02-21T10:22:24.2907685Z 
// begin inline asm 2026-02-21T10:22:24.2907753Z mov.u64 %rd382, 0x0; 2026-02-21T10:22:24.2907883Z createpolicy.fractional.L2::evict_first.b64 %rd382, 1.0; 2026-02-21T10:22:24.2907945Z // end inline asm 2026-02-21T10:22:24.2908008Z // begin inline asm 2026-02-21T10:22:24.2908071Z mov.u32 %r14783, 0x0; 2026-02-21T10:22:24.2908135Z mov.u32 %r14784, 0x0; 2026-02-21T10:22:24.2908193Z mov.u32 %r14785, 0x0; 2026-02-21T10:22:24.2908257Z mov.u32 %r14786, 0x0; 2026-02-21T10:22:24.2908577Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r14783, %r14784, %r14785, %r14786 }, [ %rd383 + 0 ], %rd382; 2026-02-21T10:22:24.2908641Z // end inline asm 2026-02-21T10:22:24.2908712Z // begin inline asm 2026-02-21T10:22:24.2908773Z mov.u64 %rd385, 0x0; 2026-02-21T10:22:24.2908904Z createpolicy.fractional.L2::evict_first.b64 %rd385, 1.0; 2026-02-21T10:22:24.2908968Z // end inline asm 2026-02-21T10:22:24.2909028Z // begin inline asm 2026-02-21T10:22:24.2909089Z mov.u32 %r14787, 0x0; 2026-02-21T10:22:24.2909148Z mov.u32 %r14788, 0x0; 2026-02-21T10:22:24.2909211Z mov.u32 %r14789, 0x0; 2026-02-21T10:22:24.2909272Z mov.u32 %r14790, 0x0; 2026-02-21T10:22:24.2909507Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r14787, %r14788, %r14789, %r14790 }, [ %rd386 + 0 ], %rd385; 2026-02-21T10:22:24.2909657Z // end inline asm 2026-02-21T10:22:24.2909721Z // begin inline asm 2026-02-21T10:22:24.2909783Z mov.u64 %rd388, 0x0; 2026-02-21T10:22:24.2909914Z createpolicy.fractional.L2::evict_first.b64 %rd388, 1.0; 2026-02-21T10:22:24.2909973Z // end inline asm 2026-02-21T10:22:24.2910033Z // begin inline asm 2026-02-21T10:22:24.2910094Z mov.u32 %r14791, 0x0; 2026-02-21T10:22:24.2910161Z mov.u32 %r14792, 0x0; 2026-02-21T10:22:24.2910220Z mov.u32 %r14793, 0x0; 2026-02-21T10:22:24.2910279Z mov.u32 %r14794, 0x0; 2026-02-21T10:22:24.2910520Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r14791, %r14792, %r14793, %r14794 }, [ %rd389 + 0 ], %rd388; 2026-02-21T10:22:24.2910583Z // end inline asm 
2026-02-21T10:22:24.2910645Z // begin inline asm 2026-02-21T10:22:24.2910706Z mov.u64 %rd391, 0x0; 2026-02-21T10:22:24.2910850Z createpolicy.fractional.L2::evict_first.b64 %rd391, 1.0; 2026-02-21T10:22:24.2910910Z // end inline asm 2026-02-21T10:22:24.2910972Z // begin inline asm 2026-02-21T10:22:24.2911046Z mov.u32 %r14795, 0x0; 2026-02-21T10:22:24.2911177Z mov.u32 %r14796, 0x0; 2026-02-21T10:22:24.2911241Z mov.u32 %r14797, 0x0; 2026-02-21T10:22:24.2911314Z mov.u32 %r14798, 0x0; 2026-02-21T10:22:24.2911555Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r14795, %r14796, %r14797, %r14798 }, [ %rd392 + 0 ], %rd391; 2026-02-21T10:22:24.2911614Z // end inline asm 2026-02-21T10:22:24.2911740Z // begin inline asm 2026-02-21T10:22:24.2911819Z mov.u64 %rd394, 0x0; 2026-02-21T10:22:24.2911947Z createpolicy.fractional.L2::evict_first.b64 %rd394, 1.0; 2026-02-21T10:22:24.2912009Z // end inline asm 2026-02-21T10:22:24.2912076Z // begin inline asm 2026-02-21T10:22:24.2912135Z mov.u32 %r14799, 0x0; 2026-02-21T10:22:24.2912195Z mov.u32 %r14800, 0x0; 2026-02-21T10:22:24.2912254Z mov.u32 %r14801, 0x0; 2026-02-21T10:22:24.2912319Z mov.u32 %r14802, 0x0; 2026-02-21T10:22:24.2912549Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r14799, %r14800, %r14801, %r14802 }, [ %rd395 + 0 ], %rd394; 2026-02-21T10:22:24.2912611Z // end inline asm 2026-02-21T10:22:24.2912835Z .loc 1 62 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:62:32 2026-02-21T10:22:24.2912895Z bar.sync 0; 2026-02-21T10:22:24.2912987Z st.shared.v2.b32 [%r10], {%r14771, %r14772}; 2026-02-21T10:22:24.2913092Z st.shared.v2.b32 [%r10+2048], {%r14775, %r14776}; 2026-02-21T10:22:24.2913190Z st.shared.v2.b32 [%r10+4096], {%r14779, %r14780}; 2026-02-21T10:22:24.2913280Z st.shared.v2.b32 [%r10+6144], {%r14783, %r14784}; 2026-02-21T10:22:24.2913433Z st.shared.v2.b32 [%r10+8192], {%r14787, %r14788}; 2026-02-21T10:22:24.2913534Z st.shared.v2.b32 [%r10+10240], {%r14791, %r14792}; 2026-02-21T10:22:24.2913626Z 
st.shared.v2.b32 [%r10+12288], {%r14795, %r14796}; 2026-02-21T10:22:24.2913716Z st.shared.v2.b32 [%r10+14336], {%r14799, %r14800}; 2026-02-21T10:22:24.2913804Z st.shared.v2.b32 [%r11], {%r14773, %r14774}; 2026-02-21T10:22:24.2913892Z st.shared.v2.b32 [%r11+2048], {%r14777, %r14778}; 2026-02-21T10:22:24.2913985Z st.shared.v2.b32 [%r11+4096], {%r14781, %r14782}; 2026-02-21T10:22:24.2914075Z st.shared.v2.b32 [%r11+6144], {%r14785, %r14786}; 2026-02-21T10:22:24.2914162Z st.shared.v2.b32 [%r11+8192], {%r14789, %r14790}; 2026-02-21T10:22:24.2914251Z st.shared.v2.b32 [%r11+10240], {%r14793, %r14794}; 2026-02-21T10:22:24.2914347Z st.shared.v2.b32 [%r11+12288], {%r14797, %r14798}; 2026-02-21T10:22:24.2914438Z st.shared.v2.b32 [%r11+14336], {%r14801, %r14802}; 2026-02-21T10:22:24.2914498Z bar.sync 0; 2026-02-21T10:22:24.2914574Z ld.shared.b16 %rs1121, [%r52]; 2026-02-21T10:22:24.2914655Z ld.shared.b16 %rs1122, [%r52+1024]; 2026-02-21T10:22:24.2935382Z ld.shared.b16 %rs1123, [%r52+64]; 2026-02-21T10:22:24.2935480Z ld.shared.b16 %rs1124, [%r52+1088]; 2026-02-21T10:22:24.2935563Z ld.shared.b16 %rs1125, [%r52+8192]; 2026-02-21T10:22:24.2935632Z ld.shared.b16 %rs1126, [%r52+9216]; 2026-02-21T10:22:24.2935696Z ld.shared.b16 %rs1127, [%r52+8256]; 2026-02-21T10:22:24.2935760Z ld.shared.b16 %rs1128, [%r52+9280]; 2026-02-21T10:22:24.2935945Z ld.shared.b16 %rs1129, [%r53]; 2026-02-21T10:22:24.2936010Z ld.shared.b16 %rs1130, [%r53+1024]; 2026-02-21T10:22:24.2936078Z ld.shared.b16 %rs1131, [%r53+64]; 2026-02-21T10:22:24.2936146Z ld.shared.b16 %rs1132, [%r53+1088]; 2026-02-21T10:22:24.2936210Z ld.shared.b16 %rs1133, [%r53+8192]; 2026-02-21T10:22:24.2936276Z ld.shared.b16 %rs1134, [%r53+9216]; 2026-02-21T10:22:24.2936341Z ld.shared.b16 %rs1135, [%r53+8256]; 2026-02-21T10:22:24.2936405Z ld.shared.b16 %rs1136, [%r53+9280]; 2026-02-21T10:22:24.2936602Z ld.shared.b16 %rs1137, [%r54]; 2026-02-21T10:22:24.2936673Z ld.shared.b16 %rs1138, [%r54+1024]; 2026-02-21T10:22:24.2936740Z ld.shared.b16 
%rs1139, [%r54+64]; 2026-02-21T10:22:24.2936803Z ld.shared.b16 %rs1140, [%r54+1088]; 2026-02-21T10:22:24.2936866Z ld.shared.b16 %rs1141, [%r54+8192]; 2026-02-21T10:22:24.2936933Z ld.shared.b16 %rs1142, [%r54+9216]; 2026-02-21T10:22:24.2936996Z ld.shared.b16 %rs1143, [%r54+8256]; 2026-02-21T10:22:24.2937061Z ld.shared.b16 %rs1144, [%r54+9280]; 2026-02-21T10:22:24.2937199Z ld.shared.b16 %rs1145, [%r55]; 2026-02-21T10:22:24.2937271Z ld.shared.b16 %rs1146, [%r55+1024]; 2026-02-21T10:22:24.2937334Z ld.shared.b16 %rs1147, [%r55+64]; 2026-02-21T10:22:24.2937396Z ld.shared.b16 %rs1148, [%r55+1088]; 2026-02-21T10:22:24.2937461Z ld.shared.b16 %rs1149, [%r55+8192]; 2026-02-21T10:22:24.2937587Z ld.shared.b16 %rs1150, [%r55+9216]; 2026-02-21T10:22:24.2937662Z ld.shared.b16 %rs1151, [%r55+8256]; 2026-02-21T10:22:24.2937732Z ld.shared.b16 %rs1152, [%r55+9280]; 2026-02-21T10:22:24.2937795Z ld.shared.b16 %rs1153, [%r56]; 2026-02-21T10:22:24.2937860Z ld.shared.b16 %rs1154, [%r56+1024]; 2026-02-21T10:22:24.2937923Z ld.shared.b16 %rs1155, [%r56+64]; 2026-02-21T10:22:24.2937989Z ld.shared.b16 %rs1156, [%r56+1088]; 2026-02-21T10:22:24.2938051Z ld.shared.b16 %rs1157, [%r56+8192]; 2026-02-21T10:22:24.2938112Z ld.shared.b16 %rs1158, [%r56+9216]; 2026-02-21T10:22:24.2938177Z ld.shared.b16 %rs1159, [%r56+8256]; 2026-02-21T10:22:24.2938245Z ld.shared.b16 %rs1160, [%r56+9280]; 2026-02-21T10:22:24.2938308Z ld.shared.b16 %rs1161, [%r57]; 2026-02-21T10:22:24.2938371Z ld.shared.b16 %rs1162, [%r57+1024]; 2026-02-21T10:22:24.2938437Z ld.shared.b16 %rs1163, [%r57+64]; 2026-02-21T10:22:24.2938499Z ld.shared.b16 %rs1164, [%r57+1088]; 2026-02-21T10:22:24.2938564Z ld.shared.b16 %rs1165, [%r57+8192]; 2026-02-21T10:22:24.2938630Z ld.shared.b16 %rs1166, [%r57+9216]; 2026-02-21T10:22:24.2938694Z ld.shared.b16 %rs1167, [%r57+8256]; 2026-02-21T10:22:24.2938833Z ld.shared.b16 %rs1168, [%r57+9280]; 2026-02-21T10:22:24.2938902Z ld.shared.b16 %rs1169, [%r58]; 2026-02-21T10:22:24.2938965Z ld.shared.b16 %rs1170, 
[%r58+1024]; 2026-02-21T10:22:24.2939026Z ld.shared.b16 %rs1171, [%r58+64]; 2026-02-21T10:22:24.2939088Z ld.shared.b16 %rs1172, [%r58+1088]; 2026-02-21T10:22:24.2939153Z ld.shared.b16 %rs1173, [%r58+8192]; 2026-02-21T10:22:24.2939216Z ld.shared.b16 %rs1174, [%r58+9216]; 2026-02-21T10:22:24.2939281Z ld.shared.b16 %rs1175, [%r58+8256]; 2026-02-21T10:22:24.2939360Z ld.shared.b16 %rs1176, [%r58+9280]; 2026-02-21T10:22:24.2939425Z ld.shared.b16 %rs1177, [%r59]; 2026-02-21T10:22:24.2939489Z ld.shared.b16 %rs1178, [%r59+1024]; 2026-02-21T10:22:24.2939552Z ld.shared.b16 %rs1179, [%r59+64]; 2026-02-21T10:22:24.2939618Z ld.shared.b16 %rs1180, [%r59+1088]; 2026-02-21T10:22:24.2939682Z ld.shared.b16 %rs1181, [%r59+8192]; 2026-02-21T10:22:24.2939745Z ld.shared.b16 %rs1182, [%r59+9216]; 2026-02-21T10:22:24.2939813Z ld.shared.b16 %rs1183, [%r59+8256]; 2026-02-21T10:22:24.2939876Z ld.shared.b16 %rs1184, [%r59+9280]; 2026-02-21T10:22:24.2939938Z cvt.f32.bf16 %r14940, %rs1121; 2026-02-21T10:22:24.2939998Z cvt.f32.bf16 %r14941, %rs1122; 2026-02-21T10:22:24.2940073Z cvt.f32.bf16 %r14942, %rs1129; 2026-02-21T10:22:24.2940133Z cvt.f32.bf16 %r14943, %rs1130; 2026-02-21T10:22:24.2940191Z cvt.f32.bf16 %r15072, %rs1137; 2026-02-21T10:22:24.2940253Z cvt.f32.bf16 %r15073, %rs1138; 2026-02-21T10:22:24.2940387Z cvt.f32.bf16 %r15074, %rs1145; 2026-02-21T10:22:24.2940447Z cvt.f32.bf16 %r15075, %rs1146; 2026-02-21T10:22:24.2940509Z cvt.f32.bf16 %r15204, %rs1153; 2026-02-21T10:22:24.2940569Z cvt.f32.bf16 %r15205, %rs1154; 2026-02-21T10:22:24.2940664Z cvt.f32.bf16 %r15206, %rs1161; 2026-02-21T10:22:24.2940723Z cvt.f32.bf16 %r15207, %rs1162; 2026-02-21T10:22:24.2940789Z cvt.f32.bf16 %r15336, %rs1169; 2026-02-21T10:22:24.2940848Z cvt.f32.bf16 %r15337, %rs1170; 2026-02-21T10:22:24.2940907Z cvt.f32.bf16 %r15338, %rs1177; 2026-02-21T10:22:24.2940970Z cvt.f32.bf16 %r15339, %rs1178; 2026-02-21T10:22:24.2941029Z cvt.f32.bf16 %r15468, %rs1123; 2026-02-21T10:22:24.2941089Z cvt.f32.bf16 %r15469, %rs1124; 
2026-02-21T10:22:24.2941148Z cvt.f32.bf16 %r15470, %rs1131; 2026-02-21T10:22:24.2941209Z cvt.f32.bf16 %r15471, %rs1132; 2026-02-21T10:22:24.2941267Z cvt.f32.bf16 %r15600, %rs1139; 2026-02-21T10:22:24.2941327Z cvt.f32.bf16 %r15601, %rs1140; 2026-02-21T10:22:24.2941402Z cvt.f32.bf16 %r15602, %rs1147; 2026-02-21T10:22:24.2941520Z cvt.f32.bf16 %r15603, %rs1148; 2026-02-21T10:22:24.2941584Z cvt.f32.bf16 %r15732, %rs1155; 2026-02-21T10:22:24.2941644Z cvt.f32.bf16 %r15733, %rs1156; 2026-02-21T10:22:24.2941706Z cvt.f32.bf16 %r15734, %rs1163; 2026-02-21T10:22:24.2941767Z cvt.f32.bf16 %r15735, %rs1164; 2026-02-21T10:22:24.2941883Z cvt.f32.bf16 %r15864, %rs1171; 2026-02-21T10:22:24.2941945Z cvt.f32.bf16 %r15865, %rs1172; 2026-02-21T10:22:24.2942004Z cvt.f32.bf16 %r15866, %rs1179; 2026-02-21T10:22:24.2942064Z cvt.f32.bf16 %r15867, %rs1180; 2026-02-21T10:22:24.2942126Z cvt.f32.bf16 %r15996, %rs1125; 2026-02-21T10:22:24.2942185Z cvt.f32.bf16 %r15997, %rs1126; 2026-02-21T10:22:24.2942243Z cvt.f32.bf16 %r15998, %rs1133; 2026-02-21T10:22:24.2942302Z cvt.f32.bf16 %r15999, %rs1134; 2026-02-21T10:22:24.2942364Z cvt.f32.bf16 %r16128, %rs1141; 2026-02-21T10:22:24.2942422Z cvt.f32.bf16 %r16129, %rs1142; 2026-02-21T10:22:24.2942481Z cvt.f32.bf16 %r16130, %rs1149; 2026-02-21T10:22:24.2942544Z cvt.f32.bf16 %r16131, %rs1150; 2026-02-21T10:22:24.2942606Z cvt.f32.bf16 %r16260, %rs1157; 2026-02-21T10:22:24.2942665Z cvt.f32.bf16 %r16261, %rs1158; 2026-02-21T10:22:24.2942724Z cvt.f32.bf16 %r16262, %rs1165; 2026-02-21T10:22:24.2942785Z cvt.f32.bf16 %r16263, %rs1166; 2026-02-21T10:22:24.2942853Z cvt.f32.bf16 %r16392, %rs1173; 2026-02-21T10:22:24.2942918Z cvt.f32.bf16 %r16393, %rs1174; 2026-02-21T10:22:24.2942981Z cvt.f32.bf16 %r16394, %rs1181; 2026-02-21T10:22:24.2943049Z cvt.f32.bf16 %r16395, %rs1182; 2026-02-21T10:22:24.2943187Z cvt.f32.bf16 %r16524, %rs1127; 2026-02-21T10:22:24.2943253Z cvt.f32.bf16 %r16525, %rs1128; 2026-02-21T10:22:24.2943321Z cvt.f32.bf16 %r16526, %rs1135; 
2026-02-21T10:22:24.2943383Z cvt.f32.bf16 %r16527, %rs1136; 2026-02-21T10:22:24.2943445Z cvt.f32.bf16 %r16656, %rs1143; 2026-02-21T10:22:24.2943507Z cvt.f32.bf16 %r16657, %rs1144; 2026-02-21T10:22:24.2943575Z cvt.f32.bf16 %r16658, %rs1151; 2026-02-21T10:22:24.2943638Z cvt.f32.bf16 %r16659, %rs1152; 2026-02-21T10:22:24.2943708Z cvt.f32.bf16 %r16788, %rs1159; 2026-02-21T10:22:24.2943791Z cvt.f32.bf16 %r16789, %rs1160; 2026-02-21T10:22:24.2943856Z cvt.f32.bf16 %r16790, %rs1167; 2026-02-21T10:22:24.2943918Z cvt.f32.bf16 %r16791, %rs1168; 2026-02-21T10:22:24.2943979Z cvt.f32.bf16 %r16920, %rs1175; 2026-02-21T10:22:24.2944048Z cvt.f32.bf16 %r16921, %rs1176; 2026-02-21T10:22:24.2944109Z cvt.f32.bf16 %r16922, %rs1183; 2026-02-21T10:22:24.2944173Z cvt.f32.bf16 %r16923, %rs1184; 2026-02-21T10:22:24.2944403Z .loc 1 64 33 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:64:33 2026-02-21T10:22:24.2944464Z bar.sync 0; 2026-02-21T10:22:24.2944527Z // begin inline asm 2026-02-21T10:22:24.2944645Z @%p313 mbarrier.init.shared::cta.b64 [%r29850], 1; 2026-02-21T10:22:24.2944706Z // end inline asm 2026-02-21T10:22:24.2944763Z bar.sync 0; 2026-02-21T10:22:24.2944822Z // begin inline asm 2026-02-21T10:22:24.2944966Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r29850], 4096; 2026-02-21T10:22:24.2945086Z // end inline asm 2026-02-21T10:22:24.2945147Z // begin inline asm 2026-02-21T10:22:24.2945230Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.2945290Z // end inline asm 2026-02-21T10:22:24.2945346Z bar.sync 0; 2026-02-21T10:22:24.2945416Z elect.sync %r19667|%p191, -1; 2026-02-21T10:22:24.2945495Z and.pred %p151, %p1, %p191; 2026-02-21T10:22:24.2945559Z cvt.u32.u64 %r19668, %rd845; 2026-02-21T10:22:24.2945623Z add.s32 %r14807, %r19668, 128; 2026-02-21T10:22:24.2945699Z // begin inline asm 2026-02-21T10:22:24.2946051Z @%p151 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39936], [%rd822, {%r19833, %r14807}], [%r29850]; 
2026-02-21T10:22:24.2946111Z // end inline asm 2026-02-21T10:22:24.2946171Z bar.sync 0; 2026-02-21T10:22:24.2946239Z // begin inline asm 2026-02-21T10:22:24.2946294Z 2026-02-21T10:22:24.2946346Z { 2026-02-21T10:22:24.2946418Z .reg .pred complete; 2026-02-21T10:22:24.2946609Z waitLoop: 2026-02-21T10:22:24.2946852Z mbarrier.try_wait.parity.shared.b64 complete, [%r29850], %r19468; 2026-02-21T10:22:24.2946935Z @!complete bra.uni waitLoop; 2026-02-21T10:22:24.2946988Z } 2026-02-21T10:22:24.2946994Z 2026-02-21T10:22:24.2947055Z // end inline asm 2026-02-21T10:22:24.2947112Z bar.sync 0; 2026-02-21T10:22:24.2947179Z // begin inline asm 2026-02-21T10:22:24.2947349Z @%p313 mbarrier.inval.shared::cta.b64 [%r29850]; 2026-02-21T10:22:24.2947408Z // end inline asm 2026-02-21T10:22:24.2947628Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2947698Z ld.shared.s8 %rs1185, [%r20]; 2026-02-21T10:22:24.2947898Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2947969Z shl.b16 %rs1186, %rs1185, 4; 2026-02-21T10:22:24.2948163Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2948238Z ld.shared.s8 %rs1187, [%r21+128]; 2026-02-21T10:22:24.2948515Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2948590Z shl.b16 %rs1188, %rs1187, 4; 2026-02-21T10:22:24.2948791Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2948864Z ld.shared.s8 %rs1189, [%r22+256]; 2026-02-21T10:22:24.2949139Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2949207Z shl.b16 %rs1190, %rs1189, 4; 2026-02-21T10:22:24.2949403Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2949476Z ld.shared.s8 %rs1191, [%r23+384]; 2026-02-21T10:22:24.2949679Z .loc 1 67 28 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2949746Z shl.b16 %rs1192, %rs1191, 4; 2026-02-21T10:22:24.2949947Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2950012Z ld.shared.s8 %rs1193, [%r24+512]; 2026-02-21T10:22:24.2950201Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2950267Z shl.b16 %rs1194, %rs1193, 4; 2026-02-21T10:22:24.2950463Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2950530Z ld.shared.s8 %rs1195, [%r25+640]; 2026-02-21T10:22:24.2950721Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2950793Z shl.b16 %rs1196, %rs1195, 4; 2026-02-21T10:22:24.2950983Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2951050Z ld.shared.s8 %rs1197, [%r26+768]; 2026-02-21T10:22:24.2951249Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2951395Z shl.b16 %rs1198, %rs1197, 4; 2026-02-21T10:22:24.2951589Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2951662Z ld.shared.s8 %rs1199, [%r27+896]; 2026-02-21T10:22:24.2951856Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2951919Z shl.b16 %rs1200, %rs1199, 4; 2026-02-21T10:22:24.2952110Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2952190Z ld.shared.s8 %rs1201, [%r20+1024]; 2026-02-21T10:22:24.2952382Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2952445Z shl.b16 %rs1202, %rs1201, 4; 2026-02-21T10:22:24.2952641Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2952759Z ld.shared.s8 %rs1203, [%r21+1152]; 2026-02-21T10:22:24.2952950Z 
.loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2953017Z shl.b16 %rs1204, %rs1203, 4; 2026-02-21T10:22:24.2953209Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2953323Z ld.shared.s8 %rs1205, [%r22+1280]; 2026-02-21T10:22:24.2953523Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2953588Z shl.b16 %rs1206, %rs1205, 4; 2026-02-21T10:22:24.2953779Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2953845Z ld.shared.s8 %rs1207, [%r23+1408]; 2026-02-21T10:22:24.2954045Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2954111Z shl.b16 %rs1208, %rs1207, 4; 2026-02-21T10:22:24.2954306Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2954376Z ld.shared.s8 %rs1209, [%r24+1536]; 2026-02-21T10:22:24.2954567Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2954631Z shl.b16 %rs1210, %rs1209, 4; 2026-02-21T10:22:24.2954880Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2954947Z ld.shared.s8 %rs1211, [%r25+1664]; 2026-02-21T10:22:24.2955138Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2955206Z shl.b16 %rs1212, %rs1211, 4; 2026-02-21T10:22:24.2955398Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2955476Z ld.shared.s8 %rs1213, [%r26+1792]; 2026-02-21T10:22:24.2955679Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2955742Z shl.b16 %rs1214, %rs1213, 4; 2026-02-21T10:22:24.2955930Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2955995Z ld.shared.s8 %rs1215, [%r27+1920]; 
2026-02-21T10:22:24.2956202Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2956265Z shl.b16 %rs1216, %rs1215, 4; 2026-02-21T10:22:24.2956589Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2956669Z ld.shared.s8 %rs1217, [%r20+2048]; 2026-02-21T10:22:24.2956863Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2956926Z shl.b16 %rs1218, %rs1217, 4; 2026-02-21T10:22:24.2957123Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2957268Z ld.shared.s8 %rs1219, [%r21+2176]; 2026-02-21T10:22:24.2957459Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2957525Z shl.b16 %rs1220, %rs1219, 4; 2026-02-21T10:22:24.2957719Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2957784Z ld.shared.s8 %rs1221, [%r22+2304]; 2026-02-21T10:22:24.2957977Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2958046Z shl.b16 %rs1222, %rs1221, 4; 2026-02-21T10:22:24.2958236Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2958301Z ld.shared.s8 %rs1223, [%r23+2432]; 2026-02-21T10:22:24.2958494Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2958622Z shl.b16 %rs1224, %rs1223, 4; 2026-02-21T10:22:24.2958815Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2958892Z ld.shared.s8 %rs1225, [%r24+2560]; 2026-02-21T10:22:24.2959094Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2959217Z shl.b16 %rs1226, %rs1225, 4; 2026-02-21T10:22:24.2959413Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2959479Z ld.shared.s8 
%rs1227, [%r25+2688]; 2026-02-21T10:22:24.2959669Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2959731Z shl.b16 %rs1228, %rs1227, 4; 2026-02-21T10:22:24.2959927Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2959996Z ld.shared.s8 %rs1229, [%r26+2816]; 2026-02-21T10:22:24.2960189Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2960257Z shl.b16 %rs1230, %rs1229, 4; 2026-02-21T10:22:24.2960446Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2960515Z ld.shared.s8 %rs1231, [%r27+2944]; 2026-02-21T10:22:24.2960776Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2960842Z shl.b16 %rs1232, %rs1231, 4; 2026-02-21T10:22:24.2961031Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2961100Z ld.shared.s8 %rs1233, [%r20+3072]; 2026-02-21T10:22:24.2961289Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2961351Z shl.b16 %rs1234, %rs1233, 4; 2026-02-21T10:22:24.2961546Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2961618Z ld.shared.s8 %rs1235, [%r21+3200]; 2026-02-21T10:22:24.2961807Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2961869Z shl.b16 %rs1236, %rs1235, 4; 2026-02-21T10:22:24.2962083Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2962152Z ld.shared.s8 %rs1237, [%r22+3328]; 2026-02-21T10:22:24.2962343Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2962411Z shl.b16 %rs1238, %rs1237, 4; 2026-02-21T10:22:24.2962601Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 
2026-02-21T10:22:24.2962666Z ld.shared.s8 %rs1239, [%r23+3456]; 2026-02-21T10:22:24.2962863Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2962981Z shl.b16 %rs1240, %rs1239, 4; 2026-02-21T10:22:24.2963170Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2963236Z ld.shared.s8 %rs1241, [%r24+3584]; 2026-02-21T10:22:24.2963437Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2963500Z shl.b16 %rs1242, %rs1241, 4; 2026-02-21T10:22:24.2963691Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2963762Z ld.shared.s8 %rs1243, [%r25+3712]; 2026-02-21T10:22:24.2963951Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2964013Z shl.b16 %rs1244, %rs1243, 4; 2026-02-21T10:22:24.2964208Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2964325Z ld.shared.s8 %rs1245, [%r26+3840]; 2026-02-21T10:22:24.2964517Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2964586Z shl.b16 %rs1246, %rs1245, 4; 2026-02-21T10:22:24.2964776Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2964904Z ld.shared.s8 %rs1247, [%r27+3968]; 2026-02-21T10:22:24.2965101Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.2965165Z shl.b16 %rs1248, %rs1247, 4; 2026-02-21T10:22:24.2965353Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2965419Z cvt.s16.s8 %rs1249, %rs1186; 2026-02-21T10:22:24.2965487Z shr.s16 %rs1250, %rs1249, 4; 2026-02-21T10:22:24.2965551Z cvt.s16.s8 %rs1251, %rs1188; 2026-02-21T10:22:24.2965616Z shr.s16 %rs1252, %rs1251, 4; 2026-02-21T10:22:24.2965685Z shr.s16 %rs1253, %rs1185, 4; 2026-02-21T10:22:24.2965747Z 
shr.s16 %rs1254, %rs1187, 4; 2026-02-21T10:22:24.2965939Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2966015Z cvt.rn.f32.s16 %r19669, %rs1254; 2026-02-21T10:22:24.2966087Z cvt.rn.f32.s16 %r19670, %rs1253; 2026-02-21T10:22:24.2966163Z cvt.rn.f32.s16 %r19671, %rs1252; 2026-02-21T10:22:24.2966229Z cvt.rn.f32.s16 %r19672, %rs1250; 2026-02-21T10:22:24.2966605Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2966682Z cvt.s16.s8 %rs1255, %rs1190; 2026-02-21T10:22:24.2966747Z shr.s16 %rs1256, %rs1255, 4; 2026-02-21T10:22:24.2966814Z cvt.s16.s8 %rs1257, %rs1192; 2026-02-21T10:22:24.2966876Z shr.s16 %rs1258, %rs1257, 4; 2026-02-21T10:22:24.2966941Z shr.s16 %rs1259, %rs1189, 4; 2026-02-21T10:22:24.2967014Z shr.s16 %rs1260, %rs1191, 4; 2026-02-21T10:22:24.2967225Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2967290Z cvt.rn.f32.s16 %r19673, %rs1260; 2026-02-21T10:22:24.2967355Z cvt.rn.f32.s16 %r19674, %rs1259; 2026-02-21T10:22:24.2967430Z cvt.rn.f32.s16 %r19675, %rs1258; 2026-02-21T10:22:24.2967492Z cvt.rn.f32.s16 %r19676, %rs1256; 2026-02-21T10:22:24.2967687Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2967760Z cvt.s16.s8 %rs1261, %rs1194; 2026-02-21T10:22:24.2967822Z shr.s16 %rs1262, %rs1261, 4; 2026-02-21T10:22:24.2967883Z cvt.s16.s8 %rs1263, %rs1196; 2026-02-21T10:22:24.2967946Z shr.s16 %rs1264, %rs1263, 4; 2026-02-21T10:22:24.2968014Z shr.s16 %rs1265, %rs1193, 4; 2026-02-21T10:22:24.2968075Z shr.s16 %rs1266, %rs1195, 4; 2026-02-21T10:22:24.2968268Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2968427Z cvt.rn.f32.s16 %r19677, %rs1266; 2026-02-21T10:22:24.2968497Z cvt.rn.f32.s16 %r19678, %rs1265; 2026-02-21T10:22:24.2968560Z cvt.rn.f32.s16 %r19679, %rs1264; 2026-02-21T10:22:24.2968629Z cvt.rn.f32.s16 %r19680, 
%rs1262; 2026-02-21T10:22:24.2968824Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2968888Z cvt.s16.s8 %rs1267, %rs1198; 2026-02-21T10:22:24.2968950Z shr.s16 %rs1268, %rs1267, 4; 2026-02-21T10:22:24.2969019Z cvt.s16.s8 %rs1269, %rs1200; 2026-02-21T10:22:24.2969084Z shr.s16 %rs1270, %rs1269, 4; 2026-02-21T10:22:24.2969144Z shr.s16 %rs1271, %rs1197, 4; 2026-02-21T10:22:24.2969209Z shr.s16 %rs1272, %rs1199, 4; 2026-02-21T10:22:24.2969399Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2969463Z cvt.rn.f32.s16 %r19681, %rs1272; 2026-02-21T10:22:24.2969527Z cvt.rn.f32.s16 %r19682, %rs1271; 2026-02-21T10:22:24.2969599Z cvt.rn.f32.s16 %r19683, %rs1270; 2026-02-21T10:22:24.2969729Z cvt.rn.f32.s16 %r19684, %rs1268; 2026-02-21T10:22:24.2969925Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2969994Z cvt.s16.s8 %rs1273, %rs1202; 2026-02-21T10:22:24.2970057Z shr.s16 %rs1274, %rs1273, 4; 2026-02-21T10:22:24.2970180Z cvt.s16.s8 %rs1275, %rs1204; 2026-02-21T10:22:24.2970246Z shr.s16 %rs1276, %rs1275, 4; 2026-02-21T10:22:24.2970308Z shr.s16 %rs1277, %rs1201, 4; 2026-02-21T10:22:24.2970372Z shr.s16 %rs1278, %rs1203, 4; 2026-02-21T10:22:24.2970565Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2970644Z cvt.rn.f32.s16 %r19685, %rs1278; 2026-02-21T10:22:24.2970711Z cvt.rn.f32.s16 %r19686, %rs1277; 2026-02-21T10:22:24.2970774Z cvt.rn.f32.s16 %r19687, %rs1276; 2026-02-21T10:22:24.2970843Z cvt.rn.f32.s16 %r19688, %rs1274; 2026-02-21T10:22:24.2971037Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2971102Z cvt.s16.s8 %rs1279, %rs1206; 2026-02-21T10:22:24.2971168Z shr.s16 %rs1280, %rs1279, 4; 2026-02-21T10:22:24.2971228Z cvt.s16.s8 %rs1281, %rs1208; 2026-02-21T10:22:24.2971290Z shr.s16 %rs1282, %rs1281, 4; 2026-02-21T10:22:24.2971355Z 
shr.s16 %rs1283, %rs1205, 4; 2026-02-21T10:22:24.2971423Z shr.s16 %rs1284, %rs1207, 4; 2026-02-21T10:22:24.2971683Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2971750Z cvt.rn.f32.s16 %r19689, %rs1284; 2026-02-21T10:22:24.2971818Z cvt.rn.f32.s16 %r19690, %rs1283; 2026-02-21T10:22:24.2971881Z cvt.rn.f32.s16 %r19691, %rs1282; 2026-02-21T10:22:24.2971943Z cvt.rn.f32.s16 %r19692, %rs1280; 2026-02-21T10:22:24.2972139Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2972202Z cvt.s16.s8 %rs1285, %rs1210; 2026-02-21T10:22:24.2972269Z shr.s16 %rs1286, %rs1285, 4; 2026-02-21T10:22:24.2972331Z cvt.s16.s8 %rs1287, %rs1212; 2026-02-21T10:22:24.2972398Z shr.s16 %rs1288, %rs1287, 4; 2026-02-21T10:22:24.2972460Z shr.s16 %rs1289, %rs1209, 4; 2026-02-21T10:22:24.2972521Z shr.s16 %rs1290, %rs1211, 4; 2026-02-21T10:22:24.2972721Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2972791Z cvt.rn.f32.s16 %r19693, %rs1290; 2026-02-21T10:22:24.2972857Z cvt.rn.f32.s16 %r19694, %rs1289; 2026-02-21T10:22:24.2972921Z cvt.rn.f32.s16 %r19695, %rs1288; 2026-02-21T10:22:24.2972989Z cvt.rn.f32.s16 %r19696, %rs1286; 2026-02-21T10:22:24.2973180Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2973255Z cvt.s16.s8 %rs1291, %rs1214; 2026-02-21T10:22:24.2973325Z shr.s16 %rs1292, %rs1291, 4; 2026-02-21T10:22:24.2973390Z cvt.s16.s8 %rs1293, %rs1216; 2026-02-21T10:22:24.2973506Z shr.s16 %rs1294, %rs1293, 4; 2026-02-21T10:22:24.2973574Z shr.s16 %rs1295, %rs1213, 4; 2026-02-21T10:22:24.2973637Z shr.s16 %rs1296, %rs1215, 4; 2026-02-21T10:22:24.2973832Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2973896Z cvt.rn.f32.s16 %r19697, %rs1296; 2026-02-21T10:22:24.2973967Z cvt.rn.f32.s16 %r19698, %rs1295; 2026-02-21T10:22:24.2974031Z cvt.rn.f32.s16 %r19699, %rs1294; 
2026-02-21T10:22:24.2974094Z cvt.rn.f32.s16 %r19700, %rs1292; 2026-02-21T10:22:24.2974309Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2974374Z cvt.s16.s8 %rs1297, %rs1218; 2026-02-21T10:22:24.2974438Z shr.s16 %rs1298, %rs1297, 4; 2026-02-21T10:22:24.2974506Z cvt.s16.s8 %rs1299, %rs1220; 2026-02-21T10:22:24.2974569Z shr.s16 %rs1300, %rs1299, 4; 2026-02-21T10:22:24.2974631Z shr.s16 %rs1301, %rs1217, 4; 2026-02-21T10:22:24.2974693Z shr.s16 %rs1302, %rs1219, 4; 2026-02-21T10:22:24.2974946Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2975012Z cvt.rn.f32.s16 %r19701, %rs1302; 2026-02-21T10:22:24.2975075Z cvt.rn.f32.s16 %r19702, %rs1301; 2026-02-21T10:22:24.2975142Z cvt.rn.f32.s16 %r19703, %rs1300; 2026-02-21T10:22:24.2975259Z cvt.rn.f32.s16 %r19704, %rs1298; 2026-02-21T10:22:24.2975450Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2975517Z cvt.s16.s8 %rs1303, %rs1222; 2026-02-21T10:22:24.2975577Z shr.s16 %rs1304, %rs1303, 4; 2026-02-21T10:22:24.2975638Z cvt.s16.s8 %rs1305, %rs1224; 2026-02-21T10:22:24.2975702Z shr.s16 %rs1306, %rs1305, 4; 2026-02-21T10:22:24.2975767Z shr.s16 %rs1307, %rs1221, 4; 2026-02-21T10:22:24.2975829Z shr.s16 %rs1308, %rs1223, 4; 2026-02-21T10:22:24.2976026Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2976098Z cvt.rn.f32.s16 %r19705, %rs1308; 2026-02-21T10:22:24.2976161Z cvt.rn.f32.s16 %r19706, %rs1307; 2026-02-21T10:22:24.2976222Z cvt.rn.f32.s16 %r19707, %rs1306; 2026-02-21T10:22:24.2976284Z cvt.rn.f32.s16 %r19708, %rs1304; 2026-02-21T10:22:24.2976642Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2976711Z cvt.s16.s8 %rs1309, %rs1226; 2026-02-21T10:22:24.2976775Z shr.s16 %rs1310, %rs1309, 4; 2026-02-21T10:22:24.2976917Z cvt.s16.s8 %rs1311, %rs1228; 2026-02-21T10:22:24.2976981Z shr.s16 
%rs1312, %rs1311, 4; 2026-02-21T10:22:24.2977052Z shr.s16 %rs1313, %rs1225, 4; 2026-02-21T10:22:24.2977117Z shr.s16 %rs1314, %rs1227, 4; 2026-02-21T10:22:24.2977310Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2977374Z cvt.rn.f32.s16 %r19709, %rs1314; 2026-02-21T10:22:24.2977436Z cvt.rn.f32.s16 %r19710, %rs1313; 2026-02-21T10:22:24.2977505Z cvt.rn.f32.s16 %r19711, %rs1312; 2026-02-21T10:22:24.2977570Z cvt.rn.f32.s16 %r19712, %rs1310; 2026-02-21T10:22:24.2977761Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2977828Z cvt.s16.s8 %rs1315, %rs1230; 2026-02-21T10:22:24.2977888Z shr.s16 %rs1316, %rs1315, 4; 2026-02-21T10:22:24.2977950Z cvt.s16.s8 %rs1317, %rs1232; 2026-02-21T10:22:24.2978013Z shr.s16 %rs1318, %rs1317, 4; 2026-02-21T10:22:24.2978073Z shr.s16 %rs1319, %rs1229, 4; 2026-02-21T10:22:24.2978136Z shr.s16 %rs1320, %rs1231, 4; 2026-02-21T10:22:24.2978327Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2978394Z cvt.rn.f32.s16 %r19713, %rs1320; 2026-02-21T10:22:24.2978459Z cvt.rn.f32.s16 %r19714, %rs1319; 2026-02-21T10:22:24.2978520Z cvt.rn.f32.s16 %r19715, %rs1318; 2026-02-21T10:22:24.2978584Z cvt.rn.f32.s16 %r19716, %rs1316; 2026-02-21T10:22:24.2978863Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2978925Z cvt.s16.s8 %rs1321, %rs1234; 2026-02-21T10:22:24.2978990Z shr.s16 %rs1322, %rs1321, 4; 2026-02-21T10:22:24.2979051Z cvt.s16.s8 %rs1323, %rs1236; 2026-02-21T10:22:24.2979113Z shr.s16 %rs1324, %rs1323, 4; 2026-02-21T10:22:24.2979177Z shr.s16 %rs1325, %rs1233, 4; 2026-02-21T10:22:24.2979241Z shr.s16 %rs1326, %rs1235, 4; 2026-02-21T10:22:24.2979434Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2979498Z cvt.rn.f32.s16 %r19717, %rs1326; 2026-02-21T10:22:24.2979562Z cvt.rn.f32.s16 %r19718, %rs1325; 
2026-02-21T10:22:24.2979624Z cvt.rn.f32.s16 %r19719, %rs1324; 2026-02-21T10:22:24.2979684Z cvt.rn.f32.s16 %r19720, %rs1322; 2026-02-21T10:22:24.2979872Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2979935Z cvt.s16.s8 %rs1327, %rs1238; 2026-02-21T10:22:24.2980074Z shr.s16 %rs1328, %rs1327, 4; 2026-02-21T10:22:24.2980140Z cvt.s16.s8 %rs1329, %rs1240; 2026-02-21T10:22:24.2980208Z shr.s16 %rs1330, %rs1329, 4; 2026-02-21T10:22:24.2980268Z shr.s16 %rs1331, %rs1237, 4; 2026-02-21T10:22:24.2980329Z shr.s16 %rs1332, %rs1239, 4; 2026-02-21T10:22:24.2980524Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2980649Z cvt.rn.f32.s16 %r19721, %rs1332; 2026-02-21T10:22:24.2980712Z cvt.rn.f32.s16 %r19722, %rs1331; 2026-02-21T10:22:24.2980773Z cvt.rn.f32.s16 %r19723, %rs1330; 2026-02-21T10:22:24.2980843Z cvt.rn.f32.s16 %r19724, %rs1328; 2026-02-21T10:22:24.2981036Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2981097Z cvt.s16.s8 %rs1333, %rs1242; 2026-02-21T10:22:24.2981161Z shr.s16 %rs1334, %rs1333, 4; 2026-02-21T10:22:24.2981224Z cvt.s16.s8 %rs1335, %rs1244; 2026-02-21T10:22:24.2981291Z shr.s16 %rs1336, %rs1335, 4; 2026-02-21T10:22:24.2981355Z shr.s16 %rs1337, %rs1241, 4; 2026-02-21T10:22:24.2981414Z shr.s16 %rs1338, %rs1243, 4; 2026-02-21T10:22:24.2981604Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2981670Z cvt.rn.f32.s16 %r19725, %rs1338; 2026-02-21T10:22:24.2981740Z cvt.rn.f32.s16 %r19726, %rs1337; 2026-02-21T10:22:24.2981802Z cvt.rn.f32.s16 %r19727, %rs1336; 2026-02-21T10:22:24.2981924Z cvt.rn.f32.s16 %r19728, %rs1334; 2026-02-21T10:22:24.2982125Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.2982186Z cvt.s16.s8 %rs1339, %rs1246; 2026-02-21T10:22:24.2982249Z shr.s16 %rs1340, %rs1339, 4; 2026-02-21T10:22:24.2982313Z 
cvt.s16.s8 %rs1341, %rs1248; 2026-02-21T10:22:24.2982375Z shr.s16 %rs1342, %rs1341, 4; 2026-02-21T10:22:24.2982437Z shr.s16 %rs1343, %rs1245, 4; 2026-02-21T10:22:24.2982501Z shr.s16 %rs1344, %rs1247, 4; 2026-02-21T10:22:24.2982699Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.2982767Z cvt.rn.f32.s16 %r19729, %rs1344; 2026-02-21T10:22:24.2982830Z cvt.rn.f32.s16 %r19730, %rs1343; 2026-02-21T10:22:24.2982897Z cvt.rn.f32.s16 %r19731, %rs1342; 2026-02-21T10:22:24.2982962Z cvt.rn.f32.s16 %r19732, %rs1340; 2026-02-21T10:22:24.2983019Z bar.sync 0; 2026-02-21T10:22:24.2983145Z st.shared.v4.b32 [%r28], {%r19672, %r19670, %r19671, %r19669}; 2026-02-21T10:22:24.2983280Z st.shared.v4.b32 [%r28+16384], {%r19704, %r19702, %r19703, %r19701}; 2026-02-21T10:22:24.2983392Z st.shared.v4.b32 [%r29], {%r19676, %r19674, %r19675, %r19673}; 2026-02-21T10:22:24.2983513Z st.shared.v4.b32 [%r29+16384], {%r19708, %r19706, %r19707, %r19705}; 2026-02-21T10:22:24.2983628Z st.shared.v4.b32 [%r30], {%r19680, %r19678, %r19679, %r19677}; 2026-02-21T10:22:24.2983744Z st.shared.v4.b32 [%r30+16384], {%r19712, %r19710, %r19711, %r19709}; 2026-02-21T10:22:24.2983913Z st.shared.v4.b32 [%r31], {%r19684, %r19682, %r19683, %r19681}; 2026-02-21T10:22:24.2984033Z st.shared.v4.b32 [%r31+16384], {%r19716, %r19714, %r19715, %r19713}; 2026-02-21T10:22:24.2984140Z st.shared.v4.b32 [%r32], {%r19688, %r19686, %r19687, %r19685}; 2026-02-21T10:22:24.2984258Z st.shared.v4.b32 [%r32+16384], {%r19720, %r19718, %r19719, %r19717}; 2026-02-21T10:22:24.2984373Z st.shared.v4.b32 [%r33], {%r19692, %r19690, %r19691, %r19689}; 2026-02-21T10:22:24.2984491Z st.shared.v4.b32 [%r33+16384], {%r19724, %r19722, %r19723, %r19721}; 2026-02-21T10:22:24.2984596Z st.shared.v4.b32 [%r34], {%r19696, %r19694, %r19695, %r19693}; 2026-02-21T10:22:24.2984712Z st.shared.v4.b32 [%r34+16384], {%r19728, %r19726, %r19727, %r19725}; 2026-02-21T10:22:24.2984827Z st.shared.v4.b32 [%r35], {%r19700, 
%r19698, %r19699, %r19697}; 2026-02-21T10:22:24.2984943Z st.shared.v4.b32 [%r35+16384], {%r19732, %r19730, %r19731, %r19729}; 2026-02-21T10:22:24.2985003Z $L__tmp11: 2026-02-21T10:22:24.2985359Z .loc 2 291 36 // standard.py:291:36 @[ c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:94:40 ] 2026-02-21T10:22:24.2985426Z // begin inline asm 2026-02-21T10:22:24.2985508Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.2985570Z // end inline asm 2026-02-21T10:22:24.2985628Z bar.sync 0; 2026-02-21T10:22:24.2985749Z wgmma.fence.sync.aligned; 2026-02-21T10:22:24.2985809Z // begin inline asm 2026-02-21T10:22:24.2987406Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r14940,%r14941,%r14942,%r14943}, %rd3, %p133, 1, 1; 2026-02-21T10:22:24.2987471Z // end inline asm 2026-02-21T10:22:24.2987534Z // begin inline asm 2026-02-21T10:22:24.2989142Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r15072,%r15073,%r15074,%r15075}, %rd4, %p133, 1, 1; 
2026-02-21T10:22:24.2989209Z // end inline asm 2026-02-21T10:22:24.2989274Z // begin inline asm 2026-02-21T10:22:24.2990726Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r15204,%r15205,%r15206,%r15207}, %rd5, %p133, 1, 1; 2026-02-21T10:22:24.2990793Z // end inline asm 2026-02-21T10:22:24.2990851Z // begin inline asm 2026-02-21T10:22:24.2992304Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r15336,%r15337,%r15338,%r15339}, %rd6, %p133, 1, 1; 2026-02-21T10:22:24.2992436Z // end inline asm 2026-02-21T10:22:24.2992496Z // begin inline asm 2026-02-21T10:22:24.2993944Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r15468,%r15469,%r15470,%r15471}, %rd7, %p133, 1, 1; 2026-02-21T10:22:24.2994070Z // end inline asm 2026-02-21T10:22:24.2994132Z // begin inline asm 2026-02-21T10:22:24.2995597Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r15600,%r15601,%r15602,%r15603}, %rd8, %p133, 1, 1; 2026-02-21T10:22:24.2995721Z // end inline asm 2026-02-21T10:22:24.2995779Z // begin inline asm 2026-02-21T10:22:24.2997356Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r15732,%r15733,%r15734,%r15735}, %rd9, %p133, 1, 1; 2026-02-21T10:22:24.2997423Z // end inline asm 2026-02-21T10:22:24.2997562Z // begin inline asm 2026-02-21T10:22:24.2999024Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r15864,%r15865,%r15866,%r15867}, %rd10, %p133, 1, 1; 2026-02-21T10:22:24.2999084Z // end inline asm 2026-02-21T10:22:24.2999148Z // begin inline asm 2026-02-21T10:22:24.3000602Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r15996,%r15997,%r15998,%r15999}, %rd3, %p133, 1, 1; 2026-02-21T10:22:24.3000663Z // end inline asm 2026-02-21T10:22:24.3000787Z // begin inline asm 2026-02-21T10:22:24.3002239Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r16128,%r16129,%r16130,%r16131}, %rd4, %p133, 1, 1; 2026-02-21T10:22:24.3002301Z // end inline asm 2026-02-21T10:22:24.3002360Z // begin inline asm 2026-02-21T10:22:24.3003876Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r16260,%r16261,%r16262,%r16263}, %rd5, %p133, 1, 1; 2026-02-21T10:22:24.3003999Z // end inline asm 2026-02-21T10:22:24.3004058Z // begin inline asm 2026-02-21T10:22:24.3005515Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r16392,%r16393,%r16394,%r16395}, %rd6, %p133, 1, 1; 2026-02-21T10:22:24.3005573Z // end inline asm 2026-02-21T10:22:24.3005632Z // begin inline asm 2026-02-21T10:22:24.3007485Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r16524,%r16525,%r16526,%r16527}, %rd7, %p133, 1, 1; 2026-02-21T10:22:24.3007564Z // end inline asm 2026-02-21T10:22:24.3007624Z // begin inline asm 2026-02-21T10:22:24.3009088Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r16656,%r16657,%r16658,%r16659}, %rd8, %p133, 1, 1; 2026-02-21T10:22:24.3009148Z // end inline asm 2026-02-21T10:22:24.3009209Z // begin inline asm 2026-02-21T10:22:24.3010666Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r16788,%r16789,%r16790,%r16791}, %rd9, %p133, 1, 1; 2026-02-21T10:22:24.3010795Z // end inline asm 2026-02-21T10:22:24.3010857Z // begin inline asm 2026-02-21T10:22:24.3012385Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r16920,%r16921,%r16922,%r16923}, %rd10, %p133, 1, 1; 2026-02-21T10:22:24.3012455Z // end inline asm 2026-02-21T10:22:24.3012535Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:24.3012599Z mov.b32 %r17053, %r19468; 2026-02-21T10:22:24.3012659Z mov.b32 %r17054, %r19468; 2026-02-21T10:22:24.3012724Z mov.b32 %r17052, %r39936; 2026-02-21T10:22:24.3012845Z // begin inline asm 2026-02-21T10:22:24.3015328Z // wait for regs: 
%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858,%r17052,%r17053,%r17054 2026-02-21T10:22:24.3015409Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:24.3015515Z // end inline asm 2026-02-21T10:22:24.3015576Z $L__tmp12: 2026-02-21T10:22:24.3015786Z .loc 1 58 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:32 2026-02-21T10:22:24.3015853Z add.s64 %rd415, %rd333, 256; 2026-02-21T10:22:24.3015918Z add.s64 %rd418, %rd336, 256; 2026-02-21T10:22:24.3015978Z add.s64 %rd421, %rd339, 256; 2026-02-21T10:22:24.3016039Z add.s64 %rd424, %rd342, 256; 2026-02-21T10:22:24.3016112Z add.s64 %rd427, %rd345, 256; 2026-02-21T10:22:24.3016193Z add.s64 %rd430, %rd348, 256; 2026-02-21T10:22:24.3016256Z add.s64 %rd433, %rd351, 256; 2026-02-21T10:22:24.3016336Z mad.wide.s32 %rd436, %r42730, 2, %rd117; 2026-02-21T10:22:24.3016673Z .loc 1 58 80 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:80 2026-02-21T10:22:24.3016741Z // begin inline asm 2026-02-21T10:22:24.3016801Z mov.u64 %rd414, 0x0; 
2026-02-21T10:22:24.3016937Z createpolicy.fractional.L2::evict_first.b64 %rd414, 1.0; 2026-02-21T10:22:24.3016999Z // end inline asm 2026-02-21T10:22:24.3017059Z // begin inline asm 2026-02-21T10:22:24.3017119Z mov.u32 %r17186, 0x0; 2026-02-21T10:22:24.3017185Z mov.u32 %r17187, 0x0; 2026-02-21T10:22:24.3017244Z mov.u32 %r17188, 0x0; 2026-02-21T10:22:24.3017299Z mov.u32 %r17189, 0x0; 2026-02-21T10:22:24.3017543Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r17186, %r17187, %r17188, %r17189 }, [ %rd415 + 0 ], %rd414; 2026-02-21T10:22:24.3017708Z // end inline asm 2026-02-21T10:22:24.3017769Z // begin inline asm 2026-02-21T10:22:24.3017827Z mov.u64 %rd417, 0x0; 2026-02-21T10:22:24.3017955Z createpolicy.fractional.L2::evict_first.b64 %rd417, 1.0; 2026-02-21T10:22:24.3018017Z // end inline asm 2026-02-21T10:22:24.3018075Z // begin inline asm 2026-02-21T10:22:24.3018142Z mov.u32 %r17190, 0x0; 2026-02-21T10:22:24.3018202Z mov.u32 %r17191, 0x0; 2026-02-21T10:22:24.3018259Z mov.u32 %r17192, 0x0; 2026-02-21T10:22:24.3018322Z mov.u32 %r17193, 0x0; 2026-02-21T10:22:24.3018555Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r17190, %r17191, %r17192, %r17193 }, [ %rd418 + 0 ], %rd417; 2026-02-21T10:22:24.3018615Z // end inline asm 2026-02-21T10:22:24.3018676Z // begin inline asm 2026-02-21T10:22:24.3018750Z mov.u64 %rd420, 0x0; 2026-02-21T10:22:24.3018875Z createpolicy.fractional.L2::evict_first.b64 %rd420, 1.0; 2026-02-21T10:22:24.3018933Z // end inline asm 2026-02-21T10:22:24.3018995Z // begin inline asm 2026-02-21T10:22:24.3019057Z mov.u32 %r17194, 0x0; 2026-02-21T10:22:24.3019184Z mov.u32 %r17195, 0x0; 2026-02-21T10:22:24.3019245Z mov.u32 %r17196, 0x0; 2026-02-21T10:22:24.3019309Z mov.u32 %r17197, 0x0; 2026-02-21T10:22:24.3019535Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r17194, %r17195, %r17196, %r17197 }, [ %rd421 + 0 ], %rd420; 2026-02-21T10:22:24.3019652Z // end inline asm 2026-02-21T10:22:24.3019714Z // begin inline asm 2026-02-21T10:22:24.3019773Z 
mov.u64 %rd423, 0x0; 2026-02-21T10:22:24.3019894Z createpolicy.fractional.L2::evict_first.b64 %rd423, 1.0; 2026-02-21T10:22:24.3019954Z // end inline asm 2026-02-21T10:22:24.3020013Z // begin inline asm 2026-02-21T10:22:24.3020070Z mov.u32 %r17198, 0x0; 2026-02-21T10:22:24.3020129Z mov.u32 %r17199, 0x0; 2026-02-21T10:22:24.3020190Z mov.u32 %r17200, 0x0; 2026-02-21T10:22:24.3020247Z mov.u32 %r17201, 0x0; 2026-02-21T10:22:24.3020472Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r17198, %r17199, %r17200, %r17201 }, [ %rd424 + 0 ], %rd423; 2026-02-21T10:22:24.3020538Z // end inline asm 2026-02-21T10:22:24.3020598Z // begin inline asm 2026-02-21T10:22:24.3020658Z mov.u64 %rd426, 0x0; 2026-02-21T10:22:24.3020780Z createpolicy.fractional.L2::evict_first.b64 %rd426, 1.0; 2026-02-21T10:22:24.3020836Z // end inline asm 2026-02-21T10:22:24.3020895Z // begin inline asm 2026-02-21T10:22:24.3020953Z mov.u32 %r17202, 0x0; 2026-02-21T10:22:24.3021017Z mov.u32 %r17203, 0x0; 2026-02-21T10:22:24.3021077Z mov.u32 %r17204, 0x0; 2026-02-21T10:22:24.3021134Z mov.u32 %r17205, 0x0; 2026-02-21T10:22:24.3021440Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r17202, %r17203, %r17204, %r17205 }, [ %rd427 + 0 ], %rd426; 2026-02-21T10:22:24.3021501Z // end inline asm 2026-02-21T10:22:24.3021560Z // begin inline asm 2026-02-21T10:22:24.3021619Z mov.u64 %rd429, 0x0; 2026-02-21T10:22:24.3021760Z createpolicy.fractional.L2::evict_first.b64 %rd429, 1.0; 2026-02-21T10:22:24.3021818Z // end inline asm 2026-02-21T10:22:24.3021878Z // begin inline asm 2026-02-21T10:22:24.3021944Z mov.u32 %r17206, 0x0; 2026-02-21T10:22:24.3022003Z mov.u32 %r17207, 0x0; 2026-02-21T10:22:24.3022061Z mov.u32 %r17208, 0x0; 2026-02-21T10:22:24.3022123Z mov.u32 %r17209, 0x0; 2026-02-21T10:22:24.3022349Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r17206, %r17207, %r17208, %r17209 }, [ %rd430 + 0 ], %rd429; 2026-02-21T10:22:24.3022409Z // end inline asm 2026-02-21T10:22:24.3022467Z // begin inline asm 
2026-02-21T10:22:24.3022528Z mov.u64 %rd432, 0x0; 2026-02-21T10:22:24.3022651Z createpolicy.fractional.L2::evict_first.b64 %rd432, 1.0; 2026-02-21T10:22:24.3022711Z // end inline asm 2026-02-21T10:22:24.3022774Z // begin inline asm 2026-02-21T10:22:24.3022833Z mov.u32 %r17210, 0x0; 2026-02-21T10:22:24.3022890Z mov.u32 %r17211, 0x0; 2026-02-21T10:22:24.3022947Z mov.u32 %r17212, 0x0; 2026-02-21T10:22:24.3023008Z mov.u32 %r17213, 0x0; 2026-02-21T10:22:24.3023232Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r17210, %r17211, %r17212, %r17213 }, [ %rd433 + 0 ], %rd432; 2026-02-21T10:22:24.3023343Z // end inline asm 2026-02-21T10:22:24.3023408Z // begin inline asm 2026-02-21T10:22:24.3023466Z mov.u64 %rd435, 0x0; 2026-02-21T10:22:24.3023588Z createpolicy.fractional.L2::evict_first.b64 %rd435, 1.0; 2026-02-21T10:22:24.3023648Z // end inline asm 2026-02-21T10:22:24.3023707Z // begin inline asm 2026-02-21T10:22:24.3023767Z mov.u32 %r17214, 0x0; 2026-02-21T10:22:24.3023826Z mov.u32 %r17215, 0x0; 2026-02-21T10:22:24.3023890Z mov.u32 %r17216, 0x0; 2026-02-21T10:22:24.3023947Z mov.u32 %r17217, 0x0; 2026-02-21T10:22:24.3024173Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r17214, %r17215, %r17216, %r17217 }, [ %rd436 + 0 ], %rd435; 2026-02-21T10:22:24.3024235Z // end inline asm 2026-02-21T10:22:24.3024439Z .loc 1 62 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:62:32 2026-02-21T10:22:24.3024498Z bar.sync 0; 2026-02-21T10:22:24.3024588Z st.shared.v2.b32 [%r10], {%r17186, %r17187}; 2026-02-21T10:22:24.3024681Z st.shared.v2.b32 [%r10+2048], {%r17190, %r17191}; 2026-02-21T10:22:24.3024823Z st.shared.v2.b32 [%r10+4096], {%r17194, %r17195}; 2026-02-21T10:22:24.3024912Z st.shared.v2.b32 [%r10+6144], {%r17198, %r17199}; 2026-02-21T10:22:24.3025001Z st.shared.v2.b32 [%r10+8192], {%r17202, %r17203}; 2026-02-21T10:22:24.3025093Z st.shared.v2.b32 [%r10+10240], {%r17206, %r17207}; 2026-02-21T10:22:24.3025229Z st.shared.v2.b32 [%r10+12288], {%r17210, %r17211}; 
2026-02-21T10:22:24.3025320Z st.shared.v2.b32 [%r10+14336], {%r17214, %r17215}; 2026-02-21T10:22:24.3025403Z st.shared.v2.b32 [%r11], {%r17188, %r17189}; 2026-02-21T10:22:24.3025488Z st.shared.v2.b32 [%r11+2048], {%r17192, %r17193}; 2026-02-21T10:22:24.3025575Z st.shared.v2.b32 [%r11+4096], {%r17196, %r17197}; 2026-02-21T10:22:24.3025659Z st.shared.v2.b32 [%r11+6144], {%r17200, %r17201}; 2026-02-21T10:22:24.3025741Z st.shared.v2.b32 [%r11+8192], {%r17204, %r17205}; 2026-02-21T10:22:24.3025826Z st.shared.v2.b32 [%r11+10240], {%r17208, %r17209}; 2026-02-21T10:22:24.3025924Z st.shared.v2.b32 [%r11+12288], {%r17212, %r17213}; 2026-02-21T10:22:24.3026017Z st.shared.v2.b32 [%r11+14336], {%r17216, %r17217}; 2026-02-21T10:22:24.3026073Z bar.sync 0; 2026-02-21T10:22:24.3026147Z ld.shared.b16 %rs1345, [%r52]; 2026-02-21T10:22:24.3026218Z ld.shared.b16 %rs1346, [%r52+1024]; 2026-02-21T10:22:24.3026288Z ld.shared.b16 %rs1347, [%r52+64]; 2026-02-21T10:22:24.3026362Z ld.shared.b16 %rs1348, [%r52+1088]; 2026-02-21T10:22:24.3026428Z ld.shared.b16 %rs1349, [%r52+8192]; 2026-02-21T10:22:24.3026687Z ld.shared.b16 %rs1350, [%r52+9216]; 2026-02-21T10:22:24.3026761Z ld.shared.b16 %rs1351, [%r52+8256]; 2026-02-21T10:22:24.3026834Z ld.shared.b16 %rs1352, [%r52+9280]; 2026-02-21T10:22:24.3026900Z ld.shared.b16 %rs1353, [%r53]; 2026-02-21T10:22:24.3026966Z ld.shared.b16 %rs1354, [%r53+1024]; 2026-02-21T10:22:24.3027037Z ld.shared.b16 %rs1355, [%r53+64]; 2026-02-21T10:22:24.3027115Z ld.shared.b16 %rs1356, [%r53+1088]; 2026-02-21T10:22:24.3027183Z ld.shared.b16 %rs1357, [%r53+8192]; 2026-02-21T10:22:24.3027258Z ld.shared.b16 %rs1358, [%r53+9216]; 2026-02-21T10:22:24.3027328Z ld.shared.b16 %rs1359, [%r53+8256]; 2026-02-21T10:22:24.3027391Z ld.shared.b16 %rs1360, [%r53+9280]; 2026-02-21T10:22:24.3027455Z ld.shared.b16 %rs1361, [%r54]; 2026-02-21T10:22:24.3027524Z ld.shared.b16 %rs1362, [%r54+1024]; 2026-02-21T10:22:24.3027590Z ld.shared.b16 %rs1363, [%r54+64]; 2026-02-21T10:22:24.3027658Z 
ld.shared.b16 %rs1364, [%r54+1088]; 2026-02-21T10:22:24.3027738Z ld.shared.b16 %rs1365, [%r54+8192]; 2026-02-21T10:22:24.3027808Z ld.shared.b16 %rs1366, [%r54+9216]; 2026-02-21T10:22:24.3027875Z ld.shared.b16 %rs1367, [%r54+8256]; 2026-02-21T10:22:24.3027940Z ld.shared.b16 %rs1368, [%r54+9280]; 2026-02-21T10:22:24.3028007Z ld.shared.b16 %rs1369, [%r55]; 2026-02-21T10:22:24.3028075Z ld.shared.b16 %rs1370, [%r55+1024]; 2026-02-21T10:22:24.3028140Z ld.shared.b16 %rs1371, [%r55+64]; 2026-02-21T10:22:24.3028208Z ld.shared.b16 %rs1372, [%r55+1088]; 2026-02-21T10:22:24.3028355Z ld.shared.b16 %rs1373, [%r55+8192]; 2026-02-21T10:22:24.3028521Z ld.shared.b16 %rs1374, [%r55+9216]; 2026-02-21T10:22:24.3028591Z ld.shared.b16 %rs1375, [%r55+8256]; 2026-02-21T10:22:24.3028660Z ld.shared.b16 %rs1376, [%r55+9280]; 2026-02-21T10:22:24.3028724Z ld.shared.b16 %rs1377, [%r56]; 2026-02-21T10:22:24.3028788Z ld.shared.b16 %rs1378, [%r56+1024]; 2026-02-21T10:22:24.3028861Z ld.shared.b16 %rs1379, [%r56+64]; 2026-02-21T10:22:24.3028925Z ld.shared.b16 %rs1380, [%r56+1088]; 2026-02-21T10:22:24.3028992Z ld.shared.b16 %rs1381, [%r56+8192]; 2026-02-21T10:22:24.3029060Z ld.shared.b16 %rs1382, [%r56+9216]; 2026-02-21T10:22:24.3029124Z ld.shared.b16 %rs1383, [%r56+8256]; 2026-02-21T10:22:24.3029188Z ld.shared.b16 %rs1384, [%r56+9280]; 2026-02-21T10:22:24.3029252Z ld.shared.b16 %rs1385, [%r57]; 2026-02-21T10:22:24.3029321Z ld.shared.b16 %rs1386, [%r57+1024]; 2026-02-21T10:22:24.3029386Z ld.shared.b16 %rs1387, [%r57+64]; 2026-02-21T10:22:24.3029450Z ld.shared.b16 %rs1388, [%r57+1088]; 2026-02-21T10:22:24.3029609Z ld.shared.b16 %rs1389, [%r57+8192]; 2026-02-21T10:22:24.3029680Z ld.shared.b16 %rs1390, [%r57+9216]; 2026-02-21T10:22:24.3029746Z ld.shared.b16 %rs1391, [%r57+8256]; 2026-02-21T10:22:24.3029810Z ld.shared.b16 %rs1392, [%r57+9280]; 2026-02-21T10:22:24.3029880Z ld.shared.b16 %rs1393, [%r58]; 2026-02-21T10:22:24.3030010Z ld.shared.b16 %rs1394, [%r58+1024]; 2026-02-21T10:22:24.3030074Z 
ld.shared.b16 %rs1395, [%r58+64]; 2026-02-21T10:22:24.3030140Z ld.shared.b16 %rs1396, [%r58+1088]; 2026-02-21T10:22:24.3030206Z ld.shared.b16 %rs1397, [%r58+8192]; 2026-02-21T10:22:24.3030270Z ld.shared.b16 %rs1398, [%r58+9216]; 2026-02-21T10:22:24.3030342Z ld.shared.b16 %rs1399, [%r58+8256]; 2026-02-21T10:22:24.3030406Z ld.shared.b16 %rs1400, [%r58+9280]; 2026-02-21T10:22:24.3030467Z ld.shared.b16 %rs1401, [%r59]; 2026-02-21T10:22:24.3030532Z ld.shared.b16 %rs1402, [%r59+1024]; 2026-02-21T10:22:24.3030600Z ld.shared.b16 %rs1403, [%r59+64]; 2026-02-21T10:22:24.3030672Z ld.shared.b16 %rs1404, [%r59+1088]; 2026-02-21T10:22:24.3030734Z ld.shared.b16 %rs1405, [%r59+8192]; 2026-02-21T10:22:24.3030803Z ld.shared.b16 %rs1406, [%r59+9216]; 2026-02-21T10:22:24.3030867Z ld.shared.b16 %rs1407, [%r59+8256]; 2026-02-21T10:22:24.3030932Z ld.shared.b16 %rs1408, [%r59+9280]; 2026-02-21T10:22:24.3030994Z cvt.f32.bf16 %r17355, %rs1345; 2026-02-21T10:22:24.3031063Z cvt.f32.bf16 %r17356, %rs1346; 2026-02-21T10:22:24.3031126Z cvt.f32.bf16 %r17357, %rs1353; 2026-02-21T10:22:24.3031239Z cvt.f32.bf16 %r17358, %rs1354; 2026-02-21T10:22:24.3031309Z cvt.f32.bf16 %r17487, %rs1361; 2026-02-21T10:22:24.3031367Z cvt.f32.bf16 %r17488, %rs1362; 2026-02-21T10:22:24.3031425Z cvt.f32.bf16 %r17489, %rs1369; 2026-02-21T10:22:24.3031490Z cvt.f32.bf16 %r17490, %rs1370; 2026-02-21T10:22:24.3031552Z cvt.f32.bf16 %r17619, %rs1377; 2026-02-21T10:22:24.3031613Z cvt.f32.bf16 %r17620, %rs1378; 2026-02-21T10:22:24.3031673Z cvt.f32.bf16 %r17621, %rs1385; 2026-02-21T10:22:24.3031754Z cvt.f32.bf16 %r17622, %rs1386; 2026-02-21T10:22:24.3031825Z cvt.f32.bf16 %r17751, %rs1393; 2026-02-21T10:22:24.3031888Z cvt.f32.bf16 %r17752, %rs1394; 2026-02-21T10:22:24.3031953Z cvt.f32.bf16 %r17753, %rs1401; 2026-02-21T10:22:24.3032014Z cvt.f32.bf16 %r17754, %rs1402; 2026-02-21T10:22:24.3032075Z cvt.f32.bf16 %r17883, %rs1347; 2026-02-21T10:22:24.3032140Z cvt.f32.bf16 %r17884, %rs1348; 2026-02-21T10:22:24.3032205Z cvt.f32.bf16 
%r17885, %rs1355; 2026-02-21T10:22:24.3032268Z cvt.f32.bf16 %r17886, %rs1356; 2026-02-21T10:22:24.3032332Z cvt.f32.bf16 %r18015, %rs1363; 2026-02-21T10:22:24.3032395Z cvt.f32.bf16 %r18016, %rs1364; 2026-02-21T10:22:24.3032458Z cvt.f32.bf16 %r18017, %rs1371; 2026-02-21T10:22:24.3032519Z cvt.f32.bf16 %r18018, %rs1372; 2026-02-21T10:22:24.3032581Z cvt.f32.bf16 %r18147, %rs1379; 2026-02-21T10:22:24.3032659Z cvt.f32.bf16 %r18148, %rs1380; 2026-02-21T10:22:24.3032722Z cvt.f32.bf16 %r18149, %rs1387; 2026-02-21T10:22:24.3032787Z cvt.f32.bf16 %r18150, %rs1388; 2026-02-21T10:22:24.3032913Z cvt.f32.bf16 %r18279, %rs1395; 2026-02-21T10:22:24.3032974Z cvt.f32.bf16 %r18280, %rs1396; 2026-02-21T10:22:24.3033035Z cvt.f32.bf16 %r18281, %rs1403; 2026-02-21T10:22:24.3033095Z cvt.f32.bf16 %r18282, %rs1404; 2026-02-21T10:22:24.3033162Z cvt.f32.bf16 %r18411, %rs1349; 2026-02-21T10:22:24.3033226Z cvt.f32.bf16 %r18412, %rs1350; 2026-02-21T10:22:24.3033290Z cvt.f32.bf16 %r18413, %rs1357; 2026-02-21T10:22:24.3033354Z cvt.f32.bf16 %r18414, %rs1358; 2026-02-21T10:22:24.3033421Z cvt.f32.bf16 %r18543, %rs1365; 2026-02-21T10:22:24.3033482Z cvt.f32.bf16 %r18544, %rs1366; 2026-02-21T10:22:24.3033552Z cvt.f32.bf16 %r18545, %rs1373; 2026-02-21T10:22:24.3033613Z cvt.f32.bf16 %r18546, %rs1374; 2026-02-21T10:22:24.3033672Z cvt.f32.bf16 %r18675, %rs1381; 2026-02-21T10:22:24.3033735Z cvt.f32.bf16 %r18676, %rs1382; 2026-02-21T10:22:24.3033801Z cvt.f32.bf16 %r18677, %rs1389; 2026-02-21T10:22:24.3033863Z cvt.f32.bf16 %r18678, %rs1390; 2026-02-21T10:22:24.3033930Z cvt.f32.bf16 %r18807, %rs1397; 2026-02-21T10:22:24.3034055Z cvt.f32.bf16 %r18808, %rs1398; 2026-02-21T10:22:24.3034122Z cvt.f32.bf16 %r18809, %rs1405; 2026-02-21T10:22:24.3034183Z cvt.f32.bf16 %r18810, %rs1406; 2026-02-21T10:22:24.3034245Z cvt.f32.bf16 %r18939, %rs1351; 2026-02-21T10:22:24.3034310Z cvt.f32.bf16 %r18940, %rs1352; 2026-02-21T10:22:24.3034419Z cvt.f32.bf16 %r18941, %rs1359; 2026-02-21T10:22:24.3034480Z cvt.f32.bf16 %r18942, %rs1360; 
2026-02-21T10:22:24.3034552Z cvt.f32.bf16 %r19071, %rs1367; 2026-02-21T10:22:24.3034621Z cvt.f32.bf16 %r19072, %rs1368; 2026-02-21T10:22:24.3034683Z cvt.f32.bf16 %r19073, %rs1375; 2026-02-21T10:22:24.3034744Z cvt.f32.bf16 %r19074, %rs1376; 2026-02-21T10:22:24.3034810Z cvt.f32.bf16 %r19203, %rs1383; 2026-02-21T10:22:24.3034872Z cvt.f32.bf16 %r19204, %rs1384; 2026-02-21T10:22:24.3034933Z cvt.f32.bf16 %r19205, %rs1391; 2026-02-21T10:22:24.3034999Z cvt.f32.bf16 %r19206, %rs1392; 2026-02-21T10:22:24.3035061Z cvt.f32.bf16 %r19335, %rs1399; 2026-02-21T10:22:24.3035128Z cvt.f32.bf16 %r19336, %rs1400; 2026-02-21T10:22:24.3035190Z cvt.f32.bf16 %r19337, %rs1407; 2026-02-21T10:22:24.3035258Z cvt.f32.bf16 %r19338, %rs1408; 2026-02-21T10:22:24.3035467Z .loc 1 64 33 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:64:33 2026-02-21T10:22:24.3035527Z bar.sync 0; 2026-02-21T10:22:24.3035594Z // begin inline asm 2026-02-21T10:22:24.3035696Z @%p313 mbarrier.init.shared::cta.b64 [%r29850], 1; 2026-02-21T10:22:24.3035755Z // end inline asm 2026-02-21T10:22:24.3035886Z bar.sync 0; 2026-02-21T10:22:24.3035947Z // begin inline asm 2026-02-21T10:22:24.3036084Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r29850], 4096; 2026-02-21T10:22:24.3036141Z // end inline asm 2026-02-21T10:22:24.3036212Z // begin inline asm 2026-02-21T10:22:24.3036289Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.3036345Z // end inline asm 2026-02-21T10:22:24.3036409Z bar.sync 0; 2026-02-21T10:22:24.3036616Z elect.sync %r19733|%p192, -1; 2026-02-21T10:22:24.3036695Z and.pred %p171, %p1, %p192; 2026-02-21T10:22:24.3036758Z add.s32 %r17222, %r19668, 160; 2026-02-21T10:22:24.3036824Z // begin inline asm 2026-02-21T10:22:24.3037166Z @%p171 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39936], [%rd822, {%r19833, %r17222}], [%r29850]; 2026-02-21T10:22:24.3037227Z // end inline asm 2026-02-21T10:22:24.3037287Z bar.sync 0; 2026-02-21T10:22:24.3037345Z // begin inline asm 
2026-02-21T10:22:24.3037397Z 2026-02-21T10:22:24.3037453Z { 2026-02-21T10:22:24.3037517Z .reg .pred complete; 2026-02-21T10:22:24.3037575Z waitLoop: 2026-02-21T10:22:24.3037722Z mbarrier.try_wait.parity.shared.b64 complete, [%r29850], %r19468; 2026-02-21T10:22:24.3037800Z @!complete bra.uni waitLoop; 2026-02-21T10:22:24.3037852Z } 2026-02-21T10:22:24.3037857Z 2026-02-21T10:22:24.3037915Z // end inline asm 2026-02-21T10:22:24.3037973Z bar.sync 0; 2026-02-21T10:22:24.3038032Z // begin inline asm 2026-02-21T10:22:24.3038226Z @%p313 mbarrier.inval.shared::cta.b64 [%r29850]; 2026-02-21T10:22:24.3038283Z // end inline asm 2026-02-21T10:22:24.3038496Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3038564Z ld.shared.s8 %rs1409, [%r20]; 2026-02-21T10:22:24.3038763Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3038836Z shl.b16 %rs1410, %rs1409, 4; 2026-02-21T10:22:24.3039030Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3039099Z ld.shared.s8 %rs1411, [%r21+128]; 2026-02-21T10:22:24.3039292Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3039355Z shl.b16 %rs1412, %rs1411, 4; 2026-02-21T10:22:24.3039544Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3039684Z ld.shared.s8 %rs1413, [%r22+256]; 2026-02-21T10:22:24.3039878Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3039939Z shl.b16 %rs1414, %rs1413, 4; 2026-02-21T10:22:24.3040129Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3040260Z ld.shared.s8 %rs1415, [%r23+384]; 2026-02-21T10:22:24.3040453Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3040516Z shl.b16 %rs1416, %rs1415, 4; 2026-02-21T10:22:24.3040708Z .loc 1 69 
25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3040784Z ld.shared.s8 %rs1417, [%r24+512]; 2026-02-21T10:22:24.3040978Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3041045Z shl.b16 %rs1418, %rs1417, 4; 2026-02-21T10:22:24.3041239Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3041305Z ld.shared.s8 %rs1419, [%r25+640]; 2026-02-21T10:22:24.3041499Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3041567Z shl.b16 %rs1420, %rs1419, 4; 2026-02-21T10:22:24.3041757Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3041894Z ld.shared.s8 %rs1421, [%r26+768]; 2026-02-21T10:22:24.3042086Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3042150Z shl.b16 %rs1422, %rs1421, 4; 2026-02-21T10:22:24.3042338Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3042406Z ld.shared.s8 %rs1423, [%r27+896]; 2026-02-21T10:22:24.3042601Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3042669Z shl.b16 %rs1424, %rs1423, 4; 2026-02-21T10:22:24.3042862Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3042931Z ld.shared.s8 %rs1425, [%r20+1024]; 2026-02-21T10:22:24.3043124Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3043191Z shl.b16 %rs1426, %rs1425, 4; 2026-02-21T10:22:24.3043385Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3043452Z ld.shared.s8 %rs1427, [%r21+1152]; 2026-02-21T10:22:24.3043643Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3043706Z shl.b16 %rs1428, %rs1427, 4; 
2026-02-21T10:22:24.3043895Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3044018Z ld.shared.s8 %rs1429, [%r22+1280]; 2026-02-21T10:22:24.3044212Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3044273Z shl.b16 %rs1430, %rs1429, 4; 2026-02-21T10:22:24.3044462Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3044536Z ld.shared.s8 %rs1431, [%r23+1408]; 2026-02-21T10:22:24.3044725Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3044788Z shl.b16 %rs1432, %rs1431, 4; 2026-02-21T10:22:24.3044985Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3045049Z ld.shared.s8 %rs1433, [%r24+1536]; 2026-02-21T10:22:24.3045239Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3045313Z shl.b16 %rs1434, %rs1433, 4; 2026-02-21T10:22:24.3045560Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3045630Z ld.shared.s8 %rs1435, [%r25+1664]; 2026-02-21T10:22:24.3045819Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3045933Z shl.b16 %rs1436, %rs1435, 4; 2026-02-21T10:22:24.3046123Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3046188Z ld.shared.s8 %rs1437, [%r26+1792]; 2026-02-21T10:22:24.3046381Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3046443Z shl.b16 %rs1438, %rs1437, 4; 2026-02-21T10:22:24.3046758Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3046829Z ld.shared.s8 %rs1439, [%r27+1920]; 2026-02-21T10:22:24.3047022Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3047083Z shl.b16 
%rs1440, %rs1439, 4; 2026-02-21T10:22:24.3047277Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3047349Z ld.shared.s8 %rs1441, [%r20+2048]; 2026-02-21T10:22:24.3047538Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3047679Z shl.b16 %rs1442, %rs1441, 4; 2026-02-21T10:22:24.3047886Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3047952Z ld.shared.s8 %rs1443, [%r21+2176]; 2026-02-21T10:22:24.3048139Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3048207Z shl.b16 %rs1444, %rs1443, 4; 2026-02-21T10:22:24.3048398Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3048465Z ld.shared.s8 %rs1445, [%r22+2304]; 2026-02-21T10:22:24.3048656Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3048719Z shl.b16 %rs1446, %rs1445, 4; 2026-02-21T10:22:24.3048912Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3048980Z ld.shared.s8 %rs1447, [%r23+2432]; 2026-02-21T10:22:24.3049173Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3049237Z shl.b16 %rs1448, %rs1447, 4; 2026-02-21T10:22:24.3049427Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3049496Z ld.shared.s8 %rs1449, [%r24+2560]; 2026-02-21T10:22:24.3049686Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3049820Z shl.b16 %rs1450, %rs1449, 4; 2026-02-21T10:22:24.3050013Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3050078Z ld.shared.s8 %rs1451, [%r25+2688]; 2026-02-21T10:22:24.3050267Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 
2026-02-21T10:22:24.3050336Z shl.b16 %rs1452, %rs1451, 4; 2026-02-21T10:22:24.3050526Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3050592Z ld.shared.s8 %rs1453, [%r26+2816]; 2026-02-21T10:22:24.3050783Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3050844Z shl.b16 %rs1454, %rs1453, 4; 2026-02-21T10:22:24.3051033Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3051106Z ld.shared.s8 %rs1455, [%r27+2944]; 2026-02-21T10:22:24.3051358Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3051422Z shl.b16 %rs1456, %rs1455, 4; 2026-02-21T10:22:24.3051614Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3051742Z ld.shared.s8 %rs1457, [%r20+3072]; 2026-02-21T10:22:24.3051947Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3052011Z shl.b16 %rs1458, %rs1457, 4; 2026-02-21T10:22:24.3052205Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3052272Z ld.shared.s8 %rs1459, [%r21+3200]; 2026-02-21T10:22:24.3052462Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3052534Z shl.b16 %rs1460, %rs1459, 4; 2026-02-21T10:22:24.3052729Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3052794Z ld.shared.s8 %rs1461, [%r22+3328]; 2026-02-21T10:22:24.3052986Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3053051Z shl.b16 %rs1462, %rs1461, 4; 2026-02-21T10:22:24.3053239Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3053357Z ld.shared.s8 %rs1463, [%r23+3456]; 2026-02-21T10:22:24.3053553Z .loc 1 67 28 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3053619Z shl.b16 %rs1464, %rs1463, 4; 2026-02-21T10:22:24.3053807Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3053877Z ld.shared.s8 %rs1465, [%r24+3584]; 2026-02-21T10:22:24.3054071Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3054135Z shl.b16 %rs1466, %rs1465, 4; 2026-02-21T10:22:24.3054332Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3054396Z ld.shared.s8 %rs1467, [%r25+3712]; 2026-02-21T10:22:24.3054591Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3054658Z shl.b16 %rs1468, %rs1467, 4; 2026-02-21T10:22:24.3054846Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3054911Z ld.shared.s8 %rs1469, [%r26+3840]; 2026-02-21T10:22:24.3055103Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3055171Z shl.b16 %rs1470, %rs1469, 4; 2026-02-21T10:22:24.3055361Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3055486Z ld.shared.s8 %rs1471, [%r27+3968]; 2026-02-21T10:22:24.3055688Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3055756Z shl.b16 %rs1472, %rs1471, 4; 2026-02-21T10:22:24.3055949Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3056020Z cvt.s16.s8 %rs1473, %rs1410; 2026-02-21T10:22:24.3056082Z shr.s16 %rs1474, %rs1473, 4; 2026-02-21T10:22:24.3056146Z cvt.s16.s8 %rs1475, %rs1412; 2026-02-21T10:22:24.3056208Z shr.s16 %rs1476, %rs1475, 4; 2026-02-21T10:22:24.3056275Z shr.s16 %rs1477, %rs1409, 4; 2026-02-21T10:22:24.3056335Z shr.s16 %rs1478, %rs1411, 4; 2026-02-21T10:22:24.3056634Z .loc 1 87 32 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3056713Z cvt.rn.f32.s16 %r19734, %rs1478; 2026-02-21T10:22:24.3056780Z cvt.rn.f32.s16 %r19735, %rs1477; 2026-02-21T10:22:24.3056926Z cvt.rn.f32.s16 %r19736, %rs1476; 2026-02-21T10:22:24.3056997Z cvt.rn.f32.s16 %r19737, %rs1474; 2026-02-21T10:22:24.3057194Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3057256Z cvt.s16.s8 %rs1479, %rs1414; 2026-02-21T10:22:24.3057397Z shr.s16 %rs1480, %rs1479, 4; 2026-02-21T10:22:24.3057463Z cvt.s16.s8 %rs1481, %rs1416; 2026-02-21T10:22:24.3057525Z shr.s16 %rs1482, %rs1481, 4; 2026-02-21T10:22:24.3057590Z shr.s16 %rs1483, %rs1413, 4; 2026-02-21T10:22:24.3057654Z shr.s16 %rs1484, %rs1415, 4; 2026-02-21T10:22:24.3057847Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3057911Z cvt.rn.f32.s16 %r19738, %rs1484; 2026-02-21T10:22:24.3057979Z cvt.rn.f32.s16 %r19739, %rs1483; 2026-02-21T10:22:24.3058042Z cvt.rn.f32.s16 %r19740, %rs1482; 2026-02-21T10:22:24.3058108Z cvt.rn.f32.s16 %r19741, %rs1480; 2026-02-21T10:22:24.3058304Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3058381Z cvt.s16.s8 %rs1485, %rs1418; 2026-02-21T10:22:24.3058446Z shr.s16 %rs1486, %rs1485, 4; 2026-02-21T10:22:24.3058507Z cvt.s16.s8 %rs1487, %rs1420; 2026-02-21T10:22:24.3058572Z shr.s16 %rs1488, %rs1487, 4; 2026-02-21T10:22:24.3058636Z shr.s16 %rs1489, %rs1417, 4; 2026-02-21T10:22:24.3058697Z shr.s16 %rs1490, %rs1419, 4; 2026-02-21T10:22:24.3058959Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3059024Z cvt.rn.f32.s16 %r19742, %rs1490; 2026-02-21T10:22:24.3059087Z cvt.rn.f32.s16 %r19743, %rs1489; 2026-02-21T10:22:24.3059149Z cvt.rn.f32.s16 %r19744, %rs1488; 2026-02-21T10:22:24.3059215Z cvt.rn.f32.s16 %r19745, %rs1486; 2026-02-21T10:22:24.3059404Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3059470Z cvt.s16.s8 %rs1491, %rs1422; 2026-02-21T10:22:24.3059539Z shr.s16 %rs1492, %rs1491, 4; 2026-02-21T10:22:24.3059600Z cvt.s16.s8 %rs1493, %rs1424; 2026-02-21T10:22:24.3059660Z shr.s16 %rs1494, %rs1493, 4; 2026-02-21T10:22:24.3059720Z shr.s16 %rs1495, %rs1421, 4; 2026-02-21T10:22:24.3059783Z shr.s16 %rs1496, %rs1423, 4; 2026-02-21T10:22:24.3059980Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3060047Z cvt.rn.f32.s16 %r19746, %rs1496; 2026-02-21T10:22:24.3060116Z cvt.rn.f32.s16 %r19747, %rs1495; 2026-02-21T10:22:24.3060178Z cvt.rn.f32.s16 %r19748, %rs1494; 2026-02-21T10:22:24.3060240Z cvt.rn.f32.s16 %r19749, %rs1492; 2026-02-21T10:22:24.3060444Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3060507Z cvt.s16.s8 %rs1497, %rs1426; 2026-02-21T10:22:24.3060568Z shr.s16 %rs1498, %rs1497, 4; 2026-02-21T10:22:24.3060700Z cvt.s16.s8 %rs1499, %rs1428; 2026-02-21T10:22:24.3060768Z shr.s16 %rs1500, %rs1499, 4; 2026-02-21T10:22:24.3060829Z shr.s16 %rs1501, %rs1425, 4; 2026-02-21T10:22:24.3060889Z shr.s16 %rs1502, %rs1427, 4; 2026-02-21T10:22:24.3061088Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3061155Z cvt.rn.f32.s16 %r19750, %rs1502; 2026-02-21T10:22:24.3061218Z cvt.rn.f32.s16 %r19751, %rs1501; 2026-02-21T10:22:24.3061286Z cvt.rn.f32.s16 %r19752, %rs1500; 2026-02-21T10:22:24.3061349Z cvt.rn.f32.s16 %r19753, %rs1498; 2026-02-21T10:22:24.3061540Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3061602Z cvt.s16.s8 %rs1503, %rs1430; 2026-02-21T10:22:24.3061667Z shr.s16 %rs1504, %rs1503, 4; 2026-02-21T10:22:24.3061730Z cvt.s16.s8 %rs1505, %rs1432; 2026-02-21T10:22:24.3061790Z shr.s16 %rs1506, %rs1505, 4; 2026-02-21T10:22:24.3061855Z shr.s16 %rs1507, %rs1429, 4; 
2026-02-21T10:22:24.3061969Z shr.s16 %rs1508, %rs1431, 4; 2026-02-21T10:22:24.3062163Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3062233Z cvt.rn.f32.s16 %r19754, %rs1508; 2026-02-21T10:22:24.3062295Z cvt.rn.f32.s16 %r19755, %rs1507; 2026-02-21T10:22:24.3062357Z cvt.rn.f32.s16 %r19756, %rs1506; 2026-02-21T10:22:24.3062467Z cvt.rn.f32.s16 %r19757, %rs1504; 2026-02-21T10:22:24.3062673Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3062737Z cvt.s16.s8 %rs1509, %rs1434; 2026-02-21T10:22:24.3062800Z shr.s16 %rs1510, %rs1509, 4; 2026-02-21T10:22:24.3062865Z cvt.s16.s8 %rs1511, %rs1436; 2026-02-21T10:22:24.3062926Z shr.s16 %rs1512, %rs1511, 4; 2026-02-21T10:22:24.3062987Z shr.s16 %rs1513, %rs1433, 4; 2026-02-21T10:22:24.3063048Z shr.s16 %rs1514, %rs1435, 4; 2026-02-21T10:22:24.3063253Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3063323Z cvt.rn.f32.s16 %r19758, %rs1514; 2026-02-21T10:22:24.3063387Z cvt.rn.f32.s16 %r19759, %rs1513; 2026-02-21T10:22:24.3063453Z cvt.rn.f32.s16 %r19760, %rs1512; 2026-02-21T10:22:24.3063516Z cvt.rn.f32.s16 %r19761, %rs1510; 2026-02-21T10:22:24.3063719Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3063790Z cvt.s16.s8 %rs1515, %rs1438; 2026-02-21T10:22:24.3063854Z shr.s16 %rs1516, %rs1515, 4; 2026-02-21T10:22:24.3063969Z cvt.s16.s8 %rs1517, %rs1440; 2026-02-21T10:22:24.3064034Z shr.s16 %rs1518, %rs1517, 4; 2026-02-21T10:22:24.3064101Z shr.s16 %rs1519, %rs1437, 4; 2026-02-21T10:22:24.3064159Z shr.s16 %rs1520, %rs1439, 4; 2026-02-21T10:22:24.3064358Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3064431Z cvt.rn.f32.s16 %r19762, %rs1520; 2026-02-21T10:22:24.3064497Z cvt.rn.f32.s16 %r19763, %rs1519; 2026-02-21T10:22:24.3064566Z cvt.rn.f32.s16 %r19764, %rs1518; 2026-02-21T10:22:24.3064636Z 
cvt.rn.f32.s16 %r19765, %rs1516; 2026-02-21T10:22:24.3064829Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3064894Z cvt.s16.s8 %rs1521, %rs1442; 2026-02-21T10:22:24.3064959Z shr.s16 %rs1522, %rs1521, 4; 2026-02-21T10:22:24.3065025Z cvt.s16.s8 %rs1523, %rs1444; 2026-02-21T10:22:24.3065084Z shr.s16 %rs1524, %rs1523, 4; 2026-02-21T10:22:24.3065147Z shr.s16 %rs1525, %rs1441, 4; 2026-02-21T10:22:24.3065214Z shr.s16 %rs1526, %rs1443, 4; 2026-02-21T10:22:24.3065406Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3065471Z cvt.rn.f32.s16 %r19766, %rs1526; 2026-02-21T10:22:24.3065538Z cvt.rn.f32.s16 %r19767, %rs1525; 2026-02-21T10:22:24.3065605Z cvt.rn.f32.s16 %r19768, %rs1524; 2026-02-21T10:22:24.3065667Z cvt.rn.f32.s16 %r19769, %rs1522; 2026-02-21T10:22:24.3065929Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3065998Z cvt.s16.s8 %rs1527, %rs1446; 2026-02-21T10:22:24.3066058Z shr.s16 %rs1528, %rs1527, 4; 2026-02-21T10:22:24.3066120Z cvt.s16.s8 %rs1529, %rs1448; 2026-02-21T10:22:24.3066188Z shr.s16 %rs1530, %rs1529, 4; 2026-02-21T10:22:24.3066247Z shr.s16 %rs1531, %rs1445, 4; 2026-02-21T10:22:24.3066307Z shr.s16 %rs1532, %rs1447, 4; 2026-02-21T10:22:24.3066659Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3066735Z cvt.rn.f32.s16 %r19770, %rs1532; 2026-02-21T10:22:24.3066799Z cvt.rn.f32.s16 %r19771, %rs1531; 2026-02-21T10:22:24.3066861Z cvt.rn.f32.s16 %r19772, %rs1530; 2026-02-21T10:22:24.3066928Z cvt.rn.f32.s16 %r19773, %rs1528; 2026-02-21T10:22:24.3067120Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3067263Z cvt.s16.s8 %rs1533, %rs1450; 2026-02-21T10:22:24.3067332Z shr.s16 %rs1534, %rs1533, 4; 2026-02-21T10:22:24.3067393Z cvt.s16.s8 %rs1535, %rs1452; 2026-02-21T10:22:24.3067454Z shr.s16 %rs1536, %rs1535, 4; 
2026-02-21T10:22:24.3067515Z shr.s16 %rs1537, %rs1449, 4; 2026-02-21T10:22:24.3067578Z shr.s16 %rs1538, %rs1451, 4; 2026-02-21T10:22:24.3067832Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3067896Z cvt.rn.f32.s16 %r19774, %rs1538; 2026-02-21T10:22:24.3067966Z cvt.rn.f32.s16 %r19775, %rs1537; 2026-02-21T10:22:24.3068028Z cvt.rn.f32.s16 %r19776, %rs1536; 2026-02-21T10:22:24.3068101Z cvt.rn.f32.s16 %r19777, %rs1534; 2026-02-21T10:22:24.3068301Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3068363Z cvt.s16.s8 %rs1539, %rs1454; 2026-02-21T10:22:24.3068509Z shr.s16 %rs1540, %rs1539, 4; 2026-02-21T10:22:24.3068578Z cvt.s16.s8 %rs1541, %rs1456; 2026-02-21T10:22:24.3068644Z shr.s16 %rs1542, %rs1541, 4; 2026-02-21T10:22:24.3068706Z shr.s16 %rs1543, %rs1453, 4; 2026-02-21T10:22:24.3068766Z shr.s16 %rs1544, %rs1455, 4; 2026-02-21T10:22:24.3068962Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3069029Z cvt.rn.f32.s16 %r19778, %rs1544; 2026-02-21T10:22:24.3069091Z cvt.rn.f32.s16 %r19779, %rs1543; 2026-02-21T10:22:24.3069162Z cvt.rn.f32.s16 %r19780, %rs1542; 2026-02-21T10:22:24.3069295Z cvt.rn.f32.s16 %r19781, %rs1540; 2026-02-21T10:22:24.3069492Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3069553Z cvt.s16.s8 %rs1545, %rs1458; 2026-02-21T10:22:24.3069619Z shr.s16 %rs1546, %rs1545, 4; 2026-02-21T10:22:24.3069678Z cvt.s16.s8 %rs1547, %rs1460; 2026-02-21T10:22:24.3069740Z shr.s16 %rs1548, %rs1547, 4; 2026-02-21T10:22:24.3069808Z shr.s16 %rs1549, %rs1457, 4; 2026-02-21T10:22:24.3069874Z shr.s16 %rs1550, %rs1459, 4; 2026-02-21T10:22:24.3070064Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3070131Z cvt.rn.f32.s16 %r19782, %rs1550; 2026-02-21T10:22:24.3070201Z cvt.rn.f32.s16 %r19783, %rs1549; 2026-02-21T10:22:24.3070278Z 
cvt.rn.f32.s16 %r19784, %rs1548; 2026-02-21T10:22:24.3070342Z cvt.rn.f32.s16 %r19785, %rs1546; 2026-02-21T10:22:24.3070544Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3070607Z cvt.s16.s8 %rs1551, %rs1462; 2026-02-21T10:22:24.3070669Z shr.s16 %rs1552, %rs1551, 4; 2026-02-21T10:22:24.3070734Z cvt.s16.s8 %rs1553, %rs1464; 2026-02-21T10:22:24.3070794Z shr.s16 %rs1554, %rs1553, 4; 2026-02-21T10:22:24.3070854Z shr.s16 %rs1555, %rs1461, 4; 2026-02-21T10:22:24.3070913Z shr.s16 %rs1556, %rs1463, 4; 2026-02-21T10:22:24.3071110Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3071247Z cvt.rn.f32.s16 %r19786, %rs1556; 2026-02-21T10:22:24.3071309Z cvt.rn.f32.s16 %r19787, %rs1555; 2026-02-21T10:22:24.3071375Z cvt.rn.f32.s16 %r19788, %rs1554; 2026-02-21T10:22:24.3071438Z cvt.rn.f32.s16 %r19789, %rs1552; 2026-02-21T10:22:24.3071628Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3071711Z cvt.s16.s8 %rs1557, %rs1466; 2026-02-21T10:22:24.3071777Z shr.s16 %rs1558, %rs1557, 4; 2026-02-21T10:22:24.3071839Z cvt.s16.s8 %rs1559, %rs1468; 2026-02-21T10:22:24.3071900Z shr.s16 %rs1560, %rs1559, 4; 2026-02-21T10:22:24.3071968Z shr.s16 %rs1561, %rs1465, 4; 2026-02-21T10:22:24.3072028Z shr.s16 %rs1562, %rs1467, 4; 2026-02-21T10:22:24.3072217Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3072285Z cvt.rn.f32.s16 %r19790, %rs1562; 2026-02-21T10:22:24.3072351Z cvt.rn.f32.s16 %r19791, %rs1561; 2026-02-21T10:22:24.3072466Z cvt.rn.f32.s16 %r19792, %rs1560; 2026-02-21T10:22:24.3072535Z cvt.rn.f32.s16 %r19793, %rs1558; 2026-02-21T10:22:24.3072726Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3072788Z cvt.s16.s8 %rs1563, %rs1470; 2026-02-21T10:22:24.3072900Z shr.s16 %rs1564, %rs1563, 4; 2026-02-21T10:22:24.3072967Z cvt.s16.s8 %rs1565, %rs1472; 
2026-02-21T10:22:24.3073026Z shr.s16 %rs1566, %rs1565, 4; 2026-02-21T10:22:24.3073088Z shr.s16 %rs1567, %rs1469, 4; 2026-02-21T10:22:24.3073152Z shr.s16 %rs1568, %rs1471, 4; 2026-02-21T10:22:24.3073345Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3073408Z cvt.rn.f32.s16 %r19794, %rs1568; 2026-02-21T10:22:24.3073472Z cvt.rn.f32.s16 %r19795, %rs1567; 2026-02-21T10:22:24.3073540Z cvt.rn.f32.s16 %r19796, %rs1566; 2026-02-21T10:22:24.3073605Z cvt.rn.f32.s16 %r19797, %rs1564; 2026-02-21T10:22:24.3073664Z bar.sync 0; 2026-02-21T10:22:24.3073790Z st.shared.v4.b32 [%r28], {%r19737, %r19735, %r19736, %r19734}; 2026-02-21T10:22:24.3073921Z st.shared.v4.b32 [%r28+16384], {%r19769, %r19767, %r19768, %r19766}; 2026-02-21T10:22:24.3074035Z st.shared.v4.b32 [%r29], {%r19741, %r19739, %r19740, %r19738}; 2026-02-21T10:22:24.3074164Z st.shared.v4.b32 [%r29+16384], {%r19773, %r19771, %r19772, %r19770}; 2026-02-21T10:22:24.3074327Z st.shared.v4.b32 [%r30], {%r19745, %r19743, %r19744, %r19742}; 2026-02-21T10:22:24.3074448Z st.shared.v4.b32 [%r30+16384], {%r19777, %r19775, %r19776, %r19774}; 2026-02-21T10:22:24.3074555Z st.shared.v4.b32 [%r31], {%r19749, %r19747, %r19748, %r19746}; 2026-02-21T10:22:24.3074677Z st.shared.v4.b32 [%r31+16384], {%r19781, %r19779, %r19780, %r19778}; 2026-02-21T10:22:24.3074786Z st.shared.v4.b32 [%r32], {%r19753, %r19751, %r19752, %r19750}; 2026-02-21T10:22:24.3074914Z st.shared.v4.b32 [%r32+16384], {%r19785, %r19783, %r19784, %r19782}; 2026-02-21T10:22:24.3075037Z st.shared.v4.b32 [%r33], {%r19757, %r19755, %r19756, %r19754}; 2026-02-21T10:22:24.3075154Z st.shared.v4.b32 [%r33+16384], {%r19789, %r19787, %r19788, %r19786}; 2026-02-21T10:22:24.3075263Z st.shared.v4.b32 [%r34], {%r19761, %r19759, %r19760, %r19758}; 2026-02-21T10:22:24.3075388Z st.shared.v4.b32 [%r34+16384], {%r19793, %r19791, %r19792, %r19790}; 2026-02-21T10:22:24.3075498Z st.shared.v4.b32 [%r35], {%r19765, %r19763, %r19764, %r19762}; 
2026-02-21T10:22:24.3075616Z st.shared.v4.b32 [%r35+16384], {%r19797, %r19795, %r19796, %r19794}; 2026-02-21T10:22:24.3075678Z $L__tmp13: 2026-02-21T10:22:24.3075954Z .loc 2 291 36 // standard.py:291:36 @[ c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:94:40 ] 2026-02-21T10:22:24.3076018Z // begin inline asm 2026-02-21T10:22:24.3076097Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.3076161Z // end inline asm 2026-02-21T10:22:24.3076217Z bar.sync 0; 2026-02-21T10:22:24.3076352Z wgmma.fence.sync.aligned; 2026-02-21T10:22:24.3076419Z // begin inline asm 2026-02-21T10:22:24.3078208Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r17355,%r17356,%r17357,%r17358}, %rd3, %p133, 1, 1; 2026-02-21T10:22:24.3078279Z // end inline asm 2026-02-21T10:22:24.3078342Z // begin inline asm 2026-02-21T10:22:24.3079935Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r17487,%r17488,%r17489,%r17490}, %rd4, %p133, 1, 1; 2026-02-21T10:22:24.3080062Z 
// end inline asm 2026-02-21T10:22:24.3080125Z // begin inline asm 2026-02-21T10:22:24.3081608Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r17619,%r17620,%r17621,%r17622}, %rd5, %p133, 1, 1; 2026-02-21T10:22:24.3081677Z // end inline asm 2026-02-21T10:22:24.3081738Z // begin inline asm 2026-02-21T10:22:24.3083281Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r17751,%r17752,%r17753,%r17754}, %rd6, %p133, 1, 1; 2026-02-21T10:22:24.3083346Z // end inline asm 2026-02-21T10:22:24.3083408Z // begin inline asm 2026-02-21T10:22:24.3084892Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r17883,%r17884,%r17885,%r17886}, %rd7, %p133, 1, 1; 2026-02-21T10:22:24.3084954Z // end inline asm 2026-02-21T10:22:24.3085013Z // begin inline asm 2026-02-21T10:22:24.3086825Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r18015,%r18016,%r18017,%r18018}, %rd8, %p133, 1, 1; 2026-02-21T10:22:24.3086975Z // end inline asm 2026-02-21T10:22:24.3087045Z // begin inline asm 2026-02-21T10:22:24.3088598Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r18147,%r18148,%r18149,%r18150}, %rd9, %p133, 1, 1; 2026-02-21T10:22:24.3088659Z // end inline asm 2026-02-21T10:22:24.3088721Z // begin inline asm 2026-02-21T10:22:24.3090202Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r18279,%r18280,%r18281,%r18282}, %rd10, %p133, 1, 1; 2026-02-21T10:22:24.3090325Z // end inline asm 2026-02-21T10:22:24.3090385Z // begin inline asm 2026-02-21T10:22:24.3091919Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r18411,%r18412,%r18413,%r18414}, %rd3, %p133, 1, 1; 2026-02-21T10:22:24.3091989Z // end inline asm 2026-02-21T10:22:24.3092049Z // begin inline asm 2026-02-21T10:22:24.3093542Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r18543,%r18544,%r18545,%r18546}, %rd4, %p133, 1, 1; 2026-02-21T10:22:24.3093608Z // end inline asm 2026-02-21T10:22:24.3093668Z // begin inline asm 2026-02-21T10:22:24.3095153Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r18675,%r18676,%r18677,%r18678}, %rd5, %p133, 1, 1; 2026-02-21T10:22:24.3095266Z // end inline asm 2026-02-21T10:22:24.3095323Z // begin inline asm 2026-02-21T10:22:24.3096936Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r18807,%r18808,%r18809,%r18810}, %rd6, %p133, 1, 1; 2026-02-21T10:22:24.3097002Z // end inline asm 2026-02-21T10:22:24.3097059Z // begin inline asm 2026-02-21T10:22:24.3098616Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r18939,%r18940,%r18941,%r18942}, %rd7, %p133, 1, 1; 2026-02-21T10:22:24.3098741Z // end inline asm 2026-02-21T10:22:24.3098804Z // begin inline asm 2026-02-21T10:22:24.3100289Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r19071,%r19072,%r19073,%r19074}, %rd8, %p133, 1, 1; 2026-02-21T10:22:24.3100351Z // end inline asm 2026-02-21T10:22:24.3100414Z // begin inline asm 2026-02-21T10:22:24.3101955Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r19203,%r19204,%r19205,%r19206}, %rd9, %p133, 1, 1; 2026-02-21T10:22:24.3102022Z // end inline asm 2026-02-21T10:22:24.3102082Z // begin inline asm 2026-02-21T10:22:24.3103564Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r19335,%r19336,%r19337,%r19338}, %rd10, %p133, 1, 1; 2026-02-21T10:22:24.3103632Z // end inline asm 2026-02-21T10:22:24.3103713Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:24.3103777Z mov.b32 %r19467, %r39936; 2026-02-21T10:22:24.3103843Z mov.b32 %r19469, %r19468; 2026-02-21T10:22:24.3103903Z // begin inline asm 2026-02-21T10:22:24.3106626Z // wait for regs: 
%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858,%r19467,%r19468,%r19469 2026-02-21T10:22:24.3106794Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:24.3106857Z // end inline asm 2026-02-21T10:22:24.3106918Z $L__tmp14: 2026-02-21T10:22:24.3107136Z .loc 1 51 126 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:51:126 2026-02-21T10:22:24.3107262Z add.s64 %rd844, %rd844, 384; 2026-02-21T10:22:24.3107328Z add.s32 %r42730, %r42730, 192; 2026-02-21T10:22:24.3107404Z setp.lt.u64 %p193, %rd55, 3936; 2026-02-21T10:22:24.3107468Z mov.b64 %rd845, %rd55; 2026-02-21T10:22:24.3107530Z @%p193 bra $L__BB0_7; 2026-02-21T10:22:24.3107644Z // %bb.8: // %.preheader270.preheader 2026-02-21T10:22:24.3107751Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:22:24.3107815Z add.s64 %rd57, %rd52, 16128; 2026-02-21T10:22:24.3107883Z add.s64 %rd58, %rd45, 16128; 2026-02-21T10:22:24.3107944Z add.s64 %rd59, %rd46, 16128; 2026-02-21T10:22:24.3108008Z add.s64 %rd60, %rd47, 16128; 2026-02-21T10:22:24.3108070Z add.s64 %rd61, %rd48, 
16128; 2026-02-21T10:22:24.3108146Z add.s64 %rd62, %rd49, 16128; 2026-02-21T10:22:24.3108209Z add.s64 %rd63, %rd50, 16128; 2026-02-21T10:22:24.3108271Z add.s64 %rd64, %rd51, 16128; 2026-02-21T10:22:24.3108334Z mov.b64 %rd847, 4000; 2026-02-21T10:22:24.3108398Z mov.b64 %rd846, %rd11; 2026-02-21T10:22:24.3108582Z $L__BB0_9: // %.preheader270 2026-02-21T10:22:24.3108756Z // Parent Loop BB0_2 Depth=1 2026-02-21T10:22:24.3108867Z // => This Inner Loop Header: Depth=2 2026-02-21T10:22:24.3109073Z .loc 1 58 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:32 2026-02-21T10:22:24.3109139Z add.s64 %rd457, %rd846, %rd64; 2026-02-21T10:22:24.3109208Z add.s64 %rd460, %rd846, %rd63; 2026-02-21T10:22:24.3109270Z add.s64 %rd463, %rd846, %rd62; 2026-02-21T10:22:24.3109334Z add.s64 %rd466, %rd846, %rd61; 2026-02-21T10:22:24.3109403Z add.s64 %rd469, %rd846, %rd60; 2026-02-21T10:22:24.3109466Z add.s64 %rd472, %rd846, %rd59; 2026-02-21T10:22:24.3109528Z add.s64 %rd475, %rd846, %rd58; 2026-02-21T10:22:24.3109725Z .loc 1 58 80 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:80 2026-02-21T10:22:24.3109795Z add.s64 %rd478, %rd846, %rd57; 2026-02-21T10:22:24.3109856Z // begin inline asm 2026-02-21T10:22:24.3109917Z mov.u64 %rd456, 0x0; 2026-02-21T10:22:24.3110054Z createpolicy.fractional.L2::evict_first.b64 %rd456, 1.0; 2026-02-21T10:22:24.3110113Z // end inline asm 2026-02-21T10:22:24.3110172Z // begin inline asm 2026-02-21T10:22:24.3110231Z mov.u32 %r19798, 0x0; 2026-02-21T10:22:24.3110294Z mov.u32 %r19799, 0x0; 2026-02-21T10:22:24.3110352Z mov.u32 %r19800, 0x0; 2026-02-21T10:22:24.3110423Z mov.u32 %r19801, 0x0; 2026-02-21T10:22:24.3110669Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r19798, %r19799, %r19800, %r19801 }, [ %rd457 + 0 ], %rd456; 2026-02-21T10:22:24.3110819Z // end inline asm 2026-02-21T10:22:24.3110880Z // begin inline asm 2026-02-21T10:22:24.3110945Z mov.u64 %rd459, 0x0; 2026-02-21T10:22:24.3111069Z 
createpolicy.fractional.L2::evict_first.b64 %rd459, 1.0; 2026-02-21T10:22:24.3111128Z // end inline asm 2026-02-21T10:22:24.3111190Z // begin inline asm 2026-02-21T10:22:24.3111270Z mov.u32 %r19802, 0x0; 2026-02-21T10:22:24.3111331Z mov.u32 %r19803, 0x0; 2026-02-21T10:22:24.3111389Z mov.u32 %r19804, 0x0; 2026-02-21T10:22:24.3111455Z mov.u32 %r19805, 0x0; 2026-02-21T10:22:24.3111685Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r19802, %r19803, %r19804, %r19805 }, [ %rd460 + 0 ], %rd459; 2026-02-21T10:22:24.3111743Z // end inline asm 2026-02-21T10:22:24.3111805Z // begin inline asm 2026-02-21T10:22:24.3111863Z mov.u64 %rd462, 0x0; 2026-02-21T10:22:24.3111983Z createpolicy.fractional.L2::evict_first.b64 %rd462, 1.0; 2026-02-21T10:22:24.3112041Z // end inline asm 2026-02-21T10:22:24.3112106Z // begin inline asm 2026-02-21T10:22:24.3112214Z mov.u32 %r19806, 0x0; 2026-02-21T10:22:24.3112276Z mov.u32 %r19807, 0x0; 2026-02-21T10:22:24.3112338Z mov.u32 %r19808, 0x0; 2026-02-21T10:22:24.3112399Z mov.u32 %r19809, 0x0; 2026-02-21T10:22:24.3112621Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r19806, %r19807, %r19808, %r19809 }, [ %rd463 + 0 ], %rd462; 2026-02-21T10:22:24.3112730Z // end inline asm 2026-02-21T10:22:24.3112790Z // begin inline asm 2026-02-21T10:22:24.3112848Z mov.u64 %rd465, 0x0; 2026-02-21T10:22:24.3112967Z createpolicy.fractional.L2::evict_first.b64 %rd465, 1.0; 2026-02-21T10:22:24.3113030Z // end inline asm 2026-02-21T10:22:24.3113089Z // begin inline asm 2026-02-21T10:22:24.3113145Z mov.u32 %r19810, 0x0; 2026-02-21T10:22:24.3113207Z mov.u32 %r19811, 0x0; 2026-02-21T10:22:24.3113265Z mov.u32 %r19812, 0x0; 2026-02-21T10:22:24.3113321Z mov.u32 %r19813, 0x0; 2026-02-21T10:22:24.3113541Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r19810, %r19811, %r19812, %r19813 }, [ %rd466 + 0 ], %rd465; 2026-02-21T10:22:24.3113608Z // end inline asm 2026-02-21T10:22:24.3113670Z // begin inline asm 2026-02-21T10:22:24.3113730Z mov.u64 %rd468, 0x0; 
2026-02-21T10:22:24.3113855Z createpolicy.fractional.L2::evict_first.b64 %rd468, 1.0; 2026-02-21T10:22:24.3113913Z // end inline asm 2026-02-21T10:22:24.3113971Z // begin inline asm 2026-02-21T10:22:24.3114034Z mov.u32 %r19814, 0x0; 2026-02-21T10:22:24.3114093Z mov.u32 %r19815, 0x0; 2026-02-21T10:22:24.3114163Z mov.u32 %r19816, 0x0; 2026-02-21T10:22:24.3114275Z mov.u32 %r19817, 0x0; 2026-02-21T10:22:24.3114507Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r19814, %r19815, %r19816, %r19817 }, [ %rd469 + 0 ], %rd468; 2026-02-21T10:22:24.3114565Z // end inline asm 2026-02-21T10:22:24.3114624Z // begin inline asm 2026-02-21T10:22:24.3114685Z mov.u64 %rd471, 0x0; 2026-02-21T10:22:24.3114804Z createpolicy.fractional.L2::evict_first.b64 %rd471, 1.0; 2026-02-21T10:22:24.3114864Z // end inline asm 2026-02-21T10:22:24.3114926Z // begin inline asm 2026-02-21T10:22:24.3114991Z mov.u32 %r19818, 0x0; 2026-02-21T10:22:24.3115050Z mov.u32 %r19819, 0x0; 2026-02-21T10:22:24.3115107Z mov.u32 %r19820, 0x0; 2026-02-21T10:22:24.3115167Z mov.u32 %r19821, 0x0; 2026-02-21T10:22:24.3115388Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r19818, %r19819, %r19820, %r19821 }, [ %rd472 + 0 ], %rd471; 2026-02-21T10:22:24.3115449Z // end inline asm 2026-02-21T10:22:24.3115511Z // begin inline asm 2026-02-21T10:22:24.3115571Z mov.u64 %rd474, 0x0; 2026-02-21T10:22:24.3115693Z createpolicy.fractional.L2::evict_first.b64 %rd474, 1.0; 2026-02-21T10:22:24.3115751Z // end inline asm 2026-02-21T10:22:24.3115816Z // begin inline asm 2026-02-21T10:22:24.3115876Z mov.u32 %r19822, 0x0; 2026-02-21T10:22:24.3115936Z mov.u32 %r19823, 0x0; 2026-02-21T10:22:24.3115997Z mov.u32 %r19824, 0x0; 2026-02-21T10:22:24.3116053Z mov.u32 %r19825, 0x0; 2026-02-21T10:22:24.3116293Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r19822, %r19823, %r19824, %r19825 }, [ %rd475 + 0 ], %rd474; 2026-02-21T10:22:24.3116422Z // end inline asm 2026-02-21T10:22:24.3116604Z // begin inline asm 2026-02-21T10:22:24.3116669Z 
mov.u64 %rd477, 0x0; 2026-02-21T10:22:24.3116789Z createpolicy.fractional.L2::evict_first.b64 %rd477, 1.0; 2026-02-21T10:22:24.3116850Z // end inline asm 2026-02-21T10:22:24.3116906Z // begin inline asm 2026-02-21T10:22:24.3116968Z mov.u32 %r19826, 0x0; 2026-02-21T10:22:24.3117032Z mov.u32 %r19827, 0x0; 2026-02-21T10:22:24.3117089Z mov.u32 %r19828, 0x0; 2026-02-21T10:22:24.3117148Z mov.u32 %r19829, 0x0; 2026-02-21T10:22:24.3117370Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r19826, %r19827, %r19828, %r19829 }, [ %rd478 + 0 ], %rd477; 2026-02-21T10:22:24.3117432Z // end inline asm 2026-02-21T10:22:24.3117634Z .loc 1 62 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:62:32 2026-02-21T10:22:24.3117690Z bar.sync 0; 2026-02-21T10:22:24.3117780Z st.shared.v2.b32 [%r10], {%r19798, %r19799}; 2026-02-21T10:22:24.3117960Z st.shared.v2.b32 [%r10+2048], {%r19802, %r19803}; 2026-02-21T10:22:24.3118055Z st.shared.v2.b32 [%r10+4096], {%r19806, %r19807}; 2026-02-21T10:22:24.3118145Z st.shared.v2.b32 [%r10+6144], {%r19810, %r19811}; 2026-02-21T10:22:24.3118231Z st.shared.v2.b32 [%r10+8192], {%r19814, %r19815}; 2026-02-21T10:22:24.3118325Z st.shared.v2.b32 [%r10+10240], {%r19818, %r19819}; 2026-02-21T10:22:24.3118479Z st.shared.v2.b32 [%r10+12288], {%r19822, %r19823}; 2026-02-21T10:22:24.3118573Z st.shared.v2.b32 [%r10+14336], {%r19826, %r19827}; 2026-02-21T10:22:24.3118654Z st.shared.v2.b32 [%r11], {%r19800, %r19801}; 2026-02-21T10:22:24.3118739Z st.shared.v2.b32 [%r11+2048], {%r19804, %r19805}; 2026-02-21T10:22:24.3118827Z st.shared.v2.b32 [%r11+4096], {%r19808, %r19809}; 2026-02-21T10:22:24.3118912Z st.shared.v2.b32 [%r11+6144], {%r19812, %r19813}; 2026-02-21T10:22:24.3118996Z st.shared.v2.b32 [%r11+8192], {%r19816, %r19817}; 2026-02-21T10:22:24.3119087Z st.shared.v2.b32 [%r11+10240], {%r19820, %r19821}; 2026-02-21T10:22:24.3119177Z st.shared.v2.b32 [%r11+12288], {%r19824, %r19825}; 2026-02-21T10:22:24.3119264Z st.shared.v2.b32 [%r11+14336], {%r19828, 
%r19829}; 2026-02-21T10:22:24.3119319Z bar.sync 0; 2026-02-21T10:22:24.3119396Z ld.shared.b16 %rs1569, [%r52]; 2026-02-21T10:22:24.3119468Z ld.shared.b16 %rs1570, [%r52+1024]; 2026-02-21T10:22:24.3119540Z ld.shared.b16 %rs1571, [%r52+64]; 2026-02-21T10:22:24.3119612Z ld.shared.b16 %rs1572, [%r52+1088]; 2026-02-21T10:22:24.3119759Z ld.shared.b16 %rs1573, [%r52+8192]; 2026-02-21T10:22:24.3119828Z ld.shared.b16 %rs1574, [%r52+9216]; 2026-02-21T10:22:24.3119900Z ld.shared.b16 %rs1575, [%r52+8256]; 2026-02-21T10:22:24.3119964Z ld.shared.b16 %rs1576, [%r52+9280]; 2026-02-21T10:22:24.3120032Z ld.shared.b16 %rs1577, [%r53]; 2026-02-21T10:22:24.3120097Z ld.shared.b16 %rs1578, [%r53+1024]; 2026-02-21T10:22:24.3120176Z ld.shared.b16 %rs1579, [%r53+64]; 2026-02-21T10:22:24.3120246Z ld.shared.b16 %rs1580, [%r53+1088]; 2026-02-21T10:22:24.3120315Z ld.shared.b16 %rs1581, [%r53+8192]; 2026-02-21T10:22:24.3120388Z ld.shared.b16 %rs1582, [%r53+9216]; 2026-02-21T10:22:24.3120454Z ld.shared.b16 %rs1583, [%r53+8256]; 2026-02-21T10:22:24.3120520Z ld.shared.b16 %rs1584, [%r53+9280]; 2026-02-21T10:22:24.3120588Z ld.shared.b16 %rs1585, [%r54]; 2026-02-21T10:22:24.3120662Z ld.shared.b16 %rs1586, [%r54+1024]; 2026-02-21T10:22:24.3120725Z ld.shared.b16 %rs1587, [%r54+64]; 2026-02-21T10:22:24.3120791Z ld.shared.b16 %rs1588, [%r54+1088]; 2026-02-21T10:22:24.3120865Z ld.shared.b16 %rs1589, [%r54+8192]; 2026-02-21T10:22:24.3120930Z ld.shared.b16 %rs1590, [%r54+9216]; 2026-02-21T10:22:24.3120996Z ld.shared.b16 %rs1591, [%r54+8256]; 2026-02-21T10:22:24.3121066Z ld.shared.b16 %rs1592, [%r54+9280]; 2026-02-21T10:22:24.3121132Z ld.shared.b16 %rs1593, [%r55]; 2026-02-21T10:22:24.3121199Z ld.shared.b16 %rs1594, [%r55+1024]; 2026-02-21T10:22:24.3121265Z ld.shared.b16 %rs1595, [%r55+64]; 2026-02-21T10:22:24.3121419Z ld.shared.b16 %rs1596, [%r55+1088]; 2026-02-21T10:22:24.3121492Z ld.shared.b16 %rs1597, [%r55+8192]; 2026-02-21T10:22:24.3121557Z ld.shared.b16 %rs1598, [%r55+9216]; 
2026-02-21T10:22:24.3121624Z ld.shared.b16 %rs1599, [%r55+8256]; 2026-02-21T10:22:24.3121688Z ld.shared.b16 %rs1600, [%r55+9280]; 2026-02-21T10:22:24.3121752Z ld.shared.b16 %rs1601, [%r56]; 2026-02-21T10:22:24.3121821Z ld.shared.b16 %rs1602, [%r56+1024]; 2026-02-21T10:22:24.3121891Z ld.shared.b16 %rs1603, [%r56+64]; 2026-02-21T10:22:24.3121957Z ld.shared.b16 %rs1604, [%r56+1088]; 2026-02-21T10:22:24.3122022Z ld.shared.b16 %rs1605, [%r56+8192]; 2026-02-21T10:22:24.3122094Z ld.shared.b16 %rs1606, [%r56+9216]; 2026-02-21T10:22:24.3122158Z ld.shared.b16 %rs1607, [%r56+8256]; 2026-02-21T10:22:24.3122223Z ld.shared.b16 %rs1608, [%r56+9280]; 2026-02-21T10:22:24.3122294Z ld.shared.b16 %rs1609, [%r57]; 2026-02-21T10:22:24.3122361Z ld.shared.b16 %rs1610, [%r57+1024]; 2026-02-21T10:22:24.3122426Z ld.shared.b16 %rs1611, [%r57+64]; 2026-02-21T10:22:24.3122546Z ld.shared.b16 %rs1612, [%r57+1088]; 2026-02-21T10:22:24.3122618Z ld.shared.b16 %rs1613, [%r57+8192]; 2026-02-21T10:22:24.3122683Z ld.shared.b16 %rs1614, [%r57+9216]; 2026-02-21T10:22:24.3122747Z ld.shared.b16 %rs1615, [%r57+8256]; 2026-02-21T10:22:24.3122819Z ld.shared.b16 %rs1616, [%r57+9280]; 2026-02-21T10:22:24.3122930Z ld.shared.b16 %rs1617, [%r58]; 2026-02-21T10:22:24.3122995Z ld.shared.b16 %rs1618, [%r58+1024]; 2026-02-21T10:22:24.3123062Z ld.shared.b16 %rs1619, [%r58+64]; 2026-02-21T10:22:24.3123132Z ld.shared.b16 %rs1620, [%r58+1088]; 2026-02-21T10:22:24.3123198Z ld.shared.b16 %rs1621, [%r58+8192]; 2026-02-21T10:22:24.3123262Z ld.shared.b16 %rs1622, [%r58+9216]; 2026-02-21T10:22:24.3123331Z ld.shared.b16 %rs1623, [%r58+8256]; 2026-02-21T10:22:24.3123396Z ld.shared.b16 %rs1624, [%r58+9280]; 2026-02-21T10:22:24.3123460Z ld.shared.b16 %rs1625, [%r59]; 2026-02-21T10:22:24.3123531Z ld.shared.b16 %rs1626, [%r59+1024]; 2026-02-21T10:22:24.3123598Z ld.shared.b16 %rs1627, [%r59+64]; 2026-02-21T10:22:24.3123666Z ld.shared.b16 %rs1628, [%r59+1088]; 2026-02-21T10:22:24.3123731Z ld.shared.b16 %rs1629, [%r59+8192]; 
2026-02-21T10:22:24.3123807Z ld.shared.b16 %rs1630, [%r59+9216]; 2026-02-21T10:22:24.3123879Z ld.shared.b16 %rs1631, [%r59+8256]; 2026-02-21T10:22:24.3123944Z ld.shared.b16 %rs1632, [%r59+9280]; 2026-02-21T10:22:24.3124015Z cvt.f32.bf16 %r19967, %rs1569; 2026-02-21T10:22:24.3124081Z cvt.f32.bf16 %r19968, %rs1570; 2026-02-21T10:22:24.3124144Z cvt.f32.bf16 %r19969, %rs1577; 2026-02-21T10:22:24.3124266Z cvt.f32.bf16 %r19970, %rs1578; 2026-02-21T10:22:24.3124337Z cvt.f32.bf16 %r20099, %rs1585; 2026-02-21T10:22:24.3124398Z cvt.f32.bf16 %r20100, %rs1586; 2026-02-21T10:22:24.3124459Z cvt.f32.bf16 %r20101, %rs1593; 2026-02-21T10:22:24.3124528Z cvt.f32.bf16 %r20102, %rs1594; 2026-02-21T10:22:24.3124590Z cvt.f32.bf16 %r20231, %rs1601; 2026-02-21T10:22:24.3124651Z cvt.f32.bf16 %r20232, %rs1602; 2026-02-21T10:22:24.3124713Z cvt.f32.bf16 %r20233, %rs1609; 2026-02-21T10:22:24.3124781Z cvt.f32.bf16 %r20234, %rs1610; 2026-02-21T10:22:24.3124843Z cvt.f32.bf16 %r20363, %rs1617; 2026-02-21T10:22:24.3124906Z cvt.f32.bf16 %r20364, %rs1618; 2026-02-21T10:22:24.3124970Z cvt.f32.bf16 %r20365, %rs1625; 2026-02-21T10:22:24.3128106Z cvt.f32.bf16 %r20366, %rs1626; 2026-02-21T10:22:24.3128224Z cvt.f32.bf16 %r20495, %rs1571; 2026-02-21T10:22:24.3128295Z cvt.f32.bf16 %r20496, %rs1572; 2026-02-21T10:22:24.3128359Z cvt.f32.bf16 %r20497, %rs1579; 2026-02-21T10:22:24.3128426Z cvt.f32.bf16 %r20498, %rs1580; 2026-02-21T10:22:24.3128489Z cvt.f32.bf16 %r20627, %rs1587; 2026-02-21T10:22:24.3128549Z cvt.f32.bf16 %r20628, %rs1588; 2026-02-21T10:22:24.3128607Z cvt.f32.bf16 %r20629, %rs1595; 2026-02-21T10:22:24.3128666Z cvt.f32.bf16 %r20630, %rs1596; 2026-02-21T10:22:24.3128729Z cvt.f32.bf16 %r20759, %rs1603; 2026-02-21T10:22:24.3128788Z cvt.f32.bf16 %r20760, %rs1604; 2026-02-21T10:22:24.3128847Z cvt.f32.bf16 %r20761, %rs1611; 2026-02-21T10:22:24.3129028Z cvt.f32.bf16 %r20762, %rs1612; 2026-02-21T10:22:24.3129089Z cvt.f32.bf16 %r20891, %rs1619; 2026-02-21T10:22:24.3129148Z cvt.f32.bf16 %r20892, %rs1620; 
2026-02-21T10:22:24.3129207Z cvt.f32.bf16 %r20893, %rs1627; 2026-02-21T10:22:24.3129270Z cvt.f32.bf16 %r20894, %rs1628; 2026-02-21T10:22:24.3129329Z cvt.f32.bf16 %r21023, %rs1573; 2026-02-21T10:22:24.3129391Z cvt.f32.bf16 %r21024, %rs1574; 2026-02-21T10:22:24.3129455Z cvt.f32.bf16 %r21025, %rs1581; 2026-02-21T10:22:24.3129517Z cvt.f32.bf16 %r21026, %rs1582; 2026-02-21T10:22:24.3129580Z cvt.f32.bf16 %r21155, %rs1589; 2026-02-21T10:22:24.3129642Z cvt.f32.bf16 %r21156, %rs1590; 2026-02-21T10:22:24.3129702Z cvt.f32.bf16 %r21157, %rs1597; 2026-02-21T10:22:24.3129762Z cvt.f32.bf16 %r21158, %rs1598; 2026-02-21T10:22:24.3129821Z cvt.f32.bf16 %r21287, %rs1605; 2026-02-21T10:22:24.3129882Z cvt.f32.bf16 %r21288, %rs1606; 2026-02-21T10:22:24.3129941Z cvt.f32.bf16 %r21289, %rs1613; 2026-02-21T10:22:24.3130003Z cvt.f32.bf16 %r21290, %rs1614; 2026-02-21T10:22:24.3130139Z cvt.f32.bf16 %r21419, %rs1621; 2026-02-21T10:22:24.3130203Z cvt.f32.bf16 %r21420, %rs1622; 2026-02-21T10:22:24.3130261Z cvt.f32.bf16 %r21421, %rs1629; 2026-02-21T10:22:24.3130319Z cvt.f32.bf16 %r21422, %rs1630; 2026-02-21T10:22:24.3130388Z cvt.f32.bf16 %r21551, %rs1575; 2026-02-21T10:22:24.3130519Z cvt.f32.bf16 %r21552, %rs1576; 2026-02-21T10:22:24.3130581Z cvt.f32.bf16 %r21553, %rs1583; 2026-02-21T10:22:24.3130643Z cvt.f32.bf16 %r21554, %rs1584; 2026-02-21T10:22:24.3130704Z cvt.f32.bf16 %r21683, %rs1591; 2026-02-21T10:22:24.3130763Z cvt.f32.bf16 %r21684, %rs1592; 2026-02-21T10:22:24.3130822Z cvt.f32.bf16 %r21685, %rs1599; 2026-02-21T10:22:24.3130883Z cvt.f32.bf16 %r21686, %rs1600; 2026-02-21T10:22:24.3130941Z cvt.f32.bf16 %r21815, %rs1607; 2026-02-21T10:22:24.3130999Z cvt.f32.bf16 %r21816, %rs1608; 2026-02-21T10:22:24.3131059Z cvt.f32.bf16 %r21817, %rs1615; 2026-02-21T10:22:24.3131117Z cvt.f32.bf16 %r21818, %rs1616; 2026-02-21T10:22:24.3131181Z cvt.f32.bf16 %r21947, %rs1623; 2026-02-21T10:22:24.3131240Z cvt.f32.bf16 %r21948, %rs1624; 2026-02-21T10:22:24.3131301Z cvt.f32.bf16 %r21949, %rs1631; 
2026-02-21T10:22:24.3131359Z cvt.f32.bf16 %r21950, %rs1632; 2026-02-21T10:22:24.3131588Z .loc 1 64 33 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:64:33 2026-02-21T10:22:24.3131652Z bar.sync 0; 2026-02-21T10:22:24.3131713Z // begin inline asm 2026-02-21T10:22:24.3131826Z @%p313 mbarrier.init.shared::cta.b64 [%r29850], 1; 2026-02-21T10:22:24.3131986Z // end inline asm 2026-02-21T10:22:24.3132046Z bar.sync 0; 2026-02-21T10:22:24.3132105Z // begin inline asm 2026-02-21T10:22:24.3132247Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r29850], 4096; 2026-02-21T10:22:24.3132304Z // end inline asm 2026-02-21T10:22:24.3132362Z // begin inline asm 2026-02-21T10:22:24.3132440Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.3132498Z // end inline asm 2026-02-21T10:22:24.3132556Z bar.sync 0; 2026-02-21T10:22:24.3132627Z elect.sync %r22213|%p215, -1; 2026-02-21T10:22:24.3132697Z and.pred %p196, %p1, %p215; 2026-02-21T10:22:24.3132760Z add.s64 %rd847, %rd847, 32; 2026-02-21T10:22:24.3132824Z cvt.u32.u64 %r19834, %rd847; 2026-02-21T10:22:24.3132883Z // begin inline asm 2026-02-21T10:22:24.3133239Z @%p196 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39936], [%rd822, {%r19833, %r19834}], [%r29850]; 2026-02-21T10:22:24.3133300Z // end inline asm 2026-02-21T10:22:24.3133354Z bar.sync 0; 2026-02-21T10:22:24.3133416Z mov.b32 %r22081, 0; 2026-02-21T10:22:24.3133476Z // begin inline asm 2026-02-21T10:22:24.3133528Z 2026-02-21T10:22:24.3133578Z { 2026-02-21T10:22:24.3133644Z .reg .pred complete; 2026-02-21T10:22:24.3133699Z waitLoop: 2026-02-21T10:22:24.3133850Z mbarrier.try_wait.parity.shared.b64 complete, [%r29850], %r22081; 2026-02-21T10:22:24.3133925Z @!complete bra.uni waitLoop; 2026-02-21T10:22:24.3133979Z } 2026-02-21T10:22:24.3134050Z 2026-02-21T10:22:24.3134110Z // end inline asm 2026-02-21T10:22:24.3134168Z bar.sync 0; 2026-02-21T10:22:24.3134229Z // begin inline asm 2026-02-21T10:22:24.3134330Z @%p313 
mbarrier.inval.shared::cta.b64 [%r29850]; 2026-02-21T10:22:24.3134385Z // end inline asm 2026-02-21T10:22:24.3134599Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3134670Z ld.shared.s8 %rs1633, [%r20]; 2026-02-21T10:22:24.3134868Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3134935Z shl.b16 %rs1634, %rs1633, 4; 2026-02-21T10:22:24.3135128Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3135196Z ld.shared.s8 %rs1635, [%r21+128]; 2026-02-21T10:22:24.3135386Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3135447Z shl.b16 %rs1636, %rs1635, 4; 2026-02-21T10:22:24.3135688Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3135755Z ld.shared.s8 %rs1637, [%r22+256]; 2026-02-21T10:22:24.3135945Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3136053Z shl.b16 %rs1638, %rs1637, 4; 2026-02-21T10:22:24.3136241Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3136309Z ld.shared.s8 %rs1639, [%r23+384]; 2026-02-21T10:22:24.3136644Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3136709Z shl.b16 %rs1640, %rs1639, 4; 2026-02-21T10:22:24.3136903Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3136966Z ld.shared.s8 %rs1641, [%r24+512]; 2026-02-21T10:22:24.3137160Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3137222Z shl.b16 %rs1642, %rs1641, 4; 2026-02-21T10:22:24.3137411Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3137474Z ld.shared.s8 %rs1643, [%r25+640]; 2026-02-21T10:22:24.3137670Z .loc 1 67 28 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3137743Z shl.b16 %rs1644, %rs1643, 4; 2026-02-21T10:22:24.3138020Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3138087Z ld.shared.s8 %rs1645, [%r26+768]; 2026-02-21T10:22:24.3138278Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3138338Z shl.b16 %rs1646, %rs1645, 4; 2026-02-21T10:22:24.3138526Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3138599Z ld.shared.s8 %rs1647, [%r27+896]; 2026-02-21T10:22:24.3138788Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3138849Z shl.b16 %rs1648, %rs1647, 4; 2026-02-21T10:22:24.3139040Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3139110Z ld.shared.s8 %rs1649, [%r20+1024]; 2026-02-21T10:22:24.3139299Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3139362Z shl.b16 %rs1650, %rs1649, 4; 2026-02-21T10:22:24.3139552Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3139616Z ld.shared.s8 %rs1651, [%r21+1152]; 2026-02-21T10:22:24.3139804Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3139950Z shl.b16 %rs1652, %rs1651, 4; 2026-02-21T10:22:24.3140142Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3140206Z ld.shared.s8 %rs1653, [%r22+1280]; 2026-02-21T10:22:24.3140398Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3140460Z shl.b16 %rs1654, %rs1653, 4; 2026-02-21T10:22:24.3140651Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3140716Z ld.shared.s8 %rs1655, [%r23+1408]; 2026-02-21T10:22:24.3140903Z 
.loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3140963Z shl.b16 %rs1656, %rs1655, 4; 2026-02-21T10:22:24.3141151Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3141217Z ld.shared.s8 %rs1657, [%r24+1536]; 2026-02-21T10:22:24.3141474Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3141537Z shl.b16 %rs1658, %rs1657, 4; 2026-02-21T10:22:24.3141727Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3141851Z ld.shared.s8 %rs1659, [%r25+1664]; 2026-02-21T10:22:24.3142038Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3142102Z shl.b16 %rs1660, %rs1659, 4; 2026-02-21T10:22:24.3142288Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3142351Z ld.shared.s8 %rs1661, [%r26+1792]; 2026-02-21T10:22:24.3142540Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3142599Z shl.b16 %rs1662, %rs1661, 4; 2026-02-21T10:22:24.3142790Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3142853Z ld.shared.s8 %rs1663, [%r27+1920]; 2026-02-21T10:22:24.3143043Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3143104Z shl.b16 %rs1664, %rs1663, 4; 2026-02-21T10:22:24.3143294Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3143359Z ld.shared.s8 %rs1665, [%r20+2048]; 2026-02-21T10:22:24.3143594Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3143657Z shl.b16 %rs1666, %rs1665, 4; 2026-02-21T10:22:24.3143846Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3143908Z ld.shared.s8 %rs1667, [%r21+2176]; 
2026-02-21T10:22:24.3144107Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3144175Z shl.b16 %rs1668, %rs1667, 4; 2026-02-21T10:22:24.3144365Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3144426Z ld.shared.s8 %rs1669, [%r22+2304]; 2026-02-21T10:22:24.3144614Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3144677Z shl.b16 %rs1670, %rs1669, 4; 2026-02-21T10:22:24.3144868Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3144930Z ld.shared.s8 %rs1671, [%r23+2432]; 2026-02-21T10:22:24.3145121Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3145180Z shl.b16 %rs1672, %rs1671, 4; 2026-02-21T10:22:24.3145366Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3145486Z ld.shared.s8 %rs1673, [%r24+2560]; 2026-02-21T10:22:24.3145672Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3145732Z shl.b16 %rs1674, %rs1673, 4; 2026-02-21T10:22:24.3145921Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3145986Z ld.shared.s8 %rs1675, [%r25+2688]; 2026-02-21T10:22:24.3146183Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3146245Z shl.b16 %rs1676, %rs1675, 4; 2026-02-21T10:22:24.3146435Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3146622Z ld.shared.s8 %rs1677, [%r26+2816]; 2026-02-21T10:22:24.3146818Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3146881Z shl.b16 %rs1678, %rs1677, 4; 2026-02-21T10:22:24.3147150Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3147219Z ld.shared.s8 
%rs1679, [%r27+2944]; 2026-02-21T10:22:24.3147412Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3147531Z shl.b16 %rs1680, %rs1679, 4; 2026-02-21T10:22:24.3147718Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3147784Z ld.shared.s8 %rs1681, [%r20+3072]; 2026-02-21T10:22:24.3147978Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3148041Z shl.b16 %rs1682, %rs1681, 4; 2026-02-21T10:22:24.3148234Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3148302Z ld.shared.s8 %rs1683, [%r21+3200]; 2026-02-21T10:22:24.3148584Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3148650Z shl.b16 %rs1684, %rs1683, 4; 2026-02-21T10:22:24.3148839Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3148903Z ld.shared.s8 %rs1685, [%r22+3328]; 2026-02-21T10:22:24.3149095Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3149158Z shl.b16 %rs1686, %rs1685, 4; 2026-02-21T10:22:24.3149418Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3149483Z ld.shared.s8 %rs1687, [%r23+3456]; 2026-02-21T10:22:24.3149673Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3149732Z shl.b16 %rs1688, %rs1687, 4; 2026-02-21T10:22:24.3149920Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3149988Z ld.shared.s8 %rs1689, [%r24+3584]; 2026-02-21T10:22:24.3150175Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3150233Z shl.b16 %rs1690, %rs1689, 4; 2026-02-21T10:22:24.3150420Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 
2026-02-21T10:22:24.3150488Z ld.shared.s8 %rs1691, [%r25+3712]; 2026-02-21T10:22:24.3150679Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3150738Z shl.b16 %rs1692, %rs1691, 4; 2026-02-21T10:22:24.3150928Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3150990Z ld.shared.s8 %rs1693, [%r26+3840]; 2026-02-21T10:22:24.3151177Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3151310Z shl.b16 %rs1694, %rs1693, 4; 2026-02-21T10:22:24.3151498Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3151561Z ld.shared.s8 %rs1695, [%r27+3968]; 2026-02-21T10:22:24.3151750Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3151811Z shl.b16 %rs1696, %rs1695, 4; 2026-02-21T10:22:24.3152001Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3152063Z cvt.s16.s8 %rs1697, %rs1634; 2026-02-21T10:22:24.3152125Z shr.s16 %rs1698, %rs1697, 4; 2026-02-21T10:22:24.3152184Z cvt.s16.s8 %rs1699, %rs1636; 2026-02-21T10:22:24.3152242Z shr.s16 %rs1700, %rs1699, 4; 2026-02-21T10:22:24.3152309Z shr.s16 %rs1701, %rs1633, 4; 2026-02-21T10:22:24.3152375Z shr.s16 %rs1702, %rs1635, 4; 2026-02-21T10:22:24.3152569Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3152694Z cvt.rn.f32.s16 %r22214, %rs1702; 2026-02-21T10:22:24.3152759Z cvt.rn.f32.s16 %r22215, %rs1701; 2026-02-21T10:22:24.3152821Z cvt.rn.f32.s16 %r22216, %rs1700; 2026-02-21T10:22:24.3152880Z cvt.rn.f32.s16 %r22217, %rs1698; 2026-02-21T10:22:24.3153082Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3153221Z cvt.s16.s8 %rs1703, %rs1638; 2026-02-21T10:22:24.3153281Z shr.s16 %rs1704, %rs1703, 4; 2026-02-21T10:22:24.3153343Z cvt.s16.s8 %rs1705, %rs1640; 
2026-02-21T10:22:24.3153402Z shr.s16 %rs1706, %rs1705, 4; 2026-02-21T10:22:24.3153460Z shr.s16 %rs1707, %rs1637, 4; 2026-02-21T10:22:24.3153532Z shr.s16 %rs1708, %rs1639, 4; 2026-02-21T10:22:24.3153732Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3153794Z cvt.rn.f32.s16 %r22218, %rs1708; 2026-02-21T10:22:24.3153858Z cvt.rn.f32.s16 %r22219, %rs1707; 2026-02-21T10:22:24.3153924Z cvt.rn.f32.s16 %r22220, %rs1706; 2026-02-21T10:22:24.3153983Z cvt.rn.f32.s16 %r22221, %rs1704; 2026-02-21T10:22:24.3154172Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3154234Z cvt.s16.s8 %rs1709, %rs1642; 2026-02-21T10:22:24.3154297Z shr.s16 %rs1710, %rs1709, 4; 2026-02-21T10:22:24.3154354Z cvt.s16.s8 %rs1711, %rs1644; 2026-02-21T10:22:24.3154413Z shr.s16 %rs1712, %rs1711, 4; 2026-02-21T10:22:24.3154529Z shr.s16 %rs1713, %rs1641, 4; 2026-02-21T10:22:24.3154591Z shr.s16 %rs1714, %rs1643, 4; 2026-02-21T10:22:24.3154780Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3154843Z cvt.rn.f32.s16 %r22222, %rs1714; 2026-02-21T10:22:24.3154903Z cvt.rn.f32.s16 %r22223, %rs1713; 2026-02-21T10:22:24.3154963Z cvt.rn.f32.s16 %r22224, %rs1712; 2026-02-21T10:22:24.3155025Z cvt.rn.f32.s16 %r22225, %rs1710; 2026-02-21T10:22:24.3155219Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3155279Z cvt.s16.s8 %rs1715, %rs1646; 2026-02-21T10:22:24.3155338Z shr.s16 %rs1716, %rs1715, 4; 2026-02-21T10:22:24.3155399Z cvt.s16.s8 %rs1717, %rs1648; 2026-02-21T10:22:24.3155461Z shr.s16 %rs1718, %rs1717, 4; 2026-02-21T10:22:24.3155519Z shr.s16 %rs1719, %rs1645, 4; 2026-02-21T10:22:24.3155580Z shr.s16 %rs1720, %rs1647, 4; 2026-02-21T10:22:24.3155771Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3155833Z cvt.rn.f32.s16 %r22226, %rs1720; 2026-02-21T10:22:24.3155895Z 
cvt.rn.f32.s16 %r22227, %rs1719; 2026-02-21T10:22:24.3155955Z cvt.rn.f32.s16 %r22228, %rs1718; 2026-02-21T10:22:24.3156014Z cvt.rn.f32.s16 %r22229, %rs1716; 2026-02-21T10:22:24.3156202Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3156327Z cvt.s16.s8 %rs1721, %rs1650; 2026-02-21T10:22:24.3156386Z shr.s16 %rs1722, %rs1721, 4; 2026-02-21T10:22:24.3156445Z cvt.s16.s8 %rs1723, %rs1652; 2026-02-21T10:22:24.3156633Z shr.s16 %rs1724, %rs1723, 4; 2026-02-21T10:22:24.3156693Z shr.s16 %rs1725, %rs1649, 4; 2026-02-21T10:22:24.3156752Z shr.s16 %rs1726, %rs1651, 4; 2026-02-21T10:22:24.3156948Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3157012Z cvt.rn.f32.s16 %r22230, %rs1726; 2026-02-21T10:22:24.3157073Z cvt.rn.f32.s16 %r22231, %rs1725; 2026-02-21T10:22:24.3157133Z cvt.rn.f32.s16 %r22232, %rs1724; 2026-02-21T10:22:24.3157196Z cvt.rn.f32.s16 %r22233, %rs1722; 2026-02-21T10:22:24.3157397Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3157459Z cvt.s16.s8 %rs1727, %rs1654; 2026-02-21T10:22:24.3157521Z shr.s16 %rs1728, %rs1727, 4; 2026-02-21T10:22:24.3157583Z cvt.s16.s8 %rs1729, %rs1656; 2026-02-21T10:22:24.3157721Z shr.s16 %rs1730, %rs1729, 4; 2026-02-21T10:22:24.3157784Z shr.s16 %rs1731, %rs1653, 4; 2026-02-21T10:22:24.3157845Z shr.s16 %rs1732, %rs1655, 4; 2026-02-21T10:22:24.3158038Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3158163Z cvt.rn.f32.s16 %r22234, %rs1732; 2026-02-21T10:22:24.3158228Z cvt.rn.f32.s16 %r22235, %rs1731; 2026-02-21T10:22:24.3158288Z cvt.rn.f32.s16 %r22236, %rs1730; 2026-02-21T10:22:24.3158351Z cvt.rn.f32.s16 %r22237, %rs1728; 2026-02-21T10:22:24.3158542Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3158600Z cvt.s16.s8 %rs1733, %rs1658; 2026-02-21T10:22:24.3158661Z shr.s16 %rs1734, %rs1733, 4; 
2026-02-21T10:22:24.3158723Z cvt.s16.s8 %rs1735, %rs1660; 2026-02-21T10:22:24.3158789Z shr.s16 %rs1736, %rs1735, 4; 2026-02-21T10:22:24.3158851Z shr.s16 %rs1737, %rs1657, 4; 2026-02-21T10:22:24.3158916Z shr.s16 %rs1738, %rs1659, 4; 2026-02-21T10:22:24.3159110Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3159172Z cvt.rn.f32.s16 %r22238, %rs1738; 2026-02-21T10:22:24.3159247Z cvt.rn.f32.s16 %r22239, %rs1737; 2026-02-21T10:22:24.3159320Z cvt.rn.f32.s16 %r22240, %rs1736; 2026-02-21T10:22:24.3159383Z cvt.rn.f32.s16 %r22241, %rs1734; 2026-02-21T10:22:24.3159643Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3159707Z cvt.s16.s8 %rs1739, %rs1662; 2026-02-21T10:22:24.3159774Z shr.s16 %rs1740, %rs1739, 4; 2026-02-21T10:22:24.3159834Z cvt.s16.s8 %rs1741, %rs1664; 2026-02-21T10:22:24.3159896Z shr.s16 %rs1742, %rs1741, 4; 2026-02-21T10:22:24.3159970Z shr.s16 %rs1743, %rs1661, 4; 2026-02-21T10:22:24.3160032Z shr.s16 %rs1744, %rs1663, 4; 2026-02-21T10:22:24.3160227Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3160297Z cvt.rn.f32.s16 %r22242, %rs1744; 2026-02-21T10:22:24.3160360Z cvt.rn.f32.s16 %r22243, %rs1743; 2026-02-21T10:22:24.3160422Z cvt.rn.f32.s16 %r22244, %rs1742; 2026-02-21T10:22:24.3160482Z cvt.rn.f32.s16 %r22245, %rs1740; 2026-02-21T10:22:24.3160677Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3160743Z cvt.s16.s8 %rs1745, %rs1666; 2026-02-21T10:22:24.3160807Z shr.s16 %rs1746, %rs1745, 4; 2026-02-21T10:22:24.3160878Z cvt.s16.s8 %rs1747, %rs1668; 2026-02-21T10:22:24.3160939Z shr.s16 %rs1748, %rs1747, 4; 2026-02-21T10:22:24.3161000Z shr.s16 %rs1749, %rs1665, 4; 2026-02-21T10:22:24.3161064Z shr.s16 %rs1750, %rs1667, 4; 2026-02-21T10:22:24.3161258Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3161324Z 
cvt.rn.f32.s16 %r22246, %rs1750; 2026-02-21T10:22:24.3161458Z cvt.rn.f32.s16 %r22247, %rs1749; 2026-02-21T10:22:24.3161529Z cvt.rn.f32.s16 %r22248, %rs1748; 2026-02-21T10:22:24.3161593Z cvt.rn.f32.s16 %r22249, %rs1746; 2026-02-21T10:22:24.3161784Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3161849Z cvt.s16.s8 %rs1751, %rs1670; 2026-02-21T10:22:24.3161913Z shr.s16 %rs1752, %rs1751, 4; 2026-02-21T10:22:24.3161973Z cvt.s16.s8 %rs1753, %rs1672; 2026-02-21T10:22:24.3162032Z shr.s16 %rs1754, %rs1753, 4; 2026-02-21T10:22:24.3162098Z shr.s16 %rs1755, %rs1669, 4; 2026-02-21T10:22:24.3162158Z shr.s16 %rs1756, %rs1671, 4; 2026-02-21T10:22:24.3162348Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3162414Z cvt.rn.f32.s16 %r22250, %rs1756; 2026-02-21T10:22:24.3162476Z cvt.rn.f32.s16 %r22251, %rs1755; 2026-02-21T10:22:24.3162538Z cvt.rn.f32.s16 %r22252, %rs1754; 2026-02-21T10:22:24.3162607Z cvt.rn.f32.s16 %r22253, %rs1752; 2026-02-21T10:22:24.3162848Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3162914Z cvt.s16.s8 %rs1757, %rs1674; 2026-02-21T10:22:24.3162988Z shr.s16 %rs1758, %rs1757, 4; 2026-02-21T10:22:24.3163055Z cvt.s16.s8 %rs1759, %rs1676; 2026-02-21T10:22:24.3163166Z shr.s16 %rs1760, %rs1759, 4; 2026-02-21T10:22:24.3163227Z shr.s16 %rs1761, %rs1673, 4; 2026-02-21T10:22:24.3163292Z shr.s16 %rs1762, %rs1675, 4; 2026-02-21T10:22:24.3163486Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3163551Z cvt.rn.f32.s16 %r22254, %rs1762; 2026-02-21T10:22:24.3163617Z cvt.rn.f32.s16 %r22255, %rs1761; 2026-02-21T10:22:24.3163679Z cvt.rn.f32.s16 %r22256, %rs1760; 2026-02-21T10:22:24.3163742Z cvt.rn.f32.s16 %r22257, %rs1758; 2026-02-21T10:22:24.3163933Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3164007Z cvt.s16.s8 %rs1763, 
%rs1678; 2026-02-21T10:22:24.3164069Z shr.s16 %rs1764, %rs1763, 4; 2026-02-21T10:22:24.3164130Z cvt.s16.s8 %rs1765, %rs1680; 2026-02-21T10:22:24.3164193Z shr.s16 %rs1766, %rs1765, 4; 2026-02-21T10:22:24.3164251Z shr.s16 %rs1767, %rs1677, 4; 2026-02-21T10:22:24.3164312Z shr.s16 %rs1768, %rs1679, 4; 2026-02-21T10:22:24.3164503Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3164623Z cvt.rn.f32.s16 %r22258, %rs1768; 2026-02-21T10:22:24.3164689Z cvt.rn.f32.s16 %r22259, %rs1767; 2026-02-21T10:22:24.3164752Z cvt.rn.f32.s16 %r22260, %rs1766; 2026-02-21T10:22:24.3164819Z cvt.rn.f32.s16 %r22261, %rs1764; 2026-02-21T10:22:24.3165009Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3165069Z cvt.s16.s8 %rs1769, %rs1682; 2026-02-21T10:22:24.3165135Z shr.s16 %rs1770, %rs1769, 4; 2026-02-21T10:22:24.3165202Z cvt.s16.s8 %rs1771, %rs1684; 2026-02-21T10:22:24.3165265Z shr.s16 %rs1772, %rs1771, 4; 2026-02-21T10:22:24.3165328Z shr.s16 %rs1773, %rs1681, 4; 2026-02-21T10:22:24.3165392Z shr.s16 %rs1774, %rs1683, 4; 2026-02-21T10:22:24.3165580Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3165646Z cvt.rn.f32.s16 %r22262, %rs1774; 2026-02-21T10:22:24.3165711Z cvt.rn.f32.s16 %r22263, %rs1773; 2026-02-21T10:22:24.3165773Z cvt.rn.f32.s16 %r22264, %rs1772; 2026-02-21T10:22:24.3165835Z cvt.rn.f32.s16 %r22265, %rs1770; 2026-02-21T10:22:24.3166037Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3166099Z cvt.s16.s8 %rs1775, %rs1686; 2026-02-21T10:22:24.3166170Z shr.s16 %rs1776, %rs1775, 4; 2026-02-21T10:22:24.3166233Z cvt.s16.s8 %rs1777, %rs1688; 2026-02-21T10:22:24.3166300Z shr.s16 %rs1778, %rs1777, 4; 2026-02-21T10:22:24.3166421Z shr.s16 %rs1779, %rs1685, 4; 2026-02-21T10:22:24.3166616Z shr.s16 %rs1780, %rs1687, 4; 2026-02-21T10:22:24.3166819Z .loc 1 87 32 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3166883Z cvt.rn.f32.s16 %r22266, %rs1780; 2026-02-21T10:22:24.3166946Z cvt.rn.f32.s16 %r22267, %rs1779; 2026-02-21T10:22:24.3167016Z cvt.rn.f32.s16 %r22268, %rs1778; 2026-02-21T10:22:24.3167079Z cvt.rn.f32.s16 %r22269, %rs1776; 2026-02-21T10:22:24.3167272Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3167337Z cvt.s16.s8 %rs1781, %rs1690; 2026-02-21T10:22:24.3167403Z shr.s16 %rs1782, %rs1781, 4; 2026-02-21T10:22:24.3167463Z cvt.s16.s8 %rs1783, %rs1692; 2026-02-21T10:22:24.3167524Z shr.s16 %rs1784, %rs1783, 4; 2026-02-21T10:22:24.3167590Z shr.s16 %rs1785, %rs1689, 4; 2026-02-21T10:22:24.3167649Z shr.s16 %rs1786, %rs1691, 4; 2026-02-21T10:22:24.3167931Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3168005Z cvt.rn.f32.s16 %r22270, %rs1786; 2026-02-21T10:22:24.3168074Z cvt.rn.f32.s16 %r22271, %rs1785; 2026-02-21T10:22:24.3168135Z cvt.rn.f32.s16 %r22272, %rs1784; 2026-02-21T10:22:24.3168196Z cvt.rn.f32.s16 %r22273, %rs1782; 2026-02-21T10:22:24.3168463Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3168525Z cvt.s16.s8 %rs1787, %rs1694; 2026-02-21T10:22:24.3168586Z shr.s16 %rs1788, %rs1787, 4; 2026-02-21T10:22:24.3168650Z cvt.s16.s8 %rs1789, %rs1696; 2026-02-21T10:22:24.3168711Z shr.s16 %rs1790, %rs1789, 4; 2026-02-21T10:22:24.3168770Z shr.s16 %rs1791, %rs1693, 4; 2026-02-21T10:22:24.3168829Z shr.s16 %rs1792, %rs1695, 4; 2026-02-21T10:22:24.3169027Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3169089Z cvt.rn.f32.s16 %r22274, %rs1792; 2026-02-21T10:22:24.3169155Z cvt.rn.f32.s16 %r22275, %rs1791; 2026-02-21T10:22:24.3169232Z cvt.rn.f32.s16 %r22276, %rs1790; 2026-02-21T10:22:24.3169295Z cvt.rn.f32.s16 %r22277, %rs1788; 2026-02-21T10:22:24.3169353Z bar.sync 0; 2026-02-21T10:22:24.3169481Z 
st.shared.v4.b32 [%r28], {%r22217, %r22215, %r22216, %r22214}; 2026-02-21T10:22:24.3169611Z st.shared.v4.b32 [%r28+16384], {%r22249, %r22247, %r22248, %r22246}; 2026-02-21T10:22:24.3169721Z st.shared.v4.b32 [%r29], {%r22221, %r22219, %r22220, %r22218}; 2026-02-21T10:22:24.3169918Z st.shared.v4.b32 [%r29+16384], {%r22253, %r22251, %r22252, %r22250}; 2026-02-21T10:22:24.3170035Z st.shared.v4.b32 [%r30], {%r22225, %r22223, %r22224, %r22222}; 2026-02-21T10:22:24.3170152Z st.shared.v4.b32 [%r30+16384], {%r22257, %r22255, %r22256, %r22254}; 2026-02-21T10:22:24.3170260Z st.shared.v4.b32 [%r31], {%r22229, %r22227, %r22228, %r22226}; 2026-02-21T10:22:24.3170378Z st.shared.v4.b32 [%r31+16384], {%r22261, %r22259, %r22260, %r22258}; 2026-02-21T10:22:24.3170489Z st.shared.v4.b32 [%r32], {%r22233, %r22231, %r22232, %r22230}; 2026-02-21T10:22:24.3170605Z st.shared.v4.b32 [%r32+16384], {%r22265, %r22263, %r22264, %r22262}; 2026-02-21T10:22:24.3170716Z st.shared.v4.b32 [%r33], {%r22237, %r22235, %r22236, %r22234}; 2026-02-21T10:22:24.3170834Z st.shared.v4.b32 [%r33+16384], {%r22269, %r22267, %r22268, %r22266}; 2026-02-21T10:22:24.3170941Z st.shared.v4.b32 [%r34], {%r22241, %r22239, %r22240, %r22238}; 2026-02-21T10:22:24.3171059Z st.shared.v4.b32 [%r34+16384], {%r22273, %r22271, %r22272, %r22270}; 2026-02-21T10:22:24.3171166Z st.shared.v4.b32 [%r35], {%r22245, %r22243, %r22244, %r22242}; 2026-02-21T10:22:24.3171279Z st.shared.v4.b32 [%r35+16384], {%r22277, %r22275, %r22276, %r22274}; 2026-02-21T10:22:24.3171336Z $L__tmp15: 2026-02-21T10:22:24.3171616Z .loc 2 291 36 // standard.py:291:36 @[ c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:94:40 ] 2026-02-21T10:22:24.3171680Z // begin inline asm 2026-02-21T10:22:24.3171831Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.3171894Z // end inline asm 2026-02-21T10:22:24.3171951Z bar.sync 0; 2026-02-21T10:22:24.3172023Z wgmma.fence.sync.aligned; 2026-02-21T10:22:24.3172090Z mov.pred %p198, -1; 2026-02-21T10:22:24.3172149Z // 
begin inline asm 2026-02-21T10:22:24.3173638Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r19967,%r19968,%r19969,%r19970}, %rd3, %p198, 1, 1; 2026-02-21T10:22:24.3173705Z // end inline asm 2026-02-21T10:22:24.3173766Z // begin inline asm 2026-02-21T10:22:24.3175293Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r20099,%r20100,%r20101,%r20102}, %rd4, %p198, 1, 1; 2026-02-21T10:22:24.3175397Z // end inline asm 2026-02-21T10:22:24.3175462Z // begin inline asm 2026-02-21T10:22:24.3177053Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r20231,%r20232,%r20233,%r20234}, %rd5, %p198, 1, 1; 2026-02-21T10:22:24.3177117Z // end inline asm 2026-02-21T10:22:24.3177177Z // begin inline asm 2026-02-21T10:22:24.3178749Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r20363,%r20364,%r20365,%r20366}, %rd6, %p198, 1, 1; 2026-02-21T10:22:24.3178816Z // end inline asm 2026-02-21T10:22:24.3178878Z // begin inline asm 2026-02-21T10:22:24.3180371Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r20495,%r20496,%r20497,%r20498}, %rd7, %p198, 1, 1; 2026-02-21T10:22:24.3180433Z // end inline asm 2026-02-21T10:22:24.3180492Z // begin inline asm 2026-02-21T10:22:24.3181976Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r20627,%r20628,%r20629,%r20630}, %rd8, %p198, 1, 1; 2026-02-21T10:22:24.3182101Z // end inline asm 2026-02-21T10:22:24.3182160Z // begin inline asm 2026-02-21T10:22:24.3183691Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r20759,%r20760,%r20761,%r20762}, %rd9, %p198, 1, 1; 2026-02-21T10:22:24.3183812Z // end inline asm 2026-02-21T10:22:24.3183870Z // begin inline asm 2026-02-21T10:22:24.3185356Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794}, {%r20891,%r20892,%r20893,%r20894}, %rd10, %p198, 1, 1; 2026-02-21T10:22:24.3185419Z // end inline asm 2026-02-21T10:22:24.3185476Z // begin inline asm 2026-02-21T10:22:24.3187146Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r21023,%r21024,%r21025,%r21026}, %rd3, %p198, 1, 1; 2026-02-21T10:22:24.3187218Z // end inline asm 2026-02-21T10:22:24.3187278Z // begin inline asm 2026-02-21T10:22:24.3188859Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r21155,%r21156,%r21157,%r21158}, %rd4, %p198, 1, 1; 2026-02-21T10:22:24.3188925Z // end inline asm 2026-02-21T10:22:24.3188983Z // begin inline asm 2026-02-21T10:22:24.3190469Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r21287,%r21288,%r21289,%r21290}, %rd5, %p198, 1, 1; 2026-02-21T10:22:24.3190597Z // end inline asm 2026-02-21T10:22:24.3190655Z // begin inline asm 2026-02-21T10:22:24.3192142Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r21419,%r21420,%r21421,%r21422}, %rd6, %p198, 1, 1; 2026-02-21T10:22:24.3192201Z // end inline asm 2026-02-21T10:22:24.3192328Z // begin inline asm 2026-02-21T10:22:24.3193813Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r21551,%r21552,%r21553,%r21554}, %rd7, %p198, 1, 1; 2026-02-21T10:22:24.3193927Z // end inline asm 2026-02-21T10:22:24.3193988Z // begin inline asm 2026-02-21T10:22:24.3195467Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r21683,%r21684,%r21685,%r21686}, %rd8, %p198, 1, 1; 2026-02-21T10:22:24.3195531Z // end inline asm 2026-02-21T10:22:24.3195590Z // begin inline asm 2026-02-21T10:22:24.3197274Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r21815,%r21816,%r21817,%r21818}, %rd9, %p198, 1, 1; 2026-02-21T10:22:24.3197355Z // end inline asm 2026-02-21T10:22:24.3197414Z // begin inline asm 2026-02-21T10:22:24.3198901Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858}, {%r21947,%r21948,%r21949,%r21950}, %rd10, %p198, 1, 1; 2026-02-21T10:22:24.3198964Z // end inline asm 2026-02-21T10:22:24.3199043Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:24.3199175Z mov.b32 %r22080, %r22081; 2026-02-21T10:22:24.3199243Z mov.b32 %r22079, %r39936; 2026-02-21T10:22:24.3199300Z // begin inline asm 2026-02-21T10:22:24.3201877Z // wait for regs: 
%r42731,%r42732,%r42733,%r42734,%r42735,%r42736,%r42737,%r42738,%r42739,%r42740,%r42741,%r42742,%r42743,%r42744,%r42745,%r42746,%r42747,%r42748,%r42749,%r42750,%r42751,%r42752,%r42753,%r42754,%r42755,%r42756,%r42757,%r42758,%r42759,%r42760,%r42761,%r42762,%r42763,%r42764,%r42765,%r42766,%r42767,%r42768,%r42769,%r42770,%r42771,%r42772,%r42773,%r42774,%r42775,%r42776,%r42777,%r42778,%r42779,%r42780,%r42781,%r42782,%r42783,%r42784,%r42785,%r42786,%r42787,%r42788,%r42789,%r42790,%r42791,%r42792,%r42793,%r42794,%r42795,%r42796,%r42797,%r42798,%r42799,%r42800,%r42801,%r42802,%r42803,%r42804,%r42805,%r42806,%r42807,%r42808,%r42809,%r42810,%r42811,%r42812,%r42813,%r42814,%r42815,%r42816,%r42817,%r42818,%r42819,%r42820,%r42821,%r42822,%r42823,%r42824,%r42825,%r42826,%r42827,%r42828,%r42829,%r42830,%r42831,%r42832,%r42833,%r42834,%r42835,%r42836,%r42837,%r42838,%r42839,%r42840,%r42841,%r42842,%r42843,%r42844,%r42845,%r42846,%r42847,%r42848,%r42849,%r42850,%r42851,%r42852,%r42853,%r42854,%r42855,%r42856,%r42857,%r42858,%r22079,%r22080,%r22081 2026-02-21T10:22:24.3201961Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:24.3202022Z // end inline asm 2026-02-21T10:22:24.3202140Z $L__tmp16: 2026-02-21T10:22:24.3202356Z .loc 1 51 126 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:51:126 2026-02-21T10:22:24.3202423Z add.s64 %rd846, %rd846, 128; 2026-02-21T10:22:24.3202495Z setp.lt.u64 %p216, %rd847, 4064; 2026-02-21T10:22:24.3202556Z @%p216 bra $L__BB0_9; 2026-02-21T10:22:24.3202668Z // %bb.10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:22:24.3202873Z .loc 1 97 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:97:28 2026-02-21T10:22:24.3202959Z cvt.rn.bf16x2.f32 %r22282, %r42732, %r42731; 2026-02-21T10:22:24.3203042Z cvt.rn.bf16x2.f32 %r22283, %r42734, %r42733; 2026-02-21T10:22:24.3203119Z cvt.rn.bf16x2.f32 %r22284, %r42736, %r42735; 2026-02-21T10:22:24.3203197Z cvt.rn.bf16x2.f32 %r22285, %r42738, %r42737; 2026-02-21T10:22:24.3203273Z cvt.rn.bf16x2.f32 
%r22286, %r42740, %r42739; 2026-02-21T10:22:24.3203361Z cvt.rn.bf16x2.f32 %r22287, %r42742, %r42741; 2026-02-21T10:22:24.3203446Z cvt.rn.bf16x2.f32 %r22288, %r42744, %r42743; 2026-02-21T10:22:24.3203523Z cvt.rn.bf16x2.f32 %r22289, %r42746, %r42745; 2026-02-21T10:22:24.3203665Z cvt.rn.bf16x2.f32 %r22290, %r42748, %r42747; 2026-02-21T10:22:24.3203749Z cvt.rn.bf16x2.f32 %r22291, %r42750, %r42749; 2026-02-21T10:22:24.3203830Z cvt.rn.bf16x2.f32 %r22292, %r42752, %r42751; 2026-02-21T10:22:24.3203908Z cvt.rn.bf16x2.f32 %r22293, %r42754, %r42753; 2026-02-21T10:22:24.3203984Z cvt.rn.bf16x2.f32 %r22294, %r42756, %r42755; 2026-02-21T10:22:24.3204062Z cvt.rn.bf16x2.f32 %r22295, %r42758, %r42757; 2026-02-21T10:22:24.3204139Z cvt.rn.bf16x2.f32 %r22296, %r42760, %r42759; 2026-02-21T10:22:24.3204217Z cvt.rn.bf16x2.f32 %r22297, %r42762, %r42761; 2026-02-21T10:22:24.3204295Z cvt.rn.bf16x2.f32 %r22298, %r42764, %r42763; 2026-02-21T10:22:24.3204372Z cvt.rn.bf16x2.f32 %r22299, %r42766, %r42765; 2026-02-21T10:22:24.3204447Z cvt.rn.bf16x2.f32 %r22300, %r42768, %r42767; 2026-02-21T10:22:24.3204525Z cvt.rn.bf16x2.f32 %r22301, %r42770, %r42769; 2026-02-21T10:22:24.3204603Z cvt.rn.bf16x2.f32 %r22302, %r42772, %r42771; 2026-02-21T10:22:24.3204679Z cvt.rn.bf16x2.f32 %r22303, %r42774, %r42773; 2026-02-21T10:22:24.3204755Z cvt.rn.bf16x2.f32 %r22304, %r42776, %r42775; 2026-02-21T10:22:24.3204833Z cvt.rn.bf16x2.f32 %r22305, %r42778, %r42777; 2026-02-21T10:22:24.3204908Z cvt.rn.bf16x2.f32 %r22306, %r42780, %r42779; 2026-02-21T10:22:24.3204983Z cvt.rn.bf16x2.f32 %r22307, %r42782, %r42781; 2026-02-21T10:22:24.3205071Z cvt.rn.bf16x2.f32 %r22308, %r42784, %r42783; 2026-02-21T10:22:24.3205149Z cvt.rn.bf16x2.f32 %r22309, %r42786, %r42785; 2026-02-21T10:22:24.3205303Z cvt.rn.bf16x2.f32 %r22310, %r42788, %r42787; 2026-02-21T10:22:24.3205378Z cvt.rn.bf16x2.f32 %r22311, %r42790, %r42789; 2026-02-21T10:22:24.3205456Z cvt.rn.bf16x2.f32 %r22312, %r42792, %r42791; 2026-02-21T10:22:24.3205530Z cvt.rn.bf16x2.f32 
%r22313, %r42794, %r42793; 2026-02-21T10:22:24.3205604Z cvt.rn.bf16x2.f32 %r22314, %r42796, %r42795; 2026-02-21T10:22:24.3205687Z cvt.rn.bf16x2.f32 %r22315, %r42798, %r42797; 2026-02-21T10:22:24.3205763Z cvt.rn.bf16x2.f32 %r22316, %r42800, %r42799; 2026-02-21T10:22:24.3205840Z cvt.rn.bf16x2.f32 %r22317, %r42802, %r42801; 2026-02-21T10:22:24.3205917Z cvt.rn.bf16x2.f32 %r22318, %r42804, %r42803; 2026-02-21T10:22:24.3205992Z cvt.rn.bf16x2.f32 %r22319, %r42806, %r42805; 2026-02-21T10:22:24.3206072Z cvt.rn.bf16x2.f32 %r22320, %r42808, %r42807; 2026-02-21T10:22:24.3206146Z cvt.rn.bf16x2.f32 %r22321, %r42810, %r42809; 2026-02-21T10:22:24.3206227Z cvt.rn.bf16x2.f32 %r22322, %r42812, %r42811; 2026-02-21T10:22:24.3206304Z cvt.rn.bf16x2.f32 %r22323, %r42814, %r42813; 2026-02-21T10:22:24.3206428Z cvt.rn.bf16x2.f32 %r22324, %r42816, %r42815; 2026-02-21T10:22:24.3206636Z cvt.rn.bf16x2.f32 %r22325, %r42818, %r42817; 2026-02-21T10:22:24.3206715Z cvt.rn.bf16x2.f32 %r22326, %r42820, %r42819; 2026-02-21T10:22:24.3206789Z cvt.rn.bf16x2.f32 %r22327, %r42822, %r42821; 2026-02-21T10:22:24.3206951Z cvt.rn.bf16x2.f32 %r22328, %r42824, %r42823; 2026-02-21T10:22:24.3207025Z cvt.rn.bf16x2.f32 %r22329, %r42826, %r42825; 2026-02-21T10:22:24.3207102Z cvt.rn.bf16x2.f32 %r22330, %r42828, %r42827; 2026-02-21T10:22:24.3207178Z cvt.rn.bf16x2.f32 %r22331, %r42830, %r42829; 2026-02-21T10:22:24.3207256Z cvt.rn.bf16x2.f32 %r22332, %r42832, %r42831; 2026-02-21T10:22:24.3207329Z cvt.rn.bf16x2.f32 %r22333, %r42834, %r42833; 2026-02-21T10:22:24.3207402Z cvt.rn.bf16x2.f32 %r22334, %r42836, %r42835; 2026-02-21T10:22:24.3207479Z cvt.rn.bf16x2.f32 %r22335, %r42838, %r42837; 2026-02-21T10:22:24.3207554Z cvt.rn.bf16x2.f32 %r22336, %r42840, %r42839; 2026-02-21T10:22:24.3207633Z cvt.rn.bf16x2.f32 %r22337, %r42842, %r42841; 2026-02-21T10:22:24.3207709Z cvt.rn.bf16x2.f32 %r22338, %r42844, %r42843; 2026-02-21T10:22:24.3207782Z cvt.rn.bf16x2.f32 %r22339, %r42846, %r42845; 2026-02-21T10:22:24.3207855Z cvt.rn.bf16x2.f32 
%r22340, %r42848, %r42847; 2026-02-21T10:22:24.3207928Z cvt.rn.bf16x2.f32 %r22341, %r42850, %r42849; 2026-02-21T10:22:24.3208010Z cvt.rn.bf16x2.f32 %r22342, %r42852, %r42851; 2026-02-21T10:22:24.3208085Z cvt.rn.bf16x2.f32 %r22343, %r42854, %r42853; 2026-02-21T10:22:24.3208236Z cvt.rn.bf16x2.f32 %r22344, %r42856, %r42855; 2026-02-21T10:22:24.3208320Z cvt.rn.bf16x2.f32 %r22345, %r42858, %r42857; 2026-02-21T10:22:24.3208534Z .loc 1 98 43 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:98:43 2026-02-21T10:22:24.3208592Z bar.sync 0; 2026-02-21T10:22:24.3208793Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r36], {%r22282, %r22283, %r22284, %r22285}; 2026-02-21T10:22:24.3208985Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r37], {%r22298, %r22299, %r22300, %r22301}; 2026-02-21T10:22:24.3209168Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r38], {%r22314, %r22315, %r22316, %r22317}; 2026-02-21T10:22:24.3209349Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r39], {%r22330, %r22331, %r22332, %r22333}; 2026-02-21T10:22:24.3209528Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r40], {%r22286, %r22287, %r22288, %r22289}; 2026-02-21T10:22:24.3209711Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r41], {%r22302, %r22303, %r22304, %r22305}; 2026-02-21T10:22:24.3209893Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r42], {%r22318, %r22319, %r22320, %r22321}; 2026-02-21T10:22:24.3210076Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r43], {%r22334, %r22335, %r22336, %r22337}; 2026-02-21T10:22:24.3210254Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r44], {%r22290, %r22291, %r22292, %r22293}; 2026-02-21T10:22:24.3210433Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r45], {%r22306, %r22307, %r22308, %r22309}; 2026-02-21T10:22:24.3210690Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r46], {%r22322, %r22323, %r22324, %r22325}; 2026-02-21T10:22:24.3210868Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r47], {%r22338, %r22339, %r22340, %r22341}; 2026-02-21T10:22:24.3211044Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r48], {%r22294, %r22295, %r22296, %r22297}; 2026-02-21T10:22:24.3211229Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r49], {%r22310, %r22311, %r22312, %r22313}; 2026-02-21T10:22:24.3211411Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r50], {%r22326, %r22327, %r22328, %r22329}; 2026-02-21T10:22:24.3211589Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r51], {%r22342, %r22343, %r22344, %r22345}; 2026-02-21T10:22:24.3211654Z // begin inline asm 2026-02-21T10:22:24.3211746Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.3211805Z // end inline asm 2026-02-21T10:22:24.3211863Z bar.sync 0; 2026-02-21T10:22:24.3211935Z elect.sync %r22346|%p219, -1; 2026-02-21T10:22:24.3212002Z and.pred %p217, %p405, %p219; 2026-02-21T10:22:24.3212139Z or.b32 %r22278, %r19833, %r641; 2026-02-21T10:22:24.3212204Z // begin inline asm 2026-02-21T10:22:24.3212444Z @%p217 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd497, {%r22278, %r22279}], [%r22280]; 2026-02-21T10:22:24.3212501Z // end inline asm 2026-02-21T10:22:24.3212575Z cp.async.bulk.commit_group; 2026-02-21T10:22:24.3212702Z cp.async.bulk.wait_group.read 0; 2026-02-21T10:22:24.3212757Z bar.sync 0; 2026-02-21T10:22:24.3212959Z .loc 1 31 88 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:31:88 2026-02-21T10:22:24.3213025Z add.s32 %r22347, %r42472, 2; 2026-02-21T10:22:24.3213217Z .loc 1 37 35 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:37:35 2026-02-21T10:22:24.3213278Z shr.s32 %r22348, %r22347, 31; 2026-02-21T10:22:24.3213341Z shr.u32 %r22349, %r22348, 18; 2026-02-21T10:22:24.3213405Z add.s32 %r22350, %r22347, %r22349; 2026-02-21T10:22:24.3213465Z shr.s32 %r22351, %r22350, 14; 2026-02-21T10:22:24.3213665Z .loc 1 38 33 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:38:33 2026-02-21T10:22:24.3213725Z shl.b32 %r22352, %r22351, 5; 2026-02-21T10:22:24.3213915Z .loc 1 39 39 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:39:39 
2026-02-21T10:22:24.3213977Z sub.s32 %r22353, 10, %r22352; 2026-02-21T10:22:24.3214186Z .loc 1 39 52 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:39:52 2026-02-21T10:22:24.3214300Z min.s32 %r22354, %r22353, 32; 2026-02-21T10:22:24.3214494Z .loc 1 40 45 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:40:45 2026-02-21T10:22:24.3214563Z and.b32 %r22355, %r22350, -16384; 2026-02-21T10:22:24.3214626Z sub.s32 %r22356, %r22347, %r22355; 2026-02-21T10:22:24.3214816Z .loc 1 41 51 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:41:51 2026-02-21T10:22:24.3214880Z div.s32 %r22357, %r22356, %r22354; 2026-02-21T10:22:24.3215073Z .loc 1 40 64 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:40:64 2026-02-21T10:22:24.3215141Z mul.lo.s32 %r22358, %r22357, %r22354; 2026-02-21T10:22:24.3215204Z sub.s32 %r22359, %r22356, %r22358; 2026-02-21T10:22:24.3215394Z .loc 1 40 30 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:40:30 2026-02-21T10:22:24.3215455Z add.s32 %r22360, %r22359, %r22352; 2026-02-21T10:22:24.3215647Z .loc 1 42 27 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:42:27 2026-02-21T10:22:24.3215711Z shl.b32 %r29853, %r22360, 7; 2026-02-21T10:22:24.3215902Z .loc 1 43 27 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:43:27 2026-02-21T10:22:24.3215963Z shl.b32 %r32299, %r22357, 7; 2026-02-21T10:22:24.3216171Z .loc 1 51 126 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:51:126 2026-02-21T10:22:24.3216285Z or.b32 %r22361, %r42464, %r32299; 2026-02-21T10:22:24.3216346Z shl.b32 %r22362, %r22361, 13; 2026-02-21T10:22:24.3216414Z mul.wide.s32 %rd69, %r22362, 2; 2026-02-21T10:22:24.3216596Z or.b32 %r22363, %r42465, %r32299; 2026-02-21T10:22:24.3216661Z shl.b32 %r22364, %r22363, 13; 2026-02-21T10:22:24.3216726Z mul.wide.s32 %rd70, %r22364, 2; 2026-02-21T10:22:24.3216792Z or.b32 %r22365, %r42466, %r32299; 2026-02-21T10:22:24.3216852Z shl.b32 %r22366, %r22365, 13; 2026-02-21T10:22:24.3216915Z 
mul.wide.s32 %rd71, %r22366, 2; 2026-02-21T10:22:24.3216989Z or.b32 %r22367, %r42467, %r32299; 2026-02-21T10:22:24.3217049Z shl.b32 %r22368, %r22367, 13; 2026-02-21T10:22:24.3217111Z mul.wide.s32 %rd72, %r22368, 2; 2026-02-21T10:22:24.3217173Z or.b32 %r22369, %r42468, %r32299; 2026-02-21T10:22:24.3217232Z shl.b32 %r22370, %r22369, 13; 2026-02-21T10:22:24.3217296Z mul.wide.s32 %rd73, %r22370, 2; 2026-02-21T10:22:24.3217356Z or.b32 %r22371, %r42469, %r32299; 2026-02-21T10:22:24.3217423Z shl.b32 %r22372, %r22371, 13; 2026-02-21T10:22:24.3217557Z mul.wide.s32 %rd74, %r22372, 2; 2026-02-21T10:22:24.3217620Z shl.b32 %r22373, %r22357, 20; 2026-02-21T10:22:24.3217683Z or.b32 %r22374, %r42470, %r22373; 2026-02-21T10:22:24.3217747Z mul.wide.s32 %rd75, %r22374, 2; 2026-02-21T10:22:24.3217809Z or.b32 %r42987, %r68, %r22373; 2026-02-21T10:22:24.3217930Z or.b32 %r22375, %r42471, %r22373; 2026-02-21T10:22:24.3217997Z mul.wide.s32 %rd76, %r22375, 2; 2026-02-21T10:22:24.3218058Z mov.b32 %r42988, 0f00000000; 2026-02-21T10:22:24.3218121Z mov.b64 %rd849, -96; 2026-02-21T10:22:24.3218186Z mov.b64 %rd848, %rd11; 2026-02-21T10:22:24.3218246Z mov.b32 %r42989, %r42988; 2026-02-21T10:22:24.3218305Z mov.b32 %r42990, %r42988; 2026-02-21T10:22:24.3218363Z mov.b32 %r42991, %r42988; 2026-02-21T10:22:24.3218423Z mov.b32 %r42992, %r42988; 2026-02-21T10:22:24.3218481Z mov.b32 %r42993, %r42988; 2026-02-21T10:22:24.3218538Z mov.b32 %r42994, %r42988; 2026-02-21T10:22:24.3218597Z mov.b32 %r42995, %r42988; 2026-02-21T10:22:24.3218671Z mov.b32 %r42996, %r42988; 2026-02-21T10:22:24.3218731Z mov.b32 %r42997, %r42988; 2026-02-21T10:22:24.3218789Z mov.b32 %r42998, %r42988; 2026-02-21T10:22:24.3218851Z mov.b32 %r42999, %r42988; 2026-02-21T10:22:24.3218908Z mov.b32 %r43000, %r42988; 2026-02-21T10:22:24.3218966Z mov.b32 %r43001, %r42988; 2026-02-21T10:22:24.3219028Z mov.b32 %r43002, %r42988; 2026-02-21T10:22:24.3219086Z mov.b32 %r43003, %r42988; 2026-02-21T10:22:24.3219143Z mov.b32 %r43004, %r42988; 
2026-02-21T10:22:24.3219277Z mov.b32 %r43005, %r42988; 2026-02-21T10:22:24.3219338Z mov.b32 %r43006, %r42988; 2026-02-21T10:22:24.3219395Z mov.b32 %r43007, %r42988; 2026-02-21T10:22:24.3219453Z mov.b32 %r43008, %r42988; 2026-02-21T10:22:24.3219512Z mov.b32 %r43009, %r42988; 2026-02-21T10:22:24.3219570Z mov.b32 %r43010, %r42988; 2026-02-21T10:22:24.3219626Z mov.b32 %r43011, %r42988; 2026-02-21T10:22:24.3219687Z mov.b32 %r43012, %r42988; 2026-02-21T10:22:24.3219744Z mov.b32 %r43013, %r42988; 2026-02-21T10:22:24.3219806Z mov.b32 %r43014, %r42988; 2026-02-21T10:22:24.3219863Z mov.b32 %r43015, %r42988; 2026-02-21T10:22:24.3219923Z mov.b32 %r43016, %r42988; 2026-02-21T10:22:24.3219980Z mov.b32 %r43017, %r42988; 2026-02-21T10:22:24.3220038Z mov.b32 %r43018, %r42988; 2026-02-21T10:22:24.3220100Z mov.b32 %r43019, %r42988; 2026-02-21T10:22:24.3220162Z mov.b32 %r43020, %r42988; 2026-02-21T10:22:24.3220219Z mov.b32 %r43021, %r42988; 2026-02-21T10:22:24.3220275Z mov.b32 %r43022, %r42988; 2026-02-21T10:22:24.3220338Z mov.b32 %r43023, %r42988; 2026-02-21T10:22:24.3220395Z mov.b32 %r43024, %r42988; 2026-02-21T10:22:24.3220453Z mov.b32 %r43025, %r42988; 2026-02-21T10:22:24.3220512Z mov.b32 %r43026, %r42988; 2026-02-21T10:22:24.3220569Z mov.b32 %r43027, %r42988; 2026-02-21T10:22:24.3220626Z mov.b32 %r43028, %r42988; 2026-02-21T10:22:24.3220684Z mov.b32 %r43029, %r42988; 2026-02-21T10:22:24.3220744Z mov.b32 %r43030, %r42988; 2026-02-21T10:22:24.3220801Z mov.b32 %r43031, %r42988; 2026-02-21T10:22:24.3220935Z mov.b32 %r43032, %r42988; 2026-02-21T10:22:24.3220997Z mov.b32 %r43033, %r42988; 2026-02-21T10:22:24.3221054Z mov.b32 %r43034, %r42988; 2026-02-21T10:22:24.3221110Z mov.b32 %r43035, %r42988; 2026-02-21T10:22:24.3221168Z mov.b32 %r43036, %r42988; 2026-02-21T10:22:24.3221228Z mov.b32 %r43037, %r42988; 2026-02-21T10:22:24.3221287Z mov.b32 %r43038, %r42988; 2026-02-21T10:22:24.3221344Z mov.b32 %r43039, %r42988; 2026-02-21T10:22:24.3221404Z mov.b32 %r43040, %r42988; 
2026-02-21T10:22:24.3221470Z mov.b32 %r43041, %r42988; 2026-02-21T10:22:24.3221527Z mov.b32 %r43042, %r42988; 2026-02-21T10:22:24.3221585Z mov.b32 %r43043, %r42988; 2026-02-21T10:22:24.3221646Z mov.b32 %r43044, %r42988; 2026-02-21T10:22:24.3221714Z mov.b32 %r43045, %r42988; 2026-02-21T10:22:24.3221773Z mov.b32 %r43046, %r42988; 2026-02-21T10:22:24.3221835Z mov.b32 %r43047, %r42988; 2026-02-21T10:22:24.3221893Z mov.b32 %r43048, %r42988; 2026-02-21T10:22:24.3221951Z mov.b32 %r43049, %r42988; 2026-02-21T10:22:24.3222065Z mov.b32 %r43050, %r42988; 2026-02-21T10:22:24.3222125Z mov.b32 %r43051, %r42988; 2026-02-21T10:22:24.3222181Z mov.b32 %r43052, %r42988; 2026-02-21T10:22:24.3222238Z mov.b32 %r43053, %r42988; 2026-02-21T10:22:24.3222297Z mov.b32 %r43054, %r42988; 2026-02-21T10:22:24.3222354Z mov.b32 %r43055, %r42988; 2026-02-21T10:22:24.3222460Z mov.b32 %r43056, %r42988; 2026-02-21T10:22:24.3222525Z mov.b32 %r43057, %r42988; 2026-02-21T10:22:24.3222581Z mov.b32 %r43058, %r42988; 2026-02-21T10:22:24.3222640Z mov.b32 %r43059, %r42988; 2026-02-21T10:22:24.3222697Z mov.b32 %r43060, %r42988; 2026-02-21T10:22:24.3222756Z mov.b32 %r43061, %r42988; 2026-02-21T10:22:24.3222813Z mov.b32 %r43062, %r42988; 2026-02-21T10:22:24.3222869Z mov.b32 %r43063, %r42988; 2026-02-21T10:22:24.3222927Z mov.b32 %r43064, %r42988; 2026-02-21T10:22:24.3222985Z mov.b32 %r43065, %r42988; 2026-02-21T10:22:24.3223042Z mov.b32 %r43066, %r42988; 2026-02-21T10:22:24.3223112Z mov.b32 %r43067, %r42988; 2026-02-21T10:22:24.3223180Z mov.b32 %r43068, %r42988; 2026-02-21T10:22:24.3223238Z mov.b32 %r43069, %r42988; 2026-02-21T10:22:24.3223296Z mov.b32 %r43070, %r42988; 2026-02-21T10:22:24.3223356Z mov.b32 %r43071, %r42988; 2026-02-21T10:22:24.3223413Z mov.b32 %r43072, %r42988; 2026-02-21T10:22:24.3223470Z mov.b32 %r43073, %r42988; 2026-02-21T10:22:24.3223531Z mov.b32 %r43074, %r42988; 2026-02-21T10:22:24.3223592Z mov.b32 %r43075, %r42988; 2026-02-21T10:22:24.3223650Z mov.b32 %r43076, %r42988; 
2026-02-21T10:22:24.3223777Z mov.b32 %r43077, %r42988; 2026-02-21T10:22:24.3223847Z mov.b32 %r43078, %r42988; 2026-02-21T10:22:24.3223911Z mov.b32 %r43079, %r42988; 2026-02-21T10:22:24.3223969Z mov.b32 %r43080, %r42988; 2026-02-21T10:22:24.3224027Z mov.b32 %r43081, %r42988; 2026-02-21T10:22:24.3224088Z mov.b32 %r43082, %r42988; 2026-02-21T10:22:24.3224146Z mov.b32 %r43083, %r42988; 2026-02-21T10:22:24.3224202Z mov.b32 %r43084, %r42988; 2026-02-21T10:22:24.3224266Z mov.b32 %r43085, %r42988; 2026-02-21T10:22:24.3224325Z mov.b32 %r43086, %r42988; 2026-02-21T10:22:24.3224384Z mov.b32 %r43087, %r42988; 2026-02-21T10:22:24.3224441Z mov.b32 %r43088, %r42988; 2026-02-21T10:22:24.3224504Z mov.b32 %r43089, %r42988; 2026-02-21T10:22:24.3224563Z mov.b32 %r43090, %r42988; 2026-02-21T10:22:24.3224621Z mov.b32 %r43091, %r42988; 2026-02-21T10:22:24.3224687Z mov.b32 %r43092, %r42988; 2026-02-21T10:22:24.3224754Z mov.b32 %r43093, %r42988; 2026-02-21T10:22:24.3224812Z mov.b32 %r43094, %r42988; 2026-02-21T10:22:24.3224875Z mov.b32 %r43095, %r42988; 2026-02-21T10:22:24.3224934Z mov.b32 %r43096, %r42988; 2026-02-21T10:22:24.3224992Z mov.b32 %r43097, %r42988; 2026-02-21T10:22:24.3225050Z mov.b32 %r43098, %r42988; 2026-02-21T10:22:24.3225111Z mov.b32 %r43099, %r42988; 2026-02-21T10:22:24.3225167Z mov.b32 %r43100, %r42988; 2026-02-21T10:22:24.3225223Z mov.b32 %r43101, %r42988; 2026-02-21T10:22:24.3225285Z mov.b32 %r43102, %r42988; 2026-02-21T10:22:24.3225398Z mov.b32 %r43103, %r42988; 2026-02-21T10:22:24.3225458Z mov.b32 %r43104, %r42988; 2026-02-21T10:22:24.3225516Z mov.b32 %r43105, %r42988; 2026-02-21T10:22:24.3225575Z mov.b32 %r43106, %r42988; 2026-02-21T10:22:24.3225633Z mov.b32 %r43107, %r42988; 2026-02-21T10:22:24.3225690Z mov.b32 %r43108, %r42988; 2026-02-21T10:22:24.3225749Z mov.b32 %r43109, %r42988; 2026-02-21T10:22:24.3225809Z mov.b32 %r43110, %r42988; 2026-02-21T10:22:24.3225864Z mov.b32 %r43111, %r42988; 2026-02-21T10:22:24.3225925Z mov.b32 %r43112, %r42988; 
2026-02-21T10:22:24.3225989Z mov.b32 %r43113, %r42988; 2026-02-21T10:22:24.3226048Z mov.b32 %r43114, %r42988; 2026-02-21T10:22:24.3226105Z mov.b32 %r43115, %r42988; 2026-02-21T10:22:24.3226228Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T10:22:24.3226335Z // => This Inner Loop Header: Depth=2 2026-02-21T10:22:24.3226669Z .loc 1 58 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:32 2026-02-21T10:22:24.3226822Z add.s64 %rd500, %rd848, %rd75; 2026-02-21T10:22:24.3226901Z add.s64 %rd503, %rd848, %rd74; 2026-02-21T10:22:24.3226964Z add.s64 %rd506, %rd848, %rd73; 2026-02-21T10:22:24.3227026Z add.s64 %rd509, %rd848, %rd72; 2026-02-21T10:22:24.3227093Z add.s64 %rd512, %rd848, %rd71; 2026-02-21T10:22:24.3227221Z add.s64 %rd515, %rd848, %rd70; 2026-02-21T10:22:24.3227284Z add.s64 %rd518, %rd848, %rd69; 2026-02-21T10:22:24.3227483Z .loc 1 58 80 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:80 2026-02-21T10:22:24.3227555Z add.s64 %rd521, %rd848, %rd76; 2026-02-21T10:22:24.3227614Z // begin inline asm 2026-02-21T10:22:24.3227674Z mov.u64 %rd499, 0x0; 2026-02-21T10:22:24.3227810Z createpolicy.fractional.L2::evict_first.b64 %rd499, 1.0; 2026-02-21T10:22:24.3227867Z // end inline asm 2026-02-21T10:22:24.3227926Z // begin inline asm 2026-02-21T10:22:24.3227989Z mov.u32 %r22376, 0x0; 2026-02-21T10:22:24.3228050Z mov.u32 %r22377, 0x0; 2026-02-21T10:22:24.3228108Z mov.u32 %r22378, 0x0; 2026-02-21T10:22:24.3228163Z mov.u32 %r22379, 0x0; 2026-02-21T10:22:24.3228405Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r22376, %r22377, %r22378, %r22379 }, [ %rd500 + 0 ], %rd499; 2026-02-21T10:22:24.3228541Z // end inline asm 2026-02-21T10:22:24.3228601Z // begin inline asm 2026-02-21T10:22:24.3228668Z mov.u64 %rd502, 0x0; 2026-02-21T10:22:24.3228792Z createpolicy.fractional.L2::evict_first.b64 %rd502, 1.0; 2026-02-21T10:22:24.3228849Z // end inline asm 2026-02-21T10:22:24.3228981Z // begin inline asm 2026-02-21T10:22:24.3229042Z mov.u32 %r22380, 
0x0; 2026-02-21T10:22:24.3229099Z mov.u32 %r22381, 0x0; 2026-02-21T10:22:24.3229155Z mov.u32 %r22382, 0x0; 2026-02-21T10:22:24.3229213Z mov.u32 %r22383, 0x0; 2026-02-21T10:22:24.3229441Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r22380, %r22381, %r22382, %r22383 }, [ %rd503 + 0 ], %rd502; 2026-02-21T10:22:24.3229496Z // end inline asm 2026-02-21T10:22:24.3229562Z // begin inline asm 2026-02-21T10:22:24.3229620Z mov.u64 %rd505, 0x0; 2026-02-21T10:22:24.3229739Z createpolicy.fractional.L2::evict_first.b64 %rd505, 1.0; 2026-02-21T10:22:24.3229797Z // end inline asm 2026-02-21T10:22:24.3229855Z // begin inline asm 2026-02-21T10:22:24.3229911Z mov.u32 %r22384, 0x0; 2026-02-21T10:22:24.3229967Z mov.u32 %r22385, 0x0; 2026-02-21T10:22:24.3230033Z mov.u32 %r22386, 0x0; 2026-02-21T10:22:24.3230089Z mov.u32 %r22387, 0x0; 2026-02-21T10:22:24.3230314Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r22384, %r22385, %r22386, %r22387 }, [ %rd506 + 0 ], %rd505; 2026-02-21T10:22:24.3230375Z // end inline asm 2026-02-21T10:22:24.3230433Z // begin inline asm 2026-02-21T10:22:24.3230490Z mov.u64 %rd508, 0x0; 2026-02-21T10:22:24.3230615Z createpolicy.fractional.L2::evict_first.b64 %rd508, 1.0; 2026-02-21T10:22:24.3230671Z // end inline asm 2026-02-21T10:22:24.3230729Z // begin inline asm 2026-02-21T10:22:24.3230786Z mov.u32 %r22388, 0x0; 2026-02-21T10:22:24.3230920Z mov.u32 %r22389, 0x0; 2026-02-21T10:22:24.3230990Z mov.u32 %r22390, 0x0; 2026-02-21T10:22:24.3231050Z mov.u32 %r22391, 0x0; 2026-02-21T10:22:24.3231275Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r22388, %r22389, %r22390, %r22391 }, [ %rd509 + 0 ], %rd508; 2026-02-21T10:22:24.3231332Z // end inline asm 2026-02-21T10:22:24.3231390Z // begin inline asm 2026-02-21T10:22:24.3231452Z mov.u64 %rd511, 0x0; 2026-02-21T10:22:24.3231583Z createpolicy.fractional.L2::evict_first.b64 %rd511, 1.0; 2026-02-21T10:22:24.3231641Z // end inline asm 2026-02-21T10:22:24.3231703Z // begin inline asm 
2026-02-21T10:22:24.3231765Z mov.u32 %r22392, 0x0; 2026-02-21T10:22:24.3231823Z mov.u32 %r22393, 0x0; 2026-02-21T10:22:24.3231880Z mov.u32 %r22394, 0x0; 2026-02-21T10:22:24.3231939Z mov.u32 %r22395, 0x0; 2026-02-21T10:22:24.3232159Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r22392, %r22393, %r22394, %r22395 }, [ %rd512 + 0 ], %rd511; 2026-02-21T10:22:24.3232217Z // end inline asm 2026-02-21T10:22:24.3232276Z // begin inline asm 2026-02-21T10:22:24.3232392Z mov.u64 %rd514, 0x0; 2026-02-21T10:22:24.3232509Z createpolicy.fractional.L2::evict_first.b64 %rd514, 1.0; 2026-02-21T10:22:24.3232575Z // end inline asm 2026-02-21T10:22:24.3232639Z // begin inline asm 2026-02-21T10:22:24.3232698Z mov.u32 %r22396, 0x0; 2026-02-21T10:22:24.3232754Z mov.u32 %r22397, 0x0; 2026-02-21T10:22:24.3232860Z mov.u32 %r22398, 0x0; 2026-02-21T10:22:24.3232925Z mov.u32 %r22399, 0x0; 2026-02-21T10:22:24.3233149Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r22396, %r22397, %r22398, %r22399 }, [ %rd515 + 0 ], %rd514; 2026-02-21T10:22:24.3233206Z // end inline asm 2026-02-21T10:22:24.3233266Z // begin inline asm 2026-02-21T10:22:24.3233326Z mov.u64 %rd517, 0x0; 2026-02-21T10:22:24.3233442Z createpolicy.fractional.L2::evict_first.b64 %rd517, 1.0; 2026-02-21T10:22:24.3233499Z // end inline asm 2026-02-21T10:22:24.3233557Z // begin inline asm 2026-02-21T10:22:24.3233612Z mov.u32 %r22400, 0x0; 2026-02-21T10:22:24.3233672Z mov.u32 %r22401, 0x0; 2026-02-21T10:22:24.3233733Z mov.u32 %r22402, 0x0; 2026-02-21T10:22:24.3233791Z mov.u32 %r22403, 0x0; 2026-02-21T10:22:24.3234011Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r22400, %r22401, %r22402, %r22403 }, [ %rd518 + 0 ], %rd517; 2026-02-21T10:22:24.3234071Z // end inline asm 2026-02-21T10:22:24.3234128Z // begin inline asm 2026-02-21T10:22:24.3234189Z mov.u64 %rd520, 0x0; 2026-02-21T10:22:24.3234306Z createpolicy.fractional.L2::evict_first.b64 %rd520, 1.0; 2026-02-21T10:22:24.3234362Z // end inline asm 2026-02-21T10:22:24.3234472Z 
// begin inline asm 2026-02-21T10:22:24.3234531Z mov.u32 %r22404, 0x0; 2026-02-21T10:22:24.3234590Z mov.u32 %r22405, 0x0; 2026-02-21T10:22:24.3234647Z mov.u32 %r22406, 0x0; 2026-02-21T10:22:24.3234704Z mov.u32 %r22407, 0x0; 2026-02-21T10:22:24.3234930Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r22404, %r22405, %r22406, %r22407 }, [ %rd521 + 0 ], %rd520; 2026-02-21T10:22:24.3234986Z // end inline asm 2026-02-21T10:22:24.3235190Z .loc 1 62 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:62:32 2026-02-21T10:22:24.3235250Z bar.sync 0; 2026-02-21T10:22:24.3235335Z st.shared.v2.b32 [%r10], {%r22376, %r22377}; 2026-02-21T10:22:24.3235429Z st.shared.v2.b32 [%r10+2048], {%r22380, %r22381}; 2026-02-21T10:22:24.3235514Z st.shared.v2.b32 [%r10+4096], {%r22384, %r22385}; 2026-02-21T10:22:24.3235602Z st.shared.v2.b32 [%r10+6144], {%r22388, %r22389}; 2026-02-21T10:22:24.3235683Z st.shared.v2.b32 [%r10+8192], {%r22392, %r22393}; 2026-02-21T10:22:24.3235777Z st.shared.v2.b32 [%r10+10240], {%r22396, %r22397}; 2026-02-21T10:22:24.3235868Z st.shared.v2.b32 [%r10+12288], {%r22400, %r22401}; 2026-02-21T10:22:24.3235953Z st.shared.v2.b32 [%r10+14336], {%r22404, %r22405}; 2026-02-21T10:22:24.3236031Z st.shared.v2.b32 [%r11], {%r22378, %r22379}; 2026-02-21T10:22:24.3236118Z st.shared.v2.b32 [%r11+2048], {%r22382, %r22383}; 2026-02-21T10:22:24.3236201Z st.shared.v2.b32 [%r11+4096], {%r22386, %r22387}; 2026-02-21T10:22:24.3236348Z st.shared.v2.b32 [%r11+6144], {%r22390, %r22391}; 2026-02-21T10:22:24.3236432Z st.shared.v2.b32 [%r11+8192], {%r22394, %r22395}; 2026-02-21T10:22:24.3236646Z st.shared.v2.b32 [%r11+10240], {%r22398, %r22399}; 2026-02-21T10:22:24.3236735Z st.shared.v2.b32 [%r11+12288], {%r22402, %r22403}; 2026-02-21T10:22:24.3236820Z st.shared.v2.b32 [%r11+14336], {%r22406, %r22407}; 2026-02-21T10:22:24.3236878Z bar.sync 0; 2026-02-21T10:22:24.3236944Z ld.shared.b16 %rs1793, [%r52]; 2026-02-21T10:22:24.3237017Z ld.shared.b16 %rs1794, [%r52+1024]; 
2026-02-21T10:22:24.3237086Z ld.shared.b16 %rs1795, [%r52+64]; 2026-02-21T10:22:24.3237157Z ld.shared.b16 %rs1796, [%r52+1088]; 2026-02-21T10:22:24.3237222Z ld.shared.b16 %rs1797, [%r52+8192]; 2026-02-21T10:22:24.3237288Z ld.shared.b16 %rs1798, [%r52+9216]; 2026-02-21T10:22:24.3237362Z ld.shared.b16 %rs1799, [%r52+8256]; 2026-02-21T10:22:24.3237432Z ld.shared.b16 %rs1800, [%r52+9280]; 2026-02-21T10:22:24.3237503Z ld.shared.b16 %rs1801, [%r53]; 2026-02-21T10:22:24.3237647Z ld.shared.b16 %rs1802, [%r53+1024]; 2026-02-21T10:22:24.3237716Z ld.shared.b16 %rs1803, [%r53+64]; 2026-02-21T10:22:24.3237782Z ld.shared.b16 %rs1804, [%r53+1088]; 2026-02-21T10:22:24.3237846Z ld.shared.b16 %rs1805, [%r53+8192]; 2026-02-21T10:22:24.3237914Z ld.shared.b16 %rs1806, [%r53+9216]; 2026-02-21T10:22:24.3238041Z ld.shared.b16 %rs1807, [%r53+8256]; 2026-02-21T10:22:24.3238106Z ld.shared.b16 %rs1808, [%r53+9280]; 2026-02-21T10:22:24.3238172Z ld.shared.b16 %rs1809, [%r54]; 2026-02-21T10:22:24.3238237Z ld.shared.b16 %rs1810, [%r54+1024]; 2026-02-21T10:22:24.3238300Z ld.shared.b16 %rs1811, [%r54+64]; 2026-02-21T10:22:24.3238363Z ld.shared.b16 %rs1812, [%r54+1088]; 2026-02-21T10:22:24.3238431Z ld.shared.b16 %rs1813, [%r54+8192]; 2026-02-21T10:22:24.3238495Z ld.shared.b16 %rs1814, [%r54+9216]; 2026-02-21T10:22:24.3238559Z ld.shared.b16 %rs1815, [%r54+8256]; 2026-02-21T10:22:24.3238627Z ld.shared.b16 %rs1816, [%r54+9280]; 2026-02-21T10:22:24.3238696Z ld.shared.b16 %rs1817, [%r55]; 2026-02-21T10:22:24.3238760Z ld.shared.b16 %rs1818, [%r55+1024]; 2026-02-21T10:22:24.3238830Z ld.shared.b16 %rs1819, [%r55+64]; 2026-02-21T10:22:24.3238904Z ld.shared.b16 %rs1820, [%r55+1088]; 2026-02-21T10:22:24.3238970Z ld.shared.b16 %rs1821, [%r55+8192]; 2026-02-21T10:22:24.3239038Z ld.shared.b16 %rs1822, [%r55+9216]; 2026-02-21T10:22:24.3239105Z ld.shared.b16 %rs1823, [%r55+8256]; 2026-02-21T10:22:24.3239169Z ld.shared.b16 %rs1824, [%r55+9280]; 2026-02-21T10:22:24.3239303Z ld.shared.b16 %rs1825, [%r56]; 
2026-02-21T10:22:24.3239374Z ld.shared.b16 %rs1826, [%r56+1024]; 2026-02-21T10:22:24.3239437Z ld.shared.b16 %rs1827, [%r56+64]; 2026-02-21T10:22:24.3239500Z ld.shared.b16 %rs1828, [%r56+1088]; 2026-02-21T10:22:24.3239564Z ld.shared.b16 %rs1829, [%r56+8192]; 2026-02-21T10:22:24.3239633Z ld.shared.b16 %rs1830, [%r56+9216]; 2026-02-21T10:22:24.3239696Z ld.shared.b16 %rs1831, [%r56+8256]; 2026-02-21T10:22:24.3239763Z ld.shared.b16 %rs1832, [%r56+9280]; 2026-02-21T10:22:24.3239830Z ld.shared.b16 %rs1833, [%r57]; 2026-02-21T10:22:24.3239895Z ld.shared.b16 %rs1834, [%r57+1024]; 2026-02-21T10:22:24.3239960Z ld.shared.b16 %rs1835, [%r57+64]; 2026-02-21T10:22:24.3240027Z ld.shared.b16 %rs1836, [%r57+1088]; 2026-02-21T10:22:24.3240091Z ld.shared.b16 %rs1837, [%r57+8192]; 2026-02-21T10:22:24.3240155Z ld.shared.b16 %rs1838, [%r57+9216]; 2026-02-21T10:22:24.3240218Z ld.shared.b16 %rs1839, [%r57+8256]; 2026-02-21T10:22:24.3240287Z ld.shared.b16 %rs1840, [%r57+9280]; 2026-02-21T10:22:24.3240350Z ld.shared.b16 %rs1841, [%r58]; 2026-02-21T10:22:24.3240413Z ld.shared.b16 %rs1842, [%r58+1024]; 2026-02-21T10:22:24.3240492Z ld.shared.b16 %rs1843, [%r58+64]; 2026-02-21T10:22:24.3240561Z ld.shared.b16 %rs1844, [%r58+1088]; 2026-02-21T10:22:24.3240624Z ld.shared.b16 %rs1845, [%r58+8192]; 2026-02-21T10:22:24.3240687Z ld.shared.b16 %rs1846, [%r58+9216]; 2026-02-21T10:22:24.3240758Z ld.shared.b16 %rs1847, [%r58+8256]; 2026-02-21T10:22:24.3240898Z ld.shared.b16 %rs1848, [%r58+9280]; 2026-02-21T10:22:24.3240961Z ld.shared.b16 %rs1849, [%r59]; 2026-02-21T10:22:24.3241027Z ld.shared.b16 %rs1850, [%r59+1024]; 2026-02-21T10:22:24.3241090Z ld.shared.b16 %rs1851, [%r59+64]; 2026-02-21T10:22:24.3241155Z ld.shared.b16 %rs1852, [%r59+1088]; 2026-02-21T10:22:24.3241224Z ld.shared.b16 %rs1853, [%r59+8192]; 2026-02-21T10:22:24.3241288Z ld.shared.b16 %rs1854, [%r59+9216]; 2026-02-21T10:22:24.3241350Z ld.shared.b16 %rs1855, [%r59+8256]; 2026-02-21T10:22:24.3241416Z ld.shared.b16 %rs1856, [%r59+9280]; 
2026-02-21T10:22:24.3241482Z cvt.f32.bf16 %r22545, %rs1793; 2026-02-21T10:22:24.3241542Z cvt.f32.bf16 %r22546, %rs1794; 2026-02-21T10:22:24.3241604Z cvt.f32.bf16 %r22547, %rs1801; 2026-02-21T10:22:24.3241665Z cvt.f32.bf16 %r22548, %rs1802; 2026-02-21T10:22:24.3241723Z cvt.f32.bf16 %r22677, %rs1809; 2026-02-21T10:22:24.3241794Z cvt.f32.bf16 %r22678, %rs1810; 2026-02-21T10:22:24.3241858Z cvt.f32.bf16 %r22679, %rs1817; 2026-02-21T10:22:24.3241977Z cvt.f32.bf16 %r22680, %rs1818; 2026-02-21T10:22:24.3242038Z cvt.f32.bf16 %r22809, %rs1825; 2026-02-21T10:22:24.3242098Z cvt.f32.bf16 %r22810, %rs1826; 2026-02-21T10:22:24.3242162Z cvt.f32.bf16 %r22811, %rs1833; 2026-02-21T10:22:24.3242222Z cvt.f32.bf16 %r22812, %rs1834; 2026-02-21T10:22:24.3242282Z cvt.f32.bf16 %r22941, %rs1841; 2026-02-21T10:22:24.3242407Z cvt.f32.bf16 %r22942, %rs1842; 2026-02-21T10:22:24.3242469Z cvt.f32.bf16 %r22943, %rs1849; 2026-02-21T10:22:24.3242531Z cvt.f32.bf16 %r22944, %rs1850; 2026-02-21T10:22:24.3242590Z cvt.f32.bf16 %r23073, %rs1795; 2026-02-21T10:22:24.3242656Z cvt.f32.bf16 %r23074, %rs1796; 2026-02-21T10:22:24.3242715Z cvt.f32.bf16 %r23075, %rs1803; 2026-02-21T10:22:24.3242775Z cvt.f32.bf16 %r23076, %rs1804; 2026-02-21T10:22:24.3242837Z cvt.f32.bf16 %r23205, %rs1811; 2026-02-21T10:22:24.3242896Z cvt.f32.bf16 %r23206, %rs1812; 2026-02-21T10:22:24.3242966Z cvt.f32.bf16 %r23207, %rs1819; 2026-02-21T10:22:24.3243032Z cvt.f32.bf16 %r23208, %rs1820; 2026-02-21T10:22:24.3243098Z cvt.f32.bf16 %r23337, %rs1827; 2026-02-21T10:22:24.3243159Z cvt.f32.bf16 %r23338, %rs1828; 2026-02-21T10:22:24.3243219Z cvt.f32.bf16 %r23339, %rs1835; 2026-02-21T10:22:24.3243281Z cvt.f32.bf16 %r23340, %rs1836; 2026-02-21T10:22:24.3243341Z cvt.f32.bf16 %r23469, %rs1843; 2026-02-21T10:22:24.3243404Z cvt.f32.bf16 %r23470, %rs1844; 2026-02-21T10:22:24.3243465Z cvt.f32.bf16 %r23471, %rs1851; 2026-02-21T10:22:24.3243530Z cvt.f32.bf16 %r23472, %rs1852; 2026-02-21T10:22:24.3243644Z cvt.f32.bf16 %r23601, %rs1797; 
2026-02-21T10:22:24.3243718Z cvt.f32.bf16 %r23602, %rs1798; 2026-02-21T10:22:24.3243781Z cvt.f32.bf16 %r23603, %rs1805; 2026-02-21T10:22:24.3243843Z cvt.f32.bf16 %r23604, %rs1806; 2026-02-21T10:22:24.3243903Z cvt.f32.bf16 %r23733, %rs1813; 2026-02-21T10:22:24.3243962Z cvt.f32.bf16 %r23734, %rs1814; 2026-02-21T10:22:24.3244028Z cvt.f32.bf16 %r23735, %rs1821; 2026-02-21T10:22:24.3244088Z cvt.f32.bf16 %r23736, %rs1822; 2026-02-21T10:22:24.3244153Z cvt.f32.bf16 %r23865, %rs1829; 2026-02-21T10:22:24.3244217Z cvt.f32.bf16 %r23866, %rs1830; 2026-02-21T10:22:24.3244276Z cvt.f32.bf16 %r23867, %rs1837; 2026-02-21T10:22:24.3244337Z cvt.f32.bf16 %r23868, %rs1838; 2026-02-21T10:22:24.3244400Z cvt.f32.bf16 %r23997, %rs1845; 2026-02-21T10:22:24.3244462Z cvt.f32.bf16 %r23998, %rs1846; 2026-02-21T10:22:24.3244524Z cvt.f32.bf16 %r23999, %rs1853; 2026-02-21T10:22:24.3244583Z cvt.f32.bf16 %r24000, %rs1854; 2026-02-21T10:22:24.3244647Z cvt.f32.bf16 %r24129, %rs1799; 2026-02-21T10:22:24.3244707Z cvt.f32.bf16 %r24130, %rs1800; 2026-02-21T10:22:24.3244767Z cvt.f32.bf16 %r24131, %rs1807; 2026-02-21T10:22:24.3244828Z cvt.f32.bf16 %r24132, %rs1808; 2026-02-21T10:22:24.3244888Z cvt.f32.bf16 %r24261, %rs1815; 2026-02-21T10:22:24.3244948Z cvt.f32.bf16 %r24262, %rs1816; 2026-02-21T10:22:24.3245007Z cvt.f32.bf16 %r24263, %rs1823; 2026-02-21T10:22:24.3245069Z cvt.f32.bf16 %r24264, %rs1824; 2026-02-21T10:22:24.3245196Z cvt.f32.bf16 %r24393, %rs1831; 2026-02-21T10:22:24.3245258Z cvt.f32.bf16 %r24394, %rs1832; 2026-02-21T10:22:24.3245321Z cvt.f32.bf16 %r24395, %rs1839; 2026-02-21T10:22:24.3245382Z cvt.f32.bf16 %r24396, %rs1840; 2026-02-21T10:22:24.3245443Z cvt.f32.bf16 %r24525, %rs1847; 2026-02-21T10:22:24.3245504Z cvt.f32.bf16 %r24526, %rs1848; 2026-02-21T10:22:24.3245570Z cvt.f32.bf16 %r24527, %rs1855; 2026-02-21T10:22:24.3245629Z cvt.f32.bf16 %r24528, %rs1856; 2026-02-21T10:22:24.3245836Z .loc 1 64 33 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:64:33 2026-02-21T10:22:24.3245894Z 
bar.sync 0; 2026-02-21T10:22:24.3245954Z // begin inline asm 2026-02-21T10:22:24.3246063Z @%p313 mbarrier.init.shared::cta.b64 [%r29850], 1; 2026-02-21T10:22:24.3246126Z // end inline asm 2026-02-21T10:22:24.3246181Z bar.sync 0; 2026-02-21T10:22:24.3246240Z // begin inline asm 2026-02-21T10:22:24.3246374Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r29850], 4096; 2026-02-21T10:22:24.3246438Z // end inline asm 2026-02-21T10:22:24.3246697Z // begin inline asm 2026-02-21T10:22:24.3246780Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.3246840Z // end inline asm 2026-02-21T10:22:24.3246893Z bar.sync 0; 2026-02-21T10:22:24.3246959Z elect.sync %r29621|%p281, -1; 2026-02-21T10:22:24.3247028Z and.pred %p222, %p1, %p281; 2026-02-21T10:22:24.3247159Z add.s64 %rd79, %rd849, 96; 2026-02-21T10:22:24.3247222Z cvt.u32.u64 %r22412, %rd79; 2026-02-21T10:22:24.3247279Z // begin inline asm 2026-02-21T10:22:24.3247626Z @%p222 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39936], [%rd822, {%r29853, %r22412}], [%r29850]; 2026-02-21T10:22:24.3247683Z // end inline asm 2026-02-21T10:22:24.3247737Z bar.sync 0; 2026-02-21T10:22:24.3247797Z mov.b32 %r29488, 0; 2026-02-21T10:22:24.3247855Z // begin inline asm 2026-02-21T10:22:24.3247909Z 2026-02-21T10:22:24.3247959Z { 2026-02-21T10:22:24.3248026Z .reg .pred complete; 2026-02-21T10:22:24.3248085Z waitLoop: 2026-02-21T10:22:24.3248233Z mbarrier.try_wait.parity.shared.b64 complete, [%r29850], %r29488; 2026-02-21T10:22:24.3248318Z @!complete bra.uni waitLoop; 2026-02-21T10:22:24.3248370Z } 2026-02-21T10:22:24.3248375Z 2026-02-21T10:22:24.3248430Z // end inline asm 2026-02-21T10:22:24.3248484Z bar.sync 0; 2026-02-21T10:22:24.3248544Z // begin inline asm 2026-02-21T10:22:24.3248644Z @%p313 mbarrier.inval.shared::cta.b64 [%r29850]; 2026-02-21T10:22:24.3248700Z // end inline asm 2026-02-21T10:22:24.3248975Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 
2026-02-21T10:22:24.3249043Z ld.shared.s8 %rs1857, [%r20]; 2026-02-21T10:22:24.3249240Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3249307Z shl.b16 %rs1858, %rs1857, 4; 2026-02-21T10:22:24.3249498Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3249566Z ld.shared.s8 %rs1859, [%r21+128]; 2026-02-21T10:22:24.3249758Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3249824Z shl.b16 %rs1860, %rs1859, 4; 2026-02-21T10:22:24.3250012Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3250079Z ld.shared.s8 %rs1861, [%r22+256]; 2026-02-21T10:22:24.3250273Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3250335Z shl.b16 %rs1862, %rs1861, 4; 2026-02-21T10:22:24.3250523Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3250593Z ld.shared.s8 %rs1863, [%r23+384]; 2026-02-21T10:22:24.3250782Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3250845Z shl.b16 %rs1864, %rs1863, 4; 2026-02-21T10:22:24.3251106Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3251170Z ld.shared.s8 %rs1865, [%r24+512]; 2026-02-21T10:22:24.3251361Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3251425Z shl.b16 %rs1866, %rs1865, 4; 2026-02-21T10:22:24.3251616Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3251682Z ld.shared.s8 %rs1867, [%r25+640]; 2026-02-21T10:22:24.3251871Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3251938Z shl.b16 %rs1868, %rs1867, 4; 2026-02-21T10:22:24.3252127Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3252191Z ld.shared.s8 %rs1869, [%r26+768]; 2026-02-21T10:22:24.3252433Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3252498Z shl.b16 %rs1870, %rs1869, 4; 2026-02-21T10:22:24.3252687Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3252752Z ld.shared.s8 %rs1871, [%r27+896]; 2026-02-21T10:22:24.3252988Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3253050Z shl.b16 %rs1872, %rs1871, 4; 2026-02-21T10:22:24.3253239Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3253311Z ld.shared.s8 %rs1873, [%r20+1024]; 2026-02-21T10:22:24.3253499Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3253571Z shl.b16 %rs1874, %rs1873, 4; 2026-02-21T10:22:24.3253767Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3253835Z ld.shared.s8 %rs1875, [%r21+1152]; 2026-02-21T10:22:24.3254026Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3254092Z shl.b16 %rs1876, %rs1875, 4; 2026-02-21T10:22:24.3254282Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3254348Z ld.shared.s8 %rs1877, [%r22+1280]; 2026-02-21T10:22:24.3254598Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3254665Z shl.b16 %rs1878, %rs1877, 4; 2026-02-21T10:22:24.3254857Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3254929Z ld.shared.s8 %rs1879, [%r23+1408]; 2026-02-21T10:22:24.3255117Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3255180Z shl.b16 %rs1880, %rs1879, 4; 2026-02-21T10:22:24.3255371Z 
.loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3255438Z ld.shared.s8 %rs1881, [%r24+1536]; 2026-02-21T10:22:24.3255628Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3255692Z shl.b16 %rs1882, %rs1881, 4; 2026-02-21T10:22:24.3255886Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3255949Z ld.shared.s8 %rs1883, [%r25+1664]; 2026-02-21T10:22:24.3256143Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3256207Z shl.b16 %rs1884, %rs1883, 4; 2026-02-21T10:22:24.3256396Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3256574Z ld.shared.s8 %rs1885, [%r26+1792]; 2026-02-21T10:22:24.3256877Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3256941Z shl.b16 %rs1886, %rs1885, 4; 2026-02-21T10:22:24.3257138Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3257205Z ld.shared.s8 %rs1887, [%r27+1920]; 2026-02-21T10:22:24.3257409Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3257474Z shl.b16 %rs1888, %rs1887, 4; 2026-02-21T10:22:24.3257667Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3257736Z ld.shared.s8 %rs1889, [%r20+2048]; 2026-02-21T10:22:24.3257925Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3257988Z shl.b16 %rs1890, %rs1889, 4; 2026-02-21T10:22:24.3258252Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3258324Z ld.shared.s8 %rs1891, [%r21+2176]; 2026-02-21T10:22:24.3258516Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3258582Z shl.b16 %rs1892, %rs1891, 4; 
2026-02-21T10:22:24.3258842Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3258907Z ld.shared.s8 %rs1893, [%r22+2304]; 2026-02-21T10:22:24.3259095Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3259159Z shl.b16 %rs1894, %rs1893, 4; 2026-02-21T10:22:24.3259346Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3259408Z ld.shared.s8 %rs1895, [%r23+2432]; 2026-02-21T10:22:24.3259599Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3259663Z shl.b16 %rs1896, %rs1895, 4; 2026-02-21T10:22:24.3259852Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3259923Z ld.shared.s8 %rs1897, [%r24+2560]; 2026-02-21T10:22:24.3260110Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3260173Z shl.b16 %rs1898, %rs1897, 4; 2026-02-21T10:22:24.3260432Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3260500Z ld.shared.s8 %rs1899, [%r25+2688]; 2026-02-21T10:22:24.3260688Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3260748Z shl.b16 %rs1900, %rs1899, 4; 2026-02-21T10:22:24.3260939Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3261006Z ld.shared.s8 %rs1901, [%r26+2816]; 2026-02-21T10:22:24.3261197Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3261264Z shl.b16 %rs1902, %rs1901, 4; 2026-02-21T10:22:24.3261452Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3261518Z ld.shared.s8 %rs1903, [%r27+2944]; 2026-02-21T10:22:24.3261712Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3261773Z shl.b16 
%rs1904, %rs1903, 4; 2026-02-21T10:22:24.3261962Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3262035Z ld.shared.s8 %rs1905, [%r20+3072]; 2026-02-21T10:22:24.3262235Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3262296Z shl.b16 %rs1906, %rs1905, 4; 2026-02-21T10:22:24.3262544Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3262612Z ld.shared.s8 %rs1907, [%r21+3200]; 2026-02-21T10:22:24.3262803Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3262865Z shl.b16 %rs1908, %rs1907, 4; 2026-02-21T10:22:24.3263063Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3263128Z ld.shared.s8 %rs1909, [%r22+3328]; 2026-02-21T10:22:24.3263316Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3263390Z shl.b16 %rs1910, %rs1909, 4; 2026-02-21T10:22:24.3263580Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3263645Z ld.shared.s8 %rs1911, [%r23+3456]; 2026-02-21T10:22:24.3263906Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3263971Z shl.b16 %rs1912, %rs1911, 4; 2026-02-21T10:22:24.3264161Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3264225Z ld.shared.s8 %rs1913, [%r24+3584]; 2026-02-21T10:22:24.3264462Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3264522Z shl.b16 %rs1914, %rs1913, 4; 2026-02-21T10:22:24.3264712Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3264779Z ld.shared.s8 %rs1915, [%r25+3712]; 2026-02-21T10:22:24.3264970Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 
2026-02-21T10:22:24.3265032Z shl.b16 %rs1916, %rs1915, 4; 2026-02-21T10:22:24.3265224Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3265292Z ld.shared.s8 %rs1917, [%r26+3840]; 2026-02-21T10:22:24.3265479Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3265542Z shl.b16 %rs1918, %rs1917, 4; 2026-02-21T10:22:24.3265727Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3265795Z ld.shared.s8 %rs1919, [%r27+3968]; 2026-02-21T10:22:24.3266040Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3266114Z shl.b16 %rs1920, %rs1919, 4; 2026-02-21T10:22:24.3266305Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3266367Z cvt.s16.s8 %rs1921, %rs1858; 2026-02-21T10:22:24.3266430Z shr.s16 %rs1922, %rs1921, 4; 2026-02-21T10:22:24.3266618Z cvt.s16.s8 %rs1923, %rs1860; 2026-02-21T10:22:24.3266686Z shr.s16 %rs1924, %rs1923, 4; 2026-02-21T10:22:24.3266751Z shr.s16 %rs1925, %rs1857, 4; 2026-02-21T10:22:24.3266810Z shr.s16 %rs1926, %rs1859, 4; 2026-02-21T10:22:24.3267001Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3267070Z cvt.rn.f32.s16 %r29622, %rs1926; 2026-02-21T10:22:24.3267136Z cvt.rn.f32.s16 %r29623, %rs1925; 2026-02-21T10:22:24.3267197Z cvt.rn.f32.s16 %r29624, %rs1924; 2026-02-21T10:22:24.3267258Z cvt.rn.f32.s16 %r29625, %rs1922; 2026-02-21T10:22:24.3267454Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3267515Z cvt.s16.s8 %rs1927, %rs1862; 2026-02-21T10:22:24.3267574Z shr.s16 %rs1928, %rs1927, 4; 2026-02-21T10:22:24.3267635Z cvt.s16.s8 %rs1929, %rs1864; 2026-02-21T10:22:24.3267695Z shr.s16 %rs1930, %rs1929, 4; 2026-02-21T10:22:24.3267754Z shr.s16 %rs1931, %rs1861, 4; 2026-02-21T10:22:24.3267812Z shr.s16 %rs1932, %rs1863, 4; 
2026-02-21T10:22:24.3268096Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3268159Z cvt.rn.f32.s16 %r29626, %rs1932; 2026-02-21T10:22:24.3268221Z cvt.rn.f32.s16 %r29627, %rs1931; 2026-02-21T10:22:24.3268284Z cvt.rn.f32.s16 %r29628, %rs1930; 2026-02-21T10:22:24.3268345Z cvt.rn.f32.s16 %r29629, %rs1928; 2026-02-21T10:22:24.3268612Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3268677Z cvt.s16.s8 %rs1933, %rs1866; 2026-02-21T10:22:24.3268737Z shr.s16 %rs1934, %rs1933, 4; 2026-02-21T10:22:24.3268795Z cvt.s16.s8 %rs1935, %rs1868; 2026-02-21T10:22:24.3268853Z shr.s16 %rs1936, %rs1935, 4; 2026-02-21T10:22:24.3268915Z shr.s16 %rs1937, %rs1865, 4; 2026-02-21T10:22:24.3268974Z shr.s16 %rs1938, %rs1867, 4; 2026-02-21T10:22:24.3269165Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3269233Z cvt.rn.f32.s16 %r29630, %rs1938; 2026-02-21T10:22:24.3269364Z cvt.rn.f32.s16 %r29631, %rs1937; 2026-02-21T10:22:24.3269428Z cvt.rn.f32.s16 %r29632, %rs1936; 2026-02-21T10:22:24.3269490Z cvt.rn.f32.s16 %r29633, %rs1934; 2026-02-21T10:22:24.3269682Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3269804Z cvt.s16.s8 %rs1939, %rs1870; 2026-02-21T10:22:24.3269866Z shr.s16 %rs1940, %rs1939, 4; 2026-02-21T10:22:24.3269927Z cvt.s16.s8 %rs1941, %rs1872; 2026-02-21T10:22:24.3269988Z shr.s16 %rs1942, %rs1941, 4; 2026-02-21T10:22:24.3270060Z shr.s16 %rs1943, %rs1869, 4; 2026-02-21T10:22:24.3270125Z shr.s16 %rs1944, %rs1871, 4; 2026-02-21T10:22:24.3270316Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3270376Z cvt.rn.f32.s16 %r29634, %rs1944; 2026-02-21T10:22:24.3270439Z cvt.rn.f32.s16 %r29635, %rs1943; 2026-02-21T10:22:24.3270503Z cvt.rn.f32.s16 %r29636, %rs1942; 2026-02-21T10:22:24.3270566Z cvt.rn.f32.s16 %r29637, %rs1940; 
2026-02-21T10:22:24.3270757Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3270819Z cvt.s16.s8 %rs1945, %rs1874; 2026-02-21T10:22:24.3270879Z shr.s16 %rs1946, %rs1945, 4; 2026-02-21T10:22:24.3270942Z cvt.s16.s8 %rs1947, %rs1876; 2026-02-21T10:22:24.3271002Z shr.s16 %rs1948, %rs1947, 4; 2026-02-21T10:22:24.3271060Z shr.s16 %rs1949, %rs1873, 4; 2026-02-21T10:22:24.3271187Z shr.s16 %rs1950, %rs1875, 4; 2026-02-21T10:22:24.3271378Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3271445Z cvt.rn.f32.s16 %r29638, %rs1950; 2026-02-21T10:22:24.3271518Z cvt.rn.f32.s16 %r29639, %rs1949; 2026-02-21T10:22:24.3271579Z cvt.rn.f32.s16 %r29640, %rs1948; 2026-02-21T10:22:24.3271643Z cvt.rn.f32.s16 %r29641, %rs1946; 2026-02-21T10:22:24.3271837Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3271900Z cvt.s16.s8 %rs1951, %rs1878; 2026-02-21T10:22:24.3271962Z shr.s16 %rs1952, %rs1951, 4; 2026-02-21T10:22:24.3272022Z cvt.s16.s8 %rs1953, %rs1880; 2026-02-21T10:22:24.3272082Z shr.s16 %rs1954, %rs1953, 4; 2026-02-21T10:22:24.3272145Z shr.s16 %rs1955, %rs1877, 4; 2026-02-21T10:22:24.3272208Z shr.s16 %rs1956, %rs1879, 4; 2026-02-21T10:22:24.3272401Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3272463Z cvt.rn.f32.s16 %r29642, %rs1956; 2026-02-21T10:22:24.3272527Z cvt.rn.f32.s16 %r29643, %rs1955; 2026-02-21T10:22:24.3272588Z cvt.rn.f32.s16 %r29644, %rs1954; 2026-02-21T10:22:24.3272649Z cvt.rn.f32.s16 %r29645, %rs1952; 2026-02-21T10:22:24.3272842Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3272912Z cvt.s16.s8 %rs1957, %rs1882; 2026-02-21T10:22:24.3273030Z shr.s16 %rs1958, %rs1957, 4; 2026-02-21T10:22:24.3273091Z cvt.s16.s8 %rs1959, %rs1884; 2026-02-21T10:22:24.3273152Z shr.s16 %rs1960, %rs1959, 4; 2026-02-21T10:22:24.3273212Z shr.s16 
%rs1961, %rs1881, 4; 2026-02-21T10:22:24.3273270Z shr.s16 %rs1962, %rs1883, 4; 2026-02-21T10:22:24.3273462Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3273529Z cvt.rn.f32.s16 %r29646, %rs1962; 2026-02-21T10:22:24.3273592Z cvt.rn.f32.s16 %r29647, %rs1961; 2026-02-21T10:22:24.3273654Z cvt.rn.f32.s16 %r29648, %rs1960; 2026-02-21T10:22:24.3273718Z cvt.rn.f32.s16 %r29649, %rs1958; 2026-02-21T10:22:24.3273907Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3273967Z cvt.s16.s8 %rs1963, %rs1886; 2026-02-21T10:22:24.3274032Z shr.s16 %rs1964, %rs1963, 4; 2026-02-21T10:22:24.3274092Z cvt.s16.s8 %rs1965, %rs1888; 2026-02-21T10:22:24.3274153Z shr.s16 %rs1966, %rs1965, 4; 2026-02-21T10:22:24.3274272Z shr.s16 %rs1967, %rs1885, 4; 2026-02-21T10:22:24.3274336Z shr.s16 %rs1968, %rs1887, 4; 2026-02-21T10:22:24.3274526Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3274589Z cvt.rn.f32.s16 %r29650, %rs1968; 2026-02-21T10:22:24.3274697Z cvt.rn.f32.s16 %r29651, %rs1967; 2026-02-21T10:22:24.3274757Z cvt.rn.f32.s16 %r29652, %rs1966; 2026-02-21T10:22:24.3274819Z cvt.rn.f32.s16 %r29653, %rs1964; 2026-02-21T10:22:24.3275012Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3275072Z cvt.s16.s8 %rs1969, %rs1890; 2026-02-21T10:22:24.3275131Z shr.s16 %rs1970, %rs1969, 4; 2026-02-21T10:22:24.3275195Z cvt.s16.s8 %rs1971, %rs1892; 2026-02-21T10:22:24.3275253Z shr.s16 %rs1972, %rs1971, 4; 2026-02-21T10:22:24.3275313Z shr.s16 %rs1973, %rs1889, 4; 2026-02-21T10:22:24.3275371Z shr.s16 %rs1974, %rs1891, 4; 2026-02-21T10:22:24.3275571Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3275636Z cvt.rn.f32.s16 %r29654, %rs1974; 2026-02-21T10:22:24.3275702Z cvt.rn.f32.s16 %r29655, %rs1973; 2026-02-21T10:22:24.3275768Z cvt.rn.f32.s16 %r29656, %rs1972; 
2026-02-21T10:22:24.3275833Z cvt.rn.f32.s16 %r29657, %rs1970; 2026-02-21T10:22:24.3276025Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3276144Z cvt.s16.s8 %rs1975, %rs1894; 2026-02-21T10:22:24.3276211Z shr.s16 %rs1976, %rs1975, 4; 2026-02-21T10:22:24.3276270Z cvt.s16.s8 %rs1977, %rs1896; 2026-02-21T10:22:24.3276330Z shr.s16 %rs1978, %rs1977, 4; 2026-02-21T10:22:24.3276393Z shr.s16 %rs1979, %rs1893, 4; 2026-02-21T10:22:24.3276589Z shr.s16 %rs1980, %rs1895, 4; 2026-02-21T10:22:24.3276787Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3276862Z cvt.rn.f32.s16 %r29658, %rs1980; 2026-02-21T10:22:24.3276926Z cvt.rn.f32.s16 %r29659, %rs1979; 2026-02-21T10:22:24.3276987Z cvt.rn.f32.s16 %r29660, %rs1978; 2026-02-21T10:22:24.3277049Z cvt.rn.f32.s16 %r29661, %rs1976; 2026-02-21T10:22:24.3277247Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3277310Z cvt.s16.s8 %rs1981, %rs1898; 2026-02-21T10:22:24.3277371Z shr.s16 %rs1982, %rs1981, 4; 2026-02-21T10:22:24.3277439Z cvt.s16.s8 %rs1983, %rs1900; 2026-02-21T10:22:24.3277498Z shr.s16 %rs1984, %rs1983, 4; 2026-02-21T10:22:24.3277559Z shr.s16 %rs1985, %rs1897, 4; 2026-02-21T10:22:24.3277626Z shr.s16 %rs1986, %rs1899, 4; 2026-02-21T10:22:24.3277816Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3277879Z cvt.rn.f32.s16 %r29662, %rs1986; 2026-02-21T10:22:24.3277941Z cvt.rn.f32.s16 %r29663, %rs1985; 2026-02-21T10:22:24.3278085Z cvt.rn.f32.s16 %r29664, %rs1984; 2026-02-21T10:22:24.3278147Z cvt.rn.f32.s16 %r29665, %rs1982; 2026-02-21T10:22:24.3278339Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3278409Z cvt.s16.s8 %rs1987, %rs1902; 2026-02-21T10:22:24.3278471Z shr.s16 %rs1988, %rs1987, 4; 2026-02-21T10:22:24.3278535Z cvt.s16.s8 %rs1989, %rs1904; 2026-02-21T10:22:24.3278598Z shr.s16 
%rs1990, %rs1989, 4; 2026-02-21T10:22:24.3278658Z shr.s16 %rs1991, %rs1901, 4; 2026-02-21T10:22:24.3278719Z shr.s16 %rs1992, %rs1903, 4; 2026-02-21T10:22:24.3278907Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3278986Z cvt.rn.f32.s16 %r29666, %rs1992; 2026-02-21T10:22:24.3279054Z cvt.rn.f32.s16 %r29667, %rs1991; 2026-02-21T10:22:24.3279116Z cvt.rn.f32.s16 %r29668, %rs1990; 2026-02-21T10:22:24.3279185Z cvt.rn.f32.s16 %r29669, %rs1988; 2026-02-21T10:22:24.3279452Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3279519Z cvt.s16.s8 %rs1993, %rs1906; 2026-02-21T10:22:24.3279584Z shr.s16 %rs1994, %rs1993, 4; 2026-02-21T10:22:24.3279646Z cvt.s16.s8 %rs1995, %rs1908; 2026-02-21T10:22:24.3279706Z shr.s16 %rs1996, %rs1995, 4; 2026-02-21T10:22:24.3279826Z shr.s16 %rs1997, %rs1905, 4; 2026-02-21T10:22:24.3279892Z shr.s16 %rs1998, %rs1907, 4; 2026-02-21T10:22:24.3280087Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3280149Z cvt.rn.f32.s16 %r29670, %rs1998; 2026-02-21T10:22:24.3280218Z cvt.rn.f32.s16 %r29671, %rs1997; 2026-02-21T10:22:24.3280280Z cvt.rn.f32.s16 %r29672, %rs1996; 2026-02-21T10:22:24.3280341Z cvt.rn.f32.s16 %r29673, %rs1994; 2026-02-21T10:22:24.3280533Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3280596Z cvt.s16.s8 %rs1999, %rs1910; 2026-02-21T10:22:24.3280663Z shr.s16 %rs2000, %rs1999, 4; 2026-02-21T10:22:24.3280723Z cvt.s16.s8 %rs2001, %rs1912; 2026-02-21T10:22:24.3280789Z shr.s16 %rs2002, %rs2001, 4; 2026-02-21T10:22:24.3280850Z shr.s16 %rs2003, %rs1909, 4; 2026-02-21T10:22:24.3280911Z shr.s16 %rs2004, %rs1911, 4; 2026-02-21T10:22:24.3281104Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3281169Z cvt.rn.f32.s16 %r29674, %rs2004; 2026-02-21T10:22:24.3281300Z cvt.rn.f32.s16 %r29675, %rs2003; 
2026-02-21T10:22:24.3281365Z cvt.rn.f32.s16 %r29676, %rs2002; 2026-02-21T10:22:24.3281431Z cvt.rn.f32.s16 %r29677, %rs2000; 2026-02-21T10:22:24.3281621Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3281682Z cvt.s16.s8 %rs2005, %rs1914; 2026-02-21T10:22:24.3281746Z shr.s16 %rs2006, %rs2005, 4; 2026-02-21T10:22:24.3281805Z cvt.s16.s8 %rs2007, %rs1916; 2026-02-21T10:22:24.3281872Z shr.s16 %rs2008, %rs2007, 4; 2026-02-21T10:22:24.3281950Z shr.s16 %rs2009, %rs1913, 4; 2026-02-21T10:22:24.3282011Z shr.s16 %rs2010, %rs1915, 4; 2026-02-21T10:22:24.3282201Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3282265Z cvt.rn.f32.s16 %r29678, %rs2010; 2026-02-21T10:22:24.3282333Z cvt.rn.f32.s16 %r29679, %rs2009; 2026-02-21T10:22:24.3282395Z cvt.rn.f32.s16 %r29680, %rs2008; 2026-02-21T10:22:24.3282461Z cvt.rn.f32.s16 %r29681, %rs2006; 2026-02-21T10:22:24.3282658Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3282719Z cvt.s16.s8 %rs2011, %rs1918; 2026-02-21T10:22:24.3282780Z shr.s16 %rs2012, %rs2011, 4; 2026-02-21T10:22:24.3282845Z cvt.s16.s8 %rs2013, %rs1920; 2026-02-21T10:22:24.3282905Z shr.s16 %rs2014, %rs2013, 4; 2026-02-21T10:22:24.3282966Z shr.s16 %rs2015, %rs1917, 4; 2026-02-21T10:22:24.3283101Z shr.s16 %rs2016, %rs1919, 4; 2026-02-21T10:22:24.3283297Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3283361Z cvt.rn.f32.s16 %r29682, %rs2016; 2026-02-21T10:22:24.3283423Z cvt.rn.f32.s16 %r29683, %rs2015; 2026-02-21T10:22:24.3283491Z cvt.rn.f32.s16 %r29684, %rs2014; 2026-02-21T10:22:24.3283556Z cvt.rn.f32.s16 %r29685, %rs2012; 2026-02-21T10:22:24.3283613Z bar.sync 0; 2026-02-21T10:22:24.3283734Z st.shared.v4.b32 [%r28], {%r29625, %r29623, %r29624, %r29622}; 2026-02-21T10:22:24.3283884Z st.shared.v4.b32 [%r28+16384], {%r29657, %r29655, %r29656, %r29654}; 
2026-02-21T10:22:24.3283996Z st.shared.v4.b32 [%r29], {%r29629, %r29627, %r29628, %r29626}; 2026-02-21T10:22:24.3284119Z st.shared.v4.b32 [%r29+16384], {%r29661, %r29659, %r29660, %r29658}; 2026-02-21T10:22:24.3284229Z st.shared.v4.b32 [%r30], {%r29633, %r29631, %r29632, %r29630}; 2026-02-21T10:22:24.3284345Z st.shared.v4.b32 [%r30+16384], {%r29665, %r29663, %r29664, %r29662}; 2026-02-21T10:22:24.3284507Z st.shared.v4.b32 [%r31], {%r29637, %r29635, %r29636, %r29634}; 2026-02-21T10:22:24.3284631Z st.shared.v4.b32 [%r31+16384], {%r29669, %r29667, %r29668, %r29666}; 2026-02-21T10:22:24.3284739Z st.shared.v4.b32 [%r32], {%r29641, %r29639, %r29640, %r29638}; 2026-02-21T10:22:24.3284855Z st.shared.v4.b32 [%r32+16384], {%r29673, %r29671, %r29672, %r29670}; 2026-02-21T10:22:24.3285033Z st.shared.v4.b32 [%r33], {%r29645, %r29643, %r29644, %r29642}; 2026-02-21T10:22:24.3285151Z st.shared.v4.b32 [%r33+16384], {%r29677, %r29675, %r29676, %r29674}; 2026-02-21T10:22:24.3285260Z st.shared.v4.b32 [%r34], {%r29649, %r29647, %r29648, %r29646}; 2026-02-21T10:22:24.3285381Z st.shared.v4.b32 [%r34+16384], {%r29681, %r29679, %r29680, %r29678}; 2026-02-21T10:22:24.3285494Z st.shared.v4.b32 [%r35], {%r29653, %r29651, %r29652, %r29650}; 2026-02-21T10:22:24.3285608Z st.shared.v4.b32 [%r35+16384], {%r29685, %r29683, %r29684, %r29682}; 2026-02-21T10:22:24.3285669Z $L__tmp17: 2026-02-21T10:22:24.3285946Z .loc 2 291 36 // standard.py:291:36 @[ c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:94:40 ] 2026-02-21T10:22:24.3286009Z // begin inline asm 2026-02-21T10:22:24.3286096Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.3286161Z // end inline asm 2026-02-21T10:22:24.3286217Z bar.sync 0; 2026-02-21T10:22:24.3286305Z wgmma.fence.sync.aligned; 2026-02-21T10:22:24.3286371Z mov.pred %p224, -1; 2026-02-21T10:22:24.3286437Z // begin inline asm 2026-02-21T10:22:24.3288138Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r22545,%r22546,%r22547,%r22548}, %rd3, %p224, 1, 1; 2026-02-21T10:22:24.3288219Z // end inline asm 2026-02-21T10:22:24.3288280Z // begin inline asm 2026-02-21T10:22:24.3289762Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r22677,%r22678,%r22679,%r22680}, %rd4, %p224, 1, 1; 2026-02-21T10:22:24.3289826Z // end inline asm 2026-02-21T10:22:24.3289884Z // begin inline asm 2026-02-21T10:22:24.3291429Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r22809,%r22810,%r22811,%r22812}, %rd5, %p224, 1, 1; 2026-02-21T10:22:24.3291495Z // end inline asm 2026-02-21T10:22:24.3291554Z // begin inline asm 2026-02-21T10:22:24.3293100Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r22941,%r22942,%r22943,%r22944}, %rd6, %p224, 1, 1; 2026-02-21T10:22:24.3293220Z // end inline asm 2026-02-21T10:22:24.3293278Z // begin inline asm 2026-02-21T10:22:24.3294766Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r23073,%r23074,%r23075,%r23076}, %rd7, %p224, 1, 1; 2026-02-21T10:22:24.3294826Z // end inline asm 2026-02-21T10:22:24.3294886Z // begin inline asm 2026-02-21T10:22:24.3296416Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r23205,%r23206,%r23207,%r23208}, %rd8, %p224, 1, 1; 2026-02-21T10:22:24.3296597Z // end inline asm 2026-02-21T10:22:24.3296665Z // begin inline asm 2026-02-21T10:22:24.3298146Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r23337,%r23338,%r23339,%r23340}, %rd9, %p224, 1, 1; 2026-02-21T10:22:24.3298214Z // end inline asm 2026-02-21T10:22:24.3298274Z // begin inline asm 2026-02-21T10:22:24.3299758Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r23469,%r23470,%r23471,%r23472}, %rd10, %p224, 1, 1; 2026-02-21T10:22:24.3299901Z // end inline asm 2026-02-21T10:22:24.3299962Z // begin inline asm 2026-02-21T10:22:24.3301442Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r23601,%r23602,%r23603,%r23604}, %rd3, %p224, 1, 1; 2026-02-21T10:22:24.3301567Z // end inline asm 2026-02-21T10:22:24.3301630Z // begin inline asm 2026-02-21T10:22:24.3303110Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r23733,%r23734,%r23735,%r23736}, %rd4, %p224, 1, 1; 2026-02-21T10:22:24.3303231Z // end inline asm 2026-02-21T10:22:24.3303289Z // begin inline asm 2026-02-21T10:22:24.3304773Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r23865,%r23866,%r23867,%r23868}, %rd5, %p224, 1, 1; 2026-02-21T10:22:24.3304836Z // end inline asm 2026-02-21T10:22:24.3304954Z // begin inline asm 2026-02-21T10:22:24.3306443Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r23997,%r23998,%r23999,%r24000}, %rd6, %p224, 1, 1; 2026-02-21T10:22:24.3306621Z // end inline asm 2026-02-21T10:22:24.3306684Z // begin inline asm 2026-02-21T10:22:24.3308165Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r24129,%r24130,%r24131,%r24132}, %rd7, %p224, 1, 1; 2026-02-21T10:22:24.3308228Z // end inline asm 2026-02-21T10:22:24.3308377Z // begin inline asm 2026-02-21T10:22:24.3309940Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r24261,%r24262,%r24263,%r24264}, %rd8, %p224, 1, 1; 2026-02-21T10:22:24.3310005Z // end inline asm 2026-02-21T10:22:24.3310065Z // begin inline asm 2026-02-21T10:22:24.3311619Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r24393,%r24394,%r24395,%r24396}, %rd9, %p224, 1, 1; 2026-02-21T10:22:24.3311743Z // end inline asm 2026-02-21T10:22:24.3311801Z // begin inline asm 2026-02-21T10:22:24.3313284Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r24525,%r24526,%r24527,%r24528}, %rd10, %p224, 1, 1; 2026-02-21T10:22:24.3313349Z // end inline asm 2026-02-21T10:22:24.3313427Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:24.3313492Z mov.b32 %r24658, %r29488; 2026-02-21T10:22:24.3313557Z mov.b32 %r24659, %r29488; 2026-02-21T10:22:24.3313618Z mov.b32 %r24657, %r39936; 2026-02-21T10:22:24.3313677Z // begin inline asm 2026-02-21T10:22:24.3316271Z // wait for regs: 
%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115,%r24657,%r24658,%r24659 2026-02-21T10:22:24.3316358Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:24.3316425Z // end inline asm 2026-02-21T10:22:24.3316604Z $L__tmp18: 2026-02-21T10:22:24.3316816Z .loc 1 58 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:32 2026-02-21T10:22:24.3316892Z add.s32 %r29686, %r42987, -64; 2026-02-21T10:22:24.3316956Z add.s64 %rd541, %rd500, 128; 2026-02-21T10:22:24.3317017Z add.s64 %rd544, %rd503, 128; 2026-02-21T10:22:24.3317152Z add.s64 %rd547, %rd506, 128; 2026-02-21T10:22:24.3317220Z add.s64 %rd550, %rd509, 128; 2026-02-21T10:22:24.3317281Z add.s64 %rd553, %rd512, 128; 2026-02-21T10:22:24.3317344Z add.s64 %rd556, %rd515, 128; 2026-02-21T10:22:24.3317409Z add.s64 %rd559, %rd518, 128; 2026-02-21T10:22:24.3317500Z mad.wide.s32 %rd562, %r29686, 2, %rd117; 2026-02-21T10:22:24.3317707Z .loc 1 58 80 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:80 2026-02-21T10:22:24.3317774Z // begin inline asm 
2026-02-21T10:22:24.3317836Z mov.u64 %rd540, 0x0; 2026-02-21T10:22:24.3317968Z createpolicy.fractional.L2::evict_first.b64 %rd540, 1.0; 2026-02-21T10:22:24.3318028Z // end inline asm 2026-02-21T10:22:24.3318091Z // begin inline asm 2026-02-21T10:22:24.3318153Z mov.u32 %r24791, 0x0; 2026-02-21T10:22:24.3318214Z mov.u32 %r24792, 0x0; 2026-02-21T10:22:24.3318276Z mov.u32 %r24793, 0x0; 2026-02-21T10:22:24.3318335Z mov.u32 %r24794, 0x0; 2026-02-21T10:22:24.3318653Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r24791, %r24792, %r24793, %r24794 }, [ %rd541 + 0 ], %rd540; 2026-02-21T10:22:24.3318718Z // end inline asm 2026-02-21T10:22:24.3318783Z // begin inline asm 2026-02-21T10:22:24.3318842Z mov.u64 %rd543, 0x0; 2026-02-21T10:22:24.3318966Z createpolicy.fractional.L2::evict_first.b64 %rd543, 1.0; 2026-02-21T10:22:24.3319089Z // end inline asm 2026-02-21T10:22:24.3319148Z // begin inline asm 2026-02-21T10:22:24.3319205Z mov.u32 %r24795, 0x0; 2026-02-21T10:22:24.3319265Z mov.u32 %r24796, 0x0; 2026-02-21T10:22:24.3319325Z mov.u32 %r24797, 0x0; 2026-02-21T10:22:24.3319395Z mov.u32 %r24798, 0x0; 2026-02-21T10:22:24.3319624Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r24795, %r24796, %r24797, %r24798 }, [ %rd544 + 0 ], %rd543; 2026-02-21T10:22:24.3319691Z // end inline asm 2026-02-21T10:22:24.3319749Z // begin inline asm 2026-02-21T10:22:24.3319808Z mov.u64 %rd546, 0x0; 2026-02-21T10:22:24.3319930Z createpolicy.fractional.L2::evict_first.b64 %rd546, 1.0; 2026-02-21T10:22:24.3319990Z // end inline asm 2026-02-21T10:22:24.3320050Z // begin inline asm 2026-02-21T10:22:24.3320108Z mov.u32 %r24799, 0x0; 2026-02-21T10:22:24.3320174Z mov.u32 %r24800, 0x0; 2026-02-21T10:22:24.3320231Z mov.u32 %r24801, 0x0; 2026-02-21T10:22:24.3320289Z mov.u32 %r24802, 0x0; 2026-02-21T10:22:24.3320515Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r24799, %r24800, %r24801, %r24802 }, [ %rd547 + 0 ], %rd546; 2026-02-21T10:22:24.3320575Z // end inline asm 2026-02-21T10:22:24.3320633Z 
// begin inline asm 2026-02-21T10:22:24.3320778Z mov.u64 %rd549, 0x0; 2026-02-21T10:22:24.3320900Z createpolicy.fractional.L2::evict_first.b64 %rd549, 1.0; 2026-02-21T10:22:24.3320957Z // end inline asm 2026-02-21T10:22:24.3321015Z // begin inline asm 2026-02-21T10:22:24.3321075Z mov.u32 %r24803, 0x0; 2026-02-21T10:22:24.3321133Z mov.u32 %r24804, 0x0; 2026-02-21T10:22:24.3321188Z mov.u32 %r24805, 0x0; 2026-02-21T10:22:24.3321249Z mov.u32 %r24806, 0x0; 2026-02-21T10:22:24.3321473Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r24803, %r24804, %r24805, %r24806 }, [ %rd550 + 0 ], %rd549; 2026-02-21T10:22:24.3321534Z // end inline asm 2026-02-21T10:22:24.3321597Z // begin inline asm 2026-02-21T10:22:24.3321657Z mov.u64 %rd552, 0x0; 2026-02-21T10:22:24.3321774Z createpolicy.fractional.L2::evict_first.b64 %rd552, 1.0; 2026-02-21T10:22:24.3321835Z // end inline asm 2026-02-21T10:22:24.3321898Z // begin inline asm 2026-02-21T10:22:24.3321957Z mov.u32 %r24807, 0x0; 2026-02-21T10:22:24.3322013Z mov.u32 %r24808, 0x0; 2026-02-21T10:22:24.3322075Z mov.u32 %r24809, 0x0; 2026-02-21T10:22:24.3322132Z mov.u32 %r24810, 0x0; 2026-02-21T10:22:24.3322354Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r24807, %r24808, %r24809, %r24810 }, [ %rd553 + 0 ], %rd552; 2026-02-21T10:22:24.3322412Z // end inline asm 2026-02-21T10:22:24.3322473Z // begin inline asm 2026-02-21T10:22:24.3322531Z mov.u64 %rd555, 0x0; 2026-02-21T10:22:24.3322659Z createpolicy.fractional.L2::evict_first.b64 %rd555, 1.0; 2026-02-21T10:22:24.3322776Z // end inline asm 2026-02-21T10:22:24.3322837Z // begin inline asm 2026-02-21T10:22:24.3322897Z mov.u32 %r24811, 0x0; 2026-02-21T10:22:24.3322958Z mov.u32 %r24812, 0x0; 2026-02-21T10:22:24.3323015Z mov.u32 %r24813, 0x0; 2026-02-21T10:22:24.3323072Z mov.u32 %r24814, 0x0; 2026-02-21T10:22:24.3323293Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r24811, %r24812, %r24813, %r24814 }, [ %rd556 + 0 ], %rd555; 2026-02-21T10:22:24.3323356Z // end inline asm 
2026-02-21T10:22:24.3323415Z // begin inline asm 2026-02-21T10:22:24.3323478Z mov.u64 %rd558, 0x0; 2026-02-21T10:22:24.3323598Z createpolicy.fractional.L2::evict_first.b64 %rd558, 1.0; 2026-02-21T10:22:24.3323655Z // end inline asm 2026-02-21T10:22:24.3323716Z // begin inline asm 2026-02-21T10:22:24.3323773Z mov.u32 %r24815, 0x0; 2026-02-21T10:22:24.3323845Z mov.u32 %r24816, 0x0; 2026-02-21T10:22:24.3323904Z mov.u32 %r24817, 0x0; 2026-02-21T10:22:24.3323960Z mov.u32 %r24818, 0x0; 2026-02-21T10:22:24.3324245Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r24815, %r24816, %r24817, %r24818 }, [ %rd559 + 0 ], %rd558; 2026-02-21T10:22:24.3324304Z // end inline asm 2026-02-21T10:22:24.3324362Z // begin inline asm 2026-02-21T10:22:24.3324424Z mov.u64 %rd561, 0x0; 2026-02-21T10:22:24.3324540Z createpolicy.fractional.L2::evict_first.b64 %rd561, 1.0; 2026-02-21T10:22:24.3324642Z // end inline asm 2026-02-21T10:22:24.3324700Z // begin inline asm 2026-02-21T10:22:24.3324761Z mov.u32 %r24819, 0x0; 2026-02-21T10:22:24.3324818Z mov.u32 %r24820, 0x0; 2026-02-21T10:22:24.3324878Z mov.u32 %r24821, 0x0; 2026-02-21T10:22:24.3324940Z mov.u32 %r24822, 0x0; 2026-02-21T10:22:24.3325160Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r24819, %r24820, %r24821, %r24822 }, [ %rd562 + 0 ], %rd561; 2026-02-21T10:22:24.3325220Z // end inline asm 2026-02-21T10:22:24.3325424Z .loc 1 62 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:62:32 2026-02-21T10:22:24.3325482Z bar.sync 0; 2026-02-21T10:22:24.3325581Z st.shared.v2.b32 [%r10], {%r24791, %r24792}; 2026-02-21T10:22:24.3325677Z st.shared.v2.b32 [%r10+2048], {%r24795, %r24796}; 2026-02-21T10:22:24.3325766Z st.shared.v2.b32 [%r10+4096], {%r24799, %r24800}; 2026-02-21T10:22:24.3325852Z st.shared.v2.b32 [%r10+6144], {%r24803, %r24804}; 2026-02-21T10:22:24.3325935Z st.shared.v2.b32 [%r10+8192], {%r24807, %r24808}; 2026-02-21T10:22:24.3326041Z st.shared.v2.b32 [%r10+10240], {%r24811, %r24812}; 2026-02-21T10:22:24.3326130Z 
st.shared.v2.b32 [%r10+12288], {%r24815, %r24816}; 2026-02-21T10:22:24.3326273Z st.shared.v2.b32 [%r10+14336], {%r24819, %r24820}; 2026-02-21T10:22:24.3326360Z st.shared.v2.b32 [%r11], {%r24793, %r24794}; 2026-02-21T10:22:24.3326560Z st.shared.v2.b32 [%r11+2048], {%r24797, %r24798}; 2026-02-21T10:22:24.3326652Z st.shared.v2.b32 [%r11+4096], {%r24801, %r24802}; 2026-02-21T10:22:24.3326735Z st.shared.v2.b32 [%r11+6144], {%r24805, %r24806}; 2026-02-21T10:22:24.3329261Z st.shared.v2.b32 [%r11+8192], {%r24809, %r24810}; 2026-02-21T10:22:24.3329389Z st.shared.v2.b32 [%r11+10240], {%r24813, %r24814}; 2026-02-21T10:22:24.3329488Z st.shared.v2.b32 [%r11+12288], {%r24817, %r24818}; 2026-02-21T10:22:24.3329581Z st.shared.v2.b32 [%r11+14336], {%r24821, %r24822}; 2026-02-21T10:22:24.3329638Z bar.sync 0; 2026-02-21T10:22:24.3329708Z ld.shared.b16 %rs2017, [%r52]; 2026-02-21T10:22:24.3329794Z ld.shared.b16 %rs2018, [%r52+1024]; 2026-02-21T10:22:24.3329863Z ld.shared.b16 %rs2019, [%r52+64]; 2026-02-21T10:22:24.3329929Z ld.shared.b16 %rs2020, [%r52+1088]; 2026-02-21T10:22:24.3329994Z ld.shared.b16 %rs2021, [%r52+8192]; 2026-02-21T10:22:24.3330059Z ld.shared.b16 %rs2022, [%r52+9216]; 2026-02-21T10:22:24.3330121Z ld.shared.b16 %rs2023, [%r52+8256]; 2026-02-21T10:22:24.3330183Z ld.shared.b16 %rs2024, [%r52+9280]; 2026-02-21T10:22:24.3330250Z ld.shared.b16 %rs2025, [%r53]; 2026-02-21T10:22:24.3330313Z ld.shared.b16 %rs2026, [%r53+1024]; 2026-02-21T10:22:24.3330376Z ld.shared.b16 %rs2027, [%r53+64]; 2026-02-21T10:22:24.3330560Z ld.shared.b16 %rs2028, [%r53+1088]; 2026-02-21T10:22:24.3330625Z ld.shared.b16 %rs2029, [%r53+8192]; 2026-02-21T10:22:24.3330687Z ld.shared.b16 %rs2030, [%r53+9216]; 2026-02-21T10:22:24.3330749Z ld.shared.b16 %rs2031, [%r53+8256]; 2026-02-21T10:22:24.3330816Z ld.shared.b16 %rs2032, [%r53+9280]; 2026-02-21T10:22:24.3330882Z ld.shared.b16 %rs2033, [%r54]; 2026-02-21T10:22:24.3330944Z ld.shared.b16 %rs2034, [%r54+1024]; 2026-02-21T10:22:24.3331009Z ld.shared.b16 
%rs2035, [%r54+64]; 2026-02-21T10:22:24.3331079Z ld.shared.b16 %rs2036, [%r54+1088]; 2026-02-21T10:22:24.3331141Z ld.shared.b16 %rs2037, [%r54+8192]; 2026-02-21T10:22:24.3331203Z ld.shared.b16 %rs2038, [%r54+9216]; 2026-02-21T10:22:24.3331267Z ld.shared.b16 %rs2039, [%r54+8256]; 2026-02-21T10:22:24.3331328Z ld.shared.b16 %rs2040, [%r54+9280]; 2026-02-21T10:22:24.3331390Z ld.shared.b16 %rs2041, [%r55]; 2026-02-21T10:22:24.3331454Z ld.shared.b16 %rs2042, [%r55+1024]; 2026-02-21T10:22:24.3331520Z ld.shared.b16 %rs2043, [%r55+64]; 2026-02-21T10:22:24.3331654Z ld.shared.b16 %rs2044, [%r55+1088]; 2026-02-21T10:22:24.3331732Z ld.shared.b16 %rs2045, [%r55+8192]; 2026-02-21T10:22:24.3331799Z ld.shared.b16 %rs2046, [%r55+9216]; 2026-02-21T10:22:24.3331861Z ld.shared.b16 %rs2047, [%r55+8256]; 2026-02-21T10:22:24.3331924Z ld.shared.b16 %rs2048, [%r55+9280]; 2026-02-21T10:22:24.3332052Z ld.shared.b16 %rs2049, [%r56]; 2026-02-21T10:22:24.3332115Z ld.shared.b16 %rs2050, [%r56+1024]; 2026-02-21T10:22:24.3332179Z ld.shared.b16 %rs2051, [%r56+64]; 2026-02-21T10:22:24.3332241Z ld.shared.b16 %rs2052, [%r56+1088]; 2026-02-21T10:22:24.3332316Z ld.shared.b16 %rs2053, [%r56+8192]; 2026-02-21T10:22:24.3332380Z ld.shared.b16 %rs2054, [%r56+9216]; 2026-02-21T10:22:24.3332441Z ld.shared.b16 %rs2055, [%r56+8256]; 2026-02-21T10:22:24.3332504Z ld.shared.b16 %rs2056, [%r56+9280]; 2026-02-21T10:22:24.3332564Z ld.shared.b16 %rs2057, [%r57]; 2026-02-21T10:22:24.3332627Z ld.shared.b16 %rs2058, [%r57+1024]; 2026-02-21T10:22:24.3332694Z ld.shared.b16 %rs2059, [%r57+64]; 2026-02-21T10:22:24.3332756Z ld.shared.b16 %rs2060, [%r57+1088]; 2026-02-21T10:22:24.3332817Z ld.shared.b16 %rs2061, [%r57+8192]; 2026-02-21T10:22:24.3332880Z ld.shared.b16 %rs2062, [%r57+9216]; 2026-02-21T10:22:24.3332950Z ld.shared.b16 %rs2063, [%r57+8256]; 2026-02-21T10:22:24.3333018Z ld.shared.b16 %rs2064, [%r57+9280]; 2026-02-21T10:22:24.3333079Z ld.shared.b16 %rs2065, [%r58]; 2026-02-21T10:22:24.3333145Z ld.shared.b16 %rs2066, 
[%r58+1024]; 2026-02-21T10:22:24.3333278Z ld.shared.b16 %rs2067, [%r58+64]; 2026-02-21T10:22:24.3333343Z ld.shared.b16 %rs2068, [%r58+1088]; 2026-02-21T10:22:24.3333405Z ld.shared.b16 %rs2069, [%r58+8192]; 2026-02-21T10:22:24.3333471Z ld.shared.b16 %rs2070, [%r58+9216]; 2026-02-21T10:22:24.3333532Z ld.shared.b16 %rs2071, [%r58+8256]; 2026-02-21T10:22:24.3333594Z ld.shared.b16 %rs2072, [%r58+9280]; 2026-02-21T10:22:24.3333659Z ld.shared.b16 %rs2073, [%r59]; 2026-02-21T10:22:24.3333724Z ld.shared.b16 %rs2074, [%r59+1024]; 2026-02-21T10:22:24.3333786Z ld.shared.b16 %rs2075, [%r59+64]; 2026-02-21T10:22:24.3333853Z ld.shared.b16 %rs2076, [%r59+1088]; 2026-02-21T10:22:24.3333915Z ld.shared.b16 %rs2077, [%r59+8192]; 2026-02-21T10:22:24.3333977Z ld.shared.b16 %rs2078, [%r59+9216]; 2026-02-21T10:22:24.3334041Z ld.shared.b16 %rs2079, [%r59+8256]; 2026-02-21T10:22:24.3334108Z ld.shared.b16 %rs2080, [%r59+9280]; 2026-02-21T10:22:24.3334167Z cvt.f32.bf16 %r24960, %rs2017; 2026-02-21T10:22:24.3334238Z cvt.f32.bf16 %r24961, %rs2018; 2026-02-21T10:22:24.3334304Z cvt.f32.bf16 %r24962, %rs2025; 2026-02-21T10:22:24.3334363Z cvt.f32.bf16 %r24963, %rs2026; 2026-02-21T10:22:24.3334422Z cvt.f32.bf16 %r25092, %rs2033; 2026-02-21T10:22:24.3334479Z cvt.f32.bf16 %r25093, %rs2034; 2026-02-21T10:22:24.3334538Z cvt.f32.bf16 %r25094, %rs2041; 2026-02-21T10:22:24.3334597Z cvt.f32.bf16 %r25095, %rs2042; 2026-02-21T10:22:24.3334656Z cvt.f32.bf16 %r25224, %rs2049; 2026-02-21T10:22:24.3334775Z cvt.f32.bf16 %r25225, %rs2050; 2026-02-21T10:22:24.3334836Z cvt.f32.bf16 %r25226, %rs2057; 2026-02-21T10:22:24.3334893Z cvt.f32.bf16 %r25227, %rs2058; 2026-02-21T10:22:24.3334953Z cvt.f32.bf16 %r25356, %rs2065; 2026-02-21T10:22:24.3335012Z cvt.f32.bf16 %r25357, %rs2066; 2026-02-21T10:22:24.3335070Z cvt.f32.bf16 %r25358, %rs2073; 2026-02-21T10:22:24.3335131Z cvt.f32.bf16 %r25359, %rs2074; 2026-02-21T10:22:24.3335194Z cvt.f32.bf16 %r25488, %rs2019; 2026-02-21T10:22:24.3335251Z cvt.f32.bf16 %r25489, %rs2020; 
2026-02-21T10:22:24.3335312Z cvt.f32.bf16 %r25490, %rs2027; 2026-02-21T10:22:24.3335371Z cvt.f32.bf16 %r25491, %rs2028; 2026-02-21T10:22:24.3335428Z cvt.f32.bf16 %r25620, %rs2035; 2026-02-21T10:22:24.3335485Z cvt.f32.bf16 %r25621, %rs2036; 2026-02-21T10:22:24.3335542Z cvt.f32.bf16 %r25622, %rs2043; 2026-02-21T10:22:24.3335603Z cvt.f32.bf16 %r25623, %rs2044; 2026-02-21T10:22:24.3335660Z cvt.f32.bf16 %r25752, %rs2051; 2026-02-21T10:22:24.3335716Z cvt.f32.bf16 %r25753, %rs2052; 2026-02-21T10:22:24.3335858Z cvt.f32.bf16 %r25754, %rs2059; 2026-02-21T10:22:24.3335921Z cvt.f32.bf16 %r25755, %rs2060; 2026-02-21T10:22:24.3335978Z cvt.f32.bf16 %r25884, %rs2067; 2026-02-21T10:22:24.3336038Z cvt.f32.bf16 %r25885, %rs2068; 2026-02-21T10:22:24.3336102Z cvt.f32.bf16 %r25886, %rs2075; 2026-02-21T10:22:24.3336159Z cvt.f32.bf16 %r25887, %rs2076; 2026-02-21T10:22:24.3336272Z cvt.f32.bf16 %r26016, %rs2021; 2026-02-21T10:22:24.3336332Z cvt.f32.bf16 %r26017, %rs2022; 2026-02-21T10:22:24.3336389Z cvt.f32.bf16 %r26018, %rs2029; 2026-02-21T10:22:24.3336605Z cvt.f32.bf16 %r26019, %rs2030; 2026-02-21T10:22:24.3336673Z cvt.f32.bf16 %r26148, %rs2037; 2026-02-21T10:22:24.3336735Z cvt.f32.bf16 %r26149, %rs2038; 2026-02-21T10:22:24.3336794Z cvt.f32.bf16 %r26150, %rs2045; 2026-02-21T10:22:24.3336852Z cvt.f32.bf16 %r26151, %rs2046; 2026-02-21T10:22:24.3336923Z cvt.f32.bf16 %r26280, %rs2053; 2026-02-21T10:22:24.3336983Z cvt.f32.bf16 %r26281, %rs2054; 2026-02-21T10:22:24.3337044Z cvt.f32.bf16 %r26282, %rs2061; 2026-02-21T10:22:24.3337107Z cvt.f32.bf16 %r26283, %rs2062; 2026-02-21T10:22:24.3337164Z cvt.f32.bf16 %r26412, %rs2069; 2026-02-21T10:22:24.3337221Z cvt.f32.bf16 %r26413, %rs2070; 2026-02-21T10:22:24.3337278Z cvt.f32.bf16 %r26414, %rs2077; 2026-02-21T10:22:24.3337338Z cvt.f32.bf16 %r26415, %rs2078; 2026-02-21T10:22:24.3337398Z cvt.f32.bf16 %r26544, %rs2023; 2026-02-21T10:22:24.3337456Z cvt.f32.bf16 %r26545, %rs2024; 2026-02-21T10:22:24.3337516Z cvt.f32.bf16 %r26546, %rs2031; 
2026-02-21T10:22:24.3337659Z cvt.f32.bf16 %r26547, %rs2032; 2026-02-21T10:22:24.3337725Z cvt.f32.bf16 %r26676, %rs2039; 2026-02-21T10:22:24.3337784Z cvt.f32.bf16 %r26677, %rs2040; 2026-02-21T10:22:24.3337843Z cvt.f32.bf16 %r26678, %rs2047; 2026-02-21T10:22:24.3337903Z cvt.f32.bf16 %r26679, %rs2048; 2026-02-21T10:22:24.3337961Z cvt.f32.bf16 %r26808, %rs2055; 2026-02-21T10:22:24.3338022Z cvt.f32.bf16 %r26809, %rs2056; 2026-02-21T10:22:24.3338080Z cvt.f32.bf16 %r26810, %rs2063; 2026-02-21T10:22:24.3338143Z cvt.f32.bf16 %r26811, %rs2064; 2026-02-21T10:22:24.3338201Z cvt.f32.bf16 %r26940, %rs2071; 2026-02-21T10:22:24.3338260Z cvt.f32.bf16 %r26941, %rs2072; 2026-02-21T10:22:24.3338317Z cvt.f32.bf16 %r26942, %rs2079; 2026-02-21T10:22:24.3338374Z cvt.f32.bf16 %r26943, %rs2080; 2026-02-21T10:22:24.3338599Z .loc 1 64 33 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:64:33 2026-02-21T10:22:24.3338658Z bar.sync 0; 2026-02-21T10:22:24.3338716Z // begin inline asm 2026-02-21T10:22:24.3338827Z @%p313 mbarrier.init.shared::cta.b64 [%r29850], 1; 2026-02-21T10:22:24.3338882Z // end inline asm 2026-02-21T10:22:24.3338934Z bar.sync 0; 2026-02-21T10:22:24.3338989Z // begin inline asm 2026-02-21T10:22:24.3339129Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r29850], 4096; 2026-02-21T10:22:24.3339184Z // end inline asm 2026-02-21T10:22:24.3339239Z // begin inline asm 2026-02-21T10:22:24.3339315Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.3339443Z // end inline asm 2026-02-21T10:22:24.3339496Z bar.sync 0; 2026-02-21T10:22:24.3339563Z elect.sync %r29687|%p282, -1; 2026-02-21T10:22:24.3339633Z and.pred %p242, %p1, %p282; 2026-02-21T10:22:24.3339693Z cvt.u32.u64 %r29688, %rd849; 2026-02-21T10:22:24.3339753Z add.s32 %r24827, %r29688, 128; 2026-02-21T10:22:24.3339810Z // begin inline asm 2026-02-21T10:22:24.3340150Z @%p242 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39936], [%rd822, {%r29853, %r24827}], [%r29850]; 
2026-02-21T10:22:24.3340206Z // end inline asm 2026-02-21T10:22:24.3340262Z bar.sync 0; 2026-02-21T10:22:24.3340328Z // begin inline asm 2026-02-21T10:22:24.3340382Z 2026-02-21T10:22:24.3340430Z { 2026-02-21T10:22:24.3340495Z .reg .pred complete; 2026-02-21T10:22:24.3340548Z waitLoop: 2026-02-21T10:22:24.3340693Z mbarrier.try_wait.parity.shared.b64 complete, [%r29850], %r29488; 2026-02-21T10:22:24.3340763Z @!complete bra.uni waitLoop; 2026-02-21T10:22:24.3340812Z } 2026-02-21T10:22:24.3340820Z 2026-02-21T10:22:24.3340945Z // end inline asm 2026-02-21T10:22:24.3341001Z bar.sync 0; 2026-02-21T10:22:24.3341059Z // begin inline asm 2026-02-21T10:22:24.3341154Z @%p313 mbarrier.inval.shared::cta.b64 [%r29850]; 2026-02-21T10:22:24.3341209Z // end inline asm 2026-02-21T10:22:24.3341415Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3341549Z ld.shared.s8 %rs2081, [%r20]; 2026-02-21T10:22:24.3341744Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3341807Z shl.b16 %rs2082, %rs2081, 4; 2026-02-21T10:22:24.3341995Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3342064Z ld.shared.s8 %rs2083, [%r21+128]; 2026-02-21T10:22:24.3342251Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3342316Z shl.b16 %rs2084, %rs2083, 4; 2026-02-21T10:22:24.3342507Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3342569Z ld.shared.s8 %rs2085, [%r22+256]; 2026-02-21T10:22:24.3342760Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3342835Z shl.b16 %rs2086, %rs2085, 4; 2026-02-21T10:22:24.3343038Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3343157Z ld.shared.s8 %rs2087, [%r23+384]; 2026-02-21T10:22:24.3343353Z .loc 1 67 28 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3343412Z shl.b16 %rs2088, %rs2087, 4; 2026-02-21T10:22:24.3343604Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3343667Z ld.shared.s8 %rs2089, [%r24+512]; 2026-02-21T10:22:24.3343858Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3343916Z shl.b16 %rs2090, %rs2089, 4; 2026-02-21T10:22:24.3344114Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3344178Z ld.shared.s8 %rs2091, [%r25+640]; 2026-02-21T10:22:24.3344378Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3344443Z shl.b16 %rs2092, %rs2091, 4; 2026-02-21T10:22:24.3344635Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3344697Z ld.shared.s8 %rs2093, [%r26+768]; 2026-02-21T10:22:24.3344887Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3344945Z shl.b16 %rs2094, %rs2093, 4; 2026-02-21T10:22:24.3345133Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3345253Z ld.shared.s8 %rs2095, [%r27+896]; 2026-02-21T10:22:24.3345438Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3345495Z shl.b16 %rs2096, %rs2095, 4; 2026-02-21T10:22:24.3345682Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3345749Z ld.shared.s8 %rs2097, [%r20+1024]; 2026-02-21T10:22:24.3345939Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3345997Z shl.b16 %rs2098, %rs2097, 4; 2026-02-21T10:22:24.3346191Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3346254Z ld.shared.s8 %rs2099, [%r21+1152]; 2026-02-21T10:22:24.3346440Z 
.loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3346714Z shl.b16 %rs2100, %rs2099, 4; 2026-02-21T10:22:24.3346907Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3346970Z ld.shared.s8 %rs2101, [%r22+1280]; 2026-02-21T10:22:24.3347159Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3347285Z shl.b16 %rs2102, %rs2101, 4; 2026-02-21T10:22:24.3347471Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3347535Z ld.shared.s8 %rs2103, [%r23+1408]; 2026-02-21T10:22:24.3347718Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3347776Z shl.b16 %rs2104, %rs2103, 4; 2026-02-21T10:22:24.3347960Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3348026Z ld.shared.s8 %rs2105, [%r24+1536]; 2026-02-21T10:22:24.3348214Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3348285Z shl.b16 %rs2106, %rs2105, 4; 2026-02-21T10:22:24.3348554Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3348622Z ld.shared.s8 %rs2107, [%r25+1664]; 2026-02-21T10:22:24.3348808Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3348941Z shl.b16 %rs2108, %rs2107, 4; 2026-02-21T10:22:24.3349129Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3349191Z ld.shared.s8 %rs2109, [%r26+1792]; 2026-02-21T10:22:24.3349379Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3349438Z shl.b16 %rs2110, %rs2109, 4; 2026-02-21T10:22:24.3349630Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3349691Z ld.shared.s8 %rs2111, [%r27+1920]; 
2026-02-21T10:22:24.3349877Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3349935Z shl.b16 %rs2112, %rs2111, 4; 2026-02-21T10:22:24.3350120Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3350187Z ld.shared.s8 %rs2113, [%r20+2048]; 2026-02-21T10:22:24.3350373Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3350432Z shl.b16 %rs2114, %rs2113, 4; 2026-02-21T10:22:24.3350618Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3350679Z ld.shared.s8 %rs2115, [%r21+2176]; 2026-02-21T10:22:24.3350866Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3351007Z shl.b16 %rs2116, %rs2115, 4; 2026-02-21T10:22:24.3351200Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3351262Z ld.shared.s8 %rs2117, [%r22+2304]; 2026-02-21T10:22:24.3351445Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3351509Z shl.b16 %rs2118, %rs2117, 4; 2026-02-21T10:22:24.3351693Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3351755Z ld.shared.s8 %rs2119, [%r23+2432]; 2026-02-21T10:22:24.3351941Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3351999Z shl.b16 %rs2120, %rs2119, 4; 2026-02-21T10:22:24.3352184Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3352300Z ld.shared.s8 %rs2121, [%r24+2560]; 2026-02-21T10:22:24.3352488Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3352546Z shl.b16 %rs2122, %rs2121, 4; 2026-02-21T10:22:24.3352733Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3352839Z ld.shared.s8 
%rs2123, [%r25+2688]; 2026-02-21T10:22:24.3353025Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3353083Z shl.b16 %rs2124, %rs2123, 4; 2026-02-21T10:22:24.3353269Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3353329Z ld.shared.s8 %rs2125, [%r26+2816]; 2026-02-21T10:22:24.3353513Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3353584Z shl.b16 %rs2126, %rs2125, 4; 2026-02-21T10:22:24.3353778Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3353840Z ld.shared.s8 %rs2127, [%r27+2944]; 2026-02-21T10:22:24.3354026Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3354087Z shl.b16 %rs2128, %rs2127, 4; 2026-02-21T10:22:24.3354279Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3354400Z ld.shared.s8 %rs2129, [%r20+3072]; 2026-02-21T10:22:24.3354596Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3354656Z shl.b16 %rs2130, %rs2129, 4; 2026-02-21T10:22:24.3354847Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3354909Z ld.shared.s8 %rs2131, [%r21+3200]; 2026-02-21T10:22:24.3355102Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3355161Z shl.b16 %rs2132, %rs2131, 4; 2026-02-21T10:22:24.3355349Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3355422Z ld.shared.s8 %rs2133, [%r22+3328]; 2026-02-21T10:22:24.3355612Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3355676Z shl.b16 %rs2134, %rs2133, 4; 2026-02-21T10:22:24.3355861Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 
2026-02-21T10:22:24.3355922Z ld.shared.s8 %rs2135, [%r23+3456]; 2026-02-21T10:22:24.3356110Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3356168Z shl.b16 %rs2136, %rs2135, 4; 2026-02-21T10:22:24.3356358Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3356624Z ld.shared.s8 %rs2137, [%r24+3584]; 2026-02-21T10:22:24.3356814Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3356872Z shl.b16 %rs2138, %rs2137, 4; 2026-02-21T10:22:24.3357056Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3357123Z ld.shared.s8 %rs2139, [%r25+3712]; 2026-02-21T10:22:24.3357310Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3357368Z shl.b16 %rs2140, %rs2139, 4; 2026-02-21T10:22:24.3357558Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3357621Z ld.shared.s8 %rs2141, [%r26+3840]; 2026-02-21T10:22:24.3357806Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3357946Z shl.b16 %rs2142, %rs2141, 4; 2026-02-21T10:22:24.3358136Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3358197Z ld.shared.s8 %rs2143, [%r27+3968]; 2026-02-21T10:22:24.3358397Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3358531Z shl.b16 %rs2144, %rs2143, 4; 2026-02-21T10:22:24.3358721Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3358782Z cvt.s16.s8 %rs2145, %rs2082; 2026-02-21T10:22:24.3358845Z shr.s16 %rs2146, %rs2145, 4; 2026-02-21T10:22:24.3358903Z cvt.s16.s8 %rs2147, %rs2084; 2026-02-21T10:22:24.3358960Z shr.s16 %rs2148, %rs2147, 4; 2026-02-21T10:22:24.3359022Z shr.s16 %rs2149, %rs2081, 4; 2026-02-21T10:22:24.3359079Z 
shr.s16 %rs2150, %rs2083, 4; 2026-02-21T10:22:24.3359269Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3359342Z cvt.rn.f32.s16 %r29689, %rs2150; 2026-02-21T10:22:24.3359416Z cvt.rn.f32.s16 %r29690, %rs2149; 2026-02-21T10:22:24.3359481Z cvt.rn.f32.s16 %r29691, %rs2148; 2026-02-21T10:22:24.3359542Z cvt.rn.f32.s16 %r29692, %rs2146; 2026-02-21T10:22:24.3359735Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3359799Z cvt.s16.s8 %rs2151, %rs2086; 2026-02-21T10:22:24.3359925Z shr.s16 %rs2152, %rs2151, 4; 2026-02-21T10:22:24.3359989Z cvt.s16.s8 %rs2153, %rs2088; 2026-02-21T10:22:24.3360048Z shr.s16 %rs2154, %rs2153, 4; 2026-02-21T10:22:24.3360106Z shr.s16 %rs2155, %rs2085, 4; 2026-02-21T10:22:24.3360164Z shr.s16 %rs2156, %rs2087, 4; 2026-02-21T10:22:24.3360363Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3360424Z cvt.rn.f32.s16 %r29693, %rs2156; 2026-02-21T10:22:24.3360487Z cvt.rn.f32.s16 %r29694, %rs2155; 2026-02-21T10:22:24.3360553Z cvt.rn.f32.s16 %r29695, %rs2154; 2026-02-21T10:22:24.3360614Z cvt.rn.f32.s16 %r29696, %rs2152; 2026-02-21T10:22:24.3360801Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3360862Z cvt.s16.s8 %rs2157, %rs2090; 2026-02-21T10:22:24.3360922Z shr.s16 %rs2158, %rs2157, 4; 2026-02-21T10:22:24.3360980Z cvt.s16.s8 %rs2159, %rs2092; 2026-02-21T10:22:24.3361038Z shr.s16 %rs2160, %rs2159, 4; 2026-02-21T10:22:24.3361101Z shr.s16 %rs2161, %rs2089, 4; 2026-02-21T10:22:24.3361160Z shr.s16 %rs2162, %rs2091, 4; 2026-02-21T10:22:24.3361347Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3361411Z cvt.rn.f32.s16 %r29697, %rs2162; 2026-02-21T10:22:24.3361473Z cvt.rn.f32.s16 %r29698, %rs2161; 2026-02-21T10:22:24.3361532Z cvt.rn.f32.s16 %r29699, %rs2160; 2026-02-21T10:22:24.3361662Z cvt.rn.f32.s16 %r29700, 
%rs2158; 2026-02-21T10:22:24.3361853Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3361911Z cvt.s16.s8 %rs2163, %rs2094; 2026-02-21T10:22:24.3361969Z shr.s16 %rs2164, %rs2163, 4; 2026-02-21T10:22:24.3362045Z cvt.s16.s8 %rs2165, %rs2096; 2026-02-21T10:22:24.3362106Z shr.s16 %rs2166, %rs2165, 4; 2026-02-21T10:22:24.3362163Z shr.s16 %rs2167, %rs2093, 4; 2026-02-21T10:22:24.3362224Z shr.s16 %rs2168, %rs2095, 4; 2026-02-21T10:22:24.3362416Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3362478Z cvt.rn.f32.s16 %r29701, %rs2168; 2026-02-21T10:22:24.3362540Z cvt.rn.f32.s16 %r29702, %rs2167; 2026-02-21T10:22:24.3362599Z cvt.rn.f32.s16 %r29703, %rs2166; 2026-02-21T10:22:24.3362660Z cvt.rn.f32.s16 %r29704, %rs2164; 2026-02-21T10:22:24.3362845Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3362963Z cvt.s16.s8 %rs2169, %rs2098; 2026-02-21T10:22:24.3363024Z shr.s16 %rs2170, %rs2169, 4; 2026-02-21T10:22:24.3363082Z cvt.s16.s8 %rs2171, %rs2100; 2026-02-21T10:22:24.3363141Z shr.s16 %rs2172, %rs2171, 4; 2026-02-21T10:22:24.3363198Z shr.s16 %rs2173, %rs2097, 4; 2026-02-21T10:22:24.3363255Z shr.s16 %rs2174, %rs2099, 4; 2026-02-21T10:22:24.3363489Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3363554Z cvt.rn.f32.s16 %r29705, %rs2174; 2026-02-21T10:22:24.3363615Z cvt.rn.f32.s16 %r29706, %rs2173; 2026-02-21T10:22:24.3363675Z cvt.rn.f32.s16 %r29707, %rs2172; 2026-02-21T10:22:24.3363739Z cvt.rn.f32.s16 %r29708, %rs2170; 2026-02-21T10:22:24.3363926Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3363987Z cvt.s16.s8 %rs2175, %rs2102; 2026-02-21T10:22:24.3364046Z shr.s16 %rs2176, %rs2175, 4; 2026-02-21T10:22:24.3364110Z cvt.s16.s8 %rs2177, %rs2104; 2026-02-21T10:22:24.3364169Z shr.s16 %rs2178, %rs2177, 4; 2026-02-21T10:22:24.3364228Z 
shr.s16 %rs2179, %rs2101, 4; 2026-02-21T10:22:24.3364289Z shr.s16 %rs2180, %rs2103, 4; 2026-02-21T10:22:24.3364476Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3364539Z cvt.rn.f32.s16 %r29709, %rs2180; 2026-02-21T10:22:24.3364614Z cvt.rn.f32.s16 %r29710, %rs2179; 2026-02-21T10:22:24.3364730Z cvt.rn.f32.s16 %r29711, %rs2178; 2026-02-21T10:22:24.3364792Z cvt.rn.f32.s16 %r29712, %rs2176; 2026-02-21T10:22:24.3364982Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3365042Z cvt.s16.s8 %rs2181, %rs2106; 2026-02-21T10:22:24.3365101Z shr.s16 %rs2182, %rs2181, 4; 2026-02-21T10:22:24.3365158Z cvt.s16.s8 %rs2183, %rs2108; 2026-02-21T10:22:24.3365219Z shr.s16 %rs2184, %rs2183, 4; 2026-02-21T10:22:24.3365281Z shr.s16 %rs2185, %rs2105, 4; 2026-02-21T10:22:24.3365341Z shr.s16 %rs2186, %rs2107, 4; 2026-02-21T10:22:24.3365540Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3365599Z cvt.rn.f32.s16 %r29713, %rs2186; 2026-02-21T10:22:24.3365659Z cvt.rn.f32.s16 %r29714, %rs2185; 2026-02-21T10:22:24.3365725Z cvt.rn.f32.s16 %r29715, %rs2184; 2026-02-21T10:22:24.3365786Z cvt.rn.f32.s16 %r29716, %rs2182; 2026-02-21T10:22:24.3365977Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3366043Z cvt.s16.s8 %rs2187, %rs2110; 2026-02-21T10:22:24.3366105Z shr.s16 %rs2188, %rs2187, 4; 2026-02-21T10:22:24.3366162Z cvt.s16.s8 %rs2189, %rs2112; 2026-02-21T10:22:24.3366220Z shr.s16 %rs2190, %rs2189, 4; 2026-02-21T10:22:24.3366281Z shr.s16 %rs2191, %rs2109, 4; 2026-02-21T10:22:24.3366340Z shr.s16 %rs2192, %rs2111, 4; 2026-02-21T10:22:24.3366652Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3366794Z cvt.rn.f32.s16 %r29717, %rs2192; 2026-02-21T10:22:24.3366858Z cvt.rn.f32.s16 %r29718, %rs2191; 2026-02-21T10:22:24.3366918Z cvt.rn.f32.s16 %r29719, %rs2190; 
2026-02-21T10:22:24.3366977Z cvt.rn.f32.s16 %r29720, %rs2188; 2026-02-21T10:22:24.3367173Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3367234Z cvt.s16.s8 %rs2193, %rs2114; 2026-02-21T10:22:24.3367294Z shr.s16 %rs2194, %rs2193, 4; 2026-02-21T10:22:24.3367358Z cvt.s16.s8 %rs2195, %rs2116; 2026-02-21T10:22:24.3367425Z shr.s16 %rs2196, %rs2195, 4; 2026-02-21T10:22:24.3367483Z shr.s16 %rs2197, %rs2113, 4; 2026-02-21T10:22:24.3367541Z shr.s16 %rs2198, %rs2115, 4; 2026-02-21T10:22:24.3367732Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3367792Z cvt.rn.f32.s16 %r29721, %rs2198; 2026-02-21T10:22:24.3367918Z cvt.rn.f32.s16 %r29722, %rs2197; 2026-02-21T10:22:24.3367983Z cvt.rn.f32.s16 %r29723, %rs2196; 2026-02-21T10:22:24.3368041Z cvt.rn.f32.s16 %r29724, %rs2194; 2026-02-21T10:22:24.3368228Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3368287Z cvt.s16.s8 %rs2199, %rs2118; 2026-02-21T10:22:24.3368408Z shr.s16 %rs2200, %rs2199, 4; 2026-02-21T10:22:24.3368464Z cvt.s16.s8 %rs2201, %rs2120; 2026-02-21T10:22:24.3368524Z shr.s16 %rs2202, %rs2201, 4; 2026-02-21T10:22:24.3368585Z shr.s16 %rs2203, %rs2117, 4; 2026-02-21T10:22:24.3368641Z shr.s16 %rs2204, %rs2119, 4; 2026-02-21T10:22:24.3368828Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3368890Z cvt.rn.f32.s16 %r29725, %rs2204; 2026-02-21T10:22:24.3368947Z cvt.rn.f32.s16 %r29726, %rs2203; 2026-02-21T10:22:24.3369005Z cvt.rn.f32.s16 %r29727, %rs2202; 2026-02-21T10:22:24.3369068Z cvt.rn.f32.s16 %r29728, %rs2200; 2026-02-21T10:22:24.3369257Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3369315Z cvt.s16.s8 %rs2205, %rs2122; 2026-02-21T10:22:24.3369385Z shr.s16 %rs2206, %rs2205, 4; 2026-02-21T10:22:24.3369448Z cvt.s16.s8 %rs2207, %rs2124; 2026-02-21T10:22:24.3369508Z shr.s16 
%rs2208, %rs2207, 4; 2026-02-21T10:22:24.3369566Z shr.s16 %rs2209, %rs2121, 4; 2026-02-21T10:22:24.3369625Z shr.s16 %rs2210, %rs2123, 4; 2026-02-21T10:22:24.3369883Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3369946Z cvt.rn.f32.s16 %r29729, %rs2210; 2026-02-21T10:22:24.3370006Z cvt.rn.f32.s16 %r29730, %rs2209; 2026-02-21T10:22:24.3370065Z cvt.rn.f32.s16 %r29731, %rs2208; 2026-02-21T10:22:24.3370122Z cvt.rn.f32.s16 %r29732, %rs2206; 2026-02-21T10:22:24.3370309Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3370375Z cvt.s16.s8 %rs2211, %rs2126; 2026-02-21T10:22:24.3370433Z shr.s16 %rs2212, %rs2211, 4; 2026-02-21T10:22:24.3370491Z cvt.s16.s8 %rs2213, %rs2128; 2026-02-21T10:22:24.3370549Z shr.s16 %rs2214, %rs2213, 4; 2026-02-21T10:22:24.3370605Z shr.s16 %rs2215, %rs2125, 4; 2026-02-21T10:22:24.3370667Z shr.s16 %rs2216, %rs2127, 4; 2026-02-21T10:22:24.3370853Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3370917Z cvt.rn.f32.s16 %r29733, %rs2216; 2026-02-21T10:22:24.3370977Z cvt.rn.f32.s16 %r29734, %rs2215; 2026-02-21T10:22:24.3371034Z cvt.rn.f32.s16 %r29735, %rs2214; 2026-02-21T10:22:24.3371094Z cvt.rn.f32.s16 %r29736, %rs2212; 2026-02-21T10:22:24.3371280Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3371339Z cvt.s16.s8 %rs2217, %rs2130; 2026-02-21T10:22:24.3371399Z shr.s16 %rs2218, %rs2217, 4; 2026-02-21T10:22:24.3371514Z cvt.s16.s8 %rs2219, %rs2132; 2026-02-21T10:22:24.3371571Z shr.s16 %rs2220, %rs2219, 4; 2026-02-21T10:22:24.3371628Z shr.s16 %rs2221, %rs2129, 4; 2026-02-21T10:22:24.3371686Z shr.s16 %rs2222, %rs2131, 4; 2026-02-21T10:22:24.3371872Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3371935Z cvt.rn.f32.s16 %r29737, %rs2222; 2026-02-21T10:22:24.3372003Z cvt.rn.f32.s16 %r29738, %rs2221; 
2026-02-21T10:22:24.3372069Z cvt.rn.f32.s16 %r29739, %rs2220; 2026-02-21T10:22:24.3372129Z cvt.rn.f32.s16 %r29740, %rs2218; 2026-02-21T10:22:24.3372317Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3372375Z cvt.s16.s8 %rs2223, %rs2134; 2026-02-21T10:22:24.3372433Z shr.s16 %rs2224, %rs2223, 4; 2026-02-21T10:22:24.3372491Z cvt.s16.s8 %rs2225, %rs2136; 2026-02-21T10:22:24.3372550Z shr.s16 %rs2226, %rs2225, 4; 2026-02-21T10:22:24.3372610Z shr.s16 %rs2227, %rs2133, 4; 2026-02-21T10:22:24.3372726Z shr.s16 %rs2228, %rs2135, 4; 2026-02-21T10:22:24.3372921Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3372981Z cvt.rn.f32.s16 %r29741, %rs2228; 2026-02-21T10:22:24.3373040Z cvt.rn.f32.s16 %r29742, %rs2227; 2026-02-21T10:22:24.3373146Z cvt.rn.f32.s16 %r29743, %rs2226; 2026-02-21T10:22:24.3373207Z cvt.rn.f32.s16 %r29744, %rs2224; 2026-02-21T10:22:24.3373396Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3373454Z cvt.s16.s8 %rs2229, %rs2138; 2026-02-21T10:22:24.3373515Z shr.s16 %rs2230, %rs2229, 4; 2026-02-21T10:22:24.3373572Z cvt.s16.s8 %rs2231, %rs2140; 2026-02-21T10:22:24.3373628Z shr.s16 %rs2232, %rs2231, 4; 2026-02-21T10:22:24.3373687Z shr.s16 %rs2233, %rs2137, 4; 2026-02-21T10:22:24.3373743Z shr.s16 %rs2234, %rs2139, 4; 2026-02-21T10:22:24.3373936Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3373998Z cvt.rn.f32.s16 %r29745, %rs2234; 2026-02-21T10:22:24.3374059Z cvt.rn.f32.s16 %r29746, %rs2233; 2026-02-21T10:22:24.3374117Z cvt.rn.f32.s16 %r29747, %rs2232; 2026-02-21T10:22:24.3374175Z cvt.rn.f32.s16 %r29748, %rs2230; 2026-02-21T10:22:24.3374365Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3374434Z cvt.s16.s8 %rs2235, %rs2142; 2026-02-21T10:22:24.3374574Z shr.s16 %rs2236, %rs2235, 4; 2026-02-21T10:22:24.3374638Z 
cvt.s16.s8 %rs2237, %rs2144; 2026-02-21T10:22:24.3374696Z shr.s16 %rs2238, %rs2237, 4; 2026-02-21T10:22:24.3374752Z shr.s16 %rs2239, %rs2141, 4; 2026-02-21T10:22:24.3374810Z shr.s16 %rs2240, %rs2143, 4; 2026-02-21T10:22:24.3374999Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3375060Z cvt.rn.f32.s16 %r29749, %rs2240; 2026-02-21T10:22:24.3375124Z cvt.rn.f32.s16 %r29750, %rs2239; 2026-02-21T10:22:24.3375185Z cvt.rn.f32.s16 %r29751, %rs2238; 2026-02-21T10:22:24.3375244Z cvt.rn.f32.s16 %r29752, %rs2236; 2026-02-21T10:22:24.3375297Z bar.sync 0; 2026-02-21T10:22:24.3375420Z st.shared.v4.b32 [%r28], {%r29692, %r29690, %r29691, %r29689}; 2026-02-21T10:22:24.3375550Z st.shared.v4.b32 [%r28+16384], {%r29724, %r29722, %r29723, %r29721}; 2026-02-21T10:22:24.3375658Z st.shared.v4.b32 [%r29], {%r29696, %r29694, %r29695, %r29693}; 2026-02-21T10:22:24.3375776Z st.shared.v4.b32 [%r29+16384], {%r29728, %r29726, %r29727, %r29725}; 2026-02-21T10:22:24.3375884Z st.shared.v4.b32 [%r30], {%r29700, %r29698, %r29699, %r29697}; 2026-02-21T10:22:24.3376005Z st.shared.v4.b32 [%r30+16384], {%r29732, %r29730, %r29731, %r29729}; 2026-02-21T10:22:24.3376109Z st.shared.v4.b32 [%r31], {%r29704, %r29702, %r29703, %r29701}; 2026-02-21T10:22:24.3376223Z st.shared.v4.b32 [%r31+16384], {%r29736, %r29734, %r29735, %r29733}; 2026-02-21T10:22:24.3376383Z st.shared.v4.b32 [%r32], {%r29708, %r29706, %r29707, %r29705}; 2026-02-21T10:22:24.3376614Z st.shared.v4.b32 [%r32+16384], {%r29740, %r29738, %r29739, %r29737}; 2026-02-21T10:22:24.3376727Z st.shared.v4.b32 [%r33], {%r29712, %r29710, %r29711, %r29709}; 2026-02-21T10:22:24.3376843Z st.shared.v4.b32 [%r33+16384], {%r29744, %r29742, %r29743, %r29741}; 2026-02-21T10:22:24.3376947Z st.shared.v4.b32 [%r34], {%r29716, %r29714, %r29715, %r29713}; 2026-02-21T10:22:24.3377063Z st.shared.v4.b32 [%r34+16384], {%r29748, %r29746, %r29747, %r29745}; 2026-02-21T10:22:24.3377167Z st.shared.v4.b32 [%r35], {%r29720, 
%r29718, %r29719, %r29717}; 2026-02-21T10:22:24.3377282Z st.shared.v4.b32 [%r35+16384], {%r29752, %r29750, %r29751, %r29749}; 2026-02-21T10:22:24.3377336Z $L__tmp19: 2026-02-21T10:22:24.3377608Z .loc 2 291 36 // standard.py:291:36 @[ c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:94:40 ] 2026-02-21T10:22:24.3377667Z // begin inline asm 2026-02-21T10:22:24.3377831Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.3377894Z // end inline asm 2026-02-21T10:22:24.3377947Z bar.sync 0; 2026-02-21T10:22:24.3378018Z wgmma.fence.sync.aligned; 2026-02-21T10:22:24.3378075Z // begin inline asm 2026-02-21T10:22:24.3379577Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r24960,%r24961,%r24962,%r24963}, %rd3, %p224, 1, 1; 2026-02-21T10:22:24.3379694Z // end inline asm 2026-02-21T10:22:24.3379754Z // begin inline asm 2026-02-21T10:22:24.3381293Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r25092,%r25093,%r25094,%r25095}, %rd4, %p224, 1, 1; 
2026-02-21T10:22:24.3381354Z // end inline asm 2026-02-21T10:22:24.3381413Z // begin inline asm 2026-02-21T10:22:24.3382893Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r25224,%r25225,%r25226,%r25227}, %rd5, %p224, 1, 1; 2026-02-21T10:22:24.3382954Z // end inline asm 2026-02-21T10:22:24.3383010Z // begin inline asm 2026-02-21T10:22:24.3384481Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r25356,%r25357,%r25358,%r25359}, %rd6, %p224, 1, 1; 2026-02-21T10:22:24.3384603Z // end inline asm 2026-02-21T10:22:24.3384659Z // begin inline asm 2026-02-21T10:22:24.3386133Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r25488,%r25489,%r25490,%r25491}, %rd7, %p224, 1, 1; 2026-02-21T10:22:24.3386191Z // end inline asm 2026-02-21T10:22:24.3386245Z // begin inline asm 2026-02-21T10:22:24.3387923Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r25620,%r25621,%r25622,%r25623}, %rd8, %p224, 1, 1; 2026-02-21T10:22:24.3388045Z // end inline asm 2026-02-21T10:22:24.3388101Z // begin inline asm 2026-02-21T10:22:24.3389658Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r25752,%r25753,%r25754,%r25755}, %rd9, %p224, 1, 1; 2026-02-21T10:22:24.3389719Z // end inline asm 2026-02-21T10:22:24.3389782Z // begin inline asm 2026-02-21T10:22:24.3391326Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r25884,%r25885,%r25886,%r25887}, %rd10, %p224, 1, 1; 2026-02-21T10:22:24.3391388Z // end inline asm 2026-02-21T10:22:24.3391446Z // begin inline asm 2026-02-21T10:22:24.3392929Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r26016,%r26017,%r26018,%r26019}, %rd3, %p224, 1, 1; 2026-02-21T10:22:24.3392989Z // end inline asm 2026-02-21T10:22:24.3393044Z // begin inline asm 2026-02-21T10:22:24.3394520Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r26148,%r26149,%r26150,%r26151}, %rd4, %p224, 1, 1; 2026-02-21T10:22:24.3394644Z // end inline asm 2026-02-21T10:22:24.3394703Z // begin inline asm 2026-02-21T10:22:24.3396240Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r26280,%r26281,%r26282,%r26283}, %rd5, %p224, 1, 1; 2026-02-21T10:22:24.3396302Z // end inline asm 2026-02-21T10:22:24.3396358Z // begin inline asm 2026-02-21T10:22:24.3397971Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r26412,%r26413,%r26414,%r26415}, %rd6, %p224, 1, 1; 2026-02-21T10:22:24.3398100Z // end inline asm 2026-02-21T10:22:24.3398159Z // begin inline asm 2026-02-21T10:22:24.3399703Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r26544,%r26545,%r26546,%r26547}, %rd7, %p224, 1, 1; 2026-02-21T10:22:24.3399762Z // end inline asm 2026-02-21T10:22:24.3399818Z // begin inline asm 2026-02-21T10:22:24.3401296Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r26676,%r26677,%r26678,%r26679}, %rd8, %p224, 1, 1; 2026-02-21T10:22:24.3401354Z // end inline asm 2026-02-21T10:22:24.3401418Z // begin inline asm 2026-02-21T10:22:24.3402904Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r26808,%r26809,%r26810,%r26811}, %rd9, %p224, 1, 1; 2026-02-21T10:22:24.3403025Z // end inline asm 2026-02-21T10:22:24.3403083Z // begin inline asm 2026-02-21T10:22:24.3404567Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r26940,%r26941,%r26942,%r26943}, %rd10, %p224, 1, 1; 2026-02-21T10:22:24.3404629Z // end inline asm 2026-02-21T10:22:24.3404705Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:24.3404767Z mov.b32 %r27073, %r29488; 2026-02-21T10:22:24.3404889Z mov.b32 %r27074, %r29488; 2026-02-21T10:22:24.3404950Z mov.b32 %r27072, %r39936; 2026-02-21T10:22:24.3405005Z // begin inline asm 2026-02-21T10:22:24.3407700Z // wait for regs: 
%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115,%r27072,%r27073,%r27074 2026-02-21T10:22:24.3407880Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:24.3407937Z // end inline asm 2026-02-21T10:22:24.3407993Z $L__tmp20: 2026-02-21T10:22:24.3408263Z .loc 1 58 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:32 2026-02-21T10:22:24.3408329Z add.s64 %rd582, %rd500, 256; 2026-02-21T10:22:24.3408392Z add.s64 %rd585, %rd503, 256; 2026-02-21T10:22:24.3408450Z add.s64 %rd588, %rd506, 256; 2026-02-21T10:22:24.3408508Z add.s64 %rd591, %rd509, 256; 2026-02-21T10:22:24.3408567Z add.s64 %rd594, %rd512, 256; 2026-02-21T10:22:24.3408628Z add.s64 %rd597, %rd515, 256; 2026-02-21T10:22:24.3408686Z add.s64 %rd600, %rd518, 256; 2026-02-21T10:22:24.3408763Z mad.wide.s32 %rd603, %r42987, 2, %rd117; 2026-02-21T10:22:24.3408973Z .loc 1 58 80 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:80 2026-02-21T10:22:24.3409035Z // begin inline asm 2026-02-21T10:22:24.3409092Z mov.u64 %rd581, 0x0; 
2026-02-21T10:22:24.3409223Z createpolicy.fractional.L2::evict_first.b64 %rd581, 1.0; 2026-02-21T10:22:24.3409282Z // end inline asm 2026-02-21T10:22:24.3409338Z // begin inline asm 2026-02-21T10:22:24.3409393Z mov.u32 %r27206, 0x0; 2026-02-21T10:22:24.3409450Z mov.u32 %r27207, 0x0; 2026-02-21T10:22:24.3409508Z mov.u32 %r27208, 0x0; 2026-02-21T10:22:24.3409563Z mov.u32 %r27209, 0x0; 2026-02-21T10:22:24.3409801Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r27206, %r27207, %r27208, %r27209 }, [ %rd582 + 0 ], %rd581; 2026-02-21T10:22:24.3409856Z // end inline asm 2026-02-21T10:22:24.3409911Z // begin inline asm 2026-02-21T10:22:24.3409966Z mov.u64 %rd584, 0x0; 2026-02-21T10:22:24.3410088Z createpolicy.fractional.L2::evict_first.b64 %rd584, 1.0; 2026-02-21T10:22:24.3410218Z // end inline asm 2026-02-21T10:22:24.3410274Z // begin inline asm 2026-02-21T10:22:24.3410335Z mov.u32 %r27210, 0x0; 2026-02-21T10:22:24.3410390Z mov.u32 %r27211, 0x0; 2026-02-21T10:22:24.3410443Z mov.u32 %r27212, 0x0; 2026-02-21T10:22:24.3410499Z mov.u32 %r27213, 0x0; 2026-02-21T10:22:24.3410727Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r27210, %r27211, %r27212, %r27213 }, [ %rd585 + 0 ], %rd584; 2026-02-21T10:22:24.3410787Z // end inline asm 2026-02-21T10:22:24.3410842Z // begin inline asm 2026-02-21T10:22:24.3410904Z mov.u64 %rd587, 0x0; 2026-02-21T10:22:24.3411023Z createpolicy.fractional.L2::evict_first.b64 %rd587, 1.0; 2026-02-21T10:22:24.3411078Z // end inline asm 2026-02-21T10:22:24.3411149Z // begin inline asm 2026-02-21T10:22:24.3411209Z mov.u32 %r27214, 0x0; 2026-02-21T10:22:24.3411263Z mov.u32 %r27215, 0x0; 2026-02-21T10:22:24.3411320Z mov.u32 %r27216, 0x0; 2026-02-21T10:22:24.3411374Z mov.u32 %r27217, 0x0; 2026-02-21T10:22:24.3411674Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r27214, %r27215, %r27216, %r27217 }, [ %rd588 + 0 ], %rd587; 2026-02-21T10:22:24.3411733Z // end inline asm 2026-02-21T10:22:24.3411793Z // begin inline asm 2026-02-21T10:22:24.3411848Z 
mov.u64 %rd590, 0x0; 2026-02-21T10:22:24.3411965Z createpolicy.fractional.L2::evict_first.b64 %rd590, 1.0; 2026-02-21T10:22:24.3412070Z // end inline asm 2026-02-21T10:22:24.3412126Z // begin inline asm 2026-02-21T10:22:24.3412179Z mov.u32 %r27218, 0x0; 2026-02-21T10:22:24.3412233Z mov.u32 %r27219, 0x0; 2026-02-21T10:22:24.3412290Z mov.u32 %r27220, 0x0; 2026-02-21T10:22:24.3412345Z mov.u32 %r27221, 0x0; 2026-02-21T10:22:24.3412564Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r27218, %r27219, %r27220, %r27221 }, [ %rd591 + 0 ], %rd590; 2026-02-21T10:22:24.3412622Z // end inline asm 2026-02-21T10:22:24.3412677Z // begin inline asm 2026-02-21T10:22:24.3412733Z mov.u64 %rd593, 0x0; 2026-02-21T10:22:24.3412849Z createpolicy.fractional.L2::evict_first.b64 %rd593, 1.0; 2026-02-21T10:22:24.3412908Z // end inline asm 2026-02-21T10:22:24.3412964Z // begin inline asm 2026-02-21T10:22:24.3413017Z mov.u32 %r27222, 0x0; 2026-02-21T10:22:24.3413075Z mov.u32 %r27223, 0x0; 2026-02-21T10:22:24.3413129Z mov.u32 %r27224, 0x0; 2026-02-21T10:22:24.3413182Z mov.u32 %r27225, 0x0; 2026-02-21T10:22:24.3413403Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r27222, %r27223, %r27224, %r27225 }, [ %rd594 + 0 ], %rd593; 2026-02-21T10:22:24.3413474Z // end inline asm 2026-02-21T10:22:24.3413586Z // begin inline asm 2026-02-21T10:22:24.3413648Z mov.u64 %rd596, 0x0; 2026-02-21T10:22:24.3413763Z createpolicy.fractional.L2::evict_first.b64 %rd596, 1.0; 2026-02-21T10:22:24.3413818Z // end inline asm 2026-02-21T10:22:24.3413873Z // begin inline asm 2026-02-21T10:22:24.3413930Z mov.u32 %r27226, 0x0; 2026-02-21T10:22:24.3413986Z mov.u32 %r27227, 0x0; 2026-02-21T10:22:24.3414041Z mov.u32 %r27228, 0x0; 2026-02-21T10:22:24.3414097Z mov.u32 %r27229, 0x0; 2026-02-21T10:22:24.3414319Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r27226, %r27227, %r27228, %r27229 }, [ %rd597 + 0 ], %rd596; 2026-02-21T10:22:24.3414375Z // end inline asm 2026-02-21T10:22:24.3414431Z // begin inline asm 
2026-02-21T10:22:24.3414490Z mov.u64 %rd599, 0x0; 2026-02-21T10:22:24.3414604Z createpolicy.fractional.L2::evict_first.b64 %rd599, 1.0; 2026-02-21T10:22:24.3414661Z // end inline asm 2026-02-21T10:22:24.3414718Z // begin inline asm 2026-02-21T10:22:24.3414774Z mov.u32 %r27230, 0x0; 2026-02-21T10:22:24.3414830Z mov.u32 %r27231, 0x0; 2026-02-21T10:22:24.3414887Z mov.u32 %r27232, 0x0; 2026-02-21T10:22:24.3414953Z mov.u32 %r27233, 0x0; 2026-02-21T10:22:24.3415174Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r27230, %r27231, %r27232, %r27233 }, [ %rd600 + 0 ], %rd599; 2026-02-21T10:22:24.3415228Z // end inline asm 2026-02-21T10:22:24.3415287Z // begin inline asm 2026-02-21T10:22:24.3415343Z mov.u64 %rd602, 0x0; 2026-02-21T10:22:24.3415457Z createpolicy.fractional.L2::evict_first.b64 %rd602, 1.0; 2026-02-21T10:22:24.3415578Z // end inline asm 2026-02-21T10:22:24.3415633Z // begin inline asm 2026-02-21T10:22:24.3415688Z mov.u32 %r27234, 0x0; 2026-02-21T10:22:24.3415743Z mov.u32 %r27235, 0x0; 2026-02-21T10:22:24.3415800Z mov.u32 %r27236, 0x0; 2026-02-21T10:22:24.3415853Z mov.u32 %r27237, 0x0; 2026-02-21T10:22:24.3416071Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r27234, %r27235, %r27236, %r27237 }, [ %rd603 + 0 ], %rd602; 2026-02-21T10:22:24.3416131Z // end inline asm 2026-02-21T10:22:24.3416330Z .loc 1 62 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:62:32 2026-02-21T10:22:24.3416384Z bar.sync 0; 2026-02-21T10:22:24.3416605Z st.shared.v2.b32 [%r10], {%r27206, %r27207}; 2026-02-21T10:22:24.3416702Z st.shared.v2.b32 [%r10+2048], {%r27210, %r27211}; 2026-02-21T10:22:24.3416787Z st.shared.v2.b32 [%r10+4096], {%r27214, %r27215}; 2026-02-21T10:22:24.3416872Z st.shared.v2.b32 [%r10+6144], {%r27218, %r27219}; 2026-02-21T10:22:24.3417040Z st.shared.v2.b32 [%r10+8192], {%r27222, %r27223}; 2026-02-21T10:22:24.3417137Z st.shared.v2.b32 [%r10+10240], {%r27226, %r27227}; 2026-02-21T10:22:24.3417223Z st.shared.v2.b32 [%r10+12288], {%r27230, %r27231}; 
2026-02-21T10:22:24.3417307Z st.shared.v2.b32 [%r10+14336], {%r27234, %r27235}; 2026-02-21T10:22:24.3417383Z st.shared.v2.b32 [%r11], {%r27208, %r27209}; 2026-02-21T10:22:24.3417529Z st.shared.v2.b32 [%r11+2048], {%r27212, %r27213}; 2026-02-21T10:22:24.3417613Z st.shared.v2.b32 [%r11+4096], {%r27216, %r27217}; 2026-02-21T10:22:24.3417697Z st.shared.v2.b32 [%r11+6144], {%r27220, %r27221}; 2026-02-21T10:22:24.3417779Z st.shared.v2.b32 [%r11+8192], {%r27224, %r27225}; 2026-02-21T10:22:24.3417865Z st.shared.v2.b32 [%r11+10240], {%r27228, %r27229}; 2026-02-21T10:22:24.3417947Z st.shared.v2.b32 [%r11+12288], {%r27232, %r27233}; 2026-02-21T10:22:24.3418029Z st.shared.v2.b32 [%r11+14336], {%r27236, %r27237}; 2026-02-21T10:22:24.3418081Z bar.sync 0; 2026-02-21T10:22:24.3418153Z ld.shared.b16 %rs2241, [%r52]; 2026-02-21T10:22:24.3418223Z ld.shared.b16 %rs2242, [%r52+1024]; 2026-02-21T10:22:24.3418289Z ld.shared.b16 %rs2243, [%r52+64]; 2026-02-21T10:22:24.3418356Z ld.shared.b16 %rs2244, [%r52+1088]; 2026-02-21T10:22:24.3418417Z ld.shared.b16 %rs2245, [%r52+8192]; 2026-02-21T10:22:24.3418480Z ld.shared.b16 %rs2246, [%r52+9216]; 2026-02-21T10:22:24.3418557Z ld.shared.b16 %rs2247, [%r52+8256]; 2026-02-21T10:22:24.3418623Z ld.shared.b16 %rs2248, [%r52+9280]; 2026-02-21T10:22:24.3418761Z ld.shared.b16 %rs2249, [%r53]; 2026-02-21T10:22:24.3418826Z ld.shared.b16 %rs2250, [%r53+1024]; 2026-02-21T10:22:24.3418891Z ld.shared.b16 %rs2251, [%r53+64]; 2026-02-21T10:22:24.3418953Z ld.shared.b16 %rs2252, [%r53+1088]; 2026-02-21T10:22:24.3419015Z ld.shared.b16 %rs2253, [%r53+8192]; 2026-02-21T10:22:24.3419083Z ld.shared.b16 %rs2254, [%r53+9216]; 2026-02-21T10:22:24.3419155Z ld.shared.b16 %rs2255, [%r53+8256]; 2026-02-21T10:22:24.3419220Z ld.shared.b16 %rs2256, [%r53+9280]; 2026-02-21T10:22:24.3419286Z ld.shared.b16 %rs2257, [%r54]; 2026-02-21T10:22:24.3419351Z ld.shared.b16 %rs2258, [%r54+1024]; 2026-02-21T10:22:24.3419413Z ld.shared.b16 %rs2259, [%r54+64]; 2026-02-21T10:22:24.3419474Z 
ld.shared.b16 %rs2260, [%r54+1088]; 2026-02-21T10:22:24.3419537Z ld.shared.b16 %rs2261, [%r54+8192]; 2026-02-21T10:22:24.3419601Z ld.shared.b16 %rs2262, [%r54+9216]; 2026-02-21T10:22:24.3419663Z ld.shared.b16 %rs2263, [%r54+8256]; 2026-02-21T10:22:24.3419727Z ld.shared.b16 %rs2264, [%r54+9280]; 2026-02-21T10:22:24.3419793Z ld.shared.b16 %rs2265, [%r55]; 2026-02-21T10:22:24.3419855Z ld.shared.b16 %rs2266, [%r55+1024]; 2026-02-21T10:22:24.3419916Z ld.shared.b16 %rs2267, [%r55+64]; 2026-02-21T10:22:24.3419981Z ld.shared.b16 %rs2268, [%r55+1088]; 2026-02-21T10:22:24.3420041Z ld.shared.b16 %rs2269, [%r55+8192]; 2026-02-21T10:22:24.3420101Z ld.shared.b16 %rs2270, [%r55+9216]; 2026-02-21T10:22:24.3420163Z ld.shared.b16 %rs2271, [%r55+8256]; 2026-02-21T10:22:24.3420298Z ld.shared.b16 %rs2272, [%r55+9280]; 2026-02-21T10:22:24.3420361Z ld.shared.b16 %rs2273, [%r56]; 2026-02-21T10:22:24.3420422Z ld.shared.b16 %rs2274, [%r56+1024]; 2026-02-21T10:22:24.3420485Z ld.shared.b16 %rs2275, [%r56+64]; 2026-02-21T10:22:24.3420547Z ld.shared.b16 %rs2276, [%r56+1088]; 2026-02-21T10:22:24.3420619Z ld.shared.b16 %rs2277, [%r56+8192]; 2026-02-21T10:22:24.3420689Z ld.shared.b16 %rs2278, [%r56+9216]; 2026-02-21T10:22:24.3420750Z ld.shared.b16 %rs2279, [%r56+8256]; 2026-02-21T10:22:24.3420815Z ld.shared.b16 %rs2280, [%r56+9280]; 2026-02-21T10:22:24.3420878Z ld.shared.b16 %rs2281, [%r57]; 2026-02-21T10:22:24.3420941Z ld.shared.b16 %rs2282, [%r57+1024]; 2026-02-21T10:22:24.3421003Z ld.shared.b16 %rs2283, [%r57+64]; 2026-02-21T10:22:24.3421066Z ld.shared.b16 %rs2284, [%r57+1088]; 2026-02-21T10:22:24.3421129Z ld.shared.b16 %rs2285, [%r57+8192]; 2026-02-21T10:22:24.3421192Z ld.shared.b16 %rs2286, [%r57+9216]; 2026-02-21T10:22:24.3421254Z ld.shared.b16 %rs2287, [%r57+8256]; 2026-02-21T10:22:24.3421369Z ld.shared.b16 %rs2288, [%r57+9280]; 2026-02-21T10:22:24.3421432Z ld.shared.b16 %rs2289, [%r58]; 2026-02-21T10:22:24.3421497Z ld.shared.b16 %rs2290, [%r58+1024]; 2026-02-21T10:22:24.3421557Z 
ld.shared.b16 %rs2291, [%r58+64]; 2026-02-21T10:22:24.3421636Z ld.shared.b16 %rs2292, [%r58+1088]; 2026-02-21T10:22:24.3421748Z ld.shared.b16 %rs2293, [%r58+8192]; 2026-02-21T10:22:24.3421810Z ld.shared.b16 %rs2294, [%r58+9216]; 2026-02-21T10:22:24.3421874Z ld.shared.b16 %rs2295, [%r58+8256]; 2026-02-21T10:22:24.3421938Z ld.shared.b16 %rs2296, [%r58+9280]; 2026-02-21T10:22:24.3422011Z ld.shared.b16 %rs2297, [%r59]; 2026-02-21T10:22:24.3422075Z ld.shared.b16 %rs2298, [%r59+1024]; 2026-02-21T10:22:24.3422142Z ld.shared.b16 %rs2299, [%r59+64]; 2026-02-21T10:22:24.3422205Z ld.shared.b16 %rs2300, [%r59+1088]; 2026-02-21T10:22:24.3422267Z ld.shared.b16 %rs2301, [%r59+8192]; 2026-02-21T10:22:24.3422333Z ld.shared.b16 %rs2302, [%r59+9216]; 2026-02-21T10:22:24.3422398Z ld.shared.b16 %rs2303, [%r59+8256]; 2026-02-21T10:22:24.3422462Z ld.shared.b16 %rs2304, [%r59+9280]; 2026-02-21T10:22:24.3422528Z cvt.f32.bf16 %r27375, %rs2241; 2026-02-21T10:22:24.3422589Z cvt.f32.bf16 %r27376, %rs2242; 2026-02-21T10:22:24.3422648Z cvt.f32.bf16 %r27377, %rs2249; 2026-02-21T10:22:24.3422705Z cvt.f32.bf16 %r27378, %rs2250; 2026-02-21T10:22:24.3422768Z cvt.f32.bf16 %r27507, %rs2257; 2026-02-21T10:22:24.3422828Z cvt.f32.bf16 %r27508, %rs2258; 2026-02-21T10:22:24.3422886Z cvt.f32.bf16 %r27509, %rs2265; 2026-02-21T10:22:24.3423020Z cvt.f32.bf16 %r27510, %rs2266; 2026-02-21T10:22:24.3423080Z cvt.f32.bf16 %r27639, %rs2273; 2026-02-21T10:22:24.3423137Z cvt.f32.bf16 %r27640, %rs2274; 2026-02-21T10:22:24.3423194Z cvt.f32.bf16 %r27641, %rs2281; 2026-02-21T10:22:24.3423255Z cvt.f32.bf16 %r27642, %rs2282; 2026-02-21T10:22:24.3423313Z cvt.f32.bf16 %r27771, %rs2289; 2026-02-21T10:22:24.3423371Z cvt.f32.bf16 %r27772, %rs2290; 2026-02-21T10:22:24.3423431Z cvt.f32.bf16 %r27773, %rs2297; 2026-02-21T10:22:24.3423493Z cvt.f32.bf16 %r27774, %rs2298; 2026-02-21T10:22:24.3423550Z cvt.f32.bf16 %r27903, %rs2243; 2026-02-21T10:22:24.3423610Z cvt.f32.bf16 %r27904, %rs2244; 2026-02-21T10:22:24.3423670Z cvt.f32.bf16 
%r27905, %rs2251; 2026-02-21T10:22:24.3423727Z cvt.f32.bf16 %r27906, %rs2252; 2026-02-21T10:22:24.3423788Z cvt.f32.bf16 %r28035, %rs2259; 2026-02-21T10:22:24.3423849Z cvt.f32.bf16 %r28036, %rs2260; 2026-02-21T10:22:24.3423906Z cvt.f32.bf16 %r28037, %rs2267; 2026-02-21T10:22:24.3423966Z cvt.f32.bf16 %r28038, %rs2268; 2026-02-21T10:22:24.3424025Z cvt.f32.bf16 %r28167, %rs2275; 2026-02-21T10:22:24.3424085Z cvt.f32.bf16 %r28168, %rs2276; 2026-02-21T10:22:24.3424156Z cvt.f32.bf16 %r28169, %rs2283; 2026-02-21T10:22:24.3424219Z cvt.f32.bf16 %r28170, %rs2284; 2026-02-21T10:22:24.3424284Z cvt.f32.bf16 %r28299, %rs2291; 2026-02-21T10:22:24.3424342Z cvt.f32.bf16 %r28300, %rs2292; 2026-02-21T10:22:24.3424401Z cvt.f32.bf16 %r28301, %rs2299; 2026-02-21T10:22:24.3424518Z cvt.f32.bf16 %r28302, %rs2300; 2026-02-21T10:22:24.3424578Z cvt.f32.bf16 %r28431, %rs2245; 2026-02-21T10:22:24.3424640Z cvt.f32.bf16 %r28432, %rs2246; 2026-02-21T10:22:24.3424698Z cvt.f32.bf16 %r28433, %rs2253; 2026-02-21T10:22:24.3424759Z cvt.f32.bf16 %r28434, %rs2254; 2026-02-21T10:22:24.3424817Z cvt.f32.bf16 %r28563, %rs2261; 2026-02-21T10:22:24.3424878Z cvt.f32.bf16 %r28564, %rs2262; 2026-02-21T10:22:24.3424938Z cvt.f32.bf16 %r28565, %rs2269; 2026-02-21T10:22:24.3424997Z cvt.f32.bf16 %r28566, %rs2270; 2026-02-21T10:22:24.3425056Z cvt.f32.bf16 %r28695, %rs2277; 2026-02-21T10:22:24.3425113Z cvt.f32.bf16 %r28696, %rs2278; 2026-02-21T10:22:24.3425176Z cvt.f32.bf16 %r28697, %rs2285; 2026-02-21T10:22:24.3425236Z cvt.f32.bf16 %r28698, %rs2286; 2026-02-21T10:22:24.3425296Z cvt.f32.bf16 %r28827, %rs2293; 2026-02-21T10:22:24.3425357Z cvt.f32.bf16 %r28828, %rs2294; 2026-02-21T10:22:24.3425415Z cvt.f32.bf16 %r28829, %rs2301; 2026-02-21T10:22:24.3425473Z cvt.f32.bf16 %r28830, %rs2302; 2026-02-21T10:22:24.3425611Z cvt.f32.bf16 %r28959, %rs2247; 2026-02-21T10:22:24.3425678Z cvt.f32.bf16 %r28960, %rs2248; 2026-02-21T10:22:24.3425735Z cvt.f32.bf16 %r28961, %rs2255; 2026-02-21T10:22:24.3425795Z cvt.f32.bf16 %r28962, %rs2256; 
2026-02-21T10:22:24.3425856Z cvt.f32.bf16 %r29091, %rs2263; 2026-02-21T10:22:24.3425961Z cvt.f32.bf16 %r29092, %rs2264; 2026-02-21T10:22:24.3426019Z cvt.f32.bf16 %r29093, %rs2271; 2026-02-21T10:22:24.3426077Z cvt.f32.bf16 %r29094, %rs2272; 2026-02-21T10:22:24.3426138Z cvt.f32.bf16 %r29223, %rs2279; 2026-02-21T10:22:24.3426196Z cvt.f32.bf16 %r29224, %rs2280; 2026-02-21T10:22:24.3426261Z cvt.f32.bf16 %r29225, %rs2287; 2026-02-21T10:22:24.3426324Z cvt.f32.bf16 %r29226, %rs2288; 2026-02-21T10:22:24.3426382Z cvt.f32.bf16 %r29355, %rs2295; 2026-02-21T10:22:24.3426440Z cvt.f32.bf16 %r29356, %rs2296; 2026-02-21T10:22:24.3426625Z cvt.f32.bf16 %r29357, %rs2303; 2026-02-21T10:22:24.3426687Z cvt.f32.bf16 %r29358, %rs2304; 2026-02-21T10:22:24.3426901Z .loc 1 64 33 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:64:33 2026-02-21T10:22:24.3426956Z bar.sync 0; 2026-02-21T10:22:24.3427015Z // begin inline asm 2026-02-21T10:22:24.3427119Z @%p313 mbarrier.init.shared::cta.b64 [%r29850], 1; 2026-02-21T10:22:24.3427175Z // end inline asm 2026-02-21T10:22:24.3427233Z bar.sync 0; 2026-02-21T10:22:24.3427289Z // begin inline asm 2026-02-21T10:22:24.3427431Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r29850], 4096; 2026-02-21T10:22:24.3427567Z // end inline asm 2026-02-21T10:22:24.3427629Z // begin inline asm 2026-02-21T10:22:24.3427706Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.3427760Z // end inline asm 2026-02-21T10:22:24.3427815Z bar.sync 0; 2026-02-21T10:22:24.3427893Z elect.sync %r29753|%p283, -1; 2026-02-21T10:22:24.3427962Z and.pred %p262, %p1, %p283; 2026-02-21T10:22:24.3428022Z add.s32 %r27242, %r29688, 160; 2026-02-21T10:22:24.3428082Z // begin inline asm 2026-02-21T10:22:24.3428499Z @%p262 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39936], [%rd822, {%r29853, %r27242}], [%r29850]; 2026-02-21T10:22:24.3428559Z // end inline asm 2026-02-21T10:22:24.3428617Z bar.sync 0; 2026-02-21T10:22:24.3428673Z // begin inline asm 
2026-02-21T10:22:24.3428724Z 2026-02-21T10:22:24.3428773Z { 2026-02-21T10:22:24.3428838Z .reg .pred complete; 2026-02-21T10:22:24.3428891Z waitLoop: 2026-02-21T10:22:24.3429035Z mbarrier.try_wait.parity.shared.b64 complete, [%r29850], %r29488; 2026-02-21T10:22:24.3429108Z @!complete bra.uni waitLoop; 2026-02-21T10:22:24.3429158Z } 2026-02-21T10:22:24.3429163Z 2026-02-21T10:22:24.3429217Z // end inline asm 2026-02-21T10:22:24.3429275Z bar.sync 0; 2026-02-21T10:22:24.3429333Z // begin inline asm 2026-02-21T10:22:24.3429427Z @%p313 mbarrier.inval.shared::cta.b64 [%r29850]; 2026-02-21T10:22:24.3429483Z // end inline asm 2026-02-21T10:22:24.3429683Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3429842Z ld.shared.s8 %rs2305, [%r20]; 2026-02-21T10:22:24.3430038Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3430102Z shl.b16 %rs2306, %rs2305, 4; 2026-02-21T10:22:24.3430291Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3430359Z ld.shared.s8 %rs2307, [%r21+128]; 2026-02-21T10:22:24.3430556Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3430616Z shl.b16 %rs2308, %rs2307, 4; 2026-02-21T10:22:24.3430802Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3430872Z ld.shared.s8 %rs2309, [%r22+256]; 2026-02-21T10:22:24.3431059Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3431120Z shl.b16 %rs2310, %rs2309, 4; 2026-02-21T10:22:24.3431376Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3431441Z ld.shared.s8 %rs2311, [%r23+384]; 2026-02-21T10:22:24.3431629Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3431750Z shl.b16 %rs2312, %rs2311, 4; 2026-02-21T10:22:24.3431943Z .loc 1 69 
25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3432006Z ld.shared.s8 %rs2313, [%r24+512]; 2026-02-21T10:22:24.3432193Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3432256Z shl.b16 %rs2314, %rs2313, 4; 2026-02-21T10:22:24.3432446Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3432510Z ld.shared.s8 %rs2315, [%r25+640]; 2026-02-21T10:22:24.3432709Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3432769Z shl.b16 %rs2316, %rs2315, 4; 2026-02-21T10:22:24.3432958Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3433022Z ld.shared.s8 %rs2317, [%r26+768]; 2026-02-21T10:22:24.3433213Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3433324Z shl.b16 %rs2318, %rs2317, 4; 2026-02-21T10:22:24.3433515Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3433582Z ld.shared.s8 %rs2319, [%r27+896]; 2026-02-21T10:22:24.3433769Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3433829Z shl.b16 %rs2320, %rs2319, 4; 2026-02-21T10:22:24.3434026Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3434093Z ld.shared.s8 %rs2321, [%r20+1024]; 2026-02-21T10:22:24.3434280Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3434341Z shl.b16 %rs2322, %rs2321, 4; 2026-02-21T10:22:24.3434531Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3434595Z ld.shared.s8 %rs2323, [%r21+1152]; 2026-02-21T10:22:24.3434787Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3434846Z shl.b16 %rs2324, %rs2323, 4; 
2026-02-21T10:22:24.3435032Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3435094Z ld.shared.s8 %rs2325, [%r22+1280]; 2026-02-21T10:22:24.3435283Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3435400Z shl.b16 %rs2326, %rs2325, 4; 2026-02-21T10:22:24.3435586Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3435651Z ld.shared.s8 %rs2327, [%r23+1408]; 2026-02-21T10:22:24.3435840Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3435914Z shl.b16 %rs2328, %rs2327, 4; 2026-02-21T10:22:24.3436110Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3436173Z ld.shared.s8 %rs2329, [%r24+1536]; 2026-02-21T10:22:24.3436368Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3436431Z shl.b16 %rs2330, %rs2329, 4; 2026-02-21T10:22:24.3436740Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3436807Z ld.shared.s8 %rs2331, [%r25+1664]; 2026-02-21T10:22:24.3437078Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3437148Z shl.b16 %rs2332, %rs2331, 4; 2026-02-21T10:22:24.3437341Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3437468Z ld.shared.s8 %rs2333, [%r26+1792]; 2026-02-21T10:22:24.3437669Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3437729Z shl.b16 %rs2334, %rs2333, 4; 2026-02-21T10:22:24.3437919Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3437989Z ld.shared.s8 %rs2335, [%r27+1920]; 2026-02-21T10:22:24.3438177Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3438236Z shl.b16 
%rs2336, %rs2335, 4; 2026-02-21T10:22:24.3438432Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3438496Z ld.shared.s8 %rs2337, [%r20+2048]; 2026-02-21T10:22:24.3438683Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3438744Z shl.b16 %rs2338, %rs2337, 4; 2026-02-21T10:22:24.3438934Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3439061Z ld.shared.s8 %rs2339, [%r21+2176]; 2026-02-21T10:22:24.3439251Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3439314Z shl.b16 %rs2340, %rs2339, 4; 2026-02-21T10:22:24.3439504Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3439565Z ld.shared.s8 %rs2341, [%r22+2304]; 2026-02-21T10:22:24.3439759Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3439821Z shl.b16 %rs2342, %rs2341, 4; 2026-02-21T10:22:24.3440007Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3440071Z ld.shared.s8 %rs2343, [%r23+2432]; 2026-02-21T10:22:24.3440259Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3440318Z shl.b16 %rs2344, %rs2343, 4; 2026-02-21T10:22:24.3440507Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3440570Z ld.shared.s8 %rs2345, [%r24+2560]; 2026-02-21T10:22:24.3440758Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3440817Z shl.b16 %rs2346, %rs2345, 4; 2026-02-21T10:22:24.3441008Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3441143Z ld.shared.s8 %rs2347, [%r25+2688]; 2026-02-21T10:22:24.3441329Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 
2026-02-21T10:22:24.3441390Z shl.b16 %rs2348, %rs2347, 4; 2026-02-21T10:22:24.3441577Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3441642Z ld.shared.s8 %rs2349, [%r26+2816]; 2026-02-21T10:22:24.3441846Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3441907Z shl.b16 %rs2350, %rs2349, 4; 2026-02-21T10:22:24.3442095Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3442158Z ld.shared.s8 %rs2351, [%r27+2944]; 2026-02-21T10:22:24.3442345Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3442412Z shl.b16 %rs2352, %rs2351, 4; 2026-02-21T10:22:24.3442647Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3442715Z ld.shared.s8 %rs2353, [%r20+3072]; 2026-02-21T10:22:24.3442901Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3443007Z shl.b16 %rs2354, %rs2353, 4; 2026-02-21T10:22:24.3443197Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3443262Z ld.shared.s8 %rs2355, [%r21+3200]; 2026-02-21T10:22:24.3443448Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3443512Z shl.b16 %rs2356, %rs2355, 4; 2026-02-21T10:22:24.3443699Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3443762Z ld.shared.s8 %rs2357, [%r22+3328]; 2026-02-21T10:22:24.3443956Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3444015Z shl.b16 %rs2358, %rs2357, 4; 2026-02-21T10:22:24.3444200Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3444263Z ld.shared.s8 %rs2359, [%r23+3456]; 2026-02-21T10:22:24.3444454Z .loc 1 67 28 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3444560Z shl.b16 %rs2360, %rs2359, 4; 2026-02-21T10:22:24.3444749Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3444817Z ld.shared.s8 %rs2361, [%r24+3584]; 2026-02-21T10:22:24.3445003Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3445062Z shl.b16 %rs2362, %rs2361, 4; 2026-02-21T10:22:24.3445256Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3445321Z ld.shared.s8 %rs2363, [%r25+3712]; 2026-02-21T10:22:24.3445508Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3445569Z shl.b16 %rs2364, %rs2363, 4; 2026-02-21T10:22:24.3445758Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3445821Z ld.shared.s8 %rs2365, [%r26+3840]; 2026-02-21T10:22:24.3446009Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3446091Z shl.b16 %rs2366, %rs2365, 4; 2026-02-21T10:22:24.3446280Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3446342Z ld.shared.s8 %rs2367, [%r27+3968]; 2026-02-21T10:22:24.3446666Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3446834Z shl.b16 %rs2368, %rs2367, 4; 2026-02-21T10:22:24.3447024Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3447089Z cvt.s16.s8 %rs2369, %rs2306; 2026-02-21T10:22:24.3447147Z shr.s16 %rs2370, %rs2369, 4; 2026-02-21T10:22:24.3447211Z cvt.s16.s8 %rs2371, %rs2308; 2026-02-21T10:22:24.3447274Z shr.s16 %rs2372, %rs2371, 4; 2026-02-21T10:22:24.3447332Z shr.s16 %rs2373, %rs2305, 4; 2026-02-21T10:22:24.3447394Z shr.s16 %rs2374, %rs2307, 4; 2026-02-21T10:22:24.3447583Z .loc 1 87 32 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3447650Z cvt.rn.f32.s16 %r29754, %rs2374; 2026-02-21T10:22:24.3447711Z cvt.rn.f32.s16 %r29755, %rs2373; 2026-02-21T10:22:24.3447772Z cvt.rn.f32.s16 %r29756, %rs2372; 2026-02-21T10:22:24.3447835Z cvt.rn.f32.s16 %r29757, %rs2370; 2026-02-21T10:22:24.3448106Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3448173Z cvt.s16.s8 %rs2375, %rs2310; 2026-02-21T10:22:24.3448235Z shr.s16 %rs2376, %rs2375, 4; 2026-02-21T10:22:24.3448294Z cvt.s16.s8 %rs2377, %rs2312; 2026-02-21T10:22:24.3448353Z shr.s16 %rs2378, %rs2377, 4; 2026-02-21T10:22:24.3448412Z shr.s16 %rs2379, %rs2309, 4; 2026-02-21T10:22:24.3448537Z shr.s16 %rs2380, %rs2311, 4; 2026-02-21T10:22:24.3448729Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3448790Z cvt.rn.f32.s16 %r29758, %rs2380; 2026-02-21T10:22:24.3448853Z cvt.rn.f32.s16 %r29759, %rs2379; 2026-02-21T10:22:24.3448912Z cvt.rn.f32.s16 %r29760, %rs2378; 2026-02-21T10:22:24.3448971Z cvt.rn.f32.s16 %r29761, %rs2376; 2026-02-21T10:22:24.3449159Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3449221Z cvt.s16.s8 %rs2381, %rs2314; 2026-02-21T10:22:24.3449282Z shr.s16 %rs2382, %rs2381, 4; 2026-02-21T10:22:24.3449342Z cvt.s16.s8 %rs2383, %rs2316; 2026-02-21T10:22:24.3449402Z shr.s16 %rs2384, %rs2383, 4; 2026-02-21T10:22:24.3449461Z shr.s16 %rs2385, %rs2313, 4; 2026-02-21T10:22:24.3449519Z shr.s16 %rs2386, %rs2315, 4; 2026-02-21T10:22:24.3449709Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3449772Z cvt.rn.f32.s16 %r29762, %rs2386; 2026-02-21T10:22:24.3449833Z cvt.rn.f32.s16 %r29763, %rs2385; 2026-02-21T10:22:24.3449959Z cvt.rn.f32.s16 %r29764, %rs2384; 2026-02-21T10:22:24.3450029Z cvt.rn.f32.s16 %r29765, %rs2382; 2026-02-21T10:22:24.3450215Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3450273Z cvt.s16.s8 %rs2387, %rs2318; 2026-02-21T10:22:24.3450334Z shr.s16 %rs2388, %rs2387, 4; 2026-02-21T10:22:24.3450393Z cvt.s16.s8 %rs2389, %rs2320; 2026-02-21T10:22:24.3450454Z shr.s16 %rs2390, %rs2389, 4; 2026-02-21T10:22:24.3450517Z shr.s16 %rs2391, %rs2317, 4; 2026-02-21T10:22:24.3450576Z shr.s16 %rs2392, %rs2319, 4; 2026-02-21T10:22:24.3450763Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3450825Z cvt.rn.f32.s16 %r29766, %rs2392; 2026-02-21T10:22:24.3450892Z cvt.rn.f32.s16 %r29767, %rs2391; 2026-02-21T10:22:24.3450952Z cvt.rn.f32.s16 %r29768, %rs2390; 2026-02-21T10:22:24.3451010Z cvt.rn.f32.s16 %r29769, %rs2388; 2026-02-21T10:22:24.3451201Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3451259Z cvt.s16.s8 %rs2393, %rs2322; 2026-02-21T10:22:24.3451316Z shr.s16 %rs2394, %rs2393, 4; 2026-02-21T10:22:24.3451373Z cvt.s16.s8 %rs2395, %rs2324; 2026-02-21T10:22:24.3451433Z shr.s16 %rs2396, %rs2395, 4; 2026-02-21T10:22:24.3451495Z shr.s16 %rs2397, %rs2321, 4; 2026-02-21T10:22:24.3451553Z shr.s16 %rs2398, %rs2323, 4; 2026-02-21T10:22:24.3451799Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3451861Z cvt.rn.f32.s16 %r29770, %rs2398; 2026-02-21T10:22:24.3451919Z cvt.rn.f32.s16 %r29771, %rs2397; 2026-02-21T10:22:24.3451980Z cvt.rn.f32.s16 %r29772, %rs2396; 2026-02-21T10:22:24.3452040Z cvt.rn.f32.s16 %r29773, %rs2394; 2026-02-21T10:22:24.3452230Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3452293Z cvt.s16.s8 %rs2399, %rs2326; 2026-02-21T10:22:24.3452355Z shr.s16 %rs2400, %rs2399, 4; 2026-02-21T10:22:24.3452413Z cvt.s16.s8 %rs2401, %rs2328; 2026-02-21T10:22:24.3452471Z shr.s16 %rs2402, %rs2401, 4; 2026-02-21T10:22:24.3452531Z shr.s16 %rs2403, %rs2325, 4; 
2026-02-21T10:22:24.3452589Z shr.s16 %rs2404, %rs2327, 4; 2026-02-21T10:22:24.3452789Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3452857Z cvt.rn.f32.s16 %r29774, %rs2404; 2026-02-21T10:22:24.3452967Z cvt.rn.f32.s16 %r29775, %rs2403; 2026-02-21T10:22:24.3453029Z cvt.rn.f32.s16 %r29776, %rs2402; 2026-02-21T10:22:24.3453087Z cvt.rn.f32.s16 %r29777, %rs2400; 2026-02-21T10:22:24.3453278Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3453383Z cvt.s16.s8 %rs2405, %rs2330; 2026-02-21T10:22:24.3453443Z shr.s16 %rs2406, %rs2405, 4; 2026-02-21T10:22:24.3453503Z cvt.s16.s8 %rs2407, %rs2332; 2026-02-21T10:22:24.3453564Z shr.s16 %rs2408, %rs2407, 4; 2026-02-21T10:22:24.3453622Z shr.s16 %rs2409, %rs2329, 4; 2026-02-21T10:22:24.3453683Z shr.s16 %rs2410, %rs2331, 4; 2026-02-21T10:22:24.3453870Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3453931Z cvt.rn.f32.s16 %r29778, %rs2410; 2026-02-21T10:22:24.3453992Z cvt.rn.f32.s16 %r29779, %rs2409; 2026-02-21T10:22:24.3454055Z cvt.rn.f32.s16 %r29780, %rs2408; 2026-02-21T10:22:24.3454117Z cvt.rn.f32.s16 %r29781, %rs2406; 2026-02-21T10:22:24.3454304Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3454366Z cvt.s16.s8 %rs2411, %rs2334; 2026-02-21T10:22:24.3454424Z shr.s16 %rs2412, %rs2411, 4; 2026-02-21T10:22:24.3454485Z cvt.s16.s8 %rs2413, %rs2336; 2026-02-21T10:22:24.3454545Z shr.s16 %rs2414, %rs2413, 4; 2026-02-21T10:22:24.3454605Z shr.s16 %rs2415, %rs2333, 4; 2026-02-21T10:22:24.3454712Z shr.s16 %rs2416, %rs2335, 4; 2026-02-21T10:22:24.3454904Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3454970Z cvt.rn.f32.s16 %r29782, %rs2416; 2026-02-21T10:22:24.3455030Z cvt.rn.f32.s16 %r29783, %rs2415; 2026-02-21T10:22:24.3455090Z cvt.rn.f32.s16 %r29784, %rs2414; 2026-02-21T10:22:24.3455151Z 
cvt.rn.f32.s16 %r29785, %rs2412; 2026-02-21T10:22:24.3455342Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3455403Z cvt.s16.s8 %rs2417, %rs2338; 2026-02-21T10:22:24.3455461Z shr.s16 %rs2418, %rs2417, 4; 2026-02-21T10:22:24.3455522Z cvt.s16.s8 %rs2419, %rs2340; 2026-02-21T10:22:24.3455580Z shr.s16 %rs2420, %rs2419, 4; 2026-02-21T10:22:24.3455642Z shr.s16 %rs2421, %rs2337, 4; 2026-02-21T10:22:24.3455702Z shr.s16 %rs2422, %rs2339, 4; 2026-02-21T10:22:24.3455907Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3455977Z cvt.rn.f32.s16 %r29786, %rs2422; 2026-02-21T10:22:24.3456040Z cvt.rn.f32.s16 %r29787, %rs2421; 2026-02-21T10:22:24.3456098Z cvt.rn.f32.s16 %r29788, %rs2420; 2026-02-21T10:22:24.3456156Z cvt.rn.f32.s16 %r29789, %rs2418; 2026-02-21T10:22:24.3456347Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3456407Z cvt.s16.s8 %rs2423, %rs2342; 2026-02-21T10:22:24.3456661Z shr.s16 %rs2424, %rs2423, 4; 2026-02-21T10:22:24.3456723Z cvt.s16.s8 %rs2425, %rs2344; 2026-02-21T10:22:24.3456784Z shr.s16 %rs2426, %rs2425, 4; 2026-02-21T10:22:24.3456853Z shr.s16 %rs2427, %rs2341, 4; 2026-02-21T10:22:24.3456912Z shr.s16 %rs2428, %rs2343, 4; 2026-02-21T10:22:24.3457111Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3457176Z cvt.rn.f32.s16 %r29790, %rs2428; 2026-02-21T10:22:24.3457238Z cvt.rn.f32.s16 %r29791, %rs2427; 2026-02-21T10:22:24.3457299Z cvt.rn.f32.s16 %r29792, %rs2426; 2026-02-21T10:22:24.3457362Z cvt.rn.f32.s16 %r29793, %rs2424; 2026-02-21T10:22:24.3457555Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3457614Z cvt.s16.s8 %rs2429, %rs2346; 2026-02-21T10:22:24.3457675Z shr.s16 %rs2430, %rs2429, 4; 2026-02-21T10:22:24.3457734Z cvt.s16.s8 %rs2431, %rs2348; 2026-02-21T10:22:24.3457796Z shr.s16 %rs2432, %rs2431, 4; 
2026-02-21T10:22:24.3457926Z shr.s16 %rs2433, %rs2345, 4; 2026-02-21T10:22:24.3458003Z shr.s16 %rs2434, %rs2347, 4; 2026-02-21T10:22:24.3458194Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3458254Z cvt.rn.f32.s16 %r29794, %rs2434; 2026-02-21T10:22:24.3458381Z cvt.rn.f32.s16 %r29795, %rs2433; 2026-02-21T10:22:24.3458440Z cvt.rn.f32.s16 %r29796, %rs2432; 2026-02-21T10:22:24.3458500Z cvt.rn.f32.s16 %r29797, %rs2430; 2026-02-21T10:22:24.3458692Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3458751Z cvt.s16.s8 %rs2435, %rs2350; 2026-02-21T10:22:24.3458808Z shr.s16 %rs2436, %rs2435, 4; 2026-02-21T10:22:24.3458867Z cvt.s16.s8 %rs2437, %rs2352; 2026-02-21T10:22:24.3458927Z shr.s16 %rs2438, %rs2437, 4; 2026-02-21T10:22:24.3458985Z shr.s16 %rs2439, %rs2349, 4; 2026-02-21T10:22:24.3459044Z shr.s16 %rs2440, %rs2351, 4; 2026-02-21T10:22:24.3459240Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3459300Z cvt.rn.f32.s16 %r29798, %rs2440; 2026-02-21T10:22:24.3459360Z cvt.rn.f32.s16 %r29799, %rs2439; 2026-02-21T10:22:24.3459421Z cvt.rn.f32.s16 %r29800, %rs2438; 2026-02-21T10:22:24.3459482Z cvt.rn.f32.s16 %r29801, %rs2436; 2026-02-21T10:22:24.3459669Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3459796Z cvt.s16.s8 %rs2441, %rs2354; 2026-02-21T10:22:24.3459861Z shr.s16 %rs2442, %rs2441, 4; 2026-02-21T10:22:24.3459920Z cvt.s16.s8 %rs2443, %rs2356; 2026-02-21T10:22:24.3459977Z shr.s16 %rs2444, %rs2443, 4; 2026-02-21T10:22:24.3460040Z shr.s16 %rs2445, %rs2353, 4; 2026-02-21T10:22:24.3460097Z shr.s16 %rs2446, %rs2355, 4; 2026-02-21T10:22:24.3460285Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3460354Z cvt.rn.f32.s16 %r29802, %rs2446; 2026-02-21T10:22:24.3460413Z cvt.rn.f32.s16 %r29803, %rs2445; 2026-02-21T10:22:24.3460472Z 
cvt.rn.f32.s16 %r29804, %rs2444; 2026-02-21T10:22:24.3460542Z cvt.rn.f32.s16 %r29805, %rs2442; 2026-02-21T10:22:24.3460736Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3460797Z cvt.s16.s8 %rs2447, %rs2358; 2026-02-21T10:22:24.3460856Z shr.s16 %rs2448, %rs2447, 4; 2026-02-21T10:22:24.3460921Z cvt.s16.s8 %rs2449, %rs2360; 2026-02-21T10:22:24.3460978Z shr.s16 %rs2450, %rs2449, 4; 2026-02-21T10:22:24.3461035Z shr.s16 %rs2451, %rs2357, 4; 2026-02-21T10:22:24.3461092Z shr.s16 %rs2452, %rs2359, 4; 2026-02-21T10:22:24.3461284Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3461344Z cvt.rn.f32.s16 %r29806, %rs2452; 2026-02-21T10:22:24.3461405Z cvt.rn.f32.s16 %r29807, %rs2451; 2026-02-21T10:22:24.3461540Z cvt.rn.f32.s16 %r29808, %rs2450; 2026-02-21T10:22:24.3461601Z cvt.rn.f32.s16 %r29809, %rs2448; 2026-02-21T10:22:24.3461789Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3461849Z cvt.s16.s8 %rs2453, %rs2362; 2026-02-21T10:22:24.3461908Z shr.s16 %rs2454, %rs2453, 4; 2026-02-21T10:22:24.3461968Z cvt.s16.s8 %rs2455, %rs2364; 2026-02-21T10:22:24.3462026Z shr.s16 %rs2456, %rs2455, 4; 2026-02-21T10:22:24.3462087Z shr.s16 %rs2457, %rs2361, 4; 2026-02-21T10:22:24.3462147Z shr.s16 %rs2458, %rs2363, 4; 2026-02-21T10:22:24.3462334Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3462397Z cvt.rn.f32.s16 %r29810, %rs2458; 2026-02-21T10:22:24.3462457Z cvt.rn.f32.s16 %r29811, %rs2457; 2026-02-21T10:22:24.3462515Z cvt.rn.f32.s16 %r29812, %rs2456; 2026-02-21T10:22:24.3462588Z cvt.rn.f32.s16 %r29813, %rs2454; 2026-02-21T10:22:24.3462836Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3462897Z cvt.s16.s8 %rs2459, %rs2366; 2026-02-21T10:22:24.3462957Z shr.s16 %rs2460, %rs2459, 4; 2026-02-21T10:22:24.3463017Z cvt.s16.s8 %rs2461, %rs2368; 
2026-02-21T10:22:24.3463076Z shr.s16 %rs2462, %rs2461, 4; 2026-02-21T10:22:24.3463180Z shr.s16 %rs2463, %rs2365, 4; 2026-02-21T10:22:24.3463239Z shr.s16 %rs2464, %rs2367, 4; 2026-02-21T10:22:24.3463430Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3463490Z cvt.rn.f32.s16 %r29814, %rs2464; 2026-02-21T10:22:24.3463552Z cvt.rn.f32.s16 %r29815, %rs2463; 2026-02-21T10:22:24.3463612Z cvt.rn.f32.s16 %r29816, %rs2462; 2026-02-21T10:22:24.3463672Z cvt.rn.f32.s16 %r29817, %rs2460; 2026-02-21T10:22:24.3463726Z bar.sync 0; 2026-02-21T10:22:24.3463848Z st.shared.v4.b32 [%r28], {%r29757, %r29755, %r29756, %r29754}; 2026-02-21T10:22:24.3463979Z st.shared.v4.b32 [%r28+16384], {%r29789, %r29787, %r29788, %r29786}; 2026-02-21T10:22:24.3464089Z st.shared.v4.b32 [%r29], {%r29761, %r29759, %r29760, %r29758}; 2026-02-21T10:22:24.3464206Z st.shared.v4.b32 [%r29+16384], {%r29793, %r29791, %r29792, %r29790}; 2026-02-21T10:22:24.3464312Z st.shared.v4.b32 [%r30], {%r29765, %r29763, %r29764, %r29762}; 2026-02-21T10:22:24.3464430Z st.shared.v4.b32 [%r30+16384], {%r29797, %r29795, %r29796, %r29794}; 2026-02-21T10:22:24.3464535Z st.shared.v4.b32 [%r31], {%r29769, %r29767, %r29768, %r29766}; 2026-02-21T10:22:24.3464729Z st.shared.v4.b32 [%r31+16384], {%r29801, %r29799, %r29800, %r29798}; 2026-02-21T10:22:24.3464841Z st.shared.v4.b32 [%r32], {%r29773, %r29771, %r29772, %r29770}; 2026-02-21T10:22:24.3464955Z st.shared.v4.b32 [%r32+16384], {%r29805, %r29803, %r29804, %r29802}; 2026-02-21T10:22:24.3465064Z st.shared.v4.b32 [%r33], {%r29777, %r29775, %r29776, %r29774}; 2026-02-21T10:22:24.3465177Z st.shared.v4.b32 [%r33+16384], {%r29809, %r29807, %r29808, %r29806}; 2026-02-21T10:22:24.3465286Z st.shared.v4.b32 [%r34], {%r29781, %r29779, %r29780, %r29778}; 2026-02-21T10:22:24.3465403Z st.shared.v4.b32 [%r34+16384], {%r29813, %r29811, %r29812, %r29810}; 2026-02-21T10:22:24.3465507Z st.shared.v4.b32 [%r35], {%r29785, %r29783, %r29784, %r29782}; 
2026-02-21T10:22:24.3465621Z st.shared.v4.b32 [%r35+16384], {%r29817, %r29815, %r29816, %r29814}; 2026-02-21T10:22:24.3465681Z $L__tmp21: 2026-02-21T10:22:24.3465951Z .loc 2 291 36 // standard.py:291:36 @[ c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:94:40 ] 2026-02-21T10:22:24.3466010Z // begin inline asm 2026-02-21T10:22:24.3466094Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.3466159Z // end inline asm 2026-02-21T10:22:24.3466212Z bar.sync 0; 2026-02-21T10:22:24.3466283Z wgmma.fence.sync.aligned; 2026-02-21T10:22:24.3466343Z // begin inline asm 2026-02-21T10:22:24.3467965Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r27375,%r27376,%r27377,%r27378}, %rd3, %p224, 1, 1; 2026-02-21T10:22:24.3468112Z // end inline asm 2026-02-21T10:22:24.3468168Z // begin inline asm 2026-02-21T10:22:24.3469788Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r27507,%r27508,%r27509,%r27510}, %rd4, %p224, 1, 1; 2026-02-21T10:22:24.3469858Z 
// end inline asm 2026-02-21T10:22:24.3469916Z // begin inline asm 2026-02-21T10:22:24.3471455Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r27639,%r27640,%r27641,%r27642}, %rd5, %p224, 1, 1; 2026-02-21T10:22:24.3471516Z // end inline asm 2026-02-21T10:22:24.3471574Z // begin inline asm 2026-02-21T10:22:24.3473113Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r27771,%r27772,%r27773,%r27774}, %rd6, %p224, 1, 1; 2026-02-21T10:22:24.3473173Z // end inline asm 2026-02-21T10:22:24.3473230Z // begin inline asm 2026-02-21T10:22:24.3474720Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r27903,%r27904,%r27905,%r27906}, %rd7, %p224, 1, 1; 2026-02-21T10:22:24.3474781Z // end inline asm 2026-02-21T10:22:24.3474838Z // begin inline asm 2026-02-21T10:22:24.3476326Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r28035,%r28036,%r28037,%r28038}, %rd8, %p224, 1, 1; 2026-02-21T10:22:24.3476432Z // end inline asm 2026-02-21T10:22:24.3476614Z // begin inline asm 2026-02-21T10:22:24.3478105Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r28167,%r28168,%r28169,%r28170}, %rd9, %p224, 1, 1; 2026-02-21T10:22:24.3478164Z // end inline asm 2026-02-21T10:22:24.3478223Z // begin inline asm 2026-02-21T10:22:24.3479787Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r28299,%r28300,%r28301,%r28302}, %rd10, %p224, 1, 1; 2026-02-21T10:22:24.3479906Z // end inline asm 2026-02-21T10:22:24.3479963Z // begin inline asm 2026-02-21T10:22:24.3481443Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r28431,%r28432,%r28433,%r28434}, %rd3, %p224, 1, 1; 2026-02-21T10:22:24.3481509Z // end inline asm 2026-02-21T10:22:24.3481564Z // begin inline asm 2026-02-21T10:22:24.3483097Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r28563,%r28564,%r28565,%r28566}, %rd4, %p224, 1, 1; 2026-02-21T10:22:24.3483160Z // end inline asm 2026-02-21T10:22:24.3483216Z // begin inline asm 2026-02-21T10:22:24.3484696Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r28695,%r28696,%r28697,%r28698}, %rd5, %p224, 1, 1; 2026-02-21T10:22:24.3484758Z // end inline asm 2026-02-21T10:22:24.3484813Z // begin inline asm 2026-02-21T10:22:24.3486298Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r28827,%r28828,%r28829,%r28830}, %rd6, %p224, 1, 1; 2026-02-21T10:22:24.3486414Z // end inline asm 2026-02-21T10:22:24.3486592Z // begin inline asm 2026-02-21T10:22:24.3488141Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r28959,%r28960,%r28961,%r28962}, %rd7, %p224, 1, 1; 2026-02-21T10:22:24.3488201Z // end inline asm 2026-02-21T10:22:24.3488320Z // begin inline asm 2026-02-21T10:22:24.3489797Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r29091,%r29092,%r29093,%r29094}, %rd8, %p224, 1, 1; 2026-02-21T10:22:24.3489856Z // end inline asm 2026-02-21T10:22:24.3489916Z // begin inline asm 2026-02-21T10:22:24.3491451Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r29223,%r29224,%r29225,%r29226}, %rd9, %p224, 1, 1; 2026-02-21T10:22:24.3491513Z // end inline asm 2026-02-21T10:22:24.3491570Z // begin inline asm 2026-02-21T10:22:24.3493052Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r29355,%r29356,%r29357,%r29358}, %rd10, %p224, 1, 1; 2026-02-21T10:22:24.3493116Z // end inline asm 2026-02-21T10:22:24.3493192Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:24.3493252Z mov.b32 %r29487, %r39936; 2026-02-21T10:22:24.3493312Z mov.b32 %r29489, %r29488; 2026-02-21T10:22:24.3493368Z // begin inline asm 2026-02-21T10:22:24.3495892Z // wait for regs: 
%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115,%r29487,%r29488,%r29489 2026-02-21T10:22:24.3496049Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:24.3496106Z // end inline asm 2026-02-21T10:22:24.3496161Z $L__tmp22: 2026-02-21T10:22:24.3496423Z .loc 1 51 126 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:51:126 2026-02-21T10:22:24.3496630Z add.s64 %rd848, %rd848, 384; 2026-02-21T10:22:24.3496696Z add.s32 %r42987, %r42987, 192; 2026-02-21T10:22:24.3496764Z setp.lt.u64 %p284, %rd79, 3936; 2026-02-21T10:22:24.3496823Z mov.b64 %rd849, %rd79; 2026-02-21T10:22:24.3496968Z @%p284 bra $L__BB0_11; 2026-02-21T10:22:24.3497080Z // %bb.12: // %.preheader269.preheader 2026-02-21T10:22:24.3497183Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:22:24.3497246Z add.s64 %rd81, %rd76, 16128; 2026-02-21T10:22:24.3497308Z add.s64 %rd82, %rd69, 16128; 2026-02-21T10:22:24.3497365Z add.s64 %rd83, %rd70, 16128; 2026-02-21T10:22:24.3497423Z add.s64 %rd84, %rd71, 16128; 2026-02-21T10:22:24.3497480Z add.s64 %rd85, 
%rd72, 16128; 2026-02-21T10:22:24.3497539Z add.s64 %rd86, %rd73, 16128; 2026-02-21T10:22:24.3497598Z add.s64 %rd87, %rd74, 16128; 2026-02-21T10:22:24.3497664Z add.s64 %rd88, %rd75, 16128; 2026-02-21T10:22:24.3497727Z mov.b64 %rd851, 4000; 2026-02-21T10:22:24.3497784Z mov.b64 %rd850, %rd11; 2026-02-21T10:22:24.3497879Z $L__BB0_13: // %.preheader269 2026-02-21T10:22:24.3497978Z // Parent Loop BB0_2 Depth=1 2026-02-21T10:22:24.3498087Z // => This Inner Loop Header: Depth=2 2026-02-21T10:22:24.3498368Z .loc 1 58 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:32 2026-02-21T10:22:24.3498438Z add.s64 %rd624, %rd850, %rd88; 2026-02-21T10:22:24.3498502Z add.s64 %rd627, %rd850, %rd87; 2026-02-21T10:22:24.3498561Z add.s64 %rd630, %rd850, %rd86; 2026-02-21T10:22:24.3498621Z add.s64 %rd633, %rd850, %rd85; 2026-02-21T10:22:24.3498682Z add.s64 %rd636, %rd850, %rd84; 2026-02-21T10:22:24.3498740Z add.s64 %rd639, %rd850, %rd83; 2026-02-21T10:22:24.3498799Z add.s64 %rd642, %rd850, %rd82; 2026-02-21T10:22:24.3498999Z .loc 1 58 80 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:80 2026-02-21T10:22:24.3499062Z add.s64 %rd645, %rd850, %rd81; 2026-02-21T10:22:24.3499121Z // begin inline asm 2026-02-21T10:22:24.3499179Z mov.u64 %rd623, 0x0; 2026-02-21T10:22:24.3499305Z createpolicy.fractional.L2::evict_first.b64 %rd623, 1.0; 2026-02-21T10:22:24.3499363Z // end inline asm 2026-02-21T10:22:24.3499419Z // begin inline asm 2026-02-21T10:22:24.3499477Z mov.u32 %r29818, 0x0; 2026-02-21T10:22:24.3499535Z mov.u32 %r29819, 0x0; 2026-02-21T10:22:24.3499602Z mov.u32 %r29820, 0x0; 2026-02-21T10:22:24.3499659Z mov.u32 %r29821, 0x0; 2026-02-21T10:22:24.3499899Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r29818, %r29819, %r29820, %r29821 }, [ %rd624 + 0 ], %rd623; 2026-02-21T10:22:24.3499954Z // end inline asm 2026-02-21T10:22:24.3500009Z // begin inline asm 2026-02-21T10:22:24.3500069Z mov.u64 %rd626, 0x0; 2026-02-21T10:22:24.3500260Z 
createpolicy.fractional.L2::evict_first.b64 %rd626, 1.0; 2026-02-21T10:22:24.3500315Z // end inline asm 2026-02-21T10:22:24.3500372Z // begin inline asm 2026-02-21T10:22:24.3500431Z mov.u32 %r29822, 0x0; 2026-02-21T10:22:24.3500487Z mov.u32 %r29823, 0x0; 2026-02-21T10:22:24.3500541Z mov.u32 %r29824, 0x0; 2026-02-21T10:22:24.3500598Z mov.u32 %r29825, 0x0; 2026-02-21T10:22:24.3500827Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r29822, %r29823, %r29824, %r29825 }, [ %rd627 + 0 ], %rd626; 2026-02-21T10:22:24.3500883Z // end inline asm 2026-02-21T10:22:24.3500954Z // begin inline asm 2026-02-21T10:22:24.3501013Z mov.u64 %rd629, 0x0; 2026-02-21T10:22:24.3501129Z createpolicy.fractional.L2::evict_first.b64 %rd629, 1.0; 2026-02-21T10:22:24.3501184Z // end inline asm 2026-02-21T10:22:24.3501243Z // begin inline asm 2026-02-21T10:22:24.3501298Z mov.u32 %r29826, 0x0; 2026-02-21T10:22:24.3501353Z mov.u32 %r29827, 0x0; 2026-02-21T10:22:24.3501409Z mov.u32 %r29828, 0x0; 2026-02-21T10:22:24.3501466Z mov.u32 %r29829, 0x0; 2026-02-21T10:22:24.3501771Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r29826, %r29827, %r29828, %r29829 }, [ %rd630 + 0 ], %rd629; 2026-02-21T10:22:24.3501832Z // end inline asm 2026-02-21T10:22:24.3501889Z // begin inline asm 2026-02-21T10:22:24.3501945Z mov.u64 %rd632, 0x0; 2026-02-21T10:22:24.3502061Z createpolicy.fractional.L2::evict_first.b64 %rd632, 1.0; 2026-02-21T10:22:24.3502169Z // end inline asm 2026-02-21T10:22:24.3502226Z // begin inline asm 2026-02-21T10:22:24.3502283Z mov.u32 %r29830, 0x0; 2026-02-21T10:22:24.3502342Z mov.u32 %r29831, 0x0; 2026-02-21T10:22:24.3502396Z mov.u32 %r29832, 0x0; 2026-02-21T10:22:24.3502451Z mov.u32 %r29833, 0x0; 2026-02-21T10:22:24.3502684Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r29830, %r29831, %r29832, %r29833 }, [ %rd633 + 0 ], %rd632; 2026-02-21T10:22:24.3502744Z // end inline asm 2026-02-21T10:22:24.3502802Z // begin inline asm 2026-02-21T10:22:24.3502858Z mov.u64 %rd635, 0x0; 
2026-02-21T10:22:24.3502986Z createpolicy.fractional.L2::evict_first.b64 %rd635, 1.0; 2026-02-21T10:22:24.3503041Z // end inline asm 2026-02-21T10:22:24.3503096Z // begin inline asm 2026-02-21T10:22:24.3503153Z mov.u32 %r29834, 0x0; 2026-02-21T10:22:24.3503208Z mov.u32 %r29835, 0x0; 2026-02-21T10:22:24.3503262Z mov.u32 %r29836, 0x0; 2026-02-21T10:22:24.3503317Z mov.u32 %r29837, 0x0; 2026-02-21T10:22:24.3503546Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r29834, %r29835, %r29836, %r29837 }, [ %rd636 + 0 ], %rd635; 2026-02-21T10:22:24.3503654Z // end inline asm 2026-02-21T10:22:24.3503713Z // begin inline asm 2026-02-21T10:22:24.3503772Z mov.u64 %rd638, 0x0; 2026-02-21T10:22:24.3503903Z createpolicy.fractional.L2::evict_first.b64 %rd638, 1.0; 2026-02-21T10:22:24.3503959Z // end inline asm 2026-02-21T10:22:24.3504019Z // begin inline asm 2026-02-21T10:22:24.3504075Z mov.u32 %r29838, 0x0; 2026-02-21T10:22:24.3504129Z mov.u32 %r29839, 0x0; 2026-02-21T10:22:24.3504184Z mov.u32 %r29840, 0x0; 2026-02-21T10:22:24.3504243Z mov.u32 %r29841, 0x0; 2026-02-21T10:22:24.3504470Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r29838, %r29839, %r29840, %r29841 }, [ %rd639 + 0 ], %rd638; 2026-02-21T10:22:24.3504524Z // end inline asm 2026-02-21T10:22:24.3504583Z // begin inline asm 2026-02-21T10:22:24.3504639Z mov.u64 %rd641, 0x0; 2026-02-21T10:22:24.3504752Z createpolicy.fractional.L2::evict_first.b64 %rd641, 1.0; 2026-02-21T10:22:24.3504809Z // end inline asm 2026-02-21T10:22:24.3504867Z // begin inline asm 2026-02-21T10:22:24.3504922Z mov.u32 %r29842, 0x0; 2026-02-21T10:22:24.3504979Z mov.u32 %r29843, 0x0; 2026-02-21T10:22:24.3505037Z mov.u32 %r29844, 0x0; 2026-02-21T10:22:24.3505091Z mov.u32 %r29845, 0x0; 2026-02-21T10:22:24.3505308Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r29842, %r29843, %r29844, %r29845 }, [ %rd642 + 0 ], %rd641; 2026-02-21T10:22:24.3505364Z // end inline asm 2026-02-21T10:22:24.3505419Z // begin inline asm 2026-02-21T10:22:24.3505475Z 
mov.u64 %rd644, 0x0; 2026-02-21T10:22:24.3505651Z createpolicy.fractional.L2::evict_first.b64 %rd644, 1.0; 2026-02-21T10:22:24.3505709Z // end inline asm 2026-02-21T10:22:24.3505766Z // begin inline asm 2026-02-21T10:22:24.3505821Z mov.u32 %r29846, 0x0; 2026-02-21T10:22:24.3505878Z mov.u32 %r29847, 0x0; 2026-02-21T10:22:24.3505932Z mov.u32 %r29848, 0x0; 2026-02-21T10:22:24.3505987Z mov.u32 %r29849, 0x0; 2026-02-21T10:22:24.3506219Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r29846, %r29847, %r29848, %r29849 }, [ %rd645 + 0 ], %rd644; 2026-02-21T10:22:24.3506276Z // end inline asm 2026-02-21T10:22:24.3506599Z .loc 1 62 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:62:32 2026-02-21T10:22:24.3506659Z bar.sync 0; 2026-02-21T10:22:24.3506747Z st.shared.v2.b32 [%r10], {%r29818, %r29819}; 2026-02-21T10:22:24.3506839Z st.shared.v2.b32 [%r10+2048], {%r29822, %r29823}; 2026-02-21T10:22:24.3506926Z st.shared.v2.b32 [%r10+4096], {%r29826, %r29827}; 2026-02-21T10:22:24.3507011Z st.shared.v2.b32 [%r10+6144], {%r29830, %r29831}; 2026-02-21T10:22:24.3507181Z st.shared.v2.b32 [%r10+8192], {%r29834, %r29835}; 2026-02-21T10:22:24.3507277Z st.shared.v2.b32 [%r10+10240], {%r29838, %r29839}; 2026-02-21T10:22:24.3507367Z st.shared.v2.b32 [%r10+12288], {%r29842, %r29843}; 2026-02-21T10:22:24.3507451Z st.shared.v2.b32 [%r10+14336], {%r29846, %r29847}; 2026-02-21T10:22:24.3507592Z st.shared.v2.b32 [%r11], {%r29820, %r29821}; 2026-02-21T10:22:24.3507676Z st.shared.v2.b32 [%r11+2048], {%r29824, %r29825}; 2026-02-21T10:22:24.3507762Z st.shared.v2.b32 [%r11+4096], {%r29828, %r29829}; 2026-02-21T10:22:24.3507843Z st.shared.v2.b32 [%r11+6144], {%r29832, %r29833}; 2026-02-21T10:22:24.3507924Z st.shared.v2.b32 [%r11+8192], {%r29836, %r29837}; 2026-02-21T10:22:24.3508009Z st.shared.v2.b32 [%r11+10240], {%r29840, %r29841}; 2026-02-21T10:22:24.3508095Z st.shared.v2.b32 [%r11+12288], {%r29844, %r29845}; 2026-02-21T10:22:24.3508178Z st.shared.v2.b32 [%r11+14336], {%r29848, 
%r29849}; 2026-02-21T10:22:24.3508237Z bar.sync 0; 2026-02-21T10:22:24.3508306Z ld.shared.b16 %rs2465, [%r52]; 2026-02-21T10:22:24.3508375Z ld.shared.b16 %rs2466, [%r52+1024]; 2026-02-21T10:22:24.3508513Z ld.shared.b16 %rs2467, [%r52+64]; 2026-02-21T10:22:24.3508586Z ld.shared.b16 %rs2468, [%r52+1088]; 2026-02-21T10:22:24.3508650Z ld.shared.b16 %rs2469, [%r52+8192]; 2026-02-21T10:22:24.3508716Z ld.shared.b16 %rs2470, [%r52+9216]; 2026-02-21T10:22:24.3508781Z ld.shared.b16 %rs2471, [%r52+8256]; 2026-02-21T10:22:24.3508843Z ld.shared.b16 %rs2472, [%r52+9280]; 2026-02-21T10:22:24.3508975Z ld.shared.b16 %rs2473, [%r53]; 2026-02-21T10:22:24.3509040Z ld.shared.b16 %rs2474, [%r53+1024]; 2026-02-21T10:22:24.3509108Z ld.shared.b16 %rs2475, [%r53+64]; 2026-02-21T10:22:24.3509171Z ld.shared.b16 %rs2476, [%r53+1088]; 2026-02-21T10:22:24.3509234Z ld.shared.b16 %rs2477, [%r53+8192]; 2026-02-21T10:22:24.3509301Z ld.shared.b16 %rs2478, [%r53+9216]; 2026-02-21T10:22:24.3509363Z ld.shared.b16 %rs2479, [%r53+8256]; 2026-02-21T10:22:24.3509429Z ld.shared.b16 %rs2480, [%r53+9280]; 2026-02-21T10:22:24.3509493Z ld.shared.b16 %rs2481, [%r54]; 2026-02-21T10:22:24.3509556Z ld.shared.b16 %rs2482, [%r54+1024]; 2026-02-21T10:22:24.3509617Z ld.shared.b16 %rs2483, [%r54+64]; 2026-02-21T10:22:24.3509680Z ld.shared.b16 %rs2484, [%r54+1088]; 2026-02-21T10:22:24.3509744Z ld.shared.b16 %rs2485, [%r54+8192]; 2026-02-21T10:22:24.3509809Z ld.shared.b16 %rs2486, [%r54+9216]; 2026-02-21T10:22:24.3509871Z ld.shared.b16 %rs2487, [%r54+8256]; 2026-02-21T10:22:24.3509942Z ld.shared.b16 %rs2488, [%r54+9280]; 2026-02-21T10:22:24.3510011Z ld.shared.b16 %rs2489, [%r55]; 2026-02-21T10:22:24.3510073Z ld.shared.b16 %rs2490, [%r55+1024]; 2026-02-21T10:22:24.3510134Z ld.shared.b16 %rs2491, [%r55+64]; 2026-02-21T10:22:24.3510198Z ld.shared.b16 %rs2492, [%r55+1088]; 2026-02-21T10:22:24.3510261Z ld.shared.b16 %rs2493, [%r55+8192]; 2026-02-21T10:22:24.3510322Z ld.shared.b16 %rs2494, [%r55+9216]; 
2026-02-21T10:22:24.3510460Z ld.shared.b16 %rs2495, [%r55+8256]; 2026-02-21T10:22:24.3510526Z ld.shared.b16 %rs2496, [%r55+9280]; 2026-02-21T10:22:24.3510589Z ld.shared.b16 %rs2497, [%r56]; 2026-02-21T10:22:24.3510651Z ld.shared.b16 %rs2498, [%r56+1024]; 2026-02-21T10:22:24.3510715Z ld.shared.b16 %rs2499, [%r56+64]; 2026-02-21T10:22:24.3510777Z ld.shared.b16 %rs2500, [%r56+1088]; 2026-02-21T10:22:24.3510842Z ld.shared.b16 %rs2501, [%r56+8192]; 2026-02-21T10:22:24.3510906Z ld.shared.b16 %rs2502, [%r56+9216]; 2026-02-21T10:22:24.3510968Z ld.shared.b16 %rs2503, [%r56+8256]; 2026-02-21T10:22:24.3511033Z ld.shared.b16 %rs2504, [%r56+9280]; 2026-02-21T10:22:24.3511101Z ld.shared.b16 %rs2505, [%r57]; 2026-02-21T10:22:24.3511164Z ld.shared.b16 %rs2506, [%r57+1024]; 2026-02-21T10:22:24.3511227Z ld.shared.b16 %rs2507, [%r57+64]; 2026-02-21T10:22:24.3511289Z ld.shared.b16 %rs2508, [%r57+1088]; 2026-02-21T10:22:24.3511354Z ld.shared.b16 %rs2509, [%r57+8192]; 2026-02-21T10:22:24.3511417Z ld.shared.b16 %rs2510, [%r57+9216]; 2026-02-21T10:22:24.3511532Z ld.shared.b16 %rs2511, [%r57+8256]; 2026-02-21T10:22:24.3511613Z ld.shared.b16 %rs2512, [%r57+9280]; 2026-02-21T10:22:24.3511675Z ld.shared.b16 %rs2513, [%r58]; 2026-02-21T10:22:24.3511739Z ld.shared.b16 %rs2514, [%r58+1024]; 2026-02-21T10:22:24.3511801Z ld.shared.b16 %rs2515, [%r58+64]; 2026-02-21T10:22:24.3511870Z ld.shared.b16 %rs2516, [%r58+1088]; 2026-02-21T10:22:24.3511983Z ld.shared.b16 %rs2517, [%r58+8192]; 2026-02-21T10:22:24.3512047Z ld.shared.b16 %rs2518, [%r58+9216]; 2026-02-21T10:22:24.3512115Z ld.shared.b16 %rs2519, [%r58+8256]; 2026-02-21T10:22:24.3512178Z ld.shared.b16 %rs2520, [%r58+9280]; 2026-02-21T10:22:24.3512239Z ld.shared.b16 %rs2521, [%r59]; 2026-02-21T10:22:24.3512303Z ld.shared.b16 %rs2522, [%r59+1024]; 2026-02-21T10:22:24.3512365Z ld.shared.b16 %rs2523, [%r59+64]; 2026-02-21T10:22:24.3512425Z ld.shared.b16 %rs2524, [%r59+1088]; 2026-02-21T10:22:24.3512486Z ld.shared.b16 %rs2525, [%r59+8192]; 
2026-02-21T10:22:24.3512553Z ld.shared.b16 %rs2526, [%r59+9216]; 2026-02-21T10:22:24.3512619Z ld.shared.b16 %rs2527, [%r59+8256]; 2026-02-21T10:22:24.3512681Z ld.shared.b16 %rs2528, [%r59+9280]; 2026-02-21T10:22:24.3512745Z cvt.f32.bf16 %r29987, %rs2465; 2026-02-21T10:22:24.3512806Z cvt.f32.bf16 %r29988, %rs2466; 2026-02-21T10:22:24.3512866Z cvt.f32.bf16 %r29989, %rs2473; 2026-02-21T10:22:24.3512929Z cvt.f32.bf16 %r29990, %rs2474; 2026-02-21T10:22:24.3512988Z cvt.f32.bf16 %r30119, %rs2481; 2026-02-21T10:22:24.3513046Z cvt.f32.bf16 %r30120, %rs2482; 2026-02-21T10:22:24.3513158Z cvt.f32.bf16 %r30121, %rs2489; 2026-02-21T10:22:24.3513226Z cvt.f32.bf16 %r30122, %rs2490; 2026-02-21T10:22:24.3513284Z cvt.f32.bf16 %r30251, %rs2497; 2026-02-21T10:22:24.3513353Z cvt.f32.bf16 %r30252, %rs2498; 2026-02-21T10:22:24.3513415Z cvt.f32.bf16 %r30253, %rs2505; 2026-02-21T10:22:24.3513474Z cvt.f32.bf16 %r30254, %rs2506; 2026-02-21T10:22:24.3513536Z cvt.f32.bf16 %r30383, %rs2513; 2026-02-21T10:22:24.3513595Z cvt.f32.bf16 %r30384, %rs2514; 2026-02-21T10:22:24.3513661Z cvt.f32.bf16 %r30385, %rs2521; 2026-02-21T10:22:24.3513719Z cvt.f32.bf16 %r30386, %rs2522; 2026-02-21T10:22:24.3513778Z cvt.f32.bf16 %r30515, %rs2467; 2026-02-21T10:22:24.3513838Z cvt.f32.bf16 %r30516, %rs2468; 2026-02-21T10:22:24.3513895Z cvt.f32.bf16 %r30517, %rs2475; 2026-02-21T10:22:24.3513954Z cvt.f32.bf16 %r30518, %rs2476; 2026-02-21T10:22:24.3514016Z cvt.f32.bf16 %r30647, %rs2483; 2026-02-21T10:22:24.3514075Z cvt.f32.bf16 %r30648, %rs2484; 2026-02-21T10:22:24.3514139Z cvt.f32.bf16 %r30649, %rs2491; 2026-02-21T10:22:24.3514197Z cvt.f32.bf16 %r30650, %rs2492; 2026-02-21T10:22:24.3514256Z cvt.f32.bf16 %r30779, %rs2499; 2026-02-21T10:22:24.3514315Z cvt.f32.bf16 %r30780, %rs2500; 2026-02-21T10:22:24.3514372Z cvt.f32.bf16 %r30781, %rs2507; 2026-02-21T10:22:24.3514433Z cvt.f32.bf16 %r30782, %rs2508; 2026-02-21T10:22:24.3514493Z cvt.f32.bf16 %r30911, %rs2515; 2026-02-21T10:22:24.3514551Z cvt.f32.bf16 %r30912, %rs2516; 
2026-02-21T10:22:24.3514666Z cvt.f32.bf16 %r30913, %rs2523; 2026-02-21T10:22:24.3514735Z cvt.f32.bf16 %r30914, %rs2524; 2026-02-21T10:22:24.3514800Z cvt.f32.bf16 %r31043, %rs2469; 2026-02-21T10:22:24.3514858Z cvt.f32.bf16 %r31044, %rs2470; 2026-02-21T10:22:24.3514916Z cvt.f32.bf16 %r31045, %rs2477; 2026-02-21T10:22:24.3514978Z cvt.f32.bf16 %r31046, %rs2478; 2026-02-21T10:22:24.3515038Z cvt.f32.bf16 %r31175, %rs2485; 2026-02-21T10:22:24.3515096Z cvt.f32.bf16 %r31176, %rs2486; 2026-02-21T10:22:24.3515156Z cvt.f32.bf16 %r31177, %rs2493; 2026-02-21T10:22:24.3515215Z cvt.f32.bf16 %r31178, %rs2494; 2026-02-21T10:22:24.3515273Z cvt.f32.bf16 %r31307, %rs2501; 2026-02-21T10:22:24.3515334Z cvt.f32.bf16 %r31308, %rs2502; 2026-02-21T10:22:24.3515392Z cvt.f32.bf16 %r31309, %rs2509; 2026-02-21T10:22:24.3515450Z cvt.f32.bf16 %r31310, %rs2510; 2026-02-21T10:22:24.3515508Z cvt.f32.bf16 %r31439, %rs2517; 2026-02-21T10:22:24.3515574Z cvt.f32.bf16 %r31440, %rs2518; 2026-02-21T10:22:24.3515632Z cvt.f32.bf16 %r31441, %rs2525; 2026-02-21T10:22:24.3515763Z cvt.f32.bf16 %r31442, %rs2526; 2026-02-21T10:22:24.3515826Z cvt.f32.bf16 %r31571, %rs2471; 2026-02-21T10:22:24.3515883Z cvt.f32.bf16 %r31572, %rs2472; 2026-02-21T10:22:24.3515940Z cvt.f32.bf16 %r31573, %rs2479; 2026-02-21T10:22:24.3515997Z cvt.f32.bf16 %r31574, %rs2480; 2026-02-21T10:22:24.3516059Z cvt.f32.bf16 %r31703, %rs2487; 2026-02-21T10:22:24.3516166Z cvt.f32.bf16 %r31704, %rs2488; 2026-02-21T10:22:24.3516223Z cvt.f32.bf16 %r31705, %rs2495; 2026-02-21T10:22:24.3516287Z cvt.f32.bf16 %r31706, %rs2496; 2026-02-21T10:22:24.3516346Z cvt.f32.bf16 %r31835, %rs2503; 2026-02-21T10:22:24.3516423Z cvt.f32.bf16 %r31836, %rs2504; 2026-02-21T10:22:24.3516609Z cvt.f32.bf16 %r31837, %rs2511; 2026-02-21T10:22:24.3516675Z cvt.f32.bf16 %r31838, %rs2512; 2026-02-21T10:22:24.3516734Z cvt.f32.bf16 %r31967, %rs2519; 2026-02-21T10:22:24.3516792Z cvt.f32.bf16 %r31968, %rs2520; 2026-02-21T10:22:24.3516852Z cvt.f32.bf16 %r31969, %rs2527; 
2026-02-21T10:22:24.3516913Z cvt.f32.bf16 %r31970, %rs2528; 2026-02-21T10:22:24.3517125Z .loc 1 64 33 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:64:33 2026-02-21T10:22:24.3517181Z bar.sync 0; 2026-02-21T10:22:24.3517239Z // begin inline asm 2026-02-21T10:22:24.3517336Z @%p313 mbarrier.init.shared::cta.b64 [%r29850], 1; 2026-02-21T10:22:24.3517395Z // end inline asm 2026-02-21T10:22:24.3517450Z bar.sync 0; 2026-02-21T10:22:24.3517507Z // begin inline asm 2026-02-21T10:22:24.3517721Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r29850], 4096; 2026-02-21T10:22:24.3517784Z // end inline asm 2026-02-21T10:22:24.3517841Z // begin inline asm 2026-02-21T10:22:24.3517917Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.3517972Z // end inline asm 2026-02-21T10:22:24.3518030Z bar.sync 0; 2026-02-21T10:22:24.3518094Z elect.sync %r32233|%p306, -1; 2026-02-21T10:22:24.3518161Z and.pred %p287, %p1, %p306; 2026-02-21T10:22:24.3518231Z add.s64 %rd851, %rd851, 32; 2026-02-21T10:22:24.3518301Z cvt.u32.u64 %r29854, %rd851; 2026-02-21T10:22:24.3518361Z // begin inline asm 2026-02-21T10:22:24.3518701Z @%p287 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39936], [%rd822, {%r29853, %r29854}], [%r29850]; 2026-02-21T10:22:24.3518757Z // end inline asm 2026-02-21T10:22:24.3518811Z bar.sync 0; 2026-02-21T10:22:24.3518870Z mov.b32 %r32101, 0; 2026-02-21T10:22:24.3518928Z // begin inline asm 2026-02-21T10:22:24.3518977Z 2026-02-21T10:22:24.3519025Z { 2026-02-21T10:22:24.3519089Z .reg .pred complete; 2026-02-21T10:22:24.3519142Z waitLoop: 2026-02-21T10:22:24.3519289Z mbarrier.try_wait.parity.shared.b64 complete, [%r29850], %r32101; 2026-02-21T10:22:24.3519356Z @!complete bra.uni waitLoop; 2026-02-21T10:22:24.3519406Z } 2026-02-21T10:22:24.3519410Z 2026-02-21T10:22:24.3519465Z // end inline asm 2026-02-21T10:22:24.3519518Z bar.sync 0; 2026-02-21T10:22:24.3519576Z // begin inline asm 2026-02-21T10:22:24.3519671Z @%p313 
mbarrier.inval.shared::cta.b64 [%r29850]; 2026-02-21T10:22:24.3519804Z // end inline asm 2026-02-21T10:22:24.3520007Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3520074Z ld.shared.s8 %rs2529, [%r20]; 2026-02-21T10:22:24.3520265Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3520331Z shl.b16 %rs2530, %rs2529, 4; 2026-02-21T10:22:24.3520527Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3520594Z ld.shared.s8 %rs2531, [%r21+128]; 2026-02-21T10:22:24.3520782Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3520844Z shl.b16 %rs2532, %rs2531, 4; 2026-02-21T10:22:24.3521031Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3521094Z ld.shared.s8 %rs2533, [%r22+256]; 2026-02-21T10:22:24.3521364Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3521429Z shl.b16 %rs2534, %rs2533, 4; 2026-02-21T10:22:24.3521621Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3521753Z ld.shared.s8 %rs2535, [%r23+384]; 2026-02-21T10:22:24.3521941Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3522001Z shl.b16 %rs2536, %rs2535, 4; 2026-02-21T10:22:24.3522190Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3524453Z ld.shared.s8 %rs2537, [%r24+512]; 2026-02-21T10:22:24.3524709Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3524778Z shl.b16 %rs2538, %rs2537, 4; 2026-02-21T10:22:24.3524988Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3525062Z ld.shared.s8 %rs2539, [%r25+640]; 2026-02-21T10:22:24.3525273Z .loc 1 67 28 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3525341Z shl.b16 %rs2540, %rs2539, 4; 2026-02-21T10:22:24.3525545Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3525614Z ld.shared.s8 %rs2541, [%r26+768]; 2026-02-21T10:22:24.3525901Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3525971Z shl.b16 %rs2542, %rs2541, 4; 2026-02-21T10:22:24.3526170Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3526236Z ld.shared.s8 %rs2543, [%r27+896]; 2026-02-21T10:22:24.3526426Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3526653Z shl.b16 %rs2544, %rs2543, 4; 2026-02-21T10:22:24.3526856Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3526927Z ld.shared.s8 %rs2545, [%r20+1024]; 2026-02-21T10:22:24.3527121Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3527187Z shl.b16 %rs2546, %rs2545, 4; 2026-02-21T10:22:24.3527380Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3527446Z ld.shared.s8 %rs2547, [%r21+1152]; 2026-02-21T10:22:24.3527636Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3527699Z shl.b16 %rs2548, %rs2547, 4; 2026-02-21T10:22:24.3527886Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3528036Z ld.shared.s8 %rs2549, [%r22+1280]; 2026-02-21T10:22:24.3528236Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3528301Z shl.b16 %rs2550, %rs2549, 4; 2026-02-21T10:22:24.3528491Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3528557Z ld.shared.s8 %rs2551, [%r23+1408]; 2026-02-21T10:22:24.3528747Z 
.loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3528806Z shl.b16 %rs2552, %rs2551, 4; 2026-02-21T10:22:24.3528991Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3529056Z ld.shared.s8 %rs2553, [%r24+1536]; 2026-02-21T10:22:24.3529241Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3529300Z shl.b16 %rs2554, %rs2553, 4; 2026-02-21T10:22:24.3529557Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3529632Z ld.shared.s8 %rs2555, [%r25+1664]; 2026-02-21T10:22:24.3529824Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3529948Z shl.b16 %rs2556, %rs2555, 4; 2026-02-21T10:22:24.3530138Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3530202Z ld.shared.s8 %rs2557, [%r26+1792]; 2026-02-21T10:22:24.3530392Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3530456Z shl.b16 %rs2558, %rs2557, 4; 2026-02-21T10:22:24.3530650Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3530717Z ld.shared.s8 %rs2559, [%r27+1920]; 2026-02-21T10:22:24.3530910Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3530974Z shl.b16 %rs2560, %rs2559, 4; 2026-02-21T10:22:24.3531161Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3531227Z ld.shared.s8 %rs2561, [%r20+2048]; 2026-02-21T10:22:24.3531415Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3531474Z shl.b16 %rs2562, %rs2561, 4; 2026-02-21T10:22:24.3531731Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3531800Z ld.shared.s8 %rs2563, [%r21+2176]; 
2026-02-21T10:22:24.3531985Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3532044Z shl.b16 %rs2564, %rs2563, 4; 2026-02-21T10:22:24.3532232Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3532300Z ld.shared.s8 %rs2565, [%r22+2304]; 2026-02-21T10:22:24.3532497Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3532562Z shl.b16 %rs2566, %rs2565, 4; 2026-02-21T10:22:24.3532758Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3532827Z ld.shared.s8 %rs2567, [%r23+2432]; 2026-02-21T10:22:24.3533023Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3533083Z shl.b16 %rs2568, %rs2567, 4; 2026-02-21T10:22:24.3533272Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3533337Z ld.shared.s8 %rs2569, [%r24+2560]; 2026-02-21T10:22:24.3533526Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3533640Z shl.b16 %rs2570, %rs2569, 4; 2026-02-21T10:22:24.3533829Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3533894Z ld.shared.s8 %rs2571, [%r25+2688]; 2026-02-21T10:22:24.3534081Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3534155Z shl.b16 %rs2572, %rs2571, 4; 2026-02-21T10:22:24.3534350Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3534414Z ld.shared.s8 %rs2573, [%r26+2816]; 2026-02-21T10:22:24.3534600Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3534661Z shl.b16 %rs2574, %rs2573, 4; 2026-02-21T10:22:24.3534854Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3534919Z ld.shared.s8 
%rs2575, [%r27+2944]; 2026-02-21T10:22:24.3535159Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3535226Z shl.b16 %rs2576, %rs2575, 4; 2026-02-21T10:22:24.3535421Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3535487Z ld.shared.s8 %rs2577, [%r20+3072]; 2026-02-21T10:22:24.3535729Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3535791Z shl.b16 %rs2578, %rs2577, 4; 2026-02-21T10:22:24.3535978Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3536044Z ld.shared.s8 %rs2579, [%r21+3200]; 2026-02-21T10:22:24.3536239Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3536299Z shl.b16 %rs2580, %rs2579, 4; 2026-02-21T10:22:24.3536619Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3536691Z ld.shared.s8 %rs2581, [%r22+3328]; 2026-02-21T10:22:24.3536882Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3536945Z shl.b16 %rs2582, %rs2581, 4; 2026-02-21T10:22:24.3537135Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3537199Z ld.shared.s8 %rs2583, [%r23+3456]; 2026-02-21T10:22:24.3537458Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3537523Z shl.b16 %rs2584, %rs2583, 4; 2026-02-21T10:22:24.3537710Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3537773Z ld.shared.s8 %rs2585, [%r24+3584]; 2026-02-21T10:22:24.3537960Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3538024Z shl.b16 %rs2586, %rs2585, 4; 2026-02-21T10:22:24.3538210Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 
2026-02-21T10:22:24.3538274Z ld.shared.s8 %rs2587, [%r25+3712]; 2026-02-21T10:22:24.3538461Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3538524Z shl.b16 %rs2588, %rs2587, 4; 2026-02-21T10:22:24.3538714Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3538787Z ld.shared.s8 %rs2589, [%r26+3840]; 2026-02-21T10:22:24.3538978Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3539038Z shl.b16 %rs2590, %rs2589, 4; 2026-02-21T10:22:24.3539232Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3539387Z ld.shared.s8 %rs2591, [%r27+3968]; 2026-02-21T10:22:24.3539585Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3539649Z shl.b16 %rs2592, %rs2591, 4; 2026-02-21T10:22:24.3539835Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3539900Z cvt.s16.s8 %rs2593, %rs2530; 2026-02-21T10:22:24.3539963Z shr.s16 %rs2594, %rs2593, 4; 2026-02-21T10:22:24.3540023Z cvt.s16.s8 %rs2595, %rs2532; 2026-02-21T10:22:24.3540084Z shr.s16 %rs2596, %rs2595, 4; 2026-02-21T10:22:24.3540143Z shr.s16 %rs2597, %rs2529, 4; 2026-02-21T10:22:24.3540204Z shr.s16 %rs2598, %rs2531, 4; 2026-02-21T10:22:24.3540392Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3540457Z cvt.rn.f32.s16 %r32234, %rs2598; 2026-02-21T10:22:24.3540526Z cvt.rn.f32.s16 %r32235, %rs2597; 2026-02-21T10:22:24.3540590Z cvt.rn.f32.s16 %r32236, %rs2596; 2026-02-21T10:22:24.3540716Z cvt.rn.f32.s16 %r32237, %rs2594; 2026-02-21T10:22:24.3540912Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3540973Z cvt.s16.s8 %rs2599, %rs2534; 2026-02-21T10:22:24.3541032Z shr.s16 %rs2600, %rs2599, 4; 2026-02-21T10:22:24.3541160Z cvt.s16.s8 %rs2601, %rs2536; 
2026-02-21T10:22:24.3541220Z shr.s16 %rs2602, %rs2601, 4; 2026-02-21T10:22:24.3541280Z shr.s16 %rs2603, %rs2533, 4; 2026-02-21T10:22:24.3541342Z shr.s16 %rs2604, %rs2535, 4; 2026-02-21T10:22:24.3541532Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3541593Z cvt.rn.f32.s16 %r32238, %rs2604; 2026-02-21T10:22:24.3541653Z cvt.rn.f32.s16 %r32239, %rs2603; 2026-02-21T10:22:24.3541716Z cvt.rn.f32.s16 %r32240, %rs2602; 2026-02-21T10:22:24.3541775Z cvt.rn.f32.s16 %r32241, %rs2600; 2026-02-21T10:22:24.3541963Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3542025Z cvt.s16.s8 %rs2605, %rs2538; 2026-02-21T10:22:24.3542087Z shr.s16 %rs2606, %rs2605, 4; 2026-02-21T10:22:24.3542145Z cvt.s16.s8 %rs2607, %rs2540; 2026-02-21T10:22:24.3542202Z shr.s16 %rs2608, %rs2607, 4; 2026-02-21T10:22:24.3542263Z shr.s16 %rs2609, %rs2537, 4; 2026-02-21T10:22:24.3542323Z shr.s16 %rs2610, %rs2539, 4; 2026-02-21T10:22:24.3542564Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3542635Z cvt.rn.f32.s16 %r32242, %rs2610; 2026-02-21T10:22:24.3542706Z cvt.rn.f32.s16 %r32243, %rs2609; 2026-02-21T10:22:24.3542768Z cvt.rn.f32.s16 %r32244, %rs2608; 2026-02-21T10:22:24.3542827Z cvt.rn.f32.s16 %r32245, %rs2606; 2026-02-21T10:22:24.3543018Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3543077Z cvt.s16.s8 %rs2611, %rs2542; 2026-02-21T10:22:24.3543140Z shr.s16 %rs2612, %rs2611, 4; 2026-02-21T10:22:24.3543201Z cvt.s16.s8 %rs2613, %rs2544; 2026-02-21T10:22:24.3543261Z shr.s16 %rs2614, %rs2613, 4; 2026-02-21T10:22:24.3543319Z shr.s16 %rs2615, %rs2541, 4; 2026-02-21T10:22:24.3543377Z shr.s16 %rs2616, %rs2543, 4; 2026-02-21T10:22:24.3543566Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3543630Z cvt.rn.f32.s16 %r32246, %rs2616; 2026-02-21T10:22:24.3543692Z 
cvt.rn.f32.s16 %r32247, %rs2615; 2026-02-21T10:22:24.3543755Z cvt.rn.f32.s16 %r32248, %rs2614; 2026-02-21T10:22:24.3543815Z cvt.rn.f32.s16 %r32249, %rs2612; 2026-02-21T10:22:24.3544001Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3544062Z cvt.s16.s8 %rs2617, %rs2546; 2026-02-21T10:22:24.3544121Z shr.s16 %rs2618, %rs2617, 4; 2026-02-21T10:22:24.3544179Z cvt.s16.s8 %rs2619, %rs2548; 2026-02-21T10:22:24.3544307Z shr.s16 %rs2620, %rs2619, 4; 2026-02-21T10:22:24.3544373Z shr.s16 %rs2621, %rs2545, 4; 2026-02-21T10:22:24.3544432Z shr.s16 %rs2622, %rs2547, 4; 2026-02-21T10:22:24.3544622Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3544686Z cvt.rn.f32.s16 %r32250, %rs2622; 2026-02-21T10:22:24.3544748Z cvt.rn.f32.s16 %r32251, %rs2621; 2026-02-21T10:22:24.3544808Z cvt.rn.f32.s16 %r32252, %rs2620; 2026-02-21T10:22:24.3544870Z cvt.rn.f32.s16 %r32253, %rs2618; 2026-02-21T10:22:24.3545062Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3545123Z cvt.s16.s8 %rs2623, %rs2550; 2026-02-21T10:22:24.3545183Z shr.s16 %rs2624, %rs2623, 4; 2026-02-21T10:22:24.3545245Z cvt.s16.s8 %rs2625, %rs2552; 2026-02-21T10:22:24.3545303Z shr.s16 %rs2626, %rs2625, 4; 2026-02-21T10:22:24.3545361Z shr.s16 %rs2627, %rs2549, 4; 2026-02-21T10:22:24.3545427Z shr.s16 %rs2628, %rs2551, 4; 2026-02-21T10:22:24.3545683Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3545746Z cvt.rn.f32.s16 %r32254, %rs2628; 2026-02-21T10:22:24.3545808Z cvt.rn.f32.s16 %r32255, %rs2627; 2026-02-21T10:22:24.3545870Z cvt.rn.f32.s16 %r32256, %rs2626; 2026-02-21T10:22:24.3545978Z cvt.rn.f32.s16 %r32257, %rs2624; 2026-02-21T10:22:24.3546167Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3546232Z cvt.s16.s8 %rs2629, %rs2554; 2026-02-21T10:22:24.3546291Z shr.s16 %rs2630, %rs2629, 4; 
2026-02-21T10:22:24.3546349Z cvt.s16.s8 %rs2631, %rs2556; 2026-02-21T10:22:24.3546409Z shr.s16 %rs2632, %rs2631, 4; 2026-02-21T10:22:24.3546625Z shr.s16 %rs2633, %rs2553, 4; 2026-02-21T10:22:24.3546690Z shr.s16 %rs2634, %rs2555, 4; 2026-02-21T10:22:24.3546881Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3546952Z cvt.rn.f32.s16 %r32258, %rs2634; 2026-02-21T10:22:24.3547013Z cvt.rn.f32.s16 %r32259, %rs2633; 2026-02-21T10:22:24.3547073Z cvt.rn.f32.s16 %r32260, %rs2632; 2026-02-21T10:22:24.3547137Z cvt.rn.f32.s16 %r32261, %rs2630; 2026-02-21T10:22:24.3547325Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3547388Z cvt.s16.s8 %rs2635, %rs2558; 2026-02-21T10:22:24.3547449Z shr.s16 %rs2636, %rs2635, 4; 2026-02-21T10:22:24.3547593Z cvt.s16.s8 %rs2637, %rs2560; 2026-02-21T10:22:24.3547657Z shr.s16 %rs2638, %rs2637, 4; 2026-02-21T10:22:24.3547715Z shr.s16 %rs2639, %rs2557, 4; 2026-02-21T10:22:24.3547776Z shr.s16 %rs2640, %rs2559, 4; 2026-02-21T10:22:24.3547964Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3548025Z cvt.rn.f32.s16 %r32262, %rs2640; 2026-02-21T10:22:24.3548089Z cvt.rn.f32.s16 %r32263, %rs2639; 2026-02-21T10:22:24.3548151Z cvt.rn.f32.s16 %r32264, %rs2638; 2026-02-21T10:22:24.3548213Z cvt.rn.f32.s16 %r32265, %rs2636; 2026-02-21T10:22:24.3548403Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3548547Z cvt.s16.s8 %rs2641, %rs2562; 2026-02-21T10:22:24.3548608Z shr.s16 %rs2642, %rs2641, 4; 2026-02-21T10:22:24.3548669Z cvt.s16.s8 %rs2643, %rs2564; 2026-02-21T10:22:24.3548730Z shr.s16 %rs2644, %rs2643, 4; 2026-02-21T10:22:24.3548789Z shr.s16 %rs2645, %rs2561, 4; 2026-02-21T10:22:24.3548847Z shr.s16 %rs2646, %rs2563, 4; 2026-02-21T10:22:24.3549040Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3549101Z 
cvt.rn.f32.s16 %r32266, %rs2646; 2026-02-21T10:22:24.3549162Z cvt.rn.f32.s16 %r32267, %rs2645; 2026-02-21T10:22:24.3549226Z cvt.rn.f32.s16 %r32268, %rs2644; 2026-02-21T10:22:24.3549292Z cvt.rn.f32.s16 %r32269, %rs2642; 2026-02-21T10:22:24.3549571Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3549636Z cvt.s16.s8 %rs2647, %rs2566; 2026-02-21T10:22:24.3549700Z shr.s16 %rs2648, %rs2647, 4; 2026-02-21T10:22:24.3549759Z cvt.s16.s8 %rs2649, %rs2568; 2026-02-21T10:22:24.3549818Z shr.s16 %rs2650, %rs2649, 4; 2026-02-21T10:22:24.3549881Z shr.s16 %rs2651, %rs2565, 4; 2026-02-21T10:22:24.3549940Z shr.s16 %rs2652, %rs2567, 4; 2026-02-21T10:22:24.3550135Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3550199Z cvt.rn.f32.s16 %r32270, %rs2652; 2026-02-21T10:22:24.3550262Z cvt.rn.f32.s16 %r32271, %rs2651; 2026-02-21T10:22:24.3550322Z cvt.rn.f32.s16 %r32272, %rs2650; 2026-02-21T10:22:24.3550394Z cvt.rn.f32.s16 %r32273, %rs2648; 2026-02-21T10:22:24.3550590Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3550650Z cvt.s16.s8 %rs2653, %rs2570; 2026-02-21T10:22:24.3550777Z shr.s16 %rs2654, %rs2653, 4; 2026-02-21T10:22:24.3550842Z cvt.s16.s8 %rs2655, %rs2572; 2026-02-21T10:22:24.3550900Z shr.s16 %rs2656, %rs2655, 4; 2026-02-21T10:22:24.3550959Z shr.s16 %rs2657, %rs2569, 4; 2026-02-21T10:22:24.3551018Z shr.s16 %rs2658, %rs2571, 4; 2026-02-21T10:22:24.3551210Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3551335Z cvt.rn.f32.s16 %r32274, %rs2658; 2026-02-21T10:22:24.3551398Z cvt.rn.f32.s16 %r32275, %rs2657; 2026-02-21T10:22:24.3551461Z cvt.rn.f32.s16 %r32276, %rs2656; 2026-02-21T10:22:24.3551520Z cvt.rn.f32.s16 %r32277, %rs2654; 2026-02-21T10:22:24.3551707Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3551768Z cvt.s16.s8 %rs2659, 
%rs2574; 2026-02-21T10:22:24.3551826Z shr.s16 %rs2660, %rs2659, 4; 2026-02-21T10:22:24.3551885Z cvt.s16.s8 %rs2661, %rs2576; 2026-02-21T10:22:24.3551948Z shr.s16 %rs2662, %rs2661, 4; 2026-02-21T10:22:24.3552009Z shr.s16 %rs2663, %rs2573, 4; 2026-02-21T10:22:24.3552068Z shr.s16 %rs2664, %rs2575, 4; 2026-02-21T10:22:24.3552256Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3552320Z cvt.rn.f32.s16 %r32278, %rs2664; 2026-02-21T10:22:24.3552384Z cvt.rn.f32.s16 %r32279, %rs2663; 2026-02-21T10:22:24.3552444Z cvt.rn.f32.s16 %r32280, %rs2662; 2026-02-21T10:22:24.3552556Z cvt.rn.f32.s16 %r32281, %rs2660; 2026-02-21T10:22:24.3552748Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3552808Z cvt.s16.s8 %rs2665, %rs2578; 2026-02-21T10:22:24.3552866Z shr.s16 %rs2666, %rs2665, 4; 2026-02-21T10:22:24.3552934Z cvt.s16.s8 %rs2667, %rs2580; 2026-02-21T10:22:24.3552999Z shr.s16 %rs2668, %rs2667, 4; 2026-02-21T10:22:24.3553060Z shr.s16 %rs2669, %rs2577, 4; 2026-02-21T10:22:24.3553125Z shr.s16 %rs2670, %rs2579, 4; 2026-02-21T10:22:24.3553316Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3553379Z cvt.rn.f32.s16 %r32282, %rs2670; 2026-02-21T10:22:24.3553440Z cvt.rn.f32.s16 %r32283, %rs2669; 2026-02-21T10:22:24.3553503Z cvt.rn.f32.s16 %r32284, %rs2668; 2026-02-21T10:22:24.3553565Z cvt.rn.f32.s16 %r32285, %rs2666; 2026-02-21T10:22:24.3553753Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3553818Z cvt.s16.s8 %rs2671, %rs2582; 2026-02-21T10:22:24.3553878Z shr.s16 %rs2672, %rs2671, 4; 2026-02-21T10:22:24.3553936Z cvt.s16.s8 %rs2673, %rs2584; 2026-02-21T10:22:24.3553996Z shr.s16 %rs2674, %rs2673, 4; 2026-02-21T10:22:24.3554054Z shr.s16 %rs2675, %rs2581, 4; 2026-02-21T10:22:24.3554113Z shr.s16 %rs2676, %rs2583, 4; 2026-02-21T10:22:24.3554302Z .loc 1 87 32 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3554442Z cvt.rn.f32.s16 %r32286, %rs2676; 2026-02-21T10:22:24.3554504Z cvt.rn.f32.s16 %r32287, %rs2675; 2026-02-21T10:22:24.3554565Z cvt.rn.f32.s16 %r32288, %rs2674; 2026-02-21T10:22:24.3554627Z cvt.rn.f32.s16 %r32289, %rs2672; 2026-02-21T10:22:24.3554818Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3554881Z cvt.s16.s8 %rs2677, %rs2586; 2026-02-21T10:22:24.3554942Z shr.s16 %rs2678, %rs2677, 4; 2026-02-21T10:22:24.3555004Z cvt.s16.s8 %rs2679, %rs2588; 2026-02-21T10:22:24.3555063Z shr.s16 %rs2680, %rs2679, 4; 2026-02-21T10:22:24.3555123Z shr.s16 %rs2681, %rs2585, 4; 2026-02-21T10:22:24.3555185Z shr.s16 %rs2682, %rs2587, 4; 2026-02-21T10:22:24.3555373Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3555437Z cvt.rn.f32.s16 %r32290, %rs2682; 2026-02-21T10:22:24.3555500Z cvt.rn.f32.s16 %r32291, %rs2681; 2026-02-21T10:22:24.3555615Z cvt.rn.f32.s16 %r32292, %rs2680; 2026-02-21T10:22:24.3555677Z cvt.rn.f32.s16 %r32293, %rs2678; 2026-02-21T10:22:24.3555864Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3555927Z cvt.s16.s8 %rs2683, %rs2590; 2026-02-21T10:22:24.3555986Z shr.s16 %rs2684, %rs2683, 4; 2026-02-21T10:22:24.3556094Z cvt.s16.s8 %rs2685, %rs2592; 2026-02-21T10:22:24.3556158Z shr.s16 %rs2686, %rs2685, 4; 2026-02-21T10:22:24.3556226Z shr.s16 %rs2687, %rs2589, 4; 2026-02-21T10:22:24.3556284Z shr.s16 %rs2688, %rs2591, 4; 2026-02-21T10:22:24.3556597Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3556665Z cvt.rn.f32.s16 %r32294, %rs2688; 2026-02-21T10:22:24.3556727Z cvt.rn.f32.s16 %r32295, %rs2687; 2026-02-21T10:22:24.3556787Z cvt.rn.f32.s16 %r32296, %rs2686; 2026-02-21T10:22:24.3556850Z cvt.rn.f32.s16 %r32297, %rs2684; 2026-02-21T10:22:24.3556912Z bar.sync 0; 2026-02-21T10:22:24.3557039Z 
st.shared.v4.b32 [%r28], {%r32237, %r32235, %r32236, %r32234}; 2026-02-21T10:22:24.3557179Z st.shared.v4.b32 [%r28+16384], {%r32269, %r32267, %r32268, %r32266}; 2026-02-21T10:22:24.3557292Z st.shared.v4.b32 [%r29], {%r32241, %r32239, %r32240, %r32238}; 2026-02-21T10:22:24.3557847Z [3337s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T10:22:24.3559259Z Config: @helion.kernel(config=helion.Config(block_sizes=[32, 128, 128], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['first', 'first'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=32, num_stages=4, num_warps=4, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[None, True], range_num_stages=[2, 3], range_unroll_factors=[3, 3], range_warp_specializes=[]), static_shapes=True) 2026-02-21T10:22:24.3559411Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T10:22:24.3559475Z `ptxas` stderr: 2026-02-21T10:22:24.3559947Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 1166 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T10:22:24.3560052Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:22:24.3560062Z 2026-02-21T10:22:24.3560572Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpy47jx0jh.ptx -o /tmp/tmpy47jx0jh.ptx.o 2026-02-21T10:22:24.3560580Z 2026-02-21T10:22:24.3560730Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T10:22:24.3560861Z st.shared.v4.b32 [%r29+16384], {%r32273, %r32271, %r32272, %r32270}; 2026-02-21T10:22:24.3560976Z st.shared.v4.b32 [%r30], {%r32245, %r32243, %r32244, %r32242}; 2026-02-21T10:22:24.3561098Z st.shared.v4.b32 [%r30+16384], {%r32277, %r32275, %r32276, %r32274}; 2026-02-21T10:22:24.3561277Z st.shared.v4.b32 [%r31], {%r32249, %r32247, %r32248, %r32246}; 2026-02-21T10:22:24.3561394Z st.shared.v4.b32 [%r31+16384], {%r32281, %r32279, %r32280, %r32278}; 2026-02-21T10:22:24.3561501Z st.shared.v4.b32 [%r32], {%r32253, %r32251, %r32252, %r32250}; 2026-02-21T10:22:24.3561615Z st.shared.v4.b32 [%r32+16384], {%r32285, %r32283, %r32284, %r32282}; 2026-02-21T10:22:24.3561723Z st.shared.v4.b32 [%r33], {%r32257, %r32255, %r32256, %r32254}; 2026-02-21T10:22:24.3561840Z st.shared.v4.b32 [%r33+16384], {%r32289, %r32287, %r32288, %r32286}; 2026-02-21T10:22:24.3561947Z st.shared.v4.b32 [%r34], {%r32261, %r32259, %r32260, %r32258}; 2026-02-21T10:22:24.3562059Z st.shared.v4.b32 [%r34+16384], {%r32293, %r32291, %r32292, %r32290}; 2026-02-21T10:22:24.3562166Z st.shared.v4.b32 [%r35], {%r32265, %r32263, %r32264, %r32262}; 2026-02-21T10:22:24.3562280Z st.shared.v4.b32 [%r35+16384], {%r32297, %r32295, %r32296, %r32294}; 2026-02-21T10:22:24.3562335Z $L__tmp23: 2026-02-21T10:22:24.3562677Z .loc 2 291 36 // standard.py:291:36 @[ c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:94:40 ] 2026-02-21T10:22:24.3562745Z // begin inline asm 2026-02-21T10:22:24.3562829Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.3562885Z // end inline asm 2026-02-21T10:22:24.3562949Z bar.sync 0; 2026-02-21T10:22:24.3563088Z wgmma.fence.sync.aligned; 2026-02-21T10:22:24.3563153Z mov.pred %p289, -1; 2026-02-21T10:22:24.3563212Z // begin inline asm 2026-02-21T10:22:24.3564711Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r29987,%r29988,%r29989,%r29990}, %rd3, %p289, 1, 1; 2026-02-21T10:22:24.3564772Z // end inline asm 2026-02-21T10:22:24.3564832Z // begin inline asm 2026-02-21T10:22:24.3566367Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r30119,%r30120,%r30121,%r30122}, %rd4, %p289, 1, 1; 2026-02-21T10:22:24.3566432Z // end inline asm 2026-02-21T10:22:24.3566614Z // begin inline asm 2026-02-21T10:22:24.3568102Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r30251,%r30252,%r30253,%r30254}, %rd5, %p289, 1, 1; 2026-02-21T10:22:24.3568166Z // end inline asm 2026-02-21T10:22:24.3568224Z // begin inline asm 2026-02-21T10:22:24.3569699Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r30383,%r30384,%r30385,%r30386}, %rd6, %p289, 1, 1; 2026-02-21T10:22:24.3569835Z // end inline asm 2026-02-21T10:22:24.3569899Z // begin inline asm 2026-02-21T10:22:24.3571382Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r30515,%r30516,%r30517,%r30518}, %rd7, %p289, 1, 1; 2026-02-21T10:22:24.3571502Z // end inline asm 2026-02-21T10:22:24.3571564Z // begin inline asm 2026-02-21T10:22:24.3573058Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r30647,%r30648,%r30649,%r30650}, %rd8, %p289, 1, 1; 2026-02-21T10:22:24.3573180Z // end inline asm 2026-02-21T10:22:24.3573237Z // begin inline asm 2026-02-21T10:22:24.3574719Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r30779,%r30780,%r30781,%r30782}, %rd9, %p289, 1, 1; 2026-02-21T10:22:24.3574845Z // end inline asm 2026-02-21T10:22:24.3574906Z // begin inline asm 2026-02-21T10:22:24.3576391Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051}, {%r30911,%r30912,%r30913,%r30914}, %rd10, %p289, 1, 1; 2026-02-21T10:22:24.3576561Z // end inline asm 2026-02-21T10:22:24.3576627Z // begin inline asm 2026-02-21T10:22:24.3578113Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r31043,%r31044,%r31045,%r31046}, %rd3, %p289, 1, 1; 2026-02-21T10:22:24.3578253Z // end inline asm 2026-02-21T10:22:24.3578315Z // begin inline asm 2026-02-21T10:22:24.3579790Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r31175,%r31176,%r31177,%r31178}, %rd4, %p289, 1, 1; 2026-02-21T10:22:24.3579853Z // end inline asm 2026-02-21T10:22:24.3579911Z // begin inline asm 2026-02-21T10:22:24.3581454Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r31307,%r31308,%r31309,%r31310}, %rd5, %p289, 1, 1; 2026-02-21T10:22:24.3581575Z // end inline asm 2026-02-21T10:22:24.3581635Z // begin inline asm 2026-02-21T10:22:24.3583131Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r31439,%r31440,%r31441,%r31442}, %rd6, %p289, 1, 1; 2026-02-21T10:22:24.3583190Z // end inline asm 2026-02-21T10:22:24.3583247Z // begin inline asm 2026-02-21T10:22:24.3584799Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r31571,%r31572,%r31573,%r31574}, %rd7, %p289, 1, 1; 2026-02-21T10:22:24.3584860Z // end inline asm 2026-02-21T10:22:24.3584917Z // begin inline asm 2026-02-21T10:22:24.3586417Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r31703,%r31704,%r31705,%r31706}, %rd8, %p289, 1, 1; 2026-02-21T10:22:24.3586588Z // end inline asm 2026-02-21T10:22:24.3586653Z // begin inline asm 2026-02-21T10:22:24.3588138Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r31835,%r31836,%r31837,%r31838}, %rd9, %p289, 1, 1; 2026-02-21T10:22:24.3588271Z // end inline asm 2026-02-21T10:22:24.3588344Z // begin inline asm 2026-02-21T10:22:24.3589975Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115}, {%r31967,%r31968,%r31969,%r31970}, %rd10, %p289, 1, 1; 2026-02-21T10:22:24.3590046Z // end inline asm 2026-02-21T10:22:24.3590125Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:24.3590187Z mov.b32 %r32100, %r32101; 2026-02-21T10:22:24.3590249Z mov.b32 %r32099, %r39936; 2026-02-21T10:22:24.3590401Z // begin inline asm 2026-02-21T10:22:24.3592942Z // wait for regs: 
%r42988,%r42989,%r42990,%r42991,%r42992,%r42993,%r42994,%r42995,%r42996,%r42997,%r42998,%r42999,%r43000,%r43001,%r43002,%r43003,%r43004,%r43005,%r43006,%r43007,%r43008,%r43009,%r43010,%r43011,%r43012,%r43013,%r43014,%r43015,%r43016,%r43017,%r43018,%r43019,%r43020,%r43021,%r43022,%r43023,%r43024,%r43025,%r43026,%r43027,%r43028,%r43029,%r43030,%r43031,%r43032,%r43033,%r43034,%r43035,%r43036,%r43037,%r43038,%r43039,%r43040,%r43041,%r43042,%r43043,%r43044,%r43045,%r43046,%r43047,%r43048,%r43049,%r43050,%r43051,%r43052,%r43053,%r43054,%r43055,%r43056,%r43057,%r43058,%r43059,%r43060,%r43061,%r43062,%r43063,%r43064,%r43065,%r43066,%r43067,%r43068,%r43069,%r43070,%r43071,%r43072,%r43073,%r43074,%r43075,%r43076,%r43077,%r43078,%r43079,%r43080,%r43081,%r43082,%r43083,%r43084,%r43085,%r43086,%r43087,%r43088,%r43089,%r43090,%r43091,%r43092,%r43093,%r43094,%r43095,%r43096,%r43097,%r43098,%r43099,%r43100,%r43101,%r43102,%r43103,%r43104,%r43105,%r43106,%r43107,%r43108,%r43109,%r43110,%r43111,%r43112,%r43113,%r43114,%r43115,%r32099,%r32100,%r32101 2026-02-21T10:22:24.3593027Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:24.3593143Z // end inline asm 2026-02-21T10:22:24.3593200Z $L__tmp24: 2026-02-21T10:22:24.3593421Z .loc 1 51 126 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:51:126 2026-02-21T10:22:24.3593487Z add.s64 %rd850, %rd850, 128; 2026-02-21T10:22:24.3593556Z setp.lt.u64 %p307, %rd851, 4064; 2026-02-21T10:22:24.3593634Z @%p307 bra $L__BB0_13; 2026-02-21T10:22:24.3593747Z // %bb.14: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:22:24.3593954Z .loc 1 97 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:97:28 2026-02-21T10:22:24.3594042Z cvt.rn.bf16x2.f32 %r32301, %r42989, %r42988; 2026-02-21T10:22:24.3594119Z cvt.rn.bf16x2.f32 %r32302, %r42991, %r42990; 2026-02-21T10:22:24.3594195Z cvt.rn.bf16x2.f32 %r32303, %r42993, %r42992; 2026-02-21T10:22:24.3594272Z cvt.rn.bf16x2.f32 %r32304, %r42995, %r42994; 2026-02-21T10:22:24.3594349Z 
cvt.rn.bf16x2.f32 %r32305, %r42997, %r42996; 2026-02-21T10:22:24.3594427Z cvt.rn.bf16x2.f32 %r32306, %r42999, %r42998; 2026-02-21T10:22:24.3594503Z cvt.rn.bf16x2.f32 %r32307, %r43001, %r43000; 2026-02-21T10:22:24.3594579Z cvt.rn.bf16x2.f32 %r32308, %r43003, %r43002; 2026-02-21T10:22:24.3594654Z cvt.rn.bf16x2.f32 %r32309, %r43005, %r43004; 2026-02-21T10:22:24.3594728Z cvt.rn.bf16x2.f32 %r32310, %r43007, %r43006; 2026-02-21T10:22:24.3594803Z cvt.rn.bf16x2.f32 %r32311, %r43009, %r43008; 2026-02-21T10:22:24.3594879Z cvt.rn.bf16x2.f32 %r32312, %r43011, %r43010; 2026-02-21T10:22:24.3595008Z cvt.rn.bf16x2.f32 %r32313, %r43013, %r43012; 2026-02-21T10:22:24.3595082Z cvt.rn.bf16x2.f32 %r32314, %r43015, %r43014; 2026-02-21T10:22:24.3595161Z cvt.rn.bf16x2.f32 %r32315, %r43017, %r43016; 2026-02-21T10:22:24.3595234Z cvt.rn.bf16x2.f32 %r32316, %r43019, %r43018; 2026-02-21T10:22:24.3595307Z cvt.rn.bf16x2.f32 %r32317, %r43021, %r43020; 2026-02-21T10:22:24.3595385Z cvt.rn.bf16x2.f32 %r32318, %r43023, %r43022; 2026-02-21T10:22:24.3595459Z cvt.rn.bf16x2.f32 %r32319, %r43025, %r43024; 2026-02-21T10:22:24.3595534Z cvt.rn.bf16x2.f32 %r32320, %r43027, %r43026; 2026-02-21T10:22:24.3595611Z cvt.rn.bf16x2.f32 %r32321, %r43029, %r43028; 2026-02-21T10:22:24.3595684Z cvt.rn.bf16x2.f32 %r32322, %r43031, %r43030; 2026-02-21T10:22:24.3595768Z cvt.rn.bf16x2.f32 %r32323, %r43033, %r43032; 2026-02-21T10:22:24.3595846Z cvt.rn.bf16x2.f32 %r32324, %r43035, %r43034; 2026-02-21T10:22:24.3595924Z cvt.rn.bf16x2.f32 %r32325, %r43037, %r43036; 2026-02-21T10:22:24.3596053Z cvt.rn.bf16x2.f32 %r32326, %r43039, %r43038; 2026-02-21T10:22:24.3596130Z cvt.rn.bf16x2.f32 %r32327, %r43041, %r43040; 2026-02-21T10:22:24.3596214Z cvt.rn.bf16x2.f32 %r32328, %r43043, %r43042; 2026-02-21T10:22:24.3596291Z cvt.rn.bf16x2.f32 %r32329, %r43045, %r43044; 2026-02-21T10:22:24.3596365Z cvt.rn.bf16x2.f32 %r32330, %r43047, %r43046; 2026-02-21T10:22:24.3596611Z cvt.rn.bf16x2.f32 %r32331, %r43049, %r43048; 2026-02-21T10:22:24.3596693Z 
cvt.rn.bf16x2.f32 %r32332, %r43051, %r43050; 2026-02-21T10:22:24.3596769Z cvt.rn.bf16x2.f32 %r32333, %r43053, %r43052; 2026-02-21T10:22:24.3596845Z cvt.rn.bf16x2.f32 %r32334, %r43055, %r43054; 2026-02-21T10:22:24.3596923Z cvt.rn.bf16x2.f32 %r32335, %r43057, %r43056; 2026-02-21T10:22:24.3596997Z cvt.rn.bf16x2.f32 %r32336, %r43059, %r43058; 2026-02-21T10:22:24.3597070Z cvt.rn.bf16x2.f32 %r32337, %r43061, %r43060; 2026-02-21T10:22:24.3597145Z cvt.rn.bf16x2.f32 %r32338, %r43063, %r43062; 2026-02-21T10:22:24.3597220Z cvt.rn.bf16x2.f32 %r32339, %r43065, %r43064; 2026-02-21T10:22:24.3597297Z cvt.rn.bf16x2.f32 %r32340, %r43067, %r43066; 2026-02-21T10:22:24.3597374Z cvt.rn.bf16x2.f32 %r32341, %r43069, %r43068; 2026-02-21T10:22:24.3597449Z cvt.rn.bf16x2.f32 %r32342, %r43071, %r43070; 2026-02-21T10:22:24.3597524Z cvt.rn.bf16x2.f32 %r32343, %r43073, %r43072; 2026-02-21T10:22:24.3597597Z cvt.rn.bf16x2.f32 %r32344, %r43075, %r43074; 2026-02-21T10:22:24.3597675Z cvt.rn.bf16x2.f32 %r32345, %r43077, %r43076; 2026-02-21T10:22:24.3597748Z cvt.rn.bf16x2.f32 %r32346, %r43079, %r43078; 2026-02-21T10:22:24.3597895Z cvt.rn.bf16x2.f32 %r32347, %r43081, %r43080; 2026-02-21T10:22:24.3597974Z cvt.rn.bf16x2.f32 %r32348, %r43083, %r43082; 2026-02-21T10:22:24.3598048Z cvt.rn.bf16x2.f32 %r32349, %r43085, %r43084; 2026-02-21T10:22:24.3598123Z cvt.rn.bf16x2.f32 %r32350, %r43087, %r43086; 2026-02-21T10:22:24.3598197Z cvt.rn.bf16x2.f32 %r32351, %r43089, %r43088; 2026-02-21T10:22:24.3598271Z cvt.rn.bf16x2.f32 %r32352, %r43091, %r43090; 2026-02-21T10:22:24.3598349Z cvt.rn.bf16x2.f32 %r32353, %r43093, %r43092; 2026-02-21T10:22:24.3598424Z cvt.rn.bf16x2.f32 %r32354, %r43095, %r43094; 2026-02-21T10:22:24.3598501Z cvt.rn.bf16x2.f32 %r32355, %r43097, %r43096; 2026-02-21T10:22:24.3598585Z cvt.rn.bf16x2.f32 %r32356, %r43099, %r43098; 2026-02-21T10:22:24.3598661Z cvt.rn.bf16x2.f32 %r32357, %r43101, %r43100; 2026-02-21T10:22:24.3598740Z cvt.rn.bf16x2.f32 %r32358, %r43103, %r43102; 2026-02-21T10:22:24.3598816Z 
cvt.rn.bf16x2.f32 %r32359, %r43105, %r43104; 2026-02-21T10:22:24.3598891Z cvt.rn.bf16x2.f32 %r32360, %r43107, %r43106; 2026-02-21T10:22:24.3598966Z cvt.rn.bf16x2.f32 %r32361, %r43109, %r43108; 2026-02-21T10:22:24.3599041Z cvt.rn.bf16x2.f32 %r32362, %r43111, %r43110; 2026-02-21T10:22:24.3599114Z cvt.rn.bf16x2.f32 %r32363, %r43113, %r43112; 2026-02-21T10:22:24.3599186Z cvt.rn.bf16x2.f32 %r32364, %r43115, %r43114; 2026-02-21T10:22:24.3599387Z .loc 1 98 43 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:98:43 2026-02-21T10:22:24.3599515Z bar.sync 0; 2026-02-21T10:22:24.3599712Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r36], {%r32301, %r32302, %r32303, %r32304}; 2026-02-21T10:22:24.3599897Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r37], {%r32317, %r32318, %r32319, %r32320}; 2026-02-21T10:22:24.3600079Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r38], {%r32333, %r32334, %r32335, %r32336}; 2026-02-21T10:22:24.3600260Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r39], {%r32349, %r32350, %r32351, %r32352}; 2026-02-21T10:22:24.3600443Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r40], {%r32305, %r32306, %r32307, %r32308}; 2026-02-21T10:22:24.3600621Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r41], {%r32321, %r32322, %r32323, %r32324}; 2026-02-21T10:22:24.3600798Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r42], {%r32337, %r32338, %r32339, %r32340}; 2026-02-21T10:22:24.3600977Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r43], {%r32353, %r32354, %r32355, %r32356}; 2026-02-21T10:22:24.3601222Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r44], {%r32309, %r32310, %r32311, %r32312}; 2026-02-21T10:22:24.3601406Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r45], {%r32325, %r32326, %r32327, %r32328}; 2026-02-21T10:22:24.3601585Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r46], {%r32341, %r32342, %r32343, %r32344}; 2026-02-21T10:22:24.3601765Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r47], {%r32357, %r32358, %r32359, %r32360}; 2026-02-21T10:22:24.3602008Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r48], {%r32313, %r32314, %r32315, %r32316}; 2026-02-21T10:22:24.3602198Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r49], {%r32329, %r32330, %r32331, %r32332}; 2026-02-21T10:22:24.3602382Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r50], {%r32345, %r32346, %r32347, %r32348}; 2026-02-21T10:22:24.3602559Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r51], {%r32361, %r32362, %r32363, %r32364}; 2026-02-21T10:22:24.3602618Z // begin inline asm 2026-02-21T10:22:24.3602699Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.3602758Z // end inline asm 2026-02-21T10:22:24.3602814Z bar.sync 0; 2026-02-21T10:22:24.3602883Z elect.sync %r32365|%p310, -1; 2026-02-21T10:22:24.3602950Z and.pred %p308, %p405, %p310; 2026-02-21T10:22:24.3603014Z or.b32 %r32298, %r29853, %r641; 2026-02-21T10:22:24.3603072Z // begin inline asm 2026-02-21T10:22:24.3603308Z @%p308 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd497, {%r32298, %r32299}], [%r22280]; 2026-02-21T10:22:24.3603366Z // end inline asm 2026-02-21T10:22:24.3603439Z cp.async.bulk.commit_group; 2026-02-21T10:22:24.3603571Z cp.async.bulk.wait_group.read 0; 2026-02-21T10:22:24.3603628Z bar.sync 0; 2026-02-21T10:22:24.3603825Z .loc 1 31 88 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:31:88 2026-02-21T10:22:24.3603900Z add.s32 %r42472, %r42472, 3; 2026-02-21T10:22:24.3603974Z setp.lt.s32 %p311, %r42472, %r43244; 2026-02-21T10:22:24.3604035Z @%p311 bra $L__BB0_2; 2026-02-21T10:22:24.3604132Z $L__BB0_15: // %.preheader268 2026-02-21T10:22:24.3604205Z setp.ge.s32 %p312, %r43244, %r3; 2026-02-21T10:22:24.3604266Z @%p312 bra $L__BB0_22; 2026-02-21T10:22:24.3604351Z // %bb.16: // %.lr.ph279 2026-02-21T10:22:24.3604552Z .loc 1 0 88 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:0:88 2026-02-21T10:22:24.3604616Z shl.b32 %r32367, %r42456, 4; 2026-02-21T10:22:24.3604680Z xor.b32 %r32369, %r32367, %r42457; 2026-02-21T10:22:24.3604744Z add.s32 %r70, %r39936, %r32369; 
2026-02-21T10:22:24.3604809Z xor.b32 %r32371, %r32369, 8; 2026-02-21T10:22:24.3604880Z add.s32 %r71, %r39936, %r32371; 2026-02-21T10:22:24.3604943Z and.b32 %r32373, %r42458, 6144; 2026-02-21T10:22:24.3605008Z and.b32 %r32375, %r42459, 896; 2026-02-21T10:22:24.3605069Z and.b32 %r32377, %r42460, 62; 2026-02-21T10:22:24.3605130Z or.b32 %r32378, %r32375, %r32377; 2026-02-21T10:22:24.3605191Z or.b32 %r32379, %r32378, %r32373; 2026-02-21T10:22:24.3605305Z add.s32 %r72, %r39936, %r32379; 2026-02-21T10:22:24.3605366Z xor.b32 %r32380, %r32379, 8; 2026-02-21T10:22:24.3605426Z add.s32 %r73, %r39936, %r32380; 2026-02-21T10:22:24.3605488Z xor.b32 %r32381, %r32379, 16; 2026-02-21T10:22:24.3605547Z add.s32 %r74, %r39936, %r32381; 2026-02-21T10:22:24.3605604Z xor.b32 %r32382, %r32379, 24; 2026-02-21T10:22:24.3605667Z add.s32 %r75, %r39936, %r32382; 2026-02-21T10:22:24.3605724Z xor.b32 %r32383, %r32379, 32; 2026-02-21T10:22:24.3605784Z add.s32 %r76, %r39936, %r32383; 2026-02-21T10:22:24.3605844Z xor.b32 %r32384, %r32379, 40; 2026-02-21T10:22:24.3605906Z add.s32 %r77, %r39936, %r32384; 2026-02-21T10:22:24.3605965Z xor.b32 %r32385, %r32379, 48; 2026-02-21T10:22:24.3606026Z add.s32 %r78, %r39936, %r32385; 2026-02-21T10:22:24.3606092Z xor.b32 %r32386, %r32379, 56; 2026-02-21T10:22:24.3606151Z add.s32 %r79, %r39936, %r32386; 2026-02-21T10:22:24.3606210Z add.s32 %r80, %r39936, %r42456; 2026-02-21T10:22:24.3606268Z xor.b32 %r32387, %r42456, 16; 2026-02-21T10:22:24.3606380Z add.s32 %r81, %r39936, %r32387; 2026-02-21T10:22:24.3606562Z xor.b32 %r32388, %r42456, 32; 2026-02-21T10:22:24.3606638Z add.s32 %r82, %r39936, %r32388; 2026-02-21T10:22:24.3606702Z xor.b32 %r32389, %r42456, 48; 2026-02-21T10:22:24.3606761Z add.s32 %r83, %r39936, %r32389; 2026-02-21T10:22:24.3606820Z xor.b32 %r32390, %r42456, 64; 2026-02-21T10:22:24.3606957Z add.s32 %r84, %r39936, %r32390; 2026-02-21T10:22:24.3607027Z xor.b32 %r32391, %r42456, 80; 2026-02-21T10:22:24.3607086Z add.s32 %r85, %r39936, %r32391; 
2026-02-21T10:22:24.3607147Z xor.b32 %r32392, %r42456, 96; 2026-02-21T10:22:24.3607208Z add.s32 %r86, %r39936, %r32392; 2026-02-21T10:22:24.3607268Z xor.b32 %r32393, %r42456, 112; 2026-02-21T10:22:24.3607327Z add.s32 %r87, %r39936, %r32393; 2026-02-21T10:22:24.3607387Z shl.b32 %r32394, %r42456, 7; 2026-02-21T10:22:24.3607446Z or.b32 %r32396, %r32394, %r42461; 2026-02-21T10:22:24.3607506Z add.s32 %r88, %r39936, %r32396; 2026-02-21T10:22:24.3607566Z xor.b32 %r32397, %r32396, 16; 2026-02-21T10:22:24.3607630Z add.s32 %r89, %r39936, %r32397; 2026-02-21T10:22:24.3607690Z xor.b32 %r32398, %r32396, 32; 2026-02-21T10:22:24.3607748Z add.s32 %r90, %r39936, %r32398; 2026-02-21T10:22:24.3607808Z xor.b32 %r32399, %r32396, 48; 2026-02-21T10:22:24.3607866Z add.s32 %r91, %r39936, %r32399; 2026-02-21T10:22:24.3607928Z xor.b32 %r32400, %r32396, 64; 2026-02-21T10:22:24.3607988Z add.s32 %r92, %r39936, %r32400; 2026-02-21T10:22:24.3608048Z xor.b32 %r32401, %r32396, 80; 2026-02-21T10:22:24.3608177Z add.s32 %r93, %r39936, %r32401; 2026-02-21T10:22:24.3608239Z xor.b32 %r32402, %r32396, 96; 2026-02-21T10:22:24.3608300Z add.s32 %r94, %r39936, %r32402; 2026-02-21T10:22:24.3608359Z xor.b32 %r32403, %r32396, 112; 2026-02-21T10:22:24.3608417Z add.s32 %r95, %r39936, %r32403; 2026-02-21T10:22:24.3608491Z bfe.u32 %r32404, %r39936, 4, 14; 2026-02-21T10:22:24.3608556Z cvt.u64.u32 %rd665, %r32404; 2026-02-21T10:22:24.3608632Z or.b64 %rd12, %rd665, 4611686293372403712; 2026-02-21T10:22:24.3608699Z add.s32 %r32405, %r39936, 32; 2026-02-21T10:22:24.3608764Z bfe.u32 %r32406, %r32405, 4, 14; 2026-02-21T10:22:24.3608825Z cvt.u64.u32 %rd666, %r32406; 2026-02-21T10:22:24.3608896Z or.b64 %rd13, %rd666, 4611686293372403712; 2026-02-21T10:22:24.3608956Z add.s32 %r32407, %r39936, 64; 2026-02-21T10:22:24.3609020Z bfe.u32 %r32408, %r32407, 4, 14; 2026-02-21T10:22:24.3609081Z cvt.u64.u32 %rd667, %r32408; 2026-02-21T10:22:24.3609150Z or.b64 %rd14, %rd667, 4611686293372403712; 2026-02-21T10:22:24.3609215Z add.s32 
%r32409, %r39936, 96; 2026-02-21T10:22:24.3609273Z bfe.u32 %r32410, %r32409, 4, 14; 2026-02-21T10:22:24.3609332Z cvt.u64.u32 %rd668, %r32410; 2026-02-21T10:22:24.3609404Z or.b64 %rd15, %rd668, 4611686293372403712; 2026-02-21T10:22:24.3609464Z add.s32 %r32411, %r39936, 16384; 2026-02-21T10:22:24.3609522Z bfe.u32 %r32412, %r32411, 4, 14; 2026-02-21T10:22:24.3609584Z cvt.u64.u32 %rd669, %r32412; 2026-02-21T10:22:24.3609652Z or.b64 %rd16, %rd669, 4611686293372403712; 2026-02-21T10:22:24.3609795Z add.s32 %r32413, %r39936, 16416; 2026-02-21T10:22:24.3609855Z bfe.u32 %r32414, %r32413, 4, 14; 2026-02-21T10:22:24.3609918Z cvt.u64.u32 %rd670, %r32414; 2026-02-21T10:22:24.3609987Z or.b64 %rd17, %rd670, 4611686293372403712; 2026-02-21T10:22:24.3610046Z add.s32 %r32415, %r39936, 16448; 2026-02-21T10:22:24.3610111Z bfe.u32 %r32416, %r32415, 4, 14; 2026-02-21T10:22:24.3610171Z cvt.u64.u32 %rd671, %r32416; 2026-02-21T10:22:24.3610239Z or.b64 %rd18, %rd671, 4611686293372403712; 2026-02-21T10:22:24.3610300Z add.s32 %r32417, %r39936, 16480; 2026-02-21T10:22:24.3610361Z bfe.u32 %r32418, %r32417, 4, 14; 2026-02-21T10:22:24.3610421Z cvt.u64.u32 %rd672, %r32418; 2026-02-21T10:22:24.3610489Z or.b64 %rd19, %rd672, 4611686293372403712; 2026-02-21T10:22:24.3610556Z and.b32 %r32420, %r42462, 1920; 2026-02-21T10:22:24.3610625Z or.b32 %r32422, %r32420, %r42461; 2026-02-21T10:22:24.3610690Z xor.b32 %r32423, %r32422, %r42463; 2026-02-21T10:22:24.3610753Z or.b32 %r32424, %r32423, %r32373; 2026-02-21T10:22:24.3610901Z add.s32 %r96, %r39936, %r32424; 2026-02-21T10:22:24.3610967Z add.s32 %r97, %r96, 16384; 2026-02-21T10:22:24.3611027Z add.s32 %r98, %r96, 8192; 2026-02-21T10:22:24.3611089Z add.s32 %r99, %r96, 24576; 2026-02-21T10:22:24.3611147Z xor.b32 %r32425, %r32424, 32; 2026-02-21T10:22:24.3611257Z add.s32 %r100, %r39936, %r32425; 2026-02-21T10:22:24.3611321Z add.s32 %r101, %r100, 16384; 2026-02-21T10:22:24.3611381Z add.s32 %r102, %r100, 8192; 2026-02-21T10:22:24.3611441Z add.s32 %r103, %r100, 
24576; 2026-02-21T10:22:24.3611500Z xor.b32 %r32426, %r32424, 64; 2026-02-21T10:22:24.3611564Z add.s32 %r104, %r39936, %r32426; 2026-02-21T10:22:24.3611622Z add.s32 %r105, %r104, 16384; 2026-02-21T10:22:24.3611681Z add.s32 %r106, %r104, 8192; 2026-02-21T10:22:24.3611741Z add.s32 %r107, %r104, 24576; 2026-02-21T10:22:24.3611800Z xor.b32 %r32427, %r32424, 96; 2026-02-21T10:22:24.3611859Z add.s32 %r108, %r39936, %r32427; 2026-02-21T10:22:24.3611921Z add.s32 %r109, %r108, 16384; 2026-02-21T10:22:24.3611996Z add.s32 %r110, %r108, 8192; 2026-02-21T10:22:24.3612057Z add.s32 %r111, %r108, 24576; 2026-02-21T10:22:24.3612258Z .loc 1 31 88 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:31:88 2026-02-21T10:22:24.3612333Z mad.wide.u32 %rd20, %r8, 16, %rd117; 2026-02-21T10:22:24.3612541Z .loc 1 51 126 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:51:126 2026-02-21T10:22:24.3612654Z or.b32 %r32429, %r42471, %r9; 2026-02-21T10:22:24.3612718Z or.b32 %r120, %r32429, 128; 2026-02-21T10:22:24.3612828Z $L__BB0_17: // =>This Loop Header: Depth=1 2026-02-21T10:22:24.3612935Z // Child Loop BB0_18 Depth 2 2026-02-21T10:22:24.3613027Z // Child Loop BB0_20 Depth 2 2026-02-21T10:22:24.3613225Z .loc 1 37 35 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:37:35 2026-02-21T10:22:24.3613291Z shr.s32 %r32431, %r43244, 31; 2026-02-21T10:22:24.3613349Z shr.u32 %r32432, %r32431, 18; 2026-02-21T10:22:24.3613414Z add.s32 %r32433, %r43244, %r32432; 2026-02-21T10:22:24.3613472Z shr.s32 %r32434, %r32433, 14; 2026-02-21T10:22:24.3613661Z .loc 1 38 33 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:38:33 2026-02-21T10:22:24.3613725Z shl.b32 %r32435, %r32434, 5; 2026-02-21T10:22:24.3613917Z .loc 1 39 39 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:39:39 2026-02-21T10:22:24.3613976Z sub.s32 %r32436, 10, %r32435; 2026-02-21T10:22:24.3614167Z .loc 1 39 52 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:39:52 2026-02-21T10:22:24.3614225Z 
min.s32 %r32437, %r32436, 32; 2026-02-21T10:22:24.3614412Z .loc 1 40 45 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:40:45 2026-02-21T10:22:24.3614527Z and.b32 %r32438, %r32433, -16384; 2026-02-21T10:22:24.3614592Z sub.s32 %r32439, %r43244, %r32438; 2026-02-21T10:22:24.3614781Z .loc 1 41 51 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:41:51 2026-02-21T10:22:24.3614842Z div.s32 %r32440, %r32439, %r32437; 2026-02-21T10:22:24.3615034Z .loc 1 40 64 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:40:64 2026-02-21T10:22:24.3615104Z mul.lo.s32 %r32441, %r32440, %r32437; 2026-02-21T10:22:24.3615167Z sub.s32 %r32442, %r32439, %r32441; 2026-02-21T10:22:24.3615357Z .loc 1 40 30 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:40:30 2026-02-21T10:22:24.3615416Z add.s32 %r32443, %r32442, %r32435; 2026-02-21T10:22:24.3615614Z .loc 1 42 27 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:42:27 2026-02-21T10:22:24.3615676Z shl.b32 %r39937, %r32443, 7; 2026-02-21T10:22:24.3615927Z .loc 1 43 27 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:43:27 2026-02-21T10:22:24.3615991Z shl.b32 %r42384, %r32440, 7; 2026-02-21T10:22:24.3616189Z .loc 1 51 126 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:51:126 2026-02-21T10:22:24.3616252Z or.b32 %r32444, %r42464, %r42384; 2026-02-21T10:22:24.3616312Z shl.b32 %r32445, %r32444, 13; 2026-02-21T10:22:24.3616425Z mul.wide.s32 %rd93, %r32445, 2; 2026-02-21T10:22:24.3616615Z or.b32 %r32446, %r42465, %r42384; 2026-02-21T10:22:24.3616680Z shl.b32 %r32447, %r32446, 13; 2026-02-21T10:22:24.3616744Z mul.wide.s32 %rd94, %r32447, 2; 2026-02-21T10:22:24.3616803Z or.b32 %r32448, %r42466, %r42384; 2026-02-21T10:22:24.3616864Z shl.b32 %r32449, %r32448, 13; 2026-02-21T10:22:24.3616927Z mul.wide.s32 %rd95, %r32449, 2; 2026-02-21T10:22:24.3616986Z or.b32 %r32450, %r42467, %r42384; 2026-02-21T10:22:24.3617051Z shl.b32 %r32451, %r32450, 13; 2026-02-21T10:22:24.3617114Z mul.wide.s32 
%rd96, %r32451, 2; 2026-02-21T10:22:24.3617175Z or.b32 %r32452, %r42468, %r42384; 2026-02-21T10:22:24.3617236Z shl.b32 %r32453, %r32452, 13; 2026-02-21T10:22:24.3617298Z mul.wide.s32 %rd97, %r32453, 2; 2026-02-21T10:22:24.3617356Z or.b32 %r32454, %r42469, %r42384; 2026-02-21T10:22:24.3617414Z shl.b32 %r32455, %r32454, 13; 2026-02-21T10:22:24.3617479Z mul.wide.s32 %rd98, %r32455, 2; 2026-02-21T10:22:24.3617541Z shl.b32 %r32456, %r32440, 20; 2026-02-21T10:22:24.3617600Z or.b32 %r32457, %r42470, %r32456; 2026-02-21T10:22:24.3617664Z mul.wide.s32 %rd99, %r32457, 2; 2026-02-21T10:22:24.3617798Z or.b32 %r43245, %r120, %r32456; 2026-02-21T10:22:24.3617861Z or.b32 %r32458, %r42471, %r32456; 2026-02-21T10:22:24.3617935Z mul.wide.s32 %rd100, %r32458, 2; 2026-02-21T10:22:24.3618000Z mov.b32 %r43246, 0f00000000; 2026-02-21T10:22:24.3618059Z mov.b64 %rd853, -96; 2026-02-21T10:22:24.3618119Z mov.b64 %rd852, %rd20; 2026-02-21T10:22:24.3618180Z mov.b32 %r43247, %r43246; 2026-02-21T10:22:24.3618238Z mov.b32 %r43248, %r43246; 2026-02-21T10:22:24.3618302Z mov.b32 %r43249, %r43246; 2026-02-21T10:22:24.3618360Z mov.b32 %r43250, %r43246; 2026-02-21T10:22:24.3618419Z mov.b32 %r43251, %r43246; 2026-02-21T10:22:24.3618477Z mov.b32 %r43252, %r43246; 2026-02-21T10:22:24.3618534Z mov.b32 %r43253, %r43246; 2026-02-21T10:22:24.3618594Z mov.b32 %r43254, %r43246; 2026-02-21T10:22:24.3618654Z mov.b32 %r43255, %r43246; 2026-02-21T10:22:24.3618711Z mov.b32 %r43256, %r43246; 2026-02-21T10:22:24.3618769Z mov.b32 %r43257, %r43246; 2026-02-21T10:22:24.3618830Z mov.b32 %r43258, %r43246; 2026-02-21T10:22:24.3618887Z mov.b32 %r43259, %r43246; 2026-02-21T10:22:24.3618944Z mov.b32 %r43260, %r43246; 2026-02-21T10:22:24.3619004Z mov.b32 %r43261, %r43246; 2026-02-21T10:22:24.3619062Z mov.b32 %r43262, %r43246; 2026-02-21T10:22:24.3619118Z mov.b32 %r43263, %r43246; 2026-02-21T10:22:24.3619177Z mov.b32 %r43264, %r43246; 2026-02-21T10:22:24.3619233Z mov.b32 %r43265, %r43246; 2026-02-21T10:22:24.3619289Z mov.b32 
%r43266, %r43246; 2026-02-21T10:22:24.3619439Z mov.b32 %r43267, %r43246; 2026-02-21T10:22:24.3619501Z mov.b32 %r43268, %r43246; 2026-02-21T10:22:24.3619557Z mov.b32 %r43269, %r43246; 2026-02-21T10:22:24.3619613Z mov.b32 %r43270, %r43246; 2026-02-21T10:22:24.3619672Z mov.b32 %r43271, %r43246; 2026-02-21T10:22:24.3619728Z mov.b32 %r43272, %r43246; 2026-02-21T10:22:24.3619787Z mov.b32 %r43273, %r43246; 2026-02-21T10:22:24.3619844Z mov.b32 %r43274, %r43246; 2026-02-21T10:22:24.3619904Z mov.b32 %r43275, %r43246; 2026-02-21T10:22:24.3619967Z mov.b32 %r43276, %r43246; 2026-02-21T10:22:24.3620025Z mov.b32 %r43277, %r43246; 2026-02-21T10:22:24.3620084Z mov.b32 %r43278, %r43246; 2026-02-21T10:22:24.3620140Z mov.b32 %r43279, %r43246; 2026-02-21T10:22:24.3620195Z mov.b32 %r43280, %r43246; 2026-02-21T10:22:24.3620251Z mov.b32 %r43281, %r43246; 2026-02-21T10:22:24.3620310Z mov.b32 %r43282, %r43246; 2026-02-21T10:22:24.3620366Z mov.b32 %r43283, %r43246; 2026-02-21T10:22:24.3620423Z mov.b32 %r43284, %r43246; 2026-02-21T10:22:24.3620553Z mov.b32 %r43285, %r43246; 2026-02-21T10:22:24.3620613Z mov.b32 %r43286, %r43246; 2026-02-21T10:22:24.3620671Z mov.b32 %r43287, %r43246; 2026-02-21T10:22:24.3620728Z mov.b32 %r43288, %r43246; 2026-02-21T10:22:24.3620796Z mov.b32 %r43289, %r43246; 2026-02-21T10:22:24.3620857Z mov.b32 %r43290, %r43246; 2026-02-21T10:22:24.3620976Z mov.b32 %r43291, %r43246; 2026-02-21T10:22:24.3621035Z mov.b32 %r43292, %r43246; 2026-02-21T10:22:24.3621091Z mov.b32 %r43293, %r43246; 2026-02-21T10:22:24.3621149Z mov.b32 %r43294, %r43246; 2026-02-21T10:22:24.3621206Z mov.b32 %r43295, %r43246; 2026-02-21T10:22:24.3621268Z mov.b32 %r43296, %r43246; 2026-02-21T10:22:24.3621324Z mov.b32 %r43297, %r43246; 2026-02-21T10:22:24.3621381Z mov.b32 %r43298, %r43246; 2026-02-21T10:22:24.3621440Z mov.b32 %r43299, %r43246; 2026-02-21T10:22:24.3621496Z mov.b32 %r43300, %r43246; 2026-02-21T10:22:24.3621553Z mov.b32 %r43301, %r43246; 2026-02-21T10:22:24.3621622Z mov.b32 %r43302, %r43246; 
2026-02-21T10:22:24.3621690Z mov.b32 %r43303, %r43246; 2026-02-21T10:22:24.3621747Z mov.b32 %r43304, %r43246; 2026-02-21T10:22:24.3621804Z mov.b32 %r43305, %r43246; 2026-02-21T10:22:24.3621863Z mov.b32 %r43306, %r43246; 2026-02-21T10:22:24.3621919Z mov.b32 %r43307, %r43246; 2026-02-21T10:22:24.3621976Z mov.b32 %r43308, %r43246; 2026-02-21T10:22:24.3622039Z mov.b32 %r43309, %r43246; 2026-02-21T10:22:24.3622096Z mov.b32 %r43310, %r43246; 2026-02-21T10:22:24.3622153Z mov.b32 %r43311, %r43246; 2026-02-21T10:22:24.3622263Z mov.b32 %r43312, %r43246; 2026-02-21T10:22:24.3622325Z mov.b32 %r43313, %r43246; 2026-02-21T10:22:24.3622381Z mov.b32 %r43314, %r43246; 2026-02-21T10:22:24.3622438Z mov.b32 %r43315, %r43246; 2026-02-21T10:22:24.3622498Z mov.b32 %r43316, %r43246; 2026-02-21T10:22:24.3622554Z mov.b32 %r43317, %r43246; 2026-02-21T10:22:24.3622610Z mov.b32 %r43318, %r43246; 2026-02-21T10:22:24.3622668Z mov.b32 %r43319, %r43246; 2026-02-21T10:22:24.3622727Z mov.b32 %r43320, %r43246; 2026-02-21T10:22:24.3622788Z mov.b32 %r43321, %r43246; 2026-02-21T10:22:24.3622846Z mov.b32 %r43322, %r43246; 2026-02-21T10:22:24.3622918Z mov.b32 %r43323, %r43246; 2026-02-21T10:22:24.3622977Z mov.b32 %r43324, %r43246; 2026-02-21T10:22:24.3623034Z mov.b32 %r43325, %r43246; 2026-02-21T10:22:24.3623090Z mov.b32 %r43326, %r43246; 2026-02-21T10:22:24.3623153Z mov.b32 %r43327, %r43246; 2026-02-21T10:22:24.3623211Z mov.b32 %r43328, %r43246; 2026-02-21T10:22:24.3623268Z mov.b32 %r43329, %r43246; 2026-02-21T10:22:24.3623328Z mov.b32 %r43330, %r43246; 2026-02-21T10:22:24.3623385Z mov.b32 %r43331, %r43246; 2026-02-21T10:22:24.3623442Z mov.b32 %r43332, %r43246; 2026-02-21T10:22:24.3623499Z mov.b32 %r43333, %r43246; 2026-02-21T10:22:24.3623558Z mov.b32 %r43334, %r43246; 2026-02-21T10:22:24.3623614Z mov.b32 %r43335, %r43246; 2026-02-21T10:22:24.3623670Z mov.b32 %r43336, %r43246; 2026-02-21T10:22:24.3623728Z mov.b32 %r43337, %r43246; 2026-02-21T10:22:24.3623786Z mov.b32 %r43338, %r43246; 
2026-02-21T10:22:24.3623902Z mov.b32 %r43339, %r43246; 2026-02-21T10:22:24.3623960Z mov.b32 %r43340, %r43246; 2026-02-21T10:22:24.3624018Z mov.b32 %r43341, %r43246; 2026-02-21T10:22:24.3624074Z mov.b32 %r43342, %r43246; 2026-02-21T10:22:24.3624131Z mov.b32 %r43343, %r43246; 2026-02-21T10:22:24.3624189Z mov.b32 %r43344, %r43246; 2026-02-21T10:22:24.3624247Z mov.b32 %r43345, %r43246; 2026-02-21T10:22:24.3624304Z mov.b32 %r43346, %r43246; 2026-02-21T10:22:24.3624361Z mov.b32 %r43347, %r43246; 2026-02-21T10:22:24.3624436Z mov.b32 %r43348, %r43246; 2026-02-21T10:22:24.3624497Z mov.b32 %r43349, %r43246; 2026-02-21T10:22:24.3624554Z mov.b32 %r43350, %r43246; 2026-02-21T10:22:24.3624615Z mov.b32 %r43351, %r43246; 2026-02-21T10:22:24.3624673Z mov.b32 %r43352, %r43246; 2026-02-21T10:22:24.3624730Z mov.b32 %r43353, %r43246; 2026-02-21T10:22:24.3624789Z mov.b32 %r43354, %r43246; 2026-02-21T10:22:24.3624845Z mov.b32 %r43355, %r43246; 2026-02-21T10:22:24.3624902Z mov.b32 %r43356, %r43246; 2026-02-21T10:22:24.3625012Z mov.b32 %r43357, %r43246; 2026-02-21T10:22:24.3625074Z mov.b32 %r43358, %r43246; 2026-02-21T10:22:24.3625132Z mov.b32 %r43359, %r43246; 2026-02-21T10:22:24.3625190Z mov.b32 %r43360, %r43246; 2026-02-21T10:22:24.3625249Z mov.b32 %r43361, %r43246; 2026-02-21T10:22:24.3625307Z mov.b32 %r43362, %r43246; 2026-02-21T10:22:24.3625411Z mov.b32 %r43363, %r43246; 2026-02-21T10:22:24.3625468Z mov.b32 %r43364, %r43246; 2026-02-21T10:22:24.3625526Z mov.b32 %r43365, %r43246; 2026-02-21T10:22:24.3625584Z mov.b32 %r43366, %r43246; 2026-02-21T10:22:24.3625642Z mov.b32 %r43367, %r43246; 2026-02-21T10:22:24.3625701Z mov.b32 %r43368, %r43246; 2026-02-21T10:22:24.3625758Z mov.b32 %r43369, %r43246; 2026-02-21T10:22:24.3625814Z mov.b32 %r43370, %r43246; 2026-02-21T10:22:24.3625871Z mov.b32 %r43371, %r43246; 2026-02-21T10:22:24.3625942Z mov.b32 %r43372, %r43246; 2026-02-21T10:22:24.3626001Z mov.b32 %r43373, %r43246; 2026-02-21T10:22:24.3626120Z $L__BB0_18: // Parent Loop BB0_17 Depth=1 
2026-02-21T10:22:24.3626231Z // => This Inner Loop Header: Depth=2 2026-02-21T10:22:24.3626427Z .loc 1 58 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:32 2026-02-21T10:22:24.3626608Z add.s64 %rd675, %rd852, %rd99; 2026-02-21T10:22:24.3626680Z add.s64 %rd678, %rd852, %rd98; 2026-02-21T10:22:24.3626742Z add.s64 %rd681, %rd852, %rd97; 2026-02-21T10:22:24.3626804Z add.s64 %rd684, %rd852, %rd96; 2026-02-21T10:22:24.3626945Z add.s64 %rd687, %rd852, %rd95; 2026-02-21T10:22:24.3627011Z add.s64 %rd690, %rd852, %rd94; 2026-02-21T10:22:24.3627071Z add.s64 %rd693, %rd852, %rd93; 2026-02-21T10:22:24.3627264Z .loc 1 58 80 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:80 2026-02-21T10:22:24.3627337Z add.s64 %rd696, %rd852, %rd100; 2026-02-21T10:22:24.3627401Z // begin inline asm 2026-02-21T10:22:24.3627460Z mov.u64 %rd674, 0x0; 2026-02-21T10:22:24.3627594Z createpolicy.fractional.L2::evict_first.b64 %rd674, 1.0; 2026-02-21T10:22:24.3627654Z // end inline asm 2026-02-21T10:22:24.3627712Z // begin inline asm 2026-02-21T10:22:24.3627768Z mov.u32 %r32459, 0x0; 2026-02-21T10:22:24.3627827Z mov.u32 %r32460, 0x0; 2026-02-21T10:22:24.3627894Z mov.u32 %r32461, 0x0; 2026-02-21T10:22:24.3627954Z mov.u32 %r32462, 0x0; 2026-02-21T10:22:24.3628195Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r32459, %r32460, %r32461, %r32462 }, [ %rd675 + 0 ], %rd674; 2026-02-21T10:22:24.3628255Z // end inline asm 2026-02-21T10:22:24.3628312Z // begin inline asm 2026-02-21T10:22:24.3628370Z mov.u64 %rd677, 0x0; 2026-02-21T10:22:24.3628575Z createpolicy.fractional.L2::evict_first.b64 %rd677, 1.0; 2026-02-21T10:22:24.3628632Z // end inline asm 2026-02-21T10:22:24.3628690Z // begin inline asm 2026-02-21T10:22:24.3628748Z mov.u32 %r32463, 0x0; 2026-02-21T10:22:24.3628803Z mov.u32 %r32464, 0x0; 2026-02-21T10:22:24.3628860Z mov.u32 %r32465, 0x0; 2026-02-21T10:22:24.3629022Z mov.u32 %r32466, 0x0; 2026-02-21T10:22:24.3629256Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { 
%r32463, %r32464, %r32465, %r32466 }, [ %rd678 + 0 ], %rd677; 2026-02-21T10:22:24.3629313Z // end inline asm 2026-02-21T10:22:24.3629373Z // begin inline asm 2026-02-21T10:22:24.3629434Z mov.u64 %rd680, 0x0; 2026-02-21T10:22:24.3629556Z createpolicy.fractional.L2::evict_first.b64 %rd680, 1.0; 2026-02-21T10:22:24.3629613Z // end inline asm 2026-02-21T10:22:24.3629673Z // begin inline asm 2026-02-21T10:22:24.3629732Z mov.u32 %r32467, 0x0; 2026-02-21T10:22:24.3629791Z mov.u32 %r32468, 0x0; 2026-02-21T10:22:24.3629849Z mov.u32 %r32469, 0x0; 2026-02-21T10:22:24.3629908Z mov.u32 %r32470, 0x0; 2026-02-21T10:22:24.3630134Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r32467, %r32468, %r32469, %r32470 }, [ %rd681 + 0 ], %rd680; 2026-02-21T10:22:24.3630190Z // end inline asm 2026-02-21T10:22:24.3630250Z // begin inline asm 2026-02-21T10:22:24.3630307Z mov.u64 %rd683, 0x0; 2026-02-21T10:22:24.3630494Z createpolicy.fractional.L2::evict_first.b64 %rd683, 1.0; 2026-02-21T10:22:24.3630555Z // end inline asm 2026-02-21T10:22:24.3630612Z // begin inline asm 2026-02-21T10:22:24.3630669Z mov.u32 %r32471, 0x0; 2026-02-21T10:22:24.3630724Z mov.u32 %r32472, 0x0; 2026-02-21T10:22:24.3630782Z mov.u32 %r32473, 0x0; 2026-02-21T10:22:24.3630899Z mov.u32 %r32474, 0x0; 2026-02-21T10:22:24.3631121Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r32471, %r32472, %r32473, %r32474 }, [ %rd684 + 0 ], %rd683; 2026-02-21T10:22:24.3631181Z // end inline asm 2026-02-21T10:22:24.3631239Z // begin inline asm 2026-02-21T10:22:24.3631296Z mov.u64 %rd686, 0x0; 2026-02-21T10:22:24.3631413Z createpolicy.fractional.L2::evict_first.b64 %rd686, 1.0; 2026-02-21T10:22:24.3631470Z // end inline asm 2026-02-21T10:22:24.3631526Z // begin inline asm 2026-02-21T10:22:24.3631584Z mov.u32 %r32475, 0x0; 2026-02-21T10:22:24.3631645Z mov.u32 %r32476, 0x0; 2026-02-21T10:22:24.3631701Z mov.u32 %r32477, 0x0; 2026-02-21T10:22:24.3631765Z mov.u32 %r32478, 0x0; 2026-02-21T10:22:24.3631988Z 
ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r32475, %r32476, %r32477, %r32478 }, [ %rd687 + 0 ], %rd686; 2026-02-21T10:22:24.3632043Z // end inline asm 2026-02-21T10:22:24.3632100Z // begin inline asm 2026-02-21T10:22:24.3632158Z mov.u64 %rd689, 0x0; 2026-02-21T10:22:24.3632277Z createpolicy.fractional.L2::evict_first.b64 %rd689, 1.0; 2026-02-21T10:22:24.3632333Z // end inline asm 2026-02-21T10:22:24.3632393Z // begin inline asm 2026-02-21T10:22:24.3632504Z mov.u32 %r32479, 0x0; 2026-02-21T10:22:24.3632564Z mov.u32 %r32480, 0x0; 2026-02-21T10:22:24.3632620Z mov.u32 %r32481, 0x0; 2026-02-21T10:22:24.3632677Z mov.u32 %r32482, 0x0; 2026-02-21T10:22:24.3632901Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r32479, %r32480, %r32481, %r32482 }, [ %rd690 + 0 ], %rd689; 2026-02-21T10:22:24.3632955Z // end inline asm 2026-02-21T10:22:24.3633012Z // begin inline asm 2026-02-21T10:22:24.3633074Z mov.u64 %rd692, 0x0; 2026-02-21T10:22:24.3633202Z createpolicy.fractional.L2::evict_first.b64 %rd692, 1.0; 2026-02-21T10:22:24.3633259Z // end inline asm 2026-02-21T10:22:24.3633320Z // begin inline asm 2026-02-21T10:22:24.3633377Z mov.u32 %r32483, 0x0; 2026-02-21T10:22:24.3633432Z mov.u32 %r32484, 0x0; 2026-02-21T10:22:24.3633488Z mov.u32 %r32485, 0x0; 2026-02-21T10:22:24.3633550Z mov.u32 %r32486, 0x0; 2026-02-21T10:22:24.3633781Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r32483, %r32484, %r32485, %r32486 }, [ %rd693 + 0 ], %rd692; 2026-02-21T10:22:24.3633841Z // end inline asm 2026-02-21T10:22:24.3633903Z // begin inline asm 2026-02-21T10:22:24.3633961Z mov.u64 %rd695, 0x0; 2026-02-21T10:22:24.3634083Z createpolicy.fractional.L2::evict_first.b64 %rd695, 1.0; 2026-02-21T10:22:24.3634141Z // end inline asm 2026-02-21T10:22:24.3634198Z // begin inline asm 2026-02-21T10:22:24.3634256Z mov.u32 %r32487, 0x0; 2026-02-21T10:22:24.3634311Z mov.u32 %r32488, 0x0; 2026-02-21T10:22:24.3634436Z mov.u32 %r32489, 0x0; 2026-02-21T10:22:24.3634495Z mov.u32 %r32490, 0x0; 
2026-02-21T10:22:24.3634727Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r32487, %r32488, %r32489, %r32490 }, [ %rd696 + 0 ], %rd695; 2026-02-21T10:22:24.3634786Z // end inline asm 2026-02-21T10:22:24.3634993Z .loc 1 62 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:62:32 2026-02-21T10:22:24.3635052Z bar.sync 0; 2026-02-21T10:22:24.3635138Z st.shared.v2.b32 [%r70], {%r32459, %r32460}; 2026-02-21T10:22:24.3635230Z st.shared.v2.b32 [%r70+2048], {%r32463, %r32464}; 2026-02-21T10:22:24.3635315Z st.shared.v2.b32 [%r70+4096], {%r32467, %r32468}; 2026-02-21T10:22:24.3635399Z st.shared.v2.b32 [%r70+6144], {%r32471, %r32472}; 2026-02-21T10:22:24.3635483Z st.shared.v2.b32 [%r70+8192], {%r32475, %r32476}; 2026-02-21T10:22:24.3635574Z st.shared.v2.b32 [%r70+10240], {%r32479, %r32480}; 2026-02-21T10:22:24.3635659Z st.shared.v2.b32 [%r70+12288], {%r32483, %r32484}; 2026-02-21T10:22:24.3635798Z st.shared.v2.b32 [%r70+14336], {%r32487, %r32488}; 2026-02-21T10:22:24.3635879Z st.shared.v2.b32 [%r71], {%r32461, %r32462}; 2026-02-21T10:22:24.3635963Z st.shared.v2.b32 [%r71+2048], {%r32465, %r32466}; 2026-02-21T10:22:24.3636046Z st.shared.v2.b32 [%r71+4096], {%r32469, %r32470}; 2026-02-21T10:22:24.3636128Z st.shared.v2.b32 [%r71+6144], {%r32473, %r32474}; 2026-02-21T10:22:24.3636265Z st.shared.v2.b32 [%r71+8192], {%r32477, %r32478}; 2026-02-21T10:22:24.3636352Z st.shared.v2.b32 [%r71+10240], {%r32481, %r32482}; 2026-02-21T10:22:24.3636444Z st.shared.v2.b32 [%r71+12288], {%r32485, %r32486}; 2026-02-21T10:22:24.3636657Z st.shared.v2.b32 [%r71+14336], {%r32489, %r32490}; 2026-02-21T10:22:24.3636712Z bar.sync 0; 2026-02-21T10:22:24.3636783Z ld.shared.b16 %rs2689, [%r72]; 2026-02-21T10:22:24.3636853Z ld.shared.b16 %rs2690, [%r72+1024]; 2026-02-21T10:22:24.3636919Z ld.shared.b16 %rs2691, [%r72+64]; 2026-02-21T10:22:24.3636989Z ld.shared.b16 %rs2692, [%r72+1088]; 2026-02-21T10:22:24.3637057Z ld.shared.b16 %rs2693, [%r72+8192]; 2026-02-21T10:22:24.3637124Z ld.shared.b16 
%rs2694, [%r72+9216]; 2026-02-21T10:22:24.3637187Z ld.shared.b16 %rs2695, [%r72+8256]; 2026-02-21T10:22:24.3637254Z ld.shared.b16 %rs2696, [%r72+9280]; 2026-02-21T10:22:24.3637317Z ld.shared.b16 %rs2697, [%r73]; 2026-02-21T10:22:24.3637380Z ld.shared.b16 %rs2698, [%r73+1024]; 2026-02-21T10:22:24.3637450Z ld.shared.b16 %rs2699, [%r73+64]; 2026-02-21T10:22:24.3637514Z ld.shared.b16 %rs2700, [%r73+1088]; 2026-02-21T10:22:24.3637660Z ld.shared.b16 %rs2701, [%r73+8192]; 2026-02-21T10:22:24.3637731Z ld.shared.b16 %rs2702, [%r73+9216]; 2026-02-21T10:22:24.3637798Z ld.shared.b16 %rs2703, [%r73+8256]; 2026-02-21T10:22:24.3637862Z ld.shared.b16 %rs2704, [%r73+9280]; 2026-02-21T10:22:24.3637928Z ld.shared.b16 %rs2705, [%r74]; 2026-02-21T10:22:24.3637993Z ld.shared.b16 %rs2706, [%r74+1024]; 2026-02-21T10:22:24.3638058Z ld.shared.b16 %rs2707, [%r74+64]; 2026-02-21T10:22:24.3638120Z ld.shared.b16 %rs2708, [%r74+1088]; 2026-02-21T10:22:24.3638189Z ld.shared.b16 %rs2709, [%r74+8192]; 2026-02-21T10:22:24.3638255Z ld.shared.b16 %rs2710, [%r74+9216]; 2026-02-21T10:22:24.3638320Z ld.shared.b16 %rs2711, [%r74+8256]; 2026-02-21T10:22:24.3638384Z ld.shared.b16 %rs2712, [%r74+9280]; 2026-02-21T10:22:24.3638450Z ld.shared.b16 %rs2713, [%r75]; 2026-02-21T10:22:24.3638518Z ld.shared.b16 %rs2714, [%r75+1024]; 2026-02-21T10:22:24.3638583Z ld.shared.b16 %rs2715, [%r75+64]; 2026-02-21T10:22:24.3638649Z ld.shared.b16 %rs2716, [%r75+1088]; 2026-02-21T10:22:24.3638714Z ld.shared.b16 %rs2717, [%r75+8192]; 2026-02-21T10:22:24.3638776Z ld.shared.b16 %rs2718, [%r75+9216]; 2026-02-21T10:22:24.3638839Z ld.shared.b16 %rs2719, [%r75+8256]; 2026-02-21T10:22:24.3638903Z ld.shared.b16 %rs2720, [%r75+9280]; 2026-02-21T10:22:24.3638965Z ld.shared.b16 %rs2721, [%r76]; 2026-02-21T10:22:24.3639029Z ld.shared.b16 %rs2722, [%r76+1024]; 2026-02-21T10:22:24.3639100Z ld.shared.b16 %rs2723, [%r76+64]; 2026-02-21T10:22:24.3639244Z ld.shared.b16 %rs2724, [%r76+1088]; 2026-02-21T10:22:24.3639312Z ld.shared.b16 %rs2725, 
[%r76+8192]; 2026-02-21T10:22:24.3639377Z ld.shared.b16 %rs2726, [%r76+9216]; 2026-02-21T10:22:24.3639442Z ld.shared.b16 %rs2727, [%r76+8256]; 2026-02-21T10:22:24.3639505Z ld.shared.b16 %rs2728, [%r76+9280]; 2026-02-21T10:22:24.3639567Z ld.shared.b16 %rs2729, [%r77]; 2026-02-21T10:22:24.3639636Z ld.shared.b16 %rs2730, [%r77+1024]; 2026-02-21T10:22:24.3639700Z ld.shared.b16 %rs2731, [%r77+64]; 2026-02-21T10:22:24.3639767Z ld.shared.b16 %rs2732, [%r77+1088]; 2026-02-21T10:22:24.3639832Z ld.shared.b16 %rs2733, [%r77+8192]; 2026-02-21T10:22:24.3639895Z ld.shared.b16 %rs2734, [%r77+9216]; 2026-02-21T10:22:24.3639958Z ld.shared.b16 %rs2735, [%r77+8256]; 2026-02-21T10:22:24.3640021Z ld.shared.b16 %rs2736, [%r77+9280]; 2026-02-21T10:22:24.3640087Z ld.shared.b16 %rs2737, [%r78]; 2026-02-21T10:22:24.3640161Z ld.shared.b16 %rs2738, [%r78+1024]; 2026-02-21T10:22:24.3640228Z ld.shared.b16 %rs2739, [%r78+64]; 2026-02-21T10:22:24.3640362Z ld.shared.b16 %rs2740, [%r78+1088]; 2026-02-21T10:22:24.3640430Z ld.shared.b16 %rs2741, [%r78+8192]; 2026-02-21T10:22:24.3640494Z ld.shared.b16 %rs2742, [%r78+9216]; 2026-02-21T10:22:24.3640557Z ld.shared.b16 %rs2743, [%r78+8256]; 2026-02-21T10:22:24.3640623Z ld.shared.b16 %rs2744, [%r78+9280]; 2026-02-21T10:22:24.3640747Z ld.shared.b16 %rs2745, [%r79]; 2026-02-21T10:22:24.3640810Z ld.shared.b16 %rs2746, [%r79+1024]; 2026-02-21T10:22:24.3640876Z ld.shared.b16 %rs2747, [%r79+64]; 2026-02-21T10:22:24.3640941Z ld.shared.b16 %rs2748, [%r79+1088]; 2026-02-21T10:22:24.3641004Z ld.shared.b16 %rs2749, [%r79+8192]; 2026-02-21T10:22:24.3641071Z ld.shared.b16 %rs2750, [%r79+9216]; 2026-02-21T10:22:24.3641134Z ld.shared.b16 %rs2751, [%r79+8256]; 2026-02-21T10:22:24.3641197Z ld.shared.b16 %rs2752, [%r79+9280]; 2026-02-21T10:22:24.3641258Z cvt.f32.bf16 %r32628, %rs2689; 2026-02-21T10:22:24.3641321Z cvt.f32.bf16 %r32629, %rs2690; 2026-02-21T10:22:24.3641387Z cvt.f32.bf16 %r32630, %rs2697; 2026-02-21T10:22:24.3641448Z cvt.f32.bf16 %r32631, %rs2698; 
2026-02-21T10:22:24.3641509Z cvt.f32.bf16 %r32760, %rs2705; 2026-02-21T10:22:24.3641569Z cvt.f32.bf16 %r32761, %rs2706; 2026-02-21T10:22:24.3641629Z cvt.f32.bf16 %r32762, %rs2713; 2026-02-21T10:22:24.3641688Z cvt.f32.bf16 %r32763, %rs2714; 2026-02-21T10:22:24.3641762Z cvt.f32.bf16 %r32892, %rs2721; 2026-02-21T10:22:24.3641825Z cvt.f32.bf16 %r32893, %rs2722; 2026-02-21T10:22:24.3641884Z cvt.f32.bf16 %r32894, %rs2729; 2026-02-21T10:22:24.3641998Z cvt.f32.bf16 %r32895, %rs2730; 2026-02-21T10:22:24.3642061Z cvt.f32.bf16 %r33024, %rs2737; 2026-02-21T10:22:24.3642121Z cvt.f32.bf16 %r33025, %rs2738; 2026-02-21T10:22:24.3642183Z cvt.f32.bf16 %r33026, %rs2745; 2026-02-21T10:22:24.3642243Z cvt.f32.bf16 %r33027, %rs2746; 2026-02-21T10:22:24.3642302Z cvt.f32.bf16 %r33156, %rs2691; 2026-02-21T10:22:24.3642360Z cvt.f32.bf16 %r33157, %rs2692; 2026-02-21T10:22:24.3642423Z cvt.f32.bf16 %r33158, %rs2699; 2026-02-21T10:22:24.3642488Z cvt.f32.bf16 %r33159, %rs2700; 2026-02-21T10:22:24.3642547Z cvt.f32.bf16 %r33288, %rs2707; 2026-02-21T10:22:24.3642608Z cvt.f32.bf16 %r33289, %rs2708; 2026-02-21T10:22:24.3642666Z cvt.f32.bf16 %r33290, %rs2715; 2026-02-21T10:22:24.3642725Z cvt.f32.bf16 %r33291, %rs2716; 2026-02-21T10:22:24.3642784Z cvt.f32.bf16 %r33420, %rs2723; 2026-02-21T10:22:24.3642849Z cvt.f32.bf16 %r33421, %rs2724; 2026-02-21T10:22:24.3642909Z cvt.f32.bf16 %r33422, %rs2731; 2026-02-21T10:22:24.3642971Z cvt.f32.bf16 %r33423, %rs2732; 2026-02-21T10:22:24.3643034Z cvt.f32.bf16 %r33552, %rs2739; 2026-02-21T10:22:24.3643095Z cvt.f32.bf16 %r33553, %rs2740; 2026-02-21T10:22:24.3643155Z cvt.f32.bf16 %r33554, %rs2747; 2026-02-21T10:22:24.3643214Z cvt.f32.bf16 %r33555, %rs2748; 2026-02-21T10:22:24.3643276Z cvt.f32.bf16 %r33684, %rs2693; 2026-02-21T10:22:24.3643335Z cvt.f32.bf16 %r33685, %rs2694; 2026-02-21T10:22:24.3643395Z cvt.f32.bf16 %r33686, %rs2701; 2026-02-21T10:22:24.3643523Z cvt.f32.bf16 %r33687, %rs2702; 2026-02-21T10:22:24.3643585Z cvt.f32.bf16 %r33816, %rs2709; 
2026-02-21T10:22:24.3643645Z cvt.f32.bf16 %r33817, %rs2710; 2026-02-21T10:22:24.3643704Z cvt.f32.bf16 %r33818, %rs2717; 2026-02-21T10:22:24.3643767Z cvt.f32.bf16 %r33819, %rs2718; 2026-02-21T10:22:24.3643826Z cvt.f32.bf16 %r33948, %rs2725; 2026-02-21T10:22:24.3643888Z cvt.f32.bf16 %r33949, %rs2726; 2026-02-21T10:22:24.3643950Z cvt.f32.bf16 %r33950, %rs2733; 2026-02-21T10:22:24.3644009Z cvt.f32.bf16 %r33951, %rs2734; 2026-02-21T10:22:24.3644070Z cvt.f32.bf16 %r34080, %rs2741; 2026-02-21T10:22:24.3644131Z cvt.f32.bf16 %r34081, %rs2742; 2026-02-21T10:22:24.3644190Z cvt.f32.bf16 %r34082, %rs2749; 2026-02-21T10:22:24.3644248Z cvt.f32.bf16 %r34083, %rs2750; 2026-02-21T10:22:24.3644308Z cvt.f32.bf16 %r34212, %rs2695; 2026-02-21T10:22:24.3644369Z cvt.f32.bf16 %r34213, %rs2696; 2026-02-21T10:22:24.3644428Z cvt.f32.bf16 %r34214, %rs2703; 2026-02-21T10:22:24.3644489Z cvt.f32.bf16 %r34215, %rs2704; 2026-02-21T10:22:24.3644603Z cvt.f32.bf16 %r34344, %rs2711; 2026-02-21T10:22:24.3644665Z cvt.f32.bf16 %r34345, %rs2712; 2026-02-21T10:22:24.3644725Z cvt.f32.bf16 %r34346, %rs2719; 2026-02-21T10:22:24.3644784Z cvt.f32.bf16 %r34347, %rs2720; 2026-02-21T10:22:24.3644845Z cvt.f32.bf16 %r34476, %rs2727; 2026-02-21T10:22:24.3644903Z cvt.f32.bf16 %r34477, %rs2728; 2026-02-21T10:22:24.3645011Z cvt.f32.bf16 %r34478, %rs2735; 2026-02-21T10:22:24.3645072Z cvt.f32.bf16 %r34479, %rs2736; 2026-02-21T10:22:24.3645134Z cvt.f32.bf16 %r34608, %rs2743; 2026-02-21T10:22:24.3645193Z cvt.f32.bf16 %r34609, %rs2744; 2026-02-21T10:22:24.3645252Z cvt.f32.bf16 %r34610, %rs2751; 2026-02-21T10:22:24.3645315Z cvt.f32.bf16 %r34611, %rs2752; 2026-02-21T10:22:24.3645521Z .loc 1 64 33 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:64:33 2026-02-21T10:22:24.3645577Z bar.sync 0; 2026-02-21T10:22:24.3645652Z add.s32 %r39934, %r39936, 4096; 2026-02-21T10:22:24.3645715Z // begin inline asm 2026-02-21T10:22:24.3645818Z @%p313 mbarrier.init.shared::cta.b64 [%r39934], 1; 2026-02-21T10:22:24.3645877Z // end 
inline asm 2026-02-21T10:22:24.3645931Z bar.sync 0; 2026-02-21T10:22:24.3645989Z // begin inline asm 2026-02-21T10:22:24.3646133Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r39934], 4096; 2026-02-21T10:22:24.3646196Z // end inline asm 2026-02-21T10:22:24.3646254Z // begin inline asm 2026-02-21T10:22:24.3646330Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.3646386Z // end inline asm 2026-02-21T10:22:24.3646658Z bar.sync 0; 2026-02-21T10:22:24.3646739Z elect.sync %r39704|%p374, -1; 2026-02-21T10:22:24.3646808Z and.pred %p315, %p1, %p374; 2026-02-21T10:22:24.3646873Z add.s64 %rd103, %rd853, 96; 2026-02-21T10:22:24.3646937Z cvt.u32.u64 %r32495, %rd103; 2026-02-21T10:22:24.3646997Z // begin inline asm 2026-02-21T10:22:24.3647340Z @%p315 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39936], [%rd822, {%r39937, %r32495}], [%r39934]; 2026-02-21T10:22:24.3647402Z // end inline asm 2026-02-21T10:22:24.3647456Z bar.sync 0; 2026-02-21T10:22:24.3647517Z mov.b32 %r39572, 0; 2026-02-21T10:22:24.3647574Z // begin inline asm 2026-02-21T10:22:24.3647627Z 2026-02-21T10:22:24.3647676Z { 2026-02-21T10:22:24.3647741Z .reg .pred complete; 2026-02-21T10:22:24.3647796Z waitLoop: 2026-02-21T10:22:24.3647946Z mbarrier.try_wait.parity.shared.b64 complete, [%r39934], %r39572; 2026-02-21T10:22:24.3648019Z @!complete bra.uni waitLoop; 2026-02-21T10:22:24.3648068Z } 2026-02-21T10:22:24.3648076Z 2026-02-21T10:22:24.3648132Z // end inline asm 2026-02-21T10:22:24.3648186Z bar.sync 0; 2026-02-21T10:22:24.3648246Z // begin inline asm 2026-02-21T10:22:24.3648341Z @%p313 mbarrier.inval.shared::cta.b64 [%r39934]; 2026-02-21T10:22:24.3648408Z // end inline asm 2026-02-21T10:22:24.3648616Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3648683Z ld.shared.s8 %rs2753, [%r80]; 2026-02-21T10:22:24.3648968Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3649037Z 
shl.b16 %rs2754, %rs2753, 4; 2026-02-21T10:22:24.3649231Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3649301Z ld.shared.s8 %rs2755, [%r81+128]; 2026-02-21T10:22:24.3649491Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3649557Z shl.b16 %rs2756, %rs2755, 4; 2026-02-21T10:22:24.3649746Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3649812Z ld.shared.s8 %rs2757, [%r82+256]; 2026-02-21T10:22:24.3650003Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3650064Z shl.b16 %rs2758, %rs2757, 4; 2026-02-21T10:22:24.3650320Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3650391Z ld.shared.s8 %rs2759, [%r83+384]; 2026-02-21T10:22:24.3650581Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3650643Z shl.b16 %rs2760, %rs2759, 4; 2026-02-21T10:22:24.3650901Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3650965Z ld.shared.s8 %rs2761, [%r84+512]; 2026-02-21T10:22:24.3651156Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3651218Z shl.b16 %rs2762, %rs2761, 4; 2026-02-21T10:22:24.3651407Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3651470Z ld.shared.s8 %rs2763, [%r85+640]; 2026-02-21T10:22:24.3651662Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3651740Z shl.b16 %rs2764, %rs2763, 4; 2026-02-21T10:22:24.3651934Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3651997Z ld.shared.s8 %rs2765, [%r86+768]; 2026-02-21T10:22:24.3652187Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 
2026-02-21T10:22:24.3652250Z shl.b16 %rs2766, %rs2765, 4; 2026-02-21T10:22:24.3652493Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3652560Z ld.shared.s8 %rs2767, [%r87+896]; 2026-02-21T10:22:24.3652748Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3652808Z shl.b16 %rs2768, %rs2767, 4; 2026-02-21T10:22:24.3652997Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3653069Z ld.shared.s8 %rs2769, [%r80+1024]; 2026-02-21T10:22:24.3653259Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3653319Z shl.b16 %rs2770, %rs2769, 4; 2026-02-21T10:22:24.3653508Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3653575Z ld.shared.s8 %rs2771, [%r81+1152]; 2026-02-21T10:22:24.3653764Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3653826Z shl.b16 %rs2772, %rs2771, 4; 2026-02-21T10:22:24.3654013Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3654076Z ld.shared.s8 %rs2773, [%r82+1280]; 2026-02-21T10:22:24.3654281Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3654342Z shl.b16 %rs2774, %rs2773, 4; 2026-02-21T10:22:24.3654584Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3654650Z ld.shared.s8 %rs2775, [%r83+1408]; 2026-02-21T10:22:24.3654835Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3654897Z shl.b16 %rs2776, %rs2775, 4; 2026-02-21T10:22:24.3655085Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3655152Z ld.shared.s8 %rs2777, [%r84+1536]; 2026-02-21T10:22:24.3655341Z .loc 1 67 28 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3655403Z shl.b16 %rs2778, %rs2777, 4; 2026-02-21T10:22:24.3655595Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3655659Z ld.shared.s8 %rs2779, [%r85+1664]; 2026-02-21T10:22:24.3655893Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3655962Z shl.b16 %rs2780, %rs2779, 4; 2026-02-21T10:22:24.3656157Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3656220Z ld.shared.s8 %rs2781, [%r86+1792]; 2026-02-21T10:22:24.3656585Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3656656Z shl.b16 %rs2782, %rs2781, 4; 2026-02-21T10:22:24.3656849Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3656912Z ld.shared.s8 %rs2783, [%r87+1920]; 2026-02-21T10:22:24.3657104Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3657165Z shl.b16 %rs2784, %rs2783, 4; 2026-02-21T10:22:24.3657363Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3657436Z ld.shared.s8 %rs2785, [%r80+2048]; 2026-02-21T10:22:24.3657625Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3657684Z shl.b16 %rs2786, %rs2785, 4; 2026-02-21T10:22:24.3657876Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3657941Z ld.shared.s8 %rs2787, [%r81+2176]; 2026-02-21T10:22:24.3658204Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3658269Z shl.b16 %rs2788, %rs2787, 4; 2026-02-21T10:22:24.3658459Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3658522Z ld.shared.s8 %rs2789, [%r82+2304]; 2026-02-21T10:22:24.3658708Z 
.loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3658774Z shl.b16 %rs2790, %rs2789, 4; 2026-02-21T10:22:24.3658963Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3659027Z ld.shared.s8 %rs2791, [%r83+2432]; 2026-02-21T10:22:24.3659217Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3659280Z shl.b16 %rs2792, %rs2791, 4; 2026-02-21T10:22:24.3659471Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3659536Z ld.shared.s8 %rs2793, [%r84+2560]; 2026-02-21T10:22:24.3659721Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3659781Z shl.b16 %rs2794, %rs2793, 4; 2026-02-21T10:22:24.3659972Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3660039Z ld.shared.s8 %rs2795, [%r85+2688]; 2026-02-21T10:22:24.3660304Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3660365Z shl.b16 %rs2796, %rs2795, 4; 2026-02-21T10:22:24.3660571Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3660634Z ld.shared.s8 %rs2797, [%r86+2816]; 2026-02-21T10:22:24.3660824Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3660890Z shl.b16 %rs2798, %rs2797, 4; 2026-02-21T10:22:24.3661079Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3661142Z ld.shared.s8 %rs2799, [%r87+2944]; 2026-02-21T10:22:24.3661333Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3661393Z shl.b16 %rs2800, %rs2799, 4; 2026-02-21T10:22:24.3661646Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3661717Z ld.shared.s8 %rs2801, [%r80+3072]; 
2026-02-21T10:22:24.3661904Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3661964Z shl.b16 %rs2802, %rs2801, 4; 2026-02-21T10:22:24.3662213Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3662280Z ld.shared.s8 %rs2803, [%r81+3200]; 2026-02-21T10:22:24.3662469Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3662529Z shl.b16 %rs2804, %rs2803, 4; 2026-02-21T10:22:24.3662721Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3662783Z ld.shared.s8 %rs2805, [%r82+3328]; 2026-02-21T10:22:24.3662969Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3663037Z shl.b16 %rs2806, %rs2805, 4; 2026-02-21T10:22:24.3663225Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3663288Z ld.shared.s8 %rs2807, [%r83+3456]; 2026-02-21T10:22:24.3663476Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3663537Z shl.b16 %rs2808, %rs2807, 4; 2026-02-21T10:22:24.3663775Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3663842Z ld.shared.s8 %rs2809, [%r84+3584]; 2026-02-21T10:22:24.3664028Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3664089Z shl.b16 %rs2810, %rs2809, 4; 2026-02-21T10:22:24.3664277Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3664345Z ld.shared.s8 %rs2811, [%r85+3712]; 2026-02-21T10:22:24.3664534Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3664595Z shl.b16 %rs2812, %rs2811, 4; 2026-02-21T10:22:24.3664783Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3664848Z ld.shared.s8 
%rs2813, [%r86+3840]; 2026-02-21T10:22:24.3665038Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3665104Z shl.b16 %rs2814, %rs2813, 4; 2026-02-21T10:22:24.3665291Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3665355Z ld.shared.s8 %rs2815, [%r87+3968]; 2026-02-21T10:22:24.3665547Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3665607Z shl.b16 %rs2816, %rs2815, 4; 2026-02-21T10:22:24.3665848Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3665921Z cvt.s16.s8 %rs2817, %rs2754; 2026-02-21T10:22:24.3665985Z shr.s16 %rs2818, %rs2817, 4; 2026-02-21T10:22:24.3666045Z cvt.s16.s8 %rs2819, %rs2756; 2026-02-21T10:22:24.3666105Z shr.s16 %rs2820, %rs2819, 4; 2026-02-21T10:22:24.3666170Z shr.s16 %rs2821, %rs2753, 4; 2026-02-21T10:22:24.3666228Z shr.s16 %rs2822, %rs2755, 4; 2026-02-21T10:22:24.3666420Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3666614Z cvt.rn.f32.s16 %r39705, %rs2822; 2026-02-21T10:22:24.3666680Z cvt.rn.f32.s16 %r39706, %rs2821; 2026-02-21T10:22:24.3666752Z cvt.rn.f32.s16 %r39707, %rs2820; 2026-02-21T10:22:24.3666816Z cvt.rn.f32.s16 %r39708, %rs2818; 2026-02-21T10:22:24.3667009Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3667072Z cvt.s16.s8 %rs2823, %rs2758; 2026-02-21T10:22:24.3667215Z shr.s16 %rs2824, %rs2823, 4; 2026-02-21T10:22:24.3667281Z cvt.s16.s8 %rs2825, %rs2760; 2026-02-21T10:22:24.3667341Z shr.s16 %rs2826, %rs2825, 4; 2026-02-21T10:22:24.3667402Z shr.s16 %rs2827, %rs2757, 4; 2026-02-21T10:22:24.3667461Z shr.s16 %rs2828, %rs2759, 4; 2026-02-21T10:22:24.3667745Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3667810Z cvt.rn.f32.s16 %r39709, %rs2828; 2026-02-21T10:22:24.3667873Z 
cvt.rn.f32.s16 %r39710, %rs2827; 2026-02-21T10:22:24.3667937Z cvt.rn.f32.s16 %r39711, %rs2826; 2026-02-21T10:22:24.3667998Z cvt.rn.f32.s16 %r39712, %rs2824; 2026-02-21T10:22:24.3668187Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3668251Z cvt.s16.s8 %rs2829, %rs2762; 2026-02-21T10:22:24.3668311Z shr.s16 %rs2830, %rs2829, 4; 2026-02-21T10:22:24.3668370Z cvt.s16.s8 %rs2831, %rs2764; 2026-02-21T10:22:24.3668498Z shr.s16 %rs2832, %rs2831, 4; 2026-02-21T10:22:24.3668563Z shr.s16 %rs2833, %rs2761, 4; 2026-02-21T10:22:24.3668623Z shr.s16 %rs2834, %rs2763, 4; 2026-02-21T10:22:24.3668812Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3668880Z cvt.rn.f32.s16 %r39713, %rs2834; 2026-02-21T10:22:24.3668941Z cvt.rn.f32.s16 %r39714, %rs2833; 2026-02-21T10:22:24.3669002Z cvt.rn.f32.s16 %r39715, %rs2832; 2026-02-21T10:22:24.3669137Z cvt.rn.f32.s16 %r39716, %rs2830; 2026-02-21T10:22:24.3669331Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3669390Z cvt.s16.s8 %rs2835, %rs2766; 2026-02-21T10:22:24.3669449Z shr.s16 %rs2836, %rs2835, 4; 2026-02-21T10:22:24.3669511Z cvt.s16.s8 %rs2837, %rs2768; 2026-02-21T10:22:24.3669569Z shr.s16 %rs2838, %rs2837, 4; 2026-02-21T10:22:24.3669627Z shr.s16 %rs2839, %rs2765, 4; 2026-02-21T10:22:24.3669694Z shr.s16 %rs2840, %rs2767, 4; 2026-02-21T10:22:24.3669882Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3669945Z cvt.rn.f32.s16 %r39717, %rs2840; 2026-02-21T10:22:24.3670009Z cvt.rn.f32.s16 %r39718, %rs2839; 2026-02-21T10:22:24.3670072Z cvt.rn.f32.s16 %r39719, %rs2838; 2026-02-21T10:22:24.3670136Z cvt.rn.f32.s16 %r39720, %rs2836; 2026-02-21T10:22:24.3670326Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3670397Z cvt.s16.s8 %rs2841, %rs2770; 2026-02-21T10:22:24.3670461Z shr.s16 %rs2842, %rs2841, 4; 
2026-02-21T10:22:24.3670521Z cvt.s16.s8 %rs2843, %rs2772; 2026-02-21T10:22:24.3670583Z shr.s16 %rs2844, %rs2843, 4; 2026-02-21T10:22:24.3670640Z shr.s16 %rs2845, %rs2769, 4; 2026-02-21T10:22:24.3670700Z shr.s16 %rs2846, %rs2771, 4; 2026-02-21T10:22:24.3670890Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3671034Z cvt.rn.f32.s16 %r39721, %rs2846; 2026-02-21T10:22:24.3671096Z cvt.rn.f32.s16 %r39722, %rs2845; 2026-02-21T10:22:24.3671157Z cvt.rn.f32.s16 %r39723, %rs2844; 2026-02-21T10:22:24.3671219Z cvt.rn.f32.s16 %r39724, %rs2842; 2026-02-21T10:22:24.3671409Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3671472Z cvt.s16.s8 %rs2847, %rs2774; 2026-02-21T10:22:24.3671533Z shr.s16 %rs2848, %rs2847, 4; 2026-02-21T10:22:24.3671594Z cvt.s16.s8 %rs2849, %rs2776; 2026-02-21T10:22:24.3671654Z shr.s16 %rs2850, %rs2849, 4; 2026-02-21T10:22:24.3671713Z shr.s16 %rs2851, %rs2773, 4; 2026-02-21T10:22:24.3671775Z shr.s16 %rs2852, %rs2775, 4; 2026-02-21T10:22:24.3671965Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3672028Z cvt.rn.f32.s16 %r39725, %rs2852; 2026-02-21T10:22:24.3672094Z cvt.rn.f32.s16 %r39726, %rs2851; 2026-02-21T10:22:24.3672206Z cvt.rn.f32.s16 %r39727, %rs2850; 2026-02-21T10:22:24.3672269Z cvt.rn.f32.s16 %r39728, %rs2848; 2026-02-21T10:22:24.3672461Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3672521Z cvt.s16.s8 %rs2853, %rs2778; 2026-02-21T10:22:24.3672629Z shr.s16 %rs2854, %rs2853, 4; 2026-02-21T10:22:24.3672689Z cvt.s16.s8 %rs2855, %rs2780; 2026-02-21T10:22:24.3672751Z shr.s16 %rs2856, %rs2855, 4; 2026-02-21T10:22:24.3672812Z shr.s16 %rs2857, %rs2777, 4; 2026-02-21T10:22:24.3672873Z shr.s16 %rs2858, %rs2779, 4; 2026-02-21T10:22:24.3673064Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3673125Z 
cvt.rn.f32.s16 %r39729, %rs2858; 2026-02-21T10:22:24.3673186Z cvt.rn.f32.s16 %r39730, %rs2857; 2026-02-21T10:22:24.3673249Z cvt.rn.f32.s16 %r39731, %rs2856; 2026-02-21T10:22:24.3673310Z cvt.rn.f32.s16 %r39732, %rs2854; 2026-02-21T10:22:24.3673502Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3673562Z cvt.s16.s8 %rs2859, %rs2782; 2026-02-21T10:22:24.3673624Z shr.s16 %rs2860, %rs2859, 4; 2026-02-21T10:22:24.3673684Z cvt.s16.s8 %rs2861, %rs2784; 2026-02-21T10:22:24.3673743Z shr.s16 %rs2862, %rs2861, 4; 2026-02-21T10:22:24.3673806Z shr.s16 %rs2863, %rs2781, 4; 2026-02-21T10:22:24.3673865Z shr.s16 %rs2864, %rs2783, 4; 2026-02-21T10:22:24.3674106Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3674171Z cvt.rn.f32.s16 %r39733, %rs2864; 2026-02-21T10:22:24.3674234Z cvt.rn.f32.s16 %r39734, %rs2863; 2026-02-21T10:22:24.3674294Z cvt.rn.f32.s16 %r39735, %rs2862; 2026-02-21T10:22:24.3674354Z cvt.rn.f32.s16 %r39736, %rs2860; 2026-02-21T10:22:24.3674544Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3674607Z cvt.s16.s8 %rs2865, %rs2786; 2026-02-21T10:22:24.3674668Z shr.s16 %rs2866, %rs2865, 4; 2026-02-21T10:22:24.3674730Z cvt.s16.s8 %rs2867, %rs2788; 2026-02-21T10:22:24.3674798Z shr.s16 %rs2868, %rs2867, 4; 2026-02-21T10:22:24.3674860Z shr.s16 %rs2869, %rs2785, 4; 2026-02-21T10:22:24.3674919Z shr.s16 %rs2870, %rs2787, 4; 2026-02-21T10:22:24.3675117Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3675179Z cvt.rn.f32.s16 %r39737, %rs2870; 2026-02-21T10:22:24.3675245Z cvt.rn.f32.s16 %r39738, %rs2869; 2026-02-21T10:22:24.3675307Z cvt.rn.f32.s16 %r39739, %rs2868; 2026-02-21T10:22:24.3675369Z cvt.rn.f32.s16 %r39740, %rs2866; 2026-02-21T10:22:24.3675558Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3675621Z cvt.s16.s8 %rs2871, 
%rs2790; 2026-02-21T10:22:24.3675681Z shr.s16 %rs2872, %rs2871, 4; 2026-02-21T10:22:24.3675797Z cvt.s16.s8 %rs2873, %rs2792; 2026-02-21T10:22:24.3675857Z shr.s16 %rs2874, %rs2873, 4; 2026-02-21T10:22:24.3675920Z shr.s16 %rs2875, %rs2789, 4; 2026-02-21T10:22:24.3675978Z shr.s16 %rs2876, %rs2791, 4; 2026-02-21T10:22:24.3676166Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3676234Z cvt.rn.f32.s16 %r39741, %rs2876; 2026-02-21T10:22:24.3676295Z cvt.rn.f32.s16 %r39742, %rs2875; 2026-02-21T10:22:24.3676355Z cvt.rn.f32.s16 %r39743, %rs2874; 2026-02-21T10:22:24.3676417Z cvt.rn.f32.s16 %r39744, %rs2872; 2026-02-21T10:22:24.3676751Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3676815Z cvt.s16.s8 %rs2877, %rs2794; 2026-02-21T10:22:24.3676877Z shr.s16 %rs2878, %rs2877, 4; 2026-02-21T10:22:24.3676937Z cvt.s16.s8 %rs2879, %rs2796; 2026-02-21T10:22:24.3676996Z shr.s16 %rs2880, %rs2879, 4; 2026-02-21T10:22:24.3677055Z shr.s16 %rs2881, %rs2793, 4; 2026-02-21T10:22:24.3677202Z shr.s16 %rs2882, %rs2795, 4; 2026-02-21T10:22:24.3677404Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3677469Z cvt.rn.f32.s16 %r39745, %rs2882; 2026-02-21T10:22:24.3677532Z cvt.rn.f32.s16 %r39746, %rs2881; 2026-02-21T10:22:24.3677657Z cvt.rn.f32.s16 %r39747, %rs2880; 2026-02-21T10:22:24.3677717Z cvt.rn.f32.s16 %r39748, %rs2878; 2026-02-21T10:22:24.3677912Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3677975Z cvt.s16.s8 %rs2883, %rs2798; 2026-02-21T10:22:24.3678045Z shr.s16 %rs2884, %rs2883, 4; 2026-02-21T10:22:24.3678106Z cvt.s16.s8 %rs2885, %rs2800; 2026-02-21T10:22:24.3678168Z shr.s16 %rs2886, %rs2885, 4; 2026-02-21T10:22:24.3678227Z shr.s16 %rs2887, %rs2797, 4; 2026-02-21T10:22:24.3678286Z shr.s16 %rs2888, %rs2799, 4; 2026-02-21T10:22:24.3678477Z .loc 1 87 32 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3678546Z cvt.rn.f32.s16 %r39749, %rs2888; 2026-02-21T10:22:24.3678609Z cvt.rn.f32.s16 %r39750, %rs2887; 2026-02-21T10:22:24.3678670Z cvt.rn.f32.s16 %r39751, %rs2886; 2026-02-21T10:22:24.3678732Z cvt.rn.f32.s16 %r39752, %rs2884; 2026-02-21T10:22:24.3678923Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3678985Z cvt.s16.s8 %rs2889, %rs2802; 2026-02-21T10:22:24.3679117Z shr.s16 %rs2890, %rs2889, 4; 2026-02-21T10:22:24.3679182Z cvt.s16.s8 %rs2891, %rs2804; 2026-02-21T10:22:24.3679241Z shr.s16 %rs2892, %rs2891, 4; 2026-02-21T10:22:24.3679299Z shr.s16 %rs2893, %rs2801, 4; 2026-02-21T10:22:24.3679360Z shr.s16 %rs2894, %rs2803, 4; 2026-02-21T10:22:24.3679552Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3679613Z cvt.rn.f32.s16 %r39753, %rs2894; 2026-02-21T10:22:24.3679680Z cvt.rn.f32.s16 %r39754, %rs2893; 2026-02-21T10:22:24.3679742Z cvt.rn.f32.s16 %r39755, %rs2892; 2026-02-21T10:22:24.3679803Z cvt.rn.f32.s16 %r39756, %rs2890; 2026-02-21T10:22:24.3679994Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3680054Z cvt.s16.s8 %rs2895, %rs2806; 2026-02-21T10:22:24.3680117Z shr.s16 %rs2896, %rs2895, 4; 2026-02-21T10:22:24.3680176Z cvt.s16.s8 %rs2897, %rs2808; 2026-02-21T10:22:24.3680237Z shr.s16 %rs2898, %rs2897, 4; 2026-02-21T10:22:24.3680299Z shr.s16 %rs2899, %rs2805, 4; 2026-02-21T10:22:24.3680359Z shr.s16 %rs2900, %rs2807, 4; 2026-02-21T10:22:24.3680555Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3680617Z cvt.rn.f32.s16 %r39757, %rs2900; 2026-02-21T10:22:24.3680678Z cvt.rn.f32.s16 %r39758, %rs2899; 2026-02-21T10:22:24.3680739Z cvt.rn.f32.s16 %r39759, %rs2898; 2026-02-21T10:22:24.3680883Z cvt.rn.f32.s16 %r39760, %rs2896; 2026-02-21T10:22:24.3681082Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3681142Z cvt.s16.s8 %rs2901, %rs2810; 2026-02-21T10:22:24.3681205Z shr.s16 %rs2902, %rs2901, 4; 2026-02-21T10:22:24.3681264Z cvt.s16.s8 %rs2903, %rs2812; 2026-02-21T10:22:24.3681326Z shr.s16 %rs2904, %rs2903, 4; 2026-02-21T10:22:24.3681389Z shr.s16 %rs2905, %rs2809, 4; 2026-02-21T10:22:24.3681449Z shr.s16 %rs2906, %rs2811, 4; 2026-02-21T10:22:24.3681639Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3681702Z cvt.rn.f32.s16 %r39761, %rs2906; 2026-02-21T10:22:24.3681767Z cvt.rn.f32.s16 %r39762, %rs2905; 2026-02-21T10:22:24.3681841Z cvt.rn.f32.s16 %r39763, %rs2904; 2026-02-21T10:22:24.3681903Z cvt.rn.f32.s16 %r39764, %rs2902; 2026-02-21T10:22:24.3682095Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3682204Z cvt.s16.s8 %rs2907, %rs2814; 2026-02-21T10:22:24.3682266Z shr.s16 %rs2908, %rs2907, 4; 2026-02-21T10:22:24.3682326Z cvt.s16.s8 %rs2909, %rs2816; 2026-02-21T10:22:24.3682386Z shr.s16 %rs2910, %rs2909, 4; 2026-02-21T10:22:24.3682444Z shr.s16 %rs2911, %rs2813, 4; 2026-02-21T10:22:24.3682502Z shr.s16 %rs2912, %rs2815, 4; 2026-02-21T10:22:24.3682744Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3682807Z cvt.rn.f32.s16 %r39765, %rs2912; 2026-02-21T10:22:24.3682868Z cvt.rn.f32.s16 %r39766, %rs2911; 2026-02-21T10:22:24.3682930Z cvt.rn.f32.s16 %r39767, %rs2910; 2026-02-21T10:22:24.3682991Z cvt.rn.f32.s16 %r39768, %rs2908; 2026-02-21T10:22:24.3683048Z bar.sync 0; 2026-02-21T10:22:24.3683167Z st.shared.v4.b32 [%r88], {%r39708, %r39706, %r39707, %r39705}; 2026-02-21T10:22:24.3683297Z st.shared.v4.b32 [%r88+16384], {%r39740, %r39738, %r39739, %r39737}; 2026-02-21T10:22:24.3683414Z st.shared.v4.b32 [%r89], {%r39712, %r39710, %r39711, %r39709}; 2026-02-21T10:22:24.3683534Z st.shared.v4.b32 [%r89+16384], {%r39744, %r39742, %r39743, %r39741}; 
2026-02-21T10:22:24.3683647Z st.shared.v4.b32 [%r90], {%r39716, %r39714, %r39715, %r39713}; 2026-02-21T10:22:24.3683761Z st.shared.v4.b32 [%r90+16384], {%r39748, %r39746, %r39747, %r39745}; 2026-02-21T10:22:24.3683871Z st.shared.v4.b32 [%r91], {%r39720, %r39718, %r39719, %r39717}; 2026-02-21T10:22:24.3683988Z st.shared.v4.b32 [%r91+16384], {%r39752, %r39750, %r39751, %r39749}; 2026-02-21T10:22:24.3684145Z st.shared.v4.b32 [%r92], {%r39724, %r39722, %r39723, %r39721}; 2026-02-21T10:22:24.3684264Z st.shared.v4.b32 [%r92+16384], {%r39756, %r39754, %r39755, %r39753}; 2026-02-21T10:22:24.3684383Z st.shared.v4.b32 [%r93], {%r39728, %r39726, %r39727, %r39725}; 2026-02-21T10:22:24.3684502Z st.shared.v4.b32 [%r93+16384], {%r39760, %r39758, %r39759, %r39757}; 2026-02-21T10:22:24.3684609Z st.shared.v4.b32 [%r94], {%r39732, %r39730, %r39731, %r39729}; 2026-02-21T10:22:24.3684729Z st.shared.v4.b32 [%r94+16384], {%r39764, %r39762, %r39763, %r39761}; 2026-02-21T10:22:24.3684837Z st.shared.v4.b32 [%r95], {%r39736, %r39734, %r39735, %r39733}; 2026-02-21T10:22:24.3684955Z st.shared.v4.b32 [%r95+16384], {%r39768, %r39766, %r39767, %r39765}; 2026-02-21T10:22:24.3685010Z $L__tmp25: 2026-02-21T10:22:24.3685287Z .loc 2 291 36 // standard.py:291:36 @[ c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:94:40 ] 2026-02-21T10:22:24.3685348Z // begin inline asm 2026-02-21T10:22:24.3685428Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.3685485Z // end inline asm 2026-02-21T10:22:24.3685540Z bar.sync 0; 2026-02-21T10:22:24.3685623Z shfl.sync.idx.b32 %r39769, %r5, 0, 31, -1; 2026-02-21T10:22:24.3685695Z wgmma.fence.sync.aligned; 2026-02-21T10:22:24.3685765Z mov.pred %p317, -1; 2026-02-21T10:22:24.3685822Z // begin inline asm 2026-02-21T10:22:24.3687465Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r32628,%r32629,%r32630,%r32631}, %rd12, %p317, 1, 1; 2026-02-21T10:22:24.3687610Z // end inline asm 2026-02-21T10:22:24.3687669Z // begin inline asm 2026-02-21T10:22:24.3689262Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r32760,%r32761,%r32762,%r32763}, %rd13, %p317, 1, 1; 2026-02-21T10:22:24.3689324Z // end inline asm 2026-02-21T10:22:24.3689451Z // begin inline asm 2026-02-21T10:22:24.3690938Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r32892,%r32893,%r32894,%r32895}, %rd14, %p317, 1, 1; 2026-02-21T10:22:24.3691003Z // end inline asm 2026-02-21T10:22:24.3691059Z // begin inline asm 2026-02-21T10:22:24.3692602Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r33024,%r33025,%r33026,%r33027}, %rd15, %p317, 1, 1; 2026-02-21T10:22:24.3692664Z // end inline asm 2026-02-21T10:22:24.3692725Z // begin inline asm 2026-02-21T10:22:24.3694203Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r33156,%r33157,%r33158,%r33159}, %rd16, %p317, 1, 1; 2026-02-21T10:22:24.3694268Z // end inline asm 2026-02-21T10:22:24.3694328Z // begin inline asm 2026-02-21T10:22:24.3695811Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r33288,%r33289,%r33290,%r33291}, %rd17, %p317, 1, 1; 2026-02-21T10:22:24.3695936Z // end inline asm 2026-02-21T10:22:24.3695994Z // begin inline asm 2026-02-21T10:22:24.3697606Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r33420,%r33421,%r33422,%r33423}, %rd18, %p317, 1, 1; 2026-02-21T10:22:24.3697669Z // end inline asm 2026-02-21T10:22:24.3697800Z // begin inline asm 2026-02-21T10:22:24.3699298Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r33552,%r33553,%r33554,%r33555}, %rd19, %p317, 1, 1; 2026-02-21T10:22:24.3699414Z // end inline asm 2026-02-21T10:22:24.3699472Z // begin inline asm 2026-02-21T10:22:24.3700957Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r33684,%r33685,%r33686,%r33687}, %rd12, %p317, 1, 1; 2026-02-21T10:22:24.3701018Z // end inline asm 2026-02-21T10:22:24.3701075Z // begin inline asm 2026-02-21T10:22:24.3702613Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r33816,%r33817,%r33818,%r33819}, %rd13, %p317, 1, 1; 2026-02-21T10:22:24.3702676Z // end inline asm 2026-02-21T10:22:24.3702734Z // begin inline asm 2026-02-21T10:22:24.3704216Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r33948,%r33949,%r33950,%r33951}, %rd14, %p317, 1, 1; 2026-02-21T10:22:24.3704274Z // end inline asm 2026-02-21T10:22:24.3704336Z // begin inline asm 2026-02-21T10:22:24.3705878Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r34080,%r34081,%r34082,%r34083}, %rd15, %p317, 1, 1; 2026-02-21T10:22:24.3705939Z // end inline asm 2026-02-21T10:22:24.3706001Z // begin inline asm 2026-02-21T10:22:24.3707675Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r34212,%r34213,%r34214,%r34215}, %rd16, %p317, 1, 1; 2026-02-21T10:22:24.3707799Z // end inline asm 2026-02-21T10:22:24.3707857Z // begin inline asm 2026-02-21T10:22:24.3709436Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r34344,%r34345,%r34346,%r34347}, %rd17, %p317, 1, 1; 2026-02-21T10:22:24.3709499Z // end inline asm 2026-02-21T10:22:24.3709556Z // begin inline asm 2026-02-21T10:22:24.3711102Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r34476,%r34477,%r34478,%r34479}, %rd18, %p317, 1, 1; 2026-02-21T10:22:24.3711164Z // end inline asm 2026-02-21T10:22:24.3711220Z // begin inline asm 2026-02-21T10:22:24.3712713Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r34608,%r34609,%r34610,%r34611}, %rd19, %p317, 1, 1; 2026-02-21T10:22:24.3712775Z // end inline asm 2026-02-21T10:22:24.3712853Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:24.3712916Z mov.b32 %r34741, %r39572; 2026-02-21T10:22:24.3712975Z mov.b32 %r34742, %r39572; 2026-02-21T10:22:24.3713032Z mov.b32 %r34740, %r39936; 2026-02-21T10:22:24.3713092Z // begin inline asm 2026-02-21T10:22:24.3715614Z // wait for regs: 
%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373,%r34740,%r34741,%r34742 2026-02-21T10:22:24.3715761Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:24.3715862Z // end inline asm 2026-02-21T10:22:24.3715918Z $L__tmp26: 2026-02-21T10:22:24.3716127Z .loc 1 58 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:32 2026-02-21T10:22:24.3716193Z add.s32 %r39770, %r43245, -64; 2026-02-21T10:22:24.3716256Z add.s64 %rd716, %rd675, 128; 2026-02-21T10:22:24.3716366Z add.s64 %rd719, %rd678, 128; 2026-02-21T10:22:24.3716437Z add.s64 %rd722, %rd681, 128; 2026-02-21T10:22:24.3716625Z add.s64 %rd725, %rd684, 128; 2026-02-21T10:22:24.3716694Z add.s64 %rd728, %rd687, 128; 2026-02-21T10:22:24.3716757Z add.s64 %rd731, %rd690, 128; 2026-02-21T10:22:24.3716817Z add.s64 %rd734, %rd693, 128; 2026-02-21T10:22:24.3716892Z mad.wide.s32 %rd737, %r39770, 2, %rd117; 2026-02-21T10:22:24.3717091Z .loc 1 58 80 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:80 2026-02-21T10:22:24.3717149Z // begin inline asm 
2026-02-21T10:22:24.3717207Z mov.u64 %rd715, 0x0; 2026-02-21T10:22:24.3717341Z createpolicy.fractional.L2::evict_first.b64 %rd715, 1.0; 2026-02-21T10:22:24.3717400Z // end inline asm 2026-02-21T10:22:24.3717458Z // begin inline asm 2026-02-21T10:22:24.3717515Z mov.u32 %r34874, 0x0; 2026-02-21T10:22:24.3717573Z mov.u32 %r34875, 0x0; 2026-02-21T10:22:24.3717630Z mov.u32 %r34876, 0x0; 2026-02-21T10:22:24.3717689Z mov.u32 %r34877, 0x0; 2026-02-21T10:22:24.3717926Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r34874, %r34875, %r34876, %r34877 }, [ %rd716 + 0 ], %rd715; 2026-02-21T10:22:24.3718070Z // end inline asm 2026-02-21T10:22:24.3718133Z // begin inline asm 2026-02-21T10:22:24.3718191Z mov.u64 %rd718, 0x0; 2026-02-21T10:22:24.3718324Z createpolicy.fractional.L2::evict_first.b64 %rd718, 1.0; 2026-02-21T10:22:24.3718382Z // end inline asm 2026-02-21T10:22:24.3718440Z // begin inline asm 2026-02-21T10:22:24.3718499Z mov.u32 %r34878, 0x0; 2026-02-21T10:22:24.3718556Z mov.u32 %r34879, 0x0; 2026-02-21T10:22:24.3718611Z mov.u32 %r34880, 0x0; 2026-02-21T10:22:24.3718669Z mov.u32 %r34881, 0x0; 2026-02-21T10:22:24.3718910Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r34878, %r34879, %r34880, %r34881 }, [ %rd719 + 0 ], %rd718; 2026-02-21T10:22:24.3718968Z // end inline asm 2026-02-21T10:22:24.3719025Z // begin inline asm 2026-02-21T10:22:24.3719086Z mov.u64 %rd721, 0x0; 2026-02-21T10:22:24.3719209Z createpolicy.fractional.L2::evict_first.b64 %rd721, 1.0; 2026-02-21T10:22:24.3719265Z // end inline asm 2026-02-21T10:22:24.3719322Z // begin inline asm 2026-02-21T10:22:24.3719384Z mov.u32 %r34882, 0x0; 2026-02-21T10:22:24.3719441Z mov.u32 %r34883, 0x0; 2026-02-21T10:22:24.3719497Z mov.u32 %r34884, 0x0; 2026-02-21T10:22:24.3719561Z mov.u32 %r34885, 0x0; 2026-02-21T10:22:24.3719795Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r34882, %r34883, %r34884, %r34885 }, [ %rd722 + 0 ], %rd721; 2026-02-21T10:22:24.3719852Z // end inline asm 2026-02-21T10:22:24.3719910Z 
// begin inline asm 2026-02-21T10:22:24.3719969Z mov.u64 %rd724, 0x0; 2026-02-21T10:22:24.3720181Z createpolicy.fractional.L2::evict_first.b64 %rd724, 1.0; 2026-02-21T10:22:24.3720236Z // end inline asm 2026-02-21T10:22:24.3720295Z // begin inline asm 2026-02-21T10:22:24.3720351Z mov.u32 %r34886, 0x0; 2026-02-21T10:22:24.3720407Z mov.u32 %r34887, 0x0; 2026-02-21T10:22:24.3720465Z mov.u32 %r34888, 0x0; 2026-02-21T10:22:24.3720523Z mov.u32 %r34889, 0x0; 2026-02-21T10:22:24.3720755Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r34886, %r34887, %r34888, %r34889 }, [ %rd725 + 0 ], %rd724; 2026-02-21T10:22:24.3720817Z // end inline asm 2026-02-21T10:22:24.3720876Z // begin inline asm 2026-02-21T10:22:24.3720933Z mov.u64 %rd727, 0x0; 2026-02-21T10:22:24.3721050Z createpolicy.fractional.L2::evict_first.b64 %rd727, 1.0; 2026-02-21T10:22:24.3721108Z // end inline asm 2026-02-21T10:22:24.3721165Z // begin inline asm 2026-02-21T10:22:24.3721220Z mov.u32 %r34890, 0x0; 2026-02-21T10:22:24.3721277Z mov.u32 %r34891, 0x0; 2026-02-21T10:22:24.3721333Z mov.u32 %r34892, 0x0; 2026-02-21T10:22:24.3721461Z mov.u32 %r34893, 0x0; 2026-02-21T10:22:24.3721684Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r34890, %r34891, %r34892, %r34893 }, [ %rd728 + 0 ], %rd727; 2026-02-21T10:22:24.3721744Z // end inline asm 2026-02-21T10:22:24.3721802Z // begin inline asm 2026-02-21T10:22:24.3721860Z mov.u64 %rd730, 0x0; 2026-02-21T10:22:24.3722052Z createpolicy.fractional.L2::evict_first.b64 %rd730, 1.0; 2026-02-21T10:22:24.3722113Z // end inline asm 2026-02-21T10:22:24.3722170Z // begin inline asm 2026-02-21T10:22:24.3722233Z mov.u32 %r34894, 0x0; 2026-02-21T10:22:24.3724703Z mov.u32 %r34895, 0x0; 2026-02-21T10:22:24.3724764Z mov.u32 %r34896, 0x0; 2026-02-21T10:22:24.3724821Z mov.u32 %r34897, 0x0; 2026-02-21T10:22:24.3725086Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r34894, %r34895, %r34896, %r34897 }, [ %rd731 + 0 ], %rd730; 2026-02-21T10:22:24.3725145Z // end inline asm 
2026-02-21T10:22:24.3725204Z // begin inline asm 2026-02-21T10:22:24.3725263Z mov.u64 %rd733, 0x0; 2026-02-21T10:22:24.3725406Z createpolicy.fractional.L2::evict_first.b64 %rd733, 1.0; 2026-02-21T10:22:24.3725461Z // end inline asm 2026-02-21T10:22:24.3725517Z // begin inline asm 2026-02-21T10:22:24.3725579Z mov.u32 %r34898, 0x0; 2026-02-21T10:22:24.3725633Z mov.u32 %r34899, 0x0; 2026-02-21T10:22:24.3725688Z mov.u32 %r34900, 0x0; 2026-02-21T10:22:24.3725748Z mov.u32 %r34901, 0x0; 2026-02-21T10:22:24.3725982Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r34898, %r34899, %r34900, %r34901 }, [ %rd734 + 0 ], %rd733; 2026-02-21T10:22:24.3726133Z // end inline asm 2026-02-21T10:22:24.3726197Z // begin inline asm 2026-02-21T10:22:24.3726253Z mov.u64 %rd736, 0x0; 2026-02-21T10:22:24.3726377Z createpolicy.fractional.L2::evict_first.b64 %rd736, 1.0; 2026-02-21T10:22:24.3726431Z // end inline asm 2026-02-21T10:22:24.3726659Z // begin inline asm 2026-02-21T10:22:24.3726729Z mov.u32 %r34902, 0x0; 2026-02-21T10:22:24.3726788Z mov.u32 %r34903, 0x0; 2026-02-21T10:22:24.3726845Z mov.u32 %r34904, 0x0; 2026-02-21T10:22:24.3726906Z mov.u32 %r34905, 0x0; 2026-02-21T10:22:24.3727137Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r34902, %r34903, %r34904, %r34905 }, [ %rd737 + 0 ], %rd736; 2026-02-21T10:22:24.3727195Z // end inline asm 2026-02-21T10:22:24.3727407Z .loc 1 62 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:62:32 2026-02-21T10:22:24.3727464Z bar.sync 0; 2026-02-21T10:22:24.3727549Z st.shared.v2.b32 [%r70], {%r34874, %r34875}; 2026-02-21T10:22:24.3727644Z st.shared.v2.b32 [%r70+2048], {%r34878, %r34879}; 2026-02-21T10:22:24.3727728Z st.shared.v2.b32 [%r70+4096], {%r34882, %r34883}; 2026-02-21T10:22:24.3727810Z st.shared.v2.b32 [%r70+6144], {%r34886, %r34887}; 2026-02-21T10:22:24.3727893Z st.shared.v2.b32 [%r70+8192], {%r34890, %r34891}; 2026-02-21T10:22:24.3727982Z st.shared.v2.b32 [%r70+10240], {%r34894, %r34895}; 2026-02-21T10:22:24.3728066Z 
st.shared.v2.b32 [%r70+12288], {%r34898, %r34899}; 2026-02-21T10:22:24.3728254Z st.shared.v2.b32 [%r70+14336], {%r34902, %r34903}; 2026-02-21T10:22:24.3728340Z st.shared.v2.b32 [%r71], {%r34876, %r34877}; 2026-02-21T10:22:24.3728424Z st.shared.v2.b32 [%r71+2048], {%r34880, %r34881}; 2026-02-21T10:22:24.3728506Z st.shared.v2.b32 [%r71+4096], {%r34884, %r34885}; 2026-02-21T10:22:24.3728587Z st.shared.v2.b32 [%r71+6144], {%r34888, %r34889}; 2026-02-21T10:22:24.3728670Z st.shared.v2.b32 [%r71+8192], {%r34892, %r34893}; 2026-02-21T10:22:24.3728753Z st.shared.v2.b32 [%r71+10240], {%r34896, %r34897}; 2026-02-21T10:22:24.3728840Z st.shared.v2.b32 [%r71+12288], {%r34900, %r34901}; 2026-02-21T10:22:24.3728921Z st.shared.v2.b32 [%r71+14336], {%r34904, %r34905}; 2026-02-21T10:22:24.3728975Z bar.sync 0; 2026-02-21T10:22:24.3729044Z ld.shared.b16 %rs2913, [%r72]; 2026-02-21T10:22:24.3729115Z ld.shared.b16 %rs2914, [%r72+1024]; 2026-02-21T10:22:24.3729180Z ld.shared.b16 %rs2915, [%r72+64]; 2026-02-21T10:22:24.3729244Z ld.shared.b16 %rs2916, [%r72+1088]; 2026-02-21T10:22:24.3729312Z ld.shared.b16 %rs2917, [%r72+8192]; 2026-02-21T10:22:24.3729443Z ld.shared.b16 %rs2918, [%r72+9216]; 2026-02-21T10:22:24.3729508Z ld.shared.b16 %rs2919, [%r72+8256]; 2026-02-21T10:22:24.3729571Z ld.shared.b16 %rs2920, [%r72+9280]; 2026-02-21T10:22:24.3729636Z ld.shared.b16 %rs2921, [%r73]; 2026-02-21T10:22:24.3729698Z ld.shared.b16 %rs2922, [%r73+1024]; 2026-02-21T10:22:24.3729821Z ld.shared.b16 %rs2923, [%r73+64]; 2026-02-21T10:22:24.3729890Z ld.shared.b16 %rs2924, [%r73+1088]; 2026-02-21T10:22:24.3729954Z ld.shared.b16 %rs2925, [%r73+8192]; 2026-02-21T10:22:24.3730016Z ld.shared.b16 %rs2926, [%r73+9216]; 2026-02-21T10:22:24.3730081Z ld.shared.b16 %rs2927, [%r73+8256]; 2026-02-21T10:22:24.3730145Z ld.shared.b16 %rs2928, [%r73+9280]; 2026-02-21T10:22:24.3730207Z ld.shared.b16 %rs2929, [%r74]; 2026-02-21T10:22:24.3730269Z ld.shared.b16 %rs2930, [%r74+1024]; 2026-02-21T10:22:24.3730335Z ld.shared.b16 
%rs2931, [%r74+64]; 2026-02-21T10:22:24.3730401Z ld.shared.b16 %rs2932, [%r74+1088]; 2026-02-21T10:22:24.3730471Z ld.shared.b16 %rs2933, [%r74+8192]; 2026-02-21T10:22:24.3730550Z ld.shared.b16 %rs2934, [%r74+9216]; 2026-02-21T10:22:24.3730616Z ld.shared.b16 %rs2935, [%r74+8256]; 2026-02-21T10:22:24.3730679Z ld.shared.b16 %rs2936, [%r74+9280]; 2026-02-21T10:22:24.3730743Z ld.shared.b16 %rs2937, [%r75]; 2026-02-21T10:22:24.3730808Z ld.shared.b16 %rs2938, [%r75+1024]; 2026-02-21T10:22:24.3730874Z ld.shared.b16 %rs2939, [%r75+64]; 2026-02-21T10:22:24.3730937Z ld.shared.b16 %rs2940, [%r75+1088]; 2026-02-21T10:22:24.3731075Z ld.shared.b16 %rs2941, [%r75+8192]; 2026-02-21T10:22:24.3731141Z ld.shared.b16 %rs2942, [%r75+9216]; 2026-02-21T10:22:24.3731203Z ld.shared.b16 %rs2943, [%r75+8256]; 2026-02-21T10:22:24.3731266Z ld.shared.b16 %rs2944, [%r75+9280]; 2026-02-21T10:22:24.3731328Z ld.shared.b16 %rs2945, [%r76]; 2026-02-21T10:22:24.3731390Z ld.shared.b16 %rs2946, [%r76+1024]; 2026-02-21T10:22:24.3731451Z ld.shared.b16 %rs2947, [%r76+64]; 2026-02-21T10:22:24.3731517Z ld.shared.b16 %rs2948, [%r76+1088]; 2026-02-21T10:22:24.3731581Z ld.shared.b16 %rs2949, [%r76+8192]; 2026-02-21T10:22:24.3731644Z ld.shared.b16 %rs2950, [%r76+9216]; 2026-02-21T10:22:24.3731709Z ld.shared.b16 %rs2951, [%r76+8256]; 2026-02-21T10:22:24.3731770Z ld.shared.b16 %rs2952, [%r76+9280]; 2026-02-21T10:22:24.3731831Z ld.shared.b16 %rs2953, [%r77]; 2026-02-21T10:22:24.3731903Z ld.shared.b16 %rs2954, [%r77+1024]; 2026-02-21T10:22:24.3731974Z ld.shared.b16 %rs2955, [%r77+64]; 2026-02-21T10:22:24.3732039Z ld.shared.b16 %rs2956, [%r77+1088]; 2026-02-21T10:22:24.3732100Z ld.shared.b16 %rs2957, [%r77+8192]; 2026-02-21T10:22:24.3732163Z ld.shared.b16 %rs2958, [%r77+9216]; 2026-02-21T10:22:24.3732225Z ld.shared.b16 %rs2959, [%r77+8256]; 2026-02-21T10:22:24.3732287Z ld.shared.b16 %rs2960, [%r77+9280]; 2026-02-21T10:22:24.3732350Z ld.shared.b16 %rs2961, [%r78]; 2026-02-21T10:22:24.3732410Z ld.shared.b16 %rs2962, 
[%r78+1024]; 2026-02-21T10:22:24.3732472Z ld.shared.b16 %rs2963, [%r78+64]; 2026-02-21T10:22:24.3732604Z ld.shared.b16 %rs2964, [%r78+1088]; 2026-02-21T10:22:24.3732670Z ld.shared.b16 %rs2965, [%r78+8192]; 2026-02-21T10:22:24.3732730Z ld.shared.b16 %rs2966, [%r78+9216]; 2026-02-21T10:22:24.3732792Z ld.shared.b16 %rs2967, [%r78+8256]; 2026-02-21T10:22:24.3732856Z ld.shared.b16 %rs2968, [%r78+9280]; 2026-02-21T10:22:24.3732920Z ld.shared.b16 %rs2969, [%r79]; 2026-02-21T10:22:24.3732983Z ld.shared.b16 %rs2970, [%r79+1024]; 2026-02-21T10:22:24.3733046Z ld.shared.b16 %rs2971, [%r79+64]; 2026-02-21T10:22:24.3733109Z ld.shared.b16 %rs2972, [%r79+1088]; 2026-02-21T10:22:24.3733170Z ld.shared.b16 %rs2973, [%r79+8192]; 2026-02-21T10:22:24.3733231Z ld.shared.b16 %rs2974, [%r79+9216]; 2026-02-21T10:22:24.3733304Z ld.shared.b16 %rs2975, [%r79+8256]; 2026-02-21T10:22:24.3733368Z ld.shared.b16 %rs2976, [%r79+9280]; 2026-02-21T10:22:24.3733429Z cvt.f32.bf16 %r35043, %rs2913; 2026-02-21T10:22:24.3733492Z cvt.f32.bf16 %r35044, %rs2914; 2026-02-21T10:22:24.3733557Z cvt.f32.bf16 %r35045, %rs2921; 2026-02-21T10:22:24.3733665Z cvt.f32.bf16 %r35046, %rs2922; 2026-02-21T10:22:24.3733725Z cvt.f32.bf16 %r35175, %rs2929; 2026-02-21T10:22:24.3733785Z cvt.f32.bf16 %r35176, %rs2930; 2026-02-21T10:22:24.3733843Z cvt.f32.bf16 %r35177, %rs2937; 2026-02-21T10:22:24.3733899Z cvt.f32.bf16 %r35178, %rs2938; 2026-02-21T10:22:24.3734017Z cvt.f32.bf16 %r35307, %rs2945; 2026-02-21T10:22:24.3734075Z cvt.f32.bf16 %r35308, %rs2946; 2026-02-21T10:22:24.3734132Z cvt.f32.bf16 %r35309, %rs2953; 2026-02-21T10:22:24.3734193Z cvt.f32.bf16 %r35310, %rs2954; 2026-02-21T10:22:24.3734251Z cvt.f32.bf16 %r35439, %rs2961; 2026-02-21T10:22:24.3734310Z cvt.f32.bf16 %r35440, %rs2962; 2026-02-21T10:22:24.3734367Z cvt.f32.bf16 %r35441, %rs2969; 2026-02-21T10:22:24.3734427Z cvt.f32.bf16 %r35442, %rs2970; 2026-02-21T10:22:24.3734485Z cvt.f32.bf16 %r35571, %rs2915; 2026-02-21T10:22:24.3734544Z cvt.f32.bf16 %r35572, %rs2916; 
2026-02-21T10:22:24.3734603Z cvt.f32.bf16 %r35573, %rs2923; 2026-02-21T10:22:24.3734664Z cvt.f32.bf16 %r35574, %rs2924; 2026-02-21T10:22:24.3734721Z cvt.f32.bf16 %r35703, %rs2931; 2026-02-21T10:22:24.3734779Z cvt.f32.bf16 %r35704, %rs2932; 2026-02-21T10:22:24.3734838Z cvt.f32.bf16 %r35705, %rs2939; 2026-02-21T10:22:24.3734896Z cvt.f32.bf16 %r35706, %rs2940; 2026-02-21T10:22:24.3734953Z cvt.f32.bf16 %r35835, %rs2947; 2026-02-21T10:22:24.3735015Z cvt.f32.bf16 %r35836, %rs2948; 2026-02-21T10:22:24.3735072Z cvt.f32.bf16 %r35837, %rs2955; 2026-02-21T10:22:24.3735128Z cvt.f32.bf16 %r35838, %rs2956; 2026-02-21T10:22:24.3735241Z cvt.f32.bf16 %r35967, %rs2963; 2026-02-21T10:22:24.3735303Z cvt.f32.bf16 %r35968, %rs2964; 2026-02-21T10:22:24.3735361Z cvt.f32.bf16 %r35969, %rs2971; 2026-02-21T10:22:24.3735417Z cvt.f32.bf16 %r35970, %rs2972; 2026-02-21T10:22:24.3735475Z cvt.f32.bf16 %r36099, %rs2917; 2026-02-21T10:22:24.3735531Z cvt.f32.bf16 %r36100, %rs2918; 2026-02-21T10:22:24.3735587Z cvt.f32.bf16 %r36101, %rs2925; 2026-02-21T10:22:24.3735647Z cvt.f32.bf16 %r36102, %rs2926; 2026-02-21T10:22:24.3735710Z cvt.f32.bf16 %r36231, %rs2933; 2026-02-21T10:22:24.3735767Z cvt.f32.bf16 %r36232, %rs2934; 2026-02-21T10:22:24.3735825Z cvt.f32.bf16 %r36233, %rs2941; 2026-02-21T10:22:24.3735884Z cvt.f32.bf16 %r36234, %rs2942; 2026-02-21T10:22:24.3735941Z cvt.f32.bf16 %r36363, %rs2949; 2026-02-21T10:22:24.3736020Z cvt.f32.bf16 %r36364, %rs2950; 2026-02-21T10:22:24.3736083Z cvt.f32.bf16 %r36365, %rs2957; 2026-02-21T10:22:24.3736140Z cvt.f32.bf16 %r36366, %rs2958; 2026-02-21T10:22:24.3736200Z cvt.f32.bf16 %r36495, %rs2965; 2026-02-21T10:22:24.3736257Z cvt.f32.bf16 %r36496, %rs2966; 2026-02-21T10:22:24.3736316Z cvt.f32.bf16 %r36497, %rs2973; 2026-02-21T10:22:24.3736374Z cvt.f32.bf16 %r36498, %rs2974; 2026-02-21T10:22:24.3736431Z cvt.f32.bf16 %r36627, %rs2919; 2026-02-21T10:22:24.3736623Z cvt.f32.bf16 %r36628, %rs2920; 2026-02-21T10:22:24.3736685Z cvt.f32.bf16 %r36629, %rs2927; 
2026-02-21T10:22:24.3736742Z cvt.f32.bf16 %r36630, %rs2928; 2026-02-21T10:22:24.3736891Z cvt.f32.bf16 %r36759, %rs2935; 2026-02-21T10:22:24.3736955Z cvt.f32.bf16 %r36760, %rs2936; 2026-02-21T10:22:24.3737013Z cvt.f32.bf16 %r36761, %rs2943; 2026-02-21T10:22:24.3737071Z cvt.f32.bf16 %r36762, %rs2944; 2026-02-21T10:22:24.3737129Z cvt.f32.bf16 %r36891, %rs2951; 2026-02-21T10:22:24.3737186Z cvt.f32.bf16 %r36892, %rs2952; 2026-02-21T10:22:24.3737247Z cvt.f32.bf16 %r36893, %rs2959; 2026-02-21T10:22:24.3737307Z cvt.f32.bf16 %r36894, %rs2960; 2026-02-21T10:22:24.3737367Z cvt.f32.bf16 %r37023, %rs2967; 2026-02-21T10:22:24.3737426Z cvt.f32.bf16 %r37024, %rs2968; 2026-02-21T10:22:24.3737485Z cvt.f32.bf16 %r37025, %rs2975; 2026-02-21T10:22:24.3737545Z cvt.f32.bf16 %r37026, %rs2976; 2026-02-21T10:22:24.3737761Z .loc 1 64 33 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:64:33 2026-02-21T10:22:24.3737817Z bar.sync 0; 2026-02-21T10:22:24.3737877Z // begin inline asm 2026-02-21T10:22:24.3737983Z @%p313 mbarrier.init.shared::cta.b64 [%r39934], 1; 2026-02-21T10:22:24.3738118Z // end inline asm 2026-02-21T10:22:24.3738175Z bar.sync 0; 2026-02-21T10:22:24.3738235Z // begin inline asm 2026-02-21T10:22:24.3738372Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r39934], 4096; 2026-02-21T10:22:24.3738427Z // end inline asm 2026-02-21T10:22:24.3738487Z // begin inline asm 2026-02-21T10:22:24.3738627Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.3738681Z // end inline asm 2026-02-21T10:22:24.3738732Z bar.sync 0; 2026-02-21T10:22:24.3738803Z elect.sync %r39771|%p375, -1; 2026-02-21T10:22:24.3738871Z and.pred %p335, %p1, %p375; 2026-02-21T10:22:24.3738934Z cvt.u32.u64 %r39772, %rd853; 2026-02-21T10:22:24.3738996Z add.s32 %r34910, %r39772, 128; 2026-02-21T10:22:24.3739053Z // begin inline asm 2026-02-21T10:22:24.3739401Z @%p335 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39936], [%rd822, {%r39937, %r34910}], [%r39934]; 
2026-02-21T10:22:24.3739462Z // end inline asm 2026-02-21T10:22:24.3739517Z bar.sync 0; 2026-02-21T10:22:24.3739575Z // begin inline asm 2026-02-21T10:22:24.3739625Z 2026-02-21T10:22:24.3739675Z { 2026-02-21T10:22:24.3739736Z .reg .pred complete; 2026-02-21T10:22:24.3739801Z waitLoop: 2026-02-21T10:22:24.3739956Z mbarrier.try_wait.parity.shared.b64 complete, [%r39934], %r39572; 2026-02-21T10:22:24.3740029Z @!complete bra.uni waitLoop; 2026-02-21T10:22:24.3740077Z } 2026-02-21T10:22:24.3740082Z 2026-02-21T10:22:24.3740136Z // end inline asm 2026-02-21T10:22:24.3740193Z bar.sync 0; 2026-02-21T10:22:24.3740343Z // begin inline asm 2026-02-21T10:22:24.3740444Z @%p313 mbarrier.inval.shared::cta.b64 [%r39934]; 2026-02-21T10:22:24.3740502Z // end inline asm 2026-02-21T10:22:24.3740709Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3740776Z ld.shared.s8 %rs2977, [%r80]; 2026-02-21T10:22:24.3740971Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3741039Z shl.b16 %rs2978, %rs2977, 4; 2026-02-21T10:22:24.3741230Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3741297Z ld.shared.s8 %rs2979, [%r81+128]; 2026-02-21T10:22:24.3741491Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3741567Z shl.b16 %rs2980, %rs2979, 4; 2026-02-21T10:22:24.3741770Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3741841Z ld.shared.s8 %rs2981, [%r82+256]; 2026-02-21T10:22:24.3742033Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3742093Z shl.b16 %rs2982, %rs2981, 4; 2026-02-21T10:22:24.3742282Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3742401Z ld.shared.s8 %rs2983, [%r83+384]; 2026-02-21T10:22:24.3742590Z .loc 1 67 28 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3742651Z shl.b16 %rs2984, %rs2983, 4; 2026-02-21T10:22:24.3742837Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3742901Z ld.shared.s8 %rs2985, [%r84+512]; 2026-02-21T10:22:24.3743087Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3743161Z shl.b16 %rs2986, %rs2985, 4; 2026-02-21T10:22:24.3743352Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3743414Z ld.shared.s8 %rs2987, [%r85+640]; 2026-02-21T10:22:24.3743603Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3743661Z shl.b16 %rs2988, %rs2987, 4; 2026-02-21T10:22:24.3743897Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3743963Z ld.shared.s8 %rs2989, [%r86+768]; 2026-02-21T10:22:24.3744149Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3744208Z shl.b16 %rs2990, %rs2989, 4; 2026-02-21T10:22:24.3744444Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3744508Z ld.shared.s8 %rs2991, [%r87+896]; 2026-02-21T10:22:24.3744694Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3744753Z shl.b16 %rs2992, %rs2991, 4; 2026-02-21T10:22:24.3744941Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3745005Z ld.shared.s8 %rs2993, [%r80+1024]; 2026-02-21T10:22:24.3745205Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3745270Z shl.b16 %rs2994, %rs2993, 4; 2026-02-21T10:22:24.3745467Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3745532Z ld.shared.s8 %rs2995, [%r81+1152]; 2026-02-21T10:22:24.3745724Z 
.loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3745787Z shl.b16 %rs2996, %rs2995, 4; 2026-02-21T10:22:24.3746023Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3746092Z ld.shared.s8 %rs2997, [%r82+1280]; 2026-02-21T10:22:24.3746278Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3746344Z shl.b16 %rs2998, %rs2997, 4; 2026-02-21T10:22:24.3746657Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3746733Z ld.shared.s8 %rs2999, [%r83+1408]; 2026-02-21T10:22:24.3746920Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3746979Z shl.b16 %rs3000, %rs2999, 4; 2026-02-21T10:22:24.3747168Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3747233Z ld.shared.s8 %rs3001, [%r84+1536]; 2026-02-21T10:22:24.3747423Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3747484Z shl.b16 %rs3002, %rs3001, 4; 2026-02-21T10:22:24.3747670Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3747733Z ld.shared.s8 %rs3003, [%r85+1664]; 2026-02-21T10:22:24.3747920Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3748067Z shl.b16 %rs3004, %rs3003, 4; 2026-02-21T10:22:24.3748264Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3748330Z ld.shared.s8 %rs3005, [%r86+1792]; 2026-02-21T10:22:24.3748613Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3748676Z shl.b16 %rs3006, %rs3005, 4; 2026-02-21T10:22:24.3748865Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3748932Z ld.shared.s8 %rs3007, [%r87+1920]; 
2026-02-21T10:22:24.3749118Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3749177Z shl.b16 %rs3008, %rs3007, 4; 2026-02-21T10:22:24.3749365Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3749426Z ld.shared.s8 %rs3009, [%r80+2048]; 2026-02-21T10:22:24.3749687Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3749751Z shl.b16 %rs3010, %rs3009, 4; 2026-02-21T10:22:24.3749947Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3750021Z ld.shared.s8 %rs3011, [%r81+2176]; 2026-02-21T10:22:24.3750280Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3750343Z shl.b16 %rs3012, %rs3011, 4; 2026-02-21T10:22:24.3750535Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3750600Z ld.shared.s8 %rs3013, [%r82+2304]; 2026-02-21T10:22:24.3750790Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3750849Z shl.b16 %rs3014, %rs3013, 4; 2026-02-21T10:22:24.3751037Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3751106Z ld.shared.s8 %rs3015, [%r83+2432]; 2026-02-21T10:22:24.3751295Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3751353Z shl.b16 %rs3016, %rs3015, 4; 2026-02-21T10:22:24.3751542Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3751607Z ld.shared.s8 %rs3017, [%r84+2560]; 2026-02-21T10:22:24.3751864Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3751929Z shl.b16 %rs3018, %rs3017, 4; 2026-02-21T10:22:24.3752118Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3752179Z ld.shared.s8 
%rs3019, [%r85+2688]; 2026-02-21T10:22:24.3752365Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3752433Z shl.b16 %rs3020, %rs3019, 4; 2026-02-21T10:22:24.3752620Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3752682Z ld.shared.s8 %rs3021, [%r86+2816]; 2026-02-21T10:22:24.3752889Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3752951Z shl.b16 %rs3022, %rs3021, 4; 2026-02-21T10:22:24.3753139Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3753205Z ld.shared.s8 %rs3023, [%r87+2944]; 2026-02-21T10:22:24.3753392Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3753451Z shl.b16 %rs3024, %rs3023, 4; 2026-02-21T10:22:24.3753639Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3753700Z ld.shared.s8 %rs3025, [%r80+3072]; 2026-02-21T10:22:24.3753945Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3754003Z shl.b16 %rs3026, %rs3025, 4; 2026-02-21T10:22:24.3754191Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3754254Z ld.shared.s8 %rs3027, [%r81+3200]; 2026-02-21T10:22:24.3754452Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3754518Z shl.b16 %rs3028, %rs3027, 4; 2026-02-21T10:22:24.3754707Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3754768Z ld.shared.s8 %rs3029, [%r82+3328]; 2026-02-21T10:22:24.3754956Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3755015Z shl.b16 %rs3030, %rs3029, 4; 2026-02-21T10:22:24.3755252Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 
2026-02-21T10:22:24.3755317Z ld.shared.s8 %rs3031, [%r83+3456]; 2026-02-21T10:22:24.3755504Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3755563Z shl.b16 %rs3032, %rs3031, 4; 2026-02-21T10:22:24.3755796Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3755863Z ld.shared.s8 %rs3033, [%r84+3584]; 2026-02-21T10:22:24.3756049Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3756108Z shl.b16 %rs3034, %rs3033, 4; 2026-02-21T10:22:24.3756296Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3756358Z ld.shared.s8 %rs3035, [%r85+3712]; 2026-02-21T10:22:24.3756683Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3756751Z shl.b16 %rs3036, %rs3035, 4; 2026-02-21T10:22:24.3756937Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3756999Z ld.shared.s8 %rs3037, [%r86+3840]; 2026-02-21T10:22:24.3757188Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3757252Z shl.b16 %rs3038, %rs3037, 4; 2026-02-21T10:22:24.3757522Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3757591Z ld.shared.s8 %rs3039, [%r87+3968]; 2026-02-21T10:22:24.3757780Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3757838Z shl.b16 %rs3040, %rs3039, 4; 2026-02-21T10:22:24.3758024Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3758092Z cvt.s16.s8 %rs3041, %rs2978; 2026-02-21T10:22:24.3758150Z shr.s16 %rs3042, %rs3041, 4; 2026-02-21T10:22:24.3758209Z cvt.s16.s8 %rs3043, %rs2980; 2026-02-21T10:22:24.3758269Z shr.s16 %rs3044, %rs3043, 4; 2026-02-21T10:22:24.3758326Z shr.s16 %rs3045, %rs2977, 4; 2026-02-21T10:22:24.3758384Z 
shr.s16 %rs3046, %rs2979, 4; 2026-02-21T10:22:24.3758576Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3758641Z cvt.rn.f32.s16 %r39773, %rs3046; 2026-02-21T10:22:24.3758703Z cvt.rn.f32.s16 %r39774, %rs3045; 2026-02-21T10:22:24.3758764Z cvt.rn.f32.s16 %r39775, %rs3044; 2026-02-21T10:22:24.3758825Z cvt.rn.f32.s16 %r39776, %rs3042; 2026-02-21T10:22:24.3759013Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3759071Z cvt.s16.s8 %rs3047, %rs2982; 2026-02-21T10:22:24.3759133Z shr.s16 %rs3048, %rs3047, 4; 2026-02-21T10:22:24.3759264Z cvt.s16.s8 %rs3049, %rs2984; 2026-02-21T10:22:24.3759324Z shr.s16 %rs3050, %rs3049, 4; 2026-02-21T10:22:24.3759384Z shr.s16 %rs3051, %rs2981, 4; 2026-02-21T10:22:24.3759444Z shr.s16 %rs3052, %rs2983, 4; 2026-02-21T10:22:24.3759632Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3759706Z cvt.rn.f32.s16 %r39777, %rs3052; 2026-02-21T10:22:24.3759771Z cvt.rn.f32.s16 %r39778, %rs3051; 2026-02-21T10:22:24.3759831Z cvt.rn.f32.s16 %r39779, %rs3050; 2026-02-21T10:22:24.3759891Z cvt.rn.f32.s16 %r39780, %rs3048; 2026-02-21T10:22:24.3760083Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3760143Z cvt.s16.s8 %rs3053, %rs2986; 2026-02-21T10:22:24.3760206Z shr.s16 %rs3054, %rs3053, 4; 2026-02-21T10:22:24.3760267Z cvt.s16.s8 %rs3055, %rs2988; 2026-02-21T10:22:24.3760328Z shr.s16 %rs3056, %rs3055, 4; 2026-02-21T10:22:24.3760385Z shr.s16 %rs3057, %rs2985, 4; 2026-02-21T10:22:24.3760514Z shr.s16 %rs3058, %rs2987, 4; 2026-02-21T10:22:24.3760719Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3760783Z cvt.rn.f32.s16 %r39781, %rs3058; 2026-02-21T10:22:24.3760845Z cvt.rn.f32.s16 %r39782, %rs3057; 2026-02-21T10:22:24.3760990Z cvt.rn.f32.s16 %r39783, %rs3056; 2026-02-21T10:22:24.3761060Z cvt.rn.f32.s16 %r39784, 
%rs3054; 2026-02-21T10:22:24.3761259Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3761321Z cvt.s16.s8 %rs3059, %rs2990; 2026-02-21T10:22:24.3761383Z shr.s16 %rs3060, %rs3059, 4; 2026-02-21T10:22:24.3761441Z cvt.s16.s8 %rs3061, %rs2992; 2026-02-21T10:22:24.3761499Z shr.s16 %rs3062, %rs3061, 4; 2026-02-21T10:22:24.3761559Z shr.s16 %rs3063, %rs2989, 4; 2026-02-21T10:22:24.3761618Z shr.s16 %rs3064, %rs2991, 4; 2026-02-21T10:22:24.3761811Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3761877Z cvt.rn.f32.s16 %r39785, %rs3064; 2026-02-21T10:22:24.3761937Z cvt.rn.f32.s16 %r39786, %rs3063; 2026-02-21T10:22:24.3761997Z cvt.rn.f32.s16 %r39787, %rs3062; 2026-02-21T10:22:24.3762056Z cvt.rn.f32.s16 %r39788, %rs3060; 2026-02-21T10:22:24.3762248Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3762310Z cvt.s16.s8 %rs3065, %rs2994; 2026-02-21T10:22:24.3762422Z shr.s16 %rs3066, %rs3065, 4; 2026-02-21T10:22:24.3762485Z cvt.s16.s8 %rs3067, %rs2996; 2026-02-21T10:22:24.3762554Z shr.s16 %rs3068, %rs3067, 4; 2026-02-21T10:22:24.3762618Z shr.s16 %rs3069, %rs2993, 4; 2026-02-21T10:22:24.3762676Z shr.s16 %rs3070, %rs2995, 4; 2026-02-21T10:22:24.3762869Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3762931Z cvt.rn.f32.s16 %r39789, %rs3070; 2026-02-21T10:22:24.3762993Z cvt.rn.f32.s16 %r39790, %rs3069; 2026-02-21T10:22:24.3763058Z cvt.rn.f32.s16 %r39791, %rs3068; 2026-02-21T10:22:24.3763118Z cvt.rn.f32.s16 %r39792, %rs3066; 2026-02-21T10:22:24.3763306Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3763368Z cvt.s16.s8 %rs3071, %rs2998; 2026-02-21T10:22:24.3763429Z shr.s16 %rs3072, %rs3071, 4; 2026-02-21T10:22:24.3763486Z cvt.s16.s8 %rs3073, %rs3000; 2026-02-21T10:22:24.3763544Z shr.s16 %rs3074, %rs3073, 4; 2026-02-21T10:22:24.3763606Z 
shr.s16 %rs3075, %rs2997, 4; 2026-02-21T10:22:24.3763664Z shr.s16 %rs3076, %rs2999, 4; 2026-02-21T10:22:24.3763852Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3763916Z cvt.rn.f32.s16 %r39793, %rs3076; 2026-02-21T10:22:24.3763976Z cvt.rn.f32.s16 %r39794, %rs3075; 2026-02-21T10:22:24.3764034Z cvt.rn.f32.s16 %r39795, %rs3074; 2026-02-21T10:22:24.3764154Z cvt.rn.f32.s16 %r39796, %rs3072; 2026-02-21T10:22:24.3764343Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3764403Z cvt.s16.s8 %rs3077, %rs3002; 2026-02-21T10:22:24.3764460Z shr.s16 %rs3078, %rs3077, 4; 2026-02-21T10:22:24.3764521Z cvt.s16.s8 %rs3079, %rs3004; 2026-02-21T10:22:24.3764581Z shr.s16 %rs3080, %rs3079, 4; 2026-02-21T10:22:24.3764638Z shr.s16 %rs3081, %rs3001, 4; 2026-02-21T10:22:24.3764698Z shr.s16 %rs3082, %rs3003, 4; 2026-02-21T10:22:24.3764890Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3764952Z cvt.rn.f32.s16 %r39797, %rs3082; 2026-02-21T10:22:24.3765014Z cvt.rn.f32.s16 %r39798, %rs3081; 2026-02-21T10:22:24.3765073Z cvt.rn.f32.s16 %r39799, %rs3080; 2026-02-21T10:22:24.3765132Z cvt.rn.f32.s16 %r39800, %rs3078; 2026-02-21T10:22:24.3765319Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3765435Z cvt.s16.s8 %rs3083, %rs3006; 2026-02-21T10:22:24.3765496Z shr.s16 %rs3084, %rs3083, 4; 2026-02-21T10:22:24.3765565Z cvt.s16.s8 %rs3085, %rs3008; 2026-02-21T10:22:24.3765628Z shr.s16 %rs3086, %rs3085, 4; 2026-02-21T10:22:24.3765687Z shr.s16 %rs3087, %rs3005, 4; 2026-02-21T10:22:24.3765745Z shr.s16 %rs3088, %rs3007, 4; 2026-02-21T10:22:24.3765988Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3766062Z cvt.rn.f32.s16 %r39801, %rs3088; 2026-02-21T10:22:24.3766122Z cvt.rn.f32.s16 %r39802, %rs3087; 2026-02-21T10:22:24.3766185Z cvt.rn.f32.s16 %r39803, %rs3086; 
2026-02-21T10:22:24.3766248Z cvt.rn.f32.s16 %r39804, %rs3084; 2026-02-21T10:22:24.3766436Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3766623Z cvt.s16.s8 %rs3089, %rs3010; 2026-02-21T10:22:24.3766689Z shr.s16 %rs3090, %rs3089, 4; 2026-02-21T10:22:24.3766753Z cvt.s16.s8 %rs3091, %rs3012; 2026-02-21T10:22:24.3766810Z shr.s16 %rs3092, %rs3091, 4; 2026-02-21T10:22:24.3766868Z shr.s16 %rs3093, %rs3009, 4; 2026-02-21T10:22:24.3766928Z shr.s16 %rs3094, %rs3011, 4; 2026-02-21T10:22:24.3767117Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3767181Z cvt.rn.f32.s16 %r39805, %rs3094; 2026-02-21T10:22:24.3767244Z cvt.rn.f32.s16 %r39806, %rs3093; 2026-02-21T10:22:24.3767384Z cvt.rn.f32.s16 %r39807, %rs3092; 2026-02-21T10:22:24.3767450Z cvt.rn.f32.s16 %r39808, %rs3090; 2026-02-21T10:22:24.3767648Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3767708Z cvt.s16.s8 %rs3095, %rs3014; 2026-02-21T10:22:24.3767766Z shr.s16 %rs3096, %rs3095, 4; 2026-02-21T10:22:24.3767825Z cvt.s16.s8 %rs3097, %rs3016; 2026-02-21T10:22:24.3767884Z shr.s16 %rs3098, %rs3097, 4; 2026-02-21T10:22:24.3767945Z shr.s16 %rs3099, %rs3013, 4; 2026-02-21T10:22:24.3768004Z shr.s16 %rs3100, %rs3015, 4; 2026-02-21T10:22:24.3768196Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3768257Z cvt.rn.f32.s16 %r39809, %rs3100; 2026-02-21T10:22:24.3768328Z cvt.rn.f32.s16 %r39810, %rs3099; 2026-02-21T10:22:24.3768396Z cvt.rn.f32.s16 %r39811, %rs3098; 2026-02-21T10:22:24.3768457Z cvt.rn.f32.s16 %r39812, %rs3096; 2026-02-21T10:22:24.3768649Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3768708Z cvt.s16.s8 %rs3101, %rs3018; 2026-02-21T10:22:24.3768771Z shr.s16 %rs3102, %rs3101, 4; 2026-02-21T10:22:24.3768829Z cvt.s16.s8 %rs3103, %rs3020; 2026-02-21T10:22:24.3768887Z shr.s16 
%rs3104, %rs3103, 4; 2026-02-21T10:22:24.3768945Z shr.s16 %rs3105, %rs3017, 4; 2026-02-21T10:22:24.3769004Z shr.s16 %rs3106, %rs3019, 4; 2026-02-21T10:22:24.3769193Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3769338Z cvt.rn.f32.s16 %r39813, %rs3106; 2026-02-21T10:22:24.3769402Z cvt.rn.f32.s16 %r39814, %rs3105; 2026-02-21T10:22:24.3769462Z cvt.rn.f32.s16 %r39815, %rs3104; 2026-02-21T10:22:24.3769521Z cvt.rn.f32.s16 %r39816, %rs3102; 2026-02-21T10:22:24.3769719Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3769778Z cvt.s16.s8 %rs3107, %rs3022; 2026-02-21T10:22:24.3769839Z shr.s16 %rs3108, %rs3107, 4; 2026-02-21T10:22:24.3769899Z cvt.s16.s8 %rs3109, %rs3024; 2026-02-21T10:22:24.3769957Z shr.s16 %rs3110, %rs3109, 4; 2026-02-21T10:22:24.3770015Z shr.s16 %rs3111, %rs3021, 4; 2026-02-21T10:22:24.3770072Z shr.s16 %rs3112, %rs3023, 4; 2026-02-21T10:22:24.3770262Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3770323Z cvt.rn.f32.s16 %r39817, %rs3112; 2026-02-21T10:22:24.3770452Z cvt.rn.f32.s16 %r39818, %rs3111; 2026-02-21T10:22:24.3770520Z cvt.rn.f32.s16 %r39819, %rs3110; 2026-02-21T10:22:24.3770579Z cvt.rn.f32.s16 %r39820, %rs3108; 2026-02-21T10:22:24.3770768Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3770896Z cvt.s16.s8 %rs3113, %rs3026; 2026-02-21T10:22:24.3770963Z shr.s16 %rs3114, %rs3113, 4; 2026-02-21T10:22:24.3771022Z cvt.s16.s8 %rs3115, %rs3028; 2026-02-21T10:22:24.3771082Z shr.s16 %rs3116, %rs3115, 4; 2026-02-21T10:22:24.3771145Z shr.s16 %rs3117, %rs3025, 4; 2026-02-21T10:22:24.3771203Z shr.s16 %rs3118, %rs3027, 4; 2026-02-21T10:22:24.3771392Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3771455Z cvt.rn.f32.s16 %r39821, %rs3118; 2026-02-21T10:22:24.3771514Z cvt.rn.f32.s16 %r39822, %rs3117; 
2026-02-21T10:22:24.3771573Z cvt.rn.f32.s16 %r39823, %rs3116; 2026-02-21T10:22:24.3771637Z cvt.rn.f32.s16 %r39824, %rs3114; 2026-02-21T10:22:24.3771827Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3771886Z cvt.s16.s8 %rs3119, %rs3030; 2026-02-21T10:22:24.3771944Z shr.s16 %rs3120, %rs3119, 4; 2026-02-21T10:22:24.3772005Z cvt.s16.s8 %rs3121, %rs3032; 2026-02-21T10:22:24.3772066Z shr.s16 %rs3122, %rs3121, 4; 2026-02-21T10:22:24.3772123Z shr.s16 %rs3123, %rs3029, 4; 2026-02-21T10:22:24.3772183Z shr.s16 %rs3124, %rs3031, 4; 2026-02-21T10:22:24.3772425Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3772489Z cvt.rn.f32.s16 %r39825, %rs3124; 2026-02-21T10:22:24.3772549Z cvt.rn.f32.s16 %r39826, %rs3123; 2026-02-21T10:22:24.3772610Z cvt.rn.f32.s16 %r39827, %rs3122; 2026-02-21T10:22:24.3772668Z cvt.rn.f32.s16 %r39828, %rs3120; 2026-02-21T10:22:24.3772857Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3772923Z cvt.s16.s8 %rs3125, %rs3034; 2026-02-21T10:22:24.3772982Z shr.s16 %rs3126, %rs3125, 4; 2026-02-21T10:22:24.3773040Z cvt.s16.s8 %rs3127, %rs3036; 2026-02-21T10:22:24.3773100Z shr.s16 %rs3128, %rs3127, 4; 2026-02-21T10:22:24.3773158Z shr.s16 %rs3129, %rs3033, 4; 2026-02-21T10:22:24.3773218Z shr.s16 %rs3130, %rs3035, 4; 2026-02-21T10:22:24.3773407Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3773471Z cvt.rn.f32.s16 %r39829, %rs3130; 2026-02-21T10:22:24.3773531Z cvt.rn.f32.s16 %r39830, %rs3129; 2026-02-21T10:22:24.3773590Z cvt.rn.f32.s16 %r39831, %rs3128; 2026-02-21T10:22:24.3773651Z cvt.rn.f32.s16 %r39832, %rs3126; 2026-02-21T10:22:24.3773853Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3773913Z cvt.s16.s8 %rs3131, %rs3038; 2026-02-21T10:22:24.3773974Z shr.s16 %rs3132, %rs3131, 4; 2026-02-21T10:22:24.3774092Z 
cvt.s16.s8 %rs3133, %rs3040; 2026-02-21T10:22:24.3774153Z shr.s16 %rs3134, %rs3133, 4; 2026-02-21T10:22:24.3774210Z shr.s16 %rs3135, %rs3037, 4; 2026-02-21T10:22:24.3774269Z shr.s16 %rs3136, %rs3039, 4; 2026-02-21T10:22:24.3774455Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3774519Z cvt.rn.f32.s16 %r39833, %rs3136; 2026-02-21T10:22:24.3774581Z cvt.rn.f32.s16 %r39834, %rs3135; 2026-02-21T10:22:24.3774642Z cvt.rn.f32.s16 %r39835, %rs3134; 2026-02-21T10:22:24.3774701Z cvt.rn.f32.s16 %r39836, %rs3132; 2026-02-21T10:22:24.3774755Z bar.sync 0; 2026-02-21T10:22:24.3774878Z st.shared.v4.b32 [%r88], {%r39776, %r39774, %r39775, %r39773}; 2026-02-21T10:22:24.3775004Z st.shared.v4.b32 [%r88+16384], {%r39808, %r39806, %r39807, %r39805}; 2026-02-21T10:22:24.3775114Z st.shared.v4.b32 [%r89], {%r39780, %r39778, %r39779, %r39777}; 2026-02-21T10:22:24.3775234Z st.shared.v4.b32 [%r89+16384], {%r39812, %r39810, %r39811, %r39809}; 2026-02-21T10:22:24.3775393Z st.shared.v4.b32 [%r90], {%r39784, %r39782, %r39783, %r39781}; 2026-02-21T10:22:24.3775509Z st.shared.v4.b32 [%r90+16384], {%r39816, %r39814, %r39815, %r39813}; 2026-02-21T10:22:24.3775625Z st.shared.v4.b32 [%r91], {%r39788, %r39786, %r39787, %r39785}; 2026-02-21T10:22:24.3775797Z st.shared.v4.b32 [%r91+16384], {%r39820, %r39818, %r39819, %r39817}; 2026-02-21T10:22:24.3775903Z st.shared.v4.b32 [%r92], {%r39792, %r39790, %r39791, %r39789}; 2026-02-21T10:22:24.3776021Z st.shared.v4.b32 [%r92+16384], {%r39824, %r39822, %r39823, %r39821}; 2026-02-21T10:22:24.3776126Z st.shared.v4.b32 [%r93], {%r39796, %r39794, %r39795, %r39793}; 2026-02-21T10:22:24.3776240Z st.shared.v4.b32 [%r93+16384], {%r39828, %r39826, %r39827, %r39825}; 2026-02-21T10:22:24.3776343Z st.shared.v4.b32 [%r94], {%r39800, %r39798, %r39799, %r39797}; 2026-02-21T10:22:24.3776578Z st.shared.v4.b32 [%r94+16384], {%r39832, %r39830, %r39831, %r39829}; 2026-02-21T10:22:24.3776707Z st.shared.v4.b32 [%r95], {%r39804, 
%r39802, %r39803, %r39801}; 2026-02-21T10:22:24.3776824Z st.shared.v4.b32 [%r95+16384], {%r39836, %r39834, %r39835, %r39833}; 2026-02-21T10:22:24.3776880Z $L__tmp27: 2026-02-21T10:22:24.3777151Z .loc 2 291 36 // standard.py:291:36 @[ c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:94:40 ] 2026-02-21T10:22:24.3777213Z // begin inline asm 2026-02-21T10:22:24.3777293Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.3777348Z // end inline asm 2026-02-21T10:22:24.3777475Z bar.sync 0; 2026-02-21T10:22:24.3777548Z wgmma.fence.sync.aligned; 2026-02-21T10:22:24.3777610Z // begin inline asm 2026-02-21T10:22:24.3779088Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r35043,%r35044,%r35045,%r35046}, %rd12, %p317, 1, 1; 2026-02-21T10:22:24.3779148Z // end inline asm 2026-02-21T10:22:24.3779207Z // begin inline asm 2026-02-21T10:22:24.3780670Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r35175,%r35176,%r35177,%r35178}, %rd13, %p317, 1, 1; 
2026-02-21T10:22:24.3780793Z // end inline asm 2026-02-21T10:22:24.3780850Z // begin inline asm 2026-02-21T10:22:24.3782317Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r35307,%r35308,%r35309,%r35310}, %rd14, %p317, 1, 1; 2026-02-21T10:22:24.3782375Z // end inline asm 2026-02-21T10:22:24.3782443Z // begin inline asm 2026-02-21T10:22:24.3783983Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r35439,%r35440,%r35441,%r35442}, %rd15, %p317, 1, 1; 2026-02-21T10:22:24.3784100Z // end inline asm 2026-02-21T10:22:24.3784155Z // begin inline asm 2026-02-21T10:22:24.3785612Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r35571,%r35572,%r35573,%r35574}, %rd16, %p317, 1, 1; 2026-02-21T10:22:24.3785671Z // end inline asm 2026-02-21T10:22:24.3785730Z // begin inline asm 2026-02-21T10:22:24.3787401Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r35703,%r35704,%r35705,%r35706}, %rd17, %p317, 1, 1; 2026-02-21T10:22:24.3787467Z // end inline asm 2026-02-21T10:22:24.3787529Z // begin inline asm 2026-02-21T10:22:24.3789089Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r35835,%r35836,%r35837,%r35838}, %rd18, %p317, 1, 1; 2026-02-21T10:22:24.3789153Z // end inline asm 2026-02-21T10:22:24.3789210Z // begin inline asm 2026-02-21T10:22:24.3790664Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r35967,%r35968,%r35969,%r35970}, %rd19, %p317, 1, 1; 2026-02-21T10:22:24.3790803Z // end inline asm 2026-02-21T10:22:24.3790858Z // begin inline asm 2026-02-21T10:22:24.3792369Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r36099,%r36100,%r36101,%r36102}, %rd12, %p317, 1, 1; 2026-02-21T10:22:24.3792435Z // end inline asm 2026-02-21T10:22:24.3792491Z // begin inline asm 2026-02-21T10:22:24.3793949Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r36231,%r36232,%r36233,%r36234}, %rd13, %p317, 1, 1; 2026-02-21T10:22:24.3794064Z // end inline asm 2026-02-21T10:22:24.3794121Z // begin inline asm 2026-02-21T10:22:24.3795623Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r36363,%r36364,%r36365,%r36366}, %rd14, %p317, 1, 1; 2026-02-21T10:22:24.3795693Z // end inline asm 2026-02-21T10:22:24.3795751Z // begin inline asm 2026-02-21T10:22:24.3797321Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r36495,%r36496,%r36497,%r36498}, %rd15, %p317, 1, 1; 2026-02-21T10:22:24.3797384Z // end inline asm 2026-02-21T10:22:24.3797442Z // begin inline asm 2026-02-21T10:22:24.3798907Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r36627,%r36628,%r36629,%r36630}, %rd16, %p317, 1, 1; 2026-02-21T10:22:24.3799045Z // end inline asm 2026-02-21T10:22:24.3799105Z // begin inline asm 2026-02-21T10:22:24.3800594Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r36759,%r36760,%r36761,%r36762}, %rd17, %p317, 1, 1; 2026-02-21T10:22:24.3800655Z // end inline asm 2026-02-21T10:22:24.3800711Z // begin inline asm 2026-02-21T10:22:24.3802259Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r36891,%r36892,%r36893,%r36894}, %rd18, %p317, 1, 1; 2026-02-21T10:22:24.3802374Z // end inline asm 2026-02-21T10:22:24.3802431Z // begin inline asm 2026-02-21T10:22:24.3803915Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r37023,%r37024,%r37025,%r37026}, %rd19, %p317, 1, 1; 2026-02-21T10:22:24.3803974Z // end inline asm 2026-02-21T10:22:24.3804053Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:24.3804113Z mov.b32 %r37156, %r39572; 2026-02-21T10:22:24.3804175Z mov.b32 %r37157, %r39572; 2026-02-21T10:22:24.3804291Z mov.b32 %r37155, %r39936; 2026-02-21T10:22:24.3804362Z // begin inline asm 2026-02-21T10:22:24.3807007Z // wait for regs: 
%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373,%r37155,%r37156,%r37157 2026-02-21T10:22:24.3807093Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:24.3807151Z // end inline asm 2026-02-21T10:22:24.3807204Z $L__tmp28: 2026-02-21T10:22:24.3807410Z .loc 1 58 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:32 2026-02-21T10:22:24.3807554Z add.s64 %rd757, %rd675, 256; 2026-02-21T10:22:24.3807616Z add.s64 %rd760, %rd678, 256; 2026-02-21T10:22:24.3807674Z add.s64 %rd763, %rd681, 256; 2026-02-21T10:22:24.3807732Z add.s64 %rd766, %rd684, 256; 2026-02-21T10:22:24.3807791Z add.s64 %rd769, %rd687, 256; 2026-02-21T10:22:24.3807849Z add.s64 %rd772, %rd690, 256; 2026-02-21T10:22:24.3807909Z add.s64 %rd775, %rd693, 256; 2026-02-21T10:22:24.3807985Z mad.wide.s32 %rd778, %r43245, 2, %rd117; 2026-02-21T10:22:24.3808183Z .loc 1 58 80 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:80 2026-02-21T10:22:24.3808240Z // begin inline asm 2026-02-21T10:22:24.3808299Z mov.u64 %rd756, 0x0; 
2026-02-21T10:22:24.3808426Z createpolicy.fractional.L2::evict_first.b64 %rd756, 1.0; 2026-02-21T10:22:24.3808481Z // end inline asm 2026-02-21T10:22:24.3808537Z // begin inline asm 2026-02-21T10:22:24.3808598Z mov.u32 %r37289, 0x0; 2026-02-21T10:22:24.3808654Z mov.u32 %r37290, 0x0; 2026-02-21T10:22:24.3808711Z mov.u32 %r37291, 0x0; 2026-02-21T10:22:24.3808836Z mov.u32 %r37292, 0x0; 2026-02-21T10:22:24.3809077Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r37289, %r37290, %r37291, %r37292 }, [ %rd757 + 0 ], %rd756; 2026-02-21T10:22:24.3809133Z // end inline asm 2026-02-21T10:22:24.3809191Z // begin inline asm 2026-02-21T10:22:24.3809249Z mov.u64 %rd759, 0x0; 2026-02-21T10:22:24.3809432Z createpolicy.fractional.L2::evict_first.b64 %rd759, 1.0; 2026-02-21T10:22:24.3809487Z // end inline asm 2026-02-21T10:22:24.3809545Z // begin inline asm 2026-02-21T10:22:24.3809602Z mov.u32 %r37293, 0x0; 2026-02-21T10:22:24.3809657Z mov.u32 %r37294, 0x0; 2026-02-21T10:22:24.3809714Z mov.u32 %r37295, 0x0; 2026-02-21T10:22:24.3809771Z mov.u32 %r37296, 0x0; 2026-02-21T10:22:24.3809993Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r37293, %r37294, %r37295, %r37296 }, [ %rd760 + 0 ], %rd759; 2026-02-21T10:22:24.3810048Z // end inline asm 2026-02-21T10:22:24.3810106Z // begin inline asm 2026-02-21T10:22:24.3810164Z mov.u64 %rd762, 0x0; 2026-02-21T10:22:24.3810282Z createpolicy.fractional.L2::evict_first.b64 %rd762, 1.0; 2026-02-21T10:22:24.3810339Z // end inline asm 2026-02-21T10:22:24.3810396Z // begin inline asm 2026-02-21T10:22:24.3810453Z mov.u32 %r37297, 0x0; 2026-02-21T10:22:24.3810510Z mov.u32 %r37298, 0x0; 2026-02-21T10:22:24.3810566Z mov.u32 %r37299, 0x0; 2026-02-21T10:22:24.3810638Z mov.u32 %r37300, 0x0; 2026-02-21T10:22:24.3810932Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r37297, %r37298, %r37299, %r37300 }, [ %rd763 + 0 ], %rd762; 2026-02-21T10:22:24.3810992Z // end inline asm 2026-02-21T10:22:24.3811048Z // begin inline asm 2026-02-21T10:22:24.3811104Z 
mov.u64 %rd765, 0x0; 2026-02-21T10:22:24.3811221Z createpolicy.fractional.L2::evict_first.b64 %rd765, 1.0; 2026-02-21T10:22:24.3811276Z // end inline asm 2026-02-21T10:22:24.3811331Z // begin inline asm 2026-02-21T10:22:24.3811387Z mov.u32 %r37301, 0x0; 2026-02-21T10:22:24.3811446Z mov.u32 %r37302, 0x0; 2026-02-21T10:22:24.3811503Z mov.u32 %r37303, 0x0; 2026-02-21T10:22:24.3811559Z mov.u32 %r37304, 0x0; 2026-02-21T10:22:24.3811783Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r37301, %r37302, %r37303, %r37304 }, [ %rd766 + 0 ], %rd765; 2026-02-21T10:22:24.3811837Z // end inline asm 2026-02-21T10:22:24.3811893Z // begin inline asm 2026-02-21T10:22:24.3811952Z mov.u64 %rd768, 0x0; 2026-02-21T10:22:24.3812070Z createpolicy.fractional.L2::evict_first.b64 %rd768, 1.0; 2026-02-21T10:22:24.3812123Z // end inline asm 2026-02-21T10:22:24.3812183Z // begin inline asm 2026-02-21T10:22:24.3812243Z mov.u32 %r37305, 0x0; 2026-02-21T10:22:24.3812298Z mov.u32 %r37306, 0x0; 2026-02-21T10:22:24.3812352Z mov.u32 %r37307, 0x0; 2026-02-21T10:22:24.3812409Z mov.u32 %r37308, 0x0; 2026-02-21T10:22:24.3812627Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r37305, %r37306, %r37307, %r37308 }, [ %rd769 + 0 ], %rd768; 2026-02-21T10:22:24.3812682Z // end inline asm 2026-02-21T10:22:24.3812741Z // begin inline asm 2026-02-21T10:22:24.3812884Z mov.u64 %rd771, 0x0; 2026-02-21T10:22:24.3813001Z createpolicy.fractional.L2::evict_first.b64 %rd771, 1.0; 2026-02-21T10:22:24.3813057Z // end inline asm 2026-02-21T10:22:24.3813116Z // begin inline asm 2026-02-21T10:22:24.3813171Z mov.u32 %r37309, 0x0; 2026-02-21T10:22:24.3813225Z mov.u32 %r37310, 0x0; 2026-02-21T10:22:24.3813282Z mov.u32 %r37311, 0x0; 2026-02-21T10:22:24.3813340Z mov.u32 %r37312, 0x0; 2026-02-21T10:22:24.3813561Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r37309, %r37310, %r37311, %r37312 }, [ %rd772 + 0 ], %rd771; 2026-02-21T10:22:24.3813615Z // end inline asm 2026-02-21T10:22:24.3813674Z // begin inline asm 
2026-02-21T10:22:24.3813729Z mov.u64 %rd774, 0x0; 2026-02-21T10:22:24.3813844Z createpolicy.fractional.L2::evict_first.b64 %rd774, 1.0; 2026-02-21T10:22:24.3813900Z // end inline asm 2026-02-21T10:22:24.3813955Z // begin inline asm 2026-02-21T10:22:24.3814011Z mov.u32 %r37313, 0x0; 2026-02-21T10:22:24.3814068Z mov.u32 %r37314, 0x0; 2026-02-21T10:22:24.3814125Z mov.u32 %r37315, 0x0; 2026-02-21T10:22:24.3814242Z mov.u32 %r37316, 0x0; 2026-02-21T10:22:24.3814466Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r37313, %r37314, %r37315, %r37316 }, [ %rd775 + 0 ], %rd774; 2026-02-21T10:22:24.3814523Z // end inline asm 2026-02-21T10:22:24.3814580Z // begin inline asm 2026-02-21T10:22:24.3814685Z mov.u64 %rd777, 0x0; 2026-02-21T10:22:24.3814801Z createpolicy.fractional.L2::evict_first.b64 %rd777, 1.0; 2026-02-21T10:22:24.3814855Z // end inline asm 2026-02-21T10:22:24.3814913Z // begin inline asm 2026-02-21T10:22:24.3814970Z mov.u32 %r37317, 0x0; 2026-02-21T10:22:24.3815026Z mov.u32 %r37318, 0x0; 2026-02-21T10:22:24.3815081Z mov.u32 %r37319, 0x0; 2026-02-21T10:22:24.3815135Z mov.u32 %r37320, 0x0; 2026-02-21T10:22:24.3815356Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r37317, %r37318, %r37319, %r37320 }, [ %rd778 + 0 ], %rd777; 2026-02-21T10:22:24.3815411Z // end inline asm 2026-02-21T10:22:24.3815612Z .loc 1 62 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:62:32 2026-02-21T10:22:24.3815669Z bar.sync 0; 2026-02-21T10:22:24.3815755Z st.shared.v2.b32 [%r70], {%r37289, %r37290}; 2026-02-21T10:22:24.3815845Z st.shared.v2.b32 [%r70+2048], {%r37293, %r37294}; 2026-02-21T10:22:24.3815932Z st.shared.v2.b32 [%r70+4096], {%r37297, %r37298}; 2026-02-21T10:22:24.3816016Z st.shared.v2.b32 [%r70+6144], {%r37301, %r37302}; 2026-02-21T10:22:24.3816097Z st.shared.v2.b32 [%r70+8192], {%r37305, %r37306}; 2026-02-21T10:22:24.3816239Z st.shared.v2.b32 [%r70+10240], {%r37309, %r37310}; 2026-02-21T10:22:24.3816331Z st.shared.v2.b32 [%r70+12288], {%r37313, %r37314}; 
2026-02-21T10:22:24.3816421Z st.shared.v2.b32 [%r70+14336], {%r37317, %r37318}; 2026-02-21T10:22:24.3816626Z st.shared.v2.b32 [%r71], {%r37291, %r37292}; 2026-02-21T10:22:24.3816716Z st.shared.v2.b32 [%r71+2048], {%r37295, %r37296}; 2026-02-21T10:22:24.3816799Z st.shared.v2.b32 [%r71+4096], {%r37299, %r37300}; 2026-02-21T10:22:24.3816884Z st.shared.v2.b32 [%r71+6144], {%r37303, %r37304}; 2026-02-21T10:22:24.3816967Z st.shared.v2.b32 [%r71+8192], {%r37307, %r37308}; 2026-02-21T10:22:24.3817053Z st.shared.v2.b32 [%r71+10240], {%r37311, %r37312}; 2026-02-21T10:22:24.3817135Z st.shared.v2.b32 [%r71+12288], {%r37315, %r37316}; 2026-02-21T10:22:24.3817217Z st.shared.v2.b32 [%r71+14336], {%r37319, %r37320}; 2026-02-21T10:22:24.3817276Z bar.sync 0; 2026-02-21T10:22:24.3817344Z ld.shared.b16 %rs3137, [%r72]; 2026-02-21T10:22:24.3817415Z ld.shared.b16 %rs3138, [%r72+1024]; 2026-02-21T10:22:24.3817485Z ld.shared.b16 %rs3139, [%r72+64]; 2026-02-21T10:22:24.3817550Z ld.shared.b16 %rs3140, [%r72+1088]; 2026-02-21T10:22:24.3817624Z ld.shared.b16 %rs3141, [%r72+8192]; 2026-02-21T10:22:24.3817690Z ld.shared.b16 %rs3142, [%r72+9216]; 2026-02-21T10:22:24.3817755Z ld.shared.b16 %rs3143, [%r72+8256]; 2026-02-21T10:22:24.3817818Z ld.shared.b16 %rs3144, [%r72+9280]; 2026-02-21T10:22:24.3817882Z ld.shared.b16 %rs3145, [%r73]; 2026-02-21T10:22:24.3818037Z ld.shared.b16 %rs3146, [%r73+1024]; 2026-02-21T10:22:24.3818101Z ld.shared.b16 %rs3147, [%r73+64]; 2026-02-21T10:22:24.3818164Z ld.shared.b16 %rs3148, [%r73+1088]; 2026-02-21T10:22:24.3818229Z ld.shared.b16 %rs3149, [%r73+8192]; 2026-02-21T10:22:24.3818292Z ld.shared.b16 %rs3150, [%r73+9216]; 2026-02-21T10:22:24.3818355Z ld.shared.b16 %rs3151, [%r73+8256]; 2026-02-21T10:22:24.3818420Z ld.shared.b16 %rs3152, [%r73+9280]; 2026-02-21T10:22:24.3818486Z ld.shared.b16 %rs3153, [%r74]; 2026-02-21T10:22:24.3818551Z ld.shared.b16 %rs3154, [%r74+1024]; 2026-02-21T10:22:24.3818614Z ld.shared.b16 %rs3155, [%r74+64]; 2026-02-21T10:22:24.3818677Z 
ld.shared.b16 %rs3156, [%r74+1088]; 2026-02-21T10:22:24.3818740Z ld.shared.b16 %rs3157, [%r74+8192]; 2026-02-21T10:22:24.3818803Z ld.shared.b16 %rs3158, [%r74+9216]; 2026-02-21T10:22:24.3818865Z ld.shared.b16 %rs3159, [%r74+8256]; 2026-02-21T10:22:24.3818930Z ld.shared.b16 %rs3160, [%r74+9280]; 2026-02-21T10:22:24.3818994Z ld.shared.b16 %rs3161, [%r75]; 2026-02-21T10:22:24.3819125Z ld.shared.b16 %rs3162, [%r75+1024]; 2026-02-21T10:22:24.3819193Z ld.shared.b16 %rs3163, [%r75+64]; 2026-02-21T10:22:24.3819255Z ld.shared.b16 %rs3164, [%r75+1088]; 2026-02-21T10:22:24.3819317Z ld.shared.b16 %rs3165, [%r75+8192]; 2026-02-21T10:22:24.3819380Z ld.shared.b16 %rs3166, [%r75+9216]; 2026-02-21T10:22:24.3819505Z ld.shared.b16 %rs3167, [%r75+8256]; 2026-02-21T10:22:24.3819567Z ld.shared.b16 %rs3168, [%r75+9280]; 2026-02-21T10:22:24.3819627Z ld.shared.b16 %rs3169, [%r76]; 2026-02-21T10:22:24.3819694Z ld.shared.b16 %rs3170, [%r76+1024]; 2026-02-21T10:22:24.3819756Z ld.shared.b16 %rs3171, [%r76+64]; 2026-02-21T10:22:24.3819818Z ld.shared.b16 %rs3172, [%r76+1088]; 2026-02-21T10:22:24.3819883Z ld.shared.b16 %rs3173, [%r76+8192]; 2026-02-21T10:22:24.3819944Z ld.shared.b16 %rs3174, [%r76+9216]; 2026-02-21T10:22:24.3820005Z ld.shared.b16 %rs3175, [%r76+8256]; 2026-02-21T10:22:24.3820067Z ld.shared.b16 %rs3176, [%r76+9280]; 2026-02-21T10:22:24.3820136Z ld.shared.b16 %rs3177, [%r77]; 2026-02-21T10:22:24.3820213Z ld.shared.b16 %rs3178, [%r77+1024]; 2026-02-21T10:22:24.3820277Z ld.shared.b16 %rs3179, [%r77+64]; 2026-02-21T10:22:24.3820342Z ld.shared.b16 %rs3180, [%r77+1088]; 2026-02-21T10:22:24.3820406Z ld.shared.b16 %rs3181, [%r77+8192]; 2026-02-21T10:22:24.3820470Z ld.shared.b16 %rs3182, [%r77+9216]; 2026-02-21T10:22:24.3820535Z ld.shared.b16 %rs3183, [%r77+8256]; 2026-02-21T10:22:24.3820601Z ld.shared.b16 %rs3184, [%r77+9280]; 2026-02-21T10:22:24.3820731Z ld.shared.b16 %rs3185, [%r78]; 2026-02-21T10:22:24.3820796Z ld.shared.b16 %rs3186, [%r78+1024]; 2026-02-21T10:22:24.3820860Z 
ld.shared.b16 %rs3187, [%r78+64]; 2026-02-21T10:22:24.3820923Z ld.shared.b16 %rs3188, [%r78+1088]; 2026-02-21T10:22:24.3820985Z ld.shared.b16 %rs3189, [%r78+8192]; 2026-02-21T10:22:24.3821063Z ld.shared.b16 %rs3190, [%r78+9216]; 2026-02-21T10:22:24.3821127Z ld.shared.b16 %rs3191, [%r78+8256]; 2026-02-21T10:22:24.3821194Z ld.shared.b16 %rs3192, [%r78+9280]; 2026-02-21T10:22:24.3821260Z ld.shared.b16 %rs3193, [%r79]; 2026-02-21T10:22:24.3821325Z ld.shared.b16 %rs3194, [%r79+1024]; 2026-02-21T10:22:24.3821388Z ld.shared.b16 %rs3195, [%r79+64]; 2026-02-21T10:22:24.3821451Z ld.shared.b16 %rs3196, [%r79+1088]; 2026-02-21T10:22:24.3821515Z ld.shared.b16 %rs3197, [%r79+8192]; 2026-02-21T10:22:24.3821580Z ld.shared.b16 %rs3198, [%r79+9216]; 2026-02-21T10:22:24.3821642Z ld.shared.b16 %rs3199, [%r79+8256]; 2026-02-21T10:22:24.3821711Z ld.shared.b16 %rs3200, [%r79+9280]; 2026-02-21T10:22:24.3821772Z cvt.f32.bf16 %r37458, %rs3137; 2026-02-21T10:22:24.3821831Z cvt.f32.bf16 %r37459, %rs3138; 2026-02-21T10:22:24.3821891Z cvt.f32.bf16 %r37460, %rs3145; 2026-02-21T10:22:24.3821955Z cvt.f32.bf16 %r37461, %rs3146; 2026-02-21T10:22:24.3822013Z cvt.f32.bf16 %r37590, %rs3153; 2026-02-21T10:22:24.3822072Z cvt.f32.bf16 %r37591, %rs3154; 2026-02-21T10:22:24.3822132Z cvt.f32.bf16 %r37592, %rs3161; 2026-02-21T10:22:24.3822247Z cvt.f32.bf16 %r37593, %rs3162; 2026-02-21T10:22:24.3822308Z cvt.f32.bf16 %r37722, %rs3169; 2026-02-21T10:22:24.3822366Z cvt.f32.bf16 %r37723, %rs3170; 2026-02-21T10:22:24.3822428Z cvt.f32.bf16 %r37724, %rs3177; 2026-02-21T10:22:24.3822486Z cvt.f32.bf16 %r37725, %rs3178; 2026-02-21T10:22:24.3822543Z cvt.f32.bf16 %r37854, %rs3185; 2026-02-21T10:22:24.3822608Z cvt.f32.bf16 %r37855, %rs3186; 2026-02-21T10:22:24.3822667Z cvt.f32.bf16 %r37856, %rs3193; 2026-02-21T10:22:24.3822724Z cvt.f32.bf16 %r37857, %rs3194; 2026-02-21T10:22:24.3822785Z cvt.f32.bf16 %r37986, %rs3139; 2026-02-21T10:22:24.3822845Z cvt.f32.bf16 %r37987, %rs3140; 2026-02-21T10:22:24.3822903Z cvt.f32.bf16 
%r37988, %rs3147; 2026-02-21T10:22:24.3822962Z cvt.f32.bf16 %r37989, %rs3148; 2026-02-21T10:22:24.3823022Z cvt.f32.bf16 %r38118, %rs3155; 2026-02-21T10:22:24.3823079Z cvt.f32.bf16 %r38119, %rs3156; 2026-02-21T10:22:24.3823138Z cvt.f32.bf16 %r38120, %rs3163; 2026-02-21T10:22:24.3823201Z cvt.f32.bf16 %r38121, %rs3164; 2026-02-21T10:22:24.3823316Z cvt.f32.bf16 %r38250, %rs3171; 2026-02-21T10:22:24.3823376Z cvt.f32.bf16 %r38251, %rs3172; 2026-02-21T10:22:24.3823446Z cvt.f32.bf16 %r38252, %rs3179; 2026-02-21T10:22:24.3823509Z cvt.f32.bf16 %r38253, %rs3180; 2026-02-21T10:22:24.3823568Z cvt.f32.bf16 %r38382, %rs3187; 2026-02-21T10:22:24.3823626Z cvt.f32.bf16 %r38383, %rs3188; 2026-02-21T10:22:24.3823735Z cvt.f32.bf16 %r38384, %rs3195; 2026-02-21T10:22:24.3823794Z cvt.f32.bf16 %r38385, %rs3196; 2026-02-21T10:22:24.3823852Z cvt.f32.bf16 %r38514, %rs3141; 2026-02-21T10:22:24.3823924Z cvt.f32.bf16 %r38515, %rs3142; 2026-02-21T10:22:24.3823986Z cvt.f32.bf16 %r38516, %rs3149; 2026-02-21T10:22:24.3824046Z cvt.f32.bf16 %r38517, %rs3150; 2026-02-21T10:22:24.3824104Z cvt.f32.bf16 %r38646, %rs3157; 2026-02-21T10:22:24.3824165Z cvt.f32.bf16 %r38647, %rs3158; 2026-02-21T10:22:24.3824226Z cvt.f32.bf16 %r38648, %rs3165; 2026-02-21T10:22:24.3824283Z cvt.f32.bf16 %r38649, %rs3166; 2026-02-21T10:22:24.3824343Z cvt.f32.bf16 %r38778, %rs3173; 2026-02-21T10:22:24.3824407Z cvt.f32.bf16 %r38779, %rs3174; 2026-02-21T10:22:24.3824466Z cvt.f32.bf16 %r38780, %rs3181; 2026-02-21T10:22:24.3824524Z cvt.f32.bf16 %r38781, %rs3182; 2026-02-21T10:22:24.3824583Z cvt.f32.bf16 %r38910, %rs3189; 2026-02-21T10:22:24.3824642Z cvt.f32.bf16 %r38911, %rs3190; 2026-02-21T10:22:24.3824702Z cvt.f32.bf16 %r38912, %rs3197; 2026-02-21T10:22:24.3824762Z cvt.f32.bf16 %r38913, %rs3198; 2026-02-21T10:22:24.3824822Z cvt.f32.bf16 %r39042, %rs3143; 2026-02-21T10:22:24.3824931Z cvt.f32.bf16 %r39043, %rs3144; 2026-02-21T10:22:24.3824991Z cvt.f32.bf16 %r39044, %rs3151; 2026-02-21T10:22:24.3825052Z cvt.f32.bf16 %r39045, %rs3152; 
2026-02-21T10:22:24.3825110Z cvt.f32.bf16 %r39174, %rs3159; 2026-02-21T10:22:24.3825168Z cvt.f32.bf16 %r39175, %rs3160; 2026-02-21T10:22:24.3825226Z cvt.f32.bf16 %r39176, %rs3167; 2026-02-21T10:22:24.3825289Z cvt.f32.bf16 %r39177, %rs3168; 2026-02-21T10:22:24.3825347Z cvt.f32.bf16 %r39306, %rs3175; 2026-02-21T10:22:24.3825411Z cvt.f32.bf16 %r39307, %rs3176; 2026-02-21T10:22:24.3825472Z cvt.f32.bf16 %r39308, %rs3183; 2026-02-21T10:22:24.3825530Z cvt.f32.bf16 %r39309, %rs3184; 2026-02-21T10:22:24.3825587Z cvt.f32.bf16 %r39438, %rs3191; 2026-02-21T10:22:24.3825648Z cvt.f32.bf16 %r39439, %rs3192; 2026-02-21T10:22:24.3825708Z cvt.f32.bf16 %r39440, %rs3199; 2026-02-21T10:22:24.3825768Z cvt.f32.bf16 %r39441, %rs3200; 2026-02-21T10:22:24.3825976Z .loc 1 64 33 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:64:33 2026-02-21T10:22:24.3826032Z bar.sync 0; 2026-02-21T10:22:24.3826090Z // begin inline asm 2026-02-21T10:22:24.3826203Z @%p313 mbarrier.init.shared::cta.b64 [%r39934], 1; 2026-02-21T10:22:24.3826263Z // end inline asm 2026-02-21T10:22:24.3826318Z bar.sync 0; 2026-02-21T10:22:24.3826376Z // begin inline asm 2026-02-21T10:22:24.3826634Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r39934], 4096; 2026-02-21T10:22:24.3826696Z // end inline asm 2026-02-21T10:22:24.3826841Z // begin inline asm 2026-02-21T10:22:24.3826921Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.3826979Z // end inline asm 2026-02-21T10:22:24.3827035Z bar.sync 0; 2026-02-21T10:22:24.3827102Z elect.sync %r39837|%p376, -1; 2026-02-21T10:22:24.3827172Z and.pred %p355, %p1, %p376; 2026-02-21T10:22:24.3827233Z add.s32 %r37325, %r39772, 160; 2026-02-21T10:22:24.3827294Z // begin inline asm 2026-02-21T10:22:24.3827636Z @%p355 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39936], [%rd822, {%r39937, %r37325}], [%r39934]; 2026-02-21T10:22:24.3827696Z // end inline asm 2026-02-21T10:22:24.3827749Z bar.sync 0; 2026-02-21T10:22:24.3827806Z // begin inline asm 
2026-02-21T10:22:24.3827860Z 2026-02-21T10:22:24.3827909Z { 2026-02-21T10:22:24.3827971Z .reg .pred complete; 2026-02-21T10:22:24.3828025Z waitLoop: 2026-02-21T10:22:24.3828175Z mbarrier.try_wait.parity.shared.b64 complete, [%r39934], %r39572; 2026-02-21T10:22:24.3828245Z @!complete bra.uni waitLoop; 2026-02-21T10:22:24.3828365Z } 2026-02-21T10:22:24.3828370Z 2026-02-21T10:22:24.3828505Z // end inline asm 2026-02-21T10:22:24.3828560Z bar.sync 0; 2026-02-21T10:22:24.3828618Z // begin inline asm 2026-02-21T10:22:24.3828718Z @%p313 mbarrier.inval.shared::cta.b64 [%r39934]; 2026-02-21T10:22:24.3828772Z // end inline asm 2026-02-21T10:22:24.3829048Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3829113Z ld.shared.s8 %rs3201, [%r80]; 2026-02-21T10:22:24.3829310Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3829383Z shl.b16 %rs3202, %rs3201, 4; 2026-02-21T10:22:24.3829578Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3829647Z ld.shared.s8 %rs3203, [%r81+128]; 2026-02-21T10:22:24.3829834Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3829900Z shl.b16 %rs3204, %rs3203, 4; 2026-02-21T10:22:24.3830091Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3830156Z ld.shared.s8 %rs3205, [%r82+256]; 2026-02-21T10:22:24.3830345Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3830410Z shl.b16 %rs3206, %rs3205, 4; 2026-02-21T10:22:24.3830684Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3830750Z ld.shared.s8 %rs3207, [%r83+384]; 2026-02-21T10:22:24.3830938Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3830999Z shl.b16 %rs3208, %rs3207, 4; 2026-02-21T10:22:24.3831185Z .loc 1 69 
25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3831249Z ld.shared.s8 %rs3209, [%r84+512]; 2026-02-21T10:22:24.3831441Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3831501Z shl.b16 %rs3210, %rs3209, 4; 2026-02-21T10:22:24.3831686Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3831753Z ld.shared.s8 %rs3211, [%r85+640]; 2026-02-21T10:22:24.3831942Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3832002Z shl.b16 %rs3212, %rs3211, 4; 2026-02-21T10:22:24.3832192Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3832255Z ld.shared.s8 %rs3213, [%r86+768]; 2026-02-21T10:22:24.3832442Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3832502Z shl.b16 %rs3214, %rs3213, 4; 2026-02-21T10:22:24.3832762Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3832825Z ld.shared.s8 %rs3215, [%r87+896]; 2026-02-21T10:22:24.3833013Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3833078Z shl.b16 %rs3216, %rs3215, 4; 2026-02-21T10:22:24.3833265Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3833333Z ld.shared.s8 %rs3217, [%r80+1024]; 2026-02-21T10:22:24.3833520Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3833581Z shl.b16 %rs3218, %rs3217, 4; 2026-02-21T10:22:24.3833765Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3833831Z ld.shared.s8 %rs3219, [%r81+1152]; 2026-02-21T10:22:24.3834067Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3834130Z shl.b16 %rs3220, %rs3219, 4; 
2026-02-21T10:22:24.3834317Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3834383Z ld.shared.s8 %rs3221, [%r82+1280]; 2026-02-21T10:22:24.3834620Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3834678Z shl.b16 %rs3222, %rs3221, 4; 2026-02-21T10:22:24.3834867Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3834929Z ld.shared.s8 %rs3223, [%r83+1408]; 2026-02-21T10:22:24.3835116Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3835180Z shl.b16 %rs3224, %rs3223, 4; 2026-02-21T10:22:24.3835365Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3835431Z ld.shared.s8 %rs3225, [%r84+1536]; 2026-02-21T10:22:24.3835620Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3835681Z shl.b16 %rs3226, %rs3225, 4; 2026-02-21T10:22:24.3835868Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3835934Z ld.shared.s8 %rs3227, [%r85+1664]; 2026-02-21T10:22:24.3836186Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3836259Z shl.b16 %rs3228, %rs3227, 4; 2026-02-21T10:22:24.3836557Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3836627Z ld.shared.s8 %rs3229, [%r86+1792]; 2026-02-21T10:22:24.3836816Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3836881Z shl.b16 %rs3230, %rs3229, 4; 2026-02-21T10:22:24.3837075Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3837138Z ld.shared.s8 %rs3231, [%r87+1920]; 2026-02-21T10:22:24.3837327Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3837391Z shl.b16 
%rs3232, %rs3231, 4; 2026-02-21T10:22:24.3837580Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3837653Z ld.shared.s8 %rs3233, [%r80+2048]; 2026-02-21T10:22:24.3837844Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3837906Z shl.b16 %rs3234, %rs3233, 4; 2026-02-21T10:22:24.3838091Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3838154Z ld.shared.s8 %rs3235, [%r81+2176]; 2026-02-21T10:22:24.3838426Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3838485Z shl.b16 %rs3236, %rs3235, 4; 2026-02-21T10:22:24.3838672Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3838737Z ld.shared.s8 %rs3237, [%r82+2304]; 2026-02-21T10:22:24.3838927Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3838988Z shl.b16 %rs3238, %rs3237, 4; 2026-02-21T10:22:24.3839178Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3839240Z ld.shared.s8 %rs3239, [%r83+2432]; 2026-02-21T10:22:24.3839427Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3839489Z shl.b16 %rs3240, %rs3239, 4; 2026-02-21T10:22:24.3839742Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3839809Z ld.shared.s8 %rs3241, [%r84+2560]; 2026-02-21T10:22:24.3839997Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3840060Z shl.b16 %rs3242, %rs3241, 4; 2026-02-21T10:22:24.3840309Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3840373Z ld.shared.s8 %rs3243, [%r85+2688]; 2026-02-21T10:22:24.3840577Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 
2026-02-21T10:22:24.3840638Z shl.b16 %rs3244, %rs3243, 4; 2026-02-21T10:22:24.3840827Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3840895Z ld.shared.s8 %rs3245, [%r86+2816]; 2026-02-21T10:22:24.3841083Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3841147Z shl.b16 %rs3246, %rs3245, 4; 2026-02-21T10:22:24.3841337Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3841400Z ld.shared.s8 %rs3247, [%r87+2944]; 2026-02-21T10:22:24.3841587Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3841649Z shl.b16 %rs3248, %rs3247, 4; 2026-02-21T10:22:24.3841910Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3841975Z ld.shared.s8 %rs3249, [%r80+3072]; 2026-02-21T10:22:24.3842162Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3842224Z shl.b16 %rs3250, %rs3249, 4; 2026-02-21T10:22:24.3842411Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3842478Z ld.shared.s8 %rs3251, [%r81+3200]; 2026-02-21T10:22:24.3842668Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3842727Z shl.b16 %rs3252, %rs3251, 4; 2026-02-21T10:22:24.3842928Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3842999Z ld.shared.s8 %rs3253, [%r82+3328]; 2026-02-21T10:22:24.3843190Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3843250Z shl.b16 %rs3254, %rs3253, 4; 2026-02-21T10:22:24.3843436Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3843500Z ld.shared.s8 %rs3255, [%r83+3456]; 2026-02-21T10:22:24.3843688Z .loc 1 67 28 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3843748Z shl.b16 %rs3256, %rs3255, 4; 2026-02-21T10:22:24.3843996Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3844059Z ld.shared.s8 %rs3257, [%r84+3584]; 2026-02-21T10:22:24.3844247Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3844308Z shl.b16 %rs3258, %rs3257, 4; 2026-02-21T10:22:24.3844500Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3844564Z ld.shared.s8 %rs3259, [%r85+3712]; 2026-02-21T10:22:24.3844753Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3844812Z shl.b16 %rs3260, %rs3259, 4; 2026-02-21T10:22:24.3844998Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3845059Z ld.shared.s8 %rs3261, [%r86+3840]; 2026-02-21T10:22:24.3845297Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3845362Z shl.b16 %rs3262, %rs3261, 4; 2026-02-21T10:22:24.3845549Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3845617Z ld.shared.s8 %rs3263, [%r87+3968]; 2026-02-21T10:22:24.3845855Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3845914Z shl.b16 %rs3264, %rs3263, 4; 2026-02-21T10:22:24.3846115Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3846174Z cvt.s16.s8 %rs3265, %rs3202; 2026-02-21T10:22:24.3846232Z shr.s16 %rs3266, %rs3265, 4; 2026-02-21T10:22:24.3846293Z cvt.s16.s8 %rs3267, %rs3204; 2026-02-21T10:22:24.3846352Z shr.s16 %rs3268, %rs3267, 4; 2026-02-21T10:22:24.3846411Z shr.s16 %rs3269, %rs3201, 4; 2026-02-21T10:22:24.3846593Z shr.s16 %rs3270, %rs3203, 4; 2026-02-21T10:22:24.3846797Z .loc 1 87 32 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3846863Z cvt.rn.f32.s16 %r39838, %rs3270; 2026-02-21T10:22:24.3846926Z cvt.rn.f32.s16 %r39839, %rs3269; 2026-02-21T10:22:24.3846992Z cvt.rn.f32.s16 %r39840, %rs3268; 2026-02-21T10:22:24.3847052Z cvt.rn.f32.s16 %r39841, %rs3266; 2026-02-21T10:22:24.3847244Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3847388Z cvt.s16.s8 %rs3271, %rs3206; 2026-02-21T10:22:24.3847451Z shr.s16 %rs3272, %rs3271, 4; 2026-02-21T10:22:24.3847511Z cvt.s16.s8 %rs3273, %rs3208; 2026-02-21T10:22:24.3847569Z shr.s16 %rs3274, %rs3273, 4; 2026-02-21T10:22:24.3847629Z shr.s16 %rs3275, %rs3205, 4; 2026-02-21T10:22:24.3847687Z shr.s16 %rs3276, %rs3207, 4; 2026-02-21T10:22:24.3847875Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3847943Z cvt.rn.f32.s16 %r39842, %rs3276; 2026-02-21T10:22:24.3848006Z cvt.rn.f32.s16 %r39843, %rs3275; 2026-02-21T10:22:24.3848065Z cvt.rn.f32.s16 %r39844, %rs3274; 2026-02-21T10:22:24.3848124Z cvt.rn.f32.s16 %r39845, %rs3272; 2026-02-21T10:22:24.3848316Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3848379Z cvt.s16.s8 %rs3277, %rs3210; 2026-02-21T10:22:24.3848437Z shr.s16 %rs3278, %rs3277, 4; 2026-02-21T10:22:24.3848498Z cvt.s16.s8 %rs3279, %rs3212; 2026-02-21T10:22:24.3848559Z shr.s16 %rs3280, %rs3279, 4; 2026-02-21T10:22:24.3848617Z shr.s16 %rs3281, %rs3209, 4; 2026-02-21T10:22:24.3848680Z shr.s16 %rs3282, %rs3211, 4; 2026-02-21T10:22:24.3848868Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3848929Z cvt.rn.f32.s16 %r39846, %rs3282; 2026-02-21T10:22:24.3848990Z cvt.rn.f32.s16 %r39847, %rs3281; 2026-02-21T10:22:24.3849139Z cvt.rn.f32.s16 %r39848, %rs3280; 2026-02-21T10:22:24.3849201Z cvt.rn.f32.s16 %r39849, %rs3278; 2026-02-21T10:22:24.3849389Z .loc 1 69 25 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3849450Z cvt.s16.s8 %rs3283, %rs3214; 2026-02-21T10:22:24.3849508Z shr.s16 %rs3284, %rs3283, 4; 2026-02-21T10:22:24.3849568Z cvt.s16.s8 %rs3285, %rs3216; 2026-02-21T10:22:24.3849628Z shr.s16 %rs3286, %rs3285, 4; 2026-02-21T10:22:24.3849686Z shr.s16 %rs3287, %rs3213, 4; 2026-02-21T10:22:24.3849746Z shr.s16 %rs3288, %rs3215, 4; 2026-02-21T10:22:24.3849935Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3849999Z cvt.rn.f32.s16 %r39850, %rs3288; 2026-02-21T10:22:24.3850060Z cvt.rn.f32.s16 %r39851, %rs3287; 2026-02-21T10:22:24.3850119Z cvt.rn.f32.s16 %r39852, %rs3286; 2026-02-21T10:22:24.3850181Z cvt.rn.f32.s16 %r39853, %rs3284; 2026-02-21T10:22:24.3850437Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3850503Z cvt.s16.s8 %rs3289, %rs3218; 2026-02-21T10:22:24.3850564Z shr.s16 %rs3290, %rs3289, 4; 2026-02-21T10:22:24.3850622Z cvt.s16.s8 %rs3291, %rs3220; 2026-02-21T10:22:24.3850680Z shr.s16 %rs3292, %rs3291, 4; 2026-02-21T10:22:24.3850817Z shr.s16 %rs3293, %rs3217, 4; 2026-02-21T10:22:24.3850877Z shr.s16 %rs3294, %rs3219, 4; 2026-02-21T10:22:24.3851067Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3851129Z cvt.rn.f32.s16 %r39854, %rs3294; 2026-02-21T10:22:24.3851192Z cvt.rn.f32.s16 %r39855, %rs3293; 2026-02-21T10:22:24.3851251Z cvt.rn.f32.s16 %r39856, %rs3292; 2026-02-21T10:22:24.3851310Z cvt.rn.f32.s16 %r39857, %rs3290; 2026-02-21T10:22:24.3851497Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3851558Z cvt.s16.s8 %rs3295, %rs3222; 2026-02-21T10:22:24.3851622Z shr.s16 %rs3296, %rs3295, 4; 2026-02-21T10:22:24.3851680Z cvt.s16.s8 %rs3297, %rs3224; 2026-02-21T10:22:24.3851742Z shr.s16 %rs3298, %rs3297, 4; 2026-02-21T10:22:24.3851800Z shr.s16 %rs3299, %rs3221, 4; 
2026-02-21T10:22:24.3851859Z shr.s16 %rs3300, %rs3223, 4; 2026-02-21T10:22:24.3852053Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3852130Z cvt.rn.f32.s16 %r39858, %rs3300; 2026-02-21T10:22:24.3852246Z cvt.rn.f32.s16 %r39859, %rs3299; 2026-02-21T10:22:24.3852309Z cvt.rn.f32.s16 %r39860, %rs3298; 2026-02-21T10:22:24.3852374Z cvt.rn.f32.s16 %r39861, %rs3296; 2026-02-21T10:22:24.3852563Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3852623Z cvt.s16.s8 %rs3301, %rs3226; 2026-02-21T10:22:24.3852687Z shr.s16 %rs3302, %rs3301, 4; 2026-02-21T10:22:24.3852745Z cvt.s16.s8 %rs3303, %rs3228; 2026-02-21T10:22:24.3852806Z shr.s16 %rs3304, %rs3303, 4; 2026-02-21T10:22:24.3852869Z shr.s16 %rs3305, %rs3225, 4; 2026-02-21T10:22:24.3852926Z shr.s16 %rs3306, %rs3227, 4; 2026-02-21T10:22:24.3853115Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3853176Z cvt.rn.f32.s16 %r39862, %rs3306; 2026-02-21T10:22:24.3853241Z cvt.rn.f32.s16 %r39863, %rs3305; 2026-02-21T10:22:24.3853301Z cvt.rn.f32.s16 %r39864, %rs3304; 2026-02-21T10:22:24.3853360Z cvt.rn.f32.s16 %r39865, %rs3302; 2026-02-21T10:22:24.3853552Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3853610Z cvt.s16.s8 %rs3307, %rs3230; 2026-02-21T10:22:24.3853667Z shr.s16 %rs3308, %rs3307, 4; 2026-02-21T10:22:24.3853726Z cvt.s16.s8 %rs3309, %rs3232; 2026-02-21T10:22:24.3853787Z shr.s16 %rs3310, %rs3309, 4; 2026-02-21T10:22:24.3853844Z shr.s16 %rs3311, %rs3229, 4; 2026-02-21T10:22:24.3853914Z shr.s16 %rs3312, %rs3231, 4; 2026-02-21T10:22:24.3854165Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3854226Z cvt.rn.f32.s16 %r39866, %rs3312; 2026-02-21T10:22:24.3854286Z cvt.rn.f32.s16 %r39867, %rs3311; 2026-02-21T10:22:24.3854347Z cvt.rn.f32.s16 %r39868, %rs3310; 2026-02-21T10:22:24.3854410Z 
cvt.rn.f32.s16 %r39869, %rs3308; 2026-02-21T10:22:24.3854598Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3854658Z cvt.s16.s8 %rs3313, %rs3234; 2026-02-21T10:22:24.3854719Z shr.s16 %rs3314, %rs3313, 4; 2026-02-21T10:22:24.3854788Z cvt.s16.s8 %rs3315, %rs3236; 2026-02-21T10:22:24.3854847Z shr.s16 %rs3316, %rs3315, 4; 2026-02-21T10:22:24.3854907Z shr.s16 %rs3317, %rs3233, 4; 2026-02-21T10:22:24.3854965Z shr.s16 %rs3318, %rs3235, 4; 2026-02-21T10:22:24.3855153Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3855269Z cvt.rn.f32.s16 %r39870, %rs3318; 2026-02-21T10:22:24.3855331Z cvt.rn.f32.s16 %r39871, %rs3317; 2026-02-21T10:22:24.3855390Z cvt.rn.f32.s16 %r39872, %rs3316; 2026-02-21T10:22:24.3855449Z cvt.rn.f32.s16 %r39873, %rs3314; 2026-02-21T10:22:24.3855639Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3855744Z cvt.s16.s8 %rs3319, %rs3238; 2026-02-21T10:22:24.3855803Z shr.s16 %rs3320, %rs3319, 4; 2026-02-21T10:22:24.3855865Z cvt.s16.s8 %rs3321, %rs3240; 2026-02-21T10:22:24.3855924Z shr.s16 %rs3322, %rs3321, 4; 2026-02-21T10:22:24.3855981Z shr.s16 %rs3323, %rs3237, 4; 2026-02-21T10:22:24.3856040Z shr.s16 %rs3324, %rs3239, 4; 2026-02-21T10:22:24.3856228Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3856296Z cvt.rn.f32.s16 %r39874, %rs3324; 2026-02-21T10:22:24.3856356Z cvt.rn.f32.s16 %r39875, %rs3323; 2026-02-21T10:22:24.3856420Z cvt.rn.f32.s16 %r39876, %rs3322; 2026-02-21T10:22:24.3856604Z cvt.rn.f32.s16 %r39877, %rs3320; 2026-02-21T10:22:24.3856799Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3856863Z cvt.s16.s8 %rs3325, %rs3242; 2026-02-21T10:22:24.3856924Z shr.s16 %rs3326, %rs3325, 4; 2026-02-21T10:22:24.3856985Z cvt.s16.s8 %rs3327, %rs3244; 2026-02-21T10:22:24.3857043Z shr.s16 %rs3328, %rs3327, 4; 
2026-02-21T10:22:24.3857105Z shr.s16 %rs3329, %rs3241, 4; 2026-02-21T10:22:24.3857242Z shr.s16 %rs3330, %rs3243, 4; 2026-02-21T10:22:24.3857448Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3857515Z cvt.rn.f32.s16 %r39878, %rs3330; 2026-02-21T10:22:24.3857575Z cvt.rn.f32.s16 %r39879, %rs3329; 2026-02-21T10:22:24.3857634Z cvt.rn.f32.s16 %r39880, %rs3328; 2026-02-21T10:22:24.3857696Z cvt.rn.f32.s16 %r39881, %rs3326; 2026-02-21T10:22:24.3857891Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3857950Z cvt.s16.s8 %rs3331, %rs3246; 2026-02-21T10:22:24.3858009Z shr.s16 %rs3332, %rs3331, 4; 2026-02-21T10:22:24.3858070Z cvt.s16.s8 %rs3333, %rs3248; 2026-02-21T10:22:24.3858129Z shr.s16 %rs3334, %rs3333, 4; 2026-02-21T10:22:24.3858189Z shr.s16 %rs3335, %rs3245, 4; 2026-02-21T10:22:24.3858259Z shr.s16 %rs3336, %rs3247, 4; 2026-02-21T10:22:24.3858453Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3858515Z cvt.rn.f32.s16 %r39882, %rs3336; 2026-02-21T10:22:24.3858577Z cvt.rn.f32.s16 %r39883, %rs3335; 2026-02-21T10:22:24.3858637Z cvt.rn.f32.s16 %r39884, %rs3334; 2026-02-21T10:22:24.3858696Z cvt.rn.f32.s16 %r39885, %rs3332; 2026-02-21T10:22:24.3858884Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3858947Z cvt.s16.s8 %rs3337, %rs3250; 2026-02-21T10:22:24.3859080Z shr.s16 %rs3338, %rs3337, 4; 2026-02-21T10:22:24.3859140Z cvt.s16.s8 %rs3339, %rs3252; 2026-02-21T10:22:24.3859200Z shr.s16 %rs3340, %rs3339, 4; 2026-02-21T10:22:24.3859257Z shr.s16 %rs3341, %rs3249, 4; 2026-02-21T10:22:24.3859314Z shr.s16 %rs3342, %rs3251, 4; 2026-02-21T10:22:24.3859505Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3859568Z cvt.rn.f32.s16 %r39886, %rs3342; 2026-02-21T10:22:24.3859630Z cvt.rn.f32.s16 %r39887, %rs3341; 2026-02-21T10:22:24.3859690Z 
cvt.rn.f32.s16 %r39888, %rs3340; 2026-02-21T10:22:24.3859751Z cvt.rn.f32.s16 %r39889, %rs3338; 2026-02-21T10:22:24.3859942Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3860003Z cvt.s16.s8 %rs3343, %rs3254; 2026-02-21T10:22:24.3860065Z shr.s16 %rs3344, %rs3343, 4; 2026-02-21T10:22:24.3860129Z cvt.s16.s8 %rs3345, %rs3256; 2026-02-21T10:22:24.3860256Z shr.s16 %rs3346, %rs3345, 4; 2026-02-21T10:22:24.3860328Z shr.s16 %rs3347, %rs3253, 4; 2026-02-21T10:22:24.3860392Z shr.s16 %rs3348, %rs3255, 4; 2026-02-21T10:22:24.3860584Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3860645Z cvt.rn.f32.s16 %r39890, %rs3348; 2026-02-21T10:22:24.3860774Z cvt.rn.f32.s16 %r39891, %rs3347; 2026-02-21T10:22:24.3860834Z cvt.rn.f32.s16 %r39892, %rs3346; 2026-02-21T10:22:24.3860895Z cvt.rn.f32.s16 %r39893, %rs3344; 2026-02-21T10:22:24.3861085Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3861155Z cvt.s16.s8 %rs3349, %rs3258; 2026-02-21T10:22:24.3861216Z shr.s16 %rs3350, %rs3349, 4; 2026-02-21T10:22:24.3861275Z cvt.s16.s8 %rs3351, %rs3260; 2026-02-21T10:22:24.3861336Z shr.s16 %rs3352, %rs3351, 4; 2026-02-21T10:22:24.3861393Z shr.s16 %rs3353, %rs3257, 4; 2026-02-21T10:22:24.3861453Z shr.s16 %rs3354, %rs3259, 4; 2026-02-21T10:22:24.3861645Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3861706Z cvt.rn.f32.s16 %r39894, %rs3354; 2026-02-21T10:22:24.3861767Z cvt.rn.f32.s16 %r39895, %rs3353; 2026-02-21T10:22:24.3861832Z cvt.rn.f32.s16 %r39896, %rs3352; 2026-02-21T10:22:24.3861894Z cvt.rn.f32.s16 %r39897, %rs3350; 2026-02-21T10:22:24.3862083Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3862195Z cvt.s16.s8 %rs3355, %rs3262; 2026-02-21T10:22:24.3862258Z shr.s16 %rs3356, %rs3355, 4; 2026-02-21T10:22:24.3862318Z cvt.s16.s8 %rs3357, %rs3264; 
2026-02-21T10:22:24.3862375Z shr.s16 %rs3358, %rs3357, 4; 2026-02-21T10:22:24.3862437Z shr.s16 %rs3359, %rs3261, 4; 2026-02-21T10:22:24.3862496Z shr.s16 %rs3360, %rs3263, 4; 2026-02-21T10:22:24.3862684Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3862752Z cvt.rn.f32.s16 %r39898, %rs3360; 2026-02-21T10:22:24.3862812Z cvt.rn.f32.s16 %r39899, %rs3359; 2026-02-21T10:22:24.3862873Z cvt.rn.f32.s16 %r39900, %rs3358; 2026-02-21T10:22:24.3862932Z cvt.rn.f32.s16 %r39901, %rs3356; 2026-02-21T10:22:24.3862988Z bar.sync 0; 2026-02-21T10:22:24.3863104Z st.shared.v4.b32 [%r88], {%r39841, %r39839, %r39840, %r39838}; 2026-02-21T10:22:24.3863233Z st.shared.v4.b32 [%r88+16384], {%r39873, %r39871, %r39872, %r39870}; 2026-02-21T10:22:24.3863347Z st.shared.v4.b32 [%r89], {%r39845, %r39843, %r39844, %r39842}; 2026-02-21T10:22:24.3863465Z st.shared.v4.b32 [%r89+16384], {%r39877, %r39875, %r39876, %r39874}; 2026-02-21T10:22:24.3863570Z st.shared.v4.b32 [%r90], {%r39849, %r39847, %r39848, %r39846}; 2026-02-21T10:22:24.3863685Z st.shared.v4.b32 [%r90+16384], {%r39881, %r39879, %r39880, %r39878}; 2026-02-21T10:22:24.3863791Z st.shared.v4.b32 [%r91], {%r39853, %r39851, %r39852, %r39850}; 2026-02-21T10:22:24.3863976Z st.shared.v4.b32 [%r91+16384], {%r39885, %r39883, %r39884, %r39882}; 2026-02-21T10:22:24.3864084Z st.shared.v4.b32 [%r92], {%r39857, %r39855, %r39856, %r39854}; 2026-02-21T10:22:24.3864210Z st.shared.v4.b32 [%r92+16384], {%r39889, %r39887, %r39888, %r39886}; 2026-02-21T10:22:24.3864319Z st.shared.v4.b32 [%r93], {%r39861, %r39859, %r39860, %r39858}; 2026-02-21T10:22:24.3864440Z st.shared.v4.b32 [%r93+16384], {%r39893, %r39891, %r39892, %r39890}; 2026-02-21T10:22:24.3864549Z st.shared.v4.b32 [%r94], {%r39865, %r39863, %r39864, %r39862}; 2026-02-21T10:22:24.3864668Z st.shared.v4.b32 [%r94+16384], {%r39897, %r39895, %r39896, %r39894}; 2026-02-21T10:22:24.3864772Z st.shared.v4.b32 [%r95], {%r39869, %r39867, %r39868, %r39866}; 
2026-02-21T10:22:24.3864890Z st.shared.v4.b32 [%r95+16384], {%r39901, %r39899, %r39900, %r39898}; 2026-02-21T10:22:24.3864945Z $L__tmp29: 2026-02-21T10:22:24.3865225Z .loc 2 291 36 // standard.py:291:36 @[ c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:94:40 ] 2026-02-21T10:22:24.3865339Z // begin inline asm 2026-02-21T10:22:24.3865417Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.3865473Z // end inline asm 2026-02-21T10:22:24.3865528Z bar.sync 0; 2026-02-21T10:22:24.3865602Z wgmma.fence.sync.aligned; 2026-02-21T10:22:24.3865658Z // begin inline asm 2026-02-21T10:22:24.3867331Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r37458,%r37459,%r37460,%r37461}, %rd12, %p317, 1, 1; 2026-02-21T10:22:24.3867396Z // end inline asm 2026-02-21T10:22:24.3867457Z // begin inline asm 2026-02-21T10:22:24.3869110Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r37590,%r37591,%r37592,%r37593}, %rd13, %p317, 1, 1; 2026-02-21T10:22:24.3869175Z 
// end inline asm 2026-02-21T10:22:24.3869232Z // begin inline asm 2026-02-21T10:22:24.3870712Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r37722,%r37723,%r37724,%r37725}, %rd14, %p317, 1, 1; 2026-02-21T10:22:24.3870772Z // end inline asm 2026-02-21T10:22:24.3870829Z // begin inline asm 2026-02-21T10:22:24.3872310Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r37854,%r37855,%r37856,%r37857}, %rd15, %p317, 1, 1; 2026-02-21T10:22:24.3872441Z // end inline asm 2026-02-21T10:22:24.3872501Z // begin inline asm 2026-02-21T10:22:24.3873978Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r37986,%r37987,%r37988,%r37989}, %rd16, %p317, 1, 1; 2026-02-21T10:22:24.3874037Z // end inline asm 2026-02-21T10:22:24.3874096Z // begin inline asm 2026-02-21T10:22:24.3875651Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r38118,%r38119,%r38120,%r38121}, %rd17, %p317, 1, 1; 2026-02-21T10:22:24.3875764Z // end inline asm 2026-02-21T10:22:24.3875820Z // begin inline asm 2026-02-21T10:22:24.3877436Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r38250,%r38251,%r38252,%r38253}, %rd18, %p317, 1, 1; 2026-02-21T10:22:24.3877503Z // end inline asm 2026-02-21T10:22:24.3877560Z // begin inline asm 2026-02-21T10:22:24.3879107Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r38382,%r38383,%r38384,%r38385}, %rd19, %p317, 1, 1; 2026-02-21T10:22:24.3879175Z // end inline asm 2026-02-21T10:22:24.3879231Z // begin inline asm 2026-02-21T10:22:24.3880709Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r38514,%r38515,%r38516,%r38517}, %rd12, %p317, 1, 1; 2026-02-21T10:22:24.3880767Z // end inline asm 2026-02-21T10:22:24.3880823Z // begin inline asm 2026-02-21T10:22:24.3882305Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r38646,%r38647,%r38648,%r38649}, %rd13, %p317, 1, 1; 2026-02-21T10:22:24.3882435Z // end inline asm 2026-02-21T10:22:24.3882494Z // begin inline asm 2026-02-21T10:22:24.3884042Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r38778,%r38779,%r38780,%r38781}, %rd14, %p317, 1, 1; 2026-02-21T10:22:24.3884105Z // end inline asm 2026-02-21T10:22:24.3884229Z // begin inline asm 2026-02-21T10:22:24.3885708Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r38910,%r38911,%r38912,%r38913}, %rd15, %p317, 1, 1; 2026-02-21T10:22:24.3885767Z // end inline asm 2026-02-21T10:22:24.3885826Z // begin inline asm 2026-02-21T10:22:24.3887509Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r39042,%r39043,%r39044,%r39045}, %rd16, %p317, 1, 1; 2026-02-21T10:22:24.3887587Z // end inline asm 2026-02-21T10:22:24.3887646Z // begin inline asm 2026-02-21T10:22:24.3889149Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r39174,%r39175,%r39176,%r39177}, %rd17, %p317, 1, 1; 2026-02-21T10:22:24.3889215Z // end inline asm 2026-02-21T10:22:24.3889277Z // begin inline asm 2026-02-21T10:22:24.3890767Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r39306,%r39307,%r39308,%r39309}, %rd18, %p317, 1, 1; 2026-02-21T10:22:24.3890890Z // end inline asm 2026-02-21T10:22:24.3890947Z // begin inline asm 2026-02-21T10:22:24.3892433Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r39438,%r39439,%r39440,%r39441}, %rd19, %p317, 1, 1; 2026-02-21T10:22:24.3892490Z // end inline asm 2026-02-21T10:22:24.3892625Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:24.3892691Z mov.b32 %r39570, %r39936; 2026-02-21T10:22:24.3892761Z mov.b32 %r39571, %r39572; 2026-02-21T10:22:24.3892819Z // begin inline asm 2026-02-21T10:22:24.3895343Z // wait for regs: 
%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373,%r39570,%r39571,%r39572 2026-02-21T10:22:24.3895486Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:24.3895544Z // end inline asm 2026-02-21T10:22:24.3895598Z $L__tmp30: 2026-02-21T10:22:24.3895860Z .loc 1 51 126 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:51:126 2026-02-21T10:22:24.3895925Z add.s64 %rd852, %rd852, 384; 2026-02-21T10:22:24.3895990Z add.s32 %r43245, %r43245, 192; 2026-02-21T10:22:24.3896057Z setp.lt.u64 %p377, %rd103, 3936; 2026-02-21T10:22:24.3896116Z mov.b64 %rd853, %rd103; 2026-02-21T10:22:24.3896190Z @%p377 bra $L__BB0_18; 2026-02-21T10:22:24.3896296Z // %bb.19: // %.preheader.preheader 2026-02-21T10:22:24.3896402Z // in Loop: Header=BB0_17 Depth=1 2026-02-21T10:22:24.3896583Z add.s64 %rd105, %rd100, 16128; 2026-02-21T10:22:24.3896650Z add.s64 %rd106, %rd93, 16128; 2026-02-21T10:22:24.3896708Z add.s64 %rd107, %rd94, 16128; 2026-02-21T10:22:24.3896766Z add.s64 %rd108, %rd95, 16128; 2026-02-21T10:22:24.3896827Z add.s64 
%rd109, %rd96, 16128; 2026-02-21T10:22:24.3896887Z add.s64 %rd110, %rd97, 16128; 2026-02-21T10:22:24.3896944Z add.s64 %rd111, %rd98, 16128; 2026-02-21T10:22:24.3897004Z add.s64 %rd112, %rd99, 16128; 2026-02-21T10:22:24.3897063Z mov.b64 %rd855, 4000; 2026-02-21T10:22:24.3897122Z mov.b64 %rd854, %rd20; 2026-02-21T10:22:24.3897213Z $L__BB0_20: // %.preheader 2026-02-21T10:22:24.3897316Z // Parent Loop BB0_17 Depth=1 2026-02-21T10:22:24.3897419Z // => This Inner Loop Header: Depth=2 2026-02-21T10:22:24.3897625Z .loc 1 58 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:32 2026-02-21T10:22:24.3897778Z add.s64 %rd799, %rd854, %rd112; 2026-02-21T10:22:24.3897839Z add.s64 %rd802, %rd854, %rd111; 2026-02-21T10:22:24.3897898Z add.s64 %rd805, %rd854, %rd110; 2026-02-21T10:22:24.3897961Z add.s64 %rd808, %rd854, %rd109; 2026-02-21T10:22:24.3898021Z add.s64 %rd811, %rd854, %rd108; 2026-02-21T10:22:24.3898082Z add.s64 %rd814, %rd854, %rd107; 2026-02-21T10:22:24.3898142Z add.s64 %rd817, %rd854, %rd106; 2026-02-21T10:22:24.3898344Z .loc 1 58 80 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:58:80 2026-02-21T10:22:24.3898405Z add.s64 %rd820, %rd854, %rd105; 2026-02-21T10:22:24.3898463Z // begin inline asm 2026-02-21T10:22:24.3898521Z mov.u64 %rd798, 0x0; 2026-02-21T10:22:24.3898650Z createpolicy.fractional.L2::evict_first.b64 %rd798, 1.0; 2026-02-21T10:22:24.3898705Z // end inline asm 2026-02-21T10:22:24.3898763Z // begin inline asm 2026-02-21T10:22:24.3898820Z mov.u32 %r39902, 0x0; 2026-02-21T10:22:24.3898945Z mov.u32 %r39903, 0x0; 2026-02-21T10:22:24.3899014Z mov.u32 %r39904, 0x0; 2026-02-21T10:22:24.3899076Z mov.u32 %r39905, 0x0; 2026-02-21T10:22:24.3899314Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r39902, %r39903, %r39904, %r39905 }, [ %rd799 + 0 ], %rd798; 2026-02-21T10:22:24.3899368Z // end inline asm 2026-02-21T10:22:24.3899492Z // begin inline asm 2026-02-21T10:22:24.3899550Z mov.u64 %rd801, 0x0; 2026-02-21T10:22:24.3899669Z 
createpolicy.fractional.L2::evict_first.b64 %rd801, 1.0; 2026-02-21T10:22:24.3899731Z // end inline asm 2026-02-21T10:22:24.3899788Z // begin inline asm 2026-02-21T10:22:24.3899843Z mov.u32 %r39906, 0x0; 2026-02-21T10:22:24.3899898Z mov.u32 %r39907, 0x0; 2026-02-21T10:22:24.3899954Z mov.u32 %r39908, 0x0; 2026-02-21T10:22:24.3900008Z mov.u32 %r39909, 0x0; 2026-02-21T10:22:24.3900231Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r39906, %r39907, %r39908, %r39909 }, [ %rd802 + 0 ], %rd801; 2026-02-21T10:22:24.3900294Z // end inline asm 2026-02-21T10:22:24.3900352Z // begin inline asm 2026-02-21T10:22:24.3900409Z mov.u64 %rd804, 0x0; 2026-02-21T10:22:24.3900526Z createpolicy.fractional.L2::evict_first.b64 %rd804, 1.0; 2026-02-21T10:22:24.3900585Z // end inline asm 2026-02-21T10:22:24.3900642Z // begin inline asm 2026-02-21T10:22:24.3900697Z mov.u32 %r39910, 0x0; 2026-02-21T10:22:24.3900758Z mov.u32 %r39911, 0x0; 2026-02-21T10:22:24.3900812Z mov.u32 %r39912, 0x0; 2026-02-21T10:22:24.3900867Z mov.u32 %r39913, 0x0; 2026-02-21T10:22:24.3901159Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r39910, %r39911, %r39912, %r39913 }, [ %rd805 + 0 ], %rd804; 2026-02-21T10:22:24.3901216Z // end inline asm 2026-02-21T10:22:24.3901272Z // begin inline asm 2026-02-21T10:22:24.3901328Z mov.u64 %rd807, 0x0; 2026-02-21T10:22:24.3901457Z createpolicy.fractional.L2::evict_first.b64 %rd807, 1.0; 2026-02-21T10:22:24.3901516Z // end inline asm 2026-02-21T10:22:24.3901576Z // begin inline asm 2026-02-21T10:22:24.3901634Z mov.u32 %r39914, 0x0; 2026-02-21T10:22:24.3901695Z mov.u32 %r39915, 0x0; 2026-02-21T10:22:24.3901750Z mov.u32 %r39916, 0x0; 2026-02-21T10:22:24.3901805Z mov.u32 %r39917, 0x0; 2026-02-21T10:22:24.3902027Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r39914, %r39915, %r39916, %r39917 }, [ %rd808 + 0 ], %rd807; 2026-02-21T10:22:24.3902082Z // end inline asm 2026-02-21T10:22:24.3902140Z // begin inline asm 2026-02-21T10:22:24.3902199Z mov.u64 %rd810, 0x0; 
2026-02-21T10:22:24.3902316Z createpolicy.fractional.L2::evict_first.b64 %rd810, 1.0; 2026-02-21T10:22:24.3902373Z // end inline asm 2026-02-21T10:22:24.3902432Z // begin inline asm 2026-02-21T10:22:24.3902487Z mov.u32 %r39918, 0x0; 2026-02-21T10:22:24.3902541Z mov.u32 %r39919, 0x0; 2026-02-21T10:22:24.3902596Z mov.u32 %r39920, 0x0; 2026-02-21T10:22:24.3902653Z mov.u32 %r39921, 0x0; 2026-02-21T10:22:24.3902872Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r39918, %r39919, %r39920, %r39921 }, [ %rd811 + 0 ], %rd810; 2026-02-21T10:22:24.3903003Z // end inline asm 2026-02-21T10:22:24.3903064Z // begin inline asm 2026-02-21T10:22:24.3903119Z mov.u64 %rd813, 0x0; 2026-02-21T10:22:24.3903234Z createpolicy.fractional.L2::evict_first.b64 %rd813, 1.0; 2026-02-21T10:22:24.3903304Z // end inline asm 2026-02-21T10:22:24.3903362Z // begin inline asm 2026-02-21T10:22:24.3903420Z mov.u32 %r39922, 0x0; 2026-02-21T10:22:24.3903479Z mov.u32 %r39923, 0x0; 2026-02-21T10:22:24.3903535Z mov.u32 %r39924, 0x0; 2026-02-21T10:22:24.3903590Z mov.u32 %r39925, 0x0; 2026-02-21T10:22:24.3903813Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r39922, %r39923, %r39924, %r39925 }, [ %rd814 + 0 ], %rd813; 2026-02-21T10:22:24.3903870Z // end inline asm 2026-02-21T10:22:24.3903925Z // begin inline asm 2026-02-21T10:22:24.3903981Z mov.u64 %rd816, 0x0; 2026-02-21T10:22:24.3904098Z createpolicy.fractional.L2::evict_first.b64 %rd816, 1.0; 2026-02-21T10:22:24.3904152Z // end inline asm 2026-02-21T10:22:24.3904208Z // begin inline asm 2026-02-21T10:22:24.3904262Z mov.u32 %r39926, 0x0; 2026-02-21T10:22:24.3904374Z mov.u32 %r39927, 0x0; 2026-02-21T10:22:24.3904432Z mov.u32 %r39928, 0x0; 2026-02-21T10:22:24.3904487Z mov.u32 %r39929, 0x0; 2026-02-21T10:22:24.3904712Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r39926, %r39927, %r39928, %r39929 }, [ %rd817 + 0 ], %rd816; 2026-02-21T10:22:24.3904767Z // end inline asm 2026-02-21T10:22:24.3904870Z // begin inline asm 2026-02-21T10:22:24.3904926Z 
mov.u64 %rd819, 0x0; 2026-02-21T10:22:24.3905055Z createpolicy.fractional.L2::evict_first.b64 %rd819, 1.0; 2026-02-21T10:22:24.3905112Z // end inline asm 2026-02-21T10:22:24.3905168Z // begin inline asm 2026-02-21T10:22:24.3905226Z mov.u32 %r39930, 0x0; 2026-02-21T10:22:24.3905281Z mov.u32 %r39931, 0x0; 2026-02-21T10:22:24.3905336Z mov.u32 %r39932, 0x0; 2026-02-21T10:22:24.3905390Z mov.u32 %r39933, 0x0; 2026-02-21T10:22:24.3905620Z ld.global.L1::evict_first.L2::cache_hint.v4.b32 { %r39930, %r39931, %r39932, %r39933 }, [ %rd820 + 0 ], %rd819; 2026-02-21T10:22:24.3905680Z // end inline asm 2026-02-21T10:22:24.3905880Z .loc 1 62 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:62:32 2026-02-21T10:22:24.3905937Z bar.sync 0; 2026-02-21T10:22:24.3906020Z st.shared.v2.b32 [%r70], {%r39902, %r39903}; 2026-02-21T10:22:24.3906111Z st.shared.v2.b32 [%r70+2048], {%r39906, %r39907}; 2026-02-21T10:22:24.3906201Z st.shared.v2.b32 [%r70+4096], {%r39910, %r39911}; 2026-02-21T10:22:24.3906282Z st.shared.v2.b32 [%r70+6144], {%r39914, %r39915}; 2026-02-21T10:22:24.3906419Z st.shared.v2.b32 [%r70+8192], {%r39918, %r39919}; 2026-02-21T10:22:24.3906640Z st.shared.v2.b32 [%r70+10240], {%r39922, %r39923}; 2026-02-21T10:22:24.3906730Z st.shared.v2.b32 [%r70+12288], {%r39926, %r39927}; 2026-02-21T10:22:24.3906814Z st.shared.v2.b32 [%r70+14336], {%r39930, %r39931}; 2026-02-21T10:22:24.3906891Z st.shared.v2.b32 [%r71], {%r39904, %r39905}; 2026-02-21T10:22:24.3906975Z st.shared.v2.b32 [%r71+2048], {%r39908, %r39909}; 2026-02-21T10:22:24.3907064Z st.shared.v2.b32 [%r71+4096], {%r39912, %r39913}; 2026-02-21T10:22:24.3907146Z st.shared.v2.b32 [%r71+6144], {%r39916, %r39917}; 2026-02-21T10:22:24.3907228Z st.shared.v2.b32 [%r71+8192], {%r39920, %r39921}; 2026-02-21T10:22:24.3907316Z st.shared.v2.b32 [%r71+10240], {%r39924, %r39925}; 2026-02-21T10:22:24.3907399Z st.shared.v2.b32 [%r71+12288], {%r39928, %r39929}; 2026-02-21T10:22:24.3907487Z st.shared.v2.b32 [%r71+14336], {%r39932, 
%r39933}; 2026-02-21T10:22:24.3907541Z bar.sync 0; 2026-02-21T10:22:24.3907609Z ld.shared.b16 %rs3361, [%r72]; 2026-02-21T10:22:24.3907678Z ld.shared.b16 %rs3362, [%r72+1024]; 2026-02-21T10:22:24.3907757Z ld.shared.b16 %rs3363, [%r72+64]; 2026-02-21T10:22:24.3907826Z ld.shared.b16 %rs3364, [%r72+1088]; 2026-02-21T10:22:24.3907891Z ld.shared.b16 %rs3365, [%r72+8192]; 2026-02-21T10:22:24.3907958Z ld.shared.b16 %rs3366, [%r72+9216]; 2026-02-21T10:22:24.3908021Z ld.shared.b16 %rs3367, [%r72+8256]; 2026-02-21T10:22:24.3908083Z ld.shared.b16 %rs3368, [%r72+9280]; 2026-02-21T10:22:24.3908226Z ld.shared.b16 %rs3369, [%r73]; 2026-02-21T10:22:24.3908291Z ld.shared.b16 %rs3370, [%r73+1024]; 2026-02-21T10:22:24.3908352Z ld.shared.b16 %rs3371, [%r73+64]; 2026-02-21T10:22:24.3908494Z ld.shared.b16 %rs3372, [%r73+1088]; 2026-02-21T10:22:24.3908565Z ld.shared.b16 %rs3373, [%r73+8192]; 2026-02-21T10:22:24.3908632Z ld.shared.b16 %rs3374, [%r73+9216]; 2026-02-21T10:22:24.3908695Z ld.shared.b16 %rs3375, [%r73+8256]; 2026-02-21T10:22:24.3908760Z ld.shared.b16 %rs3376, [%r73+9280]; 2026-02-21T10:22:24.3908824Z ld.shared.b16 %rs3377, [%r74]; 2026-02-21T10:22:24.3908886Z ld.shared.b16 %rs3378, [%r74+1024]; 2026-02-21T10:22:24.3908949Z ld.shared.b16 %rs3379, [%r74+64]; 2026-02-21T10:22:24.3909014Z ld.shared.b16 %rs3380, [%r74+1088]; 2026-02-21T10:22:24.3909076Z ld.shared.b16 %rs3381, [%r74+8192]; 2026-02-21T10:22:24.3909137Z ld.shared.b16 %rs3382, [%r74+9216]; 2026-02-21T10:22:24.3909202Z ld.shared.b16 %rs3383, [%r74+8256]; 2026-02-21T10:22:24.3909269Z ld.shared.b16 %rs3384, [%r74+9280]; 2026-02-21T10:22:24.3909408Z ld.shared.b16 %rs3385, [%r75]; 2026-02-21T10:22:24.3909475Z ld.shared.b16 %rs3386, [%r75+1024]; 2026-02-21T10:22:24.3909539Z ld.shared.b16 %rs3387, [%r75+64]; 2026-02-21T10:22:24.3909602Z ld.shared.b16 %rs3388, [%r75+1088]; 2026-02-21T10:22:24.3909664Z ld.shared.b16 %rs3389, [%r75+8192]; 2026-02-21T10:22:24.3909792Z ld.shared.b16 %rs3390, [%r75+9216]; 
2026-02-21T10:22:24.3909854Z ld.shared.b16 %rs3391, [%r75+8256]; 2026-02-21T10:22:24.3909921Z ld.shared.b16 %rs3392, [%r75+9280]; 2026-02-21T10:22:24.3909986Z ld.shared.b16 %rs3393, [%r76]; 2026-02-21T10:22:24.3910050Z ld.shared.b16 %rs3394, [%r76+1024]; 2026-02-21T10:22:24.3910114Z ld.shared.b16 %rs3395, [%r76+64]; 2026-02-21T10:22:24.3910176Z ld.shared.b16 %rs3396, [%r76+1088]; 2026-02-21T10:22:24.3910241Z ld.shared.b16 %rs3397, [%r76+8192]; 2026-02-21T10:22:24.3910303Z ld.shared.b16 %rs3398, [%r76+9216]; 2026-02-21T10:22:24.3910366Z ld.shared.b16 %rs3399, [%r76+8256]; 2026-02-21T10:22:24.3910436Z ld.shared.b16 %rs3400, [%r76+9280]; 2026-02-21T10:22:24.3910498Z ld.shared.b16 %rs3401, [%r77]; 2026-02-21T10:22:24.3910561Z ld.shared.b16 %rs3402, [%r77+1024]; 2026-02-21T10:22:24.3910623Z ld.shared.b16 %rs3403, [%r77+64]; 2026-02-21T10:22:24.3910689Z ld.shared.b16 %rs3404, [%r77+1088]; 2026-02-21T10:22:24.3910755Z ld.shared.b16 %rs3405, [%r77+8192]; 2026-02-21T10:22:24.3910815Z ld.shared.b16 %rs3406, [%r77+9216]; 2026-02-21T10:22:24.3910881Z ld.shared.b16 %rs3407, [%r77+8256]; 2026-02-21T10:22:24.3911015Z ld.shared.b16 %rs3408, [%r77+9280]; 2026-02-21T10:22:24.3911081Z ld.shared.b16 %rs3409, [%r78]; 2026-02-21T10:22:24.3911148Z ld.shared.b16 %rs3410, [%r78+1024]; 2026-02-21T10:22:24.3911212Z ld.shared.b16 %rs3411, [%r78+64]; 2026-02-21T10:22:24.3911275Z ld.shared.b16 %rs3412, [%r78+1088]; 2026-02-21T10:22:24.3911338Z ld.shared.b16 %rs3413, [%r78+8192]; 2026-02-21T10:22:24.3911402Z ld.shared.b16 %rs3414, [%r78+9216]; 2026-02-21T10:22:24.3911468Z ld.shared.b16 %rs3415, [%r78+8256]; 2026-02-21T10:22:24.3911533Z ld.shared.b16 %rs3416, [%r78+9280]; 2026-02-21T10:22:24.3911598Z ld.shared.b16 %rs3417, [%r79]; 2026-02-21T10:22:24.3911660Z ld.shared.b16 %rs3418, [%r79+1024]; 2026-02-21T10:22:24.3911722Z ld.shared.b16 %rs3419, [%r79+64]; 2026-02-21T10:22:24.3911785Z ld.shared.b16 %rs3420, [%r79+1088]; 2026-02-21T10:22:24.3911853Z ld.shared.b16 %rs3421, [%r79+8192]; 
2026-02-21T10:22:24.3911915Z ld.shared.b16 %rs3422, [%r79+9216]; 2026-02-21T10:22:24.3911979Z ld.shared.b16 %rs3423, [%r79+8256]; 2026-02-21T10:22:24.3912056Z ld.shared.b16 %rs3424, [%r79+9280]; 2026-02-21T10:22:24.3912120Z cvt.f32.bf16 %r40071, %rs3361; 2026-02-21T10:22:24.3912180Z cvt.f32.bf16 %r40072, %rs3362; 2026-02-21T10:22:24.3912242Z cvt.f32.bf16 %r40073, %rs3369; 2026-02-21T10:22:24.3912302Z cvt.f32.bf16 %r40074, %rs3370; 2026-02-21T10:22:24.3912360Z cvt.f32.bf16 %r40203, %rs3377; 2026-02-21T10:22:24.3912419Z cvt.f32.bf16 %r40204, %rs3378; 2026-02-21T10:22:24.3912539Z cvt.f32.bf16 %r40205, %rs3385; 2026-02-21T10:22:24.3912599Z cvt.f32.bf16 %r40206, %rs3386; 2026-02-21T10:22:24.3912659Z cvt.f32.bf16 %r40335, %rs3393; 2026-02-21T10:22:24.3912721Z cvt.f32.bf16 %r40336, %rs3394; 2026-02-21T10:22:24.3912780Z cvt.f32.bf16 %r40337, %rs3401; 2026-02-21T10:22:24.3912839Z cvt.f32.bf16 %r40338, %rs3402; 2026-02-21T10:22:24.3912902Z cvt.f32.bf16 %r40467, %rs3409; 2026-02-21T10:22:24.3912962Z cvt.f32.bf16 %r40468, %rs3410; 2026-02-21T10:22:24.3913021Z cvt.f32.bf16 %r40469, %rs3417; 2026-02-21T10:22:24.3913082Z cvt.f32.bf16 %r40470, %rs3418; 2026-02-21T10:22:24.3913142Z cvt.f32.bf16 %r40599, %rs3363; 2026-02-21T10:22:24.3913201Z cvt.f32.bf16 %r40600, %rs3364; 2026-02-21T10:22:24.3913258Z cvt.f32.bf16 %r40601, %rs3371; 2026-02-21T10:22:24.3913318Z cvt.f32.bf16 %r40602, %rs3372; 2026-02-21T10:22:24.3913387Z cvt.f32.bf16 %r40731, %rs3379; 2026-02-21T10:22:24.3913449Z cvt.f32.bf16 %r40732, %rs3380; 2026-02-21T10:22:24.3913508Z cvt.f32.bf16 %r40733, %rs3387; 2026-02-21T10:22:24.3913622Z cvt.f32.bf16 %r40734, %rs3388; 2026-02-21T10:22:24.3913686Z cvt.f32.bf16 %r40863, %rs3395; 2026-02-21T10:22:24.3913743Z cvt.f32.bf16 %r40864, %rs3396; 2026-02-21T10:22:24.3913802Z cvt.f32.bf16 %r40865, %rs3403; 2026-02-21T10:22:24.3913866Z cvt.f32.bf16 %r40866, %rs3404; 2026-02-21T10:22:24.3913983Z cvt.f32.bf16 %r40995, %rs3411; 2026-02-21T10:22:24.3914043Z cvt.f32.bf16 %r40996, %rs3412; 
2026-02-21T10:22:24.3914105Z cvt.f32.bf16 %r40997, %rs3419; 2026-02-21T10:22:24.3914164Z cvt.f32.bf16 %r40998, %rs3420; 2026-02-21T10:22:24.3914223Z cvt.f32.bf16 %r41127, %rs3365; 2026-02-21T10:22:24.3914283Z cvt.f32.bf16 %r41128, %rs3366; 2026-02-21T10:22:24.3914342Z cvt.f32.bf16 %r41129, %rs3373; 2026-02-21T10:22:24.3914400Z cvt.f32.bf16 %r41130, %rs3374; 2026-02-21T10:22:24.3914458Z cvt.f32.bf16 %r41259, %rs3381; 2026-02-21T10:22:24.3914518Z cvt.f32.bf16 %r41260, %rs3382; 2026-02-21T10:22:24.3914577Z cvt.f32.bf16 %r41261, %rs3389; 2026-02-21T10:22:24.3914638Z cvt.f32.bf16 %r41262, %rs3390; 2026-02-21T10:22:24.3914705Z cvt.f32.bf16 %r41391, %rs3397; 2026-02-21T10:22:24.3914764Z cvt.f32.bf16 %r41392, %rs3398; 2026-02-21T10:22:24.3914822Z cvt.f32.bf16 %r41393, %rs3405; 2026-02-21T10:22:24.3914880Z cvt.f32.bf16 %r41394, %rs3406; 2026-02-21T10:22:24.3914941Z cvt.f32.bf16 %r41523, %rs3413; 2026-02-21T10:22:24.3915001Z cvt.f32.bf16 %r41524, %rs3414; 2026-02-21T10:22:24.3915060Z cvt.f32.bf16 %r41525, %rs3421; 2026-02-21T10:22:24.3915120Z cvt.f32.bf16 %r41526, %rs3422; 2026-02-21T10:22:24.3915232Z cvt.f32.bf16 %r41655, %rs3367; 2026-02-21T10:22:24.3915292Z cvt.f32.bf16 %r41656, %rs3368; 2026-02-21T10:22:24.3915350Z cvt.f32.bf16 %r41657, %rs3375; 2026-02-21T10:22:24.3915412Z cvt.f32.bf16 %r41658, %rs3376; 2026-02-21T10:22:24.3915470Z cvt.f32.bf16 %r41787, %rs3383; 2026-02-21T10:22:24.3915529Z cvt.f32.bf16 %r41788, %rs3384; 2026-02-21T10:22:24.3915592Z cvt.f32.bf16 %r41789, %rs3391; 2026-02-21T10:22:24.3915649Z cvt.f32.bf16 %r41790, %rs3392; 2026-02-21T10:22:24.3915715Z cvt.f32.bf16 %r41919, %rs3399; 2026-02-21T10:22:24.3915773Z cvt.f32.bf16 %r41920, %rs3400; 2026-02-21T10:22:24.3915833Z cvt.f32.bf16 %r41921, %rs3407; 2026-02-21T10:22:24.3915891Z cvt.f32.bf16 %r41922, %rs3408; 2026-02-21T10:22:24.3915949Z cvt.f32.bf16 %r42051, %rs3415; 2026-02-21T10:22:24.3916013Z cvt.f32.bf16 %r42052, %rs3416; 2026-02-21T10:22:24.3916071Z cvt.f32.bf16 %r42053, %rs3423; 
2026-02-21T10:22:24.3916129Z cvt.f32.bf16 %r42054, %rs3424; 2026-02-21T10:22:24.3916356Z .loc 1 64 33 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:64:33 2026-02-21T10:22:24.3916414Z bar.sync 0; 2026-02-21T10:22:24.3916609Z // begin inline asm 2026-02-21T10:22:24.3916718Z @%p313 mbarrier.init.shared::cta.b64 [%r39934], 1; 2026-02-21T10:22:24.3916788Z // end inline asm 2026-02-21T10:22:24.3916844Z bar.sync 0; 2026-02-21T10:22:24.3916902Z // begin inline asm 2026-02-21T10:22:24.3917041Z @%p313 mbarrier.arrive.expect_tx.shared.b64 _, [%r39934], 4096; 2026-02-21T10:22:24.3917192Z // end inline asm 2026-02-21T10:22:24.3917517Z // begin inline asm 2026-02-21T10:22:24.3917644Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.3920016Z // end inline asm 2026-02-21T10:22:24.3920107Z bar.sync 0; 2026-02-21T10:22:24.3920189Z elect.sync %r42317|%p399, -1; 2026-02-21T10:22:24.3920269Z and.pred %p380, %p1, %p399; 2026-02-21T10:22:24.3920333Z add.s64 %rd855, %rd855, 32; 2026-02-21T10:22:24.3920396Z cvt.u32.u64 %r39938, %rd855; 2026-02-21T10:22:24.3920458Z // begin inline asm 2026-02-21T10:22:24.3920827Z @%p380 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r39936], [%rd822, {%r39937, %r39938}], [%r39934]; 2026-02-21T10:22:24.3920885Z // end inline asm 2026-02-21T10:22:24.3920941Z bar.sync 0; 2026-02-21T10:22:24.3921003Z mov.b32 %r42185, 0; 2026-02-21T10:22:24.3921060Z // begin inline asm 2026-02-21T10:22:24.3921112Z 2026-02-21T10:22:24.3921164Z { 2026-02-21T10:22:24.3921231Z .reg .pred complete; 2026-02-21T10:22:24.3921400Z waitLoop: 2026-02-21T10:22:24.3921560Z mbarrier.try_wait.parity.shared.b64 complete, [%r39934], %r42185; 2026-02-21T10:22:24.3921636Z @!complete bra.uni waitLoop; 2026-02-21T10:22:24.3921687Z } 2026-02-21T10:22:24.3921692Z 2026-02-21T10:22:24.3921748Z // end inline asm 2026-02-21T10:22:24.3921804Z bar.sync 0; 2026-02-21T10:22:24.3921931Z // begin inline asm 2026-02-21T10:22:24.3922044Z @%p313 
mbarrier.inval.shared::cta.b64 [%r39934]; 2026-02-21T10:22:24.3922102Z // end inline asm 2026-02-21T10:22:24.3922324Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3922395Z ld.shared.s8 %rs3425, [%r80]; 2026-02-21T10:22:24.3922604Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3922673Z shl.b16 %rs3426, %rs3425, 4; 2026-02-21T10:22:24.3922880Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3922956Z ld.shared.s8 %rs3427, [%r81+128]; 2026-02-21T10:22:24.3923152Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3923215Z shl.b16 %rs3428, %rs3427, 4; 2026-02-21T10:22:24.3923404Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3923474Z ld.shared.s8 %rs3429, [%r82+256]; 2026-02-21T10:22:24.3923737Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3923805Z shl.b16 %rs3430, %rs3429, 4; 2026-02-21T10:22:24.3924001Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3924070Z ld.shared.s8 %rs3431, [%r83+384]; 2026-02-21T10:22:24.3924254Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3924317Z shl.b16 %rs3432, %rs3431, 4; 2026-02-21T10:22:24.3924508Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3924571Z ld.shared.s8 %rs3433, [%r84+512]; 2026-02-21T10:22:24.3924759Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3924824Z shl.b16 %rs3434, %rs3433, 4; 2026-02-21T10:22:24.3925015Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3925077Z ld.shared.s8 %rs3435, [%r85+640]; 2026-02-21T10:22:24.3925266Z .loc 1 67 28 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3925328Z shl.b16 %rs3436, %rs3435, 4; 2026-02-21T10:22:24.3925513Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3925576Z ld.shared.s8 %rs3437, [%r86+768]; 2026-02-21T10:22:24.3925822Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3925882Z shl.b16 %rs3438, %rs3437, 4; 2026-02-21T10:22:24.3926077Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3926144Z ld.shared.s8 %rs3439, [%r87+896]; 2026-02-21T10:22:24.3926333Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3926395Z shl.b16 %rs3440, %rs3439, 4; 2026-02-21T10:22:24.3926793Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3926869Z ld.shared.s8 %rs3441, [%r80+1024]; 2026-02-21T10:22:24.3927059Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3927125Z shl.b16 %rs3442, %rs3441, 4; 2026-02-21T10:22:24.3927408Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3927481Z ld.shared.s8 %rs3443, [%r81+1152]; 2026-02-21T10:22:24.3927672Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3927734Z shl.b16 %rs3444, %rs3443, 4; 2026-02-21T10:22:24.3927994Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3928058Z ld.shared.s8 %rs3445, [%r82+1280]; 2026-02-21T10:22:24.3928250Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3928310Z shl.b16 %rs3446, %rs3445, 4; 2026-02-21T10:22:24.3928496Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3928562Z ld.shared.s8 %rs3447, [%r83+1408]; 2026-02-21T10:22:24.3928747Z 
.loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3928812Z shl.b16 %rs3448, %rs3447, 4; 2026-02-21T10:22:24.3929001Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3929065Z ld.shared.s8 %rs3449, [%r84+1536]; 2026-02-21T10:22:24.3929254Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3929318Z shl.b16 %rs3450, %rs3449, 4; 2026-02-21T10:22:24.3929570Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3929638Z ld.shared.s8 %rs3451, [%r85+1664]; 2026-02-21T10:22:24.3929824Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3929896Z shl.b16 %rs3452, %rs3451, 4; 2026-02-21T10:22:24.3930086Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3930152Z ld.shared.s8 %rs3453, [%r86+1792]; 2026-02-21T10:22:24.3930350Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3930412Z shl.b16 %rs3454, %rs3453, 4; 2026-02-21T10:22:24.3930612Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3930684Z ld.shared.s8 %rs3455, [%r87+1920]; 2026-02-21T10:22:24.3930879Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3930940Z shl.b16 %rs3456, %rs3455, 4; 2026-02-21T10:22:24.3931132Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3931197Z ld.shared.s8 %rs3457, [%r80+2048]; 2026-02-21T10:22:24.3931386Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3931446Z shl.b16 %rs3458, %rs3457, 4; 2026-02-21T10:22:24.3931704Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3931768Z ld.shared.s8 %rs3459, [%r81+2176]; 
2026-02-21T10:22:24.3931956Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3932019Z shl.b16 %rs3460, %rs3459, 4; 2026-02-21T10:22:24.3932209Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3932275Z ld.shared.s8 %rs3461, [%r82+2304]; 2026-02-21T10:22:24.3932464Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3932524Z shl.b16 %rs3462, %rs3461, 4; 2026-02-21T10:22:24.3932711Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3932777Z ld.shared.s8 %rs3463, [%r83+2432]; 2026-02-21T10:22:24.3933023Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3933091Z shl.b16 %rs3464, %rs3463, 4; 2026-02-21T10:22:24.3933279Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3933345Z ld.shared.s8 %rs3465, [%r84+2560]; 2026-02-21T10:22:24.3933603Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3933665Z shl.b16 %rs3466, %rs3465, 4; 2026-02-21T10:22:24.3933870Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3933937Z ld.shared.s8 %rs3467, [%r85+2688]; 2026-02-21T10:22:24.3934134Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3934211Z shl.b16 %rs3468, %rs3467, 4; 2026-02-21T10:22:24.3934406Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3934478Z ld.shared.s8 %rs3469, [%r86+2816]; 2026-02-21T10:22:24.3934676Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3934738Z shl.b16 %rs3470, %rs3469, 4; 2026-02-21T10:22:24.3934929Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3934998Z ld.shared.s8 
%rs3471, [%r87+2944]; 2026-02-21T10:22:24.3935243Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3935307Z shl.b16 %rs3472, %rs3471, 4; 2026-02-21T10:22:24.3935496Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3935563Z ld.shared.s8 %rs3473, [%r80+3072]; 2026-02-21T10:22:24.3935752Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3935815Z shl.b16 %rs3474, %rs3473, 4; 2026-02-21T10:22:24.3936015Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3936080Z ld.shared.s8 %rs3475, [%r81+3200]; 2026-02-21T10:22:24.3936269Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3936333Z shl.b16 %rs3476, %rs3475, 4; 2026-02-21T10:22:24.3936667Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3936739Z ld.shared.s8 %rs3477, [%r82+3328]; 2026-02-21T10:22:24.3936933Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3936994Z shl.b16 %rs3478, %rs3477, 4; 2026-02-21T10:22:24.3937181Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3937245Z ld.shared.s8 %rs3479, [%r83+3456]; 2026-02-21T10:22:24.3937516Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3937577Z shl.b16 %rs3480, %rs3479, 4; 2026-02-21T10:22:24.3937763Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3937828Z ld.shared.s8 %rs3481, [%r84+3584]; 2026-02-21T10:22:24.3938018Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3938080Z shl.b16 %rs3482, %rs3481, 4; 2026-02-21T10:22:24.3938275Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 
2026-02-21T10:22:24.3938347Z ld.shared.s8 %rs3483, [%r85+3712]; 2026-02-21T10:22:24.3938539Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3938601Z shl.b16 %rs3484, %rs3483, 4; 2026-02-21T10:22:24.3938851Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3938919Z ld.shared.s8 %rs3485, [%r86+3840]; 2026-02-21T10:22:24.3939107Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3939170Z shl.b16 %rs3486, %rs3485, 4; 2026-02-21T10:22:24.3939418Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3939482Z ld.shared.s8 %rs3487, [%r87+3968]; 2026-02-21T10:22:24.3939673Z .loc 1 67 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:67:28 2026-02-21T10:22:24.3939734Z shl.b16 %rs3488, %rs3487, 4; 2026-02-21T10:22:24.3939920Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3939983Z cvt.s16.s8 %rs3489, %rs3426; 2026-02-21T10:22:24.3940043Z shr.s16 %rs3490, %rs3489, 4; 2026-02-21T10:22:24.3940103Z cvt.s16.s8 %rs3491, %rs3428; 2026-02-21T10:22:24.3940166Z shr.s16 %rs3492, %rs3491, 4; 2026-02-21T10:22:24.3940228Z shr.s16 %rs3493, %rs3425, 4; 2026-02-21T10:22:24.3940288Z shr.s16 %rs3494, %rs3427, 4; 2026-02-21T10:22:24.3940476Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3940545Z cvt.rn.f32.s16 %r42318, %rs3494; 2026-02-21T10:22:24.3940623Z cvt.rn.f32.s16 %r42319, %rs3493; 2026-02-21T10:22:24.3940687Z cvt.rn.f32.s16 %r42320, %rs3492; 2026-02-21T10:22:24.3940818Z cvt.rn.f32.s16 %r42321, %rs3490; 2026-02-21T10:22:24.3941013Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3941072Z cvt.s16.s8 %rs3495, %rs3430; 2026-02-21T10:22:24.3941132Z shr.s16 %rs3496, %rs3495, 4; 2026-02-21T10:22:24.3941194Z cvt.s16.s8 %rs3497, %rs3432; 
2026-02-21T10:22:24.3941254Z shr.s16 %rs3498, %rs3497, 4; 2026-02-21T10:22:24.3941327Z shr.s16 %rs3499, %rs3429, 4; 2026-02-21T10:22:24.3941393Z shr.s16 %rs3500, %rs3431, 4; 2026-02-21T10:22:24.3941584Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3941646Z cvt.rn.f32.s16 %r42322, %rs3500; 2026-02-21T10:22:24.3941710Z cvt.rn.f32.s16 %r42323, %rs3499; 2026-02-21T10:22:24.3941772Z cvt.rn.f32.s16 %r42324, %rs3498; 2026-02-21T10:22:24.3941836Z cvt.rn.f32.s16 %r42325, %rs3496; 2026-02-21T10:22:24.3942027Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3942090Z cvt.s16.s8 %rs3501, %rs3434; 2026-02-21T10:22:24.3942149Z shr.s16 %rs3502, %rs3501, 4; 2026-02-21T10:22:24.3942209Z cvt.s16.s8 %rs3503, %rs3436; 2026-02-21T10:22:24.3942271Z shr.s16 %rs3504, %rs3503, 4; 2026-02-21T10:22:24.3942331Z shr.s16 %rs3505, %rs3433, 4; 2026-02-21T10:22:24.3942390Z shr.s16 %rs3506, %rs3435, 4; 2026-02-21T10:22:24.3942582Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3942703Z cvt.rn.f32.s16 %r42326, %rs3506; 2026-02-21T10:22:24.3942775Z cvt.rn.f32.s16 %r42327, %rs3505; 2026-02-21T10:22:24.3942837Z cvt.rn.f32.s16 %r42328, %rs3504; 2026-02-21T10:22:24.3942902Z cvt.rn.f32.s16 %r42329, %rs3502; 2026-02-21T10:22:24.3943092Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3943155Z cvt.s16.s8 %rs3507, %rs3438; 2026-02-21T10:22:24.3943217Z shr.s16 %rs3508, %rs3507, 4; 2026-02-21T10:22:24.3943277Z cvt.s16.s8 %rs3509, %rs3440; 2026-02-21T10:22:24.3943337Z shr.s16 %rs3510, %rs3509, 4; 2026-02-21T10:22:24.3943396Z shr.s16 %rs3511, %rs3437, 4; 2026-02-21T10:22:24.3943456Z shr.s16 %rs3512, %rs3439, 4; 2026-02-21T10:22:24.3943644Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3943705Z cvt.rn.f32.s16 %r42330, %rs3512; 2026-02-21T10:22:24.3943768Z 
cvt.rn.f32.s16 %r42331, %rs3511; 2026-02-21T10:22:24.3943884Z cvt.rn.f32.s16 %r42332, %rs3510; 2026-02-21T10:22:24.3943948Z cvt.rn.f32.s16 %r42333, %rs3508; 2026-02-21T10:22:24.3944138Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3944197Z cvt.s16.s8 %rs3513, %rs3442; 2026-02-21T10:22:24.3944300Z shr.s16 %rs3514, %rs3513, 4; 2026-02-21T10:22:24.3944359Z cvt.s16.s8 %rs3515, %rs3444; 2026-02-21T10:22:24.3944421Z shr.s16 %rs3516, %rs3515, 4; 2026-02-21T10:22:24.3944481Z shr.s16 %rs3517, %rs3441, 4; 2026-02-21T10:22:24.3944540Z shr.s16 %rs3518, %rs3443, 4; 2026-02-21T10:22:24.3944730Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3944793Z cvt.rn.f32.s16 %r42334, %rs3518; 2026-02-21T10:22:24.3944855Z cvt.rn.f32.s16 %r42335, %rs3517; 2026-02-21T10:22:24.3944917Z cvt.rn.f32.s16 %r42336, %rs3516; 2026-02-21T10:22:24.3944977Z cvt.rn.f32.s16 %r42337, %rs3514; 2026-02-21T10:22:24.3945170Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3945233Z cvt.s16.s8 %rs3519, %rs3446; 2026-02-21T10:22:24.3945305Z shr.s16 %rs3520, %rs3519, 4; 2026-02-21T10:22:24.3945368Z cvt.s16.s8 %rs3521, %rs3448; 2026-02-21T10:22:24.3945428Z shr.s16 %rs3522, %rs3521, 4; 2026-02-21T10:22:24.3945493Z shr.s16 %rs3523, %rs3445, 4; 2026-02-21T10:22:24.3945552Z shr.s16 %rs3524, %rs3447, 4; 2026-02-21T10:22:24.3945794Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3945860Z cvt.rn.f32.s16 %r42338, %rs3524; 2026-02-21T10:22:24.3945921Z cvt.rn.f32.s16 %r42339, %rs3523; 2026-02-21T10:22:24.3945983Z cvt.rn.f32.s16 %r42340, %rs3522; 2026-02-21T10:22:24.3946044Z cvt.rn.f32.s16 %r42341, %rs3520; 2026-02-21T10:22:24.3946243Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3946305Z cvt.s16.s8 %rs3525, %rs3450; 2026-02-21T10:22:24.3946366Z shr.s16 %rs3526, %rs3525, 4; 
2026-02-21T10:22:24.3946425Z cvt.s16.s8 %rs3527, %rs3452; 2026-02-21T10:22:24.3946617Z shr.s16 %rs3528, %rs3527, 4; 2026-02-21T10:22:24.3946683Z shr.s16 %rs3529, %rs3449, 4; 2026-02-21T10:22:24.3946743Z shr.s16 %rs3530, %rs3451, 4; 2026-02-21T10:22:24.3946939Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3947001Z cvt.rn.f32.s16 %r42342, %rs3530; 2026-02-21T10:22:24.3947066Z cvt.rn.f32.s16 %r42343, %rs3529; 2026-02-21T10:22:24.3947129Z cvt.rn.f32.s16 %r42344, %rs3528; 2026-02-21T10:22:24.3947189Z cvt.rn.f32.s16 %r42345, %rs3526; 2026-02-21T10:22:24.3947377Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3947438Z cvt.s16.s8 %rs3531, %rs3454; 2026-02-21T10:22:24.3947497Z shr.s16 %rs3532, %rs3531, 4; 2026-02-21T10:22:24.3947648Z cvt.s16.s8 %rs3533, %rs3456; 2026-02-21T10:22:24.3947711Z shr.s16 %rs3534, %rs3533, 4; 2026-02-21T10:22:24.3947774Z shr.s16 %rs3535, %rs3453, 4; 2026-02-21T10:22:24.3947832Z shr.s16 %rs3536, %rs3455, 4; 2026-02-21T10:22:24.3948027Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3948096Z cvt.rn.f32.s16 %r42346, %rs3536; 2026-02-21T10:22:24.3948157Z cvt.rn.f32.s16 %r42347, %rs3535; 2026-02-21T10:22:24.3948219Z cvt.rn.f32.s16 %r42348, %rs3534; 2026-02-21T10:22:24.3948284Z cvt.rn.f32.s16 %r42349, %rs3532; 2026-02-21T10:22:24.3948552Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3948615Z cvt.s16.s8 %rs3537, %rs3458; 2026-02-21T10:22:24.3948674Z shr.s16 %rs3538, %rs3537, 4; 2026-02-21T10:22:24.3948736Z cvt.s16.s8 %rs3539, %rs3460; 2026-02-21T10:22:24.3948794Z shr.s16 %rs3540, %rs3539, 4; 2026-02-21T10:22:24.3948854Z shr.s16 %rs3541, %rs3457, 4; 2026-02-21T10:22:24.3948989Z shr.s16 %rs3542, %rs3459, 4; 2026-02-21T10:22:24.3949184Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3949245Z 
cvt.rn.f32.s16 %r42350, %rs3542; 2026-02-21T10:22:24.3949309Z cvt.rn.f32.s16 %r42351, %rs3541; 2026-02-21T10:22:24.3949371Z cvt.rn.f32.s16 %r42352, %rs3540; 2026-02-21T10:22:24.3949492Z cvt.rn.f32.s16 %r42353, %rs3538; 2026-02-21T10:22:24.3949681Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3949745Z cvt.s16.s8 %rs3543, %rs3462; 2026-02-21T10:22:24.3949803Z shr.s16 %rs3544, %rs3543, 4; 2026-02-21T10:22:24.3949863Z cvt.s16.s8 %rs3545, %rs3464; 2026-02-21T10:22:24.3949923Z shr.s16 %rs3546, %rs3545, 4; 2026-02-21T10:22:24.3949981Z shr.s16 %rs3547, %rs3461, 4; 2026-02-21T10:22:24.3950040Z shr.s16 %rs3548, %rs3463, 4; 2026-02-21T10:22:24.3950229Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3950310Z cvt.rn.f32.s16 %r42354, %rs3548; 2026-02-21T10:22:24.3950372Z cvt.rn.f32.s16 %r42355, %rs3547; 2026-02-21T10:22:24.3950434Z cvt.rn.f32.s16 %r42356, %rs3546; 2026-02-21T10:22:24.3950497Z cvt.rn.f32.s16 %r42357, %rs3544; 2026-02-21T10:22:24.3950684Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3950748Z cvt.s16.s8 %rs3549, %rs3466; 2026-02-21T10:22:24.3950809Z shr.s16 %rs3550, %rs3549, 4; 2026-02-21T10:22:24.3950935Z cvt.s16.s8 %rs3551, %rs3468; 2026-02-21T10:22:24.3950998Z shr.s16 %rs3552, %rs3551, 4; 2026-02-21T10:22:24.3951058Z shr.s16 %rs3553, %rs3465, 4; 2026-02-21T10:22:24.3951120Z shr.s16 %rs3554, %rs3467, 4; 2026-02-21T10:22:24.3951309Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3951371Z cvt.rn.f32.s16 %r42358, %rs3554; 2026-02-21T10:22:24.3951441Z cvt.rn.f32.s16 %r42359, %rs3553; 2026-02-21T10:22:24.3951504Z cvt.rn.f32.s16 %r42360, %rs3552; 2026-02-21T10:22:24.3951565Z cvt.rn.f32.s16 %r42361, %rs3550; 2026-02-21T10:22:24.3951754Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3951815Z cvt.s16.s8 %rs3555, 
%rs3470; 2026-02-21T10:22:24.3951877Z shr.s16 %rs3556, %rs3555, 4; 2026-02-21T10:22:24.3951936Z cvt.s16.s8 %rs3557, %rs3472; 2026-02-21T10:22:24.3952000Z shr.s16 %rs3558, %rs3557, 4; 2026-02-21T10:22:24.3952061Z shr.s16 %rs3559, %rs3469, 4; 2026-02-21T10:22:24.3952131Z shr.s16 %rs3560, %rs3471, 4; 2026-02-21T10:22:24.3952327Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3952390Z cvt.rn.f32.s16 %r42362, %rs3560; 2026-02-21T10:22:24.3952453Z cvt.rn.f32.s16 %r42363, %rs3559; 2026-02-21T10:22:24.3952514Z cvt.rn.f32.s16 %r42364, %rs3558; 2026-02-21T10:22:24.3952578Z cvt.rn.f32.s16 %r42365, %rs3556; 2026-02-21T10:22:24.3952826Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3952886Z cvt.s16.s8 %rs3561, %rs3474; 2026-02-21T10:22:24.3952948Z shr.s16 %rs3562, %rs3561, 4; 2026-02-21T10:22:24.3953007Z cvt.s16.s8 %rs3563, %rs3476; 2026-02-21T10:22:24.3953070Z shr.s16 %rs3564, %rs3563, 4; 2026-02-21T10:22:24.3953138Z shr.s16 %rs3565, %rs3473, 4; 2026-02-21T10:22:24.3953204Z shr.s16 %rs3566, %rs3475, 4; 2026-02-21T10:22:24.3953395Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3953460Z cvt.rn.f32.s16 %r42366, %rs3566; 2026-02-21T10:22:24.3953524Z cvt.rn.f32.s16 %r42367, %rs3565; 2026-02-21T10:22:24.3953585Z cvt.rn.f32.s16 %r42368, %rs3564; 2026-02-21T10:22:24.3953645Z cvt.rn.f32.s16 %r42369, %rs3562; 2026-02-21T10:22:24.3953836Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3953979Z cvt.s16.s8 %rs3567, %rs3478; 2026-02-21T10:22:24.3954044Z shr.s16 %rs3568, %rs3567, 4; 2026-02-21T10:22:24.3954107Z cvt.s16.s8 %rs3569, %rs3480; 2026-02-21T10:22:24.3954166Z shr.s16 %rs3570, %rs3569, 4; 2026-02-21T10:22:24.3954225Z shr.s16 %rs3571, %rs3477, 4; 2026-02-21T10:22:24.3954284Z shr.s16 %rs3572, %rs3479, 4; 2026-02-21T10:22:24.3954528Z .loc 1 87 32 // 
c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3954591Z cvt.rn.f32.s16 %r42370, %rs3572; 2026-02-21T10:22:24.3954657Z cvt.rn.f32.s16 %r42371, %rs3571; 2026-02-21T10:22:24.3954719Z cvt.rn.f32.s16 %r42372, %rs3570; 2026-02-21T10:22:24.3954779Z cvt.rn.f32.s16 %r42373, %rs3568; 2026-02-21T10:22:24.3954966Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3955027Z cvt.s16.s8 %rs3573, %rs3482; 2026-02-21T10:22:24.3955086Z shr.s16 %rs3574, %rs3573, 4; 2026-02-21T10:22:24.3955147Z cvt.s16.s8 %rs3575, %rs3484; 2026-02-21T10:22:24.3955208Z shr.s16 %rs3576, %rs3575, 4; 2026-02-21T10:22:24.3955269Z shr.s16 %rs3577, %rs3481, 4; 2026-02-21T10:22:24.3955327Z shr.s16 %rs3578, %rs3483, 4; 2026-02-21T10:22:24.3955517Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3955584Z cvt.rn.f32.s16 %r42374, %rs3578; 2026-02-21T10:22:24.3955645Z cvt.rn.f32.s16 %r42375, %rs3577; 2026-02-21T10:22:24.3955706Z cvt.rn.f32.s16 %r42376, %rs3576; 2026-02-21T10:22:24.3955819Z cvt.rn.f32.s16 %r42377, %rs3574; 2026-02-21T10:22:24.3956011Z .loc 1 69 25 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:69:25 2026-02-21T10:22:24.3956071Z cvt.s16.s8 %rs3579, %rs3486; 2026-02-21T10:22:24.3956130Z shr.s16 %rs3580, %rs3579, 4; 2026-02-21T10:22:24.3956194Z cvt.s16.s8 %rs3581, %rs3488; 2026-02-21T10:22:24.3956253Z shr.s16 %rs3582, %rs3581, 4; 2026-02-21T10:22:24.3956323Z shr.s16 %rs3583, %rs3485, 4; 2026-02-21T10:22:24.3956391Z shr.s16 %rs3584, %rs3487, 4; 2026-02-21T10:22:24.3956720Z .loc 1 87 32 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:87:32 2026-02-21T10:22:24.3956788Z cvt.rn.f32.s16 %r42378, %rs3584; 2026-02-21T10:22:24.3956850Z cvt.rn.f32.s16 %r42379, %rs3583; 2026-02-21T10:22:24.3956918Z cvt.rn.f32.s16 %r42380, %rs3582; 2026-02-21T10:22:24.3956977Z cvt.rn.f32.s16 %r42381, %rs3580; 2026-02-21T10:22:24.3957032Z bar.sync 0; 2026-02-21T10:22:24.3957158Z 
st.shared.v4.b32 [%r88], {%r42321, %r42319, %r42320, %r42318}; 2026-02-21T10:22:24.3957288Z st.shared.v4.b32 [%r88+16384], {%r42353, %r42351, %r42352, %r42350}; 2026-02-21T10:22:24.3957400Z st.shared.v4.b32 [%r89], {%r42325, %r42323, %r42324, %r42322}; 2026-02-21T10:22:24.3957521Z st.shared.v4.b32 [%r89+16384], {%r42357, %r42355, %r42356, %r42354}; 2026-02-21T10:22:24.3957629Z st.shared.v4.b32 [%r90], {%r42329, %r42327, %r42328, %r42326}; 2026-02-21T10:22:24.3957830Z st.shared.v4.b32 [%r90+16384], {%r42361, %r42359, %r42360, %r42358}; 2026-02-21T10:22:24.3957937Z st.shared.v4.b32 [%r91], {%r42333, %r42331, %r42332, %r42330}; 2026-02-21T10:22:24.3958057Z st.shared.v4.b32 [%r91+16384], {%r42365, %r42363, %r42364, %r42362}; 2026-02-21T10:22:24.3958167Z st.shared.v4.b32 [%r92], {%r42337, %r42335, %r42336, %r42334}; 2026-02-21T10:22:24.3958284Z st.shared.v4.b32 [%r92+16384], {%r42369, %r42367, %r42368, %r42366}; 2026-02-21T10:22:24.3958392Z st.shared.v4.b32 [%r93], {%r42341, %r42339, %r42340, %r42338}; 2026-02-21T10:22:24.3958509Z st.shared.v4.b32 [%r93+16384], {%r42373, %r42371, %r42372, %r42370}; 2026-02-21T10:22:24.3958617Z st.shared.v4.b32 [%r94], {%r42345, %r42343, %r42344, %r42342}; 2026-02-21T10:22:24.3958736Z st.shared.v4.b32 [%r94+16384], {%r42377, %r42375, %r42376, %r42374}; 2026-02-21T10:22:24.3958842Z st.shared.v4.b32 [%r95], {%r42349, %r42347, %r42348, %r42346}; 2026-02-21T10:22:24.3958958Z st.shared.v4.b32 [%r95+16384], {%r42381, %r42379, %r42380, %r42378}; 2026-02-21T10:22:24.3959019Z $L__tmp31: 2026-02-21T10:22:24.3959355Z .loc 2 291 36 // standard.py:291:36 @[ c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:94:40 ] 2026-02-21T10:22:24.3959420Z // begin inline asm 2026-02-21T10:22:24.3959499Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.3959557Z // end inline asm 2026-02-21T10:22:24.3959674Z bar.sync 0; 2026-02-21T10:22:24.3959770Z shfl.sync.idx.b32 %r42382, %r5, 0, 31, -1; 2026-02-21T10:22:24.3959847Z wgmma.fence.sync.aligned; 
2026-02-21T10:22:24.3959912Z mov.pred %p382, -1; 2026-02-21T10:22:24.3959972Z // begin inline asm 2026-02-21T10:22:24.3961473Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r40071,%r40072,%r40073,%r40074}, %rd12, %p382, 1, 1; 2026-02-21T10:22:24.3961535Z // end inline asm 2026-02-21T10:22:24.3961592Z // begin inline asm 2026-02-21T10:22:24.3963164Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r40203,%r40204,%r40205,%r40206}, %rd13, %p382, 1, 1; 2026-02-21T10:22:24.3963230Z // end inline asm 2026-02-21T10:22:24.3963292Z // begin inline asm 2026-02-21T10:22:24.3964781Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r40335,%r40336,%r40337,%r40338}, %rd14, %p382, 1, 1; 2026-02-21T10:22:24.3964840Z // end inline asm 2026-02-21T10:22:24.3964901Z // begin inline asm 2026-02-21T10:22:24.3966378Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r40467,%r40468,%r40469,%r40470}, %rd15, %p382, 1, 1; 2026-02-21T10:22:24.3966621Z // end inline asm 2026-02-21T10:22:24.3966686Z // begin inline asm 2026-02-21T10:22:24.3968258Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r40599,%r40600,%r40601,%r40602}, %rd16, %p382, 1, 1; 2026-02-21T10:22:24.3968323Z // end inline asm 2026-02-21T10:22:24.3968382Z // begin inline asm 2026-02-21T10:22:24.3969868Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r40731,%r40732,%r40733,%r40734}, %rd17, %p382, 1, 1; 2026-02-21T10:22:24.3970005Z // end inline asm 2026-02-21T10:22:24.3970070Z // begin inline asm 2026-02-21T10:22:24.3971621Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r40863,%r40864,%r40865,%r40866}, %rd18, %p382, 1, 1; 2026-02-21T10:22:24.3971684Z // end inline asm 2026-02-21T10:22:24.3971745Z // begin inline asm 2026-02-21T10:22:24.3973219Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309}, {%r40995,%r40996,%r40997,%r40998}, %rd19, %p382, 1, 1; 2026-02-21T10:22:24.3973281Z // end inline asm 2026-02-21T10:22:24.3973341Z // begin inline asm 2026-02-21T10:22:24.3974806Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r41127,%r41128,%r41129,%r41130}, %rd12, %p382, 1, 1; 2026-02-21T10:22:24.3974926Z // end inline asm 2026-02-21T10:22:24.3974998Z // begin inline asm 2026-02-21T10:22:24.3976583Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r41259,%r41260,%r41261,%r41262}, %rd13, %p382, 1, 1; 2026-02-21T10:22:24.3976657Z // end inline asm 2026-02-21T10:22:24.3976719Z // begin inline asm 2026-02-21T10:22:24.3978252Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r41391,%r41392,%r41393,%r41394}, %rd14, %p382, 1, 1; 2026-02-21T10:22:24.3978376Z // end inline asm 2026-02-21T10:22:24.3978434Z // begin inline asm 2026-02-21T10:22:24.3979886Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r41523,%r41524,%r41525,%r41526}, %rd15, %p382, 1, 1; 2026-02-21T10:22:24.3979950Z // end inline asm 2026-02-21T10:22:24.3980008Z // begin inline asm 2026-02-21T10:22:24.3981522Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r41655,%r41656,%r41657,%r41658}, %rd16, %p382, 1, 1; 2026-02-21T10:22:24.3981589Z // end inline asm 2026-02-21T10:22:24.3981646Z // begin inline asm 2026-02-21T10:22:24.3983105Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r41787,%r41788,%r41789,%r41790}, %rd17, %p382, 1, 1; 2026-02-21T10:22:24.3983163Z // end inline asm 2026-02-21T10:22:24.3983220Z // begin inline asm 2026-02-21T10:22:24.3984678Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r41919,%r41920,%r41921,%r41922}, %rd18, %p382, 1, 1; 2026-02-21T10:22:24.3984816Z // end inline asm 2026-02-21T10:22:24.3984873Z // begin inline asm 2026-02-21T10:22:24.3986384Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373}, {%r42051,%r42052,%r42053,%r42054}, %rd19, %p382, 1, 1; 2026-02-21T10:22:24.3986444Z // end inline asm 2026-02-21T10:22:24.3986648Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:24.3986785Z mov.b32 %r42183, %r39936; 2026-02-21T10:22:24.3986844Z mov.b32 %r42184, %r42185; 2026-02-21T10:22:24.3986901Z // begin inline asm 2026-02-21T10:22:24.3989539Z // wait for regs: 
%r43246,%r43247,%r43248,%r43249,%r43250,%r43251,%r43252,%r43253,%r43254,%r43255,%r43256,%r43257,%r43258,%r43259,%r43260,%r43261,%r43262,%r43263,%r43264,%r43265,%r43266,%r43267,%r43268,%r43269,%r43270,%r43271,%r43272,%r43273,%r43274,%r43275,%r43276,%r43277,%r43278,%r43279,%r43280,%r43281,%r43282,%r43283,%r43284,%r43285,%r43286,%r43287,%r43288,%r43289,%r43290,%r43291,%r43292,%r43293,%r43294,%r43295,%r43296,%r43297,%r43298,%r43299,%r43300,%r43301,%r43302,%r43303,%r43304,%r43305,%r43306,%r43307,%r43308,%r43309,%r43310,%r43311,%r43312,%r43313,%r43314,%r43315,%r43316,%r43317,%r43318,%r43319,%r43320,%r43321,%r43322,%r43323,%r43324,%r43325,%r43326,%r43327,%r43328,%r43329,%r43330,%r43331,%r43332,%r43333,%r43334,%r43335,%r43336,%r43337,%r43338,%r43339,%r43340,%r43341,%r43342,%r43343,%r43344,%r43345,%r43346,%r43347,%r43348,%r43349,%r43350,%r43351,%r43352,%r43353,%r43354,%r43355,%r43356,%r43357,%r43358,%r43359,%r43360,%r43361,%r43362,%r43363,%r43364,%r43365,%r43366,%r43367,%r43368,%r43369,%r43370,%r43371,%r43372,%r43373,%r42183,%r42184,%r42185 2026-02-21T10:22:24.3989635Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:24.3989699Z // end inline asm 2026-02-21T10:22:24.3989754Z $L__tmp32: 2026-02-21T10:22:24.3989976Z .loc 1 51 126 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:51:126 2026-02-21T10:22:24.3990045Z add.s64 %rd854, %rd854, 128; 2026-02-21T10:22:24.3990117Z setp.lt.u64 %p400, %rd855, 4064; 2026-02-21T10:22:24.3990183Z @%p400 bra $L__BB0_20; 2026-02-21T10:22:24.3990302Z // %bb.21: // in Loop: Header=BB0_17 Depth=1 2026-02-21T10:22:24.3990506Z .loc 1 97 28 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:97:28 2026-02-21T10:22:24.3990593Z cvt.rn.bf16x2.f32 %r42386, %r43247, %r43246; 2026-02-21T10:22:24.3990678Z cvt.rn.bf16x2.f32 %r42387, %r43249, %r43248; 2026-02-21T10:22:24.3990756Z cvt.rn.bf16x2.f32 %r42388, %r43251, %r43250; 2026-02-21T10:22:24.3990834Z cvt.rn.bf16x2.f32 %r42389, %r43253, %r43252; 2026-02-21T10:22:24.3990910Z 
cvt.rn.bf16x2.f32 %r42390, %r43255, %r43254; 2026-02-21T10:22:24.3990985Z cvt.rn.bf16x2.f32 %r42391, %r43257, %r43256; 2026-02-21T10:22:24.3991062Z cvt.rn.bf16x2.f32 %r42392, %r43259, %r43258; 2026-02-21T10:22:24.3991136Z cvt.rn.bf16x2.f32 %r42393, %r43261, %r43260; 2026-02-21T10:22:24.3991212Z cvt.rn.bf16x2.f32 %r42394, %r43263, %r43262; 2026-02-21T10:22:24.3991287Z cvt.rn.bf16x2.f32 %r42395, %r43265, %r43264; 2026-02-21T10:22:24.3991434Z cvt.rn.bf16x2.f32 %r42396, %r43267, %r43266; 2026-02-21T10:22:24.3991513Z cvt.rn.bf16x2.f32 %r42397, %r43269, %r43268; 2026-02-21T10:22:24.3991586Z cvt.rn.bf16x2.f32 %r42398, %r43271, %r43270; 2026-02-21T10:22:24.3991660Z cvt.rn.bf16x2.f32 %r42399, %r43273, %r43272; 2026-02-21T10:22:24.3991735Z cvt.rn.bf16x2.f32 %r42400, %r43275, %r43274; 2026-02-21T10:22:24.3991811Z cvt.rn.bf16x2.f32 %r42401, %r43277, %r43276; 2026-02-21T10:22:24.3991886Z cvt.rn.bf16x2.f32 %r42402, %r43279, %r43278; 2026-02-21T10:22:24.3991965Z cvt.rn.bf16x2.f32 %r42403, %r43281, %r43280; 2026-02-21T10:22:24.3992042Z cvt.rn.bf16x2.f32 %r42404, %r43283, %r43282; 2026-02-21T10:22:24.3992116Z cvt.rn.bf16x2.f32 %r42405, %r43285, %r43284; 2026-02-21T10:22:24.3992190Z cvt.rn.bf16x2.f32 %r42406, %r43287, %r43286; 2026-02-21T10:22:24.3992265Z cvt.rn.bf16x2.f32 %r42407, %r43289, %r43288; 2026-02-21T10:22:24.3992339Z cvt.rn.bf16x2.f32 %r42408, %r43291, %r43290; 2026-02-21T10:22:24.3992477Z cvt.rn.bf16x2.f32 %r42409, %r43293, %r43292; 2026-02-21T10:22:24.3992555Z cvt.rn.bf16x2.f32 %r42410, %r43295, %r43294; 2026-02-21T10:22:24.3992633Z cvt.rn.bf16x2.f32 %r42411, %r43297, %r43296; 2026-02-21T10:22:24.3992707Z cvt.rn.bf16x2.f32 %r42412, %r43299, %r43298; 2026-02-21T10:22:24.3992779Z cvt.rn.bf16x2.f32 %r42413, %r43301, %r43300; 2026-02-21T10:22:24.3992913Z cvt.rn.bf16x2.f32 %r42414, %r43303, %r43302; 2026-02-21T10:22:24.3992989Z cvt.rn.bf16x2.f32 %r42415, %r43305, %r43304; 2026-02-21T10:22:24.3993065Z cvt.rn.bf16x2.f32 %r42416, %r43307, %r43306; 2026-02-21T10:22:24.3993142Z 
cvt.rn.bf16x2.f32 %r42417, %r43309, %r43308; 2026-02-21T10:22:24.3993215Z cvt.rn.bf16x2.f32 %r42418, %r43311, %r43310; 2026-02-21T10:22:24.3993288Z cvt.rn.bf16x2.f32 %r42419, %r43313, %r43312; 2026-02-21T10:22:24.3993362Z cvt.rn.bf16x2.f32 %r42420, %r43315, %r43314; 2026-02-21T10:22:24.3993437Z cvt.rn.bf16x2.f32 %r42421, %r43317, %r43316; 2026-02-21T10:22:24.3993510Z cvt.rn.bf16x2.f32 %r42422, %r43319, %r43318; 2026-02-21T10:22:24.3993590Z cvt.rn.bf16x2.f32 %r42423, %r43321, %r43320; 2026-02-21T10:22:24.3993666Z cvt.rn.bf16x2.f32 %r42424, %r43323, %r43322; 2026-02-21T10:22:24.3993740Z cvt.rn.bf16x2.f32 %r42425, %r43325, %r43324; 2026-02-21T10:22:24.3993813Z cvt.rn.bf16x2.f32 %r42426, %r43327, %r43326; 2026-02-21T10:22:24.3993891Z cvt.rn.bf16x2.f32 %r42427, %r43329, %r43328; 2026-02-21T10:22:24.3993965Z cvt.rn.bf16x2.f32 %r42428, %r43331, %r43330; 2026-02-21T10:22:24.3994090Z cvt.rn.bf16x2.f32 %r42429, %r43333, %r43332; 2026-02-21T10:22:24.3994168Z cvt.rn.bf16x2.f32 %r42430, %r43335, %r43334; 2026-02-21T10:22:24.3994245Z cvt.rn.bf16x2.f32 %r42431, %r43337, %r43336; 2026-02-21T10:22:24.3994318Z cvt.rn.bf16x2.f32 %r42432, %r43339, %r43338; 2026-02-21T10:22:24.3994391Z cvt.rn.bf16x2.f32 %r42433, %r43341, %r43340; 2026-02-21T10:22:24.3994473Z cvt.rn.bf16x2.f32 %r42434, %r43343, %r43342; 2026-02-21T10:22:24.3994556Z cvt.rn.bf16x2.f32 %r42435, %r43345, %r43344; 2026-02-21T10:22:24.3994635Z cvt.rn.bf16x2.f32 %r42436, %r43347, %r43346; 2026-02-21T10:22:24.3994710Z cvt.rn.bf16x2.f32 %r42437, %r43349, %r43348; 2026-02-21T10:22:24.3994786Z cvt.rn.bf16x2.f32 %r42438, %r43351, %r43350; 2026-02-21T10:22:24.3994860Z cvt.rn.bf16x2.f32 %r42439, %r43353, %r43352; 2026-02-21T10:22:24.3994933Z cvt.rn.bf16x2.f32 %r42440, %r43355, %r43354; 2026-02-21T10:22:24.3995011Z cvt.rn.bf16x2.f32 %r42441, %r43357, %r43356; 2026-02-21T10:22:24.3995084Z cvt.rn.bf16x2.f32 %r42442, %r43359, %r43358; 2026-02-21T10:22:24.3995160Z cvt.rn.bf16x2.f32 %r42443, %r43361, %r43360; 2026-02-21T10:22:24.3995238Z 
cvt.rn.bf16x2.f32 %r42444, %r43363, %r43362; 2026-02-21T10:22:24.3995312Z cvt.rn.bf16x2.f32 %r42445, %r43365, %r43364; 2026-02-21T10:22:24.3995385Z cvt.rn.bf16x2.f32 %r42446, %r43367, %r43366; 2026-02-21T10:22:24.3995458Z cvt.rn.bf16x2.f32 %r42447, %r43369, %r43368; 2026-02-21T10:22:24.3995535Z cvt.rn.bf16x2.f32 %r42448, %r43371, %r43370; 2026-02-21T10:22:24.3995610Z cvt.rn.bf16x2.f32 %r42449, %r43373, %r43372; 2026-02-21T10:22:24.3995878Z .loc 1 98 43 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:98:43 2026-02-21T10:22:24.3995939Z bar.sync 0; 2026-02-21T10:22:24.3996135Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r96], {%r42386, %r42387, %r42388, %r42389}; 2026-02-21T10:22:24.3996321Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r97], {%r42402, %r42403, %r42404, %r42405}; 2026-02-21T10:22:24.3996631Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r98], {%r42418, %r42419, %r42420, %r42421}; 2026-02-21T10:22:24.3996818Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r99], {%r42434, %r42435, %r42436, %r42437}; 2026-02-21T10:22:24.3997009Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r100], {%r42390, %r42391, %r42392, %r42393}; 2026-02-21T10:22:24.3997196Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r101], {%r42406, %r42407, %r42408, %r42409}; 2026-02-21T10:22:24.3997380Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r102], {%r42422, %r42423, %r42424, %r42425}; 2026-02-21T10:22:24.3997637Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r103], {%r42438, %r42439, %r42440, %r42441}; 2026-02-21T10:22:24.3997826Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r104], {%r42394, %r42395, %r42396, %r42397}; 2026-02-21T10:22:24.3998010Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r105], {%r42410, %r42411, %r42412, %r42413}; 2026-02-21T10:22:24.3998253Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r106], {%r42426, %r42427, %r42428, %r42429}; 2026-02-21T10:22:24.3998435Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r107], {%r42442, %r42443, %r42444, %r42445}; 2026-02-21T10:22:24.3998634Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r108], {%r42398, %r42399, %r42400, %r42401}; 2026-02-21T10:22:24.3998817Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r109], {%r42414, %r42415, %r42416, %r42417}; 2026-02-21T10:22:24.3998999Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r110], {%r42430, %r42431, %r42432, %r42433}; 2026-02-21T10:22:24.3999182Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r111], {%r42446, %r42447, %r42448, %r42449}; 2026-02-21T10:22:24.3999245Z // begin inline asm 2026-02-21T10:22:24.3999323Z fence.proxy.async.shared::cta; 2026-02-21T10:22:24.3999384Z // end inline asm 2026-02-21T10:22:24.3999438Z bar.sync 0; 2026-02-21T10:22:24.3999506Z elect.sync %r42450|%p403, -1; 2026-02-21T10:22:24.3999589Z shfl.sync.idx.b32 %r42451, %r5, 0, 31, -1; 2026-02-21T10:22:24.3999667Z and.pred %p401, %p405, %p403; 2026-02-21T10:22:24.3999728Z and.b32 %r42452, %r42451, 1; 2026-02-21T10:22:24.3999857Z shl.b32 %r42453, %r42452, 14; 2026-02-21T10:22:24.3999929Z add.s32 %r42385, %r39936, %r42453; 2026-02-21T10:22:24.3999989Z shl.b32 %r42455, %r42452, 6; 2026-02-21T10:22:24.4000052Z or.b32 %r42383, %r42455, %r39937; 2026-02-21T10:22:24.4000112Z // begin inline asm 2026-02-21T10:22:24.4000349Z @%p401 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd497, {%r42383, %r42384}], [%r42385]; 2026-02-21T10:22:24.4000405Z // end inline asm 2026-02-21T10:22:24.4000480Z cp.async.bulk.commit_group; 2026-02-21T10:22:24.4000561Z cp.async.bulk.wait_group.read 0; 2026-02-21T10:22:24.4000614Z bar.sync 0; 2026-02-21T10:22:24.4000813Z .loc 1 31 88 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:31:88 2026-02-21T10:22:24.4000876Z add.s32 %r43244, %r43244, 1; 2026-02-21T10:22:24.4000952Z setp.ne.b32 %p404, %r43244, %r3; 2026-02-21T10:22:24.4001018Z @%p404 bra $L__BB0_17; 2026-02-21T10:22:24.4001109Z $L__BB0_22: // %._crit_edge 2026-02-21T10:22:24.4001306Z .loc 1 31 4 // c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py:31:4 2026-02-21T10:22:24.4001359Z ret; 
2026-02-21T10:22:24.4001414Z $L__tmp33: 2026-02-21T10:22:24.4001470Z $L__func_end0: 2026-02-21T10:22:24.4001556Z // -- End function 2026-02-21T10:22:24.4001607Z } 2026-02-21T10:22:24.4001850Z .file 1 "/tmp/torchinductor_root/32/c3273zy6owusxpn33mvnexiqk4x5buqqrd47ewejnkljb4a7wpw5.py" 2026-02-21T10:22:24.4002136Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T10:22:24.4002199Z .section .debug_abbrev 2026-02-21T10:22:24.4002252Z { 2026-02-21T10:22:24.4002347Z .b8 1 // Abbreviation Code 2026-02-21T10:22:24.4002440Z .b8 17 // DW_TAG_compile_unit 2026-02-21T10:22:24.4002537Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:22:24.4002627Z .b8 37 // DW_AT_producer 2026-02-21T10:22:24.4002710Z .b8 8 // DW_FORM_string 2026-02-21T10:22:24.4002787Z .b8 19 // DW_AT_language 2026-02-21T10:22:24.4002869Z .b8 5 // DW_FORM_data2 2026-02-21T10:22:24.4002946Z .b8 3 // DW_AT_name 2026-02-21T10:22:24.4003024Z .b8 8 // DW_FORM_string 2026-02-21T10:22:24.4003106Z .b8 16 // DW_AT_stmt_list 2026-02-21T10:22:24.4003245Z .b8 6 // DW_FORM_data4 2026-02-21T10:22:24.4003327Z .b8 27 // DW_AT_comp_dir 2026-02-21T10:22:24.4003403Z .b8 8 // DW_FORM_string 2026-02-21T10:22:24.4003478Z .b8 0 // EOM(1) 2026-02-21T10:22:24.4003595Z .b8 0 // EOM(2) 2026-02-21T10:22:24.4003682Z .b8 2 // Abbreviation Code 2026-02-21T10:22:24.4003772Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:22:24.4003851Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:22:24.4003925Z .b8 3 // DW_AT_name 2026-02-21T10:22:24.4004004Z .b8 8 // DW_FORM_string 2026-02-21T10:22:24.4004081Z .b8 32 // DW_AT_inline 2026-02-21T10:22:24.4004164Z .b8 11 // DW_FORM_data1 2026-02-21T10:22:24.4004234Z .b8 0 // EOM(1) 2026-02-21T10:22:24.4004308Z .b8 0 // EOM(2) 2026-02-21T10:22:24.4004392Z .b8 3 // Abbreviation Code 2026-02-21T10:22:24.4004475Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:22:24.4004562Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:22:24.4004641Z .b8 17 // DW_AT_low_pc 2026-02-21T10:22:24.4004786Z .b8 1 // 
DW_FORM_addr 2026-02-21T10:22:24.4004871Z .b8 18 // DW_AT_high_pc 2026-02-21T10:22:24.4004947Z .b8 1 // DW_FORM_addr 2026-02-21T10:22:24.4005042Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:22:24.4005120Z .b8 19 // DW_FORM_ref4 2026-02-21T10:22:24.4005195Z .b8 0 // EOM(1) 2026-02-21T10:22:24.4005266Z .b8 0 // EOM(2) 2026-02-21T10:22:24.4005352Z .b8 4 // Abbreviation Code 2026-02-21T10:22:24.4005455Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T10:22:24.4005534Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:22:24.4005626Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:22:24.4005707Z .b8 19 // DW_FORM_ref4 2026-02-21T10:22:24.4005782Z .b8 17 // DW_AT_low_pc 2026-02-21T10:22:24.4005856Z .b8 1 // DW_FORM_addr 2026-02-21T10:22:24.4005937Z .b8 18 // DW_AT_high_pc 2026-02-21T10:22:24.4006023Z .b8 1 // DW_FORM_addr 2026-02-21T10:22:24.4006107Z .b8 88 // DW_AT_call_file 2026-02-21T10:22:24.4006254Z .b8 11 // DW_FORM_data1 2026-02-21T10:22:24.4006336Z .b8 89 // DW_AT_call_line 2026-02-21T10:22:24.4006414Z .b8 11 // DW_FORM_data1 2026-02-21T10:22:24.4006622Z .b8 87 // DW_AT_call_column 2026-02-21T10:22:24.4006711Z .b8 11 // DW_FORM_data1 2026-02-21T10:22:24.4006781Z .b8 0 // EOM(1) 2026-02-21T10:22:24.4006851Z .b8 0 // EOM(2) 2026-02-21T10:22:24.4006921Z .b8 0 // EOM(3) 2026-02-21T10:22:24.4006972Z } 2026-02-21T10:22:24.4007033Z .section .debug_info 2026-02-21T10:22:24.4007083Z { 2026-02-21T10:22:24.4007173Z .b32 178 // Length of Unit 2026-02-21T10:22:24.4007266Z .b8 2 // DWARF version number 2026-02-21T10:22:24.4007316Z .b8 0 2026-02-21T10:22:24.4007534Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T10:22:24.4007637Z .b8 8 // Address Size (in bytes) 2026-02-21T10:22:24.4007751Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T10:22:24.4007841Z .b8 116 // DW_AT_producer 2026-02-21T10:22:24.4007960Z .b8 114 2026-02-21T10:22:24.4008011Z .b8 105 2026-02-21T10:22:24.4008062Z .b8 116 2026-02-21T10:22:24.4008114Z .b8 111 2026-02-21T10:22:24.4008166Z .b8 110 2026-02-21T10:22:24.4008218Z .b8 0 2026-02-21T10:22:24.4008299Z .b8 2 // DW_AT_language 2026-02-21T10:22:24.4008351Z .b8 0 2026-02-21T10:22:24.4008428Z .b8 99 // DW_AT_name 2026-02-21T10:22:24.4008491Z .b8 51 2026-02-21T10:22:24.4008545Z .b8 50 2026-02-21T10:22:24.4008595Z .b8 55 2026-02-21T10:22:24.4008644Z .b8 51 2026-02-21T10:22:24.4008698Z .b8 122 2026-02-21T10:22:24.4008748Z .b8 121 2026-02-21T10:22:24.4008801Z .b8 54 2026-02-21T10:22:24.4008854Z .b8 111 2026-02-21T10:22:24.4008907Z .b8 119 2026-02-21T10:22:24.4008959Z .b8 117 2026-02-21T10:22:24.4009009Z .b8 115 2026-02-21T10:22:24.4009060Z .b8 120 2026-02-21T10:22:24.4009115Z .b8 112 2026-02-21T10:22:24.4009166Z .b8 110 2026-02-21T10:22:24.4009215Z .b8 51 2026-02-21T10:22:24.4009267Z .b8 51 2026-02-21T10:22:24.4009321Z .b8 109 2026-02-21T10:22:24.4009370Z .b8 118 2026-02-21T10:22:24.4009421Z .b8 110 2026-02-21T10:22:24.4009474Z .b8 101 2026-02-21T10:22:24.4009525Z .b8 120 2026-02-21T10:22:24.4009655Z .b8 105 2026-02-21T10:22:24.4009714Z .b8 113 2026-02-21T10:22:24.4009765Z .b8 107 2026-02-21T10:22:24.4009815Z .b8 52 2026-02-21T10:22:24.4009866Z .b8 120 2026-02-21T10:22:24.4009917Z .b8 53 2026-02-21T10:22:24.4009967Z .b8 98 2026-02-21T10:22:24.4010018Z .b8 117 2026-02-21T10:22:24.4010069Z .b8 113 2026-02-21T10:22:24.4010121Z .b8 113 2026-02-21T10:22:24.4010171Z .b8 114 2026-02-21T10:22:24.4010221Z .b8 100 2026-02-21T10:22:24.4010273Z .b8 52 2026-02-21T10:22:24.4010324Z .b8 55 2026-02-21T10:22:24.4010377Z .b8 101 2026-02-21T10:22:24.4010427Z .b8 119 2026-02-21T10:22:24.4010481Z .b8 101 2026-02-21T10:22:24.4010531Z .b8 106 
2026-02-21T10:22:24.4010581Z .b8 110 2026-02-21T10:22:24.4010633Z .b8 107 2026-02-21T10:22:24.4010683Z .b8 108 2026-02-21T10:22:24.4010734Z .b8 106 2026-02-21T10:22:24.4010784Z .b8 98 2026-02-21T10:22:24.4010839Z .b8 52 2026-02-21T10:22:24.4010890Z .b8 97 2026-02-21T10:22:24.4010941Z .b8 55 2026-02-21T10:22:24.4010993Z .b8 119 2026-02-21T10:22:24.4011044Z .b8 112 2026-02-21T10:22:24.4011098Z .b8 119 2026-02-21T10:22:24.4011148Z .b8 53 2026-02-21T10:22:24.4011201Z .b8 46 2026-02-21T10:22:24.4011251Z .b8 112 2026-02-21T10:22:24.4011301Z .b8 121 2026-02-21T10:22:24.4011351Z .b8 0 2026-02-21T10:22:24.4011457Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T10:22:24.4011538Z .b8 47 // DW_AT_comp_dir 2026-02-21T10:22:24.4011589Z .b8 116 2026-02-21T10:22:24.4011641Z .b8 109 2026-02-21T10:22:24.4011777Z .b8 112 2026-02-21T10:22:24.4011827Z .b8 47 2026-02-21T10:22:24.4011880Z .b8 116 2026-02-21T10:22:24.4011934Z .b8 111 2026-02-21T10:22:24.4011983Z .b8 114 2026-02-21T10:22:24.4012034Z .b8 99 2026-02-21T10:22:24.4012086Z .b8 104 2026-02-21T10:22:24.4012136Z .b8 105 2026-02-21T10:22:24.4012186Z .b8 110 2026-02-21T10:22:24.4012236Z .b8 100 2026-02-21T10:22:24.4012291Z .b8 117 2026-02-21T10:22:24.4012341Z .b8 99 2026-02-21T10:22:24.4012392Z .b8 116 2026-02-21T10:22:24.4012444Z .b8 111 2026-02-21T10:22:24.4012495Z .b8 114 2026-02-21T10:22:24.4012548Z .b8 95 2026-02-21T10:22:24.4012599Z .b8 114 2026-02-21T10:22:24.4012650Z .b8 111 2026-02-21T10:22:24.4012700Z .b8 111 2026-02-21T10:22:24.4012750Z .b8 116 2026-02-21T10:22:24.4012800Z .b8 47 2026-02-21T10:22:24.4012854Z .b8 51 2026-02-21T10:22:24.4012904Z .b8 50 2026-02-21T10:22:24.4012953Z .b8 0 2026-02-21T10:22:24.4013068Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T10:22:24.4013145Z .b8 95 // DW_AT_name 2026-02-21T10:22:24.4013254Z .b8 104 2026-02-21T10:22:24.4013309Z .b8 101 2026-02-21T10:22:24.4013368Z .b8 108 2026-02-21T10:22:24.4013422Z .b8 105 2026-02-21T10:22:24.4013474Z .b8 111 2026-02-21T10:22:24.4013526Z 
.b8 110 2026-02-21T10:22:24.4013575Z .b8 95 2026-02-21T10:22:24.4013625Z .b8 109 2026-02-21T10:22:24.4013675Z .b8 97 2026-02-21T10:22:24.4013779Z .b8 116 2026-02-21T10:22:24.4013829Z .b8 109 2026-02-21T10:22:24.4013879Z .b8 117 2026-02-21T10:22:24.4013931Z .b8 108 2026-02-21T10:22:24.4013981Z .b8 95 2026-02-21T10:22:24.4014033Z .b8 98 2026-02-21T10:22:24.4014084Z .b8 102 2026-02-21T10:22:24.4014135Z .b8 49 2026-02-21T10:22:24.4014184Z .b8 54 2026-02-21T10:22:24.4014233Z .b8 95 2026-02-21T10:22:24.4014287Z .b8 105 2026-02-21T10:22:24.4014337Z .b8 110 2026-02-21T10:22:24.4014387Z .b8 116 2026-02-21T10:22:24.4014437Z .b8 52 2026-02-21T10:22:24.4014492Z .b8 0 2026-02-21T10:22:24.4014583Z .b8 1 // DW_AT_inline 2026-02-21T10:22:24.4014698Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T10:22:24.4014793Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T10:22:24.4014890Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T10:22:24.4014990Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:22:24.4015121Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T10:22:24.4015221Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:22:24.4015369Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T10:22:24.4015466Z .b64 $L__tmp32 // DW_AT_high_pc 2026-02-21T10:22:24.4015550Z .b8 1 // DW_AT_call_file 2026-02-21T10:22:24.4015631Z .b8 94 // DW_AT_call_line 2026-02-21T10:22:24.4015715Z .b8 40 // DW_AT_call_column 2026-02-21T10:22:24.4015811Z .b8 0 // End Of Children Mark 2026-02-21T10:22:24.4015898Z .b8 0 // End Of Children Mark 2026-02-21T10:22:24.4015950Z } 2026-02-21T10:22:24.4016019Z .section .debug_macinfo { } 2026-02-21T10:22:24.4016027Z 2026-02-21T10:22:24.4016114Z ================================================================ 2026-02-21T10:22:24.4016232Z please share the reproducer above with Triton project. 
2026-02-21T10:22:31.0859393Z 2026-02-21T10:22:31.0859404Z 2026-02-21T10:22:31.0859409Z 2026-02-21T10:22:31.0859747Z ================================================================ 2026-02-21T10:22:31.0860137Z Internal Triton PTX codegen error 2026-02-21T10:22:31.0860427Z `ptxas` stderr: 2026-02-21T10:22:31.0861150Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 627 in function _helion_matmul_bf16_int4. Try to compile with register target of 62 or higher. 2026-02-21T10:22:31.0861981Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:22:31.0862555Z 2026-02-21T10:22:31.0863255Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpi4v9yydw.ptx -o /tmp/tmpi4v9yydw.ptx.o 2026-02-21T10:22:31.0864007Z 2026-02-21T10:22:31.0864012Z 2026-02-21T10:22:31.0864082Z // 2026-02-21T10:22:31.0864279Z // Generated by LLVM NVPTX Back-End 2026-02-21T10:22:31.0864519Z // 2026-02-21T10:22:31.0864624Z 2026-02-21T10:22:31.0864695Z .version 8.7 2026-02-21T10:22:31.0864875Z .target sm_90a 2026-02-21T10:22:31.0865053Z .address_size 64 2026-02-21T10:22:31.0865169Z 2026-02-21T10:22:31.0865393Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T10:22:31.0865821Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T10:22:31.0866141Z // @_helion_matmul_bf16_int4 2026-02-21T10:22:31.0869837Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T10:22:31.0870324Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T10:22:31.0870735Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T10:22:31.0871108Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T10:22:31.0871479Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T10:22:31.0871984Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 
2026-02-21T10:22:31.0872290Z ) 2026-02-21T10:22:31.0872432Z .reqntid 256 2026-02-21T10:22:31.0872611Z .maxnreg 32 2026-02-21T10:22:31.0872757Z { 2026-02-21T10:22:31.0872909Z .reg .pred %p<60>; 2026-02-21T10:22:31.0873100Z .reg .b16 %rs<385>; 2026-02-21T10:22:31.0873270Z .reg .b32 %r<3821>; 2026-02-21T10:22:31.0873431Z .reg .b64 %rd<156>; 2026-02-21T10:22:31.0873753Z .loc 1 19 0 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:19:0 2026-02-21T10:22:31.0874150Z $L__func_begin0: 2026-02-21T10:22:31.0874469Z .loc 1 19 0 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:19:0 2026-02-21T10:22:31.0874796Z 2026-02-21T10:22:31.0874853Z // %bb.0: 2026-02-21T10:22:31.0875068Z ld.param.b64 %rd26, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T10:22:31.0875345Z $L__tmp0: 2026-02-21T10:22:31.0875664Z .loc 1 21 68 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:21:68 2026-02-21T10:22:31.0876056Z mov.u32 %r182, %ctaid.x; 2026-02-21T10:22:31.0876292Z ld.param.b64 %rd29, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T10:22:31.0876831Z mov.u32 %r183, %ctaid.y; 2026-02-21T10:22:31.0877070Z ld.param.b64 %rd46, [_helion_matmul_bf16_int4_param_3]; 2026-02-21T10:22:31.0877340Z mov.u32 %r184, %ctaid.z; 2026-02-21T10:22:31.0877527Z mov.u32 %r185, %nctaid.x; 2026-02-21T10:22:31.0877713Z mov.u32 %r186, %nctaid.y; 2026-02-21T10:22:31.0877903Z mad.lo.s32 %r187, %r184, %r186, %r183; 2026-02-21T10:22:31.0878134Z mad.lo.s32 %r188, %r187, %r185, %r182; 2026-02-21T10:22:31.0878355Z shl.b32 %r189, %r188, 7; 2026-02-21T10:22:31.0878543Z cvt.s64.s32 %rd47, %r189; 2026-02-21T10:22:31.0878724Z add.s64 %rd43, %rd46, %rd47; 2026-02-21T10:22:31.0878919Z mov.u32 %r1, %tid.x; 2026-02-21T10:22:31.0879091Z setp.lt.u32 %p1, %r1, 32; 2026-02-21T10:22:31.0879263Z shl.b32 %r190, %r1, 2; 2026-02-21T10:22:31.0879441Z mov.b32 %r2514, global_smem; 2026-02-21T10:22:31.0879640Z add.s32 %r174, %r2514, %r190; 2026-02-21T10:22:31.0879826Z mov.b32 %r175, 0; 2026-02-21T10:22:31.0879992Z // begin 
inline asm 2026-02-21T10:22:31.0880177Z @%p1 st.shared.b32 [ %r174 + 0 ], %r175; 2026-02-21T10:22:31.0880383Z // end inline asm 2026-02-21T10:22:31.0880543Z bar.warp.sync -1; 2026-02-21T10:22:31.0880708Z setp.eq.b32 %p2, %r1, 0; 2026-02-21T10:22:31.0880892Z cvt.u64.u32 %rd28, %r2514; 2026-02-21T10:22:31.0881071Z // begin inline asm 2026-02-21T10:22:31.0881399Z @%p2 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd28 + 0 ], %rd29; 2026-02-21T10:22:31.0881747Z // end inline asm 2026-02-21T10:22:31.0882035Z // begin inline asm 2026-02-21T10:22:31.0882301Z @%p2 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1; 2026-02-21T10:22:31.0882604Z // end inline asm 2026-02-21T10:22:31.0882756Z mov.b32 %r176, 64; 2026-02-21T10:22:31.0882902Z // begin inline asm 2026-02-21T10:22:31.0883197Z @%p2 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r176; 2026-02-21T10:22:31.0883534Z // end inline asm 2026-02-21T10:22:31.0883682Z // begin inline asm 2026-02-21T10:22:31.0883956Z @%p2 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r176; 2026-02-21T10:22:31.0884272Z // end inline asm 2026-02-21T10:22:31.0884418Z mov.b32 %r178, 1280; 2026-02-21T10:22:31.0884572Z // begin inline asm 2026-02-21T10:22:31.0884859Z @%p2 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r178; 2026-02-21T10:22:31.0885642Z [3344s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T10:22:31.0887496Z Config: @helion.kernel(config=helion.Config(block_sizes=[8, 64, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['last', 'last'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=1, num_stages=4, num_warps=8, pid_type='persistent_blocked', range_flattens=[True, True], range_multi_buffers=[False, True], range_num_stages=[2, 3], range_unroll_factors=[2, 4], range_warp_specializes=[]), static_shapes=True) 2026-02-21T10:22:31.0889101Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T10:22:31.0889390Z `ptxas` stderr: 2026-02-21T10:22:31.0889937Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 627 in function _helion_matmul_bf16_int4. Try to compile with register target of 62 or higher. 2026-02-21T10:22:31.0890564Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:22:31.0890744Z 2026-02-21T10:22:31.0891251Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpi4v9yydw.ptx -o /tmp/tmpi4v9yydw.ptx.o 2026-02-21T10:22:31.0891827Z 2026-02-21T10:22:31.0891979Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T10:22:31.0892275Z // end inline asm 2026-02-21T10:22:31.0892428Z mov.b32 %r179, 65536; 2026-02-21T10:22:31.0892596Z // begin inline asm 2026-02-21T10:22:31.0892904Z @%p2 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r179; 2026-02-21T10:22:31.0893327Z // end inline asm 2026-02-21T10:22:31.0893482Z mov.b64 %rd36, 2560; 2026-02-21T10:22:31.0893638Z // begin inline asm 2026-02-21T10:22:31.0893942Z @%p2 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd28 + 0 ], 0x0, %rd36; 2026-02-21T10:22:31.0894296Z // end inline asm 2026-02-21T10:22:31.0894445Z mov.b32 %r180, 1; 2026-02-21T10:22:31.0894596Z // begin inline asm 2026-02-21T10:22:31.0894907Z @%p2 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0, %r180; 2026-02-21T10:22:31.0895275Z // end inline asm 2026-02-21T10:22:31.0895421Z // begin inline asm 2026-02-21T10:22:31.0895723Z @%p2 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x1, %r180; 2026-02-21T10:22:31.0896066Z // end inline asm 2026-02-21T10:22:31.0896227Z // begin inline asm 2026-02-21T10:22:31.0896631Z @%p2 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd28 + 0 ], 0xa; 2026-02-21T10:22:31.0896987Z // end inline asm 2026-02-21T10:22:31.0897140Z // begin inline asm 2026-02-21T10:22:31.0897440Z @%p2 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T10:22:31.0897791Z // end inline asm 2026-02-21T10:22:31.0897936Z // begin inline asm 2026-02-21T10:22:31.0898224Z @%p2 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x3; 2026-02-21T10:22:31.0898547Z // end inline asm 2026-02-21T10:22:31.0898696Z // begin inline asm 2026-02-21T10:22:31.0899050Z @%p2 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd28 + 0 ], 0x0; 2026-02-21T10:22:31.0899363Z // end inline asm 2026-02-21T10:22:31.0899510Z // begin inline asm 2026-02-21T10:22:31.0899932Z @%p1 
tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd43 + 0 ], [ %rd28 + 0 ], 0x80; 2026-02-21T10:22:31.0900406Z // end inline asm 2026-02-21T10:22:31.0900552Z // begin inline asm 2026-02-21T10:22:31.0900797Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd43 + 0 ], 0x80; 2026-02-21T10:22:31.0901099Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T10:22:31.0901319Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T10:22:31.0901532Z // end inline asm 2026-02-21T10:22:31.0901682Z bar.sync 0; 2026-02-21T10:22:31.0901836Z cvta.global.u64 %rd1, %rd43; 2026-02-21T10:22:31.0902177Z .loc 1 27 35 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:27:35 2026-02-21T10:22:31.0902540Z mul.lo.s32 %r3753, %r182, 78; 2026-02-21T10:22:31.0902933Z .loc 1 28 37 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:28:37 2026-02-21T10:22:31.0903285Z add.s32 %r192, %r3753, 78; 2026-02-21T10:22:31.0903593Z .loc 1 28 49 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:28:49 2026-02-21T10:22:31.0904003Z min.s32 %r3, %r192, 10240; 2026-02-21T10:22:31.0904309Z .loc 1 43 45 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:43:45 2026-02-21T10:22:31.0904656Z bfe.u32 %r6, %r1, 2, 6; 2026-02-21T10:22:31.0904974Z .loc 1 57 38 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:57:38 2026-02-21T10:22:31.0905309Z and.b32 %r7, %r1, 3; 2026-02-21T10:22:31.0905468Z shl.b32 %r8, %r7, 2; 2026-02-21T10:22:31.0905764Z .loc 1 29 120 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:29:120 2026-02-21T10:22:31.0906129Z setp.lt.s32 %p19, %r3753, %r3; 2026-02-21T10:22:31.0906320Z @%p19 bra $L__BB0_2; 2026-02-21T10:22:31.0906609Z bra.uni $L__BB0_1; 2026-02-21T10:22:31.0906803Z $L__BB0_2: // %.lr.ph 2026-02-21T10:22:31.0907162Z .loc 1 0 120 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:0:120 2026-02-21T10:22:31.0907562Z ld.param.b64 %rd27, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T10:22:31.0907811Z shr.u32 
%r4, %r1, 5; 2026-02-21T10:22:31.0907986Z and.b32 %r5, %r190, 124; 2026-02-21T10:22:31.0908155Z and.b32 %r9, %r1, 128; 2026-02-21T10:22:31.0908612Z .loc 1 51 48 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:51:48 2026-02-21T10:22:31.0908973Z and.b32 %r194, %r1, 224; 2026-02-21T10:22:31.0909147Z bfe.u32 %r195, %r1, 5, 3; 2026-02-21T10:22:31.0909463Z .loc 1 41 45 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:41:45 2026-02-21T10:22:31.0909804Z and.b32 %r196, %r1, 31; 2026-02-21T10:22:31.0909974Z shl.b32 %r197, %r1, 3; 2026-02-21T10:22:31.0910140Z and.b32 %r3820, %r197, 1912; 2026-02-21T10:22:31.0910339Z and.b32 %r3819, %r1, 16; 2026-02-21T10:22:31.0910508Z bfe.s32 %r198, %r1, 4, 1; 2026-02-21T10:22:31.0910683Z and.b32 %r199, %r198, 136; 2026-02-21T10:22:31.0910862Z xor.b32 %r200, %r199, %r3820; 2026-02-21T10:22:31.0911053Z add.s32 %r15, %r2514, %r200; 2026-02-21T10:22:31.0911234Z and.b32 %r202, %r1, 96; 2026-02-21T10:22:31.0911395Z shl.b32 %r203, %r202, 4; 2026-02-21T10:22:31.0911561Z and.b32 %r204, %r197, 96; 2026-02-21T10:22:31.0911728Z shl.b32 %r205, %r7, 1; 2026-02-21T10:22:31.0911894Z or.b32 %r206, %r203, %r204; 2026-02-21T10:22:31.0912066Z or.b32 %r207, %r206, %r205; 2026-02-21T10:22:31.0912243Z or.b32 %r208, %r207, %r199; 2026-02-21T10:22:31.0912417Z add.s32 %r16, %r2514, %r208; 2026-02-21T10:22:31.0912586Z xor.b32 %r209, %r208, 8; 2026-02-21T10:22:31.0912766Z add.s32 %r17, %r2514, %r209; 2026-02-21T10:22:31.0912957Z bfe.u32 %r210, %r1, 5, 2; 2026-02-21T10:22:31.0913207Z or.b32 %r211, %r5, %r210; 2026-02-21T10:22:31.0913378Z or.b32 %r212, %r211, %r9; 2026-02-21T10:22:31.0913547Z add.s32 %r18, %r2514, %r212; 2026-02-21T10:22:31.0913718Z xor.b32 %r213, %r212, 32; 2026-02-21T10:22:31.0913889Z add.s32 %r19, %r2514, %r213; 2026-02-21T10:22:31.0914058Z xor.b32 %r214, %r212, 64; 2026-02-21T10:22:31.0914241Z add.s32 %r20, %r2514, %r214; 2026-02-21T10:22:31.0914420Z xor.b32 %r215, %r212, 96; 2026-02-21T10:22:31.0914590Z add.s32 %r21, %r2514, 
%r215; 2026-02-21T10:22:31.0914761Z shl.b32 %r216, %r7, 8; 2026-02-21T10:22:31.0914929Z shl.b32 %r217, %r7, 5; 2026-02-21T10:22:31.0915086Z and.b32 %r218, %r1, 124; 2026-02-21T10:22:31.0915255Z xor.b32 %r219, %r217, %r218; 2026-02-21T10:22:31.0915429Z add.s32 %r220, %r2514, %r216; 2026-02-21T10:22:31.0915601Z add.s32 %r22, %r220, %r219; 2026-02-21T10:22:31.0915784Z shl.b32 %r221, %r1, 6; 2026-02-21T10:22:31.0915948Z and.b32 %r222, %r221, 8128; 2026-02-21T10:22:31.0916124Z and.b32 %r223, %r197, 48; 2026-02-21T10:22:31.0916291Z shr.u32 %r224, %r9, 5; 2026-02-21T10:22:31.0916651Z or.b32 %r225, %r222, %r224; 2026-02-21T10:22:31.0916837Z or.b32 %r226, %r225, %r223; 2026-02-21T10:22:31.0917019Z add.s32 %r23, %r2514, %r226; 2026-02-21T10:22:31.0917191Z xor.b32 %r227, %r226, 16; 2026-02-21T10:22:31.0917365Z add.s32 %r24, %r2514, %r227; 2026-02-21T10:22:31.0917622Z xor.b32 %r228, %r226, 32; 2026-02-21T10:22:31.0917787Z add.s32 %r25, %r2514, %r228; 2026-02-21T10:22:31.0917964Z xor.b32 %r229, %r226, 48; 2026-02-21T10:22:31.0918144Z add.s32 %r26, %r2514, %r229; 2026-02-21T10:22:31.0918320Z shl.b32 %r230, %r4, 3; 2026-02-21T10:22:31.0918483Z and.b32 %r231, %r230, 56; 2026-02-21T10:22:31.0918663Z or.b32 %r232, %r231, %r196; 2026-02-21T10:22:31.0918836Z shl.b32 %r233, %r232, 4; 2026-02-21T10:22:31.0919007Z add.s32 %r234, %r2514, 8192; 2026-02-21T10:22:31.0919182Z add.s32 %r1604, %r234, %r233; 2026-02-21T10:22:31.0919363Z shl.b32 %r235, %r1, 4; 2026-02-21T10:22:31.0919538Z and.b32 %r236, %r235, 112; 2026-02-21T10:22:31.0919724Z shl.b32 %r237, %r4, 7; 2026-02-21T10:22:31.0919894Z shl.b32 %r238, %r196, 3; 2026-02-21T10:22:31.0920060Z or.b32 %r239, %r237, %r238; 2026-02-21T10:22:31.0920236Z and.b32 %r240, %r239, 384; 2026-02-21T10:22:31.0920404Z and.b32 %r241, %r221, 512; 2026-02-21T10:22:31.0920586Z add.s32 %r242, %r234, %r236; 2026-02-21T10:22:31.0920765Z add.s32 %r243, %r242, %r241; 2026-02-21T10:22:31.0920942Z add.s32 %r282, %r243, %r240; 2026-02-21T10:22:31.0921118Z bfe.u32 
%r244, %r2514, 4, 14; 2026-02-21T10:22:31.0921375Z cvt.u64.u32 %rd48, %r244; 2026-02-21T10:22:31.0921579Z or.b64 %rd105, %rd48, -9223371899382267904; 2026-02-21T10:22:31.0921793Z add.s32 %r245, %r2514, 32; 2026-02-21T10:22:31.0921969Z bfe.u32 %r246, %r245, 4, 14; 2026-02-21T10:22:31.0922140Z cvt.u64.u32 %rd49, %r246; 2026-02-21T10:22:31.0922328Z or.b64 %rd106, %rd49, -9223371899382267904; 2026-02-21T10:22:31.0922535Z add.s32 %r247, %r2514, 4096; 2026-02-21T10:22:31.0922714Z bfe.u32 %r248, %r247, 4, 14; 2026-02-21T10:22:31.0922890Z cvt.u64.u32 %rd50, %r248; 2026-02-21T10:22:31.0923078Z or.b64 %rd107, %rd50, -9223371899382267904; 2026-02-21T10:22:31.0923284Z add.s32 %r249, %r2514, 4128; 2026-02-21T10:22:31.0923457Z bfe.u32 %r250, %r249, 4, 14; 2026-02-21T10:22:31.0923632Z cvt.u64.u32 %rd51, %r250; 2026-02-21T10:22:31.0923816Z or.b64 %rd108, %rd51, -9223371899382267904; 2026-02-21T10:22:31.0924033Z shl.b32 %r251, %r1, 7; 2026-02-21T10:22:31.0924198Z and.b32 %r252, %r251, 1920; 2026-02-21T10:22:31.0924374Z shl.b32 %r253, %r194, 6; 2026-02-21T10:22:31.0924537Z or.b32 %r254, %r252, %r253; 2026-02-21T10:22:31.0924718Z xor.b32 %r255, %r236, %r3819; 2026-02-21T10:22:31.0924895Z or.b32 %r256, %r254, %r255; 2026-02-21T10:22:31.0925066Z add.s32 %r29, %r2514, %r256; 2026-02-21T10:22:31.0925238Z xor.b32 %r257, %r256, 32; 2026-02-21T10:22:31.0925401Z add.s32 %r30, %r2514, %r257; 2026-02-21T10:22:31.0925572Z xor.b32 %r258, %r256, 64; 2026-02-21T10:22:31.0925735Z add.s32 %r31, %r2514, %r258; 2026-02-21T10:22:31.0926010Z xor.b32 %r259, %r256, 96; 2026-02-21T10:22:31.0926177Z add.s32 %r32, %r2514, %r259; 2026-02-21T10:22:31.0926629Z .loc 1 29 120 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:29:120 2026-02-21T10:22:31.0927006Z add.s64 %rd6, %rd26, 96; 2026-02-21T10:22:31.0927179Z shl.b32 %r3818, %r6, 13; 2026-02-21T10:22:31.0927494Z .loc 1 50 126 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:50:126 2026-02-21T10:22:31.0927843Z or.b32 %r34, %r3818, %r8; 
2026-02-21T10:22:31.0928153Z .loc 1 29 120 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:29:120 2026-02-21T10:22:31.0928508Z mad.wide.u32 %rd7, %r195, 1280, %rd27; 2026-02-21T10:22:31.0928766Z $L__BB0_3: // =>This Loop Header: Depth=1 2026-02-21T10:22:31.0929047Z // Child Loop BB0_4 Depth 2 2026-02-21T10:22:31.0929317Z // Child Loop BB0_6 Depth 2 2026-02-21T10:22:31.0929780Z .loc 1 35 35 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:35:35 2026-02-21T10:22:31.0930135Z shr.s32 %r261, %r3753, 31; 2026-02-21T10:22:31.0930312Z shr.u32 %r262, %r261, 16; 2026-02-21T10:22:31.0930480Z add.s32 %r263, %r3753, %r262; 2026-02-21T10:22:31.0930739Z shr.s32 %r264, %r263, 16; 2026-02-21T10:22:31.0931051Z .loc 1 36 33 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:36:33 2026-02-21T10:22:31.0931402Z shl.b32 %r265, %r264, 6; 2026-02-21T10:22:31.0931703Z .loc 1 37 39 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:37:39 2026-02-21T10:22:31.0932052Z sub.s32 %r266, 10, %r265; 2026-02-21T10:22:31.0932358Z .loc 1 37 52 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:37:52 2026-02-21T10:22:31.0932695Z min.s32 %r267, %r266, 64; 2026-02-21T10:22:31.0932999Z .loc 1 38 45 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:38:45 2026-02-21T10:22:31.0933343Z and.b32 %r268, %r263, -65536; 2026-02-21T10:22:31.0933524Z sub.s32 %r269, %r3753, %r268; 2026-02-21T10:22:31.0933838Z .loc 1 39 51 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:39:51 2026-02-21T10:22:31.0934186Z div.s32 %r270, %r269, %r267; 2026-02-21T10:22:31.0934498Z .loc 1 38 64 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:38:64 2026-02-21T10:22:31.0934960Z mul.lo.s32 %r271, %r270, %r267; 2026-02-21T10:22:31.0935160Z sub.s32 %r272, %r269, %r271; 2026-02-21T10:22:31.0935465Z .loc 1 38 30 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:38:30 2026-02-21T10:22:31.0935818Z add.s32 %r273, %r272, %r265; 2026-02-21T10:22:31.0936124Z .loc 1 40 
27 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:40:27 2026-02-21T10:22:31.0936582Z shl.b32 %r36, %r273, 7; 2026-02-21T10:22:31.0936902Z .loc 1 42 27 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:42:27 2026-02-21T10:22:31.0937242Z shl.b32 %r1965, %r270, 6; 2026-02-21T10:22:31.0937572Z .loc 1 50 126 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:50:126 2026-02-21T10:22:31.0937930Z shl.b32 %r274, %r270, 19; 2026-02-21T10:22:31.0938113Z or.b32 %r275, %r34, %r274; 2026-02-21T10:22:31.0938314Z mad.wide.s32 %rd151, %r275, 2, %rd6; 2026-02-21T10:22:31.0938522Z or.b32 %r276, %r5, %r36; 2026-02-21T10:22:31.0938697Z cvt.s64.s32 %rd53, %r276; 2026-02-21T10:22:31.0938871Z add.s64 %rd150, %rd7, %rd53; 2026-02-21T10:22:31.0939056Z mov.b32 %r1735, 0f00000000; 2026-02-21T10:22:31.0939229Z mov.b64 %rd152, -32; 2026-02-21T10:22:31.0939392Z mov.b32 %r1736, %r1735; 2026-02-21T10:22:31.0939556Z mov.b32 %r1737, %r1735; 2026-02-21T10:22:31.0939720Z mov.b32 %r1738, %r1735; 2026-02-21T10:22:31.0939879Z mov.b32 %r1739, %r1735; 2026-02-21T10:22:31.0940138Z mov.b32 %r1740, %r1735; 2026-02-21T10:22:31.0940304Z mov.b32 %r1741, %r1735; 2026-02-21T10:22:31.0940463Z mov.b32 %r1742, %r1735; 2026-02-21T10:22:31.0940626Z mov.b32 %r1743, %r1735; 2026-02-21T10:22:31.0940784Z mov.b32 %r1744, %r1735; 2026-02-21T10:22:31.0940954Z mov.b32 %r1745, %r1735; 2026-02-21T10:22:31.0941128Z mov.b32 %r1746, %r1735; 2026-02-21T10:22:31.0941293Z mov.b32 %r1747, %r1735; 2026-02-21T10:22:31.0941452Z mov.b32 %r1748, %r1735; 2026-02-21T10:22:31.0941616Z mov.b32 %r1749, %r1735; 2026-02-21T10:22:31.0941779Z mov.b32 %r1750, %r1735; 2026-02-21T10:22:31.0941943Z mov.b32 %r1751, %r1735; 2026-02-21T10:22:31.0942103Z mov.b32 %r1752, %r1735; 2026-02-21T10:22:31.0942265Z mov.b32 %r1753, %r1735; 2026-02-21T10:22:31.0942427Z mov.b32 %r1754, %r1735; 2026-02-21T10:22:31.0942590Z mov.b32 %r1755, %r1735; 2026-02-21T10:22:31.0942763Z mov.b32 %r1756, %r1735; 2026-02-21T10:22:31.0942924Z mov.b32 
%r1757, %r1735; 2026-02-21T10:22:31.0943103Z mov.b32 %r1758, %r1735; 2026-02-21T10:22:31.0943341Z mov.b32 %r1759, %r1735; 2026-02-21T10:22:31.0943511Z mov.b32 %r1760, %r1735; 2026-02-21T10:22:31.0943674Z mov.b32 %r1761, %r1735; 2026-02-21T10:22:31.0943841Z mov.b32 %r1762, %r1735; 2026-02-21T10:22:31.0944003Z mov.b32 %r1763, %r1735; 2026-02-21T10:22:31.0944163Z mov.b32 %r1764, %r1735; 2026-02-21T10:22:31.0944410Z mov.b32 %r1765, %r1735; 2026-02-21T10:22:31.0944576Z mov.b32 %r1766, %r1735; 2026-02-21T10:22:31.0944799Z $L__BB0_4: // Parent Loop BB0_3 Depth=1 2026-02-21T10:22:31.0945102Z // => This Inner Loop Header: Depth=2 2026-02-21T10:22:31.0945501Z .loc 1 81 38 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:81:38 2026-02-21T10:22:31.0945866Z setp.eq.b32 %p34, %r9, 0; 2026-02-21T10:22:31.0946188Z .loc 1 58 80 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:80 2026-02-21T10:22:31.0946676Z add.s64 %rd55, %rd151, -96; 2026-02-21T10:22:31.0946865Z // begin inline asm 2026-02-21T10:22:31.0947028Z mov.u64 %rd54, 0x0; 2026-02-21T10:22:31.0947244Z createpolicy.fractional.L2::evict_last.b64 %rd54, 1.0; 2026-02-21T10:22:31.0947511Z // end inline asm 2026-02-21T10:22:31.0947662Z // begin inline asm 2026-02-21T10:22:31.0947815Z mov.u32 %r277, 0x0; 2026-02-21T10:22:31.0947967Z mov.u32 %r278, 0x0; 2026-02-21T10:22:31.0948237Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r277, %r278 }, [ %rd55 + 0 ], %rd54; 2026-02-21T10:22:31.0948707Z // end inline asm 2026-02-21T10:22:31.0949006Z .loc 1 62 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:62:32 2026-02-21T10:22:31.0949360Z bar.sync 0; 2026-02-21T10:22:31.0949523Z st.shared.v2.b32 [%r15], {%r277, %r278}; 2026-02-21T10:22:31.0949732Z bar.sync 0; 2026-02-21T10:22:31.0949879Z ld.shared.b16 %rs1, [%r16]; 2026-02-21T10:22:31.0950071Z ld.shared.b16 %rs2, [%r16+256]; 2026-02-21T10:22:31.0950271Z ld.shared.b16 %rs3, [%r16+16]; 2026-02-21T10:22:31.0950465Z ld.shared.b16 %rs4, [%r16+272]; 
2026-02-21T10:22:31.0950659Z ld.shared.b16 %rs5, [%r17]; 2026-02-21T10:22:31.0950836Z ld.shared.b16 %rs6, [%r17+256]; 2026-02-21T10:22:31.0951035Z ld.shared.b16 %rs7, [%r17+16]; 2026-02-21T10:22:31.0951232Z ld.shared.b16 %rs8, [%r17+272]; 2026-02-21T10:22:31.0951440Z cvt.f32.bf16 %r576, %rs1; 2026-02-21T10:22:31.0951618Z cvt.f32.bf16 %r577, %rs2; 2026-02-21T10:22:31.0951794Z cvt.f32.bf16 %r578, %rs5; 2026-02-21T10:22:31.0951961Z cvt.f32.bf16 %r579, %rs6; 2026-02-21T10:22:31.0952136Z cvt.f32.bf16 %r644, %rs3; 2026-02-21T10:22:31.0952308Z cvt.f32.bf16 %r645, %rs4; 2026-02-21T10:22:31.0952473Z cvt.f32.bf16 %r646, %rs7; 2026-02-21T10:22:31.0952645Z cvt.f32.bf16 %r647, %rs8; 2026-02-21T10:22:31.0952959Z .loc 1 64 87 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:64:87 2026-02-21T10:22:31.0953320Z // begin inline asm 2026-02-21T10:22:31.0953478Z mov.u64 %rd57, 0x0; 2026-02-21T10:22:31.0953794Z createpolicy.fractional.L2::evict_last.b64 %rd57, 1.0; 2026-02-21T10:22:31.0954047Z // end inline asm 2026-02-21T10:22:31.0954200Z // begin inline asm 2026-02-21T10:22:31.0954358Z mov.u32 %r279, 0x0; 2026-02-21T10:22:31.0954607Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r279 }, [ %rd150 + 0 ], %rd57; 2026-02-21T10:22:31.0954913Z // end inline asm 2026-02-21T10:22:31.0955209Z .loc 1 72 28 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:72:28 2026-02-21T10:22:31.0955574Z bar.sync 0; 2026-02-21T10:22:31.0955731Z st.shared.b8 [%r18], %r279; 2026-02-21T10:22:31.0955931Z prmt.b32 %r1873, %r279, 0, 0x7771U; 2026-02-21T10:22:31.0956138Z st.shared.b8 [%r19+256], %r1873; 2026-02-21T10:22:31.0956342Z prmt.b32 %r1874, %r279, 0, 0x7772U; 2026-02-21T10:22:31.0956678Z st.shared.b8 [%r20+512], %r1874; 2026-02-21T10:22:31.0956877Z prmt.b32 %r1875, %r279, 0, 0x7773U; 2026-02-21T10:22:31.0965937Z st.shared.b8 [%r21+768], %r1875; 2026-02-21T10:22:31.0966221Z bar.sync 0; 2026-02-21T10:22:31.0966727Z ld.shared.b32 %r1876, [%r22]; 2026-02-21T10:22:31.0966962Z prmt.b32 %r1877, 
%r1876, 0, 0x7770U; 2026-02-21T10:22:31.0967205Z cvt.u16.u32 %rs9, %r1877; 2026-02-21T10:22:31.0967410Z prmt.b32 %r1878, %r1876, 0, 0x7771U; 2026-02-21T10:22:31.0967636Z cvt.u16.u32 %rs10, %r1878; 2026-02-21T10:22:31.0967930Z prmt.b32 %r1879, %r1876, 0, 0x7772U; 2026-02-21T10:22:31.0968147Z cvt.u16.u32 %rs11, %r1879; 2026-02-21T10:22:31.0968341Z prmt.b32 %r1880, %r1876, 0, 0x7773U; 2026-02-21T10:22:31.0968554Z cvt.u16.u32 %rs12, %r1880; 2026-02-21T10:22:31.0968743Z ld.shared.b32 %r1881, [%r22+128]; 2026-02-21T10:22:31.0968955Z prmt.b32 %r1882, %r1881, 0, 0x7770U; 2026-02-21T10:22:31.0969165Z cvt.u16.u32 %rs13, %r1882; 2026-02-21T10:22:31.0969354Z prmt.b32 %r1883, %r1881, 0, 0x7771U; 2026-02-21T10:22:31.0969561Z cvt.u16.u32 %rs14, %r1883; 2026-02-21T10:22:31.0969736Z prmt.b32 %r1884, %r1881, 0, 0x7772U; 2026-02-21T10:22:31.0969934Z cvt.u16.u32 %rs15, %r1884; 2026-02-21T10:22:31.0970122Z prmt.b32 %r1885, %r1881, 0, 0x7773U; 2026-02-21T10:22:31.0970336Z cvt.u16.u32 %rs16, %r1885; 2026-02-21T10:22:31.0970672Z .loc 1 67 28 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:67:28 2026-02-21T10:22:31.0971039Z shl.b16 %rs17, %rs9, 4; 2026-02-21T10:22:31.0971223Z shl.b16 %rs18, %rs10, 4; 2026-02-21T10:22:31.0971399Z shl.b16 %rs19, %rs11, 4; 2026-02-21T10:22:31.0971573Z shl.b16 %rs20, %rs12, 4; 2026-02-21T10:22:31.0971741Z shl.b16 %rs21, %rs13, 4; 2026-02-21T10:22:31.0972006Z shl.b16 %rs22, %rs14, 4; 2026-02-21T10:22:31.0972183Z shl.b16 %rs23, %rs15, 4; 2026-02-21T10:22:31.0972358Z shl.b16 %rs24, %rs16, 4; 2026-02-21T10:22:31.0972666Z .loc 1 82 58 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:82:58 2026-02-21T10:22:31.0973033Z selp.b16 %rs25, %rs17, %rs9, %p34; 2026-02-21T10:22:31.0973257Z cvt.s16.s8 %rs26, %rs25; 2026-02-21T10:22:31.0973426Z shr.s16 %rs27, %rs26, 4; 2026-02-21T10:22:31.0973618Z selp.b16 %rs28, %rs18, %rs10, %p34; 2026-02-21T10:22:31.0973819Z cvt.s16.s8 %rs29, %rs28; 2026-02-21T10:22:31.0973992Z shr.s16 %rs30, %rs29, 4; 
2026-02-21T10:22:31.0974181Z selp.b16 %rs31, %rs19, %rs11, %p34; 2026-02-21T10:22:31.0974384Z cvt.s16.s8 %rs32, %rs31; 2026-02-21T10:22:31.0974552Z shr.s16 %rs33, %rs32, 4; 2026-02-21T10:22:31.0974737Z selp.b16 %rs34, %rs20, %rs12, %p34; 2026-02-21T10:22:31.0974936Z cvt.s16.s8 %rs35, %rs34; 2026-02-21T10:22:31.0975107Z shr.s16 %rs36, %rs35, 4; 2026-02-21T10:22:31.0975292Z selp.b16 %rs37, %rs21, %rs13, %p34; 2026-02-21T10:22:31.0975497Z cvt.s16.s8 %rs38, %rs37; 2026-02-21T10:22:31.0975681Z shr.s16 %rs39, %rs38, 4; 2026-02-21T10:22:31.0975865Z selp.b16 %rs40, %rs22, %rs14, %p34; 2026-02-21T10:22:31.0976074Z cvt.s16.s8 %rs41, %rs40; 2026-02-21T10:22:31.0976249Z shr.s16 %rs42, %rs41, 4; 2026-02-21T10:22:31.0976435Z selp.b16 %rs43, %rs23, %rs15, %p34; 2026-02-21T10:22:31.0976785Z cvt.s16.s8 %rs44, %rs43; 2026-02-21T10:22:31.0977053Z shr.s16 %rs45, %rs44, 4; 2026-02-21T10:22:31.0977242Z selp.b16 %rs46, %rs24, %rs16, %p34; 2026-02-21T10:22:31.0977437Z cvt.s16.s8 %rs47, %rs46; 2026-02-21T10:22:31.0977622Z shr.s16 %rs48, %rs47, 4; 2026-02-21T10:22:31.0977944Z .loc 1 87 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:87:32 2026-02-21T10:22:31.0978314Z cvt.rn.f32.s16 %r1886, %rs27; 2026-02-21T10:22:31.0978502Z cvt.rn.f32.s16 %r1887, %rs30; 2026-02-21T10:22:31.0978690Z cvt.rn.f32.s16 %r1888, %rs33; 2026-02-21T10:22:31.0978874Z cvt.rn.f32.s16 %r1889, %rs36; 2026-02-21T10:22:31.0979065Z cvt.rn.f32.s16 %r1890, %rs39; 2026-02-21T10:22:31.0979250Z cvt.rn.f32.s16 %r1891, %rs42; 2026-02-21T10:22:31.0979432Z cvt.rn.f32.s16 %r1892, %rs45; 2026-02-21T10:22:31.0979635Z cvt.rn.f32.s16 %r1893, %rs48; 2026-02-21T10:22:31.0979811Z bar.sync 0; 2026-02-21T10:22:31.0979971Z st.shared.b32 [%r23], %r1886; 2026-02-21T10:22:31.0980159Z st.shared.b32 [%r23+8], %r1887; 2026-02-21T10:22:31.0980447Z st.shared.b32 [%r24], %r1888; 2026-02-21T10:22:31.0980639Z st.shared.b32 [%r24+8], %r1889; 2026-02-21T10:22:31.0980834Z st.shared.b32 [%r25], %r1890; 2026-02-21T10:22:31.0981016Z st.shared.b32 
[%r25+8], %r1891; 2026-02-21T10:22:31.0981209Z st.shared.b32 [%r26], %r1892; 2026-02-21T10:22:31.0981395Z st.shared.b32 [%r26+8], %r1893; 2026-02-21T10:22:31.0981750Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1735}; 2026-02-21T10:22:31.0982029Z bar.sync 0; 2026-02-21T10:22:31.0982176Z // begin inline asm 2026-02-21T10:22:31.0982431Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r444, %r580}, [%r282]; 2026-02-21T10:22:31.0982717Z // end inline asm 2026-02-21T10:22:31.0982875Z bar.sync 0; 2026-02-21T10:22:31.0983089Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1737}; 2026-02-21T10:22:31.0983372Z bar.sync 0; 2026-02-21T10:22:31.0983523Z // begin inline asm 2026-02-21T10:22:31.0983761Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r446, %r582}, [%r282]; 2026-02-21T10:22:31.0984050Z // end inline asm 2026-02-21T10:22:31.0984213Z bar.sync 0; 2026-02-21T10:22:31.0984437Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1736}; 2026-02-21T10:22:31.0984698Z bar.sync 0; 2026-02-21T10:22:31.0984847Z // begin inline asm 2026-02-21T10:22:31.0985082Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r445, %r581}, [%r282]; 2026-02-21T10:22:31.0985366Z // end inline asm 2026-02-21T10:22:31.0985519Z bar.sync 0; 2026-02-21T10:22:31.0985734Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1738}; 2026-02-21T10:22:31.0986085Z bar.sync 0; 2026-02-21T10:22:31.0986236Z // begin inline asm 2026-02-21T10:22:31.0986588Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r447, %r583}, [%r282]; 2026-02-21T10:22:31.0986886Z // end inline asm 2026-02-21T10:22:31.0987040Z bar.sync 0; 2026-02-21T10:22:31.0987249Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1739}; 2026-02-21T10:22:31.0987515Z bar.sync 0; 2026-02-21T10:22:31.0987657Z // begin inline asm 2026-02-21T10:22:31.0987905Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r448, %r584}, [%r282]; 2026-02-21T10:22:31.0988194Z // end inline asm 2026-02-21T10:22:31.0988350Z bar.sync 0; 2026-02-21T10:22:31.0988656Z 
stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1741}; 2026-02-21T10:22:31.0988921Z bar.sync 0; 2026-02-21T10:22:31.0989073Z // begin inline asm 2026-02-21T10:22:31.0989315Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r450, %r586}, [%r282]; 2026-02-21T10:22:31.0989602Z // end inline asm 2026-02-21T10:22:31.0989749Z bar.sync 0; 2026-02-21T10:22:31.0989970Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1740}; 2026-02-21T10:22:31.0990241Z bar.sync 0; 2026-02-21T10:22:31.0990383Z // begin inline asm 2026-02-21T10:22:31.0990629Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r449, %r585}, [%r282]; 2026-02-21T10:22:31.0990904Z // end inline asm 2026-02-21T10:22:31.0991056Z bar.sync 0; 2026-02-21T10:22:31.0991263Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1742}; 2026-02-21T10:22:31.0991618Z bar.sync 0; 2026-02-21T10:22:31.0991764Z // begin inline asm 2026-02-21T10:22:31.0992001Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r451, %r587}, [%r282]; 2026-02-21T10:22:31.0992276Z // end inline asm 2026-02-21T10:22:31.0992423Z bar.sync 0; 2026-02-21T10:22:31.0992639Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1743}; 2026-02-21T10:22:31.0992917Z bar.sync 0; 2026-02-21T10:22:31.0993066Z // begin inline asm 2026-02-21T10:22:31.0993300Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r452, %r588}, [%r282]; 2026-02-21T10:22:31.0993582Z // end inline asm 2026-02-21T10:22:31.0993730Z bar.sync 0; 2026-02-21T10:22:31.0993945Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1745}; 2026-02-21T10:22:31.0994205Z bar.sync 0; 2026-02-21T10:22:31.0994352Z // begin inline asm 2026-02-21T10:22:31.0994593Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r454, %r590}, [%r282]; 2026-02-21T10:22:31.0994871Z // end inline asm 2026-02-21T10:22:31.0995022Z bar.sync 0; 2026-02-21T10:22:31.0995317Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1744}; 2026-02-21T10:22:31.0995595Z bar.sync 0; 2026-02-21T10:22:31.0995734Z // begin inline asm 2026-02-21T10:22:31.0995970Z 
ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r453, %r589}, [%r282]; 2026-02-21T10:22:31.0996249Z // end inline asm 2026-02-21T10:22:31.0996392Z bar.sync 0; 2026-02-21T10:22:31.0996827Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1746}; 2026-02-21T10:22:31.0997094Z bar.sync 0; 2026-02-21T10:22:31.0997246Z // begin inline asm 2026-02-21T10:22:31.0997488Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r455, %r591}, [%r282]; 2026-02-21T10:22:31.0997762Z // end inline asm 2026-02-21T10:22:31.0997910Z bar.sync 0; 2026-02-21T10:22:31.0998118Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1747}; 2026-02-21T10:22:31.0998380Z bar.sync 0; 2026-02-21T10:22:31.0998520Z // begin inline asm 2026-02-21T10:22:31.0998758Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r456, %r592}, [%r282]; 2026-02-21T10:22:31.0999034Z // end inline asm 2026-02-21T10:22:31.0999186Z bar.sync 0; 2026-02-21T10:22:31.0999405Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1749}; 2026-02-21T10:22:31.0999666Z bar.sync 0; 2026-02-21T10:22:31.0999821Z // begin inline asm 2026-02-21T10:22:31.1000062Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r458, %r594}, [%r282]; 2026-02-21T10:22:31.1000343Z // end inline asm 2026-02-21T10:22:31.1000486Z bar.sync 0; 2026-02-21T10:22:31.1000705Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1748}; 2026-02-21T10:22:31.1001043Z bar.sync 0; 2026-02-21T10:22:31.1001195Z // begin inline asm 2026-02-21T10:22:31.1001433Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r457, %r593}, [%r282]; 2026-02-21T10:22:31.1001705Z // end inline asm 2026-02-21T10:22:31.1001867Z bar.sync 0; 2026-02-21T10:22:31.1002078Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1750}; 2026-02-21T10:22:31.1002341Z bar.sync 0; 2026-02-21T10:22:31.1002477Z // begin inline asm 2026-02-21T10:22:31.1002722Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r459, %r595}, [%r282]; 2026-02-21T10:22:31.1002994Z // end inline asm 2026-02-21T10:22:31.1003141Z bar.sync 0; 2026-02-21T10:22:31.1003348Z 
stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1751}; 2026-02-21T10:22:31.1003613Z bar.sync 0; 2026-02-21T10:22:31.1003755Z // begin inline asm 2026-02-21T10:22:31.1003987Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r460, %r596}, [%r282]; 2026-02-21T10:22:31.1004263Z // end inline asm 2026-02-21T10:22:31.1004406Z bar.sync 0; 2026-02-21T10:22:31.1004618Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1753}; 2026-02-21T10:22:31.1004876Z bar.sync 0; 2026-02-21T10:22:31.1005023Z // begin inline asm 2026-02-21T10:22:31.1005253Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r462, %r598}, [%r282]; 2026-02-21T10:22:31.1005532Z // end inline asm 2026-02-21T10:22:31.1005681Z bar.sync 0; 2026-02-21T10:22:31.1005894Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1752}; 2026-02-21T10:22:31.1006239Z bar.sync 0; 2026-02-21T10:22:31.1006381Z // begin inline asm 2026-02-21T10:22:31.1006762Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r461, %r597}, [%r282]; 2026-02-21T10:22:31.1007043Z // end inline asm 2026-02-21T10:22:31.1007188Z bar.sync 0; 2026-02-21T10:22:31.1007409Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1754}; 2026-02-21T10:22:31.1007678Z bar.sync 0; 2026-02-21T10:22:31.1007824Z // begin inline asm 2026-02-21T10:22:31.1008068Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r463, %r599}, [%r282]; 2026-02-21T10:22:31.1008347Z // end inline asm 2026-02-21T10:22:31.1008495Z bar.sync 0; 2026-02-21T10:22:31.1008707Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1755}; 2026-02-21T10:22:31.1008978Z bar.sync 0; 2026-02-21T10:22:31.1009115Z // begin inline asm 2026-02-21T10:22:31.1009358Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r464, %r600}, [%r282]; 2026-02-21T10:22:31.1009630Z // end inline asm 2026-02-21T10:22:31.1009777Z bar.sync 0; 2026-02-21T10:22:31.1010072Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1757}; 2026-02-21T10:22:31.1010347Z bar.sync 0; 2026-02-21T10:22:31.1010489Z // begin inline asm 2026-02-21T10:22:31.1010744Z 
ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r466, %r602}, [%r282]; 2026-02-21T10:22:31.1011024Z // end inline asm 2026-02-21T10:22:31.1011171Z bar.sync 0; 2026-02-21T10:22:31.1011474Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1756}; 2026-02-21T10:22:31.1011739Z bar.sync 0; 2026-02-21T10:22:31.1011884Z // begin inline asm 2026-02-21T10:22:31.1012119Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r465, %r601}, [%r282]; 2026-02-21T10:22:31.1012412Z // end inline asm 2026-02-21T10:22:31.1012557Z bar.sync 0; 2026-02-21T10:22:31.1012770Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1758}; 2026-02-21T10:22:31.1013037Z bar.sync 0; 2026-02-21T10:22:31.1013176Z // begin inline asm 2026-02-21T10:22:31.1013407Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r467, %r603}, [%r282]; 2026-02-21T10:22:31.1013683Z // end inline asm 2026-02-21T10:22:31.1013826Z bar.sync 0; 2026-02-21T10:22:31.1014034Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1759}; 2026-02-21T10:22:31.1014298Z bar.sync 0; 2026-02-21T10:22:31.1014432Z // begin inline asm 2026-02-21T10:22:31.1014661Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r468, %r604}, [%r282]; 2026-02-21T10:22:31.1014936Z // end inline asm 2026-02-21T10:22:31.1015081Z bar.sync 0; 2026-02-21T10:22:31.1015291Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1761}; 2026-02-21T10:22:31.1015550Z bar.sync 0; 2026-02-21T10:22:31.1015798Z // begin inline asm 2026-02-21T10:22:31.1016035Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r470, %r606}, [%r282]; 2026-02-21T10:22:31.1016308Z // end inline asm 2026-02-21T10:22:31.1016577Z bar.sync 0; 2026-02-21T10:22:31.1016809Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1760}; 2026-02-21T10:22:31.1017086Z bar.sync 0; 2026-02-21T10:22:31.1017236Z // begin inline asm 2026-02-21T10:22:31.1017499Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r469, %r605}, [%r282]; 2026-02-21T10:22:31.1017790Z // end inline asm 2026-02-21T10:22:31.1017943Z bar.sync 0; 2026-02-21T10:22:31.1018163Z 
stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1762}; 2026-02-21T10:22:31.1018430Z bar.sync 0; 2026-02-21T10:22:31.1018567Z // begin inline asm 2026-02-21T10:22:31.1018810Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r471, %r607}, [%r282]; 2026-02-21T10:22:31.1019087Z // end inline asm 2026-02-21T10:22:31.1019242Z bar.sync 0; 2026-02-21T10:22:31.1019448Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1763}; 2026-02-21T10:22:31.1019708Z bar.sync 0; 2026-02-21T10:22:31.1019848Z // begin inline asm 2026-02-21T10:22:31.1020086Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r472, %r608}, [%r282]; 2026-02-21T10:22:31.1020364Z // end inline asm 2026-02-21T10:22:31.1020504Z bar.sync 0; 2026-02-21T10:22:31.1020714Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1765}; 2026-02-21T10:22:31.1020973Z bar.sync 0; 2026-02-21T10:22:31.1021201Z // begin inline asm 2026-02-21T10:22:31.1021432Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r474, %r610}, [%r282]; 2026-02-21T10:22:31.1021728Z // end inline asm 2026-02-21T10:22:31.1021871Z bar.sync 0; 2026-02-21T10:22:31.1022076Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1764}; 2026-02-21T10:22:31.1022335Z bar.sync 0; 2026-02-21T10:22:31.1022474Z // begin inline asm 2026-02-21T10:22:31.1022707Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r473, %r609}, [%r282]; 2026-02-21T10:22:31.1022975Z // end inline asm 2026-02-21T10:22:31.1023120Z bar.sync 0; 2026-02-21T10:22:31.1023324Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r1766}; 2026-02-21T10:22:31.1023588Z bar.sync 0; 2026-02-21T10:22:31.1023724Z // begin inline asm 2026-02-21T10:22:31.1023957Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r475, %r611}, [%r282]; 2026-02-21T10:22:31.1024233Z // end inline asm 2026-02-21T10:22:31.1024371Z $L__tmp1: 2026-02-21T10:22:31.1024823Z .loc 2 291 36 // standard.py:291:36 @[ cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:94:40 ] 2026-02-21T10:22:31.1025254Z // begin inline asm 2026-02-21T10:22:31.1025440Z 
fence.proxy.async.shared::cta; 2026-02-21T10:22:31.1025626Z // end inline asm 2026-02-21T10:22:31.1025800Z shfl.sync.idx.b32 %r1894, %r4, 0, 31, -1; 2026-02-21T10:22:31.1026094Z wgmma.fence.sync.aligned; 2026-02-21T10:22:31.1026284Z mov.pred %p20, -1; 2026-02-21T10:22:31.1026443Z // begin inline asm 2026-02-21T10:22:31.1027322Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r444,%r445,%r446,%r447,%r448,%r449,%r450,%r451,%r452,%r453,%r454,%r455,%r456,%r457,%r458,%r459,%r460,%r461,%r462,%r463,%r464,%r465,%r466,%r467,%r468,%r469,%r470,%r471,%r472,%r473,%r474,%r475}, {%r576,%r577,%r578,%r579}, %rd105, %p20, 1, 1; 2026-02-21T10:22:31.1028132Z // end inline asm 2026-02-21T10:22:31.1028294Z // begin inline asm 2026-02-21T10:22:31.1029110Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r444,%r445,%r446,%r447,%r448,%r449,%r450,%r451,%r452,%r453,%r454,%r455,%r456,%r457,%r458,%r459,%r460,%r461,%r462,%r463,%r464,%r465,%r466,%r467,%r468,%r469,%r470,%r471,%r472,%r473,%r474,%r475}, {%r644,%r645,%r646,%r647}, %rd106, %p20, 1, 1; 2026-02-21T10:22:31.1029904Z // end inline asm 2026-02-21T10:22:31.1030050Z // begin inline asm 2026-02-21T10:22:31.1030876Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r580,%r581,%r582,%r583,%r584,%r585,%r586,%r587,%r588,%r589,%r590,%r591,%r592,%r593,%r594,%r595,%r596,%r597,%r598,%r599,%r600,%r601,%r602,%r603,%r604,%r605,%r606,%r607,%r608,%r609,%r610,%r611}, {%r576,%r577,%r578,%r579}, %rd107, %p20, 1, 1; 2026-02-21T10:22:31.1031673Z // end inline asm 2026-02-21T10:22:31.1031817Z // begin inline asm 2026-02-21T10:22:31.1032557Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r580,%r581,%r582,%r583,%r584,%r585,%r586,%r587,%r588,%r589,%r590,%r591,%r592,%r593,%r594,%r595,%r596,%r597,%r598,%r599,%r600,%r601,%r602,%r603,%r604,%r605,%r606,%r607,%r608,%r609,%r610,%r611}, {%r644,%r645,%r646,%r647}, %rd108, %p20, 1, 1; 2026-02-21T10:22:31.1033339Z // end inline asm 2026-02-21T10:22:31.1033523Z wgmma.commit_group.sync.aligned; 
2026-02-21T10:22:31.1033721Z mov.b32 %r1836, 0; 2026-02-21T10:22:31.1033872Z mov.b32 %r713, %r1836; 2026-02-21T10:22:31.1034040Z mov.b32 %r714, %r1836; 2026-02-21T10:22:31.1034198Z mov.b32 %r712, %r2514; 2026-02-21T10:22:31.1034360Z // begin inline asm 2026-02-21T10:22:31.1035337Z // wait for regs: %r444,%r445,%r446,%r447,%r448,%r449,%r450,%r451,%r452,%r453,%r454,%r455,%r456,%r457,%r458,%r459,%r460,%r461,%r462,%r463,%r464,%r465,%r466,%r467,%r468,%r469,%r470,%r471,%r472,%r473,%r474,%r475,%r580,%r581,%r582,%r583,%r584,%r585,%r586,%r587,%r588,%r589,%r590,%r591,%r592,%r593,%r594,%r595,%r596,%r597,%r598,%r599,%r600,%r601,%r602,%r603,%r604,%r605,%r606,%r607,%r608,%r609,%r610,%r611,%r712,%r713,%r714 2026-02-21T10:22:31.1036369Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:31.1036700Z // end inline asm 2026-02-21T10:22:31.1036840Z $L__tmp2: 2026-02-21T10:22:31.1037224Z .loc 1 58 80 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:80 2026-02-21T10:22:31.1037587Z add.s64 %rd65, %rd151, -64; 2026-02-21T10:22:31.1037771Z // begin inline asm 2026-02-21T10:22:31.1037922Z mov.u64 %rd64, 0x0; 2026-02-21T10:22:31.1038140Z createpolicy.fractional.L2::evict_last.b64 %rd64, 1.0; 2026-02-21T10:22:31.1038397Z // end inline asm 2026-02-21T10:22:31.1038546Z // begin inline asm 2026-02-21T10:22:31.1038700Z mov.u32 %r782, 0x0; 2026-02-21T10:22:31.1038846Z mov.u32 %r783, 0x0; 2026-02-21T10:22:31.1039119Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r782, %r783 }, [ %rd65 + 0 ], %rd64; 2026-02-21T10:22:31.1039443Z // end inline asm 2026-02-21T10:22:31.1039739Z .loc 1 62 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:62:32 2026-02-21T10:22:31.1040083Z bar.sync 0; 2026-02-21T10:22:31.1040243Z st.shared.v2.b32 [%r15], {%r782, %r783}; 2026-02-21T10:22:31.1040445Z bar.sync 0; 2026-02-21T10:22:31.1040596Z ld.shared.b16 %rs49, [%r16]; 2026-02-21T10:22:31.1040888Z ld.shared.b16 %rs50, [%r16+256]; 2026-02-21T10:22:31.1041091Z ld.shared.b16 %rs51, [%r16+16]; 
2026-02-21T10:22:31.1041288Z ld.shared.b16 %rs52, [%r16+272]; 2026-02-21T10:22:31.1041477Z ld.shared.b16 %rs53, [%r17]; 2026-02-21T10:22:31.1041660Z ld.shared.b16 %rs54, [%r17+256]; 2026-02-21T10:22:31.1041916Z ld.shared.b16 %rs55, [%r17+16]; 2026-02-21T10:22:31.1042108Z ld.shared.b16 %rs56, [%r17+272]; 2026-02-21T10:22:31.1042293Z cvt.f32.bf16 %r985, %rs49; 2026-02-21T10:22:31.1042471Z cvt.f32.bf16 %r986, %rs50; 2026-02-21T10:22:31.1042647Z cvt.f32.bf16 %r987, %rs53; 2026-02-21T10:22:31.1042815Z cvt.f32.bf16 %r988, %rs54; 2026-02-21T10:22:31.1042992Z cvt.f32.bf16 %r1053, %rs51; 2026-02-21T10:22:31.1043169Z cvt.f32.bf16 %r1054, %rs52; 2026-02-21T10:22:31.1043341Z cvt.f32.bf16 %r1055, %rs55; 2026-02-21T10:22:31.1043511Z cvt.f32.bf16 %r1056, %rs56; 2026-02-21T10:22:31.1043831Z .loc 1 64 87 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:64:87 2026-02-21T10:22:31.1044184Z add.s64 %rd68, %rd150, 10240; 2026-02-21T10:22:31.1044375Z // begin inline asm 2026-02-21T10:22:31.1044537Z mov.u64 %rd67, 0x0; 2026-02-21T10:22:31.1044751Z createpolicy.fractional.L2::evict_last.b64 %rd67, 1.0; 2026-02-21T10:22:31.1045002Z // end inline asm 2026-02-21T10:22:31.1045152Z // begin inline asm 2026-02-21T10:22:31.1045303Z mov.u32 %r784, 0x0; 2026-02-21T10:22:31.1045620Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r784 }, [ %rd68 + 0 ], %rd67; 2026-02-21T10:22:31.1045921Z // end inline asm 2026-02-21T10:22:31.1046205Z .loc 1 72 28 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:72:28 2026-02-21T10:22:31.1046696Z bar.sync 0; 2026-02-21T10:22:31.1046859Z st.shared.b8 [%r18], %r784; 2026-02-21T10:22:31.1047047Z prmt.b32 %r1895, %r784, 0, 0x7771U; 2026-02-21T10:22:31.1047252Z st.shared.b8 [%r19+256], %r1895; 2026-02-21T10:22:31.1047443Z prmt.b32 %r1896, %r784, 0, 0x7772U; 2026-02-21T10:22:31.1047658Z st.shared.b8 [%r20+512], %r1896; 2026-02-21T10:22:31.1047849Z prmt.b32 %r1897, %r784, 0, 0x7773U; 2026-02-21T10:22:31.1048044Z st.shared.b8 [%r21+768], %r1897; 
2026-02-21T10:22:31.1048227Z bar.sync 0; 2026-02-21T10:22:31.1048381Z ld.shared.b32 %r1898, [%r22]; 2026-02-21T10:22:31.1048571Z prmt.b32 %r1899, %r1898, 0, 0x7770U; 2026-02-21T10:22:31.1048770Z cvt.u16.u32 %rs57, %r1899; 2026-02-21T10:22:31.1048951Z prmt.b32 %r1900, %r1898, 0, 0x7771U; 2026-02-21T10:22:31.1049145Z cvt.u16.u32 %rs58, %r1900; 2026-02-21T10:22:31.1049322Z prmt.b32 %r1901, %r1898, 0, 0x7772U; 2026-02-21T10:22:31.1049511Z cvt.u16.u32 %rs59, %r1901; 2026-02-21T10:22:31.1049686Z prmt.b32 %r1902, %r1898, 0, 0x7773U; 2026-02-21T10:22:31.1049885Z cvt.u16.u32 %rs60, %r1902; 2026-02-21T10:22:31.1050065Z ld.shared.b32 %r1903, [%r22+128]; 2026-02-21T10:22:31.1050254Z prmt.b32 %r1904, %r1903, 0, 0x7770U; 2026-02-21T10:22:31.1050445Z cvt.u16.u32 %rs61, %r1904; 2026-02-21T10:22:31.1050720Z prmt.b32 %r1905, %r1903, 0, 0x7771U; 2026-02-21T10:22:31.1050903Z cvt.u16.u32 %rs62, %r1905; 2026-02-21T10:22:31.1051075Z prmt.b32 %r1906, %r1903, 0, 0x7772U; 2026-02-21T10:22:31.1051261Z cvt.u16.u32 %rs63, %r1906; 2026-02-21T10:22:31.1051433Z prmt.b32 %r1907, %r1903, 0, 0x7773U; 2026-02-21T10:22:31.1051619Z cvt.u16.u32 %rs64, %r1907; 2026-02-21T10:22:31.1051929Z .loc 1 67 28 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:67:28 2026-02-21T10:22:31.1052274Z shl.b16 %rs65, %rs57, 4; 2026-02-21T10:22:31.1052451Z shl.b16 %rs66, %rs58, 4; 2026-02-21T10:22:31.1052626Z shl.b16 %rs67, %rs59, 4; 2026-02-21T10:22:31.1052798Z shl.b16 %rs68, %rs60, 4; 2026-02-21T10:22:31.1052966Z shl.b16 %rs69, %rs61, 4; 2026-02-21T10:22:31.1053128Z shl.b16 %rs70, %rs62, 4; 2026-02-21T10:22:31.1053291Z shl.b16 %rs71, %rs63, 4; 2026-02-21T10:22:31.1053464Z shl.b16 %rs72, %rs64, 4; 2026-02-21T10:22:31.1053844Z .loc 1 82 58 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:82:58 2026-02-21T10:22:31.1054201Z selp.b16 %rs73, %rs65, %rs57, %p34; 2026-02-21T10:22:31.1054397Z cvt.s16.s8 %rs74, %rs73; 2026-02-21T10:22:31.1054567Z shr.s16 %rs75, %rs74, 4; 2026-02-21T10:22:31.1054737Z selp.b16 
%rs76, %rs66, %rs58, %p34; 2026-02-21T10:22:31.1054934Z cvt.s16.s8 %rs77, %rs76; 2026-02-21T10:22:31.1055179Z shr.s16 %rs78, %rs77, 4; 2026-02-21T10:22:31.1055354Z selp.b16 %rs79, %rs67, %rs59, %p34; 2026-02-21T10:22:31.1055542Z cvt.s16.s8 %rs80, %rs79; 2026-02-21T10:22:31.1055712Z shr.s16 %rs81, %rs80, 4; 2026-02-21T10:22:31.1055879Z selp.b16 %rs82, %rs68, %rs60, %p34; 2026-02-21T10:22:31.1056077Z cvt.s16.s8 %rs83, %rs82; 2026-02-21T10:22:31.1056250Z shr.s16 %rs84, %rs83, 4; 2026-02-21T10:22:31.1056424Z selp.b16 %rs85, %rs69, %rs61, %p34; 2026-02-21T10:22:31.1056765Z cvt.s16.s8 %rs86, %rs85; 2026-02-21T10:22:31.1056934Z shr.s16 %rs87, %rs86, 4; 2026-02-21T10:22:31.1057126Z selp.b16 %rs88, %rs70, %rs62, %p34; 2026-02-21T10:22:31.1057334Z cvt.s16.s8 %rs89, %rs88; 2026-02-21T10:22:31.1057513Z shr.s16 %rs90, %rs89, 4; 2026-02-21T10:22:31.1057686Z selp.b16 %rs91, %rs71, %rs63, %p34; 2026-02-21T10:22:31.1057895Z cvt.s16.s8 %rs92, %rs91; 2026-02-21T10:22:31.1058075Z shr.s16 %rs93, %rs92, 4; 2026-02-21T10:22:31.1058257Z selp.b16 %rs94, %rs72, %rs64, %p34; 2026-02-21T10:22:31.1058458Z cvt.s16.s8 %rs95, %rs94; 2026-02-21T10:22:31.1058621Z shr.s16 %rs96, %rs95, 4; 2026-02-21T10:22:31.1059009Z .loc 1 87 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:87:32 2026-02-21T10:22:31.1059362Z cvt.rn.f32.s16 %r1908, %rs75; 2026-02-21T10:22:31.1059551Z cvt.rn.f32.s16 %r1909, %rs78; 2026-02-21T10:22:31.1059727Z cvt.rn.f32.s16 %r1910, %rs81; 2026-02-21T10:22:31.1059907Z cvt.rn.f32.s16 %r1911, %rs84; 2026-02-21T10:22:31.1060082Z cvt.rn.f32.s16 %r1912, %rs87; 2026-02-21T10:22:31.1060278Z cvt.rn.f32.s16 %r1913, %rs90; 2026-02-21T10:22:31.1060458Z cvt.rn.f32.s16 %r1914, %rs93; 2026-02-21T10:22:31.1060641Z cvt.rn.f32.s16 %r1915, %rs96; 2026-02-21T10:22:31.1060820Z bar.sync 0; 2026-02-21T10:22:31.1060973Z st.shared.b32 [%r23], %r1908; 2026-02-21T10:22:31.1061164Z st.shared.b32 [%r23+8], %r1909; 2026-02-21T10:22:31.1061351Z st.shared.b32 [%r24], %r1910; 
2026-02-21T10:22:31.1061538Z st.shared.b32 [%r24+8], %r1911; 2026-02-21T10:22:31.1061726Z st.shared.b32 [%r25], %r1912; 2026-02-21T10:22:31.1061912Z st.shared.b32 [%r25+8], %r1913; 2026-02-21T10:22:31.1062098Z st.shared.b32 [%r26], %r1914; 2026-02-21T10:22:31.1062285Z st.shared.b32 [%r26+8], %r1915; 2026-02-21T10:22:31.1062466Z $L__tmp3: 2026-02-21T10:22:31.1062822Z .loc 2 291 36 // standard.py:291:36 @[ cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:94:40 ] 2026-02-21T10:22:31.1063249Z // begin inline asm 2026-02-21T10:22:31.1063428Z fence.proxy.async.shared::cta; 2026-02-21T10:22:31.1063615Z // end inline asm 2026-02-21T10:22:31.1063762Z bar.sync 0; 2026-02-21T10:22:31.1064016Z wgmma.fence.sync.aligned; 2026-02-21T10:22:31.1064195Z // begin inline asm 2026-02-21T10:22:31.1064956Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r444,%r445,%r446,%r447,%r448,%r449,%r450,%r451,%r452,%r453,%r454,%r455,%r456,%r457,%r458,%r459,%r460,%r461,%r462,%r463,%r464,%r465,%r466,%r467,%r468,%r469,%r470,%r471,%r472,%r473,%r474,%r475}, {%r985,%r986,%r987,%r988}, %rd105, %p20, 1, 1; 2026-02-21T10:22:31.1065759Z // end inline asm 2026-02-21T10:22:31.1065908Z // begin inline asm 2026-02-21T10:22:31.1066792Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r444,%r445,%r446,%r447,%r448,%r449,%r450,%r451,%r452,%r453,%r454,%r455,%r456,%r457,%r458,%r459,%r460,%r461,%r462,%r463,%r464,%r465,%r466,%r467,%r468,%r469,%r470,%r471,%r472,%r473,%r474,%r475}, {%r1053,%r1054,%r1055,%r1056}, %rd106, %p20, 1, 1; 2026-02-21T10:22:31.1067599Z // end inline asm 2026-02-21T10:22:31.1067748Z // begin inline asm 2026-02-21T10:22:31.1068654Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r580,%r581,%r582,%r583,%r584,%r585,%r586,%r587,%r588,%r589,%r590,%r591,%r592,%r593,%r594,%r595,%r596,%r597,%r598,%r599,%r600,%r601,%r602,%r603,%r604,%r605,%r606,%r607,%r608,%r609,%r610,%r611}, {%r985,%r986,%r987,%r988}, %rd107, %p20, 1, 1; 2026-02-21T10:22:31.1069452Z // end inline asm 
2026-02-21T10:22:31.1069613Z // begin inline asm 2026-02-21T10:22:31.1070371Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r580,%r581,%r582,%r583,%r584,%r585,%r586,%r587,%r588,%r589,%r590,%r591,%r592,%r593,%r594,%r595,%r596,%r597,%r598,%r599,%r600,%r601,%r602,%r603,%r604,%r605,%r606,%r607,%r608,%r609,%r610,%r611}, {%r1053,%r1054,%r1055,%r1056}, %rd108, %p20, 1, 1; 2026-02-21T10:22:31.1071257Z // end inline asm 2026-02-21T10:22:31.1071429Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:31.1071627Z mov.b32 %r1122, %r1836; 2026-02-21T10:22:31.1071805Z mov.b32 %r1123, %r1836; 2026-02-21T10:22:31.1071970Z mov.b32 %r1121, %r2514; 2026-02-21T10:22:31.1072154Z // begin inline asm 2026-02-21T10:22:31.1073141Z // wait for regs: %r444,%r445,%r446,%r447,%r448,%r449,%r450,%r451,%r452,%r453,%r454,%r455,%r456,%r457,%r458,%r459,%r460,%r461,%r462,%r463,%r464,%r465,%r466,%r467,%r468,%r469,%r470,%r471,%r472,%r473,%r474,%r475,%r580,%r581,%r582,%r583,%r584,%r585,%r586,%r587,%r588,%r589,%r590,%r591,%r592,%r593,%r594,%r595,%r596,%r597,%r598,%r599,%r600,%r601,%r602,%r603,%r604,%r605,%r606,%r607,%r608,%r609,%r610,%r611,%r1121,%r1122,%r1123 2026-02-21T10:22:31.1074177Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:31.1074382Z // end inline asm 2026-02-21T10:22:31.1074533Z $L__tmp4: 2026-02-21T10:22:31.1074894Z .loc 1 58 80 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:80 2026-02-21T10:22:31.1075262Z add.s64 %rd75, %rd151, -32; 2026-02-21T10:22:31.1075442Z // begin inline asm 2026-02-21T10:22:31.1075603Z mov.u64 %rd74, 0x0; 2026-02-21T10:22:31.1075816Z createpolicy.fractional.L2::evict_last.b64 %rd74, 1.0; 2026-02-21T10:22:31.1076070Z // end inline asm 2026-02-21T10:22:31.1076217Z // begin inline asm 2026-02-21T10:22:31.1076380Z mov.u32 %r1191, 0x0; 2026-02-21T10:22:31.1076670Z mov.u32 %r1192, 0x0; 2026-02-21T10:22:31.1076952Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r1191, %r1192 }, [ %rd75 + 0 ], %rd74; 2026-02-21T10:22:31.1077298Z // end inline asm 
2026-02-21T10:22:31.1077591Z .loc 1 62 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:62:32 2026-02-21T10:22:31.1077943Z bar.sync 0; 2026-02-21T10:22:31.1078108Z st.shared.v2.b32 [%r15], {%r1191, %r1192}; 2026-02-21T10:22:31.1078322Z bar.sync 0; 2026-02-21T10:22:31.1078473Z ld.shared.b16 %rs97, [%r16]; 2026-02-21T10:22:31.1078667Z ld.shared.b16 %rs98, [%r16+256]; 2026-02-21T10:22:31.1078864Z ld.shared.b16 %rs99, [%r16+16]; 2026-02-21T10:22:31.1079063Z ld.shared.b16 %rs100, [%r16+272]; 2026-02-21T10:22:31.1079265Z ld.shared.b16 %rs101, [%r17]; 2026-02-21T10:22:31.1079464Z ld.shared.b16 %rs102, [%r17+256]; 2026-02-21T10:22:31.1079663Z ld.shared.b16 %rs103, [%r17+16]; 2026-02-21T10:22:31.1079938Z ld.shared.b16 %rs104, [%r17+272]; 2026-02-21T10:22:31.1080138Z cvt.f32.bf16 %r1394, %rs97; 2026-02-21T10:22:31.1080320Z cvt.f32.bf16 %r1395, %rs98; 2026-02-21T10:22:31.1080513Z cvt.f32.bf16 %r1396, %rs101; 2026-02-21T10:22:31.1080691Z cvt.f32.bf16 %r1397, %rs102; 2026-02-21T10:22:31.1080872Z cvt.f32.bf16 %r1462, %rs99; 2026-02-21T10:22:31.1081051Z cvt.f32.bf16 %r1463, %rs100; 2026-02-21T10:22:31.1081230Z cvt.f32.bf16 %r1464, %rs103; 2026-02-21T10:22:31.1081408Z cvt.f32.bf16 %r1465, %rs104; 2026-02-21T10:22:31.1081726Z .loc 1 64 87 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:64:87 2026-02-21T10:22:31.1082087Z add.s64 %rd78, %rd150, 20480; 2026-02-21T10:22:31.1082266Z // begin inline asm 2026-02-21T10:22:31.1082429Z mov.u64 %rd77, 0x0; 2026-02-21T10:22:31.1082646Z createpolicy.fractional.L2::evict_last.b64 %rd77, 1.0; 2026-02-21T10:22:31.1082905Z // end inline asm 2026-02-21T10:22:31.1083053Z // begin inline asm 2026-02-21T10:22:31.1083215Z mov.u32 %r1193, 0x0; 2026-02-21T10:22:31.1083558Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r1193 }, [ %rd78 + 0 ], %rd77; 2026-02-21T10:22:31.1083868Z // end inline asm 2026-02-21T10:22:31.1084165Z .loc 1 72 28 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:72:28 2026-02-21T10:22:31.1084510Z 
bar.sync 0; 2026-02-21T10:22:31.1084733Z st.shared.b8 [%r18], %r1193; 2026-02-21T10:22:31.1084919Z prmt.b32 %r1916, %r1193, 0, 0x7771U; 2026-02-21T10:22:31.1085128Z st.shared.b8 [%r19+256], %r1916; 2026-02-21T10:22:31.1085331Z prmt.b32 %r1917, %r1193, 0, 0x7772U; 2026-02-21T10:22:31.1085530Z st.shared.b8 [%r20+512], %r1917; 2026-02-21T10:22:31.1085737Z prmt.b32 %r1918, %r1193, 0, 0x7773U; 2026-02-21T10:22:31.1085935Z st.shared.b8 [%r21+768], %r1918; 2026-02-21T10:22:31.1086121Z bar.sync 0; 2026-02-21T10:22:31.1086272Z ld.shared.b32 %r1919, [%r22]; 2026-02-21T10:22:31.1086589Z prmt.b32 %r1920, %r1919, 0, 0x7770U; 2026-02-21T10:22:31.1086797Z cvt.u16.u32 %rs105, %r1920; 2026-02-21T10:22:31.1086984Z prmt.b32 %r1921, %r1919, 0, 0x7771U; 2026-02-21T10:22:31.1087177Z cvt.u16.u32 %rs106, %r1921; 2026-02-21T10:22:31.1087359Z prmt.b32 %r1922, %r1919, 0, 0x7772U; 2026-02-21T10:22:31.1087553Z cvt.u16.u32 %rs107, %r1922; 2026-02-21T10:22:31.1087732Z prmt.b32 %r1923, %r1919, 0, 0x7773U; 2026-02-21T10:22:31.1087943Z cvt.u16.u32 %rs108, %r1923; 2026-02-21T10:22:31.1088133Z ld.shared.b32 %r1924, [%r22+128]; 2026-02-21T10:22:31.1088335Z prmt.b32 %r1925, %r1924, 0, 0x7770U; 2026-02-21T10:22:31.1088615Z cvt.u16.u32 %rs109, %r1925; 2026-02-21T10:22:31.1088807Z prmt.b32 %r1926, %r1924, 0, 0x7771U; 2026-02-21T10:22:31.1088995Z cvt.u16.u32 %rs110, %r1926; 2026-02-21T10:22:31.1089176Z prmt.b32 %r1927, %r1924, 0, 0x7772U; 2026-02-21T10:22:31.1089370Z cvt.u16.u32 %rs111, %r1927; 2026-02-21T10:22:31.1089547Z prmt.b32 %r1928, %r1924, 0, 0x7773U; 2026-02-21T10:22:31.1089741Z cvt.u16.u32 %rs112, %r1928; 2026-02-21T10:22:31.1090058Z .loc 1 67 28 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:67:28 2026-02-21T10:22:31.1090417Z shl.b16 %rs113, %rs105, 4; 2026-02-21T10:22:31.1090595Z shl.b16 %rs114, %rs106, 4; 2026-02-21T10:22:31.1090773Z shl.b16 %rs115, %rs107, 4; 2026-02-21T10:22:31.1090944Z shl.b16 %rs116, %rs108, 4; 2026-02-21T10:22:31.1091123Z shl.b16 %rs117, %rs109, 4; 
2026-02-21T10:22:31.1091303Z shl.b16 %rs118, %rs110, 4; 2026-02-21T10:22:31.1091473Z shl.b16 %rs119, %rs111, 4; 2026-02-21T10:22:31.1091649Z shl.b16 %rs120, %rs112, 4; 2026-02-21T10:22:31.1091959Z .loc 1 82 58 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:82:58 2026-02-21T10:22:31.1092323Z selp.b16 %rs121, %rs113, %rs105, %p34; 2026-02-21T10:22:31.1092530Z cvt.s16.s8 %rs122, %rs121; 2026-02-21T10:22:31.1092710Z shr.s16 %rs123, %rs122, 4; 2026-02-21T10:22:31.1092891Z selp.b16 %rs124, %rs114, %rs106, %p34; 2026-02-21T10:22:31.1093097Z cvt.s16.s8 %rs125, %rs124; 2026-02-21T10:22:31.1093367Z shr.s16 %rs126, %rs125, 4; 2026-02-21T10:22:31.1093551Z selp.b16 %rs127, %rs115, %rs107, %p34; 2026-02-21T10:22:31.1093754Z cvt.s16.s8 %rs128, %rs127; 2026-02-21T10:22:31.1093923Z shr.s16 %rs129, %rs128, 4; 2026-02-21T10:22:31.1094110Z selp.b16 %rs130, %rs116, %rs108, %p34; 2026-02-21T10:22:31.1094306Z cvt.s16.s8 %rs131, %rs130; 2026-02-21T10:22:31.1094500Z shr.s16 %rs132, %rs131, 4; 2026-02-21T10:22:31.1094681Z selp.b16 %rs133, %rs117, %rs109, %p34; 2026-02-21T10:22:31.1094885Z cvt.s16.s8 %rs134, %rs133; 2026-02-21T10:22:31.1095059Z shr.s16 %rs135, %rs134, 4; 2026-02-21T10:22:31.1095246Z selp.b16 %rs136, %rs118, %rs110, %p34; 2026-02-21T10:22:31.1095448Z cvt.s16.s8 %rs137, %rs136; 2026-02-21T10:22:31.1095618Z shr.s16 %rs138, %rs137, 4; 2026-02-21T10:22:31.1095805Z selp.b16 %rs139, %rs119, %rs111, %p34; 2026-02-21T10:22:31.1096006Z cvt.s16.s8 %rs140, %rs139; 2026-02-21T10:22:31.1096184Z shr.s16 %rs141, %rs140, 4; 2026-02-21T10:22:31.1096360Z selp.b16 %rs142, %rs120, %rs112, %p34; 2026-02-21T10:22:31.1096765Z cvt.s16.s8 %rs143, %rs142; 2026-02-21T10:22:31.1096957Z shr.s16 %rs144, %rs143, 4; 2026-02-21T10:22:31.1097273Z .loc 1 87 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:87:32 2026-02-21T10:22:31.1097629Z cvt.rn.f32.s16 %r1929, %rs123; 2026-02-21T10:22:31.1097816Z cvt.rn.f32.s16 %r1930, %rs126; 2026-02-21T10:22:31.1098071Z cvt.rn.f32.s16 %r1931, 
%rs129; 2026-02-21T10:22:31.1098257Z cvt.rn.f32.s16 %r1932, %rs132; 2026-02-21T10:22:31.1098444Z cvt.rn.f32.s16 %r1933, %rs135; 2026-02-21T10:22:31.1098622Z cvt.rn.f32.s16 %r1934, %rs138; 2026-02-21T10:22:31.1098816Z cvt.rn.f32.s16 %r1935, %rs141; 2026-02-21T10:22:31.1098999Z cvt.rn.f32.s16 %r1936, %rs144; 2026-02-21T10:22:31.1099178Z bar.sync 0; 2026-02-21T10:22:31.1099329Z st.shared.b32 [%r23], %r1929; 2026-02-21T10:22:31.1099522Z st.shared.b32 [%r23+8], %r1930; 2026-02-21T10:22:31.1099728Z st.shared.b32 [%r24], %r1931; 2026-02-21T10:22:31.1099912Z st.shared.b32 [%r24+8], %r1932; 2026-02-21T10:22:31.1100109Z st.shared.b32 [%r25], %r1933; 2026-02-21T10:22:31.1100289Z st.shared.b32 [%r25+8], %r1934; 2026-02-21T10:22:31.1100479Z st.shared.b32 [%r26], %r1935; 2026-02-21T10:22:31.1100658Z st.shared.b32 [%r26+8], %r1936; 2026-02-21T10:22:31.1100838Z $L__tmp5: 2026-02-21T10:22:31.1101185Z .loc 2 291 36 // standard.py:291:36 @[ cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:94:40 ] 2026-02-21T10:22:31.1101607Z // begin inline asm 2026-02-21T10:22:31.1101881Z fence.proxy.async.shared::cta; 2026-02-21T10:22:31.1102075Z // end inline asm 2026-02-21T10:22:31.1102226Z bar.sync 0; 2026-02-21T10:22:31.1102383Z wgmma.fence.sync.aligned; 2026-02-21T10:22:31.1102568Z // begin inline asm 2026-02-21T10:22:31.1103333Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r444,%r445,%r446,%r447,%r448,%r449,%r450,%r451,%r452,%r453,%r454,%r455,%r456,%r457,%r458,%r459,%r460,%r461,%r462,%r463,%r464,%r465,%r466,%r467,%r468,%r469,%r470,%r471,%r472,%r473,%r474,%r475}, {%r1394,%r1395,%r1396,%r1397}, %rd105, %p20, 1, 1; 2026-02-21T10:22:31.1104146Z // end inline asm 2026-02-21T10:22:31.1104301Z // begin inline asm 2026-02-21T10:22:31.1105047Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r444,%r445,%r446,%r447,%r448,%r449,%r450,%r451,%r452,%r453,%r454,%r455,%r456,%r457,%r458,%r459,%r460,%r461,%r462,%r463,%r464,%r465,%r466,%r467,%r468,%r469,%r470,%r471,%r472,%r473,%r474,%r475}, 
{%r1462,%r1463,%r1464,%r1465}, %rd106, %p20, 1, 1; 2026-02-21T10:22:31.1105848Z // end inline asm 2026-02-21T10:22:31.1105997Z // begin inline asm 2026-02-21T10:22:31.1106869Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r580,%r581,%r582,%r583,%r584,%r585,%r586,%r587,%r588,%r589,%r590,%r591,%r592,%r593,%r594,%r595,%r596,%r597,%r598,%r599,%r600,%r601,%r602,%r603,%r604,%r605,%r606,%r607,%r608,%r609,%r610,%r611}, {%r1394,%r1395,%r1396,%r1397}, %rd107, %p20, 1, 1; 2026-02-21T10:22:31.1107688Z // end inline asm 2026-02-21T10:22:31.1107841Z // begin inline asm 2026-02-21T10:22:31.1108782Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r580,%r581,%r582,%r583,%r584,%r585,%r586,%r587,%r588,%r589,%r590,%r591,%r592,%r593,%r594,%r595,%r596,%r597,%r598,%r599,%r600,%r601,%r602,%r603,%r604,%r605,%r606,%r607,%r608,%r609,%r610,%r611}, {%r1462,%r1463,%r1464,%r1465}, %rd108, %p20, 1, 1; 2026-02-21T10:22:31.1109591Z // end inline asm 2026-02-21T10:22:31.1109756Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:31.1109958Z mov.b32 %r1531, %r1836; 2026-02-21T10:22:31.1110130Z mov.b32 %r1532, %r1836; 2026-02-21T10:22:31.1110300Z mov.b32 %r1530, %r2514; 2026-02-21T10:22:31.1110461Z // begin inline asm 2026-02-21T10:22:31.1111520Z // wait for regs: %r444,%r445,%r446,%r447,%r448,%r449,%r450,%r451,%r452,%r453,%r454,%r455,%r456,%r457,%r458,%r459,%r460,%r461,%r462,%r463,%r464,%r465,%r466,%r467,%r468,%r469,%r470,%r471,%r472,%r473,%r474,%r475,%r580,%r581,%r582,%r583,%r584,%r585,%r586,%r587,%r588,%r589,%r590,%r591,%r592,%r593,%r594,%r595,%r596,%r597,%r598,%r599,%r600,%r601,%r602,%r603,%r604,%r605,%r606,%r607,%r608,%r609,%r610,%r611,%r1530,%r1531,%r1532 2026-02-21T10:22:31.1112564Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:31.1112758Z // end inline asm 2026-02-21T10:22:31.1112907Z $L__tmp6: 2026-02-21T10:22:31.1113188Z .loc 1 58 80 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:80 2026-02-21T10:22:31.1113609Z // begin inline asm 2026-02-21T10:22:31.1113760Z 
mov.u64 %rd84, 0x0; 2026-02-21T10:22:31.1113979Z createpolicy.fractional.L2::evict_last.b64 %rd84, 1.0; 2026-02-21T10:22:31.1114236Z // end inline asm 2026-02-21T10:22:31.1114384Z // begin inline asm 2026-02-21T10:22:31.1114542Z mov.u32 %r1600, 0x0; 2026-02-21T10:22:31.1114694Z mov.u32 %r1601, 0x0; 2026-02-21T10:22:31.1114993Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r1600, %r1601 }, [ %rd151 + 0 ], %rd84; 2026-02-21T10:22:31.1115321Z // end inline asm 2026-02-21T10:22:31.1115618Z .loc 1 62 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:62:32 2026-02-21T10:22:31.1115965Z bar.sync 0; 2026-02-21T10:22:31.1116132Z st.shared.v2.b32 [%r15], {%r1600, %r1601}; 2026-02-21T10:22:31.1116343Z bar.sync 0; 2026-02-21T10:22:31.1116711Z ld.shared.b16 %rs145, [%r16]; 2026-02-21T10:22:31.1116924Z ld.shared.b16 %rs146, [%r16+256]; 2026-02-21T10:22:31.1117126Z ld.shared.b16 %rs147, [%r16+16]; 2026-02-21T10:22:31.1117328Z ld.shared.b16 %rs148, [%r16+272]; 2026-02-21T10:22:31.1117520Z ld.shared.b16 %rs149, [%r17]; 2026-02-21T10:22:31.1117796Z ld.shared.b16 %rs150, [%r17+256]; 2026-02-21T10:22:31.1117998Z ld.shared.b16 %rs151, [%r17+16]; 2026-02-21T10:22:31.1118196Z ld.shared.b16 %rs152, [%r17+272]; 2026-02-21T10:22:31.1118398Z cvt.f32.bf16 %r1731, %rs145; 2026-02-21T10:22:31.1118585Z cvt.f32.bf16 %r1732, %rs146; 2026-02-21T10:22:31.1118767Z cvt.f32.bf16 %r1733, %rs149; 2026-02-21T10:22:31.1118940Z cvt.f32.bf16 %r1734, %rs150; 2026-02-21T10:22:31.1119130Z cvt.f32.bf16 %r1799, %rs147; 2026-02-21T10:22:31.1119311Z cvt.f32.bf16 %r1800, %rs148; 2026-02-21T10:22:31.1119495Z cvt.f32.bf16 %r1801, %rs151; 2026-02-21T10:22:31.1119673Z cvt.f32.bf16 %r1802, %rs152; 2026-02-21T10:22:31.1119997Z .loc 1 64 87 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:64:87 2026-02-21T10:22:31.1120352Z add.s64 %rd88, %rd150, 30720; 2026-02-21T10:22:31.1120538Z // begin inline asm 2026-02-21T10:22:31.1120713Z mov.u64 %rd87, 0x0; 2026-02-21T10:22:31.1120931Z 
createpolicy.fractional.L2::evict_last.b64 %rd87, 1.0; 2026-02-21T10:22:31.1121191Z // end inline asm 2026-02-21T10:22:31.1121339Z // begin inline asm 2026-02-21T10:22:31.1121500Z mov.u32 %r1602, 0x0; 2026-02-21T10:22:31.1121750Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r1602 }, [ %rd88 + 0 ], %rd87; 2026-02-21T10:22:31.1122051Z // end inline asm 2026-02-21T10:22:31.1122341Z .loc 1 72 28 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:72:28 2026-02-21T10:22:31.1122691Z bar.sync 0; 2026-02-21T10:22:31.1122940Z st.shared.b8 [%r18], %r1602; 2026-02-21T10:22:31.1123134Z prmt.b32 %r1937, %r1602, 0, 0x7771U; 2026-02-21T10:22:31.1123346Z st.shared.b8 [%r19+256], %r1937; 2026-02-21T10:22:31.1123539Z prmt.b32 %r1938, %r1602, 0, 0x7772U; 2026-02-21T10:22:31.1123745Z st.shared.b8 [%r20+512], %r1938; 2026-02-21T10:22:31.1123944Z prmt.b32 %r1939, %r1602, 0, 0x7773U; 2026-02-21T10:22:31.1124151Z st.shared.b8 [%r21+768], %r1939; 2026-02-21T10:22:31.1124330Z bar.sync 0; 2026-02-21T10:22:31.1124486Z ld.shared.b32 %r1940, [%r22]; 2026-02-21T10:22:31.1124678Z prmt.b32 %r1941, %r1940, 0, 0x7770U; 2026-02-21T10:22:31.1124874Z cvt.u16.u32 %rs153, %r1941; 2026-02-21T10:22:31.1125058Z prmt.b32 %r1942, %r1940, 0, 0x7771U; 2026-02-21T10:22:31.1125249Z cvt.u16.u32 %rs154, %r1942; 2026-02-21T10:22:31.1125438Z prmt.b32 %r1943, %r1940, 0, 0x7772U; 2026-02-21T10:22:31.1125628Z cvt.u16.u32 %rs155, %r1943; 2026-02-21T10:22:31.1125811Z prmt.b32 %r1944, %r1940, 0, 0x7773U; 2026-02-21T10:22:31.1126004Z cvt.u16.u32 %rs156, %r1944; 2026-02-21T10:22:31.1126298Z ld.shared.b32 %r1945, [%r22+128]; 2026-02-21T10:22:31.1126627Z prmt.b32 %r1946, %r1945, 0, 0x7770U; 2026-02-21T10:22:31.1126821Z cvt.u16.u32 %rs157, %r1946; 2026-02-21T10:22:31.1127002Z prmt.b32 %r1947, %r1945, 0, 0x7771U; 2026-02-21T10:22:31.1127192Z cvt.u16.u32 %rs158, %r1947; 2026-02-21T10:22:31.1127464Z prmt.b32 %r1948, %r1945, 0, 0x7772U; 2026-02-21T10:22:31.1127656Z cvt.u16.u32 %rs159, %r1948; 2026-02-21T10:22:31.1127835Z 
prmt.b32 %r1949, %r1945, 0, 0x7773U; 2026-02-21T10:22:31.1128027Z cvt.u16.u32 %rs160, %r1949; 2026-02-21T10:22:31.1128349Z .loc 1 67 28 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:67:28 2026-02-21T10:22:31.1128415Z shl.b16 %rs161, %rs153, 4; 2026-02-21T10:22:31.1128484Z shl.b16 %rs162, %rs154, 4; 2026-02-21T10:22:31.1128546Z shl.b16 %rs163, %rs155, 4; 2026-02-21T10:22:31.1128607Z shl.b16 %rs164, %rs156, 4; 2026-02-21T10:22:31.1128668Z shl.b16 %rs165, %rs157, 4; 2026-02-21T10:22:31.1128750Z shl.b16 %rs166, %rs158, 4; 2026-02-21T10:22:31.1128818Z shl.b16 %rs167, %rs159, 4; 2026-02-21T10:22:31.1128881Z shl.b16 %rs168, %rs160, 4; 2026-02-21T10:22:31.1129084Z .loc 1 82 58 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:82:58 2026-02-21T10:22:31.1129159Z selp.b16 %rs169, %rs161, %rs153, %p34; 2026-02-21T10:22:31.1129228Z cvt.s16.s8 %rs170, %rs169; 2026-02-21T10:22:31.1129288Z shr.s16 %rs171, %rs170, 4; 2026-02-21T10:22:31.1129439Z selp.b16 %rs172, %rs162, %rs154, %p34; 2026-02-21T10:22:31.1129509Z cvt.s16.s8 %rs173, %rs172; 2026-02-21T10:22:31.1129573Z shr.s16 %rs174, %rs173, 4; 2026-02-21T10:22:31.1129653Z selp.b16 %rs175, %rs163, %rs155, %p34; 2026-02-21T10:22:31.1129716Z cvt.s16.s8 %rs176, %rs175; 2026-02-21T10:22:31.1129778Z shr.s16 %rs177, %rs176, 4; 2026-02-21T10:22:31.1129858Z selp.b16 %rs178, %rs164, %rs156, %p34; 2026-02-21T10:22:31.1129921Z cvt.s16.s8 %rs179, %rs178; 2026-02-21T10:22:31.1129985Z shr.s16 %rs180, %rs179, 4; 2026-02-21T10:22:31.1130056Z selp.b16 %rs181, %rs165, %rs157, %p34; 2026-02-21T10:22:31.1130124Z cvt.s16.s8 %rs182, %rs181; 2026-02-21T10:22:31.1130185Z shr.s16 %rs183, %rs182, 4; 2026-02-21T10:22:31.1130267Z selp.b16 %rs184, %rs166, %rs158, %p34; 2026-02-21T10:22:31.1130335Z cvt.s16.s8 %rs185, %rs184; 2026-02-21T10:22:31.1130401Z shr.s16 %rs186, %rs185, 4; 2026-02-21T10:22:31.1130472Z selp.b16 %rs187, %rs167, %rs159, %p34; 2026-02-21T10:22:31.1130533Z cvt.s16.s8 %rs188, %rs187; 2026-02-21T10:22:31.1130603Z shr.s16 
%rs189, %rs188, 4; 2026-02-21T10:22:31.1130672Z selp.b16 %rs190, %rs168, %rs160, %p34; 2026-02-21T10:22:31.1130736Z cvt.s16.s8 %rs191, %rs190; 2026-02-21T10:22:31.1130804Z shr.s16 %rs192, %rs191, 4; 2026-02-21T10:22:31.1131011Z .loc 1 87 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:87:32 2026-02-21T10:22:31.1131079Z cvt.rn.f32.s16 %r1950, %rs171; 2026-02-21T10:22:31.1131150Z cvt.rn.f32.s16 %r1951, %rs174; 2026-02-21T10:22:31.1131293Z cvt.rn.f32.s16 %r1952, %rs177; 2026-02-21T10:22:31.1131357Z cvt.rn.f32.s16 %r1953, %rs180; 2026-02-21T10:22:31.1131423Z cvt.rn.f32.s16 %r1954, %rs183; 2026-02-21T10:22:31.1131492Z cvt.rn.f32.s16 %r1955, %rs186; 2026-02-21T10:22:31.1131554Z cvt.rn.f32.s16 %r1956, %rs189; 2026-02-21T10:22:31.1131617Z cvt.rn.f32.s16 %r1957, %rs192; 2026-02-21T10:22:31.1131689Z bar.sync 0; 2026-02-21T10:22:31.1131759Z st.shared.b32 [%r23], %r1950; 2026-02-21T10:22:31.1131828Z st.shared.b32 [%r23+8], %r1951; 2026-02-21T10:22:31.1131895Z st.shared.b32 [%r24], %r1952; 2026-02-21T10:22:31.1131979Z st.shared.b32 [%r24+8], %r1953; 2026-02-21T10:22:31.1132046Z st.shared.b32 [%r25], %r1954; 2026-02-21T10:22:31.1132114Z st.shared.b32 [%r25+8], %r1955; 2026-02-21T10:22:31.1132183Z st.shared.b32 [%r26], %r1956; 2026-02-21T10:22:31.1132247Z st.shared.b32 [%r26+8], %r1957; 2026-02-21T10:22:31.1132304Z $L__tmp7: 2026-02-21T10:22:31.1132647Z .loc 2 291 36 // standard.py:291:36 @[ cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:94:40 ] 2026-02-21T10:22:31.1132815Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r444, %r580}; 2026-02-21T10:22:31.1132872Z bar.sync 0; 2026-02-21T10:22:31.1132934Z // begin inline asm 2026-02-21T10:22:31.1133078Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1735}, [%r1604]; 2026-02-21T10:22:31.1133189Z // end inline asm 2026-02-21T10:22:31.1133247Z bar.sync 0; 2026-02-21T10:22:31.1133398Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r446, %r582}; 2026-02-21T10:22:31.1133458Z bar.sync 0; 
2026-02-21T10:22:31.1133518Z // begin inline asm 2026-02-21T10:22:31.1133649Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1737}, [%r1604]; 2026-02-21T10:22:31.1133713Z // end inline asm 2026-02-21T10:22:31.1133768Z bar.sync 0; 2026-02-21T10:22:31.1133920Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r445, %r581}; 2026-02-21T10:22:31.1133981Z bar.sync 0; 2026-02-21T10:22:31.1134043Z // begin inline asm 2026-02-21T10:22:31.1134178Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1736}, [%r1604]; 2026-02-21T10:22:31.1134235Z // end inline asm 2026-02-21T10:22:31.1134297Z bar.sync 0; 2026-02-21T10:22:31.1134439Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r447, %r583}; 2026-02-21T10:22:31.1134495Z bar.sync 0; 2026-02-21T10:22:31.1134558Z // begin inline asm 2026-02-21T10:22:31.1134689Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1738}, [%r1604]; 2026-02-21T10:22:31.1134748Z // end inline asm 2026-02-21T10:22:31.1134804Z bar.sync 0; 2026-02-21T10:22:31.1135005Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r448, %r584}; 2026-02-21T10:22:31.1135065Z bar.sync 0; 2026-02-21T10:22:31.1135125Z // begin inline asm 2026-02-21T10:22:31.1135256Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1739}, [%r1604]; 2026-02-21T10:22:31.1135312Z // end inline asm 2026-02-21T10:22:31.1135367Z bar.sync 0; 2026-02-21T10:22:31.1135511Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r450, %r586}; 2026-02-21T10:22:31.1135568Z bar.sync 0; 2026-02-21T10:22:31.1135630Z // begin inline asm 2026-02-21T10:22:31.1135768Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1741}, [%r1604]; 2026-02-21T10:22:31.1135831Z // end inline asm 2026-02-21T10:22:31.1135888Z bar.sync 0; 2026-02-21T10:22:31.1136028Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r449, %r585}; 2026-02-21T10:22:31.1136091Z bar.sync 0; 2026-02-21T10:22:31.1136151Z // begin inline asm 2026-02-21T10:22:31.1136277Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1740}, [%r1604]; 2026-02-21T10:22:31.1136338Z // end inline 
asm 2026-02-21T10:22:31.1136400Z bar.sync 0; 2026-02-21T10:22:31.1136671Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r451, %r587}; 2026-02-21T10:22:31.1136730Z bar.sync 0; 2026-02-21T10:22:31.1136796Z // begin inline asm 2026-02-21T10:22:31.1136928Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1742}, [%r1604]; 2026-02-21T10:22:31.1136984Z // end inline asm 2026-02-21T10:22:31.1137043Z bar.sync 0; 2026-02-21T10:22:31.1137272Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r452, %r588}; 2026-02-21T10:22:31.1137329Z bar.sync 0; 2026-02-21T10:22:31.1137389Z // begin inline asm 2026-02-21T10:22:31.1137522Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1743}, [%r1604]; 2026-02-21T10:22:31.1137590Z // end inline asm 2026-02-21T10:22:31.1137648Z bar.sync 0; 2026-02-21T10:22:31.1137799Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r454, %r590}; 2026-02-21T10:22:31.1137858Z bar.sync 0; 2026-02-21T10:22:31.1137919Z // begin inline asm 2026-02-21T10:22:31.1138052Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1745}, [%r1604]; 2026-02-21T10:22:31.1138118Z // end inline asm 2026-02-21T10:22:31.1138174Z bar.sync 0; 2026-02-21T10:22:31.1138316Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r453, %r589}; 2026-02-21T10:22:31.1138377Z bar.sync 0; 2026-02-21T10:22:31.1138436Z // begin inline asm 2026-02-21T10:22:31.1138564Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1744}, [%r1604]; 2026-02-21T10:22:31.1138625Z // end inline asm 2026-02-21T10:22:31.1138762Z bar.sync 0; 2026-02-21T10:22:31.1138908Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r455, %r591}; 2026-02-21T10:22:31.1138964Z bar.sync 0; 2026-02-21T10:22:31.1139027Z // begin inline asm 2026-02-21T10:22:31.1139156Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1746}, [%r1604]; 2026-02-21T10:22:31.1139276Z // end inline asm 2026-02-21T10:22:31.1139332Z bar.sync 0; 2026-02-21T10:22:31.1139478Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r456, %r592}; 2026-02-21T10:22:31.1139538Z bar.sync 0; 
2026-02-21T10:22:31.1139597Z // begin inline asm 2026-02-21T10:22:31.1139731Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1747}, [%r1604]; 2026-02-21T10:22:31.1139792Z // end inline asm 2026-02-21T10:22:31.1139848Z bar.sync 0; 2026-02-21T10:22:31.1139992Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r458, %r594}; 2026-02-21T10:22:31.1140048Z bar.sync 0; 2026-02-21T10:22:31.1140107Z // begin inline asm 2026-02-21T10:22:31.1140242Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1749}, [%r1604]; 2026-02-21T10:22:31.1140306Z // end inline asm 2026-02-21T10:22:31.1140361Z bar.sync 0; 2026-02-21T10:22:31.1140501Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r457, %r593}; 2026-02-21T10:22:31.1140564Z bar.sync 0; 2026-02-21T10:22:31.1140623Z // begin inline asm 2026-02-21T10:22:31.1140753Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1748}, [%r1604]; 2026-02-21T10:22:31.1140810Z // end inline asm 2026-02-21T10:22:31.1140872Z bar.sync 0; 2026-02-21T10:22:31.1141080Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r459, %r595}; 2026-02-21T10:22:31.1141140Z bar.sync 0; 2026-02-21T10:22:31.1141205Z // begin inline asm 2026-02-21T10:22:31.1141335Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1750}, [%r1604]; 2026-02-21T10:22:31.1141405Z // end inline asm 2026-02-21T10:22:31.1141462Z bar.sync 0; 2026-02-21T10:22:31.1141609Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r460, %r596}; 2026-02-21T10:22:31.1141669Z bar.sync 0; 2026-02-21T10:22:31.1141731Z // begin inline asm 2026-02-21T10:22:31.1141869Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1751}, [%r1604]; 2026-02-21T10:22:31.1141928Z // end inline asm 2026-02-21T10:22:31.1141983Z bar.sync 0; 2026-02-21T10:22:31.1142127Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r462, %r598}; 2026-02-21T10:22:31.1142189Z bar.sync 0; 2026-02-21T10:22:31.1142251Z // begin inline asm 2026-02-21T10:22:31.1142382Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1753}, [%r1604]; 2026-02-21T10:22:31.1142448Z // end inline 
asm 2026-02-21T10:22:31.1142505Z bar.sync 0; 2026-02-21T10:22:31.1142646Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r461, %r597}; 2026-02-21T10:22:31.1142707Z bar.sync 0; 2026-02-21T10:22:31.1142767Z // begin inline asm 2026-02-21T10:22:31.1142897Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1752}, [%r1604]; 2026-02-21T10:22:31.1142955Z // end inline asm 2026-02-21T10:22:31.1143018Z bar.sync 0; 2026-02-21T10:22:31.1143231Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r463, %r599}; 2026-02-21T10:22:31.1143288Z bar.sync 0; 2026-02-21T10:22:31.1143351Z // begin inline asm 2026-02-21T10:22:31.1143478Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1754}, [%r1604]; 2026-02-21T10:22:31.1143535Z // end inline asm 2026-02-21T10:22:31.1143590Z bar.sync 0; 2026-02-21T10:22:31.1143736Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r464, %r600}; 2026-02-21T10:22:31.1143793Z bar.sync 0; 2026-02-21T10:22:31.1143851Z // begin inline asm 2026-02-21T10:22:31.1143987Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1755}, [%r1604]; 2026-02-21T10:22:31.1144043Z // end inline asm 2026-02-21T10:22:31.1144096Z bar.sync 0; 2026-02-21T10:22:31.1144239Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r466, %r602}; 2026-02-21T10:22:31.1144293Z bar.sync 0; 2026-02-21T10:22:31.1144351Z // begin inline asm 2026-02-21T10:22:31.1144477Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1757}, [%r1604]; 2026-02-21T10:22:31.1144541Z // end inline asm 2026-02-21T10:22:31.1144655Z bar.sync 0; 2026-02-21T10:22:31.1144799Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r465, %r601}; 2026-02-21T10:22:31.1144859Z bar.sync 0; 2026-02-21T10:22:31.1144917Z // begin inline asm 2026-02-21T10:22:31.1145045Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1756}, [%r1604]; 2026-02-21T10:22:31.1145152Z // end inline asm 2026-02-21T10:22:31.1145212Z bar.sync 0; 2026-02-21T10:22:31.1145348Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r467, %r603}; 2026-02-21T10:22:31.1145406Z bar.sync 0; 
2026-02-21T10:22:31.1145473Z // begin inline asm 2026-02-21T10:22:31.1145601Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1758}, [%r1604]; 2026-02-21T10:22:31.1145659Z // end inline asm 2026-02-21T10:22:31.1145715Z bar.sync 0; 2026-02-21T10:22:31.1145856Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r468, %r604}; 2026-02-21T10:22:31.1145911Z bar.sync 0; 2026-02-21T10:22:31.1145970Z // begin inline asm 2026-02-21T10:22:31.1146107Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1759}, [%r1604]; 2026-02-21T10:22:31.1146168Z // end inline asm 2026-02-21T10:22:31.1146223Z bar.sync 0; 2026-02-21T10:22:31.1146369Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r470, %r606}; 2026-02-21T10:22:31.1146425Z bar.sync 0; 2026-02-21T10:22:31.1146608Z // begin inline asm 2026-02-21T10:22:31.1146750Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1761}, [%r1604]; 2026-02-21T10:22:31.1146814Z // end inline asm 2026-02-21T10:22:31.1146871Z bar.sync 0; 2026-02-21T10:22:31.1147098Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r469, %r605}; 2026-02-21T10:22:31.1147161Z bar.sync 0; 2026-02-21T10:22:31.1147223Z // begin inline asm 2026-02-21T10:22:31.1147351Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1760}, [%r1604]; 2026-02-21T10:22:31.1147408Z // end inline asm 2026-02-21T10:22:31.1147470Z bar.sync 0; 2026-02-21T10:22:31.1147607Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r471, %r607}; 2026-02-21T10:22:31.1147663Z bar.sync 0; 2026-02-21T10:22:31.1147733Z // begin inline asm 2026-02-21T10:22:31.1147862Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1762}, [%r1604]; 2026-02-21T10:22:31.1147919Z // end inline asm 2026-02-21T10:22:31.1147977Z bar.sync 0; 2026-02-21T10:22:31.1148137Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r472, %r608}; 2026-02-21T10:22:31.1148200Z bar.sync 0; 2026-02-21T10:22:31.1148261Z // begin inline asm 2026-02-21T10:22:31.1148450Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1763}, [%r1604]; 2026-02-21T10:22:31.1148527Z // end inline 
asm 2026-02-21T10:22:31.1148586Z bar.sync 0; 2026-02-21T10:22:31.1148733Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r474, %r610}; 2026-02-21T10:22:31.1148789Z bar.sync 0; 2026-02-21T10:22:31.1148848Z // begin inline asm 2026-02-21T10:22:31.1148977Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1765}, [%r1604]; 2026-02-21T10:22:31.1149038Z // end inline asm 2026-02-21T10:22:31.1149096Z bar.sync 0; 2026-02-21T10:22:31.1149235Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r473, %r609}; 2026-02-21T10:22:31.1149397Z bar.sync 0; 2026-02-21T10:22:31.1149457Z // begin inline asm 2026-02-21T10:22:31.1149598Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1764}, [%r1604]; 2026-02-21T10:22:31.1149656Z // end inline asm 2026-02-21T10:22:31.1149718Z bar.sync 0; 2026-02-21T10:22:31.1149857Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r475, %r611}; 2026-02-21T10:22:31.1149917Z bar.sync 0; 2026-02-21T10:22:31.1149979Z // begin inline asm 2026-02-21T10:22:31.1150109Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1766}, [%r1604]; 2026-02-21T10:22:31.1150253Z // end inline asm 2026-02-21T10:22:31.1150345Z // begin inline asm 2026-02-21T10:22:31.1150500Z fence.proxy.async.shared::cta; 2026-02-21T10:22:31.1150694Z // end inline asm 2026-02-21T10:22:31.1150817Z wgmma.fence.sync.aligned; 2026-02-21T10:22:31.1150943Z shl.b32 %r1958, %r1894, 10; 2026-02-21T10:22:31.1151046Z and.b32 %r1959, %r1958, 4096; 2026-02-21T10:22:31.1151188Z add.s32 %r1960, %r1959, %r2514; 2026-02-21T10:22:31.1151368Z bfe.u32 %r1961, %r1960, 4, 14; 2026-02-21T10:22:31.1151457Z cvt.u64.u32 %rd92, %r1961; 2026-02-21T10:22:31.1151702Z or.b64 %rd90, %rd92, -9223371899382267904; 2026-02-21T10:22:31.1151801Z // begin inline asm 2026-02-21T10:22:31.1152596Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r1735,%r1736,%r1737,%r1738,%r1739,%r1740,%r1741,%r1742,%r1743,%r1744,%r1745,%r1746,%r1747,%r1748,%r1749,%r1750,%r1751,%r1752,%r1753,%r1754,%r1755,%r1756,%r1757,%r1758,%r1759,%r1760,%r1761,%r1762,%r1763,%r1764,%r1765,%r1766}, {%r1731,%r1732,%r1733,%r1734}, %rd90, %p20, 1, 1; 2026-02-21T10:22:31.1152807Z // end inline asm 2026-02-21T10:22:31.1152904Z add.s32 %r1962, %r1960, 32; 2026-02-21T10:22:31.1152986Z bfe.u32 %r1963, %r1962, 4, 14; 2026-02-21T10:22:31.1153202Z cvt.u64.u32 %rd93, %r1963; 2026-02-21T10:22:31.1153312Z or.b64 %rd91, %rd93, -9223371899382267904; 2026-02-21T10:22:31.1153400Z // begin inline asm 2026-02-21T10:22:31.1154225Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r1735,%r1736,%r1737,%r1738,%r1739,%r1740,%r1741,%r1742,%r1743,%r1744,%r1745,%r1746,%r1747,%r1748,%r1749,%r1750,%r1751,%r1752,%r1753,%r1754,%r1755,%r1756,%r1757,%r1758,%r1759,%r1760,%r1761,%r1762,%r1763,%r1764,%r1765,%r1766}, {%r1799,%r1800,%r1801,%r1802}, %rd91, %p20, 1, 1; 2026-02-21T10:22:31.1154322Z // end inline asm 2026-02-21T10:22:31.1154418Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:31.1154621Z mov.b32 %r1837, %r1836; 2026-02-21T10:22:31.1154715Z mov.b32 %r1835, %r2514; 2026-02-21T10:22:31.1154866Z // begin inline asm 2026-02-21T10:22:31.1155459Z // wait for regs: %r1735,%r1736,%r1737,%r1738,%r1739,%r1740,%r1741,%r1742,%r1743,%r1744,%r1745,%r1746,%r1747,%r1748,%r1749,%r1750,%r1751,%r1752,%r1753,%r1754,%r1755,%r1756,%r1757,%r1758,%r1759,%r1760,%r1761,%r1762,%r1763,%r1764,%r1765,%r1766,%r1835,%r1836,%r1837 2026-02-21T10:22:31.1155598Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:31.1155688Z // end inline asm 2026-02-21T10:22:31.1155829Z $L__tmp8: 2026-02-21T10:22:31.1156127Z .loc 1 50 126 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:50:126 2026-02-21T10:22:31.1156208Z add.s64 %rd152, %rd152, 32; 2026-02-21T10:22:31.1162330Z add.s64 %rd151, %rd151, 128; 2026-02-21T10:22:31.1162422Z add.s64 %rd150, %rd150, 40960; 2026-02-21T10:22:31.1162497Z 
setp.lt.u64 %p35, %rd152, 4064; 2026-02-21T10:22:31.1162568Z @%p35 bra $L__BB0_4; 2026-02-21T10:22:31.1162692Z // %bb.5: // in Loop: Header=BB0_3 Depth=1 2026-02-21T10:22:31.1162924Z .loc 1 0 126 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:0:126 2026-02-21T10:22:31.1162999Z setp.lt.u32 %p37, %r1, 64; 2026-02-21T10:22:31.1163207Z .loc 1 97 28 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:97:28 2026-02-21T10:22:31.1163288Z cvt.rn.bf16x2.f32 %r1968, %r1736, %r1735; 2026-02-21T10:22:31.1163359Z cvt.rn.bf16x2.f32 %r1969, %r1738, %r1737; 2026-02-21T10:22:31.1163580Z cvt.rn.bf16x2.f32 %r1970, %r1740, %r1739; 2026-02-21T10:22:31.1163654Z cvt.rn.bf16x2.f32 %r1971, %r1742, %r1741; 2026-02-21T10:22:31.1163721Z cvt.rn.bf16x2.f32 %r1972, %r1744, %r1743; 2026-02-21T10:22:31.1163793Z cvt.rn.bf16x2.f32 %r1973, %r1746, %r1745; 2026-02-21T10:22:31.1163861Z cvt.rn.bf16x2.f32 %r1974, %r1748, %r1747; 2026-02-21T10:22:31.1163933Z cvt.rn.bf16x2.f32 %r1975, %r1750, %r1749; 2026-02-21T10:22:31.1164008Z cvt.rn.bf16x2.f32 %r1976, %r1752, %r1751; 2026-02-21T10:22:31.1164089Z cvt.rn.bf16x2.f32 %r1977, %r1754, %r1753; 2026-02-21T10:22:31.1164159Z cvt.rn.bf16x2.f32 %r1978, %r1756, %r1755; 2026-02-21T10:22:31.1164227Z cvt.rn.bf16x2.f32 %r1979, %r1758, %r1757; 2026-02-21T10:22:31.1164298Z cvt.rn.bf16x2.f32 %r1980, %r1760, %r1759; 2026-02-21T10:22:31.1164365Z cvt.rn.bf16x2.f32 %r1981, %r1762, %r1761; 2026-02-21T10:22:31.1164432Z cvt.rn.bf16x2.f32 %r1982, %r1764, %r1763; 2026-02-21T10:22:31.1164501Z cvt.rn.bf16x2.f32 %r1983, %r1766, %r1765; 2026-02-21T10:22:31.1164788Z .loc 1 98 43 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:98:43 2026-02-21T10:22:31.1164850Z bar.sync 0; 2026-02-21T10:22:31.1165045Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r29], {%r1968, %r1969, %r1970, %r1971}; 2026-02-21T10:22:31.1165224Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r30], {%r1972, %r1973, %r1974, %r1975}; 2026-02-21T10:22:31.1165463Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r31], {%r1976, %r1977, %r1978, %r1979}; 2026-02-21T10:22:31.1165633Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r32], {%r1980, %r1981, %r1982, %r1983}; 2026-02-21T10:22:31.1165699Z // begin inline asm 2026-02-21T10:22:31.1165783Z fence.proxy.async.shared::cta; 2026-02-21T10:22:31.1165839Z // end inline asm 2026-02-21T10:22:31.1165900Z bar.sync 0; 2026-02-21T10:22:31.1165971Z elect.sync %r1984|%p38, -1; 2026-02-21T10:22:31.1166050Z shfl.sync.idx.b32 %r1985, %r4, 0, 31, -1; 2026-02-21T10:22:31.1166127Z and.pred %p36, %p37, %p38; 2026-02-21T10:22:31.1166190Z and.b32 %r1986, %r1985, 1; 2026-02-21T10:22:31.1166253Z shl.b32 %r1987, %r1986, 13; 2026-02-21T10:22:31.1166331Z add.s32 %r3694, %r2514, %r1987; 2026-02-21T10:22:31.1166402Z shl.b32 %r103, %r1986, 6; 2026-02-21T10:22:31.1166647Z or.b32 %r1964, %r103, %r36; 2026-02-21T10:22:31.1166715Z // begin inline asm 2026-02-21T10:22:31.1166951Z @%p36 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd1, {%r1964, %r1965}], [%r3694]; 2026-02-21T10:22:31.1167014Z // end inline asm 2026-02-21T10:22:31.1167086Z cp.async.bulk.commit_group; 2026-02-21T10:22:31.1167253Z cp.async.bulk.wait_group.read 0; 2026-02-21T10:22:31.1167321Z bar.sync 0; 2026-02-21T10:22:31.1167538Z .loc 1 29 120 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:29:120 2026-02-21T10:22:31.1167601Z or.b32 %r1989, %r3753, 1; 2026-02-21T10:22:31.1167806Z .loc 1 35 35 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:35:35 2026-02-21T10:22:31.1167871Z add.s32 %r1992, %r1989, %r262; 2026-02-21T10:22:31.1167941Z shr.s32 %r1993, %r1992, 16; 2026-02-21T10:22:31.1168150Z .loc 1 36 33 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:36:33 2026-02-21T10:22:31.1168215Z shl.b32 %r1994, %r1993, 6; 2026-02-21T10:22:31.1168407Z .loc 1 37 39 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:37:39 2026-02-21T10:22:31.1168471Z sub.s32 %r1995, 10, %r1994; 2026-02-21T10:22:31.1168674Z .loc 1 37 52 
// cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:37:52 2026-02-21T10:22:31.1168736Z min.s32 %r1996, %r1995, 64; 2026-02-21T10:22:31.1168929Z .loc 1 38 45 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:38:45 2026-02-21T10:22:31.1169003Z and.b32 %r1997, %r1992, -65536; 2026-02-21T10:22:31.1169066Z sub.s32 %r1998, %r1989, %r1997; 2026-02-21T10:22:31.1169258Z .loc 1 39 51 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:39:51 2026-02-21T10:22:31.1169403Z div.s32 %r1999, %r1998, %r1996; 2026-02-21T10:22:31.1169597Z .loc 1 38 64 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:38:64 2026-02-21T10:22:31.1169663Z mul.lo.s32 %r2000, %r1999, %r1996; 2026-02-21T10:22:31.1169733Z sub.s32 %r2001, %r1998, %r2000; 2026-02-21T10:22:31.1169925Z .loc 1 38 30 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:38:30 2026-02-21T10:22:31.1169988Z add.s32 %r2002, %r2001, %r1994; 2026-02-21T10:22:31.1170184Z .loc 1 40 27 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:40:27 2026-02-21T10:22:31.1170252Z shl.b32 %r104, %r2002, 7; 2026-02-21T10:22:31.1170442Z .loc 1 42 27 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:42:27 2026-02-21T10:22:31.1170516Z shl.b32 %r105, %r1999, 6; 2026-02-21T10:22:31.1170589Z shl.b32 %r2003, %r1985, 10; 2026-02-21T10:22:31.1170652Z and.b32 %r2004, %r2003, 4096; 2026-02-21T10:22:31.1170784Z add.s32 %r2005, %r2004, %r2514; 2026-02-21T10:22:31.1170858Z bfe.u32 %r2006, %r2005, 4, 14; 2026-02-21T10:22:31.1170924Z cvt.u64.u32 %rd96, %r2006; 2026-02-21T10:22:31.1171007Z or.b64 %rd135, %rd96, -9223371899382267904; 2026-02-21T10:22:31.1171068Z add.s32 %r2007, %r2005, 32; 2026-02-21T10:22:31.1171136Z bfe.u32 %r2008, %r2007, 4, 14; 2026-02-21T10:22:31.1171284Z cvt.u64.u32 %rd97, %r2008; 2026-02-21T10:22:31.1171360Z or.b64 %rd136, %rd97, -9223371899382267904; 2026-02-21T10:22:31.1171577Z .loc 1 50 126 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:50:126 2026-02-21T10:22:31.1171650Z 
shl.b32 %r2009, %r1999, 19; 2026-02-21T10:22:31.1171718Z or.b32 %r2010, %r34, %r2009; 2026-02-21T10:22:31.1171796Z mad.wide.s32 %rd154, %r2010, 2, %rd6; 2026-02-21T10:22:31.1171857Z or.b32 %r2011, %r5, %r104; 2026-02-21T10:22:31.1171919Z cvt.s64.s32 %rd98, %r2011; 2026-02-21T10:22:31.1171983Z add.s64 %rd153, %rd7, %rd98; 2026-02-21T10:22:31.1172055Z mov.b32 %r3470, 0f00000000; 2026-02-21T10:22:31.1172121Z mov.b64 %rd155, -32; 2026-02-21T10:22:31.1172184Z mov.b32 %r3471, %r3470; 2026-02-21T10:22:31.1172249Z mov.b32 %r3472, %r3470; 2026-02-21T10:22:31.1172310Z mov.b32 %r3473, %r3470; 2026-02-21T10:22:31.1172367Z mov.b32 %r3474, %r3470; 2026-02-21T10:22:31.1172426Z mov.b32 %r3475, %r3470; 2026-02-21T10:22:31.1172493Z mov.b32 %r3476, %r3470; 2026-02-21T10:22:31.1172553Z mov.b32 %r3477, %r3470; 2026-02-21T10:22:31.1172614Z mov.b32 %r3478, %r3470; 2026-02-21T10:22:31.1172743Z mov.b32 %r3479, %r3470; 2026-02-21T10:22:31.1172808Z mov.b32 %r3480, %r3470; 2026-02-21T10:22:31.1172867Z mov.b32 %r3481, %r3470; 2026-02-21T10:22:31.1172927Z mov.b32 %r3482, %r3470; 2026-02-21T10:22:31.1172996Z mov.b32 %r3483, %r3470; 2026-02-21T10:22:31.1173055Z mov.b32 %r3484, %r3470; 2026-02-21T10:22:31.1173114Z mov.b32 %r3485, %r3470; 2026-02-21T10:22:31.1173180Z mov.b32 %r3486, %r3470; 2026-02-21T10:22:31.1173240Z mov.b32 %r3487, %r3470; 2026-02-21T10:22:31.1173304Z mov.b32 %r3488, %r3470; 2026-02-21T10:22:31.1173364Z mov.b32 %r3489, %r3470; 2026-02-21T10:22:31.1173429Z mov.b32 %r3490, %r3470; 2026-02-21T10:22:31.1173489Z mov.b32 %r3491, %r3470; 2026-02-21T10:22:31.1173550Z mov.b32 %r3492, %r3470; 2026-02-21T10:22:31.1173616Z mov.b32 %r3493, %r3470; 2026-02-21T10:22:31.1173680Z mov.b32 %r3494, %r3470; 2026-02-21T10:22:31.1173741Z mov.b32 %r3495, %r3470; 2026-02-21T10:22:31.1173802Z mov.b32 %r3496, %r3470; 2026-02-21T10:22:31.1173882Z mov.b32 %r3497, %r3470; 2026-02-21T10:22:31.1173943Z mov.b32 %r3498, %r3470; 2026-02-21T10:22:31.1174001Z mov.b32 %r3499, %r3470; 2026-02-21T10:22:31.1174068Z 
mov.b32 %r3500, %r3470; 2026-02-21T10:22:31.1174128Z mov.b32 %r3501, %r3470; 2026-02-21T10:22:31.1174247Z $L__BB0_6: // Parent Loop BB0_3 Depth=1 2026-02-21T10:22:31.1174364Z // => This Inner Loop Header: Depth=2 2026-02-21T10:22:31.1174625Z .loc 1 58 80 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:80 2026-02-21T10:22:31.1174695Z add.s64 %rd100, %rd154, -96; 2026-02-21T10:22:31.1174757Z // begin inline asm 2026-02-21T10:22:31.1174823Z mov.u64 %rd99, 0x0; 2026-02-21T10:22:31.1174950Z createpolicy.fractional.L2::evict_last.b64 %rd99, 1.0; 2026-02-21T10:22:31.1175013Z // end inline asm 2026-02-21T10:22:31.1175079Z // begin inline asm 2026-02-21T10:22:31.1175139Z mov.u32 %r2012, 0x0; 2026-02-21T10:22:31.1175198Z mov.u32 %r2013, 0x0; 2026-02-21T10:22:31.1175393Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r2012, %r2013 }, [ %rd100 + 0 ], %rd99; 2026-02-21T10:22:31.1175451Z // end inline asm 2026-02-21T10:22:31.1175654Z .loc 1 62 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:62:32 2026-02-21T10:22:31.1175708Z bar.sync 0; 2026-02-21T10:22:31.1175809Z st.shared.v2.b32 [%r15], {%r2012, %r2013}; 2026-02-21T10:22:31.1175868Z bar.sync 0; 2026-02-21T10:22:31.1175942Z ld.shared.b16 %rs193, [%r16]; 2026-02-21T10:22:31.1176070Z ld.shared.b16 %rs194, [%r16+256]; 2026-02-21T10:22:31.1176140Z ld.shared.b16 %rs195, [%r16+16]; 2026-02-21T10:22:31.1176208Z ld.shared.b16 %rs196, [%r16+272]; 2026-02-21T10:22:31.1176274Z ld.shared.b16 %rs197, [%r17]; 2026-02-21T10:22:31.1176343Z ld.shared.b16 %rs198, [%r17+256]; 2026-02-21T10:22:31.1176594Z ld.shared.b16 %rs199, [%r17+16]; 2026-02-21T10:22:31.1176668Z ld.shared.b16 %rs200, [%r17+272]; 2026-02-21T10:22:31.1176740Z cvt.f32.bf16 %r2311, %rs193; 2026-02-21T10:22:31.1176806Z cvt.f32.bf16 %r2312, %rs194; 2026-02-21T10:22:31.1176867Z cvt.f32.bf16 %r2313, %rs197; 2026-02-21T10:22:31.1176937Z cvt.f32.bf16 %r2314, %rs198; 2026-02-21T10:22:31.1176999Z cvt.f32.bf16 %r2379, %rs195; 2026-02-21T10:22:31.1177060Z 
cvt.f32.bf16 %r2380, %rs196; 2026-02-21T10:22:31.1177136Z cvt.f32.bf16 %r2381, %rs199; 2026-02-21T10:22:31.1177207Z cvt.f32.bf16 %r2382, %rs200; 2026-02-21T10:22:31.1177414Z .loc 1 64 87 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:64:87 2026-02-21T10:22:31.1177478Z // begin inline asm 2026-02-21T10:22:31.1177544Z mov.u64 %rd102, 0x0; 2026-02-21T10:22:31.1177673Z createpolicy.fractional.L2::evict_last.b64 %rd102, 1.0; 2026-02-21T10:22:31.1177732Z // end inline asm 2026-02-21T10:22:31.1177793Z // begin inline asm 2026-02-21T10:22:31.1177863Z mov.u32 %r2014, 0x0; 2026-02-21T10:22:31.1178026Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r2014 }, [ %rd153 + 0 ], %rd102; 2026-02-21T10:22:31.1178172Z // end inline asm 2026-02-21T10:22:31.1178382Z .loc 1 72 28 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:72:28 2026-02-21T10:22:31.1178439Z bar.sync 0; 2026-02-21T10:22:31.1178507Z st.shared.b8 [%r18], %r2014; 2026-02-21T10:22:31.1178582Z prmt.b32 %r3608, %r2014, 0, 0x7771U; 2026-02-21T10:22:31.1178648Z st.shared.b8 [%r19+256], %r3608; 2026-02-21T10:22:31.1178714Z prmt.b32 %r3609, %r2014, 0, 0x7772U; 2026-02-21T10:22:31.1178783Z st.shared.b8 [%r20+512], %r3609; 2026-02-21T10:22:31.1178855Z prmt.b32 %r3610, %r2014, 0, 0x7773U; 2026-02-21T10:22:31.1178919Z st.shared.b8 [%r21+768], %r3610; 2026-02-21T10:22:31.1178975Z bar.sync 0; 2026-02-21T10:22:31.1179047Z ld.shared.b32 %r3611, [%r22]; 2026-02-21T10:22:31.1179110Z prmt.b32 %r3612, %r3611, 0, 0x7770U; 2026-02-21T10:22:31.1179178Z cvt.u16.u32 %rs201, %r3612; 2026-02-21T10:22:31.1179241Z prmt.b32 %r3613, %r3611, 0, 0x7771U; 2026-02-21T10:22:31.1179311Z cvt.u16.u32 %rs202, %r3613; 2026-02-21T10:22:31.1179388Z prmt.b32 %r3614, %r3611, 0, 0x7772U; 2026-02-21T10:22:31.1179453Z cvt.u16.u32 %rs203, %r3614; 2026-02-21T10:22:31.1179524Z prmt.b32 %r3615, %r3611, 0, 0x7773U; 2026-02-21T10:22:31.1179586Z cvt.u16.u32 %rs204, %r3615; 2026-02-21T10:22:31.1179655Z ld.shared.b32 %r3616, [%r22+128]; 
2026-02-21T10:22:31.1179724Z prmt.b32 %r3617, %r3616, 0, 0x7770U; 2026-02-21T10:22:31.1179787Z cvt.u16.u32 %rs205, %r3617; 2026-02-21T10:22:31.1179924Z prmt.b32 %r3618, %r3616, 0, 0x7771U; 2026-02-21T10:22:31.1179999Z cvt.u16.u32 %rs206, %r3618; 2026-02-21T10:22:31.1180075Z prmt.b32 %r3619, %r3616, 0, 0x7772U; 2026-02-21T10:22:31.1180138Z cvt.u16.u32 %rs207, %r3619; 2026-02-21T10:22:31.1180201Z prmt.b32 %r3620, %r3616, 0, 0x7773U; 2026-02-21T10:22:31.1180266Z cvt.u16.u32 %rs208, %r3620; 2026-02-21T10:22:31.1180466Z .loc 1 67 28 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:67:28 2026-02-21T10:22:31.1180531Z shl.b16 %rs209, %rs201, 4; 2026-02-21T10:22:31.1180598Z shl.b16 %rs210, %rs202, 4; 2026-02-21T10:22:31.1180668Z shl.b16 %rs211, %rs203, 4; 2026-02-21T10:22:31.1180729Z shl.b16 %rs212, %rs204, 4; 2026-02-21T10:22:31.1180791Z shl.b16 %rs213, %rs205, 4; 2026-02-21T10:22:31.1180862Z shl.b16 %rs214, %rs206, 4; 2026-02-21T10:22:31.1180925Z shl.b16 %rs215, %rs207, 4; 2026-02-21T10:22:31.1180988Z shl.b16 %rs216, %rs208, 4; 2026-02-21T10:22:31.1181187Z .loc 1 82 58 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:82:58 2026-02-21T10:22:31.1181341Z selp.b16 %rs217, %rs209, %rs201, %p34; 2026-02-21T10:22:31.1181409Z cvt.s16.s8 %rs218, %rs217; 2026-02-21T10:22:31.1181474Z shr.s16 %rs219, %rs218, 4; 2026-02-21T10:22:31.1181553Z selp.b16 %rs220, %rs210, %rs202, %p34; 2026-02-21T10:22:31.1181616Z cvt.s16.s8 %rs221, %rs220; 2026-02-21T10:22:31.1181744Z shr.s16 %rs222, %rs221, 4; 2026-02-21T10:22:31.1181822Z selp.b16 %rs223, %rs211, %rs203, %p34; 2026-02-21T10:22:31.1181884Z cvt.s16.s8 %rs224, %rs223; 2026-02-21T10:22:31.1181949Z shr.s16 %rs225, %rs224, 4; 2026-02-21T10:22:31.1182019Z selp.b16 %rs226, %rs212, %rs204, %p34; 2026-02-21T10:22:31.1182087Z cvt.s16.s8 %rs227, %rs226; 2026-02-21T10:22:31.1182149Z shr.s16 %rs228, %rs227, 4; 2026-02-21T10:22:31.1182218Z selp.b16 %rs229, %rs213, %rs205, %p34; 2026-02-21T10:22:31.1182287Z cvt.s16.s8 %rs230, %rs229; 
2026-02-21T10:22:31.1182348Z shr.s16 %rs231, %rs230, 4; 2026-02-21T10:22:31.1182416Z selp.b16 %rs232, %rs214, %rs206, %p34; 2026-02-21T10:22:31.1182483Z cvt.s16.s8 %rs233, %rs232; 2026-02-21T10:22:31.1182551Z shr.s16 %rs234, %rs233, 4; 2026-02-21T10:22:31.1182618Z selp.b16 %rs235, %rs215, %rs207, %p34; 2026-02-21T10:22:31.1182680Z cvt.s16.s8 %rs236, %rs235; 2026-02-21T10:22:31.1182745Z shr.s16 %rs237, %rs236, 4; 2026-02-21T10:22:31.1182813Z selp.b16 %rs238, %rs216, %rs208, %p34; 2026-02-21T10:22:31.1182877Z cvt.s16.s8 %rs239, %rs238; 2026-02-21T10:22:31.1182944Z shr.s16 %rs240, %rs239, 4; 2026-02-21T10:22:31.1183189Z .loc 1 87 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:87:32 2026-02-21T10:22:31.1183272Z cvt.rn.f32.s16 %r3621, %rs219; 2026-02-21T10:22:31.1183338Z cvt.rn.f32.s16 %r3622, %rs222; 2026-02-21T10:22:31.1183414Z cvt.rn.f32.s16 %r3623, %rs225; 2026-02-21T10:22:31.1183478Z cvt.rn.f32.s16 %r3624, %rs228; 2026-02-21T10:22:31.1183540Z cvt.rn.f32.s16 %r3625, %rs231; 2026-02-21T10:22:31.1183610Z cvt.rn.f32.s16 %r3626, %rs234; 2026-02-21T10:22:31.1183676Z cvt.rn.f32.s16 %r3627, %rs237; 2026-02-21T10:22:31.1183740Z cvt.rn.f32.s16 %r3628, %rs240; 2026-02-21T10:22:31.1183797Z bar.sync 0; 2026-02-21T10:22:31.1183869Z st.shared.b32 [%r23], %r3621; 2026-02-21T10:22:31.1183939Z st.shared.b32 [%r23+8], %r3622; 2026-02-21T10:22:31.1184004Z st.shared.b32 [%r24], %r3623; 2026-02-21T10:22:31.1184079Z st.shared.b32 [%r24+8], %r3624; 2026-02-21T10:22:31.1184145Z st.shared.b32 [%r25], %r3625; 2026-02-21T10:22:31.1184213Z st.shared.b32 [%r25+8], %r3626; 2026-02-21T10:22:31.1184278Z st.shared.b32 [%r26], %r3627; 2026-02-21T10:22:31.1184348Z st.shared.b32 [%r26+8], %r3628; 2026-02-21T10:22:31.1184491Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3470}; 2026-02-21T10:22:31.1184545Z bar.sync 0; 2026-02-21T10:22:31.1184612Z // begin inline asm 2026-02-21T10:22:31.1184769Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2179, %r2315}, [%r282]; 
2026-02-21T10:22:31.1184827Z // end inline asm 2026-02-21T10:22:31.1184890Z bar.sync 0; 2026-02-21T10:22:31.1185082Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3472}; 2026-02-21T10:22:31.1185138Z bar.sync 0; 2026-02-21T10:22:31.1185198Z // begin inline asm 2026-02-21T10:22:31.1185353Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2181, %r2317}, [%r282]; 2026-02-21T10:22:31.1185415Z // end inline asm 2026-02-21T10:22:31.1185474Z bar.sync 0; 2026-02-21T10:22:31.1185610Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3471}; 2026-02-21T10:22:31.1185665Z bar.sync 0; 2026-02-21T10:22:31.1185723Z // begin inline asm 2026-02-21T10:22:31.1185870Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2180, %r2316}, [%r282]; 2026-02-21T10:22:31.1185941Z // end inline asm 2026-02-21T10:22:31.1186004Z bar.sync 0; 2026-02-21T10:22:31.1186135Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3473}; 2026-02-21T10:22:31.1186198Z bar.sync 0; 2026-02-21T10:22:31.1186258Z // begin inline asm 2026-02-21T10:22:31.1186402Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2182, %r2318}, [%r282]; 2026-02-21T10:22:31.1186678Z // end inline asm 2026-02-21T10:22:31.1186745Z bar.sync 0; 2026-02-21T10:22:31.1186885Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3474}; 2026-02-21T10:22:31.1186943Z bar.sync 0; 2026-02-21T10:22:31.1187009Z // begin inline asm 2026-02-21T10:22:31.1187163Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2183, %r2319}, [%r282]; 2026-02-21T10:22:31.1187290Z // end inline asm 2026-02-21T10:22:31.1187352Z bar.sync 0; 2026-02-21T10:22:31.1187485Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3476}; 2026-02-21T10:22:31.1187540Z bar.sync 0; 2026-02-21T10:22:31.1187599Z // begin inline asm 2026-02-21T10:22:31.1187752Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2185, %r2321}, [%r282]; 2026-02-21T10:22:31.1187810Z // end inline asm 2026-02-21T10:22:31.1187867Z bar.sync 0; 2026-02-21T10:22:31.1188003Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], 
{%r3475}; 2026-02-21T10:22:31.1188059Z bar.sync 0; 2026-02-21T10:22:31.1188119Z // begin inline asm 2026-02-21T10:22:31.1188269Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2184, %r2320}, [%r282]; 2026-02-21T10:22:31.1188331Z // end inline asm 2026-02-21T10:22:31.1188394Z bar.sync 0; 2026-02-21T10:22:31.1188616Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3477}; 2026-02-21T10:22:31.1188681Z bar.sync 0; 2026-02-21T10:22:31.1188744Z // begin inline asm 2026-02-21T10:22:31.1188887Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2186, %r2322}, [%r282]; 2026-02-21T10:22:31.1188953Z // end inline asm 2026-02-21T10:22:31.1189080Z bar.sync 0; 2026-02-21T10:22:31.1189211Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3478}; 2026-02-21T10:22:31.1189265Z bar.sync 0; 2026-02-21T10:22:31.1189332Z // begin inline asm 2026-02-21T10:22:31.1189476Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2187, %r2323}, [%r282]; 2026-02-21T10:22:31.1189535Z // end inline asm 2026-02-21T10:22:31.1189597Z bar.sync 0; 2026-02-21T10:22:31.1189724Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3480}; 2026-02-21T10:22:31.1189780Z bar.sync 0; 2026-02-21T10:22:31.1189849Z // begin inline asm 2026-02-21T10:22:31.1189992Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2189, %r2325}, [%r282]; 2026-02-21T10:22:31.1190050Z // end inline asm 2026-02-21T10:22:31.1190104Z bar.sync 0; 2026-02-21T10:22:31.1190238Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3479}; 2026-02-21T10:22:31.1190296Z bar.sync 0; 2026-02-21T10:22:31.1190356Z // begin inline asm 2026-02-21T10:22:31.1190519Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2188, %r2324}, [%r282]; 2026-02-21T10:22:31.1190580Z // end inline asm 2026-02-21T10:22:31.1190636Z bar.sync 0; 2026-02-21T10:22:31.1190763Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3481}; 2026-02-21T10:22:31.1190829Z bar.sync 0; 2026-02-21T10:22:31.1190887Z // begin inline asm 2026-02-21T10:22:31.1191029Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 
{%r2190, %r2326}, [%r282]; 2026-02-21T10:22:31.1191090Z // end inline asm 2026-02-21T10:22:31.1191224Z bar.sync 0; 2026-02-21T10:22:31.1191355Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3482}; 2026-02-21T10:22:31.1191415Z bar.sync 0; 2026-02-21T10:22:31.1191475Z // begin inline asm 2026-02-21T10:22:31.1191623Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2191, %r2327}, [%r282]; 2026-02-21T10:22:31.1191687Z // end inline asm 2026-02-21T10:22:31.1191753Z bar.sync 0; 2026-02-21T10:22:31.1191888Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3484}; 2026-02-21T10:22:31.1191943Z bar.sync 0; 2026-02-21T10:22:31.1192009Z // begin inline asm 2026-02-21T10:22:31.1192157Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2193, %r2329}, [%r282]; 2026-02-21T10:22:31.1192215Z // end inline asm 2026-02-21T10:22:31.1192271Z bar.sync 0; 2026-02-21T10:22:31.1192407Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3483}; 2026-02-21T10:22:31.1192461Z bar.sync 0; 2026-02-21T10:22:31.1192520Z // begin inline asm 2026-02-21T10:22:31.1192669Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2192, %r2328}, [%r282]; 2026-02-21T10:22:31.1192807Z // end inline asm 2026-02-21T10:22:31.1192868Z bar.sync 0; 2026-02-21T10:22:31.1192999Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3485}; 2026-02-21T10:22:31.1193060Z bar.sync 0; 2026-02-21T10:22:31.1193121Z // begin inline asm 2026-02-21T10:22:31.1193269Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2194, %r2330}, [%r282]; 2026-02-21T10:22:31.1193385Z // end inline asm 2026-02-21T10:22:31.1193441Z bar.sync 0; 2026-02-21T10:22:31.1193570Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3486}; 2026-02-21T10:22:31.1193631Z bar.sync 0; 2026-02-21T10:22:31.1193695Z // begin inline asm 2026-02-21T10:22:31.1193838Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2195, %r2331}, [%r282]; 2026-02-21T10:22:31.1193895Z // end inline asm 2026-02-21T10:22:31.1193955Z bar.sync 0; 2026-02-21T10:22:31.1194082Z 
stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3488}; 2026-02-21T10:22:31.1194135Z bar.sync 0; 2026-02-21T10:22:31.1194198Z // begin inline asm 2026-02-21T10:22:31.1194359Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2197, %r2333}, [%r282]; 2026-02-21T10:22:31.1194417Z // end inline asm 2026-02-21T10:22:31.1194474Z bar.sync 0; 2026-02-21T10:22:31.1194607Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3487}; 2026-02-21T10:22:31.1194663Z bar.sync 0; 2026-02-21T10:22:31.1194725Z // begin inline asm 2026-02-21T10:22:31.1194872Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2196, %r2332}, [%r282]; 2026-02-21T10:22:31.1194927Z // end inline asm 2026-02-21T10:22:31.1195033Z bar.sync 0; 2026-02-21T10:22:31.1195163Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3489}; 2026-02-21T10:22:31.1195223Z bar.sync 0; 2026-02-21T10:22:31.1195281Z // begin inline asm 2026-02-21T10:22:31.1195421Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2198, %r2334}, [%r282]; 2026-02-21T10:22:31.1195485Z // end inline asm 2026-02-21T10:22:31.1195539Z bar.sync 0; 2026-02-21T10:22:31.1195666Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3490}; 2026-02-21T10:22:31.1195730Z bar.sync 0; 2026-02-21T10:22:31.1195789Z // begin inline asm 2026-02-21T10:22:31.1195931Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2199, %r2335}, [%r282]; 2026-02-21T10:22:31.1195988Z // end inline asm 2026-02-21T10:22:31.1196047Z bar.sync 0; 2026-02-21T10:22:31.1196172Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3492}; 2026-02-21T10:22:31.1196228Z bar.sync 0; 2026-02-21T10:22:31.1196291Z // begin inline asm 2026-02-21T10:22:31.1196436Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2201, %r2337}, [%r282]; 2026-02-21T10:22:31.1196618Z // end inline asm 2026-02-21T10:22:31.1196675Z bar.sync 0; 2026-02-21T10:22:31.1196810Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3491}; 2026-02-21T10:22:31.1196865Z bar.sync 0; 2026-02-21T10:22:31.1196923Z // begin inline asm 
2026-02-21T10:22:31.1197072Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2200, %r2336}, [%r282]; 2026-02-21T10:22:31.1197131Z // end inline asm 2026-02-21T10:22:31.1197266Z bar.sync 0; 2026-02-21T10:22:31.1197396Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3493}; 2026-02-21T10:22:31.1197459Z bar.sync 0; 2026-02-21T10:22:31.1197517Z // begin inline asm 2026-02-21T10:22:31.1197660Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2202, %r2338}, [%r282]; 2026-02-21T10:22:31.1197723Z // end inline asm 2026-02-21T10:22:31.1197781Z bar.sync 0; 2026-02-21T10:22:31.1197907Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3494}; 2026-02-21T10:22:31.1197968Z bar.sync 0; 2026-02-21T10:22:31.1198029Z // begin inline asm 2026-02-21T10:22:31.1198174Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2203, %r2339}, [%r282]; 2026-02-21T10:22:31.1198235Z // end inline asm 2026-02-21T10:22:31.1198290Z bar.sync 0; 2026-02-21T10:22:31.1198424Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3496}; 2026-02-21T10:22:31.1198483Z bar.sync 0; 2026-02-21T10:22:31.1198542Z // begin inline asm 2026-02-21T10:22:31.1198685Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2205, %r2341}, [%r282]; 2026-02-21T10:22:31.1198816Z // end inline asm 2026-02-21T10:22:31.1198873Z bar.sync 0; 2026-02-21T10:22:31.1199000Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3495}; 2026-02-21T10:22:31.1199056Z bar.sync 0; 2026-02-21T10:22:31.1199118Z // begin inline asm 2026-02-21T10:22:31.1199259Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2204, %r2340}, [%r282]; 2026-02-21T10:22:31.1199376Z // end inline asm 2026-02-21T10:22:31.1199440Z bar.sync 0; 2026-02-21T10:22:31.1199570Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3497}; 2026-02-21T10:22:31.1199628Z bar.sync 0; 2026-02-21T10:22:31.1199687Z // begin inline asm 2026-02-21T10:22:31.1199837Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2206, %r2342}, [%r282]; 2026-02-21T10:22:31.1199892Z // end inline asm 2026-02-21T10:22:31.1199945Z 
bar.sync 0; 2026-02-21T10:22:31.1200077Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3498}; 2026-02-21T10:22:31.1200132Z bar.sync 0; 2026-02-21T10:22:31.1200194Z // begin inline asm 2026-02-21T10:22:31.1200337Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2207, %r2343}, [%r282]; 2026-02-21T10:22:31.1200399Z // end inline asm 2026-02-21T10:22:31.1200453Z bar.sync 0; 2026-02-21T10:22:31.1200579Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3500}; 2026-02-21T10:22:31.1200638Z bar.sync 0; 2026-02-21T10:22:31.1200700Z // begin inline asm 2026-02-21T10:22:31.1200844Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2209, %r2345}, [%r282]; 2026-02-21T10:22:31.1200906Z // end inline asm 2026-02-21T10:22:31.1201031Z bar.sync 0; 2026-02-21T10:22:31.1201161Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3499}; 2026-02-21T10:22:31.1201216Z bar.sync 0; 2026-02-21T10:22:31.1201290Z // begin inline asm 2026-02-21T10:22:31.1201437Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2208, %r2344}, [%r282]; 2026-02-21T10:22:31.1201493Z // end inline asm 2026-02-21T10:22:31.1201551Z bar.sync 0; 2026-02-21T10:22:31.1201678Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r1604], {%r3501}; 2026-02-21T10:22:31.1201739Z bar.sync 0; 2026-02-21T10:22:31.1201798Z // begin inline asm 2026-02-21T10:22:31.1201960Z ldmatrix.sync.aligned.m8n8.x2.shared.b16 {%r2210, %r2346}, [%r282]; 2026-02-21T10:22:31.1202016Z // end inline asm 2026-02-21T10:22:31.1202070Z $L__tmp9: 2026-02-21T10:22:31.1202354Z .loc 2 291 36 // standard.py:291:36 @[ cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:94:40 ] 2026-02-21T10:22:31.1202416Z // begin inline asm 2026-02-21T10:22:31.1202498Z fence.proxy.async.shared::cta; 2026-02-21T10:22:31.1202558Z // end inline asm 2026-02-21T10:22:31.1202632Z wgmma.fence.sync.aligned; 2026-02-21T10:22:31.1202697Z mov.pred %p39, -1; 2026-02-21T10:22:31.1202757Z // begin inline asm 2026-02-21T10:22:31.1203528Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 
{%r2179,%r2180,%r2181,%r2182,%r2183,%r2184,%r2185,%r2186,%r2187,%r2188,%r2189,%r2190,%r2191,%r2192,%r2193,%r2194,%r2195,%r2196,%r2197,%r2198,%r2199,%r2200,%r2201,%r2202,%r2203,%r2204,%r2205,%r2206,%r2207,%r2208,%r2209,%r2210}, {%r2311,%r2312,%r2313,%r2314}, %rd105, %p39, 1, 1; 2026-02-21T10:22:31.1203642Z // end inline asm 2026-02-21T10:22:31.1203701Z // begin inline asm 2026-02-21T10:22:31.1204454Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2179,%r2180,%r2181,%r2182,%r2183,%r2184,%r2185,%r2186,%r2187,%r2188,%r2189,%r2190,%r2191,%r2192,%r2193,%r2194,%r2195,%r2196,%r2197,%r2198,%r2199,%r2200,%r2201,%r2202,%r2203,%r2204,%r2205,%r2206,%r2207,%r2208,%r2209,%r2210}, {%r2379,%r2380,%r2381,%r2382}, %rd106, %p39, 1, 1; 2026-02-21T10:22:31.1204516Z // end inline asm 2026-02-21T10:22:31.1204574Z // begin inline asm 2026-02-21T10:22:31.1205319Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2315,%r2316,%r2317,%r2318,%r2319,%r2320,%r2321,%r2322,%r2323,%r2324,%r2325,%r2326,%r2327,%r2328,%r2329,%r2330,%r2331,%r2332,%r2333,%r2334,%r2335,%r2336,%r2337,%r2338,%r2339,%r2340,%r2341,%r2342,%r2343,%r2344,%r2345,%r2346}, {%r2311,%r2312,%r2313,%r2314}, %rd107, %p39, 1, 1; 2026-02-21T10:22:31.1205380Z // end inline asm 2026-02-21T10:22:31.1205495Z // begin inline asm 2026-02-21T10:22:31.1206248Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2315,%r2316,%r2317,%r2318,%r2319,%r2320,%r2321,%r2322,%r2323,%r2324,%r2325,%r2326,%r2327,%r2328,%r2329,%r2330,%r2331,%r2332,%r2333,%r2334,%r2335,%r2336,%r2337,%r2338,%r2339,%r2340,%r2341,%r2342,%r2343,%r2344,%r2345,%r2346}, {%r2379,%r2380,%r2381,%r2382}, %rd108, %p39, 1, 1; 2026-02-21T10:22:31.1206349Z // end inline asm 2026-02-21T10:22:31.1206431Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:31.1206627Z mov.b32 %r3572, 0; 2026-02-21T10:22:31.1206692Z mov.b32 %r2447, %r2514; 2026-02-21T10:22:31.1206755Z mov.b32 %r2448, %r3572; 2026-02-21T10:22:31.1206819Z mov.b32 %r2449, %r3572; 2026-02-21T10:22:31.1206877Z // begin 
inline asm 2026-02-21T10:22:31.1207949Z // wait for regs: %r2179,%r2180,%r2181,%r2182,%r2183,%r2184,%r2185,%r2186,%r2187,%r2188,%r2189,%r2190,%r2191,%r2192,%r2193,%r2194,%r2195,%r2196,%r2197,%r2198,%r2199,%r2200,%r2201,%r2202,%r2203,%r2204,%r2205,%r2206,%r2207,%r2208,%r2209,%r2210,%r2315,%r2316,%r2317,%r2318,%r2319,%r2320,%r2321,%r2322,%r2323,%r2324,%r2325,%r2326,%r2327,%r2328,%r2329,%r2330,%r2331,%r2332,%r2333,%r2334,%r2335,%r2336,%r2337,%r2338,%r2339,%r2340,%r2341,%r2342,%r2343,%r2344,%r2345,%r2346,%r2447,%r2448,%r2449 2026-02-21T10:22:31.1208034Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:31.1208094Z // end inline asm 2026-02-21T10:22:31.1208156Z $L__tmp10: 2026-02-21T10:22:31.1208452Z .loc 1 58 80 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:80 2026-02-21T10:22:31.1208527Z add.s64 %rd110, %rd154, -64; 2026-02-21T10:22:31.1208589Z // begin inline asm 2026-02-21T10:22:31.1208649Z mov.u64 %rd109, 0x0; 2026-02-21T10:22:31.1208786Z createpolicy.fractional.L2::evict_last.b64 %rd109, 1.0; 2026-02-21T10:22:31.1208847Z // end inline asm 2026-02-21T10:22:31.1208907Z // begin inline asm 2026-02-21T10:22:31.1208970Z mov.u32 %r2517, 0x0; 2026-02-21T10:22:31.1209031Z mov.u32 %r2518, 0x0; 2026-02-21T10:22:31.1209225Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r2517, %r2518 }, [ %rd110 + 0 ], %rd109; 2026-02-21T10:22:31.1209285Z // end inline asm 2026-02-21T10:22:31.1209493Z .loc 1 62 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:62:32 2026-02-21T10:22:31.1209551Z bar.sync 0; 2026-02-21T10:22:31.1209636Z st.shared.v2.b32 [%r15], {%r2517, %r2518}; 2026-02-21T10:22:31.1209700Z bar.sync 0; 2026-02-21T10:22:31.1209769Z ld.shared.b16 %rs241, [%r16]; 2026-02-21T10:22:31.1209840Z ld.shared.b16 %rs242, [%r16+256]; 2026-02-21T10:22:31.1209915Z ld.shared.b16 %rs243, [%r16+16]; 2026-02-21T10:22:31.1209979Z ld.shared.b16 %rs244, [%r16+272]; 2026-02-21T10:22:31.1210045Z ld.shared.b16 %rs245, [%r17]; 2026-02-21T10:22:31.1210109Z ld.shared.b16 
%rs246, [%r17+256]; 2026-02-21T10:22:31.1210181Z ld.shared.b16 %rs247, [%r17+16]; 2026-02-21T10:22:31.1210244Z ld.shared.b16 %rs248, [%r17+272]; 2026-02-21T10:22:31.1210395Z cvt.f32.bf16 %r2720, %rs241; 2026-02-21T10:22:31.1210466Z cvt.f32.bf16 %r2721, %rs242; 2026-02-21T10:22:31.1210527Z cvt.f32.bf16 %r2722, %rs245; 2026-02-21T10:22:31.1210591Z cvt.f32.bf16 %r2723, %rs246; 2026-02-21T10:22:31.1210652Z cvt.f32.bf16 %r2788, %rs243; 2026-02-21T10:22:31.1210719Z cvt.f32.bf16 %r2789, %rs244; 2026-02-21T10:22:31.1210781Z cvt.f32.bf16 %r2790, %rs247; 2026-02-21T10:22:31.1210842Z cvt.f32.bf16 %r2791, %rs248; 2026-02-21T10:22:31.1211058Z .loc 1 64 87 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:64:87 2026-02-21T10:22:31.1211127Z add.s64 %rd113, %rd153, 10240; 2026-02-21T10:22:31.1211187Z // begin inline asm 2026-02-21T10:22:31.1211251Z mov.u64 %rd112, 0x0; 2026-02-21T10:22:31.1211375Z createpolicy.fractional.L2::evict_last.b64 %rd112, 1.0; 2026-02-21T10:22:31.1211432Z // end inline asm 2026-02-21T10:22:31.1211488Z // begin inline asm 2026-02-21T10:22:31.1211564Z mov.u32 %r2519, 0x0; 2026-02-21T10:22:31.1211798Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r2519 }, [ %rd113 + 0 ], %rd112; 2026-02-21T10:22:31.1211860Z // end inline asm 2026-02-21T10:22:31.1212063Z .loc 1 72 28 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:72:28 2026-02-21T10:22:31.1212119Z bar.sync 0; 2026-02-21T10:22:31.1212184Z st.shared.b8 [%r18], %r2519; 2026-02-21T10:22:31.1212321Z prmt.b32 %r3629, %r2519, 0, 0x7771U; 2026-02-21T10:22:31.1212388Z st.shared.b8 [%r19+256], %r3629; 2026-02-21T10:22:31.1212454Z prmt.b32 %r3630, %r2519, 0, 0x7772U; 2026-02-21T10:22:31.1212522Z st.shared.b8 [%r20+512], %r3630; 2026-02-21T10:22:31.1212595Z prmt.b32 %r3631, %r2519, 0, 0x7773U; 2026-02-21T10:22:31.1212674Z st.shared.b8 [%r21+768], %r3631; 2026-02-21T10:22:31.1212729Z bar.sync 0; 2026-02-21T10:22:31.1212800Z ld.shared.b32 %r3632, [%r22]; 2026-02-21T10:22:31.1212862Z prmt.b32 %r3633, 
%r3632, 0, 0x7770U; 2026-02-21T10:22:31.1212927Z cvt.u16.u32 %rs249, %r3633; 2026-02-21T10:22:31.1212990Z prmt.b32 %r3634, %r3632, 0, 0x7771U; 2026-02-21T10:22:31.1213064Z cvt.u16.u32 %rs250, %r3634; 2026-02-21T10:22:31.1213126Z prmt.b32 %r3635, %r3632, 0, 0x7772U; 2026-02-21T10:22:31.1213188Z cvt.u16.u32 %rs251, %r3635; 2026-02-21T10:22:31.1213255Z prmt.b32 %r3636, %r3632, 0, 0x7773U; 2026-02-21T10:22:31.1213317Z cvt.u16.u32 %rs252, %r3636; 2026-02-21T10:22:31.1213384Z ld.shared.b32 %r3637, [%r22+128]; 2026-02-21T10:22:31.1213450Z prmt.b32 %r3638, %r3637, 0, 0x7770U; 2026-02-21T10:22:31.1213514Z cvt.u16.u32 %rs253, %r3638; 2026-02-21T10:22:31.1213629Z prmt.b32 %r3639, %r3637, 0, 0x7771U; 2026-02-21T10:22:31.1213691Z cvt.u16.u32 %rs254, %r3639; 2026-02-21T10:22:31.1213759Z prmt.b32 %r3640, %r3637, 0, 0x7772U; 2026-02-21T10:22:31.1213820Z cvt.u16.u32 %rs255, %r3640; 2026-02-21T10:22:31.1213883Z prmt.b32 %r3641, %r3637, 0, 0x7773U; 2026-02-21T10:22:31.1213948Z cvt.u16.u32 %rs256, %r3641; 2026-02-21T10:22:31.1214143Z .loc 1 67 28 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:67:28 2026-02-21T10:22:31.1214214Z shl.b16 %rs257, %rs249, 4; 2026-02-21T10:22:31.1214275Z shl.b16 %rs258, %rs250, 4; 2026-02-21T10:22:31.1214339Z shl.b16 %rs259, %rs251, 4; 2026-02-21T10:22:31.1214399Z shl.b16 %rs260, %rs252, 4; 2026-02-21T10:22:31.1214458Z shl.b16 %rs261, %rs253, 4; 2026-02-21T10:22:31.1214523Z shl.b16 %rs262, %rs254, 4; 2026-02-21T10:22:31.1214585Z shl.b16 %rs263, %rs255, 4; 2026-02-21T10:22:31.1214644Z shl.b16 %rs264, %rs256, 4; 2026-02-21T10:22:31.1214841Z .loc 1 82 58 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:82:58 2026-02-21T10:22:31.1214920Z selp.b16 %rs265, %rs257, %rs249, %p34; 2026-02-21T10:22:31.1214994Z cvt.s16.s8 %rs266, %rs265; 2026-02-21T10:22:31.1215057Z shr.s16 %rs267, %rs266, 4; 2026-02-21T10:22:31.1215132Z selp.b16 %rs268, %rs258, %rs250, %p34; 2026-02-21T10:22:31.1215192Z cvt.s16.s8 %rs269, %rs268; 2026-02-21T10:22:31.1215253Z 
shr.s16 %rs270, %rs269, 4; 2026-02-21T10:22:31.1215405Z selp.b16 %rs271, %rs259, %rs251, %p34; 2026-02-21T10:22:31.1215469Z cvt.s16.s8 %rs272, %rs271; 2026-02-21T10:22:31.1215530Z shr.s16 %rs273, %rs272, 4; 2026-02-21T10:22:31.1215598Z selp.b16 %rs274, %rs260, %rs252, %p34; 2026-02-21T10:22:31.1215664Z cvt.s16.s8 %rs275, %rs274; 2026-02-21T10:22:31.1215725Z shr.s16 %rs276, %rs275, 4; 2026-02-21T10:22:31.1215795Z selp.b16 %rs277, %rs261, %rs253, %p34; 2026-02-21T10:22:31.1215860Z cvt.s16.s8 %rs278, %rs277; 2026-02-21T10:22:31.1215918Z shr.s16 %rs279, %rs278, 4; 2026-02-21T10:22:31.1215987Z selp.b16 %rs280, %rs262, %rs254, %p34; 2026-02-21T10:22:31.1216048Z cvt.s16.s8 %rs281, %rs280; 2026-02-21T10:22:31.1216116Z shr.s16 %rs282, %rs281, 4; 2026-02-21T10:22:31.1216195Z selp.b16 %rs283, %rs263, %rs255, %p34; 2026-02-21T10:22:31.1216258Z cvt.s16.s8 %rs284, %rs283; 2026-02-21T10:22:31.1216326Z shr.s16 %rs285, %rs284, 4; 2026-02-21T10:22:31.1216398Z selp.b16 %rs286, %rs264, %rs256, %p34; 2026-02-21T10:22:31.1216582Z cvt.s16.s8 %rs287, %rs286; 2026-02-21T10:22:31.1216732Z shr.s16 %rs288, %rs287, 4; 2026-02-21T10:22:31.1216942Z .loc 1 87 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:87:32 2026-02-21T10:22:31.1217008Z cvt.rn.f32.s16 %r3642, %rs267; 2026-02-21T10:22:31.1217071Z cvt.rn.f32.s16 %r3643, %rs270; 2026-02-21T10:22:31.1217140Z cvt.rn.f32.s16 %r3644, %rs273; 2026-02-21T10:22:31.1217268Z cvt.rn.f32.s16 %r3645, %rs276; 2026-02-21T10:22:31.1217330Z cvt.rn.f32.s16 %r3646, %rs279; 2026-02-21T10:22:31.1217396Z cvt.rn.f32.s16 %r3647, %rs282; 2026-02-21T10:22:31.1217458Z cvt.rn.f32.s16 %r3648, %rs285; 2026-02-21T10:22:31.1217519Z cvt.rn.f32.s16 %r3649, %rs288; 2026-02-21T10:22:31.1217576Z bar.sync 0; 2026-02-21T10:22:31.1217646Z st.shared.b32 [%r23], %r3642; 2026-02-21T10:22:31.1217712Z st.shared.b32 [%r23+8], %r3643; 2026-02-21T10:22:31.1217776Z st.shared.b32 [%r24], %r3644; 2026-02-21T10:22:31.1217845Z st.shared.b32 [%r24+8], %r3645; 
2026-02-21T10:22:31.1217909Z st.shared.b32 [%r25], %r3646; 2026-02-21T10:22:31.1217977Z st.shared.b32 [%r25+8], %r3647; 2026-02-21T10:22:31.1218045Z st.shared.b32 [%r26], %r3648; 2026-02-21T10:22:31.1218118Z st.shared.b32 [%r26+8], %r3649; 2026-02-21T10:22:31.1218172Z $L__tmp11: 2026-02-21T10:22:31.1218446Z .loc 2 291 36 // standard.py:291:36 @[ cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:94:40 ] 2026-02-21T10:22:31.1218515Z // begin inline asm 2026-02-21T10:22:31.1218591Z fence.proxy.async.shared::cta; 2026-02-21T10:22:31.1218650Z // end inline asm 2026-02-21T10:22:31.1218780Z bar.sync 0; 2026-02-21T10:22:31.1218868Z wgmma.fence.sync.aligned; 2026-02-21T10:22:31.1218929Z // begin inline asm 2026-02-21T10:22:31.1219691Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2179,%r2180,%r2181,%r2182,%r2183,%r2184,%r2185,%r2186,%r2187,%r2188,%r2189,%r2190,%r2191,%r2192,%r2193,%r2194,%r2195,%r2196,%r2197,%r2198,%r2199,%r2200,%r2201,%r2202,%r2203,%r2204,%r2205,%r2206,%r2207,%r2208,%r2209,%r2210}, {%r2720,%r2721,%r2722,%r2723}, %rd105, %p39, 1, 1; 2026-02-21T10:22:31.1219761Z // end inline asm 2026-02-21T10:22:31.1219821Z // begin inline asm 2026-02-21T10:22:31.1220567Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2179,%r2180,%r2181,%r2182,%r2183,%r2184,%r2185,%r2186,%r2187,%r2188,%r2189,%r2190,%r2191,%r2192,%r2193,%r2194,%r2195,%r2196,%r2197,%r2198,%r2199,%r2200,%r2201,%r2202,%r2203,%r2204,%r2205,%r2206,%r2207,%r2208,%r2209,%r2210}, {%r2788,%r2789,%r2790,%r2791}, %rd106, %p39, 1, 1; 2026-02-21T10:22:31.1220632Z // end inline asm 2026-02-21T10:22:31.1220693Z // begin inline asm 2026-02-21T10:22:31.1221435Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2315,%r2316,%r2317,%r2318,%r2319,%r2320,%r2321,%r2322,%r2323,%r2324,%r2325,%r2326,%r2327,%r2328,%r2329,%r2330,%r2331,%r2332,%r2333,%r2334,%r2335,%r2336,%r2337,%r2338,%r2339,%r2340,%r2341,%r2342,%r2343,%r2344,%r2345,%r2346}, {%r2720,%r2721,%r2722,%r2723}, %rd107, %p39, 1, 1; 
2026-02-21T10:22:31.1221499Z // end inline asm 2026-02-21T10:22:31.1221626Z // begin inline asm 2026-02-21T10:22:31.1222368Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2315,%r2316,%r2317,%r2318,%r2319,%r2320,%r2321,%r2322,%r2323,%r2324,%r2325,%r2326,%r2327,%r2328,%r2329,%r2330,%r2331,%r2332,%r2333,%r2334,%r2335,%r2336,%r2337,%r2338,%r2339,%r2340,%r2341,%r2342,%r2343,%r2344,%r2345,%r2346}, {%r2788,%r2789,%r2790,%r2791}, %rd108, %p39, 1, 1; 2026-02-21T10:22:31.1222436Z // end inline asm 2026-02-21T10:22:31.1222516Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:31.1222579Z mov.b32 %r2856, %r2514; 2026-02-21T10:22:31.1222650Z mov.b32 %r2857, %r3572; 2026-02-21T10:22:31.1222710Z mov.b32 %r2858, %r3572; 2026-02-21T10:22:31.1222768Z // begin inline asm 2026-02-21T10:22:31.1223890Z // wait for regs: %r2179,%r2180,%r2181,%r2182,%r2183,%r2184,%r2185,%r2186,%r2187,%r2188,%r2189,%r2190,%r2191,%r2192,%r2193,%r2194,%r2195,%r2196,%r2197,%r2198,%r2199,%r2200,%r2201,%r2202,%r2203,%r2204,%r2205,%r2206,%r2207,%r2208,%r2209,%r2210,%r2315,%r2316,%r2317,%r2318,%r2319,%r2320,%r2321,%r2322,%r2323,%r2324,%r2325,%r2326,%r2327,%r2328,%r2329,%r2330,%r2331,%r2332,%r2333,%r2334,%r2335,%r2336,%r2337,%r2338,%r2339,%r2340,%r2341,%r2342,%r2343,%r2344,%r2345,%r2346,%r2856,%r2857,%r2858 2026-02-21T10:22:31.1223972Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:31.1224031Z // end inline asm 2026-02-21T10:22:31.1224089Z $L__tmp12: 2026-02-21T10:22:31.1224338Z .loc 1 58 80 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:80 2026-02-21T10:22:31.1224407Z add.s64 %rd120, %rd154, -32; 2026-02-21T10:22:31.1224475Z // begin inline asm 2026-02-21T10:22:31.1224536Z mov.u64 %rd119, 0x0; 2026-02-21T10:22:31.1224659Z createpolicy.fractional.L2::evict_last.b64 %rd119, 1.0; 2026-02-21T10:22:31.1224716Z // end inline asm 2026-02-21T10:22:31.1224778Z // begin inline asm 2026-02-21T10:22:31.1224835Z mov.u32 %r2926, 0x0; 2026-02-21T10:22:31.1224892Z mov.u32 %r2927, 0x0; 
2026-02-21T10:22:31.1225094Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r2926, %r2927 }, [ %rd120 + 0 ], %rd119; 2026-02-21T10:22:31.1225162Z // end inline asm 2026-02-21T10:22:31.1225361Z .loc 1 62 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:62:32 2026-02-21T10:22:31.1225422Z bar.sync 0; 2026-02-21T10:22:31.1225502Z st.shared.v2.b32 [%r15], {%r2926, %r2927}; 2026-02-21T10:22:31.1225559Z bar.sync 0; 2026-02-21T10:22:31.1225626Z ld.shared.b16 %rs289, [%r16]; 2026-02-21T10:22:31.1225700Z ld.shared.b16 %rs290, [%r16+256]; 2026-02-21T10:22:31.1225819Z ld.shared.b16 %rs291, [%r16+16]; 2026-02-21T10:22:31.1225890Z ld.shared.b16 %rs292, [%r16+272]; 2026-02-21T10:22:31.1225969Z ld.shared.b16 %rs293, [%r17]; 2026-02-21T10:22:31.1226035Z ld.shared.b16 %rs294, [%r17+256]; 2026-02-21T10:22:31.1226102Z ld.shared.b16 %rs295, [%r17+16]; 2026-02-21T10:22:31.1226165Z ld.shared.b16 %rs296, [%r17+272]; 2026-02-21T10:22:31.1226233Z cvt.f32.bf16 %r3129, %rs289; 2026-02-21T10:22:31.1226295Z cvt.f32.bf16 %r3130, %rs290; 2026-02-21T10:22:31.1226359Z cvt.f32.bf16 %r3131, %rs293; 2026-02-21T10:22:31.1226428Z cvt.f32.bf16 %r3132, %rs294; 2026-02-21T10:22:31.1226612Z cvt.f32.bf16 %r3197, %rs291; 2026-02-21T10:22:31.1226678Z cvt.f32.bf16 %r3198, %rs292; 2026-02-21T10:22:31.1226741Z cvt.f32.bf16 %r3199, %rs295; 2026-02-21T10:22:31.1226806Z cvt.f32.bf16 %r3200, %rs296; 2026-02-21T10:22:31.1227015Z .loc 1 64 87 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:64:87 2026-02-21T10:22:31.1227081Z add.s64 %rd123, %rd153, 20480; 2026-02-21T10:22:31.1227148Z // begin inline asm 2026-02-21T10:22:31.1227208Z mov.u64 %rd122, 0x0; 2026-02-21T10:22:31.1227331Z createpolicy.fractional.L2::evict_last.b64 %rd122, 1.0; 2026-02-21T10:22:31.1227394Z // end inline asm 2026-02-21T10:22:31.1227454Z // begin inline asm 2026-02-21T10:22:31.1227514Z mov.u32 %r2928, 0x0; 2026-02-21T10:22:31.1227674Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r2928 }, [ %rd123 + 0 ], %rd122; 
2026-02-21T10:22:31.1227737Z // end inline asm 2026-02-21T10:22:31.1228021Z .loc 1 72 28 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:72:28 2026-02-21T10:22:31.1228078Z bar.sync 0; 2026-02-21T10:22:31.1228155Z st.shared.b8 [%r18], %r2928; 2026-02-21T10:22:31.1228231Z prmt.b32 %r3650, %r2928, 0, 0x7771U; 2026-02-21T10:22:31.1228300Z st.shared.b8 [%r19+256], %r3650; 2026-02-21T10:22:31.1228384Z prmt.b32 %r3651, %r2928, 0, 0x7772U; 2026-02-21T10:22:31.1228531Z st.shared.b8 [%r20+512], %r3651; 2026-02-21T10:22:31.1228604Z prmt.b32 %r3652, %r2928, 0, 0x7773U; 2026-02-21T10:22:31.1228674Z st.shared.b8 [%r21+768], %r3652; 2026-02-21T10:22:31.1228735Z bar.sync 0; 2026-02-21T10:22:31.1228803Z ld.shared.b32 %r3653, [%r22]; 2026-02-21T10:22:31.1228870Z prmt.b32 %r3654, %r3653, 0, 0x7770U; 2026-02-21T10:22:31.1228938Z cvt.u16.u32 %rs297, %r3654; 2026-02-21T10:22:31.1229001Z prmt.b32 %r3655, %r3653, 0, 0x7771U; 2026-02-21T10:22:31.1229064Z cvt.u16.u32 %rs298, %r3655; 2026-02-21T10:22:31.1229130Z prmt.b32 %r3656, %r3653, 0, 0x7772U; 2026-02-21T10:22:31.1229275Z cvt.u16.u32 %rs299, %r3656; 2026-02-21T10:22:31.1229348Z prmt.b32 %r3657, %r3653, 0, 0x7773U; 2026-02-21T10:22:31.1229412Z cvt.u16.u32 %rs300, %r3657; 2026-02-21T10:22:31.1229483Z ld.shared.b32 %r3658, [%r22+128]; 2026-02-21T10:22:31.1229547Z prmt.b32 %r3659, %r3658, 0, 0x7770U; 2026-02-21T10:22:31.1229668Z cvt.u16.u32 %rs301, %r3659; 2026-02-21T10:22:31.1229739Z prmt.b32 %r3660, %r3658, 0, 0x7771U; 2026-02-21T10:22:31.1229804Z cvt.u16.u32 %rs302, %r3660; 2026-02-21T10:22:31.1229870Z prmt.b32 %r3661, %r3658, 0, 0x7772U; 2026-02-21T10:22:31.1229931Z cvt.u16.u32 %rs303, %r3661; 2026-02-21T10:22:31.1230003Z prmt.b32 %r3662, %r3658, 0, 0x7773U; 2026-02-21T10:22:31.1230063Z cvt.u16.u32 %rs304, %r3662; 2026-02-21T10:22:31.1230279Z .loc 1 67 28 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:67:28 2026-02-21T10:22:31.1230349Z shl.b16 %rs305, %rs297, 4; 2026-02-21T10:22:31.1230415Z shl.b16 %rs306, %rs298, 
4; 2026-02-21T10:22:31.1230479Z shl.b16 %rs307, %rs299, 4; 2026-02-21T10:22:31.1230539Z shl.b16 %rs308, %rs300, 4; 2026-02-21T10:22:31.1230606Z shl.b16 %rs309, %rs301, 4; 2026-02-21T10:22:31.1230668Z shl.b16 %rs310, %rs302, 4; 2026-02-21T10:22:31.1230729Z shl.b16 %rs311, %rs303, 4; 2026-02-21T10:22:31.1230795Z shl.b16 %rs312, %rs304, 4; 2026-02-21T10:22:31.1230995Z .loc 1 82 58 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:82:58 2026-02-21T10:22:31.1231149Z selp.b16 %rs313, %rs305, %rs297, %p34; 2026-02-21T10:22:31.1231224Z cvt.s16.s8 %rs314, %rs313; 2026-02-21T10:22:31.1231287Z shr.s16 %rs315, %rs314, 4; 2026-02-21T10:22:31.1231359Z selp.b16 %rs316, %rs306, %rs298, %p34; 2026-02-21T10:22:31.1231420Z cvt.s16.s8 %rs317, %rs316; 2026-02-21T10:22:31.1231487Z shr.s16 %rs318, %rs317, 4; 2026-02-21T10:22:31.1231557Z selp.b16 %rs319, %rs307, %rs299, %p34; 2026-02-21T10:22:31.1231620Z cvt.s16.s8 %rs320, %rs319; 2026-02-21T10:22:31.1231688Z shr.s16 %rs321, %rs320, 4; 2026-02-21T10:22:31.1231759Z selp.b16 %rs322, %rs308, %rs300, %p34; 2026-02-21T10:22:31.1231823Z cvt.s16.s8 %rs323, %rs322; 2026-02-21T10:22:31.1231885Z shr.s16 %rs324, %rs323, 4; 2026-02-21T10:22:31.1231959Z selp.b16 %rs325, %rs309, %rs301, %p34; 2026-02-21T10:22:31.1232020Z cvt.s16.s8 %rs326, %rs325; 2026-02-21T10:22:31.1232085Z shr.s16 %rs327, %rs326, 4; 2026-02-21T10:22:31.1232158Z selp.b16 %rs328, %rs310, %rs302, %p34; 2026-02-21T10:22:31.1232218Z cvt.s16.s8 %rs329, %rs328; 2026-02-21T10:22:31.1232281Z shr.s16 %rs330, %rs329, 4; 2026-02-21T10:22:31.1232350Z selp.b16 %rs331, %rs311, %rs303, %p34; 2026-02-21T10:22:31.1232419Z cvt.s16.s8 %rs332, %rs331; 2026-02-21T10:22:31.1232479Z shr.s16 %rs333, %rs332, 4; 2026-02-21T10:22:31.1232545Z selp.b16 %rs334, %rs312, %rs304, %p34; 2026-02-21T10:22:31.1232608Z cvt.s16.s8 %rs335, %rs334; 2026-02-21T10:22:31.1232667Z shr.s16 %rs336, %rs335, 4; 2026-02-21T10:22:31.1232869Z .loc 1 87 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:87:32 
2026-02-21T10:22:31.1233007Z cvt.rn.f32.s16 %r3663, %rs315; 2026-02-21T10:22:31.1233072Z cvt.rn.f32.s16 %r3664, %rs318; 2026-02-21T10:22:31.1233135Z cvt.rn.f32.s16 %r3665, %rs321; 2026-02-21T10:22:31.1233198Z cvt.rn.f32.s16 %r3666, %rs324; 2026-02-21T10:22:31.1233264Z cvt.rn.f32.s16 %r3667, %rs327; 2026-02-21T10:22:31.1233328Z cvt.rn.f32.s16 %r3668, %rs330; 2026-02-21T10:22:31.1233391Z cvt.rn.f32.s16 %r3669, %rs333; 2026-02-21T10:22:31.1233463Z cvt.rn.f32.s16 %r3670, %rs336; 2026-02-21T10:22:31.1233518Z bar.sync 0; 2026-02-21T10:22:31.1233583Z st.shared.b32 [%r23], %r3663; 2026-02-21T10:22:31.1233650Z st.shared.b32 [%r23+8], %r3664; 2026-02-21T10:22:31.1233719Z st.shared.b32 [%r24], %r3665; 2026-02-21T10:22:31.1233782Z st.shared.b32 [%r24+8], %r3666; 2026-02-21T10:22:31.1233845Z st.shared.b32 [%r25], %r3667; 2026-02-21T10:22:31.1233914Z st.shared.b32 [%r25+8], %r3668; 2026-02-21T10:22:31.1233975Z st.shared.b32 [%r26], %r3669; 2026-02-21T10:22:31.1234090Z st.shared.b32 [%r26+8], %r3670; 2026-02-21T10:22:31.1234146Z $L__tmp13: 2026-02-21T10:22:31.1234424Z .loc 2 291 36 // standard.py:291:36 @[ cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:94:40 ] 2026-02-21T10:22:31.1234485Z // begin inline asm 2026-02-21T10:22:31.1234618Z fence.proxy.async.shared::cta; 2026-02-21T10:22:31.1234681Z // end inline asm 2026-02-21T10:22:31.1234737Z bar.sync 0; 2026-02-21T10:22:31.1234810Z wgmma.fence.sync.aligned; 2026-02-21T10:22:31.1234876Z // begin inline asm 2026-02-21T10:22:31.1235637Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2179,%r2180,%r2181,%r2182,%r2183,%r2184,%r2185,%r2186,%r2187,%r2188,%r2189,%r2190,%r2191,%r2192,%r2193,%r2194,%r2195,%r2196,%r2197,%r2198,%r2199,%r2200,%r2201,%r2202,%r2203,%r2204,%r2205,%r2206,%r2207,%r2208,%r2209,%r2210}, {%r3129,%r3130,%r3131,%r3132}, %rd105, %p39, 1, 1; 2026-02-21T10:22:31.1235694Z // end inline asm 2026-02-21T10:22:31.1235762Z // begin inline asm 2026-02-21T10:22:31.1236637Z 
wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2179,%r2180,%r2181,%r2182,%r2183,%r2184,%r2185,%r2186,%r2187,%r2188,%r2189,%r2190,%r2191,%r2192,%r2193,%r2194,%r2195,%r2196,%r2197,%r2198,%r2199,%r2200,%r2201,%r2202,%r2203,%r2204,%r2205,%r2206,%r2207,%r2208,%r2209,%r2210}, {%r3197,%r3198,%r3199,%r3200}, %rd106, %p39, 1, 1; 2026-02-21T10:22:31.1236703Z // end inline asm 2026-02-21T10:22:31.1236767Z // begin inline asm 2026-02-21T10:22:31.1237617Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2315,%r2316,%r2317,%r2318,%r2319,%r2320,%r2321,%r2322,%r2323,%r2324,%r2325,%r2326,%r2327,%r2328,%r2329,%r2330,%r2331,%r2332,%r2333,%r2334,%r2335,%r2336,%r2337,%r2338,%r2339,%r2340,%r2341,%r2342,%r2343,%r2344,%r2345,%r2346}, {%r3129,%r3130,%r3131,%r3132}, %rd107, %p39, 1, 1; 2026-02-21T10:22:31.1237680Z // end inline asm 2026-02-21T10:22:31.1237746Z // begin inline asm 2026-02-21T10:22:31.1238495Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r2315,%r2316,%r2317,%r2318,%r2319,%r2320,%r2321,%r2322,%r2323,%r2324,%r2325,%r2326,%r2327,%r2328,%r2329,%r2330,%r2331,%r2332,%r2333,%r2334,%r2335,%r2336,%r2337,%r2338,%r2339,%r2340,%r2341,%r2342,%r2343,%r2344,%r2345,%r2346}, {%r3197,%r3198,%r3199,%r3200}, %rd108, %p39, 1, 1; 2026-02-21T10:22:31.1238556Z // end inline asm 2026-02-21T10:22:31.1238637Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:31.1238703Z mov.b32 %r3265, %r2514; 2026-02-21T10:22:31.1238767Z mov.b32 %r3266, %r3572; 2026-02-21T10:22:31.1238824Z mov.b32 %r3267, %r3572; 2026-02-21T10:22:31.1238890Z // begin inline asm 2026-02-21T10:22:31.1239955Z // wait for regs: 
%r2179,%r2180,%r2181,%r2182,%r2183,%r2184,%r2185,%r2186,%r2187,%r2188,%r2189,%r2190,%r2191,%r2192,%r2193,%r2194,%r2195,%r2196,%r2197,%r2198,%r2199,%r2200,%r2201,%r2202,%r2203,%r2204,%r2205,%r2206,%r2207,%r2208,%r2209,%r2210,%r2315,%r2316,%r2317,%r2318,%r2319,%r2320,%r2321,%r2322,%r2323,%r2324,%r2325,%r2326,%r2327,%r2328,%r2329,%r2330,%r2331,%r2332,%r2333,%r2334,%r2335,%r2336,%r2337,%r2338,%r2339,%r2340,%r2341,%r2342,%r2343,%r2344,%r2345,%r2346,%r3265,%r3266,%r3267 2026-02-21T10:22:31.1240106Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:31.1240164Z // end inline asm 2026-02-21T10:22:31.1240218Z $L__tmp14: 2026-02-21T10:22:31.1240426Z .loc 1 58 80 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:80 2026-02-21T10:22:31.1240493Z // begin inline asm 2026-02-21T10:22:31.1240554Z mov.u64 %rd129, 0x0; 2026-02-21T10:22:31.1240681Z createpolicy.fractional.L2::evict_last.b64 %rd129, 1.0; 2026-02-21T10:22:31.1240746Z // end inline asm 2026-02-21T10:22:31.1240807Z // begin inline asm 2026-02-21T10:22:31.1240866Z mov.u32 %r3335, 0x0; 2026-02-21T10:22:31.1240923Z mov.u32 %r3336, 0x0; 2026-02-21T10:22:31.1241132Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r3335, %r3336 }, [ %rd154 + 0 ], %rd129; 2026-02-21T10:22:31.1241192Z // end inline asm 2026-02-21T10:22:31.1241391Z .loc 1 62 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:62:32 2026-02-21T10:22:31.1241524Z bar.sync 0; 2026-02-21T10:22:31.1241611Z st.shared.v2.b32 [%r15], {%r3335, %r3336}; 2026-02-21T10:22:31.1241667Z bar.sync 0; 2026-02-21T10:22:31.1241738Z ld.shared.b16 %rs337, [%r16]; 2026-02-21T10:22:31.1241806Z ld.shared.b16 %rs338, [%r16+256]; 2026-02-21T10:22:31.1241950Z ld.shared.b16 %rs339, [%r16+16]; 2026-02-21T10:22:31.1242015Z ld.shared.b16 %rs340, [%r16+272]; 2026-02-21T10:22:31.1242085Z ld.shared.b16 %rs341, [%r17]; 2026-02-21T10:22:31.1242156Z ld.shared.b16 %rs342, [%r17+256]; 2026-02-21T10:22:31.1242223Z ld.shared.b16 %rs343, [%r17+16]; 2026-02-21T10:22:31.1242297Z 
ld.shared.b16 %rs344, [%r17+272]; 2026-02-21T10:22:31.1242360Z cvt.f32.bf16 %r3466, %rs337; 2026-02-21T10:22:31.1242429Z cvt.f32.bf16 %r3467, %rs338; 2026-02-21T10:22:31.1242492Z cvt.f32.bf16 %r3468, %rs341; 2026-02-21T10:22:31.1242560Z cvt.f32.bf16 %r3469, %rs342; 2026-02-21T10:22:31.1242625Z cvt.f32.bf16 %r3534, %rs339; 2026-02-21T10:22:31.1242689Z cvt.f32.bf16 %r3535, %rs340; 2026-02-21T10:22:31.1242769Z cvt.f32.bf16 %r3536, %rs343; 2026-02-21T10:22:31.1242833Z cvt.f32.bf16 %r3537, %rs344; 2026-02-21T10:22:31.1243036Z .loc 1 64 87 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:64:87 2026-02-21T10:22:31.1243107Z add.s64 %rd133, %rd153, 30720; 2026-02-21T10:22:31.1243171Z // begin inline asm 2026-02-21T10:22:31.1243231Z mov.u64 %rd132, 0x0; 2026-02-21T10:22:31.1243351Z createpolicy.fractional.L2::evict_last.b64 %rd132, 1.0; 2026-02-21T10:22:31.1243477Z // end inline asm 2026-02-21T10:22:31.1243543Z // begin inline asm 2026-02-21T10:22:31.1243604Z mov.u32 %r3337, 0x0; 2026-02-21T10:22:31.1243770Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r3337 }, [ %rd133 + 0 ], %rd132; 2026-02-21T10:22:31.1243827Z // end inline asm 2026-02-21T10:22:31.1244024Z .loc 1 72 28 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:72:28 2026-02-21T10:22:31.1244088Z bar.sync 0; 2026-02-21T10:22:31.1244155Z st.shared.b8 [%r18], %r3337; 2026-02-21T10:22:31.1244225Z prmt.b32 %r3671, %r3337, 0, 0x7771U; 2026-02-21T10:22:31.1244291Z st.shared.b8 [%r19+256], %r3671; 2026-02-21T10:22:31.1244362Z prmt.b32 %r3672, %r3337, 0, 0x7772U; 2026-02-21T10:22:31.1244427Z st.shared.b8 [%r20+512], %r3672; 2026-02-21T10:22:31.1244495Z prmt.b32 %r3673, %r3337, 0, 0x7773U; 2026-02-21T10:22:31.1244561Z st.shared.b8 [%r21+768], %r3673; 2026-02-21T10:22:31.1244615Z bar.sync 0; 2026-02-21T10:22:31.1244681Z ld.shared.b32 %r3674, [%r22]; 2026-02-21T10:22:31.1244746Z prmt.b32 %r3675, %r3674, 0, 0x7770U; 2026-02-21T10:22:31.1244816Z cvt.u16.u32 %rs345, %r3675; 2026-02-21T10:22:31.1244879Z 
prmt.b32 %r3676, %r3674, 0, 0x7771U; 2026-02-21T10:22:31.1244941Z cvt.u16.u32 %rs346, %r3676; 2026-02-21T10:22:31.1245009Z prmt.b32 %r3677, %r3674, 0, 0x7772U; 2026-02-21T10:22:31.1245070Z cvt.u16.u32 %rs347, %r3677; 2026-02-21T10:22:31.1245133Z prmt.b32 %r3678, %r3674, 0, 0x7773U; 2026-02-21T10:22:31.1245265Z cvt.u16.u32 %rs348, %r3678; 2026-02-21T10:22:31.1245334Z ld.shared.b32 %r3679, [%r22+128]; 2026-02-21T10:22:31.1245397Z prmt.b32 %r3680, %r3679, 0, 0x7770U; 2026-02-21T10:22:31.1245458Z cvt.u16.u32 %rs349, %r3680; 2026-02-21T10:22:31.1245526Z prmt.b32 %r3681, %r3679, 0, 0x7771U; 2026-02-21T10:22:31.1245586Z cvt.u16.u32 %rs350, %r3681; 2026-02-21T10:22:31.1245652Z prmt.b32 %r3682, %r3679, 0, 0x7772U; 2026-02-21T10:22:31.1245717Z cvt.u16.u32 %rs351, %r3682; 2026-02-21T10:22:31.1245783Z prmt.b32 %r3683, %r3679, 0, 0x7773U; 2026-02-21T10:22:31.1245845Z cvt.u16.u32 %rs352, %r3683; 2026-02-21T10:22:31.1246039Z .loc 1 67 28 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:67:28 2026-02-21T10:22:31.1246106Z shl.b16 %rs353, %rs345, 4; 2026-02-21T10:22:31.1246169Z shl.b16 %rs354, %rs346, 4; 2026-02-21T10:22:31.1246230Z shl.b16 %rs355, %rs347, 4; 2026-02-21T10:22:31.1246296Z shl.b16 %rs356, %rs348, 4; 2026-02-21T10:22:31.1246362Z shl.b16 %rs357, %rs349, 4; 2026-02-21T10:22:31.1246678Z shl.b16 %rs358, %rs350, 4; 2026-02-21T10:22:31.1246754Z shl.b16 %rs359, %rs351, 4; 2026-02-21T10:22:31.1246832Z shl.b16 %rs360, %rs352, 4; 2026-02-21T10:22:31.1247039Z .loc 1 82 58 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:82:58 2026-02-21T10:22:31.1247187Z selp.b16 %rs361, %rs353, %rs345, %p34; 2026-02-21T10:22:31.1247256Z cvt.s16.s8 %rs362, %rs361; 2026-02-21T10:22:31.1247319Z shr.s16 %rs363, %rs362, 4; 2026-02-21T10:22:31.1247393Z selp.b16 %rs364, %rs354, %rs346, %p34; 2026-02-21T10:22:31.1247462Z cvt.s16.s8 %rs365, %rs364; 2026-02-21T10:22:31.1247523Z shr.s16 %rs366, %rs365, 4; 2026-02-21T10:22:31.1247592Z selp.b16 %rs367, %rs355, %rs347, %p34; 
2026-02-21T10:22:31.1247653Z cvt.s16.s8 %rs368, %rs367; 2026-02-21T10:22:31.1247720Z shr.s16 %rs369, %rs368, 4; 2026-02-21T10:22:31.1247788Z selp.b16 %rs370, %rs356, %rs348, %p34; 2026-02-21T10:22:31.1247849Z cvt.s16.s8 %rs371, %rs370; 2026-02-21T10:22:31.1247916Z shr.s16 %rs372, %rs371, 4; 2026-02-21T10:22:31.1247988Z selp.b16 %rs373, %rs357, %rs349, %p34; 2026-02-21T10:22:31.1248049Z cvt.s16.s8 %rs374, %rs373; 2026-02-21T10:22:31.1248109Z shr.s16 %rs375, %rs374, 4; 2026-02-21T10:22:31.1248184Z selp.b16 %rs376, %rs358, %rs350, %p34; 2026-02-21T10:22:31.1248245Z cvt.s16.s8 %rs377, %rs376; 2026-02-21T10:22:31.1248308Z shr.s16 %rs378, %rs377, 4; 2026-02-21T10:22:31.1248381Z selp.b16 %rs379, %rs359, %rs351, %p34; 2026-02-21T10:22:31.1248441Z cvt.s16.s8 %rs380, %rs379; 2026-02-21T10:22:31.1248569Z shr.s16 %rs381, %rs380, 4; 2026-02-21T10:22:31.1248652Z selp.b16 %rs382, %rs360, %rs352, %p34; 2026-02-21T10:22:31.1248721Z cvt.s16.s8 %rs383, %rs382; 2026-02-21T10:22:31.1248781Z shr.s16 %rs384, %rs383, 4; 2026-02-21T10:22:31.1248981Z .loc 1 87 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:87:32 2026-02-21T10:22:31.1249057Z cvt.rn.f32.s16 %r3684, %rs363; 2026-02-21T10:22:31.1249121Z cvt.rn.f32.s16 %r3685, %rs366; 2026-02-21T10:22:31.1249192Z cvt.rn.f32.s16 %r3686, %rs369; 2026-02-21T10:22:31.1249259Z cvt.rn.f32.s16 %r3687, %rs372; 2026-02-21T10:22:31.1249323Z cvt.rn.f32.s16 %r3688, %rs375; 2026-02-21T10:22:31.1249387Z cvt.rn.f32.s16 %r3689, %rs378; 2026-02-21T10:22:31.1249447Z cvt.rn.f32.s16 %r3690, %rs381; 2026-02-21T10:22:31.1249516Z cvt.rn.f32.s16 %r3691, %rs384; 2026-02-21T10:22:31.1249572Z bar.sync 0; 2026-02-21T10:22:31.1249636Z st.shared.b32 [%r23], %r3684; 2026-02-21T10:22:31.1249710Z st.shared.b32 [%r23+8], %r3685; 2026-02-21T10:22:31.1249775Z st.shared.b32 [%r24], %r3686; 2026-02-21T10:22:31.1249840Z st.shared.b32 [%r24+8], %r3687; 2026-02-21T10:22:31.1249903Z st.shared.b32 [%r25], %r3688; 2026-02-21T10:22:31.1249970Z st.shared.b32 [%r25+8], %r3689; 
2026-02-21T10:22:31.1250035Z st.shared.b32 [%r26], %r3690; 2026-02-21T10:22:31.1250098Z st.shared.b32 [%r26+8], %r3691; 2026-02-21T10:22:31.1250158Z $L__tmp15: 2026-02-21T10:22:31.1250429Z .loc 2 291 36 // standard.py:291:36 @[ cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:94:40 ] 2026-02-21T10:22:31.1250661Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2179, %r2315}; 2026-02-21T10:22:31.1250723Z bar.sync 0; 2026-02-21T10:22:31.1250784Z // begin inline asm 2026-02-21T10:22:31.1250922Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3470}, [%r1604]; 2026-02-21T10:22:31.1250985Z // end inline asm 2026-02-21T10:22:31.1251047Z bar.sync 0; 2026-02-21T10:22:31.1251196Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2181, %r2317}; 2026-02-21T10:22:31.1251253Z bar.sync 0; 2026-02-21T10:22:31.1251321Z // begin inline asm 2026-02-21T10:22:31.1251453Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3472}, [%r1604]; 2026-02-21T10:22:31.1251511Z // end inline asm 2026-02-21T10:22:31.1251566Z bar.sync 0; 2026-02-21T10:22:31.1251734Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2180, %r2316}; 2026-02-21T10:22:31.1251789Z bar.sync 0; 2026-02-21T10:22:31.1251852Z // begin inline asm 2026-02-21T10:22:31.1252062Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3471}, [%r1604]; 2026-02-21T10:22:31.1252123Z // end inline asm 2026-02-21T10:22:31.1252178Z bar.sync 0; 2026-02-21T10:22:31.1252330Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2182, %r2318}; 2026-02-21T10:22:31.1252386Z bar.sync 0; 2026-02-21T10:22:31.1252491Z // begin inline asm 2026-02-21T10:22:31.1252621Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3473}, [%r1604]; 2026-02-21T10:22:31.1252683Z // end inline asm 2026-02-21T10:22:31.1252741Z bar.sync 0; 2026-02-21T10:22:31.1252884Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2183, %r2319}; 2026-02-21T10:22:31.1252944Z bar.sync 0; 2026-02-21T10:22:31.1253004Z // begin inline asm 2026-02-21T10:22:31.1253136Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3474}, [%r1604]; 2026-02-21T10:22:31.1253195Z // end inline asm 2026-02-21T10:22:31.1253256Z bar.sync 0; 2026-02-21T10:22:31.1253411Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2185, %r2321}; 2026-02-21T10:22:31.1253473Z bar.sync 0; 2026-02-21T10:22:31.1253541Z // begin inline asm 2026-02-21T10:22:31.1253675Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3476}, [%r1604]; 2026-02-21T10:22:31.1253732Z // end inline asm 2026-02-21T10:22:31.1253788Z bar.sync 0; 2026-02-21T10:22:31.1253943Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2184, %r2320}; 2026-02-21T10:22:31.1254006Z bar.sync 0; 2026-02-21T10:22:31.1254066Z // begin inline asm 2026-02-21T10:22:31.1254254Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3475}, [%r1604]; 2026-02-21T10:22:31.1254316Z // end inline asm 2026-02-21T10:22:31.1254372Z bar.sync 0; 2026-02-21T10:22:31.1254521Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2186, %r2322}; 2026-02-21T10:22:31.1254576Z bar.sync 0; 2026-02-21T10:22:31.1254636Z // begin inline asm 2026-02-21T10:22:31.1254764Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3477}, [%r1604]; 2026-02-21T10:22:31.1254826Z // end inline asm 2026-02-21T10:22:31.1254883Z bar.sync 0; 2026-02-21T10:22:31.1255029Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2187, %r2323}; 2026-02-21T10:22:31.1255094Z bar.sync 0; 2026-02-21T10:22:31.1255164Z // begin inline asm 2026-02-21T10:22:31.1255293Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3478}, [%r1604]; 2026-02-21T10:22:31.1255350Z // end inline asm 2026-02-21T10:22:31.1255413Z bar.sync 0; 2026-02-21T10:22:31.1255556Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2189, %r2325}; 2026-02-21T10:22:31.1255613Z bar.sync 0; 2026-02-21T10:22:31.1255680Z // begin inline asm 2026-02-21T10:22:31.1255809Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3480}, [%r1604]; 2026-02-21T10:22:31.1255866Z // end inline asm 2026-02-21T10:22:31.1255922Z bar.sync 0; 
2026-02-21T10:22:31.1256070Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2188, %r2324}; 2026-02-21T10:22:31.1256125Z bar.sync 0; 2026-02-21T10:22:31.1256183Z // begin inline asm 2026-02-21T10:22:31.1256315Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3479}, [%r1604]; 2026-02-21T10:22:31.1256430Z // end inline asm 2026-02-21T10:22:31.1256621Z bar.sync 0; 2026-02-21T10:22:31.1256778Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2190, %r2326}; 2026-02-21T10:22:31.1256835Z bar.sync 0; 2026-02-21T10:22:31.1256895Z // begin inline asm 2026-02-21T10:22:31.1257022Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3481}, [%r1604]; 2026-02-21T10:22:31.1257087Z // end inline asm 2026-02-21T10:22:31.1257141Z bar.sync 0; 2026-02-21T10:22:31.1257289Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2191, %r2327}; 2026-02-21T10:22:31.1257348Z bar.sync 0; 2026-02-21T10:22:31.1257408Z // begin inline asm 2026-02-21T10:22:31.1257537Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3482}, [%r1604]; 2026-02-21T10:22:31.1257595Z // end inline asm 2026-02-21T10:22:31.1257663Z bar.sync 0; 2026-02-21T10:22:31.1257814Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2193, %r2329}; 2026-02-21T10:22:31.1257873Z bar.sync 0; 2026-02-21T10:22:31.1257941Z // begin inline asm 2026-02-21T10:22:31.1258154Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3484}, [%r1604]; 2026-02-21T10:22:31.1258213Z // end inline asm 2026-02-21T10:22:31.1258269Z bar.sync 0; 2026-02-21T10:22:31.1258419Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2192, %r2328}; 2026-02-21T10:22:31.1258475Z bar.sync 0; 2026-02-21T10:22:31.1258619Z // begin inline asm 2026-02-21T10:22:31.1258757Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3483}, [%r1604]; 2026-02-21T10:22:31.1258814Z // end inline asm 2026-02-21T10:22:31.1258872Z bar.sync 0; 2026-02-21T10:22:31.1259022Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2194, %r2330}; 2026-02-21T10:22:31.1259079Z bar.sync 0; 2026-02-21T10:22:31.1259139Z // 
begin inline asm 2026-02-21T10:22:31.1259266Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3485}, [%r1604]; 2026-02-21T10:22:31.1259329Z // end inline asm 2026-02-21T10:22:31.1259385Z bar.sync 0; 2026-02-21T10:22:31.1259531Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2195, %r2331}; 2026-02-21T10:22:31.1259597Z bar.sync 0; 2026-02-21T10:22:31.1259660Z // begin inline asm 2026-02-21T10:22:31.1259788Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3486}, [%r1604]; 2026-02-21T10:22:31.1259848Z // end inline asm 2026-02-21T10:22:31.1259907Z bar.sync 0; 2026-02-21T10:22:31.1260048Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2197, %r2333}; 2026-02-21T10:22:31.1260107Z bar.sync 0; 2026-02-21T10:22:31.1260175Z // begin inline asm 2026-02-21T10:22:31.1260368Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3488}, [%r1604]; 2026-02-21T10:22:31.1260430Z // end inline asm 2026-02-21T10:22:31.1260497Z bar.sync 0; 2026-02-21T10:22:31.1260650Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2196, %r2332}; 2026-02-21T10:22:31.1260704Z bar.sync 0; 2026-02-21T10:22:31.1260765Z // begin inline asm 2026-02-21T10:22:31.1260898Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3487}, [%r1604]; 2026-02-21T10:22:31.1260955Z // end inline asm 2026-02-21T10:22:31.1261014Z bar.sync 0; 2026-02-21T10:22:31.1261167Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2198, %r2334}; 2026-02-21T10:22:31.1261224Z bar.sync 0; 2026-02-21T10:22:31.1261282Z // begin inline asm 2026-02-21T10:22:31.1261411Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3489}, [%r1604]; 2026-02-21T10:22:31.1261476Z // end inline asm 2026-02-21T10:22:31.1261535Z bar.sync 0; 2026-02-21T10:22:31.1261680Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2199, %r2335}; 2026-02-21T10:22:31.1261740Z bar.sync 0; 2026-02-21T10:22:31.1261803Z // begin inline asm 2026-02-21T10:22:31.1261932Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3490}, [%r1604]; 2026-02-21T10:22:31.1261988Z // end inline asm 
2026-02-21T10:22:31.1262050Z bar.sync 0; 2026-02-21T10:22:31.1262193Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2201, %r2337}; 2026-02-21T10:22:31.1262249Z bar.sync 0; 2026-02-21T10:22:31.1262313Z // begin inline asm 2026-02-21T10:22:31.1262441Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3492}, [%r1604]; 2026-02-21T10:22:31.1262586Z // end inline asm 2026-02-21T10:22:31.1262643Z bar.sync 0; 2026-02-21T10:22:31.1262795Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2200, %r2336}; 2026-02-21T10:22:31.1262852Z bar.sync 0; 2026-02-21T10:22:31.1262910Z // begin inline asm 2026-02-21T10:22:31.1263043Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3491}, [%r1604]; 2026-02-21T10:22:31.1263103Z // end inline asm 2026-02-21T10:22:31.1263158Z bar.sync 0; 2026-02-21T10:22:31.1263311Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2202, %r2338}; 2026-02-21T10:22:31.1263366Z bar.sync 0; 2026-02-21T10:22:31.1263424Z // begin inline asm 2026-02-21T10:22:31.1263551Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3493}, [%r1604]; 2026-02-21T10:22:31.1263613Z // end inline asm 2026-02-21T10:22:31.1263670Z bar.sync 0; 2026-02-21T10:22:31.1263816Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2203, %r2339}; 2026-02-21T10:22:31.1263879Z bar.sync 0; 2026-02-21T10:22:31.1263945Z // begin inline asm 2026-02-21T10:22:31.1264133Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3494}, [%r1604]; 2026-02-21T10:22:31.1264198Z // end inline asm 2026-02-21T10:22:31.1264260Z bar.sync 0; 2026-02-21T10:22:31.1264407Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2205, %r2341}; 2026-02-21T10:22:31.1264463Z bar.sync 0; 2026-02-21T10:22:31.1264576Z // begin inline asm 2026-02-21T10:22:31.1264708Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3496}, [%r1604]; 2026-02-21T10:22:31.1264765Z // end inline asm 2026-02-21T10:22:31.1264824Z bar.sync 0; 2026-02-21T10:22:31.1264976Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2204, %r2340}; 2026-02-21T10:22:31.1265031Z 
bar.sync 0; 2026-02-21T10:22:31.1265091Z // begin inline asm 2026-02-21T10:22:31.1265226Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3495}, [%r1604]; 2026-02-21T10:22:31.1265282Z // end inline asm 2026-02-21T10:22:31.1265341Z bar.sync 0; 2026-02-21T10:22:31.1265491Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2206, %r2342}; 2026-02-21T10:22:31.1265553Z bar.sync 0; 2026-02-21T10:22:31.1265612Z // begin inline asm 2026-02-21T10:22:31.1265740Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3497}, [%r1604]; 2026-02-21T10:22:31.1265804Z // end inline asm 2026-02-21T10:22:31.1265860Z bar.sync 0; 2026-02-21T10:22:31.1266004Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2207, %r2343}; 2026-02-21T10:22:31.1266065Z bar.sync 0; 2026-02-21T10:22:31.1266128Z // begin inline asm 2026-02-21T10:22:31.1266309Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3498}, [%r1604]; 2026-02-21T10:22:31.1266369Z // end inline asm 2026-02-21T10:22:31.1266430Z bar.sync 0; 2026-02-21T10:22:31.1266704Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2209, %r2345}; 2026-02-21T10:22:31.1266765Z bar.sync 0; 2026-02-21T10:22:31.1266831Z // begin inline asm 2026-02-21T10:22:31.1266962Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3500}, [%r1604]; 2026-02-21T10:22:31.1267020Z // end inline asm 2026-02-21T10:22:31.1267080Z bar.sync 0; 2026-02-21T10:22:31.1267231Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2208, %r2344}; 2026-02-21T10:22:31.1267288Z bar.sync 0; 2026-02-21T10:22:31.1267349Z // begin inline asm 2026-02-21T10:22:31.1267484Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3499}, [%r1604]; 2026-02-21T10:22:31.1267541Z // end inline asm 2026-02-21T10:22:31.1267601Z bar.sync 0; 2026-02-21T10:22:31.1267749Z stmatrix.sync.aligned.m8n8.x2.shared.b16 [%r282], {%r2210, %r2346}; 2026-02-21T10:22:31.1267805Z bar.sync 0; 2026-02-21T10:22:31.1267866Z // begin inline asm 2026-02-21T10:22:31.1267994Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3501}, [%r1604]; 
2026-02-21T10:22:31.1268058Z // end inline asm 2026-02-21T10:22:31.1268118Z // begin inline asm 2026-02-21T10:22:31.1268199Z fence.proxy.async.shared::cta; 2026-02-21T10:22:31.1268262Z // end inline asm 2026-02-21T10:22:31.1268338Z wgmma.fence.sync.aligned; 2026-02-21T10:22:31.1268399Z // begin inline asm 2026-02-21T10:22:31.1269328Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3470,%r3471,%r3472,%r3473,%r3474,%r3475,%r3476,%r3477,%r3478,%r3479,%r3480,%r3481,%r3482,%r3483,%r3484,%r3485,%r3486,%r3487,%r3488,%r3489,%r3490,%r3491,%r3492,%r3493,%r3494,%r3495,%r3496,%r3497,%r3498,%r3499,%r3500,%r3501}, {%r3466,%r3467,%r3468,%r3469}, %rd135, %p39, 1, 1; 2026-02-21T10:22:31.1269397Z // end inline asm 2026-02-21T10:22:31.1269459Z // begin inline asm 2026-02-21T10:22:31.1270223Z wgmma.mma_async.sync.aligned.m64n64k8.f32.tf32.tf32 {%r3470,%r3471,%r3472,%r3473,%r3474,%r3475,%r3476,%r3477,%r3478,%r3479,%r3480,%r3481,%r3482,%r3483,%r3484,%r3485,%r3486,%r3487,%r3488,%r3489,%r3490,%r3491,%r3492,%r3493,%r3494,%r3495,%r3496,%r3497,%r3498,%r3499,%r3500,%r3501}, {%r3534,%r3535,%r3536,%r3537}, %rd136, %p39, 1, 1; 2026-02-21T10:22:31.1270283Z // end inline asm 2026-02-21T10:22:31.1270363Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:31.1270427Z mov.b32 %r3571, %r3572; 2026-02-21T10:22:31.1270493Z mov.b32 %r3570, %r2514; 2026-02-21T10:22:31.1270560Z // begin inline asm 2026-02-21T10:22:31.1271184Z // wait for regs: %r3470,%r3471,%r3472,%r3473,%r3474,%r3475,%r3476,%r3477,%r3478,%r3479,%r3480,%r3481,%r3482,%r3483,%r3484,%r3485,%r3486,%r3487,%r3488,%r3489,%r3490,%r3491,%r3492,%r3493,%r3494,%r3495,%r3496,%r3497,%r3498,%r3499,%r3500,%r3501,%r3570,%r3571,%r3572 2026-02-21T10:22:31.1271326Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:31.1271385Z // end inline asm 2026-02-21T10:22:31.1271442Z $L__tmp16: 2026-02-21T10:22:31.1271667Z .loc 1 50 126 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:50:126 2026-02-21T10:22:31.1271735Z add.s64 %rd155, %rd155, 32; 
2026-02-21T10:22:31.1271802Z add.s64 %rd154, %rd154, 128; 2026-02-21T10:22:31.1271868Z add.s64 %rd153, %rd153, 40960; 2026-02-21T10:22:31.1271943Z setp.lt.u64 %p54, %rd155, 4064; 2026-02-21T10:22:31.1272003Z @%p54 bra $L__BB0_6; 2026-02-21T10:22:31.1272117Z // %bb.7: // in Loop: Header=BB0_3 Depth=1 2026-02-21T10:22:31.1272333Z .loc 1 97 28 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:97:28 2026-02-21T10:22:31.1272414Z cvt.rn.bf16x2.f32 %r3695, %r3471, %r3470; 2026-02-21T10:22:31.1272489Z cvt.rn.bf16x2.f32 %r3696, %r3473, %r3472; 2026-02-21T10:22:31.1272567Z cvt.rn.bf16x2.f32 %r3697, %r3475, %r3474; 2026-02-21T10:22:31.1272655Z cvt.rn.bf16x2.f32 %r3698, %r3477, %r3476; 2026-02-21T10:22:31.1272732Z cvt.rn.bf16x2.f32 %r3699, %r3479, %r3478; 2026-02-21T10:22:31.1272876Z cvt.rn.bf16x2.f32 %r3700, %r3481, %r3480; 2026-02-21T10:22:31.1272958Z cvt.rn.bf16x2.f32 %r3701, %r3483, %r3482; 2026-02-21T10:22:31.1273031Z cvt.rn.bf16x2.f32 %r3702, %r3485, %r3484; 2026-02-21T10:22:31.1273102Z cvt.rn.bf16x2.f32 %r3703, %r3487, %r3486; 2026-02-21T10:22:31.1273178Z cvt.rn.bf16x2.f32 %r3704, %r3489, %r3488; 2026-02-21T10:22:31.1273261Z cvt.rn.bf16x2.f32 %r3705, %r3491, %r3490; 2026-02-21T10:22:31.1273338Z cvt.rn.bf16x2.f32 %r3706, %r3493, %r3492; 2026-02-21T10:22:31.1273416Z cvt.rn.bf16x2.f32 %r3707, %r3495, %r3494; 2026-02-21T10:22:31.1273489Z cvt.rn.bf16x2.f32 %r3708, %r3497, %r3496; 2026-02-21T10:22:31.1273560Z cvt.rn.bf16x2.f32 %r3709, %r3499, %r3498; 2026-02-21T10:22:31.1273629Z cvt.rn.bf16x2.f32 %r3710, %r3501, %r3500; 2026-02-21T10:22:31.1273838Z .loc 1 98 43 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:98:43 2026-02-21T10:22:31.1273899Z bar.sync 0; 2026-02-21T10:22:31.1274088Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r29], {%r3695, %r3696, %r3697, %r3698}; 2026-02-21T10:22:31.1274275Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r30], {%r3699, %r3700, %r3701, %r3702}; 2026-02-21T10:22:31.1274450Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r31], 
{%r3703, %r3704, %r3705, %r3706}; 2026-02-21T10:22:31.1274623Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r32], {%r3707, %r3708, %r3709, %r3710}; 2026-02-21T10:22:31.1274689Z // begin inline asm 2026-02-21T10:22:31.1274769Z fence.proxy.async.shared::cta; 2026-02-21T10:22:31.1274884Z // end inline asm 2026-02-21T10:22:31.1274944Z bar.sync 0; 2026-02-21T10:22:31.1275032Z elect.sync %r3711|%p57, -1; 2026-02-21T10:22:31.1275102Z and.pred %p55, %p37, %p57; 2026-02-21T10:22:31.1275167Z or.b32 %r3692, %r104, %r103; 2026-02-21T10:22:31.1275233Z // begin inline asm 2026-02-21T10:22:31.1275456Z @%p55 cp.async.bulk.tensor.2d.global.shared::cta.bulk_group [%rd1, {%r3692, %r105}], [%r3694]; 2026-02-21T10:22:31.1275516Z // end inline asm 2026-02-21T10:22:31.1275593Z cp.async.bulk.commit_group; 2026-02-21T10:22:31.1275673Z cp.async.bulk.wait_group.read 0; 2026-02-21T10:22:31.1275729Z bar.sync 0; 2026-02-21T10:22:31.1275939Z .loc 1 29 120 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:29:120 2026-02-21T10:22:31.1276004Z add.s32 %r3753, %r3753, 2; 2026-02-21T10:22:31.1276072Z setp.lt.s32 %p58, %r3753, %r3; 2026-02-21T10:22:31.1276131Z @%p58 bra $L__BB0_3; 2026-02-21T10:22:31.1276197Z bra.uni $L__BB0_8; 2026-02-21T10:22:31.1276364Z $L__BB0_1: // %.._crit_edge_crit_edge 2026-02-21T10:22:31.1276698Z .loc 1 58 80 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:80 2026-02-21T10:22:31.1276770Z shl.b32 %r193, %r1, 3; 2026-02-21T10:22:31.1276833Z and.b32 %r3820, %r193, 1912; 2026-02-21T10:22:31.1276896Z and.b32 %r3819, %r1, 16; 2026-02-21T10:22:31.1277169Z .loc 1 58 53 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:53 2026-02-21T10:22:31.1277238Z shl.b32 %r3818, %r6, 13; 2026-02-21T10:22:31.1277329Z $L__BB0_8: // %._crit_edge 2026-02-21T10:22:31.1277537Z .loc 1 35 35 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:35:35 2026-02-21T10:22:31.1277605Z shr.s32 %r3736, %r3, 31; 2026-02-21T10:22:31.1277668Z shr.u32 %r3737, %r3736, 16; 
2026-02-21T10:22:31.1277730Z add.s32 %r3738, %r3, %r3737; 2026-02-21T10:22:31.1277796Z shr.s32 %r3739, %r3738, 16; 2026-02-21T10:22:31.1277998Z .loc 1 37 39 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:37:39 2026-02-21T10:22:31.1278059Z shl.b32 %r3740, %r3739, 6; 2026-02-21T10:22:31.1278120Z sub.s32 %r3741, 10, %r3740; 2026-02-21T10:22:31.1278318Z .loc 1 37 52 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:37:52 2026-02-21T10:22:31.1278384Z min.s32 %r3742, %r3741, 64; 2026-02-21T10:22:31.1278583Z .loc 1 38 45 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:38:45 2026-02-21T10:22:31.1278725Z and.b32 %r3743, %r3738, -65536; 2026-02-21T10:22:31.1278792Z sub.s32 %r3744, %r3, %r3743; 2026-02-21T10:22:31.1278984Z .loc 1 39 51 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:39:51 2026-02-21T10:22:31.1279053Z div.s32 %r3745, %r3744, %r3742; 2026-02-21T10:22:31.1279246Z .loc 1 58 53 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:53 2026-02-21T10:22:31.1279310Z shl.b32 %r3746, %r3745, 19; 2026-02-21T10:22:31.1279375Z or.b32 %r3747, %r3746, %r3818; 2026-02-21T10:22:31.1279571Z .loc 1 58 60 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:60 2026-02-21T10:22:31.1279632Z or.b32 %r3748, %r3747, %r8; 2026-02-21T10:22:31.1279826Z .loc 1 58 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:32 2026-02-21T10:22:31.1279918Z mad.wide.s32 %rd138, %r3748, 2, %rd26; 2026-02-21T10:22:31.1280120Z .loc 1 58 80 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:80 2026-02-21T10:22:31.1280187Z setp.eq.b32 %p59, %r3819, 0; 2026-02-21T10:22:31.1280258Z selp.b32 %r3749, 0, 136, %p59; 2026-02-21T10:22:31.1280321Z xor.b32 %r3750, %r3749, %r3820; 2026-02-21T10:22:31.1280384Z add.s32 %r3752, %r2514, %r3750; 2026-02-21T10:22:31.1280447Z add.s32 %r3712, %r3752, 25600; 2026-02-21T10:22:31.1280510Z mov.b32 %r3713, 0; 2026-02-21T10:22:31.1280642Z // begin inline asm 2026-02-21T10:22:31.1280798Z 
cp.async.ca.shared.global [ %r3712 + 0 ], [ %rd138 + 0 ], 0x8, %r3713; 2026-02-21T10:22:31.1280862Z // end inline asm 2026-02-21T10:22:31.1280930Z cp.async.commit_group; 2026-02-21T10:22:31.1281125Z .loc 1 58 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:32 2026-02-21T10:22:31.1281197Z add.s64 %rd139, %rd138, 32; 2026-02-21T10:22:31.1281391Z .loc 1 58 80 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:80 2026-02-21T10:22:31.1281455Z add.s32 %r3714, %r3752, 31744; 2026-02-21T10:22:31.1281516Z // begin inline asm 2026-02-21T10:22:31.1281660Z cp.async.ca.shared.global [ %r3714 + 0 ], [ %rd139 + 0 ], 0x8, %r3713; 2026-02-21T10:22:31.1281718Z // end inline asm 2026-02-21T10:22:31.1281784Z cp.async.commit_group; 2026-02-21T10:22:31.1281980Z .loc 1 58 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:32 2026-02-21T10:22:31.1282042Z add.s64 %rd140, %rd138, 64; 2026-02-21T10:22:31.1282327Z .loc 1 58 80 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:80 2026-02-21T10:22:31.1282397Z add.s32 %r3716, %r3752, 37888; 2026-02-21T10:22:31.1282457Z // begin inline asm 2026-02-21T10:22:31.1282593Z cp.async.ca.shared.global [ %r3716 + 0 ], [ %rd140 + 0 ], 0x8, %r3713; 2026-02-21T10:22:31.1282695Z // end inline asm 2026-02-21T10:22:31.1282766Z cp.async.commit_group; 2026-02-21T10:22:31.1282960Z .loc 1 58 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:32 2026-02-21T10:22:31.1283022Z add.s64 %rd141, %rd138, 96; 2026-02-21T10:22:31.1283218Z .loc 1 58 80 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:80 2026-02-21T10:22:31.1283278Z add.s32 %r3718, %r3752, 44032; 2026-02-21T10:22:31.1283337Z // begin inline asm 2026-02-21T10:22:31.1283476Z cp.async.ca.shared.global [ %r3718 + 0 ], [ %rd141 + 0 ], 0x8, %r3713; 2026-02-21T10:22:31.1283535Z // end inline asm 2026-02-21T10:22:31.1283602Z cp.async.commit_group; 2026-02-21T10:22:31.1283791Z .loc 1 58 32 // 
cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:32 2026-02-21T10:22:31.1283857Z add.s64 %rd142, %rd138, 128; 2026-02-21T10:22:31.1284048Z .loc 1 58 80 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:80 2026-02-21T10:22:31.1284107Z bar.sync 0; 2026-02-21T10:22:31.1284172Z add.s32 %r3720, %r3752, 27648; 2026-02-21T10:22:31.1284231Z // begin inline asm 2026-02-21T10:22:31.1284417Z cp.async.ca.shared.global [ %r3720 + 0 ], [ %rd142 + 0 ], 0x8, %r3713; 2026-02-21T10:22:31.1284479Z // end inline asm 2026-02-21T10:22:31.1284543Z cp.async.commit_group; 2026-02-21T10:22:31.1284733Z .loc 1 58 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:32 2026-02-21T10:22:31.1284794Z add.s64 %rd143, %rd138, 160; 2026-02-21T10:22:31.1284989Z .loc 1 58 80 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:80 2026-02-21T10:22:31.1285051Z add.s32 %r3722, %r3752, 33792; 2026-02-21T10:22:31.1285112Z // begin inline asm 2026-02-21T10:22:31.1285248Z cp.async.ca.shared.global [ %r3722 + 0 ], [ %rd143 + 0 ], 0x8, %r3713; 2026-02-21T10:22:31.1285304Z // end inline asm 2026-02-21T10:22:31.1285380Z cp.async.commit_group; 2026-02-21T10:22:31.1285581Z .loc 1 58 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:32 2026-02-21T10:22:31.1285648Z add.s64 %rd144, %rd138, 192; 2026-02-21T10:22:31.1285839Z .loc 1 58 80 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:80 2026-02-21T10:22:31.1285899Z add.s32 %r3724, %r3752, 39936; 2026-02-21T10:22:31.1285963Z // begin inline asm 2026-02-21T10:22:31.1286096Z cp.async.ca.shared.global [ %r3724 + 0 ], [ %rd144 + 0 ], 0x8, %r3713; 2026-02-21T10:22:31.1286152Z // end inline asm 2026-02-21T10:22:31.1286220Z cp.async.commit_group; 2026-02-21T10:22:31.1286600Z .loc 1 58 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:32 2026-02-21T10:22:31.1286668Z add.s64 %rd145, %rd138, 224; 2026-02-21T10:22:31.1286866Z .loc 1 58 80 // 
cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:80 2026-02-21T10:22:31.1286926Z add.s32 %r3726, %r3752, 46080; 2026-02-21T10:22:31.1286986Z // begin inline asm 2026-02-21T10:22:31.1287119Z cp.async.ca.shared.global [ %r3726 + 0 ], [ %rd145 + 0 ], 0x8, %r3713; 2026-02-21T10:22:31.1287184Z // end inline asm 2026-02-21T10:22:31.1287260Z cp.async.commit_group; 2026-02-21T10:22:31.1287455Z .loc 1 58 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:32 2026-02-21T10:22:31.1287521Z add.s64 %rd146, %rd138, 256; 2026-02-21T10:22:31.1287712Z .loc 1 58 80 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:80 2026-02-21T10:22:31.1287767Z bar.sync 0; 2026-02-21T10:22:31.1287836Z add.s32 %r3728, %r3752, 29696; 2026-02-21T10:22:31.1287968Z // begin inline asm 2026-02-21T10:22:31.1288106Z cp.async.ca.shared.global [ %r3728 + 0 ], [ %rd146 + 0 ], 0x8, %r3713; 2026-02-21T10:22:31.1288170Z // end inline asm 2026-02-21T10:22:31.1288238Z cp.async.commit_group; 2026-02-21T10:22:31.1288430Z .loc 1 58 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:32 2026-02-21T10:22:31.1288550Z add.s64 %rd147, %rd138, 288; 2026-02-21T10:22:31.1288745Z .loc 1 58 80 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:80 2026-02-21T10:22:31.1288806Z add.s32 %r3730, %r3752, 35840; 2026-02-21T10:22:31.1288864Z // begin inline asm 2026-02-21T10:22:31.1289001Z cp.async.ca.shared.global [ %r3730 + 0 ], [ %rd147 + 0 ], 0x8, %r3713; 2026-02-21T10:22:31.1289057Z // end inline asm 2026-02-21T10:22:31.1289123Z cp.async.commit_group; 2026-02-21T10:22:31.1289314Z .loc 1 58 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:32 2026-02-21T10:22:31.1289387Z add.s64 %rd148, %rd138, 320; 2026-02-21T10:22:31.1289578Z .loc 1 58 80 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:80 2026-02-21T10:22:31.1289640Z add.s32 %r3732, %r3752, 41984; 2026-02-21T10:22:31.1289703Z // begin inline asm 2026-02-21T10:22:31.1289841Z 
cp.async.ca.shared.global [ %r3732 + 0 ], [ %rd148 + 0 ], 0x8, %r3713; 2026-02-21T10:22:31.1289898Z // end inline asm 2026-02-21T10:22:31.1289968Z cp.async.commit_group; 2026-02-21T10:22:31.1290226Z .loc 1 58 32 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:32 2026-02-21T10:22:31.1290290Z add.s64 %rd149, %rd138, 352; 2026-02-21T10:22:31.1290482Z .loc 1 58 80 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:58:80 2026-02-21T10:22:31.1290548Z add.s32 %r3734, %r3752, 48128; 2026-02-21T10:22:31.1290607Z // begin inline asm 2026-02-21T10:22:31.1290743Z cp.async.ca.shared.global [ %r3734 + 0 ], [ %rd149 + 0 ], 0x8, %r3713; 2026-02-21T10:22:31.1290808Z // end inline asm 2026-02-21T10:22:31.1290873Z cp.async.commit_group; 2026-02-21T10:22:31.1291077Z .loc 1 29 120 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:29:120 2026-02-21T10:22:31.1291168Z cp.async.bulk.wait_group.read 0; 2026-02-21T10:22:31.1291229Z bar.sync 0; 2026-02-21T10:22:31.1291296Z cp.async.wait_group 0; 2026-02-21T10:22:31.1291352Z bar.sync 0; 2026-02-21T10:22:31.1291552Z .loc 1 29 4 // cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py:29:4 2026-02-21T10:22:31.1291607Z ret; 2026-02-21T10:22:31.1291663Z $L__tmp17: 2026-02-21T10:22:31.1291724Z $L__func_end0: 2026-02-21T10:22:31.1291813Z // -- End function 2026-02-21T10:22:31.1291866Z } 2026-02-21T10:22:31.1292105Z .file 1 "/tmp/torchinductor_root/r6/cr62dflg62nr3olf3vyvdyblkkwn5kq4wc7lv62qre7dfkryjttl.py" 2026-02-21T10:22:31.1292321Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T10:22:31.1292460Z .section .debug_abbrev 2026-02-21T10:22:31.1292511Z { 2026-02-21T10:22:31.1292609Z .b8 1 // Abbreviation Code 2026-02-21T10:22:31.1292703Z .b8 17 // DW_TAG_compile_unit 2026-02-21T10:22:31.1292792Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:22:31.1292891Z .b8 37 // DW_AT_producer 2026-02-21T10:22:31.1292977Z .b8 8 // DW_FORM_string 2026-02-21T10:22:31.1293057Z .b8 19 // 
DW_AT_language 2026-02-21T10:22:31.1293139Z .b8 5 // DW_FORM_data2 2026-02-21T10:22:31.1293220Z .b8 3 // DW_AT_name 2026-02-21T10:22:31.1293300Z .b8 8 // DW_FORM_string 2026-02-21T10:22:31.1293385Z .b8 16 // DW_AT_stmt_list 2026-02-21T10:22:31.1293520Z .b8 6 // DW_FORM_data4 2026-02-21T10:22:31.1293601Z .b8 27 // DW_AT_comp_dir 2026-02-21T10:22:31.1293680Z .b8 8 // DW_FORM_string 2026-02-21T10:22:31.1293758Z .b8 0 // EOM(1) 2026-02-21T10:22:31.1293871Z .b8 0 // EOM(2) 2026-02-21T10:22:31.1293959Z .b8 2 // Abbreviation Code 2026-02-21T10:22:31.1294047Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:22:31.1294130Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:22:31.1294206Z .b8 3 // DW_AT_name 2026-02-21T10:22:31.1294284Z .b8 8 // DW_FORM_string 2026-02-21T10:22:31.1294369Z .b8 32 // DW_AT_inline 2026-02-21T10:22:31.1294449Z .b8 11 // DW_FORM_data1 2026-02-21T10:22:31.1294523Z .b8 0 // EOM(1) 2026-02-21T10:22:31.1294608Z .b8 0 // EOM(2) 2026-02-21T10:22:31.1294696Z .b8 3 // Abbreviation Code 2026-02-21T10:22:31.1294784Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:22:31.1294874Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:22:31.1294955Z .b8 17 // DW_AT_low_pc 2026-02-21T10:22:31.1295083Z .b8 1 // DW_FORM_addr 2026-02-21T10:22:31.1295166Z .b8 18 // DW_AT_high_pc 2026-02-21T10:22:31.1295247Z .b8 1 // DW_FORM_addr 2026-02-21T10:22:31.1295339Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:22:31.1295416Z .b8 19 // DW_FORM_ref4 2026-02-21T10:22:31.1295490Z .b8 0 // EOM(1) 2026-02-21T10:22:31.1295565Z .b8 0 // EOM(2) 2026-02-21T10:22:31.1295651Z .b8 4 // Abbreviation Code 2026-02-21T10:22:31.1295754Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T10:22:31.1295834Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:22:31.1295926Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:22:31.1296002Z .b8 19 // DW_FORM_ref4 2026-02-21T10:22:31.1296085Z .b8 17 // DW_AT_low_pc 2026-02-21T10:22:31.1296159Z .b8 1 // DW_FORM_addr 2026-02-21T10:22:31.1296241Z .b8 18 // DW_AT_high_pc 2026-02-21T10:22:31.1296321Z .b8 1 // 
DW_FORM_addr 2026-02-21T10:22:31.1296403Z .b8 88 // DW_AT_call_file 2026-02-21T10:22:31.1296603Z .b8 11 // DW_FORM_data1 2026-02-21T10:22:31.1296787Z .b8 89 // DW_AT_call_line 2026-02-21T10:22:31.1296866Z .b8 11 // DW_FORM_data1 2026-02-21T10:22:31.1296949Z .b8 87 // DW_AT_call_column 2026-02-21T10:22:31.1297026Z .b8 11 // DW_FORM_data1 2026-02-21T10:22:31.1297102Z .b8 0 // EOM(1) 2026-02-21T10:22:31.1297175Z .b8 0 // EOM(2) 2026-02-21T10:22:31.1297246Z .b8 0 // EOM(3) 2026-02-21T10:22:31.1297302Z } 2026-02-21T10:22:31.1297365Z .section .debug_info 2026-02-21T10:22:31.1297416Z { 2026-02-21T10:22:31.1297509Z .b32 178 // Length of Unit 2026-02-21T10:22:31.1297600Z .b8 2 // DWARF version number 2026-02-21T10:22:31.1297652Z .b8 0 2026-02-21T10:22:31.1297782Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T10:22:31.1297949Z .b8 8 // Address Size (in bytes) 2026-02-21T10:22:31.1298066Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T10:22:31.1298153Z .b8 116 // DW_AT_producer 2026-02-21T10:22:31.1298273Z .b8 114 2026-02-21T10:22:31.1298325Z .b8 105 2026-02-21T10:22:31.1298376Z .b8 116 2026-02-21T10:22:31.1298428Z .b8 111 2026-02-21T10:22:31.1298482Z .b8 110 2026-02-21T10:22:31.1298532Z .b8 0 2026-02-21T10:22:31.1298625Z .b8 2 // DW_AT_language 2026-02-21T10:22:31.1298682Z .b8 0 2026-02-21T10:22:31.1298762Z .b8 99 // DW_AT_name 2026-02-21T10:22:31.1298816Z .b8 114 2026-02-21T10:22:31.1298868Z .b8 54 2026-02-21T10:22:31.1298921Z .b8 50 2026-02-21T10:22:31.1298973Z .b8 100 2026-02-21T10:22:31.1299024Z .b8 102 2026-02-21T10:22:31.1299079Z .b8 108 2026-02-21T10:22:31.1299130Z .b8 103 2026-02-21T10:22:31.1299184Z .b8 54 2026-02-21T10:22:31.1299236Z .b8 50 2026-02-21T10:22:31.1299290Z .b8 110 2026-02-21T10:22:31.1299341Z .b8 114 2026-02-21T10:22:31.1299392Z .b8 51 2026-02-21T10:22:31.1299445Z .b8 111 2026-02-21T10:22:31.1299498Z .b8 108 2026-02-21T10:22:31.1299561Z .b8 102 2026-02-21T10:22:31.1299613Z .b8 51 2026-02-21T10:22:31.1299669Z .b8 118 
2026-02-21T10:22:31.1299723Z .b8 121 2026-02-21T10:22:31.1299775Z .b8 118 2026-02-21T10:22:31.1299829Z .b8 100 2026-02-21T10:22:31.1299881Z .b8 121 2026-02-21T10:22:31.1299931Z .b8 98 2026-02-21T10:22:31.1300053Z .b8 108 2026-02-21T10:22:31.1300112Z .b8 107 2026-02-21T10:22:31.1300163Z .b8 107 2026-02-21T10:22:31.1300214Z .b8 119 2026-02-21T10:22:31.1300264Z .b8 110 2026-02-21T10:22:31.1300317Z .b8 53 2026-02-21T10:22:31.1300368Z .b8 107 2026-02-21T10:22:31.1300419Z .b8 113 2026-02-21T10:22:31.1300475Z .b8 52 2026-02-21T10:22:31.1300525Z .b8 119 2026-02-21T10:22:31.1300575Z .b8 99 2026-02-21T10:22:31.1300625Z .b8 55 2026-02-21T10:22:31.1300682Z .b8 108 2026-02-21T10:22:31.1300737Z .b8 118 2026-02-21T10:22:31.1300788Z .b8 54 2026-02-21T10:22:31.1300852Z .b8 50 2026-02-21T10:22:31.1300903Z .b8 113 2026-02-21T10:22:31.1300957Z .b8 114 2026-02-21T10:22:31.1301007Z .b8 101 2026-02-21T10:22:31.1301062Z .b8 55 2026-02-21T10:22:31.1301113Z .b8 100 2026-02-21T10:22:31.1301164Z .b8 102 2026-02-21T10:22:31.1301216Z .b8 107 2026-02-21T10:22:31.1301274Z .b8 114 2026-02-21T10:22:31.1301328Z .b8 121 2026-02-21T10:22:31.1301379Z .b8 106 2026-02-21T10:22:31.1301433Z .b8 116 2026-02-21T10:22:31.1301484Z .b8 116 2026-02-21T10:22:31.1301536Z .b8 108 2026-02-21T10:22:31.1301587Z .b8 46 2026-02-21T10:22:31.1301642Z .b8 112 2026-02-21T10:22:31.1301695Z .b8 121 2026-02-21T10:22:31.1301745Z .b8 0 2026-02-21T10:22:31.1301852Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T10:22:31.1301935Z .b8 47 // DW_AT_comp_dir 2026-02-21T10:22:31.1301988Z .b8 116 2026-02-21T10:22:31.1302039Z .b8 109 2026-02-21T10:22:31.1302095Z .b8 112 2026-02-21T10:22:31.1302224Z .b8 47 2026-02-21T10:22:31.1302277Z .b8 116 2026-02-21T10:22:31.1302331Z .b8 111 2026-02-21T10:22:31.1302382Z .b8 114 2026-02-21T10:22:31.1302432Z .b8 99 2026-02-21T10:22:31.1302482Z .b8 104 2026-02-21T10:22:31.1302536Z .b8 105 2026-02-21T10:22:31.1302586Z .b8 110 2026-02-21T10:22:31.1302637Z .b8 100 2026-02-21T10:22:31.1302690Z .b8 117 
2026-02-21T10:22:31.1302742Z .b8 99 2026-02-21T10:22:31.1302792Z .b8 116 2026-02-21T10:22:31.1302842Z .b8 111 2026-02-21T10:22:31.1302896Z .b8 114 2026-02-21T10:22:31.1302946Z .b8 95 2026-02-21T10:22:31.1302998Z .b8 114 2026-02-21T10:22:31.1303049Z .b8 111 2026-02-21T10:22:31.1303103Z .b8 111 2026-02-21T10:22:31.1303154Z .b8 116 2026-02-21T10:22:31.1303204Z .b8 47 2026-02-21T10:22:31.1303262Z .b8 114 2026-02-21T10:22:31.1303319Z .b8 54 2026-02-21T10:22:31.1303371Z .b8 0 2026-02-21T10:22:31.1303485Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T10:22:31.1303567Z .b8 95 // DW_AT_name 2026-02-21T10:22:31.1303627Z .b8 104 2026-02-21T10:22:31.1303731Z .b8 101 2026-02-21T10:22:31.1303788Z .b8 108 2026-02-21T10:22:31.1303840Z .b8 105 2026-02-21T10:22:31.1303890Z .b8 111 2026-02-21T10:22:31.1303940Z .b8 110 2026-02-21T10:22:31.1303994Z .b8 95 2026-02-21T10:22:31.1304046Z .b8 109 2026-02-21T10:22:31.1304096Z .b8 97 2026-02-21T10:22:31.1304195Z .b8 116 2026-02-21T10:22:31.1304246Z .b8 109 2026-02-21T10:22:31.1304296Z .b8 117 2026-02-21T10:22:31.1304346Z .b8 108 2026-02-21T10:22:31.1304401Z .b8 95 2026-02-21T10:22:31.1304451Z .b8 98 2026-02-21T10:22:31.1304503Z .b8 102 2026-02-21T10:22:31.1304559Z .b8 49 2026-02-21T10:22:31.1304609Z .b8 54 2026-02-21T10:22:31.1304660Z .b8 95 2026-02-21T10:22:31.1304711Z .b8 105 2026-02-21T10:22:31.1304764Z .b8 110 2026-02-21T10:22:31.1304814Z .b8 116 2026-02-21T10:22:31.1304865Z .b8 52 2026-02-21T10:22:31.1304915Z .b8 0 2026-02-21T10:22:31.1304998Z .b8 1 // DW_AT_inline 2026-02-21T10:22:31.1305104Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T10:22:31.1305205Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T10:22:31.1305304Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T10:22:31.1305403Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:22:31.1305531Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T10:22:31.1305633Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:22:31.1305781Z .b64 $L__tmp1 // 
DW_AT_low_pc 2026-02-21T10:22:31.1305879Z .b64 $L__tmp16 // DW_AT_high_pc 2026-02-21T10:22:31.1305969Z .b8 1 // DW_AT_call_file 2026-02-21T10:22:31.1306053Z .b8 94 // DW_AT_call_line 2026-02-21T10:22:31.1306138Z .b8 40 // DW_AT_call_column 2026-02-21T10:22:31.1306228Z .b8 0 // End Of Children Mark 2026-02-21T10:22:31.1306323Z .b8 0 // End Of Children Mark 2026-02-21T10:22:31.1306375Z } 2026-02-21T10:22:31.1306444Z .section .debug_macinfo { } 2026-02-21T10:22:31.1306565Z 2026-02-21T10:22:31.1306653Z ================================================================ 2026-02-21T10:22:31.1306774Z please share the reproducer above with Triton project. 2026-02-21T10:22:35.4294344Z 2026-02-21T10:22:35.4294358Z 2026-02-21T10:22:35.4294364Z 2026-02-21T10:22:35.4294725Z ================================================================ 2026-02-21T10:22:35.4295121Z Internal Triton PTX codegen error 2026-02-21T10:22:35.4295384Z `ptxas` stderr: 2026-02-21T10:22:35.4296134Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 773 in function _helion_matmul_bf16_int4. Try to compile with register target of 46 or higher. 
2026-02-21T10:22:35.4297156Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:22:35.4297963Z 2026-02-21T10:22:35.4298647Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpme44ze5s.ptx -o /tmp/tmpme44ze5s.ptx.o 2026-02-21T10:22:35.4307243Z 2026-02-21T10:22:35.4307252Z 2026-02-21T10:22:35.4307387Z // 2026-02-21T10:22:35.4307583Z // Generated by LLVM NVPTX Back-End 2026-02-21T10:22:35.4307855Z // 2026-02-21T10:22:35.4307937Z 2026-02-21T10:22:35.4308010Z .version 8.7 2026-02-21T10:22:35.4308171Z .target sm_90a 2026-02-21T10:22:35.4308364Z .address_size 64 2026-02-21T10:22:35.4308551Z 2026-02-21T10:22:35.4308773Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T10:22:35.4309205Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T10:22:35.4309511Z // @_helion_matmul_bf16_int4 2026-02-21T10:22:35.4309818Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T10:22:35.4310163Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T10:22:35.4310725Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T10:22:35.4311141Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T10:22:35.4311536Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T10:22:35.4311941Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T10:22:35.4312371Z ) 2026-02-21T10:22:35.4312533Z .reqntid 512 2026-02-21T10:22:35.4312697Z .maxnreg 32 2026-02-21T10:22:35.4312859Z { 2026-02-21T10:22:35.4313022Z .reg .pred %p<485>; 2026-02-21T10:22:35.4313212Z .reg .b16 %rs<1345>; 2026-02-21T10:22:35.4313407Z .reg .b32 %r<14333>; 2026-02-21T10:22:35.4313588Z .reg .b64 %rd<599>; 2026-02-21T10:22:35.4313955Z .loc 1 19 0 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:19:0 2026-02-21T10:22:35.4314380Z $L__func_begin0: 
2026-02-21T10:22:35.4314731Z .loc 1 19 0 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:19:0 2026-02-21T10:22:35.4315090Z 2026-02-21T10:22:35.4315160Z // %bb.0: 2026-02-21T10:22:35.4315405Z ld.param.b64 %rd77, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T10:22:35.4315767Z ld.param.b64 %rd76, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T10:22:35.4316040Z $L__tmp0: 2026-02-21T10:22:35.4316380Z .loc 1 21 67 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:21:67 2026-02-21T10:22:35.4316958Z mov.u32 %r14246, %ctaid.x; 2026-02-21T10:22:35.4317408Z ld.param.b64 %rd79, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T10:22:35.4317717Z mov.u32 %r243, %ctaid.y; 2026-02-21T10:22:35.4317985Z ld.param.b64 %rd96, [_helion_matmul_bf16_int4_param_3]; 2026-02-21T10:22:35.4318271Z mov.u32 %r244, %ctaid.z; 2026-02-21T10:22:35.4318465Z mov.u32 %r245, %nctaid.x; 2026-02-21T10:22:35.4318667Z mov.u32 %r246, %nctaid.y; 2026-02-21T10:22:35.4318870Z mad.lo.s32 %r247, %r244, %r246, %r243; 2026-02-21T10:22:35.4319122Z mad.lo.s32 %r248, %r247, %r245, %r14246; 2026-02-21T10:22:35.4319357Z shl.b32 %r249, %r248, 7; 2026-02-21T10:22:35.4319557Z cvt.s64.s32 %rd97, %r249; 2026-02-21T10:22:35.4319758Z add.s64 %rd93, %rd96, %rd97; 2026-02-21T10:22:35.4319978Z mov.u32 %r2, %tid.x; 2026-02-21T10:22:35.4320175Z setp.lt.u32 %p1, %r2, 32; 2026-02-21T10:22:35.4320361Z shl.b32 %r250, %r2, 2; 2026-02-21T10:22:35.4320590Z mov.b32 %r11026, global_smem; 2026-02-21T10:22:35.4320811Z add.s32 %r235, %r11026, %r250; 2026-02-21T10:22:35.4321011Z mov.b32 %r236, 0; 2026-02-21T10:22:35.4321169Z // begin inline asm 2026-02-21T10:22:35.4321359Z @%p1 st.shared.b32 [ %r235 + 0 ], %r236; 2026-02-21T10:22:35.4321571Z // end inline asm 2026-02-21T10:22:35.4321745Z bar.warp.sync -1; 2026-02-21T10:22:35.4321913Z setp.eq.b32 %p472, %r2, 0; 2026-02-21T10:22:35.4322119Z cvt.u64.u32 %rd78, %r11026; 2026-02-21T10:22:35.4322327Z // begin inline asm 2026-02-21T10:22:35.4322684Z @%p472 
tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd78 + 0 ], %rd79; 2026-02-21T10:22:35.4323158Z // end inline asm 2026-02-21T10:22:35.4323316Z // begin inline asm 2026-02-21T10:22:35.4323600Z @%p472 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd78 + 0 ], 0x1; 2026-02-21T10:22:35.4323920Z // end inline asm 2026-02-21T10:22:35.4324088Z mov.b32 %r237, 128; 2026-02-21T10:22:35.4324260Z // begin inline asm 2026-02-21T10:22:35.4324554Z @%p472 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd78 + 0 ], 0x0, %r237; 2026-02-21T10:22:35.4324935Z // end inline asm 2026-02-21T10:22:35.4325120Z mov.b32 %r238, 32; 2026-02-21T10:22:35.4325289Z // begin inline asm 2026-02-21T10:22:35.4325578Z @%p472 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd78 + 0 ], 0x1, %r238; 2026-02-21T10:22:35.4325936Z // end inline asm 2026-02-21T10:22:35.4326085Z mov.b32 %r239, 1280; 2026-02-21T10:22:35.4326260Z // begin inline asm 2026-02-21T10:22:35.4326762Z @%p472 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd78 + 0 ], 0x0, %r239; 2026-02-21T10:22:35.4327221Z // end inline asm 2026-02-21T10:22:35.4327399Z mov.b32 %r240, 4096; 2026-02-21T10:22:35.4327558Z // begin inline asm 2026-02-21T10:22:35.4327892Z @%p472 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd78 + 0 ], 0x1, %r240; 2026-02-21T10:22:35.4328259Z // end inline asm 2026-02-21T10:22:35.4328521Z mov.b64 %rd86, 1280; 2026-02-21T10:22:35.4328692Z // begin inline asm 2026-02-21T10:22:35.4329007Z @%p472 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd78 + 0 ], 0x0, %rd86; 2026-02-21T10:22:35.4329379Z // end inline asm 2026-02-21T10:22:35.4329536Z mov.b32 %r241, 1; 2026-02-21T10:22:35.4329696Z // begin inline asm 2026-02-21T10:22:35.4330015Z @%p472 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd78 + 0 ], 0x0, %r241; 2026-02-21T10:22:35.4330396Z // end inline asm 2026-02-21T10:22:35.4330556Z // begin inline asm 2026-02-21T10:22:35.4330877Z 
@%p472 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd78 + 0 ], 0x1, %r241; 2026-02-21T10:22:35.4331262Z // end inline asm 2026-02-21T10:22:35.4331412Z // begin inline asm 2026-02-21T10:22:35.4331709Z @%p472 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd78 + 0 ], 0x0; 2026-02-21T10:22:35.4332053Z // end inline asm 2026-02-21T10:22:35.4332208Z // begin inline asm 2026-02-21T10:22:35.4332532Z @%p472 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd78 + 0 ], 0x0; 2026-02-21T10:22:35.4333782Z [3348s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T10:22:35.4335383Z Config: @helion.kernel(config=helion.Config(block_sizes=[32, 64, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', 'last'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=1, num_stages=4, num_warps=16, pid_type='persistent_interleaved', range_flattens=[True, True], range_multi_buffers=[None, True], range_num_stages=[2, 3], range_unroll_factors=[2, 4], range_warp_specializes=[]), static_shapes=True) 2026-02-21T10:22:35.4337040Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T10:22:35.4337326Z `ptxas` stderr: 2026-02-21T10:22:35.4337884Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 773 in function _helion_matmul_bf16_int4. Try to compile with register target of 46 or higher. 2026-02-21T10:22:35.4338522Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:22:35.4338702Z 2026-02-21T10:22:35.4339209Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpme44ze5s.ptx -o /tmp/tmpme44ze5s.ptx.o 2026-02-21T10:22:35.4339788Z 2026-02-21T10:22:35.4339939Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T10:22:35.4340234Z // end inline asm 2026-02-21T10:22:35.4340385Z // begin inline asm 2026-02-21T10:22:35.4340690Z @%p472 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd78 + 0 ], 0x3; 2026-02-21T10:22:35.4341126Z // end inline asm 2026-02-21T10:22:35.4341275Z // begin inline asm 2026-02-21T10:22:35.4341558Z @%p472 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd78 + 0 ], 0x0; 2026-02-21T10:22:35.4341882Z // end inline asm 2026-02-21T10:22:35.4342033Z // begin inline asm 2026-02-21T10:22:35.4342465Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd93 + 0 ], [ %rd78 + 0 ], 0x80; 2026-02-21T10:22:35.4342941Z // end inline asm 2026-02-21T10:22:35.4343101Z // begin inline asm 2026-02-21T10:22:35.4343361Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd93 + 0 ], 0x80; 2026-02-21T10:22:35.4343677Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T10:22:35.4343896Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T10:22:35.4344108Z // end inline asm 2026-02-21T10:22:35.4344250Z bar.sync 0; 2026-02-21T10:22:35.4344413Z cvta.global.u64 %rd394, %rd93; 2026-02-21T10:22:35.4344821Z .loc 1 0 0 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:0 2026-02-21T10:22:35.4345182Z sub.s32 %r252, 10371, %r14246; 2026-02-21T10:22:35.4345525Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.4345891Z mul.hi.u32 %r253, %r252, 1041204193; 2026-02-21T10:22:35.4346177Z shr.u32 %r254, %r253, 5; 2026-02-21T10:22:35.4346350Z and.b32 %r255, %r254, 33554430; 2026-02-21T10:22:35.4346682Z mad.lo.s32 %r14314, %r255, 132, %r14246; 2026-02-21T10:22:35.4347041Z .loc 1 38 45 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:38:45 2026-02-21T10:22:35.4347398Z and.b32 %r4, %r2, 31; 2026-02-21T10:22:35.4347565Z shr.u32 %r5, %r2, 5; 2026-02-21T10:22:35.4347724Z and.b32 %r256, %r2, 15; 2026-02-21T10:22:35.4347895Z shl.b32 %r6, %r256, 3; 
2026-02-21T10:22:35.4348202Z .loc 1 40 45 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:40:45 2026-02-21T10:22:35.4348675Z shr.u32 %r257, %r2, 4; 2026-02-21T10:22:35.4348841Z bfe.u32 %r7, %r2, 4, 5; 2026-02-21T10:22:35.4349010Z or.b32 %r8, %r257, 32; 2026-02-21T10:22:35.4349170Z shl.b32 %r10, %r256, 2; 2026-02-21T10:22:35.4349339Z and.b32 %r11, %r2, 7; 2026-02-21T10:22:35.4349650Z .loc 1 71 38 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:71:38 2026-02-21T10:22:35.4349998Z and.b32 %r13, %r2, 128; 2026-02-21T10:22:35.4350397Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.4350800Z setp.ge.s32 %p19, %r14246, %r14314; 2026-02-21T10:22:35.4351006Z and.b32 %r14236, %r2, 96; 2026-02-21T10:22:35.4351185Z shl.b32 %r14237, %r2, 5; 2026-02-21T10:22:35.4351356Z shl.b32 %r14238, %r2, 1; 2026-02-21T10:22:35.4351532Z and.b32 %r14239, %r2, 127; 2026-02-21T10:22:35.4351715Z bfe.s32 %r14240, %r2, 8, 1; 2026-02-21T10:22:35.4351901Z shl.b32 %r14241, %r11, 4; 2026-02-21T10:22:35.4352076Z shl.b32 %r14242, %r5, 3; 2026-02-21T10:22:35.4352259Z and.b32 %r14243, %r2, 3; 2026-02-21T10:22:35.4352427Z and.b32 %r14244, %r2, 24; 2026-02-21T10:22:35.4352595Z and.b32 %r14245, %r2, 384; 2026-02-21T10:22:35.4352781Z setp.eq.b32 %p484, %r13, 0; 2026-02-21T10:22:35.4352960Z @%p19 bra $L__BB0_7; 2026-02-21T10:22:35.4353143Z // %bb.1: // %.lr.ph 2026-02-21T10:22:35.4353513Z .loc 1 0 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:0:107 2026-02-21T10:22:35.4353878Z bfe.u32 %r9, %r2, 3, 6; 2026-02-21T10:22:35.4354044Z shl.b32 %r12, %r11, 3; 2026-02-21T10:22:35.4354213Z shl.b32 %r258, %r2, 4; 2026-02-21T10:22:35.4354380Z and.b32 %r259, %r258, 8176; 2026-02-21T10:22:35.4354551Z and.b32 %r260, %r2, 56; 2026-02-21T10:22:35.4354730Z xor.b32 %r261, %r259, %r260; 2026-02-21T10:22:35.4354916Z add.s32 %r14, %r11026, %r261; 2026-02-21T10:22:35.4355099Z xor.b32 %r263, %r261, 8; 2026-02-21T10:22:35.4355266Z add.s32 %r15, 
%r11026, %r263; 2026-02-21T10:22:35.4355544Z shl.b32 %r265, %r14236, 6; 2026-02-21T10:22:35.4355721Z and.b32 %r267, %r14237, 896; 2026-02-21T10:22:35.4355900Z and.b32 %r269, %r14238, 62; 2026-02-21T10:22:35.4356070Z or.b32 %r270, %r265, %r267; 2026-02-21T10:22:35.4356245Z or.b32 %r271, %r270, %r269; 2026-02-21T10:22:35.4356417Z add.s32 %r16, %r11026, %r271; 2026-02-21T10:22:35.4356728Z xor.b32 %r272, %r271, 8; 2026-02-21T10:22:35.4356903Z add.s32 %r17, %r11026, %r272; 2026-02-21T10:22:35.4357076Z xor.b32 %r273, %r271, 16; 2026-02-21T10:22:35.4357266Z add.s32 %r18, %r11026, %r273; 2026-02-21T10:22:35.4357440Z xor.b32 %r274, %r271, 24; 2026-02-21T10:22:35.4357609Z add.s32 %r19, %r11026, %r274; 2026-02-21T10:22:35.4357792Z xor.b32 %r275, %r271, 32; 2026-02-21T10:22:35.4357964Z add.s32 %r20, %r11026, %r275; 2026-02-21T10:22:35.4358140Z xor.b32 %r276, %r271, 40; 2026-02-21T10:22:35.4358307Z add.s32 %r21, %r11026, %r276; 2026-02-21T10:22:35.4358482Z xor.b32 %r277, %r271, 48; 2026-02-21T10:22:35.4358732Z add.s32 %r22, %r11026, %r277; 2026-02-21T10:22:35.4358914Z xor.b32 %r278, %r271, 56; 2026-02-21T10:22:35.4359078Z add.s32 %r23, %r11026, %r278; 2026-02-21T10:22:35.4359259Z and.b32 %r281, %r14240, 144; 2026-02-21T10:22:35.4359437Z xor.b32 %r282, %r281, %r14239; 2026-02-21T10:22:35.4359622Z add.s32 %r24, %r11026, %r282; 2026-02-21T10:22:35.4359906Z xor.b32 %r283, %r282, 32; 2026-02-21T10:22:35.4360080Z add.s32 %r25, %r11026, %r283; 2026-02-21T10:22:35.4360257Z xor.b32 %r284, %r282, 64; 2026-02-21T10:22:35.4360424Z add.s32 %r26, %r11026, %r284; 2026-02-21T10:22:35.4360603Z xor.b32 %r285, %r282, 96; 2026-02-21T10:22:35.4360771Z add.s32 %r27, %r11026, %r285; 2026-02-21T10:22:35.4360952Z shl.b32 %r286, %r14239, 7; 2026-02-21T10:22:35.4361125Z and.b32 %r288, %r5, 12; 2026-02-21T10:22:35.4361311Z or.b32 %r289, %r286, %r288; 2026-02-21T10:22:35.4361487Z or.b32 %r290, %r289, %r14241; 2026-02-21T10:22:35.4361663Z add.s32 %r28, %r11026, %r290; 2026-02-21T10:22:35.4361837Z xor.b32 
%r291, %r290, 16; 2026-02-21T10:22:35.4362012Z add.s32 %r29, %r11026, %r291; 2026-02-21T10:22:35.4362193Z xor.b32 %r292, %r290, 32; 2026-02-21T10:22:35.4362357Z add.s32 %r30, %r11026, %r292; 2026-02-21T10:22:35.4362535Z xor.b32 %r293, %r290, 48; 2026-02-21T10:22:35.4362699Z add.s32 %r31, %r11026, %r293; 2026-02-21T10:22:35.4362881Z xor.b32 %r294, %r290, 64; 2026-02-21T10:22:35.4363044Z add.s32 %r32, %r11026, %r294; 2026-02-21T10:22:35.4363215Z xor.b32 %r295, %r290, 80; 2026-02-21T10:22:35.4363467Z add.s32 %r33, %r11026, %r295; 2026-02-21T10:22:35.4363649Z xor.b32 %r296, %r290, 96; 2026-02-21T10:22:35.4363810Z add.s32 %r34, %r11026, %r296; 2026-02-21T10:22:35.4364001Z xor.b32 %r297, %r290, 112; 2026-02-21T10:22:35.4364172Z add.s32 %r35, %r11026, %r297; 2026-02-21T10:22:35.4364344Z and.b32 %r299, %r14242, 120; 2026-02-21T10:22:35.4364519Z or.b32 %r300, %r299, %r4; 2026-02-21T10:22:35.4364681Z shl.b32 %r301, %r300, 4; 2026-02-21T10:22:35.4364851Z add.s32 %r302, %r11026, 32768; 2026-02-21T10:22:35.4365036Z add.s32 %r4407, %r302, %r301; 2026-02-21T10:22:35.4365212Z shl.b32 %r303, %r2, 6; 2026-02-21T10:22:35.4365372Z and.b32 %r304, %r303, 1536; 2026-02-21T10:22:35.4365547Z shl.b32 %r305, %r14236, 2; 2026-02-21T10:22:35.4365720Z add.s32 %r306, %r302, %r304; 2026-02-21T10:22:35.4365894Z add.s32 %r307, %r306, %r305; 2026-02-21T10:22:35.4366074Z add.s32 %r433, %r307, %r14241; 2026-02-21T10:22:35.4366252Z bfe.u32 %r308, %r11026, 4, 14; 2026-02-21T10:22:35.4366436Z cvt.u64.u32 %rd98, %r308; 2026-02-21T10:22:35.4366753Z or.b64 %rd266, %rd98, 4611686293372403712; 2026-02-21T10:22:35.4366968Z add.s32 %r309, %r11026, 32; 2026-02-21T10:22:35.4367142Z bfe.u32 %r310, %r309, 4, 14; 2026-02-21T10:22:35.4367320Z cvt.u64.u32 %rd99, %r310; 2026-02-21T10:22:35.4367499Z or.b64 %rd267, %rd99, 4611686293372403712; 2026-02-21T10:22:35.4367721Z add.s32 %r311, %r11026, 64; 2026-02-21T10:22:35.4367903Z bfe.u32 %r312, %r311, 4, 14; 2026-02-21T10:22:35.4368168Z cvt.u64.u32 %rd100, %r312; 
2026-02-21T10:22:35.4368363Z or.b64 %rd268, %rd100, 4611686293372403712; 2026-02-21T10:22:35.4368569Z add.s32 %r313, %r11026, 96; 2026-02-21T10:22:35.4368743Z bfe.u32 %r314, %r313, 4, 14; 2026-02-21T10:22:35.4368916Z cvt.u64.u32 %rd101, %r314; 2026-02-21T10:22:35.4369105Z or.b64 %rd269, %rd101, 4611686293372403712; 2026-02-21T10:22:35.4369321Z add.s32 %r315, %r11026, 16384; 2026-02-21T10:22:35.4369507Z bfe.u32 %r316, %r315, 4, 14; 2026-02-21T10:22:35.4369697Z cvt.u64.u32 %rd102, %r316; 2026-02-21T10:22:35.4369891Z or.b64 %rd270, %rd102, 4611686293372403712; 2026-02-21T10:22:35.4370104Z add.s32 %r317, %r11026, 16416; 2026-02-21T10:22:35.4370282Z bfe.u32 %r318, %r317, 4, 14; 2026-02-21T10:22:35.4370465Z cvt.u64.u32 %rd103, %r318; 2026-02-21T10:22:35.4370646Z or.b64 %rd271, %rd103, 4611686293372403712; 2026-02-21T10:22:35.4370868Z add.s32 %r319, %r11026, 16448; 2026-02-21T10:22:35.4371048Z bfe.u32 %r320, %r319, 4, 14; 2026-02-21T10:22:35.4371236Z cvt.u64.u32 %rd104, %r320; 2026-02-21T10:22:35.4371488Z or.b64 %rd272, %rd104, 4611686293372403712; 2026-02-21T10:22:35.4371714Z add.s32 %r321, %r11026, 16480; 2026-02-21T10:22:35.4371899Z bfe.u32 %r322, %r321, 4, 14; 2026-02-21T10:22:35.4372072Z cvt.u64.u32 %rd105, %r322; 2026-02-21T10:22:35.4372263Z or.b64 %rd273, %rd105, 4611686293372403712; 2026-02-21T10:22:35.4372529Z add.s32 %r4903, %r11026, 4096; 2026-02-21T10:22:35.4372710Z bfe.u32 %r324, %r4903, 4, 14; 2026-02-21T10:22:35.4372898Z cvt.u64.u32 %rd106, %r324; 2026-02-21T10:22:35.4373087Z or.b64 %rd274, %rd106, 4611686293372403712; 2026-02-21T10:22:35.4373291Z add.s32 %r325, %r11026, 4128; 2026-02-21T10:22:35.4373473Z bfe.u32 %r326, %r325, 4, 14; 2026-02-21T10:22:35.4373651Z cvt.u64.u32 %rd107, %r326; 2026-02-21T10:22:35.4373830Z or.b64 %rd275, %rd107, 4611686293372403712; 2026-02-21T10:22:35.4374035Z add.s32 %r327, %r11026, 4160; 2026-02-21T10:22:35.4374206Z bfe.u32 %r328, %r327, 4, 14; 2026-02-21T10:22:35.4374385Z cvt.u64.u32 %rd108, %r328; 
2026-02-21T10:22:35.4374568Z or.b64 %rd276, %rd108, 4611686293372403712; 2026-02-21T10:22:35.4374772Z add.s32 %r329, %r11026, 4192; 2026-02-21T10:22:35.4374942Z bfe.u32 %r330, %r329, 4, 14; 2026-02-21T10:22:35.4375120Z cvt.u64.u32 %rd109, %r330; 2026-02-21T10:22:35.4375302Z or.b64 %rd277, %rd109, 4611686293372403712; 2026-02-21T10:22:35.4375509Z add.s32 %r331, %r11026, 20480; 2026-02-21T10:22:35.4375690Z bfe.u32 %r332, %r331, 4, 14; 2026-02-21T10:22:35.4375861Z cvt.u64.u32 %rd110, %r332; 2026-02-21T10:22:35.4376130Z or.b64 %rd278, %rd110, 4611686293372403712; 2026-02-21T10:22:35.4376341Z add.s32 %r333, %r11026, 20512; 2026-02-21T10:22:35.4376644Z bfe.u32 %r334, %r333, 4, 14; 2026-02-21T10:22:35.4376820Z cvt.u64.u32 %rd111, %r334; 2026-02-21T10:22:35.4377004Z or.b64 %rd279, %rd111, 4611686293372403712; 2026-02-21T10:22:35.4377204Z add.s32 %r335, %r11026, 20544; 2026-02-21T10:22:35.4377388Z bfe.u32 %r336, %r335, 4, 14; 2026-02-21T10:22:35.4377566Z cvt.u64.u32 %rd112, %r336; 2026-02-21T10:22:35.4377751Z or.b64 %rd280, %rd112, 4611686293372403712; 2026-02-21T10:22:35.4377961Z add.s32 %r337, %r11026, 20576; 2026-02-21T10:22:35.4378135Z bfe.u32 %r338, %r337, 4, 14; 2026-02-21T10:22:35.4378327Z cvt.u64.u32 %rd113, %r338; 2026-02-21T10:22:35.4378509Z or.b64 %rd281, %rd113, 4611686293372403712; 2026-02-21T10:22:35.4378720Z add.s32 %r339, %r11026, 8192; 2026-02-21T10:22:35.4378893Z bfe.u32 %r340, %r339, 4, 14; 2026-02-21T10:22:35.4379071Z cvt.u64.u32 %rd114, %r340; 2026-02-21T10:22:35.4379256Z or.b64 %rd282, %rd114, 4611686293372403712; 2026-02-21T10:22:35.4379460Z add.s32 %r341, %r11026, 8224; 2026-02-21T10:22:35.4379646Z bfe.u32 %r342, %r341, 4, 14; 2026-02-21T10:22:35.4379817Z cvt.u64.u32 %rd115, %r342; 2026-02-21T10:22:35.4380003Z or.b64 %rd283, %rd115, 4611686293372403712; 2026-02-21T10:22:35.4380202Z add.s32 %r343, %r11026, 8256; 2026-02-21T10:22:35.4380385Z bfe.u32 %r344, %r343, 4, 14; 2026-02-21T10:22:35.4380568Z cvt.u64.u32 %rd116, %r344; 
2026-02-21T10:22:35.4380845Z or.b64 %rd284, %rd116, 4611686293372403712; 2026-02-21T10:22:35.4381050Z add.s32 %r345, %r11026, 8288; 2026-02-21T10:22:35.4381228Z bfe.u32 %r346, %r345, 4, 14; 2026-02-21T10:22:35.4381403Z cvt.u64.u32 %rd117, %r346; 2026-02-21T10:22:35.4381584Z or.b64 %rd285, %rd117, 4611686293372403712; 2026-02-21T10:22:35.4381796Z add.s32 %r347, %r11026, 24576; 2026-02-21T10:22:35.4381972Z bfe.u32 %r348, %r347, 4, 14; 2026-02-21T10:22:35.4382149Z cvt.u64.u32 %rd118, %r348; 2026-02-21T10:22:35.4382329Z or.b64 %rd286, %rd118, 4611686293372403712; 2026-02-21T10:22:35.4382535Z add.s32 %r349, %r11026, 24608; 2026-02-21T10:22:35.4382710Z bfe.u32 %r350, %r349, 4, 14; 2026-02-21T10:22:35.4382899Z cvt.u64.u32 %rd119, %r350; 2026-02-21T10:22:35.4383086Z or.b64 %rd287, %rd119, 4611686293372403712; 2026-02-21T10:22:35.4383285Z add.s32 %r351, %r11026, 24640; 2026-02-21T10:22:35.4383466Z bfe.u32 %r352, %r351, 4, 14; 2026-02-21T10:22:35.4383640Z cvt.u64.u32 %rd120, %r352; 2026-02-21T10:22:35.4383910Z or.b64 %rd288, %rd120, 4611686293372403712; 2026-02-21T10:22:35.4384119Z add.s32 %r353, %r11026, 24672; 2026-02-21T10:22:35.4384303Z bfe.u32 %r354, %r353, 4, 14; 2026-02-21T10:22:35.4384472Z cvt.u64.u32 %rd121, %r354; 2026-02-21T10:22:35.4384656Z or.b64 %rd289, %rd121, 4611686293372403712; 2026-02-21T10:22:35.4384944Z add.s32 %r355, %r11026, 12288; 2026-02-21T10:22:35.4385121Z bfe.u32 %r356, %r355, 4, 14; 2026-02-21T10:22:35.4385298Z cvt.u64.u32 %rd122, %r356; 2026-02-21T10:22:35.4385479Z or.b64 %rd290, %rd122, 4611686293372403712; 2026-02-21T10:22:35.4385717Z add.s32 %r357, %r11026, 12320; 2026-02-21T10:22:35.4385922Z bfe.u32 %r358, %r357, 4, 14; 2026-02-21T10:22:35.4386134Z cvt.u64.u32 %rd123, %r358; 2026-02-21T10:22:35.4386337Z or.b64 %rd291, %rd123, 4611686293372403712; 2026-02-21T10:22:35.4386695Z add.s32 %r359, %r11026, 12352; 2026-02-21T10:22:35.4386880Z bfe.u32 %r360, %r359, 4, 14; 2026-02-21T10:22:35.4387084Z cvt.u64.u32 %rd124, %r360; 
2026-02-21T10:22:35.4387304Z or.b64 %rd292, %rd124, 4611686293372403712; 2026-02-21T10:22:35.4387530Z add.s32 %r361, %r11026, 12384; 2026-02-21T10:22:35.4387731Z bfe.u32 %r362, %r361, 4, 14; 2026-02-21T10:22:35.4387922Z cvt.u64.u32 %rd125, %r362; 2026-02-21T10:22:35.4388134Z or.b64 %rd293, %rd125, 4611686293372403712; 2026-02-21T10:22:35.4388359Z add.s32 %r363, %r11026, 28672; 2026-02-21T10:22:35.4388641Z bfe.u32 %r364, %r363, 4, 14; 2026-02-21T10:22:35.4388827Z cvt.u64.u32 %rd126, %r364; 2026-02-21T10:22:35.4389113Z or.b64 %rd294, %rd126, 4611686293372403712; 2026-02-21T10:22:35.4389355Z add.s32 %r365, %r11026, 28704; 2026-02-21T10:22:35.4389554Z bfe.u32 %r366, %r365, 4, 14; 2026-02-21T10:22:35.4389752Z cvt.u64.u32 %rd127, %r366; 2026-02-21T10:22:35.4389935Z or.b64 %rd295, %rd127, 4611686293372403712; 2026-02-21T10:22:35.4390150Z add.s32 %r367, %r11026, 28736; 2026-02-21T10:22:35.4390331Z bfe.u32 %r368, %r367, 4, 14; 2026-02-21T10:22:35.4390514Z cvt.u64.u32 %rd128, %r368; 2026-02-21T10:22:35.4390718Z or.b64 %rd296, %rd128, 4611686293372403712; 2026-02-21T10:22:35.4390934Z add.s32 %r369, %r11026, 28768; 2026-02-21T10:22:35.4391122Z bfe.u32 %r370, %r369, 4, 14; 2026-02-21T10:22:35.4391307Z cvt.u64.u32 %rd129, %r370; 2026-02-21T10:22:35.4391499Z or.b64 %rd297, %rd129, 4611686293372403712; 2026-02-21T10:22:35.4391705Z shl.b32 %r372, %r14243, 12; 2026-02-21T10:22:35.4391896Z and.b32 %r373, %r14237, 3168; 2026-02-21T10:22:35.4392076Z shl.b32 %r375, %r14244, 4; 2026-02-21T10:22:35.4392254Z shr.u32 %r377, %r14245, 2; 2026-02-21T10:22:35.4392428Z and.b32 %r379, %r250, 16; 2026-02-21T10:22:35.4392607Z or.b32 %r380, %r373, %r375; 2026-02-21T10:22:35.4392785Z xor.b32 %r381, %r380, %r377; 2026-02-21T10:22:35.4392967Z add.s32 %r382, %r11026, %r372; 2026-02-21T10:22:35.4393153Z add.s32 %r383, %r382, %r379; 2026-02-21T10:22:35.4393337Z add.s32 %r38, %r383, %r381; 2026-02-21T10:22:35.4393522Z shl.b32 %r384, %r14244, 9; 2026-02-21T10:22:35.4393692Z shl.b32 %r385, %r14243, 5; 
2026-02-21T10:22:35.4393949Z and.b32 %r386, %r250, 2032; 2026-02-21T10:22:35.4394124Z or.b32 %r387, %r384, %r385; 2026-02-21T10:22:35.4394299Z xor.b32 %r388, %r387, %r386; 2026-02-21T10:22:35.4394477Z add.s32 %r4855, %r11026, %r388; 2026-02-21T10:22:35.4394667Z add.s32 %r4860, %r4855, 2048; 2026-02-21T10:22:35.4394842Z shr.u32 %r389, %r14245, 5; 2026-02-21T10:22:35.4395021Z or.b32 %r390, %r286, %r389; 2026-02-21T10:22:35.4395198Z or.b32 %r391, %r390, %r14241; 2026-02-21T10:22:35.4395372Z add.s32 %r41, %r11026, %r391; 2026-02-21T10:22:35.4395554Z xor.b32 %r392, %r391, 16; 2026-02-21T10:22:35.4395723Z add.s32 %r42, %r11026, %r392; 2026-02-21T10:22:35.4395920Z xor.b32 %r393, %r391, 32; 2026-02-21T10:22:35.4396089Z add.s32 %r43, %r11026, %r393; 2026-02-21T10:22:35.4396265Z xor.b32 %r394, %r391, 48; 2026-02-21T10:22:35.4396431Z add.s32 %r44, %r11026, %r394; 2026-02-21T10:22:35.4396740Z xor.b32 %r395, %r391, 64; 2026-02-21T10:22:35.4396914Z add.s32 %r45, %r11026, %r395; 2026-02-21T10:22:35.4397180Z xor.b32 %r396, %r391, 80; 2026-02-21T10:22:35.4397359Z add.s32 %r46, %r11026, %r396; 2026-02-21T10:22:35.4397532Z xor.b32 %r397, %r391, 96; 2026-02-21T10:22:35.4397706Z add.s32 %r47, %r11026, %r397; 2026-02-21T10:22:35.4397879Z xor.b32 %r398, %r391, 112; 2026-02-21T10:22:35.4398056Z add.s32 %r48, %r11026, %r398; 2026-02-21T10:22:35.4398460Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.4398831Z shl.b32 %r399, %r9, 13; 2026-02-21T10:22:35.4399160Z .loc 1 47 126 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:47:126 2026-02-21T10:22:35.4399519Z or.b32 %r49, %r399, %r12; 2026-02-21T10:22:35.4399748Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T10:22:35.4400054Z // Child Loop BB0_3 Depth 2 2026-02-21T10:22:35.4400332Z // Child Loop BB0_5 Depth 2 2026-02-21T10:22:35.4400717Z .loc 1 32 35 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:32:35 2026-02-21T10:22:35.4401082Z shr.s32 %r401, %r14246, 31; 
2026-02-21T10:22:35.4401264Z shr.u32 %r402, %r401, 16; 2026-02-21T10:22:35.4401440Z add.s32 %r403, %r14246, %r402; 2026-02-21T10:22:35.4401632Z shr.s32 %r404, %r403, 16; 2026-02-21T10:22:35.4401962Z .loc 1 33 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:33:33 2026-02-21T10:22:35.4402398Z shl.b32 %r405, %r404, 6; 2026-02-21T10:22:35.4402711Z .loc 1 34 39 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:34:39 2026-02-21T10:22:35.4403069Z sub.s32 %r406, 10, %r405; 2026-02-21T10:22:35.4403379Z .loc 1 34 52 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:34:52 2026-02-21T10:22:35.4403737Z min.s32 %r407, %r406, 64; 2026-02-21T10:22:35.4404045Z .loc 1 35 45 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:35:45 2026-02-21T10:22:35.4404398Z and.b32 %r408, %r403, -65536; 2026-02-21T10:22:35.4404587Z sub.s32 %r409, %r14246, %r408; 2026-02-21T10:22:35.4404902Z .loc 1 36 51 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:36:51 2026-02-21T10:22:35.4405254Z div.s32 %r410, %r409, %r407; 2026-02-21T10:22:35.4405570Z .loc 1 35 64 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:35:64 2026-02-21T10:22:35.4405926Z mul.lo.s32 %r411, %r410, %r407; 2026-02-21T10:22:35.4406134Z sub.s32 %r412, %r409, %r411; 2026-02-21T10:22:35.4406568Z .loc 1 35 30 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:35:30 2026-02-21T10:22:35.4406936Z add.s32 %r413, %r412, %r405; 2026-02-21T10:22:35.4407251Z .loc 1 37 27 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:37:27 2026-02-21T10:22:35.4407602Z shl.b32 %r423, %r413, 7; 2026-02-21T10:22:35.4408005Z .loc 1 39 27 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:39:27 2026-02-21T10:22:35.4408359Z shl.b32 %r52, %r410, 6; 2026-02-21T10:22:35.4408678Z .loc 1 47 126 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:47:126 2026-02-21T10:22:35.4409032Z shl.b32 %r414, %r410, 19; 2026-02-21T10:22:35.4409210Z or.b32 %r415, %r49, %r414; 
2026-02-21T10:22:35.4409392Z mad.wide.s32 %rd595, %r415, 2, %rd76; 2026-02-21T10:22:35.4409604Z mov.b32 %r4474, 0f00000000; 2026-02-21T10:22:35.4409776Z mov.b64 %rd596, 0; 2026-02-21T10:22:35.4409938Z mov.b32 %r4475, %r4474; 2026-02-21T10:22:35.4410117Z mov.b32 %r4476, %r4474; 2026-02-21T10:22:35.4410292Z mov.b32 %r4477, %r4474; 2026-02-21T10:22:35.4410460Z mov.b32 %r4478, %r4474; 2026-02-21T10:22:35.4410623Z mov.b32 %r4479, %r4474; 2026-02-21T10:22:35.4410789Z mov.b32 %r4480, %r4474; 2026-02-21T10:22:35.4410949Z mov.b32 %r4481, %r4474; 2026-02-21T10:22:35.4411119Z mov.b32 %r4482, %r4474; 2026-02-21T10:22:35.4411380Z mov.b32 %r4483, %r4474; 2026-02-21T10:22:35.4411554Z mov.b32 %r4484, %r4474; 2026-02-21T10:22:35.4411716Z mov.b32 %r4485, %r4474; 2026-02-21T10:22:35.4411895Z mov.b32 %r4486, %r4474; 2026-02-21T10:22:35.4412066Z mov.b32 %r4487, %r4474; 2026-02-21T10:22:35.4412226Z mov.b32 %r4488, %r4474; 2026-02-21T10:22:35.4412468Z mov.b32 %r4489, %r4474; 2026-02-21T10:22:35.4412682Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T10:22:35.4412983Z // => This Inner Loop Header: Depth=2 2026-02-21T10:22:35.4413381Z .loc 1 0 126 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:0:126 2026-02-21T10:22:35.4413747Z cvt.u32.u64 %r424, %rd596; 2026-02-21T10:22:35.4414063Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.4414421Z // begin inline asm 2026-02-21T10:22:35.4414586Z mov.u64 %rd131, 0x0; 2026-02-21T10:22:35.4414831Z createpolicy.fractional.L2::evict_last.b64 %rd131, 1.0; 2026-02-21T10:22:35.4415097Z // end inline asm 2026-02-21T10:22:35.4415249Z // begin inline asm 2026-02-21T10:22:35.4415408Z mov.u32 %r416, 0x0; 2026-02-21T10:22:35.4415559Z mov.u32 %r417, 0x0; 2026-02-21T10:22:35.4415716Z mov.u32 %r418, 0x0; 2026-02-21T10:22:35.4415869Z mov.u32 %r419, 0x0; 2026-02-21T10:22:35.4416182Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r416, %r417, %r418, %r419 }, [ %rd595 + 0 ], %rd131; 
2026-02-21T10:22:35.4416743Z // end inline asm 2026-02-21T10:22:35.4417046Z .loc 1 58 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:58:32 2026-02-21T10:22:35.4417399Z bar.sync 0; 2026-02-21T10:22:35.4417570Z st.shared.v2.b32 [%r14], {%r416, %r417}; 2026-02-21T10:22:35.4417802Z st.shared.v2.b32 [%r15], {%r418, %r419}; 2026-02-21T10:22:35.4418003Z bar.sync 0; 2026-02-21T10:22:35.4418161Z ld.shared.b16 %rs1, [%r16]; 2026-02-21T10:22:35.4418361Z ld.shared.b16 %rs2, [%r16+1024]; 2026-02-21T10:22:35.4418575Z ld.shared.b16 %rs3, [%r16+64]; 2026-02-21T10:22:35.4418770Z ld.shared.b16 %rs4, [%r16+1088]; 2026-02-21T10:22:35.4418960Z ld.shared.b16 %rs5, [%r17]; 2026-02-21T10:22:35.4419147Z ld.shared.b16 %rs6, [%r17+1024]; 2026-02-21T10:22:35.4419337Z ld.shared.b16 %rs7, [%r17+64]; 2026-02-21T10:22:35.4419541Z ld.shared.b16 %rs8, [%r17+1088]; 2026-02-21T10:22:35.4419726Z ld.shared.b16 %rs9, [%r18]; 2026-02-21T10:22:35.4419916Z ld.shared.b16 %rs10, [%r18+1024]; 2026-02-21T10:22:35.4420116Z ld.shared.b16 %rs11, [%r18+64]; 2026-02-21T10:22:35.4420311Z ld.shared.b16 %rs12, [%r18+1088]; 2026-02-21T10:22:35.4420510Z ld.shared.b16 %rs13, [%r19]; 2026-02-21T10:22:35.4420696Z ld.shared.b16 %rs14, [%r19+1024]; 2026-02-21T10:22:35.4420895Z ld.shared.b16 %rs15, [%r19+64]; 2026-02-21T10:22:35.4421085Z ld.shared.b16 %rs16, [%r19+1088]; 2026-02-21T10:22:35.4421295Z ld.shared.b16 %rs17, [%r20]; 2026-02-21T10:22:35.4421478Z ld.shared.b16 %rs18, [%r20+1024]; 2026-02-21T10:22:35.4421754Z ld.shared.b16 %rs19, [%r20+64]; 2026-02-21T10:22:35.4421941Z ld.shared.b16 %rs20, [%r20+1088]; 2026-02-21T10:22:35.4422140Z ld.shared.b16 %rs21, [%r21]; 2026-02-21T10:22:35.4422323Z ld.shared.b16 %rs22, [%r21+1024]; 2026-02-21T10:22:35.4422520Z ld.shared.b16 %rs23, [%r21+64]; 2026-02-21T10:22:35.4422716Z ld.shared.b16 %rs24, [%r21+1088]; 2026-02-21T10:22:35.4422909Z ld.shared.b16 %rs25, [%r22]; 2026-02-21T10:22:35.4423097Z ld.shared.b16 %rs26, [%r22+1024]; 2026-02-21T10:22:35.4423292Z 
ld.shared.b16 %rs27, [%r22+64]; 2026-02-21T10:22:35.4423502Z ld.shared.b16 %rs28, [%r22+1088]; 2026-02-21T10:22:35.4423694Z ld.shared.b16 %rs29, [%r23]; 2026-02-21T10:22:35.4423881Z ld.shared.b16 %rs30, [%r23+1024]; 2026-02-21T10:22:35.4424073Z ld.shared.b16 %rs31, [%r23+64]; 2026-02-21T10:22:35.4424268Z ld.shared.b16 %rs32, [%r23+1088]; 2026-02-21T10:22:35.4424467Z cvt.f32.bf16 %r829, %rs1; 2026-02-21T10:22:35.4424644Z cvt.f32.bf16 %r830, %rs2; 2026-02-21T10:22:35.4424827Z cvt.f32.bf16 %r831, %rs5; 2026-02-21T10:22:35.4425073Z cvt.f32.bf16 %r832, %rs6; 2026-02-21T10:22:35.4425261Z cvt.f32.bf16 %r865, %rs9; 2026-02-21T10:22:35.4425440Z cvt.f32.bf16 %r866, %rs10; 2026-02-21T10:22:35.4425630Z cvt.f32.bf16 %r867, %rs13; 2026-02-21T10:22:35.4425803Z cvt.f32.bf16 %r868, %rs14; 2026-02-21T10:22:35.4425982Z cvt.f32.bf16 %r901, %rs17; 2026-02-21T10:22:35.4426226Z cvt.f32.bf16 %r902, %rs18; 2026-02-21T10:22:35.4426404Z cvt.f32.bf16 %r903, %rs21; 2026-02-21T10:22:35.4426710Z cvt.f32.bf16 %r904, %rs22; 2026-02-21T10:22:35.4426905Z cvt.f32.bf16 %r937, %rs25; 2026-02-21T10:22:35.4427092Z cvt.f32.bf16 %r938, %rs26; 2026-02-21T10:22:35.4427272Z cvt.f32.bf16 %r939, %rs29; 2026-02-21T10:22:35.4427449Z cvt.f32.bf16 %r940, %rs30; 2026-02-21T10:22:35.4427620Z cvt.f32.bf16 %r973, %rs3; 2026-02-21T10:22:35.4427796Z cvt.f32.bf16 %r974, %rs4; 2026-02-21T10:22:35.4427968Z cvt.f32.bf16 %r975, %rs7; 2026-02-21T10:22:35.4428141Z cvt.f32.bf16 %r976, %rs8; 2026-02-21T10:22:35.4428318Z cvt.f32.bf16 %r1009, %rs11; 2026-02-21T10:22:35.4428604Z cvt.f32.bf16 %r1010, %rs12; 2026-02-21T10:22:35.4428798Z cvt.f32.bf16 %r1011, %rs15; 2026-02-21T10:22:35.4428973Z cvt.f32.bf16 %r1012, %rs16; 2026-02-21T10:22:35.4429150Z cvt.f32.bf16 %r1045, %rs19; 2026-02-21T10:22:35.4429339Z cvt.f32.bf16 %r1046, %rs20; 2026-02-21T10:22:35.4429522Z cvt.f32.bf16 %r1047, %rs23; 2026-02-21T10:22:35.4429693Z cvt.f32.bf16 %r1048, %rs24; 2026-02-21T10:22:35.4429878Z cvt.f32.bf16 %r1081, %rs27; 2026-02-21T10:22:35.4430140Z 
cvt.f32.bf16 %r1082, %rs28; 2026-02-21T10:22:35.4430330Z cvt.f32.bf16 %r1083, %rs31; 2026-02-21T10:22:35.4430507Z cvt.f32.bf16 %r1084, %rs32; 2026-02-21T10:22:35.4430839Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.4431206Z bar.sync 0; 2026-02-21T10:22:35.4431352Z // begin inline asm 2026-02-21T10:22:35.4431560Z @%p472 mbarrier.init.shared::cta.b64 [%r4903], 1; 2026-02-21T10:22:35.4431799Z // end inline asm 2026-02-21T10:22:35.4431954Z bar.sync 0; 2026-02-21T10:22:35.4432097Z // begin inline asm 2026-02-21T10:22:35.4432336Z @%p472 mbarrier.arrive.expect_tx.shared.b64 _, [%r4903], 4096; 2026-02-21T10:22:35.4432612Z // end inline asm 2026-02-21T10:22:35.4432763Z // begin inline asm 2026-02-21T10:22:35.4432944Z fence.proxy.async.shared::cta; 2026-02-21T10:22:35.4433134Z // end inline asm 2026-02-21T10:22:35.4433286Z bar.sync 0; 2026-02-21T10:22:35.4433444Z elect.sync %r4764|%p142, -1; 2026-02-21T10:22:35.4433645Z and.pred %p22, %p1, %p142; 2026-02-21T10:22:35.4433839Z // begin inline asm 2026-02-21T10:22:35.4434268Z @%p22 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r11026], [%rd394, {%r423, %r424}], [%r4903]; 2026-02-21T10:22:35.4434744Z // end inline asm 2026-02-21T10:22:35.4434890Z bar.sync 0; 2026-02-21T10:22:35.4435039Z mov.b32 %r4743, 0; 2026-02-21T10:22:35.4435192Z // begin inline asm 2026-02-21T10:22:35.4435350Z 2026-02-21T10:22:35.4435573Z { 2026-02-21T10:22:35.4435715Z .reg .pred complete; 2026-02-21T10:22:35.4435875Z waitLoop: 2026-02-21T10:22:35.4436107Z mbarrier.try_wait.parity.shared.b64 complete, [%r4903], %r4743; 2026-02-21T10:22:35.4436397Z @!complete bra.uni waitLoop; 2026-02-21T10:22:35.4436704Z } 2026-02-21T10:22:35.4436775Z 2026-02-21T10:22:35.4436842Z // end inline asm 2026-02-21T10:22:35.4436989Z bar.sync 0; 2026-02-21T10:22:35.4437135Z // begin inline asm 2026-02-21T10:22:35.4437326Z @%p472 mbarrier.inval.shared::cta.b64 [%r4903]; 2026-02-21T10:22:35.4437564Z 
// end inline asm 2026-02-21T10:22:35.4437868Z .loc 1 78 58 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:78:58 2026-02-21T10:22:35.4438247Z ld.shared.b8 %rs33, [%r24]; 2026-02-21T10:22:35.4438437Z ld.shared.b8 %rs34, [%r24+1024]; 2026-02-21T10:22:35.4438638Z ld.shared.b8 %rs35, [%r24+2048]; 2026-02-21T10:22:35.4438830Z ld.shared.b8 %rs36, [%r24+3072]; 2026-02-21T10:22:35.4439026Z ld.shared.b8 %rs37, [%r25+256]; 2026-02-21T10:22:35.4439312Z ld.shared.b8 %rs38, [%r25+1280]; 2026-02-21T10:22:35.4439518Z ld.shared.b8 %rs39, [%r25+2304]; 2026-02-21T10:22:35.4439714Z ld.shared.b8 %rs40, [%r25+3328]; 2026-02-21T10:22:35.4439902Z ld.shared.b8 %rs41, [%r26+512]; 2026-02-21T10:22:35.4440094Z ld.shared.b8 %rs42, [%r26+1536]; 2026-02-21T10:22:35.4440350Z ld.shared.b8 %rs43, [%r26+2560]; 2026-02-21T10:22:35.4440544Z ld.shared.b8 %rs44, [%r26+3584]; 2026-02-21T10:22:35.4440735Z ld.shared.b8 %rs45, [%r27+768]; 2026-02-21T10:22:35.4440931Z ld.shared.b8 %rs46, [%r27+1792]; 2026-02-21T10:22:35.4441124Z ld.shared.b8 %rs47, [%r27+2816]; 2026-02-21T10:22:35.4441312Z ld.shared.b8 %rs48, [%r27+3840]; 2026-02-21T10:22:35.4441667Z .loc 1 63 28 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:63:28 2026-02-21T10:22:35.4442025Z shl.b16 %rs49, %rs33, 4; 2026-02-21T10:22:35.4442205Z shl.b16 %rs50, %rs37, 4; 2026-02-21T10:22:35.4442375Z shl.b16 %rs51, %rs41, 4; 2026-02-21T10:22:35.4442551Z shl.b16 %rs52, %rs45, 4; 2026-02-21T10:22:35.4442718Z shl.b16 %rs53, %rs34, 4; 2026-02-21T10:22:35.4442898Z shl.b16 %rs54, %rs38, 4; 2026-02-21T10:22:35.4443073Z shl.b16 %rs55, %rs42, 4; 2026-02-21T10:22:35.4443238Z shl.b16 %rs56, %rs46, 4; 2026-02-21T10:22:35.4443413Z shl.b16 %rs57, %rs35, 4; 2026-02-21T10:22:35.4443584Z shl.b16 %rs58, %rs39, 4; 2026-02-21T10:22:35.4443756Z shl.b16 %rs59, %rs43, 4; 2026-02-21T10:22:35.4443923Z shl.b16 %rs60, %rs47, 4; 2026-02-21T10:22:35.4444095Z shl.b16 %rs61, %rs36, 4; 2026-02-21T10:22:35.4444343Z shl.b16 %rs62, %rs40, 4; 
2026-02-21T10:22:35.4444526Z shl.b16 %rs63, %rs44, 4; 2026-02-21T10:22:35.4444692Z shl.b16 %rs64, %rs48, 4; 2026-02-21T10:22:35.4445009Z .loc 1 78 58 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:78:58 2026-02-21T10:22:35.4445381Z selp.b16 %rs65, %rs49, %rs33, %p484; 2026-02-21T10:22:35.4445583Z cvt.s16.s8 %rs66, %rs65; 2026-02-21T10:22:35.4445762Z shr.s16 %rs67, %rs66, 4; 2026-02-21T10:22:35.4445955Z selp.b16 %rs68, %rs50, %rs37, %p484; 2026-02-21T10:22:35.4446161Z cvt.s16.s8 %rs69, %rs68; 2026-02-21T10:22:35.4446332Z shr.s16 %rs70, %rs69, 4; 2026-02-21T10:22:35.4446632Z selp.b16 %rs71, %rs51, %rs41, %p484; 2026-02-21T10:22:35.4446831Z cvt.s16.s8 %rs72, %rs71; 2026-02-21T10:22:35.4447006Z shr.s16 %rs73, %rs72, 4; 2026-02-21T10:22:35.4447190Z selp.b16 %rs74, %rs52, %rs45, %p484; 2026-02-21T10:22:35.4447389Z cvt.s16.s8 %rs75, %rs74; 2026-02-21T10:22:35.4447566Z shr.s16 %rs76, %rs75, 4; 2026-02-21T10:22:35.4447740Z selp.b16 %rs77, %rs53, %rs34, %p484; 2026-02-21T10:22:35.4447943Z cvt.s16.s8 %rs78, %rs77; 2026-02-21T10:22:35.4448110Z shr.s16 %rs79, %rs78, 4; 2026-02-21T10:22:35.4448289Z selp.b16 %rs80, %rs54, %rs38, %p484; 2026-02-21T10:22:35.4448481Z cvt.s16.s8 %rs81, %rs80; 2026-02-21T10:22:35.4448655Z shr.s16 %rs82, %rs81, 4; 2026-02-21T10:22:35.4448846Z selp.b16 %rs83, %rs55, %rs42, %p484; 2026-02-21T10:22:35.4449042Z cvt.s16.s8 %rs84, %rs83; 2026-02-21T10:22:35.4449300Z shr.s16 %rs85, %rs84, 4; 2026-02-21T10:22:35.4449480Z selp.b16 %rs86, %rs56, %rs46, %p484; 2026-02-21T10:22:35.4449692Z cvt.s16.s8 %rs87, %rs86; 2026-02-21T10:22:35.4449861Z shr.s16 %rs88, %rs87, 4; 2026-02-21T10:22:35.4450036Z selp.b16 %rs89, %rs57, %rs35, %p484; 2026-02-21T10:22:35.4450230Z cvt.s16.s8 %rs90, %rs89; 2026-02-21T10:22:35.4450403Z shr.s16 %rs91, %rs90, 4; 2026-02-21T10:22:35.4450586Z selp.b16 %rs92, %rs58, %rs39, %p484; 2026-02-21T10:22:35.4450789Z cvt.s16.s8 %rs93, %rs92; 2026-02-21T10:22:35.4450964Z shr.s16 %rs94, %rs93, 4; 2026-02-21T10:22:35.4451139Z selp.b16 %rs95, 
%rs59, %rs43, %p484; 2026-02-21T10:22:35.4451339Z cvt.s16.s8 %rs96, %rs95; 2026-02-21T10:22:35.4451506Z shr.s16 %rs97, %rs96, 4; 2026-02-21T10:22:35.4451683Z selp.b16 %rs98, %rs60, %rs47, %p484; 2026-02-21T10:22:35.4451877Z cvt.s16.s8 %rs99, %rs98; 2026-02-21T10:22:35.4452052Z shr.s16 %rs100, %rs99, 4; 2026-02-21T10:22:35.4452239Z selp.b16 %rs101, %rs61, %rs36, %p484; 2026-02-21T10:22:35.4452537Z cvt.s16.s8 %rs102, %rs101; 2026-02-21T10:22:35.4452723Z shr.s16 %rs103, %rs102, 4; 2026-02-21T10:22:35.4452911Z selp.b16 %rs104, %rs62, %rs40, %p484; 2026-02-21T10:22:35.4453116Z cvt.s16.s8 %rs105, %rs104; 2026-02-21T10:22:35.4453290Z shr.s16 %rs106, %rs105, 4; 2026-02-21T10:22:35.4453480Z selp.b16 %rs107, %rs63, %rs44, %p484; 2026-02-21T10:22:35.4453746Z cvt.s16.s8 %rs108, %rs107; 2026-02-21T10:22:35.4453921Z shr.s16 %rs109, %rs108, 4; 2026-02-21T10:22:35.4454118Z selp.b16 %rs110, %rs64, %rs48, %p484; 2026-02-21T10:22:35.4454324Z cvt.s16.s8 %rs111, %rs110; 2026-02-21T10:22:35.4454500Z shr.s16 %rs112, %rs111, 4; 2026-02-21T10:22:35.4454826Z .loc 1 83 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:83:32 2026-02-21T10:22:35.4455195Z cvt.rn.f32.s16 %r4765, %rs67; 2026-02-21T10:22:35.4455380Z cvt.rn.f32.s16 %r4766, %rs70; 2026-02-21T10:22:35.4455570Z cvt.rn.f32.s16 %r4767, %rs73; 2026-02-21T10:22:35.4455752Z cvt.rn.f32.s16 %r4768, %rs76; 2026-02-21T10:22:35.4455947Z cvt.rn.f32.s16 %r4769, %rs79; 2026-02-21T10:22:35.4456121Z cvt.rn.f32.s16 %r4770, %rs82; 2026-02-21T10:22:35.4456305Z cvt.rn.f32.s16 %r4771, %rs85; 2026-02-21T10:22:35.4456601Z cvt.rn.f32.s16 %r4772, %rs88; 2026-02-21T10:22:35.4456793Z cvt.rn.f32.s16 %r4773, %rs91; 2026-02-21T10:22:35.4456988Z cvt.rn.f32.s16 %r4774, %rs94; 2026-02-21T10:22:35.4457164Z cvt.rn.f32.s16 %r4775, %rs97; 2026-02-21T10:22:35.4457350Z cvt.rn.f32.s16 %r4776, %rs100; 2026-02-21T10:22:35.4457626Z cvt.rn.f32.s16 %r4777, %rs103; 2026-02-21T10:22:35.4457829Z cvt.rn.f32.s16 %r4778, %rs106; 2026-02-21T10:22:35.4458009Z cvt.rn.f32.s16 
%r4779, %rs109; 2026-02-21T10:22:35.4458195Z cvt.rn.f32.s16 %r4780, %rs112; 2026-02-21T10:22:35.4458369Z bar.sync 0; 2026-02-21T10:22:35.4458524Z st.shared.b32 [%r28], %r4765; 2026-02-21T10:22:35.4458713Z st.shared.b32 [%r28+16384], %r4773; 2026-02-21T10:22:35.4458914Z st.shared.b32 [%r29], %r4766; 2026-02-21T10:22:35.4459109Z st.shared.b32 [%r29+16384], %r4774; 2026-02-21T10:22:35.4459306Z st.shared.b32 [%r30], %r4767; 2026-02-21T10:22:35.4459507Z st.shared.b32 [%r30+16384], %r4775; 2026-02-21T10:22:35.4459700Z st.shared.b32 [%r31], %r4768; 2026-02-21T10:22:35.4459888Z st.shared.b32 [%r31+16384], %r4776; 2026-02-21T10:22:35.4460080Z st.shared.b32 [%r32], %r4769; 2026-02-21T10:22:35.4460275Z st.shared.b32 [%r32+16384], %r4777; 2026-02-21T10:22:35.4460466Z st.shared.b32 [%r33], %r4770; 2026-02-21T10:22:35.4460653Z st.shared.b32 [%r33+16384], %r4778; 2026-02-21T10:22:35.4460854Z st.shared.b32 [%r34], %r4771; 2026-02-21T10:22:35.4461037Z st.shared.b32 [%r34+16384], %r4779; 2026-02-21T10:22:35.4461247Z st.shared.b32 [%r35], %r4772; 2026-02-21T10:22:35.4461437Z st.shared.b32 [%r35+16384], %r4780; 2026-02-21T10:22:35.4461706Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r4474}; 2026-02-21T10:22:35.4461979Z bar.sync 0; 2026-02-21T10:22:35.4462129Z // begin inline asm 2026-02-21T10:22:35.4462512Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r545, %r833, %r1121, %r1409}, [%r433]; 2026-02-21T10:22:35.4462857Z // end inline asm 2026-02-21T10:22:35.4463011Z bar.sync 0; 2026-02-21T10:22:35.4463228Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r4476}; 2026-02-21T10:22:35.4463501Z bar.sync 0; 2026-02-21T10:22:35.4463641Z // begin inline asm 2026-02-21T10:22:35.4463924Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r547, %r835, %r1123, %r1411}, [%r433]; 2026-02-21T10:22:35.4464245Z // end inline asm 2026-02-21T10:22:35.4464400Z bar.sync 0; 2026-02-21T10:22:35.4464612Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r4475}; 2026-02-21T10:22:35.4464882Z bar.sync 0; 
2026-02-21T10:22:35.4465026Z // begin inline asm 2026-02-21T10:22:35.4465302Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r546, %r834, %r1122, %r1410}, [%r433]; 2026-02-21T10:22:35.4465638Z // end inline asm 2026-02-21T10:22:35.4465788Z bar.sync 0; 2026-02-21T10:22:35.4466007Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r4477}; 2026-02-21T10:22:35.4466351Z bar.sync 0; 2026-02-21T10:22:35.4466626Z // begin inline asm 2026-02-21T10:22:35.4466899Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r548, %r836, %r1124, %r1412}, [%r433]; 2026-02-21T10:22:35.4467223Z // end inline asm 2026-02-21T10:22:35.4467368Z bar.sync 0; 2026-02-21T10:22:35.4467583Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r4478}; 2026-02-21T10:22:35.4467941Z bar.sync 0; 2026-02-21T10:22:35.4468083Z // begin inline asm 2026-02-21T10:22:35.4468361Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r549, %r837, %r1125, %r1413}, [%r433]; 2026-02-21T10:22:35.4468753Z // end inline asm 2026-02-21T10:22:35.4468906Z bar.sync 0; 2026-02-21T10:22:35.4469115Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r4480}; 2026-02-21T10:22:35.4469384Z bar.sync 0; 2026-02-21T10:22:35.4469520Z // begin inline asm 2026-02-21T10:22:35.4469798Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r551, %r839, %r1127, %r1415}, [%r433]; 2026-02-21T10:22:35.4470120Z // end inline asm 2026-02-21T10:22:35.4470270Z bar.sync 0; 2026-02-21T10:22:35.4470484Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r4479}; 2026-02-21T10:22:35.4470745Z bar.sync 0; 2026-02-21T10:22:35.4470890Z // begin inline asm 2026-02-21T10:22:35.4471155Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r550, %r838, %r1126, %r1414}, [%r433]; 2026-02-21T10:22:35.4471476Z // end inline asm 2026-02-21T10:22:35.4471618Z bar.sync 0; 2026-02-21T10:22:35.4471832Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r4481}; 2026-02-21T10:22:35.4472183Z bar.sync 0; 2026-02-21T10:22:35.4472343Z // begin inline asm 2026-02-21T10:22:35.4472626Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r552, %r840, %r1128, %r1416}, [%r433]; 2026-02-21T10:22:35.4472954Z // end inline asm 2026-02-21T10:22:35.4473107Z bar.sync 0; 2026-02-21T10:22:35.4473321Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r4482}; 2026-02-21T10:22:35.4473594Z bar.sync 0; 2026-02-21T10:22:35.4473735Z // begin inline asm 2026-02-21T10:22:35.4474017Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r553, %r841, %r1129, %r1417}, [%r433]; 2026-02-21T10:22:35.4474336Z // end inline asm 2026-02-21T10:22:35.4474487Z bar.sync 0; 2026-02-21T10:22:35.4474703Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r4484}; 2026-02-21T10:22:35.4474969Z bar.sync 0; 2026-02-21T10:22:35.4475122Z // begin inline asm 2026-02-21T10:22:35.4475409Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r555, %r843, %r1131, %r1419}, [%r433]; 2026-02-21T10:22:35.4475735Z // end inline asm 2026-02-21T10:22:35.4475886Z bar.sync 0; 2026-02-21T10:22:35.4476104Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r4483}; 2026-02-21T10:22:35.4476368Z bar.sync 0; 2026-02-21T10:22:35.4476646Z // begin inline asm 2026-02-21T10:22:35.4476923Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r554, %r842, %r1130, %r1418}, [%r433]; 2026-02-21T10:22:35.4477239Z // end inline asm 2026-02-21T10:22:35.4477389Z bar.sync 0; 2026-02-21T10:22:35.4477604Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r4485}; 2026-02-21T10:22:35.4477979Z bar.sync 0; 2026-02-21T10:22:35.4478125Z // begin inline asm 2026-02-21T10:22:35.4478398Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r556, %r844, %r1132, %r1420}, [%r433]; 2026-02-21T10:22:35.4478714Z // end inline asm 2026-02-21T10:22:35.4478863Z bar.sync 0; 2026-02-21T10:22:35.4479085Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r4486}; 2026-02-21T10:22:35.4479349Z bar.sync 0; 2026-02-21T10:22:35.4479497Z // begin inline asm 2026-02-21T10:22:35.4494415Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r557, %r845, %r1133, %r1421}, [%r433]; 
2026-02-21T10:22:35.4494777Z // end inline asm 2026-02-21T10:22:35.4494934Z bar.sync 0; 2026-02-21T10:22:35.4495164Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r4488}; 2026-02-21T10:22:35.4495443Z bar.sync 0; 2026-02-21T10:22:35.4495586Z // begin inline asm 2026-02-21T10:22:35.4495869Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r559, %r847, %r1135, %r1423}, [%r433]; 2026-02-21T10:22:35.4496195Z // end inline asm 2026-02-21T10:22:35.4496643Z bar.sync 0; 2026-02-21T10:22:35.4496891Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r4487}; 2026-02-21T10:22:35.4497184Z bar.sync 0; 2026-02-21T10:22:35.4497334Z // begin inline asm 2026-02-21T10:22:35.4497620Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r558, %r846, %r1134, %r1422}, [%r433]; 2026-02-21T10:22:35.4498057Z // end inline asm 2026-02-21T10:22:35.4498212Z bar.sync 0; 2026-02-21T10:22:35.4498440Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r4489}; 2026-02-21T10:22:35.4498710Z bar.sync 0; 2026-02-21T10:22:35.4498859Z // begin inline asm 2026-02-21T10:22:35.4499138Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r560, %r848, %r1136, %r1424}, [%r433]; 2026-02-21T10:22:35.4499467Z // end inline asm 2026-02-21T10:22:35.4499626Z $L__tmp1: 2026-02-21T10:22:35.4500003Z .loc 2 291 36 // standard.py:291:36 @[ co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:90:40 ] 2026-02-21T10:22:35.4500439Z // begin inline asm 2026-02-21T10:22:35.4500636Z fence.proxy.async.shared::cta; 2026-02-21T10:22:35.4500840Z // end inline asm 2026-02-21T10:22:35.4501021Z shfl.sync.idx.b32 %r4781, %r5, 0, 31, -1; 2026-02-21T10:22:35.4501265Z wgmma.fence.sync.aligned; 2026-02-21T10:22:35.4501460Z mov.pred %p24, -1; 2026-02-21T10:22:35.4501633Z // begin inline asm 2026-02-21T10:22:35.4502291Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560}, {%r829,%r830,%r831,%r832}, %rd266, %p24, 1, 1; 2026-02-21T10:22:35.4502891Z // end 
inline asm 2026-02-21T10:22:35.4503047Z // begin inline asm 2026-02-21T10:22:35.4503573Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560}, {%r865,%r866,%r867,%r868}, %rd267, %p24, 1, 1; 2026-02-21T10:22:35.4504159Z // end inline asm 2026-02-21T10:22:35.4504306Z // begin inline asm 2026-02-21T10:22:35.4504848Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560}, {%r901,%r902,%r903,%r904}, %rd268, %p24, 1, 1; 2026-02-21T10:22:35.4505430Z // end inline asm 2026-02-21T10:22:35.4505574Z // begin inline asm 2026-02-21T10:22:35.4506120Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560}, {%r937,%r938,%r939,%r940}, %rd269, %p24, 1, 1; 2026-02-21T10:22:35.4506835Z // end inline asm 2026-02-21T10:22:35.4506994Z // begin inline asm 2026-02-21T10:22:35.4507529Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560}, {%r973,%r974,%r975,%r976}, %rd270, %p24, 1, 1; 2026-02-21T10:22:35.4508114Z // end inline asm 2026-02-21T10:22:35.4508281Z // begin inline asm 2026-02-21T10:22:35.4508998Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560}, {%r1009,%r1010,%r1011,%r1012}, %rd271, %p24, 1, 1; 2026-02-21T10:22:35.4509596Z // end inline asm 2026-02-21T10:22:35.4509750Z // begin inline asm 2026-02-21T10:22:35.4510305Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560}, {%r1045,%r1046,%r1047,%r1048}, %rd272, %p24, 1, 1; 2026-02-21T10:22:35.4510908Z // end inline asm 2026-02-21T10:22:35.4511063Z // begin 
inline asm 2026-02-21T10:22:35.4511604Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560}, {%r1081,%r1082,%r1083,%r1084}, %rd273, %p24, 1, 1; 2026-02-21T10:22:35.4512187Z // end inline asm 2026-02-21T10:22:35.4512358Z // begin inline asm 2026-02-21T10:22:35.4512968Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848}, {%r829,%r830,%r831,%r832}, %rd274, %p24, 1, 1; 2026-02-21T10:22:35.4513565Z // end inline asm 2026-02-21T10:22:35.4513722Z // begin inline asm 2026-02-21T10:22:35.4514252Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848}, {%r865,%r866,%r867,%r868}, %rd275, %p24, 1, 1; 2026-02-21T10:22:35.4514905Z // end inline asm 2026-02-21T10:22:35.4515061Z // begin inline asm 2026-02-21T10:22:35.4515608Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848}, {%r901,%r902,%r903,%r904}, %rd276, %p24, 1, 1; 2026-02-21T10:22:35.4516195Z // end inline asm 2026-02-21T10:22:35.4516340Z // begin inline asm 2026-02-21T10:22:35.4516999Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848}, {%r937,%r938,%r939,%r940}, %rd277, %p24, 1, 1; 2026-02-21T10:22:35.4517580Z // end inline asm 2026-02-21T10:22:35.4517731Z // begin inline asm 2026-02-21T10:22:35.4518266Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848}, {%r973,%r974,%r975,%r976}, %rd278, %p24, 1, 1; 2026-02-21T10:22:35.4518843Z // end inline asm 2026-02-21T10:22:35.4518996Z // begin inline asm 2026-02-21T10:22:35.4519607Z 
wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848}, {%r1009,%r1010,%r1011,%r1012}, %rd279, %p24, 1, 1; 2026-02-21T10:22:35.4520222Z // end inline asm 2026-02-21T10:22:35.4520368Z // begin inline asm 2026-02-21T10:22:35.4520912Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848}, {%r1045,%r1046,%r1047,%r1048}, %rd280, %p24, 1, 1; 2026-02-21T10:22:35.4521507Z // end inline asm 2026-02-21T10:22:35.4521653Z // begin inline asm 2026-02-21T10:22:35.4522203Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848}, {%r1081,%r1082,%r1083,%r1084}, %rd281, %p24, 1, 1; 2026-02-21T10:22:35.4522799Z // end inline asm 2026-02-21T10:22:35.4522953Z // begin inline asm 2026-02-21T10:22:35.4523545Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136}, {%r829,%r830,%r831,%r832}, %rd282, %p24, 1, 1; 2026-02-21T10:22:35.4524183Z // end inline asm 2026-02-21T10:22:35.4524341Z // begin inline asm 2026-02-21T10:22:35.4524921Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136}, {%r865,%r866,%r867,%r868}, %rd283, %p24, 1, 1; 2026-02-21T10:22:35.4525625Z // end inline asm 2026-02-21T10:22:35.4525776Z // begin inline asm 2026-02-21T10:22:35.4526367Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136}, {%r901,%r902,%r903,%r904}, %rd284, %p24, 1, 1; 2026-02-21T10:22:35.4527133Z // end inline asm 2026-02-21T10:22:35.4527285Z // begin inline asm 2026-02-21T10:22:35.4527877Z 
wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136}, {%r937,%r938,%r939,%r940}, %rd285, %p24, 1, 1; 2026-02-21T10:22:35.4528502Z // end inline asm 2026-02-21T10:22:35.4528659Z // begin inline asm 2026-02-21T10:22:35.4529331Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136}, {%r973,%r974,%r975,%r976}, %rd286, %p24, 1, 1; 2026-02-21T10:22:35.4532287Z // end inline asm 2026-02-21T10:22:35.4532461Z // begin inline asm 2026-02-21T10:22:35.4533099Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136}, {%r1009,%r1010,%r1011,%r1012}, %rd287, %p24, 1, 1; 2026-02-21T10:22:35.4533888Z // end inline asm 2026-02-21T10:22:35.4534048Z // begin inline asm 2026-02-21T10:22:35.4534671Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136}, {%r1045,%r1046,%r1047,%r1048}, %rd288, %p24, 1, 1; 2026-02-21T10:22:35.4535315Z // end inline asm 2026-02-21T10:22:35.4535473Z // begin inline asm 2026-02-21T10:22:35.4536074Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136}, {%r1081,%r1082,%r1083,%r1084}, %rd289, %p24, 1, 1; 2026-02-21T10:22:35.4536902Z // end inline asm 2026-02-21T10:22:35.4537072Z // begin inline asm 2026-02-21T10:22:35.4537658Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424}, {%r829,%r830,%r831,%r832}, %rd290, %p24, 1, 1; 2026-02-21T10:22:35.4538303Z // end inline asm 2026-02-21T10:22:35.4538449Z // 
begin inline asm 2026-02-21T10:22:35.4539118Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424}, {%r865,%r866,%r867,%r868}, %rd291, %p24, 1, 1; 2026-02-21T10:22:35.4539755Z // end inline asm 2026-02-21T10:22:35.4539909Z // begin inline asm 2026-02-21T10:22:35.4540504Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424}, {%r901,%r902,%r903,%r904}, %rd292, %p24, 1, 1; 2026-02-21T10:22:35.4541152Z // end inline asm 2026-02-21T10:22:35.4541310Z // begin inline asm 2026-02-21T10:22:35.4541903Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424}, {%r937,%r938,%r939,%r940}, %rd293, %p24, 1, 1; 2026-02-21T10:22:35.4542549Z // end inline asm 2026-02-21T10:22:35.4542704Z // begin inline asm 2026-02-21T10:22:35.4543282Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424}, {%r973,%r974,%r975,%r976}, %rd294, %p24, 1, 1; 2026-02-21T10:22:35.4543925Z // end inline asm 2026-02-21T10:22:35.4544074Z // begin inline asm 2026-02-21T10:22:35.4544688Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424}, {%r1009,%r1010,%r1011,%r1012}, %rd295, %p24, 1, 1; 2026-02-21T10:22:35.4545339Z // end inline asm 2026-02-21T10:22:35.4545484Z // begin inline asm 2026-02-21T10:22:35.4546074Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424}, {%r1045,%r1046,%r1047,%r1048}, %rd296, %p24, 1, 1; 2026-02-21T10:22:35.4546834Z // end 
inline asm 2026-02-21T10:22:35.4546990Z // begin inline asm 2026-02-21T10:22:35.4547602Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424}, {%r1081,%r1082,%r1083,%r1084}, %rd297, %p24, 1, 1; 2026-02-21T10:22:35.4548245Z // end inline asm 2026-02-21T10:22:35.4548417Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:35.4548693Z mov.b32 %r1726, %r4743; 2026-02-21T10:22:35.4548872Z mov.b32 %r1727, %r4743; 2026-02-21T10:22:35.4549040Z mov.b32 %r1725, %r11026; 2026-02-21T10:22:35.4549290Z // begin inline asm 2026-02-21T10:22:35.4550562Z // wait for regs: %r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560,%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848,%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136,%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424,%r1725,%r1726,%r1727 2026-02-21T10:22:35.4551776Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:35.4551979Z // end inline asm 2026-02-21T10:22:35.4552132Z $L__tmp2: 2026-02-21T10:22:35.4552438Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.4552814Z add.s64 %rd168, %rd595, 128; 2026-02-21T10:22:35.4553011Z // begin inline asm 2026-02-21T10:22:35.4553177Z mov.u64 %rd167, 0x0; 2026-02-21T10:22:35.4553407Z createpolicy.fractional.L2::evict_last.b64 %rd167, 1.0; 2026-02-21T10:22:35.4553668Z // end inline asm 2026-02-21T10:22:35.4553819Z // begin inline asm 2026-02-21T10:22:35.4553978Z mov.u32 %r1795, 0x0; 2026-02-21T10:22:35.4554131Z mov.u32 %r1796, 0x0; 2026-02-21T10:22:35.4554286Z mov.u32 %r1797, 0x0; 2026-02-21T10:22:35.4554459Z mov.u32 %r1798, 0x0; 2026-02-21T10:22:35.4554780Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1795, %r1796, 
%r1797, %r1798 }, [ %rd168 + 0 ], %rd167; 2026-02-21T10:22:35.4555143Z // end inline asm 2026-02-21T10:22:35.4555514Z .loc 1 58 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:58:32 2026-02-21T10:22:35.4555878Z bar.sync 0; 2026-02-21T10:22:35.4556056Z st.shared.v2.b32 [%r14], {%r1795, %r1796}; 2026-02-21T10:22:35.4556297Z st.shared.v2.b32 [%r15], {%r1797, %r1798}; 2026-02-21T10:22:35.4556639Z bar.sync 0; 2026-02-21T10:22:35.4556805Z ld.shared.b16 %rs113, [%r16]; 2026-02-21T10:22:35.4557011Z ld.shared.b16 %rs114, [%r16+1024]; 2026-02-21T10:22:35.4557216Z ld.shared.b16 %rs115, [%r16+64]; 2026-02-21T10:22:35.4557420Z ld.shared.b16 %rs116, [%r16+1088]; 2026-02-21T10:22:35.4557616Z ld.shared.b16 %rs117, [%r17]; 2026-02-21T10:22:35.4557807Z ld.shared.b16 %rs118, [%r17+1024]; 2026-02-21T10:22:35.4557998Z ld.shared.b16 %rs119, [%r17+64]; 2026-02-21T10:22:35.4558209Z ld.shared.b16 %rs120, [%r17+1088]; 2026-02-21T10:22:35.4558409Z ld.shared.b16 %rs121, [%r18]; 2026-02-21T10:22:35.4558591Z ld.shared.b16 %rs122, [%r18+1024]; 2026-02-21T10:22:35.4558790Z ld.shared.b16 %rs123, [%r18+64]; 2026-02-21T10:22:35.4558977Z ld.shared.b16 %rs124, [%r18+1088]; 2026-02-21T10:22:35.4559176Z ld.shared.b16 %rs125, [%r19]; 2026-02-21T10:22:35.4559357Z ld.shared.b16 %rs126, [%r19+1024]; 2026-02-21T10:22:35.4559555Z ld.shared.b16 %rs127, [%r19+64]; 2026-02-21T10:22:35.4559747Z ld.shared.b16 %rs128, [%r19+1088]; 2026-02-21T10:22:35.4559942Z ld.shared.b16 %rs129, [%r20]; 2026-02-21T10:22:35.4560127Z ld.shared.b16 %rs130, [%r20+1024]; 2026-02-21T10:22:35.4560316Z ld.shared.b16 %rs131, [%r20+64]; 2026-02-21T10:22:35.4560524Z ld.shared.b16 %rs132, [%r20+1088]; 2026-02-21T10:22:35.4560715Z ld.shared.b16 %rs133, [%r21]; 2026-02-21T10:22:35.4560902Z ld.shared.b16 %rs134, [%r21+1024]; 2026-02-21T10:22:35.4561089Z ld.shared.b16 %rs135, [%r21+64]; 2026-02-21T10:22:35.4561280Z ld.shared.b16 %rs136, [%r21+1088]; 2026-02-21T10:22:35.4561468Z ld.shared.b16 %rs137, [%r22]; 
2026-02-21T10:22:35.4561652Z ld.shared.b16 %rs138, [%r22+1024]; 2026-02-21T10:22:35.4561844Z ld.shared.b16 %rs139, [%r22+64]; 2026-02-21T10:22:35.4562030Z ld.shared.b16 %rs140, [%r22+1088]; 2026-02-21T10:22:35.4562222Z ld.shared.b16 %rs141, [%r23]; 2026-02-21T10:22:35.4562401Z ld.shared.b16 %rs142, [%r23+1024]; 2026-02-21T10:22:35.4562594Z ld.shared.b16 %rs143, [%r23+64]; 2026-02-21T10:22:35.4562780Z ld.shared.b16 %rs144, [%r23+1088]; 2026-02-21T10:22:35.4562990Z cvt.f32.bf16 %r2128, %rs113; 2026-02-21T10:22:35.4563173Z cvt.f32.bf16 %r2129, %rs114; 2026-02-21T10:22:35.4563432Z cvt.f32.bf16 %r2130, %rs117; 2026-02-21T10:22:35.4563696Z cvt.f32.bf16 %r2131, %rs118; 2026-02-21T10:22:35.4563873Z cvt.f32.bf16 %r2164, %rs121; 2026-02-21T10:22:35.4564048Z cvt.f32.bf16 %r2165, %rs122; 2026-02-21T10:22:35.4564218Z cvt.f32.bf16 %r2166, %rs125; 2026-02-21T10:22:35.4564459Z cvt.f32.bf16 %r2167, %rs126; 2026-02-21T10:22:35.4564630Z cvt.f32.bf16 %r2200, %rs129; 2026-02-21T10:22:35.4564807Z cvt.f32.bf16 %r2201, %rs130; 2026-02-21T10:22:35.4564980Z cvt.f32.bf16 %r2202, %rs133; 2026-02-21T10:22:35.4565158Z cvt.f32.bf16 %r2203, %rs134; 2026-02-21T10:22:35.4565329Z cvt.f32.bf16 %r2236, %rs137; 2026-02-21T10:22:35.4565504Z cvt.f32.bf16 %r2237, %rs138; 2026-02-21T10:22:35.4565685Z cvt.f32.bf16 %r2238, %rs141; 2026-02-21T10:22:35.4565855Z cvt.f32.bf16 %r2239, %rs142; 2026-02-21T10:22:35.4566031Z cvt.f32.bf16 %r2272, %rs115; 2026-02-21T10:22:35.4566208Z cvt.f32.bf16 %r2273, %rs116; 2026-02-21T10:22:35.4566380Z cvt.f32.bf16 %r2274, %rs119; 2026-02-21T10:22:35.4566686Z cvt.f32.bf16 %r2275, %rs120; 2026-02-21T10:22:35.4566865Z cvt.f32.bf16 %r2308, %rs123; 2026-02-21T10:22:35.4567045Z cvt.f32.bf16 %r2309, %rs124; 2026-02-21T10:22:35.4567219Z cvt.f32.bf16 %r2310, %rs127; 2026-02-21T10:22:35.4567400Z cvt.f32.bf16 %r2311, %rs128; 2026-02-21T10:22:35.4567571Z cvt.f32.bf16 %r2344, %rs131; 2026-02-21T10:22:35.4567752Z cvt.f32.bf16 %r2345, %rs132; 2026-02-21T10:22:35.4567923Z cvt.f32.bf16 %r2346, 
%rs135; 2026-02-21T10:22:35.4568190Z cvt.f32.bf16 %r2347, %rs136; 2026-02-21T10:22:35.4568373Z cvt.f32.bf16 %r2380, %rs139; 2026-02-21T10:22:35.4568544Z cvt.f32.bf16 %r2381, %rs140; 2026-02-21T10:22:35.4568721Z cvt.f32.bf16 %r2382, %rs143; 2026-02-21T10:22:35.4568890Z cvt.f32.bf16 %r2383, %rs144; 2026-02-21T10:22:35.4569216Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.4569575Z bar.sync 0; 2026-02-21T10:22:35.4569738Z // begin inline asm 2026-02-21T10:22:35.4569940Z @%p472 mbarrier.init.shared::cta.b64 [%r4903], 1; 2026-02-21T10:22:35.4570178Z // end inline asm 2026-02-21T10:22:35.4570328Z bar.sync 0; 2026-02-21T10:22:35.4570468Z // begin inline asm 2026-02-21T10:22:35.4570697Z @%p472 mbarrier.arrive.expect_tx.shared.b64 _, [%r4903], 4096; 2026-02-21T10:22:35.4570963Z // end inline asm 2026-02-21T10:22:35.4571115Z // begin inline asm 2026-02-21T10:22:35.4571285Z fence.proxy.async.shared::cta; 2026-02-21T10:22:35.4571472Z // end inline asm 2026-02-21T10:22:35.4571617Z bar.sync 0; 2026-02-21T10:22:35.4571772Z elect.sync %r4782|%p143, -1; 2026-02-21T10:22:35.4571965Z and.pred %p58, %p1, %p143; 2026-02-21T10:22:35.4572145Z or.b32 %r1803, %r424, 32; 2026-02-21T10:22:35.4572317Z // begin inline asm 2026-02-21T10:22:35.4572739Z @%p58 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r11026], [%rd394, {%r423, %r1803}], [%r4903]; 2026-02-21T10:22:35.4573216Z // end inline asm 2026-02-21T10:22:35.4573359Z bar.sync 0; 2026-02-21T10:22:35.4573505Z // begin inline asm 2026-02-21T10:22:35.4573662Z 2026-02-21T10:22:35.4573788Z { 2026-02-21T10:22:35.4573918Z .reg .pred complete; 2026-02-21T10:22:35.4574080Z waitLoop: 2026-02-21T10:22:35.4574303Z mbarrier.try_wait.parity.shared.b64 complete, [%r4903], %r4743; 2026-02-21T10:22:35.4574589Z @!complete bra.uni waitLoop; 2026-02-21T10:22:35.4574769Z } 2026-02-21T10:22:35.4574842Z 2026-02-21T10:22:35.4574913Z // end inline asm 2026-02-21T10:22:35.4575067Z 
bar.sync 0; 2026-02-21T10:22:35.4575209Z // begin inline asm 2026-02-21T10:22:35.4575400Z @%p472 mbarrier.inval.shared::cta.b64 [%r4903]; 2026-02-21T10:22:35.4575631Z // end inline asm 2026-02-21T10:22:35.4575937Z .loc 1 78 58 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:78:58 2026-02-21T10:22:35.4576301Z ld.shared.b8 %rs145, [%r24]; 2026-02-21T10:22:35.4576612Z ld.shared.b8 %rs146, [%r24+1024]; 2026-02-21T10:22:35.4576824Z ld.shared.b8 %rs147, [%r24+2048]; 2026-02-21T10:22:35.4577115Z ld.shared.b8 %rs148, [%r24+3072]; 2026-02-21T10:22:35.4577390Z ld.shared.b8 %rs149, [%r25+256]; 2026-02-21T10:22:35.4577582Z ld.shared.b8 %rs150, [%r25+1280]; 2026-02-21T10:22:35.4577779Z ld.shared.b8 %rs151, [%r25+2304]; 2026-02-21T10:22:35.4577965Z ld.shared.b8 %rs152, [%r25+3328]; 2026-02-21T10:22:35.4578251Z ld.shared.b8 %rs153, [%r26+512]; 2026-02-21T10:22:35.4578452Z ld.shared.b8 %rs154, [%r26+1536]; 2026-02-21T10:22:35.4578646Z ld.shared.b8 %rs155, [%r26+2560]; 2026-02-21T10:22:35.4578840Z ld.shared.b8 %rs156, [%r26+3584]; 2026-02-21T10:22:35.4579025Z ld.shared.b8 %rs157, [%r27+768]; 2026-02-21T10:22:35.4579211Z ld.shared.b8 %rs158, [%r27+1792]; 2026-02-21T10:22:35.4579399Z ld.shared.b8 %rs159, [%r27+2816]; 2026-02-21T10:22:35.4579593Z ld.shared.b8 %rs160, [%r27+3840]; 2026-02-21T10:22:35.4579925Z .loc 1 63 28 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:63:28 2026-02-21T10:22:35.4580290Z shl.b16 %rs161, %rs145, 4; 2026-02-21T10:22:35.4580474Z shl.b16 %rs162, %rs149, 4; 2026-02-21T10:22:35.4580648Z shl.b16 %rs163, %rs153, 4; 2026-02-21T10:22:35.4580826Z shl.b16 %rs164, %rs157, 4; 2026-02-21T10:22:35.4580992Z shl.b16 %rs165, %rs146, 4; 2026-02-21T10:22:35.4581166Z shl.b16 %rs166, %rs150, 4; 2026-02-21T10:22:35.4581230Z shl.b16 %rs167, %rs154, 4; 2026-02-21T10:22:35.4581293Z shl.b16 %rs168, %rs158, 4; 2026-02-21T10:22:35.4581358Z shl.b16 %rs169, %rs147, 4; 2026-02-21T10:22:35.4581418Z shl.b16 %rs170, %rs151, 4; 2026-02-21T10:22:35.4581568Z shl.b16 
%rs171, %rs155, 4; 2026-02-21T10:22:35.4581643Z shl.b16 %rs172, %rs159, 4; 2026-02-21T10:22:35.4581713Z shl.b16 %rs173, %rs148, 4; 2026-02-21T10:22:35.4581774Z shl.b16 %rs174, %rs152, 4; 2026-02-21T10:22:35.4581837Z shl.b16 %rs175, %rs156, 4; 2026-02-21T10:22:35.4581903Z shl.b16 %rs176, %rs160, 4; 2026-02-21T10:22:35.4582122Z .loc 1 78 58 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:78:58 2026-02-21T10:22:35.4582204Z selp.b16 %rs177, %rs161, %rs145, %p484; 2026-02-21T10:22:35.4582273Z cvt.s16.s8 %rs178, %rs177; 2026-02-21T10:22:35.4582337Z shr.s16 %rs179, %rs178, 4; 2026-02-21T10:22:35.4582412Z selp.b16 %rs180, %rs162, %rs149, %p484; 2026-02-21T10:22:35.4582473Z cvt.s16.s8 %rs181, %rs180; 2026-02-21T10:22:35.4582541Z shr.s16 %rs182, %rs181, 4; 2026-02-21T10:22:35.4582613Z selp.b16 %rs183, %rs163, %rs153, %p484; 2026-02-21T10:22:35.4582686Z cvt.s16.s8 %rs184, %rs183; 2026-02-21T10:22:35.4582755Z shr.s16 %rs185, %rs184, 4; 2026-02-21T10:22:35.4582828Z selp.b16 %rs186, %rs164, %rs157, %p484; 2026-02-21T10:22:35.4582891Z cvt.s16.s8 %rs187, %rs186; 2026-02-21T10:22:35.4582953Z shr.s16 %rs188, %rs187, 4; 2026-02-21T10:22:35.4583028Z selp.b16 %rs189, %rs165, %rs146, %p484; 2026-02-21T10:22:35.4583091Z cvt.s16.s8 %rs190, %rs189; 2026-02-21T10:22:35.4583154Z shr.s16 %rs191, %rs190, 4; 2026-02-21T10:22:35.4583231Z selp.b16 %rs192, %rs166, %rs150, %p484; 2026-02-21T10:22:35.4583291Z cvt.s16.s8 %rs193, %rs192; 2026-02-21T10:22:35.4583355Z shr.s16 %rs194, %rs193, 4; 2026-02-21T10:22:35.4583425Z selp.b16 %rs195, %rs167, %rs154, %p484; 2026-02-21T10:22:35.4583491Z cvt.s16.s8 %rs196, %rs195; 2026-02-21T10:22:35.4583551Z shr.s16 %rs197, %rs196, 4; 2026-02-21T10:22:35.4583624Z selp.b16 %rs198, %rs168, %rs158, %p484; 2026-02-21T10:22:35.4583696Z cvt.s16.s8 %rs199, %rs198; 2026-02-21T10:22:35.4583756Z shr.s16 %rs200, %rs199, 4; 2026-02-21T10:22:35.4583823Z selp.b16 %rs201, %rs169, %rs147, %p484; 2026-02-21T10:22:35.4583892Z cvt.s16.s8 %rs202, %rs201; 
2026-02-21T10:22:35.4583957Z shr.s16 %rs203, %rs202, 4; 2026-02-21T10:22:35.4584025Z selp.b16 %rs204, %rs170, %rs151, %p484; 2026-02-21T10:22:35.4584087Z cvt.s16.s8 %rs205, %rs204; 2026-02-21T10:22:35.4584154Z shr.s16 %rs206, %rs205, 4; 2026-02-21T10:22:35.4584220Z selp.b16 %rs207, %rs171, %rs155, %p484; 2026-02-21T10:22:35.4584281Z cvt.s16.s8 %rs208, %rs207; 2026-02-21T10:22:35.4584346Z shr.s16 %rs209, %rs208, 4; 2026-02-21T10:22:35.4584481Z selp.b16 %rs210, %rs172, %rs159, %p484; 2026-02-21T10:22:35.4584620Z cvt.s16.s8 %rs211, %rs210; 2026-02-21T10:22:35.4584681Z shr.s16 %rs212, %rs211, 4; 2026-02-21T10:22:35.4584757Z selp.b16 %rs213, %rs173, %rs148, %p484; 2026-02-21T10:22:35.4584818Z cvt.s16.s8 %rs214, %rs213; 2026-02-21T10:22:35.4584878Z shr.s16 %rs215, %rs214, 4; 2026-02-21T10:22:35.4584999Z selp.b16 %rs216, %rs174, %rs152, %p484; 2026-02-21T10:22:35.4585062Z cvt.s16.s8 %rs217, %rs216; 2026-02-21T10:22:35.4585122Z shr.s16 %rs218, %rs217, 4; 2026-02-21T10:22:35.4585195Z selp.b16 %rs219, %rs175, %rs156, %p484; 2026-02-21T10:22:35.4585262Z cvt.s16.s8 %rs220, %rs219; 2026-02-21T10:22:35.4585323Z shr.s16 %rs221, %rs220, 4; 2026-02-21T10:22:35.4585397Z selp.b16 %rs222, %rs176, %rs160, %p484; 2026-02-21T10:22:35.4585466Z cvt.s16.s8 %rs223, %rs222; 2026-02-21T10:22:35.4585527Z shr.s16 %rs224, %rs223, 4; 2026-02-21T10:22:35.4585738Z .loc 1 83 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:83:32 2026-02-21T10:22:35.4585813Z cvt.rn.f32.s16 %r4783, %rs179; 2026-02-21T10:22:35.4585880Z cvt.rn.f32.s16 %r4784, %rs182; 2026-02-21T10:22:35.4585943Z cvt.rn.f32.s16 %r4785, %rs185; 2026-02-21T10:22:35.4586005Z cvt.rn.f32.s16 %r4786, %rs188; 2026-02-21T10:22:35.4586073Z cvt.rn.f32.s16 %r4787, %rs191; 2026-02-21T10:22:35.4586140Z cvt.rn.f32.s16 %r4788, %rs194; 2026-02-21T10:22:35.4586204Z cvt.rn.f32.s16 %r4789, %rs197; 2026-02-21T10:22:35.4586273Z cvt.rn.f32.s16 %r4790, %rs200; 2026-02-21T10:22:35.4586396Z cvt.rn.f32.s16 %r4791, %rs203; 2026-02-21T10:22:35.4586582Z 
cvt.rn.f32.s16 %r4792, %rs206; 2026-02-21T10:22:35.4586650Z cvt.rn.f32.s16 %r4793, %rs209; 2026-02-21T10:22:35.4586720Z cvt.rn.f32.s16 %r4794, %rs212; 2026-02-21T10:22:35.4586781Z cvt.rn.f32.s16 %r4795, %rs215; 2026-02-21T10:22:35.4586844Z cvt.rn.f32.s16 %r4796, %rs218; 2026-02-21T10:22:35.4586916Z cvt.rn.f32.s16 %r4797, %rs221; 2026-02-21T10:22:35.4586976Z cvt.rn.f32.s16 %r4798, %rs224; 2026-02-21T10:22:35.4587033Z bar.sync 0; 2026-02-21T10:22:35.4587104Z st.shared.b32 [%r28], %r4783; 2026-02-21T10:22:35.4587185Z st.shared.b32 [%r28+16384], %r4791; 2026-02-21T10:22:35.4587250Z st.shared.b32 [%r29], %r4784; 2026-02-21T10:22:35.4587327Z st.shared.b32 [%r29+16384], %r4792; 2026-02-21T10:22:35.4587397Z st.shared.b32 [%r30], %r4785; 2026-02-21T10:22:35.4587465Z st.shared.b32 [%r30+16384], %r4793; 2026-02-21T10:22:35.4587528Z st.shared.b32 [%r31], %r4786; 2026-02-21T10:22:35.4587600Z st.shared.b32 [%r31+16384], %r4794; 2026-02-21T10:22:35.4587664Z st.shared.b32 [%r32], %r4787; 2026-02-21T10:22:35.4587730Z st.shared.b32 [%r32+16384], %r4795; 2026-02-21T10:22:35.4587791Z st.shared.b32 [%r33], %r4788; 2026-02-21T10:22:35.4587859Z st.shared.b32 [%r33+16384], %r4796; 2026-02-21T10:22:35.4587922Z st.shared.b32 [%r34], %r4789; 2026-02-21T10:22:35.4587988Z st.shared.b32 [%r34+16384], %r4797; 2026-02-21T10:22:35.4588056Z st.shared.b32 [%r35], %r4790; 2026-02-21T10:22:35.4588119Z st.shared.b32 [%r35+16384], %r4798; 2026-02-21T10:22:35.4588186Z $L__tmp3: 2026-02-21T10:22:35.4588555Z .loc 2 291 36 // standard.py:291:36 @[ co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:90:40 ] 2026-02-21T10:22:35.4588627Z // begin inline asm 2026-02-21T10:22:35.4588706Z fence.proxy.async.shared::cta; 2026-02-21T10:22:35.4588766Z // end inline asm 2026-02-21T10:22:35.4588828Z bar.sync 0; 2026-02-21T10:22:35.4588902Z wgmma.fence.sync.aligned; 2026-02-21T10:22:35.4588964Z // begin inline asm 2026-02-21T10:22:35.4589440Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560}, {%r2128,%r2129,%r2130,%r2131}, %rd266, %p24, 1, 1; 2026-02-21T10:22:35.4589499Z // end inline asm 2026-02-21T10:22:35.4589566Z // begin inline asm 2026-02-21T10:22:35.4590101Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560}, {%r2164,%r2165,%r2166,%r2167}, %rd267, %p24, 1, 1; 2026-02-21T10:22:35.4590229Z // end inline asm 2026-02-21T10:22:35.4590301Z // begin inline asm 2026-02-21T10:22:35.4590751Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560}, {%r2200,%r2201,%r2202,%r2203}, %rd268, %p24, 1, 1; 2026-02-21T10:22:35.4590873Z // end inline asm 2026-02-21T10:22:35.4590932Z // begin inline asm 2026-02-21T10:22:35.4591377Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560}, {%r2236,%r2237,%r2238,%r2239}, %rd269, %p24, 1, 1; 2026-02-21T10:22:35.4591439Z // end inline asm 2026-02-21T10:22:35.4591499Z // begin inline asm 2026-02-21T10:22:35.4591942Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560}, {%r2272,%r2273,%r2274,%r2275}, %rd270, %p24, 1, 1; 2026-02-21T10:22:35.4592006Z // end inline asm 2026-02-21T10:22:35.4592067Z // begin inline asm 2026-02-21T10:22:35.4592507Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560}, {%r2308,%r2309,%r2310,%r2311}, %rd271, %p24, 1, 1; 2026-02-21T10:22:35.4592572Z // end inline asm 2026-02-21T10:22:35.4592631Z // begin inline asm 2026-02-21T10:22:35.4593131Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560}, {%r2344,%r2345,%r2346,%r2347}, %rd272, %p24, 1, 1; 2026-02-21T10:22:35.4593195Z // end inline asm 2026-02-21T10:22:35.4593254Z // begin inline asm 2026-02-21T10:22:35.4593694Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560}, {%r2380,%r2381,%r2382,%r2383}, %rd273, %p24, 1, 1; 2026-02-21T10:22:35.4593758Z // end inline asm 2026-02-21T10:22:35.4593833Z // begin inline asm 2026-02-21T10:22:35.4594277Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848}, {%r2128,%r2129,%r2130,%r2131}, %rd274, %p24, 1, 1; 2026-02-21T10:22:35.4594338Z // end inline asm 2026-02-21T10:22:35.4594403Z // begin inline asm 2026-02-21T10:22:35.4594844Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848}, {%r2164,%r2165,%r2166,%r2167}, %rd275, %p24, 1, 1; 2026-02-21T10:22:35.4594900Z // end inline asm 2026-02-21T10:22:35.4594969Z // begin inline asm 2026-02-21T10:22:35.4595409Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848}, {%r2200,%r2201,%r2202,%r2203}, %rd276, %p24, 1, 1; 2026-02-21T10:22:35.4595469Z // end inline asm 2026-02-21T10:22:35.4595535Z // begin inline asm 2026-02-21T10:22:35.4595979Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848}, {%r2236,%r2237,%r2238,%r2239}, %rd277, %p24, 1, 1; 2026-02-21T10:22:35.4596036Z // end inline asm 2026-02-21T10:22:35.4596103Z // begin inline asm 2026-02-21T10:22:35.4596668Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848}, {%r2272,%r2273,%r2274,%r2275}, %rd278, %p24, 1, 1; 2026-02-21T10:22:35.4596730Z // end inline asm 2026-02-21T10:22:35.4596795Z // begin inline asm 2026-02-21T10:22:35.4597242Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848}, {%r2308,%r2309,%r2310,%r2311}, %rd279, %p24, 1, 1; 2026-02-21T10:22:35.4597299Z // end inline asm 2026-02-21T10:22:35.4597442Z // begin inline asm 2026-02-21T10:22:35.4597897Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848}, {%r2344,%r2345,%r2346,%r2347}, %rd280, %p24, 1, 1; 2026-02-21T10:22:35.4598019Z // end inline asm 2026-02-21T10:22:35.4598134Z // begin inline asm 2026-02-21T10:22:35.4598582Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848}, {%r2380,%r2381,%r2382,%r2383}, %rd281, %p24, 1, 1; 2026-02-21T10:22:35.4598638Z // end inline asm 2026-02-21T10:22:35.4598697Z // begin inline asm 2026-02-21T10:22:35.4599204Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136}, {%r2128,%r2129,%r2130,%r2131}, %rd282, %p24, 1, 1; 2026-02-21T10:22:35.4599262Z // end inline asm 2026-02-21T10:22:35.4599321Z // begin inline asm 2026-02-21T10:22:35.4599823Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136}, {%r2164,%r2165,%r2166,%r2167}, %rd283, %p24, 1, 1; 2026-02-21T10:22:35.4599895Z // end inline asm 2026-02-21T10:22:35.4599957Z // begin inline asm 2026-02-21T10:22:35.4600516Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136}, {%r2200,%r2201,%r2202,%r2203}, %rd284, %p24, 1, 1; 2026-02-21T10:22:35.4600574Z // end inline asm 2026-02-21T10:22:35.4600634Z // begin inline asm 2026-02-21T10:22:35.4601132Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136}, {%r2236,%r2237,%r2238,%r2239}, %rd285, %p24, 1, 1; 2026-02-21T10:22:35.4601189Z // end inline asm 2026-02-21T10:22:35.4601246Z // begin inline asm 2026-02-21T10:22:35.4601738Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136}, {%r2272,%r2273,%r2274,%r2275}, %rd286, %p24, 1, 1; 2026-02-21T10:22:35.4601816Z // end inline asm 2026-02-21T10:22:35.4601879Z // begin inline asm 2026-02-21T10:22:35.4602372Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136}, {%r2308,%r2309,%r2310,%r2311}, %rd287, %p24, 1, 1; 2026-02-21T10:22:35.4602437Z // end inline asm 2026-02-21T10:22:35.4602497Z // begin inline asm 2026-02-21T10:22:35.4602989Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136}, {%r2344,%r2345,%r2346,%r2347}, %rd288, %p24, 1, 1; 2026-02-21T10:22:35.4603053Z // end inline asm 2026-02-21T10:22:35.4603113Z // begin inline asm 2026-02-21T10:22:35.4603609Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136}, {%r2380,%r2381,%r2382,%r2383}, %rd289, %p24, 1, 1; 2026-02-21T10:22:35.4603673Z // end inline asm 2026-02-21T10:22:35.4603735Z // begin inline asm 
2026-02-21T10:22:35.4604230Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424}, {%r2128,%r2129,%r2130,%r2131}, %rd290, %p24, 1, 1; 2026-02-21T10:22:35.4604294Z // end inline asm 2026-02-21T10:22:35.4604351Z // begin inline asm 2026-02-21T10:22:35.4604844Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424}, {%r2164,%r2165,%r2166,%r2167}, %rd291, %p24, 1, 1; 2026-02-21T10:22:35.4604909Z // end inline asm 2026-02-21T10:22:35.4605031Z // begin inline asm 2026-02-21T10:22:35.4605571Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424}, {%r2200,%r2201,%r2202,%r2203}, %rd292, %p24, 1, 1; 2026-02-21T10:22:35.4605676Z // end inline asm 2026-02-21T10:22:35.4605735Z // begin inline asm 2026-02-21T10:22:35.4606222Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424}, {%r2236,%r2237,%r2238,%r2239}, %rd293, %p24, 1, 1; 2026-02-21T10:22:35.4606282Z // end inline asm 2026-02-21T10:22:35.4606347Z // begin inline asm 2026-02-21T10:22:35.4606951Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424}, {%r2272,%r2273,%r2274,%r2275}, %rd294, %p24, 1, 1; 2026-02-21T10:22:35.4607011Z // end inline asm 2026-02-21T10:22:35.4607079Z // begin inline asm 2026-02-21T10:22:35.4607586Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424}, {%r2308,%r2309,%r2310,%r2311}, %rd295, %p24, 1, 1; 2026-02-21T10:22:35.4607646Z // end 
inline asm 2026-02-21T10:22:35.4607709Z // begin inline asm 2026-02-21T10:22:35.4608276Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424}, {%r2344,%r2345,%r2346,%r2347}, %rd296, %p24, 1, 1; 2026-02-21T10:22:35.4608335Z // end inline asm 2026-02-21T10:22:35.4608398Z // begin inline asm 2026-02-21T10:22:35.4608890Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424}, {%r2380,%r2381,%r2382,%r2383}, %rd297, %p24, 1, 1; 2026-02-21T10:22:35.4608949Z // end inline asm 2026-02-21T10:22:35.4609032Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:35.4609099Z mov.b32 %r3025, %r4743; 2026-02-21T10:22:35.4609158Z mov.b32 %r3026, %r4743; 2026-02-21T10:22:35.4609219Z mov.b32 %r3024, %r11026; 2026-02-21T10:22:35.4609289Z // begin inline asm 2026-02-21T10:22:35.4610269Z // wait for regs: %r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560,%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848,%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136,%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424,%r3024,%r3025,%r3026 2026-02-21T10:22:35.4610350Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:35.4610409Z // end inline asm 2026-02-21T10:22:35.4610464Z $L__tmp4: 2026-02-21T10:22:35.4610680Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.4610756Z add.s64 %rd204, %rd595, 256; 2026-02-21T10:22:35.4610816Z // begin inline asm 2026-02-21T10:22:35.4610876Z mov.u64 %rd203, 0x0; 2026-02-21T10:22:35.4611010Z createpolicy.fractional.L2::evict_last.b64 %rd203, 1.0; 2026-02-21T10:22:35.4611068Z // end inline asm 
2026-02-21T10:22:35.4611128Z // begin inline asm 2026-02-21T10:22:35.4611201Z mov.u32 %r3094, 0x0; 2026-02-21T10:22:35.4611267Z mov.u32 %r3095, 0x0; 2026-02-21T10:22:35.4611328Z mov.u32 %r3096, 0x0; 2026-02-21T10:22:35.4611387Z mov.u32 %r3097, 0x0; 2026-02-21T10:22:35.4611618Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r3094, %r3095, %r3096, %r3097 }, [ %rd204 + 0 ], %rd203; 2026-02-21T10:22:35.4611677Z // end inline asm 2026-02-21T10:22:35.4611883Z .loc 1 58 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:58:32 2026-02-21T10:22:35.4611945Z bar.sync 0; 2026-02-21T10:22:35.4612027Z st.shared.v2.b32 [%r14], {%r3094, %r3095}; 2026-02-21T10:22:35.4612201Z st.shared.v2.b32 [%r15], {%r3096, %r3097}; 2026-02-21T10:22:35.4612336Z bar.sync 0; 2026-02-21T10:22:35.4612411Z ld.shared.b16 %rs225, [%r16]; 2026-02-21T10:22:35.4612484Z ld.shared.b16 %rs226, [%r16+1024]; 2026-02-21T10:22:35.4612552Z ld.shared.b16 %rs227, [%r16+64]; 2026-02-21T10:22:35.4612682Z ld.shared.b16 %rs228, [%r16+1088]; 2026-02-21T10:22:35.4612754Z ld.shared.b16 %rs229, [%r17]; 2026-02-21T10:22:35.4612823Z ld.shared.b16 %rs230, [%r17+1024]; 2026-02-21T10:22:35.4612893Z ld.shared.b16 %rs231, [%r17+64]; 2026-02-21T10:22:35.4612979Z ld.shared.b16 %rs232, [%r17+1088]; 2026-02-21T10:22:35.4613046Z ld.shared.b16 %rs233, [%r18]; 2026-02-21T10:22:35.4613111Z ld.shared.b16 %rs234, [%r18+1024]; 2026-02-21T10:22:35.4613182Z ld.shared.b16 %rs235, [%r18+64]; 2026-02-21T10:22:35.4613247Z ld.shared.b16 %rs236, [%r18+1088]; 2026-02-21T10:22:35.4613314Z ld.shared.b16 %rs237, [%r19]; 2026-02-21T10:22:35.4613384Z ld.shared.b16 %rs238, [%r19+1024]; 2026-02-21T10:22:35.4613450Z ld.shared.b16 %rs239, [%r19+64]; 2026-02-21T10:22:35.4613518Z ld.shared.b16 %rs240, [%r19+1088]; 2026-02-21T10:22:35.4613583Z ld.shared.b16 %rs241, [%r20]; 2026-02-21T10:22:35.4613655Z ld.shared.b16 %rs242, [%r20+1024]; 2026-02-21T10:22:35.4613721Z ld.shared.b16 %rs243, [%r20+64]; 2026-02-21T10:22:35.4613788Z ld.shared.b16 %rs244, 
[%r20+1088]; 2026-02-21T10:22:35.4613861Z ld.shared.b16 %rs245, [%r21]; 2026-02-21T10:22:35.4613927Z ld.shared.b16 %rs246, [%r21+1024]; 2026-02-21T10:22:35.4614049Z ld.shared.b16 %rs247, [%r21+64]; 2026-02-21T10:22:35.4614119Z ld.shared.b16 %rs248, [%r21+1088]; 2026-02-21T10:22:35.4614190Z ld.shared.b16 %rs249, [%r22]; 2026-02-21T10:22:35.4614254Z ld.shared.b16 %rs250, [%r22+1024]; 2026-02-21T10:22:35.4614317Z ld.shared.b16 %rs251, [%r22+64]; 2026-02-21T10:22:35.4614388Z ld.shared.b16 %rs252, [%r22+1088]; 2026-02-21T10:22:35.4614454Z ld.shared.b16 %rs253, [%r23]; 2026-02-21T10:22:35.4614529Z ld.shared.b16 %rs254, [%r23+1024]; 2026-02-21T10:22:35.4614599Z ld.shared.b16 %rs255, [%r23+64]; 2026-02-21T10:22:35.4614672Z ld.shared.b16 %rs256, [%r23+1088]; 2026-02-21T10:22:35.4614738Z cvt.f32.bf16 %r3427, %rs225; 2026-02-21T10:22:35.4614801Z cvt.f32.bf16 %r3428, %rs226; 2026-02-21T10:22:35.4614869Z cvt.f32.bf16 %r3429, %rs229; 2026-02-21T10:22:35.4614934Z cvt.f32.bf16 %r3430, %rs230; 2026-02-21T10:22:35.4614996Z cvt.f32.bf16 %r3463, %rs233; 2026-02-21T10:22:35.4615079Z cvt.f32.bf16 %r3464, %rs234; 2026-02-21T10:22:35.4615147Z cvt.f32.bf16 %r3465, %rs237; 2026-02-21T10:22:35.4615210Z cvt.f32.bf16 %r3466, %rs238; 2026-02-21T10:22:35.4615272Z cvt.f32.bf16 %r3499, %rs241; 2026-02-21T10:22:35.4615341Z cvt.f32.bf16 %r3500, %rs242; 2026-02-21T10:22:35.4615402Z cvt.f32.bf16 %r3501, %rs245; 2026-02-21T10:22:35.4615464Z cvt.f32.bf16 %r3502, %rs246; 2026-02-21T10:22:35.4615532Z cvt.f32.bf16 %r3535, %rs249; 2026-02-21T10:22:35.4615593Z cvt.f32.bf16 %r3536, %rs250; 2026-02-21T10:22:35.4615654Z cvt.f32.bf16 %r3537, %rs253; 2026-02-21T10:22:35.4615716Z cvt.f32.bf16 %r3538, %rs254; 2026-02-21T10:22:35.4615786Z cvt.f32.bf16 %r3571, %rs227; 2026-02-21T10:22:35.4615848Z cvt.f32.bf16 %r3572, %rs228; 2026-02-21T10:22:35.4615909Z cvt.f32.bf16 %r3573, %rs231; 2026-02-21T10:22:35.4615975Z cvt.f32.bf16 %r3574, %rs232; 2026-02-21T10:22:35.4616040Z cvt.f32.bf16 %r3607, %rs235; 
2026-02-21T10:22:35.4616100Z cvt.f32.bf16 %r3608, %rs236; 2026-02-21T10:22:35.4616162Z cvt.f32.bf16 %r3609, %rs239; 2026-02-21T10:22:35.4616231Z cvt.f32.bf16 %r3610, %rs240; 2026-02-21T10:22:35.4616292Z cvt.f32.bf16 %r3643, %rs243; 2026-02-21T10:22:35.4616352Z cvt.f32.bf16 %r3644, %rs244; 2026-02-21T10:22:35.4616417Z cvt.f32.bf16 %r3645, %rs247; 2026-02-21T10:22:35.4616597Z cvt.f32.bf16 %r3646, %rs248; 2026-02-21T10:22:35.4616667Z cvt.f32.bf16 %r3679, %rs251; 2026-02-21T10:22:35.4616734Z cvt.f32.bf16 %r3680, %rs252; 2026-02-21T10:22:35.4616797Z cvt.f32.bf16 %r3681, %rs255; 2026-02-21T10:22:35.4616858Z cvt.f32.bf16 %r3682, %rs256; 2026-02-21T10:22:35.4617154Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.4617281Z bar.sync 0; 2026-02-21T10:22:35.4617344Z // begin inline asm 2026-02-21T10:22:35.4617447Z @%p472 mbarrier.init.shared::cta.b64 [%r4903], 1; 2026-02-21T10:22:35.4617567Z // end inline asm 2026-02-21T10:22:35.4617623Z bar.sync 0; 2026-02-21T10:22:35.4617681Z // begin inline asm 2026-02-21T10:22:35.4617832Z @%p472 mbarrier.arrive.expect_tx.shared.b64 _, [%r4903], 4096; 2026-02-21T10:22:35.4617900Z // end inline asm 2026-02-21T10:22:35.4617959Z // begin inline asm 2026-02-21T10:22:35.4618040Z fence.proxy.async.shared::cta; 2026-02-21T10:22:35.4618101Z // end inline asm 2026-02-21T10:22:35.4618159Z bar.sync 0; 2026-02-21T10:22:35.4618230Z elect.sync %r4799|%p144, -1; 2026-02-21T10:22:35.4618301Z and.pred %p94, %p1, %p144; 2026-02-21T10:22:35.4618369Z or.b32 %r3102, %r424, 64; 2026-02-21T10:22:35.4618432Z // begin inline asm 2026-02-21T10:22:35.4618764Z @%p94 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r11026], [%rd394, {%r423, %r3102}], [%r4903]; 2026-02-21T10:22:35.4618829Z // end inline asm 2026-02-21T10:22:35.4618886Z bar.sync 0; 2026-02-21T10:22:35.4618946Z // begin inline asm 2026-02-21T10:22:35.4619003Z 2026-02-21T10:22:35.4619058Z { 2026-02-21T10:22:35.4619124Z .reg 
.pred complete; 2026-02-21T10:22:35.4619181Z waitLoop: 2026-02-21T10:22:35.4619398Z mbarrier.try_wait.parity.shared.b64 complete, [%r4903], %r4743; 2026-02-21T10:22:35.4619474Z @!complete bra.uni waitLoop; 2026-02-21T10:22:35.4619525Z } 2026-02-21T10:22:35.4619530Z 2026-02-21T10:22:35.4619592Z // end inline asm 2026-02-21T10:22:35.4619659Z bar.sync 0; 2026-02-21T10:22:35.4619721Z // begin inline asm 2026-02-21T10:22:35.4619819Z @%p472 mbarrier.inval.shared::cta.b64 [%r4903]; 2026-02-21T10:22:35.4619882Z // end inline asm 2026-02-21T10:22:35.4620095Z .loc 1 78 58 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:78:58 2026-02-21T10:22:35.4620166Z ld.shared.b8 %rs257, [%r24]; 2026-02-21T10:22:35.4620241Z ld.shared.b8 %rs258, [%r24+1024]; 2026-02-21T10:22:35.4620308Z ld.shared.b8 %rs259, [%r24+2048]; 2026-02-21T10:22:35.4620372Z ld.shared.b8 %rs260, [%r24+3072]; 2026-02-21T10:22:35.4620447Z ld.shared.b8 %rs261, [%r25+256]; 2026-02-21T10:22:35.4620515Z ld.shared.b8 %rs262, [%r25+1280]; 2026-02-21T10:22:35.4620578Z ld.shared.b8 %rs263, [%r25+2304]; 2026-02-21T10:22:35.4620646Z ld.shared.b8 %rs264, [%r25+3328]; 2026-02-21T10:22:35.4620719Z ld.shared.b8 %rs265, [%r26+512]; 2026-02-21T10:22:35.4620784Z ld.shared.b8 %rs266, [%r26+1536]; 2026-02-21T10:22:35.4620847Z ld.shared.b8 %rs267, [%r26+2560]; 2026-02-21T10:22:35.4620919Z ld.shared.b8 %rs268, [%r26+3584]; 2026-02-21T10:22:35.4620983Z ld.shared.b8 %rs269, [%r27+768]; 2026-02-21T10:22:35.4621047Z ld.shared.b8 %rs270, [%r27+1792]; 2026-02-21T10:22:35.4621116Z ld.shared.b8 %rs271, [%r27+2816]; 2026-02-21T10:22:35.4621187Z ld.shared.b8 %rs272, [%r27+3840]; 2026-02-21T10:22:35.4621398Z .loc 1 63 28 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:63:28 2026-02-21T10:22:35.4621464Z shl.b16 %rs273, %rs257, 4; 2026-02-21T10:22:35.4621533Z shl.b16 %rs274, %rs261, 4; 2026-02-21T10:22:35.4621594Z shl.b16 %rs275, %rs265, 4; 2026-02-21T10:22:35.4621660Z shl.b16 %rs276, %rs269, 4; 2026-02-21T10:22:35.4621727Z 
shl.b16 %rs277, %rs258, 4; 2026-02-21T10:22:35.4621790Z shl.b16 %rs278, %rs262, 4; 2026-02-21T10:22:35.4621855Z shl.b16 %rs279, %rs266, 4; 2026-02-21T10:22:35.4621920Z shl.b16 %rs280, %rs270, 4; 2026-02-21T10:22:35.4621998Z shl.b16 %rs281, %rs259, 4; 2026-02-21T10:22:35.4622063Z shl.b16 %rs282, %rs263, 4; 2026-02-21T10:22:35.4622126Z shl.b16 %rs283, %rs267, 4; 2026-02-21T10:22:35.4622193Z shl.b16 %rs284, %rs271, 4; 2026-02-21T10:22:35.4622253Z shl.b16 %rs285, %rs260, 4; 2026-02-21T10:22:35.4622316Z shl.b16 %rs286, %rs264, 4; 2026-02-21T10:22:35.4622436Z shl.b16 %rs287, %rs268, 4; 2026-02-21T10:22:35.4622507Z shl.b16 %rs288, %rs272, 4; 2026-02-21T10:22:35.4622771Z .loc 1 78 58 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:78:58 2026-02-21T10:22:35.4622849Z selp.b16 %rs289, %rs273, %rs257, %p484; 2026-02-21T10:22:35.4622921Z cvt.s16.s8 %rs290, %rs289; 2026-02-21T10:22:35.4623028Z shr.s16 %rs291, %rs290, 4; 2026-02-21T10:22:35.4623104Z selp.b16 %rs292, %rs274, %rs261, %p484; 2026-02-21T10:22:35.4623182Z cvt.s16.s8 %rs293, %rs292; 2026-02-21T10:22:35.4623249Z shr.s16 %rs294, %rs293, 4; 2026-02-21T10:22:35.4623319Z selp.b16 %rs295, %rs275, %rs265, %p484; 2026-02-21T10:22:35.4623382Z cvt.s16.s8 %rs296, %rs295; 2026-02-21T10:22:35.4623450Z shr.s16 %rs297, %rs296, 4; 2026-02-21T10:22:35.4623520Z selp.b16 %rs298, %rs276, %rs269, %p484; 2026-02-21T10:22:35.4623583Z cvt.s16.s8 %rs299, %rs298; 2026-02-21T10:22:35.4623651Z shr.s16 %rs300, %rs299, 4; 2026-02-21T10:22:35.4623719Z selp.b16 %rs301, %rs277, %rs258, %p484; 2026-02-21T10:22:35.4623786Z cvt.s16.s8 %rs302, %rs301; 2026-02-21T10:22:35.4623849Z shr.s16 %rs303, %rs302, 4; 2026-02-21T10:22:35.4623929Z selp.b16 %rs304, %rs278, %rs262, %p484; 2026-02-21T10:22:35.4623990Z cvt.s16.s8 %rs305, %rs304; 2026-02-21T10:22:35.4624051Z shr.s16 %rs306, %rs305, 4; 2026-02-21T10:22:35.4624133Z selp.b16 %rs307, %rs279, %rs266, %p484; 2026-02-21T10:22:35.4624193Z cvt.s16.s8 %rs308, %rs307; 2026-02-21T10:22:35.4624255Z shr.s16 
%rs309, %rs308, 4; 2026-02-21T10:22:35.4624374Z selp.b16 %rs310, %rs280, %rs270, %p484; 2026-02-21T10:22:35.4624443Z cvt.s16.s8 %rs311, %rs310; 2026-02-21T10:22:35.4624506Z shr.s16 %rs312, %rs311, 4; 2026-02-21T10:22:35.4624574Z selp.b16 %rs313, %rs281, %rs259, %p484; 2026-02-21T10:22:35.4624643Z cvt.s16.s8 %rs314, %rs313; 2026-02-21T10:22:35.4624704Z shr.s16 %rs315, %rs314, 4; 2026-02-21T10:22:35.4624773Z selp.b16 %rs316, %rs282, %rs263, %p484; 2026-02-21T10:22:35.4624839Z cvt.s16.s8 %rs317, %rs316; 2026-02-21T10:22:35.4624902Z shr.s16 %rs318, %rs317, 4; 2026-02-21T10:22:35.4624987Z selp.b16 %rs319, %rs283, %rs267, %p484; 2026-02-21T10:22:35.4625054Z cvt.s16.s8 %rs320, %rs319; 2026-02-21T10:22:35.4625121Z shr.s16 %rs321, %rs320, 4; 2026-02-21T10:22:35.4625192Z selp.b16 %rs322, %rs284, %rs271, %p484; 2026-02-21T10:22:35.4625254Z cvt.s16.s8 %rs323, %rs322; 2026-02-21T10:22:35.4625323Z shr.s16 %rs324, %rs323, 4; 2026-02-21T10:22:35.4625391Z selp.b16 %rs325, %rs285, %rs260, %p484; 2026-02-21T10:22:35.4625455Z cvt.s16.s8 %rs326, %rs325; 2026-02-21T10:22:35.4625515Z shr.s16 %rs327, %rs326, 4; 2026-02-21T10:22:35.4625592Z selp.b16 %rs328, %rs286, %rs264, %p484; 2026-02-21T10:22:35.4625653Z cvt.s16.s8 %rs329, %rs328; 2026-02-21T10:22:35.4625713Z shr.s16 %rs330, %rs329, 4; 2026-02-21T10:22:35.4625790Z selp.b16 %rs331, %rs287, %rs268, %p484; 2026-02-21T10:22:35.4625852Z cvt.s16.s8 %rs332, %rs331; 2026-02-21T10:22:35.4625913Z shr.s16 %rs333, %rs332, 4; 2026-02-21T10:22:35.4625983Z selp.b16 %rs334, %rs288, %rs272, %p484; 2026-02-21T10:22:35.4626053Z cvt.s16.s8 %rs335, %rs334; 2026-02-21T10:22:35.4626119Z shr.s16 %rs336, %rs335, 4; 2026-02-21T10:22:35.4626324Z .loc 1 83 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:83:32 2026-02-21T10:22:35.4626398Z cvt.rn.f32.s16 %r4800, %rs291; 2026-02-21T10:22:35.4626587Z cvt.rn.f32.s16 %r4801, %rs294; 2026-02-21T10:22:35.4626658Z cvt.rn.f32.s16 %r4802, %rs297; 2026-02-21T10:22:35.4626727Z cvt.rn.f32.s16 %r4803, %rs300; 
2026-02-21T10:22:35.4626793Z cvt.rn.f32.s16 %r4804, %rs303; 2026-02-21T10:22:35.4626860Z cvt.rn.f32.s16 %r4805, %rs306; 2026-02-21T10:22:35.4626923Z cvt.rn.f32.s16 %r4806, %rs309; 2026-02-21T10:22:35.4626991Z cvt.rn.f32.s16 %r4807, %rs312; 2026-02-21T10:22:35.4627054Z cvt.rn.f32.s16 %r4808, %rs315; 2026-02-21T10:22:35.4627116Z cvt.rn.f32.s16 %r4809, %rs318; 2026-02-21T10:22:35.4627186Z cvt.rn.f32.s16 %r4810, %rs321; 2026-02-21T10:22:35.4627249Z cvt.rn.f32.s16 %r4811, %rs324; 2026-02-21T10:22:35.4627393Z cvt.rn.f32.s16 %r4812, %rs327; 2026-02-21T10:22:35.4627518Z cvt.rn.f32.s16 %r4813, %rs330; 2026-02-21T10:22:35.4627589Z cvt.rn.f32.s16 %r4814, %rs333; 2026-02-21T10:22:35.4627652Z cvt.rn.f32.s16 %r4815, %rs336; 2026-02-21T10:22:35.4627708Z bar.sync 0; 2026-02-21T10:22:35.4627782Z st.shared.b32 [%r28], %r4800; 2026-02-21T10:22:35.4627914Z st.shared.b32 [%r28+16384], %r4808; 2026-02-21T10:22:35.4627980Z st.shared.b32 [%r29], %r4801; 2026-02-21T10:22:35.4628049Z st.shared.b32 [%r29+16384], %r4809; 2026-02-21T10:22:35.4628117Z st.shared.b32 [%r30], %r4802; 2026-02-21T10:22:35.4628183Z st.shared.b32 [%r30+16384], %r4810; 2026-02-21T10:22:35.4628247Z st.shared.b32 [%r31], %r4803; 2026-02-21T10:22:35.4628318Z st.shared.b32 [%r31+16384], %r4811; 2026-02-21T10:22:35.4628381Z st.shared.b32 [%r32], %r4804; 2026-02-21T10:22:35.4628526Z st.shared.b32 [%r32+16384], %r4812; 2026-02-21T10:22:35.4628599Z st.shared.b32 [%r33], %r4805; 2026-02-21T10:22:35.4628667Z st.shared.b32 [%r33+16384], %r4813; 2026-02-21T10:22:35.4628737Z st.shared.b32 [%r34], %r4806; 2026-02-21T10:22:35.4628806Z st.shared.b32 [%r34+16384], %r4814; 2026-02-21T10:22:35.4628877Z st.shared.b32 [%r35], %r4807; 2026-02-21T10:22:35.4628943Z st.shared.b32 [%r35+16384], %r4815; 2026-02-21T10:22:35.4628999Z $L__tmp5: 2026-02-21T10:22:35.4629287Z .loc 2 291 36 // standard.py:291:36 @[ co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:90:40 ] 2026-02-21T10:22:35.4629349Z // begin inline asm 
2026-02-21T10:22:35.4629515Z fence.proxy.async.shared::cta; 2026-02-21T10:22:35.4629577Z // end inline asm 2026-02-21T10:22:35.4629639Z bar.sync 0; 2026-02-21T10:22:35.4629716Z wgmma.fence.sync.aligned; 2026-02-21T10:22:35.4629776Z // begin inline asm 2026-02-21T10:22:35.4630246Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560}, {%r3427,%r3428,%r3429,%r3430}, %rd266, %p24, 1, 1; 2026-02-21T10:22:35.4630309Z // end inline asm 2026-02-21T10:22:35.4630371Z // begin inline asm 2026-02-21T10:22:35.4630824Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560}, {%r3463,%r3464,%r3465,%r3466}, %rd267, %p24, 1, 1; 2026-02-21T10:22:35.4630884Z // end inline asm 2026-02-21T10:22:35.4630944Z // begin inline asm 2026-02-21T10:22:35.4631396Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560}, {%r3499,%r3500,%r3501,%r3502}, %rd268, %p24, 1, 1; 2026-02-21T10:22:35.4631456Z // end inline asm 2026-02-21T10:22:35.4631515Z // begin inline asm 2026-02-21T10:22:35.4631953Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560}, {%r3535,%r3536,%r3537,%r3538}, %rd269, %p24, 1, 1; 2026-02-21T10:22:35.4632020Z // end inline asm 2026-02-21T10:22:35.4632084Z // begin inline asm 2026-02-21T10:22:35.4632543Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560}, {%r3571,%r3572,%r3573,%r3574}, %rd270, %p24, 1, 1; 2026-02-21T10:22:35.4632618Z // end inline asm 2026-02-21T10:22:35.4632686Z // begin inline asm 2026-02-21T10:22:35.4633140Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560}, {%r3607,%r3608,%r3609,%r3610}, %rd271, %p24, 1, 1; 2026-02-21T10:22:35.4633204Z // end inline asm 2026-02-21T10:22:35.4633265Z // begin inline asm 2026-02-21T10:22:35.4633711Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560}, {%r3643,%r3644,%r3645,%r3646}, %rd272, %p24, 1, 1; 2026-02-21T10:22:35.4633774Z // end inline asm 2026-02-21T10:22:35.4633834Z // begin inline asm 2026-02-21T10:22:35.4634330Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560}, {%r3679,%r3680,%r3681,%r3682}, %rd273, %p24, 1, 1; 2026-02-21T10:22:35.4634436Z // end inline asm 2026-02-21T10:22:35.4634538Z // begin inline asm 2026-02-21T10:22:35.4634986Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848}, {%r3427,%r3428,%r3429,%r3430}, %rd274, %p24, 1, 1; 2026-02-21T10:22:35.4635050Z // end inline asm 2026-02-21T10:22:35.4635108Z // begin inline asm 2026-02-21T10:22:35.4635550Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848}, {%r3463,%r3464,%r3465,%r3466}, %rd275, %p24, 1, 1; 2026-02-21T10:22:35.4635607Z // end inline asm 2026-02-21T10:22:35.4635671Z // begin inline asm 2026-02-21T10:22:35.4636124Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848}, {%r3499,%r3500,%r3501,%r3502}, %rd276, %p24, 1, 1; 2026-02-21T10:22:35.4636189Z // end inline asm 2026-02-21T10:22:35.4636252Z // begin inline asm 2026-02-21T10:22:35.4636897Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848}, {%r3535,%r3536,%r3537,%r3538}, %rd277, %p24, 1, 1; 2026-02-21T10:22:35.4636962Z // end inline asm 2026-02-21T10:22:35.4637025Z // begin inline asm 2026-02-21T10:22:35.4637471Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848}, {%r3571,%r3572,%r3573,%r3574}, %rd278, %p24, 1, 1; 2026-02-21T10:22:35.4637529Z // end inline asm 2026-02-21T10:22:35.4637592Z // begin inline asm 2026-02-21T10:22:35.4638037Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848}, {%r3607,%r3608,%r3609,%r3610}, %rd279, %p24, 1, 1; 2026-02-21T10:22:35.4638110Z // end inline asm 2026-02-21T10:22:35.4638176Z // begin inline asm 2026-02-21T10:22:35.4638617Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848}, {%r3643,%r3644,%r3645,%r3646}, %rd280, %p24, 1, 1; 2026-02-21T10:22:35.4638679Z // end inline asm 2026-02-21T10:22:35.4638739Z // begin inline asm 2026-02-21T10:22:35.4639183Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848}, {%r3679,%r3680,%r3681,%r3682}, %rd281, %p24, 1, 1; 2026-02-21T10:22:35.4639242Z // end inline asm 2026-02-21T10:22:35.4639300Z // begin inline asm 2026-02-21T10:22:35.4639805Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136}, {%r3427,%r3428,%r3429,%r3430}, %rd282, %p24, 1, 1; 2026-02-21T10:22:35.4639864Z // end inline asm 2026-02-21T10:22:35.4639921Z // begin inline asm 2026-02-21T10:22:35.4640420Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136}, {%r3463,%r3464,%r3465,%r3466}, %rd283, %p24, 1, 1; 2026-02-21T10:22:35.4640482Z // end inline asm 2026-02-21T10:22:35.4640542Z // begin inline asm 2026-02-21T10:22:35.4641035Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136}, {%r3499,%r3500,%r3501,%r3502}, %rd284, %p24, 1, 1; 2026-02-21T10:22:35.4641093Z // end inline asm 2026-02-21T10:22:35.4641151Z // begin inline asm 2026-02-21T10:22:35.4641719Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136}, {%r3535,%r3536,%r3537,%r3538}, %rd285, %p24, 1, 1; 2026-02-21T10:22:35.4641832Z // end inline asm 2026-02-21T10:22:35.4641892Z // begin inline asm 2026-02-21T10:22:35.4642392Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136}, {%r3571,%r3572,%r3573,%r3574}, %rd286, %p24, 1, 1; 2026-02-21T10:22:35.4642511Z // end inline asm 2026-02-21T10:22:35.4642571Z // begin inline asm 2026-02-21T10:22:35.4643067Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136}, {%r3607,%r3608,%r3609,%r3610}, %rd287, %p24, 1, 1; 2026-02-21T10:22:35.4643124Z // end inline asm 2026-02-21T10:22:35.4643182Z // begin inline asm 2026-02-21T10:22:35.4643677Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136}, {%r3643,%r3644,%r3645,%r3646}, %rd288, %p24, 1, 1; 2026-02-21T10:22:35.4643745Z // end inline asm 2026-02-21T10:22:35.4643805Z // begin inline asm 
2026-02-21T10:22:35.4644345Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136}, {%r3679,%r3680,%r3681,%r3682}, %rd289, %p24, 1, 1; 2026-02-21T10:22:35.4644412Z // end inline asm 2026-02-21T10:22:35.4644469Z // begin inline asm 2026-02-21T10:22:35.4644958Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424}, {%r3427,%r3428,%r3429,%r3430}, %rd290, %p24, 1, 1; 2026-02-21T10:22:35.4645021Z // end inline asm 2026-02-21T10:22:35.4645080Z // begin inline asm 2026-02-21T10:22:35.4645571Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424}, {%r3463,%r3464,%r3465,%r3466}, %rd291, %p24, 1, 1; 2026-02-21T10:22:35.4645635Z // end inline asm 2026-02-21T10:22:35.4645694Z // begin inline asm 2026-02-21T10:22:35.4646187Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424}, {%r3499,%r3500,%r3501,%r3502}, %rd292, %p24, 1, 1; 2026-02-21T10:22:35.4646251Z // end inline asm 2026-02-21T10:22:35.4646311Z // begin inline asm 2026-02-21T10:22:35.4646919Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424}, {%r3535,%r3536,%r3537,%r3538}, %rd293, %p24, 1, 1; 2026-02-21T10:22:35.4646987Z // end inline asm 2026-02-21T10:22:35.4647047Z // begin inline asm 2026-02-21T10:22:35.4647543Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424}, {%r3571,%r3572,%r3573,%r3574}, %rd294, %p24, 1, 1; 2026-02-21T10:22:35.4647608Z // end 
inline asm 2026-02-21T10:22:35.4647666Z // begin inline asm 2026-02-21T10:22:35.4648158Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424}, {%r3607,%r3608,%r3609,%r3610}, %rd295, %p24, 1, 1; 2026-02-21T10:22:35.4648218Z // end inline asm 2026-02-21T10:22:35.4648283Z // begin inline asm 2026-02-21T10:22:35.4648776Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424}, {%r3643,%r3644,%r3645,%r3646}, %rd296, %p24, 1, 1; 2026-02-21T10:22:35.4648832Z // end inline asm 2026-02-21T10:22:35.4648899Z // begin inline asm 2026-02-21T10:22:35.4649485Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424}, {%r3679,%r3680,%r3681,%r3682}, %rd297, %p24, 1, 1; 2026-02-21T10:22:35.4649606Z // end inline asm 2026-02-21T10:22:35.4649695Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:35.4649819Z mov.b32 %r4324, %r4743; 2026-02-21T10:22:35.4649880Z mov.b32 %r4325, %r4743; 2026-02-21T10:22:35.4649946Z mov.b32 %r4323, %r11026; 2026-02-21T10:22:35.4650008Z // begin inline asm 2026-02-21T10:22:35.4650978Z // wait for regs: %r545,%r546,%r547,%r548,%r549,%r550,%r551,%r552,%r553,%r554,%r555,%r556,%r557,%r558,%r559,%r560,%r833,%r834,%r835,%r836,%r837,%r838,%r839,%r840,%r841,%r842,%r843,%r844,%r845,%r846,%r847,%r848,%r1121,%r1122,%r1123,%r1124,%r1125,%r1126,%r1127,%r1128,%r1129,%r1130,%r1131,%r1132,%r1133,%r1134,%r1135,%r1136,%r1409,%r1410,%r1411,%r1412,%r1413,%r1414,%r1415,%r1416,%r1417,%r1418,%r1419,%r1420,%r1421,%r1422,%r1423,%r1424,%r4323,%r4324,%r4325 2026-02-21T10:22:35.4651066Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:35.4651127Z // end inline asm 2026-02-21T10:22:35.4651184Z $L__tmp6: 2026-02-21T10:22:35.4651401Z .loc 1 54 80 // 
co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.4651479Z add.s64 %rd240, %rd595, 384; 2026-02-21T10:22:35.4651541Z // begin inline asm 2026-02-21T10:22:35.4651601Z mov.u64 %rd239, 0x0; 2026-02-21T10:22:35.4651801Z createpolicy.fractional.L2::evict_last.b64 %rd239, 1.0; 2026-02-21T10:22:35.4651862Z // end inline asm 2026-02-21T10:22:35.4651923Z // begin inline asm 2026-02-21T10:22:35.4651987Z mov.u32 %r4393, 0x0; 2026-02-21T10:22:35.4652046Z mov.u32 %r4394, 0x0; 2026-02-21T10:22:35.4652103Z mov.u32 %r4395, 0x0; 2026-02-21T10:22:35.4652160Z mov.u32 %r4396, 0x0; 2026-02-21T10:22:35.4652392Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r4393, %r4394, %r4395, %r4396 }, [ %rd240 + 0 ], %rd239; 2026-02-21T10:22:35.4652448Z // end inline asm 2026-02-21T10:22:35.4652657Z .loc 1 58 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:58:32 2026-02-21T10:22:35.4652734Z bar.sync 0; 2026-02-21T10:22:35.4652820Z st.shared.v2.b32 [%r14], {%r4393, %r4394}; 2026-02-21T10:22:35.4652898Z st.shared.v2.b32 [%r15], {%r4395, %r4396}; 2026-02-21T10:22:35.4652961Z bar.sync 0; 2026-02-21T10:22:35.4653030Z ld.shared.b16 %rs337, [%r16]; 2026-02-21T10:22:35.4653102Z ld.shared.b16 %rs338, [%r16+1024]; 2026-02-21T10:22:35.4653173Z ld.shared.b16 %rs339, [%r16+64]; 2026-02-21T10:22:35.4653258Z ld.shared.b16 %rs340, [%r16+1088]; 2026-02-21T10:22:35.4653322Z ld.shared.b16 %rs341, [%r17]; 2026-02-21T10:22:35.4653387Z ld.shared.b16 %rs342, [%r17+1024]; 2026-02-21T10:22:35.4653459Z ld.shared.b16 %rs343, [%r17+64]; 2026-02-21T10:22:35.4653522Z ld.shared.b16 %rs344, [%r17+1088]; 2026-02-21T10:22:35.4653586Z ld.shared.b16 %rs345, [%r18]; 2026-02-21T10:22:35.4653651Z ld.shared.b16 %rs346, [%r18+1024]; 2026-02-21T10:22:35.4653725Z ld.shared.b16 %rs347, [%r18+64]; 2026-02-21T10:22:35.4653792Z ld.shared.b16 %rs348, [%r18+1088]; 2026-02-21T10:22:35.4653855Z ld.shared.b16 %rs349, [%r19]; 2026-02-21T10:22:35.4653925Z ld.shared.b16 %rs350, [%r19+1024]; 
2026-02-21T10:22:35.4653990Z ld.shared.b16 %rs351, [%r19+64]; 2026-02-21T10:22:35.4654056Z ld.shared.b16 %rs352, [%r19+1088]; 2026-02-21T10:22:35.4654124Z ld.shared.b16 %rs353, [%r20]; 2026-02-21T10:22:35.4654188Z ld.shared.b16 %rs354, [%r20+1024]; 2026-02-21T10:22:35.4654257Z ld.shared.b16 %rs355, [%r20+64]; 2026-02-21T10:22:35.4654321Z ld.shared.b16 %rs356, [%r20+1088]; 2026-02-21T10:22:35.4654391Z ld.shared.b16 %rs357, [%r21]; 2026-02-21T10:22:35.4654455Z ld.shared.b16 %rs358, [%r21+1024]; 2026-02-21T10:22:35.4654519Z ld.shared.b16 %rs359, [%r21+64]; 2026-02-21T10:22:35.4654592Z ld.shared.b16 %rs360, [%r21+1088]; 2026-02-21T10:22:35.4654666Z ld.shared.b16 %rs361, [%r22]; 2026-02-21T10:22:35.4654732Z ld.shared.b16 %rs362, [%r22+1024]; 2026-02-21T10:22:35.4654856Z ld.shared.b16 %rs363, [%r22+64]; 2026-02-21T10:22:35.4654988Z ld.shared.b16 %rs364, [%r22+1088]; 2026-02-21T10:22:35.4655053Z ld.shared.b16 %rs365, [%r23]; 2026-02-21T10:22:35.4655117Z ld.shared.b16 %rs366, [%r23+1024]; 2026-02-21T10:22:35.4655186Z ld.shared.b16 %rs367, [%r23+64]; 2026-02-21T10:22:35.4655316Z ld.shared.b16 %rs368, [%r23+1088]; 2026-02-21T10:22:35.4655381Z cvt.f32.bf16 %r4470, %rs337; 2026-02-21T10:22:35.4655443Z cvt.f32.bf16 %r4471, %rs338; 2026-02-21T10:22:35.4655524Z cvt.f32.bf16 %r4472, %rs341; 2026-02-21T10:22:35.4655587Z cvt.f32.bf16 %r4473, %rs342; 2026-02-21T10:22:35.4655650Z cvt.f32.bf16 %r4506, %rs345; 2026-02-21T10:22:35.4655718Z cvt.f32.bf16 %r4507, %rs346; 2026-02-21T10:22:35.4655779Z cvt.f32.bf16 %r4508, %rs349; 2026-02-21T10:22:35.4655841Z cvt.f32.bf16 %r4509, %rs350; 2026-02-21T10:22:35.4655910Z cvt.f32.bf16 %r4542, %rs353; 2026-02-21T10:22:35.4655971Z cvt.f32.bf16 %r4543, %rs354; 2026-02-21T10:22:35.4656034Z cvt.f32.bf16 %r4544, %rs357; 2026-02-21T10:22:35.4656095Z cvt.f32.bf16 %r4545, %rs358; 2026-02-21T10:22:35.4656164Z cvt.f32.bf16 %r4578, %rs361; 2026-02-21T10:22:35.4656226Z cvt.f32.bf16 %r4579, %rs362; 2026-02-21T10:22:35.4656289Z cvt.f32.bf16 %r4580, %rs365; 
2026-02-21T10:22:35.4656354Z cvt.f32.bf16 %r4581, %rs366; 2026-02-21T10:22:35.4656417Z cvt.f32.bf16 %r4614, %rs339; 2026-02-21T10:22:35.4656597Z cvt.f32.bf16 %r4615, %rs340; 2026-02-21T10:22:35.4656664Z cvt.f32.bf16 %r4616, %rs343; 2026-02-21T10:22:35.4656811Z cvt.f32.bf16 %r4617, %rs344; 2026-02-21T10:22:35.4656875Z cvt.f32.bf16 %r4650, %rs347; 2026-02-21T10:22:35.4656936Z cvt.f32.bf16 %r4651, %rs348; 2026-02-21T10:22:35.4657003Z cvt.f32.bf16 %r4652, %rs351; 2026-02-21T10:22:35.4657064Z cvt.f32.bf16 %r4653, %rs352; 2026-02-21T10:22:35.4657126Z cvt.f32.bf16 %r4686, %rs355; 2026-02-21T10:22:35.4657189Z cvt.f32.bf16 %r4687, %rs356; 2026-02-21T10:22:35.4657254Z cvt.f32.bf16 %r4688, %rs359; 2026-02-21T10:22:35.4657316Z cvt.f32.bf16 %r4689, %rs360; 2026-02-21T10:22:35.4657381Z cvt.f32.bf16 %r4722, %rs363; 2026-02-21T10:22:35.4657464Z cvt.f32.bf16 %r4723, %rs364; 2026-02-21T10:22:35.4657526Z cvt.f32.bf16 %r4724, %rs367; 2026-02-21T10:22:35.4657592Z cvt.f32.bf16 %r4725, %rs368; 2026-02-21T10:22:35.4657810Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.4657870Z bar.sync 0; 2026-02-21T10:22:35.4657939Z // begin inline asm 2026-02-21T10:22:35.4658044Z @%p472 mbarrier.init.shared::cta.b64 [%r4903], 1; 2026-02-21T10:22:35.4658108Z // end inline asm 2026-02-21T10:22:35.4658176Z bar.sync 0; 2026-02-21T10:22:35.4658238Z // begin inline asm 2026-02-21T10:22:35.4658383Z @%p472 mbarrier.arrive.expect_tx.shared.b64 _, [%r4903], 4096; 2026-02-21T10:22:35.4658442Z // end inline asm 2026-02-21T10:22:35.4658503Z // begin inline asm 2026-02-21T10:22:35.4658582Z fence.proxy.async.shared::cta; 2026-02-21T10:22:35.4658647Z // end inline asm 2026-02-21T10:22:35.4658705Z bar.sync 0; 2026-02-21T10:22:35.4658775Z elect.sync %r4816|%p145, -1; 2026-02-21T10:22:35.4658852Z and.pred %p130, %p1, %p145; 2026-02-21T10:22:35.4658915Z or.b32 %r4401, %r424, 96; 2026-02-21T10:22:35.4658976Z // begin inline asm 2026-02-21T10:22:35.4659312Z @%p130 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r11026], [%rd394, {%r423, %r4401}], [%r4903]; 2026-02-21T10:22:35.4659374Z // end inline asm 2026-02-21T10:22:35.4659431Z bar.sync 0; 2026-02-21T10:22:35.4659501Z // begin inline asm 2026-02-21T10:22:35.4659562Z 2026-02-21T10:22:35.4659612Z { 2026-02-21T10:22:35.4659679Z .reg .pred complete; 2026-02-21T10:22:35.4659743Z waitLoop: 2026-02-21T10:22:35.4659885Z mbarrier.try_wait.parity.shared.b64 complete, [%r4903], %r4743; 2026-02-21T10:22:35.4659956Z @!complete bra.uni waitLoop; 2026-02-21T10:22:35.4660009Z } 2026-02-21T10:22:35.4660013Z 2026-02-21T10:22:35.4660078Z // end inline asm 2026-02-21T10:22:35.4660133Z bar.sync 0; 2026-02-21T10:22:35.4660277Z // begin inline asm 2026-02-21T10:22:35.4660439Z @%p472 mbarrier.inval.shared::cta.b64 [%r4903]; 2026-02-21T10:22:35.4660495Z // end inline asm 2026-02-21T10:22:35.4660706Z .loc 1 78 58 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:78:58 2026-02-21T10:22:35.4660773Z ld.shared.b8 %rs369, [%r24]; 2026-02-21T10:22:35.4660911Z ld.shared.b8 %rs370, [%r24+1024]; 2026-02-21T10:22:35.4660977Z ld.shared.b8 %rs371, [%r24+2048]; 2026-02-21T10:22:35.4661044Z ld.shared.b8 %rs372, [%r24+3072]; 2026-02-21T10:22:35.4661116Z ld.shared.b8 %rs373, [%r25+256]; 2026-02-21T10:22:35.4661180Z ld.shared.b8 %rs374, [%r25+1280]; 2026-02-21T10:22:35.4661243Z ld.shared.b8 %rs375, [%r25+2304]; 2026-02-21T10:22:35.4661312Z ld.shared.b8 %rs376, [%r25+3328]; 2026-02-21T10:22:35.4661381Z ld.shared.b8 %rs377, [%r26+512]; 2026-02-21T10:22:35.4661443Z ld.shared.b8 %rs378, [%r26+1536]; 2026-02-21T10:22:35.4661505Z ld.shared.b8 %rs379, [%r26+2560]; 2026-02-21T10:22:35.4661590Z ld.shared.b8 %rs380, [%r26+3584]; 2026-02-21T10:22:35.4661659Z ld.shared.b8 %rs381, [%r27+768]; 2026-02-21T10:22:35.4661726Z ld.shared.b8 %rs382, [%r27+1792]; 2026-02-21T10:22:35.4661794Z ld.shared.b8 %rs383, [%r27+2816]; 2026-02-21T10:22:35.4661858Z ld.shared.b8 %rs384, [%r27+3840]; 
2026-02-21T10:22:35.4662068Z .loc 1 63 28 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:63:28 2026-02-21T10:22:35.4662138Z shl.b16 %rs385, %rs369, 4; 2026-02-21T10:22:35.4662264Z shl.b16 %rs386, %rs373, 4; 2026-02-21T10:22:35.4662330Z shl.b16 %rs387, %rs377, 4; 2026-02-21T10:22:35.4662392Z shl.b16 %rs388, %rs381, 4; 2026-02-21T10:22:35.4662469Z shl.b16 %rs389, %rs370, 4; 2026-02-21T10:22:35.4662536Z shl.b16 %rs390, %rs374, 4; 2026-02-21T10:22:35.4662597Z shl.b16 %rs391, %rs378, 4; 2026-02-21T10:22:35.4662665Z shl.b16 %rs392, %rs382, 4; 2026-02-21T10:22:35.4662725Z shl.b16 %rs393, %rs371, 4; 2026-02-21T10:22:35.4662787Z shl.b16 %rs394, %rs375, 4; 2026-02-21T10:22:35.4662853Z shl.b16 %rs395, %rs379, 4; 2026-02-21T10:22:35.4662923Z shl.b16 %rs396, %rs383, 4; 2026-02-21T10:22:35.4662986Z shl.b16 %rs397, %rs372, 4; 2026-02-21T10:22:35.4663046Z shl.b16 %rs398, %rs376, 4; 2026-02-21T10:22:35.4663115Z shl.b16 %rs399, %rs380, 4; 2026-02-21T10:22:35.4663178Z shl.b16 %rs400, %rs384, 4; 2026-02-21T10:22:35.4663382Z .loc 1 78 58 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:78:58 2026-02-21T10:22:35.4663459Z selp.b16 %rs401, %rs385, %rs369, %p484; 2026-02-21T10:22:35.4663529Z cvt.s16.s8 %rs402, %rs401; 2026-02-21T10:22:35.4663592Z shr.s16 %rs403, %rs402, 4; 2026-02-21T10:22:35.4663666Z selp.b16 %rs404, %rs386, %rs373, %p484; 2026-02-21T10:22:35.4663735Z cvt.s16.s8 %rs405, %rs404; 2026-02-21T10:22:35.4663796Z shr.s16 %rs406, %rs405, 4; 2026-02-21T10:22:35.4663865Z selp.b16 %rs407, %rs387, %rs377, %p484; 2026-02-21T10:22:35.4663939Z cvt.s16.s8 %rs408, %rs407; 2026-02-21T10:22:35.4664006Z shr.s16 %rs409, %rs408, 4; 2026-02-21T10:22:35.4664078Z selp.b16 %rs410, %rs388, %rs381, %p484; 2026-02-21T10:22:35.4664143Z cvt.s16.s8 %rs411, %rs410; 2026-02-21T10:22:35.4664209Z shr.s16 %rs412, %rs411, 4; 2026-02-21T10:22:35.4664278Z selp.b16 %rs413, %rs389, %rs370, %p484; 2026-02-21T10:22:35.4664340Z cvt.s16.s8 %rs414, %rs413; 2026-02-21T10:22:35.4664408Z shr.s16 
%rs415, %rs414, 4; 2026-02-21T10:22:35.4664477Z selp.b16 %rs416, %rs390, %rs374, %p484; 2026-02-21T10:22:35.4664538Z cvt.s16.s8 %rs417, %rs416; 2026-02-21T10:22:35.4664604Z shr.s16 %rs418, %rs417, 4; 2026-02-21T10:22:35.4664678Z selp.b16 %rs419, %rs391, %rs378, %p484; 2026-02-21T10:22:35.4664741Z cvt.s16.s8 %rs420, %rs419; 2026-02-21T10:22:35.4664802Z shr.s16 %rs421, %rs420, 4; 2026-02-21T10:22:35.4664876Z selp.b16 %rs422, %rs392, %rs382, %p484; 2026-02-21T10:22:35.4664937Z cvt.s16.s8 %rs423, %rs422; 2026-02-21T10:22:35.4664999Z shr.s16 %rs424, %rs423, 4; 2026-02-21T10:22:35.4665068Z selp.b16 %rs425, %rs393, %rs371, %p484; 2026-02-21T10:22:35.4665200Z cvt.s16.s8 %rs426, %rs425; 2026-02-21T10:22:35.4665307Z shr.s16 %rs427, %rs426, 4; 2026-02-21T10:22:35.4665378Z selp.b16 %rs428, %rs394, %rs375, %p484; 2026-02-21T10:22:35.4665446Z cvt.s16.s8 %rs429, %rs428; 2026-02-21T10:22:35.4665506Z shr.s16 %rs430, %rs429, 4; 2026-02-21T10:22:35.4665588Z selp.b16 %rs431, %rs395, %rs379, %p484; 2026-02-21T10:22:35.4665696Z cvt.s16.s8 %rs432, %rs431; 2026-02-21T10:22:35.4665765Z shr.s16 %rs433, %rs432, 4; 2026-02-21T10:22:35.4665837Z selp.b16 %rs434, %rs396, %rs383, %p484; 2026-02-21T10:22:35.4665899Z cvt.s16.s8 %rs435, %rs434; 2026-02-21T10:22:35.4665966Z shr.s16 %rs436, %rs435, 4; 2026-02-21T10:22:35.4666036Z selp.b16 %rs437, %rs397, %rs372, %p484; 2026-02-21T10:22:35.4666097Z cvt.s16.s8 %rs438, %rs437; 2026-02-21T10:22:35.4666167Z shr.s16 %rs439, %rs438, 4; 2026-02-21T10:22:35.4666236Z selp.b16 %rs440, %rs398, %rs376, %p484; 2026-02-21T10:22:35.4666299Z cvt.s16.s8 %rs441, %rs440; 2026-02-21T10:22:35.4666361Z shr.s16 %rs442, %rs441, 4; 2026-02-21T10:22:35.4666440Z selp.b16 %rs443, %rs399, %rs380, %p484; 2026-02-21T10:22:35.4666626Z cvt.s16.s8 %rs444, %rs443; 2026-02-21T10:22:35.4666688Z shr.s16 %rs445, %rs444, 4; 2026-02-21T10:22:35.4666767Z selp.b16 %rs446, %rs400, %rs384, %p484; 2026-02-21T10:22:35.4666828Z cvt.s16.s8 %rs447, %rs446; 2026-02-21T10:22:35.4666894Z shr.s16 %rs448, 
%rs447, 4; 2026-02-21T10:22:35.4667111Z .loc 1 83 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:83:32 2026-02-21T10:22:35.4667262Z cvt.rn.f32.s16 %r4817, %rs403; 2026-02-21T10:22:35.4667333Z cvt.rn.f32.s16 %r4818, %rs406; 2026-02-21T10:22:35.4667397Z cvt.rn.f32.s16 %r4819, %rs409; 2026-02-21T10:22:35.4667465Z cvt.rn.f32.s16 %r4820, %rs412; 2026-02-21T10:22:35.4667526Z cvt.rn.f32.s16 %r4821, %rs415; 2026-02-21T10:22:35.4667588Z cvt.rn.f32.s16 %r4822, %rs418; 2026-02-21T10:22:35.4667655Z cvt.rn.f32.s16 %r4823, %rs421; 2026-02-21T10:22:35.4667719Z cvt.rn.f32.s16 %r4824, %rs424; 2026-02-21T10:22:35.4667787Z cvt.rn.f32.s16 %r4825, %rs427; 2026-02-21T10:22:35.4667853Z cvt.rn.f32.s16 %r4826, %rs430; 2026-02-21T10:22:35.4667919Z cvt.rn.f32.s16 %r4827, %rs433; 2026-02-21T10:22:35.4667981Z cvt.rn.f32.s16 %r4828, %rs436; 2026-02-21T10:22:35.4668045Z cvt.rn.f32.s16 %r4829, %rs439; 2026-02-21T10:22:35.4668116Z cvt.rn.f32.s16 %r4830, %rs442; 2026-02-21T10:22:35.4668178Z cvt.rn.f32.s16 %r4831, %rs445; 2026-02-21T10:22:35.4668241Z cvt.rn.f32.s16 %r4832, %rs448; 2026-02-21T10:22:35.4668300Z bar.sync 0; 2026-02-21T10:22:35.4668369Z st.shared.b32 [%r28], %r4817; 2026-02-21T10:22:35.4668518Z st.shared.b32 [%r28+16384], %r4825; 2026-02-21T10:22:35.4668588Z st.shared.b32 [%r29], %r4818; 2026-02-21T10:22:35.4668662Z st.shared.b32 [%r29+16384], %r4826; 2026-02-21T10:22:35.4668725Z st.shared.b32 [%r30], %r4819; 2026-02-21T10:22:35.4668791Z st.shared.b32 [%r30+16384], %r4827; 2026-02-21T10:22:35.4668856Z st.shared.b32 [%r31], %r4820; 2026-02-21T10:22:35.4668926Z st.shared.b32 [%r31+16384], %r4828; 2026-02-21T10:22:35.4668991Z st.shared.b32 [%r32], %r4821; 2026-02-21T10:22:35.4669060Z st.shared.b32 [%r32+16384], %r4829; 2026-02-21T10:22:35.4669130Z st.shared.b32 [%r33], %r4822; 2026-02-21T10:22:35.4669193Z st.shared.b32 [%r33+16384], %r4830; 2026-02-21T10:22:35.4669255Z st.shared.b32 [%r34], %r4823; 2026-02-21T10:22:35.4669328Z st.shared.b32 [%r34+16384], %r4831; 
2026-02-21T10:22:35.4669391Z st.shared.b32 [%r35], %r4824; 2026-02-21T10:22:35.4669462Z st.shared.b32 [%r35+16384], %r4832; 2026-02-21T10:22:35.4669519Z $L__tmp7: 2026-02-21T10:22:35.4669805Z .loc 2 291 36 // standard.py:291:36 @[ co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:90:40 ] 2026-02-21T10:22:35.4669990Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r545, %r833, %r1121, %r1409}; 2026-02-21T10:22:35.4670050Z bar.sync 0; 2026-02-21T10:22:35.4670114Z // begin inline asm 2026-02-21T10:22:35.4670253Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4474}, [%r4407]; 2026-02-21T10:22:35.4670393Z // end inline asm 2026-02-21T10:22:35.4670511Z bar.sync 0; 2026-02-21T10:22:35.4670695Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r547, %r835, %r1123, %r1411}; 2026-02-21T10:22:35.4670750Z bar.sync 0; 2026-02-21T10:22:35.4670809Z // begin inline asm 2026-02-21T10:22:35.4671008Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4476}, [%r4407]; 2026-02-21T10:22:35.4671067Z // end inline asm 2026-02-21T10:22:35.4671123Z bar.sync 0; 2026-02-21T10:22:35.4671303Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r546, %r834, %r1122, %r1410}; 2026-02-21T10:22:35.4671358Z bar.sync 0; 2026-02-21T10:22:35.4671417Z // begin inline asm 2026-02-21T10:22:35.4671549Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4475}, [%r4407]; 2026-02-21T10:22:35.4671613Z // end inline asm 2026-02-21T10:22:35.4671668Z bar.sync 0; 2026-02-21T10:22:35.4671838Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r548, %r836, %r1124, %r1412}; 2026-02-21T10:22:35.4671899Z bar.sync 0; 2026-02-21T10:22:35.4671960Z // begin inline asm 2026-02-21T10:22:35.4672089Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4477}, [%r4407]; 2026-02-21T10:22:35.4672145Z // end inline asm 2026-02-21T10:22:35.4672208Z bar.sync 0; 2026-02-21T10:22:35.4672381Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r549, %r837, %r1125, %r1413}; 2026-02-21T10:22:35.4672440Z bar.sync 0; 2026-02-21T10:22:35.4672506Z // 
begin inline asm 2026-02-21T10:22:35.4672635Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4478}, [%r4407]; 2026-02-21T10:22:35.4672754Z // end inline asm 2026-02-21T10:22:35.4672819Z bar.sync 0; 2026-02-21T10:22:35.4672994Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r551, %r839, %r1127, %r1415}; 2026-02-21T10:22:35.4673050Z bar.sync 0; 2026-02-21T10:22:35.4673109Z // begin inline asm 2026-02-21T10:22:35.4673241Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4480}, [%r4407]; 2026-02-21T10:22:35.4673298Z // end inline asm 2026-02-21T10:22:35.4673353Z bar.sync 0; 2026-02-21T10:22:35.4673529Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r550, %r838, %r1126, %r1414}; 2026-02-21T10:22:35.4673586Z bar.sync 0; 2026-02-21T10:22:35.4673645Z // begin inline asm 2026-02-21T10:22:35.4673769Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4479}, [%r4407]; 2026-02-21T10:22:35.4673835Z // end inline asm 2026-02-21T10:22:35.4673894Z bar.sync 0; 2026-02-21T10:22:35.4674065Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r552, %r840, %r1128, %r1416}; 2026-02-21T10:22:35.4674126Z bar.sync 0; 2026-02-21T10:22:35.4674187Z // begin inline asm 2026-02-21T10:22:35.4674314Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4481}, [%r4407]; 2026-02-21T10:22:35.4674381Z // end inline asm 2026-02-21T10:22:35.4674446Z bar.sync 0; 2026-02-21T10:22:35.4674618Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r553, %r841, %r1129, %r1417}; 2026-02-21T10:22:35.4674675Z bar.sync 0; 2026-02-21T10:22:35.4674739Z // begin inline asm 2026-02-21T10:22:35.4674869Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4482}, [%r4407]; 2026-02-21T10:22:35.4674927Z // end inline asm 2026-02-21T10:22:35.4674990Z bar.sync 0; 2026-02-21T10:22:35.4675163Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r555, %r843, %r1131, %r1419}; 2026-02-21T10:22:35.4675219Z bar.sync 0; 2026-02-21T10:22:35.4675289Z // begin inline asm 2026-02-21T10:22:35.4675427Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4484}, 
[%r4407]; 2026-02-21T10:22:35.4675486Z // end inline asm 2026-02-21T10:22:35.4675541Z bar.sync 0; 2026-02-21T10:22:35.4675724Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r554, %r842, %r1130, %r1418}; 2026-02-21T10:22:35.4675779Z bar.sync 0; 2026-02-21T10:22:35.4675840Z // begin inline asm 2026-02-21T10:22:35.4675969Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4483}, [%r4407]; 2026-02-21T10:22:35.4676034Z // end inline asm 2026-02-21T10:22:35.4676090Z bar.sync 0; 2026-02-21T10:22:35.4676260Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r556, %r844, %r1132, %r1420}; 2026-02-21T10:22:35.4676321Z bar.sync 0; 2026-02-21T10:22:35.4676438Z // begin inline asm 2026-02-21T10:22:35.4676790Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4485}, [%r4407]; 2026-02-21T10:22:35.4676857Z // end inline asm 2026-02-21T10:22:35.4676913Z bar.sync 0; 2026-02-21T10:22:35.4677099Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r557, %r845, %r1133, %r1421}; 2026-02-21T10:22:35.4677218Z bar.sync 0; 2026-02-21T10:22:35.4677283Z // begin inline asm 2026-02-21T10:22:35.4677423Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4486}, [%r4407]; 2026-02-21T10:22:35.4677483Z // end inline asm 2026-02-21T10:22:35.4677547Z bar.sync 0; 2026-02-21T10:22:35.4677723Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r559, %r847, %r1135, %r1423}; 2026-02-21T10:22:35.4677778Z bar.sync 0; 2026-02-21T10:22:35.4677837Z // begin inline asm 2026-02-21T10:22:35.4677972Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4488}, [%r4407]; 2026-02-21T10:22:35.4678028Z // end inline asm 2026-02-21T10:22:35.4678082Z bar.sync 0; 2026-02-21T10:22:35.4679847Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r558, %r846, %r1134, %r1422}; 2026-02-21T10:22:35.4679942Z bar.sync 0; 2026-02-21T10:22:35.4680008Z // begin inline asm 2026-02-21T10:22:35.4680161Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4487}, [%r4407]; 2026-02-21T10:22:35.4680226Z // end inline asm 2026-02-21T10:22:35.4680290Z bar.sync 0; 
2026-02-21T10:22:35.4680480Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r560, %r848, %r1136, %r1424}; 2026-02-21T10:22:35.4680643Z bar.sync 0; 2026-02-21T10:22:35.4680716Z // begin inline asm 2026-02-21T10:22:35.4680853Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r4489}, [%r4407]; 2026-02-21T10:22:35.4680911Z // end inline asm 2026-02-21T10:22:35.4680975Z // begin inline asm 2026-02-21T10:22:35.4681057Z fence.proxy.async.shared::cta; 2026-02-21T10:22:35.4681113Z // end inline asm 2026-02-21T10:22:35.4681188Z wgmma.fence.sync.aligned; 2026-02-21T10:22:35.4681257Z shl.b32 %r4833, %r4781, 10; 2026-02-21T10:22:35.4681322Z and.b32 %r4834, %r4833, 12288; 2026-02-21T10:22:35.4681390Z add.s32 %r4835, %r4834, %r11026; 2026-02-21T10:22:35.4681464Z bfe.u32 %r4836, %r4835, 4, 14; 2026-02-21T10:22:35.4681533Z cvt.u64.u32 %rd251, %r4836; 2026-02-21T10:22:35.4681611Z or.b64 %rd243, %rd251, 4611686293372403712; 2026-02-21T10:22:35.4681675Z // begin inline asm 2026-02-21T10:22:35.4682196Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r4474,%r4475,%r4476,%r4477,%r4478,%r4479,%r4480,%r4481,%r4482,%r4483,%r4484,%r4485,%r4486,%r4487,%r4488,%r4489}, {%r4470,%r4471,%r4472,%r4473}, %rd243, %p24, 1, 1; 2026-02-21T10:22:35.4682255Z // end inline asm 2026-02-21T10:22:35.4682319Z add.s32 %r4837, %r4835, 32; 2026-02-21T10:22:35.4682392Z bfe.u32 %r4838, %r4837, 4, 14; 2026-02-21T10:22:35.4682454Z cvt.u64.u32 %rd252, %r4838; 2026-02-21T10:22:35.4682528Z or.b64 %rd244, %rd252, 4611686293372403712; 2026-02-21T10:22:35.4682588Z // begin inline asm 2026-02-21T10:22:35.4683089Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r4474,%r4475,%r4476,%r4477,%r4478,%r4479,%r4480,%r4481,%r4482,%r4483,%r4484,%r4485,%r4486,%r4487,%r4488,%r4489}, {%r4506,%r4507,%r4508,%r4509}, %rd244, %p24, 1, 1; 2026-02-21T10:22:35.4683148Z // end inline asm 2026-02-21T10:22:35.4683208Z add.s32 %r4839, %r4835, 64; 2026-02-21T10:22:35.4683274Z bfe.u32 %r4840, %r4839, 4, 14; 2026-02-21T10:22:35.4683341Z 
cvt.u64.u32 %rd253, %r4840; 2026-02-21T10:22:35.4683412Z or.b64 %rd245, %rd253, 4611686293372403712; 2026-02-21T10:22:35.4683480Z // begin inline asm 2026-02-21T10:22:35.4683973Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r4474,%r4475,%r4476,%r4477,%r4478,%r4479,%r4480,%r4481,%r4482,%r4483,%r4484,%r4485,%r4486,%r4487,%r4488,%r4489}, {%r4542,%r4543,%r4544,%r4545}, %rd245, %p24, 1, 1; 2026-02-21T10:22:35.4684033Z // end inline asm 2026-02-21T10:22:35.4684103Z add.s32 %r4841, %r4835, 96; 2026-02-21T10:22:35.4684165Z bfe.u32 %r4842, %r4841, 4, 14; 2026-02-21T10:22:35.4684226Z cvt.u64.u32 %rd254, %r4842; 2026-02-21T10:22:35.4684296Z or.b64 %rd246, %rd254, 4611686293372403712; 2026-02-21T10:22:35.4684433Z // begin inline asm 2026-02-21T10:22:35.4684930Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r4474,%r4475,%r4476,%r4477,%r4478,%r4479,%r4480,%r4481,%r4482,%r4483,%r4484,%r4485,%r4486,%r4487,%r4488,%r4489}, {%r4578,%r4579,%r4580,%r4581}, %rd246, %p24, 1, 1; 2026-02-21T10:22:35.4685031Z // end inline asm 2026-02-21T10:22:35.4685100Z add.s32 %r4843, %r4835, 16384; 2026-02-21T10:22:35.4685164Z bfe.u32 %r4844, %r4843, 4, 14; 2026-02-21T10:22:35.4685232Z cvt.u64.u32 %rd255, %r4844; 2026-02-21T10:22:35.4685310Z or.b64 %rd247, %rd255, 4611686293372403712; 2026-02-21T10:22:35.4685371Z // begin inline asm 2026-02-21T10:22:35.4685860Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r4474,%r4475,%r4476,%r4477,%r4478,%r4479,%r4480,%r4481,%r4482,%r4483,%r4484,%r4485,%r4486,%r4487,%r4488,%r4489}, {%r4614,%r4615,%r4616,%r4617}, %rd247, %p24, 1, 1; 2026-02-21T10:22:35.4685916Z // end inline asm 2026-02-21T10:22:35.4685982Z add.s32 %r4845, %r4835, 16416; 2026-02-21T10:22:35.4686045Z bfe.u32 %r4846, %r4845, 4, 14; 2026-02-21T10:22:35.4686215Z cvt.u64.u32 %rd256, %r4846; 2026-02-21T10:22:35.4686294Z or.b64 %rd248, %rd256, 4611686293372403712; 2026-02-21T10:22:35.4686356Z // begin inline asm 2026-02-21T10:22:35.4687069Z 
wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r4474,%r4475,%r4476,%r4477,%r4478,%r4479,%r4480,%r4481,%r4482,%r4483,%r4484,%r4485,%r4486,%r4487,%r4488,%r4489}, {%r4650,%r4651,%r4652,%r4653}, %rd248, %p24, 1, 1; 2026-02-21T10:22:35.4687143Z // end inline asm 2026-02-21T10:22:35.4687206Z add.s32 %r4847, %r4835, 16448; 2026-02-21T10:22:35.4687267Z bfe.u32 %r4848, %r4847, 4, 14; 2026-02-21T10:22:35.4687334Z cvt.u64.u32 %rd257, %r4848; 2026-02-21T10:22:35.4687410Z or.b64 %rd249, %rd257, 4611686293372403712; 2026-02-21T10:22:35.4687470Z // begin inline asm 2026-02-21T10:22:35.4687966Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r4474,%r4475,%r4476,%r4477,%r4478,%r4479,%r4480,%r4481,%r4482,%r4483,%r4484,%r4485,%r4486,%r4487,%r4488,%r4489}, {%r4686,%r4687,%r4688,%r4689}, %rd249, %p24, 1, 1; 2026-02-21T10:22:35.4688044Z // end inline asm 2026-02-21T10:22:35.4688107Z add.s32 %r4849, %r4835, 16480; 2026-02-21T10:22:35.4688168Z bfe.u32 %r4850, %r4849, 4, 14; 2026-02-21T10:22:35.4688235Z cvt.u64.u32 %rd258, %r4850; 2026-02-21T10:22:35.4688314Z or.b64 %rd250, %rd258, 4611686293372403712; 2026-02-21T10:22:35.4688376Z // begin inline asm 2026-02-21T10:22:35.4688882Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r4474,%r4475,%r4476,%r4477,%r4478,%r4479,%r4480,%r4481,%r4482,%r4483,%r4484,%r4485,%r4486,%r4487,%r4488,%r4489}, {%r4722,%r4723,%r4724,%r4725}, %rd250, %p24, 1, 1; 2026-02-21T10:22:35.4688941Z // end inline asm 2026-02-21T10:22:35.4689022Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:35.4689084Z mov.b32 %r4742, %r11026; 2026-02-21T10:22:35.4689151Z mov.b32 %r4744, %r4743; 2026-02-21T10:22:35.4689209Z // begin inline asm 2026-02-21T10:22:35.4689518Z // wait for regs: %r4474,%r4475,%r4476,%r4477,%r4478,%r4479,%r4480,%r4481,%r4482,%r4483,%r4484,%r4485,%r4486,%r4487,%r4488,%r4489,%r4742,%r4743,%r4744 2026-02-21T10:22:35.4689602Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:35.4689658Z // end inline asm 2026-02-21T10:22:35.4689726Z $L__tmp8: 
2026-02-21T10:22:35.4689966Z .loc 1 47 126 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:47:126 2026-02-21T10:22:35.4690030Z add.s64 %rd37, %rd596, 128; 2026-02-21T10:22:35.4690098Z add.s64 %rd595, %rd595, 512; 2026-02-21T10:22:35.4690167Z setp.lt.u64 %p146, %rd596, 3968; 2026-02-21T10:22:35.4690237Z mov.b64 %rd596, %rd37; 2026-02-21T10:22:35.4690302Z @%p146 bra $L__BB0_3; 2026-02-21T10:22:35.4690416Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:22:35.4690633Z .loc 1 38 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:38:32 2026-02-21T10:22:35.4690698Z or.b32 %r4870, %r423, %r6; 2026-02-21T10:22:35.4690903Z .loc 1 40 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:40:32 2026-02-21T10:22:35.4691049Z or.b32 %r4871, %r52, %r7; 2026-02-21T10:22:35.4691108Z or.b32 %r4872, %r52, %r8; 2026-02-21T10:22:35.4691307Z .loc 1 93 28 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:93:28 2026-02-21T10:22:35.4691448Z cvt.rn.bf16x2.f32 %r4873, %r4475, %r4474; 2026-02-21T10:22:35.4691528Z cvt.rn.bf16x2.f32 %r4874, %r4477, %r4476; 2026-02-21T10:22:35.4691602Z cvt.rn.bf16x2.f32 %r4875, %r4479, %r4478; 2026-02-21T10:22:35.4691672Z cvt.rn.bf16x2.f32 %r4876, %r4481, %r4480; 2026-02-21T10:22:35.4691749Z cvt.rn.bf16x2.f32 %r4877, %r4483, %r4482; 2026-02-21T10:22:35.4691820Z cvt.rn.bf16x2.f32 %r4878, %r4485, %r4484; 2026-02-21T10:22:35.4691891Z cvt.rn.bf16x2.f32 %r4879, %r4487, %r4486; 2026-02-21T10:22:35.4691967Z cvt.rn.bf16x2.f32 %r4880, %r4489, %r4488; 2026-02-21T10:22:35.4692176Z .loc 1 94 50 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:94:50 2026-02-21T10:22:35.4692333Z mad.lo.s32 %r4881, %r4871, 1280, %r4870; 2026-02-21T10:22:35.4692406Z mad.lo.s32 %r4882, %r4872, 1280, %r4870; 2026-02-21T10:22:35.4692615Z .loc 1 94 22 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:94:22 2026-02-21T10:22:35.4692690Z mad.wide.s32 %rd259, %r4881, 2, %rd77; 2026-02-21T10:22:35.4692758Z mad.wide.s32 %rd260, %r4882, 2, 
%rd77; 2026-02-21T10:22:35.4693013Z .loc 1 94 81 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:94:81 2026-02-21T10:22:35.4693071Z bar.sync 0; 2026-02-21T10:22:35.4693184Z st.shared.v4.b32 [%r38], {%r4873, %r4875, %r4877, %r4879}; 2026-02-21T10:22:35.4693302Z st.shared.v4.b32 [%r38+512], {%r4874, %r4876, %r4878, %r4880}; 2026-02-21T10:22:35.4693357Z bar.sync 0; 2026-02-21T10:22:35.4693417Z // begin inline asm 2026-02-21T10:22:35.4693609Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4861, %r4862, %r4863, %r4864}, [%r4855]; 2026-02-21T10:22:35.4693674Z // end inline asm 2026-02-21T10:22:35.4693737Z // begin inline asm 2026-02-21T10:22:35.4693931Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4865, %r4866, %r4867, %r4868}, [%r4860]; 2026-02-21T10:22:35.4693997Z // end inline asm 2026-02-21T10:22:35.4694054Z // begin inline asm 2026-02-21T10:22:35.4694186Z st.global.v4.b32 [ %rd259 + 0 ], { %r4861, %r4862, %r4863, %r4864 }; 2026-02-21T10:22:35.4694247Z // end inline asm 2026-02-21T10:22:35.4694306Z // begin inline asm 2026-02-21T10:22:35.4694423Z st.global.v4.b32 [ %rd260 + 0 ], { %r4865, %r4866, %r4867, %r4868 }; 2026-02-21T10:22:35.4694480Z // end inline asm 2026-02-21T10:22:35.4694702Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.4694769Z add.s32 %r4883, %r14246, 132; 2026-02-21T10:22:35.4694973Z .loc 1 32 35 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:32:35 2026-02-21T10:22:35.4695045Z shr.s32 %r4884, %r4883, 31; 2026-02-21T10:22:35.4695109Z shr.u32 %r4885, %r4884, 16; 2026-02-21T10:22:35.4695176Z add.s32 %r4886, %r4883, %r4885; 2026-02-21T10:22:35.4695239Z shr.s32 %r4887, %r4886, 16; 2026-02-21T10:22:35.4695447Z .loc 1 33 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:33:33 2026-02-21T10:22:35.4695511Z shl.b32 %r4888, %r4887, 6; 2026-02-21T10:22:35.4695710Z .loc 1 34 39 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:34:39 2026-02-21T10:22:35.4695785Z sub.s32 
%r4889, 10, %r4888; 2026-02-21T10:22:35.4695988Z .loc 1 34 52 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:34:52 2026-02-21T10:22:35.4696049Z min.s32 %r4890, %r4889, 64; 2026-02-21T10:22:35.4696251Z .loc 1 35 45 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:35:45 2026-02-21T10:22:35.4696318Z and.b32 %r4891, %r4886, -65536; 2026-02-21T10:22:35.4696382Z sub.s32 %r4892, %r4883, %r4891; 2026-02-21T10:22:35.4696714Z .loc 1 36 51 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:36:51 2026-02-21T10:22:35.4696868Z div.s32 %r4893, %r4892, %r4890; 2026-02-21T10:22:35.4697069Z .loc 1 35 64 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:35:64 2026-02-21T10:22:35.4697197Z mul.lo.s32 %r4894, %r4893, %r4890; 2026-02-21T10:22:35.4697268Z sub.s32 %r4895, %r4892, %r4894; 2026-02-21T10:22:35.4697471Z .loc 1 35 30 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:35:30 2026-02-21T10:22:35.4697534Z add.s32 %r4896, %r4895, %r4888; 2026-02-21T10:22:35.4697739Z .loc 1 37 27 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:37:27 2026-02-21T10:22:35.4697804Z shl.b32 %r4906, %r4896, 7; 2026-02-21T10:22:35.4698004Z .loc 1 39 27 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:39:27 2026-02-21T10:22:35.4698076Z shl.b32 %r86, %r4893, 6; 2026-02-21T10:22:35.4698347Z .loc 1 47 126 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:47:126 2026-02-21T10:22:35.4698412Z shl.b32 %r4897, %r4893, 19; 2026-02-21T10:22:35.4698481Z or.b32 %r4898, %r49, %r4897; 2026-02-21T10:22:35.4698554Z mad.wide.s32 %rd597, %r4898, 2, %rd76; 2026-02-21T10:22:35.4698619Z mov.b32 %r8957, 0f00000000; 2026-02-21T10:22:35.4698679Z mov.b64 %rd598, 0; 2026-02-21T10:22:35.4698746Z mov.b32 %r8958, %r8957; 2026-02-21T10:22:35.4698897Z mov.b32 %r8959, %r8957; 2026-02-21T10:22:35.4698963Z mov.b32 %r8960, %r8957; 2026-02-21T10:22:35.4699027Z mov.b32 %r8961, %r8957; 2026-02-21T10:22:35.4699091Z mov.b32 %r8962, %r8957; 2026-02-21T10:22:35.4707360Z 
mov.b32 %r8963, %r8957; 2026-02-21T10:22:35.4707465Z mov.b32 %r8964, %r8957; 2026-02-21T10:22:35.4707531Z mov.b32 %r8965, %r8957; 2026-02-21T10:22:35.4707590Z mov.b32 %r8966, %r8957; 2026-02-21T10:22:35.4707653Z mov.b32 %r8967, %r8957; 2026-02-21T10:22:35.4707712Z mov.b32 %r8968, %r8957; 2026-02-21T10:22:35.4707777Z mov.b32 %r8969, %r8957; 2026-02-21T10:22:35.4707839Z mov.b32 %r8970, %r8957; 2026-02-21T10:22:35.4707898Z mov.b32 %r8971, %r8957; 2026-02-21T10:22:35.4707956Z mov.b32 %r8972, %r8957; 2026-02-21T10:22:35.4708084Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T10:22:35.4708203Z // => This Inner Loop Header: Depth=2 2026-02-21T10:22:35.4708530Z .loc 1 0 126 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:0:126 2026-02-21T10:22:35.4708604Z cvt.u32.u64 %r4907, %rd598; 2026-02-21T10:22:35.4708835Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.4708900Z // begin inline asm 2026-02-21T10:22:35.4708963Z mov.u64 %rd262, 0x0; 2026-02-21T10:22:35.4709106Z createpolicy.fractional.L2::evict_last.b64 %rd262, 1.0; 2026-02-21T10:22:35.4709166Z // end inline asm 2026-02-21T10:22:35.4709227Z // begin inline asm 2026-02-21T10:22:35.4709298Z mov.u32 %r4899, 0x0; 2026-02-21T10:22:35.4709367Z mov.u32 %r4900, 0x0; 2026-02-21T10:22:35.4709426Z mov.u32 %r4901, 0x0; 2026-02-21T10:22:35.4709485Z mov.u32 %r4902, 0x0; 2026-02-21T10:22:35.4709729Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r4899, %r4900, %r4901, %r4902 }, [ %rd597 + 0 ], %rd262; 2026-02-21T10:22:35.4709790Z // end inline asm 2026-02-21T10:22:35.4710013Z .loc 1 58 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:58:32 2026-02-21T10:22:35.4710079Z bar.sync 0; 2026-02-21T10:22:35.4710166Z st.shared.v2.b32 [%r14], {%r4899, %r4900}; 2026-02-21T10:22:35.4710246Z st.shared.v2.b32 [%r15], {%r4901, %r4902}; 2026-02-21T10:22:35.4710301Z bar.sync 0; 2026-02-21T10:22:35.4710378Z ld.shared.b16 %rs449, [%r16]; 2026-02-21T10:22:35.4710449Z 
ld.shared.b16 %rs450, [%r16+1024]; 2026-02-21T10:22:35.4710517Z ld.shared.b16 %rs451, [%r16+64]; 2026-02-21T10:22:35.4710589Z ld.shared.b16 %rs452, [%r16+1088]; 2026-02-21T10:22:35.4710801Z ld.shared.b16 %rs453, [%r17]; 2026-02-21T10:22:35.4710871Z ld.shared.b16 %rs454, [%r17+1024]; 2026-02-21T10:22:35.4710946Z ld.shared.b16 %rs455, [%r17+64]; 2026-02-21T10:22:35.4711019Z ld.shared.b16 %rs456, [%r17+1088]; 2026-02-21T10:22:35.4711162Z ld.shared.b16 %rs457, [%r18]; 2026-02-21T10:22:35.4711230Z ld.shared.b16 %rs458, [%r18+1024]; 2026-02-21T10:22:35.4711299Z ld.shared.b16 %rs459, [%r18+64]; 2026-02-21T10:22:35.4711366Z ld.shared.b16 %rs460, [%r18+1088]; 2026-02-21T10:22:35.4711433Z ld.shared.b16 %rs461, [%r19]; 2026-02-21T10:22:35.4711503Z ld.shared.b16 %rs462, [%r19+1024]; 2026-02-21T10:22:35.4711570Z ld.shared.b16 %rs463, [%r19+64]; 2026-02-21T10:22:35.4711633Z ld.shared.b16 %rs464, [%r19+1088]; 2026-02-21T10:22:35.4711698Z ld.shared.b16 %rs465, [%r20]; 2026-02-21T10:22:35.4711769Z ld.shared.b16 %rs466, [%r20+1024]; 2026-02-21T10:22:35.4711836Z ld.shared.b16 %rs467, [%r20+64]; 2026-02-21T10:22:35.4711899Z ld.shared.b16 %rs468, [%r20+1088]; 2026-02-21T10:22:35.4711972Z ld.shared.b16 %rs469, [%r21]; 2026-02-21T10:22:35.4712111Z ld.shared.b16 %rs470, [%r21+1024]; 2026-02-21T10:22:35.4712182Z ld.shared.b16 %rs471, [%r21+64]; 2026-02-21T10:22:35.4712252Z ld.shared.b16 %rs472, [%r21+1088]; 2026-02-21T10:22:35.4712321Z ld.shared.b16 %rs473, [%r22]; 2026-02-21T10:22:35.4712385Z ld.shared.b16 %rs474, [%r22+1024]; 2026-02-21T10:22:35.4712449Z ld.shared.b16 %rs475, [%r22+64]; 2026-02-21T10:22:35.4712589Z ld.shared.b16 %rs476, [%r22+1088]; 2026-02-21T10:22:35.4712656Z ld.shared.b16 %rs477, [%r23]; 2026-02-21T10:22:35.4712721Z ld.shared.b16 %rs478, [%r23+1024]; 2026-02-21T10:22:35.4712791Z ld.shared.b16 %rs479, [%r23+64]; 2026-02-21T10:22:35.4712856Z ld.shared.b16 %rs480, [%r23+1088]; 2026-02-21T10:22:35.4712922Z cvt.f32.bf16 %r5312, %rs449; 2026-02-21T10:22:35.4712987Z 
cvt.f32.bf16 %r5313, %rs450; 2026-02-21T10:22:35.4713053Z cvt.f32.bf16 %r5314, %rs453; 2026-02-21T10:22:35.4713115Z cvt.f32.bf16 %r5315, %rs454; 2026-02-21T10:22:35.4713179Z cvt.f32.bf16 %r5348, %rs457; 2026-02-21T10:22:35.4713248Z cvt.f32.bf16 %r5349, %rs458; 2026-02-21T10:22:35.4713321Z cvt.f32.bf16 %r5350, %rs461; 2026-02-21T10:22:35.4713383Z cvt.f32.bf16 %r5351, %rs462; 2026-02-21T10:22:35.4713445Z cvt.f32.bf16 %r5384, %rs465; 2026-02-21T10:22:35.4713515Z cvt.f32.bf16 %r5385, %rs466; 2026-02-21T10:22:35.4713576Z cvt.f32.bf16 %r5386, %rs469; 2026-02-21T10:22:35.4713637Z cvt.f32.bf16 %r5387, %rs470; 2026-02-21T10:22:35.4713705Z cvt.f32.bf16 %r5420, %rs473; 2026-02-21T10:22:35.4713766Z cvt.f32.bf16 %r5421, %rs474; 2026-02-21T10:22:35.4713828Z cvt.f32.bf16 %r5422, %rs477; 2026-02-21T10:22:35.4713888Z cvt.f32.bf16 %r5423, %rs478; 2026-02-21T10:22:35.4713954Z cvt.f32.bf16 %r5456, %rs451; 2026-02-21T10:22:35.4714014Z cvt.f32.bf16 %r5457, %rs452; 2026-02-21T10:22:35.4714074Z cvt.f32.bf16 %r5458, %rs455; 2026-02-21T10:22:35.4714141Z cvt.f32.bf16 %r5459, %rs456; 2026-02-21T10:22:35.4714203Z cvt.f32.bf16 %r5492, %rs459; 2026-02-21T10:22:35.4714267Z cvt.f32.bf16 %r5493, %rs460; 2026-02-21T10:22:35.4714335Z cvt.f32.bf16 %r5494, %rs463; 2026-02-21T10:22:35.4714396Z cvt.f32.bf16 %r5495, %rs464; 2026-02-21T10:22:35.4714455Z cvt.f32.bf16 %r5528, %rs467; 2026-02-21T10:22:35.4714528Z cvt.f32.bf16 %r5529, %rs468; 2026-02-21T10:22:35.4714600Z cvt.f32.bf16 %r5530, %rs471; 2026-02-21T10:22:35.4714662Z cvt.f32.bf16 %r5531, %rs472; 2026-02-21T10:22:35.4714724Z cvt.f32.bf16 %r5564, %rs475; 2026-02-21T10:22:35.4714792Z cvt.f32.bf16 %r5565, %rs476; 2026-02-21T10:22:35.4714857Z cvt.f32.bf16 %r5566, %rs479; 2026-02-21T10:22:35.4714920Z cvt.f32.bf16 %r5567, %rs480; 2026-02-21T10:22:35.4715139Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.4715207Z bar.sync 0; 2026-02-21T10:22:35.4715272Z // begin inline asm 2026-02-21T10:22:35.4715377Z 
@%p472 mbarrier.init.shared::cta.b64 [%r4903], 1; 2026-02-21T10:22:35.4715444Z // end inline asm 2026-02-21T10:22:35.4715501Z bar.sync 0; 2026-02-21T10:22:35.4715631Z // begin inline asm 2026-02-21T10:22:35.4715775Z @%p472 mbarrier.arrive.expect_tx.shared.b64 _, [%r4903], 4096; 2026-02-21T10:22:35.4715835Z // end inline asm 2026-02-21T10:22:35.4715895Z // begin inline asm 2026-02-21T10:22:35.4716023Z fence.proxy.async.shared::cta; 2026-02-21T10:22:35.4716085Z // end inline asm 2026-02-21T10:22:35.4716141Z bar.sync 0; 2026-02-21T10:22:35.4716214Z elect.sync %r9247|%p269, -1; 2026-02-21T10:22:35.4716292Z and.pred %p149, %p1, %p269; 2026-02-21T10:22:35.4716355Z // begin inline asm 2026-02-21T10:22:35.4716834Z @%p149 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r11026], [%rd394, {%r4906, %r4907}], [%r4903]; 2026-02-21T10:22:35.4716896Z // end inline asm 2026-02-21T10:22:35.4716958Z bar.sync 0; 2026-02-21T10:22:35.4717029Z mov.b32 %r9226, 0; 2026-02-21T10:22:35.4717091Z // begin inline asm 2026-02-21T10:22:35.4717156Z 2026-02-21T10:22:35.4717209Z { 2026-02-21T10:22:35.4717276Z .reg .pred complete; 2026-02-21T10:22:35.4717337Z waitLoop: 2026-02-21T10:22:35.4717568Z mbarrier.try_wait.parity.shared.b64 complete, [%r4903], %r9226; 2026-02-21T10:22:35.4717645Z @!complete bra.uni waitLoop; 2026-02-21T10:22:35.4717698Z } 2026-02-21T10:22:35.4717704Z 2026-02-21T10:22:35.4717772Z // end inline asm 2026-02-21T10:22:35.4717829Z bar.sync 0; 2026-02-21T10:22:35.4717890Z // begin inline asm 2026-02-21T10:22:35.4718088Z @%p472 mbarrier.inval.shared::cta.b64 [%r4903]; 2026-02-21T10:22:35.4718156Z // end inline asm 2026-02-21T10:22:35.4718379Z .loc 1 78 58 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:78:58 2026-02-21T10:22:35.4718454Z ld.shared.b8 %rs481, [%r24]; 2026-02-21T10:22:35.4718533Z ld.shared.b8 %rs482, [%r24+1024]; 2026-02-21T10:22:35.4718601Z ld.shared.b8 %rs483, [%r24+2048]; 2026-02-21T10:22:35.4718666Z ld.shared.b8 %rs484, 
[%r24+3072]; 2026-02-21T10:22:35.4718741Z ld.shared.b8 %rs485, [%r25+256]; 2026-02-21T10:22:35.4718809Z ld.shared.b8 %rs486, [%r25+1280]; 2026-02-21T10:22:35.4718889Z ld.shared.b8 %rs487, [%r25+2304]; 2026-02-21T10:22:35.4718958Z ld.shared.b8 %rs488, [%r25+3328]; 2026-02-21T10:22:35.4719035Z ld.shared.b8 %rs489, [%r26+512]; 2026-02-21T10:22:35.4719102Z ld.shared.b8 %rs490, [%r26+1536]; 2026-02-21T10:22:35.4719169Z ld.shared.b8 %rs491, [%r26+2560]; 2026-02-21T10:22:35.4719244Z ld.shared.b8 %rs492, [%r26+3584]; 2026-02-21T10:22:35.4719313Z ld.shared.b8 %rs493, [%r27+768]; 2026-02-21T10:22:35.4719380Z ld.shared.b8 %rs494, [%r27+1792]; 2026-02-21T10:22:35.4719449Z ld.shared.b8 %rs495, [%r27+2816]; 2026-02-21T10:22:35.4719516Z ld.shared.b8 %rs496, [%r27+3840]; 2026-02-21T10:22:35.4719727Z .loc 1 63 28 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:63:28 2026-02-21T10:22:35.4719793Z shl.b16 %rs497, %rs481, 4; 2026-02-21T10:22:35.4719860Z shl.b16 %rs498, %rs485, 4; 2026-02-21T10:22:35.4719922Z shl.b16 %rs499, %rs489, 4; 2026-02-21T10:22:35.4719987Z shl.b16 %rs500, %rs493, 4; 2026-02-21T10:22:35.4720063Z shl.b16 %rs501, %rs482, 4; 2026-02-21T10:22:35.4720136Z shl.b16 %rs502, %rs486, 4; 2026-02-21T10:22:35.4720200Z shl.b16 %rs503, %rs490, 4; 2026-02-21T10:22:35.4720264Z shl.b16 %rs504, %rs494, 4; 2026-02-21T10:22:35.4720334Z shl.b16 %rs505, %rs483, 4; 2026-02-21T10:22:35.4720395Z shl.b16 %rs506, %rs487, 4; 2026-02-21T10:22:35.4720457Z shl.b16 %rs507, %rs491, 4; 2026-02-21T10:22:35.4720525Z shl.b16 %rs508, %rs495, 4; 2026-02-21T10:22:35.4720587Z shl.b16 %rs509, %rs484, 4; 2026-02-21T10:22:35.4720649Z shl.b16 %rs510, %rs488, 4; 2026-02-21T10:22:35.4720711Z shl.b16 %rs511, %rs492, 4; 2026-02-21T10:22:35.4720779Z shl.b16 %rs512, %rs496, 4; 2026-02-21T10:22:35.4720983Z .loc 1 78 58 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:78:58 2026-02-21T10:22:35.4721061Z selp.b16 %rs513, %rs497, %rs481, %p484; 2026-02-21T10:22:35.4721131Z cvt.s16.s8 %rs514, 
%rs513; 2026-02-21T10:22:35.4721191Z shr.s16 %rs515, %rs514, 4; 2026-02-21T10:22:35.4721352Z selp.b16 %rs516, %rs498, %rs485, %p484; 2026-02-21T10:22:35.4721423Z cvt.s16.s8 %rs517, %rs516; 2026-02-21T10:22:35.4721486Z shr.s16 %rs518, %rs517, 4; 2026-02-21T10:22:35.4721557Z selp.b16 %rs519, %rs499, %rs489, %p484; 2026-02-21T10:22:35.4721680Z cvt.s16.s8 %rs520, %rs519; 2026-02-21T10:22:35.4721746Z shr.s16 %rs521, %rs520, 4; 2026-02-21T10:22:35.4721816Z selp.b16 %rs522, %rs500, %rs493, %p484; 2026-02-21T10:22:35.4721881Z cvt.s16.s8 %rs523, %rs522; 2026-02-21T10:22:35.4721946Z shr.s16 %rs524, %rs523, 4; 2026-02-21T10:22:35.4722018Z selp.b16 %rs525, %rs501, %rs482, %p484; 2026-02-21T10:22:35.4722080Z cvt.s16.s8 %rs526, %rs525; 2026-02-21T10:22:35.4722152Z shr.s16 %rs527, %rs526, 4; 2026-02-21T10:22:35.4722233Z selp.b16 %rs528, %rs502, %rs486, %p484; 2026-02-21T10:22:35.4722298Z cvt.s16.s8 %rs529, %rs528; 2026-02-21T10:22:35.4722359Z shr.s16 %rs530, %rs529, 4; 2026-02-21T10:22:35.4722434Z selp.b16 %rs531, %rs503, %rs490, %p484; 2026-02-21T10:22:35.4722500Z cvt.s16.s8 %rs532, %rs531; 2026-02-21T10:22:35.4722609Z shr.s16 %rs533, %rs532, 4; 2026-02-21T10:22:35.4722698Z selp.b16 %rs534, %rs504, %rs494, %p484; 2026-02-21T10:22:35.4722765Z cvt.s16.s8 %rs535, %rs534; 2026-02-21T10:22:35.4722828Z shr.s16 %rs536, %rs535, 4; 2026-02-21T10:22:35.4722900Z selp.b16 %rs537, %rs505, %rs483, %p484; 2026-02-21T10:22:35.4722967Z cvt.s16.s8 %rs538, %rs537; 2026-02-21T10:22:35.4723076Z shr.s16 %rs539, %rs538, 4; 2026-02-21T10:22:35.4723149Z selp.b16 %rs540, %rs506, %rs487, %p484; 2026-02-21T10:22:35.4723217Z cvt.s16.s8 %rs541, %rs540; 2026-02-21T10:22:35.4723279Z shr.s16 %rs542, %rs541, 4; 2026-02-21T10:22:35.4723348Z selp.b16 %rs543, %rs507, %rs491, %p484; 2026-02-21T10:22:35.4723411Z cvt.s16.s8 %rs544, %rs543; 2026-02-21T10:22:35.4723486Z shr.s16 %rs545, %rs544, 4; 2026-02-21T10:22:35.4723563Z selp.b16 %rs546, %rs508, %rs495, %p484; 2026-02-21T10:22:35.4723626Z cvt.s16.s8 %rs547, %rs546; 
2026-02-21T10:22:35.4723694Z shr.s16 %rs548, %rs547, 4; 2026-02-21T10:22:35.4723771Z selp.b16 %rs549, %rs509, %rs484, %p484; 2026-02-21T10:22:35.4723832Z cvt.s16.s8 %rs550, %rs549; 2026-02-21T10:22:35.4723893Z shr.s16 %rs551, %rs550, 4; 2026-02-21T10:22:35.4723966Z selp.b16 %rs552, %rs510, %rs488, %p484; 2026-02-21T10:22:35.4724031Z cvt.s16.s8 %rs553, %rs552; 2026-02-21T10:22:35.4724093Z shr.s16 %rs554, %rs553, 4; 2026-02-21T10:22:35.4724169Z selp.b16 %rs555, %rs511, %rs492, %p484; 2026-02-21T10:22:35.4724232Z cvt.s16.s8 %rs556, %rs555; 2026-02-21T10:22:35.4724294Z shr.s16 %rs557, %rs556, 4; 2026-02-21T10:22:35.4724369Z selp.b16 %rs558, %rs512, %rs496, %p484; 2026-02-21T10:22:35.4724431Z cvt.s16.s8 %rs559, %rs558; 2026-02-21T10:22:35.4724491Z shr.s16 %rs560, %rs559, 4; 2026-02-21T10:22:35.4724697Z .loc 1 83 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:83:32 2026-02-21T10:22:35.4724771Z cvt.rn.f32.s16 %r9248, %rs515; 2026-02-21T10:22:35.4724836Z cvt.rn.f32.s16 %r9249, %rs518; 2026-02-21T10:22:35.4724902Z cvt.rn.f32.s16 %r9250, %rs521; 2026-02-21T10:22:35.4724972Z cvt.rn.f32.s16 %r9251, %rs524; 2026-02-21T10:22:35.4725034Z cvt.rn.f32.s16 %r9252, %rs527; 2026-02-21T10:22:35.4725099Z cvt.rn.f32.s16 %r9253, %rs530; 2026-02-21T10:22:35.4725164Z cvt.rn.f32.s16 %r9254, %rs533; 2026-02-21T10:22:35.4725232Z cvt.rn.f32.s16 %r9255, %rs536; 2026-02-21T10:22:35.4725298Z cvt.rn.f32.s16 %r9256, %rs539; 2026-02-21T10:22:35.4725365Z cvt.rn.f32.s16 %r9257, %rs542; 2026-02-21T10:22:35.4725434Z cvt.rn.f32.s16 %r9258, %rs545; 2026-02-21T10:22:35.4725496Z cvt.rn.f32.s16 %r9259, %rs548; 2026-02-21T10:22:35.4725559Z cvt.rn.f32.s16 %r9260, %rs551; 2026-02-21T10:22:35.4725624Z cvt.rn.f32.s16 %r9261, %rs554; 2026-02-21T10:22:35.4725703Z cvt.rn.f32.s16 %r9262, %rs557; 2026-02-21T10:22:35.4725767Z cvt.rn.f32.s16 %r9263, %rs560; 2026-02-21T10:22:35.4725825Z bar.sync 0; 2026-02-21T10:22:35.4725900Z st.shared.b32 [%r41], %r9248; 2026-02-21T10:22:35.4725972Z st.shared.b32 
[%r41+16384], %r9256; 2026-02-21T10:22:35.4726099Z st.shared.b32 [%r42], %r9249; 2026-02-21T10:22:35.4726173Z st.shared.b32 [%r42+16384], %r9257; 2026-02-21T10:22:35.4726239Z st.shared.b32 [%r43], %r9250; 2026-02-21T10:22:35.4726305Z st.shared.b32 [%r43+16384], %r9258; 2026-02-21T10:22:35.4726415Z st.shared.b32 [%r44], %r9251; 2026-02-21T10:22:35.4726609Z st.shared.b32 [%r44+16384], %r9259; 2026-02-21T10:22:35.4726683Z st.shared.b32 [%r45], %r9252; 2026-02-21T10:22:35.4726751Z st.shared.b32 [%r45+16384], %r9260; 2026-02-21T10:22:35.4726823Z st.shared.b32 [%r46], %r9253; 2026-02-21T10:22:35.4726888Z st.shared.b32 [%r46+16384], %r9261; 2026-02-21T10:22:35.4726953Z st.shared.b32 [%r47], %r9254; 2026-02-21T10:22:35.4727019Z st.shared.b32 [%r47+16384], %r9262; 2026-02-21T10:22:35.4727087Z st.shared.b32 [%r48], %r9255; 2026-02-21T10:22:35.4727152Z st.shared.b32 [%r48+16384], %r9263; 2026-02-21T10:22:35.4727300Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r8957}; 2026-02-21T10:22:35.4727364Z bar.sync 0; 2026-02-21T10:22:35.4727531Z // begin inline asm 2026-02-21T10:22:35.4727732Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5028, %r5316, %r5604, %r5892}, [%r433]; 2026-02-21T10:22:35.4727800Z // end inline asm 2026-02-21T10:22:35.4727859Z bar.sync 0; 2026-02-21T10:22:35.4727998Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r8959}; 2026-02-21T10:22:35.4728058Z bar.sync 0; 2026-02-21T10:22:35.4728124Z // begin inline asm 2026-02-21T10:22:35.4728369Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5030, %r5318, %r5606, %r5894}, [%r433]; 2026-02-21T10:22:35.4728431Z // end inline asm 2026-02-21T10:22:35.4728494Z bar.sync 0; 2026-02-21T10:22:35.4728622Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r8958}; 2026-02-21T10:22:35.4728677Z bar.sync 0; 2026-02-21T10:22:35.4728736Z // begin inline asm 2026-02-21T10:22:35.4728918Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5029, %r5317, %r5605, %r5893}, [%r433]; 2026-02-21T10:22:35.4728975Z // end inline asm 
2026-02-21T10:22:35.4729034Z bar.sync 0; 2026-02-21T10:22:35.4729170Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r8960}; 2026-02-21T10:22:35.4729226Z bar.sync 0; 2026-02-21T10:22:35.4729293Z // begin inline asm 2026-02-21T10:22:35.4729472Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5031, %r5319, %r5607, %r5895}, [%r433]; 2026-02-21T10:22:35.4729536Z // end inline asm 2026-02-21T10:22:35.4729604Z bar.sync 0; 2026-02-21T10:22:35.4729742Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r8961}; 2026-02-21T10:22:35.4729810Z bar.sync 0; 2026-02-21T10:22:35.4729870Z // begin inline asm 2026-02-21T10:22:35.4730047Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5032, %r5320, %r5608, %r5896}, [%r433]; 2026-02-21T10:22:35.4730111Z // end inline asm 2026-02-21T10:22:35.4730167Z bar.sync 0; 2026-02-21T10:22:35.4730295Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r8963}; 2026-02-21T10:22:35.4730351Z bar.sync 0; 2026-02-21T10:22:35.4730416Z // begin inline asm 2026-02-21T10:22:35.4730591Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5034, %r5322, %r5610, %r5898}, [%r433]; 2026-02-21T10:22:35.4730653Z // end inline asm 2026-02-21T10:22:35.4730718Z bar.sync 0; 2026-02-21T10:22:35.4730847Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r8962}; 2026-02-21T10:22:35.4730904Z bar.sync 0; 2026-02-21T10:22:35.4730964Z // begin inline asm 2026-02-21T10:22:35.4731143Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5033, %r5321, %r5609, %r5897}, [%r433]; 2026-02-21T10:22:35.4731201Z // end inline asm 2026-02-21T10:22:35.4731256Z bar.sync 0; 2026-02-21T10:22:35.4731384Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r8964}; 2026-02-21T10:22:35.4731440Z bar.sync 0; 2026-02-21T10:22:35.4731498Z // begin inline asm 2026-02-21T10:22:35.4731673Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5035, %r5323, %r5611, %r5899}, [%r433]; 2026-02-21T10:22:35.4731730Z // end inline asm 2026-02-21T10:22:35.4731798Z bar.sync 0; 2026-02-21T10:22:35.4731930Z 
stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r8965}; 2026-02-21T10:22:35.4732066Z bar.sync 0; 2026-02-21T10:22:35.4732130Z // begin inline asm 2026-02-21T10:22:35.4732306Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5036, %r5324, %r5612, %r5900}, [%r433]; 2026-02-21T10:22:35.4732366Z // end inline asm 2026-02-21T10:22:35.4732486Z bar.sync 0; 2026-02-21T10:22:35.4732620Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r8967}; 2026-02-21T10:22:35.4732675Z bar.sync 0; 2026-02-21T10:22:35.4732737Z // begin inline asm 2026-02-21T10:22:35.4732919Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5038, %r5326, %r5614, %r5902}, [%r433]; 2026-02-21T10:22:35.4732977Z // end inline asm 2026-02-21T10:22:35.4733033Z bar.sync 0; 2026-02-21T10:22:35.4733173Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r8966}; 2026-02-21T10:22:35.4733236Z bar.sync 0; 2026-02-21T10:22:35.4733298Z // begin inline asm 2026-02-21T10:22:35.4733474Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5037, %r5325, %r5613, %r5901}, [%r433]; 2026-02-21T10:22:35.4733538Z // end inline asm 2026-02-21T10:22:35.4733648Z bar.sync 0; 2026-02-21T10:22:35.4733777Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r8968}; 2026-02-21T10:22:35.4733834Z bar.sync 0; 2026-02-21T10:22:35.4733899Z // begin inline asm 2026-02-21T10:22:35.4734077Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5039, %r5327, %r5615, %r5903}, [%r433]; 2026-02-21T10:22:35.4734135Z // end inline asm 2026-02-21T10:22:35.4734197Z bar.sync 0; 2026-02-21T10:22:35.4734374Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r8969}; 2026-02-21T10:22:35.4734431Z bar.sync 0; 2026-02-21T10:22:35.4734495Z // begin inline asm 2026-02-21T10:22:35.4734672Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5040, %r5328, %r5616, %r5904}, [%r433]; 2026-02-21T10:22:35.4734730Z // end inline asm 2026-02-21T10:22:35.4734787Z bar.sync 0; 2026-02-21T10:22:35.4734918Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r8971}; 2026-02-21T10:22:35.4734972Z 
bar.sync 0; 2026-02-21T10:22:35.4735032Z // begin inline asm 2026-02-21T10:22:35.4735219Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5042, %r5330, %r5618, %r5906}, [%r433]; 2026-02-21T10:22:35.4735278Z // end inline asm 2026-02-21T10:22:35.4735334Z bar.sync 0; 2026-02-21T10:22:35.4735461Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r8970}; 2026-02-21T10:22:35.4735525Z bar.sync 0; 2026-02-21T10:22:35.4735585Z // begin inline asm 2026-02-21T10:22:35.4735763Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5041, %r5329, %r5617, %r5905}, [%r433]; 2026-02-21T10:22:35.4735825Z // end inline asm 2026-02-21T10:22:35.4735880Z bar.sync 0; 2026-02-21T10:22:35.4736007Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r4407], {%r8972}; 2026-02-21T10:22:35.4736067Z bar.sync 0; 2026-02-21T10:22:35.4736130Z // begin inline asm 2026-02-21T10:22:35.4736316Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5043, %r5331, %r5619, %r5907}, [%r433]; 2026-02-21T10:22:35.4736375Z // end inline asm 2026-02-21T10:22:35.4736437Z $L__tmp9: 2026-02-21T10:22:35.4736852Z .loc 2 291 36 // standard.py:291:36 @[ co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:90:40 ] 2026-02-21T10:22:35.4736918Z // begin inline asm 2026-02-21T10:22:35.4737002Z fence.proxy.async.shared::cta; 2026-02-21T10:22:35.4737060Z // end inline asm 2026-02-21T10:22:35.4737149Z shfl.sync.idx.b32 %r9264, %r5, 0, 31, -1; 2026-02-21T10:22:35.4737224Z wgmma.fence.sync.aligned; 2026-02-21T10:22:35.4737300Z mov.pred %p151, -1; 2026-02-21T10:22:35.4737363Z // begin inline asm 2026-02-21T10:22:35.4737888Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043}, {%r5312,%r5313,%r5314,%r5315}, %rd266, %p151, 1, 1; 2026-02-21T10:22:35.4737955Z // end inline asm 2026-02-21T10:22:35.4738016Z // begin inline asm 2026-02-21T10:22:35.4738531Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043}, {%r5348,%r5349,%r5350,%r5351}, %rd267, %p151, 1, 1; 2026-02-21T10:22:35.4738675Z // end inline asm 2026-02-21T10:22:35.4738735Z // begin inline asm 2026-02-21T10:22:35.4739242Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043}, {%r5384,%r5385,%r5386,%r5387}, %rd268, %p151, 1, 1; 2026-02-21T10:22:35.4739370Z // end inline asm 2026-02-21T10:22:35.4739433Z // begin inline asm 2026-02-21T10:22:35.4739946Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043}, {%r5420,%r5421,%r5422,%r5423}, %rd269, %p151, 1, 1; 2026-02-21T10:22:35.4740012Z // end inline asm 2026-02-21T10:22:35.4740073Z // begin inline asm 2026-02-21T10:22:35.4740626Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043}, {%r5456,%r5457,%r5458,%r5459}, %rd270, %p151, 1, 1; 2026-02-21T10:22:35.4740693Z // end inline asm 2026-02-21T10:22:35.4740752Z // begin inline asm 2026-02-21T10:22:35.4741251Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043}, {%r5492,%r5493,%r5494,%r5495}, %rd271, %p151, 1, 1; 2026-02-21T10:22:35.4741368Z // end inline asm 2026-02-21T10:22:35.4741434Z // begin inline asm 2026-02-21T10:22:35.4741929Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043}, {%r5528,%r5529,%r5530,%r5531}, %rd272, %p151, 1, 1; 2026-02-21T10:22:35.4741988Z // end inline asm 2026-02-21T10:22:35.4742052Z // begin inline asm 
2026-02-21T10:22:35.4742547Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043}, {%r5564,%r5565,%r5566,%r5567}, %rd273, %p151, 1, 1; 2026-02-21T10:22:35.4742609Z // end inline asm 2026-02-21T10:22:35.4742671Z // begin inline asm 2026-02-21T10:22:35.4743172Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331}, {%r5312,%r5313,%r5314,%r5315}, %rd274, %p151, 1, 1; 2026-02-21T10:22:35.4743237Z // end inline asm 2026-02-21T10:22:35.4743301Z // begin inline asm 2026-02-21T10:22:35.4743798Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331}, {%r5348,%r5349,%r5350,%r5351}, %rd275, %p151, 1, 1; 2026-02-21T10:22:35.4743856Z // end inline asm 2026-02-21T10:22:35.4743920Z // begin inline asm 2026-02-21T10:22:35.4744419Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331}, {%r5384,%r5385,%r5386,%r5387}, %rd276, %p151, 1, 1; 2026-02-21T10:22:35.4744482Z // end inline asm 2026-02-21T10:22:35.4744546Z // begin inline asm 2026-02-21T10:22:35.4745041Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331}, {%r5420,%r5421,%r5422,%r5423}, %rd277, %p151, 1, 1; 2026-02-21T10:22:35.4745103Z // end inline asm 2026-02-21T10:22:35.4745165Z // begin inline asm 2026-02-21T10:22:35.4745670Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331}, {%r5456,%r5457,%r5458,%r5459}, %rd278, %p151, 1, 1; 2026-02-21T10:22:35.4745729Z // 
end inline asm 2026-02-21T10:22:35.4745789Z // begin inline asm 2026-02-21T10:22:35.4746293Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331}, {%r5492,%r5493,%r5494,%r5495}, %rd279, %p151, 1, 1; 2026-02-21T10:22:35.4746405Z // end inline asm 2026-02-21T10:22:35.4746608Z // begin inline asm 2026-02-21T10:22:35.4747194Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331}, {%r5528,%r5529,%r5530,%r5531}, %rd280, %p151, 1, 1; 2026-02-21T10:22:35.4747255Z // end inline asm 2026-02-21T10:22:35.4747322Z // begin inline asm 2026-02-21T10:22:35.4747816Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331}, {%r5564,%r5565,%r5566,%r5567}, %rd281, %p151, 1, 1; 2026-02-21T10:22:35.4747873Z // end inline asm 2026-02-21T10:22:35.4747938Z // begin inline asm 2026-02-21T10:22:35.4748564Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619}, {%r5312,%r5313,%r5314,%r5315}, %rd282, %p151, 1, 1; 2026-02-21T10:22:35.4748634Z // end inline asm 2026-02-21T10:22:35.4748704Z // begin inline asm 2026-02-21T10:22:35.4749263Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619}, {%r5348,%r5349,%r5350,%r5351}, %rd283, %p151, 1, 1; 2026-02-21T10:22:35.4749327Z // end inline asm 2026-02-21T10:22:35.4749390Z // begin inline asm 2026-02-21T10:22:35.4749893Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619}, 
{%r5384,%r5385,%r5386,%r5387}, %rd284, %p151, 1, 1; 2026-02-21T10:22:35.4749952Z // end inline asm 2026-02-21T10:22:35.4750017Z // begin inline asm 2026-02-21T10:22:35.4750513Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619}, {%r5420,%r5421,%r5422,%r5423}, %rd285, %p151, 1, 1; 2026-02-21T10:22:35.4750575Z // end inline asm 2026-02-21T10:22:35.4750635Z // begin inline asm 2026-02-21T10:22:35.4751144Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619}, {%r5456,%r5457,%r5458,%r5459}, %rd286, %p151, 1, 1; 2026-02-21T10:22:35.4751203Z // end inline asm 2026-02-21T10:22:35.4751264Z // begin inline asm 2026-02-21T10:22:35.4751763Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619}, {%r5492,%r5493,%r5494,%r5495}, %rd287, %p151, 1, 1; 2026-02-21T10:22:35.4751821Z // end inline asm 2026-02-21T10:22:35.4751879Z // begin inline asm 2026-02-21T10:22:35.4752383Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619}, {%r5528,%r5529,%r5530,%r5531}, %rd288, %p151, 1, 1; 2026-02-21T10:22:35.4752442Z // end inline asm 2026-02-21T10:22:35.4752505Z // begin inline asm 2026-02-21T10:22:35.4753007Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619}, {%r5564,%r5565,%r5566,%r5567}, %rd289, %p151, 1, 1; 2026-02-21T10:22:35.4753064Z // end inline asm 2026-02-21T10:22:35.4753123Z // begin inline asm 2026-02-21T10:22:35.4753623Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907}, {%r5312,%r5313,%r5314,%r5315}, %rd290, %p151, 1, 1; 2026-02-21T10:22:35.4753684Z // end inline asm 2026-02-21T10:22:35.4753743Z // begin inline asm 2026-02-21T10:22:35.4754245Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907}, {%r5348,%r5349,%r5350,%r5351}, %rd291, %p151, 1, 1; 2026-02-21T10:22:35.4754383Z // end inline asm 2026-02-21T10:22:35.4754508Z // begin inline asm 2026-02-21T10:22:35.4755026Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907}, {%r5384,%r5385,%r5386,%r5387}, %rd292, %p151, 1, 1; 2026-02-21T10:22:35.4755088Z // end inline asm 2026-02-21T10:22:35.4755150Z // begin inline asm 2026-02-21T10:22:35.4755648Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907}, {%r5420,%r5421,%r5422,%r5423}, %rd293, %p151, 1, 1; 2026-02-21T10:22:35.4755717Z // end inline asm 2026-02-21T10:22:35.4755778Z // begin inline asm 2026-02-21T10:22:35.4756333Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907}, {%r5456,%r5457,%r5458,%r5459}, %rd294, %p151, 1, 1; 2026-02-21T10:22:35.4756399Z // end inline asm 2026-02-21T10:22:35.4756582Z // begin inline asm 2026-02-21T10:22:35.4757175Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907}, {%r5492,%r5493,%r5494,%r5495}, %rd295, %p151, 1, 1; 2026-02-21T10:22:35.4757245Z // end inline asm 2026-02-21T10:22:35.4757310Z // begin inline asm 
2026-02-21T10:22:35.4757816Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907}, {%r5528,%r5529,%r5530,%r5531}, %rd296, %p151, 1, 1; 2026-02-21T10:22:35.4757881Z // end inline asm 2026-02-21T10:22:35.4757942Z // begin inline asm 2026-02-21T10:22:35.4758443Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907}, {%r5564,%r5565,%r5566,%r5567}, %rd297, %p151, 1, 1; 2026-02-21T10:22:35.4758509Z // end inline asm 2026-02-21T10:22:35.4758591Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:35.4758654Z mov.b32 %r6209, %r9226; 2026-02-21T10:22:35.4758719Z mov.b32 %r6210, %r9226; 2026-02-21T10:22:35.4758783Z mov.b32 %r6208, %r11026; 2026-02-21T10:22:35.4758855Z // begin inline asm 2026-02-21T10:22:35.4759936Z // wait for regs: %r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043,%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331,%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619,%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907,%r6208,%r6209,%r6210 2026-02-21T10:22:35.4760022Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:35.4760080Z // end inline asm 2026-02-21T10:22:35.4760136Z $L__tmp10: 2026-02-21T10:22:35.4760363Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.4760437Z add.s64 %rd299, %rd597, 128; 2026-02-21T10:22:35.4760501Z // begin inline asm 2026-02-21T10:22:35.4760569Z mov.u64 %rd298, 0x0; 2026-02-21T10:22:35.4760700Z createpolicy.fractional.L2::evict_last.b64 %rd298, 1.0; 2026-02-21T10:22:35.4760759Z // end inline asm 2026-02-21T10:22:35.4760823Z 
// begin inline asm 2026-02-21T10:22:35.4760882Z mov.u32 %r6278, 0x0; 2026-02-21T10:22:35.4760941Z mov.u32 %r6279, 0x0; 2026-02-21T10:22:35.4761000Z mov.u32 %r6280, 0x0; 2026-02-21T10:22:35.4761064Z mov.u32 %r6281, 0x0; 2026-02-21T10:22:35.4761292Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r6278, %r6279, %r6280, %r6281 }, [ %rd299 + 0 ], %rd298; 2026-02-21T10:22:35.4761437Z // end inline asm 2026-02-21T10:22:35.4761658Z .loc 1 58 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:58:32 2026-02-21T10:22:35.4761718Z bar.sync 0; 2026-02-21T10:22:35.4761802Z st.shared.v2.b32 [%r14], {%r6278, %r6279}; 2026-02-21T10:22:35.4761952Z st.shared.v2.b32 [%r15], {%r6280, %r6281}; 2026-02-21T10:22:35.4762008Z bar.sync 0; 2026-02-21T10:22:35.4762081Z ld.shared.b16 %rs561, [%r16]; 2026-02-21T10:22:35.4762154Z ld.shared.b16 %rs562, [%r16+1024]; 2026-02-21T10:22:35.4762228Z ld.shared.b16 %rs563, [%r16+64]; 2026-02-21T10:22:35.4762295Z ld.shared.b16 %rs564, [%r16+1088]; 2026-02-21T10:22:35.4762362Z ld.shared.b16 %rs565, [%r17]; 2026-02-21T10:22:35.4762433Z ld.shared.b16 %rs566, [%r17+1024]; 2026-02-21T10:22:35.4762501Z ld.shared.b16 %rs567, [%r17+64]; 2026-02-21T10:22:35.4762567Z ld.shared.b16 %rs568, [%r17+1088]; 2026-02-21T10:22:35.4762632Z ld.shared.b16 %rs569, [%r18]; 2026-02-21T10:22:35.4762703Z ld.shared.b16 %rs570, [%r18+1024]; 2026-02-21T10:22:35.4762835Z ld.shared.b16 %rs571, [%r18+64]; 2026-02-21T10:22:35.4762905Z ld.shared.b16 %rs572, [%r18+1088]; 2026-02-21T10:22:35.4762975Z ld.shared.b16 %rs573, [%r19]; 2026-02-21T10:22:35.4763041Z ld.shared.b16 %rs574, [%r19+1024]; 2026-02-21T10:22:35.4763110Z ld.shared.b16 %rs575, [%r19+64]; 2026-02-21T10:22:35.4763177Z ld.shared.b16 %rs576, [%r19+1088]; 2026-02-21T10:22:35.4763305Z ld.shared.b16 %rs577, [%r20]; 2026-02-21T10:22:35.4763376Z ld.shared.b16 %rs578, [%r20+1024]; 2026-02-21T10:22:35.4763442Z ld.shared.b16 %rs579, [%r20+64]; 2026-02-21T10:22:35.4763511Z ld.shared.b16 %rs580, [%r20+1088]; 
2026-02-21T10:22:35.4763576Z ld.shared.b16 %rs581, [%r21]; 2026-02-21T10:22:35.4763641Z ld.shared.b16 %rs582, [%r21+1024]; 2026-02-21T10:22:35.4763713Z ld.shared.b16 %rs583, [%r21+64]; 2026-02-21T10:22:35.4763778Z ld.shared.b16 %rs584, [%r21+1088]; 2026-02-21T10:22:35.4763846Z ld.shared.b16 %rs585, [%r22]; 2026-02-21T10:22:35.4763924Z ld.shared.b16 %rs586, [%r22+1024]; 2026-02-21T10:22:35.4764001Z ld.shared.b16 %rs587, [%r22+64]; 2026-02-21T10:22:35.4764067Z ld.shared.b16 %rs588, [%r22+1088]; 2026-02-21T10:22:35.4764131Z ld.shared.b16 %rs589, [%r23]; 2026-02-21T10:22:35.4764202Z ld.shared.b16 %rs590, [%r23+1024]; 2026-02-21T10:22:35.4764270Z ld.shared.b16 %rs591, [%r23+64]; 2026-02-21T10:22:35.4764336Z ld.shared.b16 %rs592, [%r23+1088]; 2026-02-21T10:22:35.4764408Z cvt.f32.bf16 %r6611, %rs561; 2026-02-21T10:22:35.4764480Z cvt.f32.bf16 %r6612, %rs562; 2026-02-21T10:22:35.4764542Z cvt.f32.bf16 %r6613, %rs565; 2026-02-21T10:22:35.4764606Z cvt.f32.bf16 %r6614, %rs566; 2026-02-21T10:22:35.4764678Z cvt.f32.bf16 %r6647, %rs569; 2026-02-21T10:22:35.4764745Z cvt.f32.bf16 %r6648, %rs570; 2026-02-21T10:22:35.4764808Z cvt.f32.bf16 %r6649, %rs573; 2026-02-21T10:22:35.4764870Z cvt.f32.bf16 %r6650, %rs574; 2026-02-21T10:22:35.4764939Z cvt.f32.bf16 %r6683, %rs577; 2026-02-21T10:22:35.4765001Z cvt.f32.bf16 %r6684, %rs578; 2026-02-21T10:22:35.4765066Z cvt.f32.bf16 %r6685, %rs581; 2026-02-21T10:22:35.4765133Z cvt.f32.bf16 %r6686, %rs582; 2026-02-21T10:22:35.4765194Z cvt.f32.bf16 %r6719, %rs585; 2026-02-21T10:22:35.4765255Z cvt.f32.bf16 %r6720, %rs586; 2026-02-21T10:22:35.4765328Z cvt.f32.bf16 %r6721, %rs589; 2026-02-21T10:22:35.4765390Z cvt.f32.bf16 %r6722, %rs590; 2026-02-21T10:22:35.4765451Z cvt.f32.bf16 %r6755, %rs563; 2026-02-21T10:22:35.4765515Z cvt.f32.bf16 %r6756, %rs564; 2026-02-21T10:22:35.4765581Z cvt.f32.bf16 %r6757, %rs567; 2026-02-21T10:22:35.4765643Z cvt.f32.bf16 %r6758, %rs568; 2026-02-21T10:22:35.4765704Z cvt.f32.bf16 %r6791, %rs571; 2026-02-21T10:22:35.4765769Z 
cvt.f32.bf16 %r6792, %rs572; 2026-02-21T10:22:35.4765830Z cvt.f32.bf16 %r6793, %rs575; 2026-02-21T10:22:35.4765891Z cvt.f32.bf16 %r6794, %rs576; 2026-02-21T10:22:35.4765952Z cvt.f32.bf16 %r6827, %rs579; 2026-02-21T10:22:35.4766019Z cvt.f32.bf16 %r6828, %rs580; 2026-02-21T10:22:35.4766084Z cvt.f32.bf16 %r6829, %rs583; 2026-02-21T10:22:35.4766215Z cvt.f32.bf16 %r6830, %rs584; 2026-02-21T10:22:35.4766291Z cvt.f32.bf16 %r6863, %rs587; 2026-02-21T10:22:35.4766355Z cvt.f32.bf16 %r6864, %rs588; 2026-02-21T10:22:35.4766417Z cvt.f32.bf16 %r6865, %rs591; 2026-02-21T10:22:35.4766655Z cvt.f32.bf16 %r6866, %rs592; 2026-02-21T10:22:35.4766885Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.4766945Z bar.sync 0; 2026-02-21T10:22:35.4767007Z // begin inline asm 2026-02-21T10:22:35.4767114Z @%p472 mbarrier.init.shared::cta.b64 [%r4903], 1; 2026-02-21T10:22:35.4767173Z // end inline asm 2026-02-21T10:22:35.4767229Z bar.sync 0; 2026-02-21T10:22:35.4767293Z // begin inline asm 2026-02-21T10:22:35.4767427Z @%p472 mbarrier.arrive.expect_tx.shared.b64 _, [%r4903], 4096; 2026-02-21T10:22:35.4767485Z // end inline asm 2026-02-21T10:22:35.4767545Z // begin inline asm 2026-02-21T10:22:35.4767627Z fence.proxy.async.shared::cta; 2026-02-21T10:22:35.4767687Z // end inline asm 2026-02-21T10:22:35.4767825Z bar.sync 0; 2026-02-21T10:22:35.4767905Z elect.sync %r9265|%p270, -1; 2026-02-21T10:22:35.4767977Z and.pred %p185, %p1, %p270; 2026-02-21T10:22:35.4768041Z or.b32 %r6286, %r4907, 32; 2026-02-21T10:22:35.4768104Z // begin inline asm 2026-02-21T10:22:35.4768442Z @%p185 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r11026], [%rd394, {%r4906, %r6286}], [%r4903]; 2026-02-21T10:22:35.4768568Z // end inline asm 2026-02-21T10:22:35.4768627Z bar.sync 0; 2026-02-21T10:22:35.4768693Z // begin inline asm 2026-02-21T10:22:35.4768746Z 2026-02-21T10:22:35.4768797Z { 2026-02-21T10:22:35.4768863Z .reg .pred complete; 
2026-02-21T10:22:35.4768927Z waitLoop: 2026-02-21T10:22:35.4769070Z mbarrier.try_wait.parity.shared.b64 complete, [%r4903], %r9226; 2026-02-21T10:22:35.4769141Z @!complete bra.uni waitLoop; 2026-02-21T10:22:35.4769198Z } 2026-02-21T10:22:35.4769203Z 2026-02-21T10:22:35.4769261Z // end inline asm 2026-02-21T10:22:35.4769320Z bar.sync 0; 2026-02-21T10:22:35.4769386Z // begin inline asm 2026-02-21T10:22:35.4769506Z @%p472 mbarrier.inval.shared::cta.b64 [%r4903]; 2026-02-21T10:22:35.4769566Z // end inline asm 2026-02-21T10:22:35.4769780Z .loc 1 78 58 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:78:58 2026-02-21T10:22:35.4769856Z ld.shared.b8 %rs593, [%r24]; 2026-02-21T10:22:35.4769929Z ld.shared.b8 %rs594, [%r24+1024]; 2026-02-21T10:22:35.4769997Z ld.shared.b8 %rs595, [%r24+2048]; 2026-02-21T10:22:35.4770075Z ld.shared.b8 %rs596, [%r24+3072]; 2026-02-21T10:22:35.4770143Z ld.shared.b8 %rs597, [%r25+256]; 2026-02-21T10:22:35.4770208Z ld.shared.b8 %rs598, [%r25+1280]; 2026-02-21T10:22:35.4770279Z ld.shared.b8 %rs599, [%r25+2304]; 2026-02-21T10:22:35.4770349Z ld.shared.b8 %rs600, [%r25+3328]; 2026-02-21T10:22:35.4770416Z ld.shared.b8 %rs601, [%r26+512]; 2026-02-21T10:22:35.4770481Z ld.shared.b8 %rs602, [%r26+1536]; 2026-02-21T10:22:35.4770549Z ld.shared.b8 %rs603, [%r26+2560]; 2026-02-21T10:22:35.4770618Z ld.shared.b8 %rs604, [%r26+3584]; 2026-02-21T10:22:35.4770683Z ld.shared.b8 %rs605, [%r27+768]; 2026-02-21T10:22:35.4770765Z ld.shared.b8 %rs606, [%r27+1792]; 2026-02-21T10:22:35.4770834Z ld.shared.b8 %rs607, [%r27+2816]; 2026-02-21T10:22:35.4770903Z ld.shared.b8 %rs608, [%r27+3840]; 2026-02-21T10:22:35.4771110Z .loc 1 63 28 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:63:28 2026-02-21T10:22:35.4771182Z shl.b16 %rs609, %rs593, 4; 2026-02-21T10:22:35.4771245Z shl.b16 %rs610, %rs597, 4; 2026-02-21T10:22:35.4771307Z shl.b16 %rs611, %rs601, 4; 2026-02-21T10:22:35.4771374Z shl.b16 %rs612, %rs605, 4; 2026-02-21T10:22:35.4771438Z shl.b16 %rs613, 
%rs594, 4; 2026-02-21T10:22:35.4771499Z shl.b16 %rs614, %rs598, 4; 2026-02-21T10:22:35.4771561Z shl.b16 %rs615, %rs602, 4; 2026-02-21T10:22:35.4771628Z shl.b16 %rs616, %rs606, 4; 2026-02-21T10:22:35.4771691Z shl.b16 %rs617, %rs595, 4; 2026-02-21T10:22:35.4771840Z shl.b16 %rs618, %rs599, 4; 2026-02-21T10:22:35.4771912Z shl.b16 %rs619, %rs603, 4; 2026-02-21T10:22:35.4771975Z shl.b16 %rs620, %rs607, 4; 2026-02-21T10:22:35.4772040Z shl.b16 %rs621, %rs596, 4; 2026-02-21T10:22:35.4772107Z shl.b16 %rs622, %rs600, 4; 2026-02-21T10:22:35.4772233Z shl.b16 %rs623, %rs604, 4; 2026-02-21T10:22:35.4772295Z shl.b16 %rs624, %rs608, 4; 2026-02-21T10:22:35.4772500Z .loc 1 78 58 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:78:58 2026-02-21T10:22:35.4772583Z selp.b16 %rs625, %rs609, %rs593, %p484; 2026-02-21T10:22:35.4772647Z cvt.s16.s8 %rs626, %rs625; 2026-02-21T10:22:35.4772708Z shr.s16 %rs627, %rs626, 4; 2026-02-21T10:22:35.4772800Z selp.b16 %rs628, %rs610, %rs597, %p484; 2026-02-21T10:22:35.4772868Z cvt.s16.s8 %rs629, %rs628; 2026-02-21T10:22:35.4772930Z shr.s16 %rs630, %rs629, 4; 2026-02-21T10:22:35.4773000Z selp.b16 %rs631, %rs611, %rs601, %p484; 2026-02-21T10:22:35.4773069Z cvt.s16.s8 %rs632, %rs631; 2026-02-21T10:22:35.4773135Z shr.s16 %rs633, %rs632, 4; 2026-02-21T10:22:35.4773255Z selp.b16 %rs634, %rs612, %rs605, %p484; 2026-02-21T10:22:35.4773324Z cvt.s16.s8 %rs635, %rs634; 2026-02-21T10:22:35.4773384Z shr.s16 %rs636, %rs635, 4; 2026-02-21T10:22:35.4773462Z selp.b16 %rs637, %rs613, %rs594, %p484; 2026-02-21T10:22:35.4773529Z cvt.s16.s8 %rs638, %rs637; 2026-02-21T10:22:35.4773589Z shr.s16 %rs639, %rs638, 4; 2026-02-21T10:22:35.4773704Z selp.b16 %rs640, %rs614, %rs598, %p484; 2026-02-21T10:22:35.4773766Z cvt.s16.s8 %rs641, %rs640; 2026-02-21T10:22:35.4773833Z shr.s16 %rs642, %rs641, 4; 2026-02-21T10:22:35.4773903Z selp.b16 %rs643, %rs615, %rs602, %p484; 2026-02-21T10:22:35.4773966Z cvt.s16.s8 %rs644, %rs643; 2026-02-21T10:22:35.4774032Z shr.s16 %rs645, %rs644, 
4; 2026-02-21T10:22:35.4774102Z selp.b16 %rs646, %rs616, %rs606, %p484; 2026-02-21T10:22:35.4774163Z cvt.s16.s8 %rs647, %rs646; 2026-02-21T10:22:35.4774223Z shr.s16 %rs648, %rs647, 4; 2026-02-21T10:22:35.4774298Z selp.b16 %rs649, %rs617, %rs595, %p484; 2026-02-21T10:22:35.4774365Z cvt.s16.s8 %rs650, %rs649; 2026-02-21T10:22:35.4774426Z shr.s16 %rs651, %rs650, 4; 2026-02-21T10:22:35.4774500Z selp.b16 %rs652, %rs618, %rs599, %p484; 2026-02-21T10:22:35.4774563Z cvt.s16.s8 %rs653, %rs652; 2026-02-21T10:22:35.4774630Z shr.s16 %rs654, %rs653, 4; 2026-02-21T10:22:35.4774700Z selp.b16 %rs655, %rs619, %rs603, %p484; 2026-02-21T10:22:35.4774764Z cvt.s16.s8 %rs656, %rs655; 2026-02-21T10:22:35.4774828Z shr.s16 %rs657, %rs656, 4; 2026-02-21T10:22:35.4774896Z selp.b16 %rs658, %rs620, %rs607, %p484; 2026-02-21T10:22:35.4774962Z cvt.s16.s8 %rs659, %rs658; 2026-02-21T10:22:35.4775022Z shr.s16 %rs660, %rs659, 4; 2026-02-21T10:22:35.4775091Z selp.b16 %rs661, %rs621, %rs596, %p484; 2026-02-21T10:22:35.4775152Z cvt.s16.s8 %rs662, %rs661; 2026-02-21T10:22:35.4775218Z shr.s16 %rs663, %rs662, 4; 2026-02-21T10:22:35.4775287Z selp.b16 %rs664, %rs622, %rs600, %p484; 2026-02-21T10:22:35.4775349Z cvt.s16.s8 %rs665, %rs664; 2026-02-21T10:22:35.4775421Z shr.s16 %rs666, %rs665, 4; 2026-02-21T10:22:35.4775490Z selp.b16 %rs667, %rs623, %rs604, %p484; 2026-02-21T10:22:35.4775552Z cvt.s16.s8 %rs668, %rs667; 2026-02-21T10:22:35.4775621Z shr.s16 %rs669, %rs668, 4; 2026-02-21T10:22:35.4775693Z selp.b16 %rs670, %rs624, %rs608, %p484; 2026-02-21T10:22:35.4775765Z cvt.s16.s8 %rs671, %rs670; 2026-02-21T10:22:35.4775828Z shr.s16 %rs672, %rs671, 4; 2026-02-21T10:22:35.4776038Z .loc 1 83 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:83:32 2026-02-21T10:22:35.4776107Z cvt.rn.f32.s16 %r9266, %rs627; 2026-02-21T10:22:35.4776176Z cvt.rn.f32.s16 %r9267, %rs630; 2026-02-21T10:22:35.4776244Z cvt.rn.f32.s16 %r9268, %rs633; 2026-02-21T10:22:35.4776308Z cvt.rn.f32.s16 %r9269, %rs636; 
2026-02-21T10:22:35.4776372Z cvt.rn.f32.s16 %r9270, %rs639; 2026-02-21T10:22:35.4776436Z cvt.rn.f32.s16 %r9271, %rs642; 2026-02-21T10:22:35.4776627Z cvt.rn.f32.s16 %r9272, %rs645; 2026-02-21T10:22:35.4776798Z cvt.rn.f32.s16 %r9273, %rs648; 2026-02-21T10:22:35.4776873Z cvt.rn.f32.s16 %r9274, %rs651; 2026-02-21T10:22:35.4776943Z cvt.rn.f32.s16 %r9275, %rs654; 2026-02-21T10:22:35.4777007Z cvt.rn.f32.s16 %r9276, %rs657; 2026-02-21T10:22:35.4777070Z cvt.rn.f32.s16 %r9277, %rs660; 2026-02-21T10:22:35.4777204Z cvt.rn.f32.s16 %r9278, %rs663; 2026-02-21T10:22:35.4777269Z cvt.rn.f32.s16 %r9279, %rs666; 2026-02-21T10:22:35.4777336Z cvt.rn.f32.s16 %r9280, %rs669; 2026-02-21T10:22:35.4777399Z cvt.rn.f32.s16 %r9281, %rs672; 2026-02-21T10:22:35.4777460Z bar.sync 0; 2026-02-21T10:22:35.4777527Z st.shared.b32 [%r41], %r9266; 2026-02-21T10:22:35.4777598Z st.shared.b32 [%r41+16384], %r9274; 2026-02-21T10:22:35.4777667Z st.shared.b32 [%r42], %r9267; 2026-02-21T10:22:35.4777735Z st.shared.b32 [%r42+16384], %r9275; 2026-02-21T10:22:35.4777799Z st.shared.b32 [%r43], %r9268; 2026-02-21T10:22:35.4777864Z st.shared.b32 [%r43+16384], %r9276; 2026-02-21T10:22:35.4777932Z st.shared.b32 [%r44], %r9269; 2026-02-21T10:22:35.4778066Z st.shared.b32 [%r44+16384], %r9277; 2026-02-21T10:22:35.4778134Z st.shared.b32 [%r45], %r9270; 2026-02-21T10:22:35.4778204Z st.shared.b32 [%r45+16384], %r9278; 2026-02-21T10:22:35.4778268Z st.shared.b32 [%r46], %r9271; 2026-02-21T10:22:35.4778336Z st.shared.b32 [%r46+16384], %r9279; 2026-02-21T10:22:35.4778403Z st.shared.b32 [%r47], %r9272; 2026-02-21T10:22:35.4778474Z st.shared.b32 [%r47+16384], %r9280; 2026-02-21T10:22:35.4778599Z st.shared.b32 [%r48], %r9273; 2026-02-21T10:22:35.4778666Z st.shared.b32 [%r48+16384], %r9281; 2026-02-21T10:22:35.4778726Z $L__tmp11: 2026-02-21T10:22:35.4779007Z .loc 2 291 36 // standard.py:291:36 @[ co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:90:40 ] 2026-02-21T10:22:35.4779072Z // begin inline asm 
2026-02-21T10:22:35.4779154Z fence.proxy.async.shared::cta; 2026-02-21T10:22:35.4779211Z // end inline asm 2026-02-21T10:22:35.4779269Z bar.sync 0; 2026-02-21T10:22:35.4779344Z wgmma.fence.sync.aligned; 2026-02-21T10:22:35.4779413Z // begin inline asm 2026-02-21T10:22:35.4779931Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043}, {%r6611,%r6612,%r6613,%r6614}, %rd266, %p151, 1, 1; 2026-02-21T10:22:35.4779992Z // end inline asm 2026-02-21T10:22:35.4780074Z // begin inline asm 2026-02-21T10:22:35.4780580Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043}, {%r6647,%r6648,%r6649,%r6650}, %rd267, %p151, 1, 1; 2026-02-21T10:22:35.4780641Z // end inline asm 2026-02-21T10:22:35.4780705Z // begin inline asm 2026-02-21T10:22:35.4781207Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043}, {%r6683,%r6684,%r6685,%r6686}, %rd268, %p151, 1, 1; 2026-02-21T10:22:35.4781269Z // end inline asm 2026-02-21T10:22:35.4781340Z // begin inline asm 2026-02-21T10:22:35.4781851Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043}, {%r6719,%r6720,%r6721,%r6722}, %rd269, %p151, 1, 1; 2026-02-21T10:22:35.4781914Z // end inline asm 2026-02-21T10:22:35.4781980Z // begin inline asm 2026-02-21T10:22:35.4782484Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043}, {%r6755,%r6756,%r6757,%r6758}, %rd270, %p151, 1, 1; 2026-02-21T10:22:35.4782546Z // end inline asm 2026-02-21T10:22:35.4782606Z // begin inline asm 
2026-02-21T10:22:35.4783106Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043}, {%r6791,%r6792,%r6793,%r6794}, %rd271, %p151, 1, 1; 2026-02-21T10:22:35.4783219Z // end inline asm 2026-02-21T10:22:35.4783280Z // begin inline asm 2026-02-21T10:22:35.4783780Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043}, {%r6827,%r6828,%r6829,%r6830}, %rd272, %p151, 1, 1; 2026-02-21T10:22:35.4783888Z // end inline asm 2026-02-21T10:22:35.4783947Z // begin inline asm 2026-02-21T10:22:35.4784449Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043}, {%r6863,%r6864,%r6865,%r6866}, %rd273, %p151, 1, 1; 2026-02-21T10:22:35.4784510Z // end inline asm 2026-02-21T10:22:35.4784570Z // begin inline asm 2026-02-21T10:22:35.4785068Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331}, {%r6611,%r6612,%r6613,%r6614}, %rd274, %p151, 1, 1; 2026-02-21T10:22:35.4785177Z // end inline asm 2026-02-21T10:22:35.4785240Z // begin inline asm 2026-02-21T10:22:35.4785750Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331}, {%r6647,%r6648,%r6649,%r6650}, %rd275, %p151, 1, 1; 2026-02-21T10:22:35.4785814Z // end inline asm 2026-02-21T10:22:35.4785875Z // begin inline asm 2026-02-21T10:22:35.4786428Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331}, {%r6683,%r6684,%r6685,%r6686}, %rd276, %p151, 1, 1; 2026-02-21T10:22:35.4786607Z // 
end inline asm 2026-02-21T10:22:35.4786672Z // begin inline asm 2026-02-21T10:22:35.4787176Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331}, {%r6719,%r6720,%r6721,%r6722}, %rd277, %p151, 1, 1; 2026-02-21T10:22:35.4787241Z // end inline asm 2026-02-21T10:22:35.4787304Z // begin inline asm 2026-02-21T10:22:35.4787799Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331}, {%r6755,%r6756,%r6757,%r6758}, %rd278, %p151, 1, 1; 2026-02-21T10:22:35.4787868Z // end inline asm 2026-02-21T10:22:35.4787932Z // begin inline asm 2026-02-21T10:22:35.4788474Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331}, {%r6791,%r6792,%r6793,%r6794}, %rd279, %p151, 1, 1; 2026-02-21T10:22:35.4788558Z // end inline asm 2026-02-21T10:22:35.4788622Z // begin inline asm 2026-02-21T10:22:35.4789118Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331}, {%r6827,%r6828,%r6829,%r6830}, %rd280, %p151, 1, 1; 2026-02-21T10:22:35.4789185Z // end inline asm 2026-02-21T10:22:35.4789248Z // begin inline asm 2026-02-21T10:22:35.4789749Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331}, {%r6863,%r6864,%r6865,%r6866}, %rd281, %p151, 1, 1; 2026-02-21T10:22:35.4789815Z // end inline asm 2026-02-21T10:22:35.4789876Z // begin inline asm 2026-02-21T10:22:35.4790374Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619}, 
{%r6611,%r6612,%r6613,%r6614}, %rd282, %p151, 1, 1; 2026-02-21T10:22:35.4790437Z // end inline asm 2026-02-21T10:22:35.4790498Z // begin inline asm 2026-02-21T10:22:35.4790993Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619}, {%r6647,%r6648,%r6649,%r6650}, %rd283, %p151, 1, 1; 2026-02-21T10:22:35.4791147Z // end inline asm 2026-02-21T10:22:35.4791206Z // begin inline asm 2026-02-21T10:22:35.4791703Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619}, {%r6683,%r6684,%r6685,%r6686}, %rd284, %p151, 1, 1; 2026-02-21T10:22:35.4791832Z // end inline asm 2026-02-21T10:22:35.4791896Z // begin inline asm 2026-02-21T10:22:35.4792391Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619}, {%r6719,%r6720,%r6721,%r6722}, %rd285, %p151, 1, 1; 2026-02-21T10:22:35.4792451Z // end inline asm 2026-02-21T10:22:35.4792529Z // begin inline asm 2026-02-21T10:22:35.4793088Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619}, {%r6755,%r6756,%r6757,%r6758}, %rd286, %p151, 1, 1; 2026-02-21T10:22:35.4793153Z // end inline asm 2026-02-21T10:22:35.4793229Z // begin inline asm 2026-02-21T10:22:35.4793728Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619}, {%r6791,%r6792,%r6793,%r6794}, %rd287, %p151, 1, 1; 2026-02-21T10:22:35.4793850Z // end inline asm 2026-02-21T10:22:35.4793918Z // begin inline asm 2026-02-21T10:22:35.4794413Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619}, {%r6827,%r6828,%r6829,%r6830}, %rd288, %p151, 1, 1; 2026-02-21T10:22:35.4794473Z // end inline asm 2026-02-21T10:22:35.4794536Z // begin inline asm 2026-02-21T10:22:35.4795045Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619}, {%r6863,%r6864,%r6865,%r6866}, %rd289, %p151, 1, 1; 2026-02-21T10:22:35.4795107Z // end inline asm 2026-02-21T10:22:35.4795175Z // begin inline asm 2026-02-21T10:22:35.4795673Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907}, {%r6611,%r6612,%r6613,%r6614}, %rd290, %p151, 1, 1; 2026-02-21T10:22:35.4795737Z // end inline asm 2026-02-21T10:22:35.4795804Z // begin inline asm 2026-02-21T10:22:35.4796302Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907}, {%r6647,%r6648,%r6649,%r6650}, %rd291, %p151, 1, 1; 2026-02-21T10:22:35.4796361Z // end inline asm 2026-02-21T10:22:35.4796422Z // begin inline asm 2026-02-21T10:22:35.4797044Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907}, {%r6683,%r6684,%r6685,%r6686}, %rd292, %p151, 1, 1; 2026-02-21T10:22:35.4797109Z // end inline asm 2026-02-21T10:22:35.4797168Z // begin inline asm 2026-02-21T10:22:35.4797671Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907}, {%r6719,%r6720,%r6721,%r6722}, %rd293, %p151, 1, 1; 2026-02-21T10:22:35.4797735Z // end inline asm 2026-02-21T10:22:35.4797795Z // begin inline asm 
2026-02-21T10:22:35.4798296Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907}, {%r6755,%r6756,%r6757,%r6758}, %rd294, %p151, 1, 1; 2026-02-21T10:22:35.4798355Z // end inline asm 2026-02-21T10:22:35.4798419Z // begin inline asm 2026-02-21T10:22:35.4798919Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907}, {%r6791,%r6792,%r6793,%r6794}, %rd295, %p151, 1, 1; 2026-02-21T10:22:35.4799070Z // end inline asm 2026-02-21T10:22:35.4799133Z // begin inline asm 2026-02-21T10:22:35.4799654Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907}, {%r6827,%r6828,%r6829,%r6830}, %rd296, %p151, 1, 1; 2026-02-21T10:22:35.4799782Z // end inline asm 2026-02-21T10:22:35.4799843Z // begin inline asm 2026-02-21T10:22:35.4800353Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907}, {%r6863,%r6864,%r6865,%r6866}, %rd297, %p151, 1, 1; 2026-02-21T10:22:35.4800423Z // end inline asm 2026-02-21T10:22:35.4800507Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:35.4800571Z mov.b32 %r7508, %r9226; 2026-02-21T10:22:35.4800639Z mov.b32 %r7509, %r9226; 2026-02-21T10:22:35.4800774Z mov.b32 %r7507, %r11026; 2026-02-21T10:22:35.4800842Z // begin inline asm 2026-02-21T10:22:35.4801996Z // wait for regs: 
%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043,%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331,%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619,%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907,%r7507,%r7508,%r7509 2026-02-21T10:22:35.4802080Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:35.4802139Z // end inline asm 2026-02-21T10:22:35.4802212Z $L__tmp12: 2026-02-21T10:22:35.4802435Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.4802505Z add.s64 %rd335, %rd597, 256; 2026-02-21T10:22:35.4802572Z // begin inline asm 2026-02-21T10:22:35.4802639Z mov.u64 %rd334, 0x0; 2026-02-21T10:22:35.4802768Z createpolicy.fractional.L2::evict_last.b64 %rd334, 1.0; 2026-02-21T10:22:35.4802829Z // end inline asm 2026-02-21T10:22:35.4802894Z // begin inline asm 2026-02-21T10:22:35.4802956Z mov.u32 %r7577, 0x0; 2026-02-21T10:22:35.4803018Z mov.u32 %r7578, 0x0; 2026-02-21T10:22:35.4803084Z mov.u32 %r7579, 0x0; 2026-02-21T10:22:35.4803142Z mov.u32 %r7580, 0x0; 2026-02-21T10:22:35.4803370Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r7577, %r7578, %r7579, %r7580 }, [ %rd335 + 0 ], %rd334; 2026-02-21T10:22:35.4803433Z // end inline asm 2026-02-21T10:22:35.4803640Z .loc 1 58 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:58:32 2026-02-21T10:22:35.4803700Z bar.sync 0; 2026-02-21T10:22:35.4803784Z st.shared.v2.b32 [%r14], {%r7577, %r7578}; 2026-02-21T10:22:35.4803871Z st.shared.v2.b32 [%r15], {%r7579, %r7580}; 2026-02-21T10:22:35.4803928Z bar.sync 0; 2026-02-21T10:22:35.4804001Z ld.shared.b16 %rs673, [%r16]; 2026-02-21T10:22:35.4804079Z ld.shared.b16 %rs674, [%r16+1024]; 2026-02-21T10:22:35.4804149Z ld.shared.b16 %rs675, [%r16+64]; 2026-02-21T10:22:35.4804216Z 
ld.shared.b16 %rs676, [%r16+1088]; 2026-02-21T10:22:35.4804288Z ld.shared.b16 %rs677, [%r17]; 2026-02-21T10:22:35.4804358Z ld.shared.b16 %rs678, [%r17+1024]; 2026-02-21T10:22:35.4804425Z ld.shared.b16 %rs679, [%r17+64]; 2026-02-21T10:22:35.4804493Z ld.shared.b16 %rs680, [%r17+1088]; 2026-02-21T10:22:35.4804563Z ld.shared.b16 %rs681, [%r18]; 2026-02-21T10:22:35.4804629Z ld.shared.b16 %rs682, [%r18+1024]; 2026-02-21T10:22:35.4804696Z ld.shared.b16 %rs683, [%r18+64]; 2026-02-21T10:22:35.4804769Z ld.shared.b16 %rs684, [%r18+1088]; 2026-02-21T10:22:35.4804844Z ld.shared.b16 %rs685, [%r19]; 2026-02-21T10:22:35.4804910Z ld.shared.b16 %rs686, [%r19+1024]; 2026-02-21T10:22:35.4804977Z ld.shared.b16 %rs687, [%r19+64]; 2026-02-21T10:22:35.4805048Z ld.shared.b16 %rs688, [%r19+1088]; 2026-02-21T10:22:35.4805174Z ld.shared.b16 %rs689, [%r20]; 2026-02-21T10:22:35.4805241Z ld.shared.b16 %rs690, [%r20+1024]; 2026-02-21T10:22:35.4805313Z ld.shared.b16 %rs691, [%r20+64]; 2026-02-21T10:22:35.4805378Z ld.shared.b16 %rs692, [%r20+1088]; 2026-02-21T10:22:35.4805494Z ld.shared.b16 %rs693, [%r21]; 2026-02-21T10:22:35.4805559Z ld.shared.b16 %rs694, [%r21+1024]; 2026-02-21T10:22:35.4805629Z ld.shared.b16 %rs695, [%r21+64]; 2026-02-21T10:22:35.4805696Z ld.shared.b16 %rs696, [%r21+1088]; 2026-02-21T10:22:35.4805763Z ld.shared.b16 %rs697, [%r22]; 2026-02-21T10:22:35.4805832Z ld.shared.b16 %rs698, [%r22+1024]; 2026-02-21T10:22:35.4805899Z ld.shared.b16 %rs699, [%r22+64]; 2026-02-21T10:22:35.4805965Z ld.shared.b16 %rs700, [%r22+1088]; 2026-02-21T10:22:35.4806037Z ld.shared.b16 %rs701, [%r23]; 2026-02-21T10:22:35.4806102Z ld.shared.b16 %rs702, [%r23+1024]; 2026-02-21T10:22:35.4806170Z ld.shared.b16 %rs703, [%r23+64]; 2026-02-21T10:22:35.4806236Z ld.shared.b16 %rs704, [%r23+1088]; 2026-02-21T10:22:35.4806313Z cvt.f32.bf16 %r7910, %rs673; 2026-02-21T10:22:35.4806424Z cvt.f32.bf16 %r7911, %rs674; 2026-02-21T10:22:35.4806612Z cvt.f32.bf16 %r7912, %rs677; 2026-02-21T10:22:35.4806684Z cvt.f32.bf16 
%r7913, %rs678; 2026-02-21T10:22:35.4806750Z cvt.f32.bf16 %r7946, %rs681; 2026-02-21T10:22:35.4806818Z cvt.f32.bf16 %r7947, %rs682; 2026-02-21T10:22:35.4806880Z cvt.f32.bf16 %r7948, %rs685; 2026-02-21T10:22:35.4806948Z cvt.f32.bf16 %r7949, %rs686; 2026-02-21T10:22:35.4807083Z cvt.f32.bf16 %r7982, %rs689; 2026-02-21T10:22:35.4807149Z cvt.f32.bf16 %r7983, %rs690; 2026-02-21T10:22:35.4807232Z cvt.f32.bf16 %r7984, %rs693; 2026-02-21T10:22:35.4807298Z cvt.f32.bf16 %r7985, %rs694; 2026-02-21T10:22:35.4807360Z cvt.f32.bf16 %r8018, %rs697; 2026-02-21T10:22:35.4807423Z cvt.f32.bf16 %r8019, %rs698; 2026-02-21T10:22:35.4807490Z cvt.f32.bf16 %r8020, %rs701; 2026-02-21T10:22:35.4807553Z cvt.f32.bf16 %r8021, %rs702; 2026-02-21T10:22:35.4807616Z cvt.f32.bf16 %r8054, %rs675; 2026-02-21T10:22:35.4807687Z cvt.f32.bf16 %r8055, %rs676; 2026-02-21T10:22:35.4807751Z cvt.f32.bf16 %r8056, %rs679; 2026-02-21T10:22:35.4807814Z cvt.f32.bf16 %r8057, %rs680; 2026-02-21T10:22:35.4807883Z cvt.f32.bf16 %r8090, %rs683; 2026-02-21T10:22:35.4807945Z cvt.f32.bf16 %r8091, %rs684; 2026-02-21T10:22:35.4808011Z cvt.f32.bf16 %r8092, %rs687; 2026-02-21T10:22:35.4808075Z cvt.f32.bf16 %r8093, %rs688; 2026-02-21T10:22:35.4808144Z cvt.f32.bf16 %r8126, %rs691; 2026-02-21T10:22:35.4808210Z cvt.f32.bf16 %r8127, %rs692; 2026-02-21T10:22:35.4808271Z cvt.f32.bf16 %r8128, %rs695; 2026-02-21T10:22:35.4808339Z cvt.f32.bf16 %r8129, %rs696; 2026-02-21T10:22:35.4808401Z cvt.f32.bf16 %r8162, %rs699; 2026-02-21T10:22:35.4808463Z cvt.f32.bf16 %r8163, %rs700; 2026-02-21T10:22:35.4808526Z cvt.f32.bf16 %r8164, %rs703; 2026-02-21T10:22:35.4808593Z cvt.f32.bf16 %r8165, %rs704; 2026-02-21T10:22:35.4808812Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.4808874Z bar.sync 0; 2026-02-21T10:22:35.4808941Z // begin inline asm 2026-02-21T10:22:35.4809045Z @%p472 mbarrier.init.shared::cta.b64 [%r4903], 1; 2026-02-21T10:22:35.4809105Z // end inline asm 2026-02-21T10:22:35.4809164Z 
bar.sync 0; 2026-02-21T10:22:35.4809234Z // begin inline asm 2026-02-21T10:22:35.4809367Z @%p472 mbarrier.arrive.expect_tx.shared.b64 _, [%r4903], 4096; 2026-02-21T10:22:35.4809425Z // end inline asm 2026-02-21T10:22:35.4809501Z // begin inline asm 2026-02-21T10:22:35.4809595Z fence.proxy.async.shared::cta; 2026-02-21T10:22:35.4809656Z // end inline asm 2026-02-21T10:22:35.4809725Z bar.sync 0; 2026-02-21T10:22:35.4809796Z elect.sync %r9282|%p271, -1; 2026-02-21T10:22:35.4809867Z and.pred %p221, %p1, %p271; 2026-02-21T10:22:35.4809933Z or.b32 %r7585, %r4907, 64; 2026-02-21T10:22:35.4810008Z // begin inline asm 2026-02-21T10:22:35.4810342Z @%p221 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r11026], [%rd394, {%r4906, %r7585}], [%r4903]; 2026-02-21T10:22:35.4810479Z // end inline asm 2026-02-21T10:22:35.4810543Z bar.sync 0; 2026-02-21T10:22:35.4810604Z // begin inline asm 2026-02-21T10:22:35.4810660Z 2026-02-21T10:22:35.4810713Z { 2026-02-21T10:22:35.4810784Z .reg .pred complete; 2026-02-21T10:22:35.4810904Z waitLoop: 2026-02-21T10:22:35.4811050Z mbarrier.try_wait.parity.shared.b64 complete, [%r4903], %r9226; 2026-02-21T10:22:35.4811130Z @!complete bra.uni waitLoop; 2026-02-21T10:22:35.4811186Z } 2026-02-21T10:22:35.4811190Z 2026-02-21T10:22:35.4811249Z // end inline asm 2026-02-21T10:22:35.4811312Z bar.sync 0; 2026-02-21T10:22:35.4811373Z // begin inline asm 2026-02-21T10:22:35.4811481Z @%p472 mbarrier.inval.shared::cta.b64 [%r4903]; 2026-02-21T10:22:35.4811541Z // end inline asm 2026-02-21T10:22:35.4811760Z .loc 1 78 58 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:78:58 2026-02-21T10:22:35.4811828Z ld.shared.b8 %rs705, [%r24]; 2026-02-21T10:22:35.4811897Z ld.shared.b8 %rs706, [%r24+1024]; 2026-02-21T10:22:35.4812035Z ld.shared.b8 %rs707, [%r24+2048]; 2026-02-21T10:22:35.4812106Z ld.shared.b8 %rs708, [%r24+3072]; 2026-02-21T10:22:35.4812173Z ld.shared.b8 %rs709, [%r25+256]; 2026-02-21T10:22:35.4812240Z ld.shared.b8 
%rs710, [%r25+1280]; 2026-02-21T10:22:35.4812314Z ld.shared.b8 %rs711, [%r25+2304]; 2026-02-21T10:22:35.4812379Z ld.shared.b8 %rs712, [%r25+3328]; 2026-02-21T10:22:35.4812493Z ld.shared.b8 %rs713, [%r26+512]; 2026-02-21T10:22:35.4812568Z ld.shared.b8 %rs714, [%r26+1536]; 2026-02-21T10:22:35.4812632Z ld.shared.b8 %rs715, [%r26+2560]; 2026-02-21T10:22:35.4812696Z ld.shared.b8 %rs716, [%r26+3584]; 2026-02-21T10:22:35.4812766Z ld.shared.b8 %rs717, [%r27+768]; 2026-02-21T10:22:35.4812834Z ld.shared.b8 %rs718, [%r27+1792]; 2026-02-21T10:22:35.4812899Z ld.shared.b8 %rs719, [%r27+2816]; 2026-02-21T10:22:35.4812964Z ld.shared.b8 %rs720, [%r27+3840]; 2026-02-21T10:22:35.4813174Z .loc 1 63 28 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:63:28 2026-02-21T10:22:35.4813257Z shl.b16 %rs721, %rs705, 4; 2026-02-21T10:22:35.4813323Z shl.b16 %rs722, %rs709, 4; 2026-02-21T10:22:35.4813392Z shl.b16 %rs723, %rs713, 4; 2026-02-21T10:22:35.4813455Z shl.b16 %rs724, %rs717, 4; 2026-02-21T10:22:35.4813521Z shl.b16 %rs725, %rs706, 4; 2026-02-21T10:22:35.4813583Z shl.b16 %rs726, %rs710, 4; 2026-02-21T10:22:35.4813649Z shl.b16 %rs727, %rs714, 4; 2026-02-21T10:22:35.4813714Z shl.b16 %rs728, %rs718, 4; 2026-02-21T10:22:35.4813776Z shl.b16 %rs729, %rs707, 4; 2026-02-21T10:22:35.4813842Z shl.b16 %rs730, %rs711, 4; 2026-02-21T10:22:35.4813904Z shl.b16 %rs731, %rs715, 4; 2026-02-21T10:22:35.4813966Z shl.b16 %rs732, %rs719, 4; 2026-02-21T10:22:35.4814029Z shl.b16 %rs733, %rs708, 4; 2026-02-21T10:22:35.4814096Z shl.b16 %rs734, %rs712, 4; 2026-02-21T10:22:35.4814156Z shl.b16 %rs735, %rs716, 4; 2026-02-21T10:22:35.4814219Z shl.b16 %rs736, %rs720, 4; 2026-02-21T10:22:35.4814426Z .loc 1 78 58 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:78:58 2026-02-21T10:22:35.4814506Z selp.b16 %rs737, %rs721, %rs705, %p484; 2026-02-21T10:22:35.4814571Z cvt.s16.s8 %rs738, %rs737; 2026-02-21T10:22:35.4814638Z shr.s16 %rs739, %rs738, 4; 2026-02-21T10:22:35.4814714Z selp.b16 %rs740, %rs722, 
%rs709, %p484; 2026-02-21T10:22:35.4814777Z cvt.s16.s8 %rs741, %rs740; 2026-02-21T10:22:35.4814840Z shr.s16 %rs742, %rs741, 4; 2026-02-21T10:22:35.4814919Z selp.b16 %rs743, %rs723, %rs713, %p484; 2026-02-21T10:22:35.4814981Z cvt.s16.s8 %rs744, %rs743; 2026-02-21T10:22:35.4815044Z shr.s16 %rs745, %rs744, 4; 2026-02-21T10:22:35.4815120Z selp.b16 %rs746, %rs724, %rs717, %p484; 2026-02-21T10:22:35.4815194Z cvt.s16.s8 %rs747, %rs746; 2026-02-21T10:22:35.4815259Z shr.s16 %rs748, %rs747, 4; 2026-02-21T10:22:35.4815331Z selp.b16 %rs749, %rs725, %rs706, %p484; 2026-02-21T10:22:35.4815399Z cvt.s16.s8 %rs750, %rs749; 2026-02-21T10:22:35.4815460Z shr.s16 %rs751, %rs750, 4; 2026-02-21T10:22:35.4815593Z selp.b16 %rs752, %rs726, %rs710, %p484; 2026-02-21T10:22:35.4815667Z cvt.s16.s8 %rs753, %rs752; 2026-02-21T10:22:35.4815730Z shr.s16 %rs754, %rs753, 4; 2026-02-21T10:22:35.4815802Z selp.b16 %rs755, %rs727, %rs714, %p484; 2026-02-21T10:22:35.4815916Z cvt.s16.s8 %rs756, %rs755; 2026-02-21T10:22:35.4815984Z shr.s16 %rs757, %rs756, 4; 2026-02-21T10:22:35.4816054Z selp.b16 %rs758, %rs728, %rs718, %p484; 2026-02-21T10:22:35.4816120Z cvt.s16.s8 %rs759, %rs758; 2026-02-21T10:22:35.4816187Z shr.s16 %rs760, %rs759, 4; 2026-02-21T10:22:35.4816256Z selp.b16 %rs761, %rs729, %rs707, %p484; 2026-02-21T10:22:35.4816320Z cvt.s16.s8 %rs762, %rs761; 2026-02-21T10:22:35.4816388Z shr.s16 %rs763, %rs762, 4; 2026-02-21T10:22:35.4816575Z selp.b16 %rs764, %rs730, %rs711, %p484; 2026-02-21T10:22:35.4816644Z cvt.s16.s8 %rs765, %rs764; 2026-02-21T10:22:35.4816707Z shr.s16 %rs766, %rs765, 4; 2026-02-21T10:22:35.4816786Z selp.b16 %rs767, %rs731, %rs715, %p484; 2026-02-21T10:22:35.4816850Z cvt.s16.s8 %rs768, %rs767; 2026-02-21T10:22:35.4816998Z shr.s16 %rs769, %rs768, 4; 2026-02-21T10:22:35.4817079Z selp.b16 %rs770, %rs732, %rs719, %p484; 2026-02-21T10:22:35.4817144Z cvt.s16.s8 %rs771, %rs770; 2026-02-21T10:22:35.4817205Z shr.s16 %rs772, %rs771, 4; 2026-02-21T10:22:35.4817282Z selp.b16 %rs773, %rs733, %rs708, 
%p484; 2026-02-21T10:22:35.4817352Z cvt.s16.s8 %rs774, %rs773; 2026-02-21T10:22:35.4817416Z shr.s16 %rs775, %rs774, 4; 2026-02-21T10:22:35.4817552Z selp.b16 %rs776, %rs734, %rs712, %p484; 2026-02-21T10:22:35.4817626Z cvt.s16.s8 %rs777, %rs776; 2026-02-21T10:22:35.4817689Z shr.s16 %rs778, %rs777, 4; 2026-02-21T10:22:35.4817761Z selp.b16 %rs779, %rs735, %rs716, %p484; 2026-02-21T10:22:35.4817835Z cvt.s16.s8 %rs780, %rs779; 2026-02-21T10:22:35.4817906Z shr.s16 %rs781, %rs780, 4; 2026-02-21T10:22:35.4817979Z selp.b16 %rs782, %rs736, %rs720, %p484; 2026-02-21T10:22:35.4818043Z cvt.s16.s8 %rs783, %rs782; 2026-02-21T10:22:35.4818112Z shr.s16 %rs784, %rs783, 4; 2026-02-21T10:22:35.4818323Z .loc 1 83 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:83:32 2026-02-21T10:22:35.4818393Z cvt.rn.f32.s16 %r9283, %rs739; 2026-02-21T10:22:35.4818463Z cvt.rn.f32.s16 %r9284, %rs742; 2026-02-21T10:22:35.4818530Z cvt.rn.f32.s16 %r9285, %rs745; 2026-02-21T10:22:35.4818595Z cvt.rn.f32.s16 %r9286, %rs748; 2026-02-21T10:22:35.4818660Z cvt.rn.f32.s16 %r9287, %rs751; 2026-02-21T10:22:35.4818734Z cvt.rn.f32.s16 %r9288, %rs754; 2026-02-21T10:22:35.4818798Z cvt.rn.f32.s16 %r9289, %rs757; 2026-02-21T10:22:35.4818861Z cvt.rn.f32.s16 %r9290, %rs760; 2026-02-21T10:22:35.4818930Z cvt.rn.f32.s16 %r9291, %rs763; 2026-02-21T10:22:35.4818999Z cvt.rn.f32.s16 %r9292, %rs766; 2026-02-21T10:22:35.4819063Z cvt.rn.f32.s16 %r9293, %rs769; 2026-02-21T10:22:35.4819128Z cvt.rn.f32.s16 %r9294, %rs772; 2026-02-21T10:22:35.4819199Z cvt.rn.f32.s16 %r9295, %rs775; 2026-02-21T10:22:35.4819263Z cvt.rn.f32.s16 %r9296, %rs778; 2026-02-21T10:22:35.4819327Z cvt.rn.f32.s16 %r9297, %rs781; 2026-02-21T10:22:35.4819407Z cvt.rn.f32.s16 %r9298, %rs784; 2026-02-21T10:22:35.4819465Z bar.sync 0; 2026-02-21T10:22:35.4819531Z st.shared.b32 [%r41], %r9283; 2026-02-21T10:22:35.4819602Z st.shared.b32 [%r41+16384], %r9291; 2026-02-21T10:22:35.4819673Z st.shared.b32 [%r42], %r9284; 2026-02-21T10:22:35.4819753Z st.shared.b32 
[%r42+16384], %r9292; 2026-02-21T10:22:35.4819821Z st.shared.b32 [%r43], %r9285; 2026-02-21T10:22:35.4819895Z st.shared.b32 [%r43+16384], %r9293; 2026-02-21T10:22:35.4819959Z st.shared.b32 [%r44], %r9286; 2026-02-21T10:22:35.4820026Z st.shared.b32 [%r44+16384], %r9294; 2026-02-21T10:22:35.4820099Z st.shared.b32 [%r45], %r9287; 2026-02-21T10:22:35.4820165Z st.shared.b32 [%r45+16384], %r9295; 2026-02-21T10:22:35.4820229Z st.shared.b32 [%r46], %r9288; 2026-02-21T10:22:35.4820295Z st.shared.b32 [%r46+16384], %r9296; 2026-02-21T10:22:35.4820367Z st.shared.b32 [%r47], %r9289; 2026-02-21T10:22:35.4820432Z st.shared.b32 [%r47+16384], %r9297; 2026-02-21T10:22:35.4820589Z st.shared.b32 [%r48], %r9290; 2026-02-21T10:22:35.4820664Z st.shared.b32 [%r48+16384], %r9298; 2026-02-21T10:22:35.4820721Z $L__tmp13: 2026-02-21T10:22:35.4821004Z .loc 2 291 36 // standard.py:291:36 @[ co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:90:40 ] 2026-02-21T10:22:35.4821132Z // begin inline asm 2026-02-21T10:22:35.4821212Z fence.proxy.async.shared::cta; 2026-02-21T10:22:35.4821273Z // end inline asm 2026-02-21T10:22:35.4821329Z bar.sync 0; 2026-02-21T10:22:35.4821409Z wgmma.fence.sync.aligned; 2026-02-21T10:22:35.4821470Z // begin inline asm 2026-02-21T10:22:35.4821987Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043}, {%r7910,%r7911,%r7912,%r7913}, %rd266, %p151, 1, 1; 2026-02-21T10:22:35.4822051Z // end inline asm 2026-02-21T10:22:35.4822111Z // begin inline asm 2026-02-21T10:22:35.4822675Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043}, {%r7946,%r7947,%r7948,%r7949}, %rd267, %p151, 1, 1; 2026-02-21T10:22:35.4822749Z // end inline asm 2026-02-21T10:22:35.4822812Z // begin inline asm 2026-02-21T10:22:35.4823357Z 
wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043}, {%r7982,%r7983,%r7984,%r7985}, %rd268, %p151, 1, 1; 2026-02-21T10:22:35.4823423Z // end inline asm 2026-02-21T10:22:35.4823484Z // begin inline asm 2026-02-21T10:22:35.4823983Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043}, {%r8018,%r8019,%r8020,%r8021}, %rd269, %p151, 1, 1; 2026-02-21T10:22:35.4824041Z // end inline asm 2026-02-21T10:22:35.4824105Z // begin inline asm 2026-02-21T10:22:35.4824617Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043}, {%r8054,%r8055,%r8056,%r8057}, %rd270, %p151, 1, 1; 2026-02-21T10:22:35.4824676Z // end inline asm 2026-02-21T10:22:35.4824744Z // begin inline asm 2026-02-21T10:22:35.4825241Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043}, {%r8090,%r8091,%r8092,%r8093}, %rd271, %p151, 1, 1; 2026-02-21T10:22:35.4825300Z // end inline asm 2026-02-21T10:22:35.4825369Z // begin inline asm 2026-02-21T10:22:35.4825864Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043}, {%r8126,%r8127,%r8128,%r8129}, %rd272, %p151, 1, 1; 2026-02-21T10:22:35.4825922Z // end inline asm 2026-02-21T10:22:35.4825988Z // begin inline asm 2026-02-21T10:22:35.4826605Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043}, {%r8162,%r8163,%r8164,%r8165}, %rd273, %p151, 1, 1; 2026-02-21T10:22:35.4826674Z // end inline asm 
2026-02-21T10:22:35.4826741Z // begin inline asm 2026-02-21T10:22:35.4827240Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331}, {%r7910,%r7911,%r7912,%r7913}, %rd274, %p151, 1, 1; 2026-02-21T10:22:35.4827299Z // end inline asm 2026-02-21T10:22:35.4827365Z // begin inline asm 2026-02-21T10:22:35.4827858Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331}, {%r7946,%r7947,%r7948,%r7949}, %rd275, %p151, 1, 1; 2026-02-21T10:22:35.4827918Z // end inline asm 2026-02-21T10:22:35.4827979Z // begin inline asm 2026-02-21T10:22:35.4828620Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331}, {%r7982,%r7983,%r7984,%r7985}, %rd276, %p151, 1, 1; 2026-02-21T10:22:35.4828748Z // end inline asm 2026-02-21T10:22:35.4828808Z // begin inline asm 2026-02-21T10:22:35.4829315Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331}, {%r8018,%r8019,%r8020,%r8021}, %rd277, %p151, 1, 1; 2026-02-21T10:22:35.4829374Z // end inline asm 2026-02-21T10:22:35.4829435Z // begin inline asm 2026-02-21T10:22:35.4829936Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331}, {%r8054,%r8055,%r8056,%r8057}, %rd278, %p151, 1, 1; 2026-02-21T10:22:35.4829997Z // end inline asm 2026-02-21T10:22:35.4830063Z // begin inline asm 2026-02-21T10:22:35.4830635Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331}, {%r8090,%r8091,%r8092,%r8093}, 
%rd279, %p151, 1, 1; 2026-02-21T10:22:35.4830702Z // end inline asm 2026-02-21T10:22:35.4830767Z // begin inline asm 2026-02-21T10:22:35.4831328Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331}, {%r8126,%r8127,%r8128,%r8129}, %rd280, %p151, 1, 1; 2026-02-21T10:22:35.4831390Z // end inline asm 2026-02-21T10:22:35.4831450Z // begin inline asm 2026-02-21T10:22:35.4831950Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331}, {%r8162,%r8163,%r8164,%r8165}, %rd281, %p151, 1, 1; 2026-02-21T10:22:35.4832010Z // end inline asm 2026-02-21T10:22:35.4832076Z // begin inline asm 2026-02-21T10:22:35.4832577Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619}, {%r7910,%r7911,%r7912,%r7913}, %rd282, %p151, 1, 1; 2026-02-21T10:22:35.4832639Z // end inline asm 2026-02-21T10:22:35.4832702Z // begin inline asm 2026-02-21T10:22:35.4833196Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619}, {%r7946,%r7947,%r7948,%r7949}, %rd283, %p151, 1, 1; 2026-02-21T10:22:35.4833260Z // end inline asm 2026-02-21T10:22:35.4833320Z // begin inline asm 2026-02-21T10:22:35.4833815Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619}, {%r7982,%r7983,%r7984,%r7985}, %rd284, %p151, 1, 1; 2026-02-21T10:22:35.4833879Z // end inline asm 2026-02-21T10:22:35.4833943Z // begin inline asm 2026-02-21T10:22:35.4834438Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619}, {%r8018,%r8019,%r8020,%r8021}, %rd285, %p151, 1, 1; 2026-02-21T10:22:35.4834506Z // end inline asm 2026-02-21T10:22:35.4834567Z // begin inline asm 2026-02-21T10:22:35.4835065Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619}, {%r8054,%r8055,%r8056,%r8057}, %rd286, %p151, 1, 1; 2026-02-21T10:22:35.4835130Z // end inline asm 2026-02-21T10:22:35.4835190Z // begin inline asm 2026-02-21T10:22:35.4835698Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619}, {%r8090,%r8091,%r8092,%r8093}, %rd287, %p151, 1, 1; 2026-02-21T10:22:35.4835822Z // end inline asm 2026-02-21T10:22:35.4835887Z // begin inline asm 2026-02-21T10:22:35.4836384Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619}, {%r8126,%r8127,%r8128,%r8129}, %rd288, %p151, 1, 1; 2026-02-21T10:22:35.4836611Z // end inline asm 2026-02-21T10:22:35.4836680Z // begin inline asm 2026-02-21T10:22:35.4837179Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619}, {%r8162,%r8163,%r8164,%r8165}, %rd289, %p151, 1, 1; 2026-02-21T10:22:35.4837243Z // end inline asm 2026-02-21T10:22:35.4837304Z // begin inline asm 2026-02-21T10:22:35.4837799Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907}, {%r7910,%r7911,%r7912,%r7913}, %rd290, %p151, 1, 1; 2026-02-21T10:22:35.4837860Z // end inline asm 2026-02-21T10:22:35.4838003Z // begin inline asm 
2026-02-21T10:22:35.4838509Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907}, {%r7946,%r7947,%r7948,%r7949}, %rd291, %p151, 1, 1; 2026-02-21T10:22:35.4838572Z // end inline asm 2026-02-21T10:22:35.4838637Z // begin inline asm 2026-02-21T10:22:35.4839196Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907}, {%r7982,%r7983,%r7984,%r7985}, %rd292, %p151, 1, 1; 2026-02-21T10:22:35.4839259Z // end inline asm 2026-02-21T10:22:35.4839325Z // begin inline asm 2026-02-21T10:22:35.4839831Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907}, {%r8018,%r8019,%r8020,%r8021}, %rd293, %p151, 1, 1; 2026-02-21T10:22:35.4839895Z // end inline asm 2026-02-21T10:22:35.4839959Z // begin inline asm 2026-02-21T10:22:35.4840456Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907}, {%r8054,%r8055,%r8056,%r8057}, %rd294, %p151, 1, 1; 2026-02-21T10:22:35.4840528Z // end inline asm 2026-02-21T10:22:35.4840594Z // begin inline asm 2026-02-21T10:22:35.4841094Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907}, {%r8090,%r8091,%r8092,%r8093}, %rd295, %p151, 1, 1; 2026-02-21T10:22:35.4841152Z // end inline asm 2026-02-21T10:22:35.4841215Z // begin inline asm 2026-02-21T10:22:35.4841714Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907}, {%r8126,%r8127,%r8128,%r8129}, %rd296, %p151, 1, 1; 2026-02-21T10:22:35.4841793Z // 
end inline asm 2026-02-21T10:22:35.4841855Z // begin inline asm 2026-02-21T10:22:35.4842355Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907}, {%r8162,%r8163,%r8164,%r8165}, %rd297, %p151, 1, 1; 2026-02-21T10:22:35.4842416Z // end inline asm 2026-02-21T10:22:35.4842497Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:35.4842566Z mov.b32 %r8807, %r9226; 2026-02-21T10:22:35.4842628Z mov.b32 %r8808, %r9226; 2026-02-21T10:22:35.4842690Z mov.b32 %r8806, %r11026; 2026-02-21T10:22:35.4842755Z // begin inline asm 2026-02-21T10:22:35.4843822Z // wait for regs: %r5028,%r5029,%r5030,%r5031,%r5032,%r5033,%r5034,%r5035,%r5036,%r5037,%r5038,%r5039,%r5040,%r5041,%r5042,%r5043,%r5316,%r5317,%r5318,%r5319,%r5320,%r5321,%r5322,%r5323,%r5324,%r5325,%r5326,%r5327,%r5328,%r5329,%r5330,%r5331,%r5604,%r5605,%r5606,%r5607,%r5608,%r5609,%r5610,%r5611,%r5612,%r5613,%r5614,%r5615,%r5616,%r5617,%r5618,%r5619,%r5892,%r5893,%r5894,%r5895,%r5896,%r5897,%r5898,%r5899,%r5900,%r5901,%r5902,%r5903,%r5904,%r5905,%r5906,%r5907,%r8806,%r8807,%r8808 2026-02-21T10:22:35.4843985Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:35.4844109Z // end inline asm 2026-02-21T10:22:35.4844168Z $L__tmp14: 2026-02-21T10:22:35.4844385Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.4844458Z add.s64 %rd371, %rd597, 384; 2026-02-21T10:22:35.4844518Z // begin inline asm 2026-02-21T10:22:35.4844579Z mov.u64 %rd370, 0x0; 2026-02-21T10:22:35.4844706Z createpolicy.fractional.L2::evict_last.b64 %rd370, 1.0; 2026-02-21T10:22:35.4844771Z // end inline asm 2026-02-21T10:22:35.4844832Z // begin inline asm 2026-02-21T10:22:35.4844892Z mov.u32 %r8876, 0x0; 2026-02-21T10:22:35.4844957Z mov.u32 %r8877, 0x0; 2026-02-21T10:22:35.4845017Z mov.u32 %r8878, 0x0; 2026-02-21T10:22:35.4845076Z mov.u32 %r8879, 0x0; 2026-02-21T10:22:35.4845359Z 
ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r8876, %r8877, %r8878, %r8879 }, [ %rd371 + 0 ], %rd370; 2026-02-21T10:22:35.4845420Z // end inline asm 2026-02-21T10:22:35.4845627Z .loc 1 58 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:58:32 2026-02-21T10:22:35.4845689Z bar.sync 0; 2026-02-21T10:22:35.4845777Z st.shared.v2.b32 [%r14], {%r8876, %r8877}; 2026-02-21T10:22:35.4845904Z st.shared.v2.b32 [%r15], {%r8878, %r8879}; 2026-02-21T10:22:35.4845966Z bar.sync 0; 2026-02-21T10:22:35.4846040Z ld.shared.b16 %rs785, [%r16]; 2026-02-21T10:22:35.4846110Z ld.shared.b16 %rs786, [%r16+1024]; 2026-02-21T10:22:35.4846179Z ld.shared.b16 %rs787, [%r16+64]; 2026-02-21T10:22:35.4846247Z ld.shared.b16 %rs788, [%r16+1088]; 2026-02-21T10:22:35.4846321Z ld.shared.b16 %rs789, [%r17]; 2026-02-21T10:22:35.4846388Z ld.shared.b16 %rs790, [%r17+1024]; 2026-02-21T10:22:35.4846577Z ld.shared.b16 %rs791, [%r17+64]; 2026-02-21T10:22:35.4846667Z ld.shared.b16 %rs792, [%r17+1088]; 2026-02-21T10:22:35.4846737Z ld.shared.b16 %rs793, [%r18]; 2026-02-21T10:22:35.4846805Z ld.shared.b16 %rs794, [%r18+1024]; 2026-02-21T10:22:35.4846877Z ld.shared.b16 %rs795, [%r18+64]; 2026-02-21T10:22:35.4846947Z ld.shared.b16 %rs796, [%r18+1088]; 2026-02-21T10:22:35.4847013Z ld.shared.b16 %rs797, [%r19]; 2026-02-21T10:22:35.4847079Z ld.shared.b16 %rs798, [%r19+1024]; 2026-02-21T10:22:35.4847156Z ld.shared.b16 %rs799, [%r19+64]; 2026-02-21T10:22:35.4847224Z ld.shared.b16 %rs800, [%r19+1088]; 2026-02-21T10:22:35.4847290Z ld.shared.b16 %rs801, [%r20]; 2026-02-21T10:22:35.4847363Z ld.shared.b16 %rs802, [%r20+1024]; 2026-02-21T10:22:35.4847430Z ld.shared.b16 %rs803, [%r20+64]; 2026-02-21T10:22:35.4847497Z ld.shared.b16 %rs804, [%r20+1088]; 2026-02-21T10:22:35.4847564Z ld.shared.b16 %rs805, [%r21]; 2026-02-21T10:22:35.4847636Z ld.shared.b16 %rs806, [%r21+1024]; 2026-02-21T10:22:35.4847711Z ld.shared.b16 %rs807, [%r21+64]; 2026-02-21T10:22:35.4847783Z ld.shared.b16 %rs808, [%r21+1088]; 
2026-02-21T10:22:35.4847856Z ld.shared.b16 %rs809, [%r22]; 2026-02-21T10:22:35.4847923Z ld.shared.b16 %rs810, [%r22+1024]; 2026-02-21T10:22:35.4847991Z ld.shared.b16 %rs811, [%r22+64]; 2026-02-21T10:22:35.4848060Z ld.shared.b16 %rs812, [%r22+1088]; 2026-02-21T10:22:35.4848132Z ld.shared.b16 %rs813, [%r23]; 2026-02-21T10:22:35.4848200Z ld.shared.b16 %rs814, [%r23+1024]; 2026-02-21T10:22:35.4848270Z ld.shared.b16 %rs815, [%r23+64]; 2026-02-21T10:22:35.4848341Z ld.shared.b16 %rs816, [%r23+1088]; 2026-02-21T10:22:35.4848407Z cvt.f32.bf16 %r8953, %rs785; 2026-02-21T10:22:35.4848473Z cvt.f32.bf16 %r8954, %rs786; 2026-02-21T10:22:35.4848544Z cvt.f32.bf16 %r8955, %rs789; 2026-02-21T10:22:35.4848608Z cvt.f32.bf16 %r8956, %rs790; 2026-02-21T10:22:35.4848671Z cvt.f32.bf16 %r8989, %rs793; 2026-02-21T10:22:35.4848735Z cvt.f32.bf16 %r8990, %rs794; 2026-02-21T10:22:35.4848805Z cvt.f32.bf16 %r8991, %rs797; 2026-02-21T10:22:35.4848868Z cvt.f32.bf16 %r8992, %rs798; 2026-02-21T10:22:35.4849023Z cvt.f32.bf16 %r9025, %rs801; 2026-02-21T10:22:35.4849099Z cvt.f32.bf16 %r9026, %rs802; 2026-02-21T10:22:35.4849162Z cvt.f32.bf16 %r9027, %rs805; 2026-02-21T10:22:35.4849225Z cvt.f32.bf16 %r9028, %rs806; 2026-02-21T10:22:35.4849387Z cvt.f32.bf16 %r9061, %rs809; 2026-02-21T10:22:35.4849454Z cvt.f32.bf16 %r9062, %rs810; 2026-02-21T10:22:35.4849524Z cvt.f32.bf16 %r9063, %rs813; 2026-02-21T10:22:35.4849593Z cvt.f32.bf16 %r9064, %rs814; 2026-02-21T10:22:35.4849659Z cvt.f32.bf16 %r9097, %rs787; 2026-02-21T10:22:35.4849722Z cvt.f32.bf16 %r9098, %rs788; 2026-02-21T10:22:35.4849783Z cvt.f32.bf16 %r9099, %rs791; 2026-02-21T10:22:35.4849855Z cvt.f32.bf16 %r9100, %rs792; 2026-02-21T10:22:35.4849924Z cvt.f32.bf16 %r9133, %rs795; 2026-02-21T10:22:35.4849987Z cvt.f32.bf16 %r9134, %rs796; 2026-02-21T10:22:35.4850052Z cvt.f32.bf16 %r9135, %rs799; 2026-02-21T10:22:35.4850119Z cvt.f32.bf16 %r9136, %rs800; 2026-02-21T10:22:35.4850181Z cvt.f32.bf16 %r9169, %rs803; 2026-02-21T10:22:35.4850309Z cvt.f32.bf16 %r9170, 
%rs804; 2026-02-21T10:22:35.4850380Z cvt.f32.bf16 %r9171, %rs807; 2026-02-21T10:22:35.4850443Z cvt.f32.bf16 %r9172, %rs808; 2026-02-21T10:22:35.4850506Z cvt.f32.bf16 %r9205, %rs811; 2026-02-21T10:22:35.4850571Z cvt.f32.bf16 %r9206, %rs812; 2026-02-21T10:22:35.4850638Z cvt.f32.bf16 %r9207, %rs815; 2026-02-21T10:22:35.4850699Z cvt.f32.bf16 %r9208, %rs816; 2026-02-21T10:22:35.4850973Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.4851037Z bar.sync 0; 2026-02-21T10:22:35.4851098Z // begin inline asm 2026-02-21T10:22:35.4851198Z @%p472 mbarrier.init.shared::cta.b64 [%r4903], 1; 2026-02-21T10:22:35.4851261Z // end inline asm 2026-02-21T10:22:35.4851322Z bar.sync 0; 2026-02-21T10:22:35.4851387Z // begin inline asm 2026-02-21T10:22:35.4851522Z @%p472 mbarrier.arrive.expect_tx.shared.b64 _, [%r4903], 4096; 2026-02-21T10:22:35.4851589Z // end inline asm 2026-02-21T10:22:35.4851665Z // begin inline asm 2026-02-21T10:22:35.4851748Z fence.proxy.async.shared::cta; 2026-02-21T10:22:35.4851805Z // end inline asm 2026-02-21T10:22:35.4851868Z bar.sync 0; 2026-02-21T10:22:35.4851939Z elect.sync %r9299|%p272, -1; 2026-02-21T10:22:35.4852012Z and.pred %p257, %p1, %p272; 2026-02-21T10:22:35.4852082Z or.b32 %r8884, %r4907, 96; 2026-02-21T10:22:35.4852142Z // begin inline asm 2026-02-21T10:22:35.4852478Z @%p257 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r11026], [%rd394, {%r4906, %r8884}], [%r4903]; 2026-02-21T10:22:35.4852543Z // end inline asm 2026-02-21T10:22:35.4852601Z bar.sync 0; 2026-02-21T10:22:35.4852661Z // begin inline asm 2026-02-21T10:22:35.4852716Z 2026-02-21T10:22:35.4852772Z { 2026-02-21T10:22:35.4852837Z .reg .pred complete; 2026-02-21T10:22:35.4852895Z waitLoop: 2026-02-21T10:22:35.4853043Z mbarrier.try_wait.parity.shared.b64 complete, [%r4903], %r9226; 2026-02-21T10:22:35.4853115Z @!complete bra.uni waitLoop; 2026-02-21T10:22:35.4853171Z } 2026-02-21T10:22:35.4853179Z 
2026-02-21T10:22:35.4853238Z // end inline asm 2026-02-21T10:22:35.4853300Z bar.sync 0; 2026-02-21T10:22:35.4853361Z // begin inline asm 2026-02-21T10:22:35.4853457Z @%p472 mbarrier.inval.shared::cta.b64 [%r4903]; 2026-02-21T10:22:35.4853523Z // end inline asm 2026-02-21T10:22:35.4853735Z .loc 1 78 58 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:78:58 2026-02-21T10:22:35.4853804Z ld.shared.b8 %rs817, [%r24]; 2026-02-21T10:22:35.4853881Z ld.shared.b8 %rs818, [%r24+1024]; 2026-02-21T10:22:35.4853949Z ld.shared.b8 %rs819, [%r24+2048]; 2026-02-21T10:22:35.4854014Z ld.shared.b8 %rs820, [%r24+3072]; 2026-02-21T10:22:35.4854083Z ld.shared.b8 %rs821, [%r25+256]; 2026-02-21T10:22:35.4854155Z ld.shared.b8 %rs822, [%r25+1280]; 2026-02-21T10:22:35.4854221Z ld.shared.b8 %rs823, [%r25+2304]; 2026-02-21T10:22:35.4854287Z ld.shared.b8 %rs824, [%r25+3328]; 2026-02-21T10:22:35.4854372Z ld.shared.b8 %rs825, [%r26+512]; 2026-02-21T10:22:35.4854506Z ld.shared.b8 %rs826, [%r26+1536]; 2026-02-21T10:22:35.4854572Z ld.shared.b8 %rs827, [%r26+2560]; 2026-02-21T10:22:35.4854639Z ld.shared.b8 %rs828, [%r26+3584]; 2026-02-21T10:22:35.4854712Z ld.shared.b8 %rs829, [%r27+768]; 2026-02-21T10:22:35.4854829Z ld.shared.b8 %rs830, [%r27+1792]; 2026-02-21T10:22:35.4854894Z ld.shared.b8 %rs831, [%r27+2816]; 2026-02-21T10:22:35.4854968Z ld.shared.b8 %rs832, [%r27+3840]; 2026-02-21T10:22:35.4855171Z .loc 1 63 28 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:63:28 2026-02-21T10:22:35.4855240Z shl.b16 %rs833, %rs817, 4; 2026-02-21T10:22:35.4855311Z shl.b16 %rs834, %rs821, 4; 2026-02-21T10:22:35.4855376Z shl.b16 %rs835, %rs825, 4; 2026-02-21T10:22:35.4855439Z shl.b16 %rs836, %rs829, 4; 2026-02-21T10:22:35.4855501Z shl.b16 %rs837, %rs818, 4; 2026-02-21T10:22:35.4855581Z shl.b16 %rs838, %rs822, 4; 2026-02-21T10:22:35.4855646Z shl.b16 %rs839, %rs826, 4; 2026-02-21T10:22:35.4855760Z shl.b16 %rs840, %rs830, 4; 2026-02-21T10:22:35.4855833Z shl.b16 %rs841, %rs819, 4; 
2026-02-21T10:22:35.4855894Z shl.b16 %rs842, %rs823, 4; 2026-02-21T10:22:35.4855956Z shl.b16 %rs843, %rs827, 4; 2026-02-21T10:22:35.4856020Z shl.b16 %rs844, %rs831, 4; 2026-02-21T10:22:35.4856094Z shl.b16 %rs845, %rs820, 4; 2026-02-21T10:22:35.4856160Z shl.b16 %rs846, %rs824, 4; 2026-02-21T10:22:35.4856223Z shl.b16 %rs847, %rs828, 4; 2026-02-21T10:22:35.4856343Z shl.b16 %rs848, %rs832, 4; 2026-02-21T10:22:35.4856669Z .loc 1 78 58 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:78:58 2026-02-21T10:22:35.4856750Z selp.b16 %rs849, %rs833, %rs817, %p484; 2026-02-21T10:22:35.4856819Z cvt.s16.s8 %rs850, %rs849; 2026-02-21T10:22:35.4856882Z shr.s16 %rs851, %rs850, 4; 2026-02-21T10:22:35.4856954Z selp.b16 %rs852, %rs834, %rs821, %p484; 2026-02-21T10:22:35.4857019Z cvt.s16.s8 %rs853, %rs852; 2026-02-21T10:22:35.4857089Z shr.s16 %rs854, %rs853, 4; 2026-02-21T10:22:35.4857165Z selp.b16 %rs855, %rs835, %rs825, %p484; 2026-02-21T10:22:35.4857228Z cvt.s16.s8 %rs856, %rs855; 2026-02-21T10:22:35.4857297Z shr.s16 %rs857, %rs856, 4; 2026-02-21T10:22:35.4857368Z selp.b16 %rs858, %rs836, %rs829, %p484; 2026-02-21T10:22:35.4857436Z cvt.s16.s8 %rs859, %rs858; 2026-02-21T10:22:35.4857496Z shr.s16 %rs860, %rs859, 4; 2026-02-21T10:22:35.4857572Z selp.b16 %rs861, %rs837, %rs818, %p484; 2026-02-21T10:22:35.4857638Z cvt.s16.s8 %rs862, %rs861; 2026-02-21T10:22:35.4857700Z shr.s16 %rs863, %rs862, 4; 2026-02-21T10:22:35.4857774Z selp.b16 %rs864, %rs838, %rs822, %p484; 2026-02-21T10:22:35.4857836Z cvt.s16.s8 %rs865, %rs864; 2026-02-21T10:22:35.4857897Z shr.s16 %rs866, %rs865, 4; 2026-02-21T10:22:35.4857967Z selp.b16 %rs867, %rs839, %rs826, %p484; 2026-02-21T10:22:35.4858034Z cvt.s16.s8 %rs868, %rs867; 2026-02-21T10:22:35.4858098Z shr.s16 %rs869, %rs868, 4; 2026-02-21T10:22:35.4858168Z selp.b16 %rs870, %rs840, %rs830, %p484; 2026-02-21T10:22:35.4858236Z cvt.s16.s8 %rs871, %rs870; 2026-02-21T10:22:35.4858300Z shr.s16 %rs872, %rs871, 4; 2026-02-21T10:22:35.4858371Z selp.b16 %rs873, 
%rs841, %rs819, %p484; 2026-02-21T10:22:35.4858434Z cvt.s16.s8 %rs874, %rs873; 2026-02-21T10:22:35.4858501Z shr.s16 %rs875, %rs874, 4; 2026-02-21T10:22:35.4858587Z selp.b16 %rs876, %rs842, %rs823, %p484; 2026-02-21T10:22:35.4858651Z cvt.s16.s8 %rs877, %rs876; 2026-02-21T10:22:35.4858723Z shr.s16 %rs878, %rs877, 4; 2026-02-21T10:22:35.4858795Z selp.b16 %rs879, %rs843, %rs827, %p484; 2026-02-21T10:22:35.4858859Z cvt.s16.s8 %rs880, %rs879; 2026-02-21T10:22:35.4858926Z shr.s16 %rs881, %rs880, 4; 2026-02-21T10:22:35.4858996Z selp.b16 %rs882, %rs844, %rs831, %p484; 2026-02-21T10:22:35.4859059Z cvt.s16.s8 %rs883, %rs882; 2026-02-21T10:22:35.4859121Z shr.s16 %rs884, %rs883, 4; 2026-02-21T10:22:35.4859197Z selp.b16 %rs885, %rs845, %rs820, %p484; 2026-02-21T10:22:35.4859260Z cvt.s16.s8 %rs886, %rs885; 2026-02-21T10:22:35.4859323Z shr.s16 %rs887, %rs886, 4; 2026-02-21T10:22:35.4859503Z selp.b16 %rs888, %rs846, %rs824, %p484; 2026-02-21T10:22:35.4859566Z cvt.s16.s8 %rs889, %rs888; 2026-02-21T10:22:35.4859628Z shr.s16 %rs890, %rs889, 4; 2026-02-21T10:22:35.4859701Z selp.b16 %rs891, %rs847, %rs828, %p484; 2026-02-21T10:22:35.4859838Z cvt.s16.s8 %rs892, %rs891; 2026-02-21T10:22:35.4859905Z shr.s16 %rs893, %rs892, 4; 2026-02-21T10:22:35.4859977Z selp.b16 %rs894, %rs848, %rs832, %p484; 2026-02-21T10:22:35.4860046Z cvt.s16.s8 %rs895, %rs894; 2026-02-21T10:22:35.4860109Z shr.s16 %rs896, %rs895, 4; 2026-02-21T10:22:35.4860315Z .loc 1 83 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:83:32 2026-02-21T10:22:35.4860388Z cvt.rn.f32.s16 %r9300, %rs851; 2026-02-21T10:22:35.4860456Z cvt.rn.f32.s16 %r9301, %rs854; 2026-02-21T10:22:35.4860520Z cvt.rn.f32.s16 %r9302, %rs857; 2026-02-21T10:22:35.4860586Z cvt.rn.f32.s16 %r9303, %rs860; 2026-02-21T10:22:35.4860654Z cvt.rn.f32.s16 %r9304, %rs863; 2026-02-21T10:22:35.4860724Z cvt.rn.f32.s16 %r9305, %rs866; 2026-02-21T10:22:35.4860852Z cvt.rn.f32.s16 %r9306, %rs869; 2026-02-21T10:22:35.4860926Z cvt.rn.f32.s16 %r9307, %rs872; 
2026-02-21T10:22:35.4860989Z cvt.rn.f32.s16 %r9308, %rs875; 2026-02-21T10:22:35.4861059Z cvt.rn.f32.s16 %r9309, %rs878; 2026-02-21T10:22:35.4861137Z cvt.rn.f32.s16 %r9310, %rs881; 2026-02-21T10:22:35.4861208Z cvt.rn.f32.s16 %r9311, %rs884; 2026-02-21T10:22:35.4861339Z cvt.rn.f32.s16 %r9312, %rs887; 2026-02-21T10:22:35.4861404Z cvt.rn.f32.s16 %r9313, %rs890; 2026-02-21T10:22:35.4861473Z cvt.rn.f32.s16 %r9314, %rs893; 2026-02-21T10:22:35.4861536Z cvt.rn.f32.s16 %r9315, %rs896; 2026-02-21T10:22:35.4861594Z bar.sync 0; 2026-02-21T10:22:35.4861661Z st.shared.b32 [%r41], %r9300; 2026-02-21T10:22:35.4861740Z st.shared.b32 [%r41+16384], %r9308; 2026-02-21T10:22:35.4861805Z st.shared.b32 [%r42], %r9301; 2026-02-21T10:22:35.4861873Z st.shared.b32 [%r42+16384], %r9309; 2026-02-21T10:22:35.4861943Z st.shared.b32 [%r43], %r9302; 2026-02-21T10:22:35.4862012Z st.shared.b32 [%r43+16384], %r9310; 2026-02-21T10:22:35.4862086Z st.shared.b32 [%r44], %r9303; 2026-02-21T10:22:35.4862162Z st.shared.b32 [%r44+16384], %r9311; 2026-02-21T10:22:35.4862227Z st.shared.b32 [%r45], %r9304; 2026-02-21T10:22:35.4862297Z st.shared.b32 [%r45+16384], %r9312; 2026-02-21T10:22:35.4862360Z st.shared.b32 [%r46], %r9305; 2026-02-21T10:22:35.4862432Z st.shared.b32 [%r46+16384], %r9313; 2026-02-21T10:22:35.4862501Z st.shared.b32 [%r47], %r9306; 2026-02-21T10:22:35.4862568Z st.shared.b32 [%r47+16384], %r9314; 2026-02-21T10:22:35.4862640Z st.shared.b32 [%r48], %r9307; 2026-02-21T10:22:35.4862706Z st.shared.b32 [%r48+16384], %r9315; 2026-02-21T10:22:35.4862763Z $L__tmp15: 2026-02-21T10:22:35.4863042Z .loc 2 291 36 // standard.py:291:36 @[ co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:90:40 ] 2026-02-21T10:22:35.4863241Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r5028, %r5316, %r5604, %r5892}; 2026-02-21T10:22:35.4863302Z bar.sync 0; 2026-02-21T10:22:35.4863368Z // begin inline asm 2026-02-21T10:22:35.4863513Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r8957}, [%r4407]; 
2026-02-21T10:22:35.4863573Z // end inline asm 2026-02-21T10:22:35.4863630Z bar.sync 0; 2026-02-21T10:22:35.4863821Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r5030, %r5318, %r5606, %r5894}; 2026-02-21T10:22:35.4863879Z bar.sync 0; 2026-02-21T10:22:35.4863943Z // begin inline asm 2026-02-21T10:22:35.4864077Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r8959}, [%r4407]; 2026-02-21T10:22:35.4864144Z // end inline asm 2026-02-21T10:22:35.4864201Z bar.sync 0; 2026-02-21T10:22:35.4864384Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r5029, %r5317, %r5605, %r5893}; 2026-02-21T10:22:35.4864449Z bar.sync 0; 2026-02-21T10:22:35.4864509Z // begin inline asm 2026-02-21T10:22:35.4864640Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r8958}, [%r4407]; 2026-02-21T10:22:35.4864699Z // end inline asm 2026-02-21T10:22:35.4864837Z bar.sync 0; 2026-02-21T10:22:35.4865021Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r5031, %r5319, %r5607, %r5895}; 2026-02-21T10:22:35.4865079Z bar.sync 0; 2026-02-21T10:22:35.4865146Z // begin inline asm 2026-02-21T10:22:35.4865329Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r8960}, [%r4407]; 2026-02-21T10:22:35.4865392Z // end inline asm 2026-02-21T10:22:35.4865453Z bar.sync 0; 2026-02-21T10:22:35.4865632Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r5032, %r5320, %r5608, %r5896}; 2026-02-21T10:22:35.4865690Z bar.sync 0; 2026-02-21T10:22:35.4865748Z // begin inline asm 2026-02-21T10:22:35.4865881Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r8961}, [%r4407]; 2026-02-21T10:22:35.4865938Z // end inline asm 2026-02-21T10:22:35.4866006Z bar.sync 0; 2026-02-21T10:22:35.4866193Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r5034, %r5322, %r5610, %r5898}; 2026-02-21T10:22:35.4866249Z bar.sync 0; 2026-02-21T10:22:35.4866308Z // begin inline asm 2026-02-21T10:22:35.4866659Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r8963}, [%r4407]; 2026-02-21T10:22:35.4866739Z // end inline asm 2026-02-21T10:22:35.4866798Z bar.sync 0; 
2026-02-21T10:22:35.4866992Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r5033, %r5321, %r5609, %r5897}; 2026-02-21T10:22:35.4867059Z bar.sync 0; 2026-02-21T10:22:35.4867123Z // begin inline asm 2026-02-21T10:22:35.4867270Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r8962}, [%r4407]; 2026-02-21T10:22:35.4867399Z // end inline asm 2026-02-21T10:22:35.4867468Z bar.sync 0; 2026-02-21T10:22:35.4867655Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r5035, %r5323, %r5611, %r5899}; 2026-02-21T10:22:35.4867713Z bar.sync 0; 2026-02-21T10:22:35.4867780Z // begin inline asm 2026-02-21T10:22:35.4867914Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r8964}, [%r4407]; 2026-02-21T10:22:35.4867973Z // end inline asm 2026-02-21T10:22:35.4868034Z bar.sync 0; 2026-02-21T10:22:35.4868212Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r5036, %r5324, %r5612, %r5900}; 2026-02-21T10:22:35.4868274Z bar.sync 0; 2026-02-21T10:22:35.4868335Z // begin inline asm 2026-02-21T10:22:35.4868535Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r8965}, [%r4407]; 2026-02-21T10:22:35.4868599Z // end inline asm 2026-02-21T10:22:35.4868658Z bar.sync 0; 2026-02-21T10:22:35.4868840Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r5038, %r5326, %r5614, %r5902}; 2026-02-21T10:22:35.4868901Z bar.sync 0; 2026-02-21T10:22:35.4868961Z // begin inline asm 2026-02-21T10:22:35.4869088Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r8967}, [%r4407]; 2026-02-21T10:22:35.4869153Z // end inline asm 2026-02-21T10:22:35.4869210Z bar.sync 0; 2026-02-21T10:22:35.4869386Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r5037, %r5325, %r5613, %r5901}; 2026-02-21T10:22:35.4869447Z bar.sync 0; 2026-02-21T10:22:35.4869508Z // begin inline asm 2026-02-21T10:22:35.4869643Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r8966}, [%r4407]; 2026-02-21T10:22:35.4869713Z // end inline asm 2026-02-21T10:22:35.4869772Z bar.sync 0; 2026-02-21T10:22:35.4869952Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r5039, 
%r5327, %r5615, %r5903}; 2026-02-21T10:22:35.4870010Z bar.sync 0; 2026-02-21T10:22:35.4870081Z // begin inline asm 2026-02-21T10:22:35.4870215Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r8968}, [%r4407]; 2026-02-21T10:22:35.4870274Z // end inline asm 2026-02-21T10:22:35.4870347Z bar.sync 0; 2026-02-21T10:22:35.4870530Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r5040, %r5328, %r5616, %r5904}; 2026-02-21T10:22:35.4870588Z bar.sync 0; 2026-02-21T10:22:35.4870649Z // begin inline asm 2026-02-21T10:22:35.4870782Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r8969}, [%r4407]; 2026-02-21T10:22:35.4870843Z // end inline asm 2026-02-21T10:22:35.4870900Z bar.sync 0; 2026-02-21T10:22:35.4871086Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r5042, %r5330, %r5618, %r5906}; 2026-02-21T10:22:35.4871143Z bar.sync 0; 2026-02-21T10:22:35.4871317Z // begin inline asm 2026-02-21T10:22:35.4871455Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r8971}, [%r4407]; 2026-02-21T10:22:35.4871514Z // end inline asm 2026-02-21T10:22:35.4871570Z bar.sync 0; 2026-02-21T10:22:35.4871749Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r5041, %r5329, %r5617, %r5905}; 2026-02-21T10:22:35.4871878Z bar.sync 0; 2026-02-21T10:22:35.4871940Z // begin inline asm 2026-02-21T10:22:35.4872072Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r8970}, [%r4407]; 2026-02-21T10:22:35.4872137Z // end inline asm 2026-02-21T10:22:35.4872193Z bar.sync 0; 2026-02-21T10:22:35.4872369Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r433], {%r5043, %r5331, %r5619, %r5907}; 2026-02-21T10:22:35.4872425Z bar.sync 0; 2026-02-21T10:22:35.4872491Z // begin inline asm 2026-02-21T10:22:35.4872620Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r8972}, [%r4407]; 2026-02-21T10:22:35.4872678Z // end inline asm 2026-02-21T10:22:35.4872745Z // begin inline asm 2026-02-21T10:22:35.4872825Z fence.proxy.async.shared::cta; 2026-02-21T10:22:35.4872949Z // end inline asm 2026-02-21T10:22:35.4873030Z wgmma.fence.sync.aligned; 
2026-02-21T10:22:35.4873101Z shl.b32 %r9316, %r9264, 10; 2026-02-21T10:22:35.4873167Z and.b32 %r9317, %r9316, 12288; 2026-02-21T10:22:35.4873247Z add.s32 %r9318, %r9317, %r11026; 2026-02-21T10:22:35.4873318Z bfe.u32 %r9319, %r9318, 4, 14; 2026-02-21T10:22:35.4873383Z cvt.u64.u32 %rd382, %r9319; 2026-02-21T10:22:35.4873513Z or.b64 %rd374, %rd382, 4611686293372403712; 2026-02-21T10:22:35.4873583Z // begin inline asm 2026-02-21T10:22:35.4874097Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r8957,%r8958,%r8959,%r8960,%r8961,%r8962,%r8963,%r8964,%r8965,%r8966,%r8967,%r8968,%r8969,%r8970,%r8971,%r8972}, {%r8953,%r8954,%r8955,%r8956}, %rd374, %p151, 1, 1; 2026-02-21T10:22:35.4874156Z // end inline asm 2026-02-21T10:22:35.4874219Z add.s32 %r9320, %r9318, 32; 2026-02-21T10:22:35.4874289Z bfe.u32 %r9321, %r9320, 4, 14; 2026-02-21T10:22:35.4874354Z cvt.u64.u32 %rd383, %r9321; 2026-02-21T10:22:35.4874435Z or.b64 %rd375, %rd383, 4611686293372403712; 2026-02-21T10:22:35.4874512Z // begin inline asm 2026-02-21T10:22:35.4875020Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r8957,%r8958,%r8959,%r8960,%r8961,%r8962,%r8963,%r8964,%r8965,%r8966,%r8967,%r8968,%r8969,%r8970,%r8971,%r8972}, {%r8989,%r8990,%r8991,%r8992}, %rd375, %p151, 1, 1; 2026-02-21T10:22:35.4875083Z // end inline asm 2026-02-21T10:22:35.4875153Z add.s32 %r9322, %r9318, 64; 2026-02-21T10:22:35.4875217Z bfe.u32 %r9323, %r9322, 4, 14; 2026-02-21T10:22:35.4875281Z cvt.u64.u32 %rd384, %r9323; 2026-02-21T10:22:35.4875357Z or.b64 %rd376, %rd384, 4611686293372403712; 2026-02-21T10:22:35.4875427Z // begin inline asm 2026-02-21T10:22:35.4875926Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r8957,%r8958,%r8959,%r8960,%r8961,%r8962,%r8963,%r8964,%r8965,%r8966,%r8967,%r8968,%r8969,%r8970,%r8971,%r8972}, {%r9025,%r9026,%r9027,%r9028}, %rd376, %p151, 1, 1; 2026-02-21T10:22:35.4875986Z // end inline asm 2026-02-21T10:22:35.4876056Z add.s32 %r9324, %r9318, 96; 2026-02-21T10:22:35.4876119Z bfe.u32 %r9325, 
%r9324, 4, 14; 2026-02-21T10:22:35.4876181Z cvt.u64.u32 %rd385, %r9325; 2026-02-21T10:22:35.4876258Z or.b64 %rd377, %rd385, 4611686293372403712; 2026-02-21T10:22:35.4876323Z // begin inline asm 2026-02-21T10:22:35.4876949Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r8957,%r8958,%r8959,%r8960,%r8961,%r8962,%r8963,%r8964,%r8965,%r8966,%r8967,%r8968,%r8969,%r8970,%r8971,%r8972}, {%r9061,%r9062,%r9063,%r9064}, %rd377, %p151, 1, 1; 2026-02-21T10:22:35.4877015Z // end inline asm 2026-02-21T10:22:35.4877078Z add.s32 %r9326, %r9318, 16384; 2026-02-21T10:22:35.4877141Z bfe.u32 %r9327, %r9326, 4, 14; 2026-02-21T10:22:35.4877205Z cvt.u64.u32 %rd386, %r9327; 2026-02-21T10:22:35.4877296Z or.b64 %rd378, %rd386, 4611686293372403712; 2026-02-21T10:22:35.4877360Z // begin inline asm 2026-02-21T10:22:35.4877859Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r8957,%r8958,%r8959,%r8960,%r8961,%r8962,%r8963,%r8964,%r8965,%r8966,%r8967,%r8968,%r8969,%r8970,%r8971,%r8972}, {%r9097,%r9098,%r9099,%r9100}, %rd378, %p151, 1, 1; 2026-02-21T10:22:35.4878004Z // end inline asm 2026-02-21T10:22:35.4878068Z add.s32 %r9328, %r9318, 16416; 2026-02-21T10:22:35.4878195Z bfe.u32 %r9329, %r9328, 4, 14; 2026-02-21T10:22:35.4878263Z cvt.u64.u32 %rd387, %r9329; 2026-02-21T10:22:35.4878337Z or.b64 %rd379, %rd387, 4611686293372403712; 2026-02-21T10:22:35.4878412Z // begin inline asm 2026-02-21T10:22:35.4878914Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r8957,%r8958,%r8959,%r8960,%r8961,%r8962,%r8963,%r8964,%r8965,%r8966,%r8967,%r8968,%r8969,%r8970,%r8971,%r8972}, {%r9133,%r9134,%r9135,%r9136}, %rd379, %p151, 1, 1; 2026-02-21T10:22:35.4878979Z // end inline asm 2026-02-21T10:22:35.4879041Z add.s32 %r9330, %r9318, 16448; 2026-02-21T10:22:35.4879103Z bfe.u32 %r9331, %r9330, 4, 14; 2026-02-21T10:22:35.4879172Z cvt.u64.u32 %rd388, %r9331; 2026-02-21T10:22:35.4879245Z or.b64 %rd380, %rd388, 4611686293372403712; 2026-02-21T10:22:35.4879374Z // begin inline asm 
2026-02-21T10:22:35.4879885Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r8957,%r8958,%r8959,%r8960,%r8961,%r8962,%r8963,%r8964,%r8965,%r8966,%r8967,%r8968,%r8969,%r8970,%r8971,%r8972}, {%r9169,%r9170,%r9171,%r9172}, %rd380, %p151, 1, 1; 2026-02-21T10:22:35.4879949Z // end inline asm 2026-02-21T10:22:35.4880012Z add.s32 %r9332, %r9318, 16480; 2026-02-21T10:22:35.4880131Z bfe.u32 %r9333, %r9332, 4, 14; 2026-02-21T10:22:35.4880202Z cvt.u64.u32 %rd389, %r9333; 2026-02-21T10:22:35.4880277Z or.b64 %rd381, %rd389, 4611686293372403712; 2026-02-21T10:22:35.4880341Z // begin inline asm 2026-02-21T10:22:35.4880845Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r8957,%r8958,%r8959,%r8960,%r8961,%r8962,%r8963,%r8964,%r8965,%r8966,%r8967,%r8968,%r8969,%r8970,%r8971,%r8972}, {%r9205,%r9206,%r9207,%r9208}, %rd381, %p151, 1, 1; 2026-02-21T10:22:35.4880903Z // end inline asm 2026-02-21T10:22:35.4880982Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:35.4881056Z mov.b32 %r9225, %r11026; 2026-02-21T10:22:35.4881120Z mov.b32 %r9227, %r9226; 2026-02-21T10:22:35.4881181Z // begin inline asm 2026-02-21T10:22:35.4881495Z // wait for regs: %r8957,%r8958,%r8959,%r8960,%r8961,%r8962,%r8963,%r8964,%r8965,%r8966,%r8967,%r8968,%r8969,%r8970,%r8971,%r8972,%r9225,%r9226,%r9227 2026-02-21T10:22:35.4881577Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:35.4881637Z // end inline asm 2026-02-21T10:22:35.4881707Z $L__tmp16: 2026-02-21T10:22:35.4881940Z .loc 1 47 126 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:47:126 2026-02-21T10:22:35.4882006Z add.s64 %rd42, %rd598, 128; 2026-02-21T10:22:35.4882073Z add.s64 %rd597, %rd597, 512; 2026-02-21T10:22:35.4882149Z setp.lt.u64 %p273, %rd598, 3968; 2026-02-21T10:22:35.4882212Z mov.b64 %rd598, %rd42; 2026-02-21T10:22:35.4882275Z @%p273 bra $L__BB0_5; 2026-02-21T10:22:35.4882388Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:22:35.4882611Z .loc 1 38 32 // 
co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:38:32 2026-02-21T10:22:35.4882677Z or.b32 %r9352, %r4906, %r6; 2026-02-21T10:22:35.4882884Z .loc 1 40 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:40:32 2026-02-21T10:22:35.4882957Z or.b32 %r9353, %r86, %r7; 2026-02-21T10:22:35.4883019Z or.b32 %r9354, %r86, %r8; 2026-02-21T10:22:35.4883234Z .loc 1 93 28 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:93:28 2026-02-21T10:22:35.4883324Z cvt.rn.bf16x2.f32 %r9355, %r8958, %r8957; 2026-02-21T10:22:35.4883404Z cvt.rn.bf16x2.f32 %r9356, %r8960, %r8959; 2026-02-21T10:22:35.4883479Z cvt.rn.bf16x2.f32 %r9357, %r8962, %r8961; 2026-02-21T10:22:35.4883561Z cvt.rn.bf16x2.f32 %r9358, %r8964, %r8963; 2026-02-21T10:22:35.4883634Z cvt.rn.bf16x2.f32 %r9359, %r8966, %r8965; 2026-02-21T10:22:35.4883707Z cvt.rn.bf16x2.f32 %r9360, %r8968, %r8967; 2026-02-21T10:22:35.4883850Z cvt.rn.bf16x2.f32 %r9361, %r8970, %r8969; 2026-02-21T10:22:35.4883932Z cvt.rn.bf16x2.f32 %r9362, %r8972, %r8971; 2026-02-21T10:22:35.4884139Z .loc 1 94 50 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:94:50 2026-02-21T10:22:35.4884262Z mad.lo.s32 %r9363, %r9353, 1280, %r9352; 2026-02-21T10:22:35.4884342Z mad.lo.s32 %r9364, %r9354, 1280, %r9352; 2026-02-21T10:22:35.4884549Z .loc 1 94 22 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:94:22 2026-02-21T10:22:35.4884624Z mad.wide.s32 %rd390, %r9363, 2, %rd77; 2026-02-21T10:22:35.4884700Z mad.wide.s32 %rd391, %r9364, 2, %rd77; 2026-02-21T10:22:35.4884907Z .loc 1 94 81 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:94:81 2026-02-21T10:22:35.4884967Z bar.sync 0; 2026-02-21T10:22:35.4885080Z st.shared.v4.b32 [%r38], {%r9355, %r9357, %r9359, %r9361}; 2026-02-21T10:22:35.4885208Z st.shared.v4.b32 [%r38+512], {%r9356, %r9358, %r9360, %r9362}; 2026-02-21T10:22:35.4889608Z bar.sync 0; 2026-02-21T10:22:35.4889726Z // begin inline asm 2026-02-21T10:22:35.4889948Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9334, 
%r9335, %r9336, %r9337}, [%r4855]; 2026-02-21T10:22:35.4890015Z // end inline asm 2026-02-21T10:22:35.4890077Z // begin inline asm 2026-02-21T10:22:35.4890278Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9339, %r9340, %r9341, %r9342}, [%r4860]; 2026-02-21T10:22:35.4890413Z // end inline asm 2026-02-21T10:22:35.4890475Z // begin inline asm 2026-02-21T10:22:35.4890612Z st.global.v4.b32 [ %rd390 + 0 ], { %r9334, %r9335, %r9336, %r9337 }; 2026-02-21T10:22:35.4890675Z // end inline asm 2026-02-21T10:22:35.4890735Z // begin inline asm 2026-02-21T10:22:35.4890860Z st.global.v4.b32 [ %rd391 + 0 ], { %r9339, %r9340, %r9341, %r9342 }; 2026-02-21T10:22:35.4890922Z // end inline asm 2026-02-21T10:22:35.4891152Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.4891228Z add.s32 %r14246, %r14246, 264; 2026-02-21T10:22:35.4891307Z setp.lt.s32 %p274, %r14246, %r14314; 2026-02-21T10:22:35.4891373Z @%p274 bra $L__BB0_2; 2026-02-21T10:22:35.4891474Z $L__BB0_7: // %._crit_edge 2026-02-21T10:22:35.4891547Z sub.s32 %r9485, 10240, %r14314; 2026-02-21T10:22:35.4891623Z mul.hi.s32 %r9486, %r9485, 1041204193; 2026-02-21T10:22:35.4891692Z shr.u32 %r9487, %r9486, 31; 2026-02-21T10:22:35.4891758Z shr.s32 %r9488, %r9486, 5; 2026-02-21T10:22:35.4891830Z add.s32 %r120, %r9488, %r9487; 2026-02-21T10:22:35.4891898Z mul.lo.s32 %r9489, %r120, 132; 2026-02-21T10:22:35.4891968Z setp.ne.b32 %p312, %r9485, %r9489; 2026-02-21T10:22:35.4892038Z setp.gt.s32 %p313, %r9485, -1; 2026-02-21T10:22:35.4892116Z and.pred %p314, %p313, %p312; 2026-02-21T10:22:35.4892184Z selp.b32 %r121, 1, 0, %p314; 2026-02-21T10:22:35.4892250Z add.s32 %r122, %r120, %r121; 2026-02-21T10:22:35.4892321Z add.s32 %r9365, %r11026, 182272; 2026-02-21T10:22:35.4892390Z // begin inline asm 2026-02-21T10:22:35.4892513Z @%p472 mbarrier.init.shared::cta.b64 [%r9365], 1; 2026-02-21T10:22:35.4892580Z // end inline asm 2026-02-21T10:22:35.4892638Z bar.sync 0; 2026-02-21T10:22:35.4892704Z add.s32 
%r9366, %r11026, 182280; 2026-02-21T10:22:35.4892768Z // begin inline asm 2026-02-21T10:22:35.4892871Z @%p472 mbarrier.init.shared::cta.b64 [%r9366], 1; 2026-02-21T10:22:35.4892929Z // end inline asm 2026-02-21T10:22:35.4892988Z bar.sync 0; 2026-02-21T10:22:35.4893055Z add.s32 %r9367, %r11026, 182288; 2026-02-21T10:22:35.4893116Z // begin inline asm 2026-02-21T10:22:35.4893208Z @%p472 mbarrier.init.shared::cta.b64 [%r9367], 1; 2026-02-21T10:22:35.4893265Z // end inline asm 2026-02-21T10:22:35.4893333Z add.s32 %r9368, %r11026, 182304; 2026-02-21T10:22:35.4893392Z // begin inline asm 2026-02-21T10:22:35.4893482Z @%p472 mbarrier.init.shared::cta.b64 [%r9368], 1; 2026-02-21T10:22:35.4893546Z // end inline asm 2026-02-21T10:22:35.4893603Z bar.sync 0; 2026-02-21T10:22:35.4893757Z add.s32 %r9369, %r11026, 182312; 2026-02-21T10:22:35.4893819Z // begin inline asm 2026-02-21T10:22:35.4893914Z @%p472 mbarrier.init.shared::cta.b64 [%r9369], 1; 2026-02-21T10:22:35.4893972Z // end inline asm 2026-02-21T10:22:35.4894109Z bar.sync 0; 2026-02-21T10:22:35.4894178Z add.s32 %r9370, %r11026, 182320; 2026-02-21T10:22:35.4894237Z // begin inline asm 2026-02-21T10:22:35.4894331Z @%p472 mbarrier.init.shared::cta.b64 [%r9370], 1; 2026-02-21T10:22:35.4894391Z // end inline asm 2026-02-21T10:22:35.4894453Z add.s32 %r9371, %r11026, 182336; 2026-02-21T10:22:35.4894515Z // begin inline asm 2026-02-21T10:22:35.4894604Z @%p472 mbarrier.init.shared::cta.b64 [%r9371], 1; 2026-02-21T10:22:35.4894665Z // end inline asm 2026-02-21T10:22:35.4894721Z bar.sync 0; 2026-02-21T10:22:35.4894784Z add.s32 %r9372, %r11026, 182344; 2026-02-21T10:22:35.4894847Z // begin inline asm 2026-02-21T10:22:35.4894940Z @%p472 mbarrier.init.shared::cta.b64 [%r9372], 1; 2026-02-21T10:22:35.4895003Z // end inline asm 2026-02-21T10:22:35.4895109Z bar.sync 0; 2026-02-21T10:22:35.4895178Z add.s32 %r9373, %r11026, 182352; 2026-02-21T10:22:35.4895239Z // begin inline asm 2026-02-21T10:22:35.4895326Z @%p472 
mbarrier.init.shared::cta.b64 [%r9373], 1; 2026-02-21T10:22:35.4895397Z // end inline asm 2026-02-21T10:22:35.4895458Z add.s32 %r9374, %r11026, 182368; 2026-02-21T10:22:35.4895518Z // begin inline asm 2026-02-21T10:22:35.4895694Z @%p472 mbarrier.init.shared::cta.b64 [%r9374], 1; 2026-02-21T10:22:35.4895755Z // end inline asm 2026-02-21T10:22:35.4895812Z bar.sync 0; 2026-02-21T10:22:35.4895874Z add.s32 %r9375, %r11026, 182376; 2026-02-21T10:22:35.4895939Z // begin inline asm 2026-02-21T10:22:35.4896032Z @%p472 mbarrier.init.shared::cta.b64 [%r9375], 1; 2026-02-21T10:22:35.4896100Z // end inline asm 2026-02-21T10:22:35.4896163Z bar.sync 0; 2026-02-21T10:22:35.4896226Z add.s32 %r9376, %r11026, 182384; 2026-02-21T10:22:35.4896287Z // begin inline asm 2026-02-21T10:22:35.4896384Z @%p472 mbarrier.init.shared::cta.b64 [%r9376], 1; 2026-02-21T10:22:35.4896446Z // end inline asm 2026-02-21T10:22:35.4896676Z setp.lt.s32 %p315, %r122, 1; 2026-02-21T10:22:35.4896745Z setp.gt.s32 %p316, %r122, 0; 2026-02-21T10:22:35.4896979Z .loc 1 32 35 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:32:35 2026-02-21T10:22:35.4897045Z shr.s32 %r9491, %r14314, 31; 2026-02-21T10:22:35.4897112Z shr.u32 %r9492, %r9491, 16; 2026-02-21T10:22:35.4897180Z add.s32 %r9493, %r14314, %r9492; 2026-02-21T10:22:35.4897242Z shr.s32 %r9494, %r9493, 16; 2026-02-21T10:22:35.4897457Z .loc 1 33 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:33:33 2026-02-21T10:22:35.4897521Z shl.b32 %r9495, %r9494, 6; 2026-02-21T10:22:35.4897729Z .loc 1 34 39 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:34:39 2026-02-21T10:22:35.4897791Z sub.s32 %r9496, 10, %r9495; 2026-02-21T10:22:35.4897991Z .loc 1 34 52 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:34:52 2026-02-21T10:22:35.4898059Z min.s32 %r9497, %r9496, 64; 2026-02-21T10:22:35.4898260Z .loc 1 35 45 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:35:45 2026-02-21T10:22:35.4898333Z and.b32 %r9498, %r9493, -65536; 
2026-02-21T10:22:35.4898405Z sub.s32 %r9499, %r14314, %r9498; 2026-02-21T10:22:35.4898611Z .loc 1 36 51 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:36:51 2026-02-21T10:22:35.4898676Z div.s32 %r9500, %r9499, %r9497; 2026-02-21T10:22:35.4898878Z .loc 1 35 64 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:35:64 2026-02-21T10:22:35.4898951Z mul.lo.s32 %r9501, %r9500, %r9497; 2026-02-21T10:22:35.4899016Z sub.s32 %r9502, %r9499, %r9501; 2026-02-21T10:22:35.4899220Z .loc 1 35 30 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:35:30 2026-02-21T10:22:35.4899388Z add.s32 %r9503, %r9502, %r9495; 2026-02-21T10:22:35.4899598Z .loc 1 37 27 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:37:27 2026-02-21T10:22:35.4899664Z shl.b32 %r14281, %r9503, 7; 2026-02-21T10:22:35.4899947Z .loc 1 39 27 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:39:27 2026-02-21T10:22:35.4900009Z shl.b32 %r14279, %r9500, 6; 2026-02-21T10:22:35.4900213Z .loc 1 40 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:40:32 2026-02-21T10:22:35.4900282Z or.b32 %r14315, %r14279, %r7; 2026-02-21T10:22:35.4900344Z or.b32 %r14316, %r14279, %r8; 2026-02-21T10:22:35.4900545Z .loc 1 54 53 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:53 2026-02-21T10:22:35.4900608Z shl.b32 %r9504, %r14315, 13; 2026-02-21T10:22:35.4900676Z shl.b32 %r9505, %r14316, 13; 2026-02-21T10:22:35.4900885Z .loc 1 54 60 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:60 2026-02-21T10:22:35.4901021Z or.b32 %r9506, %r9504, %r10; 2026-02-21T10:22:35.4901095Z or.b32 %r9507, %r9505, %r10; 2026-02-21T10:22:35.4901294Z .loc 1 54 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:32 2026-02-21T10:22:35.4901374Z mad.wide.s32 %rd392, %r9506, 2, %rd76; 2026-02-21T10:22:35.4901450Z mad.wide.s32 %rd393, %r9507, 2, %rd76; 2026-02-21T10:22:35.4901711Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 
2026-02-21T10:22:35.4901777Z shl.b32 %r9508, %r2, 3; 2026-02-21T10:22:35.4901842Z and.b32 %r9509, %r9508, 4088; 2026-02-21T10:22:35.4901908Z shr.u32 %r9510, %r2, 1; 2026-02-21T10:22:35.4901970Z and.b32 %r9511, %r9510, 56; 2026-02-21T10:22:35.4902036Z xor.b32 %r127, %r9509, %r9511; 2026-02-21T10:22:35.4902106Z add.s32 %r9512, %r11026, %r127; 2026-02-21T10:22:35.4902171Z add.s32 %r9377, %r9512, 34816; 2026-02-21T10:22:35.4902238Z selp.b32 %r9378, 8, 0, %p316; 2026-02-21T10:22:35.4902314Z // begin inline asm 2026-02-21T10:22:35.4902465Z cp.async.ca.shared.global [ %r9377 + 0 ], [ %rd392 + 0 ], 0x8, %r9378; 2026-02-21T10:22:35.4902526Z // end inline asm 2026-02-21T10:22:35.4902589Z add.s32 %r9379, %r9512, 38912; 2026-02-21T10:22:35.4902660Z // begin inline asm 2026-02-21T10:22:35.4902810Z cp.async.ca.shared.global [ %r9379 + 0 ], [ %rd393 + 0 ], 0x8, %r9378; 2026-02-21T10:22:35.4902874Z // end inline asm 2026-02-21T10:22:35.4902950Z cp.async.commit_group; 2026-02-21T10:22:35.4903170Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.4903240Z and.pred %p287, %p472, %p316; 2026-02-21T10:22:35.4903301Z // begin inline asm 2026-02-21T10:22:35.4903446Z @%p287 mbarrier.arrive.expect_tx.shared.b64 _, [%r9365], 4096; 2026-02-21T10:22:35.4903505Z // end inline asm 2026-02-21T10:22:35.4903713Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.4903780Z bar.sync 0; 2026-02-21T10:22:35.4903854Z elect.sync %r9513|%p317, -1; 2026-02-21T10:22:35.4903922Z and.pred %p318, %p316, %p317; 2026-02-21T10:22:35.4903996Z and.pred %p288, %p1, %p318; 2026-02-21T10:22:35.4904063Z add.s32 %r9382, %r11026, 133120; 2026-02-21T10:22:35.4904128Z // begin inline asm 2026-02-21T10:22:35.4904464Z @%p288 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r9382], [%rd394, {%r14281, %r236}], [%r9365]; 2026-02-21T10:22:35.4904530Z // end inline asm 2026-02-21T10:22:35.4904734Z .loc 1 
54 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:32 2026-02-21T10:22:35.4904811Z cvt.s64.s32 %rd428, %r9504; 2026-02-21T10:22:35.4904883Z cvt.u64.u32 %rd429, %r10; 2026-02-21T10:22:35.4904949Z or.b64 %rd430, %rd428, %rd429; 2026-02-21T10:22:35.4905015Z shl.b64 %rd431, %rd430, 1; 2026-02-21T10:22:35.4905086Z add.s64 %rd432, %rd76, %rd431; 2026-02-21T10:22:35.4905208Z add.s64 %rd395, %rd432, 128; 2026-02-21T10:22:35.4905275Z cvt.s64.s32 %rd433, %r9505; 2026-02-21T10:22:35.4905340Z or.b64 %rd434, %rd433, %rd429; 2026-02-21T10:22:35.4905419Z shl.b64 %rd435, %rd434, 1; 2026-02-21T10:22:35.4905484Z add.s64 %rd436, %rd76, %rd435; 2026-02-21T10:22:35.4905597Z add.s64 %rd396, %rd436, 128; 2026-02-21T10:22:35.4905812Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.4905876Z add.s32 %r9386, %r9512, 59392; 2026-02-21T10:22:35.4905937Z // begin inline asm 2026-02-21T10:22:35.4906084Z cp.async.ca.shared.global [ %r9386 + 0 ], [ %rd395 + 0 ], 0x8, %r9378; 2026-02-21T10:22:35.4906148Z // end inline asm 2026-02-21T10:22:35.4906211Z add.s32 %r9388, %r9512, 63488; 2026-02-21T10:22:35.4906270Z // begin inline asm 2026-02-21T10:22:35.4906421Z cp.async.ca.shared.global [ %r9388 + 0 ], [ %rd396 + 0 ], 0x8, %r9378; 2026-02-21T10:22:35.4906601Z // end inline asm 2026-02-21T10:22:35.4906677Z cp.async.commit_group; 2026-02-21T10:22:35.4906988Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.4907053Z // begin inline asm 2026-02-21T10:22:35.4907194Z @%p287 mbarrier.arrive.expect_tx.shared.b64 _, [%r9368], 4096; 2026-02-21T10:22:35.4907259Z // end inline asm 2026-02-21T10:22:35.4907536Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.4907597Z bar.sync 0; 2026-02-21T10:22:35.4907671Z elect.sync %r9514|%p319, -1; 2026-02-21T10:22:35.4907747Z and.pred %p320, %p316, %p319; 2026-02-21T10:22:35.4907815Z and.pred %p290, %p1, %p320; 
2026-02-21T10:22:35.4907883Z add.s32 %r9391, %r11026, 145408; 2026-02-21T10:22:35.4907950Z // begin inline asm 2026-02-21T10:22:35.4908292Z @%p290 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r9391], [%rd394, {%r14281, %r238}], [%r9368]; 2026-02-21T10:22:35.4908354Z // end inline asm 2026-02-21T10:22:35.4908659Z .loc 1 54 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:32 2026-02-21T10:22:35.4908732Z add.s64 %rd398, %rd432, 256; 2026-02-21T10:22:35.4908795Z add.s64 %rd399, %rd436, 256; 2026-02-21T10:22:35.4909002Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.4909072Z add.s32 %r9395, %r9512, 83968; 2026-02-21T10:22:35.4909135Z // begin inline asm 2026-02-21T10:22:35.4909283Z cp.async.ca.shared.global [ %r9395 + 0 ], [ %rd398 + 0 ], 0x8, %r9378; 2026-02-21T10:22:35.4909346Z // end inline asm 2026-02-21T10:22:35.4909409Z add.s32 %r9397, %r9512, 88064; 2026-02-21T10:22:35.4909471Z // begin inline asm 2026-02-21T10:22:35.4909617Z cp.async.ca.shared.global [ %r9397 + 0 ], [ %rd399 + 0 ], 0x8, %r9378; 2026-02-21T10:22:35.4909681Z // end inline asm 2026-02-21T10:22:35.4909750Z cp.async.commit_group; 2026-02-21T10:22:35.4909964Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.4910032Z // begin inline asm 2026-02-21T10:22:35.4910166Z @%p287 mbarrier.arrive.expect_tx.shared.b64 _, [%r9371], 4096; 2026-02-21T10:22:35.4910226Z // end inline asm 2026-02-21T10:22:35.4910453Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.4910516Z bar.sync 0; 2026-02-21T10:22:35.4910585Z elect.sync %r9515|%p321, -1; 2026-02-21T10:22:35.4910655Z and.pred %p322, %p316, %p321; 2026-02-21T10:22:35.4910730Z and.pred %p292, %p1, %p322; 2026-02-21T10:22:35.4910798Z add.s32 %r9400, %r11026, 157696; 2026-02-21T10:22:35.4910859Z mov.b32 %r9402, 64; 2026-02-21T10:22:35.4910925Z // begin inline asm 
2026-02-21T10:22:35.4911253Z @%p292 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r9400], [%rd394, {%r14281, %r9402}], [%r9371]; 2026-02-21T10:22:35.4911310Z // end inline asm 2026-02-21T10:22:35.4911521Z .loc 1 54 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:32 2026-02-21T10:22:35.4911665Z add.s64 %rd401, %rd432, 384; 2026-02-21T10:22:35.4911728Z add.s64 %rd402, %rd436, 384; 2026-02-21T10:22:35.4911927Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.4912060Z add.s32 %r9404, %r9512, 108544; 2026-02-21T10:22:35.4912121Z // begin inline asm 2026-02-21T10:22:35.4912265Z cp.async.ca.shared.global [ %r9404 + 0 ], [ %rd401 + 0 ], 0x8, %r9378; 2026-02-21T10:22:35.4912329Z // end inline asm 2026-02-21T10:22:35.4912391Z add.s32 %r9406, %r9512, 112640; 2026-02-21T10:22:35.4912462Z // begin inline asm 2026-02-21T10:22:35.4912610Z cp.async.ca.shared.global [ %r9406 + 0 ], [ %rd402 + 0 ], 0x8, %r9378; 2026-02-21T10:22:35.4912668Z // end inline asm 2026-02-21T10:22:35.4912735Z cp.async.commit_group; 2026-02-21T10:22:35.4912946Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.4913065Z // begin inline asm 2026-02-21T10:22:35.4913198Z @%p287 mbarrier.arrive.expect_tx.shared.b64 _, [%r9374], 4096; 2026-02-21T10:22:35.4913257Z // end inline asm 2026-02-21T10:22:35.4913462Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.4913521Z bar.sync 0; 2026-02-21T10:22:35.4913588Z elect.sync %r9516|%p323, -1; 2026-02-21T10:22:35.4913706Z and.pred %p324, %p316, %p323; 2026-02-21T10:22:35.4913779Z and.pred %p294, %p1, %p324; 2026-02-21T10:22:35.4913843Z add.s32 %r9409, %r11026, 169984; 2026-02-21T10:22:35.4913903Z mov.b32 %r9411, 96; 2026-02-21T10:22:35.4913966Z // begin inline asm 2026-02-21T10:22:35.4914294Z @%p294 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes 
[%r9409], [%rd394, {%r14281, %r9411}], [%r9374]; 2026-02-21T10:22:35.4914352Z // end inline asm 2026-02-21T10:22:35.4914560Z .loc 1 54 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:32 2026-02-21T10:22:35.4914626Z add.s64 %rd404, %rd432, 512; 2026-02-21T10:22:35.4914688Z add.s64 %rd405, %rd436, 512; 2026-02-21T10:22:35.4914895Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.4914961Z add.s32 %r9413, %r9512, 43008; 2026-02-21T10:22:35.4915021Z // begin inline asm 2026-02-21T10:22:35.4915158Z cp.async.ca.shared.global [ %r9413 + 0 ], [ %rd404 + 0 ], 0x8, %r9378; 2026-02-21T10:22:35.4915221Z // end inline asm 2026-02-21T10:22:35.4915282Z add.s32 %r9415, %r9512, 47104; 2026-02-21T10:22:35.4915343Z // begin inline asm 2026-02-21T10:22:35.4915480Z cp.async.ca.shared.global [ %r9415 + 0 ], [ %rd405 + 0 ], 0x8, %r9378; 2026-02-21T10:22:35.4915537Z // end inline asm 2026-02-21T10:22:35.4915604Z cp.async.commit_group; 2026-02-21T10:22:35.4915816Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.4915896Z // begin inline asm 2026-02-21T10:22:35.4916028Z @%p287 mbarrier.arrive.expect_tx.shared.b64 _, [%r9366], 4096; 2026-02-21T10:22:35.4916087Z // end inline asm 2026-02-21T10:22:35.4916290Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.4916349Z bar.sync 0; 2026-02-21T10:22:35.4916415Z elect.sync %r9517|%p325, -1; 2026-02-21T10:22:35.4916642Z and.pred %p326, %p316, %p325; 2026-02-21T10:22:35.4916713Z and.pred %p296, %p1, %p326; 2026-02-21T10:22:35.4916777Z add.s32 %r9418, %r11026, 137216; 2026-02-21T10:22:35.4916839Z // begin inline asm 2026-02-21T10:22:35.4917166Z @%p296 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r9418], [%rd394, {%r14281, %r237}], [%r9366]; 2026-02-21T10:22:35.4917227Z // end inline asm 2026-02-21T10:22:35.4917431Z .loc 1 54 32 // 
co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:32 2026-02-21T10:22:35.4917594Z add.s64 %rd407, %rd432, 640; 2026-02-21T10:22:35.4917660Z add.s64 %rd408, %rd436, 640; 2026-02-21T10:22:35.4917868Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.4917935Z add.s32 %r9422, %r9512, 67584; 2026-02-21T10:22:35.4918060Z // begin inline asm 2026-02-21T10:22:35.4918197Z cp.async.ca.shared.global [ %r9422 + 0 ], [ %rd407 + 0 ], 0x8, %r9378; 2026-02-21T10:22:35.4918257Z // end inline asm 2026-02-21T10:22:35.4918324Z add.s32 %r9424, %r9512, 71680; 2026-02-21T10:22:35.4918382Z // begin inline asm 2026-02-21T10:22:35.4918516Z cp.async.ca.shared.global [ %r9424 + 0 ], [ %rd408 + 0 ], 0x8, %r9378; 2026-02-21T10:22:35.4918578Z // end inline asm 2026-02-21T10:22:35.4918646Z cp.async.commit_group; 2026-02-21T10:22:35.4918855Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.4918920Z // begin inline asm 2026-02-21T10:22:35.4919131Z @%p287 mbarrier.arrive.expect_tx.shared.b64 _, [%r9369], 4096; 2026-02-21T10:22:35.4919194Z // end inline asm 2026-02-21T10:22:35.4919404Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.4919469Z bar.sync 0; 2026-02-21T10:22:35.4919543Z elect.sync %r9518|%p327, -1; 2026-02-21T10:22:35.4919609Z and.pred %p328, %p316, %p327; 2026-02-21T10:22:35.4919682Z and.pred %p298, %p1, %p328; 2026-02-21T10:22:35.4919804Z add.s32 %r9427, %r11026, 149504; 2026-02-21T10:22:35.4919865Z mov.b32 %r9429, 160; 2026-02-21T10:22:35.4919930Z // begin inline asm 2026-02-21T10:22:35.4920266Z @%p298 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r9427], [%rd394, {%r14281, %r9429}], [%r9369]; 2026-02-21T10:22:35.4920325Z // end inline asm 2026-02-21T10:22:35.4920533Z .loc 1 54 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:32 2026-02-21T10:22:35.4920602Z add.s64 %rd410, %rd432, 
768; 2026-02-21T10:22:35.4920667Z add.s64 %rd411, %rd436, 768; 2026-02-21T10:22:35.4920875Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.4920939Z add.s32 %r9431, %r9512, 92160; 2026-02-21T10:22:35.4921003Z // begin inline asm 2026-02-21T10:22:35.4921140Z cp.async.ca.shared.global [ %r9431 + 0 ], [ %rd410 + 0 ], 0x8, %r9378; 2026-02-21T10:22:35.4921200Z // end inline asm 2026-02-21T10:22:35.4921264Z add.s32 %r9433, %r9512, 96256; 2026-02-21T10:22:35.4921323Z // begin inline asm 2026-02-21T10:22:35.4921468Z cp.async.ca.shared.global [ %r9433 + 0 ], [ %rd411 + 0 ], 0x8, %r9378; 2026-02-21T10:22:35.4921533Z // end inline asm 2026-02-21T10:22:35.4921600Z cp.async.commit_group; 2026-02-21T10:22:35.4921817Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.4921883Z // begin inline asm 2026-02-21T10:22:35.4922011Z @%p287 mbarrier.arrive.expect_tx.shared.b64 _, [%r9372], 4096; 2026-02-21T10:22:35.4922074Z // end inline asm 2026-02-21T10:22:35.4922281Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.4922338Z bar.sync 0; 2026-02-21T10:22:35.4922412Z elect.sync %r9519|%p329, -1; 2026-02-21T10:22:35.4922485Z and.pred %p330, %p316, %p329; 2026-02-21T10:22:35.4922554Z and.pred %p300, %p1, %p330; 2026-02-21T10:22:35.4922618Z add.s32 %r9436, %r11026, 161792; 2026-02-21T10:22:35.4922678Z mov.b32 %r9438, 192; 2026-02-21T10:22:35.4922742Z // begin inline asm 2026-02-21T10:22:35.4923065Z @%p300 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r9436], [%rd394, {%r14281, %r9438}], [%r9372]; 2026-02-21T10:22:35.4923126Z // end inline asm 2026-02-21T10:22:35.4923330Z .loc 1 54 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:32 2026-02-21T10:22:35.4923393Z add.s64 %rd413, %rd432, 896; 2026-02-21T10:22:35.4923455Z add.s64 %rd414, %rd436, 896; 2026-02-21T10:22:35.4923721Z .loc 1 54 80 // 
co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.4923785Z add.s32 %r9440, %r9512, 116736; 2026-02-21T10:22:35.4923844Z // begin inline asm 2026-02-21T10:22:35.4924025Z cp.async.ca.shared.global [ %r9440 + 0 ], [ %rd413 + 0 ], 0x8, %r9378; 2026-02-21T10:22:35.4924087Z // end inline asm 2026-02-21T10:22:35.4924162Z add.s32 %r9442, %r9512, 120832; 2026-02-21T10:22:35.4924225Z // begin inline asm 2026-02-21T10:22:35.4924364Z cp.async.ca.shared.global [ %r9442 + 0 ], [ %rd414 + 0 ], 0x8, %r9378; 2026-02-21T10:22:35.4924423Z // end inline asm 2026-02-21T10:22:35.4924490Z cp.async.commit_group; 2026-02-21T10:22:35.4924710Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.4924769Z // begin inline asm 2026-02-21T10:22:35.4924899Z @%p287 mbarrier.arrive.expect_tx.shared.b64 _, [%r9375], 4096; 2026-02-21T10:22:35.4924962Z // end inline asm 2026-02-21T10:22:35.4925218Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.4925279Z bar.sync 0; 2026-02-21T10:22:35.4925350Z elect.sync %r9520|%p331, -1; 2026-02-21T10:22:35.4925427Z and.pred %p332, %p316, %p331; 2026-02-21T10:22:35.4925493Z and.pred %p302, %p1, %p332; 2026-02-21T10:22:35.4925559Z add.s32 %r9445, %r11026, 174080; 2026-02-21T10:22:35.4925668Z mov.b32 %r9447, 224; 2026-02-21T10:22:35.4925729Z // begin inline asm 2026-02-21T10:22:35.4926057Z @%p302 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r9445], [%rd394, {%r14281, %r9447}], [%r9375]; 2026-02-21T10:22:35.4926131Z // end inline asm 2026-02-21T10:22:35.4926351Z .loc 1 54 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:32 2026-02-21T10:22:35.4926418Z add.s64 %rd416, %rd432, 1024; 2026-02-21T10:22:35.4926612Z add.s64 %rd417, %rd436, 1024; 2026-02-21T10:22:35.4926835Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.4926902Z add.s32 %r9449, 
%r9512, 51200; 2026-02-21T10:22:35.4926963Z // begin inline asm 2026-02-21T10:22:35.4927117Z cp.async.ca.shared.global [ %r9449 + 0 ], [ %rd416 + 0 ], 0x8, %r9378; 2026-02-21T10:22:35.4927174Z // end inline asm 2026-02-21T10:22:35.4927239Z add.s32 %r9451, %r9512, 55296; 2026-02-21T10:22:35.4927301Z // begin inline asm 2026-02-21T10:22:35.4927441Z cp.async.ca.shared.global [ %r9451 + 0 ], [ %rd417 + 0 ], 0x8, %r9378; 2026-02-21T10:22:35.4927498Z // end inline asm 2026-02-21T10:22:35.4927565Z cp.async.commit_group; 2026-02-21T10:22:35.4927788Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.4927855Z // begin inline asm 2026-02-21T10:22:35.4927988Z @%p287 mbarrier.arrive.expect_tx.shared.b64 _, [%r9367], 4096; 2026-02-21T10:22:35.4928051Z // end inline asm 2026-02-21T10:22:35.4928257Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.4928313Z bar.sync 0; 2026-02-21T10:22:35.4928382Z elect.sync %r9521|%p333, -1; 2026-02-21T10:22:35.4928455Z and.pred %p334, %p316, %p333; 2026-02-21T10:22:35.4928523Z and.pred %p304, %p1, %p334; 2026-02-21T10:22:35.4928588Z add.s32 %r9454, %r11026, 141312; 2026-02-21T10:22:35.4928655Z mov.b32 %r14285, 256; 2026-02-21T10:22:35.4928715Z // begin inline asm 2026-02-21T10:22:35.4929047Z @%p304 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r9454], [%rd394, {%r14281, %r14285}], [%r9367]; 2026-02-21T10:22:35.4929110Z // end inline asm 2026-02-21T10:22:35.4929317Z .loc 1 54 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:32 2026-02-21T10:22:35.4929381Z add.s64 %rd419, %rd432, 1152; 2026-02-21T10:22:35.4929445Z add.s64 %rd420, %rd436, 1152; 2026-02-21T10:22:35.4929656Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.4929804Z add.s32 %r9458, %r9512, 75776; 2026-02-21T10:22:35.4929865Z // begin inline asm 2026-02-21T10:22:35.4930008Z 
cp.async.ca.shared.global [ %r9458 + 0 ], [ %rd419 + 0 ], 0x8, %r9378; 2026-02-21T10:22:35.4930130Z // end inline asm 2026-02-21T10:22:35.4930193Z add.s32 %r9460, %r9512, 79872; 2026-02-21T10:22:35.4930258Z // begin inline asm 2026-02-21T10:22:35.4930395Z cp.async.ca.shared.global [ %r9460 + 0 ], [ %rd420 + 0 ], 0x8, %r9378; 2026-02-21T10:22:35.4930454Z // end inline asm 2026-02-21T10:22:35.4930521Z cp.async.commit_group; 2026-02-21T10:22:35.4930739Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.4930797Z // begin inline asm 2026-02-21T10:22:35.4930933Z @%p287 mbarrier.arrive.expect_tx.shared.b64 _, [%r9370], 4096; 2026-02-21T10:22:35.4930990Z // end inline asm 2026-02-21T10:22:35.4931255Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.4931316Z bar.sync 0; 2026-02-21T10:22:35.4931385Z elect.sync %r9522|%p335, -1; 2026-02-21T10:22:35.4931452Z and.pred %p336, %p316, %p335; 2026-02-21T10:22:35.4931520Z and.pred %p306, %p1, %p336; 2026-02-21T10:22:35.4931588Z add.s32 %r9463, %r11026, 153600; 2026-02-21T10:22:35.4931646Z mov.b32 %r9465, 288; 2026-02-21T10:22:35.4931766Z // begin inline asm 2026-02-21T10:22:35.4932096Z @%p306 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r9463], [%rd394, {%r14281, %r9465}], [%r9370]; 2026-02-21T10:22:35.4932154Z // end inline asm 2026-02-21T10:22:35.4932353Z .loc 1 54 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:32 2026-02-21T10:22:35.4932418Z add.s64 %rd422, %rd432, 1280; 2026-02-21T10:22:35.4932480Z add.s64 %rd423, %rd436, 1280; 2026-02-21T10:22:35.4932680Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.4932750Z add.s32 %r9467, %r9512, 100352; 2026-02-21T10:22:35.4932810Z // begin inline asm 2026-02-21T10:22:35.4932945Z cp.async.ca.shared.global [ %r9467 + 0 ], [ %rd422 + 0 ], 0x8, %r9378; 2026-02-21T10:22:35.4933004Z // end inline 
asm 2026-02-21T10:22:35.4933070Z add.s32 %r9469, %r9512, 104448; 2026-02-21T10:22:35.4933129Z // begin inline asm 2026-02-21T10:22:35.4933266Z cp.async.ca.shared.global [ %r9469 + 0 ], [ %rd423 + 0 ], 0x8, %r9378; 2026-02-21T10:22:35.4933328Z // end inline asm 2026-02-21T10:22:35.4933405Z cp.async.commit_group; 2026-02-21T10:22:35.4933614Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.4933673Z // begin inline asm 2026-02-21T10:22:35.4933801Z @%p287 mbarrier.arrive.expect_tx.shared.b64 _, [%r9373], 4096; 2026-02-21T10:22:35.4933859Z // end inline asm 2026-02-21T10:22:35.4934060Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.4934122Z bar.sync 0; 2026-02-21T10:22:35.4934189Z elect.sync %r9523|%p337, -1; 2026-02-21T10:22:35.4934253Z and.pred %p338, %p316, %p337; 2026-02-21T10:22:35.4934328Z and.pred %p308, %p1, %p338; 2026-02-21T10:22:35.4934390Z add.s32 %r9472, %r11026, 165888; 2026-02-21T10:22:35.4934447Z mov.b32 %r9474, 320; 2026-02-21T10:22:35.4934507Z // begin inline asm 2026-02-21T10:22:35.4934834Z @%p308 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r9472], [%rd394, {%r14281, %r9474}], [%r9373]; 2026-02-21T10:22:35.4934890Z // end inline asm 2026-02-21T10:22:35.4935090Z .loc 1 54 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:32 2026-02-21T10:22:35.4935167Z add.s64 %rd425, %rd432, 1408; 2026-02-21T10:22:35.4935231Z add.s64 %rd426, %rd436, 1408; 2026-02-21T10:22:35.4935433Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.4935563Z add.s32 %r9476, %r9512, 124928; 2026-02-21T10:22:35.4935624Z // begin inline asm 2026-02-21T10:22:35.4935760Z cp.async.ca.shared.global [ %r9476 + 0 ], [ %rd425 + 0 ], 0x8, %r9378; 2026-02-21T10:22:35.4935863Z // end inline asm 2026-02-21T10:22:35.4935928Z add.s32 %r9478, %r9512, 129024; 2026-02-21T10:22:35.4935986Z // begin inline asm 
2026-02-21T10:22:35.4936120Z cp.async.ca.shared.global [ %r9478 + 0 ], [ %rd426 + 0 ], 0x8, %r9378; 2026-02-21T10:22:35.4936184Z // end inline asm 2026-02-21T10:22:35.4936249Z cp.async.commit_group; 2026-02-21T10:22:35.4936578Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.4936646Z // begin inline asm 2026-02-21T10:22:35.4936773Z @%p287 mbarrier.arrive.expect_tx.shared.b64 _, [%r9376], 4096; 2026-02-21T10:22:35.4936830Z // end inline asm 2026-02-21T10:22:35.4937028Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.4937173Z bar.sync 0; 2026-02-21T10:22:35.4937254Z elect.sync %r9524|%p339, -1; 2026-02-21T10:22:35.4937323Z and.pred %p340, %p316, %p339; 2026-02-21T10:22:35.4937394Z and.pred %p310, %p1, %p340; 2026-02-21T10:22:35.4937461Z add.s32 %r9481, %r11026, 178176; 2026-02-21T10:22:35.4937519Z mov.b32 %r9483, 352; 2026-02-21T10:22:35.4937577Z // begin inline asm 2026-02-21T10:22:35.4937960Z @%p310 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r9481], [%rd394, {%r14281, %r9483}], [%r9376]; 2026-02-21T10:22:35.4938019Z // end inline asm 2026-02-21T10:22:35.4938226Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.4938290Z @%p315 bra $L__BB0_14; 2026-02-21T10:22:35.4938379Z // %bb.8: // %.lr.ph232 2026-02-21T10:22:35.4938442Z shl.b32 %r9531, %r122, 5; 2026-02-21T10:22:35.4938516Z add.s32 %r128, %r9531, -3; 2026-02-21T10:22:35.4938586Z shl.b32 %r9533, %r14236, 6; 2026-02-21T10:22:35.4938649Z and.b32 %r9535, %r14237, 896; 2026-02-21T10:22:35.4938710Z and.b32 %r9537, %r14238, 62; 2026-02-21T10:22:35.4938777Z or.b32 %r9538, %r9533, %r9535; 2026-02-21T10:22:35.4938839Z or.b32 %r129, %r9538, %r9537; 2026-02-21T10:22:35.4938902Z xor.b32 %r130, %r129, 8; 2026-02-21T10:22:35.4938968Z xor.b32 %r131, %r129, 16; 2026-02-21T10:22:35.4939029Z xor.b32 %r132, %r129, 24; 2026-02-21T10:22:35.4939089Z 
xor.b32 %r133, %r129, 32; 2026-02-21T10:22:35.4939149Z xor.b32 %r134, %r129, 40; 2026-02-21T10:22:35.4939208Z xor.b32 %r135, %r129, 48; 2026-02-21T10:22:35.4939266Z xor.b32 %r136, %r129, 56; 2026-02-21T10:22:35.4939326Z and.b32 %r9541, %r14240, 144; 2026-02-21T10:22:35.4939392Z xor.b32 %r137, %r9541, %r14239; 2026-02-21T10:22:35.4939453Z shl.b32 %r9542, %r14239, 7; 2026-02-21T10:22:35.4939511Z shr.u32 %r9545, %r14245, 5; 2026-02-21T10:22:35.4939575Z or.b32 %r9546, %r9542, %r9545; 2026-02-21T10:22:35.4939638Z or.b32 %r9547, %r9546, %r14241; 2026-02-21T10:22:35.4939702Z add.s32 %r138, %r11026, %r9547; 2026-02-21T10:22:35.4939761Z xor.b32 %r9549, %r9547, 16; 2026-02-21T10:22:35.4939825Z add.s32 %r139, %r11026, %r9549; 2026-02-21T10:22:35.4939888Z xor.b32 %r9550, %r9547, 32; 2026-02-21T10:22:35.4939947Z add.s32 %r140, %r11026, %r9550; 2026-02-21T10:22:35.4940010Z xor.b32 %r9551, %r9547, 48; 2026-02-21T10:22:35.4940073Z add.s32 %r141, %r11026, %r9551; 2026-02-21T10:22:35.4940133Z xor.b32 %r9552, %r9547, 64; 2026-02-21T10:22:35.4940194Z add.s32 %r142, %r11026, %r9552; 2026-02-21T10:22:35.4940258Z xor.b32 %r9553, %r9547, 80; 2026-02-21T10:22:35.4940319Z add.s32 %r143, %r11026, %r9553; 2026-02-21T10:22:35.4940379Z xor.b32 %r9554, %r9547, 96; 2026-02-21T10:22:35.4940445Z add.s32 %r144, %r11026, %r9554; 2026-02-21T10:22:35.4940504Z xor.b32 %r9555, %r9547, 112; 2026-02-21T10:22:35.4940567Z add.s32 %r145, %r11026, %r9555; 2026-02-21T10:22:35.4940629Z and.b32 %r9557, %r14242, 120; 2026-02-21T10:22:35.4940769Z or.b32 %r9558, %r9557, %r4; 2026-02-21T10:22:35.4940830Z shl.b32 %r9559, %r9558, 4; 2026-02-21T10:22:35.4940902Z add.s32 %r9560, %r11026, 32768; 2026-02-21T10:22:35.4940971Z add.s32 %r13608, %r9560, %r9559; 2026-02-21T10:22:35.4941111Z shl.b32 %r9562, %r14244, 6; 2026-02-21T10:22:35.4941172Z shl.b32 %r9563, %r14236, 2; 2026-02-21T10:22:35.4941235Z add.s32 %r9564, %r9560, %r9562; 2026-02-21T10:22:35.4941296Z add.s32 %r9565, %r9564, %r9563; 2026-02-21T10:22:35.4941358Z 
add.s32 %r9667, %r9565, %r14241; 2026-02-21T10:22:35.4941419Z bfe.u32 %r9566, %r11026, 4, 14; 2026-02-21T10:22:35.4941497Z cvt.u64.u32 %rd437, %r9566; 2026-02-21T10:22:35.4941577Z or.b64 %rd469, %rd437, 4611686293372403712; 2026-02-21T10:22:35.4941638Z add.s32 %r9567, %r11026, 32; 2026-02-21T10:22:35.4941702Z bfe.u32 %r9568, %r9567, 4, 14; 2026-02-21T10:22:35.4941763Z cvt.u64.u32 %rd438, %r9568; 2026-02-21T10:22:35.4941837Z or.b64 %rd470, %rd438, 4611686293372403712; 2026-02-21T10:22:35.4941901Z add.s32 %r9569, %r11026, 64; 2026-02-21T10:22:35.4942012Z bfe.u32 %r9570, %r9569, 4, 14; 2026-02-21T10:22:35.4942075Z cvt.u64.u32 %rd439, %r9570; 2026-02-21T10:22:35.4942146Z or.b64 %rd471, %rd439, 4611686293372403712; 2026-02-21T10:22:35.4942213Z add.s32 %r9571, %r11026, 96; 2026-02-21T10:22:35.4942273Z bfe.u32 %r9572, %r9571, 4, 14; 2026-02-21T10:22:35.4942335Z cvt.u64.u32 %rd440, %r9572; 2026-02-21T10:22:35.4942455Z or.b64 %rd472, %rd440, 4611686293372403712; 2026-02-21T10:22:35.4942520Z add.s32 %r9573, %r11026, 16384; 2026-02-21T10:22:35.4942580Z bfe.u32 %r9574, %r9573, 4, 14; 2026-02-21T10:22:35.4942643Z cvt.u64.u32 %rd441, %r9574; 2026-02-21T10:22:35.4942718Z or.b64 %rd473, %rd441, 4611686293372403712; 2026-02-21T10:22:35.4942780Z add.s32 %r9575, %r11026, 16416; 2026-02-21T10:22:35.4942840Z bfe.u32 %r9576, %r9575, 4, 14; 2026-02-21T10:22:35.4942901Z cvt.u64.u32 %rd442, %r9576; 2026-02-21T10:22:35.4942969Z or.b64 %rd474, %rd442, 4611686293372403712; 2026-02-21T10:22:35.4943035Z add.s32 %r9577, %r11026, 16448; 2026-02-21T10:22:35.4943094Z bfe.u32 %r9578, %r9577, 4, 14; 2026-02-21T10:22:35.4943157Z cvt.u64.u32 %rd443, %r9578; 2026-02-21T10:22:35.4943228Z or.b64 %rd475, %rd443, 4611686293372403712; 2026-02-21T10:22:35.4943291Z add.s32 %r9579, %r11026, 16480; 2026-02-21T10:22:35.4943357Z bfe.u32 %r9580, %r9579, 4, 14; 2026-02-21T10:22:35.4943416Z cvt.u64.u32 %rd444, %r9580; 2026-02-21T10:22:35.4943490Z or.b64 %rd476, %rd444, 4611686293372403712; 
2026-02-21T10:22:35.4943555Z add.s32 %r9581, %r11026, 4096; 2026-02-21T10:22:35.4943615Z bfe.u32 %r9582, %r9581, 4, 14; 2026-02-21T10:22:35.4943675Z cvt.u64.u32 %rd445, %r9582; 2026-02-21T10:22:35.4943745Z or.b64 %rd477, %rd445, 4611686293372403712; 2026-02-21T10:22:35.4943810Z add.s32 %r9583, %r11026, 4128; 2026-02-21T10:22:35.4943870Z bfe.u32 %r9584, %r9583, 4, 14; 2026-02-21T10:22:35.4943931Z cvt.u64.u32 %rd446, %r9584; 2026-02-21T10:22:35.4944015Z or.b64 %rd478, %rd446, 4611686293372403712; 2026-02-21T10:22:35.4944080Z add.s32 %r9585, %r11026, 4160; 2026-02-21T10:22:35.4944142Z bfe.u32 %r9586, %r9585, 4, 14; 2026-02-21T10:22:35.4944203Z cvt.u64.u32 %rd447, %r9586; 2026-02-21T10:22:35.4944279Z or.b64 %rd479, %rd447, 4611686293372403712; 2026-02-21T10:22:35.4944343Z add.s32 %r9587, %r11026, 4192; 2026-02-21T10:22:35.4944402Z bfe.u32 %r9588, %r9587, 4, 14; 2026-02-21T10:22:35.4944467Z cvt.u64.u32 %rd448, %r9588; 2026-02-21T10:22:35.4944540Z or.b64 %rd480, %rd448, 4611686293372403712; 2026-02-21T10:22:35.4944599Z add.s32 %r9589, %r11026, 20480; 2026-02-21T10:22:35.4944659Z bfe.u32 %r9590, %r9589, 4, 14; 2026-02-21T10:22:35.4944722Z cvt.u64.u32 %rd449, %r9590; 2026-02-21T10:22:35.4944791Z or.b64 %rd481, %rd449, 4611686293372403712; 2026-02-21T10:22:35.4944851Z add.s32 %r9591, %r11026, 20512; 2026-02-21T10:22:35.4944915Z bfe.u32 %r9592, %r9591, 4, 14; 2026-02-21T10:22:35.4944976Z cvt.u64.u32 %rd450, %r9592; 2026-02-21T10:22:35.4945045Z or.b64 %rd482, %rd450, 4611686293372403712; 2026-02-21T10:22:35.4945180Z add.s32 %r9593, %r11026, 20544; 2026-02-21T10:22:35.4945242Z bfe.u32 %r9594, %r9593, 4, 14; 2026-02-21T10:22:35.4945304Z cvt.u64.u32 %rd451, %r9594; 2026-02-21T10:22:35.4945377Z or.b64 %rd483, %rd451, 4611686293372403712; 2026-02-21T10:22:35.4945494Z add.s32 %r9595, %r11026, 20576; 2026-02-21T10:22:35.4945555Z bfe.u32 %r9596, %r9595, 4, 14; 2026-02-21T10:22:35.4945616Z cvt.u64.u32 %rd452, %r9596; 2026-02-21T10:22:35.4945692Z or.b64 %rd484, %rd452, 
4611686293372403712; 2026-02-21T10:22:35.4945753Z add.s32 %r9597, %r11026, 8192; 2026-02-21T10:22:35.4945813Z bfe.u32 %r9598, %r9597, 4, 14; 2026-02-21T10:22:35.4945874Z cvt.u64.u32 %rd453, %r9598; 2026-02-21T10:22:35.4945946Z or.b64 %rd485, %rd453, 4611686293372403712; 2026-02-21T10:22:35.4946007Z add.s32 %r9599, %r11026, 8224; 2026-02-21T10:22:35.4946067Z bfe.u32 %r9600, %r9599, 4, 14; 2026-02-21T10:22:35.4946129Z cvt.u64.u32 %rd454, %r9600; 2026-02-21T10:22:35.4946200Z or.b64 %rd486, %rd454, 4611686293372403712; 2026-02-21T10:22:35.4946312Z add.s32 %r9601, %r11026, 8256; 2026-02-21T10:22:35.4946376Z bfe.u32 %r9602, %r9601, 4, 14; 2026-02-21T10:22:35.4946437Z cvt.u64.u32 %rd455, %r9602; 2026-02-21T10:22:35.4946634Z or.b64 %rd487, %rd455, 4611686293372403712; 2026-02-21T10:22:35.4946703Z add.s32 %r9603, %r11026, 8288; 2026-02-21T10:22:35.4946768Z bfe.u32 %r9604, %r9603, 4, 14; 2026-02-21T10:22:35.4946829Z cvt.u64.u32 %rd456, %r9604; 2026-02-21T10:22:35.4946981Z or.b64 %rd488, %rd456, 4611686293372403712; 2026-02-21T10:22:35.4947051Z add.s32 %r9605, %r11026, 24576; 2026-02-21T10:22:35.4947112Z bfe.u32 %r9606, %r9605, 4, 14; 2026-02-21T10:22:35.4947174Z cvt.u64.u32 %rd457, %r9606; 2026-02-21T10:22:35.4947245Z or.b64 %rd489, %rd457, 4611686293372403712; 2026-02-21T10:22:35.4947308Z add.s32 %r9607, %r11026, 24608; 2026-02-21T10:22:35.4947369Z bfe.u32 %r9608, %r9607, 4, 14; 2026-02-21T10:22:35.4947429Z cvt.u64.u32 %rd458, %r9608; 2026-02-21T10:22:35.4947502Z or.b64 %rd490, %rd458, 4611686293372403712; 2026-02-21T10:22:35.4947567Z add.s32 %r9609, %r11026, 24640; 2026-02-21T10:22:35.4947628Z bfe.u32 %r9610, %r9609, 4, 14; 2026-02-21T10:22:35.4947693Z cvt.u64.u32 %rd459, %r9610; 2026-02-21T10:22:35.4947764Z or.b64 %rd491, %rd459, 4611686293372403712; 2026-02-21T10:22:35.4947828Z add.s32 %r9611, %r11026, 24672; 2026-02-21T10:22:35.4947889Z bfe.u32 %r9612, %r9611, 4, 14; 2026-02-21T10:22:35.4947956Z cvt.u64.u32 %rd460, %r9612; 2026-02-21T10:22:35.4948025Z or.b64 %rd492, 
%rd460, 4611686293372403712; 2026-02-21T10:22:35.4948086Z add.s32 %r9613, %r11026, 12288; 2026-02-21T10:22:35.4948148Z bfe.u32 %r9614, %r9613, 4, 14; 2026-02-21T10:22:35.4948209Z cvt.u64.u32 %rd461, %r9614; 2026-02-21T10:22:35.4948278Z or.b64 %rd493, %rd461, 4611686293372403712; 2026-02-21T10:22:35.4948339Z add.s32 %r9615, %r11026, 12320; 2026-02-21T10:22:35.4948402Z bfe.u32 %r9616, %r9615, 4, 14; 2026-02-21T10:22:35.4948556Z cvt.u64.u32 %rd462, %r9616; 2026-02-21T10:22:35.4948631Z or.b64 %rd494, %rd462, 4611686293372403712; 2026-02-21T10:22:35.4948699Z add.s32 %r9617, %r11026, 12352; 2026-02-21T10:22:35.4948760Z bfe.u32 %r9618, %r9617, 4, 14; 2026-02-21T10:22:35.4948820Z cvt.u64.u32 %rd463, %r9618; 2026-02-21T10:22:35.4948890Z or.b64 %rd495, %rd463, 4611686293372403712; 2026-02-21T10:22:35.4948957Z add.s32 %r9619, %r11026, 12384; 2026-02-21T10:22:35.4949016Z bfe.u32 %r9620, %r9619, 4, 14; 2026-02-21T10:22:35.4949078Z cvt.u64.u32 %rd464, %r9620; 2026-02-21T10:22:35.4949150Z or.b64 %rd496, %rd464, 4611686293372403712; 2026-02-21T10:22:35.4949212Z add.s32 %r9621, %r11026, 28672; 2026-02-21T10:22:35.4949270Z bfe.u32 %r9622, %r9621, 4, 14; 2026-02-21T10:22:35.4949337Z cvt.u64.u32 %rd465, %r9622; 2026-02-21T10:22:35.4949410Z or.b64 %rd497, %rd465, 4611686293372403712; 2026-02-21T10:22:35.4949478Z add.s32 %r9623, %r11026, 28704; 2026-02-21T10:22:35.4949536Z bfe.u32 %r9624, %r9623, 4, 14; 2026-02-21T10:22:35.4949611Z cvt.u64.u32 %rd466, %r9624; 2026-02-21T10:22:35.4949767Z or.b64 %rd498, %rd466, 4611686293372403712; 2026-02-21T10:22:35.4949829Z add.s32 %r9625, %r11026, 28736; 2026-02-21T10:22:35.4949894Z bfe.u32 %r9626, %r9625, 4, 14; 2026-02-21T10:22:35.4949954Z cvt.u64.u32 %rd467, %r9626; 2026-02-21T10:22:35.4950088Z or.b64 %rd499, %rd467, 4611686293372403712; 2026-02-21T10:22:35.4950152Z add.s32 %r9627, %r11026, 28768; 2026-02-21T10:22:35.4950215Z bfe.u32 %r9628, %r9627, 4, 14; 2026-02-21T10:22:35.4950276Z cvt.u64.u32 %rd468, %r9628; 2026-02-21T10:22:35.4950358Z or.b64 
%rd500, %rd468, 4611686293372403712; 2026-02-21T10:22:35.4950423Z shl.b32 %r9630, %r14243, 12; 2026-02-21T10:22:35.4950483Z and.b32 %r9631, %r14237, 3168; 2026-02-21T10:22:35.4950543Z shl.b32 %r9632, %r14244, 4; 2026-02-21T10:22:35.4950603Z shr.u32 %r9633, %r14245, 2; 2026-02-21T10:22:35.4950665Z and.b32 %r9635, %r250, 16; 2026-02-21T10:22:35.4950726Z or.b32 %r9636, %r9631, %r9632; 2026-02-21T10:22:35.4950787Z xor.b32 %r9637, %r9636, %r9633; 2026-02-21T10:22:35.4950855Z add.s32 %r9638, %r11026, %r9630; 2026-02-21T10:22:35.4950980Z add.s32 %r9639, %r9638, %r9635; 2026-02-21T10:22:35.4951046Z add.s32 %r148, %r9639, %r9637; 2026-02-21T10:22:35.4951107Z shl.b32 %r9640, %r14244, 9; 2026-02-21T10:22:35.4951167Z shl.b32 %r9641, %r14243, 5; 2026-02-21T10:22:35.4951230Z and.b32 %r9642, %r250, 2032; 2026-02-21T10:22:35.4951290Z or.b32 %r9643, %r9640, %r9641; 2026-02-21T10:22:35.4951353Z xor.b32 %r9644, %r9643, %r9642; 2026-02-21T10:22:35.4951474Z add.s32 %r14195, %r11026, %r9644; 2026-02-21T10:22:35.4951539Z add.s32 %r14200, %r14195, 2048; 2026-02-21T10:22:35.4951605Z shl.b32 %r9645, %r120, 5; 2026-02-21T10:22:35.4951664Z shl.b32 %r9646, %r121, 5; 2026-02-21T10:22:35.4951724Z add.s32 %r153, %r9645, %r9646; 2026-02-21T10:22:35.4951784Z mov.b32 %r13675, 0f00000000; 2026-02-21T10:22:35.4951844Z mov.b32 %r14288, 2; 2026-02-21T10:22:35.4951904Z mov.b32 %r14287, -1; 2026-02-21T10:22:35.4951961Z mov.b32 %r14284, 0; 2026-02-21T10:22:35.4952019Z mov.b32 %r14283, 1; 2026-02-21T10:22:35.4952083Z mov.b32 %r14280, %r14279; 2026-02-21T10:22:35.4952142Z mov.b32 %r14282, %r14281; 2026-02-21T10:22:35.4952202Z mov.b32 %r14286, %r14284; 2026-02-21T10:22:35.4952273Z mov.b32 %r14289, %r14279; 2026-02-21T10:22:35.4952337Z mov.b32 %r14290, %r14281; 2026-02-21T10:22:35.4952396Z mov.b32 %r13676, %r13675; 2026-02-21T10:22:35.4952456Z mov.b32 %r13677, %r13675; 2026-02-21T10:22:35.4952515Z mov.b32 %r13678, %r13675; 2026-02-21T10:22:35.4952577Z mov.b32 %r13679, %r13675; 2026-02-21T10:22:35.4952638Z 
mov.b32 %r13680, %r13675; 2026-02-21T10:22:35.4952696Z mov.b32 %r13681, %r13675; 2026-02-21T10:22:35.4952754Z mov.b32 %r13682, %r13675; 2026-02-21T10:22:35.4952812Z mov.b32 %r13683, %r13675; 2026-02-21T10:22:35.4952872Z mov.b32 %r13684, %r13675; 2026-02-21T10:22:35.4952933Z mov.b32 %r13685, %r13675; 2026-02-21T10:22:35.4952991Z mov.b32 %r13686, %r13675; 2026-02-21T10:22:35.4953052Z mov.b32 %r13687, %r13675; 2026-02-21T10:22:35.4953109Z mov.b32 %r13688, %r13675; 2026-02-21T10:22:35.4953172Z mov.b32 %r13689, %r13675; 2026-02-21T10:22:35.4953230Z mov.b32 %r13690, %r13675; 2026-02-21T10:22:35.4953291Z mov.b32 %r14308, %r14288; 2026-02-21T10:22:35.4953349Z mov.b32 %r14309, %r14284; 2026-02-21T10:22:35.4953410Z mov.b32 %r14312, %r14290; 2026-02-21T10:22:35.4953470Z mov.b32 %r14313, %r14289; 2026-02-21T10:22:35.4953527Z bra.uni $L__BB0_9; 2026-02-21T10:22:35.4953648Z $L__BB0_13: // in Loop: Header=BB0_9 Depth=1 2026-02-21T10:22:35.4953863Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.4953927Z add.s32 %r14309, %r14309, 1; 2026-02-21T10:22:35.4953994Z setp.ne.b32 %p471, %r153, %r14309; 2026-02-21T10:22:35.4954055Z mov.b32 %r14279, %r14289; 2026-02-21T10:22:35.4954117Z mov.b32 %r14280, %r154; 2026-02-21T10:22:35.4954176Z mov.b32 %r14281, %r14290; 2026-02-21T10:22:35.4954235Z mov.b32 %r14282, %r156; 2026-02-21T10:22:35.4954293Z mov.b32 %r14283, %r14308; 2026-02-21T10:22:35.4954430Z mov.b32 %r14284, %r158; 2026-02-21T10:22:35.4954493Z mov.b32 %r14289, %r14313; 2026-02-21T10:22:35.4954551Z mov.b32 %r14290, %r14312; 2026-02-21T10:22:35.4954611Z mov.b32 %r14308, %r187; 2026-02-21T10:22:35.4954717Z @%p471 bra $L__BB0_9; 2026-02-21T10:22:35.4954776Z bra.uni $L__BB0_14; 2026-02-21T10:22:35.4954897Z $L__BB0_9: // =>This Inner Loop Header: Depth=1 2026-02-21T10:22:35.4955110Z .loc 1 0 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:0:107 2026-02-21T10:22:35.4955170Z mov.b32 %r158, %r14283; 
2026-02-21T10:22:35.4955229Z mov.b32 %r156, %r14281; 2026-02-21T10:22:35.4955289Z mov.b32 %r154, %r14279; 2026-02-21T10:22:35.4955495Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.4955555Z add.s32 %r9647, %r14308, 1; 2026-02-21T10:22:35.4955624Z setp.eq.b32 %p341, %r14308, 31; 2026-02-21T10:22:35.4955704Z selp.b32 %r187, 0, %r9647, %p341; 2026-02-21T10:22:35.4955822Z setp.ne.b32 %p342, %r187, 0; 2026-02-21T10:22:35.4955890Z @%p342 bra $L__BB0_11; 2026-02-21T10:22:35.4956003Z // %bb.10: // in Loop: Header=BB0_9 Depth=1 2026-02-21T10:22:35.4956070Z add.s32 %r14314, %r14314, 132; 2026-02-21T10:22:35.4956272Z .loc 1 32 35 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:32:35 2026-02-21T10:22:35.4956381Z shr.s32 %r9648, %r14314, 31; 2026-02-21T10:22:35.4956442Z shr.u32 %r9649, %r9648, 16; 2026-02-21T10:22:35.4956632Z add.s32 %r9650, %r14314, %r9649; 2026-02-21T10:22:35.4956694Z shr.s32 %r9651, %r9650, 16; 2026-02-21T10:22:35.4956896Z .loc 1 33 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:33:33 2026-02-21T10:22:35.4956956Z shl.b32 %r9652, %r9651, 6; 2026-02-21T10:22:35.4957158Z .loc 1 34 39 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:34:39 2026-02-21T10:22:35.4957224Z sub.s32 %r9653, 10, %r9652; 2026-02-21T10:22:35.4957423Z .loc 1 34 52 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:34:52 2026-02-21T10:22:35.4957482Z min.s32 %r9654, %r9653, 64; 2026-02-21T10:22:35.4957684Z .loc 1 35 45 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:35:45 2026-02-21T10:22:35.4957749Z and.b32 %r9655, %r9650, -65536; 2026-02-21T10:22:35.4957815Z sub.s32 %r9656, %r14314, %r9655; 2026-02-21T10:22:35.4958016Z .loc 1 36 51 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:36:51 2026-02-21T10:22:35.4958079Z div.s32 %r9657, %r9656, %r9654; 2026-02-21T10:22:35.4958287Z .loc 1 35 64 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:35:64 
2026-02-21T10:22:35.4958357Z mul.lo.s32 %r9658, %r9657, %r9654; 2026-02-21T10:22:35.4958418Z sub.s32 %r9659, %r9656, %r9658; 2026-02-21T10:22:35.4958619Z .loc 1 35 30 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:35:30 2026-02-21T10:22:35.4958686Z add.s32 %r9660, %r9659, %r9652; 2026-02-21T10:22:35.4958883Z .loc 1 37 27 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:37:27 2026-02-21T10:22:35.4958945Z shl.b32 %r14312, %r9660, 7; 2026-02-21T10:22:35.4959141Z .loc 1 39 27 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:39:27 2026-02-21T10:22:35.4959206Z shl.b32 %r14313, %r9657, 6; 2026-02-21T10:22:35.4959404Z .loc 1 40 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:40:32 2026-02-21T10:22:35.4959466Z or.b32 %r14315, %r14313, %r7; 2026-02-21T10:22:35.4959529Z or.b32 %r14316, %r14313, %r8; 2026-02-21T10:22:35.4959647Z $L__BB0_11: // in Loop: Header=BB0_9 Depth=1 2026-02-21T10:22:35.4959859Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.4960029Z setp.eq.b32 %p455, %r187, 0; 2026-02-21T10:22:35.4960100Z setp.lt.s32 %p456, %r14309, %r128; 2026-02-21T10:22:35.4960161Z add.s32 %r14001, %r14287, 1; 2026-02-21T10:22:35.4960225Z setp.gt.s32 %p460, %r14001, 2; 2026-02-21T10:22:35.4960357Z selp.b32 %r14287, 0, %r14001, %p460; 2026-02-21T10:22:35.4960421Z selp.b32 %r14002, 1, 0, %p460; 2026-02-21T10:22:35.4960482Z xor.b32 %r14286, %r14286, %r14002; 2026-02-21T10:22:35.4960685Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.4960753Z cp.async.wait_group 8; 2026-02-21T10:22:35.4960809Z bar.sync 0; 2026-02-21T10:22:35.4960871Z shl.b32 %r14003, %r14287, 12; 2026-02-21T10:22:35.4960940Z shl.b32 %r14004, %r14287, 13; 2026-02-21T10:22:35.4961005Z add.s32 %r14005, %r11026, 34816; 2026-02-21T10:22:35.4961067Z add.s32 %r14006, %r14005, %r14004; 2026-02-21T10:22:35.4961269Z .loc 1 58 32 // 
co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:58:32 2026-02-21T10:22:35.4961401Z add.s32 %r14007, %r14006, %r129; 2026-02-21T10:22:35.4961471Z ld.shared.b16 %rs897, [%r14007]; 2026-02-21T10:22:35.4961548Z ld.shared.b16 %rs898, [%r14007+1024]; 2026-02-21T10:22:35.4961618Z ld.shared.b16 %rs899, [%r14007+64]; 2026-02-21T10:22:35.4961690Z ld.shared.b16 %rs900, [%r14007+1088]; 2026-02-21T10:22:35.4961752Z add.s32 %r14008, %r14006, %r130; 2026-02-21T10:22:35.4961881Z ld.shared.b16 %rs901, [%r14008]; 2026-02-21T10:22:35.4961960Z ld.shared.b16 %rs902, [%r14008+1024]; 2026-02-21T10:22:35.4962028Z ld.shared.b16 %rs903, [%r14008+64]; 2026-02-21T10:22:35.4962097Z ld.shared.b16 %rs904, [%r14008+1088]; 2026-02-21T10:22:35.4962157Z add.s32 %r14009, %r14006, %r131; 2026-02-21T10:22:35.4962222Z ld.shared.b16 %rs905, [%r14009]; 2026-02-21T10:22:35.4962291Z ld.shared.b16 %rs906, [%r14009+1024]; 2026-02-21T10:22:35.4962359Z ld.shared.b16 %rs907, [%r14009+64]; 2026-02-21T10:22:35.4962424Z ld.shared.b16 %rs908, [%r14009+1088]; 2026-02-21T10:22:35.4962491Z add.s32 %r14010, %r14006, %r132; 2026-02-21T10:22:35.4962560Z ld.shared.b16 %rs909, [%r14010]; 2026-02-21T10:22:35.4962625Z ld.shared.b16 %rs910, [%r14010+1024]; 2026-02-21T10:22:35.4962692Z ld.shared.b16 %rs911, [%r14010+64]; 2026-02-21T10:22:35.4962761Z ld.shared.b16 %rs912, [%r14010+1088]; 2026-02-21T10:22:35.4962823Z add.s32 %r14011, %r14006, %r133; 2026-02-21T10:22:35.4962887Z ld.shared.b16 %rs913, [%r14011]; 2026-02-21T10:22:35.4962956Z ld.shared.b16 %rs914, [%r14011+1024]; 2026-02-21T10:22:35.4963025Z ld.shared.b16 %rs915, [%r14011+64]; 2026-02-21T10:22:35.4963090Z ld.shared.b16 %rs916, [%r14011+1088]; 2026-02-21T10:22:35.4963153Z add.s32 %r14012, %r14006, %r134; 2026-02-21T10:22:35.4963221Z ld.shared.b16 %rs917, [%r14012]; 2026-02-21T10:22:35.4963286Z ld.shared.b16 %rs918, [%r14012+1024]; 2026-02-21T10:22:35.4963351Z ld.shared.b16 %rs919, [%r14012+64]; 2026-02-21T10:22:35.4963422Z ld.shared.b16 %rs920, [%r14012+1088]; 
2026-02-21T10:22:35.4963485Z add.s32 %r14013, %r14006, %r135; 2026-02-21T10:22:35.4963556Z ld.shared.b16 %rs921, [%r14013]; 2026-02-21T10:22:35.4963623Z ld.shared.b16 %rs922, [%r14013+1024]; 2026-02-21T10:22:35.4963706Z ld.shared.b16 %rs923, [%r14013+64]; 2026-02-21T10:22:35.4963777Z ld.shared.b16 %rs924, [%r14013+1088]; 2026-02-21T10:22:35.4963841Z add.s32 %r14014, %r14006, %r136; 2026-02-21T10:22:35.4963909Z ld.shared.b16 %rs925, [%r14014]; 2026-02-21T10:22:35.4963977Z ld.shared.b16 %rs926, [%r14014+1024]; 2026-02-21T10:22:35.4964044Z ld.shared.b16 %rs927, [%r14014+64]; 2026-02-21T10:22:35.4964113Z ld.shared.b16 %rs928, [%r14014+1088]; 2026-02-21T10:22:35.4964179Z cvt.f32.bf16 %r10063, %rs897; 2026-02-21T10:22:35.4964242Z cvt.f32.bf16 %r10064, %rs898; 2026-02-21T10:22:35.4964304Z cvt.f32.bf16 %r10065, %rs901; 2026-02-21T10:22:35.4964379Z cvt.f32.bf16 %r10066, %rs902; 2026-02-21T10:22:35.4964444Z cvt.f32.bf16 %r10099, %rs905; 2026-02-21T10:22:35.4964506Z cvt.f32.bf16 %r10100, %rs906; 2026-02-21T10:22:35.4964644Z cvt.f32.bf16 %r10101, %rs909; 2026-02-21T10:22:35.4964708Z cvt.f32.bf16 %r10102, %rs910; 2026-02-21T10:22:35.4964769Z cvt.f32.bf16 %r10135, %rs913; 2026-02-21T10:22:35.4964830Z cvt.f32.bf16 %r10136, %rs914; 2026-02-21T10:22:35.4964897Z cvt.f32.bf16 %r10137, %rs917; 2026-02-21T10:22:35.4965007Z cvt.f32.bf16 %r10138, %rs918; 2026-02-21T10:22:35.4965068Z cvt.f32.bf16 %r10171, %rs921; 2026-02-21T10:22:35.4965135Z cvt.f32.bf16 %r10172, %rs922; 2026-02-21T10:22:35.4965199Z cvt.f32.bf16 %r10173, %rs925; 2026-02-21T10:22:35.4965260Z cvt.f32.bf16 %r10174, %rs926; 2026-02-21T10:22:35.4965320Z cvt.f32.bf16 %r10207, %rs899; 2026-02-21T10:22:35.4965384Z cvt.f32.bf16 %r10208, %rs900; 2026-02-21T10:22:35.4965445Z cvt.f32.bf16 %r10209, %rs903; 2026-02-21T10:22:35.4965506Z cvt.f32.bf16 %r10210, %rs904; 2026-02-21T10:22:35.4965570Z cvt.f32.bf16 %r10243, %rs907; 2026-02-21T10:22:35.4965634Z cvt.f32.bf16 %r10244, %rs908; 2026-02-21T10:22:35.4965695Z cvt.f32.bf16 %r10245, 
%rs911; 2026-02-21T10:22:35.4965807Z cvt.f32.bf16 %r10246, %rs912; 2026-02-21T10:22:35.4965873Z cvt.f32.bf16 %r10279, %rs915; 2026-02-21T10:22:35.4965933Z cvt.f32.bf16 %r10280, %rs916; 2026-02-21T10:22:35.4965995Z cvt.f32.bf16 %r10281, %rs919; 2026-02-21T10:22:35.4966061Z cvt.f32.bf16 %r10282, %rs920; 2026-02-21T10:22:35.4966121Z cvt.f32.bf16 %r10315, %rs923; 2026-02-21T10:22:35.4966184Z cvt.f32.bf16 %r10316, %rs924; 2026-02-21T10:22:35.4966291Z cvt.f32.bf16 %r10317, %rs927; 2026-02-21T10:22:35.4966359Z cvt.f32.bf16 %r10318, %rs928; 2026-02-21T10:22:35.4966690Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.4966757Z shl.b32 %r14015, %r14287, 3; 2026-02-21T10:22:35.4966825Z add.s32 %r9661, %r9365, %r14015; 2026-02-21T10:22:35.4966897Z // begin inline asm 2026-02-21T10:22:35.4966953Z 2026-02-21T10:22:35.4967007Z { 2026-02-21T10:22:35.4967072Z .reg .pred complete; 2026-02-21T10:22:35.4967128Z waitLoop: 2026-02-21T10:22:35.4967281Z mbarrier.try_wait.parity.shared.b64 complete, [%r9661], %r14286; 2026-02-21T10:22:35.4967357Z @!complete bra.uni waitLoop; 2026-02-21T10:22:35.4967410Z } 2026-02-21T10:22:35.4967415Z 2026-02-21T10:22:35.4967473Z // end inline asm 2026-02-21T10:22:35.4967683Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.4967749Z add.s32 %r14018, %r9382, %r14003; 2026-02-21T10:22:35.4967953Z .loc 1 78 58 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:78:58 2026-02-21T10:22:35.4968021Z add.s32 %r14019, %r14018, %r137; 2026-02-21T10:22:35.4968090Z ld.shared.b8 %rs929, [%r14019]; 2026-02-21T10:22:35.4968160Z ld.shared.b8 %rs930, [%r14019+1024]; 2026-02-21T10:22:35.4968227Z ld.shared.b8 %rs931, [%r14019+2048]; 2026-02-21T10:22:35.4968301Z ld.shared.b8 %rs932, [%r14019+3072]; 2026-02-21T10:22:35.4968363Z xor.b32 %r14020, %r137, 32; 2026-02-21T10:22:35.4968429Z add.s32 %r14021, %r14018, %r14020; 2026-02-21T10:22:35.4968505Z ld.shared.b8 %rs933, 
[%r14021+256]; 2026-02-21T10:22:35.4968571Z ld.shared.b8 %rs934, [%r14021+1280]; 2026-02-21T10:22:35.4968636Z ld.shared.b8 %rs935, [%r14021+2304]; 2026-02-21T10:22:35.4968702Z ld.shared.b8 %rs936, [%r14021+3328]; 2026-02-21T10:22:35.4968771Z xor.b32 %r14022, %r137, 64; 2026-02-21T10:22:35.4968837Z add.s32 %r14023, %r14018, %r14022; 2026-02-21T10:22:35.4968905Z ld.shared.b8 %rs937, [%r14023+512]; 2026-02-21T10:22:35.4968983Z ld.shared.b8 %rs938, [%r14023+1536]; 2026-02-21T10:22:35.4969051Z ld.shared.b8 %rs939, [%r14023+2560]; 2026-02-21T10:22:35.4969117Z ld.shared.b8 %rs940, [%r14023+3584]; 2026-02-21T10:22:35.4969180Z xor.b32 %r14024, %r137, 96; 2026-02-21T10:22:35.4969254Z add.s32 %r14025, %r14018, %r14024; 2026-02-21T10:22:35.4969325Z ld.shared.b8 %rs941, [%r14025+768]; 2026-02-21T10:22:35.4969390Z ld.shared.b8 %rs942, [%r14025+1792]; 2026-02-21T10:22:35.4969460Z ld.shared.b8 %rs943, [%r14025+2816]; 2026-02-21T10:22:35.4969607Z ld.shared.b8 %rs944, [%r14025+3840]; 2026-02-21T10:22:35.4969811Z .loc 1 63 28 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:63:28 2026-02-21T10:22:35.4969880Z shl.b16 %rs945, %rs929, 4; 2026-02-21T10:22:35.4969943Z shl.b16 %rs946, %rs933, 4; 2026-02-21T10:22:35.4970067Z shl.b16 %rs947, %rs937, 4; 2026-02-21T10:22:35.4970129Z shl.b16 %rs948, %rs941, 4; 2026-02-21T10:22:35.4970195Z shl.b16 %rs949, %rs930, 4; 2026-02-21T10:22:35.4970258Z shl.b16 %rs950, %rs934, 4; 2026-02-21T10:22:35.4970319Z shl.b16 %rs951, %rs938, 4; 2026-02-21T10:22:35.4970381Z shl.b16 %rs952, %rs942, 4; 2026-02-21T10:22:35.4970443Z shl.b16 %rs953, %rs931, 4; 2026-02-21T10:22:35.4970504Z shl.b16 %rs954, %rs935, 4; 2026-02-21T10:22:35.4970572Z shl.b16 %rs955, %rs939, 4; 2026-02-21T10:22:35.4970632Z shl.b16 %rs956, %rs943, 4; 2026-02-21T10:22:35.4970693Z shl.b16 %rs957, %rs932, 4; 2026-02-21T10:22:35.4970756Z shl.b16 %rs958, %rs936, 4; 2026-02-21T10:22:35.4970824Z shl.b16 %rs959, %rs940, 4; 2026-02-21T10:22:35.4970954Z shl.b16 %rs960, %rs944, 4; 
2026-02-21T10:22:35.4971158Z .loc 1 78 58 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:78:58 2026-02-21T10:22:35.4971240Z selp.b16 %rs961, %rs945, %rs929, %p484; 2026-02-21T10:22:35.4971307Z cvt.s16.s8 %rs962, %rs961; 2026-02-21T10:22:35.4971366Z shr.s16 %rs963, %rs962, 4; 2026-02-21T10:22:35.4971449Z selp.b16 %rs964, %rs946, %rs933, %p484; 2026-02-21T10:22:35.4971577Z cvt.s16.s8 %rs965, %rs964; 2026-02-21T10:22:35.4971641Z shr.s16 %rs966, %rs965, 4; 2026-02-21T10:22:35.4971713Z selp.b16 %rs967, %rs947, %rs937, %p484; 2026-02-21T10:22:35.4971781Z cvt.s16.s8 %rs968, %rs967; 2026-02-21T10:22:35.4971843Z shr.s16 %rs969, %rs968, 4; 2026-02-21T10:22:35.4971915Z selp.b16 %rs970, %rs948, %rs941, %p484; 2026-02-21T10:22:35.4971977Z cvt.s16.s8 %rs971, %rs970; 2026-02-21T10:22:35.4972041Z shr.s16 %rs972, %rs971, 4; 2026-02-21T10:22:35.4972110Z selp.b16 %rs973, %rs949, %rs930, %p484; 2026-02-21T10:22:35.4972175Z cvt.s16.s8 %rs974, %rs973; 2026-02-21T10:22:35.4972254Z shr.s16 %rs975, %rs974, 4; 2026-02-21T10:22:35.4972327Z selp.b16 %rs976, %rs950, %rs934, %p484; 2026-02-21T10:22:35.4972389Z cvt.s16.s8 %rs977, %rs976; 2026-02-21T10:22:35.4972457Z shr.s16 %rs978, %rs977, 4; 2026-02-21T10:22:35.4972527Z selp.b16 %rs979, %rs951, %rs938, %p484; 2026-02-21T10:22:35.4972588Z cvt.s16.s8 %rs980, %rs979; 2026-02-21T10:22:35.4972653Z shr.s16 %rs981, %rs980, 4; 2026-02-21T10:22:35.4972728Z selp.b16 %rs982, %rs952, %rs942, %p484; 2026-02-21T10:22:35.4972789Z cvt.s16.s8 %rs983, %rs982; 2026-02-21T10:22:35.4972850Z shr.s16 %rs984, %rs983, 4; 2026-02-21T10:22:35.4972921Z selp.b16 %rs985, %rs953, %rs931, %p484; 2026-02-21T10:22:35.4972983Z cvt.s16.s8 %rs986, %rs985; 2026-02-21T10:22:35.4973044Z shr.s16 %rs987, %rs986, 4; 2026-02-21T10:22:35.4973113Z selp.b16 %rs988, %rs954, %rs935, %p484; 2026-02-21T10:22:35.4973178Z cvt.s16.s8 %rs989, %rs988; 2026-02-21T10:22:35.4973240Z shr.s16 %rs990, %rs989, 4; 2026-02-21T10:22:35.4973311Z selp.b16 %rs991, %rs955, %rs939, %p484; 
2026-02-21T10:22:35.4973375Z cvt.s16.s8 %rs992, %rs991; 2026-02-21T10:22:35.4973435Z shr.s16 %rs993, %rs992, 4; 2026-02-21T10:22:35.4973506Z selp.b16 %rs994, %rs956, %rs943, %p484; 2026-02-21T10:22:35.4973570Z cvt.s16.s8 %rs995, %rs994; 2026-02-21T10:22:35.4973634Z shr.s16 %rs996, %rs995, 4; 2026-02-21T10:22:35.4973705Z selp.b16 %rs997, %rs957, %rs932, %p484; 2026-02-21T10:22:35.4973766Z cvt.s16.s8 %rs998, %rs997; 2026-02-21T10:22:35.4973833Z shr.s16 %rs999, %rs998, 4; 2026-02-21T10:22:35.4973908Z selp.b16 %rs1000, %rs958, %rs936, %p484; 2026-02-21T10:22:35.4973972Z cvt.s16.s8 %rs1001, %rs1000; 2026-02-21T10:22:35.4974042Z shr.s16 %rs1002, %rs1001, 4; 2026-02-21T10:22:35.4974115Z selp.b16 %rs1003, %rs959, %rs940, %p484; 2026-02-21T10:22:35.4974176Z cvt.s16.s8 %rs1004, %rs1003; 2026-02-21T10:22:35.4974239Z shr.s16 %rs1005, %rs1004, 4; 2026-02-21T10:22:35.4974315Z selp.b16 %rs1006, %rs960, %rs944, %p484; 2026-02-21T10:22:35.4974452Z cvt.s16.s8 %rs1007, %rs1006; 2026-02-21T10:22:35.4974517Z shr.s16 %rs1008, %rs1007, 4; 2026-02-21T10:22:35.4974722Z .loc 1 83 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:83:32 2026-02-21T10:22:35.4974840Z cvt.rn.f32.s16 %r14026, %rs963; 2026-02-21T10:22:35.4974905Z cvt.rn.f32.s16 %r14027, %rs966; 2026-02-21T10:22:35.4974969Z cvt.rn.f32.s16 %r14028, %rs969; 2026-02-21T10:22:35.4975039Z cvt.rn.f32.s16 %r14029, %rs972; 2026-02-21T10:22:35.4975101Z cvt.rn.f32.s16 %r14030, %rs975; 2026-02-21T10:22:35.4975163Z cvt.rn.f32.s16 %r14031, %rs978; 2026-02-21T10:22:35.4975229Z cvt.rn.f32.s16 %r14032, %rs981; 2026-02-21T10:22:35.4975292Z cvt.rn.f32.s16 %r14033, %rs984; 2026-02-21T10:22:35.4975353Z cvt.rn.f32.s16 %r14034, %rs987; 2026-02-21T10:22:35.4975419Z cvt.rn.f32.s16 %r14035, %rs990; 2026-02-21T10:22:35.4975481Z cvt.rn.f32.s16 %r14036, %rs993; 2026-02-21T10:22:35.4975543Z cvt.rn.f32.s16 %r14037, %rs996; 2026-02-21T10:22:35.4975655Z cvt.rn.f32.s16 %r14038, %rs999; 2026-02-21T10:22:35.4975725Z cvt.rn.f32.s16 %r14039, %rs1002; 
2026-02-21T10:22:35.4975788Z cvt.rn.f32.s16 %r14040, %rs1005; 2026-02-21T10:22:35.4975851Z cvt.rn.f32.s16 %r14041, %rs1008; 2026-02-21T10:22:35.4975929Z st.shared.b32 [%r138], %r14026; 2026-02-21T10:22:35.4976009Z st.shared.b32 [%r138+16384], %r14034; 2026-02-21T10:22:35.4976073Z st.shared.b32 [%r139], %r14027; 2026-02-21T10:22:35.4976209Z st.shared.b32 [%r139+16384], %r14035; 2026-02-21T10:22:35.4976280Z st.shared.b32 [%r140], %r14028; 2026-02-21T10:22:35.4976346Z st.shared.b32 [%r140+16384], %r14036; 2026-02-21T10:22:35.4976411Z st.shared.b32 [%r141], %r14029; 2026-02-21T10:22:35.4976605Z st.shared.b32 [%r141+16384], %r14037; 2026-02-21T10:22:35.4976674Z st.shared.b32 [%r142], %r14030; 2026-02-21T10:22:35.4976741Z st.shared.b32 [%r142+16384], %r14038; 2026-02-21T10:22:35.4976812Z st.shared.b32 [%r143], %r14031; 2026-02-21T10:22:35.4976879Z st.shared.b32 [%r143+16384], %r14039; 2026-02-21T10:22:35.4976950Z st.shared.b32 [%r144], %r14032; 2026-02-21T10:22:35.4977015Z st.shared.b32 [%r144+16384], %r14040; 2026-02-21T10:22:35.4977085Z st.shared.b32 [%r145], %r14033; 2026-02-21T10:22:35.4977149Z st.shared.b32 [%r145+16384], %r14041; 2026-02-21T10:22:35.4977299Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r13608], {%r13675}; 2026-02-21T10:22:35.4977360Z bar.sync 0; 2026-02-21T10:22:35.4977424Z // begin inline asm 2026-02-21T10:22:35.4977623Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9779, %r10067, %r10355, %r10643}, [%r9667]; 2026-02-21T10:22:35.4977682Z // end inline asm 2026-02-21T10:22:35.4977743Z bar.sync 0; 2026-02-21T10:22:35.4977881Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r13608], {%r13677}; 2026-02-21T10:22:35.4977938Z bar.sync 0; 2026-02-21T10:22:35.4978005Z // begin inline asm 2026-02-21T10:22:35.4978196Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9781, %r10069, %r10357, %r10645}, [%r9667]; 2026-02-21T10:22:35.4978254Z // end inline asm 2026-02-21T10:22:35.4978317Z bar.sync 0; 2026-02-21T10:22:35.4978450Z stmatrix.sync.aligned.m8n8.x1.shared.b16 
[%r13608], {%r13676}; 2026-02-21T10:22:35.4978508Z bar.sync 0; 2026-02-21T10:22:35.4978568Z // begin inline asm 2026-02-21T10:22:35.4978762Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9780, %r10068, %r10356, %r10644}, [%r9667]; 2026-02-21T10:22:35.4978820Z // end inline asm 2026-02-21T10:22:35.4978876Z bar.sync 0; 2026-02-21T10:22:35.4979019Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r13608], {%r13678}; 2026-02-21T10:22:35.4979076Z bar.sync 0; 2026-02-21T10:22:35.4979135Z // begin inline asm 2026-02-21T10:22:35.4979323Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9782, %r10070, %r10358, %r10646}, [%r9667]; 2026-02-21T10:22:35.4979384Z // end inline asm 2026-02-21T10:22:35.4979441Z bar.sync 0; 2026-02-21T10:22:35.4979573Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r13608], {%r13679}; 2026-02-21T10:22:35.4979632Z bar.sync 0; 2026-02-21T10:22:35.4979704Z // begin inline asm 2026-02-21T10:22:35.4979979Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9783, %r10071, %r10359, %r10647}, [%r9667]; 2026-02-21T10:22:35.4980039Z // end inline asm 2026-02-21T10:22:35.4980105Z bar.sync 0; 2026-02-21T10:22:35.4980243Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r13608], {%r13681}; 2026-02-21T10:22:35.4980367Z bar.sync 0; 2026-02-21T10:22:35.4980431Z // begin inline asm 2026-02-21T10:22:35.4980622Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9785, %r10073, %r10361, %r10649}, [%r9667]; 2026-02-21T10:22:35.4980679Z // end inline asm 2026-02-21T10:22:35.4980738Z bar.sync 0; 2026-02-21T10:22:35.4980871Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r13608], {%r13680}; 2026-02-21T10:22:35.4980926Z bar.sync 0; 2026-02-21T10:22:35.4980985Z // begin inline asm 2026-02-21T10:22:35.4981175Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9784, %r10072, %r10360, %r10648}, [%r9667]; 2026-02-21T10:22:35.4981234Z // end inline asm 2026-02-21T10:22:35.4981288Z bar.sync 0; 2026-02-21T10:22:35.4981489Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r13608], {%r13682}; 2026-02-21T10:22:35.4981550Z bar.sync 0; 
2026-02-21T10:22:35.4981609Z // begin inline asm 2026-02-21T10:22:35.4981795Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9786, %r10074, %r10362, %r10650}, [%r9667]; 2026-02-21T10:22:35.4981860Z // end inline asm 2026-02-21T10:22:35.4981917Z bar.sync 0; 2026-02-21T10:22:35.4982108Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r13608], {%r13683}; 2026-02-21T10:22:35.4982172Z bar.sync 0; 2026-02-21T10:22:35.4982232Z // begin inline asm 2026-02-21T10:22:35.4982417Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9787, %r10075, %r10363, %r10651}, [%r9667]; 2026-02-21T10:22:35.4982478Z // end inline asm 2026-02-21T10:22:35.4982532Z bar.sync 0; 2026-02-21T10:22:35.4982665Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r13608], {%r13685}; 2026-02-21T10:22:35.4982720Z bar.sync 0; 2026-02-21T10:22:35.4982796Z // begin inline asm 2026-02-21T10:22:35.4982987Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9789, %r10077, %r10365, %r10653}, [%r9667]; 2026-02-21T10:22:35.4983046Z // end inline asm 2026-02-21T10:22:35.4983105Z bar.sync 0; 2026-02-21T10:22:35.4983239Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r13608], {%r13684}; 2026-02-21T10:22:35.4983297Z bar.sync 0; 2026-02-21T10:22:35.4983355Z // begin inline asm 2026-02-21T10:22:35.4983547Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9788, %r10076, %r10364, %r10652}, [%r9667]; 2026-02-21T10:22:35.4983607Z // end inline asm 2026-02-21T10:22:35.4983664Z bar.sync 0; 2026-02-21T10:22:35.4983799Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r13608], {%r13686}; 2026-02-21T10:22:35.4983856Z bar.sync 0; 2026-02-21T10:22:35.4983915Z // begin inline asm 2026-02-21T10:22:35.4984103Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9790, %r10078, %r10366, %r10654}, [%r9667]; 2026-02-21T10:22:35.4984159Z // end inline asm 2026-02-21T10:22:35.4984214Z bar.sync 0; 2026-02-21T10:22:35.4984350Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r13608], {%r13687}; 2026-02-21T10:22:35.4984410Z bar.sync 0; 2026-02-21T10:22:35.4984471Z // begin inline asm 
2026-02-21T10:22:35.4984655Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9791, %r10079, %r10367, %r10655}, [%r9667]; 2026-02-21T10:22:35.4984715Z // end inline asm 2026-02-21T10:22:35.4984773Z bar.sync 0; 2026-02-21T10:22:35.4984903Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r13608], {%r13689}; 2026-02-21T10:22:35.4984958Z bar.sync 0; 2026-02-21T10:22:35.4985023Z // begin inline asm 2026-02-21T10:22:35.4985218Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9793, %r10081, %r10369, %r10657}, [%r9667]; 2026-02-21T10:22:35.4985280Z // end inline asm 2026-02-21T10:22:35.4985339Z bar.sync 0; 2026-02-21T10:22:35.4985473Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r13608], {%r13688}; 2026-02-21T10:22:35.4985527Z bar.sync 0; 2026-02-21T10:22:35.4985588Z // begin inline asm 2026-02-21T10:22:35.4985776Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9792, %r10080, %r10368, %r10656}, [%r9667]; 2026-02-21T10:22:35.4985895Z // end inline asm 2026-02-21T10:22:35.4985953Z bar.sync 0; 2026-02-21T10:22:35.4986089Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r13608], {%r13690}; 2026-02-21T10:22:35.4986144Z bar.sync 0; 2026-02-21T10:22:35.4986203Z // begin inline asm 2026-02-21T10:22:35.4986565Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9794, %r10082, %r10370, %r10658}, [%r9667]; 2026-02-21T10:22:35.4986627Z // end inline asm 2026-02-21T10:22:35.4986683Z $L__tmp17: 2026-02-21T10:22:35.4986963Z .loc 2 291 36 // standard.py:291:36 @[ co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:90:40 ] 2026-02-21T10:22:35.4987029Z // begin inline asm 2026-02-21T10:22:35.4987111Z fence.proxy.async.shared::cta; 2026-02-21T10:22:35.4987168Z // end inline asm 2026-02-21T10:22:35.4987255Z shfl.sync.idx.b32 %r14042, %r5, 0, 31, -1; 2026-02-21T10:22:35.4987329Z wgmma.fence.sync.aligned; 2026-02-21T10:22:35.4987394Z mov.pred %p343, -1; 2026-02-21T10:22:35.4987457Z // begin inline asm 2026-02-21T10:22:35.4988064Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794}, {%r10063,%r10064,%r10065,%r10066}, %rd469, %p343, 1, 1; 2026-02-21T10:22:35.4988130Z // end inline asm 2026-02-21T10:22:35.4988190Z // begin inline asm 2026-02-21T10:22:35.4988845Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794}, {%r10099,%r10100,%r10101,%r10102}, %rd470, %p343, 1, 1; 2026-02-21T10:22:35.4988907Z // end inline asm 2026-02-21T10:22:35.4988967Z // begin inline asm 2026-02-21T10:22:35.4989477Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794}, {%r10135,%r10136,%r10137,%r10138}, %rd471, %p343, 1, 1; 2026-02-21T10:22:35.4989536Z // end inline asm 2026-02-21T10:22:35.4989595Z // begin inline asm 2026-02-21T10:22:35.4990103Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794}, {%r10171,%r10172,%r10173,%r10174}, %rd472, %p343, 1, 1; 2026-02-21T10:22:35.4990163Z // end inline asm 2026-02-21T10:22:35.4990221Z // begin inline asm 2026-02-21T10:22:35.4990740Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794}, {%r10207,%r10208,%r10209,%r10210}, %rd473, %p343, 1, 1; 2026-02-21T10:22:35.4990800Z // end inline asm 2026-02-21T10:22:35.4990859Z // begin inline asm 2026-02-21T10:22:35.4991366Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794}, {%r10243,%r10244,%r10245,%r10246}, %rd474, %p343, 1, 1; 2026-02-21T10:22:35.4991424Z // end inline asm 2026-02-21T10:22:35.4991484Z // begin 
inline asm 2026-02-21T10:22:35.4991990Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794}, {%r10279,%r10280,%r10281,%r10282}, %rd475, %p343, 1, 1; 2026-02-21T10:22:35.4992054Z // end inline asm 2026-02-21T10:22:35.4992113Z // begin inline asm 2026-02-21T10:22:35.4992622Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794}, {%r10315,%r10316,%r10317,%r10318}, %rd476, %p343, 1, 1; 2026-02-21T10:22:35.4992679Z // end inline asm 2026-02-21T10:22:35.4992738Z // begin inline asm 2026-02-21T10:22:35.4993305Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082}, {%r10063,%r10064,%r10065,%r10066}, %rd477, %p343, 1, 1; 2026-02-21T10:22:35.4993438Z // end inline asm 2026-02-21T10:22:35.4993503Z // begin inline asm 2026-02-21T10:22:35.4994063Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082}, {%r10099,%r10100,%r10101,%r10102}, %rd478, %p343, 1, 1; 2026-02-21T10:22:35.4994186Z // end inline asm 2026-02-21T10:22:35.4994245Z // begin inline asm 2026-02-21T10:22:35.4994802Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082}, {%r10135,%r10136,%r10137,%r10138}, %rd479, %p343, 1, 1; 2026-02-21T10:22:35.4994861Z // end inline asm 2026-02-21T10:22:35.4994918Z // begin inline asm 2026-02-21T10:22:35.4995517Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082}, {%r10171,%r10172,%r10173,%r10174}, %rd480, %p343, 1, 1; 2026-02-21T10:22:35.4995584Z // end inline asm 2026-02-21T10:22:35.4995654Z // begin inline asm 2026-02-21T10:22:35.4996214Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082}, {%r10207,%r10208,%r10209,%r10210}, %rd481, %p343, 1, 1; 2026-02-21T10:22:35.4996326Z // end inline asm 2026-02-21T10:22:35.4996388Z // begin inline asm 2026-02-21T10:22:35.4997058Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082}, {%r10243,%r10244,%r10245,%r10246}, %rd482, %p343, 1, 1; 2026-02-21T10:22:35.4997121Z // end inline asm 2026-02-21T10:22:35.4997179Z // begin inline asm 2026-02-21T10:22:35.4997748Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082}, {%r10279,%r10280,%r10281,%r10282}, %rd483, %p343, 1, 1; 2026-02-21T10:22:35.4997813Z // end inline asm 2026-02-21T10:22:35.4997871Z // begin inline asm 2026-02-21T10:22:35.4998430Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082}, {%r10315,%r10316,%r10317,%r10318}, %rd484, %p343, 1, 1; 2026-02-21T10:22:35.4998490Z // end inline asm 2026-02-21T10:22:35.4998547Z // begin inline asm 2026-02-21T10:22:35.4999103Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370}, {%r10063,%r10064,%r10065,%r10066}, %rd485, 
%p343, 1, 1; 2026-02-21T10:22:35.4999162Z // end inline asm 2026-02-21T10:22:35.4999223Z // begin inline asm 2026-02-21T10:22:35.4999787Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370}, {%r10099,%r10100,%r10101,%r10102}, %rd486, %p343, 1, 1; 2026-02-21T10:22:35.4999847Z // end inline asm 2026-02-21T10:22:35.4999911Z // begin inline asm 2026-02-21T10:22:35.5000470Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370}, {%r10135,%r10136,%r10137,%r10138}, %rd487, %p343, 1, 1; 2026-02-21T10:22:35.5000529Z // end inline asm 2026-02-21T10:22:35.5000592Z // begin inline asm 2026-02-21T10:22:35.5001155Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370}, {%r10171,%r10172,%r10173,%r10174}, %rd488, %p343, 1, 1; 2026-02-21T10:22:35.5001214Z // end inline asm 2026-02-21T10:22:35.5001356Z // begin inline asm 2026-02-21T10:22:35.5001925Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370}, {%r10207,%r10208,%r10209,%r10210}, %rd489, %p343, 1, 1; 2026-02-21T10:22:35.5002049Z // end inline asm 2026-02-21T10:22:35.5002109Z // begin inline asm 2026-02-21T10:22:35.5002671Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370}, {%r10243,%r10244,%r10245,%r10246}, %rd490, %p343, 1, 1; 2026-02-21T10:22:35.5002729Z // end inline asm 2026-02-21T10:22:35.5002790Z // begin inline asm 2026-02-21T10:22:35.5003348Z 
wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370}, {%r10279,%r10280,%r10281,%r10282}, %rd491, %p343, 1, 1; 2026-02-21T10:22:35.5003492Z // end inline asm 2026-02-21T10:22:35.5003559Z // begin inline asm 2026-02-21T10:22:35.5004112Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370}, {%r10315,%r10316,%r10317,%r10318}, %rd492, %p343, 1, 1; 2026-02-21T10:22:35.5004171Z // end inline asm 2026-02-21T10:22:35.5004298Z // begin inline asm 2026-02-21T10:22:35.5004855Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658}, {%r10063,%r10064,%r10065,%r10066}, %rd493, %p343, 1, 1; 2026-02-21T10:22:35.5004912Z // end inline asm 2026-02-21T10:22:35.5004970Z // begin inline asm 2026-02-21T10:22:35.5005533Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658}, {%r10099,%r10100,%r10101,%r10102}, %rd494, %p343, 1, 1; 2026-02-21T10:22:35.5005592Z // end inline asm 2026-02-21T10:22:35.5005652Z // begin inline asm 2026-02-21T10:22:35.5006213Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658}, {%r10135,%r10136,%r10137,%r10138}, %rd495, %p343, 1, 1; 2026-02-21T10:22:35.5006274Z // end inline asm 2026-02-21T10:22:35.5006332Z // begin inline asm 2026-02-21T10:22:35.5007012Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658}, {%r10171,%r10172,%r10173,%r10174}, %rd496, %p343, 1, 1; 2026-02-21T10:22:35.5007073Z // end inline asm 2026-02-21T10:22:35.5007132Z // begin inline asm 2026-02-21T10:22:35.5007690Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658}, {%r10207,%r10208,%r10209,%r10210}, %rd497, %p343, 1, 1; 2026-02-21T10:22:35.5007750Z // end inline asm 2026-02-21T10:22:35.5007809Z // begin inline asm 2026-02-21T10:22:35.5008371Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658}, {%r10243,%r10244,%r10245,%r10246}, %rd498, %p343, 1, 1; 2026-02-21T10:22:35.5008428Z // end inline asm 2026-02-21T10:22:35.5008485Z // begin inline asm 2026-02-21T10:22:35.5009041Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658}, {%r10279,%r10280,%r10281,%r10282}, %rd499, %p343, 1, 1; 2026-02-21T10:22:35.5009098Z // end inline asm 2026-02-21T10:22:35.5009157Z // begin inline asm 2026-02-21T10:22:35.5009807Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658}, {%r10315,%r10316,%r10317,%r10318}, %rd500, %p343, 1, 1; 2026-02-21T10:22:35.5009925Z // end inline asm 2026-02-21T10:22:35.5010004Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:35.5010068Z mov.b32 %r13944, 0; 2026-02-21T10:22:35.5010131Z mov.b32 %r10959, %r11026; 2026-02-21T10:22:35.5010205Z mov.b32 %r10960, %r13944; 2026-02-21T10:22:35.5010267Z mov.b32 %r10961, %r13944; 2026-02-21T10:22:35.5010332Z // 
begin inline asm 2026-02-21T10:22:35.5011644Z // wait for regs: %r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794,%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082,%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370,%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658,%r10959,%r10960,%r10961 2026-02-21T10:22:35.5011730Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:35.5011789Z // end inline asm 2026-02-21T10:22:35.5011846Z $L__tmp18: 2026-02-21T10:22:35.5012072Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.5012203Z add.s32 %r14043, %r11026, 59392; 2026-02-21T10:22:35.5012271Z add.s32 %r14044, %r14043, %r14004; 2026-02-21T10:22:35.5012477Z .loc 1 58 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:58:32 2026-02-21T10:22:35.5012547Z add.s32 %r14045, %r14044, %r129; 2026-02-21T10:22:35.5012616Z ld.shared.b16 %rs1009, [%r14045]; 2026-02-21T10:22:35.5012691Z ld.shared.b16 %rs1010, [%r14045+1024]; 2026-02-21T10:22:35.5012761Z ld.shared.b16 %rs1011, [%r14045+64]; 2026-02-21T10:22:35.5012836Z ld.shared.b16 %rs1012, [%r14045+1088]; 2026-02-21T10:22:35.5012905Z add.s32 %r14046, %r14044, %r130; 2026-02-21T10:22:35.5012973Z ld.shared.b16 %rs1013, [%r14046]; 2026-02-21T10:22:35.5013046Z ld.shared.b16 %rs1014, [%r14046+1024]; 2026-02-21T10:22:35.5013114Z ld.shared.b16 %rs1015, [%r14046+64]; 2026-02-21T10:22:35.5013185Z ld.shared.b16 %rs1016, [%r14046+1088]; 2026-02-21T10:22:35.5013253Z add.s32 %r14047, %r14044, %r131; 2026-02-21T10:22:35.5013321Z ld.shared.b16 %rs1017, [%r14047]; 2026-02-21T10:22:35.5013390Z ld.shared.b16 %rs1018, [%r14047+1024]; 2026-02-21T10:22:35.5013457Z ld.shared.b16 %rs1019, [%r14047+64]; 
2026-02-21T10:22:35.5013530Z ld.shared.b16 %rs1020, [%r14047+1088]; 2026-02-21T10:22:35.5013593Z add.s32 %r14048, %r14044, %r132; 2026-02-21T10:22:35.5013660Z ld.shared.b16 %rs1021, [%r14048]; 2026-02-21T10:22:35.5013731Z ld.shared.b16 %rs1022, [%r14048+1024]; 2026-02-21T10:22:35.5013797Z ld.shared.b16 %rs1023, [%r14048+64]; 2026-02-21T10:22:35.5013865Z ld.shared.b16 %rs1024, [%r14048+1088]; 2026-02-21T10:22:35.5013931Z add.s32 %r14049, %r14044, %r133; 2026-02-21T10:22:35.5014004Z ld.shared.b16 %rs1025, [%r14049]; 2026-02-21T10:22:35.5014085Z ld.shared.b16 %rs1026, [%r14049+1024]; 2026-02-21T10:22:35.5014158Z ld.shared.b16 %rs1027, [%r14049+64]; 2026-02-21T10:22:35.5014234Z ld.shared.b16 %rs1028, [%r14049+1088]; 2026-02-21T10:22:35.5014300Z add.s32 %r14050, %r14044, %r134; 2026-02-21T10:22:35.5014368Z ld.shared.b16 %rs1029, [%r14050]; 2026-02-21T10:22:35.5014441Z ld.shared.b16 %rs1030, [%r14050+1024]; 2026-02-21T10:22:35.5014509Z ld.shared.b16 %rs1031, [%r14050+64]; 2026-02-21T10:22:35.5014578Z ld.shared.b16 %rs1032, [%r14050+1088]; 2026-02-21T10:22:35.5014640Z add.s32 %r14051, %r14044, %r135; 2026-02-21T10:22:35.5014709Z ld.shared.b16 %rs1033, [%r14051]; 2026-02-21T10:22:35.5014777Z ld.shared.b16 %rs1034, [%r14051+1024]; 2026-02-21T10:22:35.5014844Z ld.shared.b16 %rs1035, [%r14051+64]; 2026-02-21T10:22:35.5014916Z ld.shared.b16 %rs1036, [%r14051+1088]; 2026-02-21T10:22:35.5015043Z add.s32 %r14052, %r14044, %r136; 2026-02-21T10:22:35.5015109Z ld.shared.b16 %rs1037, [%r14052]; 2026-02-21T10:22:35.5015178Z ld.shared.b16 %rs1038, [%r14052+1024]; 2026-02-21T10:22:35.5015249Z ld.shared.b16 %rs1039, [%r14052+64]; 2026-02-21T10:22:35.5015365Z ld.shared.b16 %rs1040, [%r14052+1088]; 2026-02-21T10:22:35.5015443Z cvt.f32.bf16 %r11351, %rs1009; 2026-02-21T10:22:35.5015514Z cvt.f32.bf16 %r11352, %rs1010; 2026-02-21T10:22:35.5015583Z cvt.f32.bf16 %r11353, %rs1013; 2026-02-21T10:22:35.5015646Z cvt.f32.bf16 %r11354, %rs1014; 2026-02-21T10:22:35.5015712Z cvt.f32.bf16 %r11387, 
%rs1017; 2026-02-21T10:22:35.5015778Z cvt.f32.bf16 %r11388, %rs1018; 2026-02-21T10:22:35.5015840Z cvt.f32.bf16 %r11389, %rs1021; 2026-02-21T10:22:35.5015902Z cvt.f32.bf16 %r11390, %rs1022; 2026-02-21T10:22:35.5015968Z cvt.f32.bf16 %r11423, %rs1025; 2026-02-21T10:22:35.5016029Z cvt.f32.bf16 %r11424, %rs1026; 2026-02-21T10:22:35.5016092Z cvt.f32.bf16 %r11425, %rs1029; 2026-02-21T10:22:35.5016220Z cvt.f32.bf16 %r11426, %rs1030; 2026-02-21T10:22:35.5016286Z cvt.f32.bf16 %r11459, %rs1033; 2026-02-21T10:22:35.5016347Z cvt.f32.bf16 %r11460, %rs1034; 2026-02-21T10:22:35.5016407Z cvt.f32.bf16 %r11461, %rs1037; 2026-02-21T10:22:35.5016593Z cvt.f32.bf16 %r11462, %rs1038; 2026-02-21T10:22:35.5016669Z cvt.f32.bf16 %r11495, %rs1011; 2026-02-21T10:22:35.5016734Z cvt.f32.bf16 %r11496, %rs1012; 2026-02-21T10:22:35.5016873Z cvt.f32.bf16 %r11497, %rs1015; 2026-02-21T10:22:35.5016939Z cvt.f32.bf16 %r11498, %rs1016; 2026-02-21T10:22:35.5017001Z cvt.f32.bf16 %r11531, %rs1019; 2026-02-21T10:22:35.5017063Z cvt.f32.bf16 %r11532, %rs1020; 2026-02-21T10:22:35.5017128Z cvt.f32.bf16 %r11533, %rs1023; 2026-02-21T10:22:35.5017190Z cvt.f32.bf16 %r11534, %rs1024; 2026-02-21T10:22:35.5017253Z cvt.f32.bf16 %r11567, %rs1027; 2026-02-21T10:22:35.5017320Z cvt.f32.bf16 %r11568, %rs1028; 2026-02-21T10:22:35.5017380Z cvt.f32.bf16 %r11569, %rs1031; 2026-02-21T10:22:35.5017442Z cvt.f32.bf16 %r11570, %rs1032; 2026-02-21T10:22:35.5017510Z cvt.f32.bf16 %r11603, %rs1035; 2026-02-21T10:22:35.5017576Z cvt.f32.bf16 %r11604, %rs1036; 2026-02-21T10:22:35.5017638Z cvt.f32.bf16 %r11605, %rs1039; 2026-02-21T10:22:35.5017699Z cvt.f32.bf16 %r11606, %rs1040; 2026-02-21T10:22:35.5017922Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.5017990Z add.s32 %r11029, %r9368, %r14015; 2026-02-21T10:22:35.5018051Z // begin inline asm 2026-02-21T10:22:35.5018108Z 2026-02-21T10:22:35.5018160Z { 2026-02-21T10:22:35.5018225Z .reg .pred complete; 2026-02-21T10:22:35.5018282Z waitLoop: 
2026-02-21T10:22:35.5018436Z mbarrier.try_wait.parity.shared.b64 complete, [%r11029], %r14286; 2026-02-21T10:22:35.5018508Z @!complete bra.uni waitLoop; 2026-02-21T10:22:35.5018561Z } 2026-02-21T10:22:35.5018566Z 2026-02-21T10:22:35.5018639Z // end inline asm 2026-02-21T10:22:35.5018847Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.5018918Z add.s32 %r14055, %r9391, %r14003; 2026-02-21T10:22:35.5019122Z .loc 1 78 58 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:78:58 2026-02-21T10:22:35.5019191Z add.s32 %r14056, %r14055, %r137; 2026-02-21T10:22:35.5019258Z ld.shared.b8 %rs1041, [%r14056]; 2026-02-21T10:22:35.5019328Z ld.shared.b8 %rs1042, [%r14056+1024]; 2026-02-21T10:22:35.5019404Z ld.shared.b8 %rs1043, [%r14056+2048]; 2026-02-21T10:22:35.5019472Z ld.shared.b8 %rs1044, [%r14056+3072]; 2026-02-21T10:22:35.5019538Z add.s32 %r14057, %r14055, %r14020; 2026-02-21T10:22:35.5019609Z ld.shared.b8 %rs1045, [%r14057+256]; 2026-02-21T10:22:35.5019680Z ld.shared.b8 %rs1046, [%r14057+1280]; 2026-02-21T10:22:35.5019747Z ld.shared.b8 %rs1047, [%r14057+2304]; 2026-02-21T10:22:35.5019817Z ld.shared.b8 %rs1048, [%r14057+3328]; 2026-02-21T10:22:35.5019884Z add.s32 %r14058, %r14055, %r14022; 2026-02-21T10:22:35.5019953Z ld.shared.b8 %rs1049, [%r14058+512]; 2026-02-21T10:22:35.5020105Z ld.shared.b8 %rs1050, [%r14058+1536]; 2026-02-21T10:22:35.5020179Z ld.shared.b8 %rs1051, [%r14058+2560]; 2026-02-21T10:22:35.5020245Z ld.shared.b8 %rs1052, [%r14058+3584]; 2026-02-21T10:22:35.5020375Z add.s32 %r14059, %r14055, %r14024; 2026-02-21T10:22:35.5020447Z ld.shared.b8 %rs1053, [%r14059+768]; 2026-02-21T10:22:35.5020514Z ld.shared.b8 %rs1054, [%r14059+1792]; 2026-02-21T10:22:35.5020582Z ld.shared.b8 %rs1055, [%r14059+2816]; 2026-02-21T10:22:35.5020651Z ld.shared.b8 %rs1056, [%r14059+3840]; 2026-02-21T10:22:35.5020857Z .loc 1 63 28 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:63:28 2026-02-21T10:22:35.5020923Z shl.b16 
%rs1057, %rs1041, 4; 2026-02-21T10:22:35.5020986Z shl.b16 %rs1058, %rs1045, 4; 2026-02-21T10:22:35.5021055Z shl.b16 %rs1059, %rs1049, 4; 2026-02-21T10:22:35.5021116Z shl.b16 %rs1060, %rs1053, 4; 2026-02-21T10:22:35.5021178Z shl.b16 %rs1061, %rs1042, 4; 2026-02-21T10:22:35.5021247Z shl.b16 %rs1062, %rs1046, 4; 2026-02-21T10:22:35.5021389Z shl.b16 %rs1063, %rs1050, 4; 2026-02-21T10:22:35.5021459Z shl.b16 %rs1064, %rs1054, 4; 2026-02-21T10:22:35.5021522Z shl.b16 %rs1065, %rs1043, 4; 2026-02-21T10:22:35.5021588Z shl.b16 %rs1066, %rs1047, 4; 2026-02-21T10:22:35.5021654Z shl.b16 %rs1067, %rs1051, 4; 2026-02-21T10:22:35.5021715Z shl.b16 %rs1068, %rs1055, 4; 2026-02-21T10:22:35.5021780Z shl.b16 %rs1069, %rs1044, 4; 2026-02-21T10:22:35.5021889Z shl.b16 %rs1070, %rs1048, 4; 2026-02-21T10:22:35.5021952Z shl.b16 %rs1071, %rs1052, 4; 2026-02-21T10:22:35.5022016Z shl.b16 %rs1072, %rs1056, 4; 2026-02-21T10:22:35.5022220Z .loc 1 78 58 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:78:58 2026-02-21T10:22:35.5022301Z selp.b16 %rs1073, %rs1057, %rs1041, %p484; 2026-02-21T10:22:35.5022365Z cvt.s16.s8 %rs1074, %rs1073; 2026-02-21T10:22:35.5022432Z shr.s16 %rs1075, %rs1074, 4; 2026-02-21T10:22:35.5022509Z selp.b16 %rs1076, %rs1058, %rs1045, %p484; 2026-02-21T10:22:35.5022578Z cvt.s16.s8 %rs1077, %rs1076; 2026-02-21T10:22:35.5022642Z shr.s16 %rs1078, %rs1077, 4; 2026-02-21T10:22:35.5022721Z selp.b16 %rs1079, %rs1059, %rs1049, %p484; 2026-02-21T10:22:35.5022785Z cvt.s16.s8 %rs1080, %rs1079; 2026-02-21T10:22:35.5022850Z shr.s16 %rs1081, %rs1080, 4; 2026-02-21T10:22:35.5022927Z selp.b16 %rs1082, %rs1060, %rs1053, %p484; 2026-02-21T10:22:35.5022990Z cvt.s16.s8 %rs1083, %rs1082; 2026-02-21T10:22:35.5023052Z shr.s16 %rs1084, %rs1083, 4; 2026-02-21T10:22:35.5023131Z selp.b16 %rs1085, %rs1061, %rs1042, %p484; 2026-02-21T10:22:35.5023192Z cvt.s16.s8 %rs1086, %rs1085; 2026-02-21T10:22:35.5023254Z shr.s16 %rs1087, %rs1086, 4; 2026-02-21T10:22:35.5023328Z selp.b16 %rs1088, %rs1062, 
%rs1046, %p484; 2026-02-21T10:22:35.5023394Z cvt.s16.s8 %rs1089, %rs1088; 2026-02-21T10:22:35.5023457Z shr.s16 %rs1090, %rs1089, 4; 2026-02-21T10:22:35.5023533Z selp.b16 %rs1091, %rs1063, %rs1050, %p484; 2026-02-21T10:22:35.5023599Z cvt.s16.s8 %rs1092, %rs1091; 2026-02-21T10:22:35.5023665Z shr.s16 %rs1093, %rs1092, 4; 2026-02-21T10:22:35.5023738Z selp.b16 %rs1094, %rs1064, %rs1054, %p484; 2026-02-21T10:22:35.5023800Z cvt.s16.s8 %rs1095, %rs1094; 2026-02-21T10:22:35.5023867Z shr.s16 %rs1096, %rs1095, 4; 2026-02-21T10:22:35.5023943Z selp.b16 %rs1097, %rs1065, %rs1043, %p484; 2026-02-21T10:22:35.5024004Z cvt.s16.s8 %rs1098, %rs1097; 2026-02-21T10:22:35.5024073Z shr.s16 %rs1099, %rs1098, 4; 2026-02-21T10:22:35.5024148Z selp.b16 %rs1100, %rs1066, %rs1047, %p484; 2026-02-21T10:22:35.5024209Z cvt.s16.s8 %rs1101, %rs1100; 2026-02-21T10:22:35.5024273Z shr.s16 %rs1102, %rs1101, 4; 2026-02-21T10:22:35.5024348Z selp.b16 %rs1103, %rs1067, %rs1051, %p484; 2026-02-21T10:22:35.5024409Z cvt.s16.s8 %rs1104, %rs1103; 2026-02-21T10:22:35.5024470Z shr.s16 %rs1105, %rs1104, 4; 2026-02-21T10:22:35.5024548Z selp.b16 %rs1106, %rs1068, %rs1055, %p484; 2026-02-21T10:22:35.5024623Z cvt.s16.s8 %rs1107, %rs1106; 2026-02-21T10:22:35.5024744Z shr.s16 %rs1108, %rs1107, 4; 2026-02-21T10:22:35.5024828Z selp.b16 %rs1109, %rs1069, %rs1044, %p484; 2026-02-21T10:22:35.5024890Z cvt.s16.s8 %rs1110, %rs1109; 2026-02-21T10:22:35.5024952Z shr.s16 %rs1111, %rs1110, 4; 2026-02-21T10:22:35.5025025Z selp.b16 %rs1112, %rs1070, %rs1048, %p484; 2026-02-21T10:22:35.5025160Z cvt.s16.s8 %rs1113, %rs1112; 2026-02-21T10:22:35.5025221Z shr.s16 %rs1114, %rs1113, 4; 2026-02-21T10:22:35.5025299Z selp.b16 %rs1115, %rs1071, %rs1052, %p484; 2026-02-21T10:22:35.5025365Z cvt.s16.s8 %rs1116, %rs1115; 2026-02-21T10:22:35.5025426Z shr.s16 %rs1117, %rs1116, 4; 2026-02-21T10:22:35.5025500Z selp.b16 %rs1118, %rs1072, %rs1056, %p484; 2026-02-21T10:22:35.5025563Z cvt.s16.s8 %rs1119, %rs1118; 2026-02-21T10:22:35.5025629Z shr.s16 %rs1120, 
%rs1119, 4; 2026-02-21T10:22:35.5025832Z .loc 1 83 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:83:32 2026-02-21T10:22:35.5025899Z cvt.rn.f32.s16 %r14060, %rs1075; 2026-02-21T10:22:35.5025972Z cvt.rn.f32.s16 %r14061, %rs1078; 2026-02-21T10:22:35.5026084Z cvt.rn.f32.s16 %r14062, %rs1081; 2026-02-21T10:22:35.5026150Z cvt.rn.f32.s16 %r14063, %rs1084; 2026-02-21T10:22:35.5026225Z cvt.rn.f32.s16 %r14064, %rs1087; 2026-02-21T10:22:35.5026299Z cvt.rn.f32.s16 %r14065, %rs1090; 2026-02-21T10:22:35.5026362Z cvt.rn.f32.s16 %r14066, %rs1093; 2026-02-21T10:22:35.5026424Z cvt.rn.f32.s16 %r14067, %rs1096; 2026-02-21T10:22:35.5026678Z cvt.rn.f32.s16 %r14068, %rs1099; 2026-02-21T10:22:35.5026749Z cvt.rn.f32.s16 %r14069, %rs1102; 2026-02-21T10:22:35.5026812Z cvt.rn.f32.s16 %r14070, %rs1105; 2026-02-21T10:22:35.5026879Z cvt.rn.f32.s16 %r14071, %rs1108; 2026-02-21T10:22:35.5026942Z cvt.rn.f32.s16 %r14072, %rs1111; 2026-02-21T10:22:35.5027006Z cvt.rn.f32.s16 %r14073, %rs1114; 2026-02-21T10:22:35.5027068Z cvt.rn.f32.s16 %r14074, %rs1117; 2026-02-21T10:22:35.5027137Z cvt.rn.f32.s16 %r14075, %rs1120; 2026-02-21T10:22:35.5027206Z bar.sync 0; 2026-02-21T10:22:35.5027273Z st.shared.b32 [%r138], %r14060; 2026-02-21T10:22:35.5027351Z st.shared.b32 [%r138+16384], %r14068; 2026-02-21T10:22:35.5027415Z st.shared.b32 [%r139], %r14061; 2026-02-21T10:22:35.5027484Z st.shared.b32 [%r139+16384], %r14069; 2026-02-21T10:22:35.5027552Z st.shared.b32 [%r140], %r14062; 2026-02-21T10:22:35.5027621Z st.shared.b32 [%r140+16384], %r14070; 2026-02-21T10:22:35.5027686Z st.shared.b32 [%r141], %r14063; 2026-02-21T10:22:35.5027756Z st.shared.b32 [%r141+16384], %r14071; 2026-02-21T10:22:35.5027824Z st.shared.b32 [%r142], %r14064; 2026-02-21T10:22:35.5027891Z st.shared.b32 [%r142+16384], %r14072; 2026-02-21T10:22:35.5027955Z st.shared.b32 [%r143], %r14065; 2026-02-21T10:22:35.5028024Z st.shared.b32 [%r143+16384], %r14073; 2026-02-21T10:22:35.5028090Z st.shared.b32 [%r144], %r14066; 
2026-02-21T10:22:35.5028156Z st.shared.b32 [%r144+16384], %r14074; 2026-02-21T10:22:35.5028221Z st.shared.b32 [%r145], %r14067; 2026-02-21T10:22:35.5028291Z st.shared.b32 [%r145+16384], %r14075; 2026-02-21T10:22:35.5028348Z $L__tmp19: 2026-02-21T10:22:35.5028687Z .loc 2 291 36 // standard.py:291:36 @[ co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:90:40 ] 2026-02-21T10:22:35.5028758Z // begin inline asm 2026-02-21T10:22:35.5028840Z fence.proxy.async.shared::cta; 2026-02-21T10:22:35.5028898Z // end inline asm 2026-02-21T10:22:35.5028957Z bar.sync 0; 2026-02-21T10:22:35.5029030Z wgmma.fence.sync.aligned; 2026-02-21T10:22:35.5029091Z // begin inline asm 2026-02-21T10:22:35.5029623Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794}, {%r11351,%r11352,%r11353,%r11354}, %rd469, %p343, 1, 1; 2026-02-21T10:22:35.5029686Z // end inline asm 2026-02-21T10:22:35.5029745Z // begin inline asm 2026-02-21T10:22:35.5030256Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794}, {%r11387,%r11388,%r11389,%r11390}, %rd470, %p343, 1, 1; 2026-02-21T10:22:35.5030404Z // end inline asm 2026-02-21T10:22:35.5030465Z // begin inline asm 2026-02-21T10:22:35.5030972Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794}, {%r11423,%r11424,%r11425,%r11426}, %rd471, %p343, 1, 1; 2026-02-21T10:22:35.5031101Z // end inline asm 2026-02-21T10:22:35.5031163Z // begin inline asm 2026-02-21T10:22:35.5031669Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794}, {%r11459,%r11460,%r11461,%r11462}, %rd472, %p343, 1, 1; 2026-02-21T10:22:35.5031730Z // end 
inline asm 2026-02-21T10:22:35.5031788Z // begin inline asm 2026-02-21T10:22:35.5032349Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794}, {%r11495,%r11496,%r11497,%r11498}, %rd473, %p343, 1, 1; 2026-02-21T10:22:35.5032417Z // end inline asm 2026-02-21T10:22:35.5032475Z // begin inline asm 2026-02-21T10:22:35.5032978Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794}, {%r11531,%r11532,%r11533,%r11534}, %rd474, %p343, 1, 1; 2026-02-21T10:22:35.5033086Z // end inline asm 2026-02-21T10:22:35.5033149Z // begin inline asm 2026-02-21T10:22:35.5033656Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794}, {%r11567,%r11568,%r11569,%r11570}, %rd475, %p343, 1, 1; 2026-02-21T10:22:35.5033721Z // end inline asm 2026-02-21T10:22:35.5033780Z // begin inline asm 2026-02-21T10:22:35.5034285Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794}, {%r11603,%r11604,%r11605,%r11606}, %rd476, %p343, 1, 1; 2026-02-21T10:22:35.5034344Z // end inline asm 2026-02-21T10:22:35.5034407Z // begin inline asm 2026-02-21T10:22:35.5034974Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082}, {%r11351,%r11352,%r11353,%r11354}, %rd477, %p343, 1, 1; 2026-02-21T10:22:35.5035031Z // end inline asm 2026-02-21T10:22:35.5035095Z // begin inline asm 2026-02-21T10:22:35.5035652Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082}, {%r11387,%r11388,%r11389,%r11390}, %rd478, %p343, 1, 1; 2026-02-21T10:22:35.5035709Z // end inline asm 2026-02-21T10:22:35.5035777Z // begin inline asm 2026-02-21T10:22:35.5036335Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082}, {%r11423,%r11424,%r11425,%r11426}, %rd479, %p343, 1, 1; 2026-02-21T10:22:35.5036393Z // end inline asm 2026-02-21T10:22:35.5036581Z // begin inline asm 2026-02-21T10:22:35.5037148Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082}, {%r11459,%r11460,%r11461,%r11462}, %rd480, %p343, 1, 1; 2026-02-21T10:22:35.5037205Z // end inline asm 2026-02-21T10:22:35.5037265Z // begin inline asm 2026-02-21T10:22:35.5037821Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082}, {%r11495,%r11496,%r11497,%r11498}, %rd481, %p343, 1, 1; 2026-02-21T10:22:35.5037880Z // end inline asm 2026-02-21T10:22:35.5037942Z // begin inline asm 2026-02-21T10:22:35.5038596Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082}, {%r11531,%r11532,%r11533,%r11534}, %rd482, %p343, 1, 1; 2026-02-21T10:22:35.5038732Z // end inline asm 2026-02-21T10:22:35.5038795Z // begin inline asm 2026-02-21T10:22:35.5039352Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082}, {%r11567,%r11568,%r11569,%r11570}, %rd483, 
%p343, 1, 1; 2026-02-21T10:22:35.5039409Z // end inline asm 2026-02-21T10:22:35.5039468Z // begin inline asm 2026-02-21T10:22:35.5040034Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082}, {%r11603,%r11604,%r11605,%r11606}, %rd484, %p343, 1, 1; 2026-02-21T10:22:35.5040094Z // end inline asm 2026-02-21T10:22:35.5040215Z // begin inline asm 2026-02-21T10:22:35.5040776Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370}, {%r11351,%r11352,%r11353,%r11354}, %rd485, %p343, 1, 1; 2026-02-21T10:22:35.5040836Z // end inline asm 2026-02-21T10:22:35.5040896Z // begin inline asm 2026-02-21T10:22:35.5041513Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370}, {%r11387,%r11388,%r11389,%r11390}, %rd486, %p343, 1, 1; 2026-02-21T10:22:35.5041574Z // end inline asm 2026-02-21T10:22:35.5041634Z // begin inline asm 2026-02-21T10:22:35.5042200Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370}, {%r11423,%r11424,%r11425,%r11426}, %rd487, %p343, 1, 1; 2026-02-21T10:22:35.5042261Z // end inline asm 2026-02-21T10:22:35.5042320Z // begin inline asm 2026-02-21T10:22:35.5042882Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370}, {%r11459,%r11460,%r11461,%r11462}, %rd488, %p343, 1, 1; 2026-02-21T10:22:35.5042944Z // end inline asm 2026-02-21T10:22:35.5043005Z // begin inline asm 2026-02-21T10:22:35.5043563Z 
wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370}, {%r11495,%r11496,%r11497,%r11498}, %rd489, %p343, 1, 1; 2026-02-21T10:22:35.5043620Z // end inline asm 2026-02-21T10:22:35.5043681Z // begin inline asm 2026-02-21T10:22:35.5044240Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370}, {%r11531,%r11532,%r11533,%r11534}, %rd490, %p343, 1, 1; 2026-02-21T10:22:35.5044300Z // end inline asm 2026-02-21T10:22:35.5044360Z // begin inline asm 2026-02-21T10:22:35.5044926Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370}, {%r11567,%r11568,%r11569,%r11570}, %rd491, %p343, 1, 1; 2026-02-21T10:22:35.5044997Z // end inline asm 2026-02-21T10:22:35.5045058Z // begin inline asm 2026-02-21T10:22:35.5045618Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370}, {%r11603,%r11604,%r11605,%r11606}, %rd492, %p343, 1, 1; 2026-02-21T10:22:35.5045675Z // end inline asm 2026-02-21T10:22:35.5045736Z // begin inline asm 2026-02-21T10:22:35.5046295Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658}, {%r11351,%r11352,%r11353,%r11354}, %rd493, %p343, 1, 1; 2026-02-21T10:22:35.5046413Z // end inline asm 2026-02-21T10:22:35.5046643Z // begin inline asm 2026-02-21T10:22:35.5047210Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658}, {%r11387,%r11388,%r11389,%r11390}, %rd494, %p343, 1, 1; 2026-02-21T10:22:35.5047271Z // end inline asm 2026-02-21T10:22:35.5047330Z // begin inline asm 2026-02-21T10:22:35.5047887Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658}, {%r11423,%r11424,%r11425,%r11426}, %rd495, %p343, 1, 1; 2026-02-21T10:22:35.5047948Z // end inline asm 2026-02-21T10:22:35.5048011Z // begin inline asm 2026-02-21T10:22:35.5048633Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658}, {%r11459,%r11460,%r11461,%r11462}, %rd496, %p343, 1, 1; 2026-02-21T10:22:35.5048702Z // end inline asm 2026-02-21T10:22:35.5048761Z // begin inline asm 2026-02-21T10:22:35.5049374Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658}, {%r11495,%r11496,%r11497,%r11498}, %rd497, %p343, 1, 1; 2026-02-21T10:22:35.5049440Z // end inline asm 2026-02-21T10:22:35.5049499Z // begin inline asm 2026-02-21T10:22:35.5050053Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658}, {%r11531,%r11532,%r11533,%r11534}, %rd498, %p343, 1, 1; 2026-02-21T10:22:35.5050119Z // end inline asm 2026-02-21T10:22:35.5050180Z // begin inline asm 2026-02-21T10:22:35.5050734Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658}, {%r11567,%r11568,%r11569,%r11570}, %rd499, 
%p343, 1, 1; 2026-02-21T10:22:35.5050798Z // end inline asm 2026-02-21T10:22:35.5050858Z // begin inline asm 2026-02-21T10:22:35.5051430Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658}, {%r11603,%r11604,%r11605,%r11606}, %rd500, %p343, 1, 1; 2026-02-21T10:22:35.5051493Z // end inline asm 2026-02-21T10:22:35.5051570Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:35.5051633Z mov.b32 %r12247, %r11026; 2026-02-21T10:22:35.5051694Z mov.b32 %r12248, %r13944; 2026-02-21T10:22:35.5051760Z mov.b32 %r12249, %r13944; 2026-02-21T10:22:35.5051824Z // begin inline asm 2026-02-21T10:22:35.5053072Z // wait for regs: %r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794,%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082,%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370,%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658,%r12247,%r12248,%r12249 2026-02-21T10:22:35.5053157Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:35.5053215Z // end inline asm 2026-02-21T10:22:35.5053269Z $L__tmp20: 2026-02-21T10:22:35.5053485Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.5053550Z add.s32 %r14076, %r11026, 83968; 2026-02-21T10:22:35.5053614Z add.s32 %r14077, %r14076, %r14004; 2026-02-21T10:22:35.5053929Z .loc 1 58 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:58:32 2026-02-21T10:22:35.5053994Z add.s32 %r14078, %r14077, %r129; 2026-02-21T10:22:35.5054065Z ld.shared.b16 %rs1121, [%r14078]; 2026-02-21T10:22:35.5054206Z ld.shared.b16 %rs1122, [%r14078+1024]; 2026-02-21T10:22:35.5054275Z ld.shared.b16 
%rs1123, [%r14078+64]; 2026-02-21T10:22:35.5054347Z ld.shared.b16 %rs1124, [%r14078+1088]; 2026-02-21T10:22:35.5054411Z add.s32 %r14079, %r14077, %r130; 2026-02-21T10:22:35.5054482Z ld.shared.b16 %rs1125, [%r14079]; 2026-02-21T10:22:35.5054562Z ld.shared.b16 %rs1126, [%r14079+1024]; 2026-02-21T10:22:35.5054634Z ld.shared.b16 %rs1127, [%r14079+64]; 2026-02-21T10:22:35.5054706Z ld.shared.b16 %rs1128, [%r14079+1088]; 2026-02-21T10:22:35.5054769Z add.s32 %r14080, %r14077, %r131; 2026-02-21T10:22:35.5054835Z ld.shared.b16 %rs1129, [%r14080]; 2026-02-21T10:22:35.5054904Z ld.shared.b16 %rs1130, [%r14080+1024]; 2026-02-21T10:22:35.5055026Z ld.shared.b16 %rs1131, [%r14080+64]; 2026-02-21T10:22:35.5055095Z ld.shared.b16 %rs1132, [%r14080+1088]; 2026-02-21T10:22:35.5055159Z add.s32 %r14081, %r14077, %r132; 2026-02-21T10:22:35.5055233Z ld.shared.b16 %rs1133, [%r14081]; 2026-02-21T10:22:35.5055303Z ld.shared.b16 %rs1134, [%r14081+1024]; 2026-02-21T10:22:35.5055370Z ld.shared.b16 %rs1135, [%r14081+64]; 2026-02-21T10:22:35.5055491Z ld.shared.b16 %rs1136, [%r14081+1088]; 2026-02-21T10:22:35.5055558Z add.s32 %r14082, %r14077, %r133; 2026-02-21T10:22:35.5055622Z ld.shared.b16 %rs1137, [%r14082]; 2026-02-21T10:22:35.5055691Z ld.shared.b16 %rs1138, [%r14082+1024]; 2026-02-21T10:22:35.5055762Z ld.shared.b16 %rs1139, [%r14082+64]; 2026-02-21T10:22:35.5055841Z ld.shared.b16 %rs1140, [%r14082+1088]; 2026-02-21T10:22:35.5055904Z add.s32 %r14083, %r14077, %r134; 2026-02-21T10:22:35.5055972Z ld.shared.b16 %rs1141, [%r14083]; 2026-02-21T10:22:35.5056041Z ld.shared.b16 %rs1142, [%r14083+1024]; 2026-02-21T10:22:35.5056112Z ld.shared.b16 %rs1143, [%r14083+64]; 2026-02-21T10:22:35.5056182Z ld.shared.b16 %rs1144, [%r14083+1088]; 2026-02-21T10:22:35.5056247Z add.s32 %r14084, %r14077, %r135; 2026-02-21T10:22:35.5056314Z ld.shared.b16 %rs1145, [%r14084]; 2026-02-21T10:22:35.5056388Z ld.shared.b16 %rs1146, [%r14084+1024]; 2026-02-21T10:22:35.5056581Z ld.shared.b16 %rs1147, [%r14084+64]; 
2026-02-21T10:22:35.5056656Z ld.shared.b16 %rs1148, [%r14084+1088]; 2026-02-21T10:22:35.5056722Z add.s32 %r14085, %r14077, %r136; 2026-02-21T10:22:35.5056788Z ld.shared.b16 %rs1149, [%r14085]; 2026-02-21T10:22:35.5056860Z ld.shared.b16 %rs1150, [%r14085+1024]; 2026-02-21T10:22:35.5056926Z ld.shared.b16 %rs1151, [%r14085+64]; 2026-02-21T10:22:35.5056996Z ld.shared.b16 %rs1152, [%r14085+1088]; 2026-02-21T10:22:35.5057068Z cvt.f32.bf16 %r12639, %rs1121; 2026-02-21T10:22:35.5057133Z cvt.f32.bf16 %r12640, %rs1122; 2026-02-21T10:22:35.5057207Z cvt.f32.bf16 %r12641, %rs1125; 2026-02-21T10:22:35.5057278Z cvt.f32.bf16 %r12642, %rs1126; 2026-02-21T10:22:35.5057346Z cvt.f32.bf16 %r12675, %rs1129; 2026-02-21T10:22:35.5057407Z cvt.f32.bf16 %r12676, %rs1130; 2026-02-21T10:22:35.5057469Z cvt.f32.bf16 %r12677, %rs1133; 2026-02-21T10:22:35.5057538Z cvt.f32.bf16 %r12678, %rs1134; 2026-02-21T10:22:35.5057606Z cvt.f32.bf16 %r12711, %rs1137; 2026-02-21T10:22:35.5057672Z cvt.f32.bf16 %r12712, %rs1138; 2026-02-21T10:22:35.5057736Z cvt.f32.bf16 %r12713, %rs1141; 2026-02-21T10:22:35.5057800Z cvt.f32.bf16 %r12714, %rs1142; 2026-02-21T10:22:35.5057862Z cvt.f32.bf16 %r12747, %rs1145; 2026-02-21T10:22:35.5057924Z cvt.f32.bf16 %r12748, %rs1146; 2026-02-21T10:22:35.5057989Z cvt.f32.bf16 %r12749, %rs1149; 2026-02-21T10:22:35.5058050Z cvt.f32.bf16 %r12750, %rs1150; 2026-02-21T10:22:35.5058123Z cvt.f32.bf16 %r12783, %rs1123; 2026-02-21T10:22:35.5058190Z cvt.f32.bf16 %r12784, %rs1124; 2026-02-21T10:22:35.5058252Z cvt.f32.bf16 %r12785, %rs1127; 2026-02-21T10:22:35.5058315Z cvt.f32.bf16 %r12786, %rs1128; 2026-02-21T10:22:35.5058460Z cvt.f32.bf16 %r12819, %rs1131; 2026-02-21T10:22:35.5058529Z cvt.f32.bf16 %r12820, %rs1132; 2026-02-21T10:22:35.5058592Z cvt.f32.bf16 %r12821, %rs1135; 2026-02-21T10:22:35.5058653Z cvt.f32.bf16 %r12822, %rs1136; 2026-02-21T10:22:35.5058783Z cvt.f32.bf16 %r12855, %rs1139; 2026-02-21T10:22:35.5058844Z cvt.f32.bf16 %r12856, %rs1140; 2026-02-21T10:22:35.5058906Z cvt.f32.bf16 
%r12857, %rs1143; 2026-02-21T10:22:35.5058974Z cvt.f32.bf16 %r12858, %rs1144; 2026-02-21T10:22:35.5059036Z cvt.f32.bf16 %r12891, %rs1147; 2026-02-21T10:22:35.5059098Z cvt.f32.bf16 %r12892, %rs1148; 2026-02-21T10:22:35.5066123Z cvt.f32.bf16 %r12893, %rs1151; 2026-02-21T10:22:35.5066233Z cvt.f32.bf16 %r12894, %rs1152; 2026-02-21T10:22:35.5066631Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.5066722Z add.s32 %r12317, %r9371, %r14015; 2026-02-21T10:22:35.5066785Z // begin inline asm 2026-02-21T10:22:35.5066838Z 2026-02-21T10:22:35.5066897Z { 2026-02-21T10:22:35.5067089Z .reg .pred complete; 2026-02-21T10:22:35.5067151Z waitLoop: 2026-02-21T10:22:35.5067311Z mbarrier.try_wait.parity.shared.b64 complete, [%r12317], %r14286; 2026-02-21T10:22:35.5067385Z @!complete bra.uni waitLoop; 2026-02-21T10:22:35.5067439Z } 2026-02-21T10:22:35.5067445Z 2026-02-21T10:22:35.5067503Z // end inline asm 2026-02-21T10:22:35.5067801Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.5067871Z add.s32 %r14088, %r9400, %r14003; 2026-02-21T10:22:35.5068094Z .loc 1 78 58 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:78:58 2026-02-21T10:22:35.5068168Z add.s32 %r14089, %r14088, %r137; 2026-02-21T10:22:35.5068239Z ld.shared.b8 %rs1153, [%r14089]; 2026-02-21T10:22:35.5068312Z ld.shared.b8 %rs1154, [%r14089+1024]; 2026-02-21T10:22:35.5068379Z ld.shared.b8 %rs1155, [%r14089+2048]; 2026-02-21T10:22:35.5068551Z ld.shared.b8 %rs1156, [%r14089+3072]; 2026-02-21T10:22:35.5068624Z add.s32 %r14090, %r14088, %r14020; 2026-02-21T10:22:35.5068695Z ld.shared.b8 %rs1157, [%r14090+256]; 2026-02-21T10:22:35.5068768Z ld.shared.b8 %rs1158, [%r14090+1280]; 2026-02-21T10:22:35.5068834Z ld.shared.b8 %rs1159, [%r14090+2304]; 2026-02-21T10:22:35.5068902Z ld.shared.b8 %rs1160, [%r14090+3328]; 2026-02-21T10:22:35.5068968Z add.s32 %r14091, %r14088, %r14022; 2026-02-21T10:22:35.5069037Z ld.shared.b8 %rs1161, 
[%r14091+512]; 2026-02-21T10:22:35.5069113Z ld.shared.b8 %rs1162, [%r14091+1536]; 2026-02-21T10:22:35.5069181Z ld.shared.b8 %rs1163, [%r14091+2560]; 2026-02-21T10:22:35.5069252Z ld.shared.b8 %rs1164, [%r14091+3584]; 2026-02-21T10:22:35.5069317Z add.s32 %r14092, %r14088, %r14024; 2026-02-21T10:22:35.5069382Z ld.shared.b8 %rs1165, [%r14092+768]; 2026-02-21T10:22:35.5069449Z ld.shared.b8 %rs1166, [%r14092+1792]; 2026-02-21T10:22:35.5069523Z ld.shared.b8 %rs1167, [%r14092+2816]; 2026-02-21T10:22:35.5069586Z ld.shared.b8 %rs1168, [%r14092+3840]; 2026-02-21T10:22:35.5069802Z .loc 1 63 28 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:63:28 2026-02-21T10:22:35.5069869Z shl.b16 %rs1169, %rs1153, 4; 2026-02-21T10:22:35.5069932Z shl.b16 %rs1170, %rs1157, 4; 2026-02-21T10:22:35.5069995Z shl.b16 %rs1171, %rs1161, 4; 2026-02-21T10:22:35.5070061Z shl.b16 %rs1172, %rs1165, 4; 2026-02-21T10:22:35.5070122Z shl.b16 %rs1173, %rs1154, 4; 2026-02-21T10:22:35.5070184Z shl.b16 %rs1174, %rs1158, 4; 2026-02-21T10:22:35.5070250Z shl.b16 %rs1175, %rs1162, 4; 2026-02-21T10:22:35.5070309Z shl.b16 %rs1176, %rs1166, 4; 2026-02-21T10:22:35.5070369Z shl.b16 %rs1177, %rs1155, 4; 2026-02-21T10:22:35.5070428Z shl.b16 %rs1178, %rs1159, 4; 2026-02-21T10:22:35.5070494Z shl.b16 %rs1179, %rs1163, 4; 2026-02-21T10:22:35.5070555Z shl.b16 %rs1180, %rs1167, 4; 2026-02-21T10:22:35.5070618Z shl.b16 %rs1181, %rs1156, 4; 2026-02-21T10:22:35.5070679Z shl.b16 %rs1182, %rs1160, 4; 2026-02-21T10:22:35.5070740Z shl.b16 %rs1183, %rs1164, 4; 2026-02-21T10:22:35.5070896Z shl.b16 %rs1184, %rs1168, 4; 2026-02-21T10:22:35.5071103Z .loc 1 78 58 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:78:58 2026-02-21T10:22:35.5071192Z selp.b16 %rs1185, %rs1169, %rs1153, %p484; 2026-02-21T10:22:35.5071319Z cvt.s16.s8 %rs1186, %rs1185; 2026-02-21T10:22:35.5071381Z shr.s16 %rs1187, %rs1186, 4; 2026-02-21T10:22:35.5071464Z selp.b16 %rs1188, %rs1170, %rs1157, %p484; 2026-02-21T10:22:35.5071525Z cvt.s16.s8 %rs1189, 
%rs1188; 2026-02-21T10:22:35.5071585Z shr.s16 %rs1190, %rs1189, 4; 2026-02-21T10:22:35.5071660Z selp.b16 %rs1191, %rs1171, %rs1161, %p484; 2026-02-21T10:22:35.5071721Z cvt.s16.s8 %rs1192, %rs1191; 2026-02-21T10:22:35.5071781Z shr.s16 %rs1193, %rs1192, 4; 2026-02-21T10:22:35.5071856Z selp.b16 %rs1194, %rs1172, %rs1165, %p484; 2026-02-21T10:22:35.5071921Z cvt.s16.s8 %rs1195, %rs1194; 2026-02-21T10:22:35.5071992Z shr.s16 %rs1196, %rs1195, 4; 2026-02-21T10:22:35.5072069Z selp.b16 %rs1197, %rs1173, %rs1154, %p484; 2026-02-21T10:22:35.5072189Z cvt.s16.s8 %rs1198, %rs1197; 2026-02-21T10:22:35.5072255Z shr.s16 %rs1199, %rs1198, 4; 2026-02-21T10:22:35.5072330Z selp.b16 %rs1200, %rs1174, %rs1158, %p484; 2026-02-21T10:22:35.5072392Z cvt.s16.s8 %rs1201, %rs1200; 2026-02-21T10:22:35.5072459Z shr.s16 %rs1202, %rs1201, 4; 2026-02-21T10:22:35.5072531Z selp.b16 %rs1203, %rs1175, %rs1162, %p484; 2026-02-21T10:22:35.5072640Z cvt.s16.s8 %rs1204, %rs1203; 2026-02-21T10:22:35.5072704Z shr.s16 %rs1205, %rs1204, 4; 2026-02-21T10:22:35.5072777Z selp.b16 %rs1206, %rs1176, %rs1166, %p484; 2026-02-21T10:22:35.5072837Z cvt.s16.s8 %rs1207, %rs1206; 2026-02-21T10:22:35.5072904Z shr.s16 %rs1208, %rs1207, 4; 2026-02-21T10:22:35.5072978Z selp.b16 %rs1209, %rs1177, %rs1155, %p484; 2026-02-21T10:22:35.5073040Z cvt.s16.s8 %rs1210, %rs1209; 2026-02-21T10:22:35.5073100Z shr.s16 %rs1211, %rs1210, 4; 2026-02-21T10:22:35.5073176Z selp.b16 %rs1212, %rs1178, %rs1159, %p484; 2026-02-21T10:22:35.5073241Z cvt.s16.s8 %rs1213, %rs1212; 2026-02-21T10:22:35.5073305Z shr.s16 %rs1214, %rs1213, 4; 2026-02-21T10:22:35.5073381Z selp.b16 %rs1215, %rs1179, %rs1163, %p484; 2026-02-21T10:22:35.5073453Z cvt.s16.s8 %rs1216, %rs1215; 2026-02-21T10:22:35.5073520Z shr.s16 %rs1217, %rs1216, 4; 2026-02-21T10:22:35.5073596Z selp.b16 %rs1218, %rs1180, %rs1167, %p484; 2026-02-21T10:22:35.5073658Z cvt.s16.s8 %rs1219, %rs1218; 2026-02-21T10:22:35.5073721Z shr.s16 %rs1220, %rs1219, 4; 2026-02-21T10:22:35.5073794Z selp.b16 %rs1221, 
%rs1181, %rs1156, %p484; 2026-02-21T10:22:35.5073858Z cvt.s16.s8 %rs1222, %rs1221; 2026-02-21T10:22:35.5073918Z shr.s16 %rs1223, %rs1222, 4; 2026-02-21T10:22:35.5073990Z selp.b16 %rs1224, %rs1182, %rs1160, %p484; 2026-02-21T10:22:35.5074054Z cvt.s16.s8 %rs1225, %rs1224; 2026-02-21T10:22:35.5074115Z shr.s16 %rs1226, %rs1225, 4; 2026-02-21T10:22:35.5074186Z selp.b16 %rs1227, %rs1183, %rs1164, %p484; 2026-02-21T10:22:35.5074246Z cvt.s16.s8 %rs1228, %rs1227; 2026-02-21T10:22:35.5074313Z shr.s16 %rs1229, %rs1228, 4; 2026-02-21T10:22:35.5074388Z selp.b16 %rs1230, %rs1184, %rs1168, %p484; 2026-02-21T10:22:35.5074448Z cvt.s16.s8 %rs1231, %rs1230; 2026-02-21T10:22:35.5074514Z shr.s16 %rs1232, %rs1231, 4; 2026-02-21T10:22:35.5074729Z .loc 1 83 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:83:32 2026-02-21T10:22:35.5074797Z cvt.rn.f32.s16 %r14093, %rs1187; 2026-02-21T10:22:35.5074862Z cvt.rn.f32.s16 %r14094, %rs1190; 2026-02-21T10:22:35.5074930Z cvt.rn.f32.s16 %r14095, %rs1193; 2026-02-21T10:22:35.5074992Z cvt.rn.f32.s16 %r14096, %rs1196; 2026-02-21T10:22:35.5075055Z cvt.rn.f32.s16 %r14097, %rs1199; 2026-02-21T10:22:35.5075120Z cvt.rn.f32.s16 %r14098, %rs1202; 2026-02-21T10:22:35.5075181Z cvt.rn.f32.s16 %r14099, %rs1205; 2026-02-21T10:22:35.5075243Z cvt.rn.f32.s16 %r14100, %rs1208; 2026-02-21T10:22:35.5075309Z cvt.rn.f32.s16 %r14101, %rs1211; 2026-02-21T10:22:35.5075371Z cvt.rn.f32.s16 %r14102, %rs1214; 2026-02-21T10:22:35.5075505Z cvt.rn.f32.s16 %r14103, %rs1217; 2026-02-21T10:22:35.5075566Z cvt.rn.f32.s16 %r14104, %rs1220; 2026-02-21T10:22:35.5075633Z cvt.rn.f32.s16 %r14105, %rs1223; 2026-02-21T10:22:35.5075695Z cvt.rn.f32.s16 %r14106, %rs1226; 2026-02-21T10:22:35.5075803Z cvt.rn.f32.s16 %r14107, %rs1229; 2026-02-21T10:22:35.5075868Z cvt.rn.f32.s16 %r14108, %rs1232; 2026-02-21T10:22:35.5075927Z bar.sync 0; 2026-02-21T10:22:35.5075997Z st.shared.b32 [%r138], %r14093; 2026-02-21T10:22:35.5076067Z st.shared.b32 [%r138+16384], %r14101; 
2026-02-21T10:22:35.5076135Z st.shared.b32 [%r139], %r14094; 2026-02-21T10:22:35.5076202Z st.shared.b32 [%r139+16384], %r14102; 2026-02-21T10:22:35.5076275Z st.shared.b32 [%r140], %r14095; 2026-02-21T10:22:35.5076347Z st.shared.b32 [%r140+16384], %r14103; 2026-02-21T10:22:35.5076412Z st.shared.b32 [%r141], %r14096; 2026-02-21T10:22:35.5076602Z st.shared.b32 [%r141+16384], %r14104; 2026-02-21T10:22:35.5076671Z st.shared.b32 [%r142], %r14097; 2026-02-21T10:22:35.5076829Z st.shared.b32 [%r142+16384], %r14105; 2026-02-21T10:22:35.5076896Z st.shared.b32 [%r143], %r14098; 2026-02-21T10:22:35.5076963Z st.shared.b32 [%r143+16384], %r14106; 2026-02-21T10:22:35.5077031Z st.shared.b32 [%r144], %r14099; 2026-02-21T10:22:35.5077100Z st.shared.b32 [%r144+16384], %r14107; 2026-02-21T10:22:35.5077167Z st.shared.b32 [%r145], %r14100; 2026-02-21T10:22:35.5077236Z st.shared.b32 [%r145+16384], %r14108; 2026-02-21T10:22:35.5077377Z $L__tmp21: 2026-02-21T10:22:35.5077660Z .loc 2 291 36 // standard.py:291:36 @[ co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:90:40 ] 2026-02-21T10:22:35.5077725Z // begin inline asm 2026-02-21T10:22:35.5077815Z fence.proxy.async.shared::cta; 2026-02-21T10:22:35.5077872Z // end inline asm 2026-02-21T10:22:35.5077928Z bar.sync 0; 2026-02-21T10:22:35.5078004Z wgmma.fence.sync.aligned; 2026-02-21T10:22:35.5078064Z // begin inline asm 2026-02-21T10:22:35.5078603Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794}, {%r12639,%r12640,%r12641,%r12642}, %rd469, %p343, 1, 1; 2026-02-21T10:22:35.5078670Z // end inline asm 2026-02-21T10:22:35.5078732Z // begin inline asm 2026-02-21T10:22:35.5079253Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794}, {%r12675,%r12676,%r12677,%r12678}, %rd470, %p343, 1, 1; 2026-02-21T10:22:35.5079316Z 
// end inline asm 2026-02-21T10:22:35.5079375Z // begin inline asm 2026-02-21T10:22:35.5079882Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794}, {%r12711,%r12712,%r12713,%r12714}, %rd471, %p343, 1, 1; 2026-02-21T10:22:35.5079942Z // end inline asm 2026-02-21T10:22:35.5080003Z // begin inline asm 2026-02-21T10:22:35.5080505Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794}, {%r12747,%r12748,%r12749,%r12750}, %rd472, %p343, 1, 1; 2026-02-21T10:22:35.5080565Z // end inline asm 2026-02-21T10:22:35.5080630Z // begin inline asm 2026-02-21T10:22:35.5081144Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794}, {%r12783,%r12784,%r12785,%r12786}, %rd473, %p343, 1, 1; 2026-02-21T10:22:35.5081204Z // end inline asm 2026-02-21T10:22:35.5081266Z // begin inline asm 2026-02-21T10:22:35.5081780Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794}, {%r12819,%r12820,%r12821,%r12822}, %rd474, %p343, 1, 1; 2026-02-21T10:22:35.5081839Z // end inline asm 2026-02-21T10:22:35.5081901Z // begin inline asm 2026-02-21T10:22:35.5082489Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794}, {%r12855,%r12856,%r12857,%r12858}, %rd475, %p343, 1, 1; 2026-02-21T10:22:35.5082603Z // end inline asm 2026-02-21T10:22:35.5082665Z // begin inline asm 2026-02-21T10:22:35.5083174Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794}, {%r12891,%r12892,%r12893,%r12894}, %rd476, %p343, 1, 1; 2026-02-21T10:22:35.5083243Z // end inline asm 2026-02-21T10:22:35.5083309Z // begin inline asm 2026-02-21T10:22:35.5083870Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082}, {%r12639,%r12640,%r12641,%r12642}, %rd477, %p343, 1, 1; 2026-02-21T10:22:35.5083928Z // end inline asm 2026-02-21T10:22:35.5083991Z // begin inline asm 2026-02-21T10:22:35.5084594Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082}, {%r12675,%r12676,%r12677,%r12678}, %rd478, %p343, 1, 1; 2026-02-21T10:22:35.5084656Z // end inline asm 2026-02-21T10:22:35.5084724Z // begin inline asm 2026-02-21T10:22:35.5085331Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082}, {%r12711,%r12712,%r12713,%r12714}, %rd479, %p343, 1, 1; 2026-02-21T10:22:35.5085393Z // end inline asm 2026-02-21T10:22:35.5085462Z // begin inline asm 2026-02-21T10:22:35.5086015Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082}, {%r12747,%r12748,%r12749,%r12750}, %rd480, %p343, 1, 1; 2026-02-21T10:22:35.5086078Z // end inline asm 2026-02-21T10:22:35.5086136Z // begin inline asm 2026-02-21T10:22:35.5086812Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082}, {%r12783,%r12784,%r12785,%r12786}, %rd481, %p343, 1, 1; 
2026-02-21T10:22:35.5086875Z // end inline asm 2026-02-21T10:22:35.5086936Z // begin inline asm 2026-02-21T10:22:35.5087495Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082}, {%r12819,%r12820,%r12821,%r12822}, %rd482, %p343, 1, 1; 2026-02-21T10:22:35.5087557Z // end inline asm 2026-02-21T10:22:35.5087615Z // begin inline asm 2026-02-21T10:22:35.5088171Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082}, {%r12855,%r12856,%r12857,%r12858}, %rd483, %p343, 1, 1; 2026-02-21T10:22:35.5088231Z // end inline asm 2026-02-21T10:22:35.5088290Z // begin inline asm 2026-02-21T10:22:35.5088839Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082}, {%r12891,%r12892,%r12893,%r12894}, %rd484, %p343, 1, 1; 2026-02-21T10:22:35.5088901Z // end inline asm 2026-02-21T10:22:35.5088961Z // begin inline asm 2026-02-21T10:22:35.5089514Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370}, {%r12639,%r12640,%r12641,%r12642}, %rd485, %p343, 1, 1; 2026-02-21T10:22:35.5089573Z // end inline asm 2026-02-21T10:22:35.5089641Z // begin inline asm 2026-02-21T10:22:35.5090198Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370}, {%r12675,%r12676,%r12677,%r12678}, %rd486, %p343, 1, 1; 2026-02-21T10:22:35.5090342Z // end inline asm 2026-02-21T10:22:35.5090400Z // begin inline asm 2026-02-21T10:22:35.5091020Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370}, {%r12711,%r12712,%r12713,%r12714}, %rd487, %p343, 1, 1; 2026-02-21T10:22:35.5091082Z // end inline asm 2026-02-21T10:22:35.5091141Z // begin inline asm 2026-02-21T10:22:35.5091688Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370}, {%r12747,%r12748,%r12749,%r12750}, %rd488, %p343, 1, 1; 2026-02-21T10:22:35.5091748Z // end inline asm 2026-02-21T10:22:35.5091807Z // begin inline asm 2026-02-21T10:22:35.5092443Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370}, {%r12783,%r12784,%r12785,%r12786}, %rd489, %p343, 1, 1; 2026-02-21T10:22:35.5092511Z // end inline asm 2026-02-21T10:22:35.5092570Z // begin inline asm 2026-02-21T10:22:35.5093177Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370}, {%r12819,%r12820,%r12821,%r12822}, %rd490, %p343, 1, 1; 2026-02-21T10:22:35.5093241Z // end inline asm 2026-02-21T10:22:35.5093299Z // begin inline asm 2026-02-21T10:22:35.5093853Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370}, {%r12855,%r12856,%r12857,%r12858}, %rd491, %p343, 1, 1; 2026-02-21T10:22:35.5093917Z // end inline asm 2026-02-21T10:22:35.5093981Z // begin inline asm 2026-02-21T10:22:35.5094531Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370}, {%r12891,%r12892,%r12893,%r12894}, %rd492, 
%p343, 1, 1; 2026-02-21T10:22:35.5094604Z // end inline asm 2026-02-21T10:22:35.5094664Z // begin inline asm 2026-02-21T10:22:35.5095216Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658}, {%r12639,%r12640,%r12641,%r12642}, %rd493, %p343, 1, 1; 2026-02-21T10:22:35.5095276Z // end inline asm 2026-02-21T10:22:35.5095333Z // begin inline asm 2026-02-21T10:22:35.5095882Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658}, {%r12675,%r12676,%r12677,%r12678}, %rd494, %p343, 1, 1; 2026-02-21T10:22:35.5095948Z // end inline asm 2026-02-21T10:22:35.5096006Z // begin inline asm 2026-02-21T10:22:35.5096671Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658}, {%r12711,%r12712,%r12713,%r12714}, %rd495, %p343, 1, 1; 2026-02-21T10:22:35.5096737Z // end inline asm 2026-02-21T10:22:35.5096797Z // begin inline asm 2026-02-21T10:22:35.5097345Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658}, {%r12747,%r12748,%r12749,%r12750}, %rd496, %p343, 1, 1; 2026-02-21T10:22:35.5097401Z // end inline asm 2026-02-21T10:22:35.5097464Z // begin inline asm 2026-02-21T10:22:35.5098016Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658}, {%r12783,%r12784,%r12785,%r12786}, %rd497, %p343, 1, 1; 2026-02-21T10:22:35.5098158Z // end inline asm 2026-02-21T10:22:35.5098220Z // begin inline asm 2026-02-21T10:22:35.5098773Z 
wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658}, {%r12819,%r12820,%r12821,%r12822}, %rd498, %p343, 1, 1; 2026-02-21T10:22:35.5098893Z // end inline asm 2026-02-21T10:22:35.5098954Z // begin inline asm 2026-02-21T10:22:35.5099506Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658}, {%r12855,%r12856,%r12857,%r12858}, %rd499, %p343, 1, 1; 2026-02-21T10:22:35.5099565Z // end inline asm 2026-02-21T10:22:35.5099628Z // begin inline asm 2026-02-21T10:22:35.5100237Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658}, {%r12891,%r12892,%r12893,%r12894}, %rd500, %p343, 1, 1; 2026-02-21T10:22:35.5100300Z // end inline asm 2026-02-21T10:22:35.5100388Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:35.5100451Z mov.b32 %r13535, %r11026; 2026-02-21T10:22:35.5100512Z mov.b32 %r13536, %r13944; 2026-02-21T10:22:35.5100571Z mov.b32 %r13537, %r13944; 2026-02-21T10:22:35.5100688Z // begin inline asm 2026-02-21T10:22:35.5101914Z // wait for regs: %r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794,%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082,%r10355,%r10356,%r10357,%r10358,%r10359,%r10360,%r10361,%r10362,%r10363,%r10364,%r10365,%r10366,%r10367,%r10368,%r10369,%r10370,%r10643,%r10644,%r10645,%r10646,%r10647,%r10648,%r10649,%r10650,%r10651,%r10652,%r10653,%r10654,%r10655,%r10656,%r10657,%r10658,%r13535,%r13536,%r13537 2026-02-21T10:22:35.5101998Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:35.5102053Z // end inline asm 2026-02-21T10:22:35.5102109Z 
$L__tmp22: 2026-02-21T10:22:35.5102330Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.5102401Z add.s32 %r14109, %r11026, 108544; 2026-02-21T10:22:35.5102468Z add.s32 %r14110, %r14109, %r14004; 2026-02-21T10:22:35.5102673Z .loc 1 58 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:58:32 2026-02-21T10:22:35.5102740Z add.s32 %r14111, %r14110, %r129; 2026-02-21T10:22:35.5102807Z ld.shared.b16 %rs1233, [%r14111]; 2026-02-21T10:22:35.5102881Z ld.shared.b16 %rs1234, [%r14111+1024]; 2026-02-21T10:22:35.5102951Z ld.shared.b16 %rs1235, [%r14111+64]; 2026-02-21T10:22:35.5103020Z ld.shared.b16 %rs1236, [%r14111+1088]; 2026-02-21T10:22:35.5103094Z add.s32 %r14112, %r14110, %r130; 2026-02-21T10:22:35.5103163Z ld.shared.b16 %rs1237, [%r14112]; 2026-02-21T10:22:35.5103237Z ld.shared.b16 %rs1238, [%r14112+1024]; 2026-02-21T10:22:35.5103304Z ld.shared.b16 %rs1239, [%r14112+64]; 2026-02-21T10:22:35.5103371Z ld.shared.b16 %rs1240, [%r14112+1088]; 2026-02-21T10:22:35.5103440Z add.s32 %r14113, %r14110, %r131; 2026-02-21T10:22:35.5103504Z ld.shared.b16 %rs1241, [%r14113]; 2026-02-21T10:22:35.5103572Z ld.shared.b16 %rs1242, [%r14113+1024]; 2026-02-21T10:22:35.5103645Z ld.shared.b16 %rs1243, [%r14113+64]; 2026-02-21T10:22:35.5103712Z ld.shared.b16 %rs1244, [%r14113+1088]; 2026-02-21T10:22:35.5103777Z add.s32 %r14114, %r14110, %r132; 2026-02-21T10:22:35.5103842Z ld.shared.b16 %rs1245, [%r14114]; 2026-02-21T10:22:35.5103914Z ld.shared.b16 %rs1246, [%r14114+1024]; 2026-02-21T10:22:35.5103980Z ld.shared.b16 %rs1247, [%r14114+64]; 2026-02-21T10:22:35.5104048Z ld.shared.b16 %rs1248, [%r14114+1088]; 2026-02-21T10:22:35.5104112Z add.s32 %r14115, %r14110, %r133; 2026-02-21T10:22:35.5104243Z ld.shared.b16 %rs1249, [%r14115]; 2026-02-21T10:22:35.5104313Z ld.shared.b16 %rs1250, [%r14115+1024]; 2026-02-21T10:22:35.5104380Z ld.shared.b16 %rs1251, [%r14115+64]; 2026-02-21T10:22:35.5104451Z ld.shared.b16 %rs1252, [%r14115+1088]; 
2026-02-21T10:22:35.5104559Z add.s32 %r14116, %r14110, %r134; 2026-02-21T10:22:35.5104624Z ld.shared.b16 %rs1253, [%r14116]; 2026-02-21T10:22:35.5104694Z ld.shared.b16 %rs1254, [%r14116+1024]; 2026-02-21T10:22:35.5104762Z ld.shared.b16 %rs1255, [%r14116+64]; 2026-02-21T10:22:35.5104833Z ld.shared.b16 %rs1256, [%r14116+1088]; 2026-02-21T10:22:35.5104898Z add.s32 %r14117, %r14110, %r135; 2026-02-21T10:22:35.5104962Z ld.shared.b16 %rs1257, [%r14117]; 2026-02-21T10:22:35.5105029Z ld.shared.b16 %rs1258, [%r14117+1024]; 2026-02-21T10:22:35.5105093Z ld.shared.b16 %rs1259, [%r14117+64]; 2026-02-21T10:22:35.5105177Z ld.shared.b16 %rs1260, [%r14117+1088]; 2026-02-21T10:22:35.5105238Z add.s32 %r14118, %r14110, %r136; 2026-02-21T10:22:35.5105303Z ld.shared.b16 %rs1261, [%r14118]; 2026-02-21T10:22:35.5105444Z ld.shared.b16 %rs1262, [%r14118+1024]; 2026-02-21T10:22:35.5105511Z ld.shared.b16 %rs1263, [%r14118+64]; 2026-02-21T10:22:35.5105578Z ld.shared.b16 %rs1264, [%r14118+1088]; 2026-02-21T10:22:35.5105642Z cvt.f32.bf16 %r13671, %rs1233; 2026-02-21T10:22:35.5105712Z cvt.f32.bf16 %r13672, %rs1234; 2026-02-21T10:22:35.5105773Z cvt.f32.bf16 %r13673, %rs1237; 2026-02-21T10:22:35.5105892Z cvt.f32.bf16 %r13674, %rs1238; 2026-02-21T10:22:35.5105959Z cvt.f32.bf16 %r13707, %rs1241; 2026-02-21T10:22:35.5106020Z cvt.f32.bf16 %r13708, %rs1242; 2026-02-21T10:22:35.5106079Z cvt.f32.bf16 %r13709, %rs1245; 2026-02-21T10:22:35.5106143Z cvt.f32.bf16 %r13710, %rs1246; 2026-02-21T10:22:35.5106204Z cvt.f32.bf16 %r13743, %rs1249; 2026-02-21T10:22:35.5106263Z cvt.f32.bf16 %r13744, %rs1250; 2026-02-21T10:22:35.5106324Z cvt.f32.bf16 %r13745, %rs1253; 2026-02-21T10:22:35.5106389Z cvt.f32.bf16 %r13746, %rs1254; 2026-02-21T10:22:35.5106568Z cvt.f32.bf16 %r13779, %rs1257; 2026-02-21T10:22:35.5106643Z cvt.f32.bf16 %r13780, %rs1258; 2026-02-21T10:22:35.5106709Z cvt.f32.bf16 %r13781, %rs1261; 2026-02-21T10:22:35.5106773Z cvt.f32.bf16 %r13782, %rs1262; 2026-02-21T10:22:35.5106834Z cvt.f32.bf16 %r13815, 
%rs1235; 2026-02-21T10:22:35.5106900Z cvt.f32.bf16 %r13816, %rs1236; 2026-02-21T10:22:35.5106962Z cvt.f32.bf16 %r13817, %rs1239; 2026-02-21T10:22:35.5107022Z cvt.f32.bf16 %r13818, %rs1240; 2026-02-21T10:22:35.5107097Z cvt.f32.bf16 %r13851, %rs1243; 2026-02-21T10:22:35.5107163Z cvt.f32.bf16 %r13852, %rs1244; 2026-02-21T10:22:35.5107225Z cvt.f32.bf16 %r13853, %rs1247; 2026-02-21T10:22:35.5107286Z cvt.f32.bf16 %r13854, %rs1248; 2026-02-21T10:22:35.5107350Z cvt.f32.bf16 %r13887, %rs1251; 2026-02-21T10:22:35.5107411Z cvt.f32.bf16 %r13888, %rs1252; 2026-02-21T10:22:35.5107471Z cvt.f32.bf16 %r13889, %rs1255; 2026-02-21T10:22:35.5107530Z cvt.f32.bf16 %r13890, %rs1256; 2026-02-21T10:22:35.5107593Z cvt.f32.bf16 %r13923, %rs1259; 2026-02-21T10:22:35.5107655Z cvt.f32.bf16 %r13924, %rs1260; 2026-02-21T10:22:35.5107718Z cvt.f32.bf16 %r13925, %rs1263; 2026-02-21T10:22:35.5107782Z cvt.f32.bf16 %r13926, %rs1264; 2026-02-21T10:22:35.5107998Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.5108064Z add.s32 %r13605, %r9374, %r14015; 2026-02-21T10:22:35.5108124Z // begin inline asm 2026-02-21T10:22:35.5108180Z 2026-02-21T10:22:35.5108233Z { 2026-02-21T10:22:35.5108298Z .reg .pred complete; 2026-02-21T10:22:35.5108358Z waitLoop: 2026-02-21T10:22:35.5108576Z mbarrier.try_wait.parity.shared.b64 complete, [%r13605], %r14286; 2026-02-21T10:22:35.5108650Z @!complete bra.uni waitLoop; 2026-02-21T10:22:35.5108702Z } 2026-02-21T10:22:35.5108711Z 2026-02-21T10:22:35.5108767Z // end inline asm 2026-02-21T10:22:35.5108975Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.5109039Z add.s32 %r14121, %r9409, %r14003; 2026-02-21T10:22:35.5109327Z .loc 1 78 58 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:78:58 2026-02-21T10:22:35.5109409Z add.s32 %r14122, %r14121, %r137; 2026-02-21T10:22:35.5109479Z ld.shared.b8 %rs1265, [%r14122]; 2026-02-21T10:22:35.5109620Z ld.shared.b8 %rs1266, 
[%r14122+1024]; 2026-02-21T10:22:35.5109686Z ld.shared.b8 %rs1267, [%r14122+2048]; 2026-02-21T10:22:35.5109755Z ld.shared.b8 %rs1268, [%r14122+3072]; 2026-02-21T10:22:35.5109820Z add.s32 %r14123, %r14121, %r14020; 2026-02-21T10:22:35.5109890Z ld.shared.b8 %rs1269, [%r14123+256]; 2026-02-21T10:22:35.5109955Z ld.shared.b8 %rs1270, [%r14123+1280]; 2026-02-21T10:22:35.5110019Z ld.shared.b8 %rs1271, [%r14123+2304]; 2026-02-21T10:22:35.5110086Z ld.shared.b8 %rs1272, [%r14123+3328]; 2026-02-21T10:22:35.5110145Z add.s32 %r14124, %r14121, %r14022; 2026-02-21T10:22:35.5110213Z ld.shared.b8 %rs1273, [%r14124+512]; 2026-02-21T10:22:35.5110292Z ld.shared.b8 %rs1274, [%r14124+1536]; 2026-02-21T10:22:35.5110423Z ld.shared.b8 %rs1275, [%r14124+2560]; 2026-02-21T10:22:35.5110493Z ld.shared.b8 %rs1276, [%r14124+3584]; 2026-02-21T10:22:35.5110556Z add.s32 %r14125, %r14121, %r14024; 2026-02-21T10:22:35.5110623Z ld.shared.b8 %rs1277, [%r14125+768]; 2026-02-21T10:22:35.5110697Z ld.shared.b8 %rs1278, [%r14125+1792]; 2026-02-21T10:22:35.5110761Z ld.shared.b8 %rs1279, [%r14125+2816]; 2026-02-21T10:22:35.5110887Z ld.shared.b8 %rs1280, [%r14125+3840]; 2026-02-21T10:22:35.5111095Z .loc 1 63 28 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:63:28 2026-02-21T10:22:35.5111160Z shl.b16 %rs1281, %rs1265, 4; 2026-02-21T10:22:35.5111225Z shl.b16 %rs1282, %rs1269, 4; 2026-02-21T10:22:35.5111284Z shl.b16 %rs1283, %rs1273, 4; 2026-02-21T10:22:35.5111344Z shl.b16 %rs1284, %rs1277, 4; 2026-02-21T10:22:35.5111403Z shl.b16 %rs1285, %rs1266, 4; 2026-02-21T10:22:35.5111464Z shl.b16 %rs1286, %rs1270, 4; 2026-02-21T10:22:35.5111524Z shl.b16 %rs1287, %rs1274, 4; 2026-02-21T10:22:35.5111587Z shl.b16 %rs1288, %rs1278, 4; 2026-02-21T10:22:35.5111650Z shl.b16 %rs1289, %rs1267, 4; 2026-02-21T10:22:35.5111710Z shl.b16 %rs1290, %rs1271, 4; 2026-02-21T10:22:35.5111769Z shl.b16 %rs1291, %rs1275, 4; 2026-02-21T10:22:35.5111832Z shl.b16 %rs1292, %rs1279, 4; 2026-02-21T10:22:35.5111896Z shl.b16 %rs1293, 
%rs1268, 4; 2026-02-21T10:22:35.5111956Z shl.b16 %rs1294, %rs1272, 4; 2026-02-21T10:22:35.5112018Z shl.b16 %rs1295, %rs1276, 4; 2026-02-21T10:22:35.5112082Z shl.b16 %rs1296, %rs1280, 4; 2026-02-21T10:22:35.5112285Z .loc 1 78 58 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:78:58 2026-02-21T10:22:35.5112367Z selp.b16 %rs1297, %rs1281, %rs1265, %p484; 2026-02-21T10:22:35.5112433Z cvt.s16.s8 %rs1298, %rs1297; 2026-02-21T10:22:35.5112495Z shr.s16 %rs1299, %rs1298, 4; 2026-02-21T10:22:35.5112570Z selp.b16 %rs1300, %rs1282, %rs1269, %p484; 2026-02-21T10:22:35.5112632Z cvt.s16.s8 %rs1301, %rs1300; 2026-02-21T10:22:35.5112698Z shr.s16 %rs1302, %rs1301, 4; 2026-02-21T10:22:35.5112774Z selp.b16 %rs1303, %rs1283, %rs1273, %p484; 2026-02-21T10:22:35.5112836Z cvt.s16.s8 %rs1304, %rs1303; 2026-02-21T10:22:35.5112903Z shr.s16 %rs1305, %rs1304, 4; 2026-02-21T10:22:35.5112976Z selp.b16 %rs1306, %rs1284, %rs1277, %p484; 2026-02-21T10:22:35.5113039Z cvt.s16.s8 %rs1307, %rs1306; 2026-02-21T10:22:35.5113100Z shr.s16 %rs1308, %rs1307, 4; 2026-02-21T10:22:35.5113180Z selp.b16 %rs1309, %rs1285, %rs1266, %p484; 2026-02-21T10:22:35.5113243Z cvt.s16.s8 %rs1310, %rs1309; 2026-02-21T10:22:35.5113303Z shr.s16 %rs1311, %rs1310, 4; 2026-02-21T10:22:35.5113378Z selp.b16 %rs1312, %rs1286, %rs1270, %p484; 2026-02-21T10:22:35.5113442Z cvt.s16.s8 %rs1313, %rs1312; 2026-02-21T10:22:35.5113504Z shr.s16 %rs1314, %rs1313, 4; 2026-02-21T10:22:35.5113577Z selp.b16 %rs1315, %rs1287, %rs1274, %p484; 2026-02-21T10:22:35.5113638Z cvt.s16.s8 %rs1316, %rs1315; 2026-02-21T10:22:35.5113698Z shr.s16 %rs1317, %rs1316, 4; 2026-02-21T10:22:35.5113849Z selp.b16 %rs1318, %rs1288, %rs1278, %p484; 2026-02-21T10:22:35.5113912Z cvt.s16.s8 %rs1319, %rs1318; 2026-02-21T10:22:35.5113972Z shr.s16 %rs1320, %rs1319, 4; 2026-02-21T10:22:35.5114048Z selp.b16 %rs1321, %rs1289, %rs1267, %p484; 2026-02-21T10:22:35.5114154Z cvt.s16.s8 %rs1322, %rs1321; 2026-02-21T10:22:35.5114216Z shr.s16 %rs1323, %rs1322, 4; 
2026-02-21T10:22:35.5114294Z selp.b16 %rs1324, %rs1290, %rs1271, %p484; 2026-02-21T10:22:35.5114362Z cvt.s16.s8 %rs1325, %rs1324; 2026-02-21T10:22:35.5114423Z shr.s16 %rs1326, %rs1325, 4; 2026-02-21T10:22:35.5114502Z selp.b16 %rs1327, %rs1291, %rs1275, %p484; 2026-02-21T10:22:35.5114565Z cvt.s16.s8 %rs1328, %rs1327; 2026-02-21T10:22:35.5114626Z shr.s16 %rs1329, %rs1328, 4; 2026-02-21T10:22:35.5114703Z selp.b16 %rs1330, %rs1292, %rs1279, %p484; 2026-02-21T10:22:35.5114768Z cvt.s16.s8 %rs1331, %rs1330; 2026-02-21T10:22:35.5114829Z shr.s16 %rs1332, %rs1331, 4; 2026-02-21T10:22:35.5114904Z selp.b16 %rs1333, %rs1293, %rs1268, %p484; 2026-02-21T10:22:35.5115016Z cvt.s16.s8 %rs1334, %rs1333; 2026-02-21T10:22:35.5115083Z shr.s16 %rs1335, %rs1334, 4; 2026-02-21T10:22:35.5115155Z selp.b16 %rs1336, %rs1294, %rs1272, %p484; 2026-02-21T10:22:35.5115216Z cvt.s16.s8 %rs1337, %rs1336; 2026-02-21T10:22:35.5115282Z shr.s16 %rs1338, %rs1337, 4; 2026-02-21T10:22:35.5115353Z selp.b16 %rs1339, %rs1295, %rs1276, %p484; 2026-02-21T10:22:35.5115413Z cvt.s16.s8 %rs1340, %rs1339; 2026-02-21T10:22:35.5115522Z shr.s16 %rs1341, %rs1340, 4; 2026-02-21T10:22:35.5115597Z selp.b16 %rs1342, %rs1296, %rs1280, %p484; 2026-02-21T10:22:35.5115657Z cvt.s16.s8 %rs1343, %rs1342; 2026-02-21T10:22:35.5115716Z shr.s16 %rs1344, %rs1343, 4; 2026-02-21T10:22:35.5115935Z .loc 1 83 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:83:32 2026-02-21T10:22:35.5116002Z cvt.rn.f32.s16 %r14126, %rs1299; 2026-02-21T10:22:35.5116066Z cvt.rn.f32.s16 %r14127, %rs1302; 2026-02-21T10:22:35.5116130Z cvt.rn.f32.s16 %r14128, %rs1305; 2026-02-21T10:22:35.5116196Z cvt.rn.f32.s16 %r14129, %rs1308; 2026-02-21T10:22:35.5116257Z cvt.rn.f32.s16 %r14130, %rs1311; 2026-02-21T10:22:35.5116330Z cvt.rn.f32.s16 %r14131, %rs1314; 2026-02-21T10:22:35.5116396Z cvt.rn.f32.s16 %r14132, %rs1317; 2026-02-21T10:22:35.5116573Z cvt.rn.f32.s16 %r14133, %rs1320; 2026-02-21T10:22:35.5116641Z cvt.rn.f32.s16 %r14134, %rs1323; 
2026-02-21T10:22:35.5116707Z cvt.rn.f32.s16 %r14135, %rs1326; 2026-02-21T10:22:35.5116772Z cvt.rn.f32.s16 %r14136, %rs1329; 2026-02-21T10:22:35.5116833Z cvt.rn.f32.s16 %r14137, %rs1332; 2026-02-21T10:22:35.5116896Z cvt.rn.f32.s16 %r14138, %rs1335; 2026-02-21T10:22:35.5116956Z cvt.rn.f32.s16 %r14139, %rs1338; 2026-02-21T10:22:35.5117016Z cvt.rn.f32.s16 %r14140, %rs1341; 2026-02-21T10:22:35.5117086Z cvt.rn.f32.s16 %r14141, %rs1344; 2026-02-21T10:22:35.5117148Z bar.sync 0; 2026-02-21T10:22:35.5117214Z st.shared.b32 [%r138], %r14126; 2026-02-21T10:22:35.5117281Z st.shared.b32 [%r138+16384], %r14134; 2026-02-21T10:22:35.5117350Z st.shared.b32 [%r139], %r14127; 2026-02-21T10:22:35.5117420Z st.shared.b32 [%r139+16384], %r14135; 2026-02-21T10:22:35.5117483Z st.shared.b32 [%r140], %r14128; 2026-02-21T10:22:35.5117547Z st.shared.b32 [%r140+16384], %r14136; 2026-02-21T10:22:35.5117616Z st.shared.b32 [%r141], %r14129; 2026-02-21T10:22:35.5117680Z st.shared.b32 [%r141+16384], %r14137; 2026-02-21T10:22:35.5117744Z st.shared.b32 [%r142], %r14130; 2026-02-21T10:22:35.5117813Z st.shared.b32 [%r142+16384], %r14138; 2026-02-21T10:22:35.5117875Z st.shared.b32 [%r143], %r14131; 2026-02-21T10:22:35.5117940Z st.shared.b32 [%r143+16384], %r14139; 2026-02-21T10:22:35.5118005Z st.shared.b32 [%r144], %r14132; 2026-02-21T10:22:35.5118069Z st.shared.b32 [%r144+16384], %r14140; 2026-02-21T10:22:35.5118132Z st.shared.b32 [%r145], %r14133; 2026-02-21T10:22:35.5118199Z st.shared.b32 [%r145+16384], %r14141; 2026-02-21T10:22:35.5118257Z $L__tmp23: 2026-02-21T10:22:35.5118539Z .loc 2 291 36 // standard.py:291:36 @[ co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:90:40 ] 2026-02-21T10:22:35.5118833Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r9667], {%r9779, %r10067, %r10355, %r10643}; 2026-02-21T10:22:35.5118894Z bar.sync 0; 2026-02-21T10:22:35.5119016Z // begin inline asm 2026-02-21T10:22:35.5119156Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r13675}, [%r13608]; 2026-02-21T10:22:35.5119214Z 
// end inline asm 2026-02-21T10:22:35.5119276Z bar.sync 0; 2026-02-21T10:22:35.5119470Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r9667], {%r9781, %r10069, %r10357, %r10645}; 2026-02-21T10:22:35.5119524Z bar.sync 0; 2026-02-21T10:22:35.5119587Z // begin inline asm 2026-02-21T10:22:35.5119734Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r13677}, [%r13608]; 2026-02-21T10:22:35.5119791Z // end inline asm 2026-02-21T10:22:35.5119847Z bar.sync 0; 2026-02-21T10:22:35.5120035Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r9667], {%r9780, %r10068, %r10356, %r10644}; 2026-02-21T10:22:35.5120090Z bar.sync 0; 2026-02-21T10:22:35.5120213Z // begin inline asm 2026-02-21T10:22:35.5120354Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r13676}, [%r13608]; 2026-02-21T10:22:35.5120411Z // end inline asm 2026-02-21T10:22:35.5120466Z bar.sync 0; 2026-02-21T10:22:35.5120658Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r9667], {%r9782, %r10070, %r10358, %r10646}; 2026-02-21T10:22:35.5120712Z bar.sync 0; 2026-02-21T10:22:35.5120770Z // begin inline asm 2026-02-21T10:22:35.5120961Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r13678}, [%r13608]; 2026-02-21T10:22:35.5121022Z // end inline asm 2026-02-21T10:22:35.5121075Z bar.sync 0; 2026-02-21T10:22:35.5121260Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r9667], {%r9783, %r10071, %r10359, %r10647}; 2026-02-21T10:22:35.5121320Z bar.sync 0; 2026-02-21T10:22:35.5121377Z // begin inline asm 2026-02-21T10:22:35.5121509Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r13679}, [%r13608]; 2026-02-21T10:22:35.5121568Z // end inline asm 2026-02-21T10:22:35.5121622Z bar.sync 0; 2026-02-21T10:22:35.5121809Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r9667], {%r9785, %r10073, %r10361, %r10649}; 2026-02-21T10:22:35.5121864Z bar.sync 0; 2026-02-21T10:22:35.5121925Z // begin inline asm 2026-02-21T10:22:35.5122058Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r13681}, [%r13608]; 2026-02-21T10:22:35.5122118Z // end inline asm 2026-02-21T10:22:35.5122174Z bar.sync 0; 
2026-02-21T10:22:35.5122359Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r9667], {%r9784, %r10072, %r10360, %r10648}; 2026-02-21T10:22:35.5122425Z bar.sync 0; 2026-02-21T10:22:35.5122485Z // begin inline asm 2026-02-21T10:22:35.5122621Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r13680}, [%r13608]; 2026-02-21T10:22:35.5122678Z // end inline asm 2026-02-21T10:22:35.5122733Z bar.sync 0; 2026-02-21T10:22:35.5122918Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r9667], {%r9786, %r10074, %r10362, %r10650}; 2026-02-21T10:22:35.5122973Z bar.sync 0; 2026-02-21T10:22:35.5123031Z // begin inline asm 2026-02-21T10:22:35.5123169Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r13682}, [%r13608]; 2026-02-21T10:22:35.5123227Z // end inline asm 2026-02-21T10:22:35.5123281Z bar.sync 0; 2026-02-21T10:22:35.5123464Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r9667], {%r9787, %r10075, %r10363, %r10651}; 2026-02-21T10:22:35.5123524Z bar.sync 0; 2026-02-21T10:22:35.5123582Z // begin inline asm 2026-02-21T10:22:35.5123715Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r13683}, [%r13608]; 2026-02-21T10:22:35.5123776Z // end inline asm 2026-02-21T10:22:35.5123828Z bar.sync 0; 2026-02-21T10:22:35.5124010Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r9667], {%r9789, %r10077, %r10365, %r10653}; 2026-02-21T10:22:35.5124065Z bar.sync 0; 2026-02-21T10:22:35.5124124Z // begin inline asm 2026-02-21T10:22:35.5124255Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r13685}, [%r13608]; 2026-02-21T10:22:35.5124310Z // end inline asm 2026-02-21T10:22:35.5124365Z bar.sync 0; 2026-02-21T10:22:35.5124551Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r9667], {%r9788, %r10076, %r10364, %r10652}; 2026-02-21T10:22:35.5124675Z bar.sync 0; 2026-02-21T10:22:35.5124733Z // begin inline asm 2026-02-21T10:22:35.5124869Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r13684}, [%r13608]; 2026-02-21T10:22:35.5124993Z // end inline asm 2026-02-21T10:22:35.5125046Z bar.sync 0; 2026-02-21T10:22:35.5125230Z 
stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r9667], {%r9790, %r10078, %r10366, %r10654}; 2026-02-21T10:22:35.5125286Z bar.sync 0; 2026-02-21T10:22:35.5125345Z // begin inline asm 2026-02-21T10:22:35.5125479Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r13686}, [%r13608]; 2026-02-21T10:22:35.5125536Z // end inline asm 2026-02-21T10:22:35.5125590Z bar.sync 0; 2026-02-21T10:22:35.5125773Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r9667], {%r9791, %r10079, %r10367, %r10655}; 2026-02-21T10:22:35.5125830Z bar.sync 0; 2026-02-21T10:22:35.5125899Z // begin inline asm 2026-02-21T10:22:35.5126033Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r13687}, [%r13608]; 2026-02-21T10:22:35.5126144Z // end inline asm 2026-02-21T10:22:35.5126199Z bar.sync 0; 2026-02-21T10:22:35.5126382Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r9667], {%r9793, %r10081, %r10369, %r10657}; 2026-02-21T10:22:35.5126436Z bar.sync 0; 2026-02-21T10:22:35.5126621Z // begin inline asm 2026-02-21T10:22:35.5126759Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r13689}, [%r13608]; 2026-02-21T10:22:35.5126815Z // end inline asm 2026-02-21T10:22:35.5126941Z bar.sync 0; 2026-02-21T10:22:35.5127138Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r9667], {%r9792, %r10080, %r10368, %r10656}; 2026-02-21T10:22:35.5127194Z bar.sync 0; 2026-02-21T10:22:35.5127254Z // begin inline asm 2026-02-21T10:22:35.5127386Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r13688}, [%r13608]; 2026-02-21T10:22:35.5127442Z // end inline asm 2026-02-21T10:22:35.5127495Z bar.sync 0; 2026-02-21T10:22:35.5127679Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r9667], {%r9794, %r10082, %r10370, %r10658}; 2026-02-21T10:22:35.5127736Z bar.sync 0; 2026-02-21T10:22:35.5127798Z // begin inline asm 2026-02-21T10:22:35.5127931Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r13690}, [%r13608]; 2026-02-21T10:22:35.5127988Z // end inline asm 2026-02-21T10:22:35.5128046Z // begin inline asm 2026-02-21T10:22:35.5128128Z fence.proxy.async.shared::cta; 
2026-02-21T10:22:35.5128188Z // end inline asm 2026-02-21T10:22:35.5128263Z wgmma.fence.sync.aligned; 2026-02-21T10:22:35.5128327Z shl.b32 %r14142, %r14042, 10; 2026-02-21T10:22:35.5128395Z and.b32 %r14143, %r14142, 12288; 2026-02-21T10:22:35.5128463Z add.s32 %r14144, %r14143, %r11026; 2026-02-21T10:22:35.5128526Z bfe.u32 %r14145, %r14144, 4, 14; 2026-02-21T10:22:35.5128590Z cvt.u64.u32 %rd585, %r14145; 2026-02-21T10:22:35.5128670Z or.b64 %rd565, %rd585, 4611686293372403712; 2026-02-21T10:22:35.5128729Z // begin inline asm 2026-02-21T10:22:35.5129313Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r13675,%r13676,%r13677,%r13678,%r13679,%r13680,%r13681,%r13682,%r13683,%r13684,%r13685,%r13686,%r13687,%r13688,%r13689,%r13690}, {%r13671,%r13672,%r13673,%r13674}, %rd565, %p343, 1, 1; 2026-02-21T10:22:35.5129380Z // end inline asm 2026-02-21T10:22:35.5129443Z add.s32 %r14146, %r14144, 32; 2026-02-21T10:22:35.5129506Z bfe.u32 %r14147, %r14146, 4, 14; 2026-02-21T10:22:35.5129576Z cvt.u64.u32 %rd586, %r14147; 2026-02-21T10:22:35.5129651Z or.b64 %rd566, %rd586, 4611686293372403712; 2026-02-21T10:22:35.5129713Z // begin inline asm 2026-02-21T10:22:35.5130283Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r13675,%r13676,%r13677,%r13678,%r13679,%r13680,%r13681,%r13682,%r13683,%r13684,%r13685,%r13686,%r13687,%r13688,%r13689,%r13690}, {%r13707,%r13708,%r13709,%r13710}, %rd566, %p343, 1, 1; 2026-02-21T10:22:35.5130341Z // end inline asm 2026-02-21T10:22:35.5130402Z add.s32 %r14148, %r14144, 64; 2026-02-21T10:22:35.5130463Z bfe.u32 %r14149, %r14148, 4, 14; 2026-02-21T10:22:35.5130527Z cvt.u64.u32 %rd587, %r14149; 2026-02-21T10:22:35.5130603Z or.b64 %rd567, %rd587, 4611686293372403712; 2026-02-21T10:22:35.5130738Z // begin inline asm 2026-02-21T10:22:35.5131295Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r13675,%r13676,%r13677,%r13678,%r13679,%r13680,%r13681,%r13682,%r13683,%r13684,%r13685,%r13686,%r13687,%r13688,%r13689,%r13690}, 
{%r13743,%r13744,%r13745,%r13746}, %rd567, %p343, 1, 1; 2026-02-21T10:22:35.5131412Z // end inline asm 2026-02-21T10:22:35.5131471Z add.s32 %r14150, %r14144, 96; 2026-02-21T10:22:35.5131536Z bfe.u32 %r14151, %r14150, 4, 14; 2026-02-21T10:22:35.5131595Z cvt.u64.u32 %rd588, %r14151; 2026-02-21T10:22:35.5131666Z or.b64 %rd568, %rd588, 4611686293372403712; 2026-02-21T10:22:35.5131725Z // begin inline asm 2026-02-21T10:22:35.5132293Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r13675,%r13676,%r13677,%r13678,%r13679,%r13680,%r13681,%r13682,%r13683,%r13684,%r13685,%r13686,%r13687,%r13688,%r13689,%r13690}, {%r13779,%r13780,%r13781,%r13782}, %rd568, %p343, 1, 1; 2026-02-21T10:22:35.5132350Z // end inline asm 2026-02-21T10:22:35.5132473Z add.s32 %r14152, %r14144, 16384; 2026-02-21T10:22:35.5132543Z bfe.u32 %r14153, %r14152, 4, 14; 2026-02-21T10:22:35.5132604Z cvt.u64.u32 %rd589, %r14153; 2026-02-21T10:22:35.5132677Z or.b64 %rd569, %rd589, 4611686293372403712; 2026-02-21T10:22:35.5132742Z // begin inline asm 2026-02-21T10:22:35.5133342Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r13675,%r13676,%r13677,%r13678,%r13679,%r13680,%r13681,%r13682,%r13683,%r13684,%r13685,%r13686,%r13687,%r13688,%r13689,%r13690}, {%r13815,%r13816,%r13817,%r13818}, %rd569, %p343, 1, 1; 2026-02-21T10:22:35.5133402Z // end inline asm 2026-02-21T10:22:35.5133466Z add.s32 %r14154, %r14144, 16416; 2026-02-21T10:22:35.5133527Z bfe.u32 %r14155, %r14154, 4, 14; 2026-02-21T10:22:35.5133587Z cvt.u64.u32 %rd590, %r14155; 2026-02-21T10:22:35.5133657Z or.b64 %rd570, %rd590, 4611686293372403712; 2026-02-21T10:22:35.5133717Z // begin inline asm 2026-02-21T10:22:35.5134267Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r13675,%r13676,%r13677,%r13678,%r13679,%r13680,%r13681,%r13682,%r13683,%r13684,%r13685,%r13686,%r13687,%r13688,%r13689,%r13690}, {%r13851,%r13852,%r13853,%r13854}, %rd570, %p343, 1, 1; 2026-02-21T10:22:35.5134326Z // end inline asm 2026-02-21T10:22:35.5134389Z add.s32 
%r14156, %r14144, 16448; 2026-02-21T10:22:35.5134451Z bfe.u32 %r14157, %r14156, 4, 14; 2026-02-21T10:22:35.5134524Z cvt.u64.u32 %rd591, %r14157; 2026-02-21T10:22:35.5134602Z or.b64 %rd571, %rd591, 4611686293372403712; 2026-02-21T10:22:35.5134661Z // begin inline asm 2026-02-21T10:22:35.5135213Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r13675,%r13676,%r13677,%r13678,%r13679,%r13680,%r13681,%r13682,%r13683,%r13684,%r13685,%r13686,%r13687,%r13688,%r13689,%r13690}, {%r13887,%r13888,%r13889,%r13890}, %rd571, %p343, 1, 1; 2026-02-21T10:22:35.5135271Z // end inline asm 2026-02-21T10:22:35.5135331Z add.s32 %r14158, %r14144, 16480; 2026-02-21T10:22:35.5135389Z bfe.u32 %r14159, %r14158, 4, 14; 2026-02-21T10:22:35.5135450Z cvt.u64.u32 %rd592, %r14159; 2026-02-21T10:22:35.5135530Z or.b64 %rd572, %rd592, 4611686293372403712; 2026-02-21T10:22:35.5135589Z // begin inline asm 2026-02-21T10:22:35.5136136Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r13675,%r13676,%r13677,%r13678,%r13679,%r13680,%r13681,%r13682,%r13683,%r13684,%r13685,%r13686,%r13687,%r13688,%r13689,%r13690}, {%r13923,%r13924,%r13925,%r13926}, %rd572, %p343, 1, 1; 2026-02-21T10:22:35.5136202Z // end inline asm 2026-02-21T10:22:35.5136279Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:35.5136342Z mov.b32 %r13943, %r11026; 2026-02-21T10:22:35.5136403Z mov.b32 %r13945, %r13944; 2026-02-21T10:22:35.5136576Z // begin inline asm 2026-02-21T10:22:35.5136941Z // wait for regs: %r13675,%r13676,%r13677,%r13678,%r13679,%r13680,%r13681,%r13682,%r13683,%r13684,%r13685,%r13686,%r13687,%r13688,%r13689,%r13690,%r13943,%r13944,%r13945 2026-02-21T10:22:35.5137021Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:35.5137077Z // end inline asm 2026-02-21T10:22:35.5137227Z $L__tmp24: 2026-02-21T10:22:35.5137455Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.5137523Z add.s32 %r14160, %r14285, 128; 2026-02-21T10:22:35.5137586Z add.s32 %r14161, %r14288, 1; 
2026-02-21T10:22:35.5137726Z setp.gt.s32 %p461, %r14161, 2; 2026-02-21T10:22:35.5137798Z selp.b32 %r14288, 0, %r14161, %p461; 2026-02-21T10:22:35.5137866Z selp.b32 %r14285, 0, %r14160, %p455; 2026-02-21T10:22:35.5138077Z .loc 1 51 22 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:51:22 2026-02-21T10:22:35.5138137Z shl.b32 %r14162, %r14285, 1; 2026-02-21T10:22:35.5138341Z .loc 1 53 25 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:53:25 2026-02-21T10:22:35.5138403Z add.s32 %r14163, %r14162, %r10; 2026-02-21T10:22:35.5138603Z .loc 1 54 53 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:53 2026-02-21T10:22:35.5138666Z shl.b32 %r14164, %r14315, 13; 2026-02-21T10:22:35.5138794Z shl.b32 %r14165, %r14316, 13; 2026-02-21T10:22:35.5138997Z .loc 1 54 60 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:60 2026-02-21T10:22:35.5139064Z add.s32 %r14166, %r14164, %r14163; 2026-02-21T10:22:35.5139129Z add.s32 %r14167, %r14165, %r14163; 2026-02-21T10:22:35.5139388Z .loc 1 54 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:32 2026-02-21T10:22:35.5139472Z mad.wide.s32 %rd573, %r14166, 2, %rd76; 2026-02-21T10:22:35.5139546Z mad.wide.s32 %rd574, %r14167, 2, %rd76; 2026-02-21T10:22:35.5139744Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.5139804Z shl.b32 %r14168, %r14288, 12; 2026-02-21T10:22:35.5139866Z shl.b32 %r14169, %r14288, 13; 2026-02-21T10:22:35.5139928Z add.s32 %r14170, %r14005, %r14169; 2026-02-21T10:22:35.5139991Z add.s32 %r13965, %r14170, %r127; 2026-02-21T10:22:35.5140063Z selp.b32 %r13966, 8, 0, %p456; 2026-02-21T10:22:35.5140123Z // begin inline asm 2026-02-21T10:22:35.5140274Z cp.async.ca.shared.global [ %r13965 + 0 ], [ %rd573 + 0 ], 0x8, %r13966; 2026-02-21T10:22:35.5140346Z // end inline asm 2026-02-21T10:22:35.5140412Z add.s32 %r13967, %r13965, 4096; 2026-02-21T10:22:35.5140470Z // begin inline asm 2026-02-21T10:22:35.5140616Z 
cp.async.ca.shared.global [ %r13967 + 0 ], [ %rd574 + 0 ], 0x8, %r13966; 2026-02-21T10:22:35.5140675Z // end inline asm 2026-02-21T10:22:35.5140740Z cp.async.commit_group; 2026-02-21T10:22:35.5140955Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.5141021Z shl.b32 %r14171, %r14288, 3; 2026-02-21T10:22:35.5141085Z add.s32 %r13969, %r9365, %r14171; 2026-02-21T10:22:35.5141152Z and.pred %p447, %p472, %p456; 2026-02-21T10:22:35.5141211Z // begin inline asm 2026-02-21T10:22:35.5141349Z @%p447 mbarrier.arrive.expect_tx.shared.b64 _, [%r13969], 4096; 2026-02-21T10:22:35.5141410Z // end inline asm 2026-02-21T10:22:35.5141610Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.5141675Z add.s32 %r13970, %r9382, %r14168; 2026-02-21T10:22:35.5141732Z bar.sync 0; 2026-02-21T10:22:35.5141801Z elect.sync %r14172|%p462, -1; 2026-02-21T10:22:35.5141868Z and.pred %p463, %p456, %p462; 2026-02-21T10:22:35.5141937Z and.pred %p448, %p1, %p463; 2026-02-21T10:22:35.5141996Z // begin inline asm 2026-02-21T10:22:35.5142335Z @%p448 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r13970], [%rd394, {%r14312, %r14285}], [%r13969]; 2026-02-21T10:22:35.5142395Z // end inline asm 2026-02-21T10:22:35.5142605Z .loc 1 47 126 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:47:126 2026-02-21T10:22:35.5142665Z add.s32 %r13981, %r14285, 32; 2026-02-21T10:22:35.5142865Z .loc 1 51 22 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:51:22 2026-02-21T10:22:35.5143000Z shl.b32 %r14173, %r13981, 1; 2026-02-21T10:22:35.5143202Z .loc 1 53 25 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:53:25 2026-02-21T10:22:35.5143312Z add.s32 %r14174, %r14173, %r10; 2026-02-21T10:22:35.5143510Z .loc 1 54 60 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:60 2026-02-21T10:22:35.5143576Z add.s32 %r14175, %r14164, %r14174; 2026-02-21T10:22:35.5143639Z 
add.s32 %r14176, %r14165, %r14174; 2026-02-21T10:22:35.5143836Z .loc 1 54 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:32 2026-02-21T10:22:35.5143907Z mad.wide.s32 %rd576, %r14175, 2, %rd76; 2026-02-21T10:22:35.5143976Z mad.wide.s32 %rd577, %r14176, 2, %rd76; 2026-02-21T10:22:35.5144174Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.5144299Z add.s32 %r14177, %r14043, %r14169; 2026-02-21T10:22:35.5144412Z add.s32 %r13974, %r14177, %r127; 2026-02-21T10:22:35.5144476Z // begin inline asm 2026-02-21T10:22:35.5144619Z cp.async.ca.shared.global [ %r13974 + 0 ], [ %rd576 + 0 ], 0x8, %r13966; 2026-02-21T10:22:35.5144677Z // end inline asm 2026-02-21T10:22:35.5144743Z add.s32 %r13976, %r13974, 4096; 2026-02-21T10:22:35.5144813Z // begin inline asm 2026-02-21T10:22:35.5145008Z cp.async.ca.shared.global [ %r13976 + 0 ], [ %rd577 + 0 ], 0x8, %r13966; 2026-02-21T10:22:35.5145067Z // end inline asm 2026-02-21T10:22:35.5145134Z cp.async.commit_group; 2026-02-21T10:22:35.5145342Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.5145405Z add.s32 %r13978, %r9368, %r14171; 2026-02-21T10:22:35.5145466Z // begin inline asm 2026-02-21T10:22:35.5145597Z @%p447 mbarrier.arrive.expect_tx.shared.b64 _, [%r13978], 4096; 2026-02-21T10:22:35.5145655Z // end inline asm 2026-02-21T10:22:35.5145859Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.5145936Z add.s32 %r13979, %r9391, %r14168; 2026-02-21T10:22:35.5145993Z bar.sync 0; 2026-02-21T10:22:35.5146060Z elect.sync %r14178|%p464, -1; 2026-02-21T10:22:35.5146130Z and.pred %p465, %p456, %p464; 2026-02-21T10:22:35.5146198Z and.pred %p450, %p1, %p465; 2026-02-21T10:22:35.5146258Z // begin inline asm 2026-02-21T10:22:35.5146720Z @%p450 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r13979], [%rd394, {%r14312, %r13981}], [%r13978]; 
2026-02-21T10:22:35.5146783Z // end inline asm 2026-02-21T10:22:35.5146991Z .loc 1 47 126 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:47:126 2026-02-21T10:22:35.5147052Z add.s32 %r13990, %r14285, 64; 2026-02-21T10:22:35.5147251Z .loc 1 51 22 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:51:22 2026-02-21T10:22:35.5147311Z shl.b32 %r14179, %r13990, 1; 2026-02-21T10:22:35.5147516Z .loc 1 53 25 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:53:25 2026-02-21T10:22:35.5147577Z add.s32 %r14180, %r14179, %r10; 2026-02-21T10:22:35.5147774Z .loc 1 54 60 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:60 2026-02-21T10:22:35.5147840Z add.s32 %r14181, %r14164, %r14180; 2026-02-21T10:22:35.5147902Z add.s32 %r14182, %r14165, %r14180; 2026-02-21T10:22:35.5148102Z .loc 1 54 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:32 2026-02-21T10:22:35.5148173Z mad.wide.s32 %rd579, %r14181, 2, %rd76; 2026-02-21T10:22:35.5148253Z mad.wide.s32 %rd580, %r14182, 2, %rd76; 2026-02-21T10:22:35.5148515Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.5148578Z add.s32 %r14183, %r14076, %r14169; 2026-02-21T10:22:35.5148643Z add.s32 %r13983, %r14183, %r127; 2026-02-21T10:22:35.5148817Z // begin inline asm 2026-02-21T10:22:35.5148965Z cp.async.ca.shared.global [ %r13983 + 0 ], [ %rd579 + 0 ], 0x8, %r13966; 2026-02-21T10:22:35.5149023Z // end inline asm 2026-02-21T10:22:35.5149086Z add.s32 %r13985, %r13983, 4096; 2026-02-21T10:22:35.5149209Z // begin inline asm 2026-02-21T10:22:35.5149347Z cp.async.ca.shared.global [ %r13985 + 0 ], [ %rd580 + 0 ], 0x8, %r13966; 2026-02-21T10:22:35.5149406Z // end inline asm 2026-02-21T10:22:35.5149474Z cp.async.commit_group; 2026-02-21T10:22:35.5149681Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.5149746Z add.s32 %r13987, %r9371, %r14171; 2026-02-21T10:22:35.5149804Z // begin inline asm 
2026-02-21T10:22:35.5149935Z @%p447 mbarrier.arrive.expect_tx.shared.b64 _, [%r13987], 4096; 2026-02-21T10:22:35.5149991Z // end inline asm 2026-02-21T10:22:35.5150207Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.5150333Z add.s32 %r13988, %r9400, %r14168; 2026-02-21T10:22:35.5150390Z bar.sync 0; 2026-02-21T10:22:35.5150459Z elect.sync %r14184|%p466, -1; 2026-02-21T10:22:35.5150525Z and.pred %p467, %p456, %p466; 2026-02-21T10:22:35.5150590Z and.pred %p452, %p1, %p467; 2026-02-21T10:22:35.5150651Z // begin inline asm 2026-02-21T10:22:35.5151040Z @%p452 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r13988], [%rd394, {%r14312, %r13990}], [%r13987]; 2026-02-21T10:22:35.5151099Z // end inline asm 2026-02-21T10:22:35.5151304Z .loc 1 47 126 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:47:126 2026-02-21T10:22:35.5151368Z add.s32 %r13999, %r14285, 96; 2026-02-21T10:22:35.5151567Z .loc 1 51 22 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:51:22 2026-02-21T10:22:35.5151628Z shl.b32 %r14185, %r13999, 1; 2026-02-21T10:22:35.5151842Z .loc 1 53 25 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:53:25 2026-02-21T10:22:35.5151909Z add.s32 %r14186, %r14185, %r10; 2026-02-21T10:22:35.5152108Z .loc 1 54 60 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:60 2026-02-21T10:22:35.5152173Z add.s32 %r14187, %r14164, %r14186; 2026-02-21T10:22:35.5152236Z add.s32 %r14188, %r14165, %r14186; 2026-02-21T10:22:35.5152435Z .loc 1 54 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:32 2026-02-21T10:22:35.5152509Z mad.wide.s32 %rd582, %r14187, 2, %rd76; 2026-02-21T10:22:35.5152580Z mad.wide.s32 %rd583, %r14188, 2, %rd76; 2026-02-21T10:22:35.5152777Z .loc 1 54 80 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:54:80 2026-02-21T10:22:35.5152837Z add.s32 %r14189, %r14109, %r14169; 2026-02-21T10:22:35.5152901Z add.s32 %r13992, %r14189, %r127; 
2026-02-21T10:22:35.5152959Z // begin inline asm 2026-02-21T10:22:35.5153099Z cp.async.ca.shared.global [ %r13992 + 0 ], [ %rd582 + 0 ], 0x8, %r13966; 2026-02-21T10:22:35.5153167Z // end inline asm 2026-02-21T10:22:35.5153233Z add.s32 %r13994, %r13992, 4096; 2026-02-21T10:22:35.5153291Z // begin inline asm 2026-02-21T10:22:35.5153429Z cp.async.ca.shared.global [ %r13994 + 0 ], [ %rd583 + 0 ], 0x8, %r13966; 2026-02-21T10:22:35.5153492Z // end inline asm 2026-02-21T10:22:35.5153557Z cp.async.commit_group; 2026-02-21T10:22:35.5153764Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.5153828Z add.s32 %r13996, %r9374, %r14171; 2026-02-21T10:22:35.5153887Z // begin inline asm 2026-02-21T10:22:35.5154014Z @%p447 mbarrier.arrive.expect_tx.shared.b64 _, [%r13996], 4096; 2026-02-21T10:22:35.5154073Z // end inline asm 2026-02-21T10:22:35.5154271Z .loc 1 60 33 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:60:33 2026-02-21T10:22:35.5154333Z add.s32 %r13997, %r9409, %r14168; 2026-02-21T10:22:35.5154387Z bar.sync 0; 2026-02-21T10:22:35.5154526Z elect.sync %r14190|%p468, -1; 2026-02-21T10:22:35.5154591Z and.pred %p469, %p456, %p468; 2026-02-21T10:22:35.5154654Z and.pred %p454, %p1, %p469; 2026-02-21T10:22:35.5154713Z // begin inline asm 2026-02-21T10:22:35.5155037Z @%p454 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r13997], [%rd394, {%r14312, %r13999}], [%r13996]; 2026-02-21T10:22:35.5155143Z // end inline asm 2026-02-21T10:22:35.5155354Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.5155420Z setp.ne.b32 %p470, %r14284, 31; 2026-02-21T10:22:35.5155484Z @%p470 bra $L__BB0_13; 2026-02-21T10:22:35.5155595Z // %bb.12: // in Loop: Header=BB0_9 Depth=1 2026-02-21T10:22:35.5155814Z .loc 1 38 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:38:32 2026-02-21T10:22:35.5155879Z add.s32 %r14210, %r14282, %r6; 
2026-02-21T10:22:35.5156123Z .loc 1 40 32 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:40:32 2026-02-21T10:22:35.5156192Z add.s32 %r14211, %r14280, %r7; 2026-02-21T10:22:35.5156264Z add.s32 %r14212, %r14280, %r8; 2026-02-21T10:22:35.5156602Z .loc 1 93 28 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:93:28 2026-02-21T10:22:35.5156703Z cvt.rn.bf16x2.f32 %r14213, %r13676, %r13675; 2026-02-21T10:22:35.5156866Z cvt.rn.bf16x2.f32 %r14214, %r13678, %r13677; 2026-02-21T10:22:35.5156959Z cvt.rn.bf16x2.f32 %r14215, %r13680, %r13679; 2026-02-21T10:22:35.5157041Z cvt.rn.bf16x2.f32 %r14216, %r13682, %r13681; 2026-02-21T10:22:35.5157118Z cvt.rn.bf16x2.f32 %r14217, %r13684, %r13683; 2026-02-21T10:22:35.5157192Z cvt.rn.bf16x2.f32 %r14218, %r13686, %r13685; 2026-02-21T10:22:35.5157268Z cvt.rn.bf16x2.f32 %r14219, %r13688, %r13687; 2026-02-21T10:22:35.5157347Z cvt.rn.bf16x2.f32 %r14220, %r13690, %r13689; 2026-02-21T10:22:35.5157567Z .loc 1 94 50 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:94:50 2026-02-21T10:22:35.5157650Z mad.lo.s32 %r14221, %r14211, 1280, %r14210; 2026-02-21T10:22:35.5157733Z mad.lo.s32 %r14222, %r14212, 1280, %r14210; 2026-02-21T10:22:35.5157940Z .loc 1 94 22 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:94:22 2026-02-21T10:22:35.5158018Z mad.wide.s32 %rd593, %r14221, 2, %rd77; 2026-02-21T10:22:35.5158095Z mad.wide.s32 %rd594, %r14222, 2, %rd77; 2026-02-21T10:22:35.5158295Z .loc 1 94 81 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:94:81 2026-02-21T10:22:35.5158418Z st.shared.v4.b32 [%r148], {%r14213, %r14215, %r14217, %r14219}; 2026-02-21T10:22:35.5158546Z st.shared.v4.b32 [%r148+512], {%r14214, %r14216, %r14218, %r14220}; 2026-02-21T10:22:35.5158603Z bar.sync 0; 2026-02-21T10:22:35.5158662Z // begin inline asm 2026-02-21T10:22:35.5158867Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14201, %r14202, %r14203, %r14204}, [%r14195]; 2026-02-21T10:22:35.5158934Z // end inline asm 
2026-02-21T10:22:35.5158991Z // begin inline asm 2026-02-21T10:22:35.5159186Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r14205, %r14206, %r14207, %r14208}, [%r14200]; 2026-02-21T10:22:35.5159247Z // end inline asm 2026-02-21T10:22:35.5159307Z // begin inline asm 2026-02-21T10:22:35.5159437Z st.global.v4.b32 [ %rd593 + 0 ], { %r14201, %r14202, %r14203, %r14204 }; 2026-02-21T10:22:35.5159495Z // end inline asm 2026-02-21T10:22:35.5159556Z // begin inline asm 2026-02-21T10:22:35.5159677Z st.global.v4.b32 [ %rd594 + 0 ], { %r14205, %r14206, %r14207, %r14208 }; 2026-02-21T10:22:35.5159740Z // end inline asm 2026-02-21T10:22:35.5159805Z mov.b32 %r13675, 0f00000000; 2026-02-21T10:22:35.5159868Z mov.b32 %r13676, %r13675; 2026-02-21T10:22:35.5159927Z mov.b32 %r13677, %r13675; 2026-02-21T10:22:35.5159986Z mov.b32 %r13678, %r13675; 2026-02-21T10:22:35.5160047Z mov.b32 %r13679, %r13675; 2026-02-21T10:22:35.5160106Z mov.b32 %r13680, %r13675; 2026-02-21T10:22:35.5160254Z mov.b32 %r13681, %r13675; 2026-02-21T10:22:35.5160318Z mov.b32 %r13682, %r13675; 2026-02-21T10:22:35.5160376Z mov.b32 %r13683, %r13675; 2026-02-21T10:22:35.5160433Z mov.b32 %r13684, %r13675; 2026-02-21T10:22:35.5160494Z mov.b32 %r13685, %r13675; 2026-02-21T10:22:35.5160615Z mov.b32 %r13686, %r13675; 2026-02-21T10:22:35.5160673Z mov.b32 %r13687, %r13675; 2026-02-21T10:22:35.5160729Z mov.b32 %r13688, %r13675; 2026-02-21T10:22:35.5160791Z mov.b32 %r13689, %r13675; 2026-02-21T10:22:35.5160849Z mov.b32 %r13690, %r13675; 2026-02-21T10:22:35.5160908Z bra.uni $L__BB0_13; 2026-02-21T10:22:35.5161007Z $L__BB0_14: // %._crit_edge233 2026-02-21T10:22:35.5161223Z .loc 1 26 107 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:107 2026-02-21T10:22:35.5161291Z cp.async.wait_group 0; 2026-02-21T10:22:35.5161346Z bar.sync 0; 2026-02-21T10:22:35.5161409Z // begin inline asm 2026-02-21T10:22:35.5161506Z @%p472 mbarrier.inval.shared::cta.b64 [%r9374]; 2026-02-21T10:22:35.5161626Z // end inline asm 
2026-02-21T10:22:35.5161686Z bar.sync 0; 2026-02-21T10:22:35.5161745Z // begin inline asm 2026-02-21T10:22:35.5161835Z @%p472 mbarrier.inval.shared::cta.b64 [%r9375]; 2026-02-21T10:22:35.5161895Z // end inline asm 2026-02-21T10:22:35.5161954Z bar.sync 0; 2026-02-21T10:22:35.5162012Z // begin inline asm 2026-02-21T10:22:35.5162151Z @%p472 mbarrier.inval.shared::cta.b64 [%r9376]; 2026-02-21T10:22:35.5162212Z // end inline asm 2026-02-21T10:22:35.5162271Z // begin inline asm 2026-02-21T10:22:35.5162359Z @%p472 mbarrier.inval.shared::cta.b64 [%r9371]; 2026-02-21T10:22:35.5162419Z // end inline asm 2026-02-21T10:22:35.5162482Z bar.sync 0; 2026-02-21T10:22:35.5162540Z // begin inline asm 2026-02-21T10:22:35.5162627Z @%p472 mbarrier.inval.shared::cta.b64 [%r9372]; 2026-02-21T10:22:35.5162686Z // end inline asm 2026-02-21T10:22:35.5162740Z bar.sync 0; 2026-02-21T10:22:35.5162797Z // begin inline asm 2026-02-21T10:22:35.5162890Z @%p472 mbarrier.inval.shared::cta.b64 [%r9373]; 2026-02-21T10:22:35.5162946Z // end inline asm 2026-02-21T10:22:35.5163004Z // begin inline asm 2026-02-21T10:22:35.5163091Z @%p472 mbarrier.inval.shared::cta.b64 [%r9368]; 2026-02-21T10:22:35.5163152Z // end inline asm 2026-02-21T10:22:35.5163205Z bar.sync 0; 2026-02-21T10:22:35.5163261Z // begin inline asm 2026-02-21T10:22:35.5163350Z @%p472 mbarrier.inval.shared::cta.b64 [%r9369]; 2026-02-21T10:22:35.5163416Z // end inline asm 2026-02-21T10:22:35.5163471Z bar.sync 0; 2026-02-21T10:22:35.5163529Z // begin inline asm 2026-02-21T10:22:35.5163617Z @%p472 mbarrier.inval.shared::cta.b64 [%r9370]; 2026-02-21T10:22:35.5163673Z // end inline asm 2026-02-21T10:22:35.5163731Z // begin inline asm 2026-02-21T10:22:35.5163820Z @%p472 mbarrier.inval.shared::cta.b64 [%r9365]; 2026-02-21T10:22:35.5163876Z // end inline asm 2026-02-21T10:22:35.5163930Z bar.sync 0; 2026-02-21T10:22:35.5163990Z // begin inline asm 2026-02-21T10:22:35.5164091Z @%p472 mbarrier.inval.shared::cta.b64 [%r9366]; 2026-02-21T10:22:35.5164150Z 
// end inline asm 2026-02-21T10:22:35.5164206Z bar.sync 0; 2026-02-21T10:22:35.5164266Z // begin inline asm 2026-02-21T10:22:35.5164354Z @%p472 mbarrier.inval.shared::cta.b64 [%r9367]; 2026-02-21T10:22:35.5164412Z // end inline asm 2026-02-21T10:22:35.5164625Z .loc 1 26 4 // co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py:26:4 2026-02-21T10:22:35.5164680Z ret; 2026-02-21T10:22:35.5164735Z $L__tmp25: 2026-02-21T10:22:35.5164792Z $L__func_end0: 2026-02-21T10:22:35.5164893Z // -- End function 2026-02-21T10:22:35.5164946Z } 2026-02-21T10:22:35.5165201Z .file 1 "/tmp/torchinductor_root/o6/co65rnlfxczvdnqh7bldz6qkjg7lpyuocmscbas2rwqp3sadtcis.py" 2026-02-21T10:22:35.5165420Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T10:22:35.5165484Z .section .debug_abbrev 2026-02-21T10:22:35.5165535Z { 2026-02-21T10:22:35.5165706Z .b8 1 // Abbreviation Code 2026-02-21T10:22:35.5165803Z .b8 17 // DW_TAG_compile_unit 2026-02-21T10:22:35.5165886Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:22:35.5166016Z .b8 37 // DW_AT_producer 2026-02-21T10:22:35.5166102Z .b8 8 // DW_FORM_string 2026-02-21T10:22:35.5166184Z .b8 19 // DW_AT_language 2026-02-21T10:22:35.5166265Z .b8 5 // DW_FORM_data2 2026-02-21T10:22:35.5166345Z .b8 3 // DW_AT_name 2026-02-21T10:22:35.5166424Z .b8 8 // DW_FORM_string 2026-02-21T10:22:35.5166629Z .b8 16 // DW_AT_stmt_list 2026-02-21T10:22:35.5166717Z .b8 6 // DW_FORM_data4 2026-02-21T10:22:35.5166795Z .b8 27 // DW_AT_comp_dir 2026-02-21T10:22:35.5166956Z .b8 8 // DW_FORM_string 2026-02-21T10:22:35.5167037Z .b8 0 // EOM(1) 2026-02-21T10:22:35.5167108Z .b8 0 // EOM(2) 2026-02-21T10:22:35.5167200Z .b8 2 // Abbreviation Code 2026-02-21T10:22:35.5167287Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:22:35.5167447Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:22:35.5167525Z .b8 3 // DW_AT_name 2026-02-21T10:22:35.5167604Z .b8 8 // DW_FORM_string 2026-02-21T10:22:35.5167688Z .b8 32 // DW_AT_inline 2026-02-21T10:22:35.5167768Z 
.b8 11 // DW_FORM_data1 2026-02-21T10:22:35.5167848Z .b8 0 // EOM(1) 2026-02-21T10:22:35.5167923Z .b8 0 // EOM(2) 2026-02-21T10:22:35.5168017Z .b8 3 // Abbreviation Code 2026-02-21T10:22:35.5168104Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:22:35.5168184Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:22:35.5168269Z .b8 17 // DW_AT_low_pc 2026-02-21T10:22:35.5168345Z .b8 1 // DW_FORM_addr 2026-02-21T10:22:35.5168429Z .b8 18 // DW_AT_high_pc 2026-02-21T10:22:35.5168512Z .b8 1 // DW_FORM_addr 2026-02-21T10:22:35.5168606Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:22:35.5168694Z .b8 19 // DW_FORM_ref4 2026-02-21T10:22:35.5168769Z .b8 0 // EOM(1) 2026-02-21T10:22:35.5168841Z .b8 0 // EOM(2) 2026-02-21T10:22:35.5168925Z .b8 4 // Abbreviation Code 2026-02-21T10:22:35.5169031Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T10:22:35.5169115Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:22:35.5169204Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:22:35.5169285Z .b8 19 // DW_FORM_ref4 2026-02-21T10:22:35.5169371Z .b8 17 // DW_AT_low_pc 2026-02-21T10:22:35.5169444Z .b8 1 // DW_FORM_addr 2026-02-21T10:22:35.5169523Z .b8 18 // DW_AT_high_pc 2026-02-21T10:22:35.5169600Z .b8 1 // DW_FORM_addr 2026-02-21T10:22:35.5169681Z .b8 88 // DW_AT_call_file 2026-02-21T10:22:35.5169758Z .b8 11 // DW_FORM_data1 2026-02-21T10:22:35.5169839Z .b8 89 // DW_AT_call_line 2026-02-21T10:22:35.5169917Z .b8 11 // DW_FORM_data1 2026-02-21T10:22:35.5170090Z .b8 87 // DW_AT_call_column 2026-02-21T10:22:35.5170170Z .b8 11 // DW_FORM_data1 2026-02-21T10:22:35.5170243Z .b8 0 // EOM(1) 2026-02-21T10:22:35.5170370Z .b8 0 // EOM(2) 2026-02-21T10:22:35.5170440Z .b8 0 // EOM(3) 2026-02-21T10:22:35.5170496Z } 2026-02-21T10:22:35.5170561Z .section .debug_info 2026-02-21T10:22:35.5170612Z { 2026-02-21T10:22:35.5170699Z .b32 178 // Length of Unit 2026-02-21T10:22:35.5170794Z .b8 2 // DWARF version number 2026-02-21T10:22:35.5170846Z .b8 0 2026-02-21T10:22:35.5170978Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T10:22:35.5171084Z .b8 8 // Address Size (in bytes) 2026-02-21T10:22:35.5171249Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T10:22:35.5171340Z .b8 116 // DW_AT_producer 2026-02-21T10:22:35.5171395Z .b8 114 2026-02-21T10:22:35.5171447Z .b8 105 2026-02-21T10:22:35.5171501Z .b8 116 2026-02-21T10:22:35.5171552Z .b8 111 2026-02-21T10:22:35.5171605Z .b8 110 2026-02-21T10:22:35.5171656Z .b8 0 2026-02-21T10:22:35.5171781Z .b8 2 // DW_AT_language 2026-02-21T10:22:35.5171836Z .b8 0 2026-02-21T10:22:35.5171915Z .b8 99 // DW_AT_name 2026-02-21T10:22:35.5171967Z .b8 111 2026-02-21T10:22:35.5172017Z .b8 54 2026-02-21T10:22:35.5172070Z .b8 53 2026-02-21T10:22:35.5172120Z .b8 114 2026-02-21T10:22:35.5172170Z .b8 110 2026-02-21T10:22:35.5172224Z .b8 108 2026-02-21T10:22:35.5172274Z .b8 102 2026-02-21T10:22:35.5172324Z .b8 120 2026-02-21T10:22:35.5172375Z .b8 99 2026-02-21T10:22:35.5172428Z .b8 122 2026-02-21T10:22:35.5172480Z .b8 118 2026-02-21T10:22:35.5172530Z .b8 100 2026-02-21T10:22:35.5172583Z .b8 110 2026-02-21T10:22:35.5172638Z .b8 113 2026-02-21T10:22:35.5172689Z .b8 104 2026-02-21T10:22:35.5172738Z .b8 55 2026-02-21T10:22:35.5172791Z .b8 98 2026-02-21T10:22:35.5172842Z .b8 108 2026-02-21T10:22:35.5172893Z .b8 100 2026-02-21T10:22:35.5172949Z .b8 122 2026-02-21T10:22:35.5173013Z .b8 54 2026-02-21T10:22:35.5173070Z .b8 113 2026-02-21T10:22:35.5173123Z .b8 107 2026-02-21T10:22:35.5173179Z .b8 106 2026-02-21T10:22:35.5173232Z .b8 103 2026-02-21T10:22:35.5173282Z .b8 55 2026-02-21T10:22:35.5173333Z .b8 108 2026-02-21T10:22:35.5173386Z .b8 112 2026-02-21T10:22:35.5173437Z .b8 121 2026-02-21T10:22:35.5173487Z .b8 117 2026-02-21T10:22:35.5173540Z .b8 111 2026-02-21T10:22:35.5173590Z .b8 99 2026-02-21T10:22:35.5173640Z .b8 109 2026-02-21T10:22:35.5173690Z .b8 115 2026-02-21T10:22:35.5173742Z .b8 99 2026-02-21T10:22:35.5173792Z .b8 98 2026-02-21T10:22:35.5173843Z .b8 97 2026-02-21T10:22:35.5173895Z .b8 115 2026-02-21T10:22:35.5173947Z .b8 50 
2026-02-21T10:22:35.5174000Z .b8 114 2026-02-21T10:22:35.5174050Z .b8 119 2026-02-21T10:22:35.5174107Z .b8 113 2026-02-21T10:22:35.5174158Z .b8 112 2026-02-21T10:22:35.5174208Z .b8 51 2026-02-21T10:22:35.5174258Z .b8 115 2026-02-21T10:22:35.5174310Z .b8 97 2026-02-21T10:22:35.5174361Z .b8 100 2026-02-21T10:22:35.5174426Z .b8 116 2026-02-21T10:22:35.5174480Z .b8 99 2026-02-21T10:22:35.5174532Z .b8 105 2026-02-21T10:22:35.5174583Z .b8 115 2026-02-21T10:22:35.5174633Z .b8 46 2026-02-21T10:22:35.5174687Z .b8 112 2026-02-21T10:22:35.5174738Z .b8 121 2026-02-21T10:22:35.5174788Z .b8 0 2026-02-21T10:22:35.5174906Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T10:22:35.5174990Z .b8 47 // DW_AT_comp_dir 2026-02-21T10:22:35.5175042Z .b8 116 2026-02-21T10:22:35.5175095Z .b8 109 2026-02-21T10:22:35.5175147Z .b8 112 2026-02-21T10:22:35.5175196Z .b8 47 2026-02-21T10:22:35.5175248Z .b8 116 2026-02-21T10:22:35.5175299Z .b8 111 2026-02-21T10:22:35.5175352Z .b8 114 2026-02-21T10:22:35.5175401Z .b8 99 2026-02-21T10:22:35.5175513Z .b8 104 2026-02-21T10:22:35.5175569Z .b8 105 2026-02-21T10:22:35.5175619Z .b8 110 2026-02-21T10:22:35.5175669Z .b8 100 2026-02-21T10:22:35.5175719Z .b8 117 2026-02-21T10:22:35.5175771Z .b8 99 2026-02-21T10:22:35.5175821Z .b8 116 2026-02-21T10:22:35.5175919Z .b8 111 2026-02-21T10:22:35.5175982Z .b8 114 2026-02-21T10:22:35.5176034Z .b8 95 2026-02-21T10:22:35.5176085Z .b8 114 2026-02-21T10:22:35.5176136Z .b8 111 2026-02-21T10:22:35.5176190Z .b8 111 2026-02-21T10:22:35.5176243Z .b8 116 2026-02-21T10:22:35.5176293Z .b8 47 2026-02-21T10:22:35.5176347Z .b8 111 2026-02-21T10:22:35.5176398Z .b8 54 2026-02-21T10:22:35.5176567Z .b8 0 2026-02-21T10:22:35.5176688Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T10:22:35.5176768Z .b8 95 // DW_AT_name 2026-02-21T10:22:35.5176819Z .b8 104 2026-02-21T10:22:35.5176870Z .b8 101 2026-02-21T10:22:35.5176924Z .b8 108 2026-02-21T10:22:35.5176975Z .b8 105 2026-02-21T10:22:35.5177031Z .b8 111 2026-02-21T10:22:35.5177157Z 
.b8 110 2026-02-21T10:22:35.5177213Z .b8 95 2026-02-21T10:22:35.5177265Z .b8 109 2026-02-21T10:22:35.5177314Z .b8 97 2026-02-21T10:22:35.5177365Z .b8 116 2026-02-21T10:22:35.5177417Z .b8 109 2026-02-21T10:22:35.5177472Z .b8 117 2026-02-21T10:22:35.5177522Z .b8 108 2026-02-21T10:22:35.5177573Z .b8 95 2026-02-21T10:22:35.5177625Z .b8 98 2026-02-21T10:22:35.5177675Z .b8 102 2026-02-21T10:22:35.5177736Z .b8 49 2026-02-21T10:22:35.5177856Z .b8 54 2026-02-21T10:22:35.5177911Z .b8 95 2026-02-21T10:22:35.5177962Z .b8 105 2026-02-21T10:22:35.5178015Z .b8 110 2026-02-21T10:22:35.5178066Z .b8 116 2026-02-21T10:22:35.5178115Z .b8 52 2026-02-21T10:22:35.5178165Z .b8 0 2026-02-21T10:22:35.5178247Z .b8 1 // DW_AT_inline 2026-02-21T10:22:35.5178355Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T10:22:35.5178450Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T10:22:35.5178559Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T10:22:35.5178663Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:22:35.5178792Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T10:22:35.5178894Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:22:35.5178982Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T10:22:35.5179074Z .b64 $L__tmp24 // DW_AT_high_pc 2026-02-21T10:22:35.5179157Z .b8 1 // DW_AT_call_file 2026-02-21T10:22:35.5179243Z .b8 90 // DW_AT_call_line 2026-02-21T10:22:35.5179329Z .b8 40 // DW_AT_call_column 2026-02-21T10:22:35.5179419Z .b8 0 // End Of Children Mark 2026-02-21T10:22:35.5179514Z .b8 0 // End Of Children Mark 2026-02-21T10:22:35.5179564Z } 2026-02-21T10:22:35.5179636Z .section .debug_macinfo { } 2026-02-21T10:22:35.5179644Z 2026-02-21T10:22:35.5179723Z ================================================================ 2026-02-21T10:22:35.5179840Z please share the reproducer above with Triton project. 
2026-02-21T10:22:41.6463115Z 2026-02-21T10:22:41.6463131Z 2026-02-21T10:22:41.6463137Z 2026-02-21T10:22:41.6463631Z ================================================================ 2026-02-21T10:22:41.6464052Z Internal Triton PTX codegen error 2026-02-21T10:22:41.6464332Z `ptxas` stderr: 2026-02-21T10:22:41.6465055Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 629 in function _helion_matmul_bf16_int4. Try to compile with register target of 46 or higher. 2026-02-21T10:22:41.6465870Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:22:41.6466116Z 2026-02-21T10:22:41.6467003Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpfc7xsvh9.ptx -o /tmp/tmpfc7xsvh9.ptx.o 2026-02-21T10:22:41.6468133Z 2026-02-21T10:22:41.6468139Z 2026-02-21T10:22:41.6468217Z // 2026-02-21T10:22:41.6468428Z // Generated by LLVM NVPTX Back-End 2026-02-21T10:22:41.6468817Z // 2026-02-21T10:22:41.6469153Z 2026-02-21T10:22:41.6469230Z .version 8.7 2026-02-21T10:22:41.6469420Z .target sm_90a 2026-02-21T10:22:41.6469629Z .address_size 64 2026-02-21T10:22:41.6469750Z 2026-02-21T10:22:41.6469992Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T10:22:41.6470435Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T10:22:41.6470769Z // @_helion_matmul_bf16_int4 2026-02-21T10:22:41.6471122Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T10:22:41.6471536Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T10:22:41.6471989Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T10:22:41.6472478Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T10:22:41.6472866Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T10:22:41.6473241Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 
2026-02-21T10:22:41.6473558Z ) 2026-02-21T10:22:41.6473693Z .reqntid 128 2026-02-21T10:22:41.6473853Z .maxnreg 32 2026-02-21T10:22:41.6473995Z { 2026-02-21T10:22:41.6474144Z .reg .pred %p<269>; 2026-02-21T10:22:41.6474504Z .reg .b16 %rs<1345>; 2026-02-21T10:22:41.6474683Z .reg .b32 %r<5543>; 2026-02-21T10:22:41.6474857Z .reg .b64 %rd<462>; 2026-02-21T10:22:41.6475189Z .loc 1 19 0 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:19:0 2026-02-21T10:22:41.6475592Z $L__func_begin0: 2026-02-21T10:22:41.6475933Z .loc 1 19 0 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:19:0 2026-02-21T10:22:41.6476272Z 2026-02-21T10:22:41.6476334Z // %bb.0: 2026-02-21T10:22:41.6476716Z ld.param.b64 %rd36, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T10:22:41.6477055Z ld.param.b64 %rd35, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T10:22:41.6477324Z $L__tmp0: 2026-02-21T10:22:41.6497301Z .loc 1 21 67 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:21:67 2026-02-21T10:22:41.6497853Z mov.u32 %r5444, %ctaid.x; 2026-02-21T10:22:41.6498130Z ld.param.b64 %rd38, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T10:22:41.6498423Z mov.u32 %r271, %ctaid.y; 2026-02-21T10:22:41.6498671Z ld.param.b64 %rd55, [_helion_matmul_bf16_int4_param_3]; 2026-02-21T10:22:41.6499738Z [3354s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T10:22:41.6501368Z Config: @helion.kernel(config=helion.Config(block_sizes=[32, 64, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[32], load_eviction_policies=['last', 'last'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=1, num_stages=4, num_warps=4, pid_type='persistent_interleaved', range_flattens=[True, True], range_multi_buffers=[False, True], range_num_stages=[2, 3], range_unroll_factors=[2, 4], range_warp_specializes=[]), static_shapes=True) 2026-02-21T10:22:41.6502956Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T10:22:41.6503259Z `ptxas` stderr: 2026-02-21T10:22:41.6503816Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 629 in function _helion_matmul_bf16_int4. Try to compile with register target of 46 or higher. 2026-02-21T10:22:41.6504459Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:22:41.6504660Z 2026-02-21T10:22:41.6505199Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpfc7xsvh9.ptx -o /tmp/tmpfc7xsvh9.ptx.o 2026-02-21T10:22:41.6505786Z 2026-02-21T10:22:41.6505943Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T10:22:41.6506254Z mov.u32 %r272, %ctaid.z; 2026-02-21T10:22:41.6506830Z mov.u32 %r273, %nctaid.x; 2026-02-21T10:22:41.6507023Z mov.u32 %r274, %nctaid.y; 2026-02-21T10:22:41.6507229Z mad.lo.s32 %r275, %r272, %r274, %r271; 2026-02-21T10:22:41.6507457Z mad.lo.s32 %r276, %r275, %r273, %r5444; 2026-02-21T10:22:41.6507823Z shl.b32 %r277, %r276, 7; 2026-02-21T10:22:41.6508007Z cvt.s64.s32 %rd56, %r277; 2026-02-21T10:22:41.6508202Z add.s64 %rd52, %rd55, %rd56; 2026-02-21T10:22:41.6508402Z mov.u32 %r2, %tid.x; 2026-02-21T10:22:41.6508679Z setp.lt.u32 %p1, %r2, 32; 2026-02-21T10:22:41.6508861Z shl.b32 %r278, %r2, 2; 2026-02-21T10:22:41.6509046Z mov.b32 %r1928, global_smem; 2026-02-21T10:22:41.6509231Z add.s32 %r263, %r1928, %r278; 2026-02-21T10:22:41.6509422Z mov.b32 %r264, 0; 2026-02-21T10:22:41.6509593Z // begin inline asm 2026-02-21T10:22:41.6509781Z @%p1 st.shared.b32 [ %r263 + 0 ], %r264; 2026-02-21T10:22:41.6510003Z // end inline asm 2026-02-21T10:22:41.6510169Z bar.warp.sync -1; 2026-02-21T10:22:41.6510351Z setp.eq.b32 %p256, %r2, 0; 2026-02-21T10:22:41.6510630Z cvt.u64.u32 %rd37, %r1928; 2026-02-21T10:22:41.6510820Z // begin inline asm 2026-02-21T10:22:41.6511161Z @%p256 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd37 + 0 ], %rd38; 2026-02-21T10:22:41.6511540Z // end inline asm 2026-02-21T10:22:41.6511703Z // begin inline asm 2026-02-21T10:22:41.6511989Z @%p256 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd37 + 0 ], 0x1; 2026-02-21T10:22:41.6512484Z // end inline asm 2026-02-21T10:22:41.6512654Z mov.b32 %r265, 32; 2026-02-21T10:22:41.6512827Z // begin inline asm 2026-02-21T10:22:41.6513116Z @%p256 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd37 + 0 ], 0x0, %r265; 2026-02-21T10:22:41.6513467Z // end inline asm 2026-02-21T10:22:41.6513619Z // begin inline asm 2026-02-21T10:22:41.6513912Z @%p256 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd37 + 0 ], 0x1, %r265; 2026-02-21T10:22:41.6514255Z // end inline 
asm 2026-02-21T10:22:41.6514407Z mov.b32 %r267, 1280; 2026-02-21T10:22:41.6514583Z // begin inline asm 2026-02-21T10:22:41.6514894Z @%p256 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd37 + 0 ], 0x0, %r267; 2026-02-21T10:22:41.6515263Z // end inline asm 2026-02-21T10:22:41.6515420Z mov.b32 %r268, 4096; 2026-02-21T10:22:41.6515607Z // begin inline asm 2026-02-21T10:22:41.6515919Z @%p256 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd37 + 0 ], 0x1, %r268; 2026-02-21T10:22:41.6516271Z // end inline asm 2026-02-21T10:22:41.6516438Z mov.b64 %rd45, 1280; 2026-02-21T10:22:41.6516737Z // begin inline asm 2026-02-21T10:22:41.6517062Z @%p256 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd37 + 0 ], 0x0, %rd45; 2026-02-21T10:22:41.6517428Z // end inline asm 2026-02-21T10:22:41.6517586Z mov.b32 %r269, 1; 2026-02-21T10:22:41.6517750Z // begin inline asm 2026-02-21T10:22:41.6518078Z @%p256 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd37 + 0 ], 0x0, %r269; 2026-02-21T10:22:41.6518456Z // end inline asm 2026-02-21T10:22:41.6518612Z // begin inline asm 2026-02-21T10:22:41.6518931Z @%p256 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd37 + 0 ], 0x1, %r269; 2026-02-21T10:22:41.6519290Z // end inline asm 2026-02-21T10:22:41.6519457Z // begin inline asm 2026-02-21T10:22:41.6519751Z @%p256 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd37 + 0 ], 0x0; 2026-02-21T10:22:41.6520096Z // end inline asm 2026-02-21T10:22:41.6520263Z // begin inline asm 2026-02-21T10:22:41.6520580Z @%p256 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd37 + 0 ], 0x0; 2026-02-21T10:22:41.6520951Z // end inline asm 2026-02-21T10:22:41.6521103Z // begin inline asm 2026-02-21T10:22:41.6521402Z @%p256 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd37 + 0 ], 0x1; 2026-02-21T10:22:41.6521737Z // end inline asm 2026-02-21T10:22:41.6521898Z // begin inline asm 2026-02-21T10:22:41.6522177Z 
@%p256 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd37 + 0 ], 0x0; 2026-02-21T10:22:41.6522640Z // end inline asm 2026-02-21T10:22:41.6522802Z // begin inline asm 2026-02-21T10:22:41.6523236Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd52 + 0 ], [ %rd37 + 0 ], 0x80; 2026-02-21T10:22:41.6523796Z // end inline asm 2026-02-21T10:22:41.6523947Z // begin inline asm 2026-02-21T10:22:41.6524202Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd52 + 0 ], 0x80; 2026-02-21T10:22:41.6524512Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T10:22:41.6524741Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T10:22:41.6524958Z // end inline asm 2026-02-21T10:22:41.6525110Z bar.sync 0; 2026-02-21T10:22:41.6525280Z cvta.global.u64 %rd247, %rd52; 2026-02-21T10:22:41.6525614Z .loc 1 0 0 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:0 2026-02-21T10:22:41.6525989Z sub.s32 %r280, 41091, %r5444; 2026-02-21T10:22:41.6526421Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.6526972Z mul.hi.u32 %r281, %r280, 1041204193; 2026-02-21T10:22:41.6527185Z shr.u32 %r282, %r281, 5; 2026-02-21T10:22:41.6527378Z and.b32 %r283, %r282, 33554430; 2026-02-21T10:22:41.6527591Z mad.lo.s32 %r5518, %r283, 132, %r5444; 2026-02-21T10:22:41.6527944Z .loc 1 38 45 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:38:45 2026-02-21T10:22:41.6528400Z shr.u32 %r4, %r2, 5; 2026-02-21T10:22:41.6528574Z and.b32 %r5, %r2, 3; 2026-02-21T10:22:41.6528743Z shl.b32 %r6, %r5, 3; 2026-02-21T10:22:41.6529044Z .loc 1 40 45 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:40:45 2026-02-21T10:22:41.6529411Z and.b32 %r7, %r2, 112; 2026-02-21T10:22:41.6529596Z bfe.u32 %r8, %r2, 4, 3; 2026-02-21T10:22:41.6529771Z or.b32 %r9, %r8, 8; 2026-02-21T10:22:41.6529942Z or.b32 %r10, %r8, 16; 2026-02-21T10:22:41.6530108Z or.b32 %r11, %r8, 24; 2026-02-21T10:22:41.6530282Z or.b32 %r12, %r8, 32; 
2026-02-21T10:22:41.6530447Z or.b32 %r13, %r8, 40; 2026-02-21T10:22:41.6530616Z or.b32 %r14, %r8, 48; 2026-02-21T10:22:41.6530780Z or.b32 %r15, %r8, 56; 2026-02-21T10:22:41.6530951Z and.b32 %r16, %r2, 120; 2026-02-21T10:22:41.6531128Z bfe.u32 %r18, %r2, 2, 5; 2026-02-21T10:22:41.6531311Z or.b32 %r19, %r18, 32; 2026-02-21T10:22:41.6531495Z and.b32 %r20, %r278, 60; 2026-02-21T10:22:41.6531672Z and.b32 %r21, %r2, 7; 2026-02-21T10:22:41.6531983Z .loc 1 71 38 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:71:38 2026-02-21T10:22:41.6532336Z and.b32 %r22, %r2, 32; 2026-02-21T10:22:41.6532659Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.6533028Z setp.ge.s32 %p19, %r5444, %r5518; 2026-02-21T10:22:41.6533232Z and.b32 %r5435, %r2, 96; 2026-02-21T10:22:41.6533415Z and.b32 %r5436, %r2, 31; 2026-02-21T10:22:41.6533594Z shr.u32 %r5437, %r2, 1; 2026-02-21T10:22:41.6533771Z shl.b32 %r5438, %r21, 4; 2026-02-21T10:22:41.6533946Z shl.b32 %r5439, %r2, 7; 2026-02-21T10:22:41.6534121Z shl.b32 %r5440, %r5, 5; 2026-02-21T10:22:41.6534288Z and.b32 %r5441, %r2, 28; 2026-02-21T10:22:41.6534465Z shl.b32 %r5442, %r2, 9; 2026-02-21T10:22:41.6534638Z shl.b32 %r5443, %r16, 2; 2026-02-21T10:22:41.6534824Z setp.eq.b32 %p268, %r22, 0; 2026-02-21T10:22:41.6535011Z @%p19 bra $L__BB0_7; 2026-02-21T10:22:41.6535204Z // %bb.1: // %.lr.ph 2026-02-21T10:22:41.6535574Z .loc 1 0 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:0:139 2026-02-21T10:22:41.6535936Z bfe.u32 %r17, %r2, 3, 4; 2026-02-21T10:22:41.6536128Z shl.b32 %r284, %r2, 4; 2026-02-21T10:22:41.6536303Z and.b32 %r285, %r284, 2032; 2026-02-21T10:22:41.6536612Z and.b32 %r286, %r2, 56; 2026-02-21T10:22:41.6536795Z xor.b32 %r287, %r285, %r286; 2026-02-21T10:22:41.6536987Z add.s32 %r23, %r1928, %r287; 2026-02-21T10:22:41.6537269Z xor.b32 %r289, %r287, 8; 2026-02-21T10:22:41.6537455Z add.s32 %r24, %r1928, %r289; 2026-02-21T10:22:41.6537638Z shl.b32 %r291, %r5435, 6; 
2026-02-21T10:22:41.6537824Z shl.b32 %r292, %r2, 5; 2026-02-21T10:22:41.6538006Z and.b32 %r293, %r292, 896; 2026-02-21T10:22:41.6538257Z shl.b32 %r295, %r5436, 1; 2026-02-21T10:22:41.6538439Z or.b32 %r296, %r291, %r293; 2026-02-21T10:22:41.6538616Z or.b32 %r297, %r296, %r295; 2026-02-21T10:22:41.6538804Z add.s32 %r25, %r1928, %r297; 2026-02-21T10:22:41.6538984Z xor.b32 %r298, %r297, 8; 2026-02-21T10:22:41.6539168Z add.s32 %r26, %r1928, %r298; 2026-02-21T10:22:41.6539346Z xor.b32 %r299, %r297, 16; 2026-02-21T10:22:41.6539527Z add.s32 %r27, %r1928, %r299; 2026-02-21T10:22:41.6539723Z xor.b32 %r300, %r297, 24; 2026-02-21T10:22:41.6539899Z add.s32 %r28, %r1928, %r300; 2026-02-21T10:22:41.6540082Z xor.b32 %r301, %r297, 32; 2026-02-21T10:22:41.6540252Z add.s32 %r29, %r1928, %r301; 2026-02-21T10:22:41.6540434Z xor.b32 %r302, %r297, 40; 2026-02-21T10:22:41.6540686Z add.s32 %r30, %r1928, %r302; 2026-02-21T10:22:41.6540873Z xor.b32 %r303, %r297, 48; 2026-02-21T10:22:41.6541045Z add.s32 %r31, %r1928, %r303; 2026-02-21T10:22:41.6541228Z xor.b32 %r304, %r297, 56; 2026-02-21T10:22:41.6541401Z add.s32 %r32, %r1928, %r304; 2026-02-21T10:22:41.6541593Z and.b32 %r306, %r5437, 32; 2026-02-21T10:22:41.6541778Z or.b32 %r307, %r306, %r5436; 2026-02-21T10:22:41.6541956Z add.s32 %r33, %r1928, %r307; 2026-02-21T10:22:41.6542219Z xor.b32 %r308, %r307, 16; 2026-02-21T10:22:41.6542398Z add.s32 %r34, %r1928, %r308; 2026-02-21T10:22:41.6542586Z shl.b32 %r309, %r5436, 7; 2026-02-21T10:22:41.6542772Z shr.u32 %r311, %r5435, 3; 2026-02-21T10:22:41.6542952Z or.b32 %r312, %r309, %r311; 2026-02-21T10:22:41.6543132Z or.b32 %r313, %r312, %r5438; 2026-02-21T10:22:41.6543316Z add.s32 %r35, %r1928, %r313; 2026-02-21T10:22:41.6543507Z xor.b32 %r314, %r313, 16; 2026-02-21T10:22:41.6543686Z add.s32 %r36, %r1928, %r314; 2026-02-21T10:22:41.6543876Z xor.b32 %r315, %r313, 32; 2026-02-21T10:22:41.6544056Z add.s32 %r37, %r1928, %r315; 2026-02-21T10:22:41.6544240Z xor.b32 %r316, %r313, 48; 
2026-02-21T10:22:41.6544412Z add.s32 %r38, %r1928, %r316; 2026-02-21T10:22:41.6544593Z xor.b32 %r317, %r313, 64; 2026-02-21T10:22:41.6544765Z add.s32 %r39, %r1928, %r317; 2026-02-21T10:22:41.6544963Z xor.b32 %r318, %r313, 80; 2026-02-21T10:22:41.6545134Z add.s32 %r40, %r1928, %r318; 2026-02-21T10:22:41.6545317Z xor.b32 %r319, %r313, 96; 2026-02-21T10:22:41.6545494Z add.s32 %r41, %r1928, %r319; 2026-02-21T10:22:41.6545672Z xor.b32 %r320, %r313, 112; 2026-02-21T10:22:41.6545860Z add.s32 %r42, %r1928, %r320; 2026-02-21T10:22:41.6546042Z bfe.u32 %r321, %r1928, 4, 14; 2026-02-21T10:22:41.6546231Z cvt.u64.u32 %rd57, %r321; 2026-02-21T10:22:41.6546439Z or.b64 %rd166, %rd57, 4611686293322072064; 2026-02-21T10:22:41.6546802Z add.s32 %r322, %r1928, 32; 2026-02-21T10:22:41.6546985Z bfe.u32 %r323, %r322, 4, 14; 2026-02-21T10:22:41.6547169Z cvt.u64.u32 %rd58, %r323; 2026-02-21T10:22:41.6547362Z or.b64 %rd167, %rd58, 4611686293322072064; 2026-02-21T10:22:41.6547580Z add.s32 %r324, %r1928, 64; 2026-02-21T10:22:41.6547765Z bfe.u32 %r325, %r324, 4, 14; 2026-02-21T10:22:41.6547958Z cvt.u64.u32 %rd59, %r325; 2026-02-21T10:22:41.6548160Z or.b64 %rd168, %rd59, 4611686293322072064; 2026-02-21T10:22:41.6548376Z add.s32 %r326, %r1928, 96; 2026-02-21T10:22:41.6548645Z bfe.u32 %r327, %r326, 4, 14; 2026-02-21T10:22:41.6548832Z cvt.u64.u32 %rd60, %r327; 2026-02-21T10:22:41.6549017Z or.b64 %rd169, %rd60, 4611686293322072064; 2026-02-21T10:22:41.6549228Z add.s32 %r328, %r1928, 4096; 2026-02-21T10:22:41.6549404Z bfe.u32 %r329, %r328, 4, 14; 2026-02-21T10:22:41.6549582Z cvt.u64.u32 %rd61, %r329; 2026-02-21T10:22:41.6549761Z or.b64 %rd170, %rd61, 4611686293322072064; 2026-02-21T10:22:41.6549974Z add.s32 %r330, %r1928, 4128; 2026-02-21T10:22:41.6550157Z bfe.u32 %r331, %r330, 4, 14; 2026-02-21T10:22:41.6550331Z cvt.u64.u32 %rd62, %r331; 2026-02-21T10:22:41.6550625Z or.b64 %rd171, %rd62, 4611686293322072064; 2026-02-21T10:22:41.6550833Z add.s32 %r332, %r1928, 4160; 2026-02-21T10:22:41.6551015Z 
bfe.u32 %r333, %r332, 4, 14; 2026-02-21T10:22:41.6551199Z cvt.u64.u32 %rd63, %r333; 2026-02-21T10:22:41.6551462Z or.b64 %rd172, %rd63, 4611686293322072064; 2026-02-21T10:22:41.6551680Z add.s32 %r334, %r1928, 4192; 2026-02-21T10:22:41.6551863Z bfe.u32 %r335, %r334, 4, 14; 2026-02-21T10:22:41.6552047Z cvt.u64.u32 %rd64, %r335; 2026-02-21T10:22:41.6552227Z or.b64 %rd173, %rd64, 4611686293322072064; 2026-02-21T10:22:41.6552452Z and.b32 %r337, %r5439, 3072; 2026-02-21T10:22:41.6552638Z shl.b32 %r339, %r5435, 3; 2026-02-21T10:22:41.6552821Z shl.b32 %r341, %r5441, 2; 2026-02-21T10:22:41.6552993Z or.b32 %r342, %r337, %r5440; 2026-02-21T10:22:41.6553179Z or.b32 %r343, %r339, %r341; 2026-02-21T10:22:41.6553358Z xor.b32 %r344, %r342, %r343; 2026-02-21T10:22:41.6553542Z add.s32 %r43, %r1928, %r344; 2026-02-21T10:22:41.6553716Z and.b32 %r346, %r5442, 3072; 2026-02-21T10:22:41.6553969Z or.b32 %r348, %r346, %r5438; 2026-02-21T10:22:41.6554155Z xor.b32 %r349, %r348, %r5443; 2026-02-21T10:22:41.6554343Z add.s32 %r1860, %r1928, %r349; 2026-02-21T10:22:41.6554534Z add.s32 %r1865, %r1860, 512; 2026-02-21T10:22:41.6554719Z shl.b32 %r350, %r5441, 5; 2026-02-21T10:22:41.6554899Z or.b32 %r351, %r291, %r350; 2026-02-21T10:22:41.6555076Z or.b32 %r352, %r351, %r295; 2026-02-21T10:22:41.6555329Z add.s32 %r46, %r1928, %r352; 2026-02-21T10:22:41.6555510Z xor.b32 %r353, %r352, 8; 2026-02-21T10:22:41.6555689Z add.s32 %r47, %r1928, %r353; 2026-02-21T10:22:41.6555874Z xor.b32 %r354, %r352, 16; 2026-02-21T10:22:41.6556044Z add.s32 %r48, %r1928, %r354; 2026-02-21T10:22:41.6556238Z xor.b32 %r355, %r352, 24; 2026-02-21T10:22:41.6556409Z add.s32 %r49, %r1928, %r355; 2026-02-21T10:22:41.6556740Z xor.b32 %r356, %r352, 32; 2026-02-21T10:22:41.6556915Z add.s32 %r50, %r1928, %r356; 2026-02-21T10:22:41.6557096Z xor.b32 %r357, %r352, 40; 2026-02-21T10:22:41.6557270Z add.s32 %r51, %r1928, %r357; 2026-02-21T10:22:41.6557450Z xor.b32 %r358, %r352, 48; 2026-02-21T10:22:41.6557620Z add.s32 %r52, %r1928, %r358; 
2026-02-21T10:22:41.6557801Z xor.b32 %r359, %r352, 56; 2026-02-21T10:22:41.6557981Z add.s32 %r53, %r1928, %r359; 2026-02-21T10:22:41.6558325Z .loc 1 47 126 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:47:126 2026-02-21T10:22:41.6558701Z or.b32 %r54, %r17, 48; 2026-02-21T10:22:41.6559021Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.6559396Z mad.wide.u32 %rd10, %r21, 16, %rd35; 2026-02-21T10:22:41.6559741Z .loc 1 47 126 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:47:126 2026-02-21T10:22:41.6560106Z or.b32 %r55, %r17, 32; 2026-02-21T10:22:41.6560279Z or.b32 %r56, %r17, 16; 2026-02-21T10:22:41.6560587Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.6560951Z shl.b32 %r57, %r17, 13; 2026-02-21T10:22:41.6561184Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T10:22:41.6561486Z // Child Loop BB0_3 Depth 2 2026-02-21T10:22:41.6561757Z // Child Loop BB0_5 Depth 2 2026-02-21T10:22:41.6562142Z .loc 1 32 35 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:32:35 2026-02-21T10:22:41.6562507Z shr.s32 %r362, %r5444, 31; 2026-02-21T10:22:41.6562688Z shr.u32 %r363, %r362, 17; 2026-02-21T10:22:41.6562869Z add.s32 %r364, %r5444, %r363; 2026-02-21T10:22:41.6563048Z shr.s32 %r365, %r364, 15; 2026-02-21T10:22:41.6563362Z .loc 1 33 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:33:33 2026-02-21T10:22:41.6563715Z shl.b32 %r366, %r365, 5; 2026-02-21T10:22:41.6564031Z .loc 1 34 39 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:34:39 2026-02-21T10:22:41.6564469Z sub.s32 %r367, 40, %r366; 2026-02-21T10:22:41.6564785Z .loc 1 34 52 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:34:52 2026-02-21T10:22:41.6565157Z min.s32 %r368, %r367, 32; 2026-02-21T10:22:41.6565533Z .loc 1 35 45 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:35:45 2026-02-21T10:22:41.6565898Z and.b32 %r369, %r364, -32768; 
2026-02-21T10:22:41.6566085Z sub.s32 %r370, %r5444, %r369; 2026-02-21T10:22:41.6566407Z .loc 1 36 51 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:36:51 2026-02-21T10:22:41.6566898Z div.s32 %r371, %r370, %r368; 2026-02-21T10:22:41.6567222Z .loc 1 35 64 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:35:64 2026-02-21T10:22:41.6567580Z mul.lo.s32 %r372, %r371, %r368; 2026-02-21T10:22:41.6567770Z sub.s32 %r373, %r370, %r372; 2026-02-21T10:22:41.6568193Z .loc 1 35 30 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:35:30 2026-02-21T10:22:41.6568549Z add.s32 %r374, %r373, %r366; 2026-02-21T10:22:41.6568866Z .loc 1 37 27 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:37:27 2026-02-21T10:22:41.6569216Z shl.b32 %r402, %r374, 5; 2026-02-21T10:22:41.6569603Z .loc 1 39 27 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:39:27 2026-02-21T10:22:41.6569958Z shl.b32 %r60, %r371, 6; 2026-02-21T10:22:41.6570273Z .loc 1 47 126 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:47:126 2026-02-21T10:22:41.6570632Z or.b32 %r375, %r54, %r60; 2026-02-21T10:22:41.6570806Z shl.b32 %r376, %r375, 13; 2026-02-21T10:22:41.6571004Z mul.wide.s32 %rd11, %r376, 2; 2026-02-21T10:22:41.6571188Z or.b32 %r377, %r55, %r60; 2026-02-21T10:22:41.6571363Z shl.b32 %r378, %r377, 13; 2026-02-21T10:22:41.6571543Z mul.wide.s32 %rd12, %r378, 2; 2026-02-21T10:22:41.6571731Z or.b32 %r379, %r56, %r60; 2026-02-21T10:22:41.6571908Z shl.b32 %r380, %r379, 13; 2026-02-21T10:22:41.6572080Z mul.wide.s32 %rd13, %r380, 2; 2026-02-21T10:22:41.6572266Z shl.b32 %r381, %r371, 19; 2026-02-21T10:22:41.6572435Z or.b32 %r382, %r57, %r381; 2026-02-21T10:22:41.6572624Z mul.wide.s32 %rd14, %r382, 2; 2026-02-21T10:22:41.6572816Z mov.b32 %r5445, 0f00000000; 2026-02-21T10:22:41.6573008Z mov.b64 %rd459, 0; 2026-02-21T10:22:41.6573173Z mov.b64 %rd458, %rd10; 2026-02-21T10:22:41.6573351Z mov.b32 %r5446, %r5445; 2026-02-21T10:22:41.6573535Z mov.b32 %r5447, %r5445; 
2026-02-21T10:22:41.6573702Z mov.b32 %r5448, %r5445; 2026-02-21T10:22:41.6573872Z mov.b32 %r5449, %r5445; 2026-02-21T10:22:41.6574039Z mov.b32 %r5450, %r5445; 2026-02-21T10:22:41.6574211Z mov.b32 %r5451, %r5445; 2026-02-21T10:22:41.6574373Z mov.b32 %r5452, %r5445; 2026-02-21T10:22:41.6574560Z mov.b32 %r5453, %r5445; 2026-02-21T10:22:41.6574728Z mov.b32 %r5454, %r5445; 2026-02-21T10:22:41.6574901Z mov.b32 %r5455, %r5445; 2026-02-21T10:22:41.6575067Z mov.b32 %r5456, %r5445; 2026-02-21T10:22:41.6575248Z mov.b32 %r5457, %r5445; 2026-02-21T10:22:41.6575427Z mov.b32 %r5458, %r5445; 2026-02-21T10:22:41.6575598Z mov.b32 %r5459, %r5445; 2026-02-21T10:22:41.6575772Z mov.b32 %r5460, %r5445; 2026-02-21T10:22:41.6575993Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T10:22:41.6576297Z // => This Inner Loop Header: Depth=2 2026-02-21T10:22:41.6576828Z .loc 1 0 126 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:0:126 2026-02-21T10:22:41.6577191Z cvt.u32.u64 %r403, %rd459; 2026-02-21T10:22:41.6577517Z .loc 1 54 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:32 2026-02-21T10:22:41.6577869Z add.s64 %rd67, %rd458, %rd14; 2026-02-21T10:22:41.6578054Z add.s64 %rd70, %rd458, %rd13; 2026-02-21T10:22:41.6578237Z add.s64 %rd73, %rd458, %rd12; 2026-02-21T10:22:41.6578660Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.6579020Z add.s64 %rd76, %rd458, %rd11; 2026-02-21T10:22:41.6579209Z // begin inline asm 2026-02-21T10:22:41.6579437Z mov.u64 %rd66, 0x0; 2026-02-21T10:22:41.6579666Z createpolicy.fractional.L2::evict_last.b64 %rd66, 1.0; 2026-02-21T10:22:41.6579927Z // end inline asm 2026-02-21T10:22:41.6580086Z // begin inline asm 2026-02-21T10:22:41.6580245Z mov.u32 %r383, 0x0; 2026-02-21T10:22:41.6580400Z mov.u32 %r384, 0x0; 2026-02-21T10:22:41.6580573Z mov.u32 %r385, 0x0; 2026-02-21T10:22:41.6580731Z mov.u32 %r386, 0x0; 2026-02-21T10:22:41.6581041Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r383, 
%r384, %r385, %r386 }, [ %rd67 + 0 ], %rd66; 2026-02-21T10:22:41.6581395Z // end inline asm 2026-02-21T10:22:41.6581567Z // begin inline asm 2026-02-21T10:22:41.6581728Z mov.u64 %rd69, 0x0; 2026-02-21T10:22:41.6581937Z createpolicy.fractional.L2::evict_last.b64 %rd69, 1.0; 2026-02-21T10:22:41.6582269Z // end inline asm 2026-02-21T10:22:41.6582423Z // begin inline asm 2026-02-21T10:22:41.6582586Z mov.u32 %r387, 0x0; 2026-02-21T10:22:41.6582735Z mov.u32 %r388, 0x0; 2026-02-21T10:22:41.6582896Z mov.u32 %r389, 0x0; 2026-02-21T10:22:41.6583048Z mov.u32 %r390, 0x0; 2026-02-21T10:22:41.6583365Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r387, %r388, %r389, %r390 }, [ %rd70 + 0 ], %rd69; 2026-02-21T10:22:41.6583785Z // end inline asm 2026-02-21T10:22:41.6583945Z // begin inline asm 2026-02-21T10:22:41.6584108Z mov.u64 %rd72, 0x0; 2026-02-21T10:22:41.6584318Z createpolicy.fractional.L2::evict_last.b64 %rd72, 1.0; 2026-02-21T10:22:41.6584571Z // end inline asm 2026-02-21T10:22:41.6584723Z // begin inline asm 2026-02-21T10:22:41.6584886Z mov.u32 %r391, 0x0; 2026-02-21T10:22:41.6585037Z mov.u32 %r392, 0x0; 2026-02-21T10:22:41.6585192Z mov.u32 %r393, 0x0; 2026-02-21T10:22:41.6585356Z mov.u32 %r394, 0x0; 2026-02-21T10:22:41.6585657Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r391, %r392, %r393, %r394 }, [ %rd73 + 0 ], %rd72; 2026-02-21T10:22:41.6586013Z // end inline asm 2026-02-21T10:22:41.6586167Z // begin inline asm 2026-02-21T10:22:41.6586326Z mov.u64 %rd75, 0x0; 2026-02-21T10:22:41.6586658Z createpolicy.fractional.L2::evict_last.b64 %rd75, 1.0; 2026-02-21T10:22:41.6586917Z // end inline asm 2026-02-21T10:22:41.6587068Z // begin inline asm 2026-02-21T10:22:41.6587235Z mov.u32 %r395, 0x0; 2026-02-21T10:22:41.6587391Z mov.u32 %r396, 0x0; 2026-02-21T10:22:41.6587551Z mov.u32 %r397, 0x0; 2026-02-21T10:22:41.6587709Z mov.u32 %r398, 0x0; 2026-02-21T10:22:41.6588024Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r395, %r396, %r397, %r398 }, [ %rd76 + 0 ], %rd75; 
2026-02-21T10:22:41.6588373Z // end inline asm 2026-02-21T10:22:41.6588754Z .loc 1 58 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:58:32 2026-02-21T10:22:41.6589121Z bar.sync 0; 2026-02-21T10:22:41.6589301Z st.shared.v2.b32 [%r23], {%r383, %r384}; 2026-02-21T10:22:41.6589549Z st.shared.v2.b32 [%r23+2048], {%r387, %r388}; 2026-02-21T10:22:41.6589786Z st.shared.v2.b32 [%r23+4096], {%r391, %r392}; 2026-02-21T10:22:41.6590023Z st.shared.v2.b32 [%r23+6144], {%r395, %r396}; 2026-02-21T10:22:41.6590264Z st.shared.v2.b32 [%r24], {%r385, %r386}; 2026-02-21T10:22:41.6590487Z st.shared.v2.b32 [%r24+2048], {%r389, %r390}; 2026-02-21T10:22:41.6590729Z st.shared.v2.b32 [%r24+4096], {%r393, %r394}; 2026-02-21T10:22:41.6590958Z st.shared.v2.b32 [%r24+6144], {%r397, %r398}; 2026-02-21T10:22:41.6591171Z bar.sync 0; 2026-02-21T10:22:41.6591327Z ld.shared.b16 %rs1, [%r25]; 2026-02-21T10:22:41.6591551Z ld.shared.b16 %rs2, [%r25+1024]; 2026-02-21T10:22:41.6591760Z ld.shared.b16 %rs3, [%r25+64]; 2026-02-21T10:22:41.6591964Z ld.shared.b16 %rs4, [%r25+1088]; 2026-02-21T10:22:41.6592163Z ld.shared.b16 %rs5, [%r26]; 2026-02-21T10:22:41.6592343Z ld.shared.b16 %rs6, [%r26+1024]; 2026-02-21T10:22:41.6592537Z ld.shared.b16 %rs7, [%r26+64]; 2026-02-21T10:22:41.6592820Z ld.shared.b16 %rs8, [%r26+1088]; 2026-02-21T10:22:41.6593011Z ld.shared.b16 %rs9, [%r27]; 2026-02-21T10:22:41.6593193Z ld.shared.b16 %rs10, [%r27+1024]; 2026-02-21T10:22:41.6593395Z ld.shared.b16 %rs11, [%r27+64]; 2026-02-21T10:22:41.6593664Z ld.shared.b16 %rs12, [%r27+1088]; 2026-02-21T10:22:41.6593870Z ld.shared.b16 %rs13, [%r28]; 2026-02-21T10:22:41.6594066Z ld.shared.b16 %rs14, [%r28+1024]; 2026-02-21T10:22:41.6594264Z ld.shared.b16 %rs15, [%r28+64]; 2026-02-21T10:22:41.6594471Z ld.shared.b16 %rs16, [%r28+1088]; 2026-02-21T10:22:41.6594682Z ld.shared.b16 %rs17, [%r29]; 2026-02-21T10:22:41.6594880Z ld.shared.b16 %rs18, [%r29+1024]; 2026-02-21T10:22:41.6595078Z ld.shared.b16 %rs19, [%r29+64]; 
2026-02-21T10:22:41.6595279Z ld.shared.b16 %rs20, [%r29+1088]; 2026-02-21T10:22:41.6595474Z ld.shared.b16 %rs21, [%r30]; 2026-02-21T10:22:41.6595673Z ld.shared.b16 %rs22, [%r30+1024]; 2026-02-21T10:22:41.6595872Z ld.shared.b16 %rs23, [%r30+64]; 2026-02-21T10:22:41.6596076Z ld.shared.b16 %rs24, [%r30+1088]; 2026-02-21T10:22:41.6596353Z ld.shared.b16 %rs25, [%r31]; 2026-02-21T10:22:41.6596668Z ld.shared.b16 %rs26, [%r31+1024]; 2026-02-21T10:22:41.6596873Z ld.shared.b16 %rs27, [%r31+64]; 2026-02-21T10:22:41.6597068Z ld.shared.b16 %rs28, [%r31+1088]; 2026-02-21T10:22:41.6597266Z ld.shared.b16 %rs29, [%r32]; 2026-02-21T10:22:41.6597449Z ld.shared.b16 %rs30, [%r32+1024]; 2026-02-21T10:22:41.6597743Z ld.shared.b16 %rs31, [%r32+64]; 2026-02-21T10:22:41.6597939Z ld.shared.b16 %rs32, [%r32+1088]; 2026-02-21T10:22:41.6598140Z cvt.f32.bf16 %r440, %rs1; 2026-02-21T10:22:41.6598323Z cvt.f32.bf16 %r441, %rs2; 2026-02-21T10:22:41.6598501Z cvt.f32.bf16 %r442, %rs5; 2026-02-21T10:22:41.6598681Z cvt.f32.bf16 %r443, %rs6; 2026-02-21T10:22:41.6598854Z cvt.f32.bf16 %r476, %rs9; 2026-02-21T10:22:41.6599036Z cvt.f32.bf16 %r477, %rs10; 2026-02-21T10:22:41.6599218Z cvt.f32.bf16 %r478, %rs13; 2026-02-21T10:22:41.6599404Z cvt.f32.bf16 %r479, %rs14; 2026-02-21T10:22:41.6599584Z cvt.f32.bf16 %r512, %rs17; 2026-02-21T10:22:41.6599769Z cvt.f32.bf16 %r513, %rs18; 2026-02-21T10:22:41.6599949Z cvt.f32.bf16 %r514, %rs21; 2026-02-21T10:22:41.6600132Z cvt.f32.bf16 %r515, %rs22; 2026-02-21T10:22:41.6600315Z cvt.f32.bf16 %r548, %rs25; 2026-02-21T10:22:41.6600497Z cvt.f32.bf16 %r549, %rs26; 2026-02-21T10:22:41.6600699Z cvt.f32.bf16 %r550, %rs29; 2026-02-21T10:22:41.6600876Z cvt.f32.bf16 %r551, %rs30; 2026-02-21T10:22:41.6601063Z cvt.f32.bf16 %r584, %rs3; 2026-02-21T10:22:41.6601241Z cvt.f32.bf16 %r585, %rs4; 2026-02-21T10:22:41.6601424Z cvt.f32.bf16 %r586, %rs7; 2026-02-21T10:22:41.6601598Z cvt.f32.bf16 %r587, %rs8; 2026-02-21T10:22:41.6601793Z cvt.f32.bf16 %r620, %rs11; 2026-02-21T10:22:41.6601985Z 
cvt.f32.bf16 %r621, %rs12; 2026-02-21T10:22:41.6602171Z cvt.f32.bf16 %r622, %rs15; 2026-02-21T10:22:41.6602354Z cvt.f32.bf16 %r623, %rs16; 2026-02-21T10:22:41.6602531Z cvt.f32.bf16 %r656, %rs19; 2026-02-21T10:22:41.6602712Z cvt.f32.bf16 %r657, %rs20; 2026-02-21T10:22:41.6602891Z cvt.f32.bf16 %r658, %rs23; 2026-02-21T10:22:41.6603078Z cvt.f32.bf16 %r659, %rs24; 2026-02-21T10:22:41.6603254Z cvt.f32.bf16 %r692, %rs27; 2026-02-21T10:22:41.6603437Z cvt.f32.bf16 %r693, %rs28; 2026-02-21T10:22:41.6603619Z cvt.f32.bf16 %r694, %rs31; 2026-02-21T10:22:41.6603801Z cvt.f32.bf16 %r695, %rs32; 2026-02-21T10:22:41.6604133Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.6604496Z bar.sync 0; 2026-02-21T10:22:41.6604661Z add.s32 %r1926, %r1928, 1024; 2026-02-21T10:22:41.6604850Z // begin inline asm 2026-02-21T10:22:41.6605061Z @%p256 mbarrier.init.shared::cta.b64 [%r1926], 1; 2026-02-21T10:22:41.6605299Z // end inline asm 2026-02-21T10:22:41.6605460Z bar.sync 0; 2026-02-21T10:22:41.6605611Z // begin inline asm 2026-02-21T10:22:41.6605849Z @%p256 mbarrier.arrive.expect_tx.shared.b64 _, [%r1926], 1024; 2026-02-21T10:22:41.6606124Z // end inline asm 2026-02-21T10:22:41.6606368Z // begin inline asm 2026-02-21T10:22:41.6606674Z fence.proxy.async.shared::cta; 2026-02-21T10:22:41.6606868Z // end inline asm 2026-02-21T10:22:41.6607028Z bar.sync 0; 2026-02-21T10:22:41.6607186Z elect.sync %r1787|%p70, -1; 2026-02-21T10:22:41.6607469Z and.pred %p22, %p1, %p70; 2026-02-21T10:22:41.6607646Z // begin inline asm 2026-02-21T10:22:41.6608087Z @%p22 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1928], [%rd247, {%r402, %r403}], [%r1926]; 2026-02-21T10:22:41.6608546Z // end inline asm 2026-02-21T10:22:41.6608701Z bar.sync 0; 2026-02-21T10:22:41.6608851Z mov.b32 %r1767, 0; 2026-02-21T10:22:41.6609006Z // begin inline asm 2026-02-21T10:22:41.6609162Z 2026-02-21T10:22:41.6609288Z { 2026-02-21T10:22:41.6609429Z .reg .pred 
complete; 2026-02-21T10:22:41.6609592Z waitLoop: 2026-02-21T10:22:41.6609818Z mbarrier.try_wait.parity.shared.b64 complete, [%r1926], %r1767; 2026-02-21T10:22:41.6610115Z @!complete bra.uni waitLoop; 2026-02-21T10:22:41.6610300Z } 2026-02-21T10:22:41.6610372Z 2026-02-21T10:22:41.6610513Z // end inline asm 2026-02-21T10:22:41.6610687Z bar.sync 0; 2026-02-21T10:22:41.6610845Z // begin inline asm 2026-02-21T10:22:41.6611039Z @%p256 mbarrier.inval.shared::cta.b64 [%r1926]; 2026-02-21T10:22:41.6611279Z // end inline asm 2026-02-21T10:22:41.6611580Z .loc 1 78 58 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:78:58 2026-02-21T10:22:41.6612028Z ld.shared.b8 %rs33, [%r33]; 2026-02-21T10:22:41.6612227Z ld.shared.b8 %rs34, [%r33+64]; 2026-02-21T10:22:41.6612429Z ld.shared.b8 %rs35, [%r33+256]; 2026-02-21T10:22:41.6612627Z ld.shared.b8 %rs36, [%r33+320]; 2026-02-21T10:22:41.6612831Z ld.shared.b8 %rs37, [%r33+512]; 2026-02-21T10:22:41.6613030Z ld.shared.b8 %rs38, [%r33+576]; 2026-02-21T10:22:41.6613223Z ld.shared.b8 %rs39, [%r33+768]; 2026-02-21T10:22:41.6613420Z ld.shared.b8 %rs40, [%r33+832]; 2026-02-21T10:22:41.6613610Z ld.shared.b8 %rs41, [%r34+128]; 2026-02-21T10:22:41.6613816Z ld.shared.b8 %rs42, [%r34+192]; 2026-02-21T10:22:41.6614015Z ld.shared.b8 %rs43, [%r34+384]; 2026-02-21T10:22:41.6614219Z ld.shared.b8 %rs44, [%r34+448]; 2026-02-21T10:22:41.6614414Z ld.shared.b8 %rs45, [%r34+640]; 2026-02-21T10:22:41.6614618Z ld.shared.b8 %rs46, [%r34+704]; 2026-02-21T10:22:41.6614828Z ld.shared.b8 %rs47, [%r34+896]; 2026-02-21T10:22:41.6615031Z ld.shared.b8 %rs48, [%r34+960]; 2026-02-21T10:22:41.6615377Z .loc 1 63 28 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:63:28 2026-02-21T10:22:41.6615740Z shl.b16 %rs49, %rs33, 4; 2026-02-21T10:22:41.6615931Z shl.b16 %rs50, %rs34, 4; 2026-02-21T10:22:41.6616107Z shl.b16 %rs51, %rs41, 4; 2026-02-21T10:22:41.6616287Z shl.b16 %rs52, %rs42, 4; 2026-02-21T10:22:41.6616586Z shl.b16 %rs53, %rs35, 4; 
2026-02-21T10:22:41.6616787Z shl.b16 %rs54, %rs36, 4; 2026-02-21T10:22:41.6616960Z shl.b16 %rs55, %rs43, 4; 2026-02-21T10:22:41.6617146Z shl.b16 %rs56, %rs44, 4; 2026-02-21T10:22:41.6617328Z shl.b16 %rs57, %rs37, 4; 2026-02-21T10:22:41.6617505Z shl.b16 %rs58, %rs38, 4; 2026-02-21T10:22:41.6617687Z shl.b16 %rs59, %rs45, 4; 2026-02-21T10:22:41.6617863Z shl.b16 %rs60, %rs46, 4; 2026-02-21T10:22:41.6618042Z shl.b16 %rs61, %rs39, 4; 2026-02-21T10:22:41.6618213Z shl.b16 %rs62, %rs40, 4; 2026-02-21T10:22:41.6618390Z shl.b16 %rs63, %rs47, 4; 2026-02-21T10:22:41.6618564Z shl.b16 %rs64, %rs48, 4; 2026-02-21T10:22:41.6618883Z .loc 1 78 58 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:78:58 2026-02-21T10:22:41.6619251Z selp.b16 %rs65, %rs49, %rs33, %p268; 2026-02-21T10:22:41.6619459Z cvt.s16.s8 %rs66, %rs65; 2026-02-21T10:22:41.6619637Z shr.s16 %rs67, %rs66, 4; 2026-02-21T10:22:41.6619814Z selp.b16 %rs68, %rs50, %rs34, %p268; 2026-02-21T10:22:41.6620023Z cvt.s16.s8 %rs69, %rs68; 2026-02-21T10:22:41.6620192Z shr.s16 %rs70, %rs69, 4; 2026-02-21T10:22:41.6620375Z selp.b16 %rs71, %rs51, %rs41, %p268; 2026-02-21T10:22:41.6620587Z cvt.s16.s8 %rs72, %rs71; 2026-02-21T10:22:41.6620856Z shr.s16 %rs73, %rs72, 4; 2026-02-21T10:22:41.6621034Z selp.b16 %rs74, %rs52, %rs42, %p268; 2026-02-21T10:22:41.6621239Z cvt.s16.s8 %rs75, %rs74; 2026-02-21T10:22:41.6621413Z shr.s16 %rs76, %rs75, 4; 2026-02-21T10:22:41.6621699Z selp.b16 %rs77, %rs53, %rs35, %p268; 2026-02-21T10:22:41.6621905Z cvt.s16.s8 %rs78, %rs77; 2026-02-21T10:22:41.6622076Z shr.s16 %rs79, %rs78, 4; 2026-02-21T10:22:41.6622263Z selp.b16 %rs80, %rs54, %rs36, %p268; 2026-02-21T10:22:41.6622461Z cvt.s16.s8 %rs81, %rs80; 2026-02-21T10:22:41.6622639Z shr.s16 %rs82, %rs81, 4; 2026-02-21T10:22:41.6622817Z selp.b16 %rs83, %rs55, %rs43, %p268; 2026-02-21T10:22:41.6623022Z cvt.s16.s8 %rs84, %rs83; 2026-02-21T10:22:41.6623201Z shr.s16 %rs85, %rs84, 4; 2026-02-21T10:22:41.6623380Z selp.b16 %rs86, %rs56, %rs44, %p268; 
2026-02-21T10:22:41.6623585Z cvt.s16.s8 %rs87, %rs86; 2026-02-21T10:22:41.6623758Z shr.s16 %rs88, %rs87, 4; 2026-02-21T10:22:41.6623943Z selp.b16 %rs89, %rs57, %rs37, %p268; 2026-02-21T10:22:41.6624222Z cvt.s16.s8 %rs90, %rs89; 2026-02-21T10:22:41.6624407Z shr.s16 %rs91, %rs90, 4; 2026-02-21T10:22:41.6624586Z selp.b16 %rs92, %rs58, %rs38, %p268; 2026-02-21T10:22:41.6624795Z cvt.s16.s8 %rs93, %rs92; 2026-02-21T10:22:41.6624969Z shr.s16 %rs94, %rs93, 4; 2026-02-21T10:22:41.6625154Z selp.b16 %rs95, %rs59, %rs45, %p268; 2026-02-21T10:22:41.6625374Z cvt.s16.s8 %rs96, %rs95; 2026-02-21T10:22:41.6625617Z shr.s16 %rs97, %rs96, 4; 2026-02-21T10:22:41.6625809Z selp.b16 %rs98, %rs60, %rs46, %p268; 2026-02-21T10:22:41.6626009Z cvt.s16.s8 %rs99, %rs98; 2026-02-21T10:22:41.6626192Z shr.s16 %rs100, %rs99, 4; 2026-02-21T10:22:41.6626381Z selp.b16 %rs101, %rs61, %rs39, %p268; 2026-02-21T10:22:41.6626731Z cvt.s16.s8 %rs102, %rs101; 2026-02-21T10:22:41.6626916Z shr.s16 %rs103, %rs102, 4; 2026-02-21T10:22:41.6627114Z selp.b16 %rs104, %rs62, %rs40, %p268; 2026-02-21T10:22:41.6627324Z cvt.s16.s8 %rs105, %rs104; 2026-02-21T10:22:41.6627502Z shr.s16 %rs106, %rs105, 4; 2026-02-21T10:22:41.6627701Z selp.b16 %rs107, %rs63, %rs47, %p268; 2026-02-21T10:22:41.6627905Z cvt.s16.s8 %rs108, %rs107; 2026-02-21T10:22:41.6628091Z shr.s16 %rs109, %rs108, 4; 2026-02-21T10:22:41.6628270Z selp.b16 %rs110, %rs64, %rs48, %p268; 2026-02-21T10:22:41.6628563Z cvt.s16.s8 %rs111, %rs110; 2026-02-21T10:22:41.6628740Z shr.s16 %rs112, %rs111, 4; 2026-02-21T10:22:41.6629062Z .loc 1 83 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:83:32 2026-02-21T10:22:41.6629426Z cvt.rn.f32.s16 %r1788, %rs67; 2026-02-21T10:22:41.6629613Z cvt.rn.f32.s16 %r1789, %rs70; 2026-02-21T10:22:41.6629803Z cvt.rn.f32.s16 %r1790, %rs73; 2026-02-21T10:22:41.6629982Z cvt.rn.f32.s16 %r1791, %rs76; 2026-02-21T10:22:41.6630183Z cvt.rn.f32.s16 %r1792, %rs79; 2026-02-21T10:22:41.6630366Z cvt.rn.f32.s16 %r1793, %rs82; 
2026-02-21T10:22:41.6630548Z cvt.rn.f32.s16 %r1794, %rs85; 2026-02-21T10:22:41.6630724Z cvt.rn.f32.s16 %r1795, %rs88; 2026-02-21T10:22:41.6630911Z cvt.rn.f32.s16 %r1796, %rs91; 2026-02-21T10:22:41.6631092Z cvt.rn.f32.s16 %r1797, %rs94; 2026-02-21T10:22:41.6631279Z cvt.rn.f32.s16 %r1798, %rs97; 2026-02-21T10:22:41.6631465Z cvt.rn.f32.s16 %r1799, %rs100; 2026-02-21T10:22:41.6631654Z cvt.rn.f32.s16 %r1800, %rs103; 2026-02-21T10:22:41.6631848Z cvt.rn.f32.s16 %r1801, %rs106; 2026-02-21T10:22:41.6632032Z cvt.rn.f32.s16 %r1802, %rs109; 2026-02-21T10:22:41.6632226Z cvt.rn.f32.s16 %r1803, %rs112; 2026-02-21T10:22:41.6632401Z bar.sync 0; 2026-02-21T10:22:41.6632559Z st.shared.b32 [%r35], %r1788; 2026-02-21T10:22:41.6632746Z st.shared.b32 [%r35+4096], %r1796; 2026-02-21T10:22:41.6632950Z st.shared.b32 [%r36], %r1789; 2026-02-21T10:22:41.6633134Z st.shared.b32 [%r36+4096], %r1797; 2026-02-21T10:22:41.6633336Z st.shared.b32 [%r37], %r1790; 2026-02-21T10:22:41.6633525Z st.shared.b32 [%r37+4096], %r1798; 2026-02-21T10:22:41.6633719Z st.shared.b32 [%r38], %r1791; 2026-02-21T10:22:41.6633909Z st.shared.b32 [%r38+4096], %r1799; 2026-02-21T10:22:41.6634198Z st.shared.b32 [%r39], %r1792; 2026-02-21T10:22:41.6634384Z st.shared.b32 [%r39+4096], %r1800; 2026-02-21T10:22:41.6634579Z st.shared.b32 [%r40], %r1793; 2026-02-21T10:22:41.6634767Z st.shared.b32 [%r40+4096], %r1801; 2026-02-21T10:22:41.6635042Z st.shared.b32 [%r41], %r1794; 2026-02-21T10:22:41.6635232Z st.shared.b32 [%r41+4096], %r1802; 2026-02-21T10:22:41.6635433Z st.shared.b32 [%r42], %r1795; 2026-02-21T10:22:41.6635618Z st.shared.b32 [%r42+4096], %r1803; 2026-02-21T10:22:41.6635817Z $L__tmp1: 2026-02-21T10:22:41.6636177Z .loc 2 291 36 // standard.py:291:36 @[ cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:90:40 ] 2026-02-21T10:22:41.6636734Z // begin inline asm 2026-02-21T10:22:41.6636922Z fence.proxy.async.shared::cta; 2026-02-21T10:22:41.6637119Z // end inline asm 2026-02-21T10:22:41.6637275Z bar.sync 0; 
2026-02-21T10:22:41.6637466Z shfl.sync.idx.b32 %r1804, %r4, 0, 31, -1; 2026-02-21T10:22:41.6637708Z wgmma.fence.sync.aligned; 2026-02-21T10:22:41.6637981Z mov.pred %p24, -1; 2026-02-21T10:22:41.6638156Z // begin inline asm 2026-02-21T10:22:41.6638764Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r440,%r441,%r442,%r443}, %rd166, %p24, 1, 1; 2026-02-21T10:22:41.6639424Z // end inline asm 2026-02-21T10:22:41.6639584Z // begin inline asm 2026-02-21T10:22:41.6640269Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r476,%r477,%r478,%r479}, %rd167, %p24, 1, 1; 2026-02-21T10:22:41.6640915Z // end inline asm 2026-02-21T10:22:41.6641069Z // begin inline asm 2026-02-21T10:22:41.6641661Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r512,%r513,%r514,%r515}, %rd168, %p24, 1, 1; 2026-02-21T10:22:41.6642302Z // end inline asm 2026-02-21T10:22:41.6642458Z // begin inline asm 2026-02-21T10:22:41.6643044Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r548,%r549,%r550,%r551}, %rd169, %p24, 1, 1; 2026-02-21T10:22:41.6643696Z // end inline asm 2026-02-21T10:22:41.6643856Z // begin inline asm 2026-02-21T10:22:41.6644446Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r584,%r585,%r586,%r587}, %rd170, %p24, 1, 1; 2026-02-21T10:22:41.6645091Z // end inline asm 2026-02-21T10:22:41.6645241Z // begin inline asm 2026-02-21T10:22:41.6645828Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r620,%r621,%r622,%r623}, %rd171, %p24, 1, 1; 2026-02-21T10:22:41.6646595Z // end inline asm 2026-02-21T10:22:41.6646751Z // begin inline asm 2026-02-21T10:22:41.6647344Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r656,%r657,%r658,%r659}, %rd172, %p24, 1, 1; 2026-02-21T10:22:41.6647980Z // end inline asm 2026-02-21T10:22:41.6648139Z // begin inline asm 2026-02-21T10:22:41.6648731Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r692,%r693,%r694,%r695}, %rd173, %p24, 1, 1; 2026-02-21T10:22:41.6649365Z // end inline asm 2026-02-21T10:22:41.6649546Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:41.6649754Z mov.b32 %r712, %r1928; 2026-02-21T10:22:41.6649932Z mov.b32 %r713, %r1767; 2026-02-21T10:22:41.6650099Z mov.b32 %r714, %r1767; 2026-02-21T10:22:41.6650271Z // begin inline asm 2026-02-21T10:22:41.6650787Z // wait for regs: %r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460,%r712,%r713,%r714 2026-02-21T10:22:41.6651262Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:41.6651549Z // end inline asm 2026-02-21T10:22:41.6651699Z $L__tmp2: 2026-02-21T10:22:41.6652012Z .loc 1 54 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:32 2026-02-21T10:22:41.6652381Z add.s64 %rd88, %rd67, 128; 2026-02-21T10:22:41.6652573Z add.s64 %rd91, %rd70, 128; 2026-02-21T10:22:41.6652753Z add.s64 %rd94, %rd73, 128; 2026-02-21T10:22:41.6653077Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.6653440Z add.s64 %rd97, %rd76, 128; 2026-02-21T10:22:41.6653616Z // begin inline asm 2026-02-21T10:22:41.6653787Z 
mov.u64 %rd87, 0x0; 2026-02-21T10:22:41.6654006Z createpolicy.fractional.L2::evict_last.b64 %rd87, 1.0; 2026-02-21T10:22:41.6654272Z // end inline asm 2026-02-21T10:22:41.6654534Z // begin inline asm 2026-02-21T10:22:41.6654713Z mov.u32 %r734, 0x0; 2026-02-21T10:22:41.6654871Z mov.u32 %r735, 0x0; 2026-02-21T10:22:41.6655037Z mov.u32 %r736, 0x0; 2026-02-21T10:22:41.6655191Z mov.u32 %r737, 0x0; 2026-02-21T10:22:41.6655505Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r734, %r735, %r736, %r737 }, [ %rd88 + 0 ], %rd87; 2026-02-21T10:22:41.6655871Z // end inline asm 2026-02-21T10:22:41.6656093Z // begin inline asm 2026-02-21T10:22:41.6656266Z mov.u64 %rd90, 0x0; 2026-02-21T10:22:41.6656601Z createpolicy.fractional.L2::evict_last.b64 %rd90, 1.0; 2026-02-21T10:22:41.6656875Z // end inline asm 2026-02-21T10:22:41.6657025Z // begin inline asm 2026-02-21T10:22:41.6657187Z mov.u32 %r738, 0x0; 2026-02-21T10:22:41.6657339Z mov.u32 %r739, 0x0; 2026-02-21T10:22:41.6657496Z mov.u32 %r740, 0x0; 2026-02-21T10:22:41.6657653Z mov.u32 %r741, 0x0; 2026-02-21T10:22:41.6657950Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r738, %r739, %r740, %r741 }, [ %rd91 + 0 ], %rd90; 2026-02-21T10:22:41.6658310Z // end inline asm 2026-02-21T10:22:41.6658460Z // begin inline asm 2026-02-21T10:22:41.6658623Z mov.u64 %rd93, 0x0; 2026-02-21T10:22:41.6658833Z createpolicy.fractional.L2::evict_last.b64 %rd93, 1.0; 2026-02-21T10:22:41.6659089Z // end inline asm 2026-02-21T10:22:41.6659238Z // begin inline asm 2026-02-21T10:22:41.6659399Z mov.u32 %r742, 0x0; 2026-02-21T10:22:41.6659562Z mov.u32 %r743, 0x0; 2026-02-21T10:22:41.6659716Z mov.u32 %r744, 0x0; 2026-02-21T10:22:41.6659877Z mov.u32 %r745, 0x0; 2026-02-21T10:22:41.6660170Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r742, %r743, %r744, %r745 }, [ %rd94 + 0 ], %rd93; 2026-02-21T10:22:41.6660531Z // end inline asm 2026-02-21T10:22:41.6660686Z // begin inline asm 2026-02-21T10:22:41.6660848Z mov.u64 %rd96, 0x0; 
2026-02-21T10:22:41.6661059Z createpolicy.fractional.L2::evict_last.b64 %rd96, 1.0; 2026-02-21T10:22:41.6661319Z // end inline asm 2026-02-21T10:22:41.6661487Z // begin inline asm 2026-02-21T10:22:41.6661646Z mov.u32 %r746, 0x0; 2026-02-21T10:22:41.6661809Z mov.u32 %r747, 0x0; 2026-02-21T10:22:41.6661969Z mov.u32 %r748, 0x0; 2026-02-21T10:22:41.6662129Z mov.u32 %r749, 0x0; 2026-02-21T10:22:41.6662422Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r746, %r747, %r748, %r749 }, [ %rd97 + 0 ], %rd96; 2026-02-21T10:22:41.6662777Z // end inline asm 2026-02-21T10:22:41.6663077Z .loc 1 58 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:58:32 2026-02-21T10:22:41.6663435Z bar.sync 0; 2026-02-21T10:22:41.6663613Z st.shared.v2.b32 [%r23], {%r734, %r735}; 2026-02-21T10:22:41.6663854Z st.shared.v2.b32 [%r23+2048], {%r738, %r739}; 2026-02-21T10:22:41.6664108Z st.shared.v2.b32 [%r23+4096], {%r742, %r743}; 2026-02-21T10:22:41.6664360Z st.shared.v2.b32 [%r23+6144], {%r746, %r747}; 2026-02-21T10:22:41.6664596Z st.shared.v2.b32 [%r24], {%r736, %r737}; 2026-02-21T10:22:41.6664821Z st.shared.v2.b32 [%r24+2048], {%r740, %r741}; 2026-02-21T10:22:41.6665153Z st.shared.v2.b32 [%r24+4096], {%r744, %r745}; 2026-02-21T10:22:41.6665387Z st.shared.v2.b32 [%r24+6144], {%r748, %r749}; 2026-02-21T10:22:41.6665610Z bar.sync 0; 2026-02-21T10:22:41.6665776Z ld.shared.b16 %rs113, [%r25]; 2026-02-21T10:22:41.6666045Z ld.shared.b16 %rs114, [%r25+1024]; 2026-02-21T10:22:41.6666262Z ld.shared.b16 %rs115, [%r25+64]; 2026-02-21T10:22:41.6666586Z ld.shared.b16 %rs116, [%r25+1088]; 2026-02-21T10:22:41.6666831Z ld.shared.b16 %rs117, [%r26]; 2026-02-21T10:22:41.6667029Z ld.shared.b16 %rs118, [%r26+1024]; 2026-02-21T10:22:41.6667239Z ld.shared.b16 %rs119, [%r26+64]; 2026-02-21T10:22:41.6667441Z ld.shared.b16 %rs120, [%r26+1088]; 2026-02-21T10:22:41.6667647Z ld.shared.b16 %rs121, [%r27]; 2026-02-21T10:22:41.6667843Z ld.shared.b16 %rs122, [%r27+1024]; 2026-02-21T10:22:41.6668039Z ld.shared.b16 
%rs123, [%r27+64]; 2026-02-21T10:22:41.6668236Z ld.shared.b16 %rs124, [%r27+1088]; 2026-02-21T10:22:41.6668433Z ld.shared.b16 %rs125, [%r28]; 2026-02-21T10:22:41.6668779Z ld.shared.b16 %rs126, [%r28+1024]; 2026-02-21T10:22:41.6668984Z ld.shared.b16 %rs127, [%r28+64]; 2026-02-21T10:22:41.6669182Z ld.shared.b16 %rs128, [%r28+1088]; 2026-02-21T10:22:41.6669374Z ld.shared.b16 %rs129, [%r29]; 2026-02-21T10:22:41.6669566Z ld.shared.b16 %rs130, [%r29+1024]; 2026-02-21T10:22:41.6669770Z ld.shared.b16 %rs131, [%r29+64]; 2026-02-21T10:22:41.6669959Z ld.shared.b16 %rs132, [%r29+1088]; 2026-02-21T10:22:41.6670234Z ld.shared.b16 %rs133, [%r30]; 2026-02-21T10:22:41.6670422Z ld.shared.b16 %rs134, [%r30+1024]; 2026-02-21T10:22:41.6670624Z ld.shared.b16 %rs135, [%r30+64]; 2026-02-21T10:22:41.6670816Z ld.shared.b16 %rs136, [%r30+1088]; 2026-02-21T10:22:41.6671015Z ld.shared.b16 %rs137, [%r31]; 2026-02-21T10:22:41.6671212Z ld.shared.b16 %rs138, [%r31+1024]; 2026-02-21T10:22:41.6671419Z ld.shared.b16 %rs139, [%r31+64]; 2026-02-21T10:22:41.6671611Z ld.shared.b16 %rs140, [%r31+1088]; 2026-02-21T10:22:41.6671814Z ld.shared.b16 %rs141, [%r32]; 2026-02-21T10:22:41.6672011Z ld.shared.b16 %rs142, [%r32+1024]; 2026-02-21T10:22:41.6672207Z ld.shared.b16 %rs143, [%r32+64]; 2026-02-21T10:22:41.6672408Z ld.shared.b16 %rs144, [%r32+1088]; 2026-02-21T10:22:41.6672607Z cvt.f32.bf16 %r791, %rs113; 2026-02-21T10:22:41.6672803Z cvt.f32.bf16 %r792, %rs114; 2026-02-21T10:22:41.6672988Z cvt.f32.bf16 %r793, %rs117; 2026-02-21T10:22:41.6673186Z cvt.f32.bf16 %r794, %rs118; 2026-02-21T10:22:41.6673373Z cvt.f32.bf16 %r827, %rs121; 2026-02-21T10:22:41.6673558Z cvt.f32.bf16 %r828, %rs122; 2026-02-21T10:22:41.6673745Z cvt.f32.bf16 %r829, %rs125; 2026-02-21T10:22:41.6673925Z cvt.f32.bf16 %r830, %rs126; 2026-02-21T10:22:41.6674114Z cvt.f32.bf16 %r863, %rs129; 2026-02-21T10:22:41.6674292Z cvt.f32.bf16 %r864, %rs130; 2026-02-21T10:22:41.6674479Z cvt.f32.bf16 %r865, %rs133; 2026-02-21T10:22:41.6674660Z cvt.f32.bf16 
%r866, %rs134; 2026-02-21T10:22:41.6674848Z cvt.f32.bf16 %r899, %rs137; 2026-02-21T10:22:41.6675027Z cvt.f32.bf16 %r900, %rs138; 2026-02-21T10:22:41.6675220Z cvt.f32.bf16 %r901, %rs141; 2026-02-21T10:22:41.6675404Z cvt.f32.bf16 %r902, %rs142; 2026-02-21T10:22:41.6675612Z cvt.f32.bf16 %r935, %rs115; 2026-02-21T10:22:41.6675801Z cvt.f32.bf16 %r936, %rs116; 2026-02-21T10:22:41.6675986Z cvt.f32.bf16 %r937, %rs119; 2026-02-21T10:22:41.6676176Z cvt.f32.bf16 %r938, %rs120; 2026-02-21T10:22:41.6676356Z cvt.f32.bf16 %r971, %rs123; 2026-02-21T10:22:41.6676665Z cvt.f32.bf16 %r972, %rs124; 2026-02-21T10:22:41.6676854Z cvt.f32.bf16 %r973, %rs127; 2026-02-21T10:22:41.6677039Z cvt.f32.bf16 %r974, %rs128; 2026-02-21T10:22:41.6677223Z cvt.f32.bf16 %r1007, %rs131; 2026-02-21T10:22:41.6677417Z cvt.f32.bf16 %r1008, %rs132; 2026-02-21T10:22:41.6677599Z cvt.f32.bf16 %r1009, %rs135; 2026-02-21T10:22:41.6677786Z cvt.f32.bf16 %r1010, %rs136; 2026-02-21T10:22:41.6677971Z cvt.f32.bf16 %r1043, %rs139; 2026-02-21T10:22:41.6678150Z cvt.f32.bf16 %r1044, %rs140; 2026-02-21T10:22:41.6678337Z cvt.f32.bf16 %r1045, %rs143; 2026-02-21T10:22:41.6678633Z cvt.f32.bf16 %r1046, %rs144; 2026-02-21T10:22:41.6678967Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.6679317Z bar.sync 0; 2026-02-21T10:22:41.6679474Z // begin inline asm 2026-02-21T10:22:41.6679748Z @%p256 mbarrier.init.shared::cta.b64 [%r1926], 1; 2026-02-21T10:22:41.6680005Z // end inline asm 2026-02-21T10:22:41.6680159Z bar.sync 0; 2026-02-21T10:22:41.6680308Z // begin inline asm 2026-02-21T10:22:41.6680539Z @%p256 mbarrier.arrive.expect_tx.shared.b64 _, [%r1926], 1024; 2026-02-21T10:22:41.6680810Z // end inline asm 2026-02-21T10:22:41.6680966Z // begin inline asm 2026-02-21T10:22:41.6681140Z fence.proxy.async.shared::cta; 2026-02-21T10:22:41.6681338Z // end inline asm 2026-02-21T10:22:41.6681485Z bar.sync 0; 2026-02-21T10:22:41.6681650Z elect.sync %r1805|%p71, -1; 2026-02-21T10:22:41.6681851Z 
and.pred %p34, %p1, %p71; 2026-02-21T10:22:41.6682040Z or.b32 %r754, %r403, 32; 2026-02-21T10:22:41.6682228Z // begin inline asm 2026-02-21T10:22:41.6682726Z @%p34 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1928], [%rd247, {%r402, %r754}], [%r1926]; 2026-02-21T10:22:41.6683200Z // end inline asm 2026-02-21T10:22:41.6683357Z bar.sync 0; 2026-02-21T10:22:41.6683510Z // begin inline asm 2026-02-21T10:22:41.6683663Z 2026-02-21T10:22:41.6683800Z { 2026-02-21T10:22:41.6683936Z .reg .pred complete; 2026-02-21T10:22:41.6684179Z waitLoop: 2026-02-21T10:22:41.6684411Z mbarrier.try_wait.parity.shared.b64 complete, [%r1926], %r1767; 2026-02-21T10:22:41.6684706Z @!complete bra.uni waitLoop; 2026-02-21T10:22:41.6684890Z } 2026-02-21T10:22:41.6684977Z 2026-02-21T10:22:41.6685040Z // end inline asm 2026-02-21T10:22:41.6685199Z bar.sync 0; 2026-02-21T10:22:41.6685348Z // begin inline asm 2026-02-21T10:22:41.6685545Z @%p256 mbarrier.inval.shared::cta.b64 [%r1926]; 2026-02-21T10:22:41.6685780Z // end inline asm 2026-02-21T10:22:41.6692968Z .loc 1 78 58 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:78:58 2026-02-21T10:22:41.6693452Z ld.shared.b8 %rs145, [%r33]; 2026-02-21T10:22:41.6693683Z ld.shared.b8 %rs146, [%r33+64]; 2026-02-21T10:22:41.6693897Z ld.shared.b8 %rs147, [%r33+256]; 2026-02-21T10:22:41.6694118Z ld.shared.b8 %rs148, [%r33+320]; 2026-02-21T10:22:41.6694321Z ld.shared.b8 %rs149, [%r33+512]; 2026-02-21T10:22:41.6694526Z ld.shared.b8 %rs150, [%r33+576]; 2026-02-21T10:22:41.6694720Z ld.shared.b8 %rs151, [%r33+768]; 2026-02-21T10:22:41.6694912Z ld.shared.b8 %rs152, [%r33+832]; 2026-02-21T10:22:41.6695098Z ld.shared.b8 %rs153, [%r34+128]; 2026-02-21T10:22:41.6695288Z ld.shared.b8 %rs154, [%r34+192]; 2026-02-21T10:22:41.6695486Z ld.shared.b8 %rs155, [%r34+384]; 2026-02-21T10:22:41.6695671Z ld.shared.b8 %rs156, [%r34+448]; 2026-02-21T10:22:41.6695861Z ld.shared.b8 %rs157, [%r34+640]; 2026-02-21T10:22:41.6696052Z ld.shared.b8 
%rs158, [%r34+704]; 2026-02-21T10:22:41.6696254Z ld.shared.b8 %rs159, [%r34+896]; 2026-02-21T10:22:41.6696636Z ld.shared.b8 %rs160, [%r34+960]; 2026-02-21T10:22:41.6697008Z .loc 1 63 28 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:63:28 2026-02-21T10:22:41.6697392Z shl.b16 %rs161, %rs145, 4; 2026-02-21T10:22:41.6697586Z shl.b16 %rs162, %rs146, 4; 2026-02-21T10:22:41.6697767Z shl.b16 %rs163, %rs153, 4; 2026-02-21T10:22:41.6697945Z shl.b16 %rs164, %rs154, 4; 2026-02-21T10:22:41.6698131Z shl.b16 %rs165, %rs147, 4; 2026-02-21T10:22:41.6698309Z shl.b16 %rs166, %rs148, 4; 2026-02-21T10:22:41.6698496Z shl.b16 %rs167, %rs155, 4; 2026-02-21T10:22:41.6698674Z shl.b16 %rs168, %rs156, 4; 2026-02-21T10:22:41.6698854Z shl.b16 %rs169, %rs149, 4; 2026-02-21T10:22:41.6699030Z shl.b16 %rs170, %rs150, 4; 2026-02-21T10:22:41.6699220Z shl.b16 %rs171, %rs157, 4; 2026-02-21T10:22:41.6699397Z shl.b16 %rs172, %rs158, 4; 2026-02-21T10:22:41.6699580Z shl.b16 %rs173, %rs151, 4; 2026-02-21T10:22:41.6699761Z shl.b16 %rs174, %rs152, 4; 2026-02-21T10:22:41.6700118Z shl.b16 %rs175, %rs159, 4; 2026-02-21T10:22:41.6700303Z shl.b16 %rs176, %rs160, 4; 2026-02-21T10:22:41.6700631Z .loc 1 78 58 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:78:58 2026-02-21T10:22:41.6701104Z selp.b16 %rs177, %rs161, %rs145, %p268; 2026-02-21T10:22:41.6701324Z cvt.s16.s8 %rs178, %rs177; 2026-02-21T10:22:41.6701523Z shr.s16 %rs179, %rs178, 4; 2026-02-21T10:22:41.6701718Z selp.b16 %rs180, %rs162, %rs146, %p268; 2026-02-21T10:22:41.6701938Z cvt.s16.s8 %rs181, %rs180; 2026-02-21T10:22:41.6702119Z shr.s16 %rs182, %rs181, 4; 2026-02-21T10:22:41.6702307Z selp.b16 %rs183, %rs163, %rs153, %p268; 2026-02-21T10:22:41.6702525Z cvt.s16.s8 %rs184, %rs183; 2026-02-21T10:22:41.6702701Z shr.s16 %rs185, %rs184, 4; 2026-02-21T10:22:41.6702909Z selp.b16 %rs186, %rs164, %rs154, %p268; 2026-02-21T10:22:41.6703114Z cvt.s16.s8 %rs187, %rs186; 2026-02-21T10:22:41.6703297Z shr.s16 %rs188, %rs187, 4; 
2026-02-21T10:22:41.6703483Z selp.b16 %rs189, %rs165, %rs147, %p268; 2026-02-21T10:22:41.6703773Z cvt.s16.s8 %rs190, %rs189; 2026-02-21T10:22:41.6703972Z shr.s16 %rs191, %rs190, 4; 2026-02-21T10:22:41.6704159Z selp.b16 %rs192, %rs166, %rs148, %p268; 2026-02-21T10:22:41.6704369Z cvt.s16.s8 %rs193, %rs192; 2026-02-21T10:22:41.6704554Z shr.s16 %rs194, %rs193, 4; 2026-02-21T10:22:41.6704744Z selp.b16 %rs195, %rs167, %rs155, %p268; 2026-02-21T10:22:41.6705023Z cvt.s16.s8 %rs196, %rs195; 2026-02-21T10:22:41.6705211Z shr.s16 %rs197, %rs196, 4; 2026-02-21T10:22:41.6705402Z selp.b16 %rs198, %rs168, %rs156, %p268; 2026-02-21T10:22:41.6705621Z cvt.s16.s8 %rs199, %rs198; 2026-02-21T10:22:41.6705800Z shr.s16 %rs200, %rs199, 4; 2026-02-21T10:22:41.6705997Z selp.b16 %rs201, %rs169, %rs149, %p268; 2026-02-21T10:22:41.6706210Z cvt.s16.s8 %rs202, %rs201; 2026-02-21T10:22:41.6706385Z shr.s16 %rs203, %rs202, 4; 2026-02-21T10:22:41.6706705Z selp.b16 %rs204, %rs170, %rs150, %p268; 2026-02-21T10:22:41.6706913Z cvt.s16.s8 %rs205, %rs204; 2026-02-21T10:22:41.6707099Z shr.s16 %rs206, %rs205, 4; 2026-02-21T10:22:41.6707280Z selp.b16 %rs207, %rs171, %rs157, %p268; 2026-02-21T10:22:41.6707500Z cvt.s16.s8 %rs208, %rs207; 2026-02-21T10:22:41.6707678Z shr.s16 %rs209, %rs208, 4; 2026-02-21T10:22:41.6707872Z selp.b16 %rs210, %rs172, %rs158, %p268; 2026-02-21T10:22:41.6708080Z cvt.s16.s8 %rs211, %rs210; 2026-02-21T10:22:41.6708262Z shr.s16 %rs212, %rs211, 4; 2026-02-21T10:22:41.6708537Z selp.b16 %rs213, %rs173, %rs151, %p268; 2026-02-21T10:22:41.6708766Z cvt.s16.s8 %rs214, %rs213; 2026-02-21T10:22:41.6708950Z shr.s16 %rs215, %rs214, 4; 2026-02-21T10:22:41.6709137Z selp.b16 %rs216, %rs174, %rs152, %p268; 2026-02-21T10:22:41.6709352Z cvt.s16.s8 %rs217, %rs216; 2026-02-21T10:22:41.6709524Z shr.s16 %rs218, %rs217, 4; 2026-02-21T10:22:41.6709715Z selp.b16 %rs219, %rs175, %rs159, %p268; 2026-02-21T10:22:41.6709916Z cvt.s16.s8 %rs220, %rs219; 2026-02-21T10:22:41.6710096Z shr.s16 %rs221, %rs220, 4; 
2026-02-21T10:22:41.6710286Z selp.b16 %rs222, %rs176, %rs160, %p268; 2026-02-21T10:22:41.6710494Z cvt.s16.s8 %rs223, %rs222; 2026-02-21T10:22:41.6710676Z shr.s16 %rs224, %rs223, 4; 2026-02-21T10:22:41.6711008Z .loc 1 83 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:83:32 2026-02-21T10:22:41.6711394Z cvt.rn.f32.s16 %r1806, %rs179; 2026-02-21T10:22:41.6711589Z cvt.rn.f32.s16 %r1807, %rs182; 2026-02-21T10:22:41.6711788Z cvt.rn.f32.s16 %r1808, %rs185; 2026-02-21T10:22:41.6711971Z cvt.rn.f32.s16 %r1809, %rs188; 2026-02-21T10:22:41.6712161Z cvt.rn.f32.s16 %r1810, %rs191; 2026-02-21T10:22:41.6712351Z cvt.rn.f32.s16 %r1811, %rs194; 2026-02-21T10:22:41.6712536Z cvt.rn.f32.s16 %r1812, %rs197; 2026-02-21T10:22:41.6712725Z cvt.rn.f32.s16 %r1813, %rs200; 2026-02-21T10:22:41.6712908Z cvt.rn.f32.s16 %r1814, %rs203; 2026-02-21T10:22:41.6713099Z cvt.rn.f32.s16 %r1815, %rs206; 2026-02-21T10:22:41.6713279Z cvt.rn.f32.s16 %r1816, %rs209; 2026-02-21T10:22:41.6713481Z cvt.rn.f32.s16 %r1817, %rs212; 2026-02-21T10:22:41.6713757Z cvt.rn.f32.s16 %r1818, %rs215; 2026-02-21T10:22:41.6713946Z cvt.rn.f32.s16 %r1819, %rs218; 2026-02-21T10:22:41.6714137Z cvt.rn.f32.s16 %r1820, %rs221; 2026-02-21T10:22:41.6714321Z cvt.rn.f32.s16 %r1821, %rs224; 2026-02-21T10:22:41.6714576Z bar.sync 0; 2026-02-21T10:22:41.6714737Z st.shared.b32 [%r35], %r1806; 2026-02-21T10:22:41.6714940Z st.shared.b32 [%r35+4096], %r1814; 2026-02-21T10:22:41.6715149Z st.shared.b32 [%r36], %r1807; 2026-02-21T10:22:41.6715362Z st.shared.b32 [%r36+4096], %r1815; 2026-02-21T10:22:41.6715563Z st.shared.b32 [%r37], %r1808; 2026-02-21T10:22:41.6715757Z st.shared.b32 [%r37+4096], %r1816; 2026-02-21T10:22:41.6715954Z st.shared.b32 [%r38], %r1809; 2026-02-21T10:22:41.6716149Z st.shared.b32 [%r38+4096], %r1817; 2026-02-21T10:22:41.6716354Z st.shared.b32 [%r39], %r1810; 2026-02-21T10:22:41.6716661Z st.shared.b32 [%r39+4096], %r1818; 2026-02-21T10:22:41.6716867Z st.shared.b32 [%r40], %r1811; 2026-02-21T10:22:41.6717067Z 
st.shared.b32 [%r40+4096], %r1819; 2026-02-21T10:22:41.6717361Z st.shared.b32 [%r41], %r1812; 2026-02-21T10:22:41.6717550Z st.shared.b32 [%r41+4096], %r1820; 2026-02-21T10:22:41.6717751Z st.shared.b32 [%r42], %r1813; 2026-02-21T10:22:41.6717936Z st.shared.b32 [%r42+4096], %r1821; 2026-02-21T10:22:41.6718130Z $L__tmp3: 2026-02-21T10:22:41.6718564Z .loc 2 291 36 // standard.py:291:36 @[ cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:90:40 ] 2026-02-21T10:22:41.6718995Z // begin inline asm 2026-02-21T10:22:41.6719190Z fence.proxy.async.shared::cta; 2026-02-21T10:22:41.6719381Z // end inline asm 2026-02-21T10:22:41.6719538Z bar.sync 0; 2026-02-21T10:22:41.6719700Z wgmma.fence.sync.aligned; 2026-02-21T10:22:41.6719894Z // begin inline asm 2026-02-21T10:22:41.6720500Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r791,%r792,%r793,%r794}, %rd166, %p24, 1, 1; 2026-02-21T10:22:41.6721160Z // end inline asm 2026-02-21T10:22:41.6721318Z // begin inline asm 2026-02-21T10:22:41.6721918Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r827,%r828,%r829,%r830}, %rd167, %p24, 1, 1; 2026-02-21T10:22:41.6722569Z // end inline asm 2026-02-21T10:22:41.6722723Z // begin inline asm 2026-02-21T10:22:41.6723333Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r863,%r864,%r865,%r866}, %rd168, %p24, 1, 1; 2026-02-21T10:22:41.6723970Z // end inline asm 2026-02-21T10:22:41.6724121Z // begin inline asm 2026-02-21T10:22:41.6724711Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r899,%r900,%r901,%r902}, 
%rd169, %p24, 1, 1; 2026-02-21T10:22:41.6725351Z // end inline asm 2026-02-21T10:22:41.6725509Z // begin inline asm 2026-02-21T10:22:41.6726090Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r935,%r936,%r937,%r938}, %rd170, %p24, 1, 1; 2026-02-21T10:22:41.6726859Z // end inline asm 2026-02-21T10:22:41.6727027Z // begin inline asm 2026-02-21T10:22:41.6727612Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r971,%r972,%r973,%r974}, %rd171, %p24, 1, 1; 2026-02-21T10:22:41.6728254Z // end inline asm 2026-02-21T10:22:41.6728402Z // begin inline asm 2026-02-21T10:22:41.6729009Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r1007,%r1008,%r1009,%r1010}, %rd172, %p24, 1, 1; 2026-02-21T10:22:41.6729769Z // end inline asm 2026-02-21T10:22:41.6729924Z // begin inline asm 2026-02-21T10:22:41.6730533Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r1043,%r1044,%r1045,%r1046}, %rd173, %p24, 1, 1; 2026-02-21T10:22:41.6731294Z // end inline asm 2026-02-21T10:22:41.6731479Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:41.6731695Z mov.b32 %r1063, %r1928; 2026-02-21T10:22:41.6731869Z mov.b32 %r1064, %r1767; 2026-02-21T10:22:41.6732046Z mov.b32 %r1065, %r1767; 2026-02-21T10:22:41.6732210Z // begin inline asm 2026-02-21T10:22:41.6732626Z // wait for regs: %r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460,%r1063,%r1064,%r1065 2026-02-21T10:22:41.6733100Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:41.6733302Z // end inline asm 
2026-02-21T10:22:41.6733453Z $L__tmp4: 2026-02-21T10:22:41.6733825Z .loc 1 54 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:32 2026-02-21T10:22:41.6734202Z add.s64 %rd109, %rd67, 256; 2026-02-21T10:22:41.6734388Z add.s64 %rd112, %rd70, 256; 2026-02-21T10:22:41.6734572Z add.s64 %rd115, %rd73, 256; 2026-02-21T10:22:41.6734888Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.6735349Z add.s64 %rd118, %rd76, 256; 2026-02-21T10:22:41.6735529Z // begin inline asm 2026-02-21T10:22:41.6735694Z mov.u64 %rd108, 0x0; 2026-02-21T10:22:41.6735935Z createpolicy.fractional.L2::evict_last.b64 %rd108, 1.0; 2026-02-21T10:22:41.6736206Z // end inline asm 2026-02-21T10:22:41.6736371Z // begin inline asm 2026-02-21T10:22:41.6736658Z mov.u32 %r1085, 0x0; 2026-02-21T10:22:41.6736826Z mov.u32 %r1086, 0x0; 2026-02-21T10:22:41.6736980Z mov.u32 %r1087, 0x0; 2026-02-21T10:22:41.6737140Z mov.u32 %r1088, 0x0; 2026-02-21T10:22:41.6737471Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1085, %r1086, %r1087, %r1088 }, [ %rd109 + 0 ], %rd108; 2026-02-21T10:22:41.6737847Z // end inline asm 2026-02-21T10:22:41.6738008Z // begin inline asm 2026-02-21T10:22:41.6738165Z mov.u64 %rd111, 0x0; 2026-02-21T10:22:41.6738389Z createpolicy.fractional.L2::evict_last.b64 %rd111, 1.0; 2026-02-21T10:22:41.6738641Z // end inline asm 2026-02-21T10:22:41.6738793Z // begin inline asm 2026-02-21T10:22:41.6738948Z mov.u32 %r1089, 0x0; 2026-02-21T10:22:41.6739108Z mov.u32 %r1090, 0x0; 2026-02-21T10:22:41.6739271Z mov.u32 %r1091, 0x0; 2026-02-21T10:22:41.6739432Z mov.u32 %r1092, 0x0; 2026-02-21T10:22:41.6739748Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1089, %r1090, %r1091, %r1092 }, [ %rd112 + 0 ], %rd111; 2026-02-21T10:22:41.6740119Z // end inline asm 2026-02-21T10:22:41.6740273Z // begin inline asm 2026-02-21T10:22:41.6740426Z mov.u64 %rd114, 0x0; 2026-02-21T10:22:41.6740642Z createpolicy.fractional.L2::evict_last.b64 %rd114, 1.0; 
2026-02-21T10:22:41.6740896Z // end inline asm 2026-02-21T10:22:41.6741051Z // begin inline asm 2026-02-21T10:22:41.6741202Z mov.u32 %r1093, 0x0; 2026-02-21T10:22:41.6741360Z mov.u32 %r1094, 0x0; 2026-02-21T10:22:41.6741520Z mov.u32 %r1095, 0x0; 2026-02-21T10:22:41.6741686Z mov.u32 %r1096, 0x0; 2026-02-21T10:22:41.6741995Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1093, %r1094, %r1095, %r1096 }, [ %rd115 + 0 ], %rd114; 2026-02-21T10:22:41.6742355Z // end inline asm 2026-02-21T10:22:41.6742524Z // begin inline asm 2026-02-21T10:22:41.6742586Z mov.u64 %rd117, 0x0; 2026-02-21T10:22:41.6742705Z createpolicy.fractional.L2::evict_last.b64 %rd117, 1.0; 2026-02-21T10:22:41.6742772Z // end inline asm 2026-02-21T10:22:41.6742836Z // begin inline asm 2026-02-21T10:22:41.6742895Z mov.u32 %r1097, 0x0; 2026-02-21T10:22:41.6742966Z mov.u32 %r1098, 0x0; 2026-02-21T10:22:41.6743029Z mov.u32 %r1099, 0x0; 2026-02-21T10:22:41.6743089Z mov.u32 %r1100, 0x0; 2026-02-21T10:22:41.6743321Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1097, %r1098, %r1099, %r1100 }, [ %rd118 + 0 ], %rd117; 2026-02-21T10:22:41.6743476Z // end inline asm 2026-02-21T10:22:41.6743692Z .loc 1 58 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:58:32 2026-02-21T10:22:41.6743825Z bar.sync 0; 2026-02-21T10:22:41.6743913Z st.shared.v2.b32 [%r23], {%r1085, %r1086}; 2026-02-21T10:22:41.6744006Z st.shared.v2.b32 [%r23+2048], {%r1089, %r1090}; 2026-02-21T10:22:41.6744099Z st.shared.v2.b32 [%r23+4096], {%r1093, %r1094}; 2026-02-21T10:22:41.6744184Z st.shared.v2.b32 [%r23+6144], {%r1097, %r1098}; 2026-02-21T10:22:41.6744261Z st.shared.v2.b32 [%r24], {%r1087, %r1088}; 2026-02-21T10:22:41.6744346Z st.shared.v2.b32 [%r24+2048], {%r1091, %r1092}; 2026-02-21T10:22:41.6744436Z st.shared.v2.b32 [%r24+4096], {%r1095, %r1096}; 2026-02-21T10:22:41.6744521Z st.shared.v2.b32 [%r24+6144], {%r1099, %r1100}; 2026-02-21T10:22:41.6744595Z bar.sync 0; 2026-02-21T10:22:41.6744681Z ld.shared.b16 %rs225, [%r25]; 
2026-02-21T10:22:41.6744826Z ld.shared.b16 %rs226, [%r25+1024]; 2026-02-21T10:22:41.6744901Z ld.shared.b16 %rs227, [%r25+64]; 2026-02-21T10:22:41.6744977Z ld.shared.b16 %rs228, [%r25+1088]; 2026-02-21T10:22:41.6745046Z ld.shared.b16 %rs229, [%r26]; 2026-02-21T10:22:41.6745122Z ld.shared.b16 %rs230, [%r26+1024]; 2026-02-21T10:22:41.6745194Z ld.shared.b16 %rs231, [%r26+64]; 2026-02-21T10:22:41.6745268Z ld.shared.b16 %rs232, [%r26+1088]; 2026-02-21T10:22:41.6745395Z ld.shared.b16 %rs233, [%r27]; 2026-02-21T10:22:41.6745467Z ld.shared.b16 %rs234, [%r27+1024]; 2026-02-21T10:22:41.6745539Z ld.shared.b16 %rs235, [%r27+64]; 2026-02-21T10:22:41.6745606Z ld.shared.b16 %rs236, [%r27+1088]; 2026-02-21T10:22:41.6745673Z ld.shared.b16 %rs237, [%r28]; 2026-02-21T10:22:41.6745742Z ld.shared.b16 %rs238, [%r28+1024]; 2026-02-21T10:22:41.6745814Z ld.shared.b16 %rs239, [%r28+64]; 2026-02-21T10:22:41.6745882Z ld.shared.b16 %rs240, [%r28+1088]; 2026-02-21T10:22:41.6745950Z ld.shared.b16 %rs241, [%r29]; 2026-02-21T10:22:41.6746027Z ld.shared.b16 %rs242, [%r29+1024]; 2026-02-21T10:22:41.6746092Z ld.shared.b16 %rs243, [%r29+64]; 2026-02-21T10:22:41.6746158Z ld.shared.b16 %rs244, [%r29+1088]; 2026-02-21T10:22:41.6746231Z ld.shared.b16 %rs245, [%r30]; 2026-02-21T10:22:41.6746300Z ld.shared.b16 %rs246, [%r30+1024]; 2026-02-21T10:22:41.6746367Z ld.shared.b16 %rs247, [%r30+64]; 2026-02-21T10:22:41.6746433Z ld.shared.b16 %rs248, [%r30+1088]; 2026-02-21T10:22:41.6746629Z ld.shared.b16 %rs249, [%r31]; 2026-02-21T10:22:41.6746702Z ld.shared.b16 %rs250, [%r31+1024]; 2026-02-21T10:22:41.6746769Z ld.shared.b16 %rs251, [%r31+64]; 2026-02-21T10:22:41.6746842Z ld.shared.b16 %rs252, [%r31+1088]; 2026-02-21T10:22:41.6746907Z ld.shared.b16 %rs253, [%r32]; 2026-02-21T10:22:41.6746975Z ld.shared.b16 %rs254, [%r32+1024]; 2026-02-21T10:22:41.6747043Z ld.shared.b16 %rs255, [%r32+64]; 2026-02-21T10:22:41.6747117Z ld.shared.b16 %rs256, [%r32+1088]; 2026-02-21T10:22:41.6747187Z cvt.f32.bf16 %r1142, %rs225; 
2026-02-21T10:22:41.6747257Z cvt.f32.bf16 %r1143, %rs226; 2026-02-21T10:22:41.6747334Z cvt.f32.bf16 %r1144, %rs229; 2026-02-21T10:22:41.6747404Z cvt.f32.bf16 %r1145, %rs230; 2026-02-21T10:22:41.6747469Z cvt.f32.bf16 %r1178, %rs233; 2026-02-21T10:22:41.6747535Z cvt.f32.bf16 %r1179, %rs234; 2026-02-21T10:22:41.6747604Z cvt.f32.bf16 %r1180, %rs237; 2026-02-21T10:22:41.6747668Z cvt.f32.bf16 %r1181, %rs238; 2026-02-21T10:22:41.6747733Z cvt.f32.bf16 %r1214, %rs241; 2026-02-21T10:22:41.6747803Z cvt.f32.bf16 %r1215, %rs242; 2026-02-21T10:22:41.6747865Z cvt.f32.bf16 %r1216, %rs245; 2026-02-21T10:22:41.6747932Z cvt.f32.bf16 %r1217, %rs246; 2026-02-21T10:22:41.6747999Z cvt.f32.bf16 %r1250, %rs249; 2026-02-21T10:22:41.6748063Z cvt.f32.bf16 %r1251, %rs250; 2026-02-21T10:22:41.6748126Z cvt.f32.bf16 %r1252, %rs253; 2026-02-21T10:22:41.6748192Z cvt.f32.bf16 %r1253, %rs254; 2026-02-21T10:22:41.6748260Z cvt.f32.bf16 %r1286, %rs227; 2026-02-21T10:22:41.6748325Z cvt.f32.bf16 %r1287, %rs228; 2026-02-21T10:22:41.6748556Z cvt.f32.bf16 %r1288, %rs231; 2026-02-21T10:22:41.6748635Z cvt.f32.bf16 %r1289, %rs232; 2026-02-21T10:22:41.6748697Z cvt.f32.bf16 %r1322, %rs235; 2026-02-21T10:22:41.6748760Z cvt.f32.bf16 %r1323, %rs236; 2026-02-21T10:22:41.6748892Z cvt.f32.bf16 %r1324, %rs239; 2026-02-21T10:22:41.6748959Z cvt.f32.bf16 %r1325, %rs240; 2026-02-21T10:22:41.6749021Z cvt.f32.bf16 %r1358, %rs243; 2026-02-21T10:22:41.6749087Z cvt.f32.bf16 %r1359, %rs244; 2026-02-21T10:22:41.6749156Z cvt.f32.bf16 %r1360, %rs247; 2026-02-21T10:22:41.6749219Z cvt.f32.bf16 %r1361, %rs248; 2026-02-21T10:22:41.6749283Z cvt.f32.bf16 %r1394, %rs251; 2026-02-21T10:22:41.6749346Z cvt.f32.bf16 %r1395, %rs252; 2026-02-21T10:22:41.6749416Z cvt.f32.bf16 %r1396, %rs255; 2026-02-21T10:22:41.6749479Z cvt.f32.bf16 %r1397, %rs256; 2026-02-21T10:22:41.6749708Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.6749778Z bar.sync 0; 2026-02-21T10:22:41.6749909Z // begin inline asm 
2026-02-21T10:22:41.6750019Z @%p256 mbarrier.init.shared::cta.b64 [%r1926], 1; 2026-02-21T10:22:41.6750086Z // end inline asm 2026-02-21T10:22:41.6750144Z bar.sync 0; 2026-02-21T10:22:41.6750205Z // begin inline asm 2026-02-21T10:22:41.6750346Z @%p256 mbarrier.arrive.expect_tx.shared.b64 _, [%r1926], 1024; 2026-02-21T10:22:41.6750414Z // end inline asm 2026-02-21T10:22:41.6750479Z // begin inline asm 2026-02-21T10:22:41.6750621Z fence.proxy.async.shared::cta; 2026-02-21T10:22:41.6750688Z // end inline asm 2026-02-21T10:22:41.6750746Z bar.sync 0; 2026-02-21T10:22:41.6750822Z elect.sync %r1822|%p72, -1; 2026-02-21T10:22:41.6750908Z and.pred %p46, %p1, %p72; 2026-02-21T10:22:41.6750979Z or.b32 %r1105, %r403, 64; 2026-02-21T10:22:41.6751044Z // begin inline asm 2026-02-21T10:22:41.6751375Z @%p46 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1928], [%rd247, {%r402, %r1105}], [%r1926]; 2026-02-21T10:22:41.6751447Z // end inline asm 2026-02-21T10:22:41.6751510Z bar.sync 0; 2026-02-21T10:22:41.6751572Z // begin inline asm 2026-02-21T10:22:41.6751635Z 2026-02-21T10:22:41.6751688Z { 2026-02-21T10:22:41.6751756Z .reg .pred complete; 2026-02-21T10:22:41.6751815Z waitLoop: 2026-02-21T10:22:41.6751976Z mbarrier.try_wait.parity.shared.b64 complete, [%r1926], %r1767; 2026-02-21T10:22:41.6752053Z @!complete bra.uni waitLoop; 2026-02-21T10:22:41.6752107Z } 2026-02-21T10:22:41.6752115Z 2026-02-21T10:22:41.6752182Z // end inline asm 2026-02-21T10:22:41.6752241Z bar.sync 0; 2026-02-21T10:22:41.6752304Z // begin inline asm 2026-02-21T10:22:41.6752404Z @%p256 mbarrier.inval.shared::cta.b64 [%r1926]; 2026-02-21T10:22:41.6752470Z // end inline asm 2026-02-21T10:22:41.6752693Z .loc 1 78 58 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:78:58 2026-02-21T10:22:41.6752768Z ld.shared.b8 %rs257, [%r33]; 2026-02-21T10:22:41.6752850Z ld.shared.b8 %rs258, [%r33+64]; 2026-02-21T10:22:41.6752925Z ld.shared.b8 %rs259, [%r33+256]; 
2026-02-21T10:22:41.6752995Z ld.shared.b8 %rs260, [%r33+320]; 2026-02-21T10:22:41.6753065Z ld.shared.b8 %rs261, [%r33+512]; 2026-02-21T10:22:41.6753143Z ld.shared.b8 %rs262, [%r33+576]; 2026-02-21T10:22:41.6753216Z ld.shared.b8 %rs263, [%r33+768]; 2026-02-21T10:22:41.6753282Z ld.shared.b8 %rs264, [%r33+832]; 2026-02-21T10:22:41.6753359Z ld.shared.b8 %rs265, [%r34+128]; 2026-02-21T10:22:41.6753429Z ld.shared.b8 %rs266, [%r34+192]; 2026-02-21T10:22:41.6753494Z ld.shared.b8 %rs267, [%r34+384]; 2026-02-21T10:22:41.6753563Z ld.shared.b8 %rs268, [%r34+448]; 2026-02-21T10:22:41.6753632Z ld.shared.b8 %rs269, [%r34+640]; 2026-02-21T10:22:41.6753699Z ld.shared.b8 %rs270, [%r34+704]; 2026-02-21T10:22:41.6753763Z ld.shared.b8 %rs271, [%r34+896]; 2026-02-21T10:22:41.6753834Z ld.shared.b8 %rs272, [%r34+960]; 2026-02-21T10:22:41.6754039Z .loc 1 63 28 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:63:28 2026-02-21T10:22:41.6754173Z shl.b16 %rs273, %rs257, 4; 2026-02-21T10:22:41.6754246Z shl.b16 %rs274, %rs258, 4; 2026-02-21T10:22:41.6754310Z shl.b16 %rs275, %rs265, 4; 2026-02-21T10:22:41.6754372Z shl.b16 %rs276, %rs266, 4; 2026-02-21T10:22:41.6754434Z shl.b16 %rs277, %rs259, 4; 2026-02-21T10:22:41.6754553Z shl.b16 %rs278, %rs260, 4; 2026-02-21T10:22:41.6754617Z shl.b16 %rs279, %rs267, 4; 2026-02-21T10:22:41.6754678Z shl.b16 %rs280, %rs268, 4; 2026-02-21T10:22:41.6754750Z shl.b16 %rs281, %rs261, 4; 2026-02-21T10:22:41.6754814Z shl.b16 %rs282, %rs262, 4; 2026-02-21T10:22:41.6754876Z shl.b16 %rs283, %rs269, 4; 2026-02-21T10:22:41.6754946Z shl.b16 %rs284, %rs270, 4; 2026-02-21T10:22:41.6755008Z shl.b16 %rs285, %rs263, 4; 2026-02-21T10:22:41.6755069Z shl.b16 %rs286, %rs264, 4; 2026-02-21T10:22:41.6755131Z shl.b16 %rs287, %rs271, 4; 2026-02-21T10:22:41.6755200Z shl.b16 %rs288, %rs272, 4; 2026-02-21T10:22:41.6755399Z .loc 1 78 58 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:78:58 2026-02-21T10:22:41.6755529Z selp.b16 %rs289, %rs273, %rs257, %p268; 
2026-02-21T10:22:41.6755603Z cvt.s16.s8 %rs290, %rs289; 2026-02-21T10:22:41.6755666Z shr.s16 %rs291, %rs290, 4; 2026-02-21T10:22:41.6755739Z selp.b16 %rs292, %rs274, %rs258, %p268; 2026-02-21T10:22:41.6755807Z cvt.s16.s8 %rs293, %rs292; 2026-02-21T10:22:41.6755876Z shr.s16 %rs294, %rs293, 4; 2026-02-21T10:22:41.6755947Z selp.b16 %rs295, %rs275, %rs265, %p268; 2026-02-21T10:22:41.6756055Z cvt.s16.s8 %rs296, %rs295; 2026-02-21T10:22:41.6756126Z shr.s16 %rs297, %rs296, 4; 2026-02-21T10:22:41.6756196Z selp.b16 %rs298, %rs276, %rs266, %p268; 2026-02-21T10:22:41.6756259Z cvt.s16.s8 %rs299, %rs298; 2026-02-21T10:22:41.6756321Z shr.s16 %rs300, %rs299, 4; 2026-02-21T10:22:41.6756401Z selp.b16 %rs301, %rs277, %rs259, %p268; 2026-02-21T10:22:41.6756592Z cvt.s16.s8 %rs302, %rs301; 2026-02-21T10:22:41.6756661Z shr.s16 %rs303, %rs302, 4; 2026-02-21T10:22:41.6756739Z selp.b16 %rs304, %rs278, %rs260, %p268; 2026-02-21T10:22:41.6756808Z cvt.s16.s8 %rs305, %rs304; 2026-02-21T10:22:41.6756870Z shr.s16 %rs306, %rs305, 4; 2026-02-21T10:22:41.6756950Z selp.b16 %rs307, %rs279, %rs267, %p268; 2026-02-21T10:22:41.6757014Z cvt.s16.s8 %rs308, %rs307; 2026-02-21T10:22:41.6757080Z shr.s16 %rs309, %rs308, 4; 2026-02-21T10:22:41.6757151Z selp.b16 %rs310, %rs280, %rs268, %p268; 2026-02-21T10:22:41.6757219Z cvt.s16.s8 %rs311, %rs310; 2026-02-21T10:22:41.6757283Z shr.s16 %rs312, %rs311, 4; 2026-02-21T10:22:41.6757356Z selp.b16 %rs313, %rs281, %rs261, %p268; 2026-02-21T10:22:41.6757425Z cvt.s16.s8 %rs314, %rs313; 2026-02-21T10:22:41.6757487Z shr.s16 %rs315, %rs314, 4; 2026-02-21T10:22:41.6757556Z selp.b16 %rs316, %rs282, %rs262, %p268; 2026-02-21T10:22:41.6757619Z cvt.s16.s8 %rs317, %rs316; 2026-02-21T10:22:41.6757688Z shr.s16 %rs318, %rs317, 4; 2026-02-21T10:22:41.6757759Z selp.b16 %rs319, %rs283, %rs269, %p268; 2026-02-21T10:22:41.6757823Z cvt.s16.s8 %rs320, %rs319; 2026-02-21T10:22:41.6757897Z shr.s16 %rs321, %rs320, 4; 2026-02-21T10:22:41.6757970Z selp.b16 %rs322, %rs284, %rs270, %p268; 
2026-02-21T10:22:41.6758035Z cvt.s16.s8 %rs323, %rs322; 2026-02-21T10:22:41.6758100Z shr.s16 %rs324, %rs323, 4; 2026-02-21T10:22:41.6758179Z selp.b16 %rs325, %rs285, %rs263, %p268; 2026-02-21T10:22:41.6758245Z cvt.s16.s8 %rs326, %rs325; 2026-02-21T10:22:41.6758311Z shr.s16 %rs327, %rs326, 4; 2026-02-21T10:22:41.6758392Z selp.b16 %rs328, %rs286, %rs264, %p268; 2026-02-21T10:22:41.6758459Z cvt.s16.s8 %rs329, %rs328; 2026-02-21T10:22:41.6758520Z shr.s16 %rs330, %rs329, 4; 2026-02-21T10:22:41.6758598Z selp.b16 %rs331, %rs287, %rs271, %p268; 2026-02-21T10:22:41.6758665Z cvt.s16.s8 %rs332, %rs331; 2026-02-21T10:22:41.6758728Z shr.s16 %rs333, %rs332, 4; 2026-02-21T10:22:41.6758799Z selp.b16 %rs334, %rs288, %rs272, %p268; 2026-02-21T10:22:41.6758870Z cvt.s16.s8 %rs335, %rs334; 2026-02-21T10:22:41.6758933Z shr.s16 %rs336, %rs335, 4; 2026-02-21T10:22:41.6759136Z .loc 1 83 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:83:32 2026-02-21T10:22:41.6759310Z cvt.rn.f32.s16 %r1823, %rs291; 2026-02-21T10:22:41.6759378Z cvt.rn.f32.s16 %r1824, %rs294; 2026-02-21T10:22:41.6759443Z cvt.rn.f32.s16 %r1825, %rs297; 2026-02-21T10:22:41.6759575Z cvt.rn.f32.s16 %r1826, %rs300; 2026-02-21T10:22:41.6759650Z cvt.rn.f32.s16 %r1827, %rs303; 2026-02-21T10:22:41.6759715Z cvt.rn.f32.s16 %r1828, %rs306; 2026-02-21T10:22:41.6759781Z cvt.rn.f32.s16 %r1829, %rs309; 2026-02-21T10:22:41.6759855Z cvt.rn.f32.s16 %r1830, %rs312; 2026-02-21T10:22:41.6759919Z cvt.rn.f32.s16 %r1831, %rs315; 2026-02-21T10:22:41.6759985Z cvt.rn.f32.s16 %r1832, %rs318; 2026-02-21T10:22:41.6760052Z cvt.rn.f32.s16 %r1833, %rs321; 2026-02-21T10:22:41.6760122Z cvt.rn.f32.s16 %r1834, %rs324; 2026-02-21T10:22:41.6760187Z cvt.rn.f32.s16 %r1835, %rs327; 2026-02-21T10:22:41.6760252Z cvt.rn.f32.s16 %r1836, %rs330; 2026-02-21T10:22:41.6760323Z cvt.rn.f32.s16 %r1837, %rs333; 2026-02-21T10:22:41.6760393Z cvt.rn.f32.s16 %r1838, %rs336; 2026-02-21T10:22:41.6760536Z bar.sync 0; 2026-02-21T10:22:41.6760616Z st.shared.b32 [%r35], 
%r1823; 2026-02-21T10:22:41.6760687Z st.shared.b32 [%r35+4096], %r1831; 2026-02-21T10:22:41.6760756Z st.shared.b32 [%r36], %r1824; 2026-02-21T10:22:41.6760829Z st.shared.b32 [%r36+4096], %r1832; 2026-02-21T10:22:41.6760902Z st.shared.b32 [%r37], %r1825; 2026-02-21T10:22:41.6760968Z st.shared.b32 [%r37+4096], %r1833; 2026-02-21T10:22:41.6761104Z st.shared.b32 [%r38], %r1826; 2026-02-21T10:22:41.6761179Z st.shared.b32 [%r38+4096], %r1834; 2026-02-21T10:22:41.6761244Z st.shared.b32 [%r39], %r1827; 2026-02-21T10:22:41.6761309Z st.shared.b32 [%r39+4096], %r1835; 2026-02-21T10:22:41.6761376Z st.shared.b32 [%r40], %r1828; 2026-02-21T10:22:41.6761450Z st.shared.b32 [%r40+4096], %r1836; 2026-02-21T10:22:41.6761515Z st.shared.b32 [%r41], %r1829; 2026-02-21T10:22:41.6761593Z st.shared.b32 [%r41+4096], %r1837; 2026-02-21T10:22:41.6761674Z st.shared.b32 [%r42], %r1830; 2026-02-21T10:22:41.6761745Z st.shared.b32 [%r42+4096], %r1838; 2026-02-21T10:22:41.6761803Z $L__tmp5: 2026-02-21T10:22:41.6762083Z .loc 2 291 36 // standard.py:291:36 @[ cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:90:40 ] 2026-02-21T10:22:41.6762156Z // begin inline asm 2026-02-21T10:22:41.6762233Z fence.proxy.async.shared::cta; 2026-02-21T10:22:41.6762292Z // end inline asm 2026-02-21T10:22:41.6762358Z bar.sync 0; 2026-02-21T10:22:41.6762433Z wgmma.fence.sync.aligned; 2026-02-21T10:22:41.6762494Z // begin inline asm 2026-02-21T10:22:41.6763014Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r1142,%r1143,%r1144,%r1145}, %rd166, %p24, 1, 1; 2026-02-21T10:22:41.6763074Z // end inline asm 2026-02-21T10:22:41.6763136Z // begin inline asm 2026-02-21T10:22:41.6763643Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r1178,%r1179,%r1180,%r1181}, %rd167, %p24, 1, 1; 
2026-02-21T10:22:41.6763703Z // end inline asm 2026-02-21T10:22:41.6763763Z // begin inline asm 2026-02-21T10:22:41.6764264Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r1214,%r1215,%r1216,%r1217}, %rd168, %p24, 1, 1; 2026-02-21T10:22:41.6764328Z // end inline asm 2026-02-21T10:22:41.6764388Z // begin inline asm 2026-02-21T10:22:41.6764882Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r1250,%r1251,%r1252,%r1253}, %rd169, %p24, 1, 1; 2026-02-21T10:22:41.6764947Z // end inline asm 2026-02-21T10:22:41.6765008Z // begin inline asm 2026-02-21T10:22:41.6765501Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r1286,%r1287,%r1288,%r1289}, %rd170, %p24, 1, 1; 2026-02-21T10:22:41.6765627Z // end inline asm 2026-02-21T10:22:41.6765687Z // begin inline asm 2026-02-21T10:22:41.6766244Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r1322,%r1323,%r1324,%r1325}, %rd171, %p24, 1, 1; 2026-02-21T10:22:41.6766310Z // end inline asm 2026-02-21T10:22:41.6766372Z // begin inline asm 2026-02-21T10:22:41.6766999Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r1358,%r1359,%r1360,%r1361}, %rd172, %p24, 1, 1; 2026-02-21T10:22:41.6767068Z // end inline asm 2026-02-21T10:22:41.6767127Z // begin inline asm 2026-02-21T10:22:41.6767694Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r1394,%r1395,%r1396,%r1397}, %rd173, %p24, 1, 1; 2026-02-21T10:22:41.6767763Z // end inline asm 2026-02-21T10:22:41.6767846Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:41.6767917Z mov.b32 %r1414, %r1928; 2026-02-21T10:22:41.6767979Z mov.b32 %r1415, %r1767; 2026-02-21T10:22:41.6768044Z mov.b32 %r1416, %r1767; 2026-02-21T10:22:41.6768166Z // begin inline asm 2026-02-21T10:22:41.6768477Z // wait for regs: %r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460,%r1414,%r1415,%r1416 2026-02-21T10:22:41.6768575Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:41.6768636Z // end inline asm 2026-02-21T10:22:41.6768695Z $L__tmp6: 2026-02-21T10:22:41.6768909Z .loc 1 54 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:32 2026-02-21T10:22:41.6768977Z add.s64 %rd130, %rd67, 384; 2026-02-21T10:22:41.6769046Z add.s64 %rd133, %rd70, 384; 2026-02-21T10:22:41.6769112Z add.s64 %rd136, %rd73, 384; 2026-02-21T10:22:41.6769319Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.6769385Z add.s64 %rd139, %rd76, 384; 2026-02-21T10:22:41.6769449Z // begin inline asm 2026-02-21T10:22:41.6769522Z mov.u64 %rd129, 0x0; 2026-02-21T10:22:41.6769660Z createpolicy.fractional.L2::evict_last.b64 %rd129, 1.0; 2026-02-21T10:22:41.6769721Z // end inline asm 2026-02-21T10:22:41.6769789Z // begin inline asm 2026-02-21T10:22:41.6769852Z mov.u32 %r1436, 0x0; 2026-02-21T10:22:41.6769914Z mov.u32 %r1437, 0x0; 2026-02-21T10:22:41.6769975Z mov.u32 %r1438, 0x0; 2026-02-21T10:22:41.6770042Z mov.u32 %r1439, 0x0; 2026-02-21T10:22:41.6770271Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1436, %r1437, %r1438, %r1439 }, [ %rd130 + 0 ], %rd129; 2026-02-21T10:22:41.6770331Z // end inline asm 2026-02-21T10:22:41.6770399Z // begin inline asm 2026-02-21T10:22:41.6770465Z 
mov.u64 %rd132, 0x0; 2026-02-21T10:22:41.6770586Z createpolicy.fractional.L2::evict_last.b64 %rd132, 1.0; 2026-02-21T10:22:41.6770647Z // end inline asm 2026-02-21T10:22:41.6770715Z // begin inline asm 2026-02-21T10:22:41.6770777Z mov.u32 %r1440, 0x0; 2026-02-21T10:22:41.6770835Z mov.u32 %r1441, 0x0; 2026-02-21T10:22:41.6770902Z mov.u32 %r1442, 0x0; 2026-02-21T10:22:41.6770959Z mov.u32 %r1443, 0x0; 2026-02-21T10:22:41.6771176Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1440, %r1441, %r1442, %r1443 }, [ %rd133 + 0 ], %rd132; 2026-02-21T10:22:41.6771243Z // end inline asm 2026-02-21T10:22:41.6771304Z // begin inline asm 2026-02-21T10:22:41.6771365Z mov.u64 %rd135, 0x0; 2026-02-21T10:22:41.6771490Z createpolicy.fractional.L2::evict_last.b64 %rd135, 1.0; 2026-02-21T10:22:41.6771570Z // end inline asm 2026-02-21T10:22:41.6771635Z // begin inline asm 2026-02-21T10:22:41.6771695Z mov.u32 %r1444, 0x0; 2026-02-21T10:22:41.6771762Z mov.u32 %r1445, 0x0; 2026-02-21T10:22:41.6771897Z mov.u32 %r1446, 0x0; 2026-02-21T10:22:41.6771960Z mov.u32 %r1447, 0x0; 2026-02-21T10:22:41.6772173Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1444, %r1445, %r1446, %r1447 }, [ %rd136 + 0 ], %rd135; 2026-02-21T10:22:41.6772306Z // end inline asm 2026-02-21T10:22:41.6772367Z // begin inline asm 2026-02-21T10:22:41.6772428Z mov.u64 %rd138, 0x0; 2026-02-21T10:22:41.6772555Z createpolicy.fractional.L2::evict_last.b64 %rd138, 1.0; 2026-02-21T10:22:41.6772617Z // end inline asm 2026-02-21T10:22:41.6772678Z // begin inline asm 2026-02-21T10:22:41.6772746Z mov.u32 %r1448, 0x0; 2026-02-21T10:22:41.6772805Z mov.u32 %r1449, 0x0; 2026-02-21T10:22:41.6772869Z mov.u32 %r1450, 0x0; 2026-02-21T10:22:41.6772930Z mov.u32 %r1451, 0x0; 2026-02-21T10:22:41.6773147Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1448, %r1449, %r1450, %r1451 }, [ %rd139 + 0 ], %rd138; 2026-02-21T10:22:41.6773207Z // end inline asm 2026-02-21T10:22:41.6773457Z .loc 1 58 32 // 
cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:58:32 2026-02-21T10:22:41.6773522Z bar.sync 0; 2026-02-21T10:22:41.6773606Z st.shared.v2.b32 [%r23], {%r1436, %r1437}; 2026-02-21T10:22:41.6773696Z st.shared.v2.b32 [%r23+2048], {%r1440, %r1441}; 2026-02-21T10:22:41.6773788Z st.shared.v2.b32 [%r23+4096], {%r1444, %r1445}; 2026-02-21T10:22:41.6773871Z st.shared.v2.b32 [%r23+6144], {%r1448, %r1449}; 2026-02-21T10:22:41.6773996Z st.shared.v2.b32 [%r24], {%r1438, %r1439}; 2026-02-21T10:22:41.6774082Z st.shared.v2.b32 [%r24+2048], {%r1442, %r1443}; 2026-02-21T10:22:41.6774172Z st.shared.v2.b32 [%r24+4096], {%r1446, %r1447}; 2026-02-21T10:22:41.6774252Z st.shared.v2.b32 [%r24+6144], {%r1450, %r1451}; 2026-02-21T10:22:41.6774311Z bar.sync 0; 2026-02-21T10:22:41.6774387Z ld.shared.b16 %rs337, [%r25]; 2026-02-21T10:22:41.6774457Z ld.shared.b16 %rs338, [%r25+1024]; 2026-02-21T10:22:41.6774526Z ld.shared.b16 %rs339, [%r25+64]; 2026-02-21T10:22:41.6774602Z ld.shared.b16 %rs340, [%r25+1088]; 2026-02-21T10:22:41.6774674Z ld.shared.b16 %rs341, [%r26]; 2026-02-21T10:22:41.6774739Z ld.shared.b16 %rs342, [%r26+1024]; 2026-02-21T10:22:41.6774807Z ld.shared.b16 %rs343, [%r26+64]; 2026-02-21T10:22:41.6774879Z ld.shared.b16 %rs344, [%r26+1088]; 2026-02-21T10:22:41.6774946Z ld.shared.b16 %rs345, [%r27]; 2026-02-21T10:22:41.6775025Z ld.shared.b16 %rs346, [%r27+1024]; 2026-02-21T10:22:41.6775098Z ld.shared.b16 %rs347, [%r27+64]; 2026-02-21T10:22:41.6775166Z ld.shared.b16 %rs348, [%r27+1088]; 2026-02-21T10:22:41.6775230Z ld.shared.b16 %rs349, [%r28]; 2026-02-21T10:22:41.6775298Z ld.shared.b16 %rs350, [%r28+1024]; 2026-02-21T10:22:41.6775371Z ld.shared.b16 %rs351, [%r28+64]; 2026-02-21T10:22:41.6775437Z ld.shared.b16 %rs352, [%r28+1088]; 2026-02-21T10:22:41.6775502Z ld.shared.b16 %rs353, [%r29]; 2026-02-21T10:22:41.6775578Z ld.shared.b16 %rs354, [%r29+1024]; 2026-02-21T10:22:41.6775645Z ld.shared.b16 %rs355, [%r29+64]; 2026-02-21T10:22:41.6775711Z ld.shared.b16 %rs356, [%r29+1088]; 
2026-02-21T10:22:41.6775783Z ld.shared.b16 %rs357, [%r30]; 2026-02-21T10:22:41.6775854Z ld.shared.b16 %rs358, [%r30+1024]; 2026-02-21T10:22:41.6775923Z ld.shared.b16 %rs359, [%r30+64]; 2026-02-21T10:22:41.6775991Z ld.shared.b16 %rs360, [%r30+1088]; 2026-02-21T10:22:41.6776069Z ld.shared.b16 %rs361, [%r31]; 2026-02-21T10:22:41.6776135Z ld.shared.b16 %rs362, [%r31+1024]; 2026-02-21T10:22:41.6776206Z ld.shared.b16 %rs363, [%r31+64]; 2026-02-21T10:22:41.6776282Z ld.shared.b16 %rs364, [%r31+1088]; 2026-02-21T10:22:41.6776347Z ld.shared.b16 %rs365, [%r32]; 2026-02-21T10:22:41.6776413Z ld.shared.b16 %rs366, [%r32+1024]; 2026-02-21T10:22:41.6776597Z ld.shared.b16 %rs367, [%r32+64]; 2026-02-21T10:22:41.6776675Z ld.shared.b16 %rs368, [%r32+1088]; 2026-02-21T10:22:41.6776743Z cvt.f32.bf16 %r1493, %rs337; 2026-02-21T10:22:41.6776820Z cvt.f32.bf16 %r1494, %rs338; 2026-02-21T10:22:41.6776893Z cvt.f32.bf16 %r1495, %rs341; 2026-02-21T10:22:41.6776959Z cvt.f32.bf16 %r1496, %rs342; 2026-02-21T10:22:41.6777106Z cvt.f32.bf16 %r1529, %rs345; 2026-02-21T10:22:41.6777169Z cvt.f32.bf16 %r1530, %rs346; 2026-02-21T10:22:41.6777242Z cvt.f32.bf16 %r1531, %rs349; 2026-02-21T10:22:41.6777306Z cvt.f32.bf16 %r1532, %rs350; 2026-02-21T10:22:41.6777434Z cvt.f32.bf16 %r1565, %rs353; 2026-02-21T10:22:41.6777503Z cvt.f32.bf16 %r1566, %rs354; 2026-02-21T10:22:41.6777568Z cvt.f32.bf16 %r1567, %rs357; 2026-02-21T10:22:41.6777632Z cvt.f32.bf16 %r1568, %rs358; 2026-02-21T10:22:41.6777696Z cvt.f32.bf16 %r1601, %rs361; 2026-02-21T10:22:41.6777770Z cvt.f32.bf16 %r1602, %rs362; 2026-02-21T10:22:41.6777834Z cvt.f32.bf16 %r1603, %rs365; 2026-02-21T10:22:41.6777898Z cvt.f32.bf16 %r1604, %rs366; 2026-02-21T10:22:41.6777966Z cvt.f32.bf16 %r1637, %rs339; 2026-02-21T10:22:41.6778029Z cvt.f32.bf16 %r1638, %rs340; 2026-02-21T10:22:41.6778094Z cvt.f32.bf16 %r1639, %rs343; 2026-02-21T10:22:41.6778164Z cvt.f32.bf16 %r1640, %rs344; 2026-02-21T10:22:41.6778227Z cvt.f32.bf16 %r1673, %rs347; 2026-02-21T10:22:41.6778361Z 
cvt.f32.bf16 %r1674, %rs348; 2026-02-21T10:22:41.6778429Z cvt.f32.bf16 %r1675, %rs351; 2026-02-21T10:22:41.6778500Z cvt.f32.bf16 %r1676, %rs352; 2026-02-21T10:22:41.6778564Z cvt.f32.bf16 %r1709, %rs355; 2026-02-21T10:22:41.6778632Z cvt.f32.bf16 %r1710, %rs356; 2026-02-21T10:22:41.6778703Z cvt.f32.bf16 %r1711, %rs359; 2026-02-21T10:22:41.6778766Z cvt.f32.bf16 %r1712, %rs360; 2026-02-21T10:22:41.6778891Z cvt.f32.bf16 %r1745, %rs363; 2026-02-21T10:22:41.6778957Z cvt.f32.bf16 %r1746, %rs364; 2026-02-21T10:22:41.6779026Z cvt.f32.bf16 %r1747, %rs367; 2026-02-21T10:22:41.6779093Z cvt.f32.bf16 %r1748, %rs368; 2026-02-21T10:22:41.6779317Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.6779383Z bar.sync 0; 2026-02-21T10:22:41.6779446Z // begin inline asm 2026-02-21T10:22:41.6779547Z @%p256 mbarrier.init.shared::cta.b64 [%r1926], 1; 2026-02-21T10:22:41.6779607Z // end inline asm 2026-02-21T10:22:41.6779674Z bar.sync 0; 2026-02-21T10:22:41.6779740Z // begin inline asm 2026-02-21T10:22:41.6779875Z @%p256 mbarrier.arrive.expect_tx.shared.b64 _, [%r1926], 1024; 2026-02-21T10:22:41.6779943Z // end inline asm 2026-02-21T10:22:41.6780006Z // begin inline asm 2026-02-21T10:22:41.6780089Z fence.proxy.async.shared::cta; 2026-02-21T10:22:41.6780155Z // end inline asm 2026-02-21T10:22:41.6780213Z bar.sync 0; 2026-02-21T10:22:41.6780288Z elect.sync %r1839|%p73, -1; 2026-02-21T10:22:41.6780363Z and.pred %p58, %p1, %p73; 2026-02-21T10:22:41.6780433Z or.b32 %r1456, %r403, 96; 2026-02-21T10:22:41.6780495Z // begin inline asm 2026-02-21T10:22:41.6780820Z @%p58 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1928], [%rd247, {%r402, %r1456}], [%r1926]; 2026-02-21T10:22:41.6780889Z // end inline asm 2026-02-21T10:22:41.6780948Z bar.sync 0; 2026-02-21T10:22:41.6781010Z // begin inline asm 2026-02-21T10:22:41.6781064Z 2026-02-21T10:22:41.6781124Z { 2026-02-21T10:22:41.6781208Z .reg .pred complete; 
2026-02-21T10:22:41.6781270Z waitLoop: 2026-02-21T10:22:41.6781429Z mbarrier.try_wait.parity.shared.b64 complete, [%r1926], %r1767; 2026-02-21T10:22:41.6781501Z @!complete bra.uni waitLoop; 2026-02-21T10:22:41.6781554Z } 2026-02-21T10:22:41.6781561Z 2026-02-21T10:22:41.6781639Z // end inline asm 2026-02-21T10:22:41.6781697Z bar.sync 0; 2026-02-21T10:22:41.6781759Z // begin inline asm 2026-02-21T10:22:41.6781859Z @%p256 mbarrier.inval.shared::cta.b64 [%r1926]; 2026-02-21T10:22:41.6781924Z // end inline asm 2026-02-21T10:22:41.6782127Z .loc 1 78 58 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:78:58 2026-02-21T10:22:41.6782195Z ld.shared.b8 %rs369, [%r33]; 2026-02-21T10:22:41.6782270Z ld.shared.b8 %rs370, [%r33+64]; 2026-02-21T10:22:41.6782338Z ld.shared.b8 %rs371, [%r33+256]; 2026-02-21T10:22:41.6782404Z ld.shared.b8 %rs372, [%r33+320]; 2026-02-21T10:22:41.6782470Z ld.shared.b8 %rs373, [%r33+512]; 2026-02-21T10:22:41.6782605Z ld.shared.b8 %rs374, [%r33+576]; 2026-02-21T10:22:41.6782673Z ld.shared.b8 %rs375, [%r33+768]; 2026-02-21T10:22:41.6782740Z ld.shared.b8 %rs376, [%r33+832]; 2026-02-21T10:22:41.6782810Z ld.shared.b8 %rs377, [%r34+128]; 2026-02-21T10:22:41.6782874Z ld.shared.b8 %rs378, [%r34+192]; 2026-02-21T10:22:41.6783010Z ld.shared.b8 %rs379, [%r34+384]; 2026-02-21T10:22:41.6783081Z ld.shared.b8 %rs380, [%r34+448]; 2026-02-21T10:22:41.6783148Z ld.shared.b8 %rs381, [%r34+640]; 2026-02-21T10:22:41.6783214Z ld.shared.b8 %rs382, [%r34+704]; 2026-02-21T10:22:41.6783279Z ld.shared.b8 %rs383, [%r34+896]; 2026-02-21T10:22:41.6783349Z ld.shared.b8 %rs384, [%r34+960]; 2026-02-21T10:22:41.6783551Z .loc 1 63 28 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:63:28 2026-02-21T10:22:41.6783619Z shl.b16 %rs385, %rs369, 4; 2026-02-21T10:22:41.6783689Z shl.b16 %rs386, %rs370, 4; 2026-02-21T10:22:41.6783753Z shl.b16 %rs387, %rs377, 4; 2026-02-21T10:22:41.6783816Z shl.b16 %rs388, %rs378, 4; 2026-02-21T10:22:41.6783941Z shl.b16 %rs389, %rs371, 4; 
2026-02-21T10:22:41.6784013Z shl.b16 %rs390, %rs372, 4; 2026-02-21T10:22:41.6784075Z shl.b16 %rs391, %rs379, 4; 2026-02-21T10:22:41.6784138Z shl.b16 %rs392, %rs380, 4; 2026-02-21T10:22:41.6784208Z shl.b16 %rs393, %rs373, 4; 2026-02-21T10:22:41.6784271Z shl.b16 %rs394, %rs374, 4; 2026-02-21T10:22:41.6784335Z shl.b16 %rs395, %rs381, 4; 2026-02-21T10:22:41.6784450Z shl.b16 %rs396, %rs382, 4; 2026-02-21T10:22:41.6784520Z shl.b16 %rs397, %rs375, 4; 2026-02-21T10:22:41.6784582Z shl.b16 %rs398, %rs376, 4; 2026-02-21T10:22:41.6784645Z shl.b16 %rs399, %rs383, 4; 2026-02-21T10:22:41.6784713Z shl.b16 %rs400, %rs384, 4; 2026-02-21T10:22:41.6784915Z .loc 1 78 58 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:78:58 2026-02-21T10:22:41.6784996Z selp.b16 %rs401, %rs385, %rs369, %p268; 2026-02-21T10:22:41.6785067Z cvt.s16.s8 %rs402, %rs401; 2026-02-21T10:22:41.6785131Z shr.s16 %rs403, %rs402, 4; 2026-02-21T10:22:41.6785208Z selp.b16 %rs404, %rs386, %rs370, %p268; 2026-02-21T10:22:41.6785273Z cvt.s16.s8 %rs405, %rs404; 2026-02-21T10:22:41.6785344Z shr.s16 %rs406, %rs405, 4; 2026-02-21T10:22:41.6785414Z selp.b16 %rs407, %rs387, %rs377, %p268; 2026-02-21T10:22:41.6785480Z cvt.s16.s8 %rs408, %rs407; 2026-02-21T10:22:41.6785549Z shr.s16 %rs409, %rs408, 4; 2026-02-21T10:22:41.6785621Z selp.b16 %rs410, %rs388, %rs378, %p268; 2026-02-21T10:22:41.6785688Z cvt.s16.s8 %rs411, %rs410; 2026-02-21T10:22:41.6785752Z shr.s16 %rs412, %rs411, 4; 2026-02-21T10:22:41.6785833Z selp.b16 %rs413, %rs389, %rs371, %p268; 2026-02-21T10:22:41.6785895Z cvt.s16.s8 %rs414, %rs413; 2026-02-21T10:22:41.6785959Z shr.s16 %rs415, %rs414, 4; 2026-02-21T10:22:41.6786040Z selp.b16 %rs416, %rs390, %rs372, %p268; 2026-02-21T10:22:41.6786104Z cvt.s16.s8 %rs417, %rs416; 2026-02-21T10:22:41.6786168Z shr.s16 %rs418, %rs417, 4; 2026-02-21T10:22:41.6786238Z selp.b16 %rs419, %rs391, %rs379, %p268; 2026-02-21T10:22:41.6786311Z cvt.s16.s8 %rs420, %rs419; 2026-02-21T10:22:41.6786376Z shr.s16 %rs421, %rs420, 4; 
2026-02-21T10:22:41.6786566Z selp.b16 %rs422, %rs392, %rs380, %p268; 2026-02-21T10:22:41.6786641Z cvt.s16.s8 %rs423, %rs422; 2026-02-21T10:22:41.6786709Z shr.s16 %rs424, %rs423, 4; 2026-02-21T10:22:41.6786781Z selp.b16 %rs425, %rs393, %rs373, %p268; 2026-02-21T10:22:41.6786851Z cvt.s16.s8 %rs426, %rs425; 2026-02-21T10:22:41.6786917Z shr.s16 %rs427, %rs426, 4; 2026-02-21T10:22:41.6786990Z selp.b16 %rs428, %rs394, %rs374, %p268; 2026-02-21T10:22:41.6787052Z cvt.s16.s8 %rs429, %rs428; 2026-02-21T10:22:41.6787121Z shr.s16 %rs430, %rs429, 4; 2026-02-21T10:22:41.6787195Z selp.b16 %rs431, %rs395, %rs381, %p268; 2026-02-21T10:22:41.6787260Z cvt.s16.s8 %rs432, %rs431; 2026-02-21T10:22:41.6787331Z shr.s16 %rs433, %rs432, 4; 2026-02-21T10:22:41.6787404Z selp.b16 %rs434, %rs396, %rs382, %p268; 2026-02-21T10:22:41.6787468Z cvt.s16.s8 %rs435, %rs434; 2026-02-21T10:22:41.6787532Z shr.s16 %rs436, %rs435, 4; 2026-02-21T10:22:41.6787722Z selp.b16 %rs437, %rs397, %rs375, %p268; 2026-02-21T10:22:41.6787788Z cvt.s16.s8 %rs438, %rs437; 2026-02-21T10:22:41.6787852Z shr.s16 %rs439, %rs438, 4; 2026-02-21T10:22:41.6787930Z selp.b16 %rs440, %rs398, %rs376, %p268; 2026-02-21T10:22:41.6788061Z cvt.s16.s8 %rs441, %rs440; 2026-02-21T10:22:41.6788126Z shr.s16 %rs442, %rs441, 4; 2026-02-21T10:22:41.6788197Z selp.b16 %rs443, %rs399, %rs383, %p268; 2026-02-21T10:22:41.6788271Z cvt.s16.s8 %rs444, %rs443; 2026-02-21T10:22:41.6788334Z shr.s16 %rs445, %rs444, 4; 2026-02-21T10:22:41.6788406Z selp.b16 %rs446, %rs400, %rs384, %p268; 2026-02-21T10:22:41.6788539Z cvt.s16.s8 %rs447, %rs446; 2026-02-21T10:22:41.6788605Z shr.s16 %rs448, %rs447, 4; 2026-02-21T10:22:41.6788808Z .loc 1 83 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:83:32 2026-02-21T10:22:41.6788884Z cvt.rn.f32.s16 %r1840, %rs403; 2026-02-21T10:22:41.6788951Z cvt.rn.f32.s16 %r1841, %rs406; 2026-02-21T10:22:41.6789093Z cvt.rn.f32.s16 %r1842, %rs409; 2026-02-21T10:22:41.6789160Z cvt.rn.f32.s16 %r1843, %rs412; 
2026-02-21T10:22:41.6789232Z cvt.rn.f32.s16 %r1844, %rs415; 2026-02-21T10:22:41.6789301Z cvt.rn.f32.s16 %r1845, %rs418; 2026-02-21T10:22:41.6789367Z cvt.rn.f32.s16 %r1846, %rs421; 2026-02-21T10:22:41.6789443Z cvt.rn.f32.s16 %r1847, %rs424; 2026-02-21T10:22:41.6789508Z cvt.rn.f32.s16 %r1848, %rs427; 2026-02-21T10:22:41.6789638Z cvt.rn.f32.s16 %r1849, %rs430; 2026-02-21T10:22:41.6789706Z cvt.rn.f32.s16 %r1850, %rs433; 2026-02-21T10:22:41.6789777Z cvt.rn.f32.s16 %r1851, %rs436; 2026-02-21T10:22:41.6789839Z cvt.rn.f32.s16 %r1852, %rs439; 2026-02-21T10:22:41.6789902Z cvt.rn.f32.s16 %r1853, %rs442; 2026-02-21T10:22:41.6789970Z cvt.rn.f32.s16 %r1854, %rs445; 2026-02-21T10:22:41.6790037Z cvt.rn.f32.s16 %r1855, %rs448; 2026-02-21T10:22:41.6790094Z bar.sync 0; 2026-02-21T10:22:41.6790163Z st.shared.b32 [%r35], %r1840; 2026-02-21T10:22:41.6790237Z st.shared.b32 [%r35+4096], %r1848; 2026-02-21T10:22:41.6790309Z st.shared.b32 [%r36], %r1841; 2026-02-21T10:22:41.6790375Z st.shared.b32 [%r36+4096], %r1849; 2026-02-21T10:22:41.6790446Z st.shared.b32 [%r37], %r1842; 2026-02-21T10:22:41.6790512Z st.shared.b32 [%r37+4096], %r1850; 2026-02-21T10:22:41.6790580Z st.shared.b32 [%r38], %r1843; 2026-02-21T10:22:41.6790654Z st.shared.b32 [%r38+4096], %r1851; 2026-02-21T10:22:41.6790727Z st.shared.b32 [%r39], %r1844; 2026-02-21T10:22:41.6790794Z st.shared.b32 [%r39+4096], %r1852; 2026-02-21T10:22:41.6790859Z st.shared.b32 [%r40], %r1845; 2026-02-21T10:22:41.6790930Z st.shared.b32 [%r40+4096], %r1853; 2026-02-21T10:22:41.6790995Z st.shared.b32 [%r41], %r1846; 2026-02-21T10:22:41.6791061Z st.shared.b32 [%r41+4096], %r1854; 2026-02-21T10:22:41.6791133Z st.shared.b32 [%r42], %r1847; 2026-02-21T10:22:41.6791197Z st.shared.b32 [%r42+4096], %r1855; 2026-02-21T10:22:41.6791253Z $L__tmp7: 2026-02-21T10:22:41.6791538Z .loc 2 291 36 // standard.py:291:36 @[ cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:90:40 ] 2026-02-21T10:22:41.6791609Z // begin inline asm 2026-02-21T10:22:41.6791688Z 
fence.proxy.async.shared::cta; 2026-02-21T10:22:41.6791749Z // end inline asm 2026-02-21T10:22:41.6791817Z bar.sync 0; 2026-02-21T10:22:41.6791892Z wgmma.fence.sync.aligned; 2026-02-21T10:22:41.6791954Z // begin inline asm 2026-02-21T10:22:41.6792473Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r1493,%r1494,%r1495,%r1496}, %rd166, %p24, 1, 1; 2026-02-21T10:22:41.6792533Z // end inline asm 2026-02-21T10:22:41.6792596Z // begin inline asm 2026-02-21T10:22:41.6793097Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r1529,%r1530,%r1531,%r1532}, %rd167, %p24, 1, 1; 2026-02-21T10:22:41.6793157Z // end inline asm 2026-02-21T10:22:41.6793279Z // begin inline asm 2026-02-21T10:22:41.6793776Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r1565,%r1566,%r1567,%r1568}, %rd168, %p24, 1, 1; 2026-02-21T10:22:41.6793910Z // end inline asm 2026-02-21T10:22:41.6793972Z // begin inline asm 2026-02-21T10:22:41.6794472Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r1601,%r1602,%r1603,%r1604}, %rd169, %p24, 1, 1; 2026-02-21T10:22:41.6794538Z // end inline asm 2026-02-21T10:22:41.6794600Z // begin inline asm 2026-02-21T10:22:41.6795094Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r1637,%r1638,%r1639,%r1640}, %rd170, %p24, 1, 1; 2026-02-21T10:22:41.6795162Z // end inline asm 2026-02-21T10:22:41.6795271Z // begin inline asm 2026-02-21T10:22:41.6795770Z 
wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r1673,%r1674,%r1675,%r1676}, %rd171, %p24, 1, 1; 2026-02-21T10:22:41.6795838Z // end inline asm 2026-02-21T10:22:41.6795898Z // begin inline asm 2026-02-21T10:22:41.6796439Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r1709,%r1710,%r1711,%r1712}, %rd172, %p24, 1, 1; 2026-02-21T10:22:41.6796628Z // end inline asm 2026-02-21T10:22:41.6796694Z // begin inline asm 2026-02-21T10:22:41.6797191Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460}, {%r1745,%r1746,%r1747,%r1748}, %rd173, %p24, 1, 1; 2026-02-21T10:22:41.6797261Z // end inline asm 2026-02-21T10:22:41.6797341Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:41.6797409Z mov.b32 %r1765, %r1928; 2026-02-21T10:22:41.6797472Z mov.b32 %r1766, %r1767; 2026-02-21T10:22:41.6797542Z // begin inline asm 2026-02-21T10:22:41.6797854Z // wait for regs: %r5445,%r5446,%r5447,%r5448,%r5449,%r5450,%r5451,%r5452,%r5453,%r5454,%r5455,%r5456,%r5457,%r5458,%r5459,%r5460,%r1765,%r1766,%r1767 2026-02-21T10:22:41.6797935Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:41.6798004Z // end inline asm 2026-02-21T10:22:41.6798062Z $L__tmp8: 2026-02-21T10:22:41.6798281Z .loc 1 47 126 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:47:126 2026-02-21T10:22:41.6798356Z add.s64 %rd17, %rd459, 128; 2026-02-21T10:22:41.6798434Z add.s64 %rd458, %rd458, 512; 2026-02-21T10:22:41.6798506Z setp.lt.u64 %p74, %rd459, 3968; 2026-02-21T10:22:41.6798572Z mov.b64 %rd459, %rd17; 2026-02-21T10:22:41.6798642Z @%p74 bra $L__BB0_3; 2026-02-21T10:22:41.6798762Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:22:41.6798976Z .loc 1 38 32 // 
cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:38:32 2026-02-21T10:22:41.6799052Z or.b32 %r1875, %r402, %r6; 2026-02-21T10:22:41.6799255Z .loc 1 40 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:40:32 2026-02-21T10:22:41.6799320Z or.b32 %r1876, %r60, %r18; 2026-02-21T10:22:41.6799391Z or.b32 %r1877, %r60, %r19; 2026-02-21T10:22:41.6799591Z .loc 1 93 28 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:93:28 2026-02-21T10:22:41.6799674Z cvt.rn.bf16x2.f32 %r1878, %r5446, %r5445; 2026-02-21T10:22:41.6799751Z cvt.rn.bf16x2.f32 %r1879, %r5448, %r5447; 2026-02-21T10:22:41.6799832Z cvt.rn.bf16x2.f32 %r1880, %r5450, %r5449; 2026-02-21T10:22:41.6799909Z cvt.rn.bf16x2.f32 %r1881, %r5452, %r5451; 2026-02-21T10:22:41.6799983Z cvt.rn.bf16x2.f32 %r1882, %r5454, %r5453; 2026-02-21T10:22:41.6800151Z cvt.rn.bf16x2.f32 %r1883, %r5456, %r5455; 2026-02-21T10:22:41.6800226Z cvt.rn.bf16x2.f32 %r1884, %r5458, %r5457; 2026-02-21T10:22:41.6800301Z cvt.rn.bf16x2.f32 %r1885, %r5460, %r5459; 2026-02-21T10:22:41.6800509Z .loc 1 94 50 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:94:50 2026-02-21T10:22:41.6800651Z mad.lo.s32 %r1886, %r1876, 1280, %r1875; 2026-02-21T10:22:41.6800726Z mad.lo.s32 %r1887, %r1877, 1280, %r1875; 2026-02-21T10:22:41.6800924Z .loc 1 94 22 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:94:22 2026-02-21T10:22:41.6801009Z mad.wide.s32 %rd150, %r1886, 2, %rd36; 2026-02-21T10:22:41.6801080Z mad.wide.s32 %rd151, %r1887, 2, %rd36; 2026-02-21T10:22:41.6801275Z .loc 1 94 81 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:94:81 2026-02-21T10:22:41.6801344Z bar.sync 0; 2026-02-21T10:22:41.6801460Z st.shared.v4.b32 [%r43], {%r1878, %r1880, %r1882, %r1884}; 2026-02-21T10:22:41.6801653Z st.shared.v4.b32 [%r43+128], {%r1879, %r1881, %r1883, %r1885}; 2026-02-21T10:22:41.6801723Z bar.sync 0; 2026-02-21T10:22:41.6801786Z // begin inline asm 2026-02-21T10:22:41.6801981Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1866, 
%r1867, %r1868, %r1869}, [%r1860]; 2026-02-21T10:22:41.6802046Z // end inline asm 2026-02-21T10:22:41.6802113Z // begin inline asm 2026-02-21T10:22:41.6802356Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1870, %r1871, %r1872, %r1873}, [%r1865]; 2026-02-21T10:22:41.6802417Z // end inline asm 2026-02-21T10:22:41.6802484Z // begin inline asm 2026-02-21T10:22:41.6802614Z st.global.v4.b32 [ %rd150 + 0 ], { %r1866, %r1867, %r1868, %r1869 }; 2026-02-21T10:22:41.6802672Z // end inline asm 2026-02-21T10:22:41.6802731Z // begin inline asm 2026-02-21T10:22:41.6802856Z st.global.v4.b32 [ %rd151 + 0 ], { %r1870, %r1871, %r1872, %r1873 }; 2026-02-21T10:22:41.6802915Z // end inline asm 2026-02-21T10:22:41.6803145Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.6803220Z add.s32 %r1888, %r5444, 132; 2026-02-21T10:22:41.6803422Z .loc 1 32 35 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:32:35 2026-02-21T10:22:41.6803491Z shr.s32 %r1889, %r1888, 31; 2026-02-21T10:22:41.6803560Z shr.u32 %r1890, %r1889, 17; 2026-02-21T10:22:41.6803628Z add.s32 %r1891, %r1888, %r1890; 2026-02-21T10:22:41.6803694Z shr.s32 %r1892, %r1891, 15; 2026-02-21T10:22:41.6803892Z .loc 1 33 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:33:33 2026-02-21T10:22:41.6803962Z shl.b32 %r1893, %r1892, 5; 2026-02-21T10:22:41.6804157Z .loc 1 34 39 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:34:39 2026-02-21T10:22:41.6804221Z sub.s32 %r1894, 40, %r1893; 2026-02-21T10:22:41.6804423Z .loc 1 34 52 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:34:52 2026-02-21T10:22:41.6804499Z min.s32 %r1895, %r1894, 32; 2026-02-21T10:22:41.6804701Z .loc 1 35 45 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:35:45 2026-02-21T10:22:41.6804776Z and.b32 %r1896, %r1891, -32768; 2026-02-21T10:22:41.6804844Z sub.s32 %r1897, %r1888, %r1896; 2026-02-21T10:22:41.6805040Z .loc 1 36 51 // 
cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:36:51 2026-02-21T10:22:41.6805112Z div.s32 %r1898, %r1897, %r1895; 2026-02-21T10:22:41.6805307Z .loc 1 35 64 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:35:64 2026-02-21T10:22:41.6805375Z mul.lo.s32 %r1899, %r1898, %r1895; 2026-02-21T10:22:41.6805439Z sub.s32 %r1900, %r1897, %r1899; 2026-02-21T10:22:41.6805642Z .loc 1 35 30 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:35:30 2026-02-21T10:22:41.6805707Z add.s32 %r1901, %r1900, %r1893; 2026-02-21T10:22:41.6805904Z .loc 1 37 27 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:37:27 2026-02-21T10:22:41.6806057Z shl.b32 %r1929, %r1901, 5; 2026-02-21T10:22:41.6806253Z .loc 1 39 27 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:39:27 2026-02-21T10:22:41.6806375Z shl.b32 %r94, %r1898, 6; 2026-02-21T10:22:41.6806705Z .loc 1 47 126 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:47:126 2026-02-21T10:22:41.6806777Z or.b32 %r1902, %r54, %r94; 2026-02-21T10:22:41.6806839Z shl.b32 %r1903, %r1902, 13; 2026-02-21T10:22:41.6806916Z mul.wide.s32 %rd19, %r1903, 2; 2026-02-21T10:22:41.6806978Z or.b32 %r1904, %r55, %r94; 2026-02-21T10:22:41.6807042Z shl.b32 %r1905, %r1904, 13; 2026-02-21T10:22:41.6807109Z mul.wide.s32 %rd20, %r1905, 2; 2026-02-21T10:22:41.6807177Z or.b32 %r1906, %r56, %r94; 2026-02-21T10:22:41.6807240Z shl.b32 %r1907, %r1906, 13; 2026-02-21T10:22:41.6807306Z mul.wide.s32 %rd21, %r1907, 2; 2026-02-21T10:22:41.6807378Z shl.b32 %r1908, %r1898, 19; 2026-02-21T10:22:41.6807526Z or.b32 %r1909, %r57, %r1908; 2026-02-21T10:22:41.6807610Z mul.wide.s32 %rd22, %r1909, 2; 2026-02-21T10:22:41.6807674Z mov.b32 %r5461, 0f00000000; 2026-02-21T10:22:41.6807743Z mov.b64 %rd461, 0; 2026-02-21T10:22:41.6807811Z mov.b64 %rd460, %rd10; 2026-02-21T10:22:41.6807875Z mov.b32 %r5462, %r5461; 2026-02-21T10:22:41.6807946Z mov.b32 %r5463, %r5461; 2026-02-21T10:22:41.6808008Z mov.b32 %r5464, %r5461; 2026-02-21T10:22:41.6808133Z 
mov.b32 %r5465, %r5461; 2026-02-21T10:22:41.6808197Z mov.b32 %r5466, %r5461; 2026-02-21T10:22:41.6808266Z mov.b32 %r5467, %r5461; 2026-02-21T10:22:41.6808329Z mov.b32 %r5468, %r5461; 2026-02-21T10:22:41.6808389Z mov.b32 %r5469, %r5461; 2026-02-21T10:22:41.6808457Z mov.b32 %r5470, %r5461; 2026-02-21T10:22:41.6808517Z mov.b32 %r5471, %r5461; 2026-02-21T10:22:41.6808577Z mov.b32 %r5472, %r5461; 2026-02-21T10:22:41.6808637Z mov.b32 %r5473, %r5461; 2026-02-21T10:22:41.6808704Z mov.b32 %r5474, %r5461; 2026-02-21T10:22:41.6808768Z mov.b32 %r5475, %r5461; 2026-02-21T10:22:41.6808833Z mov.b32 %r5476, %r5461; 2026-02-21T10:22:41.6808959Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T10:22:41.6809072Z // => This Inner Loop Header: Depth=2 2026-02-21T10:22:41.6809277Z .loc 1 0 126 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:0:126 2026-02-21T10:22:41.6809354Z cvt.u32.u64 %r1930, %rd461; 2026-02-21T10:22:41.6809554Z .loc 1 54 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:32 2026-02-21T10:22:41.6809623Z add.s64 %rd154, %rd460, %rd22; 2026-02-21T10:22:41.6809692Z add.s64 %rd157, %rd460, %rd21; 2026-02-21T10:22:41.6809764Z add.s64 %rd160, %rd460, %rd20; 2026-02-21T10:22:41.6809961Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.6810025Z add.s64 %rd163, %rd460, %rd19; 2026-02-21T10:22:41.6810097Z // begin inline asm 2026-02-21T10:22:41.6810161Z mov.u64 %rd153, 0x0; 2026-02-21T10:22:41.6810291Z createpolicy.fractional.L2::evict_last.b64 %rd153, 1.0; 2026-02-21T10:22:41.6810358Z // end inline asm 2026-02-21T10:22:41.6810420Z // begin inline asm 2026-02-21T10:22:41.6810485Z mov.u32 %r1910, 0x0; 2026-02-21T10:22:41.6810545Z mov.u32 %r1911, 0x0; 2026-02-21T10:22:41.6810610Z mov.u32 %r1912, 0x0; 2026-02-21T10:22:41.6810673Z mov.u32 %r1913, 0x0; 2026-02-21T10:22:41.6810902Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1910, %r1911, %r1912, %r1913 }, [ %rd154 + 0 ], %rd153; 
2026-02-21T10:22:41.6810969Z // end inline asm 2026-02-21T10:22:41.6811032Z // begin inline asm 2026-02-21T10:22:41.6811093Z mov.u64 %rd156, 0x0; 2026-02-21T10:22:41.6811219Z createpolicy.fractional.L2::evict_last.b64 %rd156, 1.0; 2026-02-21T10:22:41.6811280Z // end inline asm 2026-02-21T10:22:41.6811341Z // begin inline asm 2026-02-21T10:22:41.6811409Z mov.u32 %r1914, 0x0; 2026-02-21T10:22:41.6811578Z mov.u32 %r1915, 0x0; 2026-02-21T10:22:41.6811644Z mov.u32 %r1916, 0x0; 2026-02-21T10:22:41.6811701Z mov.u32 %r1917, 0x0; 2026-02-21T10:22:41.6811921Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1914, %r1915, %r1916, %r1917 }, [ %rd157 + 0 ], %rd156; 2026-02-21T10:22:41.6812047Z // end inline asm 2026-02-21T10:22:41.6812107Z // begin inline asm 2026-02-21T10:22:41.6812167Z mov.u64 %rd159, 0x0; 2026-02-21T10:22:41.6812293Z createpolicy.fractional.L2::evict_last.b64 %rd159, 1.0; 2026-02-21T10:22:41.6812351Z // end inline asm 2026-02-21T10:22:41.6812412Z // begin inline asm 2026-02-21T10:22:41.6812475Z mov.u32 %r1918, 0x0; 2026-02-21T10:22:41.6812533Z mov.u32 %r1919, 0x0; 2026-02-21T10:22:41.6812591Z mov.u32 %r1920, 0x0; 2026-02-21T10:22:41.6812650Z mov.u32 %r1921, 0x0; 2026-02-21T10:22:41.6812870Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1918, %r1919, %r1920, %r1921 }, [ %rd160 + 0 ], %rd159; 2026-02-21T10:22:41.6812928Z // end inline asm 2026-02-21T10:22:41.6813000Z // begin inline asm 2026-02-21T10:22:41.6813122Z mov.u64 %rd162, 0x0; 2026-02-21T10:22:41.6813241Z createpolicy.fractional.L2::evict_last.b64 %rd162, 1.0; 2026-02-21T10:22:41.6813298Z // end inline asm 2026-02-21T10:22:41.6813362Z // begin inline asm 2026-02-21T10:22:41.6813424Z mov.u32 %r1922, 0x0; 2026-02-21T10:22:41.6813483Z mov.u32 %r1923, 0x0; 2026-02-21T10:22:41.6813541Z mov.u32 %r1924, 0x0; 2026-02-21T10:22:41.6813606Z mov.u32 %r1925, 0x0; 2026-02-21T10:22:41.6813869Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1922, %r1923, %r1924, %r1925 }, [ %rd163 + 0 ], %rd162; 
2026-02-21T10:22:41.6813930Z // end inline asm 2026-02-21T10:22:41.6814136Z .loc 1 58 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:58:32 2026-02-21T10:22:41.6814195Z bar.sync 0; 2026-02-21T10:22:41.6814279Z st.shared.v2.b32 [%r23], {%r1910, %r1911}; 2026-02-21T10:22:41.6814373Z st.shared.v2.b32 [%r23+2048], {%r1914, %r1915}; 2026-02-21T10:22:41.6814458Z st.shared.v2.b32 [%r23+4096], {%r1918, %r1919}; 2026-02-21T10:22:41.6814544Z st.shared.v2.b32 [%r23+6144], {%r1922, %r1923}; 2026-02-21T10:22:41.6814623Z st.shared.v2.b32 [%r24], {%r1912, %r1913}; 2026-02-21T10:22:41.6814710Z st.shared.v2.b32 [%r24+2048], {%r1916, %r1917}; 2026-02-21T10:22:41.6814792Z st.shared.v2.b32 [%r24+4096], {%r1920, %r1921}; 2026-02-21T10:22:41.6814872Z st.shared.v2.b32 [%r24+6144], {%r1924, %r1925}; 2026-02-21T10:22:41.6814935Z bar.sync 0; 2026-02-21T10:22:41.6815010Z ld.shared.b16 %rs449, [%r46]; 2026-02-21T10:22:41.6815084Z ld.shared.b16 %rs450, [%r46+1024]; 2026-02-21T10:22:41.6815160Z ld.shared.b16 %rs451, [%r46+64]; 2026-02-21T10:22:41.6815229Z ld.shared.b16 %rs452, [%r46+1088]; 2026-02-21T10:22:41.6815296Z ld.shared.b16 %rs453, [%r47]; 2026-02-21T10:22:41.6815361Z ld.shared.b16 %rs454, [%r47+1024]; 2026-02-21T10:22:41.6815433Z ld.shared.b16 %rs455, [%r47+64]; 2026-02-21T10:22:41.6815498Z ld.shared.b16 %rs456, [%r47+1088]; 2026-02-21T10:22:41.6815562Z ld.shared.b16 %rs457, [%r48]; 2026-02-21T10:22:41.6815637Z ld.shared.b16 %rs458, [%r48+1024]; 2026-02-21T10:22:41.6815706Z ld.shared.b16 %rs459, [%r48+64]; 2026-02-21T10:22:41.6815772Z ld.shared.b16 %rs460, [%r48+1088]; 2026-02-21T10:22:41.6815838Z ld.shared.b16 %rs461, [%r49]; 2026-02-21T10:22:41.6815913Z ld.shared.b16 %rs462, [%r49+1024]; 2026-02-21T10:22:41.6815979Z ld.shared.b16 %rs463, [%r49+64]; 2026-02-21T10:22:41.6816044Z ld.shared.b16 %rs464, [%r49+1088]; 2026-02-21T10:22:41.6816118Z ld.shared.b16 %rs465, [%r50]; 2026-02-21T10:22:41.6816184Z ld.shared.b16 %rs466, [%r50+1024]; 2026-02-21T10:22:41.6816251Z 
ld.shared.b16 %rs467, [%r50+64]; 2026-02-21T10:22:41.6816324Z ld.shared.b16 %rs468, [%r50+1088]; 2026-02-21T10:22:41.6816392Z ld.shared.b16 %rs469, [%r51]; 2026-02-21T10:22:41.6816583Z ld.shared.b16 %rs470, [%r51+1024]; 2026-02-21T10:22:41.6816656Z ld.shared.b16 %rs471, [%r51+64]; 2026-02-21T10:22:41.6816729Z ld.shared.b16 %rs472, [%r51+1088]; 2026-02-21T10:22:41.6816795Z ld.shared.b16 %rs473, [%r52]; 2026-02-21T10:22:41.6816947Z ld.shared.b16 %rs474, [%r52+1024]; 2026-02-21T10:22:41.6817018Z ld.shared.b16 %rs475, [%r52+64]; 2026-02-21T10:22:41.6817083Z ld.shared.b16 %rs476, [%r52+1088]; 2026-02-21T10:22:41.6817160Z ld.shared.b16 %rs477, [%r53]; 2026-02-21T10:22:41.6817291Z ld.shared.b16 %rs478, [%r53+1024]; 2026-02-21T10:22:41.6817368Z ld.shared.b16 %rs479, [%r53+64]; 2026-02-21T10:22:41.6817433Z ld.shared.b16 %rs480, [%r53+1088]; 2026-02-21T10:22:41.6817504Z cvt.f32.bf16 %r1967, %rs449; 2026-02-21T10:22:41.6817574Z cvt.f32.bf16 %r1968, %rs450; 2026-02-21T10:22:41.6817638Z cvt.f32.bf16 %r1969, %rs453; 2026-02-21T10:22:41.6817704Z cvt.f32.bf16 %r1970, %rs454; 2026-02-21T10:22:41.6817771Z cvt.f32.bf16 %r2003, %rs457; 2026-02-21T10:22:41.6817845Z cvt.f32.bf16 %r2004, %rs458; 2026-02-21T10:22:41.6817908Z cvt.f32.bf16 %r2005, %rs461; 2026-02-21T10:22:41.6817973Z cvt.f32.bf16 %r2006, %rs462; 2026-02-21T10:22:41.6818044Z cvt.f32.bf16 %r2039, %rs465; 2026-02-21T10:22:41.6818112Z cvt.f32.bf16 %r2040, %rs466; 2026-02-21T10:22:41.6818242Z cvt.f32.bf16 %r2041, %rs469; 2026-02-21T10:22:41.6818314Z cvt.f32.bf16 %r2042, %rs470; 2026-02-21T10:22:41.6818377Z cvt.f32.bf16 %r2075, %rs473; 2026-02-21T10:22:41.6818440Z cvt.f32.bf16 %r2076, %rs474; 2026-02-21T10:22:41.6818507Z cvt.f32.bf16 %r2077, %rs477; 2026-02-21T10:22:41.6818574Z cvt.f32.bf16 %r2078, %rs478; 2026-02-21T10:22:41.6818640Z cvt.f32.bf16 %r2111, %rs451; 2026-02-21T10:22:41.6818763Z cvt.f32.bf16 %r2112, %rs452; 2026-02-21T10:22:41.6818836Z cvt.f32.bf16 %r2113, %rs455; 2026-02-21T10:22:41.6818900Z cvt.f32.bf16 %r2114, 
%rs456; 2026-02-21T10:22:41.6818962Z cvt.f32.bf16 %r2147, %rs459; 2026-02-21T10:22:41.6819024Z cvt.f32.bf16 %r2148, %rs460; 2026-02-21T10:22:41.6819092Z cvt.f32.bf16 %r2149, %rs463; 2026-02-21T10:22:41.6819154Z cvt.f32.bf16 %r2150, %rs464; 2026-02-21T10:22:41.6819217Z cvt.f32.bf16 %r2183, %rs467; 2026-02-21T10:22:41.6819284Z cvt.f32.bf16 %r2184, %rs468; 2026-02-21T10:22:41.6819349Z cvt.f32.bf16 %r2185, %rs471; 2026-02-21T10:22:41.6819413Z cvt.f32.bf16 %r2186, %rs472; 2026-02-21T10:22:41.6819487Z cvt.f32.bf16 %r2219, %rs475; 2026-02-21T10:22:41.6819563Z cvt.f32.bf16 %r2220, %rs476; 2026-02-21T10:22:41.6819628Z cvt.f32.bf16 %r2221, %rs479; 2026-02-21T10:22:41.6819693Z cvt.f32.bf16 %r2222, %rs480; 2026-02-21T10:22:41.6819905Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.6819967Z bar.sync 0; 2026-02-21T10:22:41.6820029Z // begin inline asm 2026-02-21T10:22:41.6820135Z @%p256 mbarrier.init.shared::cta.b64 [%r1926], 1; 2026-02-21T10:22:41.6820194Z // end inline asm 2026-02-21T10:22:41.6820251Z bar.sync 0; 2026-02-21T10:22:41.6820311Z // begin inline asm 2026-02-21T10:22:41.6820451Z @%p256 mbarrier.arrive.expect_tx.shared.b64 _, [%r1926], 1024; 2026-02-21T10:22:41.6820509Z // end inline asm 2026-02-21T10:22:41.6820570Z // begin inline asm 2026-02-21T10:22:41.6820657Z fence.proxy.async.shared::cta; 2026-02-21T10:22:41.6820723Z // end inline asm 2026-02-21T10:22:41.6820781Z bar.sync 0; 2026-02-21T10:22:41.6820852Z elect.sync %r3314|%p125, -1; 2026-02-21T10:22:41.6820927Z and.pred %p77, %p1, %p125; 2026-02-21T10:22:41.6820988Z // begin inline asm 2026-02-21T10:22:41.6821317Z @%p77 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1928], [%rd247, {%r1929, %r1930}], [%r1926]; 2026-02-21T10:22:41.6821383Z // end inline asm 2026-02-21T10:22:41.6821441Z bar.sync 0; 2026-02-21T10:22:41.6821499Z mov.b32 %r3294, 0; 2026-02-21T10:22:41.6821563Z // begin inline asm 2026-02-21T10:22:41.6821616Z 
2026-02-21T10:22:41.6821668Z { 2026-02-21T10:22:41.6821733Z .reg .pred complete; 2026-02-21T10:22:41.6821795Z waitLoop: 2026-02-21T10:22:41.6821938Z mbarrier.try_wait.parity.shared.b64 complete, [%r1926], %r3294; 2026-02-21T10:22:41.6822011Z @!complete bra.uni waitLoop; 2026-02-21T10:22:41.6822071Z } 2026-02-21T10:22:41.6822078Z 2026-02-21T10:22:41.6822147Z // end inline asm 2026-02-21T10:22:41.6822266Z bar.sync 0; 2026-02-21T10:22:41.6822328Z // begin inline asm 2026-02-21T10:22:41.6822433Z @%p256 mbarrier.inval.shared::cta.b64 [%r1926]; 2026-02-21T10:22:41.6822492Z // end inline asm 2026-02-21T10:22:41.6822697Z .loc 1 78 58 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:78:58 2026-02-21T10:22:41.6822827Z ld.shared.b8 %rs481, [%r33]; 2026-02-21T10:22:41.6822900Z ld.shared.b8 %rs482, [%r33+64]; 2026-02-21T10:22:41.6822968Z ld.shared.b8 %rs483, [%r33+256]; 2026-02-21T10:22:41.6823041Z ld.shared.b8 %rs484, [%r33+320]; 2026-02-21T10:22:41.6823107Z ld.shared.b8 %rs485, [%r33+512]; 2026-02-21T10:22:41.6823183Z ld.shared.b8 %rs486, [%r33+576]; 2026-02-21T10:22:41.6823251Z ld.shared.b8 %rs487, [%r33+768]; 2026-02-21T10:22:41.6823323Z ld.shared.b8 %rs488, [%r33+832]; 2026-02-21T10:22:41.6823388Z ld.shared.b8 %rs489, [%r34+128]; 2026-02-21T10:22:41.6823453Z ld.shared.b8 %rs490, [%r34+192]; 2026-02-21T10:22:41.6823522Z ld.shared.b8 %rs491, [%r34+384]; 2026-02-21T10:22:41.6823643Z ld.shared.b8 %rs492, [%r34+448]; 2026-02-21T10:22:41.6823712Z ld.shared.b8 %rs493, [%r34+640]; 2026-02-21T10:22:41.6823777Z ld.shared.b8 %rs494, [%r34+704]; 2026-02-21T10:22:41.6823851Z ld.shared.b8 %rs495, [%r34+896]; 2026-02-21T10:22:41.6823919Z ld.shared.b8 %rs496, [%r34+960]; 2026-02-21T10:22:41.6824199Z .loc 1 63 28 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:63:28 2026-02-21T10:22:41.6824273Z shl.b16 %rs497, %rs481, 4; 2026-02-21T10:22:41.6824337Z shl.b16 %rs498, %rs482, 4; 2026-02-21T10:22:41.6824399Z shl.b16 %rs499, %rs489, 4; 2026-02-21T10:22:41.6824467Z shl.b16 
%rs500, %rs490, 4; 2026-02-21T10:22:41.6824529Z shl.b16 %rs501, %rs483, 4; 2026-02-21T10:22:41.6824592Z shl.b16 %rs502, %rs484, 4; 2026-02-21T10:22:41.6824654Z shl.b16 %rs503, %rs491, 4; 2026-02-21T10:22:41.6824722Z shl.b16 %rs504, %rs492, 4; 2026-02-21T10:22:41.6824784Z shl.b16 %rs505, %rs485, 4; 2026-02-21T10:22:41.6824850Z shl.b16 %rs506, %rs486, 4; 2026-02-21T10:22:41.6824919Z shl.b16 %rs507, %rs493, 4; 2026-02-21T10:22:41.6824982Z shl.b16 %rs508, %rs494, 4; 2026-02-21T10:22:41.6825044Z shl.b16 %rs509, %rs487, 4; 2026-02-21T10:22:41.6825107Z shl.b16 %rs510, %rs488, 4; 2026-02-21T10:22:41.6825180Z shl.b16 %rs511, %rs495, 4; 2026-02-21T10:22:41.6825243Z shl.b16 %rs512, %rs496, 4; 2026-02-21T10:22:41.6825445Z .loc 1 78 58 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:78:58 2026-02-21T10:22:41.6825529Z selp.b16 %rs513, %rs497, %rs481, %p268; 2026-02-21T10:22:41.6825598Z cvt.s16.s8 %rs514, %rs513; 2026-02-21T10:22:41.6825661Z shr.s16 %rs515, %rs514, 4; 2026-02-21T10:22:41.6825737Z selp.b16 %rs516, %rs498, %rs482, %p268; 2026-02-21T10:22:41.6825808Z cvt.s16.s8 %rs517, %rs516; 2026-02-21T10:22:41.6825869Z shr.s16 %rs518, %rs517, 4; 2026-02-21T10:22:41.6825940Z selp.b16 %rs519, %rs499, %rs489, %p268; 2026-02-21T10:22:41.6826011Z cvt.s16.s8 %rs520, %rs519; 2026-02-21T10:22:41.6826077Z shr.s16 %rs521, %rs520, 4; 2026-02-21T10:22:41.6826148Z selp.b16 %rs522, %rs500, %rs490, %p268; 2026-02-21T10:22:41.6826217Z cvt.s16.s8 %rs523, %rs522; 2026-02-21T10:22:41.6826281Z shr.s16 %rs524, %rs523, 4; 2026-02-21T10:22:41.6826354Z selp.b16 %rs525, %rs501, %rs483, %p268; 2026-02-21T10:22:41.6826419Z cvt.s16.s8 %rs526, %rs525; 2026-02-21T10:22:41.6826614Z shr.s16 %rs527, %rs526, 4; 2026-02-21T10:22:41.6826695Z selp.b16 %rs528, %rs502, %rs484, %p268; 2026-02-21T10:22:41.6826758Z cvt.s16.s8 %rs529, %rs528; 2026-02-21T10:22:41.6826831Z shr.s16 %rs530, %rs529, 4; 2026-02-21T10:22:41.6826903Z selp.b16 %rs531, %rs503, %rs491, %p268; 2026-02-21T10:22:41.6826968Z cvt.s16.s8 
%rs532, %rs531; 2026-02-21T10:22:41.6827032Z shr.s16 %rs533, %rs532, 4; 2026-02-21T10:22:41.6827108Z selp.b16 %rs534, %rs504, %rs492, %p268; 2026-02-21T10:22:41.6827172Z cvt.s16.s8 %rs535, %rs534; 2026-02-21T10:22:41.6827235Z shr.s16 %rs536, %rs535, 4; 2026-02-21T10:22:41.6827309Z selp.b16 %rs537, %rs505, %rs485, %p268; 2026-02-21T10:22:41.6827456Z cvt.s16.s8 %rs538, %rs537; 2026-02-21T10:22:41.6827518Z shr.s16 %rs539, %rs538, 4; 2026-02-21T10:22:41.6827591Z selp.b16 %rs540, %rs506, %rs486, %p268; 2026-02-21T10:22:41.6827664Z cvt.s16.s8 %rs541, %rs540; 2026-02-21T10:22:41.6827792Z shr.s16 %rs542, %rs541, 4; 2026-02-21T10:22:41.6827865Z selp.b16 %rs543, %rs507, %rs493, %p268; 2026-02-21T10:22:41.6827938Z cvt.s16.s8 %rs544, %rs543; 2026-02-21T10:22:41.6828000Z shr.s16 %rs545, %rs544, 4; 2026-02-21T10:22:41.6828086Z selp.b16 %rs546, %rs508, %rs494, %p268; 2026-02-21T10:22:41.6828152Z cvt.s16.s8 %rs547, %rs546; 2026-02-21T10:22:41.6828221Z shr.s16 %rs548, %rs547, 4; 2026-02-21T10:22:41.6828293Z selp.b16 %rs549, %rs509, %rs487, %p268; 2026-02-21T10:22:41.6828356Z cvt.s16.s8 %rs550, %rs549; 2026-02-21T10:22:41.6828423Z shr.s16 %rs551, %rs550, 4; 2026-02-21T10:22:41.6828557Z selp.b16 %rs552, %rs510, %rs488, %p268; 2026-02-21T10:22:41.6828623Z cvt.s16.s8 %rs553, %rs552; 2026-02-21T10:22:41.6828764Z shr.s16 %rs554, %rs553, 4; 2026-02-21T10:22:41.6828839Z selp.b16 %rs555, %rs511, %rs495, %p268; 2026-02-21T10:22:41.6828901Z cvt.s16.s8 %rs556, %rs555; 2026-02-21T10:22:41.6828962Z shr.s16 %rs557, %rs556, 4; 2026-02-21T10:22:41.6829042Z selp.b16 %rs558, %rs512, %rs496, %p268; 2026-02-21T10:22:41.6829105Z cvt.s16.s8 %rs559, %rs558; 2026-02-21T10:22:41.6829167Z shr.s16 %rs560, %rs559, 4; 2026-02-21T10:22:41.6829438Z .loc 1 83 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:83:32 2026-02-21T10:22:41.6829510Z cvt.rn.f32.s16 %r3315, %rs515; 2026-02-21T10:22:41.6829575Z cvt.rn.f32.s16 %r3316, %rs518; 2026-02-21T10:22:41.6829639Z cvt.rn.f32.s16 %r3317, %rs521; 
2026-02-21T10:22:41.6829709Z cvt.rn.f32.s16 %r3318, %rs524; 2026-02-21T10:22:41.6829772Z cvt.rn.f32.s16 %r3319, %rs527; 2026-02-21T10:22:41.6829836Z cvt.rn.f32.s16 %r3320, %rs530; 2026-02-21T10:22:41.6829905Z cvt.rn.f32.s16 %r3321, %rs533; 2026-02-21T10:22:41.6829971Z cvt.rn.f32.s16 %r3322, %rs536; 2026-02-21T10:22:41.6830038Z cvt.rn.f32.s16 %r3323, %rs539; 2026-02-21T10:22:41.6830110Z cvt.rn.f32.s16 %r3324, %rs542; 2026-02-21T10:22:41.6830175Z cvt.rn.f32.s16 %r3325, %rs545; 2026-02-21T10:22:41.6830243Z cvt.rn.f32.s16 %r3326, %rs548; 2026-02-21T10:22:41.6830309Z cvt.rn.f32.s16 %r3327, %rs551; 2026-02-21T10:22:41.6830378Z cvt.rn.f32.s16 %r3328, %rs554; 2026-02-21T10:22:41.6830444Z cvt.rn.f32.s16 %r3329, %rs557; 2026-02-21T10:22:41.6830507Z cvt.rn.f32.s16 %r3330, %rs560; 2026-02-21T10:22:41.6830577Z bar.sync 0; 2026-02-21T10:22:41.6830645Z st.shared.b32 [%r35], %r3315; 2026-02-21T10:22:41.6830715Z st.shared.b32 [%r35+4096], %r3323; 2026-02-21T10:22:41.6830781Z st.shared.b32 [%r36], %r3316; 2026-02-21T10:22:41.6830852Z st.shared.b32 [%r36+4096], %r3324; 2026-02-21T10:22:41.6830917Z st.shared.b32 [%r37], %r3317; 2026-02-21T10:22:41.6830981Z st.shared.b32 [%r37+4096], %r3325; 2026-02-21T10:22:41.6831050Z st.shared.b32 [%r38], %r3318; 2026-02-21T10:22:41.6831120Z st.shared.b32 [%r38+4096], %r3326; 2026-02-21T10:22:41.6831185Z st.shared.b32 [%r39], %r3319; 2026-02-21T10:22:41.6831251Z st.shared.b32 [%r39+4096], %r3327; 2026-02-21T10:22:41.6831325Z st.shared.b32 [%r40], %r3320; 2026-02-21T10:22:41.6831393Z st.shared.b32 [%r40+4096], %r3328; 2026-02-21T10:22:41.6831467Z st.shared.b32 [%r41], %r3321; 2026-02-21T10:22:41.6831538Z st.shared.b32 [%r41+4096], %r3329; 2026-02-21T10:22:41.6831607Z st.shared.b32 [%r42], %r3322; 2026-02-21T10:22:41.6831671Z st.shared.b32 [%r42+4096], %r3330; 2026-02-21T10:22:41.6831729Z $L__tmp9: 2026-02-21T10:22:41.6832024Z .loc 2 291 36 // standard.py:291:36 @[ cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:90:40 ] 
2026-02-21T10:22:41.6832090Z // begin inline asm 2026-02-21T10:22:41.6832171Z fence.proxy.async.shared::cta; 2026-02-21T10:22:41.6832235Z // end inline asm 2026-02-21T10:22:41.6832293Z bar.sync 0; 2026-02-21T10:22:41.6832377Z shfl.sync.idx.b32 %r3331, %r4, 0, 31, -1; 2026-02-21T10:22:41.6832520Z wgmma.fence.sync.aligned; 2026-02-21T10:22:41.6832586Z mov.pred %p79, -1; 2026-02-21T10:22:41.6832648Z // begin inline asm 2026-02-21T10:22:41.6833161Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r1967,%r1968,%r1969,%r1970}, %rd166, %p79, 1, 1; 2026-02-21T10:22:41.6833278Z // end inline asm 2026-02-21T10:22:41.6833344Z // begin inline asm 2026-02-21T10:22:41.6833848Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r2003,%r2004,%r2005,%r2006}, %rd167, %p79, 1, 1; 2026-02-21T10:22:41.6833912Z // end inline asm 2026-02-21T10:22:41.6833972Z // begin inline asm 2026-02-21T10:22:41.6834512Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r2039,%r2040,%r2041,%r2042}, %rd168, %p79, 1, 1; 2026-02-21T10:22:41.6834582Z // end inline asm 2026-02-21T10:22:41.6834643Z // begin inline asm 2026-02-21T10:22:41.6835139Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r2075,%r2076,%r2077,%r2078}, %rd169, %p79, 1, 1; 2026-02-21T10:22:41.6835257Z // end inline asm 2026-02-21T10:22:41.6835320Z // begin inline asm 2026-02-21T10:22:41.6835818Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, 
{%r2111,%r2112,%r2113,%r2114}, %rd170, %p79, 1, 1; 2026-02-21T10:22:41.6835883Z // end inline asm 2026-02-21T10:22:41.6835944Z // begin inline asm 2026-02-21T10:22:41.6836440Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r2147,%r2148,%r2149,%r2150}, %rd171, %p79, 1, 1; 2026-02-21T10:22:41.6836628Z // end inline asm 2026-02-21T10:22:41.6836695Z // begin inline asm 2026-02-21T10:22:41.6837192Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r2183,%r2184,%r2185,%r2186}, %rd172, %p79, 1, 1; 2026-02-21T10:22:41.6837268Z // end inline asm 2026-02-21T10:22:41.6837334Z // begin inline asm 2026-02-21T10:22:41.6837828Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r2219,%r2220,%r2221,%r2222}, %rd173, %p79, 1, 1; 2026-02-21T10:22:41.6837887Z // end inline asm 2026-02-21T10:22:41.6837974Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:41.6838041Z mov.b32 %r2239, %r1928; 2026-02-21T10:22:41.6838104Z mov.b32 %r2240, %r3294; 2026-02-21T10:22:41.6838176Z mov.b32 %r2241, %r3294; 2026-02-21T10:22:41.6838238Z // begin inline asm 2026-02-21T10:22:41.6838546Z // wait for regs: %r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476,%r2239,%r2240,%r2241 2026-02-21T10:22:41.6838634Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:41.6838696Z // end inline asm 2026-02-21T10:22:41.6838764Z $L__tmp10: 2026-02-21T10:22:41.6838978Z .loc 1 54 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:32 2026-02-21T10:22:41.6839051Z add.s64 %rd175, %rd154, 128; 2026-02-21T10:22:41.6839117Z add.s64 %rd178, %rd157, 128; 2026-02-21T10:22:41.6839180Z add.s64 %rd181, 
%rd160, 128; 2026-02-21T10:22:41.6839386Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.6839450Z add.s64 %rd184, %rd163, 128; 2026-02-21T10:22:41.6839509Z // begin inline asm 2026-02-21T10:22:41.6839575Z mov.u64 %rd174, 0x0; 2026-02-21T10:22:41.6839787Z createpolicy.fractional.L2::evict_last.b64 %rd174, 1.0; 2026-02-21T10:22:41.6839846Z // end inline asm 2026-02-21T10:22:41.6839906Z // begin inline asm 2026-02-21T10:22:41.6839971Z mov.u32 %r2261, 0x0; 2026-02-21T10:22:41.6840092Z mov.u32 %r2262, 0x0; 2026-02-21T10:22:41.6840151Z mov.u32 %r2263, 0x0; 2026-02-21T10:22:41.6840210Z mov.u32 %r2264, 0x0; 2026-02-21T10:22:41.6840439Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r2261, %r2262, %r2263, %r2264 }, [ %rd175 + 0 ], %rd174; 2026-02-21T10:22:41.6840499Z // end inline asm 2026-02-21T10:22:41.6840559Z // begin inline asm 2026-02-21T10:22:41.6840622Z mov.u64 %rd177, 0x0; 2026-02-21T10:22:41.6840742Z createpolicy.fractional.L2::evict_last.b64 %rd177, 1.0; 2026-02-21T10:22:41.6840800Z // end inline asm 2026-02-21T10:22:41.6840864Z // begin inline asm 2026-02-21T10:22:41.6840921Z mov.u32 %r2265, 0x0; 2026-02-21T10:22:41.6840979Z mov.u32 %r2266, 0x0; 2026-02-21T10:22:41.6841038Z mov.u32 %r2267, 0x0; 2026-02-21T10:22:41.6841165Z mov.u32 %r2268, 0x0; 2026-02-21T10:22:41.6841382Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r2265, %r2266, %r2267, %r2268 }, [ %rd178 + 0 ], %rd177; 2026-02-21T10:22:41.6841450Z // end inline asm 2026-02-21T10:22:41.6841519Z // begin inline asm 2026-02-21T10:22:41.6841576Z mov.u64 %rd180, 0x0; 2026-02-21T10:22:41.6841693Z createpolicy.fractional.L2::evict_last.b64 %rd180, 1.0; 2026-02-21T10:22:41.6841824Z // end inline asm 2026-02-21T10:22:41.6841886Z // begin inline asm 2026-02-21T10:22:41.6841944Z mov.u32 %r2269, 0x0; 2026-02-21T10:22:41.6842002Z mov.u32 %r2270, 0x0; 2026-02-21T10:22:41.6842062Z mov.u32 %r2271, 0x0; 2026-02-21T10:22:41.6842119Z mov.u32 %r2272, 0x0; 
2026-02-21T10:22:41.6842328Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r2269, %r2270, %r2271, %r2272 }, [ %rd181 + 0 ], %rd180; 2026-02-21T10:22:41.6842389Z // end inline asm 2026-02-21T10:22:41.6842448Z // begin inline asm 2026-02-21T10:22:41.6842508Z mov.u64 %rd183, 0x0; 2026-02-21T10:22:41.6842631Z createpolicy.fractional.L2::evict_last.b64 %rd183, 1.0; 2026-02-21T10:22:41.6842688Z // end inline asm 2026-02-21T10:22:41.6842746Z // begin inline asm 2026-02-21T10:22:41.6842804Z mov.u32 %r2273, 0x0; 2026-02-21T10:22:41.6842865Z mov.u32 %r2274, 0x0; 2026-02-21T10:22:41.6842925Z mov.u32 %r2275, 0x0; 2026-02-21T10:22:41.6842983Z mov.u32 %r2276, 0x0; 2026-02-21T10:22:41.6843198Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r2273, %r2274, %r2275, %r2276 }, [ %rd184 + 0 ], %rd183; 2026-02-21T10:22:41.6843255Z // end inline asm 2026-02-21T10:22:41.6843456Z .loc 1 58 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:58:32 2026-02-21T10:22:41.6843517Z bar.sync 0; 2026-02-21T10:22:41.6843599Z st.shared.v2.b32 [%r23], {%r2261, %r2262}; 2026-02-21T10:22:41.6843688Z st.shared.v2.b32 [%r23+2048], {%r2265, %r2266}; 2026-02-21T10:22:41.6843773Z st.shared.v2.b32 [%r23+4096], {%r2269, %r2270}; 2026-02-21T10:22:41.6843857Z st.shared.v2.b32 [%r23+6144], {%r2273, %r2274}; 2026-02-21T10:22:41.6843938Z st.shared.v2.b32 [%r24], {%r2263, %r2264}; 2026-02-21T10:22:41.6844021Z st.shared.v2.b32 [%r24+2048], {%r2267, %r2268}; 2026-02-21T10:22:41.6844112Z st.shared.v2.b32 [%r24+4096], {%r2271, %r2272}; 2026-02-21T10:22:41.6844197Z st.shared.v2.b32 [%r24+6144], {%r2275, %r2276}; 2026-02-21T10:22:41.6844254Z bar.sync 0; 2026-02-21T10:22:41.6844325Z ld.shared.b16 %rs561, [%r46]; 2026-02-21T10:22:41.6844398Z ld.shared.b16 %rs562, [%r46+1024]; 2026-02-21T10:22:41.6844467Z ld.shared.b16 %rs563, [%r46+64]; 2026-02-21T10:22:41.6844532Z ld.shared.b16 %rs564, [%r46+1088]; 2026-02-21T10:22:41.6844601Z ld.shared.b16 %rs565, [%r47]; 2026-02-21T10:22:41.6844667Z ld.shared.b16 %rs566, 
[%r47+1024]; 2026-02-21T10:22:41.6844734Z ld.shared.b16 %rs567, [%r47+64]; 2026-02-21T10:22:41.6844808Z ld.shared.b16 %rs568, [%r47+1088]; 2026-02-21T10:22:41.6844880Z ld.shared.b16 %rs569, [%r48]; 2026-02-21T10:22:41.6844947Z ld.shared.b16 %rs570, [%r48+1024]; 2026-02-21T10:22:41.6845078Z ld.shared.b16 %rs571, [%r48+64]; 2026-02-21T10:22:41.6845148Z ld.shared.b16 %rs572, [%r48+1088]; 2026-02-21T10:22:41.6845216Z ld.shared.b16 %rs573, [%r49]; 2026-02-21T10:22:41.6845280Z ld.shared.b16 %rs574, [%r49+1024]; 2026-02-21T10:22:41.6845397Z ld.shared.b16 %rs575, [%r49+64]; 2026-02-21T10:22:41.6845467Z ld.shared.b16 %rs576, [%r49+1088]; 2026-02-21T10:22:41.6845531Z ld.shared.b16 %rs577, [%r50]; 2026-02-21T10:22:41.6845606Z ld.shared.b16 %rs578, [%r50+1024]; 2026-02-21T10:22:41.6845676Z ld.shared.b16 %rs579, [%r50+64]; 2026-02-21T10:22:41.6845741Z ld.shared.b16 %rs580, [%r50+1088]; 2026-02-21T10:22:41.6845804Z ld.shared.b16 %rs581, [%r51]; 2026-02-21T10:22:41.6845873Z ld.shared.b16 %rs582, [%r51+1024]; 2026-02-21T10:22:41.6845938Z ld.shared.b16 %rs583, [%r51+64]; 2026-02-21T10:22:41.6846003Z ld.shared.b16 %rs584, [%r51+1088]; 2026-02-21T10:22:41.6846074Z ld.shared.b16 %rs585, [%r52]; 2026-02-21T10:22:41.6846138Z ld.shared.b16 %rs586, [%r52+1024]; 2026-02-21T10:22:41.6846277Z ld.shared.b16 %rs587, [%r52+64]; 2026-02-21T10:22:41.6846344Z ld.shared.b16 %rs588, [%r52+1088]; 2026-02-21T10:22:41.6846426Z ld.shared.b16 %rs589, [%r53]; 2026-02-21T10:22:41.6846637Z ld.shared.b16 %rs590, [%r53+1024]; 2026-02-21T10:22:41.6846709Z ld.shared.b16 %rs591, [%r53+64]; 2026-02-21T10:22:41.6846775Z ld.shared.b16 %rs592, [%r53+1088]; 2026-02-21T10:22:41.6846842Z cvt.f32.bf16 %r2318, %rs561; 2026-02-21T10:22:41.6846994Z cvt.f32.bf16 %r2319, %rs562; 2026-02-21T10:22:41.6847062Z cvt.f32.bf16 %r2320, %rs565; 2026-02-21T10:22:41.6847129Z cvt.f32.bf16 %r2321, %rs566; 2026-02-21T10:22:41.6847193Z cvt.f32.bf16 %r2354, %rs569; 2026-02-21T10:22:41.6847253Z cvt.f32.bf16 %r2355, %rs570; 
2026-02-21T10:22:41.6847319Z cvt.f32.bf16 %r2356, %rs573; 2026-02-21T10:22:41.6847392Z cvt.f32.bf16 %r2357, %rs574; 2026-02-21T10:22:41.6847455Z cvt.f32.bf16 %r2390, %rs577; 2026-02-21T10:22:41.6847519Z cvt.f32.bf16 %r2391, %rs578; 2026-02-21T10:22:41.6847586Z cvt.f32.bf16 %r2392, %rs581; 2026-02-21T10:22:41.6847653Z cvt.f32.bf16 %r2393, %rs582; 2026-02-21T10:22:41.6847715Z cvt.f32.bf16 %r2426, %rs585; 2026-02-21T10:22:41.6847781Z cvt.f32.bf16 %r2427, %rs586; 2026-02-21T10:22:41.6847842Z cvt.f32.bf16 %r2428, %rs589; 2026-02-21T10:22:41.6847907Z cvt.f32.bf16 %r2429, %rs590; 2026-02-21T10:22:41.6847973Z cvt.f32.bf16 %r2462, %rs563; 2026-02-21T10:22:41.6848036Z cvt.f32.bf16 %r2463, %rs564; 2026-02-21T10:22:41.6848098Z cvt.f32.bf16 %r2464, %rs567; 2026-02-21T10:22:41.6848159Z cvt.f32.bf16 %r2465, %rs568; 2026-02-21T10:22:41.6848227Z cvt.f32.bf16 %r2498, %rs571; 2026-02-21T10:22:41.6848293Z cvt.f32.bf16 %r2499, %rs572; 2026-02-21T10:22:41.6848354Z cvt.f32.bf16 %r2500, %rs575; 2026-02-21T10:22:41.6848418Z cvt.f32.bf16 %r2501, %rs576; 2026-02-21T10:22:41.6848478Z cvt.f32.bf16 %r2534, %rs579; 2026-02-21T10:22:41.6848540Z cvt.f32.bf16 %r2535, %rs580; 2026-02-21T10:22:41.6848601Z cvt.f32.bf16 %r2536, %rs583; 2026-02-21T10:22:41.6848667Z cvt.f32.bf16 %r2537, %rs584; 2026-02-21T10:22:41.6848733Z cvt.f32.bf16 %r2570, %rs587; 2026-02-21T10:22:41.6848794Z cvt.f32.bf16 %r2571, %rs588; 2026-02-21T10:22:41.6848857Z cvt.f32.bf16 %r2572, %rs591; 2026-02-21T10:22:41.6848930Z cvt.f32.bf16 %r2573, %rs592; 2026-02-21T10:22:41.6849140Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.6849198Z bar.sync 0; 2026-02-21T10:22:41.6849266Z // begin inline asm 2026-02-21T10:22:41.6849368Z @%p256 mbarrier.init.shared::cta.b64 [%r1926], 1; 2026-02-21T10:22:41.6849425Z // end inline asm 2026-02-21T10:22:41.6849485Z bar.sync 0; 2026-02-21T10:22:41.6849545Z // begin inline asm 2026-02-21T10:22:41.6849680Z @%p256 mbarrier.arrive.expect_tx.shared.b64 _, 
[%r1926], 1024; 2026-02-21T10:22:41.6849743Z // end inline asm 2026-02-21T10:22:41.6849804Z // begin inline asm 2026-02-21T10:22:41.6849883Z fence.proxy.async.shared::cta; 2026-02-21T10:22:41.6849945Z // end inline asm 2026-02-21T10:22:41.6850101Z bar.sync 0; 2026-02-21T10:22:41.6850173Z elect.sync %r3332|%p126, -1; 2026-02-21T10:22:41.6850243Z and.pred %p89, %p1, %p126; 2026-02-21T10:22:41.6850311Z or.b32 %r2281, %r1930, 32; 2026-02-21T10:22:41.6850370Z // begin inline asm 2026-02-21T10:22:41.6850758Z @%p89 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1928], [%rd247, {%r1929, %r2281}], [%r1926]; 2026-02-21T10:22:41.6850817Z // end inline asm 2026-02-21T10:22:41.6850885Z bar.sync 0; 2026-02-21T10:22:41.6850945Z // begin inline asm 2026-02-21T10:22:41.6850998Z 2026-02-21T10:22:41.6851054Z { 2026-02-21T10:22:41.6851121Z .reg .pred complete; 2026-02-21T10:22:41.6851179Z waitLoop: 2026-02-21T10:22:41.6851328Z mbarrier.try_wait.parity.shared.b64 complete, [%r1926], %r3294; 2026-02-21T10:22:41.6851408Z @!complete bra.uni waitLoop; 2026-02-21T10:22:41.6851460Z } 2026-02-21T10:22:41.6851465Z 2026-02-21T10:22:41.6851522Z // end inline asm 2026-02-21T10:22:41.6851579Z bar.sync 0; 2026-02-21T10:22:41.6851640Z // begin inline asm 2026-02-21T10:22:41.6851810Z @%p256 mbarrier.inval.shared::cta.b64 [%r1926]; 2026-02-21T10:22:41.6851878Z // end inline asm 2026-02-21T10:22:41.6852079Z .loc 1 78 58 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:78:58 2026-02-21T10:22:41.6852149Z ld.shared.b8 %rs593, [%r33]; 2026-02-21T10:22:41.6852217Z ld.shared.b8 %rs594, [%r33+64]; 2026-02-21T10:22:41.6852334Z ld.shared.b8 %rs595, [%r33+256]; 2026-02-21T10:22:41.6852400Z ld.shared.b8 %rs596, [%r33+320]; 2026-02-21T10:22:41.6852464Z ld.shared.b8 %rs597, [%r33+512]; 2026-02-21T10:22:41.6852534Z ld.shared.b8 %rs598, [%r33+576]; 2026-02-21T10:22:41.6852600Z ld.shared.b8 %rs599, [%r33+768]; 2026-02-21T10:22:41.6852666Z ld.shared.b8 %rs600, [%r33+832]; 
2026-02-21T10:22:41.6852739Z ld.shared.b8 %rs601, [%r34+128]; 2026-02-21T10:22:41.6852804Z ld.shared.b8 %rs602, [%r34+192]; 2026-02-21T10:22:41.6852872Z ld.shared.b8 %rs603, [%r34+384]; 2026-02-21T10:22:41.6852937Z ld.shared.b8 %rs604, [%r34+448]; 2026-02-21T10:22:41.6853013Z ld.shared.b8 %rs605, [%r34+640]; 2026-02-21T10:22:41.6853078Z ld.shared.b8 %rs606, [%r34+704]; 2026-02-21T10:22:41.6853146Z ld.shared.b8 %rs607, [%r34+896]; 2026-02-21T10:22:41.6853220Z ld.shared.b8 %rs608, [%r34+960]; 2026-02-21T10:22:41.6857488Z .loc 1 63 28 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:63:28 2026-02-21T10:22:41.6857618Z shl.b16 %rs609, %rs593, 4; 2026-02-21T10:22:41.6857699Z shl.b16 %rs610, %rs594, 4; 2026-02-21T10:22:41.6857763Z shl.b16 %rs611, %rs601, 4; 2026-02-21T10:22:41.6857837Z shl.b16 %rs612, %rs602, 4; 2026-02-21T10:22:41.6857899Z shl.b16 %rs613, %rs595, 4; 2026-02-21T10:22:41.6857962Z shl.b16 %rs614, %rs596, 4; 2026-02-21T10:22:41.6858028Z shl.b16 %rs615, %rs603, 4; 2026-02-21T10:22:41.6858097Z shl.b16 %rs616, %rs604, 4; 2026-02-21T10:22:41.6858164Z shl.b16 %rs617, %rs597, 4; 2026-02-21T10:22:41.6858225Z shl.b16 %rs618, %rs598, 4; 2026-02-21T10:22:41.6858297Z shl.b16 %rs619, %rs605, 4; 2026-02-21T10:22:41.6858362Z shl.b16 %rs620, %rs606, 4; 2026-02-21T10:22:41.6858428Z shl.b16 %rs621, %rs599, 4; 2026-02-21T10:22:41.6858488Z shl.b16 %rs622, %rs600, 4; 2026-02-21T10:22:41.6858552Z shl.b16 %rs623, %rs607, 4; 2026-02-21T10:22:41.6858617Z shl.b16 %rs624, %rs608, 4; 2026-02-21T10:22:41.6858854Z .loc 1 78 58 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:78:58 2026-02-21T10:22:41.6858943Z selp.b16 %rs625, %rs609, %rs593, %p268; 2026-02-21T10:22:41.6859011Z cvt.s16.s8 %rs626, %rs625; 2026-02-21T10:22:41.6859075Z shr.s16 %rs627, %rs626, 4; 2026-02-21T10:22:41.6859155Z selp.b16 %rs628, %rs610, %rs594, %p268; 2026-02-21T10:22:41.6859228Z cvt.s16.s8 %rs629, %rs628; 2026-02-21T10:22:41.6859290Z shr.s16 %rs630, %rs629, 4; 2026-02-21T10:22:41.6859365Z 
selp.b16 %rs631, %rs611, %rs601, %p268; 2026-02-21T10:22:41.6859436Z cvt.s16.s8 %rs632, %rs631; 2026-02-21T10:22:41.6859501Z shr.s16 %rs633, %rs632, 4; 2026-02-21T10:22:41.6859703Z selp.b16 %rs634, %rs612, %rs602, %p268; 2026-02-21T10:22:41.6859773Z cvt.s16.s8 %rs635, %rs634; 2026-02-21T10:22:41.6859837Z shr.s16 %rs636, %rs635, 4; 2026-02-21T10:22:41.6859909Z selp.b16 %rs637, %rs613, %rs595, %p268; 2026-02-21T10:22:41.6860057Z cvt.s16.s8 %rs638, %rs637; 2026-02-21T10:22:41.6860133Z shr.s16 %rs639, %rs638, 4; 2026-02-21T10:22:41.6860216Z selp.b16 %rs640, %rs614, %rs596, %p268; 2026-02-21T10:22:41.6860291Z cvt.s16.s8 %rs641, %rs640; 2026-02-21T10:22:41.6860363Z shr.s16 %rs642, %rs641, 4; 2026-02-21T10:22:41.6860442Z selp.b16 %rs643, %rs615, %rs603, %p268; 2026-02-21T10:22:41.6860506Z cvt.s16.s8 %rs644, %rs643; 2026-02-21T10:22:41.6860572Z shr.s16 %rs645, %rs644, 4; 2026-02-21T10:22:41.6860649Z selp.b16 %rs646, %rs616, %rs604, %p268; 2026-02-21T10:22:41.6860713Z cvt.s16.s8 %rs647, %rs646; 2026-02-21T10:22:41.6860777Z shr.s16 %rs648, %rs647, 4; 2026-02-21T10:22:41.6860854Z selp.b16 %rs649, %rs617, %rs597, %p268; 2026-02-21T10:22:41.6860919Z cvt.s16.s8 %rs650, %rs649; 2026-02-21T10:22:41.6861050Z shr.s16 %rs651, %rs650, 4; 2026-02-21T10:22:41.6861125Z selp.b16 %rs652, %rs618, %rs598, %p268; 2026-02-21T10:22:41.6861200Z cvt.s16.s8 %rs653, %rs652; 2026-02-21T10:22:41.6861264Z shr.s16 %rs654, %rs653, 4; 2026-02-21T10:22:41.6861348Z selp.b16 %rs655, %rs619, %rs605, %p268; 2026-02-21T10:22:41.6861425Z cvt.s16.s8 %rs656, %rs655; 2026-02-21T10:22:41.6861490Z shr.s16 %rs657, %rs656, 4; 2026-02-21T10:22:41.6861638Z selp.b16 %rs658, %rs620, %rs606, %p268; 2026-02-21T10:22:41.6861716Z cvt.s16.s8 %rs659, %rs658; 2026-02-21T10:22:41.6861782Z shr.s16 %rs660, %rs659, 4; 2026-02-21T10:22:41.6861868Z selp.b16 %rs661, %rs621, %rs599, %p268; 2026-02-21T10:22:41.6861935Z cvt.s16.s8 %rs662, %rs661; 2026-02-21T10:22:41.6862005Z shr.s16 %rs663, %rs662, 4; 2026-02-21T10:22:41.6862076Z selp.b16 
%rs664, %rs622, %rs600, %p268; 2026-02-21T10:22:41.6862140Z cvt.s16.s8 %rs665, %rs664; 2026-02-21T10:22:41.6862211Z shr.s16 %rs666, %rs665, 4; 2026-02-21T10:22:41.6862287Z selp.b16 %rs667, %rs623, %rs607, %p268; 2026-02-21T10:22:41.6862350Z cvt.s16.s8 %rs668, %rs667; 2026-02-21T10:22:41.6862415Z shr.s16 %rs669, %rs668, 4; 2026-02-21T10:22:41.6862494Z selp.b16 %rs670, %rs624, %rs608, %p268; 2026-02-21T10:22:41.6862562Z cvt.s16.s8 %rs671, %rs670; 2026-02-21T10:22:41.6862626Z shr.s16 %rs672, %rs671, 4; 2026-02-21T10:22:41.6862867Z .loc 1 83 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:83:32 2026-02-21T10:22:41.6862940Z cvt.rn.f32.s16 %r3333, %rs627; 2026-02-21T10:22:41.6863006Z cvt.rn.f32.s16 %r3334, %rs630; 2026-02-21T10:22:41.6863072Z cvt.rn.f32.s16 %r3335, %rs633; 2026-02-21T10:22:41.6863144Z cvt.rn.f32.s16 %r3336, %rs636; 2026-02-21T10:22:41.6863209Z cvt.rn.f32.s16 %r3337, %rs639; 2026-02-21T10:22:41.6863274Z cvt.rn.f32.s16 %r3338, %rs642; 2026-02-21T10:22:41.6863347Z cvt.rn.f32.s16 %r3339, %rs645; 2026-02-21T10:22:41.6863412Z cvt.rn.f32.s16 %r3340, %rs648; 2026-02-21T10:22:41.6863480Z cvt.rn.f32.s16 %r3341, %rs651; 2026-02-21T10:22:41.6863553Z cvt.rn.f32.s16 %r3342, %rs654; 2026-02-21T10:22:41.6863629Z cvt.rn.f32.s16 %r3343, %rs657; 2026-02-21T10:22:41.6863695Z cvt.rn.f32.s16 %r3344, %rs660; 2026-02-21T10:22:41.6863759Z cvt.rn.f32.s16 %r3345, %rs663; 2026-02-21T10:22:41.6863835Z cvt.rn.f32.s16 %r3346, %rs666; 2026-02-21T10:22:41.6863899Z cvt.rn.f32.s16 %r3347, %rs669; 2026-02-21T10:22:41.6863969Z cvt.rn.f32.s16 %r3348, %rs672; 2026-02-21T10:22:41.6864038Z bar.sync 0; 2026-02-21T10:22:41.6864109Z st.shared.b32 [%r35], %r3333; 2026-02-21T10:22:41.6864182Z st.shared.b32 [%r35+4096], %r3341; 2026-02-21T10:22:41.6864251Z st.shared.b32 [%r36], %r3334; 2026-02-21T10:22:41.6864327Z st.shared.b32 [%r36+4096], %r3342; 2026-02-21T10:22:41.6864394Z st.shared.b32 [%r37], %r3335; 2026-02-21T10:22:41.6864460Z st.shared.b32 [%r37+4096], %r3343; 
2026-02-21T10:22:41.6864532Z st.shared.b32 [%r38], %r3336; 2026-02-21T10:22:41.6864599Z st.shared.b32 [%r38+4096], %r3344; 2026-02-21T10:22:41.6864730Z st.shared.b32 [%r39], %r3337; 2026-02-21T10:22:41.6864798Z st.shared.b32 [%r39+4096], %r3345; 2026-02-21T10:22:41.6864867Z st.shared.b32 [%r40], %r3338; 2026-02-21T10:22:41.6864934Z st.shared.b32 [%r40+4096], %r3346; 2026-02-21T10:22:41.6865046Z st.shared.b32 [%r41], %r3339; 2026-02-21T10:22:41.6865118Z st.shared.b32 [%r41+4096], %r3347; 2026-02-21T10:22:41.6865195Z st.shared.b32 [%r42], %r3340; 2026-02-21T10:22:41.6865270Z st.shared.b32 [%r42+4096], %r3348; 2026-02-21T10:22:41.6865333Z $L__tmp11: 2026-02-21T10:22:41.6865626Z .loc 2 291 36 // standard.py:291:36 @[ cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:90:40 ] 2026-02-21T10:22:41.6865694Z // begin inline asm 2026-02-21T10:22:41.6865784Z fence.proxy.async.shared::cta; 2026-02-21T10:22:41.6865855Z // end inline asm 2026-02-21T10:22:41.6865914Z bar.sync 0; 2026-02-21T10:22:41.6865991Z wgmma.fence.sync.aligned; 2026-02-21T10:22:41.6866060Z // begin inline asm 2026-02-21T10:22:41.6866816Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r2318,%r2319,%r2320,%r2321}, %rd166, %p79, 1, 1; 2026-02-21T10:22:41.6866890Z // end inline asm 2026-02-21T10:22:41.6866960Z // begin inline asm 2026-02-21T10:22:41.6867521Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r2354,%r2355,%r2356,%r2357}, %rd167, %p79, 1, 1; 2026-02-21T10:22:41.6867584Z // end inline asm 2026-02-21T10:22:41.6867647Z // begin inline asm 2026-02-21T10:22:41.6868157Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, 
{%r2390,%r2391,%r2392,%r2393}, %rd168, %p79, 1, 1; 2026-02-21T10:22:41.6868219Z // end inline asm 2026-02-21T10:22:41.6868281Z // begin inline asm 2026-02-21T10:22:41.6868864Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r2426,%r2427,%r2428,%r2429}, %rd169, %p79, 1, 1; 2026-02-21T10:22:41.6868933Z // end inline asm 2026-02-21T10:22:41.6868994Z // begin inline asm 2026-02-21T10:22:41.6869499Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r2462,%r2463,%r2464,%r2465}, %rd170, %p79, 1, 1; 2026-02-21T10:22:41.6869562Z // end inline asm 2026-02-21T10:22:41.6869623Z // begin inline asm 2026-02-21T10:22:41.6870123Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r2498,%r2499,%r2500,%r2501}, %rd171, %p79, 1, 1; 2026-02-21T10:22:41.6870184Z // end inline asm 2026-02-21T10:22:41.6870246Z // begin inline asm 2026-02-21T10:22:41.6870748Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r2534,%r2535,%r2536,%r2537}, %rd172, %p79, 1, 1; 2026-02-21T10:22:41.6870814Z // end inline asm 2026-02-21T10:22:41.6870876Z // begin inline asm 2026-02-21T10:22:41.6871373Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r2570,%r2571,%r2572,%r2573}, %rd173, %p79, 1, 1; 2026-02-21T10:22:41.6871437Z // end inline asm 2026-02-21T10:22:41.6871528Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:41.6871593Z mov.b32 %r2590, %r1928; 2026-02-21T10:22:41.6871666Z mov.b32 %r2591, %r3294; 
2026-02-21T10:22:41.6871727Z mov.b32 %r2592, %r3294; 2026-02-21T10:22:41.6871790Z // begin inline asm 2026-02-21T10:22:41.6872108Z // wait for regs: %r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476,%r2590,%r2591,%r2592 2026-02-21T10:22:41.6872302Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:41.6872364Z // end inline asm 2026-02-21T10:22:41.6872430Z $L__tmp12: 2026-02-21T10:22:41.6872739Z .loc 1 54 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:32 2026-02-21T10:22:41.6872813Z add.s64 %rd196, %rd154, 256; 2026-02-21T10:22:41.6872883Z add.s64 %rd199, %rd157, 256; 2026-02-21T10:22:41.6872954Z add.s64 %rd202, %rd160, 256; 2026-02-21T10:22:41.6873158Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.6873223Z add.s64 %rd205, %rd163, 256; 2026-02-21T10:22:41.6873292Z // begin inline asm 2026-02-21T10:22:41.6873356Z mov.u64 %rd195, 0x0; 2026-02-21T10:22:41.6873488Z createpolicy.fractional.L2::evict_last.b64 %rd195, 1.0; 2026-02-21T10:22:41.6873554Z // end inline asm 2026-02-21T10:22:41.6873618Z // begin inline asm 2026-02-21T10:22:41.6873737Z mov.u32 %r2612, 0x0; 2026-02-21T10:22:41.6873799Z mov.u32 %r2613, 0x0; 2026-02-21T10:22:41.6873865Z mov.u32 %r2614, 0x0; 2026-02-21T10:22:41.6873925Z mov.u32 %r2615, 0x0; 2026-02-21T10:22:41.6874155Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r2612, %r2613, %r2614, %r2615 }, [ %rd196 + 0 ], %rd195; 2026-02-21T10:22:41.6874226Z // end inline asm 2026-02-21T10:22:41.6874286Z // begin inline asm 2026-02-21T10:22:41.6874398Z mov.u64 %rd198, 0x0; 2026-02-21T10:22:41.6874526Z createpolicy.fractional.L2::evict_last.b64 %rd198, 1.0; 2026-02-21T10:22:41.6874592Z // end inline asm 2026-02-21T10:22:41.6874657Z // begin inline asm 2026-02-21T10:22:41.6874717Z mov.u32 %r2616, 0x0; 2026-02-21T10:22:41.6874782Z mov.u32 %r2617, 0x0; 2026-02-21T10:22:41.6874847Z mov.u32 %r2618, 0x0; 2026-02-21T10:22:41.6874905Z 
mov.u32 %r2619, 0x0; 2026-02-21T10:22:41.6875132Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r2616, %r2617, %r2618, %r2619 }, [ %rd199 + 0 ], %rd198; 2026-02-21T10:22:41.6875199Z // end inline asm 2026-02-21T10:22:41.6875262Z // begin inline asm 2026-02-21T10:22:41.6875324Z mov.u64 %rd201, 0x0; 2026-02-21T10:22:41.6875453Z createpolicy.fractional.L2::evict_last.b64 %rd201, 1.0; 2026-02-21T10:22:41.6875515Z // end inline asm 2026-02-21T10:22:41.6875577Z // begin inline asm 2026-02-21T10:22:41.6875642Z mov.u32 %r2620, 0x0; 2026-02-21T10:22:41.6875706Z mov.u32 %r2621, 0x0; 2026-02-21T10:22:41.6875767Z mov.u32 %r2622, 0x0; 2026-02-21T10:22:41.6875827Z mov.u32 %r2623, 0x0; 2026-02-21T10:22:41.6876050Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r2620, %r2621, %r2622, %r2623 }, [ %rd202 + 0 ], %rd201; 2026-02-21T10:22:41.6876111Z // end inline asm 2026-02-21T10:22:41.6876184Z // begin inline asm 2026-02-21T10:22:41.6876254Z mov.u64 %rd204, 0x0; 2026-02-21T10:22:41.6876376Z createpolicy.fractional.L2::evict_last.b64 %rd204, 1.0; 2026-02-21T10:22:41.6876439Z // end inline asm 2026-02-21T10:22:41.6876636Z // begin inline asm 2026-02-21T10:22:41.6876704Z mov.u32 %r2624, 0x0; 2026-02-21T10:22:41.6876765Z mov.u32 %r2625, 0x0; 2026-02-21T10:22:41.6876826Z mov.u32 %r2626, 0x0; 2026-02-21T10:22:41.6876894Z mov.u32 %r2627, 0x0; 2026-02-21T10:22:41.6877108Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r2624, %r2625, %r2626, %r2627 }, [ %rd205 + 0 ], %rd204; 2026-02-21T10:22:41.6877170Z // end inline asm 2026-02-21T10:22:41.6877386Z .loc 1 58 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:58:32 2026-02-21T10:22:41.6877445Z bar.sync 0; 2026-02-21T10:22:41.6877531Z st.shared.v2.b32 [%r23], {%r2612, %r2613}; 2026-02-21T10:22:41.6877629Z st.shared.v2.b32 [%r23+2048], {%r2616, %r2617}; 2026-02-21T10:22:41.6877715Z st.shared.v2.b32 [%r23+4096], {%r2620, %r2621}; 2026-02-21T10:22:41.6877802Z st.shared.v2.b32 [%r23+6144], {%r2624, %r2625}; 
2026-02-21T10:22:41.6877880Z st.shared.v2.b32 [%r24], {%r2614, %r2615}; 2026-02-21T10:22:41.6877969Z st.shared.v2.b32 [%r24+2048], {%r2618, %r2619}; 2026-02-21T10:22:41.6878140Z st.shared.v2.b32 [%r24+4096], {%r2622, %r2623}; 2026-02-21T10:22:41.6878224Z st.shared.v2.b32 [%r24+6144], {%r2626, %r2627}; 2026-02-21T10:22:41.6878289Z bar.sync 0; 2026-02-21T10:22:41.6878365Z ld.shared.b16 %rs673, [%r46]; 2026-02-21T10:22:41.6878514Z ld.shared.b16 %rs674, [%r46+1024]; 2026-02-21T10:22:41.6878592Z ld.shared.b16 %rs675, [%r46+64]; 2026-02-21T10:22:41.6878662Z ld.shared.b16 %rs676, [%r46+1088]; 2026-02-21T10:22:41.6878736Z ld.shared.b16 %rs677, [%r47]; 2026-02-21T10:22:41.6878807Z ld.shared.b16 %rs678, [%r47+1024]; 2026-02-21T10:22:41.6878880Z ld.shared.b16 %rs679, [%r47+64]; 2026-02-21T10:22:41.6878947Z ld.shared.b16 %rs680, [%r47+1088]; 2026-02-21T10:22:41.6879017Z ld.shared.b16 %rs681, [%r48]; 2026-02-21T10:22:41.6879093Z ld.shared.b16 %rs682, [%r48+1024]; 2026-02-21T10:22:41.6879162Z ld.shared.b16 %rs683, [%r48+64]; 2026-02-21T10:22:41.6879230Z ld.shared.b16 %rs684, [%r48+1088]; 2026-02-21T10:22:41.6879297Z ld.shared.b16 %rs685, [%r49]; 2026-02-21T10:22:41.6879437Z ld.shared.b16 %rs686, [%r49+1024]; 2026-02-21T10:22:41.6879511Z ld.shared.b16 %rs687, [%r49+64]; 2026-02-21T10:22:41.6879579Z ld.shared.b16 %rs688, [%r49+1088]; 2026-02-21T10:22:41.6879651Z ld.shared.b16 %rs689, [%r50]; 2026-02-21T10:22:41.6879723Z ld.shared.b16 %rs690, [%r50+1024]; 2026-02-21T10:22:41.6879791Z ld.shared.b16 %rs691, [%r50+64]; 2026-02-21T10:22:41.6879860Z ld.shared.b16 %rs692, [%r50+1088]; 2026-02-21T10:22:41.6879997Z ld.shared.b16 %rs693, [%r51]; 2026-02-21T10:22:41.6880067Z ld.shared.b16 %rs694, [%r51+1024]; 2026-02-21T10:22:41.6880137Z ld.shared.b16 %rs695, [%r51+64]; 2026-02-21T10:22:41.6880214Z ld.shared.b16 %rs696, [%r51+1088]; 2026-02-21T10:22:41.6880280Z ld.shared.b16 %rs697, [%r52]; 2026-02-21T10:22:41.6880347Z ld.shared.b16 %rs698, [%r52+1024]; 2026-02-21T10:22:41.6880420Z 
ld.shared.b16 %rs699, [%r52+64]; 2026-02-21T10:22:41.6880487Z ld.shared.b16 %rs700, [%r52+1088]; 2026-02-21T10:22:41.6880555Z ld.shared.b16 %rs701, [%r53]; 2026-02-21T10:22:41.6880628Z ld.shared.b16 %rs702, [%r53+1024]; 2026-02-21T10:22:41.6880704Z ld.shared.b16 %rs703, [%r53+64]; 2026-02-21T10:22:41.6880772Z ld.shared.b16 %rs704, [%r53+1088]; 2026-02-21T10:22:41.6880842Z cvt.f32.bf16 %r2669, %rs673; 2026-02-21T10:22:41.6880922Z cvt.f32.bf16 %r2670, %rs674; 2026-02-21T10:22:41.6880988Z cvt.f32.bf16 %r2671, %rs677; 2026-02-21T10:22:41.6881052Z cvt.f32.bf16 %r2672, %rs678; 2026-02-21T10:22:41.6881119Z cvt.f32.bf16 %r2705, %rs681; 2026-02-21T10:22:41.6881192Z cvt.f32.bf16 %r2706, %rs682; 2026-02-21T10:22:41.6881257Z cvt.f32.bf16 %r2707, %rs685; 2026-02-21T10:22:41.6881323Z cvt.f32.bf16 %r2708, %rs686; 2026-02-21T10:22:41.6881394Z cvt.f32.bf16 %r2741, %rs689; 2026-02-21T10:22:41.6881457Z cvt.f32.bf16 %r2742, %rs690; 2026-02-21T10:22:41.6881523Z cvt.f32.bf16 %r2743, %rs693; 2026-02-21T10:22:41.6881585Z cvt.f32.bf16 %r2744, %rs694; 2026-02-21T10:22:41.6881671Z cvt.f32.bf16 %r2777, %rs697; 2026-02-21T10:22:41.6881738Z cvt.f32.bf16 %r2778, %rs698; 2026-02-21T10:22:41.6881808Z cvt.f32.bf16 %r2779, %rs701; 2026-02-21T10:22:41.6881880Z cvt.f32.bf16 %r2780, %rs702; 2026-02-21T10:22:41.6881945Z cvt.f32.bf16 %r2813, %rs675; 2026-02-21T10:22:41.6882008Z cvt.f32.bf16 %r2814, %rs676; 2026-02-21T10:22:41.6882086Z cvt.f32.bf16 %r2815, %rs679; 2026-02-21T10:22:41.6882150Z cvt.f32.bf16 %r2816, %rs680; 2026-02-21T10:22:41.6882214Z cvt.f32.bf16 %r2849, %rs683; 2026-02-21T10:22:41.6882279Z cvt.f32.bf16 %r2850, %rs684; 2026-02-21T10:22:41.6882349Z cvt.f32.bf16 %r2851, %rs687; 2026-02-21T10:22:41.6882415Z cvt.f32.bf16 %r2852, %rs688; 2026-02-21T10:22:41.6882478Z cvt.f32.bf16 %r2885, %rs691; 2026-02-21T10:22:41.6882548Z cvt.f32.bf16 %r2886, %rs692; 2026-02-21T10:22:41.6882614Z cvt.f32.bf16 %r2887, %rs695; 2026-02-21T10:22:41.6882676Z cvt.f32.bf16 %r2888, %rs696; 2026-02-21T10:22:41.6882738Z 
cvt.f32.bf16 %r2921, %rs699; 2026-02-21T10:22:41.6882808Z cvt.f32.bf16 %r2922, %rs700; 2026-02-21T10:22:41.6882870Z cvt.f32.bf16 %r2923, %rs703; 2026-02-21T10:22:41.6882998Z cvt.f32.bf16 %r2924, %rs704; 2026-02-21T10:22:41.6883221Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.6883280Z bar.sync 0; 2026-02-21T10:22:41.6883390Z // begin inline asm 2026-02-21T10:22:41.6883502Z @%p256 mbarrier.init.shared::cta.b64 [%r1926], 1; 2026-02-21T10:22:41.6883563Z // end inline asm 2026-02-21T10:22:41.6883620Z bar.sync 0; 2026-02-21T10:22:41.6883685Z // begin inline asm 2026-02-21T10:22:41.6883842Z @%p256 mbarrier.arrive.expect_tx.shared.b64 _, [%r1926], 1024; 2026-02-21T10:22:41.6883901Z // end inline asm 2026-02-21T10:22:41.6883965Z // begin inline asm 2026-02-21T10:22:41.6884052Z fence.proxy.async.shared::cta; 2026-02-21T10:22:41.6884110Z // end inline asm 2026-02-21T10:22:41.6884168Z bar.sync 0; 2026-02-21T10:22:41.6884242Z elect.sync %r3349|%p127, -1; 2026-02-21T10:22:41.6884320Z and.pred %p101, %p1, %p127; 2026-02-21T10:22:41.6884386Z or.b32 %r2632, %r1930, 64; 2026-02-21T10:22:41.6884504Z // begin inline asm 2026-02-21T10:22:41.6884847Z @%p101 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1928], [%rd247, {%r1929, %r2632}], [%r1926]; 2026-02-21T10:22:41.6884910Z // end inline asm 2026-02-21T10:22:41.6884971Z bar.sync 0; 2026-02-21T10:22:41.6885033Z // begin inline asm 2026-02-21T10:22:41.6885097Z 2026-02-21T10:22:41.6885151Z { 2026-02-21T10:22:41.6885275Z .reg .pred complete; 2026-02-21T10:22:41.6885344Z waitLoop: 2026-02-21T10:22:41.6885492Z mbarrier.try_wait.parity.shared.b64 complete, [%r1926], %r3294; 2026-02-21T10:22:41.6885567Z @!complete bra.uni waitLoop; 2026-02-21T10:22:41.6885623Z } 2026-02-21T10:22:41.6885635Z 2026-02-21T10:22:41.6885694Z // end inline asm 2026-02-21T10:22:41.6885752Z bar.sync 0; 2026-02-21T10:22:41.6885819Z // begin inline asm 2026-02-21T10:22:41.6885924Z 
@%p256 mbarrier.inval.shared::cta.b64 [%r1926]; 2026-02-21T10:22:41.6885985Z // end inline asm 2026-02-21T10:22:41.6886202Z .loc 1 78 58 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:78:58 2026-02-21T10:22:41.6886279Z ld.shared.b8 %rs705, [%r33]; 2026-02-21T10:22:41.6886355Z ld.shared.b8 %rs706, [%r33+64]; 2026-02-21T10:22:41.6886440Z ld.shared.b8 %rs707, [%r33+256]; 2026-02-21T10:22:41.6886635Z ld.shared.b8 %rs708, [%r33+320]; 2026-02-21T10:22:41.6886706Z ld.shared.b8 %rs709, [%r33+512]; 2026-02-21T10:22:41.6886779Z ld.shared.b8 %rs710, [%r33+576]; 2026-02-21T10:22:41.6886846Z ld.shared.b8 %rs711, [%r33+768]; 2026-02-21T10:22:41.6886912Z ld.shared.b8 %rs712, [%r33+832]; 2026-02-21T10:22:41.6886980Z ld.shared.b8 %rs713, [%r34+128]; 2026-02-21T10:22:41.6887051Z ld.shared.b8 %rs714, [%r34+192]; 2026-02-21T10:22:41.6887116Z ld.shared.b8 %rs715, [%r34+384]; 2026-02-21T10:22:41.6887181Z ld.shared.b8 %rs716, [%r34+448]; 2026-02-21T10:22:41.6887257Z ld.shared.b8 %rs717, [%r34+640]; 2026-02-21T10:22:41.6887326Z ld.shared.b8 %rs718, [%r34+704]; 2026-02-21T10:22:41.6887394Z ld.shared.b8 %rs719, [%r34+896]; 2026-02-21T10:22:41.6887462Z ld.shared.b8 %rs720, [%r34+960]; 2026-02-21T10:22:41.6887683Z .loc 1 63 28 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:63:28 2026-02-21T10:22:41.6887755Z shl.b16 %rs721, %rs705, 4; 2026-02-21T10:22:41.6887821Z shl.b16 %rs722, %rs706, 4; 2026-02-21T10:22:41.6887890Z shl.b16 %rs723, %rs713, 4; 2026-02-21T10:22:41.6887954Z shl.b16 %rs724, %rs714, 4; 2026-02-21T10:22:41.6888020Z shl.b16 %rs725, %rs707, 4; 2026-02-21T10:22:41.6888092Z shl.b16 %rs726, %rs708, 4; 2026-02-21T10:22:41.6888155Z shl.b16 %rs727, %rs715, 4; 2026-02-21T10:22:41.6888217Z shl.b16 %rs728, %rs716, 4; 2026-02-21T10:22:41.6888279Z shl.b16 %rs729, %rs709, 4; 2026-02-21T10:22:41.6888346Z shl.b16 %rs730, %rs710, 4; 2026-02-21T10:22:41.6888409Z shl.b16 %rs731, %rs717, 4; 2026-02-21T10:22:41.6888472Z shl.b16 %rs732, %rs718, 4; 2026-02-21T10:22:41.6888540Z 
shl.b16 %rs733, %rs711, 4; 2026-02-21T10:22:41.6888603Z shl.b16 %rs734, %rs712, 4; 2026-02-21T10:22:41.6888754Z shl.b16 %rs735, %rs719, 4; 2026-02-21T10:22:41.6888829Z shl.b16 %rs736, %rs720, 4; 2026-02-21T10:22:41.6889046Z .loc 1 78 58 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:78:58 2026-02-21T10:22:41.6889190Z selp.b16 %rs737, %rs721, %rs705, %p268; 2026-02-21T10:22:41.6889262Z cvt.s16.s8 %rs738, %rs737; 2026-02-21T10:22:41.6889332Z shr.s16 %rs739, %rs738, 4; 2026-02-21T10:22:41.6889412Z selp.b16 %rs740, %rs722, %rs706, %p268; 2026-02-21T10:22:41.6889482Z cvt.s16.s8 %rs741, %rs740; 2026-02-21T10:22:41.6889551Z shr.s16 %rs742, %rs741, 4; 2026-02-21T10:22:41.6889629Z selp.b16 %rs743, %rs723, %rs713, %p268; 2026-02-21T10:22:41.6889695Z cvt.s16.s8 %rs744, %rs743; 2026-02-21T10:22:41.6889759Z shr.s16 %rs745, %rs744, 4; 2026-02-21T10:22:41.6889841Z selp.b16 %rs746, %rs724, %rs714, %p268; 2026-02-21T10:22:41.6889907Z cvt.s16.s8 %rs747, %rs746; 2026-02-21T10:22:41.6889972Z shr.s16 %rs748, %rs747, 4; 2026-02-21T10:22:41.6890112Z selp.b16 %rs749, %rs725, %rs707, %p268; 2026-02-21T10:22:41.6890190Z cvt.s16.s8 %rs750, %rs749; 2026-02-21T10:22:41.6890256Z shr.s16 %rs751, %rs750, 4; 2026-02-21T10:22:41.6890328Z selp.b16 %rs752, %rs726, %rs708, %p268; 2026-02-21T10:22:41.6890402Z cvt.s16.s8 %rs753, %rs752; 2026-02-21T10:22:41.6890465Z shr.s16 %rs754, %rs753, 4; 2026-02-21T10:22:41.6890536Z selp.b16 %rs755, %rs727, %rs715, %p268; 2026-02-21T10:22:41.6890670Z cvt.s16.s8 %rs756, %rs755; 2026-02-21T10:22:41.6890736Z shr.s16 %rs757, %rs756, 4; 2026-02-21T10:22:41.6890809Z selp.b16 %rs758, %rs728, %rs716, %p268; 2026-02-21T10:22:41.6890874Z cvt.s16.s8 %rs759, %rs758; 2026-02-21T10:22:41.6890943Z shr.s16 %rs760, %rs759, 4; 2026-02-21T10:22:41.6891016Z selp.b16 %rs761, %rs729, %rs709, %p268; 2026-02-21T10:22:41.6891078Z cvt.s16.s8 %rs762, %rs761; 2026-02-21T10:22:41.6891146Z shr.s16 %rs763, %rs762, 4; 2026-02-21T10:22:41.6891216Z selp.b16 %rs764, %rs730, %rs710, %p268; 
2026-02-21T10:22:41.6891289Z cvt.s16.s8 %rs765, %rs764; 2026-02-21T10:22:41.6891359Z shr.s16 %rs766, %rs765, 4; 2026-02-21T10:22:41.6891430Z selp.b16 %rs767, %rs731, %rs717, %p268; 2026-02-21T10:22:41.6891503Z cvt.s16.s8 %rs768, %rs767; 2026-02-21T10:22:41.6891567Z shr.s16 %rs769, %rs768, 4; 2026-02-21T10:22:41.6891649Z selp.b16 %rs770, %rs732, %rs718, %p268; 2026-02-21T10:22:41.6891716Z cvt.s16.s8 %rs771, %rs770; 2026-02-21T10:22:41.6891783Z shr.s16 %rs772, %rs771, 4; 2026-02-21T10:22:41.6891862Z selp.b16 %rs773, %rs733, %rs711, %p268; 2026-02-21T10:22:41.6891929Z cvt.s16.s8 %rs774, %rs773; 2026-02-21T10:22:41.6891991Z shr.s16 %rs775, %rs774, 4; 2026-02-21T10:22:41.6892063Z selp.b16 %rs776, %rs734, %rs712, %p268; 2026-02-21T10:22:41.6892132Z cvt.s16.s8 %rs777, %rs776; 2026-02-21T10:22:41.6892195Z shr.s16 %rs778, %rs777, 4; 2026-02-21T10:22:41.6892267Z selp.b16 %rs779, %rs735, %rs719, %p268; 2026-02-21T10:22:41.6892337Z cvt.s16.s8 %rs780, %rs779; 2026-02-21T10:22:41.6892399Z shr.s16 %rs781, %rs780, 4; 2026-02-21T10:22:41.6892477Z selp.b16 %rs782, %rs736, %rs720, %p268; 2026-02-21T10:22:41.6892540Z cvt.s16.s8 %rs783, %rs782; 2026-02-21T10:22:41.6892611Z shr.s16 %rs784, %rs783, 4; 2026-02-21T10:22:41.6892835Z .loc 1 83 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:83:32 2026-02-21T10:22:41.6892912Z cvt.rn.f32.s16 %r3350, %rs739; 2026-02-21T10:22:41.6892985Z cvt.rn.f32.s16 %r3351, %rs742; 2026-02-21T10:22:41.6893054Z cvt.rn.f32.s16 %r3352, %rs745; 2026-02-21T10:22:41.6893120Z cvt.rn.f32.s16 %r3353, %rs748; 2026-02-21T10:22:41.6893203Z cvt.rn.f32.s16 %r3354, %rs751; 2026-02-21T10:22:41.6893272Z cvt.rn.f32.s16 %r3355, %rs754; 2026-02-21T10:22:41.6893337Z cvt.rn.f32.s16 %r3356, %rs757; 2026-02-21T10:22:41.6893405Z cvt.rn.f32.s16 %r3357, %rs760; 2026-02-21T10:22:41.6893475Z cvt.rn.f32.s16 %r3358, %rs763; 2026-02-21T10:22:41.6893539Z cvt.rn.f32.s16 %r3359, %rs766; 2026-02-21T10:22:41.6893611Z cvt.rn.f32.s16 %r3360, %rs769; 2026-02-21T10:22:41.6893761Z 
cvt.rn.f32.s16 %r3361, %rs772; 2026-02-21T10:22:41.6893832Z cvt.rn.f32.s16 %r3362, %rs775; 2026-02-21T10:22:41.6893901Z cvt.rn.f32.s16 %r3363, %rs778; 2026-02-21T10:22:41.6893964Z cvt.rn.f32.s16 %r3364, %rs781; 2026-02-21T10:22:41.6894029Z cvt.rn.f32.s16 %r3365, %rs784; 2026-02-21T10:22:41.6894138Z bar.sync 0; 2026-02-21T10:22:41.6894214Z st.shared.b32 [%r35], %r3350; 2026-02-21T10:22:41.6894288Z st.shared.b32 [%r35+4096], %r3358; 2026-02-21T10:22:41.6894359Z st.shared.b32 [%r36], %r3351; 2026-02-21T10:22:41.6894430Z st.shared.b32 [%r36+4096], %r3359; 2026-02-21T10:22:41.6894495Z st.shared.b32 [%r37], %r3352; 2026-02-21T10:22:41.6894561Z st.shared.b32 [%r37+4096], %r3360; 2026-02-21T10:22:41.6894627Z st.shared.b32 [%r38], %r3353; 2026-02-21T10:22:41.6894713Z st.shared.b32 [%r38+4096], %r3361; 2026-02-21T10:22:41.6894779Z st.shared.b32 [%r39], %r3354; 2026-02-21T10:22:41.6894845Z st.shared.b32 [%r39+4096], %r3362; 2026-02-21T10:22:41.6894916Z st.shared.b32 [%r40], %r3355; 2026-02-21T10:22:41.6895034Z st.shared.b32 [%r40+4096], %r3363; 2026-02-21T10:22:41.6895101Z st.shared.b32 [%r41], %r3356; 2026-02-21T10:22:41.6895167Z st.shared.b32 [%r41+4096], %r3364; 2026-02-21T10:22:41.6895239Z st.shared.b32 [%r42], %r3357; 2026-02-21T10:22:41.6895309Z st.shared.b32 [%r42+4096], %r3365; 2026-02-21T10:22:41.6895369Z $L__tmp13: 2026-02-21T10:22:41.6895711Z .loc 2 291 36 // standard.py:291:36 @[ cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:90:40 ] 2026-02-21T10:22:41.6895780Z // begin inline asm 2026-02-21T10:22:41.6895863Z fence.proxy.async.shared::cta; 2026-02-21T10:22:41.6895928Z // end inline asm 2026-02-21T10:22:41.6895985Z bar.sync 0; 2026-02-21T10:22:41.6896060Z wgmma.fence.sync.aligned; 2026-02-21T10:22:41.6896120Z // begin inline asm 2026-02-21T10:22:41.6896776Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r2669,%r2670,%r2671,%r2672}, 
%rd166, %p79, 1, 1; 2026-02-21T10:22:41.6896847Z // end inline asm 2026-02-21T10:22:41.6896907Z // begin inline asm 2026-02-21T10:22:41.6897418Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r2705,%r2706,%r2707,%r2708}, %rd167, %p79, 1, 1; 2026-02-21T10:22:41.6897480Z // end inline asm 2026-02-21T10:22:41.6897546Z // begin inline asm 2026-02-21T10:22:41.6898046Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r2741,%r2742,%r2743,%r2744}, %rd168, %p79, 1, 1; 2026-02-21T10:22:41.6898108Z // end inline asm 2026-02-21T10:22:41.6898169Z // begin inline asm 2026-02-21T10:22:41.6898673Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r2777,%r2778,%r2779,%r2780}, %rd169, %p79, 1, 1; 2026-02-21T10:22:41.6898735Z // end inline asm 2026-02-21T10:22:41.6898798Z // begin inline asm 2026-02-21T10:22:41.6899304Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r2813,%r2814,%r2815,%r2816}, %rd170, %p79, 1, 1; 2026-02-21T10:22:41.6899365Z // end inline asm 2026-02-21T10:22:41.6899427Z // begin inline asm 2026-02-21T10:22:41.6899927Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r2849,%r2850,%r2851,%r2852}, %rd171, %p79, 1, 1; 2026-02-21T10:22:41.6899986Z // end inline asm 2026-02-21T10:22:41.6900049Z // begin inline asm 2026-02-21T10:22:41.6900545Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r2885,%r2886,%r2887,%r2888}, %rd172, %p79, 1, 1; 2026-02-21T10:22:41.6900693Z // end inline asm 2026-02-21T10:22:41.6900766Z // begin inline asm 2026-02-21T10:22:41.6901266Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r2921,%r2922,%r2923,%r2924}, %rd173, %p79, 1, 1; 2026-02-21T10:22:41.6901393Z // end inline asm 2026-02-21T10:22:41.6901483Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:41.6901547Z mov.b32 %r2941, %r1928; 2026-02-21T10:22:41.6901613Z mov.b32 %r2942, %r3294; 2026-02-21T10:22:41.6901675Z mov.b32 %r2943, %r3294; 2026-02-21T10:22:41.6901736Z // begin inline asm 2026-02-21T10:22:41.6902050Z // wait for regs: %r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476,%r2941,%r2942,%r2943 2026-02-21T10:22:41.6902129Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:41.6902190Z // end inline asm 2026-02-21T10:22:41.6902315Z $L__tmp14: 2026-02-21T10:22:41.6902537Z .loc 1 54 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:32 2026-02-21T10:22:41.6902606Z add.s64 %rd217, %rd154, 384; 2026-02-21T10:22:41.6902676Z add.s64 %rd220, %rd157, 384; 2026-02-21T10:22:41.6902751Z add.s64 %rd223, %rd160, 384; 2026-02-21T10:22:41.6903022Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.6903088Z add.s64 %rd226, %rd163, 384; 2026-02-21T10:22:41.6903151Z // begin inline asm 2026-02-21T10:22:41.6903218Z mov.u64 %rd216, 0x0; 2026-02-21T10:22:41.6903349Z createpolicy.fractional.L2::evict_last.b64 %rd216, 1.0; 2026-02-21T10:22:41.6903408Z // end inline asm 2026-02-21T10:22:41.6903474Z // begin inline asm 2026-02-21T10:22:41.6903535Z mov.u32 %r2963, 0x0; 2026-02-21T10:22:41.6903598Z mov.u32 %r2964, 0x0; 
2026-02-21T10:22:41.6903662Z mov.u32 %r2965, 0x0; 2026-02-21T10:22:41.6903724Z mov.u32 %r2966, 0x0; 2026-02-21T10:22:41.6903960Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r2963, %r2964, %r2965, %r2966 }, [ %rd217 + 0 ], %rd216; 2026-02-21T10:22:41.6904020Z // end inline asm 2026-02-21T10:22:41.6904087Z // begin inline asm 2026-02-21T10:22:41.6904151Z mov.u64 %rd219, 0x0; 2026-02-21T10:22:41.6904272Z createpolicy.fractional.L2::evict_last.b64 %rd219, 1.0; 2026-02-21T10:22:41.6904337Z // end inline asm 2026-02-21T10:22:41.6904401Z // begin inline asm 2026-02-21T10:22:41.6904462Z mov.u32 %r2967, 0x0; 2026-02-21T10:22:41.6904522Z mov.u32 %r2968, 0x0; 2026-02-21T10:22:41.6904587Z mov.u32 %r2969, 0x0; 2026-02-21T10:22:41.6904648Z mov.u32 %r2970, 0x0; 2026-02-21T10:22:41.6904867Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r2967, %r2968, %r2969, %r2970 }, [ %rd220 + 0 ], %rd219; 2026-02-21T10:22:41.6904930Z // end inline asm 2026-02-21T10:22:41.6904992Z // begin inline asm 2026-02-21T10:22:41.6905054Z mov.u64 %rd222, 0x0; 2026-02-21T10:22:41.6905180Z createpolicy.fractional.L2::evict_last.b64 %rd222, 1.0; 2026-02-21T10:22:41.6905242Z // end inline asm 2026-02-21T10:22:41.6905303Z // begin inline asm 2026-02-21T10:22:41.6905364Z mov.u32 %r2971, 0x0; 2026-02-21T10:22:41.6905430Z mov.u32 %r2972, 0x0; 2026-02-21T10:22:41.6905493Z mov.u32 %r2973, 0x0; 2026-02-21T10:22:41.6905554Z mov.u32 %r2974, 0x0; 2026-02-21T10:22:41.6905776Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r2971, %r2972, %r2973, %r2974 }, [ %rd223 + 0 ], %rd222; 2026-02-21T10:22:41.6905837Z // end inline asm 2026-02-21T10:22:41.6905899Z // begin inline asm 2026-02-21T10:22:41.6905965Z mov.u64 %rd225, 0x0; 2026-02-21T10:22:41.6906084Z createpolicy.fractional.L2::evict_last.b64 %rd225, 1.0; 2026-02-21T10:22:41.6906144Z // end inline asm 2026-02-21T10:22:41.6906205Z // begin inline asm 2026-02-21T10:22:41.6906275Z mov.u32 %r2975, 0x0; 2026-02-21T10:22:41.6906335Z mov.u32 %r2976, 0x0; 
2026-02-21T10:22:41.6906394Z mov.u32 %r2977, 0x0; 2026-02-21T10:22:41.6906582Z mov.u32 %r2978, 0x0; 2026-02-21T10:22:41.6906900Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r2975, %r2976, %r2977, %r2978 }, [ %rd226 + 0 ], %rd225; 2026-02-21T10:22:41.6906960Z // end inline asm 2026-02-21T10:22:41.6907182Z .loc 1 58 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:58:32 2026-02-21T10:22:41.6907308Z bar.sync 0; 2026-02-21T10:22:41.6907396Z st.shared.v2.b32 [%r23], {%r2963, %r2964}; 2026-02-21T10:22:41.6907491Z st.shared.v2.b32 [%r23+2048], {%r2967, %r2968}; 2026-02-21T10:22:41.6907586Z st.shared.v2.b32 [%r23+4096], {%r2971, %r2972}; 2026-02-21T10:22:41.6907672Z st.shared.v2.b32 [%r23+6144], {%r2975, %r2976}; 2026-02-21T10:22:41.6907751Z st.shared.v2.b32 [%r24], {%r2965, %r2966}; 2026-02-21T10:22:41.6907839Z st.shared.v2.b32 [%r24+2048], {%r2969, %r2970}; 2026-02-21T10:22:41.6907923Z st.shared.v2.b32 [%r24+4096], {%r2973, %r2974}; 2026-02-21T10:22:41.6908007Z st.shared.v2.b32 [%r24+6144], {%r2977, %r2978}; 2026-02-21T10:22:41.6908066Z bar.sync 0; 2026-02-21T10:22:41.6908222Z ld.shared.b16 %rs785, [%r46]; 2026-02-21T10:22:41.6908298Z ld.shared.b16 %rs786, [%r46+1024]; 2026-02-21T10:22:41.6908371Z ld.shared.b16 %rs787, [%r46+64]; 2026-02-21T10:22:41.6908444Z ld.shared.b16 %rs788, [%r46+1088]; 2026-02-21T10:22:41.6908593Z ld.shared.b16 %rs789, [%r47]; 2026-02-21T10:22:41.6908666Z ld.shared.b16 %rs790, [%r47+1024]; 2026-02-21T10:22:41.6908743Z ld.shared.b16 %rs791, [%r47+64]; 2026-02-21T10:22:41.6908876Z ld.shared.b16 %rs792, [%r47+1088]; 2026-02-21T10:22:41.6908945Z ld.shared.b16 %rs793, [%r48]; 2026-02-21T10:22:41.6909011Z ld.shared.b16 %rs794, [%r48+1024]; 2026-02-21T10:22:41.6909083Z ld.shared.b16 %rs795, [%r48+64]; 2026-02-21T10:22:41.6909148Z ld.shared.b16 %rs796, [%r48+1088]; 2026-02-21T10:22:41.6909214Z ld.shared.b16 %rs797, [%r49]; 2026-02-21T10:22:41.6909287Z ld.shared.b16 %rs798, [%r49+1024]; 2026-02-21T10:22:41.6909353Z ld.shared.b16 %rs799, 
[%r49+64]; 2026-02-21T10:22:41.6909418Z ld.shared.b16 %rs800, [%r49+1088]; 2026-02-21T10:22:41.6909490Z ld.shared.b16 %rs801, [%r50]; 2026-02-21T10:22:41.6909560Z ld.shared.b16 %rs802, [%r50+1024]; 2026-02-21T10:22:41.6909627Z ld.shared.b16 %rs803, [%r50+64]; 2026-02-21T10:22:41.6909694Z ld.shared.b16 %rs804, [%r50+1088]; 2026-02-21T10:22:41.6909767Z ld.shared.b16 %rs805, [%r51]; 2026-02-21T10:22:41.6909833Z ld.shared.b16 %rs806, [%r51+1024]; 2026-02-21T10:22:41.6909901Z ld.shared.b16 %rs807, [%r51+64]; 2026-02-21T10:22:41.6909974Z ld.shared.b16 %rs808, [%r51+1088]; 2026-02-21T10:22:41.6910039Z ld.shared.b16 %rs809, [%r52]; 2026-02-21T10:22:41.6910105Z ld.shared.b16 %rs810, [%r52+1024]; 2026-02-21T10:22:41.6910173Z ld.shared.b16 %rs811, [%r52+64]; 2026-02-21T10:22:41.6910246Z ld.shared.b16 %rs812, [%r52+1088]; 2026-02-21T10:22:41.6910313Z ld.shared.b16 %rs813, [%r53]; 2026-02-21T10:22:41.6910382Z ld.shared.b16 %rs814, [%r53+1024]; 2026-02-21T10:22:41.6910455Z ld.shared.b16 %rs815, [%r53+64]; 2026-02-21T10:22:41.6910521Z ld.shared.b16 %rs816, [%r53+1088]; 2026-02-21T10:22:41.6910594Z cvt.f32.bf16 %r3020, %rs785; 2026-02-21T10:22:41.6910660Z cvt.f32.bf16 %r3021, %rs786; 2026-02-21T10:22:41.6910730Z cvt.f32.bf16 %r3022, %rs789; 2026-02-21T10:22:41.6910793Z cvt.f32.bf16 %r3023, %rs790; 2026-02-21T10:22:41.6910860Z cvt.f32.bf16 %r3056, %rs793; 2026-02-21T10:22:41.6910927Z cvt.f32.bf16 %r3057, %rs794; 2026-02-21T10:22:41.6910993Z cvt.f32.bf16 %r3058, %rs797; 2026-02-21T10:22:41.6911057Z cvt.f32.bf16 %r3059, %rs798; 2026-02-21T10:22:41.6911120Z cvt.f32.bf16 %r3092, %rs801; 2026-02-21T10:22:41.6911187Z cvt.f32.bf16 %r3093, %rs802; 2026-02-21T10:22:41.6911252Z cvt.f32.bf16 %r3094, %rs805; 2026-02-21T10:22:41.6911315Z cvt.f32.bf16 %r3095, %rs806; 2026-02-21T10:22:41.6911396Z cvt.f32.bf16 %r3128, %rs809; 2026-02-21T10:22:41.6911461Z cvt.f32.bf16 %r3129, %rs810; 2026-02-21T10:22:41.6911533Z cvt.f32.bf16 %r3130, %rs813; 2026-02-21T10:22:41.6911596Z cvt.f32.bf16 %r3131, %rs814; 
2026-02-21T10:22:41.6911665Z cvt.f32.bf16 %r3164, %rs787; 2026-02-21T10:22:41.6911788Z cvt.f32.bf16 %r3165, %rs788; 2026-02-21T10:22:41.6911853Z cvt.f32.bf16 %r3166, %rs791; 2026-02-21T10:22:41.6911923Z cvt.f32.bf16 %r3167, %rs792; 2026-02-21T10:22:41.6911987Z cvt.f32.bf16 %r3200, %rs795; 2026-02-21T10:22:41.6912096Z cvt.f32.bf16 %r3201, %rs796; 2026-02-21T10:22:41.6912163Z cvt.f32.bf16 %r3202, %rs799; 2026-02-21T10:22:41.6912227Z cvt.f32.bf16 %r3203, %rs800; 2026-02-21T10:22:41.6912292Z cvt.f32.bf16 %r3236, %rs803; 2026-02-21T10:22:41.6912357Z cvt.f32.bf16 %r3237, %rs804; 2026-02-21T10:22:41.6912426Z cvt.f32.bf16 %r3238, %rs807; 2026-02-21T10:22:41.6912488Z cvt.f32.bf16 %r3239, %rs808; 2026-02-21T10:22:41.6912551Z cvt.f32.bf16 %r3272, %rs811; 2026-02-21T10:22:41.6912619Z cvt.f32.bf16 %r3273, %rs812; 2026-02-21T10:22:41.6912683Z cvt.f32.bf16 %r3274, %rs815; 2026-02-21T10:22:41.6912745Z cvt.f32.bf16 %r3275, %rs816; 2026-02-21T10:22:41.6912958Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.6913075Z bar.sync 0; 2026-02-21T10:22:41.6913140Z // begin inline asm 2026-02-21T10:22:41.6913247Z @%p256 mbarrier.init.shared::cta.b64 [%r1926], 1; 2026-02-21T10:22:41.6913315Z // end inline asm 2026-02-21T10:22:41.6913375Z bar.sync 0; 2026-02-21T10:22:41.6913436Z // begin inline asm 2026-02-21T10:22:41.6913579Z @%p256 mbarrier.arrive.expect_tx.shared.b64 _, [%r1926], 1024; 2026-02-21T10:22:41.6913707Z // end inline asm 2026-02-21T10:22:41.6913771Z // begin inline asm 2026-02-21T10:22:41.6913852Z fence.proxy.async.shared::cta; 2026-02-21T10:22:41.6913919Z // end inline asm 2026-02-21T10:22:41.6913978Z bar.sync 0; 2026-02-21T10:22:41.6914050Z elect.sync %r3366|%p128, -1; 2026-02-21T10:22:41.6914127Z and.pred %p113, %p1, %p128; 2026-02-21T10:22:41.6914193Z or.b32 %r2983, %r1930, 96; 2026-02-21T10:22:41.6914256Z // begin inline asm 2026-02-21T10:22:41.6914599Z @%p113 
cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1928], [%rd247, {%r1929, %r2983}], [%r1926]; 2026-02-21T10:22:41.6914676Z // end inline asm 2026-02-21T10:22:41.6914735Z bar.sync 0; 2026-02-21T10:22:41.6914797Z // begin inline asm 2026-02-21T10:22:41.6914858Z 2026-02-21T10:22:41.6914927Z { 2026-02-21T10:22:41.6915000Z .reg .pred complete; 2026-02-21T10:22:41.6915059Z waitLoop: 2026-02-21T10:22:41.6915217Z mbarrier.try_wait.parity.shared.b64 complete, [%r1926], %r3294; 2026-02-21T10:22:41.6915293Z @!complete bra.uni waitLoop; 2026-02-21T10:22:41.6915348Z } 2026-02-21T10:22:41.6915352Z 2026-02-21T10:22:41.6915419Z // end inline asm 2026-02-21T10:22:41.6915476Z bar.sync 0; 2026-02-21T10:22:41.6915537Z // begin inline asm 2026-02-21T10:22:41.6915642Z @%p256 mbarrier.inval.shared::cta.b64 [%r1926]; 2026-02-21T10:22:41.6915702Z // end inline asm 2026-02-21T10:22:41.6915918Z .loc 1 78 58 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:78:58 2026-02-21T10:22:41.6915989Z ld.shared.b8 %rs817, [%r33]; 2026-02-21T10:22:41.6916069Z ld.shared.b8 %rs818, [%r33+64]; 2026-02-21T10:22:41.6916140Z ld.shared.b8 %rs819, [%r33+256]; 2026-02-21T10:22:41.6916207Z ld.shared.b8 %rs820, [%r33+320]; 2026-02-21T10:22:41.6916287Z ld.shared.b8 %rs821, [%r33+512]; 2026-02-21T10:22:41.6916357Z ld.shared.b8 %rs822, [%r33+576]; 2026-02-21T10:22:41.6916425Z ld.shared.b8 %rs823, [%r33+768]; 2026-02-21T10:22:41.6916604Z ld.shared.b8 %rs824, [%r33+832]; 2026-02-21T10:22:41.6916680Z ld.shared.b8 %rs825, [%r34+128]; 2026-02-21T10:22:41.6916747Z ld.shared.b8 %rs826, [%r34+192]; 2026-02-21T10:22:41.6916813Z ld.shared.b8 %rs827, [%r34+384]; 2026-02-21T10:22:41.6916881Z ld.shared.b8 %rs828, [%r34+448]; 2026-02-21T10:22:41.6916944Z ld.shared.b8 %rs829, [%r34+640]; 2026-02-21T10:22:41.6917010Z ld.shared.b8 %rs830, [%r34+704]; 2026-02-21T10:22:41.6917080Z ld.shared.b8 %rs831, [%r34+896]; 2026-02-21T10:22:41.6917146Z ld.shared.b8 %rs832, [%r34+960]; 
2026-02-21T10:22:41.6917349Z .loc 1 63 28 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:63:28 2026-02-21T10:22:41.6917503Z shl.b16 %rs833, %rs817, 4; 2026-02-21T10:22:41.6917571Z shl.b16 %rs834, %rs818, 4; 2026-02-21T10:22:41.6917635Z shl.b16 %rs835, %rs825, 4; 2026-02-21T10:22:41.6917694Z shl.b16 %rs836, %rs826, 4; 2026-02-21T10:22:41.6917833Z shl.b16 %rs837, %rs819, 4; 2026-02-21T10:22:41.6917896Z shl.b16 %rs838, %rs820, 4; 2026-02-21T10:22:41.6917956Z shl.b16 %rs839, %rs827, 4; 2026-02-21T10:22:41.6918020Z shl.b16 %rs840, %rs828, 4; 2026-02-21T10:22:41.6918086Z shl.b16 %rs841, %rs821, 4; 2026-02-21T10:22:41.6918149Z shl.b16 %rs842, %rs822, 4; 2026-02-21T10:22:41.6918211Z shl.b16 %rs843, %rs829, 4; 2026-02-21T10:22:41.6918278Z shl.b16 %rs844, %rs830, 4; 2026-02-21T10:22:41.6918339Z shl.b16 %rs845, %rs823, 4; 2026-02-21T10:22:41.6918400Z shl.b16 %rs846, %rs824, 4; 2026-02-21T10:22:41.6918463Z shl.b16 %rs847, %rs831, 4; 2026-02-21T10:22:41.6918527Z shl.b16 %rs848, %rs832, 4; 2026-02-21T10:22:41.6918791Z .loc 1 78 58 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:78:58 2026-02-21T10:22:41.6918877Z selp.b16 %rs849, %rs833, %rs817, %p268; 2026-02-21T10:22:41.6918945Z cvt.s16.s8 %rs850, %rs849; 2026-02-21T10:22:41.6919009Z shr.s16 %rs851, %rs850, 4; 2026-02-21T10:22:41.6919086Z selp.b16 %rs852, %rs834, %rs818, %p268; 2026-02-21T10:22:41.6919154Z cvt.s16.s8 %rs853, %rs852; 2026-02-21T10:22:41.6919219Z shr.s16 %rs854, %rs853, 4; 2026-02-21T10:22:41.6919354Z selp.b16 %rs855, %rs835, %rs825, %p268; 2026-02-21T10:22:41.6919419Z cvt.s16.s8 %rs856, %rs855; 2026-02-21T10:22:41.6919486Z shr.s16 %rs857, %rs856, 4; 2026-02-21T10:22:41.6919557Z selp.b16 %rs858, %rs836, %rs826, %p268; 2026-02-21T10:22:41.6919618Z cvt.s16.s8 %rs859, %rs858; 2026-02-21T10:22:41.6919685Z shr.s16 %rs860, %rs859, 4; 2026-02-21T10:22:41.6919755Z selp.b16 %rs861, %rs837, %rs819, %p268; 2026-02-21T10:22:41.6919827Z cvt.s16.s8 %rs862, %rs861; 2026-02-21T10:22:41.6919891Z shr.s16 
%rs863, %rs862, 4; 2026-02-21T10:22:41.6919970Z selp.b16 %rs864, %rs838, %rs820, %p268; 2026-02-21T10:22:41.6920032Z cvt.s16.s8 %rs865, %rs864; 2026-02-21T10:22:41.6920095Z shr.s16 %rs866, %rs865, 4; 2026-02-21T10:22:41.6920171Z selp.b16 %rs867, %rs839, %rs827, %p268; 2026-02-21T10:22:41.6920236Z cvt.s16.s8 %rs868, %rs867; 2026-02-21T10:22:41.6920300Z shr.s16 %rs869, %rs868, 4; 2026-02-21T10:22:41.6920380Z selp.b16 %rs870, %rs840, %rs828, %p268; 2026-02-21T10:22:41.6920447Z cvt.s16.s8 %rs871, %rs870; 2026-02-21T10:22:41.6920509Z shr.s16 %rs872, %rs871, 4; 2026-02-21T10:22:41.6920579Z selp.b16 %rs873, %rs841, %rs821, %p268; 2026-02-21T10:22:41.6920647Z cvt.s16.s8 %rs874, %rs873; 2026-02-21T10:22:41.6920710Z shr.s16 %rs875, %rs874, 4; 2026-02-21T10:22:41.6920780Z selp.b16 %rs876, %rs842, %rs822, %p268; 2026-02-21T10:22:41.6920848Z cvt.s16.s8 %rs877, %rs876; 2026-02-21T10:22:41.6920910Z shr.s16 %rs878, %rs877, 4; 2026-02-21T10:22:41.6920978Z selp.b16 %rs879, %rs843, %rs829, %p268; 2026-02-21T10:22:41.6921041Z cvt.s16.s8 %rs880, %rs879; 2026-02-21T10:22:41.6921111Z shr.s16 %rs881, %rs880, 4; 2026-02-21T10:22:41.6921181Z selp.b16 %rs882, %rs844, %rs830, %p268; 2026-02-21T10:22:41.6921244Z cvt.s16.s8 %rs883, %rs882; 2026-02-21T10:22:41.6921308Z shr.s16 %rs884, %rs883, 4; 2026-02-21T10:22:41.6921381Z selp.b16 %rs885, %rs845, %rs823, %p268; 2026-02-21T10:22:41.6921444Z cvt.s16.s8 %rs886, %rs885; 2026-02-21T10:22:41.6921506Z shr.s16 %rs887, %rs886, 4; 2026-02-21T10:22:41.6921584Z selp.b16 %rs888, %rs846, %rs824, %p268; 2026-02-21T10:22:41.6921648Z cvt.s16.s8 %rs889, %rs888; 2026-02-21T10:22:41.6921716Z shr.s16 %rs890, %rs889, 4; 2026-02-21T10:22:41.6921791Z selp.b16 %rs891, %rs847, %rs831, %p268; 2026-02-21T10:22:41.6921856Z cvt.s16.s8 %rs892, %rs891; 2026-02-21T10:22:41.6921917Z shr.s16 %rs893, %rs892, 4; 2026-02-21T10:22:41.6921990Z selp.b16 %rs894, %rs848, %rs832, %p268; 2026-02-21T10:22:41.6922053Z cvt.s16.s8 %rs895, %rs894; 2026-02-21T10:22:41.6922114Z shr.s16 %rs896, 
%rs895, 4; 2026-02-21T10:22:41.6922380Z .loc 1 83 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:83:32 2026-02-21T10:22:41.6922452Z cvt.rn.f32.s16 %r3367, %rs851; 2026-02-21T10:22:41.6922518Z cvt.rn.f32.s16 %r3368, %rs854; 2026-02-21T10:22:41.6922631Z cvt.rn.f32.s16 %r3369, %rs857; 2026-02-21T10:22:41.6922696Z cvt.rn.f32.s16 %r3370, %rs860; 2026-02-21T10:22:41.6922759Z cvt.rn.f32.s16 %r3371, %rs863; 2026-02-21T10:22:41.6922824Z cvt.rn.f32.s16 %r3372, %rs866; 2026-02-21T10:22:41.6922888Z cvt.rn.f32.s16 %r3373, %rs869; 2026-02-21T10:22:41.6922955Z cvt.rn.f32.s16 %r3374, %rs872; 2026-02-21T10:22:41.6923020Z cvt.rn.f32.s16 %r3375, %rs875; 2026-02-21T10:22:41.6923083Z cvt.rn.f32.s16 %r3376, %rs878; 2026-02-21T10:22:41.6923148Z cvt.rn.f32.s16 %r3377, %rs881; 2026-02-21T10:22:41.6923211Z cvt.rn.f32.s16 %r3378, %rs884; 2026-02-21T10:22:41.6923273Z cvt.rn.f32.s16 %r3379, %rs887; 2026-02-21T10:22:41.6923381Z cvt.rn.f32.s16 %r3380, %rs890; 2026-02-21T10:22:41.6923497Z cvt.rn.f32.s16 %r3381, %rs893; 2026-02-21T10:22:41.6923700Z cvt.rn.f32.s16 %r3382, %rs896; 2026-02-21T10:22:41.6923805Z bar.sync 0; 2026-02-21T10:22:41.6923889Z st.shared.b32 [%r35], %r3367; 2026-02-21T10:22:41.6923959Z st.shared.b32 [%r35+4096], %r3375; 2026-02-21T10:22:41.6924030Z st.shared.b32 [%r36], %r3368; 2026-02-21T10:22:41.6924103Z st.shared.b32 [%r36+4096], %r3376; 2026-02-21T10:22:41.6924172Z st.shared.b32 [%r37], %r3369; 2026-02-21T10:22:41.6924296Z st.shared.b32 [%r37+4096], %r3377; 2026-02-21T10:22:41.6924362Z st.shared.b32 [%r38], %r3370; 2026-02-21T10:22:41.6924431Z st.shared.b32 [%r38+4096], %r3378; 2026-02-21T10:22:41.6924499Z st.shared.b32 [%r39], %r3371; 2026-02-21T10:22:41.6924562Z st.shared.b32 [%r39+4096], %r3379; 2026-02-21T10:22:41.6924628Z st.shared.b32 [%r40], %r3372; 2026-02-21T10:22:41.6924692Z st.shared.b32 [%r40+4096], %r3380; 2026-02-21T10:22:41.6924757Z st.shared.b32 [%r41], %r3373; 2026-02-21T10:22:41.6924822Z st.shared.b32 [%r41+4096], %r3381; 
2026-02-21T10:22:41.6924895Z st.shared.b32 [%r42], %r3374; 2026-02-21T10:22:41.6924959Z st.shared.b32 [%r42+4096], %r3382; 2026-02-21T10:22:41.6925015Z $L__tmp15: 2026-02-21T10:22:41.6925306Z .loc 2 291 36 // standard.py:291:36 @[ cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:90:40 ] 2026-02-21T10:22:41.6925369Z // begin inline asm 2026-02-21T10:22:41.6925448Z fence.proxy.async.shared::cta; 2026-02-21T10:22:41.6925511Z // end inline asm 2026-02-21T10:22:41.6925569Z bar.sync 0; 2026-02-21T10:22:41.6925656Z wgmma.fence.sync.aligned; 2026-02-21T10:22:41.6925720Z // begin inline asm 2026-02-21T10:22:41.6926241Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r3020,%r3021,%r3022,%r3023}, %rd166, %p79, 1, 1; 2026-02-21T10:22:41.6926300Z // end inline asm 2026-02-21T10:22:41.6926359Z // begin inline asm 2026-02-21T10:22:41.6927037Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r3056,%r3057,%r3058,%r3059}, %rd167, %p79, 1, 1; 2026-02-21T10:22:41.6927103Z // end inline asm 2026-02-21T10:22:41.6927168Z // begin inline asm 2026-02-21T10:22:41.6927676Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r3092,%r3093,%r3094,%r3095}, %rd168, %p79, 1, 1; 2026-02-21T10:22:41.6927735Z // end inline asm 2026-02-21T10:22:41.6927796Z // begin inline asm 2026-02-21T10:22:41.6928298Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r3128,%r3129,%r3130,%r3131}, %rd169, %p79, 1, 1; 2026-02-21T10:22:41.6928357Z // end inline asm 2026-02-21T10:22:41.6928418Z // begin inline asm 2026-02-21T10:22:41.6929007Z 
wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r3164,%r3165,%r3166,%r3167}, %rd170, %p79, 1, 1; 2026-02-21T10:22:41.6929067Z // end inline asm 2026-02-21T10:22:41.6929207Z // begin inline asm 2026-02-21T10:22:41.6929707Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r3200,%r3201,%r3202,%r3203}, %rd171, %p79, 1, 1; 2026-02-21T10:22:41.6929771Z // end inline asm 2026-02-21T10:22:41.6929832Z // begin inline asm 2026-02-21T10:22:41.6930327Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r3236,%r3237,%r3238,%r3239}, %rd172, %p79, 1, 1; 2026-02-21T10:22:41.6930391Z // end inline asm 2026-02-21T10:22:41.6930451Z // begin inline asm 2026-02-21T10:22:41.6931010Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476}, {%r3272,%r3273,%r3274,%r3275}, %rd173, %p79, 1, 1; 2026-02-21T10:22:41.6931083Z // end inline asm 2026-02-21T10:22:41.6931163Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:41.6931228Z mov.b32 %r3292, %r1928; 2026-02-21T10:22:41.6931296Z mov.b32 %r3293, %r3294; 2026-02-21T10:22:41.6931417Z // begin inline asm 2026-02-21T10:22:41.6931726Z // wait for regs: %r5461,%r5462,%r5463,%r5464,%r5465,%r5466,%r5467,%r5468,%r5469,%r5470,%r5471,%r5472,%r5473,%r5474,%r5475,%r5476,%r3292,%r3293,%r3294 2026-02-21T10:22:41.6931803Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:41.6931862Z // end inline asm 2026-02-21T10:22:41.6931917Z $L__tmp16: 2026-02-21T10:22:41.6932137Z .loc 1 47 126 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:47:126 2026-02-21T10:22:41.6932213Z add.s64 %rd25, 
%rd461, 128; 2026-02-21T10:22:41.6932286Z add.s64 %rd460, %rd460, 512; 2026-02-21T10:22:41.6932356Z setp.lt.u64 %p129, %rd461, 3968; 2026-02-21T10:22:41.6932420Z mov.b64 %rd461, %rd25; 2026-02-21T10:22:41.6932482Z @%p129 bra $L__BB0_5; 2026-02-21T10:22:41.6932597Z // %bb.6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:22:41.6932806Z .loc 1 38 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:38:32 2026-02-21T10:22:41.6932877Z or.b32 %r3401, %r1929, %r6; 2026-02-21T10:22:41.6933080Z .loc 1 40 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:40:32 2026-02-21T10:22:41.6933146Z or.b32 %r3402, %r94, %r18; 2026-02-21T10:22:41.6933214Z or.b32 %r3403, %r94, %r19; 2026-02-21T10:22:41.6933416Z .loc 1 93 28 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:93:28 2026-02-21T10:22:41.6933500Z cvt.rn.bf16x2.f32 %r3404, %r5462, %r5461; 2026-02-21T10:22:41.6933584Z cvt.rn.bf16x2.f32 %r3405, %r5464, %r5463; 2026-02-21T10:22:41.6933664Z cvt.rn.bf16x2.f32 %r3406, %r5466, %r5465; 2026-02-21T10:22:41.6933740Z cvt.rn.bf16x2.f32 %r3407, %r5468, %r5467; 2026-02-21T10:22:41.6933818Z cvt.rn.bf16x2.f32 %r3408, %r5470, %r5469; 2026-02-21T10:22:41.6933913Z cvt.rn.bf16x2.f32 %r3409, %r5472, %r5471; 2026-02-21T10:22:41.6933991Z cvt.rn.bf16x2.f32 %r3410, %r5474, %r5473; 2026-02-21T10:22:41.6934067Z cvt.rn.bf16x2.f32 %r3411, %r5476, %r5475; 2026-02-21T10:22:41.6934286Z .loc 1 94 50 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:94:50 2026-02-21T10:22:41.6934412Z mad.lo.s32 %r3412, %r3402, 1280, %r3401; 2026-02-21T10:22:41.6934529Z mad.lo.s32 %r3413, %r3403, 1280, %r3401; 2026-02-21T10:22:41.6934816Z .loc 1 94 22 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:94:22 2026-02-21T10:22:41.6934893Z mad.wide.s32 %rd237, %r3412, 2, %rd36; 2026-02-21T10:22:41.6934967Z mad.wide.s32 %rd238, %r3413, 2, %rd36; 2026-02-21T10:22:41.6935250Z .loc 1 94 81 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:94:81 2026-02-21T10:22:41.6935312Z 
bar.sync 0; 2026-02-21T10:22:41.6935429Z st.shared.v4.b32 [%r43], {%r3404, %r3406, %r3408, %r3410}; 2026-02-21T10:22:41.6935603Z st.shared.v4.b32 [%r43+128], {%r3405, %r3407, %r3409, %r3411}; 2026-02-21T10:22:41.6935667Z bar.sync 0; 2026-02-21T10:22:41.6935732Z // begin inline asm 2026-02-21T10:22:41.6935931Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3383, %r3384, %r3385, %r3386}, [%r1860]; 2026-02-21T10:22:41.6936012Z // end inline asm 2026-02-21T10:22:41.6936075Z // begin inline asm 2026-02-21T10:22:41.6936262Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3388, %r3389, %r3390, %r3391}, [%r1865]; 2026-02-21T10:22:41.6936322Z // end inline asm 2026-02-21T10:22:41.6936389Z // begin inline asm 2026-02-21T10:22:41.6936675Z st.global.v4.b32 [ %rd237 + 0 ], { %r3383, %r3384, %r3385, %r3386 }; 2026-02-21T10:22:41.6936738Z // end inline asm 2026-02-21T10:22:41.6936915Z // begin inline asm 2026-02-21T10:22:41.6937044Z st.global.v4.b32 [ %rd238 + 0 ], { %r3388, %r3389, %r3390, %r3391 }; 2026-02-21T10:22:41.6937103Z // end inline asm 2026-02-21T10:22:41.6937339Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.6937414Z add.s32 %r5444, %r5444, 264; 2026-02-21T10:22:41.6937486Z setp.lt.s32 %p130, %r5444, %r5518; 2026-02-21T10:22:41.6937616Z @%p130 bra $L__BB0_2; 2026-02-21T10:22:41.6937720Z $L__BB0_7: // %._crit_edge 2026-02-21T10:22:41.6937789Z sub.s32 %r3678, 40960, %r5518; 2026-02-21T10:22:41.6937860Z mul.hi.s32 %r3679, %r3678, 1041204193; 2026-02-21T10:22:41.6937933Z shr.u32 %r3680, %r3679, 31; 2026-02-21T10:22:41.6938001Z shr.s32 %r3681, %r3679, 5; 2026-02-21T10:22:41.6938068Z add.s32 %r128, %r3681, %r3680; 2026-02-21T10:22:41.6938137Z mul.lo.s32 %r3682, %r128, 132; 2026-02-21T10:22:41.6938213Z setp.ne.b32 %p168, %r3678, %r3682; 2026-02-21T10:22:41.6938288Z setp.gt.s32 %p169, %r3678, -1; 2026-02-21T10:22:41.6938361Z and.pred %p170, %p169, %p168; 2026-02-21T10:22:41.6938433Z selp.b32 %r129, 1, 0, %p170; 
2026-02-21T10:22:41.6938497Z add.s32 %r130, %r128, %r129; 2026-02-21T10:22:41.6938569Z add.s32 %r3414, %r1928, 118784; 2026-02-21T10:22:41.6938636Z // begin inline asm 2026-02-21T10:22:41.6938745Z @%p256 mbarrier.init.shared::cta.b64 [%r3414], 1; 2026-02-21T10:22:41.6938808Z // end inline asm 2026-02-21T10:22:41.6938879Z bar.sync 0; 2026-02-21T10:22:41.6938953Z add.s32 %r3415, %r1928, 118792; 2026-02-21T10:22:41.6939016Z // begin inline asm 2026-02-21T10:22:41.6939113Z @%p256 mbarrier.init.shared::cta.b64 [%r3415], 1; 2026-02-21T10:22:41.6939180Z // end inline asm 2026-02-21T10:22:41.6939239Z bar.sync 0; 2026-02-21T10:22:41.6939304Z add.s32 %r3416, %r1928, 118800; 2026-02-21T10:22:41.6939366Z // begin inline asm 2026-02-21T10:22:41.6939464Z @%p256 mbarrier.init.shared::cta.b64 [%r3416], 1; 2026-02-21T10:22:41.6939525Z // end inline asm 2026-02-21T10:22:41.6939593Z add.s32 %r3417, %r1928, 118816; 2026-02-21T10:22:41.6939661Z // begin inline asm 2026-02-21T10:22:41.6939755Z @%p256 mbarrier.init.shared::cta.b64 [%r3417], 1; 2026-02-21T10:22:41.6939816Z // end inline asm 2026-02-21T10:22:41.6939879Z bar.sync 0; 2026-02-21T10:22:41.6939943Z add.s32 %r3418, %r1928, 118824; 2026-02-21T10:22:41.6940005Z // begin inline asm 2026-02-21T10:22:41.6940102Z @%p256 mbarrier.init.shared::cta.b64 [%r3418], 1; 2026-02-21T10:22:41.6940167Z // end inline asm 2026-02-21T10:22:41.6940226Z bar.sync 0; 2026-02-21T10:22:41.6940288Z add.s32 %r3419, %r1928, 118832; 2026-02-21T10:22:41.6940354Z // begin inline asm 2026-02-21T10:22:41.6940446Z @%p256 mbarrier.init.shared::cta.b64 [%r3419], 1; 2026-02-21T10:22:41.6940506Z // end inline asm 2026-02-21T10:22:41.6940572Z add.s32 %r3420, %r1928, 118848; 2026-02-21T10:22:41.6940641Z // begin inline asm 2026-02-21T10:22:41.6940737Z @%p256 mbarrier.init.shared::cta.b64 [%r3420], 1; 2026-02-21T10:22:41.6940893Z // end inline asm 2026-02-21T10:22:41.6940957Z bar.sync 0; 2026-02-21T10:22:41.6941024Z add.s32 %r3421, %r1928, 118856; 
2026-02-21T10:22:41.6941089Z // begin inline asm 2026-02-21T10:22:41.6941184Z @%p256 mbarrier.init.shared::cta.b64 [%r3421], 1; 2026-02-21T10:22:41.6941315Z // end inline asm 2026-02-21T10:22:41.6941373Z bar.sync 0; 2026-02-21T10:22:41.6941437Z add.s32 %r3422, %r1928, 118864; 2026-02-21T10:22:41.6941506Z // begin inline asm 2026-02-21T10:22:41.6941601Z @%p256 mbarrier.init.shared::cta.b64 [%r3422], 1; 2026-02-21T10:22:41.6941662Z // end inline asm 2026-02-21T10:22:41.6941731Z add.s32 %r3423, %r1928, 118880; 2026-02-21T10:22:41.6941796Z // begin inline asm 2026-02-21T10:22:41.6941890Z @%p256 mbarrier.init.shared::cta.b64 [%r3423], 1; 2026-02-21T10:22:41.6941952Z // end inline asm 2026-02-21T10:22:41.6942018Z bar.sync 0; 2026-02-21T10:22:41.6942082Z add.s32 %r3424, %r1928, 118888; 2026-02-21T10:22:41.6942143Z // begin inline asm 2026-02-21T10:22:41.6942294Z @%p256 mbarrier.init.shared::cta.b64 [%r3424], 1; 2026-02-21T10:22:41.6942356Z // end inline asm 2026-02-21T10:22:41.6942416Z bar.sync 0; 2026-02-21T10:22:41.6942481Z add.s32 %r3425, %r1928, 118896; 2026-02-21T10:22:41.6942552Z // begin inline asm 2026-02-21T10:22:41.6942648Z @%p256 mbarrier.init.shared::cta.b64 [%r3425], 1; 2026-02-21T10:22:41.6942708Z // end inline asm 2026-02-21T10:22:41.6942830Z setp.lt.s32 %p171, %r130, 1; 2026-02-21T10:22:41.6942901Z setp.gt.s32 %p172, %r130, 0; 2026-02-21T10:22:41.6943109Z .loc 1 32 35 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:32:35 2026-02-21T10:22:41.6943184Z shr.s32 %r3684, %r5518, 31; 2026-02-21T10:22:41.6943248Z shr.u32 %r3685, %r3684, 17; 2026-02-21T10:22:41.6943327Z add.s32 %r3686, %r5518, %r3685; 2026-02-21T10:22:41.6943393Z shr.s32 %r3687, %r3686, 15; 2026-02-21T10:22:41.6943604Z .loc 1 33 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:33:33 2026-02-21T10:22:41.6943674Z shl.b32 %r3688, %r3687, 5; 2026-02-21T10:22:41.6943873Z .loc 1 34 39 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:34:39 2026-02-21T10:22:41.6943944Z 
sub.s32 %r3689, 40, %r3688; 2026-02-21T10:22:41.6944147Z .loc 1 34 52 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:34:52 2026-02-21T10:22:41.6944210Z min.s32 %r3690, %r3689, 32; 2026-02-21T10:22:41.6944417Z .loc 1 35 45 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:35:45 2026-02-21T10:22:41.6944484Z and.b32 %r3691, %r3686, -32768; 2026-02-21T10:22:41.6944551Z sub.s32 %r3692, %r5518, %r3691; 2026-02-21T10:22:41.6944752Z .loc 1 36 51 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:36:51 2026-02-21T10:22:41.6944825Z div.s32 %r3693, %r3692, %r3690; 2026-02-21T10:22:41.6945022Z .loc 1 35 64 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:35:64 2026-02-21T10:22:41.6945098Z mul.lo.s32 %r3694, %r3693, %r3690; 2026-02-21T10:22:41.6945170Z sub.s32 %r3695, %r3692, %r3694; 2026-02-21T10:22:41.6945366Z .loc 1 35 30 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:35:30 2026-02-21T10:22:41.6945434Z add.s32 %r3696, %r3695, %r3688; 2026-02-21T10:22:41.6945641Z .loc 1 37 27 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:37:27 2026-02-21T10:22:41.6945705Z shl.b32 %r5487, %r3696, 5; 2026-02-21T10:22:41.6945901Z .loc 1 39 27 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:39:27 2026-02-21T10:22:41.6945972Z shl.b32 %r5485, %r3693, 6; 2026-02-21T10:22:41.6946168Z .loc 1 40 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:40:32 2026-02-21T10:22:41.6946231Z or.b32 %r5519, %r5485, %r8; 2026-02-21T10:22:41.6946294Z or.b32 %r5520, %r5485, %r9; 2026-02-21T10:22:41.6946364Z or.b32 %r5521, %r5485, %r10; 2026-02-21T10:22:41.6946600Z or.b32 %r5522, %r5485, %r11; 2026-02-21T10:22:41.6946667Z or.b32 %r5523, %r5485, %r12; 2026-02-21T10:22:41.6946733Z or.b32 %r5524, %r5485, %r13; 2026-02-21T10:22:41.6946795Z or.b32 %r5525, %r5485, %r14; 2026-02-21T10:22:41.6946941Z or.b32 %r5526, %r5485, %r15; 2026-02-21T10:22:41.6947160Z .loc 1 54 53 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:53 
2026-02-21T10:22:41.6947227Z shl.b32 %r3697, %r5519, 13; 2026-02-21T10:22:41.6947290Z shl.b32 %r3698, %r5520, 13; 2026-02-21T10:22:41.6947352Z shl.b32 %r3699, %r5521, 13; 2026-02-21T10:22:41.6947418Z shl.b32 %r3700, %r5522, 13; 2026-02-21T10:22:41.6947480Z shl.b32 %r3701, %r5523, 13; 2026-02-21T10:22:41.6947545Z shl.b32 %r3702, %r5524, 13; 2026-02-21T10:22:41.6947615Z shl.b32 %r3703, %r5525, 13; 2026-02-21T10:22:41.6947680Z shl.b32 %r3704, %r5526, 13; 2026-02-21T10:22:41.6947888Z .loc 1 54 60 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:60 2026-02-21T10:22:41.6948024Z or.b32 %r3705, %r3697, %r20; 2026-02-21T10:22:41.6948099Z or.b32 %r3706, %r3698, %r20; 2026-02-21T10:22:41.6948163Z or.b32 %r3707, %r3699, %r20; 2026-02-21T10:22:41.6948225Z or.b32 %r3708, %r3700, %r20; 2026-02-21T10:22:41.6948295Z or.b32 %r3709, %r3701, %r20; 2026-02-21T10:22:41.6948355Z or.b32 %r3710, %r3702, %r20; 2026-02-21T10:22:41.6948418Z or.b32 %r3711, %r3703, %r20; 2026-02-21T10:22:41.6948626Z or.b32 %r3712, %r3704, %r20; 2026-02-21T10:22:41.6948839Z .loc 1 54 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:32 2026-02-21T10:22:41.6948918Z mad.wide.s32 %rd239, %r3705, 2, %rd35; 2026-02-21T10:22:41.6948991Z mad.wide.s32 %rd240, %r3706, 2, %rd35; 2026-02-21T10:22:41.6949068Z mad.wide.s32 %rd241, %r3707, 2, %rd35; 2026-02-21T10:22:41.6949142Z mad.wide.s32 %rd242, %r3708, 2, %rd35; 2026-02-21T10:22:41.6949211Z mad.wide.s32 %rd243, %r3709, 2, %rd35; 2026-02-21T10:22:41.6949288Z mad.wide.s32 %rd244, %r3710, 2, %rd35; 2026-02-21T10:22:41.6949362Z mad.wide.s32 %rd245, %r3711, 2, %rd35; 2026-02-21T10:22:41.6949432Z mad.wide.s32 %rd246, %r3712, 2, %rd35; 2026-02-21T10:22:41.6949632Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.6949708Z shl.b32 %r3713, %r2, 3; 2026-02-21T10:22:41.6949777Z and.b32 %r3714, %r3713, 1016; 2026-02-21T10:22:41.6949856Z shr.u32 %r3715, %r7, 1; 2026-02-21T10:22:41.6949933Z xor.b32 %r141, 
%r3714, %r3715; 2026-02-21T10:22:41.6950000Z add.s32 %r3426, %r1928, %r141; 2026-02-21T10:22:41.6950067Z selp.b32 %r3427, 8, 0, %p172; 2026-02-21T10:22:41.6950137Z // begin inline asm 2026-02-21T10:22:41.6950291Z cp.async.ca.shared.global [ %r3426 + 0 ], [ %rd239 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6950353Z // end inline asm 2026-02-21T10:22:41.6950419Z add.s32 %r3428, %r3426, 1024; 2026-02-21T10:22:41.6950490Z // begin inline asm 2026-02-21T10:22:41.6950632Z cp.async.ca.shared.global [ %r3428 + 0 ], [ %rd240 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6950697Z // end inline asm 2026-02-21T10:22:41.6950767Z add.s32 %r3430, %r3426, 2048; 2026-02-21T10:22:41.6950830Z // begin inline asm 2026-02-21T10:22:41.6950970Z cp.async.ca.shared.global [ %r3430 + 0 ], [ %rd241 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6951034Z // end inline asm 2026-02-21T10:22:41.6951103Z add.s32 %r3432, %r3426, 3072; 2026-02-21T10:22:41.6951166Z // begin inline asm 2026-02-21T10:22:41.6951304Z cp.async.ca.shared.global [ %r3432 + 0 ], [ %rd242 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6951373Z // end inline asm 2026-02-21T10:22:41.6951437Z add.s32 %r3434, %r3426, 4096; 2026-02-21T10:22:41.6951502Z // begin inline asm 2026-02-21T10:22:41.6951647Z cp.async.ca.shared.global [ %r3434 + 0 ], [ %rd243 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6951710Z // end inline asm 2026-02-21T10:22:41.6951777Z add.s32 %r3436, %r3426, 5120; 2026-02-21T10:22:41.6951840Z // begin inline asm 2026-02-21T10:22:41.6951989Z cp.async.ca.shared.global [ %r3436 + 0 ], [ %rd244 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6952127Z // end inline asm 2026-02-21T10:22:41.6952192Z add.s32 %r3438, %r3426, 6144; 2026-02-21T10:22:41.6952271Z // begin inline asm 2026-02-21T10:22:41.6952412Z cp.async.ca.shared.global [ %r3438 + 0 ], [ %rd245 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6952523Z // end inline asm 2026-02-21T10:22:41.6952588Z add.s32 %r3440, %r3426, 7168; 2026-02-21T10:22:41.6952661Z // begin inline asm 2026-02-21T10:22:41.6952802Z 
cp.async.ca.shared.global [ %r3440 + 0 ], [ %rd246 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6952862Z // end inline asm 2026-02-21T10:22:41.6952940Z cp.async.commit_group; 2026-02-21T10:22:41.6953164Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.6953240Z and.pred %p143, %p256, %p172; 2026-02-21T10:22:41.6953309Z // begin inline asm 2026-02-21T10:22:41.6953450Z @%p143 mbarrier.arrive.expect_tx.shared.b64 _, [%r3414], 1024; 2026-02-21T10:22:41.6953515Z // end inline asm 2026-02-21T10:22:41.6953770Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.6953839Z bar.sync 0; 2026-02-21T10:22:41.6953913Z elect.sync %r3716|%p173, -1; 2026-02-21T10:22:41.6953987Z and.pred %p174, %p172, %p173; 2026-02-21T10:22:41.6954065Z and.pred %p144, %p1, %p174; 2026-02-21T10:22:41.6954138Z add.s32 %r3443, %r1928, 106496; 2026-02-21T10:22:41.6954249Z // begin inline asm 2026-02-21T10:22:41.6954596Z @%p144 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r3443], [%rd247, {%r5487, %r264}], [%r3414]; 2026-02-21T10:22:41.6954658Z // end inline asm 2026-02-21T10:22:41.6954862Z .loc 1 54 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:32 2026-02-21T10:22:41.6954931Z cvt.s64.s32 %rd347, %r3697; 2026-02-21T10:22:41.6955006Z cvt.u64.u32 %rd348, %r20; 2026-02-21T10:22:41.6955076Z or.b64 %rd349, %rd347, %rd348; 2026-02-21T10:22:41.6955150Z shl.b64 %rd350, %rd349, 1; 2026-02-21T10:22:41.6955227Z add.s64 %rd351, %rd35, %rd350; 2026-02-21T10:22:41.6955296Z add.s64 %rd248, %rd351, 128; 2026-02-21T10:22:41.6955363Z cvt.s64.s32 %rd352, %r3698; 2026-02-21T10:22:41.6955432Z or.b64 %rd353, %rd352, %rd348; 2026-02-21T10:22:41.6955504Z shl.b64 %rd354, %rd353, 1; 2026-02-21T10:22:41.6955570Z add.s64 %rd355, %rd35, %rd354; 2026-02-21T10:22:41.6955640Z add.s64 %rd249, %rd355, 128; 2026-02-21T10:22:41.6955711Z cvt.s64.s32 %rd356, %r3699; 2026-02-21T10:22:41.6955777Z 
or.b64 %rd357, %rd356, %rd348; 2026-02-21T10:22:41.6955840Z shl.b64 %rd358, %rd357, 1; 2026-02-21T10:22:41.6955910Z add.s64 %rd359, %rd35, %rd358; 2026-02-21T10:22:41.6955975Z add.s64 %rd250, %rd359, 128; 2026-02-21T10:22:41.6956041Z cvt.s64.s32 %rd360, %r3700; 2026-02-21T10:22:41.6956105Z or.b64 %rd361, %rd360, %rd348; 2026-02-21T10:22:41.6956175Z shl.b64 %rd362, %rd361, 1; 2026-02-21T10:22:41.6956240Z add.s64 %rd363, %rd35, %rd362; 2026-02-21T10:22:41.6956309Z add.s64 %rd251, %rd363, 128; 2026-02-21T10:22:41.6956378Z cvt.s64.s32 %rd364, %r3701; 2026-02-21T10:22:41.6956443Z or.b64 %rd365, %rd364, %rd348; 2026-02-21T10:22:41.6956641Z shl.b64 %rd366, %rd365, 1; 2026-02-21T10:22:41.6956715Z add.s64 %rd367, %rd35, %rd366; 2026-02-21T10:22:41.6956787Z add.s64 %rd252, %rd367, 128; 2026-02-21T10:22:41.6956851Z cvt.s64.s32 %rd368, %r3702; 2026-02-21T10:22:41.6956919Z or.b64 %rd369, %rd368, %rd348; 2026-02-21T10:22:41.6956994Z shl.b64 %rd370, %rd369, 1; 2026-02-21T10:22:41.6957069Z add.s64 %rd371, %rd35, %rd370; 2026-02-21T10:22:41.6957133Z add.s64 %rd253, %rd371, 128; 2026-02-21T10:22:41.6957198Z cvt.s64.s32 %rd372, %r3703; 2026-02-21T10:22:41.6957272Z or.b64 %rd373, %rd372, %rd348; 2026-02-21T10:22:41.6957337Z shl.b64 %rd374, %rd373, 1; 2026-02-21T10:22:41.6957401Z add.s64 %rd375, %rd35, %rd374; 2026-02-21T10:22:41.6957474Z add.s64 %rd254, %rd375, 128; 2026-02-21T10:22:41.6957538Z cvt.s64.s32 %rd376, %r3704; 2026-02-21T10:22:41.6957719Z or.b64 %rd377, %rd376, %rd348; 2026-02-21T10:22:41.6957785Z shl.b64 %rd378, %rd377, 1; 2026-02-21T10:22:41.6957857Z add.s64 %rd379, %rd35, %rd378; 2026-02-21T10:22:41.6957921Z add.s64 %rd255, %rd379, 128; 2026-02-21T10:22:41.6958209Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.6958293Z add.s32 %r3447, %r3426, 24576; 2026-02-21T10:22:41.6958363Z // begin inline asm 2026-02-21T10:22:41.6958510Z cp.async.ca.shared.global [ %r3447 + 0 ], [ %rd248 + 0 ], 0x8, %r3427; 
2026-02-21T10:22:41.6958576Z // end inline asm 2026-02-21T10:22:41.6958646Z add.s32 %r3449, %r3426, 25600; 2026-02-21T10:22:41.6958709Z // begin inline asm 2026-02-21T10:22:41.6958850Z cp.async.ca.shared.global [ %r3449 + 0 ], [ %rd249 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6958918Z // end inline asm 2026-02-21T10:22:41.6958983Z add.s32 %r3451, %r3426, 26624; 2026-02-21T10:22:41.6959044Z // begin inline asm 2026-02-21T10:22:41.6959254Z cp.async.ca.shared.global [ %r3451 + 0 ], [ %rd250 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6959318Z // end inline asm 2026-02-21T10:22:41.6959382Z add.s32 %r3453, %r3426, 27648; 2026-02-21T10:22:41.6959446Z // begin inline asm 2026-02-21T10:22:41.6959591Z cp.async.ca.shared.global [ %r3453 + 0 ], [ %rd251 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6959651Z // end inline asm 2026-02-21T10:22:41.6959717Z add.s32 %r3455, %r3426, 28672; 2026-02-21T10:22:41.6959850Z // begin inline asm 2026-02-21T10:22:41.6959989Z cp.async.ca.shared.global [ %r3455 + 0 ], [ %rd252 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6960051Z // end inline asm 2026-02-21T10:22:41.6960119Z add.s32 %r3457, %r3426, 29696; 2026-02-21T10:22:41.6960181Z // begin inline asm 2026-02-21T10:22:41.6960317Z cp.async.ca.shared.global [ %r3457 + 0 ], [ %rd253 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6960380Z // end inline asm 2026-02-21T10:22:41.6960453Z add.s32 %r3459, %r3426, 30720; 2026-02-21T10:22:41.6960516Z // begin inline asm 2026-02-21T10:22:41.6960656Z cp.async.ca.shared.global [ %r3459 + 0 ], [ %rd254 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6960725Z // end inline asm 2026-02-21T10:22:41.6960789Z add.s32 %r3461, %r3426, 31744; 2026-02-21T10:22:41.6960849Z // begin inline asm 2026-02-21T10:22:41.6960987Z cp.async.ca.shared.global [ %r3461 + 0 ], [ %rd255 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6961053Z // end inline asm 2026-02-21T10:22:41.6961126Z cp.async.commit_group; 2026-02-21T10:22:41.6961343Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 
2026-02-21T10:22:41.6961414Z // begin inline asm 2026-02-21T10:22:41.6961551Z @%p143 mbarrier.arrive.expect_tx.shared.b64 _, [%r3417], 1024; 2026-02-21T10:22:41.6961612Z // end inline asm 2026-02-21T10:22:41.6961820Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.6961882Z bar.sync 0; 2026-02-21T10:22:41.6961955Z elect.sync %r3717|%p175, -1; 2026-02-21T10:22:41.6962032Z and.pred %p176, %p172, %p175; 2026-02-21T10:22:41.6962109Z and.pred %p146, %p1, %p176; 2026-02-21T10:22:41.6962192Z add.s32 %r3464, %r1928, 109568; 2026-02-21T10:22:41.6962263Z // begin inline asm 2026-02-21T10:22:41.6962602Z @%p146 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r3464], [%rd247, {%r5487, %r265}], [%r3417]; 2026-02-21T10:22:41.6962666Z // end inline asm 2026-02-21T10:22:41.6962870Z .loc 1 54 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:32 2026-02-21T10:22:41.6962946Z add.s64 %rd257, %rd351, 256; 2026-02-21T10:22:41.6963013Z add.s64 %rd258, %rd355, 256; 2026-02-21T10:22:41.6963080Z add.s64 %rd259, %rd359, 256; 2026-02-21T10:22:41.6963146Z add.s64 %rd260, %rd363, 256; 2026-02-21T10:22:41.6963217Z add.s64 %rd261, %rd367, 256; 2026-02-21T10:22:41.6963282Z add.s64 %rd262, %rd371, 256; 2026-02-21T10:22:41.6963346Z add.s64 %rd263, %rd375, 256; 2026-02-21T10:22:41.6963474Z add.s64 %rd264, %rd379, 256; 2026-02-21T10:22:41.6963680Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.6963747Z add.s32 %r3468, %r3426, 49152; 2026-02-21T10:22:41.6963818Z // begin inline asm 2026-02-21T10:22:41.6964009Z cp.async.ca.shared.global [ %r3468 + 0 ], [ %rd257 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6964069Z // end inline asm 2026-02-21T10:22:41.6964136Z add.s32 %r3470, %r3426, 50176; 2026-02-21T10:22:41.6964201Z // begin inline asm 2026-02-21T10:22:41.6964343Z cp.async.ca.shared.global [ %r3470 + 0 ], [ %rd258 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6964402Z 
// end inline asm 2026-02-21T10:22:41.6964471Z add.s32 %r3472, %r3426, 51200; 2026-02-21T10:22:41.6964532Z // begin inline asm 2026-02-21T10:22:41.6964669Z cp.async.ca.shared.global [ %r3472 + 0 ], [ %rd259 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6964729Z // end inline asm 2026-02-21T10:22:41.6964796Z add.s32 %r3474, %r3426, 52224; 2026-02-21T10:22:41.6964859Z // begin inline asm 2026-02-21T10:22:41.6965044Z cp.async.ca.shared.global [ %r3474 + 0 ], [ %rd260 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6965110Z // end inline asm 2026-02-21T10:22:41.6965173Z add.s32 %r3476, %r3426, 53248; 2026-02-21T10:22:41.6965237Z // begin inline asm 2026-02-21T10:22:41.6965371Z cp.async.ca.shared.global [ %r3476 + 0 ], [ %rd261 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6965435Z // end inline asm 2026-02-21T10:22:41.6965544Z add.s32 %r3478, %r3426, 54272; 2026-02-21T10:22:41.6965608Z // begin inline asm 2026-02-21T10:22:41.6965748Z cp.async.ca.shared.global [ %r3478 + 0 ], [ %rd262 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6965807Z // end inline asm 2026-02-21T10:22:41.6965872Z add.s32 %r3480, %r3426, 55296; 2026-02-21T10:22:41.6965937Z // begin inline asm 2026-02-21T10:22:41.6966074Z cp.async.ca.shared.global [ %r3480 + 0 ], [ %rd263 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6966133Z // end inline asm 2026-02-21T10:22:41.6966195Z add.s32 %r3482, %r3426, 56320; 2026-02-21T10:22:41.6966266Z // begin inline asm 2026-02-21T10:22:41.6966400Z cp.async.ca.shared.global [ %r3482 + 0 ], [ %rd264 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6966575Z // end inline asm 2026-02-21T10:22:41.6966655Z cp.async.commit_group; 2026-02-21T10:22:41.6966879Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.6966942Z // begin inline asm 2026-02-21T10:22:41.6967085Z @%p143 mbarrier.arrive.expect_tx.shared.b64 _, [%r3420], 1024; 2026-02-21T10:22:41.6967154Z // end inline asm 2026-02-21T10:22:41.6967360Z .loc 1 60 33 // 
cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.6967420Z bar.sync 0; 2026-02-21T10:22:41.6967496Z elect.sync %r3718|%p177, -1; 2026-02-21T10:22:41.6967566Z and.pred %p178, %p172, %p177; 2026-02-21T10:22:41.6967634Z and.pred %p148, %p1, %p178; 2026-02-21T10:22:41.6967706Z add.s32 %r3485, %r1928, 112640; 2026-02-21T10:22:41.6967771Z mov.b32 %r3487, 64; 2026-02-21T10:22:41.6967838Z // begin inline asm 2026-02-21T10:22:41.6968166Z @%p148 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r3485], [%rd247, {%r5487, %r3487}], [%r3420]; 2026-02-21T10:22:41.6968234Z // end inline asm 2026-02-21T10:22:41.6968437Z .loc 1 54 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:32 2026-02-21T10:22:41.6968506Z add.s64 %rd266, %rd351, 384; 2026-02-21T10:22:41.6968577Z add.s64 %rd267, %rd355, 384; 2026-02-21T10:22:41.6968642Z add.s64 %rd268, %rd359, 384; 2026-02-21T10:22:41.6968706Z add.s64 %rd269, %rd363, 384; 2026-02-21T10:22:41.6968776Z add.s64 %rd270, %rd367, 384; 2026-02-21T10:22:41.6968840Z add.s64 %rd271, %rd371, 384; 2026-02-21T10:22:41.6968904Z add.s64 %rd272, %rd375, 384; 2026-02-21T10:22:41.6968967Z add.s64 %rd273, %rd379, 384; 2026-02-21T10:22:41.6969173Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.6969333Z add.s32 %r3489, %r3426, 73728; 2026-02-21T10:22:41.6969396Z // begin inline asm 2026-02-21T10:22:41.6969540Z cp.async.ca.shared.global [ %r3489 + 0 ], [ %rd266 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6969599Z // end inline asm 2026-02-21T10:22:41.6969731Z add.s32 %r3491, %r3426, 74752; 2026-02-21T10:22:41.6969809Z // begin inline asm 2026-02-21T10:22:41.6969955Z cp.async.ca.shared.global [ %r3491 + 0 ], [ %rd267 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6970016Z // end inline asm 2026-02-21T10:22:41.6970080Z add.s32 %r3493, %r3426, 75776; 2026-02-21T10:22:41.6970146Z // begin inline asm 2026-02-21T10:22:41.6970279Z cp.async.ca.shared.global [ 
%r3493 + 0 ], [ %rd268 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6970338Z // end inline asm 2026-02-21T10:22:41.6970406Z add.s32 %r3495, %r3426, 76800; 2026-02-21T10:22:41.6970467Z // begin inline asm 2026-02-21T10:22:41.6970600Z cp.async.ca.shared.global [ %r3495 + 0 ], [ %rd269 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6970662Z // end inline asm 2026-02-21T10:22:41.6970796Z add.s32 %r3497, %r3426, 77824; 2026-02-21T10:22:41.6970861Z // begin inline asm 2026-02-21T10:22:41.6970997Z cp.async.ca.shared.global [ %r3497 + 0 ], [ %rd270 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6971064Z // end inline asm 2026-02-21T10:22:41.6971128Z add.s32 %r3499, %r3426, 78848; 2026-02-21T10:22:41.6971193Z // begin inline asm 2026-02-21T10:22:41.6971392Z cp.async.ca.shared.global [ %r3499 + 0 ], [ %rd271 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6971466Z // end inline asm 2026-02-21T10:22:41.6971530Z add.s32 %r3501, %r3426, 79872; 2026-02-21T10:22:41.6971593Z // begin inline asm 2026-02-21T10:22:41.6971735Z cp.async.ca.shared.global [ %r3501 + 0 ], [ %rd272 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6971795Z // end inline asm 2026-02-21T10:22:41.6971860Z add.s32 %r3503, %r3426, 80896; 2026-02-21T10:22:41.6971928Z // begin inline asm 2026-02-21T10:22:41.6972062Z cp.async.ca.shared.global [ %r3503 + 0 ], [ %rd273 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6972124Z // end inline asm 2026-02-21T10:22:41.6972198Z cp.async.commit_group; 2026-02-21T10:22:41.6972433Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.6972500Z // begin inline asm 2026-02-21T10:22:41.6972639Z @%p143 mbarrier.arrive.expect_tx.shared.b64 _, [%r3423], 1024; 2026-02-21T10:22:41.6972706Z // end inline asm 2026-02-21T10:22:41.6972914Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.6972975Z bar.sync 0; 2026-02-21T10:22:41.6973053Z elect.sync %r3719|%p179, -1; 2026-02-21T10:22:41.6973124Z and.pred %p180, %p172, %p179; 
2026-02-21T10:22:41.6973194Z and.pred %p150, %p1, %p180; 2026-02-21T10:22:41.6973259Z add.s32 %r3506, %r1928, 115712; 2026-02-21T10:22:41.6973329Z mov.b32 %r3508, 96; 2026-02-21T10:22:41.6973392Z // begin inline asm 2026-02-21T10:22:41.6973723Z @%p150 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r3506], [%rd247, {%r5487, %r3508}], [%r3423]; 2026-02-21T10:22:41.6973793Z // end inline asm 2026-02-21T10:22:41.6973997Z .loc 1 54 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:32 2026-02-21T10:22:41.6974064Z add.s64 %rd275, %rd351, 512; 2026-02-21T10:22:41.6974134Z add.s64 %rd276, %rd355, 512; 2026-02-21T10:22:41.6974201Z add.s64 %rd277, %rd359, 512; 2026-02-21T10:22:41.6974269Z add.s64 %rd278, %rd363, 512; 2026-02-21T10:22:41.6974334Z add.s64 %rd279, %rd367, 512; 2026-02-21T10:22:41.6974405Z add.s64 %rd280, %rd371, 512; 2026-02-21T10:22:41.6974470Z add.s64 %rd281, %rd375, 512; 2026-02-21T10:22:41.6974535Z add.s64 %rd282, %rd379, 512; 2026-02-21T10:22:41.6974742Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.6974806Z add.s32 %r3510, %r3426, 8192; 2026-02-21T10:22:41.6974867Z // begin inline asm 2026-02-21T10:22:41.6975010Z cp.async.ca.shared.global [ %r3510 + 0 ], [ %rd275 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6975137Z // end inline asm 2026-02-21T10:22:41.6975199Z add.s32 %r3512, %r3426, 9216; 2026-02-21T10:22:41.6975261Z // begin inline asm 2026-02-21T10:22:41.6975406Z cp.async.ca.shared.global [ %r3512 + 0 ], [ %rd276 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6975527Z // end inline asm 2026-02-21T10:22:41.6975594Z add.s32 %r3514, %r3426, 10240; 2026-02-21T10:22:41.6975665Z // begin inline asm 2026-02-21T10:22:41.6975802Z cp.async.ca.shared.global [ %r3514 + 0 ], [ %rd277 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6975863Z // end inline asm 2026-02-21T10:22:41.6975926Z add.s32 %r3516, %r3426, 11264; 2026-02-21T10:22:41.6975994Z // begin inline asm 
2026-02-21T10:22:41.6976130Z cp.async.ca.shared.global [ %r3516 + 0 ], [ %rd278 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6976190Z // end inline asm 2026-02-21T10:22:41.6976262Z add.s32 %r3518, %r3426, 12288; 2026-02-21T10:22:41.6976325Z // begin inline asm 2026-02-21T10:22:41.6976659Z cp.async.ca.shared.global [ %r3518 + 0 ], [ %rd279 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6976727Z // end inline asm 2026-02-21T10:22:41.6976800Z add.s32 %r3520, %r3426, 13312; 2026-02-21T10:22:41.6976863Z // begin inline asm 2026-02-21T10:22:41.6977002Z cp.async.ca.shared.global [ %r3520 + 0 ], [ %rd280 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6977079Z // end inline asm 2026-02-21T10:22:41.6977146Z add.s32 %r3522, %r3426, 14336; 2026-02-21T10:22:41.6977275Z // begin inline asm 2026-02-21T10:22:41.6977418Z cp.async.ca.shared.global [ %r3522 + 0 ], [ %rd281 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6977479Z // end inline asm 2026-02-21T10:22:41.6977543Z add.s32 %r3524, %r3426, 15360; 2026-02-21T10:22:41.6977605Z // begin inline asm 2026-02-21T10:22:41.6977746Z cp.async.ca.shared.global [ %r3524 + 0 ], [ %rd282 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6977806Z // end inline asm 2026-02-21T10:22:41.6977876Z cp.async.commit_group; 2026-02-21T10:22:41.6978097Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.6978161Z // begin inline asm 2026-02-21T10:22:41.6978293Z @%p143 mbarrier.arrive.expect_tx.shared.b64 _, [%r3415], 1024; 2026-02-21T10:22:41.6978351Z // end inline asm 2026-02-21T10:22:41.6978556Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.6978615Z bar.sync 0; 2026-02-21T10:22:41.6978689Z elect.sync %r3720|%p181, -1; 2026-02-21T10:22:41.6978765Z and.pred %p182, %p172, %p181; 2026-02-21T10:22:41.6978837Z and.pred %p152, %p1, %p182; 2026-02-21T10:22:41.6978903Z add.s32 %r3527, %r1928, 107520; 2026-02-21T10:22:41.6978972Z mov.b32 %r3529, 128; 2026-02-21T10:22:41.6979036Z // begin inline 
asm 2026-02-21T10:22:41.6979359Z @%p152 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r3527], [%rd247, {%r5487, %r3529}], [%r3415]; 2026-02-21T10:22:41.6979421Z // end inline asm 2026-02-21T10:22:41.6979632Z .loc 1 54 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:32 2026-02-21T10:22:41.6979702Z add.s64 %rd284, %rd351, 640; 2026-02-21T10:22:41.6979767Z add.s64 %rd285, %rd355, 640; 2026-02-21T10:22:41.6979837Z add.s64 %rd286, %rd359, 640; 2026-02-21T10:22:41.6979917Z add.s64 %rd287, %rd363, 640; 2026-02-21T10:22:41.6979986Z add.s64 %rd288, %rd367, 640; 2026-02-21T10:22:41.6980058Z add.s64 %rd289, %rd371, 640; 2026-02-21T10:22:41.6980125Z add.s64 %rd290, %rd375, 640; 2026-02-21T10:22:41.6980192Z add.s64 %rd291, %rd379, 640; 2026-02-21T10:22:41.6980391Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.6980462Z add.s32 %r3531, %r3426, 32768; 2026-02-21T10:22:41.6980526Z // begin inline asm 2026-02-21T10:22:41.6980666Z cp.async.ca.shared.global [ %r3531 + 0 ], [ %rd284 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6980733Z // end inline asm 2026-02-21T10:22:41.6980798Z add.s32 %r3533, %r3426, 33792; 2026-02-21T10:22:41.6980962Z // begin inline asm 2026-02-21T10:22:41.6981100Z cp.async.ca.shared.global [ %r3533 + 0 ], [ %rd285 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6981170Z // end inline asm 2026-02-21T10:22:41.6981235Z add.s32 %r3535, %r3426, 34816; 2026-02-21T10:22:41.6981359Z // begin inline asm 2026-02-21T10:22:41.6981502Z cp.async.ca.shared.global [ %r3535 + 0 ], [ %rd286 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6981573Z // end inline asm 2026-02-21T10:22:41.6981650Z add.s32 %r3537, %r3426, 35840; 2026-02-21T10:22:41.6981721Z // begin inline asm 2026-02-21T10:22:41.6981859Z cp.async.ca.shared.global [ %r3537 + 0 ], [ %rd287 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6981919Z // end inline asm 2026-02-21T10:22:41.6981984Z add.s32 %r3539, %r3426, 36864; 
2026-02-21T10:22:41.6982055Z // begin inline asm 2026-02-21T10:22:41.6982189Z cp.async.ca.shared.global [ %r3539 + 0 ], [ %rd288 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6982248Z // end inline asm 2026-02-21T10:22:41.6982318Z add.s32 %r3541, %r3426, 37888; 2026-02-21T10:22:41.6982437Z // begin inline asm 2026-02-21T10:22:41.6982577Z cp.async.ca.shared.global [ %r3541 + 0 ], [ %rd289 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6982638Z // end inline asm 2026-02-21T10:22:41.6982711Z add.s32 %r3543, %r3426, 38912; 2026-02-21T10:22:41.6982778Z // begin inline asm 2026-02-21T10:22:41.6982913Z cp.async.ca.shared.global [ %r3543 + 0 ], [ %rd290 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6983033Z // end inline asm 2026-02-21T10:22:41.6983100Z add.s32 %r3545, %r3426, 39936; 2026-02-21T10:22:41.6983161Z // begin inline asm 2026-02-21T10:22:41.6983304Z cp.async.ca.shared.global [ %r3545 + 0 ], [ %rd291 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6983366Z // end inline asm 2026-02-21T10:22:41.6983437Z cp.async.commit_group; 2026-02-21T10:22:41.6983648Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.6983714Z // begin inline asm 2026-02-21T10:22:41.6983847Z @%p143 mbarrier.arrive.expect_tx.shared.b64 _, [%r3418], 1024; 2026-02-21T10:22:41.6983908Z // end inline asm 2026-02-21T10:22:41.6984114Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.6984176Z bar.sync 0; 2026-02-21T10:22:41.6984245Z elect.sync %r3721|%p183, -1; 2026-02-21T10:22:41.6984314Z and.pred %p184, %p172, %p183; 2026-02-21T10:22:41.6984389Z and.pred %p154, %p1, %p184; 2026-02-21T10:22:41.6984457Z add.s32 %r3548, %r1928, 110592; 2026-02-21T10:22:41.6984519Z mov.b32 %r3550, 160; 2026-02-21T10:22:41.6984587Z // begin inline asm 2026-02-21T10:22:41.6984911Z @%p154 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r3548], [%rd247, {%r5487, %r3550}], [%r3418]; 2026-02-21T10:22:41.6984972Z // end inline 
asm 2026-02-21T10:22:41.6985182Z .loc 1 54 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:32 2026-02-21T10:22:41.6985248Z add.s64 %rd293, %rd351, 768; 2026-02-21T10:22:41.6985317Z add.s64 %rd294, %rd355, 768; 2026-02-21T10:22:41.6985387Z add.s64 %rd295, %rd359, 768; 2026-02-21T10:22:41.6985452Z add.s64 %rd296, %rd363, 768; 2026-02-21T10:22:41.6985517Z add.s64 %rd297, %rd367, 768; 2026-02-21T10:22:41.6985582Z add.s64 %rd298, %rd371, 768; 2026-02-21T10:22:41.6985652Z add.s64 %rd299, %rd375, 768; 2026-02-21T10:22:41.6985715Z add.s64 %rd300, %rd379, 768; 2026-02-21T10:22:41.6985916Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.6985988Z add.s32 %r3552, %r3426, 57344; 2026-02-21T10:22:41.6986051Z // begin inline asm 2026-02-21T10:22:41.6986190Z cp.async.ca.shared.global [ %r3552 + 0 ], [ %rd293 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6986250Z // end inline asm 2026-02-21T10:22:41.6986319Z add.s32 %r3554, %r3426, 58368; 2026-02-21T10:22:41.6986384Z // begin inline asm 2026-02-21T10:22:41.6986646Z cp.async.ca.shared.global [ %r3554 + 0 ], [ %rd294 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6986794Z // end inline asm 2026-02-21T10:22:41.6986862Z add.s32 %r3556, %r3426, 59392; 2026-02-21T10:22:41.6986936Z // begin inline asm 2026-02-21T10:22:41.6987075Z cp.async.ca.shared.global [ %r3556 + 0 ], [ %rd295 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6987209Z // end inline asm 2026-02-21T10:22:41.6987274Z add.s32 %r3558, %r3426, 60416; 2026-02-21T10:22:41.6987336Z // begin inline asm 2026-02-21T10:22:41.6987483Z cp.async.ca.shared.global [ %r3558 + 0 ], [ %rd296 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6987543Z // end inline asm 2026-02-21T10:22:41.6987606Z add.s32 %r3560, %r3426, 61440; 2026-02-21T10:22:41.6987675Z // begin inline asm 2026-02-21T10:22:41.6987807Z cp.async.ca.shared.global [ %r3560 + 0 ], [ %rd297 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6987868Z // end inline asm 2026-02-21T10:22:41.6987932Z add.s32 
%r3562, %r3426, 62464; 2026-02-21T10:22:41.6988001Z // begin inline asm 2026-02-21T10:22:41.6988134Z cp.async.ca.shared.global [ %r3562 + 0 ], [ %rd298 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6988263Z // end inline asm 2026-02-21T10:22:41.6988336Z add.s32 %r3564, %r3426, 63488; 2026-02-21T10:22:41.6988397Z // begin inline asm 2026-02-21T10:22:41.6988610Z cp.async.ca.shared.global [ %r3564 + 0 ], [ %rd299 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6988676Z // end inline asm 2026-02-21T10:22:41.6988747Z add.s32 %r3566, %r3426, 64512; 2026-02-21T10:22:41.6988810Z // begin inline asm 2026-02-21T10:22:41.6989011Z cp.async.ca.shared.global [ %r3566 + 0 ], [ %rd300 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6989082Z // end inline asm 2026-02-21T10:22:41.6989150Z cp.async.commit_group; 2026-02-21T10:22:41.6989363Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.6989432Z // begin inline asm 2026-02-21T10:22:41.6989562Z @%p143 mbarrier.arrive.expect_tx.shared.b64 _, [%r3421], 1024; 2026-02-21T10:22:41.6989622Z // end inline asm 2026-02-21T10:22:41.6989829Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.6989898Z bar.sync 0; 2026-02-21T10:22:41.6989969Z elect.sync %r3722|%p185, -1; 2026-02-21T10:22:41.6990038Z and.pred %p186, %p172, %p185; 2026-02-21T10:22:41.6990128Z and.pred %p156, %p1, %p186; 2026-02-21T10:22:41.6990197Z add.s32 %r3569, %r1928, 113664; 2026-02-21T10:22:41.6990259Z mov.b32 %r3571, 192; 2026-02-21T10:22:41.6990321Z // begin inline asm 2026-02-21T10:22:41.6990653Z @%p156 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r3569], [%rd247, {%r5487, %r3571}], [%r3421]; 2026-02-21T10:22:41.6990715Z // end inline asm 2026-02-21T10:22:41.6990918Z .loc 1 54 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:32 2026-02-21T10:22:41.6990992Z add.s64 %rd302, %rd351, 896; 2026-02-21T10:22:41.6991058Z add.s64 %rd303, %rd355, 896; 
2026-02-21T10:22:41.6991125Z add.s64 %rd304, %rd359, 896; 2026-02-21T10:22:41.6991197Z add.s64 %rd305, %rd363, 896; 2026-02-21T10:22:41.6991267Z add.s64 %rd306, %rd367, 896; 2026-02-21T10:22:41.6991334Z add.s64 %rd307, %rd371, 896; 2026-02-21T10:22:41.6991408Z add.s64 %rd308, %rd375, 896; 2026-02-21T10:22:41.6991472Z add.s64 %rd309, %rd379, 896; 2026-02-21T10:22:41.6991675Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.6991741Z add.s32 %r3573, %r3426, 81920; 2026-02-21T10:22:41.6991812Z // begin inline asm 2026-02-21T10:22:41.6991950Z cp.async.ca.shared.global [ %r3573 + 0 ], [ %rd302 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6992011Z // end inline asm 2026-02-21T10:22:41.6992082Z add.s32 %r3575, %r3426, 82944; 2026-02-21T10:22:41.6992143Z // begin inline asm 2026-02-21T10:22:41.6992280Z cp.async.ca.shared.global [ %r3575 + 0 ], [ %rd303 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6992338Z // end inline asm 2026-02-21T10:22:41.6992411Z add.s32 %r3577, %r3426, 83968; 2026-02-21T10:22:41.6992474Z // begin inline asm 2026-02-21T10:22:41.6992672Z cp.async.ca.shared.global [ %r3577 + 0 ], [ %rd304 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6992741Z // end inline asm 2026-02-21T10:22:41.6992807Z add.s32 %r3579, %r3426, 84992; 2026-02-21T10:22:41.6992872Z // begin inline asm 2026-02-21T10:22:41.6993072Z cp.async.ca.shared.global [ %r3579 + 0 ], [ %rd305 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6993133Z // end inline asm 2026-02-21T10:22:41.6993205Z add.s32 %r3581, %r3426, 86016; 2026-02-21T10:22:41.6993267Z // begin inline asm 2026-02-21T10:22:41.6993409Z cp.async.ca.shared.global [ %r3581 + 0 ], [ %rd306 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6993469Z // end inline asm 2026-02-21T10:22:41.6993533Z add.s32 %r3583, %r3426, 87040; 2026-02-21T10:22:41.6993600Z // begin inline asm 2026-02-21T10:22:41.6993733Z cp.async.ca.shared.global [ %r3583 + 0 ], [ %rd307 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6993793Z // end inline asm 
2026-02-21T10:22:41.6993855Z add.s32 %r3585, %r3426, 88064; 2026-02-21T10:22:41.6993928Z // begin inline asm 2026-02-21T10:22:41.6994114Z cp.async.ca.shared.global [ %r3585 + 0 ], [ %rd308 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6994176Z // end inline asm 2026-02-21T10:22:41.6994248Z add.s32 %r3587, %r3426, 89088; 2026-02-21T10:22:41.6994313Z // begin inline asm 2026-02-21T10:22:41.6994446Z cp.async.ca.shared.global [ %r3587 + 0 ], [ %rd309 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6994506Z // end inline asm 2026-02-21T10:22:41.6994631Z cp.async.commit_group; 2026-02-21T10:22:41.6994844Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.6994907Z // begin inline asm 2026-02-21T10:22:41.6995040Z @%p143 mbarrier.arrive.expect_tx.shared.b64 _, [%r3424], 1024; 2026-02-21T10:22:41.6995099Z // end inline asm 2026-02-21T10:22:41.6995299Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.6995363Z bar.sync 0; 2026-02-21T10:22:41.6995431Z elect.sync %r3723|%p187, -1; 2026-02-21T10:22:41.6995505Z and.pred %p188, %p172, %p187; 2026-02-21T10:22:41.6995574Z and.pred %p158, %p1, %p188; 2026-02-21T10:22:41.6995647Z add.s32 %r3590, %r1928, 116736; 2026-02-21T10:22:41.6995709Z mov.b32 %r3592, 224; 2026-02-21T10:22:41.6995774Z // begin inline asm 2026-02-21T10:22:41.6996105Z @%p158 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r3590], [%rd247, {%r5487, %r3592}], [%r3424]; 2026-02-21T10:22:41.6996165Z // end inline asm 2026-02-21T10:22:41.6996378Z .loc 1 54 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:32 2026-02-21T10:22:41.6996568Z add.s64 %rd311, %rd351, 1024; 2026-02-21T10:22:41.6996640Z add.s64 %rd312, %rd355, 1024; 2026-02-21T10:22:41.6996706Z add.s64 %rd313, %rd359, 1024; 2026-02-21T10:22:41.6996771Z add.s64 %rd314, %rd363, 1024; 2026-02-21T10:22:41.6996838Z add.s64 %rd315, %rd367, 1024; 2026-02-21T10:22:41.6996903Z add.s64 %rd316, 
%rd371, 1024; 2026-02-21T10:22:41.6996973Z add.s64 %rd317, %rd375, 1024; 2026-02-21T10:22:41.6997056Z add.s64 %rd318, %rd379, 1024; 2026-02-21T10:22:41.6997258Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.6997326Z add.s32 %r3594, %r3426, 16384; 2026-02-21T10:22:41.6997396Z // begin inline asm 2026-02-21T10:22:41.6997536Z cp.async.ca.shared.global [ %r3594 + 0 ], [ %rd311 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6997598Z // end inline asm 2026-02-21T10:22:41.6997664Z add.s32 %r3596, %r3426, 17408; 2026-02-21T10:22:41.6997733Z // begin inline asm 2026-02-21T10:22:41.6997869Z cp.async.ca.shared.global [ %r3596 + 0 ], [ %rd312 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6997929Z // end inline asm 2026-02-21T10:22:41.6998006Z add.s32 %r3598, %r3426, 18432; 2026-02-21T10:22:41.6998069Z // begin inline asm 2026-02-21T10:22:41.6998212Z cp.async.ca.shared.global [ %r3598 + 0 ], [ %rd313 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6998273Z // end inline asm 2026-02-21T10:22:41.6998428Z add.s32 %r3600, %r3426, 19456; 2026-02-21T10:22:41.6998491Z // begin inline asm 2026-02-21T10:22:41.6998628Z cp.async.ca.shared.global [ %r3600 + 0 ], [ %rd314 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6998693Z // end inline asm 2026-02-21T10:22:41.6998820Z add.s32 %r3602, %r3426, 20480; 2026-02-21T10:22:41.6998882Z // begin inline asm 2026-02-21T10:22:41.6999027Z cp.async.ca.shared.global [ %r3602 + 0 ], [ %rd315 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6999089Z // end inline asm 2026-02-21T10:22:41.6999153Z add.s32 %r3604, %r3426, 21504; 2026-02-21T10:22:41.6999216Z // begin inline asm 2026-02-21T10:22:41.6999368Z cp.async.ca.shared.global [ %r3604 + 0 ], [ %rd316 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.6999430Z // end inline asm 2026-02-21T10:22:41.6999494Z add.s32 %r3606, %r3426, 22528; 2026-02-21T10:22:41.6999561Z // begin inline asm 2026-02-21T10:22:41.6999697Z cp.async.ca.shared.global [ %r3606 + 0 ], [ %rd317 + 0 ], 0x8, %r3427; 
2026-02-21T10:22:41.6999760Z // end inline asm 2026-02-21T10:22:41.6999890Z add.s32 %r3608, %r3426, 23552; 2026-02-21T10:22:41.6999961Z // begin inline asm 2026-02-21T10:22:41.7000097Z cp.async.ca.shared.global [ %r3608 + 0 ], [ %rd318 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7000160Z // end inline asm 2026-02-21T10:22:41.7000236Z cp.async.commit_group; 2026-02-21T10:22:41.7000525Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.7000591Z // begin inline asm 2026-02-21T10:22:41.7000731Z @%p143 mbarrier.arrive.expect_tx.shared.b64 _, [%r3416], 1024; 2026-02-21T10:22:41.7000794Z // end inline asm 2026-02-21T10:22:41.7001007Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.7001069Z bar.sync 0; 2026-02-21T10:22:41.7001146Z elect.sync %r3724|%p189, -1; 2026-02-21T10:22:41.7001217Z and.pred %p190, %p172, %p189; 2026-02-21T10:22:41.7001288Z and.pred %p160, %p1, %p190; 2026-02-21T10:22:41.7001365Z add.s32 %r3611, %r1928, 108544; 2026-02-21T10:22:41.7001428Z mov.b32 %r5491, 256; 2026-02-21T10:22:41.7001492Z // begin inline asm 2026-02-21T10:22:41.7001827Z @%p160 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r3611], [%rd247, {%r5487, %r5491}], [%r3416]; 2026-02-21T10:22:41.7001891Z // end inline asm 2026-02-21T10:22:41.7002098Z .loc 1 54 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:32 2026-02-21T10:22:41.7002165Z add.s64 %rd320, %rd351, 1152; 2026-02-21T10:22:41.7002238Z add.s64 %rd321, %rd355, 1152; 2026-02-21T10:22:41.7002302Z add.s64 %rd322, %rd359, 1152; 2026-02-21T10:22:41.7002370Z add.s64 %rd323, %rd363, 1152; 2026-02-21T10:22:41.7002443Z add.s64 %rd324, %rd367, 1152; 2026-02-21T10:22:41.7002512Z add.s64 %rd325, %rd371, 1152; 2026-02-21T10:22:41.7002575Z add.s64 %rd326, %rd375, 1152; 2026-02-21T10:22:41.7002639Z add.s64 %rd327, %rd379, 1152; 2026-02-21T10:22:41.7002847Z .loc 1 54 80 // 
cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.7002915Z add.s32 %r3615, %r3426, 40960; 2026-02-21T10:22:41.7002977Z // begin inline asm 2026-02-21T10:22:41.7003120Z cp.async.ca.shared.global [ %r3615 + 0 ], [ %rd320 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7003184Z // end inline asm 2026-02-21T10:22:41.7003248Z add.s32 %r3617, %r3426, 41984; 2026-02-21T10:22:41.7003317Z // begin inline asm 2026-02-21T10:22:41.7003453Z cp.async.ca.shared.global [ %r3617 + 0 ], [ %rd321 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7003511Z // end inline asm 2026-02-21T10:22:41.7003577Z add.s32 %r3619, %r3426, 43008; 2026-02-21T10:22:41.7003643Z // begin inline asm 2026-02-21T10:22:41.7003778Z cp.async.ca.shared.global [ %r3619 + 0 ], [ %rd322 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7003837Z // end inline asm 2026-02-21T10:22:41.7003906Z add.s32 %r3621, %r3426, 44032; 2026-02-21T10:22:41.7003969Z // begin inline asm 2026-02-21T10:22:41.7004108Z cp.async.ca.shared.global [ %r3621 + 0 ], [ %rd323 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7004253Z // end inline asm 2026-02-21T10:22:41.7004334Z add.s32 %r3623, %r3426, 45056; 2026-02-21T10:22:41.7004399Z // begin inline asm 2026-02-21T10:22:41.7004534Z cp.async.ca.shared.global [ %r3623 + 0 ], [ %rd324 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7004653Z // end inline asm 2026-02-21T10:22:41.7004717Z add.s32 %r3625, %r3426, 46080; 2026-02-21T10:22:41.7004783Z // begin inline asm 2026-02-21T10:22:41.7004921Z cp.async.ca.shared.global [ %r3625 + 0 ], [ %rd325 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7004982Z // end inline asm 2026-02-21T10:22:41.7005046Z add.s32 %r3627, %r3426, 47104; 2026-02-21T10:22:41.7005106Z // begin inline asm 2026-02-21T10:22:41.7005245Z cp.async.ca.shared.global [ %r3627 + 0 ], [ %rd326 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7005304Z // end inline asm 2026-02-21T10:22:41.7005369Z add.s32 %r3629, %r3426, 48128; 2026-02-21T10:22:41.7005435Z // begin inline asm 2026-02-21T10:22:41.7005618Z 
cp.async.ca.shared.global [ %r3629 + 0 ], [ %rd327 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7005679Z // end inline asm 2026-02-21T10:22:41.7005748Z cp.async.commit_group; 2026-02-21T10:22:41.7005962Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.7006029Z // begin inline asm 2026-02-21T10:22:41.7006157Z @%p143 mbarrier.arrive.expect_tx.shared.b64 _, [%r3419], 1024; 2026-02-21T10:22:41.7006276Z // end inline asm 2026-02-21T10:22:41.7006595Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.7006661Z bar.sync 0; 2026-02-21T10:22:41.7006736Z elect.sync %r3725|%p191, -1; 2026-02-21T10:22:41.7006806Z and.pred %p192, %p172, %p191; 2026-02-21T10:22:41.7006874Z and.pred %p162, %p1, %p192; 2026-02-21T10:22:41.7006941Z add.s32 %r3632, %r1928, 111616; 2026-02-21T10:22:41.7007020Z mov.b32 %r3634, 288; 2026-02-21T10:22:41.7007084Z // begin inline asm 2026-02-21T10:22:41.7007409Z @%p162 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r3632], [%rd247, {%r5487, %r3634}], [%r3419]; 2026-02-21T10:22:41.7007477Z // end inline asm 2026-02-21T10:22:41.7007679Z .loc 1 54 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:32 2026-02-21T10:22:41.7007751Z add.s64 %rd329, %rd351, 1280; 2026-02-21T10:22:41.7007824Z add.s64 %rd330, %rd355, 1280; 2026-02-21T10:22:41.7007893Z add.s64 %rd331, %rd359, 1280; 2026-02-21T10:22:41.7007960Z add.s64 %rd332, %rd363, 1280; 2026-02-21T10:22:41.7008024Z add.s64 %rd333, %rd367, 1280; 2026-02-21T10:22:41.7008096Z add.s64 %rd334, %rd371, 1280; 2026-02-21T10:22:41.7008160Z add.s64 %rd335, %rd375, 1280; 2026-02-21T10:22:41.7008223Z add.s64 %rd336, %rd379, 1280; 2026-02-21T10:22:41.7008432Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.7008498Z add.s32 %r3636, %r3426, 65536; 2026-02-21T10:22:41.7008565Z // begin inline asm 2026-02-21T10:22:41.7008710Z 
cp.async.ca.shared.global [ %r3636 + 0 ], [ %rd329 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7008771Z // end inline asm 2026-02-21T10:22:41.7008837Z add.s32 %r3638, %r3426, 66560; 2026-02-21T10:22:41.7008906Z // begin inline asm 2026-02-21T10:22:41.7009048Z cp.async.ca.shared.global [ %r3638 + 0 ], [ %rd330 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7009108Z // end inline asm 2026-02-21T10:22:41.7009176Z add.s32 %r3640, %r3426, 67584; 2026-02-21T10:22:41.7009244Z // begin inline asm 2026-02-21T10:22:41.7009378Z cp.async.ca.shared.global [ %r3640 + 0 ], [ %rd331 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7009437Z // end inline asm 2026-02-21T10:22:41.7009502Z add.s32 %r3642, %r3426, 68608; 2026-02-21T10:22:41.7009571Z // begin inline asm 2026-02-21T10:22:41.7009705Z cp.async.ca.shared.global [ %r3642 + 0 ], [ %rd332 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7009764Z // end inline asm 2026-02-21T10:22:41.7009836Z add.s32 %r3644, %r3426, 69632; 2026-02-21T10:22:41.7009992Z // begin inline asm 2026-02-21T10:22:41.7010129Z cp.async.ca.shared.global [ %r3644 + 0 ], [ %rd333 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7018279Z // end inline asm 2026-02-21T10:22:41.7018393Z add.s32 %r3646, %r3426, 70656; 2026-02-21T10:22:41.7018623Z // begin inline asm 2026-02-21T10:22:41.7018796Z cp.async.ca.shared.global [ %r3646 + 0 ], [ %rd334 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7018873Z // end inline asm 2026-02-21T10:22:41.7018948Z add.s32 %r3648, %r3426, 71680; 2026-02-21T10:22:41.7019014Z // begin inline asm 2026-02-21T10:22:41.7019172Z cp.async.ca.shared.global [ %r3648 + 0 ], [ %rd335 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7019234Z // end inline asm 2026-02-21T10:22:41.7019301Z add.s32 %r3650, %r3426, 72704; 2026-02-21T10:22:41.7019362Z // begin inline asm 2026-02-21T10:22:41.7019508Z cp.async.ca.shared.global [ %r3650 + 0 ], [ %rd336 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7019568Z // end inline asm 2026-02-21T10:22:41.7019642Z cp.async.commit_group; 2026-02-21T10:22:41.7019953Z .loc 1 
26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.7020017Z // begin inline asm 2026-02-21T10:22:41.7020162Z @%p143 mbarrier.arrive.expect_tx.shared.b64 _, [%r3422], 1024; 2026-02-21T10:22:41.7020228Z // end inline asm 2026-02-21T10:22:41.7020501Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.7020561Z bar.sync 0; 2026-02-21T10:22:41.7020637Z elect.sync %r3726|%p193, -1; 2026-02-21T10:22:41.7020718Z and.pred %p194, %p172, %p193; 2026-02-21T10:22:41.7020790Z and.pred %p164, %p1, %p194; 2026-02-21T10:22:41.7020858Z add.s32 %r3653, %r1928, 114688; 2026-02-21T10:22:41.7020924Z mov.b32 %r3655, 320; 2026-02-21T10:22:41.7020986Z // begin inline asm 2026-02-21T10:22:41.7021332Z @%p164 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r3653], [%rd247, {%r5487, %r3655}], [%r3422]; 2026-02-21T10:22:41.7021403Z // end inline asm 2026-02-21T10:22:41.7021625Z .loc 1 54 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:32 2026-02-21T10:22:41.7021696Z add.s64 %rd338, %rd351, 1408; 2026-02-21T10:22:41.7021764Z add.s64 %rd339, %rd355, 1408; 2026-02-21T10:22:41.7021832Z add.s64 %rd340, %rd359, 1408; 2026-02-21T10:22:41.7021897Z add.s64 %rd341, %rd363, 1408; 2026-02-21T10:22:41.7021963Z add.s64 %rd342, %rd367, 1408; 2026-02-21T10:22:41.7022031Z add.s64 %rd343, %rd371, 1408; 2026-02-21T10:22:41.7022094Z add.s64 %rd344, %rd375, 1408; 2026-02-21T10:22:41.7022158Z add.s64 %rd345, %rd379, 1408; 2026-02-21T10:22:41.7022376Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.7022444Z add.s32 %r3657, %r3426, 90112; 2026-02-21T10:22:41.7022508Z // begin inline asm 2026-02-21T10:22:41.7022660Z cp.async.ca.shared.global [ %r3657 + 0 ], [ %rd338 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7022731Z // end inline asm 2026-02-21T10:22:41.7022794Z add.s32 %r3659, %r3426, 91136; 2026-02-21T10:22:41.7022855Z // begin inline 
asm 2026-02-21T10:22:41.7023006Z cp.async.ca.shared.global [ %r3659 + 0 ], [ %rd339 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7023077Z // end inline asm 2026-02-21T10:22:41.7023144Z add.s32 %r3661, %r3426, 92160; 2026-02-21T10:22:41.7023206Z // begin inline asm 2026-02-21T10:22:41.7023358Z cp.async.ca.shared.global [ %r3661 + 0 ], [ %rd340 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7023420Z // end inline asm 2026-02-21T10:22:41.7023484Z add.s32 %r3663, %r3426, 93184; 2026-02-21T10:22:41.7023551Z // begin inline asm 2026-02-21T10:22:41.7023695Z cp.async.ca.shared.global [ %r3663 + 0 ], [ %rd341 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7023759Z // end inline asm 2026-02-21T10:22:41.7023830Z add.s32 %r3665, %r3426, 94208; 2026-02-21T10:22:41.7023891Z // begin inline asm 2026-02-21T10:22:41.7024031Z cp.async.ca.shared.global [ %r3665 + 0 ], [ %rd342 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7024192Z // end inline asm 2026-02-21T10:22:41.7024265Z add.s32 %r3667, %r3426, 95232; 2026-02-21T10:22:41.7024327Z // begin inline asm 2026-02-21T10:22:41.7024466Z cp.async.ca.shared.global [ %r3667 + 0 ], [ %rd343 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7024581Z // end inline asm 2026-02-21T10:22:41.7024644Z add.s32 %r3669, %r3426, 96256; 2026-02-21T10:22:41.7024708Z // begin inline asm 2026-02-21T10:22:41.7024848Z cp.async.ca.shared.global [ %r3669 + 0 ], [ %rd344 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7024912Z // end inline asm 2026-02-21T10:22:41.7024978Z add.s32 %r3671, %r3426, 97280; 2026-02-21T10:22:41.7025038Z // begin inline asm 2026-02-21T10:22:41.7025183Z cp.async.ca.shared.global [ %r3671 + 0 ], [ %rd345 + 0 ], 0x8, %r3427; 2026-02-21T10:22:41.7025244Z // end inline asm 2026-02-21T10:22:41.7025317Z cp.async.commit_group; 2026-02-21T10:22:41.7025549Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.7025667Z // begin inline asm 2026-02-21T10:22:41.7025808Z @%p143 mbarrier.arrive.expect_tx.shared.b64 _, [%r3425], 1024; 
2026-02-21T10:22:41.7025868Z // end inline asm 2026-02-21T10:22:41.7026092Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.7026157Z bar.sync 0; 2026-02-21T10:22:41.7026232Z elect.sync %r3727|%p195, -1; 2026-02-21T10:22:41.7026368Z and.pred %p196, %p172, %p195; 2026-02-21T10:22:41.7026443Z and.pred %p166, %p1, %p196; 2026-02-21T10:22:41.7026637Z add.s32 %r3674, %r1928, 117760; 2026-02-21T10:22:41.7026703Z mov.b32 %r3676, 352; 2026-02-21T10:22:41.7026773Z // begin inline asm 2026-02-21T10:22:41.7027130Z @%p166 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r3674], [%rd247, {%r5487, %r3676}], [%r3425]; 2026-02-21T10:22:41.7027194Z // end inline asm 2026-02-21T10:22:41.7027416Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.7027488Z @%p171 bra $L__BB0_14; 2026-02-21T10:22:41.7027582Z // %bb.8: // %.lr.ph264 2026-02-21T10:22:41.7027656Z shl.b32 %r3734, %r130, 5; 2026-02-21T10:22:41.7027725Z add.s32 %r142, %r3734, -3; 2026-02-21T10:22:41.7027790Z shl.b32 %r3736, %r5435, 6; 2026-02-21T10:22:41.7027853Z shl.b32 %r3738, %r5441, 5; 2026-02-21T10:22:41.7027925Z shl.b32 %r3740, %r5436, 1; 2026-02-21T10:22:41.7027989Z or.b32 %r3741, %r3736, %r3738; 2026-02-21T10:22:41.7028054Z or.b32 %r143, %r3741, %r3740; 2026-02-21T10:22:41.7028123Z xor.b32 %r144, %r143, 8; 2026-02-21T10:22:41.7028186Z xor.b32 %r145, %r143, 16; 2026-02-21T10:22:41.7028248Z xor.b32 %r146, %r143, 24; 2026-02-21T10:22:41.7028323Z xor.b32 %r147, %r143, 32; 2026-02-21T10:22:41.7028393Z xor.b32 %r148, %r143, 40; 2026-02-21T10:22:41.7028512Z xor.b32 %r149, %r143, 48; 2026-02-21T10:22:41.7028597Z xor.b32 %r150, %r143, 56; 2026-02-21T10:22:41.7028680Z and.b32 %r3743, %r5437, 32; 2026-02-21T10:22:41.7028746Z or.b32 %r151, %r3743, %r5436; 2026-02-21T10:22:41.7028810Z shl.b32 %r3744, %r5436, 7; 2026-02-21T10:22:41.7028881Z shr.u32 %r3746, %r5435, 3; 2026-02-21T10:22:41.7028945Z or.b32 
%r3747, %r3744, %r3746; 2026-02-21T10:22:41.7029011Z or.b32 %r3748, %r3747, %r5438; 2026-02-21T10:22:41.7029074Z add.s32 %r4127, %r1928, 98304; 2026-02-21T10:22:41.7029144Z add.s32 %r152, %r4127, %r3748; 2026-02-21T10:22:41.7029211Z xor.b32 %r3751, %r3748, 16; 2026-02-21T10:22:41.7029276Z add.s32 %r153, %r4127, %r3751; 2026-02-21T10:22:41.7029349Z xor.b32 %r3752, %r3748, 32; 2026-02-21T10:22:41.7029412Z add.s32 %r154, %r4127, %r3752; 2026-02-21T10:22:41.7029474Z xor.b32 %r3753, %r3748, 48; 2026-02-21T10:22:41.7029538Z add.s32 %r155, %r4127, %r3753; 2026-02-21T10:22:41.7029609Z xor.b32 %r3754, %r3748, 64; 2026-02-21T10:22:41.7029672Z add.s32 %r156, %r4127, %r3754; 2026-02-21T10:22:41.7029735Z xor.b32 %r3755, %r3748, 80; 2026-02-21T10:22:41.7029891Z add.s32 %r157, %r4127, %r3755; 2026-02-21T10:22:41.7029957Z xor.b32 %r3756, %r3748, 96; 2026-02-21T10:22:41.7030020Z add.s32 %r158, %r4127, %r3756; 2026-02-21T10:22:41.7030086Z xor.b32 %r3757, %r3748, 112; 2026-02-21T10:22:41.7030155Z add.s32 %r159, %r4127, %r3757; 2026-02-21T10:22:41.7030284Z bfe.u32 %r3758, %r4127, 4, 14; 2026-02-21T10:22:41.7030361Z cvt.u64.u32 %rd380, %r3758; 2026-02-21T10:22:41.7030454Z or.b64 %rd388, %rd380, 4611686293322072064; 2026-02-21T10:22:41.7030520Z add.s32 %r3759, %r1928, 98336; 2026-02-21T10:22:41.7030583Z bfe.u32 %r3760, %r3759, 4, 14; 2026-02-21T10:22:41.7030659Z cvt.u64.u32 %rd381, %r3760; 2026-02-21T10:22:41.7030738Z or.b64 %rd389, %rd381, 4611686293322072064; 2026-02-21T10:22:41.7030803Z add.s32 %r3761, %r1928, 98368; 2026-02-21T10:22:41.7030866Z bfe.u32 %r3762, %r3761, 4, 14; 2026-02-21T10:22:41.7030938Z cvt.u64.u32 %rd382, %r3762; 2026-02-21T10:22:41.7031011Z or.b64 %rd390, %rd382, 4611686293322072064; 2026-02-21T10:22:41.7031075Z add.s32 %r3763, %r1928, 98400; 2026-02-21T10:22:41.7031237Z bfe.u32 %r3764, %r3763, 4, 14; 2026-02-21T10:22:41.7031304Z cvt.u64.u32 %rd383, %r3764; 2026-02-21T10:22:41.7031381Z or.b64 %rd391, %rd383, 4611686293322072064; 2026-02-21T10:22:41.7031453Z 
add.s32 %r3765, %r1928, 102400; 2026-02-21T10:22:41.7031523Z bfe.u32 %r3766, %r3765, 4, 14; 2026-02-21T10:22:41.7031591Z cvt.u64.u32 %rd384, %r3766; 2026-02-21T10:22:41.7031739Z or.b64 %rd392, %rd384, 4611686293322072064; 2026-02-21T10:22:41.7031815Z add.s32 %r3767, %r1928, 102432; 2026-02-21T10:22:41.7031879Z bfe.u32 %r3768, %r3767, 4, 14; 2026-02-21T10:22:41.7031945Z cvt.u64.u32 %rd385, %r3768; 2026-02-21T10:22:41.7032026Z or.b64 %rd393, %rd385, 4611686293322072064; 2026-02-21T10:22:41.7032089Z add.s32 %r3769, %r1928, 102464; 2026-02-21T10:22:41.7032152Z bfe.u32 %r3770, %r3769, 4, 14; 2026-02-21T10:22:41.7032216Z cvt.u64.u32 %rd386, %r3770; 2026-02-21T10:22:41.7032293Z or.b64 %rd394, %rd386, 4611686293322072064; 2026-02-21T10:22:41.7032360Z add.s32 %r3771, %r1928, 102496; 2026-02-21T10:22:41.7032425Z bfe.u32 %r3772, %r3771, 4, 14; 2026-02-21T10:22:41.7032493Z cvt.u64.u32 %rd387, %r3772; 2026-02-21T10:22:41.7032566Z or.b64 %rd395, %rd387, 4611686293322072064; 2026-02-21T10:22:41.7032635Z and.b32 %r3774, %r5439, 3072; 2026-02-21T10:22:41.7032699Z shl.b32 %r3776, %r5435, 3; 2026-02-21T10:22:41.7032767Z shl.b32 %r3777, %r5441, 2; 2026-02-21T10:22:41.7032833Z or.b32 %r3778, %r3774, %r5440; 2026-02-21T10:22:41.7032896Z or.b32 %r3779, %r3776, %r3777; 2026-02-21T10:22:41.7032970Z xor.b32 %r3780, %r3778, %r3779; 2026-02-21T10:22:41.7033035Z add.s32 %r160, %r4127, %r3780; 2026-02-21T10:22:41.7033096Z and.b32 %r3782, %r5442, 3072; 2026-02-21T10:22:41.7033162Z or.b32 %r3784, %r3782, %r5438; 2026-02-21T10:22:41.7033232Z xor.b32 %r3785, %r3784, %r5443; 2026-02-21T10:22:41.7033295Z add.s32 %r5394, %r4127, %r3785; 2026-02-21T10:22:41.7033359Z add.s32 %r5399, %r5394, 512; 2026-02-21T10:22:41.7033430Z shl.b32 %r3786, %r128, 5; 2026-02-21T10:22:41.7033499Z shl.b32 %r3787, %r129, 5; 2026-02-21T10:22:41.7033564Z add.s32 %r163, %r3786, %r3787; 2026-02-21T10:22:41.7033638Z mov.b32 %r5497, 0f00000000; 2026-02-21T10:22:41.7033698Z mov.b32 %r5494, 2; 2026-02-21T10:22:41.7033762Z mov.b32 
%r5493, -1; 2026-02-21T10:22:41.7033822Z mov.b32 %r5490, 0; 2026-02-21T10:22:41.7033886Z mov.b32 %r5489, 1; 2026-02-21T10:22:41.7033950Z mov.b32 %r5486, %r5485; 2026-02-21T10:22:41.7034014Z mov.b32 %r5488, %r5487; 2026-02-21T10:22:41.7034080Z mov.b32 %r5492, %r5490; 2026-02-21T10:22:41.7034141Z mov.b32 %r5495, %r5485; 2026-02-21T10:22:41.7034204Z mov.b32 %r5496, %r5487; 2026-02-21T10:22:41.7034265Z mov.b32 %r5498, %r5497; 2026-02-21T10:22:41.7034331Z mov.b32 %r5499, %r5497; 2026-02-21T10:22:41.7034391Z mov.b32 %r5500, %r5497; 2026-02-21T10:22:41.7034452Z mov.b32 %r5501, %r5497; 2026-02-21T10:22:41.7034517Z mov.b32 %r5502, %r5497; 2026-02-21T10:22:41.7034577Z mov.b32 %r5503, %r5497; 2026-02-21T10:22:41.7034650Z mov.b32 %r5504, %r5497; 2026-02-21T10:22:41.7034778Z mov.b32 %r5505, %r5497; 2026-02-21T10:22:41.7034847Z mov.b32 %r5506, %r5497; 2026-02-21T10:22:41.7034906Z mov.b32 %r5507, %r5497; 2026-02-21T10:22:41.7034968Z mov.b32 %r5508, %r5497; 2026-02-21T10:22:41.7035083Z mov.b32 %r5509, %r5497; 2026-02-21T10:22:41.7035143Z mov.b32 %r5510, %r5497; 2026-02-21T10:22:41.7035203Z mov.b32 %r5511, %r5497; 2026-02-21T10:22:41.7035266Z mov.b32 %r5512, %r5497; 2026-02-21T10:22:41.7035337Z mov.b32 %r5514, %r5494; 2026-02-21T10:22:41.7035397Z mov.b32 %r5515, %r5490; 2026-02-21T10:22:41.7035458Z mov.b32 %r5516, %r5496; 2026-02-21T10:22:41.7035528Z mov.b32 %r5517, %r5495; 2026-02-21T10:22:41.7035591Z bra.uni $L__BB0_9; 2026-02-21T10:22:41.7035716Z $L__BB0_13: // in Loop: Header=BB0_9 Depth=1 2026-02-21T10:22:41.7035934Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.7036010Z add.s32 %r5515, %r5515, 1; 2026-02-21T10:22:41.7036142Z setp.ne.b32 %p255, %r163, %r5515; 2026-02-21T10:22:41.7036207Z mov.b32 %r5485, %r5495; 2026-02-21T10:22:41.7036279Z mov.b32 %r5486, %r172; 2026-02-21T10:22:41.7036340Z mov.b32 %r5487, %r5496; 2026-02-21T10:22:41.7036404Z mov.b32 %r5488, %r174; 2026-02-21T10:22:41.7036589Z mov.b32 %r5489, %r5514; 
2026-02-21T10:22:41.7036657Z mov.b32 %r5490, %r176; 2026-02-21T10:22:41.7036719Z mov.b32 %r5495, %r5517; 2026-02-21T10:22:41.7036859Z mov.b32 %r5496, %r5516; 2026-02-21T10:22:41.7036931Z mov.b32 %r5514, %r203; 2026-02-21T10:22:41.7036997Z @%p255 bra $L__BB0_9; 2026-02-21T10:22:41.7037064Z bra.uni $L__BB0_14; 2026-02-21T10:22:41.7037201Z $L__BB0_9: // =>This Inner Loop Header: Depth=1 2026-02-21T10:22:41.7037416Z .loc 1 0 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:0:139 2026-02-21T10:22:41.7037480Z mov.b32 %r176, %r5489; 2026-02-21T10:22:41.7037541Z mov.b32 %r174, %r5487; 2026-02-21T10:22:41.7037610Z mov.b32 %r172, %r5485; 2026-02-21T10:22:41.7037823Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.7037888Z add.s32 %r3788, %r5514, 1; 2026-02-21T10:22:41.7037964Z setp.eq.b32 %p197, %r5514, 31; 2026-02-21T10:22:41.7038035Z selp.b32 %r203, 0, %r3788, %p197; 2026-02-21T10:22:41.7038104Z setp.ne.b32 %p198, %r203, 0; 2026-02-21T10:22:41.7038178Z @%p198 bra $L__BB0_11; 2026-02-21T10:22:41.7038291Z // %bb.10: // in Loop: Header=BB0_9 Depth=1 2026-02-21T10:22:41.7038358Z add.s32 %r5518, %r5518, 132; 2026-02-21T10:22:41.7038561Z .loc 1 32 35 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:32:35 2026-02-21T10:22:41.7038631Z shr.s32 %r3789, %r5518, 31; 2026-02-21T10:22:41.7038695Z shr.u32 %r3790, %r3789, 17; 2026-02-21T10:22:41.7038762Z add.s32 %r3791, %r5518, %r3790; 2026-02-21T10:22:41.7038831Z shr.s32 %r3792, %r3791, 15; 2026-02-21T10:22:41.7039036Z .loc 1 33 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:33:33 2026-02-21T10:22:41.7039101Z shl.b32 %r3793, %r3792, 5; 2026-02-21T10:22:41.7039304Z .loc 1 34 39 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:34:39 2026-02-21T10:22:41.7039370Z sub.s32 %r3794, 40, %r3793; 2026-02-21T10:22:41.7039568Z .loc 1 34 52 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:34:52 2026-02-21T10:22:41.7039633Z min.s32 %r3795, 
%r3794, 32; 2026-02-21T10:22:41.7039835Z .loc 1 35 45 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:35:45 2026-02-21T10:22:41.7039903Z and.b32 %r3796, %r3791, -32768; 2026-02-21T10:22:41.7039966Z sub.s32 %r3797, %r5518, %r3796; 2026-02-21T10:22:41.7040169Z .loc 1 36 51 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:36:51 2026-02-21T10:22:41.7040234Z div.s32 %r3798, %r3797, %r3795; 2026-02-21T10:22:41.7040509Z .loc 1 35 64 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:35:64 2026-02-21T10:22:41.7040583Z mul.lo.s32 %r3799, %r3798, %r3795; 2026-02-21T10:22:41.7040659Z sub.s32 %r3800, %r3797, %r3799; 2026-02-21T10:22:41.7040926Z .loc 1 35 30 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:35:30 2026-02-21T10:22:41.7040996Z add.s32 %r3801, %r3800, %r3793; 2026-02-21T10:22:41.7041195Z .loc 1 37 27 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:37:27 2026-02-21T10:22:41.7041258Z shl.b32 %r5516, %r3801, 5; 2026-02-21T10:22:41.7041455Z .loc 1 39 27 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:39:27 2026-02-21T10:22:41.7041528Z shl.b32 %r5517, %r3798, 6; 2026-02-21T10:22:41.7041724Z .loc 1 40 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:40:32 2026-02-21T10:22:41.7041786Z or.b32 %r5519, %r5517, %r8; 2026-02-21T10:22:41.7041858Z or.b32 %r5520, %r5517, %r9; 2026-02-21T10:22:41.7041985Z or.b32 %r5521, %r5517, %r10; 2026-02-21T10:22:41.7042050Z or.b32 %r5522, %r5517, %r11; 2026-02-21T10:22:41.7042120Z or.b32 %r5523, %r5517, %r12; 2026-02-21T10:22:41.7042182Z or.b32 %r5524, %r5517, %r13; 2026-02-21T10:22:41.7042247Z or.b32 %r5525, %r5517, %r14; 2026-02-21T10:22:41.7042310Z or.b32 %r5526, %r5517, %r15; 2026-02-21T10:22:41.7042480Z $L__BB0_11: // in Loop: Header=BB0_9 Depth=1 2026-02-21T10:22:41.7042690Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.7042758Z setp.eq.b32 %p239, %r203, 0; 2026-02-21T10:22:41.7042849Z setp.lt.s32 %p240, 
%r5515, %r142; 2026-02-21T10:22:41.7042914Z add.s32 %r5198, %r5493, 1; 2026-02-21T10:22:41.7042983Z setp.gt.s32 %p244, %r5198, 2; 2026-02-21T10:22:41.7043051Z selp.b32 %r5493, 0, %r5198, %p244; 2026-02-21T10:22:41.7043125Z selp.b32 %r5199, 1, 0, %p244; 2026-02-21T10:22:41.7043195Z xor.b32 %r5492, %r5492, %r5199; 2026-02-21T10:22:41.7043398Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.7043475Z cp.async.wait_group 8; 2026-02-21T10:22:41.7043538Z bar.sync 0; 2026-02-21T10:22:41.7043601Z shl.b32 %r5200, %r5493, 13; 2026-02-21T10:22:41.7043672Z add.s32 %r5202, %r1928, %r5200; 2026-02-21T10:22:41.7043874Z .loc 1 58 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:58:32 2026-02-21T10:22:41.7043941Z add.s32 %r5203, %r5202, %r143; 2026-02-21T10:22:41.7044011Z ld.shared.b16 %rs897, [%r5203]; 2026-02-21T10:22:41.7044090Z ld.shared.b16 %rs898, [%r5203+1024]; 2026-02-21T10:22:41.7044159Z ld.shared.b16 %rs899, [%r5203+64]; 2026-02-21T10:22:41.7044232Z ld.shared.b16 %rs900, [%r5203+1088]; 2026-02-21T10:22:41.7044310Z add.s32 %r5204, %r5202, %r144; 2026-02-21T10:22:41.7044378Z ld.shared.b16 %rs901, [%r5204]; 2026-02-21T10:22:41.7044447Z ld.shared.b16 %rs902, [%r5204+1024]; 2026-02-21T10:22:41.7044524Z ld.shared.b16 %rs903, [%r5204+64]; 2026-02-21T10:22:41.7044594Z ld.shared.b16 %rs904, [%r5204+1088]; 2026-02-21T10:22:41.7044657Z add.s32 %r5205, %r5202, %r145; 2026-02-21T10:22:41.7044725Z ld.shared.b16 %rs905, [%r5205]; 2026-02-21T10:22:41.7044803Z ld.shared.b16 %rs906, [%r5205+1024]; 2026-02-21T10:22:41.7044869Z ld.shared.b16 %rs907, [%r5205+64]; 2026-02-21T10:22:41.7044943Z ld.shared.b16 %rs908, [%r5205+1088]; 2026-02-21T10:22:41.7045014Z add.s32 %r5206, %r5202, %r146; 2026-02-21T10:22:41.7045081Z ld.shared.b16 %rs909, [%r5206]; 2026-02-21T10:22:41.7045147Z ld.shared.b16 %rs910, [%r5206+1024]; 2026-02-21T10:22:41.7045215Z ld.shared.b16 %rs911, [%r5206+64]; 2026-02-21T10:22:41.7045290Z ld.shared.b16 %rs912, 
[%r5206+1088]; 2026-02-21T10:22:41.7045354Z add.s32 %r5207, %r5202, %r147; 2026-02-21T10:22:41.7045422Z ld.shared.b16 %rs913, [%r5207]; 2026-02-21T10:22:41.7045498Z ld.shared.b16 %rs914, [%r5207+1024]; 2026-02-21T10:22:41.7045630Z ld.shared.b16 %rs915, [%r5207+64]; 2026-02-21T10:22:41.7045700Z ld.shared.b16 %rs916, [%r5207+1088]; 2026-02-21T10:22:41.7045775Z add.s32 %r5208, %r5202, %r148; 2026-02-21T10:22:41.7045850Z ld.shared.b16 %rs917, [%r5208]; 2026-02-21T10:22:41.7045968Z ld.shared.b16 %rs918, [%r5208+1024]; 2026-02-21T10:22:41.7046039Z ld.shared.b16 %rs919, [%r5208+64]; 2026-02-21T10:22:41.7046106Z ld.shared.b16 %rs920, [%r5208+1088]; 2026-02-21T10:22:41.7046171Z add.s32 %r5209, %r5202, %r149; 2026-02-21T10:22:41.7046237Z ld.shared.b16 %rs921, [%r5209]; 2026-02-21T10:22:41.7046308Z ld.shared.b16 %rs922, [%r5209+1024]; 2026-02-21T10:22:41.7046374Z ld.shared.b16 %rs923, [%r5209+64]; 2026-02-21T10:22:41.7046441Z ld.shared.b16 %rs924, [%r5209+1088]; 2026-02-21T10:22:41.7046625Z add.s32 %r5210, %r5202, %r150; 2026-02-21T10:22:41.7046695Z ld.shared.b16 %rs925, [%r5210]; 2026-02-21T10:22:41.7046762Z ld.shared.b16 %rs926, [%r5210+1024]; 2026-02-21T10:22:41.7046828Z ld.shared.b16 %rs927, [%r5210+64]; 2026-02-21T10:22:41.7046990Z ld.shared.b16 %rs928, [%r5210+1088]; 2026-02-21T10:22:41.7047062Z cvt.f32.bf16 %r3836, %rs897; 2026-02-21T10:22:41.7047128Z cvt.f32.bf16 %r3837, %rs898; 2026-02-21T10:22:41.7047198Z cvt.f32.bf16 %r3838, %rs901; 2026-02-21T10:22:41.7047262Z cvt.f32.bf16 %r3839, %rs902; 2026-02-21T10:22:41.7047329Z cvt.f32.bf16 %r3872, %rs905; 2026-02-21T10:22:41.7047394Z cvt.f32.bf16 %r3873, %rs906; 2026-02-21T10:22:41.7047529Z cvt.f32.bf16 %r3874, %rs909; 2026-02-21T10:22:41.7047593Z cvt.f32.bf16 %r3875, %rs910; 2026-02-21T10:22:41.7047654Z cvt.f32.bf16 %r3908, %rs913; 2026-02-21T10:22:41.7047721Z cvt.f32.bf16 %r3909, %rs914; 2026-02-21T10:22:41.7047782Z cvt.f32.bf16 %r3910, %rs917; 2026-02-21T10:22:41.7047843Z cvt.f32.bf16 %r3911, %rs918; 
2026-02-21T10:22:41.7047911Z cvt.f32.bf16 %r3944, %rs921; 2026-02-21T10:22:41.7047972Z cvt.f32.bf16 %r3945, %rs922; 2026-02-21T10:22:41.7048033Z cvt.f32.bf16 %r3946, %rs925; 2026-02-21T10:22:41.7048106Z cvt.f32.bf16 %r3947, %rs926; 2026-02-21T10:22:41.7048180Z cvt.f32.bf16 %r3980, %rs899; 2026-02-21T10:22:41.7048242Z cvt.f32.bf16 %r3981, %rs900; 2026-02-21T10:22:41.7048308Z cvt.f32.bf16 %r3982, %rs903; 2026-02-21T10:22:41.7048379Z cvt.f32.bf16 %r3983, %rs904; 2026-02-21T10:22:41.7048444Z cvt.f32.bf16 %r4016, %rs907; 2026-02-21T10:22:41.7048505Z cvt.f32.bf16 %r4017, %rs908; 2026-02-21T10:22:41.7048566Z cvt.f32.bf16 %r4018, %rs911; 2026-02-21T10:22:41.7048636Z cvt.f32.bf16 %r4019, %rs912; 2026-02-21T10:22:41.7048698Z cvt.f32.bf16 %r4052, %rs915; 2026-02-21T10:22:41.7048760Z cvt.f32.bf16 %r4053, %rs916; 2026-02-21T10:22:41.7048828Z cvt.f32.bf16 %r4054, %rs919; 2026-02-21T10:22:41.7048891Z cvt.f32.bf16 %r4055, %rs920; 2026-02-21T10:22:41.7048952Z cvt.f32.bf16 %r4088, %rs923; 2026-02-21T10:22:41.7049013Z cvt.f32.bf16 %r4089, %rs924; 2026-02-21T10:22:41.7049080Z cvt.f32.bf16 %r4090, %rs927; 2026-02-21T10:22:41.7049139Z cvt.f32.bf16 %r4091, %rs928; 2026-02-21T10:22:41.7049354Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.7049424Z shl.b32 %r5211, %r5493, 3; 2026-02-21T10:22:41.7049487Z add.s32 %r3802, %r3414, %r5211; 2026-02-21T10:22:41.7049550Z // begin inline asm 2026-02-21T10:22:41.7049613Z 2026-02-21T10:22:41.7049666Z { 2026-02-21T10:22:41.7049731Z .reg .pred complete; 2026-02-21T10:22:41.7049790Z waitLoop: 2026-02-21T10:22:41.7049947Z mbarrier.try_wait.parity.shared.b64 complete, [%r3802], %r5492; 2026-02-21T10:22:41.7050021Z @!complete bra.uni waitLoop; 2026-02-21T10:22:41.7050077Z } 2026-02-21T10:22:41.7050083Z 2026-02-21T10:22:41.7050148Z // end inline asm 2026-02-21T10:22:41.7050350Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.7050426Z shl.b32 %r5213, 
%r5493, 10; 2026-02-21T10:22:41.7050493Z add.s32 %r5215, %r3443, %r5213; 2026-02-21T10:22:41.7050695Z .loc 1 78 58 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:78:58 2026-02-21T10:22:41.7050839Z add.s32 %r5216, %r5215, %r151; 2026-02-21T10:22:41.7050907Z ld.shared.b8 %rs929, [%r5216]; 2026-02-21T10:22:41.7050981Z ld.shared.b8 %rs930, [%r5216+64]; 2026-02-21T10:22:41.7051049Z ld.shared.b8 %rs931, [%r5216+256]; 2026-02-21T10:22:41.7051209Z ld.shared.b8 %rs932, [%r5216+320]; 2026-02-21T10:22:41.7051283Z ld.shared.b8 %rs933, [%r5216+512]; 2026-02-21T10:22:41.7051352Z ld.shared.b8 %rs934, [%r5216+576]; 2026-02-21T10:22:41.7051418Z ld.shared.b8 %rs935, [%r5216+768]; 2026-02-21T10:22:41.7051483Z ld.shared.b8 %rs936, [%r5216+832]; 2026-02-21T10:22:41.7051553Z xor.b32 %r5217, %r151, 16; 2026-02-21T10:22:41.7051620Z add.s32 %r5218, %r5215, %r5217; 2026-02-21T10:22:41.7051687Z ld.shared.b8 %rs937, [%r5218+128]; 2026-02-21T10:22:41.7051756Z ld.shared.b8 %rs938, [%r5218+192]; 2026-02-21T10:22:41.7051821Z ld.shared.b8 %rs939, [%r5218+384]; 2026-02-21T10:22:41.7051885Z ld.shared.b8 %rs940, [%r5218+448]; 2026-02-21T10:22:41.7051953Z ld.shared.b8 %rs941, [%r5218+640]; 2026-02-21T10:22:41.7052071Z ld.shared.b8 %rs942, [%r5218+704]; 2026-02-21T10:22:41.7052139Z ld.shared.b8 %rs943, [%r5218+896]; 2026-02-21T10:22:41.7052204Z ld.shared.b8 %rs944, [%r5218+960]; 2026-02-21T10:22:41.7052409Z .loc 1 63 28 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:63:28 2026-02-21T10:22:41.7052473Z shl.b16 %rs945, %rs929, 4; 2026-02-21T10:22:41.7052585Z shl.b16 %rs946, %rs930, 4; 2026-02-21T10:22:41.7052654Z shl.b16 %rs947, %rs937, 4; 2026-02-21T10:22:41.7052717Z shl.b16 %rs948, %rs938, 4; 2026-02-21T10:22:41.7052779Z shl.b16 %rs949, %rs931, 4; 2026-02-21T10:22:41.7052840Z shl.b16 %rs950, %rs932, 4; 2026-02-21T10:22:41.7052908Z shl.b16 %rs951, %rs939, 4; 2026-02-21T10:22:41.7052982Z shl.b16 %rs952, %rs940, 4; 2026-02-21T10:22:41.7053052Z shl.b16 %rs953, %rs933, 4; 
2026-02-21T10:22:41.7053114Z shl.b16 %rs954, %rs934, 4; 2026-02-21T10:22:41.7053174Z shl.b16 %rs955, %rs941, 4; 2026-02-21T10:22:41.7053243Z shl.b16 %rs956, %rs942, 4; 2026-02-21T10:22:41.7053308Z shl.b16 %rs957, %rs935, 4; 2026-02-21T10:22:41.7053369Z shl.b16 %rs958, %rs936, 4; 2026-02-21T10:22:41.7053430Z shl.b16 %rs959, %rs943, 4; 2026-02-21T10:22:41.7053495Z shl.b16 %rs960, %rs944, 4; 2026-02-21T10:22:41.7053696Z .loc 1 78 58 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:78:58 2026-02-21T10:22:41.7053776Z selp.b16 %rs961, %rs945, %rs929, %p268; 2026-02-21T10:22:41.7053843Z cvt.s16.s8 %rs962, %rs961; 2026-02-21T10:22:41.7053906Z shr.s16 %rs963, %rs962, 4; 2026-02-21T10:22:41.7053979Z selp.b16 %rs964, %rs946, %rs930, %p268; 2026-02-21T10:22:41.7054043Z cvt.s16.s8 %rs965, %rs964; 2026-02-21T10:22:41.7054110Z shr.s16 %rs966, %rs965, 4; 2026-02-21T10:22:41.7054180Z selp.b16 %rs967, %rs947, %rs937, %p268; 2026-02-21T10:22:41.7054241Z cvt.s16.s8 %rs968, %rs967; 2026-02-21T10:22:41.7054307Z shr.s16 %rs969, %rs968, 4; 2026-02-21T10:22:41.7054378Z selp.b16 %rs970, %rs948, %rs938, %p268; 2026-02-21T10:22:41.7054458Z cvt.s16.s8 %rs971, %rs970; 2026-02-21T10:22:41.7054520Z shr.s16 %rs972, %rs971, 4; 2026-02-21T10:22:41.7054595Z selp.b16 %rs973, %rs949, %rs931, %p268; 2026-02-21T10:22:41.7054657Z cvt.s16.s8 %rs974, %rs973; 2026-02-21T10:22:41.7054720Z shr.s16 %rs975, %rs974, 4; 2026-02-21T10:22:41.7054796Z selp.b16 %rs976, %rs950, %rs932, %p268; 2026-02-21T10:22:41.7054858Z cvt.s16.s8 %rs977, %rs976; 2026-02-21T10:22:41.7054921Z shr.s16 %rs978, %rs977, 4; 2026-02-21T10:22:41.7054994Z selp.b16 %rs979, %rs951, %rs939, %p268; 2026-02-21T10:22:41.7055055Z cvt.s16.s8 %rs980, %rs979; 2026-02-21T10:22:41.7055117Z shr.s16 %rs981, %rs980, 4; 2026-02-21T10:22:41.7055186Z selp.b16 %rs982, %rs952, %rs940, %p268; 2026-02-21T10:22:41.7055252Z cvt.s16.s8 %rs983, %rs982; 2026-02-21T10:22:41.7055315Z shr.s16 %rs984, %rs983, 4; 2026-02-21T10:22:41.7055383Z selp.b16 %rs985, 
%rs953, %rs933, %p268; 2026-02-21T10:22:41.7055451Z cvt.s16.s8 %rs986, %rs985; 2026-02-21T10:22:41.7055571Z shr.s16 %rs987, %rs986, 4; 2026-02-21T10:22:41.7055642Z selp.b16 %rs988, %rs954, %rs934, %p268; 2026-02-21T10:22:41.7055703Z cvt.s16.s8 %rs989, %rs988; 2026-02-21T10:22:41.7055767Z shr.s16 %rs990, %rs989, 4; 2026-02-21T10:22:41.7055885Z selp.b16 %rs991, %rs955, %rs941, %p268; 2026-02-21T10:22:41.7055946Z cvt.s16.s8 %rs992, %rs991; 2026-02-21T10:22:41.7056011Z shr.s16 %rs993, %rs992, 4; 2026-02-21T10:22:41.7056081Z selp.b16 %rs994, %rs956, %rs942, %p268; 2026-02-21T10:22:41.7056143Z cvt.s16.s8 %rs995, %rs994; 2026-02-21T10:22:41.7056206Z shr.s16 %rs996, %rs995, 4; 2026-02-21T10:22:41.7056279Z selp.b16 %rs997, %rs957, %rs935, %p268; 2026-02-21T10:22:41.7056340Z cvt.s16.s8 %rs998, %rs997; 2026-02-21T10:22:41.7056401Z shr.s16 %rs999, %rs998, 4; 2026-02-21T10:22:41.7056607Z selp.b16 %rs1000, %rs958, %rs936, %p268; 2026-02-21T10:22:41.7056679Z cvt.s16.s8 %rs1001, %rs1000; 2026-02-21T10:22:41.7056743Z shr.s16 %rs1002, %rs1001, 4; 2026-02-21T10:22:41.7056819Z selp.b16 %rs1003, %rs959, %rs943, %p268; 2026-02-21T10:22:41.7056980Z cvt.s16.s8 %rs1004, %rs1003; 2026-02-21T10:22:41.7057048Z shr.s16 %rs1005, %rs1004, 4; 2026-02-21T10:22:41.7057124Z selp.b16 %rs1006, %rs960, %rs944, %p268; 2026-02-21T10:22:41.7057188Z cvt.s16.s8 %rs1007, %rs1006; 2026-02-21T10:22:41.7057252Z shr.s16 %rs1008, %rs1007, 4; 2026-02-21T10:22:41.7057520Z .loc 1 83 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:83:32 2026-02-21T10:22:41.7057593Z cvt.rn.f32.s16 %r5219, %rs963; 2026-02-21T10:22:41.7057657Z cvt.rn.f32.s16 %r5220, %rs966; 2026-02-21T10:22:41.7057720Z cvt.rn.f32.s16 %r5221, %rs969; 2026-02-21T10:22:41.7057782Z cvt.rn.f32.s16 %r5222, %rs972; 2026-02-21T10:22:41.7057849Z cvt.rn.f32.s16 %r5223, %rs975; 2026-02-21T10:22:41.7057910Z cvt.rn.f32.s16 %r5224, %rs978; 2026-02-21T10:22:41.7057972Z cvt.rn.f32.s16 %r5225, %rs981; 2026-02-21T10:22:41.7058038Z cvt.rn.f32.s16 %r5226, 
%rs984; 2026-02-21T10:22:41.7058101Z cvt.rn.f32.s16 %r5227, %rs987; 2026-02-21T10:22:41.7058170Z cvt.rn.f32.s16 %r5228, %rs990; 2026-02-21T10:22:41.7058239Z cvt.rn.f32.s16 %r5229, %rs993; 2026-02-21T10:22:41.7058301Z cvt.rn.f32.s16 %r5230, %rs996; 2026-02-21T10:22:41.7058365Z cvt.rn.f32.s16 %r5231, %rs999; 2026-02-21T10:22:41.7058446Z cvt.rn.f32.s16 %r5232, %rs1002; 2026-02-21T10:22:41.7058516Z cvt.rn.f32.s16 %r5233, %rs1005; 2026-02-21T10:22:41.7058578Z cvt.rn.f32.s16 %r5234, %rs1008; 2026-02-21T10:22:41.7058647Z st.shared.b32 [%r152], %r5219; 2026-02-21T10:22:41.7058722Z st.shared.b32 [%r152+4096], %r5227; 2026-02-21T10:22:41.7058788Z st.shared.b32 [%r153], %r5220; 2026-02-21T10:22:41.7058855Z st.shared.b32 [%r153+4096], %r5228; 2026-02-21T10:22:41.7058918Z st.shared.b32 [%r154], %r5221; 2026-02-21T10:22:41.7058987Z st.shared.b32 [%r154+4096], %r5229; 2026-02-21T10:22:41.7059052Z st.shared.b32 [%r155], %r5222; 2026-02-21T10:22:41.7059119Z st.shared.b32 [%r155+4096], %r5230; 2026-02-21T10:22:41.7059185Z st.shared.b32 [%r156], %r5223; 2026-02-21T10:22:41.7059254Z st.shared.b32 [%r156+4096], %r5231; 2026-02-21T10:22:41.7059318Z st.shared.b32 [%r157], %r5224; 2026-02-21T10:22:41.7059383Z st.shared.b32 [%r157+4096], %r5232; 2026-02-21T10:22:41.7059451Z st.shared.b32 [%r158], %r5225; 2026-02-21T10:22:41.7059520Z st.shared.b32 [%r158+4096], %r5233; 2026-02-21T10:22:41.7059584Z st.shared.b32 [%r159], %r5226; 2026-02-21T10:22:41.7059653Z st.shared.b32 [%r159+4096], %r5234; 2026-02-21T10:22:41.7059712Z $L__tmp17: 2026-02-21T10:22:41.7059991Z .loc 2 291 36 // standard.py:291:36 @[ cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:90:40 ] 2026-02-21T10:22:41.7060058Z // begin inline asm 2026-02-21T10:22:41.7060140Z fence.proxy.async.shared::cta; 2026-02-21T10:22:41.7060209Z // end inline asm 2026-02-21T10:22:41.7060268Z bar.sync 0; 2026-02-21T10:22:41.7060356Z shfl.sync.idx.b32 %r5235, %r4, 0, 31, -1; 2026-02-21T10:22:41.7060431Z wgmma.fence.sync.aligned; 
2026-02-21T10:22:41.7060496Z mov.pred %p199, -1; 2026-02-21T10:22:41.7060643Z // begin inline asm 2026-02-21T10:22:41.7061163Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r3836,%r3837,%r3838,%r3839}, %rd388, %p199, 1, 1; 2026-02-21T10:22:41.7061288Z // end inline asm 2026-02-21T10:22:41.7061352Z // begin inline asm 2026-02-21T10:22:41.7061868Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r3872,%r3873,%r3874,%r3875}, %rd389, %p199, 1, 1; 2026-02-21T10:22:41.7061936Z // end inline asm 2026-02-21T10:22:41.7061998Z // begin inline asm 2026-02-21T10:22:41.7062503Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r3908,%r3909,%r3910,%r3911}, %rd390, %p199, 1, 1; 2026-02-21T10:22:41.7062567Z // end inline asm 2026-02-21T10:22:41.7062672Z // begin inline asm 2026-02-21T10:22:41.7063173Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r3944,%r3945,%r3946,%r3947}, %rd391, %p199, 1, 1; 2026-02-21T10:22:41.7063233Z // end inline asm 2026-02-21T10:22:41.7063293Z // begin inline asm 2026-02-21T10:22:41.7063835Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r3980,%r3981,%r3982,%r3983}, %rd392, %p199, 1, 1; 2026-02-21T10:22:41.7063894Z // end inline asm 2026-02-21T10:22:41.7063954Z // begin inline asm 2026-02-21T10:22:41.7064455Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r4016,%r4017,%r4018,%r4019}, %rd393, %p199, 1, 1; 2026-02-21T10:22:41.7064517Z // end inline asm 2026-02-21T10:22:41.7064576Z // begin inline asm 2026-02-21T10:22:41.7065074Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r4052,%r4053,%r4054,%r4055}, %rd394, %p199, 1, 1; 2026-02-21T10:22:41.7065135Z // end inline asm 2026-02-21T10:22:41.7065191Z // begin inline asm 2026-02-21T10:22:41.7065690Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r4088,%r4089,%r4090,%r4091}, %rd395, %p199, 1, 1; 2026-02-21T10:22:41.7065747Z // end inline asm 2026-02-21T10:22:41.7065823Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:41.7065884Z mov.b32 %r5093, 0; 2026-02-21T10:22:41.7065946Z mov.b32 %r4109, %r5093; 2026-02-21T10:22:41.7066006Z mov.b32 %r4110, %r5093; 2026-02-21T10:22:41.7066065Z mov.b32 %r4108, %r4127; 2026-02-21T10:22:41.7066128Z // begin inline asm 2026-02-21T10:22:41.7066436Z // wait for regs: %r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512,%r4108,%r4109,%r4110 2026-02-21T10:22:41.7066632Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:41.7066700Z // end inline asm 2026-02-21T10:22:41.7066755Z $L__tmp18: 2026-02-21T10:22:41.7066984Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.7067051Z add.s32 %r5236, %r1928, 24576; 2026-02-21T10:22:41.7067118Z add.s32 %r5237, %r5236, %r5200; 2026-02-21T10:22:41.7067326Z .loc 1 58 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:58:32 2026-02-21T10:22:41.7067390Z add.s32 %r5238, %r5237, %r143; 2026-02-21T10:22:41.7067463Z ld.shared.b16 %rs1009, 
[%r5238]; 2026-02-21T10:22:41.7067536Z ld.shared.b16 %rs1010, [%r5238+1024]; 2026-02-21T10:22:41.7067607Z ld.shared.b16 %rs1011, [%r5238+64]; 2026-02-21T10:22:41.7067764Z ld.shared.b16 %rs1012, [%r5238+1088]; 2026-02-21T10:22:41.7067829Z add.s32 %r5239, %r5237, %r144; 2026-02-21T10:22:41.7067894Z ld.shared.b16 %rs1013, [%r5239]; 2026-02-21T10:22:41.7067962Z ld.shared.b16 %rs1014, [%r5239+1024]; 2026-02-21T10:22:41.7068105Z ld.shared.b16 %rs1015, [%r5239+64]; 2026-02-21T10:22:41.7068175Z ld.shared.b16 %rs1016, [%r5239+1088]; 2026-02-21T10:22:41.7068237Z add.s32 %r5240, %r5237, %r145; 2026-02-21T10:22:41.7068309Z ld.shared.b16 %rs1017, [%r5240]; 2026-02-21T10:22:41.7068377Z ld.shared.b16 %rs1018, [%r5240+1024]; 2026-02-21T10:22:41.7068445Z ld.shared.b16 %rs1019, [%r5240+64]; 2026-02-21T10:22:41.7068588Z ld.shared.b16 %rs1020, [%r5240+1088]; 2026-02-21T10:22:41.7068657Z add.s32 %r5241, %r5237, %r146; 2026-02-21T10:22:41.7068722Z ld.shared.b16 %rs1021, [%r5241]; 2026-02-21T10:22:41.7068789Z ld.shared.b16 %rs1022, [%r5241+1024]; 2026-02-21T10:22:41.7068860Z ld.shared.b16 %rs1023, [%r5241+64]; 2026-02-21T10:22:41.7068929Z ld.shared.b16 %rs1024, [%r5241+1088]; 2026-02-21T10:22:41.7069066Z add.s32 %r5242, %r5237, %r147; 2026-02-21T10:22:41.7069136Z ld.shared.b16 %rs1025, [%r5242]; 2026-02-21T10:22:41.7069203Z ld.shared.b16 %rs1026, [%r5242+1024]; 2026-02-21T10:22:41.7069272Z ld.shared.b16 %rs1027, [%r5242+64]; 2026-02-21T10:22:41.7069338Z ld.shared.b16 %rs1028, [%r5242+1088]; 2026-02-21T10:22:41.7069403Z add.s32 %r5243, %r5237, %r148; 2026-02-21T10:22:41.7069531Z ld.shared.b16 %rs1029, [%r5243]; 2026-02-21T10:22:41.7069602Z ld.shared.b16 %rs1030, [%r5243+1024]; 2026-02-21T10:22:41.7069672Z ld.shared.b16 %rs1031, [%r5243+64]; 2026-02-21T10:22:41.7069740Z ld.shared.b16 %rs1032, [%r5243+1088]; 2026-02-21T10:22:41.7069802Z add.s32 %r5244, %r5237, %r149; 2026-02-21T10:22:41.7069868Z ld.shared.b16 %rs1033, [%r5244]; 2026-02-21T10:22:41.7069939Z ld.shared.b16 %rs1034, 
[%r5244+1024]; 2026-02-21T10:22:41.7070005Z ld.shared.b16 %rs1035, [%r5244+64]; 2026-02-21T10:22:41.7070072Z ld.shared.b16 %rs1036, [%r5244+1088]; 2026-02-21T10:22:41.7070142Z add.s32 %r5245, %r5237, %r150; 2026-02-21T10:22:41.7070209Z ld.shared.b16 %rs1037, [%r5245]; 2026-02-21T10:22:41.7070276Z ld.shared.b16 %rs1038, [%r5245+1024]; 2026-02-21T10:22:41.7070350Z ld.shared.b16 %rs1039, [%r5245+64]; 2026-02-21T10:22:41.7070419Z ld.shared.b16 %rs1040, [%r5245+1088]; 2026-02-21T10:22:41.7070483Z cvt.f32.bf16 %r4164, %rs1009; 2026-02-21T10:22:41.7070545Z cvt.f32.bf16 %r4165, %rs1010; 2026-02-21T10:22:41.7070614Z cvt.f32.bf16 %r4166, %rs1013; 2026-02-21T10:22:41.7070676Z cvt.f32.bf16 %r4167, %rs1014; 2026-02-21T10:22:41.7070738Z cvt.f32.bf16 %r4200, %rs1017; 2026-02-21T10:22:41.7070801Z cvt.f32.bf16 %r4201, %rs1018; 2026-02-21T10:22:41.7070860Z cvt.f32.bf16 %r4202, %rs1021; 2026-02-21T10:22:41.7070922Z cvt.f32.bf16 %r4203, %rs1022; 2026-02-21T10:22:41.7070984Z cvt.f32.bf16 %r4236, %rs1025; 2026-02-21T10:22:41.7071050Z cvt.f32.bf16 %r4237, %rs1026; 2026-02-21T10:22:41.7071110Z cvt.f32.bf16 %r4238, %rs1029; 2026-02-21T10:22:41.7071173Z cvt.f32.bf16 %r4239, %rs1030; 2026-02-21T10:22:41.7071246Z cvt.f32.bf16 %r4272, %rs1033; 2026-02-21T10:22:41.7071314Z cvt.f32.bf16 %r4273, %rs1034; 2026-02-21T10:22:41.7071376Z cvt.f32.bf16 %r4274, %rs1037; 2026-02-21T10:22:41.7071438Z cvt.f32.bf16 %r4275, %rs1038; 2026-02-21T10:22:41.7071505Z cvt.f32.bf16 %r4308, %rs1011; 2026-02-21T10:22:41.7071565Z cvt.f32.bf16 %r4309, %rs1012; 2026-02-21T10:22:41.7071626Z cvt.f32.bf16 %r4310, %rs1015; 2026-02-21T10:22:41.7071691Z cvt.f32.bf16 %r4311, %rs1016; 2026-02-21T10:22:41.7071752Z cvt.f32.bf16 %r4344, %rs1019; 2026-02-21T10:22:41.7071812Z cvt.f32.bf16 %r4345, %rs1020; 2026-02-21T10:22:41.7071875Z cvt.f32.bf16 %r4346, %rs1023; 2026-02-21T10:22:41.7071941Z cvt.f32.bf16 %r4347, %rs1024; 2026-02-21T10:22:41.7072002Z cvt.f32.bf16 %r4380, %rs1027; 2026-02-21T10:22:41.7072063Z cvt.f32.bf16 %r4381, 
%rs1028; 2026-02-21T10:22:41.7072131Z cvt.f32.bf16 %r4382, %rs1031; 2026-02-21T10:22:41.7072191Z cvt.f32.bf16 %r4383, %rs1032; 2026-02-21T10:22:41.7072340Z cvt.f32.bf16 %r4416, %rs1035; 2026-02-21T10:22:41.7072404Z cvt.f32.bf16 %r4417, %rs1036; 2026-02-21T10:22:41.7072465Z cvt.f32.bf16 %r4418, %rs1039; 2026-02-21T10:22:41.7072525Z cvt.f32.bf16 %r4419, %rs1040; 2026-02-21T10:22:41.7072791Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.7072870Z add.s32 %r4130, %r3417, %r5211; 2026-02-21T10:22:41.7072935Z // begin inline asm 2026-02-21T10:22:41.7072988Z 2026-02-21T10:22:41.7073043Z { 2026-02-21T10:22:41.7073110Z .reg .pred complete; 2026-02-21T10:22:41.7073165Z waitLoop: 2026-02-21T10:22:41.7073314Z mbarrier.try_wait.parity.shared.b64 complete, [%r4130], %r5492; 2026-02-21T10:22:41.7073389Z @!complete bra.uni waitLoop; 2026-02-21T10:22:41.7073440Z } 2026-02-21T10:22:41.7073444Z 2026-02-21T10:22:41.7073500Z // end inline asm 2026-02-21T10:22:41.7073714Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.7073831Z add.s32 %r5248, %r3464, %r5213; 2026-02-21T10:22:41.7074036Z .loc 1 78 58 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:78:58 2026-02-21T10:22:41.7074102Z add.s32 %r5249, %r5248, %r151; 2026-02-21T10:22:41.7074170Z ld.shared.b8 %rs1041, [%r5249]; 2026-02-21T10:22:41.7074238Z ld.shared.b8 %rs1042, [%r5249+64]; 2026-02-21T10:22:41.7074303Z ld.shared.b8 %rs1043, [%r5249+256]; 2026-02-21T10:22:41.7074421Z ld.shared.b8 %rs1044, [%r5249+320]; 2026-02-21T10:22:41.7074487Z ld.shared.b8 %rs1045, [%r5249+512]; 2026-02-21T10:22:41.7074554Z ld.shared.b8 %rs1046, [%r5249+576]; 2026-02-21T10:22:41.7074622Z ld.shared.b8 %rs1047, [%r5249+768]; 2026-02-21T10:22:41.7074688Z ld.shared.b8 %rs1048, [%r5249+832]; 2026-02-21T10:22:41.7074750Z add.s32 %r5250, %r5248, %r5217; 2026-02-21T10:22:41.7074818Z ld.shared.b8 %rs1049, [%r5250+128]; 2026-02-21T10:22:41.7074883Z 
ld.shared.b8 %rs1050, [%r5250+192]; 2026-02-21T10:22:41.7074952Z ld.shared.b8 %rs1051, [%r5250+384]; 2026-02-21T10:22:41.7075034Z ld.shared.b8 %rs1052, [%r5250+448]; 2026-02-21T10:22:41.7075105Z ld.shared.b8 %rs1053, [%r5250+640]; 2026-02-21T10:22:41.7075172Z ld.shared.b8 %rs1054, [%r5250+704]; 2026-02-21T10:22:41.7075241Z ld.shared.b8 %rs1055, [%r5250+896]; 2026-02-21T10:22:41.7075310Z ld.shared.b8 %rs1056, [%r5250+960]; 2026-02-21T10:22:41.7075515Z .loc 1 63 28 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:63:28 2026-02-21T10:22:41.7075580Z shl.b16 %rs1057, %rs1041, 4; 2026-02-21T10:22:41.7075646Z shl.b16 %rs1058, %rs1042, 4; 2026-02-21T10:22:41.7075707Z shl.b16 %rs1059, %rs1049, 4; 2026-02-21T10:22:41.7075768Z shl.b16 %rs1060, %rs1050, 4; 2026-02-21T10:22:41.7075830Z shl.b16 %rs1061, %rs1043, 4; 2026-02-21T10:22:41.7075894Z shl.b16 %rs1062, %rs1044, 4; 2026-02-21T10:22:41.7075954Z shl.b16 %rs1063, %rs1051, 4; 2026-02-21T10:22:41.7076015Z shl.b16 %rs1064, %rs1052, 4; 2026-02-21T10:22:41.7076082Z shl.b16 %rs1065, %rs1045, 4; 2026-02-21T10:22:41.7076146Z shl.b16 %rs1066, %rs1046, 4; 2026-02-21T10:22:41.7076206Z shl.b16 %rs1067, %rs1053, 4; 2026-02-21T10:22:41.7076268Z shl.b16 %rs1068, %rs1054, 4; 2026-02-21T10:22:41.7076332Z shl.b16 %rs1069, %rs1047, 4; 2026-02-21T10:22:41.7076397Z shl.b16 %rs1070, %rs1048, 4; 2026-02-21T10:22:41.7076579Z shl.b16 %rs1071, %rs1055, 4; 2026-02-21T10:22:41.7076650Z shl.b16 %rs1072, %rs1056, 4; 2026-02-21T10:22:41.7076849Z .loc 1 78 58 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:78:58 2026-02-21T10:22:41.7076930Z selp.b16 %rs1073, %rs1057, %rs1041, %p268; 2026-02-21T10:22:41.7076993Z cvt.s16.s8 %rs1074, %rs1073; 2026-02-21T10:22:41.7077057Z shr.s16 %rs1075, %rs1074, 4; 2026-02-21T10:22:41.7077136Z selp.b16 %rs1076, %rs1058, %rs1042, %p268; 2026-02-21T10:22:41.7077210Z cvt.s16.s8 %rs1077, %rs1076; 2026-02-21T10:22:41.7077286Z shr.s16 %rs1078, %rs1077, 4; 2026-02-21T10:22:41.7077364Z selp.b16 %rs1079, %rs1059, 
%rs1049, %p268; 2026-02-21T10:22:41.7077511Z cvt.s16.s8 %rs1080, %rs1079; 2026-02-21T10:22:41.7077576Z shr.s16 %rs1081, %rs1080, 4; 2026-02-21T10:22:41.7077652Z selp.b16 %rs1082, %rs1060, %rs1050, %p268; 2026-02-21T10:22:41.7077714Z cvt.s16.s8 %rs1083, %rs1082; 2026-02-21T10:22:41.7077838Z shr.s16 %rs1084, %rs1083, 4; 2026-02-21T10:22:41.7077914Z selp.b16 %rs1085, %rs1061, %rs1043, %p268; 2026-02-21T10:22:41.7077975Z cvt.s16.s8 %rs1086, %rs1085; 2026-02-21T10:22:41.7078048Z shr.s16 %rs1087, %rs1086, 4; 2026-02-21T10:22:41.7078129Z selp.b16 %rs1088, %rs1062, %rs1044, %p268; 2026-02-21T10:22:41.7078192Z cvt.s16.s8 %rs1089, %rs1088; 2026-02-21T10:22:41.7078253Z shr.s16 %rs1090, %rs1089, 4; 2026-02-21T10:22:41.7078330Z selp.b16 %rs1091, %rs1063, %rs1051, %p268; 2026-02-21T10:22:41.7078395Z cvt.s16.s8 %rs1092, %rs1091; 2026-02-21T10:22:41.7078454Z shr.s16 %rs1093, %rs1092, 4; 2026-02-21T10:22:41.7078528Z selp.b16 %rs1094, %rs1064, %rs1052, %p268; 2026-02-21T10:22:41.7078596Z cvt.s16.s8 %rs1095, %rs1094; 2026-02-21T10:22:41.7078722Z shr.s16 %rs1096, %rs1095, 4; 2026-02-21T10:22:41.7078799Z selp.b16 %rs1097, %rs1065, %rs1045, %p268; 2026-02-21T10:22:41.7078864Z cvt.s16.s8 %rs1098, %rs1097; 2026-02-21T10:22:41.7078925Z shr.s16 %rs1099, %rs1098, 4; 2026-02-21T10:22:41.7079002Z selp.b16 %rs1100, %rs1066, %rs1046, %p268; 2026-02-21T10:22:41.7079062Z cvt.s16.s8 %rs1101, %rs1100; 2026-02-21T10:22:41.7079192Z shr.s16 %rs1102, %rs1101, 4; 2026-02-21T10:22:41.7079269Z selp.b16 %rs1103, %rs1067, %rs1053, %p268; 2026-02-21T10:22:41.7079331Z cvt.s16.s8 %rs1104, %rs1103; 2026-02-21T10:22:41.7079400Z shr.s16 %rs1105, %rs1104, 4; 2026-02-21T10:22:41.7079487Z selp.b16 %rs1106, %rs1068, %rs1054, %p268; 2026-02-21T10:22:41.7079549Z cvt.s16.s8 %rs1107, %rs1106; 2026-02-21T10:22:41.7079610Z shr.s16 %rs1108, %rs1107, 4; 2026-02-21T10:22:41.7079687Z selp.b16 %rs1109, %rs1069, %rs1047, %p268; 2026-02-21T10:22:41.7079749Z cvt.s16.s8 %rs1110, %rs1109; 2026-02-21T10:22:41.7079810Z shr.s16 %rs1111, 
%rs1110, 4; 2026-02-21T10:22:41.7079893Z selp.b16 %rs1112, %rs1070, %rs1048, %p268; 2026-02-21T10:22:41.7079955Z cvt.s16.s8 %rs1113, %rs1112; 2026-02-21T10:22:41.7080015Z shr.s16 %rs1114, %rs1113, 4; 2026-02-21T10:22:41.7080086Z selp.b16 %rs1115, %rs1071, %rs1055, %p268; 2026-02-21T10:22:41.7080155Z cvt.s16.s8 %rs1116, %rs1115; 2026-02-21T10:22:41.7080217Z shr.s16 %rs1117, %rs1116, 4; 2026-02-21T10:22:41.7080293Z selp.b16 %rs1118, %rs1072, %rs1056, %p268; 2026-02-21T10:22:41.7080359Z cvt.s16.s8 %rs1119, %rs1118; 2026-02-21T10:22:41.7080419Z shr.s16 %rs1120, %rs1119, 4; 2026-02-21T10:22:41.7080632Z .loc 1 83 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:83:32 2026-02-21T10:22:41.7080703Z cvt.rn.f32.s16 %r5251, %rs1075; 2026-02-21T10:22:41.7080767Z cvt.rn.f32.s16 %r5252, %rs1078; 2026-02-21T10:22:41.7080829Z cvt.rn.f32.s16 %r5253, %rs1081; 2026-02-21T10:22:41.7080893Z cvt.rn.f32.s16 %r5254, %rs1084; 2026-02-21T10:22:41.7080958Z cvt.rn.f32.s16 %r5255, %rs1087; 2026-02-21T10:22:41.7081027Z cvt.rn.f32.s16 %r5256, %rs1090; 2026-02-21T10:22:41.7081091Z cvt.rn.f32.s16 %r5257, %rs1093; 2026-02-21T10:22:41.7081158Z cvt.rn.f32.s16 %r5258, %rs1096; 2026-02-21T10:22:41.7081222Z cvt.rn.f32.s16 %r5259, %rs1099; 2026-02-21T10:22:41.7081288Z cvt.rn.f32.s16 %r5260, %rs1102; 2026-02-21T10:22:41.7081357Z cvt.rn.f32.s16 %r5261, %rs1105; 2026-02-21T10:22:41.7081422Z cvt.rn.f32.s16 %r5262, %rs1108; 2026-02-21T10:22:41.7081485Z cvt.rn.f32.s16 %r5263, %rs1111; 2026-02-21T10:22:41.7081547Z cvt.rn.f32.s16 %r5264, %rs1114; 2026-02-21T10:22:41.7081613Z cvt.rn.f32.s16 %r5265, %rs1117; 2026-02-21T10:22:41.7081677Z cvt.rn.f32.s16 %r5266, %rs1120; 2026-02-21T10:22:41.7081732Z bar.sync 0; 2026-02-21T10:22:41.7081804Z st.shared.b32 [%r152], %r5251; 2026-02-21T10:22:41.7081872Z st.shared.b32 [%r152+4096], %r5259; 2026-02-21T10:22:41.7081935Z st.shared.b32 [%r153], %r5252; 2026-02-21T10:22:41.7082001Z st.shared.b32 [%r153+4096], %r5260; 2026-02-21T10:22:41.7082129Z st.shared.b32 
[%r154], %r5253; 2026-02-21T10:22:41.7082195Z st.shared.b32 [%r154+4096], %r5261; 2026-02-21T10:22:41.7082258Z st.shared.b32 [%r155], %r5254; 2026-02-21T10:22:41.7082325Z st.shared.b32 [%r155+4096], %r5262; 2026-02-21T10:22:41.7082438Z st.shared.b32 [%r156], %r5255; 2026-02-21T10:22:41.7082502Z st.shared.b32 [%r156+4096], %r5263; 2026-02-21T10:22:41.7082566Z st.shared.b32 [%r157], %r5256; 2026-02-21T10:22:41.7082648Z st.shared.b32 [%r157+4096], %r5264; 2026-02-21T10:22:41.7082714Z st.shared.b32 [%r158], %r5257; 2026-02-21T10:22:41.7082780Z st.shared.b32 [%r158+4096], %r5265; 2026-02-21T10:22:41.7082845Z st.shared.b32 [%r159], %r5258; 2026-02-21T10:22:41.7082910Z st.shared.b32 [%r159+4096], %r5266; 2026-02-21T10:22:41.7082964Z $L__tmp19: 2026-02-21T10:22:41.7083243Z .loc 2 291 36 // standard.py:291:36 @[ cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:90:40 ] 2026-02-21T10:22:41.7083305Z // begin inline asm 2026-02-21T10:22:41.7083437Z fence.proxy.async.shared::cta; 2026-02-21T10:22:41.7083496Z // end inline asm 2026-02-21T10:22:41.7083555Z bar.sync 0; 2026-02-21T10:22:41.7083627Z wgmma.fence.sync.aligned; 2026-02-21T10:22:41.7083690Z // begin inline asm 2026-02-21T10:22:41.7084251Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r4164,%r4165,%r4166,%r4167}, %rd388, %p199, 1, 1; 2026-02-21T10:22:41.7084311Z // end inline asm 2026-02-21T10:22:41.7084368Z // begin inline asm 2026-02-21T10:22:41.7084873Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r4200,%r4201,%r4202,%r4203}, %rd389, %p199, 1, 1; 2026-02-21T10:22:41.7084932Z // end inline asm 2026-02-21T10:22:41.7084989Z // begin inline asm 2026-02-21T10:22:41.7085494Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r4236,%r4237,%r4238,%r4239}, %rd390, %p199, 1, 1; 2026-02-21T10:22:41.7085554Z // end inline asm 2026-02-21T10:22:41.7085615Z // begin inline asm 2026-02-21T10:22:41.7086112Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r4272,%r4273,%r4274,%r4275}, %rd391, %p199, 1, 1; 2026-02-21T10:22:41.7086173Z // end inline asm 2026-02-21T10:22:41.7086232Z // begin inline asm 2026-02-21T10:22:41.7086852Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r4308,%r4309,%r4310,%r4311}, %rd392, %p199, 1, 1; 2026-02-21T10:22:41.7086917Z // end inline asm 2026-02-21T10:22:41.7086985Z // begin inline asm 2026-02-21T10:22:41.7087500Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r4344,%r4345,%r4346,%r4347}, %rd393, %p199, 1, 1; 2026-02-21T10:22:41.7087565Z // end inline asm 2026-02-21T10:22:41.7087626Z // begin inline asm 2026-02-21T10:22:41.7088130Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r4380,%r4381,%r4382,%r4383}, %rd394, %p199, 1, 1; 2026-02-21T10:22:41.7088191Z // end inline asm 2026-02-21T10:22:41.7088250Z // begin inline asm 2026-02-21T10:22:41.7088746Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r4416,%r4417,%r4418,%r4419}, %rd395, %p199, 1, 1; 2026-02-21T10:22:41.7088809Z // end inline asm 2026-02-21T10:22:41.7088885Z wgmma.commit_group.sync.aligned; 
2026-02-21T10:22:41.7089027Z mov.b32 %r4437, %r5093; 2026-02-21T10:22:41.7089090Z mov.b32 %r4438, %r5093; 2026-02-21T10:22:41.7089154Z mov.b32 %r4436, %r4127; 2026-02-21T10:22:41.7089213Z // begin inline asm 2026-02-21T10:22:41.7089521Z // wait for regs: %r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512,%r4436,%r4437,%r4438 2026-02-21T10:22:41.7089664Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:41.7089725Z // end inline asm 2026-02-21T10:22:41.7089780Z $L__tmp20: 2026-02-21T10:22:41.7090003Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.7090068Z add.s32 %r5267, %r1928, 49152; 2026-02-21T10:22:41.7090130Z add.s32 %r5268, %r5267, %r5200; 2026-02-21T10:22:41.7090329Z .loc 1 58 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:58:32 2026-02-21T10:22:41.7090395Z add.s32 %r5269, %r5268, %r143; 2026-02-21T10:22:41.7090461Z ld.shared.b16 %rs1121, [%r5269]; 2026-02-21T10:22:41.7090595Z ld.shared.b16 %rs1122, [%r5269+1024]; 2026-02-21T10:22:41.7090670Z ld.shared.b16 %rs1123, [%r5269+64]; 2026-02-21T10:22:41.7090738Z ld.shared.b16 %rs1124, [%r5269+1088]; 2026-02-21T10:22:41.7090801Z add.s32 %r5270, %r5268, %r144; 2026-02-21T10:22:41.7090871Z ld.shared.b16 %rs1125, [%r5270]; 2026-02-21T10:22:41.7090938Z ld.shared.b16 %rs1126, [%r5270+1024]; 2026-02-21T10:22:41.7091084Z ld.shared.b16 %rs1127, [%r5270+64]; 2026-02-21T10:22:41.7091157Z ld.shared.b16 %rs1128, [%r5270+1088]; 2026-02-21T10:22:41.7091226Z add.s32 %r5271, %r5268, %r145; 2026-02-21T10:22:41.7091292Z ld.shared.b16 %rs1129, [%r5271]; 2026-02-21T10:22:41.7091372Z ld.shared.b16 %rs1130, [%r5271+1024]; 2026-02-21T10:22:41.7091447Z ld.shared.b16 %rs1131, [%r5271+64]; 2026-02-21T10:22:41.7091518Z ld.shared.b16 %rs1132, [%r5271+1088]; 2026-02-21T10:22:41.7091580Z add.s32 %r5272, %r5268, %r146; 2026-02-21T10:22:41.7091645Z ld.shared.b16 %rs1133, [%r5272]; 2026-02-21T10:22:41.7091717Z ld.shared.b16 
%rs1134, [%r5272+1024]; 2026-02-21T10:22:41.7091788Z ld.shared.b16 %rs1135, [%r5272+64]; 2026-02-21T10:22:41.7091856Z ld.shared.b16 %rs1136, [%r5272+1088]; 2026-02-21T10:22:41.7091925Z add.s32 %r5273, %r5268, %r147; 2026-02-21T10:22:41.7091995Z ld.shared.b16 %rs1137, [%r5273]; 2026-02-21T10:22:41.7092063Z ld.shared.b16 %rs1138, [%r5273+1024]; 2026-02-21T10:22:41.7092134Z ld.shared.b16 %rs1139, [%r5273+64]; 2026-02-21T10:22:41.7092204Z ld.shared.b16 %rs1140, [%r5273+1088]; 2026-02-21T10:22:41.7092266Z add.s32 %r5274, %r5268, %r148; 2026-02-21T10:22:41.7092332Z ld.shared.b16 %rs1141, [%r5274]; 2026-02-21T10:22:41.7092405Z ld.shared.b16 %rs1142, [%r5274+1024]; 2026-02-21T10:22:41.7092472Z ld.shared.b16 %rs1143, [%r5274+64]; 2026-02-21T10:22:41.7092540Z ld.shared.b16 %rs1144, [%r5274+1088]; 2026-02-21T10:22:41.7092607Z add.s32 %r5275, %r5268, %r149; 2026-02-21T10:22:41.7092671Z ld.shared.b16 %rs1145, [%r5275]; 2026-02-21T10:22:41.7092737Z ld.shared.b16 %rs1146, [%r5275+1024]; 2026-02-21T10:22:41.7092808Z ld.shared.b16 %rs1147, [%r5275+64]; 2026-02-21T10:22:41.7092880Z ld.shared.b16 %rs1148, [%r5275+1088]; 2026-02-21T10:22:41.7092942Z add.s32 %r5276, %r5268, %r150; 2026-02-21T10:22:41.7093008Z ld.shared.b16 %rs1149, [%r5276]; 2026-02-21T10:22:41.7093092Z ld.shared.b16 %rs1150, [%r5276+1024]; 2026-02-21T10:22:41.7093161Z ld.shared.b16 %rs1151, [%r5276+64]; 2026-02-21T10:22:41.7093232Z ld.shared.b16 %rs1152, [%r5276+1088]; 2026-02-21T10:22:41.7093301Z cvt.f32.bf16 %r4492, %rs1121; 2026-02-21T10:22:41.7093363Z cvt.f32.bf16 %r4493, %rs1122; 2026-02-21T10:22:41.7093425Z cvt.f32.bf16 %r4494, %rs1125; 2026-02-21T10:22:41.7093486Z cvt.f32.bf16 %r4495, %rs1126; 2026-02-21T10:22:41.7093550Z cvt.f32.bf16 %r4528, %rs1129; 2026-02-21T10:22:41.7093612Z cvt.f32.bf16 %r4529, %rs1130; 2026-02-21T10:22:41.7093674Z cvt.f32.bf16 %r4530, %rs1133; 2026-02-21T10:22:41.7093740Z cvt.f32.bf16 %r4531, %rs1134; 2026-02-21T10:22:41.7093802Z cvt.f32.bf16 %r4564, %rs1137; 2026-02-21T10:22:41.7093927Z 
cvt.f32.bf16 %r4565, %rs1138; 2026-02-21T10:22:41.7093991Z cvt.f32.bf16 %r4566, %rs1141; 2026-02-21T10:22:41.7094056Z cvt.f32.bf16 %r4567, %rs1142; 2026-02-21T10:22:41.7094118Z cvt.f32.bf16 %r4600, %rs1145; 2026-02-21T10:22:41.7094238Z cvt.f32.bf16 %r4601, %rs1146; 2026-02-21T10:22:41.7094304Z cvt.f32.bf16 %r4602, %rs1149; 2026-02-21T10:22:41.7094364Z cvt.f32.bf16 %r4603, %rs1150; 2026-02-21T10:22:41.7094428Z cvt.f32.bf16 %r4636, %rs1123; 2026-02-21T10:22:41.7094489Z cvt.f32.bf16 %r4637, %rs1124; 2026-02-21T10:22:41.7094553Z cvt.f32.bf16 %r4638, %rs1127; 2026-02-21T10:22:41.7094613Z cvt.f32.bf16 %r4639, %rs1128; 2026-02-21T10:22:41.7094675Z cvt.f32.bf16 %r4672, %rs1131; 2026-02-21T10:22:41.7094741Z cvt.f32.bf16 %r4673, %rs1132; 2026-02-21T10:22:41.7094803Z cvt.f32.bf16 %r4674, %rs1135; 2026-02-21T10:22:41.7094866Z cvt.f32.bf16 %r4675, %rs1136; 2026-02-21T10:22:41.7094931Z cvt.f32.bf16 %r4708, %rs1139; 2026-02-21T10:22:41.7094992Z cvt.f32.bf16 %r4709, %rs1140; 2026-02-21T10:22:41.7095107Z cvt.f32.bf16 %r4710, %rs1143; 2026-02-21T10:22:41.7095170Z cvt.f32.bf16 %r4711, %rs1144; 2026-02-21T10:22:41.7095236Z cvt.f32.bf16 %r4744, %rs1147; 2026-02-21T10:22:41.7095297Z cvt.f32.bf16 %r4745, %rs1148; 2026-02-21T10:22:41.7095360Z cvt.f32.bf16 %r4746, %rs1151; 2026-02-21T10:22:41.7095425Z cvt.f32.bf16 %r4747, %rs1152; 2026-02-21T10:22:41.7095691Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.7095759Z add.s32 %r4458, %r3420, %r5211; 2026-02-21T10:22:41.7095820Z // begin inline asm 2026-02-21T10:22:41.7095878Z 2026-02-21T10:22:41.7095931Z { 2026-02-21T10:22:41.7095995Z .reg .pred complete; 2026-02-21T10:22:41.7096054Z waitLoop: 2026-02-21T10:22:41.7096200Z mbarrier.try_wait.parity.shared.b64 complete, [%r4458], %r5492; 2026-02-21T10:22:41.7096272Z @!complete bra.uni waitLoop; 2026-02-21T10:22:41.7096324Z } 2026-02-21T10:22:41.7096328Z 2026-02-21T10:22:41.7096394Z // end inline asm 2026-02-21T10:22:41.7096717Z .loc 1 60 33 // 
cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.7096785Z add.s32 %r5279, %r3485, %r5213; 2026-02-21T10:22:41.7097000Z .loc 1 78 58 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:78:58 2026-02-21T10:22:41.7097067Z add.s32 %r5280, %r5279, %r151; 2026-02-21T10:22:41.7097136Z ld.shared.b8 %rs1153, [%r5280]; 2026-02-21T10:22:41.7097210Z ld.shared.b8 %rs1154, [%r5280+64]; 2026-02-21T10:22:41.7097279Z ld.shared.b8 %rs1155, [%r5280+256]; 2026-02-21T10:22:41.7097345Z ld.shared.b8 %rs1156, [%r5280+320]; 2026-02-21T10:22:41.7097411Z ld.shared.b8 %rs1157, [%r5280+512]; 2026-02-21T10:22:41.7097481Z ld.shared.b8 %rs1158, [%r5280+576]; 2026-02-21T10:22:41.7097548Z ld.shared.b8 %rs1159, [%r5280+768]; 2026-02-21T10:22:41.7097613Z ld.shared.b8 %rs1160, [%r5280+832]; 2026-02-21T10:22:41.7097680Z add.s32 %r5281, %r5279, %r5217; 2026-02-21T10:22:41.7097749Z ld.shared.b8 %rs1161, [%r5281+128]; 2026-02-21T10:22:41.7097819Z ld.shared.b8 %rs1162, [%r5281+192]; 2026-02-21T10:22:41.7097892Z ld.shared.b8 %rs1163, [%r5281+384]; 2026-02-21T10:22:41.7097960Z ld.shared.b8 %rs1164, [%r5281+448]; 2026-02-21T10:22:41.7098029Z ld.shared.b8 %rs1165, [%r5281+640]; 2026-02-21T10:22:41.7098095Z ld.shared.b8 %rs1166, [%r5281+704]; 2026-02-21T10:22:41.7098164Z ld.shared.b8 %rs1167, [%r5281+896]; 2026-02-21T10:22:41.7098232Z ld.shared.b8 %rs1168, [%r5281+960]; 2026-02-21T10:22:41.7098430Z .loc 1 63 28 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:63:28 2026-02-21T10:22:41.7098497Z shl.b16 %rs1169, %rs1153, 4; 2026-02-21T10:22:41.7098560Z shl.b16 %rs1170, %rs1154, 4; 2026-02-21T10:22:41.7098622Z shl.b16 %rs1171, %rs1161, 4; 2026-02-21T10:22:41.7098691Z shl.b16 %rs1172, %rs1162, 4; 2026-02-21T10:22:41.7098754Z shl.b16 %rs1173, %rs1155, 4; 2026-02-21T10:22:41.7098816Z shl.b16 %rs1174, %rs1156, 4; 2026-02-21T10:22:41.7098962Z shl.b16 %rs1175, %rs1163, 4; 2026-02-21T10:22:41.7099032Z shl.b16 %rs1176, %rs1164, 4; 2026-02-21T10:22:41.7099094Z shl.b16 
%rs1177, %rs1157, 4; 2026-02-21T10:22:41.7099155Z shl.b16 %rs1178, %rs1158, 4; 2026-02-21T10:22:41.7099222Z shl.b16 %rs1179, %rs1165, 4; 2026-02-21T10:22:41.7099362Z shl.b16 %rs1180, %rs1166, 4; 2026-02-21T10:22:41.7099424Z shl.b16 %rs1181, %rs1159, 4; 2026-02-21T10:22:41.7099490Z shl.b16 %rs1182, %rs1160, 4; 2026-02-21T10:22:41.7099562Z shl.b16 %rs1183, %rs1167, 4; 2026-02-21T10:22:41.7099624Z shl.b16 %rs1184, %rs1168, 4; 2026-02-21T10:22:41.7099822Z .loc 1 78 58 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:78:58 2026-02-21T10:22:41.7099908Z selp.b16 %rs1185, %rs1169, %rs1153, %p268; 2026-02-21T10:22:41.7099971Z cvt.s16.s8 %rs1186, %rs1185; 2026-02-21T10:22:41.7100032Z shr.s16 %rs1187, %rs1186, 4; 2026-02-21T10:22:41.7100109Z selp.b16 %rs1188, %rs1170, %rs1154, %p268; 2026-02-21T10:22:41.7100176Z cvt.s16.s8 %rs1189, %rs1188; 2026-02-21T10:22:41.7100305Z shr.s16 %rs1190, %rs1189, 4; 2026-02-21T10:22:41.7100384Z selp.b16 %rs1191, %rs1171, %rs1161, %p268; 2026-02-21T10:22:41.7100450Z cvt.s16.s8 %rs1192, %rs1191; 2026-02-21T10:22:41.7100511Z shr.s16 %rs1193, %rs1192, 4; 2026-02-21T10:22:41.7100589Z selp.b16 %rs1194, %rs1172, %rs1162, %p268; 2026-02-21T10:22:41.7100655Z cvt.s16.s8 %rs1195, %rs1194; 2026-02-21T10:22:41.7100777Z shr.s16 %rs1196, %rs1195, 4; 2026-02-21T10:22:41.7100854Z selp.b16 %rs1197, %rs1173, %rs1155, %p268; 2026-02-21T10:22:41.7100915Z cvt.s16.s8 %rs1198, %rs1197; 2026-02-21T10:22:41.7100982Z shr.s16 %rs1199, %rs1198, 4; 2026-02-21T10:22:41.7101055Z selp.b16 %rs1200, %rs1174, %rs1156, %p268; 2026-02-21T10:22:41.7101117Z cvt.s16.s8 %rs1201, %rs1200; 2026-02-21T10:22:41.7101181Z shr.s16 %rs1202, %rs1201, 4; 2026-02-21T10:22:41.7101267Z selp.b16 %rs1203, %rs1175, %rs1163, %p268; 2026-02-21T10:22:41.7101331Z cvt.s16.s8 %rs1204, %rs1203; 2026-02-21T10:22:41.7101393Z shr.s16 %rs1205, %rs1204, 4; 2026-02-21T10:22:41.7101473Z selp.b16 %rs1206, %rs1176, %rs1164, %p268; 2026-02-21T10:22:41.7101533Z cvt.s16.s8 %rs1207, %rs1206; 
2026-02-21T10:22:41.7101593Z shr.s16 %rs1208, %rs1207, 4; 2026-02-21T10:22:41.7101672Z selp.b16 %rs1209, %rs1177, %rs1157, %p268; 2026-02-21T10:22:41.7101740Z cvt.s16.s8 %rs1210, %rs1209; 2026-02-21T10:22:41.7101802Z shr.s16 %rs1211, %rs1210, 4; 2026-02-21T10:22:41.7101882Z selp.b16 %rs1212, %rs1178, %rs1158, %p268; 2026-02-21T10:22:41.7101946Z cvt.s16.s8 %rs1213, %rs1212; 2026-02-21T10:22:41.7102006Z shr.s16 %rs1214, %rs1213, 4; 2026-02-21T10:22:41.7102081Z selp.b16 %rs1215, %rs1179, %rs1165, %p268; 2026-02-21T10:22:41.7102144Z cvt.s16.s8 %rs1216, %rs1215; 2026-02-21T10:22:41.7102205Z shr.s16 %rs1217, %rs1216, 4; 2026-02-21T10:22:41.7102279Z selp.b16 %rs1218, %rs1180, %rs1166, %p268; 2026-02-21T10:22:41.7102346Z cvt.s16.s8 %rs1219, %rs1218; 2026-02-21T10:22:41.7102408Z shr.s16 %rs1220, %rs1219, 4; 2026-02-21T10:22:41.7102482Z selp.b16 %rs1221, %rs1181, %rs1159, %p268; 2026-02-21T10:22:41.7102545Z cvt.s16.s8 %rs1222, %rs1221; 2026-02-21T10:22:41.7102612Z shr.s16 %rs1223, %rs1222, 4; 2026-02-21T10:22:41.7102687Z selp.b16 %rs1224, %rs1182, %rs1160, %p268; 2026-02-21T10:22:41.7102753Z cvt.s16.s8 %rs1225, %rs1224; 2026-02-21T10:22:41.7102818Z shr.s16 %rs1226, %rs1225, 4; 2026-02-21T10:22:41.7102890Z selp.b16 %rs1227, %rs1183, %rs1167, %p268; 2026-02-21T10:22:41.7102953Z cvt.s16.s8 %rs1228, %rs1227; 2026-02-21T10:22:41.7103014Z shr.s16 %rs1229, %rs1228, 4; 2026-02-21T10:22:41.7103096Z selp.b16 %rs1230, %rs1184, %rs1168, %p268; 2026-02-21T10:22:41.7103157Z cvt.s16.s8 %rs1231, %rs1230; 2026-02-21T10:22:41.7103217Z shr.s16 %rs1232, %rs1231, 4; 2026-02-21T10:22:41.7103418Z .loc 1 83 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:83:32 2026-02-21T10:22:41.7103484Z cvt.rn.f32.s16 %r5282, %rs1187; 2026-02-21T10:22:41.7103551Z cvt.rn.f32.s16 %r5283, %rs1190; 2026-02-21T10:22:41.7103687Z cvt.rn.f32.s16 %r5284, %rs1193; 2026-02-21T10:22:41.7103760Z cvt.rn.f32.s16 %r5285, %rs1196; 2026-02-21T10:22:41.7103823Z cvt.rn.f32.s16 %r5286, %rs1199; 
2026-02-21T10:22:41.7103884Z cvt.rn.f32.s16 %r5287, %rs1202; 2026-02-21T10:22:41.7104001Z cvt.rn.f32.s16 %r5288, %rs1205; 2026-02-21T10:22:41.7104063Z cvt.rn.f32.s16 %r5289, %rs1208; 2026-02-21T10:22:41.7104126Z cvt.rn.f32.s16 %r5290, %rs1211; 2026-02-21T10:22:41.7104195Z cvt.rn.f32.s16 %r5291, %rs1214; 2026-02-21T10:22:41.7104257Z cvt.rn.f32.s16 %r5292, %rs1217; 2026-02-21T10:22:41.7104319Z cvt.rn.f32.s16 %r5293, %rs1220; 2026-02-21T10:22:41.7104384Z cvt.rn.f32.s16 %r5294, %rs1223; 2026-02-21T10:22:41.7104452Z cvt.rn.f32.s16 %r5295, %rs1226; 2026-02-21T10:22:41.7104514Z cvt.rn.f32.s16 %r5296, %rs1229; 2026-02-21T10:22:41.7104576Z cvt.rn.f32.s16 %r5297, %rs1232; 2026-02-21T10:22:41.7104637Z bar.sync 0; 2026-02-21T10:22:41.7104714Z st.shared.b32 [%r152], %r5282; 2026-02-21T10:22:41.7104789Z st.shared.b32 [%r152+4096], %r5290; 2026-02-21T10:22:41.7104918Z st.shared.b32 [%r153], %r5283; 2026-02-21T10:22:41.7104990Z st.shared.b32 [%r153+4096], %r5291; 2026-02-21T10:22:41.7105056Z st.shared.b32 [%r154], %r5284; 2026-02-21T10:22:41.7105125Z st.shared.b32 [%r154+4096], %r5292; 2026-02-21T10:22:41.7105201Z st.shared.b32 [%r155], %r5285; 2026-02-21T10:22:41.7105269Z st.shared.b32 [%r155+4096], %r5293; 2026-02-21T10:22:41.7105336Z st.shared.b32 [%r156], %r5286; 2026-02-21T10:22:41.7105456Z st.shared.b32 [%r156+4096], %r5294; 2026-02-21T10:22:41.7105525Z st.shared.b32 [%r157], %r5287; 2026-02-21T10:22:41.7105594Z st.shared.b32 [%r157+4096], %r5295; 2026-02-21T10:22:41.7105658Z st.shared.b32 [%r158], %r5288; 2026-02-21T10:22:41.7105729Z st.shared.b32 [%r158+4096], %r5296; 2026-02-21T10:22:41.7105794Z st.shared.b32 [%r159], %r5289; 2026-02-21T10:22:41.7105861Z st.shared.b32 [%r159+4096], %r5297; 2026-02-21T10:22:41.7105921Z $L__tmp21: 2026-02-21T10:22:41.7106206Z .loc 2 291 36 // standard.py:291:36 @[ cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:90:40 ] 2026-02-21T10:22:41.7106274Z // begin inline asm 2026-02-21T10:22:41.7106359Z fence.proxy.async.shared::cta; 
2026-02-21T10:22:41.7106417Z // end inline asm 2026-02-21T10:22:41.7106595Z bar.sync 0; 2026-02-21T10:22:41.7106676Z wgmma.fence.sync.aligned; 2026-02-21T10:22:41.7106744Z // begin inline asm 2026-02-21T10:22:41.7107278Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r4492,%r4493,%r4494,%r4495}, %rd388, %p199, 1, 1; 2026-02-21T10:22:41.7107338Z // end inline asm 2026-02-21T10:22:41.7107402Z // begin inline asm 2026-02-21T10:22:41.7107904Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r4528,%r4529,%r4530,%r4531}, %rd389, %p199, 1, 1; 2026-02-21T10:22:41.7107965Z // end inline asm 2026-02-21T10:22:41.7108030Z // begin inline asm 2026-02-21T10:22:41.7108593Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r4564,%r4565,%r4566,%r4567}, %rd390, %p199, 1, 1; 2026-02-21T10:22:41.7108658Z // end inline asm 2026-02-21T10:22:41.7108722Z // begin inline asm 2026-02-21T10:22:41.7109222Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r4600,%r4601,%r4602,%r4603}, %rd391, %p199, 1, 1; 2026-02-21T10:22:41.7109280Z // end inline asm 2026-02-21T10:22:41.7109341Z // begin inline asm 2026-02-21T10:22:41.7109845Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r4636,%r4637,%r4638,%r4639}, %rd392, %p199, 1, 1; 2026-02-21T10:22:41.7109987Z // end inline asm 2026-02-21T10:22:41.7110048Z // begin inline asm 2026-02-21T10:22:41.7110546Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r4672,%r4673,%r4674,%r4675}, %rd393, %p199, 1, 1; 2026-02-21T10:22:41.7110680Z // end inline asm 2026-02-21T10:22:41.7110739Z // begin inline asm 2026-02-21T10:22:41.7111245Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r4708,%r4709,%r4710,%r4711}, %rd394, %p199, 1, 1; 2026-02-21T10:22:41.7111303Z // end inline asm 2026-02-21T10:22:41.7111361Z // begin inline asm 2026-02-21T10:22:41.7111864Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r4744,%r4745,%r4746,%r4747}, %rd395, %p199, 1, 1; 2026-02-21T10:22:41.7112021Z // end inline asm 2026-02-21T10:22:41.7112103Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:41.7112174Z mov.b32 %r4765, %r5093; 2026-02-21T10:22:41.7112234Z mov.b32 %r4766, %r5093; 2026-02-21T10:22:41.7112294Z mov.b32 %r4764, %r4127; 2026-02-21T10:22:41.7112357Z // begin inline asm 2026-02-21T10:22:41.7112733Z // wait for regs: %r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512,%r4764,%r4765,%r4766 2026-02-21T10:22:41.7112812Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:41.7112870Z // end inline asm 2026-02-21T10:22:41.7112929Z $L__tmp22: 2026-02-21T10:22:41.7113139Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.7113204Z add.s32 %r5298, %r1928, 73728; 2026-02-21T10:22:41.7113271Z add.s32 %r5299, %r5298, %r5200; 2026-02-21T10:22:41.7113475Z .loc 1 58 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:58:32 2026-02-21T10:22:41.7113545Z add.s32 %r5300, %r5299, %r143; 2026-02-21T10:22:41.7113614Z ld.shared.b16 %rs1233, [%r5300]; 2026-02-21T10:22:41.7113692Z 
ld.shared.b16 %rs1234, [%r5300+1024]; 2026-02-21T10:22:41.7113762Z ld.shared.b16 %rs1235, [%r5300+64]; 2026-02-21T10:22:41.7113836Z ld.shared.b16 %rs1236, [%r5300+1088]; 2026-02-21T10:22:41.7113903Z add.s32 %r5301, %r5299, %r144; 2026-02-21T10:22:41.7113984Z ld.shared.b16 %rs1237, [%r5301]; 2026-02-21T10:22:41.7114054Z ld.shared.b16 %rs1238, [%r5301+1024]; 2026-02-21T10:22:41.7114127Z ld.shared.b16 %rs1239, [%r5301+64]; 2026-02-21T10:22:41.7114195Z ld.shared.b16 %rs1240, [%r5301+1088]; 2026-02-21T10:22:41.7114258Z add.s32 %r5302, %r5299, %r145; 2026-02-21T10:22:41.7114326Z ld.shared.b16 %rs1241, [%r5302]; 2026-02-21T10:22:41.7114396Z ld.shared.b16 %rs1242, [%r5302+1024]; 2026-02-21T10:22:41.7114464Z ld.shared.b16 %rs1243, [%r5302+64]; 2026-02-21T10:22:41.7114531Z ld.shared.b16 %rs1244, [%r5302+1088]; 2026-02-21T10:22:41.7114603Z add.s32 %r5303, %r5299, %r146; 2026-02-21T10:22:41.7114670Z ld.shared.b16 %rs1245, [%r5303]; 2026-02-21T10:22:41.7114737Z ld.shared.b16 %rs1246, [%r5303+1024]; 2026-02-21T10:22:41.7114805Z ld.shared.b16 %rs1247, [%r5303+64]; 2026-02-21T10:22:41.7114881Z ld.shared.b16 %rs1248, [%r5303+1088]; 2026-02-21T10:22:41.7114945Z add.s32 %r5304, %r5299, %r147; 2026-02-21T10:22:41.7115011Z ld.shared.b16 %rs1249, [%r5304]; 2026-02-21T10:22:41.7115084Z ld.shared.b16 %rs1250, [%r5304+1024]; 2026-02-21T10:22:41.7115153Z ld.shared.b16 %rs1251, [%r5304+64]; 2026-02-21T10:22:41.7115221Z ld.shared.b16 %rs1252, [%r5304+1088]; 2026-02-21T10:22:41.7115289Z add.s32 %r5305, %r5299, %r148; 2026-02-21T10:22:41.7115354Z ld.shared.b16 %rs1253, [%r5305]; 2026-02-21T10:22:41.7115421Z ld.shared.b16 %rs1254, [%r5305+1024]; 2026-02-21T10:22:41.7115490Z ld.shared.b16 %rs1255, [%r5305+64]; 2026-02-21T10:22:41.7115562Z ld.shared.b16 %rs1256, [%r5305+1088]; 2026-02-21T10:22:41.7115624Z add.s32 %r5306, %r5299, %r149; 2026-02-21T10:22:41.7115753Z ld.shared.b16 %rs1257, [%r5306]; 2026-02-21T10:22:41.7115824Z ld.shared.b16 %rs1258, [%r5306+1024]; 2026-02-21T10:22:41.7115892Z 
ld.shared.b16 %rs1259, [%r5306+64]; 2026-02-21T10:22:41.7115959Z ld.shared.b16 %rs1260, [%r5306+1088]; 2026-02-21T10:22:41.7116071Z add.s32 %r5307, %r5299, %r150; 2026-02-21T10:22:41.7116142Z ld.shared.b16 %rs1261, [%r5307]; 2026-02-21T10:22:41.7116210Z ld.shared.b16 %rs1262, [%r5307+1024]; 2026-02-21T10:22:41.7116279Z ld.shared.b16 %rs1263, [%r5307+64]; 2026-02-21T10:22:41.7116349Z ld.shared.b16 %rs1264, [%r5307+1088]; 2026-02-21T10:22:41.7116415Z cvt.f32.bf16 %r4820, %rs1233; 2026-02-21T10:22:41.7116601Z cvt.f32.bf16 %r4821, %rs1234; 2026-02-21T10:22:41.7116674Z cvt.f32.bf16 %r4822, %rs1237; 2026-02-21T10:22:41.7116738Z cvt.f32.bf16 %r4823, %rs1238; 2026-02-21T10:22:41.7116800Z cvt.f32.bf16 %r4856, %rs1241; 2026-02-21T10:22:41.7116861Z cvt.f32.bf16 %r4857, %rs1242; 2026-02-21T10:22:41.7116927Z cvt.f32.bf16 %r4858, %rs1245; 2026-02-21T10:22:41.7117068Z cvt.f32.bf16 %r4859, %rs1246; 2026-02-21T10:22:41.7117145Z cvt.f32.bf16 %r4892, %rs1249; 2026-02-21T10:22:41.7117213Z cvt.f32.bf16 %r4893, %rs1250; 2026-02-21T10:22:41.7117276Z cvt.f32.bf16 %r4894, %rs1253; 2026-02-21T10:22:41.7117339Z cvt.f32.bf16 %r4895, %rs1254; 2026-02-21T10:22:41.7117401Z cvt.f32.bf16 %r4928, %rs1257; 2026-02-21T10:22:41.7117473Z cvt.f32.bf16 %r4929, %rs1258; 2026-02-21T10:22:41.7117599Z cvt.f32.bf16 %r4930, %rs1261; 2026-02-21T10:22:41.7117664Z cvt.f32.bf16 %r4931, %rs1262; 2026-02-21T10:22:41.7117729Z cvt.f32.bf16 %r4964, %rs1235; 2026-02-21T10:22:41.7117792Z cvt.f32.bf16 %r4965, %rs1236; 2026-02-21T10:22:41.7117853Z cvt.f32.bf16 %r4966, %rs1239; 2026-02-21T10:22:41.7117913Z cvt.f32.bf16 %r4967, %rs1240; 2026-02-21T10:22:41.7117979Z cvt.f32.bf16 %r5000, %rs1243; 2026-02-21T10:22:41.7118043Z cvt.f32.bf16 %r5001, %rs1244; 2026-02-21T10:22:41.7118106Z cvt.f32.bf16 %r5002, %rs1247; 2026-02-21T10:22:41.7118174Z cvt.f32.bf16 %r5003, %rs1248; 2026-02-21T10:22:41.7118239Z cvt.f32.bf16 %r5036, %rs1251; 2026-02-21T10:22:41.7118299Z cvt.f32.bf16 %r5037, %rs1252; 2026-02-21T10:22:41.7118361Z cvt.f32.bf16 
%r5038, %rs1255; 2026-02-21T10:22:41.7118427Z cvt.f32.bf16 %r5039, %rs1256; 2026-02-21T10:22:41.7118492Z cvt.f32.bf16 %r5072, %rs1259; 2026-02-21T10:22:41.7118553Z cvt.f32.bf16 %r5073, %rs1260; 2026-02-21T10:22:41.7118623Z cvt.f32.bf16 %r5074, %rs1263; 2026-02-21T10:22:41.7118685Z cvt.f32.bf16 %r5075, %rs1264; 2026-02-21T10:22:41.7118900Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.7118969Z add.s32 %r4786, %r3423, %r5211; 2026-02-21T10:22:41.7119030Z // begin inline asm 2026-02-21T10:22:41.7119082Z 2026-02-21T10:22:41.7119134Z { 2026-02-21T10:22:41.7119203Z .reg .pred complete; 2026-02-21T10:22:41.7119260Z waitLoop: 2026-02-21T10:22:41.7119404Z mbarrier.try_wait.parity.shared.b64 complete, [%r4786], %r5492; 2026-02-21T10:22:41.7119487Z @!complete bra.uni waitLoop; 2026-02-21T10:22:41.7119538Z } 2026-02-21T10:22:41.7119542Z 2026-02-21T10:22:41.7119602Z // end inline asm 2026-02-21T10:22:41.7119805Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.7119877Z add.s32 %r5310, %r3506, %r5213; 2026-02-21T10:22:41.7120075Z .loc 1 78 58 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:78:58 2026-02-21T10:22:41.7120139Z add.s32 %r5311, %r5310, %r151; 2026-02-21T10:22:41.7120222Z ld.shared.b8 %rs1265, [%r5311]; 2026-02-21T10:22:41.7120293Z ld.shared.b8 %rs1266, [%r5311+64]; 2026-02-21T10:22:41.7120361Z ld.shared.b8 %rs1267, [%r5311+256]; 2026-02-21T10:22:41.7120432Z ld.shared.b8 %rs1268, [%r5311+320]; 2026-02-21T10:22:41.7120497Z ld.shared.b8 %rs1269, [%r5311+512]; 2026-02-21T10:22:41.7120562Z ld.shared.b8 %rs1270, [%r5311+576]; 2026-02-21T10:22:41.7120628Z ld.shared.b8 %rs1271, [%r5311+768]; 2026-02-21T10:22:41.7120781Z ld.shared.b8 %rs1272, [%r5311+832]; 2026-02-21T10:22:41.7120845Z add.s32 %r5312, %r5310, %r5217; 2026-02-21T10:22:41.7120911Z ld.shared.b8 %rs1273, [%r5312+128]; 2026-02-21T10:22:41.7120980Z ld.shared.b8 %rs1274, [%r5312+192]; 
2026-02-21T10:22:41.7121117Z ld.shared.b8 %rs1275, [%r5312+384]; 2026-02-21T10:22:41.7121182Z ld.shared.b8 %rs1276, [%r5312+448]; 2026-02-21T10:22:41.7121255Z ld.shared.b8 %rs1277, [%r5312+640]; 2026-02-21T10:22:41.7121323Z ld.shared.b8 %rs1278, [%r5312+704]; 2026-02-21T10:22:41.7121389Z ld.shared.b8 %rs1279, [%r5312+896]; 2026-02-21T10:22:41.7121453Z ld.shared.b8 %rs1280, [%r5312+960]; 2026-02-21T10:22:41.7121655Z .loc 1 63 28 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:63:28 2026-02-21T10:22:41.7121720Z shl.b16 %rs1281, %rs1265, 4; 2026-02-21T10:22:41.7121784Z shl.b16 %rs1282, %rs1266, 4; 2026-02-21T10:22:41.7121849Z shl.b16 %rs1283, %rs1273, 4; 2026-02-21T10:22:41.7121914Z shl.b16 %rs1284, %rs1274, 4; 2026-02-21T10:22:41.7122033Z shl.b16 %rs1285, %rs1267, 4; 2026-02-21T10:22:41.7122098Z shl.b16 %rs1286, %rs1268, 4; 2026-02-21T10:22:41.7122164Z shl.b16 %rs1287, %rs1275, 4; 2026-02-21T10:22:41.7122225Z shl.b16 %rs1288, %rs1276, 4; 2026-02-21T10:22:41.7122290Z shl.b16 %rs1289, %rs1269, 4; 2026-02-21T10:22:41.7122355Z shl.b16 %rs1290, %rs1270, 4; 2026-02-21T10:22:41.7122417Z shl.b16 %rs1291, %rs1277, 4; 2026-02-21T10:22:41.7122528Z shl.b16 %rs1292, %rs1278, 4; 2026-02-21T10:22:41.7122597Z shl.b16 %rs1293, %rs1271, 4; 2026-02-21T10:22:41.7122659Z shl.b16 %rs1294, %rs1272, 4; 2026-02-21T10:22:41.7122720Z shl.b16 %rs1295, %rs1279, 4; 2026-02-21T10:22:41.7122782Z shl.b16 %rs1296, %rs1280, 4; 2026-02-21T10:22:41.7122984Z .loc 1 78 58 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:78:58 2026-02-21T10:22:41.7123068Z selp.b16 %rs1297, %rs1281, %rs1265, %p268; 2026-02-21T10:22:41.7123132Z cvt.s16.s8 %rs1298, %rs1297; 2026-02-21T10:22:41.7123199Z shr.s16 %rs1299, %rs1298, 4; 2026-02-21T10:22:41.7123279Z selp.b16 %rs1300, %rs1282, %rs1266, %p268; 2026-02-21T10:22:41.7123343Z cvt.s16.s8 %rs1301, %rs1300; 2026-02-21T10:22:41.7123405Z shr.s16 %rs1302, %rs1301, 4; 2026-02-21T10:22:41.7123485Z selp.b16 %rs1303, %rs1283, %rs1273, %p268; 
2026-02-21T10:22:41.7123549Z cvt.s16.s8 %rs1304, %rs1303; 2026-02-21T10:22:41.7123610Z shr.s16 %rs1305, %rs1304, 4; 2026-02-21T10:22:41.7123691Z selp.b16 %rs1306, %rs1284, %rs1274, %p268; 2026-02-21T10:22:41.7123753Z cvt.s16.s8 %rs1307, %rs1306; 2026-02-21T10:22:41.7123814Z shr.s16 %rs1308, %rs1307, 4; 2026-02-21T10:22:41.7123892Z selp.b16 %rs1309, %rs1285, %rs1267, %p268; 2026-02-21T10:22:41.7123958Z cvt.s16.s8 %rs1310, %rs1309; 2026-02-21T10:22:41.7124020Z shr.s16 %rs1311, %rs1310, 4; 2026-02-21T10:22:41.7124095Z selp.b16 %rs1312, %rs1286, %rs1268, %p268; 2026-02-21T10:22:41.7124162Z cvt.s16.s8 %rs1313, %rs1312; 2026-02-21T10:22:41.7124223Z shr.s16 %rs1314, %rs1313, 4; 2026-02-21T10:22:41.7124303Z selp.b16 %rs1315, %rs1287, %rs1275, %p268; 2026-02-21T10:22:41.7124368Z cvt.s16.s8 %rs1316, %rs1315; 2026-02-21T10:22:41.7124428Z shr.s16 %rs1317, %rs1316, 4; 2026-02-21T10:22:41.7124503Z selp.b16 %rs1318, %rs1288, %rs1276, %p268; 2026-02-21T10:22:41.7124567Z cvt.s16.s8 %rs1319, %rs1318; 2026-02-21T10:22:41.7124633Z shr.s16 %rs1320, %rs1319, 4; 2026-02-21T10:22:41.7124709Z selp.b16 %rs1321, %rs1289, %rs1269, %p268; 2026-02-21T10:22:41.7124776Z cvt.s16.s8 %rs1322, %rs1321; 2026-02-21T10:22:41.7124846Z shr.s16 %rs1323, %rs1322, 4; 2026-02-21T10:22:41.7124922Z selp.b16 %rs1324, %rs1290, %rs1270, %p268; 2026-02-21T10:22:41.7124985Z cvt.s16.s8 %rs1325, %rs1324; 2026-02-21T10:22:41.7125047Z shr.s16 %rs1326, %rs1325, 4; 2026-02-21T10:22:41.7125126Z selp.b16 %rs1327, %rs1291, %rs1277, %p268; 2026-02-21T10:22:41.7125189Z cvt.s16.s8 %rs1328, %rs1327; 2026-02-21T10:22:41.7125266Z shr.s16 %rs1329, %rs1328, 4; 2026-02-21T10:22:41.7125351Z selp.b16 %rs1330, %rs1292, %rs1278, %p268; 2026-02-21T10:22:41.7125477Z cvt.s16.s8 %rs1331, %rs1330; 2026-02-21T10:22:41.7125541Z shr.s16 %rs1332, %rs1331, 4; 2026-02-21T10:22:41.7125619Z selp.b16 %rs1333, %rs1293, %rs1271, %p268; 2026-02-21T10:22:41.7125680Z cvt.s16.s8 %rs1334, %rs1333; 2026-02-21T10:22:41.7125790Z shr.s16 %rs1335, %rs1334, 4; 
2026-02-21T10:22:41.7125866Z selp.b16 %rs1336, %rs1294, %rs1272, %p268; 2026-02-21T10:22:41.7125933Z cvt.s16.s8 %rs1337, %rs1336; 2026-02-21T10:22:41.7125998Z shr.s16 %rs1338, %rs1337, 4; 2026-02-21T10:22:41.7126072Z selp.b16 %rs1339, %rs1295, %rs1279, %p268; 2026-02-21T10:22:41.7126138Z cvt.s16.s8 %rs1340, %rs1339; 2026-02-21T10:22:41.7126200Z shr.s16 %rs1341, %rs1340, 4; 2026-02-21T10:22:41.7126276Z selp.b16 %rs1342, %rs1296, %rs1280, %p268; 2026-02-21T10:22:41.7126341Z cvt.s16.s8 %rs1343, %rs1342; 2026-02-21T10:22:41.7126411Z shr.s16 %rs1344, %rs1343, 4; 2026-02-21T10:22:41.7126735Z .loc 1 83 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:83:32 2026-02-21T10:22:41.7126884Z cvt.rn.f32.s16 %r5313, %rs1299; 2026-02-21T10:22:41.7126967Z cvt.rn.f32.s16 %r5314, %rs1302; 2026-02-21T10:22:41.7127032Z cvt.rn.f32.s16 %r5315, %rs1305; 2026-02-21T10:22:41.7127100Z cvt.rn.f32.s16 %r5316, %rs1308; 2026-02-21T10:22:41.7127171Z cvt.rn.f32.s16 %r5317, %rs1311; 2026-02-21T10:22:41.7127234Z cvt.rn.f32.s16 %r5318, %rs1314; 2026-02-21T10:22:41.7127299Z cvt.rn.f32.s16 %r5319, %rs1317; 2026-02-21T10:22:41.7127425Z cvt.rn.f32.s16 %r5320, %rs1320; 2026-02-21T10:22:41.7127499Z cvt.rn.f32.s16 %r5321, %rs1323; 2026-02-21T10:22:41.7127564Z cvt.rn.f32.s16 %r5322, %rs1326; 2026-02-21T10:22:41.7127626Z cvt.rn.f32.s16 %r5323, %rs1329; 2026-02-21T10:22:41.7127692Z cvt.rn.f32.s16 %r5324, %rs1332; 2026-02-21T10:22:41.7127755Z cvt.rn.f32.s16 %r5325, %rs1335; 2026-02-21T10:22:41.7127818Z cvt.rn.f32.s16 %r5326, %rs1338; 2026-02-21T10:22:41.7127881Z cvt.rn.f32.s16 %r5327, %rs1341; 2026-02-21T10:22:41.7127948Z cvt.rn.f32.s16 %r5328, %rs1344; 2026-02-21T10:22:41.7128007Z bar.sync 0; 2026-02-21T10:22:41.7128079Z st.shared.b32 [%r152], %r5313; 2026-02-21T10:22:41.7128153Z st.shared.b32 [%r152+4096], %r5321; 2026-02-21T10:22:41.7128218Z st.shared.b32 [%r153], %r5314; 2026-02-21T10:22:41.7128285Z st.shared.b32 [%r153+4096], %r5322; 2026-02-21T10:22:41.7128356Z st.shared.b32 [%r154], %r5315; 
2026-02-21T10:22:41.7128421Z st.shared.b32 [%r154+4096], %r5323; 2026-02-21T10:22:41.7128492Z st.shared.b32 [%r155], %r5316; 2026-02-21T10:22:41.7128557Z st.shared.b32 [%r155+4096], %r5324; 2026-02-21T10:22:41.7128625Z st.shared.b32 [%r156], %r5317; 2026-02-21T10:22:41.7128691Z st.shared.b32 [%r156+4096], %r5325; 2026-02-21T10:22:41.7128755Z st.shared.b32 [%r157], %r5318; 2026-02-21T10:22:41.7128824Z st.shared.b32 [%r157+4096], %r5326; 2026-02-21T10:22:41.7128888Z st.shared.b32 [%r158], %r5319; 2026-02-21T10:22:41.7128954Z st.shared.b32 [%r158+4096], %r5327; 2026-02-21T10:22:41.7129017Z st.shared.b32 [%r159], %r5320; 2026-02-21T10:22:41.7129089Z st.shared.b32 [%r159+4096], %r5328; 2026-02-21T10:22:41.7129159Z $L__tmp23: 2026-02-21T10:22:41.7129433Z .loc 2 291 36 // standard.py:291:36 @[ cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:90:40 ] 2026-02-21T10:22:41.7129506Z // begin inline asm 2026-02-21T10:22:41.7129584Z fence.proxy.async.shared::cta; 2026-02-21T10:22:41.7129643Z // end inline asm 2026-02-21T10:22:41.7129705Z bar.sync 0; 2026-02-21T10:22:41.7129781Z wgmma.fence.sync.aligned; 2026-02-21T10:22:41.7129841Z // begin inline asm 2026-02-21T10:22:41.7130352Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r4820,%r4821,%r4822,%r4823}, %rd388, %p199, 1, 1; 2026-02-21T10:22:41.7130416Z // end inline asm 2026-02-21T10:22:41.7130475Z // begin inline asm 2026-02-21T10:22:41.7130979Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r4856,%r4857,%r4858,%r4859}, %rd389, %p199, 1, 1; 2026-02-21T10:22:41.7131121Z // end inline asm 2026-02-21T10:22:41.7131182Z // begin inline asm 2026-02-21T10:22:41.7131683Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 
{%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r4892,%r4893,%r4894,%r4895}, %rd390, %p199, 1, 1; 2026-02-21T10:22:41.7131828Z // end inline asm 2026-02-21T10:22:41.7131888Z // begin inline asm 2026-02-21T10:22:41.7132396Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r4928,%r4929,%r4930,%r4931}, %rd391, %p199, 1, 1; 2026-02-21T10:22:41.7132460Z // end inline asm 2026-02-21T10:22:41.7132519Z // begin inline asm 2026-02-21T10:22:41.7133065Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r4964,%r4965,%r4966,%r4967}, %rd392, %p199, 1, 1; 2026-02-21T10:22:41.7133132Z // end inline asm 2026-02-21T10:22:41.7133190Z // begin inline asm 2026-02-21T10:22:41.7133733Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r5000,%r5001,%r5002,%r5003}, %rd393, %p199, 1, 1; 2026-02-21T10:22:41.7133799Z // end inline asm 2026-02-21T10:22:41.7133858Z // begin inline asm 2026-02-21T10:22:41.7134356Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r5036,%r5037,%r5038,%r5039}, %rd394, %p199, 1, 1; 2026-02-21T10:22:41.7134414Z // end inline asm 2026-02-21T10:22:41.7134484Z // begin inline asm 2026-02-21T10:22:41.7134983Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512}, {%r5072,%r5073,%r5074,%r5075}, %rd395, %p199, 1, 1; 2026-02-21T10:22:41.7135043Z // end inline asm 2026-02-21T10:22:41.7135122Z wgmma.commit_group.sync.aligned; 
2026-02-21T10:22:41.7135188Z mov.b32 %r5094, %r5093; 2026-02-21T10:22:41.7135250Z mov.b32 %r5092, %r4127; 2026-02-21T10:22:41.7135318Z // begin inline asm 2026-02-21T10:22:41.7135633Z // wait for regs: %r5497,%r5498,%r5499,%r5500,%r5501,%r5502,%r5503,%r5504,%r5505,%r5506,%r5507,%r5508,%r5509,%r5510,%r5511,%r5512,%r5092,%r5093,%r5094 2026-02-21T10:22:41.7135711Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:41.7135779Z // end inline asm 2026-02-21T10:22:41.7135835Z $L__tmp24: 2026-02-21T10:22:41.7136054Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.7136119Z add.s32 %r5329, %r5491, 128; 2026-02-21T10:22:41.7136188Z add.s32 %r5330, %r5494, 1; 2026-02-21T10:22:41.7136257Z setp.gt.s32 %p245, %r5330, 2; 2026-02-21T10:22:41.7136330Z selp.b32 %r5494, 0, %r5330, %p245; 2026-02-21T10:22:41.7136401Z selp.b32 %r5491, 0, %r5329, %p239; 2026-02-21T10:22:41.7136726Z .loc 1 51 22 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:51:22 2026-02-21T10:22:41.7136800Z shl.b32 %r5331, %r5491, 1; 2026-02-21T10:22:41.7137001Z .loc 1 53 25 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:53:25 2026-02-21T10:22:41.7137072Z add.s32 %r5332, %r5331, %r20; 2026-02-21T10:22:41.7137280Z .loc 1 54 53 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:53 2026-02-21T10:22:41.7137347Z shl.b32 %r5333, %r5519, 13; 2026-02-21T10:22:41.7137414Z shl.b32 %r5334, %r5520, 13; 2026-02-21T10:22:41.7137476Z shl.b32 %r5335, %r5521, 13; 2026-02-21T10:22:41.7137539Z shl.b32 %r5336, %r5522, 13; 2026-02-21T10:22:41.7137606Z shl.b32 %r5337, %r5523, 13; 2026-02-21T10:22:41.7137666Z shl.b32 %r5338, %r5524, 13; 2026-02-21T10:22:41.7137813Z shl.b32 %r5339, %r5525, 13; 2026-02-21T10:22:41.7137873Z shl.b32 %r5340, %r5526, 13; 2026-02-21T10:22:41.7138073Z .loc 1 54 60 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:60 2026-02-21T10:22:41.7138198Z add.s32 %r5341, %r5333, %r5332; 2026-02-21T10:22:41.7138261Z add.s32 %r5342, 
%r5334, %r5332; 2026-02-21T10:22:41.7138327Z add.s32 %r5343, %r5335, %r5332; 2026-02-21T10:22:41.7138391Z add.s32 %r5344, %r5336, %r5332; 2026-02-21T10:22:41.7138453Z add.s32 %r5345, %r5337, %r5332; 2026-02-21T10:22:41.7138520Z add.s32 %r5346, %r5338, %r5332; 2026-02-21T10:22:41.7138582Z add.s32 %r5347, %r5339, %r5332; 2026-02-21T10:22:41.7138647Z add.s32 %r5348, %r5340, %r5332; 2026-02-21T10:22:41.7138842Z .loc 1 54 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:32 2026-02-21T10:22:41.7138921Z mad.wide.s32 %rd420, %r5341, 2, %rd35; 2026-02-21T10:22:41.7139005Z mad.wide.s32 %rd421, %r5342, 2, %rd35; 2026-02-21T10:22:41.7139145Z mad.wide.s32 %rd422, %r5343, 2, %rd35; 2026-02-21T10:22:41.7139223Z mad.wide.s32 %rd423, %r5344, 2, %rd35; 2026-02-21T10:22:41.7139290Z mad.wide.s32 %rd424, %r5345, 2, %rd35; 2026-02-21T10:22:41.7139358Z mad.wide.s32 %rd425, %r5346, 2, %rd35; 2026-02-21T10:22:41.7139434Z mad.wide.s32 %rd426, %r5347, 2, %rd35; 2026-02-21T10:22:41.7139502Z mad.wide.s32 %rd427, %r5348, 2, %rd35; 2026-02-21T10:22:41.7139767Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.7139832Z shl.b32 %r5349, %r5494, 13; 2026-02-21T10:22:41.7139900Z add.s32 %r5350, %r1928, %r5349; 2026-02-21T10:22:41.7139963Z add.s32 %r5114, %r5350, %r141; 2026-02-21T10:22:41.7140028Z selp.b32 %r5115, 8, 0, %p240; 2026-02-21T10:22:41.7140095Z // begin inline asm 2026-02-21T10:22:41.7140245Z cp.async.ca.shared.global [ %r5114 + 0 ], [ %rd420 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7140303Z // end inline asm 2026-02-21T10:22:41.7140369Z add.s32 %r5116, %r5114, 1024; 2026-02-21T10:22:41.7140432Z // begin inline asm 2026-02-21T10:22:41.7140569Z cp.async.ca.shared.global [ %r5116 + 0 ], [ %rd421 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7140627Z // end inline asm 2026-02-21T10:22:41.7140700Z add.s32 %r5118, %r5114, 2048; 2026-02-21T10:22:41.7140760Z // begin inline asm 2026-02-21T10:22:41.7140893Z cp.async.ca.shared.global [ %r5118 
+ 0 ], [ %rd422 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7140957Z // end inline asm 2026-02-21T10:22:41.7141019Z add.s32 %r5120, %r5114, 3072; 2026-02-21T10:22:41.7141082Z // begin inline asm 2026-02-21T10:22:41.7141214Z cp.async.ca.shared.global [ %r5120 + 0 ], [ %rd423 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7141278Z // end inline asm 2026-02-21T10:22:41.7141340Z add.s32 %r5122, %r5114, 4096; 2026-02-21T10:22:41.7141402Z // begin inline asm 2026-02-21T10:22:41.7141551Z cp.async.ca.shared.global [ %r5122 + 0 ], [ %rd424 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7141612Z // end inline asm 2026-02-21T10:22:41.7141679Z add.s32 %r5124, %r5114, 5120; 2026-02-21T10:22:41.7141739Z // begin inline asm 2026-02-21T10:22:41.7141874Z cp.async.ca.shared.global [ %r5124 + 0 ], [ %rd425 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7141933Z // end inline asm 2026-02-21T10:22:41.7141999Z add.s32 %r5126, %r5114, 6144; 2026-02-21T10:22:41.7142064Z // begin inline asm 2026-02-21T10:22:41.7142199Z cp.async.ca.shared.global [ %r5126 + 0 ], [ %rd426 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7142257Z // end inline asm 2026-02-21T10:22:41.7142319Z add.s32 %r5128, %r5114, 7168; 2026-02-21T10:22:41.7142385Z // begin inline asm 2026-02-21T10:22:41.7142517Z cp.async.ca.shared.global [ %r5128 + 0 ], [ %rd427 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7142576Z // end inline asm 2026-02-21T10:22:41.7142650Z cp.async.commit_group; 2026-02-21T10:22:41.7142862Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.7142927Z shl.b32 %r5351, %r5494, 3; 2026-02-21T10:22:41.7143060Z add.s32 %r5130, %r3414, %r5351; 2026-02-21T10:22:41.7143128Z and.pred %p231, %p256, %p240; 2026-02-21T10:22:41.7143190Z // begin inline asm 2026-02-21T10:22:41.7143324Z @%p231 mbarrier.arrive.expect_tx.shared.b64 _, [%r5130], 1024; 2026-02-21T10:22:41.7143436Z // end inline asm 2026-02-21T10:22:41.7143636Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 
2026-02-21T10:22:41.7143701Z shl.b32 %r5352, %r5494, 10; 2026-02-21T10:22:41.7143769Z add.s32 %r5131, %r3443, %r5352; 2026-02-21T10:22:41.7143826Z bar.sync 0; 2026-02-21T10:22:41.7143896Z elect.sync %r5353|%p246, -1; 2026-02-21T10:22:41.7143970Z and.pred %p247, %p240, %p246; 2026-02-21T10:22:41.7144040Z and.pred %p232, %p1, %p247; 2026-02-21T10:22:41.7144100Z // begin inline asm 2026-02-21T10:22:41.7144433Z @%p232 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r5131], [%rd247, {%r5516, %r5491}], [%r5130]; 2026-02-21T10:22:41.7144502Z // end inline asm 2026-02-21T10:22:41.7144761Z .loc 1 47 126 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:47:126 2026-02-21T10:22:41.7144826Z add.s32 %r5154, %r5491, 32; 2026-02-21T10:22:41.7145029Z .loc 1 51 22 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:51:22 2026-02-21T10:22:41.7145095Z shl.b32 %r5354, %r5154, 1; 2026-02-21T10:22:41.7145335Z .loc 1 53 25 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:53:25 2026-02-21T10:22:41.7145417Z add.s32 %r5355, %r5354, %r20; 2026-02-21T10:22:41.7145616Z .loc 1 54 60 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:60 2026-02-21T10:22:41.7145680Z add.s32 %r5356, %r5333, %r5355; 2026-02-21T10:22:41.7145744Z add.s32 %r5357, %r5334, %r5355; 2026-02-21T10:22:41.7145811Z add.s32 %r5358, %r5335, %r5355; 2026-02-21T10:22:41.7145873Z add.s32 %r5359, %r5336, %r5355; 2026-02-21T10:22:41.7145936Z add.s32 %r5360, %r5337, %r5355; 2026-02-21T10:22:41.7146008Z add.s32 %r5361, %r5338, %r5355; 2026-02-21T10:22:41.7146073Z add.s32 %r5362, %r5339, %r5355; 2026-02-21T10:22:41.7146135Z add.s32 %r5363, %r5340, %r5355; 2026-02-21T10:22:41.7146338Z .loc 1 54 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:32 2026-02-21T10:22:41.7146414Z mad.wide.s32 %rd429, %r5356, 2, %rd35; 2026-02-21T10:22:41.7146606Z mad.wide.s32 %rd430, %r5357, 2, %rd35; 2026-02-21T10:22:41.7146683Z mad.wide.s32 %rd431, %r5358, 2, %rd35; 
2026-02-21T10:22:41.7146759Z mad.wide.s32 %rd432, %r5359, 2, %rd35; 2026-02-21T10:22:41.7146827Z mad.wide.s32 %rd433, %r5360, 2, %rd35; 2026-02-21T10:22:41.7146895Z mad.wide.s32 %rd434, %r5361, 2, %rd35; 2026-02-21T10:22:41.7146982Z mad.wide.s32 %rd435, %r5362, 2, %rd35; 2026-02-21T10:22:41.7147049Z mad.wide.s32 %rd436, %r5363, 2, %rd35; 2026-02-21T10:22:41.7147245Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.7147318Z add.s32 %r5364, %r5236, %r5349; 2026-02-21T10:22:41.7147381Z add.s32 %r5135, %r5364, %r141; 2026-02-21T10:22:41.7147443Z // begin inline asm 2026-02-21T10:22:41.7147582Z cp.async.ca.shared.global [ %r5135 + 0 ], [ %rd429 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7147649Z // end inline asm 2026-02-21T10:22:41.7147712Z add.s32 %r5137, %r5135, 1024; 2026-02-21T10:22:41.7147772Z // begin inline asm 2026-02-21T10:22:41.7147917Z cp.async.ca.shared.global [ %r5137 + 0 ], [ %rd430 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7147976Z // end inline asm 2026-02-21T10:22:41.7148038Z add.s32 %r5139, %r5135, 2048; 2026-02-21T10:22:41.7148362Z // begin inline asm 2026-02-21T10:22:41.7148704Z cp.async.ca.shared.global [ %r5139 + 0 ], [ %rd431 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7149018Z // end inline asm 2026-02-21T10:22:41.7149179Z add.s32 %r5141, %r5135, 3072; 2026-02-21T10:22:41.7149370Z // begin inline asm 2026-02-21T10:22:41.7149614Z cp.async.ca.shared.global [ %r5141 + 0 ], [ %rd432 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7149978Z // end inline asm 2026-02-21T10:22:41.7150136Z add.s32 %r5143, %r5135, 4096; 2026-02-21T10:22:41.7150312Z // begin inline asm 2026-02-21T10:22:41.7150545Z cp.async.ca.shared.global [ %r5143 + 0 ], [ %rd433 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7150906Z // end inline asm 2026-02-21T10:22:41.7151063Z add.s32 %r5145, %r5135, 5120; 2026-02-21T10:22:41.7151240Z // begin inline asm 2026-02-21T10:22:41.7151472Z cp.async.ca.shared.global [ %r5145 + 0 ], [ %rd434 + 0 ], 0x8, %r5115; 
2026-02-21T10:22:41.7151751Z // end inline asm 2026-02-21T10:22:41.7151945Z add.s32 %r5147, %r5135, 6144; 2026-02-21T10:22:41.7152203Z // begin inline asm 2026-02-21T10:22:41.7152466Z cp.async.ca.shared.global [ %r5147 + 0 ], [ %rd435 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7152753Z // end inline asm 2026-02-21T10:22:41.7152945Z add.s32 %r5149, %r5135, 7168; 2026-02-21T10:22:41.7153127Z // begin inline asm 2026-02-21T10:22:41.7153484Z cp.async.ca.shared.global [ %r5149 + 0 ], [ %rd436 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7153832Z // end inline asm 2026-02-21T10:22:41.7154037Z cp.async.commit_group; 2026-02-21T10:22:41.7154413Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.7154793Z add.s32 %r5151, %r3417, %r5351; 2026-02-21T10:22:41.7155021Z // begin inline asm 2026-02-21T10:22:41.7155339Z @%p231 mbarrier.arrive.expect_tx.shared.b64 _, [%r5151], 1024; 2026-02-21T10:22:41.7155613Z // end inline asm 2026-02-21T10:22:41.7155905Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.7156264Z add.s32 %r5152, %r3464, %r5352; 2026-02-21T10:22:41.7156638Z bar.sync 0; 2026-02-21T10:22:41.7156821Z elect.sync %r5365|%p248, -1; 2026-02-21T10:22:41.7157027Z and.pred %p249, %p240, %p248; 2026-02-21T10:22:41.7157233Z and.pred %p234, %p1, %p249; 2026-02-21T10:22:41.7157418Z // begin inline asm 2026-02-21T10:22:41.7157855Z @%p234 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r5152], [%rd247, {%r5516, %r5154}], [%r5151]; 2026-02-21T10:22:41.7158330Z // end inline asm 2026-02-21T10:22:41.7158635Z .loc 1 47 126 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:47:126 2026-02-21T10:22:41.7159006Z add.s32 %r5175, %r5491, 64; 2026-02-21T10:22:41.7159327Z .loc 1 51 22 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:51:22 2026-02-21T10:22:41.7159688Z shl.b32 %r5366, %r5175, 1; 2026-02-21T10:22:41.7160001Z .loc 1 53 25 // 
cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:53:25 2026-02-21T10:22:41.7160354Z add.s32 %r5367, %r5366, %r20; 2026-02-21T10:22:41.7160670Z .loc 1 54 60 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:60 2026-02-21T10:22:41.7161028Z add.s32 %r5368, %r5333, %r5367; 2026-02-21T10:22:41.7161221Z add.s32 %r5369, %r5334, %r5367; 2026-02-21T10:22:41.7161404Z add.s32 %r5370, %r5335, %r5367; 2026-02-21T10:22:41.7161590Z add.s32 %r5371, %r5336, %r5367; 2026-02-21T10:22:41.7161781Z add.s32 %r5372, %r5337, %r5367; 2026-02-21T10:22:41.7161960Z add.s32 %r5373, %r5338, %r5367; 2026-02-21T10:22:41.7162148Z add.s32 %r5374, %r5339, %r5367; 2026-02-21T10:22:41.7162331Z add.s32 %r5375, %r5340, %r5367; 2026-02-21T10:22:41.7162651Z .loc 1 54 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:32 2026-02-21T10:22:41.7163010Z mad.wide.s32 %rd438, %r5368, 2, %rd35; 2026-02-21T10:22:41.7163230Z mad.wide.s32 %rd439, %r5369, 2, %rd35; 2026-02-21T10:22:41.7163441Z mad.wide.s32 %rd440, %r5370, 2, %rd35; 2026-02-21T10:22:41.7163653Z mad.wide.s32 %rd441, %r5371, 2, %rd35; 2026-02-21T10:22:41.7163859Z mad.wide.s32 %rd442, %r5372, 2, %rd35; 2026-02-21T10:22:41.7164062Z mad.wide.s32 %rd443, %r5373, 2, %rd35; 2026-02-21T10:22:41.7164272Z mad.wide.s32 %rd444, %r5374, 2, %rd35; 2026-02-21T10:22:41.7164589Z mad.wide.s32 %rd445, %r5375, 2, %rd35; 2026-02-21T10:22:41.7164952Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.7165307Z add.s32 %r5376, %r5267, %r5349; 2026-02-21T10:22:41.7165574Z add.s32 %r5156, %r5376, %r141; 2026-02-21T10:22:41.7165762Z // begin inline asm 2026-02-21T10:22:41.7166028Z cp.async.ca.shared.global [ %r5156 + 0 ], [ %rd438 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7166320Z // end inline asm 2026-02-21T10:22:41.7166591Z add.s32 %r5158, %r5156, 1024; 2026-02-21T10:22:41.7166782Z // begin inline asm 2026-02-21T10:22:41.7167026Z cp.async.ca.shared.global [ %r5158 + 0 ], [ %rd439 + 0 ], 0x8, %r5115; 
2026-02-21T10:22:41.7167313Z // end inline asm 2026-02-21T10:22:41.7167467Z add.s32 %r5160, %r5156, 2048; 2026-02-21T10:22:41.7167652Z // begin inline asm 2026-02-21T10:22:41.7167885Z cp.async.ca.shared.global [ %r5160 + 0 ], [ %rd440 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7168167Z // end inline asm 2026-02-21T10:22:41.7171915Z add.s32 %r5162, %r5156, 3072; 2026-02-21T10:22:41.7172176Z // begin inline asm 2026-02-21T10:22:41.7172456Z cp.async.ca.shared.global [ %r5162 + 0 ], [ %rd441 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7172767Z // end inline asm 2026-02-21T10:22:41.7172939Z add.s32 %r5164, %r5156, 4096; 2026-02-21T10:22:41.7173135Z // begin inline asm 2026-02-21T10:22:41.7173468Z cp.async.ca.shared.global [ %r5164 + 0 ], [ %rd442 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7173774Z // end inline asm 2026-02-21T10:22:41.7173932Z add.s32 %r5166, %r5156, 5120; 2026-02-21T10:22:41.7174115Z // begin inline asm 2026-02-21T10:22:41.7174349Z cp.async.ca.shared.global [ %r5166 + 0 ], [ %rd443 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7174631Z // end inline asm 2026-02-21T10:22:41.7174793Z add.s32 %r5168, %r5156, 6144; 2026-02-21T10:22:41.7174974Z // begin inline asm 2026-02-21T10:22:41.7175216Z cp.async.ca.shared.global [ %r5168 + 0 ], [ %rd444 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7175490Z // end inline asm 2026-02-21T10:22:41.7175651Z add.s32 %r5170, %r5156, 7168; 2026-02-21T10:22:41.7175825Z // begin inline asm 2026-02-21T10:22:41.7176058Z cp.async.ca.shared.global [ %r5170 + 0 ], [ %rd445 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7176334Z // end inline asm 2026-02-21T10:22:41.7176671Z cp.async.commit_group; 2026-02-21T10:22:41.7177017Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.7177386Z add.s32 %r5172, %r3420, %r5351; 2026-02-21T10:22:41.7177575Z // begin inline asm 2026-02-21T10:22:41.7177824Z @%p231 mbarrier.arrive.expect_tx.shared.b64 _, [%r5172], 1024; 2026-02-21T10:22:41.7178104Z // end inline asm 
2026-02-21T10:22:41.7178406Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.7178787Z add.s32 %r5173, %r3485, %r5352; 2026-02-21T10:22:41.7178981Z bar.sync 0; 2026-02-21T10:22:41.7179140Z elect.sync %r5377|%p250, -1; 2026-02-21T10:22:41.7179350Z and.pred %p251, %p240, %p250; 2026-02-21T10:22:41.7179542Z and.pred %p236, %p1, %p251; 2026-02-21T10:22:41.7179726Z // begin inline asm 2026-02-21T10:22:41.7180166Z @%p236 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r5173], [%rd247, {%r5516, %r5175}], [%r5172]; 2026-02-21T10:22:41.7180657Z // end inline asm 2026-02-21T10:22:41.7180984Z .loc 1 47 126 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:47:126 2026-02-21T10:22:41.7181355Z add.s32 %r5196, %r5491, 96; 2026-02-21T10:22:41.7181681Z .loc 1 51 22 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:51:22 2026-02-21T10:22:41.7182031Z shl.b32 %r5378, %r5196, 1; 2026-02-21T10:22:41.7182349Z .loc 1 53 25 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:53:25 2026-02-21T10:22:41.7182696Z add.s32 %r5379, %r5378, %r20; 2026-02-21T10:22:41.7183015Z .loc 1 54 60 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:60 2026-02-21T10:22:41.7183456Z add.s32 %r5380, %r5333, %r5379; 2026-02-21T10:22:41.7183645Z add.s32 %r5381, %r5334, %r5379; 2026-02-21T10:22:41.7183836Z add.s32 %r5382, %r5335, %r5379; 2026-02-21T10:22:41.7184083Z add.s32 %r5383, %r5336, %r5379; 2026-02-21T10:22:41.7184267Z add.s32 %r5384, %r5337, %r5379; 2026-02-21T10:22:41.7184445Z add.s32 %r5385, %r5338, %r5379; 2026-02-21T10:22:41.7184630Z add.s32 %r5386, %r5339, %r5379; 2026-02-21T10:22:41.7184807Z add.s32 %r5387, %r5340, %r5379; 2026-02-21T10:22:41.7185139Z .loc 1 54 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:32 2026-02-21T10:22:41.7185503Z mad.wide.s32 %rd447, %r5380, 2, %rd35; 2026-02-21T10:22:41.7185724Z mad.wide.s32 %rd448, %r5381, 2, %rd35; 
2026-02-21T10:22:41.7185937Z mad.wide.s32 %rd449, %r5382, 2, %rd35; 2026-02-21T10:22:41.7186137Z mad.wide.s32 %rd450, %r5383, 2, %rd35; 2026-02-21T10:22:41.7186344Z mad.wide.s32 %rd451, %r5384, 2, %rd35; 2026-02-21T10:22:41.7186750Z mad.wide.s32 %rd452, %r5385, 2, %rd35; 2026-02-21T10:22:41.7186966Z mad.wide.s32 %rd453, %r5386, 2, %rd35; 2026-02-21T10:22:41.7187172Z mad.wide.s32 %rd454, %r5387, 2, %rd35; 2026-02-21T10:22:41.7187513Z .loc 1 54 80 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:54:80 2026-02-21T10:22:41.7187868Z add.s32 %r5388, %r5298, %r5349; 2026-02-21T10:22:41.7188123Z add.s32 %r5177, %r5388, %r141; 2026-02-21T10:22:41.7188314Z // begin inline asm 2026-02-21T10:22:41.7188626Z cp.async.ca.shared.global [ %r5177 + 0 ], [ %rd447 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7188914Z // end inline asm 2026-02-21T10:22:41.7189069Z add.s32 %r5179, %r5177, 1024; 2026-02-21T10:22:41.7189248Z // begin inline asm 2026-02-21T10:22:41.7189485Z cp.async.ca.shared.global [ %r5179 + 0 ], [ %rd448 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7189760Z // end inline asm 2026-02-21T10:22:41.7189913Z add.s32 %r5181, %r5177, 2048; 2026-02-21T10:22:41.7190104Z // begin inline asm 2026-02-21T10:22:41.7190338Z cp.async.ca.shared.global [ %r5181 + 0 ], [ %rd449 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7190610Z // end inline asm 2026-02-21T10:22:41.7190766Z add.s32 %r5183, %r5177, 3072; 2026-02-21T10:22:41.7190942Z // begin inline asm 2026-02-21T10:22:41.7191172Z cp.async.ca.shared.global [ %r5183 + 0 ], [ %rd450 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7191442Z // end inline asm 2026-02-21T10:22:41.7191596Z add.s32 %r5185, %r5177, 4096; 2026-02-21T10:22:41.7191773Z // begin inline asm 2026-02-21T10:22:41.7192000Z cp.async.ca.shared.global [ %r5185 + 0 ], [ %rd451 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7192276Z // end inline asm 2026-02-21T10:22:41.7192421Z add.s32 %r5187, %r5177, 5120; 2026-02-21T10:22:41.7192605Z // begin inline asm 2026-02-21T10:22:41.7192840Z 
cp.async.ca.shared.global [ %r5187 + 0 ], [ %rd452 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7193123Z // end inline asm 2026-02-21T10:22:41.7193280Z add.s32 %r5189, %r5177, 6144; 2026-02-21T10:22:41.7193466Z // begin inline asm 2026-02-21T10:22:41.7193703Z cp.async.ca.shared.global [ %r5189 + 0 ], [ %rd453 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7193977Z // end inline asm 2026-02-21T10:22:41.7194132Z add.s32 %r5191, %r5177, 7168; 2026-02-21T10:22:41.7194312Z // begin inline asm 2026-02-21T10:22:41.7194542Z cp.async.ca.shared.global [ %r5191 + 0 ], [ %rd454 + 0 ], 0x8, %r5115; 2026-02-21T10:22:41.7194818Z // end inline asm 2026-02-21T10:22:41.7194993Z cp.async.commit_group; 2026-02-21T10:22:41.7195311Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.7195700Z add.s32 %r5193, %r3423, %r5351; 2026-02-21T10:22:41.7195897Z // begin inline asm 2026-02-21T10:22:41.7196129Z @%p231 mbarrier.arrive.expect_tx.shared.b64 _, [%r5193], 1024; 2026-02-21T10:22:41.7196406Z // end inline asm 2026-02-21T10:22:41.7196843Z .loc 1 60 33 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:60:33 2026-02-21T10:22:41.7197303Z add.s32 %r5194, %r3506, %r5352; 2026-02-21T10:22:41.7197485Z bar.sync 0; 2026-02-21T10:22:41.7197651Z elect.sync %r5389|%p252, -1; 2026-02-21T10:22:41.7197847Z and.pred %p253, %p240, %p252; 2026-02-21T10:22:41.7198109Z and.pred %p238, %p1, %p253; 2026-02-21T10:22:41.7198294Z // begin inline asm 2026-02-21T10:22:41.7198726Z @%p238 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r5194], [%rd247, {%r5516, %r5196}], [%r5193]; 2026-02-21T10:22:41.7199205Z // end inline asm 2026-02-21T10:22:41.7199507Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.7199876Z setp.ne.b32 %p254, %r5490, 31; 2026-02-21T10:22:41.7200063Z @%p254 bra $L__BB0_13; 2026-02-21T10:22:41.7200284Z // %bb.12: // in Loop: Header=BB0_9 Depth=1 2026-02-21T10:22:41.7200692Z 
.loc 1 38 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:38:32 2026-02-21T10:22:41.7201140Z add.s32 %r5409, %r5488, %r6; 2026-02-21T10:22:41.7201472Z .loc 1 40 32 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:40:32 2026-02-21T10:22:41.7201822Z add.s32 %r5410, %r5486, %r18; 2026-02-21T10:22:41.7202008Z add.s32 %r5411, %r5486, %r19; 2026-02-21T10:22:41.7202383Z .loc 1 93 28 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:93:28 2026-02-21T10:22:41.7202755Z cvt.rn.bf16x2.f32 %r5412, %r5498, %r5497; 2026-02-21T10:22:41.7202987Z cvt.rn.bf16x2.f32 %r5413, %r5500, %r5499; 2026-02-21T10:22:41.7203203Z cvt.rn.bf16x2.f32 %r5414, %r5502, %r5501; 2026-02-21T10:22:41.7203424Z cvt.rn.bf16x2.f32 %r5415, %r5504, %r5503; 2026-02-21T10:22:41.7203636Z cvt.rn.bf16x2.f32 %r5416, %r5506, %r5505; 2026-02-21T10:22:41.7203853Z cvt.rn.bf16x2.f32 %r5417, %r5508, %r5507; 2026-02-21T10:22:41.7204061Z cvt.rn.bf16x2.f32 %r5418, %r5510, %r5509; 2026-02-21T10:22:41.7204276Z cvt.rn.bf16x2.f32 %r5419, %r5512, %r5511; 2026-02-21T10:22:41.7204634Z .loc 1 94 50 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:94:50 2026-02-21T10:22:41.7204998Z mad.lo.s32 %r5420, %r5410, 1280, %r5409; 2026-02-21T10:22:41.7205222Z mad.lo.s32 %r5421, %r5411, 1280, %r5409; 2026-02-21T10:22:41.7205559Z .loc 1 94 22 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:94:22 2026-02-21T10:22:41.7205923Z mad.wide.s32 %rd456, %r5420, 2, %rd36; 2026-02-21T10:22:41.7206134Z mad.wide.s32 %rd457, %r5421, 2, %rd36; 2026-02-21T10:22:41.7206595Z .loc 1 94 81 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:94:81 2026-02-21T10:22:41.7207004Z st.shared.v4.b32 [%r160], {%r5412, %r5414, %r5416, %r5418}; 2026-02-21T10:22:41.7207312Z st.shared.v4.b32 [%r160+128], {%r5413, %r5415, %r5417, %r5419}; 2026-02-21T10:22:41.7207575Z bar.sync 0; 2026-02-21T10:22:41.7207722Z // begin inline asm 2026-02-21T10:22:41.7208017Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5400, %r5401, %r5402, 
%r5403}, [%r5394]; 2026-02-21T10:22:41.7208354Z // end inline asm 2026-02-21T10:22:41.7208508Z // begin inline asm 2026-02-21T10:22:41.7208787Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5404, %r5405, %r5406, %r5407}, [%r5399]; 2026-02-21T10:22:41.7209131Z // end inline asm 2026-02-21T10:22:41.7209286Z // begin inline asm 2026-02-21T10:22:41.7209508Z st.global.v4.b32 [ %rd456 + 0 ], { %r5400, %r5401, %r5402, %r5403 }; 2026-02-21T10:22:41.7209778Z // end inline asm 2026-02-21T10:22:41.7209925Z // begin inline asm 2026-02-21T10:22:41.7210137Z st.global.v4.b32 [ %rd457 + 0 ], { %r5404, %r5405, %r5406, %r5407 }; 2026-02-21T10:22:41.7210392Z // end inline asm 2026-02-21T10:22:41.7210550Z mov.b32 %r5497, 0f00000000; 2026-02-21T10:22:41.7210729Z mov.b32 %r5498, %r5497; 2026-02-21T10:22:41.7210903Z mov.b32 %r5499, %r5497; 2026-02-21T10:22:41.7211072Z mov.b32 %r5500, %r5497; 2026-02-21T10:22:41.7211230Z mov.b32 %r5501, %r5497; 2026-02-21T10:22:41.7211492Z mov.b32 %r5502, %r5497; 2026-02-21T10:22:41.7211666Z mov.b32 %r5503, %r5497; 2026-02-21T10:22:41.7211830Z mov.b32 %r5504, %r5497; 2026-02-21T10:22:41.7211990Z mov.b32 %r5505, %r5497; 2026-02-21T10:22:41.7212152Z mov.b32 %r5506, %r5497; 2026-02-21T10:22:41.7212381Z mov.b32 %r5507, %r5497; 2026-02-21T10:22:41.7212544Z mov.b32 %r5508, %r5497; 2026-02-21T10:22:41.7212708Z mov.b32 %r5509, %r5497; 2026-02-21T10:22:41.7212872Z mov.b32 %r5510, %r5497; 2026-02-21T10:22:41.7213035Z mov.b32 %r5511, %r5497; 2026-02-21T10:22:41.7213193Z mov.b32 %r5512, %r5497; 2026-02-21T10:22:41.7213360Z bra.uni $L__BB0_13; 2026-02-21T10:22:41.7213552Z $L__BB0_14: // %._crit_edge265 2026-02-21T10:22:41.7213960Z .loc 1 26 139 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:139 2026-02-21T10:22:41.7214329Z cp.async.wait_group 0; 2026-02-21T10:22:41.7214503Z bar.sync 0; 2026-02-21T10:22:41.7214649Z // begin inline asm 2026-02-21T10:22:41.7214921Z @%p256 mbarrier.inval.shared::cta.b64 [%r3423]; 2026-02-21T10:22:41.7215155Z // end inline asm 
2026-02-21T10:22:41.7215301Z bar.sync 0; 2026-02-21T10:22:41.7215447Z // begin inline asm 2026-02-21T10:22:41.7215645Z @%p256 mbarrier.inval.shared::cta.b64 [%r3424]; 2026-02-21T10:22:41.7215875Z // end inline asm 2026-02-21T10:22:41.7216018Z bar.sync 0; 2026-02-21T10:22:41.7216158Z // begin inline asm 2026-02-21T10:22:41.7216429Z @%p256 mbarrier.inval.shared::cta.b64 [%r3425]; 2026-02-21T10:22:41.7216780Z // end inline asm 2026-02-21T10:22:41.7216932Z // begin inline asm 2026-02-21T10:22:41.7217112Z @%p256 mbarrier.inval.shared::cta.b64 [%r3420]; 2026-02-21T10:22:41.7217332Z // end inline asm 2026-02-21T10:22:41.7217476Z bar.sync 0; 2026-02-21T10:22:41.7217629Z // begin inline asm 2026-02-21T10:22:41.7217808Z @%p256 mbarrier.inval.shared::cta.b64 [%r3421]; 2026-02-21T10:22:41.7218031Z // end inline asm 2026-02-21T10:22:41.7218171Z bar.sync 0; 2026-02-21T10:22:41.7218313Z // begin inline asm 2026-02-21T10:22:41.7218504Z @%p256 mbarrier.inval.shared::cta.b64 [%r3422]; 2026-02-21T10:22:41.7218718Z // end inline asm 2026-02-21T10:22:41.7218875Z // begin inline asm 2026-02-21T10:22:41.7219062Z @%p256 mbarrier.inval.shared::cta.b64 [%r3417]; 2026-02-21T10:22:41.7219284Z // end inline asm 2026-02-21T10:22:41.7219425Z bar.sync 0; 2026-02-21T10:22:41.7219566Z // begin inline asm 2026-02-21T10:22:41.7219748Z @%p256 mbarrier.inval.shared::cta.b64 [%r3418]; 2026-02-21T10:22:41.7219971Z // end inline asm 2026-02-21T10:22:41.7220111Z bar.sync 0; 2026-02-21T10:22:41.7220257Z // begin inline asm 2026-02-21T10:22:41.7220444Z @%p256 mbarrier.inval.shared::cta.b64 [%r3419]; 2026-02-21T10:22:41.7220662Z // end inline asm 2026-02-21T10:22:41.7220807Z // begin inline asm 2026-02-21T10:22:41.7220988Z @%p256 mbarrier.inval.shared::cta.b64 [%r3414]; 2026-02-21T10:22:41.7221218Z // end inline asm 2026-02-21T10:22:41.7221360Z bar.sync 0; 2026-02-21T10:22:41.7221499Z // begin inline asm 2026-02-21T10:22:41.7221682Z @%p256 mbarrier.inval.shared::cta.b64 [%r3415]; 2026-02-21T10:22:41.7221897Z 
// end inline asm 2026-02-21T10:22:41.7222041Z bar.sync 0; 2026-02-21T10:22:41.7222180Z // begin inline asm 2026-02-21T10:22:41.7222369Z @%p256 mbarrier.inval.shared::cta.b64 [%r3416]; 2026-02-21T10:22:41.7222595Z // end inline asm 2026-02-21T10:22:41.7222886Z .loc 1 26 4 // cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py:26:4 2026-02-21T10:22:41.7223228Z ret; 2026-02-21T10:22:41.7223356Z $L__tmp25: 2026-02-21T10:22:41.7223489Z $L__func_end0: 2026-02-21T10:22:41.7223658Z // -- End function 2026-02-21T10:22:41.7223872Z } 2026-02-21T10:22:41.7224185Z .file 1 "/tmp/torchinductor_root/ot/cot66vqjrgky6snfzf5k7d6e33ywp6wvlqnyqmaeb3tfrbdntwdk.py" 2026-02-21T10:22:41.7224728Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T10:22:41.7225084Z .section .debug_abbrev 2026-02-21T10:22:41.7225327Z { 2026-02-21T10:22:41.7225495Z .b8 1 // Abbreviation Code 2026-02-21T10:22:41.7225758Z .b8 17 // DW_TAG_compile_unit 2026-02-21T10:22:41.7226027Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:22:41.7226342Z .b8 37 // DW_AT_producer 2026-02-21T10:22:41.7226706Z .b8 8 // DW_FORM_string 2026-02-21T10:22:41.7226945Z .b8 19 // DW_AT_language 2026-02-21T10:22:41.7227184Z .b8 5 // DW_FORM_data2 2026-02-21T10:22:41.7227421Z .b8 3 // DW_AT_name 2026-02-21T10:22:41.7227654Z .b8 8 // DW_FORM_string 2026-02-21T10:22:41.7227893Z .b8 16 // DW_AT_stmt_list 2026-02-21T10:22:41.7228131Z .b8 6 // DW_FORM_data4 2026-02-21T10:22:41.7228368Z .b8 27 // DW_AT_comp_dir 2026-02-21T10:22:41.7228789Z .b8 8 // DW_FORM_string 2026-02-21T10:22:41.7229025Z .b8 0 // EOM(1) 2026-02-21T10:22:41.7229247Z .b8 0 // EOM(2) 2026-02-21T10:22:41.7229489Z .b8 2 // Abbreviation Code 2026-02-21T10:22:41.7229813Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:22:41.7230058Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:22:41.7230292Z .b8 3 // DW_AT_name 2026-02-21T10:22:41.7230523Z .b8 8 // DW_FORM_string 2026-02-21T10:22:41.7230758Z .b8 32 // DW_AT_inline 2026-02-21T10:22:41.7230995Z 
.b8 11 // DW_FORM_data1 2026-02-21T10:22:41.7231224Z .b8 0 // EOM(1) 2026-02-21T10:22:41.7231447Z .b8 0 // EOM(2) 2026-02-21T10:22:41.7231687Z .b8 3 // Abbreviation Code 2026-02-21T10:22:41.7231938Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:22:41.7232179Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:22:41.7232424Z .b8 17 // DW_AT_low_pc 2026-02-21T10:22:41.7232661Z .b8 1 // DW_FORM_addr 2026-02-21T10:22:41.7232894Z .b8 18 // DW_AT_high_pc 2026-02-21T10:22:41.7233130Z .b8 1 // DW_FORM_addr 2026-02-21T10:22:41.7233374Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:22:41.7233623Z .b8 19 // DW_FORM_ref4 2026-02-21T10:22:41.7233848Z .b8 0 // EOM(1) 2026-02-21T10:22:41.7234070Z .b8 0 // EOM(2) 2026-02-21T10:22:41.7234300Z .b8 4 // Abbreviation Code 2026-02-21T10:22:41.7234570Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T10:22:41.7234831Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:22:41.7235080Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:22:41.7235330Z .b8 19 // DW_FORM_ref4 2026-02-21T10:22:41.7235563Z .b8 17 // DW_AT_low_pc 2026-02-21T10:22:41.7235796Z .b8 1 // DW_FORM_addr 2026-02-21T10:22:41.7236025Z .b8 18 // DW_AT_high_pc 2026-02-21T10:22:41.7236257Z .b8 1 // DW_FORM_addr 2026-02-21T10:22:41.7236616Z .b8 88 // DW_AT_call_file 2026-02-21T10:22:41.7236860Z .b8 11 // DW_FORM_data1 2026-02-21T10:22:41.7237099Z .b8 89 // DW_AT_call_line 2026-02-21T10:22:41.7237429Z .b8 11 // DW_FORM_data1 2026-02-21T10:22:41.7237668Z .b8 87 // DW_AT_call_column 2026-02-21T10:22:41.7237907Z .b8 11 // DW_FORM_data1 2026-02-21T10:22:41.7238203Z .b8 0 // EOM(1) 2026-02-21T10:22:41.7238420Z .b8 0 // EOM(2) 2026-02-21T10:22:41.7238633Z .b8 0 // EOM(3) 2026-02-21T10:22:41.7238833Z } 2026-02-21T10:22:41.7238966Z .section .debug_info 2026-02-21T10:22:41.7239118Z { 2026-02-21T10:22:41.7239278Z .b32 178 // Length of Unit 2026-02-21T10:22:41.7239535Z .b8 2 // DWARF version number 2026-02-21T10:22:41.7239756Z .b8 0 2026-02-21T10:22:41.7239963Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T10:22:41.7240269Z .b8 8 // Address Size (in bytes) 2026-02-21T10:22:41.7240644Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T10:22:41.7240937Z .b8 116 // DW_AT_producer 2026-02-21T10:22:41.7241149Z .b8 114 2026-02-21T10:22:41.7241274Z .b8 105 2026-02-21T10:22:41.7241398Z .b8 116 2026-02-21T10:22:41.7241522Z .b8 111 2026-02-21T10:22:41.7241640Z .b8 110 2026-02-21T10:22:41.7241761Z .b8 0 2026-02-21T10:22:41.7241980Z .b8 2 // DW_AT_language 2026-02-21T10:22:41.7242194Z .b8 0 2026-02-21T10:22:41.7242342Z .b8 99 // DW_AT_name 2026-02-21T10:22:41.7242547Z .b8 111 2026-02-21T10:22:41.7242672Z .b8 116 2026-02-21T10:22:41.7242792Z .b8 54 2026-02-21T10:22:41.7242914Z .b8 54 2026-02-21T10:22:41.7243033Z .b8 118 2026-02-21T10:22:41.7243168Z .b8 113 2026-02-21T10:22:41.7243288Z .b8 106 2026-02-21T10:22:41.7243412Z .b8 114 2026-02-21T10:22:41.7243532Z .b8 103 2026-02-21T10:22:41.7243656Z .b8 107 2026-02-21T10:22:41.7243775Z .b8 121 2026-02-21T10:22:41.7243901Z .b8 54 2026-02-21T10:22:41.7244026Z .b8 115 2026-02-21T10:22:41.7244145Z .b8 110 2026-02-21T10:22:41.7244267Z .b8 102 2026-02-21T10:22:41.7244383Z .b8 122 2026-02-21T10:22:41.7244503Z .b8 102 2026-02-21T10:22:41.7244623Z .b8 53 2026-02-21T10:22:41.7244750Z .b8 107 2026-02-21T10:22:41.7244878Z .b8 55 2026-02-21T10:22:41.7245003Z .b8 100 2026-02-21T10:22:41.7245121Z .b8 54 2026-02-21T10:22:41.7245248Z .b8 101 2026-02-21T10:22:41.7245368Z .b8 51 2026-02-21T10:22:41.7245489Z .b8 51 2026-02-21T10:22:41.7245606Z .b8 121 2026-02-21T10:22:41.7245729Z .b8 119 2026-02-21T10:22:41.7245849Z .b8 112 2026-02-21T10:22:41.7245970Z .b8 54 2026-02-21T10:22:41.7246090Z .b8 119 2026-02-21T10:22:41.7246208Z .b8 118 2026-02-21T10:22:41.7246330Z .b8 108 2026-02-21T10:22:41.7246447Z .b8 113 2026-02-21T10:22:41.7246681Z .b8 110 2026-02-21T10:22:41.7246814Z .b8 121 2026-02-21T10:22:41.7246933Z .b8 113 2026-02-21T10:22:41.7247055Z .b8 109 2026-02-21T10:22:41.7247178Z .b8 97 2026-02-21T10:22:41.7247295Z .b8 101 
2026-02-21T10:22:41.7247417Z .b8 98 2026-02-21T10:22:41.7247538Z .b8 51 2026-02-21T10:22:41.7247660Z .b8 116 2026-02-21T10:22:41.7247777Z .b8 102 2026-02-21T10:22:41.7247902Z .b8 114 2026-02-21T10:22:41.7248024Z .b8 98 2026-02-21T10:22:41.7248142Z .b8 100 2026-02-21T10:22:41.7248265Z .b8 110 2026-02-21T10:22:41.7248384Z .b8 116 2026-02-21T10:22:41.7248505Z .b8 119 2026-02-21T10:22:41.7248627Z .b8 100 2026-02-21T10:22:41.7248748Z .b8 107 2026-02-21T10:22:41.7248867Z .b8 46 2026-02-21T10:22:41.7248993Z .b8 112 2026-02-21T10:22:41.7249112Z .b8 121 2026-02-21T10:22:41.7249246Z .b8 0 2026-02-21T10:22:41.7249424Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T10:22:41.7249687Z .b8 47 // DW_AT_comp_dir 2026-02-21T10:22:41.7249902Z .b8 116 2026-02-21T10:22:41.7250022Z .b8 109 2026-02-21T10:22:41.7250143Z .b8 112 2026-02-21T10:22:41.7250261Z .b8 47 2026-02-21T10:22:41.7250382Z .b8 116 2026-02-21T10:22:41.7250503Z .b8 111 2026-02-21T10:22:41.7250622Z .b8 114 2026-02-21T10:22:41.7250824Z .b8 99 2026-02-21T10:22:41.7250948Z .b8 104 2026-02-21T10:22:41.7251066Z .b8 105 2026-02-21T10:22:41.7251199Z .b8 110 2026-02-21T10:22:41.7251319Z .b8 100 2026-02-21T10:22:41.7251439Z .b8 117 2026-02-21T10:22:41.7251555Z .b8 99 2026-02-21T10:22:41.7251752Z .b8 116 2026-02-21T10:22:41.7251872Z .b8 111 2026-02-21T10:22:41.7251992Z .b8 114 2026-02-21T10:22:41.7252125Z .b8 95 2026-02-21T10:22:41.7252244Z .b8 114 2026-02-21T10:22:41.7252366Z .b8 111 2026-02-21T10:22:41.7252487Z .b8 111 2026-02-21T10:22:41.7252606Z .b8 116 2026-02-21T10:22:41.7252724Z .b8 47 2026-02-21T10:22:41.7252856Z .b8 111 2026-02-21T10:22:41.7252974Z .b8 116 2026-02-21T10:22:41.7253095Z .b8 0 2026-02-21T10:22:41.7253276Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T10:22:41.7253556Z .b8 95 // DW_AT_name 2026-02-21T10:22:41.7253768Z .b8 104 2026-02-21T10:22:41.7253892Z .b8 101 2026-02-21T10:22:41.7254014Z .b8 108 2026-02-21T10:22:41.7254134Z .b8 105 2026-02-21T10:22:41.7254253Z .b8 111 
2026-02-21T10:22:41.7254445Z .b8 110 2026-02-21T10:22:41.7254568Z .b8 95 2026-02-21T10:22:41.7254685Z .b8 109 2026-02-21T10:22:41.7254806Z .b8 97 2026-02-21T10:22:41.7254927Z .b8 116 2026-02-21T10:22:41.7255060Z .b8 109 2026-02-21T10:22:41.7255182Z .b8 117 2026-02-21T10:22:41.7255301Z .b8 108 2026-02-21T10:22:41.7255419Z .b8 95 2026-02-21T10:22:41.7255539Z .b8 98 2026-02-21T10:22:41.7255657Z .b8 102 2026-02-21T10:22:41.7255850Z .b8 49 2026-02-21T10:22:41.7255975Z .b8 54 2026-02-21T10:22:41.7256094Z .b8 95 2026-02-21T10:22:41.7256213Z .b8 105 2026-02-21T10:22:41.7256337Z .b8 110 2026-02-21T10:22:41.7256576Z .b8 116 2026-02-21T10:22:41.7256703Z .b8 52 2026-02-21T10:22:41.7256827Z .b8 0 2026-02-21T10:22:41.7256978Z .b8 1 // DW_AT_inline 2026-02-21T10:22:41.7257262Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T10:22:41.7257547Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T10:22:41.7257828Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T10:22:41.7258103Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:22:41.7258410Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T10:22:41.7258726Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:22:41.7258996Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T10:22:41.7259256Z .b64 $L__tmp24 // DW_AT_high_pc 2026-02-21T10:22:41.7259505Z .b8 1 // DW_AT_call_file 2026-02-21T10:22:41.7259765Z .b8 90 // DW_AT_call_line 2026-02-21T10:22:41.7260020Z .b8 40 // DW_AT_call_column 2026-02-21T10:22:41.7260280Z .b8 0 // End Of Children Mark 2026-02-21T10:22:41.7260544Z .b8 0 // End Of Children Mark 2026-02-21T10:22:41.7260762Z } 2026-02-21T10:22:41.7260909Z .section .debug_macinfo { } 2026-02-21T10:22:41.7261034Z 2026-02-21T10:22:41.7261117Z ================================================================ 2026-02-21T10:22:41.7261393Z please share the reproducer above with Triton project. 
2026-02-21T10:22:47.6568936Z 2026-02-21T10:22:47.6568952Z 2026-02-21T10:22:47.6568978Z 2026-02-21T10:22:47.6569422Z ================================================================ 2026-02-21T10:22:47.6569778Z Internal Triton PTX codegen error 2026-02-21T10:22:47.6570027Z `ptxas` stderr: 2026-02-21T10:22:47.6570697Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 945 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T10:22:47.6571456Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:22:47.6571676Z 2026-02-21T10:22:47.6572282Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpau0d1lol.ptx -o /tmp/tmpau0d1lol.ptx.o 2026-02-21T10:22:47.6573370Z 2026-02-21T10:22:47.6573373Z 2026-02-21T10:22:47.6573447Z // 2026-02-21T10:22:47.6573622Z // Generated by LLVM NVPTX Back-End 2026-02-21T10:22:47.6574028Z // 2026-02-21T10:22:47.6574114Z 2026-02-21T10:22:47.6574182Z .version 8.7 2026-02-21T10:22:47.6574347Z .target sm_90a 2026-02-21T10:22:47.6574508Z .address_size 64 2026-02-21T10:22:47.6574617Z 2026-02-21T10:22:47.6574823Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T10:22:47.6575221Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T10:22:47.6575528Z // @_helion_matmul_bf16_int4 2026-02-21T10:22:47.6575826Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T10:22:47.6576161Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T10:22:47.6576907Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T10:22:47.6577422Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T10:22:47.6577828Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T10:22:47.6578221Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 
2026-02-21T10:22:47.6578538Z ) 2026-02-21T10:22:47.6578686Z .reqntid 256 2026-02-21T10:22:47.6578843Z .maxnreg 32 2026-02-21T10:22:47.6578996Z { 2026-02-21T10:22:47.6579294Z .reg .pred %p<57>; 2026-02-21T10:22:47.6579485Z .reg .b16 %rs<193>; 2026-02-21T10:22:47.6579659Z .reg .b32 %r<2058>; 2026-02-21T10:22:47.6579833Z .reg .b64 %rd<116>; 2026-02-21T10:22:47.6580188Z .loc 1 19 0 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:19:0 2026-02-21T10:22:47.6580611Z $L__func_begin0: 2026-02-21T10:22:47.6580948Z .loc 1 19 0 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:19:0 2026-02-21T10:22:47.6581296Z 2026-02-21T10:22:47.6581371Z // %bb.0: 2026-02-21T10:22:47.6581602Z ld.param.b64 %rd10, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T10:22:47.6581884Z $L__tmp0: 2026-02-21T10:22:47.6582216Z .loc 1 21 67 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:21:67 2026-02-21T10:22:47.6582636Z mov.u32 %r1985, %ctaid.x; 2026-02-21T10:22:47.6582897Z ld.param.b64 %rd13, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T10:22:47.6583183Z mov.u32 %r350, %ctaid.y; 2026-02-21T10:22:47.6583440Z ld.param.b64 %rd48, [_helion_matmul_bf16_int4_param_3]; 2026-02-21T10:22:47.6583735Z mov.u32 %r351, %ctaid.z; 2026-02-21T10:22:47.6583926Z mov.u32 %r352, %nctaid.x; 2026-02-21T10:22:47.6584099Z mov.u32 %r353, %nctaid.y; 2026-02-21T10:22:47.6584275Z mad.lo.s32 %r354, %r351, %r353, %r350; 2026-02-21T10:22:47.6584490Z mad.lo.s32 %r355, %r354, %r352, %r1985; 2026-02-21T10:22:47.6584690Z shl.b32 %r356, %r355, 7; 2026-02-21T10:22:47.6584862Z cvt.s64.s32 %rd49, %r356; 2026-02-21T10:22:47.6585035Z add.s64 %rd27, %rd48, %rd49; 2026-02-21T10:22:47.6585221Z mov.u32 %r2, %tid.x; 2026-02-21T10:22:47.6585390Z setp.lt.u32 %p1, %r2, 32; 2026-02-21T10:22:47.6585563Z shl.b32 %r357, %r2, 2; 2026-02-21T10:22:47.6585741Z mov.b32 %r358, global_smem; 2026-02-21T10:22:47.6585921Z add.s32 %r298, %r358, %r357; 2026-02-21T10:22:47.6586102Z mov.b32 %r1909, 0; 2026-02-21T10:22:47.6586259Z // begin 
inline asm 2026-02-21T10:22:47.6586444Z @%p1 st.shared.b32 [ %r298 + 0 ], %r1909; 2026-02-21T10:22:47.6586786Z // end inline asm 2026-02-21T10:22:47.6586950Z bar.warp.sync -1; 2026-02-21T10:22:47.6587112Z setp.eq.b32 %p55, %r2, 0; 2026-02-21T10:22:47.6587295Z cvt.u64.u32 %rd12, %r358; 2026-02-21T10:22:47.6587483Z // begin inline asm 2026-02-21T10:22:47.6587815Z @%p55 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd12 + 0 ], %rd13; 2026-02-21T10:22:47.6588169Z // end inline asm 2026-02-21T10:22:47.6588319Z // begin inline asm 2026-02-21T10:22:47.6588687Z @%p55 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd12 + 0 ], 0x1; 2026-02-21T10:22:47.6589109Z // end inline asm 2026-02-21T10:22:47.6589272Z mov.b32 %r300, 128; 2026-02-21T10:22:47.6589437Z // begin inline asm 2026-02-21T10:22:47.6589736Z @%p55 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd12 + 0 ], 0x0, %r300; 2026-02-21T10:22:47.6590178Z // end inline asm 2026-02-21T10:22:47.6590327Z mov.b32 %r1910, 32; 2026-02-21T10:22:47.6590489Z // begin inline asm 2026-02-21T10:22:47.6590775Z @%p55 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd12 + 0 ], 0x1, %r1910; 2026-02-21T10:22:47.6591123Z // end inline asm 2026-02-21T10:22:47.6591277Z mov.b32 %r302, 1280; 2026-02-21T10:22:47.6591778Z [3360s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T10:22:47.6593466Z Config: @helion.kernel(config=helion.Config(block_sizes=[32, 128, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=64, num_stages=1, num_warps=8, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[None, False], range_num_stages=[3, 1], range_unroll_factors=[1, 1], range_warp_specializes=[]), static_shapes=True) 2026-02-21T10:22:47.6595062Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T10:22:47.6595348Z `ptxas` stderr: 2026-02-21T10:22:47.6595980Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 945 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T10:22:47.6596751Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:22:47.6596942Z 2026-02-21T10:22:47.6597449Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpau0d1lol.ptx -o /tmp/tmpau0d1lol.ptx.o 2026-02-21T10:22:47.6598034Z 2026-02-21T10:22:47.6598195Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T10:22:47.6598495Z // begin inline asm 2026-02-21T10:22:47.6598810Z @%p55 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd12 + 0 ], 0x0, %r302; 2026-02-21T10:22:47.6599176Z // end inline asm 2026-02-21T10:22:47.6599337Z mov.b32 %r303, 4096; 2026-02-21T10:22:47.6599507Z // begin inline asm 2026-02-21T10:22:47.6599799Z @%p55 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd12 + 0 ], 0x1, %r303; 2026-02-21T10:22:47.6600150Z // end inline asm 2026-02-21T10:22:47.6600299Z mov.b64 %rd20, 1280; 2026-02-21T10:22:47.6600463Z // begin inline asm 2026-02-21T10:22:47.6600769Z @%p55 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd12 + 0 ], 0x0, %rd20; 2026-02-21T10:22:47.6601129Z // end inline asm 2026-02-21T10:22:47.6601273Z mov.b32 %r1913, 1; 2026-02-21T10:22:47.6601435Z // begin inline asm 2026-02-21T10:22:47.6601748Z @%p55 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd12 + 0 ], 0x0, %r1913; 2026-02-21T10:22:47.6602105Z // end inline asm 2026-02-21T10:22:47.6602261Z // begin inline asm 2026-02-21T10:22:47.6602567Z @%p55 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd12 + 0 ], 0x1, %r1913; 2026-02-21T10:22:47.6602926Z // end inline asm 2026-02-21T10:22:47.6603071Z // begin inline asm 2026-02-21T10:22:47.6603380Z @%p55 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd12 + 0 ], 0x0; 2026-02-21T10:22:47.6603711Z // end inline asm 2026-02-21T10:22:47.6603859Z // begin inline asm 2026-02-21T10:22:47.6604163Z @%p55 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd12 + 0 ], 0x0; 2026-02-21T10:22:47.6604512Z // end inline asm 2026-02-21T10:22:47.6604661Z // begin inline asm 2026-02-21T10:22:47.6604943Z @%p55 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd12 + 0 ], 0x3; 2026-02-21T10:22:47.6605276Z // end inline asm 2026-02-21T10:22:47.6605431Z // begin inline asm 2026-02-21T10:22:47.6605701Z @%p55 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ 
%rd12 + 0 ], 0x0; 2026-02-21T10:22:47.6606114Z // end inline asm 2026-02-21T10:22:47.6606262Z // begin inline asm 2026-02-21T10:22:47.6606836Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd27 + 0 ], [ %rd12 + 0 ], 0x80; 2026-02-21T10:22:47.6607411Z // end inline asm 2026-02-21T10:22:47.6607565Z // begin inline asm 2026-02-21T10:22:47.6607814Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd27 + 0 ], 0x80; 2026-02-21T10:22:47.6608121Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T10:22:47.6608349Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T10:22:47.6608557Z // end inline asm 2026-02-21T10:22:47.6608710Z bar.sync 0; 2026-02-21T10:22:47.6608866Z cvta.global.u64 %rd38, %rd27; 2026-02-21T10:22:47.6609214Z .loc 1 41 45 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:41:45 2026-02-21T10:22:47.6609578Z shr.u32 %r359, %r2, 4; 2026-02-21T10:22:47.6609759Z bfe.u32 %r4, %r2, 4, 4; 2026-02-21T10:22:47.6609932Z or.b32 %r5, %r4, 16; 2026-02-21T10:22:47.6610160Z or.b32 %r6, %r4, 32; 2026-02-21T10:22:47.6610340Z or.b32 %r7, %r359, 48; 2026-02-21T10:22:47.6610505Z or.b32 %r8, %r4, 64; 2026-02-21T10:22:47.6610665Z or.b32 %r9, %r4, 80; 2026-02-21T10:22:47.6610818Z or.b32 %r10, %r4, 96; 2026-02-21T10:22:47.6610993Z or.b32 %r11, %r359, 112; 2026-02-21T10:22:47.6611167Z and.b32 %r360, %r2, 15; 2026-02-21T10:22:47.6611564Z .loc 1 29 88 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:29:88 2026-02-21T10:22:47.6611931Z max.u32 %r361, %r1985, 5119; 2026-02-21T10:22:47.6612120Z shl.b32 %r362, %r361, 7; 2026-02-21T10:22:47.6612300Z sub.s32 %r14, 655360, %r362; 2026-02-21T10:22:47.6612617Z .loc 1 56 38 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:56:38 2026-02-21T10:22:47.6612994Z shl.b32 %r15, %r360, 2; 2026-02-21T10:22:47.6613307Z .loc 1 29 88 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:29:88 2026-02-21T10:22:47.6613670Z add.s32 %r306, %r358, 73728; 2026-02-21T10:22:47.6613857Z // 
begin inline asm 2026-02-21T10:22:47.6614059Z @%p55 mbarrier.init.shared::cta.b64 [%r306], 1; 2026-02-21T10:22:47.6614284Z // end inline asm 2026-02-21T10:22:47.6614441Z bar.sync 0; 2026-02-21T10:22:47.6614597Z add.s32 %r307, %r358, 73736; 2026-02-21T10:22:47.6614776Z // begin inline asm 2026-02-21T10:22:47.6614964Z @%p55 mbarrier.init.shared::cta.b64 [%r307], 1; 2026-02-21T10:22:47.6615193Z // end inline asm 2026-02-21T10:22:47.6615352Z setp.lt.s32 %p25, %r14, 1; 2026-02-21T10:22:47.6615542Z setp.gt.s32 %p26, %r14, 0; 2026-02-21T10:22:47.6615865Z .loc 1 35 35 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:35:35 2026-02-21T10:22:47.6616226Z mul.hi.u32 %r363, %r1985, 1717986919; 2026-02-21T10:22:47.6616433Z shr.u32 %r364, %r363, 5; 2026-02-21T10:22:47.6616860Z .loc 1 36 33 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:36:33 2026-02-21T10:22:47.6617234Z shl.b32 %r365, %r364, 3; 2026-02-21T10:22:47.6617542Z .loc 1 37 39 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:37:39 2026-02-21T10:22:47.6617895Z sub.s32 %r366, 512, %r365; 2026-02-21T10:22:47.6618211Z .loc 1 37 52 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:37:52 2026-02-21T10:22:47.6618558Z min.s32 %r367, %r366, 8; 2026-02-21T10:22:47.6618872Z .loc 1 38 45 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:38:45 2026-02-21T10:22:47.6619222Z mul.lo.s32 %r368, %r364, 80; 2026-02-21T10:22:47.6619406Z sub.s32 %r369, %r1985, %r368; 2026-02-21T10:22:47.6619724Z .loc 1 39 51 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:39:51 2026-02-21T10:22:47.6620081Z div.s32 %r370, %r369, %r367; 2026-02-21T10:22:47.6620397Z .loc 1 38 64 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:38:64 2026-02-21T10:22:47.6620752Z mul.lo.s32 %r371, %r370, %r367; 2026-02-21T10:22:47.6621044Z sub.s32 %r372, %r369, %r371; 2026-02-21T10:22:47.6621361Z .loc 1 38 30 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:38:30 2026-02-21T10:22:47.6621719Z add.s32 
%r373, %r372, %r365; 2026-02-21T10:22:47.6622112Z .loc 1 40 27 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:40:27 2026-02-21T10:22:47.6622469Z shl.b32 %r1908, %r373, 7; 2026-02-21T10:22:47.6622790Z .loc 1 41 32 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:41:32 2026-02-21T10:22:47.6623141Z or.b32 %r1986, %r1908, %r4; 2026-02-21T10:22:47.6623334Z or.b32 %r1987, %r1908, %r5; 2026-02-21T10:22:47.6623505Z or.b32 %r1988, %r1908, %r6; 2026-02-21T10:22:47.6623688Z or.b32 %r1989, %r1908, %r7; 2026-02-21T10:22:47.6623856Z or.b32 %r1990, %r1908, %r8; 2026-02-21T10:22:47.6624033Z or.b32 %r1991, %r1908, %r9; 2026-02-21T10:22:47.6624204Z or.b32 %r1992, %r1908, %r10; 2026-02-21T10:22:47.6624465Z or.b32 %r1993, %r1908, %r11; 2026-02-21T10:22:47.6624792Z .loc 1 42 27 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:42:27 2026-02-21T10:22:47.6625140Z shl.b32 %r1907, %r370, 7; 2026-02-21T10:22:47.6625457Z .loc 1 57 53 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:57:53 2026-02-21T10:22:47.6625804Z shl.b32 %r374, %r1986, 13; 2026-02-21T10:22:47.6626067Z shl.b32 %r375, %r1987, 13; 2026-02-21T10:22:47.6626245Z shl.b32 %r376, %r1988, 13; 2026-02-21T10:22:47.6626424Z shl.b32 %r377, %r1989, 13; 2026-02-21T10:22:47.6626717Z shl.b32 %r378, %r1990, 13; 2026-02-21T10:22:47.6626895Z shl.b32 %r379, %r1991, 13; 2026-02-21T10:22:47.6627071Z shl.b32 %r380, %r1992, 13; 2026-02-21T10:22:47.6627239Z shl.b32 %r381, %r1993, 13; 2026-02-21T10:22:47.6627559Z .loc 1 57 60 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:57:60 2026-02-21T10:22:47.6627911Z or.b32 %r382, %r374, %r15; 2026-02-21T10:22:47.6628094Z or.b32 %r383, %r375, %r15; 2026-02-21T10:22:47.6628265Z or.b32 %r384, %r376, %r15; 2026-02-21T10:22:47.6628441Z or.b32 %r385, %r377, %r15; 2026-02-21T10:22:47.6628689Z or.b32 %r386, %r378, %r15; 2026-02-21T10:22:47.6628866Z or.b32 %r387, %r379, %r15; 2026-02-21T10:22:47.6629041Z or.b32 %r388, %r380, %r15; 2026-02-21T10:22:47.6629211Z or.b32 
%r389, %r381, %r15; 2026-02-21T10:22:47.6629527Z .loc 1 57 32 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:57:32 2026-02-21T10:22:47.6629900Z mad.wide.s32 %rd30, %r382, 2, %rd10; 2026-02-21T10:22:47.6630113Z mad.wide.s32 %rd31, %r383, 2, %rd10; 2026-02-21T10:22:47.6630314Z mad.wide.s32 %rd32, %r384, 2, %rd10; 2026-02-21T10:22:47.6630515Z mad.wide.s32 %rd33, %r385, 2, %rd10; 2026-02-21T10:22:47.6630712Z mad.wide.s32 %rd34, %r386, 2, %rd10; 2026-02-21T10:22:47.6630914Z mad.wide.s32 %rd35, %r387, 2, %rd10; 2026-02-21T10:22:47.6631116Z mad.wide.s32 %rd36, %r388, 2, %rd10; 2026-02-21T10:22:47.6631315Z mad.wide.s32 %rd37, %r389, 2, %rd10; 2026-02-21T10:22:47.6631650Z .loc 1 57 80 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:57:80 2026-02-21T10:22:47.6631997Z shl.b32 %r390, %r2, 3; 2026-02-21T10:22:47.6632170Z and.b32 %r391, %r390, 2040; 2026-02-21T10:22:47.6632344Z shr.u32 %r392, %r2, 1; 2026-02-21T10:22:47.6632514Z and.b32 %r393, %r392, 56; 2026-02-21T10:22:47.6632686Z xor.b32 %r27, %r391, %r393; 2026-02-21T10:22:47.6632865Z add.s32 %r308, %r358, %r27; 2026-02-21T10:22:47.6633044Z selp.b32 %r309, 8, 0, %p26; 2026-02-21T10:22:47.6633218Z // begin inline asm 2026-02-21T10:22:47.6633459Z cp.async.ca.shared.global [ %r308 + 0 ], [ %rd30 + 0 ], 0x8, %r309; 2026-02-21T10:22:47.6633738Z // end inline asm 2026-02-21T10:22:47.6633898Z add.s32 %r310, %r308, 2048; 2026-02-21T10:22:47.6634069Z // begin inline asm 2026-02-21T10:22:47.6634300Z cp.async.ca.shared.global [ %r310 + 0 ], [ %rd31 + 0 ], 0x8, %r309; 2026-02-21T10:22:47.6634666Z // end inline asm 2026-02-21T10:22:47.6634827Z add.s32 %r312, %r308, 4096; 2026-02-21T10:22:47.6634999Z // begin inline asm 2026-02-21T10:22:47.6635228Z cp.async.ca.shared.global [ %r312 + 0 ], [ %rd32 + 0 ], 0x8, %r309; 2026-02-21T10:22:47.6635501Z // end inline asm 2026-02-21T10:22:47.6635719Z add.s32 %r314, %r308, 6144; 2026-02-21T10:22:47.6635897Z // begin inline asm 2026-02-21T10:22:47.6636121Z 
cp.async.ca.shared.global [ %r314 + 0 ], [ %rd33 + 0 ], 0x8, %r309; 2026-02-21T10:22:47.6636393Z // end inline asm 2026-02-21T10:22:47.6636669Z add.s32 %r316, %r308, 8192; 2026-02-21T10:22:47.6636850Z // begin inline asm 2026-02-21T10:22:47.6637074Z cp.async.ca.shared.global [ %r316 + 0 ], [ %rd34 + 0 ], 0x8, %r309; 2026-02-21T10:22:47.6637343Z // end inline asm 2026-02-21T10:22:47.6637500Z add.s32 %r318, %r308, 10240; 2026-02-21T10:22:47.6637677Z // begin inline asm 2026-02-21T10:22:47.6637902Z cp.async.ca.shared.global [ %r318 + 0 ], [ %rd35 + 0 ], 0x8, %r309; 2026-02-21T10:22:47.6638170Z // end inline asm 2026-02-21T10:22:47.6638405Z add.s32 %r320, %r308, 12288; 2026-02-21T10:22:47.6638599Z // begin inline asm 2026-02-21T10:22:47.6638831Z cp.async.ca.shared.global [ %r320 + 0 ], [ %rd36 + 0 ], 0x8, %r309; 2026-02-21T10:22:47.6639095Z // end inline asm 2026-02-21T10:22:47.6639257Z add.s32 %r322, %r308, 14336; 2026-02-21T10:22:47.6639435Z // begin inline asm 2026-02-21T10:22:47.6639740Z cp.async.ca.shared.global [ %r322 + 0 ], [ %rd37 + 0 ], 0x8, %r309; 2026-02-21T10:22:47.6640027Z // end inline asm 2026-02-21T10:22:47.6640184Z cp.async.commit_group; 2026-02-21T10:22:47.6640516Z .loc 1 29 88 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:29:88 2026-02-21T10:22:47.6640873Z bar.sync 0; 2026-02-21T10:22:47.6641041Z and.pred %p21, %p55, %p26; 2026-02-21T10:22:47.6641223Z // begin inline asm 2026-02-21T10:22:47.6641459Z @%p21 mbarrier.arrive.expect_tx.shared.b64 _, [%r306], 4096; 2026-02-21T10:22:47.6641749Z // end inline asm 2026-02-21T10:22:47.6642068Z .loc 1 63 33 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:63:33 2026-02-21T10:22:47.6642431Z bar.sync 0; 2026-02-21T10:22:47.6642586Z elect.sync %r394|%p27, -1; 2026-02-21T10:22:47.6642776Z and.pred %p28, %p26, %p27; 2026-02-21T10:22:47.6642960Z and.pred %p22, %p1, %p28; 2026-02-21T10:22:47.6643155Z add.s32 %r325, %r358, 65536; 2026-02-21T10:22:47.6643333Z // begin inline asm 
2026-02-21T10:22:47.6643761Z @%p22 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r325], [%rd38, {%r1907, %r1909}], [%r306]; 2026-02-21T10:22:47.6644222Z // end inline asm 2026-02-21T10:22:47.6644519Z .loc 1 29 88 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:29:88 2026-02-21T10:22:47.6644887Z setp.gt.s32 %p29, %r14, 1; 2026-02-21T10:22:47.6645204Z .loc 1 57 32 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:57:32 2026-02-21T10:22:47.6645562Z cvt.s64.s32 %rd50, %r374; 2026-02-21T10:22:47.6645747Z cvt.u64.u32 %rd51, %r15; 2026-02-21T10:22:47.6645927Z or.b64 %rd52, %rd50, %rd51; 2026-02-21T10:22:47.6646110Z shl.b64 %rd53, %rd52, 1; 2026-02-21T10:22:47.6646284Z add.s64 %rd54, %rd10, %rd53; 2026-02-21T10:22:47.6646591Z add.s64 %rd39, %rd54, 128; 2026-02-21T10:22:47.6646779Z cvt.s64.s32 %rd55, %r375; 2026-02-21T10:22:47.6646964Z or.b64 %rd56, %rd55, %rd51; 2026-02-21T10:22:47.6647147Z shl.b64 %rd57, %rd56, 1; 2026-02-21T10:22:47.6647328Z add.s64 %rd58, %rd10, %rd57; 2026-02-21T10:22:47.6647509Z add.s64 %rd40, %rd58, 128; 2026-02-21T10:22:47.6647697Z cvt.s64.s32 %rd59, %r376; 2026-02-21T10:22:47.6647872Z or.b64 %rd60, %rd59, %rd51; 2026-02-21T10:22:47.6648073Z shl.b64 %rd61, %rd60, 1; 2026-02-21T10:22:47.6648252Z add.s64 %rd62, %rd10, %rd61; 2026-02-21T10:22:47.6648432Z add.s64 %rd41, %rd62, 128; 2026-02-21T10:22:47.6648611Z cvt.s64.s32 %rd63, %r377; 2026-02-21T10:22:47.6648783Z or.b64 %rd64, %rd63, %rd51; 2026-02-21T10:22:47.6648965Z shl.b64 %rd65, %rd64, 1; 2026-02-21T10:22:47.6649239Z add.s64 %rd66, %rd10, %rd65; 2026-02-21T10:22:47.6649432Z add.s64 %rd42, %rd66, 128; 2026-02-21T10:22:47.6649610Z cvt.s64.s32 %rd67, %r378; 2026-02-21T10:22:47.6649797Z or.b64 %rd68, %rd67, %rd51; 2026-02-21T10:22:47.6650059Z shl.b64 %rd69, %rd68, 1; 2026-02-21T10:22:47.6650239Z add.s64 %rd70, %rd10, %rd69; 2026-02-21T10:22:47.6650425Z add.s64 %rd43, %rd70, 128; 2026-02-21T10:22:47.6650602Z cvt.s64.s32 %rd71, %r379; 
2026-02-21T10:22:47.6650780Z or.b64 %rd72, %rd71, %rd51; 2026-02-21T10:22:47.6650958Z shl.b64 %rd73, %rd72, 1; 2026-02-21T10:22:47.6651138Z add.s64 %rd74, %rd10, %rd73; 2026-02-21T10:22:47.6651315Z add.s64 %rd44, %rd74, 128; 2026-02-21T10:22:47.6651511Z cvt.s64.s32 %rd75, %r380; 2026-02-21T10:22:47.6651685Z or.b64 %rd76, %rd75, %rd51; 2026-02-21T10:22:47.6651864Z shl.b64 %rd77, %rd76, 1; 2026-02-21T10:22:47.6652032Z add.s64 %rd78, %rd10, %rd77; 2026-02-21T10:22:47.6652215Z add.s64 %rd45, %rd78, 128; 2026-02-21T10:22:47.6652394Z cvt.s64.s32 %rd79, %r381; 2026-02-21T10:22:47.6652644Z or.b64 %rd80, %rd79, %rd51; 2026-02-21T10:22:47.6652830Z shl.b64 %rd81, %rd80, 1; 2026-02-21T10:22:47.6652999Z add.s64 %rd82, %rd10, %rd81; 2026-02-21T10:22:47.6653178Z add.s64 %rd46, %rd82, 128; 2026-02-21T10:22:47.6653505Z .loc 1 57 80 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:57:80 2026-02-21T10:22:47.6653874Z add.s32 %r329, %r308, 16384; 2026-02-21T10:22:47.6654119Z selp.b32 %r330, 8, 0, %p29; 2026-02-21T10:22:47.6654302Z // begin inline asm 2026-02-21T10:22:47.6654552Z cp.async.ca.shared.global [ %r329 + 0 ], [ %rd39 + 0 ], 0x8, %r330; 2026-02-21T10:22:47.6654841Z // end inline asm 2026-02-21T10:22:47.6655002Z add.s32 %r331, %r308, 18432; 2026-02-21T10:22:47.6655176Z // begin inline asm 2026-02-21T10:22:47.6655409Z cp.async.ca.shared.global [ %r331 + 0 ], [ %rd40 + 0 ], 0x8, %r330; 2026-02-21T10:22:47.6655679Z // end inline asm 2026-02-21T10:22:47.6655840Z add.s32 %r333, %r308, 20480; 2026-02-21T10:22:47.6656018Z // begin inline asm 2026-02-21T10:22:47.6656247Z cp.async.ca.shared.global [ %r333 + 0 ], [ %rd41 + 0 ], 0x8, %r330; 2026-02-21T10:22:47.6656644Z // end inline asm 2026-02-21T10:22:47.6656796Z add.s32 %r335, %r308, 22528; 2026-02-21T10:22:47.6656980Z // begin inline asm 2026-02-21T10:22:47.6657218Z cp.async.ca.shared.global [ %r335 + 0 ], [ %rd42 + 0 ], 0x8, %r330; 2026-02-21T10:22:47.6657489Z // end inline asm 2026-02-21T10:22:47.6657642Z add.s32 %r337, 
%r308, 24576; 2026-02-21T10:22:47.6657821Z // begin inline asm 2026-02-21T10:22:47.6658040Z cp.async.ca.shared.global [ %r337 + 0 ], [ %rd43 + 0 ], 0x8, %r330; 2026-02-21T10:22:47.6658314Z // end inline asm 2026-02-21T10:22:47.6658462Z add.s32 %r339, %r308, 26624; 2026-02-21T10:22:47.6658642Z // begin inline asm 2026-02-21T10:22:47.6658870Z cp.async.ca.shared.global [ %r339 + 0 ], [ %rd44 + 0 ], 0x8, %r330; 2026-02-21T10:22:47.6659134Z // end inline asm 2026-02-21T10:22:47.6659293Z add.s32 %r341, %r308, 28672; 2026-02-21T10:22:47.6659469Z // begin inline asm 2026-02-21T10:22:47.6659698Z cp.async.ca.shared.global [ %r341 + 0 ], [ %rd45 + 0 ], 0x8, %r330; 2026-02-21T10:22:47.6667609Z // end inline asm 2026-02-21T10:22:47.6667807Z add.s32 %r343, %r308, 30720; 2026-02-21T10:22:47.6668022Z // begin inline asm 2026-02-21T10:22:47.6668281Z cp.async.ca.shared.global [ %r343 + 0 ], [ %rd46 + 0 ], 0x8, %r330; 2026-02-21T10:22:47.6668668Z // end inline asm 2026-02-21T10:22:47.6668855Z cp.async.commit_group; 2026-02-21T10:22:47.6669201Z .loc 1 29 88 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:29:88 2026-02-21T10:22:47.6669589Z bar.sync 0; 2026-02-21T10:22:47.6669754Z and.pred %p23, %p55, %p29; 2026-02-21T10:22:47.6669951Z // begin inline asm 2026-02-21T10:22:47.6670183Z @%p23 mbarrier.arrive.expect_tx.shared.b64 _, [%r307], 4096; 2026-02-21T10:22:47.6670456Z // end inline asm 2026-02-21T10:22:47.6670770Z .loc 1 63 33 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:63:33 2026-02-21T10:22:47.6671294Z bar.sync 0; 2026-02-21T10:22:47.6671464Z elect.sync %r395|%p30, -1; 2026-02-21T10:22:47.6671651Z and.pred %p31, %p29, %p30; 2026-02-21T10:22:47.6671843Z and.pred %p24, %p1, %p31; 2026-02-21T10:22:47.6672119Z add.s32 %r346, %r358, 69632; 2026-02-21T10:22:47.6672308Z // begin inline asm 2026-02-21T10:22:47.6672753Z @%p24 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r346], [%rd38, {%r1907, %r1910}], [%r307]; 
2026-02-21T10:22:47.6673222Z // end inline asm 2026-02-21T10:22:47.6673525Z .loc 1 29 88 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:29:88 2026-02-21T10:22:47.6673894Z @%p25 bra $L__BB0_7; 2026-02-21T10:22:47.6674089Z // %bb.1: // %.lr.ph 2026-02-21T10:22:47.6674479Z .loc 1 0 88 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:0:88 2026-02-21T10:22:47.6674908Z ld.param.b64 %rd11, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T10:22:47.6675238Z shr.u32 %r3, %r2, 5; 2026-02-21T10:22:47.6675414Z shl.b32 %r12, %r360, 3; 2026-02-21T10:22:47.6675587Z add.s32 %r13, %r362, -655232; 2026-02-21T10:22:47.6675779Z and.b32 %r16, %r2, 128; 2026-02-21T10:22:47.6675962Z sub.s32 %r28, 126, %r13; 2026-02-21T10:22:47.6676146Z shl.b32 %r401, %r2, 6; 2026-02-21T10:22:47.6676324Z and.b32 %r402, %r401, 14336; 2026-02-21T10:22:47.6676723Z shl.b32 %r403, %r2, 5; 2026-02-21T10:22:47.6676911Z and.b32 %r404, %r403, 896; 2026-02-21T10:22:47.6677099Z shl.b32 %r405, %r2, 1; 2026-02-21T10:22:47.6677269Z and.b32 %r406, %r405, 62; 2026-02-21T10:22:47.6677443Z or.b32 %r407, %r402, %r404; 2026-02-21T10:22:47.6677625Z or.b32 %r29, %r407, %r406; 2026-02-21T10:22:47.6677794Z xor.b32 %r30, %r29, 8; 2026-02-21T10:22:47.6677964Z xor.b32 %r31, %r29, 16; 2026-02-21T10:22:47.6678135Z xor.b32 %r32, %r29, 24; 2026-02-21T10:22:47.6678297Z xor.b32 %r33, %r29, 32; 2026-02-21T10:22:47.6678466Z xor.b32 %r34, %r29, 40; 2026-02-21T10:22:47.6678633Z xor.b32 %r35, %r29, 48; 2026-02-21T10:22:47.6678804Z xor.b32 %r36, %r29, 56; 2026-02-21T10:22:47.6678965Z and.b32 %r37, %r2, 127; 2026-02-21T10:22:47.6679141Z shl.b32 %r408, %r37, 7; 2026-02-21T10:22:47.6679312Z shl.b32 %r409, %r2, 4; 2026-02-21T10:22:47.6679490Z and.b32 %r410, %r409, 112; 2026-02-21T10:22:47.6679670Z shr.u32 %r411, %r16, 5; 2026-02-21T10:22:47.6679849Z or.b32 %r412, %r410, %r411; 2026-02-21T10:22:47.6680038Z or.b32 %r413, %r412, %r408; 2026-02-21T10:22:47.6680218Z add.s32 %r415, %r358, 32768; 2026-02-21T10:22:47.6680405Z add.s32 
%r38, %r415, %r413; 2026-02-21T10:22:47.6680582Z xor.b32 %r416, %r413, 16; 2026-02-21T10:22:47.6680766Z add.s32 %r39, %r415, %r416; 2026-02-21T10:22:47.6680940Z xor.b32 %r417, %r413, 32; 2026-02-21T10:22:47.6681119Z add.s32 %r40, %r415, %r417; 2026-02-21T10:22:47.6681292Z xor.b32 %r418, %r413, 48; 2026-02-21T10:22:47.6681472Z add.s32 %r41, %r415, %r418; 2026-02-21T10:22:47.6681647Z xor.b32 %r419, %r413, 64; 2026-02-21T10:22:47.6681829Z add.s32 %r42, %r415, %r419; 2026-02-21T10:22:47.6682014Z xor.b32 %r420, %r413, 80; 2026-02-21T10:22:47.6682188Z add.s32 %r43, %r415, %r420; 2026-02-21T10:22:47.6682375Z xor.b32 %r421, %r413, 96; 2026-02-21T10:22:47.6682550Z add.s32 %r44, %r415, %r421; 2026-02-21T10:22:47.6682733Z xor.b32 %r422, %r413, 112; 2026-02-21T10:22:47.6682905Z add.s32 %r45, %r415, %r422; 2026-02-21T10:22:47.6683092Z bfe.u32 %r423, %r415, 4, 14; 2026-02-21T10:22:47.6683276Z cvt.u64.u32 %rd83, %r423; 2026-02-21T10:22:47.6683480Z or.b64 %rd91, %rd83, 4611686293372403712; 2026-02-21T10:22:47.6683702Z add.s32 %r424, %r358, 32800; 2026-02-21T10:22:47.6683886Z bfe.u32 %r425, %r424, 4, 14; 2026-02-21T10:22:47.6684066Z cvt.u64.u32 %rd84, %r425; 2026-02-21T10:22:47.6684248Z or.b64 %rd92, %rd84, 4611686293372403712; 2026-02-21T10:22:47.6684466Z add.s32 %r426, %r358, 32832; 2026-02-21T10:22:47.6684640Z bfe.u32 %r427, %r426, 4, 14; 2026-02-21T10:22:47.6684818Z cvt.u64.u32 %rd85, %r427; 2026-02-21T10:22:47.6685083Z or.b64 %rd93, %rd85, 4611686293372403712; 2026-02-21T10:22:47.6685293Z add.s32 %r428, %r358, 32864; 2026-02-21T10:22:47.6685467Z bfe.u32 %r429, %r428, 4, 14; 2026-02-21T10:22:47.6685664Z cvt.u64.u32 %rd86, %r429; 2026-02-21T10:22:47.6685935Z or.b64 %rd94, %rd86, 4611686293372403712; 2026-02-21T10:22:47.6686146Z add.s32 %r430, %r358, 49152; 2026-02-21T10:22:47.6686338Z bfe.u32 %r431, %r430, 4, 14; 2026-02-21T10:22:47.6686635Z cvt.u64.u32 %rd87, %r431; 2026-02-21T10:22:47.6686834Z or.b64 %rd95, %rd87, 4611686293372403712; 2026-02-21T10:22:47.6687043Z add.s32 
%r432, %r358, 49184; 2026-02-21T10:22:47.6687225Z bfe.u32 %r433, %r432, 4, 14; 2026-02-21T10:22:47.6687399Z cvt.u64.u32 %rd88, %r433; 2026-02-21T10:22:47.6687586Z or.b64 %rd96, %rd88, 4611686293372403712; 2026-02-21T10:22:47.6687794Z add.s32 %r434, %r358, 49216; 2026-02-21T10:22:47.6687969Z bfe.u32 %r435, %r434, 4, 14; 2026-02-21T10:22:47.6688149Z cvt.u64.u32 %rd89, %r435; 2026-02-21T10:22:47.6688420Z or.b64 %rd97, %rd89, 4611686293372403712; 2026-02-21T10:22:47.6688638Z add.s32 %r436, %r358, 49248; 2026-02-21T10:22:47.6688813Z bfe.u32 %r437, %r436, 4, 14; 2026-02-21T10:22:47.6688995Z cvt.u64.u32 %rd90, %r437; 2026-02-21T10:22:47.6689174Z or.b64 %rd98, %rd90, 4611686293372403712; 2026-02-21T10:22:47.6689390Z and.b32 %r438, %r2, 3; 2026-02-21T10:22:47.6689559Z shl.b32 %r439, %r438, 13; 2026-02-21T10:22:47.6689821Z and.b32 %r440, %r403, 7264; 2026-02-21T10:22:47.6690020Z and.b32 %r441, %r2, 24; 2026-02-21T10:22:47.6690194Z shl.b32 %r442, %r441, 4; 2026-02-21T10:22:47.6690375Z and.b32 %r444, %r357, 16; 2026-02-21T10:22:47.6690546Z or.b32 %r445, %r439, %r444; 2026-02-21T10:22:47.6690735Z or.b32 %r446, %r440, %r442; 2026-02-21T10:22:47.6690911Z or.b32 %r447, %r445, %r446; 2026-02-21T10:22:47.6691094Z add.s32 %r46, %r415, %r447; 2026-02-21T10:22:47.6691267Z xor.b32 %r448, %r447, 32; 2026-02-21T10:22:47.6691443Z add.s32 %r47, %r415, %r448; 2026-02-21T10:22:47.6691622Z xor.b32 %r449, %r447, 64; 2026-02-21T10:22:47.6691801Z add.s32 %r48, %r415, %r449; 2026-02-21T10:22:47.6691982Z xor.b32 %r450, %r447, 96; 2026-02-21T10:22:47.6692150Z add.s32 %r49, %r415, %r450; 2026-02-21T10:22:47.6692331Z shl.b32 %r451, %r441, 10; 2026-02-21T10:22:47.6692502Z shl.b32 %r452, %r438, 5; 2026-02-21T10:22:47.6692677Z and.b32 %r453, %r357, 1008; 2026-02-21T10:22:47.6692849Z or.b32 %r454, %r451, %r452; 2026-02-21T10:22:47.6693032Z xor.b32 %r455, %r454, %r453; 2026-02-21T10:22:47.6693208Z add.s32 %r1778, %r415, %r455; 2026-02-21T10:22:47.6693395Z add.s32 %r1783, %r1778, 1024; 
2026-02-21T10:22:47.6693572Z add.s32 %r1788, %r1778, 2048; 2026-02-21T10:22:47.6693757Z add.s32 %r1793, %r1778, 3072; 2026-02-21T10:22:47.6693938Z add.s32 %r1798, %r1778, 4096; 2026-02-21T10:22:47.6694112Z add.s32 %r1803, %r1778, 5120; 2026-02-21T10:22:47.6694293Z add.s32 %r1808, %r1778, 6144; 2026-02-21T10:22:47.6694469Z add.s32 %r1813, %r1778, 7168; 2026-02-21T10:22:47.6694653Z mov.b32 %r1916, 0f00000000; 2026-02-21T10:22:47.6694847Z mov.b32 %r1912, -1; 2026-02-21T10:22:47.6695019Z mov.b32 %r1911, %r1909; 2026-02-21T10:22:47.6695184Z mov.b32 %r1914, %r1907; 2026-02-21T10:22:47.6695359Z mov.b32 %r1915, %r1908; 2026-02-21T10:22:47.6695530Z mov.b32 %r1917, %r1916; 2026-02-21T10:22:47.6695694Z mov.b32 %r1918, %r1916; 2026-02-21T10:22:47.6695868Z mov.b32 %r1919, %r1916; 2026-02-21T10:22:47.6696031Z mov.b32 %r1920, %r1916; 2026-02-21T10:22:47.6696201Z mov.b32 %r1921, %r1916; 2026-02-21T10:22:47.6696362Z mov.b32 %r1922, %r1916; 2026-02-21T10:22:47.6696657Z mov.b32 %r1923, %r1916; 2026-02-21T10:22:47.6696822Z mov.b32 %r1924, %r1916; 2026-02-21T10:22:47.6696989Z mov.b32 %r1925, %r1916; 2026-02-21T10:22:47.6697150Z mov.b32 %r1926, %r1916; 2026-02-21T10:22:47.6697316Z mov.b32 %r1927, %r1916; 2026-02-21T10:22:47.6697488Z mov.b32 %r1928, %r1916; 2026-02-21T10:22:47.6697658Z mov.b32 %r1929, %r1916; 2026-02-21T10:22:47.6697826Z mov.b32 %r1930, %r1916; 2026-02-21T10:22:47.6698091Z mov.b32 %r1931, %r1916; 2026-02-21T10:22:47.6698263Z mov.b32 %r1932, %r1916; 2026-02-21T10:22:47.6698425Z mov.b32 %r1933, %r1916; 2026-02-21T10:22:47.6698596Z mov.b32 %r1934, %r1916; 2026-02-21T10:22:47.6698762Z mov.b32 %r1935, %r1916; 2026-02-21T10:22:47.6699011Z mov.b32 %r1936, %r1916; 2026-02-21T10:22:47.6699177Z mov.b32 %r1937, %r1916; 2026-02-21T10:22:47.6699337Z mov.b32 %r1938, %r1916; 2026-02-21T10:22:47.6699506Z mov.b32 %r1939, %r1916; 2026-02-21T10:22:47.6699667Z mov.b32 %r1940, %r1916; 2026-02-21T10:22:47.6699834Z mov.b32 %r1941, %r1916; 2026-02-21T10:22:47.6699995Z mov.b32 %r1942, %r1916; 
2026-02-21T10:22:47.6700160Z mov.b32 %r1943, %r1916; 2026-02-21T10:22:47.6700319Z mov.b32 %r1944, %r1916; 2026-02-21T10:22:47.6700485Z mov.b32 %r1945, %r1916; 2026-02-21T10:22:47.6700646Z mov.b32 %r1946, %r1916; 2026-02-21T10:22:47.6700817Z mov.b32 %r1947, %r1916; 2026-02-21T10:22:47.6700985Z mov.b32 %r1948, %r1916; 2026-02-21T10:22:47.6701146Z mov.b32 %r1949, %r1916; 2026-02-21T10:22:47.6701416Z mov.b32 %r1950, %r1916; 2026-02-21T10:22:47.6701586Z mov.b32 %r1951, %r1916; 2026-02-21T10:22:47.6701752Z mov.b32 %r1952, %r1916; 2026-02-21T10:22:47.6701915Z mov.b32 %r1953, %r1916; 2026-02-21T10:22:47.6702087Z mov.b32 %r1954, %r1916; 2026-02-21T10:22:47.6702248Z mov.b32 %r1955, %r1916; 2026-02-21T10:22:47.6702417Z mov.b32 %r1956, %r1916; 2026-02-21T10:22:47.6702579Z mov.b32 %r1957, %r1916; 2026-02-21T10:22:47.6702815Z mov.b32 %r1958, %r1916; 2026-02-21T10:22:47.6702985Z mov.b32 %r1959, %r1916; 2026-02-21T10:22:47.6703144Z mov.b32 %r1960, %r1916; 2026-02-21T10:22:47.6703310Z mov.b32 %r1961, %r1916; 2026-02-21T10:22:47.6703471Z mov.b32 %r1962, %r1916; 2026-02-21T10:22:47.6703651Z mov.b32 %r1963, %r1916; 2026-02-21T10:22:47.6703814Z mov.b32 %r1964, %r1916; 2026-02-21T10:22:47.6703984Z mov.b32 %r1965, %r1916; 2026-02-21T10:22:47.6704142Z mov.b32 %r1966, %r1916; 2026-02-21T10:22:47.6704306Z mov.b32 %r1967, %r1916; 2026-02-21T10:22:47.6704470Z mov.b32 %r1968, %r1916; 2026-02-21T10:22:47.6704637Z mov.b32 %r1969, %r1916; 2026-02-21T10:22:47.6704801Z mov.b32 %r1970, %r1916; 2026-02-21T10:22:47.6704960Z mov.b32 %r1971, %r1916; 2026-02-21T10:22:47.6705127Z mov.b32 %r1972, %r1916; 2026-02-21T10:22:47.6705290Z mov.b32 %r1973, %r1916; 2026-02-21T10:22:47.6705455Z mov.b32 %r1974, %r1916; 2026-02-21T10:22:47.6705615Z mov.b32 %r1975, %r1916; 2026-02-21T10:22:47.6705785Z mov.b32 %r1976, %r1916; 2026-02-21T10:22:47.6705947Z mov.b32 %r1977, %r1916; 2026-02-21T10:22:47.6706112Z mov.b32 %r1978, %r1916; 2026-02-21T10:22:47.6706272Z mov.b32 %r1979, %r1916; 2026-02-21T10:22:47.6706440Z mov.b32 
%r1981, %r1913; 2026-02-21T10:22:47.6706728Z mov.b32 %r1982, %r1909; 2026-02-21T10:22:47.6706889Z mov.b32 %r1983, %r1915; 2026-02-21T10:22:47.6707068Z mov.b32 %r1984, %r1914; 2026-02-21T10:22:47.6707232Z bra.uni $L__BB0_2; 2026-02-21T10:22:47.6707440Z $L__BB0_6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:22:47.6707865Z .loc 1 29 88 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:29:88 2026-02-21T10:22:47.6708227Z add.s32 %r1982, %r1982, 1; 2026-02-21T10:22:47.6708417Z setp.lt.s32 %p54, %r1982, %r14; 2026-02-21T10:22:47.6708683Z mov.b32 %r1907, %r1914; 2026-02-21T10:22:47.6708849Z mov.b32 %r1908, %r1915; 2026-02-21T10:22:47.6709007Z mov.b32 %r1909, %r1981; 2026-02-21T10:22:47.6709175Z mov.b32 %r1914, %r1984; 2026-02-21T10:22:47.6709338Z mov.b32 %r1915, %r1983; 2026-02-21T10:22:47.6709501Z mov.b32 %r1981, %r142; 2026-02-21T10:22:47.6709666Z @%p54 bra $L__BB0_2; 2026-02-21T10:22:47.6709825Z bra.uni $L__BB0_7; 2026-02-21T10:22:47.6710038Z $L__BB0_2: // =>This Inner Loop Header: Depth=1 2026-02-21T10:22:47.6710441Z .loc 1 29 88 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:29:88 2026-02-21T10:22:47.6710796Z add.s32 %r456, %r1981, 1; 2026-02-21T10:22:47.6710972Z setp.eq.b32 %p32, %r1981, 127; 2026-02-21T10:22:47.6711267Z selp.b32 %r142, 0, %r456, %p32; 2026-02-21T10:22:47.6711460Z setp.ne.b32 %p33, %r142, 0; 2026-02-21T10:22:47.6711636Z @%p33 bra $L__BB0_4; 2026-02-21T10:22:47.6711844Z // %bb.3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:22:47.6712168Z add.s32 %r1985, %r1985, 1; 2026-02-21T10:22:47.6712493Z .loc 1 35 35 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:35:35 2026-02-21T10:22:47.6712850Z mul.hi.s32 %r457, %r1985, 1717986919; 2026-02-21T10:22:47.6713050Z shr.u32 %r458, %r457, 31; 2026-02-21T10:22:47.6713218Z shr.s32 %r459, %r457, 5; 2026-02-21T10:22:47.6713389Z add.s32 %r460, %r459, %r458; 2026-02-21T10:22:47.6713715Z .loc 1 36 33 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:36:33 
2026-02-21T10:22:47.6714075Z shl.b32 %r461, %r460, 3; 2026-02-21T10:22:47.6714388Z .loc 1 37 39 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:37:39 2026-02-21T10:22:47.6714824Z sub.s32 %r462, 512, %r461; 2026-02-21T10:22:47.6715143Z .loc 1 37 52 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:37:52 2026-02-21T10:22:47.6715485Z min.s32 %r463, %r462, 8; 2026-02-21T10:22:47.6715804Z .loc 1 38 45 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:38:45 2026-02-21T10:22:47.6716156Z mul.lo.s32 %r464, %r460, 80; 2026-02-21T10:22:47.6716396Z sub.s32 %r465, %r1985, %r464; 2026-02-21T10:22:47.6716831Z .loc 1 39 51 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:39:51 2026-02-21T10:22:47.6717192Z div.s32 %r466, %r465, %r463; 2026-02-21T10:22:47.6717508Z .loc 1 38 64 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:38:64 2026-02-21T10:22:47.6717858Z mul.lo.s32 %r467, %r466, %r463; 2026-02-21T10:22:47.6718047Z sub.s32 %r468, %r465, %r467; 2026-02-21T10:22:47.6718361Z .loc 1 38 30 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:38:30 2026-02-21T10:22:47.6718708Z add.s32 %r469, %r468, %r461; 2026-02-21T10:22:47.6719019Z .loc 1 40 27 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:40:27 2026-02-21T10:22:47.6719366Z shl.b32 %r1983, %r469, 7; 2026-02-21T10:22:47.6719677Z .loc 1 41 32 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:41:32 2026-02-21T10:22:47.6720027Z or.b32 %r1986, %r1983, %r4; 2026-02-21T10:22:47.6720204Z or.b32 %r1987, %r1983, %r5; 2026-02-21T10:22:47.6720377Z or.b32 %r1988, %r1983, %r6; 2026-02-21T10:22:47.6720547Z or.b32 %r1989, %r1983, %r7; 2026-02-21T10:22:47.6720716Z or.b32 %r1990, %r1983, %r8; 2026-02-21T10:22:47.6720881Z or.b32 %r1991, %r1983, %r9; 2026-02-21T10:22:47.6721056Z or.b32 %r1992, %r1983, %r10; 2026-02-21T10:22:47.6721226Z or.b32 %r1993, %r1983, %r11; 2026-02-21T10:22:47.6721540Z .loc 1 42 27 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:42:27 
2026-02-21T10:22:47.6721888Z shl.b32 %r1984, %r466, 7; 2026-02-21T10:22:47.6722110Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:22:47.6722505Z .loc 1 29 88 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:29:88 2026-02-21T10:22:47.6722853Z setp.eq.b32 %p44, %r142, 0; 2026-02-21T10:22:47.6723037Z setp.lt.s32 %p45, %r1982, %r28; 2026-02-21T10:22:47.6723361Z .loc 1 80 38 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:80:38 2026-02-21T10:22:47.6723730Z setp.eq.b32 %p46, %r16, 0; 2026-02-21T10:22:47.6724039Z .loc 1 29 88 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:29:88 2026-02-21T10:22:47.6724384Z add.s32 %r1683, %r1912, 1; 2026-02-21T10:22:47.6724561Z setp.gt.s32 %p49, %r1683, 1; 2026-02-21T10:22:47.6724741Z selp.b32 %r1912, 0, %r1683, %p49; 2026-02-21T10:22:47.6724940Z selp.b32 %r1684, 1, 0, %p49; 2026-02-21T10:22:47.6725214Z xor.b32 %r1911, %r1911, %r1684; 2026-02-21T10:22:47.6725535Z .loc 1 57 80 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:57:80 2026-02-21T10:22:47.6725884Z cp.async.wait_group 1; 2026-02-21T10:22:47.6726117Z bar.sync 0; 2026-02-21T10:22:47.6726260Z shl.b32 %r1685, %r1912, 14; 2026-02-21T10:22:47.6726595Z add.s32 %r1687, %r358, %r1685; 2026-02-21T10:22:47.6726926Z .loc 1 61 32 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:61:32 2026-02-21T10:22:47.6727273Z add.s32 %r1688, %r1687, %r29; 2026-02-21T10:22:47.6727455Z ld.shared.b16 %rs1, [%r1688]; 2026-02-21T10:22:47.6727640Z ld.shared.b16 %rs2, [%r1688+1024]; 2026-02-21T10:22:47.6727842Z ld.shared.b16 %rs3, [%r1688+64]; 2026-02-21T10:22:47.6728031Z ld.shared.b16 %rs4, [%r1688+1088]; 2026-02-21T10:22:47.6728238Z add.s32 %r1689, %r1687, %r30; 2026-02-21T10:22:47.6728415Z ld.shared.b16 %rs5, [%r1689]; 2026-02-21T10:22:47.6728600Z ld.shared.b16 %rs6, [%r1689+1024]; 2026-02-21T10:22:47.6728871Z ld.shared.b16 %rs7, [%r1689+64]; 2026-02-21T10:22:47.6729065Z ld.shared.b16 %rs8, [%r1689+1088]; 2026-02-21T10:22:47.6729256Z add.s32 
%r1690, %r1687, %r31; 2026-02-21T10:22:47.6729431Z ld.shared.b16 %rs9, [%r1690]; 2026-02-21T10:22:47.6729617Z ld.shared.b16 %rs10, [%r1690+1024]; 2026-02-21T10:22:47.6729818Z ld.shared.b16 %rs11, [%r1690+64]; 2026-02-21T10:22:47.6730087Z ld.shared.b16 %rs12, [%r1690+1088]; 2026-02-21T10:22:47.6730285Z add.s32 %r1691, %r1687, %r32; 2026-02-21T10:22:47.6730467Z ld.shared.b16 %rs13, [%r1691]; 2026-02-21T10:22:47.6730654Z ld.shared.b16 %rs14, [%r1691+1024]; 2026-02-21T10:22:47.6730848Z ld.shared.b16 %rs15, [%r1691+64]; 2026-02-21T10:22:47.6731041Z ld.shared.b16 %rs16, [%r1691+1088]; 2026-02-21T10:22:47.6731228Z add.s32 %r1692, %r1687, %r33; 2026-02-21T10:22:47.6731410Z ld.shared.b16 %rs17, [%r1692]; 2026-02-21T10:22:47.6731594Z ld.shared.b16 %rs18, [%r1692+1024]; 2026-02-21T10:22:47.6731789Z ld.shared.b16 %rs19, [%r1692+64]; 2026-02-21T10:22:47.6731984Z ld.shared.b16 %rs20, [%r1692+1088]; 2026-02-21T10:22:47.6732177Z add.s32 %r1693, %r1687, %r34; 2026-02-21T10:22:47.6732361Z ld.shared.b16 %rs21, [%r1693]; 2026-02-21T10:22:47.6732544Z ld.shared.b16 %rs22, [%r1693+1024]; 2026-02-21T10:22:47.6732742Z ld.shared.b16 %rs23, [%r1693+64]; 2026-02-21T10:22:47.6732930Z ld.shared.b16 %rs24, [%r1693+1088]; 2026-02-21T10:22:47.6733122Z add.s32 %r1694, %r1687, %r35; 2026-02-21T10:22:47.6733300Z ld.shared.b16 %rs25, [%r1694]; 2026-02-21T10:22:47.6733486Z ld.shared.b16 %rs26, [%r1694+1024]; 2026-02-21T10:22:47.6733675Z ld.shared.b16 %rs27, [%r1694+64]; 2026-02-21T10:22:47.6733867Z ld.shared.b16 %rs28, [%r1694+1088]; 2026-02-21T10:22:47.6734051Z add.s32 %r1695, %r1687, %r36; 2026-02-21T10:22:47.6734230Z ld.shared.b16 %rs29, [%r1695]; 2026-02-21T10:22:47.6734413Z ld.shared.b16 %rs30, [%r1695+1024]; 2026-02-21T10:22:47.6734603Z ld.shared.b16 %rs31, [%r1695+64]; 2026-02-21T10:22:47.6734797Z ld.shared.b16 %rs32, [%r1695+1088]; 2026-02-21T10:22:47.6734989Z cvt.f32.bf16 %r600, %rs1; 2026-02-21T10:22:47.6735179Z cvt.f32.bf16 %r601, %rs2; 2026-02-21T10:22:47.6735351Z cvt.f32.bf16 %r602, %rs5; 
2026-02-21T10:22:47.6735521Z cvt.f32.bf16 %r603, %rs6; 2026-02-21T10:22:47.6735689Z cvt.f32.bf16 %r732, %rs9; 2026-02-21T10:22:47.6735856Z cvt.f32.bf16 %r733, %rs10; 2026-02-21T10:22:47.6736035Z cvt.f32.bf16 %r734, %rs13; 2026-02-21T10:22:47.6736206Z cvt.f32.bf16 %r735, %rs14; 2026-02-21T10:22:47.6736381Z cvt.f32.bf16 %r864, %rs17; 2026-02-21T10:22:47.6736667Z cvt.f32.bf16 %r865, %rs18; 2026-02-21T10:22:47.6736840Z cvt.f32.bf16 %r866, %rs21; 2026-02-21T10:22:47.6737007Z cvt.f32.bf16 %r867, %rs22; 2026-02-21T10:22:47.6737174Z cvt.f32.bf16 %r996, %rs25; 2026-02-21T10:22:47.6737338Z cvt.f32.bf16 %r997, %rs26; 2026-02-21T10:22:47.6737507Z cvt.f32.bf16 %r998, %rs29; 2026-02-21T10:22:47.6737674Z cvt.f32.bf16 %r999, %rs30; 2026-02-21T10:22:47.6737847Z cvt.f32.bf16 %r1128, %rs3; 2026-02-21T10:22:47.6738115Z cvt.f32.bf16 %r1129, %rs4; 2026-02-21T10:22:47.6738283Z cvt.f32.bf16 %r1130, %rs7; 2026-02-21T10:22:47.6738457Z cvt.f32.bf16 %r1131, %rs8; 2026-02-21T10:22:47.6738623Z cvt.f32.bf16 %r1260, %rs11; 2026-02-21T10:22:47.6738802Z cvt.f32.bf16 %r1261, %rs12; 2026-02-21T10:22:47.6739039Z cvt.f32.bf16 %r1262, %rs15; 2026-02-21T10:22:47.6739210Z cvt.f32.bf16 %r1263, %rs16; 2026-02-21T10:22:47.6739380Z cvt.f32.bf16 %r1392, %rs19; 2026-02-21T10:22:47.6739556Z cvt.f32.bf16 %r1393, %rs20; 2026-02-21T10:22:47.6739728Z cvt.f32.bf16 %r1394, %rs23; 2026-02-21T10:22:47.6739903Z cvt.f32.bf16 %r1395, %rs24; 2026-02-21T10:22:47.6740078Z cvt.f32.bf16 %r1524, %rs27; 2026-02-21T10:22:47.6740257Z cvt.f32.bf16 %r1525, %rs28; 2026-02-21T10:22:47.6740432Z cvt.f32.bf16 %r1526, %rs31; 2026-02-21T10:22:47.6740603Z cvt.f32.bf16 %r1527, %rs32; 2026-02-21T10:22:47.6740922Z .loc 1 29 88 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:29:88 2026-02-21T10:22:47.6741278Z shl.b32 %r1696, %r1912, 3; 2026-02-21T10:22:47.6741521Z add.s32 %r470, %r306, %r1696; 2026-02-21T10:22:47.6741697Z // begin inline asm 2026-02-21T10:22:47.6741849Z 2026-02-21T10:22:47.6741971Z { 2026-02-21T10:22:47.6742105Z .reg 
.pred complete; 2026-02-21T10:22:47.6742267Z waitLoop: 2026-02-21T10:22:47.6742478Z mbarrier.try_wait.parity.shared.b64 complete, [%r470], %r1911; 2026-02-21T10:22:47.6742769Z @!complete bra.uni waitLoop; 2026-02-21T10:22:47.6743013Z } 2026-02-21T10:22:47.6743093Z 2026-02-21T10:22:47.6743152Z // end inline asm 2026-02-21T10:22:47.6743445Z .loc 1 63 33 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:63:33 2026-02-21T10:22:47.6743798Z shl.b32 %r1698, %r1912, 12; 2026-02-21T10:22:47.6743974Z add.s32 %r1700, %r325, %r1698; 2026-02-21T10:22:47.6744289Z .loc 1 81 58 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:81:58 2026-02-21T10:22:47.6744648Z add.s32 %r1701, %r1700, %r37; 2026-02-21T10:22:47.6744833Z ld.shared.b8 %rs33, [%r1701]; 2026-02-21T10:22:47.6745023Z ld.shared.b8 %rs34, [%r1701+1024]; 2026-02-21T10:22:47.6745220Z ld.shared.b8 %rs35, [%r1701+2048]; 2026-02-21T10:22:47.6745416Z ld.shared.b8 %rs36, [%r1701+3072]; 2026-02-21T10:22:47.6745604Z xor.b32 %r1702, %r37, 16; 2026-02-21T10:22:47.6745777Z add.s32 %r1703, %r1700, %r1702; 2026-02-21T10:22:47.6745970Z ld.shared.b8 %rs37, [%r1703+128]; 2026-02-21T10:22:47.6746160Z ld.shared.b8 %rs38, [%r1703+1152]; 2026-02-21T10:22:47.6746354Z ld.shared.b8 %rs39, [%r1703+2176]; 2026-02-21T10:22:47.6746654Z ld.shared.b8 %rs40, [%r1703+3200]; 2026-02-21T10:22:47.6746843Z xor.b32 %r1704, %r37, 32; 2026-02-21T10:22:47.6747013Z add.s32 %r1705, %r1700, %r1704; 2026-02-21T10:22:47.6747204Z ld.shared.b8 %rs41, [%r1705+256]; 2026-02-21T10:22:47.6747391Z ld.shared.b8 %rs42, [%r1705+1280]; 2026-02-21T10:22:47.6747582Z ld.shared.b8 %rs43, [%r1705+2304]; 2026-02-21T10:22:47.6747771Z ld.shared.b8 %rs44, [%r1705+3328]; 2026-02-21T10:22:47.6747960Z xor.b32 %r1706, %r37, 48; 2026-02-21T10:22:47.6748132Z add.s32 %r1707, %r1700, %r1706; 2026-02-21T10:22:47.6748313Z ld.shared.b8 %rs45, [%r1707+384]; 2026-02-21T10:22:47.6748570Z ld.shared.b8 %rs46, [%r1707+1408]; 2026-02-21T10:22:47.6748768Z ld.shared.b8 %rs47, 
[%r1707+2432]; 2026-02-21T10:22:47.6748959Z ld.shared.b8 %rs48, [%r1707+3456]; 2026-02-21T10:22:47.6749144Z xor.b32 %r1708, %r37, 64; 2026-02-21T10:22:47.6749318Z add.s32 %r1709, %r1700, %r1708; 2026-02-21T10:22:47.6749498Z ld.shared.b8 %rs49, [%r1709+512]; 2026-02-21T10:22:47.6749684Z ld.shared.b8 %rs50, [%r1709+1536]; 2026-02-21T10:22:47.6749873Z ld.shared.b8 %rs51, [%r1709+2560]; 2026-02-21T10:22:47.6750061Z ld.shared.b8 %rs52, [%r1709+3584]; 2026-02-21T10:22:47.6750250Z xor.b32 %r1710, %r37, 80; 2026-02-21T10:22:47.6750415Z add.s32 %r1711, %r1700, %r1710; 2026-02-21T10:22:47.6750600Z ld.shared.b8 %rs53, [%r1711+640]; 2026-02-21T10:22:47.6750787Z ld.shared.b8 %rs54, [%r1711+1664]; 2026-02-21T10:22:47.6751073Z ld.shared.b8 %rs55, [%r1711+2688]; 2026-02-21T10:22:47.6751261Z ld.shared.b8 %rs56, [%r1711+3712]; 2026-02-21T10:22:47.6751445Z xor.b32 %r1712, %r37, 96; 2026-02-21T10:22:47.6751622Z add.s32 %r1713, %r1700, %r1712; 2026-02-21T10:22:47.6751810Z ld.shared.b8 %rs57, [%r1713+768]; 2026-02-21T10:22:47.6752092Z ld.shared.b8 %rs58, [%r1713+1792]; 2026-02-21T10:22:47.6752291Z ld.shared.b8 %rs59, [%r1713+2816]; 2026-02-21T10:22:47.6752498Z ld.shared.b8 %rs60, [%r1713+3840]; 2026-02-21T10:22:47.6752694Z xor.b32 %r1714, %r37, 112; 2026-02-21T10:22:47.6752882Z add.s32 %r1715, %r1700, %r1714; 2026-02-21T10:22:47.6753075Z ld.shared.b8 %rs61, [%r1715+896]; 2026-02-21T10:22:47.6753277Z ld.shared.b8 %rs62, [%r1715+1920]; 2026-02-21T10:22:47.6753470Z ld.shared.b8 %rs63, [%r1715+2944]; 2026-02-21T10:22:47.6753671Z ld.shared.b8 %rs64, [%r1715+3968]; 2026-02-21T10:22:47.6754021Z .loc 1 66 28 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:66:28 2026-02-21T10:22:47.6754385Z shl.b16 %rs65, %rs33, 4; 2026-02-21T10:22:47.6754646Z shl.b16 %rs66, %rs37, 4; 2026-02-21T10:22:47.6754829Z shl.b16 %rs67, %rs41, 4; 2026-02-21T10:22:47.6755001Z shl.b16 %rs68, %rs45, 4; 2026-02-21T10:22:47.6755166Z shl.b16 %rs69, %rs49, 4; 2026-02-21T10:22:47.6755341Z shl.b16 %rs70, %rs53, 4; 
2026-02-21T10:22:47.6755506Z shl.b16 %rs71, %rs57, 4; 2026-02-21T10:22:47.6755677Z shl.b16 %rs72, %rs61, 4; 2026-02-21T10:22:47.6755925Z shl.b16 %rs73, %rs34, 4; 2026-02-21T10:22:47.6756104Z shl.b16 %rs74, %rs38, 4; 2026-02-21T10:22:47.6756275Z shl.b16 %rs75, %rs42, 4; 2026-02-21T10:22:47.6756441Z shl.b16 %rs76, %rs46, 4; 2026-02-21T10:22:47.6756736Z shl.b16 %rs77, %rs50, 4; 2026-02-21T10:22:47.6756902Z shl.b16 %rs78, %rs54, 4; 2026-02-21T10:22:47.6757074Z shl.b16 %rs79, %rs58, 4; 2026-02-21T10:22:47.6757242Z shl.b16 %rs80, %rs62, 4; 2026-02-21T10:22:47.6757412Z shl.b16 %rs81, %rs35, 4; 2026-02-21T10:22:47.6757577Z shl.b16 %rs82, %rs39, 4; 2026-02-21T10:22:47.6757749Z shl.b16 %rs83, %rs43, 4; 2026-02-21T10:22:47.6757926Z shl.b16 %rs84, %rs47, 4; 2026-02-21T10:22:47.6758093Z shl.b16 %rs85, %rs51, 4; 2026-02-21T10:22:47.6758264Z shl.b16 %rs86, %rs55, 4; 2026-02-21T10:22:47.6758428Z shl.b16 %rs87, %rs59, 4; 2026-02-21T10:22:47.6758603Z shl.b16 %rs88, %rs63, 4; 2026-02-21T10:22:47.6758768Z shl.b16 %rs89, %rs36, 4; 2026-02-21T10:22:47.6758941Z shl.b16 %rs90, %rs40, 4; 2026-02-21T10:22:47.6759115Z shl.b16 %rs91, %rs44, 4; 2026-02-21T10:22:47.6759288Z shl.b16 %rs92, %rs48, 4; 2026-02-21T10:22:47.6759456Z shl.b16 %rs93, %rs52, 4; 2026-02-21T10:22:47.6759627Z shl.b16 %rs94, %rs56, 4; 2026-02-21T10:22:47.6759799Z shl.b16 %rs95, %rs60, 4; 2026-02-21T10:22:47.6759969Z shl.b16 %rs96, %rs64, 4; 2026-02-21T10:22:47.6760296Z .loc 1 81 58 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:81:58 2026-02-21T10:22:47.6760673Z selp.b16 %rs97, %rs65, %rs33, %p46; 2026-02-21T10:22:47.6760890Z cvt.s16.s8 %rs98, %rs97; 2026-02-21T10:22:47.6761061Z shr.s16 %rs99, %rs98, 4; 2026-02-21T10:22:47.6761243Z selp.b16 %rs100, %rs66, %rs37, %p46; 2026-02-21T10:22:47.6761447Z cvt.s16.s8 %rs101, %rs100; 2026-02-21T10:22:47.6761630Z shr.s16 %rs102, %rs101, 4; 2026-02-21T10:22:47.6761818Z selp.b16 %rs103, %rs67, %rs41, %p46; 2026-02-21T10:22:47.6762021Z cvt.s16.s8 %rs104, %rs103; 
2026-02-21T10:22:47.6762204Z shr.s16 %rs105, %rs104, 4; 2026-02-21T10:22:47.6762384Z selp.b16 %rs106, %rs68, %rs45, %p46; 2026-02-21T10:22:47.6762585Z cvt.s16.s8 %rs107, %rs106; 2026-02-21T10:22:47.6762758Z shr.s16 %rs108, %rs107, 4; 2026-02-21T10:22:47.6762945Z selp.b16 %rs109, %rs69, %rs49, %p46; 2026-02-21T10:22:47.6763145Z cvt.s16.s8 %rs110, %rs109; 2026-02-21T10:22:47.6763325Z shr.s16 %rs111, %rs110, 4; 2026-02-21T10:22:47.6763521Z selp.b16 %rs112, %rs70, %rs53, %p46; 2026-02-21T10:22:47.6763716Z cvt.s16.s8 %rs113, %rs112; 2026-02-21T10:22:47.6763894Z shr.s16 %rs114, %rs113, 4; 2026-02-21T10:22:47.6764076Z selp.b16 %rs115, %rs71, %rs57, %p46; 2026-02-21T10:22:47.6764368Z cvt.s16.s8 %rs116, %rs115; 2026-02-21T10:22:47.6764540Z shr.s16 %rs117, %rs116, 4; 2026-02-21T10:22:47.6764721Z selp.b16 %rs118, %rs72, %rs61, %p46; 2026-02-21T10:22:47.6764912Z cvt.s16.s8 %rs119, %rs118; 2026-02-21T10:22:47.6765184Z shr.s16 %rs120, %rs119, 4; 2026-02-21T10:22:47.6765363Z selp.b16 %rs121, %rs73, %rs34, %p46; 2026-02-21T10:22:47.6765561Z cvt.s16.s8 %rs122, %rs121; 2026-02-21T10:22:47.6765737Z shr.s16 %rs123, %rs122, 4; 2026-02-21T10:22:47.6765918Z selp.b16 %rs124, %rs74, %rs38, %p46; 2026-02-21T10:22:47.6766116Z cvt.s16.s8 %rs125, %rs124; 2026-02-21T10:22:47.6766287Z shr.s16 %rs126, %rs125, 4; 2026-02-21T10:22:47.6766578Z selp.b16 %rs127, %rs75, %rs42, %p46; 2026-02-21T10:22:47.6766772Z cvt.s16.s8 %rs128, %rs127; 2026-02-21T10:22:47.6766944Z shr.s16 %rs129, %rs128, 4; 2026-02-21T10:22:47.6767120Z selp.b16 %rs130, %rs76, %rs46, %p46; 2026-02-21T10:22:47.6767315Z cvt.s16.s8 %rs131, %rs130; 2026-02-21T10:22:47.6767483Z shr.s16 %rs132, %rs131, 4; 2026-02-21T10:22:47.6767751Z selp.b16 %rs133, %rs77, %rs50, %p46; 2026-02-21T10:22:47.6767956Z cvt.s16.s8 %rs134, %rs133; 2026-02-21T10:22:47.6768124Z shr.s16 %rs135, %rs134, 4; 2026-02-21T10:22:47.6768306Z selp.b16 %rs136, %rs78, %rs54, %p46; 2026-02-21T10:22:47.6768500Z cvt.s16.s8 %rs137, %rs136; 2026-02-21T10:22:47.6768673Z shr.s16 
%rs138, %rs137, 4; 2026-02-21T10:22:47.6768847Z selp.b16 %rs139, %rs79, %rs58, %p46; 2026-02-21T10:22:47.6769123Z cvt.s16.s8 %rs140, %rs139; 2026-02-21T10:22:47.6769298Z shr.s16 %rs141, %rs140, 4; 2026-02-21T10:22:47.6769481Z selp.b16 %rs142, %rs80, %rs62, %p46; 2026-02-21T10:22:47.6769690Z cvt.s16.s8 %rs143, %rs142; 2026-02-21T10:22:47.6769862Z shr.s16 %rs144, %rs143, 4; 2026-02-21T10:22:47.6770044Z selp.b16 %rs145, %rs81, %rs35, %p46; 2026-02-21T10:22:47.6770236Z cvt.s16.s8 %rs146, %rs145; 2026-02-21T10:22:47.6770412Z shr.s16 %rs147, %rs146, 4; 2026-02-21T10:22:47.6770590Z selp.b16 %rs148, %rs82, %rs39, %p46; 2026-02-21T10:22:47.6770791Z cvt.s16.s8 %rs149, %rs148; 2026-02-21T10:22:47.6770963Z shr.s16 %rs150, %rs149, 4; 2026-02-21T10:22:47.6771146Z selp.b16 %rs151, %rs83, %rs43, %p46; 2026-02-21T10:22:47.6771340Z cvt.s16.s8 %rs152, %rs151; 2026-02-21T10:22:47.6771517Z shr.s16 %rs153, %rs152, 4; 2026-02-21T10:22:47.6771701Z selp.b16 %rs154, %rs84, %rs47, %p46; 2026-02-21T10:22:47.6771895Z cvt.s16.s8 %rs155, %rs154; 2026-02-21T10:22:47.6772071Z shr.s16 %rs156, %rs155, 4; 2026-02-21T10:22:47.6772249Z selp.b16 %rs157, %rs85, %rs51, %p46; 2026-02-21T10:22:47.6772450Z cvt.s16.s8 %rs158, %rs157; 2026-02-21T10:22:47.6772620Z shr.s16 %rs159, %rs158, 4; 2026-02-21T10:22:47.6772802Z selp.b16 %rs160, %rs86, %rs55, %p46; 2026-02-21T10:22:47.6773000Z cvt.s16.s8 %rs161, %rs160; 2026-02-21T10:22:47.6773177Z shr.s16 %rs162, %rs161, 4; 2026-02-21T10:22:47.6773360Z selp.b16 %rs163, %rs87, %rs59, %p46; 2026-02-21T10:22:47.6773555Z cvt.s16.s8 %rs164, %rs163; 2026-02-21T10:22:47.6773728Z shr.s16 %rs165, %rs164, 4; 2026-02-21T10:22:47.6773906Z selp.b16 %rs166, %rs88, %rs63, %p46; 2026-02-21T10:22:47.6774117Z cvt.s16.s8 %rs167, %rs166; 2026-02-21T10:22:47.6774288Z shr.s16 %rs168, %rs167, 4; 2026-02-21T10:22:47.6774465Z selp.b16 %rs169, %rs89, %rs36, %p46; 2026-02-21T10:22:47.6774658Z cvt.s16.s8 %rs170, %rs169; 2026-02-21T10:22:47.6774833Z shr.s16 %rs171, %rs170, 4; 
2026-02-21T10:22:47.6775009Z selp.b16 %rs172, %rs90, %rs40, %p46; 2026-02-21T10:22:47.6775208Z cvt.s16.s8 %rs173, %rs172; 2026-02-21T10:22:47.6775388Z shr.s16 %rs174, %rs173, 4; 2026-02-21T10:22:47.6775567Z selp.b16 %rs175, %rs91, %rs44, %p46; 2026-02-21T10:22:47.6775767Z cvt.s16.s8 %rs176, %rs175; 2026-02-21T10:22:47.6775935Z shr.s16 %rs177, %rs176, 4; 2026-02-21T10:22:47.6776116Z selp.b16 %rs178, %rs92, %rs48, %p46; 2026-02-21T10:22:47.6776308Z cvt.s16.s8 %rs179, %rs178; 2026-02-21T10:22:47.6776595Z shr.s16 %rs180, %rs179, 4; 2026-02-21T10:22:47.6776788Z selp.b16 %rs181, %rs93, %rs52, %p46; 2026-02-21T10:22:47.6776986Z cvt.s16.s8 %rs182, %rs181; 2026-02-21T10:22:47.6777245Z shr.s16 %rs183, %rs182, 4; 2026-02-21T10:22:47.6777422Z selp.b16 %rs184, %rs94, %rs56, %p46; 2026-02-21T10:22:47.6777619Z cvt.s16.s8 %rs185, %rs184; 2026-02-21T10:22:47.6777789Z shr.s16 %rs186, %rs185, 4; 2026-02-21T10:22:47.6778044Z selp.b16 %rs187, %rs95, %rs60, %p46; 2026-02-21T10:22:47.6778237Z cvt.s16.s8 %rs188, %rs187; 2026-02-21T10:22:47.6778414Z shr.s16 %rs189, %rs188, 4; 2026-02-21T10:22:47.6778594Z selp.b16 %rs190, %rs96, %rs64, %p46; 2026-02-21T10:22:47.6778790Z cvt.s16.s8 %rs191, %rs190; 2026-02-21T10:22:47.6778964Z shr.s16 %rs192, %rs191, 4; 2026-02-21T10:22:47.6779285Z .loc 1 86 32 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:86:32 2026-02-21T10:22:47.6779648Z cvt.rn.f32.s16 %r1716, %rs99; 2026-02-21T10:22:47.6779838Z cvt.rn.f32.s16 %r1717, %rs102; 2026-02-21T10:22:47.6780029Z cvt.rn.f32.s16 %r1718, %rs105; 2026-02-21T10:22:47.6780210Z cvt.rn.f32.s16 %r1719, %rs108; 2026-02-21T10:22:47.6780397Z cvt.rn.f32.s16 %r1720, %rs111; 2026-02-21T10:22:47.6780655Z cvt.rn.f32.s16 %r1721, %rs114; 2026-02-21T10:22:47.6780845Z cvt.rn.f32.s16 %r1722, %rs117; 2026-02-21T10:22:47.6781025Z cvt.rn.f32.s16 %r1723, %rs120; 2026-02-21T10:22:47.6781206Z cvt.rn.f32.s16 %r1724, %rs123; 2026-02-21T10:22:47.6781398Z cvt.rn.f32.s16 %r1725, %rs126; 2026-02-21T10:22:47.6781576Z cvt.rn.f32.s16 
%r1726, %rs129; 2026-02-21T10:22:47.6781834Z cvt.rn.f32.s16 %r1727, %rs132; 2026-02-21T10:22:47.6782020Z cvt.rn.f32.s16 %r1728, %rs135; 2026-02-21T10:22:47.6782203Z cvt.rn.f32.s16 %r1729, %rs138; 2026-02-21T10:22:47.6782382Z cvt.rn.f32.s16 %r1730, %rs141; 2026-02-21T10:22:47.6782566Z cvt.rn.f32.s16 %r1731, %rs144; 2026-02-21T10:22:47.6782745Z cvt.rn.f32.s16 %r1732, %rs147; 2026-02-21T10:22:47.6782932Z cvt.rn.f32.s16 %r1733, %rs150; 2026-02-21T10:22:47.6783108Z cvt.rn.f32.s16 %r1734, %rs153; 2026-02-21T10:22:47.6783293Z cvt.rn.f32.s16 %r1735, %rs156; 2026-02-21T10:22:47.6783479Z cvt.rn.f32.s16 %r1736, %rs159; 2026-02-21T10:22:47.6783661Z cvt.rn.f32.s16 %r1737, %rs162; 2026-02-21T10:22:47.6783842Z cvt.rn.f32.s16 %r1738, %rs165; 2026-02-21T10:22:47.6784019Z cvt.rn.f32.s16 %r1739, %rs168; 2026-02-21T10:22:47.6784200Z cvt.rn.f32.s16 %r1740, %rs171; 2026-02-21T10:22:47.6784380Z cvt.rn.f32.s16 %r1741, %rs174; 2026-02-21T10:22:47.6784561Z cvt.rn.f32.s16 %r1742, %rs177; 2026-02-21T10:22:47.6784739Z cvt.rn.f32.s16 %r1743, %rs180; 2026-02-21T10:22:47.6784924Z cvt.rn.f32.s16 %r1744, %rs183; 2026-02-21T10:22:47.6785105Z cvt.rn.f32.s16 %r1745, %rs186; 2026-02-21T10:22:47.6785283Z cvt.rn.f32.s16 %r1746, %rs189; 2026-02-21T10:22:47.6785469Z cvt.rn.f32.s16 %r1747, %rs192; 2026-02-21T10:22:47.6785650Z st.shared.b32 [%r38], %r1716; 2026-02-21T10:22:47.6785847Z st.shared.b32 [%r38+8], %r1717; 2026-02-21T10:22:47.6786041Z st.shared.b32 [%r38+16384], %r1732; 2026-02-21T10:22:47.6786245Z st.shared.b32 [%r38+16392], %r1733; 2026-02-21T10:22:47.6786433Z st.shared.b32 [%r39], %r1718; 2026-02-21T10:22:47.6786742Z st.shared.b32 [%r39+8], %r1719; 2026-02-21T10:22:47.6786931Z st.shared.b32 [%r39+16384], %r1734; 2026-02-21T10:22:47.6787129Z st.shared.b32 [%r39+16392], %r1735; 2026-02-21T10:22:47.6787326Z st.shared.b32 [%r40], %r1720; 2026-02-21T10:22:47.6787507Z st.shared.b32 [%r40+8], %r1721; 2026-02-21T10:22:47.6787697Z st.shared.b32 [%r40+16384], %r1736; 2026-02-21T10:22:47.6787892Z 
st.shared.b32 [%r40+16392], %r1737; 2026-02-21T10:22:47.6788087Z st.shared.b32 [%r41], %r1722; 2026-02-21T10:22:47.6788266Z st.shared.b32 [%r41+8], %r1723; 2026-02-21T10:22:47.6788454Z st.shared.b32 [%r41+16384], %r1738; 2026-02-21T10:22:47.6788712Z st.shared.b32 [%r41+16392], %r1739; 2026-02-21T10:22:47.6788907Z st.shared.b32 [%r42], %r1724; 2026-02-21T10:22:47.6789093Z st.shared.b32 [%r42+8], %r1725; 2026-02-21T10:22:47.6789276Z st.shared.b32 [%r42+16384], %r1740; 2026-02-21T10:22:47.6789475Z st.shared.b32 [%r42+16392], %r1741; 2026-02-21T10:22:47.6789665Z st.shared.b32 [%r43], %r1726; 2026-02-21T10:22:47.6789945Z st.shared.b32 [%r43+8], %r1727; 2026-02-21T10:22:47.6790131Z st.shared.b32 [%r43+16384], %r1742; 2026-02-21T10:22:47.6790329Z st.shared.b32 [%r43+16392], %r1743; 2026-02-21T10:22:47.6790521Z st.shared.b32 [%r44], %r1728; 2026-02-21T10:22:47.6790772Z st.shared.b32 [%r44+8], %r1729; 2026-02-21T10:22:47.6790966Z st.shared.b32 [%r44+16384], %r1744; 2026-02-21T10:22:47.6791164Z st.shared.b32 [%r44+16392], %r1745; 2026-02-21T10:22:47.6791376Z st.shared.b32 [%r45], %r1730; 2026-02-21T10:22:47.6791565Z st.shared.b32 [%r45+8], %r1731; 2026-02-21T10:22:47.6791759Z st.shared.b32 [%r45+16384], %r1746; 2026-02-21T10:22:47.6791956Z st.shared.b32 [%r45+16392], %r1747; 2026-02-21T10:22:47.6792148Z $L__tmp1: 2026-02-21T10:22:47.6792510Z .loc 2 291 36 // standard.py:291:36 @[ cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:93:40 ] 2026-02-21T10:22:47.6792938Z // begin inline asm 2026-02-21T10:22:47.6793129Z fence.proxy.async.shared::cta; 2026-02-21T10:22:47.6793324Z // end inline asm 2026-02-21T10:22:47.6793558Z bar.sync 0; 2026-02-21T10:22:47.6793734Z shfl.sync.idx.b32 %r1748, %r3, 0, 31, -1; 2026-02-21T10:22:47.6793970Z wgmma.fence.sync.aligned; 2026-02-21T10:22:47.6794156Z mov.pred %p34, -1; 2026-02-21T10:22:47.6794322Z // begin inline asm 2026-02-21T10:22:47.6795746Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r1916,%r1917,%r1918,%r1919,%r1920,%r1921,%r1922,%r1923,%r1924,%r1925,%r1926,%r1927,%r1928,%r1929,%r1930,%r1931,%r1932,%r1933,%r1934,%r1935,%r1936,%r1937,%r1938,%r1939,%r1940,%r1941,%r1942,%r1943,%r1944,%r1945,%r1946,%r1947,%r1948,%r1949,%r1950,%r1951,%r1952,%r1953,%r1954,%r1955,%r1956,%r1957,%r1958,%r1959,%r1960,%r1961,%r1962,%r1963,%r1964,%r1965,%r1966,%r1967,%r1968,%r1969,%r1970,%r1971,%r1972,%r1973,%r1974,%r1975,%r1976,%r1977,%r1978,%r1979}, {%r600,%r601,%r602,%r603}, %rd91, %p34, 1, 1; 2026-02-21T10:22:47.6797297Z // end inline asm 2026-02-21T10:22:47.6797452Z // begin inline asm 2026-02-21T10:22:47.6798814Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r1916,%r1917,%r1918,%r1919,%r1920,%r1921,%r1922,%r1923,%r1924,%r1925,%r1926,%r1927,%r1928,%r1929,%r1930,%r1931,%r1932,%r1933,%r1934,%r1935,%r1936,%r1937,%r1938,%r1939,%r1940,%r1941,%r1942,%r1943,%r1944,%r1945,%r1946,%r1947,%r1948,%r1949,%r1950,%r1951,%r1952,%r1953,%r1954,%r1955,%r1956,%r1957,%r1958,%r1959,%r1960,%r1961,%r1962,%r1963,%r1964,%r1965,%r1966,%r1967,%r1968,%r1969,%r1970,%r1971,%r1972,%r1973,%r1974,%r1975,%r1976,%r1977,%r1978,%r1979}, {%r732,%r733,%r734,%r735}, %rd92, %p34, 1, 1; 2026-02-21T10:22:47.6800205Z // end inline asm 2026-02-21T10:22:47.6800353Z // begin inline asm 2026-02-21T10:22:47.6801709Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r1916,%r1917,%r1918,%r1919,%r1920,%r1921,%r1922,%r1923,%r1924,%r1925,%r1926,%r1927,%r1928,%r1929,%r1930,%r1931,%r1932,%r1933,%r1934,%r1935,%r1936,%r1937,%r1938,%r1939,%r1940,%r1941,%r1942,%r1943,%r1944,%r1945,%r1946,%r1947,%r1948,%r1949,%r1950,%r1951,%r1952,%r1953,%r1954,%r1955,%r1956,%r1957,%r1958,%r1959,%r1960,%r1961,%r1962,%r1963,%r1964,%r1965,%r1966,%r1967,%r1968,%r1969,%r1970,%r1971,%r1972,%r1973,%r1974,%r1975,%r1976,%r1977,%r1978,%r1979}, {%r864,%r865,%r866,%r867}, %rd93, %p34, 1, 1; 2026-02-21T10:22:47.6803104Z // end inline asm 2026-02-21T10:22:47.6803251Z // begin inline asm 2026-02-21T10:22:47.6804593Z 
wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r1916,%r1917,%r1918,%r1919,%r1920,%r1921,%r1922,%r1923,%r1924,%r1925,%r1926,%r1927,%r1928,%r1929,%r1930,%r1931,%r1932,%r1933,%r1934,%r1935,%r1936,%r1937,%r1938,%r1939,%r1940,%r1941,%r1942,%r1943,%r1944,%r1945,%r1946,%r1947,%r1948,%r1949,%r1950,%r1951,%r1952,%r1953,%r1954,%r1955,%r1956,%r1957,%r1958,%r1959,%r1960,%r1961,%r1962,%r1963,%r1964,%r1965,%r1966,%r1967,%r1968,%r1969,%r1970,%r1971,%r1972,%r1973,%r1974,%r1975,%r1976,%r1977,%r1978,%r1979}, {%r996,%r997,%r998,%r999}, %rd94, %p34, 1, 1; 2026-02-21T10:22:47.6805987Z // end inline asm 2026-02-21T10:22:47.6806133Z // begin inline asm 2026-02-21T10:22:47.6807591Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r1916,%r1917,%r1918,%r1919,%r1920,%r1921,%r1922,%r1923,%r1924,%r1925,%r1926,%r1927,%r1928,%r1929,%r1930,%r1931,%r1932,%r1933,%r1934,%r1935,%r1936,%r1937,%r1938,%r1939,%r1940,%r1941,%r1942,%r1943,%r1944,%r1945,%r1946,%r1947,%r1948,%r1949,%r1950,%r1951,%r1952,%r1953,%r1954,%r1955,%r1956,%r1957,%r1958,%r1959,%r1960,%r1961,%r1962,%r1963,%r1964,%r1965,%r1966,%r1967,%r1968,%r1969,%r1970,%r1971,%r1972,%r1973,%r1974,%r1975,%r1976,%r1977,%r1978,%r1979}, {%r1128,%r1129,%r1130,%r1131}, %rd95, %p34, 1, 1; 2026-02-21T10:22:47.6809164Z // end inline asm 2026-02-21T10:22:47.6809332Z // begin inline asm 2026-02-21T10:22:47.6810766Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r1916,%r1917,%r1918,%r1919,%r1920,%r1921,%r1922,%r1923,%r1924,%r1925,%r1926,%r1927,%r1928,%r1929,%r1930,%r1931,%r1932,%r1933,%r1934,%r1935,%r1936,%r1937,%r1938,%r1939,%r1940,%r1941,%r1942,%r1943,%r1944,%r1945,%r1946,%r1947,%r1948,%r1949,%r1950,%r1951,%r1952,%r1953,%r1954,%r1955,%r1956,%r1957,%r1958,%r1959,%r1960,%r1961,%r1962,%r1963,%r1964,%r1965,%r1966,%r1967,%r1968,%r1969,%r1970,%r1971,%r1972,%r1973,%r1974,%r1975,%r1976,%r1977,%r1978,%r1979}, {%r1260,%r1261,%r1262,%r1263}, %rd96, %p34, 1, 1; 2026-02-21T10:22:47.6812177Z // end inline asm 2026-02-21T10:22:47.6812329Z // begin inline 
asm 2026-02-21T10:22:47.6813771Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r1916,%r1917,%r1918,%r1919,%r1920,%r1921,%r1922,%r1923,%r1924,%r1925,%r1926,%r1927,%r1928,%r1929,%r1930,%r1931,%r1932,%r1933,%r1934,%r1935,%r1936,%r1937,%r1938,%r1939,%r1940,%r1941,%r1942,%r1943,%r1944,%r1945,%r1946,%r1947,%r1948,%r1949,%r1950,%r1951,%r1952,%r1953,%r1954,%r1955,%r1956,%r1957,%r1958,%r1959,%r1960,%r1961,%r1962,%r1963,%r1964,%r1965,%r1966,%r1967,%r1968,%r1969,%r1970,%r1971,%r1972,%r1973,%r1974,%r1975,%r1976,%r1977,%r1978,%r1979}, {%r1392,%r1393,%r1394,%r1395}, %rd97, %p34, 1, 1; 2026-02-21T10:22:47.6815174Z // end inline asm 2026-02-21T10:22:47.6815320Z // begin inline asm 2026-02-21T10:22:47.6816794Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r1916,%r1917,%r1918,%r1919,%r1920,%r1921,%r1922,%r1923,%r1924,%r1925,%r1926,%r1927,%r1928,%r1929,%r1930,%r1931,%r1932,%r1933,%r1934,%r1935,%r1936,%r1937,%r1938,%r1939,%r1940,%r1941,%r1942,%r1943,%r1944,%r1945,%r1946,%r1947,%r1948,%r1949,%r1950,%r1951,%r1952,%r1953,%r1954,%r1955,%r1956,%r1957,%r1958,%r1959,%r1960,%r1961,%r1962,%r1963,%r1964,%r1965,%r1966,%r1967,%r1968,%r1969,%r1970,%r1971,%r1972,%r1973,%r1974,%r1975,%r1976,%r1977,%r1978,%r1979}, {%r1524,%r1525,%r1526,%r1527}, %rd98, %p34, 1, 1; 2026-02-21T10:22:47.6818213Z // end inline asm 2026-02-21T10:22:47.6818388Z wgmma.commit_group.sync.aligned; 2026-02-21T10:22:47.6818593Z mov.b32 %r1594, 0; 2026-02-21T10:22:47.6818747Z mov.b32 %r1592, %r415; 2026-02-21T10:22:47.6818922Z mov.b32 %r1593, %r1594; 2026-02-21T10:22:47.6819090Z // begin inline asm 2026-02-21T10:22:47.6820259Z // wait for regs: 
%r1916,%r1917,%r1918,%r1919,%r1920,%r1921,%r1922,%r1923,%r1924,%r1925,%r1926,%r1927,%r1928,%r1929,%r1930,%r1931,%r1932,%r1933,%r1934,%r1935,%r1936,%r1937,%r1938,%r1939,%r1940,%r1941,%r1942,%r1943,%r1944,%r1945,%r1946,%r1947,%r1948,%r1949,%r1950,%r1951,%r1952,%r1953,%r1954,%r1955,%r1956,%r1957,%r1958,%r1959,%r1960,%r1961,%r1962,%r1963,%r1964,%r1965,%r1966,%r1967,%r1968,%r1969,%r1970,%r1971,%r1972,%r1973,%r1974,%r1975,%r1976,%r1977,%r1978,%r1979,%r1592,%r1593,%r1594 2026-02-21T10:22:47.6821489Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:22:47.6821683Z // end inline asm 2026-02-21T10:22:47.6821838Z $L__tmp2: 2026-02-21T10:22:47.6822134Z .loc 1 29 88 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:29:88 2026-02-21T10:22:47.6822518Z add.s32 %r1749, %r1910, 32; 2026-02-21T10:22:47.6822708Z add.s32 %r1750, %r1913, 1; 2026-02-21T10:22:47.6822889Z setp.gt.s32 %p50, %r1750, 1; 2026-02-21T10:22:47.6823086Z selp.b32 %r1913, 0, %r1750, %p50; 2026-02-21T10:22:47.6823293Z selp.b32 %r1910, 0, %r1749, %p44; 2026-02-21T10:22:47.6823632Z .loc 1 54 22 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:54:22 2026-02-21T10:22:47.6824110Z shl.b32 %r1751, %r1910, 1; 2026-02-21T10:22:47.6824431Z .loc 1 56 25 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:56:25 2026-02-21T10:22:47.6824792Z add.s32 %r1752, %r1751, %r15; 2026-02-21T10:22:47.6825174Z .loc 1 57 53 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:57:53 2026-02-21T10:22:47.6825535Z shl.b32 %r1753, %r1986, 13; 2026-02-21T10:22:47.6825715Z shl.b32 %r1754, %r1987, 13; 2026-02-21T10:22:47.6825898Z shl.b32 %r1755, %r1988, 13; 2026-02-21T10:22:47.6826082Z shl.b32 %r1756, %r1989, 13; 2026-02-21T10:22:47.6826261Z shl.b32 %r1757, %r1990, 13; 2026-02-21T10:22:47.6826434Z shl.b32 %r1758, %r1991, 13; 2026-02-21T10:22:47.6826721Z shl.b32 %r1759, %r1992, 13; 2026-02-21T10:22:47.6826896Z shl.b32 %r1760, %r1993, 13; 2026-02-21T10:22:47.6827210Z .loc 1 57 60 // 
cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:57:60 2026-02-21T10:22:47.6827664Z add.s32 %r1761, %r1753, %r1752; 2026-02-21T10:22:47.6827862Z add.s32 %r1762, %r1754, %r1752; 2026-02-21T10:22:47.6828050Z add.s32 %r1763, %r1755, %r1752; 2026-02-21T10:22:47.6828229Z add.s32 %r1764, %r1756, %r1752; 2026-02-21T10:22:47.6828416Z add.s32 %r1765, %r1757, %r1752; 2026-02-21T10:22:47.6828681Z add.s32 %r1766, %r1758, %r1752; 2026-02-21T10:22:47.6828867Z add.s32 %r1767, %r1759, %r1752; 2026-02-21T10:22:47.6829125Z add.s32 %r1768, %r1760, %r1752; 2026-02-21T10:22:47.6829450Z .loc 1 57 32 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:57:32 2026-02-21T10:22:47.6829815Z mad.wide.s32 %rd99, %r1761, 2, %rd10; 2026-02-21T10:22:47.6830029Z mad.wide.s32 %rd100, %r1762, 2, %rd10; 2026-02-21T10:22:47.6830247Z mad.wide.s32 %rd101, %r1763, 2, %rd10; 2026-02-21T10:22:47.6830452Z mad.wide.s32 %rd102, %r1764, 2, %rd10; 2026-02-21T10:22:47.6830676Z mad.wide.s32 %rd103, %r1765, 2, %rd10; 2026-02-21T10:22:47.6830883Z mad.wide.s32 %rd104, %r1766, 2, %rd10; 2026-02-21T10:22:47.6831097Z mad.wide.s32 %rd105, %r1767, 2, %rd10; 2026-02-21T10:22:47.6831303Z mad.wide.s32 %rd106, %r1768, 2, %rd10; 2026-02-21T10:22:47.6831644Z .loc 1 57 80 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:57:80 2026-02-21T10:22:47.6832005Z shl.b32 %r1769, %r1913, 14; 2026-02-21T10:22:47.6832073Z add.s32 %r1770, %r358, %r1769; 2026-02-21T10:22:47.6832139Z add.s32 %r1662, %r1770, %r27; 2026-02-21T10:22:47.6832213Z selp.b32 %r1663, 8, 0, %p45; 2026-02-21T10:22:47.6832276Z // begin inline asm 2026-02-21T10:22:47.6832425Z cp.async.ca.shared.global [ %r1662 + 0 ], [ %rd99 + 0 ], 0x8, %r1663; 2026-02-21T10:22:47.6832485Z // end inline asm 2026-02-21T10:22:47.6832555Z add.s32 %r1664, %r1662, 2048; 2026-02-21T10:22:47.6832617Z // begin inline asm 2026-02-21T10:22:47.6832759Z cp.async.ca.shared.global [ %r1664 + 0 ], [ %rd100 + 0 ], 0x8, %r1663; 2026-02-21T10:22:47.6832825Z // end inline asm 
2026-02-21T10:22:47.6832890Z add.s32 %r1666, %r1662, 4096; 2026-02-21T10:22:47.6832952Z // begin inline asm 2026-02-21T10:22:47.6833098Z cp.async.ca.shared.global [ %r1666 + 0 ], [ %rd101 + 0 ], 0x8, %r1663; 2026-02-21T10:22:47.6833162Z // end inline asm 2026-02-21T10:22:47.6833225Z add.s32 %r1668, %r1662, 6144; 2026-02-21T10:22:47.6833287Z // begin inline asm 2026-02-21T10:22:47.6833428Z cp.async.ca.shared.global [ %r1668 + 0 ], [ %rd102 + 0 ], 0x8, %r1663; 2026-02-21T10:22:47.6833500Z // end inline asm 2026-02-21T10:22:47.6833565Z add.s32 %r1670, %r1662, 8192; 2026-02-21T10:22:47.6833632Z // begin inline asm 2026-02-21T10:22:47.6833763Z cp.async.ca.shared.global [ %r1670 + 0 ], [ %rd103 + 0 ], 0x8, %r1663; 2026-02-21T10:22:47.6833820Z // end inline asm 2026-02-21T10:22:47.6833885Z add.s32 %r1672, %r1662, 10240; 2026-02-21T10:22:47.6833952Z // begin inline asm 2026-02-21T10:22:47.6834085Z cp.async.ca.shared.global [ %r1672 + 0 ], [ %rd104 + 0 ], 0x8, %r1663; 2026-02-21T10:22:47.6834144Z // end inline asm 2026-02-21T10:22:47.6834298Z add.s32 %r1674, %r1662, 12288; 2026-02-21T10:22:47.6834362Z // begin inline asm 2026-02-21T10:22:47.6834516Z cp.async.ca.shared.global [ %r1674 + 0 ], [ %rd105 + 0 ], 0x8, %r1663; 2026-02-21T10:22:47.6834577Z // end inline asm 2026-02-21T10:22:47.6834715Z add.s32 %r1676, %r1662, 14336; 2026-02-21T10:22:47.6834779Z // begin inline asm 2026-02-21T10:22:47.6834918Z cp.async.ca.shared.global [ %r1676 + 0 ], [ %rd106 + 0 ], 0x8, %r1663; 2026-02-21T10:22:47.6834984Z // end inline asm 2026-02-21T10:22:47.6835053Z cp.async.commit_group; 2026-02-21T10:22:47.6835269Z .loc 1 29 88 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:29:88 2026-02-21T10:22:47.6835344Z shl.b32 %r1771, %r1913, 3; 2026-02-21T10:22:47.6835410Z add.s32 %r1678, %r306, %r1771; 2026-02-21T10:22:47.6835480Z and.pred %p42, %p55, %p45; 2026-02-21T10:22:47.6835540Z // begin inline asm 2026-02-21T10:22:47.6835681Z @%p42 mbarrier.arrive.expect_tx.shared.b64 _, [%r1678], 
4096; 2026-02-21T10:22:47.6835742Z // end inline asm 2026-02-21T10:22:47.6836000Z .loc 1 63 33 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:63:33 2026-02-21T10:22:47.6836075Z shl.b32 %r1772, %r1913, 12; 2026-02-21T10:22:47.6836141Z add.s32 %r1679, %r325, %r1772; 2026-02-21T10:22:47.6836202Z bar.sync 0; 2026-02-21T10:22:47.6836272Z elect.sync %r1773|%p51, -1; 2026-02-21T10:22:47.6836351Z and.pred %p52, %p45, %p51; 2026-02-21T10:22:47.6836605Z and.pred %p43, %p1, %p52; 2026-02-21T10:22:47.6836677Z // begin inline asm 2026-02-21T10:22:47.6837015Z @%p43 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r1679], [%rd38, {%r1984, %r1910}], [%r1678]; 2026-02-21T10:22:47.6837076Z // end inline asm 2026-02-21T10:22:47.6837294Z .loc 1 29 88 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:29:88 2026-02-21T10:22:47.6837368Z setp.ne.b32 %p53, %r1909, 127; 2026-02-21T10:22:47.6837430Z @%p53 bra $L__BB0_6; 2026-02-21T10:22:47.6837545Z // %bb.5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:22:47.6837752Z .loc 1 41 32 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:41:32 2026-02-21T10:22:47.6837822Z add.s32 %r1847, %r1908, %r4; 2026-02-21T10:22:47.6837885Z add.s32 %r1848, %r1908, %r5; 2026-02-21T10:22:47.6837949Z add.s32 %r1849, %r1908, %r6; 2026-02-21T10:22:47.6838018Z add.s32 %r1850, %r1908, %r7; 2026-02-21T10:22:47.6838080Z add.s32 %r1851, %r1908, %r8; 2026-02-21T10:22:47.6838140Z add.s32 %r1852, %r1908, %r9; 2026-02-21T10:22:47.6838213Z add.s32 %r1853, %r1908, %r10; 2026-02-21T10:22:47.6838274Z add.s32 %r1854, %r1908, %r11; 2026-02-21T10:22:47.6838473Z .loc 1 43 32 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:43:32 2026-02-21T10:22:47.6838536Z add.s32 %r1855, %r1907, %r12; 2026-02-21T10:22:47.6838740Z .loc 1 96 28 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:96:28 2026-02-21T10:22:47.6838823Z cvt.rn.bf16x2.f32 %r1856, %r1917, %r1916; 2026-02-21T10:22:47.6838900Z cvt.rn.bf16x2.f32 
%r1857, %r1919, %r1918; 2026-02-21T10:22:47.6838980Z cvt.rn.bf16x2.f32 %r1858, %r1921, %r1920; 2026-02-21T10:22:47.6839054Z cvt.rn.bf16x2.f32 %r1859, %r1923, %r1922; 2026-02-21T10:22:47.6839128Z cvt.rn.bf16x2.f32 %r1860, %r1925, %r1924; 2026-02-21T10:22:47.6839205Z cvt.rn.bf16x2.f32 %r1861, %r1927, %r1926; 2026-02-21T10:22:47.6839280Z cvt.rn.bf16x2.f32 %r1862, %r1929, %r1928; 2026-02-21T10:22:47.6839352Z cvt.rn.bf16x2.f32 %r1863, %r1931, %r1930; 2026-02-21T10:22:47.6839426Z cvt.rn.bf16x2.f32 %r1864, %r1933, %r1932; 2026-02-21T10:22:47.6839506Z cvt.rn.bf16x2.f32 %r1865, %r1935, %r1934; 2026-02-21T10:22:47.6839579Z cvt.rn.bf16x2.f32 %r1866, %r1937, %r1936; 2026-02-21T10:22:47.6839651Z cvt.rn.bf16x2.f32 %r1867, %r1939, %r1938; 2026-02-21T10:22:47.6839730Z cvt.rn.bf16x2.f32 %r1868, %r1941, %r1940; 2026-02-21T10:22:47.6839804Z cvt.rn.bf16x2.f32 %r1869, %r1943, %r1942; 2026-02-21T10:22:47.6839973Z cvt.rn.bf16x2.f32 %r1870, %r1945, %r1944; 2026-02-21T10:22:47.6840053Z cvt.rn.bf16x2.f32 %r1871, %r1947, %r1946; 2026-02-21T10:22:47.6840125Z cvt.rn.bf16x2.f32 %r1872, %r1949, %r1948; 2026-02-21T10:22:47.6840261Z cvt.rn.bf16x2.f32 %r1873, %r1951, %r1950; 2026-02-21T10:22:47.6840333Z cvt.rn.bf16x2.f32 %r1874, %r1953, %r1952; 2026-02-21T10:22:47.6840411Z cvt.rn.bf16x2.f32 %r1875, %r1955, %r1954; 2026-02-21T10:22:47.6840484Z cvt.rn.bf16x2.f32 %r1876, %r1957, %r1956; 2026-02-21T10:22:47.6840559Z cvt.rn.bf16x2.f32 %r1877, %r1959, %r1958; 2026-02-21T10:22:47.6840636Z cvt.rn.bf16x2.f32 %r1878, %r1961, %r1960; 2026-02-21T10:22:47.6840710Z cvt.rn.bf16x2.f32 %r1879, %r1963, %r1962; 2026-02-21T10:22:47.6840783Z cvt.rn.bf16x2.f32 %r1880, %r1965, %r1964; 2026-02-21T10:22:47.6840862Z cvt.rn.bf16x2.f32 %r1881, %r1967, %r1966; 2026-02-21T10:22:47.6840934Z cvt.rn.bf16x2.f32 %r1882, %r1969, %r1968; 2026-02-21T10:22:47.6841007Z cvt.rn.bf16x2.f32 %r1883, %r1971, %r1970; 2026-02-21T10:22:47.6841142Z cvt.rn.bf16x2.f32 %r1884, %r1973, %r1972; 2026-02-21T10:22:47.6841224Z cvt.rn.bf16x2.f32 %r1885, 
%r1975, %r1974; 2026-02-21T10:22:47.6841298Z cvt.rn.bf16x2.f32 %r1886, %r1977, %r1976; 2026-02-21T10:22:47.6841372Z cvt.rn.bf16x2.f32 %r1887, %r1979, %r1978; 2026-02-21T10:22:47.6841590Z .loc 1 97 50 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:97:50 2026-02-21T10:22:47.6841723Z mad.lo.s32 %r1888, %r1847, 1280, %r1855; 2026-02-21T10:22:47.6841796Z mad.lo.s32 %r1889, %r1848, 1280, %r1855; 2026-02-21T10:22:47.6841884Z mad.lo.s32 %r1890, %r1849, 1280, %r1855; 2026-02-21T10:22:47.6841959Z mad.lo.s32 %r1891, %r1850, 1280, %r1855; 2026-02-21T10:22:47.6842030Z mad.lo.s32 %r1892, %r1851, 1280, %r1855; 2026-02-21T10:22:47.6842098Z mad.lo.s32 %r1893, %r1852, 1280, %r1855; 2026-02-21T10:22:47.6842176Z mad.lo.s32 %r1894, %r1853, 1280, %r1855; 2026-02-21T10:22:47.6842246Z mad.lo.s32 %r1895, %r1854, 1280, %r1855; 2026-02-21T10:22:47.6842457Z .loc 1 97 22 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:97:22 2026-02-21T10:22:47.6842534Z mad.wide.s32 %rd108, %r1888, 2, %rd11; 2026-02-21T10:22:47.6842605Z mad.wide.s32 %rd109, %r1889, 2, %rd11; 2026-02-21T10:22:47.6842679Z mad.wide.s32 %rd110, %r1890, 2, %rd11; 2026-02-21T10:22:47.6842753Z mad.wide.s32 %rd111, %r1891, 2, %rd11; 2026-02-21T10:22:47.6842823Z mad.wide.s32 %rd112, %r1892, 2, %rd11; 2026-02-21T10:22:47.6842892Z mad.wide.s32 %rd113, %r1893, 2, %rd11; 2026-02-21T10:22:47.6842960Z mad.wide.s32 %rd114, %r1894, 2, %rd11; 2026-02-21T10:22:47.6843039Z mad.wide.s32 %rd115, %r1895, 2, %rd11; 2026-02-21T10:22:47.6843249Z .loc 1 97 81 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:97:81 2026-02-21T10:22:47.6843362Z st.shared.v4.b32 [%r46], {%r1856, %r1858, %r1860, %r1862}; 2026-02-21T10:22:47.6843486Z st.shared.v4.b32 [%r46+512], {%r1857, %r1859, %r1861, %r1863}; 2026-02-21T10:22:47.6843595Z st.shared.v4.b32 [%r47], {%r1864, %r1866, %r1868, %r1870}; 2026-02-21T10:22:47.6843707Z st.shared.v4.b32 [%r47+512], {%r1865, %r1867, %r1869, %r1871}; 2026-02-21T10:22:47.6843819Z st.shared.v4.b32 [%r48], 
{%r1872, %r1874, %r1876, %r1878}; 2026-02-21T10:22:47.6843938Z st.shared.v4.b32 [%r48+512], {%r1873, %r1875, %r1877, %r1879}; 2026-02-21T10:22:47.6844042Z st.shared.v4.b32 [%r49], {%r1880, %r1882, %r1884, %r1886}; 2026-02-21T10:22:47.6844154Z st.shared.v4.b32 [%r49+512], {%r1881, %r1883, %r1885, %r1887}; 2026-02-21T10:22:47.6844222Z bar.sync 0; 2026-02-21T10:22:47.6844287Z // begin inline asm 2026-02-21T10:22:47.6844485Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1814, %r1815, %r1816, %r1817}, [%r1778]; 2026-02-21T10:22:47.6844553Z // end inline asm 2026-02-21T10:22:47.6844616Z // begin inline asm 2026-02-21T10:22:47.6844800Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1818, %r1819, %r1820, %r1821}, [%r1783]; 2026-02-21T10:22:47.6844865Z // end inline asm 2026-02-21T10:22:47.6844997Z // begin inline asm 2026-02-21T10:22:47.6845183Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1822, %r1823, %r1824, %r1825}, [%r1788]; 2026-02-21T10:22:47.6845244Z // end inline asm 2026-02-21T10:22:47.6845311Z // begin inline asm 2026-02-21T10:22:47.6845539Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1826, %r1827, %r1828, %r1829}, [%r1793]; 2026-02-21T10:22:47.6845598Z // end inline asm 2026-02-21T10:22:47.6845666Z // begin inline asm 2026-02-21T10:22:47.6845850Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1830, %r1831, %r1832, %r1833}, [%r1798]; 2026-02-21T10:22:47.6845908Z // end inline asm 2026-02-21T10:22:47.6845968Z // begin inline asm 2026-02-21T10:22:47.6846155Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1834, %r1835, %r1836, %r1837}, [%r1803]; 2026-02-21T10:22:47.6846213Z // end inline asm 2026-02-21T10:22:47.6846273Z // begin inline asm 2026-02-21T10:22:47.6846582Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1838, %r1839, %r1840, %r1841}, [%r1808]; 2026-02-21T10:22:47.6846648Z // end inline asm 2026-02-21T10:22:47.6846781Z // begin inline asm 2026-02-21T10:22:47.6846986Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1842, %r1843, %r1844, %r1845}, [%r1813]; 
2026-02-21T10:22:47.6847045Z // end inline asm 2026-02-21T10:22:47.6847106Z // begin inline asm 2026-02-21T10:22:47.6847239Z st.global.v4.b32 [ %rd108 + 0 ], { %r1814, %r1815, %r1816, %r1817 }; 2026-02-21T10:22:47.6847305Z // end inline asm 2026-02-21T10:22:47.6847442Z // begin inline asm 2026-02-21T10:22:47.6847567Z st.global.v4.b32 [ %rd109 + 0 ], { %r1818, %r1819, %r1820, %r1821 }; 2026-02-21T10:22:47.6847629Z // end inline asm 2026-02-21T10:22:47.6847688Z // begin inline asm 2026-02-21T10:22:47.6847808Z st.global.v4.b32 [ %rd110 + 0 ], { %r1822, %r1823, %r1824, %r1825 }; 2026-02-21T10:22:47.6847866Z // end inline asm 2026-02-21T10:22:47.6847930Z // begin inline asm 2026-02-21T10:22:47.6848044Z st.global.v4.b32 [ %rd111 + 0 ], { %r1826, %r1827, %r1828, %r1829 }; 2026-02-21T10:22:47.6848100Z // end inline asm 2026-02-21T10:22:47.6848166Z // begin inline asm 2026-02-21T10:22:47.6848283Z st.global.v4.b32 [ %rd112 + 0 ], { %r1830, %r1831, %r1832, %r1833 }; 2026-02-21T10:22:47.6848340Z // end inline asm 2026-02-21T10:22:47.6848407Z // begin inline asm 2026-02-21T10:22:47.6848522Z st.global.v4.b32 [ %rd113 + 0 ], { %r1834, %r1835, %r1836, %r1837 }; 2026-02-21T10:22:47.6848583Z // end inline asm 2026-02-21T10:22:47.6848643Z // begin inline asm 2026-02-21T10:22:47.6848767Z st.global.v4.b32 [ %rd114 + 0 ], { %r1838, %r1839, %r1840, %r1841 }; 2026-02-21T10:22:47.6848824Z // end inline asm 2026-02-21T10:22:47.6848884Z // begin inline asm 2026-02-21T10:22:47.6849005Z st.global.v4.b32 [ %rd115 + 0 ], { %r1842, %r1843, %r1844, %r1845 }; 2026-02-21T10:22:47.6849074Z // end inline asm 2026-02-21T10:22:47.6849142Z mov.b32 %r1916, 0f00000000; 2026-02-21T10:22:47.6849207Z mov.b32 %r1917, %r1916; 2026-02-21T10:22:47.6849275Z mov.b32 %r1918, %r1916; 2026-02-21T10:22:47.6849335Z mov.b32 %r1919, %r1916; 2026-02-21T10:22:47.6849396Z mov.b32 %r1920, %r1916; 2026-02-21T10:22:47.6849464Z mov.b32 %r1921, %r1916; 2026-02-21T10:22:47.6849525Z mov.b32 %r1922, %r1916; 
2026-02-21T10:22:47.6849586Z mov.b32 %r1923, %r1916; 2026-02-21T10:22:47.6849647Z mov.b32 %r1924, %r1916; 2026-02-21T10:22:47.6849718Z mov.b32 %r1925, %r1916; 2026-02-21T10:22:47.6849778Z mov.b32 %r1926, %r1916; 2026-02-21T10:22:47.6849837Z mov.b32 %r1927, %r1916; 2026-02-21T10:22:47.6849905Z mov.b32 %r1928, %r1916; 2026-02-21T10:22:47.6849966Z mov.b32 %r1929, %r1916; 2026-02-21T10:22:47.6850028Z mov.b32 %r1930, %r1916; 2026-02-21T10:22:47.6850093Z mov.b32 %r1931, %r1916; 2026-02-21T10:22:47.6850154Z mov.b32 %r1932, %r1916; 2026-02-21T10:22:47.6850214Z mov.b32 %r1933, %r1916; 2026-02-21T10:22:47.6850274Z mov.b32 %r1934, %r1916; 2026-02-21T10:22:47.6850352Z mov.b32 %r1935, %r1916; 2026-02-21T10:22:47.6850414Z mov.b32 %r1936, %r1916; 2026-02-21T10:22:47.6850476Z mov.b32 %r1937, %r1916; 2026-02-21T10:22:47.6850541Z mov.b32 %r1938, %r1916; 2026-02-21T10:22:47.6852573Z mov.b32 %r1939, %r1916; 2026-02-21T10:22:47.6852634Z mov.b32 %r1940, %r1916; 2026-02-21T10:22:47.6852694Z mov.b32 %r1941, %r1916; 2026-02-21T10:22:47.6852761Z mov.b32 %r1942, %r1916; 2026-02-21T10:22:47.6852821Z mov.b32 %r1943, %r1916; 2026-02-21T10:22:47.6852942Z mov.b32 %r1944, %r1916; 2026-02-21T10:22:47.6853010Z mov.b32 %r1945, %r1916; 2026-02-21T10:22:47.6853073Z mov.b32 %r1946, %r1916; 2026-02-21T10:22:47.6853145Z mov.b32 %r1947, %r1916; 2026-02-21T10:22:47.6853208Z mov.b32 %r1948, %r1916; 2026-02-21T10:22:47.6853274Z mov.b32 %r1949, %r1916; 2026-02-21T10:22:47.6853338Z mov.b32 %r1950, %r1916; 2026-02-21T10:22:47.6853397Z mov.b32 %r1951, %r1916; 2026-02-21T10:22:47.6853462Z mov.b32 %r1952, %r1916; 2026-02-21T10:22:47.6853522Z mov.b32 %r1953, %r1916; 2026-02-21T10:22:47.6853582Z mov.b32 %r1954, %r1916; 2026-02-21T10:22:47.6853641Z mov.b32 %r1955, %r1916; 2026-02-21T10:22:47.6853707Z mov.b32 %r1956, %r1916; 2026-02-21T10:22:47.6853769Z mov.b32 %r1957, %r1916; 2026-02-21T10:22:47.6853885Z mov.b32 %r1958, %r1916; 2026-02-21T10:22:47.6853955Z mov.b32 %r1959, %r1916; 2026-02-21T10:22:47.6854015Z mov.b32 
%r1960, %r1916; 2026-02-21T10:22:47.6858384Z mov.b32 %r1961, %r1916; 2026-02-21T10:22:47.6858502Z mov.b32 %r1962, %r1916; 2026-02-21T10:22:47.6858569Z mov.b32 %r1963, %r1916; 2026-02-21T10:22:47.6858635Z mov.b32 %r1964, %r1916; 2026-02-21T10:22:47.6858696Z mov.b32 %r1965, %r1916; 2026-02-21T10:22:47.6858892Z mov.b32 %r1966, %r1916; 2026-02-21T10:22:47.6858960Z mov.b32 %r1967, %r1916; 2026-02-21T10:22:47.6859027Z mov.b32 %r1968, %r1916; 2026-02-21T10:22:47.6859088Z mov.b32 %r1969, %r1916; 2026-02-21T10:22:47.6859156Z mov.b32 %r1970, %r1916; 2026-02-21T10:22:47.6859222Z mov.b32 %r1971, %r1916; 2026-02-21T10:22:47.6859282Z mov.b32 %r1972, %r1916; 2026-02-21T10:22:47.6859343Z mov.b32 %r1973, %r1916; 2026-02-21T10:22:47.6859403Z mov.b32 %r1974, %r1916; 2026-02-21T10:22:47.6859473Z mov.b32 %r1975, %r1916; 2026-02-21T10:22:47.6859535Z mov.b32 %r1976, %r1916; 2026-02-21T10:22:47.6859596Z mov.b32 %r1977, %r1916; 2026-02-21T10:22:47.6859662Z mov.b32 %r1978, %r1916; 2026-02-21T10:22:47.6859725Z mov.b32 %r1979, %r1916; 2026-02-21T10:22:47.6859786Z bra.uni $L__BB0_6; 2026-02-21T10:22:47.6859891Z $L__BB0_7: // %._crit_edge 2026-02-21T10:22:47.6860142Z .loc 1 29 88 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:29:88 2026-02-21T10:22:47.6860219Z cp.async.wait_group 0; 2026-02-21T10:22:47.6860278Z bar.sync 0; 2026-02-21T10:22:47.6860344Z // begin inline asm 2026-02-21T10:22:47.6860454Z @%p55 mbarrier.inval.shared::cta.b64 [%r306]; 2026-02-21T10:22:47.6860514Z // end inline asm 2026-02-21T10:22:47.6860571Z bar.sync 0; 2026-02-21T10:22:47.6860646Z // begin inline asm 2026-02-21T10:22:47.6860740Z @%p55 mbarrier.inval.shared::cta.b64 [%r307]; 2026-02-21T10:22:47.6860798Z // end inline asm 2026-02-21T10:22:47.6861020Z .loc 1 29 4 // cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py:29:4 2026-02-21T10:22:47.6861078Z ret; 2026-02-21T10:22:47.6861135Z $L__tmp3: 2026-02-21T10:22:47.6861200Z $L__func_end0: 2026-02-21T10:22:47.6861290Z // -- End function 
2026-02-21T10:22:47.6861347Z } 2026-02-21T10:22:47.6861602Z .file 1 "/tmp/torchinductor_root/ts/cts7zrloc7j33phvpxejoobcquyx3pffudxch6q6cmvtbmxbg4ec.py" 2026-02-21T10:22:47.6861834Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T10:22:47.6861913Z .section .debug_abbrev 2026-02-21T10:22:47.6861967Z { 2026-02-21T10:22:47.6862074Z .b8 1 // Abbreviation Code 2026-02-21T10:22:47.6862172Z .b8 17 // DW_TAG_compile_unit 2026-02-21T10:22:47.6862262Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:22:47.6862355Z .b8 37 // DW_AT_producer 2026-02-21T10:22:47.6862438Z .b8 8 // DW_FORM_string 2026-02-21T10:22:47.6862624Z .b8 19 // DW_AT_language 2026-02-21T10:22:47.6862709Z .b8 5 // DW_FORM_data2 2026-02-21T10:22:47.6862795Z .b8 3 // DW_AT_name 2026-02-21T10:22:47.6862943Z .b8 8 // DW_FORM_string 2026-02-21T10:22:47.6863030Z .b8 16 // DW_AT_stmt_list 2026-02-21T10:22:47.6863122Z .b8 6 // DW_FORM_data4 2026-02-21T10:22:47.6863220Z .b8 27 // DW_AT_comp_dir 2026-02-21T10:22:47.6863304Z .b8 8 // DW_FORM_string 2026-02-21T10:22:47.6863386Z .b8 0 // EOM(1) 2026-02-21T10:22:47.6863457Z .b8 0 // EOM(2) 2026-02-21T10:22:47.6863548Z .b8 2 // Abbreviation Code 2026-02-21T10:22:47.6863701Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:22:47.6863793Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:22:47.6863869Z .b8 3 // DW_AT_name 2026-02-21T10:22:47.6863951Z .b8 8 // DW_FORM_string 2026-02-21T10:22:47.6864041Z .b8 32 // DW_AT_inline 2026-02-21T10:22:47.6864166Z .b8 11 // DW_FORM_data1 2026-02-21T10:22:47.6864250Z .b8 0 // EOM(1) 2026-02-21T10:22:47.6864328Z .b8 0 // EOM(2) 2026-02-21T10:22:47.6864417Z .b8 3 // Abbreviation Code 2026-02-21T10:22:47.6864502Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:22:47.6864589Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:22:47.6864670Z .b8 17 // DW_AT_low_pc 2026-02-21T10:22:47.6864751Z .b8 1 // DW_FORM_addr 2026-02-21T10:22:47.6864837Z .b8 18 // DW_AT_high_pc 2026-02-21T10:22:47.6864921Z .b8 1 // DW_FORM_addr 
2026-02-21T10:22:47.6865016Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:22:47.6865094Z .b8 19 // DW_FORM_ref4 2026-02-21T10:22:47.6865175Z .b8 0 // EOM(1) 2026-02-21T10:22:47.6865244Z .b8 0 // EOM(2) 2026-02-21T10:22:47.6865333Z .b8 4 // Abbreviation Code 2026-02-21T10:22:47.6865440Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T10:22:47.6865522Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:22:47.6865614Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:22:47.6865691Z .b8 19 // DW_FORM_ref4 2026-02-21T10:22:47.6865776Z .b8 17 // DW_AT_low_pc 2026-02-21T10:22:47.6865854Z .b8 1 // DW_FORM_addr 2026-02-21T10:22:47.6865936Z .b8 18 // DW_AT_high_pc 2026-02-21T10:22:47.6866019Z .b8 1 // DW_FORM_addr 2026-02-21T10:22:47.6866102Z .b8 88 // DW_AT_call_file 2026-02-21T10:22:47.6866184Z .b8 11 // DW_FORM_data1 2026-02-21T10:22:47.6866271Z .b8 89 // DW_AT_call_line 2026-02-21T10:22:47.6866348Z .b8 11 // DW_FORM_data1 2026-02-21T10:22:47.6866443Z .b8 87 // DW_AT_call_column 2026-02-21T10:22:47.6866658Z .b8 11 // DW_FORM_data1 2026-02-21T10:22:47.6866744Z .b8 0 // EOM(1) 2026-02-21T10:22:47.6866814Z .b8 0 // EOM(2) 2026-02-21T10:22:47.6866978Z .b8 0 // EOM(3) 2026-02-21T10:22:47.6867035Z } 2026-02-21T10:22:47.6867102Z .section .debug_info 2026-02-21T10:22:47.6867156Z { 2026-02-21T10:22:47.6867252Z .b32 178 // Length of Unit 2026-02-21T10:22:47.6867412Z .b8 2 // DWARF version number 2026-02-21T10:22:47.6867466Z .b8 0 2026-02-21T10:22:47.6867601Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T10:22:47.6867702Z .b8 8 // Address Size (in bytes) 2026-02-21T10:22:47.6867823Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T10:22:47.6867917Z .b8 116 // DW_AT_producer 2026-02-21T10:22:47.6867977Z .b8 114 2026-02-21T10:22:47.6868029Z .b8 105 2026-02-21T10:22:47.6868081Z .b8 116 2026-02-21T10:22:47.6868132Z .b8 111 2026-02-21T10:22:47.6868188Z .b8 110 2026-02-21T10:22:47.6868242Z .b8 0 2026-02-21T10:22:47.6868385Z .b8 2 // DW_AT_language 2026-02-21T10:22:47.6868446Z .b8 0 2026-02-21T10:22:47.6868599Z .b8 99 // DW_AT_name 2026-02-21T10:22:47.6868655Z .b8 116 2026-02-21T10:22:47.6868714Z .b8 115 2026-02-21T10:22:47.6868773Z .b8 55 2026-02-21T10:22:47.6868827Z .b8 122 2026-02-21T10:22:47.6868884Z .b8 114 2026-02-21T10:22:47.6868942Z .b8 108 2026-02-21T10:22:47.6868995Z .b8 111 2026-02-21T10:22:47.6869126Z .b8 99 2026-02-21T10:22:47.6869182Z .b8 55 2026-02-21T10:22:47.6869243Z .b8 106 2026-02-21T10:22:47.6869295Z .b8 51 2026-02-21T10:22:47.6869346Z .b8 51 2026-02-21T10:22:47.6869403Z .b8 112 2026-02-21T10:22:47.6869456Z .b8 104 2026-02-21T10:22:47.6869508Z .b8 118 2026-02-21T10:22:47.6869561Z .b8 112 2026-02-21T10:22:47.6869620Z .b8 120 2026-02-21T10:22:47.6869673Z .b8 101 2026-02-21T10:22:47.6869725Z .b8 106 2026-02-21T10:22:47.6869783Z .b8 111 2026-02-21T10:22:47.6869835Z .b8 111 2026-02-21T10:22:47.6869889Z .b8 98 2026-02-21T10:22:47.6869943Z .b8 99 2026-02-21T10:22:47.6870002Z .b8 113 2026-02-21T10:22:47.6870055Z .b8 117 2026-02-21T10:22:47.6870107Z .b8 121 2026-02-21T10:22:47.6870159Z .b8 120 2026-02-21T10:22:47.6870217Z .b8 51 2026-02-21T10:22:47.6870269Z .b8 112 2026-02-21T10:22:47.6870326Z .b8 102 2026-02-21T10:22:47.6870383Z .b8 102 2026-02-21T10:22:47.6870434Z .b8 117 2026-02-21T10:22:47.6870488Z .b8 100 2026-02-21T10:22:47.6870540Z .b8 120 2026-02-21T10:22:47.6870599Z .b8 99 2026-02-21T10:22:47.6870653Z .b8 104 2026-02-21T10:22:47.6870705Z .b8 54 2026-02-21T10:22:47.6870762Z .b8 113 2026-02-21T10:22:47.6870815Z .b8 54 
2026-02-21T10:22:47.6870867Z .b8 99 2026-02-21T10:22:47.6870919Z .b8 109 2026-02-21T10:22:47.6870975Z .b8 118 2026-02-21T10:22:47.6871028Z .b8 116 2026-02-21T10:22:47.6871079Z .b8 98 2026-02-21T10:22:47.6871131Z .b8 109 2026-02-21T10:22:47.6871191Z .b8 120 2026-02-21T10:22:47.6871242Z .b8 98 2026-02-21T10:22:47.6871294Z .b8 103 2026-02-21T10:22:47.6871356Z .b8 52 2026-02-21T10:22:47.6871418Z .b8 101 2026-02-21T10:22:47.6871473Z .b8 99 2026-02-21T10:22:47.6871526Z .b8 46 2026-02-21T10:22:47.6871585Z .b8 112 2026-02-21T10:22:47.6871637Z .b8 121 2026-02-21T10:22:47.6871689Z .b8 0 2026-02-21T10:22:47.6871801Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T10:22:47.6871887Z .b8 47 // DW_AT_comp_dir 2026-02-21T10:22:47.6871942Z .b8 116 2026-02-21T10:22:47.6871994Z .b8 109 2026-02-21T10:22:47.6872052Z .b8 112 2026-02-21T10:22:47.6872105Z .b8 47 2026-02-21T10:22:47.6872157Z .b8 116 2026-02-21T10:22:47.6872214Z .b8 111 2026-02-21T10:22:47.6872267Z .b8 114 2026-02-21T10:22:47.6872318Z .b8 99 2026-02-21T10:22:47.6872370Z .b8 104 2026-02-21T10:22:47.6872426Z .b8 105 2026-02-21T10:22:47.6872479Z .b8 110 2026-02-21T10:22:47.6872531Z .b8 100 2026-02-21T10:22:47.6872588Z .b8 117 2026-02-21T10:22:47.6872638Z .b8 99 2026-02-21T10:22:47.6872692Z .b8 116 2026-02-21T10:22:47.6872743Z .b8 111 2026-02-21T10:22:47.6872801Z .b8 114 2026-02-21T10:22:47.6872852Z .b8 95 2026-02-21T10:22:47.6872967Z .b8 114 2026-02-21T10:22:47.6873021Z .b8 111 2026-02-21T10:22:47.6873087Z .b8 111 2026-02-21T10:22:47.6873142Z .b8 116 2026-02-21T10:22:47.6873195Z .b8 47 2026-02-21T10:22:47.6873253Z .b8 116 2026-02-21T10:22:47.6873304Z .b8 115 2026-02-21T10:22:47.6873407Z .b8 0 2026-02-21T10:22:47.6873525Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T10:22:47.6873614Z .b8 95 // DW_AT_name 2026-02-21T10:22:47.6873671Z .b8 104 2026-02-21T10:22:47.6873724Z .b8 101 2026-02-21T10:22:47.6873780Z .b8 108 2026-02-21T10:22:47.6873832Z .b8 105 2026-02-21T10:22:47.6873885Z .b8 111 2026-02-21T10:22:47.6873937Z 
.b8 110 2026-02-21T10:22:47.6873994Z .b8 95 2026-02-21T10:22:47.6874048Z .b8 109 2026-02-21T10:22:47.6874099Z .b8 97 2026-02-21T10:22:47.6874158Z .b8 116 2026-02-21T10:22:47.6874210Z .b8 109 2026-02-21T10:22:47.6874263Z .b8 117 2026-02-21T10:22:47.6874316Z .b8 108 2026-02-21T10:22:47.6874373Z .b8 95 2026-02-21T10:22:47.6874424Z .b8 98 2026-02-21T10:22:47.6874480Z .b8 102 2026-02-21T10:22:47.6874605Z .b8 49 2026-02-21T10:22:47.6874664Z .b8 54 2026-02-21T10:22:47.6874717Z .b8 95 2026-02-21T10:22:47.6874769Z .b8 105 2026-02-21T10:22:47.6874827Z .b8 110 2026-02-21T10:22:47.6874879Z .b8 116 2026-02-21T10:22:47.6874932Z .b8 52 2026-02-21T10:22:47.6874985Z .b8 0 2026-02-21T10:22:47.6875072Z .b8 1 // DW_AT_inline 2026-02-21T10:22:47.6875231Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T10:22:47.6875330Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T10:22:47.6875435Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T10:22:47.6875537Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:22:47.6875669Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T10:22:47.6875778Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:22:47.6875882Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T10:22:47.6875974Z .b64 $L__tmp2 // DW_AT_high_pc 2026-02-21T10:22:47.6876064Z .b8 1 // DW_AT_call_file 2026-02-21T10:22:47.6876149Z .b8 93 // DW_AT_call_line 2026-02-21T10:22:47.6876240Z .b8 40 // DW_AT_call_column 2026-02-21T10:22:47.6876334Z .b8 0 // End Of Children Mark 2026-02-21T10:22:47.6876428Z .b8 0 // End Of Children Mark 2026-02-21T10:22:47.6876591Z } 2026-02-21T10:22:47.6876668Z .section .debug_macinfo { } 2026-02-21T10:22:47.6876674Z 2026-02-21T10:22:47.6876761Z ================================================================ 2026-02-21T10:22:47.6876882Z please share the reproducer above with Triton project. 
2026-02-21T10:22:47.6876887Z 2026-02-21T10:22:47.6877338Z Generation 18: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 61/61 2.2 configs/s 2026-02-21T10:22:50.4192675Z Generation 18: verifying top configs 100% ━━━━━━━━━━━━━━━━━━ 34/34 8.7 configs/s 2026-02-21T10:22:51.9054022Z [3364s] Generation 18 complete: 2026-02-21T10:22:51.9054303Z error=16 2026-02-21T10:22:51.9054480Z ok=47 2026-02-21T10:22:51.9054675Z min=5.9418 2026-02-21T10:22:51.9054841Z mid=13.0073 2026-02-21T10:22:51.9055009Z max=417.0431 2026-02-21T10:22:51.9055199Z best={'block_sizes': [16, 128, 128], 2026-02-21T10:22:51.9055577Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T10:22:51.9055952Z 'l2_groupings': [8], 2026-02-21T10:22:51.9056191Z 'load_eviction_policies': ['last', ''], 2026-02-21T10:22:51.9056819Z 'loop_orders': [[1, 0]], 2026-02-21T10:22:51.9057045Z 'maxnreg': 256, 2026-02-21T10:22:51.9057249Z 'num_sm_multiplier': 64, 2026-02-21T10:22:51.9057468Z 'num_stages': 1, 2026-02-21T10:22:51.9057659Z 'num_warps': 4, 2026-02-21T10:22:51.9057860Z 'pid_type': 'persistent_blocked', 2026-02-21T10:22:51.9058126Z 'range_flattens': [True, False], 2026-02-21T10:22:51.9058748Z 'range_multi_buffers': [None, True], 2026-02-21T10:22:51.9059021Z 'range_num_stages': [3, 1], 2026-02-21T10:22:51.9059264Z 'range_unroll_factors': [1, 1], 2026-02-21T10:22:51.9059506Z 'range_warp_specializes': []} 2026-02-21T10:22:51.9123862Z [3364s] Fitting surrogate: 1501 points, 1501 targets 2026-02-21T10:22:52.9088292Z [3365s] Generation 19 starting: 57 neighbors, 3 active search path(s) 2026-02-21T10:23:16.8441916Z Generation 19: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 58/58 1.2 configs/s 2026-02-21T10:23:47.7342523Z 2026-02-21T10:23:47.7342599Z 2026-02-21T10:23:47.7343246Z ================================================================ 2026-02-21T10:23:47.7343846Z Internal Triton PTX codegen error 2026-02-21T10:23:47.7344275Z `ptxas` stderr: 2026-02-21T10:23:47.7345445Z ptxas fatal : 
(C7602) Insufficient registers (32) to compile instruction at line 788 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T10:23:47.7347935Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:23:47.7348354Z 2026-02-21T10:23:47.7349595Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp8e6q2n1v.ptx -o /tmp/tmp8e6q2n1v.ptx.o 2026-02-21T10:23:47.7350807Z 2026-02-21T10:23:47.7350815Z 2026-02-21T10:23:47.7350969Z // 2026-02-21T10:23:47.7351582Z // Generated by LLVM NVPTX Back-End 2026-02-21T10:23:47.7352119Z // 2026-02-21T10:23:47.7352328Z 2026-02-21T10:23:47.7352462Z .version 8.7 2026-02-21T10:23:47.7352699Z .target sm_90a 2026-02-21T10:23:47.7352889Z .address_size 64 2026-02-21T10:23:47.7353007Z 2026-02-21T10:23:47.7353207Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T10:23:47.7353615Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T10:23:47.7353917Z // @_helion_matmul_bf16_int4 2026-02-21T10:23:47.7354224Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T10:23:47.7354573Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T10:23:47.7354990Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T10:23:47.7355388Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T10:23:47.7355798Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T10:23:47.7356222Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T10:23:47.7356757Z ) 2026-02-21T10:23:47.7356934Z .reqntid 256 2026-02-21T10:23:47.7357133Z .maxnreg 32 2026-02-21T10:23:47.7357322Z { 2026-02-21T10:23:47.7357477Z .reg .pred %p<189>; 2026-02-21T10:23:47.7357684Z .reg .b16 %rs<2310>; 2026-02-21T10:23:47.7357865Z .reg .b32 %r<16284>; 2026-02-21T10:23:47.7358046Z .reg .b64 %rd<367>; 
2026-02-21T10:23:47.7358407Z .loc 1 19 0 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:19:0 2026-02-21T10:23:47.7358838Z $L__func_begin0: 2026-02-21T10:23:47.7359184Z .loc 1 19 0 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:19:0 2026-02-21T10:23:47.7359535Z 2026-02-21T10:23:47.7359598Z // %bb.0: 2026-02-21T10:23:47.7359827Z ld.param.b64 %rd45, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T10:23:47.7360176Z ld.param.b64 %rd44, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T10:23:47.7360461Z $L__tmp0: 2026-02-21T10:23:47.7360801Z .loc 1 21 67 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:21:67 2026-02-21T10:23:47.7361232Z mov.u32 %r16090, %ctaid.x; 2026-02-21T10:23:47.7361509Z ld.param.b64 %rd47, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T10:23:47.7361803Z mov.u32 %r505, %ctaid.y; 2026-02-21T10:23:47.7362061Z ld.param.b64 %rd64, [_helion_matmul_bf16_int4_param_3]; 2026-02-21T10:23:47.7362346Z mov.u32 %r506, %ctaid.z; 2026-02-21T10:23:47.7362538Z mov.u32 %r507, %nctaid.x; 2026-02-21T10:23:47.7363496Z [3420s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T10:23:47.7365228Z Config: @helion.kernel(config=helion.Config(block_sizes=[32, 128, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', 'last'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=1, num_stages=4, num_warps=8, pid_type='persistent_interleaved', range_flattens=[False, True], range_multi_buffers=[True, True], range_num_stages=[2, 2], range_unroll_factors=[2, 4], range_warp_specializes=[]), static_shapes=True) 2026-02-21T10:23:47.7367092Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T10:23:47.7367384Z `ptxas` stderr: 2026-02-21T10:23:47.7367948Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 788 in function _helion_matmul_bf16_int4. 
Try to compile with register target of 94 or higher. 2026-02-21T10:23:47.7368608Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:23:47.7368802Z 2026-02-21T10:23:47.7369394Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp8e6q2n1v.ptx -o /tmp/tmp8e6q2n1v.ptx.o 2026-02-21T10:23:47.7370002Z 2026-02-21T10:23:47.7370162Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T10:23:47.7370470Z mov.u32 %r508, %nctaid.y; 2026-02-21T10:23:47.7370742Z mad.lo.s32 %r509, %r506, %r508, %r505; 2026-02-21T10:23:47.7370978Z mad.lo.s32 %r510, %r509, %r507, %r16090; 2026-02-21T10:23:47.7371196Z shl.b32 %r511, %r510, 7; 2026-02-21T10:23:47.7371382Z cvt.s64.s32 %rd65, %r511; 2026-02-21T10:23:47.7371563Z add.s64 %rd61, %rd64, %rd65; 2026-02-21T10:23:47.7371753Z mov.u32 %r2, %tid.x; 2026-02-21T10:23:47.7371926Z setp.lt.u32 %p1, %r2, 32; 2026-02-21T10:23:47.7372108Z shl.b32 %r512, %r2, 2; 2026-02-21T10:23:47.7372280Z mov.b32 %r10980, global_smem; 2026-02-21T10:23:47.7372474Z add.s32 %r497, %r10980, %r512; 2026-02-21T10:23:47.7372661Z mov.b32 %r498, 0; 2026-02-21T10:23:47.7372818Z // begin inline asm 2026-02-21T10:23:47.7372999Z @%p1 st.shared.b32 [ %r497 + 0 ], %r498; 2026-02-21T10:23:47.7373202Z // end inline asm 2026-02-21T10:23:47.7373363Z bar.warp.sync -1; 2026-02-21T10:23:47.7373531Z setp.eq.b32 %p132, %r2, 0; 2026-02-21T10:23:47.7373719Z cvt.u64.u32 %rd46, %r10980; 2026-02-21T10:23:47.7373900Z // begin inline asm 2026-02-21T10:23:47.7374240Z @%p132 tensormap.replace.tile.global_address.shared::cta.b1024.b64 [ %rd46 + 0 ], %rd47; 2026-02-21T10:23:47.7374603Z // end inline asm 2026-02-21T10:23:47.7374753Z // begin inline asm 2026-02-21T10:23:47.7375030Z @%p132 tensormap.replace.tile.rank.shared::cta.b1024.b32 [ %rd46 + 0 ], 0x1; 2026-02-21T10:23:47.7375346Z // end inline asm 2026-02-21T10:23:47.7375501Z mov.b32 %r499, 128; 2026-02-21T10:23:47.7375672Z 
// begin inline asm 2026-02-21T10:23:47.7375971Z @%p132 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd46 + 0 ], 0x0, %r499; 2026-02-21T10:23:47.7376314Z // end inline asm 2026-02-21T10:23:47.7376595Z mov.b32 %r500, 32; 2026-02-21T10:23:47.7376763Z // begin inline asm 2026-02-21T10:23:47.7377048Z @%p132 tensormap.replace.tile.box_dim.shared::cta.b1024.b32 [ %rd46 + 0 ], 0x1, %r500; 2026-02-21T10:23:47.7377391Z // end inline asm 2026-02-21T10:23:47.7377540Z mov.b32 %r501, 1280; 2026-02-21T10:23:47.7377705Z // begin inline asm 2026-02-21T10:23:47.7378004Z @%p132 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd46 + 0 ], 0x0, %r501; 2026-02-21T10:23:47.7378357Z // end inline asm 2026-02-21T10:23:47.7378510Z mov.b32 %r502, 4096; 2026-02-21T10:23:47.7378668Z // begin inline asm 2026-02-21T10:23:47.7378970Z @%p132 tensormap.replace.tile.global_dim.shared::cta.b1024.b32 [ %rd46 + 0 ], 0x1, %r502; 2026-02-21T10:23:47.7379315Z // end inline asm 2026-02-21T10:23:47.7379488Z mov.b64 %rd54, 1280; 2026-02-21T10:23:47.7379646Z // begin inline asm 2026-02-21T10:23:47.7379964Z @%p132 tensormap.replace.tile.global_stride.shared::cta.b1024.b64 [ %rd46 + 0 ], 0x0, %rd54; 2026-02-21T10:23:47.7380424Z // end inline asm 2026-02-21T10:23:47.7380582Z mov.b32 %r503, 1; 2026-02-21T10:23:47.7380737Z // begin inline asm 2026-02-21T10:23:47.7381050Z @%p132 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd46 + 0 ], 0x0, %r503; 2026-02-21T10:23:47.7381507Z // end inline asm 2026-02-21T10:23:47.7381657Z // begin inline asm 2026-02-21T10:23:47.7381985Z @%p132 tensormap.replace.tile.element_stride.shared::cta.b1024.b32 [ %rd46 + 0 ], 0x1, %r503; 2026-02-21T10:23:47.7382347Z // end inline asm 2026-02-21T10:23:47.7382501Z // begin inline asm 2026-02-21T10:23:47.7382793Z @%p132 tensormap.replace.tile.elemtype.shared::cta.b1024.b32 [ %rd46 + 0 ], 0x0; 2026-02-21T10:23:47.7383125Z // end inline asm 2026-02-21T10:23:47.7383279Z // begin inline asm 
2026-02-21T10:23:47.7383584Z @%p132 tensormap.replace.tile.interleave_layout.shared::cta.b1024.b32 [ %rd46 + 0 ], 0x0; 2026-02-21T10:23:47.7383943Z // end inline asm 2026-02-21T10:23:47.7384094Z // begin inline asm 2026-02-21T10:23:47.7384467Z @%p132 tensormap.replace.tile.swizzle_mode.shared::cta.b1024.b32 [ %rd46 + 0 ], 0x3; 2026-02-21T10:23:47.7384813Z // end inline asm 2026-02-21T10:23:47.7384965Z // begin inline asm 2026-02-21T10:23:47.7385250Z @%p132 tensormap.replace.tile.fill_mode.shared::cta.b1024.b32 [ %rd46 + 0 ], 0x0; 2026-02-21T10:23:47.7385571Z // end inline asm 2026-02-21T10:23:47.7385728Z // begin inline asm 2026-02-21T10:23:47.7386223Z @%p1 tensormap.cp_fenceproxy.global.shared::cta.tensormap::generic.release.gpu.sync.aligned [ %rd61 + 0 ], [ %rd46 + 0 ], 0x80; 2026-02-21T10:23:47.7386842Z // end inline asm 2026-02-21T10:23:47.7386995Z // begin inline asm 2026-02-21T10:23:47.7387246Z @%p1 fence.proxy.tensormap::generic.acquire.gpu [ %rd61 + 0 ], 0x80; 2026-02-21T10:23:47.7387562Z @%p1 cp.async.bulk.commit_group ; 2026-02-21T10:23:47.7387784Z @%p1 cp.async.bulk.wait_group.read 0 ; 2026-02-21T10:23:47.7388010Z // end inline asm 2026-02-21T10:23:47.7388158Z bar.sync 0; 2026-02-21T10:23:47.7388325Z cvta.global.u64 %rd281, %rd61; 2026-02-21T10:23:47.7388748Z .loc 1 0 0 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:0 2026-02-21T10:23:47.7389105Z sub.s32 %r514, 5251, %r16090; 2026-02-21T10:23:47.7389449Z .loc 1 26 140 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:26:140 2026-02-21T10:23:47.7389837Z mul.hi.u32 %r515, %r514, 1041204193; 2026-02-21T10:23:47.7390048Z shr.u32 %r516, %r515, 5; 2026-02-21T10:23:47.7390226Z and.b32 %r517, %r516, 33554430; 2026-02-21T10:23:47.7390430Z mad.lo.s32 %r16219, %r517, 132, %r16090; 2026-02-21T10:23:47.7390790Z .loc 1 38 45 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:38:45 2026-02-21T10:23:47.7391159Z shr.u32 %r4, %r2, 5; 2026-02-21T10:23:47.7391323Z shr.u32 %r518, %r2, 3; 
2026-02-21T10:23:47.7391497Z bfe.u32 %r5, %r2, 3, 5; 2026-02-21T10:23:47.7391668Z or.b32 %r6, %r518, 96; 2026-02-21T10:23:47.7391831Z shr.u32 %r519, %r2, 4; 2026-02-21T10:23:47.7392004Z bfe.u32 %r7, %r2, 4, 4; 2026-02-21T10:23:47.7392170Z or.b32 %r8, %r7, 16; 2026-02-21T10:23:47.7392331Z or.b32 %r9, %r7, 32; 2026-02-21T10:23:47.7392502Z or.b32 %r10, %r519, 48; 2026-02-21T10:23:47.7392678Z or.b32 %r11, %r7, 64; 2026-02-21T10:23:47.7392840Z or.b32 %r12, %r7, 80; 2026-02-21T10:23:47.7393002Z or.b32 %r13, %r7, 96; 2026-02-21T10:23:47.7393161Z or.b32 %r14, %r519, 112; 2026-02-21T10:23:47.7393345Z shl.b32 %r520, %r2, 3; 2026-02-21T10:23:47.7393518Z and.b32 %r15, %r520, 120; 2026-02-21T10:23:47.7393854Z .loc 1 53 38 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:53:38 2026-02-21T10:23:47.7394221Z and.b32 %r16, %r2, 7; 2026-02-21T10:23:47.7394524Z .loc 1 71 38 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:71:38 2026-02-21T10:23:47.7394886Z and.b32 %r17, %r2, 128; 2026-02-21T10:23:47.7395207Z .loc 1 26 140 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:26:140 2026-02-21T10:23:47.7395692Z setp.ge.s32 %p19, %r16090, %r16219; 2026-02-21T10:23:47.7395894Z shl.b32 %r16076, %r2, 4; 2026-02-21T10:23:47.7396082Z and.b32 %r16077, %r2, 56; 2026-02-21T10:23:47.7396262Z shl.b32 %r16078, %r2, 6; 2026-02-21T10:23:47.7396623Z shl.b32 %r16079, %r2, 5; 2026-02-21T10:23:47.7396806Z shl.b32 %r16080, %r2, 1; 2026-02-21T10:23:47.7396975Z and.b32 %r16081, %r2, 127; 2026-02-21T10:23:47.7397157Z shl.b32 %r16082, %r16, 4; 2026-02-21T10:23:47.7397326Z shr.u32 %r16083, %r17, 5; 2026-02-21T10:23:47.7397497Z and.b32 %r16084, %r2, 3; 2026-02-21T10:23:47.7397660Z and.b32 %r16085, %r2, 24; 2026-02-21T10:23:47.7397830Z shl.b32 %r16086, %r6, 13; 2026-02-21T10:23:47.7397994Z or.b32 %r16087, %r5, 64; 2026-02-21T10:23:47.7398162Z or.b32 %r16088, %r5, 32; 2026-02-21T10:23:47.7398334Z shl.b32 %r16089, %r5, 13; 2026-02-21T10:23:47.7398510Z setp.eq.b32 %p188, %r17, 0; 
2026-02-21T10:23:47.7398699Z @%p19 bra $L__BB0_7; 2026-02-21T10:23:47.7398967Z // %bb.1: // %.lr.ph 2026-02-21T10:23:47.7399361Z .loc 1 0 140 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:0:140 2026-02-21T10:23:47.7399729Z and.b32 %r522, %r16076, 4080; 2026-02-21T10:23:47.7399924Z xor.b32 %r524, %r522, %r16077; 2026-02-21T10:23:47.7400112Z add.s32 %r18, %r10980, %r524; 2026-02-21T10:23:47.7400298Z xor.b32 %r526, %r524, 8; 2026-02-21T10:23:47.7400538Z add.s32 %r19, %r10980, %r526; 2026-02-21T10:23:47.7400720Z and.b32 %r528, %r16078, 14336; 2026-02-21T10:23:47.7400909Z and.b32 %r530, %r16079, 896; 2026-02-21T10:23:47.7401087Z and.b32 %r532, %r16080, 62; 2026-02-21T10:23:47.7401270Z or.b32 %r533, %r528, %r530; 2026-02-21T10:23:47.7401443Z or.b32 %r534, %r533, %r532; 2026-02-21T10:23:47.7401622Z add.s32 %r20, %r10980, %r534; 2026-02-21T10:23:47.7401799Z xor.b32 %r535, %r534, 8; 2026-02-21T10:23:47.7401976Z add.s32 %r21, %r10980, %r535; 2026-02-21T10:23:47.7402173Z xor.b32 %r536, %r534, 16; 2026-02-21T10:23:47.7402357Z add.s32 %r22, %r10980, %r536; 2026-02-21T10:23:47.7402537Z xor.b32 %r537, %r534, 24; 2026-02-21T10:23:47.7402707Z add.s32 %r23, %r10980, %r537; 2026-02-21T10:23:47.7402887Z xor.b32 %r538, %r534, 32; 2026-02-21T10:23:47.7403057Z add.s32 %r24, %r10980, %r538; 2026-02-21T10:23:47.7403241Z xor.b32 %r539, %r534, 40; 2026-02-21T10:23:47.7403411Z add.s32 %r25, %r10980, %r539; 2026-02-21T10:23:47.7403607Z xor.b32 %r540, %r534, 48; 2026-02-21T10:23:47.7403779Z add.s32 %r26, %r10980, %r540; 2026-02-21T10:23:47.7403964Z xor.b32 %r541, %r534, 56; 2026-02-21T10:23:47.7404136Z add.s32 %r27, %r10980, %r541; 2026-02-21T10:23:47.7404315Z add.s32 %r28, %r10980, %r16081; 2026-02-21T10:23:47.7404509Z xor.b32 %r543, %r16081, 16; 2026-02-21T10:23:47.7404685Z add.s32 %r29, %r10980, %r543; 2026-02-21T10:23:47.7404868Z xor.b32 %r544, %r16081, 32; 2026-02-21T10:23:47.7405045Z add.s32 %r30, %r10980, %r544; 2026-02-21T10:23:47.7405229Z xor.b32 %r545, %r16081, 48; 
2026-02-21T10:23:47.7405407Z add.s32 %r31, %r10980, %r545; 2026-02-21T10:23:47.7405587Z xor.b32 %r546, %r16081, 64; 2026-02-21T10:23:47.7405759Z add.s32 %r32, %r10980, %r546; 2026-02-21T10:23:47.7405939Z xor.b32 %r547, %r16081, 80; 2026-02-21T10:23:47.7406118Z add.s32 %r33, %r10980, %r547; 2026-02-21T10:23:47.7406290Z xor.b32 %r548, %r16081, 96; 2026-02-21T10:23:47.7406585Z add.s32 %r34, %r10980, %r548; 2026-02-21T10:23:47.7406772Z xor.b32 %r549, %r16081, 112; 2026-02-21T10:23:47.7406951Z add.s32 %r35, %r10980, %r549; 2026-02-21T10:23:47.7407128Z shl.b32 %r550, %r16081, 7; 2026-02-21T10:23:47.7407310Z or.b32 %r553, %r16082, %r16083; 2026-02-21T10:23:47.7407493Z or.b32 %r554, %r553, %r550; 2026-02-21T10:23:47.7407671Z add.s32 %r36, %r10980, %r554; 2026-02-21T10:23:47.7407853Z xor.b32 %r555, %r554, 16; 2026-02-21T10:23:47.7408022Z add.s32 %r37, %r10980, %r555; 2026-02-21T10:23:47.7408201Z xor.b32 %r556, %r554, 32; 2026-02-21T10:23:47.7408368Z add.s32 %r38, %r10980, %r556; 2026-02-21T10:23:47.7408654Z xor.b32 %r557, %r554, 48; 2026-02-21T10:23:47.7408822Z add.s32 %r39, %r10980, %r557; 2026-02-21T10:23:47.7409001Z xor.b32 %r558, %r554, 64; 2026-02-21T10:23:47.7409166Z add.s32 %r40, %r10980, %r558; 2026-02-21T10:23:47.7409433Z xor.b32 %r559, %r554, 80; 2026-02-21T10:23:47.7409602Z add.s32 %r41, %r10980, %r559; 2026-02-21T10:23:47.7409779Z xor.b32 %r560, %r554, 96; 2026-02-21T10:23:47.7409957Z add.s32 %r42, %r10980, %r560; 2026-02-21T10:23:47.7410132Z xor.b32 %r561, %r554, 112; 2026-02-21T10:23:47.7410312Z add.s32 %r43, %r10980, %r561; 2026-02-21T10:23:47.7410491Z bfe.u32 %r562, %r10980, 4, 14; 2026-02-21T10:23:47.7410683Z cvt.u64.u32 %rd66, %r562; 2026-02-21T10:23:47.7410887Z or.b64 %rd2, %rd66, 4611686293372403712; 2026-02-21T10:23:47.7411104Z add.s32 %r563, %r10980, 32; 2026-02-21T10:23:47.7411284Z bfe.u32 %r564, %r563, 4, 14; 2026-02-21T10:23:47.7411467Z cvt.u64.u32 %rd67, %r564; 2026-02-21T10:23:47.7411648Z or.b64 %rd3, %rd67, 4611686293372403712; 
2026-02-21T10:23:47.7411935Z add.s32 %r565, %r10980, 64; 2026-02-21T10:23:47.7412120Z bfe.u32 %r566, %r565, 4, 14; 2026-02-21T10:23:47.7412297Z cvt.u64.u32 %rd68, %r566; 2026-02-21T10:23:47.7412487Z or.b64 %rd4, %rd68, 4611686293372403712; 2026-02-21T10:23:47.7412695Z add.s32 %r567, %r10980, 96; 2026-02-21T10:23:47.7412879Z bfe.u32 %r568, %r567, 4, 14; 2026-02-21T10:23:47.7413054Z cvt.u64.u32 %rd69, %r568; 2026-02-21T10:23:47.7413321Z or.b64 %rd5, %rd69, 4611686293372403712; 2026-02-21T10:23:47.7413534Z add.s32 %r569, %r10980, 16384; 2026-02-21T10:23:47.7413725Z bfe.u32 %r570, %r569, 4, 14; 2026-02-21T10:23:47.7413915Z cvt.u64.u32 %rd70, %r570; 2026-02-21T10:23:47.7414103Z or.b64 %rd6, %rd70, 4611686293372403712; 2026-02-21T10:23:47.7414320Z add.s32 %r571, %r10980, 16416; 2026-02-21T10:23:47.7414510Z bfe.u32 %r572, %r571, 4, 14; 2026-02-21T10:23:47.7414693Z cvt.u64.u32 %rd71, %r572; 2026-02-21T10:23:47.7414874Z or.b64 %rd7, %rd71, 4611686293372403712; 2026-02-21T10:23:47.7415084Z add.s32 %r573, %r10980, 16448; 2026-02-21T10:23:47.7415265Z bfe.u32 %r574, %r573, 4, 14; 2026-02-21T10:23:47.7415462Z cvt.u64.u32 %rd72, %r574; 2026-02-21T10:23:47.7415644Z or.b64 %rd8, %rd72, 4611686293372403712; 2026-02-21T10:23:47.7415861Z add.s32 %r575, %r10980, 16480; 2026-02-21T10:23:47.7416054Z bfe.u32 %r576, %r575, 4, 14; 2026-02-21T10:23:47.7416231Z cvt.u64.u32 %rd73, %r576; 2026-02-21T10:23:47.7416420Z or.b64 %rd9, %rd73, 4611686293372403712; 2026-02-21T10:23:47.7416928Z shl.b32 %r578, %r16084, 13; 2026-02-21T10:23:47.7417113Z and.b32 %r579, %r16079, 7264; 2026-02-21T10:23:47.7417299Z shl.b32 %r581, %r16085, 4; 2026-02-21T10:23:47.7417482Z and.b32 %r583, %r512, 16; 2026-02-21T10:23:47.7417655Z or.b32 %r584, %r578, %r583; 2026-02-21T10:23:47.7417837Z or.b32 %r585, %r579, %r581; 2026-02-21T10:23:47.7418018Z or.b32 %r586, %r584, %r585; 2026-02-21T10:23:47.7418191Z add.s32 %r44, %r10980, %r586; 2026-02-21T10:23:47.7418374Z xor.b32 %r587, %r586, 32; 2026-02-21T10:23:47.7418553Z 
add.s32 %r45, %r10980, %r587; 2026-02-21T10:23:47.7418735Z xor.b32 %r588, %r586, 64; 2026-02-21T10:23:47.7418901Z add.s32 %r46, %r10980, %r588; 2026-02-21T10:23:47.7419080Z xor.b32 %r589, %r586, 96; 2026-02-21T10:23:47.7419250Z add.s32 %r47, %r10980, %r589; 2026-02-21T10:23:47.7419447Z shl.b32 %r590, %r16085, 10; 2026-02-21T10:23:47.7419627Z shl.b32 %r591, %r16084, 5; 2026-02-21T10:23:47.7419806Z and.b32 %r592, %r512, 1008; 2026-02-21T10:23:47.7419988Z or.b32 %r593, %r590, %r591; 2026-02-21T10:23:47.7420163Z xor.b32 %r594, %r593, %r592; 2026-02-21T10:23:47.7420351Z add.s32 %r5614, %r10980, %r594; 2026-02-21T10:23:47.7420544Z add.s32 %r5619, %r5614, 1024; 2026-02-21T10:23:47.7420733Z add.s32 %r5624, %r5614, 2048; 2026-02-21T10:23:47.7420911Z add.s32 %r5629, %r5614, 3072; 2026-02-21T10:23:47.7421094Z add.s32 %r5634, %r5614, 4096; 2026-02-21T10:23:47.7421278Z add.s32 %r5639, %r5614, 5120; 2026-02-21T10:23:47.7421459Z add.s32 %r5644, %r5614, 6144; 2026-02-21T10:23:47.7421740Z add.s32 %r5649, %r5614, 7168; 2026-02-21T10:23:47.7422089Z .loc 1 26 140 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:26:140 2026-02-21T10:23:47.7422479Z mad.wide.u32 %rd10, %r16, 16, %rd44; 2026-02-21T10:23:47.7422803Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T10:23:47.7423101Z // Child Loop BB0_3 Depth 2 2026-02-21T10:23:47.7423376Z // Child Loop BB0_5 Depth 2 2026-02-21T10:23:47.7423766Z .loc 1 32 35 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:32:35 2026-02-21T10:23:47.7424138Z shr.s32 %r597, %r16090, 31; 2026-02-21T10:23:47.7424330Z shr.u32 %r598, %r597, 17; 2026-02-21T10:23:47.7424521Z add.s32 %r599, %r16090, %r598; 2026-02-21T10:23:47.7424706Z shr.s32 %r600, %r599, 15; 2026-02-21T10:23:47.7425030Z .loc 1 33 33 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:33:33 2026-02-21T10:23:47.7425459Z shl.b32 %r601, %r600, 6; 2026-02-21T10:23:47.7425787Z .loc 1 34 39 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:34:39 
2026-02-21T10:23:47.7426142Z sub.s32 %r602, 10, %r601; 2026-02-21T10:23:47.7426597Z .loc 1 34 52 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:34:52 2026-02-21T10:23:47.7426968Z min.s32 %r603, %r602, 64; 2026-02-21T10:23:47.7427366Z .loc 1 35 45 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:35:45 2026-02-21T10:23:47.7427729Z and.b32 %r604, %r599, -32768; 2026-02-21T10:23:47.7427914Z sub.s32 %r605, %r16090, %r604; 2026-02-21T10:23:47.7428246Z .loc 1 36 51 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:36:51 2026-02-21T10:23:47.7428667Z div.s32 %r606, %r605, %r603; 2026-02-21T10:23:47.7428995Z .loc 1 35 64 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:35:64 2026-02-21T10:23:47.7429359Z mul.lo.s32 %r607, %r606, %r603; 2026-02-21T10:23:47.7429548Z sub.s32 %r608, %r605, %r607; 2026-02-21T10:23:47.7429884Z .loc 1 35 30 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:35:30 2026-02-21T10:23:47.7430237Z add.s32 %r609, %r608, %r601; 2026-02-21T10:23:47.7430556Z .loc 1 37 27 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:37:27 2026-02-21T10:23:47.7430907Z shl.b32 %r636, %r609, 7; 2026-02-21T10:23:47.7431224Z .loc 1 39 27 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:39:27 2026-02-21T10:23:47.7431578Z shl.b32 %r105, %r606, 7; 2026-02-21T10:23:47.7431894Z .loc 1 47 126 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:47:126 2026-02-21T10:23:47.7432262Z shl.b32 %r610, %r606, 20; 2026-02-21T10:23:47.7432432Z or.b32 %r611, %r16086, %r610; 2026-02-21T10:23:47.7432617Z mul.wide.s32 %rd20, %r611, 2; 2026-02-21T10:23:47.7432801Z or.b32 %r612, %r16087, %r105; 2026-02-21T10:23:47.7432983Z shl.b32 %r613, %r612, 13; 2026-02-21T10:23:47.7433162Z mul.wide.s32 %rd21, %r613, 2; 2026-02-21T10:23:47.7433339Z or.b32 %r614, %r16088, %r105; 2026-02-21T10:23:47.7433518Z shl.b32 %r615, %r614, 13; 2026-02-21T10:23:47.7433692Z mul.wide.s32 %rd22, %r615, 2; 2026-02-21T10:23:47.7433875Z or.b32 %r616, 
%r16089, %r610; 2026-02-21T10:23:47.7434052Z mul.wide.s32 %rd23, %r616, 2; 2026-02-21T10:23:47.7434237Z mov.b32 %r16091, 0f00000000; 2026-02-21T10:23:47.7434411Z mov.b64 %rd362, 0; 2026-02-21T10:23:47.7434580Z mov.b64 %rd361, %rd10; 2026-02-21T10:23:47.7434750Z mov.b32 %r16092, %r16091; 2026-02-21T10:23:47.7434925Z mov.b32 %r16093, %r16091; 2026-02-21T10:23:47.7435102Z mov.b32 %r16094, %r16091; 2026-02-21T10:23:47.7435277Z mov.b32 %r16095, %r16091; 2026-02-21T10:23:47.7435448Z mov.b32 %r16096, %r16091; 2026-02-21T10:23:47.7435614Z mov.b32 %r16097, %r16091; 2026-02-21T10:23:47.7435784Z mov.b32 %r16098, %r16091; 2026-02-21T10:23:47.7436046Z mov.b32 %r16099, %r16091; 2026-02-21T10:23:47.7436227Z mov.b32 %r16100, %r16091; 2026-02-21T10:23:47.7436391Z mov.b32 %r16101, %r16091; 2026-02-21T10:23:47.7436698Z mov.b32 %r16102, %r16091; 2026-02-21T10:23:47.7436948Z mov.b32 %r16103, %r16091; 2026-02-21T10:23:47.7437123Z mov.b32 %r16104, %r16091; 2026-02-21T10:23:47.7437298Z mov.b32 %r16105, %r16091; 2026-02-21T10:23:47.7437467Z mov.b32 %r16106, %r16091; 2026-02-21T10:23:47.7437644Z mov.b32 %r16107, %r16091; 2026-02-21T10:23:47.7437810Z mov.b32 %r16108, %r16091; 2026-02-21T10:23:47.7437983Z mov.b32 %r16109, %r16091; 2026-02-21T10:23:47.7438148Z mov.b32 %r16110, %r16091; 2026-02-21T10:23:47.7438324Z mov.b32 %r16111, %r16091; 2026-02-21T10:23:47.7438491Z mov.b32 %r16112, %r16091; 2026-02-21T10:23:47.7438661Z mov.b32 %r16113, %r16091; 2026-02-21T10:23:47.7438824Z mov.b32 %r16114, %r16091; 2026-02-21T10:23:47.7439007Z mov.b32 %r16115, %r16091; 2026-02-21T10:23:47.7439180Z mov.b32 %r16116, %r16091; 2026-02-21T10:23:47.7439422Z mov.b32 %r16117, %r16091; 2026-02-21T10:23:47.7439605Z mov.b32 %r16118, %r16091; 2026-02-21T10:23:47.7439773Z mov.b32 %r16119, %r16091; 2026-02-21T10:23:47.7439945Z mov.b32 %r16120, %r16091; 2026-02-21T10:23:47.7440112Z mov.b32 %r16121, %r16091; 2026-02-21T10:23:47.7440282Z mov.b32 %r16122, %r16091; 2026-02-21T10:23:47.7440446Z mov.b32 %r16123, %r16091; 
2026-02-21T10:23:47.7440613Z mov.b32 %r16124, %r16091; 2026-02-21T10:23:47.7440852Z mov.b32 %r16125, %r16091; 2026-02-21T10:23:47.7441039Z mov.b32 %r16126, %r16091; 2026-02-21T10:23:47.7441217Z mov.b32 %r16127, %r16091; 2026-02-21T10:23:47.7441386Z mov.b32 %r16128, %r16091; 2026-02-21T10:23:47.7441556Z mov.b32 %r16129, %r16091; 2026-02-21T10:23:47.7441720Z mov.b32 %r16130, %r16091; 2026-02-21T10:23:47.7441892Z mov.b32 %r16131, %r16091; 2026-02-21T10:23:47.7442055Z mov.b32 %r16132, %r16091; 2026-02-21T10:23:47.7442227Z mov.b32 %r16133, %r16091; 2026-02-21T10:23:47.7442393Z mov.b32 %r16134, %r16091; 2026-02-21T10:23:47.7442567Z mov.b32 %r16135, %r16091; 2026-02-21T10:23:47.7442735Z mov.b32 %r16136, %r16091; 2026-02-21T10:23:47.7442905Z mov.b32 %r16137, %r16091; 2026-02-21T10:23:47.7443087Z mov.b32 %r16138, %r16091; 2026-02-21T10:23:47.7443260Z mov.b32 %r16139, %r16091; 2026-02-21T10:23:47.7443433Z mov.b32 %r16140, %r16091; 2026-02-21T10:23:47.7443601Z mov.b32 %r16141, %r16091; 2026-02-21T10:23:47.7443776Z mov.b32 %r16142, %r16091; 2026-02-21T10:23:47.7443944Z mov.b32 %r16143, %r16091; 2026-02-21T10:23:47.7444116Z mov.b32 %r16144, %r16091; 2026-02-21T10:23:47.7444284Z mov.b32 %r16145, %r16091; 2026-02-21T10:23:47.7444455Z mov.b32 %r16146, %r16091; 2026-02-21T10:23:47.7444628Z mov.b32 %r16147, %r16091; 2026-02-21T10:23:47.7444794Z mov.b32 %r16148, %r16091; 2026-02-21T10:23:47.7444966Z mov.b32 %r16149, %r16091; 2026-02-21T10:23:47.7445133Z mov.b32 %r16150, %r16091; 2026-02-21T10:23:47.7445311Z mov.b32 %r16151, %r16091; 2026-02-21T10:23:47.7445474Z mov.b32 %r16152, %r16091; 2026-02-21T10:23:47.7445653Z mov.b32 %r16153, %r16091; 2026-02-21T10:23:47.7445820Z mov.b32 %r16154, %r16091; 2026-02-21T10:23:47.7446048Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T10:23:47.7446352Z // => This Inner Loop Header: Depth=2 2026-02-21T10:23:47.7446899Z .loc 1 0 126 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:0:126 2026-02-21T10:23:47.7447276Z cvt.u32.u64 %r637, 
%rd362; 2026-02-21T10:23:47.7447602Z .loc 1 54 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:54:32 2026-02-21T10:23:47.7447981Z add.s64 %rd76, %rd361, %rd23; 2026-02-21T10:23:47.7448169Z add.s64 %rd79, %rd361, %rd22; 2026-02-21T10:23:47.7448358Z add.s64 %rd82, %rd361, %rd21; 2026-02-21T10:23:47.7448674Z .loc 1 54 80 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:54:80 2026-02-21T10:23:47.7449034Z add.s64 %rd85, %rd361, %rd20; 2026-02-21T10:23:47.7449316Z // begin inline asm 2026-02-21T10:23:47.7449472Z mov.u64 %rd75, 0x0; 2026-02-21T10:23:47.7449694Z createpolicy.fractional.L2::evict_last.b64 %rd75, 1.0; 2026-02-21T10:23:47.7449950Z // end inline asm 2026-02-21T10:23:47.7450173Z // begin inline asm 2026-02-21T10:23:47.7450326Z mov.u32 %r617, 0x0; 2026-02-21T10:23:47.7450477Z mov.u32 %r618, 0x0; 2026-02-21T10:23:47.7450626Z mov.u32 %r619, 0x0; 2026-02-21T10:23:47.7450784Z mov.u32 %r620, 0x0; 2026-02-21T10:23:47.7451102Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r617, %r618, %r619, %r620 }, [ %rd76 + 0 ], %rd75; 2026-02-21T10:23:47.7451460Z // end inline asm 2026-02-21T10:23:47.7451615Z // begin inline asm 2026-02-21T10:23:47.7451767Z mov.u64 %rd78, 0x0; 2026-02-21T10:23:47.7451983Z createpolicy.fractional.L2::evict_last.b64 %rd78, 1.0; 2026-02-21T10:23:47.7452235Z // end inline asm 2026-02-21T10:23:47.7452387Z // begin inline asm 2026-02-21T10:23:47.7452539Z mov.u32 %r621, 0x0; 2026-02-21T10:23:47.7452697Z mov.u32 %r622, 0x0; 2026-02-21T10:23:47.7452938Z mov.u32 %r623, 0x0; 2026-02-21T10:23:47.7453102Z mov.u32 %r624, 0x0; 2026-02-21T10:23:47.7453405Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r621, %r622, %r623, %r624 }, [ %rd79 + 0 ], %rd78; 2026-02-21T10:23:47.7453754Z // end inline asm 2026-02-21T10:23:47.7453909Z // begin inline asm 2026-02-21T10:23:47.7454062Z mov.u64 %rd81, 0x0; 2026-02-21T10:23:47.7454342Z createpolicy.fractional.L2::evict_last.b64 %rd81, 1.0; 2026-02-21T10:23:47.7454595Z // end inline asm 
2026-02-21T10:23:47.7454751Z // begin inline asm 2026-02-21T10:23:47.7454902Z mov.u32 %r625, 0x0; 2026-02-21T10:23:47.7455056Z mov.u32 %r626, 0x0; 2026-02-21T10:23:47.7455208Z mov.u32 %r627, 0x0; 2026-02-21T10:23:47.7455355Z mov.u32 %r628, 0x0; 2026-02-21T10:23:47.7455652Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r625, %r626, %r627, %r628 }, [ %rd82 + 0 ], %rd81; 2026-02-21T10:23:47.7455994Z // end inline asm 2026-02-21T10:23:47.7456146Z // begin inline asm 2026-02-21T10:23:47.7456301Z mov.u64 %rd84, 0x0; 2026-02-21T10:23:47.7456647Z createpolicy.fractional.L2::evict_last.b64 %rd84, 1.0; 2026-02-21T10:23:47.7456901Z // end inline asm 2026-02-21T10:23:47.7457059Z // begin inline asm 2026-02-21T10:23:47.7457218Z mov.u32 %r629, 0x0; 2026-02-21T10:23:47.7457371Z mov.u32 %r630, 0x0; 2026-02-21T10:23:47.7457528Z mov.u32 %r631, 0x0; 2026-02-21T10:23:47.7457677Z mov.u32 %r632, 0x0; 2026-02-21T10:23:47.7457994Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r629, %r630, %r631, %r632 }, [ %rd85 + 0 ], %rd84; 2026-02-21T10:23:47.7458340Z // end inline asm 2026-02-21T10:23:47.7458649Z .loc 1 58 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:58:32 2026-02-21T10:23:47.7459000Z bar.sync 0; 2026-02-21T10:23:47.7459173Z st.shared.v2.b32 [%r18], {%r617, %r618}; 2026-02-21T10:23:47.7459419Z st.shared.v2.b32 [%r18+4096], {%r621, %r622}; 2026-02-21T10:23:47.7459663Z st.shared.v2.b32 [%r18+8192], {%r625, %r626}; 2026-02-21T10:23:47.7459912Z st.shared.v2.b32 [%r18+12288], {%r629, %r630}; 2026-02-21T10:23:47.7460148Z st.shared.v2.b32 [%r19], {%r619, %r620}; 2026-02-21T10:23:47.7460378Z st.shared.v2.b32 [%r19+4096], {%r623, %r624}; 2026-02-21T10:23:47.7460609Z st.shared.v2.b32 [%r19+8192], {%r627, %r628}; 2026-02-21T10:23:47.7460852Z st.shared.v2.b32 [%r19+12288], {%r631, %r632}; 2026-02-21T10:23:47.7461070Z bar.sync 0; 2026-02-21T10:23:47.7461229Z ld.shared.b16 %rs1, [%r20]; 2026-02-21T10:23:47.7461427Z ld.shared.b16 %rs2, [%r20+1024]; 
2026-02-21T10:23:47.7461625Z ld.shared.b16 %rs3, [%r20+64]; 2026-02-21T10:23:47.7461822Z ld.shared.b16 %rs4, [%r20+1088]; 2026-02-21T10:23:47.7462014Z ld.shared.b16 %rs5, [%r21]; 2026-02-21T10:23:47.7462211Z ld.shared.b16 %rs6, [%r21+1024]; 2026-02-21T10:23:47.7462416Z ld.shared.b16 %rs7, [%r21+64]; 2026-02-21T10:23:47.7462617Z ld.shared.b16 %rs8, [%r21+1088]; 2026-02-21T10:23:47.7462813Z ld.shared.b16 %rs9, [%r22]; 2026-02-21T10:23:47.7463013Z ld.shared.b16 %rs10, [%r22+1024]; 2026-02-21T10:23:47.7463343Z ld.shared.b16 %rs11, [%r22+64]; 2026-02-21T10:23:47.7463542Z ld.shared.b16 %rs12, [%r22+1088]; 2026-02-21T10:23:47.7463743Z ld.shared.b16 %rs13, [%r23]; 2026-02-21T10:23:47.7463932Z ld.shared.b16 %rs14, [%r23+1024]; 2026-02-21T10:23:47.7464202Z ld.shared.b16 %rs15, [%r23+64]; 2026-02-21T10:23:47.7464392Z ld.shared.b16 %rs16, [%r23+1088]; 2026-02-21T10:23:47.7464590Z ld.shared.b16 %rs17, [%r24]; 2026-02-21T10:23:47.7464774Z ld.shared.b16 %rs18, [%r24+1024]; 2026-02-21T10:23:47.7464971Z ld.shared.b16 %rs19, [%r24+64]; 2026-02-21T10:23:47.7465168Z ld.shared.b16 %rs20, [%r24+1088]; 2026-02-21T10:23:47.7465359Z ld.shared.b16 %rs21, [%r25]; 2026-02-21T10:23:47.7465547Z ld.shared.b16 %rs22, [%r25+1024]; 2026-02-21T10:23:47.7465738Z ld.shared.b16 %rs23, [%r25+64]; 2026-02-21T10:23:47.7465946Z ld.shared.b16 %rs24, [%r25+1088]; 2026-02-21T10:23:47.7466139Z ld.shared.b16 %rs25, [%r26]; 2026-02-21T10:23:47.7466326Z ld.shared.b16 %rs26, [%r26+1024]; 2026-02-21T10:23:47.7466720Z ld.shared.b16 %rs27, [%r26+64]; 2026-02-21T10:23:47.7466927Z ld.shared.b16 %rs28, [%r26+1088]; 2026-02-21T10:23:47.7467119Z ld.shared.b16 %rs29, [%r27]; 2026-02-21T10:23:47.7467309Z ld.shared.b16 %rs30, [%r27+1024]; 2026-02-21T10:23:47.7467512Z ld.shared.b16 %rs31, [%r27+64]; 2026-02-21T10:23:47.7467703Z ld.shared.b16 %rs32, [%r27+1088]; 2026-02-21T10:23:47.7467904Z cvt.f32.bf16 %r770, %rs1; 2026-02-21T10:23:47.7468148Z cvt.f32.bf16 %r771, %rs2; 2026-02-21T10:23:47.7468331Z cvt.f32.bf16 %r772, %rs5; 
2026-02-21T10:23:47.7468587Z cvt.f32.bf16 %r773, %rs6; 2026-02-21T10:23:47.7468780Z cvt.f32.bf16 %r902, %rs9; 2026-02-21T10:23:47.7468956Z cvt.f32.bf16 %r903, %rs10; 2026-02-21T10:23:47.7469145Z cvt.f32.bf16 %r904, %rs13; 2026-02-21T10:23:47.7469330Z cvt.f32.bf16 %r905, %rs14; 2026-02-21T10:23:47.7469509Z cvt.f32.bf16 %r1034, %rs17; 2026-02-21T10:23:47.7469694Z cvt.f32.bf16 %r1035, %rs18; 2026-02-21T10:23:47.7469869Z cvt.f32.bf16 %r1036, %rs21; 2026-02-21T10:23:47.7470054Z cvt.f32.bf16 %r1037, %rs22; 2026-02-21T10:23:47.7470229Z cvt.f32.bf16 %r1166, %rs25; 2026-02-21T10:23:47.7470413Z cvt.f32.bf16 %r1167, %rs26; 2026-02-21T10:23:47.7470587Z cvt.f32.bf16 %r1168, %rs29; 2026-02-21T10:23:47.7470778Z cvt.f32.bf16 %r1169, %rs30; 2026-02-21T10:23:47.7470960Z cvt.f32.bf16 %r1298, %rs3; 2026-02-21T10:23:47.7471138Z cvt.f32.bf16 %r1299, %rs4; 2026-02-21T10:23:47.7471315Z cvt.f32.bf16 %r1300, %rs7; 2026-02-21T10:23:47.7471490Z cvt.f32.bf16 %r1301, %rs8; 2026-02-21T10:23:47.7471669Z cvt.f32.bf16 %r1430, %rs11; 2026-02-21T10:23:47.7471846Z cvt.f32.bf16 %r1431, %rs12; 2026-02-21T10:23:47.7472023Z cvt.f32.bf16 %r1432, %rs15; 2026-02-21T10:23:47.7472195Z cvt.f32.bf16 %r1433, %rs16; 2026-02-21T10:23:47.7472376Z cvt.f32.bf16 %r1562, %rs19; 2026-02-21T10:23:47.7472551Z cvt.f32.bf16 %r1563, %rs20; 2026-02-21T10:23:47.7472729Z cvt.f32.bf16 %r1564, %rs23; 2026-02-21T10:23:47.7472901Z cvt.f32.bf16 %r1565, %rs24; 2026-02-21T10:23:47.7473081Z cvt.f32.bf16 %r1694, %rs27; 2026-02-21T10:23:47.7473267Z cvt.f32.bf16 %r1695, %rs28; 2026-02-21T10:23:47.7473446Z cvt.f32.bf16 %r1696, %rs31; 2026-02-21T10:23:47.7473626Z cvt.f32.bf16 %r1697, %rs32; 2026-02-21T10:23:47.7473957Z .loc 1 60 33 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:60:33 2026-02-21T10:23:47.7474325Z bar.sync 0; 2026-02-21T10:23:47.7474483Z add.s32 %r5769, %r10980, 4096; 2026-02-21T10:23:47.7474674Z // begin inline asm 2026-02-21T10:23:47.7474873Z @%p132 mbarrier.init.shared::cta.b64 [%r5769], 1; 
2026-02-21T10:23:47.7475117Z // end inline asm 2026-02-21T10:23:47.7475272Z bar.sync 0; 2026-02-21T10:23:47.7475417Z // begin inline asm 2026-02-21T10:23:47.7475652Z @%p132 mbarrier.arrive.expect_tx.shared.b64 _, [%r5769], 4096; 2026-02-21T10:23:47.7475922Z // end inline asm 2026-02-21T10:23:47.7476091Z // begin inline asm 2026-02-21T10:23:47.7476268Z fence.proxy.async.shared::cta; 2026-02-21T10:23:47.7476596Z // end inline asm 2026-02-21T10:23:47.7476849Z bar.sync 0; 2026-02-21T10:23:47.7477015Z elect.sync %r5477|%p70, -1; 2026-02-21T10:23:47.7477208Z and.pred %p22, %p1, %p70; 2026-02-21T10:23:47.7490464Z // begin inline asm 2026-02-21T10:23:47.7491031Z @%p22 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r10980], [%rd281, {%r636, %r637}], [%r5769]; 2026-02-21T10:23:47.7491713Z // end inline asm 2026-02-21T10:23:47.7491884Z bar.sync 0; 2026-02-21T10:23:47.7492054Z mov.b32 %r5408, 0; 2026-02-21T10:23:47.7492223Z // begin inline asm 2026-02-21T10:23:47.7492387Z 2026-02-21T10:23:47.7492517Z { 2026-02-21T10:23:47.7492662Z .reg .pred complete; 2026-02-21T10:23:47.7492837Z waitLoop: 2026-02-21T10:23:47.7493081Z mbarrier.try_wait.parity.shared.b64 complete, [%r5769], %r5408; 2026-02-21T10:23:47.7493395Z @!complete bra.uni waitLoop; 2026-02-21T10:23:47.7493574Z } 2026-02-21T10:23:47.7493654Z 2026-02-21T10:23:47.7493723Z // end inline asm 2026-02-21T10:23:47.7493880Z bar.sync 0; 2026-02-21T10:23:47.7494051Z // begin inline asm 2026-02-21T10:23:47.7494348Z @%p132 mbarrier.inval.shared::cta.b64 [%r5769]; 2026-02-21T10:23:47.7494605Z // end inline asm 2026-02-21T10:23:47.7494932Z .loc 1 78 58 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:78:58 2026-02-21T10:23:47.7495320Z ld.shared.b8 %rs33, [%r28]; 2026-02-21T10:23:47.7495522Z ld.shared.b8 %rs34, [%r28+1024]; 2026-02-21T10:23:47.7495810Z ld.shared.b8 %rs35, [%r28+2048]; 2026-02-21T10:23:47.7496018Z ld.shared.b8 %rs36, [%r28+3072]; 2026-02-21T10:23:47.7496214Z ld.shared.b8 %rs37, 
[%r29+128]; 2026-02-21T10:23:47.7496412Z ld.shared.b8 %rs38, [%r29+1152]; 2026-02-21T10:23:47.7496754Z ld.shared.b8 %rs39, [%r29+2176]; 2026-02-21T10:23:47.7496971Z ld.shared.b8 %rs40, [%r29+3200]; 2026-02-21T10:23:47.7497169Z ld.shared.b8 %rs41, [%r30+256]; 2026-02-21T10:23:47.7497370Z ld.shared.b8 %rs42, [%r30+1280]; 2026-02-21T10:23:47.7497567Z ld.shared.b8 %rs43, [%r30+2304]; 2026-02-21T10:23:47.7497758Z ld.shared.b8 %rs44, [%r30+3328]; 2026-02-21T10:23:47.7497964Z ld.shared.b8 %rs45, [%r31+384]; 2026-02-21T10:23:47.7498156Z ld.shared.b8 %rs46, [%r31+1408]; 2026-02-21T10:23:47.7498352Z ld.shared.b8 %rs47, [%r31+2432]; 2026-02-21T10:23:47.7498544Z ld.shared.b8 %rs48, [%r31+3456]; 2026-02-21T10:23:47.7498745Z ld.shared.b8 %rs49, [%r32+512]; 2026-02-21T10:23:47.7498937Z ld.shared.b8 %rs50, [%r32+1536]; 2026-02-21T10:23:47.7499135Z ld.shared.b8 %rs51, [%r32+2560]; 2026-02-21T10:23:47.7499324Z ld.shared.b8 %rs52, [%r32+3584]; 2026-02-21T10:23:47.7499521Z ld.shared.b8 %rs53, [%r33+640]; 2026-02-21T10:23:47.7499715Z ld.shared.b8 %rs54, [%r33+1664]; 2026-02-21T10:23:47.7499905Z ld.shared.b8 %rs55, [%r33+2688]; 2026-02-21T10:23:47.7500104Z ld.shared.b8 %rs56, [%r33+3712]; 2026-02-21T10:23:47.7500299Z ld.shared.b8 %rs57, [%r34+768]; 2026-02-21T10:23:47.7500495Z ld.shared.b8 %rs58, [%r34+1792]; 2026-02-21T10:23:47.7500689Z ld.shared.b8 %rs59, [%r34+2816]; 2026-02-21T10:23:47.7500892Z ld.shared.b8 %rs60, [%r34+3840]; 2026-02-21T10:23:47.7501087Z ld.shared.b8 %rs61, [%r35+896]; 2026-02-21T10:23:47.7501284Z ld.shared.b8 %rs62, [%r35+1920]; 2026-02-21T10:23:47.7501483Z ld.shared.b8 %rs63, [%r35+2944]; 2026-02-21T10:23:47.7501675Z ld.shared.b8 %rs64, [%r35+3968]; 2026-02-21T10:23:47.7502025Z .loc 1 63 28 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:63:28 2026-02-21T10:23:47.7502396Z shl.b16 %rs65, %rs33, 4; 2026-02-21T10:23:47.7502589Z shl.b16 %rs66, %rs37, 4; 2026-02-21T10:23:47.7502766Z shl.b16 %rs67, %rs41, 4; 2026-02-21T10:23:47.7502949Z shl.b16 %rs68, %rs45, 
4; 2026-02-21T10:23:47.7503121Z shl.b16 %rs69, %rs49, 4; 2026-02-21T10:23:47.7503298Z shl.b16 %rs70, %rs53, 4; 2026-02-21T10:23:47.7503474Z shl.b16 %rs71, %rs57, 4; 2026-02-21T10:23:47.7503645Z shl.b16 %rs72, %rs61, 4; 2026-02-21T10:23:47.7503826Z shl.b16 %rs73, %rs34, 4; 2026-02-21T10:23:47.7504011Z shl.b16 %rs74, %rs38, 4; 2026-02-21T10:23:47.7504188Z shl.b16 %rs75, %rs42, 4; 2026-02-21T10:23:47.7504447Z shl.b16 %rs76, %rs46, 4; 2026-02-21T10:23:47.7504622Z shl.b16 %rs77, %rs50, 4; 2026-02-21T10:23:47.7504790Z shl.b16 %rs78, %rs54, 4; 2026-02-21T10:23:47.7504966Z shl.b16 %rs79, %rs58, 4; 2026-02-21T10:23:47.7505136Z shl.b16 %rs80, %rs62, 4; 2026-02-21T10:23:47.7505380Z shl.b16 %rs81, %rs35, 4; 2026-02-21T10:23:47.7505558Z shl.b16 %rs82, %rs39, 4; 2026-02-21T10:23:47.7505728Z shl.b16 %rs83, %rs43, 4; 2026-02-21T10:23:47.7505916Z shl.b16 %rs84, %rs47, 4; 2026-02-21T10:23:47.7506087Z shl.b16 %rs85, %rs51, 4; 2026-02-21T10:23:47.7506261Z shl.b16 %rs86, %rs55, 4; 2026-02-21T10:23:47.7506430Z shl.b16 %rs87, %rs59, 4; 2026-02-21T10:23:47.7506735Z shl.b16 %rs88, %rs63, 4; 2026-02-21T10:23:47.7506904Z shl.b16 %rs89, %rs36, 4; 2026-02-21T10:23:47.7507084Z shl.b16 %rs90, %rs40, 4; 2026-02-21T10:23:47.7507251Z shl.b16 %rs91, %rs44, 4; 2026-02-21T10:23:47.7507425Z shl.b16 %rs92, %rs48, 4; 2026-02-21T10:23:47.7507598Z shl.b16 %rs93, %rs52, 4; 2026-02-21T10:23:47.7507771Z shl.b16 %rs94, %rs56, 4; 2026-02-21T10:23:47.7508040Z shl.b16 %rs95, %rs60, 4; 2026-02-21T10:23:47.7508214Z shl.b16 %rs96, %rs64, 4; 2026-02-21T10:23:47.7508620Z .loc 1 78 58 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:78:58 2026-02-21T10:23:47.7509023Z selp.b16 %rs97, %rs65, %rs33, %p188; 2026-02-21T10:23:47.7509256Z cvt.s16.s8 %rs98, %rs97; 2026-02-21T10:23:47.7509442Z shr.s16 %rs99, %rs98, 4; 2026-02-21T10:23:47.7509712Z selp.b16 %rs100, %rs66, %rs37, %p188; 2026-02-21T10:23:47.7509950Z cvt.s16.s8 %rs101, %rs100; 2026-02-21T10:23:47.7510139Z shr.s16 %rs102, %rs101, 4; 
2026-02-21T10:23:47.7510333Z selp.b16 %rs103, %rs67, %rs41, %p188; 2026-02-21T10:23:47.7510544Z cvt.s16.s8 %rs104, %rs103; 2026-02-21T10:23:47.7510733Z shr.s16 %rs105, %rs104, 4; 2026-02-21T10:23:47.7510920Z selp.b16 %rs106, %rs68, %rs45, %p188; 2026-02-21T10:23:47.7511132Z cvt.s16.s8 %rs107, %rs106; 2026-02-21T10:23:47.7511310Z shr.s16 %rs108, %rs107, 4; 2026-02-21T10:23:47.7511505Z selp.b16 %rs109, %rs69, %rs49, %p188; 2026-02-21T10:23:47.7511711Z cvt.s16.s8 %rs110, %rs109; 2026-02-21T10:23:47.7511903Z shr.s16 %rs111, %rs110, 4; 2026-02-21T10:23:47.7512098Z selp.b16 %rs112, %rs70, %rs53, %p188; 2026-02-21T10:23:47.7512303Z cvt.s16.s8 %rs113, %rs112; 2026-02-21T10:23:47.7512491Z shr.s16 %rs114, %rs113, 4; 2026-02-21T10:23:47.7512674Z selp.b16 %rs115, %rs71, %rs57, %p188; 2026-02-21T10:23:47.7512886Z cvt.s16.s8 %rs116, %rs115; 2026-02-21T10:23:47.7513061Z shr.s16 %rs117, %rs116, 4; 2026-02-21T10:23:47.7513253Z selp.b16 %rs118, %rs72, %rs61, %p188; 2026-02-21T10:23:47.7513453Z cvt.s16.s8 %rs119, %rs118; 2026-02-21T10:23:47.7513640Z shr.s16 %rs120, %rs119, 4; 2026-02-21T10:23:47.7513832Z selp.b16 %rs121, %rs73, %rs34, %p188; 2026-02-21T10:23:47.7514030Z cvt.s16.s8 %rs122, %rs121; 2026-02-21T10:23:47.7514214Z shr.s16 %rs123, %rs122, 4; 2026-02-21T10:23:47.7514397Z selp.b16 %rs124, %rs74, %rs38, %p188; 2026-02-21T10:23:47.7514606Z cvt.s16.s8 %rs125, %rs124; 2026-02-21T10:23:47.7514787Z shr.s16 %rs126, %rs125, 4; 2026-02-21T10:23:47.7514986Z selp.b16 %rs127, %rs75, %rs42, %p188; 2026-02-21T10:23:47.7515186Z cvt.s16.s8 %rs128, %rs127; 2026-02-21T10:23:47.7515368Z shr.s16 %rs129, %rs128, 4; 2026-02-21T10:23:47.7515556Z selp.b16 %rs130, %rs76, %rs46, %p188; 2026-02-21T10:23:47.7515761Z cvt.s16.s8 %rs131, %rs130; 2026-02-21T10:23:47.7515945Z shr.s16 %rs132, %rs131, 4; 2026-02-21T10:23:47.7516132Z selp.b16 %rs133, %rs77, %rs50, %p188; 2026-02-21T10:23:47.7516342Z cvt.s16.s8 %rs134, %rs133; 2026-02-21T10:23:47.7516647Z shr.s16 %rs135, %rs134, 4; 2026-02-21T10:23:47.7516837Z 
selp.b16 %rs136, %rs78, %rs54, %p188; 2026-02-21T10:23:47.7517035Z cvt.s16.s8 %rs137, %rs136; 2026-02-21T10:23:47.7517214Z shr.s16 %rs138, %rs137, 4; 2026-02-21T10:23:47.7517393Z selp.b16 %rs139, %rs79, %rs58, %p188; 2026-02-21T10:23:47.7517604Z cvt.s16.s8 %rs140, %rs139; 2026-02-21T10:23:47.7517786Z shr.s16 %rs141, %rs140, 4; 2026-02-21T10:23:47.7518067Z selp.b16 %rs142, %rs80, %rs62, %p188; 2026-02-21T10:23:47.7518277Z cvt.s16.s8 %rs143, %rs142; 2026-02-21T10:23:47.7518450Z shr.s16 %rs144, %rs143, 4; 2026-02-21T10:23:47.7518639Z selp.b16 %rs145, %rs81, %rs35, %p188; 2026-02-21T10:23:47.7518909Z cvt.s16.s8 %rs146, %rs145; 2026-02-21T10:23:47.7519089Z shr.s16 %rs147, %rs146, 4; 2026-02-21T10:23:47.7519269Z selp.b16 %rs148, %rs82, %rs39, %p188; 2026-02-21T10:23:47.7519494Z cvt.s16.s8 %rs149, %rs148; 2026-02-21T10:23:47.7519667Z shr.s16 %rs150, %rs149, 4; 2026-02-21T10:23:47.7519856Z selp.b16 %rs151, %rs83, %rs43, %p188; 2026-02-21T10:23:47.7520062Z cvt.s16.s8 %rs152, %rs151; 2026-02-21T10:23:47.7520235Z shr.s16 %rs153, %rs152, 4; 2026-02-21T10:23:47.7520423Z selp.b16 %rs154, %rs84, %rs47, %p188; 2026-02-21T10:23:47.7520620Z cvt.s16.s8 %rs155, %rs154; 2026-02-21T10:23:47.7520802Z shr.s16 %rs156, %rs155, 4; 2026-02-21T10:23:47.7520981Z selp.b16 %rs157, %rs85, %rs51, %p188; 2026-02-21T10:23:47.7521197Z cvt.s16.s8 %rs158, %rs157; 2026-02-21T10:23:47.7521460Z shr.s16 %rs159, %rs158, 4; 2026-02-21T10:23:47.7521658Z selp.b16 %rs160, %rs86, %rs55, %p188; 2026-02-21T10:23:47.7521861Z cvt.s16.s8 %rs161, %rs160; 2026-02-21T10:23:47.7522039Z shr.s16 %rs162, %rs161, 4; 2026-02-21T10:23:47.7522233Z selp.b16 %rs163, %rs87, %rs59, %p188; 2026-02-21T10:23:47.7522433Z cvt.s16.s8 %rs164, %rs163; 2026-02-21T10:23:47.7522620Z shr.s16 %rs165, %rs164, 4; 2026-02-21T10:23:47.7522900Z selp.b16 %rs166, %rs88, %rs63, %p188; 2026-02-21T10:23:47.7523111Z cvt.s16.s8 %rs167, %rs166; 2026-02-21T10:23:47.7523286Z shr.s16 %rs168, %rs167, 4; 2026-02-21T10:23:47.7523472Z selp.b16 %rs169, %rs89, %rs36, 
%p188; 2026-02-21T10:23:47.7523674Z cvt.s16.s8 %rs170, %rs169; 2026-02-21T10:23:47.7523845Z shr.s16 %rs171, %rs170, 4; 2026-02-21T10:23:47.7524031Z selp.b16 %rs172, %rs90, %rs40, %p188; 2026-02-21T10:23:47.7524228Z cvt.s16.s8 %rs173, %rs172; 2026-02-21T10:23:47.7524405Z shr.s16 %rs174, %rs173, 4; 2026-02-21T10:23:47.7524586Z selp.b16 %rs175, %rs91, %rs44, %p188; 2026-02-21T10:23:47.7524798Z cvt.s16.s8 %rs176, %rs175; 2026-02-21T10:23:47.7524970Z shr.s16 %rs177, %rs176, 4; 2026-02-21T10:23:47.7525158Z selp.b16 %rs178, %rs92, %rs48, %p188; 2026-02-21T10:23:47.7525368Z cvt.s16.s8 %rs179, %rs178; 2026-02-21T10:23:47.7525545Z shr.s16 %rs180, %rs179, 4; 2026-02-21T10:23:47.7525734Z selp.b16 %rs181, %rs93, %rs52, %p188; 2026-02-21T10:23:47.7525935Z cvt.s16.s8 %rs182, %rs181; 2026-02-21T10:23:47.7526120Z shr.s16 %rs183, %rs182, 4; 2026-02-21T10:23:47.7526304Z selp.b16 %rs184, %rs94, %rs56, %p188; 2026-02-21T10:23:47.7526637Z cvt.s16.s8 %rs185, %rs184; 2026-02-21T10:23:47.7526814Z shr.s16 %rs186, %rs185, 4; 2026-02-21T10:23:47.7527002Z selp.b16 %rs187, %rs95, %rs60, %p188; 2026-02-21T10:23:47.7527198Z cvt.s16.s8 %rs188, %rs187; 2026-02-21T10:23:47.7527373Z shr.s16 %rs189, %rs188, 4; 2026-02-21T10:23:47.7527555Z selp.b16 %rs190, %rs96, %rs64, %p188; 2026-02-21T10:23:47.7527752Z cvt.s16.s8 %rs191, %rs190; 2026-02-21T10:23:47.7527938Z shr.s16 %rs192, %rs191, 4; 2026-02-21T10:23:47.7528273Z .loc 1 83 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:83:32 2026-02-21T10:23:47.7528649Z cvt.rn.f32.s16 %r5478, %rs99; 2026-02-21T10:23:47.7528841Z cvt.rn.f32.s16 %r5479, %rs102; 2026-02-21T10:23:47.7529039Z cvt.rn.f32.s16 %r5480, %rs105; 2026-02-21T10:23:47.7529231Z cvt.rn.f32.s16 %r5481, %rs108; 2026-02-21T10:23:47.7529424Z cvt.rn.f32.s16 %r5482, %rs111; 2026-02-21T10:23:47.7529615Z cvt.rn.f32.s16 %r5483, %rs114; 2026-02-21T10:23:47.7529794Z cvt.rn.f32.s16 %r5484, %rs117; 2026-02-21T10:23:47.7529977Z cvt.rn.f32.s16 %r5485, %rs120; 2026-02-21T10:23:47.7530158Z 
cvt.rn.f32.s16 %r5486, %rs123; 2026-02-21T10:23:47.7530345Z cvt.rn.f32.s16 %r5487, %rs126; 2026-02-21T10:23:47.7530524Z cvt.rn.f32.s16 %r5488, %rs129; 2026-02-21T10:23:47.7530708Z cvt.rn.f32.s16 %r5489, %rs132; 2026-02-21T10:23:47.7530888Z cvt.rn.f32.s16 %r5490, %rs135; 2026-02-21T10:23:47.7531073Z cvt.rn.f32.s16 %r5491, %rs138; 2026-02-21T10:23:47.7531358Z cvt.rn.f32.s16 %r5492, %rs141; 2026-02-21T10:23:47.7531532Z cvt.rn.f32.s16 %r5493, %rs144; 2026-02-21T10:23:47.7531711Z cvt.rn.f32.s16 %r5494, %rs147; 2026-02-21T10:23:47.7531895Z cvt.rn.f32.s16 %r5495, %rs150; 2026-02-21T10:23:47.7532167Z cvt.rn.f32.s16 %r5496, %rs153; 2026-02-21T10:23:47.7532353Z cvt.rn.f32.s16 %r5497, %rs156; 2026-02-21T10:23:47.7532533Z cvt.rn.f32.s16 %r5498, %rs159; 2026-02-21T10:23:47.7532724Z cvt.rn.f32.s16 %r5499, %rs162; 2026-02-21T10:23:47.7532906Z cvt.rn.f32.s16 %r5500, %rs165; 2026-02-21T10:23:47.7533095Z cvt.rn.f32.s16 %r5501, %rs168; 2026-02-21T10:23:47.7533275Z cvt.rn.f32.s16 %r5502, %rs171; 2026-02-21T10:23:47.7533464Z cvt.rn.f32.s16 %r5503, %rs174; 2026-02-21T10:23:47.7533647Z cvt.rn.f32.s16 %r5504, %rs177; 2026-02-21T10:23:47.7533833Z cvt.rn.f32.s16 %r5505, %rs180; 2026-02-21T10:23:47.7534018Z cvt.rn.f32.s16 %r5506, %rs183; 2026-02-21T10:23:47.7534200Z cvt.rn.f32.s16 %r5507, %rs186; 2026-02-21T10:23:47.7534470Z cvt.rn.f32.s16 %r5508, %rs189; 2026-02-21T10:23:47.7534658Z cvt.rn.f32.s16 %r5509, %rs192; 2026-02-21T10:23:47.7534842Z bar.sync 0; 2026-02-21T10:23:47.7534993Z st.shared.b32 [%r36], %r5478; 2026-02-21T10:23:47.7535187Z st.shared.b32 [%r36+8], %r5479; 2026-02-21T10:23:47.7535386Z st.shared.b32 [%r36+16384], %r5494; 2026-02-21T10:23:47.7535593Z st.shared.b32 [%r36+16392], %r5495; 2026-02-21T10:23:47.7535868Z st.shared.b32 [%r37], %r5480; 2026-02-21T10:23:47.7536060Z st.shared.b32 [%r37+8], %r5481; 2026-02-21T10:23:47.7536249Z st.shared.b32 [%r37+16384], %r5496; 2026-02-21T10:23:47.7536444Z st.shared.b32 [%r37+16392], %r5497; 2026-02-21T10:23:47.7536762Z st.shared.b32 
[%r38], %r5482; 2026-02-21T10:23:47.7536942Z st.shared.b32 [%r38+8], %r5483; 2026-02-21T10:23:47.7537134Z st.shared.b32 [%r38+16384], %r5498; 2026-02-21T10:23:47.7537330Z st.shared.b32 [%r38+16392], %r5499; 2026-02-21T10:23:47.7537523Z st.shared.b32 [%r39], %r5484; 2026-02-21T10:23:47.7537707Z st.shared.b32 [%r39+8], %r5485; 2026-02-21T10:23:47.7537919Z st.shared.b32 [%r39+16384], %r5500; 2026-02-21T10:23:47.7538122Z st.shared.b32 [%r39+16392], %r5501; 2026-02-21T10:23:47.7538316Z st.shared.b32 [%r40], %r5486; 2026-02-21T10:23:47.7538506Z st.shared.b32 [%r40+8], %r5487; 2026-02-21T10:23:47.7538697Z st.shared.b32 [%r40+16384], %r5502; 2026-02-21T10:23:47.7538896Z st.shared.b32 [%r40+16392], %r5503; 2026-02-21T10:23:47.7539091Z st.shared.b32 [%r41], %r5488; 2026-02-21T10:23:47.7539279Z st.shared.b32 [%r41+8], %r5489; 2026-02-21T10:23:47.7539465Z st.shared.b32 [%r41+16384], %r5504; 2026-02-21T10:23:47.7539667Z st.shared.b32 [%r41+16392], %r5505; 2026-02-21T10:23:47.7539863Z st.shared.b32 [%r42], %r5490; 2026-02-21T10:23:47.7540046Z st.shared.b32 [%r42+8], %r5491; 2026-02-21T10:23:47.7540241Z st.shared.b32 [%r42+16384], %r5506; 2026-02-21T10:23:47.7540439Z st.shared.b32 [%r42+16392], %r5507; 2026-02-21T10:23:47.7540637Z st.shared.b32 [%r43], %r5492; 2026-02-21T10:23:47.7540819Z st.shared.b32 [%r43+8], %r5493; 2026-02-21T10:23:47.7541013Z st.shared.b32 [%r43+16384], %r5508; 2026-02-21T10:23:47.7541210Z st.shared.b32 [%r43+16392], %r5509; 2026-02-21T10:23:47.7541402Z $L__tmp1: 2026-02-21T10:23:47.7541782Z .loc 2 291 36 // standard.py:291:36 @[ cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:90:40 ] 2026-02-21T10:23:47.7542226Z // begin inline asm 2026-02-21T10:23:47.7542417Z fence.proxy.async.shared::cta; 2026-02-21T10:23:47.7542603Z // end inline asm 2026-02-21T10:23:47.7542755Z bar.sync 0; 2026-02-21T10:23:47.7542922Z shfl.sync.idx.b32 %r5510, %r4, 0, 31, -1; 2026-02-21T10:23:47.7543158Z wgmma.fence.sync.aligned; 2026-02-21T10:23:47.7543343Z mov.pred %p24, -1; 
2026-02-21T10:23:47.7543508Z // begin inline asm 2026-02-21T10:23:47.7545065Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r770,%r771,%r772,%r773}, %rd2, %p24, 1, 1; 2026-02-21T10:23:47.7546993Z // end inline asm 2026-02-21T10:23:47.7547158Z // begin inline asm 2026-02-21T10:23:47.7548890Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r902,%r903,%r904,%r905}, %rd3, %p24, 1, 1; 2026-02-21T10:23:47.7550510Z // end inline asm 2026-02-21T10:23:47.7550677Z // begin inline asm 2026-02-21T10:23:47.7552341Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r1034,%r1035,%r1036,%r1037}, %rd4, %p24, 1, 1; 2026-02-21T10:23:47.7553962Z // end inline asm 2026-02-21T10:23:47.7554111Z // begin inline asm 2026-02-21T10:23:47.7555685Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r1166,%r1167,%r1168,%r1169}, %rd5, %p24, 1, 1; 2026-02-21T10:23:47.7557413Z // end inline asm 2026-02-21T10:23:47.7557562Z // begin inline asm 2026-02-21T10:23:47.7559134Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, 
{%r1298,%r1299,%r1300,%r1301}, %rd6, %p24, 1, 1; 2026-02-21T10:23:47.7560765Z // end inline asm 2026-02-21T10:23:47.7560921Z // begin inline asm 2026-02-21T10:23:47.7562498Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r1430,%r1431,%r1432,%r1433}, %rd7, %p24, 1, 1; 2026-02-21T10:23:47.7564205Z // end inline asm 2026-02-21T10:23:47.7564353Z // begin inline asm 2026-02-21T10:23:47.7565921Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r1562,%r1563,%r1564,%r1565}, %rd8, %p24, 1, 1; 2026-02-21T10:23:47.7567719Z // end inline asm 2026-02-21T10:23:47.7567868Z // begin inline asm 2026-02-21T10:23:47.7569578Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r1694,%r1695,%r1696,%r1697}, %rd9, %p24, 1, 1; 2026-02-21T10:23:47.7571210Z // end inline asm 2026-02-21T10:23:47.7571380Z wgmma.commit_group.sync.aligned; 2026-02-21T10:23:47.7571592Z mov.b32 %r1763, %r5408; 2026-02-21T10:23:47.7571769Z mov.b32 %r1764, %r5408; 2026-02-21T10:23:47.7571940Z mov.b32 %r1762, %r10980; 2026-02-21T10:23:47.7572129Z // begin inline asm 2026-02-21T10:23:47.7573513Z // wait for regs: %r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154,%r1762,%r1763,%r1764 2026-02-21T10:23:47.7574968Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:23:47.7575174Z // end inline asm 2026-02-21T10:23:47.7575322Z $L__tmp2: 2026-02-21T10:23:47.7575630Z .loc 1 54 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:54:32 2026-02-21T10:23:47.7575998Z add.s64 %rd97, %rd76, 128; 2026-02-21T10:23:47.7576188Z add.s64 %rd100, %rd79, 128; 2026-02-21T10:23:47.7576368Z add.s64 %rd103, %rd82, 128; 2026-02-21T10:23:47.7576807Z .loc 1 54 80 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:54:80 
2026-02-21T10:23:47.7577172Z add.s64 %rd106, %rd85, 128; 2026-02-21T10:23:47.7577353Z // begin inline asm 2026-02-21T10:23:47.7577517Z mov.u64 %rd96, 0x0; 2026-02-21T10:23:47.7577738Z createpolicy.fractional.L2::evict_last.b64 %rd96, 1.0; 2026-02-21T10:23:47.7578011Z // end inline asm 2026-02-21T10:23:47.7578170Z // begin inline asm 2026-02-21T10:23:47.7578334Z mov.u32 %r1832, 0x0; 2026-02-21T10:23:47.7578493Z mov.u32 %r1833, 0x0; 2026-02-21T10:23:47.7578653Z mov.u32 %r1834, 0x0; 2026-02-21T10:23:47.7578812Z mov.u32 %r1835, 0x0; 2026-02-21T10:23:47.7579138Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1832, %r1833, %r1834, %r1835 }, [ %rd97 + 0 ], %rd96; 2026-02-21T10:23:47.7579509Z // end inline asm 2026-02-21T10:23:47.7579662Z // begin inline asm 2026-02-21T10:23:47.7579825Z mov.u64 %rd99, 0x0; 2026-02-21T10:23:47.7580037Z createpolicy.fractional.L2::evict_last.b64 %rd99, 1.0; 2026-02-21T10:23:47.7580298Z // end inline asm 2026-02-21T10:23:47.7580449Z // begin inline asm 2026-02-21T10:23:47.7580608Z mov.u32 %r1836, 0x0; 2026-02-21T10:23:47.7580865Z mov.u32 %r1837, 0x0; 2026-02-21T10:23:47.7581023Z mov.u32 %r1838, 0x0; 2026-02-21T10:23:47.7581179Z mov.u32 %r1839, 0x0; 2026-02-21T10:23:47.7581494Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1836, %r1837, %r1838, %r1839 }, [ %rd100 + 0 ], %rd99; 2026-02-21T10:23:47.7581940Z // end inline asm 2026-02-21T10:23:47.7582090Z // begin inline asm 2026-02-21T10:23:47.7582249Z mov.u64 %rd102, 0x0; 2026-02-21T10:23:47.7582467Z createpolicy.fractional.L2::evict_last.b64 %rd102, 1.0; 2026-02-21T10:23:47.7582725Z // end inline asm 2026-02-21T10:23:47.7582872Z // begin inline asm 2026-02-21T10:23:47.7583045Z mov.u32 %r1840, 0x0; 2026-02-21T10:23:47.7583202Z mov.u32 %r1841, 0x0; 2026-02-21T10:23:47.7583354Z mov.u32 %r1842, 0x0; 2026-02-21T10:23:47.7583508Z mov.u32 %r1843, 0x0; 2026-02-21T10:23:47.7583821Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1840, %r1841, %r1842, %r1843 }, [ %rd103 + 0 ], %rd102; 
2026-02-21T10:23:47.7584191Z // end inline asm 2026-02-21T10:23:47.7584343Z // begin inline asm 2026-02-21T10:23:47.7584592Z mov.u64 %rd105, 0x0; 2026-02-21T10:23:47.7584819Z createpolicy.fractional.L2::evict_last.b64 %rd105, 1.0; 2026-02-21T10:23:47.7585078Z // end inline asm 2026-02-21T10:23:47.7585229Z // begin inline asm 2026-02-21T10:23:47.7585388Z mov.u32 %r1844, 0x0; 2026-02-21T10:23:47.7585547Z mov.u32 %r1845, 0x0; 2026-02-21T10:23:47.7585698Z mov.u32 %r1846, 0x0; 2026-02-21T10:23:47.7585920Z mov.u32 %r1847, 0x0; 2026-02-21T10:23:47.7586231Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1844, %r1845, %r1846, %r1847 }, [ %rd106 + 0 ], %rd105; 2026-02-21T10:23:47.7586727Z // end inline asm 2026-02-21T10:23:47.7587035Z .loc 1 58 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:58:32 2026-02-21T10:23:47.7587396Z bar.sync 0; 2026-02-21T10:23:47.7587572Z st.shared.v2.b32 [%r18], {%r1832, %r1833}; 2026-02-21T10:23:47.7587821Z st.shared.v2.b32 [%r18+4096], {%r1836, %r1837}; 2026-02-21T10:23:47.7588085Z st.shared.v2.b32 [%r18+8192], {%r1840, %r1841}; 2026-02-21T10:23:47.7588335Z st.shared.v2.b32 [%r18+12288], {%r1844, %r1845}; 2026-02-21T10:23:47.7588664Z st.shared.v2.b32 [%r19], {%r1834, %r1835}; 2026-02-21T10:23:47.7588902Z st.shared.v2.b32 [%r19+4096], {%r1838, %r1839}; 2026-02-21T10:23:47.7589151Z st.shared.v2.b32 [%r19+8192], {%r1842, %r1843}; 2026-02-21T10:23:47.7589403Z st.shared.v2.b32 [%r19+12288], {%r1846, %r1847}; 2026-02-21T10:23:47.7589630Z bar.sync 0; 2026-02-21T10:23:47.7589793Z ld.shared.b16 %rs193, [%r20]; 2026-02-21T10:23:47.7589994Z ld.shared.b16 %rs194, [%r20+1024]; 2026-02-21T10:23:47.7590208Z ld.shared.b16 %rs195, [%r20+64]; 2026-02-21T10:23:47.7590409Z ld.shared.b16 %rs196, [%r20+1088]; 2026-02-21T10:23:47.7590618Z ld.shared.b16 %rs197, [%r21]; 2026-02-21T10:23:47.7590809Z ld.shared.b16 %rs198, [%r21+1024]; 2026-02-21T10:23:47.7591011Z ld.shared.b16 %rs199, [%r21+64]; 2026-02-21T10:23:47.7591206Z ld.shared.b16 %rs200, 
[%r21+1088]; 2026-02-21T10:23:47.7591410Z ld.shared.b16 %rs201, [%r22]; 2026-02-21T10:23:47.7591605Z ld.shared.b16 %rs202, [%r22+1024]; 2026-02-21T10:23:47.7591804Z ld.shared.b16 %rs203, [%r22+64]; 2026-02-21T10:23:47.7592004Z ld.shared.b16 %rs204, [%r22+1088]; 2026-02-21T10:23:47.7592200Z ld.shared.b16 %rs205, [%r23]; 2026-02-21T10:23:47.7592392Z ld.shared.b16 %rs206, [%r23+1024]; 2026-02-21T10:23:47.7592586Z ld.shared.b16 %rs207, [%r23+64]; 2026-02-21T10:23:47.7592783Z ld.shared.b16 %rs208, [%r23+1088]; 2026-02-21T10:23:47.7592977Z ld.shared.b16 %rs209, [%r24]; 2026-02-21T10:23:47.7593164Z ld.shared.b16 %rs210, [%r24+1024]; 2026-02-21T10:23:47.7593377Z ld.shared.b16 %rs211, [%r24+64]; 2026-02-21T10:23:47.7593570Z ld.shared.b16 %rs212, [%r24+1088]; 2026-02-21T10:23:47.7593770Z ld.shared.b16 %rs213, [%r25]; 2026-02-21T10:23:47.7593954Z ld.shared.b16 %rs214, [%r25+1024]; 2026-02-21T10:23:47.7594152Z ld.shared.b16 %rs215, [%r25+64]; 2026-02-21T10:23:47.7594341Z ld.shared.b16 %rs216, [%r25+1088]; 2026-02-21T10:23:47.7594650Z ld.shared.b16 %rs217, [%r26]; 2026-02-21T10:23:47.7594835Z ld.shared.b16 %rs218, [%r26+1024]; 2026-02-21T10:23:47.7595033Z ld.shared.b16 %rs219, [%r26+64]; 2026-02-21T10:23:47.7595226Z ld.shared.b16 %rs220, [%r26+1088]; 2026-02-21T10:23:47.7595489Z ld.shared.b16 %rs221, [%r27]; 2026-02-21T10:23:47.7595675Z ld.shared.b16 %rs222, [%r27+1024]; 2026-02-21T10:23:47.7595868Z ld.shared.b16 %rs223, [%r27+64]; 2026-02-21T10:23:47.7596064Z ld.shared.b16 %rs224, [%r27+1088]; 2026-02-21T10:23:47.7596263Z cvt.f32.bf16 %r1985, %rs193; 2026-02-21T10:23:47.7596581Z cvt.f32.bf16 %r1986, %rs194; 2026-02-21T10:23:47.7596784Z cvt.f32.bf16 %r1987, %rs197; 2026-02-21T10:23:47.7596968Z cvt.f32.bf16 %r1988, %rs198; 2026-02-21T10:23:47.7597143Z cvt.f32.bf16 %r2117, %rs201; 2026-02-21T10:23:47.7597327Z cvt.f32.bf16 %r2118, %rs202; 2026-02-21T10:23:47.7597509Z cvt.f32.bf16 %r2119, %rs205; 2026-02-21T10:23:47.7597686Z cvt.f32.bf16 %r2120, %rs206; 2026-02-21T10:23:47.7597871Z 
cvt.f32.bf16 %r2249, %rs209; 2026-02-21T10:23:47.7598126Z cvt.f32.bf16 %r2250, %rs210; 2026-02-21T10:23:47.7598328Z cvt.f32.bf16 %r2251, %rs213; 2026-02-21T10:23:47.7598506Z cvt.f32.bf16 %r2252, %rs214; 2026-02-21T10:23:47.7598689Z cvt.f32.bf16 %r2381, %rs217; 2026-02-21T10:23:47.7598867Z cvt.f32.bf16 %r2382, %rs218; 2026-02-21T10:23:47.7599047Z cvt.f32.bf16 %r2383, %rs221; 2026-02-21T10:23:47.7599226Z cvt.f32.bf16 %r2384, %rs222; 2026-02-21T10:23:47.7599481Z cvt.f32.bf16 %r2513, %rs195; 2026-02-21T10:23:47.7599673Z cvt.f32.bf16 %r2514, %rs196; 2026-02-21T10:23:47.7599849Z cvt.f32.bf16 %r2515, %rs199; 2026-02-21T10:23:47.7600030Z cvt.f32.bf16 %r2516, %rs200; 2026-02-21T10:23:47.7600203Z cvt.f32.bf16 %r2645, %rs203; 2026-02-21T10:23:47.7600380Z cvt.f32.bf16 %r2646, %rs204; 2026-02-21T10:23:47.7600553Z cvt.f32.bf16 %r2647, %rs207; 2026-02-21T10:23:47.7600731Z cvt.f32.bf16 %r2648, %rs208; 2026-02-21T10:23:47.7600906Z cvt.f32.bf16 %r2777, %rs211; 2026-02-21T10:23:47.7601084Z cvt.f32.bf16 %r2778, %rs212; 2026-02-21T10:23:47.7601266Z cvt.f32.bf16 %r2779, %rs215; 2026-02-21T10:23:47.7601443Z cvt.f32.bf16 %r2780, %rs216; 2026-02-21T10:23:47.7601621Z cvt.f32.bf16 %r2909, %rs219; 2026-02-21T10:23:47.7601793Z cvt.f32.bf16 %r2910, %rs220; 2026-02-21T10:23:47.7601972Z cvt.f32.bf16 %r2911, %rs223; 2026-02-21T10:23:47.7602143Z cvt.f32.bf16 %r2912, %rs224; 2026-02-21T10:23:47.7602500Z .loc 1 60 33 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:60:33 2026-02-21T10:23:47.7602860Z bar.sync 0; 2026-02-21T10:23:47.7603016Z // begin inline asm 2026-02-21T10:23:47.7603221Z @%p132 mbarrier.init.shared::cta.b64 [%r5769], 1; 2026-02-21T10:23:47.7603458Z // end inline asm 2026-02-21T10:23:47.7603611Z bar.sync 0; 2026-02-21T10:23:47.7603754Z // begin inline asm 2026-02-21T10:23:47.7603987Z @%p132 mbarrier.arrive.expect_tx.shared.b64 _, [%r5769], 4096; 2026-02-21T10:23:47.7604256Z // end inline asm 2026-02-21T10:23:47.7604411Z // begin inline asm 2026-02-21T10:23:47.7604591Z 
fence.proxy.async.shared::cta; 2026-02-21T10:23:47.7604786Z // end inline asm 2026-02-21T10:23:47.7604932Z bar.sync 0; 2026-02-21T10:23:47.7605089Z elect.sync %r5511|%p71, -1; 2026-02-21T10:23:47.7605284Z and.pred %p34, %p1, %p71; 2026-02-21T10:23:47.7605467Z or.b32 %r1852, %r637, 32; 2026-02-21T10:23:47.7605641Z // begin inline asm 2026-02-21T10:23:47.7606068Z @%p34 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r10980], [%rd281, {%r636, %r1852}], [%r5769]; 2026-02-21T10:23:47.7606661Z // end inline asm 2026-02-21T10:23:47.7606812Z bar.sync 0; 2026-02-21T10:23:47.7606958Z // begin inline asm 2026-02-21T10:23:47.7607106Z 2026-02-21T10:23:47.7607234Z { 2026-02-21T10:23:47.7607384Z .reg .pred complete; 2026-02-21T10:23:47.7607550Z waitLoop: 2026-02-21T10:23:47.7607776Z mbarrier.try_wait.parity.shared.b64 complete, [%r5769], %r5408; 2026-02-21T10:23:47.7608069Z @!complete bra.uni waitLoop; 2026-02-21T10:23:47.7608255Z } 2026-02-21T10:23:47.7608325Z 2026-02-21T10:23:47.7608471Z // end inline asm 2026-02-21T10:23:47.7608629Z bar.sync 0; 2026-02-21T10:23:47.7608771Z // begin inline asm 2026-02-21T10:23:47.7608971Z @%p132 mbarrier.inval.shared::cta.b64 [%r5769]; 2026-02-21T10:23:47.7609200Z // end inline asm 2026-02-21T10:23:47.7609589Z .loc 1 78 58 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:78:58 2026-02-21T10:23:47.7609963Z ld.shared.b8 %rs225, [%r28]; 2026-02-21T10:23:47.7610155Z ld.shared.b8 %rs226, [%r28+1024]; 2026-02-21T10:23:47.7610363Z ld.shared.b8 %rs227, [%r28+2048]; 2026-02-21T10:23:47.7610556Z ld.shared.b8 %rs228, [%r28+3072]; 2026-02-21T10:23:47.7610755Z ld.shared.b8 %rs229, [%r29+128]; 2026-02-21T10:23:47.7610949Z ld.shared.b8 %rs230, [%r29+1152]; 2026-02-21T10:23:47.7611144Z ld.shared.b8 %rs231, [%r29+2176]; 2026-02-21T10:23:47.7611339Z ld.shared.b8 %rs232, [%r29+3200]; 2026-02-21T10:23:47.7611548Z ld.shared.b8 %rs233, [%r30+256]; 2026-02-21T10:23:47.7611752Z ld.shared.b8 %rs234, [%r30+1280]; 
2026-02-21T10:23:47.7612029Z ld.shared.b8 %rs235, [%r30+2304]; 2026-02-21T10:23:47.7612237Z ld.shared.b8 %rs236, [%r30+3328]; 2026-02-21T10:23:47.7612432Z ld.shared.b8 %rs237, [%r31+384]; 2026-02-21T10:23:47.7612629Z ld.shared.b8 %rs238, [%r31+1408]; 2026-02-21T10:23:47.7612820Z ld.shared.b8 %rs239, [%r31+2432]; 2026-02-21T10:23:47.7613016Z ld.shared.b8 %rs240, [%r31+3456]; 2026-02-21T10:23:47.7613271Z ld.shared.b8 %rs241, [%r32+512]; 2026-02-21T10:23:47.7613475Z ld.shared.b8 %rs242, [%r32+1536]; 2026-02-21T10:23:47.7613673Z ld.shared.b8 %rs243, [%r32+2560]; 2026-02-21T10:23:47.7613870Z ld.shared.b8 %rs244, [%r32+3584]; 2026-02-21T10:23:47.7614065Z ld.shared.b8 %rs245, [%r33+640]; 2026-02-21T10:23:47.7614252Z ld.shared.b8 %rs246, [%r33+1664]; 2026-02-21T10:23:47.7614446Z ld.shared.b8 %rs247, [%r33+2688]; 2026-02-21T10:23:47.7614636Z ld.shared.b8 %rs248, [%r33+3712]; 2026-02-21T10:23:47.7614831Z ld.shared.b8 %rs249, [%r34+768]; 2026-02-21T10:23:47.7615018Z ld.shared.b8 %rs250, [%r34+1792]; 2026-02-21T10:23:47.7615219Z ld.shared.b8 %rs251, [%r34+2816]; 2026-02-21T10:23:47.7615412Z ld.shared.b8 %rs252, [%r34+3840]; 2026-02-21T10:23:47.7615603Z ld.shared.b8 %rs253, [%r35+896]; 2026-02-21T10:23:47.7615796Z ld.shared.b8 %rs254, [%r35+1920]; 2026-02-21T10:23:47.7615987Z ld.shared.b8 %rs255, [%r35+2944]; 2026-02-21T10:23:47.7616181Z ld.shared.b8 %rs256, [%r35+3968]; 2026-02-21T10:23:47.7616643Z .loc 1 63 28 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:63:28 2026-02-21T10:23:47.7617021Z shl.b16 %rs257, %rs225, 4; 2026-02-21T10:23:47.7617208Z shl.b16 %rs258, %rs229, 4; 2026-02-21T10:23:47.7617389Z shl.b16 %rs259, %rs233, 4; 2026-02-21T10:23:47.7617567Z shl.b16 %rs260, %rs237, 4; 2026-02-21T10:23:47.7617631Z shl.b16 %rs261, %rs241, 4; 2026-02-21T10:23:47.7617691Z shl.b16 %rs262, %rs245, 4; 2026-02-21T10:23:47.7617753Z shl.b16 %rs263, %rs249, 4; 2026-02-21T10:23:47.7617820Z shl.b16 %rs264, %rs253, 4; 2026-02-21T10:23:47.7617885Z shl.b16 %rs265, %rs226, 4; 
2026-02-21T10:23:47.7617950Z shl.b16 %rs266, %rs230, 4; 2026-02-21T10:23:47.7618018Z shl.b16 %rs267, %rs234, 4; 2026-02-21T10:23:47.7618081Z shl.b16 %rs268, %rs238, 4; 2026-02-21T10:23:47.7618143Z shl.b16 %rs269, %rs242, 4; 2026-02-21T10:23:47.7618209Z shl.b16 %rs270, %rs246, 4; 2026-02-21T10:23:47.7618285Z shl.b16 %rs271, %rs250, 4; 2026-02-21T10:23:47.7618353Z shl.b16 %rs272, %rs254, 4; 2026-02-21T10:23:47.7618419Z shl.b16 %rs273, %rs227, 4; 2026-02-21T10:23:47.7618486Z shl.b16 %rs274, %rs231, 4; 2026-02-21T10:23:47.7618550Z shl.b16 %rs275, %rs235, 4; 2026-02-21T10:23:47.7618611Z shl.b16 %rs276, %rs239, 4; 2026-02-21T10:23:47.7618675Z shl.b16 %rs277, %rs243, 4; 2026-02-21T10:23:47.7618743Z shl.b16 %rs278, %rs247, 4; 2026-02-21T10:23:47.7618805Z shl.b16 %rs279, %rs251, 4; 2026-02-21T10:23:47.7618868Z shl.b16 %rs280, %rs255, 4; 2026-02-21T10:23:47.7618935Z shl.b16 %rs281, %rs228, 4; 2026-02-21T10:23:47.7618995Z shl.b16 %rs282, %rs232, 4; 2026-02-21T10:23:47.7619146Z shl.b16 %rs283, %rs236, 4; 2026-02-21T10:23:47.7619208Z shl.b16 %rs284, %rs240, 4; 2026-02-21T10:23:47.7619276Z shl.b16 %rs285, %rs244, 4; 2026-02-21T10:23:47.7619338Z shl.b16 %rs286, %rs248, 4; 2026-02-21T10:23:47.7619462Z shl.b16 %rs287, %rs252, 4; 2026-02-21T10:23:47.7619528Z shl.b16 %rs288, %rs256, 4; 2026-02-21T10:23:47.7619760Z .loc 1 78 58 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:78:58 2026-02-21T10:23:47.7619841Z selp.b16 %rs289, %rs257, %rs225, %p188; 2026-02-21T10:23:47.7619909Z cvt.s16.s8 %rs290, %rs289; 2026-02-21T10:23:47.7619973Z shr.s16 %rs291, %rs290, 4; 2026-02-21T10:23:47.7620048Z selp.b16 %rs292, %rs258, %rs229, %p188; 2026-02-21T10:23:47.7620110Z cvt.s16.s8 %rs293, %rs292; 2026-02-21T10:23:47.7620178Z shr.s16 %rs294, %rs293, 4; 2026-02-21T10:23:47.7620250Z selp.b16 %rs295, %rs259, %rs233, %p188; 2026-02-21T10:23:47.7620316Z cvt.s16.s8 %rs296, %rs295; 2026-02-21T10:23:47.7620385Z shr.s16 %rs297, %rs296, 4; 2026-02-21T10:23:47.7620526Z selp.b16 %rs298, %rs260, %rs237, 
%p188; 2026-02-21T10:23:47.7620592Z cvt.s16.s8 %rs299, %rs298; 2026-02-21T10:23:47.7620655Z shr.s16 %rs300, %rs299, 4; 2026-02-21T10:23:47.7620731Z selp.b16 %rs301, %rs261, %rs241, %p188; 2026-02-21T10:23:47.7620799Z cvt.s16.s8 %rs302, %rs301; 2026-02-21T10:23:47.7620860Z shr.s16 %rs303, %rs302, 4; 2026-02-21T10:23:47.7620999Z selp.b16 %rs304, %rs262, %rs245, %p188; 2026-02-21T10:23:47.7621067Z cvt.s16.s8 %rs305, %rs304; 2026-02-21T10:23:47.7621130Z shr.s16 %rs306, %rs305, 4; 2026-02-21T10:23:47.7621204Z selp.b16 %rs307, %rs263, %rs249, %p188; 2026-02-21T10:23:47.7621274Z cvt.s16.s8 %rs308, %rs307; 2026-02-21T10:23:47.7621337Z shr.s16 %rs309, %rs308, 4; 2026-02-21T10:23:47.7621407Z selp.b16 %rs310, %rs264, %rs253, %p188; 2026-02-21T10:23:47.7621476Z cvt.s16.s8 %rs311, %rs310; 2026-02-21T10:23:47.7621539Z shr.s16 %rs312, %rs311, 4; 2026-02-21T10:23:47.7621611Z selp.b16 %rs313, %rs265, %rs226, %p188; 2026-02-21T10:23:47.7621696Z cvt.s16.s8 %rs314, %rs313; 2026-02-21T10:23:47.7621761Z shr.s16 %rs315, %rs314, 4; 2026-02-21T10:23:47.7621833Z selp.b16 %rs316, %rs266, %rs230, %p188; 2026-02-21T10:23:47.7621893Z cvt.s16.s8 %rs317, %rs316; 2026-02-21T10:23:47.7621967Z shr.s16 %rs318, %rs317, 4; 2026-02-21T10:23:47.7622044Z selp.b16 %rs319, %rs267, %rs234, %p188; 2026-02-21T10:23:47.7622105Z cvt.s16.s8 %rs320, %rs319; 2026-02-21T10:23:47.7622174Z shr.s16 %rs321, %rs320, 4; 2026-02-21T10:23:47.7622244Z selp.b16 %rs322, %rs268, %rs238, %p188; 2026-02-21T10:23:47.7622306Z cvt.s16.s8 %rs323, %rs322; 2026-02-21T10:23:47.7622367Z shr.s16 %rs324, %rs323, 4; 2026-02-21T10:23:47.7622443Z selp.b16 %rs325, %rs269, %rs242, %p188; 2026-02-21T10:23:47.7622505Z cvt.s16.s8 %rs326, %rs325; 2026-02-21T10:23:47.7622565Z shr.s16 %rs327, %rs326, 4; 2026-02-21T10:23:47.7622640Z selp.b16 %rs328, %rs270, %rs246, %p188; 2026-02-21T10:23:47.7622702Z cvt.s16.s8 %rs329, %rs328; 2026-02-21T10:23:47.7622767Z shr.s16 %rs330, %rs329, 4; 2026-02-21T10:23:47.7622841Z selp.b16 %rs331, %rs271, %rs250, %p188; 
2026-02-21T10:23:47.7622911Z cvt.s16.s8 %rs332, %rs331; 2026-02-21T10:23:47.7622985Z shr.s16 %rs333, %rs332, 4; 2026-02-21T10:23:47.7623060Z selp.b16 %rs334, %rs272, %rs254, %p188; 2026-02-21T10:23:47.7623130Z cvt.s16.s8 %rs335, %rs334; 2026-02-21T10:23:47.7623191Z shr.s16 %rs336, %rs335, 4; 2026-02-21T10:23:47.7623266Z selp.b16 %rs337, %rs273, %rs227, %p188; 2026-02-21T10:23:47.7623337Z cvt.s16.s8 %rs338, %rs337; 2026-02-21T10:23:47.7623398Z shr.s16 %rs339, %rs338, 4; 2026-02-21T10:23:47.7623468Z selp.b16 %rs340, %rs274, %rs231, %p188; 2026-02-21T10:23:47.7623530Z cvt.s16.s8 %rs341, %rs340; 2026-02-21T10:23:47.7623597Z shr.s16 %rs342, %rs341, 4; 2026-02-21T10:23:47.7623668Z selp.b16 %rs343, %rs275, %rs235, %p188; 2026-02-21T10:23:47.7623731Z cvt.s16.s8 %rs344, %rs343; 2026-02-21T10:23:47.7623798Z shr.s16 %rs345, %rs344, 4; 2026-02-21T10:23:47.7623869Z selp.b16 %rs346, %rs276, %rs239, %p188; 2026-02-21T10:23:47.7623998Z cvt.s16.s8 %rs347, %rs346; 2026-02-21T10:23:47.7624060Z shr.s16 %rs348, %rs347, 4; 2026-02-21T10:23:47.7624137Z selp.b16 %rs349, %rs277, %rs243, %p188; 2026-02-21T10:23:47.7624212Z cvt.s16.s8 %rs350, %rs349; 2026-02-21T10:23:47.7624345Z shr.s16 %rs351, %rs350, 4; 2026-02-21T10:23:47.7624422Z selp.b16 %rs352, %rs278, %rs247, %p188; 2026-02-21T10:23:47.7624484Z cvt.s16.s8 %rs353, %rs352; 2026-02-21T10:23:47.7624550Z shr.s16 %rs354, %rs353, 4; 2026-02-21T10:23:47.7624622Z selp.b16 %rs355, %rs279, %rs251, %p188; 2026-02-21T10:23:47.7624689Z cvt.s16.s8 %rs356, %rs355; 2026-02-21T10:23:47.7624751Z shr.s16 %rs357, %rs356, 4; 2026-02-21T10:23:47.7624823Z selp.b16 %rs358, %rs280, %rs255, %p188; 2026-02-21T10:23:47.7624890Z cvt.s16.s8 %rs359, %rs358; 2026-02-21T10:23:47.7624952Z shr.s16 %rs360, %rs359, 4; 2026-02-21T10:23:47.7625023Z selp.b16 %rs361, %rs281, %rs228, %p188; 2026-02-21T10:23:47.7625090Z cvt.s16.s8 %rs362, %rs361; 2026-02-21T10:23:47.7625155Z shr.s16 %rs363, %rs362, 4; 2026-02-21T10:23:47.7625278Z selp.b16 %rs364, %rs282, %rs232, %p188; 
2026-02-21T10:23:47.7625343Z cvt.s16.s8 %rs365, %rs364; 2026-02-21T10:23:47.7625412Z shr.s16 %rs366, %rs365, 4; 2026-02-21T10:23:47.7625483Z selp.b16 %rs367, %rs283, %rs236, %p188; 2026-02-21T10:23:47.7625548Z cvt.s16.s8 %rs368, %rs367; 2026-02-21T10:23:47.7625619Z shr.s16 %rs369, %rs368, 4; 2026-02-21T10:23:47.7625734Z selp.b16 %rs370, %rs284, %rs240, %p188; 2026-02-21T10:23:47.7625797Z cvt.s16.s8 %rs371, %rs370; 2026-02-21T10:23:47.7625862Z shr.s16 %rs372, %rs371, 4; 2026-02-21T10:23:47.7625938Z selp.b16 %rs373, %rs285, %rs244, %p188; 2026-02-21T10:23:47.7626001Z cvt.s16.s8 %rs374, %rs373; 2026-02-21T10:23:47.7626063Z shr.s16 %rs375, %rs374, 4; 2026-02-21T10:23:47.7626144Z selp.b16 %rs376, %rs286, %rs248, %p188; 2026-02-21T10:23:47.7626213Z cvt.s16.s8 %rs377, %rs376; 2026-02-21T10:23:47.7626276Z shr.s16 %rs378, %rs377, 4; 2026-02-21T10:23:47.7626347Z selp.b16 %rs379, %rs287, %rs252, %p188; 2026-02-21T10:23:47.7626417Z cvt.s16.s8 %rs380, %rs379; 2026-02-21T10:23:47.7626589Z shr.s16 %rs381, %rs380, 4; 2026-02-21T10:23:47.7626664Z selp.b16 %rs382, %rs288, %rs256, %p188; 2026-02-21T10:23:47.7626733Z cvt.s16.s8 %rs383, %rs382; 2026-02-21T10:23:47.7626814Z shr.s16 %rs384, %rs383, 4; 2026-02-21T10:23:47.7627028Z .loc 1 83 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:83:32 2026-02-21T10:23:47.7627107Z cvt.rn.f32.s16 %r5512, %rs291; 2026-02-21T10:23:47.7627173Z cvt.rn.f32.s16 %r5513, %rs294; 2026-02-21T10:23:47.7627238Z cvt.rn.f32.s16 %r5514, %rs297; 2026-02-21T10:23:47.7627302Z cvt.rn.f32.s16 %r5515, %rs300; 2026-02-21T10:23:47.7627371Z cvt.rn.f32.s16 %r5516, %rs303; 2026-02-21T10:23:47.7627435Z cvt.rn.f32.s16 %r5517, %rs306; 2026-02-21T10:23:47.7627501Z cvt.rn.f32.s16 %r5518, %rs309; 2026-02-21T10:23:47.7627570Z cvt.rn.f32.s16 %r5519, %rs312; 2026-02-21T10:23:47.7627633Z cvt.rn.f32.s16 %r5520, %rs315; 2026-02-21T10:23:47.7627701Z cvt.rn.f32.s16 %r5521, %rs318; 2026-02-21T10:23:47.7627765Z cvt.rn.f32.s16 %r5522, %rs321; 2026-02-21T10:23:47.7627836Z 
cvt.rn.f32.s16 %r5523, %rs324; 2026-02-21T10:23:47.7627900Z cvt.rn.f32.s16 %r5524, %rs327; 2026-02-21T10:23:47.7627965Z cvt.rn.f32.s16 %r5525, %rs330; 2026-02-21T10:23:47.7628037Z cvt.rn.f32.s16 %r5526, %rs333; 2026-02-21T10:23:47.7628101Z cvt.rn.f32.s16 %r5527, %rs336; 2026-02-21T10:23:47.7628168Z cvt.rn.f32.s16 %r5528, %rs339; 2026-02-21T10:23:47.7628234Z cvt.rn.f32.s16 %r5529, %rs342; 2026-02-21T10:23:47.7628304Z cvt.rn.f32.s16 %r5530, %rs345; 2026-02-21T10:23:47.7628367Z cvt.rn.f32.s16 %r5531, %rs348; 2026-02-21T10:23:47.7628434Z cvt.rn.f32.s16 %r5532, %rs351; 2026-02-21T10:23:47.7628581Z cvt.rn.f32.s16 %r5533, %rs354; 2026-02-21T10:23:47.7628648Z cvt.rn.f32.s16 %r5534, %rs357; 2026-02-21T10:23:47.7628712Z cvt.rn.f32.s16 %r5535, %rs360; 2026-02-21T10:23:47.7628780Z cvt.rn.f32.s16 %r5536, %rs363; 2026-02-21T10:23:47.7628845Z cvt.rn.f32.s16 %r5537, %rs366; 2026-02-21T10:23:47.7628995Z cvt.rn.f32.s16 %r5538, %rs369; 2026-02-21T10:23:47.7629061Z cvt.rn.f32.s16 %r5539, %rs372; 2026-02-21T10:23:47.7629128Z cvt.rn.f32.s16 %r5540, %rs375; 2026-02-21T10:23:47.7629193Z cvt.rn.f32.s16 %r5541, %rs378; 2026-02-21T10:23:47.7629319Z cvt.rn.f32.s16 %r5542, %rs381; 2026-02-21T10:23:47.7629386Z cvt.rn.f32.s16 %r5543, %rs384; 2026-02-21T10:23:47.7629444Z bar.sync 0; 2026-02-21T10:23:47.7629512Z st.shared.b32 [%r36], %r5512; 2026-02-21T10:23:47.7629582Z st.shared.b32 [%r36+8], %r5513; 2026-02-21T10:23:47.7629662Z st.shared.b32 [%r36+16384], %r5528; 2026-02-21T10:23:47.7629729Z st.shared.b32 [%r36+16392], %r5529; 2026-02-21T10:23:47.7629795Z st.shared.b32 [%r37], %r5514; 2026-02-21T10:23:47.7629865Z st.shared.b32 [%r37+8], %r5515; 2026-02-21T10:23:47.7629931Z st.shared.b32 [%r37+16384], %r5530; 2026-02-21T10:23:47.7629997Z st.shared.b32 [%r37+16392], %r5531; 2026-02-21T10:23:47.7630061Z st.shared.b32 [%r38], %r5516; 2026-02-21T10:23:47.7630196Z st.shared.b32 [%r38+8], %r5517; 2026-02-21T10:23:47.7630265Z st.shared.b32 [%r38+16384], %r5532; 2026-02-21T10:23:47.7630332Z st.shared.b32 
[%r38+16392], %r5533; 2026-02-21T10:23:47.7630401Z st.shared.b32 [%r39], %r5518; 2026-02-21T10:23:47.7630468Z st.shared.b32 [%r39+8], %r5519; 2026-02-21T10:23:47.7630537Z st.shared.b32 [%r39+16384], %r5534; 2026-02-21T10:23:47.7630614Z st.shared.b32 [%r39+16392], %r5535; 2026-02-21T10:23:47.7630749Z st.shared.b32 [%r40], %r5520; 2026-02-21T10:23:47.7630820Z st.shared.b32 [%r40+8], %r5521; 2026-02-21T10:23:47.7630887Z st.shared.b32 [%r40+16384], %r5536; 2026-02-21T10:23:47.7630958Z st.shared.b32 [%r40+16392], %r5537; 2026-02-21T10:23:47.7631023Z st.shared.b32 [%r41], %r5522; 2026-02-21T10:23:47.7631088Z st.shared.b32 [%r41+8], %r5523; 2026-02-21T10:23:47.7631159Z st.shared.b32 [%r41+16384], %r5538; 2026-02-21T10:23:47.7631225Z st.shared.b32 [%r41+16392], %r5539; 2026-02-21T10:23:47.7631292Z st.shared.b32 [%r42], %r5524; 2026-02-21T10:23:47.7631363Z st.shared.b32 [%r42+8], %r5525; 2026-02-21T10:23:47.7631438Z st.shared.b32 [%r42+16384], %r5540; 2026-02-21T10:23:47.7631504Z st.shared.b32 [%r42+16392], %r5541; 2026-02-21T10:23:47.7631568Z st.shared.b32 [%r43], %r5526; 2026-02-21T10:23:47.7631642Z st.shared.b32 [%r43+8], %r5527; 2026-02-21T10:23:47.7631709Z st.shared.b32 [%r43+16384], %r5542; 2026-02-21T10:23:47.7631775Z st.shared.b32 [%r43+16392], %r5543; 2026-02-21T10:23:47.7631839Z $L__tmp3: 2026-02-21T10:23:47.7632130Z .loc 2 291 36 // standard.py:291:36 @[ cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:90:40 ] 2026-02-21T10:23:47.7632194Z // begin inline asm 2026-02-21T10:23:47.7632275Z fence.proxy.async.shared::cta; 2026-02-21T10:23:47.7632341Z // end inline asm 2026-02-21T10:23:47.7632398Z bar.sync 0; 2026-02-21T10:23:47.7632474Z wgmma.fence.sync.aligned; 2026-02-21T10:23:47.7632542Z // begin inline asm 2026-02-21T10:23:47.7634046Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r1985,%r1986,%r1987,%r1988}, %rd2, %p24, 1, 1; 2026-02-21T10:23:47.7634112Z // end inline asm 2026-02-21T10:23:47.7634192Z // begin inline asm 2026-02-21T10:23:47.7635676Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r2117,%r2118,%r2119,%r2120}, %rd3, %p24, 1, 1; 2026-02-21T10:23:47.7635843Z // end inline asm 2026-02-21T10:23:47.7635903Z // begin inline asm 2026-02-21T10:23:47.7637510Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, 
{%r2249,%r2250,%r2251,%r2252}, %rd4, %p24, 1, 1; 2026-02-21T10:23:47.7637649Z // end inline asm 2026-02-21T10:23:47.7637715Z // begin inline asm 2026-02-21T10:23:47.7639253Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r2381,%r2382,%r2383,%r2384}, %rd5, %p24, 1, 1; 2026-02-21T10:23:47.7639326Z // end inline asm 2026-02-21T10:23:47.7639388Z // begin inline asm 2026-02-21T10:23:47.7640870Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r2513,%r2514,%r2515,%r2516}, %rd6, %p24, 1, 1; 2026-02-21T10:23:47.7640944Z // end inline asm 2026-02-21T10:23:47.7641007Z // begin inline asm 2026-02-21T10:23:47.7642491Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r2645,%r2646,%r2647,%r2648}, %rd7, %p24, 1, 1; 2026-02-21T10:23:47.7642553Z // end inline asm 2026-02-21T10:23:47.7642617Z // begin inline asm 2026-02-21T10:23:47.7644104Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r2777,%r2778,%r2779,%r2780}, %rd8, %p24, 1, 1; 2026-02-21T10:23:47.7644165Z // end inline asm 2026-02-21T10:23:47.7644309Z // begin inline asm 2026-02-21T10:23:47.7645792Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, 
{%r2909,%r2910,%r2911,%r2912}, %rd9, %p24, 1, 1; 2026-02-21T10:23:47.7645914Z // end inline asm 2026-02-21T10:23:47.7645999Z wgmma.commit_group.sync.aligned; 2026-02-21T10:23:47.7646063Z mov.b32 %r2978, %r5408; 2026-02-21T10:23:47.7646123Z mov.b32 %r2979, %r5408; 2026-02-21T10:23:47.7646187Z mov.b32 %r2977, %r10980; 2026-02-21T10:23:47.7646256Z // begin inline asm 2026-02-21T10:23:47.7647861Z // wait for regs: %r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154,%r2977,%r2978,%r2979 2026-02-21T10:23:47.7647958Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:23:47.7648017Z // end inline asm 2026-02-21T10:23:47.7648072Z $L__tmp4: 2026-02-21T10:23:47.7648304Z .loc 1 54 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:54:32 2026-02-21T10:23:47.7648369Z add.s64 %rd118, %rd76, 256; 2026-02-21T10:23:47.7648431Z add.s64 %rd121, %rd79, 256; 2026-02-21T10:23:47.7648494Z add.s64 %rd124, %rd82, 256; 2026-02-21T10:23:47.7648703Z .loc 1 54 80 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:54:80 2026-02-21T10:23:47.7648765Z add.s64 %rd127, %rd85, 256; 2026-02-21T10:23:47.7648828Z // begin inline asm 2026-02-21T10:23:47.7648892Z mov.u64 %rd117, 0x0; 2026-02-21T10:23:47.7649019Z createpolicy.fractional.L2::evict_last.b64 %rd117, 1.0; 2026-02-21T10:23:47.7649079Z // end inline asm 2026-02-21T10:23:47.7649144Z // begin inline asm 2026-02-21T10:23:47.7649205Z mov.u32 %r3047, 0x0; 2026-02-21T10:23:47.7649261Z mov.u32 %r3048, 0x0; 2026-02-21T10:23:47.7649319Z mov.u32 %r3049, 0x0; 2026-02-21T10:23:47.7649381Z 
mov.u32 %r3050, 0x0; 2026-02-21T10:23:47.7649608Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r3047, %r3048, %r3049, %r3050 }, [ %rd118 + 0 ], %rd117; 2026-02-21T10:23:47.7649666Z // end inline asm 2026-02-21T10:23:47.7649728Z // begin inline asm 2026-02-21T10:23:47.7649787Z mov.u64 %rd120, 0x0; 2026-02-21T10:23:47.7649905Z createpolicy.fractional.L2::evict_last.b64 %rd120, 1.0; 2026-02-21T10:23:47.7649971Z // end inline asm 2026-02-21T10:23:47.7650031Z // begin inline asm 2026-02-21T10:23:47.7650089Z mov.u32 %r3051, 0x0; 2026-02-21T10:23:47.7650146Z mov.u32 %r3052, 0x0; 2026-02-21T10:23:47.7650217Z mov.u32 %r3053, 0x0; 2026-02-21T10:23:47.7650278Z mov.u32 %r3054, 0x0; 2026-02-21T10:23:47.7650494Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r3051, %r3052, %r3053, %r3054 }, [ %rd121 + 0 ], %rd120; 2026-02-21T10:23:47.7650559Z // end inline asm 2026-02-21T10:23:47.7650619Z // begin inline asm 2026-02-21T10:23:47.7650677Z mov.u64 %rd123, 0x0; 2026-02-21T10:23:47.7650797Z createpolicy.fractional.L2::evict_last.b64 %rd123, 1.0; 2026-02-21T10:23:47.7650855Z // end inline asm 2026-02-21T10:23:47.7650914Z // begin inline asm 2026-02-21T10:23:47.7650972Z mov.u32 %r3055, 0x0; 2026-02-21T10:23:47.7651032Z mov.u32 %r3056, 0x0; 2026-02-21T10:23:47.7651088Z mov.u32 %r3057, 0x0; 2026-02-21T10:23:47.7651145Z mov.u32 %r3058, 0x0; 2026-02-21T10:23:47.7651478Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r3055, %r3056, %r3057, %r3058 }, [ %rd124 + 0 ], %rd123; 2026-02-21T10:23:47.7651536Z // end inline asm 2026-02-21T10:23:47.7651596Z // begin inline asm 2026-02-21T10:23:47.7651653Z mov.u64 %rd126, 0x0; 2026-02-21T10:23:47.7651839Z createpolicy.fractional.L2::evict_last.b64 %rd126, 1.0; 2026-02-21T10:23:47.7651898Z // end inline asm 2026-02-21T10:23:47.7651963Z // begin inline asm 2026-02-21T10:23:47.7652026Z mov.u32 %r3059, 0x0; 2026-02-21T10:23:47.7652085Z mov.u32 %r3060, 0x0; 2026-02-21T10:23:47.7652142Z mov.u32 %r3061, 0x0; 2026-02-21T10:23:47.7652203Z mov.u32 
%r3062, 0x0; 2026-02-21T10:23:47.7652419Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r3059, %r3060, %r3061, %r3062 }, [ %rd127 + 0 ], %rd126; 2026-02-21T10:23:47.7652479Z // end inline asm 2026-02-21T10:23:47.7652689Z .loc 1 58 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:58:32 2026-02-21T10:23:47.7652767Z bar.sync 0; 2026-02-21T10:23:47.7652907Z st.shared.v2.b32 [%r18], {%r3047, %r3048}; 2026-02-21T10:23:47.7653001Z st.shared.v2.b32 [%r18+4096], {%r3051, %r3052}; 2026-02-21T10:23:47.7653092Z st.shared.v2.b32 [%r18+8192], {%r3055, %r3056}; 2026-02-21T10:23:47.7653181Z st.shared.v2.b32 [%r18+12288], {%r3059, %r3060}; 2026-02-21T10:23:47.7653260Z st.shared.v2.b32 [%r19], {%r3049, %r3050}; 2026-02-21T10:23:47.7653346Z st.shared.v2.b32 [%r19+4096], {%r3053, %r3054}; 2026-02-21T10:23:47.7653473Z st.shared.v2.b32 [%r19+8192], {%r3057, %r3058}; 2026-02-21T10:23:47.7653560Z st.shared.v2.b32 [%r19+12288], {%r3061, %r3062}; 2026-02-21T10:23:47.7653617Z bar.sync 0; 2026-02-21T10:23:47.7653691Z ld.shared.b16 %rs385, [%r20]; 2026-02-21T10:23:47.7653759Z ld.shared.b16 %rs386, [%r20+1024]; 2026-02-21T10:23:47.7653826Z ld.shared.b16 %rs387, [%r20+64]; 2026-02-21T10:23:47.7653895Z ld.shared.b16 %rs388, [%r20+1088]; 2026-02-21T10:23:47.7653963Z ld.shared.b16 %rs389, [%r21]; 2026-02-21T10:23:47.7654037Z ld.shared.b16 %rs390, [%r21+1024]; 2026-02-21T10:23:47.7654109Z ld.shared.b16 %rs391, [%r21+64]; 2026-02-21T10:23:47.7654183Z ld.shared.b16 %rs392, [%r21+1088]; 2026-02-21T10:23:47.7654249Z ld.shared.b16 %rs393, [%r22]; 2026-02-21T10:23:47.7654314Z ld.shared.b16 %rs394, [%r22+1024]; 2026-02-21T10:23:47.7654390Z ld.shared.b16 %rs395, [%r22+64]; 2026-02-21T10:23:47.7654456Z ld.shared.b16 %rs396, [%r22+1088]; 2026-02-21T10:23:47.7654524Z ld.shared.b16 %rs397, [%r23]; 2026-02-21T10:23:47.7654589Z ld.shared.b16 %rs398, [%r23+1024]; 2026-02-21T10:23:47.7654658Z ld.shared.b16 %rs399, [%r23+64]; 2026-02-21T10:23:47.7654721Z ld.shared.b16 %rs400, [%r23+1088]; 
2026-02-21T10:23:47.7654787Z ld.shared.b16 %rs401, [%r24]; 2026-02-21T10:23:47.7654859Z ld.shared.b16 %rs402, [%r24+1024]; 2026-02-21T10:23:47.7654926Z ld.shared.b16 %rs403, [%r24+64]; 2026-02-21T10:23:47.7654993Z ld.shared.b16 %rs404, [%r24+1088]; 2026-02-21T10:23:47.7655066Z ld.shared.b16 %rs405, [%r25]; 2026-02-21T10:23:47.7655129Z ld.shared.b16 %rs406, [%r25+1024]; 2026-02-21T10:23:47.7655199Z ld.shared.b16 %rs407, [%r25+64]; 2026-02-21T10:23:47.7655262Z ld.shared.b16 %rs408, [%r25+1088]; 2026-02-21T10:23:47.7655332Z ld.shared.b16 %rs409, [%r26]; 2026-02-21T10:23:47.7655395Z ld.shared.b16 %rs410, [%r26+1024]; 2026-02-21T10:23:47.7655462Z ld.shared.b16 %rs411, [%r26+64]; 2026-02-21T10:23:47.7655534Z ld.shared.b16 %rs412, [%r26+1088]; 2026-02-21T10:23:47.7655597Z ld.shared.b16 %rs413, [%r27]; 2026-02-21T10:23:47.7655665Z ld.shared.b16 %rs414, [%r27+1024]; 2026-02-21T10:23:47.7655735Z ld.shared.b16 %rs415, [%r27+64]; 2026-02-21T10:23:47.7655808Z ld.shared.b16 %rs416, [%r27+1088]; 2026-02-21T10:23:47.7655876Z cvt.f32.bf16 %r3200, %rs385; 2026-02-21T10:23:47.7655943Z cvt.f32.bf16 %r3201, %rs386; 2026-02-21T10:23:47.7656010Z cvt.f32.bf16 %r3202, %rs389; 2026-02-21T10:23:47.7656073Z cvt.f32.bf16 %r3203, %rs390; 2026-02-21T10:23:47.7656137Z cvt.f32.bf16 %r3332, %rs393; 2026-02-21T10:23:47.7656207Z cvt.f32.bf16 %r3333, %rs394; 2026-02-21T10:23:47.7656342Z cvt.f32.bf16 %r3334, %rs397; 2026-02-21T10:23:47.7656406Z cvt.f32.bf16 %r3335, %rs398; 2026-02-21T10:23:47.7656588Z cvt.f32.bf16 %r3464, %rs401; 2026-02-21T10:23:47.7656660Z cvt.f32.bf16 %r3465, %rs402; 2026-02-21T10:23:47.7656821Z cvt.f32.bf16 %r3466, %rs405; 2026-02-21T10:23:47.7656883Z cvt.f32.bf16 %r3467, %rs406; 2026-02-21T10:23:47.7656951Z cvt.f32.bf16 %r3596, %rs409; 2026-02-21T10:23:47.7657016Z cvt.f32.bf16 %r3597, %rs410; 2026-02-21T10:23:47.7657080Z cvt.f32.bf16 %r3598, %rs413; 2026-02-21T10:23:47.7657146Z cvt.f32.bf16 %r3599, %rs414; 2026-02-21T10:23:47.7657213Z cvt.f32.bf16 %r3728, %rs387; 
2026-02-21T10:23:47.7657279Z cvt.f32.bf16 %r3729, %rs388; 2026-02-21T10:23:47.7657341Z cvt.f32.bf16 %r3730, %rs391; 2026-02-21T10:23:47.7657404Z cvt.f32.bf16 %r3731, %rs392; 2026-02-21T10:23:47.7657478Z cvt.f32.bf16 %r3860, %rs395; 2026-02-21T10:23:47.7657543Z cvt.f32.bf16 %r3861, %rs396; 2026-02-21T10:23:47.7657604Z cvt.f32.bf16 %r3862, %rs399; 2026-02-21T10:23:47.7657743Z cvt.f32.bf16 %r3863, %rs400; 2026-02-21T10:23:47.7657816Z cvt.f32.bf16 %r3992, %rs403; 2026-02-21T10:23:47.7657882Z cvt.f32.bf16 %r3993, %rs404; 2026-02-21T10:23:47.7657947Z cvt.f32.bf16 %r3994, %rs407; 2026-02-21T10:23:47.7658006Z cvt.f32.bf16 %r3995, %rs408; 2026-02-21T10:23:47.7658071Z cvt.f32.bf16 %r4124, %rs411; 2026-02-21T10:23:47.7658131Z cvt.f32.bf16 %r4125, %rs412; 2026-02-21T10:23:47.7658257Z cvt.f32.bf16 %r4126, %rs415; 2026-02-21T10:23:47.7658323Z cvt.f32.bf16 %r4127, %rs416; 2026-02-21T10:23:47.7658542Z .loc 1 60 33 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:60:33 2026-02-21T10:23:47.7658605Z bar.sync 0; 2026-02-21T10:23:47.7658667Z // begin inline asm 2026-02-21T10:23:47.7658769Z @%p132 mbarrier.init.shared::cta.b64 [%r5769], 1; 2026-02-21T10:23:47.7658847Z // end inline asm 2026-02-21T10:23:47.7658909Z bar.sync 0; 2026-02-21T10:23:47.7658970Z // begin inline asm 2026-02-21T10:23:47.7659108Z @%p132 mbarrier.arrive.expect_tx.shared.b64 _, [%r5769], 4096; 2026-02-21T10:23:47.7659180Z // end inline asm 2026-02-21T10:23:47.7659240Z // begin inline asm 2026-02-21T10:23:47.7659321Z fence.proxy.async.shared::cta; 2026-02-21T10:23:47.7659381Z // end inline asm 2026-02-21T10:23:47.7659444Z bar.sync 0; 2026-02-21T10:23:47.7659515Z elect.sync %r5544|%p72, -1; 2026-02-21T10:23:47.7659587Z and.pred %p46, %p1, %p72; 2026-02-21T10:23:47.7659657Z or.b32 %r3067, %r637, 64; 2026-02-21T10:23:47.7659719Z // begin inline asm 2026-02-21T10:23:47.7660052Z @%p46 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r10980], [%rd281, {%r636, %r3067}], [%r5769]; 
2026-02-21T10:23:47.7660117Z // end inline asm 2026-02-21T10:23:47.7660174Z bar.sync 0; 2026-02-21T10:23:47.7660235Z // begin inline asm 2026-02-21T10:23:47.7660292Z 2026-02-21T10:23:47.7660345Z { 2026-02-21T10:23:47.7660413Z .reg .pred complete; 2026-02-21T10:23:47.7660471Z waitLoop: 2026-02-21T10:23:47.7660621Z mbarrier.try_wait.parity.shared.b64 complete, [%r5769], %r5408; 2026-02-21T10:23:47.7660700Z @!complete bra.uni waitLoop; 2026-02-21T10:23:47.7660757Z } 2026-02-21T10:23:47.7660761Z 2026-02-21T10:23:47.7660825Z // end inline asm 2026-02-21T10:23:47.7660883Z bar.sync 0; 2026-02-21T10:23:47.7660947Z // begin inline asm 2026-02-21T10:23:47.7661044Z @%p132 mbarrier.inval.shared::cta.b64 [%r5769]; 2026-02-21T10:23:47.7661109Z // end inline asm 2026-02-21T10:23:47.7661326Z .loc 1 78 58 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:78:58 2026-02-21T10:23:47.7661395Z ld.shared.b8 %rs417, [%r28]; 2026-02-21T10:23:47.7661470Z ld.shared.b8 %rs418, [%r28+1024]; 2026-02-21T10:23:47.7661537Z ld.shared.b8 %rs419, [%r28+2048]; 2026-02-21T10:23:47.7661603Z ld.shared.b8 %rs420, [%r28+3072]; 2026-02-21T10:23:47.7661676Z ld.shared.b8 %rs421, [%r29+128]; 2026-02-21T10:23:47.7661743Z ld.shared.b8 %rs422, [%r29+1152]; 2026-02-21T10:23:47.7661808Z ld.shared.b8 %rs423, [%r29+2176]; 2026-02-21T10:23:47.7661967Z ld.shared.b8 %rs424, [%r29+3200]; 2026-02-21T10:23:47.7662046Z ld.shared.b8 %rs425, [%r30+256]; 2026-02-21T10:23:47.7662113Z ld.shared.b8 %rs426, [%r30+1280]; 2026-02-21T10:23:47.7662179Z ld.shared.b8 %rs427, [%r30+2304]; 2026-02-21T10:23:47.7662296Z ld.shared.b8 %rs428, [%r30+3328]; 2026-02-21T10:23:47.7662363Z ld.shared.b8 %rs429, [%r31+384]; 2026-02-21T10:23:47.7662429Z ld.shared.b8 %rs430, [%r31+1408]; 2026-02-21T10:23:47.7662498Z ld.shared.b8 %rs431, [%r31+2432]; 2026-02-21T10:23:47.7662570Z ld.shared.b8 %rs432, [%r31+3456]; 2026-02-21T10:23:47.7662636Z ld.shared.b8 %rs433, [%r32+512]; 2026-02-21T10:23:47.7662702Z ld.shared.b8 %rs434, [%r32+1536]; 
2026-02-21T10:23:47.7662774Z ld.shared.b8 %rs435, [%r32+2560]; 2026-02-21T10:23:47.7662839Z ld.shared.b8 %rs436, [%r32+3584]; 2026-02-21T10:23:47.7662904Z ld.shared.b8 %rs437, [%r33+640]; 2026-02-21T10:23:47.7662974Z ld.shared.b8 %rs438, [%r33+1664]; 2026-02-21T10:23:47.7663043Z ld.shared.b8 %rs439, [%r33+2688]; 2026-02-21T10:23:47.7663165Z ld.shared.b8 %rs440, [%r33+3712]; 2026-02-21T10:23:47.7663234Z ld.shared.b8 %rs441, [%r34+768]; 2026-02-21T10:23:47.7663305Z ld.shared.b8 %rs442, [%r34+1792]; 2026-02-21T10:23:47.7663370Z ld.shared.b8 %rs443, [%r34+2816]; 2026-02-21T10:23:47.7663448Z ld.shared.b8 %rs444, [%r34+3840]; 2026-02-21T10:23:47.7663518Z ld.shared.b8 %rs445, [%r35+896]; 2026-02-21T10:23:47.7663582Z ld.shared.b8 %rs446, [%r35+1920]; 2026-02-21T10:23:47.7663691Z ld.shared.b8 %rs447, [%r35+2944]; 2026-02-21T10:23:47.7663760Z ld.shared.b8 %rs448, [%r35+3968]; 2026-02-21T10:23:47.7663979Z .loc 1 63 28 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:63:28 2026-02-21T10:23:47.7664045Z shl.b16 %rs449, %rs417, 4; 2026-02-21T10:23:47.7664110Z shl.b16 %rs450, %rs421, 4; 2026-02-21T10:23:47.7664182Z shl.b16 %rs451, %rs425, 4; 2026-02-21T10:23:47.7664246Z shl.b16 %rs452, %rs429, 4; 2026-02-21T10:23:47.7664308Z shl.b16 %rs453, %rs433, 4; 2026-02-21T10:23:47.7664373Z shl.b16 %rs454, %rs437, 4; 2026-02-21T10:23:47.7664441Z shl.b16 %rs455, %rs441, 4; 2026-02-21T10:23:47.7664503Z shl.b16 %rs456, %rs445, 4; 2026-02-21T10:23:47.7664565Z shl.b16 %rs457, %rs418, 4; 2026-02-21T10:23:47.7664635Z shl.b16 %rs458, %rs422, 4; 2026-02-21T10:23:47.7664701Z shl.b16 %rs459, %rs426, 4; 2026-02-21T10:23:47.7664763Z shl.b16 %rs460, %rs430, 4; 2026-02-21T10:23:47.7664832Z shl.b16 %rs461, %rs434, 4; 2026-02-21T10:23:47.7664897Z shl.b16 %rs462, %rs438, 4; 2026-02-21T10:23:47.7664957Z shl.b16 %rs463, %rs442, 4; 2026-02-21T10:23:47.7665020Z shl.b16 %rs464, %rs446, 4; 2026-02-21T10:23:47.7665089Z shl.b16 %rs465, %rs419, 4; 2026-02-21T10:23:47.7665151Z shl.b16 %rs466, %rs423, 4; 
2026-02-21T10:23:47.7665212Z shl.b16 %rs467, %rs427, 4; 2026-02-21T10:23:47.7665278Z shl.b16 %rs468, %rs431, 4; 2026-02-21T10:23:47.7665342Z shl.b16 %rs469, %rs435, 4; 2026-02-21T10:23:47.7665405Z shl.b16 %rs470, %rs439, 4; 2026-02-21T10:23:47.7665467Z shl.b16 %rs471, %rs443, 4; 2026-02-21T10:23:47.7665540Z shl.b16 %rs472, %rs447, 4; 2026-02-21T10:23:47.7665602Z shl.b16 %rs473, %rs420, 4; 2026-02-21T10:23:47.7665665Z shl.b16 %rs474, %rs424, 4; 2026-02-21T10:23:47.7665731Z shl.b16 %rs475, %rs428, 4; 2026-02-21T10:23:47.7665797Z shl.b16 %rs476, %rs432, 4; 2026-02-21T10:23:47.7665860Z shl.b16 %rs477, %rs436, 4; 2026-02-21T10:23:47.7665923Z shl.b16 %rs478, %rs440, 4; 2026-02-21T10:23:47.7665991Z shl.b16 %rs479, %rs444, 4; 2026-02-21T10:23:47.7666055Z shl.b16 %rs480, %rs448, 4; 2026-02-21T10:23:47.7666280Z .loc 1 78 58 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:78:58 2026-02-21T10:23:47.7666365Z selp.b16 %rs481, %rs449, %rs417, %p188; 2026-02-21T10:23:47.7666429Z cvt.s16.s8 %rs482, %rs481; 2026-02-21T10:23:47.7666614Z shr.s16 %rs483, %rs482, 4; 2026-02-21T10:23:47.7666694Z selp.b16 %rs484, %rs450, %rs421, %p188; 2026-02-21T10:23:47.7666761Z cvt.s16.s8 %rs485, %rs484; 2026-02-21T10:23:47.7666824Z shr.s16 %rs486, %rs485, 4; 2026-02-21T10:23:47.7666994Z selp.b16 %rs487, %rs451, %rs425, %p188; 2026-02-21T10:23:47.7667062Z cvt.s16.s8 %rs488, %rs487; 2026-02-21T10:23:47.7667124Z shr.s16 %rs489, %rs488, 4; 2026-02-21T10:23:47.7667198Z selp.b16 %rs490, %rs452, %rs429, %p188; 2026-02-21T10:23:47.7667329Z cvt.s16.s8 %rs491, %rs490; 2026-02-21T10:23:47.7667391Z shr.s16 %rs492, %rs491, 4; 2026-02-21T10:23:47.7667464Z selp.b16 %rs493, %rs453, %rs433, %p188; 2026-02-21T10:23:47.7667534Z cvt.s16.s8 %rs494, %rs493; 2026-02-21T10:23:47.7667602Z shr.s16 %rs495, %rs494, 4; 2026-02-21T10:23:47.7667675Z selp.b16 %rs496, %rs454, %rs437, %p188; 2026-02-21T10:23:47.7667738Z cvt.s16.s8 %rs497, %rs496; 2026-02-21T10:23:47.7667805Z shr.s16 %rs498, %rs497, 4; 
2026-02-21T10:23:47.7667877Z selp.b16 %rs499, %rs455, %rs441, %p188; 2026-02-21T10:23:47.7667940Z cvt.s16.s8 %rs500, %rs499; 2026-02-21T10:23:47.7668004Z shr.s16 %rs501, %rs500, 4; 2026-02-21T10:23:47.7668080Z selp.b16 %rs502, %rs456, %rs445, %p188; 2026-02-21T10:23:47.7668148Z cvt.s16.s8 %rs503, %rs502; 2026-02-21T10:23:47.7668278Z shr.s16 %rs504, %rs503, 4; 2026-02-21T10:23:47.7668360Z selp.b16 %rs505, %rs457, %rs418, %p188; 2026-02-21T10:23:47.7668423Z cvt.s16.s8 %rs506, %rs505; 2026-02-21T10:23:47.7668569Z shr.s16 %rs507, %rs506, 4; 2026-02-21T10:23:47.7668653Z selp.b16 %rs508, %rs458, %rs422, %p188; 2026-02-21T10:23:47.7668721Z cvt.s16.s8 %rs509, %rs508; 2026-02-21T10:23:47.7668874Z shr.s16 %rs510, %rs509, 4; 2026-02-21T10:23:47.7668953Z selp.b16 %rs511, %rs459, %rs426, %p188; 2026-02-21T10:23:47.7669022Z cvt.s16.s8 %rs512, %rs511; 2026-02-21T10:23:47.7669085Z shr.s16 %rs513, %rs512, 4; 2026-02-21T10:23:47.7669157Z selp.b16 %rs514, %rs460, %rs430, %p188; 2026-02-21T10:23:47.7669224Z cvt.s16.s8 %rs515, %rs514; 2026-02-21T10:23:47.7669288Z shr.s16 %rs516, %rs515, 4; 2026-02-21T10:23:47.7669358Z selp.b16 %rs517, %rs461, %rs434, %p188; 2026-02-21T10:23:47.7669420Z cvt.s16.s8 %rs518, %rs517; 2026-02-21T10:23:47.7669487Z shr.s16 %rs519, %rs518, 4; 2026-02-21T10:23:47.7669565Z selp.b16 %rs520, %rs462, %rs438, %p188; 2026-02-21T10:23:47.7669627Z cvt.s16.s8 %rs521, %rs520; 2026-02-21T10:23:47.7669694Z shr.s16 %rs522, %rs521, 4; 2026-02-21T10:23:47.7669776Z selp.b16 %rs523, %rs463, %rs442, %p188; 2026-02-21T10:23:47.7669844Z cvt.s16.s8 %rs524, %rs523; 2026-02-21T10:23:47.7669908Z shr.s16 %rs525, %rs524, 4; 2026-02-21T10:23:47.7669984Z selp.b16 %rs526, %rs464, %rs446, %p188; 2026-02-21T10:23:47.7670048Z cvt.s16.s8 %rs527, %rs526; 2026-02-21T10:23:47.7670112Z shr.s16 %rs528, %rs527, 4; 2026-02-21T10:23:47.7670190Z selp.b16 %rs529, %rs465, %rs419, %p188; 2026-02-21T10:23:47.7670252Z cvt.s16.s8 %rs530, %rs529; 2026-02-21T10:23:47.7670315Z shr.s16 %rs531, %rs530, 4; 
2026-02-21T10:23:47.7670385Z selp.b16 %rs532, %rs466, %rs423, %p188; 2026-02-21T10:23:47.7670454Z cvt.s16.s8 %rs533, %rs532; 2026-02-21T10:23:47.7670516Z shr.s16 %rs534, %rs533, 4; 2026-02-21T10:23:47.7670587Z selp.b16 %rs535, %rs467, %rs427, %p188; 2026-02-21T10:23:47.7670663Z cvt.s16.s8 %rs536, %rs535; 2026-02-21T10:23:47.7670726Z shr.s16 %rs537, %rs536, 4; 2026-02-21T10:23:47.7670797Z selp.b16 %rs538, %rs468, %rs431, %p188; 2026-02-21T10:23:47.7670864Z cvt.s16.s8 %rs539, %rs538; 2026-02-21T10:23:47.7670930Z shr.s16 %rs540, %rs539, 4; 2026-02-21T10:23:47.7671001Z selp.b16 %rs541, %rs469, %rs435, %p188; 2026-02-21T10:23:47.7671062Z cvt.s16.s8 %rs542, %rs541; 2026-02-21T10:23:47.7671134Z shr.s16 %rs543, %rs542, 4; 2026-02-21T10:23:47.7671205Z selp.b16 %rs544, %rs470, %rs439, %p188; 2026-02-21T10:23:47.7671269Z cvt.s16.s8 %rs545, %rs544; 2026-02-21T10:23:47.7671349Z shr.s16 %rs546, %rs545, 4; 2026-02-21T10:23:47.7671422Z selp.b16 %rs547, %rs471, %rs443, %p188; 2026-02-21T10:23:47.7671485Z cvt.s16.s8 %rs548, %rs547; 2026-02-21T10:23:47.7671547Z shr.s16 %rs549, %rs548, 4; 2026-02-21T10:23:47.7671623Z selp.b16 %rs550, %rs472, %rs447, %p188; 2026-02-21T10:23:47.7671688Z cvt.s16.s8 %rs551, %rs550; 2026-02-21T10:23:47.7671753Z shr.s16 %rs552, %rs551, 4; 2026-02-21T10:23:47.7671890Z selp.b16 %rs553, %rs473, %rs420, %p188; 2026-02-21T10:23:47.7671956Z cvt.s16.s8 %rs554, %rs553; 2026-02-21T10:23:47.7672020Z shr.s16 %rs555, %rs554, 4; 2026-02-21T10:23:47.7672102Z selp.b16 %rs556, %rs474, %rs424, %p188; 2026-02-21T10:23:47.7672243Z cvt.s16.s8 %rs557, %rs556; 2026-02-21T10:23:47.7672307Z shr.s16 %rs558, %rs557, 4; 2026-02-21T10:23:47.7672384Z selp.b16 %rs559, %rs475, %rs428, %p188; 2026-02-21T10:23:47.7672454Z cvt.s16.s8 %rs560, %rs559; 2026-02-21T10:23:47.7672520Z shr.s16 %rs561, %rs560, 4; 2026-02-21T10:23:47.7672591Z selp.b16 %rs562, %rs476, %rs432, %p188; 2026-02-21T10:23:47.7672659Z cvt.s16.s8 %rs563, %rs562; 2026-02-21T10:23:47.7672722Z shr.s16 %rs564, %rs563, 4; 
2026-02-21T10:23:47.7672794Z selp.b16 %rs565, %rs477, %rs436, %p188; 2026-02-21T10:23:47.7672856Z cvt.s16.s8 %rs566, %rs565; 2026-02-21T10:23:47.7672926Z shr.s16 %rs567, %rs566, 4; 2026-02-21T10:23:47.7672997Z selp.b16 %rs568, %rs478, %rs440, %p188; 2026-02-21T10:23:47.7673111Z cvt.s16.s8 %rs569, %rs568; 2026-02-21T10:23:47.7673180Z shr.s16 %rs570, %rs569, 4; 2026-02-21T10:23:47.7673250Z selp.b16 %rs571, %rs479, %rs444, %p188; 2026-02-21T10:23:47.7673313Z cvt.s16.s8 %rs572, %rs571; 2026-02-21T10:23:47.7673379Z shr.s16 %rs573, %rs572, 4; 2026-02-21T10:23:47.7673454Z selp.b16 %rs574, %rs480, %rs448, %p188; 2026-02-21T10:23:47.7673515Z cvt.s16.s8 %rs575, %rs574; 2026-02-21T10:23:47.7673624Z shr.s16 %rs576, %rs575, 4; 2026-02-21T10:23:47.7673849Z .loc 1 83 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:83:32 2026-02-21T10:23:47.7673918Z cvt.rn.f32.s16 %r5545, %rs483; 2026-02-21T10:23:47.7673984Z cvt.rn.f32.s16 %r5546, %rs486; 2026-02-21T10:23:47.7674049Z cvt.rn.f32.s16 %r5547, %rs489; 2026-02-21T10:23:47.7674117Z cvt.rn.f32.s16 %r5548, %rs492; 2026-02-21T10:23:47.7674181Z cvt.rn.f32.s16 %r5549, %rs495; 2026-02-21T10:23:47.7674243Z cvt.rn.f32.s16 %r5550, %rs498; 2026-02-21T10:23:47.7674313Z cvt.rn.f32.s16 %r5551, %rs501; 2026-02-21T10:23:47.7674379Z cvt.rn.f32.s16 %r5552, %rs504; 2026-02-21T10:23:47.7674442Z cvt.rn.f32.s16 %r5553, %rs507; 2026-02-21T10:23:47.7674509Z cvt.rn.f32.s16 %r5554, %rs510; 2026-02-21T10:23:47.7674571Z cvt.rn.f32.s16 %r5555, %rs513; 2026-02-21T10:23:47.7674638Z cvt.rn.f32.s16 %r5556, %rs516; 2026-02-21T10:23:47.7674701Z cvt.rn.f32.s16 %r5557, %rs519; 2026-02-21T10:23:47.7674772Z cvt.rn.f32.s16 %r5558, %rs522; 2026-02-21T10:23:47.7674837Z cvt.rn.f32.s16 %r5559, %rs525; 2026-02-21T10:23:47.7674902Z cvt.rn.f32.s16 %r5560, %rs528; 2026-02-21T10:23:47.7674969Z cvt.rn.f32.s16 %r5561, %rs531; 2026-02-21T10:23:47.7675031Z cvt.rn.f32.s16 %r5562, %rs534; 2026-02-21T10:23:47.7675107Z cvt.rn.f32.s16 %r5563, %rs537; 2026-02-21T10:23:47.7675174Z 
cvt.rn.f32.s16 %r5564, %rs540; 2026-02-21T10:23:47.7675241Z cvt.rn.f32.s16 %r5565, %rs543; 2026-02-21T10:23:47.7675304Z cvt.rn.f32.s16 %r5566, %rs546; 2026-02-21T10:23:47.7675369Z cvt.rn.f32.s16 %r5567, %rs549; 2026-02-21T10:23:47.7675441Z cvt.rn.f32.s16 %r5568, %rs552; 2026-02-21T10:23:47.7675505Z cvt.rn.f32.s16 %r5569, %rs555; 2026-02-21T10:23:47.7675568Z cvt.rn.f32.s16 %r5570, %rs558; 2026-02-21T10:23:47.7675633Z cvt.rn.f32.s16 %r5571, %rs561; 2026-02-21T10:23:47.7675707Z cvt.rn.f32.s16 %r5572, %rs564; 2026-02-21T10:23:47.7675772Z cvt.rn.f32.s16 %r5573, %rs567; 2026-02-21T10:23:47.7675835Z cvt.rn.f32.s16 %r5574, %rs570; 2026-02-21T10:23:47.7675907Z cvt.rn.f32.s16 %r5575, %rs573; 2026-02-21T10:23:47.7675974Z cvt.rn.f32.s16 %r5576, %rs576; 2026-02-21T10:23:47.7676031Z bar.sync 0; 2026-02-21T10:23:47.7676105Z st.shared.b32 [%r36], %r5545; 2026-02-21T10:23:47.7676174Z st.shared.b32 [%r36+8], %r5546; 2026-02-21T10:23:47.7676246Z st.shared.b32 [%r36+16384], %r5561; 2026-02-21T10:23:47.7676313Z st.shared.b32 [%r36+16392], %r5562; 2026-02-21T10:23:47.7676382Z st.shared.b32 [%r37], %r5547; 2026-02-21T10:23:47.7676584Z st.shared.b32 [%r37+8], %r5548; 2026-02-21T10:23:47.7676659Z st.shared.b32 [%r37+16384], %r5563; 2026-02-21T10:23:47.7676821Z st.shared.b32 [%r37+16392], %r5564; 2026-02-21T10:23:47.7676887Z st.shared.b32 [%r38], %r5549; 2026-02-21T10:23:47.7676955Z st.shared.b32 [%r38+8], %r5550; 2026-02-21T10:23:47.7677025Z st.shared.b32 [%r38+16384], %r5565; 2026-02-21T10:23:47.7677157Z st.shared.b32 [%r38+16392], %r5566; 2026-02-21T10:23:47.7677225Z st.shared.b32 [%r39], %r5551; 2026-02-21T10:23:47.7677293Z st.shared.b32 [%r39+8], %r5552; 2026-02-21T10:23:47.7677364Z st.shared.b32 [%r39+16384], %r5567; 2026-02-21T10:23:47.7677433Z st.shared.b32 [%r39+16392], %r5568; 2026-02-21T10:23:47.7677498Z st.shared.b32 [%r40], %r5553; 2026-02-21T10:23:47.7677565Z st.shared.b32 [%r40+8], %r5554; 2026-02-21T10:23:47.7677635Z st.shared.b32 [%r40+16384], %r5569; 
2026-02-21T10:23:47.7677701Z st.shared.b32 [%r40+16392], %r5570; 2026-02-21T10:23:47.7677767Z st.shared.b32 [%r41], %r5555; 2026-02-21T10:23:47.7677837Z st.shared.b32 [%r41+8], %r5556; 2026-02-21T10:23:47.7677908Z st.shared.b32 [%r41+16384], %r5571; 2026-02-21T10:23:47.7678034Z st.shared.b32 [%r41+16392], %r5572; 2026-02-21T10:23:47.7678108Z st.shared.b32 [%r42], %r5557; 2026-02-21T10:23:47.7678173Z st.shared.b32 [%r42+8], %r5558; 2026-02-21T10:23:47.7678242Z st.shared.b32 [%r42+16384], %r5573; 2026-02-21T10:23:47.7678311Z st.shared.b32 [%r42+16392], %r5574; 2026-02-21T10:23:47.7678392Z st.shared.b32 [%r43], %r5559; 2026-02-21T10:23:47.7678520Z st.shared.b32 [%r43+8], %r5560; 2026-02-21T10:23:47.7678590Z st.shared.b32 [%r43+16384], %r5575; 2026-02-21T10:23:47.7683562Z st.shared.b32 [%r43+16392], %r5576; 2026-02-21T10:23:47.7683656Z $L__tmp5: 2026-02-21T10:23:47.7683989Z .loc 2 291 36 // standard.py:291:36 @[ cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:90:40 ] 2026-02-21T10:23:47.7684061Z // begin inline asm 2026-02-21T10:23:47.7684152Z fence.proxy.async.shared::cta; 2026-02-21T10:23:47.7684220Z // end inline asm 2026-02-21T10:23:47.7684280Z bar.sync 0; 2026-02-21T10:23:47.7684365Z wgmma.fence.sync.aligned; 2026-02-21T10:23:47.7684435Z // begin inline asm 2026-02-21T10:23:47.7685931Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r3200,%r3201,%r3202,%r3203}, %rd2, %p24, 1, 1; 2026-02-21T10:23:47.7685999Z // end inline asm 
2026-02-21T10:23:47.7686068Z // begin inline asm 2026-02-21T10:23:47.7687715Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r3332,%r3333,%r3334,%r3335}, %rd3, %p24, 1, 1; 2026-02-21T10:23:47.7687791Z // end inline asm 2026-02-21T10:23:47.7687863Z // begin inline asm 2026-02-21T10:23:47.7689323Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r3464,%r3465,%r3466,%r3467}, %rd4, %p24, 1, 1; 2026-02-21T10:23:47.7689551Z // end inline asm 2026-02-21T10:23:47.7689689Z // begin inline asm 2026-02-21T10:23:47.7691155Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r3596,%r3597,%r3598,%r3599}, %rd5, %p24, 1, 1; 2026-02-21T10:23:47.7691227Z // end inline asm 2026-02-21T10:23:47.7691289Z // begin inline asm 2026-02-21T10:23:47.7692893Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r3728,%r3729,%r3730,%r3731}, %rd6, %p24, 1, 1; 2026-02-21T10:23:47.7692972Z // end inline asm 2026-02-21T10:23:47.7693033Z // begin inline asm 2026-02-21T10:23:47.7694497Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, 
{%r3860,%r3861,%r3862,%r3863}, %rd7, %p24, 1, 1; 2026-02-21T10:23:47.7694562Z // end inline asm 2026-02-21T10:23:47.7694627Z // begin inline asm 2026-02-21T10:23:47.7696091Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r3992,%r3993,%r3994,%r3995}, %rd8, %p24, 1, 1; 2026-02-21T10:23:47.7696157Z // end inline asm 2026-02-21T10:23:47.7696219Z // begin inline asm 2026-02-21T10:23:47.7697821Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r4124,%r4125,%r4126,%r4127}, %rd9, %p24, 1, 1; 2026-02-21T10:23:47.7697890Z // end inline asm 2026-02-21T10:23:47.7697981Z wgmma.commit_group.sync.aligned; 2026-02-21T10:23:47.7698049Z mov.b32 %r4193, %r5408; 2026-02-21T10:23:47.7698203Z mov.b32 %r4194, %r5408; 2026-02-21T10:23:47.7698273Z mov.b32 %r4192, %r10980; 2026-02-21T10:23:47.7698348Z // begin inline asm 2026-02-21T10:23:47.7699619Z // wait for regs: 
%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154,%r4192,%r4193,%r4194 2026-02-21T10:23:47.7699787Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:23:47.7699849Z // end inline asm 2026-02-21T10:23:47.7699907Z $L__tmp6: 2026-02-21T10:23:47.7700152Z .loc 1 54 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:54:32 2026-02-21T10:23:47.7700235Z add.s64 %rd139, %rd76, 384; 2026-02-21T10:23:47.7700385Z add.s64 %rd142, %rd79, 384; 2026-02-21T10:23:47.7700453Z add.s64 %rd145, %rd82, 384; 2026-02-21T10:23:47.7700677Z .loc 1 54 80 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:54:80 2026-02-21T10:23:47.7700744Z add.s64 %rd148, %rd85, 384; 2026-02-21T10:23:47.7700810Z // begin inline asm 2026-02-21T10:23:47.7700881Z mov.u64 %rd138, 0x0; 2026-02-21T10:23:47.7701089Z createpolicy.fractional.L2::evict_last.b64 %rd138, 1.0; 2026-02-21T10:23:47.7701154Z // end inline asm 2026-02-21T10:23:47.7701219Z // begin inline asm 2026-02-21T10:23:47.7701286Z mov.u32 %r4262, 0x0; 2026-02-21T10:23:47.7701346Z mov.u32 %r4263, 0x0; 2026-02-21T10:23:47.7701405Z mov.u32 %r4264, 0x0; 2026-02-21T10:23:47.7701470Z mov.u32 %r4265, 0x0; 2026-02-21T10:23:47.7701702Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r4262, %r4263, %r4264, %r4265 }, [ %rd139 + 0 ], %rd138; 2026-02-21T10:23:47.7701762Z // end inline asm 2026-02-21T10:23:47.7701828Z // begin inline asm 2026-02-21T10:23:47.7701896Z mov.u64 %rd141, 0x0; 2026-02-21T10:23:47.7702020Z createpolicy.fractional.L2::evict_last.b64 %rd141, 1.0; 2026-02-21T10:23:47.7702080Z 
// end inline asm 2026-02-21T10:23:47.7702152Z // begin inline asm 2026-02-21T10:23:47.7702221Z mov.u32 %r4266, 0x0; 2026-02-21T10:23:47.7702282Z mov.u32 %r4267, 0x0; 2026-02-21T10:23:47.7702348Z mov.u32 %r4268, 0x0; 2026-02-21T10:23:47.7702412Z mov.u32 %r4269, 0x0; 2026-02-21T10:23:47.7702631Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r4266, %r4267, %r4268, %r4269 }, [ %rd142 + 0 ], %rd141; 2026-02-21T10:23:47.7702690Z // end inline asm 2026-02-21T10:23:47.7702758Z // begin inline asm 2026-02-21T10:23:47.7702819Z mov.u64 %rd144, 0x0; 2026-02-21T10:23:47.7702942Z createpolicy.fractional.L2::evict_last.b64 %rd144, 1.0; 2026-02-21T10:23:47.7703009Z // end inline asm 2026-02-21T10:23:47.7703072Z // begin inline asm 2026-02-21T10:23:47.7703133Z mov.u32 %r4270, 0x0; 2026-02-21T10:23:47.7703195Z mov.u32 %r4271, 0x0; 2026-02-21T10:23:47.7703266Z mov.u32 %r4272, 0x0; 2026-02-21T10:23:47.7703324Z mov.u32 %r4273, 0x0; 2026-02-21T10:23:47.7703550Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r4270, %r4271, %r4272, %r4273 }, [ %rd145 + 0 ], %rd144; 2026-02-21T10:23:47.7703620Z // end inline asm 2026-02-21T10:23:47.7703680Z // begin inline asm 2026-02-21T10:23:47.7703741Z mov.u64 %rd147, 0x0; 2026-02-21T10:23:47.7703870Z createpolicy.fractional.L2::evict_last.b64 %rd147, 1.0; 2026-02-21T10:23:47.7703931Z // end inline asm 2026-02-21T10:23:47.7703992Z // begin inline asm 2026-02-21T10:23:47.7704051Z mov.u32 %r4274, 0x0; 2026-02-21T10:23:47.7704117Z mov.u32 %r4275, 0x0; 2026-02-21T10:23:47.7704177Z mov.u32 %r4276, 0x0; 2026-02-21T10:23:47.7704237Z mov.u32 %r4277, 0x0; 2026-02-21T10:23:47.7704458Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r4274, %r4275, %r4276, %r4277 }, [ %rd148 + 0 ], %rd147; 2026-02-21T10:23:47.7704519Z // end inline asm 2026-02-21T10:23:47.7704800Z .loc 1 58 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:58:32 2026-02-21T10:23:47.7704867Z bar.sync 0; 2026-02-21T10:23:47.7704953Z st.shared.v2.b32 [%r18], {%r4262, %r4263}; 
2026-02-21T10:23:47.7705045Z st.shared.v2.b32 [%r18+4096], {%r4266, %r4267}; 2026-02-21T10:23:47.7705183Z st.shared.v2.b32 [%r18+8192], {%r4270, %r4271}; 2026-02-21T10:23:47.7705278Z st.shared.v2.b32 [%r18+12288], {%r4274, %r4275}; 2026-02-21T10:23:47.7705361Z st.shared.v2.b32 [%r19], {%r4264, %r4265}; 2026-02-21T10:23:47.7705449Z st.shared.v2.b32 [%r19+4096], {%r4268, %r4269}; 2026-02-21T10:23:47.7705539Z st.shared.v2.b32 [%r19+8192], {%r4272, %r4273}; 2026-02-21T10:23:47.7705626Z st.shared.v2.b32 [%r19+12288], {%r4276, %r4277}; 2026-02-21T10:23:47.7705686Z bar.sync 0; 2026-02-21T10:23:47.7705762Z ld.shared.b16 %rs577, [%r20]; 2026-02-21T10:23:47.7705837Z ld.shared.b16 %rs578, [%r20+1024]; 2026-02-21T10:23:47.7705907Z ld.shared.b16 %rs579, [%r20+64]; 2026-02-21T10:23:47.7706035Z ld.shared.b16 %rs580, [%r20+1088]; 2026-02-21T10:23:47.7706112Z ld.shared.b16 %rs581, [%r21]; 2026-02-21T10:23:47.7706180Z ld.shared.b16 %rs582, [%r21+1024]; 2026-02-21T10:23:47.7706249Z ld.shared.b16 %rs583, [%r21+64]; 2026-02-21T10:23:47.7706327Z ld.shared.b16 %rs584, [%r21+1088]; 2026-02-21T10:23:47.7706394Z ld.shared.b16 %rs585, [%r22]; 2026-02-21T10:23:47.7706583Z ld.shared.b16 %rs586, [%r22+1024]; 2026-02-21T10:23:47.7706744Z ld.shared.b16 %rs587, [%r22+64]; 2026-02-21T10:23:47.7706826Z ld.shared.b16 %rs588, [%r22+1088]; 2026-02-21T10:23:47.7706894Z ld.shared.b16 %rs589, [%r23]; 2026-02-21T10:23:47.7706962Z ld.shared.b16 %rs590, [%r23+1024]; 2026-02-21T10:23:47.7707035Z ld.shared.b16 %rs591, [%r23+64]; 2026-02-21T10:23:47.7707102Z ld.shared.b16 %rs592, [%r23+1088]; 2026-02-21T10:23:47.7707170Z ld.shared.b16 %rs593, [%r24]; 2026-02-21T10:23:47.7707237Z ld.shared.b16 %rs594, [%r24+1024]; 2026-02-21T10:23:47.7707312Z ld.shared.b16 %rs595, [%r24+64]; 2026-02-21T10:23:47.7707386Z ld.shared.b16 %rs596, [%r24+1088]; 2026-02-21T10:23:47.7707453Z ld.shared.b16 %rs597, [%r25]; 2026-02-21T10:23:47.7707540Z ld.shared.b16 %rs598, [%r25+1024]; 2026-02-21T10:23:47.7707610Z ld.shared.b16 %rs599, 
[%r25+64]; 2026-02-21T10:23:47.7707684Z ld.shared.b16 %rs600, [%r25+1088]; 2026-02-21T10:23:47.7707764Z ld.shared.b16 %rs601, [%r26]; 2026-02-21T10:23:47.7707830Z ld.shared.b16 %rs602, [%r26+1024]; 2026-02-21T10:23:47.7707903Z ld.shared.b16 %rs603, [%r26+64]; 2026-02-21T10:23:47.7707969Z ld.shared.b16 %rs604, [%r26+1088]; 2026-02-21T10:23:47.7708044Z ld.shared.b16 %rs605, [%r27]; 2026-02-21T10:23:47.7708112Z ld.shared.b16 %rs606, [%r27+1024]; 2026-02-21T10:23:47.7708180Z ld.shared.b16 %rs607, [%r27+64]; 2026-02-21T10:23:47.7708254Z ld.shared.b16 %rs608, [%r27+1088]; 2026-02-21T10:23:47.7708324Z cvt.f32.bf16 %r4415, %rs577; 2026-02-21T10:23:47.7708391Z cvt.f32.bf16 %r4416, %rs578; 2026-02-21T10:23:47.7708456Z cvt.f32.bf16 %r4417, %rs581; 2026-02-21T10:23:47.7708600Z cvt.f32.bf16 %r4418, %rs582; 2026-02-21T10:23:47.7708670Z cvt.f32.bf16 %r4547, %rs585; 2026-02-21T10:23:47.7708734Z cvt.f32.bf16 %r4548, %rs586; 2026-02-21T10:23:47.7708804Z cvt.f32.bf16 %r4549, %rs589; 2026-02-21T10:23:47.7708866Z cvt.f32.bf16 %r4550, %rs590; 2026-02-21T10:23:47.7708933Z cvt.f32.bf16 %r4679, %rs593; 2026-02-21T10:23:47.7708996Z cvt.f32.bf16 %r4680, %rs594; 2026-02-21T10:23:47.7709065Z cvt.f32.bf16 %r4681, %rs597; 2026-02-21T10:23:47.7709131Z cvt.f32.bf16 %r4682, %rs598; 2026-02-21T10:23:47.7709195Z cvt.f32.bf16 %r4811, %rs601; 2026-02-21T10:23:47.7709272Z cvt.f32.bf16 %r4812, %rs602; 2026-02-21T10:23:47.7709340Z cvt.f32.bf16 %r4813, %rs605; 2026-02-21T10:23:47.7709403Z cvt.f32.bf16 %r4814, %rs606; 2026-02-21T10:23:47.7709475Z cvt.f32.bf16 %r4943, %rs579; 2026-02-21T10:23:47.7709537Z cvt.f32.bf16 %r4944, %rs580; 2026-02-21T10:23:47.7709600Z cvt.f32.bf16 %r4945, %rs583; 2026-02-21T10:23:47.7709664Z cvt.f32.bf16 %r4946, %rs584; 2026-02-21T10:23:47.7709832Z cvt.f32.bf16 %r5075, %rs587; 2026-02-21T10:23:47.7709900Z cvt.f32.bf16 %r5076, %rs588; 2026-02-21T10:23:47.7709966Z cvt.f32.bf16 %r5077, %rs591; 2026-02-21T10:23:47.7710036Z cvt.f32.bf16 %r5078, %rs592; 2026-02-21T10:23:47.7710099Z 
cvt.f32.bf16 %r5207, %rs595; 2026-02-21T10:23:47.7710227Z cvt.f32.bf16 %r5208, %rs596; 2026-02-21T10:23:47.7710290Z cvt.f32.bf16 %r5209, %rs599; 2026-02-21T10:23:47.7710360Z cvt.f32.bf16 %r5210, %rs600; 2026-02-21T10:23:47.7710424Z cvt.f32.bf16 %r5339, %rs603; 2026-02-21T10:23:47.7710489Z cvt.f32.bf16 %r5340, %rs604; 2026-02-21T10:23:47.7710560Z cvt.f32.bf16 %r5341, %rs607; 2026-02-21T10:23:47.7710623Z cvt.f32.bf16 %r5342, %rs608; 2026-02-21T10:23:47.7710843Z .loc 1 60 33 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:60:33 2026-02-21T10:23:47.7710908Z bar.sync 0; 2026-02-21T10:23:47.7710972Z // begin inline asm 2026-02-21T10:23:47.7711078Z @%p132 mbarrier.init.shared::cta.b64 [%r5769], 1; 2026-02-21T10:23:47.7711142Z // end inline asm 2026-02-21T10:23:47.7711268Z bar.sync 0; 2026-02-21T10:23:47.7711333Z // begin inline asm 2026-02-21T10:23:47.7711475Z @%p132 mbarrier.arrive.expect_tx.shared.b64 _, [%r5769], 4096; 2026-02-21T10:23:47.7711541Z // end inline asm 2026-02-21T10:23:47.7711609Z // begin inline asm 2026-02-21T10:23:47.7711692Z fence.proxy.async.shared::cta; 2026-02-21T10:23:47.7711752Z // end inline asm 2026-02-21T10:23:47.7711817Z bar.sync 0; 2026-02-21T10:23:47.7711938Z elect.sync %r5577|%p73, -1; 2026-02-21T10:23:47.7712016Z and.pred %p58, %p1, %p73; 2026-02-21T10:23:47.7712088Z or.b32 %r4282, %r637, 96; 2026-02-21T10:23:47.7712159Z // begin inline asm 2026-02-21T10:23:47.7712499Z @%p58 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r10980], [%rd281, {%r636, %r4282}], [%r5769]; 2026-02-21T10:23:47.7712571Z // end inline asm 2026-02-21T10:23:47.7712638Z bar.sync 0; 2026-02-21T10:23:47.7712701Z // begin inline asm 2026-02-21T10:23:47.7712756Z 2026-02-21T10:23:47.7712818Z { 2026-02-21T10:23:47.7712888Z .reg .pred complete; 2026-02-21T10:23:47.7712945Z waitLoop: 2026-02-21T10:23:47.7713102Z mbarrier.try_wait.parity.shared.b64 complete, [%r5769], %r5408; 2026-02-21T10:23:47.7713176Z @!complete bra.uni waitLoop; 
2026-02-21T10:23:47.7713236Z } 2026-02-21T10:23:47.7713241Z 2026-02-21T10:23:47.7713301Z // end inline asm 2026-02-21T10:23:47.7713380Z bar.sync 0; 2026-02-21T10:23:47.7713447Z // begin inline asm 2026-02-21T10:23:47.7713550Z @%p132 mbarrier.inval.shared::cta.b64 [%r5769]; 2026-02-21T10:23:47.7713617Z // end inline asm 2026-02-21T10:23:47.7713838Z .loc 1 78 58 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:78:58 2026-02-21T10:23:47.7713911Z ld.shared.b8 %rs609, [%r28]; 2026-02-21T10:23:47.7713984Z ld.shared.b8 %rs610, [%r28+1024]; 2026-02-21T10:23:47.7714063Z ld.shared.b8 %rs611, [%r28+2048]; 2026-02-21T10:23:47.7714132Z ld.shared.b8 %rs612, [%r28+3072]; 2026-02-21T10:23:47.7714207Z ld.shared.b8 %rs613, [%r29+128]; 2026-02-21T10:23:47.7714283Z ld.shared.b8 %rs614, [%r29+1152]; 2026-02-21T10:23:47.7714350Z ld.shared.b8 %rs615, [%r29+2176]; 2026-02-21T10:23:47.7714415Z ld.shared.b8 %rs616, [%r29+3200]; 2026-02-21T10:23:47.7714487Z ld.shared.b8 %rs617, [%r30+256]; 2026-02-21T10:23:47.7714557Z ld.shared.b8 %rs618, [%r30+1280]; 2026-02-21T10:23:47.7714625Z ld.shared.b8 %rs619, [%r30+2304]; 2026-02-21T10:23:47.7714695Z ld.shared.b8 %rs620, [%r30+3328]; 2026-02-21T10:23:47.7714767Z ld.shared.b8 %rs621, [%r31+384]; 2026-02-21T10:23:47.7714834Z ld.shared.b8 %rs622, [%r31+1408]; 2026-02-21T10:23:47.7714903Z ld.shared.b8 %rs623, [%r31+2432]; 2026-02-21T10:23:47.7714976Z ld.shared.b8 %rs624, [%r31+3456]; 2026-02-21T10:23:47.7715046Z ld.shared.b8 %rs625, [%r32+512]; 2026-02-21T10:23:47.7715114Z ld.shared.b8 %rs626, [%r32+1536]; 2026-02-21T10:23:47.7715183Z ld.shared.b8 %rs627, [%r32+2560]; 2026-02-21T10:23:47.7715258Z ld.shared.b8 %rs628, [%r32+3584]; 2026-02-21T10:23:47.7715388Z ld.shared.b8 %rs629, [%r33+640]; 2026-02-21T10:23:47.7715459Z ld.shared.b8 %rs630, [%r33+1664]; 2026-02-21T10:23:47.7715533Z ld.shared.b8 %rs631, [%r33+2688]; 2026-02-21T10:23:47.7715600Z ld.shared.b8 %rs632, [%r33+3712]; 2026-02-21T10:23:47.7715667Z ld.shared.b8 %rs633, [%r34+768]; 
2026-02-21T10:23:47.7715787Z ld.shared.b8 %rs634, [%r34+1792]; 2026-02-21T10:23:47.7715853Z ld.shared.b8 %rs635, [%r34+2816]; 2026-02-21T10:23:47.7715922Z ld.shared.b8 %rs636, [%r34+3840]; 2026-02-21T10:23:47.7715988Z ld.shared.b8 %rs637, [%r35+896]; 2026-02-21T10:23:47.7716064Z ld.shared.b8 %rs638, [%r35+1920]; 2026-02-21T10:23:47.7716131Z ld.shared.b8 %rs639, [%r35+2944]; 2026-02-21T10:23:47.7716197Z ld.shared.b8 %rs640, [%r35+3968]; 2026-02-21T10:23:47.7716416Z .loc 1 63 28 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:63:28 2026-02-21T10:23:47.7716611Z shl.b16 %rs641, %rs609, 4; 2026-02-21T10:23:47.7716681Z shl.b16 %rs642, %rs613, 4; 2026-02-21T10:23:47.7716762Z shl.b16 %rs643, %rs617, 4; 2026-02-21T10:23:47.7716909Z shl.b16 %rs644, %rs621, 4; 2026-02-21T10:23:47.7716987Z shl.b16 %rs645, %rs625, 4; 2026-02-21T10:23:47.7717054Z shl.b16 %rs646, %rs629, 4; 2026-02-21T10:23:47.7717124Z shl.b16 %rs647, %rs633, 4; 2026-02-21T10:23:47.7717192Z shl.b16 %rs648, %rs637, 4; 2026-02-21T10:23:47.7717260Z shl.b16 %rs649, %rs610, 4; 2026-02-21T10:23:47.7717327Z shl.b16 %rs650, %rs614, 4; 2026-02-21T10:23:47.7717459Z shl.b16 %rs651, %rs618, 4; 2026-02-21T10:23:47.7717526Z shl.b16 %rs652, %rs622, 4; 2026-02-21T10:23:47.7717590Z shl.b16 %rs653, %rs626, 4; 2026-02-21T10:23:47.7717658Z shl.b16 %rs654, %rs630, 4; 2026-02-21T10:23:47.7717721Z shl.b16 %rs655, %rs634, 4; 2026-02-21T10:23:47.7717787Z shl.b16 %rs656, %rs638, 4; 2026-02-21T10:23:47.7717860Z shl.b16 %rs657, %rs611, 4; 2026-02-21T10:23:47.7717925Z shl.b16 %rs658, %rs615, 4; 2026-02-21T10:23:47.7717988Z shl.b16 %rs659, %rs619, 4; 2026-02-21T10:23:47.7718052Z shl.b16 %rs660, %rs623, 4; 2026-02-21T10:23:47.7718128Z shl.b16 %rs661, %rs627, 4; 2026-02-21T10:23:47.7718192Z shl.b16 %rs662, %rs631, 4; 2026-02-21T10:23:47.7718254Z shl.b16 %rs663, %rs635, 4; 2026-02-21T10:23:47.7718326Z shl.b16 %rs664, %rs639, 4; 2026-02-21T10:23:47.7718396Z shl.b16 %rs665, %rs612, 4; 2026-02-21T10:23:47.7718462Z shl.b16 %rs666, %rs616, 
4; 2026-02-21T10:23:47.7718524Z shl.b16 %rs667, %rs620, 4; 2026-02-21T10:23:47.7718594Z shl.b16 %rs668, %rs624, 4; 2026-02-21T10:23:47.7718660Z shl.b16 %rs669, %rs628, 4; 2026-02-21T10:23:47.7718723Z shl.b16 %rs670, %rs632, 4; 2026-02-21T10:23:47.7718791Z shl.b16 %rs671, %rs636, 4; 2026-02-21T10:23:47.7718856Z shl.b16 %rs672, %rs640, 4; 2026-02-21T10:23:47.7719068Z .loc 1 78 58 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:78:58 2026-02-21T10:23:47.7719148Z selp.b16 %rs673, %rs641, %rs609, %p188; 2026-02-21T10:23:47.7719221Z cvt.s16.s8 %rs674, %rs673; 2026-02-21T10:23:47.7719282Z shr.s16 %rs675, %rs674, 4; 2026-02-21T10:23:47.7719359Z selp.b16 %rs676, %rs642, %rs613, %p188; 2026-02-21T10:23:47.7719429Z cvt.s16.s8 %rs677, %rs676; 2026-02-21T10:23:47.7719492Z shr.s16 %rs678, %rs677, 4; 2026-02-21T10:23:47.7719565Z selp.b16 %rs679, %rs643, %rs617, %p188; 2026-02-21T10:23:47.7719631Z cvt.s16.s8 %rs680, %rs679; 2026-02-21T10:23:47.7719702Z shr.s16 %rs681, %rs680, 4; 2026-02-21T10:23:47.7719772Z selp.b16 %rs682, %rs644, %rs621, %p188; 2026-02-21T10:23:47.7719838Z cvt.s16.s8 %rs683, %rs682; 2026-02-21T10:23:47.7719908Z shr.s16 %rs684, %rs683, 4; 2026-02-21T10:23:47.7719980Z selp.b16 %rs685, %rs645, %rs625, %p188; 2026-02-21T10:23:47.7720044Z cvt.s16.s8 %rs686, %rs685; 2026-02-21T10:23:47.7720117Z shr.s16 %rs687, %rs686, 4; 2026-02-21T10:23:47.7720196Z selp.b16 %rs688, %rs646, %rs629, %p188; 2026-02-21T10:23:47.7720261Z cvt.s16.s8 %rs689, %rs688; 2026-02-21T10:23:47.7720324Z shr.s16 %rs690, %rs689, 4; 2026-02-21T10:23:47.7720400Z selp.b16 %rs691, %rs647, %rs633, %p188; 2026-02-21T10:23:47.7720466Z cvt.s16.s8 %rs692, %rs691; 2026-02-21T10:23:47.7720616Z shr.s16 %rs693, %rs692, 4; 2026-02-21T10:23:47.7720696Z selp.b16 %rs694, %rs648, %rs637, %p188; 2026-02-21T10:23:47.7720759Z cvt.s16.s8 %rs695, %rs694; 2026-02-21T10:23:47.7720821Z shr.s16 %rs696, %rs695, 4; 2026-02-21T10:23:47.7720982Z selp.b16 %rs697, %rs649, %rs610, %p188; 2026-02-21T10:23:47.7721054Z 
cvt.s16.s8 %rs698, %rs697; 2026-02-21T10:23:47.7721119Z shr.s16 %rs699, %rs698, 4; 2026-02-21T10:23:47.7721190Z selp.b16 %rs700, %rs650, %rs614, %p188; 2026-02-21T10:23:47.7721268Z cvt.s16.s8 %rs701, %rs700; 2026-02-21T10:23:47.7721335Z shr.s16 %rs702, %rs701, 4; 2026-02-21T10:23:47.7721409Z selp.b16 %rs703, %rs651, %rs618, %p188; 2026-02-21T10:23:47.7721478Z cvt.s16.s8 %rs704, %rs703; 2026-02-21T10:23:47.7721541Z shr.s16 %rs705, %rs704, 4; 2026-02-21T10:23:47.7721617Z selp.b16 %rs706, %rs652, %rs622, %p188; 2026-02-21T10:23:47.7721679Z cvt.s16.s8 %rs707, %rs706; 2026-02-21T10:23:47.7721741Z shr.s16 %rs708, %rs707, 4; 2026-02-21T10:23:47.7721863Z selp.b16 %rs709, %rs653, %rs626, %p188; 2026-02-21T10:23:47.7721933Z cvt.s16.s8 %rs710, %rs709; 2026-02-21T10:23:47.7721996Z shr.s16 %rs711, %rs710, 4; 2026-02-21T10:23:47.7722066Z selp.b16 %rs712, %rs654, %rs630, %p188; 2026-02-21T10:23:47.7722137Z cvt.s16.s8 %rs713, %rs712; 2026-02-21T10:23:47.7722199Z shr.s16 %rs714, %rs713, 4; 2026-02-21T10:23:47.7722269Z selp.b16 %rs715, %rs655, %rs634, %p188; 2026-02-21T10:23:47.7722382Z cvt.s16.s8 %rs716, %rs715; 2026-02-21T10:23:47.7722453Z shr.s16 %rs717, %rs716, 4; 2026-02-21T10:23:47.7722522Z selp.b16 %rs718, %rs656, %rs638, %p188; 2026-02-21T10:23:47.7722584Z cvt.s16.s8 %rs719, %rs718; 2026-02-21T10:23:47.7722652Z shr.s16 %rs720, %rs719, 4; 2026-02-21T10:23:47.7722723Z selp.b16 %rs721, %rs657, %rs611, %p188; 2026-02-21T10:23:47.7722786Z cvt.s16.s8 %rs722, %rs721; 2026-02-21T10:23:47.7722853Z shr.s16 %rs723, %rs722, 4; 2026-02-21T10:23:47.7722926Z selp.b16 %rs724, %rs658, %rs615, %p188; 2026-02-21T10:23:47.7722991Z cvt.s16.s8 %rs725, %rs724; 2026-02-21T10:23:47.7723057Z shr.s16 %rs726, %rs725, 4; 2026-02-21T10:23:47.7723133Z selp.b16 %rs727, %rs659, %rs619, %p188; 2026-02-21T10:23:47.7723195Z cvt.s16.s8 %rs728, %rs727; 2026-02-21T10:23:47.7723258Z shr.s16 %rs729, %rs728, 4; 2026-02-21T10:23:47.7723336Z selp.b16 %rs730, %rs660, %rs623, %p188; 2026-02-21T10:23:47.7723398Z 
cvt.s16.s8 %rs731, %rs730; 2026-02-21T10:23:47.7723461Z shr.s16 %rs732, %rs731, 4; 2026-02-21T10:23:47.7723535Z selp.b16 %rs733, %rs661, %rs627, %p188; 2026-02-21T10:23:47.7723608Z cvt.s16.s8 %rs734, %rs733; 2026-02-21T10:23:47.7723670Z shr.s16 %rs735, %rs734, 4; 2026-02-21T10:23:47.7723740Z selp.b16 %rs736, %rs662, %rs631, %p188; 2026-02-21T10:23:47.7723810Z cvt.s16.s8 %rs737, %rs736; 2026-02-21T10:23:47.7723877Z shr.s16 %rs738, %rs737, 4; 2026-02-21T10:23:47.7723947Z selp.b16 %rs739, %rs663, %rs635, %p188; 2026-02-21T10:23:47.7724010Z cvt.s16.s8 %rs740, %rs739; 2026-02-21T10:23:47.7724082Z shr.s16 %rs741, %rs740, 4; 2026-02-21T10:23:47.7724159Z selp.b16 %rs742, %rs664, %rs639, %p188; 2026-02-21T10:23:47.7724222Z cvt.s16.s8 %rs743, %rs742; 2026-02-21T10:23:47.7724292Z shr.s16 %rs744, %rs743, 4; 2026-02-21T10:23:47.7724361Z selp.b16 %rs745, %rs665, %rs612, %p188; 2026-02-21T10:23:47.7724427Z cvt.s16.s8 %rs746, %rs745; 2026-02-21T10:23:47.7724496Z shr.s16 %rs747, %rs746, 4; 2026-02-21T10:23:47.7724567Z selp.b16 %rs748, %rs666, %rs616, %p188; 2026-02-21T10:23:47.7724632Z cvt.s16.s8 %rs749, %rs748; 2026-02-21T10:23:47.7724694Z shr.s16 %rs750, %rs749, 4; 2026-02-21T10:23:47.7724769Z selp.b16 %rs751, %rs667, %rs620, %p188; 2026-02-21T10:23:47.7724833Z cvt.s16.s8 %rs752, %rs751; 2026-02-21T10:23:47.7724895Z shr.s16 %rs753, %rs752, 4; 2026-02-21T10:23:47.7724970Z selp.b16 %rs754, %rs668, %rs624, %p188; 2026-02-21T10:23:47.7725032Z cvt.s16.s8 %rs755, %rs754; 2026-02-21T10:23:47.7725095Z shr.s16 %rs756, %rs755, 4; 2026-02-21T10:23:47.7725168Z selp.b16 %rs757, %rs669, %rs628, %p188; 2026-02-21T10:23:47.7725306Z cvt.s16.s8 %rs758, %rs757; 2026-02-21T10:23:47.7725372Z shr.s16 %rs759, %rs758, 4; 2026-02-21T10:23:47.7725444Z selp.b16 %rs760, %rs670, %rs632, %p188; 2026-02-21T10:23:47.7725511Z cvt.s16.s8 %rs761, %rs760; 2026-02-21T10:23:47.7725573Z shr.s16 %rs762, %rs761, 4; 2026-02-21T10:23:47.7725693Z selp.b16 %rs763, %rs671, %rs636, %p188; 2026-02-21T10:23:47.7725756Z 
cvt.s16.s8 %rs764, %rs763; 2026-02-21T10:23:47.7725827Z shr.s16 %rs765, %rs764, 4; 2026-02-21T10:23:47.7725898Z selp.b16 %rs766, %rs672, %rs640, %p188; 2026-02-21T10:23:47.7725960Z cvt.s16.s8 %rs767, %rs766; 2026-02-21T10:23:47.7726027Z shr.s16 %rs768, %rs767, 4; 2026-02-21T10:23:47.7726234Z .loc 1 83 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:83:32 2026-02-21T10:23:47.7726305Z cvt.rn.f32.s16 %r5578, %rs675; 2026-02-21T10:23:47.7726379Z cvt.rn.f32.s16 %r5579, %rs678; 2026-02-21T10:23:47.7726444Z cvt.rn.f32.s16 %r5580, %rs681; 2026-02-21T10:23:47.7726635Z cvt.rn.f32.s16 %r5581, %rs684; 2026-02-21T10:23:47.7726792Z cvt.rn.f32.s16 %r5582, %rs687; 2026-02-21T10:23:47.7726866Z cvt.rn.f32.s16 %r5583, %rs690; 2026-02-21T10:23:47.7726930Z cvt.rn.f32.s16 %r5584, %rs693; 2026-02-21T10:23:47.7726993Z cvt.rn.f32.s16 %r5585, %rs696; 2026-02-21T10:23:47.7727066Z cvt.rn.f32.s16 %r5586, %rs699; 2026-02-21T10:23:47.7727131Z cvt.rn.f32.s16 %r5587, %rs702; 2026-02-21T10:23:47.7727195Z cvt.rn.f32.s16 %r5588, %rs705; 2026-02-21T10:23:47.7727336Z cvt.rn.f32.s16 %r5589, %rs708; 2026-02-21T10:23:47.7727412Z cvt.rn.f32.s16 %r5590, %rs711; 2026-02-21T10:23:47.7727475Z cvt.rn.f32.s16 %r5591, %rs714; 2026-02-21T10:23:47.7727539Z cvt.rn.f32.s16 %r5592, %rs717; 2026-02-21T10:23:47.7727609Z cvt.rn.f32.s16 %r5593, %rs720; 2026-02-21T10:23:47.7727673Z cvt.rn.f32.s16 %r5594, %rs723; 2026-02-21T10:23:47.7727736Z cvt.rn.f32.s16 %r5595, %rs726; 2026-02-21T10:23:47.7727800Z cvt.rn.f32.s16 %r5596, %rs729; 2026-02-21T10:23:47.7727868Z cvt.rn.f32.s16 %r5597, %rs732; 2026-02-21T10:23:47.7727932Z cvt.rn.f32.s16 %r5598, %rs735; 2026-02-21T10:23:47.7727995Z cvt.rn.f32.s16 %r5599, %rs738; 2026-02-21T10:23:47.7728061Z cvt.rn.f32.s16 %r5600, %rs741; 2026-02-21T10:23:47.7728125Z cvt.rn.f32.s16 %r5601, %rs744; 2026-02-21T10:23:47.7728197Z cvt.rn.f32.s16 %r5602, %rs747; 2026-02-21T10:23:47.7728261Z cvt.rn.f32.s16 %r5603, %rs750; 2026-02-21T10:23:47.7728323Z cvt.rn.f32.s16 %r5604, %rs753; 
2026-02-21T10:23:47.7728391Z cvt.rn.f32.s16 %r5605, %rs756; 2026-02-21T10:23:47.7728459Z cvt.rn.f32.s16 %r5606, %rs759; 2026-02-21T10:23:47.7728521Z cvt.rn.f32.s16 %r5607, %rs762; 2026-02-21T10:23:47.7728593Z cvt.rn.f32.s16 %r5608, %rs765; 2026-02-21T10:23:47.7728658Z cvt.rn.f32.s16 %r5609, %rs768; 2026-02-21T10:23:47.7728715Z bar.sync 0; 2026-02-21T10:23:47.7728781Z st.shared.b32 [%r36], %r5578; 2026-02-21T10:23:47.7728851Z st.shared.b32 [%r36+8], %r5579; 2026-02-21T10:23:47.7728921Z st.shared.b32 [%r36+16384], %r5594; 2026-02-21T10:23:47.7728988Z st.shared.b32 [%r36+16392], %r5595; 2026-02-21T10:23:47.7729066Z st.shared.b32 [%r37], %r5580; 2026-02-21T10:23:47.7729135Z st.shared.b32 [%r37+8], %r5581; 2026-02-21T10:23:47.7729200Z st.shared.b32 [%r37+16384], %r5596; 2026-02-21T10:23:47.7729264Z st.shared.b32 [%r37+16392], %r5597; 2026-02-21T10:23:47.7729337Z st.shared.b32 [%r38], %r5582; 2026-02-21T10:23:47.7729401Z st.shared.b32 [%r38+8], %r5583; 2026-02-21T10:23:47.7729465Z st.shared.b32 [%r38+16384], %r5598; 2026-02-21T10:23:47.7729535Z st.shared.b32 [%r38+16392], %r5599; 2026-02-21T10:23:47.7729602Z st.shared.b32 [%r39], %r5584; 2026-02-21T10:23:47.7729666Z st.shared.b32 [%r39+8], %r5585; 2026-02-21T10:23:47.7729731Z st.shared.b32 [%r39+16384], %r5600; 2026-02-21T10:23:47.7729798Z st.shared.b32 [%r39+16392], %r5601; 2026-02-21T10:23:47.7729862Z st.shared.b32 [%r40], %r5586; 2026-02-21T10:23:47.7729926Z st.shared.b32 [%r40+8], %r5587; 2026-02-21T10:23:47.7729989Z st.shared.b32 [%r40+16384], %r5602; 2026-02-21T10:23:47.7730054Z st.shared.b32 [%r40+16392], %r5603; 2026-02-21T10:23:47.7730203Z st.shared.b32 [%r41], %r5588; 2026-02-21T10:23:47.7730267Z st.shared.b32 [%r41+8], %r5589; 2026-02-21T10:23:47.7730335Z st.shared.b32 [%r41+16384], %r5604; 2026-02-21T10:23:47.7730400Z st.shared.b32 [%r41+16392], %r5605; 2026-02-21T10:23:47.7730523Z st.shared.b32 [%r42], %r5590; 2026-02-21T10:23:47.7730589Z st.shared.b32 [%r42+8], %r5591; 2026-02-21T10:23:47.7730653Z 
st.shared.b32 [%r42+16384], %r5606; 2026-02-21T10:23:47.7730718Z st.shared.b32 [%r42+16392], %r5607; 2026-02-21T10:23:47.7730781Z st.shared.b32 [%r43], %r5592; 2026-02-21T10:23:47.7730847Z st.shared.b32 [%r43+8], %r5593; 2026-02-21T10:23:47.7730910Z st.shared.b32 [%r43+16384], %r5608; 2026-02-21T10:23:47.7730973Z st.shared.b32 [%r43+16392], %r5609; 2026-02-21T10:23:47.7731030Z $L__tmp7: 2026-02-21T10:23:47.7731326Z .loc 2 291 36 // standard.py:291:36 @[ cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:90:40 ] 2026-02-21T10:23:47.7731391Z // begin inline asm 2026-02-21T10:23:47.7731522Z fence.proxy.async.shared::cta; 2026-02-21T10:23:47.7731582Z // end inline asm 2026-02-21T10:23:47.7731636Z bar.sync 0; 2026-02-21T10:23:47.7731708Z wgmma.fence.sync.aligned; 2026-02-21T10:23:47.7731769Z // begin inline asm 2026-02-21T10:23:47.7733312Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r4415,%r4416,%r4417,%r4418}, %rd2, %p24, 1, 1; 2026-02-21T10:23:47.7733373Z // end inline asm 2026-02-21T10:23:47.7733433Z // begin inline asm 2026-02-21T10:23:47.7734911Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r4547,%r4548,%r4549,%r4550}, %rd3, %p24, 1, 1; 2026-02-21T10:23:47.7734975Z // end inline asm 2026-02-21T10:23:47.7735033Z // begin inline asm 2026-02-21T10:23:47.7736623Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r4679,%r4680,%r4681,%r4682}, %rd4, %p24, 1, 1; 2026-02-21T10:23:47.7736696Z // end inline asm 2026-02-21T10:23:47.7736755Z // begin inline asm 2026-02-21T10:23:47.7738233Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, 
{%r4811,%r4812,%r4813,%r4814}, %rd5, %p24, 1, 1; 2026-02-21T10:23:47.7738397Z // end inline asm 2026-02-21T10:23:47.7738457Z // begin inline asm 2026-02-21T10:23:47.7739936Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r4943,%r4944,%r4945,%r4946}, %rd6, %p24, 1, 1; 2026-02-21T10:23:47.7740069Z // end inline asm 2026-02-21T10:23:47.7740128Z // begin inline asm 2026-02-21T10:23:47.7741716Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r5075,%r5076,%r5077,%r5078}, %rd7, %p24, 1, 1; 2026-02-21T10:23:47.7741785Z // end inline asm 2026-02-21T10:23:47.7741847Z // begin inline asm 2026-02-21T10:23:47.7743334Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r5207,%r5208,%r5209,%r5210}, %rd8, %p24, 1, 1; 2026-02-21T10:23:47.7743398Z // end inline asm 2026-02-21T10:23:47.7743460Z // begin inline asm 2026-02-21T10:23:47.7744936Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154}, {%r5339,%r5340,%r5341,%r5342}, %rd9, %p24, 1, 1; 2026-02-21T10:23:47.7744998Z // end inline asm 2026-02-21T10:23:47.7745074Z wgmma.commit_group.sync.aligned; 2026-02-21T10:23:47.7745137Z mov.b32 %r5407, %r10980; 2026-02-21T10:23:47.7745199Z mov.b32 %r5409, %r5408; 2026-02-21T10:23:47.7745261Z // begin inline asm 2026-02-21T10:23:47.7746683Z // wait for regs: 
%r16091,%r16092,%r16093,%r16094,%r16095,%r16096,%r16097,%r16098,%r16099,%r16100,%r16101,%r16102,%r16103,%r16104,%r16105,%r16106,%r16107,%r16108,%r16109,%r16110,%r16111,%r16112,%r16113,%r16114,%r16115,%r16116,%r16117,%r16118,%r16119,%r16120,%r16121,%r16122,%r16123,%r16124,%r16125,%r16126,%r16127,%r16128,%r16129,%r16130,%r16131,%r16132,%r16133,%r16134,%r16135,%r16136,%r16137,%r16138,%r16139,%r16140,%r16141,%r16142,%r16143,%r16144,%r16145,%r16146,%r16147,%r16148,%r16149,%r16150,%r16151,%r16152,%r16153,%r16154,%r5407,%r5408,%r5409 2026-02-21T10:23:47.7746773Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:23:47.7746832Z // end inline asm 2026-02-21T10:23:47.7746889Z $L__tmp8: 2026-02-21T10:23:47.7747116Z .loc 1 47 126 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:47:126 2026-02-21T10:23:47.7747290Z add.s64 %rd26, %rd362, 128; 2026-02-21T10:23:47.7747357Z add.s64 %rd361, %rd361, 512; 2026-02-21T10:23:47.7747424Z setp.lt.u64 %p74, %rd362, 3968; 2026-02-21T10:23:47.7747550Z mov.b64 %rd362, %rd26; 2026-02-21T10:23:47.7747624Z @%p74 bra $L__BB0_3; 2026-02-21T10:23:47.7747742Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T10:23:47.7747956Z .loc 1 38 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:38:32 2026-02-21T10:23:47.7748025Z or.b32 %r5683, %r636, %r15; 2026-02-21T10:23:47.7748227Z .loc 1 40 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:40:32 2026-02-21T10:23:47.7748291Z or.b32 %r5684, %r105, %r7; 2026-02-21T10:23:47.7748358Z or.b32 %r5685, %r105, %r8; 2026-02-21T10:23:47.7748416Z or.b32 %r5686, %r105, %r9; 2026-02-21T10:23:47.7748551Z or.b32 %r5687, %r105, %r10; 2026-02-21T10:23:47.7748906Z or.b32 %r5688, %r105, %r11; 2026-02-21T10:23:47.7748974Z or.b32 %r5689, %r105, %r12; 2026-02-21T10:23:47.7749045Z or.b32 %r5690, %r105, %r13; 2026-02-21T10:23:47.7749105Z or.b32 %r5691, %r105, %r14; 2026-02-21T10:23:47.7749318Z .loc 1 93 28 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:93:28 2026-02-21T10:23:47.7749404Z 
cvt.rn.bf16x2.f32 %r5692, %r16092, %r16091; 2026-02-21T10:23:47.7749547Z cvt.rn.bf16x2.f32 %r5693, %r16094, %r16093; 2026-02-21T10:23:47.7749634Z cvt.rn.bf16x2.f32 %r5694, %r16096, %r16095; 2026-02-21T10:23:47.7749711Z cvt.rn.bf16x2.f32 %r5695, %r16098, %r16097; 2026-02-21T10:23:47.7749787Z cvt.rn.bf16x2.f32 %r5696, %r16100, %r16099; 2026-02-21T10:23:47.7749868Z cvt.rn.bf16x2.f32 %r5697, %r16102, %r16101; 2026-02-21T10:23:47.7749946Z cvt.rn.bf16x2.f32 %r5698, %r16104, %r16103; 2026-02-21T10:23:47.7750024Z cvt.rn.bf16x2.f32 %r5699, %r16106, %r16105; 2026-02-21T10:23:47.7750100Z cvt.rn.bf16x2.f32 %r5700, %r16108, %r16107; 2026-02-21T10:23:47.7750187Z cvt.rn.bf16x2.f32 %r5701, %r16110, %r16109; 2026-02-21T10:23:47.7750264Z cvt.rn.bf16x2.f32 %r5702, %r16112, %r16111; 2026-02-21T10:23:47.7750341Z cvt.rn.bf16x2.f32 %r5703, %r16114, %r16113; 2026-02-21T10:23:47.7750425Z cvt.rn.bf16x2.f32 %r5704, %r16116, %r16115; 2026-02-21T10:23:47.7750501Z cvt.rn.bf16x2.f32 %r5705, %r16118, %r16117; 2026-02-21T10:23:47.7750581Z cvt.rn.bf16x2.f32 %r5706, %r16120, %r16119; 2026-02-21T10:23:47.7750663Z cvt.rn.bf16x2.f32 %r5707, %r16122, %r16121; 2026-02-21T10:23:47.7750740Z cvt.rn.bf16x2.f32 %r5708, %r16124, %r16123; 2026-02-21T10:23:47.7750815Z cvt.rn.bf16x2.f32 %r5709, %r16126, %r16125; 2026-02-21T10:23:47.7750894Z cvt.rn.bf16x2.f32 %r5710, %r16128, %r16127; 2026-02-21T10:23:47.7750975Z cvt.rn.bf16x2.f32 %r5711, %r16130, %r16129; 2026-02-21T10:23:47.7751047Z cvt.rn.bf16x2.f32 %r5712, %r16132, %r16131; 2026-02-21T10:23:47.7751122Z cvt.rn.bf16x2.f32 %r5713, %r16134, %r16133; 2026-02-21T10:23:47.7751211Z cvt.rn.bf16x2.f32 %r5714, %r16136, %r16135; 2026-02-21T10:23:47.7751290Z cvt.rn.bf16x2.f32 %r5715, %r16138, %r16137; 2026-02-21T10:23:47.7751365Z cvt.rn.bf16x2.f32 %r5716, %r16140, %r16139; 2026-02-21T10:23:47.7751443Z cvt.rn.bf16x2.f32 %r5717, %r16142, %r16141; 2026-02-21T10:23:47.7751523Z cvt.rn.bf16x2.f32 %r5718, %r16144, %r16143; 2026-02-21T10:23:47.7751599Z cvt.rn.bf16x2.f32 %r5719, 
%r16146, %r16145; 2026-02-21T10:23:47.7751678Z cvt.rn.bf16x2.f32 %r5720, %r16148, %r16147; 2026-02-21T10:23:47.7751758Z cvt.rn.bf16x2.f32 %r5721, %r16150, %r16149; 2026-02-21T10:23:47.7751834Z cvt.rn.bf16x2.f32 %r5722, %r16152, %r16151; 2026-02-21T10:23:47.7751907Z cvt.rn.bf16x2.f32 %r5723, %r16154, %r16153; 2026-02-21T10:23:47.7752130Z .loc 1 94 50 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:94:50 2026-02-21T10:23:47.7752206Z mad.lo.s32 %r5724, %r5684, 1280, %r5683; 2026-02-21T10:23:47.7752281Z mad.lo.s32 %r5725, %r5685, 1280, %r5683; 2026-02-21T10:23:47.7752427Z mad.lo.s32 %r5726, %r5686, 1280, %r5683; 2026-02-21T10:23:47.7752497Z mad.lo.s32 %r5727, %r5687, 1280, %r5683; 2026-02-21T10:23:47.7752564Z mad.lo.s32 %r5728, %r5688, 1280, %r5683; 2026-02-21T10:23:47.7752633Z mad.lo.s32 %r5729, %r5689, 1280, %r5683; 2026-02-21T10:23:47.7752755Z mad.lo.s32 %r5730, %r5690, 1280, %r5683; 2026-02-21T10:23:47.7752826Z mad.lo.s32 %r5731, %r5691, 1280, %r5683; 2026-02-21T10:23:47.7753040Z .loc 1 94 22 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:94:22 2026-02-21T10:23:47.7753117Z mad.wide.s32 %rd159, %r5724, 2, %rd45; 2026-02-21T10:23:47.7753188Z mad.wide.s32 %rd160, %r5725, 2, %rd45; 2026-02-21T10:23:47.7753257Z mad.wide.s32 %rd161, %r5726, 2, %rd45; 2026-02-21T10:23:47.7753328Z mad.wide.s32 %rd162, %r5727, 2, %rd45; 2026-02-21T10:23:47.7753397Z mad.wide.s32 %rd163, %r5728, 2, %rd45; 2026-02-21T10:23:47.7753463Z mad.wide.s32 %rd164, %r5729, 2, %rd45; 2026-02-21T10:23:47.7753532Z mad.wide.s32 %rd165, %r5730, 2, %rd45; 2026-02-21T10:23:47.7753652Z mad.wide.s32 %rd166, %r5731, 2, %rd45; 2026-02-21T10:23:47.7753862Z .loc 1 94 81 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:94:81 2026-02-21T10:23:47.7753921Z bar.sync 0; 2026-02-21T10:23:47.7754045Z st.shared.v4.b32 [%r44], {%r5692, %r5694, %r5696, %r5698}; 2026-02-21T10:23:47.7754165Z st.shared.v4.b32 [%r44+512], {%r5693, %r5695, %r5697, %r5699}; 2026-02-21T10:23:47.7754319Z 
st.shared.v4.b32 [%r45], {%r5700, %r5702, %r5704, %r5706}; 2026-02-21T10:23:47.7754454Z st.shared.v4.b32 [%r45+512], {%r5701, %r5703, %r5705, %r5707}; 2026-02-21T10:23:47.7754564Z st.shared.v4.b32 [%r46], {%r5708, %r5710, %r5712, %r5714}; 2026-02-21T10:23:47.7754676Z st.shared.v4.b32 [%r46+512], {%r5709, %r5711, %r5713, %r5715}; 2026-02-21T10:23:47.7754779Z st.shared.v4.b32 [%r47], {%r5716, %r5718, %r5720, %r5722}; 2026-02-21T10:23:47.7754894Z st.shared.v4.b32 [%r47+512], {%r5717, %r5719, %r5721, %r5723}; 2026-02-21T10:23:47.7754956Z bar.sync 0; 2026-02-21T10:23:47.7755022Z // begin inline asm 2026-02-21T10:23:47.7755222Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5650, %r5651, %r5652, %r5653}, [%r5614]; 2026-02-21T10:23:47.7755285Z // end inline asm 2026-02-21T10:23:47.7755350Z // begin inline asm 2026-02-21T10:23:47.7755539Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5654, %r5655, %r5656, %r5657}, [%r5619]; 2026-02-21T10:23:47.7755599Z // end inline asm 2026-02-21T10:23:47.7755665Z // begin inline asm 2026-02-21T10:23:47.7755847Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5658, %r5659, %r5660, %r5661}, [%r5624]; 2026-02-21T10:23:47.7755910Z // end inline asm 2026-02-21T10:23:47.7755971Z // begin inline asm 2026-02-21T10:23:47.7756150Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5662, %r5663, %r5664, %r5665}, [%r5629]; 2026-02-21T10:23:47.7756213Z // end inline asm 2026-02-21T10:23:47.7756273Z // begin inline asm 2026-02-21T10:23:47.7756587Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5666, %r5667, %r5668, %r5669}, [%r5634]; 2026-02-21T10:23:47.7756656Z // end inline asm 2026-02-21T10:23:47.7756723Z // begin inline asm 2026-02-21T10:23:47.7756906Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5670, %r5671, %r5672, %r5673}, [%r5639]; 2026-02-21T10:23:47.7756977Z // end inline asm 2026-02-21T10:23:47.7757048Z // begin inline asm 2026-02-21T10:23:47.7757231Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5674, %r5675, %r5676, %r5677}, [%r5644]; 
2026-02-21T10:23:47.7757292Z // end inline asm 2026-02-21T10:23:47.7757359Z // begin inline asm 2026-02-21T10:23:47.7757539Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5678, %r5679, %r5680, %r5681}, [%r5649]; 2026-02-21T10:23:47.7757597Z // end inline asm 2026-02-21T10:23:47.7757659Z // begin inline asm 2026-02-21T10:23:47.7757792Z st.global.v4.b32 [ %rd159 + 0 ], { %r5650, %r5651, %r5652, %r5653 }; 2026-02-21T10:23:47.7757852Z // end inline asm 2026-02-21T10:23:47.7757913Z // begin inline asm 2026-02-21T10:23:47.7758040Z st.global.v4.b32 [ %rd160 + 0 ], { %r5654, %r5655, %r5656, %r5657 }; 2026-02-21T10:23:47.7758182Z // end inline asm 2026-02-21T10:23:47.7758243Z // begin inline asm 2026-02-21T10:23:47.7758373Z st.global.v4.b32 [ %rd161 + 0 ], { %r5658, %r5659, %r5660, %r5661 }; 2026-02-21T10:23:47.7758442Z // end inline asm 2026-02-21T10:23:47.7758567Z // begin inline asm 2026-02-21T10:23:47.7758688Z st.global.v4.b32 [ %rd162 + 0 ], { %r5662, %r5663, %r5664, %r5665 }; 2026-02-21T10:23:47.7758760Z // end inline asm 2026-02-21T10:23:47.7758823Z // begin inline asm 2026-02-21T10:23:47.7758940Z st.global.v4.b32 [ %rd163 + 0 ], { %r5666, %r5667, %r5668, %r5669 }; 2026-02-21T10:23:47.7759000Z // end inline asm 2026-02-21T10:23:47.7759060Z // begin inline asm 2026-02-21T10:23:47.7759174Z st.global.v4.b32 [ %rd164 + 0 ], { %r5670, %r5671, %r5672, %r5673 }; 2026-02-21T10:23:47.7759231Z // end inline asm 2026-02-21T10:23:47.7759295Z // begin inline asm 2026-02-21T10:23:47.7759412Z st.global.v4.b32 [ %rd165 + 0 ], { %r5674, %r5675, %r5676, %r5677 }; 2026-02-21T10:23:47.7759474Z // end inline asm 2026-02-21T10:23:47.7759622Z // begin inline asm 2026-02-21T10:23:47.7759752Z st.global.v4.b32 [ %rd166 + 0 ], { %r5678, %r5679, %r5680, %r5681 }; 2026-02-21T10:23:47.7759810Z // end inline asm 2026-02-21T10:23:47.7760034Z .loc 1 26 140 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:26:140 2026-02-21T10:23:47.7760109Z add.s32 %r5732, %r16090, 132; 
2026-02-21T10:23:47.7760380Z .loc 1 32 35 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:32:35 2026-02-21T10:23:47.7760451Z shr.s32 %r5733, %r5732, 31; 2026-02-21T10:23:47.7760519Z shr.u32 %r5734, %r5733, 17; 2026-02-21T10:23:47.7760588Z add.s32 %r5735, %r5732, %r5734; 2026-02-21T10:23:47.7760651Z shr.s32 %r5736, %r5735, 15; 2026-02-21T10:23:47.7760860Z .loc 1 33 33 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:33:33 2026-02-21T10:23:47.7760924Z shl.b32 %r5737, %r5736, 6; 2026-02-21T10:23:47.7761130Z .loc 1 34 39 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:34:39 2026-02-21T10:23:47.7761202Z sub.s32 %r5738, 10, %r5737; 2026-02-21T10:23:47.7761403Z .loc 1 34 52 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:34:52 2026-02-21T10:23:47.7761468Z min.s32 %r5739, %r5738, 64; 2026-02-21T10:23:47.7761670Z .loc 1 35 45 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:35:45 2026-02-21T10:23:47.7761757Z and.b32 %r5740, %r5735, -32768; 2026-02-21T10:23:47.7761824Z sub.s32 %r5741, %r5732, %r5740; 2026-02-21T10:23:47.7762028Z .loc 1 36 51 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:36:51 2026-02-21T10:23:47.7762096Z div.s32 %r5742, %r5741, %r5739; 2026-02-21T10:23:47.7762298Z .loc 1 35 64 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:35:64 2026-02-21T10:23:47.7762368Z mul.lo.s32 %r5743, %r5742, %r5739; 2026-02-21T10:23:47.7762437Z sub.s32 %r5744, %r5741, %r5743; 2026-02-21T10:23:47.7762641Z .loc 1 35 30 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:35:30 2026-02-21T10:23:47.7762704Z add.s32 %r5745, %r5744, %r5737; 2026-02-21T10:23:47.7762909Z .loc 1 37 27 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:37:27 2026-02-21T10:23:47.7762977Z shl.b32 %r5772, %r5745, 7; 2026-02-21T10:23:47.7763181Z .loc 1 39 27 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:39:27 2026-02-21T10:23:47.7763246Z shl.b32 %r235, %r5742, 7; 2026-02-21T10:23:47.7763466Z .loc 1 47 126 // 
cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:47:126 2026-02-21T10:23:47.7763530Z shl.b32 %r5746, %r5742, 20; 2026-02-21T10:23:47.7763594Z or.b32 %r5747, %r16086, %r5746; 2026-02-21T10:23:47.7763667Z mul.wide.s32 %rd28, %r5747, 2; 2026-02-21T10:23:47.7763732Z or.b32 %r5748, %r16087, %r235; 2026-02-21T10:23:47.7763798Z shl.b32 %r5749, %r5748, 13; 2026-02-21T10:23:47.7763929Z mul.wide.s32 %rd29, %r5749, 2; 2026-02-21T10:23:47.7763991Z or.b32 %r5750, %r16088, %r235; 2026-02-21T10:23:47.7764053Z shl.b32 %r5751, %r5750, 13; 2026-02-21T10:23:47.7764118Z mul.wide.s32 %rd30, %r5751, 2; 2026-02-21T10:23:47.7764233Z or.b32 %r5752, %r16089, %r5746; 2026-02-21T10:23:47.7764311Z mul.wide.s32 %rd31, %r5752, 2; 2026-02-21T10:23:47.7764378Z mov.b32 %r16155, 0f00000000; 2026-02-21T10:23:47.7764444Z mov.b64 %rd364, 0; 2026-02-21T10:23:47.7764508Z mov.b64 %rd363, %rd10; 2026-02-21T10:23:47.7764573Z mov.b32 %r16156, %r16155; 2026-02-21T10:23:47.7764633Z mov.b32 %r16157, %r16155; 2026-02-21T10:23:47.7764699Z mov.b32 %r16158, %r16155; 2026-02-21T10:23:47.7764760Z mov.b32 %r16159, %r16155; 2026-02-21T10:23:47.7764820Z mov.b32 %r16160, %r16155; 2026-02-21T10:23:47.7764885Z mov.b32 %r16161, %r16155; 2026-02-21T10:23:47.7764945Z mov.b32 %r16162, %r16155; 2026-02-21T10:23:47.7765006Z mov.b32 %r16163, %r16155; 2026-02-21T10:23:47.7765068Z mov.b32 %r16164, %r16155; 2026-02-21T10:23:47.7765188Z mov.b32 %r16165, %r16155; 2026-02-21T10:23:47.7765251Z mov.b32 %r16166, %r16155; 2026-02-21T10:23:47.7765312Z mov.b32 %r16167, %r16155; 2026-02-21T10:23:47.7765377Z mov.b32 %r16168, %r16155; 2026-02-21T10:23:47.7765439Z mov.b32 %r16169, %r16155; 2026-02-21T10:23:47.7765500Z mov.b32 %r16170, %r16155; 2026-02-21T10:23:47.7765559Z mov.b32 %r16171, %r16155; 2026-02-21T10:23:47.7765694Z mov.b32 %r16172, %r16155; 2026-02-21T10:23:47.7765758Z mov.b32 %r16173, %r16155; 2026-02-21T10:23:47.7765818Z mov.b32 %r16174, %r16155; 2026-02-21T10:23:47.7765885Z mov.b32 %r16175, %r16155; 
2026-02-21T10:23:47.7765944Z mov.b32 %r16176, %r16155; 2026-02-21T10:23:47.7766002Z mov.b32 %r16177, %r16155; 2026-02-21T10:23:47.7766063Z mov.b32 %r16178, %r16155; 2026-02-21T10:23:47.7766127Z mov.b32 %r16179, %r16155; 2026-02-21T10:23:47.7766188Z mov.b32 %r16180, %r16155; 2026-02-21T10:23:47.7766247Z mov.b32 %r16181, %r16155; 2026-02-21T10:23:47.7766315Z mov.b32 %r16182, %r16155; 2026-02-21T10:23:47.7766379Z mov.b32 %r16183, %r16155; 2026-02-21T10:23:47.7766441Z mov.b32 %r16184, %r16155; 2026-02-21T10:23:47.7766634Z mov.b32 %r16185, %r16155; 2026-02-21T10:23:47.7766699Z mov.b32 %r16186, %r16155; 2026-02-21T10:23:47.7766762Z mov.b32 %r16187, %r16155; 2026-02-21T10:23:47.7766825Z mov.b32 %r16188, %r16155; 2026-02-21T10:23:47.7766890Z mov.b32 %r16189, %r16155; 2026-02-21T10:23:47.7766969Z mov.b32 %r16190, %r16155; 2026-02-21T10:23:47.7767033Z mov.b32 %r16191, %r16155; 2026-02-21T10:23:47.7767099Z mov.b32 %r16192, %r16155; 2026-02-21T10:23:47.7767159Z mov.b32 %r16193, %r16155; 2026-02-21T10:23:47.7767222Z mov.b32 %r16194, %r16155; 2026-02-21T10:23:47.7767284Z mov.b32 %r16195, %r16155; 2026-02-21T10:23:47.7767351Z mov.b32 %r16196, %r16155; 2026-02-21T10:23:47.7767413Z mov.b32 %r16197, %r16155; 2026-02-21T10:23:47.7767474Z mov.b32 %r16198, %r16155; 2026-02-21T10:23:47.7767538Z mov.b32 %r16199, %r16155; 2026-02-21T10:23:47.7767604Z mov.b32 %r16200, %r16155; 2026-02-21T10:23:47.7767667Z mov.b32 %r16201, %r16155; 2026-02-21T10:23:47.7767729Z mov.b32 %r16202, %r16155; 2026-02-21T10:23:47.7767799Z mov.b32 %r16203, %r16155; 2026-02-21T10:23:47.7767860Z mov.b32 %r16204, %r16155; 2026-02-21T10:23:47.7767923Z mov.b32 %r16205, %r16155; 2026-02-21T10:23:47.7767986Z mov.b32 %r16206, %r16155; 2026-02-21T10:23:47.7768047Z mov.b32 %r16207, %r16155; 2026-02-21T10:23:47.7768107Z mov.b32 %r16208, %r16155; 2026-02-21T10:23:47.7768168Z mov.b32 %r16209, %r16155; 2026-02-21T10:23:47.7768234Z mov.b32 %r16210, %r16155; 2026-02-21T10:23:47.7768294Z mov.b32 %r16211, %r16155; 
2026-02-21T10:23:47.7768355Z mov.b32 %r16212, %r16155; 2026-02-21T10:23:47.7768419Z mov.b32 %r16213, %r16155; 2026-02-21T10:23:47.7768480Z mov.b32 %r16214, %r16155; 2026-02-21T10:23:47.7768540Z mov.b32 %r16215, %r16155; 2026-02-21T10:23:47.7768601Z mov.b32 %r16216, %r16155; 2026-02-21T10:23:47.7768664Z mov.b32 %r16217, %r16155; 2026-02-21T10:23:47.7768818Z mov.b32 %r16218, %r16155; 2026-02-21T10:23:47.7768943Z $L__BB0_5: // Parent Loop BB0_2 Depth=1 2026-02-21T10:23:47.7769061Z // => This Inner Loop Header: Depth=2 2026-02-21T10:23:47.7769346Z .loc 1 0 126 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:0:126 2026-02-21T10:23:47.7769416Z cvt.u32.u64 %r5773, %rd364; 2026-02-21T10:23:47.7769632Z .loc 1 54 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:54:32 2026-02-21T10:23:47.7769701Z add.s64 %rd169, %rd363, %rd31; 2026-02-21T10:23:47.7769766Z add.s64 %rd172, %rd363, %rd30; 2026-02-21T10:23:47.7769830Z add.s64 %rd175, %rd363, %rd29; 2026-02-21T10:23:47.7770039Z .loc 1 54 80 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:54:80 2026-02-21T10:23:47.7770104Z add.s64 %rd178, %rd363, %rd28; 2026-02-21T10:23:47.7770166Z // begin inline asm 2026-02-21T10:23:47.7770234Z mov.u64 %rd168, 0x0; 2026-02-21T10:23:47.7770429Z createpolicy.fractional.L2::evict_last.b64 %rd168, 1.0; 2026-02-21T10:23:47.7770493Z // end inline asm 2026-02-21T10:23:47.7770559Z // begin inline asm 2026-02-21T10:23:47.7770620Z mov.u32 %r5753, 0x0; 2026-02-21T10:23:47.7770683Z mov.u32 %r5754, 0x0; 2026-02-21T10:23:47.7770742Z mov.u32 %r5755, 0x0; 2026-02-21T10:23:47.7770803Z mov.u32 %r5756, 0x0; 2026-02-21T10:23:47.7771090Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r5753, %r5754, %r5755, %r5756 }, [ %rd169 + 0 ], %rd168; 2026-02-21T10:23:47.7771153Z // end inline asm 2026-02-21T10:23:47.7771220Z // begin inline asm 2026-02-21T10:23:47.7771280Z mov.u64 %rd171, 0x0; 2026-02-21T10:23:47.7771401Z createpolicy.fractional.L2::evict_last.b64 %rd171, 1.0; 
2026-02-21T10:23:47.7771464Z // end inline asm 2026-02-21T10:23:47.7771535Z // begin inline asm 2026-02-21T10:23:47.7771597Z mov.u32 %r5757, 0x0; 2026-02-21T10:23:47.7771658Z mov.u32 %r5758, 0x0; 2026-02-21T10:23:47.7771722Z mov.u32 %r5759, 0x0; 2026-02-21T10:23:47.7771783Z mov.u32 %r5760, 0x0; 2026-02-21T10:23:47.7772008Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r5757, %r5758, %r5759, %r5760 }, [ %rd172 + 0 ], %rd171; 2026-02-21T10:23:47.7772070Z // end inline asm 2026-02-21T10:23:47.7772137Z // begin inline asm 2026-02-21T10:23:47.7772197Z mov.u64 %rd174, 0x0; 2026-02-21T10:23:47.7772317Z createpolicy.fractional.L2::evict_last.b64 %rd174, 1.0; 2026-02-21T10:23:47.7772383Z // end inline asm 2026-02-21T10:23:47.7772442Z // begin inline asm 2026-02-21T10:23:47.7772501Z mov.u32 %r5761, 0x0; 2026-02-21T10:23:47.7772566Z mov.u32 %r5762, 0x0; 2026-02-21T10:23:47.7772624Z mov.u32 %r5763, 0x0; 2026-02-21T10:23:47.7772686Z mov.u32 %r5764, 0x0; 2026-02-21T10:23:47.7772904Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r5761, %r5762, %r5763, %r5764 }, [ %rd175 + 0 ], %rd174; 2026-02-21T10:23:47.7772975Z // end inline asm 2026-02-21T10:23:47.7773037Z // begin inline asm 2026-02-21T10:23:47.7773097Z mov.u64 %rd177, 0x0; 2026-02-21T10:23:47.7773225Z createpolicy.fractional.L2::evict_last.b64 %rd177, 1.0; 2026-02-21T10:23:47.7773284Z // end inline asm 2026-02-21T10:23:47.7773345Z // begin inline asm 2026-02-21T10:23:47.7773409Z mov.u32 %r5765, 0x0; 2026-02-21T10:23:47.7773471Z mov.u32 %r5766, 0x0; 2026-02-21T10:23:47.7773530Z mov.u32 %r5767, 0x0; 2026-02-21T10:23:47.7773589Z mov.u32 %r5768, 0x0; 2026-02-21T10:23:47.7773806Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r5765, %r5766, %r5767, %r5768 }, [ %rd178 + 0 ], %rd177; 2026-02-21T10:23:47.7773865Z // end inline asm 2026-02-21T10:23:47.7774072Z .loc 1 58 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:58:32 2026-02-21T10:23:47.7774136Z bar.sync 0; 2026-02-21T10:23:47.7774221Z st.shared.v2.b32 
[%r18], {%r5753, %r5754}; 2026-02-21T10:23:47.7774312Z st.shared.v2.b32 [%r18+4096], {%r5757, %r5758}; 2026-02-21T10:23:47.7774402Z st.shared.v2.b32 [%r18+8192], {%r5761, %r5762}; 2026-02-21T10:23:47.7774564Z st.shared.v2.b32 [%r18+12288], {%r5765, %r5766}; 2026-02-21T10:23:47.7774645Z st.shared.v2.b32 [%r19], {%r5755, %r5756}; 2026-02-21T10:23:47.7774730Z st.shared.v2.b32 [%r19+4096], {%r5759, %r5760}; 2026-02-21T10:23:47.7774818Z st.shared.v2.b32 [%r19+8192], {%r5763, %r5764}; 2026-02-21T10:23:47.7774951Z st.shared.v2.b32 [%r19+12288], {%r5767, %r5768}; 2026-02-21T10:23:47.7775008Z bar.sync 0; 2026-02-21T10:23:47.7775084Z ld.shared.b16 %rs769, [%r20]; 2026-02-21T10:23:47.7775160Z ld.shared.b16 %rs770, [%r20+1024]; 2026-02-21T10:23:47.7775231Z ld.shared.b16 %rs771, [%r20+64]; 2026-02-21T10:23:47.7775304Z ld.shared.b16 %rs772, [%r20+1088]; 2026-02-21T10:23:47.7775372Z ld.shared.b16 %rs773, [%r21]; 2026-02-21T10:23:47.7775438Z ld.shared.b16 %rs774, [%r21+1024]; 2026-02-21T10:23:47.7775505Z ld.shared.b16 %rs775, [%r21+64]; 2026-02-21T10:23:47.7775577Z ld.shared.b16 %rs776, [%r21+1088]; 2026-02-21T10:23:47.7775645Z ld.shared.b16 %rs777, [%r22]; 2026-02-21T10:23:47.7775711Z ld.shared.b16 %rs778, [%r22+1024]; 2026-02-21T10:23:47.7775836Z ld.shared.b16 %rs779, [%r22+64]; 2026-02-21T10:23:47.7775903Z ld.shared.b16 %rs780, [%r22+1088]; 2026-02-21T10:23:47.7775969Z ld.shared.b16 %rs781, [%r23]; 2026-02-21T10:23:47.7776036Z ld.shared.b16 %rs782, [%r23+1024]; 2026-02-21T10:23:47.7776112Z ld.shared.b16 %rs783, [%r23+64]; 2026-02-21T10:23:47.7776177Z ld.shared.b16 %rs784, [%r23+1088]; 2026-02-21T10:23:47.7776301Z ld.shared.b16 %rs785, [%r24]; 2026-02-21T10:23:47.7776378Z ld.shared.b16 %rs786, [%r24+1024]; 2026-02-21T10:23:47.7776445Z ld.shared.b16 %rs787, [%r24+64]; 2026-02-21T10:23:47.7776642Z ld.shared.b16 %rs788, [%r24+1088]; 2026-02-21T10:23:47.7776708Z ld.shared.b16 %rs789, [%r25]; 2026-02-21T10:23:47.7776786Z ld.shared.b16 %rs790, [%r25+1024]; 2026-02-21T10:23:47.7776852Z 
ld.shared.b16 %rs791, [%r25+64]; 2026-02-21T10:23:47.7776918Z ld.shared.b16 %rs792, [%r25+1088]; 2026-02-21T10:23:47.7776988Z ld.shared.b16 %rs793, [%r26]; 2026-02-21T10:23:47.7777054Z ld.shared.b16 %rs794, [%r26+1024]; 2026-02-21T10:23:47.7777128Z ld.shared.b16 %rs795, [%r26+64]; 2026-02-21T10:23:47.7777198Z ld.shared.b16 %rs796, [%r26+1088]; 2026-02-21T10:23:47.7777265Z ld.shared.b16 %rs797, [%r27]; 2026-02-21T10:23:47.7777332Z ld.shared.b16 %rs798, [%r27+1024]; 2026-02-21T10:23:47.7777404Z ld.shared.b16 %rs799, [%r27+64]; 2026-02-21T10:23:47.7777474Z ld.shared.b16 %rs800, [%r27+1088]; 2026-02-21T10:23:47.7777548Z cvt.f32.bf16 %r5906, %rs769; 2026-02-21T10:23:47.7777613Z cvt.f32.bf16 %r5907, %rs770; 2026-02-21T10:23:47.7777681Z cvt.f32.bf16 %r5908, %rs773; 2026-02-21T10:23:47.7777746Z cvt.f32.bf16 %r5909, %rs774; 2026-02-21T10:23:47.7777811Z cvt.f32.bf16 %r6038, %rs777; 2026-02-21T10:23:47.7777872Z cvt.f32.bf16 %r6039, %rs778; 2026-02-21T10:23:47.7777940Z cvt.f32.bf16 %r6040, %rs781; 2026-02-21T10:23:47.7778004Z cvt.f32.bf16 %r6041, %rs782; 2026-02-21T10:23:47.7778067Z cvt.f32.bf16 %r6170, %rs785; 2026-02-21T10:23:47.7778132Z cvt.f32.bf16 %r6171, %rs786; 2026-02-21T10:23:47.7778197Z cvt.f32.bf16 %r6172, %rs789; 2026-02-21T10:23:47.7778261Z cvt.f32.bf16 %r6173, %rs790; 2026-02-21T10:23:47.7778323Z cvt.f32.bf16 %r6302, %rs793; 2026-02-21T10:23:47.7778392Z cvt.f32.bf16 %r6303, %rs794; 2026-02-21T10:23:47.7778453Z cvt.f32.bf16 %r6304, %rs797; 2026-02-21T10:23:47.7778518Z cvt.f32.bf16 %r6305, %rs798; 2026-02-21T10:23:47.7778592Z cvt.f32.bf16 %r6434, %rs771; 2026-02-21T10:23:47.7778663Z cvt.f32.bf16 %r6435, %rs772; 2026-02-21T10:23:47.7778730Z cvt.f32.bf16 %r6436, %rs775; 2026-02-21T10:23:47.7778796Z cvt.f32.bf16 %r6437, %rs776; 2026-02-21T10:23:47.7778858Z cvt.f32.bf16 %r6566, %rs779; 2026-02-21T10:23:47.7778924Z cvt.f32.bf16 %r6567, %rs780; 2026-02-21T10:23:47.7778986Z cvt.f32.bf16 %r6568, %rs783; 2026-02-21T10:23:47.7779052Z cvt.f32.bf16 %r6569, %rs784; 
2026-02-21T10:23:47.7779118Z cvt.f32.bf16 %r6698, %rs787; 2026-02-21T10:23:47.7779181Z cvt.f32.bf16 %r6699, %rs788; 2026-02-21T10:23:47.7779249Z cvt.f32.bf16 %r6700, %rs791; 2026-02-21T10:23:47.7779404Z cvt.f32.bf16 %r6701, %rs792; 2026-02-21T10:23:47.7779473Z cvt.f32.bf16 %r6830, %rs795; 2026-02-21T10:23:47.7779536Z cvt.f32.bf16 %r6831, %rs796; 2026-02-21T10:23:47.7779603Z cvt.f32.bf16 %r6832, %rs799; 2026-02-21T10:23:47.7779666Z cvt.f32.bf16 %r6833, %rs800; 2026-02-21T10:23:47.7779946Z .loc 1 60 33 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:60:33 2026-02-21T10:23:47.7780008Z bar.sync 0; 2026-02-21T10:23:47.7780073Z // begin inline asm 2026-02-21T10:23:47.7780177Z @%p132 mbarrier.init.shared::cta.b64 [%r5769], 1; 2026-02-21T10:23:47.7780255Z // end inline asm 2026-02-21T10:23:47.7780315Z bar.sync 0; 2026-02-21T10:23:47.7780377Z // begin inline asm 2026-02-21T10:23:47.7780518Z @%p132 mbarrier.arrive.expect_tx.shared.b64 _, [%r5769], 4096; 2026-02-21T10:23:47.7780584Z // end inline asm 2026-02-21T10:23:47.7780644Z // begin inline asm 2026-02-21T10:23:47.7780724Z fence.proxy.async.shared::cta; 2026-02-21T10:23:47.7780787Z // end inline asm 2026-02-21T10:23:47.7780847Z bar.sync 0; 2026-02-21T10:23:47.7780981Z elect.sync %r10613|%p125, -1; 2026-02-21T10:23:47.7781054Z and.pred %p77, %p1, %p125; 2026-02-21T10:23:47.7781120Z // begin inline asm 2026-02-21T10:23:47.7781452Z @%p77 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r10980], [%rd281, {%r5772, %r5773}], [%r5769]; 2026-02-21T10:23:47.7781515Z // end inline asm 2026-02-21T10:23:47.7781639Z bar.sync 0; 2026-02-21T10:23:47.7781703Z mov.b32 %r10545, 0; 2026-02-21T10:23:47.7781763Z // begin inline asm 2026-02-21T10:23:47.7781819Z 2026-02-21T10:23:47.7781876Z { 2026-02-21T10:23:47.7781943Z .reg .pred complete; 2026-02-21T10:23:47.7782001Z waitLoop: 2026-02-21T10:23:47.7782156Z mbarrier.try_wait.parity.shared.b64 complete, [%r5769], %r10545; 2026-02-21T10:23:47.7782231Z 
@!complete bra.uni waitLoop; 2026-02-21T10:23:47.7782284Z } 2026-02-21T10:23:47.7782289Z 2026-02-21T10:23:47.7782353Z // end inline asm 2026-02-21T10:23:47.7782422Z bar.sync 0; 2026-02-21T10:23:47.7782489Z // begin inline asm 2026-02-21T10:23:47.7782588Z @%p132 mbarrier.inval.shared::cta.b64 [%r5769]; 2026-02-21T10:23:47.7782654Z // end inline asm 2026-02-21T10:23:47.7782866Z .loc 1 78 58 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:78:58 2026-02-21T10:23:47.7782939Z ld.shared.b8 %rs801, [%r28]; 2026-02-21T10:23:47.7783016Z ld.shared.b8 %rs802, [%r28+1024]; 2026-02-21T10:23:47.7783088Z ld.shared.b8 %rs803, [%r28+2048]; 2026-02-21T10:23:47.7783155Z ld.shared.b8 %rs804, [%r28+3072]; 2026-02-21T10:23:47.7783225Z ld.shared.b8 %rs805, [%r29+128]; 2026-02-21T10:23:47.7783298Z ld.shared.b8 %rs806, [%r29+1152]; 2026-02-21T10:23:47.7783363Z ld.shared.b8 %rs807, [%r29+2176]; 2026-02-21T10:23:47.7783427Z ld.shared.b8 %rs808, [%r29+3200]; 2026-02-21T10:23:47.7783506Z ld.shared.b8 %rs809, [%r30+256]; 2026-02-21T10:23:47.7783579Z ld.shared.b8 %rs810, [%r30+1280]; 2026-02-21T10:23:47.7783645Z ld.shared.b8 %rs811, [%r30+2304]; 2026-02-21T10:23:47.7783721Z ld.shared.b8 %rs812, [%r30+3328]; 2026-02-21T10:23:47.7783790Z ld.shared.b8 %rs813, [%r31+384]; 2026-02-21T10:23:47.7783856Z ld.shared.b8 %rs814, [%r31+1408]; 2026-02-21T10:23:47.7783925Z ld.shared.b8 %rs815, [%r31+2432]; 2026-02-21T10:23:47.7783998Z ld.shared.b8 %rs816, [%r31+3456]; 2026-02-21T10:23:47.7784063Z ld.shared.b8 %rs817, [%r32+512]; 2026-02-21T10:23:47.7784131Z ld.shared.b8 %rs818, [%r32+1536]; 2026-02-21T10:23:47.7784204Z ld.shared.b8 %rs819, [%r32+2560]; 2026-02-21T10:23:47.7784270Z ld.shared.b8 %rs820, [%r32+3584]; 2026-02-21T10:23:47.7784337Z ld.shared.b8 %rs821, [%r33+640]; 2026-02-21T10:23:47.7784402Z ld.shared.b8 %rs822, [%r33+1664]; 2026-02-21T10:23:47.7784471Z ld.shared.b8 %rs823, [%r33+2688]; 2026-02-21T10:23:47.7784538Z ld.shared.b8 %rs824, [%r33+3712]; 2026-02-21T10:23:47.7784603Z 
ld.shared.b8 %rs825, [%r34+768]; 2026-02-21T10:23:47.7784672Z ld.shared.b8 %rs826, [%r34+1792]; 2026-02-21T10:23:47.7784737Z ld.shared.b8 %rs827, [%r34+2816]; 2026-02-21T10:23:47.7784868Z ld.shared.b8 %rs828, [%r34+3840]; 2026-02-21T10:23:47.7784940Z ld.shared.b8 %rs829, [%r35+896]; 2026-02-21T10:23:47.7785004Z ld.shared.b8 %rs830, [%r35+1920]; 2026-02-21T10:23:47.7785069Z ld.shared.b8 %rs831, [%r35+2944]; 2026-02-21T10:23:47.7785182Z ld.shared.b8 %rs832, [%r35+3968]; 2026-02-21T10:23:47.7785402Z .loc 1 63 28 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:63:28 2026-02-21T10:23:47.7785470Z shl.b16 %rs833, %rs801, 4; 2026-02-21T10:23:47.7785537Z shl.b16 %rs834, %rs805, 4; 2026-02-21T10:23:47.7785603Z shl.b16 %rs835, %rs809, 4; 2026-02-21T10:23:47.7785665Z shl.b16 %rs836, %rs813, 4; 2026-02-21T10:23:47.7785740Z shl.b16 %rs837, %rs817, 4; 2026-02-21T10:23:47.7785805Z shl.b16 %rs838, %rs821, 4; 2026-02-21T10:23:47.7785874Z shl.b16 %rs839, %rs825, 4; 2026-02-21T10:23:47.7785940Z shl.b16 %rs840, %rs829, 4; 2026-02-21T10:23:47.7786002Z shl.b16 %rs841, %rs802, 4; 2026-02-21T10:23:47.7786070Z shl.b16 %rs842, %rs806, 4; 2026-02-21T10:23:47.7786208Z shl.b16 %rs843, %rs810, 4; 2026-02-21T10:23:47.7786275Z shl.b16 %rs844, %rs814, 4; 2026-02-21T10:23:47.7786337Z shl.b16 %rs845, %rs818, 4; 2026-02-21T10:23:47.7786404Z shl.b16 %rs846, %rs822, 4; 2026-02-21T10:23:47.7786582Z shl.b16 %rs847, %rs826, 4; 2026-02-21T10:23:47.7786649Z shl.b16 %rs848, %rs830, 4; 2026-02-21T10:23:47.7786716Z shl.b16 %rs849, %rs803, 4; 2026-02-21T10:23:47.7786855Z shl.b16 %rs850, %rs807, 4; 2026-02-21T10:23:47.7786925Z shl.b16 %rs851, %rs811, 4; 2026-02-21T10:23:47.7786988Z shl.b16 %rs852, %rs815, 4; 2026-02-21T10:23:47.7787061Z shl.b16 %rs853, %rs819, 4; 2026-02-21T10:23:47.7787123Z shl.b16 %rs854, %rs823, 4; 2026-02-21T10:23:47.7787187Z shl.b16 %rs855, %rs827, 4; 2026-02-21T10:23:47.7787254Z shl.b16 %rs856, %rs831, 4; 2026-02-21T10:23:47.7787319Z shl.b16 %rs857, %rs804, 4; 
2026-02-21T10:23:47.7787382Z shl.b16 %rs858, %rs808, 4; 2026-02-21T10:23:47.7787448Z shl.b16 %rs859, %rs812, 4; 2026-02-21T10:23:47.7787515Z shl.b16 %rs860, %rs816, 4; 2026-02-21T10:23:47.7787579Z shl.b16 %rs861, %rs820, 4; 2026-02-21T10:23:47.7787642Z shl.b16 %rs862, %rs824, 4; 2026-02-21T10:23:47.7787707Z shl.b16 %rs863, %rs828, 4; 2026-02-21T10:23:47.7787771Z shl.b16 %rs864, %rs832, 4; 2026-02-21T10:23:47.7787984Z .loc 1 78 58 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:78:58 2026-02-21T10:23:47.7788073Z selp.b16 %rs865, %rs833, %rs801, %p188; 2026-02-21T10:23:47.7788137Z cvt.s16.s8 %rs866, %rs865; 2026-02-21T10:23:47.7788200Z shr.s16 %rs867, %rs866, 4; 2026-02-21T10:23:47.7788274Z selp.b16 %rs868, %rs834, %rs805, %p188; 2026-02-21T10:23:47.7788340Z cvt.s16.s8 %rs869, %rs868; 2026-02-21T10:23:47.7788402Z shr.s16 %rs870, %rs869, 4; 2026-02-21T10:23:47.7788473Z selp.b16 %rs871, %rs835, %rs809, %p188; 2026-02-21T10:23:47.7788622Z cvt.s16.s8 %rs872, %rs871; 2026-02-21T10:23:47.7788686Z shr.s16 %rs873, %rs872, 4; 2026-02-21T10:23:47.7788758Z selp.b16 %rs874, %rs836, %rs813, %p188; 2026-02-21T10:23:47.7788826Z cvt.s16.s8 %rs875, %rs874; 2026-02-21T10:23:47.7788896Z shr.s16 %rs876, %rs875, 4; 2026-02-21T10:23:47.7788967Z selp.b16 %rs877, %rs837, %rs817, %p188; 2026-02-21T10:23:47.7789030Z cvt.s16.s8 %rs878, %rs877; 2026-02-21T10:23:47.7789098Z shr.s16 %rs879, %rs878, 4; 2026-02-21T10:23:47.7789169Z selp.b16 %rs880, %rs838, %rs821, %p188; 2026-02-21T10:23:47.7789231Z cvt.s16.s8 %rs881, %rs880; 2026-02-21T10:23:47.7789301Z shr.s16 %rs882, %rs881, 4; 2026-02-21T10:23:47.7789373Z selp.b16 %rs883, %rs839, %rs825, %p188; 2026-02-21T10:23:47.7789437Z cvt.s16.s8 %rs884, %rs883; 2026-02-21T10:23:47.7789499Z shr.s16 %rs885, %rs884, 4; 2026-02-21T10:23:47.7789573Z selp.b16 %rs886, %rs840, %rs829, %p188; 2026-02-21T10:23:47.7789637Z cvt.s16.s8 %rs887, %rs886; 2026-02-21T10:23:47.7789700Z shr.s16 %rs888, %rs887, 4; 2026-02-21T10:23:47.7789775Z selp.b16 %rs889, 
%rs841, %rs802, %p188; 2026-02-21T10:23:47.7789838Z cvt.s16.s8 %rs890, %rs889; 2026-02-21T10:23:47.7789991Z shr.s16 %rs891, %rs890, 4; 2026-02-21T10:23:47.7790070Z selp.b16 %rs892, %rs842, %rs806, %p188; 2026-02-21T10:23:47.7790136Z cvt.s16.s8 %rs893, %rs892; 2026-02-21T10:23:47.7790197Z shr.s16 %rs894, %rs893, 4; 2026-02-21T10:23:47.7790268Z selp.b16 %rs895, %rs843, %rs810, %p188; 2026-02-21T10:23:47.7790404Z cvt.s16.s8 %rs896, %rs895; 2026-02-21T10:23:47.7790477Z shr.s16 %rs897, %rs896, 4; 2026-02-21T10:23:47.7790551Z selp.b16 %rs898, %rs844, %rs814, %p188; 2026-02-21T10:23:47.7790615Z cvt.s16.s8 %rs899, %rs898; 2026-02-21T10:23:47.7790682Z shr.s16 %rs900, %rs899, 4; 2026-02-21T10:23:47.7790757Z selp.b16 %rs901, %rs845, %rs818, %p188; 2026-02-21T10:23:47.7790820Z cvt.s16.s8 %rs902, %rs901; 2026-02-21T10:23:47.7790888Z shr.s16 %rs903, %rs902, 4; 2026-02-21T10:23:47.7790958Z selp.b16 %rs904, %rs846, %rs822, %p188; 2026-02-21T10:23:47.7791022Z cvt.s16.s8 %rs905, %rs904; 2026-02-21T10:23:47.7791090Z shr.s16 %rs906, %rs905, 4; 2026-02-21T10:23:47.7791162Z selp.b16 %rs907, %rs847, %rs826, %p188; 2026-02-21T10:23:47.7791316Z cvt.s16.s8 %rs908, %rs907; 2026-02-21T10:23:47.7791383Z shr.s16 %rs909, %rs908, 4; 2026-02-21T10:23:47.7791458Z selp.b16 %rs910, %rs848, %rs830, %p188; 2026-02-21T10:23:47.7791522Z cvt.s16.s8 %rs911, %rs910; 2026-02-21T10:23:47.7791589Z shr.s16 %rs912, %rs911, 4; 2026-02-21T10:23:47.7791665Z selp.b16 %rs913, %rs849, %rs803, %p188; 2026-02-21T10:23:47.7791729Z cvt.s16.s8 %rs914, %rs913; 2026-02-21T10:23:47.7791840Z shr.s16 %rs915, %rs914, 4; 2026-02-21T10:23:47.7791929Z selp.b16 %rs916, %rs850, %rs807, %p188; 2026-02-21T10:23:47.7791999Z cvt.s16.s8 %rs917, %rs916; 2026-02-21T10:23:47.7792062Z shr.s16 %rs918, %rs917, 4; 2026-02-21T10:23:47.7792139Z selp.b16 %rs919, %rs851, %rs811, %p188; 2026-02-21T10:23:47.7792207Z cvt.s16.s8 %rs920, %rs919; 2026-02-21T10:23:47.7792267Z shr.s16 %rs921, %rs920, 4; 2026-02-21T10:23:47.7792340Z selp.b16 %rs922, %rs852, 
%rs815, %p188; 2026-02-21T10:23:47.7792406Z cvt.s16.s8 %rs923, %rs922; 2026-02-21T10:23:47.7792478Z shr.s16 %rs924, %rs923, 4; 2026-02-21T10:23:47.7792548Z selp.b16 %rs925, %rs853, %rs819, %p188; 2026-02-21T10:23:47.7792611Z cvt.s16.s8 %rs926, %rs925; 2026-02-21T10:23:47.7792679Z shr.s16 %rs927, %rs926, 4; 2026-02-21T10:23:47.7792752Z selp.b16 %rs928, %rs854, %rs823, %p188; 2026-02-21T10:23:47.7792815Z cvt.s16.s8 %rs929, %rs928; 2026-02-21T10:23:47.7792887Z shr.s16 %rs930, %rs929, 4; 2026-02-21T10:23:47.7792961Z selp.b16 %rs931, %rs855, %rs827, %p188; 2026-02-21T10:23:47.7793025Z cvt.s16.s8 %rs932, %rs931; 2026-02-21T10:23:47.7793088Z shr.s16 %rs933, %rs932, 4; 2026-02-21T10:23:47.7793165Z selp.b16 %rs934, %rs856, %rs831, %p188; 2026-02-21T10:23:47.7793228Z cvt.s16.s8 %rs935, %rs934; 2026-02-21T10:23:47.7793290Z shr.s16 %rs936, %rs935, 4; 2026-02-21T10:23:47.7793366Z selp.b16 %rs937, %rs857, %rs804, %p188; 2026-02-21T10:23:47.7793428Z cvt.s16.s8 %rs938, %rs937; 2026-02-21T10:23:47.7793491Z shr.s16 %rs939, %rs938, 4; 2026-02-21T10:23:47.7793561Z selp.b16 %rs940, %rs858, %rs808, %p188; 2026-02-21T10:23:47.7793633Z cvt.s16.s8 %rs941, %rs940; 2026-02-21T10:23:47.7793700Z shr.s16 %rs942, %rs941, 4; 2026-02-21T10:23:47.7793771Z selp.b16 %rs943, %rs859, %rs812, %p188; 2026-02-21T10:23:47.7793841Z cvt.s16.s8 %rs944, %rs943; 2026-02-21T10:23:47.7793907Z shr.s16 %rs945, %rs944, 4; 2026-02-21T10:23:47.7793977Z selp.b16 %rs946, %rs860, %rs816, %p188; 2026-02-21T10:23:47.7794039Z cvt.s16.s8 %rs947, %rs946; 2026-02-21T10:23:47.7794108Z shr.s16 %rs948, %rs947, 4; 2026-02-21T10:23:47.7794192Z selp.b16 %rs949, %rs861, %rs820, %p188; 2026-02-21T10:23:47.7794257Z cvt.s16.s8 %rs950, %rs949; 2026-02-21T10:23:47.7794329Z shr.s16 %rs951, %rs950, 4; 2026-02-21T10:23:47.7794400Z selp.b16 %rs952, %rs862, %rs824, %p188; 2026-02-21T10:23:47.7794463Z cvt.s16.s8 %rs953, %rs952; 2026-02-21T10:23:47.7794529Z shr.s16 %rs954, %rs953, 4; 2026-02-21T10:23:47.7794600Z selp.b16 %rs955, %rs863, %rs828, 
%p188; 2026-02-21T10:23:47.7794662Z cvt.s16.s8 %rs956, %rs955; 2026-02-21T10:23:47.7794792Z shr.s16 %rs957, %rs956, 4; 2026-02-21T10:23:47.7794869Z selp.b16 %rs958, %rs864, %rs832, %p188; 2026-02-21T10:23:47.7794931Z cvt.s16.s8 %rs959, %rs958; 2026-02-21T10:23:47.7794992Z shr.s16 %rs960, %rs959, 4; 2026-02-21T10:23:47.7795264Z .loc 1 83 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:83:32 2026-02-21T10:23:47.7795336Z cvt.rn.f32.s16 %r10614, %rs867; 2026-02-21T10:23:47.7795405Z cvt.rn.f32.s16 %r10615, %rs870; 2026-02-21T10:23:47.7795471Z cvt.rn.f32.s16 %r10616, %rs873; 2026-02-21T10:23:47.7795539Z cvt.rn.f32.s16 %r10617, %rs876; 2026-02-21T10:23:47.7795602Z cvt.rn.f32.s16 %r10618, %rs879; 2026-02-21T10:23:47.7795666Z cvt.rn.f32.s16 %r10619, %rs882; 2026-02-21T10:23:47.7795740Z cvt.rn.f32.s16 %r10620, %rs885; 2026-02-21T10:23:47.7795811Z cvt.rn.f32.s16 %r10621, %rs888; 2026-02-21T10:23:47.7795875Z cvt.rn.f32.s16 %r10622, %rs891; 2026-02-21T10:23:47.7795944Z cvt.rn.f32.s16 %r10623, %rs894; 2026-02-21T10:23:47.7796010Z cvt.rn.f32.s16 %r10624, %rs897; 2026-02-21T10:23:47.7796121Z cvt.rn.f32.s16 %r10625, %rs900; 2026-02-21T10:23:47.7796185Z cvt.rn.f32.s16 %r10626, %rs903; 2026-02-21T10:23:47.7796253Z cvt.rn.f32.s16 %r10627, %rs906; 2026-02-21T10:23:47.7796320Z cvt.rn.f32.s16 %r10628, %rs909; 2026-02-21T10:23:47.7796383Z cvt.rn.f32.s16 %r10629, %rs912; 2026-02-21T10:23:47.7796578Z cvt.rn.f32.s16 %r10630, %rs915; 2026-02-21T10:23:47.7796723Z cvt.rn.f32.s16 %r10631, %rs918; 2026-02-21T10:23:47.7796794Z cvt.rn.f32.s16 %r10632, %rs921; 2026-02-21T10:23:47.7796858Z cvt.rn.f32.s16 %r10633, %rs924; 2026-02-21T10:23:47.7796928Z cvt.rn.f32.s16 %r10634, %rs927; 2026-02-21T10:23:47.7796990Z cvt.rn.f32.s16 %r10635, %rs930; 2026-02-21T10:23:47.7797054Z cvt.rn.f32.s16 %r10636, %rs933; 2026-02-21T10:23:47.7797123Z cvt.rn.f32.s16 %r10637, %rs936; 2026-02-21T10:23:47.7797188Z cvt.rn.f32.s16 %r10638, %rs939; 2026-02-21T10:23:47.7797252Z cvt.rn.f32.s16 %r10639, %rs942; 
2026-02-21T10:23:47.7797318Z cvt.rn.f32.s16 %r10640, %rs945; 2026-02-21T10:23:47.7797393Z cvt.rn.f32.s16 %r10641, %rs948; 2026-02-21T10:23:47.7797456Z cvt.rn.f32.s16 %r10642, %rs951; 2026-02-21T10:23:47.7797520Z cvt.rn.f32.s16 %r10643, %rs954; 2026-02-21T10:23:47.7797590Z cvt.rn.f32.s16 %r10644, %rs957; 2026-02-21T10:23:47.7797658Z cvt.rn.f32.s16 %r10645, %rs960; 2026-02-21T10:23:47.7797718Z bar.sync 0; 2026-02-21T10:23:47.7797807Z st.shared.b32 [%r36], %r10614; 2026-02-21T10:23:47.7797882Z st.shared.b32 [%r36+8], %r10615; 2026-02-21T10:23:47.7797955Z st.shared.b32 [%r36+16384], %r10630; 2026-02-21T10:23:47.7798027Z st.shared.b32 [%r36+16392], %r10631; 2026-02-21T10:23:47.7798098Z st.shared.b32 [%r37], %r10616; 2026-02-21T10:23:47.7798164Z st.shared.b32 [%r37+8], %r10617; 2026-02-21T10:23:47.7798231Z st.shared.b32 [%r37+16384], %r10632; 2026-02-21T10:23:47.7798305Z st.shared.b32 [%r37+16392], %r10633; 2026-02-21T10:23:47.7798371Z st.shared.b32 [%r38], %r10618; 2026-02-21T10:23:47.7798437Z st.shared.b32 [%r38+8], %r10619; 2026-02-21T10:23:47.7798508Z st.shared.b32 [%r38+16384], %r10634; 2026-02-21T10:23:47.7798580Z st.shared.b32 [%r38+16392], %r10635; 2026-02-21T10:23:47.7798647Z st.shared.b32 [%r39], %r10620; 2026-02-21T10:23:47.7798713Z st.shared.b32 [%r39+8], %r10621; 2026-02-21T10:23:47.7798787Z st.shared.b32 [%r39+16384], %r10636; 2026-02-21T10:23:47.7798855Z st.shared.b32 [%r39+16392], %r10637; 2026-02-21T10:23:47.7798923Z st.shared.b32 [%r40], %r10622; 2026-02-21T10:23:47.7798990Z st.shared.b32 [%r40+8], %r10623; 2026-02-21T10:23:47.7799062Z st.shared.b32 [%r40+16384], %r10638; 2026-02-21T10:23:47.7799128Z st.shared.b32 [%r40+16392], %r10639; 2026-02-21T10:23:47.7799195Z st.shared.b32 [%r41], %r10624; 2026-02-21T10:23:47.7799266Z st.shared.b32 [%r41+8], %r10625; 2026-02-21T10:23:47.7799334Z st.shared.b32 [%r41+16384], %r10640; 2026-02-21T10:23:47.7799401Z st.shared.b32 [%r41+16392], %r10641; 2026-02-21T10:23:47.7799474Z st.shared.b32 [%r42], %r10626; 
2026-02-21T10:23:47.7799624Z st.shared.b32 [%r42+8], %r10627; 2026-02-21T10:23:47.7799694Z st.shared.b32 [%r42+16384], %r10642; 2026-02-21T10:23:47.7799761Z st.shared.b32 [%r42+16392], %r10643; 2026-02-21T10:23:47.7799834Z st.shared.b32 [%r43], %r10628; 2026-02-21T10:23:47.7799962Z st.shared.b32 [%r43+8], %r10629; 2026-02-21T10:23:47.7800030Z st.shared.b32 [%r43+16384], %r10644; 2026-02-21T10:23:47.7800104Z st.shared.b32 [%r43+16392], %r10645; 2026-02-21T10:23:47.7800165Z $L__tmp9: 2026-02-21T10:23:47.7800450Z .loc 2 291 36 // standard.py:291:36 @[ cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:90:40 ] 2026-02-21T10:23:47.7800519Z // begin inline asm 2026-02-21T10:23:47.7800601Z fence.proxy.async.shared::cta; 2026-02-21T10:23:47.7800660Z // end inline asm 2026-02-21T10:23:47.7800717Z bar.sync 0; 2026-02-21T10:23:47.7800820Z shfl.sync.idx.b32 %r10646, %r4, 0, 31, -1; 2026-02-21T10:23:47.7800896Z wgmma.fence.sync.aligned; 2026-02-21T10:23:47.7800963Z mov.pred %p79, -1; 2026-02-21T10:23:47.7801033Z // begin inline asm 2026-02-21T10:23:47.7802638Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r5906,%r5907,%r5908,%r5909}, %rd2, %p79, 1, 1; 2026-02-21T10:23:47.7802706Z // end inline asm 2026-02-21T10:23:47.7802774Z // begin inline asm 2026-02-21T10:23:47.7804258Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r6038,%r6039,%r6040,%r6041}, %rd3, %p79, 1, 1; 2026-02-21T10:23:47.7804329Z // end inline asm 2026-02-21T10:23:47.7804391Z // begin inline asm 2026-02-21T10:23:47.7805867Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r6170,%r6171,%r6172,%r6173}, %rd4, %p79, 1, 1; 2026-02-21T10:23:47.7805935Z // end inline asm 2026-02-21T10:23:47.7805996Z // begin inline asm 2026-02-21T10:23:47.7807588Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, 
{%r6302,%r6303,%r6304,%r6305}, %rd5, %p79, 1, 1; 2026-02-21T10:23:47.7807657Z // end inline asm 2026-02-21T10:23:47.7807719Z // begin inline asm 2026-02-21T10:23:47.7809204Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r6434,%r6435,%r6436,%r6437}, %rd6, %p79, 1, 1; 2026-02-21T10:23:47.7809430Z // end inline asm 2026-02-21T10:23:47.7809493Z // begin inline asm 2026-02-21T10:23:47.7811050Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r6566,%r6567,%r6568,%r6569}, %rd7, %p79, 1, 1; 2026-02-21T10:23:47.7811122Z // end inline asm 2026-02-21T10:23:47.7811185Z // begin inline asm 2026-02-21T10:23:47.7812743Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r6698,%r6699,%r6700,%r6701}, %rd8, %p79, 1, 1; 2026-02-21T10:23:47.7812811Z // end inline asm 2026-02-21T10:23:47.7812876Z // begin inline asm 2026-02-21T10:23:47.7814357Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r6830,%r6831,%r6832,%r6833}, %rd9, %p79, 1, 1; 2026-02-21T10:23:47.7814420Z // end inline asm 2026-02-21T10:23:47.7814504Z wgmma.commit_group.sync.aligned; 2026-02-21T10:23:47.7814582Z mov.b32 %r6899, %r10545; 2026-02-21T10:23:47.7814645Z mov.b32 %r6900, %r10545; 2026-02-21T10:23:47.7814713Z mov.b32 %r6898, %r10980; 2026-02-21T10:23:47.7814777Z // begin inline asm 2026-02-21T10:23:47.7816060Z // wait for regs: 
%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218,%r6898,%r6899,%r6900 2026-02-21T10:23:47.7816150Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:23:47.7816210Z // end inline asm 2026-02-21T10:23:47.7816267Z $L__tmp10: 2026-02-21T10:23:47.7816618Z .loc 1 54 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:54:32 2026-02-21T10:23:47.7816691Z add.s64 %rd190, %rd169, 128; 2026-02-21T10:23:47.7816844Z add.s64 %rd193, %rd172, 128; 2026-02-21T10:23:47.7816912Z add.s64 %rd196, %rd175, 128; 2026-02-21T10:23:47.7817128Z .loc 1 54 80 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:54:80 2026-02-21T10:23:47.7817257Z add.s64 %rd199, %rd178, 128; 2026-02-21T10:23:47.7817318Z // begin inline asm 2026-02-21T10:23:47.7817385Z mov.u64 %rd189, 0x0; 2026-02-21T10:23:47.7817515Z createpolicy.fractional.L2::evict_last.b64 %rd189, 1.0; 2026-02-21T10:23:47.7817575Z // end inline asm 2026-02-21T10:23:47.7817639Z // begin inline asm 2026-02-21T10:23:47.7817699Z mov.u32 %r6968, 0x0; 2026-02-21T10:23:47.7817758Z mov.u32 %r6969, 0x0; 2026-02-21T10:23:47.7817818Z mov.u32 %r6970, 0x0; 2026-02-21T10:23:47.7817883Z mov.u32 %r6971, 0x0; 2026-02-21T10:23:47.7818113Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r6968, %r6969, %r6970, %r6971 }, [ %rd190 + 0 ], %rd189; 2026-02-21T10:23:47.7818173Z // end inline asm 2026-02-21T10:23:47.7818239Z // begin inline asm 2026-02-21T10:23:47.7818377Z mov.u64 %rd192, 0x0; 2026-02-21T10:23:47.7818515Z createpolicy.fractional.L2::evict_last.b64 %rd192, 1.0; 
2026-02-21T10:23:47.7818579Z // end inline asm 2026-02-21T10:23:47.7818638Z // begin inline asm 2026-02-21T10:23:47.7818700Z mov.u32 %r6972, 0x0; 2026-02-21T10:23:47.7818759Z mov.u32 %r6973, 0x0; 2026-02-21T10:23:47.7818821Z mov.u32 %r6974, 0x0; 2026-02-21T10:23:47.7818879Z mov.u32 %r6975, 0x0; 2026-02-21T10:23:47.7819159Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r6972, %r6973, %r6974, %r6975 }, [ %rd193 + 0 ], %rd192; 2026-02-21T10:23:47.7819226Z // end inline asm 2026-02-21T10:23:47.7819287Z // begin inline asm 2026-02-21T10:23:47.7819348Z mov.u64 %rd195, 0x0; 2026-02-21T10:23:47.7819468Z createpolicy.fractional.L2::evict_last.b64 %rd195, 1.0; 2026-02-21T10:23:47.7819544Z // end inline asm 2026-02-21T10:23:47.7819607Z // begin inline asm 2026-02-21T10:23:47.7819667Z mov.u32 %r6976, 0x0; 2026-02-21T10:23:47.7819731Z mov.u32 %r6977, 0x0; 2026-02-21T10:23:47.7819791Z mov.u32 %r6978, 0x0; 2026-02-21T10:23:47.7819852Z mov.u32 %r6979, 0x0; 2026-02-21T10:23:47.7820071Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r6976, %r6977, %r6978, %r6979 }, [ %rd196 + 0 ], %rd195; 2026-02-21T10:23:47.7820131Z // end inline asm 2026-02-21T10:23:47.7820198Z // begin inline asm 2026-02-21T10:23:47.7820258Z mov.u64 %rd198, 0x0; 2026-02-21T10:23:47.7820383Z createpolicy.fractional.L2::evict_last.b64 %rd198, 1.0; 2026-02-21T10:23:47.7820445Z // end inline asm 2026-02-21T10:23:47.7820506Z // begin inline asm 2026-02-21T10:23:47.7820571Z mov.u32 %r6980, 0x0; 2026-02-21T10:23:47.7820630Z mov.u32 %r6981, 0x0; 2026-02-21T10:23:47.7820687Z mov.u32 %r6982, 0x0; 2026-02-21T10:23:47.7820745Z mov.u32 %r6983, 0x0; 2026-02-21T10:23:47.7820960Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r6980, %r6981, %r6982, %r6983 }, [ %rd199 + 0 ], %rd198; 2026-02-21T10:23:47.7821021Z // end inline asm 2026-02-21T10:23:47.7821235Z .loc 1 58 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:58:32 2026-02-21T10:23:47.7821303Z bar.sync 0; 2026-02-21T10:23:47.7821386Z st.shared.v2.b32 
[%r18], {%r6968, %r6969}; 2026-02-21T10:23:47.7821477Z st.shared.v2.b32 [%r18+4096], {%r6972, %r6973}; 2026-02-21T10:23:47.7821585Z st.shared.v2.b32 [%r18+8192], {%r6976, %r6977}; 2026-02-21T10:23:47.7821679Z st.shared.v2.b32 [%r18+12288], {%r6980, %r6981}; 2026-02-21T10:23:47.7821763Z st.shared.v2.b32 [%r19], {%r6970, %r6971}; 2026-02-21T10:23:47.7821846Z st.shared.v2.b32 [%r19+4096], {%r6974, %r6975}; 2026-02-21T10:23:47.7821937Z st.shared.v2.b32 [%r19+8192], {%r6978, %r6979}; 2026-02-21T10:23:47.7822027Z st.shared.v2.b32 [%r19+12288], {%r6982, %r6983}; 2026-02-21T10:23:47.7822090Z bar.sync 0; 2026-02-21T10:23:47.7822166Z ld.shared.b16 %rs961, [%r20]; 2026-02-21T10:23:47.7822238Z ld.shared.b16 %rs962, [%r20+1024]; 2026-02-21T10:23:47.7822309Z ld.shared.b16 %rs963, [%r20+64]; 2026-02-21T10:23:47.7822382Z ld.shared.b16 %rs964, [%r20+1088]; 2026-02-21T10:23:47.7822509Z ld.shared.b16 %rs965, [%r21]; 2026-02-21T10:23:47.7822577Z ld.shared.b16 %rs966, [%r21+1024]; 2026-02-21T10:23:47.7822643Z ld.shared.b16 %rs967, [%r21+64]; 2026-02-21T10:23:47.7822715Z ld.shared.b16 %rs968, [%r21+1088]; 2026-02-21T10:23:47.7822831Z ld.shared.b16 %rs969, [%r22]; 2026-02-21T10:23:47.7822899Z ld.shared.b16 %rs970, [%r22+1024]; 2026-02-21T10:23:47.7822977Z ld.shared.b16 %rs971, [%r22+64]; 2026-02-21T10:23:47.7823053Z ld.shared.b16 %rs972, [%r22+1088]; 2026-02-21T10:23:47.7823123Z ld.shared.b16 %rs973, [%r23]; 2026-02-21T10:23:47.7823191Z ld.shared.b16 %rs974, [%r23+1024]; 2026-02-21T10:23:47.7823266Z ld.shared.b16 %rs975, [%r23+64]; 2026-02-21T10:23:47.7823331Z ld.shared.b16 %rs976, [%r23+1088]; 2026-02-21T10:23:47.7823396Z ld.shared.b16 %rs977, [%r24]; 2026-02-21T10:23:47.7823466Z ld.shared.b16 %rs978, [%r24+1024]; 2026-02-21T10:23:47.7823532Z ld.shared.b16 %rs979, [%r24+64]; 2026-02-21T10:23:47.7823600Z ld.shared.b16 %rs980, [%r24+1088]; 2026-02-21T10:23:47.7823726Z ld.shared.b16 %rs981, [%r25]; 2026-02-21T10:23:47.7823795Z ld.shared.b16 %rs982, [%r25+1024]; 2026-02-21T10:23:47.7823862Z 
ld.shared.b16 %rs983, [%r25+64]; 2026-02-21T10:23:47.7823930Z ld.shared.b16 %rs984, [%r25+1088]; 2026-02-21T10:23:47.7824004Z ld.shared.b16 %rs985, [%r26]; 2026-02-21T10:23:47.7824073Z ld.shared.b16 %rs986, [%r26+1024]; 2026-02-21T10:23:47.7824142Z ld.shared.b16 %rs987, [%r26+64]; 2026-02-21T10:23:47.7824260Z ld.shared.b16 %rs988, [%r26+1088]; 2026-02-21T10:23:47.7824328Z ld.shared.b16 %rs989, [%r27]; 2026-02-21T10:23:47.7824396Z ld.shared.b16 %rs990, [%r27+1024]; 2026-02-21T10:23:47.7824461Z ld.shared.b16 %rs991, [%r27+64]; 2026-02-21T10:23:47.7824533Z ld.shared.b16 %rs992, [%r27+1088]; 2026-02-21T10:23:47.7824602Z cvt.f32.bf16 %r7121, %rs961; 2026-02-21T10:23:47.7824671Z cvt.f32.bf16 %r7122, %rs962; 2026-02-21T10:23:47.7824738Z cvt.f32.bf16 %r7123, %rs965; 2026-02-21T10:23:47.7824801Z cvt.f32.bf16 %r7124, %rs966; 2026-02-21T10:23:47.7824866Z cvt.f32.bf16 %r7253, %rs969; 2026-02-21T10:23:47.7824930Z cvt.f32.bf16 %r7254, %rs970; 2026-02-21T10:23:47.7824998Z cvt.f32.bf16 %r7255, %rs973; 2026-02-21T10:23:47.7825060Z cvt.f32.bf16 %r7256, %rs974; 2026-02-21T10:23:47.7825122Z cvt.f32.bf16 %r7385, %rs977; 2026-02-21T10:23:47.7825192Z cvt.f32.bf16 %r7386, %rs978; 2026-02-21T10:23:47.7825257Z cvt.f32.bf16 %r7387, %rs981; 2026-02-21T10:23:47.7825321Z cvt.f32.bf16 %r7388, %rs982; 2026-02-21T10:23:47.7825390Z cvt.f32.bf16 %r7517, %rs985; 2026-02-21T10:23:47.7825454Z cvt.f32.bf16 %r7518, %rs986; 2026-02-21T10:23:47.7825520Z cvt.f32.bf16 %r7519, %rs989; 2026-02-21T10:23:47.7825595Z cvt.f32.bf16 %r7520, %rs990; 2026-02-21T10:23:47.7825665Z cvt.f32.bf16 %r7649, %rs963; 2026-02-21T10:23:47.7825730Z cvt.f32.bf16 %r7650, %rs964; 2026-02-21T10:23:47.7825792Z cvt.f32.bf16 %r7651, %rs967; 2026-02-21T10:23:47.7825859Z cvt.f32.bf16 %r7652, %rs968; 2026-02-21T10:23:47.7825921Z cvt.f32.bf16 %r7781, %rs971; 2026-02-21T10:23:47.7825986Z cvt.f32.bf16 %r7782, %rs972; 2026-02-21T10:23:47.7826050Z cvt.f32.bf16 %r7783, %rs975; 2026-02-21T10:23:47.7826119Z cvt.f32.bf16 %r7784, %rs976; 
2026-02-21T10:23:47.7826183Z cvt.f32.bf16 %r7913, %rs979; 2026-02-21T10:23:47.7826245Z cvt.f32.bf16 %r7914, %rs980; 2026-02-21T10:23:47.7826323Z cvt.f32.bf16 %r7915, %rs983; 2026-02-21T10:23:47.7826389Z cvt.f32.bf16 %r7916, %rs984; 2026-02-21T10:23:47.7826575Z cvt.f32.bf16 %r8045, %rs987; 2026-02-21T10:23:47.7826645Z cvt.f32.bf16 %r8046, %rs988; 2026-02-21T10:23:47.7826714Z cvt.f32.bf16 %r8047, %rs991; 2026-02-21T10:23:47.7826777Z cvt.f32.bf16 %r8048, %rs992; 2026-02-21T10:23:47.7826995Z .loc 1 60 33 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:60:33 2026-02-21T10:23:47.7827062Z bar.sync 0; 2026-02-21T10:23:47.7827124Z // begin inline asm 2026-02-21T10:23:47.7827227Z @%p132 mbarrier.init.shared::cta.b64 [%r5769], 1; 2026-02-21T10:23:47.7827293Z // end inline asm 2026-02-21T10:23:47.7827453Z bar.sync 0; 2026-02-21T10:23:47.7827521Z // begin inline asm 2026-02-21T10:23:47.7827660Z @%p132 mbarrier.arrive.expect_tx.shared.b64 _, [%r5769], 4096; 2026-02-21T10:23:47.7827724Z // end inline asm 2026-02-21T10:23:47.7827785Z // begin inline asm 2026-02-21T10:23:47.7827927Z fence.proxy.async.shared::cta; 2026-02-21T10:23:47.7827995Z // end inline asm 2026-02-21T10:23:47.7828052Z bar.sync 0; 2026-02-21T10:23:47.7828124Z elect.sync %r10647|%p126, -1; 2026-02-21T10:23:47.7828195Z and.pred %p89, %p1, %p126; 2026-02-21T10:23:47.7828276Z or.b32 %r6988, %r5773, 32; 2026-02-21T10:23:47.7828337Z // begin inline asm 2026-02-21T10:23:47.7828743Z @%p89 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r10980], [%rd281, {%r5772, %r6988}], [%r5769]; 2026-02-21T10:23:47.7828811Z // end inline asm 2026-02-21T10:23:47.7828869Z bar.sync 0; 2026-02-21T10:23:47.7828929Z // begin inline asm 2026-02-21T10:23:47.7828990Z 2026-02-21T10:23:47.7829044Z { 2026-02-21T10:23:47.7829114Z .reg .pred complete; 2026-02-21T10:23:47.7829250Z waitLoop: 2026-02-21T10:23:47.7829409Z mbarrier.try_wait.parity.shared.b64 complete, [%r5769], %r10545; 
2026-02-21T10:23:47.7829482Z @!complete bra.uni waitLoop; 2026-02-21T10:23:47.7829535Z } 2026-02-21T10:23:47.7829542Z 2026-02-21T10:23:47.7829606Z // end inline asm 2026-02-21T10:23:47.7829664Z bar.sync 0; 2026-02-21T10:23:47.7829724Z // begin inline asm 2026-02-21T10:23:47.7829896Z @%p132 mbarrier.inval.shared::cta.b64 [%r5769]; 2026-02-21T10:23:47.7829968Z // end inline asm 2026-02-21T10:23:47.7830200Z .loc 1 78 58 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:78:58 2026-02-21T10:23:47.7830272Z ld.shared.b8 %rs993, [%r28]; 2026-02-21T10:23:47.7830349Z ld.shared.b8 %rs994, [%r28+1024]; 2026-02-21T10:23:47.7830418Z ld.shared.b8 %rs995, [%r28+2048]; 2026-02-21T10:23:47.7830485Z ld.shared.b8 %rs996, [%r28+3072]; 2026-02-21T10:23:47.7830565Z ld.shared.b8 %rs997, [%r29+128]; 2026-02-21T10:23:47.7830644Z ld.shared.b8 %rs998, [%r29+1152]; 2026-02-21T10:23:47.7830713Z ld.shared.b8 %rs999, [%r29+2176]; 2026-02-21T10:23:47.7830793Z ld.shared.b8 %rs1000, [%r29+3200]; 2026-02-21T10:23:47.7830870Z ld.shared.b8 %rs1001, [%r30+256]; 2026-02-21T10:23:47.7830939Z ld.shared.b8 %rs1002, [%r30+1280]; 2026-02-21T10:23:47.7831005Z ld.shared.b8 %rs1003, [%r30+2304]; 2026-02-21T10:23:47.7831080Z ld.shared.b8 %rs1004, [%r30+3328]; 2026-02-21T10:23:47.7831153Z ld.shared.b8 %rs1005, [%r31+384]; 2026-02-21T10:23:47.7831220Z ld.shared.b8 %rs1006, [%r31+1408]; 2026-02-21T10:23:47.7831284Z ld.shared.b8 %rs1007, [%r31+2432]; 2026-02-21T10:23:47.7831353Z ld.shared.b8 %rs1008, [%r31+3456]; 2026-02-21T10:23:47.7831419Z ld.shared.b8 %rs1009, [%r32+512]; 2026-02-21T10:23:47.7831483Z ld.shared.b8 %rs1010, [%r32+1536]; 2026-02-21T10:23:47.7831553Z ld.shared.b8 %rs1011, [%r32+2560]; 2026-02-21T10:23:47.7831618Z ld.shared.b8 %rs1012, [%r32+3584]; 2026-02-21T10:23:47.7831684Z ld.shared.b8 %rs1013, [%r33+640]; 2026-02-21T10:23:47.7831754Z ld.shared.b8 %rs1014, [%r33+1664]; 2026-02-21T10:23:47.7831825Z ld.shared.b8 %rs1015, [%r33+2688]; 2026-02-21T10:23:47.7831891Z ld.shared.b8 %rs1016, 
[%r33+3712]; 2026-02-21T10:23:47.7831957Z ld.shared.b8 %rs1017, [%r34+768]; 2026-02-21T10:23:47.7832029Z ld.shared.b8 %rs1018, [%r34+1792]; 2026-02-21T10:23:47.7832094Z ld.shared.b8 %rs1019, [%r34+2816]; 2026-02-21T10:23:47.7832174Z ld.shared.b8 %rs1020, [%r34+3840]; 2026-02-21T10:23:47.7832253Z ld.shared.b8 %rs1021, [%r35+896]; 2026-02-21T10:23:47.7832320Z ld.shared.b8 %rs1022, [%r35+1920]; 2026-02-21T10:23:47.7832386Z ld.shared.b8 %rs1023, [%r35+2944]; 2026-02-21T10:23:47.7832452Z ld.shared.b8 %rs1024, [%r35+3968]; 2026-02-21T10:23:47.7832675Z .loc 1 63 28 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:63:28 2026-02-21T10:23:47.7832744Z shl.b16 %rs1025, %rs993, 4; 2026-02-21T10:23:47.7832809Z shl.b16 %rs1026, %rs997, 4; 2026-02-21T10:23:47.7832878Z shl.b16 %rs1027, %rs1001, 4; 2026-02-21T10:23:47.7833020Z shl.b16 %rs1028, %rs1005, 4; 2026-02-21T10:23:47.7833083Z shl.b16 %rs1029, %rs1009, 4; 2026-02-21T10:23:47.7833145Z shl.b16 %rs1030, %rs1013, 4; 2026-02-21T10:23:47.7833224Z shl.b16 %rs1031, %rs1017, 4; 2026-02-21T10:23:47.7833338Z shl.b16 %rs1032, %rs1021, 4; 2026-02-21T10:23:47.7833402Z shl.b16 %rs1033, %rs994, 4; 2026-02-21T10:23:47.7833471Z shl.b16 %rs1034, %rs998, 4; 2026-02-21T10:23:47.7833538Z shl.b16 %rs1035, %rs1002, 4; 2026-02-21T10:23:47.7833601Z shl.b16 %rs1036, %rs1006, 4; 2026-02-21T10:23:47.7833671Z shl.b16 %rs1037, %rs1010, 4; 2026-02-21T10:23:47.7833740Z shl.b16 %rs1038, %rs1014, 4; 2026-02-21T10:23:47.7833807Z shl.b16 %rs1039, %rs1018, 4; 2026-02-21T10:23:47.7833868Z shl.b16 %rs1040, %rs1022, 4; 2026-02-21T10:23:47.7833938Z shl.b16 %rs1041, %rs995, 4; 2026-02-21T10:23:47.7834001Z shl.b16 %rs1042, %rs999, 4; 2026-02-21T10:23:47.7834065Z shl.b16 %rs1043, %rs1003, 4; 2026-02-21T10:23:47.7834133Z shl.b16 %rs1044, %rs1007, 4; 2026-02-21T10:23:47.7834246Z shl.b16 %rs1045, %rs1011, 4; 2026-02-21T10:23:47.7834313Z shl.b16 %rs1046, %rs1015, 4; 2026-02-21T10:23:47.7834377Z shl.b16 %rs1047, %rs1019, 4; 2026-02-21T10:23:47.7834446Z shl.b16 
%rs1048, %rs1023, 4; 2026-02-21T10:23:47.7834513Z shl.b16 %rs1049, %rs996, 4; 2026-02-21T10:23:47.7834577Z shl.b16 %rs1050, %rs1000, 4; 2026-02-21T10:23:47.7834645Z shl.b16 %rs1051, %rs1004, 4; 2026-02-21T10:23:47.7834753Z shl.b16 %rs1052, %rs1008, 4; 2026-02-21T10:23:47.7834818Z shl.b16 %rs1053, %rs1012, 4; 2026-02-21T10:23:47.7834880Z shl.b16 %rs1054, %rs1016, 4; 2026-02-21T10:23:47.7834948Z shl.b16 %rs1055, %rs1020, 4; 2026-02-21T10:23:47.7835012Z shl.b16 %rs1056, %rs1024, 4; 2026-02-21T10:23:47.7835225Z .loc 1 78 58 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:78:58 2026-02-21T10:23:47.7835315Z selp.b16 %rs1057, %rs1025, %rs993, %p188; 2026-02-21T10:23:47.7835382Z cvt.s16.s8 %rs1058, %rs1057; 2026-02-21T10:23:47.7835443Z shr.s16 %rs1059, %rs1058, 4; 2026-02-21T10:23:47.7835532Z selp.b16 %rs1060, %rs1026, %rs997, %p188; 2026-02-21T10:23:47.7835595Z cvt.s16.s8 %rs1061, %rs1060; 2026-02-21T10:23:47.7835659Z shr.s16 %rs1062, %rs1061, 4; 2026-02-21T10:23:47.7835740Z selp.b16 %rs1063, %rs1027, %rs1001, %p188; 2026-02-21T10:23:47.7835811Z cvt.s16.s8 %rs1064, %rs1063; 2026-02-21T10:23:47.7835873Z shr.s16 %rs1065, %rs1064, 4; 2026-02-21T10:23:47.7835956Z selp.b16 %rs1066, %rs1028, %rs1005, %p188; 2026-02-21T10:23:47.7836024Z cvt.s16.s8 %rs1067, %rs1066; 2026-02-21T10:23:47.7836087Z shr.s16 %rs1068, %rs1067, 4; 2026-02-21T10:23:47.7836163Z selp.b16 %rs1069, %rs1029, %rs1009, %p188; 2026-02-21T10:23:47.7836227Z cvt.s16.s8 %rs1070, %rs1069; 2026-02-21T10:23:47.7836309Z shr.s16 %rs1071, %rs1070, 4; 2026-02-21T10:23:47.7836386Z selp.b16 %rs1072, %rs1030, %rs1013, %p188; 2026-02-21T10:23:47.7836588Z cvt.s16.s8 %rs1073, %rs1072; 2026-02-21T10:23:47.7836662Z shr.s16 %rs1074, %rs1073, 4; 2026-02-21T10:23:47.7836746Z selp.b16 %rs1075, %rs1031, %rs1017, %p188; 2026-02-21T10:23:47.7836814Z cvt.s16.s8 %rs1076, %rs1075; 2026-02-21T10:23:47.7836885Z shr.s16 %rs1077, %rs1076, 4; 2026-02-21T10:23:47.7836961Z selp.b16 %rs1078, %rs1032, %rs1021, %p188; 
2026-02-21T10:23:47.7837027Z cvt.s16.s8 %rs1079, %rs1078; 2026-02-21T10:23:47.7837090Z shr.s16 %rs1080, %rs1079, 4; 2026-02-21T10:23:47.7837172Z selp.b16 %rs1081, %rs1033, %rs994, %p188; 2026-02-21T10:23:47.7837237Z cvt.s16.s8 %rs1082, %rs1081; 2026-02-21T10:23:47.7837300Z shr.s16 %rs1083, %rs1082, 4; 2026-02-21T10:23:47.7837379Z selp.b16 %rs1084, %rs1034, %rs998, %p188; 2026-02-21T10:23:47.7837443Z cvt.s16.s8 %rs1085, %rs1084; 2026-02-21T10:23:47.7837507Z shr.s16 %rs1086, %rs1085, 4; 2026-02-21T10:23:47.7837581Z selp.b16 %rs1087, %rs1035, %rs1002, %p188; 2026-02-21T10:23:47.7837650Z cvt.s16.s8 %rs1088, %rs1087; 2026-02-21T10:23:47.7837714Z shr.s16 %rs1089, %rs1088, 4; 2026-02-21T10:23:47.7837793Z selp.b16 %rs1090, %rs1036, %rs1006, %p188; 2026-02-21T10:23:47.7837957Z cvt.s16.s8 %rs1091, %rs1090; 2026-02-21T10:23:47.7838025Z shr.s16 %rs1092, %rs1091, 4; 2026-02-21T10:23:47.7838106Z selp.b16 %rs1093, %rs1037, %rs1010, %p188; 2026-02-21T10:23:47.7838172Z cvt.s16.s8 %rs1094, %rs1093; 2026-02-21T10:23:47.7838304Z shr.s16 %rs1095, %rs1094, 4; 2026-02-21T10:23:47.7838382Z selp.b16 %rs1096, %rs1038, %rs1014, %p188; 2026-02-21T10:23:47.7838446Z cvt.s16.s8 %rs1097, %rs1096; 2026-02-21T10:23:47.7838517Z shr.s16 %rs1098, %rs1097, 4; 2026-02-21T10:23:47.7838593Z selp.b16 %rs1099, %rs1039, %rs1018, %p188; 2026-02-21T10:23:47.7838655Z cvt.s16.s8 %rs1100, %rs1099; 2026-02-21T10:23:47.7838723Z shr.s16 %rs1101, %rs1100, 4; 2026-02-21T10:23:47.7838810Z selp.b16 %rs1102, %rs1040, %rs1022, %p188; 2026-02-21T10:23:47.7838877Z cvt.s16.s8 %rs1103, %rs1102; 2026-02-21T10:23:47.7838939Z shr.s16 %rs1104, %rs1103, 4; 2026-02-21T10:23:47.7839021Z selp.b16 %rs1105, %rs1041, %rs995, %p188; 2026-02-21T10:23:47.7839083Z cvt.s16.s8 %rs1106, %rs1105; 2026-02-21T10:23:47.7839209Z shr.s16 %rs1107, %rs1106, 4; 2026-02-21T10:23:47.7839291Z selp.b16 %rs1108, %rs1042, %rs999, %p188; 2026-02-21T10:23:47.7839355Z cvt.s16.s8 %rs1109, %rs1108; 2026-02-21T10:23:47.7839418Z shr.s16 %rs1110, %rs1109, 4; 
2026-02-21T10:23:47.7839497Z selp.b16 %rs1111, %rs1043, %rs1003, %p188; 2026-02-21T10:23:47.7839563Z cvt.s16.s8 %rs1112, %rs1111; 2026-02-21T10:23:47.7839626Z shr.s16 %rs1113, %rs1112, 4; 2026-02-21T10:23:47.7839763Z selp.b16 %rs1114, %rs1044, %rs1007, %p188; 2026-02-21T10:23:47.7839834Z cvt.s16.s8 %rs1115, %rs1114; 2026-02-21T10:23:47.7839896Z shr.s16 %rs1116, %rs1115, 4; 2026-02-21T10:23:47.7839970Z selp.b16 %rs1117, %rs1045, %rs1011, %p188; 2026-02-21T10:23:47.7840037Z cvt.s16.s8 %rs1118, %rs1117; 2026-02-21T10:23:47.7840099Z shr.s16 %rs1119, %rs1118, 4; 2026-02-21T10:23:47.7840173Z selp.b16 %rs1120, %rs1046, %rs1015, %p188; 2026-02-21T10:23:47.7840235Z cvt.s16.s8 %rs1121, %rs1120; 2026-02-21T10:23:47.7840316Z shr.s16 %rs1122, %rs1121, 4; 2026-02-21T10:23:47.7840396Z selp.b16 %rs1123, %rs1047, %rs1019, %p188; 2026-02-21T10:23:47.7840460Z cvt.s16.s8 %rs1124, %rs1123; 2026-02-21T10:23:47.7840527Z shr.s16 %rs1125, %rs1124, 4; 2026-02-21T10:23:47.7840605Z selp.b16 %rs1126, %rs1048, %rs1023, %p188; 2026-02-21T10:23:47.7840671Z cvt.s16.s8 %rs1127, %rs1126; 2026-02-21T10:23:47.7840737Z shr.s16 %rs1128, %rs1127, 4; 2026-02-21T10:23:47.7840816Z selp.b16 %rs1129, %rs1049, %rs996, %p188; 2026-02-21T10:23:47.7840883Z cvt.s16.s8 %rs1130, %rs1129; 2026-02-21T10:23:47.7840948Z shr.s16 %rs1131, %rs1130, 4; 2026-02-21T10:23:47.7841027Z selp.b16 %rs1132, %rs1050, %rs1000, %p188; 2026-02-21T10:23:47.7841091Z cvt.s16.s8 %rs1133, %rs1132; 2026-02-21T10:23:47.7841152Z shr.s16 %rs1134, %rs1133, 4; 2026-02-21T10:23:47.7841229Z selp.b16 %rs1135, %rs1051, %rs1004, %p188; 2026-02-21T10:23:47.7841297Z cvt.s16.s8 %rs1136, %rs1135; 2026-02-21T10:23:47.7841359Z shr.s16 %rs1137, %rs1136, 4; 2026-02-21T10:23:47.7841435Z selp.b16 %rs1138, %rs1052, %rs1008, %p188; 2026-02-21T10:23:47.7841508Z cvt.s16.s8 %rs1139, %rs1138; 2026-02-21T10:23:47.7841570Z shr.s16 %rs1140, %rs1139, 4; 2026-02-21T10:23:47.7841646Z selp.b16 %rs1141, %rs1053, %rs1012, %p188; 2026-02-21T10:23:47.7841714Z cvt.s16.s8 %rs1142, 
%rs1141; 2026-02-21T10:23:47.7841780Z shr.s16 %rs1143, %rs1142, 4; 2026-02-21T10:23:47.7841855Z selp.b16 %rs1144, %rs1054, %rs1016, %p188; 2026-02-21T10:23:47.7841919Z cvt.s16.s8 %rs1145, %rs1144; 2026-02-21T10:23:47.7841988Z shr.s16 %rs1146, %rs1145, 4; 2026-02-21T10:23:47.7842064Z selp.b16 %rs1147, %rs1055, %rs1020, %p188; 2026-02-21T10:23:47.7842127Z cvt.s16.s8 %rs1148, %rs1147; 2026-02-21T10:23:47.7842195Z shr.s16 %rs1149, %rs1148, 4; 2026-02-21T10:23:47.7842272Z selp.b16 %rs1150, %rs1056, %rs1024, %p188; 2026-02-21T10:23:47.7842336Z cvt.s16.s8 %rs1151, %rs1150; 2026-02-21T10:23:47.7842399Z shr.s16 %rs1152, %rs1151, 4; 2026-02-21T10:23:47.7842628Z .loc 1 83 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:83:32 2026-02-21T10:23:47.7842767Z cvt.rn.f32.s16 %r10648, %rs1059; 2026-02-21T10:23:47.7842836Z cvt.rn.f32.s16 %r10649, %rs1062; 2026-02-21T10:23:47.7842908Z cvt.rn.f32.s16 %r10650, %rs1065; 2026-02-21T10:23:47.7842976Z cvt.rn.f32.s16 %r10651, %rs1068; 2026-02-21T10:23:47.7843090Z cvt.rn.f32.s16 %r10652, %rs1071; 2026-02-21T10:23:47.7843159Z cvt.rn.f32.s16 %r10653, %rs1074; 2026-02-21T10:23:47.7843225Z cvt.rn.f32.s16 %r10654, %rs1077; 2026-02-21T10:23:47.7843290Z cvt.rn.f32.s16 %r10655, %rs1080; 2026-02-21T10:23:47.7843355Z cvt.rn.f32.s16 %r10656, %rs1083; 2026-02-21T10:23:47.7843426Z cvt.rn.f32.s16 %r10657, %rs1086; 2026-02-21T10:23:47.7843489Z cvt.rn.f32.s16 %r10658, %rs1089; 2026-02-21T10:23:47.7843553Z cvt.rn.f32.s16 %r10659, %rs1092; 2026-02-21T10:23:47.7843622Z cvt.rn.f32.s16 %r10660, %rs1095; 2026-02-21T10:23:47.7843686Z cvt.rn.f32.s16 %r10661, %rs1098; 2026-02-21T10:23:47.7843752Z cvt.rn.f32.s16 %r10662, %rs1101; 2026-02-21T10:23:47.7843816Z cvt.rn.f32.s16 %r10663, %rs1104; 2026-02-21T10:23:47.7843939Z cvt.rn.f32.s16 %r10664, %rs1107; 2026-02-21T10:23:47.7844005Z cvt.rn.f32.s16 %r10665, %rs1110; 2026-02-21T10:23:47.7844070Z cvt.rn.f32.s16 %r10666, %rs1113; 2026-02-21T10:23:47.7844141Z cvt.rn.f32.s16 %r10667, %rs1116; 
2026-02-21T10:23:47.7844213Z cvt.rn.f32.s16 %r10668, %rs1119; 2026-02-21T10:23:47.7844289Z cvt.rn.f32.s16 %r10669, %rs1122; 2026-02-21T10:23:47.7844405Z cvt.rn.f32.s16 %r10670, %rs1125; 2026-02-21T10:23:47.7844472Z cvt.rn.f32.s16 %r10671, %rs1128; 2026-02-21T10:23:47.7844537Z cvt.rn.f32.s16 %r10672, %rs1131; 2026-02-21T10:23:47.7844601Z cvt.rn.f32.s16 %r10673, %rs1134; 2026-02-21T10:23:47.7844672Z cvt.rn.f32.s16 %r10674, %rs1137; 2026-02-21T10:23:47.7844740Z cvt.rn.f32.s16 %r10675, %rs1140; 2026-02-21T10:23:47.7844808Z cvt.rn.f32.s16 %r10676, %rs1143; 2026-02-21T10:23:47.7844881Z cvt.rn.f32.s16 %r10677, %rs1146; 2026-02-21T10:23:47.7844945Z cvt.rn.f32.s16 %r10678, %rs1149; 2026-02-21T10:23:47.7845010Z cvt.rn.f32.s16 %r10679, %rs1152; 2026-02-21T10:23:47.7845073Z bar.sync 0; 2026-02-21T10:23:47.7845147Z st.shared.b32 [%r36], %r10648; 2026-02-21T10:23:47.7845214Z st.shared.b32 [%r36+8], %r10649; 2026-02-21T10:23:47.7845284Z st.shared.b32 [%r36+16384], %r10664; 2026-02-21T10:23:47.7845362Z st.shared.b32 [%r36+16392], %r10665; 2026-02-21T10:23:47.7845430Z st.shared.b32 [%r37], %r10650; 2026-02-21T10:23:47.7845497Z st.shared.b32 [%r37+8], %r10651; 2026-02-21T10:23:47.7845569Z st.shared.b32 [%r37+16384], %r10666; 2026-02-21T10:23:47.7845642Z st.shared.b32 [%r37+16392], %r10667; 2026-02-21T10:23:47.7845709Z st.shared.b32 [%r38], %r10652; 2026-02-21T10:23:47.7845775Z st.shared.b32 [%r38+8], %r10653; 2026-02-21T10:23:47.7845847Z st.shared.b32 [%r38+16384], %r10668; 2026-02-21T10:23:47.7845914Z st.shared.b32 [%r38+16392], %r10669; 2026-02-21T10:23:47.7845981Z st.shared.b32 [%r39], %r10654; 2026-02-21T10:23:47.7846051Z st.shared.b32 [%r39+8], %r10655; 2026-02-21T10:23:47.7846118Z st.shared.b32 [%r39+16384], %r10670; 2026-02-21T10:23:47.7846190Z st.shared.b32 [%r39+16392], %r10671; 2026-02-21T10:23:47.7846255Z st.shared.b32 [%r40], %r10656; 2026-02-21T10:23:47.7846340Z st.shared.b32 [%r40+8], %r10657; 2026-02-21T10:23:47.7846409Z st.shared.b32 [%r40+16384], %r10672; 
2026-02-21T10:23:47.7846606Z st.shared.b32 [%r40+16392], %r10673; 2026-02-21T10:23:47.7846680Z st.shared.b32 [%r41], %r10658; 2026-02-21T10:23:47.7846750Z st.shared.b32 [%r41+8], %r10659; 2026-02-21T10:23:47.7846817Z st.shared.b32 [%r41+16384], %r10674; 2026-02-21T10:23:47.7846884Z st.shared.b32 [%r41+16392], %r10675; 2026-02-21T10:23:47.7846956Z st.shared.b32 [%r42], %r10660; 2026-02-21T10:23:47.7847023Z st.shared.b32 [%r42+8], %r10661; 2026-02-21T10:23:47.7847093Z st.shared.b32 [%r42+16384], %r10676; 2026-02-21T10:23:47.7847164Z st.shared.b32 [%r42+16392], %r10677; 2026-02-21T10:23:47.7847230Z st.shared.b32 [%r43], %r10662; 2026-02-21T10:23:47.7847297Z st.shared.b32 [%r43+8], %r10663; 2026-02-21T10:23:47.7847459Z st.shared.b32 [%r43+16384], %r10678; 2026-02-21T10:23:47.7847531Z st.shared.b32 [%r43+16392], %r10679; 2026-02-21T10:23:47.7847589Z $L__tmp11: 2026-02-21T10:23:47.7847874Z .loc 2 291 36 // standard.py:291:36 @[ cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:90:40 ] 2026-02-21T10:23:47.7848006Z // begin inline asm 2026-02-21T10:23:47.7848100Z fence.proxy.async.shared::cta; 2026-02-21T10:23:47.7848165Z // end inline asm 2026-02-21T10:23:47.7848227Z bar.sync 0; 2026-02-21T10:23:47.7848302Z wgmma.fence.sync.aligned; 2026-02-21T10:23:47.7848363Z // begin inline asm 2026-02-21T10:23:47.7849934Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r7121,%r7122,%r7123,%r7124}, %rd2, %p79, 1, 1; 2026-02-21T10:23:47.7850002Z // end inline asm 
2026-02-21T10:23:47.7850064Z // begin inline asm 2026-02-21T10:23:47.7851626Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r7253,%r7254,%r7255,%r7256}, %rd3, %p79, 1, 1; 2026-02-21T10:23:47.7851689Z // end inline asm 2026-02-21T10:23:47.7851754Z // begin inline asm 2026-02-21T10:23:47.7853247Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r7385,%r7386,%r7387,%r7388}, %rd4, %p79, 1, 1; 2026-02-21T10:23:47.7853310Z // end inline asm 2026-02-21T10:23:47.7853374Z // begin inline asm 2026-02-21T10:23:47.7854867Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r7517,%r7518,%r7519,%r7520}, %rd5, %p79, 1, 1; 2026-02-21T10:23:47.7854935Z // end inline asm 2026-02-21T10:23:47.7855000Z // begin inline asm 2026-02-21T10:23:47.7856596Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r7649,%r7650,%r7651,%r7652}, %rd6, %p79, 1, 1; 2026-02-21T10:23:47.7856741Z // end inline asm 2026-02-21T10:23:47.7856806Z // begin inline asm 2026-02-21T10:23:47.7858358Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, 
{%r7781,%r7782,%r7783,%r7784}, %rd7, %p79, 1, 1; 2026-02-21T10:23:47.7858427Z // end inline asm 2026-02-21T10:23:47.7858489Z // begin inline asm 2026-02-21T10:23:47.7860075Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r7913,%r7914,%r7915,%r7916}, %rd8, %p79, 1, 1; 2026-02-21T10:23:47.7860152Z // end inline asm 2026-02-21T10:23:47.7860213Z // begin inline asm 2026-02-21T10:23:47.7861692Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r8045,%r8046,%r8047,%r8048}, %rd9, %p79, 1, 1; 2026-02-21T10:23:47.7861758Z // end inline asm 2026-02-21T10:23:47.7861839Z wgmma.commit_group.sync.aligned; 2026-02-21T10:23:47.7861912Z mov.b32 %r8114, %r10545; 2026-02-21T10:23:47.7861984Z mov.b32 %r8115, %r10545; 2026-02-21T10:23:47.7862057Z mov.b32 %r8113, %r10980; 2026-02-21T10:23:47.7862125Z // begin inline asm 2026-02-21T10:23:47.7863410Z // wait for regs: 
%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218,%r8113,%r8114,%r8115 2026-02-21T10:23:47.7863493Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:23:47.7863561Z // end inline asm 2026-02-21T10:23:47.7863619Z $L__tmp12: 2026-02-21T10:23:47.7863837Z .loc 1 54 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:54:32 2026-02-21T10:23:47.7863912Z add.s64 %rd211, %rd169, 256; 2026-02-21T10:23:47.7863978Z add.s64 %rd214, %rd172, 256; 2026-02-21T10:23:47.7864042Z add.s64 %rd217, %rd175, 256; 2026-02-21T10:23:47.7864247Z .loc 1 54 80 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:54:80 2026-02-21T10:23:47.7864317Z add.s64 %rd220, %rd178, 256; 2026-02-21T10:23:47.7864377Z // begin inline asm 2026-02-21T10:23:47.7864444Z mov.u64 %rd210, 0x0; 2026-02-21T10:23:47.7864581Z createpolicy.fractional.L2::evict_last.b64 %rd210, 1.0; 2026-02-21T10:23:47.7864701Z // end inline asm 2026-02-21T10:23:47.7868051Z // begin inline asm 2026-02-21T10:23:47.7868162Z mov.u32 %r8183, 0x0; 2026-02-21T10:23:47.7868232Z mov.u32 %r8184, 0x0; 2026-02-21T10:23:47.7868436Z mov.u32 %r8185, 0x0; 2026-02-21T10:23:47.7868576Z mov.u32 %r8186, 0x0; 2026-02-21T10:23:47.7868870Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r8183, %r8184, %r8185, %r8186 }, [ %rd211 + 0 ], %rd210; 2026-02-21T10:23:47.7868939Z // end inline asm 2026-02-21T10:23:47.7869005Z // begin inline asm 2026-02-21T10:23:47.7869073Z mov.u64 %rd213, 0x0; 2026-02-21T10:23:47.7869211Z createpolicy.fractional.L2::evict_last.b64 %rd213, 1.0; 
2026-02-21T10:23:47.7869273Z // end inline asm 2026-02-21T10:23:47.7869341Z // begin inline asm 2026-02-21T10:23:47.7869404Z mov.u32 %r8187, 0x0; 2026-02-21T10:23:47.7869465Z mov.u32 %r8188, 0x0; 2026-02-21T10:23:47.7869524Z mov.u32 %r8189, 0x0; 2026-02-21T10:23:47.7869589Z mov.u32 %r8190, 0x0; 2026-02-21T10:23:47.7869920Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r8187, %r8188, %r8189, %r8190 }, [ %rd214 + 0 ], %rd213; 2026-02-21T10:23:47.7869988Z // end inline asm 2026-02-21T10:23:47.7870060Z // begin inline asm 2026-02-21T10:23:47.7870123Z mov.u64 %rd216, 0x0; 2026-02-21T10:23:47.7870261Z createpolicy.fractional.L2::evict_last.b64 %rd216, 1.0; 2026-02-21T10:23:47.7870324Z // end inline asm 2026-02-21T10:23:47.7870393Z // begin inline asm 2026-02-21T10:23:47.7870518Z mov.u32 %r8191, 0x0; 2026-02-21T10:23:47.7870579Z mov.u32 %r8192, 0x0; 2026-02-21T10:23:47.7870646Z mov.u32 %r8193, 0x0; 2026-02-21T10:23:47.7870706Z mov.u32 %r8194, 0x0; 2026-02-21T10:23:47.7870937Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r8191, %r8192, %r8193, %r8194 }, [ %rd217 + 0 ], %rd216; 2026-02-21T10:23:47.7871007Z // end inline asm 2026-02-21T10:23:47.7871069Z // begin inline asm 2026-02-21T10:23:47.7871129Z mov.u64 %rd219, 0x0; 2026-02-21T10:23:47.7871269Z createpolicy.fractional.L2::evict_last.b64 %rd219, 1.0; 2026-02-21T10:23:47.7871341Z // end inline asm 2026-02-21T10:23:47.7871404Z // begin inline asm 2026-02-21T10:23:47.7871465Z mov.u32 %r8195, 0x0; 2026-02-21T10:23:47.7871529Z mov.u32 %r8196, 0x0; 2026-02-21T10:23:47.7871598Z mov.u32 %r8197, 0x0; 2026-02-21T10:23:47.7871660Z mov.u32 %r8198, 0x0; 2026-02-21T10:23:47.7871882Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r8195, %r8196, %r8197, %r8198 }, [ %rd220 + 0 ], %rd219; 2026-02-21T10:23:47.7871951Z // end inline asm 2026-02-21T10:23:47.7872175Z .loc 1 58 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:58:32 2026-02-21T10:23:47.7872235Z bar.sync 0; 2026-02-21T10:23:47.7872331Z st.shared.v2.b32 
[%r18], {%r8183, %r8184}; 2026-02-21T10:23:47.7872425Z st.shared.v2.b32 [%r18+4096], {%r8187, %r8188}; 2026-02-21T10:23:47.7872509Z st.shared.v2.b32 [%r18+8192], {%r8191, %r8192}; 2026-02-21T10:23:47.7872608Z st.shared.v2.b32 [%r18+12288], {%r8195, %r8196}; 2026-02-21T10:23:47.7872686Z st.shared.v2.b32 [%r19], {%r8185, %r8186}; 2026-02-21T10:23:47.7872777Z st.shared.v2.b32 [%r19+4096], {%r8189, %r8190}; 2026-02-21T10:23:47.7872868Z st.shared.v2.b32 [%r19+8192], {%r8193, %r8194}; 2026-02-21T10:23:47.7872955Z st.shared.v2.b32 [%r19+12288], {%r8197, %r8198}; 2026-02-21T10:23:47.7873017Z bar.sync 0; 2026-02-21T10:23:47.7873091Z ld.shared.b16 %rs1153, [%r20]; 2026-02-21T10:23:47.7873172Z ld.shared.b16 %rs1154, [%r20+1024]; 2026-02-21T10:23:47.7873248Z ld.shared.b16 %rs1155, [%r20+64]; 2026-02-21T10:23:47.7873317Z ld.shared.b16 %rs1156, [%r20+1088]; 2026-02-21T10:23:47.7873391Z ld.shared.b16 %rs1157, [%r21]; 2026-02-21T10:23:47.7873458Z ld.shared.b16 %rs1158, [%r21+1024]; 2026-02-21T10:23:47.7873525Z ld.shared.b16 %rs1159, [%r21+64]; 2026-02-21T10:23:47.7873592Z ld.shared.b16 %rs1160, [%r21+1088]; 2026-02-21T10:23:47.7873664Z ld.shared.b16 %rs1161, [%r22]; 2026-02-21T10:23:47.7873734Z ld.shared.b16 %rs1162, [%r22+1024]; 2026-02-21T10:23:47.7873800Z ld.shared.b16 %rs1163, [%r22+64]; 2026-02-21T10:23:47.7873962Z ld.shared.b16 %rs1164, [%r22+1088]; 2026-02-21T10:23:47.7874030Z ld.shared.b16 %rs1165, [%r23]; 2026-02-21T10:23:47.7874098Z ld.shared.b16 %rs1166, [%r23+1024]; 2026-02-21T10:23:47.7874165Z ld.shared.b16 %rs1167, [%r23+64]; 2026-02-21T10:23:47.7874286Z ld.shared.b16 %rs1168, [%r23+1088]; 2026-02-21T10:23:47.7874352Z ld.shared.b16 %rs1169, [%r24]; 2026-02-21T10:23:47.7874421Z ld.shared.b16 %rs1170, [%r24+1024]; 2026-02-21T10:23:47.7874493Z ld.shared.b16 %rs1171, [%r24+64]; 2026-02-21T10:23:47.7874561Z ld.shared.b16 %rs1172, [%r24+1088]; 2026-02-21T10:23:47.7874627Z ld.shared.b16 %rs1173, [%r25]; 2026-02-21T10:23:47.7874701Z ld.shared.b16 %rs1174, [%r25+1024]; 
2026-02-21T10:23:47.7874768Z ld.shared.b16 %rs1175, [%r25+64]; 2026-02-21T10:23:47.7874835Z ld.shared.b16 %rs1176, [%r25+1088]; 2026-02-21T10:23:47.7874902Z ld.shared.b16 %rs1177, [%r26]; 2026-02-21T10:23:47.7874975Z ld.shared.b16 %rs1178, [%r26+1024]; 2026-02-21T10:23:47.7875041Z ld.shared.b16 %rs1179, [%r26+64]; 2026-02-21T10:23:47.7875158Z ld.shared.b16 %rs1180, [%r26+1088]; 2026-02-21T10:23:47.7875232Z ld.shared.b16 %rs1181, [%r27]; 2026-02-21T10:23:47.7875300Z ld.shared.b16 %rs1182, [%r27+1024]; 2026-02-21T10:23:47.7875369Z ld.shared.b16 %rs1183, [%r27+64]; 2026-02-21T10:23:47.7875438Z ld.shared.b16 %rs1184, [%r27+1088]; 2026-02-21T10:23:47.7875513Z cvt.f32.bf16 %r8336, %rs1153; 2026-02-21T10:23:47.7875622Z cvt.f32.bf16 %r8337, %rs1154; 2026-02-21T10:23:47.7875686Z cvt.f32.bf16 %r8338, %rs1157; 2026-02-21T10:23:47.7875753Z cvt.f32.bf16 %r8339, %rs1158; 2026-02-21T10:23:47.7875817Z cvt.f32.bf16 %r8468, %rs1161; 2026-02-21T10:23:47.7875884Z cvt.f32.bf16 %r8469, %rs1162; 2026-02-21T10:23:47.7875952Z cvt.f32.bf16 %r8470, %rs1165; 2026-02-21T10:23:47.7876015Z cvt.f32.bf16 %r8471, %rs1166; 2026-02-21T10:23:47.7876078Z cvt.f32.bf16 %r8600, %rs1169; 2026-02-21T10:23:47.7876141Z cvt.f32.bf16 %r8601, %rs1170; 2026-02-21T10:23:47.7876210Z cvt.f32.bf16 %r8602, %rs1173; 2026-02-21T10:23:47.7876278Z cvt.f32.bf16 %r8603, %rs1174; 2026-02-21T10:23:47.7876340Z cvt.f32.bf16 %r8732, %rs1177; 2026-02-21T10:23:47.7876406Z cvt.f32.bf16 %r8733, %rs1178; 2026-02-21T10:23:47.7876604Z cvt.f32.bf16 %r8734, %rs1181; 2026-02-21T10:23:47.7876675Z cvt.f32.bf16 %r8735, %rs1182; 2026-02-21T10:23:47.7876739Z cvt.f32.bf16 %r8864, %rs1155; 2026-02-21T10:23:47.7876811Z cvt.f32.bf16 %r8865, %rs1156; 2026-02-21T10:23:47.7876889Z cvt.f32.bf16 %r8866, %rs1159; 2026-02-21T10:23:47.7876953Z cvt.f32.bf16 %r8867, %rs1160; 2026-02-21T10:23:47.7877023Z cvt.f32.bf16 %r8996, %rs1163; 2026-02-21T10:23:47.7877085Z cvt.f32.bf16 %r8997, %rs1164; 2026-02-21T10:23:47.7877148Z cvt.f32.bf16 %r8998, %rs1167; 
2026-02-21T10:23:47.7877211Z cvt.f32.bf16 %r8999, %rs1168; 2026-02-21T10:23:47.7877279Z cvt.f32.bf16 %r9128, %rs1171; 2026-02-21T10:23:47.7877341Z cvt.f32.bf16 %r9129, %rs1172; 2026-02-21T10:23:47.7877405Z cvt.f32.bf16 %r9130, %rs1175; 2026-02-21T10:23:47.7877473Z cvt.f32.bf16 %r9131, %rs1176; 2026-02-21T10:23:47.7877541Z cvt.f32.bf16 %r9260, %rs1179; 2026-02-21T10:23:47.7877604Z cvt.f32.bf16 %r9261, %rs1180; 2026-02-21T10:23:47.7877668Z cvt.f32.bf16 %r9262, %rs1183; 2026-02-21T10:23:47.7877736Z cvt.f32.bf16 %r9263, %rs1184; 2026-02-21T10:23:47.7877965Z .loc 1 60 33 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:60:33 2026-02-21T10:23:47.7878024Z bar.sync 0; 2026-02-21T10:23:47.7878095Z // begin inline asm 2026-02-21T10:23:47.7878200Z @%p132 mbarrier.init.shared::cta.b64 [%r5769], 1; 2026-02-21T10:23:47.7878262Z // end inline asm 2026-02-21T10:23:47.7878325Z bar.sync 0; 2026-02-21T10:23:47.7878389Z // begin inline asm 2026-02-21T10:23:47.7878525Z @%p132 mbarrier.arrive.expect_tx.shared.b64 _, [%r5769], 4096; 2026-02-21T10:23:47.7878587Z // end inline asm 2026-02-21T10:23:47.7878655Z // begin inline asm 2026-02-21T10:23:47.7878735Z fence.proxy.async.shared::cta; 2026-02-21T10:23:47.7878793Z // end inline asm 2026-02-21T10:23:47.7878953Z bar.sync 0; 2026-02-21T10:23:47.7879034Z elect.sync %r10680|%p127, -1; 2026-02-21T10:23:47.7879108Z and.pred %p101, %p1, %p127; 2026-02-21T10:23:47.7879176Z or.b32 %r8203, %r5773, 64; 2026-02-21T10:23:47.7879244Z // begin inline asm 2026-02-21T10:23:47.7879664Z @%p101 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r10980], [%rd281, {%r5772, %r8203}], [%r5769]; 2026-02-21T10:23:47.7879728Z // end inline asm 2026-02-21T10:23:47.7879796Z bar.sync 0; 2026-02-21T10:23:47.7879859Z // begin inline asm 2026-02-21T10:23:47.7879914Z 2026-02-21T10:23:47.7879975Z { 2026-02-21T10:23:47.7880055Z .reg .pred complete; 2026-02-21T10:23:47.7880116Z waitLoop: 2026-02-21T10:23:47.7880272Z 
mbarrier.try_wait.parity.shared.b64 complete, [%r5769], %r10545; 2026-02-21T10:23:47.7880357Z @!complete bra.uni waitLoop; 2026-02-21T10:23:47.7880410Z } 2026-02-21T10:23:47.7880415Z 2026-02-21T10:23:47.7880474Z // end inline asm 2026-02-21T10:23:47.7880543Z bar.sync 0; 2026-02-21T10:23:47.7880608Z // begin inline asm 2026-02-21T10:23:47.7880797Z @%p132 mbarrier.inval.shared::cta.b64 [%r5769]; 2026-02-21T10:23:47.7880858Z // end inline asm 2026-02-21T10:23:47.7881086Z .loc 1 78 58 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:78:58 2026-02-21T10:23:47.7881164Z ld.shared.b8 %rs1185, [%r28]; 2026-02-21T10:23:47.7881238Z ld.shared.b8 %rs1186, [%r28+1024]; 2026-02-21T10:23:47.7881372Z ld.shared.b8 %rs1187, [%r28+2048]; 2026-02-21T10:23:47.7881442Z ld.shared.b8 %rs1188, [%r28+3072]; 2026-02-21T10:23:47.7881510Z ld.shared.b8 %rs1189, [%r29+128]; 2026-02-21T10:23:47.7881582Z ld.shared.b8 %rs1190, [%r29+1152]; 2026-02-21T10:23:47.7881648Z ld.shared.b8 %rs1191, [%r29+2176]; 2026-02-21T10:23:47.7881712Z ld.shared.b8 %rs1192, [%r29+3200]; 2026-02-21T10:23:47.7881780Z ld.shared.b8 %rs1193, [%r30+256]; 2026-02-21T10:23:47.7881866Z ld.shared.b8 %rs1194, [%r30+1280]; 2026-02-21T10:23:47.7881933Z ld.shared.b8 %rs1195, [%r30+2304]; 2026-02-21T10:23:47.7882004Z ld.shared.b8 %rs1196, [%r30+3328]; 2026-02-21T10:23:47.7882075Z ld.shared.b8 %rs1197, [%r31+384]; 2026-02-21T10:23:47.7882141Z ld.shared.b8 %rs1198, [%r31+1408]; 2026-02-21T10:23:47.7882207Z ld.shared.b8 %rs1199, [%r31+2432]; 2026-02-21T10:23:47.7882276Z ld.shared.b8 %rs1200, [%r31+3456]; 2026-02-21T10:23:47.7882348Z ld.shared.b8 %rs1201, [%r32+512]; 2026-02-21T10:23:47.7882424Z ld.shared.b8 %rs1202, [%r32+1536]; 2026-02-21T10:23:47.7882494Z ld.shared.b8 %rs1203, [%r32+2560]; 2026-02-21T10:23:47.7882572Z ld.shared.b8 %rs1204, [%r32+3584]; 2026-02-21T10:23:47.7882638Z ld.shared.b8 %rs1205, [%r33+640]; 2026-02-21T10:23:47.7882702Z ld.shared.b8 %rs1206, [%r33+1664]; 2026-02-21T10:23:47.7882774Z ld.shared.b8 %rs1207, 
[%r33+2688]; 2026-02-21T10:23:47.7882841Z ld.shared.b8 %rs1208, [%r33+3712]; 2026-02-21T10:23:47.7882905Z ld.shared.b8 %rs1209, [%r34+768]; 2026-02-21T10:23:47.7882971Z ld.shared.b8 %rs1210, [%r34+1792]; 2026-02-21T10:23:47.7883044Z ld.shared.b8 %rs1211, [%r34+2816]; 2026-02-21T10:23:47.7883112Z ld.shared.b8 %rs1212, [%r34+3840]; 2026-02-21T10:23:47.7883179Z ld.shared.b8 %rs1213, [%r35+896]; 2026-02-21T10:23:47.7883253Z ld.shared.b8 %rs1214, [%r35+1920]; 2026-02-21T10:23:47.7883319Z ld.shared.b8 %rs1215, [%r35+2944]; 2026-02-21T10:23:47.7883390Z ld.shared.b8 %rs1216, [%r35+3968]; 2026-02-21T10:23:47.7883607Z .loc 1 63 28 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:63:28 2026-02-21T10:23:47.7883682Z shl.b16 %rs1217, %rs1185, 4; 2026-02-21T10:23:47.7883746Z shl.b16 %rs1218, %rs1189, 4; 2026-02-21T10:23:47.7883810Z shl.b16 %rs1219, %rs1193, 4; 2026-02-21T10:23:47.7883880Z shl.b16 %rs1220, %rs1197, 4; 2026-02-21T10:23:47.7883944Z shl.b16 %rs1221, %rs1201, 4; 2026-02-21T10:23:47.7884007Z shl.b16 %rs1222, %rs1205, 4; 2026-02-21T10:23:47.7884075Z shl.b16 %rs1223, %rs1209, 4; 2026-02-21T10:23:47.7884138Z shl.b16 %rs1224, %rs1213, 4; 2026-02-21T10:23:47.7884201Z shl.b16 %rs1225, %rs1186, 4; 2026-02-21T10:23:47.7884333Z shl.b16 %rs1226, %rs1190, 4; 2026-02-21T10:23:47.7884406Z shl.b16 %rs1227, %rs1194, 4; 2026-02-21T10:23:47.7884468Z shl.b16 %rs1228, %rs1198, 4; 2026-02-21T10:23:47.7884531Z shl.b16 %rs1229, %rs1202, 4; 2026-02-21T10:23:47.7884648Z shl.b16 %rs1230, %rs1206, 4; 2026-02-21T10:23:47.7884721Z shl.b16 %rs1231, %rs1210, 4; 2026-02-21T10:23:47.7884787Z shl.b16 %rs1232, %rs1214, 4; 2026-02-21T10:23:47.7884853Z shl.b16 %rs1233, %rs1187, 4; 2026-02-21T10:23:47.7884923Z shl.b16 %rs1234, %rs1191, 4; 2026-02-21T10:23:47.7884985Z shl.b16 %rs1235, %rs1195, 4; 2026-02-21T10:23:47.7885046Z shl.b16 %rs1236, %rs1199, 4; 2026-02-21T10:23:47.7885113Z shl.b16 %rs1237, %rs1203, 4; 2026-02-21T10:23:47.7885177Z shl.b16 %rs1238, %rs1207, 4; 2026-02-21T10:23:47.7885241Z 
shl.b16 %rs1239, %rs1211, 4; 2026-02-21T10:23:47.7885303Z shl.b16 %rs1240, %rs1215, 4; 2026-02-21T10:23:47.7885372Z shl.b16 %rs1241, %rs1188, 4; 2026-02-21T10:23:47.7885434Z shl.b16 %rs1242, %rs1192, 4; 2026-02-21T10:23:47.7885551Z shl.b16 %rs1243, %rs1196, 4; 2026-02-21T10:23:47.7885622Z shl.b16 %rs1244, %rs1200, 4; 2026-02-21T10:23:47.7885687Z shl.b16 %rs1245, %rs1204, 4; 2026-02-21T10:23:47.7885752Z shl.b16 %rs1246, %rs1208, 4; 2026-02-21T10:23:47.7885820Z shl.b16 %rs1247, %rs1212, 4; 2026-02-21T10:23:47.7885889Z shl.b16 %rs1248, %rs1216, 4; 2026-02-21T10:23:47.7886142Z .loc 1 78 58 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:78:58 2026-02-21T10:23:47.7886230Z selp.b16 %rs1249, %rs1217, %rs1185, %p188; 2026-02-21T10:23:47.7886300Z cvt.s16.s8 %rs1250, %rs1249; 2026-02-21T10:23:47.7886360Z shr.s16 %rs1251, %rs1250, 4; 2026-02-21T10:23:47.7886439Z selp.b16 %rs1252, %rs1218, %rs1189, %p188; 2026-02-21T10:23:47.7886635Z cvt.s16.s8 %rs1253, %rs1252; 2026-02-21T10:23:47.7886703Z shr.s16 %rs1254, %rs1253, 4; 2026-02-21T10:23:47.7886793Z selp.b16 %rs1255, %rs1219, %rs1193, %p188; 2026-02-21T10:23:47.7886861Z cvt.s16.s8 %rs1256, %rs1255; 2026-02-21T10:23:47.7886936Z shr.s16 %rs1257, %rs1256, 4; 2026-02-21T10:23:47.7887013Z selp.b16 %rs1258, %rs1220, %rs1197, %p188; 2026-02-21T10:23:47.7887082Z cvt.s16.s8 %rs1259, %rs1258; 2026-02-21T10:23:47.7887156Z shr.s16 %rs1260, %rs1259, 4; 2026-02-21T10:23:47.7887235Z selp.b16 %rs1261, %rs1221, %rs1201, %p188; 2026-02-21T10:23:47.7887299Z cvt.s16.s8 %rs1262, %rs1261; 2026-02-21T10:23:47.7887363Z shr.s16 %rs1263, %rs1262, 4; 2026-02-21T10:23:47.7887450Z selp.b16 %rs1264, %rs1222, %rs1205, %p188; 2026-02-21T10:23:47.7887517Z cvt.s16.s8 %rs1265, %rs1264; 2026-02-21T10:23:47.7887581Z shr.s16 %rs1266, %rs1265, 4; 2026-02-21T10:23:47.7887664Z selp.b16 %rs1267, %rs1223, %rs1209, %p188; 2026-02-21T10:23:47.7887731Z cvt.s16.s8 %rs1268, %rs1267; 2026-02-21T10:23:47.7887796Z shr.s16 %rs1269, %rs1268, 4; 
2026-02-21T10:23:47.7887879Z selp.b16 %rs1270, %rs1224, %rs1213, %p188; 2026-02-21T10:23:47.7887945Z cvt.s16.s8 %rs1271, %rs1270; 2026-02-21T10:23:47.7888010Z shr.s16 %rs1272, %rs1271, 4; 2026-02-21T10:23:47.7888090Z selp.b16 %rs1273, %rs1225, %rs1186, %p188; 2026-02-21T10:23:47.7888160Z cvt.s16.s8 %rs1274, %rs1273; 2026-02-21T10:23:47.7888224Z shr.s16 %rs1275, %rs1274, 4; 2026-02-21T10:23:47.7888299Z selp.b16 %rs1276, %rs1226, %rs1190, %p188; 2026-02-21T10:23:47.7888371Z cvt.s16.s8 %rs1277, %rs1276; 2026-02-21T10:23:47.7888434Z shr.s16 %rs1278, %rs1277, 4; 2026-02-21T10:23:47.7888508Z selp.b16 %rs1279, %rs1227, %rs1194, %p188; 2026-02-21T10:23:47.7888573Z cvt.s16.s8 %rs1280, %rs1279; 2026-02-21T10:23:47.7888642Z shr.s16 %rs1281, %rs1280, 4; 2026-02-21T10:23:47.7888717Z selp.b16 %rs1282, %rs1228, %rs1198, %p188; 2026-02-21T10:23:47.7888781Z cvt.s16.s8 %rs1283, %rs1282; 2026-02-21T10:23:47.7888855Z shr.s16 %rs1284, %rs1283, 4; 2026-02-21T10:23:47.7888931Z selp.b16 %rs1285, %rs1229, %rs1202, %p188; 2026-02-21T10:23:47.7888995Z cvt.s16.s8 %rs1286, %rs1285; 2026-02-21T10:23:47.7889062Z shr.s16 %rs1287, %rs1286, 4; 2026-02-21T10:23:47.7889138Z selp.b16 %rs1288, %rs1230, %rs1206, %p188; 2026-02-21T10:23:47.7889299Z cvt.s16.s8 %rs1289, %rs1288; 2026-02-21T10:23:47.7889365Z shr.s16 %rs1290, %rs1289, 4; 2026-02-21T10:23:47.7889453Z selp.b16 %rs1291, %rs1231, %rs1210, %p188; 2026-02-21T10:23:47.7889515Z cvt.s16.s8 %rs1292, %rs1291; 2026-02-21T10:23:47.7889638Z shr.s16 %rs1293, %rs1292, 4; 2026-02-21T10:23:47.7889718Z selp.b16 %rs1294, %rs1232, %rs1214, %p188; 2026-02-21T10:23:47.7889780Z cvt.s16.s8 %rs1295, %rs1294; 2026-02-21T10:23:47.7889846Z shr.s16 %rs1296, %rs1295, 4; 2026-02-21T10:23:47.7889923Z selp.b16 %rs1297, %rs1233, %rs1187, %p188; 2026-02-21T10:23:47.7889993Z cvt.s16.s8 %rs1298, %rs1297; 2026-02-21T10:23:47.7890056Z shr.s16 %rs1299, %rs1298, 4; 2026-02-21T10:23:47.7890132Z selp.b16 %rs1300, %rs1234, %rs1191, %p188; 2026-02-21T10:23:47.7890201Z cvt.s16.s8 
%rs1301, %rs1300; 2026-02-21T10:23:47.7890276Z shr.s16 %rs1302, %rs1301, 4; 2026-02-21T10:23:47.7890353Z selp.b16 %rs1303, %rs1235, %rs1195, %p188; 2026-02-21T10:23:47.7890414Z cvt.s16.s8 %rs1304, %rs1303; 2026-02-21T10:23:47.7890560Z shr.s16 %rs1305, %rs1304, 4; 2026-02-21T10:23:47.7890640Z selp.b16 %rs1306, %rs1236, %rs1199, %p188; 2026-02-21T10:23:47.7890706Z cvt.s16.s8 %rs1307, %rs1306; 2026-02-21T10:23:47.7890786Z shr.s16 %rs1308, %rs1307, 4; 2026-02-21T10:23:47.7890869Z selp.b16 %rs1309, %rs1237, %rs1203, %p188; 2026-02-21T10:23:47.7890934Z cvt.s16.s8 %rs1310, %rs1309; 2026-02-21T10:23:47.7891067Z shr.s16 %rs1311, %rs1310, 4; 2026-02-21T10:23:47.7891146Z selp.b16 %rs1312, %rs1238, %rs1207, %p188; 2026-02-21T10:23:47.7891211Z cvt.s16.s8 %rs1313, %rs1312; 2026-02-21T10:23:47.7891276Z shr.s16 %rs1314, %rs1313, 4; 2026-02-21T10:23:47.7891356Z selp.b16 %rs1315, %rs1239, %rs1211, %p188; 2026-02-21T10:23:47.7891419Z cvt.s16.s8 %rs1316, %rs1315; 2026-02-21T10:23:47.7891481Z shr.s16 %rs1317, %rs1316, 4; 2026-02-21T10:23:47.7891573Z selp.b16 %rs1318, %rs1240, %rs1215, %p188; 2026-02-21T10:23:47.7891637Z cvt.s16.s8 %rs1319, %rs1318; 2026-02-21T10:23:47.7891698Z shr.s16 %rs1320, %rs1319, 4; 2026-02-21T10:23:47.7891779Z selp.b16 %rs1321, %rs1241, %rs1188, %p188; 2026-02-21T10:23:47.7891850Z cvt.s16.s8 %rs1322, %rs1321; 2026-02-21T10:23:47.7891915Z shr.s16 %rs1323, %rs1322, 4; 2026-02-21T10:23:47.7891992Z selp.b16 %rs1324, %rs1242, %rs1192, %p188; 2026-02-21T10:23:47.7892070Z cvt.s16.s8 %rs1325, %rs1324; 2026-02-21T10:23:47.7892132Z shr.s16 %rs1326, %rs1325, 4; 2026-02-21T10:23:47.7892210Z selp.b16 %rs1327, %rs1243, %rs1196, %p188; 2026-02-21T10:23:47.7892279Z cvt.s16.s8 %rs1328, %rs1327; 2026-02-21T10:23:47.7892347Z shr.s16 %rs1329, %rs1328, 4; 2026-02-21T10:23:47.7892425Z selp.b16 %rs1330, %rs1244, %rs1200, %p188; 2026-02-21T10:23:47.7892489Z cvt.s16.s8 %rs1331, %rs1330; 2026-02-21T10:23:47.7892551Z shr.s16 %rs1332, %rs1331, 4; 2026-02-21T10:23:47.7892628Z selp.b16 
%rs1333, %rs1245, %rs1204, %p188; 2026-02-21T10:23:47.7892695Z cvt.s16.s8 %rs1334, %rs1333; 2026-02-21T10:23:47.7892756Z shr.s16 %rs1335, %rs1334, 4; 2026-02-21T10:23:47.7892832Z selp.b16 %rs1336, %rs1246, %rs1208, %p188; 2026-02-21T10:23:47.7892904Z cvt.s16.s8 %rs1337, %rs1336; 2026-02-21T10:23:47.7892967Z shr.s16 %rs1338, %rs1337, 4; 2026-02-21T10:23:47.7893042Z selp.b16 %rs1339, %rs1247, %rs1212, %p188; 2026-02-21T10:23:47.7893113Z cvt.s16.s8 %rs1340, %rs1339; 2026-02-21T10:23:47.7893175Z shr.s16 %rs1341, %rs1340, 4; 2026-02-21T10:23:47.7893249Z selp.b16 %rs1342, %rs1248, %rs1216, %p188; 2026-02-21T10:23:47.7893313Z cvt.s16.s8 %rs1343, %rs1342; 2026-02-21T10:23:47.7893383Z shr.s16 %rs1344, %rs1343, 4; 2026-02-21T10:23:47.7893602Z .loc 1 83 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:83:32 2026-02-21T10:23:47.7893672Z cvt.rn.f32.s16 %r10681, %rs1251; 2026-02-21T10:23:47.7893743Z cvt.rn.f32.s16 %r10682, %rs1254; 2026-02-21T10:23:47.7893807Z cvt.rn.f32.s16 %r10683, %rs1257; 2026-02-21T10:23:47.7893870Z cvt.rn.f32.s16 %r10684, %rs1260; 2026-02-21T10:23:47.7893934Z cvt.rn.f32.s16 %r10685, %rs1263; 2026-02-21T10:23:47.7894078Z cvt.rn.f32.s16 %r10686, %rs1266; 2026-02-21T10:23:47.7894148Z cvt.rn.f32.s16 %r10687, %rs1269; 2026-02-21T10:23:47.7894213Z cvt.rn.f32.s16 %r10688, %rs1272; 2026-02-21T10:23:47.7894284Z cvt.rn.f32.s16 %r10689, %rs1275; 2026-02-21T10:23:47.7894404Z cvt.rn.f32.s16 %r10690, %rs1278; 2026-02-21T10:23:47.7894469Z cvt.rn.f32.s16 %r10691, %rs1281; 2026-02-21T10:23:47.7894541Z cvt.rn.f32.s16 %r10692, %rs1284; 2026-02-21T10:23:47.7894607Z cvt.rn.f32.s16 %r10693, %rs1287; 2026-02-21T10:23:47.7894671Z cvt.rn.f32.s16 %r10694, %rs1290; 2026-02-21T10:23:47.7894735Z cvt.rn.f32.s16 %r10695, %rs1293; 2026-02-21T10:23:47.7894804Z cvt.rn.f32.s16 %r10696, %rs1296; 2026-02-21T10:23:47.7894873Z cvt.rn.f32.s16 %r10697, %rs1299; 2026-02-21T10:23:47.7894941Z cvt.rn.f32.s16 %r10698, %rs1302; 2026-02-21T10:23:47.7895010Z cvt.rn.f32.s16 %r10699, %rs1305; 
2026-02-21T10:23:47.7895073Z cvt.rn.f32.s16 %r10700, %rs1308; 2026-02-21T10:23:47.7895137Z cvt.rn.f32.s16 %r10701, %rs1311; 2026-02-21T10:23:47.7895205Z cvt.rn.f32.s16 %r10702, %rs1314; 2026-02-21T10:23:47.7895321Z cvt.rn.f32.s16 %r10703, %rs1317; 2026-02-21T10:23:47.7895387Z cvt.rn.f32.s16 %r10704, %rs1320; 2026-02-21T10:23:47.7895451Z cvt.rn.f32.s16 %r10705, %rs1323; 2026-02-21T10:23:47.7895523Z cvt.rn.f32.s16 %r10706, %rs1326; 2026-02-21T10:23:47.7895590Z cvt.rn.f32.s16 %r10707, %rs1329; 2026-02-21T10:23:47.7895654Z cvt.rn.f32.s16 %r10708, %rs1332; 2026-02-21T10:23:47.7895770Z cvt.rn.f32.s16 %r10709, %rs1335; 2026-02-21T10:23:47.7895839Z cvt.rn.f32.s16 %r10710, %rs1338; 2026-02-21T10:23:47.7895906Z cvt.rn.f32.s16 %r10711, %rs1341; 2026-02-21T10:23:47.7895971Z cvt.rn.f32.s16 %r10712, %rs1344; 2026-02-21T10:23:47.7896036Z bar.sync 0; 2026-02-21T10:23:47.7896104Z st.shared.b32 [%r36], %r10681; 2026-02-21T10:23:47.7896169Z st.shared.b32 [%r36+8], %r10682; 2026-02-21T10:23:47.7896246Z st.shared.b32 [%r36+16384], %r10697; 2026-02-21T10:23:47.7896324Z st.shared.b32 [%r36+16392], %r10698; 2026-02-21T10:23:47.7896394Z st.shared.b32 [%r37], %r10683; 2026-02-21T10:23:47.7896577Z st.shared.b32 [%r37+8], %r10684; 2026-02-21T10:23:47.7896662Z st.shared.b32 [%r37+16384], %r10699; 2026-02-21T10:23:47.7896731Z st.shared.b32 [%r37+16392], %r10700; 2026-02-21T10:23:47.7896799Z st.shared.b32 [%r38], %r10685; 2026-02-21T10:23:47.7896874Z st.shared.b32 [%r38+8], %r10686; 2026-02-21T10:23:47.7896941Z st.shared.b32 [%r38+16384], %r10701; 2026-02-21T10:23:47.7897010Z st.shared.b32 [%r38+16392], %r10702; 2026-02-21T10:23:47.7897076Z st.shared.b32 [%r39], %r10687; 2026-02-21T10:23:47.7897147Z st.shared.b32 [%r39+8], %r10688; 2026-02-21T10:23:47.7897211Z st.shared.b32 [%r39+16384], %r10703; 2026-02-21T10:23:47.7897278Z st.shared.b32 [%r39+16392], %r10704; 2026-02-21T10:23:47.7897348Z st.shared.b32 [%r40], %r10689; 2026-02-21T10:23:47.7897425Z st.shared.b32 [%r40+8], %r10690; 
2026-02-21T10:23:47.7897493Z st.shared.b32 [%r40+16384], %r10705; 2026-02-21T10:23:47.7897563Z st.shared.b32 [%r40+16392], %r10706; 2026-02-21T10:23:47.7897635Z st.shared.b32 [%r41], %r10691; 2026-02-21T10:23:47.7897701Z st.shared.b32 [%r41+8], %r10692; 2026-02-21T10:23:47.7897768Z st.shared.b32 [%r41+16384], %r10707; 2026-02-21T10:23:47.7897838Z st.shared.b32 [%r41+16392], %r10708; 2026-02-21T10:23:47.7897908Z st.shared.b32 [%r42], %r10693; 2026-02-21T10:23:47.7897972Z st.shared.b32 [%r42+8], %r10694; 2026-02-21T10:23:47.7898041Z st.shared.b32 [%r42+16384], %r10709; 2026-02-21T10:23:47.7898109Z st.shared.b32 [%r42+16392], %r10710; 2026-02-21T10:23:47.7898174Z st.shared.b32 [%r43], %r10695; 2026-02-21T10:23:47.7898239Z st.shared.b32 [%r43+8], %r10696; 2026-02-21T10:23:47.7898308Z st.shared.b32 [%r43+16384], %r10711; 2026-02-21T10:23:47.7898378Z st.shared.b32 [%r43+16392], %r10712; 2026-02-21T10:23:47.7898434Z $L__tmp13: 2026-02-21T10:23:47.7898722Z .loc 2 291 36 // standard.py:291:36 @[ cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:90:40 ] 2026-02-21T10:23:47.7898787Z // begin inline asm 2026-02-21T10:23:47.7898962Z fence.proxy.async.shared::cta; 2026-02-21T10:23:47.7899025Z // end inline asm 2026-02-21T10:23:47.7899082Z bar.sync 0; 2026-02-21T10:23:47.7899157Z wgmma.fence.sync.aligned; 2026-02-21T10:23:47.7899219Z // begin inline asm 2026-02-21T10:23:47.7900781Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r8336,%r8337,%r8338,%r8339}, 
%rd2, %p79, 1, 1; 2026-02-21T10:23:47.7900841Z // end inline asm 2026-02-21T10:23:47.7900905Z // begin inline asm 2026-02-21T10:23:47.7902483Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r8468,%r8469,%r8470,%r8471}, %rd3, %p79, 1, 1; 2026-02-21T10:23:47.7902548Z // end inline asm 2026-02-21T10:23:47.7902610Z // begin inline asm 2026-02-21T10:23:47.7904072Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r8600,%r8601,%r8602,%r8603}, %rd4, %p79, 1, 1; 2026-02-21T10:23:47.7904135Z // end inline asm 2026-02-21T10:23:47.7904207Z // begin inline asm 2026-02-21T10:23:47.7905669Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r8732,%r8733,%r8734,%r8735}, %rd5, %p79, 1, 1; 2026-02-21T10:23:47.7905730Z // end inline asm 2026-02-21T10:23:47.7905791Z // begin inline asm 2026-02-21T10:23:47.7907365Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r8864,%r8865,%r8866,%r8867}, %rd6, %p79, 1, 1; 2026-02-21T10:23:47.7907432Z // end inline asm 2026-02-21T10:23:47.7907495Z // begin inline asm 2026-02-21T10:23:47.7909018Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, 
{%r8996,%r8997,%r8998,%r8999}, %rd7, %p79, 1, 1; 2026-02-21T10:23:47.7909239Z // end inline asm 2026-02-21T10:23:47.7909305Z // begin inline asm 2026-02-21T10:23:47.7910813Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r9128,%r9129,%r9130,%r9131}, %rd8, %p79, 1, 1; 2026-02-21T10:23:47.7910885Z // end inline asm 2026-02-21T10:23:47.7910945Z // begin inline asm 2026-02-21T10:23:47.7912457Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r9260,%r9261,%r9262,%r9263}, %rd9, %p79, 1, 1; 2026-02-21T10:23:47.7912538Z // end inline asm 2026-02-21T10:23:47.7912622Z wgmma.commit_group.sync.aligned; 2026-02-21T10:23:47.7912685Z mov.b32 %r9329, %r10545; 2026-02-21T10:23:47.7912750Z mov.b32 %r9330, %r10545; 2026-02-21T10:23:47.7912810Z mov.b32 %r9328, %r10980; 2026-02-21T10:23:47.7912873Z // begin inline asm 2026-02-21T10:23:47.7914133Z // wait for regs: 
%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218,%r9328,%r9329,%r9330 2026-02-21T10:23:47.7914213Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:23:47.7914270Z // end inline asm 2026-02-21T10:23:47.7914330Z $L__tmp14: 2026-02-21T10:23:47.7914556Z .loc 1 54 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:54:32 2026-02-21T10:23:47.7914626Z add.s64 %rd232, %rd169, 384; 2026-02-21T10:23:47.7914690Z add.s64 %rd235, %rd172, 384; 2026-02-21T10:23:47.7914759Z add.s64 %rd238, %rd175, 384; 2026-02-21T10:23:47.7914968Z .loc 1 54 80 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:54:80 2026-02-21T10:23:47.7915043Z add.s64 %rd241, %rd178, 384; 2026-02-21T10:23:47.7915111Z // begin inline asm 2026-02-21T10:23:47.7915172Z mov.u64 %rd231, 0x0; 2026-02-21T10:23:47.7915301Z createpolicy.fractional.L2::evict_last.b64 %rd231, 1.0; 2026-02-21T10:23:47.7915362Z // end inline asm 2026-02-21T10:23:47.7915422Z // begin inline asm 2026-02-21T10:23:47.7915480Z mov.u32 %r9398, 0x0; 2026-02-21T10:23:47.7915539Z mov.u32 %r9399, 0x0; 2026-02-21T10:23:47.7915602Z mov.u32 %r9400, 0x0; 2026-02-21T10:23:47.7915659Z mov.u32 %r9401, 0x0; 2026-02-21T10:23:47.7915945Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r9398, %r9399, %r9400, %r9401 }, [ %rd232 + 0 ], %rd231; 2026-02-21T10:23:47.7916009Z // end inline asm 2026-02-21T10:23:47.7916067Z // begin inline asm 2026-02-21T10:23:47.7916128Z mov.u64 %rd234, 0x0; 2026-02-21T10:23:47.7916315Z createpolicy.fractional.L2::evict_last.b64 %rd234, 1.0; 
2026-02-21T10:23:47.7916377Z // end inline asm 2026-02-21T10:23:47.7916435Z // begin inline asm 2026-02-21T10:23:47.7916628Z mov.u32 %r9402, 0x0; 2026-02-21T10:23:47.7916694Z mov.u32 %r9403, 0x0; 2026-02-21T10:23:47.7916752Z mov.u32 %r9404, 0x0; 2026-02-21T10:23:47.7916809Z mov.u32 %r9405, 0x0; 2026-02-21T10:23:47.7917040Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r9402, %r9403, %r9404, %r9405 }, [ %rd235 + 0 ], %rd234; 2026-02-21T10:23:47.7917099Z // end inline asm 2026-02-21T10:23:47.7917159Z // begin inline asm 2026-02-21T10:23:47.7917217Z mov.u64 %rd237, 0x0; 2026-02-21T10:23:47.7917350Z createpolicy.fractional.L2::evict_last.b64 %rd237, 1.0; 2026-02-21T10:23:47.7917411Z // end inline asm 2026-02-21T10:23:47.7917546Z // begin inline asm 2026-02-21T10:23:47.7917613Z mov.u32 %r9406, 0x0; 2026-02-21T10:23:47.7917671Z mov.u32 %r9407, 0x0; 2026-02-21T10:23:47.7917727Z mov.u32 %r9408, 0x0; 2026-02-21T10:23:47.7917788Z mov.u32 %r9409, 0x0; 2026-02-21T10:23:47.7918008Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r9406, %r9407, %r9408, %r9409 }, [ %rd238 + 0 ], %rd237; 2026-02-21T10:23:47.7918130Z // end inline asm 2026-02-21T10:23:47.7918193Z // begin inline asm 2026-02-21T10:23:47.7918254Z mov.u64 %rd240, 0x0; 2026-02-21T10:23:47.7918369Z createpolicy.fractional.L2::evict_last.b64 %rd240, 1.0; 2026-02-21T10:23:47.7918427Z // end inline asm 2026-02-21T10:23:47.7918489Z // begin inline asm 2026-02-21T10:23:47.7918550Z mov.u32 %r9410, 0x0; 2026-02-21T10:23:47.7918607Z mov.u32 %r9411, 0x0; 2026-02-21T10:23:47.7918664Z mov.u32 %r9412, 0x0; 2026-02-21T10:23:47.7918726Z mov.u32 %r9413, 0x0; 2026-02-21T10:23:47.7918939Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r9410, %r9411, %r9412, %r9413 }, [ %rd241 + 0 ], %rd240; 2026-02-21T10:23:47.7919000Z // end inline asm 2026-02-21T10:23:47.7919232Z .loc 1 58 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:58:32 2026-02-21T10:23:47.7919297Z bar.sync 0; 2026-02-21T10:23:47.7919381Z st.shared.v2.b32 
[%r18], {%r9398, %r9399}; 2026-02-21T10:23:47.7919478Z st.shared.v2.b32 [%r18+4096], {%r9402, %r9403}; 2026-02-21T10:23:47.7919560Z st.shared.v2.b32 [%r18+8192], {%r9406, %r9407}; 2026-02-21T10:23:47.7919650Z st.shared.v2.b32 [%r18+12288], {%r9410, %r9411}; 2026-02-21T10:23:47.7919727Z st.shared.v2.b32 [%r19], {%r9400, %r9401}; 2026-02-21T10:23:47.7919814Z st.shared.v2.b32 [%r19+4096], {%r9404, %r9405}; 2026-02-21T10:23:47.7919897Z st.shared.v2.b32 [%r19+8192], {%r9408, %r9409}; 2026-02-21T10:23:47.7919983Z st.shared.v2.b32 [%r19+12288], {%r9412, %r9413}; 2026-02-21T10:23:47.7920041Z bar.sync 0; 2026-02-21T10:23:47.7920112Z ld.shared.b16 %rs1345, [%r20]; 2026-02-21T10:23:47.7920188Z ld.shared.b16 %rs1346, [%r20+1024]; 2026-02-21T10:23:47.7920261Z ld.shared.b16 %rs1347, [%r20+64]; 2026-02-21T10:23:47.7920328Z ld.shared.b16 %rs1348, [%r20+1088]; 2026-02-21T10:23:47.7920394Z ld.shared.b16 %rs1349, [%r21]; 2026-02-21T10:23:47.7920464Z ld.shared.b16 %rs1350, [%r21+1024]; 2026-02-21T10:23:47.7920545Z ld.shared.b16 %rs1351, [%r21+64]; 2026-02-21T10:23:47.7920616Z ld.shared.b16 %rs1352, [%r21+1088]; 2026-02-21T10:23:47.7920683Z ld.shared.b16 %rs1353, [%r22]; 2026-02-21T10:23:47.7920752Z ld.shared.b16 %rs1354, [%r22+1024]; 2026-02-21T10:23:47.7920818Z ld.shared.b16 %rs1355, [%r22+64]; 2026-02-21T10:23:47.7920884Z ld.shared.b16 %rs1356, [%r22+1088]; 2026-02-21T10:23:47.7920947Z ld.shared.b16 %rs1357, [%r23]; 2026-02-21T10:23:47.7921017Z ld.shared.b16 %rs1358, [%r23+1024]; 2026-02-21T10:23:47.7921082Z ld.shared.b16 %rs1359, [%r23+64]; 2026-02-21T10:23:47.7921149Z ld.shared.b16 %rs1360, [%r23+1088]; 2026-02-21T10:23:47.7921305Z ld.shared.b16 %rs1361, [%r24]; 2026-02-21T10:23:47.7921377Z ld.shared.b16 %rs1362, [%r24+1024]; 2026-02-21T10:23:47.7921442Z ld.shared.b16 %rs1363, [%r24+64]; 2026-02-21T10:23:47.7921509Z ld.shared.b16 %rs1364, [%r24+1088]; 2026-02-21T10:23:47.7921640Z ld.shared.b16 %rs1365, [%r25]; 2026-02-21T10:23:47.7921706Z ld.shared.b16 %rs1366, [%r25+1024]; 
2026-02-21T10:23:47.7921772Z ld.shared.b16 %rs1367, [%r25+64]; 2026-02-21T10:23:47.7921845Z ld.shared.b16 %rs1368, [%r25+1088]; 2026-02-21T10:23:47.7921911Z ld.shared.b16 %rs1369, [%r26]; 2026-02-21T10:23:47.7921975Z ld.shared.b16 %rs1370, [%r26+1024]; 2026-02-21T10:23:47.7922044Z ld.shared.b16 %rs1371, [%r26+64]; 2026-02-21T10:23:47.7922109Z ld.shared.b16 %rs1372, [%r26+1088]; 2026-02-21T10:23:47.7922183Z ld.shared.b16 %rs1373, [%r27]; 2026-02-21T10:23:47.7922248Z ld.shared.b16 %rs1374, [%r27+1024]; 2026-02-21T10:23:47.7922315Z ld.shared.b16 %rs1375, [%r27+64]; 2026-02-21T10:23:47.7922381Z ld.shared.b16 %rs1376, [%r27+1088]; 2026-02-21T10:23:47.7922495Z cvt.f32.bf16 %r9551, %rs1345; 2026-02-21T10:23:47.7922568Z cvt.f32.bf16 %r9552, %rs1346; 2026-02-21T10:23:47.7922638Z cvt.f32.bf16 %r9553, %rs1349; 2026-02-21T10:23:47.7922699Z cvt.f32.bf16 %r9554, %rs1350; 2026-02-21T10:23:47.7922764Z cvt.f32.bf16 %r9683, %rs1353; 2026-02-21T10:23:47.7922828Z cvt.f32.bf16 %r9684, %rs1354; 2026-02-21T10:23:47.7922890Z cvt.f32.bf16 %r9685, %rs1357; 2026-02-21T10:23:47.7922999Z cvt.f32.bf16 %r9686, %rs1358; 2026-02-21T10:23:47.7923068Z cvt.f32.bf16 %r9815, %rs1361; 2026-02-21T10:23:47.7923128Z cvt.f32.bf16 %r9816, %rs1362; 2026-02-21T10:23:47.7923189Z cvt.f32.bf16 %r9817, %rs1365; 2026-02-21T10:23:47.7923252Z cvt.f32.bf16 %r9818, %rs1366; 2026-02-21T10:23:47.7923314Z cvt.f32.bf16 %r9947, %rs1369; 2026-02-21T10:23:47.7923374Z cvt.f32.bf16 %r9948, %rs1370; 2026-02-21T10:23:47.7923435Z cvt.f32.bf16 %r9949, %rs1373; 2026-02-21T10:23:47.7923498Z cvt.f32.bf16 %r9950, %rs1374; 2026-02-21T10:23:47.7923561Z cvt.f32.bf16 %r10079, %rs1347; 2026-02-21T10:23:47.7923629Z cvt.f32.bf16 %r10080, %rs1348; 2026-02-21T10:23:47.7923695Z cvt.f32.bf16 %r10081, %rs1351; 2026-02-21T10:23:47.7923756Z cvt.f32.bf16 %r10082, %rs1352; 2026-02-21T10:23:47.7923817Z cvt.f32.bf16 %r10211, %rs1355; 2026-02-21T10:23:47.7923881Z cvt.f32.bf16 %r10212, %rs1356; 2026-02-21T10:23:47.7923952Z cvt.f32.bf16 %r10213, %rs1359; 
2026-02-21T10:23:47.7924018Z cvt.f32.bf16 %r10214, %rs1360; 2026-02-21T10:23:47.7924086Z cvt.f32.bf16 %r10343, %rs1363; 2026-02-21T10:23:47.7924150Z cvt.f32.bf16 %r10344, %rs1364; 2026-02-21T10:23:47.7924212Z cvt.f32.bf16 %r10345, %rs1367; 2026-02-21T10:23:47.7924272Z cvt.f32.bf16 %r10346, %rs1368; 2026-02-21T10:23:47.7924333Z cvt.f32.bf16 %r10475, %rs1371; 2026-02-21T10:23:47.7924403Z cvt.f32.bf16 %r10476, %rs1372; 2026-02-21T10:23:47.7924463Z cvt.f32.bf16 %r10477, %rs1375; 2026-02-21T10:23:47.7924524Z cvt.f32.bf16 %r10478, %rs1376; 2026-02-21T10:23:47.7924747Z .loc 1 60 33 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:60:33 2026-02-21T10:23:47.7924809Z bar.sync 0; 2026-02-21T10:23:47.7924872Z // begin inline asm 2026-02-21T10:23:47.7924979Z @%p132 mbarrier.init.shared::cta.b64 [%r5769], 1; 2026-02-21T10:23:47.7925037Z // end inline asm 2026-02-21T10:23:47.7925094Z bar.sync 0; 2026-02-21T10:23:47.7925153Z // begin inline asm 2026-02-21T10:23:47.7925294Z @%p132 mbarrier.arrive.expect_tx.shared.b64 _, [%r5769], 4096; 2026-02-21T10:23:47.7925354Z // end inline asm 2026-02-21T10:23:47.7925412Z // begin inline asm 2026-02-21T10:23:47.7925493Z fence.proxy.async.shared::cta; 2026-02-21T10:23:47.7925553Z // end inline asm 2026-02-21T10:23:47.7925610Z bar.sync 0; 2026-02-21T10:23:47.7925678Z elect.sync %r10713|%p128, -1; 2026-02-21T10:23:47.7925752Z and.pred %p113, %p1, %p128; 2026-02-21T10:23:47.7925816Z or.b32 %r9418, %r5773, 96; 2026-02-21T10:23:47.7925876Z // begin inline asm 2026-02-21T10:23:47.7926216Z @%p113 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r10980], [%rd281, {%r5772, %r9418}], [%r5769]; 2026-02-21T10:23:47.7926362Z // end inline asm 2026-02-21T10:23:47.7926418Z bar.sync 0; 2026-02-21T10:23:47.7926606Z // begin inline asm 2026-02-21T10:23:47.7926662Z 2026-02-21T10:23:47.7926788Z { 2026-02-21T10:23:47.7926857Z .reg .pred complete; 2026-02-21T10:23:47.7926916Z waitLoop: 2026-02-21T10:23:47.7927065Z 
mbarrier.try_wait.parity.shared.b64 complete, [%r5769], %r10545; 2026-02-21T10:23:47.7927139Z @!complete bra.uni waitLoop; 2026-02-21T10:23:47.7927195Z } 2026-02-21T10:23:47.7927201Z 2026-02-21T10:23:47.7927258Z // end inline asm 2026-02-21T10:23:47.7927313Z bar.sync 0; 2026-02-21T10:23:47.7927372Z // begin inline asm 2026-02-21T10:23:47.7927474Z @%p132 mbarrier.inval.shared::cta.b64 [%r5769]; 2026-02-21T10:23:47.7927530Z // end inline asm 2026-02-21T10:23:47.7927744Z .loc 1 78 58 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:78:58 2026-02-21T10:23:47.7927816Z ld.shared.b8 %rs1377, [%r28]; 2026-02-21T10:23:47.7927953Z ld.shared.b8 %rs1378, [%r28+1024]; 2026-02-21T10:23:47.7928021Z ld.shared.b8 %rs1379, [%r28+2048]; 2026-02-21T10:23:47.7928086Z ld.shared.b8 %rs1380, [%r28+3072]; 2026-02-21T10:23:47.7928159Z ld.shared.b8 %rs1381, [%r29+128]; 2026-02-21T10:23:47.7928228Z ld.shared.b8 %rs1382, [%r29+1152]; 2026-02-21T10:23:47.7928292Z ld.shared.b8 %rs1383, [%r29+2176]; 2026-02-21T10:23:47.7928421Z ld.shared.b8 %rs1384, [%r29+3200]; 2026-02-21T10:23:47.7928489Z ld.shared.b8 %rs1385, [%r30+256]; 2026-02-21T10:23:47.7928553Z ld.shared.b8 %rs1386, [%r30+1280]; 2026-02-21T10:23:47.7928627Z ld.shared.b8 %rs1387, [%r30+2304]; 2026-02-21T10:23:47.7928699Z ld.shared.b8 %rs1388, [%r30+3328]; 2026-02-21T10:23:47.7928764Z ld.shared.b8 %rs1389, [%r31+384]; 2026-02-21T10:23:47.7928829Z ld.shared.b8 %rs1390, [%r31+1408]; 2026-02-21T10:23:47.7928895Z ld.shared.b8 %rs1391, [%r31+2432]; 2026-02-21T10:23:47.7928959Z ld.shared.b8 %rs1392, [%r31+3456]; 2026-02-21T10:23:47.7929025Z ld.shared.b8 %rs1393, [%r32+512]; 2026-02-21T10:23:47.7929096Z ld.shared.b8 %rs1394, [%r32+1536]; 2026-02-21T10:23:47.7929162Z ld.shared.b8 %rs1395, [%r32+2560]; 2026-02-21T10:23:47.7929226Z ld.shared.b8 %rs1396, [%r32+3584]; 2026-02-21T10:23:47.7929293Z ld.shared.b8 %rs1397, [%r33+640]; 2026-02-21T10:23:47.7929363Z ld.shared.b8 %rs1398, [%r33+1664]; 2026-02-21T10:23:47.7929427Z ld.shared.b8 %rs1399, 
[%r33+2688]; 2026-02-21T10:23:47.7929496Z ld.shared.b8 %rs1400, [%r33+3712]; 2026-02-21T10:23:47.7929564Z ld.shared.b8 %rs1401, [%r34+768]; 2026-02-21T10:23:47.7929628Z ld.shared.b8 %rs1402, [%r34+1792]; 2026-02-21T10:23:47.7929692Z ld.shared.b8 %rs1403, [%r34+2816]; 2026-02-21T10:23:47.7929758Z ld.shared.b8 %rs1404, [%r34+3840]; 2026-02-21T10:23:47.7929821Z ld.shared.b8 %rs1405, [%r35+896]; 2026-02-21T10:23:47.7929884Z ld.shared.b8 %rs1406, [%r35+1920]; 2026-02-21T10:23:47.7929947Z ld.shared.b8 %rs1407, [%r35+2944]; 2026-02-21T10:23:47.7930015Z ld.shared.b8 %rs1408, [%r35+3968]; 2026-02-21T10:23:47.7930230Z .loc 1 63 28 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:63:28 2026-02-21T10:23:47.7930299Z shl.b16 %rs1409, %rs1377, 4; 2026-02-21T10:23:47.7930363Z shl.b16 %rs1410, %rs1381, 4; 2026-02-21T10:23:47.7930427Z shl.b16 %rs1411, %rs1385, 4; 2026-02-21T10:23:47.7930489Z shl.b16 %rs1412, %rs1389, 4; 2026-02-21T10:23:47.7930550Z shl.b16 %rs1413, %rs1393, 4; 2026-02-21T10:23:47.7930620Z shl.b16 %rs1414, %rs1397, 4; 2026-02-21T10:23:47.7930682Z shl.b16 %rs1415, %rs1401, 4; 2026-02-21T10:23:47.7930742Z shl.b16 %rs1416, %rs1405, 4; 2026-02-21T10:23:47.7930808Z shl.b16 %rs1417, %rs1378, 4; 2026-02-21T10:23:47.7930869Z shl.b16 %rs1418, %rs1382, 4; 2026-02-21T10:23:47.7930931Z shl.b16 %rs1419, %rs1386, 4; 2026-02-21T10:23:47.7930993Z shl.b16 %rs1420, %rs1390, 4; 2026-02-21T10:23:47.7931057Z shl.b16 %rs1421, %rs1394, 4; 2026-02-21T10:23:47.7931118Z shl.b16 %rs1422, %rs1398, 4; 2026-02-21T10:23:47.7931179Z shl.b16 %rs1423, %rs1402, 4; 2026-02-21T10:23:47.7931331Z shl.b16 %rs1424, %rs1406, 4; 2026-02-21T10:23:47.7931393Z shl.b16 %rs1425, %rs1379, 4; 2026-02-21T10:23:47.7931452Z shl.b16 %rs1426, %rs1383, 4; 2026-02-21T10:23:47.7931516Z shl.b16 %rs1427, %rs1387, 4; 2026-02-21T10:23:47.7931627Z shl.b16 %rs1428, %rs1391, 4; 2026-02-21T10:23:47.7931687Z shl.b16 %rs1429, %rs1395, 4; 2026-02-21T10:23:47.7931755Z shl.b16 %rs1430, %rs1399, 4; 2026-02-21T10:23:47.7931821Z 
shl.b16 %rs1431, %rs1403, 4; 2026-02-21T10:23:47.7931882Z shl.b16 %rs1432, %rs1407, 4; 2026-02-21T10:23:47.7931942Z shl.b16 %rs1433, %rs1380, 4; 2026-02-21T10:23:47.7932006Z shl.b16 %rs1434, %rs1384, 4; 2026-02-21T10:23:47.7932066Z shl.b16 %rs1435, %rs1388, 4; 2026-02-21T10:23:47.7932124Z shl.b16 %rs1436, %rs1392, 4; 2026-02-21T10:23:47.7932184Z shl.b16 %rs1437, %rs1396, 4; 2026-02-21T10:23:47.7932248Z shl.b16 %rs1438, %rs1400, 4; 2026-02-21T10:23:47.7932308Z shl.b16 %rs1439, %rs1404, 4; 2026-02-21T10:23:47.7932368Z shl.b16 %rs1440, %rs1408, 4; 2026-02-21T10:23:47.7932627Z .loc 1 78 58 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:78:58 2026-02-21T10:23:47.7932712Z selp.b16 %rs1441, %rs1409, %rs1377, %p188; 2026-02-21T10:23:47.7932776Z cvt.s16.s8 %rs1442, %rs1441; 2026-02-21T10:23:47.7932848Z shr.s16 %rs1443, %rs1442, 4; 2026-02-21T10:23:47.7932924Z selp.b16 %rs1444, %rs1410, %rs1381, %p188; 2026-02-21T10:23:47.7933035Z cvt.s16.s8 %rs1445, %rs1444; 2026-02-21T10:23:47.7933098Z shr.s16 %rs1446, %rs1445, 4; 2026-02-21T10:23:47.7933178Z selp.b16 %rs1447, %rs1411, %rs1385, %p188; 2026-02-21T10:23:47.7933241Z cvt.s16.s8 %rs1448, %rs1447; 2026-02-21T10:23:47.7933301Z shr.s16 %rs1449, %rs1448, 4; 2026-02-21T10:23:47.7933379Z selp.b16 %rs1450, %rs1412, %rs1389, %p188; 2026-02-21T10:23:47.7933441Z cvt.s16.s8 %rs1451, %rs1450; 2026-02-21T10:23:47.7933501Z shr.s16 %rs1452, %rs1451, 4; 2026-02-21T10:23:47.7933576Z selp.b16 %rs1453, %rs1413, %rs1393, %p188; 2026-02-21T10:23:47.7933645Z cvt.s16.s8 %rs1454, %rs1453; 2026-02-21T10:23:47.7933709Z shr.s16 %rs1455, %rs1454, 4; 2026-02-21T10:23:47.7933783Z selp.b16 %rs1456, %rs1414, %rs1397, %p188; 2026-02-21T10:23:47.7933848Z cvt.s16.s8 %rs1457, %rs1456; 2026-02-21T10:23:47.7933909Z shr.s16 %rs1458, %rs1457, 4; 2026-02-21T10:23:47.7933984Z selp.b16 %rs1459, %rs1415, %rs1401, %p188; 2026-02-21T10:23:47.7934044Z cvt.s16.s8 %rs1460, %rs1459; 2026-02-21T10:23:47.7934112Z shr.s16 %rs1461, %rs1460, 4; 
2026-02-21T10:23:47.7934187Z selp.b16 %rs1462, %rs1416, %rs1405, %p188; 2026-02-21T10:23:47.7934246Z cvt.s16.s8 %rs1463, %rs1462; 2026-02-21T10:23:47.7934311Z shr.s16 %rs1464, %rs1463, 4; 2026-02-21T10:23:47.7934386Z selp.b16 %rs1465, %rs1417, %rs1378, %p188; 2026-02-21T10:23:47.7934446Z cvt.s16.s8 %rs1466, %rs1465; 2026-02-21T10:23:47.7934510Z shr.s16 %rs1467, %rs1466, 4; 2026-02-21T10:23:47.7934584Z selp.b16 %rs1468, %rs1418, %rs1382, %p188; 2026-02-21T10:23:47.7934644Z cvt.s16.s8 %rs1469, %rs1468; 2026-02-21T10:23:47.7934719Z shr.s16 %rs1470, %rs1469, 4; 2026-02-21T10:23:47.7934799Z selp.b16 %rs1471, %rs1419, %rs1386, %p188; 2026-02-21T10:23:47.7934863Z cvt.s16.s8 %rs1472, %rs1471; 2026-02-21T10:23:47.7934922Z shr.s16 %rs1473, %rs1472, 4; 2026-02-21T10:23:47.7934998Z selp.b16 %rs1474, %rs1420, %rs1390, %p188; 2026-02-21T10:23:47.7935062Z cvt.s16.s8 %rs1475, %rs1474; 2026-02-21T10:23:47.7935122Z shr.s16 %rs1476, %rs1475, 4; 2026-02-21T10:23:47.7935199Z selp.b16 %rs1477, %rs1421, %rs1394, %p188; 2026-02-21T10:23:47.7935267Z cvt.s16.s8 %rs1478, %rs1477; 2026-02-21T10:23:47.7935329Z shr.s16 %rs1479, %rs1478, 4; 2026-02-21T10:23:47.7935403Z selp.b16 %rs1480, %rs1422, %rs1398, %p188; 2026-02-21T10:23:47.7935469Z cvt.s16.s8 %rs1481, %rs1480; 2026-02-21T10:23:47.7935529Z shr.s16 %rs1482, %rs1481, 4; 2026-02-21T10:23:47.7935603Z selp.b16 %rs1483, %rs1423, %rs1402, %p188; 2026-02-21T10:23:47.7935667Z cvt.s16.s8 %rs1484, %rs1483; 2026-02-21T10:23:47.7935728Z shr.s16 %rs1485, %rs1484, 4; 2026-02-21T10:23:47.7936270Z selp.b16 %rs1486, %rs1424, %rs1406, %p188; 2026-02-21T10:23:47.7936331Z cvt.s16.s8 %rs1487, %rs1486; 2026-02-21T10:23:47.7936396Z shr.s16 %rs1488, %rs1487, 4; 2026-02-21T10:23:47.7936587Z selp.b16 %rs1489, %rs1425, %rs1379, %p188; 2026-02-21T10:23:47.7936737Z cvt.s16.s8 %rs1490, %rs1489; 2026-02-21T10:23:47.7936803Z shr.s16 %rs1491, %rs1490, 4; 2026-02-21T10:23:47.7936881Z selp.b16 %rs1492, %rs1426, %rs1383, %p188; 2026-02-21T10:23:47.7936948Z cvt.s16.s8 
%rs1493, %rs1492; 2026-02-21T10:23:47.7937011Z shr.s16 %rs1494, %rs1493, 4; 2026-02-21T10:23:47.7937092Z selp.b16 %rs1495, %rs1427, %rs1387, %p188; 2026-02-21T10:23:47.7937152Z cvt.s16.s8 %rs1496, %rs1495; 2026-02-21T10:23:47.7937212Z shr.s16 %rs1497, %rs1496, 4; 2026-02-21T10:23:47.7937290Z selp.b16 %rs1498, %rs1428, %rs1391, %p188; 2026-02-21T10:23:47.7937353Z cvt.s16.s8 %rs1499, %rs1498; 2026-02-21T10:23:47.7937414Z shr.s16 %rs1500, %rs1499, 4; 2026-02-21T10:23:47.7937489Z selp.b16 %rs1501, %rs1429, %rs1395, %p188; 2026-02-21T10:23:47.7937620Z cvt.s16.s8 %rs1502, %rs1501; 2026-02-21T10:23:47.7937687Z shr.s16 %rs1503, %rs1502, 4; 2026-02-21T10:23:47.7937761Z selp.b16 %rs1504, %rs1430, %rs1399, %p188; 2026-02-21T10:23:47.7937833Z cvt.s16.s8 %rs1505, %rs1504; 2026-02-21T10:23:47.7937904Z shr.s16 %rs1506, %rs1505, 4; 2026-02-21T10:23:47.7937979Z selp.b16 %rs1507, %rs1431, %rs1403, %p188; 2026-02-21T10:23:47.7938044Z cvt.s16.s8 %rs1508, %rs1507; 2026-02-21T10:23:47.7938173Z shr.s16 %rs1509, %rs1508, 4; 2026-02-21T10:23:47.7938253Z selp.b16 %rs1510, %rs1432, %rs1407, %p188; 2026-02-21T10:23:47.7938324Z cvt.s16.s8 %rs1511, %rs1510; 2026-02-21T10:23:47.7938391Z shr.s16 %rs1512, %rs1511, 4; 2026-02-21T10:23:47.7938465Z selp.b16 %rs1513, %rs1433, %rs1380, %p188; 2026-02-21T10:23:47.7938529Z cvt.s16.s8 %rs1514, %rs1513; 2026-02-21T10:23:47.7938593Z shr.s16 %rs1515, %rs1514, 4; 2026-02-21T10:23:47.7938667Z selp.b16 %rs1516, %rs1434, %rs1384, %p188; 2026-02-21T10:23:47.7938727Z cvt.s16.s8 %rs1517, %rs1516; 2026-02-21T10:23:47.7938791Z shr.s16 %rs1518, %rs1517, 4; 2026-02-21T10:23:47.7938870Z selp.b16 %rs1519, %rs1435, %rs1388, %p188; 2026-02-21T10:23:47.7938930Z cvt.s16.s8 %rs1520, %rs1519; 2026-02-21T10:23:47.7938992Z shr.s16 %rs1521, %rs1520, 4; 2026-02-21T10:23:47.7939074Z selp.b16 %rs1522, %rs1436, %rs1392, %p188; 2026-02-21T10:23:47.7939133Z cvt.s16.s8 %rs1523, %rs1522; 2026-02-21T10:23:47.7939194Z shr.s16 %rs1524, %rs1523, 4; 2026-02-21T10:23:47.7939272Z selp.b16 
%rs1525, %rs1437, %rs1396, %p188; 2026-02-21T10:23:47.7939335Z cvt.s16.s8 %rs1526, %rs1525; 2026-02-21T10:23:47.7939397Z shr.s16 %rs1527, %rs1526, 4; 2026-02-21T10:23:47.7939470Z selp.b16 %rs1528, %rs1438, %rs1400, %p188; 2026-02-21T10:23:47.7939534Z cvt.s16.s8 %rs1529, %rs1528; 2026-02-21T10:23:47.7939595Z shr.s16 %rs1530, %rs1529, 4; 2026-02-21T10:23:47.7939668Z selp.b16 %rs1531, %rs1439, %rs1404, %p188; 2026-02-21T10:23:47.7939737Z cvt.s16.s8 %rs1532, %rs1531; 2026-02-21T10:23:47.7939797Z shr.s16 %rs1533, %rs1532, 4; 2026-02-21T10:23:47.7939876Z selp.b16 %rs1534, %rs1440, %rs1408, %p188; 2026-02-21T10:23:47.7939938Z cvt.s16.s8 %rs1535, %rs1534; 2026-02-21T10:23:47.7940002Z shr.s16 %rs1536, %rs1535, 4; 2026-02-21T10:23:47.7940213Z .loc 1 83 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:83:32 2026-02-21T10:23:47.7940288Z cvt.rn.f32.s16 %r10714, %rs1443; 2026-02-21T10:23:47.7940359Z cvt.rn.f32.s16 %r10715, %rs1446; 2026-02-21T10:23:47.7940422Z cvt.rn.f32.s16 %r10716, %rs1449; 2026-02-21T10:23:47.7940484Z cvt.rn.f32.s16 %r10717, %rs1452; 2026-02-21T10:23:47.7940552Z cvt.rn.f32.s16 %r10718, %rs1455; 2026-02-21T10:23:47.7940614Z cvt.rn.f32.s16 %r10719, %rs1458; 2026-02-21T10:23:47.7940676Z cvt.rn.f32.s16 %r10720, %rs1461; 2026-02-21T10:23:47.7940739Z cvt.rn.f32.s16 %r10721, %rs1464; 2026-02-21T10:23:47.7940805Z cvt.rn.f32.s16 %r10722, %rs1467; 2026-02-21T10:23:47.7940867Z cvt.rn.f32.s16 %r10723, %rs1470; 2026-02-21T10:23:47.7940930Z cvt.rn.f32.s16 %r10724, %rs1473; 2026-02-21T10:23:47.7941086Z cvt.rn.f32.s16 %r10725, %rs1476; 2026-02-21T10:23:47.7941150Z cvt.rn.f32.s16 %r10726, %rs1479; 2026-02-21T10:23:47.7941210Z cvt.rn.f32.s16 %r10727, %rs1482; 2026-02-21T10:23:47.7941273Z cvt.rn.f32.s16 %r10728, %rs1485; 2026-02-21T10:23:47.7941386Z cvt.rn.f32.s16 %r10729, %rs1488; 2026-02-21T10:23:47.7941449Z cvt.rn.f32.s16 %r10730, %rs1491; 2026-02-21T10:23:47.7941516Z cvt.rn.f32.s16 %r10731, %rs1494; 2026-02-21T10:23:47.7941582Z cvt.rn.f32.s16 %r10732, %rs1497; 
2026-02-21T10:23:47.7941643Z cvt.rn.f32.s16 %r10733, %rs1500; 2026-02-21T10:23:47.7941706Z cvt.rn.f32.s16 %r10734, %rs1503; 2026-02-21T10:23:47.7941771Z cvt.rn.f32.s16 %r10735, %rs1506; 2026-02-21T10:23:47.7941833Z cvt.rn.f32.s16 %r10736, %rs1509; 2026-02-21T10:23:47.7941895Z cvt.rn.f32.s16 %r10737, %rs1512; 2026-02-21T10:23:47.7941955Z cvt.rn.f32.s16 %r10738, %rs1515; 2026-02-21T10:23:47.7942021Z cvt.rn.f32.s16 %r10739, %rs1518; 2026-02-21T10:23:47.7942090Z cvt.rn.f32.s16 %r10740, %rs1521; 2026-02-21T10:23:47.7942202Z cvt.rn.f32.s16 %r10741, %rs1524; 2026-02-21T10:23:47.7942270Z cvt.rn.f32.s16 %r10742, %rs1527; 2026-02-21T10:23:47.7942331Z cvt.rn.f32.s16 %r10743, %rs1530; 2026-02-21T10:23:47.7942394Z cvt.rn.f32.s16 %r10744, %rs1533; 2026-02-21T10:23:47.7942459Z cvt.rn.f32.s16 %r10745, %rs1536; 2026-02-21T10:23:47.7942519Z bar.sync 0; 2026-02-21T10:23:47.7942585Z st.shared.b32 [%r36], %r10714; 2026-02-21T10:23:47.7942713Z st.shared.b32 [%r36+8], %r10715; 2026-02-21T10:23:47.7942790Z st.shared.b32 [%r36+16384], %r10730; 2026-02-21T10:23:47.7942870Z st.shared.b32 [%r36+16392], %r10731; 2026-02-21T10:23:47.7942937Z st.shared.b32 [%r37], %r10716; 2026-02-21T10:23:47.7943001Z st.shared.b32 [%r37+8], %r10717; 2026-02-21T10:23:47.7943070Z st.shared.b32 [%r37+16384], %r10732; 2026-02-21T10:23:47.7943136Z st.shared.b32 [%r37+16392], %r10733; 2026-02-21T10:23:47.7943200Z st.shared.b32 [%r38], %r10718; 2026-02-21T10:23:47.7943268Z st.shared.b32 [%r38+8], %r10719; 2026-02-21T10:23:47.7943339Z st.shared.b32 [%r38+16384], %r10734; 2026-02-21T10:23:47.7943406Z st.shared.b32 [%r38+16392], %r10735; 2026-02-21T10:23:47.7943478Z st.shared.b32 [%r39], %r10720; 2026-02-21T10:23:47.7943542Z st.shared.b32 [%r39+8], %r10721; 2026-02-21T10:23:47.7943612Z st.shared.b32 [%r39+16384], %r10736; 2026-02-21T10:23:47.7943678Z st.shared.b32 [%r39+16392], %r10737; 2026-02-21T10:23:47.7943751Z st.shared.b32 [%r40], %r10722; 2026-02-21T10:23:47.7943817Z st.shared.b32 [%r40+8], %r10723; 
2026-02-21T10:23:47.7943884Z st.shared.b32 [%r40+16384], %r10738; 2026-02-21T10:23:47.7943957Z st.shared.b32 [%r40+16392], %r10739; 2026-02-21T10:23:47.7944024Z st.shared.b32 [%r41], %r10724; 2026-02-21T10:23:47.7944091Z st.shared.b32 [%r41+8], %r10725; 2026-02-21T10:23:47.7944160Z st.shared.b32 [%r41+16384], %r10740; 2026-02-21T10:23:47.7944232Z st.shared.b32 [%r41+16392], %r10741; 2026-02-21T10:23:47.7944298Z st.shared.b32 [%r42], %r10726; 2026-02-21T10:23:47.7944364Z st.shared.b32 [%r42+8], %r10727; 2026-02-21T10:23:47.7944440Z st.shared.b32 [%r42+16384], %r10742; 2026-02-21T10:23:47.7944506Z st.shared.b32 [%r42+16392], %r10743; 2026-02-21T10:23:47.7944572Z st.shared.b32 [%r43], %r10728; 2026-02-21T10:23:47.7944640Z st.shared.b32 [%r43+8], %r10729; 2026-02-21T10:23:47.7944710Z st.shared.b32 [%r43+16384], %r10744; 2026-02-21T10:23:47.7944788Z st.shared.b32 [%r43+16392], %r10745; 2026-02-21T10:23:47.7944852Z $L__tmp15: 2026-02-21T10:23:47.7945144Z .loc 2 291 36 // standard.py:291:36 @[ cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:90:40 ] 2026-02-21T10:23:47.7945209Z // begin inline asm 2026-02-21T10:23:47.7945293Z fence.proxy.async.shared::cta; 2026-02-21T10:23:47.7945356Z // end inline asm 2026-02-21T10:23:47.7945412Z bar.sync 0; 2026-02-21T10:23:47.7945488Z wgmma.fence.sync.aligned; 2026-02-21T10:23:47.7945552Z // begin inline asm 2026-02-21T10:23:47.7947150Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r9551,%r9552,%r9553,%r9554}, 
%rd2, %p79, 1, 1; 2026-02-21T10:23:47.7947362Z // end inline asm 2026-02-21T10:23:47.7947428Z // begin inline asm 2026-02-21T10:23:47.7949015Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r9683,%r9684,%r9685,%r9686}, %rd3, %p79, 1, 1; 2026-02-21T10:23:47.7949095Z // end inline asm 2026-02-21T10:23:47.7949163Z // begin inline asm 2026-02-21T10:23:47.7950680Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r9815,%r9816,%r9817,%r9818}, %rd4, %p79, 1, 1; 2026-02-21T10:23:47.7950748Z // end inline asm 2026-02-21T10:23:47.7950811Z // begin inline asm 2026-02-21T10:23:47.7952273Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r9947,%r9948,%r9949,%r9950}, %rd5, %p79, 1, 1; 2026-02-21T10:23:47.7952341Z // end inline asm 2026-02-21T10:23:47.7952404Z // begin inline asm 2026-02-21T10:23:47.7953868Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r10079,%r10080,%r10081,%r10082}, %rd6, %p79, 1, 1; 2026-02-21T10:23:47.7953936Z // end inline asm 2026-02-21T10:23:47.7953994Z // begin inline asm 2026-02-21T10:23:47.7955456Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r10211,%r10212,%r10213,%r10214}, %rd7, %p79, 1, 1; 2026-02-21T10:23:47.7955563Z // end inline asm 2026-02-21T10:23:47.7955671Z // begin inline asm 2026-02-21T10:23:47.7957264Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r10343,%r10344,%r10345,%r10346}, %rd8, %p79, 1, 1; 2026-02-21T10:23:47.7957329Z // end inline asm 2026-02-21T10:23:47.7957389Z // begin inline asm 2026-02-21T10:23:47.7958975Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218}, {%r10475,%r10476,%r10477,%r10478}, %rd9, %p79, 1, 1; 2026-02-21T10:23:47.7959045Z // end inline asm 2026-02-21T10:23:47.7959130Z wgmma.commit_group.sync.aligned; 2026-02-21T10:23:47.7959195Z mov.b32 %r10544, %r10545; 2026-02-21T10:23:47.7959256Z mov.b32 %r10543, %r10980; 2026-02-21T10:23:47.7959321Z // begin inline asm 2026-02-21T10:23:47.7960590Z // wait for regs: %r16155,%r16156,%r16157,%r16158,%r16159,%r16160,%r16161,%r16162,%r16163,%r16164,%r16165,%r16166,%r16167,%r16168,%r16169,%r16170,%r16171,%r16172,%r16173,%r16174,%r16175,%r16176,%r16177,%r16178,%r16179,%r16180,%r16181,%r16182,%r16183,%r16184,%r16185,%r16186,%r16187,%r16188,%r16189,%r16190,%r16191,%r16192,%r16193,%r16194,%r16195,%r16196,%r16197,%r16198,%r16199,%r16200,%r16201,%r16202,%r16203,%r16204,%r16205,%r16206,%r16207,%r16208,%r16209,%r16210,%r16211,%r16212,%r16213,%r16214,%r16215,%r16216,%r16217,%r16218,%r10543,%r10544,%r10545 2026-02-21T10:23:47.7960673Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:23:47.7960736Z // end inline asm 2026-02-21T10:23:47.7960792Z $L__tmp16: 2026-02-21T10:23:47.7961016Z .loc 1 47 126 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:47:126 2026-02-21T10:23:47.7961088Z add.s64 %rd34, %rd364, 128; 2026-02-21T10:23:47.7961155Z add.s64 %rd363, %rd363, 512; 2026-02-21T10:23:47.7961224Z setp.lt.u64 %p129, %rd364, 3968; 2026-02-21T10:23:47.7961288Z mov.b64 %rd364, %rd34; 2026-02-21T10:23:47.7961359Z @%p129 bra $L__BB0_5; 2026-02-21T10:23:47.7961475Z // %bb.6: // 
in Loop: Header=BB0_2 Depth=1 2026-02-21T10:23:47.7961687Z .loc 1 38 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:38:32 2026-02-21T10:23:47.7961759Z or.b32 %r10818, %r5772, %r15; 2026-02-21T10:23:47.7961964Z .loc 1 40 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:40:32 2026-02-21T10:23:47.7962029Z or.b32 %r10819, %r235, %r7; 2026-02-21T10:23:47.7962097Z or.b32 %r10820, %r235, %r8; 2026-02-21T10:23:47.7962159Z or.b32 %r10821, %r235, %r9; 2026-02-21T10:23:47.7962221Z or.b32 %r10822, %r235, %r10; 2026-02-21T10:23:47.7962282Z or.b32 %r10823, %r235, %r11; 2026-02-21T10:23:47.7962355Z or.b32 %r10824, %r235, %r12; 2026-02-21T10:23:47.7962421Z or.b32 %r10825, %r235, %r13; 2026-02-21T10:23:47.7962485Z or.b32 %r10826, %r235, %r14; 2026-02-21T10:23:47.7962694Z .loc 1 93 28 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:93:28 2026-02-21T10:23:47.7962856Z cvt.rn.bf16x2.f32 %r10827, %r16156, %r16155; 2026-02-21T10:23:47.7962941Z cvt.rn.bf16x2.f32 %r10828, %r16158, %r16157; 2026-02-21T10:23:47.7963018Z cvt.rn.bf16x2.f32 %r10829, %r16160, %r16159; 2026-02-21T10:23:47.7963161Z cvt.rn.bf16x2.f32 %r10830, %r16162, %r16161; 2026-02-21T10:23:47.7963238Z cvt.rn.bf16x2.f32 %r10831, %r16164, %r16163; 2026-02-21T10:23:47.7963317Z cvt.rn.bf16x2.f32 %r10832, %r16166, %r16165; 2026-02-21T10:23:47.7963400Z cvt.rn.bf16x2.f32 %r10833, %r16168, %r16167; 2026-02-21T10:23:47.7963476Z cvt.rn.bf16x2.f32 %r10834, %r16170, %r16169; 2026-02-21T10:23:47.7963564Z cvt.rn.bf16x2.f32 %r10835, %r16172, %r16171; 2026-02-21T10:23:47.7963648Z cvt.rn.bf16x2.f32 %r10836, %r16174, %r16173; 2026-02-21T10:23:47.7963727Z cvt.rn.bf16x2.f32 %r10837, %r16176, %r16175; 2026-02-21T10:23:47.7963803Z cvt.rn.bf16x2.f32 %r10838, %r16178, %r16177; 2026-02-21T10:23:47.7963880Z cvt.rn.bf16x2.f32 %r10839, %r16180, %r16179; 2026-02-21T10:23:47.7964016Z cvt.rn.bf16x2.f32 %r10840, %r16182, %r16181; 2026-02-21T10:23:47.7964096Z cvt.rn.bf16x2.f32 %r10841, %r16184, %r16183; 
2026-02-21T10:23:47.7964172Z cvt.rn.bf16x2.f32 %r10842, %r16186, %r16185; 2026-02-21T10:23:47.7964256Z cvt.rn.bf16x2.f32 %r10843, %r16188, %r16187; 2026-02-21T10:23:47.7964333Z cvt.rn.bf16x2.f32 %r10844, %r16190, %r16189; 2026-02-21T10:23:47.7964453Z cvt.rn.bf16x2.f32 %r10845, %r16192, %r16191; 2026-02-21T10:23:47.7964538Z cvt.rn.bf16x2.f32 %r10846, %r16194, %r16193; 2026-02-21T10:23:47.7964614Z cvt.rn.bf16x2.f32 %r10847, %r16196, %r16195; 2026-02-21T10:23:47.7964691Z cvt.rn.bf16x2.f32 %r10848, %r16198, %r16197; 2026-02-21T10:23:47.7964768Z cvt.rn.bf16x2.f32 %r10849, %r16200, %r16199; 2026-02-21T10:23:47.7964849Z cvt.rn.bf16x2.f32 %r10850, %r16202, %r16201; 2026-02-21T10:23:47.7964926Z cvt.rn.bf16x2.f32 %r10851, %r16204, %r16203; 2026-02-21T10:23:47.7965003Z cvt.rn.bf16x2.f32 %r10852, %r16206, %r16205; 2026-02-21T10:23:47.7965100Z cvt.rn.bf16x2.f32 %r10853, %r16208, %r16207; 2026-02-21T10:23:47.7965180Z cvt.rn.bf16x2.f32 %r10854, %r16210, %r16209; 2026-02-21T10:23:47.7965261Z cvt.rn.bf16x2.f32 %r10855, %r16212, %r16211; 2026-02-21T10:23:47.7965343Z cvt.rn.bf16x2.f32 %r10856, %r16214, %r16213; 2026-02-21T10:23:47.7965424Z cvt.rn.bf16x2.f32 %r10857, %r16216, %r16215; 2026-02-21T10:23:47.7965500Z cvt.rn.bf16x2.f32 %r10858, %r16218, %r16217; 2026-02-21T10:23:47.7965714Z .loc 1 94 50 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:94:50 2026-02-21T10:23:47.7965797Z mad.lo.s32 %r10859, %r10819, 1280, %r10818; 2026-02-21T10:23:47.7965874Z mad.lo.s32 %r10860, %r10820, 1280, %r10818; 2026-02-21T10:23:47.7965949Z mad.lo.s32 %r10861, %r10821, 1280, %r10818; 2026-02-21T10:23:47.7966027Z mad.lo.s32 %r10862, %r10822, 1280, %r10818; 2026-02-21T10:23:47.7966100Z mad.lo.s32 %r10863, %r10823, 1280, %r10818; 2026-02-21T10:23:47.7966175Z mad.lo.s32 %r10864, %r10824, 1280, %r10818; 2026-02-21T10:23:47.7966260Z mad.lo.s32 %r10865, %r10825, 1280, %r10818; 2026-02-21T10:23:47.7966334Z mad.lo.s32 %r10866, %r10826, 1280, %r10818; 2026-02-21T10:23:47.7966663Z .loc 1 94 22 // 
cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:94:22 2026-02-21T10:23:47.7966744Z mad.wide.s32 %rd252, %r10859, 2, %rd45; 2026-02-21T10:23:47.7966825Z mad.wide.s32 %rd253, %r10860, 2, %rd45; 2026-02-21T10:23:47.7966898Z mad.wide.s32 %rd254, %r10861, 2, %rd45; 2026-02-21T10:23:47.7966969Z mad.wide.s32 %rd255, %r10862, 2, %rd45; 2026-02-21T10:23:47.7967043Z mad.wide.s32 %rd256, %r10863, 2, %rd45; 2026-02-21T10:23:47.7967111Z mad.wide.s32 %rd257, %r10864, 2, %rd45; 2026-02-21T10:23:47.7967179Z mad.wide.s32 %rd258, %r10865, 2, %rd45; 2026-02-21T10:23:47.7967252Z mad.wide.s32 %rd259, %r10866, 2, %rd45; 2026-02-21T10:23:47.7967455Z .loc 1 94 81 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:94:81 2026-02-21T10:23:47.7967513Z bar.sync 0; 2026-02-21T10:23:47.7967726Z st.shared.v4.b32 [%r44], {%r10827, %r10829, %r10831, %r10833}; 2026-02-21T10:23:47.7967859Z st.shared.v4.b32 [%r44+512], {%r10828, %r10830, %r10832, %r10834}; 2026-02-21T10:23:47.7967983Z st.shared.v4.b32 [%r45], {%r10835, %r10837, %r10839, %r10841}; 2026-02-21T10:23:47.7968166Z st.shared.v4.b32 [%r45+512], {%r10836, %r10838, %r10840, %r10842}; 2026-02-21T10:23:47.7968288Z st.shared.v4.b32 [%r46], {%r10843, %r10845, %r10847, %r10849}; 2026-02-21T10:23:47.7968402Z st.shared.v4.b32 [%r46+512], {%r10844, %r10846, %r10848, %r10850}; 2026-02-21T10:23:47.7968511Z st.shared.v4.b32 [%r47], {%r10851, %r10853, %r10855, %r10857}; 2026-02-21T10:23:47.7968633Z st.shared.v4.b32 [%r47+512], {%r10852, %r10854, %r10856, %r10858}; 2026-02-21T10:23:47.7968691Z bar.sync 0; 2026-02-21T10:23:47.7968757Z // begin inline asm 2026-02-21T10:23:47.7968964Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10746, %r10747, %r10748, %r10749}, [%r5614]; 2026-02-21T10:23:47.7969038Z // end inline asm 2026-02-21T10:23:47.7969180Z // begin inline asm 2026-02-21T10:23:47.7969379Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10751, %r10752, %r10753, %r10754}, [%r5619]; 2026-02-21T10:23:47.7969441Z // end inline asm 
2026-02-21T10:23:47.7969505Z // begin inline asm 2026-02-21T10:23:47.7969695Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10756, %r10757, %r10758, %r10759}, [%r5624]; 2026-02-21T10:23:47.7969757Z // end inline asm 2026-02-21T10:23:47.7969877Z // begin inline asm 2026-02-21T10:23:47.7970068Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10761, %r10762, %r10763, %r10764}, [%r5629]; 2026-02-21T10:23:47.7970126Z // end inline asm 2026-02-21T10:23:47.7970190Z // begin inline asm 2026-02-21T10:23:47.7970381Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10766, %r10767, %r10768, %r10769}, [%r5634]; 2026-02-21T10:23:47.7970440Z // end inline asm 2026-02-21T10:23:47.7970503Z // begin inline asm 2026-02-21T10:23:47.7970692Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10771, %r10772, %r10773, %r10774}, [%r5639]; 2026-02-21T10:23:47.7970755Z // end inline asm 2026-02-21T10:23:47.7970819Z // begin inline asm 2026-02-21T10:23:47.7971017Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10776, %r10777, %r10778, %r10779}, [%r5644]; 2026-02-21T10:23:47.7971082Z // end inline asm 2026-02-21T10:23:47.7971142Z // begin inline asm 2026-02-21T10:23:47.7971336Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10781, %r10782, %r10783, %r10784}, [%r5649]; 2026-02-21T10:23:47.7971396Z // end inline asm 2026-02-21T10:23:47.7971456Z // begin inline asm 2026-02-21T10:23:47.7971596Z st.global.v4.b32 [ %rd252 + 0 ], { %r10746, %r10747, %r10748, %r10749 }; 2026-02-21T10:23:47.7971654Z // end inline asm 2026-02-21T10:23:47.7971714Z // begin inline asm 2026-02-21T10:23:47.7971839Z st.global.v4.b32 [ %rd253 + 0 ], { %r10751, %r10752, %r10753, %r10754 }; 2026-02-21T10:23:47.7971904Z // end inline asm 2026-02-21T10:23:47.7971970Z // begin inline asm 2026-02-21T10:23:47.7972090Z st.global.v4.b32 [ %rd254 + 0 ], { %r10756, %r10757, %r10758, %r10759 }; 2026-02-21T10:23:47.7972159Z // end inline asm 2026-02-21T10:23:47.7972218Z // begin inline asm 2026-02-21T10:23:47.7972338Z st.global.v4.b32 [ %rd255 + 0 ], { %r10761, 
%r10762, %r10763, %r10764 }; 2026-02-21T10:23:47.7972402Z // end inline asm 2026-02-21T10:23:47.7972462Z // begin inline asm 2026-02-21T10:23:47.7972581Z st.global.v4.b32 [ %rd256 + 0 ], { %r10766, %r10767, %r10768, %r10769 }; 2026-02-21T10:23:47.7972641Z // end inline asm 2026-02-21T10:23:47.7972704Z // begin inline asm 2026-02-21T10:23:47.7972823Z st.global.v4.b32 [ %rd257 + 0 ], { %r10771, %r10772, %r10773, %r10774 }; 2026-02-21T10:23:47.7972882Z // end inline asm 2026-02-21T10:23:47.7972945Z // begin inline asm 2026-02-21T10:23:47.7973064Z st.global.v4.b32 [ %rd258 + 0 ], { %r10776, %r10777, %r10778, %r10779 }; 2026-02-21T10:23:47.7973124Z // end inline asm 2026-02-21T10:23:47.7973183Z // begin inline asm 2026-02-21T10:23:47.7973308Z st.global.v4.b32 [ %rd259 + 0 ], { %r10781, %r10782, %r10783, %r10784 }; 2026-02-21T10:23:47.7973438Z // end inline asm 2026-02-21T10:23:47.7973661Z .loc 1 26 140 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:26:140 2026-02-21T10:23:47.7973737Z add.s32 %r16090, %r16090, 264; 2026-02-21T10:23:47.7973859Z setp.lt.s32 %p130, %r16090, %r16219; 2026-02-21T10:23:47.7973923Z @%p130 bra $L__BB0_2; 2026-02-21T10:23:47.7974019Z $L__BB0_7: // %.preheader 2026-02-21T10:23:47.7974091Z setp.gt.s32 %p131, %r16219, 5119; 2026-02-21T10:23:47.7974155Z @%p131 bra $L__BB0_12; 2026-02-21T10:23:47.7974241Z // %bb.8: // %.lr.ph491 2026-02-21T10:23:47.7974459Z .loc 1 0 140 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:0:140 2026-02-21T10:23:47.7974528Z and.b32 %r10868, %r16076, 4080; 2026-02-21T10:23:47.7974595Z xor.b32 %r10870, %r10868, %r16077; 2026-02-21T10:23:47.7974665Z add.s32 %r60, %r10980, %r10870; 2026-02-21T10:23:47.7974736Z xor.b32 %r10872, %r10870, 8; 2026-02-21T10:23:47.7974865Z add.s32 %r61, %r10980, %r10872; 2026-02-21T10:23:47.7974939Z and.b32 %r10874, %r16078, 14336; 2026-02-21T10:23:47.7975005Z and.b32 %r10876, %r16079, 896; 2026-02-21T10:23:47.7975069Z and.b32 %r10878, %r16080, 62; 2026-02-21T10:23:47.7975138Z 
or.b32 %r10879, %r10874, %r10876; 2026-02-21T10:23:47.7975204Z or.b32 %r10880, %r10879, %r10878; 2026-02-21T10:23:47.7975314Z add.s32 %r62, %r10980, %r10880; 2026-02-21T10:23:47.7975379Z xor.b32 %r10881, %r10880, 8; 2026-02-21T10:23:47.7975447Z add.s32 %r63, %r10980, %r10881; 2026-02-21T10:23:47.7975509Z xor.b32 %r10882, %r10880, 16; 2026-02-21T10:23:47.7975572Z add.s32 %r64, %r10980, %r10882; 2026-02-21T10:23:47.7975633Z xor.b32 %r10883, %r10880, 24; 2026-02-21T10:23:47.7975706Z add.s32 %r65, %r10980, %r10883; 2026-02-21T10:23:47.7975767Z xor.b32 %r10884, %r10880, 32; 2026-02-21T10:23:47.7975831Z add.s32 %r66, %r10980, %r10884; 2026-02-21T10:23:47.7975897Z xor.b32 %r10885, %r10880, 40; 2026-02-21T10:23:47.7975965Z add.s32 %r67, %r10980, %r10885; 2026-02-21T10:23:47.7976026Z xor.b32 %r10886, %r10880, 48; 2026-02-21T10:23:47.7976089Z add.s32 %r68, %r10980, %r10886; 2026-02-21T10:23:47.7976156Z xor.b32 %r10887, %r10880, 56; 2026-02-21T10:23:47.7976220Z add.s32 %r69, %r10980, %r10887; 2026-02-21T10:23:47.7976283Z add.s32 %r70, %r10980, %r16081; 2026-02-21T10:23:47.7976350Z xor.b32 %r10889, %r16081, 16; 2026-02-21T10:23:47.7976414Z add.s32 %r71, %r10980, %r10889; 2026-02-21T10:23:47.7976587Z xor.b32 %r10890, %r16081, 32; 2026-02-21T10:23:47.7976657Z add.s32 %r72, %r10980, %r10890; 2026-02-21T10:23:47.7976719Z xor.b32 %r10891, %r16081, 48; 2026-02-21T10:23:47.7976782Z add.s32 %r73, %r10980, %r10891; 2026-02-21T10:23:47.7976846Z xor.b32 %r10892, %r16081, 64; 2026-02-21T10:23:47.7976913Z add.s32 %r74, %r10980, %r10892; 2026-02-21T10:23:47.7976976Z xor.b32 %r10893, %r16081, 80; 2026-02-21T10:23:47.7977049Z add.s32 %r75, %r10980, %r10893; 2026-02-21T10:23:47.7977118Z xor.b32 %r10894, %r16081, 96; 2026-02-21T10:23:47.7977183Z add.s32 %r76, %r10980, %r10894; 2026-02-21T10:23:47.7977245Z xor.b32 %r10895, %r16081, 112; 2026-02-21T10:23:47.7977307Z add.s32 %r77, %r10980, %r10895; 2026-02-21T10:23:47.7977372Z shl.b32 %r10896, %r16081, 7; 2026-02-21T10:23:47.7977438Z or.b32 
%r10899, %r16082, %r16083; 2026-02-21T10:23:47.7977500Z or.b32 %r10900, %r10899, %r10896; 2026-02-21T10:23:47.7977569Z add.s32 %r78, %r10980, %r10900; 2026-02-21T10:23:47.7977633Z xor.b32 %r10901, %r10900, 16; 2026-02-21T10:23:47.7977698Z add.s32 %r79, %r10980, %r10901; 2026-02-21T10:23:47.7977765Z xor.b32 %r10902, %r10900, 32; 2026-02-21T10:23:47.7977826Z add.s32 %r80, %r10980, %r10902; 2026-02-21T10:23:47.7977888Z xor.b32 %r10903, %r10900, 48; 2026-02-21T10:23:47.7977949Z add.s32 %r81, %r10980, %r10903; 2026-02-21T10:23:47.7978015Z xor.b32 %r10904, %r10900, 64; 2026-02-21T10:23:47.7978080Z add.s32 %r82, %r10980, %r10904; 2026-02-21T10:23:47.7978142Z xor.b32 %r10905, %r10900, 80; 2026-02-21T10:23:47.7978303Z add.s32 %r83, %r10980, %r10905; 2026-02-21T10:23:47.7978379Z xor.b32 %r10906, %r10900, 96; 2026-02-21T10:23:47.7978449Z add.s32 %r84, %r10980, %r10906; 2026-02-21T10:23:47.7978516Z xor.b32 %r10907, %r10900, 112; 2026-02-21T10:23:47.7978647Z add.s32 %r85, %r10980, %r10907; 2026-02-21T10:23:47.7978712Z bfe.u32 %r10908, %r10980, 4, 14; 2026-02-21T10:23:47.7978780Z cvt.u64.u32 %rd260, %r10908; 2026-02-21T10:23:47.7978868Z or.b64 %rd11, %rd260, 4611686293372403712; 2026-02-21T10:23:47.7978935Z add.s32 %r10909, %r10980, 32; 2026-02-21T10:23:47.7978998Z bfe.u32 %r10910, %r10909, 4, 14; 2026-02-21T10:23:47.7979063Z cvt.u64.u32 %rd261, %r10910; 2026-02-21T10:23:47.7979146Z or.b64 %rd12, %rd261, 4611686293372403712; 2026-02-21T10:23:47.7979208Z add.s32 %r10911, %r10980, 64; 2026-02-21T10:23:47.7979283Z bfe.u32 %r10912, %r10911, 4, 14; 2026-02-21T10:23:47.7979357Z cvt.u64.u32 %rd262, %r10912; 2026-02-21T10:23:47.7979432Z or.b64 %rd13, %rd262, 4611686293372403712; 2026-02-21T10:23:47.7979562Z add.s32 %r10913, %r10980, 96; 2026-02-21T10:23:47.7979633Z bfe.u32 %r10914, %r10913, 4, 14; 2026-02-21T10:23:47.7979697Z cvt.u64.u32 %rd263, %r10914; 2026-02-21T10:23:47.7979771Z or.b64 %rd14, %rd263, 4611686293372403712; 2026-02-21T10:23:47.7979839Z add.s32 %r10915, %r10980, 16384; 
2026-02-21T10:23:47.7979906Z bfe.u32 %r10916, %r10915, 4, 14; 2026-02-21T10:23:47.7979970Z cvt.u64.u32 %rd264, %r10916; 2026-02-21T10:23:47.7980101Z or.b64 %rd15, %rd264, 4611686293372403712; 2026-02-21T10:23:47.7980173Z add.s32 %r10917, %r10980, 16416; 2026-02-21T10:23:47.7980237Z bfe.u32 %r10918, %r10917, 4, 14; 2026-02-21T10:23:47.7980300Z cvt.u64.u32 %rd265, %r10918; 2026-02-21T10:23:47.7980374Z or.b64 %rd16, %rd265, 4611686293372403712; 2026-02-21T10:23:47.7980441Z add.s32 %r10919, %r10980, 16448; 2026-02-21T10:23:47.7980503Z bfe.u32 %r10920, %r10919, 4, 14; 2026-02-21T10:23:47.7980565Z cvt.u64.u32 %rd266, %r10920; 2026-02-21T10:23:47.7980644Z or.b64 %rd17, %rd266, 4611686293372403712; 2026-02-21T10:23:47.7980723Z add.s32 %r10921, %r10980, 16480; 2026-02-21T10:23:47.7980787Z bfe.u32 %r10922, %r10921, 4, 14; 2026-02-21T10:23:47.7980851Z cvt.u64.u32 %rd267, %r10922; 2026-02-21T10:23:47.7980927Z or.b64 %rd18, %rd267, 4611686293372403712; 2026-02-21T10:23:47.7980995Z shl.b32 %r10924, %r16084, 13; 2026-02-21T10:23:47.7981060Z and.b32 %r10925, %r16079, 7264; 2026-02-21T10:23:47.7981133Z shl.b32 %r10927, %r16085, 4; 2026-02-21T10:23:47.7981197Z and.b32 %r10929, %r512, 16; 2026-02-21T10:23:47.7981263Z or.b32 %r10930, %r10924, %r10929; 2026-02-21T10:23:47.7981331Z or.b32 %r10931, %r10925, %r10927; 2026-02-21T10:23:47.7981393Z or.b32 %r10932, %r10930, %r10931; 2026-02-21T10:23:47.7981468Z add.s32 %r86, %r10980, %r10932; 2026-02-21T10:23:47.7981531Z xor.b32 %r10933, %r10932, 32; 2026-02-21T10:23:47.7981600Z add.s32 %r87, %r10980, %r10933; 2026-02-21T10:23:47.7981662Z xor.b32 %r10934, %r10932, 64; 2026-02-21T10:23:47.7981725Z add.s32 %r88, %r10980, %r10934; 2026-02-21T10:23:47.7981797Z xor.b32 %r10935, %r10932, 96; 2026-02-21T10:23:47.7981859Z add.s32 %r89, %r10980, %r10935; 2026-02-21T10:23:47.7981921Z shl.b32 %r10936, %r16085, 10; 2026-02-21T10:23:47.7981984Z shl.b32 %r10937, %r16084, 5; 2026-02-21T10:23:47.7982053Z and.b32 %r10938, %r512, 1008; 
2026-02-21T10:23:47.7982123Z or.b32 %r10939, %r10936, %r10937; 2026-02-21T10:23:47.7982190Z xor.b32 %r10940, %r10939, %r10938; 2026-02-21T10:23:47.7982261Z add.s32 %r15959, %r10980, %r10940; 2026-02-21T10:23:47.7982328Z add.s32 %r15964, %r15959, 1024; 2026-02-21T10:23:47.7982392Z add.s32 %r15969, %r15959, 2048; 2026-02-21T10:23:47.7982459Z add.s32 %r15974, %r15959, 3072; 2026-02-21T10:23:47.7982523Z add.s32 %r15979, %r15959, 4096; 2026-02-21T10:23:47.7982587Z add.s32 %r15984, %r15959, 5120; 2026-02-21T10:23:47.7982649Z add.s32 %r15989, %r15959, 6144; 2026-02-21T10:23:47.7982715Z add.s32 %r15994, %r15959, 7168; 2026-02-21T10:23:47.7982942Z .loc 1 26 140 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:26:140 2026-02-21T10:23:47.7983084Z mad.wide.u32 %rd19, %r16, 16, %rd44; 2026-02-21T10:23:47.7983205Z $L__BB0_9: // =>This Loop Header: Depth=1 2026-02-21T10:23:47.7983349Z // Child Loop BB0_10 Depth 2 2026-02-21T10:23:47.7983563Z .loc 1 32 35 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:32:35 2026-02-21T10:23:47.7983633Z shr.s32 %r10943, %r16219, 31; 2026-02-21T10:23:47.7983694Z shr.u32 %r10944, %r10943, 17; 2026-02-21T10:23:47.7983759Z add.s32 %r10945, %r16219, %r10944; 2026-02-21T10:23:47.7983822Z shr.s32 %r10946, %r10945, 15; 2026-02-21T10:23:47.7984031Z .loc 1 33 33 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:33:33 2026-02-21T10:23:47.7984095Z shl.b32 %r10947, %r10946, 6; 2026-02-21T10:23:47.7984298Z .loc 1 34 39 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:34:39 2026-02-21T10:23:47.7984414Z sub.s32 %r10948, 10, %r10947; 2026-02-21T10:23:47.7984617Z .loc 1 34 52 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:34:52 2026-02-21T10:23:47.7984678Z min.u32 %r10949, %r10948, 64; 2026-02-21T10:23:47.7984885Z .loc 1 35 45 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:35:45 2026-02-21T10:23:47.7984996Z and.b32 %r10950, %r10945, 32768; 2026-02-21T10:23:47.7985062Z sub.s32 %r10951, %r16219, %r10950; 
2026-02-21T10:23:47.7985263Z .loc 1 35 64 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:35:64 2026-02-21T10:23:47.7985332Z cvt.u16.u32 %rs1537, %r10951; 2026-02-21T10:23:47.7985397Z cvt.u16.u32 %rs1538, %r10949; 2026-02-21T10:23:47.7985595Z .loc 1 36 51 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:36:51 2026-02-21T10:23:47.7985672Z div.s16 %rs1539, %rs1537, %rs1538; 2026-02-21T10:23:47.7985883Z .loc 1 35 64 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:35:64 2026-02-21T10:23:47.7985956Z mul.lo.s16 %rs1540, %rs1539, %rs1538; 2026-02-21T10:23:47.7986024Z sub.s16 %rs1541, %rs1537, %rs1540; 2026-02-21T10:23:47.7986091Z cvt.s32.s16 %r10952, %rs1541; 2026-02-21T10:23:47.7986290Z .loc 1 35 30 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:35:30 2026-02-21T10:23:47.7986363Z add.s32 %r10953, %r10947, %r10952; 2026-02-21T10:23:47.7986697Z .loc 1 36 51 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:36:51 2026-02-21T10:23:47.7986765Z cvt.u32.u16 %r10954, %rs1539; 2026-02-21T10:23:47.7986965Z .loc 1 37 27 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:37:27 2026-02-21T10:23:47.7987032Z shl.b32 %r10981, %r10953, 7; 2026-02-21T10:23:47.7987231Z .loc 1 39 27 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:39:27 2026-02-21T10:23:47.7987303Z mul.wide.s16 %r367, %rs1539, 128; 2026-02-21T10:23:47.7987518Z .loc 1 47 126 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:47:126 2026-02-21T10:23:47.7987581Z shl.b32 %r10955, %r10954, 20; 2026-02-21T10:23:47.7987647Z or.b32 %r10956, %r16086, %r10955; 2026-02-21T10:23:47.7987719Z mul.wide.s32 %rd36, %r10956, 2; 2026-02-21T10:23:47.7987783Z or.b32 %r10957, %r16087, %r367; 2026-02-21T10:23:47.7987847Z shl.b32 %r10958, %r10957, 13; 2026-02-21T10:23:47.7987914Z mul.wide.s32 %rd37, %r10958, 2; 2026-02-21T10:23:47.7987980Z or.b32 %r10959, %r16088, %r367; 2026-02-21T10:23:47.7988044Z shl.b32 %r10960, %r10959, 13; 2026-02-21T10:23:47.7988109Z mul.wide.s32 
%rd38, %r10960, 2; 2026-02-21T10:23:47.7988177Z or.b32 %r10961, %r16089, %r10955; 2026-02-21T10:23:47.7988241Z mul.wide.s32 %rd39, %r10961, 2; 2026-02-21T10:23:47.7988306Z mov.b32 %r16220, 0f00000000; 2026-02-21T10:23:47.7988366Z mov.b64 %rd366, 0; 2026-02-21T10:23:47.7988433Z mov.b64 %rd365, %rd19; 2026-02-21T10:23:47.7988675Z mov.b32 %r16221, %r16220; 2026-02-21T10:23:47.7988740Z mov.b32 %r16222, %r16220; 2026-02-21T10:23:47.7988805Z mov.b32 %r16223, %r16220; 2026-02-21T10:23:47.7988867Z mov.b32 %r16224, %r16220; 2026-02-21T10:23:47.7989009Z mov.b32 %r16225, %r16220; 2026-02-21T10:23:47.7989074Z mov.b32 %r16226, %r16220; 2026-02-21T10:23:47.7989145Z mov.b32 %r16227, %r16220; 2026-02-21T10:23:47.7989209Z mov.b32 %r16228, %r16220; 2026-02-21T10:23:47.7989270Z mov.b32 %r16229, %r16220; 2026-02-21T10:23:47.7989334Z mov.b32 %r16230, %r16220; 2026-02-21T10:23:47.7989395Z mov.b32 %r16231, %r16220; 2026-02-21T10:23:47.7989456Z mov.b32 %r16232, %r16220; 2026-02-21T10:23:47.7989519Z mov.b32 %r16233, %r16220; 2026-02-21T10:23:47.7989579Z mov.b32 %r16234, %r16220; 2026-02-21T10:23:47.7989639Z mov.b32 %r16235, %r16220; 2026-02-21T10:23:47.7989700Z mov.b32 %r16236, %r16220; 2026-02-21T10:23:47.7989765Z mov.b32 %r16237, %r16220; 2026-02-21T10:23:47.7989826Z mov.b32 %r16238, %r16220; 2026-02-21T10:23:47.7989959Z mov.b32 %r16239, %r16220; 2026-02-21T10:23:47.7990031Z mov.b32 %r16240, %r16220; 2026-02-21T10:23:47.7990094Z mov.b32 %r16241, %r16220; 2026-02-21T10:23:47.7990155Z mov.b32 %r16242, %r16220; 2026-02-21T10:23:47.7990222Z mov.b32 %r16243, %r16220; 2026-02-21T10:23:47.7990286Z mov.b32 %r16244, %r16220; 2026-02-21T10:23:47.7990348Z mov.b32 %r16245, %r16220; 2026-02-21T10:23:47.7990472Z mov.b32 %r16246, %r16220; 2026-02-21T10:23:47.7990538Z mov.b32 %r16247, %r16220; 2026-02-21T10:23:47.7990598Z mov.b32 %r16248, %r16220; 2026-02-21T10:23:47.7990657Z mov.b32 %r16249, %r16220; 2026-02-21T10:23:47.7990718Z mov.b32 %r16250, %r16220; 2026-02-21T10:23:47.7990786Z mov.b32 %r16251, 
%r16220; 2026-02-21T10:23:47.7990846Z mov.b32 %r16252, %r16220; 2026-02-21T10:23:47.7990905Z mov.b32 %r16253, %r16220; 2026-02-21T10:23:47.7990975Z mov.b32 %r16254, %r16220; 2026-02-21T10:23:47.7991041Z mov.b32 %r16255, %r16220; 2026-02-21T10:23:47.7991100Z mov.b32 %r16256, %r16220; 2026-02-21T10:23:47.7991165Z mov.b32 %r16257, %r16220; 2026-02-21T10:23:47.7991229Z mov.b32 %r16258, %r16220; 2026-02-21T10:23:47.7991288Z mov.b32 %r16259, %r16220; 2026-02-21T10:23:47.7991347Z mov.b32 %r16260, %r16220; 2026-02-21T10:23:47.7991411Z mov.b32 %r16261, %r16220; 2026-02-21T10:23:47.7991471Z mov.b32 %r16262, %r16220; 2026-02-21T10:23:47.7991532Z mov.b32 %r16263, %r16220; 2026-02-21T10:23:47.7991592Z mov.b32 %r16264, %r16220; 2026-02-21T10:23:47.7991656Z mov.b32 %r16265, %r16220; 2026-02-21T10:23:47.7991714Z mov.b32 %r16266, %r16220; 2026-02-21T10:23:47.7991773Z mov.b32 %r16267, %r16220; 2026-02-21T10:23:47.7991835Z mov.b32 %r16268, %r16220; 2026-02-21T10:23:47.7991894Z mov.b32 %r16269, %r16220; 2026-02-21T10:23:47.7991962Z mov.b32 %r16270, %r16220; 2026-02-21T10:23:47.7992026Z mov.b32 %r16271, %r16220; 2026-02-21T10:23:47.7992086Z mov.b32 %r16272, %r16220; 2026-02-21T10:23:47.7992146Z mov.b32 %r16273, %r16220; 2026-02-21T10:23:47.7992206Z mov.b32 %r16274, %r16220; 2026-02-21T10:23:47.7992273Z mov.b32 %r16275, %r16220; 2026-02-21T10:23:47.7992330Z mov.b32 %r16276, %r16220; 2026-02-21T10:23:47.7992389Z mov.b32 %r16277, %r16220; 2026-02-21T10:23:47.7992450Z mov.b32 %r16278, %r16220; 2026-02-21T10:23:47.7992513Z mov.b32 %r16279, %r16220; 2026-02-21T10:23:47.7992571Z mov.b32 %r16280, %r16220; 2026-02-21T10:23:47.7992629Z mov.b32 %r16281, %r16220; 2026-02-21T10:23:47.7992693Z mov.b32 %r16282, %r16220; 2026-02-21T10:23:47.7992750Z mov.b32 %r16283, %r16220; 2026-02-21T10:23:47.7992870Z $L__BB0_10: // Parent Loop BB0_9 Depth=1 2026-02-21T10:23:47.7992983Z // => This Inner Loop Header: Depth=2 2026-02-21T10:23:47.7993202Z .loc 1 0 126 // 
cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:0:126 2026-02-21T10:23:47.7993266Z cvt.u32.u64 %r10982, %rd366; 2026-02-21T10:23:47.7993478Z .loc 1 54 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:54:32 2026-02-21T10:23:47.7993610Z add.s64 %rd270, %rd365, %rd39; 2026-02-21T10:23:47.7993684Z add.s64 %rd273, %rd365, %rd38; 2026-02-21T10:23:47.7993750Z add.s64 %rd276, %rd365, %rd37; 2026-02-21T10:23:47.7994008Z .loc 1 54 80 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:54:80 2026-02-21T10:23:47.7994070Z add.s64 %rd279, %rd365, %rd36; 2026-02-21T10:23:47.7994133Z // begin inline asm 2026-02-21T10:23:47.7994197Z mov.u64 %rd269, 0x0; 2026-02-21T10:23:47.7994327Z createpolicy.fractional.L2::evict_last.b64 %rd269, 1.0; 2026-02-21T10:23:47.7994385Z // end inline asm 2026-02-21T10:23:47.7994444Z // begin inline asm 2026-02-21T10:23:47.7994506Z mov.u32 %r10962, 0x0; 2026-02-21T10:23:47.7994565Z mov.u32 %r10963, 0x0; 2026-02-21T10:23:47.7994624Z mov.u32 %r10964, 0x0; 2026-02-21T10:23:47.7994685Z mov.u32 %r10965, 0x0; 2026-02-21T10:23:47.7994927Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r10962, %r10963, %r10964, %r10965 }, [ %rd270 + 0 ], %rd269; 2026-02-21T10:23:47.7995036Z // end inline asm 2026-02-21T10:23:47.7995102Z // begin inline asm 2026-02-21T10:23:47.7995163Z mov.u64 %rd272, 0x0; 2026-02-21T10:23:47.7995282Z createpolicy.fractional.L2::evict_last.b64 %rd272, 1.0; 2026-02-21T10:23:47.7995343Z // end inline asm 2026-02-21T10:23:47.7995408Z // begin inline asm 2026-02-21T10:23:47.7995467Z mov.u32 %r10966, 0x0; 2026-02-21T10:23:47.7995579Z mov.u32 %r10967, 0x0; 2026-02-21T10:23:47.7995642Z mov.u32 %r10968, 0x0; 2026-02-21T10:23:47.7995702Z mov.u32 %r10969, 0x0; 2026-02-21T10:23:47.7995936Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r10966, %r10967, %r10968, %r10969 }, [ %rd273 + 0 ], %rd272; 2026-02-21T10:23:47.7995997Z // end inline asm 2026-02-21T10:23:47.7996066Z // begin inline asm 2026-02-21T10:23:47.7996127Z 
mov.u64 %rd275, 0x0; 2026-02-21T10:23:47.7996249Z createpolicy.fractional.L2::evict_last.b64 %rd275, 1.0; 2026-02-21T10:23:47.7996311Z // end inline asm 2026-02-21T10:23:47.7996374Z // begin inline asm 2026-02-21T10:23:47.7996434Z mov.u32 %r10970, 0x0; 2026-02-21T10:23:47.7996625Z mov.u32 %r10971, 0x0; 2026-02-21T10:23:47.7996684Z mov.u32 %r10972, 0x0; 2026-02-21T10:23:47.7996742Z mov.u32 %r10973, 0x0; 2026-02-21T10:23:47.7996969Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r10970, %r10971, %r10972, %r10973 }, [ %rd276 + 0 ], %rd275; 2026-02-21T10:23:47.7997031Z // end inline asm 2026-02-21T10:23:47.7997093Z // begin inline asm 2026-02-21T10:23:47.7997154Z mov.u64 %rd278, 0x0; 2026-02-21T10:23:47.7997274Z createpolicy.fractional.L2::evict_last.b64 %rd278, 1.0; 2026-02-21T10:23:47.7997333Z // end inline asm 2026-02-21T10:23:47.7997394Z // begin inline asm 2026-02-21T10:23:47.7997458Z mov.u32 %r10974, 0x0; 2026-02-21T10:23:47.7997518Z mov.u32 %r10975, 0x0; 2026-02-21T10:23:47.7997575Z mov.u32 %r10976, 0x0; 2026-02-21T10:23:47.7997632Z mov.u32 %r10977, 0x0; 2026-02-21T10:23:47.7997862Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r10974, %r10975, %r10976, %r10977 }, [ %rd279 + 0 ], %rd278; 2026-02-21T10:23:47.7997925Z // end inline asm 2026-02-21T10:23:47.7998138Z .loc 1 58 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:58:32 2026-02-21T10:23:47.7998201Z bar.sync 0; 2026-02-21T10:23:47.7998291Z st.shared.v2.b32 [%r60], {%r10962, %r10963}; 2026-02-21T10:23:47.7998385Z st.shared.v2.b32 [%r60+4096], {%r10966, %r10967}; 2026-02-21T10:23:47.7998480Z st.shared.v2.b32 [%r60+8192], {%r10970, %r10971}; 2026-02-21T10:23:47.7998573Z st.shared.v2.b32 [%r60+12288], {%r10974, %r10975}; 2026-02-21T10:23:47.7998653Z st.shared.v2.b32 [%r61], {%r10964, %r10965}; 2026-02-21T10:23:47.7998739Z st.shared.v2.b32 [%r61+4096], {%r10968, %r10969}; 2026-02-21T10:23:47.7998826Z st.shared.v2.b32 [%r61+8192], {%r10972, %r10973}; 2026-02-21T10:23:47.7998915Z 
st.shared.v2.b32 [%r61+12288], {%r10976, %r10977}; 2026-02-21T10:23:47.7998974Z bar.sync 0; 2026-02-21T10:23:47.7999047Z ld.shared.b16 %rs1542, [%r62]; 2026-02-21T10:23:47.7999215Z ld.shared.b16 %rs1543, [%r62+1024]; 2026-02-21T10:23:47.7999289Z ld.shared.b16 %rs1544, [%r62+64]; 2026-02-21T10:23:47.7999360Z ld.shared.b16 %rs1545, [%r62+1088]; 2026-02-21T10:23:47.7999434Z ld.shared.b16 %rs1546, [%r63]; 2026-02-21T10:23:47.7999564Z ld.shared.b16 %rs1547, [%r63+1024]; 2026-02-21T10:23:47.7999632Z ld.shared.b16 %rs1548, [%r63+64]; 2026-02-21T10:23:47.7999705Z ld.shared.b16 %rs1549, [%r63+1088]; 2026-02-21T10:23:47.7999775Z ld.shared.b16 %rs1550, [%r64]; 2026-02-21T10:23:47.7999846Z ld.shared.b16 %rs1551, [%r64+1024]; 2026-02-21T10:23:47.7999917Z ld.shared.b16 %rs1552, [%r64+64]; 2026-02-21T10:23:47.7999985Z ld.shared.b16 %rs1553, [%r64+1088]; 2026-02-21T10:23:47.8000050Z ld.shared.b16 %rs1554, [%r65]; 2026-02-21T10:23:47.8000118Z ld.shared.b16 %rs1555, [%r65+1024]; 2026-02-21T10:23:47.8000186Z ld.shared.b16 %rs1556, [%r65+64]; 2026-02-21T10:23:47.8000254Z ld.shared.b16 %rs1557, [%r65+1088]; 2026-02-21T10:23:47.8000333Z ld.shared.b16 %rs1558, [%r66]; 2026-02-21T10:23:47.8000477Z ld.shared.b16 %rs1559, [%r66+1024]; 2026-02-21T10:23:47.8000545Z ld.shared.b16 %rs1560, [%r66+64]; 2026-02-21T10:23:47.8000611Z ld.shared.b16 %rs1561, [%r66+1088]; 2026-02-21T10:23:47.8000675Z ld.shared.b16 %rs1562, [%r67]; 2026-02-21T10:23:47.8000749Z ld.shared.b16 %rs1563, [%r67+1024]; 2026-02-21T10:23:47.8000813Z ld.shared.b16 %rs1564, [%r67+64]; 2026-02-21T10:23:47.8000879Z ld.shared.b16 %rs1565, [%r67+1088]; 2026-02-21T10:23:47.8001009Z ld.shared.b16 %rs1566, [%r68]; 2026-02-21T10:23:47.8001075Z ld.shared.b16 %rs1567, [%r68+1024]; 2026-02-21T10:23:47.8001140Z ld.shared.b16 %rs1568, [%r68+64]; 2026-02-21T10:23:47.8001209Z ld.shared.b16 %rs1569, [%r68+1088]; 2026-02-21T10:23:47.8001285Z ld.shared.b16 %rs1570, [%r69]; 2026-02-21T10:23:47.8001353Z ld.shared.b16 %rs1571, [%r69+1024]; 
2026-02-21T10:23:47.8001421Z ld.shared.b16 %rs1572, [%r69+64]; 2026-02-21T10:23:47.8001490Z ld.shared.b16 %rs1573, [%r69+1088]; 2026-02-21T10:23:47.8001555Z cvt.f32.bf16 %r11115, %rs1542; 2026-02-21T10:23:47.8001622Z cvt.f32.bf16 %r11116, %rs1543; 2026-02-21T10:23:47.8001689Z cvt.f32.bf16 %r11117, %rs1546; 2026-02-21T10:23:47.8001750Z cvt.f32.bf16 %r11118, %rs1547; 2026-02-21T10:23:47.8001814Z cvt.f32.bf16 %r11247, %rs1550; 2026-02-21T10:23:47.8001878Z cvt.f32.bf16 %r11248, %rs1551; 2026-02-21T10:23:47.8001944Z cvt.f32.bf16 %r11249, %rs1554; 2026-02-21T10:23:47.8002007Z cvt.f32.bf16 %r11250, %rs1555; 2026-02-21T10:23:47.8002076Z cvt.f32.bf16 %r11379, %rs1558; 2026-02-21T10:23:47.8002149Z cvt.f32.bf16 %r11380, %rs1559; 2026-02-21T10:23:47.8002210Z cvt.f32.bf16 %r11381, %rs1562; 2026-02-21T10:23:47.8002271Z cvt.f32.bf16 %r11382, %rs1563; 2026-02-21T10:23:47.8002335Z cvt.f32.bf16 %r11511, %rs1566; 2026-02-21T10:23:47.8002401Z cvt.f32.bf16 %r11512, %rs1567; 2026-02-21T10:23:47.8002463Z cvt.f32.bf16 %r11513, %rs1570; 2026-02-21T10:23:47.8002524Z cvt.f32.bf16 %r11514, %rs1571; 2026-02-21T10:23:47.8002591Z cvt.f32.bf16 %r11643, %rs1544; 2026-02-21T10:23:47.8002654Z cvt.f32.bf16 %r11644, %rs1545; 2026-02-21T10:23:47.8002718Z cvt.f32.bf16 %r11645, %rs1548; 2026-02-21T10:23:47.8002784Z cvt.f32.bf16 %r11646, %rs1549; 2026-02-21T10:23:47.8002846Z cvt.f32.bf16 %r11775, %rs1552; 2026-02-21T10:23:47.8002913Z cvt.f32.bf16 %r11776, %rs1553; 2026-02-21T10:23:47.8002974Z cvt.f32.bf16 %r11777, %rs1556; 2026-02-21T10:23:47.8003042Z cvt.f32.bf16 %r11778, %rs1557; 2026-02-21T10:23:47.8003106Z cvt.f32.bf16 %r11907, %rs1560; 2026-02-21T10:23:47.8003170Z cvt.f32.bf16 %r11908, %rs1561; 2026-02-21T10:23:47.8003237Z cvt.f32.bf16 %r11909, %rs1564; 2026-02-21T10:23:47.8003300Z cvt.f32.bf16 %r11910, %rs1565; 2026-02-21T10:23:47.8003362Z cvt.f32.bf16 %r12039, %rs1568; 2026-02-21T10:23:47.8003424Z cvt.f32.bf16 %r12040, %rs1569; 2026-02-21T10:23:47.8003492Z cvt.f32.bf16 %r12041, %rs1572; 
2026-02-21T10:23:47.8003555Z cvt.f32.bf16 %r12042, %rs1573; 2026-02-21T10:23:47.8003772Z .loc 1 60 33 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:60:33 2026-02-21T10:23:47.8003896Z bar.sync 0; 2026-02-21T10:23:47.8003961Z add.s32 %r10978, %r10980, 4096; 2026-02-21T10:23:47.8004023Z // begin inline asm 2026-02-21T10:23:47.8004138Z @%p132 mbarrier.init.shared::cta.b64 [%r10978], 1; 2026-02-21T10:23:47.8004249Z // end inline asm 2026-02-21T10:23:47.8004306Z bar.sync 0; 2026-02-21T10:23:47.8004368Z // begin inline asm 2026-02-21T10:23:47.8004520Z @%p132 mbarrier.arrive.expect_tx.shared.b64 _, [%r10978], 4096; 2026-02-21T10:23:47.8004579Z // end inline asm 2026-02-21T10:23:47.8004640Z // begin inline asm 2026-02-21T10:23:47.8004723Z fence.proxy.async.shared::cta; 2026-02-21T10:23:47.8004782Z // end inline asm 2026-02-21T10:23:47.8004842Z bar.sync 0; 2026-02-21T10:23:47.8004913Z elect.sync %r15822|%p182, -1; 2026-02-21T10:23:47.8004988Z and.pred %p134, %p1, %p182; 2026-02-21T10:23:47.8005050Z // begin inline asm 2026-02-21T10:23:47.8005445Z @%p134 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r10980], [%rd281, {%r10981, %r10982}], [%r10978]; 2026-02-21T10:23:47.8005513Z // end inline asm 2026-02-21T10:23:47.8005571Z bar.sync 0; 2026-02-21T10:23:47.8005631Z mov.b32 %r15753, 0; 2026-02-21T10:23:47.8005692Z // begin inline asm 2026-02-21T10:23:47.8005753Z 2026-02-21T10:23:47.8005808Z { 2026-02-21T10:23:47.8005875Z .reg .pred complete; 2026-02-21T10:23:47.8005937Z waitLoop: 2026-02-21T10:23:47.8006137Z mbarrier.try_wait.parity.shared.b64 complete, [%r10978], %r15753; 2026-02-21T10:23:47.8006211Z @!complete bra.uni waitLoop; 2026-02-21T10:23:47.8006264Z } 2026-02-21T10:23:47.8006272Z 2026-02-21T10:23:47.8006333Z // end inline asm 2026-02-21T10:23:47.8006393Z bar.sync 0; 2026-02-21T10:23:47.8006581Z // begin inline asm 2026-02-21T10:23:47.8006694Z @%p132 mbarrier.inval.shared::cta.b64 [%r10978]; 2026-02-21T10:23:47.8006752Z // end 
inline asm 2026-02-21T10:23:47.8006964Z .loc 1 78 58 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:78:58 2026-02-21T10:23:47.8007042Z ld.shared.b8 %rs1574, [%r70]; 2026-02-21T10:23:47.8007114Z ld.shared.b8 %rs1575, [%r70+1024]; 2026-02-21T10:23:47.8007180Z ld.shared.b8 %rs1576, [%r70+2048]; 2026-02-21T10:23:47.8007245Z ld.shared.b8 %rs1577, [%r70+3072]; 2026-02-21T10:23:47.8007318Z ld.shared.b8 %rs1578, [%r71+128]; 2026-02-21T10:23:47.8007387Z ld.shared.b8 %rs1579, [%r71+1152]; 2026-02-21T10:23:47.8007452Z ld.shared.b8 %rs1580, [%r71+2176]; 2026-02-21T10:23:47.8007528Z ld.shared.b8 %rs1581, [%r71+3200]; 2026-02-21T10:23:47.8007597Z ld.shared.b8 %rs1582, [%r72+256]; 2026-02-21T10:23:47.8007666Z ld.shared.b8 %rs1583, [%r72+1280]; 2026-02-21T10:23:47.8007739Z ld.shared.b8 %rs1584, [%r72+2304]; 2026-02-21T10:23:47.8007805Z ld.shared.b8 %rs1585, [%r72+3328]; 2026-02-21T10:23:47.8007871Z ld.shared.b8 %rs1586, [%r73+384]; 2026-02-21T10:23:47.8007936Z ld.shared.b8 %rs1587, [%r73+1408]; 2026-02-21T10:23:47.8008008Z ld.shared.b8 %rs1588, [%r73+2432]; 2026-02-21T10:23:47.8008074Z ld.shared.b8 %rs1589, [%r73+3456]; 2026-02-21T10:23:47.8008155Z ld.shared.b8 %rs1590, [%r74+512]; 2026-02-21T10:23:47.8008229Z ld.shared.b8 %rs1591, [%r74+1536]; 2026-02-21T10:23:47.8008299Z ld.shared.b8 %rs1592, [%r74+2560]; 2026-02-21T10:23:47.8008367Z ld.shared.b8 %rs1593, [%r74+3584]; 2026-02-21T10:23:47.8008434Z ld.shared.b8 %rs1594, [%r75+640]; 2026-02-21T10:23:47.8008505Z ld.shared.b8 %rs1595, [%r75+1664]; 2026-02-21T10:23:47.8008571Z ld.shared.b8 %rs1596, [%r75+2688]; 2026-02-21T10:23:47.8008638Z ld.shared.b8 %rs1597, [%r75+3712]; 2026-02-21T10:23:47.8008708Z ld.shared.b8 %rs1598, [%r76+768]; 2026-02-21T10:23:47.8008772Z ld.shared.b8 %rs1599, [%r76+1792]; 2026-02-21T10:23:47.8008838Z ld.shared.b8 %rs1600, [%r76+2816]; 2026-02-21T10:23:47.8008909Z ld.shared.b8 %rs1601, [%r76+3840]; 2026-02-21T10:23:47.8008977Z ld.shared.b8 %rs1602, [%r77+896]; 2026-02-21T10:23:47.8009045Z 
ld.shared.b8 %rs1603, [%r77+1920]; 2026-02-21T10:23:47.8009112Z ld.shared.b8 %rs1604, [%r77+2944]; 2026-02-21T10:23:47.8009289Z ld.shared.b8 %rs1605, [%r77+3968]; 2026-02-21T10:23:47.8009503Z .loc 1 63 28 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:63:28 2026-02-21T10:23:47.8009572Z shl.b16 %rs1606, %rs1574, 4; 2026-02-21T10:23:47.8009707Z shl.b16 %rs1607, %rs1578, 4; 2026-02-21T10:23:47.8009769Z shl.b16 %rs1608, %rs1582, 4; 2026-02-21T10:23:47.8009831Z shl.b16 %rs1609, %rs1586, 4; 2026-02-21T10:23:47.8009902Z shl.b16 %rs1610, %rs1590, 4; 2026-02-21T10:23:47.8009970Z shl.b16 %rs1611, %rs1594, 4; 2026-02-21T10:23:47.8010033Z shl.b16 %rs1612, %rs1598, 4; 2026-02-21T10:23:47.8010096Z shl.b16 %rs1613, %rs1602, 4; 2026-02-21T10:23:47.8010164Z shl.b16 %rs1614, %rs1575, 4; 2026-02-21T10:23:47.8010229Z shl.b16 %rs1615, %rs1579, 4; 2026-02-21T10:23:47.8010292Z shl.b16 %rs1616, %rs1583, 4; 2026-02-21T10:23:47.8010353Z shl.b16 %rs1617, %rs1587, 4; 2026-02-21T10:23:47.8010420Z shl.b16 %rs1618, %rs1591, 4; 2026-02-21T10:23:47.8010483Z shl.b16 %rs1619, %rs1595, 4; 2026-02-21T10:23:47.8010606Z shl.b16 %rs1620, %rs1599, 4; 2026-02-21T10:23:47.8010675Z shl.b16 %rs1621, %rs1603, 4; 2026-02-21T10:23:47.8010739Z shl.b16 %rs1622, %rs1576, 4; 2026-02-21T10:23:47.8010803Z shl.b16 %rs1623, %rs1580, 4; 2026-02-21T10:23:47.8010876Z shl.b16 %rs1624, %rs1584, 4; 2026-02-21T10:23:47.8010938Z shl.b16 %rs1625, %rs1588, 4; 2026-02-21T10:23:47.8011000Z shl.b16 %rs1626, %rs1592, 4; 2026-02-21T10:23:47.8011132Z shl.b16 %rs1627, %rs1596, 4; 2026-02-21T10:23:47.8011204Z shl.b16 %rs1628, %rs1600, 4; 2026-02-21T10:23:47.8011267Z shl.b16 %rs1629, %rs1604, 4; 2026-02-21T10:23:47.8011329Z shl.b16 %rs1630, %rs1577, 4; 2026-02-21T10:23:47.8011394Z shl.b16 %rs1631, %rs1581, 4; 2026-02-21T10:23:47.8011456Z shl.b16 %rs1632, %rs1585, 4; 2026-02-21T10:23:47.8011517Z shl.b16 %rs1633, %rs1589, 4; 2026-02-21T10:23:47.8011581Z shl.b16 %rs1634, %rs1593, 4; 2026-02-21T10:23:47.8011649Z shl.b16 %rs1635, 
%rs1597, 4; 2026-02-21T10:23:47.8011712Z shl.b16 %rs1636, %rs1601, 4; 2026-02-21T10:23:47.8011777Z shl.b16 %rs1637, %rs1605, 4; 2026-02-21T10:23:47.8011995Z .loc 1 78 58 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:78:58 2026-02-21T10:23:47.8012079Z selp.b16 %rs1638, %rs1606, %rs1574, %p188; 2026-02-21T10:23:47.8012148Z cvt.s16.s8 %rs1639, %rs1638; 2026-02-21T10:23:47.8012215Z shr.s16 %rs1640, %rs1639, 4; 2026-02-21T10:23:47.8012294Z selp.b16 %rs1641, %rs1607, %rs1578, %p188; 2026-02-21T10:23:47.8012362Z cvt.s16.s8 %rs1642, %rs1641; 2026-02-21T10:23:47.8012427Z shr.s16 %rs1643, %rs1642, 4; 2026-02-21T10:23:47.8012508Z selp.b16 %rs1644, %rs1608, %rs1582, %p188; 2026-02-21T10:23:47.8012571Z cvt.s16.s8 %rs1645, %rs1644; 2026-02-21T10:23:47.8012633Z shr.s16 %rs1646, %rs1645, 4; 2026-02-21T10:23:47.8012713Z selp.b16 %rs1647, %rs1609, %rs1586, %p188; 2026-02-21T10:23:47.8012776Z cvt.s16.s8 %rs1648, %rs1647; 2026-02-21T10:23:47.8012836Z shr.s16 %rs1649, %rs1648, 4; 2026-02-21T10:23:47.8012911Z selp.b16 %rs1650, %rs1610, %rs1590, %p188; 2026-02-21T10:23:47.8012994Z cvt.s16.s8 %rs1651, %rs1650; 2026-02-21T10:23:47.8013061Z shr.s16 %rs1652, %rs1651, 4; 2026-02-21T10:23:47.8013137Z selp.b16 %rs1653, %rs1611, %rs1594, %p188; 2026-02-21T10:23:47.8013206Z cvt.s16.s8 %rs1654, %rs1653; 2026-02-21T10:23:47.8013270Z shr.s16 %rs1655, %rs1654, 4; 2026-02-21T10:23:47.8013344Z selp.b16 %rs1656, %rs1612, %rs1598, %p188; 2026-02-21T10:23:47.8013408Z cvt.s16.s8 %rs1657, %rs1656; 2026-02-21T10:23:47.8013478Z shr.s16 %rs1658, %rs1657, 4; 2026-02-21T10:23:47.8013553Z selp.b16 %rs1659, %rs1613, %rs1602, %p188; 2026-02-21T10:23:47.8013617Z cvt.s16.s8 %rs1660, %rs1659; 2026-02-21T10:23:47.8013685Z shr.s16 %rs1661, %rs1660, 4; 2026-02-21T10:23:47.8013760Z selp.b16 %rs1662, %rs1614, %rs1575, %p188; 2026-02-21T10:23:47.8013828Z cvt.s16.s8 %rs1663, %rs1662; 2026-02-21T10:23:47.8013895Z shr.s16 %rs1664, %rs1663, 4; 2026-02-21T10:23:47.8013970Z selp.b16 %rs1665, %rs1615, %rs1579, %p188; 
2026-02-21T10:23:47.8014033Z cvt.s16.s8 %rs1666, %rs1665; 2026-02-21T10:23:47.8014154Z shr.s16 %rs1667, %rs1666, 4; 2026-02-21T10:23:47.8014237Z selp.b16 %rs1668, %rs1616, %rs1583, %p188; 2026-02-21T10:23:47.8014301Z cvt.s16.s8 %rs1669, %rs1668; 2026-02-21T10:23:47.8014366Z shr.s16 %rs1670, %rs1669, 4; 2026-02-21T10:23:47.8014495Z selp.b16 %rs1671, %rs1617, %rs1587, %p188; 2026-02-21T10:23:47.8014558Z cvt.s16.s8 %rs1672, %rs1671; 2026-02-21T10:23:47.8014624Z shr.s16 %rs1673, %rs1672, 4; 2026-02-21T10:23:47.8014701Z selp.b16 %rs1674, %rs1618, %rs1591, %p188; 2026-02-21T10:23:47.8014771Z cvt.s16.s8 %rs1675, %rs1674; 2026-02-21T10:23:47.8014833Z shr.s16 %rs1676, %rs1675, 4; 2026-02-21T10:23:47.8014908Z selp.b16 %rs1677, %rs1619, %rs1595, %p188; 2026-02-21T10:23:47.8014980Z cvt.s16.s8 %rs1678, %rs1677; 2026-02-21T10:23:47.8015044Z shr.s16 %rs1679, %rs1678, 4; 2026-02-21T10:23:47.8015119Z selp.b16 %rs1680, %rs1620, %rs1599, %p188; 2026-02-21T10:23:47.8015189Z cvt.s16.s8 %rs1681, %rs1680; 2026-02-21T10:23:47.8015252Z shr.s16 %rs1682, %rs1681, 4; 2026-02-21T10:23:47.8015379Z selp.b16 %rs1683, %rs1621, %rs1603, %p188; 2026-02-21T10:23:47.8015445Z cvt.s16.s8 %rs1684, %rs1683; 2026-02-21T10:23:47.8015510Z shr.s16 %rs1685, %rs1684, 4; 2026-02-21T10:23:47.8015598Z selp.b16 %rs1686, %rs1622, %rs1576, %p188; 2026-02-21T10:23:47.8015665Z cvt.s16.s8 %rs1687, %rs1686; 2026-02-21T10:23:47.8015733Z shr.s16 %rs1688, %rs1687, 4; 2026-02-21T10:23:47.8015856Z selp.b16 %rs1689, %rs1623, %rs1580, %p188; 2026-02-21T10:23:47.8015921Z cvt.s16.s8 %rs1690, %rs1689; 2026-02-21T10:23:47.8015985Z shr.s16 %rs1691, %rs1690, 4; 2026-02-21T10:23:47.8016066Z selp.b16 %rs1692, %rs1624, %rs1584, %p188; 2026-02-21T10:23:47.8016129Z cvt.s16.s8 %rs1693, %rs1692; 2026-02-21T10:23:47.8016192Z shr.s16 %rs1694, %rs1693, 4; 2026-02-21T10:23:47.8016273Z selp.b16 %rs1695, %rs1625, %rs1588, %p188; 2026-02-21T10:23:47.8016334Z cvt.s16.s8 %rs1696, %rs1695; 2026-02-21T10:23:47.8016397Z shr.s16 %rs1697, %rs1696, 4; 
2026-02-21T10:23:47.8016588Z selp.b16 %rs1698, %rs1626, %rs1592, %p188; 2026-02-21T10:23:47.8016667Z cvt.s16.s8 %rs1699, %rs1698; 2026-02-21T10:23:47.8016731Z shr.s16 %rs1700, %rs1699, 4; 2026-02-21T10:23:47.8016810Z selp.b16 %rs1701, %rs1627, %rs1596, %p188; 2026-02-21T10:23:47.8016880Z cvt.s16.s8 %rs1702, %rs1701; 2026-02-21T10:23:47.8016943Z shr.s16 %rs1703, %rs1702, 4; 2026-02-21T10:23:47.8017018Z selp.b16 %rs1704, %rs1628, %rs1600, %p188; 2026-02-21T10:23:47.8017088Z cvt.s16.s8 %rs1705, %rs1704; 2026-02-21T10:23:47.8017150Z shr.s16 %rs1706, %rs1705, 4; 2026-02-21T10:23:47.8017227Z selp.b16 %rs1707, %rs1629, %rs1604, %p188; 2026-02-21T10:23:47.8017290Z cvt.s16.s8 %rs1708, %rs1707; 2026-02-21T10:23:47.8017355Z shr.s16 %rs1709, %rs1708, 4; 2026-02-21T10:23:47.8017429Z selp.b16 %rs1710, %rs1630, %rs1577, %p188; 2026-02-21T10:23:47.8017495Z cvt.s16.s8 %rs1711, %rs1710; 2026-02-21T10:23:47.8017560Z shr.s16 %rs1712, %rs1711, 4; 2026-02-21T10:23:47.8017636Z selp.b16 %rs1713, %rs1631, %rs1581, %p188; 2026-02-21T10:23:47.8017701Z cvt.s16.s8 %rs1714, %rs1713; 2026-02-21T10:23:47.8017767Z shr.s16 %rs1715, %rs1714, 4; 2026-02-21T10:23:47.8017848Z selp.b16 %rs1716, %rs1632, %rs1585, %p188; 2026-02-21T10:23:47.8017912Z cvt.s16.s8 %rs1717, %rs1716; 2026-02-21T10:23:47.8017976Z shr.s16 %rs1718, %rs1717, 4; 2026-02-21T10:23:47.8018056Z selp.b16 %rs1719, %rs1633, %rs1589, %p188; 2026-02-21T10:23:47.8018118Z cvt.s16.s8 %rs1720, %rs1719; 2026-02-21T10:23:47.8018183Z shr.s16 %rs1721, %rs1720, 4; 2026-02-21T10:23:47.8018267Z selp.b16 %rs1722, %rs1634, %rs1593, %p188; 2026-02-21T10:23:47.8018340Z cvt.s16.s8 %rs1723, %rs1722; 2026-02-21T10:23:47.8018408Z shr.s16 %rs1724, %rs1723, 4; 2026-02-21T10:23:47.8018482Z selp.b16 %rs1725, %rs1635, %rs1597, %p188; 2026-02-21T10:23:47.8018548Z cvt.s16.s8 %rs1726, %rs1725; 2026-02-21T10:23:47.8018610Z shr.s16 %rs1727, %rs1726, 4; 2026-02-21T10:23:47.8018688Z selp.b16 %rs1728, %rs1636, %rs1601, %p188; 2026-02-21T10:23:47.8018757Z cvt.s16.s8 
%rs1729, %rs1728; 2026-02-21T10:23:47.8018907Z shr.s16 %rs1730, %rs1729, 4; 2026-02-21T10:23:47.8018983Z selp.b16 %rs1731, %rs1637, %rs1605, %p188; 2026-02-21T10:23:47.8019046Z cvt.s16.s8 %rs1732, %rs1731; 2026-02-21T10:23:47.8019114Z shr.s16 %rs1733, %rs1732, 4; 2026-02-21T10:23:47.8019425Z .loc 1 83 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:83:32 2026-02-21T10:23:47.8019497Z cvt.rn.f32.s16 %r15823, %rs1640; 2026-02-21T10:23:47.8019572Z cvt.rn.f32.s16 %r15824, %rs1643; 2026-02-21T10:23:47.8019636Z cvt.rn.f32.s16 %r15825, %rs1646; 2026-02-21T10:23:47.8019700Z cvt.rn.f32.s16 %r15826, %rs1649; 2026-02-21T10:23:47.8019768Z cvt.rn.f32.s16 %r15827, %rs1652; 2026-02-21T10:23:47.8019831Z cvt.rn.f32.s16 %r15828, %rs1655; 2026-02-21T10:23:47.8019894Z cvt.rn.f32.s16 %r15829, %rs1658; 2026-02-21T10:23:47.8019958Z cvt.rn.f32.s16 %r15830, %rs1661; 2026-02-21T10:23:47.8020026Z cvt.rn.f32.s16 %r15831, %rs1664; 2026-02-21T10:23:47.8020090Z cvt.rn.f32.s16 %r15832, %rs1667; 2026-02-21T10:23:47.8020221Z cvt.rn.f32.s16 %r15833, %rs1670; 2026-02-21T10:23:47.8020293Z cvt.rn.f32.s16 %r15834, %rs1673; 2026-02-21T10:23:47.8020356Z cvt.rn.f32.s16 %r15835, %rs1676; 2026-02-21T10:23:47.8020419Z cvt.rn.f32.s16 %r15836, %rs1679; 2026-02-21T10:23:47.8020485Z cvt.rn.f32.s16 %r15837, %rs1682; 2026-02-21T10:23:47.8020557Z cvt.rn.f32.s16 %r15838, %rs1685; 2026-02-21T10:23:47.8020620Z cvt.rn.f32.s16 %r15839, %rs1688; 2026-02-21T10:23:47.8020746Z cvt.rn.f32.s16 %r15840, %rs1691; 2026-02-21T10:23:47.8020818Z cvt.rn.f32.s16 %r15841, %rs1694; 2026-02-21T10:23:47.8020882Z cvt.rn.f32.s16 %r15842, %rs1697; 2026-02-21T10:23:47.8020946Z cvt.rn.f32.s16 %r15843, %rs1700; 2026-02-21T10:23:47.8021010Z cvt.rn.f32.s16 %r15844, %rs1703; 2026-02-21T10:23:47.8021081Z cvt.rn.f32.s16 %r15845, %rs1706; 2026-02-21T10:23:47.8021143Z cvt.rn.f32.s16 %r15846, %rs1709; 2026-02-21T10:23:47.8021206Z cvt.rn.f32.s16 %r15847, %rs1712; 2026-02-21T10:23:47.8021275Z cvt.rn.f32.s16 %r15848, %rs1715; 
2026-02-21T10:23:47.8021343Z cvt.rn.f32.s16 %r15849, %rs1718; 2026-02-21T10:23:47.8021408Z cvt.rn.f32.s16 %r15850, %rs1721; 2026-02-21T10:23:47.8021475Z cvt.rn.f32.s16 %r15851, %rs1724; 2026-02-21T10:23:47.8021539Z cvt.rn.f32.s16 %r15852, %rs1727; 2026-02-21T10:23:47.8021606Z cvt.rn.f32.s16 %r15853, %rs1730; 2026-02-21T10:23:47.8021669Z cvt.rn.f32.s16 %r15854, %rs1733; 2026-02-21T10:23:47.8021743Z bar.sync 0; 2026-02-21T10:23:47.8021815Z st.shared.b32 [%r78], %r15823; 2026-02-21T10:23:47.8021887Z st.shared.b32 [%r78+8], %r15824; 2026-02-21T10:23:47.8021963Z st.shared.b32 [%r78+16384], %r15839; 2026-02-21T10:23:47.8022033Z st.shared.b32 [%r78+16392], %r15840; 2026-02-21T10:23:47.8024960Z st.shared.b32 [%r79], %r15825; 2026-02-21T10:23:47.8025076Z st.shared.b32 [%r79+8], %r15826; 2026-02-21T10:23:47.8025156Z st.shared.b32 [%r79+16384], %r15841; 2026-02-21T10:23:47.8025231Z st.shared.b32 [%r79+16392], %r15842; 2026-02-21T10:23:47.8025304Z st.shared.b32 [%r80], %r15827; 2026-02-21T10:23:47.8025385Z st.shared.b32 [%r80+8], %r15828; 2026-02-21T10:23:47.8025455Z st.shared.b32 [%r80+16384], %r15843; 2026-02-21T10:23:47.8025523Z st.shared.b32 [%r80+16392], %r15844; 2026-02-21T10:23:47.8025593Z st.shared.b32 [%r81], %r15829; 2026-02-21T10:23:47.8025661Z st.shared.b32 [%r81+8], %r15830; 2026-02-21T10:23:47.8025728Z st.shared.b32 [%r81+16384], %r15845; 2026-02-21T10:23:47.8025795Z st.shared.b32 [%r81+16392], %r15846; 2026-02-21T10:23:47.8025867Z st.shared.b32 [%r82], %r15831; 2026-02-21T10:23:47.8025932Z st.shared.b32 [%r82+8], %r15832; 2026-02-21T10:23:47.8025998Z st.shared.b32 [%r82+16384], %r15847; 2026-02-21T10:23:47.8026068Z st.shared.b32 [%r82+16392], %r15848; 2026-02-21T10:23:47.8026132Z st.shared.b32 [%r83], %r15833; 2026-02-21T10:23:47.8026198Z st.shared.b32 [%r83+8], %r15834; 2026-02-21T10:23:47.8026269Z st.shared.b32 [%r83+16384], %r15849; 2026-02-21T10:23:47.8026335Z st.shared.b32 [%r83+16392], %r15850; 2026-02-21T10:23:47.8026399Z st.shared.b32 [%r84], %r15835; 
2026-02-21T10:23:47.8026760Z st.shared.b32 [%r84+8], %r15836; 2026-02-21T10:23:47.8026839Z st.shared.b32 [%r84+16384], %r15851; 2026-02-21T10:23:47.8026905Z st.shared.b32 [%r84+16392], %r15852; 2026-02-21T10:23:47.8026971Z st.shared.b32 [%r85], %r15837; 2026-02-21T10:23:47.8027136Z st.shared.b32 [%r85+8], %r15838; 2026-02-21T10:23:47.8027204Z st.shared.b32 [%r85+16384], %r15853; 2026-02-21T10:23:47.8027274Z st.shared.b32 [%r85+16392], %r15854; 2026-02-21T10:23:47.8027333Z $L__tmp17: 2026-02-21T10:23:47.8027645Z .loc 2 291 36 // standard.py:291:36 @[ cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:90:40 ] 2026-02-21T10:23:47.8027711Z // begin inline asm 2026-02-21T10:23:47.8027813Z fence.proxy.async.shared::cta; 2026-02-21T10:23:47.8027879Z // end inline asm 2026-02-21T10:23:47.8027937Z bar.sync 0; 2026-02-21T10:23:47.8028030Z shfl.sync.idx.b32 %r15855, %r4, 0, 31, -1; 2026-02-21T10:23:47.8028109Z wgmma.fence.sync.aligned; 2026-02-21T10:23:47.8028183Z mov.pred %p136, -1; 2026-02-21T10:23:47.8028316Z // begin inline asm 2026-02-21T10:23:47.8029998Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r11115,%r11116,%r11117,%r11118}, %rd11, %p136, 1, 1; 2026-02-21T10:23:47.8030071Z // end inline asm 2026-02-21T10:23:47.8030136Z // begin inline asm 2026-02-21T10:23:47.8031617Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r11247,%r11248,%r11249,%r11250}, %rd12, %p136, 1, 1; 2026-02-21T10:23:47.8031685Z // end inline asm 2026-02-21T10:23:47.8031757Z // begin inline asm 2026-02-21T10:23:47.8033238Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r11379,%r11380,%r11381,%r11382}, %rd13, %p136, 1, 1; 2026-02-21T10:23:47.8033301Z // end inline asm 2026-02-21T10:23:47.8033368Z // begin inline asm 2026-02-21T10:23:47.8034838Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r11511,%r11512,%r11513,%r11514}, %rd14, %p136, 1, 1; 2026-02-21T10:23:47.8034898Z // end inline asm 2026-02-21T10:23:47.8034963Z // begin inline asm 2026-02-21T10:23:47.8036636Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r11643,%r11644,%r11645,%r11646}, %rd15, %p136, 1, 1; 2026-02-21T10:23:47.8036780Z // end inline asm 2026-02-21T10:23:47.8036844Z // begin inline asm 2026-02-21T10:23:47.8038376Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r11775,%r11776,%r11777,%r11778}, %rd16, %p136, 1, 1; 2026-02-21T10:23:47.8038454Z // end inline asm 2026-02-21T10:23:47.8038578Z // begin inline asm 2026-02-21T10:23:47.8040081Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r11907,%r11908,%r11909,%r11910}, %rd17, %p136, 1, 1; 2026-02-21T10:23:47.8040150Z // end inline asm 2026-02-21T10:23:47.8040209Z // begin inline asm 2026-02-21T10:23:47.8041710Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r12039,%r12040,%r12041,%r12042}, %rd18, %p136, 1, 1; 2026-02-21T10:23:47.8041772Z // end inline asm 2026-02-21T10:23:47.8041851Z wgmma.commit_group.sync.aligned; 2026-02-21T10:23:47.8041918Z mov.b32 %r12108, %r15753; 2026-02-21T10:23:47.8041984Z mov.b32 %r12109, %r15753; 2026-02-21T10:23:47.8042045Z mov.b32 %r12107, %r10980; 2026-02-21T10:23:47.8042123Z // begin inline asm 2026-02-21T10:23:47.8043416Z // wait for regs: %r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283,%r12107,%r12108,%r12109 2026-02-21T10:23:47.8043497Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:23:47.8043558Z // end inline asm 2026-02-21T10:23:47.8043615Z $L__tmp18: 2026-02-21T10:23:47.8043840Z .loc 1 54 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:54:32 2026-02-21T10:23:47.8043977Z add.s64 %rd291, %rd270, 128; 2026-02-21T10:23:47.8044048Z add.s64 %rd294, %rd273, 128; 2026-02-21T10:23:47.8044110Z add.s64 %rd297, %rd276, 128; 2026-02-21T10:23:47.8044322Z .loc 1 54 80 // 
cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:54:80 2026-02-21T10:23:47.8044451Z add.s64 %rd300, %rd279, 128; 2026-02-21T10:23:47.8044514Z // begin inline asm 2026-02-21T10:23:47.8044578Z mov.u64 %rd290, 0x0; 2026-02-21T10:23:47.8044728Z createpolicy.fractional.L2::evict_last.b64 %rd290, 1.0; 2026-02-21T10:23:47.8044789Z // end inline asm 2026-02-21T10:23:47.8044849Z // begin inline asm 2026-02-21T10:23:47.8044909Z mov.u32 %r12177, 0x0; 2026-02-21T10:23:47.8044974Z mov.u32 %r12178, 0x0; 2026-02-21T10:23:47.8045033Z mov.u32 %r12179, 0x0; 2026-02-21T10:23:47.8045091Z mov.u32 %r12180, 0x0; 2026-02-21T10:23:47.8045336Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r12177, %r12178, %r12179, %r12180 }, [ %rd291 + 0 ], %rd290; 2026-02-21T10:23:47.8045397Z // end inline asm 2026-02-21T10:23:47.8045504Z // begin inline asm 2026-02-21T10:23:47.8045572Z mov.u64 %rd293, 0x0; 2026-02-21T10:23:47.8045694Z createpolicy.fractional.L2::evict_last.b64 %rd293, 1.0; 2026-02-21T10:23:47.8045756Z // end inline asm 2026-02-21T10:23:47.8045816Z // begin inline asm 2026-02-21T10:23:47.8045882Z mov.u32 %r12181, 0x0; 2026-02-21T10:23:47.8045941Z mov.u32 %r12182, 0x0; 2026-02-21T10:23:47.8046043Z mov.u32 %r12183, 0x0; 2026-02-21T10:23:47.8046111Z mov.u32 %r12184, 0x0; 2026-02-21T10:23:47.8046341Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r12181, %r12182, %r12183, %r12184 }, [ %rd294 + 0 ], %rd293; 2026-02-21T10:23:47.8046401Z // end inline asm 2026-02-21T10:23:47.8046576Z // begin inline asm 2026-02-21T10:23:47.8046646Z mov.u64 %rd296, 0x0; 2026-02-21T10:23:47.8046764Z createpolicy.fractional.L2::evict_last.b64 %rd296, 1.0; 2026-02-21T10:23:47.8046822Z // end inline asm 2026-02-21T10:23:47.8046885Z // begin inline asm 2026-02-21T10:23:47.8046948Z mov.u32 %r12185, 0x0; 2026-02-21T10:23:47.8047008Z mov.u32 %r12186, 0x0; 2026-02-21T10:23:47.8047070Z mov.u32 %r12187, 0x0; 2026-02-21T10:23:47.8047129Z mov.u32 %r12188, 0x0; 2026-02-21T10:23:47.8047351Z 
ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r12185, %r12186, %r12187, %r12188 }, [ %rd297 + 0 ], %rd296; 2026-02-21T10:23:47.8047411Z // end inline asm 2026-02-21T10:23:47.8047476Z // begin inline asm 2026-02-21T10:23:47.8047537Z mov.u64 %rd299, 0x0; 2026-02-21T10:23:47.8047653Z createpolicy.fractional.L2::evict_last.b64 %rd299, 1.0; 2026-02-21T10:23:47.8047715Z // end inline asm 2026-02-21T10:23:47.8047778Z // begin inline asm 2026-02-21T10:23:47.8047835Z mov.u32 %r12189, 0x0; 2026-02-21T10:23:47.8047894Z mov.u32 %r12190, 0x0; 2026-02-21T10:23:47.8047956Z mov.u32 %r12191, 0x0; 2026-02-21T10:23:47.8048017Z mov.u32 %r12192, 0x0; 2026-02-21T10:23:47.8048257Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r12189, %r12190, %r12191, %r12192 }, [ %rd300 + 0 ], %rd299; 2026-02-21T10:23:47.8048325Z // end inline asm 2026-02-21T10:23:47.8048547Z .loc 1 58 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:58:32 2026-02-21T10:23:47.8048610Z bar.sync 0; 2026-02-21T10:23:47.8048704Z st.shared.v2.b32 [%r60], {%r12177, %r12178}; 2026-02-21T10:23:47.8048801Z st.shared.v2.b32 [%r60+4096], {%r12181, %r12182}; 2026-02-21T10:23:47.8048890Z st.shared.v2.b32 [%r60+8192], {%r12185, %r12186}; 2026-02-21T10:23:47.8048985Z st.shared.v2.b32 [%r60+12288], {%r12189, %r12190}; 2026-02-21T10:23:47.8049073Z st.shared.v2.b32 [%r61], {%r12179, %r12180}; 2026-02-21T10:23:47.8049161Z st.shared.v2.b32 [%r61+4096], {%r12183, %r12184}; 2026-02-21T10:23:47.8049248Z st.shared.v2.b32 [%r61+8192], {%r12187, %r12188}; 2026-02-21T10:23:47.8049341Z st.shared.v2.b32 [%r61+12288], {%r12191, %r12192}; 2026-02-21T10:23:47.8049401Z bar.sync 0; 2026-02-21T10:23:47.8049472Z ld.shared.b16 %rs1734, [%r62]; 2026-02-21T10:23:47.8049549Z ld.shared.b16 %rs1735, [%r62+1024]; 2026-02-21T10:23:47.8049703Z ld.shared.b16 %rs1736, [%r62+64]; 2026-02-21T10:23:47.8049772Z ld.shared.b16 %rs1737, [%r62+1088]; 2026-02-21T10:23:47.8049845Z ld.shared.b16 %rs1738, [%r63]; 2026-02-21T10:23:47.8049931Z ld.shared.b16 
%rs1739, [%r63+1024]; 2026-02-21T10:23:47.8050067Z ld.shared.b16 %rs1740, [%r63+64]; 2026-02-21T10:23:47.8050135Z ld.shared.b16 %rs1741, [%r63+1088]; 2026-02-21T10:23:47.8050206Z ld.shared.b16 %rs1742, [%r64]; 2026-02-21T10:23:47.8050275Z ld.shared.b16 %rs1743, [%r64+1024]; 2026-02-21T10:23:47.8050341Z ld.shared.b16 %rs1744, [%r64+64]; 2026-02-21T10:23:47.8050411Z ld.shared.b16 %rs1745, [%r64+1088]; 2026-02-21T10:23:47.8050487Z ld.shared.b16 %rs1746, [%r65]; 2026-02-21T10:23:47.8050553Z ld.shared.b16 %rs1747, [%r65+1024]; 2026-02-21T10:23:47.8050621Z ld.shared.b16 %rs1748, [%r65+64]; 2026-02-21T10:23:47.8050695Z ld.shared.b16 %rs1749, [%r65+1088]; 2026-02-21T10:23:47.8050765Z ld.shared.b16 %rs1750, [%r66]; 2026-02-21T10:23:47.8050842Z ld.shared.b16 %rs1751, [%r66+1024]; 2026-02-21T10:23:47.8050979Z ld.shared.b16 %rs1752, [%r66+64]; 2026-02-21T10:23:47.8051052Z ld.shared.b16 %rs1753, [%r66+1088]; 2026-02-21T10:23:47.8051118Z ld.shared.b16 %rs1754, [%r67]; 2026-02-21T10:23:47.8051188Z ld.shared.b16 %rs1755, [%r67+1024]; 2026-02-21T10:23:47.8051276Z ld.shared.b16 %rs1756, [%r67+64]; 2026-02-21T10:23:47.8051344Z ld.shared.b16 %rs1757, [%r67+1088]; 2026-02-21T10:23:47.8051467Z ld.shared.b16 %rs1758, [%r68]; 2026-02-21T10:23:47.8051549Z ld.shared.b16 %rs1759, [%r68+1024]; 2026-02-21T10:23:47.8051620Z ld.shared.b16 %rs1760, [%r68+64]; 2026-02-21T10:23:47.8051687Z ld.shared.b16 %rs1761, [%r68+1088]; 2026-02-21T10:23:47.8051756Z ld.shared.b16 %rs1762, [%r69]; 2026-02-21T10:23:47.8051826Z ld.shared.b16 %rs1763, [%r69+1024]; 2026-02-21T10:23:47.8051892Z ld.shared.b16 %rs1764, [%r69+64]; 2026-02-21T10:23:47.8051964Z ld.shared.b16 %rs1765, [%r69+1088]; 2026-02-21T10:23:47.8052034Z cvt.f32.bf16 %r12330, %rs1734; 2026-02-21T10:23:47.8052100Z cvt.f32.bf16 %r12331, %rs1735; 2026-02-21T10:23:47.8052165Z cvt.f32.bf16 %r12332, %rs1738; 2026-02-21T10:23:47.8052234Z cvt.f32.bf16 %r12333, %rs1739; 2026-02-21T10:23:47.8052295Z cvt.f32.bf16 %r12462, %rs1742; 2026-02-21T10:23:47.8052357Z 
cvt.f32.bf16 %r12463, %rs1743; 2026-02-21T10:23:47.8052420Z cvt.f32.bf16 %r12464, %rs1746; 2026-02-21T10:23:47.8052496Z cvt.f32.bf16 %r12465, %rs1747; 2026-02-21T10:23:47.8052565Z cvt.f32.bf16 %r12594, %rs1750; 2026-02-21T10:23:47.8052627Z cvt.f32.bf16 %r12595, %rs1751; 2026-02-21T10:23:47.8052694Z cvt.f32.bf16 %r12596, %rs1754; 2026-02-21T10:23:47.8052755Z cvt.f32.bf16 %r12597, %rs1755; 2026-02-21T10:23:47.8052816Z cvt.f32.bf16 %r12726, %rs1758; 2026-02-21T10:23:47.8052878Z cvt.f32.bf16 %r12727, %rs1759; 2026-02-21T10:23:47.8052949Z cvt.f32.bf16 %r12728, %rs1762; 2026-02-21T10:23:47.8053011Z cvt.f32.bf16 %r12729, %rs1763; 2026-02-21T10:23:47.8053072Z cvt.f32.bf16 %r12858, %rs1736; 2026-02-21T10:23:47.8053138Z cvt.f32.bf16 %r12859, %rs1737; 2026-02-21T10:23:47.8053206Z cvt.f32.bf16 %r12860, %rs1740; 2026-02-21T10:23:47.8053269Z cvt.f32.bf16 %r12861, %rs1741; 2026-02-21T10:23:47.8053332Z cvt.f32.bf16 %r12990, %rs1744; 2026-02-21T10:23:47.8053398Z cvt.f32.bf16 %r12991, %rs1745; 2026-02-21T10:23:47.8053463Z cvt.f32.bf16 %r12992, %rs1748; 2026-02-21T10:23:47.8053526Z cvt.f32.bf16 %r12993, %rs1749; 2026-02-21T10:23:47.8053592Z cvt.f32.bf16 %r13122, %rs1752; 2026-02-21T10:23:47.8053658Z cvt.f32.bf16 %r13123, %rs1753; 2026-02-21T10:23:47.8053721Z cvt.f32.bf16 %r13124, %rs1756; 2026-02-21T10:23:47.8053790Z cvt.f32.bf16 %r13125, %rs1757; 2026-02-21T10:23:47.8053852Z cvt.f32.bf16 %r13254, %rs1760; 2026-02-21T10:23:47.8053914Z cvt.f32.bf16 %r13255, %rs1761; 2026-02-21T10:23:47.8053977Z cvt.f32.bf16 %r13256, %rs1764; 2026-02-21T10:23:47.8054044Z cvt.f32.bf16 %r13257, %rs1765; 2026-02-21T10:23:47.8054274Z .loc 1 60 33 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:60:33 2026-02-21T10:23:47.8054393Z bar.sync 0; 2026-02-21T10:23:47.8054462Z // begin inline asm 2026-02-21T10:23:47.8054565Z @%p132 mbarrier.init.shared::cta.b64 [%r10978], 1; 2026-02-21T10:23:47.8054624Z // end inline asm 2026-02-21T10:23:47.8054679Z bar.sync 0; 2026-02-21T10:23:47.8054791Z // begin 
inline asm 2026-02-21T10:23:47.8054929Z @%p132 mbarrier.arrive.expect_tx.shared.b64 _, [%r10978], 4096; 2026-02-21T10:23:47.8054986Z // end inline asm 2026-02-21T10:23:47.8055053Z // begin inline asm 2026-02-21T10:23:47.8055133Z fence.proxy.async.shared::cta; 2026-02-21T10:23:47.8055190Z // end inline asm 2026-02-21T10:23:47.8055246Z bar.sync 0; 2026-02-21T10:23:47.8055322Z elect.sync %r15856|%p183, -1; 2026-02-21T10:23:47.8055392Z and.pred %p146, %p1, %p183; 2026-02-21T10:23:47.8055456Z or.b32 %r12197, %r10982, 32; 2026-02-21T10:23:47.8055519Z // begin inline asm 2026-02-21T10:23:47.8055859Z @%p146 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r10980], [%rd281, {%r10981, %r12197}], [%r10978]; 2026-02-21T10:23:47.8055984Z // end inline asm 2026-02-21T10:23:47.8056045Z bar.sync 0; 2026-02-21T10:23:47.8056105Z // begin inline asm 2026-02-21T10:23:47.8056158Z 2026-02-21T10:23:47.8056211Z { 2026-02-21T10:23:47.8056283Z .reg .pred complete; 2026-02-21T10:23:47.8056347Z waitLoop: 2026-02-21T10:23:47.8056611Z mbarrier.try_wait.parity.shared.b64 complete, [%r10978], %r15753; 2026-02-21T10:23:47.8056692Z @!complete bra.uni waitLoop; 2026-02-21T10:23:47.8056826Z } 2026-02-21T10:23:47.8056833Z 2026-02-21T10:23:47.8056896Z // end inline asm 2026-02-21T10:23:47.8056958Z bar.sync 0; 2026-02-21T10:23:47.8057022Z // begin inline asm 2026-02-21T10:23:47.8057120Z @%p132 mbarrier.inval.shared::cta.b64 [%r10978]; 2026-02-21T10:23:47.8057180Z // end inline asm 2026-02-21T10:23:47.8057399Z .loc 1 78 58 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:78:58 2026-02-21T10:23:47.8057468Z ld.shared.b8 %rs1766, [%r70]; 2026-02-21T10:23:47.8057543Z ld.shared.b8 %rs1767, [%r70+1024]; 2026-02-21T10:23:47.8057618Z ld.shared.b8 %rs1768, [%r70+2048]; 2026-02-21T10:23:47.8057685Z ld.shared.b8 %rs1769, [%r70+3072]; 2026-02-21T10:23:47.8057754Z ld.shared.b8 %rs1770, [%r71+128]; 2026-02-21T10:23:47.8057820Z ld.shared.b8 %rs1771, [%r71+1152]; 
2026-02-21T10:23:47.8057896Z ld.shared.b8 %rs1772, [%r71+2176]; 2026-02-21T10:23:47.8057962Z ld.shared.b8 %rs1773, [%r71+3200]; 2026-02-21T10:23:47.8058034Z ld.shared.b8 %rs1774, [%r72+256]; 2026-02-21T10:23:47.8058106Z ld.shared.b8 %rs1775, [%r72+1280]; 2026-02-21T10:23:47.8058171Z ld.shared.b8 %rs1776, [%r72+2304]; 2026-02-21T10:23:47.8058239Z ld.shared.b8 %rs1777, [%r72+3328]; 2026-02-21T10:23:47.8058304Z ld.shared.b8 %rs1778, [%r73+384]; 2026-02-21T10:23:47.8058372Z ld.shared.b8 %rs1779, [%r73+1408]; 2026-02-21T10:23:47.8058437Z ld.shared.b8 %rs1780, [%r73+2432]; 2026-02-21T10:23:47.8058500Z ld.shared.b8 %rs1781, [%r73+3456]; 2026-02-21T10:23:47.8058574Z ld.shared.b8 %rs1782, [%r74+512]; 2026-02-21T10:23:47.8058644Z ld.shared.b8 %rs1783, [%r74+1536]; 2026-02-21T10:23:47.8058709Z ld.shared.b8 %rs1784, [%r74+2560]; 2026-02-21T10:23:47.8058783Z ld.shared.b8 %rs1785, [%r74+3584]; 2026-02-21T10:23:47.8058851Z ld.shared.b8 %rs1786, [%r75+640]; 2026-02-21T10:23:47.8058919Z ld.shared.b8 %rs1787, [%r75+1664]; 2026-02-21T10:23:47.8058987Z ld.shared.b8 %rs1788, [%r75+2688]; 2026-02-21T10:23:47.8059072Z ld.shared.b8 %rs1789, [%r75+3712]; 2026-02-21T10:23:47.8059141Z ld.shared.b8 %rs1790, [%r76+768]; 2026-02-21T10:23:47.8059207Z ld.shared.b8 %rs1791, [%r76+1792]; 2026-02-21T10:23:47.8059277Z ld.shared.b8 %rs1792, [%r76+2816]; 2026-02-21T10:23:47.8059342Z ld.shared.b8 %rs1793, [%r76+3840]; 2026-02-21T10:23:47.8059408Z ld.shared.b8 %rs1794, [%r77+896]; 2026-02-21T10:23:47.8059475Z ld.shared.b8 %rs1795, [%r77+1920]; 2026-02-21T10:23:47.8059544Z ld.shared.b8 %rs1796, [%r77+2944]; 2026-02-21T10:23:47.8059610Z ld.shared.b8 %rs1797, [%r77+3968]; 2026-02-21T10:23:47.8059819Z .loc 1 63 28 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:63:28 2026-02-21T10:23:47.8059965Z shl.b16 %rs1798, %rs1766, 4; 2026-02-21T10:23:47.8060031Z shl.b16 %rs1799, %rs1770, 4; 2026-02-21T10:23:47.8060092Z shl.b16 %rs1800, %rs1774, 4; 2026-02-21T10:23:47.8060220Z shl.b16 %rs1801, %rs1778, 4; 
2026-02-21T10:23:47.8060293Z shl.b16 %rs1802, %rs1782, 4; 2026-02-21T10:23:47.8060356Z shl.b16 %rs1803, %rs1786, 4; 2026-02-21T10:23:47.8060420Z shl.b16 %rs1804, %rs1790, 4; 2026-02-21T10:23:47.8060486Z shl.b16 %rs1805, %rs1794, 4; 2026-02-21T10:23:47.8060548Z shl.b16 %rs1806, %rs1767, 4; 2026-02-21T10:23:47.8060613Z shl.b16 %rs1807, %rs1771, 4; 2026-02-21T10:23:47.8060678Z shl.b16 %rs1808, %rs1775, 4; 2026-02-21T10:23:47.8060740Z shl.b16 %rs1809, %rs1779, 4; 2026-02-21T10:23:47.8060800Z shl.b16 %rs1810, %rs1783, 4; 2026-02-21T10:23:47.8060862Z shl.b16 %rs1811, %rs1787, 4; 2026-02-21T10:23:47.8060928Z shl.b16 %rs1812, %rs1791, 4; 2026-02-21T10:23:47.8060998Z shl.b16 %rs1813, %rs1795, 4; 2026-02-21T10:23:47.8061126Z shl.b16 %rs1814, %rs1768, 4; 2026-02-21T10:23:47.8061200Z shl.b16 %rs1815, %rs1772, 4; 2026-02-21T10:23:47.8061263Z shl.b16 %rs1816, %rs1776, 4; 2026-02-21T10:23:47.8061325Z shl.b16 %rs1817, %rs1780, 4; 2026-02-21T10:23:47.8061390Z shl.b16 %rs1818, %rs1784, 4; 2026-02-21T10:23:47.8061456Z shl.b16 %rs1819, %rs1788, 4; 2026-02-21T10:23:47.8061516Z shl.b16 %rs1820, %rs1792, 4; 2026-02-21T10:23:47.8061623Z shl.b16 %rs1821, %rs1796, 4; 2026-02-21T10:23:47.8061702Z shl.b16 %rs1822, %rs1769, 4; 2026-02-21T10:23:47.8061767Z shl.b16 %rs1823, %rs1773, 4; 2026-02-21T10:23:47.8061829Z shl.b16 %rs1824, %rs1777, 4; 2026-02-21T10:23:47.8061896Z shl.b16 %rs1825, %rs1781, 4; 2026-02-21T10:23:47.8061957Z shl.b16 %rs1826, %rs1785, 4; 2026-02-21T10:23:47.8062020Z shl.b16 %rs1827, %rs1789, 4; 2026-02-21T10:23:47.8062082Z shl.b16 %rs1828, %rs1793, 4; 2026-02-21T10:23:47.8062149Z shl.b16 %rs1829, %rs1797, 4; 2026-02-21T10:23:47.8062355Z .loc 1 78 58 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:78:58 2026-02-21T10:23:47.8062439Z selp.b16 %rs1830, %rs1798, %rs1766, %p188; 2026-02-21T10:23:47.8062502Z cvt.s16.s8 %rs1831, %rs1830; 2026-02-21T10:23:47.8062576Z shr.s16 %rs1832, %rs1831, 4; 2026-02-21T10:23:47.8062662Z selp.b16 %rs1833, %rs1799, %rs1770, %p188; 
2026-02-21T10:23:47.8062724Z cvt.s16.s8 %rs1834, %rs1833; 2026-02-21T10:23:47.8062788Z shr.s16 %rs1835, %rs1834, 4; 2026-02-21T10:23:47.8062868Z selp.b16 %rs1836, %rs1800, %rs1774, %p188; 2026-02-21T10:23:47.8062931Z cvt.s16.s8 %rs1837, %rs1836; 2026-02-21T10:23:47.8062995Z shr.s16 %rs1838, %rs1837, 4; 2026-02-21T10:23:47.8063070Z selp.b16 %rs1839, %rs1801, %rs1778, %p188; 2026-02-21T10:23:47.8063137Z cvt.s16.s8 %rs1840, %rs1839; 2026-02-21T10:23:47.8063199Z shr.s16 %rs1841, %rs1840, 4; 2026-02-21T10:23:47.8063273Z selp.b16 %rs1842, %rs1802, %rs1782, %p188; 2026-02-21T10:23:47.8063338Z cvt.s16.s8 %rs1843, %rs1842; 2026-02-21T10:23:47.8063403Z shr.s16 %rs1844, %rs1843, 4; 2026-02-21T10:23:47.8063480Z selp.b16 %rs1845, %rs1803, %rs1786, %p188; 2026-02-21T10:23:47.8063542Z cvt.s16.s8 %rs1846, %rs1845; 2026-02-21T10:23:47.8063610Z shr.s16 %rs1847, %rs1846, 4; 2026-02-21T10:23:47.8063684Z selp.b16 %rs1848, %rs1804, %rs1790, %p188; 2026-02-21T10:23:47.8063748Z cvt.s16.s8 %rs1849, %rs1848; 2026-02-21T10:23:47.8063813Z shr.s16 %rs1850, %rs1849, 4; 2026-02-21T10:23:47.8063887Z selp.b16 %rs1851, %rs1805, %rs1794, %p188; 2026-02-21T10:23:47.8063948Z cvt.s16.s8 %rs1852, %rs1851; 2026-02-21T10:23:47.8064012Z shr.s16 %rs1853, %rs1852, 4; 2026-02-21T10:23:47.8064086Z selp.b16 %rs1854, %rs1806, %rs1767, %p188; 2026-02-21T10:23:47.8064150Z cvt.s16.s8 %rs1855, %rs1854; 2026-02-21T10:23:47.8064211Z shr.s16 %rs1856, %rs1855, 4; 2026-02-21T10:23:47.8064290Z selp.b16 %rs1857, %rs1807, %rs1771, %p188; 2026-02-21T10:23:47.8064354Z cvt.s16.s8 %rs1858, %rs1857; 2026-02-21T10:23:47.8064415Z shr.s16 %rs1859, %rs1858, 4; 2026-02-21T10:23:47.8064557Z selp.b16 %rs1860, %rs1808, %rs1775, %p188; 2026-02-21T10:23:47.8064619Z cvt.s16.s8 %rs1861, %rs1860; 2026-02-21T10:23:47.8064680Z shr.s16 %rs1862, %rs1861, 4; 2026-02-21T10:23:47.8064756Z selp.b16 %rs1863, %rs1809, %rs1779, %p188; 2026-02-21T10:23:47.8064873Z cvt.s16.s8 %rs1864, %rs1863; 2026-02-21T10:23:47.8064934Z shr.s16 %rs1865, %rs1864, 4; 
2026-02-21T10:23:47.8065023Z selp.b16 %rs1866, %rs1810, %rs1783, %p188; 2026-02-21T10:23:47.8065093Z cvt.s16.s8 %rs1867, %rs1866; 2026-02-21T10:23:47.8065155Z shr.s16 %rs1868, %rs1867, 4; 2026-02-21T10:23:47.8065228Z selp.b16 %rs1869, %rs1811, %rs1787, %p188; 2026-02-21T10:23:47.8065293Z cvt.s16.s8 %rs1870, %rs1869; 2026-02-21T10:23:47.8065357Z shr.s16 %rs1871, %rs1870, 4; 2026-02-21T10:23:47.8065430Z selp.b16 %rs1872, %rs1812, %rs1791, %p188; 2026-02-21T10:23:47.8065490Z cvt.s16.s8 %rs1873, %rs1872; 2026-02-21T10:23:47.8065554Z shr.s16 %rs1874, %rs1873, 4; 2026-02-21T10:23:47.8065628Z selp.b16 %rs1875, %rs1813, %rs1795, %p188; 2026-02-21T10:23:47.8065742Z cvt.s16.s8 %rs1876, %rs1875; 2026-02-21T10:23:47.8065808Z shr.s16 %rs1877, %rs1876, 4; 2026-02-21T10:23:47.8065881Z selp.b16 %rs1878, %rs1814, %rs1768, %p188; 2026-02-21T10:23:47.8065944Z cvt.s16.s8 %rs1879, %rs1878; 2026-02-21T10:23:47.8066008Z shr.s16 %rs1880, %rs1879, 4; 2026-02-21T10:23:47.8066088Z selp.b16 %rs1881, %rs1815, %rs1772, %p188; 2026-02-21T10:23:47.8066149Z cvt.s16.s8 %rs1882, %rs1881; 2026-02-21T10:23:47.8066256Z shr.s16 %rs1883, %rs1882, 4; 2026-02-21T10:23:47.8066336Z selp.b16 %rs1884, %rs1816, %rs1776, %p188; 2026-02-21T10:23:47.8066399Z cvt.s16.s8 %rs1885, %rs1884; 2026-02-21T10:23:47.8066588Z shr.s16 %rs1886, %rs1885, 4; 2026-02-21T10:23:47.8066673Z selp.b16 %rs1887, %rs1817, %rs1780, %p188; 2026-02-21T10:23:47.8066742Z cvt.s16.s8 %rs1888, %rs1887; 2026-02-21T10:23:47.8066804Z shr.s16 %rs1889, %rs1888, 4; 2026-02-21T10:23:47.8066878Z selp.b16 %rs1890, %rs1818, %rs1784, %p188; 2026-02-21T10:23:47.8066942Z cvt.s16.s8 %rs1891, %rs1890; 2026-02-21T10:23:47.8067009Z shr.s16 %rs1892, %rs1891, 4; 2026-02-21T10:23:47.8067084Z selp.b16 %rs1893, %rs1819, %rs1788, %p188; 2026-02-21T10:23:47.8067148Z cvt.s16.s8 %rs1894, %rs1893; 2026-02-21T10:23:47.8067210Z shr.s16 %rs1895, %rs1894, 4; 2026-02-21T10:23:47.8067287Z selp.b16 %rs1896, %rs1820, %rs1792, %p188; 2026-02-21T10:23:47.8067351Z cvt.s16.s8 
%rs1897, %rs1896; 2026-02-21T10:23:47.8067417Z shr.s16 %rs1898, %rs1897, 4; 2026-02-21T10:23:47.8067493Z selp.b16 %rs1899, %rs1821, %rs1796, %p188; 2026-02-21T10:23:47.8067554Z cvt.s16.s8 %rs1900, %rs1899; 2026-02-21T10:23:47.8067619Z shr.s16 %rs1901, %rs1900, 4; 2026-02-21T10:23:47.8067695Z selp.b16 %rs1902, %rs1822, %rs1769, %p188; 2026-02-21T10:23:47.8067756Z cvt.s16.s8 %rs1903, %rs1902; 2026-02-21T10:23:47.8067820Z shr.s16 %rs1904, %rs1903, 4; 2026-02-21T10:23:47.8067898Z selp.b16 %rs1905, %rs1823, %rs1773, %p188; 2026-02-21T10:23:47.8067958Z cvt.s16.s8 %rs1906, %rs1905; 2026-02-21T10:23:47.8068019Z shr.s16 %rs1907, %rs1906, 4; 2026-02-21T10:23:47.8068101Z selp.b16 %rs1908, %rs1824, %rs1777, %p188; 2026-02-21T10:23:47.8068163Z cvt.s16.s8 %rs1909, %rs1908; 2026-02-21T10:23:47.8068226Z shr.s16 %rs1910, %rs1909, 4; 2026-02-21T10:23:47.8068303Z selp.b16 %rs1911, %rs1825, %rs1781, %p188; 2026-02-21T10:23:47.8068368Z cvt.s16.s8 %rs1912, %rs1911; 2026-02-21T10:23:47.8068428Z shr.s16 %rs1913, %rs1912, 4; 2026-02-21T10:23:47.8068574Z selp.b16 %rs1914, %rs1826, %rs1785, %p188; 2026-02-21T10:23:47.8068643Z cvt.s16.s8 %rs1915, %rs1914; 2026-02-21T10:23:47.8068704Z shr.s16 %rs1916, %rs1915, 4; 2026-02-21T10:23:47.8068781Z selp.b16 %rs1917, %rs1827, %rs1789, %p188; 2026-02-21T10:23:47.8068844Z cvt.s16.s8 %rs1918, %rs1917; 2026-02-21T10:23:47.8068903Z shr.s16 %rs1919, %rs1918, 4; 2026-02-21T10:23:47.8068975Z selp.b16 %rs1920, %rs1828, %rs1793, %p188; 2026-02-21T10:23:47.8069035Z cvt.s16.s8 %rs1921, %rs1920; 2026-02-21T10:23:47.8069098Z shr.s16 %rs1922, %rs1921, 4; 2026-02-21T10:23:47.8069170Z selp.b16 %rs1923, %rs1829, %rs1797, %p188; 2026-02-21T10:23:47.8069330Z cvt.s16.s8 %rs1924, %rs1923; 2026-02-21T10:23:47.8069393Z shr.s16 %rs1925, %rs1924, 4; 2026-02-21T10:23:47.8069602Z .loc 1 83 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:83:32 2026-02-21T10:23:47.8069741Z cvt.rn.f32.s16 %r15857, %rs1832; 2026-02-21T10:23:47.8069810Z cvt.rn.f32.s16 %r15858, %rs1835; 
2026-02-21T10:23:47.8069875Z cvt.rn.f32.s16 %r15859, %rs1838; 2026-02-21T10:23:47.8069938Z cvt.rn.f32.s16 %r15860, %rs1841; 2026-02-21T10:23:47.8070004Z cvt.rn.f32.s16 %r15861, %rs1844; 2026-02-21T10:23:47.8070066Z cvt.rn.f32.s16 %r15862, %rs1847; 2026-02-21T10:23:47.8070127Z cvt.rn.f32.s16 %r15863, %rs1850; 2026-02-21T10:23:47.8070197Z cvt.rn.f32.s16 %r15864, %rs1853; 2026-02-21T10:23:47.8070263Z cvt.rn.f32.s16 %r15865, %rs1856; 2026-02-21T10:23:47.8070323Z cvt.rn.f32.s16 %r15866, %rs1859; 2026-02-21T10:23:47.8070388Z cvt.rn.f32.s16 %r15867, %rs1862; 2026-02-21T10:23:47.8070450Z cvt.rn.f32.s16 %r15868, %rs1865; 2026-02-21T10:23:47.8070574Z cvt.rn.f32.s16 %r15869, %rs1868; 2026-02-21T10:23:47.8070639Z cvt.rn.f32.s16 %r15870, %rs1871; 2026-02-21T10:23:47.8070704Z cvt.rn.f32.s16 %r15871, %rs1874; 2026-02-21T10:23:47.8070767Z cvt.rn.f32.s16 %r15872, %rs1877; 2026-02-21T10:23:47.8070832Z cvt.rn.f32.s16 %r15873, %rs1880; 2026-02-21T10:23:47.8070896Z cvt.rn.f32.s16 %r15874, %rs1883; 2026-02-21T10:23:47.8071028Z cvt.rn.f32.s16 %r15875, %rs1886; 2026-02-21T10:23:47.8071091Z cvt.rn.f32.s16 %r15876, %rs1889; 2026-02-21T10:23:47.8071155Z cvt.rn.f32.s16 %r15877, %rs1892; 2026-02-21T10:23:47.8071218Z cvt.rn.f32.s16 %r15878, %rs1895; 2026-02-21T10:23:47.8071279Z cvt.rn.f32.s16 %r15879, %rs1898; 2026-02-21T10:23:47.8071339Z cvt.rn.f32.s16 %r15880, %rs1901; 2026-02-21T10:23:47.8071412Z cvt.rn.f32.s16 %r15881, %rs1904; 2026-02-21T10:23:47.8071476Z cvt.rn.f32.s16 %r15882, %rs1907; 2026-02-21T10:23:47.8071538Z cvt.rn.f32.s16 %r15883, %rs1910; 2026-02-21T10:23:47.8071600Z cvt.rn.f32.s16 %r15884, %rs1913; 2026-02-21T10:23:47.8071665Z cvt.rn.f32.s16 %r15885, %rs1916; 2026-02-21T10:23:47.8071727Z cvt.rn.f32.s16 %r15886, %rs1919; 2026-02-21T10:23:47.8071787Z cvt.rn.f32.s16 %r15887, %rs1922; 2026-02-21T10:23:47.8071850Z cvt.rn.f32.s16 %r15888, %rs1925; 2026-02-21T10:23:47.8071908Z bar.sync 0; 2026-02-21T10:23:47.8071974Z st.shared.b32 [%r78], %r15857; 2026-02-21T10:23:47.8072039Z 
st.shared.b32 [%r78+8], %r15858; 2026-02-21T10:23:47.8072110Z st.shared.b32 [%r78+16384], %r15873; 2026-02-21T10:23:47.8072183Z st.shared.b32 [%r78+16392], %r15874; 2026-02-21T10:23:47.8072247Z st.shared.b32 [%r79], %r15859; 2026-02-21T10:23:47.8072321Z st.shared.b32 [%r79+8], %r15860; 2026-02-21T10:23:47.8072388Z st.shared.b32 [%r79+16384], %r15875; 2026-02-21T10:23:47.8072454Z st.shared.b32 [%r79+16392], %r15876; 2026-02-21T10:23:47.8072519Z st.shared.b32 [%r80], %r15861; 2026-02-21T10:23:47.8072582Z st.shared.b32 [%r80+8], %r15862; 2026-02-21T10:23:47.8072645Z st.shared.b32 [%r80+16384], %r15877; 2026-02-21T10:23:47.8072715Z st.shared.b32 [%r80+16392], %r15878; 2026-02-21T10:23:47.8072782Z st.shared.b32 [%r81], %r15863; 2026-02-21T10:23:47.8072843Z st.shared.b32 [%r81+8], %r15864; 2026-02-21T10:23:47.8072908Z st.shared.b32 [%r81+16384], %r15879; 2026-02-21T10:23:47.8072977Z st.shared.b32 [%r81+16392], %r15880; 2026-02-21T10:23:47.8073041Z st.shared.b32 [%r82], %r15865; 2026-02-21T10:23:47.8073106Z st.shared.b32 [%r82+8], %r15866; 2026-02-21T10:23:47.8073172Z st.shared.b32 [%r82+16384], %r15881; 2026-02-21T10:23:47.8073237Z st.shared.b32 [%r82+16392], %r15882; 2026-02-21T10:23:47.8073300Z st.shared.b32 [%r83], %r15867; 2026-02-21T10:23:47.8073362Z st.shared.b32 [%r83+8], %r15868; 2026-02-21T10:23:47.8073429Z st.shared.b32 [%r83+16384], %r15883; 2026-02-21T10:23:47.8073493Z st.shared.b32 [%r83+16392], %r15884; 2026-02-21T10:23:47.8073557Z st.shared.b32 [%r84], %r15869; 2026-02-21T10:23:47.8073622Z st.shared.b32 [%r84+8], %r15870; 2026-02-21T10:23:47.8073757Z st.shared.b32 [%r84+16384], %r15885; 2026-02-21T10:23:47.8073823Z st.shared.b32 [%r84+16392], %r15886; 2026-02-21T10:23:47.8073890Z st.shared.b32 [%r85], %r15871; 2026-02-21T10:23:47.8073956Z st.shared.b32 [%r85+8], %r15872; 2026-02-21T10:23:47.8074090Z st.shared.b32 [%r85+16384], %r15887; 2026-02-21T10:23:47.8074158Z st.shared.b32 [%r85+16392], %r15888; 2026-02-21T10:23:47.8074214Z $L__tmp19: 
2026-02-21T10:23:47.8074497Z .loc 2 291 36 // standard.py:291:36 @[ cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:90:40 ] 2026-02-21T10:23:47.8074558Z // begin inline asm 2026-02-21T10:23:47.8074640Z fence.proxy.async.shared::cta; 2026-02-21T10:23:47.8074697Z // end inline asm 2026-02-21T10:23:47.8074753Z bar.sync 0; 2026-02-21T10:23:47.8074827Z wgmma.fence.sync.aligned; 2026-02-21T10:23:47.8074889Z // begin inline asm 2026-02-21T10:23:47.8076616Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r12330,%r12331,%r12332,%r12333}, %rd11, %p136, 1, 1; 2026-02-21T10:23:47.8076696Z // end inline asm 2026-02-21T10:23:47.8076757Z // begin inline asm 2026-02-21T10:23:47.8078232Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r12462,%r12463,%r12464,%r12465}, %rd12, %p136, 1, 1; 2026-02-21T10:23:47.8078297Z // end inline asm 2026-02-21T10:23:47.8078355Z // begin inline asm 2026-02-21T10:23:47.8079828Z 
wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r12594,%r12595,%r12596,%r12597}, %rd13, %p136, 1, 1; 2026-02-21T10:23:47.8079890Z // end inline asm 2026-02-21T10:23:47.8079964Z // begin inline asm 2026-02-21T10:23:47.8081428Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r12726,%r12727,%r12728,%r12729}, %rd14, %p136, 1, 1; 2026-02-21T10:23:47.8081488Z // end inline asm 2026-02-21T10:23:47.8081548Z // begin inline asm 2026-02-21T10:23:47.8083004Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r12858,%r12859,%r12860,%r12861}, %rd15, %p136, 1, 1; 2026-02-21T10:23:47.8083196Z // end inline asm 2026-02-21T10:23:47.8083254Z // begin inline asm 2026-02-21T10:23:47.8084781Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r12990,%r12991,%r12992,%r12993}, %rd16, %p136, 1, 1; 2026-02-21T10:23:47.8084844Z // end inline asm 2026-02-21T10:23:47.8084905Z // begin inline asm 2026-02-21T10:23:47.8086434Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r13122,%r13123,%r13124,%r13125}, %rd17, %p136, 1, 1; 2026-02-21T10:23:47.8086622Z // end inline asm 2026-02-21T10:23:47.8086691Z // begin inline asm 2026-02-21T10:23:47.8088154Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r13254,%r13255,%r13256,%r13257}, %rd18, %p136, 1, 1; 2026-02-21T10:23:47.8088218Z // end inline asm 2026-02-21T10:23:47.8088296Z wgmma.commit_group.sync.aligned; 2026-02-21T10:23:47.8088359Z mov.b32 %r13323, %r15753; 2026-02-21T10:23:47.8088429Z mov.b32 %r13324, %r15753; 2026-02-21T10:23:47.8088494Z mov.b32 %r13322, %r10980; 2026-02-21T10:23:47.8088553Z // begin inline asm 2026-02-21T10:23:47.8089817Z // wait for regs: 
%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283,%r13322,%r13323,%r13324 2026-02-21T10:23:47.8089903Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:23:47.8089959Z // end inline asm 2026-02-21T10:23:47.8090013Z $L__tmp20: 2026-02-21T10:23:47.8090235Z .loc 1 54 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:54:32 2026-02-21T10:23:47.8090300Z add.s64 %rd312, %rd270, 256; 2026-02-21T10:23:47.8090362Z add.s64 %rd315, %rd273, 256; 2026-02-21T10:23:47.8090431Z add.s64 %rd318, %rd276, 256; 2026-02-21T10:23:47.8090719Z .loc 1 54 80 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:54:80 2026-02-21T10:23:47.8090792Z add.s64 %rd321, %rd279, 256; 2026-02-21T10:23:47.8090857Z // begin inline asm 2026-02-21T10:23:47.8090918Z mov.u64 %rd311, 0x0; 2026-02-21T10:23:47.8091110Z createpolicy.fractional.L2::evict_last.b64 %rd311, 1.0; 2026-02-21T10:23:47.8091168Z // end inline asm 2026-02-21T10:23:47.8091228Z // begin inline asm 2026-02-21T10:23:47.8091289Z mov.u32 %r13392, 0x0; 2026-02-21T10:23:47.8091346Z mov.u32 %r13393, 0x0; 2026-02-21T10:23:47.8091407Z mov.u32 %r13394, 0x0; 2026-02-21T10:23:47.8091463Z mov.u32 %r13395, 0x0; 2026-02-21T10:23:47.8091711Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r13392, %r13393, %r13394, %r13395 }, [ %rd312 + 0 ], %rd311; 2026-02-21T10:23:47.8091772Z // end inline asm 2026-02-21T10:23:47.8091831Z // begin inline asm 2026-02-21T10:23:47.8091891Z mov.u64 %rd314, 0x0; 2026-02-21T10:23:47.8092019Z createpolicy.fractional.L2::evict_last.b64 %rd314, 1.0; 
2026-02-21T10:23:47.8092144Z // end inline asm 2026-02-21T10:23:47.8092206Z // begin inline asm 2026-02-21T10:23:47.8092263Z mov.u32 %r13396, 0x0; 2026-02-21T10:23:47.8092323Z mov.u32 %r13397, 0x0; 2026-02-21T10:23:47.8092379Z mov.u32 %r13398, 0x0; 2026-02-21T10:23:47.8092439Z mov.u32 %r13399, 0x0; 2026-02-21T10:23:47.8092674Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r13396, %r13397, %r13398, %r13399 }, [ %rd315 + 0 ], %rd314; 2026-02-21T10:23:47.8092801Z // end inline asm 2026-02-21T10:23:47.8092862Z // begin inline asm 2026-02-21T10:23:47.8092922Z mov.u64 %rd317, 0x0; 2026-02-21T10:23:47.8093041Z createpolicy.fractional.L2::evict_last.b64 %rd317, 1.0; 2026-02-21T10:23:47.8093097Z // end inline asm 2026-02-21T10:23:47.8093154Z // begin inline asm 2026-02-21T10:23:47.8093213Z mov.u32 %r13400, 0x0; 2026-02-21T10:23:47.8093271Z mov.u32 %r13401, 0x0; 2026-02-21T10:23:47.8093327Z mov.u32 %r13402, 0x0; 2026-02-21T10:23:47.8093384Z mov.u32 %r13403, 0x0; 2026-02-21T10:23:47.8093629Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r13400, %r13401, %r13402, %r13403 }, [ %rd318 + 0 ], %rd317; 2026-02-21T10:23:47.8093687Z // end inline asm 2026-02-21T10:23:47.8093747Z // begin inline asm 2026-02-21T10:23:47.8093808Z mov.u64 %rd320, 0x0; 2026-02-21T10:23:47.8093924Z createpolicy.fractional.L2::evict_last.b64 %rd320, 1.0; 2026-02-21T10:23:47.8093980Z // end inline asm 2026-02-21T10:23:47.8094038Z // begin inline asm 2026-02-21T10:23:47.8094101Z mov.u32 %r13404, 0x0; 2026-02-21T10:23:47.8094161Z mov.u32 %r13405, 0x0; 2026-02-21T10:23:47.8094218Z mov.u32 %r13406, 0x0; 2026-02-21T10:23:47.8094279Z mov.u32 %r13407, 0x0; 2026-02-21T10:23:47.8094499Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r13404, %r13405, %r13406, %r13407 }, [ %rd321 + 0 ], %rd320; 2026-02-21T10:23:47.8094565Z // end inline asm 2026-02-21T10:23:47.8094783Z .loc 1 58 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:58:32 2026-02-21T10:23:47.8094843Z bar.sync 0; 
2026-02-21T10:23:47.8094932Z st.shared.v2.b32 [%r60], {%r13392, %r13393}; 2026-02-21T10:23:47.8095025Z st.shared.v2.b32 [%r60+4096], {%r13396, %r13397}; 2026-02-21T10:23:47.8095115Z st.shared.v2.b32 [%r60+8192], {%r13400, %r13401}; 2026-02-21T10:23:47.8095210Z st.shared.v2.b32 [%r60+12288], {%r13404, %r13405}; 2026-02-21T10:23:47.8095288Z st.shared.v2.b32 [%r61], {%r13394, %r13395}; 2026-02-21T10:23:47.8095378Z st.shared.v2.b32 [%r61+4096], {%r13398, %r13399}; 2026-02-21T10:23:47.8095462Z st.shared.v2.b32 [%r61+8192], {%r13402, %r13403}; 2026-02-21T10:23:47.8095547Z st.shared.v2.b32 [%r61+12288], {%r13406, %r13407}; 2026-02-21T10:23:47.8095606Z bar.sync 0; 2026-02-21T10:23:47.8095675Z ld.shared.b16 %rs1926, [%r62]; 2026-02-21T10:23:47.8095746Z ld.shared.b16 %rs1927, [%r62+1024]; 2026-02-21T10:23:47.8095813Z ld.shared.b16 %rs1928, [%r62+64]; 2026-02-21T10:23:47.8095884Z ld.shared.b16 %rs1929, [%r62+1088]; 2026-02-21T10:23:47.8095949Z ld.shared.b16 %rs1930, [%r63]; 2026-02-21T10:23:47.8096077Z ld.shared.b16 %rs1931, [%r63+1024]; 2026-02-21T10:23:47.8096157Z ld.shared.b16 %rs1932, [%r63+64]; 2026-02-21T10:23:47.8096224Z ld.shared.b16 %rs1933, [%r63+1088]; 2026-02-21T10:23:47.8096288Z ld.shared.b16 %rs1934, [%r64]; 2026-02-21T10:23:47.8096407Z ld.shared.b16 %rs1935, [%r64+1024]; 2026-02-21T10:23:47.8096587Z ld.shared.b16 %rs1936, [%r64+64]; 2026-02-21T10:23:47.8096656Z ld.shared.b16 %rs1937, [%r64+1088]; 2026-02-21T10:23:47.8096723Z ld.shared.b16 %rs1938, [%r65]; 2026-02-21T10:23:47.8096793Z ld.shared.b16 %rs1939, [%r65+1024]; 2026-02-21T10:23:47.8096857Z ld.shared.b16 %rs1940, [%r65+64]; 2026-02-21T10:23:47.8096924Z ld.shared.b16 %rs1941, [%r65+1088]; 2026-02-21T10:23:47.8096992Z ld.shared.b16 %rs1942, [%r66]; 2026-02-21T10:23:47.8097058Z ld.shared.b16 %rs1943, [%r66+1024]; 2026-02-21T10:23:47.8097123Z ld.shared.b16 %rs1944, [%r66+64]; 2026-02-21T10:23:47.8097191Z ld.shared.b16 %rs1945, [%r66+1088]; 2026-02-21T10:23:47.8097260Z ld.shared.b16 %rs1946, [%r67]; 
2026-02-21T10:23:47.8097418Z ld.shared.b16 %rs1947, [%r67+1024]; 2026-02-21T10:23:47.8097488Z ld.shared.b16 %rs1948, [%r67+64]; 2026-02-21T10:23:47.8097557Z ld.shared.b16 %rs1949, [%r67+1088]; 2026-02-21T10:23:47.8097622Z ld.shared.b16 %rs1950, [%r68]; 2026-02-21T10:23:47.8097695Z ld.shared.b16 %rs1951, [%r68+1024]; 2026-02-21T10:23:47.8097761Z ld.shared.b16 %rs1952, [%r68+64]; 2026-02-21T10:23:47.8097828Z ld.shared.b16 %rs1953, [%r68+1088]; 2026-02-21T10:23:47.8097955Z ld.shared.b16 %rs1954, [%r69]; 2026-02-21T10:23:47.8098022Z ld.shared.b16 %rs1955, [%r69+1024]; 2026-02-21T10:23:47.8098097Z ld.shared.b16 %rs1956, [%r69+64]; 2026-02-21T10:23:47.8098167Z ld.shared.b16 %rs1957, [%r69+1088]; 2026-02-21T10:23:47.8098233Z cvt.f32.bf16 %r13545, %rs1926; 2026-02-21T10:23:47.8098298Z cvt.f32.bf16 %r13546, %rs1927; 2026-02-21T10:23:47.8098361Z cvt.f32.bf16 %r13547, %rs1930; 2026-02-21T10:23:47.8098422Z cvt.f32.bf16 %r13548, %rs1931; 2026-02-21T10:23:47.8098484Z cvt.f32.bf16 %r13677, %rs1934; 2026-02-21T10:23:47.8098553Z cvt.f32.bf16 %r13678, %rs1935; 2026-02-21T10:23:47.8098614Z cvt.f32.bf16 %r13679, %rs1938; 2026-02-21T10:23:47.8098675Z cvt.f32.bf16 %r13680, %rs1939; 2026-02-21T10:23:47.8098738Z cvt.f32.bf16 %r13809, %rs1942; 2026-02-21T10:23:47.8098800Z cvt.f32.bf16 %r13810, %rs1943; 2026-02-21T10:23:47.8098861Z cvt.f32.bf16 %r13811, %rs1946; 2026-02-21T10:23:47.8098922Z cvt.f32.bf16 %r13812, %rs1947; 2026-02-21T10:23:47.8098990Z cvt.f32.bf16 %r13941, %rs1950; 2026-02-21T10:23:47.8099050Z cvt.f32.bf16 %r13942, %rs1951; 2026-02-21T10:23:47.8099109Z cvt.f32.bf16 %r13943, %rs1954; 2026-02-21T10:23:47.8099172Z cvt.f32.bf16 %r13944, %rs1955; 2026-02-21T10:23:47.8099232Z cvt.f32.bf16 %r14073, %rs1928; 2026-02-21T10:23:47.8099291Z cvt.f32.bf16 %r14074, %rs1929; 2026-02-21T10:23:47.8099352Z cvt.f32.bf16 %r14075, %rs1932; 2026-02-21T10:23:47.8099417Z cvt.f32.bf16 %r14076, %rs1933; 2026-02-21T10:23:47.8099477Z cvt.f32.bf16 %r14205, %rs1936; 2026-02-21T10:23:47.8099542Z 
cvt.f32.bf16 %r14206, %rs1937; 2026-02-21T10:23:47.8099608Z cvt.f32.bf16 %r14207, %rs1940; 2026-02-21T10:23:47.8099668Z cvt.f32.bf16 %r14208, %rs1941; 2026-02-21T10:23:47.8099728Z cvt.f32.bf16 %r14337, %rs1944; 2026-02-21T10:23:47.8099791Z cvt.f32.bf16 %r14338, %rs1945; 2026-02-21T10:23:47.8099854Z cvt.f32.bf16 %r14339, %rs1948; 2026-02-21T10:23:47.8099914Z cvt.f32.bf16 %r14340, %rs1949; 2026-02-21T10:23:47.8099979Z cvt.f32.bf16 %r14469, %rs1952; 2026-02-21T10:23:47.8100043Z cvt.f32.bf16 %r14470, %rs1953; 2026-02-21T10:23:47.8100105Z cvt.f32.bf16 %r14471, %rs1956; 2026-02-21T10:23:47.8100166Z cvt.f32.bf16 %r14472, %rs1957; 2026-02-21T10:23:47.8100397Z .loc 1 60 33 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:60:33 2026-02-21T10:23:47.8100453Z bar.sync 0; 2026-02-21T10:23:47.8100514Z // begin inline asm 2026-02-21T10:23:47.8100618Z @%p132 mbarrier.init.shared::cta.b64 [%r10978], 1; 2026-02-21T10:23:47.8100680Z // end inline asm 2026-02-21T10:23:47.8100842Z bar.sync 0; 2026-02-21T10:23:47.8100905Z // begin inline asm 2026-02-21T10:23:47.8101046Z @%p132 mbarrier.arrive.expect_tx.shared.b64 _, [%r10978], 4096; 2026-02-21T10:23:47.8101102Z // end inline asm 2026-02-21T10:23:47.8101161Z // begin inline asm 2026-02-21T10:23:47.8101304Z fence.proxy.async.shared::cta; 2026-02-21T10:23:47.8101362Z // end inline asm 2026-02-21T10:23:47.8101416Z bar.sync 0; 2026-02-21T10:23:47.8101487Z elect.sync %r15889|%p184, -1; 2026-02-21T10:23:47.8101560Z and.pred %p158, %p1, %p184; 2026-02-21T10:23:47.8101623Z or.b32 %r13412, %r10982, 64; 2026-02-21T10:23:47.8101681Z // begin inline asm 2026-02-21T10:23:47.8102029Z @%p158 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r10980], [%rd281, {%r10981, %r13412}], [%r10978]; 2026-02-21T10:23:47.8102086Z // end inline asm 2026-02-21T10:23:47.8102141Z bar.sync 0; 2026-02-21T10:23:47.8102199Z // begin inline asm 2026-02-21T10:23:47.8102254Z 2026-02-21T10:23:47.8102307Z { 2026-02-21T10:23:47.8102428Z .reg 
.pred complete; 2026-02-21T10:23:47.8102487Z waitLoop: 2026-02-21T10:23:47.8102637Z mbarrier.try_wait.parity.shared.b64 complete, [%r10978], %r15753; 2026-02-21T10:23:47.8102707Z @!complete bra.uni waitLoop; 2026-02-21T10:23:47.8102760Z } 2026-02-21T10:23:47.8102764Z 2026-02-21T10:23:47.8102822Z // end inline asm 2026-02-21T10:23:47.8102878Z bar.sync 0; 2026-02-21T10:23:47.8102936Z // begin inline asm 2026-02-21T10:23:47.8103083Z @%p132 mbarrier.inval.shared::cta.b64 [%r10978]; 2026-02-21T10:23:47.8103143Z // end inline asm 2026-02-21T10:23:47.8103363Z .loc 1 78 58 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:78:58 2026-02-21T10:23:47.8103435Z ld.shared.b8 %rs1958, [%r70]; 2026-02-21T10:23:47.8103504Z ld.shared.b8 %rs1959, [%r70+1024]; 2026-02-21T10:23:47.8103569Z ld.shared.b8 %rs1960, [%r70+2048]; 2026-02-21T10:23:47.8103633Z ld.shared.b8 %rs1961, [%r70+3072]; 2026-02-21T10:23:47.8103701Z ld.shared.b8 %rs1962, [%r71+128]; 2026-02-21T10:23:47.8103770Z ld.shared.b8 %rs1963, [%r71+1152]; 2026-02-21T10:23:47.8103834Z ld.shared.b8 %rs1964, [%r71+2176]; 2026-02-21T10:23:47.8103901Z ld.shared.b8 %rs1965, [%r71+3200]; 2026-02-21T10:23:47.8103965Z ld.shared.b8 %rs1966, [%r72+256]; 2026-02-21T10:23:47.8104033Z ld.shared.b8 %rs1967, [%r72+1280]; 2026-02-21T10:23:47.8104095Z ld.shared.b8 %rs1968, [%r72+2304]; 2026-02-21T10:23:47.8104162Z ld.shared.b8 %rs1969, [%r72+3328]; 2026-02-21T10:23:47.8104228Z ld.shared.b8 %rs1970, [%r73+384]; 2026-02-21T10:23:47.8104292Z ld.shared.b8 %rs1971, [%r73+1408]; 2026-02-21T10:23:47.8104359Z ld.shared.b8 %rs1972, [%r73+2432]; 2026-02-21T10:23:47.8104424Z ld.shared.b8 %rs1973, [%r73+3456]; 2026-02-21T10:23:47.8104487Z ld.shared.b8 %rs1974, [%r74+512]; 2026-02-21T10:23:47.8104551Z ld.shared.b8 %rs1975, [%r74+1536]; 2026-02-21T10:23:47.8104615Z ld.shared.b8 %rs1976, [%r74+2560]; 2026-02-21T10:23:47.8104677Z ld.shared.b8 %rs1977, [%r74+3584]; 2026-02-21T10:23:47.8104746Z ld.shared.b8 %rs1978, [%r75+640]; 2026-02-21T10:23:47.8104813Z 
ld.shared.b8 %rs1979, [%r75+1664]; 2026-02-21T10:23:47.8104877Z ld.shared.b8 %rs1980, [%r75+2688]; 2026-02-21T10:23:47.8104939Z ld.shared.b8 %rs1981, [%r75+3712]; 2026-02-21T10:23:47.8105012Z ld.shared.b8 %rs1982, [%r76+768]; 2026-02-21T10:23:47.8105076Z ld.shared.b8 %rs1983, [%r76+1792]; 2026-02-21T10:23:47.8105141Z ld.shared.b8 %rs1984, [%r76+2816]; 2026-02-21T10:23:47.8105206Z ld.shared.b8 %rs1985, [%r76+3840]; 2026-02-21T10:23:47.8105272Z ld.shared.b8 %rs1986, [%r77+896]; 2026-02-21T10:23:47.8105335Z ld.shared.b8 %rs1987, [%r77+1920]; 2026-02-21T10:23:47.8105397Z ld.shared.b8 %rs1988, [%r77+2944]; 2026-02-21T10:23:47.8105464Z ld.shared.b8 %rs1989, [%r77+3968]; 2026-02-21T10:23:47.8105682Z .loc 1 63 28 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:63:28 2026-02-21T10:23:47.8105747Z shl.b16 %rs1990, %rs1958, 4; 2026-02-21T10:23:47.8105812Z shl.b16 %rs1991, %rs1962, 4; 2026-02-21T10:23:47.8105934Z shl.b16 %rs1992, %rs1966, 4; 2026-02-21T10:23:47.8105998Z shl.b16 %rs1993, %rs1970, 4; 2026-02-21T10:23:47.8106060Z shl.b16 %rs1994, %rs1974, 4; 2026-02-21T10:23:47.8106123Z shl.b16 %rs1995, %rs1978, 4; 2026-02-21T10:23:47.8106185Z shl.b16 %rs1996, %rs1982, 4; 2026-02-21T10:23:47.8106294Z shl.b16 %rs1997, %rs1986, 4; 2026-02-21T10:23:47.8106356Z shl.b16 %rs1998, %rs1959, 4; 2026-02-21T10:23:47.8106418Z shl.b16 %rs1999, %rs1963, 4; 2026-02-21T10:23:47.8106610Z shl.b16 %rs2000, %rs1967, 4; 2026-02-21T10:23:47.8106678Z shl.b16 %rs2001, %rs1971, 4; 2026-02-21T10:23:47.8106742Z shl.b16 %rs2002, %rs1975, 4; 2026-02-21T10:23:47.8106803Z shl.b16 %rs2003, %rs1979, 4; 2026-02-21T10:23:47.8106865Z shl.b16 %rs2004, %rs1983, 4; 2026-02-21T10:23:47.8106928Z shl.b16 %rs2005, %rs1987, 4; 2026-02-21T10:23:47.8106988Z shl.b16 %rs2006, %rs1960, 4; 2026-02-21T10:23:47.8107048Z shl.b16 %rs2007, %rs1964, 4; 2026-02-21T10:23:47.8107108Z shl.b16 %rs2008, %rs1968, 4; 2026-02-21T10:23:47.8107173Z shl.b16 %rs2009, %rs1972, 4; 2026-02-21T10:23:47.8107304Z shl.b16 %rs2010, %rs1976, 4; 
2026-02-21T10:23:47.8107368Z shl.b16 %rs2011, %rs1980, 4; 2026-02-21T10:23:47.8107430Z shl.b16 %rs2012, %rs1984, 4; 2026-02-21T10:23:47.8107490Z shl.b16 %rs2013, %rs1988, 4; 2026-02-21T10:23:47.8107554Z shl.b16 %rs2014, %rs1961, 4; 2026-02-21T10:23:47.8107615Z shl.b16 %rs2015, %rs1965, 4; 2026-02-21T10:23:47.8107676Z shl.b16 %rs2016, %rs1969, 4; 2026-02-21T10:23:47.8107797Z shl.b16 %rs2017, %rs1973, 4; 2026-02-21T10:23:47.8107860Z shl.b16 %rs2018, %rs1977, 4; 2026-02-21T10:23:47.8107922Z shl.b16 %rs2019, %rs1981, 4; 2026-02-21T10:23:47.8107983Z shl.b16 %rs2020, %rs1985, 4; 2026-02-21T10:23:47.8108044Z shl.b16 %rs2021, %rs1989, 4; 2026-02-21T10:23:47.8108251Z .loc 1 78 58 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:78:58 2026-02-21T10:23:47.8108331Z selp.b16 %rs2022, %rs1990, %rs1958, %p188; 2026-02-21T10:23:47.8108392Z cvt.s16.s8 %rs2023, %rs2022; 2026-02-21T10:23:47.8108458Z shr.s16 %rs2024, %rs2023, 4; 2026-02-21T10:23:47.8108630Z selp.b16 %rs2025, %rs1991, %rs1962, %p188; 2026-02-21T10:23:47.8108695Z cvt.s16.s8 %rs2026, %rs2025; 2026-02-21T10:23:47.8108756Z shr.s16 %rs2027, %rs2026, 4; 2026-02-21T10:23:47.8108837Z selp.b16 %rs2028, %rs1992, %rs1966, %p188; 2026-02-21T10:23:47.8108898Z cvt.s16.s8 %rs2029, %rs2028; 2026-02-21T10:23:47.8108958Z shr.s16 %rs2030, %rs2029, 4; 2026-02-21T10:23:47.8109037Z selp.b16 %rs2031, %rs1993, %rs1970, %p188; 2026-02-21T10:23:47.8109097Z cvt.s16.s8 %rs2032, %rs2031; 2026-02-21T10:23:47.8109157Z shr.s16 %rs2033, %rs2032, 4; 2026-02-21T10:23:47.8109229Z selp.b16 %rs2034, %rs1994, %rs1974, %p188; 2026-02-21T10:23:47.8109291Z cvt.s16.s8 %rs2035, %rs2034; 2026-02-21T10:23:47.8109351Z shr.s16 %rs2036, %rs2035, 4; 2026-02-21T10:23:47.8109424Z selp.b16 %rs2037, %rs1995, %rs1978, %p188; 2026-02-21T10:23:47.8109486Z cvt.s16.s8 %rs2038, %rs2037; 2026-02-21T10:23:47.8109545Z shr.s16 %rs2039, %rs2038, 4; 2026-02-21T10:23:47.8109624Z selp.b16 %rs2040, %rs1996, %rs1982, %p188; 2026-02-21T10:23:47.8109686Z cvt.s16.s8 %rs2041, 
%rs2040; 2026-02-21T10:23:47.8109748Z shr.s16 %rs2042, %rs2041, 4; 2026-02-21T10:23:47.8109833Z selp.b16 %rs2043, %rs1997, %rs1986, %p188; 2026-02-21T10:23:47.8109897Z cvt.s16.s8 %rs2044, %rs2043; 2026-02-21T10:23:47.8109961Z shr.s16 %rs2045, %rs2044, 4; 2026-02-21T10:23:47.8110038Z selp.b16 %rs2046, %rs1998, %rs1959, %p188; 2026-02-21T10:23:47.8110099Z cvt.s16.s8 %rs2047, %rs2046; 2026-02-21T10:23:47.8110161Z shr.s16 %rs2048, %rs2047, 4; 2026-02-21T10:23:47.8110237Z selp.b16 %rs2049, %rs1999, %rs1963, %p188; 2026-02-21T10:23:47.8110297Z cvt.s16.s8 %rs2050, %rs2049; 2026-02-21T10:23:47.8110357Z shr.s16 %rs2051, %rs2050, 4; 2026-02-21T10:23:47.8110436Z selp.b16 %rs2052, %rs2000, %rs1967, %p188; 2026-02-21T10:23:47.8110495Z cvt.s16.s8 %rs2053, %rs2052; 2026-02-21T10:23:47.8110554Z shr.s16 %rs2054, %rs2053, 4; 2026-02-21T10:23:47.8110629Z selp.b16 %rs2055, %rs2001, %rs1971, %p188; 2026-02-21T10:23:47.8110783Z cvt.s16.s8 %rs2056, %rs2055; 2026-02-21T10:23:47.8110844Z shr.s16 %rs2057, %rs2056, 4; 2026-02-21T10:23:47.8110919Z selp.b16 %rs2058, %rs2002, %rs1975, %p188; 2026-02-21T10:23:47.8110983Z cvt.s16.s8 %rs2059, %rs2058; 2026-02-21T10:23:47.8111106Z shr.s16 %rs2060, %rs2059, 4; 2026-02-21T10:23:47.8111178Z selp.b16 %rs2061, %rs2003, %rs1979, %p188; 2026-02-21T10:23:47.8111241Z cvt.s16.s8 %rs2062, %rs2061; 2026-02-21T10:23:47.8111304Z shr.s16 %rs2063, %rs2062, 4; 2026-02-21T10:23:47.8111377Z selp.b16 %rs2064, %rs2004, %rs1983, %p188; 2026-02-21T10:23:47.8111437Z cvt.s16.s8 %rs2065, %rs2064; 2026-02-21T10:23:47.8111502Z shr.s16 %rs2066, %rs2065, 4; 2026-02-21T10:23:47.8111574Z selp.b16 %rs2067, %rs2005, %rs1987, %p188; 2026-02-21T10:23:47.8111634Z cvt.s16.s8 %rs2068, %rs2067; 2026-02-21T10:23:47.8111696Z shr.s16 %rs2069, %rs2068, 4; 2026-02-21T10:23:47.8111769Z selp.b16 %rs2070, %rs2006, %rs1960, %p188; 2026-02-21T10:23:47.8111833Z cvt.s16.s8 %rs2071, %rs2070; 2026-02-21T10:23:47.8111944Z shr.s16 %rs2072, %rs2071, 4; 2026-02-21T10:23:47.8112019Z selp.b16 %rs2073, 
%rs2007, %rs1964, %p188; 2026-02-21T10:23:47.8112079Z cvt.s16.s8 %rs2074, %rs2073; 2026-02-21T10:23:47.8112149Z shr.s16 %rs2075, %rs2074, 4; 2026-02-21T10:23:47.8112224Z selp.b16 %rs2076, %rs2008, %rs1968, %p188; 2026-02-21T10:23:47.8112285Z cvt.s16.s8 %rs2077, %rs2076; 2026-02-21T10:23:47.8112392Z shr.s16 %rs2078, %rs2077, 4; 2026-02-21T10:23:47.8112474Z selp.b16 %rs2079, %rs2009, %rs1972, %p188; 2026-02-21T10:23:47.8112543Z cvt.s16.s8 %rs2080, %rs2079; 2026-02-21T10:23:47.8112604Z shr.s16 %rs2081, %rs2080, 4; 2026-02-21T10:23:47.8112684Z selp.b16 %rs2082, %rs2010, %rs1976, %p188; 2026-02-21T10:23:47.8112748Z cvt.s16.s8 %rs2083, %rs2082; 2026-02-21T10:23:47.8112808Z shr.s16 %rs2084, %rs2083, 4; 2026-02-21T10:23:47.8112881Z selp.b16 %rs2085, %rs2011, %rs1980, %p188; 2026-02-21T10:23:47.8112948Z cvt.s16.s8 %rs2086, %rs2085; 2026-02-21T10:23:47.8113017Z shr.s16 %rs2087, %rs2086, 4; 2026-02-21T10:23:47.8113092Z selp.b16 %rs2088, %rs2012, %rs1984, %p188; 2026-02-21T10:23:47.8113156Z cvt.s16.s8 %rs2089, %rs2088; 2026-02-21T10:23:47.8113221Z shr.s16 %rs2090, %rs2089, 4; 2026-02-21T10:23:47.8113295Z selp.b16 %rs2091, %rs2013, %rs1988, %p188; 2026-02-21T10:23:47.8113357Z cvt.s16.s8 %rs2092, %rs2091; 2026-02-21T10:23:47.8113421Z shr.s16 %rs2093, %rs2092, 4; 2026-02-21T10:23:47.8113496Z selp.b16 %rs2094, %rs2014, %rs1961, %p188; 2026-02-21T10:23:47.8113558Z cvt.s16.s8 %rs2095, %rs2094; 2026-02-21T10:23:47.8113619Z shr.s16 %rs2096, %rs2095, 4; 2026-02-21T10:23:47.8113691Z selp.b16 %rs2097, %rs2015, %rs1965, %p188; 2026-02-21T10:23:47.8113751Z cvt.s16.s8 %rs2098, %rs2097; 2026-02-21T10:23:47.8113814Z shr.s16 %rs2099, %rs2098, 4; 2026-02-21T10:23:47.8113892Z selp.b16 %rs2100, %rs2016, %rs1969, %p188; 2026-02-21T10:23:47.8113954Z cvt.s16.s8 %rs2101, %rs2100; 2026-02-21T10:23:47.8114015Z shr.s16 %rs2102, %rs2101, 4; 2026-02-21T10:23:47.8114099Z selp.b16 %rs2103, %rs2017, %rs1973, %p188; 2026-02-21T10:23:47.8114161Z cvt.s16.s8 %rs2104, %rs2103; 2026-02-21T10:23:47.8114221Z shr.s16 
%rs2105, %rs2104, 4; 2026-02-21T10:23:47.8114296Z selp.b16 %rs2106, %rs2018, %rs1977, %p188; 2026-02-21T10:23:47.8114363Z cvt.s16.s8 %rs2107, %rs2106; 2026-02-21T10:23:47.8114425Z shr.s16 %rs2108, %rs2107, 4; 2026-02-21T10:23:47.8114498Z selp.b16 %rs2109, %rs2019, %rs1981, %p188; 2026-02-21T10:23:47.8114569Z cvt.s16.s8 %rs2110, %rs2109; 2026-02-21T10:23:47.8114629Z shr.s16 %rs2111, %rs2110, 4; 2026-02-21T10:23:47.8114703Z selp.b16 %rs2112, %rs2020, %rs1985, %p188; 2026-02-21T10:23:47.8114767Z cvt.s16.s8 %rs2113, %rs2112; 2026-02-21T10:23:47.8114828Z shr.s16 %rs2114, %rs2113, 4; 2026-02-21T10:23:47.8114902Z selp.b16 %rs2115, %rs2021, %rs1989, %p188; 2026-02-21T10:23:47.8114963Z cvt.s16.s8 %rs2116, %rs2115; 2026-02-21T10:23:47.8115025Z shr.s16 %rs2117, %rs2116, 4; 2026-02-21T10:23:47.8115229Z .loc 1 83 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:83:32 2026-02-21T10:23:47.8115372Z cvt.rn.f32.s16 %r15890, %rs2024; 2026-02-21T10:23:47.8115441Z cvt.rn.f32.s16 %r15891, %rs2027; 2026-02-21T10:23:47.8115507Z cvt.rn.f32.s16 %r15892, %rs2030; 2026-02-21T10:23:47.8115618Z cvt.rn.f32.s16 %r15893, %rs2033; 2026-02-21T10:23:47.8115680Z cvt.rn.f32.s16 %r15894, %rs2036; 2026-02-21T10:23:47.8115745Z cvt.rn.f32.s16 %r15895, %rs2039; 2026-02-21T10:23:47.8115809Z cvt.rn.f32.s16 %r15896, %rs2042; 2026-02-21T10:23:47.8115871Z cvt.rn.f32.s16 %r15897, %rs2045; 2026-02-21T10:23:47.8115937Z cvt.rn.f32.s16 %r15898, %rs2048; 2026-02-21T10:23:47.8115998Z cvt.rn.f32.s16 %r15899, %rs2051; 2026-02-21T10:23:47.8116061Z cvt.rn.f32.s16 %r15900, %rs2054; 2026-02-21T10:23:47.8116127Z cvt.rn.f32.s16 %r15901, %rs2057; 2026-02-21T10:23:47.8116188Z cvt.rn.f32.s16 %r15902, %rs2060; 2026-02-21T10:23:47.8116250Z cvt.rn.f32.s16 %r15903, %rs2063; 2026-02-21T10:23:47.8116311Z cvt.rn.f32.s16 %r15904, %rs2066; 2026-02-21T10:23:47.8116428Z cvt.rn.f32.s16 %r15905, %rs2069; 2026-02-21T10:23:47.8116613Z cvt.rn.f32.s16 %r15906, %rs2072; 2026-02-21T10:23:47.8116679Z cvt.rn.f32.s16 %r15907, %rs2075; 
2026-02-21T10:23:47.8116743Z cvt.rn.f32.s16 %r15908, %rs2078; 2026-02-21T10:23:47.8116809Z cvt.rn.f32.s16 %r15909, %rs2081; 2026-02-21T10:23:47.8116871Z cvt.rn.f32.s16 %r15910, %rs2084; 2026-02-21T10:23:47.8116934Z cvt.rn.f32.s16 %r15911, %rs2087; 2026-02-21T10:23:47.8117096Z cvt.rn.f32.s16 %r15912, %rs2090; 2026-02-21T10:23:47.8117166Z cvt.rn.f32.s16 %r15913, %rs2093; 2026-02-21T10:23:47.8117229Z cvt.rn.f32.s16 %r15914, %rs2096; 2026-02-21T10:23:47.8117295Z cvt.rn.f32.s16 %r15915, %rs2099; 2026-02-21T10:23:47.8117358Z cvt.rn.f32.s16 %r15916, %rs2102; 2026-02-21T10:23:47.8117419Z cvt.rn.f32.s16 %r15917, %rs2105; 2026-02-21T10:23:47.8117481Z cvt.rn.f32.s16 %r15918, %rs2108; 2026-02-21T10:23:47.8117548Z cvt.rn.f32.s16 %r15919, %rs2111; 2026-02-21T10:23:47.8117609Z cvt.rn.f32.s16 %r15920, %rs2114; 2026-02-21T10:23:47.8117677Z cvt.rn.f32.s16 %r15921, %rs2117; 2026-02-21T10:23:47.8117737Z bar.sync 0; 2026-02-21T10:23:47.8117817Z st.shared.b32 [%r78], %r15890; 2026-02-21T10:23:47.8117882Z st.shared.b32 [%r78+8], %r15891; 2026-02-21T10:23:47.8117958Z st.shared.b32 [%r78+16384], %r15906; 2026-02-21T10:23:47.8118025Z st.shared.b32 [%r78+16392], %r15907; 2026-02-21T10:23:47.8118089Z st.shared.b32 [%r79], %r15892; 2026-02-21T10:23:47.8118155Z st.shared.b32 [%r79+8], %r15893; 2026-02-21T10:23:47.8118235Z st.shared.b32 [%r79+16384], %r15908; 2026-02-21T10:23:47.8118303Z st.shared.b32 [%r79+16392], %r15909; 2026-02-21T10:23:47.8118368Z st.shared.b32 [%r80], %r15894; 2026-02-21T10:23:47.8118434Z st.shared.b32 [%r80+8], %r15895; 2026-02-21T10:23:47.8118500Z st.shared.b32 [%r80+16384], %r15910; 2026-02-21T10:23:47.8118566Z st.shared.b32 [%r80+16392], %r15911; 2026-02-21T10:23:47.8118629Z st.shared.b32 [%r81], %r15896; 2026-02-21T10:23:47.8118697Z st.shared.b32 [%r81+8], %r15897; 2026-02-21T10:23:47.8118766Z st.shared.b32 [%r81+16384], %r15912; 2026-02-21T10:23:47.8118831Z st.shared.b32 [%r81+16392], %r15913; 2026-02-21T10:23:47.8118898Z st.shared.b32 [%r82], %r15898; 
2026-02-21T10:23:47.8118960Z st.shared.b32 [%r82+8], %r15899; 2026-02-21T10:23:47.8119031Z st.shared.b32 [%r82+16384], %r15914; 2026-02-21T10:23:47.8119099Z st.shared.b32 [%r82+16392], %r15915; 2026-02-21T10:23:47.8119164Z st.shared.b32 [%r83], %r15900; 2026-02-21T10:23:47.8119230Z st.shared.b32 [%r83+8], %r15901; 2026-02-21T10:23:47.8119295Z st.shared.b32 [%r83+16384], %r15916; 2026-02-21T10:23:47.8119363Z st.shared.b32 [%r83+16392], %r15917; 2026-02-21T10:23:47.8119428Z st.shared.b32 [%r84], %r15902; 2026-02-21T10:23:47.8119493Z st.shared.b32 [%r84+8], %r15903; 2026-02-21T10:23:47.8119560Z st.shared.b32 [%r84+16384], %r15918; 2026-02-21T10:23:47.8119625Z st.shared.b32 [%r84+16392], %r15919; 2026-02-21T10:23:47.8119689Z st.shared.b32 [%r85], %r15904; 2026-02-21T10:23:47.8119752Z st.shared.b32 [%r85+8], %r15905; 2026-02-21T10:23:47.8119914Z st.shared.b32 [%r85+16384], %r15920; 2026-02-21T10:23:47.8119981Z st.shared.b32 [%r85+16392], %r15921; 2026-02-21T10:23:47.8120037Z $L__tmp21: 2026-02-21T10:23:47.8120321Z .loc 2 291 36 // standard.py:291:36 @[ cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:90:40 ] 2026-02-21T10:23:47.8120446Z // begin inline asm 2026-02-21T10:23:47.8120526Z fence.proxy.async.shared::cta; 2026-02-21T10:23:47.8120598Z // end inline asm 2026-02-21T10:23:47.8120658Z bar.sync 0; 2026-02-21T10:23:47.8120729Z wgmma.fence.sync.aligned; 2026-02-21T10:23:47.8120788Z // begin inline asm 2026-02-21T10:23:47.8122349Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r13545,%r13546,%r13547,%r13548}, %rd11, %p136, 1, 1; 2026-02-21T10:23:47.8122419Z // end inline asm 2026-02-21T10:23:47.8122484Z // begin inline asm 2026-02-21T10:23:47.8124035Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r13677,%r13678,%r13679,%r13680}, %rd12, %p136, 1, 1; 2026-02-21T10:23:47.8124101Z // end inline asm 2026-02-21T10:23:47.8124165Z // begin inline asm 2026-02-21T10:23:47.8125653Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r13809,%r13810,%r13811,%r13812}, %rd13, %p136, 1, 1; 2026-02-21T10:23:47.8125720Z // end inline asm 2026-02-21T10:23:47.8125779Z // begin inline asm 2026-02-21T10:23:47.8127412Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r13941,%r13942,%r13943,%r13944}, %rd14, %p136, 1, 1; 2026-02-21T10:23:47.8127484Z // end inline asm 2026-02-21T10:23:47.8127544Z // begin inline asm 2026-02-21T10:23:47.8129033Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r14073,%r14074,%r14075,%r14076}, %rd15, %p136, 1, 1; 2026-02-21T10:23:47.8129181Z // end inline asm 2026-02-21T10:23:47.8129303Z // begin inline asm 2026-02-21T10:23:47.8130791Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r14205,%r14206,%r14207,%r14208}, %rd16, %p136, 1, 1; 2026-02-21T10:23:47.8130850Z // end inline asm 2026-02-21T10:23:47.8130913Z // begin inline asm 2026-02-21T10:23:47.8132522Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r14337,%r14338,%r14339,%r14340}, %rd17, %p136, 1, 1; 2026-02-21T10:23:47.8132587Z // end inline asm 2026-02-21T10:23:47.8132647Z // begin inline asm 2026-02-21T10:23:47.8134151Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r14469,%r14470,%r14471,%r14472}, %rd18, %p136, 1, 1; 2026-02-21T10:23:47.8134215Z // end inline asm 2026-02-21T10:23:47.8134298Z wgmma.commit_group.sync.aligned; 2026-02-21T10:23:47.8134363Z mov.b32 %r14538, %r15753; 2026-02-21T10:23:47.8134422Z mov.b32 %r14539, %r15753; 2026-02-21T10:23:47.8134481Z mov.b32 %r14537, %r10980; 2026-02-21T10:23:47.8134546Z // begin inline asm 2026-02-21T10:23:47.8135832Z // wait for regs: 
%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283,%r14537,%r14538,%r14539 2026-02-21T10:23:47.8135920Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:23:47.8135976Z // end inline asm 2026-02-21T10:23:47.8136033Z $L__tmp22: 2026-02-21T10:23:47.8136247Z .loc 1 54 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:54:32 2026-02-21T10:23:47.8136319Z add.s64 %rd333, %rd270, 384; 2026-02-21T10:23:47.8136381Z add.s64 %rd336, %rd273, 384; 2026-02-21T10:23:47.8136441Z add.s64 %rd339, %rd276, 384; 2026-02-21T10:23:47.8136777Z .loc 1 54 80 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:54:80 2026-02-21T10:23:47.8136840Z add.s64 %rd342, %rd279, 384; 2026-02-21T10:23:47.8136899Z // begin inline asm 2026-02-21T10:23:47.8137047Z mov.u64 %rd332, 0x0; 2026-02-21T10:23:47.8137177Z createpolicy.fractional.L2::evict_last.b64 %rd332, 1.0; 2026-02-21T10:23:47.8137235Z // end inline asm 2026-02-21T10:23:47.8137294Z // begin inline asm 2026-02-21T10:23:47.8137421Z mov.u32 %r14607, 0x0; 2026-02-21T10:23:47.8137479Z mov.u32 %r14608, 0x0; 2026-02-21T10:23:47.8137538Z mov.u32 %r14609, 0x0; 2026-02-21T10:23:47.8137598Z mov.u32 %r14610, 0x0; 2026-02-21T10:23:47.8137841Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r14607, %r14608, %r14609, %r14610 }, [ %rd333 + 0 ], %rd332; 2026-02-21T10:23:47.8137899Z // end inline asm 2026-02-21T10:23:47.8137975Z // begin inline asm 2026-02-21T10:23:47.8138037Z mov.u64 %rd335, 0x0; 2026-02-21T10:23:47.8138159Z createpolicy.fractional.L2::evict_last.b64 %rd335, 1.0; 
2026-02-21T10:23:47.8138216Z // end inline asm 2026-02-21T10:23:47.8138279Z // begin inline asm 2026-02-21T10:23:47.8138337Z mov.u32 %r14611, 0x0; 2026-02-21T10:23:47.8138395Z mov.u32 %r14612, 0x0; 2026-02-21T10:23:47.8138521Z mov.u32 %r14613, 0x0; 2026-02-21T10:23:47.8138582Z mov.u32 %r14614, 0x0; 2026-02-21T10:23:47.8138811Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r14611, %r14612, %r14613, %r14614 }, [ %rd336 + 0 ], %rd335; 2026-02-21T10:23:47.8138874Z // end inline asm 2026-02-21T10:23:47.8138935Z // begin inline asm 2026-02-21T10:23:47.8138994Z mov.u64 %rd338, 0x0; 2026-02-21T10:23:47.8139169Z createpolicy.fractional.L2::evict_last.b64 %rd338, 1.0; 2026-02-21T10:23:47.8139233Z // end inline asm 2026-02-21T10:23:47.8139291Z // begin inline asm 2026-02-21T10:23:47.8139351Z mov.u32 %r14615, 0x0; 2026-02-21T10:23:47.8139413Z mov.u32 %r14616, 0x0; 2026-02-21T10:23:47.8139472Z mov.u32 %r14617, 0x0; 2026-02-21T10:23:47.8139530Z mov.u32 %r14618, 0x0; 2026-02-21T10:23:47.8139761Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r14615, %r14616, %r14617, %r14618 }, [ %rd339 + 0 ], %rd338; 2026-02-21T10:23:47.8139825Z // end inline asm 2026-02-21T10:23:47.8139884Z // begin inline asm 2026-02-21T10:23:47.8139948Z mov.u64 %rd341, 0x0; 2026-02-21T10:23:47.8140068Z createpolicy.fractional.L2::evict_last.b64 %rd341, 1.0; 2026-02-21T10:23:47.8140126Z // end inline asm 2026-02-21T10:23:47.8140186Z // begin inline asm 2026-02-21T10:23:47.8140249Z mov.u32 %r14619, 0x0; 2026-02-21T10:23:47.8140313Z mov.u32 %r14620, 0x0; 2026-02-21T10:23:47.8140373Z mov.u32 %r14621, 0x0; 2026-02-21T10:23:47.8140430Z mov.u32 %r14622, 0x0; 2026-02-21T10:23:47.8140655Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r14619, %r14620, %r14621, %r14622 }, [ %rd342 + 0 ], %rd341; 2026-02-21T10:23:47.8140713Z // end inline asm 2026-02-21T10:23:47.8140922Z .loc 1 58 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:58:32 2026-02-21T10:23:47.8140982Z bar.sync 0; 
2026-02-21T10:23:47.8141069Z st.shared.v2.b32 [%r60], {%r14607, %r14608}; 2026-02-21T10:23:47.8141161Z st.shared.v2.b32 [%r60+4096], {%r14611, %r14612}; 2026-02-21T10:23:47.8141250Z st.shared.v2.b32 [%r60+8192], {%r14615, %r14616}; 2026-02-21T10:23:47.8141350Z st.shared.v2.b32 [%r60+12288], {%r14619, %r14620}; 2026-02-21T10:23:47.8141429Z st.shared.v2.b32 [%r61], {%r14609, %r14610}; 2026-02-21T10:23:47.8141512Z st.shared.v2.b32 [%r61+4096], {%r14613, %r14614}; 2026-02-21T10:23:47.8141602Z st.shared.v2.b32 [%r61+8192], {%r14617, %r14618}; 2026-02-21T10:23:47.8141693Z st.shared.v2.b32 [%r61+12288], {%r14621, %r14622}; 2026-02-21T10:23:47.8141750Z bar.sync 0; 2026-02-21T10:23:47.8141825Z ld.shared.b16 %rs2118, [%r62]; 2026-02-21T10:23:47.8141899Z ld.shared.b16 %rs2119, [%r62+1024]; 2026-02-21T10:23:47.8141968Z ld.shared.b16 %rs2120, [%r62+64]; 2026-02-21T10:23:47.8142043Z ld.shared.b16 %rs2121, [%r62+1088]; 2026-02-21T10:23:47.8142113Z ld.shared.b16 %rs2122, [%r63]; 2026-02-21T10:23:47.8142180Z ld.shared.b16 %rs2123, [%r63+1024]; 2026-02-21T10:23:47.8142249Z ld.shared.b16 %rs2124, [%r63+64]; 2026-02-21T10:23:47.8142318Z ld.shared.b16 %rs2125, [%r63+1088]; 2026-02-21T10:23:47.8142382Z ld.shared.b16 %rs2126, [%r64]; 2026-02-21T10:23:47.8142509Z ld.shared.b16 %rs2127, [%r64+1024]; 2026-02-21T10:23:47.8142576Z ld.shared.b16 %rs2128, [%r64+64]; 2026-02-21T10:23:47.8142644Z ld.shared.b16 %rs2129, [%r64+1088]; 2026-02-21T10:23:47.8142709Z ld.shared.b16 %rs2130, [%r65]; 2026-02-21T10:23:47.8142824Z ld.shared.b16 %rs2131, [%r65+1024]; 2026-02-21T10:23:47.8142892Z ld.shared.b16 %rs2132, [%r65+64]; 2026-02-21T10:23:47.8142960Z ld.shared.b16 %rs2133, [%r65+1088]; 2026-02-21T10:23:47.8143025Z ld.shared.b16 %rs2134, [%r66]; 2026-02-21T10:23:47.8143094Z ld.shared.b16 %rs2135, [%r66+1024]; 2026-02-21T10:23:47.8143160Z ld.shared.b16 %rs2136, [%r66+64]; 2026-02-21T10:23:47.8143227Z ld.shared.b16 %rs2137, [%r66+1088]; 2026-02-21T10:23:47.8143294Z ld.shared.b16 %rs2138, [%r67]; 
2026-02-21T10:23:47.8143363Z ld.shared.b16 %rs2139, [%r67+1024]; 2026-02-21T10:23:47.8143428Z ld.shared.b16 %rs2140, [%r67+64]; 2026-02-21T10:23:47.8143493Z ld.shared.b16 %rs2141, [%r67+1088]; 2026-02-21T10:23:47.8143563Z ld.shared.b16 %rs2142, [%r68]; 2026-02-21T10:23:47.8143691Z ld.shared.b16 %rs2143, [%r68+1024]; 2026-02-21T10:23:47.8143759Z ld.shared.b16 %rs2144, [%r68+64]; 2026-02-21T10:23:47.8143837Z ld.shared.b16 %rs2145, [%r68+1088]; 2026-02-21T10:23:47.8143912Z ld.shared.b16 %rs2146, [%r69]; 2026-02-21T10:23:47.8143978Z ld.shared.b16 %rs2147, [%r69+1024]; 2026-02-21T10:23:47.8144045Z ld.shared.b16 %rs2148, [%r69+64]; 2026-02-21T10:23:47.8144160Z ld.shared.b16 %rs2149, [%r69+1088]; 2026-02-21T10:23:47.8144228Z cvt.f32.bf16 %r14760, %rs2118; 2026-02-21T10:23:47.8144291Z cvt.f32.bf16 %r14761, %rs2119; 2026-02-21T10:23:47.8144356Z cvt.f32.bf16 %r14762, %rs2122; 2026-02-21T10:23:47.8144418Z cvt.f32.bf16 %r14763, %rs2123; 2026-02-21T10:23:47.8144478Z cvt.f32.bf16 %r14892, %rs2126; 2026-02-21T10:23:47.8144541Z cvt.f32.bf16 %r14893, %rs2127; 2026-02-21T10:23:47.8144608Z cvt.f32.bf16 %r14894, %rs2130; 2026-02-21T10:23:47.8144671Z cvt.f32.bf16 %r14895, %rs2131; 2026-02-21T10:23:47.8144734Z cvt.f32.bf16 %r15024, %rs2134; 2026-02-21T10:23:47.8144800Z cvt.f32.bf16 %r15025, %rs2135; 2026-02-21T10:23:47.8144861Z cvt.f32.bf16 %r15026, %rs2138; 2026-02-21T10:23:47.8144921Z cvt.f32.bf16 %r15027, %rs2139; 2026-02-21T10:23:47.8144982Z cvt.f32.bf16 %r15156, %rs2142; 2026-02-21T10:23:47.8145049Z cvt.f32.bf16 %r15157, %rs2143; 2026-02-21T10:23:47.8145111Z cvt.f32.bf16 %r15158, %rs2146; 2026-02-21T10:23:47.8145174Z cvt.f32.bf16 %r15159, %rs2147; 2026-02-21T10:23:47.8145249Z cvt.f32.bf16 %r15288, %rs2120; 2026-02-21T10:23:47.8145313Z cvt.f32.bf16 %r15289, %rs2121; 2026-02-21T10:23:47.8145374Z cvt.f32.bf16 %r15290, %rs2124; 2026-02-21T10:23:47.8145436Z cvt.f32.bf16 %r15291, %rs2125; 2026-02-21T10:23:47.8145500Z cvt.f32.bf16 %r15420, %rs2128; 2026-02-21T10:23:47.8145562Z 
cvt.f32.bf16 %r15421, %rs2129; 2026-02-21T10:23:47.8145624Z cvt.f32.bf16 %r15422, %rs2132; 2026-02-21T10:23:47.8145687Z cvt.f32.bf16 %r15423, %rs2133; 2026-02-21T10:23:47.8145751Z cvt.f32.bf16 %r15552, %rs2136; 2026-02-21T10:23:47.8145818Z cvt.f32.bf16 %r15553, %rs2137; 2026-02-21T10:23:47.8145880Z cvt.f32.bf16 %r15554, %rs2140; 2026-02-21T10:23:47.8145945Z cvt.f32.bf16 %r15555, %rs2141; 2026-02-21T10:23:47.8146006Z cvt.f32.bf16 %r15684, %rs2144; 2026-02-21T10:23:47.8146070Z cvt.f32.bf16 %r15685, %rs2145; 2026-02-21T10:23:47.8146135Z cvt.f32.bf16 %r15686, %rs2148; 2026-02-21T10:23:47.8146197Z cvt.f32.bf16 %r15687, %rs2149; 2026-02-21T10:23:47.8146413Z .loc 1 60 33 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:60:33 2026-02-21T10:23:47.8146602Z bar.sync 0; 2026-02-21T10:23:47.8146668Z // begin inline asm 2026-02-21T10:23:47.8146769Z @%p132 mbarrier.init.shared::cta.b64 [%r10978], 1; 2026-02-21T10:23:47.8146827Z // end inline asm 2026-02-21T10:23:47.8146886Z bar.sync 0; 2026-02-21T10:23:47.8146945Z // begin inline asm 2026-02-21T10:23:47.8147083Z @%p132 mbarrier.arrive.expect_tx.shared.b64 _, [%r10978], 4096; 2026-02-21T10:23:47.8147145Z // end inline asm 2026-02-21T10:23:47.8147291Z // begin inline asm 2026-02-21T10:23:47.8147371Z fence.proxy.async.shared::cta; 2026-02-21T10:23:47.8147427Z // end inline asm 2026-02-21T10:23:47.8147485Z bar.sync 0; 2026-02-21T10:23:47.8147553Z elect.sync %r15922|%p185, -1; 2026-02-21T10:23:47.8147700Z and.pred %p170, %p1, %p185; 2026-02-21T10:23:47.8147768Z or.b32 %r14627, %r10982, 96; 2026-02-21T10:23:47.8147827Z // begin inline asm 2026-02-21T10:23:47.8148171Z @%p170 cp.async.bulk.tensor.2d.shared::cluster.global.mbarrier::complete_tx::bytes [%r10980], [%rd281, {%r10981, %r14627}], [%r10978]; 2026-02-21T10:23:47.8148233Z // end inline asm 2026-02-21T10:23:47.8148288Z bar.sync 0; 2026-02-21T10:23:47.8148347Z // begin inline asm 2026-02-21T10:23:47.8148399Z 2026-02-21T10:23:47.8148464Z { 2026-02-21T10:23:47.8148592Z .reg 
.pred complete; 2026-02-21T10:23:47.8148650Z waitLoop: 2026-02-21T10:23:47.8148802Z mbarrier.try_wait.parity.shared.b64 complete, [%r10978], %r15753; 2026-02-21T10:23:47.8148877Z @!complete bra.uni waitLoop; 2026-02-21T10:23:47.8149006Z } 2026-02-21T10:23:47.8149012Z 2026-02-21T10:23:47.8149073Z // end inline asm 2026-02-21T10:23:47.8149133Z bar.sync 0; 2026-02-21T10:23:47.8149193Z // begin inline asm 2026-02-21T10:23:47.8149292Z @%p132 mbarrier.inval.shared::cta.b64 [%r10978]; 2026-02-21T10:23:47.8149355Z // end inline asm 2026-02-21T10:23:47.8149641Z .loc 1 78 58 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:78:58 2026-02-21T10:23:47.8149711Z ld.shared.b8 %rs2150, [%r70]; 2026-02-21T10:23:47.8149783Z ld.shared.b8 %rs2151, [%r70+1024]; 2026-02-21T10:23:47.8149849Z ld.shared.b8 %rs2152, [%r70+2048]; 2026-02-21T10:23:47.8149912Z ld.shared.b8 %rs2153, [%r70+3072]; 2026-02-21T10:23:47.8149979Z ld.shared.b8 %rs2154, [%r71+128]; 2026-02-21T10:23:47.8150047Z ld.shared.b8 %rs2155, [%r71+1152]; 2026-02-21T10:23:47.8150111Z ld.shared.b8 %rs2156, [%r71+2176]; 2026-02-21T10:23:47.8150174Z ld.shared.b8 %rs2157, [%r71+3200]; 2026-02-21T10:23:47.8150244Z ld.shared.b8 %rs2158, [%r72+256]; 2026-02-21T10:23:47.8150308Z ld.shared.b8 %rs2159, [%r72+1280]; 2026-02-21T10:23:47.8150374Z ld.shared.b8 %rs2160, [%r72+2304]; 2026-02-21T10:23:47.8150439Z ld.shared.b8 %rs2161, [%r72+3328]; 2026-02-21T10:23:47.8150509Z ld.shared.b8 %rs2162, [%r73+384]; 2026-02-21T10:23:47.8150574Z ld.shared.b8 %rs2163, [%r73+1408]; 2026-02-21T10:23:47.8150640Z ld.shared.b8 %rs2164, [%r73+2432]; 2026-02-21T10:23:47.8150713Z ld.shared.b8 %rs2165, [%r73+3456]; 2026-02-21T10:23:47.8150778Z ld.shared.b8 %rs2166, [%r74+512]; 2026-02-21T10:23:47.8150841Z ld.shared.b8 %rs2167, [%r74+1536]; 2026-02-21T10:23:47.8150920Z ld.shared.b8 %rs2168, [%r74+2560]; 2026-02-21T10:23:47.8150987Z ld.shared.b8 %rs2169, [%r74+3584]; 2026-02-21T10:23:47.8151052Z ld.shared.b8 %rs2170, [%r75+640]; 2026-02-21T10:23:47.8151115Z 
ld.shared.b8 %rs2171, [%r75+1664]; 2026-02-21T10:23:47.8151184Z ld.shared.b8 %rs2172, [%r75+2688]; 2026-02-21T10:23:47.8151246Z ld.shared.b8 %rs2173, [%r75+3712]; 2026-02-21T10:23:47.8151315Z ld.shared.b8 %rs2174, [%r76+768]; 2026-02-21T10:23:47.8151383Z ld.shared.b8 %rs2175, [%r76+1792]; 2026-02-21T10:23:47.8151446Z ld.shared.b8 %rs2176, [%r76+2816]; 2026-02-21T10:23:47.8151510Z ld.shared.b8 %rs2177, [%r76+3840]; 2026-02-21T10:23:47.8151577Z ld.shared.b8 %rs2178, [%r77+896]; 2026-02-21T10:23:47.8151644Z ld.shared.b8 %rs2179, [%r77+1920]; 2026-02-21T10:23:47.8151709Z ld.shared.b8 %rs2180, [%r77+2944]; 2026-02-21T10:23:47.8151775Z ld.shared.b8 %rs2181, [%r77+3968]; 2026-02-21T10:23:47.8151985Z .loc 1 63 28 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:63:28 2026-02-21T10:23:47.8152050Z shl.b16 %rs2182, %rs2150, 4; 2026-02-21T10:23:47.8152112Z shl.b16 %rs2183, %rs2154, 4; 2026-02-21T10:23:47.8152177Z shl.b16 %rs2184, %rs2158, 4; 2026-02-21T10:23:47.8152238Z shl.b16 %rs2185, %rs2162, 4; 2026-02-21T10:23:47.8152299Z shl.b16 %rs2186, %rs2166, 4; 2026-02-21T10:23:47.8152360Z shl.b16 %rs2187, %rs2170, 4; 2026-02-21T10:23:47.8152490Z shl.b16 %rs2188, %rs2174, 4; 2026-02-21T10:23:47.8152552Z shl.b16 %rs2189, %rs2178, 4; 2026-02-21T10:23:47.8152613Z shl.b16 %rs2190, %rs2151, 4; 2026-02-21T10:23:47.8152683Z shl.b16 %rs2191, %rs2155, 4; 2026-02-21T10:23:47.8152792Z shl.b16 %rs2192, %rs2159, 4; 2026-02-21T10:23:47.8152854Z shl.b16 %rs2193, %rs2163, 4; 2026-02-21T10:23:47.8152913Z shl.b16 %rs2194, %rs2167, 4; 2026-02-21T10:23:47.8152979Z shl.b16 %rs2195, %rs2171, 4; 2026-02-21T10:23:47.8153041Z shl.b16 %rs2196, %rs2175, 4; 2026-02-21T10:23:47.8153102Z shl.b16 %rs2197, %rs2179, 4; 2026-02-21T10:23:47.8153166Z shl.b16 %rs2198, %rs2152, 4; 2026-02-21T10:23:47.8153227Z shl.b16 %rs2199, %rs2156, 4; 2026-02-21T10:23:47.8153287Z shl.b16 %rs2200, %rs2160, 4; 2026-02-21T10:23:47.8153348Z shl.b16 %rs2201, %rs2164, 4; 2026-02-21T10:23:47.8153411Z shl.b16 %rs2202, %rs2168, 4; 
2026-02-21T10:23:47.8153471Z shl.b16 %rs2203, %rs2172, 4; 2026-02-21T10:23:47.8153533Z shl.b16 %rs2204, %rs2176, 4; 2026-02-21T10:23:47.8153649Z shl.b16 %rs2205, %rs2180, 4; 2026-02-21T10:23:47.8153712Z shl.b16 %rs2206, %rs2153, 4; 2026-02-21T10:23:47.8153774Z shl.b16 %rs2207, %rs2157, 4; 2026-02-21T10:23:47.8153839Z shl.b16 %rs2208, %rs2161, 4; 2026-02-21T10:23:47.8153903Z shl.b16 %rs2209, %rs2165, 4; 2026-02-21T10:23:47.8153963Z shl.b16 %rs2210, %rs2169, 4; 2026-02-21T10:23:47.8154024Z shl.b16 %rs2211, %rs2173, 4; 2026-02-21T10:23:47.8154138Z shl.b16 %rs2212, %rs2177, 4; 2026-02-21T10:23:47.8154202Z shl.b16 %rs2213, %rs2181, 4; 2026-02-21T10:23:47.8154406Z .loc 1 78 58 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:78:58 2026-02-21T10:23:47.8154490Z selp.b16 %rs2214, %rs2182, %rs2150, %p188; 2026-02-21T10:23:47.8154554Z cvt.s16.s8 %rs2215, %rs2214; 2026-02-21T10:23:47.8154615Z shr.s16 %rs2216, %rs2215, 4; 2026-02-21T10:23:47.8154690Z selp.b16 %rs2217, %rs2183, %rs2154, %p188; 2026-02-21T10:23:47.8154754Z cvt.s16.s8 %rs2218, %rs2217; 2026-02-21T10:23:47.8154817Z shr.s16 %rs2219, %rs2218, 4; 2026-02-21T10:23:47.8154893Z selp.b16 %rs2220, %rs2184, %rs2158, %p188; 2026-02-21T10:23:47.8154959Z cvt.s16.s8 %rs2221, %rs2220; 2026-02-21T10:23:47.8155019Z shr.s16 %rs2222, %rs2221, 4; 2026-02-21T10:23:47.8155097Z selp.b16 %rs2223, %rs2185, %rs2162, %p188; 2026-02-21T10:23:47.8155160Z cvt.s16.s8 %rs2224, %rs2223; 2026-02-21T10:23:47.8155221Z shr.s16 %rs2225, %rs2224, 4; 2026-02-21T10:23:47.8155309Z selp.b16 %rs2226, %rs2186, %rs2166, %p188; 2026-02-21T10:23:47.8155372Z cvt.s16.s8 %rs2227, %rs2226; 2026-02-21T10:23:47.8155438Z shr.s16 %rs2228, %rs2227, 4; 2026-02-21T10:23:47.8155512Z selp.b16 %rs2229, %rs2187, %rs2170, %p188; 2026-02-21T10:23:47.8155575Z cvt.s16.s8 %rs2230, %rs2229; 2026-02-21T10:23:47.8155638Z shr.s16 %rs2231, %rs2230, 4; 2026-02-21T10:23:47.8155713Z selp.b16 %rs2232, %rs2188, %rs2174, %p188; 2026-02-21T10:23:47.8155774Z cvt.s16.s8 %rs2233, 
%rs2232; 2026-02-21T10:23:47.8155834Z shr.s16 %rs2234, %rs2233, 4; 2026-02-21T10:23:47.8155916Z selp.b16 %rs2235, %rs2189, %rs2178, %p188; 2026-02-21T10:23:47.8155977Z cvt.s16.s8 %rs2236, %rs2235; 2026-02-21T10:23:47.8156040Z shr.s16 %rs2237, %rs2236, 4; 2026-02-21T10:23:47.8156116Z selp.b16 %rs2238, %rs2190, %rs2151, %p188; 2026-02-21T10:23:47.8156183Z cvt.s16.s8 %rs2239, %rs2238; 2026-02-21T10:23:47.8156245Z shr.s16 %rs2240, %rs2239, 4; 2026-02-21T10:23:47.8156318Z selp.b16 %rs2241, %rs2191, %rs2155, %p188; 2026-02-21T10:23:47.8156386Z cvt.s16.s8 %rs2242, %rs2241; 2026-02-21T10:23:47.8156581Z shr.s16 %rs2243, %rs2242, 4; 2026-02-21T10:23:47.8156664Z selp.b16 %rs2244, %rs2192, %rs2159, %p188; 2026-02-21T10:23:47.8156729Z cvt.s16.s8 %rs2245, %rs2244; 2026-02-21T10:23:47.8156791Z shr.s16 %rs2246, %rs2245, 4; 2026-02-21T10:23:47.8156866Z selp.b16 %rs2247, %rs2193, %rs2163, %p188; 2026-02-21T10:23:47.8156932Z cvt.s16.s8 %rs2248, %rs2247; 2026-02-21T10:23:47.8156993Z shr.s16 %rs2249, %rs2248, 4; 2026-02-21T10:23:47.8157069Z selp.b16 %rs2250, %rs2194, %rs2167, %p188; 2026-02-21T10:23:47.8157236Z cvt.s16.s8 %rs2251, %rs2250; 2026-02-21T10:23:47.8157304Z shr.s16 %rs2252, %rs2251, 4; 2026-02-21T10:23:47.8157380Z selp.b16 %rs2253, %rs2195, %rs2171, %p188; 2026-02-21T10:23:47.8157443Z cvt.s16.s8 %rs2254, %rs2253; 2026-02-21T10:23:47.8157580Z shr.s16 %rs2255, %rs2254, 4; 2026-02-21T10:23:47.8157656Z selp.b16 %rs2256, %rs2196, %rs2175, %p188; 2026-02-21T10:23:47.8157718Z cvt.s16.s8 %rs2257, %rs2256; 2026-02-21T10:23:47.8157782Z shr.s16 %rs2258, %rs2257, 4; 2026-02-21T10:23:47.8157861Z selp.b16 %rs2259, %rs2197, %rs2179, %p188; 2026-02-21T10:23:47.8157923Z cvt.s16.s8 %rs2260, %rs2259; 2026-02-21T10:23:47.8157984Z shr.s16 %rs2261, %rs2260, 4; 2026-02-21T10:23:47.8158061Z selp.b16 %rs2262, %rs2198, %rs2152, %p188; 2026-02-21T10:23:47.8158124Z cvt.s16.s8 %rs2263, %rs2262; 2026-02-21T10:23:47.8158184Z shr.s16 %rs2264, %rs2263, 4; 2026-02-21T10:23:47.8158260Z selp.b16 %rs2265, 
%rs2199, %rs2156, %p188; 2026-02-21T10:23:47.8158323Z cvt.s16.s8 %rs2266, %rs2265; 2026-02-21T10:23:47.8158449Z shr.s16 %rs2267, %rs2266, 4; 2026-02-21T10:23:47.8158527Z selp.b16 %rs2268, %rs2200, %rs2160, %p188; 2026-02-21T10:23:47.8158592Z cvt.s16.s8 %rs2269, %rs2268; 2026-02-21T10:23:47.8158652Z shr.s16 %rs2270, %rs2269, 4; 2026-02-21T10:23:47.8158740Z selp.b16 %rs2271, %rs2201, %rs2164, %p188; 2026-02-21T10:23:47.8158808Z cvt.s16.s8 %rs2272, %rs2271; 2026-02-21T10:23:47.8158867Z shr.s16 %rs2273, %rs2272, 4; 2026-02-21T10:23:47.8159002Z selp.b16 %rs2274, %rs2202, %rs2168, %p188; 2026-02-21T10:23:47.8159065Z cvt.s16.s8 %rs2275, %rs2274; 2026-02-21T10:23:47.8159140Z shr.s16 %rs2276, %rs2275, 4; 2026-02-21T10:23:47.8159216Z selp.b16 %rs2277, %rs2203, %rs2172, %p188; 2026-02-21T10:23:47.8159278Z cvt.s16.s8 %rs2278, %rs2277; 2026-02-21T10:23:47.8159344Z shr.s16 %rs2279, %rs2278, 4; 2026-02-21T10:23:47.8159419Z selp.b16 %rs2280, %rs2204, %rs2176, %p188; 2026-02-21T10:23:47.8159479Z cvt.s16.s8 %rs2281, %rs2280; 2026-02-21T10:23:47.8159540Z shr.s16 %rs2282, %rs2281, 4; 2026-02-21T10:23:47.8159622Z selp.b16 %rs2283, %rs2205, %rs2180, %p188; 2026-02-21T10:23:47.8159682Z cvt.s16.s8 %rs2284, %rs2283; 2026-02-21T10:23:47.8159742Z shr.s16 %rs2285, %rs2284, 4; 2026-02-21T10:23:47.8159819Z selp.b16 %rs2286, %rs2206, %rs2153, %p188; 2026-02-21T10:23:47.8159884Z cvt.s16.s8 %rs2287, %rs2286; 2026-02-21T10:23:47.8159944Z shr.s16 %rs2288, %rs2287, 4; 2026-02-21T10:23:47.8160028Z selp.b16 %rs2289, %rs2207, %rs2157, %p188; 2026-02-21T10:23:47.8160089Z cvt.s16.s8 %rs2290, %rs2289; 2026-02-21T10:23:47.8160149Z shr.s16 %rs2291, %rs2290, 4; 2026-02-21T10:23:47.8160222Z selp.b16 %rs2292, %rs2208, %rs2161, %p188; 2026-02-21T10:23:47.8160294Z cvt.s16.s8 %rs2293, %rs2292; 2026-02-21T10:23:47.8160357Z shr.s16 %rs2294, %rs2293, 4; 2026-02-21T10:23:47.8160432Z selp.b16 %rs2295, %rs2209, %rs2165, %p188; 2026-02-21T10:23:47.8160496Z cvt.s16.s8 %rs2296, %rs2295; 2026-02-21T10:23:47.8160558Z shr.s16 
%rs2297, %rs2296, 4; 2026-02-21T10:23:47.8160641Z selp.b16 %rs2298, %rs2210, %rs2169, %p188; 2026-02-21T10:23:47.8160710Z cvt.s16.s8 %rs2299, %rs2298; 2026-02-21T10:23:47.8160776Z shr.s16 %rs2300, %rs2299, 4; 2026-02-21T10:23:47.8160850Z selp.b16 %rs2301, %rs2211, %rs2173, %p188; 2026-02-21T10:23:47.8160913Z cvt.s16.s8 %rs2302, %rs2301; 2026-02-21T10:23:47.8160978Z shr.s16 %rs2303, %rs2302, 4; 2026-02-21T10:23:47.8161051Z selp.b16 %rs2304, %rs2212, %rs2177, %p188; 2026-02-21T10:23:47.8161116Z cvt.s16.s8 %rs2305, %rs2304; 2026-02-21T10:23:47.8161181Z shr.s16 %rs2306, %rs2305, 4; 2026-02-21T10:23:47.8161254Z selp.b16 %rs2307, %rs2213, %rs2181, %p188; 2026-02-21T10:23:47.8161315Z cvt.s16.s8 %rs2308, %rs2307; 2026-02-21T10:23:47.8161376Z shr.s16 %rs2309, %rs2308, 4; 2026-02-21T10:23:47.8161593Z .loc 1 83 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:83:32 2026-02-21T10:23:47.8161664Z cvt.rn.f32.s16 %r15923, %rs2216; 2026-02-21T10:23:47.8161730Z cvt.rn.f32.s16 %r15924, %rs2219; 2026-02-21T10:23:47.8161857Z cvt.rn.f32.s16 %r15925, %rs2222; 2026-02-21T10:23:47.8161923Z cvt.rn.f32.s16 %r15926, %rs2225; 2026-02-21T10:23:47.8161986Z cvt.rn.f32.s16 %r15927, %rs2228; 2026-02-21T10:23:47.8162049Z cvt.rn.f32.s16 %r15928, %rs2231; 2026-02-21T10:23:47.8162198Z cvt.rn.f32.s16 %r15929, %rs2234; 2026-02-21T10:23:47.8162264Z cvt.rn.f32.s16 %r15930, %rs2237; 2026-02-21T10:23:47.8162330Z cvt.rn.f32.s16 %r15931, %rs2240; 2026-02-21T10:23:47.8162396Z cvt.rn.f32.s16 %r15932, %rs2243; 2026-02-21T10:23:47.8162459Z cvt.rn.f32.s16 %r15933, %rs2246; 2026-02-21T10:23:47.8162524Z cvt.rn.f32.s16 %r15934, %rs2249; 2026-02-21T10:23:47.8162590Z cvt.rn.f32.s16 %r15935, %rs2252; 2026-02-21T10:23:47.8162654Z cvt.rn.f32.s16 %r15936, %rs2255; 2026-02-21T10:23:47.8162717Z cvt.rn.f32.s16 %r15937, %rs2258; 2026-02-21T10:23:47.8162780Z cvt.rn.f32.s16 %r15938, %rs2261; 2026-02-21T10:23:47.8162846Z cvt.rn.f32.s16 %r15939, %rs2264; 2026-02-21T10:23:47.8162909Z cvt.rn.f32.s16 %r15940, %rs2267; 
2026-02-21T10:23:47.8162973Z cvt.rn.f32.s16 %r15941, %rs2270; 2026-02-21T10:23:47.8163086Z cvt.rn.f32.s16 %r15942, %rs2273; 2026-02-21T10:23:47.8163152Z cvt.rn.f32.s16 %r15943, %rs2276; 2026-02-21T10:23:47.8163215Z cvt.rn.f32.s16 %r15944, %rs2279; 2026-02-21T10:23:47.8163281Z cvt.rn.f32.s16 %r15945, %rs2282; 2026-02-21T10:23:47.8163348Z cvt.rn.f32.s16 %r15946, %rs2285; 2026-02-21T10:23:47.8163411Z cvt.rn.f32.s16 %r15947, %rs2288; 2026-02-21T10:23:47.8163521Z cvt.rn.f32.s16 %r15948, %rs2291; 2026-02-21T10:23:47.8163589Z cvt.rn.f32.s16 %r15949, %rs2294; 2026-02-21T10:23:47.8163651Z cvt.rn.f32.s16 %r15950, %rs2297; 2026-02-21T10:23:47.8163728Z cvt.rn.f32.s16 %r15951, %rs2300; 2026-02-21T10:23:47.8163791Z cvt.rn.f32.s16 %r15952, %rs2303; 2026-02-21T10:23:47.8163857Z cvt.rn.f32.s16 %r15953, %rs2306; 2026-02-21T10:23:47.8163918Z cvt.rn.f32.s16 %r15954, %rs2309; 2026-02-21T10:23:47.8163975Z bar.sync 0; 2026-02-21T10:23:47.8164042Z st.shared.b32 [%r78], %r15923; 2026-02-21T10:23:47.8164109Z st.shared.b32 [%r78+8], %r15924; 2026-02-21T10:23:47.8164184Z st.shared.b32 [%r78+16384], %r15939; 2026-02-21T10:23:47.8164253Z st.shared.b32 [%r78+16392], %r15940; 2026-02-21T10:23:47.8164319Z st.shared.b32 [%r79], %r15925; 2026-02-21T10:23:47.8164383Z st.shared.b32 [%r79+8], %r15926; 2026-02-21T10:23:47.8164450Z st.shared.b32 [%r79+16384], %r15941; 2026-02-21T10:23:47.8164519Z st.shared.b32 [%r79+16392], %r15942; 2026-02-21T10:23:47.8164585Z st.shared.b32 [%r80], %r15927; 2026-02-21T10:23:47.8164649Z st.shared.b32 [%r80+8], %r15928; 2026-02-21T10:23:47.8164718Z st.shared.b32 [%r80+16384], %r15943; 2026-02-21T10:23:47.8164782Z st.shared.b32 [%r80+16392], %r15944; 2026-02-21T10:23:47.8164849Z st.shared.b32 [%r81], %r15929; 2026-02-21T10:23:47.8164913Z st.shared.b32 [%r81+8], %r15930; 2026-02-21T10:23:47.8164980Z st.shared.b32 [%r81+16384], %r15945; 2026-02-21T10:23:47.8165044Z st.shared.b32 [%r81+16392], %r15946; 2026-02-21T10:23:47.8165107Z st.shared.b32 [%r82], %r15931; 
2026-02-21T10:23:47.8165179Z st.shared.b32 [%r82+8], %r15932; 2026-02-21T10:23:47.8165245Z st.shared.b32 [%r82+16384], %r15947; 2026-02-21T10:23:47.8165320Z st.shared.b32 [%r82+16392], %r15948; 2026-02-21T10:23:47.8165390Z st.shared.b32 [%r83], %r15933; 2026-02-21T10:23:47.8165460Z st.shared.b32 [%r83+8], %r15934; 2026-02-21T10:23:47.8165526Z st.shared.b32 [%r83+16384], %r15949; 2026-02-21T10:23:47.8165590Z st.shared.b32 [%r83+16392], %r15950; 2026-02-21T10:23:47.8165659Z st.shared.b32 [%r84], %r15935; 2026-02-21T10:23:47.8165723Z st.shared.b32 [%r84+8], %r15936; 2026-02-21T10:23:47.8165787Z st.shared.b32 [%r84+16384], %r15951; 2026-02-21T10:23:47.8165857Z st.shared.b32 [%r84+16392], %r15952; 2026-02-21T10:23:47.8165920Z st.shared.b32 [%r85], %r15937; 2026-02-21T10:23:47.8165982Z st.shared.b32 [%r85+8], %r15938; 2026-02-21T10:23:47.8166048Z st.shared.b32 [%r85+16384], %r15953; 2026-02-21T10:23:47.8166116Z st.shared.b32 [%r85+16392], %r15954; 2026-02-21T10:23:47.8166172Z $L__tmp23: 2026-02-21T10:23:47.8166643Z .loc 2 291 36 // standard.py:291:36 @[ cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:90:40 ] 2026-02-21T10:23:47.8166712Z // begin inline asm 2026-02-21T10:23:47.8166792Z fence.proxy.async.shared::cta; 2026-02-21T10:23:47.8166932Z // end inline asm 2026-02-21T10:23:47.8166991Z bar.sync 0; 2026-02-21T10:23:47.8167064Z wgmma.fence.sync.aligned; 2026-02-21T10:23:47.8167125Z // begin inline asm 2026-02-21T10:23:47.8168692Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r14760,%r14761,%r14762,%r14763}, %rd11, %p136, 1, 1; 2026-02-21T10:23:47.8168755Z // end inline asm 2026-02-21T10:23:47.8168827Z // begin inline asm 2026-02-21T10:23:47.8170381Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r14892,%r14893,%r14894,%r14895}, %rd12, %p136, 1, 1; 2026-02-21T10:23:47.8170446Z // end inline asm 2026-02-21T10:23:47.8170504Z // begin inline asm 2026-02-21T10:23:47.8171996Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r15024,%r15025,%r15026,%r15027}, %rd13, %p136, 1, 1; 2026-02-21T10:23:47.8172060Z // end inline asm 2026-02-21T10:23:47.8172119Z // begin inline asm 2026-02-21T10:23:47.8173601Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r15156,%r15157,%r15158,%r15159}, %rd14, %p136, 1, 1; 2026-02-21T10:23:47.8173663Z // end inline asm 2026-02-21T10:23:47.8173725Z // begin inline asm 2026-02-21T10:23:47.8175208Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r15288,%r15289,%r15290,%r15291}, %rd15, %p136, 1, 1; 2026-02-21T10:23:47.8175344Z // end inline asm 2026-02-21T10:23:47.8175404Z // begin inline asm 2026-02-21T10:23:47.8177010Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r15420,%r15421,%r15422,%r15423}, %rd16, %p136, 1, 1; 2026-02-21T10:23:47.8177151Z // end inline asm 2026-02-21T10:23:47.8177214Z // begin inline asm 2026-02-21T10:23:47.8178815Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r15552,%r15553,%r15554,%r15555}, %rd17, %p136, 1, 1; 2026-02-21T10:23:47.8178889Z // end inline asm 2026-02-21T10:23:47.8178948Z // begin inline asm 2026-02-21T10:23:47.8180435Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283}, {%r15684,%r15685,%r15686,%r15687}, %rd18, %p136, 1, 1; 2026-02-21T10:23:47.8180495Z // end inline asm 2026-02-21T10:23:47.8180574Z wgmma.commit_group.sync.aligned; 2026-02-21T10:23:47.8180640Z mov.b32 %r15754, %r15753; 2026-02-21T10:23:47.8180700Z mov.b32 %r15752, %r10980; 2026-02-21T10:23:47.8180762Z // begin inline asm 2026-02-21T10:23:47.8182050Z // wait for regs: 
%r16220,%r16221,%r16222,%r16223,%r16224,%r16225,%r16226,%r16227,%r16228,%r16229,%r16230,%r16231,%r16232,%r16233,%r16234,%r16235,%r16236,%r16237,%r16238,%r16239,%r16240,%r16241,%r16242,%r16243,%r16244,%r16245,%r16246,%r16247,%r16248,%r16249,%r16250,%r16251,%r16252,%r16253,%r16254,%r16255,%r16256,%r16257,%r16258,%r16259,%r16260,%r16261,%r16262,%r16263,%r16264,%r16265,%r16266,%r16267,%r16268,%r16269,%r16270,%r16271,%r16272,%r16273,%r16274,%r16275,%r16276,%r16277,%r16278,%r16279,%r16280,%r16281,%r16282,%r16283,%r15752,%r15753,%r15754 2026-02-21T10:23:47.8182152Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:23:47.8182211Z // end inline asm 2026-02-21T10:23:47.8182269Z $L__tmp24: 2026-02-21T10:23:47.8182492Z .loc 1 47 126 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:47:126 2026-02-21T10:23:47.8182562Z add.s64 %rd42, %rd366, 128; 2026-02-21T10:23:47.8182626Z add.s64 %rd365, %rd365, 512; 2026-02-21T10:23:47.8182701Z setp.lt.u64 %p186, %rd366, 3968; 2026-02-21T10:23:47.8182763Z mov.b64 %rd366, %rd42; 2026-02-21T10:23:47.8182823Z @%p186 bra $L__BB0_10; 2026-02-21T10:23:47.8182939Z // %bb.11: // in Loop: Header=BB0_9 Depth=1 2026-02-21T10:23:47.8183148Z .loc 1 38 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:38:32 2026-02-21T10:23:47.8183216Z or.b32 %r16027, %r10981, %r15; 2026-02-21T10:23:47.8183423Z .loc 1 40 32 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:40:32 2026-02-21T10:23:47.8183559Z or.b32 %r16028, %r367, %r7; 2026-02-21T10:23:47.8183621Z or.b32 %r16029, %r367, %r8; 2026-02-21T10:23:47.8183681Z or.b32 %r16030, %r367, %r9; 2026-02-21T10:23:47.8183746Z or.b32 %r16031, %r367, %r10; 2026-02-21T10:23:47.8183855Z or.b32 %r16032, %r367, %r11; 2026-02-21T10:23:47.8183915Z or.b32 %r16033, %r367, %r12; 2026-02-21T10:23:47.8183980Z or.b32 %r16034, %r367, %r13; 2026-02-21T10:23:47.8184045Z or.b32 %r16035, %r367, %r14; 2026-02-21T10:23:47.8184246Z .loc 1 93 28 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:93:28 
2026-02-21T10:23:47.8184340Z cvt.rn.bf16x2.f32 %r16036, %r16221, %r16220; 2026-02-21T10:23:47.8184432Z cvt.rn.bf16x2.f32 %r16037, %r16223, %r16222; 2026-02-21T10:23:47.8184512Z cvt.rn.bf16x2.f32 %r16038, %r16225, %r16224; 2026-02-21T10:23:47.8184588Z cvt.rn.bf16x2.f32 %r16039, %r16227, %r16226; 2026-02-21T10:23:47.8184670Z cvt.rn.bf16x2.f32 %r16040, %r16229, %r16228; 2026-02-21T10:23:47.8184802Z cvt.rn.bf16x2.f32 %r16041, %r16231, %r16230; 2026-02-21T10:23:47.8184881Z cvt.rn.bf16x2.f32 %r16042, %r16233, %r16232; 2026-02-21T10:23:47.8184962Z cvt.rn.bf16x2.f32 %r16043, %r16235, %r16234; 2026-02-21T10:23:47.8185041Z cvt.rn.bf16x2.f32 %r16044, %r16237, %r16236; 2026-02-21T10:23:47.8185121Z cvt.rn.bf16x2.f32 %r16045, %r16239, %r16238; 2026-02-21T10:23:47.8185201Z cvt.rn.bf16x2.f32 %r16046, %r16241, %r16240; 2026-02-21T10:23:47.8185325Z cvt.rn.bf16x2.f32 %r16047, %r16243, %r16242; 2026-02-21T10:23:47.8185406Z cvt.rn.bf16x2.f32 %r16048, %r16245, %r16244; 2026-02-21T10:23:47.8185487Z cvt.rn.bf16x2.f32 %r16049, %r16247, %r16246; 2026-02-21T10:23:47.8185565Z cvt.rn.bf16x2.f32 %r16050, %r16249, %r16248; 2026-02-21T10:23:47.8185643Z cvt.rn.bf16x2.f32 %r16051, %r16251, %r16250; 2026-02-21T10:23:47.8185717Z cvt.rn.bf16x2.f32 %r16052, %r16253, %r16252; 2026-02-21T10:23:47.8185797Z cvt.rn.bf16x2.f32 %r16053, %r16255, %r16254; 2026-02-21T10:23:47.8185873Z cvt.rn.bf16x2.f32 %r16054, %r16257, %r16256; 2026-02-21T10:23:47.8185952Z cvt.rn.bf16x2.f32 %r16055, %r16259, %r16258; 2026-02-21T10:23:47.8186031Z cvt.rn.bf16x2.f32 %r16056, %r16261, %r16260; 2026-02-21T10:23:47.8186108Z cvt.rn.bf16x2.f32 %r16057, %r16263, %r16262; 2026-02-21T10:23:47.8186190Z cvt.rn.bf16x2.f32 %r16058, %r16265, %r16264; 2026-02-21T10:23:47.8186266Z cvt.rn.bf16x2.f32 %r16059, %r16267, %r16266; 2026-02-21T10:23:47.8186352Z cvt.rn.bf16x2.f32 %r16060, %r16269, %r16268; 2026-02-21T10:23:47.8186429Z cvt.rn.bf16x2.f32 %r16061, %r16271, %r16270; 2026-02-21T10:23:47.8186636Z cvt.rn.bf16x2.f32 %r16062, %r16273, %r16272; 
2026-02-21T10:23:47.8186720Z cvt.rn.bf16x2.f32 %r16063, %r16275, %r16274; 2026-02-21T10:23:47.8186797Z cvt.rn.bf16x2.f32 %r16064, %r16277, %r16276; 2026-02-21T10:23:47.8186874Z cvt.rn.bf16x2.f32 %r16065, %r16279, %r16278; 2026-02-21T10:23:47.8186952Z cvt.rn.bf16x2.f32 %r16066, %r16281, %r16280; 2026-02-21T10:23:47.8187031Z cvt.rn.bf16x2.f32 %r16067, %r16283, %r16282; 2026-02-21T10:23:47.8187246Z .loc 1 94 50 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:94:50 2026-02-21T10:23:47.8187326Z mad.lo.s32 %r16068, %r16028, 1280, %r16027; 2026-02-21T10:23:47.8187405Z mad.lo.s32 %r16069, %r16029, 1280, %r16027; 2026-02-21T10:23:47.8187481Z mad.lo.s32 %r16070, %r16030, 1280, %r16027; 2026-02-21T10:23:47.8187554Z mad.lo.s32 %r16071, %r16031, 1280, %r16027; 2026-02-21T10:23:47.8187631Z mad.lo.s32 %r16072, %r16032, 1280, %r16027; 2026-02-21T10:23:47.8187704Z mad.lo.s32 %r16073, %r16033, 1280, %r16027; 2026-02-21T10:23:47.8187775Z mad.lo.s32 %r16074, %r16034, 1280, %r16027; 2026-02-21T10:23:47.8187849Z mad.lo.s32 %r16075, %r16035, 1280, %r16027; 2026-02-21T10:23:47.8188054Z .loc 1 94 22 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:94:22 2026-02-21T10:23:47.8188129Z mad.wide.s32 %rd353, %r16068, 2, %rd45; 2026-02-21T10:23:47.8188201Z mad.wide.s32 %rd354, %r16069, 2, %rd45; 2026-02-21T10:23:47.8188384Z mad.wide.s32 %rd355, %r16070, 2, %rd45; 2026-02-21T10:23:47.8188457Z mad.wide.s32 %rd356, %r16071, 2, %rd45; 2026-02-21T10:23:47.8188591Z mad.wide.s32 %rd357, %r16072, 2, %rd45; 2026-02-21T10:23:47.8188665Z mad.wide.s32 %rd358, %r16073, 2, %rd45; 2026-02-21T10:23:47.8188814Z mad.wide.s32 %rd359, %r16074, 2, %rd45; 2026-02-21T10:23:47.8188886Z mad.wide.s32 %rd360, %r16075, 2, %rd45; 2026-02-21T10:23:47.8189097Z .loc 1 94 81 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:94:81 2026-02-21T10:23:47.8189155Z bar.sync 0; 2026-02-21T10:23:47.8189274Z st.shared.v4.b32 [%r86], {%r16036, %r16038, %r16040, %r16042}; 2026-02-21T10:23:47.8189399Z 
st.shared.v4.b32 [%r86+512], {%r16037, %r16039, %r16041, %r16043}; 2026-02-21T10:23:47.8189512Z st.shared.v4.b32 [%r87], {%r16044, %r16046, %r16048, %r16050}; 2026-02-21T10:23:47.8189629Z st.shared.v4.b32 [%r87+512], {%r16045, %r16047, %r16049, %r16051}; 2026-02-21T10:23:47.8189737Z st.shared.v4.b32 [%r88], {%r16052, %r16054, %r16056, %r16058}; 2026-02-21T10:23:47.8189921Z st.shared.v4.b32 [%r88+512], {%r16053, %r16055, %r16057, %r16059}; 2026-02-21T10:23:47.8190032Z st.shared.v4.b32 [%r89], {%r16060, %r16062, %r16064, %r16066}; 2026-02-21T10:23:47.8190145Z st.shared.v4.b32 [%r89+512], {%r16061, %r16063, %r16065, %r16067}; 2026-02-21T10:23:47.8190207Z bar.sync 0; 2026-02-21T10:23:47.8190270Z // begin inline asm 2026-02-21T10:23:47.8190534Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r15955, %r15956, %r15957, %r15958}, [%r15959]; 2026-02-21T10:23:47.8190596Z // end inline asm 2026-02-21T10:23:47.8190660Z // begin inline asm 2026-02-21T10:23:47.8190857Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r15960, %r15961, %r15962, %r15963}, [%r15964]; 2026-02-21T10:23:47.8190914Z // end inline asm 2026-02-21T10:23:47.8190974Z // begin inline asm 2026-02-21T10:23:47.8191167Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r15965, %r15966, %r15967, %r15968}, [%r15969]; 2026-02-21T10:23:47.8191223Z // end inline asm 2026-02-21T10:23:47.8191288Z // begin inline asm 2026-02-21T10:23:47.8191494Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r15970, %r15971, %r15972, %r15973}, [%r15974]; 2026-02-21T10:23:47.8191554Z // end inline asm 2026-02-21T10:23:47.8191614Z // begin inline asm 2026-02-21T10:23:47.8191811Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r15975, %r15976, %r15977, %r15978}, [%r15979]; 2026-02-21T10:23:47.8191869Z // end inline asm 2026-02-21T10:23:47.8191930Z // begin inline asm 2026-02-21T10:23:47.8192124Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r15980, %r15981, %r15982, %r15983}, [%r15984]; 2026-02-21T10:23:47.8192181Z // end inline asm 2026-02-21T10:23:47.8192239Z // begin 
inline asm 2026-02-21T10:23:47.8192431Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r15985, %r15986, %r15987, %r15988}, [%r15989]; 2026-02-21T10:23:47.8192491Z // end inline asm 2026-02-21T10:23:47.8192550Z // begin inline asm 2026-02-21T10:23:47.8192740Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r15990, %r15991, %r15992, %r15993}, [%r15994]; 2026-02-21T10:23:47.8192806Z // end inline asm 2026-02-21T10:23:47.8192864Z // begin inline asm 2026-02-21T10:23:47.8192995Z st.global.v4.b32 [ %rd353 + 0 ], { %r15955, %r15956, %r15957, %r15958 }; 2026-02-21T10:23:47.8193054Z // end inline asm 2026-02-21T10:23:47.8193126Z // begin inline asm 2026-02-21T10:23:47.8193254Z st.global.v4.b32 [ %rd354 + 0 ], { %r15960, %r15961, %r15962, %r15963 }; 2026-02-21T10:23:47.8193311Z // end inline asm 2026-02-21T10:23:47.8193376Z // begin inline asm 2026-02-21T10:23:47.8193498Z st.global.v4.b32 [ %rd355 + 0 ], { %r15965, %r15966, %r15967, %r15968 }; 2026-02-21T10:23:47.8193557Z // end inline asm 2026-02-21T10:23:47.8193619Z // begin inline asm 2026-02-21T10:23:47.8193739Z st.global.v4.b32 [ %rd356 + 0 ], { %r15970, %r15971, %r15972, %r15973 }; 2026-02-21T10:23:47.8193795Z // end inline asm 2026-02-21T10:23:47.8193854Z // begin inline asm 2026-02-21T10:23:47.8193976Z st.global.v4.b32 [ %rd357 + 0 ], { %r15975, %r15976, %r15977, %r15978 }; 2026-02-21T10:23:47.8194093Z // end inline asm 2026-02-21T10:23:47.8194154Z // begin inline asm 2026-02-21T10:23:47.8194277Z st.global.v4.b32 [ %rd358 + 0 ], { %r15980, %r15981, %r15982, %r15983 }; 2026-02-21T10:23:47.8194332Z // end inline asm 2026-02-21T10:23:47.8194442Z // begin inline asm 2026-02-21T10:23:47.8194568Z st.global.v4.b32 [ %rd359 + 0 ], { %r15985, %r15986, %r15987, %r15988 }; 2026-02-21T10:23:47.8194633Z // end inline asm 2026-02-21T10:23:47.8194695Z // begin inline asm 2026-02-21T10:23:47.8194816Z st.global.v4.b32 [ %rd360 + 0 ], { %r15990, %r15991, %r15992, %r15993 }; 2026-02-21T10:23:47.8194875Z // end inline asm 
2026-02-21T10:23:47.8195094Z .loc 1 26 140 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:26:140 2026-02-21T10:23:47.8195159Z add.s32 %r496, %r16219, 132; 2026-02-21T10:23:47.8195234Z setp.lt.s32 %p187, %r16219, 4988; 2026-02-21T10:23:47.8195299Z mov.b32 %r16219, %r496; 2026-02-21T10:23:47.8195361Z @%p187 bra $L__BB0_9; 2026-02-21T10:23:47.8195506Z $L__BB0_12: // %._crit_edge 2026-02-21T10:23:47.8195715Z .loc 1 26 4 // cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py:26:4 2026-02-21T10:23:47.8195768Z ret; 2026-02-21T10:23:47.8195827Z $L__tmp25: 2026-02-21T10:23:47.8195887Z $L__func_end0: 2026-02-21T10:23:47.8195976Z // -- End function 2026-02-21T10:23:47.8196028Z } 2026-02-21T10:23:47.8196326Z .file 1 "/tmp/torchinductor_root/be/cbeynqiiumrhvrv6kkl6iwoexee3chp3ivp65qqjmf4fnaayu2rj.py" 2026-02-21T10:23:47.8196663Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T10:23:47.8196733Z .section .debug_abbrev 2026-02-21T10:23:47.8196797Z { 2026-02-21T10:23:47.8196900Z .b8 1 // Abbreviation Code 2026-02-21T10:23:47.8196993Z .b8 17 // DW_TAG_compile_unit 2026-02-21T10:23:47.8197079Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:23:47.8197177Z .b8 37 // DW_AT_producer 2026-02-21T10:23:47.8197257Z .b8 8 // DW_FORM_string 2026-02-21T10:23:47.8197336Z .b8 19 // DW_AT_language 2026-02-21T10:23:47.8197427Z .b8 5 // DW_FORM_data2 2026-02-21T10:23:47.8197507Z .b8 3 // DW_AT_name 2026-02-21T10:23:47.8197590Z .b8 8 // DW_FORM_string 2026-02-21T10:23:47.8197680Z .b8 16 // DW_AT_stmt_list 2026-02-21T10:23:47.8197759Z .b8 6 // DW_FORM_data4 2026-02-21T10:23:47.8197839Z .b8 27 // DW_AT_comp_dir 2026-02-21T10:23:47.8197917Z .b8 8 // DW_FORM_string 2026-02-21T10:23:47.8197996Z .b8 0 // EOM(1) 2026-02-21T10:23:47.8198067Z .b8 0 // EOM(2) 2026-02-21T10:23:47.8198159Z .b8 2 // Abbreviation Code 2026-02-21T10:23:47.8198251Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:23:47.8198331Z .b8 0 // DW_CHILDREN_no 
2026-02-21T10:23:47.8198410Z .b8 3 // DW_AT_name 2026-02-21T10:23:47.8198491Z .b8 8 // DW_FORM_string 2026-02-21T10:23:47.8198573Z .b8 32 // DW_AT_inline 2026-02-21T10:23:47.8198655Z .b8 11 // DW_FORM_data1 2026-02-21T10:23:47.8198727Z .b8 0 // EOM(1) 2026-02-21T10:23:47.8198798Z .b8 0 // EOM(2) 2026-02-21T10:23:47.8198885Z .b8 3 // Abbreviation Code 2026-02-21T10:23:47.8198970Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:23:47.8199055Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:23:47.8199219Z .b8 17 // DW_AT_low_pc 2026-02-21T10:23:47.8199297Z .b8 1 // DW_FORM_addr 2026-02-21T10:23:47.8199381Z .b8 18 // DW_AT_high_pc 2026-02-21T10:23:47.8199533Z .b8 1 // DW_FORM_addr 2026-02-21T10:23:47.8199627Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:23:47.8199706Z .b8 19 // DW_FORM_ref4 2026-02-21T10:23:47.8199780Z .b8 0 // EOM(1) 2026-02-21T10:23:47.8199849Z .b8 0 // EOM(2) 2026-02-21T10:23:47.8199933Z .b8 4 // Abbreviation Code 2026-02-21T10:23:47.8200036Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T10:23:47.8200116Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:23:47.8200204Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:23:47.8200349Z .b8 19 // DW_FORM_ref4 2026-02-21T10:23:47.8200430Z .b8 17 // DW_AT_low_pc 2026-02-21T10:23:47.8200506Z .b8 1 // DW_FORM_addr 2026-02-21T10:23:47.8200596Z .b8 18 // DW_AT_high_pc 2026-02-21T10:23:47.8200728Z .b8 1 // DW_FORM_addr 2026-02-21T10:23:47.8200815Z .b8 88 // DW_AT_call_file 2026-02-21T10:23:47.8200895Z .b8 11 // DW_FORM_data1 2026-02-21T10:23:47.8200979Z .b8 89 // DW_AT_call_line 2026-02-21T10:23:47.8201058Z .b8 11 // DW_FORM_data1 2026-02-21T10:23:47.8201142Z .b8 87 // DW_AT_call_column 2026-02-21T10:23:47.8201223Z .b8 11 // DW_FORM_data1 2026-02-21T10:23:47.8201297Z .b8 0 // EOM(1) 2026-02-21T10:23:47.8201368Z .b8 0 // EOM(2) 2026-02-21T10:23:47.8201438Z .b8 0 // EOM(3) 2026-02-21T10:23:47.8201491Z } 2026-02-21T10:23:47.8201557Z .section .debug_info 2026-02-21T10:23:47.8201608Z { 2026-02-21T10:23:47.8201701Z .b32 178 // Length of Unit 
2026-02-21T10:23:47.8201796Z .b8 2 // DWARF version number 2026-02-21T10:23:47.8201849Z .b8 0 2026-02-21T10:23:47.8201983Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T10:23:47.8202082Z .b8 8 // Address Size (in bytes) 2026-02-21T10:23:47.8202205Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T10:23:47.8202296Z .b8 116 // DW_AT_producer 2026-02-21T10:23:47.8202352Z .b8 114 2026-02-21T10:23:47.8202405Z .b8 105 2026-02-21T10:23:47.8202460Z .b8 116 2026-02-21T10:23:47.8202520Z .b8 111 2026-02-21T10:23:47.8202571Z .b8 110 2026-02-21T10:23:47.8202621Z .b8 0 2026-02-21T10:23:47.8202700Z .b8 2 // DW_AT_language 2026-02-21T10:23:47.8202765Z .b8 0 2026-02-21T10:23:47.8202849Z .b8 99 // DW_AT_name 2026-02-21T10:23:47.8202908Z .b8 98 2026-02-21T10:23:47.8202964Z .b8 101 2026-02-21T10:23:47.8203016Z .b8 121 2026-02-21T10:23:47.8205714Z .b8 110 2026-02-21T10:23:47.8205801Z .b8 113 2026-02-21T10:23:47.8205859Z .b8 105 2026-02-21T10:23:47.8205914Z .b8 105 2026-02-21T10:23:47.8205967Z .b8 117 2026-02-21T10:23:47.8206021Z .b8 109 2026-02-21T10:23:47.8206076Z .b8 114 2026-02-21T10:23:47.8206128Z .b8 104 2026-02-21T10:23:47.8206182Z .b8 118 2026-02-21T10:23:47.8206234Z .b8 114 2026-02-21T10:23:47.8206285Z .b8 118 2026-02-21T10:23:47.8206337Z .b8 54 2026-02-21T10:23:47.8206393Z .b8 107 2026-02-21T10:23:47.8206617Z .b8 107 2026-02-21T10:23:47.8206679Z .b8 108 2026-02-21T10:23:47.8206869Z .b8 54 2026-02-21T10:23:47.8206940Z .b8 105 2026-02-21T10:23:47.8206996Z .b8 119 2026-02-21T10:23:47.8207049Z .b8 111 2026-02-21T10:23:47.8207104Z .b8 101 2026-02-21T10:23:47.8207157Z .b8 120 2026-02-21T10:23:47.8207208Z .b8 101 2026-02-21T10:23:47.8207331Z .b8 101 2026-02-21T10:23:47.8207385Z .b8 51 2026-02-21T10:23:47.8207437Z .b8 99 2026-02-21T10:23:47.8207488Z .b8 104 2026-02-21T10:23:47.8207542Z .b8 112 2026-02-21T10:23:47.8207593Z .b8 51 2026-02-21T10:23:47.8207648Z .b8 105 2026-02-21T10:23:47.8207698Z .b8 118 2026-02-21T10:23:47.8207753Z .b8 112 
2026-02-21T10:23:47.8207804Z .b8 54 2026-02-21T10:23:47.8207868Z .b8 53 2026-02-21T10:23:47.8207922Z .b8 113 2026-02-21T10:23:47.8207978Z .b8 113 2026-02-21T10:23:47.8208029Z .b8 106 2026-02-21T10:23:47.8208080Z .b8 109 2026-02-21T10:23:47.8208137Z .b8 102 2026-02-21T10:23:47.8208189Z .b8 52 2026-02-21T10:23:47.8208240Z .b8 102 2026-02-21T10:23:47.8208294Z .b8 110 2026-02-21T10:23:47.8208348Z .b8 97 2026-02-21T10:23:47.8208399Z .b8 97 2026-02-21T10:23:47.8208454Z .b8 121 2026-02-21T10:23:47.8208595Z .b8 117 2026-02-21T10:23:47.8208652Z .b8 50 2026-02-21T10:23:47.8208704Z .b8 114 2026-02-21T10:23:47.8208758Z .b8 106 2026-02-21T10:23:47.8208816Z .b8 46 2026-02-21T10:23:47.8208869Z .b8 112 2026-02-21T10:23:47.8208923Z .b8 121 2026-02-21T10:23:47.8208977Z .b8 0 2026-02-21T10:23:47.8209117Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T10:23:47.8209305Z .b8 47 // DW_AT_comp_dir 2026-02-21T10:23:47.8209363Z .b8 116 2026-02-21T10:23:47.8209417Z .b8 109 2026-02-21T10:23:47.8209470Z .b8 112 2026-02-21T10:23:47.8209521Z .b8 47 2026-02-21T10:23:47.8209575Z .b8 116 2026-02-21T10:23:47.8209627Z .b8 111 2026-02-21T10:23:47.8209677Z .b8 114 2026-02-21T10:23:47.8209729Z .b8 99 2026-02-21T10:23:47.8209782Z .b8 104 2026-02-21T10:23:47.8209835Z .b8 105 2026-02-21T10:23:47.8209885Z .b8 110 2026-02-21T10:23:47.8209936Z .b8 100 2026-02-21T10:23:47.8210001Z .b8 117 2026-02-21T10:23:47.8210054Z .b8 99 2026-02-21T10:23:47.8210110Z .b8 116 2026-02-21T10:23:47.8210168Z .b8 111 2026-02-21T10:23:47.8210222Z .b8 114 2026-02-21T10:23:47.8210272Z .b8 95 2026-02-21T10:23:47.8210323Z .b8 114 2026-02-21T10:23:47.8210377Z .b8 111 2026-02-21T10:23:47.8210428Z .b8 111 2026-02-21T10:23:47.8210483Z .b8 116 2026-02-21T10:23:47.8210537Z .b8 47 2026-02-21T10:23:47.8210588Z .b8 98 2026-02-21T10:23:47.8210638Z .b8 101 2026-02-21T10:23:47.8210691Z .b8 0 2026-02-21T10:23:47.8210822Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T10:23:47.8210909Z .b8 95 // DW_AT_name 2026-02-21T10:23:47.8210962Z 
.b8 104 2026-02-21T10:23:47.8211016Z .b8 101 2026-02-21T10:23:47.8211069Z .b8 108 2026-02-21T10:23:47.8211123Z .b8 105 2026-02-21T10:23:47.8211175Z .b8 111 2026-02-21T10:23:47.8211229Z .b8 110 2026-02-21T10:23:47.8211280Z .b8 95 2026-02-21T10:23:47.8211332Z .b8 109 2026-02-21T10:23:47.8211387Z .b8 97 2026-02-21T10:23:47.8211439Z .b8 116 2026-02-21T10:23:47.8211491Z .b8 109 2026-02-21T10:23:47.8211543Z .b8 117 2026-02-21T10:23:47.8211601Z .b8 108 2026-02-21T10:23:47.8211650Z .b8 95 2026-02-21T10:23:47.8211701Z .b8 98 2026-02-21T10:23:47.8211755Z .b8 102 2026-02-21T10:23:47.8211815Z .b8 49 2026-02-21T10:23:47.8211866Z .b8 54 2026-02-21T10:23:47.8211921Z .b8 95 2026-02-21T10:23:47.8211977Z .b8 105 2026-02-21T10:23:47.8212029Z .b8 110 2026-02-21T10:23:47.8212082Z .b8 116 2026-02-21T10:23:47.8212134Z .b8 52 2026-02-21T10:23:47.8212190Z .b8 0 2026-02-21T10:23:47.8212280Z .b8 1 // DW_AT_inline 2026-02-21T10:23:47.8212400Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T10:23:47.8212509Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T10:23:47.8212610Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T10:23:47.8212714Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:23:47.8212851Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T10:23:47.8213014Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:23:47.8213105Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T10:23:47.8213198Z .b64 $L__tmp24 // DW_AT_high_pc 2026-02-21T10:23:47.8213348Z .b8 1 // DW_AT_call_file 2026-02-21T10:23:47.8213436Z .b8 90 // DW_AT_call_line 2026-02-21T10:23:47.8213533Z .b8 40 // DW_AT_call_column 2026-02-21T10:23:47.8213628Z .b8 0 // End Of Children Mark 2026-02-21T10:23:47.8213714Z .b8 0 // End Of Children Mark 2026-02-21T10:23:47.8213768Z } 2026-02-21T10:23:47.8213843Z .section .debug_macinfo { } 2026-02-21T10:23:47.8213850Z 2026-02-21T10:23:47.8213931Z ================================================================ 2026-02-21T10:23:47.8214049Z please share the reproducer above 
with Triton project. 2026-02-21T10:23:48.7531915Z 2026-02-21T10:23:48.7532230Z 2026-02-21T10:23:48.7532242Z 2026-02-21T10:23:48.7532522Z ================================================================ 2026-02-21T10:23:48.7533008Z Internal Triton PTX codegen error 2026-02-21T10:23:48.7533289Z `ptxas` stderr: 2026-02-21T10:23:48.7534185Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 412 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T10:23:48.7535044Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:23:48.7535291Z 2026-02-21T10:23:48.7535953Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp3qfszq07.ptx -o /tmp/tmp3qfszq07.ptx.o 2026-02-21T10:23:48.7536888Z 2026-02-21T10:23:48.7536894Z 2026-02-21T10:23:48.7536970Z // 2026-02-21T10:23:48.7537168Z // Generated by LLVM NVPTX Back-End 2026-02-21T10:23:48.7537437Z // 2026-02-21T10:23:48.7537547Z 2026-02-21T10:23:48.7537624Z .version 8.7 2026-02-21T10:23:48.7537811Z .target sm_90a 2026-02-21T10:23:48.7537989Z .address_size 64 2026-02-21T10:23:48.7538115Z 2026-02-21T10:23:48.7538337Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T10:23:48.7538779Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T10:23:48.7539117Z // @_helion_matmul_bf16_int4 2026-02-21T10:23:48.7539469Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T10:23:48.7539846Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T10:23:48.7540485Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T10:23:48.7540957Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T10:23:48.7541409Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T10:23:48.7541849Z .param .u64 .ptr .global .align 1 
_helion_matmul_bf16_int4_param_4 2026-02-21T10:23:48.7542233Z ) 2026-02-21T10:23:48.7542406Z .reqntid 256 2026-02-21T10:23:48.7542600Z .maxnreg 32 2026-02-21T10:23:48.7542737Z { 2026-02-21T10:23:48.7542889Z .reg .pred %p<46>; 2026-02-21T10:23:48.7543070Z .reg .b16 %rs<577>; 2026-02-21T10:23:48.7543272Z .reg .b32 %r<6653>; 2026-02-21T10:23:48.7543446Z .reg .b64 %rd<193>; 2026-02-21T10:23:48.7543794Z .loc 1 14 0 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:14:0 2026-02-21T10:23:48.7544204Z $L__func_begin0: 2026-02-21T10:23:48.7544536Z .loc 1 14 0 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:14:0 2026-02-21T10:23:48.7544866Z 2026-02-21T10:23:48.7544962Z // %bb.0: 2026-02-21T10:23:48.7545183Z ld.param.b64 %rd25, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T10:23:48.7545535Z ld.param.b64 %rd24, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T10:23:48.7545861Z ld.param.b64 %rd23, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T10:23:48.7546120Z $L__tmp0: 2026-02-21T10:23:48.7546710Z .loc 1 20 30 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:20:30 2026-02-21T10:23:48.7547121Z mov.u32 %r599, %ctaid.x; 2026-02-21T10:23:48.7547467Z .loc 1 20 35 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:20:35 2026-02-21T10:23:48.7548001Z mul.lo.s32 %r6359, %r599, 39; 2026-02-21T10:23:48.7548368Z .loc 1 21 37 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:21:37 2026-02-21T10:23:48.7548848Z add.s32 %r600, %r6359, 39; 2026-02-21T10:23:48.7549204Z .loc 1 21 49 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:21:49 2026-02-21T10:23:48.7549596Z min.s32 %r2, %r600, 5120; 2026-02-21T10:23:48.7549952Z .loc 1 22 121 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:22:121 2026-02-21T10:23:48.7550357Z sub.s32 %r601, %r2, %r6359; 2026-02-21T10:23:48.7550545Z shr.u32 %r602, %r601, 31; 2026-02-21T10:23:48.7550814Z add.s32 %r603, %r601, %r602; 2026-02-21T10:23:48.7551006Z and.b32 %r604, %r603, -2; 
2026-02-21T10:23:48.7551193Z add.s32 %r6586, %r604, %r6359; 2026-02-21T10:23:48.7551545Z .loc 1 34 45 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:34:45 2026-02-21T10:23:48.7551937Z mov.u32 %r4, %tid.x; 2026-02-21T10:23:48.7552120Z shr.u32 %r5, %r4, 5; 2026-02-21T10:23:48.7552293Z and.b32 %r6, %r4, 252; 2026-02-21T10:23:48.7552551Z bfe.u32 %r7, %r4, 2, 6; 2026-02-21T10:23:48.7552729Z or.b32 %r8, %r7, 64; 2026-02-21T10:23:48.7552885Z shr.u32 %r605, %r4, 4; 2026-02-21T10:23:48.7553046Z bfe.u32 %r10, %r4, 4, 4; 2026-02-21T10:23:48.7553214Z or.b32 %r11, %r10, 16; 2026-02-21T10:23:48.7553368Z or.b32 %r12, %r10, 32; 2026-02-21T10:23:48.7553528Z or.b32 %r13, %r605, 48; 2026-02-21T10:23:48.7553697Z or.b32 %r14, %r10, 64; 2026-02-21T10:23:48.7553858Z or.b32 %r15, %r10, 80; 2026-02-21T10:23:48.7554017Z or.b32 %r16, %r10, 96; 2026-02-21T10:23:48.7554194Z or.b32 %r17, %r605, 112; 2026-02-21T10:23:48.7554362Z shl.b32 %r18, %r4, 2; 2026-02-21T10:23:48.7554520Z and.b32 %r19, %r18, 124; 2026-02-21T10:23:48.7554689Z shl.b32 %r20, %r4, 3; 2026-02-21T10:23:48.7554843Z and.b32 %r21, %r20, 120; 2026-02-21T10:23:48.7555166Z .loc 1 44 48 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:44:48 2026-02-21T10:23:48.7555522Z and.b32 %r22, %r4, 224; 2026-02-21T10:23:48.7555693Z bfe.u32 %r23, %r4, 5, 3; 2026-02-21T10:23:48.7556007Z .loc 1 50 38 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:50:38 2026-02-21T10:23:48.7556354Z and.b32 %r24, %r4, 3; 2026-02-21T10:23:48.7556665Z shl.b32 %r25, %r24, 2; 2026-02-21T10:23:48.7556980Z .loc 1 68 38 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:68:38 2026-02-21T10:23:48.7557339Z and.b32 %r27, %r4, 128; 2026-02-21T10:23:48.7557664Z .loc 1 22 121 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:22:121 2026-02-21T10:23:48.7558046Z setp.lt.s32 %p1, %r6359, %r6586; 2026-02-21T10:23:48.7558263Z mov.b32 %r4869, global_smem; 2026-02-21T10:23:48.7558441Z shl.b32 %r6344, %r22, 4; 
2026-02-21T10:23:48.7558616Z and.b32 %r6345, %r20, 96; 2026-02-21T10:23:48.7558787Z shl.b32 %r6346, %r24, 1; 2026-02-21T10:23:48.7558951Z and.b32 %r6347, %r5, 3; 2026-02-21T10:23:48.7559115Z shl.b32 %r6348, %r24, 8; 2026-02-21T10:23:48.7559278Z shl.b32 %r6349, %r24, 5; 2026-02-21T10:23:48.7559442Z and.b32 %r6350, %r4, 124; 2026-02-21T10:23:48.7559606Z shl.b32 %r6351, %r4, 6; 2026-02-21T10:23:48.7559769Z and.b32 %r6352, %r20, 48; 2026-02-21T10:23:48.7559930Z shr.u32 %r6353, %r27, 5; 2026-02-21T10:23:48.7560092Z shl.b32 %r6354, %r24, 13; 2026-02-21T10:23:48.7560252Z shl.b32 %r6355, %r4, 5; 2026-02-21T10:23:48.7560411Z and.b32 %r6356, %r4, 24; 2026-02-21T10:23:48.7560571Z and.b32 %r6357, %r18, 16; 2026-02-21T10:23:48.7560734Z shl.b32 %r6358, %r6, 2; 2026-02-21T10:23:48.7561008Z setp.eq.b32 %p45, %r27, 0; 2026-02-21T10:23:48.7561183Z @%p1 bra $L__BB0_2; 2026-02-21T10:23:48.7561348Z bra.uni $L__BB0_1; 2026-02-21T10:23:48.7561526Z $L__BB0_2: // %.lr.ph 2026-02-21T10:23:48.7561920Z .loc 1 0 121 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:0:121 2026-02-21T10:23:48.7562360Z bfe.u32 %r9, %r4, 1, 7; 2026-02-21T10:23:48.7562541Z and.b32 %r26, %r20, 8; 2026-02-21T10:23:48.7562710Z shl.b32 %r606, %r4, 4; 2026-02-21T10:23:48.7562872Z and.b32 %r607, %r606, 3952; 2026-02-21T10:23:48.7563049Z bfe.s32 %r608, %r4, 3, 1; 2026-02-21T10:23:48.7563216Z and.b32 %r609, %r608, 136; 2026-02-21T10:23:48.7563391Z or.b32 %r610, %r609, %r607; 2026-02-21T10:23:48.7563578Z add.s32 %r29, %r4869, %r610; 2026-02-21T10:23:48.7563755Z xor.b32 %r612, %r610, 8; 2026-02-21T10:23:48.7563919Z add.s32 %r30, %r4869, %r612; 2026-02-21T10:23:48.7564095Z and.b32 %r6488, %r4, 16; 2026-02-21T10:23:48.7564259Z bfe.s32 %r616, %r4, 4, 1; 2026-02-21T10:23:48.7564509Z and.b32 %r617, %r616, 136; 2026-02-21T10:23:48.7564691Z or.b32 %r618, %r6344, %r6345; 2026-02-21T10:23:48.7564867Z or.b32 %r619, %r618, %r6346; 2026-02-21T10:23:48.7565042Z or.b32 %r620, %r619, %r617; 2026-02-21T10:23:48.7565214Z add.s32 
%r32, %r4869, %r620; 2026-02-21T10:23:48.7565388Z xor.b32 %r621, %r620, 8; 2026-02-21T10:23:48.7565549Z add.s32 %r33, %r4869, %r621; 2026-02-21T10:23:48.7565789Z or.b32 %r623, %r19, %r6347; 2026-02-21T10:23:48.7565959Z or.b32 %r624, %r623, %r27; 2026-02-21T10:23:48.7566133Z add.s32 %r34, %r4869, %r624; 2026-02-21T10:23:48.7566306Z xor.b32 %r625, %r624, 32; 2026-02-21T10:23:48.7566593Z add.s32 %r35, %r4869, %r625; 2026-02-21T10:23:48.7566775Z xor.b32 %r626, %r624, 64; 2026-02-21T10:23:48.7566948Z add.s32 %r36, %r4869, %r626; 2026-02-21T10:23:48.7567123Z xor.b32 %r627, %r624, 96; 2026-02-21T10:23:48.7567287Z add.s32 %r37, %r4869, %r627; 2026-02-21T10:23:48.7567466Z xor.b32 %r631, %r6349, %r6350; 2026-02-21T10:23:48.7567650Z add.s32 %r632, %r4869, %r6348; 2026-02-21T10:23:48.7567833Z add.s32 %r38, %r632, %r631; 2026-02-21T10:23:48.7568006Z and.b32 %r634, %r6351, 8128; 2026-02-21T10:23:48.7568182Z or.b32 %r637, %r634, %r6353; 2026-02-21T10:23:48.7568361Z or.b32 %r638, %r637, %r6352; 2026-02-21T10:23:48.7568535Z add.s32 %r39, %r4869, %r638; 2026-02-21T10:23:48.7568708Z xor.b32 %r639, %r638, 16; 2026-02-21T10:23:48.7568868Z add.s32 %r40, %r4869, %r639; 2026-02-21T10:23:48.7569048Z xor.b32 %r640, %r638, 32; 2026-02-21T10:23:48.7569210Z add.s32 %r41, %r4869, %r640; 2026-02-21T10:23:48.7569385Z xor.b32 %r641, %r638, 48; 2026-02-21T10:23:48.7569546Z add.s32 %r42, %r4869, %r641; 2026-02-21T10:23:48.7569726Z bfe.u32 %r642, %r4869, 4, 14; 2026-02-21T10:23:48.7569901Z cvt.u64.u32 %rd26, %r642; 2026-02-21T10:23:48.7570089Z or.b64 %rd78, %rd26, -9223371899382267904; 2026-02-21T10:23:48.7570302Z add.s32 %r643, %r4869, 32; 2026-02-21T10:23:48.7570483Z bfe.u32 %r644, %r643, 4, 14; 2026-02-21T10:23:48.7570663Z cvt.u64.u32 %rd27, %r644; 2026-02-21T10:23:48.7570841Z or.b64 %rd79, %rd27, -9223371899382267904; 2026-02-21T10:23:48.7571044Z and.b32 %r647, %r6355, 7264; 2026-02-21T10:23:48.7571213Z shl.b32 %r649, %r6356, 4; 2026-02-21T10:23:48.7571377Z or.b32 %r651, %r6354, %r6357; 
2026-02-21T10:23:48.7571549Z or.b32 %r652, %r647, %r649; 2026-02-21T10:23:48.7571717Z or.b32 %r653, %r651, %r652; 2026-02-21T10:23:48.7571909Z add.s32 %r43, %r4869, %r653; 2026-02-21T10:23:48.7572079Z xor.b32 %r654, %r653, 32; 2026-02-21T10:23:48.7572251Z add.s32 %r44, %r4869, %r654; 2026-02-21T10:23:48.7572418Z xor.b32 %r655, %r653, 64; 2026-02-21T10:23:48.7572585Z add.s32 %r45, %r4869, %r655; 2026-02-21T10:23:48.7572750Z xor.b32 %r656, %r653, 96; 2026-02-21T10:23:48.7572915Z add.s32 %r46, %r4869, %r656; 2026-02-21T10:23:48.7573086Z shl.b32 %r657, %r6356, 10; 2026-02-21T10:23:48.7573256Z or.b32 %r659, %r657, %r6349; 2026-02-21T10:23:48.7573426Z xor.b32 %r660, %r659, %r6358; 2026-02-21T10:23:48.7573728Z add.s32 %r2380, %r4869, %r660; 2026-02-21T10:23:48.7573912Z add.s32 %r2385, %r2380, 1024; 2026-02-21T10:23:48.7574085Z add.s32 %r2390, %r2380, 2048; 2026-02-21T10:23:48.7574259Z add.s32 %r2395, %r2380, 3072; 2026-02-21T10:23:48.7574426Z add.s32 %r2400, %r2380, 4096; 2026-02-21T10:23:48.7574668Z add.s32 %r2405, %r2380, 5120; 2026-02-21T10:23:48.7574836Z add.s32 %r2410, %r2380, 6144; 2026-02-21T10:23:48.7575022Z add.s32 %r2415, %r2380, 7168; 2026-02-21T10:23:48.7575359Z .loc 1 22 121 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:22:121 2026-02-21T10:23:48.7575727Z add.s64 %rd3, %rd23, 96; 2026-02-21T10:23:48.7575898Z shl.b32 %r661, %r9, 13; 2026-02-21T10:23:48.7576209Z .loc 1 43 126 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:43:126 2026-02-21T10:23:48.7576695Z or.b32 %r55, %r661, %r26; 2026-02-21T10:23:48.7577013Z .loc 1 22 121 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:22:121 2026-02-21T10:23:48.7577491Z mad.wide.u32 %rd4, %r23, 1280, %rd24; 2026-02-21T10:23:48.7577749Z $L__BB0_3: // =>This Loop Header: Depth=1 2026-02-21T10:23:48.7578037Z // Child Loop BB0_4 Depth 2 2026-02-21T10:23:48.7578302Z // Child Loop BB0_6 Depth 2 2026-02-21T10:23:48.7578754Z .loc 1 28 35 // 
cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:28:35 2026-02-21T10:23:48.7579120Z shr.s32 %r663, %r6359, 31; 2026-02-21T10:23:48.7579290Z shr.u32 %r664, %r663, 17; 2026-02-21T10:23:48.7579463Z add.s32 %r665, %r6359, %r664; 2026-02-21T10:23:48.7579637Z shr.s32 %r666, %r665, 15; 2026-02-21T10:23:48.7579965Z .loc 1 29 33 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:29:33 2026-02-21T10:23:48.7580320Z shl.b32 %r667, %r666, 6; 2026-02-21T10:23:48.7580624Z .loc 1 30 39 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:30:39 2026-02-21T10:23:48.7580981Z sub.s32 %r668, 10, %r667; 2026-02-21T10:23:48.7581284Z .loc 1 30 52 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:30:52 2026-02-21T10:23:48.7581632Z min.s32 %r669, %r668, 64; 2026-02-21T10:23:48.7581936Z .loc 1 31 45 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:31:45 2026-02-21T10:23:48.7582310Z and.b32 %r670, %r665, -32768; 2026-02-21T10:23:48.7582494Z sub.s32 %r671, %r6359, %r670; 2026-02-21T10:23:48.7582807Z .loc 1 32 51 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:32:51 2026-02-21T10:23:48.7583160Z div.s32 %r672, %r671, %r669; 2026-02-21T10:23:48.7583468Z .loc 1 31 64 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:31:64 2026-02-21T10:23:48.7583825Z mul.lo.s32 %r673, %r672, %r669; 2026-02-21T10:23:48.7584013Z sub.s32 %r674, %r671, %r673; 2026-02-21T10:23:48.7584329Z .loc 1 31 30 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:31:30 2026-02-21T10:23:48.7584687Z add.s32 %r675, %r674, %r667; 2026-02-21T10:23:48.7584994Z .loc 1 33 27 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:33:27 2026-02-21T10:23:48.7585346Z shl.b32 %r57, %r675, 7; 2026-02-21T10:23:48.7585651Z .loc 1 35 27 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:35:27 2026-02-21T10:23:48.7585997Z shl.b32 %r58, %r672, 7; 2026-02-21T10:23:48.7586310Z .loc 1 43 126 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:43:126 
2026-02-21T10:23:48.7586807Z shl.b32 %r676, %r672, 20; 2026-02-21T10:23:48.7586986Z or.b32 %r677, %r55, %r676; 2026-02-21T10:23:48.7587165Z mad.wide.s32 %rd188, %r677, 2, %rd3; 2026-02-21T10:23:48.7587363Z or.b32 %r678, %r19, %r57; 2026-02-21T10:23:48.7587528Z cvt.s64.s32 %rd29, %r678; 2026-02-21T10:23:48.7587702Z add.s64 %rd187, %rd4, %rd29; 2026-02-21T10:23:48.7587973Z mov.b32 %r6360, 0f00000000; 2026-02-21T10:23:48.7588149Z mov.b64 %rd189, -32; 2026-02-21T10:23:48.7588306Z mov.b32 %r6361, %r6360; 2026-02-21T10:23:48.7588468Z mov.b32 %r6362, %r6360; 2026-02-21T10:23:48.7588733Z mov.b32 %r6363, %r6360; 2026-02-21T10:23:48.7588971Z mov.b32 %r6364, %r6360; 2026-02-21T10:23:48.7589134Z mov.b32 %r6365, %r6360; 2026-02-21T10:23:48.7589289Z mov.b32 %r6366, %r6360; 2026-02-21T10:23:48.7589450Z mov.b32 %r6367, %r6360; 2026-02-21T10:23:48.7589609Z mov.b32 %r6368, %r6360; 2026-02-21T10:23:48.7589773Z mov.b32 %r6369, %r6360; 2026-02-21T10:23:48.7589935Z mov.b32 %r6370, %r6360; 2026-02-21T10:23:48.7590098Z mov.b32 %r6371, %r6360; 2026-02-21T10:23:48.7590253Z mov.b32 %r6372, %r6360; 2026-02-21T10:23:48.7590412Z mov.b32 %r6373, %r6360; 2026-02-21T10:23:48.7590569Z mov.b32 %r6374, %r6360; 2026-02-21T10:23:48.7590729Z mov.b32 %r6375, %r6360; 2026-02-21T10:23:48.7590885Z mov.b32 %r6376, %r6360; 2026-02-21T10:23:48.7591040Z mov.b32 %r6377, %r6360; 2026-02-21T10:23:48.7591288Z mov.b32 %r6378, %r6360; 2026-02-21T10:23:48.7591450Z mov.b32 %r6379, %r6360; 2026-02-21T10:23:48.7591609Z mov.b32 %r6380, %r6360; 2026-02-21T10:23:48.7591761Z mov.b32 %r6381, %r6360; 2026-02-21T10:23:48.7591935Z mov.b32 %r6382, %r6360; 2026-02-21T10:23:48.7592097Z mov.b32 %r6383, %r6360; 2026-02-21T10:23:48.7592257Z mov.b32 %r6384, %r6360; 2026-02-21T10:23:48.7592411Z mov.b32 %r6385, %r6360; 2026-02-21T10:23:48.7592639Z mov.b32 %r6386, %r6360; 2026-02-21T10:23:48.7592801Z mov.b32 %r6387, %r6360; 2026-02-21T10:23:48.7592961Z mov.b32 %r6388, %r6360; 2026-02-21T10:23:48.7593119Z mov.b32 %r6389, %r6360; 
2026-02-21T10:23:48.7593284Z mov.b32 %r6390, %r6360; 2026-02-21T10:23:48.7593445Z mov.b32 %r6391, %r6360; 2026-02-21T10:23:48.7593598Z mov.b32 %r6392, %r6360; 2026-02-21T10:23:48.7593757Z mov.b32 %r6393, %r6360; 2026-02-21T10:23:48.7593910Z mov.b32 %r6394, %r6360; 2026-02-21T10:23:48.7594068Z mov.b32 %r6395, %r6360; 2026-02-21T10:23:48.7594238Z mov.b32 %r6396, %r6360; 2026-02-21T10:23:48.7594404Z mov.b32 %r6397, %r6360; 2026-02-21T10:23:48.7594562Z mov.b32 %r6398, %r6360; 2026-02-21T10:23:48.7594717Z mov.b32 %r6399, %r6360; 2026-02-21T10:23:48.7594874Z mov.b32 %r6400, %r6360; 2026-02-21T10:23:48.7595032Z mov.b32 %r6401, %r6360; 2026-02-21T10:23:48.7595193Z mov.b32 %r6402, %r6360; 2026-02-21T10:23:48.7595348Z mov.b32 %r6403, %r6360; 2026-02-21T10:23:48.7595510Z mov.b32 %r6404, %r6360; 2026-02-21T10:23:48.7595665Z mov.b32 %r6405, %r6360; 2026-02-21T10:23:48.7595824Z mov.b32 %r6406, %r6360; 2026-02-21T10:23:48.7595984Z mov.b32 %r6407, %r6360; 2026-02-21T10:23:48.7596158Z mov.b32 %r6408, %r6360; 2026-02-21T10:23:48.7596317Z mov.b32 %r6409, %r6360; 2026-02-21T10:23:48.7596601Z mov.b32 %r6410, %r6360; 2026-02-21T10:23:48.7596780Z mov.b32 %r6411, %r6360; 2026-02-21T10:23:48.7596938Z mov.b32 %r6412, %r6360; 2026-02-21T10:23:48.7597101Z mov.b32 %r6413, %r6360; 2026-02-21T10:23:48.7597258Z mov.b32 %r6414, %r6360; 2026-02-21T10:23:48.7597419Z mov.b32 %r6415, %r6360; 2026-02-21T10:23:48.7597576Z mov.b32 %r6416, %r6360; 2026-02-21T10:23:48.7597745Z mov.b32 %r6417, %r6360; 2026-02-21T10:23:48.7597904Z mov.b32 %r6418, %r6360; 2026-02-21T10:23:48.7598062Z mov.b32 %r6419, %r6360; 2026-02-21T10:23:48.7598222Z mov.b32 %r6420, %r6360; 2026-02-21T10:23:48.7598377Z mov.b32 %r6421, %r6360; 2026-02-21T10:23:48.7598539Z mov.b32 %r6422, %r6360; 2026-02-21T10:23:48.7598696Z mov.b32 %r6423, %r6360; 2026-02-21T10:23:48.7598909Z $L__BB0_4: // Parent Loop BB0_3 Depth=1 2026-02-21T10:23:48.7599203Z // => This Inner Loop Header: Depth=2 2026-02-21T10:23:48.7599604Z .loc 1 51 80 // 
cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7599973Z add.s64 %rd31, %rd188, -96; 2026-02-21T10:23:48.7600157Z // begin inline asm 2026-02-21T10:23:48.7600312Z mov.u64 %rd30, 0x0; 2026-02-21T10:23:48.7600628Z createpolicy.fractional.L2::evict_last.b64 %rd30, 1.0; 2026-02-21T10:23:48.7600886Z // end inline asm 2026-02-21T10:23:48.7601031Z // begin inline asm 2026-02-21T10:23:48.7601192Z mov.u32 %r679, 0x0; 2026-02-21T10:23:48.7601340Z mov.u32 %r680, 0x0; 2026-02-21T10:23:48.7601570Z mov.u32 %r681, 0x0; 2026-02-21T10:23:48.7601717Z mov.u32 %r682, 0x0; 2026-02-21T10:23:48.7602024Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r679, %r680, %r681, %r682 }, [ %rd31 + 0 ], %rd30; 2026-02-21T10:23:48.7602386Z // end inline asm 2026-02-21T10:23:48.7602683Z .loc 1 55 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:55:32 2026-02-21T10:23:48.7603040Z bar.sync 0; 2026-02-21T10:23:48.7603201Z st.shared.v2.b32 [%r29], {%r679, %r680}; 2026-02-21T10:23:48.7603425Z st.shared.v2.b32 [%r30], {%r681, %r682}; 2026-02-21T10:23:48.7603623Z bar.sync 0; 2026-02-21T10:23:48.7603790Z ld.shared.b16 %rs1, [%r32]; 2026-02-21T10:23:48.7603982Z ld.shared.b16 %rs2, [%r32+256]; 2026-02-21T10:23:48.7604263Z ld.shared.b16 %rs3, [%r32+16]; 2026-02-21T10:23:48.7604462Z ld.shared.b16 %rs4, [%r32+272]; 2026-02-21T10:23:48.7604652Z ld.shared.b16 %rs5, [%r33]; 2026-02-21T10:23:48.7604838Z ld.shared.b16 %rs6, [%r33+256]; 2026-02-21T10:23:48.7605027Z ld.shared.b16 %rs7, [%r33+16]; 2026-02-21T10:23:48.7605214Z ld.shared.b16 %rs8, [%r33+272]; 2026-02-21T10:23:48.7605398Z cvt.f32.bf16 %r812, %rs1; 2026-02-21T10:23:48.7605652Z cvt.f32.bf16 %r813, %rs2; 2026-02-21T10:23:48.7605830Z cvt.f32.bf16 %r814, %rs5; 2026-02-21T10:23:48.7606001Z cvt.f32.bf16 %r815, %rs6; 2026-02-21T10:23:48.7606166Z cvt.f32.bf16 %r944, %rs3; 2026-02-21T10:23:48.7606337Z cvt.f32.bf16 %r945, %rs4; 2026-02-21T10:23:48.7606632Z cvt.f32.bf16 %r946, %rs7; 2026-02-21T10:23:48.7606806Z 
cvt.f32.bf16 %r947, %rs8; 2026-02-21T10:23:48.7607129Z .loc 1 57 87 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:57:87 2026-02-21T10:23:48.7607483Z // begin inline asm 2026-02-21T10:23:48.7607646Z mov.u64 %rd33, 0x0; 2026-02-21T10:23:48.7607879Z createpolicy.fractional.L2::evict_last.b64 %rd33, 1.0; 2026-02-21T10:23:48.7608134Z // end inline asm 2026-02-21T10:23:48.7608283Z // begin inline asm 2026-02-21T10:23:48.7608436Z mov.u32 %r683, 0x0; 2026-02-21T10:23:48.7608690Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r683 }, [ %rd187 + 0 ], %rd33; 2026-02-21T10:23:48.7608986Z // end inline asm 2026-02-21T10:23:48.7609285Z .loc 1 65 28 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:65:28 2026-02-21T10:23:48.7609631Z bar.sync 0; 2026-02-21T10:23:48.7609781Z st.shared.b8 [%r34], %r683; 2026-02-21T10:23:48.7609966Z prmt.b32 %r2291, %r683, 0, 0x7771U; 2026-02-21T10:23:48.7610170Z st.shared.b8 [%r35+256], %r2291; 2026-02-21T10:23:48.7610365Z prmt.b32 %r2292, %r683, 0, 0x7772U; 2026-02-21T10:23:48.7610569Z st.shared.b8 [%r36+512], %r2292; 2026-02-21T10:23:48.7610763Z prmt.b32 %r2293, %r683, 0, 0x7773U; 2026-02-21T10:23:48.7610952Z st.shared.b8 [%r37+768], %r2293; 2026-02-21T10:23:48.7611138Z bar.sync 0; 2026-02-21T10:23:48.7611284Z ld.shared.b32 %r2294, [%r38]; 2026-02-21T10:23:48.7611482Z prmt.b32 %r2295, %r2294, 0, 0x7770U; 2026-02-21T10:23:48.7611681Z cvt.u16.u32 %rs9, %r2295; 2026-02-21T10:23:48.7611864Z prmt.b32 %r2296, %r2294, 0, 0x7771U; 2026-02-21T10:23:48.7612060Z cvt.u16.u32 %rs10, %r2296; 2026-02-21T10:23:48.7612247Z prmt.b32 %r2297, %r2294, 0, 0x7772U; 2026-02-21T10:23:48.7612453Z cvt.u16.u32 %rs11, %r2297; 2026-02-21T10:23:48.7612628Z prmt.b32 %r2298, %r2294, 0, 0x7773U; 2026-02-21T10:23:48.7612829Z cvt.u16.u32 %rs12, %r2298; 2026-02-21T10:23:48.7613013Z ld.shared.b32 %r2299, [%r38+128]; 2026-02-21T10:23:48.7613211Z prmt.b32 %r2300, %r2299, 0, 0x7770U; 2026-02-21T10:23:48.7613399Z cvt.u16.u32 %rs13, %r2300; 
2026-02-21T10:23:48.7613574Z prmt.b32 %r2301, %r2299, 0, 0x7771U; 2026-02-21T10:23:48.7613759Z cvt.u16.u32 %rs14, %r2301; 2026-02-21T10:23:48.7613932Z prmt.b32 %r2302, %r2299, 0, 0x7772U; 2026-02-21T10:23:48.7614222Z cvt.u16.u32 %rs15, %r2302; 2026-02-21T10:23:48.7614397Z prmt.b32 %r2303, %r2299, 0, 0x7773U; 2026-02-21T10:23:48.7614590Z cvt.u16.u32 %rs16, %r2303; 2026-02-21T10:23:48.7614905Z .loc 1 60 28 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:60:28 2026-02-21T10:23:48.7615339Z shl.b16 %rs17, %rs9, 4; 2026-02-21T10:23:48.7615506Z shl.b16 %rs18, %rs10, 4; 2026-02-21T10:23:48.7615693Z shl.b16 %rs19, %rs11, 4; 2026-02-21T10:23:48.7615859Z shl.b16 %rs20, %rs12, 4; 2026-02-21T10:23:48.7616024Z shl.b16 %rs21, %rs13, 4; 2026-02-21T10:23:48.7616186Z shl.b16 %rs22, %rs14, 4; 2026-02-21T10:23:48.7616365Z shl.b16 %rs23, %rs15, 4; 2026-02-21T10:23:48.7616655Z shl.b16 %rs24, %rs16, 4; 2026-02-21T10:23:48.7616974Z .loc 1 75 58 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:75:58 2026-02-21T10:23:48.7617341Z selp.b16 %rs25, %rs17, %rs9, %p45; 2026-02-21T10:23:48.7617537Z cvt.s16.s8 %rs26, %rs25; 2026-02-21T10:23:48.7617721Z shr.s16 %rs27, %rs26, 4; 2026-02-21T10:23:48.7617970Z selp.b16 %rs28, %rs18, %rs10, %p45; 2026-02-21T10:23:48.7618170Z cvt.s16.s8 %rs29, %rs28; 2026-02-21T10:23:48.7618333Z shr.s16 %rs30, %rs29, 4; 2026-02-21T10:23:48.7618507Z selp.b16 %rs31, %rs19, %rs11, %p45; 2026-02-21T10:23:48.7618719Z cvt.s16.s8 %rs32, %rs31; 2026-02-21T10:23:48.7618886Z shr.s16 %rs33, %rs32, 4; 2026-02-21T10:23:48.7619060Z selp.b16 %rs34, %rs20, %rs12, %p45; 2026-02-21T10:23:48.7619337Z cvt.s16.s8 %rs35, %rs34; 2026-02-21T10:23:48.7619510Z shr.s16 %rs36, %rs35, 4; 2026-02-21T10:23:48.7619693Z selp.b16 %rs37, %rs21, %rs13, %p45; 2026-02-21T10:23:48.7619889Z cvt.s16.s8 %rs38, %rs37; 2026-02-21T10:23:48.7620052Z shr.s16 %rs39, %rs38, 4; 2026-02-21T10:23:48.7620227Z selp.b16 %rs40, %rs22, %rs14, %p45; 2026-02-21T10:23:48.7620421Z cvt.s16.s8 %rs41, %rs40; 
2026-02-21T10:23:48.7620584Z shr.s16 %rs42, %rs41, 4; 2026-02-21T10:23:48.7620755Z selp.b16 %rs43, %rs23, %rs15, %p45; 2026-02-21T10:23:48.7620946Z cvt.s16.s8 %rs44, %rs43; 2026-02-21T10:23:48.7621114Z shr.s16 %rs45, %rs44, 4; 2026-02-21T10:23:48.7621283Z selp.b16 %rs46, %rs24, %rs16, %p45; 2026-02-21T10:23:48.7621491Z cvt.s16.s8 %rs47, %rs46; 2026-02-21T10:23:48.7621666Z shr.s16 %rs48, %rs47, 4; 2026-02-21T10:23:48.7621989Z .loc 1 80 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:80:32 2026-02-21T10:23:48.7622357Z cvt.rn.f32.s16 %r2304, %rs27; 2026-02-21T10:23:48.7622548Z cvt.rn.f32.s16 %r2305, %rs30; 2026-02-21T10:23:48.7622729Z cvt.rn.f32.s16 %r2306, %rs33; 2026-02-21T10:23:48.7622904Z cvt.rn.f32.s16 %r2307, %rs36; 2026-02-21T10:23:48.7623084Z cvt.rn.f32.s16 %r2308, %rs39; 2026-02-21T10:23:48.7623256Z cvt.rn.f32.s16 %r2309, %rs42; 2026-02-21T10:23:48.7623433Z cvt.rn.f32.s16 %r2310, %rs45; 2026-02-21T10:23:48.7623606Z cvt.rn.f32.s16 %r2311, %rs48; 2026-02-21T10:23:48.7623779Z bar.sync 0; 2026-02-21T10:23:48.7623938Z st.shared.b32 [%r39], %r2304; 2026-02-21T10:23:48.7624124Z st.shared.b32 [%r39+8], %r2305; 2026-02-21T10:23:48.7624319Z st.shared.b32 [%r40], %r2306; 2026-02-21T10:23:48.7624507Z st.shared.b32 [%r40+8], %r2307; 2026-02-21T10:23:48.7624699Z st.shared.b32 [%r41], %r2308; 2026-02-21T10:23:48.7624877Z st.shared.b32 [%r41+8], %r2309; 2026-02-21T10:23:48.7625067Z st.shared.b32 [%r42], %r2310; 2026-02-21T10:23:48.7625241Z st.shared.b32 [%r42+8], %r2311; 2026-02-21T10:23:48.7625434Z $L__tmp1: 2026-02-21T10:23:48.7625795Z .loc 2 291 36 // standard.py:291:36 @[ cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:87:40 ] 2026-02-21T10:23:48.7626227Z // begin inline asm 2026-02-21T10:23:48.7626404Z fence.proxy.async.shared::cta; 2026-02-21T10:23:48.7626715Z // end inline asm 2026-02-21T10:23:48.7626864Z bar.sync 0; 2026-02-21T10:23:48.7627021Z shfl.sync.idx.b32 %r2312, %r5, 0, 31, -1; 2026-02-21T10:23:48.7627251Z wgmma.fence.sync.aligned; 
2026-02-21T10:23:48.7627433Z mov.pred %p2, -1; 2026-02-21T10:23:48.7627591Z // begin inline asm 2026-02-21T10:23:48.7629134Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r6360,%r6361,%r6362,%r6363,%r6364,%r6365,%r6366,%r6367,%r6368,%r6369,%r6370,%r6371,%r6372,%r6373,%r6374,%r6375,%r6376,%r6377,%r6378,%r6379,%r6380,%r6381,%r6382,%r6383,%r6384,%r6385,%r6386,%r6387,%r6388,%r6389,%r6390,%r6391,%r6392,%r6393,%r6394,%r6395,%r6396,%r6397,%r6398,%r6399,%r6400,%r6401,%r6402,%r6403,%r6404,%r6405,%r6406,%r6407,%r6408,%r6409,%r6410,%r6411,%r6412,%r6413,%r6414,%r6415,%r6416,%r6417,%r6418,%r6419,%r6420,%r6421,%r6422,%r6423}, {%r812,%r813,%r814,%r815}, %rd78, %p2, 1, 1; 2026-02-21T10:23:48.7630658Z // end inline asm 2026-02-21T10:23:48.7630810Z // begin inline asm 2026-02-21T10:23:48.7632217Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r6360,%r6361,%r6362,%r6363,%r6364,%r6365,%r6366,%r6367,%r6368,%r6369,%r6370,%r6371,%r6372,%r6373,%r6374,%r6375,%r6376,%r6377,%r6378,%r6379,%r6380,%r6381,%r6382,%r6383,%r6384,%r6385,%r6386,%r6387,%r6388,%r6389,%r6390,%r6391,%r6392,%r6393,%r6394,%r6395,%r6396,%r6397,%r6398,%r6399,%r6400,%r6401,%r6402,%r6403,%r6404,%r6405,%r6406,%r6407,%r6408,%r6409,%r6410,%r6411,%r6412,%r6413,%r6414,%r6415,%r6416,%r6417,%r6418,%r6419,%r6420,%r6421,%r6422,%r6423}, {%r944,%r945,%r946,%r947}, %rd79, %p2, 1, 1; 2026-02-21T10:23:48.7633623Z // end inline asm 2026-02-21T10:23:48.7633787Z wgmma.commit_group.sync.aligned; 2026-02-21T10:23:48.7633983Z mov.b32 %r2222, 0; 2026-02-21T10:23:48.7634136Z mov.b32 %r1012, %r4869; 2026-02-21T10:23:48.7634367Z mov.b32 %r1013, %r2222; 2026-02-21T10:23:48.7634530Z mov.b32 %r1014, %r2222; 2026-02-21T10:23:48.7634706Z // begin inline asm 2026-02-21T10:23:48.7635875Z // wait for regs: 
%r6360,%r6361,%r6362,%r6363,%r6364,%r6365,%r6366,%r6367,%r6368,%r6369,%r6370,%r6371,%r6372,%r6373,%r6374,%r6375,%r6376,%r6377,%r6378,%r6379,%r6380,%r6381,%r6382,%r6383,%r6384,%r6385,%r6386,%r6387,%r6388,%r6389,%r6390,%r6391,%r6392,%r6393,%r6394,%r6395,%r6396,%r6397,%r6398,%r6399,%r6400,%r6401,%r6402,%r6403,%r6404,%r6405,%r6406,%r6407,%r6408,%r6409,%r6410,%r6411,%r6412,%r6413,%r6414,%r6415,%r6416,%r6417,%r6418,%r6419,%r6420,%r6421,%r6422,%r6423,%r1012,%r1013,%r1014 2026-02-21T10:23:48.7637227Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:23:48.7637420Z // end inline asm 2026-02-21T10:23:48.7637562Z $L__tmp2: 2026-02-21T10:23:48.7637865Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7638247Z add.s64 %rd39, %rd188, -64; 2026-02-21T10:23:48.7638425Z // begin inline asm 2026-02-21T10:23:48.7638582Z mov.u64 %rd38, 0x0; 2026-02-21T10:23:48.7638796Z createpolicy.fractional.L2::evict_last.b64 %rd38, 1.0; 2026-02-21T10:23:48.7639048Z // end inline asm 2026-02-21T10:23:48.7639195Z // begin inline asm 2026-02-21T10:23:48.7639348Z mov.u32 %r1082, 0x0; 2026-02-21T10:23:48.7639502Z mov.u32 %r1083, 0x0; 2026-02-21T10:23:48.7639654Z mov.u32 %r1084, 0x0; 2026-02-21T10:23:48.7639807Z mov.u32 %r1085, 0x0; 2026-02-21T10:23:48.7640120Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1082, %r1083, %r1084, %r1085 }, [ %rd39 + 0 ], %rd38; 2026-02-21T10:23:48.7640500Z // end inline asm 2026-02-21T10:23:48.7640806Z .loc 1 55 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:55:32 2026-02-21T10:23:48.7641157Z bar.sync 0; 2026-02-21T10:23:48.7641324Z st.shared.v2.b32 [%r29], {%r1082, %r1083}; 2026-02-21T10:23:48.7641554Z st.shared.v2.b32 [%r30], {%r1084, %r1085}; 2026-02-21T10:23:48.7641754Z bar.sync 0; 2026-02-21T10:23:48.7641910Z ld.shared.b16 %rs49, [%r32]; 2026-02-21T10:23:48.7642107Z ld.shared.b16 %rs50, [%r32+256]; 2026-02-21T10:23:48.7642307Z ld.shared.b16 %rs51, [%r32+16]; 2026-02-21T10:23:48.7642500Z ld.shared.b16 
%rs52, [%r32+272]; 2026-02-21T10:23:48.7642692Z ld.shared.b16 %rs53, [%r33]; 2026-02-21T10:23:48.7642885Z ld.shared.b16 %rs54, [%r33+256]; 2026-02-21T10:23:48.7643074Z ld.shared.b16 %rs55, [%r33+16]; 2026-02-21T10:23:48.7643265Z ld.shared.b16 %rs56, [%r33+272]; 2026-02-21T10:23:48.7643450Z cvt.f32.bf16 %r1215, %rs49; 2026-02-21T10:23:48.7643727Z cvt.f32.bf16 %r1216, %rs50; 2026-02-21T10:23:48.7643900Z cvt.f32.bf16 %r1217, %rs53; 2026-02-21T10:23:48.7644075Z cvt.f32.bf16 %r1218, %rs54; 2026-02-21T10:23:48.7644250Z cvt.f32.bf16 %r1347, %rs51; 2026-02-21T10:23:48.7644421Z cvt.f32.bf16 %r1348, %rs52; 2026-02-21T10:23:48.7644678Z cvt.f32.bf16 %r1349, %rs55; 2026-02-21T10:23:48.7644847Z cvt.f32.bf16 %r1350, %rs56; 2026-02-21T10:23:48.7645185Z .loc 1 57 87 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:57:87 2026-02-21T10:23:48.7645545Z add.s64 %rd42, %rd187, 10240; 2026-02-21T10:23:48.7645727Z // begin inline asm 2026-02-21T10:23:48.7645884Z mov.u64 %rd41, 0x0; 2026-02-21T10:23:48.7646102Z createpolicy.fractional.L2::evict_last.b64 %rd41, 1.0; 2026-02-21T10:23:48.7646372Z // end inline asm 2026-02-21T10:23:48.7646639Z // begin inline asm 2026-02-21T10:23:48.7646795Z mov.u32 %r1086, 0x0; 2026-02-21T10:23:48.7647048Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r1086 }, [ %rd42 + 0 ], %rd41; 2026-02-21T10:23:48.7647434Z // end inline asm 2026-02-21T10:23:48.7647743Z .loc 1 65 28 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:65:28 2026-02-21T10:23:48.7648099Z bar.sync 0; 2026-02-21T10:23:48.7648253Z st.shared.b8 [%r34], %r1086; 2026-02-21T10:23:48.7648442Z prmt.b32 %r2313, %r1086, 0, 0x7771U; 2026-02-21T10:23:48.7648647Z st.shared.b8 [%r35+256], %r2313; 2026-02-21T10:23:48.7648913Z prmt.b32 %r2314, %r1086, 0, 0x7772U; 2026-02-21T10:23:48.7649122Z st.shared.b8 [%r36+512], %r2314; 2026-02-21T10:23:48.7649307Z prmt.b32 %r2315, %r1086, 0, 0x7773U; 2026-02-21T10:23:48.7649504Z st.shared.b8 [%r37+768], %r2315; 2026-02-21T10:23:48.7649681Z bar.sync 0; 
2026-02-21T10:23:48.7649832Z ld.shared.b32 %r2316, [%r38]; 2026-02-21T10:23:48.7650015Z prmt.b32 %r2317, %r2316, 0, 0x7770U; 2026-02-21T10:23:48.7650213Z cvt.u16.u32 %rs57, %r2317; 2026-02-21T10:23:48.7650393Z prmt.b32 %r2318, %r2316, 0, 0x7771U; 2026-02-21T10:23:48.7650583Z cvt.u16.u32 %rs58, %r2318; 2026-02-21T10:23:48.7650764Z prmt.b32 %r2319, %r2316, 0, 0x7772U; 2026-02-21T10:23:48.7650954Z cvt.u16.u32 %rs59, %r2319; 2026-02-21T10:23:48.7651132Z prmt.b32 %r2320, %r2316, 0, 0x7773U; 2026-02-21T10:23:48.7651323Z cvt.u16.u32 %rs60, %r2320; 2026-02-21T10:23:48.7651507Z ld.shared.b32 %r2321, [%r38+128]; 2026-02-21T10:23:48.7651698Z prmt.b32 %r2322, %r2321, 0, 0x7770U; 2026-02-21T10:23:48.7651893Z cvt.u16.u32 %rs61, %r2322; 2026-02-21T10:23:48.7652065Z prmt.b32 %r2323, %r2321, 0, 0x7771U; 2026-02-21T10:23:48.7652276Z cvt.u16.u32 %rs62, %r2323; 2026-02-21T10:23:48.7652457Z prmt.b32 %r2324, %r2321, 0, 0x7772U; 2026-02-21T10:23:48.7652649Z cvt.u16.u32 %rs63, %r2324; 2026-02-21T10:23:48.7652824Z prmt.b32 %r2325, %r2321, 0, 0x7773U; 2026-02-21T10:23:48.7653014Z cvt.u16.u32 %rs64, %r2325; 2026-02-21T10:23:48.7653333Z .loc 1 60 28 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:60:28 2026-02-21T10:23:48.7653700Z shl.b16 %rs65, %rs57, 4; 2026-02-21T10:23:48.7653880Z shl.b16 %rs66, %rs58, 4; 2026-02-21T10:23:48.7654048Z shl.b16 %rs67, %rs59, 4; 2026-02-21T10:23:48.7654216Z shl.b16 %rs68, %rs60, 4; 2026-02-21T10:23:48.7654382Z shl.b16 %rs69, %rs61, 4; 2026-02-21T10:23:48.7654549Z shl.b16 %rs70, %rs62, 4; 2026-02-21T10:23:48.7654725Z shl.b16 %rs71, %rs63, 4; 2026-02-21T10:23:48.7654887Z shl.b16 %rs72, %rs64, 4; 2026-02-21T10:23:48.7655201Z .loc 1 75 58 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:75:58 2026-02-21T10:23:48.7655561Z selp.b16 %rs73, %rs65, %rs57, %p45; 2026-02-21T10:23:48.7655762Z cvt.s16.s8 %rs74, %rs73; 2026-02-21T10:23:48.7655928Z shr.s16 %rs75, %rs74, 4; 2026-02-21T10:23:48.7656121Z selp.b16 %rs76, %rs66, %rs58, %p45; 
2026-02-21T10:23:48.7656319Z cvt.s16.s8 %rs77, %rs76; 2026-02-21T10:23:48.7656621Z shr.s16 %rs78, %rs77, 4; 2026-02-21T10:23:48.7656805Z selp.b16 %rs79, %rs67, %rs59, %p45; 2026-02-21T10:23:48.7657002Z cvt.s16.s8 %rs80, %rs79; 2026-02-21T10:23:48.7657274Z shr.s16 %rs81, %rs80, 4; 2026-02-21T10:23:48.7657449Z selp.b16 %rs82, %rs68, %rs60, %p45; 2026-02-21T10:23:48.7657651Z cvt.s16.s8 %rs83, %rs82; 2026-02-21T10:23:48.7657818Z shr.s16 %rs84, %rs83, 4; 2026-02-21T10:23:48.7657991Z selp.b16 %rs85, %rs69, %rs61, %p45; 2026-02-21T10:23:48.7658256Z cvt.s16.s8 %rs86, %rs85; 2026-02-21T10:23:48.7658436Z shr.s16 %rs87, %rs86, 4; 2026-02-21T10:23:48.7658614Z selp.b16 %rs88, %rs70, %rs62, %p45; 2026-02-21T10:23:48.7658805Z cvt.s16.s8 %rs89, %rs88; 2026-02-21T10:23:48.7658977Z shr.s16 %rs90, %rs89, 4; 2026-02-21T10:23:48.7659149Z selp.b16 %rs91, %rs71, %rs63, %p45; 2026-02-21T10:23:48.7659342Z cvt.s16.s8 %rs92, %rs91; 2026-02-21T10:23:48.7659502Z shr.s16 %rs93, %rs92, 4; 2026-02-21T10:23:48.7659677Z selp.b16 %rs94, %rs72, %rs64, %p45; 2026-02-21T10:23:48.7659868Z cvt.s16.s8 %rs95, %rs94; 2026-02-21T10:23:48.7660048Z shr.s16 %rs96, %rs95, 4; 2026-02-21T10:23:48.7660452Z .loc 1 80 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:80:32 2026-02-21T10:23:48.7660828Z cvt.rn.f32.s16 %r2326, %rs75; 2026-02-21T10:23:48.7661016Z cvt.rn.f32.s16 %r2327, %rs78; 2026-02-21T10:23:48.7661191Z cvt.rn.f32.s16 %r2328, %rs81; 2026-02-21T10:23:48.7661373Z cvt.rn.f32.s16 %r2329, %rs84; 2026-02-21T10:23:48.7661546Z cvt.rn.f32.s16 %r2330, %rs87; 2026-02-21T10:23:48.7661724Z cvt.rn.f32.s16 %r2331, %rs90; 2026-02-21T10:23:48.7661975Z cvt.rn.f32.s16 %r2332, %rs93; 2026-02-21T10:23:48.7662163Z cvt.rn.f32.s16 %r2333, %rs96; 2026-02-21T10:23:48.7662337Z bar.sync 0; 2026-02-21T10:23:48.7662482Z st.shared.b32 [%r39], %r2326; 2026-02-21T10:23:48.7662671Z st.shared.b32 [%r39+8], %r2327; 2026-02-21T10:23:48.7662859Z st.shared.b32 [%r40], %r2328; 2026-02-21T10:23:48.7663042Z st.shared.b32 [%r40+8], 
%r2329; 2026-02-21T10:23:48.7663227Z st.shared.b32 [%r41], %r2330; 2026-02-21T10:23:48.7663408Z st.shared.b32 [%r41+8], %r2331; 2026-02-21T10:23:48.7663588Z st.shared.b32 [%r42], %r2332; 2026-02-21T10:23:48.7663789Z st.shared.b32 [%r42+8], %r2333; 2026-02-21T10:23:48.7663965Z $L__tmp3: 2026-02-21T10:23:48.7664329Z .loc 2 291 36 // standard.py:291:36 @[ cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:87:40 ] 2026-02-21T10:23:48.7664760Z // begin inline asm 2026-02-21T10:23:48.7664935Z fence.proxy.async.shared::cta; 2026-02-21T10:23:48.7665124Z // end inline asm 2026-02-21T10:23:48.7665269Z bar.sync 0; 2026-02-21T10:23:48.7665429Z wgmma.fence.sync.aligned; 2026-02-21T10:23:48.7665619Z // begin inline asm 2026-02-21T10:23:48.7667121Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r6360,%r6361,%r6362,%r6363,%r6364,%r6365,%r6366,%r6367,%r6368,%r6369,%r6370,%r6371,%r6372,%r6373,%r6374,%r6375,%r6376,%r6377,%r6378,%r6379,%r6380,%r6381,%r6382,%r6383,%r6384,%r6385,%r6386,%r6387,%r6388,%r6389,%r6390,%r6391,%r6392,%r6393,%r6394,%r6395,%r6396,%r6397,%r6398,%r6399,%r6400,%r6401,%r6402,%r6403,%r6404,%r6405,%r6406,%r6407,%r6408,%r6409,%r6410,%r6411,%r6412,%r6413,%r6414,%r6415,%r6416,%r6417,%r6418,%r6419,%r6420,%r6421,%r6422,%r6423}, {%r1215,%r1216,%r1217,%r1218}, %rd78, %p2, 1, 1; 2026-02-21T10:23:48.7668605Z // end inline asm 2026-02-21T10:23:48.7668760Z // begin inline asm 2026-02-21T10:23:48.7670121Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r6360,%r6361,%r6362,%r6363,%r6364,%r6365,%r6366,%r6367,%r6368,%r6369,%r6370,%r6371,%r6372,%r6373,%r6374,%r6375,%r6376,%r6377,%r6378,%r6379,%r6380,%r6381,%r6382,%r6383,%r6384,%r6385,%r6386,%r6387,%r6388,%r6389,%r6390,%r6391,%r6392,%r6393,%r6394,%r6395,%r6396,%r6397,%r6398,%r6399,%r6400,%r6401,%r6402,%r6403,%r6404,%r6405,%r6406,%r6407,%r6408,%r6409,%r6410,%r6411,%r6412,%r6413,%r6414,%r6415,%r6416,%r6417,%r6418,%r6419,%r6420,%r6421,%r6422,%r6423}, {%r1347,%r1348,%r1349,%r1350}, %rd79, %p2, 1, 1; 
2026-02-21T10:23:48.7671520Z // end inline asm 2026-02-21T10:23:48.7671682Z wgmma.commit_group.sync.aligned; 2026-02-21T10:23:48.7671879Z mov.b32 %r1415, %r4869; 2026-02-21T10:23:48.7672041Z mov.b32 %r1416, %r2222; 2026-02-21T10:23:48.7672436Z mov.b32 %r1417, %r2222; 2026-02-21T10:23:48.7672601Z // begin inline asm 2026-02-21T10:23:48.7673759Z // wait for regs: %r6360,%r6361,%r6362,%r6363,%r6364,%r6365,%r6366,%r6367,%r6368,%r6369,%r6370,%r6371,%r6372,%r6373,%r6374,%r6375,%r6376,%r6377,%r6378,%r6379,%r6380,%r6381,%r6382,%r6383,%r6384,%r6385,%r6386,%r6387,%r6388,%r6389,%r6390,%r6391,%r6392,%r6393,%r6394,%r6395,%r6396,%r6397,%r6398,%r6399,%r6400,%r6401,%r6402,%r6403,%r6404,%r6405,%r6406,%r6407,%r6408,%r6409,%r6410,%r6411,%r6412,%r6413,%r6414,%r6415,%r6416,%r6417,%r6418,%r6419,%r6420,%r6421,%r6422,%r6423,%r1415,%r1416,%r1417 2026-02-21T10:23:48.7675041Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:23:48.7675232Z // end inline asm 2026-02-21T10:23:48.7675385Z $L__tmp4: 2026-02-21T10:23:48.7675682Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7676046Z add.s64 %rd47, %rd188, -32; 2026-02-21T10:23:48.7676226Z // begin inline asm 2026-02-21T10:23:48.7676380Z mov.u64 %rd46, 0x0; 2026-02-21T10:23:48.7676850Z createpolicy.fractional.L2::evict_last.b64 %rd46, 1.0; 2026-02-21T10:23:48.7677107Z // end inline asm 2026-02-21T10:23:48.7677254Z // begin inline asm 2026-02-21T10:23:48.7677409Z mov.u32 %r1485, 0x0; 2026-02-21T10:23:48.7677565Z mov.u32 %r1486, 0x0; 2026-02-21T10:23:48.7677720Z mov.u32 %r1487, 0x0; 2026-02-21T10:23:48.7677867Z mov.u32 %r1488, 0x0; 2026-02-21T10:23:48.7678262Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1485, %r1486, %r1487, %r1488 }, [ %rd47 + 0 ], %rd46; 2026-02-21T10:23:48.7678633Z // end inline asm 2026-02-21T10:23:48.7678932Z .loc 1 55 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:55:32 2026-02-21T10:23:48.7679280Z bar.sync 0; 2026-02-21T10:23:48.7679445Z st.shared.v2.b32 
[%r29], {%r1485, %r1486}; 2026-02-21T10:23:48.7679676Z st.shared.v2.b32 [%r30], {%r1487, %r1488}; 2026-02-21T10:23:48.7679876Z bar.sync 0; 2026-02-21T10:23:48.7680037Z ld.shared.b16 %rs97, [%r32]; 2026-02-21T10:23:48.7680235Z ld.shared.b16 %rs98, [%r32+256]; 2026-02-21T10:23:48.7680437Z ld.shared.b16 %rs99, [%r32+16]; 2026-02-21T10:23:48.7680626Z ld.shared.b16 %rs100, [%r32+272]; 2026-02-21T10:23:48.7680827Z ld.shared.b16 %rs101, [%r33]; 2026-02-21T10:23:48.7681012Z ld.shared.b16 %rs102, [%r33+256]; 2026-02-21T10:23:48.7681202Z ld.shared.b16 %rs103, [%r33+16]; 2026-02-21T10:23:48.7681403Z ld.shared.b16 %rs104, [%r33+272]; 2026-02-21T10:23:48.7681598Z cvt.f32.bf16 %r1618, %rs97; 2026-02-21T10:23:48.7681781Z cvt.f32.bf16 %r1619, %rs98; 2026-02-21T10:23:48.7681955Z cvt.f32.bf16 %r1620, %rs101; 2026-02-21T10:23:48.7682135Z cvt.f32.bf16 %r1621, %rs102; 2026-02-21T10:23:48.7682316Z cvt.f32.bf16 %r1750, %rs99; 2026-02-21T10:23:48.7682488Z cvt.f32.bf16 %r1751, %rs100; 2026-02-21T10:23:48.7682660Z cvt.f32.bf16 %r1752, %rs103; 2026-02-21T10:23:48.7682832Z cvt.f32.bf16 %r1753, %rs104; 2026-02-21T10:23:48.7683152Z .loc 1 57 87 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:57:87 2026-02-21T10:23:48.7683522Z add.s64 %rd50, %rd187, 20480; 2026-02-21T10:23:48.7683700Z // begin inline asm 2026-02-21T10:23:48.7683854Z mov.u64 %rd49, 0x0; 2026-02-21T10:23:48.7684084Z createpolicy.fractional.L2::evict_last.b64 %rd49, 1.0; 2026-02-21T10:23:48.7684338Z // end inline asm 2026-02-21T10:23:48.7684487Z // begin inline asm 2026-02-21T10:23:48.7684638Z mov.u32 %r1489, 0x0; 2026-02-21T10:23:48.7684907Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r1489 }, [ %rd50 + 0 ], %rd49; 2026-02-21T10:23:48.7685202Z // end inline asm 2026-02-21T10:23:48.7685502Z .loc 1 65 28 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:65:28 2026-02-21T10:23:48.7685854Z bar.sync 0; 2026-02-21T10:23:48.7686001Z st.shared.b8 [%r34], %r1489; 2026-02-21T10:23:48.7686187Z prmt.b32 %r2334, %r1489, 
0, 0x7771U; 2026-02-21T10:23:48.7686390Z st.shared.b8 [%r35+256], %r2334; 2026-02-21T10:23:48.7686718Z prmt.b32 %r2335, %r1489, 0, 0x7772U; 2026-02-21T10:23:48.7687017Z st.shared.b8 [%r36+512], %r2335; 2026-02-21T10:23:48.7687206Z prmt.b32 %r2336, %r1489, 0, 0x7773U; 2026-02-21T10:23:48.7687394Z st.shared.b8 [%r37+768], %r2336; 2026-02-21T10:23:48.7687576Z bar.sync 0; 2026-02-21T10:23:48.7687804Z ld.shared.b32 %r2337, [%r38]; 2026-02-21T10:23:48.7687990Z prmt.b32 %r2338, %r2337, 0, 0x7770U; 2026-02-21T10:23:48.7688187Z cvt.u16.u32 %rs105, %r2338; 2026-02-21T10:23:48.7688367Z prmt.b32 %r2339, %r2337, 0, 0x7771U; 2026-02-21T10:23:48.7688569Z cvt.u16.u32 %rs106, %r2339; 2026-02-21T10:23:48.7688749Z prmt.b32 %r2340, %r2337, 0, 0x7772U; 2026-02-21T10:23:48.7688942Z cvt.u16.u32 %rs107, %r2340; 2026-02-21T10:23:48.7689120Z prmt.b32 %r2341, %r2337, 0, 0x7773U; 2026-02-21T10:23:48.7689315Z cvt.u16.u32 %rs108, %r2341; 2026-02-21T10:23:48.7689495Z ld.shared.b32 %r2342, [%r38+128]; 2026-02-21T10:23:48.7689686Z prmt.b32 %r2343, %r2342, 0, 0x7770U; 2026-02-21T10:23:48.7689877Z cvt.u16.u32 %rs109, %r2343; 2026-02-21T10:23:48.7690067Z prmt.b32 %r2344, %r2342, 0, 0x7771U; 2026-02-21T10:23:48.7690340Z cvt.u16.u32 %rs110, %r2344; 2026-02-21T10:23:48.7690521Z prmt.b32 %r2345, %r2342, 0, 0x7772U; 2026-02-21T10:23:48.7690712Z cvt.u16.u32 %rs111, %r2345; 2026-02-21T10:23:48.7690888Z prmt.b32 %r2346, %r2342, 0, 0x7773U; 2026-02-21T10:23:48.7691098Z cvt.u16.u32 %rs112, %r2346; 2026-02-21T10:23:48.7691483Z .loc 1 60 28 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:60:28 2026-02-21T10:23:48.7691844Z shl.b16 %rs113, %rs105, 4; 2026-02-21T10:23:48.7692022Z shl.b16 %rs114, %rs106, 4; 2026-02-21T10:23:48.7692197Z shl.b16 %rs115, %rs107, 4; 2026-02-21T10:23:48.7692378Z shl.b16 %rs116, %rs108, 4; 2026-02-21T10:23:48.7692550Z shl.b16 %rs117, %rs109, 4; 2026-02-21T10:23:48.7692720Z shl.b16 %rs118, %rs110, 4; 2026-02-21T10:23:48.7692887Z shl.b16 %rs119, %rs111, 4; 
2026-02-21T10:23:48.7693054Z shl.b16 %rs120, %rs112, 4; 2026-02-21T10:23:48.7693373Z .loc 1 75 58 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:75:58 2026-02-21T10:23:48.7693740Z selp.b16 %rs121, %rs113, %rs105, %p45; 2026-02-21T10:23:48.7693950Z cvt.s16.s8 %rs122, %rs121; 2026-02-21T10:23:48.7694117Z shr.s16 %rs123, %rs122, 4; 2026-02-21T10:23:48.7694303Z selp.b16 %rs124, %rs114, %rs106, %p45; 2026-02-21T10:23:48.7694512Z cvt.s16.s8 %rs125, %rs124; 2026-02-21T10:23:48.7694691Z shr.s16 %rs126, %rs125, 4; 2026-02-21T10:23:48.7694870Z selp.b16 %rs127, %rs115, %rs107, %p45; 2026-02-21T10:23:48.7695068Z cvt.s16.s8 %rs128, %rs127; 2026-02-21T10:23:48.7695235Z shr.s16 %rs129, %rs128, 4; 2026-02-21T10:23:48.7695414Z selp.b16 %rs130, %rs116, %rs108, %p45; 2026-02-21T10:23:48.7695626Z cvt.s16.s8 %rs131, %rs130; 2026-02-21T10:23:48.7695795Z shr.s16 %rs132, %rs131, 4; 2026-02-21T10:23:48.7695972Z selp.b16 %rs133, %rs117, %rs109, %p45; 2026-02-21T10:23:48.7696166Z cvt.s16.s8 %rs134, %rs133; 2026-02-21T10:23:48.7696338Z shr.s16 %rs135, %rs134, 4; 2026-02-21T10:23:48.7696647Z selp.b16 %rs136, %rs118, %rs110, %p45; 2026-02-21T10:23:48.7696851Z cvt.s16.s8 %rs137, %rs136; 2026-02-21T10:23:48.7697017Z shr.s16 %rs138, %rs137, 4; 2026-02-21T10:23:48.7697199Z selp.b16 %rs139, %rs119, %rs111, %p45; 2026-02-21T10:23:48.7697399Z cvt.s16.s8 %rs140, %rs139; 2026-02-21T10:23:48.7697571Z shr.s16 %rs141, %rs140, 4; 2026-02-21T10:23:48.7697756Z selp.b16 %rs142, %rs120, %rs112, %p45; 2026-02-21T10:23:48.7697952Z cvt.s16.s8 %rs143, %rs142; 2026-02-21T10:23:48.7707670Z shr.s16 %rs144, %rs143, 4; 2026-02-21T10:23:48.7708062Z .loc 1 80 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:80:32 2026-02-21T10:23:48.7708450Z cvt.rn.f32.s16 %r2347, %rs123; 2026-02-21T10:23:48.7708751Z cvt.rn.f32.s16 %r2348, %rs126; 2026-02-21T10:23:48.7708933Z cvt.rn.f32.s16 %r2349, %rs129; 2026-02-21T10:23:48.7709120Z cvt.rn.f32.s16 %r2350, %rs132; 2026-02-21T10:23:48.7709299Z cvt.rn.f32.s16 
%r2351, %rs135; 2026-02-21T10:23:48.7709479Z cvt.rn.f32.s16 %r2352, %rs138; 2026-02-21T10:23:48.7709832Z cvt.rn.f32.s16 %r2353, %rs141; 2026-02-21T10:23:48.7710020Z cvt.rn.f32.s16 %r2354, %rs144; 2026-02-21T10:23:48.7710195Z bar.sync 0; 2026-02-21T10:23:48.7710355Z st.shared.b32 [%r39], %r2347; 2026-02-21T10:23:48.7710634Z st.shared.b32 [%r39+8], %r2348; 2026-02-21T10:23:48.7710839Z st.shared.b32 [%r40], %r2349; 2026-02-21T10:23:48.7711035Z st.shared.b32 [%r40+8], %r2350; 2026-02-21T10:23:48.7711234Z st.shared.b32 [%r41], %r2351; 2026-02-21T10:23:48.7711418Z st.shared.b32 [%r41+8], %r2352; 2026-02-21T10:23:48.7711599Z st.shared.b32 [%r42], %r2353; 2026-02-21T10:23:48.7711783Z st.shared.b32 [%r42+8], %r2354; 2026-02-21T10:23:48.7711958Z $L__tmp5: 2026-02-21T10:23:48.7712337Z .loc 2 291 36 // standard.py:291:36 @[ cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:87:40 ] 2026-02-21T10:23:48.7712774Z // begin inline asm 2026-02-21T10:23:48.7712963Z fence.proxy.async.shared::cta; 2026-02-21T10:23:48.7713153Z // end inline asm 2026-02-21T10:23:48.7713320Z bar.sync 0; 2026-02-21T10:23:48.7713580Z wgmma.fence.sync.aligned; 2026-02-21T10:23:48.7713771Z // begin inline asm 2026-02-21T10:23:48.7715218Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r6360,%r6361,%r6362,%r6363,%r6364,%r6365,%r6366,%r6367,%r6368,%r6369,%r6370,%r6371,%r6372,%r6373,%r6374,%r6375,%r6376,%r6377,%r6378,%r6379,%r6380,%r6381,%r6382,%r6383,%r6384,%r6385,%r6386,%r6387,%r6388,%r6389,%r6390,%r6391,%r6392,%r6393,%r6394,%r6395,%r6396,%r6397,%r6398,%r6399,%r6400,%r6401,%r6402,%r6403,%r6404,%r6405,%r6406,%r6407,%r6408,%r6409,%r6410,%r6411,%r6412,%r6413,%r6414,%r6415,%r6416,%r6417,%r6418,%r6419,%r6420,%r6421,%r6422,%r6423}, {%r1618,%r1619,%r1620,%r1621}, %rd78, %p2, 1, 1; 2026-02-21T10:23:48.7716801Z // end inline asm 2026-02-21T10:23:48.7716956Z // begin inline asm 2026-02-21T10:23:48.7718342Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r6360,%r6361,%r6362,%r6363,%r6364,%r6365,%r6366,%r6367,%r6368,%r6369,%r6370,%r6371,%r6372,%r6373,%r6374,%r6375,%r6376,%r6377,%r6378,%r6379,%r6380,%r6381,%r6382,%r6383,%r6384,%r6385,%r6386,%r6387,%r6388,%r6389,%r6390,%r6391,%r6392,%r6393,%r6394,%r6395,%r6396,%r6397,%r6398,%r6399,%r6400,%r6401,%r6402,%r6403,%r6404,%r6405,%r6406,%r6407,%r6408,%r6409,%r6410,%r6411,%r6412,%r6413,%r6414,%r6415,%r6416,%r6417,%r6418,%r6419,%r6420,%r6421,%r6422,%r6423}, {%r1750,%r1751,%r1752,%r1753}, %rd79, %p2, 1, 1; 2026-02-21T10:23:48.7719753Z // end inline asm 2026-02-21T10:23:48.7719919Z wgmma.commit_group.sync.aligned; 2026-02-21T10:23:48.7720134Z mov.b32 %r1818, %r4869; 2026-02-21T10:23:48.7720309Z mov.b32 %r1819, %r2222; 2026-02-21T10:23:48.7720474Z mov.b32 %r1820, %r2222; 2026-02-21T10:23:48.7720636Z // begin inline asm 2026-02-21T10:23:48.7721825Z // wait for regs: %r6360,%r6361,%r6362,%r6363,%r6364,%r6365,%r6366,%r6367,%r6368,%r6369,%r6370,%r6371,%r6372,%r6373,%r6374,%r6375,%r6376,%r6377,%r6378,%r6379,%r6380,%r6381,%r6382,%r6383,%r6384,%r6385,%r6386,%r6387,%r6388,%r6389,%r6390,%r6391,%r6392,%r6393,%r6394,%r6395,%r6396,%r6397,%r6398,%r6399,%r6400,%r6401,%r6402,%r6403,%r6404,%r6405,%r6406,%r6407,%r6408,%r6409,%r6410,%r6411,%r6412,%r6413,%r6414,%r6415,%r6416,%r6417,%r6418,%r6419,%r6420,%r6421,%r6422,%r6423,%r1818,%r1819,%r1820 2026-02-21T10:23:48.7723069Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:23:48.7723271Z // end inline asm 2026-02-21T10:23:48.7723422Z $L__tmp6: 2026-02-21T10:23:48.7723725Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7724094Z // begin inline asm 2026-02-21T10:23:48.7724252Z mov.u64 %rd54, 0x0; 2026-02-21T10:23:48.7724470Z createpolicy.fractional.L2::evict_last.b64 %rd54, 1.0; 2026-02-21T10:23:48.7724726Z // end inline asm 2026-02-21T10:23:48.7724879Z // begin inline asm 2026-02-21T10:23:48.7725044Z mov.u32 %r1888, 0x0; 2026-02-21T10:23:48.7725206Z mov.u32 %r1889, 0x0; 2026-02-21T10:23:48.7725362Z 
mov.u32 %r1890, 0x0; 2026-02-21T10:23:48.7725512Z mov.u32 %r1891, 0x0; 2026-02-21T10:23:48.7725836Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1888, %r1889, %r1890, %r1891 }, [ %rd188 + 0 ], %rd54; 2026-02-21T10:23:48.7726305Z // end inline asm 2026-02-21T10:23:48.7726742Z .loc 1 55 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:55:32 2026-02-21T10:23:48.7727197Z bar.sync 0; 2026-02-21T10:23:48.7727362Z st.shared.v2.b32 [%r29], {%r1888, %r1889}; 2026-02-21T10:23:48.7727597Z st.shared.v2.b32 [%r30], {%r1890, %r1891}; 2026-02-21T10:23:48.7727818Z bar.sync 0; 2026-02-21T10:23:48.7727974Z ld.shared.b16 %rs145, [%r32]; 2026-02-21T10:23:48.7728165Z ld.shared.b16 %rs146, [%r32+256]; 2026-02-21T10:23:48.7728374Z ld.shared.b16 %rs147, [%r32+16]; 2026-02-21T10:23:48.7728569Z ld.shared.b16 %rs148, [%r32+272]; 2026-02-21T10:23:48.7728761Z ld.shared.b16 %rs149, [%r33]; 2026-02-21T10:23:48.7728950Z ld.shared.b16 %rs150, [%r33+256]; 2026-02-21T10:23:48.7729137Z ld.shared.b16 %rs151, [%r33+16]; 2026-02-21T10:23:48.7729347Z ld.shared.b16 %rs152, [%r33+272]; 2026-02-21T10:23:48.7729553Z cvt.f32.bf16 %r2021, %rs145; 2026-02-21T10:23:48.7729830Z cvt.f32.bf16 %r2022, %rs146; 2026-02-21T10:23:48.7730020Z cvt.f32.bf16 %r2023, %rs149; 2026-02-21T10:23:48.7730198Z cvt.f32.bf16 %r2024, %rs150; 2026-02-21T10:23:48.7730377Z cvt.f32.bf16 %r2153, %rs147; 2026-02-21T10:23:48.7730555Z cvt.f32.bf16 %r2154, %rs148; 2026-02-21T10:23:48.7730736Z cvt.f32.bf16 %r2155, %rs151; 2026-02-21T10:23:48.7731003Z cvt.f32.bf16 %r2156, %rs152; 2026-02-21T10:23:48.7731349Z .loc 1 57 87 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:57:87 2026-02-21T10:23:48.7731730Z add.s64 %rd58, %rd187, 30720; 2026-02-21T10:23:48.7731922Z // begin inline asm 2026-02-21T10:23:48.7732095Z mov.u64 %rd57, 0x0; 2026-02-21T10:23:48.7732317Z createpolicy.fractional.L2::evict_last.b64 %rd57, 1.0; 2026-02-21T10:23:48.7732582Z // end inline asm 2026-02-21T10:23:48.7732738Z // begin inline asm 
2026-02-21T10:23:48.7732909Z mov.u32 %r1892, 0x0; 2026-02-21T10:23:48.7733175Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r1892 }, [ %rd58 + 0 ], %rd57; 2026-02-21T10:23:48.7733485Z // end inline asm 2026-02-21T10:23:48.7733785Z .loc 1 65 28 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:65:28 2026-02-21T10:23:48.7734152Z bar.sync 0; 2026-02-21T10:23:48.7734308Z st.shared.b8 [%r34], %r1892; 2026-02-21T10:23:48.7734506Z prmt.b32 %r2355, %r1892, 0, 0x7771U; 2026-02-21T10:23:48.7734714Z st.shared.b8 [%r35+256], %r2355; 2026-02-21T10:23:48.7734911Z prmt.b32 %r2356, %r1892, 0, 0x7772U; 2026-02-21T10:23:48.7735113Z st.shared.b8 [%r36+512], %r2356; 2026-02-21T10:23:48.7735302Z prmt.b32 %r2357, %r1892, 0, 0x7773U; 2026-02-21T10:23:48.7735511Z st.shared.b8 [%r37+768], %r2357; 2026-02-21T10:23:48.7735697Z bar.sync 0; 2026-02-21T10:23:48.7735845Z ld.shared.b32 %r2358, [%r38]; 2026-02-21T10:23:48.7736032Z prmt.b32 %r2359, %r2358, 0, 0x7770U; 2026-02-21T10:23:48.7736239Z cvt.u16.u32 %rs153, %r2359; 2026-02-21T10:23:48.7736442Z prmt.b32 %r2360, %r2358, 0, 0x7771U; 2026-02-21T10:23:48.7736784Z cvt.u16.u32 %rs154, %r2360; 2026-02-21T10:23:48.7736970Z prmt.b32 %r2361, %r2358, 0, 0x7772U; 2026-02-21T10:23:48.7737164Z cvt.u16.u32 %rs155, %r2361; 2026-02-21T10:23:48.7737352Z prmt.b32 %r2362, %r2358, 0, 0x7773U; 2026-02-21T10:23:48.7737549Z cvt.u16.u32 %rs156, %r2362; 2026-02-21T10:23:48.7737729Z ld.shared.b32 %r2363, [%r38+128]; 2026-02-21T10:23:48.7737927Z prmt.b32 %r2364, %r2363, 0, 0x7770U; 2026-02-21T10:23:48.7738131Z cvt.u16.u32 %rs157, %r2364; 2026-02-21T10:23:48.7738305Z prmt.b32 %r2365, %r2363, 0, 0x7771U; 2026-02-21T10:23:48.7738502Z cvt.u16.u32 %rs158, %r2365; 2026-02-21T10:23:48.7738693Z prmt.b32 %r2366, %r2363, 0, 0x7772U; 2026-02-21T10:23:48.7738888Z cvt.u16.u32 %rs159, %r2366; 2026-02-21T10:23:48.7739070Z prmt.b32 %r2367, %r2363, 0, 0x7773U; 2026-02-21T10:23:48.7739264Z cvt.u16.u32 %rs160, %r2367; 2026-02-21T10:23:48.7739585Z .loc 1 60 28 // 
cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:60:28 2026-02-21T10:23:48.7740043Z shl.b16 %rs161, %rs153, 4; 2026-02-21T10:23:48.7740236Z shl.b16 %rs162, %rs154, 4; 2026-02-21T10:23:48.7740411Z shl.b16 %rs163, %rs155, 4; 2026-02-21T10:23:48.7740597Z shl.b16 %rs164, %rs156, 4; 2026-02-21T10:23:48.7740859Z shl.b16 %rs165, %rs157, 4; 2026-02-21T10:23:48.7741033Z shl.b16 %rs166, %rs158, 4; 2026-02-21T10:23:48.7741214Z shl.b16 %rs167, %rs159, 4; 2026-02-21T10:23:48.7741387Z shl.b16 %rs168, %rs160, 4; 2026-02-21T10:23:48.7741708Z .loc 1 75 58 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:75:58 2026-02-21T10:23:48.7742077Z selp.b16 %rs169, %rs161, %rs153, %p45; 2026-02-21T10:23:48.7742300Z cvt.s16.s8 %rs170, %rs169; 2026-02-21T10:23:48.7742472Z shr.s16 %rs171, %rs170, 4; 2026-02-21T10:23:48.7742671Z selp.b16 %rs172, %rs162, %rs154, %p45; 2026-02-21T10:23:48.7742881Z cvt.s16.s8 %rs173, %rs172; 2026-02-21T10:23:48.7743052Z shr.s16 %rs174, %rs173, 4; 2026-02-21T10:23:48.7743247Z selp.b16 %rs175, %rs163, %rs155, %p45; 2026-02-21T10:23:48.7743520Z cvt.s16.s8 %rs176, %rs175; 2026-02-21T10:23:48.7743712Z shr.s16 %rs177, %rs176, 4; 2026-02-21T10:23:48.7743895Z selp.b16 %rs178, %rs164, %rs156, %p45; 2026-02-21T10:23:48.7744089Z cvt.s16.s8 %rs179, %rs178; 2026-02-21T10:23:48.7744267Z shr.s16 %rs180, %rs179, 4; 2026-02-21T10:23:48.7744446Z selp.b16 %rs181, %rs165, %rs157, %p45; 2026-02-21T10:23:48.7744716Z cvt.s16.s8 %rs182, %rs181; 2026-02-21T10:23:48.7744900Z shr.s16 %rs183, %rs182, 4; 2026-02-21T10:23:48.7745075Z selp.b16 %rs184, %rs166, %rs158, %p45; 2026-02-21T10:23:48.7745274Z cvt.s16.s8 %rs185, %rs184; 2026-02-21T10:23:48.7745440Z shr.s16 %rs186, %rs185, 4; 2026-02-21T10:23:48.7745618Z selp.b16 %rs187, %rs167, %rs159, %p45; 2026-02-21T10:23:48.7745813Z cvt.s16.s8 %rs188, %rs187; 2026-02-21T10:23:48.7745982Z shr.s16 %rs189, %rs188, 4; 2026-02-21T10:23:48.7746161Z selp.b16 %rs190, %rs168, %rs160, %p45; 2026-02-21T10:23:48.7746355Z cvt.s16.s8 %rs191, 
%rs190; 2026-02-21T10:23:48.7746667Z shr.s16 %rs192, %rs191, 4; 2026-02-21T10:23:48.7746997Z .loc 1 80 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:80:32 2026-02-21T10:23:48.7747366Z cvt.rn.f32.s16 %r2368, %rs171; 2026-02-21T10:23:48.7747555Z cvt.rn.f32.s16 %r2369, %rs174; 2026-02-21T10:23:48.7747744Z cvt.rn.f32.s16 %r2370, %rs177; 2026-02-21T10:23:48.7747929Z cvt.rn.f32.s16 %r2371, %rs180; 2026-02-21T10:23:48.7748117Z cvt.rn.f32.s16 %r2372, %rs183; 2026-02-21T10:23:48.7748298Z cvt.rn.f32.s16 %r2373, %rs186; 2026-02-21T10:23:48.7748474Z cvt.rn.f32.s16 %r2374, %rs189; 2026-02-21T10:23:48.7748726Z cvt.rn.f32.s16 %r2375, %rs192; 2026-02-21T10:23:48.7748898Z bar.sync 0; 2026-02-21T10:23:48.7749054Z st.shared.b32 [%r39], %r2368; 2026-02-21T10:23:48.7749244Z st.shared.b32 [%r39+8], %r2369; 2026-02-21T10:23:48.7749436Z st.shared.b32 [%r40], %r2370; 2026-02-21T10:23:48.7749615Z st.shared.b32 [%r40+8], %r2371; 2026-02-21T10:23:48.7749801Z st.shared.b32 [%r41], %r2372; 2026-02-21T10:23:48.7749981Z st.shared.b32 [%r41+8], %r2373; 2026-02-21T10:23:48.7750176Z st.shared.b32 [%r42], %r2374; 2026-02-21T10:23:48.7750361Z st.shared.b32 [%r42+8], %r2375; 2026-02-21T10:23:48.7750534Z $L__tmp7: 2026-02-21T10:23:48.7750898Z .loc 2 291 36 // standard.py:291:36 @[ cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:87:40 ] 2026-02-21T10:23:48.7751320Z // begin inline asm 2026-02-21T10:23:48.7751501Z fence.proxy.async.shared::cta; 2026-02-21T10:23:48.7751685Z // end inline asm 2026-02-21T10:23:48.7751833Z bar.sync 0; 2026-02-21T10:23:48.7751989Z wgmma.fence.sync.aligned; 2026-02-21T10:23:48.7752169Z // begin inline asm 2026-02-21T10:23:48.7753543Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r6360,%r6361,%r6362,%r6363,%r6364,%r6365,%r6366,%r6367,%r6368,%r6369,%r6370,%r6371,%r6372,%r6373,%r6374,%r6375,%r6376,%r6377,%r6378,%r6379,%r6380,%r6381,%r6382,%r6383,%r6384,%r6385,%r6386,%r6387,%r6388,%r6389,%r6390,%r6391,%r6392,%r6393,%r6394,%r6395,%r6396,%r6397,%r6398,%r6399,%r6400,%r6401,%r6402,%r6403,%r6404,%r6405,%r6406,%r6407,%r6408,%r6409,%r6410,%r6411,%r6412,%r6413,%r6414,%r6415,%r6416,%r6417,%r6418,%r6419,%r6420,%r6421,%r6422,%r6423}, {%r2021,%r2022,%r2023,%r2024}, %rd78, %p2, 1, 1; 2026-02-21T10:23:48.7755076Z // end inline asm 2026-02-21T10:23:48.7755227Z // begin inline asm 2026-02-21T10:23:48.7756725Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r6360,%r6361,%r6362,%r6363,%r6364,%r6365,%r6366,%r6367,%r6368,%r6369,%r6370,%r6371,%r6372,%r6373,%r6374,%r6375,%r6376,%r6377,%r6378,%r6379,%r6380,%r6381,%r6382,%r6383,%r6384,%r6385,%r6386,%r6387,%r6388,%r6389,%r6390,%r6391,%r6392,%r6393,%r6394,%r6395,%r6396,%r6397,%r6398,%r6399,%r6400,%r6401,%r6402,%r6403,%r6404,%r6405,%r6406,%r6407,%r6408,%r6409,%r6410,%r6411,%r6412,%r6413,%r6414,%r6415,%r6416,%r6417,%r6418,%r6419,%r6420,%r6421,%r6422,%r6423}, {%r2153,%r2154,%r2155,%r2156}, %rd79, %p2, 1, 1; 2026-02-21T10:23:48.7758134Z // end inline asm 2026-02-21T10:23:48.7758302Z wgmma.commit_group.sync.aligned; 2026-02-21T10:23:48.7758585Z mov.b32 %r2221, %r4869; 2026-02-21T10:23:48.7758759Z mov.b32 %r2223, %r2222; 2026-02-21T10:23:48.7758924Z // begin inline asm 2026-02-21T10:23:48.7760147Z // wait for regs: %r6360,%r6361,%r6362,%r6363,%r6364,%r6365,%r6366,%r6367,%r6368,%r6369,%r6370,%r6371,%r6372,%r6373,%r6374,%r6375,%r6376,%r6377,%r6378,%r6379,%r6380,%r6381,%r6382,%r6383,%r6384,%r6385,%r6386,%r6387,%r6388,%r6389,%r6390,%r6391,%r6392,%r6393,%r6394,%r6395,%r6396,%r6397,%r6398,%r6399,%r6400,%r6401,%r6402,%r6403,%r6404,%r6405,%r6406,%r6407,%r6408,%r6409,%r6410,%r6411,%r6412,%r6413,%r6414,%r6415,%r6416,%r6417,%r6418,%r6419,%r6420,%r6421,%r6422,%r6423,%r2221,%r2222,%r2223 2026-02-21T10:23:48.7761365Z 
wgmma.wait_group.sync.aligned 0; 2026-02-21T10:23:48.7761557Z // end inline asm 2026-02-21T10:23:48.7761700Z $L__tmp8: 2026-02-21T10:23:48.7761991Z .loc 1 43 126 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:43:126 2026-02-21T10:23:48.7762374Z add.s64 %rd189, %rd189, 32; 2026-02-21T10:23:48.7762559Z add.s64 %rd188, %rd188, 128; 2026-02-21T10:23:48.7762744Z add.s64 %rd187, %rd187, 40960; 2026-02-21T10:23:48.7762945Z setp.lt.u64 %p11, %rd189, 4064; 2026-02-21T10:23:48.7763134Z @%p11 bra $L__BB0_4; 2026-02-21T10:23:48.7763344Z // %bb.5: // in Loop: Header=BB0_3 Depth=1 2026-02-21T10:23:48.7763760Z .loc 1 34 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:34:32 2026-02-21T10:23:48.7764129Z or.b32 %r2449, %r57, %r21; 2026-02-21T10:23:48.7764463Z .loc 1 36 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:36:32 2026-02-21T10:23:48.7764816Z or.b32 %r2450, %r58, %r10; 2026-02-21T10:23:48.7764986Z or.b32 %r2451, %r58, %r11; 2026-02-21T10:23:48.7765155Z or.b32 %r2452, %r58, %r12; 2026-02-21T10:23:48.7765316Z or.b32 %r2453, %r58, %r13; 2026-02-21T10:23:48.7765484Z or.b32 %r2454, %r58, %r14; 2026-02-21T10:23:48.7765661Z or.b32 %r2455, %r58, %r15; 2026-02-21T10:23:48.7765836Z or.b32 %r2456, %r58, %r16; 2026-02-21T10:23:48.7766002Z or.b32 %r2457, %r58, %r17; 2026-02-21T10:23:48.7766311Z .loc 1 90 28 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:90:28 2026-02-21T10:23:48.7766817Z cvt.rn.bf16x2.f32 %r2458, %r6361, %r6360; 2026-02-21T10:23:48.7767042Z cvt.rn.bf16x2.f32 %r2459, %r6363, %r6362; 2026-02-21T10:23:48.7767263Z cvt.rn.bf16x2.f32 %r2460, %r6365, %r6364; 2026-02-21T10:23:48.7767476Z cvt.rn.bf16x2.f32 %r2461, %r6367, %r6366; 2026-02-21T10:23:48.7767691Z cvt.rn.bf16x2.f32 %r2462, %r6369, %r6368; 2026-02-21T10:23:48.7767918Z cvt.rn.bf16x2.f32 %r2463, %r6371, %r6370; 2026-02-21T10:23:48.7768132Z cvt.rn.bf16x2.f32 %r2464, %r6373, %r6372; 2026-02-21T10:23:48.7768344Z cvt.rn.bf16x2.f32 %r2465, %r6375, %r6374; 
2026-02-21T10:23:48.7768553Z cvt.rn.bf16x2.f32 %r2466, %r6377, %r6376; 2026-02-21T10:23:48.7768766Z cvt.rn.bf16x2.f32 %r2467, %r6379, %r6378; 2026-02-21T10:23:48.7769071Z cvt.rn.bf16x2.f32 %r2468, %r6381, %r6380; 2026-02-21T10:23:48.7769287Z cvt.rn.bf16x2.f32 %r2469, %r6383, %r6382; 2026-02-21T10:23:48.7769499Z cvt.rn.bf16x2.f32 %r2470, %r6385, %r6384; 2026-02-21T10:23:48.7769712Z cvt.rn.bf16x2.f32 %r2471, %r6387, %r6386; 2026-02-21T10:23:48.7770027Z cvt.rn.bf16x2.f32 %r2472, %r6389, %r6388; 2026-02-21T10:23:48.7770241Z cvt.rn.bf16x2.f32 %r2473, %r6391, %r6390; 2026-02-21T10:23:48.7770456Z cvt.rn.bf16x2.f32 %r2474, %r6393, %r6392; 2026-02-21T10:23:48.7770663Z cvt.rn.bf16x2.f32 %r2475, %r6395, %r6394; 2026-02-21T10:23:48.7770877Z cvt.rn.bf16x2.f32 %r2476, %r6397, %r6396; 2026-02-21T10:23:48.7771087Z cvt.rn.bf16x2.f32 %r2477, %r6399, %r6398; 2026-02-21T10:23:48.7771313Z cvt.rn.bf16x2.f32 %r2478, %r6401, %r6400; 2026-02-21T10:23:48.7771525Z cvt.rn.bf16x2.f32 %r2479, %r6403, %r6402; 2026-02-21T10:23:48.7771738Z cvt.rn.bf16x2.f32 %r2480, %r6405, %r6404; 2026-02-21T10:23:48.7771949Z cvt.rn.bf16x2.f32 %r2481, %r6407, %r6406; 2026-02-21T10:23:48.7772159Z cvt.rn.bf16x2.f32 %r2482, %r6409, %r6408; 2026-02-21T10:23:48.7772447Z cvt.rn.bf16x2.f32 %r2483, %r6411, %r6410; 2026-02-21T10:23:48.7772659Z cvt.rn.bf16x2.f32 %r2484, %r6413, %r6412; 2026-02-21T10:23:48.7772886Z cvt.rn.bf16x2.f32 %r2485, %r6415, %r6414; 2026-02-21T10:23:48.7773098Z cvt.rn.bf16x2.f32 %r2486, %r6417, %r6416; 2026-02-21T10:23:48.7773314Z cvt.rn.bf16x2.f32 %r2487, %r6419, %r6418; 2026-02-21T10:23:48.7773603Z cvt.rn.bf16x2.f32 %r2488, %r6421, %r6420; 2026-02-21T10:23:48.7773819Z cvt.rn.bf16x2.f32 %r2489, %r6423, %r6422; 2026-02-21T10:23:48.7774169Z .loc 1 91 50 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:91:50 2026-02-21T10:23:48.7774535Z mad.lo.s32 %r2490, %r2450, 1280, %r2449; 2026-02-21T10:23:48.7774749Z mad.lo.s32 %r2491, %r2451, 1280, %r2449; 2026-02-21T10:23:48.7774954Z mad.lo.s32 %r2492, 
%r2452, 1280, %r2449; 2026-02-21T10:23:48.7775162Z mad.lo.s32 %r2493, %r2453, 1280, %r2449; 2026-02-21T10:23:48.7775363Z mad.lo.s32 %r2494, %r2454, 1280, %r2449; 2026-02-21T10:23:48.7775586Z mad.lo.s32 %r2495, %r2455, 1280, %r2449; 2026-02-21T10:23:48.7775795Z mad.lo.s32 %r2496, %r2456, 1280, %r2449; 2026-02-21T10:23:48.7776000Z mad.lo.s32 %r2497, %r2457, 1280, %r2449; 2026-02-21T10:23:48.7776351Z .loc 1 91 22 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:91:22 2026-02-21T10:23:48.7776854Z mad.wide.s32 %rd62, %r2490, 2, %rd25; 2026-02-21T10:23:48.7777068Z mad.wide.s32 %rd63, %r2491, 2, %rd25; 2026-02-21T10:23:48.7777266Z mad.wide.s32 %rd64, %r2492, 2, %rd25; 2026-02-21T10:23:48.7777477Z mad.wide.s32 %rd65, %r2493, 2, %rd25; 2026-02-21T10:23:48.7777679Z mad.wide.s32 %rd66, %r2494, 2, %rd25; 2026-02-21T10:23:48.7777879Z mad.wide.s32 %rd67, %r2495, 2, %rd25; 2026-02-21T10:23:48.7778077Z mad.wide.s32 %rd68, %r2496, 2, %rd25; 2026-02-21T10:23:48.7778271Z mad.wide.s32 %rd69, %r2497, 2, %rd25; 2026-02-21T10:23:48.7778611Z .loc 1 91 81 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:91:81 2026-02-21T10:23:48.7778966Z bar.sync 0; 2026-02-21T10:23:48.7779167Z st.shared.v4.b32 [%r43], {%r2458, %r2460, %r2462, %r2464}; 2026-02-21T10:23:48.7779483Z st.shared.v4.b32 [%r43+512], {%r2459, %r2461, %r2463, %r2465}; 2026-02-21T10:23:48.7779793Z st.shared.v4.b32 [%r44], {%r2466, %r2468, %r2470, %r2472}; 2026-02-21T10:23:48.7780084Z st.shared.v4.b32 [%r44+512], {%r2467, %r2469, %r2471, %r2473}; 2026-02-21T10:23:48.7780382Z st.shared.v4.b32 [%r45], {%r2474, %r2476, %r2478, %r2480}; 2026-02-21T10:23:48.7780671Z st.shared.v4.b32 [%r45+512], {%r2475, %r2477, %r2479, %r2481}; 2026-02-21T10:23:48.7780973Z st.shared.v4.b32 [%r46], {%r2482, %r2484, %r2486, %r2488}; 2026-02-21T10:23:48.7781265Z st.shared.v4.b32 [%r46+512], {%r2483, %r2485, %r2487, %r2489}; 2026-02-21T10:23:48.7781507Z bar.sync 0; 2026-02-21T10:23:48.7781653Z // begin inline asm 
2026-02-21T10:23:48.7781940Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2416, %r2417, %r2418, %r2419}, [%r2380]; 2026-02-21T10:23:48.7782369Z // end inline asm 2026-02-21T10:23:48.7782522Z // begin inline asm 2026-02-21T10:23:48.7782801Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2420, %r2421, %r2422, %r2423}, [%r2385]; 2026-02-21T10:23:48.7783134Z // end inline asm 2026-02-21T10:23:48.7783362Z // begin inline asm 2026-02-21T10:23:48.7783636Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2424, %r2425, %r2426, %r2427}, [%r2390]; 2026-02-21T10:23:48.7783958Z // end inline asm 2026-02-21T10:23:48.7784105Z // begin inline asm 2026-02-21T10:23:48.7784376Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2428, %r2429, %r2430, %r2431}, [%r2395]; 2026-02-21T10:23:48.7784700Z // end inline asm 2026-02-21T10:23:48.7784851Z // begin inline asm 2026-02-21T10:23:48.7785126Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2432, %r2433, %r2434, %r2435}, [%r2400]; 2026-02-21T10:23:48.7785450Z // end inline asm 2026-02-21T10:23:48.7785590Z // begin inline asm 2026-02-21T10:23:48.7785858Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2436, %r2437, %r2438, %r2439}, [%r2405]; 2026-02-21T10:23:48.7786265Z // end inline asm 2026-02-21T10:23:48.7786418Z // begin inline asm 2026-02-21T10:23:48.7786802Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2440, %r2441, %r2442, %r2443}, [%r2410]; 2026-02-21T10:23:48.7787130Z // end inline asm 2026-02-21T10:23:48.7787279Z // begin inline asm 2026-02-21T10:23:48.7787558Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2444, %r2445, %r2446, %r2447}, [%r2415]; 2026-02-21T10:23:48.7787962Z // end inline asm 2026-02-21T10:23:48.7788106Z // begin inline asm 2026-02-21T10:23:48.7788325Z st.global.v4.b32 [ %rd62 + 0 ], { %r2416, %r2417, %r2418, %r2419 }; 2026-02-21T10:23:48.7788666Z // end inline asm 2026-02-21T10:23:48.7788814Z // begin inline asm 2026-02-21T10:23:48.7789023Z st.global.v4.b32 [ %rd63 + 0 ], { %r2420, %r2421, %r2422, %r2423 }; 2026-02-21T10:23:48.7789279Z 
// end inline asm 2026-02-21T10:23:48.7789422Z // begin inline asm 2026-02-21T10:23:48.7789623Z st.global.v4.b32 [ %rd64 + 0 ], { %r2424, %r2425, %r2426, %r2427 }; 2026-02-21T10:23:48.7789893Z // end inline asm 2026-02-21T10:23:48.7790034Z // begin inline asm 2026-02-21T10:23:48.7790240Z st.global.v4.b32 [ %rd65 + 0 ], { %r2428, %r2429, %r2430, %r2431 }; 2026-02-21T10:23:48.7790489Z // end inline asm 2026-02-21T10:23:48.7790639Z // begin inline asm 2026-02-21T10:23:48.7790843Z st.global.v4.b32 [ %rd66 + 0 ], { %r2432, %r2433, %r2434, %r2435 }; 2026-02-21T10:23:48.7791097Z // end inline asm 2026-02-21T10:23:48.7791247Z // begin inline asm 2026-02-21T10:23:48.7791450Z st.global.v4.b32 [ %rd67 + 0 ], { %r2436, %r2437, %r2438, %r2439 }; 2026-02-21T10:23:48.7791699Z // end inline asm 2026-02-21T10:23:48.7791841Z // begin inline asm 2026-02-21T10:23:48.7792061Z st.global.v4.b32 [ %rd68 + 0 ], { %r2440, %r2441, %r2442, %r2443 }; 2026-02-21T10:23:48.7792307Z // end inline asm 2026-02-21T10:23:48.7792462Z // begin inline asm 2026-02-21T10:23:48.7792665Z st.global.v4.b32 [ %rd69 + 0 ], { %r2444, %r2445, %r2446, %r2447 }; 2026-02-21T10:23:48.7792921Z // end inline asm 2026-02-21T10:23:48.7793229Z .loc 1 22 121 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:22:121 2026-02-21T10:23:48.7793612Z add.s32 %r2498, %r6359, 1; 2026-02-21T10:23:48.7793935Z .loc 1 28 35 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:28:35 2026-02-21T10:23:48.7794294Z shr.s32 %r2499, %r2498, 31; 2026-02-21T10:23:48.7794474Z shr.u32 %r2500, %r2499, 17; 2026-02-21T10:23:48.7794660Z add.s32 %r2501, %r2498, %r2500; 2026-02-21T10:23:48.7794848Z shr.s32 %r2502, %r2501, 15; 2026-02-21T10:23:48.7795168Z .loc 1 29 33 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:29:33 2026-02-21T10:23:48.7795518Z shl.b32 %r2503, %r2502, 6; 2026-02-21T10:23:48.7795832Z .loc 1 30 39 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:30:39 2026-02-21T10:23:48.7795894Z sub.s32 %r2504, 10, 
%r2503; 2026-02-21T10:23:48.7796096Z .loc 1 30 52 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:30:52 2026-02-21T10:23:48.7796253Z min.s32 %r2505, %r2504, 64; 2026-02-21T10:23:48.7796582Z .loc 1 31 45 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:31:45 2026-02-21T10:23:48.7796728Z and.b32 %r2506, %r2501, -32768; 2026-02-21T10:23:48.7796793Z sub.s32 %r2507, %r2498, %r2506; 2026-02-21T10:23:48.7797016Z .loc 1 32 51 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:32:51 2026-02-21T10:23:48.7797083Z div.s32 %r2508, %r2507, %r2505; 2026-02-21T10:23:48.7797284Z .loc 1 31 64 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:31:64 2026-02-21T10:23:48.7797353Z mul.lo.s32 %r2509, %r2508, %r2505; 2026-02-21T10:23:48.7797415Z sub.s32 %r2510, %r2507, %r2509; 2026-02-21T10:23:48.7797615Z .loc 1 31 30 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:31:30 2026-02-21T10:23:48.7797682Z add.s32 %r2511, %r2510, %r2503; 2026-02-21T10:23:48.7797961Z .loc 1 33 27 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:33:27 2026-02-21T10:23:48.7798028Z shl.b32 %r187, %r2511, 7; 2026-02-21T10:23:48.7798232Z .loc 1 35 27 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:35:27 2026-02-21T10:23:48.7798295Z shl.b32 %r188, %r2508, 7; 2026-02-21T10:23:48.7798566Z .loc 1 43 126 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:43:126 2026-02-21T10:23:48.7798642Z shl.b32 %r2512, %r2508, 20; 2026-02-21T10:23:48.7798710Z or.b32 %r2513, %r55, %r2512; 2026-02-21T10:23:48.7798783Z mad.wide.s32 %rd191, %r2513, 2, %rd3; 2026-02-21T10:23:48.7798844Z or.b32 %r2514, %r19, %r187; 2026-02-21T10:23:48.7798912Z cvt.s64.s32 %rd71, %r2514; 2026-02-21T10:23:48.7798976Z add.s64 %rd190, %rd4, %rd71; 2026-02-21T10:23:48.7799035Z mov.b32 %r6424, 0f00000000; 2026-02-21T10:23:48.7799102Z mov.b64 %rd192, -32; 2026-02-21T10:23:48.7799166Z mov.b32 %r6425, %r6424; 2026-02-21T10:23:48.7799228Z mov.b32 %r6426, %r6424; 2026-02-21T10:23:48.7799286Z 
mov.b32 %r6427, %r6424; 2026-02-21T10:23:48.7799347Z mov.b32 %r6428, %r6424; 2026-02-21T10:23:48.7799405Z mov.b32 %r6429, %r6424; 2026-02-21T10:23:48.7799466Z mov.b32 %r6430, %r6424; 2026-02-21T10:23:48.7799526Z mov.b32 %r6431, %r6424; 2026-02-21T10:23:48.7799583Z mov.b32 %r6432, %r6424; 2026-02-21T10:23:48.7799642Z mov.b32 %r6433, %r6424; 2026-02-21T10:23:48.7799699Z mov.b32 %r6434, %r6424; 2026-02-21T10:23:48.7799760Z mov.b32 %r6435, %r6424; 2026-02-21T10:23:48.7799818Z mov.b32 %r6436, %r6424; 2026-02-21T10:23:48.7799874Z mov.b32 %r6437, %r6424; 2026-02-21T10:23:48.7799935Z mov.b32 %r6438, %r6424; 2026-02-21T10:23:48.7799992Z mov.b32 %r6439, %r6424; 2026-02-21T10:23:48.7800049Z mov.b32 %r6440, %r6424; 2026-02-21T10:23:48.7800106Z mov.b32 %r6441, %r6424; 2026-02-21T10:23:48.7800167Z mov.b32 %r6442, %r6424; 2026-02-21T10:23:48.7800223Z mov.b32 %r6443, %r6424; 2026-02-21T10:23:48.7800290Z mov.b32 %r6444, %r6424; 2026-02-21T10:23:48.7800364Z mov.b32 %r6445, %r6424; 2026-02-21T10:23:48.7800425Z mov.b32 %r6446, %r6424; 2026-02-21T10:23:48.7800484Z mov.b32 %r6447, %r6424; 2026-02-21T10:23:48.7800542Z mov.b32 %r6448, %r6424; 2026-02-21T10:23:48.7800607Z mov.b32 %r6449, %r6424; 2026-02-21T10:23:48.7800665Z mov.b32 %r6450, %r6424; 2026-02-21T10:23:48.7800722Z mov.b32 %r6451, %r6424; 2026-02-21T10:23:48.7800784Z mov.b32 %r6452, %r6424; 2026-02-21T10:23:48.7800841Z mov.b32 %r6453, %r6424; 2026-02-21T10:23:48.7800898Z mov.b32 %r6454, %r6424; 2026-02-21T10:23:48.7800957Z mov.b32 %r6455, %r6424; 2026-02-21T10:23:48.7801019Z mov.b32 %r6456, %r6424; 2026-02-21T10:23:48.7801077Z mov.b32 %r6457, %r6424; 2026-02-21T10:23:48.7801134Z mov.b32 %r6458, %r6424; 2026-02-21T10:23:48.7801195Z mov.b32 %r6459, %r6424; 2026-02-21T10:23:48.7801255Z mov.b32 %r6460, %r6424; 2026-02-21T10:23:48.7801313Z mov.b32 %r6461, %r6424; 2026-02-21T10:23:48.7801460Z mov.b32 %r6462, %r6424; 2026-02-21T10:23:48.7801530Z mov.b32 %r6463, %r6424; 2026-02-21T10:23:48.7801588Z mov.b32 %r6464, %r6424; 
2026-02-21T10:23:48.7801647Z mov.b32 %r6465, %r6424; 2026-02-21T10:23:48.7801712Z mov.b32 %r6466, %r6424; 2026-02-21T10:23:48.7801822Z mov.b32 %r6467, %r6424; 2026-02-21T10:23:48.7801878Z mov.b32 %r6468, %r6424; 2026-02-21T10:23:48.7801941Z mov.b32 %r6469, %r6424; 2026-02-21T10:23:48.7802002Z mov.b32 %r6470, %r6424; 2026-02-21T10:23:48.7802061Z mov.b32 %r6471, %r6424; 2026-02-21T10:23:48.7802118Z mov.b32 %r6472, %r6424; 2026-02-21T10:23:48.7802178Z mov.b32 %r6473, %r6424; 2026-02-21T10:23:48.7802243Z mov.b32 %r6474, %r6424; 2026-02-21T10:23:48.7802301Z mov.b32 %r6475, %r6424; 2026-02-21T10:23:48.7802362Z mov.b32 %r6476, %r6424; 2026-02-21T10:23:48.7802420Z mov.b32 %r6477, %r6424; 2026-02-21T10:23:48.7802478Z mov.b32 %r6478, %r6424; 2026-02-21T10:23:48.7802536Z mov.b32 %r6479, %r6424; 2026-02-21T10:23:48.7802597Z mov.b32 %r6480, %r6424; 2026-02-21T10:23:48.7802721Z mov.b32 %r6481, %r6424; 2026-02-21T10:23:48.7802784Z mov.b32 %r6482, %r6424; 2026-02-21T10:23:48.7802846Z mov.b32 %r6483, %r6424; 2026-02-21T10:23:48.7802904Z mov.b32 %r6484, %r6424; 2026-02-21T10:23:48.7802963Z mov.b32 %r6485, %r6424; 2026-02-21T10:23:48.7803024Z mov.b32 %r6486, %r6424; 2026-02-21T10:23:48.7803086Z mov.b32 %r6487, %r6424; 2026-02-21T10:23:48.7803261Z $L__BB0_6: // Parent Loop BB0_3 Depth=1 2026-02-21T10:23:48.7803375Z // => This Inner Loop Header: Depth=2 2026-02-21T10:23:48.7803583Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7803648Z add.s64 %rd73, %rd191, -96; 2026-02-21T10:23:48.7803709Z // begin inline asm 2026-02-21T10:23:48.7803773Z mov.u64 %rd72, 0x0; 2026-02-21T10:23:48.7803909Z createpolicy.fractional.L2::evict_last.b64 %rd72, 1.0; 2026-02-21T10:23:48.7803969Z // end inline asm 2026-02-21T10:23:48.7804028Z // begin inline asm 2026-02-21T10:23:48.7804095Z mov.u32 %r2515, 0x0; 2026-02-21T10:23:48.7804152Z mov.u32 %r2516, 0x0; 2026-02-21T10:23:48.7804208Z mov.u32 %r2517, 0x0; 2026-02-21T10:23:48.7804269Z mov.u32 %r2518, 0x0; 
2026-02-21T10:23:48.7804493Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r2515, %r2516, %r2517, %r2518 }, [ %rd73 + 0 ], %rd72; 2026-02-21T10:23:48.7804551Z // end inline asm 2026-02-21T10:23:48.7804763Z .loc 1 55 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:55:32 2026-02-21T10:23:48.7804821Z bar.sync 0; 2026-02-21T10:23:48.7804902Z st.shared.v2.b32 [%r29], {%r2515, %r2516}; 2026-02-21T10:23:48.7804977Z st.shared.v2.b32 [%r30], {%r2517, %r2518}; 2026-02-21T10:23:48.7805039Z bar.sync 0; 2026-02-21T10:23:48.7805117Z ld.shared.b16 %rs193, [%r32]; 2026-02-21T10:23:48.7805188Z ld.shared.b16 %rs194, [%r32+256]; 2026-02-21T10:23:48.7805258Z ld.shared.b16 %rs195, [%r32+16]; 2026-02-21T10:23:48.7805324Z ld.shared.b16 %rs196, [%r32+272]; 2026-02-21T10:23:48.7805393Z ld.shared.b16 %rs197, [%r33]; 2026-02-21T10:23:48.7805456Z ld.shared.b16 %rs198, [%r33+256]; 2026-02-21T10:23:48.7805525Z ld.shared.b16 %rs199, [%r33+16]; 2026-02-21T10:23:48.7805590Z ld.shared.b16 %rs200, [%r33+272]; 2026-02-21T10:23:48.7805656Z cvt.f32.bf16 %r2648, %rs193; 2026-02-21T10:23:48.7805726Z cvt.f32.bf16 %r2649, %rs194; 2026-02-21T10:23:48.7805798Z cvt.f32.bf16 %r2650, %rs197; 2026-02-21T10:23:48.7805864Z cvt.f32.bf16 %r2651, %rs198; 2026-02-21T10:23:48.7805927Z cvt.f32.bf16 %r2780, %rs195; 2026-02-21T10:23:48.7805988Z cvt.f32.bf16 %r2781, %rs196; 2026-02-21T10:23:48.7806051Z cvt.f32.bf16 %r2782, %rs199; 2026-02-21T10:23:48.7806111Z cvt.f32.bf16 %r2783, %rs200; 2026-02-21T10:23:48.7806319Z .loc 1 57 87 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:57:87 2026-02-21T10:23:48.7806381Z // begin inline asm 2026-02-21T10:23:48.7806439Z mov.u64 %rd75, 0x0; 2026-02-21T10:23:48.7806692Z createpolicy.fractional.L2::evict_last.b64 %rd75, 1.0; 2026-02-21T10:23:48.7806861Z // end inline asm 2026-02-21T10:23:48.7806921Z // begin inline asm 2026-02-21T10:23:48.7806979Z mov.u32 %r2519, 0x0; 2026-02-21T10:23:48.7807145Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r2519 }, [ 
%rd190 + 0 ], %rd75; 2026-02-21T10:23:48.7807269Z // end inline asm 2026-02-21T10:23:48.7807477Z .loc 1 65 28 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:65:28 2026-02-21T10:23:48.7807541Z bar.sync 0; 2026-02-21T10:23:48.7807605Z st.shared.b8 [%r34], %r2519; 2026-02-21T10:23:48.7807686Z prmt.b32 %r4127, %r2519, 0, 0x7771U; 2026-02-21T10:23:48.7807757Z st.shared.b8 [%r35+256], %r4127; 2026-02-21T10:23:48.7807824Z prmt.b32 %r4128, %r2519, 0, 0x7772U; 2026-02-21T10:23:48.7807887Z st.shared.b8 [%r36+512], %r4128; 2026-02-21T10:23:48.7807950Z prmt.b32 %r4129, %r2519, 0, 0x7773U; 2026-02-21T10:23:48.7808017Z st.shared.b8 [%r37+768], %r4129; 2026-02-21T10:23:48.7808072Z bar.sync 0; 2026-02-21T10:23:48.7808201Z ld.shared.b32 %r4130, [%r38]; 2026-02-21T10:23:48.7808270Z prmt.b32 %r4131, %r4130, 0, 0x7770U; 2026-02-21T10:23:48.7808333Z cvt.u16.u32 %rs201, %r4131; 2026-02-21T10:23:48.7808394Z prmt.b32 %r4132, %r4130, 0, 0x7771U; 2026-02-21T10:23:48.7808471Z cvt.u16.u32 %rs202, %r4132; 2026-02-21T10:23:48.7808542Z prmt.b32 %r4133, %r4130, 0, 0x7772U; 2026-02-21T10:23:48.7808606Z cvt.u16.u32 %rs203, %r4133; 2026-02-21T10:23:48.7808735Z prmt.b32 %r4134, %r4130, 0, 0x7773U; 2026-02-21T10:23:48.7808802Z cvt.u16.u32 %rs204, %r4134; 2026-02-21T10:23:48.7808868Z ld.shared.b32 %r4135, [%r38+128]; 2026-02-21T10:23:48.7808932Z prmt.b32 %r4136, %r4135, 0, 0x7770U; 2026-02-21T10:23:48.7808998Z cvt.u16.u32 %rs205, %r4136; 2026-02-21T10:23:48.7809062Z prmt.b32 %r4137, %r4135, 0, 0x7771U; 2026-02-21T10:23:48.7809124Z cvt.u16.u32 %rs206, %r4137; 2026-02-21T10:23:48.7809186Z prmt.b32 %r4138, %r4135, 0, 0x7772U; 2026-02-21T10:23:48.7809249Z cvt.u16.u32 %rs207, %r4138; 2026-02-21T10:23:48.7809315Z prmt.b32 %r4139, %r4135, 0, 0x7773U; 2026-02-21T10:23:48.7809377Z cvt.u16.u32 %rs208, %r4139; 2026-02-21T10:23:48.7809588Z .loc 1 60 28 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:60:28 2026-02-21T10:23:48.7809656Z shl.b16 %rs209, %rs201, 4; 2026-02-21T10:23:48.7809717Z 
shl.b16 %rs210, %rs202, 4; 2026-02-21T10:23:48.7809778Z shl.b16 %rs211, %rs203, 4; 2026-02-21T10:23:48.7809844Z shl.b16 %rs212, %rs204, 4; 2026-02-21T10:23:48.7809904Z shl.b16 %rs213, %rs205, 4; 2026-02-21T10:23:48.7809964Z shl.b16 %rs214, %rs206, 4; 2026-02-21T10:23:48.7810027Z shl.b16 %rs215, %rs207, 4; 2026-02-21T10:23:48.7810086Z shl.b16 %rs216, %rs208, 4; 2026-02-21T10:23:48.7810287Z .loc 1 75 58 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:75:58 2026-02-21T10:23:48.7810366Z selp.b16 %rs217, %rs209, %rs201, %p45; 2026-02-21T10:23:48.7810427Z cvt.s16.s8 %rs218, %rs217; 2026-02-21T10:23:48.7810488Z shr.s16 %rs219, %rs218, 4; 2026-02-21T10:23:48.7810563Z selp.b16 %rs220, %rs210, %rs202, %p45; 2026-02-21T10:23:48.7810628Z cvt.s16.s8 %rs221, %rs220; 2026-02-21T10:23:48.7810689Z shr.s16 %rs222, %rs221, 4; 2026-02-21T10:23:48.7810760Z selp.b16 %rs223, %rs211, %rs203, %p45; 2026-02-21T10:23:48.7810825Z cvt.s16.s8 %rs224, %rs223; 2026-02-21T10:23:48.7810897Z shr.s16 %rs225, %rs224, 4; 2026-02-21T10:23:48.7810970Z selp.b16 %rs226, %rs212, %rs204, %p45; 2026-02-21T10:23:48.7811033Z cvt.s16.s8 %rs227, %rs226; 2026-02-21T10:23:48.7811098Z shr.s16 %rs228, %rs227, 4; 2026-02-21T10:23:48.7811165Z selp.b16 %rs229, %rs213, %rs205, %p45; 2026-02-21T10:23:48.7811224Z cvt.s16.s8 %rs230, %rs229; 2026-02-21T10:23:48.7811288Z shr.s16 %rs231, %rs230, 4; 2026-02-21T10:23:48.7811356Z selp.b16 %rs232, %rs214, %rs206, %p45; 2026-02-21T10:23:48.7811416Z cvt.s16.s8 %rs233, %rs232; 2026-02-21T10:23:48.7811476Z shr.s16 %rs234, %rs233, 4; 2026-02-21T10:23:48.7811548Z selp.b16 %rs235, %rs215, %rs207, %p45; 2026-02-21T10:23:48.7811676Z cvt.s16.s8 %rs236, %rs235; 2026-02-21T10:23:48.7811736Z shr.s16 %rs237, %rs236, 4; 2026-02-21T10:23:48.7811805Z selp.b16 %rs238, %rs216, %rs208, %p45; 2026-02-21T10:23:48.7811866Z cvt.s16.s8 %rs239, %rs238; 2026-02-21T10:23:48.7811975Z shr.s16 %rs240, %rs239, 4; 2026-02-21T10:23:48.7812182Z .loc 1 80 32 // 
cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:80:32 2026-02-21T10:23:48.7812251Z cvt.rn.f32.s16 %r4140, %rs219; 2026-02-21T10:23:48.7812314Z cvt.rn.f32.s16 %r4141, %rs222; 2026-02-21T10:23:48.7812377Z cvt.rn.f32.s16 %r4142, %rs225; 2026-02-21T10:23:48.7812456Z cvt.rn.f32.s16 %r4143, %rs228; 2026-02-21T10:23:48.7812518Z cvt.rn.f32.s16 %r4144, %rs231; 2026-02-21T10:23:48.7812581Z cvt.rn.f32.s16 %r4145, %rs234; 2026-02-21T10:23:48.7812646Z cvt.rn.f32.s16 %r4146, %rs237; 2026-02-21T10:23:48.7812707Z cvt.rn.f32.s16 %r4147, %rs240; 2026-02-21T10:23:48.7812761Z bar.sync 0; 2026-02-21T10:23:48.7812826Z st.shared.b32 [%r39], %r4140; 2026-02-21T10:23:48.7812956Z st.shared.b32 [%r39+8], %r4141; 2026-02-21T10:23:48.7813023Z st.shared.b32 [%r40], %r4142; 2026-02-21T10:23:48.7813088Z st.shared.b32 [%r40+8], %r4143; 2026-02-21T10:23:48.7813155Z st.shared.b32 [%r41], %r4144; 2026-02-21T10:23:48.7813220Z st.shared.b32 [%r41+8], %r4145; 2026-02-21T10:23:48.7813293Z st.shared.b32 [%r42], %r4146; 2026-02-21T10:23:48.7813357Z st.shared.b32 [%r42+8], %r4147; 2026-02-21T10:23:48.7813469Z $L__tmp9: 2026-02-21T10:23:48.7813761Z .loc 2 291 36 // standard.py:291:36 @[ cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:87:40 ] 2026-02-21T10:23:48.7813825Z // begin inline asm 2026-02-21T10:23:48.7813906Z fence.proxy.async.shared::cta; 2026-02-21T10:23:48.7813963Z // end inline asm 2026-02-21T10:23:48.7814017Z bar.sync 0; 2026-02-21T10:23:48.7814102Z shfl.sync.idx.b32 %r4148, %r5, 0, 31, -1; 2026-02-21T10:23:48.7814175Z wgmma.fence.sync.aligned; 2026-02-21T10:23:48.7814239Z mov.pred %p12, -1; 2026-02-21T10:23:48.7814302Z // begin inline asm 2026-02-21T10:23:48.7815588Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r6424,%r6425,%r6426,%r6427,%r6428,%r6429,%r6430,%r6431,%r6432,%r6433,%r6434,%r6435,%r6436,%r6437,%r6438,%r6439,%r6440,%r6441,%r6442,%r6443,%r6444,%r6445,%r6446,%r6447,%r6448,%r6449,%r6450,%r6451,%r6452,%r6453,%r6454,%r6455,%r6456,%r6457,%r6458,%r6459,%r6460,%r6461,%r6462,%r6463,%r6464,%r6465,%r6466,%r6467,%r6468,%r6469,%r6470,%r6471,%r6472,%r6473,%r6474,%r6475,%r6476,%r6477,%r6478,%r6479,%r6480,%r6481,%r6482,%r6483,%r6484,%r6485,%r6486,%r6487}, {%r2648,%r2649,%r2650,%r2651}, %rd78, %p12, 1, 1; 2026-02-21T10:23:48.7815651Z // end inline asm 2026-02-21T10:23:48.7815713Z // begin inline asm 2026-02-21T10:23:48.7817094Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r6424,%r6425,%r6426,%r6427,%r6428,%r6429,%r6430,%r6431,%r6432,%r6433,%r6434,%r6435,%r6436,%r6437,%r6438,%r6439,%r6440,%r6441,%r6442,%r6443,%r6444,%r6445,%r6446,%r6447,%r6448,%r6449,%r6450,%r6451,%r6452,%r6453,%r6454,%r6455,%r6456,%r6457,%r6458,%r6459,%r6460,%r6461,%r6462,%r6463,%r6464,%r6465,%r6466,%r6467,%r6468,%r6469,%r6470,%r6471,%r6472,%r6473,%r6474,%r6475,%r6476,%r6477,%r6478,%r6479,%r6480,%r6481,%r6482,%r6483,%r6484,%r6485,%r6486,%r6487}, {%r2780,%r2781,%r2782,%r2783}, %rd79, %p12, 1, 1; 2026-02-21T10:23:48.7817162Z // end inline asm 2026-02-21T10:23:48.7817244Z wgmma.commit_group.sync.aligned; 2026-02-21T10:23:48.7817302Z mov.b32 %r4059, 0; 2026-02-21T10:23:48.7817366Z mov.b32 %r2848, %r4869; 2026-02-21T10:23:48.7817430Z mov.b32 %r2849, %r4059; 2026-02-21T10:23:48.7817499Z mov.b32 %r2850, %r4059; 2026-02-21T10:23:48.7817560Z // begin inline asm 2026-02-21T10:23:48.7818625Z // wait for regs: 
%r6424,%r6425,%r6426,%r6427,%r6428,%r6429,%r6430,%r6431,%r6432,%r6433,%r6434,%r6435,%r6436,%r6437,%r6438,%r6439,%r6440,%r6441,%r6442,%r6443,%r6444,%r6445,%r6446,%r6447,%r6448,%r6449,%r6450,%r6451,%r6452,%r6453,%r6454,%r6455,%r6456,%r6457,%r6458,%r6459,%r6460,%r6461,%r6462,%r6463,%r6464,%r6465,%r6466,%r6467,%r6468,%r6469,%r6470,%r6471,%r6472,%r6473,%r6474,%r6475,%r6476,%r6477,%r6478,%r6479,%r6480,%r6481,%r6482,%r6483,%r6484,%r6485,%r6486,%r6487,%r2848,%r2849,%r2850 2026-02-21T10:23:48.7818791Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:23:48.7818848Z // end inline asm 2026-02-21T10:23:48.7818965Z $L__tmp10: 2026-02-21T10:23:48.7819180Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7819247Z add.s64 %rd81, %rd191, -64; 2026-02-21T10:23:48.7819307Z // begin inline asm 2026-02-21T10:23:48.7819370Z mov.u64 %rd80, 0x0; 2026-02-21T10:23:48.7819489Z createpolicy.fractional.L2::evict_last.b64 %rd80, 1.0; 2026-02-21T10:23:48.7819547Z // end inline asm 2026-02-21T10:23:48.7819610Z // begin inline asm 2026-02-21T10:23:48.7819669Z mov.u32 %r2918, 0x0; 2026-02-21T10:23:48.7819726Z mov.u32 %r2919, 0x0; 2026-02-21T10:23:48.7819783Z mov.u32 %r2920, 0x0; 2026-02-21T10:23:48.7819845Z mov.u32 %r2921, 0x0; 2026-02-21T10:23:48.7820131Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r2918, %r2919, %r2920, %r2921 }, [ %rd81 + 0 ], %rd80; 2026-02-21T10:23:48.7820194Z // end inline asm 2026-02-21T10:23:48.7820409Z .loc 1 55 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:55:32 2026-02-21T10:23:48.7820474Z bar.sync 0; 2026-02-21T10:23:48.7820556Z st.shared.v2.b32 [%r29], {%r2918, %r2919}; 2026-02-21T10:23:48.7820638Z st.shared.v2.b32 [%r30], {%r2920, %r2921}; 2026-02-21T10:23:48.7820762Z bar.sync 0; 2026-02-21T10:23:48.7820834Z ld.shared.b16 %rs241, [%r32]; 2026-02-21T10:23:48.7820902Z ld.shared.b16 %rs242, [%r32+256]; 2026-02-21T10:23:48.7820974Z ld.shared.b16 %rs243, [%r32+16]; 2026-02-21T10:23:48.7821041Z 
ld.shared.b16 %rs244, [%r32+272]; 2026-02-21T10:23:48.7821105Z ld.shared.b16 %rs245, [%r33]; 2026-02-21T10:23:48.7821172Z ld.shared.b16 %rs246, [%r33+256]; 2026-02-21T10:23:48.7821238Z ld.shared.b16 %rs247, [%r33+16]; 2026-02-21T10:23:48.7821302Z ld.shared.b16 %rs248, [%r33+272]; 2026-02-21T10:23:48.7821370Z cvt.f32.bf16 %r3051, %rs241; 2026-02-21T10:23:48.7821438Z cvt.f32.bf16 %r3052, %rs242; 2026-02-21T10:23:48.7821500Z cvt.f32.bf16 %r3053, %rs245; 2026-02-21T10:23:48.7821561Z cvt.f32.bf16 %r3054, %rs246; 2026-02-21T10:23:48.7821628Z cvt.f32.bf16 %r3183, %rs243; 2026-02-21T10:23:48.7821691Z cvt.f32.bf16 %r3184, %rs244; 2026-02-21T10:23:48.7821762Z cvt.f32.bf16 %r3185, %rs247; 2026-02-21T10:23:48.7821827Z cvt.f32.bf16 %r3186, %rs248; 2026-02-21T10:23:48.7822044Z .loc 1 57 87 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:57:87 2026-02-21T10:23:48.7822107Z add.s64 %rd84, %rd190, 10240; 2026-02-21T10:23:48.7822167Z // begin inline asm 2026-02-21T10:23:48.7822226Z mov.u64 %rd83, 0x0; 2026-02-21T10:23:48.7822347Z createpolicy.fractional.L2::evict_last.b64 %rd83, 1.0; 2026-02-21T10:23:48.7822405Z // end inline asm 2026-02-21T10:23:48.7822467Z // begin inline asm 2026-02-21T10:23:48.7822524Z mov.u32 %r2922, 0x0; 2026-02-21T10:23:48.7822685Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r2922 }, [ %rd84 + 0 ], %rd83; 2026-02-21T10:23:48.7822742Z // end inline asm 2026-02-21T10:23:48.7822951Z .loc 1 65 28 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:65:28 2026-02-21T10:23:48.7823008Z bar.sync 0; 2026-02-21T10:23:48.7823073Z st.shared.b8 [%r34], %r2922; 2026-02-21T10:23:48.7823143Z prmt.b32 %r4149, %r2922, 0, 0x7771U; 2026-02-21T10:23:48.7823208Z st.shared.b8 [%r35+256], %r4149; 2026-02-21T10:23:48.7823274Z prmt.b32 %r4150, %r2922, 0, 0x7772U; 2026-02-21T10:23:48.7823342Z st.shared.b8 [%r36+512], %r4150; 2026-02-21T10:23:48.7823406Z prmt.b32 %r4151, %r2922, 0, 0x7773U; 2026-02-21T10:23:48.7823469Z st.shared.b8 [%r37+768], %r4151; 
2026-02-21T10:23:48.7823526Z bar.sync 0; 2026-02-21T10:23:48.7823593Z ld.shared.b32 %r4152, [%r38]; 2026-02-21T10:23:48.7823658Z prmt.b32 %r4153, %r4152, 0, 0x7770U; 2026-02-21T10:23:48.7823726Z cvt.u16.u32 %rs249, %r4153; 2026-02-21T10:23:48.7823793Z prmt.b32 %r4154, %r4152, 0, 0x7771U; 2026-02-21T10:23:48.7823928Z cvt.u16.u32 %rs250, %r4154; 2026-02-21T10:23:48.7823993Z prmt.b32 %r4155, %r4152, 0, 0x7772U; 2026-02-21T10:23:48.7824410Z [3421s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T10:23:48.7825745Z Config: @helion.kernel(config=helion.Config(block_sizes=[8, 128, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[64], load_eviction_policies=['last', 'last'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=1, num_stages=4, num_warps=8, pid_type='persistent_blocked', range_flattens=[True, True], range_multi_buffers=[True, True], range_num_stages=[3, 2], range_unroll_factors=[2, 4], range_warp_specializes=[]), static_shapes=True) 2026-02-21T10:23:48.7825898Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T10:23:48.7825959Z `ptxas` stderr: 2026-02-21T10:23:48.7826622Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 412 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T10:23:48.7826748Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T10:23:48.7826756Z 2026-02-21T10:23:48.7827261Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp3qfszq07.ptx -o /tmp/tmp3qfszq07.ptx.o 2026-02-21T10:23:48.7827269Z 2026-02-21T10:23:48.7827518Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T10:23:48.7827592Z cvt.u16.u32 %rs251, %r4155; 2026-02-21T10:23:48.7827667Z prmt.b32 %r4156, %r4152, 0, 0x7773U; 2026-02-21T10:23:48.7827731Z cvt.u16.u32 %rs252, %r4156; 2026-02-21T10:23:48.7827803Z ld.shared.b32 %r4157, [%r38+128]; 2026-02-21T10:23:48.7827873Z prmt.b32 %r4158, %r4157, 0, 0x7770U; 2026-02-21T10:23:48.7827936Z cvt.u16.u32 %rs253, %r4158; 2026-02-21T10:23:48.7827999Z prmt.b32 %r4159, %r4157, 0, 0x7771U; 2026-02-21T10:23:48.7828077Z cvt.u16.u32 %rs254, %r4159; 2026-02-21T10:23:48.7828149Z prmt.b32 %r4160, %r4157, 0, 0x7772U; 2026-02-21T10:23:48.7828211Z cvt.u16.u32 %rs255, %r4160; 2026-02-21T10:23:48.7828279Z prmt.b32 %r4161, %r4157, 0, 0x7773U; 2026-02-21T10:23:48.7828348Z cvt.u16.u32 %rs256, %r4161; 2026-02-21T10:23:48.7828632Z .loc 1 60 28 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:60:28 2026-02-21T10:23:48.7828701Z shl.b16 %rs257, %rs249, 4; 2026-02-21T10:23:48.7828766Z shl.b16 %rs258, %rs250, 4; 2026-02-21T10:23:48.7828830Z shl.b16 %rs259, %rs251, 4; 2026-02-21T10:23:48.7828891Z shl.b16 %rs260, %rs252, 4; 2026-02-21T10:23:48.7828953Z shl.b16 %rs261, %rs253, 4; 2026-02-21T10:23:48.7829019Z shl.b16 %rs262, %rs254, 4; 2026-02-21T10:23:48.7829079Z shl.b16 %rs263, %rs255, 4; 2026-02-21T10:23:48.7829140Z shl.b16 %rs264, %rs256, 4; 2026-02-21T10:23:48.7829349Z .loc 1 75 58 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:75:58 2026-02-21T10:23:48.7829429Z selp.b16 %rs265, %rs257, %rs249, %p45; 2026-02-21T10:23:48.7829491Z cvt.s16.s8 %rs266, %rs265; 2026-02-21T10:23:48.7829551Z shr.s16 %rs267, %rs266, 4; 2026-02-21T10:23:48.7829625Z selp.b16 %rs268, %rs258, %rs250, %p45; 2026-02-21T10:23:48.7829688Z cvt.s16.s8 %rs269, %rs268; 2026-02-21T10:23:48.7829750Z shr.s16 %rs270, %rs269, 4; 2026-02-21T10:23:48.7829828Z selp.b16 %rs271, %rs259, %rs251, %p45; 2026-02-21T10:23:48.7829896Z cvt.s16.s8 %rs272, %rs271; 2026-02-21T10:23:48.7829959Z shr.s16 %rs273, %rs272, 4; 2026-02-21T10:23:48.7830029Z selp.b16 %rs274, %rs260, %rs252, 
%p45; 2026-02-21T10:23:48.7830093Z cvt.s16.s8 %rs275, %rs274; 2026-02-21T10:23:48.7830153Z shr.s16 %rs276, %rs275, 4; 2026-02-21T10:23:48.7830220Z selp.b16 %rs277, %rs261, %rs253, %p45; 2026-02-21T10:23:48.7830284Z cvt.s16.s8 %rs278, %rs277; 2026-02-21T10:23:48.7830345Z shr.s16 %rs279, %rs278, 4; 2026-02-21T10:23:48.7830414Z selp.b16 %rs280, %rs262, %rs254, %p45; 2026-02-21T10:23:48.7830757Z cvt.s16.s8 %rs281, %rs280; 2026-02-21T10:23:48.7830822Z shr.s16 %rs282, %rs281, 4; 2026-02-21T10:23:48.7830892Z selp.b16 %rs283, %rs263, %rs255, %p45; 2026-02-21T10:23:48.7830952Z cvt.s16.s8 %rs284, %rs283; 2026-02-21T10:23:48.7831016Z shr.s16 %rs285, %rs284, 4; 2026-02-21T10:23:48.7831149Z selp.b16 %rs286, %rs264, %rs256, %p45; 2026-02-21T10:23:48.7831210Z cvt.s16.s8 %rs287, %rs286; 2026-02-21T10:23:48.7831275Z shr.s16 %rs288, %rs287, 4; 2026-02-21T10:23:48.7831497Z .loc 1 80 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:80:32 2026-02-21T10:23:48.7831566Z cvt.rn.f32.s16 %r4162, %rs267; 2026-02-21T10:23:48.7831630Z cvt.rn.f32.s16 %r4163, %rs270; 2026-02-21T10:23:48.7831695Z cvt.rn.f32.s16 %r4164, %rs273; 2026-02-21T10:23:48.7831759Z cvt.rn.f32.s16 %r4165, %rs276; 2026-02-21T10:23:48.7831820Z cvt.rn.f32.s16 %r4166, %rs279; 2026-02-21T10:23:48.7831885Z cvt.rn.f32.s16 %r4167, %rs282; 2026-02-21T10:23:48.7831950Z cvt.rn.f32.s16 %r4168, %rs285; 2026-02-21T10:23:48.7832087Z cvt.rn.f32.s16 %r4169, %rs288; 2026-02-21T10:23:48.7832148Z bar.sync 0; 2026-02-21T10:23:48.7832221Z st.shared.b32 [%r39], %r4162; 2026-02-21T10:23:48.7832288Z st.shared.b32 [%r39+8], %r4163; 2026-02-21T10:23:48.7832358Z st.shared.b32 [%r40], %r4164; 2026-02-21T10:23:48.7832427Z st.shared.b32 [%r40+8], %r4165; 2026-02-21T10:23:48.7832495Z st.shared.b32 [%r41], %r4166; 2026-02-21T10:23:48.7832607Z st.shared.b32 [%r41+8], %r4167; 2026-02-21T10:23:48.7832676Z st.shared.b32 [%r42], %r4168; 2026-02-21T10:23:48.7832741Z st.shared.b32 [%r42+8], %r4169; 2026-02-21T10:23:48.7832796Z $L__tmp11: 
2026-02-21T10:23:48.7833086Z .loc 2 291 36 // standard.py:291:36 @[ cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:87:40 ] 2026-02-21T10:23:48.7833151Z // begin inline asm 2026-02-21T10:23:48.7833230Z fence.proxy.async.shared::cta; 2026-02-21T10:23:48.7833290Z // end inline asm 2026-02-21T10:23:48.7833350Z bar.sync 0; 2026-02-21T10:23:48.7833425Z wgmma.fence.sync.aligned; 2026-02-21T10:23:48.7833487Z // begin inline asm 2026-02-21T10:23:48.7834764Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r6424,%r6425,%r6426,%r6427,%r6428,%r6429,%r6430,%r6431,%r6432,%r6433,%r6434,%r6435,%r6436,%r6437,%r6438,%r6439,%r6440,%r6441,%r6442,%r6443,%r6444,%r6445,%r6446,%r6447,%r6448,%r6449,%r6450,%r6451,%r6452,%r6453,%r6454,%r6455,%r6456,%r6457,%r6458,%r6459,%r6460,%r6461,%r6462,%r6463,%r6464,%r6465,%r6466,%r6467,%r6468,%r6469,%r6470,%r6471,%r6472,%r6473,%r6474,%r6475,%r6476,%r6477,%r6478,%r6479,%r6480,%r6481,%r6482,%r6483,%r6484,%r6485,%r6486,%r6487}, {%r3051,%r3052,%r3053,%r3054}, %rd78, %p12, 1, 1; 2026-02-21T10:23:48.7834827Z // end inline asm 2026-02-21T10:23:48.7834887Z // begin inline asm 2026-02-21T10:23:48.7836147Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r6424,%r6425,%r6426,%r6427,%r6428,%r6429,%r6430,%r6431,%r6432,%r6433,%r6434,%r6435,%r6436,%r6437,%r6438,%r6439,%r6440,%r6441,%r6442,%r6443,%r6444,%r6445,%r6446,%r6447,%r6448,%r6449,%r6450,%r6451,%r6452,%r6453,%r6454,%r6455,%r6456,%r6457,%r6458,%r6459,%r6460,%r6461,%r6462,%r6463,%r6464,%r6465,%r6466,%r6467,%r6468,%r6469,%r6470,%r6471,%r6472,%r6473,%r6474,%r6475,%r6476,%r6477,%r6478,%r6479,%r6480,%r6481,%r6482,%r6483,%r6484,%r6485,%r6486,%r6487}, {%r3183,%r3184,%r3185,%r3186}, %rd79, %p12, 1, 1; 2026-02-21T10:23:48.7836209Z // end inline asm 2026-02-21T10:23:48.7836286Z wgmma.commit_group.sync.aligned; 2026-02-21T10:23:48.7836354Z mov.b32 %r3251, %r4869; 2026-02-21T10:23:48.7836415Z mov.b32 %r3252, %r4059; 2026-02-21T10:23:48.7836618Z mov.b32 %r3253, %r4059; 2026-02-21T10:23:48.7836687Z // 
begin inline asm 2026-02-21T10:23:48.7837753Z // wait for regs: %r6424,%r6425,%r6426,%r6427,%r6428,%r6429,%r6430,%r6431,%r6432,%r6433,%r6434,%r6435,%r6436,%r6437,%r6438,%r6439,%r6440,%r6441,%r6442,%r6443,%r6444,%r6445,%r6446,%r6447,%r6448,%r6449,%r6450,%r6451,%r6452,%r6453,%r6454,%r6455,%r6456,%r6457,%r6458,%r6459,%r6460,%r6461,%r6462,%r6463,%r6464,%r6465,%r6466,%r6467,%r6468,%r6469,%r6470,%r6471,%r6472,%r6473,%r6474,%r6475,%r6476,%r6477,%r6478,%r6479,%r6480,%r6481,%r6482,%r6483,%r6484,%r6485,%r6486,%r6487,%r3251,%r3252,%r3253 2026-02-21T10:23:48.7837920Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:23:48.7837982Z // end inline asm 2026-02-21T10:23:48.7838098Z $L__tmp12: 2026-02-21T10:23:48.7838311Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7838383Z add.s64 %rd89, %rd191, -32; 2026-02-21T10:23:48.7838443Z // begin inline asm 2026-02-21T10:23:48.7838502Z mov.u64 %rd88, 0x0; 2026-02-21T10:23:48.7838626Z createpolicy.fractional.L2::evict_last.b64 %rd88, 1.0; 2026-02-21T10:23:48.7838688Z // end inline asm 2026-02-21T10:23:48.7838747Z // begin inline asm 2026-02-21T10:23:48.7838806Z mov.u32 %r3321, 0x0; 2026-02-21T10:23:48.7838870Z mov.u32 %r3322, 0x0; 2026-02-21T10:23:48.7838927Z mov.u32 %r3323, 0x0; 2026-02-21T10:23:48.7838986Z mov.u32 %r3324, 0x0; 2026-02-21T10:23:48.7839281Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r3321, %r3322, %r3323, %r3324 }, [ %rd89 + 0 ], %rd88; 2026-02-21T10:23:48.7839349Z // end inline asm 2026-02-21T10:23:48.7839556Z .loc 1 55 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:55:32 2026-02-21T10:23:48.7839617Z bar.sync 0; 2026-02-21T10:23:48.7839703Z st.shared.v2.b32 [%r29], {%r3321, %r3322}; 2026-02-21T10:23:48.7839792Z st.shared.v2.b32 [%r30], {%r3323, %r3324}; 2026-02-21T10:23:48.7839913Z bar.sync 0; 2026-02-21T10:23:48.7839989Z ld.shared.b16 %rs289, [%r32]; 2026-02-21T10:23:48.7840059Z ld.shared.b16 %rs290, [%r32+256]; 2026-02-21T10:23:48.7840128Z 
ld.shared.b16 %rs291, [%r32+16]; 2026-02-21T10:23:48.7840193Z ld.shared.b16 %rs292, [%r32+272]; 2026-02-21T10:23:48.7840259Z ld.shared.b16 %rs293, [%r33]; 2026-02-21T10:23:48.7840323Z ld.shared.b16 %rs294, [%r33+256]; 2026-02-21T10:23:48.7840387Z ld.shared.b16 %rs295, [%r33+16]; 2026-02-21T10:23:48.7840455Z ld.shared.b16 %rs296, [%r33+272]; 2026-02-21T10:23:48.7840524Z cvt.f32.bf16 %r3454, %rs289; 2026-02-21T10:23:48.7840602Z cvt.f32.bf16 %r3455, %rs290; 2026-02-21T10:23:48.7840665Z cvt.f32.bf16 %r3456, %rs293; 2026-02-21T10:23:48.7840733Z cvt.f32.bf16 %r3457, %rs294; 2026-02-21T10:23:48.7840795Z cvt.f32.bf16 %r3586, %rs291; 2026-02-21T10:23:48.7840859Z cvt.f32.bf16 %r3587, %rs292; 2026-02-21T10:23:48.7840924Z cvt.f32.bf16 %r3588, %rs295; 2026-02-21T10:23:48.7840988Z cvt.f32.bf16 %r3589, %rs296; 2026-02-21T10:23:48.7841203Z .loc 1 57 87 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:57:87 2026-02-21T10:23:48.7841272Z add.s64 %rd92, %rd190, 20480; 2026-02-21T10:23:48.7841333Z // begin inline asm 2026-02-21T10:23:48.7841392Z mov.u64 %rd91, 0x0; 2026-02-21T10:23:48.7841512Z createpolicy.fractional.L2::evict_last.b64 %rd91, 1.0; 2026-02-21T10:23:48.7841573Z // end inline asm 2026-02-21T10:23:48.7841632Z // begin inline asm 2026-02-21T10:23:48.7841691Z mov.u32 %r3325, 0x0; 2026-02-21T10:23:48.7841853Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r3325 }, [ %rd92 + 0 ], %rd91; 2026-02-21T10:23:48.7841914Z // end inline asm 2026-02-21T10:23:48.7842132Z .loc 1 65 28 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:65:28 2026-02-21T10:23:48.7842194Z bar.sync 0; 2026-02-21T10:23:48.7842269Z st.shared.b8 [%r34], %r3325; 2026-02-21T10:23:48.7842337Z prmt.b32 %r4170, %r3325, 0, 0x7771U; 2026-02-21T10:23:48.7842404Z st.shared.b8 [%r35+256], %r4170; 2026-02-21T10:23:48.7842474Z prmt.b32 %r4171, %r3325, 0, 0x7772U; 2026-02-21T10:23:48.7842539Z st.shared.b8 [%r36+512], %r4171; 2026-02-21T10:23:48.7842604Z prmt.b32 %r4172, %r3325, 0, 0x7773U; 
2026-02-21T10:23:48.7842670Z st.shared.b8 [%r37+768], %r4172; 2026-02-21T10:23:48.7842726Z bar.sync 0; 2026-02-21T10:23:48.7842801Z ld.shared.b32 %r4173, [%r38]; 2026-02-21T10:23:48.7842867Z prmt.b32 %r4174, %r4173, 0, 0x7770U; 2026-02-21T10:23:48.7842936Z cvt.u16.u32 %rs297, %r4174; 2026-02-21T10:23:48.7843000Z prmt.b32 %r4175, %r4173, 0, 0x7771U; 2026-02-21T10:23:48.7843132Z cvt.u16.u32 %rs298, %r4175; 2026-02-21T10:23:48.7843200Z prmt.b32 %r4176, %r4173, 0, 0x7772U; 2026-02-21T10:23:48.7843261Z cvt.u16.u32 %rs299, %r4176; 2026-02-21T10:23:48.7843325Z prmt.b32 %r4177, %r4173, 0, 0x7773U; 2026-02-21T10:23:48.7843439Z cvt.u16.u32 %rs300, %r4177; 2026-02-21T10:23:48.7843509Z ld.shared.b32 %r4178, [%r38+128]; 2026-02-21T10:23:48.7843574Z prmt.b32 %r4179, %r4178, 0, 0x7770U; 2026-02-21T10:23:48.7843635Z cvt.u16.u32 %rs301, %r4179; 2026-02-21T10:23:48.7843703Z prmt.b32 %r4180, %r4178, 0, 0x7771U; 2026-02-21T10:23:48.7843764Z cvt.u16.u32 %rs302, %r4180; 2026-02-21T10:23:48.7843828Z prmt.b32 %r4181, %r4178, 0, 0x7772U; 2026-02-21T10:23:48.7843893Z cvt.u16.u32 %rs303, %r4181; 2026-02-21T10:23:48.7843956Z prmt.b32 %r4182, %r4178, 0, 0x7773U; 2026-02-21T10:23:48.7844017Z cvt.u16.u32 %rs304, %r4182; 2026-02-21T10:23:48.7844222Z .loc 1 60 28 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:60:28 2026-02-21T10:23:48.7844369Z shl.b16 %rs305, %rs297, 4; 2026-02-21T10:23:48.7844436Z shl.b16 %rs306, %rs298, 4; 2026-02-21T10:23:48.7844498Z shl.b16 %rs307, %rs299, 4; 2026-02-21T10:23:48.7844564Z shl.b16 %rs308, %rs300, 4; 2026-02-21T10:23:48.7844629Z shl.b16 %rs309, %rs301, 4; 2026-02-21T10:23:48.7844689Z shl.b16 %rs310, %rs302, 4; 2026-02-21T10:23:48.7844749Z shl.b16 %rs311, %rs303, 4; 2026-02-21T10:23:48.7844863Z shl.b16 %rs312, %rs304, 4; 2026-02-21T10:23:48.7845070Z .loc 1 75 58 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:75:58 2026-02-21T10:23:48.7845145Z selp.b16 %rs313, %rs305, %rs297, %p45; 2026-02-21T10:23:48.7845212Z cvt.s16.s8 %rs314, %rs313; 
2026-02-21T10:23:48.7845272Z shr.s16 %rs315, %rs314, 4; 2026-02-21T10:23:48.7845343Z selp.b16 %rs316, %rs306, %rs298, %p45; 2026-02-21T10:23:48.7845410Z cvt.s16.s8 %rs317, %rs316; 2026-02-21T10:23:48.7845472Z shr.s16 %rs318, %rs317, 4; 2026-02-21T10:23:48.7845540Z selp.b16 %rs319, %rs307, %rs299, %p45; 2026-02-21T10:23:48.7845605Z cvt.s16.s8 %rs320, %rs319; 2026-02-21T10:23:48.7845671Z shr.s16 %rs321, %rs320, 4; 2026-02-21T10:23:48.7845739Z selp.b16 %rs322, %rs308, %rs300, %p45; 2026-02-21T10:23:48.7845801Z cvt.s16.s8 %rs323, %rs322; 2026-02-21T10:23:48.7845880Z shr.s16 %rs324, %rs323, 4; 2026-02-21T10:23:48.7845951Z selp.b16 %rs325, %rs309, %rs301, %p45; 2026-02-21T10:23:48.7846014Z cvt.s16.s8 %rs326, %rs325; 2026-02-21T10:23:48.7846075Z shr.s16 %rs327, %rs326, 4; 2026-02-21T10:23:48.7846149Z selp.b16 %rs328, %rs310, %rs302, %p45; 2026-02-21T10:23:48.7846210Z cvt.s16.s8 %rs329, %rs328; 2026-02-21T10:23:48.7846270Z shr.s16 %rs330, %rs329, 4; 2026-02-21T10:23:48.7846353Z selp.b16 %rs331, %rs311, %rs303, %p45; 2026-02-21T10:23:48.7846418Z cvt.s16.s8 %rs332, %rs331; 2026-02-21T10:23:48.7846593Z shr.s16 %rs333, %rs332, 4; 2026-02-21T10:23:48.7846675Z selp.b16 %rs334, %rs312, %rs304, %p45; 2026-02-21T10:23:48.7846741Z cvt.s16.s8 %rs335, %rs334; 2026-02-21T10:23:48.7846806Z shr.s16 %rs336, %rs335, 4; 2026-02-21T10:23:48.7847007Z .loc 1 80 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:80:32 2026-02-21T10:23:48.7847077Z cvt.rn.f32.s16 %r4183, %rs315; 2026-02-21T10:23:48.7847143Z cvt.rn.f32.s16 %r4184, %rs318; 2026-02-21T10:23:48.7847206Z cvt.rn.f32.s16 %r4185, %rs321; 2026-02-21T10:23:48.7847272Z cvt.rn.f32.s16 %r4186, %rs324; 2026-02-21T10:23:48.7847338Z cvt.rn.f32.s16 %r4187, %rs327; 2026-02-21T10:23:48.7847399Z cvt.rn.f32.s16 %r4188, %rs330; 2026-02-21T10:23:48.7847461Z cvt.rn.f32.s16 %r4189, %rs333; 2026-02-21T10:23:48.7847527Z cvt.rn.f32.s16 %r4190, %rs336; 2026-02-21T10:23:48.7847581Z bar.sync 0; 2026-02-21T10:23:48.7847647Z st.shared.b32 [%r39], 
%r4183; 2026-02-21T10:23:48.7847717Z st.shared.b32 [%r39+8], %r4184; 2026-02-21T10:23:48.7847781Z st.shared.b32 [%r40], %r4185; 2026-02-21T10:23:48.7847844Z st.shared.b32 [%r40+8], %r4186; 2026-02-21T10:23:48.7847907Z st.shared.b32 [%r41], %r4187; 2026-02-21T10:23:48.7848061Z st.shared.b32 [%r41+8], %r4188; 2026-02-21T10:23:48.7848124Z st.shared.b32 [%r42], %r4189; 2026-02-21T10:23:48.7848189Z st.shared.b32 [%r42+8], %r4190; 2026-02-21T10:23:48.7848245Z $L__tmp13: 2026-02-21T10:23:48.7848584Z .loc 2 291 36 // standard.py:291:36 @[ cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:87:40 ] 2026-02-21T10:23:48.7848646Z // begin inline asm 2026-02-21T10:23:48.7848728Z fence.proxy.async.shared::cta; 2026-02-21T10:23:48.7848785Z // end inline asm 2026-02-21T10:23:48.7848840Z bar.sync 0; 2026-02-21T10:23:48.7848914Z wgmma.fence.sync.aligned; 2026-02-21T10:23:48.7848977Z // begin inline asm 2026-02-21T10:23:48.7850330Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r6424,%r6425,%r6426,%r6427,%r6428,%r6429,%r6430,%r6431,%r6432,%r6433,%r6434,%r6435,%r6436,%r6437,%r6438,%r6439,%r6440,%r6441,%r6442,%r6443,%r6444,%r6445,%r6446,%r6447,%r6448,%r6449,%r6450,%r6451,%r6452,%r6453,%r6454,%r6455,%r6456,%r6457,%r6458,%r6459,%r6460,%r6461,%r6462,%r6463,%r6464,%r6465,%r6466,%r6467,%r6468,%r6469,%r6470,%r6471,%r6472,%r6473,%r6474,%r6475,%r6476,%r6477,%r6478,%r6479,%r6480,%r6481,%r6482,%r6483,%r6484,%r6485,%r6486,%r6487}, {%r3454,%r3455,%r3456,%r3457}, %rd78, %p12, 1, 1; 2026-02-21T10:23:48.7850406Z // end inline asm 2026-02-21T10:23:48.7850465Z // begin inline asm 2026-02-21T10:23:48.7851777Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r6424,%r6425,%r6426,%r6427,%r6428,%r6429,%r6430,%r6431,%r6432,%r6433,%r6434,%r6435,%r6436,%r6437,%r6438,%r6439,%r6440,%r6441,%r6442,%r6443,%r6444,%r6445,%r6446,%r6447,%r6448,%r6449,%r6450,%r6451,%r6452,%r6453,%r6454,%r6455,%r6456,%r6457,%r6458,%r6459,%r6460,%r6461,%r6462,%r6463,%r6464,%r6465,%r6466,%r6467,%r6468,%r6469,%r6470,%r6471,%r6472,%r6473,%r6474,%r6475,%r6476,%r6477,%r6478,%r6479,%r6480,%r6481,%r6482,%r6483,%r6484,%r6485,%r6486,%r6487}, {%r3586,%r3587,%r3588,%r3589}, %rd79, %p12, 1, 1; 2026-02-21T10:23:48.7851841Z // end inline asm 2026-02-21T10:23:48.7851919Z wgmma.commit_group.sync.aligned; 2026-02-21T10:23:48.7851981Z mov.b32 %r3654, %r4869; 2026-02-21T10:23:48.7852054Z mov.b32 %r3655, %r4059; 2026-02-21T10:23:48.7852117Z mov.b32 %r3656, %r4059; 2026-02-21T10:23:48.7852176Z // begin inline asm 2026-02-21T10:23:48.7853248Z // wait for regs: %r6424,%r6425,%r6426,%r6427,%r6428,%r6429,%r6430,%r6431,%r6432,%r6433,%r6434,%r6435,%r6436,%r6437,%r6438,%r6439,%r6440,%r6441,%r6442,%r6443,%r6444,%r6445,%r6446,%r6447,%r6448,%r6449,%r6450,%r6451,%r6452,%r6453,%r6454,%r6455,%r6456,%r6457,%r6458,%r6459,%r6460,%r6461,%r6462,%r6463,%r6464,%r6465,%r6466,%r6467,%r6468,%r6469,%r6470,%r6471,%r6472,%r6473,%r6474,%r6475,%r6476,%r6477,%r6478,%r6479,%r6480,%r6481,%r6482,%r6483,%r6484,%r6485,%r6486,%r6487,%r3654,%r3655,%r3656 2026-02-21T10:23:48.7853329Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:23:48.7853395Z // end inline asm 2026-02-21T10:23:48.7853451Z $L__tmp14: 2026-02-21T10:23:48.7853669Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7853735Z // begin inline asm 2026-02-21T10:23:48.7853795Z mov.u64 %rd96, 0x0; 2026-02-21T10:23:48.7853925Z createpolicy.fractional.L2::evict_last.b64 %rd96, 1.0; 2026-02-21T10:23:48.7853983Z // end inline asm 2026-02-21T10:23:48.7854044Z // begin inline asm 2026-02-21T10:23:48.7854107Z mov.u32 %r3724, 0x0; 2026-02-21T10:23:48.7854165Z mov.u32 %r3725, 0x0; 2026-02-21T10:23:48.7854223Z 
mov.u32 %r3726, 0x0; 2026-02-21T10:23:48.7854282Z mov.u32 %r3727, 0x0; 2026-02-21T10:23:48.7854520Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r3724, %r3725, %r3726, %r3727 }, [ %rd191 + 0 ], %rd96; 2026-02-21T10:23:48.7854582Z // end inline asm 2026-02-21T10:23:48.7854789Z .loc 1 55 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:55:32 2026-02-21T10:23:48.7854851Z bar.sync 0; 2026-02-21T10:23:48.7854934Z st.shared.v2.b32 [%r29], {%r3724, %r3725}; 2026-02-21T10:23:48.7855010Z st.shared.v2.b32 [%r30], {%r3726, %r3727}; 2026-02-21T10:23:48.7855134Z bar.sync 0; 2026-02-21T10:23:48.7855208Z ld.shared.b16 %rs337, [%r32]; 2026-02-21T10:23:48.7855276Z ld.shared.b16 %rs338, [%r32+256]; 2026-02-21T10:23:48.7855344Z ld.shared.b16 %rs339, [%r32+16]; 2026-02-21T10:23:48.7855416Z ld.shared.b16 %rs340, [%r32+272]; 2026-02-21T10:23:48.7855533Z ld.shared.b16 %rs341, [%r33]; 2026-02-21T10:23:48.7855601Z ld.shared.b16 %rs342, [%r33+256]; 2026-02-21T10:23:48.7855677Z ld.shared.b16 %rs343, [%r33+16]; 2026-02-21T10:23:48.7855742Z ld.shared.b16 %rs344, [%r33+272]; 2026-02-21T10:23:48.7855804Z cvt.f32.bf16 %r3857, %rs337; 2026-02-21T10:23:48.7855865Z cvt.f32.bf16 %r3858, %rs338; 2026-02-21T10:23:48.7855931Z cvt.f32.bf16 %r3859, %rs341; 2026-02-21T10:23:48.7855992Z cvt.f32.bf16 %r3860, %rs342; 2026-02-21T10:23:48.7856053Z cvt.f32.bf16 %r3989, %rs339; 2026-02-21T10:23:48.7856116Z cvt.f32.bf16 %r3990, %rs340; 2026-02-21T10:23:48.7856178Z cvt.f32.bf16 %r3991, %rs343; 2026-02-21T10:23:48.7856237Z cvt.f32.bf16 %r3992, %rs344; 2026-02-21T10:23:48.7856636Z .loc 1 57 87 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:57:87 2026-02-21T10:23:48.7856729Z add.s64 %rd100, %rd190, 30720; 2026-02-21T10:23:48.7856791Z // begin inline asm 2026-02-21T10:23:48.7856852Z mov.u64 %rd99, 0x0; 2026-02-21T10:23:48.7856978Z createpolicy.fractional.L2::evict_last.b64 %rd99, 1.0; 2026-02-21T10:23:48.7857036Z // end inline asm 2026-02-21T10:23:48.7857158Z // begin inline asm 
2026-02-21T10:23:48.7857223Z mov.u32 %r3728, 0x0; 2026-02-21T10:23:48.7857384Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r3728 }, [ %rd100 + 0 ], %rd99; 2026-02-21T10:23:48.7857441Z // end inline asm 2026-02-21T10:23:48.7857650Z .loc 1 65 28 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:65:28 2026-02-21T10:23:48.7857711Z bar.sync 0; 2026-02-21T10:23:48.7857775Z st.shared.b8 [%r34], %r3728; 2026-02-21T10:23:48.7857847Z prmt.b32 %r4191, %r3728, 0, 0x7771U; 2026-02-21T10:23:48.7857919Z st.shared.b8 [%r35+256], %r4191; 2026-02-21T10:23:48.7857993Z prmt.b32 %r4192, %r3728, 0, 0x7772U; 2026-02-21T10:23:48.7858059Z st.shared.b8 [%r36+512], %r4192; 2026-02-21T10:23:48.7858130Z prmt.b32 %r4193, %r3728, 0, 0x7773U; 2026-02-21T10:23:48.7858200Z st.shared.b8 [%r37+768], %r4193; 2026-02-21T10:23:48.7858255Z bar.sync 0; 2026-02-21T10:23:48.7858320Z ld.shared.b32 %r4194, [%r38]; 2026-02-21T10:23:48.7858390Z prmt.b32 %r4195, %r4194, 0, 0x7770U; 2026-02-21T10:23:48.7858454Z cvt.u16.u32 %rs345, %r4195; 2026-02-21T10:23:48.7858520Z prmt.b32 %r4196, %r4194, 0, 0x7771U; 2026-02-21T10:23:48.7858586Z cvt.u16.u32 %rs346, %r4196; 2026-02-21T10:23:48.7858649Z prmt.b32 %r4197, %r4194, 0, 0x7772U; 2026-02-21T10:23:48.7858712Z cvt.u16.u32 %rs347, %r4197; 2026-02-21T10:23:48.7858775Z prmt.b32 %r4198, %r4194, 0, 0x7773U; 2026-02-21T10:23:48.7858842Z cvt.u16.u32 %rs348, %r4198; 2026-02-21T10:23:48.7858921Z ld.shared.b32 %r4199, [%r38+128]; 2026-02-21T10:23:48.7858986Z prmt.b32 %r4200, %r4199, 0, 0x7770U; 2026-02-21T10:23:48.7859054Z cvt.u16.u32 %rs349, %r4200; 2026-02-21T10:23:48.7859119Z prmt.b32 %r4201, %r4199, 0, 0x7771U; 2026-02-21T10:23:48.7859186Z cvt.u16.u32 %rs350, %r4201; 2026-02-21T10:23:48.7859252Z prmt.b32 %r4202, %r4199, 0, 0x7772U; 2026-02-21T10:23:48.7859319Z cvt.u16.u32 %rs351, %r4202; 2026-02-21T10:23:48.7859383Z prmt.b32 %r4203, %r4199, 0, 0x7773U; 2026-02-21T10:23:48.7859445Z cvt.u16.u32 %rs352, %r4203; 2026-02-21T10:23:48.7859655Z .loc 1 60 28 // 
cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:60:28 2026-02-21T10:23:48.7859718Z shl.b16 %rs353, %rs345, 4; 2026-02-21T10:23:48.7859780Z shl.b16 %rs354, %rs346, 4; 2026-02-21T10:23:48.7859845Z shl.b16 %rs355, %rs347, 4; 2026-02-21T10:23:48.7859906Z shl.b16 %rs356, %rs348, 4; 2026-02-21T10:23:48.7859967Z shl.b16 %rs357, %rs349, 4; 2026-02-21T10:23:48.7860027Z shl.b16 %rs358, %rs350, 4; 2026-02-21T10:23:48.7860092Z shl.b16 %rs359, %rs351, 4; 2026-02-21T10:23:48.7860241Z shl.b16 %rs360, %rs352, 4; 2026-02-21T10:23:48.7860447Z .loc 1 75 58 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:75:58 2026-02-21T10:23:48.7860525Z selp.b16 %rs361, %rs353, %rs345, %p45; 2026-02-21T10:23:48.7860653Z cvt.s16.s8 %rs362, %rs361; 2026-02-21T10:23:48.7860713Z shr.s16 %rs363, %rs362, 4; 2026-02-21T10:23:48.7860783Z selp.b16 %rs364, %rs354, %rs346, %p45; 2026-02-21T10:23:48.7860852Z cvt.s16.s8 %rs365, %rs364; 2026-02-21T10:23:48.7860911Z shr.s16 %rs366, %rs365, 4; 2026-02-21T10:23:48.7860981Z selp.b16 %rs367, %rs355, %rs347, %p45; 2026-02-21T10:23:48.7861045Z cvt.s16.s8 %rs368, %rs367; 2026-02-21T10:23:48.7861105Z shr.s16 %rs369, %rs368, 4; 2026-02-21T10:23:48.7861171Z selp.b16 %rs370, %rs356, %rs348, %p45; 2026-02-21T10:23:48.7861233Z cvt.s16.s8 %rs371, %rs370; 2026-02-21T10:23:48.7861293Z shr.s16 %rs372, %rs371, 4; 2026-02-21T10:23:48.7861362Z selp.b16 %rs373, %rs357, %rs349, %p45; 2026-02-21T10:23:48.7861423Z cvt.s16.s8 %rs374, %rs373; 2026-02-21T10:23:48.7861541Z shr.s16 %rs375, %rs374, 4; 2026-02-21T10:23:48.7861611Z selp.b16 %rs376, %rs358, %rs350, %p45; 2026-02-21T10:23:48.7861672Z cvt.s16.s8 %rs377, %rs376; 2026-02-21T10:23:48.7861747Z shr.s16 %rs378, %rs377, 4; 2026-02-21T10:23:48.7861819Z selp.b16 %rs379, %rs359, %rs351, %p45; 2026-02-21T10:23:48.7861878Z cvt.s16.s8 %rs380, %rs379; 2026-02-21T10:23:48.7861940Z shr.s16 %rs381, %rs380, 4; 2026-02-21T10:23:48.7862061Z selp.b16 %rs382, %rs360, %rs352, %p45; 2026-02-21T10:23:48.7862126Z cvt.s16.s8 %rs383, 
%rs382; 2026-02-21T10:23:48.7862186Z shr.s16 %rs384, %rs383, 4; 2026-02-21T10:23:48.7862392Z .loc 1 80 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:80:32 2026-02-21T10:23:48.7862459Z cvt.rn.f32.s16 %r4204, %rs363; 2026-02-21T10:23:48.7862523Z cvt.rn.f32.s16 %r4205, %rs366; 2026-02-21T10:23:48.7862585Z cvt.rn.f32.s16 %r4206, %rs369; 2026-02-21T10:23:48.7862651Z cvt.rn.f32.s16 %r4207, %rs372; 2026-02-21T10:23:48.7862717Z cvt.rn.f32.s16 %r4208, %rs375; 2026-02-21T10:23:48.7862781Z cvt.rn.f32.s16 %r4209, %rs378; 2026-02-21T10:23:48.7862846Z cvt.rn.f32.s16 %r4210, %rs381; 2026-02-21T10:23:48.7862909Z cvt.rn.f32.s16 %r4211, %rs384; 2026-02-21T10:23:48.7862965Z bar.sync 0; 2026-02-21T10:23:48.7863035Z st.shared.b32 [%r39], %r4204; 2026-02-21T10:23:48.7863101Z st.shared.b32 [%r39+8], %r4205; 2026-02-21T10:23:48.7863164Z st.shared.b32 [%r40], %r4206; 2026-02-21T10:23:48.7863230Z st.shared.b32 [%r40+8], %r4207; 2026-02-21T10:23:48.7863298Z st.shared.b32 [%r41], %r4208; 2026-02-21T10:23:48.7863361Z st.shared.b32 [%r41+8], %r4209; 2026-02-21T10:23:48.7863425Z st.shared.b32 [%r42], %r4210; 2026-02-21T10:23:48.7863492Z st.shared.b32 [%r42+8], %r4211; 2026-02-21T10:23:48.7863549Z $L__tmp15: 2026-02-21T10:23:48.7863827Z .loc 2 291 36 // standard.py:291:36 @[ cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:87:40 ] 2026-02-21T10:23:48.7863889Z // begin inline asm 2026-02-21T10:23:48.7863974Z fence.proxy.async.shared::cta; 2026-02-21T10:23:48.7864031Z // end inline asm 2026-02-21T10:23:48.7864087Z bar.sync 0; 2026-02-21T10:23:48.7864176Z wgmma.fence.sync.aligned; 2026-02-21T10:23:48.7864237Z // begin inline asm 2026-02-21T10:23:48.7865505Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r6424,%r6425,%r6426,%r6427,%r6428,%r6429,%r6430,%r6431,%r6432,%r6433,%r6434,%r6435,%r6436,%r6437,%r6438,%r6439,%r6440,%r6441,%r6442,%r6443,%r6444,%r6445,%r6446,%r6447,%r6448,%r6449,%r6450,%r6451,%r6452,%r6453,%r6454,%r6455,%r6456,%r6457,%r6458,%r6459,%r6460,%r6461,%r6462,%r6463,%r6464,%r6465,%r6466,%r6467,%r6468,%r6469,%r6470,%r6471,%r6472,%r6473,%r6474,%r6475,%r6476,%r6477,%r6478,%r6479,%r6480,%r6481,%r6482,%r6483,%r6484,%r6485,%r6486,%r6487}, {%r3857,%r3858,%r3859,%r3860}, %rd78, %p12, 1, 1; 2026-02-21T10:23:48.7865571Z // end inline asm 2026-02-21T10:23:48.7865630Z // begin inline asm 2026-02-21T10:23:48.7867020Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r6424,%r6425,%r6426,%r6427,%r6428,%r6429,%r6430,%r6431,%r6432,%r6433,%r6434,%r6435,%r6436,%r6437,%r6438,%r6439,%r6440,%r6441,%r6442,%r6443,%r6444,%r6445,%r6446,%r6447,%r6448,%r6449,%r6450,%r6451,%r6452,%r6453,%r6454,%r6455,%r6456,%r6457,%r6458,%r6459,%r6460,%r6461,%r6462,%r6463,%r6464,%r6465,%r6466,%r6467,%r6468,%r6469,%r6470,%r6471,%r6472,%r6473,%r6474,%r6475,%r6476,%r6477,%r6478,%r6479,%r6480,%r6481,%r6482,%r6483,%r6484,%r6485,%r6486,%r6487}, {%r3989,%r3990,%r3991,%r3992}, %rd79, %p12, 1, 1; 2026-02-21T10:23:48.7867231Z // end inline asm 2026-02-21T10:23:48.7867309Z wgmma.commit_group.sync.aligned; 2026-02-21T10:23:48.7867369Z mov.b32 %r4057, %r4869; 2026-02-21T10:23:48.7867433Z mov.b32 %r4058, %r4059; 2026-02-21T10:23:48.7867494Z // begin inline asm 2026-02-21T10:23:48.7868702Z // wait for regs: %r6424,%r6425,%r6426,%r6427,%r6428,%r6429,%r6430,%r6431,%r6432,%r6433,%r6434,%r6435,%r6436,%r6437,%r6438,%r6439,%r6440,%r6441,%r6442,%r6443,%r6444,%r6445,%r6446,%r6447,%r6448,%r6449,%r6450,%r6451,%r6452,%r6453,%r6454,%r6455,%r6456,%r6457,%r6458,%r6459,%r6460,%r6461,%r6462,%r6463,%r6464,%r6465,%r6466,%r6467,%r6468,%r6469,%r6470,%r6471,%r6472,%r6473,%r6474,%r6475,%r6476,%r6477,%r6478,%r6479,%r6480,%r6481,%r6482,%r6483,%r6484,%r6485,%r6486,%r6487,%r4057,%r4058,%r4059 2026-02-21T10:23:48.7868794Z 
wgmma.wait_group.sync.aligned 0; 2026-02-21T10:23:48.7868852Z // end inline asm 2026-02-21T10:23:48.7868907Z $L__tmp16: 2026-02-21T10:23:48.7869191Z .loc 1 43 126 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:43:126 2026-02-21T10:23:48.7869273Z add.s64 %rd192, %rd192, 32; 2026-02-21T10:23:48.7869339Z add.s64 %rd191, %rd191, 128; 2026-02-21T10:23:48.7869406Z add.s64 %rd190, %rd190, 40960; 2026-02-21T10:23:48.7869474Z setp.lt.u64 %p21, %rd192, 4064; 2026-02-21T10:23:48.7869538Z @%p21 bra $L__BB0_6; 2026-02-21T10:23:48.7869651Z // %bb.7: // in Loop: Header=BB0_3 Depth=1 2026-02-21T10:23:48.7869863Z .loc 1 34 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:34:32 2026-02-21T10:23:48.7869941Z or.b32 %r4284, %r187, %r21; 2026-02-21T10:23:48.7870144Z .loc 1 36 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:36:32 2026-02-21T10:23:48.7870213Z or.b32 %r4285, %r188, %r10; 2026-02-21T10:23:48.7870275Z or.b32 %r4286, %r188, %r11; 2026-02-21T10:23:48.7870337Z or.b32 %r4287, %r188, %r12; 2026-02-21T10:23:48.7870403Z or.b32 %r4288, %r188, %r13; 2026-02-21T10:23:48.7870465Z or.b32 %r4289, %r188, %r14; 2026-02-21T10:23:48.7870528Z or.b32 %r4290, %r188, %r15; 2026-02-21T10:23:48.7870588Z or.b32 %r4291, %r188, %r16; 2026-02-21T10:23:48.7870650Z or.b32 %r4292, %r188, %r17; 2026-02-21T10:23:48.7870849Z .loc 1 90 28 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:90:28 2026-02-21T10:23:48.7870928Z cvt.rn.bf16x2.f32 %r4293, %r6425, %r6424; 2026-02-21T10:23:48.7871010Z cvt.rn.bf16x2.f32 %r4294, %r6427, %r6426; 2026-02-21T10:23:48.7871083Z cvt.rn.bf16x2.f32 %r4295, %r6429, %r6428; 2026-02-21T10:23:48.7871161Z cvt.rn.bf16x2.f32 %r4296, %r6431, %r6430; 2026-02-21T10:23:48.7871239Z cvt.rn.bf16x2.f32 %r4297, %r6433, %r6432; 2026-02-21T10:23:48.7871318Z cvt.rn.bf16x2.f32 %r4298, %r6435, %r6434; 2026-02-21T10:23:48.7871392Z cvt.rn.bf16x2.f32 %r4299, %r6437, %r6436; 2026-02-21T10:23:48.7871463Z cvt.rn.bf16x2.f32 %r4300, %r6439, %r6438; 
2026-02-21T10:23:48.7871539Z cvt.rn.bf16x2.f32 %r4301, %r6441, %r6440; 2026-02-21T10:23:48.7871612Z cvt.rn.bf16x2.f32 %r4302, %r6443, %r6442; 2026-02-21T10:23:48.7871682Z cvt.rn.bf16x2.f32 %r4303, %r6445, %r6444; 2026-02-21T10:23:48.7871756Z cvt.rn.bf16x2.f32 %r4304, %r6447, %r6446; 2026-02-21T10:23:48.7871838Z cvt.rn.bf16x2.f32 %r4305, %r6449, %r6448; 2026-02-21T10:23:48.7871910Z cvt.rn.bf16x2.f32 %r4306, %r6451, %r6450; 2026-02-21T10:23:48.7871986Z cvt.rn.bf16x2.f32 %r4307, %r6453, %r6452; 2026-02-21T10:23:48.7872062Z cvt.rn.bf16x2.f32 %r4308, %r6455, %r6454; 2026-02-21T10:23:48.7872133Z cvt.rn.bf16x2.f32 %r4309, %r6457, %r6456; 2026-02-21T10:23:48.7872265Z cvt.rn.bf16x2.f32 %r4310, %r6459, %r6458; 2026-02-21T10:23:48.7872341Z cvt.rn.bf16x2.f32 %r4311, %r6461, %r6460; 2026-02-21T10:23:48.7872413Z cvt.rn.bf16x2.f32 %r4312, %r6463, %r6462; 2026-02-21T10:23:48.7872562Z cvt.rn.bf16x2.f32 %r4313, %r6465, %r6464; 2026-02-21T10:23:48.7872641Z cvt.rn.bf16x2.f32 %r4314, %r6467, %r6466; 2026-02-21T10:23:48.7872714Z cvt.rn.bf16x2.f32 %r4315, %r6469, %r6468; 2026-02-21T10:23:48.7872785Z cvt.rn.bf16x2.f32 %r4316, %r6471, %r6470; 2026-02-21T10:23:48.7872854Z cvt.rn.bf16x2.f32 %r4317, %r6473, %r6472; 2026-02-21T10:23:48.7872927Z cvt.rn.bf16x2.f32 %r4318, %r6475, %r6474; 2026-02-21T10:23:48.7872997Z cvt.rn.bf16x2.f32 %r4319, %r6477, %r6476; 2026-02-21T10:23:48.7873067Z cvt.rn.bf16x2.f32 %r4320, %r6479, %r6478; 2026-02-21T10:23:48.7873141Z cvt.rn.bf16x2.f32 %r4321, %r6481, %r6480; 2026-02-21T10:23:48.7873212Z cvt.rn.bf16x2.f32 %r4322, %r6483, %r6482; 2026-02-21T10:23:48.7873282Z cvt.rn.bf16x2.f32 %r4323, %r6485, %r6484; 2026-02-21T10:23:48.7873411Z cvt.rn.bf16x2.f32 %r4324, %r6487, %r6486; 2026-02-21T10:23:48.7873620Z .loc 1 91 50 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:91:50 2026-02-21T10:23:48.7873705Z mad.lo.s32 %r4325, %r4285, 1280, %r4284; 2026-02-21T10:23:48.7873777Z mad.lo.s32 %r4326, %r4286, 1280, %r4284; 2026-02-21T10:23:48.7873849Z mad.lo.s32 %r4327, 
%r4287, 1280, %r4284; 2026-02-21T10:23:48.7873970Z mad.lo.s32 %r4328, %r4288, 1280, %r4284; 2026-02-21T10:23:48.7874041Z mad.lo.s32 %r4329, %r4289, 1280, %r4284; 2026-02-21T10:23:48.7874112Z mad.lo.s32 %r4330, %r4290, 1280, %r4284; 2026-02-21T10:23:48.7874183Z mad.lo.s32 %r4331, %r4291, 1280, %r4284; 2026-02-21T10:23:48.7874251Z mad.lo.s32 %r4332, %r4292, 1280, %r4284; 2026-02-21T10:23:48.7874468Z .loc 1 91 22 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:91:22 2026-02-21T10:23:48.7874539Z mad.wide.s32 %rd104, %r4325, 2, %rd25; 2026-02-21T10:23:48.7874612Z mad.wide.s32 %rd105, %r4326, 2, %rd25; 2026-02-21T10:23:48.7874679Z mad.wide.s32 %rd106, %r4327, 2, %rd25; 2026-02-21T10:23:48.7874750Z mad.wide.s32 %rd107, %r4328, 2, %rd25; 2026-02-21T10:23:48.7874817Z mad.wide.s32 %rd108, %r4329, 2, %rd25; 2026-02-21T10:23:48.7874889Z mad.wide.s32 %rd109, %r4330, 2, %rd25; 2026-02-21T10:23:48.7874961Z mad.wide.s32 %rd110, %r4331, 2, %rd25; 2026-02-21T10:23:48.7875031Z mad.wide.s32 %rd111, %r4332, 2, %rd25; 2026-02-21T10:23:48.7875232Z .loc 1 91 81 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:91:81 2026-02-21T10:23:48.7875292Z bar.sync 0; 2026-02-21T10:23:48.7875405Z st.shared.v4.b32 [%r43], {%r4293, %r4295, %r4297, %r4299}; 2026-02-21T10:23:48.7875521Z st.shared.v4.b32 [%r43+512], {%r4294, %r4296, %r4298, %r4300}; 2026-02-21T10:23:48.7875628Z st.shared.v4.b32 [%r44], {%r4301, %r4303, %r4305, %r4307}; 2026-02-21T10:23:48.7875742Z st.shared.v4.b32 [%r44+512], {%r4302, %r4304, %r4306, %r4308}; 2026-02-21T10:23:48.7875852Z st.shared.v4.b32 [%r45], {%r4309, %r4311, %r4313, %r4315}; 2026-02-21T10:23:48.7875963Z st.shared.v4.b32 [%r45+512], {%r4310, %r4312, %r4314, %r4316}; 2026-02-21T10:23:48.7876066Z st.shared.v4.b32 [%r46], {%r4317, %r4319, %r4321, %r4323}; 2026-02-21T10:23:48.7876176Z st.shared.v4.b32 [%r46+512], {%r4318, %r4320, %r4322, %r4324}; 2026-02-21T10:23:48.7876232Z bar.sync 0; 2026-02-21T10:23:48.7876308Z // begin inline asm 
2026-02-21T10:23:48.7876636Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4212, %r4213, %r4214, %r4215}, [%r2380]; 2026-02-21T10:23:48.7876702Z // end inline asm 2026-02-21T10:23:48.7876762Z // begin inline asm 2026-02-21T10:23:48.7876950Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4217, %r4218, %r4219, %r4220}, [%r2385]; 2026-02-21T10:23:48.7877007Z // end inline asm 2026-02-21T10:23:48.7877064Z // begin inline asm 2026-02-21T10:23:48.7877249Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4222, %r4223, %r4224, %r4225}, [%r2390]; 2026-02-21T10:23:48.7877391Z // end inline asm 2026-02-21T10:23:48.7877452Z // begin inline asm 2026-02-21T10:23:48.7877636Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4227, %r4228, %r4229, %r4230}, [%r2395]; 2026-02-21T10:23:48.7877696Z // end inline asm 2026-02-21T10:23:48.7877819Z // begin inline asm 2026-02-21T10:23:48.7878009Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4232, %r4233, %r4234, %r4235}, [%r2400]; 2026-02-21T10:23:48.7878073Z // end inline asm 2026-02-21T10:23:48.7878134Z // begin inline asm 2026-02-21T10:23:48.7878313Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4237, %r4238, %r4239, %r4240}, [%r2405]; 2026-02-21T10:23:48.7878374Z // end inline asm 2026-02-21T10:23:48.7878433Z // begin inline asm 2026-02-21T10:23:48.7878610Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4242, %r4243, %r4244, %r4245}, [%r2410]; 2026-02-21T10:23:48.7878666Z // end inline asm 2026-02-21T10:23:48.7878727Z // begin inline asm 2026-02-21T10:23:48.7878903Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4247, %r4248, %r4249, %r4250}, [%r2415]; 2026-02-21T10:23:48.7879024Z // end inline asm 2026-02-21T10:23:48.7879091Z // begin inline asm 2026-02-21T10:23:48.7879230Z st.global.v4.b32 [ %rd104 + 0 ], { %r4212, %r4213, %r4214, %r4215 }; 2026-02-21T10:23:48.7879288Z // end inline asm 2026-02-21T10:23:48.7879353Z // begin inline asm 2026-02-21T10:23:48.7879473Z st.global.v4.b32 [ %rd105 + 0 ], { %r4217, %r4218, %r4219, %r4220 }; 
2026-02-21T10:23:48.7879528Z // end inline asm 2026-02-21T10:23:48.7879650Z // begin inline asm 2026-02-21T10:23:48.7879774Z st.global.v4.b32 [ %rd106 + 0 ], { %r4222, %r4223, %r4224, %r4225 }; 2026-02-21T10:23:48.7879831Z // end inline asm 2026-02-21T10:23:48.7879890Z // begin inline asm 2026-02-21T10:23:48.7880009Z st.global.v4.b32 [ %rd107 + 0 ], { %r4227, %r4228, %r4229, %r4230 }; 2026-02-21T10:23:48.7880065Z // end inline asm 2026-02-21T10:23:48.7880134Z // begin inline asm 2026-02-21T10:23:48.7880251Z st.global.v4.b32 [ %rd108 + 0 ], { %r4232, %r4233, %r4234, %r4235 }; 2026-02-21T10:23:48.7880314Z // end inline asm 2026-02-21T10:23:48.7880376Z // begin inline asm 2026-02-21T10:23:48.7880489Z st.global.v4.b32 [ %rd109 + 0 ], { %r4237, %r4238, %r4239, %r4240 }; 2026-02-21T10:23:48.7880550Z // end inline asm 2026-02-21T10:23:48.7880608Z // begin inline asm 2026-02-21T10:23:48.7880728Z st.global.v4.b32 [ %rd110 + 0 ], { %r4242, %r4243, %r4244, %r4245 }; 2026-02-21T10:23:48.7880803Z // end inline asm 2026-02-21T10:23:48.7880864Z // begin inline asm 2026-02-21T10:23:48.7880977Z st.global.v4.b32 [ %rd111 + 0 ], { %r4247, %r4248, %r4249, %r4250 }; 2026-02-21T10:23:48.7881034Z // end inline asm 2026-02-21T10:23:48.7881256Z .loc 1 22 121 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:22:121 2026-02-21T10:23:48.7881320Z add.s32 %r6359, %r6359, 2; 2026-02-21T10:23:48.7881390Z setp.lt.s32 %p22, %r6359, %r6586; 2026-02-21T10:23:48.7881455Z @%p22 bra $L__BB0_3; 2026-02-21T10:23:48.7881513Z bra.uni $L__BB0_8; 2026-02-21T10:23:48.7881623Z $L__BB0_1: // %.._crit_edge_crit_edge 2026-02-21T10:23:48.7881835Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7881898Z and.b32 %r6488, %r4, 16; 2026-02-21T10:23:48.7881987Z $L__BB0_8: // %._crit_edge 2026-02-21T10:23:48.7882196Z .loc 1 22 121 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:22:121 2026-02-21T10:23:48.7882265Z sub.s32 %r4381, %r2, %r6586; 
2026-02-21T10:23:48.7882327Z shl.b32 %r319, %r4381, 7; 2026-02-21T10:23:48.7882394Z setp.lt.s32 %p23, %r319, 1; 2026-02-21T10:23:48.7882464Z setp.gt.s32 %p24, %r319, 0; 2026-02-21T10:23:48.7882665Z .loc 1 28 35 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:28:35 2026-02-21T10:23:48.7882726Z shr.s32 %r4382, %r6586, 31; 2026-02-21T10:23:48.7882801Z shr.u32 %r4383, %r4382, 17; 2026-02-21T10:23:48.7882868Z add.s32 %r4384, %r6586, %r4383; 2026-02-21T10:23:48.7882986Z shr.s32 %r4385, %r4384, 15; 2026-02-21T10:23:48.7883200Z .loc 1 29 33 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:29:33 2026-02-21T10:23:48.7883266Z shl.b32 %r320, %r4385, 6; 2026-02-21T10:23:48.7883468Z .loc 1 30 39 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:30:39 2026-02-21T10:23:48.7883579Z sub.s32 %r4386, 10, %r320; 2026-02-21T10:23:48.7883783Z .loc 1 30 52 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:30:52 2026-02-21T10:23:48.7883843Z min.s32 %r321, %r4386, 64; 2026-02-21T10:23:48.7884041Z .loc 1 31 45 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:31:45 2026-02-21T10:23:48.7884108Z and.b32 %r4387, %r4384, -32768; 2026-02-21T10:23:48.7884171Z sub.s32 %r322, %r6586, %r4387; 2026-02-21T10:23:48.7884369Z .loc 1 32 51 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:32:51 2026-02-21T10:23:48.7884435Z div.s32 %r323, %r322, %r321; 2026-02-21T10:23:48.7884687Z .loc 1 35 27 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:35:27 2026-02-21T10:23:48.7884761Z shl.b32 %r6489, %r323, 7; 2026-02-21T10:23:48.7884969Z .loc 1 36 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:36:32 2026-02-21T10:23:48.7885039Z or.b32 %r6587, %r6489, %r7; 2026-02-21T10:23:48.7885145Z or.b32 %r6588, %r6489, %r8; 2026-02-21T10:23:48.7885350Z .loc 1 51 53 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:53 2026-02-21T10:23:48.7885413Z shl.b32 %r4388, %r6587, 13; 2026-02-21T10:23:48.7885471Z shl.b32 %r4389, %r6588, 13; 
2026-02-21T10:23:48.7885670Z .loc 1 51 60 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:60 2026-02-21T10:23:48.7885739Z or.b32 %r4390, %r4388, %r25; 2026-02-21T10:23:48.7885801Z or.b32 %r4391, %r4389, %r25; 2026-02-21T10:23:48.7886003Z .loc 1 51 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:32 2026-02-21T10:23:48.7886084Z mad.wide.s32 %rd112, %r4390, 2, %rd23; 2026-02-21T10:23:48.7886156Z mad.wide.s32 %rd113, %r4391, 2, %rd23; 2026-02-21T10:23:48.7886355Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7886420Z and.b32 %r4392, %r20, 1912; 2026-02-21T10:23:48.7886621Z setp.eq.b32 %p25, %r6488, 0; 2026-02-21T10:23:48.7886693Z selp.b32 %r327, 0, 136, %p25; 2026-02-21T10:23:48.7886767Z xor.b32 %r328, %r327, %r4392; 2026-02-21T10:23:48.7886833Z add.s32 %r4394, %r4869, %r328; 2026-02-21T10:23:48.7886900Z add.s32 %r4333, %r4394, 32768; 2026-02-21T10:23:48.7886968Z selp.b32 %r4334, 8, 0, %p24; 2026-02-21T10:23:48.7887027Z // begin inline asm 2026-02-21T10:23:48.7887179Z cp.async.ca.shared.global [ %r4333 + 0 ], [ %rd112 + 0 ], 0x8, %r4334; 2026-02-21T10:23:48.7887238Z // end inline asm 2026-02-21T10:23:48.7887299Z add.s32 %r4335, %r4394, 34816; 2026-02-21T10:23:48.7887380Z // begin inline asm 2026-02-21T10:23:48.7887518Z cp.async.ca.shared.global [ %r4335 + 0 ], [ %rd113 + 0 ], 0x8, %r4334; 2026-02-21T10:23:48.7887574Z // end inline asm 2026-02-21T10:23:48.7887643Z cp.async.commit_group; 2026-02-21T10:23:48.7887854Z .loc 1 51 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:32 2026-02-21T10:23:48.7887919Z cvt.s64.s32 %rd136, %r4388; 2026-02-21T10:23:48.7887985Z cvt.u64.u32 %rd137, %r25; 2026-02-21T10:23:48.7888055Z or.b64 %rd138, %rd136, %rd137; 2026-02-21T10:23:48.7888120Z shl.b64 %rd139, %rd138, 1; 2026-02-21T10:23:48.7888184Z add.s64 %rd140, %rd23, %rd139; 2026-02-21T10:23:48.7888248Z add.s64 %rd114, %rd140, 32; 2026-02-21T10:23:48.7888315Z cvt.s64.s32 %rd141, %r4389; 
2026-02-21T10:23:48.7888377Z or.b64 %rd142, %rd141, %rd137; 2026-02-21T10:23:48.7888442Z shl.b64 %rd143, %rd142, 1; 2026-02-21T10:23:48.7888508Z add.s64 %rd144, %rd23, %rd143; 2026-02-21T10:23:48.7888661Z add.s64 %rd115, %rd144, 32; 2026-02-21T10:23:48.7888872Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7888937Z add.s32 %r4337, %r4394, 45056; 2026-02-21T10:23:48.7888997Z // begin inline asm 2026-02-21T10:23:48.7889200Z cp.async.ca.shared.global [ %r4337 + 0 ], [ %rd114 + 0 ], 0x8, %r4334; 2026-02-21T10:23:48.7889260Z // end inline asm 2026-02-21T10:23:48.7889330Z add.s32 %r4339, %r4394, 47104; 2026-02-21T10:23:48.7889390Z // begin inline asm 2026-02-21T10:23:48.7889528Z cp.async.ca.shared.global [ %r4339 + 0 ], [ %rd115 + 0 ], 0x8, %r4334; 2026-02-21T10:23:48.7889594Z // end inline asm 2026-02-21T10:23:48.7889663Z cp.async.commit_group; 2026-02-21T10:23:48.7889871Z .loc 1 51 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:32 2026-02-21T10:23:48.7889942Z add.s64 %rd116, %rd140, 64; 2026-02-21T10:23:48.7890007Z add.s64 %rd117, %rd144, 64; 2026-02-21T10:23:48.7890271Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7890354Z add.s32 %r4341, %r4394, 57344; 2026-02-21T10:23:48.7890425Z // begin inline asm 2026-02-21T10:23:48.7890564Z cp.async.ca.shared.global [ %r4341 + 0 ], [ %rd116 + 0 ], 0x8, %r4334; 2026-02-21T10:23:48.7890627Z // end inline asm 2026-02-21T10:23:48.7890697Z add.s32 %r4343, %r4394, 59392; 2026-02-21T10:23:48.7890822Z // begin inline asm 2026-02-21T10:23:48.7890970Z cp.async.ca.shared.global [ %r4343 + 0 ], [ %rd117 + 0 ], 0x8, %r4334; 2026-02-21T10:23:48.7891031Z // end inline asm 2026-02-21T10:23:48.7891108Z cp.async.commit_group; 2026-02-21T10:23:48.7891314Z .loc 1 51 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:32 2026-02-21T10:23:48.7891380Z add.s64 %rd118, %rd140, 96; 2026-02-21T10:23:48.7891452Z add.s64 
%rd119, %rd144, 96; 2026-02-21T10:23:48.7891657Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7891727Z add.s32 %r4345, %r4394, 69632; 2026-02-21T10:23:48.7891795Z // begin inline asm 2026-02-21T10:23:48.7891928Z cp.async.ca.shared.global [ %r4345 + 0 ], [ %rd118 + 0 ], 0x8, %r4334; 2026-02-21T10:23:48.7891989Z // end inline asm 2026-02-21T10:23:48.7892053Z add.s32 %r4347, %r4394, 71680; 2026-02-21T10:23:48.7892118Z // begin inline asm 2026-02-21T10:23:48.7892254Z cp.async.ca.shared.global [ %r4347 + 0 ], [ %rd119 + 0 ], 0x8, %r4334; 2026-02-21T10:23:48.7892316Z // end inline asm 2026-02-21T10:23:48.7892390Z cp.async.commit_group; 2026-02-21T10:23:48.7892605Z .loc 1 22 121 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:22:121 2026-02-21T10:23:48.7892674Z setp.gt.s32 %p26, %r319, 1; 2026-02-21T10:23:48.7893020Z .loc 1 51 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:32 2026-02-21T10:23:48.7893123Z add.s64 %rd120, %rd140, 128; 2026-02-21T10:23:48.7893221Z add.s64 %rd121, %rd144, 128; 2026-02-21T10:23:48.7893603Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7893714Z bar.sync 0; 2026-02-21T10:23:48.7893878Z add.s32 %r4349, %r4394, 36864; 2026-02-21T10:23:48.7893980Z selp.b32 %r4350, 8, 0, %p26; 2026-02-21T10:23:48.7894077Z // begin inline asm 2026-02-21T10:23:48.7894270Z cp.async.ca.shared.global [ %r4349 + 0 ], [ %rd120 + 0 ], 0x8, %r4350; 2026-02-21T10:23:48.7894437Z // end inline asm 2026-02-21T10:23:48.7900213Z add.s32 %r4351, %r4394, 38912; 2026-02-21T10:23:48.7900318Z // begin inline asm 2026-02-21T10:23:48.7900489Z cp.async.ca.shared.global [ %r4351 + 0 ], [ %rd121 + 0 ], 0x8, %r4350; 2026-02-21T10:23:48.7900555Z // end inline asm 2026-02-21T10:23:48.7900633Z cp.async.commit_group; 2026-02-21T10:23:48.7900867Z .loc 1 51 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:32 2026-02-21T10:23:48.7900939Z add.s64 
%rd122, %rd140, 160; 2026-02-21T10:23:48.7901164Z add.s64 %rd123, %rd144, 160; 2026-02-21T10:23:48.7901386Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7901456Z add.s32 %r4353, %r4394, 49152; 2026-02-21T10:23:48.7901605Z // begin inline asm 2026-02-21T10:23:48.7901770Z cp.async.ca.shared.global [ %r4353 + 0 ], [ %rd122 + 0 ], 0x8, %r4350; 2026-02-21T10:23:48.7901831Z // end inline asm 2026-02-21T10:23:48.7901896Z add.s32 %r4355, %r4394, 51200; 2026-02-21T10:23:48.7901958Z // begin inline asm 2026-02-21T10:23:48.7902107Z cp.async.ca.shared.global [ %r4355 + 0 ], [ %rd123 + 0 ], 0x8, %r4350; 2026-02-21T10:23:48.7902167Z // end inline asm 2026-02-21T10:23:48.7902240Z cp.async.commit_group; 2026-02-21T10:23:48.7902459Z .loc 1 51 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:32 2026-02-21T10:23:48.7902525Z add.s64 %rd124, %rd140, 192; 2026-02-21T10:23:48.7902586Z add.s64 %rd125, %rd144, 192; 2026-02-21T10:23:48.7902889Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7902955Z add.s32 %r4357, %r4394, 61440; 2026-02-21T10:23:48.7903020Z // begin inline asm 2026-02-21T10:23:48.7903169Z cp.async.ca.shared.global [ %r4357 + 0 ], [ %rd124 + 0 ], 0x8, %r4350; 2026-02-21T10:23:48.7903226Z // end inline asm 2026-02-21T10:23:48.7903288Z add.s32 %r4359, %r4394, 63488; 2026-02-21T10:23:48.7903412Z // begin inline asm 2026-02-21T10:23:48.7903556Z cp.async.ca.shared.global [ %r4359 + 0 ], [ %rd125 + 0 ], 0x8, %r4350; 2026-02-21T10:23:48.7903616Z // end inline asm 2026-02-21T10:23:48.7903684Z cp.async.commit_group; 2026-02-21T10:23:48.7903909Z .loc 1 51 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:32 2026-02-21T10:23:48.7903974Z add.s64 %rd126, %rd140, 224; 2026-02-21T10:23:48.7904034Z add.s64 %rd127, %rd144, 224; 2026-02-21T10:23:48.7904242Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 
2026-02-21T10:23:48.7904307Z add.s32 %r4361, %r4394, 73728; 2026-02-21T10:23:48.7904366Z // begin inline asm 2026-02-21T10:23:48.7904498Z cp.async.ca.shared.global [ %r4361 + 0 ], [ %rd126 + 0 ], 0x8, %r4350; 2026-02-21T10:23:48.7904564Z // end inline asm 2026-02-21T10:23:48.7904625Z add.s32 %r4363, %r4394, 75776; 2026-02-21T10:23:48.7904696Z // begin inline asm 2026-02-21T10:23:48.7904836Z cp.async.ca.shared.global [ %r4363 + 0 ], [ %rd127 + 0 ], 0x8, %r4350; 2026-02-21T10:23:48.7904893Z // end inline asm 2026-02-21T10:23:48.7904958Z cp.async.commit_group; 2026-02-21T10:23:48.7905175Z .loc 1 22 121 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:22:121 2026-02-21T10:23:48.7905245Z setp.gt.s32 %p27, %r319, 2; 2026-02-21T10:23:48.7905444Z .loc 1 51 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:32 2026-02-21T10:23:48.7905507Z add.s64 %rd128, %rd140, 256; 2026-02-21T10:23:48.7905576Z add.s64 %rd129, %rd144, 256; 2026-02-21T10:23:48.7905785Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7905844Z bar.sync 0; 2026-02-21T10:23:48.7905913Z add.s32 %r4365, %r4394, 40960; 2026-02-21T10:23:48.7905978Z selp.b32 %r4366, 8, 0, %p27; 2026-02-21T10:23:48.7906041Z // begin inline asm 2026-02-21T10:23:48.7906181Z cp.async.ca.shared.global [ %r4365 + 0 ], [ %rd128 + 0 ], 0x8, %r4366; 2026-02-21T10:23:48.7906240Z // end inline asm 2026-02-21T10:23:48.7906300Z add.s32 %r4367, %r4394, 43008; 2026-02-21T10:23:48.7906361Z // begin inline asm 2026-02-21T10:23:48.7906681Z cp.async.ca.shared.global [ %r4367 + 0 ], [ %rd129 + 0 ], 0x8, %r4366; 2026-02-21T10:23:48.7906746Z // end inline asm 2026-02-21T10:23:48.7906814Z cp.async.commit_group; 2026-02-21T10:23:48.7907028Z .loc 1 51 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:32 2026-02-21T10:23:48.7907217Z add.s64 %rd130, %rd140, 288; 2026-02-21T10:23:48.7907283Z add.s64 %rd131, %rd144, 288; 2026-02-21T10:23:48.7907499Z .loc 1 51 80 // 
cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7907642Z add.s32 %r4369, %r4394, 53248; 2026-02-21T10:23:48.7907705Z // begin inline asm 2026-02-21T10:23:48.7907845Z cp.async.ca.shared.global [ %r4369 + 0 ], [ %rd130 + 0 ], 0x8, %r4366; 2026-02-21T10:23:48.7907911Z // end inline asm 2026-02-21T10:23:48.7907973Z add.s32 %r4371, %r4394, 55296; 2026-02-21T10:23:48.7908031Z // begin inline asm 2026-02-21T10:23:48.7908167Z cp.async.ca.shared.global [ %r4371 + 0 ], [ %rd131 + 0 ], 0x8, %r4366; 2026-02-21T10:23:48.7908224Z // end inline asm 2026-02-21T10:23:48.7908290Z cp.async.commit_group; 2026-02-21T10:23:48.7908582Z .loc 1 51 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:32 2026-02-21T10:23:48.7908658Z add.s64 %rd132, %rd140, 320; 2026-02-21T10:23:48.7908725Z add.s64 %rd133, %rd144, 320; 2026-02-21T10:23:48.7909006Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7909077Z add.s32 %r4373, %r4394, 65536; 2026-02-21T10:23:48.7909136Z // begin inline asm 2026-02-21T10:23:48.7909274Z cp.async.ca.shared.global [ %r4373 + 0 ], [ %rd132 + 0 ], 0x8, %r4366; 2026-02-21T10:23:48.7909335Z // end inline asm 2026-02-21T10:23:48.7909474Z add.s32 %r4375, %r4394, 67584; 2026-02-21T10:23:48.7909541Z // begin inline asm 2026-02-21T10:23:48.7909672Z cp.async.ca.shared.global [ %r4375 + 0 ], [ %rd133 + 0 ], 0x8, %r4366; 2026-02-21T10:23:48.7909733Z // end inline asm 2026-02-21T10:23:48.7909798Z cp.async.commit_group; 2026-02-21T10:23:48.7909997Z .loc 1 51 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:32 2026-02-21T10:23:48.7910063Z add.s64 %rd134, %rd140, 352; 2026-02-21T10:23:48.7910125Z add.s64 %rd135, %rd144, 352; 2026-02-21T10:23:48.7910341Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7910409Z add.s32 %r4377, %r4394, 77824; 2026-02-21T10:23:48.7910470Z // begin inline asm 2026-02-21T10:23:48.7910612Z 
cp.async.ca.shared.global [ %r4377 + 0 ], [ %rd134 + 0 ], 0x8, %r4366; 2026-02-21T10:23:48.7910675Z // end inline asm 2026-02-21T10:23:48.7910742Z add.s32 %r4379, %r4394, 79872; 2026-02-21T10:23:48.7910816Z // begin inline asm 2026-02-21T10:23:48.7910955Z cp.async.ca.shared.global [ %r4379 + 0 ], [ %rd135 + 0 ], 0x8, %r4366; 2026-02-21T10:23:48.7911016Z // end inline asm 2026-02-21T10:23:48.7911083Z cp.async.commit_group; 2026-02-21T10:23:48.7911304Z .loc 1 22 121 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:22:121 2026-02-21T10:23:48.7911370Z @%p23 bra $L__BB0_15; 2026-02-21T10:23:48.7911461Z // %bb.9: // %.lr.ph140 2026-02-21T10:23:48.7911673Z .loc 1 31 64 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:31:64 2026-02-21T10:23:48.7911744Z mul.lo.s32 %r4411, %r323, %r321; 2026-02-21T10:23:48.7911819Z sub.s32 %r4412, %r322, %r4411; 2026-02-21T10:23:48.7912031Z .loc 1 31 30 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:31:30 2026-02-21T10:23:48.7912098Z add.s32 %r4413, %r4412, %r320; 2026-02-21T10:23:48.7912309Z .loc 1 33 27 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:33:27 2026-02-21T10:23:48.7912373Z shl.b32 %r6491, %r4413, 7; 2026-02-21T10:23:48.7912581Z .loc 1 34 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:34:32 2026-02-21T10:23:48.7912648Z or.b32 %r6504, %r6491, %r19; 2026-02-21T10:23:48.7912715Z add.s32 %r331, %r319, -3; 2026-02-21T10:23:48.7912789Z or.b32 %r4417, %r6344, %r6345; 2026-02-21T10:23:48.7912854Z or.b32 %r4418, %r4417, %r6346; 2026-02-21T10:23:48.7912920Z or.b32 %r332, %r4418, %r327; 2026-02-21T10:23:48.7913041Z xor.b32 %r333, %r332, 8; 2026-02-21T10:23:48.7913105Z or.b32 %r4420, %r19, %r6347; 2026-02-21T10:23:48.7913170Z or.b32 %r4421, %r4420, %r27; 2026-02-21T10:23:48.7913231Z add.s32 %r334, %r4869, %r4421; 2026-02-21T10:23:48.7913295Z xor.b32 %r4423, %r4421, 32; 2026-02-21T10:23:48.7913414Z add.s32 %r335, %r4869, %r4423; 2026-02-21T10:23:48.7913481Z xor.b32 %r4424, %r4421, 64; 
2026-02-21T10:23:48.7913542Z add.s32 %r336, %r4869, %r4424; 2026-02-21T10:23:48.7913603Z xor.b32 %r4425, %r4421, 96; 2026-02-21T10:23:48.7913669Z add.s32 %r337, %r4869, %r4425; 2026-02-21T10:23:48.7913735Z xor.b32 %r4429, %r6349, %r6350; 2026-02-21T10:23:48.7913795Z add.s32 %r4430, %r4869, %r6348; 2026-02-21T10:23:48.7913858Z add.s32 %r338, %r4430, %r4429; 2026-02-21T10:23:48.7913922Z and.b32 %r4432, %r6351, 8128; 2026-02-21T10:23:48.7913990Z or.b32 %r4435, %r4432, %r6353; 2026-02-21T10:23:48.7914052Z or.b32 %r4436, %r4435, %r6352; 2026-02-21T10:23:48.7914118Z add.s32 %r339, %r4869, %r4436; 2026-02-21T10:23:48.7914179Z xor.b32 %r4437, %r4436, 16; 2026-02-21T10:23:48.7914291Z add.s32 %r340, %r4869, %r4437; 2026-02-21T10:23:48.7914370Z xor.b32 %r4438, %r4436, 32; 2026-02-21T10:23:48.7914432Z add.s32 %r341, %r4869, %r4438; 2026-02-21T10:23:48.7914491Z xor.b32 %r4439, %r4436, 48; 2026-02-21T10:23:48.7914555Z add.s32 %r342, %r4869, %r4439; 2026-02-21T10:23:48.7914621Z bfe.u32 %r4440, %r4869, 4, 14; 2026-02-21T10:23:48.7914731Z cvt.u64.u32 %rd145, %r4440; 2026-02-21T10:23:48.7914819Z or.b64 %rd150, %rd145, -9223371899382267904; 2026-02-21T10:23:48.7914892Z add.s32 %r4441, %r4869, 32; 2026-02-21T10:23:48.7914956Z bfe.u32 %r4442, %r4441, 4, 14; 2026-02-21T10:23:48.7915019Z cvt.u64.u32 %rd146, %r4442; 2026-02-21T10:23:48.7915097Z or.b64 %rd151, %rd146, -9223371899382267904; 2026-02-21T10:23:48.7915164Z and.b32 %r4445, %r6355, 7264; 2026-02-21T10:23:48.7915226Z shl.b32 %r4447, %r6356, 4; 2026-02-21T10:23:48.7915287Z or.b32 %r4449, %r6354, %r6357; 2026-02-21T10:23:48.7915361Z or.b32 %r4450, %r4445, %r4447; 2026-02-21T10:23:48.7915428Z or.b32 %r4451, %r4449, %r4450; 2026-02-21T10:23:48.7915491Z add.s32 %r343, %r4869, %r4451; 2026-02-21T10:23:48.7915554Z xor.b32 %r4452, %r4451, 32; 2026-02-21T10:23:48.7915620Z add.s32 %r344, %r4869, %r4452; 2026-02-21T10:23:48.7915691Z xor.b32 %r4453, %r4451, 64; 2026-02-21T10:23:48.7915756Z add.s32 %r345, %r4869, %r4453; 
2026-02-21T10:23:48.7915820Z xor.b32 %r4454, %r4451, 96; 2026-02-21T10:23:48.7915885Z add.s32 %r346, %r4869, %r4454; 2026-02-21T10:23:48.7915945Z shl.b32 %r4455, %r6356, 10; 2026-02-21T10:23:48.7916010Z or.b32 %r4457, %r4455, %r6349; 2026-02-21T10:23:48.7916074Z xor.b32 %r4458, %r4457, %r6358; 2026-02-21T10:23:48.7916136Z add.s32 %r6225, %r4869, %r4458; 2026-02-21T10:23:48.7916199Z add.s32 %r6230, %r6225, 1024; 2026-02-21T10:23:48.7916261Z add.s32 %r6235, %r6225, 2048; 2026-02-21T10:23:48.7916321Z add.s32 %r6240, %r6225, 3072; 2026-02-21T10:23:48.7916383Z add.s32 %r6245, %r6225, 4096; 2026-02-21T10:23:48.7916606Z add.s32 %r6250, %r6225, 5120; 2026-02-21T10:23:48.7916674Z add.s32 %r6255, %r6225, 6144; 2026-02-21T10:23:48.7916736Z add.s32 %r6260, %r6225, 7168; 2026-02-21T10:23:48.7916795Z mov.b32 %r6514, 0f00000000; 2026-02-21T10:23:48.7916858Z mov.b32 %r6510, 2; 2026-02-21T10:23:48.7916922Z mov.b32 %r6509, -1; 2026-02-21T10:23:48.7916979Z mov.b32 %r6507, 32; 2026-02-21T10:23:48.7917054Z mov.b32 %r6506, 64; 2026-02-21T10:23:48.7917114Z mov.b32 %r6503, 8; 2026-02-21T10:23:48.7917171Z mov.b32 %r6502, 40; 2026-02-21T10:23:48.7917227Z mov.b32 %r6501, 72; 2026-02-21T10:23:48.7917287Z mov.b32 %r6500, 16; 2026-02-21T10:23:48.7917343Z mov.b32 %r6499, 48; 2026-02-21T10:23:48.7917400Z mov.b32 %r6498, 80; 2026-02-21T10:23:48.7917458Z mov.b32 %r6497, 24; 2026-02-21T10:23:48.7917514Z mov.b32 %r6496, 56; 2026-02-21T10:23:48.7917571Z mov.b32 %r6495, 88; 2026-02-21T10:23:48.7917628Z mov.b32 %r6494, 0; 2026-02-21T10:23:48.7917688Z mov.b32 %r6493, 1; 2026-02-21T10:23:48.7917747Z mov.b32 %r6490, %r6489; 2026-02-21T10:23:48.7917901Z mov.b32 %r6492, %r6491; 2026-02-21T10:23:48.7917967Z mov.b32 %r6505, %r6504; 2026-02-21T10:23:48.7918025Z mov.b32 %r6508, %r6494; 2026-02-21T10:23:48.7918082Z mov.b32 %r6511, %r6489; 2026-02-21T10:23:48.7918140Z mov.b32 %r6512, %r6504; 2026-02-21T10:23:48.7918278Z mov.b32 %r6513, %r6491; 2026-02-21T10:23:48.7918336Z mov.b32 %r6515, %r6514; 
2026-02-21T10:23:48.7918394Z mov.b32 %r6516, %r6514; 2026-02-21T10:23:48.7918461Z mov.b32 %r6517, %r6514; 2026-02-21T10:23:48.7918520Z mov.b32 %r6518, %r6514; 2026-02-21T10:23:48.7918578Z mov.b32 %r6519, %r6514; 2026-02-21T10:23:48.7918637Z mov.b32 %r6520, %r6514; 2026-02-21T10:23:48.7918699Z mov.b32 %r6521, %r6514; 2026-02-21T10:23:48.7918756Z mov.b32 %r6522, %r6514; 2026-02-21T10:23:48.7918814Z mov.b32 %r6523, %r6514; 2026-02-21T10:23:48.7918878Z mov.b32 %r6524, %r6514; 2026-02-21T10:23:48.7918936Z mov.b32 %r6525, %r6514; 2026-02-21T10:23:48.7918992Z mov.b32 %r6526, %r6514; 2026-02-21T10:23:48.7919053Z mov.b32 %r6527, %r6514; 2026-02-21T10:23:48.7919178Z mov.b32 %r6528, %r6514; 2026-02-21T10:23:48.7919238Z mov.b32 %r6529, %r6514; 2026-02-21T10:23:48.7919297Z mov.b32 %r6530, %r6514; 2026-02-21T10:23:48.7919358Z mov.b32 %r6531, %r6514; 2026-02-21T10:23:48.7919419Z mov.b32 %r6532, %r6514; 2026-02-21T10:23:48.7919479Z mov.b32 %r6533, %r6514; 2026-02-21T10:23:48.7919546Z mov.b32 %r6534, %r6514; 2026-02-21T10:23:48.7919664Z mov.b32 %r6535, %r6514; 2026-02-21T10:23:48.7919725Z mov.b32 %r6536, %r6514; 2026-02-21T10:23:48.7919783Z mov.b32 %r6537, %r6514; 2026-02-21T10:23:48.7919847Z mov.b32 %r6538, %r6514; 2026-02-21T10:23:48.7919905Z mov.b32 %r6539, %r6514; 2026-02-21T10:23:48.7919961Z mov.b32 %r6540, %r6514; 2026-02-21T10:23:48.7920022Z mov.b32 %r6541, %r6514; 2026-02-21T10:23:48.7920079Z mov.b32 %r6542, %r6514; 2026-02-21T10:23:48.7920135Z mov.b32 %r6543, %r6514; 2026-02-21T10:23:48.7920193Z mov.b32 %r6544, %r6514; 2026-02-21T10:23:48.7920253Z mov.b32 %r6545, %r6514; 2026-02-21T10:23:48.7920314Z mov.b32 %r6546, %r6514; 2026-02-21T10:23:48.7920374Z mov.b32 %r6547, %r6514; 2026-02-21T10:23:48.7920437Z mov.b32 %r6548, %r6514; 2026-02-21T10:23:48.7920493Z mov.b32 %r6549, %r6514; 2026-02-21T10:23:48.7920550Z mov.b32 %r6550, %r6514; 2026-02-21T10:23:48.7920612Z mov.b32 %r6551, %r6514; 2026-02-21T10:23:48.7920673Z mov.b32 %r6552, %r6514; 2026-02-21T10:23:48.7920730Z mov.b32 
%r6553, %r6514; 2026-02-21T10:23:48.7920792Z mov.b32 %r6554, %r6514; 2026-02-21T10:23:48.7920866Z mov.b32 %r6555, %r6514; 2026-02-21T10:23:48.7920926Z mov.b32 %r6556, %r6514; 2026-02-21T10:23:48.7920983Z mov.b32 %r6557, %r6514; 2026-02-21T10:23:48.7921040Z mov.b32 %r6558, %r6514; 2026-02-21T10:23:48.7921103Z mov.b32 %r6559, %r6514; 2026-02-21T10:23:48.7921160Z mov.b32 %r6560, %r6514; 2026-02-21T10:23:48.7921217Z mov.b32 %r6561, %r6514; 2026-02-21T10:23:48.7921279Z mov.b32 %r6562, %r6514; 2026-02-21T10:23:48.7921337Z mov.b32 %r6563, %r6514; 2026-02-21T10:23:48.7921394Z mov.b32 %r6564, %r6514; 2026-02-21T10:23:48.7921457Z mov.b32 %r6565, %r6514; 2026-02-21T10:23:48.7921518Z mov.b32 %r6566, %r6514; 2026-02-21T10:23:48.7921577Z mov.b32 %r6567, %r6514; 2026-02-21T10:23:48.7921639Z mov.b32 %r6568, %r6514; 2026-02-21T10:23:48.7921703Z mov.b32 %r6569, %r6514; 2026-02-21T10:23:48.7921762Z mov.b32 %r6570, %r6514; 2026-02-21T10:23:48.7921819Z mov.b32 %r6571, %r6514; 2026-02-21T10:23:48.7921877Z mov.b32 %r6572, %r6514; 2026-02-21T10:23:48.7921943Z mov.b32 %r6573, %r6514; 2026-02-21T10:23:48.7922002Z mov.b32 %r6574, %r6514; 2026-02-21T10:23:48.7922060Z mov.b32 %r6575, %r6514; 2026-02-21T10:23:48.7922130Z mov.b32 %r6576, %r6514; 2026-02-21T10:23:48.7922194Z mov.b32 %r6577, %r6514; 2026-02-21T10:23:48.7922261Z mov.b32 %r6579, %r6510; 2026-02-21T10:23:48.7922323Z mov.b32 %r6580, %r6494; 2026-02-21T10:23:48.7922383Z mov.b32 %r6583, %r6513; 2026-02-21T10:23:48.7922441Z mov.b32 %r6584, %r6512; 2026-02-21T10:23:48.7922503Z mov.b32 %r6585, %r6511; 2026-02-21T10:23:48.7922634Z bra.uni $L__BB0_10; 2026-02-21T10:23:48.7922761Z $L__BB0_14: // in Loop: Header=BB0_10 Depth=1 2026-02-21T10:23:48.7922981Z .loc 1 22 121 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:22:121 2026-02-21T10:23:48.7923093Z add.s32 %r6580, %r6580, 1; 2026-02-21T10:23:48.7923177Z setp.ne.b32 %p44, %r319, %r6580; 2026-02-21T10:23:48.7923238Z mov.b32 %r6489, %r6511; 2026-02-21T10:23:48.7923303Z mov.b32 
%r6490, %r357; 2026-02-21T10:23:48.7923364Z mov.b32 %r6491, %r6513; 2026-02-21T10:23:48.7923427Z mov.b32 %r6492, %r359; 2026-02-21T10:23:48.7923488Z mov.b32 %r6493, %r6579; 2026-02-21T10:23:48.7923547Z mov.b32 %r6494, %r361; 2026-02-21T10:23:48.7923609Z mov.b32 %r6497, %r364; 2026-02-21T10:23:48.7923669Z mov.b32 %r6500, %r367; 2026-02-21T10:23:48.7923728Z mov.b32 %r6503, %r370; 2026-02-21T10:23:48.7923788Z mov.b32 %r6504, %r6512; 2026-02-21T10:23:48.7923850Z mov.b32 %r6505, %r372; 2026-02-21T10:23:48.7923909Z mov.b32 %r6508, %r375; 2026-02-21T10:23:48.7924028Z mov.b32 %r6511, %r6585; 2026-02-21T10:23:48.7924096Z mov.b32 %r6512, %r6584; 2026-02-21T10:23:48.7924156Z mov.b32 %r6513, %r6583; 2026-02-21T10:23:48.7924216Z mov.b32 %r6579, %r451; 2026-02-21T10:23:48.7924279Z @%p44 bra $L__BB0_10; 2026-02-21T10:23:48.7924341Z bra.uni $L__BB0_15; 2026-02-21T10:23:48.7924463Z $L__BB0_10: // =>This Inner Loop Header: Depth=1 2026-02-21T10:23:48.7924742Z .loc 1 0 121 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:0:121 2026-02-21T10:23:48.7924809Z mov.b32 %r375, %r6507; 2026-02-21T10:23:48.7924870Z mov.b32 %r6507, %r6506; 2026-02-21T10:23:48.7924928Z mov.b32 %r372, %r6504; 2026-02-21T10:23:48.7924991Z mov.b32 %r370, %r6502; 2026-02-21T10:23:48.7925051Z mov.b32 %r6502, %r6501; 2026-02-21T10:23:48.7925108Z mov.b32 %r367, %r6499; 2026-02-21T10:23:48.7925166Z mov.b32 %r6499, %r6498; 2026-02-21T10:23:48.7925228Z mov.b32 %r364, %r6496; 2026-02-21T10:23:48.7925290Z mov.b32 %r6496, %r6495; 2026-02-21T10:23:48.7925350Z mov.b32 %r361, %r6493; 2026-02-21T10:23:48.7925411Z mov.b32 %r359, %r6491; 2026-02-21T10:23:48.7925470Z mov.b32 %r357, %r6489; 2026-02-21T10:23:48.7925686Z .loc 1 22 121 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:22:121 2026-02-21T10:23:48.7925753Z add.s32 %r4459, %r6579, 1; 2026-02-21T10:23:48.7925836Z setp.eq.b32 %p28, %r6579, 127; 2026-02-21T10:23:48.7925911Z selp.b32 %r451, 0, %r4459, %p28; 2026-02-21T10:23:48.7925980Z setp.ne.b32 %p29, 
%r451, 0; 2026-02-21T10:23:48.7926043Z @%p29 bra $L__BB0_12; 2026-02-21T10:23:48.7926154Z // %bb.11: // in Loop: Header=BB0_10 Depth=1 2026-02-21T10:23:48.7926217Z add.s32 %r6586, %r6586, 1; 2026-02-21T10:23:48.7926423Z .loc 1 28 35 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:28:35 2026-02-21T10:23:48.7926608Z shr.s32 %r4460, %r6586, 31; 2026-02-21T10:23:48.7926672Z shr.u32 %r4461, %r4460, 17; 2026-02-21T10:23:48.7926742Z add.s32 %r4462, %r6586, %r4461; 2026-02-21T10:23:48.7926817Z shr.s32 %r4463, %r4462, 15; 2026-02-21T10:23:48.7927021Z .loc 1 29 33 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:29:33 2026-02-21T10:23:48.7927084Z shl.b32 %r4464, %r4463, 6; 2026-02-21T10:23:48.7927289Z .loc 1 30 39 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:30:39 2026-02-21T10:23:48.7927349Z sub.s32 %r4465, 10, %r4464; 2026-02-21T10:23:48.7927547Z .loc 1 30 52 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:30:52 2026-02-21T10:23:48.7927613Z min.s32 %r4466, %r4465, 64; 2026-02-21T10:23:48.7927809Z .loc 1 31 45 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:31:45 2026-02-21T10:23:48.7927874Z and.b32 %r4467, %r4462, -32768; 2026-02-21T10:23:48.7927935Z sub.s32 %r4468, %r6586, %r4467; 2026-02-21T10:23:48.7928138Z .loc 1 32 51 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:32:51 2026-02-21T10:23:48.7928291Z div.s32 %r4469, %r4468, %r4466; 2026-02-21T10:23:48.7928490Z .loc 1 31 64 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:31:64 2026-02-21T10:23:48.7928623Z mul.lo.s32 %r4470, %r4469, %r4466; 2026-02-21T10:23:48.7928688Z sub.s32 %r4471, %r4468, %r4470; 2026-02-21T10:23:48.7928887Z .loc 1 31 30 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:31:30 2026-02-21T10:23:48.7928951Z add.s32 %r4472, %r4471, %r4464; 2026-02-21T10:23:48.7929149Z .loc 1 33 27 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:33:27 2026-02-21T10:23:48.7929211Z shl.b32 %r6583, %r4472, 7; 
2026-02-21T10:23:48.7929415Z .loc 1 34 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:34:32 2026-02-21T10:23:48.7929484Z or.b32 %r6584, %r6583, %r19; 2026-02-21T10:23:48.7929743Z .loc 1 35 27 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:35:27 2026-02-21T10:23:48.7929810Z shl.b32 %r6585, %r4469, 7; 2026-02-21T10:23:48.7930013Z .loc 1 36 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:36:32 2026-02-21T10:23:48.7930075Z or.b32 %r6587, %r6585, %r7; 2026-02-21T10:23:48.7930136Z or.b32 %r6588, %r6585, %r8; 2026-02-21T10:23:48.7930309Z $L__BB0_12: // in Loop: Header=BB0_10 Depth=1 2026-02-21T10:23:48.7930524Z .loc 1 22 121 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:22:121 2026-02-21T10:23:48.7930590Z setp.eq.b32 %p38, %r451, 0; 2026-02-21T10:23:48.7930674Z setp.lt.s32 %p39, %r6580, %r331; 2026-02-21T10:23:48.7930736Z add.s32 %r6085, %r6509, 1; 2026-02-21T10:23:48.7930798Z setp.gt.s32 %p41, %r6085, 2; 2026-02-21T10:23:48.7930867Z selp.b32 %r6509, 0, %r6085, %p41; 2026-02-21T10:23:48.7931073Z .loc 1 44 35 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:44:35 2026-02-21T10:23:48.7931140Z add.s32 %r6086, %r6508, %r23; 2026-02-21T10:23:48.7931337Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7931412Z cp.async.wait_group 8; 2026-02-21T10:23:48.7931469Z bar.sync 0; 2026-02-21T10:23:48.7931530Z shl.b32 %r6087, %r6509, 12; 2026-02-21T10:23:48.7931599Z add.s32 %r6088, %r4869, 32768; 2026-02-21T10:23:48.7931662Z add.s32 %r6089, %r6088, %r6087; 2026-02-21T10:23:48.7931859Z .loc 1 55 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:55:32 2026-02-21T10:23:48.7931926Z add.s32 %r6090, %r6089, %r332; 2026-02-21T10:23:48.7931991Z ld.shared.b16 %rs385, [%r6090]; 2026-02-21T10:23:48.7932060Z ld.shared.b16 %rs386, [%r6090+256]; 2026-02-21T10:23:48.7932132Z ld.shared.b16 %rs387, [%r6090+16]; 2026-02-21T10:23:48.7932201Z ld.shared.b16 %rs388, [%r6090+272]; 
2026-02-21T10:23:48.7932261Z add.s32 %r6091, %r6089, %r333; 2026-02-21T10:23:48.7932332Z ld.shared.b16 %rs389, [%r6091]; 2026-02-21T10:23:48.7932399Z ld.shared.b16 %rs390, [%r6091+256]; 2026-02-21T10:23:48.7932462Z ld.shared.b16 %rs391, [%r6091+16]; 2026-02-21T10:23:48.7932529Z ld.shared.b16 %rs392, [%r6091+272]; 2026-02-21T10:23:48.7932601Z cvt.f32.bf16 %r4602, %rs385; 2026-02-21T10:23:48.7932664Z cvt.f32.bf16 %r4603, %rs386; 2026-02-21T10:23:48.7932727Z cvt.f32.bf16 %r4604, %rs389; 2026-02-21T10:23:48.7932792Z cvt.f32.bf16 %r4605, %rs390; 2026-02-21T10:23:48.7932852Z cvt.f32.bf16 %r4734, %rs387; 2026-02-21T10:23:48.7932912Z cvt.f32.bf16 %r4735, %rs388; 2026-02-21T10:23:48.7932973Z cvt.f32.bf16 %r4736, %rs391; 2026-02-21T10:23:48.7933038Z cvt.f32.bf16 %r4737, %rs392; 2026-02-21T10:23:48.7933236Z .loc 1 57 62 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:57:62 2026-02-21T10:23:48.7933310Z mad.lo.s32 %r6092, %r6086, 1280, %r6505; 2026-02-21T10:23:48.7933513Z .loc 1 57 34 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:57:34 2026-02-21T10:23:48.7933646Z cvt.s64.s32 %rd175, %r6092; 2026-02-21T10:23:48.7933712Z add.s64 %rd148, %rd24, %rd175; 2026-02-21T10:23:48.7933911Z .loc 1 57 87 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:57:87 2026-02-21T10:23:48.7934019Z // begin inline asm 2026-02-21T10:23:48.7934078Z mov.u64 %rd147, 0x0; 2026-02-21T10:23:48.7934218Z createpolicy.fractional.L2::evict_last.b64 %rd147, 1.0; 2026-02-21T10:23:48.7934277Z // end inline asm 2026-02-21T10:23:48.7934337Z // begin inline asm 2026-02-21T10:23:48.7934396Z mov.u32 %r4473, 0x0; 2026-02-21T10:23:48.7934565Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r4473 }, [ %rd148 + 0 ], %rd147; 2026-02-21T10:23:48.7934622Z // end inline asm 2026-02-21T10:23:48.7934825Z .loc 1 65 28 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:65:28 2026-02-21T10:23:48.7934896Z st.shared.b8 [%r334], %r4473; 2026-02-21T10:23:48.7935020Z prmt.b32 %r6093, %r4473, 0, 
0x7771U; 2026-02-21T10:23:48.7935090Z st.shared.b8 [%r335+256], %r6093; 2026-02-21T10:23:48.7935176Z prmt.b32 %r6094, %r4473, 0, 0x7772U; 2026-02-21T10:23:48.7935241Z st.shared.b8 [%r336+512], %r6094; 2026-02-21T10:23:48.7935309Z prmt.b32 %r6095, %r4473, 0, 0x7773U; 2026-02-21T10:23:48.7935372Z st.shared.b8 [%r337+768], %r6095; 2026-02-21T10:23:48.7935429Z bar.sync 0; 2026-02-21T10:23:48.7935543Z ld.shared.b32 %r6096, [%r338]; 2026-02-21T10:23:48.7935609Z prmt.b32 %r6097, %r6096, 0, 0x7770U; 2026-02-21T10:23:48.7935681Z cvt.u16.u32 %rs393, %r6097; 2026-02-21T10:23:48.7935750Z prmt.b32 %r6098, %r6096, 0, 0x7771U; 2026-02-21T10:23:48.7935813Z cvt.u16.u32 %rs394, %r6098; 2026-02-21T10:23:48.7935877Z prmt.b32 %r6099, %r6096, 0, 0x7772U; 2026-02-21T10:23:48.7935941Z cvt.u16.u32 %rs395, %r6099; 2026-02-21T10:23:48.7936006Z prmt.b32 %r6100, %r6096, 0, 0x7773U; 2026-02-21T10:23:48.7936067Z cvt.u16.u32 %rs396, %r6100; 2026-02-21T10:23:48.7936144Z ld.shared.b32 %r6101, [%r338+128]; 2026-02-21T10:23:48.7936207Z prmt.b32 %r6102, %r6101, 0, 0x7770U; 2026-02-21T10:23:48.7936269Z cvt.u16.u32 %rs397, %r6102; 2026-02-21T10:23:48.7936331Z prmt.b32 %r6103, %r6101, 0, 0x7771U; 2026-02-21T10:23:48.7936398Z cvt.u16.u32 %rs398, %r6103; 2026-02-21T10:23:48.7936581Z prmt.b32 %r6104, %r6101, 0, 0x7772U; 2026-02-21T10:23:48.7936657Z cvt.u16.u32 %rs399, %r6104; 2026-02-21T10:23:48.7936731Z prmt.b32 %r6105, %r6101, 0, 0x7773U; 2026-02-21T10:23:48.7936793Z cvt.u16.u32 %rs400, %r6105; 2026-02-21T10:23:48.7936996Z .loc 1 60 28 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:60:28 2026-02-21T10:23:48.7937063Z shl.b16 %rs401, %rs393, 4; 2026-02-21T10:23:48.7937125Z shl.b16 %rs402, %rs394, 4; 2026-02-21T10:23:48.7937186Z shl.b16 %rs403, %rs395, 4; 2026-02-21T10:23:48.7937245Z shl.b16 %rs404, %rs396, 4; 2026-02-21T10:23:48.7937309Z shl.b16 %rs405, %rs397, 4; 2026-02-21T10:23:48.7937370Z shl.b16 %rs406, %rs398, 4; 2026-02-21T10:23:48.7937436Z shl.b16 %rs407, %rs399, 4; 
2026-02-21T10:23:48.7937500Z shl.b16 %rs408, %rs400, 4; 2026-02-21T10:23:48.7937702Z .loc 1 75 58 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:75:58 2026-02-21T10:23:48.7937777Z selp.b16 %rs409, %rs401, %rs393, %p45; 2026-02-21T10:23:48.7937840Z cvt.s16.s8 %rs410, %rs409; 2026-02-21T10:23:48.7937902Z shr.s16 %rs411, %rs410, 4; 2026-02-21T10:23:48.7937974Z selp.b16 %rs412, %rs402, %rs394, %p45; 2026-02-21T10:23:48.7938036Z cvt.s16.s8 %rs413, %rs412; 2026-02-21T10:23:48.7938101Z shr.s16 %rs414, %rs413, 4; 2026-02-21T10:23:48.7938168Z selp.b16 %rs415, %rs403, %rs395, %p45; 2026-02-21T10:23:48.7938240Z cvt.s16.s8 %rs416, %rs415; 2026-02-21T10:23:48.7938306Z shr.s16 %rs417, %rs416, 4; 2026-02-21T10:23:48.7938376Z selp.b16 %rs418, %rs404, %rs396, %p45; 2026-02-21T10:23:48.7938439Z cvt.s16.s8 %rs419, %rs418; 2026-02-21T10:23:48.7938500Z shr.s16 %rs420, %rs419, 4; 2026-02-21T10:23:48.7938654Z selp.b16 %rs421, %rs405, %rs397, %p45; 2026-02-21T10:23:48.7938715Z cvt.s16.s8 %rs422, %rs421; 2026-02-21T10:23:48.7938776Z shr.s16 %rs423, %rs422, 4; 2026-02-21T10:23:48.7938849Z selp.b16 %rs424, %rs406, %rs398, %p45; 2026-02-21T10:23:48.7938976Z cvt.s16.s8 %rs425, %rs424; 2026-02-21T10:23:48.7939036Z shr.s16 %rs426, %rs425, 4; 2026-02-21T10:23:48.7939104Z selp.b16 %rs427, %rs407, %rs399, %p45; 2026-02-21T10:23:48.7939181Z cvt.s16.s8 %rs428, %rs427; 2026-02-21T10:23:48.7939244Z shr.s16 %rs429, %rs428, 4; 2026-02-21T10:23:48.7939311Z selp.b16 %rs430, %rs408, %rs400, %p45; 2026-02-21T10:23:48.7939375Z cvt.s16.s8 %rs431, %rs430; 2026-02-21T10:23:48.7939435Z shr.s16 %rs432, %rs431, 4; 2026-02-21T10:23:48.7939637Z .loc 1 80 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:80:32 2026-02-21T10:23:48.7939704Z cvt.rn.f32.s16 %r6106, %rs411; 2026-02-21T10:23:48.7939773Z cvt.rn.f32.s16 %r6107, %rs414; 2026-02-21T10:23:48.7939836Z cvt.rn.f32.s16 %r6108, %rs417; 2026-02-21T10:23:48.7939973Z cvt.rn.f32.s16 %r6109, %rs420; 2026-02-21T10:23:48.7940045Z cvt.rn.f32.s16 
%r6110, %rs423; 2026-02-21T10:23:48.7940106Z cvt.rn.f32.s16 %r6111, %rs426; 2026-02-21T10:23:48.7940168Z cvt.rn.f32.s16 %r6112, %rs429; 2026-02-21T10:23:48.7940236Z cvt.rn.f32.s16 %r6113, %rs432; 2026-02-21T10:23:48.7940294Z bar.sync 0; 2026-02-21T10:23:48.7940358Z st.shared.b32 [%r339], %r6106; 2026-02-21T10:23:48.7940486Z st.shared.b32 [%r339+8], %r6107; 2026-02-21T10:23:48.7940556Z st.shared.b32 [%r340], %r6108; 2026-02-21T10:23:48.7940622Z st.shared.b32 [%r340+8], %r6109; 2026-02-21T10:23:48.7940686Z st.shared.b32 [%r341], %r6110; 2026-02-21T10:23:48.7940751Z st.shared.b32 [%r341+8], %r6111; 2026-02-21T10:23:48.7940813Z st.shared.b32 [%r342], %r6112; 2026-02-21T10:23:48.7940878Z st.shared.b32 [%r342+8], %r6113; 2026-02-21T10:23:48.7940933Z $L__tmp17: 2026-02-21T10:23:48.7941227Z .loc 2 291 36 // standard.py:291:36 @[ cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:87:40 ] 2026-02-21T10:23:48.7941293Z // begin inline asm 2026-02-21T10:23:48.7941373Z fence.proxy.async.shared::cta; 2026-02-21T10:23:48.7941436Z // end inline asm 2026-02-21T10:23:48.7941494Z bar.sync 0; 2026-02-21T10:23:48.7941579Z shfl.sync.idx.b32 %r6114, %r5, 0, 31, -1; 2026-02-21T10:23:48.7941654Z wgmma.fence.sync.aligned; 2026-02-21T10:23:48.7941720Z mov.pred %p30, -1; 2026-02-21T10:23:48.7941782Z // begin inline asm 2026-02-21T10:23:48.7943070Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r6514,%r6515,%r6516,%r6517,%r6518,%r6519,%r6520,%r6521,%r6522,%r6523,%r6524,%r6525,%r6526,%r6527,%r6528,%r6529,%r6530,%r6531,%r6532,%r6533,%r6534,%r6535,%r6536,%r6537,%r6538,%r6539,%r6540,%r6541,%r6542,%r6543,%r6544,%r6545,%r6546,%r6547,%r6548,%r6549,%r6550,%r6551,%r6552,%r6553,%r6554,%r6555,%r6556,%r6557,%r6558,%r6559,%r6560,%r6561,%r6562,%r6563,%r6564,%r6565,%r6566,%r6567,%r6568,%r6569,%r6570,%r6571,%r6572,%r6573,%r6574,%r6575,%r6576,%r6577}, {%r4602,%r4603,%r4604,%r4605}, %rd150, %p30, 1, 1; 2026-02-21T10:23:48.7943137Z // end inline asm 2026-02-21T10:23:48.7943198Z // begin inline asm 
2026-02-21T10:23:48.7944465Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r6514,%r6515,%r6516,%r6517,%r6518,%r6519,%r6520,%r6521,%r6522,%r6523,%r6524,%r6525,%r6526,%r6527,%r6528,%r6529,%r6530,%r6531,%r6532,%r6533,%r6534,%r6535,%r6536,%r6537,%r6538,%r6539,%r6540,%r6541,%r6542,%r6543,%r6544,%r6545,%r6546,%r6547,%r6548,%r6549,%r6550,%r6551,%r6552,%r6553,%r6554,%r6555,%r6556,%r6557,%r6558,%r6559,%r6560,%r6561,%r6562,%r6563,%r6564,%r6565,%r6566,%r6567,%r6568,%r6569,%r6570,%r6571,%r6572,%r6573,%r6574,%r6575,%r6576,%r6577}, {%r4734,%r4735,%r4736,%r4737}, %rd151, %p30, 1, 1; 2026-02-21T10:23:48.7944526Z // end inline asm 2026-02-21T10:23:48.7944605Z wgmma.commit_group.sync.aligned; 2026-02-21T10:23:48.7944664Z mov.b32 %r6000, 0; 2026-02-21T10:23:48.7944725Z mov.b32 %r4803, %r6000; 2026-02-21T10:23:48.7944785Z mov.b32 %r4804, %r6000; 2026-02-21T10:23:48.7944904Z mov.b32 %r4802, %r4869; 2026-02-21T10:23:48.7944977Z // begin inline asm 2026-02-21T10:23:48.7946051Z // wait for regs: %r6514,%r6515,%r6516,%r6517,%r6518,%r6519,%r6520,%r6521,%r6522,%r6523,%r6524,%r6525,%r6526,%r6527,%r6528,%r6529,%r6530,%r6531,%r6532,%r6533,%r6534,%r6535,%r6536,%r6537,%r6538,%r6539,%r6540,%r6541,%r6542,%r6543,%r6544,%r6545,%r6546,%r6547,%r6548,%r6549,%r6550,%r6551,%r6552,%r6553,%r6554,%r6555,%r6556,%r6557,%r6558,%r6559,%r6560,%r6561,%r6562,%r6563,%r6564,%r6565,%r6566,%r6567,%r6568,%r6569,%r6570,%r6571,%r6572,%r6573,%r6574,%r6575,%r6576,%r6577,%r4802,%r4803,%r4804 2026-02-21T10:23:48.7946183Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:23:48.7946240Z // end inline asm 2026-02-21T10:23:48.7946293Z $L__tmp18: 2026-02-21T10:23:48.7946634Z .loc 1 44 35 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:44:35 2026-02-21T10:23:48.7946707Z add.s32 %r6115, %r6503, %r23; 2026-02-21T10:23:48.7946912Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7947082Z add.s32 %r6116, %r4869, 45056; 2026-02-21T10:23:48.7947153Z add.s32 %r6117, %r6116, 
%r6087; 2026-02-21T10:23:48.7947354Z .loc 1 55 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:55:32 2026-02-21T10:23:48.7947432Z add.s32 %r6118, %r6117, %r332; 2026-02-21T10:23:48.7947501Z ld.shared.b16 %rs433, [%r6118]; 2026-02-21T10:23:48.7947637Z ld.shared.b16 %rs434, [%r6118+256]; 2026-02-21T10:23:48.7947707Z ld.shared.b16 %rs435, [%r6118+16]; 2026-02-21T10:23:48.7947784Z ld.shared.b16 %rs436, [%r6118+272]; 2026-02-21T10:23:48.7947853Z add.s32 %r6119, %r6117, %r333; 2026-02-21T10:23:48.7947919Z ld.shared.b16 %rs437, [%r6119]; 2026-02-21T10:23:48.7947985Z ld.shared.b16 %rs438, [%r6119+256]; 2026-02-21T10:23:48.7948056Z ld.shared.b16 %rs439, [%r6119+16]; 2026-02-21T10:23:48.7948121Z ld.shared.b16 %rs440, [%r6119+272]; 2026-02-21T10:23:48.7948185Z cvt.f32.bf16 %r5001, %rs433; 2026-02-21T10:23:48.7948248Z cvt.f32.bf16 %r5002, %rs434; 2026-02-21T10:23:48.7948316Z cvt.f32.bf16 %r5003, %rs437; 2026-02-21T10:23:48.7948379Z cvt.f32.bf16 %r5004, %rs438; 2026-02-21T10:23:48.7948441Z cvt.f32.bf16 %r5133, %rs435; 2026-02-21T10:23:48.7948563Z cvt.f32.bf16 %r5134, %rs436; 2026-02-21T10:23:48.7948630Z cvt.f32.bf16 %r5135, %rs439; 2026-02-21T10:23:48.7948692Z cvt.f32.bf16 %r5136, %rs440; 2026-02-21T10:23:48.7948902Z .loc 1 57 62 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:57:62 2026-02-21T10:23:48.7948976Z mad.lo.s32 %r6120, %r6115, 1280, %r6505; 2026-02-21T10:23:48.7949175Z .loc 1 57 34 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:57:34 2026-02-21T10:23:48.7949240Z cvt.s64.s32 %rd176, %r6120; 2026-02-21T10:23:48.7949316Z add.s64 %rd153, %rd24, %rd176; 2026-02-21T10:23:48.7949516Z .loc 1 57 87 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:57:87 2026-02-21T10:23:48.7949581Z // begin inline asm 2026-02-21T10:23:48.7949646Z mov.u64 %rd152, 0x0; 2026-02-21T10:23:48.7949772Z createpolicy.fractional.L2::evict_last.b64 %rd152, 1.0; 2026-02-21T10:23:48.7949830Z // end inline asm 2026-02-21T10:23:48.7949893Z // begin inline asm 
2026-02-21T10:23:48.7949955Z mov.u32 %r4872, 0x0; 2026-02-21T10:23:48.7950119Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r4872 }, [ %rd153 + 0 ], %rd152; 2026-02-21T10:23:48.7950179Z // end inline asm 2026-02-21T10:23:48.7950386Z .loc 1 65 28 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:65:28 2026-02-21T10:23:48.7950441Z bar.sync 0; 2026-02-21T10:23:48.7950507Z st.shared.b8 [%r334], %r4872; 2026-02-21T10:23:48.7950584Z prmt.b32 %r6121, %r4872, 0, 0x7771U; 2026-02-21T10:23:48.7950657Z st.shared.b8 [%r335+256], %r6121; 2026-02-21T10:23:48.7950724Z prmt.b32 %r6122, %r4872, 0, 0x7772U; 2026-02-21T10:23:48.7950791Z st.shared.b8 [%r336+512], %r6122; 2026-02-21T10:23:48.7950855Z prmt.b32 %r6123, %r4872, 0, 0x7773U; 2026-02-21T10:23:48.7951001Z st.shared.b8 [%r337+768], %r6123; 2026-02-21T10:23:48.7951055Z bar.sync 0; 2026-02-21T10:23:48.7951139Z ld.shared.b32 %r6124, [%r338]; 2026-02-21T10:23:48.7951205Z prmt.b32 %r6125, %r6124, 0, 0x7770U; 2026-02-21T10:23:48.7951334Z cvt.u16.u32 %rs441, %r6125; 2026-02-21T10:23:48.7951399Z prmt.b32 %r6126, %r6124, 0, 0x7771U; 2026-02-21T10:23:48.7951465Z cvt.u16.u32 %rs442, %r6126; 2026-02-21T10:23:48.7951529Z prmt.b32 %r6127, %r6124, 0, 0x7772U; 2026-02-21T10:23:48.7951591Z cvt.u16.u32 %rs443, %r6127; 2026-02-21T10:23:48.7951657Z prmt.b32 %r6128, %r6124, 0, 0x7773U; 2026-02-21T10:23:48.7951717Z cvt.u16.u32 %rs444, %r6128; 2026-02-21T10:23:48.7951784Z ld.shared.b32 %r6129, [%r338+128]; 2026-02-21T10:23:48.7951859Z prmt.b32 %r6130, %r6129, 0, 0x7770U; 2026-02-21T10:23:48.7951920Z cvt.u16.u32 %rs445, %r6130; 2026-02-21T10:23:48.7951985Z prmt.b32 %r6131, %r6129, 0, 0x7771U; 2026-02-21T10:23:48.7952047Z cvt.u16.u32 %rs446, %r6131; 2026-02-21T10:23:48.7952193Z prmt.b32 %r6132, %r6129, 0, 0x7772U; 2026-02-21T10:23:48.7952257Z cvt.u16.u32 %rs447, %r6132; 2026-02-21T10:23:48.7952321Z prmt.b32 %r6133, %r6129, 0, 0x7773U; 2026-02-21T10:23:48.7952388Z cvt.u16.u32 %rs448, %r6133; 2026-02-21T10:23:48.7952593Z .loc 1 60 28 // 
cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:60:28 2026-02-21T10:23:48.7952660Z shl.b16 %rs449, %rs441, 4; 2026-02-21T10:23:48.7952773Z shl.b16 %rs450, %rs442, 4; 2026-02-21T10:23:48.7952837Z shl.b16 %rs451, %rs443, 4; 2026-02-21T10:23:48.7952897Z shl.b16 %rs452, %rs444, 4; 2026-02-21T10:23:48.7952956Z shl.b16 %rs453, %rs445, 4; 2026-02-21T10:23:48.7953019Z shl.b16 %rs454, %rs446, 4; 2026-02-21T10:23:48.7953079Z shl.b16 %rs455, %rs447, 4; 2026-02-21T10:23:48.7953140Z shl.b16 %rs456, %rs448, 4; 2026-02-21T10:23:48.7953342Z .loc 1 75 58 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:75:58 2026-02-21T10:23:48.7953415Z selp.b16 %rs457, %rs449, %rs441, %p45; 2026-02-21T10:23:48.7953492Z cvt.s16.s8 %rs458, %rs457; 2026-02-21T10:23:48.7953558Z shr.s16 %rs459, %rs458, 4; 2026-02-21T10:23:48.7953628Z selp.b16 %rs460, %rs450, %rs442, %p45; 2026-02-21T10:23:48.7953690Z cvt.s16.s8 %rs461, %rs460; 2026-02-21T10:23:48.7953752Z shr.s16 %rs462, %rs461, 4; 2026-02-21T10:23:48.7953824Z selp.b16 %rs463, %rs451, %rs443, %p45; 2026-02-21T10:23:48.7953885Z cvt.s16.s8 %rs464, %rs463; 2026-02-21T10:23:48.7953948Z shr.s16 %rs465, %rs464, 4; 2026-02-21T10:23:48.7954019Z selp.b16 %rs466, %rs452, %rs444, %p45; 2026-02-21T10:23:48.7954081Z cvt.s16.s8 %rs467, %rs466; 2026-02-21T10:23:48.7954141Z shr.s16 %rs468, %rs467, 4; 2026-02-21T10:23:48.7954208Z selp.b16 %rs469, %rs453, %rs445, %p45; 2026-02-21T10:23:48.7954272Z cvt.s16.s8 %rs470, %rs469; 2026-02-21T10:23:48.7954332Z shr.s16 %rs471, %rs470, 4; 2026-02-21T10:23:48.7954399Z selp.b16 %rs472, %rs454, %rs446, %p45; 2026-02-21T10:23:48.7954463Z cvt.s16.s8 %rs473, %rs472; 2026-02-21T10:23:48.7954527Z shr.s16 %rs474, %rs473, 4; 2026-02-21T10:23:48.7954596Z selp.b16 %rs475, %rs455, %rs447, %p45; 2026-02-21T10:23:48.7954656Z cvt.s16.s8 %rs476, %rs475; 2026-02-21T10:23:48.7954718Z shr.s16 %rs477, %rs476, 4; 2026-02-21T10:23:48.7954784Z selp.b16 %rs478, %rs456, %rs448, %p45; 2026-02-21T10:23:48.7954848Z cvt.s16.s8 %rs479, 
%rs478; 2026-02-21T10:23:48.7954911Z shr.s16 %rs480, %rs479, 4; 2026-02-21T10:23:48.7955116Z .loc 1 80 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:80:32 2026-02-21T10:23:48.7955183Z cvt.rn.f32.s16 %r6134, %rs459; 2026-02-21T10:23:48.7955251Z cvt.rn.f32.s16 %r6135, %rs462; 2026-02-21T10:23:48.7955314Z cvt.rn.f32.s16 %r6136, %rs465; 2026-02-21T10:23:48.7955376Z cvt.rn.f32.s16 %r6137, %rs468; 2026-02-21T10:23:48.7955437Z cvt.rn.f32.s16 %r6138, %rs471; 2026-02-21T10:23:48.7955502Z cvt.rn.f32.s16 %r6139, %rs474; 2026-02-21T10:23:48.7955563Z cvt.rn.f32.s16 %r6140, %rs477; 2026-02-21T10:23:48.7955625Z cvt.rn.f32.s16 %r6141, %rs480; 2026-02-21T10:23:48.7955747Z bar.sync 0; 2026-02-21T10:23:48.7955812Z st.shared.b32 [%r339], %r6134; 2026-02-21T10:23:48.7955879Z st.shared.b32 [%r339+8], %r6135; 2026-02-21T10:23:48.7955943Z st.shared.b32 [%r340], %r6136; 2026-02-21T10:23:48.7956071Z st.shared.b32 [%r340+8], %r6137; 2026-02-21T10:23:48.7956139Z st.shared.b32 [%r341], %r6138; 2026-02-21T10:23:48.7956202Z st.shared.b32 [%r341+8], %r6139; 2026-02-21T10:23:48.7956269Z st.shared.b32 [%r342], %r6140; 2026-02-21T10:23:48.7956333Z st.shared.b32 [%r342+8], %r6141; 2026-02-21T10:23:48.7956388Z $L__tmp19: 2026-02-21T10:23:48.7956802Z .loc 2 291 36 // standard.py:291:36 @[ cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:87:40 ] 2026-02-21T10:23:48.7956871Z // begin inline asm 2026-02-21T10:23:48.7956951Z fence.proxy.async.shared::cta; 2026-02-21T10:23:48.7957008Z // end inline asm 2026-02-21T10:23:48.7957068Z bar.sync 0; 2026-02-21T10:23:48.7957141Z wgmma.fence.sync.aligned; 2026-02-21T10:23:48.7957203Z // begin inline asm 2026-02-21T10:23:48.7958631Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r6514,%r6515,%r6516,%r6517,%r6518,%r6519,%r6520,%r6521,%r6522,%r6523,%r6524,%r6525,%r6526,%r6527,%r6528,%r6529,%r6530,%r6531,%r6532,%r6533,%r6534,%r6535,%r6536,%r6537,%r6538,%r6539,%r6540,%r6541,%r6542,%r6543,%r6544,%r6545,%r6546,%r6547,%r6548,%r6549,%r6550,%r6551,%r6552,%r6553,%r6554,%r6555,%r6556,%r6557,%r6558,%r6559,%r6560,%r6561,%r6562,%r6563,%r6564,%r6565,%r6566,%r6567,%r6568,%r6569,%r6570,%r6571,%r6572,%r6573,%r6574,%r6575,%r6576,%r6577}, {%r5001,%r5002,%r5003,%r5004}, %rd150, %p30, 1, 1; 2026-02-21T10:23:48.7958698Z // end inline asm 2026-02-21T10:23:48.7958758Z // begin inline asm 2026-02-21T10:23:48.7960026Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r6514,%r6515,%r6516,%r6517,%r6518,%r6519,%r6520,%r6521,%r6522,%r6523,%r6524,%r6525,%r6526,%r6527,%r6528,%r6529,%r6530,%r6531,%r6532,%r6533,%r6534,%r6535,%r6536,%r6537,%r6538,%r6539,%r6540,%r6541,%r6542,%r6543,%r6544,%r6545,%r6546,%r6547,%r6548,%r6549,%r6550,%r6551,%r6552,%r6553,%r6554,%r6555,%r6556,%r6557,%r6558,%r6559,%r6560,%r6561,%r6562,%r6563,%r6564,%r6565,%r6566,%r6567,%r6568,%r6569,%r6570,%r6571,%r6572,%r6573,%r6574,%r6575,%r6576,%r6577}, {%r5133,%r5134,%r5135,%r5136}, %rd151, %p30, 1, 1; 2026-02-21T10:23:48.7960088Z // end inline asm 2026-02-21T10:23:48.7960164Z wgmma.commit_group.sync.aligned; 2026-02-21T10:23:48.7960229Z mov.b32 %r5202, %r6000; 2026-02-21T10:23:48.7960291Z mov.b32 %r5203, %r6000; 2026-02-21T10:23:48.7960350Z mov.b32 %r5201, %r4869; 2026-02-21T10:23:48.7960412Z // begin inline asm 2026-02-21T10:23:48.7961473Z // wait for regs: 
%r6514,%r6515,%r6516,%r6517,%r6518,%r6519,%r6520,%r6521,%r6522,%r6523,%r6524,%r6525,%r6526,%r6527,%r6528,%r6529,%r6530,%r6531,%r6532,%r6533,%r6534,%r6535,%r6536,%r6537,%r6538,%r6539,%r6540,%r6541,%r6542,%r6543,%r6544,%r6545,%r6546,%r6547,%r6548,%r6549,%r6550,%r6551,%r6552,%r6553,%r6554,%r6555,%r6556,%r6557,%r6558,%r6559,%r6560,%r6561,%r6562,%r6563,%r6564,%r6565,%r6566,%r6567,%r6568,%r6569,%r6570,%r6571,%r6572,%r6573,%r6574,%r6575,%r6576,%r6577,%r5201,%r5202,%r5203 2026-02-21T10:23:48.7961549Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:23:48.7961610Z // end inline asm 2026-02-21T10:23:48.7961664Z $L__tmp20: 2026-02-21T10:23:48.7961877Z .loc 1 44 35 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:44:35 2026-02-21T10:23:48.7961947Z add.s32 %r6142, %r6500, %r23; 2026-02-21T10:23:48.7962153Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7962215Z add.s32 %r6143, %r4869, 57344; 2026-02-21T10:23:48.7962281Z add.s32 %r6144, %r6143, %r6087; 2026-02-21T10:23:48.7962481Z .loc 1 55 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:55:32 2026-02-21T10:23:48.7962555Z add.s32 %r6145, %r6144, %r332; 2026-02-21T10:23:48.7962622Z ld.shared.b16 %rs481, [%r6145]; 2026-02-21T10:23:48.7962695Z ld.shared.b16 %rs482, [%r6145+256]; 2026-02-21T10:23:48.7962847Z ld.shared.b16 %rs483, [%r6145+16]; 2026-02-21T10:23:48.7962917Z ld.shared.b16 %rs484, [%r6145+272]; 2026-02-21T10:23:48.7962983Z add.s32 %r6146, %r6144, %r333; 2026-02-21T10:23:48.7963048Z ld.shared.b16 %rs485, [%r6146]; 2026-02-21T10:23:48.7963177Z ld.shared.b16 %rs486, [%r6146+256]; 2026-02-21T10:23:48.7963242Z ld.shared.b16 %rs487, [%r6146+16]; 2026-02-21T10:23:48.7963310Z ld.shared.b16 %rs488, [%r6146+272]; 2026-02-21T10:23:48.7963375Z cvt.f32.bf16 %r5400, %rs481; 2026-02-21T10:23:48.7963439Z cvt.f32.bf16 %r5401, %rs482; 2026-02-21T10:23:48.7963505Z cvt.f32.bf16 %r5402, %rs485; 2026-02-21T10:23:48.7963566Z cvt.f32.bf16 %r5403, %rs486; 
2026-02-21T10:23:48.7963627Z cvt.f32.bf16 %r5532, %rs483; 2026-02-21T10:23:48.7963690Z cvt.f32.bf16 %r5533, %rs484; 2026-02-21T10:23:48.7963750Z cvt.f32.bf16 %r5534, %rs487; 2026-02-21T10:23:48.7963810Z cvt.f32.bf16 %r5535, %rs488; 2026-02-21T10:23:48.7964062Z .loc 1 57 62 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:57:62 2026-02-21T10:23:48.7964144Z mad.lo.s32 %r6147, %r6142, 1280, %r6505; 2026-02-21T10:23:48.7964342Z .loc 1 57 34 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:57:34 2026-02-21T10:23:48.7964408Z cvt.s64.s32 %rd177, %r6147; 2026-02-21T10:23:48.7964476Z add.s64 %rd158, %rd24, %rd177; 2026-02-21T10:23:48.7964733Z .loc 1 57 87 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:57:87 2026-02-21T10:23:48.7964796Z // begin inline asm 2026-02-21T10:23:48.7964860Z mov.u64 %rd157, 0x0; 2026-02-21T10:23:48.7964994Z createpolicy.fractional.L2::evict_last.b64 %rd157, 1.0; 2026-02-21T10:23:48.7965054Z // end inline asm 2026-02-21T10:23:48.7965113Z // begin inline asm 2026-02-21T10:23:48.7965177Z mov.u32 %r5271, 0x0; 2026-02-21T10:23:48.7965341Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r5271 }, [ %rd158 + 0 ], %rd157; 2026-02-21T10:23:48.7965398Z // end inline asm 2026-02-21T10:23:48.7965606Z .loc 1 65 28 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:65:28 2026-02-21T10:23:48.7965662Z bar.sync 0; 2026-02-21T10:23:48.7965727Z st.shared.b8 [%r334], %r5271; 2026-02-21T10:23:48.7965798Z prmt.b32 %r6148, %r5271, 0, 0x7771U; 2026-02-21T10:23:48.7965868Z st.shared.b8 [%r335+256], %r6148; 2026-02-21T10:23:48.7965932Z prmt.b32 %r6149, %r5271, 0, 0x7772U; 2026-02-21T10:23:48.7966001Z st.shared.b8 [%r336+512], %r6149; 2026-02-21T10:23:48.7966069Z prmt.b32 %r6150, %r5271, 0, 0x7773U; 2026-02-21T10:23:48.7966133Z st.shared.b8 [%r337+768], %r6150; 2026-02-21T10:23:48.7966187Z bar.sync 0; 2026-02-21T10:23:48.7966266Z ld.shared.b32 %r6151, [%r338]; 2026-02-21T10:23:48.7966331Z prmt.b32 %r6152, %r6151, 0, 0x7770U; 
2026-02-21T10:23:48.7966398Z cvt.u16.u32 %rs489, %r6152; 2026-02-21T10:23:48.7966579Z prmt.b32 %r6153, %r6151, 0, 0x7771U; 2026-02-21T10:23:48.7966648Z cvt.u16.u32 %rs490, %r6153; 2026-02-21T10:23:48.7966711Z prmt.b32 %r6154, %r6151, 0, 0x7772U; 2026-02-21T10:23:48.7966779Z cvt.u16.u32 %rs491, %r6154; 2026-02-21T10:23:48.7966846Z prmt.b32 %r6155, %r6151, 0, 0x7773U; 2026-02-21T10:23:48.7966906Z cvt.u16.u32 %rs492, %r6155; 2026-02-21T10:23:48.7966973Z ld.shared.b32 %r6156, [%r338+128]; 2026-02-21T10:23:48.7967049Z prmt.b32 %r6157, %r6156, 0, 0x7770U; 2026-02-21T10:23:48.7967116Z cvt.u16.u32 %rs493, %r6157; 2026-02-21T10:23:48.7967179Z prmt.b32 %r6158, %r6156, 0, 0x7771U; 2026-02-21T10:23:48.7967242Z cvt.u16.u32 %rs494, %r6158; 2026-02-21T10:23:48.7967308Z prmt.b32 %r6159, %r6156, 0, 0x7772U; 2026-02-21T10:23:48.7967370Z cvt.u16.u32 %rs495, %r6159; 2026-02-21T10:23:48.7967435Z prmt.b32 %r6160, %r6156, 0, 0x7773U; 2026-02-21T10:23:48.7967496Z cvt.u16.u32 %rs496, %r6160; 2026-02-21T10:23:48.7967699Z .loc 1 60 28 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:60:28 2026-02-21T10:23:48.7967764Z shl.b16 %rs497, %rs489, 4; 2026-02-21T10:23:48.7967825Z shl.b16 %rs498, %rs490, 4; 2026-02-21T10:23:48.7967981Z shl.b16 %rs499, %rs491, 4; 2026-02-21T10:23:48.7968042Z shl.b16 %rs500, %rs492, 4; 2026-02-21T10:23:48.7968102Z shl.b16 %rs501, %rs493, 4; 2026-02-21T10:23:48.7968166Z shl.b16 %rs502, %rs494, 4; 2026-02-21T10:23:48.7968315Z shl.b16 %rs503, %rs495, 4; 2026-02-21T10:23:48.7968375Z shl.b16 %rs504, %rs496, 4; 2026-02-21T10:23:48.7968583Z .loc 1 75 58 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:75:58 2026-02-21T10:23:48.7968660Z selp.b16 %rs505, %rs497, %rs489, %p45; 2026-02-21T10:23:48.7968722Z cvt.s16.s8 %rs506, %rs505; 2026-02-21T10:23:48.7968781Z shr.s16 %rs507, %rs506, 4; 2026-02-21T10:23:48.7968854Z selp.b16 %rs508, %rs498, %rs490, %p45; 2026-02-21T10:23:48.7968914Z cvt.s16.s8 %rs509, %rs508; 2026-02-21T10:23:48.7968975Z shr.s16 %rs510, 
%rs509, 4; 2026-02-21T10:23:48.7969050Z selp.b16 %rs511, %rs499, %rs491, %p45; 2026-02-21T10:23:48.7969111Z cvt.s16.s8 %rs512, %rs511; 2026-02-21T10:23:48.7969171Z shr.s16 %rs513, %rs512, 4; 2026-02-21T10:23:48.7969317Z selp.b16 %rs514, %rs500, %rs492, %p45; 2026-02-21T10:23:48.7969387Z cvt.s16.s8 %rs515, %rs514; 2026-02-21T10:23:48.7969447Z shr.s16 %rs516, %rs515, 4; 2026-02-21T10:23:48.7969515Z selp.b16 %rs517, %rs501, %rs493, %p45; 2026-02-21T10:23:48.7969581Z cvt.s16.s8 %rs518, %rs517; 2026-02-21T10:23:48.7969642Z shr.s16 %rs519, %rs518, 4; 2026-02-21T10:23:48.7969710Z selp.b16 %rs520, %rs502, %rs494, %p45; 2026-02-21T10:23:48.7969841Z cvt.s16.s8 %rs521, %rs520; 2026-02-21T10:23:48.7969911Z shr.s16 %rs522, %rs521, 4; 2026-02-21T10:23:48.7969980Z selp.b16 %rs523, %rs503, %rs495, %p45; 2026-02-21T10:23:48.7970041Z cvt.s16.s8 %rs524, %rs523; 2026-02-21T10:23:48.7970102Z shr.s16 %rs525, %rs524, 4; 2026-02-21T10:23:48.7970171Z selp.b16 %rs526, %rs504, %rs496, %p45; 2026-02-21T10:23:48.7970232Z cvt.s16.s8 %rs527, %rs526; 2026-02-21T10:23:48.7970302Z shr.s16 %rs528, %rs527, 4; 2026-02-21T10:23:48.7970512Z .loc 1 80 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:80:32 2026-02-21T10:23:48.7970580Z cvt.rn.f32.s16 %r6161, %rs507; 2026-02-21T10:23:48.7970643Z cvt.rn.f32.s16 %r6162, %rs510; 2026-02-21T10:23:48.7970714Z cvt.rn.f32.s16 %r6163, %rs513; 2026-02-21T10:23:48.7970780Z cvt.rn.f32.s16 %r6164, %rs516; 2026-02-21T10:23:48.7970840Z cvt.rn.f32.s16 %r6165, %rs519; 2026-02-21T10:23:48.7970905Z cvt.rn.f32.s16 %r6166, %rs522; 2026-02-21T10:23:48.7970969Z cvt.rn.f32.s16 %r6167, %rs525; 2026-02-21T10:23:48.7971031Z cvt.rn.f32.s16 %r6168, %rs528; 2026-02-21T10:23:48.7971088Z bar.sync 0; 2026-02-21T10:23:48.7971156Z st.shared.b32 [%r339], %r6161; 2026-02-21T10:23:48.7971221Z st.shared.b32 [%r339+8], %r6162; 2026-02-21T10:23:48.7971286Z st.shared.b32 [%r340], %r6163; 2026-02-21T10:23:48.7971354Z st.shared.b32 [%r340+8], %r6164; 2026-02-21T10:23:48.7971418Z 
st.shared.b32 [%r341], %r6165; 2026-02-21T10:23:48.7971482Z st.shared.b32 [%r341+8], %r6166; 2026-02-21T10:23:48.7971545Z st.shared.b32 [%r342], %r6167; 2026-02-21T10:23:48.7971619Z st.shared.b32 [%r342+8], %r6168; 2026-02-21T10:23:48.7971678Z $L__tmp21: 2026-02-21T10:23:48.7971955Z .loc 2 291 36 // standard.py:291:36 @[ cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:87:40 ] 2026-02-21T10:23:48.7972036Z // begin inline asm 2026-02-21T10:23:48.7972118Z fence.proxy.async.shared::cta; 2026-02-21T10:23:48.7972175Z // end inline asm 2026-02-21T10:23:48.7972237Z bar.sync 0; 2026-02-21T10:23:48.7972310Z wgmma.fence.sync.aligned; 2026-02-21T10:23:48.7972369Z // begin inline asm 2026-02-21T10:23:48.7973657Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r6514,%r6515,%r6516,%r6517,%r6518,%r6519,%r6520,%r6521,%r6522,%r6523,%r6524,%r6525,%r6526,%r6527,%r6528,%r6529,%r6530,%r6531,%r6532,%r6533,%r6534,%r6535,%r6536,%r6537,%r6538,%r6539,%r6540,%r6541,%r6542,%r6543,%r6544,%r6545,%r6546,%r6547,%r6548,%r6549,%r6550,%r6551,%r6552,%r6553,%r6554,%r6555,%r6556,%r6557,%r6558,%r6559,%r6560,%r6561,%r6562,%r6563,%r6564,%r6565,%r6566,%r6567,%r6568,%r6569,%r6570,%r6571,%r6572,%r6573,%r6574,%r6575,%r6576,%r6577}, {%r5400,%r5401,%r5402,%r5403}, %rd150, %p30, 1, 1; 2026-02-21T10:23:48.7973782Z // end inline asm 2026-02-21T10:23:48.7973842Z // begin inline asm 2026-02-21T10:23:48.7975161Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r6514,%r6515,%r6516,%r6517,%r6518,%r6519,%r6520,%r6521,%r6522,%r6523,%r6524,%r6525,%r6526,%r6527,%r6528,%r6529,%r6530,%r6531,%r6532,%r6533,%r6534,%r6535,%r6536,%r6537,%r6538,%r6539,%r6540,%r6541,%r6542,%r6543,%r6544,%r6545,%r6546,%r6547,%r6548,%r6549,%r6550,%r6551,%r6552,%r6553,%r6554,%r6555,%r6556,%r6557,%r6558,%r6559,%r6560,%r6561,%r6562,%r6563,%r6564,%r6565,%r6566,%r6567,%r6568,%r6569,%r6570,%r6571,%r6572,%r6573,%r6574,%r6575,%r6576,%r6577}, {%r5532,%r5533,%r5534,%r5535}, %rd151, %p30, 1, 1; 2026-02-21T10:23:48.7975227Z // end inline asm 
2026-02-21T10:23:48.7975305Z wgmma.commit_group.sync.aligned; 2026-02-21T10:23:48.7975370Z mov.b32 %r5601, %r6000; 2026-02-21T10:23:48.7975429Z mov.b32 %r5602, %r6000; 2026-02-21T10:23:48.7975546Z mov.b32 %r5600, %r4869; 2026-02-21T10:23:48.7975610Z // begin inline asm 2026-02-21T10:23:48.7976882Z // wait for regs: %r6514,%r6515,%r6516,%r6517,%r6518,%r6519,%r6520,%r6521,%r6522,%r6523,%r6524,%r6525,%r6526,%r6527,%r6528,%r6529,%r6530,%r6531,%r6532,%r6533,%r6534,%r6535,%r6536,%r6537,%r6538,%r6539,%r6540,%r6541,%r6542,%r6543,%r6544,%r6545,%r6546,%r6547,%r6548,%r6549,%r6550,%r6551,%r6552,%r6553,%r6554,%r6555,%r6556,%r6557,%r6558,%r6559,%r6560,%r6561,%r6562,%r6563,%r6564,%r6565,%r6566,%r6567,%r6568,%r6569,%r6570,%r6571,%r6572,%r6573,%r6574,%r6575,%r6576,%r6577,%r5600,%r5601,%r5602 2026-02-21T10:23:48.7976970Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:23:48.7977031Z // end inline asm 2026-02-21T10:23:48.7977085Z $L__tmp22: 2026-02-21T10:23:48.7977295Z .loc 1 44 35 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:44:35 2026-02-21T10:23:48.7977359Z add.s32 %r6169, %r6497, %r23; 2026-02-21T10:23:48.7977565Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7977630Z add.s32 %r6170, %r4869, 69632; 2026-02-21T10:23:48.7977692Z add.s32 %r6171, %r6170, %r6087; 2026-02-21T10:23:48.7977896Z .loc 1 55 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:55:32 2026-02-21T10:23:48.7977960Z add.s32 %r6172, %r6171, %r332; 2026-02-21T10:23:48.7978027Z ld.shared.b16 %rs529, [%r6172]; 2026-02-21T10:23:48.7978100Z ld.shared.b16 %rs530, [%r6172+256]; 2026-02-21T10:23:48.7978166Z ld.shared.b16 %rs531, [%r6172+16]; 2026-02-21T10:23:48.7978235Z ld.shared.b16 %rs532, [%r6172+272]; 2026-02-21T10:23:48.7978298Z add.s32 %r6173, %r6171, %r333; 2026-02-21T10:23:48.7978370Z ld.shared.b16 %rs533, [%r6173]; 2026-02-21T10:23:48.7978434Z ld.shared.b16 %rs534, [%r6173+256]; 2026-02-21T10:23:48.7978499Z ld.shared.b16 %rs535, 
[%r6173+16]; 2026-02-21T10:23:48.7978582Z ld.shared.b16 %rs536, [%r6173+272]; 2026-02-21T10:23:48.7978649Z cvt.f32.bf16 %r5799, %rs529; 2026-02-21T10:23:48.7978713Z cvt.f32.bf16 %r5800, %rs530; 2026-02-21T10:23:48.7978777Z cvt.f32.bf16 %r5801, %rs533; 2026-02-21T10:23:48.7978838Z cvt.f32.bf16 %r5802, %rs534; 2026-02-21T10:23:48.7978898Z cvt.f32.bf16 %r5931, %rs531; 2026-02-21T10:23:48.7978961Z cvt.f32.bf16 %r5932, %rs532; 2026-02-21T10:23:48.7979025Z cvt.f32.bf16 %r5933, %rs535; 2026-02-21T10:23:48.7979087Z cvt.f32.bf16 %r5934, %rs536; 2026-02-21T10:23:48.7979291Z .loc 1 57 62 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:57:62 2026-02-21T10:23:48.7979366Z mad.lo.s32 %r6174, %r6169, 1280, %r6505; 2026-02-21T10:23:48.7979564Z .loc 1 57 34 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:57:34 2026-02-21T10:23:48.7979629Z cvt.s64.s32 %rd178, %r6174; 2026-02-21T10:23:48.7979693Z add.s64 %rd163, %rd24, %rd178; 2026-02-21T10:23:48.7979901Z .loc 1 57 87 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:57:87 2026-02-21T10:23:48.7980417Z // begin inline asm 2026-02-21T10:23:48.7980482Z mov.u64 %rd162, 0x0; 2026-02-21T10:23:48.7980610Z createpolicy.fractional.L2::evict_last.b64 %rd162, 1.0; 2026-02-21T10:23:48.7980666Z // end inline asm 2026-02-21T10:23:48.7980801Z // begin inline asm 2026-02-21T10:23:48.7980863Z mov.u32 %r5670, 0x0; 2026-02-21T10:23:48.7981027Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r5670 }, [ %rd163 + 0 ], %rd162; 2026-02-21T10:23:48.7981086Z // end inline asm 2026-02-21T10:23:48.7981287Z .loc 1 65 28 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:65:28 2026-02-21T10:23:48.7981345Z bar.sync 0; 2026-02-21T10:23:48.7981412Z st.shared.b8 [%r334], %r5670; 2026-02-21T10:23:48.7981479Z prmt.b32 %r6175, %r5670, 0, 0x7771U; 2026-02-21T10:23:48.7981547Z st.shared.b8 [%r335+256], %r6175; 2026-02-21T10:23:48.7981612Z prmt.b32 %r6176, %r5670, 0, 0x7772U; 2026-02-21T10:23:48.7981677Z st.shared.b8 [%r336+512], 
%r6176; 2026-02-21T10:23:48.7981809Z prmt.b32 %r6177, %r5670, 0, 0x7773U; 2026-02-21T10:23:48.7981876Z st.shared.b8 [%r337+768], %r6177; 2026-02-21T10:23:48.7981931Z bar.sync 0; 2026-02-21T10:23:48.7982008Z ld.shared.b32 %r6178, [%r338]; 2026-02-21T10:23:48.7982081Z prmt.b32 %r6179, %r6178, 0, 0x7770U; 2026-02-21T10:23:48.7982148Z cvt.u16.u32 %rs537, %r6179; 2026-02-21T10:23:48.7982212Z prmt.b32 %r6180, %r6178, 0, 0x7771U; 2026-02-21T10:23:48.7982325Z cvt.u16.u32 %rs538, %r6180; 2026-02-21T10:23:48.7982403Z prmt.b32 %r6181, %r6178, 0, 0x7772U; 2026-02-21T10:23:48.7982466Z cvt.u16.u32 %rs539, %r6181; 2026-02-21T10:23:48.7982530Z prmt.b32 %r6182, %r6178, 0, 0x7773U; 2026-02-21T10:23:48.7982599Z cvt.u16.u32 %rs540, %r6182; 2026-02-21T10:23:48.7982665Z ld.shared.b32 %r6183, [%r338+128]; 2026-02-21T10:23:48.7982729Z prmt.b32 %r6184, %r6183, 0, 0x7770U; 2026-02-21T10:23:48.7982796Z cvt.u16.u32 %rs541, %r6184; 2026-02-21T10:23:48.7982860Z prmt.b32 %r6185, %r6183, 0, 0x7771U; 2026-02-21T10:23:48.7982923Z cvt.u16.u32 %rs542, %r6185; 2026-02-21T10:23:48.7982992Z prmt.b32 %r6186, %r6183, 0, 0x7772U; 2026-02-21T10:23:48.7983055Z cvt.u16.u32 %rs543, %r6186; 2026-02-21T10:23:48.7983117Z prmt.b32 %r6187, %r6183, 0, 0x7773U; 2026-02-21T10:23:48.7983176Z cvt.u16.u32 %rs544, %r6187; 2026-02-21T10:23:48.7983383Z .loc 1 60 28 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:60:28 2026-02-21T10:23:48.7983450Z shl.b16 %rs545, %rs537, 4; 2026-02-21T10:23:48.7983513Z shl.b16 %rs546, %rs538, 4; 2026-02-21T10:23:48.7983578Z shl.b16 %rs547, %rs539, 4; 2026-02-21T10:23:48.7983641Z shl.b16 %rs548, %rs540, 4; 2026-02-21T10:23:48.7983703Z shl.b16 %rs549, %rs541, 4; 2026-02-21T10:23:48.7983763Z shl.b16 %rs550, %rs542, 4; 2026-02-21T10:23:48.7983826Z shl.b16 %rs551, %rs543, 4; 2026-02-21T10:23:48.7983886Z shl.b16 %rs552, %rs544, 4; 2026-02-21T10:23:48.7984086Z .loc 1 75 58 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:75:58 2026-02-21T10:23:48.7984165Z selp.b16 %rs553, 
%rs545, %rs537, %p45; 2026-02-21T10:23:48.7984239Z cvt.s16.s8 %rs554, %rs553; 2026-02-21T10:23:48.7984304Z shr.s16 %rs555, %rs554, 4; 2026-02-21T10:23:48.7984376Z selp.b16 %rs556, %rs546, %rs538, %p45; 2026-02-21T10:23:48.7984439Z cvt.s16.s8 %rs557, %rs556; 2026-02-21T10:23:48.7984500Z shr.s16 %rs558, %rs557, 4; 2026-02-21T10:23:48.7984568Z selp.b16 %rs559, %rs547, %rs539, %p45; 2026-02-21T10:23:48.7984635Z cvt.s16.s8 %rs560, %rs559; 2026-02-21T10:23:48.7984697Z shr.s16 %rs561, %rs560, 4; 2026-02-21T10:23:48.7984764Z selp.b16 %rs562, %rs548, %rs540, %p45; 2026-02-21T10:23:48.7984828Z cvt.s16.s8 %rs563, %rs562; 2026-02-21T10:23:48.7984899Z shr.s16 %rs564, %rs563, 4; 2026-02-21T10:23:48.7984969Z selp.b16 %rs565, %rs549, %rs541, %p45; 2026-02-21T10:23:48.7985029Z cvt.s16.s8 %rs566, %rs565; 2026-02-21T10:23:48.7985093Z shr.s16 %rs567, %rs566, 4; 2026-02-21T10:23:48.7985161Z selp.b16 %rs568, %rs550, %rs542, %p45; 2026-02-21T10:23:48.7985222Z cvt.s16.s8 %rs569, %rs568; 2026-02-21T10:23:48.7985365Z shr.s16 %rs570, %rs569, 4; 2026-02-21T10:23:48.7985434Z selp.b16 %rs571, %rs551, %rs543, %p45; 2026-02-21T10:23:48.7985495Z cvt.s16.s8 %rs572, %rs571; 2026-02-21T10:23:48.7985556Z shr.s16 %rs573, %rs572, 4; 2026-02-21T10:23:48.7985681Z selp.b16 %rs574, %rs552, %rs544, %p45; 2026-02-21T10:23:48.7985742Z cvt.s16.s8 %rs575, %rs574; 2026-02-21T10:23:48.7985802Z shr.s16 %rs576, %rs575, 4; 2026-02-21T10:23:48.7986009Z .loc 1 80 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:80:32 2026-02-21T10:23:48.7986076Z cvt.rn.f32.s16 %r6188, %rs555; 2026-02-21T10:23:48.7986139Z cvt.rn.f32.s16 %r6189, %rs558; 2026-02-21T10:23:48.7986209Z cvt.rn.f32.s16 %r6190, %rs561; 2026-02-21T10:23:48.7986270Z cvt.rn.f32.s16 %r6191, %rs564; 2026-02-21T10:23:48.7986333Z cvt.rn.f32.s16 %r6192, %rs567; 2026-02-21T10:23:48.7986394Z cvt.rn.f32.s16 %r6193, %rs570; 2026-02-21T10:23:48.7986596Z cvt.rn.f32.s16 %r6194, %rs573; 2026-02-21T10:23:48.7986745Z cvt.rn.f32.s16 %r6195, %rs576; 
2026-02-21T10:23:48.7986814Z bar.sync 0; 2026-02-21T10:23:48.7986881Z st.shared.b32 [%r339], %r6188; 2026-02-21T10:23:48.7986947Z st.shared.b32 [%r339+8], %r6189; 2026-02-21T10:23:48.7987010Z st.shared.b32 [%r340], %r6190; 2026-02-21T10:23:48.7987078Z st.shared.b32 [%r340+8], %r6191; 2026-02-21T10:23:48.7987146Z st.shared.b32 [%r341], %r6192; 2026-02-21T10:23:48.7987272Z st.shared.b32 [%r341+8], %r6193; 2026-02-21T10:23:48.7987339Z st.shared.b32 [%r342], %r6194; 2026-02-21T10:23:48.7987406Z st.shared.b32 [%r342+8], %r6195; 2026-02-21T10:23:48.7987461Z $L__tmp23: 2026-02-21T10:23:48.7987737Z .loc 2 291 36 // standard.py:291:36 @[ cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:87:40 ] 2026-02-21T10:23:48.7987806Z // begin inline asm 2026-02-21T10:23:48.7987884Z fence.proxy.async.shared::cta; 2026-02-21T10:23:48.7987942Z // end inline asm 2026-02-21T10:23:48.7987997Z bar.sync 0; 2026-02-21T10:23:48.7988074Z wgmma.fence.sync.aligned; 2026-02-21T10:23:48.7988136Z // begin inline asm 2026-02-21T10:23:48.7989498Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r6514,%r6515,%r6516,%r6517,%r6518,%r6519,%r6520,%r6521,%r6522,%r6523,%r6524,%r6525,%r6526,%r6527,%r6528,%r6529,%r6530,%r6531,%r6532,%r6533,%r6534,%r6535,%r6536,%r6537,%r6538,%r6539,%r6540,%r6541,%r6542,%r6543,%r6544,%r6545,%r6546,%r6547,%r6548,%r6549,%r6550,%r6551,%r6552,%r6553,%r6554,%r6555,%r6556,%r6557,%r6558,%r6559,%r6560,%r6561,%r6562,%r6563,%r6564,%r6565,%r6566,%r6567,%r6568,%r6569,%r6570,%r6571,%r6572,%r6573,%r6574,%r6575,%r6576,%r6577}, {%r5799,%r5800,%r5801,%r5802}, %rd150, %p30, 1, 1; 2026-02-21T10:23:48.7989566Z // end inline asm 2026-02-21T10:23:48.7989626Z // begin inline asm 2026-02-21T10:23:48.7990891Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r6514,%r6515,%r6516,%r6517,%r6518,%r6519,%r6520,%r6521,%r6522,%r6523,%r6524,%r6525,%r6526,%r6527,%r6528,%r6529,%r6530,%r6531,%r6532,%r6533,%r6534,%r6535,%r6536,%r6537,%r6538,%r6539,%r6540,%r6541,%r6542,%r6543,%r6544,%r6545,%r6546,%r6547,%r6548,%r6549,%r6550,%r6551,%r6552,%r6553,%r6554,%r6555,%r6556,%r6557,%r6558,%r6559,%r6560,%r6561,%r6562,%r6563,%r6564,%r6565,%r6566,%r6567,%r6568,%r6569,%r6570,%r6571,%r6572,%r6573,%r6574,%r6575,%r6576,%r6577}, {%r5931,%r5932,%r5933,%r5934}, %rd151, %p30, 1, 1; 2026-02-21T10:23:48.7990957Z // end inline asm 2026-02-21T10:23:48.7991032Z wgmma.commit_group.sync.aligned; 2026-02-21T10:23:48.7991094Z mov.b32 %r5999, %r4869; 2026-02-21T10:23:48.7991157Z mov.b32 %r6001, %r6000; 2026-02-21T10:23:48.7991226Z // begin inline asm 2026-02-21T10:23:48.7992294Z // wait for regs: %r6514,%r6515,%r6516,%r6517,%r6518,%r6519,%r6520,%r6521,%r6522,%r6523,%r6524,%r6525,%r6526,%r6527,%r6528,%r6529,%r6530,%r6531,%r6532,%r6533,%r6534,%r6535,%r6536,%r6537,%r6538,%r6539,%r6540,%r6541,%r6542,%r6543,%r6544,%r6545,%r6546,%r6547,%r6548,%r6549,%r6550,%r6551,%r6552,%r6553,%r6554,%r6555,%r6556,%r6557,%r6558,%r6559,%r6560,%r6561,%r6562,%r6563,%r6564,%r6565,%r6566,%r6567,%r6568,%r6569,%r6570,%r6571,%r6572,%r6573,%r6574,%r6575,%r6576,%r6577,%r5999,%r6000,%r6001 2026-02-21T10:23:48.7992480Z wgmma.wait_group.sync.aligned 0; 2026-02-21T10:23:48.7992538Z // end inline asm 2026-02-21T10:23:48.7992592Z $L__tmp24: 2026-02-21T10:23:48.7992878Z .loc 1 22 121 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:22:121 2026-02-21T10:23:48.7992941Z add.s32 %r6196, %r6507, 32; 2026-02-21T10:23:48.7993005Z add.s32 %r6197, %r6510, 1; 2026-02-21T10:23:48.7993075Z setp.gt.s32 %p42, %r6197, 2; 2026-02-21T10:23:48.7993141Z selp.b32 %r6510, 0, %r6197, %p42; 2026-02-21T10:23:48.7993205Z selp.b32 %r6506, 0, %r6196, %p38; 2026-02-21T10:23:48.7993411Z .loc 1 48 22 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:48:22 2026-02-21T10:23:48.7993473Z shl.b32 %r6198, %r6506, 1; 
2026-02-21T10:23:48.7993674Z .loc 1 50 25 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:50:25 2026-02-21T10:23:48.7993740Z add.s32 %r6199, %r6198, %r25; 2026-02-21T10:23:48.7994017Z .loc 1 51 53 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:53 2026-02-21T10:23:48.7994085Z shl.b32 %r6200, %r6587, 13; 2026-02-21T10:23:48.7994151Z shl.b32 %r6201, %r6588, 13; 2026-02-21T10:23:48.7994360Z .loc 1 51 60 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:60 2026-02-21T10:23:48.7994466Z add.s32 %r6202, %r6200, %r6199; 2026-02-21T10:23:48.7994531Z add.s32 %r6203, %r6201, %r6199; 2026-02-21T10:23:48.7994734Z .loc 1 51 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:32 2026-02-21T10:23:48.7994806Z mad.wide.s32 %rd167, %r6202, 2, %rd23; 2026-02-21T10:23:48.7994874Z mad.wide.s32 %rd168, %r6203, 2, %rd23; 2026-02-21T10:23:48.7995073Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7995137Z shl.b32 %r6204, %r6510, 12; 2026-02-21T10:23:48.7995203Z add.s32 %r6205, %r6088, %r6204; 2026-02-21T10:23:48.7995266Z add.s32 %r6069, %r6205, %r328; 2026-02-21T10:23:48.7995344Z selp.b32 %r6070, 8, 0, %p39; 2026-02-21T10:23:48.7995410Z // begin inline asm 2026-02-21T10:23:48.7995562Z cp.async.ca.shared.global [ %r6069 + 0 ], [ %rd167 + 0 ], 0x8, %r6070; 2026-02-21T10:23:48.7995626Z // end inline asm 2026-02-21T10:23:48.7995689Z add.s32 %r6071, %r6069, 2048; 2026-02-21T10:23:48.7995751Z // begin inline asm 2026-02-21T10:23:48.7995888Z cp.async.ca.shared.global [ %r6071 + 0 ], [ %rd168 + 0 ], 0x8, %r6070; 2026-02-21T10:23:48.7995950Z // end inline asm 2026-02-21T10:23:48.7996017Z cp.async.commit_group; 2026-02-21T10:23:48.7996233Z .loc 1 43 126 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:43:126 2026-02-21T10:23:48.7996298Z add.s32 %r6501, %r6506, 8; 2026-02-21T10:23:48.7996632Z .loc 1 48 22 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:48:22 
2026-02-21T10:23:48.7996698Z shl.b32 %r6206, %r6501, 1; 2026-02-21T10:23:48.7996904Z .loc 1 50 25 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:50:25 2026-02-21T10:23:48.7996966Z add.s32 %r6207, %r6206, %r25; 2026-02-21T10:23:48.7997167Z .loc 1 51 60 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:60 2026-02-21T10:23:48.7997229Z add.s32 %r6208, %r6200, %r6207; 2026-02-21T10:23:48.7997296Z add.s32 %r6209, %r6201, %r6207; 2026-02-21T10:23:48.7997494Z .loc 1 51 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:32 2026-02-21T10:23:48.7997564Z mad.wide.s32 %rd169, %r6208, 2, %rd23; 2026-02-21T10:23:48.7997646Z mad.wide.s32 %rd170, %r6209, 2, %rd23; 2026-02-21T10:23:48.7997849Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.7997912Z add.s32 %r6210, %r6116, %r6204; 2026-02-21T10:23:48.7997978Z add.s32 %r6073, %r6210, %r328; 2026-02-21T10:23:48.7998134Z // begin inline asm 2026-02-21T10:23:48.7998276Z cp.async.ca.shared.global [ %r6073 + 0 ], [ %rd169 + 0 ], 0x8, %r6070; 2026-02-21T10:23:48.7998334Z // end inline asm 2026-02-21T10:23:48.7998400Z add.s32 %r6075, %r6073, 2048; 2026-02-21T10:23:48.7998521Z // begin inline asm 2026-02-21T10:23:48.7998658Z cp.async.ca.shared.global [ %r6075 + 0 ], [ %rd170 + 0 ], 0x8, %r6070; 2026-02-21T10:23:48.7998718Z // end inline asm 2026-02-21T10:23:48.7998787Z cp.async.commit_group; 2026-02-21T10:23:48.7998998Z .loc 1 43 126 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:43:126 2026-02-21T10:23:48.7999060Z add.s32 %r6498, %r6506, 16; 2026-02-21T10:23:48.7999259Z .loc 1 48 22 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:48:22 2026-02-21T10:23:48.7999319Z shl.b32 %r6211, %r6498, 1; 2026-02-21T10:23:48.7999517Z .loc 1 50 25 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:50:25 2026-02-21T10:23:48.7999655Z add.s32 %r6212, %r6211, %r25; 2026-02-21T10:23:48.7999868Z .loc 1 51 60 // 
cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:60 2026-02-21T10:23:48.7999932Z add.s32 %r6213, %r6200, %r6212; 2026-02-21T10:23:48.7999998Z add.s32 %r6214, %r6201, %r6212; 2026-02-21T10:23:48.8000198Z .loc 1 51 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:32 2026-02-21T10:23:48.8000333Z mad.wide.s32 %rd171, %r6213, 2, %rd23; 2026-02-21T10:23:48.8000407Z mad.wide.s32 %rd172, %r6214, 2, %rd23; 2026-02-21T10:23:48.8000618Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.8000680Z add.s32 %r6215, %r6143, %r6204; 2026-02-21T10:23:48.8000742Z add.s32 %r6077, %r6215, %r328; 2026-02-21T10:23:48.8000803Z // begin inline asm 2026-02-21T10:23:48.8000938Z cp.async.ca.shared.global [ %r6077 + 0 ], [ %rd171 + 0 ], 0x8, %r6070; 2026-02-21T10:23:48.8000998Z // end inline asm 2026-02-21T10:23:48.8001063Z add.s32 %r6079, %r6077, 2048; 2026-02-21T10:23:48.8001121Z // begin inline asm 2026-02-21T10:23:48.8001262Z cp.async.ca.shared.global [ %r6079 + 0 ], [ %rd172 + 0 ], 0x8, %r6070; 2026-02-21T10:23:48.8001325Z // end inline asm 2026-02-21T10:23:48.8001391Z cp.async.commit_group; 2026-02-21T10:23:48.8001605Z .loc 1 43 126 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:43:126 2026-02-21T10:23:48.8001667Z add.s32 %r6495, %r6506, 24; 2026-02-21T10:23:48.8001872Z .loc 1 48 22 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:48:22 2026-02-21T10:23:48.8001931Z shl.b32 %r6216, %r6495, 1; 2026-02-21T10:23:48.8002130Z .loc 1 50 25 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:50:25 2026-02-21T10:23:48.8002195Z add.s32 %r6217, %r6216, %r25; 2026-02-21T10:23:48.8002394Z .loc 1 51 60 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:60 2026-02-21T10:23:48.8002459Z add.s32 %r6218, %r6200, %r6217; 2026-02-21T10:23:48.8002524Z add.s32 %r6219, %r6201, %r6217; 2026-02-21T10:23:48.8002723Z .loc 1 51 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:32 
2026-02-21T10:23:48.8002794Z mad.wide.s32 %rd173, %r6218, 2, %rd23; 2026-02-21T10:23:48.8002864Z mad.wide.s32 %rd174, %r6219, 2, %rd23; 2026-02-21T10:23:48.8003065Z .loc 1 51 80 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:51:80 2026-02-21T10:23:48.8003131Z add.s32 %r6220, %r6170, %r6204; 2026-02-21T10:23:48.8003190Z add.s32 %r6081, %r6220, %r328; 2026-02-21T10:23:48.8003254Z // begin inline asm 2026-02-21T10:23:48.8003386Z cp.async.ca.shared.global [ %r6081 + 0 ], [ %rd173 + 0 ], 0x8, %r6070; 2026-02-21T10:23:48.8003443Z // end inline asm 2026-02-21T10:23:48.8003506Z add.s32 %r6083, %r6081, 2048; 2026-02-21T10:23:48.8003565Z // begin inline asm 2026-02-21T10:23:48.8003701Z cp.async.ca.shared.global [ %r6083 + 0 ], [ %rd174 + 0 ], 0x8, %r6070; 2026-02-21T10:23:48.8003829Z // end inline asm 2026-02-21T10:23:48.8003899Z cp.async.commit_group; 2026-02-21T10:23:48.8004108Z .loc 1 22 121 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:22:121 2026-02-21T10:23:48.8004222Z setp.ne.b32 %p43, %r6494, 127; 2026-02-21T10:23:48.8004286Z @%p43 bra $L__BB0_14; 2026-02-21T10:23:48.8004401Z // %bb.13: // in Loop: Header=BB0_10 Depth=1 2026-02-21T10:23:48.8004603Z .loc 1 34 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:34:32 2026-02-21T10:23:48.8004672Z add.s32 %r6294, %r6492, %r21; 2026-02-21T10:23:48.8004870Z .loc 1 36 32 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:36:32 2026-02-21T10:23:48.8004931Z add.s32 %r6295, %r6490, %r10; 2026-02-21T10:23:48.8004990Z add.s32 %r6296, %r6490, %r11; 2026-02-21T10:23:48.8005053Z add.s32 %r6297, %r6490, %r12; 2026-02-21T10:23:48.8005179Z add.s32 %r6298, %r6490, %r13; 2026-02-21T10:23:48.8005244Z add.s32 %r6299, %r6490, %r14; 2026-02-21T10:23:48.8005311Z add.s32 %r6300, %r6490, %r15; 2026-02-21T10:23:48.8005372Z add.s32 %r6301, %r6490, %r16; 2026-02-21T10:23:48.8005435Z add.s32 %r6302, %r6490, %r17; 2026-02-21T10:23:48.8005639Z .loc 1 90 28 // 
cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:90:28 2026-02-21T10:23:48.8005765Z cvt.rn.bf16x2.f32 %r6303, %r6515, %r6514; 2026-02-21T10:23:48.8005842Z cvt.rn.bf16x2.f32 %r6304, %r6517, %r6516; 2026-02-21T10:23:48.8005915Z cvt.rn.bf16x2.f32 %r6305, %r6519, %r6518; 2026-02-21T10:23:48.8005989Z cvt.rn.bf16x2.f32 %r6306, %r6521, %r6520; 2026-02-21T10:23:48.8006058Z cvt.rn.bf16x2.f32 %r6307, %r6523, %r6522; 2026-02-21T10:23:48.8006127Z cvt.rn.bf16x2.f32 %r6308, %r6525, %r6524; 2026-02-21T10:23:48.8006201Z cvt.rn.bf16x2.f32 %r6309, %r6527, %r6526; 2026-02-21T10:23:48.8006270Z cvt.rn.bf16x2.f32 %r6310, %r6529, %r6528; 2026-02-21T10:23:48.8006345Z cvt.rn.bf16x2.f32 %r6311, %r6531, %r6530; 2026-02-21T10:23:48.8006420Z cvt.rn.bf16x2.f32 %r6312, %r6533, %r6532; 2026-02-21T10:23:48.8006612Z cvt.rn.bf16x2.f32 %r6313, %r6535, %r6534; 2026-02-21T10:23:48.8006688Z cvt.rn.bf16x2.f32 %r6314, %r6537, %r6536; 2026-02-21T10:23:48.8006761Z cvt.rn.bf16x2.f32 %r6315, %r6539, %r6538; 2026-02-21T10:23:48.8006846Z cvt.rn.bf16x2.f32 %r6316, %r6541, %r6540; 2026-02-21T10:23:48.8006922Z cvt.rn.bf16x2.f32 %r6317, %r6543, %r6542; 2026-02-21T10:23:48.8006994Z cvt.rn.bf16x2.f32 %r6318, %r6545, %r6544; 2026-02-21T10:23:48.8007071Z cvt.rn.bf16x2.f32 %r6319, %r6547, %r6546; 2026-02-21T10:23:48.8007142Z cvt.rn.bf16x2.f32 %r6320, %r6549, %r6548; 2026-02-21T10:23:48.8007213Z cvt.rn.bf16x2.f32 %r6321, %r6551, %r6550; 2026-02-21T10:23:48.8007284Z cvt.rn.bf16x2.f32 %r6322, %r6553, %r6552; 2026-02-21T10:23:48.8007359Z cvt.rn.bf16x2.f32 %r6323, %r6555, %r6554; 2026-02-21T10:23:48.8007428Z cvt.rn.bf16x2.f32 %r6324, %r6557, %r6556; 2026-02-21T10:23:48.8007504Z cvt.rn.bf16x2.f32 %r6325, %r6559, %r6558; 2026-02-21T10:23:48.8007580Z cvt.rn.bf16x2.f32 %r6326, %r6561, %r6560; 2026-02-21T10:23:48.8007652Z cvt.rn.bf16x2.f32 %r6327, %r6563, %r6562; 2026-02-21T10:23:48.8007724Z cvt.rn.bf16x2.f32 %r6328, %r6565, %r6564; 2026-02-21T10:23:48.8007801Z cvt.rn.bf16x2.f32 %r6329, %r6567, %r6566; 
2026-02-21T10:23:48.8007874Z cvt.rn.bf16x2.f32 %r6330, %r6569, %r6568; 2026-02-21T10:23:48.8007947Z cvt.rn.bf16x2.f32 %r6331, %r6571, %r6570; 2026-02-21T10:23:48.8008018Z cvt.rn.bf16x2.f32 %r6332, %r6573, %r6572; 2026-02-21T10:23:48.8008092Z cvt.rn.bf16x2.f32 %r6333, %r6575, %r6574; 2026-02-21T10:23:48.8008162Z cvt.rn.bf16x2.f32 %r6334, %r6577, %r6576; 2026-02-21T10:23:48.8008375Z .loc 1 91 50 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:91:50 2026-02-21T10:23:48.8008450Z mad.lo.s32 %r6335, %r6295, 1280, %r6294; 2026-02-21T10:23:48.8008520Z mad.lo.s32 %r6336, %r6296, 1280, %r6294; 2026-02-21T10:23:48.8008681Z mad.lo.s32 %r6337, %r6297, 1280, %r6294; 2026-02-21T10:23:48.8008753Z mad.lo.s32 %r6338, %r6298, 1280, %r6294; 2026-02-21T10:23:48.8008822Z mad.lo.s32 %r6339, %r6299, 1280, %r6294; 2026-02-21T10:23:48.8008890Z mad.lo.s32 %r6340, %r6300, 1280, %r6294; 2026-02-21T10:23:48.8009021Z mad.lo.s32 %r6341, %r6301, 1280, %r6294; 2026-02-21T10:23:48.8009102Z mad.lo.s32 %r6342, %r6302, 1280, %r6294; 2026-02-21T10:23:48.8009315Z .loc 1 91 22 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:91:22 2026-02-21T10:23:48.8009385Z mad.wide.s32 %rd179, %r6335, 2, %rd25; 2026-02-21T10:23:48.8009456Z mad.wide.s32 %rd180, %r6336, 2, %rd25; 2026-02-21T10:23:48.8009523Z mad.wide.s32 %rd181, %r6337, 2, %rd25; 2026-02-21T10:23:48.8009589Z mad.wide.s32 %rd182, %r6338, 2, %rd25; 2026-02-21T10:23:48.8009659Z mad.wide.s32 %rd183, %r6339, 2, %rd25; 2026-02-21T10:23:48.8009726Z mad.wide.s32 %rd184, %r6340, 2, %rd25; 2026-02-21T10:23:48.8009793Z mad.wide.s32 %rd185, %r6341, 2, %rd25; 2026-02-21T10:23:48.8009927Z mad.wide.s32 %rd186, %r6342, 2, %rd25; 2026-02-21T10:23:48.8010135Z .loc 1 91 81 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:91:81 2026-02-21T10:23:48.8010192Z bar.sync 0; 2026-02-21T10:23:48.8010323Z st.shared.v4.b32 [%r343], {%r6303, %r6305, %r6307, %r6309}; 2026-02-21T10:23:48.8010449Z st.shared.v4.b32 [%r343+512], {%r6304, %r6306, %r6308, 
%r6310}; 2026-02-21T10:23:48.8010642Z st.shared.v4.b32 [%r344], {%r6311, %r6313, %r6315, %r6317}; 2026-02-21T10:23:48.8010770Z st.shared.v4.b32 [%r344+512], {%r6312, %r6314, %r6316, %r6318}; 2026-02-21T10:23:48.8010878Z st.shared.v4.b32 [%r345], {%r6319, %r6321, %r6323, %r6325}; 2026-02-21T10:23:48.8010986Z st.shared.v4.b32 [%r345+512], {%r6320, %r6322, %r6324, %r6326}; 2026-02-21T10:23:48.8011091Z st.shared.v4.b32 [%r346], {%r6327, %r6329, %r6331, %r6333}; 2026-02-21T10:23:48.8011201Z st.shared.v4.b32 [%r346+512], {%r6328, %r6330, %r6332, %r6334}; 2026-02-21T10:23:48.8011262Z bar.sync 0; 2026-02-21T10:23:48.8011337Z // begin inline asm 2026-02-21T10:23:48.8011533Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6261, %r6262, %r6263, %r6264}, [%r6225]; 2026-02-21T10:23:48.8011595Z // end inline asm 2026-02-21T10:23:48.8011654Z // begin inline asm 2026-02-21T10:23:48.8011841Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6265, %r6266, %r6267, %r6268}, [%r6230]; 2026-02-21T10:23:48.8011901Z // end inline asm 2026-02-21T10:23:48.8011962Z // begin inline asm 2026-02-21T10:23:48.8012151Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6269, %r6270, %r6271, %r6272}, [%r6235]; 2026-02-21T10:23:48.8012210Z // end inline asm 2026-02-21T10:23:48.8012273Z // begin inline asm 2026-02-21T10:23:48.8012451Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6273, %r6274, %r6275, %r6276}, [%r6240]; 2026-02-21T10:23:48.8012509Z // end inline asm 2026-02-21T10:23:48.8012572Z // begin inline asm 2026-02-21T10:23:48.8012750Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6277, %r6278, %r6279, %r6280}, [%r6245]; 2026-02-21T10:23:48.8012810Z // end inline asm 2026-02-21T10:23:48.8012871Z // begin inline asm 2026-02-21T10:23:48.8013046Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6281, %r6282, %r6283, %r6284}, [%r6250]; 2026-02-21T10:23:48.8013103Z // end inline asm 2026-02-21T10:23:48.8013164Z // begin inline asm 2026-02-21T10:23:48.8013344Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6285, %r6286, %r6287, 
%r6288}, [%r6255]; 2026-02-21T10:23:48.8013403Z // end inline asm 2026-02-21T10:23:48.8013461Z // begin inline asm 2026-02-21T10:23:48.8013642Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6289, %r6290, %r6291, %r6292}, [%r6260]; 2026-02-21T10:23:48.8013699Z // end inline asm 2026-02-21T10:23:48.8013758Z // begin inline asm 2026-02-21T10:23:48.8013884Z st.global.v4.b32 [ %rd179 + 0 ], { %r6261, %r6262, %r6263, %r6264 }; 2026-02-21T10:23:48.8013944Z // end inline asm 2026-02-21T10:23:48.8014003Z // begin inline asm 2026-02-21T10:23:48.8014123Z st.global.v4.b32 [ %rd180 + 0 ], { %r6265, %r6266, %r6267, %r6268 }; 2026-02-21T10:23:48.8014252Z // end inline asm 2026-02-21T10:23:48.8014320Z // begin inline asm 2026-02-21T10:23:48.8014447Z st.global.v4.b32 [ %rd181 + 0 ], { %r6269, %r6270, %r6271, %r6272 }; 2026-02-21T10:23:48.8014507Z // end inline asm 2026-02-21T10:23:48.8014617Z // begin inline asm 2026-02-21T10:23:48.8014734Z st.global.v4.b32 [ %rd182 + 0 ], { %r6273, %r6274, %r6275, %r6276 }; 2026-02-21T10:23:48.8014791Z // end inline asm 2026-02-21T10:23:48.8014856Z // begin inline asm 2026-02-21T10:23:48.8014973Z st.global.v4.b32 [ %rd183 + 0 ], { %r6277, %r6278, %r6279, %r6280 }; 2026-02-21T10:23:48.8015030Z // end inline asm 2026-02-21T10:23:48.8015091Z // begin inline asm 2026-02-21T10:23:48.8015204Z st.global.v4.b32 [ %rd184 + 0 ], { %r6281, %r6282, %r6283, %r6284 }; 2026-02-21T10:23:48.8015262Z // end inline asm 2026-02-21T10:23:48.8015320Z // begin inline asm 2026-02-21T10:23:48.8015436Z st.global.v4.b32 [ %rd185 + 0 ], { %r6285, %r6286, %r6287, %r6288 }; 2026-02-21T10:23:48.8015508Z // end inline asm 2026-02-21T10:23:48.8015619Z // begin inline asm 2026-02-21T10:23:48.8015741Z st.global.v4.b32 [ %rd186 + 0 ], { %r6289, %r6290, %r6291, %r6292 }; 2026-02-21T10:23:48.8015798Z // end inline asm 2026-02-21T10:23:48.8015862Z mov.b32 %r6514, 0f00000000; 2026-02-21T10:23:48.8015932Z mov.b32 %r6515, %r6514; 2026-02-21T10:23:48.8015993Z mov.b32 %r6516, %r6514; 
2026-02-21T10:23:48.8016064Z mov.b32 %r6517, %r6514; 2026-02-21T10:23:48.8016178Z mov.b32 %r6518, %r6514; 2026-02-21T10:23:48.8016246Z mov.b32 %r6519, %r6514; 2026-02-21T10:23:48.8016306Z mov.b32 %r6520, %r6514; 2026-02-21T10:23:48.8016364Z mov.b32 %r6521, %r6514; 2026-02-21T10:23:48.8016428Z mov.b32 %r6522, %r6514; 2026-02-21T10:23:48.8016603Z mov.b32 %r6523, %r6514; 2026-02-21T10:23:48.8016668Z mov.b32 %r6524, %r6514; 2026-02-21T10:23:48.8016740Z mov.b32 %r6525, %r6514; 2026-02-21T10:23:48.8016805Z mov.b32 %r6526, %r6514; 2026-02-21T10:23:48.8016863Z mov.b32 %r6527, %r6514; 2026-02-21T10:23:48.8016925Z mov.b32 %r6528, %r6514; 2026-02-21T10:23:48.8016987Z mov.b32 %r6529, %r6514; 2026-02-21T10:23:48.8017046Z mov.b32 %r6530, %r6514; 2026-02-21T10:23:48.8017104Z mov.b32 %r6531, %r6514; 2026-02-21T10:23:48.8017163Z mov.b32 %r6532, %r6514; 2026-02-21T10:23:48.8017229Z mov.b32 %r6533, %r6514; 2026-02-21T10:23:48.8017288Z mov.b32 %r6534, %r6514; 2026-02-21T10:23:48.8017345Z mov.b32 %r6535, %r6514; 2026-02-21T10:23:48.8017406Z mov.b32 %r6536, %r6514; 2026-02-21T10:23:48.8017465Z mov.b32 %r6537, %r6514; 2026-02-21T10:23:48.8017523Z mov.b32 %r6538, %r6514; 2026-02-21T10:23:48.8017583Z mov.b32 %r6539, %r6514; 2026-02-21T10:23:48.8017648Z mov.b32 %r6540, %r6514; 2026-02-21T10:23:48.8017708Z mov.b32 %r6541, %r6514; 2026-02-21T10:23:48.8017766Z mov.b32 %r6542, %r6514; 2026-02-21T10:23:48.8017829Z mov.b32 %r6543, %r6514; 2026-02-21T10:23:48.8017887Z mov.b32 %r6544, %r6514; 2026-02-21T10:23:48.8017945Z mov.b32 %r6545, %r6514; 2026-02-21T10:23:48.8018002Z mov.b32 %r6546, %r6514; 2026-02-21T10:23:48.8018065Z mov.b32 %r6547, %r6514; 2026-02-21T10:23:48.8018125Z mov.b32 %r6548, %r6514; 2026-02-21T10:23:48.8018186Z mov.b32 %r6549, %r6514; 2026-02-21T10:23:48.8018247Z mov.b32 %r6550, %r6514; 2026-02-21T10:23:48.8018306Z mov.b32 %r6551, %r6514; 2026-02-21T10:23:48.8018376Z mov.b32 %r6552, %r6514; 2026-02-21T10:23:48.8018438Z mov.b32 %r6553, %r6514; 2026-02-21T10:23:48.8018502Z mov.b32 
%r6554, %r6514; 2026-02-21T10:23:48.8018562Z mov.b32 %r6555, %r6514; 2026-02-21T10:23:48.8018621Z mov.b32 %r6556, %r6514; 2026-02-21T10:23:48.8018684Z mov.b32 %r6557, %r6514; 2026-02-21T10:23:48.8018743Z mov.b32 %r6558, %r6514; 2026-02-21T10:23:48.8018800Z mov.b32 %r6559, %r6514; 2026-02-21T10:23:48.8018864Z mov.b32 %r6560, %r6514; 2026-02-21T10:23:48.8018921Z mov.b32 %r6561, %r6514; 2026-02-21T10:23:48.8018979Z mov.b32 %r6562, %r6514; 2026-02-21T10:23:48.8019048Z mov.b32 %r6563, %r6514; 2026-02-21T10:23:48.8019113Z mov.b32 %r6564, %r6514; 2026-02-21T10:23:48.8019172Z mov.b32 %r6565, %r6514; 2026-02-21T10:23:48.8019315Z mov.b32 %r6566, %r6514; 2026-02-21T10:23:48.8019377Z mov.b32 %r6567, %r6514; 2026-02-21T10:23:48.8019435Z mov.b32 %r6568, %r6514; 2026-02-21T10:23:48.8019494Z mov.b32 %r6569, %r6514; 2026-02-21T10:23:48.8019617Z mov.b32 %r6570, %r6514; 2026-02-21T10:23:48.8019679Z mov.b32 %r6571, %r6514; 2026-02-21T10:23:48.8019737Z mov.b32 %r6572, %r6514; 2026-02-21T10:23:48.8019795Z mov.b32 %r6573, %r6514; 2026-02-21T10:23:48.8019856Z mov.b32 %r6574, %r6514; 2026-02-21T10:23:48.8019916Z mov.b32 %r6575, %r6514; 2026-02-21T10:23:48.8019974Z mov.b32 %r6576, %r6514; 2026-02-21T10:23:48.8020034Z mov.b32 %r6577, %r6514; 2026-02-21T10:23:48.8020108Z bra.uni $L__BB0_14; 2026-02-21T10:23:48.8020211Z $L__BB0_15: // %._crit_edge141 2026-02-21T10:23:48.8020436Z .loc 1 22 121 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:22:121 2026-02-21T10:23:48.8020506Z cp.async.wait_group 0; 2026-02-21T10:23:48.8020568Z bar.sync 0; 2026-02-21T10:23:48.8020840Z .loc 1 22 4 // cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py:22:4 2026-02-21T10:23:48.8020900Z ret; 2026-02-21T10:23:48.8020957Z $L__tmp25: 2026-02-21T10:23:48.8021013Z $L__func_end0: 2026-02-21T10:23:48.8021104Z // -- End function 2026-02-21T10:23:48.8021162Z } 2026-02-21T10:23:48.8021509Z .file 1 "/tmp/torchinductor_root/gy/cgyaoumnyokgyfvzfvs2s5yqqe3tqxa4uywvhfry3liy4rwxab5c.py" 2026-02-21T10:23:48.8021729Z 
.file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T10:23:48.8021798Z .section .debug_abbrev 2026-02-21T10:23:48.8021850Z { 2026-02-21T10:23:48.8021946Z .b8 1 // Abbreviation Code 2026-02-21T10:23:48.8022043Z .b8 17 // DW_TAG_compile_unit 2026-02-21T10:23:48.8022132Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:23:48.8022220Z .b8 37 // DW_AT_producer 2026-02-21T10:23:48.8022304Z .b8 8 // DW_FORM_string 2026-02-21T10:23:48.8022388Z .b8 19 // DW_AT_language 2026-02-21T10:23:48.8022471Z .b8 5 // DW_FORM_data2 2026-02-21T10:23:48.8022551Z .b8 3 // DW_AT_name 2026-02-21T10:23:48.8022641Z .b8 8 // DW_FORM_string 2026-02-21T10:23:48.8022732Z .b8 16 // DW_AT_stmt_list 2026-02-21T10:23:48.8022811Z .b8 6 // DW_FORM_data4 2026-02-21T10:23:48.8022896Z .b8 27 // DW_AT_comp_dir 2026-02-21T10:23:48.8022973Z .b8 8 // DW_FORM_string 2026-02-21T10:23:48.8023047Z .b8 0 // EOM(1) 2026-02-21T10:23:48.8023117Z .b8 0 // EOM(2) 2026-02-21T10:23:48.8023215Z .b8 2 // Abbreviation Code 2026-02-21T10:23:48.8023304Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:23:48.8023385Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:23:48.8023468Z .b8 3 // DW_AT_name 2026-02-21T10:23:48.8023546Z .b8 8 // DW_FORM_string 2026-02-21T10:23:48.8023628Z .b8 32 // DW_AT_inline 2026-02-21T10:23:48.8023713Z .b8 11 // DW_FORM_data1 2026-02-21T10:23:48.8023785Z .b8 0 // EOM(1) 2026-02-21T10:23:48.8023853Z .b8 0 // EOM(2) 2026-02-21T10:23:48.8023937Z .b8 3 // Abbreviation Code 2026-02-21T10:23:48.8024025Z .b8 46 // DW_TAG_subprogram 2026-02-21T10:23:48.8024109Z .b8 1 // DW_CHILDREN_yes 2026-02-21T10:23:48.8024249Z .b8 17 // DW_AT_low_pc 2026-02-21T10:23:48.8024331Z .b8 1 // DW_FORM_addr 2026-02-21T10:23:48.8024424Z .b8 18 // DW_AT_high_pc 2026-02-21T10:23:48.8024502Z .b8 1 // DW_FORM_addr 2026-02-21T10:23:48.8024647Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:23:48.8024728Z .b8 19 // DW_FORM_ref4 2026-02-21T10:23:48.8024807Z .b8 0 // EOM(1) 2026-02-21T10:23:48.8024879Z .b8 0 // 
EOM(2) 2026-02-21T10:23:48.8024970Z .b8 4 // Abbreviation Code 2026-02-21T10:23:48.8025070Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T10:23:48.8025151Z .b8 0 // DW_CHILDREN_no 2026-02-21T10:23:48.8025248Z .b8 49 // DW_AT_abstract_origin 2026-02-21T10:23:48.8025389Z .b8 19 // DW_FORM_ref4 2026-02-21T10:23:48.8025471Z .b8 17 // DW_AT_low_pc 2026-02-21T10:23:48.8025551Z .b8 1 // DW_FORM_addr 2026-02-21T10:23:48.8025635Z .b8 18 // DW_AT_high_pc 2026-02-21T10:23:48.8025713Z .b8 1 // DW_FORM_addr 2026-02-21T10:23:48.8025842Z .b8 88 // DW_AT_call_file 2026-02-21T10:23:48.8025922Z .b8 11 // DW_FORM_data1 2026-02-21T10:23:48.8026003Z .b8 89 // DW_AT_call_line 2026-02-21T10:23:48.8026081Z .b8 11 // DW_FORM_data1 2026-02-21T10:23:48.8026167Z .b8 87 // DW_AT_call_column 2026-02-21T10:23:48.8026243Z .b8 11 // DW_FORM_data1 2026-02-21T10:23:48.8026313Z .b8 0 // EOM(1) 2026-02-21T10:23:48.8026390Z .b8 0 // EOM(2) 2026-02-21T10:23:48.8026587Z .b8 0 // EOM(3) 2026-02-21T10:23:48.8026648Z } 2026-02-21T10:23:48.8026716Z .section .debug_info 2026-02-21T10:23:48.8026772Z { 2026-02-21T10:23:48.8026861Z .b32 178 // Length of Unit 2026-02-21T10:23:48.8026954Z .b8 2 // DWARF version number 2026-02-21T10:23:48.8027025Z .b8 0 2026-02-21T10:23:48.8027158Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T10:23:48.8027253Z .b8 8 // Address Size (in bytes) 2026-02-21T10:23:48.8027367Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T10:23:48.8027457Z .b8 116 // DW_AT_producer 2026-02-21T10:23:48.8027513Z .b8 114 2026-02-21T10:23:48.8027567Z .b8 105 2026-02-21T10:23:48.8027625Z .b8 116 2026-02-21T10:23:48.8027680Z .b8 111 2026-02-21T10:23:48.8027732Z .b8 110 2026-02-21T10:23:48.8027792Z .b8 0 2026-02-21T10:23:48.8027872Z .b8 2 // DW_AT_language 2026-02-21T10:23:48.8027924Z .b8 0 2026-02-21T10:23:48.8028005Z .b8 99 // DW_AT_name 2026-02-21T10:23:48.8028065Z .b8 103 2026-02-21T10:23:48.8028118Z .b8 121 2026-02-21T10:23:48.8028170Z .b8 97 2026-02-21T10:23:48.8028224Z .b8 111 2026-02-21T10:23:48.8028280Z .b8 117 2026-02-21T10:23:48.8028333Z .b8 109 2026-02-21T10:23:48.8028395Z .b8 110 2026-02-21T10:23:48.8028452Z .b8 121 2026-02-21T10:23:48.8028561Z .b8 111 2026-02-21T10:23:48.8028615Z .b8 107 2026-02-21T10:23:48.8028669Z .b8 103 2026-02-21T10:23:48.8028719Z .b8 121 2026-02-21T10:23:48.8028769Z .b8 102 2026-02-21T10:23:48.8028823Z .b8 118 2026-02-21T10:23:48.8028877Z .b8 122 2026-02-21T10:23:48.8028929Z .b8 102 2026-02-21T10:23:48.8028985Z .b8 118 2026-02-21T10:23:48.8029036Z .b8 115 2026-02-21T10:23:48.8029092Z .b8 50 2026-02-21T10:23:48.8029143Z .b8 115 2026-02-21T10:23:48.8029285Z .b8 53 2026-02-21T10:23:48.8029342Z .b8 121 2026-02-21T10:23:48.8029393Z .b8 113 2026-02-21T10:23:48.8029446Z .b8 113 2026-02-21T10:23:48.8029499Z .b8 101 2026-02-21T10:23:48.8029555Z .b8 51 2026-02-21T10:23:48.8029608Z .b8 116 2026-02-21T10:23:48.8029725Z .b8 113 2026-02-21T10:23:48.8029780Z .b8 120 2026-02-21T10:23:48.8029833Z .b8 97 2026-02-21T10:23:48.8029894Z .b8 52 2026-02-21T10:23:48.8029946Z .b8 117 2026-02-21T10:23:48.8030004Z .b8 121 2026-02-21T10:23:48.8030055Z .b8 119 2026-02-21T10:23:48.8030106Z .b8 118 2026-02-21T10:23:48.8030160Z .b8 104 2026-02-21T10:23:48.8030210Z .b8 102 2026-02-21T10:23:48.8030263Z .b8 114 2026-02-21T10:23:48.8030314Z .b8 121 
2026-02-21T10:23:48.8030369Z .b8 51 2026-02-21T10:23:48.8030421Z .b8 108 2026-02-21T10:23:48.8030473Z .b8 105 2026-02-21T10:23:48.8030526Z .b8 121 2026-02-21T10:23:48.8030578Z .b8 52 2026-02-21T10:23:48.8030630Z .b8 114 2026-02-21T10:23:48.8030685Z .b8 119 2026-02-21T10:23:48.8030740Z .b8 120 2026-02-21T10:23:48.8030795Z .b8 97 2026-02-21T10:23:48.8030845Z .b8 98 2026-02-21T10:23:48.8030990Z .b8 53 2026-02-21T10:23:48.8031049Z .b8 99 2026-02-21T10:23:48.8031100Z .b8 46 2026-02-21T10:23:48.8031153Z .b8 112 2026-02-21T10:23:48.8031208Z .b8 121 2026-02-21T10:23:48.8031258Z .b8 0 2026-02-21T10:23:48.8031366Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T10:23:48.8031452Z .b8 47 // DW_AT_comp_dir 2026-02-21T10:23:48.8031581Z .b8 116 2026-02-21T10:23:48.8031638Z .b8 109 2026-02-21T10:23:48.8031688Z .b8 112 2026-02-21T10:23:48.8031742Z .b8 47 2026-02-21T10:23:48.8031793Z .b8 116 2026-02-21T10:23:48.8031845Z .b8 111 2026-02-21T10:23:48.8031897Z .b8 114 2026-02-21T10:23:48.8031952Z .b8 99 2026-02-21T10:23:48.8032004Z .b8 104 2026-02-21T10:23:48.8032055Z .b8 105 2026-02-21T10:23:48.8032106Z .b8 110 2026-02-21T10:23:48.8032161Z .b8 100 2026-02-21T10:23:48.8032213Z .b8 117 2026-02-21T10:23:48.8032264Z .b8 99 2026-02-21T10:23:48.8032319Z .b8 116 2026-02-21T10:23:48.8032371Z .b8 111 2026-02-21T10:23:48.8032444Z .b8 114 2026-02-21T10:23:48.8032499Z .b8 95 2026-02-21T10:23:48.8032556Z .b8 114 2026-02-21T10:23:48.8032608Z .b8 111 2026-02-21T10:23:48.8032660Z .b8 111 2026-02-21T10:23:48.8032714Z .b8 116 2026-02-21T10:23:48.8032767Z .b8 47 2026-02-21T10:23:48.8032822Z .b8 103 2026-02-21T10:23:48.8032873Z .b8 121 2026-02-21T10:23:48.8032926Z .b8 0 2026-02-21T10:23:48.8033042Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T10:23:48.8033122Z .b8 95 // DW_AT_name 2026-02-21T10:23:48.8033184Z .b8 104 2026-02-21T10:23:48.8033242Z .b8 101 2026-02-21T10:23:48.8033293Z .b8 108 2026-02-21T10:23:48.8033345Z .b8 105 2026-02-21T10:23:48.8033400Z .b8 111 2026-02-21T10:23:48.8033449Z 
.b8 110 2026-02-21T10:23:48.8033500Z .b8 95 2026-02-21T10:23:48.8033553Z .b8 109 2026-02-21T10:23:48.8033603Z .b8 97 2026-02-21T10:23:48.8033655Z .b8 116 2026-02-21T10:23:48.8033705Z .b8 109 2026-02-21T10:23:48.8033758Z .b8 117 2026-02-21T10:23:48.8033812Z .b8 108 2026-02-21T10:23:48.8033864Z .b8 95 2026-02-21T10:23:48.8033916Z .b8 98 2026-02-21T10:23:48.8033972Z .b8 102 2026-02-21T10:23:48.8034022Z .b8 49 2026-02-21T10:23:48.8034072Z .b8 54 2026-02-21T10:23:48.8034128Z .b8 95 2026-02-21T10:23:48.8034183Z .b8 105 2026-02-21T10:23:48.8034239Z .b8 110 2026-02-21T10:23:48.8034291Z .b8 116 2026-02-21T10:23:48.8034345Z .b8 52 2026-02-21T10:23:48.8034394Z .b8 0 2026-02-21T10:23:48.8034480Z .b8 1 // DW_AT_inline 2026-02-21T10:23:48.8034590Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T10:23:48.8034685Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T10:23:48.8034780Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T10:23:48.8034882Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:23:48.8035022Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T10:23:48.8035127Z .b32 108 // DW_AT_abstract_origin 2026-02-21T10:23:48.8035275Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T10:23:48.8035368Z .b64 $L__tmp24 // DW_AT_high_pc 2026-02-21T10:23:48.8035501Z .b8 1 // DW_AT_call_file 2026-02-21T10:23:48.8035583Z .b8 87 // DW_AT_call_line 2026-02-21T10:23:48.8035685Z .b8 40 // DW_AT_call_column 2026-02-21T10:23:48.8035780Z .b8 0 // End Of Children Mark 2026-02-21T10:23:48.8035867Z .b8 0 // End Of Children Mark 2026-02-21T10:23:48.8035923Z } 2026-02-21T10:23:48.8035992Z .section .debug_macinfo { } 2026-02-21T10:23:48.8035999Z 2026-02-21T10:23:48.8036078Z ================================================================ 2026-02-21T10:23:48.8036197Z please share the reproducer above with Triton project. 
2026-02-21T10:23:53.4958248Z 2026-02-21T10:23:53.4959883Z Generation 19: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 58/58 1.6 configs/s 2026-02-21T10:23:56.0041361Z Generation 19: verifying top configs 100% ━━━━━━━━━━━━━━━━━━ 34/34 9.1 configs/s 2026-02-21T10:23:57.4604104Z [3430s] Generation 19 complete: 2026-02-21T10:23:57.4604371Z error=16 2026-02-21T10:23:57.4604547Z ok=44 2026-02-21T10:23:57.4604706Z min=6.1400 2026-02-21T10:23:57.4604891Z mid=10.9486 2026-02-21T10:23:57.4605650Z max=764.9005 2026-02-21T10:23:57.4606030Z best={'block_sizes': [16, 128, 128], 2026-02-21T10:23:57.4607121Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T10:23:57.4607726Z 'l2_groupings': [8], 2026-02-21T10:23:57.4608102Z 'load_eviction_policies': ['last', ''], 2026-02-21T10:23:57.4608532Z 'loop_orders': [[1, 0]], 2026-02-21T10:23:57.4608873Z 'maxnreg': 256, 2026-02-21T10:23:57.4609179Z 'num_sm_multiplier': 64, 2026-02-21T10:23:57.4609520Z 'num_stages': 1, 2026-02-21T10:23:57.4609818Z 'num_warps': 4, 2026-02-21T10:23:57.4610169Z 'pid_type': 'persistent_blocked', 2026-02-21T10:23:57.4610591Z 'range_flattens': [True, False], 2026-02-21T10:23:57.4610989Z 'range_multi_buffers': [False, True], 2026-02-21T10:23:57.4611406Z 'range_num_stages': [3, 1], 2026-02-21T10:23:57.4611770Z 'range_unroll_factors': [1, 1], 2026-02-21T10:23:57.4612166Z 'range_warp_specializes': []} 2026-02-21T10:23:57.4674905Z [3430s] Fitting surrogate: 1561 points, 1561 targets 2026-02-21T10:23:58.2666388Z [3431s] Generation 20 starting: 42 neighbors, 2 active search path(s) 2026-02-21T10:24:23.3818114Z Generation 20: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 43/43 0.4 configs/s 2026-02-21T10:24:46.7485981Z Generation 20: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 43/43 1.8 configs/s 2026-02-21T10:24:48.0039777Z Generation 20: verifying top configs 100% ━━━━━━━━━━━━━━━━━ 34/34 13.9 configs/s 2026-02-21T10:24:49.4504791Z [3482s] Generation 20 complete: 
2026-02-21T10:24:49.4505096Z error=8 2026-02-21T10:24:49.4505274Z ok=36 2026-02-21T10:24:49.4505467Z min=6.3065 2026-02-21T10:24:49.4505659Z mid=15.0160 2026-02-21T10:24:49.4505827Z max=827.0081 2026-02-21T10:24:49.4506028Z best={'block_sizes': [16, 128, 128], 2026-02-21T10:24:49.4506394Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T10:24:49.4512545Z 'l2_groupings': [8], 2026-02-21T10:24:49.4512794Z 'load_eviction_policies': ['last', ''], 2026-02-21T10:24:49.4513053Z 'loop_orders': [[1, 0]], 2026-02-21T10:24:49.4513256Z 'maxnreg': 256, 2026-02-21T10:24:49.4513440Z 'num_sm_multiplier': 64, 2026-02-21T10:24:49.4513626Z 'num_stages': 1, 2026-02-21T10:24:49.4513795Z 'num_warps': 4, 2026-02-21T10:24:49.4513976Z 'pid_type': 'persistent_blocked', 2026-02-21T10:24:49.4514209Z 'range_flattens': [True, False], 2026-02-21T10:24:49.4514438Z 'range_multi_buffers': [False, True], 2026-02-21T10:24:49.4514665Z 'range_num_stages': [3, 1], 2026-02-21T10:24:49.4514872Z 'range_unroll_factors': [1, 1], 2026-02-21T10:24:49.4515085Z 'range_warp_specializes': []} 2026-02-21T10:24:49.4574707Z [3482s] Fitting surrogate: 1605 points, 1605 targets 2026-02-21T10:24:49.6336240Z [3482s] Autotuning complete in 3482.7s after searching 1541 configs. 
2026-02-21T10:24:49.6337073Z One can hardcode the best config and skip autotuning with: 2026-02-21T10:24:49.6339361Z @helion.kernel(config=helion.Config(block_sizes=[16, 128, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=64, num_stages=1, num_warps=4, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[False, True], range_num_stages=[3, 1], range_unroll_factors=[1, 1], range_warp_specializes=[]), static_shapes=True) 2026-02-21T10:24:49.6341155Z 2026-02-21T10:24:49.6341603Z [3482s] Code of selected kernel: /tmp/torchinductor_root/jq/cjqn2v73tefbtynhheptwfmrz5yoq7t67asle73uskprh47svbbi.py 2026-02-21T10:24:50.8351165Z WARNING:tritonbench.utils.triton_op:Completed input ID 24: 2026-02-21T10:24:50.8351827Z x_val 2026-02-21T10:24:50.8352505Z ---------------------- 2026-02-21T10:24:50.8352742Z (16, 4096, 1280, 8192) 2026-02-21T10:24:50.8352892Z 2026-02-21T10:24:50.8396010Z 80%|████████ | 8/10 [2:06:31<52:19, 1569.67s/it]WARNING:tritonbench.utils.triton_op:Running input ID 28: 2026-02-21T10:24:50.8396852Z x_val 2026-02-21T10:24:50.8397049Z ---------------------- 2026-02-21T10:24:50.8397260Z (64, 4096, 1280, 8192) 2026-02-21T10:24:50.8439694Z INFO:tritonbench.utils.triton_op:Took 0.29ms to get benchmark function for preprocessed_eager_int4_gemm 2026-02-21T10:24:51.9556213Z INFO:tritonbench.utils.triton_op:Took 3.36ms to get benchmark function for preprocessed_torch_compile_int4_gemm 2026-02-21T10:24:56.2968075Z Autotune Choices Stats: 2026-02-21T10:24:56.2970562Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 8.163455963134766, "best_triton_pos": 1, "best_triton_time": 10.960448265075684, "best_triton_kernel": "triton_mm_124", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, 
num_stages=3, num_warps=4"} 2026-02-21T10:24:56.2975501Z AUTOTUNE mm(262144x8192, 8192x1280) 2026-02-21T10:24:56.2975766Z strides: [8192, 1], [1280, 1] 2026-02-21T10:24:56.2976008Z dtypes: torch.bfloat16, torch.bfloat16 2026-02-21T10:24:56.2976286Z mm 8.1635 ms 100.0% 2026-02-21T10:24:56.2977082Z triton_mm_124 10.9604 ms 74.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T10:24:56.2978108Z triton_mm_125 11.4628 ms 71.2% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T10:24:56.2979143Z triton_mm_126 11.7791 ms 69.3% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2026-02-21T10:24:56.2980163Z triton_mm_123 15.5235 ms 52.6% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2026-02-21T10:24:56.2981171Z triton_mm_117 15.8725 ms 51.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T10:24:56.2982187Z triton_mm_120 17.0630 ms 47.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2026-02-21T10:24:56.2983185Z triton_mm_122 17.3901 ms 46.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2026-02-21T10:24:56.2984194Z triton_mm_118 17.9447 ms 45.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2026-02-21T10:24:56.2985666Z triton_mm_119 
17.9911 ms 45.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T10:24:56.2986829Z SingleProcess AUTOTUNE benchmarking takes 3.7628 seconds and 0.3404 seconds precompiling for 20 choices 2026-02-21T10:24:57.6577700Z INFO:tritonbench.utils.triton_op:Took 0.20ms to get benchmark function for preprocessed_triton_int4_gemm 2026-02-21T10:24:59.7992038Z WARNING:__main__:Input tensor metadata: 2026-02-21T10:24:59.7992599Z { 'args': ( { 'device': 'cuda:0', 2026-02-21T10:24:59.7993047Z 'dtype': 'torch.bfloat16', 2026-02-21T10:24:59.7993503Z 'shape': (64, 4096, 8192), 2026-02-21T10:24:59.7993917Z 'stride': (33554432, 8192, 1)}, 2026-02-21T10:24:59.7994432Z { 'device': 'cuda:0', 2026-02-21T10:24:59.7994902Z 'dtype': 'torch.int32', 2026-02-21T10:24:59.7995206Z 'shape': (8192, 1280), 2026-02-21T10:24:59.7995844Z 'stride': (1280, 1)}), 2026-02-21T10:24:59.7996103Z 'kwargs': {}} 2026-02-21T10:24:59.8091514Z INFO:tritonbench.utils.triton_op:Took 10.39ms to get benchmark function for helion_int4_gemm_tritonbench 2026-02-21T10:25:00.2254612Z [0s] Autotune random seed: 2135373392 2026-02-21T10:25:00.7454479Z [0s] Starting LFBOPatternSearch with initial_population=FROM_RANDOM, copies=5, max_generations=20, similarity_penalty=1.0 2026-02-21T10:25:35.6850060Z [34s] Timeout after 30s compiling Config(block_sizes=[16, 8192, 2], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['first', 'last'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=4, num_stages=3, num_warps=8, pid_type='persistent_interleaved', range_flattens=[None, None], range_multi_buffers=[None, None], range_num_stages=[1, 3], range_unroll_factors=[0, 2], range_warp_specializes=[]) 2026-02-21T10:25:36.9444883Z [36s] Timeout after 30s compiling Config(block_sizes=[16, 1, 1024], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[4], 
load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], num_sm_multiplier=16, num_stages=3, num_warps=1, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[True, False], range_num_stages=[0, 3], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T10:25:41.3841596Z [40s] Timeout after 30s compiling Config(block_sizes=[8, 2048, 1], indexing=['pointer', 'pointer', 'tensor_descriptor'], l2_groupings=[16], load_eviction_policies=['first', ''], loop_orders=[[0, 1]], num_sm_multiplier=2, num_stages=8, num_warps=2, pid_type='persistent_interleaved', range_flattens=[False, True], range_multi_buffers=[None, False], range_num_stages=[2, 4], range_unroll_factors=[4, 1], range_warp_specializes=[]) 2026-02-21T10:25:43.1979941Z [42s] Timeout after 30s compiling Config(block_sizes=[128, 512, 8], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['last', 'last'], loop_orders=[[0, 1]], maxnreg=128, num_sm_multiplier=4, num_stages=4, num_warps=16, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[True, True], range_num_stages=[2, 2], range_unroll_factors=[4, 0], range_warp_specializes=[]) 2026-02-21T10:25:46.9059513Z [46s] Timeout after 30s compiling Config(block_sizes=[64, 4096, 1], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=32, num_stages=8, num_warps=8, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[False, True], range_num_stages=[3, 0], range_unroll_factors=[3, 1], range_warp_specializes=[]) 2026-02-21T10:26:01.0280140Z [60s] Timeout after 30s compiling Config(block_sizes=[1024, 8, 8], indexing=['pointer', 'pointer', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['first', 'first'], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=16, num_stages=7, 
num_warps=1, pid_type='persistent_interleaved', range_flattens=[None, True], range_multi_buffers=[None, True], range_num_stages=[1, 0], range_unroll_factors=[2, 0], range_warp_specializes=[]) 2026-02-21T10:26:01.6402933Z [60s] Timeout after 30s compiling Config(block_sizes=[16, 1024, 1], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[32], load_eviction_policies=['', 'first'], loop_orders=[[0, 1]], num_stages=2, num_warps=1, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, True], range_num_stages=[0, 3], range_unroll_factors=[0, 4], range_warp_specializes=[]) 2026-02-21T10:26:02.2213466Z [61s] Timeout after 30s compiling Config(block_sizes=[8, 2048, 4], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[1], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], num_stages=5, num_warps=1, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, None], range_num_stages=[0, 0], range_unroll_factors=[0, 1], range_warp_specializes=[]) 2026-02-21T10:26:02.2235201Z Initial population precompiling 100% ━━━━━━━━━━━━━━━━━━━━━ 100/100 0.4 configs/s 2026-02-21T10:30:14.9677294Z [314s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 16, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=128, num_sm_multiplier=128, num_stages=8, num_warps=1, pid_type='persistent_blocked', range_flattens=[False, True], range_multi_buffers=[True, None], range_num_stages=[2, 2], range_unroll_factors=[2, 1], range_warp_specializes=[]) 2026-02-21T10:30:14.9679199Z Tensor-likes are not close! 
2026-02-21T10:30:14.9679364Z 2026-02-21T10:30:14.9679484Z Mismatched elements: 334842684 / 335544320 (99.8%) 2026-02-21T10:30:14.9679916Z Greatest absolute difference: 7232.0 at index (126054, 532) (up to 0.01 allowed) 2026-02-21T10:30:14.9680451Z Greatest relative difference: inf at index (60515, 1164) (up to 0.01 allowed) 2026-02-21T10:30:14.9680931Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T10:30:14.9681191Z 2026-02-21T10:46:15.3322740Z [1274s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 2048, 64], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['first', 'last'], loop_orders=[[1, 0]], num_stages=5, num_warps=16, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, True], range_num_stages=[0, 2], range_unroll_factors=[0, 2], range_warp_specializes=[]) 2026-02-21T10:46:15.3324443Z Tensor-likes are not close! 2026-02-21T10:46:15.3324605Z 2026-02-21T10:46:15.3324727Z Mismatched elements: 335044614 / 335544320 (99.9%) 2026-02-21T10:46:15.3325162Z Greatest absolute difference: 9984.0 at index (2420, 1093) (up to 0.01 allowed) 2026-02-21T10:46:15.3325713Z Greatest relative difference: inf at index (60515, 1164) (up to 0.01 allowed) 2026-02-21T10:46:15.3326195Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T10:46:15.3326595Z 2026-02-21T10:51:33.4730295Z [1592s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 8192, 32], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[1], load_eviction_policies=['last', 'last'], loop_orders=[[1, 0]], maxnreg=128, num_sm_multiplier=128, num_stages=2, num_warps=16, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[True, None], range_num_stages=[1, 0], range_unroll_factors=[3, 3], range_warp_specializes=[]) 2026-02-21T10:51:33.4733293Z Tensor-likes are not close! 
2026-02-21T10:51:33.4733563Z 2026-02-21T10:51:33.4733764Z Mismatched elements: 335041577 / 335544320 (99.9%) 2026-02-21T10:51:33.4734491Z Greatest absolute difference: 10560.0 at index (152456, 1194) (up to 0.01 allowed) 2026-02-21T10:51:33.4735372Z Greatest relative difference: inf at index (60515, 1164) (up to 0.01 allowed) 2026-02-21T10:51:33.4736122Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T10:51:33.4737636Z 2026-02-21T10:58:02.7655768Z [1982s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 32, 512], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[2], load_eviction_policies=['first', 'first'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=2, num_stages=2, num_warps=32, pid_type='persistent_interleaved', range_flattens=[True, False], range_multi_buffers=[None, True], range_num_stages=[1, 4], range_unroll_factors=[1, 4], range_warp_specializes=[]) 2026-02-21T10:58:02.7659876Z Tensor-likes are not close! 2026-02-21T10:58:02.7660176Z 2026-02-21T10:58:02.7660388Z Mismatched elements: 334811300 / 335544320 (99.8%) 2026-02-21T10:58:02.7660816Z Greatest absolute difference: 7040.0 at index (187897, 775) (up to 0.01 allowed) 2026-02-21T10:58:02.7661297Z Greatest relative difference: inf at index (60515, 1164) (up to 0.01 allowed) 2026-02-21T10:58:02.7661716Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T10:58:02.7662119Z 2026-02-21T11:12:21.0510297Z [2840s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 2048, 16], indexing=['pointer', 'pointer', 'tensor_descriptor'], l2_groupings=[8], load_eviction_policies=['', 'first'], loop_orders=[[1, 0]], num_sm_multiplier=128, num_stages=3, num_warps=32, pid_type='persistent_interleaved', range_flattens=[True, True], range_multi_buffers=[True, True], range_num_stages=[1, 0], range_unroll_factors=[1, 4], range_warp_specializes=[]) 2026-02-21T11:12:21.0512136Z Tensor-likes are not close! 
2026-02-21T11:12:21.0512313Z 2026-02-21T11:12:21.0512437Z Mismatched elements: 335099718 / 335544320 (99.9%) 2026-02-21T11:12:21.0512920Z Greatest absolute difference: 10624.0 at index (150408, 1186) (up to 0.01 allowed) 2026-02-21T11:12:21.0513465Z Greatest relative difference: inf at index (60515, 1164) (up to 0.01 allowed) 2026-02-21T11:12:21.0513943Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T11:12:21.0514209Z 2026-02-21T11:45:45.4320295Z [4844s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 64, 16], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[16], load_eviction_policies=['first', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=64, num_stages=8, num_warps=2, pid_type='persistent_blocked', range_flattens=[None, False], range_multi_buffers=[False, None], range_num_stages=[1, 3], range_unroll_factors=[2, 2], range_warp_specializes=[]) 2026-02-21T11:45:45.4322305Z Tensor-likes are not close! 2026-02-21T11:45:45.4322525Z 2026-02-21T11:45:45.4322667Z Mismatched elements: 335161033 / 335544320 (99.9%) 2026-02-21T11:45:45.4323184Z Greatest absolute difference: 10752.0 at index (149928, 1186) (up to 0.01 allowed) 2026-02-21T11:45:45.4323770Z Greatest relative difference: inf at index (60515, 1164) (up to 0.01 allowed) 2026-02-21T11:45:45.4324315Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T11:45:45.4324606Z 2026-02-21T12:29:14.8446332Z Initial population exploring neighbors 100% ━━━━━━━━━━━━━━━━ 100/100 - configs/s 2026-02-21T12:29:14.8474618Z [7454s] Adaptive compile timeout: 30s (90% percentile=30.0s, bounds=[30.0s, 30s]) 2026-02-21T12:29:15.1928133Z Verifying initial results 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 3/3 - configs/s 2026-02-21T12:29:16.1977310Z [7455s] Initial random population of 100, 5 starting points: 2026-02-21T12:29:16.1977747Z error=12 2026-02-21T12:29:16.1977993Z timeout=8 2026-02-21T12:29:16.1978237Z ok=80 2026-02-21T12:29:16.1978433Z min=94.1015 2026-02-21T12:29:16.1978646Z mid=4922.3569 2026-02-21T12:29:16.1978870Z max=71313.1797 2026-02-21T12:29:16.1979102Z best={'block_sizes': [4, 512, 64], 2026-02-21T12:29:16.1979495Z 'indexing': ['pointer', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T12:29:16.1979867Z 'l2_groupings': [4], 2026-02-21T12:29:16.1980171Z 'load_eviction_policies': ['', 'first'], 2026-02-21T12:29:16.1980460Z 'loop_orders': [[1, 0]], 2026-02-21T12:29:16.1980735Z 'num_sm_multiplier': 16, 2026-02-21T12:29:16.1981339Z 'num_stages': 4, 2026-02-21T12:29:16.1981575Z 'num_warps': 8, 2026-02-21T12:29:16.1981816Z 'pid_type': 'persistent_interleaved', 2026-02-21T12:29:16.1982145Z 'range_flattens': [True, False], 2026-02-21T12:29:16.1982602Z 'range_multi_buffers': [False, False], 2026-02-21T12:29:16.1982894Z 'range_num_stages': [4, 0], 2026-02-21T12:29:16.1983199Z 'range_unroll_factors': [0, 0], 2026-02-21T12:29:16.1983471Z 'range_warp_specializes': []} 2026-02-21T12:29:16.2003386Z [7455s] Fitting surrogate: 100 points, 100 targets 2026-02-21T12:29:18.0065513Z [7457s] Generation 1 starting: 108 neighbors, 5 active search path(s) 2026-02-21T12:30:11.2079318Z Generation 1: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 109/109 0.8 configs/s 2026-02-21T12:30:12.2011782Z [7511s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 512, 256], indexing=['pointer', 'tensor_descriptor', 
'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', ''], loop_orders=[[1, 0]], num_sm_multiplier=16, num_stages=4, num_warps=2, pid_type='persistent_interleaved', range_flattens=[None, False], range_multi_buffers=[False, False], range_num_stages=[4, 0], range_unroll_factors=[0, 0], range_warp_specializes=[]) 2026-02-21T12:30:12.2013627Z Tensor-likes are not close! 2026-02-21T12:30:12.2013806Z 2026-02-21T12:30:12.2013969Z Mismatched elements: 334524510 / 335544320 (99.7%) 2026-02-21T12:30:12.2014591Z Greatest absolute difference: 4128.0 at index (237144, 114) (up to 0.01 allowed) 2026-02-21T12:30:12.2015151Z Greatest relative difference: inf at index (60515, 1164) (up to 0.01 allowed) 2026-02-21T12:30:12.2015621Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T12:30:12.2015909Z 2026-02-21T12:30:13.1905322Z [7512s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 256, 64], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', 'last'], loop_orders=[[1, 0]], maxnreg=128, num_sm_multiplier=16, num_stages=4, num_warps=2, pid_type='persistent_interleaved', range_flattens=[True, False], range_multi_buffers=[False, False], range_num_stages=[4, 0], range_unroll_factors=[0, 0], range_warp_specializes=[]) 2026-02-21T12:30:13.1908051Z Tensor-likes are not close! 2026-02-21T12:30:13.1908454Z 2026-02-21T12:30:13.1908686Z Mismatched elements: 334508245 / 335544320 (99.7%) 2026-02-21T12:30:13.1909423Z Greatest absolute difference: 4288.0 at index (112754, 610) (up to 0.01 allowed) 2026-02-21T12:30:13.1910242Z Greatest relative difference: inf at index (60515, 1164) (up to 0.01 allowed) 2026-02-21T12:30:13.1910871Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T12:30:13.1911242Z 2026-02-21T12:31:12.2022622Z [7571s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 512, 64], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[4], load_eviction_policies=['', 'first'], loop_orders=[[1, 0]], num_sm_multiplier=16, num_stages=4, num_warps=8, pid_type='persistent_interleaved', range_flattens=[False, False], range_multi_buffers=[False, False], range_num_stages=[4, 0], range_unroll_factors=[0, 0], range_warp_specializes=[]) 2026-02-21T12:31:12.2024598Z Tensor-likes are not close! 2026-02-21T12:31:12.2024786Z 2026-02-21T12:31:12.2024920Z Mismatched elements: 334859175 / 335544320 (99.8%) 2026-02-21T12:31:12.2025406Z Greatest absolute difference: 7296.0 at index (123117, 77) (up to 0.01 allowed) 2026-02-21T12:31:12.2025959Z Greatest relative difference: inf at index (60515, 1164) (up to 0.01 allowed) 2026-02-21T12:31:12.2026794Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T12:31:12.2027065Z 2026-02-21T12:31:22.0685314Z 2026-02-21T12:31:22.0685329Z 2026-02-21T12:31:22.0685920Z ================================================================ 2026-02-21T12:31:22.0686341Z Internal Triton PTX codegen error 2026-02-21T12:31:22.0687091Z `ptxas` stderr: 2026-02-21T12:31:22.0687894Z ptxas fatal : (C7602) Insufficient registers (128) to compile instruction at line 1884 in function _helion_matmul_bf16_int4. Try to compile with register target of 158 or higher. 
2026-02-21T12:31:22.0689225Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T12:31:22.0689490Z 2026-02-21T12:31:22.0690332Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpbhy24nkw.ptx -o /tmp/tmpbhy24nkw.ptx.o 2026-02-21T12:31:22.0691064Z 2026-02-21T12:31:22.0691069Z 2026-02-21T12:31:22.0691158Z // 2026-02-21T12:31:22.0691396Z // Generated by LLVM NVPTX Back-End 2026-02-21T12:31:22.0691668Z // 2026-02-21T12:31:22.0691802Z 2026-02-21T12:31:22.0691893Z .version 8.7 2026-02-21T12:31:22.0692108Z .target sm_90a 2026-02-21T12:31:22.0692358Z .address_size 64 2026-02-21T12:31:22.0692484Z 2026-02-21T12:31:22.0692733Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T12:31:22.0693171Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T12:31:22.0693636Z // @_helion_matmul_bf16_int4 2026-02-21T12:31:22.0693984Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T12:31:22.0694453Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T12:31:22.0694920Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T12:31:22.0695392Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T12:31:22.0696015Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T12:31:22.0696672Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T12:31:22.0697069Z ) 2026-02-21T12:31:22.0697307Z .reqntid 128 2026-02-21T12:31:22.0697603Z .maxnreg 128 2026-02-21T12:31:22.0697823Z { 2026-02-21T12:31:22.0698089Z .reg .pred %p<23>; 2026-02-21T12:31:22.0698304Z .reg .b16 %rs<81>; 2026-02-21T12:31:22.0698544Z .reg .b32 %r<11284>; 2026-02-21T12:31:22.0698778Z .reg .b64 %rd<668>; 2026-02-21T12:31:22.0699151Z .loc 1 14 0 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:14:0 2026-02-21T12:31:22.0699589Z $L__func_begin0: 
2026-02-21T12:31:22.0699986Z .loc 1 14 0 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:14:0 2026-02-21T12:31:22.0700350Z 2026-02-21T12:31:22.0700451Z // %bb.0: 2026-02-21T12:31:22.0700735Z ld.param.b64 %rd185, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T12:31:22.0701502Z $L__tmp0: 2026-02-21T12:31:22.0701884Z .loc 1 19 46 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:19:46 2026-02-21T12:31:22.0702298Z mov.u32 %r3142, %ctaid.x; 2026-02-21T12:31:22.0702697Z .loc 1 19 52 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:19:52 2026-02-21T12:31:22.0703151Z cvt.u64.u32 %rd667, %r3142; 2026-02-21T12:31:22.0703533Z .loc 1 31 45 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:31:45 2026-02-21T12:31:22.0703961Z mov.u32 %r1, %tid.x; 2026-02-21T12:31:22.0704320Z .loc 1 33 45 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:33:45 2026-02-21T12:31:22.0704786Z and.b32 %r3144, %r1, 127; 2026-02-21T12:31:22.0705002Z or.b32 %r3145, %r3144, 128; 2026-02-21T12:31:22.0705275Z or.b32 %r3146, %r3144, 256; 2026-02-21T12:31:22.0705529Z or.b32 %r3147, %r3144, 384; 2026-02-21T12:31:22.0705893Z .loc 1 33 32 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:33:32 2026-02-21T12:31:22.0706343Z cvt.u64.u32 %rd3, %r3144; 2026-02-21T12:31:22.0706741Z cvt.u64.u32 %rd4, %r3145; 2026-02-21T12:31:22.0706999Z cvt.u64.u32 %rd5, %r3146; 2026-02-21T12:31:22.0707216Z cvt.u64.u32 %rd6, %r3147; 2026-02-21T12:31:22.0707641Z .loc 1 19 135 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:19:135 2026-02-21T12:31:22.0708077Z sub.s64 %rd200, 2560, %rd667; 2026-02-21T12:31:22.0708391Z cvt.u32.u64 %r3276, %rd200; 2026-02-21T12:31:22.0708793Z mul.hi.s32 %r3277, %r3276, 1041204193; 2026-02-21T12:31:22.0709047Z shr.u32 %r3278, %r3277, 31; 2026-02-21T12:31:22.0709289Z shr.s32 %r3279, %r3277, 9; 2026-02-21T12:31:22.0709514Z add.s32 %r3280, %r3279, %r3278; 2026-02-21T12:31:22.0709876Z cvt.s64.s32 %rd135, %r3280; 2026-02-21T12:31:22.0710191Z 
mul.wide.s32 %rd201, %r3280, 2112; 2026-02-21T12:31:22.0710488Z setp.ne.b64 %p1, %rd200, %rd201; 2026-02-21T12:31:22.0710764Z setp.lt.u32 %p2, %r3142, 2561; 2026-02-21T12:31:22.0711056Z and.pred %p3, %p2, %p1; 2026-02-21T12:31:22.0711314Z selp.b64 %rd136, 1, 0, %p3; 2026-02-21T12:31:22.0711560Z add.s64 %rd137, %rd136, %rd135; 2026-02-21T12:31:22.0711813Z setp.lt.s64 %p4, %rd137, 1; 2026-02-21T12:31:22.0712102Z setp.gt.s64 %p5, %rd137, 0; 2026-02-21T12:31:22.0725949Z .loc 1 26 33 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:26:33 2026-02-21T12:31:22.0726591Z shr.u64 %rd202, %rd667, 9; 2026-02-21T12:31:22.0726843Z and.b64 %rd138, %rd202, 4194300; 2026-02-21T12:31:22.0727334Z .loc 1 27 39 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:27:39 2026-02-21T12:31:22.0727753Z sub.s64 %rd204, 5, %rd138; 2026-02-21T12:31:22.0728092Z .loc 1 27 52 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:27:52 2026-02-21T12:31:22.0728464Z min.s64 %rd205, %rd204, 4; 2026-02-21T12:31:22.0728855Z .loc 1 28 45 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:28:45 2026-02-21T12:31:22.0729236Z and.b32 %r5, %r3142, 2047; 2026-02-21T12:31:22.0729562Z .loc 1 29 51 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:29:51 2026-02-21T12:31:22.0729922Z cvt.u32.u64 %r6, %rd205; 2026-02-21T12:31:22.0730107Z div.s32 %r7, %r5, %r6; 2026-02-21T12:31:22.0730464Z .loc 1 32 27 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:32:27 2026-02-21T12:31:22.0730893Z mul.wide.s32 %rd645, %r7, 512; 2026-02-21T12:31:22.0731275Z .loc 1 33 32 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:33:32 2026-02-21T12:31:22.0731649Z or.b64 %rd663, %rd645, %rd3; 2026-02-21T12:31:22.0731845Z or.b64 %rd664, %rd645, %rd4; 2026-02-21T12:31:22.0732027Z or.b64 %rd665, %rd645, %rd5; 2026-02-21T12:31:22.0732210Z or.b64 %rd666, %rd645, %rd6; 2026-02-21T12:31:22.0732529Z .loc 1 48 32 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:48:32 
2026-02-21T12:31:22.0732948Z shl.b64 %rd206, %rd663, 14; 2026-02-21T12:31:22.0733138Z add.s64 %rd188, %rd185, %rd206; 2026-02-21T12:31:22.0733348Z shl.b64 %rd207, %rd664, 14; 2026-02-21T12:31:22.0733552Z add.s64 %rd189, %rd185, %rd207; 2026-02-21T12:31:22.0733754Z shl.b64 %rd208, %rd665, 14; 2026-02-21T12:31:22.0733938Z add.s64 %rd190, %rd185, %rd208; 2026-02-21T12:31:22.0734130Z shl.b64 %rd209, %rd666, 14; 2026-02-21T12:31:22.0734327Z add.s64 %rd191, %rd185, %rd209; 2026-02-21T12:31:22.0734656Z .loc 1 48 80 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:48:80 2026-02-21T12:31:22.0735023Z shl.b32 %r8, %r3144, 4; 2026-02-21T12:31:22.0735202Z mov.b32 %r3281, global_smem; 2026-02-21T12:31:22.0735395Z add.s32 %r3118, %r3281, %r8; 2026-02-21T12:31:22.0735580Z selp.b32 %r3119, 16, 0, %p5; 2026-02-21T12:31:22.0735766Z // begin inline asm 2026-02-21T12:31:22.0736021Z cp.async.cg.shared.global [ %r3118 + 0 ], [ %rd188 + 0 ], 0x10, %r3119; 2026-02-21T12:31:22.0736316Z // end inline asm 2026-02-21T12:31:22.0736632Z add.s32 %r3120, %r3118, 2048; 2026-02-21T12:31:22.0736823Z // begin inline asm 2026-02-21T12:31:22.0737082Z cp.async.cg.shared.global [ %r3120 + 0 ], [ %rd189 + 0 ], 0x10, %r3119; 2026-02-21T12:31:22.0737369Z // end inline asm 2026-02-21T12:31:22.0737537Z add.s32 %r3122, %r3118, 4096; 2026-02-21T12:31:22.0737723Z // begin inline asm 2026-02-21T12:31:22.0737982Z cp.async.cg.shared.global [ %r3122 + 0 ], [ %rd190 + 0 ], 0x10, %r3119; 2026-02-21T12:31:22.0738375Z // end inline asm 2026-02-21T12:31:22.0738535Z add.s32 %r3124, %r3118, 6144; 2026-02-21T12:31:22.0738721Z // begin inline asm 2026-02-21T12:31:22.0738963Z cp.async.cg.shared.global [ %r3124 + 0 ], [ %rd191 + 0 ], 0x10, %r3119; 2026-02-21T12:31:22.0739321Z // end inline asm 2026-02-21T12:31:22.0739478Z cp.async.commit_group; 2026-02-21T12:31:22.0739817Z .loc 1 48 32 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:48:32 2026-02-21T12:31:22.0740182Z add.s64 %rd192, %rd188, 16; 
2026-02-21T12:31:22.0740369Z add.s64 %rd193, %rd189, 16; 2026-02-21T12:31:22.0740558Z add.s64 %rd194, %rd190, 16; 2026-02-21T12:31:22.0740735Z add.s64 %rd195, %rd191, 16; 2026-02-21T12:31:22.0741058Z .loc 1 48 80 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:48:80 2026-02-21T12:31:22.0741407Z bar.sync 0; 2026-02-21T12:31:22.0741564Z add.s32 %r3126, %r3118, 8192; 2026-02-21T12:31:22.0741740Z // begin inline asm 2026-02-21T12:31:22.0742073Z cp.async.cg.shared.global [ %r3126 + 0 ], [ %rd192 + 0 ], 0x10, %r3119; 2026-02-21T12:31:22.0742362Z // end inline asm 2026-02-21T12:31:22.0742527Z add.s32 %r3128, %r3118, 10240; 2026-02-21T12:31:22.0742720Z // begin inline asm 2026-02-21T12:31:22.0742955Z cp.async.cg.shared.global [ %r3128 + 0 ], [ %rd193 + 0 ], 0x10, %r3119; 2026-02-21T12:31:22.0743241Z // end inline asm 2026-02-21T12:31:22.0743406Z add.s32 %r3130, %r3118, 12288; 2026-02-21T12:31:22.0743667Z // begin inline asm 2026-02-21T12:31:22.0743903Z cp.async.cg.shared.global [ %r3130 + 0 ], [ %rd194 + 0 ], 0x10, %r3119; 2026-02-21T12:31:22.0744187Z // end inline asm 2026-02-21T12:31:22.0744350Z add.s32 %r3132, %r3118, 14336; 2026-02-21T12:31:22.0744543Z // begin inline asm 2026-02-21T12:31:22.0744775Z cp.async.cg.shared.global [ %r3132 + 0 ], [ %rd195 + 0 ], 0x10, %r3119; 2026-02-21T12:31:22.0745060Z // end inline asm 2026-02-21T12:31:22.0745224Z cp.async.commit_group; 2026-02-21T12:31:22.0745540Z .loc 1 48 32 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:48:32 2026-02-21T12:31:22.0745909Z add.s64 %rd196, %rd188, 32; 2026-02-21T12:31:22.0746105Z add.s64 %rd197, %rd189, 32; 2026-02-21T12:31:22.0746295Z add.s64 %rd198, %rd190, 32; 2026-02-21T12:31:22.0746645Z add.s64 %rd199, %rd191, 32; 2026-02-21T12:31:22.0746985Z .loc 1 48 80 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:48:80 2026-02-21T12:31:22.0747356Z bar.sync 0; 2026-02-21T12:31:22.0747508Z add.s32 %r3134, %r3118, 16384; 2026-02-21T12:31:22.0747709Z // begin inline asm 
2026-02-21T12:31:22.0747956Z cp.async.cg.shared.global [ %r3134 + 0 ], [ %rd196 + 0 ], 0x10, %r3119; 2026-02-21T12:31:22.0748339Z // end inline asm 2026-02-21T12:31:22.0748503Z add.s32 %r3136, %r3118, 18432; 2026-02-21T12:31:22.0748699Z // begin inline asm 2026-02-21T12:31:22.0748934Z cp.async.cg.shared.global [ %r3136 + 0 ], [ %rd197 + 0 ], 0x10, %r3119; 2026-02-21T12:31:22.0749222Z // end inline asm 2026-02-21T12:31:22.0749381Z add.s32 %r3138, %r3118, 20480; 2026-02-21T12:31:22.0749574Z // begin inline asm 2026-02-21T12:31:22.0749815Z cp.async.cg.shared.global [ %r3138 + 0 ], [ %rd198 + 0 ], 0x10, %r3119; 2026-02-21T12:31:22.0750094Z // end inline asm 2026-02-21T12:31:22.0750262Z add.s32 %r3140, %r3118, 22528; 2026-02-21T12:31:22.0750444Z // begin inline asm 2026-02-21T12:31:22.0750684Z cp.async.cg.shared.global [ %r3140 + 0 ], [ %rd199 + 0 ], 0x10, %r3119; 2026-02-21T12:31:22.0750966Z // end inline asm 2026-02-21T12:31:22.0751134Z cp.async.commit_group; 2026-02-21T12:31:22.0751458Z .loc 1 19 135 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:19:135 2026-02-21T12:31:22.0751833Z @%p4 bra $L__BB0_10; 2026-02-21T12:31:22.0752020Z // %bb.1: // %.lr.ph 2026-02-21T12:31:22.0752392Z .loc 1 0 135 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:0:135 2026-02-21T12:31:22.0752817Z ld.param.b64 %rd187, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T12:31:22.0753231Z ld.param.b64 %rd186, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T12:31:22.0753491Z shr.u32 %r2, %r1, 5; 2026-02-21T12:31:22.0753654Z shl.b32 %r3, %r1, 3; 2026-02-21T12:31:22.0753823Z and.b32 %r3143, %r3, 248; 2026-02-21T12:31:22.0754080Z cvt.u64.u32 %rd2, %r3143; 2026-02-21T12:31:22.0754264Z and.b32 %r4, %r1, 96; 2026-02-21T12:31:22.0754442Z bfe.u32 %r3148, %r1, 5, 2; 2026-02-21T12:31:22.0754623Z or.b32 %r3149, %r3148, 4; 2026-02-21T12:31:22.0754799Z or.b32 %r3150, %r3148, 8; 2026-02-21T12:31:22.0754968Z or.b32 %r3151, %r3148, 12; 2026-02-21T12:31:22.0755152Z or.b32 %r3152, %r3148, 16; 
2026-02-21T12:31:22.0755322Z or.b32 %r3153, %r3148, 20; 2026-02-21T12:31:22.0755504Z or.b32 %r3154, %r3148, 24; 2026-02-21T12:31:22.0755677Z or.b32 %r3155, %r3148, 28; 2026-02-21T12:31:22.0755854Z or.b32 %r3156, %r3148, 32; 2026-02-21T12:31:22.0756033Z or.b32 %r3157, %r3148, 36; 2026-02-21T12:31:22.0756205Z or.b32 %r3158, %r3148, 40; 2026-02-21T12:31:22.0756385Z or.b32 %r3159, %r3148, 44; 2026-02-21T12:31:22.0756809Z or.b32 %r3160, %r3148, 48; 2026-02-21T12:31:22.0757001Z or.b32 %r3161, %r3148, 52; 2026-02-21T12:31:22.0757171Z or.b32 %r3162, %r3148, 56; 2026-02-21T12:31:22.0757362Z or.b32 %r3163, %r3148, 60; 2026-02-21T12:31:22.0757539Z or.b32 %r3164, %r3148, 64; 2026-02-21T12:31:22.0757716Z or.b32 %r3165, %r3148, 68; 2026-02-21T12:31:22.0757884Z or.b32 %r3166, %r3148, 72; 2026-02-21T12:31:22.0758134Z or.b32 %r3167, %r3148, 76; 2026-02-21T12:31:22.0758322Z or.b32 %r3168, %r3148, 80; 2026-02-21T12:31:22.0758490Z or.b32 %r3169, %r3148, 84; 2026-02-21T12:31:22.0758667Z or.b32 %r3170, %r3148, 88; 2026-02-21T12:31:22.0758841Z or.b32 %r3171, %r3148, 92; 2026-02-21T12:31:22.0759032Z or.b32 %r3172, %r3148, 96; 2026-02-21T12:31:22.0759207Z or.b32 %r3173, %r3148, 100; 2026-02-21T12:31:22.0759388Z or.b32 %r3174, %r3148, 104; 2026-02-21T12:31:22.0759562Z or.b32 %r3175, %r3148, 108; 2026-02-21T12:31:22.0759743Z or.b32 %r3176, %r3148, 112; 2026-02-21T12:31:22.0759921Z or.b32 %r3177, %r3148, 116; 2026-02-21T12:31:22.0760104Z or.b32 %r3178, %r3148, 120; 2026-02-21T12:31:22.0760301Z or.b32 %r3179, %r3148, 124; 2026-02-21T12:31:22.0760481Z or.b32 %r3180, %r3148, 128; 2026-02-21T12:31:22.0760662Z or.b32 %r3181, %r3148, 132; 2026-02-21T12:31:22.0760833Z or.b32 %r3182, %r3148, 136; 2026-02-21T12:31:22.0761010Z or.b32 %r3183, %r3148, 140; 2026-02-21T12:31:22.0761182Z or.b32 %r3184, %r3148, 144; 2026-02-21T12:31:22.0761359Z or.b32 %r3185, %r3148, 148; 2026-02-21T12:31:22.0761538Z or.b32 %r3186, %r3148, 152; 2026-02-21T12:31:22.0761709Z or.b32 %r3187, %r3148, 156; 
2026-02-21T12:31:22.0761885Z or.b32 %r3188, %r3148, 160; 2026-02-21T12:31:22.0762054Z or.b32 %r3189, %r3148, 164; 2026-02-21T12:31:22.0762232Z or.b32 %r3190, %r3148, 168; 2026-02-21T12:31:22.0762402Z or.b32 %r3191, %r3148, 172; 2026-02-21T12:31:22.0762583Z or.b32 %r3192, %r3148, 176; 2026-02-21T12:31:22.0762754Z or.b32 %r3193, %r3148, 180; 2026-02-21T12:31:22.0762934Z or.b32 %r3194, %r3148, 184; 2026-02-21T12:31:22.0763107Z or.b32 %r3195, %r3148, 188; 2026-02-21T12:31:22.0763284Z or.b32 %r3196, %r3148, 192; 2026-02-21T12:31:22.0763477Z or.b32 %r3197, %r3148, 196; 2026-02-21T12:31:22.0763656Z or.b32 %r3198, %r3148, 200; 2026-02-21T12:31:22.0763839Z or.b32 %r3199, %r3148, 204; 2026-02-21T12:31:22.0764010Z or.b32 %r3200, %r3148, 208; 2026-02-21T12:31:22.0764193Z or.b32 %r3201, %r3148, 212; 2026-02-21T12:31:22.0764367Z or.b32 %r3202, %r3148, 216; 2026-02-21T12:31:22.0764543Z or.b32 %r3203, %r3148, 220; 2026-02-21T12:31:22.0764714Z or.b32 %r3204, %r3148, 224; 2026-02-21T12:31:22.0764889Z or.b32 %r3205, %r3148, 228; 2026-02-21T12:31:22.0765063Z or.b32 %r3206, %r3148, 232; 2026-02-21T12:31:22.0765234Z or.b32 %r3207, %r3148, 236; 2026-02-21T12:31:22.0765412Z or.b32 %r3208, %r3148, 240; 2026-02-21T12:31:22.0765578Z or.b32 %r3209, %r3148, 244; 2026-02-21T12:31:22.0765756Z or.b32 %r3210, %r3148, 248; 2026-02-21T12:31:22.0765925Z or.b32 %r3211, %r3148, 252; 2026-02-21T12:31:22.0766206Z or.b32 %r3212, %r3148, 256; 2026-02-21T12:31:22.0766378Z or.b32 %r3213, %r3148, 260; 2026-02-21T12:31:22.0766690Z or.b32 %r3214, %r3148, 264; 2026-02-21T12:31:22.0766861Z or.b32 %r3215, %r3148, 268; 2026-02-21T12:31:22.0767128Z or.b32 %r3216, %r3148, 272; 2026-02-21T12:31:22.0767302Z or.b32 %r3217, %r3148, 276; 2026-02-21T12:31:22.0767481Z or.b32 %r3218, %r3148, 280; 2026-02-21T12:31:22.0767665Z or.b32 %r3219, %r3148, 284; 2026-02-21T12:31:22.0767840Z or.b32 %r3220, %r3148, 288; 2026-02-21T12:31:22.0768016Z or.b32 %r3221, %r3148, 292; 2026-02-21T12:31:22.0768187Z or.b32 %r3222, %r3148, 296; 
2026-02-21T12:31:22.0768361Z or.b32 %r3223, %r3148, 300; 2026-02-21T12:31:22.0768536Z or.b32 %r3224, %r3148, 304; 2026-02-21T12:31:22.0768716Z or.b32 %r3225, %r3148, 308; 2026-02-21T12:31:22.0768905Z or.b32 %r3226, %r3148, 312; 2026-02-21T12:31:22.0769074Z or.b32 %r3227, %r3148, 316; 2026-02-21T12:31:22.0769256Z or.b32 %r3228, %r3148, 320; 2026-02-21T12:31:22.0769505Z or.b32 %r3229, %r3148, 324; 2026-02-21T12:31:22.0769689Z or.b32 %r3230, %r3148, 328; 2026-02-21T12:31:22.0769865Z or.b32 %r3231, %r3148, 332; 2026-02-21T12:31:22.0770038Z or.b32 %r3232, %r3148, 336; 2026-02-21T12:31:22.0770242Z or.b32 %r3233, %r3148, 340; 2026-02-21T12:31:22.0770417Z or.b32 %r3234, %r3148, 344; 2026-02-21T12:31:22.0770601Z or.b32 %r3235, %r3148, 348; 2026-02-21T12:31:22.0770850Z or.b32 %r3236, %r3148, 352; 2026-02-21T12:31:22.0771033Z or.b32 %r3237, %r3148, 356; 2026-02-21T12:31:22.0771203Z or.b32 %r3238, %r3148, 360; 2026-02-21T12:31:22.0771379Z or.b32 %r3239, %r3148, 364; 2026-02-21T12:31:22.0771556Z or.b32 %r3240, %r3148, 368; 2026-02-21T12:31:22.0771726Z or.b32 %r3241, %r3148, 372; 2026-02-21T12:31:22.0771915Z or.b32 %r3242, %r3148, 376; 2026-02-21T12:31:22.0772086Z or.b32 %r3243, %r3148, 380; 2026-02-21T12:31:22.0772258Z or.b32 %r3244, %r3148, 384; 2026-02-21T12:31:22.0772427Z or.b32 %r3245, %r3148, 388; 2026-02-21T12:31:22.0772602Z or.b32 %r3246, %r3148, 392; 2026-02-21T12:31:22.0772770Z or.b32 %r3247, %r3148, 396; 2026-02-21T12:31:22.0772946Z or.b32 %r3248, %r3148, 400; 2026-02-21T12:31:22.0773114Z or.b32 %r3249, %r3148, 404; 2026-02-21T12:31:22.0773289Z or.b32 %r3250, %r3148, 408; 2026-02-21T12:31:22.0773471Z or.b32 %r3251, %r3148, 412; 2026-02-21T12:31:22.0773644Z or.b32 %r3252, %r3148, 416; 2026-02-21T12:31:22.0773821Z or.b32 %r3253, %r3148, 420; 2026-02-21T12:31:22.0773991Z or.b32 %r3254, %r3148, 424; 2026-02-21T12:31:22.0774167Z or.b32 %r3255, %r3148, 428; 2026-02-21T12:31:22.0774351Z or.b32 %r3256, %r3148, 432; 2026-02-21T12:31:22.0774531Z or.b32 %r3257, %r3148, 436; 
2026-02-21T12:31:22.0774701Z or.b32 %r3258, %r3148, 440; 2026-02-21T12:31:22.0774878Z or.b32 %r3259, %r3148, 444; 2026-02-21T12:31:22.0775053Z or.b32 %r3260, %r3148, 448; 2026-02-21T12:31:22.0775231Z or.b32 %r3261, %r3148, 452; 2026-02-21T12:31:22.0775405Z or.b32 %r3262, %r3148, 456; 2026-02-21T12:31:22.0775574Z or.b32 %r3263, %r3148, 460; 2026-02-21T12:31:22.0775758Z or.b32 %r3264, %r3148, 464; 2026-02-21T12:31:22.0775928Z or.b32 %r3265, %r3148, 468; 2026-02-21T12:31:22.0776104Z or.b32 %r3266, %r3148, 472; 2026-02-21T12:31:22.0776275Z or.b32 %r3267, %r3148, 476; 2026-02-21T12:31:22.0776575Z or.b32 %r3268, %r3148, 480; 2026-02-21T12:31:22.0776756Z or.b32 %r3269, %r3148, 484; 2026-02-21T12:31:22.0776935Z or.b32 %r3270, %r3148, 488; 2026-02-21T12:31:22.0777119Z or.b32 %r3271, %r3148, 492; 2026-02-21T12:31:22.0777292Z or.b32 %r3272, %r3148, 496; 2026-02-21T12:31:22.0777481Z or.b32 %r3273, %r3148, 500; 2026-02-21T12:31:22.0777657Z or.b32 %r3274, %r3148, 504; 2026-02-21T12:31:22.0777837Z or.b32 %r3275, %r3148, 508; 2026-02-21T12:31:22.0778012Z cvt.u64.u32 %rd7, %r3148; 2026-02-21T12:31:22.0778192Z cvt.u64.u32 %rd8, %r3149; 2026-02-21T12:31:22.0778366Z cvt.u64.u32 %rd9, %r3150; 2026-02-21T12:31:22.0778546Z cvt.u64.u32 %rd10, %r3151; 2026-02-21T12:31:22.0778723Z cvt.u64.u32 %rd11, %r3152; 2026-02-21T12:31:22.0779003Z cvt.u64.u32 %rd12, %r3153; 2026-02-21T12:31:22.0779185Z cvt.u64.u32 %rd13, %r3154; 2026-02-21T12:31:22.0779362Z cvt.u64.u32 %rd14, %r3155; 2026-02-21T12:31:22.0779538Z cvt.u64.u32 %rd15, %r3156; 2026-02-21T12:31:22.0779710Z cvt.u64.u32 %rd16, %r3157; 2026-02-21T12:31:22.0779986Z cvt.u64.u32 %rd17, %r3158; 2026-02-21T12:31:22.0780163Z cvt.u64.u32 %rd18, %r3159; 2026-02-21T12:31:22.0780344Z cvt.u64.u32 %rd19, %r3160; 2026-02-21T12:31:22.0780520Z cvt.u64.u32 %rd20, %r3161; 2026-02-21T12:31:22.0780701Z cvt.u64.u32 %rd21, %r3162; 2026-02-21T12:31:22.0780877Z cvt.u64.u32 %rd22, %r3163; 2026-02-21T12:31:22.0781062Z cvt.u64.u32 %rd23, %r3164; 
2026-02-21T12:31:22.0781235Z cvt.u64.u32 %rd24, %r3165; 2026-02-21T12:31:22.0781404Z cvt.u64.u32 %rd25, %r3166; 2026-02-21T12:31:22.0781577Z cvt.u64.u32 %rd26, %r3167; 2026-02-21T12:31:22.0781750Z cvt.u64.u32 %rd27, %r3168; 2026-02-21T12:31:22.0781928Z cvt.u64.u32 %rd28, %r3169; 2026-02-21T12:31:22.0782114Z cvt.u64.u32 %rd29, %r3170; 2026-02-21T12:31:22.0782367Z cvt.u64.u32 %rd30, %r3171; 2026-02-21T12:31:22.0782543Z cvt.u64.u32 %rd31, %r3172; 2026-02-21T12:31:22.0782718Z cvt.u64.u32 %rd32, %r3173; 2026-02-21T12:31:22.0782888Z cvt.u64.u32 %rd33, %r3174; 2026-02-21T12:31:22.0783070Z cvt.u64.u32 %rd34, %r3175; 2026-02-21T12:31:22.0783249Z cvt.u64.u32 %rd35, %r3176; 2026-02-21T12:31:22.0783422Z cvt.u64.u32 %rd36, %r3177; 2026-02-21T12:31:22.0783662Z cvt.u64.u32 %rd37, %r3178; 2026-02-21T12:31:22.0783851Z cvt.u64.u32 %rd38, %r3179; 2026-02-21T12:31:22.0784075Z cvt.u64.u32 %rd39, %r3180; 2026-02-21T12:31:22.0784246Z cvt.u64.u32 %rd40, %r3181; 2026-02-21T12:31:22.0784424Z cvt.u64.u32 %rd41, %r3182; 2026-02-21T12:31:22.0784596Z cvt.u64.u32 %rd42, %r3183; 2026-02-21T12:31:22.0784774Z cvt.u64.u32 %rd43, %r3184; 2026-02-21T12:31:22.0784952Z cvt.u64.u32 %rd44, %r3185; 2026-02-21T12:31:22.0785132Z cvt.u64.u32 %rd45, %r3186; 2026-02-21T12:31:22.0785309Z cvt.u64.u32 %rd46, %r3187; 2026-02-21T12:31:22.0785482Z cvt.u64.u32 %rd47, %r3188; 2026-02-21T12:31:22.0785661Z cvt.u64.u32 %rd48, %r3189; 2026-02-21T12:31:22.0785834Z cvt.u64.u32 %rd49, %r3190; 2026-02-21T12:31:22.0786010Z cvt.u64.u32 %rd50, %r3191; 2026-02-21T12:31:22.0786185Z cvt.u64.u32 %rd51, %r3192; 2026-02-21T12:31:22.0786365Z cvt.u64.u32 %rd52, %r3193; 2026-02-21T12:31:22.0786663Z cvt.u64.u32 %rd53, %r3194; 2026-02-21T12:31:22.0786846Z cvt.u64.u32 %rd54, %r3195; 2026-02-21T12:31:22.0787019Z cvt.u64.u32 %rd55, %r3196; 2026-02-21T12:31:22.0787210Z cvt.u64.u32 %rd56, %r3197; 2026-02-21T12:31:22.0787414Z cvt.u64.u32 %rd57, %r3198; 2026-02-21T12:31:22.0787588Z cvt.u64.u32 %rd58, %r3199; 2026-02-21T12:31:22.0787766Z 
cvt.u64.u32 %rd59, %r3200; 2026-02-21T12:31:22.0787941Z cvt.u64.u32 %rd60, %r3201; 2026-02-21T12:31:22.0788141Z cvt.u64.u32 %rd61, %r3202; 2026-02-21T12:31:22.0788391Z cvt.u64.u32 %rd62, %r3203; 2026-02-21T12:31:22.0788593Z cvt.u64.u32 %rd63, %r3204; 2026-02-21T12:31:22.0788777Z cvt.u64.u32 %rd64, %r3205; 2026-02-21T12:31:22.0788972Z cvt.u64.u32 %rd65, %r3206; 2026-02-21T12:31:22.0789156Z cvt.u64.u32 %rd66, %r3207; 2026-02-21T12:31:22.0789356Z cvt.u64.u32 %rd67, %r3208; 2026-02-21T12:31:22.0789555Z cvt.u64.u32 %rd68, %r3209; 2026-02-21T12:31:22.0789745Z cvt.u64.u32 %rd69, %r3210; 2026-02-21T12:31:22.0789928Z cvt.u64.u32 %rd70, %r3211; 2026-02-21T12:31:22.0790094Z cvt.u64.u32 %rd71, %r3212; 2026-02-21T12:31:22.0790267Z cvt.u64.u32 %rd72, %r3213; 2026-02-21T12:31:22.0790435Z cvt.u64.u32 %rd73, %r3214; 2026-02-21T12:31:22.0790609Z cvt.u64.u32 %rd74, %r3215; 2026-02-21T12:31:22.0790777Z cvt.u64.u32 %rd75, %r3216; 2026-02-21T12:31:22.0790955Z cvt.u64.u32 %rd76, %r3217; 2026-02-21T12:31:22.0791130Z cvt.u64.u32 %rd77, %r3218; 2026-02-21T12:31:22.0791298Z cvt.u64.u32 %rd78, %r3219; 2026-02-21T12:31:22.0791470Z cvt.u64.u32 %rd79, %r3220; 2026-02-21T12:31:22.0791637Z cvt.u64.u32 %rd80, %r3221; 2026-02-21T12:31:22.0791834Z cvt.u64.u32 %rd81, %r3222; 2026-02-21T12:31:22.0792021Z cvt.u64.u32 %rd82, %r3223; 2026-02-21T12:31:22.0792324Z cvt.u64.u32 %rd83, %r3224; 2026-02-21T12:31:22.0792510Z cvt.u64.u32 %rd84, %r3225; 2026-02-21T12:31:22.0792698Z cvt.u64.u32 %rd85, %r3226; 2026-02-21T12:31:22.0792869Z cvt.u64.u32 %rd86, %r3227; 2026-02-21T12:31:22.0793114Z cvt.u64.u32 %rd87, %r3228; 2026-02-21T12:31:22.0793288Z cvt.u64.u32 %rd88, %r3229; 2026-02-21T12:31:22.0793456Z cvt.u64.u32 %rd89, %r3230; 2026-02-21T12:31:22.0793643Z cvt.u64.u32 %rd90, %r3231; 2026-02-21T12:31:22.0793819Z cvt.u64.u32 %rd91, %r3232; 2026-02-21T12:31:22.0794010Z cvt.u64.u32 %rd92, %r3233; 2026-02-21T12:31:22.0794188Z cvt.u64.u32 %rd93, %r3234; 2026-02-21T12:31:22.0794378Z cvt.u64.u32 %rd94, %r3235; 
2026-02-21T12:31:22.0794551Z cvt.u64.u32 %rd95, %r3236; 2026-02-21T12:31:22.0794746Z cvt.u64.u32 %rd96, %r3237; 2026-02-21T12:31:22.0794918Z cvt.u64.u32 %rd97, %r3238; 2026-02-21T12:31:22.0795102Z cvt.u64.u32 %rd98, %r3239; 2026-02-21T12:31:22.0795282Z cvt.u64.u32 %rd99, %r3240; 2026-02-21T12:31:22.0795457Z cvt.u64.u32 %rd100, %r3241; 2026-02-21T12:31:22.0795726Z cvt.u64.u32 %rd101, %r3242; 2026-02-21T12:31:22.0795921Z cvt.u64.u32 %rd102, %r3243; 2026-02-21T12:31:22.0796117Z cvt.u64.u32 %rd103, %r3244; 2026-02-21T12:31:22.0796300Z cvt.u64.u32 %rd104, %r3245; 2026-02-21T12:31:22.0796623Z cvt.u64.u32 %rd105, %r3246; 2026-02-21T12:31:22.0796806Z cvt.u64.u32 %rd106, %r3247; 2026-02-21T12:31:22.0797005Z cvt.u64.u32 %rd107, %r3248; 2026-02-21T12:31:22.0797266Z cvt.u64.u32 %rd108, %r3249; 2026-02-21T12:31:22.0797483Z cvt.u64.u32 %rd109, %r3250; 2026-02-21T12:31:22.0797677Z cvt.u64.u32 %rd110, %r3251; 2026-02-21T12:31:22.0797856Z cvt.u64.u32 %rd111, %r3252; 2026-02-21T12:31:22.0798038Z cvt.u64.u32 %rd112, %r3253; 2026-02-21T12:31:22.0798219Z cvt.u64.u32 %rd113, %r3254; 2026-02-21T12:31:22.0798422Z cvt.u64.u32 %rd114, %r3255; 2026-02-21T12:31:22.0798601Z cvt.u64.u32 %rd115, %r3256; 2026-02-21T12:31:22.0798789Z cvt.u64.u32 %rd116, %r3257; 2026-02-21T12:31:22.0798972Z cvt.u64.u32 %rd117, %r3258; 2026-02-21T12:31:22.0799156Z cvt.u64.u32 %rd118, %r3259; 2026-02-21T12:31:22.0799338Z cvt.u64.u32 %rd119, %r3260; 2026-02-21T12:31:22.0799524Z cvt.u64.u32 %rd120, %r3261; 2026-02-21T12:31:22.0799711Z cvt.u64.u32 %rd121, %r3262; 2026-02-21T12:31:22.0799893Z cvt.u64.u32 %rd122, %r3263; 2026-02-21T12:31:22.0800079Z cvt.u64.u32 %rd123, %r3264; 2026-02-21T12:31:22.0800263Z cvt.u64.u32 %rd124, %r3265; 2026-02-21T12:31:22.0800458Z cvt.u64.u32 %rd125, %r3266; 2026-02-21T12:31:22.0800641Z cvt.u64.u32 %rd126, %r3267; 2026-02-21T12:31:22.0800821Z cvt.u64.u32 %rd127, %r3268; 2026-02-21T12:31:22.0801010Z cvt.u64.u32 %rd128, %r3269; 2026-02-21T12:31:22.0801197Z cvt.u64.u32 %rd129, %r3270; 
2026-02-21T12:31:22.0801378Z cvt.u64.u32 %rd130, %r3271; 2026-02-21T12:31:22.0801552Z cvt.u64.u32 %rd131, %r3272; 2026-02-21T12:31:22.0801735Z cvt.u64.u32 %rd132, %r3273; 2026-02-21T12:31:22.0801910Z cvt.u64.u32 %rd133, %r3274; 2026-02-21T12:31:22.0802105Z cvt.u64.u32 %rd134, %r3275; 2026-02-21T12:31:22.0802284Z cvt.u32.u64 %r3288, %rd7; 2026-02-21T12:31:22.0802630Z .loc 1 19 135 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:19:135 2026-02-21T12:31:22.0803013Z shl.b64 %rd213, %rd137, 10; 2026-02-21T12:31:22.0803342Z .loc 1 28 64 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:28:64 2026-02-21T12:31:22.0803718Z mul.lo.s32 %r3289, %r7, %r6; 2026-02-21T12:31:22.0803900Z sub.s32 %r3290, %r5, %r3289; 2026-02-21T12:31:22.0804078Z cvt.u64.u32 %rd214, %r3290; 2026-02-21T12:31:22.0804393Z .loc 1 28 30 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:28:30 2026-02-21T12:31:22.0804754Z add.s64 %rd215, %rd138, %rd214; 2026-02-21T12:31:22.0805082Z .loc 1 30 27 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:30:27 2026-02-21T12:31:22.0805436Z shl.b64 %rd216, %rd215, 8; 2026-02-21T12:31:22.0805766Z .loc 1 31 32 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:31:32 2026-02-21T12:31:22.0806219Z or.b64 %rd649, %rd216, %rd2; 2026-02-21T12:31:22.0806422Z add.s64 %rd145, %rd213, -3; 2026-02-21T12:31:22.0806721Z shl.b32 %r3291, %r4, 3; 2026-02-21T12:31:22.0806904Z shl.b32 %r3292, %r1, 2; 2026-02-21T12:31:22.0807181Z and.b32 %r3293, %r3292, 112; 2026-02-21T12:31:22.0807365Z and.b32 %r3294, %r1, 3; 2026-02-21T12:31:22.0807541Z shl.b32 %r3295, %r3294, 1; 2026-02-21T12:31:22.0807880Z .loc 1 19 135 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:19:135 2026-02-21T12:31:22.0808250Z add.s32 %r3297, %r3281, %r3291; 2026-02-21T12:31:22.0808438Z add.s32 %r3298, %r3297, %r3293; 2026-02-21T12:31:22.0808634Z add.s32 %r9, %r3298, %r3295; 2026-02-21T12:31:22.0808819Z and.b32 %r3299, %r3, 120; 2026-02-21T12:31:22.0809007Z shr.u32 
%r3300, %r1, 2; 2026-02-21T12:31:22.0809176Z and.b32 %r3301, %r3300, 4; 2026-02-21T12:31:22.0809359Z or.b32 %r3302, %r3301, %r3299; 2026-02-21T12:31:22.0809546Z or.b32 %r3303, %r3302, %r3288; 2026-02-21T12:31:22.0809822Z add.s32 %r3304, %r3281, 24576; 2026-02-21T12:31:22.0810015Z add.s32 %r10, %r3304, %r3303; 2026-02-21T12:31:22.0810208Z xor.b32 %r3305, %r3303, 32; 2026-02-21T12:31:22.0810491Z add.s32 %r11, %r3304, %r3305; 2026-02-21T12:31:22.0810783Z xor.b32 %r3306, %r3303, 64; 2026-02-21T12:31:22.0811063Z add.s32 %r12, %r3304, %r3306; 2026-02-21T12:31:22.0811319Z xor.b32 %r3307, %r3303, 96; 2026-02-21T12:31:22.0811697Z add.s32 %r13, %r3304, %r3307; 2026-02-21T12:31:22.0811976Z xor.b32 %r3308, %r3303, 4; 2026-02-21T12:31:22.0812239Z add.s32 %r14, %r3304, %r3308; 2026-02-21T12:31:22.0812514Z xor.b32 %r3309, %r3303, 36; 2026-02-21T12:31:22.0812788Z add.s32 %r15, %r3304, %r3309; 2026-02-21T12:31:22.0813058Z xor.b32 %r3310, %r3303, 68; 2026-02-21T12:31:22.0813330Z add.s32 %r16, %r3304, %r3310; 2026-02-21T12:31:22.0813611Z xor.b32 %r3311, %r3303, 100; 2026-02-21T12:31:22.0813892Z add.s32 %r17, %r3304, %r3311; 2026-02-21T12:31:22.0814171Z and.b32 %r3312, %r1, 7; 2026-02-21T12:31:22.0814429Z shl.b32 %r3313, %r3312, 7; 2026-02-21T12:31:22.0814683Z shl.b32 %r3314, %r3294, 5; 2026-02-21T12:31:22.0814943Z and.b32 %r3315, %r1, 124; 2026-02-21T12:31:22.0815204Z xor.b32 %r3316, %r3314, %r3315; 2026-02-21T12:31:22.0815483Z or.b32 %r3317, %r3316, %r3313; 2026-02-21T12:31:22.0815765Z add.s32 %r18, %r3304, %r3317; 2026-02-21T12:31:22.0816029Z xor.b32 %r3318, %r3317, 4; 2026-02-21T12:31:22.0816287Z add.s32 %r19, %r3304, %r3318; 2026-02-21T12:31:22.0816728Z shl.b32 %r3319, %r1, 5; 2026-02-21T12:31:22.0816964Z and.b32 %r3320, %r3319, 3936; 2026-02-21T12:31:22.0817222Z bfe.s32 %r3321, %r1, 2, 1; 2026-02-21T12:31:22.0817473Z and.b32 %r3322, %r3321, 144; 2026-02-21T12:31:22.0817747Z or.b32 %r3323, %r3322, %r3320; 2026-02-21T12:31:22.0818002Z add.s32 %r20, %r3304, %r3323; 
2026-02-21T12:31:22.0818264Z xor.b32 %r3324, %r3323, 16; 2026-02-21T12:31:22.0818512Z add.s32 %r21, %r3304, %r3324; 2026-02-21T12:31:22.0818786Z bfe.u32 %r3325, %r3304, 4, 14; 2026-02-21T12:31:22.0819108Z cvt.u64.u32 %rd217, %r3325; 2026-02-21T12:31:22.0819429Z or.b64 %rd236, %rd217, -4611685949674356736; 2026-02-21T12:31:22.0819664Z shl.b32 %r3326, %r3294, 12; 2026-02-21T12:31:22.0819840Z shl.b32 %r3327, %r1, 4; 2026-02-21T12:31:22.0820019Z and.b32 %r3328, %r3327, 1920; 2026-02-21T12:31:22.0820196Z and.b32 %r3329, %r3321, 2064; 2026-02-21T12:31:22.0820382Z or.b32 %r3330, %r3328, %r3329; 2026-02-21T12:31:22.0820568Z or.b32 %r3331, %r3330, %r3326; 2026-02-21T12:31:22.0820756Z or.b32 %r3332, %r3331, %r3314; 2026-02-21T12:31:22.0820944Z add.s32 %r22, %r3304, %r3332; 2026-02-21T12:31:22.0821121Z xor.b32 %r3333, %r3332, 16; 2026-02-21T12:31:22.0821303Z add.s32 %r23, %r3304, %r3333; 2026-02-21T12:31:22.0821491Z xor.b32 %r3334, %r3332, 32; 2026-02-21T12:31:22.0821677Z add.s32 %r24, %r3304, %r3334; 2026-02-21T12:31:22.0821857Z xor.b32 %r3335, %r3332, 48; 2026-02-21T12:31:22.0822040Z add.s32 %r25, %r3304, %r3335; 2026-02-21T12:31:22.0822343Z xor.b32 %r3336, %r3332, 64; 2026-02-21T12:31:22.0822531Z add.s32 %r26, %r3304, %r3336; 2026-02-21T12:31:22.0822708Z xor.b32 %r3337, %r3332, 80; 2026-02-21T12:31:22.0822898Z add.s32 %r27, %r3304, %r3337; 2026-02-21T12:31:22.0823084Z xor.b32 %r3338, %r3332, 96; 2026-02-21T12:31:22.0823334Z add.s32 %r28, %r3304, %r3338; 2026-02-21T12:31:22.0823533Z xor.b32 %r3339, %r3332, 112; 2026-02-21T12:31:22.0823716Z add.s32 %r29, %r3304, %r3339; 2026-02-21T12:31:22.0823901Z and.b32 %r3340, %r1, 24; 2026-02-21T12:31:22.0824074Z shl.b32 %r3341, %r3340, 9; 2026-02-21T12:31:22.0824252Z shl.b32 %r3342, %r3312, 4; 2026-02-21T12:31:22.0824423Z shl.b32 %r3343, %r3340, 2; 2026-02-21T12:31:22.0824596Z bfe.s32 %r3344, %r1, 5, 1; 2026-02-21T12:31:22.0824769Z and.b32 %r3345, %r3344, 2064; 2026-02-21T12:31:22.0824950Z shl.b32 %r3346, %r1, 1; 
2026-02-21T12:31:22.0825119Z and.b32 %r3347, %r3346, 128; 2026-02-21T12:31:22.0825294Z or.b32 %r3348, %r3341, %r3342; 2026-02-21T12:31:22.0825490Z or.b32 %r3349, %r3345, %r3343; 2026-02-21T12:31:22.0825757Z xor.b32 %r3350, %r3349, %r3348; 2026-02-21T12:31:22.0825953Z add.s32 %r3351, %r3304, %r3347; 2026-02-21T12:31:22.0826135Z add.s32 %r7570, %r3351, %r3350; 2026-02-21T12:31:22.0826322Z add.s32 %r7575, %r7570, 256; 2026-02-21T12:31:22.0826710Z add.s32 %r7580, %r7570, 512; 2026-02-21T12:31:22.0826912Z add.s32 %r7585, %r7570, 768; 2026-02-21T12:31:22.0827092Z add.s32 %r7590, %r7570, 1024; 2026-02-21T12:31:22.0827364Z add.s32 %r7595, %r7570, 1280; 2026-02-21T12:31:22.0827560Z add.s32 %r7600, %r7570, 1536; 2026-02-21T12:31:22.0827737Z add.s32 %r7605, %r7570, 1792; 2026-02-21T12:31:22.0827919Z shl.b64 %rd218, %rd136, 10; 2026-02-21T12:31:22.0828098Z shl.b64 %rd219, %rd135, 10; 2026-02-21T12:31:22.0828373Z add.s64 %rd147, %rd218, %rd219; 2026-02-21T12:31:22.0828558Z mov.b64 %rd658, 2; 2026-02-21T12:31:22.0828722Z mov.b32 %r9236, 0f00000000; 2026-02-21T12:31:22.0828892Z mov.b32 %r9235, 2; 2026-02-21T12:31:22.0829054Z mov.b32 %r9234, -1; 2026-02-21T12:31:22.0829224Z mov.b32 %r9233, 0; 2026-02-21T12:31:22.0829375Z mov.b32 %r9232, 4; 2026-02-21T12:31:22.0829531Z mov.b32 %r9231, 8; 2026-02-21T12:31:22.0829677Z mov.b64 %rd648, 0; 2026-02-21T12:31:22.0829832Z mov.b64 %rd647, 1; 2026-02-21T12:31:22.0830005Z prmt.b32 %r7516, %r7517, %r7518, 0x3340U; 2026-02-21T12:31:22.0830228Z mov.b64 %rd646, %rd645; 2026-02-21T12:31:22.0830398Z mov.b64 %rd650, %rd649; 2026-02-21T12:31:22.0830571Z mov.b64 %rd655, %rd645; 2026-02-21T12:31:22.0830737Z mov.b64 %rd656, %rd649; 2026-02-21T12:31:22.0830985Z mov.b32 %r9237, %r9236; 2026-02-21T12:31:22.0831163Z mov.b32 %r9238, %r9236; 2026-02-21T12:31:22.0831331Z mov.b32 %r9239, %r9236; 2026-02-21T12:31:22.0831516Z mov.b32 %r9240, %r9236; 2026-02-21T12:31:22.0831685Z mov.b32 %r9241, %r9236; 2026-02-21T12:31:22.0831855Z mov.b32 %r9242, %r9236; 
2026-02-21T12:31:22.0832022Z mov.b32 %r9243, %r9236; 2026-02-21T12:31:22.0832196Z mov.b32 %r9244, %r9236; 2026-02-21T12:31:22.0832360Z mov.b32 %r9245, %r9236; 2026-02-21T12:31:22.0832538Z mov.b32 %r9246, %r9236; 2026-02-21T12:31:22.0832704Z mov.b32 %r9247, %r9236; 2026-02-21T12:31:22.0832872Z mov.b32 %r9248, %r9236; 2026-02-21T12:31:22.0833043Z mov.b32 %r9249, %r9236; 2026-02-21T12:31:22.0833207Z mov.b32 %r9250, %r9236; 2026-02-21T12:31:22.0833372Z mov.b32 %r9251, %r9236; 2026-02-21T12:31:22.0833534Z mov.b32 %r9252, %r9236; 2026-02-21T12:31:22.0833707Z mov.b32 %r9253, %r9236; 2026-02-21T12:31:22.0833869Z mov.b32 %r9254, %r9236; 2026-02-21T12:31:22.0834035Z mov.b32 %r9255, %r9236; 2026-02-21T12:31:22.0834194Z mov.b32 %r9256, %r9236; 2026-02-21T12:31:22.0834357Z mov.b32 %r9257, %r9236; 2026-02-21T12:31:22.0834518Z mov.b32 %r9258, %r9236; 2026-02-21T12:31:22.0834685Z mov.b32 %r9259, %r9236; 2026-02-21T12:31:22.0834867Z mov.b32 %r9260, %r9236; 2026-02-21T12:31:22.0835031Z mov.b32 %r9261, %r9236; 2026-02-21T12:31:22.0835199Z mov.b32 %r9262, %r9236; 2026-02-21T12:31:22.0835359Z mov.b32 %r9263, %r9236; 2026-02-21T12:31:22.0835616Z mov.b32 %r9264, %r9236; 2026-02-21T12:31:22.0835778Z mov.b32 %r9265, %r9236; 2026-02-21T12:31:22.0835947Z mov.b32 %r9266, %r9236; 2026-02-21T12:31:22.0836108Z mov.b32 %r9267, %r9236; 2026-02-21T12:31:22.0836274Z mov.b32 %r9268, %r9236; 2026-02-21T12:31:22.0836669Z mov.b32 %r9269, %r9236; 2026-02-21T12:31:22.0836853Z mov.b32 %r9270, %r9236; 2026-02-21T12:31:22.0837022Z mov.b32 %r9271, %r9236; 2026-02-21T12:31:22.0837186Z mov.b32 %r9272, %r9236; 2026-02-21T12:31:22.0837353Z mov.b32 %r9273, %r9236; 2026-02-21T12:31:22.0837517Z mov.b32 %r9274, %r9236; 2026-02-21T12:31:22.0837687Z mov.b32 %r9275, %r9236; 2026-02-21T12:31:22.0837856Z mov.b32 %r9276, %r9236; 2026-02-21T12:31:22.0838027Z mov.b32 %r9277, %r9236; 2026-02-21T12:31:22.0838188Z mov.b32 %r9278, %r9236; 2026-02-21T12:31:22.0838356Z mov.b32 %r9279, %r9236; 2026-02-21T12:31:22.0838518Z mov.b32 
%r9280, %r9236; 2026-02-21T12:31:22.0838688Z mov.b32 %r9281, %r9236; 2026-02-21T12:31:22.0838854Z mov.b32 %r9282, %r9236; 2026-02-21T12:31:22.0839117Z mov.b32 %r9283, %r9236; 2026-02-21T12:31:22.0839295Z mov.b32 %r9284, %r9236; 2026-02-21T12:31:22.0839462Z mov.b32 %r9285, %r9236; 2026-02-21T12:31:22.0839643Z mov.b32 %r9286, %r9236; 2026-02-21T12:31:22.0839808Z mov.b32 %r9287, %r9236; 2026-02-21T12:31:22.0839982Z mov.b32 %r9288, %r9236; 2026-02-21T12:31:22.0840142Z mov.b32 %r9289, %r9236; 2026-02-21T12:31:22.0840313Z mov.b32 %r9290, %r9236; 2026-02-21T12:31:22.0840542Z mov.b32 %r9291, %r9236; 2026-02-21T12:31:22.0840717Z mov.b32 %r9292, %r9236; 2026-02-21T12:31:22.0840890Z mov.b32 %r9293, %r9236; 2026-02-21T12:31:22.0841051Z mov.b32 %r9294, %r9236; 2026-02-21T12:31:22.0841218Z mov.b32 %r9295, %r9236; 2026-02-21T12:31:22.0841381Z mov.b32 %r9296, %r9236; 2026-02-21T12:31:22.0841550Z mov.b32 %r9297, %r9236; 2026-02-21T12:31:22.0841713Z mov.b32 %r9298, %r9236; 2026-02-21T12:31:22.0841895Z mov.b32 %r9299, %r9236; 2026-02-21T12:31:22.0842061Z mov.b32 %r9300, %r9236; 2026-02-21T12:31:22.0842229Z mov.b32 %r9301, %r9236; 2026-02-21T12:31:22.0842393Z mov.b32 %r9302, %r9236; 2026-02-21T12:31:22.0842561Z mov.b32 %r9303, %r9236; 2026-02-21T12:31:22.0842725Z mov.b32 %r9304, %r9236; 2026-02-21T12:31:22.0842884Z mov.b32 %r9305, %r9236; 2026-02-21T12:31:22.0843055Z mov.b32 %r9306, %r9236; 2026-02-21T12:31:22.0843215Z mov.b32 %r9307, %r9236; 2026-02-21T12:31:22.0843382Z mov.b32 %r9308, %r9236; 2026-02-21T12:31:22.0843543Z mov.b32 %r9309, %r9236; 2026-02-21T12:31:22.0843714Z mov.b32 %r9310, %r9236; 2026-02-21T12:31:22.0843874Z mov.b32 %r9311, %r9236; 2026-02-21T12:31:22.0844040Z mov.b32 %r9312, %r9236; 2026-02-21T12:31:22.0844200Z mov.b32 %r9313, %r9236; 2026-02-21T12:31:22.0844365Z mov.b32 %r9314, %r9236; 2026-02-21T12:31:22.0844530Z mov.b32 %r9315, %r9236; 2026-02-21T12:31:22.0844690Z mov.b32 %r9316, %r9236; 2026-02-21T12:31:22.0844855Z mov.b32 %r9317, %r9236; 
2026-02-21T12:31:22.0845016Z mov.b32 %r9318, %r9236; 2026-02-21T12:31:22.0845183Z mov.b32 %r9319, %r9236; 2026-02-21T12:31:22.0845346Z mov.b32 %r9320, %r9236; 2026-02-21T12:31:22.0845513Z mov.b32 %r9321, %r9236; 2026-02-21T12:31:22.0845674Z mov.b32 %r9322, %r9236; 2026-02-21T12:31:22.0845841Z mov.b32 %r9323, %r9236; 2026-02-21T12:31:22.0846001Z mov.b32 %r9324, %r9236; 2026-02-21T12:31:22.0846170Z mov.b32 %r9325, %r9236; 2026-02-21T12:31:22.0846336Z mov.b32 %r9326, %r9236; 2026-02-21T12:31:22.0846767Z mov.b32 %r9327, %r9236; 2026-02-21T12:31:22.0846950Z mov.b32 %r9328, %r9236; 2026-02-21T12:31:22.0847111Z mov.b32 %r9329, %r9236; 2026-02-21T12:31:22.0847277Z mov.b32 %r9330, %r9236; 2026-02-21T12:31:22.0847438Z mov.b32 %r9331, %r9236; 2026-02-21T12:31:22.0847605Z mov.b32 %r9332, %r9236; 2026-02-21T12:31:22.0847779Z mov.b32 %r9333, %r9236; 2026-02-21T12:31:22.0847951Z mov.b32 %r9334, %r9236; 2026-02-21T12:31:22.0848114Z mov.b32 %r9335, %r9236; 2026-02-21T12:31:22.0848281Z mov.b32 %r9336, %r9236; 2026-02-21T12:31:22.0848455Z mov.b32 %r9337, %r9236; 2026-02-21T12:31:22.0848616Z mov.b32 %r9338, %r9236; 2026-02-21T12:31:22.0848883Z mov.b32 %r9339, %r9236; 2026-02-21T12:31:22.0849048Z mov.b32 %r9340, %r9236; 2026-02-21T12:31:22.0849222Z mov.b32 %r9341, %r9236; 2026-02-21T12:31:22.0849397Z mov.b32 %r9342, %r9236; 2026-02-21T12:31:22.0849567Z mov.b32 %r9343, %r9236; 2026-02-21T12:31:22.0849800Z mov.b32 %r9344, %r9236; 2026-02-21T12:31:22.0849970Z mov.b32 %r9345, %r9236; 2026-02-21T12:31:22.0850132Z mov.b32 %r9346, %r9236; 2026-02-21T12:31:22.0850321Z mov.b32 %r9347, %r9236; 2026-02-21T12:31:22.0850488Z mov.b32 %r9348, %r9236; 2026-02-21T12:31:22.0850650Z mov.b32 %r9349, %r9236; 2026-02-21T12:31:22.0850819Z mov.b32 %r9350, %r9236; 2026-02-21T12:31:22.0850979Z mov.b32 %r9351, %r9236; 2026-02-21T12:31:22.0851147Z mov.b32 %r9352, %r9236; 2026-02-21T12:31:22.0851305Z mov.b32 %r9353, %r9236; 2026-02-21T12:31:22.0851471Z mov.b32 %r9354, %r9236; 2026-02-21T12:31:22.0851632Z mov.b32 
%r9355, %r9236; 2026-02-21T12:31:22.0851796Z mov.b32 %r9356, %r9236; 2026-02-21T12:31:22.0851962Z mov.b32 %r9357, %r9236; 2026-02-21T12:31:22.0852218Z mov.b32 %r9358, %r9236; 2026-02-21T12:31:22.0852404Z mov.b32 %r9359, %r9236; 2026-02-21T12:31:22.0852569Z mov.b32 %r9360, %r9236; 2026-02-21T12:31:22.0852735Z mov.b32 %r9361, %r9236; 2026-02-21T12:31:22.0852898Z mov.b32 %r9362, %r9236; 2026-02-21T12:31:22.0853068Z mov.b32 %r9363, %r9236; 2026-02-21T12:31:22.0853233Z mov.b32 %r9364, %r9236; 2026-02-21T12:31:22.0853470Z mov.b32 %r9365, %r9236; 2026-02-21T12:31:22.0853635Z mov.b32 %r9366, %r9236; 2026-02-21T12:31:22.0853801Z mov.b32 %r9367, %r9236; 2026-02-21T12:31:22.0853961Z mov.b32 %r9368, %r9236; 2026-02-21T12:31:22.0854130Z mov.b32 %r9369, %r9236; 2026-02-21T12:31:22.0854310Z mov.b32 %r9370, %r9236; 2026-02-21T12:31:22.0854476Z mov.b32 %r9371, %r9236; 2026-02-21T12:31:22.0854647Z mov.b32 %r9372, %r9236; 2026-02-21T12:31:22.0854808Z mov.b32 %r9373, %r9236; 2026-02-21T12:31:22.0854979Z mov.b32 %r9374, %r9236; 2026-02-21T12:31:22.0855145Z mov.b32 %r9375, %r9236; 2026-02-21T12:31:22.0855318Z mov.b32 %r9376, %r9236; 2026-02-21T12:31:22.0855481Z mov.b32 %r9377, %r9236; 2026-02-21T12:31:22.0855648Z mov.b32 %r9378, %r9236; 2026-02-21T12:31:22.0855813Z mov.b32 %r9379, %r9236; 2026-02-21T12:31:22.0855980Z mov.b32 %r9380, %r9236; 2026-02-21T12:31:22.0856152Z mov.b32 %r9381, %r9236; 2026-02-21T12:31:22.0856321Z mov.b32 %r9382, %r9236; 2026-02-21T12:31:22.0856628Z mov.b32 %r9383, %r9236; 2026-02-21T12:31:22.0856799Z mov.b32 %r9384, %r9236; 2026-02-21T12:31:22.0856968Z mov.b32 %r9385, %r9236; 2026-02-21T12:31:22.0857134Z mov.b32 %r9386, %r9236; 2026-02-21T12:31:22.0857308Z mov.b32 %r9387, %r9236; 2026-02-21T12:31:22.0857471Z mov.b32 %r9388, %r9236; 2026-02-21T12:31:22.0857644Z mov.b32 %r9389, %r9236; 2026-02-21T12:31:22.0857804Z mov.b32 %r9390, %r9236; 2026-02-21T12:31:22.0857991Z mov.b32 %r9391, %r9236; 2026-02-21T12:31:22.0858158Z mov.b32 %r9392, %r9236; 
2026-02-21T12:31:22.0858321Z mov.b32 %r9393, %r9236; 2026-02-21T12:31:22.0858493Z mov.b32 %r9394, %r9236; 2026-02-21T12:31:22.0858658Z mov.b32 %r9395, %r9236; 2026-02-21T12:31:22.0858829Z mov.b32 %r9396, %r9236; 2026-02-21T12:31:22.0858989Z mov.b32 %r9397, %r9236; 2026-02-21T12:31:22.0859158Z mov.b32 %r9398, %r9236; 2026-02-21T12:31:22.0859323Z mov.b32 %r9399, %r9236; 2026-02-21T12:31:22.0859490Z mov.b32 %r9400, %r9236; 2026-02-21T12:31:22.0859650Z mov.b32 %r9401, %r9236; 2026-02-21T12:31:22.0859816Z mov.b32 %r9402, %r9236; 2026-02-21T12:31:22.0859984Z mov.b32 %r9403, %r9236; 2026-02-21T12:31:22.0860146Z mov.b32 %r9404, %r9236; 2026-02-21T12:31:22.0860317Z mov.b32 %r9405, %r9236; 2026-02-21T12:31:22.0860482Z mov.b32 %r9406, %r9236; 2026-02-21T12:31:22.0860651Z mov.b32 %r9407, %r9236; 2026-02-21T12:31:22.0860814Z mov.b32 %r9408, %r9236; 2026-02-21T12:31:22.0860979Z mov.b32 %r9409, %r9236; 2026-02-21T12:31:22.0861141Z mov.b32 %r9410, %r9236; 2026-02-21T12:31:22.0861308Z mov.b32 %r9411, %r9236; 2026-02-21T12:31:22.0861469Z mov.b32 %r9412, %r9236; 2026-02-21T12:31:22.0861759Z mov.b32 %r9413, %r9236; 2026-02-21T12:31:22.0861930Z mov.b32 %r9414, %r9236; 2026-02-21T12:31:22.0862095Z mov.b32 %r9415, %r9236; 2026-02-21T12:31:22.0862263Z mov.b32 %r9416, %r9236; 2026-02-21T12:31:22.0862426Z mov.b32 %r9417, %r9236; 2026-02-21T12:31:22.0862676Z mov.b32 %r9418, %r9236; 2026-02-21T12:31:22.0862838Z mov.b32 %r9419, %r9236; 2026-02-21T12:31:22.0863009Z mov.b32 %r9420, %r9236; 2026-02-21T12:31:22.0863175Z mov.b32 %r9421, %r9236; 2026-02-21T12:31:22.0863346Z mov.b32 %r9422, %r9236; 2026-02-21T12:31:22.0863510Z mov.b32 %r9423, %r9236; 2026-02-21T12:31:22.0863679Z mov.b32 %r9424, %r9236; 2026-02-21T12:31:22.0863849Z mov.b32 %r9425, %r9236; 2026-02-21T12:31:22.0864014Z mov.b32 %r9426, %r9236; 2026-02-21T12:31:22.0864183Z mov.b32 %r9427, %r9236; 2026-02-21T12:31:22.0864357Z mov.b32 %r9428, %r9236; 2026-02-21T12:31:22.0864529Z mov.b32 %r9429, %r9236; 2026-02-21T12:31:22.0864692Z mov.b32 
%r9430, %r9236; 2026-02-21T12:31:22.0864861Z mov.b32 %r9431, %r9236; 2026-02-21T12:31:22.0865096Z mov.b32 %r9432, %r9236; 2026-02-21T12:31:22.0865267Z mov.b32 %r9433, %r9236; 2026-02-21T12:31:22.0865427Z mov.b32 %r9434, %r9236; 2026-02-21T12:31:22.0865594Z mov.b32 %r9435, %r9236; 2026-02-21T12:31:22.0865761Z mov.b32 %r9436, %r9236; 2026-02-21T12:31:22.0865929Z mov.b32 %r9437, %r9236; 2026-02-21T12:31:22.0866097Z mov.b32 %r9438, %r9236; 2026-02-21T12:31:22.0866275Z mov.b32 %r9439, %r9236; 2026-02-21T12:31:22.0866648Z mov.b32 %r9440, %r9236; 2026-02-21T12:31:22.0866836Z mov.b32 %r9441, %r9236; 2026-02-21T12:31:22.0867005Z mov.b32 %r9442, %r9236; 2026-02-21T12:31:22.0867168Z mov.b32 %r9443, %r9236; 2026-02-21T12:31:22.0867356Z mov.b32 %r9444, %r9236; 2026-02-21T12:31:22.0867522Z mov.b32 %r9445, %r9236; 2026-02-21T12:31:22.0867695Z mov.b32 %r9446, %r9236; 2026-02-21T12:31:22.0867866Z mov.b32 %r9447, %r9236; 2026-02-21T12:31:22.0868029Z mov.b32 %r9448, %r9236; 2026-02-21T12:31:22.0868197Z mov.b32 %r9449, %r9236; 2026-02-21T12:31:22.0868439Z mov.b32 %r9450, %r9236; 2026-02-21T12:31:22.0868610Z mov.b32 %r9451, %r9236; 2026-02-21T12:31:22.0868768Z mov.b32 %r9452, %r9236; 2026-02-21T12:31:22.0868933Z mov.b32 %r9453, %r9236; 2026-02-21T12:31:22.0869094Z mov.b32 %r9454, %r9236; 2026-02-21T12:31:22.0869262Z mov.b32 %r9455, %r9236; 2026-02-21T12:31:22.0869422Z mov.b32 %r9456, %r9236; 2026-02-21T12:31:22.0869588Z mov.b32 %r9457, %r9236; 2026-02-21T12:31:22.0869757Z mov.b32 %r9458, %r9236; 2026-02-21T12:31:22.0869919Z mov.b32 %r9459, %r9236; 2026-02-21T12:31:22.0870085Z mov.b32 %r9460, %r9236; 2026-02-21T12:31:22.0870244Z mov.b32 %r9461, %r9236; 2026-02-21T12:31:22.0870412Z mov.b32 %r9462, %r9236; 2026-02-21T12:31:22.0870574Z mov.b32 %r9463, %r9236; 2026-02-21T12:31:22.0870742Z mov.b32 %r9464, %r9236; 2026-02-21T12:31:22.0870906Z mov.b32 %r9465, %r9236; 2026-02-21T12:31:22.0871077Z mov.b32 %r9466, %r9236; 2026-02-21T12:31:22.0871238Z mov.b32 %r9467, %r9236; 
2026-02-21T12:31:22.0871404Z mov.b32 %r9468, %r9236; 2026-02-21T12:31:22.0871579Z mov.b32 %r9469, %r9236; 2026-02-21T12:31:22.0871740Z mov.b32 %r9470, %r9236; 2026-02-21T12:31:22.0871912Z mov.b32 %r9471, %r9236; 2026-02-21T12:31:22.0872074Z mov.b32 %r9472, %r9236; 2026-02-21T12:31:22.0872241Z mov.b32 %r9473, %r9236; 2026-02-21T12:31:22.0872404Z mov.b32 %r9474, %r9236; 2026-02-21T12:31:22.0872574Z mov.b32 %r9475, %r9236; 2026-02-21T12:31:22.0872733Z mov.b32 %r9476, %r9236; 2026-02-21T12:31:22.0872903Z mov.b32 %r9477, %r9236; 2026-02-21T12:31:22.0873068Z mov.b32 %r9478, %r9236; 2026-02-21T12:31:22.0873255Z mov.b32 %r9479, %r9236; 2026-02-21T12:31:22.0873424Z mov.b32 %r9480, %r9236; 2026-02-21T12:31:22.0873585Z mov.b32 %r9481, %r9236; 2026-02-21T12:31:22.0873757Z mov.b32 %r9482, %r9236; 2026-02-21T12:31:22.0873919Z mov.b32 %r9483, %r9236; 2026-02-21T12:31:22.0874090Z mov.b32 %r9484, %r9236; 2026-02-21T12:31:22.0874249Z mov.b32 %r9485, %r9236; 2026-02-21T12:31:22.0874415Z mov.b32 %r9486, %r9236; 2026-02-21T12:31:22.0874579Z mov.b32 %r9487, %r9236; 2026-02-21T12:31:22.0874850Z mov.b32 %r9488, %r9236; 2026-02-21T12:31:22.0875014Z mov.b32 %r9489, %r9236; 2026-02-21T12:31:22.0875184Z mov.b32 %r9490, %r9236; 2026-02-21T12:31:22.0875354Z mov.b32 %r9491, %r9236; 2026-02-21T12:31:22.0875594Z mov.b32 %r9492, %r9236; 2026-02-21T12:31:22.0875765Z mov.b32 %r9493, %r9236; 2026-02-21T12:31:22.0875928Z mov.b32 %r9494, %r9236; 2026-02-21T12:31:22.0876100Z mov.b32 %r9495, %r9236; 2026-02-21T12:31:22.0876266Z mov.b32 %r9496, %r9236; 2026-02-21T12:31:22.0876436Z mov.b32 %r9497, %r9236; 2026-02-21T12:31:22.0876722Z mov.b32 %r9498, %r9236; 2026-02-21T12:31:22.0876896Z mov.b32 %r9499, %r9236; 2026-02-21T12:31:22.0877058Z mov.b32 %r9500, %r9236; 2026-02-21T12:31:22.0877224Z mov.b32 %r9501, %r9236; 2026-02-21T12:31:22.0877398Z mov.b32 %r9502, %r9236; 2026-02-21T12:31:22.0877557Z mov.b32 %r9503, %r9236; 2026-02-21T12:31:22.0877738Z mov.b32 %r9504, %r9236; 2026-02-21T12:31:22.0877897Z mov.b32 
%r9505, %r9236; 2026-02-21T12:31:22.0878064Z mov.b32 %r9506, %r9236; 2026-02-21T12:31:22.0878301Z mov.b32 %r9507, %r9236; 2026-02-21T12:31:22.0878471Z mov.b32 %r9508, %r9236; 2026-02-21T12:31:22.0878632Z mov.b32 %r9509, %r9236; 2026-02-21T12:31:22.0878799Z mov.b32 %r9510, %r9236; 2026-02-21T12:31:22.0878960Z mov.b32 %r9511, %r9236; 2026-02-21T12:31:22.0879125Z mov.b32 %r9512, %r9236; 2026-02-21T12:31:22.0879292Z mov.b32 %r9513, %r9236; 2026-02-21T12:31:22.0879520Z mov.b32 %r9514, %r9236; 2026-02-21T12:31:22.0879690Z mov.b32 %r9515, %r9236; 2026-02-21T12:31:22.0879861Z mov.b32 %r9516, %r9236; 2026-02-21T12:31:22.0880031Z mov.b32 %r9517, %r9236; 2026-02-21T12:31:22.0880197Z mov.b32 %r9518, %r9236; 2026-02-21T12:31:22.0880365Z mov.b32 %r9519, %r9236; 2026-02-21T12:31:22.0880526Z mov.b32 %r9520, %r9236; 2026-02-21T12:31:22.0880692Z mov.b32 %r9521, %r9236; 2026-02-21T12:31:22.0880855Z mov.b32 %r9522, %r9236; 2026-02-21T12:31:22.0881023Z mov.b32 %r9523, %r9236; 2026-02-21T12:31:22.0881189Z mov.b32 %r9524, %r9236; 2026-02-21T12:31:22.0881356Z mov.b32 %r9525, %r9236; 2026-02-21T12:31:22.0881524Z mov.b32 %r9526, %r9236; 2026-02-21T12:31:22.0881684Z mov.b32 %r9527, %r9236; 2026-02-21T12:31:22.0881850Z mov.b32 %r9528, %r9236; 2026-02-21T12:31:22.0882011Z mov.b32 %r9529, %r9236; 2026-02-21T12:31:22.0882179Z mov.b32 %r9530, %r9236; 2026-02-21T12:31:22.0882342Z mov.b32 %r9531, %r9236; 2026-02-21T12:31:22.0882509Z mov.b32 %r9532, %r9236; 2026-02-21T12:31:22.0882671Z mov.b32 %r9533, %r9236; 2026-02-21T12:31:22.0882840Z mov.b32 %r9534, %r9236; 2026-02-21T12:31:22.0883006Z mov.b32 %r9535, %r9236; 2026-02-21T12:31:22.0883169Z mov.b32 %r9536, %r9236; 2026-02-21T12:31:22.0895148Z mov.b32 %r9537, %r9236; 2026-02-21T12:31:22.0895439Z mov.b32 %r9538, %r9236; 2026-02-21T12:31:22.0895632Z mov.b32 %r9539, %r9236; 2026-02-21T12:31:22.0895821Z mov.b32 %r9540, %r9236; 2026-02-21T12:31:22.0895990Z mov.b32 %r9541, %r9236; 2026-02-21T12:31:22.0896155Z mov.b32 %r9542, %r9236; 
2026-02-21T12:31:22.0896319Z mov.b32 %r9543, %r9236; 2026-02-21T12:31:22.0896679Z mov.b32 %r9544, %r9236; 2026-02-21T12:31:22.0896848Z mov.b32 %r9545, %r9236; 2026-02-21T12:31:22.0897019Z mov.b32 %r9546, %r9236; 2026-02-21T12:31:22.0897194Z mov.b32 %r9547, %r9236; 2026-02-21T12:31:22.0897397Z mov.b32 %r9548, %r9236; 2026-02-21T12:31:22.0897575Z mov.b32 %r9549, %r9236; 2026-02-21T12:31:22.0897744Z mov.b32 %r9550, %r9236; 2026-02-21T12:31:22.0897921Z mov.b32 %r9551, %r9236; 2026-02-21T12:31:22.0898088Z mov.b32 %r9552, %r9236; 2026-02-21T12:31:22.0898258Z mov.b32 %r9553, %r9236; 2026-02-21T12:31:22.0898421Z mov.b32 %r9554, %r9236; 2026-02-21T12:31:22.0898592Z mov.b32 %r9555, %r9236; 2026-02-21T12:31:22.0898754Z mov.b32 %r9556, %r9236; 2026-02-21T12:31:22.0898924Z mov.b32 %r9557, %r9236; 2026-02-21T12:31:22.0899093Z mov.b32 %r9558, %r9236; 2026-02-21T12:31:22.0899260Z mov.b32 %r9559, %r9236; 2026-02-21T12:31:22.0899435Z mov.b32 %r9560, %r9236; 2026-02-21T12:31:22.0899598Z mov.b32 %r9561, %r9236; 2026-02-21T12:31:22.0899944Z mov.b32 %r9562, %r9236; 2026-02-21T12:31:22.0900129Z mov.b32 %r9563, %r9236; 2026-02-21T12:31:22.0900302Z mov.b32 %r9564, %r9236; 2026-02-21T12:31:22.0900463Z mov.b32 %r9565, %r9236; 2026-02-21T12:31:22.0900639Z mov.b32 %r9566, %r9236; 2026-02-21T12:31:22.0900909Z mov.b32 %r9567, %r9236; 2026-02-21T12:31:22.0901076Z mov.b32 %r9568, %r9236; 2026-02-21T12:31:22.0901246Z mov.b32 %r9569, %r9236; 2026-02-21T12:31:22.0901409Z mov.b32 %r9570, %r9236; 2026-02-21T12:31:22.0901595Z mov.b32 %r9571, %r9236; 2026-02-21T12:31:22.0901765Z mov.b32 %r9572, %r9236; 2026-02-21T12:31:22.0901947Z mov.b32 %r9573, %r9236; 2026-02-21T12:31:22.0902115Z mov.b32 %r9574, %r9236; 2026-02-21T12:31:22.0902283Z mov.b32 %r9575, %r9236; 2026-02-21T12:31:22.0902447Z mov.b32 %r9576, %r9236; 2026-02-21T12:31:22.0902616Z mov.b32 %r9577, %r9236; 2026-02-21T12:31:22.0902776Z mov.b32 %r9578, %r9236; 2026-02-21T12:31:22.0902945Z mov.b32 %r9579, %r9236; 2026-02-21T12:31:22.0903113Z mov.b32 
%r9580, %r9236; 2026-02-21T12:31:22.0903377Z mov.b32 %r9581, %r9236; 2026-02-21T12:31:22.0903550Z mov.b32 %r9582, %r9236; 2026-02-21T12:31:22.0903712Z mov.b32 %r9583, %r9236; 2026-02-21T12:31:22.0903879Z mov.b32 %r9584, %r9236; 2026-02-21T12:31:22.0904041Z mov.b32 %r9585, %r9236; 2026-02-21T12:31:22.0904210Z mov.b32 %r9586, %r9236; 2026-02-21T12:31:22.0904372Z mov.b32 %r9587, %r9236; 2026-02-21T12:31:22.0904542Z mov.b32 %r9588, %r9236; 2026-02-21T12:31:22.0904780Z mov.b32 %r9589, %r9236; 2026-02-21T12:31:22.0904952Z mov.b32 %r9590, %r9236; 2026-02-21T12:31:22.0905133Z mov.b32 %r9591, %r9236; 2026-02-21T12:31:22.0905295Z mov.b32 %r9592, %r9236; 2026-02-21T12:31:22.0905460Z mov.b32 %r9593, %r9236; 2026-02-21T12:31:22.0905619Z mov.b32 %r9594, %r9236; 2026-02-21T12:31:22.0905785Z mov.b32 %r9595, %r9236; 2026-02-21T12:31:22.0905948Z mov.b32 %r9596, %r9236; 2026-02-21T12:31:22.0906114Z mov.b32 %r9597, %r9236; 2026-02-21T12:31:22.0906276Z mov.b32 %r9598, %r9236; 2026-02-21T12:31:22.0906640Z mov.b32 %r9599, %r9236; 2026-02-21T12:31:22.0906820Z mov.b32 %r9600, %r9236; 2026-02-21T12:31:22.0906990Z mov.b32 %r9601, %r9236; 2026-02-21T12:31:22.0907159Z mov.b32 %r9602, %r9236; 2026-02-21T12:31:22.0907331Z mov.b32 %r9603, %r9236; 2026-02-21T12:31:22.0907515Z mov.b32 %r9604, %r9236; 2026-02-21T12:31:22.0907681Z mov.b32 %r9605, %r9236; 2026-02-21T12:31:22.0907850Z mov.b32 %r9606, %r9236; 2026-02-21T12:31:22.0908014Z mov.b32 %r9607, %r9236; 2026-02-21T12:31:22.0908185Z mov.b32 %r9608, %r9236; 2026-02-21T12:31:22.0908448Z mov.b32 %r9609, %r9236; 2026-02-21T12:31:22.0908621Z mov.b32 %r9610, %r9236; 2026-02-21T12:31:22.0908789Z mov.b32 %r9611, %r9236; 2026-02-21T12:31:22.0908959Z mov.b32 %r9612, %r9236; 2026-02-21T12:31:22.0909131Z mov.b32 %r9613, %r9236; 2026-02-21T12:31:22.0909293Z mov.b32 %r9614, %r9236; 2026-02-21T12:31:22.0909466Z mov.b32 %r9615, %r9236; 2026-02-21T12:31:22.0909632Z mov.b32 %r9616, %r9236; 2026-02-21T12:31:22.0909801Z mov.b32 %r9617, %r9236; 
2026-02-21T12:31:22.0909968Z mov.b32 %r9618, %r9236; 2026-02-21T12:31:22.0910138Z mov.b32 %r9619, %r9236; 2026-02-21T12:31:22.0910298Z mov.b32 %r9620, %r9236; 2026-02-21T12:31:22.0910465Z mov.b32 %r9621, %r9236; 2026-02-21T12:31:22.0910629Z mov.b32 %r9622, %r9236; 2026-02-21T12:31:22.0910801Z mov.b32 %r9623, %r9236; 2026-02-21T12:31:22.0910968Z mov.b32 %r9624, %r9236; 2026-02-21T12:31:22.0911132Z mov.b32 %r9625, %r9236; 2026-02-21T12:31:22.0911307Z mov.b32 %r9626, %r9236; 2026-02-21T12:31:22.0911476Z mov.b32 %r9627, %r9236; 2026-02-21T12:31:22.0911655Z mov.b32 %r9628, %r9236; 2026-02-21T12:31:22.0911823Z mov.b32 %r9629, %r9236; 2026-02-21T12:31:22.0911994Z mov.b32 %r9630, %r9236; 2026-02-21T12:31:22.0912154Z mov.b32 %r9631, %r9236; 2026-02-21T12:31:22.0912319Z mov.b32 %r9632, %r9236; 2026-02-21T12:31:22.0912477Z mov.b32 %r9633, %r9236; 2026-02-21T12:31:22.0912648Z mov.b32 %r9634, %r9236; 2026-02-21T12:31:22.0912816Z mov.b32 %r9635, %r9236; 2026-02-21T12:31:22.0912978Z mov.b32 %r9636, %r9236; 2026-02-21T12:31:22.0913268Z mov.b32 %r9637, %r9236; 2026-02-21T12:31:22.0913430Z mov.b32 %r9638, %r9236; 2026-02-21T12:31:22.0913596Z mov.b32 %r9639, %r9236; 2026-02-21T12:31:22.0913756Z mov.b32 %r9640, %r9236; 2026-02-21T12:31:22.0914008Z mov.b32 %r9641, %r9236; 2026-02-21T12:31:22.0914170Z mov.b32 %r9642, %r9236; 2026-02-21T12:31:22.0914339Z mov.b32 %r9643, %r9236; 2026-02-21T12:31:22.0914512Z mov.b32 %r9644, %r9236; 2026-02-21T12:31:22.0914683Z mov.b32 %r9645, %r9236; 2026-02-21T12:31:22.0914850Z mov.b32 %r9646, %r9236; 2026-02-21T12:31:22.0915014Z mov.b32 %r9647, %r9236; 2026-02-21T12:31:22.0915185Z mov.b32 %r9648, %r9236; 2026-02-21T12:31:22.0915345Z mov.b32 %r9649, %r9236; 2026-02-21T12:31:22.0915512Z mov.b32 %r9650, %r9236; 2026-02-21T12:31:22.0915678Z mov.b32 %r9651, %r9236; 2026-02-21T12:31:22.0915852Z mov.b32 %r9652, %r9236; 2026-02-21T12:31:22.0916022Z mov.b32 %r9653, %r9236; 2026-02-21T12:31:22.0916194Z mov.b32 %r9654, %r9236; 2026-02-21T12:31:22.0916359Z mov.b32 
%r9655, %r9236; 2026-02-21T12:31:22.0916749Z mov.b32 %r9656, %r9236; 2026-02-21T12:31:22.0916932Z mov.b32 %r9657, %r9236; 2026-02-21T12:31:22.0917093Z mov.b32 %r9658, %r9236; 2026-02-21T12:31:22.0917270Z mov.b32 %r9659, %r9236; 2026-02-21T12:31:22.0917455Z mov.b32 %r9660, %r9236; 2026-02-21T12:31:22.0917622Z mov.b32 %r9661, %r9236; 2026-02-21T12:31:22.0917782Z mov.b32 %r9662, %r9236; 2026-02-21T12:31:22.0918021Z mov.b32 %r9663, %r9236; 2026-02-21T12:31:22.0918188Z mov.b32 %r9664, %r9236; 2026-02-21T12:31:22.0918367Z mov.b32 %r9665, %r9236; 2026-02-21T12:31:22.0918538Z mov.b32 %r9666, %r9236; 2026-02-21T12:31:22.0918707Z mov.b32 %r9667, %r9236; 2026-02-21T12:31:22.0918875Z mov.b32 %r9668, %r9236; 2026-02-21T12:31:22.0919038Z mov.b32 %r9669, %r9236; 2026-02-21T12:31:22.0919212Z mov.b32 %r9670, %r9236; 2026-02-21T12:31:22.0919374Z mov.b32 %r9671, %r9236; 2026-02-21T12:31:22.0919548Z mov.b32 %r9672, %r9236; 2026-02-21T12:31:22.0919713Z mov.b32 %r9673, %r9236; 2026-02-21T12:31:22.0919903Z mov.b32 %r9674, %r9236; 2026-02-21T12:31:22.0920073Z mov.b32 %r9675, %r9236; 2026-02-21T12:31:22.0920244Z mov.b32 %r9676, %r9236; 2026-02-21T12:31:22.0920404Z mov.b32 %r9677, %r9236; 2026-02-21T12:31:22.0920576Z mov.b32 %r9678, %r9236; 2026-02-21T12:31:22.0920747Z mov.b32 %r9679, %r9236; 2026-02-21T12:31:22.0920906Z mov.b32 %r9680, %r9236; 2026-02-21T12:31:22.0921068Z mov.b32 %r9681, %r9236; 2026-02-21T12:31:22.0921255Z mov.b32 %r9682, %r9236; 2026-02-21T12:31:22.0921419Z mov.b32 %r9683, %r9236; 2026-02-21T12:31:22.0921588Z mov.b32 %r9684, %r9236; 2026-02-21T12:31:22.0921748Z mov.b32 %r9685, %r9236; 2026-02-21T12:31:22.0921918Z mov.b32 %r9686, %r9236; 2026-02-21T12:31:22.0922079Z mov.b32 %r9687, %r9236; 2026-02-21T12:31:22.0922243Z mov.b32 %r9688, %r9236; 2026-02-21T12:31:22.0922405Z mov.b32 %r9689, %r9236; 2026-02-21T12:31:22.0922574Z mov.b32 %r9690, %r9236; 2026-02-21T12:31:22.0922737Z mov.b32 %r9691, %r9236; 2026-02-21T12:31:22.0922905Z mov.b32 %r9692, %r9236; 
2026-02-21T12:31:22.0923075Z mov.b32 %r9693, %r9236; 2026-02-21T12:31:22.0923234Z mov.b32 %r9694, %r9236; 2026-02-21T12:31:22.0923400Z mov.b32 %r9695, %r9236; 2026-02-21T12:31:22.0923559Z mov.b32 %r9696, %r9236; 2026-02-21T12:31:22.0923728Z mov.b32 %r9697, %r9236; 2026-02-21T12:31:22.0923890Z mov.b32 %r9698, %r9236; 2026-02-21T12:31:22.0924055Z mov.b32 %r9699, %r9236; 2026-02-21T12:31:22.0924214Z mov.b32 %r9700, %r9236; 2026-02-21T12:31:22.0924382Z mov.b32 %r9701, %r9236; 2026-02-21T12:31:22.0924548Z mov.b32 %r9702, %r9236; 2026-02-21T12:31:22.0924723Z mov.b32 %r9703, %r9236; 2026-02-21T12:31:22.0924893Z mov.b32 %r9704, %r9236; 2026-02-21T12:31:22.0925054Z mov.b32 %r9705, %r9236; 2026-02-21T12:31:22.0925220Z mov.b32 %r9706, %r9236; 2026-02-21T12:31:22.0925382Z mov.b32 %r9707, %r9236; 2026-02-21T12:31:22.0925455Z mov.b32 %r9708, %r9236; 2026-02-21T12:31:22.0925522Z mov.b32 %r9709, %r9236; 2026-02-21T12:31:22.0925581Z mov.b32 %r9710, %r9236; 2026-02-21T12:31:22.0925725Z mov.b32 %r9711, %r9236; 2026-02-21T12:31:22.0925792Z mov.b32 %r9712, %r9236; 2026-02-21T12:31:22.0925853Z mov.b32 %r9713, %r9236; 2026-02-21T12:31:22.0925912Z mov.b32 %r9714, %r9236; 2026-02-21T12:31:22.0925972Z mov.b32 %r9715, %r9236; 2026-02-21T12:31:22.0926106Z mov.b32 %r9716, %r9236; 2026-02-21T12:31:22.0926166Z mov.b32 %r9717, %r9236; 2026-02-21T12:31:22.0926227Z mov.b32 %r9718, %r9236; 2026-02-21T12:31:22.0926300Z mov.b32 %r9719, %r9236; 2026-02-21T12:31:22.0926367Z mov.b32 %r9720, %r9236; 2026-02-21T12:31:22.0926427Z mov.b32 %r9721, %r9236; 2026-02-21T12:31:22.0926604Z mov.b32 %r9722, %r9236; 2026-02-21T12:31:22.0926673Z mov.b32 %r9723, %r9236; 2026-02-21T12:31:22.0926731Z mov.b32 %r9724, %r9236; 2026-02-21T12:31:22.0926790Z mov.b32 %r9725, %r9236; 2026-02-21T12:31:22.0926853Z mov.b32 %r9726, %r9236; 2026-02-21T12:31:22.0926914Z mov.b32 %r9727, %r9236; 2026-02-21T12:31:22.0926972Z mov.b32 %r9728, %r9236; 2026-02-21T12:31:22.0927040Z mov.b32 %r9729, %r9236; 2026-02-21T12:31:22.0927179Z mov.b32 
%r9730, %r9236; 2026-02-21T12:31:22.0927243Z mov.b32 %r9731, %r9236; 2026-02-21T12:31:22.0927303Z mov.b32 %r9732, %r9236; 2026-02-21T12:31:22.0927367Z mov.b32 %r9733, %r9236; 2026-02-21T12:31:22.0927426Z mov.b32 %r9734, %r9236; 2026-02-21T12:31:22.0927487Z mov.b32 %r9735, %r9236; 2026-02-21T12:31:22.0927555Z mov.b32 %r9736, %r9236; 2026-02-21T12:31:22.0927614Z mov.b32 %r9737, %r9236; 2026-02-21T12:31:22.0927733Z mov.b32 %r9738, %r9236; 2026-02-21T12:31:22.0927795Z mov.b32 %r9739, %r9236; 2026-02-21T12:31:22.0927861Z mov.b32 %r9740, %r9236; 2026-02-21T12:31:22.0927921Z mov.b32 %r9741, %r9236; 2026-02-21T12:31:22.0927980Z mov.b32 %r9742, %r9236; 2026-02-21T12:31:22.0928043Z mov.b32 %r9743, %r9236; 2026-02-21T12:31:22.0928103Z mov.b32 %r9744, %r9236; 2026-02-21T12:31:22.0928163Z mov.b32 %r9745, %r9236; 2026-02-21T12:31:22.0928221Z mov.b32 %r9746, %r9236; 2026-02-21T12:31:22.0928286Z mov.b32 %r9747, %r9236; 2026-02-21T12:31:22.0928349Z mov.b32 %r9748, %r9236; 2026-02-21T12:31:22.0928415Z mov.b32 %r9749, %r9236; 2026-02-21T12:31:22.0928476Z mov.b32 %r9750, %r9236; 2026-02-21T12:31:22.0928535Z mov.b32 %r9751, %r9236; 2026-02-21T12:31:22.0928600Z mov.b32 %r9752, %r9236; 2026-02-21T12:31:22.0928670Z mov.b32 %r9753, %r9236; 2026-02-21T12:31:22.0928732Z mov.b32 %r9754, %r9236; 2026-02-21T12:31:22.0928790Z mov.b32 %r9755, %r9236; 2026-02-21T12:31:22.0928854Z mov.b32 %r9756, %r9236; 2026-02-21T12:31:22.0928917Z mov.b32 %r9757, %r9236; 2026-02-21T12:31:22.0928978Z mov.b32 %r9758, %r9236; 2026-02-21T12:31:22.0929043Z mov.b32 %r9759, %r9236; 2026-02-21T12:31:22.0929103Z mov.b32 %r9760, %r9236; 2026-02-21T12:31:22.0929163Z mov.b32 %r9761, %r9236; 2026-02-21T12:31:22.0929222Z mov.b32 %r9762, %r9236; 2026-02-21T12:31:22.0929289Z mov.b32 %r9763, %r9236; 2026-02-21T12:31:22.0929349Z mov.b32 %r9764, %r9236; 2026-02-21T12:31:22.0929410Z mov.b32 %r9765, %r9236; 2026-02-21T12:31:22.0929480Z mov.b32 %r9766, %r9236; 2026-02-21T12:31:22.0929546Z mov.b32 %r9767, %r9236; 
2026-02-21T12:31:22.0929605Z mov.b32 %r9768, %r9236; 2026-02-21T12:31:22.0929664Z mov.b32 %r9769, %r9236; 2026-02-21T12:31:22.0929725Z mov.b32 %r9770, %r9236; 2026-02-21T12:31:22.0929782Z mov.b32 %r9771, %r9236; 2026-02-21T12:31:22.0929843Z mov.b32 %r9772, %r9236; 2026-02-21T12:31:22.0929906Z mov.b32 %r9773, %r9236; 2026-02-21T12:31:22.0929963Z mov.b32 %r9774, %r9236; 2026-02-21T12:31:22.0930023Z mov.b32 %r9775, %r9236; 2026-02-21T12:31:22.0930092Z mov.b32 %r9776, %r9236; 2026-02-21T12:31:22.0930156Z mov.b32 %r9777, %r9236; 2026-02-21T12:31:22.0930219Z mov.b32 %r9778, %r9236; 2026-02-21T12:31:22.0930276Z mov.b32 %r9779, %r9236; 2026-02-21T12:31:22.0930342Z mov.b32 %r9780, %r9236; 2026-02-21T12:31:22.0930400Z mov.b32 %r9781, %r9236; 2026-02-21T12:31:22.0930456Z mov.b32 %r9782, %r9236; 2026-02-21T12:31:22.0930514Z mov.b32 %r9783, %r9236; 2026-02-21T12:31:22.0930576Z mov.b32 %r9784, %r9236; 2026-02-21T12:31:22.0930633Z mov.b32 %r9785, %r9236; 2026-02-21T12:31:22.0930773Z mov.b32 %r9786, %r9236; 2026-02-21T12:31:22.0930835Z mov.b32 %r9787, %r9236; 2026-02-21T12:31:22.0930893Z mov.b32 %r9788, %r9236; 2026-02-21T12:31:22.0930951Z mov.b32 %r9789, %r9236; 2026-02-21T12:31:22.0931087Z mov.b32 %r9790, %r9236; 2026-02-21T12:31:22.0931150Z mov.b32 %r9791, %r9236; 2026-02-21T12:31:22.0931210Z mov.b32 %r9792, %r9236; 2026-02-21T12:31:22.0931267Z mov.b32 %r9793, %r9236; 2026-02-21T12:31:22.0931331Z mov.b32 %r9794, %r9236; 2026-02-21T12:31:22.0931389Z mov.b32 %r9795, %r9236; 2026-02-21T12:31:22.0931447Z mov.b32 %r9796, %r9236; 2026-02-21T12:31:22.0931510Z mov.b32 %r9797, %r9236; 2026-02-21T12:31:22.0931568Z mov.b32 %r9798, %r9236; 2026-02-21T12:31:22.0931625Z mov.b32 %r9799, %r9236; 2026-02-21T12:31:22.0931683Z mov.b32 %r9800, %r9236; 2026-02-21T12:31:22.0931749Z mov.b32 %r9801, %r9236; 2026-02-21T12:31:22.0931807Z mov.b32 %r9802, %r9236; 2026-02-21T12:31:22.0931868Z mov.b32 %r9803, %r9236; 2026-02-21T12:31:22.0931936Z mov.b32 %r9804, %r9236; 2026-02-21T12:31:22.0932047Z mov.b32 
%r9805, %r9236; 2026-02-21T12:31:22.0932107Z mov.b32 %r9806, %r9236; 2026-02-21T12:31:22.0932166Z mov.b32 %r9807, %r9236; 2026-02-21T12:31:22.0932230Z mov.b32 %r9808, %r9236; 2026-02-21T12:31:22.0932292Z mov.b32 %r9809, %r9236; 2026-02-21T12:31:22.0932364Z mov.b32 %r9810, %r9236; 2026-02-21T12:31:22.0932430Z mov.b32 %r9811, %r9236; 2026-02-21T12:31:22.0932535Z mov.b32 %r9812, %r9236; 2026-02-21T12:31:22.0932600Z mov.b32 %r9813, %r9236; 2026-02-21T12:31:22.0932658Z mov.b32 %r9814, %r9236; 2026-02-21T12:31:22.0932722Z mov.b32 %r9815, %r9236; 2026-02-21T12:31:22.0932779Z mov.b32 %r9816, %r9236; 2026-02-21T12:31:22.0932839Z mov.b32 %r9817, %r9236; 2026-02-21T12:31:22.0932900Z mov.b32 %r9818, %r9236; 2026-02-21T12:31:22.0932959Z mov.b32 %r9819, %r9236; 2026-02-21T12:31:22.0933016Z mov.b32 %r9820, %r9236; 2026-02-21T12:31:22.0933077Z mov.b32 %r9821, %r9236; 2026-02-21T12:31:22.0933139Z mov.b32 %r9822, %r9236; 2026-02-21T12:31:22.0933204Z mov.b32 %r9823, %r9236; 2026-02-21T12:31:22.0933263Z mov.b32 %r9824, %r9236; 2026-02-21T12:31:22.0933325Z mov.b32 %r9825, %r9236; 2026-02-21T12:31:22.0933385Z mov.b32 %r9826, %r9236; 2026-02-21T12:31:22.0933444Z mov.b32 %r9827, %r9236; 2026-02-21T12:31:22.0933509Z mov.b32 %r9828, %r9236; 2026-02-21T12:31:22.0933572Z mov.b32 %r9829, %r9236; 2026-02-21T12:31:22.0933631Z mov.b32 %r9830, %r9236; 2026-02-21T12:31:22.0933693Z mov.b32 %r9831, %r9236; 2026-02-21T12:31:22.0933756Z mov.b32 %r9832, %r9236; 2026-02-21T12:31:22.0933816Z mov.b32 %r9833, %r9236; 2026-02-21T12:31:22.0933875Z mov.b32 %r9834, %r9236; 2026-02-21T12:31:22.0933934Z mov.b32 %r9835, %r9236; 2026-02-21T12:31:22.0933998Z mov.b32 %r9836, %r9236; 2026-02-21T12:31:22.0934056Z mov.b32 %r9837, %r9236; 2026-02-21T12:31:22.0934125Z mov.b32 %r9838, %r9236; 2026-02-21T12:31:22.0934190Z mov.b32 %r9839, %r9236; 2026-02-21T12:31:22.0934254Z mov.b32 %r9840, %r9236; 2026-02-21T12:31:22.0934314Z mov.b32 %r9841, %r9236; 2026-02-21T12:31:22.0934374Z mov.b32 %r9842, %r9236; 
2026-02-21T12:31:22.0934436Z mov.b32 %r9843, %r9236; 2026-02-21T12:31:22.0934495Z mov.b32 %r9844, %r9236; 2026-02-21T12:31:22.0934554Z mov.b32 %r9845, %r9236; 2026-02-21T12:31:22.0934621Z mov.b32 %r9846, %r9236; 2026-02-21T12:31:22.0934682Z mov.b32 %r9847, %r9236; 2026-02-21T12:31:22.0934742Z mov.b32 %r9848, %r9236; 2026-02-21T12:31:22.0934806Z mov.b32 %r9849, %r9236; 2026-02-21T12:31:22.0934866Z mov.b32 %r9850, %r9236; 2026-02-21T12:31:22.0934925Z mov.b32 %r9851, %r9236; 2026-02-21T12:31:22.0934983Z mov.b32 %r9852, %r9236; 2026-02-21T12:31:22.0935044Z mov.b32 %r9853, %r9236; 2026-02-21T12:31:22.0935103Z mov.b32 %r9854, %r9236; 2026-02-21T12:31:22.0935162Z mov.b32 %r9855, %r9236; 2026-02-21T12:31:22.0935224Z mov.b32 %r9856, %r9236; 2026-02-21T12:31:22.0935283Z mov.b32 %r9857, %r9236; 2026-02-21T12:31:22.0935343Z mov.b32 %r9858, %r9236; 2026-02-21T12:31:22.0935401Z mov.b32 %r9859, %r9236; 2026-02-21T12:31:22.0935534Z mov.b32 %r9860, %r9236; 2026-02-21T12:31:22.0935596Z mov.b32 %r9861, %r9236; 2026-02-21T12:31:22.0935656Z mov.b32 %r9862, %r9236; 2026-02-21T12:31:22.0935718Z mov.b32 %r9863, %r9236; 2026-02-21T12:31:22.0935775Z mov.b32 %r9864, %r9236; 2026-02-21T12:31:22.0935902Z mov.b32 %r9865, %r9236; 2026-02-21T12:31:22.0935960Z mov.b32 %r9866, %r9236; 2026-02-21T12:31:22.0936022Z mov.b32 %r9867, %r9236; 2026-02-21T12:31:22.0936082Z mov.b32 %r9868, %r9236; 2026-02-21T12:31:22.0936140Z mov.b32 %r9869, %r9236; 2026-02-21T12:31:22.0936206Z mov.b32 %r9870, %r9236; 2026-02-21T12:31:22.0936275Z mov.b32 %r9871, %r9236; 2026-02-21T12:31:22.0936336Z mov.b32 %r9872, %r9236; 2026-02-21T12:31:22.0936393Z mov.b32 %r9873, %r9236; 2026-02-21T12:31:22.0936575Z mov.b32 %r9874, %r9236; 2026-02-21T12:31:22.0936639Z mov.b32 %r9875, %r9236; 2026-02-21T12:31:22.0936697Z mov.b32 %r9876, %r9236; 2026-02-21T12:31:22.0936758Z mov.b32 %r9877, %r9236; 2026-02-21T12:31:22.0936815Z mov.b32 %r9878, %r9236; 2026-02-21T12:31:22.0936963Z mov.b32 %r9879, %r9236; 2026-02-21T12:31:22.0937026Z mov.b32 
%r9880, %r9236; 2026-02-21T12:31:22.0937088Z mov.b32 %r9881, %r9236; 2026-02-21T12:31:22.0937146Z mov.b32 %r9882, %r9236; 2026-02-21T12:31:22.0937207Z mov.b32 %r9883, %r9236; 2026-02-21T12:31:22.0937271Z mov.b32 %r9884, %r9236; 2026-02-21T12:31:22.0937339Z mov.b32 %r9885, %r9236; 2026-02-21T12:31:22.0937399Z mov.b32 %r9886, %r9236; 2026-02-21T12:31:22.0937520Z mov.b32 %r9887, %r9236; 2026-02-21T12:31:22.0937584Z mov.b32 %r9888, %r9236; 2026-02-21T12:31:22.0937641Z mov.b32 %r9889, %r9236; 2026-02-21T12:31:22.0937699Z mov.b32 %r9890, %r9236; 2026-02-21T12:31:22.0937763Z mov.b32 %r9891, %r9236; 2026-02-21T12:31:22.0937821Z mov.b32 %r9892, %r9236; 2026-02-21T12:31:22.0937890Z mov.b32 %r9893, %r9236; 2026-02-21T12:31:22.0937955Z mov.b32 %r9894, %r9236; 2026-02-21T12:31:22.0938014Z mov.b32 %r9895, %r9236; 2026-02-21T12:31:22.0938073Z mov.b32 %r9896, %r9236; 2026-02-21T12:31:22.0938134Z mov.b32 %r9897, %r9236; 2026-02-21T12:31:22.0938198Z mov.b32 %r9898, %r9236; 2026-02-21T12:31:22.0938258Z mov.b32 %r9899, %r9236; 2026-02-21T12:31:22.0938315Z mov.b32 %r9900, %r9236; 2026-02-21T12:31:22.0938377Z mov.b32 %r9901, %r9236; 2026-02-21T12:31:22.0938437Z mov.b32 %r9902, %r9236; 2026-02-21T12:31:22.0938496Z mov.b32 %r9903, %r9236; 2026-02-21T12:31:22.0938554Z mov.b32 %r9904, %r9236; 2026-02-21T12:31:22.0938618Z mov.b32 %r9905, %r9236; 2026-02-21T12:31:22.0938678Z mov.b32 %r9906, %r9236; 2026-02-21T12:31:22.0938734Z mov.b32 %r9907, %r9236; 2026-02-21T12:31:22.0938795Z mov.b32 %r9908, %r9236; 2026-02-21T12:31:22.0938853Z mov.b32 %r9909, %r9236; 2026-02-21T12:31:22.0938911Z mov.b32 %r9910, %r9236; 2026-02-21T12:31:22.0938968Z mov.b32 %r9911, %r9236; 2026-02-21T12:31:22.0939031Z mov.b32 %r9912, %r9236; 2026-02-21T12:31:22.0939089Z mov.b32 %r9913, %r9236; 2026-02-21T12:31:22.0939146Z mov.b32 %r9914, %r9236; 2026-02-21T12:31:22.0939213Z mov.b32 %r9915, %r9236; 2026-02-21T12:31:22.0939276Z mov.b32 %r9916, %r9236; 2026-02-21T12:31:22.0939336Z mov.b32 %r9917, %r9236; 
2026-02-21T12:31:22.0939394Z mov.b32 %r9918, %r9236; 2026-02-21T12:31:22.0939456Z mov.b32 %r9919, %r9236; 2026-02-21T12:31:22.0939516Z mov.b32 %r9920, %r9236; 2026-02-21T12:31:22.0939577Z mov.b32 %r9921, %r9236; 2026-02-21T12:31:22.0939640Z mov.b32 %r9922, %r9236; 2026-02-21T12:31:22.0939699Z mov.b32 %r9923, %r9236; 2026-02-21T12:31:22.0939759Z mov.b32 %r9924, %r9236; 2026-02-21T12:31:22.0939819Z mov.b32 %r9925, %r9236; 2026-02-21T12:31:22.0939892Z mov.b32 %r9926, %r9236; 2026-02-21T12:31:22.0939955Z mov.b32 %r9927, %r9236; 2026-02-21T12:31:22.0940017Z mov.b32 %r9928, %r9236; 2026-02-21T12:31:22.0940079Z mov.b32 %r9929, %r9236; 2026-02-21T12:31:22.0940137Z mov.b32 %r9930, %r9236; 2026-02-21T12:31:22.0940194Z mov.b32 %r9931, %r9236; 2026-02-21T12:31:22.0940255Z mov.b32 %r9932, %r9236; 2026-02-21T12:31:22.0940317Z mov.b32 %r9933, %r9236; 2026-02-21T12:31:22.0940379Z mov.b32 %r9934, %r9236; 2026-02-21T12:31:22.0940514Z mov.b32 %r9935, %r9236; 2026-02-21T12:31:22.0940581Z mov.b32 %r9936, %r9236; 2026-02-21T12:31:22.0940639Z mov.b32 %r9937, %r9236; 2026-02-21T12:31:22.0940697Z mov.b32 %r9938, %r9236; 2026-02-21T12:31:22.0940823Z mov.b32 %r9939, %r9236; 2026-02-21T12:31:22.0940882Z mov.b32 %r9940, %r9236; 2026-02-21T12:31:22.0940940Z mov.b32 %r9941, %r9236; 2026-02-21T12:31:22.0940999Z mov.b32 %r9942, %r9236; 2026-02-21T12:31:22.0941063Z mov.b32 %r9943, %r9236; 2026-02-21T12:31:22.0941121Z mov.b32 %r9944, %r9236; 2026-02-21T12:31:22.0941189Z mov.b32 %r9945, %r9236; 2026-02-21T12:31:22.0941254Z mov.b32 %r9946, %r9236; 2026-02-21T12:31:22.0941312Z mov.b32 %r9947, %r9236; 2026-02-21T12:31:22.0941370Z mov.b32 %r9948, %r9236; 2026-02-21T12:31:22.0941429Z mov.b32 %r9949, %r9236; 2026-02-21T12:31:22.0941491Z mov.b32 %r9950, %r9236; 2026-02-21T12:31:22.0941549Z mov.b32 %r9951, %r9236; 2026-02-21T12:31:22.0941607Z mov.b32 %r9952, %r9236; 2026-02-21T12:31:22.0941672Z mov.b32 %r9953, %r9236; 2026-02-21T12:31:22.0941782Z mov.b32 %r9954, %r9236; 2026-02-21T12:31:22.0941842Z mov.b32 
%r9955, %r9236; 2026-02-21T12:31:22.0941900Z mov.b32 %r9956, %r9236; 2026-02-21T12:31:22.0941966Z mov.b32 %r9957, %r9236; 2026-02-21T12:31:22.0942028Z mov.b32 %r9958, %r9236; 2026-02-21T12:31:22.0942086Z mov.b32 %r9959, %r9236; 2026-02-21T12:31:22.0942149Z mov.b32 %r9960, %r9236; 2026-02-21T12:31:22.0942253Z mov.b32 %r9961, %r9236; 2026-02-21T12:31:22.0942313Z mov.b32 %r9962, %r9236; 2026-02-21T12:31:22.0942371Z mov.b32 %r9963, %r9236; 2026-02-21T12:31:22.0942436Z mov.b32 %r9964, %r9236; 2026-02-21T12:31:22.0942492Z mov.b32 %r9965, %r9236; 2026-02-21T12:31:22.0942551Z mov.b32 %r9966, %r9236; 2026-02-21T12:31:22.0942613Z mov.b32 %r9967, %r9236; 2026-02-21T12:31:22.0942673Z mov.b32 %r9968, %r9236; 2026-02-21T12:31:22.0942730Z mov.b32 %r9969, %r9236; 2026-02-21T12:31:22.0942788Z mov.b32 %r9970, %r9236; 2026-02-21T12:31:22.0942852Z mov.b32 %r9971, %r9236; 2026-02-21T12:31:22.0942915Z mov.b32 %r9972, %r9236; 2026-02-21T12:31:22.0942972Z mov.b32 %r9973, %r9236; 2026-02-21T12:31:22.0943037Z mov.b32 %r9974, %r9236; 2026-02-21T12:31:22.0943094Z mov.b32 %r9975, %r9236; 2026-02-21T12:31:22.0943152Z mov.b32 %r9976, %r9236; 2026-02-21T12:31:22.0943213Z mov.b32 %r9977, %r9236; 2026-02-21T12:31:22.0943277Z mov.b32 %r9978, %r9236; 2026-02-21T12:31:22.0943335Z mov.b32 %r9979, %r9236; 2026-02-21T12:31:22.0943395Z mov.b32 %r9980, %r9236; 2026-02-21T12:31:22.0943456Z mov.b32 %r9981, %r9236; 2026-02-21T12:31:22.0943515Z mov.b32 %r9982, %r9236; 2026-02-21T12:31:22.0943573Z mov.b32 %r9983, %r9236; 2026-02-21T12:31:22.0943635Z mov.b32 %r9984, %r9236; 2026-02-21T12:31:22.0943694Z mov.b32 %r9985, %r9236; 2026-02-21T12:31:22.0943752Z mov.b32 %r9986, %r9236; 2026-02-21T12:31:22.0943808Z mov.b32 %r9987, %r9236; 2026-02-21T12:31:22.0943872Z mov.b32 %r9988, %r9236; 2026-02-21T12:31:22.0943930Z mov.b32 %r9989, %r9236; 2026-02-21T12:31:22.0943989Z mov.b32 %r9990, %r9236; 2026-02-21T12:31:22.0944053Z mov.b32 %r9991, %r9236; 2026-02-21T12:31:22.0944114Z mov.b32 %r9992, %r9236; 
2026-02-21T12:31:22.0944171Z mov.b32 %r9993, %r9236; 2026-02-21T12:31:22.0944230Z mov.b32 %r9994, %r9236; 2026-02-21T12:31:22.0944292Z mov.b32 %r9995, %r9236; 2026-02-21T12:31:22.0944366Z mov.b32 %r9996, %r9236; 2026-02-21T12:31:22.0944426Z mov.b32 %r9997, %r9236; 2026-02-21T12:31:22.0944489Z mov.b32 %r9998, %r9236; 2026-02-21T12:31:22.0944548Z mov.b32 %r9999, %r9236; 2026-02-21T12:31:22.0944610Z mov.b32 %r10000, %r9236; 2026-02-21T12:31:22.0944669Z mov.b32 %r10001, %r9236; 2026-02-21T12:31:22.0944730Z mov.b32 %r10002, %r9236; 2026-02-21T12:31:22.0944788Z mov.b32 %r10003, %r9236; 2026-02-21T12:31:22.0944849Z mov.b32 %r10004, %r9236; 2026-02-21T12:31:22.0944913Z mov.b32 %r10005, %r9236; 2026-02-21T12:31:22.0944972Z mov.b32 %r10006, %r9236; 2026-02-21T12:31:22.0945031Z mov.b32 %r10007, %r9236; 2026-02-21T12:31:22.0945090Z mov.b32 %r10008, %r9236; 2026-02-21T12:31:22.0945214Z mov.b32 %r10009, %r9236; 2026-02-21T12:31:22.0945273Z mov.b32 %r10010, %r9236; 2026-02-21T12:31:22.0945334Z mov.b32 %r10011, %r9236; 2026-02-21T12:31:22.0945397Z mov.b32 %r10012, %r9236; 2026-02-21T12:31:22.0945456Z mov.b32 %r10013, %r9236; 2026-02-21T12:31:22.0945564Z mov.b32 %r10014, %r9236; 2026-02-21T12:31:22.0945623Z mov.b32 %r10015, %r9236; 2026-02-21T12:31:22.0945686Z mov.b32 %r10016, %r9236; 2026-02-21T12:31:22.0945745Z mov.b32 %r10017, %r9236; 2026-02-21T12:31:22.0945808Z mov.b32 %r10018, %r9236; 2026-02-21T12:31:22.0945872Z mov.b32 %r10019, %r9236; 2026-02-21T12:31:22.0945931Z mov.b32 %r10020, %r9236; 2026-02-21T12:31:22.0945989Z mov.b32 %r10021, %r9236; 2026-02-21T12:31:22.0946048Z mov.b32 %r10022, %r9236; 2026-02-21T12:31:22.0946113Z mov.b32 %r10023, %r9236; 2026-02-21T12:31:22.0946171Z mov.b32 %r10024, %r9236; 2026-02-21T12:31:22.0946229Z mov.b32 %r10025, %r9236; 2026-02-21T12:31:22.0946293Z mov.b32 %r10026, %r9236; 2026-02-21T12:31:22.0946353Z mov.b32 %r10027, %r9236; 2026-02-21T12:31:22.0946584Z mov.b32 %r10028, %r9236; 2026-02-21T12:31:22.0946656Z mov.b32 %r10029, %r9236; 
2026-02-21T12:31:22.0946715Z mov.b32 %r10030, %r9236; 2026-02-21T12:31:22.0946775Z mov.b32 %r10031, %r9236; 2026-02-21T12:31:22.0946835Z mov.b32 %r10032, %r9236; 2026-02-21T12:31:22.0946900Z mov.b32 %r10033, %r9236; 2026-02-21T12:31:22.0946958Z mov.b32 %r10034, %r9236; 2026-02-21T12:31:22.0947090Z mov.b32 %r10035, %r9236; 2026-02-21T12:31:22.0947168Z mov.b32 %r10036, %r9236; 2026-02-21T12:31:22.0947238Z mov.b32 %r10037, %r9236; 2026-02-21T12:31:22.0947298Z mov.b32 %r10038, %r9236; 2026-02-21T12:31:22.0947358Z mov.b32 %r10039, %r9236; 2026-02-21T12:31:22.0947425Z mov.b32 %r10040, %r9236; 2026-02-21T12:31:22.0947484Z mov.b32 %r10041, %r9236; 2026-02-21T12:31:22.0947542Z mov.b32 %r10042, %r9236; 2026-02-21T12:31:22.0947606Z mov.b32 %r10043, %r9236; 2026-02-21T12:31:22.0947665Z mov.b32 %r10044, %r9236; 2026-02-21T12:31:22.0947725Z mov.b32 %r10045, %r9236; 2026-02-21T12:31:22.0947786Z mov.b32 %r10046, %r9236; 2026-02-21T12:31:22.0947849Z mov.b32 %r10047, %r9236; 2026-02-21T12:31:22.0947908Z mov.b32 %r10048, %r9236; 2026-02-21T12:31:22.0947966Z mov.b32 %r10049, %r9236; 2026-02-21T12:31:22.0948031Z mov.b32 %r10050, %r9236; 2026-02-21T12:31:22.0948089Z mov.b32 %r10051, %r9236; 2026-02-21T12:31:22.0948147Z mov.b32 %r10052, %r9236; 2026-02-21T12:31:22.0948208Z mov.b32 %r10053, %r9236; 2026-02-21T12:31:22.0948363Z mov.b32 %r10054, %r9236; 2026-02-21T12:31:22.0948424Z mov.b32 %r10055, %r9236; 2026-02-21T12:31:22.0948483Z mov.b32 %r10056, %r9236; 2026-02-21T12:31:22.0948546Z mov.b32 %r10057, %r9236; 2026-02-21T12:31:22.0948605Z mov.b32 %r10058, %r9236; 2026-02-21T12:31:22.0948668Z mov.b32 %r10059, %r9236; 2026-02-21T12:31:22.0948725Z mov.b32 %r10060, %r9236; 2026-02-21T12:31:22.0948787Z mov.b32 %r10061, %r9236; 2026-02-21T12:31:22.0948846Z mov.b32 %r10062, %r9236; 2026-02-21T12:31:22.0948905Z mov.b32 %r10063, %r9236; 2026-02-21T12:31:22.0948973Z mov.b32 %r10064, %r9236; 2026-02-21T12:31:22.0949031Z mov.b32 %r10065, %r9236; 2026-02-21T12:31:22.0949089Z mov.b32 %r10066, %r9236; 
2026-02-21T12:31:22.0949148Z mov.b32 %r10067, %r9236; 2026-02-21T12:31:22.0949213Z mov.b32 %r10068, %r9236; 2026-02-21T12:31:22.0949272Z mov.b32 %r10069, %r9236; 2026-02-21T12:31:22.0949331Z mov.b32 %r10070, %r9236; 2026-02-21T12:31:22.0949392Z mov.b32 %r10071, %r9236; 2026-02-21T12:31:22.0949451Z mov.b32 %r10072, %r9236; 2026-02-21T12:31:22.0949510Z mov.b32 %r10073, %r9236; 2026-02-21T12:31:22.0949569Z mov.b32 %r10074, %r9236; 2026-02-21T12:31:22.0949633Z mov.b32 %r10075, %r9236; 2026-02-21T12:31:22.0949690Z mov.b32 %r10076, %r9236; 2026-02-21T12:31:22.0949748Z mov.b32 %r10077, %r9236; 2026-02-21T12:31:22.0949809Z mov.b32 %r10078, %r9236; 2026-02-21T12:31:22.0949868Z mov.b32 %r10079, %r9236; 2026-02-21T12:31:22.0949927Z mov.b32 %r10080, %r9236; 2026-02-21T12:31:22.0949993Z mov.b32 %r10081, %r9236; 2026-02-21T12:31:22.0950133Z mov.b32 %r10082, %r9236; 2026-02-21T12:31:22.0950190Z mov.b32 %r10083, %r9236; 2026-02-21T12:31:22.0950248Z mov.b32 %r10084, %r9236; 2026-02-21T12:31:22.0950310Z mov.b32 %r10085, %r9236; 2026-02-21T12:31:22.0950368Z mov.b32 %r10086, %r9236; 2026-02-21T12:31:22.0950490Z mov.b32 %r10087, %r9236; 2026-02-21T12:31:22.0950560Z mov.b32 %r10088, %r9236; 2026-02-21T12:31:22.0950629Z mov.b32 %r10089, %r9236; 2026-02-21T12:31:22.0950692Z mov.b32 %r10090, %r9236; 2026-02-21T12:31:22.0950751Z mov.b32 %r10091, %r9236; 2026-02-21T12:31:22.0950817Z mov.b32 %r10092, %r9236; 2026-02-21T12:31:22.0950876Z mov.b32 %r10093, %r9236; 2026-02-21T12:31:22.0950934Z mov.b32 %r10094, %r9236; 2026-02-21T12:31:22.0950997Z mov.b32 %r10095, %r9236; 2026-02-21T12:31:22.0951056Z mov.b32 %r10096, %r9236; 2026-02-21T12:31:22.0951116Z mov.b32 %r10097, %r9236; 2026-02-21T12:31:22.0951174Z mov.b32 %r10098, %r9236; 2026-02-21T12:31:22.0951239Z mov.b32 %r10099, %r9236; 2026-02-21T12:31:22.0951300Z mov.b32 %r10100, %r9236; 2026-02-21T12:31:22.0951424Z mov.b32 %r10101, %r9236; 2026-02-21T12:31:22.0951493Z mov.b32 %r10102, %r9236; 2026-02-21T12:31:22.0951551Z mov.b32 %r10103, %r9236; 
2026-02-21T12:31:22.0951611Z mov.b32 %r10104, %r9236; 2026-02-21T12:31:22.0951672Z mov.b32 %r10105, %r9236; 2026-02-21T12:31:22.0951747Z mov.b32 %r10106, %r9236; 2026-02-21T12:31:22.0951811Z mov.b32 %r10107, %r9236; 2026-02-21T12:31:22.0951916Z mov.b32 %r10108, %r9236; 2026-02-21T12:31:22.0951984Z mov.b32 %r10109, %r9236; 2026-02-21T12:31:22.0952043Z mov.b32 %r10110, %r9236; 2026-02-21T12:31:22.0952101Z mov.b32 %r10111, %r9236; 2026-02-21T12:31:22.0952160Z mov.b32 %r10112, %r9236; 2026-02-21T12:31:22.0952224Z mov.b32 %r10113, %r9236; 2026-02-21T12:31:22.0952284Z mov.b32 %r10114, %r9236; 2026-02-21T12:31:22.0952342Z mov.b32 %r10115, %r9236; 2026-02-21T12:31:22.0952405Z mov.b32 %r10116, %r9236; 2026-02-21T12:31:22.0952462Z mov.b32 %r10117, %r9236; 2026-02-21T12:31:22.0952520Z mov.b32 %r10118, %r9236; 2026-02-21T12:31:22.0952582Z mov.b32 %r10119, %r9236; 2026-02-21T12:31:22.0952645Z mov.b32 %r10120, %r9236; 2026-02-21T12:31:22.0952703Z mov.b32 %r10121, %r9236; 2026-02-21T12:31:22.0952761Z mov.b32 %r10122, %r9236; 2026-02-21T12:31:22.0952827Z mov.b32 %r10123, %r9236; 2026-02-21T12:31:22.0952884Z mov.b32 %r10124, %r9236; 2026-02-21T12:31:22.0952944Z mov.b32 %r10125, %r9236; 2026-02-21T12:31:22.0953006Z mov.b32 %r10126, %r9236; 2026-02-21T12:31:22.0953065Z mov.b32 %r10127, %r9236; 2026-02-21T12:31:22.0953123Z mov.b32 %r10128, %r9236; 2026-02-21T12:31:22.0953182Z mov.b32 %r10129, %r9236; 2026-02-21T12:31:22.0953246Z mov.b32 %r10130, %r9236; 2026-02-21T12:31:22.0953304Z mov.b32 %r10131, %r9236; 2026-02-21T12:31:22.0953363Z mov.b32 %r10132, %r9236; 2026-02-21T12:31:22.0953424Z mov.b32 %r10133, %r9236; 2026-02-21T12:31:22.0953484Z mov.b32 %r10134, %r9236; 2026-02-21T12:31:22.0953542Z mov.b32 %r10135, %r9236; 2026-02-21T12:31:22.0953601Z mov.b32 %r10136, %r9236; 2026-02-21T12:31:22.0953671Z mov.b32 %r10137, %r9236; 2026-02-21T12:31:22.0953730Z mov.b32 %r10138, %r9236; 2026-02-21T12:31:22.0953788Z mov.b32 %r10139, %r9236; 2026-02-21T12:31:22.0953852Z mov.b32 %r10140, %r9236; 
2026-02-21T12:31:22.0953911Z mov.b32 %r10141, %r9236; 2026-02-21T12:31:22.0953971Z mov.b32 %r10142, %r9236; 2026-02-21T12:31:22.0954030Z mov.b32 %r10143, %r9236; 2026-02-21T12:31:22.0954095Z mov.b32 %r10144, %r9236; 2026-02-21T12:31:22.0954158Z mov.b32 %r10145, %r9236; 2026-02-21T12:31:22.0954220Z mov.b32 %r10146, %r9236; 2026-02-21T12:31:22.0954287Z mov.b32 %r10147, %r9236; 2026-02-21T12:31:22.0954348Z mov.b32 %r10148, %r9236; 2026-02-21T12:31:22.0954409Z mov.b32 %r10149, %r9236; 2026-02-21T12:31:22.0954470Z mov.b32 %r10150, %r9236; 2026-02-21T12:31:22.0954538Z mov.b32 %r10151, %r9236; 2026-02-21T12:31:22.0954598Z mov.b32 %r10152, %r9236; 2026-02-21T12:31:22.0954661Z mov.b32 %r10153, %r9236; 2026-02-21T12:31:22.0954726Z mov.b32 %r10154, %r9236; 2026-02-21T12:31:22.0954881Z mov.b32 %r10155, %r9236; 2026-02-21T12:31:22.0954944Z mov.b32 %r10156, %r9236; 2026-02-21T12:31:22.0955004Z mov.b32 %r10157, %r9236; 2026-02-21T12:31:22.0955071Z mov.b32 %r10158, %r9236; 2026-02-21T12:31:22.0955132Z mov.b32 %r10159, %r9236; 2026-02-21T12:31:22.0955242Z mov.b32 %r10160, %r9236; 2026-02-21T12:31:22.0955304Z mov.b32 %r10161, %r9236; 2026-02-21T12:31:22.0955368Z mov.b32 %r10162, %r9236; 2026-02-21T12:31:22.0955430Z mov.b32 %r10163, %r9236; 2026-02-21T12:31:22.0955492Z mov.b32 %r10164, %r9236; 2026-02-21T12:31:22.0955557Z mov.b32 %r10165, %r9236; 2026-02-21T12:31:22.0955617Z mov.b32 %r10166, %r9236; 2026-02-21T12:31:22.0955676Z mov.b32 %r10167, %r9236; 2026-02-21T12:31:22.0955742Z mov.b32 %r10168, %r9236; 2026-02-21T12:31:22.0955801Z mov.b32 %r10169, %r9236; 2026-02-21T12:31:22.0955861Z mov.b32 %r10170, %r9236; 2026-02-21T12:31:22.0955924Z mov.b32 %r10171, %r9236; 2026-02-21T12:31:22.0955987Z mov.b32 %r10172, %r9236; 2026-02-21T12:31:22.0956047Z mov.b32 %r10173, %r9236; 2026-02-21T12:31:22.0956165Z mov.b32 %r10174, %r9236; 2026-02-21T12:31:22.0956234Z mov.b32 %r10175, %r9236; 2026-02-21T12:31:22.0956294Z mov.b32 %r10176, %r9236; 2026-02-21T12:31:22.0956352Z mov.b32 %r10177, %r9236; 
2026-02-21T12:31:22.0956417Z mov.b32 %r10178, %r9236; 2026-02-21T12:31:22.0956592Z mov.b32 %r10179, %r9236; 2026-02-21T12:31:22.0956657Z mov.b32 %r10180, %r9236; 2026-02-21T12:31:22.0956788Z mov.b32 %r10181, %r9236; 2026-02-21T12:31:22.0956855Z mov.b32 %r10182, %r9236; 2026-02-21T12:31:22.0956914Z mov.b32 %r10183, %r9236; 2026-02-21T12:31:22.0956973Z mov.b32 %r10184, %r9236; 2026-02-21T12:31:22.0957039Z mov.b32 %r10185, %r9236; 2026-02-21T12:31:22.0957097Z mov.b32 %r10186, %r9236; 2026-02-21T12:31:22.0957157Z mov.b32 %r10187, %r9236; 2026-02-21T12:31:22.0957225Z mov.b32 %r10188, %r9236; 2026-02-21T12:31:22.0957292Z mov.b32 %r10189, %r9236; 2026-02-21T12:31:22.0957351Z mov.b32 %r10190, %r9236; 2026-02-21T12:31:22.0957410Z mov.b32 %r10191, %r9236; 2026-02-21T12:31:22.0957493Z mov.b32 %r10192, %r9236; 2026-02-21T12:31:22.0957554Z mov.b32 %r10193, %r9236; 2026-02-21T12:31:22.0957614Z mov.b32 %r10194, %r9236; 2026-02-21T12:31:22.0957674Z mov.b32 %r10195, %r9236; 2026-02-21T12:31:22.0957741Z mov.b32 %r10196, %r9236; 2026-02-21T12:31:22.0957801Z mov.b32 %r10197, %r9236; 2026-02-21T12:31:22.0957860Z mov.b32 %r10198, %r9236; 2026-02-21T12:31:22.0957925Z mov.b32 %r10199, %r9236; 2026-02-21T12:31:22.0957986Z mov.b32 %r10200, %r9236; 2026-02-21T12:31:22.0958049Z mov.b32 %r10201, %r9236; 2026-02-21T12:31:22.0958108Z mov.b32 %r10202, %r9236; 2026-02-21T12:31:22.0958172Z mov.b32 %r10203, %r9236; 2026-02-21T12:31:22.0958231Z mov.b32 %r10204, %r9236; 2026-02-21T12:31:22.0958291Z mov.b32 %r10205, %r9236; 2026-02-21T12:31:22.0958357Z mov.b32 %r10206, %r9236; 2026-02-21T12:31:22.0958415Z mov.b32 %r10207, %r9236; 2026-02-21T12:31:22.0958474Z mov.b32 %r10208, %r9236; 2026-02-21T12:31:22.0958533Z mov.b32 %r10209, %r9236; 2026-02-21T12:31:22.0958604Z mov.b32 %r10210, %r9236; 2026-02-21T12:31:22.0958663Z mov.b32 %r10211, %r9236; 2026-02-21T12:31:22.0958722Z mov.b32 %r10212, %r9236; 2026-02-21T12:31:22.0958791Z mov.b32 %r10213, %r9236; 2026-02-21T12:31:22.0958851Z mov.b32 %r10214, %r9236; 
2026-02-21T12:31:22.0958913Z mov.b32 %r10215, %r9236; 2026-02-21T12:31:22.0958978Z mov.b32 %r10216, %r9236; 2026-02-21T12:31:22.0959037Z mov.b32 %r10217, %r9236; 2026-02-21T12:31:22.0959095Z mov.b32 %r10218, %r9236; 2026-02-21T12:31:22.0959155Z mov.b32 %r10219, %r9236; 2026-02-21T12:31:22.0959219Z mov.b32 %r10220, %r9236; 2026-02-21T12:31:22.0959278Z mov.b32 %r10221, %r9236; 2026-02-21T12:31:22.0959338Z mov.b32 %r10222, %r9236; 2026-02-21T12:31:22.0959403Z mov.b32 %r10223, %r9236; 2026-02-21T12:31:22.0959461Z mov.b32 %r10224, %r9236; 2026-02-21T12:31:22.0959522Z mov.b32 %r10225, %r9236; 2026-02-21T12:31:22.0959582Z mov.b32 %r10226, %r9236; 2026-02-21T12:31:22.0959650Z mov.b32 %r10227, %r9236; 2026-02-21T12:31:22.0959801Z mov.b32 %r10228, %r9236; 2026-02-21T12:31:22.0959864Z mov.b32 %r10229, %r9236; 2026-02-21T12:31:22.0959930Z mov.b32 %r10230, %r9236; 2026-02-21T12:31:22.0959989Z mov.b32 %r10231, %r9236; 2026-02-21T12:31:22.0960050Z mov.b32 %r10232, %r9236; 2026-02-21T12:31:22.0960172Z mov.b32 %r10233, %r9236; 2026-02-21T12:31:22.0960248Z mov.b32 %r10234, %r9236; 2026-02-21T12:31:22.0960311Z mov.b32 %r10235, %r9236; 2026-02-21T12:31:22.0960372Z mov.b32 %r10236, %r9236; 2026-02-21T12:31:22.0960437Z mov.b32 %r10237, %r9236; 2026-02-21T12:31:22.0960497Z mov.b32 %r10238, %r9236; 2026-02-21T12:31:22.0960557Z mov.b32 %r10239, %r9236; 2026-02-21T12:31:22.0960615Z mov.b32 %r10240, %r9236; 2026-02-21T12:31:22.0960679Z mov.b32 %r10241, %r9236; 2026-02-21T12:31:22.0960739Z mov.b32 %r10242, %r9236; 2026-02-21T12:31:22.0960798Z mov.b32 %r10243, %r9236; 2026-02-21T12:31:22.0960864Z mov.b32 %r10244, %r9236; 2026-02-21T12:31:22.0960925Z mov.b32 %r10245, %r9236; 2026-02-21T12:31:22.0960985Z mov.b32 %r10246, %r9236; 2026-02-21T12:31:22.0961111Z mov.b32 %r10247, %r9236; 2026-02-21T12:31:22.0961181Z mov.b32 %r10248, %r9236; 2026-02-21T12:31:22.0961241Z mov.b32 %r10249, %r9236; 2026-02-21T12:31:22.0961302Z mov.b32 %r10250, %r9236; 2026-02-21T12:31:22.0961370Z mov.b32 %r10251, %r9236; 
2026-02-21T12:31:22.0961430Z mov.b32 %r10252, %r9236; 2026-02-21T12:31:22.0961491Z mov.b32 %r10253, %r9236; 2026-02-21T12:31:22.0961603Z mov.b32 %r10254, %r9236; 2026-02-21T12:31:22.0961679Z mov.b32 %r10255, %r9236; 2026-02-21T12:31:22.0961739Z mov.b32 %r10256, %r9236; 2026-02-21T12:31:22.0961799Z mov.b32 %r10257, %r9236; 2026-02-21T12:31:22.0961863Z mov.b32 %r10258, %r9236; 2026-02-21T12:31:22.0961922Z mov.b32 %r10259, %r9236; 2026-02-21T12:31:22.0961988Z mov.b64 %rd659, %rd648; 2026-02-21T12:31:22.0962054Z mov.b64 %rd661, %rd656; 2026-02-21T12:31:22.0962116Z mov.b64 %rd662, %rd655; 2026-02-21T12:31:22.0962179Z bra.uni $L__BB0_2; 2026-02-21T12:31:22.0962308Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:31:22.0962557Z .loc 1 19 135 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:19:135 2026-02-21T12:31:22.0962629Z add.s64 %rd659, %rd659, 1; 2026-02-21T12:31:22.0962709Z setp.ne.b64 %p22, %rd147, %rd659; 2026-02-21T12:31:22.0962778Z mov.b64 %rd645, %rd655; 2026-02-21T12:31:22.0962839Z mov.b64 %rd646, %rd148; 2026-02-21T12:31:22.0962899Z mov.b64 %rd647, %rd658; 2026-02-21T12:31:22.0962959Z mov.b64 %rd648, %rd150; 2026-02-21T12:31:22.0963023Z mov.b64 %rd649, %rd656; 2026-02-21T12:31:22.0963083Z mov.b64 %rd650, %rd152; 2026-02-21T12:31:22.0963144Z mov.b32 %r9233, %r39; 2026-02-21T12:31:22.0963209Z mov.b64 %rd655, %rd662; 2026-02-21T12:31:22.0963268Z mov.b64 %rd656, %rd661; 2026-02-21T12:31:22.0963327Z mov.b64 %rd658, %rd163; 2026-02-21T12:31:22.0963390Z @%p22 bra $L__BB0_2; 2026-02-21T12:31:22.0963455Z bra.uni $L__BB0_10; 2026-02-21T12:31:22.0963580Z $L__BB0_2: // =>This Inner Loop Header: Depth=1 2026-02-21T12:31:22.0963806Z .loc 1 0 135 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:0:135 2026-02-21T12:31:22.0963875Z mov.b32 %r39, %r9232; 2026-02-21T12:31:22.0963935Z mov.b32 %r9232, %r9231; 2026-02-21T12:31:22.0963997Z mov.b64 %rd152, %rd649; 2026-02-21T12:31:22.0964063Z mov.b64 %rd150, %rd647; 2026-02-21T12:31:22.0964123Z mov.b64 
%rd148, %rd645; 2026-02-21T12:31:22.0964347Z .loc 1 19 135 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:19:135 2026-02-21T12:31:22.0964413Z add.s64 %rd220, %rd658, 1; 2026-02-21T12:31:22.0964491Z setp.eq.b64 %p6, %rd658, 1023; 2026-02-21T12:31:22.0964561Z selp.b64 %rd163, 0, %rd220, %p6; 2026-02-21T12:31:22.0964630Z setp.ne.b64 %p7, %rd163, 0; 2026-02-21T12:31:22.0964698Z @%p7 bra $L__BB0_7; 2026-02-21T12:31:22.0964815Z // %bb.3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:31:22.0964884Z add.s64 %rd667, %rd667, 2112; 2026-02-21T12:31:22.0965183Z .loc 1 25 35 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:25:35 2026-02-21T12:31:22.0965253Z shr.s64 %rd221, %rd667, 63; 2026-02-21T12:31:22.0965317Z shr.u64 %rd222, %rd221, 53; 2026-02-21T12:31:22.0965431Z add.s64 %rd223, %rd667, %rd222; 2026-02-21T12:31:22.0965499Z shr.s64 %rd224, %rd223, 11; 2026-02-21T12:31:22.0965703Z .loc 1 26 33 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:26:33 2026-02-21T12:31:22.0965770Z shl.b64 %rd165, %rd224, 2; 2026-02-21T12:31:22.0965974Z .loc 1 27 39 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:27:39 2026-02-21T12:31:22.0966038Z sub.s64 %rd225, 5, %rd165; 2026-02-21T12:31:22.0966236Z .loc 1 27 52 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:27:52 2026-02-21T12:31:22.0966302Z min.s64 %rd166, %rd225, 4; 2026-02-21T12:31:22.0966713Z .loc 1 28 45 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:28:45 2026-02-21T12:31:22.0966795Z and.b64 %rd226, %rd223, -2048; 2026-02-21T12:31:22.0966862Z sub.s64 %rd167, %rd667, %rd226; 2026-02-21T12:31:22.0967074Z .loc 1 29 51 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:29:51 2026-02-21T12:31:22.0967144Z or.b64 %rd227, %rd167, %rd166; 2026-02-21T12:31:22.0967215Z and.b64 %rd228, %rd227, -4294967296; 2026-02-21T12:31:22.0967354Z setp.ne.b64 %p8, %rd228, 0; 2026-02-21T12:31:22.0967420Z @%p8 bra $L__BB0_5; 2026-02-21T12:31:22.0967480Z bra.uni $L__BB0_4; 
2026-02-21T12:31:22.0967601Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:31:22.0967682Z div.s64 %rd660, %rd167, %rd166; 2026-02-21T12:31:22.0967745Z bra.uni $L__BB0_6; 2026-02-21T12:31:22.0967853Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:31:22.0967924Z cvt.u32.u64 %r3352, %rd166; 2026-02-21T12:31:22.0967987Z cvt.u32.u64 %r3353, %rd167; 2026-02-21T12:31:22.0968055Z div.u32 %r3354, %r3353, %r3352; 2026-02-21T12:31:22.0968121Z cvt.u64.u32 %rd660, %r3354; 2026-02-21T12:31:22.0968224Z $L__BB0_6: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:31:22.0968428Z .loc 1 28 64 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:28:64 2026-02-21T12:31:22.0968502Z mul.lo.s64 %rd229, %rd660, %rd166; 2026-02-21T12:31:22.0968567Z sub.s64 %rd230, %rd167, %rd229; 2026-02-21T12:31:22.0968764Z .loc 1 28 30 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:28:30 2026-02-21T12:31:22.0968829Z add.s64 %rd231, %rd230, %rd165; 2026-02-21T12:31:22.0969032Z .loc 1 30 27 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:30:27 2026-02-21T12:31:22.0969097Z shl.b64 %rd232, %rd231, 8; 2026-02-21T12:31:22.0969295Z .loc 1 31 32 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:31:32 2026-02-21T12:31:22.0969370Z or.b64 %rd661, %rd232, %rd2; 2026-02-21T12:31:22.0969569Z .loc 1 32 27 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:32:27 2026-02-21T12:31:22.0969633Z shl.b64 %rd662, %rd660, 9; 2026-02-21T12:31:22.0969839Z .loc 1 33 32 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:33:32 2026-02-21T12:31:22.0969904Z or.b64 %rd663, %rd662, %rd3; 2026-02-21T12:31:22.0969967Z or.b64 %rd664, %rd662, %rd4; 2026-02-21T12:31:22.0970030Z or.b64 %rd665, %rd662, %rd5; 2026-02-21T12:31:22.0970098Z or.b64 %rd666, %rd662, %rd6; 2026-02-21T12:31:22.0970204Z $L__BB0_7: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:31:22.0970414Z .loc 1 19 135 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:19:135 
2026-02-21T12:31:22.0970487Z setp.eq.b64 %p17, %rd163, 0; 2026-02-21T12:31:22.0970560Z setp.lt.s64 %p18, %rd659, %rd145; 2026-02-21T12:31:22.0970623Z add.s32 %r7499, %r9234, 1; 2026-02-21T12:31:22.0970779Z setp.gt.s32 %p19, %r7499, 2; 2026-02-21T12:31:22.0970849Z selp.b32 %r9234, 0, %r7499, %p19; 2026-02-21T12:31:22.0971050Z .loc 1 41 35 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:41:35 2026-02-21T12:31:22.0971181Z cvt.s64.s32 %rd248, %r9233; 2026-02-21T12:31:22.0971255Z add.s64 %rd249, %rd248, %rd7; 2026-02-21T12:31:22.0971455Z .loc 1 48 80 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:48:80 2026-02-21T12:31:22.0971526Z cp.async.wait_group 2; 2026-02-21T12:31:22.0971591Z bar.sync 0; 2026-02-21T12:31:22.0971656Z shl.b32 %r7500, %r9234, 13; 2026-02-21T12:31:22.0971852Z .loc 1 52 32 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:52:32 2026-02-21T12:31:22.0971921Z add.s32 %r7501, %r9, %r7500; 2026-02-21T12:31:22.0971989Z ld.shared.b16 %rs1, [%r7501]; 2026-02-21T12:31:22.0972058Z ld.shared.b16 %rs2, [%r7501+128]; 2026-02-21T12:31:22.0972130Z ld.shared.b16 %rs3, [%r7501+8]; 2026-02-21T12:31:22.0972256Z ld.shared.b16 %rs4, [%r7501+136]; 2026-02-21T12:31:22.0972330Z ld.shared.b16 %rs5, [%r7501+1024]; 2026-02-21T12:31:22.0972399Z ld.shared.b16 %rs6, [%r7501+1152]; 2026-02-21T12:31:22.0972479Z ld.shared.b16 %rs7, [%r7501+1032]; 2026-02-21T12:31:22.0972547Z ld.shared.b16 %rs8, [%r7501+1160]; 2026-02-21T12:31:22.0972612Z ld.shared.b16 %rs9, [%r7501+2048]; 2026-02-21T12:31:22.0972754Z ld.shared.b16 %rs10, [%r7501+2176]; 2026-02-21T12:31:22.0972825Z ld.shared.b16 %rs11, [%r7501+2056]; 2026-02-21T12:31:22.0972892Z ld.shared.b16 %rs12, [%r7501+2184]; 2026-02-21T12:31:22.0972956Z ld.shared.b16 %rs13, [%r7501+3072]; 2026-02-21T12:31:22.0973035Z ld.shared.b16 %rs14, [%r7501+3200]; 2026-02-21T12:31:22.0973111Z ld.shared.b16 %rs15, [%r7501+3080]; 2026-02-21T12:31:22.0973178Z ld.shared.b16 %rs16, [%r7501+3208]; 2026-02-21T12:31:22.0973255Z 
ld.shared.b16 %rs17, [%r7501+4096]; 2026-02-21T12:31:22.0973323Z ld.shared.b16 %rs18, [%r7501+4224]; 2026-02-21T12:31:22.0973394Z ld.shared.b16 %rs19, [%r7501+4104]; 2026-02-21T12:31:22.0973462Z ld.shared.b16 %rs20, [%r7501+4232]; 2026-02-21T12:31:22.0973532Z ld.shared.b16 %rs21, [%r7501+5120]; 2026-02-21T12:31:22.0973598Z ld.shared.b16 %rs22, [%r7501+5248]; 2026-02-21T12:31:22.0973667Z ld.shared.b16 %rs23, [%r7501+5128]; 2026-02-21T12:31:22.0973739Z ld.shared.b16 %rs24, [%r7501+5256]; 2026-02-21T12:31:22.0973808Z ld.shared.b16 %rs25, [%r7501+6144]; 2026-02-21T12:31:22.0973874Z ld.shared.b16 %rs26, [%r7501+6272]; 2026-02-21T12:31:22.0973945Z ld.shared.b16 %rs27, [%r7501+6152]; 2026-02-21T12:31:22.0974013Z ld.shared.b16 %rs28, [%r7501+6280]; 2026-02-21T12:31:22.0974079Z ld.shared.b16 %rs29, [%r7501+7168]; 2026-02-21T12:31:22.0974145Z ld.shared.b16 %rs30, [%r7501+7296]; 2026-02-21T12:31:22.0974227Z ld.shared.b16 %rs31, [%r7501+7176]; 2026-02-21T12:31:22.0974297Z ld.shared.b16 %rs32, [%r7501+7304]; 2026-02-21T12:31:22.0974363Z cvt.f32.bf16 %r3613, %rs1; 2026-02-21T12:31:22.0974431Z cvt.f32.bf16 %r3614, %rs2; 2026-02-21T12:31:22.0974495Z cvt.f32.bf16 %r3615, %rs3; 2026-02-21T12:31:22.0974557Z cvt.f32.bf16 %r3616, %rs4; 2026-02-21T12:31:22.0974618Z cvt.f32.bf16 %r3873, %rs5; 2026-02-21T12:31:22.0974683Z cvt.f32.bf16 %r3874, %rs6; 2026-02-21T12:31:22.0974746Z cvt.f32.bf16 %r3875, %rs7; 2026-02-21T12:31:22.0974806Z cvt.f32.bf16 %r3876, %rs8; 2026-02-21T12:31:22.0974871Z cvt.f32.bf16 %r4133, %rs9; 2026-02-21T12:31:22.0974938Z cvt.f32.bf16 %r4134, %rs10; 2026-02-21T12:31:22.0975000Z cvt.f32.bf16 %r4135, %rs11; 2026-02-21T12:31:22.0975061Z cvt.f32.bf16 %r4136, %rs12; 2026-02-21T12:31:22.0975129Z cvt.f32.bf16 %r4393, %rs13; 2026-02-21T12:31:22.0975191Z cvt.f32.bf16 %r4394, %rs14; 2026-02-21T12:31:22.0975252Z cvt.f32.bf16 %r4395, %rs15; 2026-02-21T12:31:22.0975321Z cvt.f32.bf16 %r4396, %rs16; 2026-02-21T12:31:22.0975383Z cvt.f32.bf16 %r4653, %rs17; 2026-02-21T12:31:22.0975443Z 
cvt.f32.bf16 %r4654, %rs18; 2026-02-21T12:31:22.0975506Z cvt.f32.bf16 %r4655, %rs19; 2026-02-21T12:31:22.0975650Z cvt.f32.bf16 %r4656, %rs20; 2026-02-21T12:31:22.0975713Z cvt.f32.bf16 %r4913, %rs21; 2026-02-21T12:31:22.0975775Z cvt.f32.bf16 %r4914, %rs22; 2026-02-21T12:31:22.0975843Z cvt.f32.bf16 %r4915, %rs23; 2026-02-21T12:31:22.0975957Z cvt.f32.bf16 %r4916, %rs24; 2026-02-21T12:31:22.0976019Z cvt.f32.bf16 %r5173, %rs25; 2026-02-21T12:31:22.0976087Z cvt.f32.bf16 %r5174, %rs26; 2026-02-21T12:31:22.0976149Z cvt.f32.bf16 %r5175, %rs27; 2026-02-21T12:31:22.0976210Z cvt.f32.bf16 %r5176, %rs28; 2026-02-21T12:31:22.0976272Z cvt.f32.bf16 %r5433, %rs29; 2026-02-21T12:31:22.0976341Z cvt.f32.bf16 %r5434, %rs30; 2026-02-21T12:31:22.0976402Z cvt.f32.bf16 %r5435, %rs31; 2026-02-21T12:31:22.0976583Z cvt.f32.bf16 %r5436, %rs32; 2026-02-21T12:31:22.0976798Z .loc 1 54 34 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:54:34 2026-02-21T12:31:22.0976878Z mad.lo.s64 %rd250, %rd249, 1280, %rd186; 2026-02-21T12:31:22.0976946Z add.s64 %rd234, %rd250, %rd650; 2026-02-21T12:31:22.0977231Z .loc 1 54 87 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:54:87 2026-02-21T12:31:22.0977312Z // begin inline asm 2026-02-21T12:31:22.0977374Z mov.u64 %rd233, 0x0; 2026-02-21T12:31:22.0977545Z createpolicy.fractional.L2::evict_first.b64 %rd233, 1.0; 2026-02-21T12:31:22.0977614Z // end inline asm 2026-02-21T12:31:22.0977676Z // begin inline asm 2026-02-21T12:31:22.0977801Z mov.u32 %r3355, 0x0; 2026-02-21T12:31:22.0977869Z mov.u32 %r3356, 0x0; 2026-02-21T12:31:22.0978084Z ld.global.L1::evict_first.L2::cache_hint.v2.b32 { %r3355, %r3356 }, [ %rd234 + 0 ], %rd233; 2026-02-21T12:31:22.0978149Z // end inline asm 2026-02-21T12:31:22.0978361Z .loc 1 62 28 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:62:28 2026-02-21T12:31:22.0978436Z st.shared.b8 [%r10], %r3355; 2026-02-21T12:31:22.0978507Z prmt.b32 %r7502, %r3355, 0, 0x7771U; 2026-02-21T12:31:22.0978576Z st.shared.b8 
[%r11+128], %r7502; 2026-02-21T12:31:22.0978658Z prmt.b32 %r7503, %r3355, 0, 0x7772U; 2026-02-21T12:31:22.0978727Z st.shared.b8 [%r12+256], %r7503; 2026-02-21T12:31:22.0978798Z prmt.b32 %r7504, %r3355, 0, 0x7773U; 2026-02-21T12:31:22.0978871Z st.shared.b8 [%r13+384], %r7504; 2026-02-21T12:31:22.0978939Z st.shared.b8 [%r14+512], %r3356; 2026-02-21T12:31:22.0979006Z prmt.b32 %r7505, %r3356, 0, 0x7771U; 2026-02-21T12:31:22.0979072Z st.shared.b8 [%r15+640], %r7505; 2026-02-21T12:31:22.0979148Z prmt.b32 %r7506, %r3356, 0, 0x7772U; 2026-02-21T12:31:22.0979212Z st.shared.b8 [%r16+768], %r7506; 2026-02-21T12:31:22.0979279Z prmt.b32 %r7507, %r3356, 0, 0x7773U; 2026-02-21T12:31:22.0979346Z st.shared.b8 [%r17+896], %r7507; 2026-02-21T12:31:22.0979405Z bar.sync 0; 2026-02-21T12:31:22.0979472Z ld.shared.b32 %r7508, [%r18]; 2026-02-21T12:31:22.0979538Z prmt.b32 %r7509, %r7508, 0, 0x7771U; 2026-02-21T12:31:22.0979611Z cvt.u16.u32 %rs33, %r7509; 2026-02-21T12:31:22.0979677Z prmt.b32 %r7510, %r7508, 0, 0x7770U; 2026-02-21T12:31:22.0979745Z cvt.u16.u32 %rs34, %r7510; 2026-02-21T12:31:22.0979816Z prmt.b32 %r7511, %r7508, 0, 0x7773U; 2026-02-21T12:31:22.0979877Z cvt.u16.u32 %rs35, %r7511; 2026-02-21T12:31:22.0979942Z prmt.b32 %r7512, %r7508, 0, 0x7772U; 2026-02-21T12:31:22.0980011Z cvt.u16.u32 %rs36, %r7512; 2026-02-21T12:31:22.0980218Z .loc 1 57 28 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:57:28 2026-02-21T12:31:22.0980296Z shl.b16 %rs37, %rs34, 4; 2026-02-21T12:31:22.0980362Z shl.b16 %rs38, %rs33, 4; 2026-02-21T12:31:22.0980573Z .loc 1 59 25 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:59:25 2026-02-21T12:31:22.0980635Z cvt.u32.u16 %r7513, %rs37; 2026-02-21T12:31:22.0980715Z prmt.b32 %r7514, %r7513, %r7515, 0x3340U; 2026-02-21T12:31:22.0980793Z prmt.b32 %r7519, %r7514, %r7516, 0x5410U; 2026-02-21T12:31:22.0980865Z prmt.b32 %r7520, %r7519, %r7508, 0x5040U; 2026-02-21T12:31:22.0980931Z prmt.b32 %r7521, %r7520, 0, 0x9991U; 2026-02-21T12:31:22.0981088Z 
cvt.u16.u32 %rs39, %r7521; 2026-02-21T12:31:22.0981150Z shr.s16 %rs40, %rs39, 4; 2026-02-21T12:31:22.0981215Z prmt.b32 %r7522, %r7520, 0, 0xbbb3U; 2026-02-21T12:31:22.0981276Z cvt.u16.u32 %rs41, %r7522; 2026-02-21T12:31:22.0981408Z shr.s16 %rs42, %rs41, 4; 2026-02-21T12:31:22.0981472Z cvt.s16.s8 %rs43, %rs37; 2026-02-21T12:31:22.0981533Z shr.s16 %rs44, %rs43, 4; 2026-02-21T12:31:22.0981601Z cvt.s16.s8 %rs45, %rs38; 2026-02-21T12:31:22.0981662Z shr.s16 %rs46, %rs45, 4; 2026-02-21T12:31:22.0981861Z .loc 1 77 32 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:77:32 2026-02-21T12:31:22.0981927Z cvt.rn.f32.s16 %r7523, %rs42; 2026-02-21T12:31:22.0981996Z cvt.rn.f32.s16 %r7524, %rs40; 2026-02-21T12:31:22.0982058Z cvt.rn.f32.s16 %r7525, %rs46; 2026-02-21T12:31:22.0982120Z cvt.rn.f32.s16 %r7526, %rs44; 2026-02-21T12:31:22.0982325Z .loc 1 62 28 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:62:28 2026-02-21T12:31:22.0982441Z ld.shared.b32 %r7527, [%r19]; 2026-02-21T12:31:22.0982510Z prmt.b32 %r7528, %r7527, 0, 0x7771U; 2026-02-21T12:31:22.0982577Z cvt.u16.u32 %rs47, %r7528; 2026-02-21T12:31:22.0982643Z prmt.b32 %r7529, %r7527, 0, 0x7770U; 2026-02-21T12:31:22.0982707Z cvt.u16.u32 %rs48, %r7529; 2026-02-21T12:31:22.0982772Z prmt.b32 %r7530, %r7527, 0, 0x7773U; 2026-02-21T12:31:22.0982839Z cvt.u16.u32 %rs49, %r7530; 2026-02-21T12:31:22.0982951Z prmt.b32 %r7531, %r7527, 0, 0x7772U; 2026-02-21T12:31:22.0983028Z cvt.u16.u32 %rs50, %r7531; 2026-02-21T12:31:22.0983237Z .loc 1 57 28 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:57:28 2026-02-21T12:31:22.0983299Z shl.b16 %rs51, %rs48, 4; 2026-02-21T12:31:22.0983360Z shl.b16 %rs52, %rs47, 4; 2026-02-21T12:31:22.0983561Z .loc 1 59 25 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:59:25 2026-02-21T12:31:22.0983627Z cvt.u32.u16 %r7532, %rs51; 2026-02-21T12:31:22.0983704Z prmt.b32 %r7533, %r7532, %r7534, 0x3340U; 2026-02-21T12:31:22.0983776Z prmt.b32 %r7535, %r7533, %r7516, 0x5410U; 
2026-02-21T12:31:22.0983851Z prmt.b32 %r7536, %r7535, %r7527, 0x5040U; 2026-02-21T12:31:22.0983915Z prmt.b32 %r7537, %r7536, 0, 0x9991U; 2026-02-21T12:31:22.0983979Z cvt.u16.u32 %rs53, %r7537; 2026-02-21T12:31:22.0984046Z shr.s16 %rs54, %rs53, 4; 2026-02-21T12:31:22.0984117Z prmt.b32 %r7538, %r7536, 0, 0xbbb3U; 2026-02-21T12:31:22.0984186Z cvt.u16.u32 %rs55, %r7538; 2026-02-21T12:31:22.0984250Z shr.s16 %rs56, %rs55, 4; 2026-02-21T12:31:22.0984320Z cvt.s16.s8 %rs57, %rs51; 2026-02-21T12:31:22.0984383Z shr.s16 %rs58, %rs57, 4; 2026-02-21T12:31:22.0984444Z cvt.s16.s8 %rs59, %rs52; 2026-02-21T12:31:22.0984511Z shr.s16 %rs60, %rs59, 4; 2026-02-21T12:31:22.0984721Z .loc 1 77 32 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:77:32 2026-02-21T12:31:22.0984789Z cvt.rn.f32.s16 %r7539, %rs56; 2026-02-21T12:31:22.0984858Z cvt.rn.f32.s16 %r7540, %rs54; 2026-02-21T12:31:22.0984928Z cvt.rn.f32.s16 %r7541, %rs60; 2026-02-21T12:31:22.0984993Z cvt.rn.f32.s16 %r7542, %rs58; 2026-02-21T12:31:22.0985195Z .loc 1 57 28 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:57:28 2026-02-21T12:31:22.0985265Z shl.b16 %rs61, %rs36, 4; 2026-02-21T12:31:22.0985327Z shl.b16 %rs62, %rs35, 4; 2026-02-21T12:31:22.0985529Z .loc 1 59 25 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:59:25 2026-02-21T12:31:22.0985612Z prmt.b32 %r7543, %r7508, %r7544, 0x3020U; 2026-02-21T12:31:22.0985680Z prmt.b32 %r7545, %r7543, 0, 0x9991U; 2026-02-21T12:31:22.0985744Z cvt.u16.u32 %rs63, %r7545; 2026-02-21T12:31:22.0985806Z shr.s16 %rs64, %rs63, 4; 2026-02-21T12:31:22.0985876Z cvt.s16.s8 %rs65, %rs61; 2026-02-21T12:31:22.0985938Z shr.s16 %rs66, %rs65, 4; 2026-02-21T12:31:22.0986000Z cvt.s16.s8 %rs67, %rs62; 2026-02-21T12:31:22.0986066Z shr.s16 %rs68, %rs67, 4; 2026-02-21T12:31:22.0986198Z prmt.b32 %r7546, %r7508, 0, 0xbbb3U; 2026-02-21T12:31:22.0986262Z cvt.u16.u32 %rs69, %r7546; 2026-02-21T12:31:22.0986325Z shr.s16 %rs70, %rs69, 4; 2026-02-21T12:31:22.0986669Z .loc 1 77 32 // 
c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:77:32 2026-02-21T12:31:22.0986823Z cvt.rn.f32.s16 %r7547, %rs64; 2026-02-21T12:31:22.0986888Z cvt.rn.f32.s16 %r7548, %rs70; 2026-02-21T12:31:22.0986959Z cvt.rn.f32.s16 %r7549, %rs68; 2026-02-21T12:31:22.0987022Z cvt.rn.f32.s16 %r7550, %rs66; 2026-02-21T12:31:22.0987223Z .loc 1 57 28 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:57:28 2026-02-21T12:31:22.0987289Z shl.b16 %rs71, %rs50, 4; 2026-02-21T12:31:22.0987363Z shl.b16 %rs72, %rs49, 4; 2026-02-21T12:31:22.0987573Z .loc 1 59 25 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:59:25 2026-02-21T12:31:22.0987650Z prmt.b32 %r7551, %r7527, %r7552, 0x3020U; 2026-02-21T12:31:22.0987733Z prmt.b32 %r7553, %r7551, 0, 0x9991U; 2026-02-21T12:31:22.0987867Z cvt.u16.u32 %rs73, %r7553; 2026-02-21T12:31:22.0987934Z shr.s16 %rs74, %rs73, 4; 2026-02-21T12:31:22.0988015Z cvt.s16.s8 %rs75, %rs71; 2026-02-21T12:31:22.0988081Z shr.s16 %rs76, %rs75, 4; 2026-02-21T12:31:22.0988151Z cvt.s16.s8 %rs77, %rs72; 2026-02-21T12:31:22.0988220Z shr.s16 %rs78, %rs77, 4; 2026-02-21T12:31:22.0988386Z prmt.b32 %r7554, %r7527, 0, 0xbbb3U; 2026-02-21T12:31:22.0988518Z cvt.u16.u32 %rs79, %r7554; 2026-02-21T12:31:22.0988585Z shr.s16 %rs80, %rs79, 4; 2026-02-21T12:31:22.0988795Z .loc 1 77 32 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:77:32 2026-02-21T12:31:22.0988861Z cvt.rn.f32.s16 %r7555, %rs74; 2026-02-21T12:31:22.0988927Z cvt.rn.f32.s16 %r7556, %rs80; 2026-02-21T12:31:22.0989000Z cvt.rn.f32.s16 %r7557, %rs78; 2026-02-21T12:31:22.0989065Z cvt.rn.f32.s16 %r7558, %rs76; 2026-02-21T12:31:22.0989124Z bar.sync 0; 2026-02-21T12:31:22.0989242Z st.shared.v4.b32 [%r20], {%r7526, %r7524, %r7525, %r7523}; 2026-02-21T12:31:22.0989389Z st.shared.v4.b32 [%r20+4096], {%r7542, %r7540, %r7541, %r7539}; 2026-02-21T12:31:22.0989504Z st.shared.v4.b32 [%r21], {%r7550, %r7547, %r7549, %r7548}; 2026-02-21T12:31:22.0989620Z st.shared.v4.b32 [%r21+4096], {%r7558, %r7555, %r7557, 
%r7556}; 2026-02-21T12:31:22.0989690Z $L__tmp1: 2026-02-21T12:31:22.0989975Z .loc 2 291 36 // standard.py:291:36 @[ c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:84:40 ] 2026-02-21T12:31:22.0990041Z // begin inline asm 2026-02-21T12:31:22.0990130Z fence.proxy.async.shared::cta; 2026-02-21T12:31:22.0990192Z // end inline asm 2026-02-21T12:31:22.0990250Z bar.sync 0; 2026-02-21T12:31:22.0990334Z shfl.sync.idx.b32 %r7559, %r2, 0, 31, -1; 2026-02-21T12:31:22.0990413Z wgmma.fence.sync.aligned; 2026-02-21T12:31:22.0990477Z mov.pred %p9, -1; 2026-02-21T12:31:22.0990540Z // begin inline asm 2026-02-21T12:31:22.0992834Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r9236,%r9237,%r9238,%r9239,%r9240,%r9241,%r9242,%r9243,%r9244,%r9245,%r9246,%r9247,%r9248,%r9249,%r9250,%r9251,%r9252,%r9253,%r9254,%r9255,%r9256,%r9257,%r9258,%r9259,%r9260,%r9261,%r9262,%r9263,%r9264,%r9265,%r9266,%r9267,%r9268,%r9269,%r9270,%r9271,%r9272,%r9273,%r9274,%r9275,%r9276,%r9277,%r9278,%r9279,%r9280,%r9281,%r9282,%r9283,%r9284,%r9285,%r9286,%r9287,%r9288,%r9289,%r9290,%r9291,%r9292,%r9293,%r9294,%r9295,%r9296,%r9297,%r9298,%r9299,%r9300,%r9301,%r9302,%r9303,%r9304,%r9305,%r9306,%r9307,%r9308,%r9309,%r9310,%r9311,%r9312,%r9313,%r9314,%r9315,%r9316,%r9317,%r9318,%r9319,%r9320,%r9321,%r9322,%r9323,%r9324,%r9325,%r9326,%r9327,%r9328,%r9329,%r9330,%r9331,%r9332,%r9333,%r9334,%r9335,%r9336,%r9337,%r9338,%r9339,%r9340,%r9341,%r9342,%r9343,%r9344,%r9345,%r9346,%r9347,%r9348,%r9349,%r9350,%r9351,%r9352,%r9353,%r9354,%r9355,%r9356,%r9357,%r9358,%r9359,%r9360,%r9361,%r9362,%r9363}, {%r3613,%r3614,%r3615,%r3616}, %rd236, %p9, 1, 1; 2026-02-21T12:31:22.0992901Z // end inline asm 2026-02-21T12:31:22.0993058Z // begin inline asm 2026-02-21T12:31:22.0995334Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r9364,%r9365,%r9366,%r9367,%r9368,%r9369,%r9370,%r9371,%r9372,%r9373,%r9374,%r9375,%r9376,%r9377,%r9378,%r9379,%r9380,%r9381,%r9382,%r9383,%r9384,%r9385,%r9386,%r9387,%r9388,%r9389,%r9390,%r9391,%r9392,%r9393,%r9394,%r9395,%r9396,%r9397,%r9398,%r9399,%r9400,%r9401,%r9402,%r9403,%r9404,%r9405,%r9406,%r9407,%r9408,%r9409,%r9410,%r9411,%r9412,%r9413,%r9414,%r9415,%r9416,%r9417,%r9418,%r9419,%r9420,%r9421,%r9422,%r9423,%r9424,%r9425,%r9426,%r9427,%r9428,%r9429,%r9430,%r9431,%r9432,%r9433,%r9434,%r9435,%r9436,%r9437,%r9438,%r9439,%r9440,%r9441,%r9442,%r9443,%r9444,%r9445,%r9446,%r9447,%r9448,%r9449,%r9450,%r9451,%r9452,%r9453,%r9454,%r9455,%r9456,%r9457,%r9458,%r9459,%r9460,%r9461,%r9462,%r9463,%r9464,%r9465,%r9466,%r9467,%r9468,%r9469,%r9470,%r9471,%r9472,%r9473,%r9474,%r9475,%r9476,%r9477,%r9478,%r9479,%r9480,%r9481,%r9482,%r9483,%r9484,%r9485,%r9486,%r9487,%r9488,%r9489,%r9490,%r9491}, {%r3873,%r3874,%r3875,%r3876}, %rd236, %p9, 1, 1; 2026-02-21T12:31:22.0995512Z // end inline asm 2026-02-21T12:31:22.0995574Z // begin inline asm 2026-02-21T12:31:22.0998065Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r9492,%r9493,%r9494,%r9495,%r9496,%r9497,%r9498,%r9499,%r9500,%r9501,%r9502,%r9503,%r9504,%r9505,%r9506,%r9507,%r9508,%r9509,%r9510,%r9511,%r9512,%r9513,%r9514,%r9515,%r9516,%r9517,%r9518,%r9519,%r9520,%r9521,%r9522,%r9523,%r9524,%r9525,%r9526,%r9527,%r9528,%r9529,%r9530,%r9531,%r9532,%r9533,%r9534,%r9535,%r9536,%r9537,%r9538,%r9539,%r9540,%r9541,%r9542,%r9543,%r9544,%r9545,%r9546,%r9547,%r9548,%r9549,%r9550,%r9551,%r9552,%r9553,%r9554,%r9555,%r9556,%r9557,%r9558,%r9559,%r9560,%r9561,%r9562,%r9563,%r9564,%r9565,%r9566,%r9567,%r9568,%r9569,%r9570,%r9571,%r9572,%r9573,%r9574,%r9575,%r9576,%r9577,%r9578,%r9579,%r9580,%r9581,%r9582,%r9583,%r9584,%r9585,%r9586,%r9587,%r9588,%r9589,%r9590,%r9591,%r9592,%r9593,%r9594,%r9595,%r9596,%r9597,%r9598,%r9599,%r9600,%r9601,%r9602,%r9603,%r9604,%r9605,%r9606,%r9607,%r9608,%r9609,%r9610,%r9611,%r9612,%r9613,%r9614,%r9615,%r9616,%r9617,%r9618,%r9619}, {%r4133,%r4134,%r4135,%r4136}, %rd236, %p9, 1, 1; 2026-02-21T12:31:22.0998139Z // end inline asm 2026-02-21T12:31:22.0998202Z // begin inline asm 2026-02-21T12:31:22.1000483Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r9620,%r9621,%r9622,%r9623,%r9624,%r9625,%r9626,%r9627,%r9628,%r9629,%r9630,%r9631,%r9632,%r9633,%r9634,%r9635,%r9636,%r9637,%r9638,%r9639,%r9640,%r9641,%r9642,%r9643,%r9644,%r9645,%r9646,%r9647,%r9648,%r9649,%r9650,%r9651,%r9652,%r9653,%r9654,%r9655,%r9656,%r9657,%r9658,%r9659,%r9660,%r9661,%r9662,%r9663,%r9664,%r9665,%r9666,%r9667,%r9668,%r9669,%r9670,%r9671,%r9672,%r9673,%r9674,%r9675,%r9676,%r9677,%r9678,%r9679,%r9680,%r9681,%r9682,%r9683,%r9684,%r9685,%r9686,%r9687,%r9688,%r9689,%r9690,%r9691,%r9692,%r9693,%r9694,%r9695,%r9696,%r9697,%r9698,%r9699,%r9700,%r9701,%r9702,%r9703,%r9704,%r9705,%r9706,%r9707,%r9708,%r9709,%r9710,%r9711,%r9712,%r9713,%r9714,%r9715,%r9716,%r9717,%r9718,%r9719,%r9720,%r9721,%r9722,%r9723,%r9724,%r9725,%r9726,%r9727,%r9728,%r9729,%r9730,%r9731,%r9732,%r9733,%r9734,%r9735,%r9736,%r9737,%r9738,%r9739,%r9740,%r9741,%r9742,%r9743,%r9744,%r9745,%r9746,%r9747}, {%r4393,%r4394,%r4395,%r4396}, %rd236, %p9, 1, 1; 2026-02-21T12:31:22.1000547Z // end inline asm 2026-02-21T12:31:22.1000612Z // begin inline asm 2026-02-21T12:31:22.1002892Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r9748,%r9749,%r9750,%r9751,%r9752,%r9753,%r9754,%r9755,%r9756,%r9757,%r9758,%r9759,%r9760,%r9761,%r9762,%r9763,%r9764,%r9765,%r9766,%r9767,%r9768,%r9769,%r9770,%r9771,%r9772,%r9773,%r9774,%r9775,%r9776,%r9777,%r9778,%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794,%r9795,%r9796,%r9797,%r9798,%r9799,%r9800,%r9801,%r9802,%r9803,%r9804,%r9805,%r9806,%r9807,%r9808,%r9809,%r9810,%r9811,%r9812,%r9813,%r9814,%r9815,%r9816,%r9817,%r9818,%r9819,%r9820,%r9821,%r9822,%r9823,%r9824,%r9825,%r9826,%r9827,%r9828,%r9829,%r9830,%r9831,%r9832,%r9833,%r9834,%r9835,%r9836,%r9837,%r9838,%r9839,%r9840,%r9841,%r9842,%r9843,%r9844,%r9845,%r9846,%r9847,%r9848,%r9849,%r9850,%r9851,%r9852,%r9853,%r9854,%r9855,%r9856,%r9857,%r9858,%r9859,%r9860,%r9861,%r9862,%r9863,%r9864,%r9865,%r9866,%r9867,%r9868,%r9869,%r9870,%r9871,%r9872,%r9873,%r9874,%r9875}, {%r4653,%r4654,%r4655,%r4656}, %rd236, %p9, 1, 1; 2026-02-21T12:31:22.1003101Z // end inline asm 2026-02-21T12:31:22.1003162Z // begin inline asm 2026-02-21T12:31:22.1005543Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r9876,%r9877,%r9878,%r9879,%r9880,%r9881,%r9882,%r9883,%r9884,%r9885,%r9886,%r9887,%r9888,%r9889,%r9890,%r9891,%r9892,%r9893,%r9894,%r9895,%r9896,%r9897,%r9898,%r9899,%r9900,%r9901,%r9902,%r9903,%r9904,%r9905,%r9906,%r9907,%r9908,%r9909,%r9910,%r9911,%r9912,%r9913,%r9914,%r9915,%r9916,%r9917,%r9918,%r9919,%r9920,%r9921,%r9922,%r9923,%r9924,%r9925,%r9926,%r9927,%r9928,%r9929,%r9930,%r9931,%r9932,%r9933,%r9934,%r9935,%r9936,%r9937,%r9938,%r9939,%r9940,%r9941,%r9942,%r9943,%r9944,%r9945,%r9946,%r9947,%r9948,%r9949,%r9950,%r9951,%r9952,%r9953,%r9954,%r9955,%r9956,%r9957,%r9958,%r9959,%r9960,%r9961,%r9962,%r9963,%r9964,%r9965,%r9966,%r9967,%r9968,%r9969,%r9970,%r9971,%r9972,%r9973,%r9974,%r9975,%r9976,%r9977,%r9978,%r9979,%r9980,%r9981,%r9982,%r9983,%r9984,%r9985,%r9986,%r9987,%r9988,%r9989,%r9990,%r9991,%r9992,%r9993,%r9994,%r9995,%r9996,%r9997,%r9998,%r9999,%r10000,%r10001,%r10002,%r10003}, {%r4913,%r4914,%r4915,%r4916}, %rd236, %p9, 1, 1; 2026-02-21T12:31:22.1005657Z // end inline asm 2026-02-21T12:31:22.1005721Z // begin inline asm 2026-02-21T12:31:22.1008576Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r10004,%r10005,%r10006,%r10007,%r10008,%r10009,%r10010,%r10011,%r10012,%r10013,%r10014,%r10015,%r10016,%r10017,%r10018,%r10019,%r10020,%r10021,%r10022,%r10023,%r10024,%r10025,%r10026,%r10027,%r10028,%r10029,%r10030,%r10031,%r10032,%r10033,%r10034,%r10035,%r10036,%r10037,%r10038,%r10039,%r10040,%r10041,%r10042,%r10043,%r10044,%r10045,%r10046,%r10047,%r10048,%r10049,%r10050,%r10051,%r10052,%r10053,%r10054,%r10055,%r10056,%r10057,%r10058,%r10059,%r10060,%r10061,%r10062,%r10063,%r10064,%r10065,%r10066,%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10081,%r10082,%r10083,%r10084,%r10085,%r10086,%r10087,%r10088,%r10089,%r10090,%r10091,%r10092,%r10093,%r10094,%r10095,%r10096,%r10097,%r10098,%r10099,%r10100,%r10101,%r10102,%r10103,%r10104,%r10105,%r10106,%r10107,%r10108,%r10109,%r10110,%r10111,%r10112,%r10113,%r10114,%r10115,%r10116,%r10117,%r10118,%r10119,%r10120,%r10121,%r10122,%r10123,%r10124,%r10125,%r10126,%r10127,%r10128,%r10129,%r10130,%r10131}, {%r5173,%r5174,%r5175,%r5176}, %rd236, %p9, 1, 1; 2026-02-21T12:31:22.1008644Z // end inline asm 2026-02-21T12:31:22.1008711Z // begin inline asm 2026-02-21T12:31:22.1011423Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r10132,%r10133,%r10134,%r10135,%r10136,%r10137,%r10138,%r10139,%r10140,%r10141,%r10142,%r10143,%r10144,%r10145,%r10146,%r10147,%r10148,%r10149,%r10150,%r10151,%r10152,%r10153,%r10154,%r10155,%r10156,%r10157,%r10158,%r10159,%r10160,%r10161,%r10162,%r10163,%r10164,%r10165,%r10166,%r10167,%r10168,%r10169,%r10170,%r10171,%r10172,%r10173,%r10174,%r10175,%r10176,%r10177,%r10178,%r10179,%r10180,%r10181,%r10182,%r10183,%r10184,%r10185,%r10186,%r10187,%r10188,%r10189,%r10190,%r10191,%r10192,%r10193,%r10194,%r10195,%r10196,%r10197,%r10198,%r10199,%r10200,%r10201,%r10202,%r10203,%r10204,%r10205,%r10206,%r10207,%r10208,%r10209,%r10210,%r10211,%r10212,%r10213,%r10214,%r10215,%r10216,%r10217,%r10218,%r10219,%r10220,%r10221,%r10222,%r10223,%r10224,%r10225,%r10226,%r10227,%r10228,%r10229,%r10230,%r10231,%r10232,%r10233,%r10234,%r10235,%r10236,%r10237,%r10238,%r10239,%r10240,%r10241,%r10242,%r10243,%r10244,%r10245,%r10246,%r10247,%r10248,%r10249,%r10250,%r10251,%r10252,%r10253,%r10254,%r10255,%r10256,%r10257,%r10258,%r10259}, {%r5433,%r5434,%r5435,%r5436}, %rd236, %p9, 1, 1; 2026-02-21T12:31:22.1011492Z // end inline asm 2026-02-21T12:31:22.1011575Z wgmma.commit_group.sync.aligned; 2026-02-21T12:31:22.1011712Z mov.b32 %r6462, 0; 2026-02-21T12:31:22.1011794Z mov.b32 %r6461, %r3304; 2026-02-21T12:31:22.1011859Z mov.b32 %r6463, %r6462; 2026-02-21T12:31:22.1011922Z // begin inline asm 2026-02-21T12:31:22.1029708Z // wait for regs: 
%r9236,%r9237,%r9238,%r9239,%r9240,%r9241,%r9242,%r9243,%r9244,%r9245,%r9246,%r9247,%r9248,%r9249,%r9250,%r9251,%r9252,%r9253,%r9254,%r9255,%r9256,%r9257,%r9258,%r9259,%r9260,%r9261,%r9262,%r9263,%r9264,%r9265,%r9266,%r9267,%r9268,%r9269,%r9270,%r9271,%r9272,%r9273,%r9274,%r9275,%r9276,%r9277,%r9278,%r9279,%r9280,%r9281,%r9282,%r9283,%r9284,%r9285,%r9286,%r9287,%r9288,%r9289,%r9290,%r9291,%r9292,%r9293,%r9294,%r9295,%r9296,%r9297,%r9298,%r9299,%r9300,%r9301,%r9302,%r9303,%r9304,%r9305,%r9306,%r9307,%r9308,%r9309,%r9310,%r9311,%r9312,%r9313,%r9314,%r9315,%r9316,%r9317,%r9318,%r9319,%r9320,%r9321,%r9322,%r9323,%r9324,%r9325,%r9326,%r9327,%r9328,%r9329,%r9330,%r9331,%r9332,%r9333,%r9334,%r9335,%r9336,%r9337,%r9338,%r9339,%r9340,%r9341,%r9342,%r9343,%r9344,%r9345,%r9346,%r9347,%r9348,%r9349,%r9350,%r9351,%r9352,%r9353,%r9354,%r9355,%r9356,%r9357,%r9358,%r9359,%r9360,%r9361,%r9362,%r9363,%r9364,%r9365,%r9366,%r9367,%r9368,%r9369,%r9370,%r9371,%r9372,%r9373,%r9374,%r9375,%r9376,%r9377,%r9378,%r9379,%r9380,%r9381,%r9382,%r9383,%r9384,%r9385,%r9386,%r9387,%r9388,%r9389,%r9390,%r9391,%r9392,%r9393,%r9394,%r9395,%r9396,%r9397,%r9398,%r9399,%r9400,%r9401,%r9402,%r9403,%r9404,%r9405,%r9406,%r9407,%r9408,%r9409,%r9410,%r9411,%r9412,%r9413,%r9414,%r9415,%r9416,%r9417,%r9418,%r9419,%r9420,%r9421,%r9422,%r9423,%r9424,%r9425,%r9426,%r9427,%r9428,%r9429,%r9430,%r9431,%r9432,%r9433,%r9434,%r9435,%r9436,%r9437,%r9438,%r9439,%r9440,%r9441,%r9442,%r9443,%r9444,%r9445,%r9446,%r9447,%r9448,%r9449,%r9450,%r9451,%r9452,%r9453,%r9454,%r9455,%r9456,%r9457,%r9458,%r9459,%r9460,%r9461,%r9462,%r9463,%r9464,%r9465,%r9466,%r9467,%r9468,%r9469,%r9470,%r9471,%r9472,%r9473,%r9474,%r9475,%r9476,%r9477,%r9478,%r9479,%r9480,%r9481,%r9482,%r9483,%r9484,%r9485,%r9486,%r9487,%r9488,%r9489,%r9490,%r9491,%r9492,%r9493,%r9494,%r9495,%r9496,%r9497,%r9498,%r9499,%r9500,%r9501,%r9502,%r9503,%r9504,%r9505,%r9506,%r9507,%r9508,%r9509,%r9510,%r9511,%r9512,%r9513,%r9514,%r9515,%r9516,%r9517,%r9518,%r9519,%r9520,%r952
1,%r9522,%r9523,%r9524,%r9525,%r9526,%r9527,%r9528,%r9529,%r9530,%r9531,%r9532,%r9533,%r9534,%r9535,%r9536,%r9537,%r9538,%r9539,%r9540,%r9541,%r9542,%r9543,%r9544,%r9545,%r9546,%r9547,%r9548,%r9549,%r9550,%r9551,%r9552,%r9553,%r9554,%r9555,%r9556,%r9557,%r9558,%r9559,%r9560,%r9561,%r9562,%r9563,%r9564,%r9565,%r9566,%r9567,%r9568,%r9569,%r9570,%r9571,%r9572,%r9573,%r9574,%r9575,%r9576,%r9577,%r9578,%r9579,%r9580,%r9581,%r9582,%r9583,%r9584,%r9585,%r9586,%r9587,%r9588,%r9589,%r9590,%r9591,%r9592,%r9593,%r9594,%r9595,%r9596,%r9597,%r9598,%r9599,%r9600,%r9601,%r9602,%r9603,%r9604,%r9605,%r9606,%r9607,%r9608,%r9609,%r9610,%r9611,%r9612,%r9613,%r9614,%r9615,%r9616,%r9617,%r9618,%r9619,%r9620,%r9621,%r9622,%r9623,%r9624,%r9625,%r9626,%r9627,%r9628,%r9629,%r9630,%r9631,%r9632,%r9633,%r9634,%r9635,%r9636,%r9637,%r9638,%r9639,%r9640,%r9641,%r9642,%r9643,%r9644,%r9645,%r9646,%r9647,%r9648,%r9649,%r9650,%r9651,%r9652,%r9653,%r9654,%r9655,%r9656,%r9657,%r9658,%r9659,%r9660,%r9661,%r9662,%r9663,%r9664,%r9665,%r9666,%r9667,%r9668,%r9669,%r9670,%r9671,%r9672,%r9673,%r9674,%r9675,%r9676,%r9677,%r9678,%r9679,%r9680,%r9681,%r9682,%r9683,%r9684,%r9685,%r9686,%r9687,%r9688,%r9689,%r9690,%r9691,%r9692,%r9693,%r9694,%r9695,%r9696,%r9697,%r9698,%r9699,%r9700,%r9701,%r9702,%r9703,%r9704,%r9705,%r9706,%r9707,%r9708,%r9709,%r9710,%r9711,%r9712,%r9713,%r9714,%r9715,%r9716,%r9717,%r9718,%r9719,%r9720,%r9721,%r9722,%r9723,%r9724,%r9725,%r9726,%r9727,%r9728,%r9729,%r9730,%r9731,%r9732,%r9733,%r9734,%r9735,%r9736,%r9737,%r9738,%r9739,%r9740,%r9741,%r9742,%r9743,%r9744,%r9745,%r9746,%r9747,%r9748,%r9749,%r9750,%r9751,%r9752,%r9753,%r9754,%r9755,%r9756,%r9757,%r9758,%r9759,%r9760,%r9761,%r9762,%r9763,%r9764,%r9765,%r9766,%r9767,%r9768,%r9769,%r9770,%r9771,%r9772,%r9773,%r9774,%r9775,%r9776,%r9777,%r9778,%r9779,%r9780,%r9781,%r9782,%r9783,%r9784,%r9785,%r9786,%r9787,%r9788,%r9789,%r9790,%r9791,%r9792,%r9793,%r9794,%r9795,%r9796,%r9797,%r9798,%r9799,%r9800,%r9801,%r9802,%r9803,%r9804,%r9805,%r9806,%r9
807,%r9808,%r9809,%r9810,%r9811,%r9812,%r9813,%r9814,%r9815,%r9816,%r9817,%r9818,%r9819,%r9820,%r9821,%r9822,%r9823,%r9824,%r9825,%r9826,%r9827,%r9828,%r9829,%r9830,%r9831,%r9832,%r9833,%r9834,%r9835,%r9836,%r9837,%r9838,%r9839,%r9840,%r9841,%r9842,%r9843,%r9844,%r9845,%r9846,%r9847,%r9848,%r9849,%r9850,%r9851,%r9852,%r9853,%r9854,%r9855,%r9856,%r9857,%r9858,%r9859,%r9860,%r9861,%r9862,%r9863,%r9864,%r9865,%r9866,%r9867,%r9868,%r9869,%r9870,%r9871,%r9872,%r9873,%r9874,%r9875,%r9876,%r9877,%r9878,%r9879,%r9880,%r9881,%r9882,%r9883,%r9884,%r9885,%r9886,%r9887,%r9888,%r9889,%r9890,%r9891,%r9892,%r9893,%r9894,%r9895,%r9896,%r9897,%r9898,%r9899,%r9900,%r9901,%r9902,%r9903,%r9904,%r9905,%r9906,%r9907,%r9908,%r9909,%r9910,%r9911,%r9912,%r9913,%r9914,%r9915,%r9916,%r9917,%r9918,%r9919,%r9920,%r9921,%r9922,%r9923,%r9924,%r9925,%r9926,%r9927,%r9928,%r9929,%r9930,%r9931,%r9932,%r9933,%r9934,%r9935,%r9936,%r9937,%r9938,%r9939,%r9940,%r9941,%r9942,%r9943,%r9944,%r9945,%r9946,%r9947,%r9948,%r9949,%r9950,%r9951,%r9952,%r9953,%r9954,%r9955,%r9956,%r9957,%r9958,%r9959,%r9960,%r9961,%r9962,%r9963,%r9964,%r9965,%r9966,%r9967,%r9968,%r9969,%r9970,%r9971,%r9972,%r9973,%r9974,%r9975,%r9976,%r9977,%r9978,%r9979,%r9980,%r9981,%r9982,%r9983,%r9984,%r9985,%r9986,%r9987,%r9988,%r9989,%r9990,%r9991,%r9992,%r9993,%r9994,%r9995,%r9996,%r9997,%r9998,%r9999,%r10000,%r10001,%r10002,%r10003,%r10004,%r10005,%r10006,%r10007,%r10008,%r10009,%r10010,%r10011,%r10012,%r10013,%r10014,%r10015,%r10016,%r10017,%r10018,%r10019,%r10020,%r10021,%r10022,%r10023,%r10024,%r10025,%r10026,%r10027,%r10028,%r10029,%r10030,%r10031,%r10032,%r10033,%r10034,%r10035,%r10036,%r10037,%r10038,%r10039,%r10040,%r10041,%r10042,%r10043,%r10044,%r10045,%r10046,%r10047,%r10048,%r10049,%r10050,%r10051,%r10052,%r10053,%r10054,%r10055,%r10056,%r10057,%r10058,%r10059,%r10060,%r10061,%r10062,%r10063,%r10064,%r10065,%r10066,%r10067,%r10068,%r10069,%r10070,%r10071,%r10072,%r10073,%r10074,%r10075,%r10076,%r10077,%r10078,%r10079,%r10080,%r10
081,%r10082,%r10083,%r10084,%r10085,%r10086,%r10087,%r10088,%r10089,%r10090,%r10091,%r10092,%r10093,%r10094,%r10095,%r10096,%r10097,%r10098,%r10099,%r10100,%r10101,%r10102,%r10103,%r10104,%r10105,%r10106,%r10107,%r10108,%r10109,%r10110,%r10111,%r10112,%r10113,%r10114,%r10115,%r10116,%r10117,%r10118,%r10119,%r10120,%r10121,%r10122,%r10123,%r10124,%r10125,%r10126,%r10127,%r10128,%r10129,%r10130,%r10131,%r10132,%r10133,%r10134,%r10135,%r10136,%r10137,%r10138,%r10139,%r10140,%r10141,%r10142,%r10143,%r10144,%r10145,%r10146,%r10147,%r10148,%r10149,%r10150,%r10151,%r10152,%r10153,%r10154,%r10155,%r10156,%r10157,%r10158,%r10159,%r10160,%r10161,%r10162,%r10163,%r10164,%r10165,%r10166,%r10167,%r10168,%r10169,%r10170,%r10171,%r10172,%r10173,%r10174,%r10175,%r10176,%r10177,%r10178,%r10179,%r10180,%r10181,%r10182,%r10183,%r10184,%r10185,%r10186,%r10187,%r10188,%r10189,%r10190,%r10191,%r10192,%r10193,%r10194,%r10195,%r10196,%r10197,%r10198,%r10199,%r10200,%r10201,%r10202,%r10203,%r10204,%r10205,%r10206,%r10207,%r10208,%r10209,%r10210,%r10211,%r10212,%r10213,%r10214,%r10215,%r10216,%r10217,%r10218,%r10219,%r10220,%r10221,%r10222,%r10223,%r10224,%r10225,%r10226,%r10227,%r10228,%r10229,%r10230,%r10231,%r10232,%r10233,%r10234,%r10235,%r10236,%r10237,%r10238,%r10239,%r10240,%r10241,%r10242,%r10243,%r10244,%r10245,%r10246,%r10247,%r10248,%r10249,%r10250,%r10251,%r10252,%r10253,%r10254,%r10255,%r10256,%r10257,%r10258,%r10259,%r6461,%r6462,%r6463 2026-02-21T12:31:22.1029990Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:31:22.1030055Z // end inline asm 2026-02-21T12:31:22.1030113Z $L__tmp2: 2026-02-21T12:31:22.1030350Z .loc 1 19 135 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:19:135 2026-02-21T12:31:22.1030420Z add.s32 %r7561, %r9235, 1; 2026-02-21T12:31:22.1030489Z setp.gt.s32 %p20, %r7561, 2; 2026-02-21T12:31:22.1030565Z selp.b32 %r9235, 0, %r7561, %p20; 2026-02-21T12:31:22.1030680Z add.s32 %r7562, %r9232, 4; 2026-02-21T12:31:22.1030748Z selp.b32 %r9231, 0, %r7562, 
%p17; 2026-02-21T12:31:22.1030957Z .loc 1 48 32 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:48:32 2026-02-21T12:31:22.1031098Z shl.b64 %rd251, %rd665, 14; 2026-02-21T12:31:22.1031299Z .loc 1 45 22 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:45:22 2026-02-21T12:31:22.1031370Z shl.b32 %r7563, %r9231, 1; 2026-02-21T12:31:22.1031578Z .loc 1 48 32 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:48:32 2026-02-21T12:31:22.1031647Z add.s64 %rd252, %rd185, %rd251; 2026-02-21T12:31:22.1031715Z mul.wide.s32 %rd253, %r7563, 2; 2026-02-21T12:31:22.1031786Z add.s64 %rd246, %rd252, %rd253; 2026-02-21T12:31:22.1031854Z shl.b64 %rd254, %rd663, 14; 2026-02-21T12:31:22.1031919Z add.s64 %rd255, %rd185, %rd254; 2026-02-21T12:31:22.1032173Z .loc 1 48 80 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:48:80 2026-02-21T12:31:22.1032246Z selp.b32 %r7492, 16, 0, %p18; 2026-02-21T12:31:22.1032445Z .loc 1 48 32 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:48:32 2026-02-21T12:31:22.1032513Z add.s64 %rd244, %rd255, %rd253; 2026-02-21T12:31:22.1032716Z .loc 1 48 80 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:48:80 2026-02-21T12:31:22.1032829Z shl.b32 %r7564, %r9235, 13; 2026-02-21T12:31:22.1032895Z add.s32 %r7565, %r3281, %r7564; 2026-02-21T12:31:22.1033098Z .loc 1 48 32 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:48:32 2026-02-21T12:31:22.1033162Z shl.b64 %rd256, %rd664, 14; 2026-02-21T12:31:22.1033227Z add.s64 %rd257, %rd185, %rd256; 2026-02-21T12:31:22.1033430Z .loc 1 48 80 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:48:80 2026-02-21T12:31:22.1033496Z add.s32 %r7491, %r7565, %r8; 2026-02-21T12:31:22.1033696Z .loc 1 48 32 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:48:32 2026-02-21T12:31:22.1033761Z add.s64 %rd245, %rd257, %rd253; 2026-02-21T12:31:22.1033980Z .loc 1 48 80 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:48:80 2026-02-21T12:31:22.1034048Z // 
begin inline asm 2026-02-21T12:31:22.1034202Z cp.async.cg.shared.global [ %r7491 + 0 ], [ %rd244 + 0 ], 0x10, %r7492; 2026-02-21T12:31:22.1034268Z // end inline asm 2026-02-21T12:31:22.1034333Z add.s32 %r7493, %r7491, 2048; 2026-02-21T12:31:22.1034531Z .loc 1 48 32 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:48:32 2026-02-21T12:31:22.1034600Z shl.b64 %rd258, %rd666, 14; 2026-02-21T12:31:22.1034801Z .loc 1 48 80 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:48:80 2026-02-21T12:31:22.1034863Z // begin inline asm 2026-02-21T12:31:22.1035013Z cp.async.cg.shared.global [ %r7493 + 0 ], [ %rd245 + 0 ], 0x10, %r7492; 2026-02-21T12:31:22.1035075Z // end inline asm 2026-02-21T12:31:22.1035276Z .loc 1 48 32 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:48:32 2026-02-21T12:31:22.1035341Z add.s64 %rd259, %rd185, %rd258; 2026-02-21T12:31:22.1035546Z .loc 1 48 80 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:48:80 2026-02-21T12:31:22.1035609Z add.s32 %r7495, %r7491, 4096; 2026-02-21T12:31:22.1035809Z .loc 1 48 32 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:48:32 2026-02-21T12:31:22.1035880Z add.s64 %rd247, %rd259, %rd253; 2026-02-21T12:31:22.1036078Z .loc 1 48 80 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:48:80 2026-02-21T12:31:22.1036140Z // begin inline asm 2026-02-21T12:31:22.1036285Z cp.async.cg.shared.global [ %r7495 + 0 ], [ %rd246 + 0 ], 0x10, %r7492; 2026-02-21T12:31:22.1036345Z // end inline asm 2026-02-21T12:31:22.1036407Z add.s32 %r7497, %r7491, 6144; 2026-02-21T12:31:22.1036664Z // begin inline asm 2026-02-21T12:31:22.1036814Z cp.async.cg.shared.global [ %r7497 + 0 ], [ %rd247 + 0 ], 0x10, %r7492; 2026-02-21T12:31:22.1036874Z // end inline asm 2026-02-21T12:31:22.1036956Z cp.async.commit_group; 2026-02-21T12:31:22.1037249Z .loc 1 19 135 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:19:135 2026-02-21T12:31:22.1037333Z setp.ne.b64 %p21, %rd648, 1023; 
2026-02-21T12:31:22.1037398Z @%p21 bra $L__BB0_9; 2026-02-21T12:31:22.1037527Z // %bb.8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:31:22.1037744Z .loc 1 33 32 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:33:32 2026-02-21T12:31:22.1037812Z add.s64 %rd388, %rd646, %rd7; 2026-02-21T12:31:22.1037878Z add.s64 %rd389, %rd646, %rd8; 2026-02-21T12:31:22.1037948Z add.s64 %rd390, %rd646, %rd9; 2026-02-21T12:31:22.1038013Z add.s64 %rd391, %rd646, %rd10; 2026-02-21T12:31:22.1038082Z add.s64 %rd392, %rd646, %rd11; 2026-02-21T12:31:22.1038218Z add.s64 %rd393, %rd646, %rd12; 2026-02-21T12:31:22.1038286Z add.s64 %rd394, %rd646, %rd13; 2026-02-21T12:31:22.1038349Z add.s64 %rd395, %rd646, %rd14; 2026-02-21T12:31:22.1038415Z add.s64 %rd396, %rd646, %rd15; 2026-02-21T12:31:22.1038484Z add.s64 %rd397, %rd646, %rd16; 2026-02-21T12:31:22.1038548Z add.s64 %rd398, %rd646, %rd17; 2026-02-21T12:31:22.1038675Z add.s64 %rd399, %rd646, %rd18; 2026-02-21T12:31:22.1038760Z add.s64 %rd400, %rd646, %rd19; 2026-02-21T12:31:22.1038827Z add.s64 %rd401, %rd646, %rd20; 2026-02-21T12:31:22.1038892Z add.s64 %rd402, %rd646, %rd21; 2026-02-21T12:31:22.1038963Z add.s64 %rd403, %rd646, %rd22; 2026-02-21T12:31:22.1039028Z add.s64 %rd404, %rd646, %rd23; 2026-02-21T12:31:22.1039093Z add.s64 %rd405, %rd646, %rd24; 2026-02-21T12:31:22.1039157Z add.s64 %rd406, %rd646, %rd25; 2026-02-21T12:31:22.1039228Z add.s64 %rd407, %rd646, %rd26; 2026-02-21T12:31:22.1039292Z add.s64 %rd408, %rd646, %rd27; 2026-02-21T12:31:22.1039361Z add.s64 %rd409, %rd646, %rd28; 2026-02-21T12:31:22.1039430Z add.s64 %rd410, %rd646, %rd29; 2026-02-21T12:31:22.1039495Z add.s64 %rd411, %rd646, %rd30; 2026-02-21T12:31:22.1039558Z add.s64 %rd412, %rd646, %rd31; 2026-02-21T12:31:22.1039623Z add.s64 %rd413, %rd646, %rd32; 2026-02-21T12:31:22.1039693Z add.s64 %rd414, %rd646, %rd33; 2026-02-21T12:31:22.1039757Z add.s64 %rd415, %rd646, %rd34; 2026-02-21T12:31:22.1039822Z add.s64 %rd416, %rd646, %rd35; 2026-02-21T12:31:22.1039891Z 
add.s64 %rd417, %rd646, %rd36; 2026-02-21T12:31:22.1039955Z add.s64 %rd418, %rd646, %rd37; 2026-02-21T12:31:22.1040019Z add.s64 %rd419, %rd646, %rd38; 2026-02-21T12:31:22.1040083Z add.s64 %rd420, %rd646, %rd39; 2026-02-21T12:31:22.1040153Z add.s64 %rd421, %rd646, %rd40; 2026-02-21T12:31:22.1040216Z add.s64 %rd422, %rd646, %rd41; 2026-02-21T12:31:22.1040280Z add.s64 %rd423, %rd646, %rd42; 2026-02-21T12:31:22.1040351Z add.s64 %rd424, %rd646, %rd43; 2026-02-21T12:31:22.1040419Z add.s64 %rd425, %rd646, %rd44; 2026-02-21T12:31:22.1040486Z add.s64 %rd426, %rd646, %rd45; 2026-02-21T12:31:22.1040552Z add.s64 %rd427, %rd646, %rd46; 2026-02-21T12:31:22.1040622Z add.s64 %rd428, %rd646, %rd47; 2026-02-21T12:31:22.1040699Z add.s64 %rd429, %rd646, %rd48; 2026-02-21T12:31:22.1040768Z add.s64 %rd430, %rd646, %rd49; 2026-02-21T12:31:22.1040839Z add.s64 %rd431, %rd646, %rd50; 2026-02-21T12:31:22.1040906Z add.s64 %rd432, %rd646, %rd51; 2026-02-21T12:31:22.1040972Z add.s64 %rd433, %rd646, %rd52; 2026-02-21T12:31:22.1041044Z add.s64 %rd434, %rd646, %rd53; 2026-02-21T12:31:22.1041107Z add.s64 %rd435, %rd646, %rd54; 2026-02-21T12:31:22.1041173Z add.s64 %rd436, %rd646, %rd55; 2026-02-21T12:31:22.1041237Z add.s64 %rd437, %rd646, %rd56; 2026-02-21T12:31:22.1041307Z add.s64 %rd438, %rd646, %rd57; 2026-02-21T12:31:22.1041374Z add.s64 %rd439, %rd646, %rd58; 2026-02-21T12:31:22.1041440Z add.s64 %rd440, %rd646, %rd59; 2026-02-21T12:31:22.1041509Z add.s64 %rd441, %rd646, %rd60; 2026-02-21T12:31:22.1041660Z add.s64 %rd442, %rd646, %rd61; 2026-02-21T12:31:22.1041726Z add.s64 %rd443, %rd646, %rd62; 2026-02-21T12:31:22.1041792Z add.s64 %rd444, %rd646, %rd63; 2026-02-21T12:31:22.1041863Z add.s64 %rd445, %rd646, %rd64; 2026-02-21T12:31:22.1041993Z add.s64 %rd446, %rd646, %rd65; 2026-02-21T12:31:22.1042057Z add.s64 %rd447, %rd646, %rd66; 2026-02-21T12:31:22.1042128Z add.s64 %rd448, %rd646, %rd67; 2026-02-21T12:31:22.1042195Z add.s64 %rd449, %rd646, %rd68; 2026-02-21T12:31:22.1042258Z add.s64 %rd450, 
%rd646, %rd69; 2026-02-21T12:31:22.1042321Z add.s64 %rd451, %rd646, %rd70; 2026-02-21T12:31:22.1042390Z add.s64 %rd452, %rd646, %rd71; 2026-02-21T12:31:22.1042456Z add.s64 %rd453, %rd646, %rd72; 2026-02-21T12:31:22.1042519Z add.s64 %rd454, %rd646, %rd73; 2026-02-21T12:31:22.1042589Z add.s64 %rd455, %rd646, %rd74; 2026-02-21T12:31:22.1042653Z add.s64 %rd456, %rd646, %rd75; 2026-02-21T12:31:22.1042716Z add.s64 %rd457, %rd646, %rd76; 2026-02-21T12:31:22.1042786Z add.s64 %rd458, %rd646, %rd77; 2026-02-21T12:31:22.1042901Z add.s64 %rd459, %rd646, %rd78; 2026-02-21T12:31:22.1042966Z add.s64 %rd460, %rd646, %rd79; 2026-02-21T12:31:22.1043028Z add.s64 %rd461, %rd646, %rd80; 2026-02-21T12:31:22.1043099Z add.s64 %rd462, %rd646, %rd81; 2026-02-21T12:31:22.1043167Z add.s64 %rd463, %rd646, %rd82; 2026-02-21T12:31:22.1043230Z add.s64 %rd464, %rd646, %rd83; 2026-02-21T12:31:22.1043344Z add.s64 %rd465, %rd646, %rd84; 2026-02-21T12:31:22.1043409Z add.s64 %rd466, %rd646, %rd85; 2026-02-21T12:31:22.1043472Z add.s64 %rd467, %rd646, %rd86; 2026-02-21T12:31:22.1043535Z add.s64 %rd468, %rd646, %rd87; 2026-02-21T12:31:22.1043605Z add.s64 %rd469, %rd646, %rd88; 2026-02-21T12:31:22.1043669Z add.s64 %rd470, %rd646, %rd89; 2026-02-21T12:31:22.1043731Z add.s64 %rd471, %rd646, %rd90; 2026-02-21T12:31:22.1043800Z add.s64 %rd472, %rd646, %rd91; 2026-02-21T12:31:22.1043864Z add.s64 %rd473, %rd646, %rd92; 2026-02-21T12:31:22.1043931Z add.s64 %rd474, %rd646, %rd93; 2026-02-21T12:31:22.1043999Z add.s64 %rd475, %rd646, %rd94; 2026-02-21T12:31:22.1044070Z add.s64 %rd476, %rd646, %rd95; 2026-02-21T12:31:22.1044133Z add.s64 %rd477, %rd646, %rd96; 2026-02-21T12:31:22.1044198Z add.s64 %rd478, %rd646, %rd97; 2026-02-21T12:31:22.1044270Z add.s64 %rd479, %rd646, %rd98; 2026-02-21T12:31:22.1044334Z add.s64 %rd480, %rd646, %rd99; 2026-02-21T12:31:22.1044401Z add.s64 %rd481, %rd646, %rd100; 2026-02-21T12:31:22.1044470Z add.s64 %rd482, %rd646, %rd101; 2026-02-21T12:31:22.1044538Z add.s64 %rd483, %rd646, %rd102; 
2026-02-21T12:31:22.1044602Z add.s64 %rd484, %rd646, %rd103; 2026-02-21T12:31:22.1044666Z add.s64 %rd485, %rd646, %rd104; 2026-02-21T12:31:22.1044735Z add.s64 %rd486, %rd646, %rd105; 2026-02-21T12:31:22.1044800Z add.s64 %rd487, %rd646, %rd106; 2026-02-21T12:31:22.1044863Z add.s64 %rd488, %rd646, %rd107; 2026-02-21T12:31:22.1044945Z add.s64 %rd489, %rd646, %rd108; 2026-02-21T12:31:22.1045013Z add.s64 %rd490, %rd646, %rd109; 2026-02-21T12:31:22.1045083Z add.s64 %rd491, %rd646, %rd110; 2026-02-21T12:31:22.1045149Z add.s64 %rd492, %rd646, %rd111; 2026-02-21T12:31:22.1045218Z add.s64 %rd493, %rd646, %rd112; 2026-02-21T12:31:22.1045281Z add.s64 %rd494, %rd646, %rd113; 2026-02-21T12:31:22.1045348Z add.s64 %rd495, %rd646, %rd114; 2026-02-21T12:31:22.1045414Z add.s64 %rd496, %rd646, %rd115; 2026-02-21T12:31:22.1045477Z add.s64 %rd497, %rd646, %rd116; 2026-02-21T12:31:22.1045543Z add.s64 %rd498, %rd646, %rd117; 2026-02-21T12:31:22.1045606Z add.s64 %rd499, %rd646, %rd118; 2026-02-21T12:31:22.1045673Z add.s64 %rd500, %rd646, %rd119; 2026-02-21T12:31:22.1045737Z add.s64 %rd501, %rd646, %rd120; 2026-02-21T12:31:22.1045799Z add.s64 %rd502, %rd646, %rd121; 2026-02-21T12:31:22.1045869Z add.s64 %rd503, %rd646, %rd122; 2026-02-21T12:31:22.1045933Z add.s64 %rd504, %rd646, %rd123; 2026-02-21T12:31:22.1045997Z add.s64 %rd505, %rd646, %rd124; 2026-02-21T12:31:22.1046067Z add.s64 %rd506, %rd646, %rd125; 2026-02-21T12:31:22.1046864Z [7581s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T12:31:22.1048164Z Config: @helion.kernel(config=helion.Config(block_sizes=[4, 512, 256], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', 'first'], loop_orders=[[1, 0]], maxnreg=128, num_sm_multiplier=16, num_stages=4, num_warps=4, pid_type='persistent_interleaved', range_flattens=[True, True], range_multi_buffers=[False, False], range_num_stages=[4, 0], range_unroll_factors=[0, 0], range_warp_specializes=[]), static_shapes=True) 2026-02-21T12:31:22.1048400Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T12:31:22.1048463Z `ptxas` stderr: 2026-02-21T12:31:22.1048938Z ptxas fatal : (C7602) Insufficient registers (128) to compile instruction at line 1884 in function _helion_matmul_bf16_int4. Try to compile with register target of 158 or higher. 2026-02-21T12:31:22.1049048Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T12:31:22.1049057Z 2026-02-21T12:31:22.1049625Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpbhy24nkw.ptx -o /tmp/tmpbhy24nkw.ptx.o 2026-02-21T12:31:22.1049634Z 2026-02-21T12:31:22.1049794Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T12:31:22.1049873Z add.s64 %rd507, %rd646, %rd126; 2026-02-21T12:31:22.1049997Z add.s64 %rd508, %rd646, %rd127; 2026-02-21T12:31:22.1050064Z add.s64 %rd509, %rd646, %rd128; 2026-02-21T12:31:22.1050135Z add.s64 %rd510, %rd646, %rd129; 2026-02-21T12:31:22.1050200Z add.s64 %rd511, %rd646, %rd130; 2026-02-21T12:31:22.1050263Z add.s64 %rd512, %rd646, %rd131; 2026-02-21T12:31:22.1050329Z add.s64 %rd513, %rd646, %rd132; 2026-02-21T12:31:22.1050400Z add.s64 %rd514, %rd646, %rd133; 2026-02-21T12:31:22.1050464Z add.s64 %rd515, %rd646, %rd134; 2026-02-21T12:31:22.1050681Z .loc 1 87 28 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:87:28 2026-02-21T12:31:22.1050772Z cvt.rn.bf16x2.f32 %r8719, %r9237, %r9236; 2026-02-21T12:31:22.1050852Z cvt.rn.bf16x2.f32 %r8720, %r9239, %r9238; 2026-02-21T12:31:22.1050927Z cvt.rn.bf16x2.f32 %r8721, %r9241, %r9240; 2026-02-21T12:31:22.1051006Z cvt.rn.bf16x2.f32 %r8722, %r9243, %r9242; 2026-02-21T12:31:22.1051080Z cvt.rn.bf16x2.f32 %r8723, %r9245, %r9244; 2026-02-21T12:31:22.1051154Z cvt.rn.bf16x2.f32 %r8724, %r9247, %r9246; 2026-02-21T12:31:22.1051228Z cvt.rn.bf16x2.f32 %r8725, %r9249, %r9248; 2026-02-21T12:31:22.1051303Z cvt.rn.bf16x2.f32 %r8726, %r9251, %r9250; 2026-02-21T12:31:22.1051376Z cvt.rn.bf16x2.f32 %r8727, %r9253, %r9252; 2026-02-21T12:31:22.1051460Z cvt.rn.bf16x2.f32 %r8728, %r9255, %r9254; 2026-02-21T12:31:22.1051541Z cvt.rn.bf16x2.f32 %r8729, %r9257, %r9256; 2026-02-21T12:31:22.1051614Z cvt.rn.bf16x2.f32 %r8730, %r9259, %r9258; 2026-02-21T12:31:22.1051687Z cvt.rn.bf16x2.f32 %r8731, %r9261, %r9260; 2026-02-21T12:31:22.1051769Z cvt.rn.bf16x2.f32 %r8732, %r9263, %r9262; 2026-02-21T12:31:22.1051846Z cvt.rn.bf16x2.f32 %r8733, %r9265, %r9264; 2026-02-21T12:31:22.1051924Z cvt.rn.bf16x2.f32 %r8734, %r9267, %r9266; 2026-02-21T12:31:22.1051999Z cvt.rn.bf16x2.f32 %r8735, %r9269, %r9268; 2026-02-21T12:31:22.1052081Z cvt.rn.bf16x2.f32 %r8736, %r9271, %r9270; 2026-02-21T12:31:22.1052154Z cvt.rn.bf16x2.f32 %r8737, %r9273, 
%r9272; 2026-02-21T12:31:22.1052231Z cvt.rn.bf16x2.f32 %r8738, %r9275, %r9274; 2026-02-21T12:31:22.1052311Z cvt.rn.bf16x2.f32 %r8739, %r9277, %r9276; 2026-02-21T12:31:22.1052383Z cvt.rn.bf16x2.f32 %r8740, %r9279, %r9278; 2026-02-21T12:31:22.1052456Z cvt.rn.bf16x2.f32 %r8741, %r9281, %r9280; 2026-02-21T12:31:22.1052536Z cvt.rn.bf16x2.f32 %r8742, %r9283, %r9282; 2026-02-21T12:31:22.1052610Z cvt.rn.bf16x2.f32 %r8743, %r9285, %r9284; 2026-02-21T12:31:22.1052685Z cvt.rn.bf16x2.f32 %r8744, %r9287, %r9286; 2026-02-21T12:31:22.1052758Z cvt.rn.bf16x2.f32 %r8745, %r9289, %r9288; 2026-02-21T12:31:22.1052941Z cvt.rn.bf16x2.f32 %r8746, %r9291, %r9290; 2026-02-21T12:31:22.1053017Z cvt.rn.bf16x2.f32 %r8747, %r9293, %r9292; 2026-02-21T12:31:22.1053091Z cvt.rn.bf16x2.f32 %r8748, %r9295, %r9294; 2026-02-21T12:31:22.1053168Z cvt.rn.bf16x2.f32 %r8749, %r9297, %r9296; 2026-02-21T12:31:22.1053291Z cvt.rn.bf16x2.f32 %r8750, %r9299, %r9298; 2026-02-21T12:31:22.1053363Z cvt.rn.bf16x2.f32 %r8751, %r9301, %r9300; 2026-02-21T12:31:22.1053436Z cvt.rn.bf16x2.f32 %r8752, %r9303, %r9302; 2026-02-21T12:31:22.1053513Z cvt.rn.bf16x2.f32 %r8753, %r9305, %r9304; 2026-02-21T12:31:22.1053585Z cvt.rn.bf16x2.f32 %r8754, %r9307, %r9306; 2026-02-21T12:31:22.1053657Z cvt.rn.bf16x2.f32 %r8755, %r9309, %r9308; 2026-02-21T12:31:22.1053734Z cvt.rn.bf16x2.f32 %r8756, %r9311, %r9310; 2026-02-21T12:31:22.1053807Z cvt.rn.bf16x2.f32 %r8757, %r9313, %r9312; 2026-02-21T12:31:22.1053880Z cvt.rn.bf16x2.f32 %r8758, %r9315, %r9314; 2026-02-21T12:31:22.1053957Z cvt.rn.bf16x2.f32 %r8759, %r9317, %r9316; 2026-02-21T12:31:22.1054032Z cvt.rn.bf16x2.f32 %r8760, %r9319, %r9318; 2026-02-21T12:31:22.1054153Z cvt.rn.bf16x2.f32 %r8761, %r9321, %r9320; 2026-02-21T12:31:22.1054229Z cvt.rn.bf16x2.f32 %r8762, %r9323, %r9322; 2026-02-21T12:31:22.1054305Z cvt.rn.bf16x2.f32 %r8763, %r9325, %r9324; 2026-02-21T12:31:22.1054380Z cvt.rn.bf16x2.f32 %r8764, %r9327, %r9326; 2026-02-21T12:31:22.1054452Z cvt.rn.bf16x2.f32 %r8765, %r9329, %r9328; 
2026-02-21T12:31:22.1054572Z cvt.rn.bf16x2.f32 %r8766, %r9331, %r9330; 2026-02-21T12:31:22.1054650Z cvt.rn.bf16x2.f32 %r8767, %r9333, %r9332; 2026-02-21T12:31:22.1054723Z cvt.rn.bf16x2.f32 %r8768, %r9335, %r9334; 2026-02-21T12:31:22.1054801Z cvt.rn.bf16x2.f32 %r8769, %r9337, %r9336; 2026-02-21T12:31:22.1054885Z cvt.rn.bf16x2.f32 %r8770, %r9339, %r9338; 2026-02-21T12:31:22.1054960Z cvt.rn.bf16x2.f32 %r8771, %r9341, %r9340; 2026-02-21T12:31:22.1055035Z cvt.rn.bf16x2.f32 %r8772, %r9343, %r9342; 2026-02-21T12:31:22.1055116Z cvt.rn.bf16x2.f32 %r8773, %r9345, %r9344; 2026-02-21T12:31:22.1055189Z cvt.rn.bf16x2.f32 %r8774, %r9347, %r9346; 2026-02-21T12:31:22.1055262Z cvt.rn.bf16x2.f32 %r8775, %r9349, %r9348; 2026-02-21T12:31:22.1055339Z cvt.rn.bf16x2.f32 %r8776, %r9351, %r9350; 2026-02-21T12:31:22.1055412Z cvt.rn.bf16x2.f32 %r8777, %r9353, %r9352; 2026-02-21T12:31:22.1055486Z cvt.rn.bf16x2.f32 %r8778, %r9355, %r9354; 2026-02-21T12:31:22.1055558Z cvt.rn.bf16x2.f32 %r8779, %r9357, %r9356; 2026-02-21T12:31:22.1055637Z cvt.rn.bf16x2.f32 %r8780, %r9359, %r9358; 2026-02-21T12:31:22.1055708Z cvt.rn.bf16x2.f32 %r8781, %r9361, %r9360; 2026-02-21T12:31:22.1055782Z cvt.rn.bf16x2.f32 %r8782, %r9363, %r9362; 2026-02-21T12:31:22.1055860Z cvt.rn.bf16x2.f32 %r8783, %r9365, %r9364; 2026-02-21T12:31:22.1055932Z cvt.rn.bf16x2.f32 %r8784, %r9367, %r9366; 2026-02-21T12:31:22.1056006Z cvt.rn.bf16x2.f32 %r8785, %r9369, %r9368; 2026-02-21T12:31:22.1056082Z cvt.rn.bf16x2.f32 %r8786, %r9371, %r9370; 2026-02-21T12:31:22.1056155Z cvt.rn.bf16x2.f32 %r8787, %r9373, %r9372; 2026-02-21T12:31:22.1056233Z cvt.rn.bf16x2.f32 %r8788, %r9375, %r9374; 2026-02-21T12:31:22.1056308Z cvt.rn.bf16x2.f32 %r8789, %r9377, %r9376; 2026-02-21T12:31:22.1056385Z cvt.rn.bf16x2.f32 %r8790, %r9379, %r9378; 2026-02-21T12:31:22.1056584Z cvt.rn.bf16x2.f32 %r8791, %r9381, %r9380; 2026-02-21T12:31:22.1056661Z cvt.rn.bf16x2.f32 %r8792, %r9383, %r9382; 2026-02-21T12:31:22.1056741Z cvt.rn.bf16x2.f32 %r8793, %r9385, %r9384; 
2026-02-21T12:31:22.1056815Z cvt.rn.bf16x2.f32 %r8794, %r9387, %r9386; 2026-02-21T12:31:22.1056890Z cvt.rn.bf16x2.f32 %r8795, %r9389, %r9388; 2026-02-21T12:31:22.1056967Z cvt.rn.bf16x2.f32 %r8796, %r9391, %r9390; 2026-02-21T12:31:22.1057040Z cvt.rn.bf16x2.f32 %r8797, %r9393, %r9392; 2026-02-21T12:31:22.1057112Z cvt.rn.bf16x2.f32 %r8798, %r9395, %r9394; 2026-02-21T12:31:22.1057183Z cvt.rn.bf16x2.f32 %r8799, %r9397, %r9396; 2026-02-21T12:31:22.1057263Z cvt.rn.bf16x2.f32 %r8800, %r9399, %r9398; 2026-02-21T12:31:22.1057333Z cvt.rn.bf16x2.f32 %r8801, %r9401, %r9400; 2026-02-21T12:31:22.1057405Z cvt.rn.bf16x2.f32 %r8802, %r9403, %r9402; 2026-02-21T12:31:22.1057589Z cvt.rn.bf16x2.f32 %r8803, %r9405, %r9404; 2026-02-21T12:31:22.1057673Z cvt.rn.bf16x2.f32 %r8804, %r9407, %r9406; 2026-02-21T12:31:22.1057747Z cvt.rn.bf16x2.f32 %r8805, %r9409, %r9408; 2026-02-21T12:31:22.1057880Z cvt.rn.bf16x2.f32 %r8806, %r9411, %r9410; 2026-02-21T12:31:22.1057959Z cvt.rn.bf16x2.f32 %r8807, %r9413, %r9412; 2026-02-21T12:31:22.1058031Z cvt.rn.bf16x2.f32 %r8808, %r9415, %r9414; 2026-02-21T12:31:22.1058103Z cvt.rn.bf16x2.f32 %r8809, %r9417, %r9416; 2026-02-21T12:31:22.1058178Z cvt.rn.bf16x2.f32 %r8810, %r9419, %r9418; 2026-02-21T12:31:22.1058250Z cvt.rn.bf16x2.f32 %r8811, %r9421, %r9420; 2026-02-21T12:31:22.1058323Z cvt.rn.bf16x2.f32 %r8812, %r9423, %r9422; 2026-02-21T12:31:22.1058400Z cvt.rn.bf16x2.f32 %r8813, %r9425, %r9424; 2026-02-21T12:31:22.1058473Z cvt.rn.bf16x2.f32 %r8814, %r9427, %r9426; 2026-02-21T12:31:22.1058546Z cvt.rn.bf16x2.f32 %r8815, %r9429, %r9428; 2026-02-21T12:31:22.1058617Z cvt.rn.bf16x2.f32 %r8816, %r9431, %r9430; 2026-02-21T12:31:22.1058765Z cvt.rn.bf16x2.f32 %r8817, %r9433, %r9432; 2026-02-21T12:31:22.1058845Z cvt.rn.bf16x2.f32 %r8818, %r9435, %r9434; 2026-02-21T12:31:22.1058920Z cvt.rn.bf16x2.f32 %r8819, %r9437, %r9436; 2026-02-21T12:31:22.1059002Z cvt.rn.bf16x2.f32 %r8820, %r9439, %r9438; 2026-02-21T12:31:22.1059074Z cvt.rn.bf16x2.f32 %r8821, %r9441, %r9440; 
2026-02-21T12:31:22.1059206Z cvt.rn.bf16x2.f32 %r8822, %r9443, %r9442; 2026-02-21T12:31:22.1059299Z cvt.rn.bf16x2.f32 %r8823, %r9445, %r9444; 2026-02-21T12:31:22.1059373Z cvt.rn.bf16x2.f32 %r8824, %r9447, %r9446; 2026-02-21T12:31:22.1059445Z cvt.rn.bf16x2.f32 %r8825, %r9449, %r9448; 2026-02-21T12:31:22.1059519Z cvt.rn.bf16x2.f32 %r8826, %r9451, %r9450; 2026-02-21T12:31:22.1059599Z cvt.rn.bf16x2.f32 %r8827, %r9453, %r9452; 2026-02-21T12:31:22.1059672Z cvt.rn.bf16x2.f32 %r8828, %r9455, %r9454; 2026-02-21T12:31:22.1059746Z cvt.rn.bf16x2.f32 %r8829, %r9457, %r9456; 2026-02-21T12:31:22.1059824Z cvt.rn.bf16x2.f32 %r8830, %r9459, %r9458; 2026-02-21T12:31:22.1059899Z cvt.rn.bf16x2.f32 %r8831, %r9461, %r9460; 2026-02-21T12:31:22.1059974Z cvt.rn.bf16x2.f32 %r8832, %r9463, %r9462; 2026-02-21T12:31:22.1060053Z cvt.rn.bf16x2.f32 %r8833, %r9465, %r9464; 2026-02-21T12:31:22.1060128Z cvt.rn.bf16x2.f32 %r8834, %r9467, %r9466; 2026-02-21T12:31:22.1060200Z cvt.rn.bf16x2.f32 %r8835, %r9469, %r9468; 2026-02-21T12:31:22.1060274Z cvt.rn.bf16x2.f32 %r8836, %r9471, %r9470; 2026-02-21T12:31:22.1060355Z cvt.rn.bf16x2.f32 %r8837, %r9473, %r9472; 2026-02-21T12:31:22.1060427Z cvt.rn.bf16x2.f32 %r8838, %r9475, %r9474; 2026-02-21T12:31:22.1060499Z cvt.rn.bf16x2.f32 %r8839, %r9477, %r9476; 2026-02-21T12:31:22.1060573Z cvt.rn.bf16x2.f32 %r8840, %r9479, %r9478; 2026-02-21T12:31:22.1060662Z cvt.rn.bf16x2.f32 %r8841, %r9481, %r9480; 2026-02-21T12:31:22.1060738Z cvt.rn.bf16x2.f32 %r8842, %r9483, %r9482; 2026-02-21T12:31:22.1060818Z cvt.rn.bf16x2.f32 %r8843, %r9485, %r9484; 2026-02-21T12:31:22.1060893Z cvt.rn.bf16x2.f32 %r8844, %r9487, %r9486; 2026-02-21T12:31:22.1061043Z cvt.rn.bf16x2.f32 %r8845, %r9489, %r9488; 2026-02-21T12:31:22.1061142Z cvt.rn.bf16x2.f32 %r8846, %r9491, %r9490; 2026-02-21T12:31:22.1061241Z cvt.rn.bf16x2.f32 %r8847, %r9493, %r9492; 2026-02-21T12:31:22.1061366Z cvt.rn.bf16x2.f32 %r8848, %r9495, %r9494; 2026-02-21T12:31:22.1061463Z cvt.rn.bf16x2.f32 %r8849, %r9497, %r9496; 
2026-02-21T12:31:22.1061564Z cvt.rn.bf16x2.f32 %r8850, %r9499, %r9498; 2026-02-21T12:31:22.1061691Z cvt.rn.bf16x2.f32 %r8851, %r9501, %r9500; 2026-02-21T12:31:22.1061788Z cvt.rn.bf16x2.f32 %r8852, %r9503, %r9502; 2026-02-21T12:31:22.1061882Z cvt.rn.bf16x2.f32 %r8853, %r9505, %r9504; 2026-02-21T12:31:22.1061981Z cvt.rn.bf16x2.f32 %r8854, %r9507, %r9506; 2026-02-21T12:31:22.1062110Z cvt.rn.bf16x2.f32 %r8855, %r9509, %r9508; 2026-02-21T12:31:22.1062206Z cvt.rn.bf16x2.f32 %r8856, %r9511, %r9510; 2026-02-21T12:31:22.1062303Z cvt.rn.bf16x2.f32 %r8857, %r9513, %r9512; 2026-02-21T12:31:22.1068515Z cvt.rn.bf16x2.f32 %r8858, %r9515, %r9514; 2026-02-21T12:31:22.1068792Z cvt.rn.bf16x2.f32 %r8859, %r9517, %r9516; 2026-02-21T12:31:22.1068881Z cvt.rn.bf16x2.f32 %r8860, %r9519, %r9518; 2026-02-21T12:31:22.1068963Z cvt.rn.bf16x2.f32 %r8861, %r9521, %r9520; 2026-02-21T12:31:22.1069127Z cvt.rn.bf16x2.f32 %r8862, %r9523, %r9522; 2026-02-21T12:31:22.1069199Z cvt.rn.bf16x2.f32 %r8863, %r9525, %r9524; 2026-02-21T12:31:22.1069273Z cvt.rn.bf16x2.f32 %r8864, %r9527, %r9526; 2026-02-21T12:31:22.1069343Z cvt.rn.bf16x2.f32 %r8865, %r9529, %r9528; 2026-02-21T12:31:22.1069410Z cvt.rn.bf16x2.f32 %r8866, %r9531, %r9530; 2026-02-21T12:31:22.1069484Z cvt.rn.bf16x2.f32 %r8867, %r9533, %r9532; 2026-02-21T12:31:22.1069557Z cvt.rn.bf16x2.f32 %r8868, %r9535, %r9534; 2026-02-21T12:31:22.1069627Z cvt.rn.bf16x2.f32 %r8869, %r9537, %r9536; 2026-02-21T12:31:22.1069696Z cvt.rn.bf16x2.f32 %r8870, %r9539, %r9538; 2026-02-21T12:31:22.1069766Z cvt.rn.bf16x2.f32 %r8871, %r9541, %r9540; 2026-02-21T12:31:22.1069834Z cvt.rn.bf16x2.f32 %r8872, %r9543, %r9542; 2026-02-21T12:31:22.1069983Z cvt.rn.bf16x2.f32 %r8873, %r9545, %r9544; 2026-02-21T12:31:22.1070061Z cvt.rn.bf16x2.f32 %r8874, %r9547, %r9546; 2026-02-21T12:31:22.1070133Z cvt.rn.bf16x2.f32 %r8875, %r9549, %r9548; 2026-02-21T12:31:22.1070207Z cvt.rn.bf16x2.f32 %r8876, %r9551, %r9550; 2026-02-21T12:31:22.1070277Z cvt.rn.bf16x2.f32 %r8877, %r9553, %r9552; 
2026-02-21T12:31:22.1070424Z cvt.rn.bf16x2.f32 %r8878, %r9555, %r9554; 2026-02-21T12:31:22.1070501Z cvt.rn.bf16x2.f32 %r8879, %r9557, %r9556; 2026-02-21T12:31:22.1070572Z cvt.rn.bf16x2.f32 %r8880, %r9559, %r9558; 2026-02-21T12:31:22.1070646Z cvt.rn.bf16x2.f32 %r8881, %r9561, %r9560; 2026-02-21T12:31:22.1070717Z cvt.rn.bf16x2.f32 %r8882, %r9563, %r9562; 2026-02-21T12:31:22.1070789Z cvt.rn.bf16x2.f32 %r8883, %r9565, %r9564; 2026-02-21T12:31:22.1070864Z cvt.rn.bf16x2.f32 %r8884, %r9567, %r9566; 2026-02-21T12:31:22.1070934Z cvt.rn.bf16x2.f32 %r8885, %r9569, %r9568; 2026-02-21T12:31:22.1071016Z cvt.rn.bf16x2.f32 %r8886, %r9571, %r9570; 2026-02-21T12:31:22.1071092Z cvt.rn.bf16x2.f32 %r8887, %r9573, %r9572; 2026-02-21T12:31:22.1071172Z cvt.rn.bf16x2.f32 %r8888, %r9575, %r9574; 2026-02-21T12:31:22.1071244Z cvt.rn.bf16x2.f32 %r8889, %r9577, %r9576; 2026-02-21T12:31:22.1071318Z cvt.rn.bf16x2.f32 %r8890, %r9579, %r9578; 2026-02-21T12:31:22.1071395Z cvt.rn.bf16x2.f32 %r8891, %r9581, %r9580; 2026-02-21T12:31:22.1071467Z cvt.rn.bf16x2.f32 %r8892, %r9583, %r9582; 2026-02-21T12:31:22.1071540Z cvt.rn.bf16x2.f32 %r8893, %r9585, %r9584; 2026-02-21T12:31:22.1071610Z cvt.rn.bf16x2.f32 %r8894, %r9587, %r9586; 2026-02-21T12:31:22.1071685Z cvt.rn.bf16x2.f32 %r8895, %r9589, %r9588; 2026-02-21T12:31:22.1071755Z cvt.rn.bf16x2.f32 %r8896, %r9591, %r9590; 2026-02-21T12:31:22.1071827Z cvt.rn.bf16x2.f32 %r8897, %r9593, %r9592; 2026-02-21T12:31:22.1071902Z cvt.rn.bf16x2.f32 %r8898, %r9595, %r9594; 2026-02-21T12:31:22.1071973Z cvt.rn.bf16x2.f32 %r8899, %r9597, %r9596; 2026-02-21T12:31:22.1072044Z cvt.rn.bf16x2.f32 %r8900, %r9599, %r9598; 2026-02-21T12:31:22.1072126Z cvt.rn.bf16x2.f32 %r8901, %r9601, %r9600; 2026-02-21T12:31:22.1072197Z cvt.rn.bf16x2.f32 %r8902, %r9603, %r9602; 2026-02-21T12:31:22.1072267Z cvt.rn.bf16x2.f32 %r8903, %r9605, %r9604; 2026-02-21T12:31:22.1072341Z cvt.rn.bf16x2.f32 %r8904, %r9607, %r9606; 2026-02-21T12:31:22.1072419Z cvt.rn.bf16x2.f32 %r8905, %r9609, %r9608; 
2026-02-21T12:31:22.1072491Z cvt.rn.bf16x2.f32 %r8906, %r9611, %r9610; 2026-02-21T12:31:22.1072563Z cvt.rn.bf16x2.f32 %r8907, %r9613, %r9612; 2026-02-21T12:31:22.1072640Z cvt.rn.bf16x2.f32 %r8908, %r9615, %r9614; 2026-02-21T12:31:22.1072710Z cvt.rn.bf16x2.f32 %r8909, %r9617, %r9616; 2026-02-21T12:31:22.1072792Z cvt.rn.bf16x2.f32 %r8910, %r9619, %r9618; 2026-02-21T12:31:22.1072871Z cvt.rn.bf16x2.f32 %r8911, %r9621, %r9620; 2026-02-21T12:31:22.1072946Z cvt.rn.bf16x2.f32 %r8912, %r9623, %r9622; 2026-02-21T12:31:22.1073019Z cvt.rn.bf16x2.f32 %r8913, %r9625, %r9624; 2026-02-21T12:31:22.1073089Z cvt.rn.bf16x2.f32 %r8914, %r9627, %r9626; 2026-02-21T12:31:22.1073230Z cvt.rn.bf16x2.f32 %r8915, %r9629, %r9628; 2026-02-21T12:31:22.1073303Z cvt.rn.bf16x2.f32 %r8916, %r9631, %r9630; 2026-02-21T12:31:22.1073373Z cvt.rn.bf16x2.f32 %r8917, %r9633, %r9632; 2026-02-21T12:31:22.1073495Z cvt.rn.bf16x2.f32 %r8918, %r9635, %r9634; 2026-02-21T12:31:22.1073566Z cvt.rn.bf16x2.f32 %r8919, %r9637, %r9636; 2026-02-21T12:31:22.1073636Z cvt.rn.bf16x2.f32 %r8920, %r9639, %r9638; 2026-02-21T12:31:22.1073712Z cvt.rn.bf16x2.f32 %r8921, %r9641, %r9640; 2026-02-21T12:31:22.1073784Z cvt.rn.bf16x2.f32 %r8922, %r9643, %r9642; 2026-02-21T12:31:22.1073854Z cvt.rn.bf16x2.f32 %r8923, %r9645, %r9644; 2026-02-21T12:31:22.1073926Z cvt.rn.bf16x2.f32 %r8924, %r9647, %r9646; 2026-02-21T12:31:22.1074003Z cvt.rn.bf16x2.f32 %r8925, %r9649, %r9648; 2026-02-21T12:31:22.1074075Z cvt.rn.bf16x2.f32 %r8926, %r9651, %r9650; 2026-02-21T12:31:22.1074145Z cvt.rn.bf16x2.f32 %r8927, %r9653, %r9652; 2026-02-21T12:31:22.1074223Z cvt.rn.bf16x2.f32 %r8928, %r9655, %r9654; 2026-02-21T12:31:22.1074340Z cvt.rn.bf16x2.f32 %r8929, %r9657, %r9656; 2026-02-21T12:31:22.1074412Z cvt.rn.bf16x2.f32 %r8930, %r9659, %r9658; 2026-02-21T12:31:22.1074482Z cvt.rn.bf16x2.f32 %r8931, %r9661, %r9660; 2026-02-21T12:31:22.1074561Z cvt.rn.bf16x2.f32 %r8932, %r9663, %r9662; 2026-02-21T12:31:22.1074632Z cvt.rn.bf16x2.f32 %r8933, %r9665, %r9664; 
2026-02-21T12:31:22.1074707Z cvt.rn.bf16x2.f32 %r8934, %r9667, %r9666; 2026-02-21T12:31:22.1074851Z cvt.rn.bf16x2.f32 %r8935, %r9669, %r9668; 2026-02-21T12:31:22.1074923Z cvt.rn.bf16x2.f32 %r8936, %r9671, %r9670; 2026-02-21T12:31:22.1074995Z cvt.rn.bf16x2.f32 %r8937, %r9673, %r9672; 2026-02-21T12:31:22.1075069Z cvt.rn.bf16x2.f32 %r8938, %r9675, %r9674; 2026-02-21T12:31:22.1075139Z cvt.rn.bf16x2.f32 %r8939, %r9677, %r9676; 2026-02-21T12:31:22.1075209Z cvt.rn.bf16x2.f32 %r8940, %r9679, %r9678; 2026-02-21T12:31:22.1075279Z cvt.rn.bf16x2.f32 %r8941, %r9681, %r9680; 2026-02-21T12:31:22.1075356Z cvt.rn.bf16x2.f32 %r8942, %r9683, %r9682; 2026-02-21T12:31:22.1075430Z cvt.rn.bf16x2.f32 %r8943, %r9685, %r9684; 2026-02-21T12:31:22.1075504Z cvt.rn.bf16x2.f32 %r8944, %r9687, %r9686; 2026-02-21T12:31:22.1075581Z cvt.rn.bf16x2.f32 %r8945, %r9689, %r9688; 2026-02-21T12:31:22.1075655Z cvt.rn.bf16x2.f32 %r8946, %r9691, %r9690; 2026-02-21T12:31:22.1075725Z cvt.rn.bf16x2.f32 %r8947, %r9693, %r9692; 2026-02-21T12:31:22.1075802Z cvt.rn.bf16x2.f32 %r8948, %r9695, %r9694; 2026-02-21T12:31:22.1075874Z cvt.rn.bf16x2.f32 %r8949, %r9697, %r9696; 2026-02-21T12:31:22.1075944Z cvt.rn.bf16x2.f32 %r8950, %r9699, %r9698; 2026-02-21T12:31:22.1076015Z cvt.rn.bf16x2.f32 %r8951, %r9701, %r9700; 2026-02-21T12:31:22.1076090Z cvt.rn.bf16x2.f32 %r8952, %r9703, %r9702; 2026-02-21T12:31:22.1076161Z cvt.rn.bf16x2.f32 %r8953, %r9705, %r9704; 2026-02-21T12:31:22.1076230Z cvt.rn.bf16x2.f32 %r8954, %r9707, %r9706; 2026-02-21T12:31:22.1076306Z cvt.rn.bf16x2.f32 %r8955, %r9709, %r9708; 2026-02-21T12:31:22.1076377Z cvt.rn.bf16x2.f32 %r8956, %r9711, %r9710; 2026-02-21T12:31:22.1076600Z cvt.rn.bf16x2.f32 %r8957, %r9713, %r9712; 2026-02-21T12:31:22.1076681Z cvt.rn.bf16x2.f32 %r8958, %r9715, %r9714; 2026-02-21T12:31:22.1076758Z cvt.rn.bf16x2.f32 %r8959, %r9717, %r9716; 2026-02-21T12:31:22.1076834Z cvt.rn.bf16x2.f32 %r8960, %r9719, %r9718; 2026-02-21T12:31:22.1076906Z cvt.rn.bf16x2.f32 %r8961, %r9721, %r9720; 
2026-02-21T12:31:22.1076983Z cvt.rn.bf16x2.f32 %r8962, %r9723, %r9722; 2026-02-21T12:31:22.1077057Z cvt.rn.bf16x2.f32 %r8963, %r9725, %r9724; 2026-02-21T12:31:22.1077130Z cvt.rn.bf16x2.f32 %r8964, %r9727, %r9726; 2026-02-21T12:31:22.1077207Z cvt.rn.bf16x2.f32 %r8965, %r9729, %r9728; 2026-02-21T12:31:22.1077279Z cvt.rn.bf16x2.f32 %r8966, %r9731, %r9730; 2026-02-21T12:31:22.1077352Z cvt.rn.bf16x2.f32 %r8967, %r9733, %r9732; 2026-02-21T12:31:22.1077424Z cvt.rn.bf16x2.f32 %r8968, %r9735, %r9734; 2026-02-21T12:31:22.1077502Z cvt.rn.bf16x2.f32 %r8969, %r9737, %r9736; 2026-02-21T12:31:22.1077574Z cvt.rn.bf16x2.f32 %r8970, %r9739, %r9738; 2026-02-21T12:31:22.1077727Z cvt.rn.bf16x2.f32 %r8971, %r9741, %r9740; 2026-02-21T12:31:22.1077803Z cvt.rn.bf16x2.f32 %r8972, %r9743, %r9742; 2026-02-21T12:31:22.1077874Z cvt.rn.bf16x2.f32 %r8973, %r9745, %r9744; 2026-02-21T12:31:22.1078006Z cvt.rn.bf16x2.f32 %r8974, %r9747, %r9746; 2026-02-21T12:31:22.1078082Z cvt.rn.bf16x2.f32 %r8975, %r9749, %r9748; 2026-02-21T12:31:22.1078154Z cvt.rn.bf16x2.f32 %r8976, %r9751, %r9750; 2026-02-21T12:31:22.1078227Z cvt.rn.bf16x2.f32 %r8977, %r9753, %r9752; 2026-02-21T12:31:22.1078298Z cvt.rn.bf16x2.f32 %r8978, %r9755, %r9754; 2026-02-21T12:31:22.1078375Z cvt.rn.bf16x2.f32 %r8979, %r9757, %r9756; 2026-02-21T12:31:22.1078455Z cvt.rn.bf16x2.f32 %r8980, %r9759, %r9758; 2026-02-21T12:31:22.1078531Z cvt.rn.bf16x2.f32 %r8981, %r9761, %r9760; 2026-02-21T12:31:22.1078605Z cvt.rn.bf16x2.f32 %r8982, %r9763, %r9762; 2026-02-21T12:31:22.1078678Z cvt.rn.bf16x2.f32 %r8983, %r9765, %r9764; 2026-02-21T12:31:22.1078748Z cvt.rn.bf16x2.f32 %r8984, %r9767, %r9766; 2026-02-21T12:31:22.1078883Z cvt.rn.bf16x2.f32 %r8985, %r9769, %r9768; 2026-02-21T12:31:22.1078962Z cvt.rn.bf16x2.f32 %r8986, %r9771, %r9770; 2026-02-21T12:31:22.1079034Z cvt.rn.bf16x2.f32 %r8987, %r9773, %r9772; 2026-02-21T12:31:22.1079106Z cvt.rn.bf16x2.f32 %r8988, %r9775, %r9774; 2026-02-21T12:31:22.1079183Z cvt.rn.bf16x2.f32 %r8989, %r9777, %r9776; 
2026-02-21T12:31:22.1079254Z cvt.rn.bf16x2.f32 %r8990, %r9779, %r9778; 2026-02-21T12:31:22.1079384Z cvt.rn.bf16x2.f32 %r8991, %r9781, %r9780; 2026-02-21T12:31:22.1079463Z cvt.rn.bf16x2.f32 %r8992, %r9783, %r9782; 2026-02-21T12:31:22.1079534Z cvt.rn.bf16x2.f32 %r8993, %r9785, %r9784; 2026-02-21T12:31:22.1079606Z cvt.rn.bf16x2.f32 %r8994, %r9787, %r9786; 2026-02-21T12:31:22.1079679Z cvt.rn.bf16x2.f32 %r8995, %r9789, %r9788; 2026-02-21T12:31:22.1079755Z cvt.rn.bf16x2.f32 %r8996, %r9791, %r9790; 2026-02-21T12:31:22.1079825Z cvt.rn.bf16x2.f32 %r8997, %r9793, %r9792; 2026-02-21T12:31:22.1079896Z cvt.rn.bf16x2.f32 %r8998, %r9795, %r9794; 2026-02-21T12:31:22.1079987Z cvt.rn.bf16x2.f32 %r8999, %r9797, %r9796; 2026-02-21T12:31:22.1080061Z cvt.rn.bf16x2.f32 %r9000, %r9799, %r9798; 2026-02-21T12:31:22.1080133Z cvt.rn.bf16x2.f32 %r9001, %r9801, %r9800; 2026-02-21T12:31:22.1080209Z cvt.rn.bf16x2.f32 %r9002, %r9803, %r9802; 2026-02-21T12:31:22.1080282Z cvt.rn.bf16x2.f32 %r9003, %r9805, %r9804; 2026-02-21T12:31:22.1080354Z cvt.rn.bf16x2.f32 %r9004, %r9807, %r9806; 2026-02-21T12:31:22.1080425Z cvt.rn.bf16x2.f32 %r9005, %r9809, %r9808; 2026-02-21T12:31:22.1080502Z cvt.rn.bf16x2.f32 %r9006, %r9811, %r9810; 2026-02-21T12:31:22.1080575Z cvt.rn.bf16x2.f32 %r9007, %r9813, %r9812; 2026-02-21T12:31:22.1080645Z cvt.rn.bf16x2.f32 %r9008, %r9815, %r9814; 2026-02-21T12:31:22.1080722Z cvt.rn.bf16x2.f32 %r9009, %r9817, %r9816; 2026-02-21T12:31:22.1080794Z cvt.rn.bf16x2.f32 %r9010, %r9819, %r9818; 2026-02-21T12:31:22.1080864Z cvt.rn.bf16x2.f32 %r9011, %r9821, %r9820; 2026-02-21T12:31:22.1080935Z cvt.rn.bf16x2.f32 %r9012, %r9823, %r9822; 2026-02-21T12:31:22.1081013Z cvt.rn.bf16x2.f32 %r9013, %r9825, %r9824; 2026-02-21T12:31:22.1081086Z cvt.rn.bf16x2.f32 %r9014, %r9827, %r9826; 2026-02-21T12:31:22.1081159Z cvt.rn.bf16x2.f32 %r9015, %r9829, %r9828; 2026-02-21T12:31:22.1081236Z cvt.rn.bf16x2.f32 %r9016, %r9831, %r9830; 2026-02-21T12:31:22.1081307Z cvt.rn.bf16x2.f32 %r9017, %r9833, %r9832; 
2026-02-21T12:31:22.1081379Z cvt.rn.bf16x2.f32 %r9018, %r9835, %r9834; 2026-02-21T12:31:22.1081457Z cvt.rn.bf16x2.f32 %r9019, %r9837, %r9836; 2026-02-21T12:31:22.1081532Z cvt.rn.bf16x2.f32 %r9020, %r9839, %r9838; 2026-02-21T12:31:22.1081603Z cvt.rn.bf16x2.f32 %r9021, %r9841, %r9840; 2026-02-21T12:31:22.1081675Z cvt.rn.bf16x2.f32 %r9022, %r9843, %r9842; 2026-02-21T12:31:22.1081751Z cvt.rn.bf16x2.f32 %r9023, %r9845, %r9844; 2026-02-21T12:31:22.1081821Z cvt.rn.bf16x2.f32 %r9024, %r9847, %r9846; 2026-02-21T12:31:22.1081890Z cvt.rn.bf16x2.f32 %r9025, %r9849, %r9848; 2026-02-21T12:31:22.1081968Z cvt.rn.bf16x2.f32 %r9026, %r9851, %r9850; 2026-02-21T12:31:22.1082111Z cvt.rn.bf16x2.f32 %r9027, %r9853, %r9852; 2026-02-21T12:31:22.1082183Z cvt.rn.bf16x2.f32 %r9028, %r9855, %r9854; 2026-02-21T12:31:22.1082260Z cvt.rn.bf16x2.f32 %r9029, %r9857, %r9856; 2026-02-21T12:31:22.1082331Z cvt.rn.bf16x2.f32 %r9030, %r9859, %r9858; 2026-02-21T12:31:22.1082452Z cvt.rn.bf16x2.f32 %r9031, %r9861, %r9860; 2026-02-21T12:31:22.1082525Z cvt.rn.bf16x2.f32 %r9032, %r9863, %r9862; 2026-02-21T12:31:22.1082602Z cvt.rn.bf16x2.f32 %r9033, %r9865, %r9864; 2026-02-21T12:31:22.1082673Z cvt.rn.bf16x2.f32 %r9034, %r9867, %r9866; 2026-02-21T12:31:22.1082745Z cvt.rn.bf16x2.f32 %r9035, %r9869, %r9868; 2026-02-21T12:31:22.1082822Z cvt.rn.bf16x2.f32 %r9036, %r9871, %r9870; 2026-02-21T12:31:22.1082897Z cvt.rn.bf16x2.f32 %r9037, %r9873, %r9872; 2026-02-21T12:31:22.1082967Z cvt.rn.bf16x2.f32 %r9038, %r9875, %r9874; 2026-02-21T12:31:22.1083041Z cvt.rn.bf16x2.f32 %r9039, %r9877, %r9876; 2026-02-21T12:31:22.1083113Z cvt.rn.bf16x2.f32 %r9040, %r9879, %r9878; 2026-02-21T12:31:22.1083235Z cvt.rn.bf16x2.f32 %r9041, %r9881, %r9880; 2026-02-21T12:31:22.1083310Z cvt.rn.bf16x2.f32 %r9042, %r9883, %r9882; 2026-02-21T12:31:22.1083391Z cvt.rn.bf16x2.f32 %r9043, %r9885, %r9884; 2026-02-21T12:31:22.1083462Z cvt.rn.bf16x2.f32 %r9044, %r9887, %r9886; 2026-02-21T12:31:22.1083536Z cvt.rn.bf16x2.f32 %r9045, %r9889, %r9888; 
2026-02-21T12:31:22.1083612Z cvt.rn.bf16x2.f32 %r9046, %r9891, %r9890; 2026-02-21T12:31:22.1083744Z cvt.rn.bf16x2.f32 %r9047, %r9893, %r9892; 2026-02-21T12:31:22.1083822Z cvt.rn.bf16x2.f32 %r9048, %r9895, %r9894; 2026-02-21T12:31:22.1083894Z cvt.rn.bf16x2.f32 %r9049, %r9897, %r9896; 2026-02-21T12:31:22.1083969Z cvt.rn.bf16x2.f32 %r9050, %r9899, %r9898; 2026-02-21T12:31:22.1084040Z cvt.rn.bf16x2.f32 %r9051, %r9901, %r9900; 2026-02-21T12:31:22.1084110Z cvt.rn.bf16x2.f32 %r9052, %r9903, %r9902; 2026-02-21T12:31:22.1084186Z cvt.rn.bf16x2.f32 %r9053, %r9905, %r9904; 2026-02-21T12:31:22.1084257Z cvt.rn.bf16x2.f32 %r9054, %r9907, %r9906; 2026-02-21T12:31:22.1084331Z cvt.rn.bf16x2.f32 %r9055, %r9909, %r9908; 2026-02-21T12:31:22.1084409Z cvt.rn.bf16x2.f32 %r9056, %r9911, %r9910; 2026-02-21T12:31:22.1084482Z cvt.rn.bf16x2.f32 %r9057, %r9913, %r9912; 2026-02-21T12:31:22.1084555Z cvt.rn.bf16x2.f32 %r9058, %r9915, %r9914; 2026-02-21T12:31:22.1084629Z cvt.rn.bf16x2.f32 %r9059, %r9917, %r9916; 2026-02-21T12:31:22.1084705Z cvt.rn.bf16x2.f32 %r9060, %r9919, %r9918; 2026-02-21T12:31:22.1084776Z cvt.rn.bf16x2.f32 %r9061, %r9921, %r9920; 2026-02-21T12:31:22.1084845Z cvt.rn.bf16x2.f32 %r9062, %r9923, %r9922; 2026-02-21T12:31:22.1084921Z cvt.rn.bf16x2.f32 %r9063, %r9925, %r9924; 2026-02-21T12:31:22.1084992Z cvt.rn.bf16x2.f32 %r9064, %r9927, %r9926; 2026-02-21T12:31:22.1085063Z cvt.rn.bf16x2.f32 %r9065, %r9929, %r9928; 2026-02-21T12:31:22.1085142Z cvt.rn.bf16x2.f32 %r9066, %r9931, %r9930; 2026-02-21T12:31:22.1085216Z cvt.rn.bf16x2.f32 %r9067, %r9933, %r9932; 2026-02-21T12:31:22.1085289Z cvt.rn.bf16x2.f32 %r9068, %r9935, %r9934; 2026-02-21T12:31:22.1085365Z cvt.rn.bf16x2.f32 %r9069, %r9937, %r9936; 2026-02-21T12:31:22.1085443Z cvt.rn.bf16x2.f32 %r9070, %r9939, %r9938; 2026-02-21T12:31:22.1085515Z cvt.rn.bf16x2.f32 %r9071, %r9941, %r9940; 2026-02-21T12:31:22.1085585Z cvt.rn.bf16x2.f32 %r9072, %r9943, %r9942; 2026-02-21T12:31:22.1085663Z cvt.rn.bf16x2.f32 %r9073, %r9945, %r9944; 
2026-02-21T12:31:22.1085734Z cvt.rn.bf16x2.f32 %r9074, %r9947, %r9946; 2026-02-21T12:31:22.1085806Z cvt.rn.bf16x2.f32 %r9075, %r9949, %r9948; 2026-02-21T12:31:22.1085881Z cvt.rn.bf16x2.f32 %r9076, %r9951, %r9950; 2026-02-21T12:31:22.1085957Z cvt.rn.bf16x2.f32 %r9077, %r9953, %r9952; 2026-02-21T12:31:22.1086028Z cvt.rn.bf16x2.f32 %r9078, %r9955, %r9954; 2026-02-21T12:31:22.1086099Z cvt.rn.bf16x2.f32 %r9079, %r9957, %r9956; 2026-02-21T12:31:22.1086176Z cvt.rn.bf16x2.f32 %r9080, %r9959, %r9958; 2026-02-21T12:31:22.1086247Z cvt.rn.bf16x2.f32 %r9081, %r9961, %r9960; 2026-02-21T12:31:22.1086318Z cvt.rn.bf16x2.f32 %r9082, %r9963, %r9962; 2026-02-21T12:31:22.1086580Z cvt.rn.bf16x2.f32 %r9083, %r9965, %r9964; 2026-02-21T12:31:22.1086659Z cvt.rn.bf16x2.f32 %r9084, %r9967, %r9966; 2026-02-21T12:31:22.1086742Z cvt.rn.bf16x2.f32 %r9085, %r9969, %r9968; 2026-02-21T12:31:22.1086814Z cvt.rn.bf16x2.f32 %r9086, %r9971, %r9970; 2026-02-21T12:31:22.1086964Z cvt.rn.bf16x2.f32 %r9087, %r9973, %r9972; 2026-02-21T12:31:22.1087035Z cvt.rn.bf16x2.f32 %r9088, %r9975, %r9974; 2026-02-21T12:31:22.1087107Z cvt.rn.bf16x2.f32 %r9089, %r9977, %r9976; 2026-02-21T12:31:22.1087184Z cvt.rn.bf16x2.f32 %r9090, %r9979, %r9978; 2026-02-21T12:31:22.1087254Z cvt.rn.bf16x2.f32 %r9091, %r9981, %r9980; 2026-02-21T12:31:22.1087322Z cvt.rn.bf16x2.f32 %r9092, %r9983, %r9982; 2026-02-21T12:31:22.1087411Z cvt.rn.bf16x2.f32 %r9093, %r9985, %r9984; 2026-02-21T12:31:22.1087486Z cvt.rn.bf16x2.f32 %r9094, %r9987, %r9986; 2026-02-21T12:31:22.1087558Z cvt.rn.bf16x2.f32 %r9095, %r9989, %r9988; 2026-02-21T12:31:22.1087630Z cvt.rn.bf16x2.f32 %r9096, %r9991, %r9990; 2026-02-21T12:31:22.1087775Z cvt.rn.bf16x2.f32 %r9097, %r9993, %r9992; 2026-02-21T12:31:22.1087850Z cvt.rn.bf16x2.f32 %r9098, %r9995, %r9994; 2026-02-21T12:31:22.1087923Z cvt.rn.bf16x2.f32 %r9099, %r9997, %r9996; 2026-02-21T12:31:22.1087998Z cvt.rn.bf16x2.f32 %r9100, %r9999, %r9998; 2026-02-21T12:31:22.1088087Z cvt.rn.bf16x2.f32 %r9101, %r10001, %r10000; 
2026-02-21T12:31:22.1088173Z cvt.rn.bf16x2.f32 %r9102, %r10003, %r10002; 2026-02-21T12:31:22.1088310Z cvt.rn.bf16x2.f32 %r9103, %r10005, %r10004; 2026-02-21T12:31:22.1088396Z cvt.rn.bf16x2.f32 %r9104, %r10007, %r10006; 2026-02-21T12:31:22.1088474Z cvt.rn.bf16x2.f32 %r9105, %r10009, %r10008; 2026-02-21T12:31:22.1088550Z cvt.rn.bf16x2.f32 %r9106, %r10011, %r10010; 2026-02-21T12:31:22.1088630Z cvt.rn.bf16x2.f32 %r9107, %r10013, %r10012; 2026-02-21T12:31:22.1088705Z cvt.rn.bf16x2.f32 %r9108, %r10015, %r10014; 2026-02-21T12:31:22.1088780Z cvt.rn.bf16x2.f32 %r9109, %r10017, %r10016; 2026-02-21T12:31:22.1088860Z cvt.rn.bf16x2.f32 %r9110, %r10019, %r10018; 2026-02-21T12:31:22.1088942Z cvt.rn.bf16x2.f32 %r9111, %r10021, %r10020; 2026-02-21T12:31:22.1089018Z cvt.rn.bf16x2.f32 %r9112, %r10023, %r10022; 2026-02-21T12:31:22.1089096Z cvt.rn.bf16x2.f32 %r9113, %r10025, %r10024; 2026-02-21T12:31:22.1089181Z cvt.rn.bf16x2.f32 %r9114, %r10027, %r10026; 2026-02-21T12:31:22.1089260Z cvt.rn.bf16x2.f32 %r9115, %r10029, %r10028; 2026-02-21T12:31:22.1089336Z cvt.rn.bf16x2.f32 %r9116, %r10031, %r10030; 2026-02-21T12:31:22.1089418Z cvt.rn.bf16x2.f32 %r9117, %r10033, %r10032; 2026-02-21T12:31:22.1089507Z cvt.rn.bf16x2.f32 %r9118, %r10035, %r10034; 2026-02-21T12:31:22.1089588Z cvt.rn.bf16x2.f32 %r9119, %r10037, %r10036; 2026-02-21T12:31:22.1089667Z cvt.rn.bf16x2.f32 %r9120, %r10039, %r10038; 2026-02-21T12:31:22.1089742Z cvt.rn.bf16x2.f32 %r9121, %r10041, %r10040; 2026-02-21T12:31:22.1089816Z cvt.rn.bf16x2.f32 %r9122, %r10043, %r10042; 2026-02-21T12:31:22.1089890Z cvt.rn.bf16x2.f32 %r9123, %r10045, %r10044; 2026-02-21T12:31:22.1089971Z cvt.rn.bf16x2.f32 %r9124, %r10047, %r10046; 2026-02-21T12:31:22.1090050Z cvt.rn.bf16x2.f32 %r9125, %r10049, %r10048; 2026-02-21T12:31:22.1090127Z cvt.rn.bf16x2.f32 %r9126, %r10051, %r10050; 2026-02-21T12:31:22.1090206Z cvt.rn.bf16x2.f32 %r9127, %r10053, %r10052; 2026-02-21T12:31:22.1090285Z cvt.rn.bf16x2.f32 %r9128, %r10055, %r10054; 2026-02-21T12:31:22.1090360Z 
cvt.rn.bf16x2.f32 %r9129, %r10057, %r10056; 2026-02-21T12:31:22.1090443Z cvt.rn.bf16x2.f32 %r9130, %r10059, %r10058; 2026-02-21T12:31:22.1090521Z cvt.rn.bf16x2.f32 %r9131, %r10061, %r10060; 2026-02-21T12:31:22.1090595Z cvt.rn.bf16x2.f32 %r9132, %r10063, %r10062; 2026-02-21T12:31:22.1090668Z cvt.rn.bf16x2.f32 %r9133, %r10065, %r10064; 2026-02-21T12:31:22.1090751Z cvt.rn.bf16x2.f32 %r9134, %r10067, %r10066; 2026-02-21T12:31:22.1090825Z cvt.rn.bf16x2.f32 %r9135, %r10069, %r10068; 2026-02-21T12:31:22.1090901Z cvt.rn.bf16x2.f32 %r9136, %r10071, %r10070; 2026-02-21T12:31:22.1090986Z cvt.rn.bf16x2.f32 %r9137, %r10073, %r10072; 2026-02-21T12:31:22.1091152Z cvt.rn.bf16x2.f32 %r9138, %r10075, %r10074; 2026-02-21T12:31:22.1091228Z cvt.rn.bf16x2.f32 %r9139, %r10077, %r10076; 2026-02-21T12:31:22.1091312Z cvt.rn.bf16x2.f32 %r9140, %r10079, %r10078; 2026-02-21T12:31:22.1091387Z cvt.rn.bf16x2.f32 %r9141, %r10081, %r10080; 2026-02-21T12:31:22.1091514Z cvt.rn.bf16x2.f32 %r9142, %r10083, %r10082; 2026-02-21T12:31:22.1091590Z cvt.rn.bf16x2.f32 %r9143, %r10085, %r10084; 2026-02-21T12:31:22.1091673Z cvt.rn.bf16x2.f32 %r9144, %r10087, %r10086; 2026-02-21T12:31:22.1091749Z cvt.rn.bf16x2.f32 %r9145, %r10089, %r10088; 2026-02-21T12:31:22.1091822Z cvt.rn.bf16x2.f32 %r9146, %r10091, %r10090; 2026-02-21T12:31:22.1091900Z cvt.rn.bf16x2.f32 %r9147, %r10093, %r10092; 2026-02-21T12:31:22.1091975Z cvt.rn.bf16x2.f32 %r9148, %r10095, %r10094; 2026-02-21T12:31:22.1092052Z cvt.rn.bf16x2.f32 %r9149, %r10097, %r10096; 2026-02-21T12:31:22.1092132Z cvt.rn.bf16x2.f32 %r9150, %r10099, %r10098; 2026-02-21T12:31:22.1092206Z cvt.rn.bf16x2.f32 %r9151, %r10101, %r10100; 2026-02-21T12:31:22.1092347Z cvt.rn.bf16x2.f32 %r9152, %r10103, %r10102; 2026-02-21T12:31:22.1092426Z cvt.rn.bf16x2.f32 %r9153, %r10105, %r10104; 2026-02-21T12:31:22.1092507Z cvt.rn.bf16x2.f32 %r9154, %r10107, %r10106; 2026-02-21T12:31:22.1092584Z cvt.rn.bf16x2.f32 %r9155, %r10109, %r10108; 2026-02-21T12:31:22.1092660Z cvt.rn.bf16x2.f32 %r9156, 
%r10111, %r10110; 2026-02-21T12:31:22.1092801Z cvt.rn.bf16x2.f32 %r9157, %r10113, %r10112; 2026-02-21T12:31:22.1092878Z cvt.rn.bf16x2.f32 %r9158, %r10115, %r10114; 2026-02-21T12:31:22.1092957Z cvt.rn.bf16x2.f32 %r9159, %r10117, %r10116; 2026-02-21T12:31:22.1093036Z cvt.rn.bf16x2.f32 %r9160, %r10119, %r10118; 2026-02-21T12:31:22.1093115Z cvt.rn.bf16x2.f32 %r9161, %r10121, %r10120; 2026-02-21T12:31:22.1093193Z cvt.rn.bf16x2.f32 %r9162, %r10123, %r10122; 2026-02-21T12:31:22.1093271Z cvt.rn.bf16x2.f32 %r9163, %r10125, %r10124; 2026-02-21T12:31:22.1093346Z cvt.rn.bf16x2.f32 %r9164, %r10127, %r10126; 2026-02-21T12:31:22.1093422Z cvt.rn.bf16x2.f32 %r9165, %r10129, %r10128; 2026-02-21T12:31:22.1093501Z cvt.rn.bf16x2.f32 %r9166, %r10131, %r10130; 2026-02-21T12:31:22.1093582Z cvt.rn.bf16x2.f32 %r9167, %r10133, %r10132; 2026-02-21T12:31:22.1093657Z cvt.rn.bf16x2.f32 %r9168, %r10135, %r10134; 2026-02-21T12:31:22.1093733Z cvt.rn.bf16x2.f32 %r9169, %r10137, %r10136; 2026-02-21T12:31:22.1093812Z cvt.rn.bf16x2.f32 %r9170, %r10139, %r10138; 2026-02-21T12:31:22.1093889Z cvt.rn.bf16x2.f32 %r9171, %r10141, %r10140; 2026-02-21T12:31:22.1093965Z cvt.rn.bf16x2.f32 %r9172, %r10143, %r10142; 2026-02-21T12:31:22.1094044Z cvt.rn.bf16x2.f32 %r9173, %r10145, %r10144; 2026-02-21T12:31:22.1094118Z cvt.rn.bf16x2.f32 %r9174, %r10147, %r10146; 2026-02-21T12:31:22.1094191Z cvt.rn.bf16x2.f32 %r9175, %r10149, %r10148; 2026-02-21T12:31:22.1094266Z cvt.rn.bf16x2.f32 %r9176, %r10151, %r10150; 2026-02-21T12:31:22.1094344Z cvt.rn.bf16x2.f32 %r9177, %r10153, %r10152; 2026-02-21T12:31:22.1094418Z cvt.rn.bf16x2.f32 %r9178, %r10155, %r10154; 2026-02-21T12:31:22.1094497Z cvt.rn.bf16x2.f32 %r9179, %r10157, %r10156; 2026-02-21T12:31:22.1094578Z cvt.rn.bf16x2.f32 %r9180, %r10159, %r10158; 2026-02-21T12:31:22.1094652Z cvt.rn.bf16x2.f32 %r9181, %r10161, %r10160; 2026-02-21T12:31:22.1094726Z cvt.rn.bf16x2.f32 %r9182, %r10163, %r10162; 2026-02-21T12:31:22.1094802Z cvt.rn.bf16x2.f32 %r9183, %r10165, %r10164; 
2026-02-21T12:31:22.1094881Z cvt.rn.bf16x2.f32 %r9184, %r10167, %r10166; 2026-02-21T12:31:22.1094958Z cvt.rn.bf16x2.f32 %r9185, %r10169, %r10168; 2026-02-21T12:31:22.1095032Z cvt.rn.bf16x2.f32 %r9186, %r10171, %r10170; 2026-02-21T12:31:22.1095110Z cvt.rn.bf16x2.f32 %r9187, %r10173, %r10172; 2026-02-21T12:31:22.1095185Z cvt.rn.bf16x2.f32 %r9188, %r10175, %r10174; 2026-02-21T12:31:22.1095261Z cvt.rn.bf16x2.f32 %r9189, %r10177, %r10176; 2026-02-21T12:31:22.1095340Z cvt.rn.bf16x2.f32 %r9190, %r10179, %r10178; 2026-02-21T12:31:22.1095416Z cvt.rn.bf16x2.f32 %r9191, %r10181, %r10180; 2026-02-21T12:31:22.1095491Z cvt.rn.bf16x2.f32 %r9192, %r10183, %r10182; 2026-02-21T12:31:22.1095647Z cvt.rn.bf16x2.f32 %r9193, %r10185, %r10184; 2026-02-21T12:31:22.1095731Z cvt.rn.bf16x2.f32 %r9194, %r10187, %r10186; 2026-02-21T12:31:22.1095806Z cvt.rn.bf16x2.f32 %r9195, %r10189, %r10188; 2026-02-21T12:31:22.1095933Z cvt.rn.bf16x2.f32 %r9196, %r10191, %r10190; 2026-02-21T12:31:22.1096014Z cvt.rn.bf16x2.f32 %r9197, %r10193, %r10192; 2026-02-21T12:31:22.1096092Z cvt.rn.bf16x2.f32 %r9198, %r10195, %r10194; 2026-02-21T12:31:22.1096166Z cvt.rn.bf16x2.f32 %r9199, %r10197, %r10196; 2026-02-21T12:31:22.1096247Z cvt.rn.bf16x2.f32 %r9200, %r10199, %r10198; 2026-02-21T12:31:22.1096322Z cvt.rn.bf16x2.f32 %r9201, %r10201, %r10200; 2026-02-21T12:31:22.1096400Z cvt.rn.bf16x2.f32 %r9202, %r10203, %r10202; 2026-02-21T12:31:22.1096602Z cvt.rn.bf16x2.f32 %r9203, %r10205, %r10204; 2026-02-21T12:31:22.1096689Z cvt.rn.bf16x2.f32 %r9204, %r10207, %r10206; 2026-02-21T12:31:22.1096763Z cvt.rn.bf16x2.f32 %r9205, %r10209, %r10208; 2026-02-21T12:31:22.1096839Z cvt.rn.bf16x2.f32 %r9206, %r10211, %r10210; 2026-02-21T12:31:22.1097001Z cvt.rn.bf16x2.f32 %r9207, %r10213, %r10212; 2026-02-21T12:31:22.1097083Z cvt.rn.bf16x2.f32 %r9208, %r10215, %r10214; 2026-02-21T12:31:22.1097159Z cvt.rn.bf16x2.f32 %r9209, %r10217, %r10216; 2026-02-21T12:31:22.1097241Z cvt.rn.bf16x2.f32 %r9210, %r10219, %r10218; 2026-02-21T12:31:22.1097324Z 
cvt.rn.bf16x2.f32 %r9211, %r10221, %r10220; 2026-02-21T12:31:22.1097465Z cvt.rn.bf16x2.f32 %r9212, %r10223, %r10222; 2026-02-21T12:31:22.1097546Z cvt.rn.bf16x2.f32 %r9213, %r10225, %r10224; 2026-02-21T12:31:22.1097628Z cvt.rn.bf16x2.f32 %r9214, %r10227, %r10226; 2026-02-21T12:31:22.1097706Z cvt.rn.bf16x2.f32 %r9215, %r10229, %r10228; 2026-02-21T12:31:22.1097782Z cvt.rn.bf16x2.f32 %r9216, %r10231, %r10230; 2026-02-21T12:31:22.1097864Z cvt.rn.bf16x2.f32 %r9217, %r10233, %r10232; 2026-02-21T12:31:22.1097939Z cvt.rn.bf16x2.f32 %r9218, %r10235, %r10234; 2026-02-21T12:31:22.1098012Z cvt.rn.bf16x2.f32 %r9219, %r10237, %r10236; 2026-02-21T12:31:22.1098095Z cvt.rn.bf16x2.f32 %r9220, %r10239, %r10238; 2026-02-21T12:31:22.1098170Z cvt.rn.bf16x2.f32 %r9221, %r10241, %r10240; 2026-02-21T12:31:22.1098245Z cvt.rn.bf16x2.f32 %r9222, %r10243, %r10242; 2026-02-21T12:31:22.1098320Z cvt.rn.bf16x2.f32 %r9223, %r10245, %r10244; 2026-02-21T12:31:22.1098400Z cvt.rn.bf16x2.f32 %r9224, %r10247, %r10246; 2026-02-21T12:31:22.1098475Z cvt.rn.bf16x2.f32 %r9225, %r10249, %r10248; 2026-02-21T12:31:22.1098551Z cvt.rn.bf16x2.f32 %r9226, %r10251, %r10250; 2026-02-21T12:31:22.1098629Z cvt.rn.bf16x2.f32 %r9227, %r10253, %r10252; 2026-02-21T12:31:22.1098705Z cvt.rn.bf16x2.f32 %r9228, %r10255, %r10254; 2026-02-21T12:31:22.1098782Z cvt.rn.bf16x2.f32 %r9229, %r10257, %r10256; 2026-02-21T12:31:22.1098860Z cvt.rn.bf16x2.f32 %r9230, %r10259, %r10258; 2026-02-21T12:31:22.1099091Z .loc 1 88 22 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:88:22 2026-02-21T12:31:22.1099176Z mad.lo.s64 %rd516, %rd388, 2560, %rd187; 2026-02-21T12:31:22.1099247Z shl.b64 %rd517, %rd650, 1; 2026-02-21T12:31:22.1099324Z add.s64 %rd260, %rd516, %rd517; 2026-02-21T12:31:22.1099400Z mad.lo.s64 %rd518, %rd389, 2560, %rd187; 2026-02-21T12:31:22.1099467Z add.s64 %rd261, %rd518, %rd517; 2026-02-21T12:31:22.1099546Z mad.lo.s64 %rd519, %rd390, 2560, %rd187; 2026-02-21T12:31:22.1099611Z add.s64 %rd262, %rd519, %rd517; 
2026-02-21T12:31:22.1099684Z mad.lo.s64 %rd520, %rd391, 2560, %rd187; 2026-02-21T12:31:22.1099752Z add.s64 %rd263, %rd520, %rd517; 2026-02-21T12:31:22.1099823Z mad.lo.s64 %rd521, %rd392, 2560, %rd187; 2026-02-21T12:31:22.1099886Z add.s64 %rd264, %rd521, %rd517; 2026-02-21T12:31:22.1099956Z mad.lo.s64 %rd522, %rd393, 2560, %rd187; 2026-02-21T12:31:22.1100025Z add.s64 %rd265, %rd522, %rd517; 2026-02-21T12:31:22.1100095Z mad.lo.s64 %rd523, %rd394, 2560, %rd187; 2026-02-21T12:31:22.1100158Z add.s64 %rd266, %rd523, %rd517; 2026-02-21T12:31:22.1100230Z mad.lo.s64 %rd524, %rd395, 2560, %rd187; 2026-02-21T12:31:22.1100386Z add.s64 %rd267, %rd524, %rd517; 2026-02-21T12:31:22.1100459Z mad.lo.s64 %rd525, %rd396, 2560, %rd187; 2026-02-21T12:31:22.1100524Z add.s64 %rd268, %rd525, %rd517; 2026-02-21T12:31:22.1100604Z mad.lo.s64 %rd526, %rd397, 2560, %rd187; 2026-02-21T12:31:22.1100742Z add.s64 %rd269, %rd526, %rd517; 2026-02-21T12:31:22.1100817Z mad.lo.s64 %rd527, %rd398, 2560, %rd187; 2026-02-21T12:31:22.1100880Z add.s64 %rd270, %rd527, %rd517; 2026-02-21T12:31:22.1100953Z mad.lo.s64 %rd528, %rd399, 2560, %rd187; 2026-02-21T12:31:22.1101018Z add.s64 %rd271, %rd528, %rd517; 2026-02-21T12:31:22.1101086Z mad.lo.s64 %rd529, %rd400, 2560, %rd187; 2026-02-21T12:31:22.1101149Z add.s64 %rd272, %rd529, %rd517; 2026-02-21T12:31:22.1101217Z mad.lo.s64 %rd530, %rd401, 2560, %rd187; 2026-02-21T12:31:22.1101282Z add.s64 %rd273, %rd530, %rd517; 2026-02-21T12:31:22.1101352Z mad.lo.s64 %rd531, %rd402, 2560, %rd187; 2026-02-21T12:31:22.1101416Z add.s64 %rd274, %rd531, %rd517; 2026-02-21T12:31:22.1101487Z mad.lo.s64 %rd532, %rd403, 2560, %rd187; 2026-02-21T12:31:22.1101602Z add.s64 %rd275, %rd532, %rd517; 2026-02-21T12:31:22.1101673Z mad.lo.s64 %rd533, %rd404, 2560, %rd187; 2026-02-21T12:31:22.1101737Z add.s64 %rd276, %rd533, %rd517; 2026-02-21T12:31:22.1101807Z mad.lo.s64 %rd534, %rd405, 2560, %rd187; 2026-02-21T12:31:22.1101883Z add.s64 %rd277, %rd534, %rd517; 2026-02-21T12:31:22.1101964Z 
mad.lo.s64 %rd535, %rd406, 2560, %rd187; 2026-02-21T12:31:22.1102087Z add.s64 %rd278, %rd535, %rd517; 2026-02-21T12:31:22.1102161Z mad.lo.s64 %rd536, %rd407, 2560, %rd187; 2026-02-21T12:31:22.1102225Z add.s64 %rd279, %rd536, %rd517; 2026-02-21T12:31:22.1102299Z mad.lo.s64 %rd537, %rd408, 2560, %rd187; 2026-02-21T12:31:22.1102360Z add.s64 %rd280, %rd537, %rd517; 2026-02-21T12:31:22.1102432Z mad.lo.s64 %rd538, %rd409, 2560, %rd187; 2026-02-21T12:31:22.1102494Z add.s64 %rd281, %rd538, %rd517; 2026-02-21T12:31:22.1102567Z mad.lo.s64 %rd539, %rd410, 2560, %rd187; 2026-02-21T12:31:22.1102628Z add.s64 %rd282, %rd539, %rd517; 2026-02-21T12:31:22.1102700Z mad.lo.s64 %rd540, %rd411, 2560, %rd187; 2026-02-21T12:31:22.1102769Z add.s64 %rd283, %rd540, %rd517; 2026-02-21T12:31:22.1102837Z mad.lo.s64 %rd541, %rd412, 2560, %rd187; 2026-02-21T12:31:22.1102900Z add.s64 %rd284, %rd541, %rd517; 2026-02-21T12:31:22.1102973Z mad.lo.s64 %rd542, %rd413, 2560, %rd187; 2026-02-21T12:31:22.1103034Z add.s64 %rd285, %rd542, %rd517; 2026-02-21T12:31:22.1103104Z mad.lo.s64 %rd543, %rd414, 2560, %rd187; 2026-02-21T12:31:22.1103166Z add.s64 %rd286, %rd543, %rd517; 2026-02-21T12:31:22.1103241Z mad.lo.s64 %rd544, %rd415, 2560, %rd187; 2026-02-21T12:31:22.1103302Z add.s64 %rd287, %rd544, %rd517; 2026-02-21T12:31:22.1103371Z mad.lo.s64 %rd545, %rd416, 2560, %rd187; 2026-02-21T12:31:22.1103437Z add.s64 %rd288, %rd545, %rd517; 2026-02-21T12:31:22.1103504Z mad.lo.s64 %rd546, %rd417, 2560, %rd187; 2026-02-21T12:31:22.1103566Z add.s64 %rd289, %rd546, %rd517; 2026-02-21T12:31:22.1103633Z mad.lo.s64 %rd547, %rd418, 2560, %rd187; 2026-02-21T12:31:22.1103702Z add.s64 %rd290, %rd547, %rd517; 2026-02-21T12:31:22.1103771Z mad.lo.s64 %rd548, %rd419, 2560, %rd187; 2026-02-21T12:31:22.1103833Z add.s64 %rd291, %rd548, %rd517; 2026-02-21T12:31:22.1103910Z mad.lo.s64 %rd549, %rd420, 2560, %rd187; 2026-02-21T12:31:22.1103985Z add.s64 %rd292, %rd549, %rd517; 2026-02-21T12:31:22.1104059Z mad.lo.s64 %rd550, %rd421, 2560, 
%rd187; 2026-02-21T12:31:22.1104128Z add.s64 %rd293, %rd550, %rd517; 2026-02-21T12:31:22.1104199Z mad.lo.s64 %rd551, %rd422, 2560, %rd187; 2026-02-21T12:31:22.1104260Z add.s64 %rd294, %rd551, %rd517; 2026-02-21T12:31:22.1104328Z mad.lo.s64 %rd552, %rd423, 2560, %rd187; 2026-02-21T12:31:22.1104394Z add.s64 %rd295, %rd552, %rd517; 2026-02-21T12:31:22.1104461Z mad.lo.s64 %rd553, %rd424, 2560, %rd187; 2026-02-21T12:31:22.1104522Z add.s64 %rd296, %rd553, %rd517; 2026-02-21T12:31:22.1104596Z mad.lo.s64 %rd554, %rd425, 2560, %rd187; 2026-02-21T12:31:22.1104657Z add.s64 %rd297, %rd554, %rd517; 2026-02-21T12:31:22.1104786Z mad.lo.s64 %rd555, %rd426, 2560, %rd187; 2026-02-21T12:31:22.1104851Z add.s64 %rd298, %rd555, %rd517; 2026-02-21T12:31:22.1104921Z mad.lo.s64 %rd556, %rd427, 2560, %rd187; 2026-02-21T12:31:22.1104985Z add.s64 %rd299, %rd556, %rd517; 2026-02-21T12:31:22.1105101Z mad.lo.s64 %rd557, %rd428, 2560, %rd187; 2026-02-21T12:31:22.1105167Z add.s64 %rd300, %rd557, %rd517; 2026-02-21T12:31:22.1105236Z mad.lo.s64 %rd558, %rd429, 2560, %rd187; 2026-02-21T12:31:22.1105299Z add.s64 %rd301, %rd558, %rd517; 2026-02-21T12:31:22.1105375Z mad.lo.s64 %rd559, %rd430, 2560, %rd187; 2026-02-21T12:31:22.1105445Z add.s64 %rd302, %rd559, %rd517; 2026-02-21T12:31:22.1105514Z mad.lo.s64 %rd560, %rd431, 2560, %rd187; 2026-02-21T12:31:22.1105576Z add.s64 %rd303, %rd560, %rd517; 2026-02-21T12:31:22.1105648Z mad.lo.s64 %rd561, %rd432, 2560, %rd187; 2026-02-21T12:31:22.1105709Z add.s64 %rd304, %rd561, %rd517; 2026-02-21T12:31:22.1105777Z mad.lo.s64 %rd562, %rd433, 2560, %rd187; 2026-02-21T12:31:22.1105894Z add.s64 %rd305, %rd562, %rd517; 2026-02-21T12:31:22.1105965Z mad.lo.s64 %rd563, %rd434, 2560, %rd187; 2026-02-21T12:31:22.1106026Z add.s64 %rd306, %rd563, %rd517; 2026-02-21T12:31:22.1106097Z mad.lo.s64 %rd564, %rd435, 2560, %rd187; 2026-02-21T12:31:22.1106161Z add.s64 %rd307, %rd564, %rd517; 2026-02-21T12:31:22.1106229Z mad.lo.s64 %rd565, %rd436, 2560, %rd187; 
2026-02-21T12:31:22.1106290Z add.s64 %rd308, %rd565, %rd517; 2026-02-21T12:31:22.1106410Z mad.lo.s64 %rd566, %rd437, 2560, %rd187; 2026-02-21T12:31:22.1106605Z add.s64 %rd309, %rd566, %rd517; 2026-02-21T12:31:22.1106679Z mad.lo.s64 %rd567, %rd438, 2560, %rd187; 2026-02-21T12:31:22.1106761Z add.s64 %rd310, %rd567, %rd517; 2026-02-21T12:31:22.1106832Z mad.lo.s64 %rd568, %rd439, 2560, %rd187; 2026-02-21T12:31:22.1106895Z add.s64 %rd311, %rd568, %rd517; 2026-02-21T12:31:22.1106964Z mad.lo.s64 %rd569, %rd440, 2560, %rd187; 2026-02-21T12:31:22.1107031Z add.s64 %rd312, %rd569, %rd517; 2026-02-21T12:31:22.1107106Z mad.lo.s64 %rd570, %rd441, 2560, %rd187; 2026-02-21T12:31:22.1107171Z add.s64 %rd313, %rd570, %rd517; 2026-02-21T12:31:22.1107245Z mad.lo.s64 %rd571, %rd442, 2560, %rd187; 2026-02-21T12:31:22.1107308Z add.s64 %rd314, %rd571, %rd517; 2026-02-21T12:31:22.1107378Z mad.lo.s64 %rd572, %rd443, 2560, %rd187; 2026-02-21T12:31:22.1107453Z add.s64 %rd315, %rd572, %rd517; 2026-02-21T12:31:22.1107521Z mad.lo.s64 %rd573, %rd444, 2560, %rd187; 2026-02-21T12:31:22.1107584Z add.s64 %rd316, %rd573, %rd517; 2026-02-21T12:31:22.1107653Z mad.lo.s64 %rd574, %rd445, 2560, %rd187; 2026-02-21T12:31:22.1107720Z add.s64 %rd317, %rd574, %rd517; 2026-02-21T12:31:22.1107788Z mad.lo.s64 %rd575, %rd446, 2560, %rd187; 2026-02-21T12:31:22.1107851Z add.s64 %rd318, %rd575, %rd517; 2026-02-21T12:31:22.1107921Z mad.lo.s64 %rd576, %rd447, 2560, %rd187; 2026-02-21T12:31:22.1107983Z add.s64 %rd319, %rd576, %rd517; 2026-02-21T12:31:22.1108051Z mad.lo.s64 %rd577, %rd448, 2560, %rd187; 2026-02-21T12:31:22.1108115Z add.s64 %rd320, %rd577, %rd517; 2026-02-21T12:31:22.1108188Z mad.lo.s64 %rd578, %rd449, 2560, %rd187; 2026-02-21T12:31:22.1108365Z add.s64 %rd321, %rd578, %rd517; 2026-02-21T12:31:22.1108439Z mad.lo.s64 %rd579, %rd450, 2560, %rd187; 2026-02-21T12:31:22.1108507Z add.s64 %rd322, %rd579, %rd517; 2026-02-21T12:31:22.1108576Z mad.lo.s64 %rd580, %rd451, 2560, %rd187; 2026-02-21T12:31:22.1108637Z 
add.s64 %rd323, %rd580, %rd517; 2026-02-21T12:31:22.1108709Z mad.lo.s64 %rd581, %rd452, 2560, %rd187; 2026-02-21T12:31:22.1108771Z add.s64 %rd324, %rd581, %rd517; 2026-02-21T12:31:22.1108841Z mad.lo.s64 %rd582, %rd453, 2560, %rd187; 2026-02-21T12:31:22.1108903Z add.s64 %rd325, %rd582, %rd517; 2026-02-21T12:31:22.1108974Z mad.lo.s64 %rd583, %rd454, 2560, %rd187; 2026-02-21T12:31:22.1109036Z add.s64 %rd326, %rd583, %rd517; 2026-02-21T12:31:22.1109105Z mad.lo.s64 %rd584, %rd455, 2560, %rd187; 2026-02-21T12:31:22.1109171Z add.s64 %rd327, %rd584, %rd517; 2026-02-21T12:31:22.1109239Z mad.lo.s64 %rd585, %rd456, 2560, %rd187; 2026-02-21T12:31:22.1109393Z add.s64 %rd328, %rd585, %rd517; 2026-02-21T12:31:22.1109464Z mad.lo.s64 %rd586, %rd457, 2560, %rd187; 2026-02-21T12:31:22.1109530Z add.s64 %rd329, %rd586, %rd517; 2026-02-21T12:31:22.1109598Z mad.lo.s64 %rd587, %rd458, 2560, %rd187; 2026-02-21T12:31:22.1109755Z add.s64 %rd330, %rd587, %rd517; 2026-02-21T12:31:22.1109829Z mad.lo.s64 %rd588, %rd459, 2560, %rd187; 2026-02-21T12:31:22.1109892Z add.s64 %rd331, %rd588, %rd517; 2026-02-21T12:31:22.1109972Z mad.lo.s64 %rd589, %rd460, 2560, %rd187; 2026-02-21T12:31:22.1110038Z add.s64 %rd332, %rd589, %rd517; 2026-02-21T12:31:22.1110108Z mad.lo.s64 %rd590, %rd461, 2560, %rd187; 2026-02-21T12:31:22.1110170Z add.s64 %rd333, %rd590, %rd517; 2026-02-21T12:31:22.1110238Z mad.lo.s64 %rd591, %rd462, 2560, %rd187; 2026-02-21T12:31:22.1110302Z add.s64 %rd334, %rd591, %rd517; 2026-02-21T12:31:22.1110371Z mad.lo.s64 %rd592, %rd463, 2560, %rd187; 2026-02-21T12:31:22.1110435Z add.s64 %rd335, %rd592, %rd517; 2026-02-21T12:31:22.1110570Z mad.lo.s64 %rd593, %rd464, 2560, %rd187; 2026-02-21T12:31:22.1110635Z add.s64 %rd336, %rd593, %rd517; 2026-02-21T12:31:22.1110703Z mad.lo.s64 %rd594, %rd465, 2560, %rd187; 2026-02-21T12:31:22.1110765Z add.s64 %rd337, %rd594, %rd517; 2026-02-21T12:31:22.1110839Z mad.lo.s64 %rd595, %rd466, 2560, %rd187; 2026-02-21T12:31:22.1110900Z add.s64 %rd338, %rd595, %rd517; 
2026-02-21T12:31:22.1111028Z mad.lo.s64 %rd596, %rd467, 2560, %rd187; 2026-02-21T12:31:22.1111096Z add.s64 %rd339, %rd596, %rd517; 2026-02-21T12:31:22.1111165Z mad.lo.s64 %rd597, %rd468, 2560, %rd187; 2026-02-21T12:31:22.1111226Z add.s64 %rd340, %rd597, %rd517; 2026-02-21T12:31:22.1111301Z mad.lo.s64 %rd598, %rd469, 2560, %rd187; 2026-02-21T12:31:22.1111371Z add.s64 %rd341, %rd598, %rd517; 2026-02-21T12:31:22.1111440Z mad.lo.s64 %rd599, %rd470, 2560, %rd187; 2026-02-21T12:31:22.1111501Z add.s64 %rd342, %rd599, %rd517; 2026-02-21T12:31:22.1111573Z mad.lo.s64 %rd600, %rd471, 2560, %rd187; 2026-02-21T12:31:22.1111641Z add.s64 %rd343, %rd600, %rd517; 2026-02-21T12:31:22.1111709Z mad.lo.s64 %rd601, %rd472, 2560, %rd187; 2026-02-21T12:31:22.1111773Z add.s64 %rd344, %rd601, %rd517; 2026-02-21T12:31:22.1111841Z mad.lo.s64 %rd602, %rd473, 2560, %rd187; 2026-02-21T12:31:22.1111905Z add.s64 %rd345, %rd602, %rd517; 2026-02-21T12:31:22.1111972Z mad.lo.s64 %rd603, %rd474, 2560, %rd187; 2026-02-21T12:31:22.1112040Z add.s64 %rd346, %rd603, %rd517; 2026-02-21T12:31:22.1112109Z mad.lo.s64 %rd604, %rd475, 2560, %rd187; 2026-02-21T12:31:22.1112172Z add.s64 %rd347, %rd604, %rd517; 2026-02-21T12:31:22.1112244Z mad.lo.s64 %rd605, %rd476, 2560, %rd187; 2026-02-21T12:31:22.1112306Z add.s64 %rd348, %rd605, %rd517; 2026-02-21T12:31:22.1112376Z mad.lo.s64 %rd606, %rd477, 2560, %rd187; 2026-02-21T12:31:22.1112443Z add.s64 %rd349, %rd606, %rd517; 2026-02-21T12:31:22.1112511Z mad.lo.s64 %rd607, %rd478, 2560, %rd187; 2026-02-21T12:31:22.1112574Z add.s64 %rd350, %rd607, %rd517; 2026-02-21T12:31:22.1112646Z mad.lo.s64 %rd608, %rd479, 2560, %rd187; 2026-02-21T12:31:22.1112712Z add.s64 %rd351, %rd608, %rd517; 2026-02-21T12:31:22.1112782Z mad.lo.s64 %rd609, %rd480, 2560, %rd187; 2026-02-21T12:31:22.1112844Z add.s64 %rd352, %rd609, %rd517; 2026-02-21T12:31:22.1112920Z mad.lo.s64 %rd610, %rd481, 2560, %rd187; 2026-02-21T12:31:22.1112982Z add.s64 %rd353, %rd610, %rd517; 2026-02-21T12:31:22.1113050Z 
mad.lo.s64 %rd611, %rd482, 2560, %rd187; 2026-02-21T12:31:22.1113116Z add.s64 %rd354, %rd611, %rd517; 2026-02-21T12:31:22.1113184Z mad.lo.s64 %rd612, %rd483, 2560, %rd187; 2026-02-21T12:31:22.1113247Z add.s64 %rd355, %rd612, %rd517; 2026-02-21T12:31:22.1113316Z mad.lo.s64 %rd613, %rd484, 2560, %rd187; 2026-02-21T12:31:22.1113383Z add.s64 %rd356, %rd613, %rd517; 2026-02-21T12:31:22.1113450Z mad.lo.s64 %rd614, %rd485, 2560, %rd187; 2026-02-21T12:31:22.1113510Z add.s64 %rd357, %rd614, %rd517; 2026-02-21T12:31:22.1113583Z mad.lo.s64 %rd615, %rd486, 2560, %rd187; 2026-02-21T12:31:22.1113709Z add.s64 %rd358, %rd615, %rd517; 2026-02-21T12:31:22.1113779Z mad.lo.s64 %rd616, %rd487, 2560, %rd187; 2026-02-21T12:31:22.1113843Z add.s64 %rd359, %rd616, %rd517; 2026-02-21T12:31:22.1113914Z mad.lo.s64 %rd617, %rd488, 2560, %rd187; 2026-02-21T12:31:22.1114023Z add.s64 %rd360, %rd617, %rd517; 2026-02-21T12:31:22.1114092Z mad.lo.s64 %rd618, %rd489, 2560, %rd187; 2026-02-21T12:31:22.1114157Z add.s64 %rd361, %rd618, %rd517; 2026-02-21T12:31:22.1114226Z mad.lo.s64 %rd619, %rd490, 2560, %rd187; 2026-02-21T12:31:22.1114288Z add.s64 %rd362, %rd619, %rd517; 2026-02-21T12:31:22.1114359Z mad.lo.s64 %rd620, %rd491, 2560, %rd187; 2026-02-21T12:31:22.1114421Z add.s64 %rd363, %rd620, %rd517; 2026-02-21T12:31:22.1114491Z mad.lo.s64 %rd621, %rd492, 2560, %rd187; 2026-02-21T12:31:22.1114553Z add.s64 %rd364, %rd621, %rd517; 2026-02-21T12:31:22.1114624Z mad.lo.s64 %rd622, %rd493, 2560, %rd187; 2026-02-21T12:31:22.1114688Z add.s64 %rd365, %rd622, %rd517; 2026-02-21T12:31:22.1114758Z mad.lo.s64 %rd623, %rd494, 2560, %rd187; 2026-02-21T12:31:22.1114870Z add.s64 %rd366, %rd623, %rd517; 2026-02-21T12:31:22.1114942Z mad.lo.s64 %rd624, %rd495, 2560, %rd187; 2026-02-21T12:31:22.1115005Z add.s64 %rd367, %rd624, %rd517; 2026-02-21T12:31:22.1115075Z mad.lo.s64 %rd625, %rd496, 2560, %rd187; 2026-02-21T12:31:22.1115142Z add.s64 %rd368, %rd625, %rd517; 2026-02-21T12:31:22.1115209Z mad.lo.s64 %rd626, %rd497, 2560, 
%rd187; 2026-02-21T12:31:22.1115317Z add.s64 %rd369, %rd626, %rd517; 2026-02-21T12:31:22.1115394Z mad.lo.s64 %rd627, %rd498, 2560, %rd187; 2026-02-21T12:31:22.1115457Z add.s64 %rd370, %rd627, %rd517; 2026-02-21T12:31:22.1115526Z mad.lo.s64 %rd628, %rd499, 2560, %rd187; 2026-02-21T12:31:22.1115603Z add.s64 %rd371, %rd628, %rd517; 2026-02-21T12:31:22.1115672Z mad.lo.s64 %rd629, %rd500, 2560, %rd187; 2026-02-21T12:31:22.1115734Z add.s64 %rd372, %rd629, %rd517; 2026-02-21T12:31:22.1115805Z mad.lo.s64 %rd630, %rd501, 2560, %rd187; 2026-02-21T12:31:22.1115870Z add.s64 %rd373, %rd630, %rd517; 2026-02-21T12:31:22.1115943Z mad.lo.s64 %rd631, %rd502, 2560, %rd187; 2026-02-21T12:31:22.1116004Z add.s64 %rd374, %rd631, %rd517; 2026-02-21T12:31:22.1116076Z mad.lo.s64 %rd632, %rd503, 2560, %rd187; 2026-02-21T12:31:22.1116136Z add.s64 %rd375, %rd632, %rd517; 2026-02-21T12:31:22.1116209Z mad.lo.s64 %rd633, %rd504, 2560, %rd187; 2026-02-21T12:31:22.1116270Z add.s64 %rd376, %rd633, %rd517; 2026-02-21T12:31:22.1116341Z mad.lo.s64 %rd634, %rd505, 2560, %rd187; 2026-02-21T12:31:22.1116404Z add.s64 %rd377, %rd634, %rd517; 2026-02-21T12:31:22.1116607Z mad.lo.s64 %rd635, %rd506, 2560, %rd187; 2026-02-21T12:31:22.1116677Z add.s64 %rd378, %rd635, %rd517; 2026-02-21T12:31:22.1116747Z mad.lo.s64 %rd636, %rd507, 2560, %rd187; 2026-02-21T12:31:22.1116808Z add.s64 %rd379, %rd636, %rd517; 2026-02-21T12:31:22.1116880Z mad.lo.s64 %rd637, %rd508, 2560, %rd187; 2026-02-21T12:31:22.1116941Z add.s64 %rd380, %rd637, %rd517; 2026-02-21T12:31:22.1117008Z mad.lo.s64 %rd638, %rd509, 2560, %rd187; 2026-02-21T12:31:22.1117075Z add.s64 %rd381, %rd638, %rd517; 2026-02-21T12:31:22.1117147Z mad.lo.s64 %rd639, %rd510, 2560, %rd187; 2026-02-21T12:31:22.1117209Z add.s64 %rd382, %rd639, %rd517; 2026-02-21T12:31:22.1117278Z mad.lo.s64 %rd640, %rd511, 2560, %rd187; 2026-02-21T12:31:22.1117344Z add.s64 %rd383, %rd640, %rd517; 2026-02-21T12:31:22.1117412Z mad.lo.s64 %rd641, %rd512, 2560, %rd187; 
2026-02-21T12:31:22.1117474Z add.s64 %rd384, %rd641, %rd517; 2026-02-21T12:31:22.1117543Z mad.lo.s64 %rd642, %rd513, 2560, %rd187; 2026-02-21T12:31:22.1117611Z add.s64 %rd385, %rd642, %rd517; 2026-02-21T12:31:22.1117677Z mad.lo.s64 %rd643, %rd514, 2560, %rd187; 2026-02-21T12:31:22.1117739Z add.s64 %rd386, %rd643, %rd517; 2026-02-21T12:31:22.1117810Z mad.lo.s64 %rd644, %rd515, 2560, %rd187; 2026-02-21T12:31:22.1117872Z add.s64 %rd387, %rd644, %rd517; 2026-02-21T12:31:22.1118102Z .loc 1 88 81 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:88:81 2026-02-21T12:31:22.1118256Z bar.sync 0; 2026-02-21T12:31:22.1118379Z st.shared.v4.b32 [%r22], {%r8719, %r8721, %r8723, %r8725}; 2026-02-21T12:31:22.1118492Z st.shared.v4.b32 [%r23], {%r8727, %r8729, %r8731, %r8733}; 2026-02-21T12:31:22.1118594Z st.shared.v4.b32 [%r24], {%r8735, %r8737, %r8739, %r8741}; 2026-02-21T12:31:22.1118763Z st.shared.v4.b32 [%r25], {%r8743, %r8745, %r8747, %r8749}; 2026-02-21T12:31:22.1118867Z st.shared.v4.b32 [%r26], {%r8751, %r8753, %r8755, %r8757}; 2026-02-21T12:31:22.1118967Z st.shared.v4.b32 [%r27], {%r8759, %r8761, %r8763, %r8765}; 2026-02-21T12:31:22.1119071Z st.shared.v4.b32 [%r28], {%r8767, %r8769, %r8771, %r8773}; 2026-02-21T12:31:22.1119170Z st.shared.v4.b32 [%r29], {%r8775, %r8777, %r8779, %r8781}; 2026-02-21T12:31:22.1119238Z bar.sync 0; 2026-02-21T12:31:22.1119308Z // begin inline asm 2026-02-21T12:31:22.1119506Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8206, %r8207, %r8208, %r8209}, [%r7570]; 2026-02-21T12:31:22.1119564Z // end inline asm 2026-02-21T12:31:22.1119626Z // begin inline asm 2026-02-21T12:31:22.1119878Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8210, %r8211, %r8212, %r8213}, [%r7575]; 2026-02-21T12:31:22.1119938Z // end inline asm 2026-02-21T12:31:22.1119997Z // begin inline asm 2026-02-21T12:31:22.1120181Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8222, %r8223, %r8224, %r8225}, [%r7580]; 2026-02-21T12:31:22.1120237Z // end inline asm 
2026-02-21T12:31:22.1120295Z // begin inline asm 2026-02-21T12:31:22.1120547Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8226, %r8227, %r8228, %r8229}, [%r7585]; 2026-02-21T12:31:22.1120611Z // end inline asm 2026-02-21T12:31:22.1120669Z // begin inline asm 2026-02-21T12:31:22.1120852Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8238, %r8239, %r8240, %r8241}, [%r7590]; 2026-02-21T12:31:22.1120914Z // end inline asm 2026-02-21T12:31:22.1120972Z // begin inline asm 2026-02-21T12:31:22.1121152Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8242, %r8243, %r8244, %r8245}, [%r7595]; 2026-02-21T12:31:22.1121213Z // end inline asm 2026-02-21T12:31:22.1121273Z // begin inline asm 2026-02-21T12:31:22.1121450Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8254, %r8255, %r8256, %r8257}, [%r7600]; 2026-02-21T12:31:22.1121507Z // end inline asm 2026-02-21T12:31:22.1121570Z // begin inline asm 2026-02-21T12:31:22.1121746Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8258, %r8259, %r8260, %r8261}, [%r7605]; 2026-02-21T12:31:22.1121802Z // end inline asm 2026-02-21T12:31:22.1121861Z bar.sync 0; 2026-02-21T12:31:22.1121968Z st.shared.v4.b32 [%r22], {%r8720, %r8722, %r8724, %r8726}; 2026-02-21T12:31:22.1122070Z st.shared.v4.b32 [%r23], {%r8728, %r8730, %r8732, %r8734}; 2026-02-21T12:31:22.1122172Z st.shared.v4.b32 [%r24], {%r8736, %r8738, %r8740, %r8742}; 2026-02-21T12:31:22.1122276Z st.shared.v4.b32 [%r25], {%r8744, %r8746, %r8748, %r8750}; 2026-02-21T12:31:22.1122378Z st.shared.v4.b32 [%r26], {%r8752, %r8754, %r8756, %r8758}; 2026-02-21T12:31:22.1122478Z st.shared.v4.b32 [%r27], {%r8760, %r8762, %r8764, %r8766}; 2026-02-21T12:31:22.1122585Z st.shared.v4.b32 [%r28], {%r8768, %r8770, %r8772, %r8774}; 2026-02-21T12:31:22.1122685Z st.shared.v4.b32 [%r29], {%r8776, %r8778, %r8780, %r8782}; 2026-02-21T12:31:22.1122741Z bar.sync 0; 2026-02-21T12:31:22.1122812Z // begin inline asm 2026-02-21T12:31:22.1123000Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8214, %r8215, %r8216, %r8217}, [%r7570]; 
2026-02-21T12:31:22.1123058Z // end inline asm 2026-02-21T12:31:22.1123120Z // begin inline asm 2026-02-21T12:31:22.1123299Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8218, %r8219, %r8220, %r8221}, [%r7575]; 2026-02-21T12:31:22.1123355Z // end inline asm 2026-02-21T12:31:22.1123414Z // begin inline asm 2026-02-21T12:31:22.1123594Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8230, %r8231, %r8232, %r8233}, [%r7580]; 2026-02-21T12:31:22.1123652Z // end inline asm 2026-02-21T12:31:22.1123709Z // begin inline asm 2026-02-21T12:31:22.1123890Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8234, %r8235, %r8236, %r8237}, [%r7585]; 2026-02-21T12:31:22.1124007Z // end inline asm 2026-02-21T12:31:22.1124066Z // begin inline asm 2026-02-21T12:31:22.1124247Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8246, %r8247, %r8248, %r8249}, [%r7590]; 2026-02-21T12:31:22.1124307Z // end inline asm 2026-02-21T12:31:22.1124414Z // begin inline asm 2026-02-21T12:31:22.1124590Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8250, %r8251, %r8252, %r8253}, [%r7595]; 2026-02-21T12:31:22.1124655Z // end inline asm 2026-02-21T12:31:22.1124714Z // begin inline asm 2026-02-21T12:31:22.1124890Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8262, %r8263, %r8264, %r8265}, [%r7600]; 2026-02-21T12:31:22.1124948Z // end inline asm 2026-02-21T12:31:22.1125005Z // begin inline asm 2026-02-21T12:31:22.1125180Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8266, %r8267, %r8268, %r8269}, [%r7605]; 2026-02-21T12:31:22.1125235Z // end inline asm 2026-02-21T12:31:22.1125293Z bar.sync 0; 2026-02-21T12:31:22.1125395Z st.shared.v4.b32 [%r22], {%r8783, %r8785, %r8787, %r8789}; 2026-02-21T12:31:22.1125547Z st.shared.v4.b32 [%r23], {%r8791, %r8793, %r8795, %r8797}; 2026-02-21T12:31:22.1125656Z st.shared.v4.b32 [%r24], {%r8799, %r8801, %r8803, %r8805}; 2026-02-21T12:31:22.1125757Z st.shared.v4.b32 [%r25], {%r8807, %r8809, %r8811, %r8813}; 2026-02-21T12:31:22.1125858Z st.shared.v4.b32 [%r26], {%r8815, %r8817, %r8819, %r8821}; 
2026-02-21T12:31:22.1126006Z st.shared.v4.b32 [%r27], {%r8823, %r8825, %r8827, %r8829}; 2026-02-21T12:31:22.1126108Z st.shared.v4.b32 [%r28], {%r8831, %r8833, %r8835, %r8837}; 2026-02-21T12:31:22.1126211Z st.shared.v4.b32 [%r29], {%r8839, %r8841, %r8843, %r8845}; 2026-02-21T12:31:22.1126266Z bar.sync 0; 2026-02-21T12:31:22.1126341Z // begin inline asm 2026-02-21T12:31:22.1126646Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8270, %r8271, %r8272, %r8273}, [%r7570]; 2026-02-21T12:31:22.1126705Z // end inline asm 2026-02-21T12:31:22.1126768Z // begin inline asm 2026-02-21T12:31:22.1126948Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8274, %r8275, %r8276, %r8277}, [%r7575]; 2026-02-21T12:31:22.1127007Z // end inline asm 2026-02-21T12:31:22.1127068Z // begin inline asm 2026-02-21T12:31:22.1127245Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8286, %r8287, %r8288, %r8289}, [%r7580]; 2026-02-21T12:31:22.1127303Z // end inline asm 2026-02-21T12:31:22.1127361Z // begin inline asm 2026-02-21T12:31:22.1127551Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8290, %r8291, %r8292, %r8293}, [%r7585]; 2026-02-21T12:31:22.1127609Z // end inline asm 2026-02-21T12:31:22.1127666Z // begin inline asm 2026-02-21T12:31:22.1127849Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8302, %r8303, %r8304, %r8305}, [%r7590]; 2026-02-21T12:31:22.1127907Z // end inline asm 2026-02-21T12:31:22.1127963Z // begin inline asm 2026-02-21T12:31:22.1128142Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8306, %r8307, %r8308, %r8309}, [%r7595]; 2026-02-21T12:31:22.1128199Z // end inline asm 2026-02-21T12:31:22.1128258Z // begin inline asm 2026-02-21T12:31:22.1128437Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8318, %r8319, %r8320, %r8321}, [%r7600]; 2026-02-21T12:31:22.1128496Z // end inline asm 2026-02-21T12:31:22.1128556Z // begin inline asm 2026-02-21T12:31:22.1128731Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8322, %r8323, %r8324, %r8325}, [%r7605]; 2026-02-21T12:31:22.1128791Z // end inline asm 
2026-02-21T12:31:22.1128848Z bar.sync 0; 2026-02-21T12:31:22.1128949Z st.shared.v4.b32 [%r22], {%r8784, %r8786, %r8788, %r8790}; 2026-02-21T12:31:22.1129050Z st.shared.v4.b32 [%r23], {%r8792, %r8794, %r8796, %r8798}; 2026-02-21T12:31:22.1129155Z st.shared.v4.b32 [%r24], {%r8800, %r8802, %r8804, %r8806}; 2026-02-21T12:31:22.1129253Z st.shared.v4.b32 [%r25], {%r8808, %r8810, %r8812, %r8814}; 2026-02-21T12:31:22.1129352Z st.shared.v4.b32 [%r26], {%r8816, %r8818, %r8820, %r8822}; 2026-02-21T12:31:22.1129456Z st.shared.v4.b32 [%r27], {%r8824, %r8826, %r8828, %r8830}; 2026-02-21T12:31:22.1129558Z st.shared.v4.b32 [%r28], {%r8832, %r8834, %r8836, %r8838}; 2026-02-21T12:31:22.1129772Z st.shared.v4.b32 [%r29], {%r8840, %r8842, %r8844, %r8846}; 2026-02-21T12:31:22.1129830Z bar.sync 0; 2026-02-21T12:31:22.1129888Z // begin inline asm 2026-02-21T12:31:22.1130067Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8278, %r8279, %r8280, %r8281}, [%r7570]; 2026-02-21T12:31:22.1130185Z // end inline asm 2026-02-21T12:31:22.1130250Z // begin inline asm 2026-02-21T12:31:22.1130428Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8282, %r8283, %r8284, %r8285}, [%r7575]; 2026-02-21T12:31:22.1130483Z // end inline asm 2026-02-21T12:31:22.1130544Z // begin inline asm 2026-02-21T12:31:22.1130722Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8294, %r8295, %r8296, %r8297}, [%r7580]; 2026-02-21T12:31:22.1130778Z // end inline asm 2026-02-21T12:31:22.1130838Z // begin inline asm 2026-02-21T12:31:22.1131016Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8298, %r8299, %r8300, %r8301}, [%r7585]; 2026-02-21T12:31:22.1131072Z // end inline asm 2026-02-21T12:31:22.1131130Z // begin inline asm 2026-02-21T12:31:22.1131377Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8310, %r8311, %r8312, %r8313}, [%r7590]; 2026-02-21T12:31:22.1131439Z // end inline asm 2026-02-21T12:31:22.1131497Z // begin inline asm 2026-02-21T12:31:22.1131679Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8314, %r8315, %r8316, %r8317}, [%r7595]; 
2026-02-21T12:31:22.1131738Z // end inline asm 2026-02-21T12:31:22.1131797Z // begin inline asm 2026-02-21T12:31:22.1132047Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8326, %r8327, %r8328, %r8329}, [%r7600]; 2026-02-21T12:31:22.1132108Z // end inline asm 2026-02-21T12:31:22.1132178Z // begin inline asm 2026-02-21T12:31:22.1132366Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8330, %r8331, %r8332, %r8333}, [%r7605]; 2026-02-21T12:31:22.1132427Z // end inline asm 2026-02-21T12:31:22.1132482Z bar.sync 0; 2026-02-21T12:31:22.1132592Z st.shared.v4.b32 [%r22], {%r8847, %r8849, %r8851, %r8853}; 2026-02-21T12:31:22.1132699Z st.shared.v4.b32 [%r23], {%r8855, %r8857, %r8859, %r8861}; 2026-02-21T12:31:22.1132805Z st.shared.v4.b32 [%r24], {%r8863, %r8865, %r8867, %r8869}; 2026-02-21T12:31:22.1132909Z st.shared.v4.b32 [%r25], {%r8871, %r8873, %r8875, %r8877}; 2026-02-21T12:31:22.1133009Z st.shared.v4.b32 [%r26], {%r8879, %r8881, %r8883, %r8885}; 2026-02-21T12:31:22.1133118Z st.shared.v4.b32 [%r27], {%r8887, %r8889, %r8891, %r8893}; 2026-02-21T12:31:22.1133219Z st.shared.v4.b32 [%r28], {%r8895, %r8897, %r8899, %r8901}; 2026-02-21T12:31:22.1133320Z st.shared.v4.b32 [%r29], {%r8903, %r8905, %r8907, %r8909}; 2026-02-21T12:31:22.1133380Z bar.sync 0; 2026-02-21T12:31:22.1133441Z // begin inline asm 2026-02-21T12:31:22.1133622Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8334, %r8335, %r8336, %r8337}, [%r7570]; 2026-02-21T12:31:22.1133680Z // end inline asm 2026-02-21T12:31:22.1133739Z // begin inline asm 2026-02-21T12:31:22.1133918Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8338, %r8339, %r8340, %r8341}, [%r7575]; 2026-02-21T12:31:22.1133975Z // end inline asm 2026-02-21T12:31:22.1134041Z // begin inline asm 2026-02-21T12:31:22.1134219Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8350, %r8351, %r8352, %r8353}, [%r7580]; 2026-02-21T12:31:22.1134277Z // end inline asm 2026-02-21T12:31:22.1134347Z // begin inline asm 2026-02-21T12:31:22.1134527Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8354, %r8355, %r8356, %r8357}, [%r7585]; 2026-02-21T12:31:22.1134586Z // end inline asm 2026-02-21T12:31:22.1134652Z // begin inline asm 2026-02-21T12:31:22.1134831Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8366, %r8367, %r8368, %r8369}, [%r7590]; 2026-02-21T12:31:22.1134888Z // end inline asm 2026-02-21T12:31:22.1134950Z // begin inline asm 2026-02-21T12:31:22.1135132Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8370, %r8371, %r8372, %r8373}, [%r7595]; 2026-02-21T12:31:22.1135188Z // end inline asm 2026-02-21T12:31:22.1135246Z // begin inline asm 2026-02-21T12:31:22.1135427Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8382, %r8383, %r8384, %r8385}, [%r7600]; 2026-02-21T12:31:22.1135556Z // end inline asm 2026-02-21T12:31:22.1135619Z // begin inline asm 2026-02-21T12:31:22.1135806Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8386, %r8387, %r8388, %r8389}, [%r7605]; 2026-02-21T12:31:22.1135862Z // end inline asm 2026-02-21T12:31:22.1135972Z bar.sync 0; 2026-02-21T12:31:22.1136082Z st.shared.v4.b32 [%r22], {%r8848, %r8850, %r8852, %r8854}; 2026-02-21T12:31:22.1136199Z st.shared.v4.b32 [%r23], {%r8856, %r8858, %r8860, %r8862}; 2026-02-21T12:31:22.1136307Z st.shared.v4.b32 [%r24], {%r8864, %r8866, %r8868, %r8870}; 2026-02-21T12:31:22.1136412Z st.shared.v4.b32 [%r25], {%r8872, %r8874, %r8876, %r8878}; 2026-02-21T12:31:22.1136648Z st.shared.v4.b32 [%r26], {%r8880, %r8882, %r8884, %r8886}; 2026-02-21T12:31:22.1136756Z st.shared.v4.b32 [%r27], {%r8888, %r8890, %r8892, %r8894}; 2026-02-21T12:31:22.1136857Z st.shared.v4.b32 [%r28], {%r8896, %r8898, %r8900, %r8902}; 2026-02-21T12:31:22.1136963Z st.shared.v4.b32 [%r29], {%r8904, %r8906, %r8908, %r8910}; 2026-02-21T12:31:22.1137022Z bar.sync 0; 2026-02-21T12:31:22.1137163Z // begin inline asm 2026-02-21T12:31:22.1137359Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8342, %r8343, %r8344, %r8345}, [%r7570]; 2026-02-21T12:31:22.1137426Z // end inline asm 2026-02-21T12:31:22.1137490Z // 
begin inline asm 2026-02-21T12:31:22.1137672Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8346, %r8347, %r8348, %r8349}, [%r7575]; 2026-02-21T12:31:22.1137734Z // end inline asm 2026-02-21T12:31:22.1137855Z // begin inline asm 2026-02-21T12:31:22.1138035Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8358, %r8359, %r8360, %r8361}, [%r7580]; 2026-02-21T12:31:22.1138092Z // end inline asm 2026-02-21T12:31:22.1138155Z // begin inline asm 2026-02-21T12:31:22.1138335Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8362, %r8363, %r8364, %r8365}, [%r7585]; 2026-02-21T12:31:22.1138391Z // end inline asm 2026-02-21T12:31:22.1138455Z // begin inline asm 2026-02-21T12:31:22.1138634Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8374, %r8375, %r8376, %r8377}, [%r7590]; 2026-02-21T12:31:22.1138696Z // end inline asm 2026-02-21T12:31:22.1138760Z // begin inline asm 2026-02-21T12:31:22.1138954Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8378, %r8379, %r8380, %r8381}, [%r7595]; 2026-02-21T12:31:22.1139016Z // end inline asm 2026-02-21T12:31:22.1139075Z // begin inline asm 2026-02-21T12:31:22.1139257Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8390, %r8391, %r8392, %r8393}, [%r7600]; 2026-02-21T12:31:22.1139317Z // end inline asm 2026-02-21T12:31:22.1139376Z // begin inline asm 2026-02-21T12:31:22.1139558Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8394, %r8395, %r8396, %r8397}, [%r7605]; 2026-02-21T12:31:22.1139615Z // end inline asm 2026-02-21T12:31:22.1139671Z bar.sync 0; 2026-02-21T12:31:22.1139779Z st.shared.v4.b32 [%r22], {%r8911, %r8913, %r8915, %r8917}; 2026-02-21T12:31:22.1139885Z st.shared.v4.b32 [%r23], {%r8919, %r8921, %r8923, %r8925}; 2026-02-21T12:31:22.1139988Z st.shared.v4.b32 [%r24], {%r8927, %r8929, %r8931, %r8933}; 2026-02-21T12:31:22.1140092Z st.shared.v4.b32 [%r25], {%r8935, %r8937, %r8939, %r8941}; 2026-02-21T12:31:22.1140199Z st.shared.v4.b32 [%r26], {%r8943, %r8945, %r8947, %r8949}; 2026-02-21T12:31:22.1140305Z st.shared.v4.b32 [%r27], {%r8951, %r8953, %r8955, 
%r8957}; 2026-02-21T12:31:22.1140413Z st.shared.v4.b32 [%r28], {%r8959, %r8961, %r8963, %r8965}; 2026-02-21T12:31:22.1140519Z st.shared.v4.b32 [%r29], {%r8967, %r8969, %r8971, %r8973}; 2026-02-21T12:31:22.1140578Z bar.sync 0; 2026-02-21T12:31:22.1140638Z // begin inline asm 2026-02-21T12:31:22.1140824Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8398, %r8399, %r8400, %r8401}, [%r7570]; 2026-02-21T12:31:22.1140886Z // end inline asm 2026-02-21T12:31:22.1140945Z // begin inline asm 2026-02-21T12:31:22.1141126Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8402, %r8403, %r8404, %r8405}, [%r7575]; 2026-02-21T12:31:22.1141188Z // end inline asm 2026-02-21T12:31:22.1141247Z // begin inline asm 2026-02-21T12:31:22.1141427Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8414, %r8415, %r8416, %r8417}, [%r7580]; 2026-02-21T12:31:22.1141576Z // end inline asm 2026-02-21T12:31:22.1141640Z // begin inline asm 2026-02-21T12:31:22.1141820Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8418, %r8419, %r8420, %r8421}, [%r7585]; 2026-02-21T12:31:22.1141942Z // end inline asm 2026-02-21T12:31:22.1142007Z // begin inline asm 2026-02-21T12:31:22.1142186Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8430, %r8431, %r8432, %r8433}, [%r7590]; 2026-02-21T12:31:22.1142243Z // end inline asm 2026-02-21T12:31:22.1142305Z // begin inline asm 2026-02-21T12:31:22.1142484Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8434, %r8435, %r8436, %r8437}, [%r7595]; 2026-02-21T12:31:22.1142541Z // end inline asm 2026-02-21T12:31:22.1142604Z // begin inline asm 2026-02-21T12:31:22.1142781Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8446, %r8447, %r8448, %r8449}, [%r7600]; 2026-02-21T12:31:22.1142838Z // end inline asm 2026-02-21T12:31:22.1142897Z // begin inline asm 2026-02-21T12:31:22.1143129Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8450, %r8451, %r8452, %r8453}, [%r7605]; 2026-02-21T12:31:22.1143190Z // end inline asm 2026-02-21T12:31:22.1143247Z bar.sync 0; 2026-02-21T12:31:22.1143358Z st.shared.v4.b32 
[%r22], {%r8912, %r8914, %r8916, %r8918}; 2026-02-21T12:31:22.1143463Z st.shared.v4.b32 [%r23], {%r8920, %r8922, %r8924, %r8926}; 2026-02-21T12:31:22.1143617Z st.shared.v4.b32 [%r24], {%r8928, %r8930, %r8932, %r8934}; 2026-02-21T12:31:22.1143720Z st.shared.v4.b32 [%r25], {%r8936, %r8938, %r8940, %r8942}; 2026-02-21T12:31:22.1143828Z st.shared.v4.b32 [%r26], {%r8944, %r8946, %r8948, %r8950}; 2026-02-21T12:31:22.1143931Z st.shared.v4.b32 [%r27], {%r8952, %r8954, %r8956, %r8958}; 2026-02-21T12:31:22.1144031Z st.shared.v4.b32 [%r28], {%r8960, %r8962, %r8964, %r8966}; 2026-02-21T12:31:22.1144138Z st.shared.v4.b32 [%r29], {%r8968, %r8970, %r8972, %r8974}; 2026-02-21T12:31:22.1144193Z bar.sync 0; 2026-02-21T12:31:22.1144252Z // begin inline asm 2026-02-21T12:31:22.1144442Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8406, %r8407, %r8408, %r8409}, [%r7570]; 2026-02-21T12:31:22.1144499Z // end inline asm 2026-02-21T12:31:22.1144557Z // begin inline asm 2026-02-21T12:31:22.1144734Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8410, %r8411, %r8412, %r8413}, [%r7575]; 2026-02-21T12:31:22.1144797Z // end inline asm 2026-02-21T12:31:22.1144870Z // begin inline asm 2026-02-21T12:31:22.1145054Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8422, %r8423, %r8424, %r8425}, [%r7580]; 2026-02-21T12:31:22.1145116Z // end inline asm 2026-02-21T12:31:22.1145177Z // begin inline asm 2026-02-21T12:31:22.1145353Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8426, %r8427, %r8428, %r8429}, [%r7585]; 2026-02-21T12:31:22.1145416Z // end inline asm 2026-02-21T12:31:22.1145475Z // begin inline asm 2026-02-21T12:31:22.1145652Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8438, %r8439, %r8440, %r8441}, [%r7590]; 2026-02-21T12:31:22.1145710Z // end inline asm 2026-02-21T12:31:22.1145776Z // begin inline asm 2026-02-21T12:31:22.1145957Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8442, %r8443, %r8444, %r8445}, [%r7595]; 2026-02-21T12:31:22.1146014Z // end inline asm 2026-02-21T12:31:22.1146077Z // begin 
inline asm 2026-02-21T12:31:22.1146255Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8454, %r8455, %r8456, %r8457}, [%r7600]; 2026-02-21T12:31:22.1146313Z // end inline asm 2026-02-21T12:31:22.1146379Z // begin inline asm 2026-02-21T12:31:22.1146704Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8458, %r8459, %r8460, %r8461}, [%r7605]; 2026-02-21T12:31:22.1146766Z // end inline asm 2026-02-21T12:31:22.1146822Z bar.sync 0; 2026-02-21T12:31:22.1146934Z st.shared.v4.b32 [%r22], {%r8975, %r8977, %r8979, %r8981}; 2026-02-21T12:31:22.1147037Z st.shared.v4.b32 [%r23], {%r8983, %r8985, %r8987, %r8989}; 2026-02-21T12:31:22.1147139Z st.shared.v4.b32 [%r24], {%r8991, %r8993, %r8995, %r8997}; 2026-02-21T12:31:22.1147256Z st.shared.v4.b32 [%r25], {%r8999, %r9001, %r9003, %r9005}; 2026-02-21T12:31:22.1147455Z st.shared.v4.b32 [%r26], {%r9007, %r9009, %r9011, %r9013}; 2026-02-21T12:31:22.1147563Z st.shared.v4.b32 [%r27], {%r9015, %r9017, %r9019, %r9021}; 2026-02-21T12:31:22.1147668Z st.shared.v4.b32 [%r28], {%r9023, %r9025, %r9027, %r9029}; 2026-02-21T12:31:22.1147832Z st.shared.v4.b32 [%r29], {%r9031, %r9033, %r9035, %r9037}; 2026-02-21T12:31:22.1147888Z bar.sync 0; 2026-02-21T12:31:22.1147948Z // begin inline asm 2026-02-21T12:31:22.1148133Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8462, %r8463, %r8464, %r8465}, [%r7570]; 2026-02-21T12:31:22.1148201Z // end inline asm 2026-02-21T12:31:22.1148353Z // begin inline asm 2026-02-21T12:31:22.1148541Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8466, %r8467, %r8468, %r8469}, [%r7575]; 2026-02-21T12:31:22.1148602Z // end inline asm 2026-02-21T12:31:22.1148660Z // begin inline asm 2026-02-21T12:31:22.1148840Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8478, %r8479, %r8480, %r8481}, [%r7580]; 2026-02-21T12:31:22.1148900Z // end inline asm 2026-02-21T12:31:22.1149030Z // begin inline asm 2026-02-21T12:31:22.1149209Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8482, %r8483, %r8484, %r8485}, [%r7585]; 2026-02-21T12:31:22.1149269Z // end 
inline asm 2026-02-21T12:31:22.1149329Z // begin inline asm 2026-02-21T12:31:22.1149507Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8494, %r8495, %r8496, %r8497}, [%r7590]; 2026-02-21T12:31:22.1149568Z // end inline asm 2026-02-21T12:31:22.1149704Z // begin inline asm 2026-02-21T12:31:22.1149884Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8498, %r8499, %r8500, %r8501}, [%r7595]; 2026-02-21T12:31:22.1149941Z // end inline asm 2026-02-21T12:31:22.1150004Z // begin inline asm 2026-02-21T12:31:22.1150180Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8510, %r8511, %r8512, %r8513}, [%r7600]; 2026-02-21T12:31:22.1150237Z // end inline asm 2026-02-21T12:31:22.1150299Z // begin inline asm 2026-02-21T12:31:22.1150476Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8514, %r8515, %r8516, %r8517}, [%r7605]; 2026-02-21T12:31:22.1150539Z // end inline asm 2026-02-21T12:31:22.1150596Z bar.sync 0; 2026-02-21T12:31:22.1150703Z st.shared.v4.b32 [%r22], {%r8976, %r8978, %r8980, %r8982}; 2026-02-21T12:31:22.1150805Z st.shared.v4.b32 [%r23], {%r8984, %r8986, %r8988, %r8990}; 2026-02-21T12:31:22.1150909Z st.shared.v4.b32 [%r24], {%r8992, %r8994, %r8996, %r8998}; 2026-02-21T12:31:22.1151014Z st.shared.v4.b32 [%r25], {%r9000, %r9002, %r9004, %r9006}; 2026-02-21T12:31:22.1151116Z st.shared.v4.b32 [%r26], {%r9008, %r9010, %r9012, %r9014}; 2026-02-21T12:31:22.1151219Z st.shared.v4.b32 [%r27], {%r9016, %r9018, %r9020, %r9022}; 2026-02-21T12:31:22.1151324Z st.shared.v4.b32 [%r28], {%r9024, %r9026, %r9028, %r9030}; 2026-02-21T12:31:22.1151425Z st.shared.v4.b32 [%r29], {%r9032, %r9034, %r9036, %r9038}; 2026-02-21T12:31:22.1151494Z bar.sync 0; 2026-02-21T12:31:22.1151561Z // begin inline asm 2026-02-21T12:31:22.1151744Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8470, %r8471, %r8472, %r8473}, [%r7570]; 2026-02-21T12:31:22.1151809Z // end inline asm 2026-02-21T12:31:22.1151868Z // begin inline asm 2026-02-21T12:31:22.1152050Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8474, %r8475, %r8476, %r8477}, 
[%r7575]; 2026-02-21T12:31:22.1152107Z // end inline asm 2026-02-21T12:31:22.1152168Z // begin inline asm 2026-02-21T12:31:22.1152349Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8486, %r8487, %r8488, %r8489}, [%r7580]; 2026-02-21T12:31:22.1152407Z // end inline asm 2026-02-21T12:31:22.1152466Z // begin inline asm 2026-02-21T12:31:22.1152642Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8490, %r8491, %r8492, %r8493}, [%r7585]; 2026-02-21T12:31:22.1152704Z // end inline asm 2026-02-21T12:31:22.1152764Z // begin inline asm 2026-02-21T12:31:22.1152942Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8502, %r8503, %r8504, %r8505}, [%r7590]; 2026-02-21T12:31:22.1153003Z // end inline asm 2026-02-21T12:31:22.1153061Z // begin inline asm 2026-02-21T12:31:22.1153238Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8506, %r8507, %r8508, %r8509}, [%r7595]; 2026-02-21T12:31:22.1153363Z // end inline asm 2026-02-21T12:31:22.1153424Z // begin inline asm 2026-02-21T12:31:22.1153599Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8518, %r8519, %r8520, %r8521}, [%r7600]; 2026-02-21T12:31:22.1153702Z // end inline asm 2026-02-21T12:31:22.1153767Z // begin inline asm 2026-02-21T12:31:22.1153945Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8522, %r8523, %r8524, %r8525}, [%r7605]; 2026-02-21T12:31:22.1154003Z // end inline asm 2026-02-21T12:31:22.1154062Z bar.sync 0; 2026-02-21T12:31:22.1154167Z st.shared.v4.b32 [%r22], {%r9039, %r9041, %r9043, %r9045}; 2026-02-21T12:31:22.1154270Z st.shared.v4.b32 [%r23], {%r9047, %r9049, %r9051, %r9053}; 2026-02-21T12:31:22.1154378Z st.shared.v4.b32 [%r24], {%r9055, %r9057, %r9059, %r9061}; 2026-02-21T12:31:22.1154479Z st.shared.v4.b32 [%r25], {%r9063, %r9065, %r9067, %r9069}; 2026-02-21T12:31:22.1154580Z st.shared.v4.b32 [%r26], {%r9071, %r9073, %r9075, %r9077}; 2026-02-21T12:31:22.1154733Z st.shared.v4.b32 [%r27], {%r9079, %r9081, %r9083, %r9085}; 2026-02-21T12:31:22.1154842Z st.shared.v4.b32 [%r28], {%r9087, %r9089, %r9091, %r9093}; 
2026-02-21T12:31:22.1154945Z st.shared.v4.b32 [%r29], {%r9095, %r9097, %r9099, %r9101}; 2026-02-21T12:31:22.1155003Z bar.sync 0; 2026-02-21T12:31:22.1155067Z // begin inline asm 2026-02-21T12:31:22.1155247Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8526, %r8527, %r8528, %r8529}, [%r7570]; 2026-02-21T12:31:22.1155350Z // end inline asm 2026-02-21T12:31:22.1155411Z // begin inline asm 2026-02-21T12:31:22.1155598Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8530, %r8531, %r8532, %r8533}, [%r7575]; 2026-02-21T12:31:22.1155654Z // end inline asm 2026-02-21T12:31:22.1155713Z // begin inline asm 2026-02-21T12:31:22.1155902Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8542, %r8543, %r8544, %r8545}, [%r7580]; 2026-02-21T12:31:22.1155967Z // end inline asm 2026-02-21T12:31:22.1156028Z // begin inline asm 2026-02-21T12:31:22.1156216Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8546, %r8547, %r8548, %r8549}, [%r7585]; 2026-02-21T12:31:22.1156278Z // end inline asm 2026-02-21T12:31:22.1156338Z // begin inline asm 2026-02-21T12:31:22.1156648Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8558, %r8559, %r8560, %r8561}, [%r7590]; 2026-02-21T12:31:22.1156715Z // end inline asm 2026-02-21T12:31:22.1156776Z // begin inline asm 2026-02-21T12:31:22.1156955Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8562, %r8563, %r8564, %r8565}, [%r7595]; 2026-02-21T12:31:22.1157018Z // end inline asm 2026-02-21T12:31:22.1157077Z // begin inline asm 2026-02-21T12:31:22.1157255Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8574, %r8575, %r8576, %r8577}, [%r7600]; 2026-02-21T12:31:22.1157319Z // end inline asm 2026-02-21T12:31:22.1157380Z // begin inline asm 2026-02-21T12:31:22.1157557Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8578, %r8579, %r8580, %r8581}, [%r7605]; 2026-02-21T12:31:22.1157614Z // end inline asm 2026-02-21T12:31:22.1157672Z bar.sync 0; 2026-02-21T12:31:22.1157779Z st.shared.v4.b32 [%r22], {%r9040, %r9042, %r9044, %r9046}; 2026-02-21T12:31:22.1157894Z st.shared.v4.b32 [%r23], 
{%r9048, %r9050, %r9052, %r9054}; 2026-02-21T12:31:22.1158001Z st.shared.v4.b32 [%r24], {%r9056, %r9058, %r9060, %r9062}; 2026-02-21T12:31:22.1158106Z st.shared.v4.b32 [%r25], {%r9064, %r9066, %r9068, %r9070}; 2026-02-21T12:31:22.1158207Z st.shared.v4.b32 [%r26], {%r9072, %r9074, %r9076, %r9078}; 2026-02-21T12:31:22.1158312Z st.shared.v4.b32 [%r27], {%r9080, %r9082, %r9084, %r9086}; 2026-02-21T12:31:22.1158413Z st.shared.v4.b32 [%r28], {%r9088, %r9090, %r9092, %r9094}; 2026-02-21T12:31:22.1158514Z st.shared.v4.b32 [%r29], {%r9096, %r9098, %r9100, %r9102}; 2026-02-21T12:31:22.1158571Z bar.sync 0; 2026-02-21T12:31:22.1158635Z // begin inline asm 2026-02-21T12:31:22.1158815Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8534, %r8535, %r8536, %r8537}, [%r7570]; 2026-02-21T12:31:22.1158873Z // end inline asm 2026-02-21T12:31:22.1158935Z // begin inline asm 2026-02-21T12:31:22.1159214Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8538, %r8539, %r8540, %r8541}, [%r7575]; 2026-02-21T12:31:22.1159275Z // end inline asm 2026-02-21T12:31:22.1159335Z // begin inline asm 2026-02-21T12:31:22.1159520Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8550, %r8551, %r8552, %r8553}, [%r7580]; 2026-02-21T12:31:22.1159641Z // end inline asm 2026-02-21T12:31:22.1159702Z // begin inline asm 2026-02-21T12:31:22.1159888Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8554, %r8555, %r8556, %r8557}, [%r7585]; 2026-02-21T12:31:22.1159945Z // end inline asm 2026-02-21T12:31:22.1160004Z // begin inline asm 2026-02-21T12:31:22.1160188Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8566, %r8567, %r8568, %r8569}, [%r7590]; 2026-02-21T12:31:22.1160252Z // end inline asm 2026-02-21T12:31:22.1160312Z // begin inline asm 2026-02-21T12:31:22.1160491Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8570, %r8571, %r8572, %r8573}, [%r7595]; 2026-02-21T12:31:22.1160552Z // end inline asm 2026-02-21T12:31:22.1160612Z // begin inline asm 2026-02-21T12:31:22.1160853Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8582, %r8583, 
%r8584, %r8585}, [%r7600]; 2026-02-21T12:31:22.1160916Z // end inline asm 2026-02-21T12:31:22.1160975Z // begin inline asm 2026-02-21T12:31:22.1161154Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8586, %r8587, %r8588, %r8589}, [%r7605]; 2026-02-21T12:31:22.1161213Z // end inline asm 2026-02-21T12:31:22.1161271Z bar.sync 0; 2026-02-21T12:31:22.1161435Z st.shared.v4.b32 [%r22], {%r9103, %r9105, %r9107, %r9109}; 2026-02-21T12:31:22.1161542Z st.shared.v4.b32 [%r23], {%r9111, %r9113, %r9115, %r9117}; 2026-02-21T12:31:22.1161649Z st.shared.v4.b32 [%r24], {%r9119, %r9121, %r9123, %r9125}; 2026-02-21T12:31:22.1161751Z st.shared.v4.b32 [%r25], {%r9127, %r9129, %r9131, %r9133}; 2026-02-21T12:31:22.1161851Z st.shared.v4.b32 [%r26], {%r9135, %r9137, %r9139, %r9141}; 2026-02-21T12:31:22.1161967Z st.shared.v4.b32 [%r27], {%r9143, %r9145, %r9147, %r9149}; 2026-02-21T12:31:22.1162074Z st.shared.v4.b32 [%r28], {%r9151, %r9153, %r9155, %r9157}; 2026-02-21T12:31:22.1162176Z st.shared.v4.b32 [%r29], {%r9159, %r9161, %r9163, %r9165}; 2026-02-21T12:31:22.1162236Z bar.sync 0; 2026-02-21T12:31:22.1162297Z // begin inline asm 2026-02-21T12:31:22.1162481Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8590, %r8591, %r8592, %r8593}, [%r7570]; 2026-02-21T12:31:22.1162538Z // end inline asm 2026-02-21T12:31:22.1162601Z // begin inline asm 2026-02-21T12:31:22.1162782Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8594, %r8595, %r8596, %r8597}, [%r7575]; 2026-02-21T12:31:22.1162840Z // end inline asm 2026-02-21T12:31:22.1162904Z // begin inline asm 2026-02-21T12:31:22.1163084Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8606, %r8607, %r8608, %r8609}, [%r7580]; 2026-02-21T12:31:22.1163141Z // end inline asm 2026-02-21T12:31:22.1163201Z // begin inline asm 2026-02-21T12:31:22.1163388Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8610, %r8611, %r8612, %r8613}, [%r7585]; 2026-02-21T12:31:22.1163446Z // end inline asm 2026-02-21T12:31:22.1163508Z // begin inline asm 2026-02-21T12:31:22.1163692Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8622, %r8623, %r8624, %r8625}, [%r7590]; 2026-02-21T12:31:22.1163749Z // end inline asm 2026-02-21T12:31:22.1163813Z // begin inline asm 2026-02-21T12:31:22.1163996Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8626, %r8627, %r8628, %r8629}, [%r7595]; 2026-02-21T12:31:22.1164053Z // end inline asm 2026-02-21T12:31:22.1164114Z // begin inline asm 2026-02-21T12:31:22.1164292Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8638, %r8639, %r8640, %r8641}, [%r7600]; 2026-02-21T12:31:22.1164354Z // end inline asm 2026-02-21T12:31:22.1164413Z // begin inline asm 2026-02-21T12:31:22.1164589Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8642, %r8643, %r8644, %r8645}, [%r7605]; 2026-02-21T12:31:22.1164652Z // end inline asm 2026-02-21T12:31:22.1164707Z bar.sync 0; 2026-02-21T12:31:22.1164809Z st.shared.v4.b32 [%r22], {%r9104, %r9106, %r9108, %r9110}; 2026-02-21T12:31:22.1164978Z st.shared.v4.b32 [%r23], {%r9112, %r9114, %r9116, %r9118}; 2026-02-21T12:31:22.1165080Z st.shared.v4.b32 [%r24], {%r9120, %r9122, %r9124, %r9126}; 2026-02-21T12:31:22.1165182Z st.shared.v4.b32 [%r25], {%r9128, %r9130, %r9132, %r9134}; 2026-02-21T12:31:22.1165331Z st.shared.v4.b32 [%r26], {%r9136, %r9138, %r9140, %r9142}; 2026-02-21T12:31:22.1165446Z st.shared.v4.b32 [%r27], {%r9144, %r9146, %r9148, %r9150}; 2026-02-21T12:31:22.1165555Z st.shared.v4.b32 [%r28], {%r9152, %r9154, %r9156, %r9158}; 2026-02-21T12:31:22.1165658Z st.shared.v4.b32 [%r29], {%r9160, %r9162, %r9164, %r9166}; 2026-02-21T12:31:22.1165719Z bar.sync 0; 2026-02-21T12:31:22.1165780Z // begin inline asm 2026-02-21T12:31:22.1165962Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8598, %r8599, %r8600, %r8601}, [%r7570]; 2026-02-21T12:31:22.1166023Z // end inline asm 2026-02-21T12:31:22.1166083Z // begin inline asm 2026-02-21T12:31:22.1166260Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8602, %r8603, %r8604, %r8605}, [%r7575]; 2026-02-21T12:31:22.1166371Z // end inline asm 2026-02-21T12:31:22.1166438Z // 
begin inline asm 2026-02-21T12:31:22.1166728Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8614, %r8615, %r8616, %r8617}, [%r7580]; 2026-02-21T12:31:22.1166788Z // end inline asm 2026-02-21T12:31:22.1166858Z // begin inline asm 2026-02-21T12:31:22.1167044Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8618, %r8619, %r8620, %r8621}, [%r7585]; 2026-02-21T12:31:22.1167182Z // end inline asm 2026-02-21T12:31:22.1167248Z // begin inline asm 2026-02-21T12:31:22.1167443Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8630, %r8631, %r8632, %r8633}, [%r7590]; 2026-02-21T12:31:22.1167505Z // end inline asm 2026-02-21T12:31:22.1167566Z // begin inline asm 2026-02-21T12:31:22.1167749Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8634, %r8635, %r8636, %r8637}, [%r7595]; 2026-02-21T12:31:22.1167806Z // end inline asm 2026-02-21T12:31:22.1167866Z // begin inline asm 2026-02-21T12:31:22.1168050Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8646, %r8647, %r8648, %r8649}, [%r7600]; 2026-02-21T12:31:22.1168112Z // end inline asm 2026-02-21T12:31:22.1168173Z // begin inline asm 2026-02-21T12:31:22.1168353Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8650, %r8651, %r8652, %r8653}, [%r7605]; 2026-02-21T12:31:22.1168418Z // end inline asm 2026-02-21T12:31:22.1168475Z bar.sync 0; 2026-02-21T12:31:22.1168578Z st.shared.v4.b32 [%r22], {%r9167, %r9169, %r9171, %r9173}; 2026-02-21T12:31:22.1168684Z st.shared.v4.b32 [%r23], {%r9175, %r9177, %r9179, %r9181}; 2026-02-21T12:31:22.1168786Z st.shared.v4.b32 [%r24], {%r9183, %r9185, %r9187, %r9189}; 2026-02-21T12:31:22.1168887Z st.shared.v4.b32 [%r25], {%r9191, %r9193, %r9195, %r9197}; 2026-02-21T12:31:22.1168993Z st.shared.v4.b32 [%r26], {%r9199, %r9201, %r9203, %r9205}; 2026-02-21T12:31:22.1169094Z st.shared.v4.b32 [%r27], {%r9207, %r9209, %r9211, %r9213}; 2026-02-21T12:31:22.1169192Z st.shared.v4.b32 [%r28], {%r9215, %r9217, %r9219, %r9221}; 2026-02-21T12:31:22.1169295Z st.shared.v4.b32 [%r29], {%r9223, %r9225, %r9227, %r9229}; 
2026-02-21T12:31:22.1169358Z bar.sync 0; 2026-02-21T12:31:22.1169419Z // begin inline asm 2026-02-21T12:31:22.1169599Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8654, %r8655, %r8656, %r8657}, [%r7570]; 2026-02-21T12:31:22.1169669Z // end inline asm 2026-02-21T12:31:22.1169726Z // begin inline asm 2026-02-21T12:31:22.1169906Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8658, %r8659, %r8660, %r8661}, [%r7575]; 2026-02-21T12:31:22.1169965Z // end inline asm 2026-02-21T12:31:22.1170025Z // begin inline asm 2026-02-21T12:31:22.1170204Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8670, %r8671, %r8672, %r8673}, [%r7580]; 2026-02-21T12:31:22.1170261Z // end inline asm 2026-02-21T12:31:22.1170324Z // begin inline asm 2026-02-21T12:31:22.1170502Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8674, %r8675, %r8676, %r8677}, [%r7585]; 2026-02-21T12:31:22.1170559Z // end inline asm 2026-02-21T12:31:22.1170622Z // begin inline asm 2026-02-21T12:31:22.1170891Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8686, %r8687, %r8688, %r8689}, [%r7590]; 2026-02-21T12:31:22.1170954Z // end inline asm 2026-02-21T12:31:22.1171013Z // begin inline asm 2026-02-21T12:31:22.1171200Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8690, %r8691, %r8692, %r8693}, [%r7595]; 2026-02-21T12:31:22.1171323Z // end inline asm 2026-02-21T12:31:22.1171388Z // begin inline asm 2026-02-21T12:31:22.1171583Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8702, %r8703, %r8704, %r8705}, [%r7600]; 2026-02-21T12:31:22.1171644Z // end inline asm 2026-02-21T12:31:22.1171703Z // begin inline asm 2026-02-21T12:31:22.1171888Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8706, %r8707, %r8708, %r8709}, [%r7605]; 2026-02-21T12:31:22.1171945Z // end inline asm 2026-02-21T12:31:22.1172002Z bar.sync 0; 2026-02-21T12:31:22.1172110Z st.shared.v4.b32 [%r22], {%r9168, %r9170, %r9172, %r9174}; 2026-02-21T12:31:22.1172222Z st.shared.v4.b32 [%r23], {%r9176, %r9178, %r9180, %r9182}; 2026-02-21T12:31:22.1172414Z st.shared.v4.b32 [%r24], 
{%r9184, %r9186, %r9188, %r9190}; 2026-02-21T12:31:22.1172523Z st.shared.v4.b32 [%r25], {%r9192, %r9194, %r9196, %r9198}; 2026-02-21T12:31:22.1172629Z st.shared.v4.b32 [%r26], {%r9200, %r9202, %r9204, %r9206}; 2026-02-21T12:31:22.1172734Z st.shared.v4.b32 [%r27], {%r9208, %r9210, %r9212, %r9214}; 2026-02-21T12:31:22.1172837Z st.shared.v4.b32 [%r28], {%r9216, %r9218, %r9220, %r9222}; 2026-02-21T12:31:22.1172990Z st.shared.v4.b32 [%r29], {%r9224, %r9226, %r9228, %r9230}; 2026-02-21T12:31:22.1173049Z bar.sync 0; 2026-02-21T12:31:22.1173109Z // begin inline asm 2026-02-21T12:31:22.1173289Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8662, %r8663, %r8664, %r8665}, [%r7570]; 2026-02-21T12:31:22.1173352Z // end inline asm 2026-02-21T12:31:22.1173412Z // begin inline asm 2026-02-21T12:31:22.1173591Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8666, %r8667, %r8668, %r8669}, [%r7575]; 2026-02-21T12:31:22.1173652Z // end inline asm 2026-02-21T12:31:22.1173710Z // begin inline asm 2026-02-21T12:31:22.1173893Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8678, %r8679, %r8680, %r8681}, [%r7580]; 2026-02-21T12:31:22.1173955Z // end inline asm 2026-02-21T12:31:22.1174015Z // begin inline asm 2026-02-21T12:31:22.1174192Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8682, %r8683, %r8684, %r8685}, [%r7585]; 2026-02-21T12:31:22.1174252Z // end inline asm 2026-02-21T12:31:22.1174316Z // begin inline asm 2026-02-21T12:31:22.1174497Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8694, %r8695, %r8696, %r8697}, [%r7590]; 2026-02-21T12:31:22.1174554Z // end inline asm 2026-02-21T12:31:22.1174616Z // begin inline asm 2026-02-21T12:31:22.1174793Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8698, %r8699, %r8700, %r8701}, [%r7595]; 2026-02-21T12:31:22.1174862Z // end inline asm 2026-02-21T12:31:22.1174925Z // begin inline asm 2026-02-21T12:31:22.1175110Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8710, %r8711, %r8712, %r8713}, [%r7600]; 2026-02-21T12:31:22.1175169Z // end inline asm 
2026-02-21T12:31:22.1175231Z // begin inline asm 2026-02-21T12:31:22.1175411Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8714, %r8715, %r8716, %r8717}, [%r7605]; 2026-02-21T12:31:22.1175471Z // end inline asm 2026-02-21T12:31:22.1175530Z // begin inline asm 2026-02-21T12:31:22.1175665Z st.global.v4.b32 [ %rd260 + 0 ], { %r8206, %r8207, %r8208, %r8209 }; 2026-02-21T12:31:22.1175722Z // end inline asm 2026-02-21T12:31:22.1175781Z // begin inline asm 2026-02-21T12:31:22.1175908Z st.global.v4.b32 [ %rd261 + 0 ], { %r8210, %r8211, %r8212, %r8213 }; 2026-02-21T12:31:22.1175975Z // end inline asm 2026-02-21T12:31:22.1176034Z // begin inline asm 2026-02-21T12:31:22.1176150Z st.global.v4.b32 [ %rd262 + 0 ], { %r8214, %r8215, %r8216, %r8217 }; 2026-02-21T12:31:22.1176210Z // end inline asm 2026-02-21T12:31:22.1176269Z // begin inline asm 2026-02-21T12:31:22.1176383Z st.global.v4.b32 [ %rd263 + 0 ], { %r8218, %r8219, %r8220, %r8221 }; 2026-02-21T12:31:22.1176442Z // end inline asm 2026-02-21T12:31:22.1176767Z // begin inline asm 2026-02-21T12:31:22.1176911Z st.global.v4.b32 [ %rd264 + 0 ], { %r8222, %r8223, %r8224, %r8225 }; 2026-02-21T12:31:22.1176977Z // end inline asm 2026-02-21T12:31:22.1177048Z // begin inline asm 2026-02-21T12:31:22.1177175Z st.global.v4.b32 [ %rd265 + 0 ], { %r8226, %r8227, %r8228, %r8229 }; 2026-02-21T12:31:22.1177326Z // end inline asm 2026-02-21T12:31:22.1177408Z // begin inline asm 2026-02-21T12:31:22.1177535Z st.global.v4.b32 [ %rd266 + 0 ], { %r8230, %r8231, %r8232, %r8233 }; 2026-02-21T12:31:22.1177592Z // end inline asm 2026-02-21T12:31:22.1177651Z // begin inline asm 2026-02-21T12:31:22.1177772Z st.global.v4.b32 [ %rd267 + 0 ], { %r8234, %r8235, %r8236, %r8237 }; 2026-02-21T12:31:22.1177831Z // end inline asm 2026-02-21T12:31:22.1177891Z // begin inline asm 2026-02-21T12:31:22.1178015Z st.global.v4.b32 [ %rd268 + 0 ], { %r8238, %r8239, %r8240, %r8241 }; 2026-02-21T12:31:22.1178073Z // end inline asm 2026-02-21T12:31:22.1178132Z // begin 
inline asm 2026-02-21T12:31:22.1178330Z st.global.v4.b32 [ %rd269 + 0 ], { %r8242, %r8243, %r8244, %r8245 }; 2026-02-21T12:31:22.1178397Z // end inline asm 2026-02-21T12:31:22.1178458Z // begin inline asm 2026-02-21T12:31:22.1178576Z st.global.v4.b32 [ %rd270 + 0 ], { %r8246, %r8247, %r8248, %r8249 }; 2026-02-21T12:31:22.1178639Z // end inline asm 2026-02-21T12:31:22.1178700Z // begin inline asm 2026-02-21T12:31:22.1178879Z st.global.v4.b32 [ %rd271 + 0 ], { %r8250, %r8251, %r8252, %r8253 }; 2026-02-21T12:31:22.1178951Z // end inline asm 2026-02-21T12:31:22.1179020Z // begin inline asm 2026-02-21T12:31:22.1179142Z st.global.v4.b32 [ %rd272 + 0 ], { %r8254, %r8255, %r8256, %r8257 }; 2026-02-21T12:31:22.1179200Z // end inline asm 2026-02-21T12:31:22.1179262Z // begin inline asm 2026-02-21T12:31:22.1179377Z st.global.v4.b32 [ %rd273 + 0 ], { %r8258, %r8259, %r8260, %r8261 }; 2026-02-21T12:31:22.1179434Z // end inline asm 2026-02-21T12:31:22.1179497Z // begin inline asm 2026-02-21T12:31:22.1179617Z st.global.v4.b32 [ %rd274 + 0 ], { %r8262, %r8263, %r8264, %r8265 }; 2026-02-21T12:31:22.1179677Z // end inline asm 2026-02-21T12:31:22.1179735Z // begin inline asm 2026-02-21T12:31:22.1179855Z st.global.v4.b32 [ %rd275 + 0 ], { %r8266, %r8267, %r8268, %r8269 }; 2026-02-21T12:31:22.1179912Z // end inline asm 2026-02-21T12:31:22.1179971Z // begin inline asm 2026-02-21T12:31:22.1180096Z st.global.v4.b32 [ %rd276 + 0 ], { %r8270, %r8271, %r8272, %r8273 }; 2026-02-21T12:31:22.1180158Z // end inline asm 2026-02-21T12:31:22.1180217Z // begin inline asm 2026-02-21T12:31:22.1180339Z st.global.v4.b32 [ %rd277 + 0 ], { %r8274, %r8275, %r8276, %r8277 }; 2026-02-21T12:31:22.1180404Z // end inline asm 2026-02-21T12:31:22.1180466Z // begin inline asm 2026-02-21T12:31:22.1180584Z st.global.v4.b32 [ %rd278 + 0 ], { %r8278, %r8279, %r8280, %r8281 }; 2026-02-21T12:31:22.1180647Z // end inline asm 2026-02-21T12:31:22.1180706Z // begin inline asm 2026-02-21T12:31:22.1180820Z st.global.v4.b32 [ 
%rd279 + 0 ], { %r8282, %r8283, %r8284, %r8285 }; 2026-02-21T12:31:22.1180882Z // end inline asm 2026-02-21T12:31:22.1180941Z // begin inline asm 2026-02-21T12:31:22.1181057Z st.global.v4.b32 [ %rd280 + 0 ], { %r8286, %r8287, %r8288, %r8289 }; 2026-02-21T12:31:22.1181114Z // end inline asm 2026-02-21T12:31:22.1181178Z // begin inline asm 2026-02-21T12:31:22.1181292Z st.global.v4.b32 [ %rd281 + 0 ], { %r8290, %r8291, %r8292, %r8293 }; 2026-02-21T12:31:22.1181350Z // end inline asm 2026-02-21T12:31:22.1181413Z // begin inline asm 2026-02-21T12:31:22.1181528Z st.global.v4.b32 [ %rd282 + 0 ], { %r8294, %r8295, %r8296, %r8297 }; 2026-02-21T12:31:22.1181586Z // end inline asm 2026-02-21T12:31:22.1181645Z // begin inline asm 2026-02-21T12:31:22.1181779Z st.global.v4.b32 [ %rd283 + 0 ], { %r8298, %r8299, %r8300, %r8301 }; 2026-02-21T12:31:22.1181837Z // end inline asm 2026-02-21T12:31:22.1181898Z // begin inline asm 2026-02-21T12:31:22.1182016Z st.global.v4.b32 [ %rd284 + 0 ], { %r8302, %r8303, %r8304, %r8305 }; 2026-02-21T12:31:22.1182140Z // end inline asm 2026-02-21T12:31:22.1182202Z // begin inline asm 2026-02-21T12:31:22.1182321Z st.global.v4.b32 [ %rd285 + 0 ], { %r8306, %r8307, %r8308, %r8309 }; 2026-02-21T12:31:22.1182378Z // end inline asm 2026-02-21T12:31:22.1182439Z // begin inline asm 2026-02-21T12:31:22.1182610Z st.global.v4.b32 [ %rd286 + 0 ], { %r8310, %r8311, %r8312, %r8313 }; 2026-02-21T12:31:22.1182671Z // end inline asm 2026-02-21T12:31:22.1182734Z // begin inline asm 2026-02-21T12:31:22.1182856Z st.global.v4.b32 [ %rd287 + 0 ], { %r8314, %r8315, %r8316, %r8317 }; 2026-02-21T12:31:22.1182917Z // end inline asm 2026-02-21T12:31:22.1182977Z // begin inline asm 2026-02-21T12:31:22.1183094Z st.global.v4.b32 [ %rd288 + 0 ], { %r8318, %r8319, %r8320, %r8321 }; 2026-02-21T12:31:22.1183152Z // end inline asm 2026-02-21T12:31:22.1183215Z // begin inline asm 2026-02-21T12:31:22.1183331Z st.global.v4.b32 [ %rd289 + 0 ], { %r8322, %r8323, %r8324, %r8325 }; 
2026-02-21T12:31:22.1183389Z // end inline asm 2026-02-21T12:31:22.1183453Z // begin inline asm 2026-02-21T12:31:22.1183627Z st.global.v4.b32 [ %rd290 + 0 ], { %r8326, %r8327, %r8328, %r8329 }; 2026-02-21T12:31:22.1183688Z // end inline asm 2026-02-21T12:31:22.1183754Z // begin inline asm 2026-02-21T12:31:22.1183874Z st.global.v4.b32 [ %rd291 + 0 ], { %r8330, %r8331, %r8332, %r8333 }; 2026-02-21T12:31:22.1183936Z // end inline asm 2026-02-21T12:31:22.1183996Z // begin inline asm 2026-02-21T12:31:22.1184175Z st.global.v4.b32 [ %rd292 + 0 ], { %r8334, %r8335, %r8336, %r8337 }; 2026-02-21T12:31:22.1184236Z // end inline asm 2026-02-21T12:31:22.1184294Z // begin inline asm 2026-02-21T12:31:22.1184414Z st.global.v4.b32 [ %rd293 + 0 ], { %r8338, %r8339, %r8340, %r8341 }; 2026-02-21T12:31:22.1184472Z // end inline asm 2026-02-21T12:31:22.1184532Z // begin inline asm 2026-02-21T12:31:22.1184647Z st.global.v4.b32 [ %rd294 + 0 ], { %r8342, %r8343, %r8344, %r8345 }; 2026-02-21T12:31:22.1184709Z // end inline asm 2026-02-21T12:31:22.1184767Z // begin inline asm 2026-02-21T12:31:22.1184886Z st.global.v4.b32 [ %rd295 + 0 ], { %r8346, %r8347, %r8348, %r8349 }; 2026-02-21T12:31:22.1184954Z // end inline asm 2026-02-21T12:31:22.1185015Z // begin inline asm 2026-02-21T12:31:22.1185131Z st.global.v4.b32 [ %rd296 + 0 ], { %r8350, %r8351, %r8352, %r8353 }; 2026-02-21T12:31:22.1185205Z // end inline asm 2026-02-21T12:31:22.1185267Z // begin inline asm 2026-02-21T12:31:22.1185388Z st.global.v4.b32 [ %rd297 + 0 ], { %r8354, %r8355, %r8356, %r8357 }; 2026-02-21T12:31:22.1185451Z // end inline asm 2026-02-21T12:31:22.1185517Z // begin inline asm 2026-02-21T12:31:22.1185632Z st.global.v4.b32 [ %rd298 + 0 ], { %r8358, %r8359, %r8360, %r8361 }; 2026-02-21T12:31:22.1185691Z // end inline asm 2026-02-21T12:31:22.1185758Z // begin inline asm 2026-02-21T12:31:22.1185889Z st.global.v4.b32 [ %rd299 + 0 ], { %r8362, %r8363, %r8364, %r8365 }; 2026-02-21T12:31:22.1185949Z // end inline asm 
2026-02-21T12:31:22.1186011Z // begin inline asm 2026-02-21T12:31:22.1186133Z st.global.v4.b32 [ %rd300 + 0 ], { %r8366, %r8367, %r8368, %r8369 }; 2026-02-21T12:31:22.1186191Z // end inline asm 2026-02-21T12:31:22.1186250Z // begin inline asm 2026-02-21T12:31:22.1186370Z st.global.v4.b32 [ %rd301 + 0 ], { %r8370, %r8371, %r8372, %r8373 }; 2026-02-21T12:31:22.1186429Z // end inline asm 2026-02-21T12:31:22.1186632Z // begin inline asm 2026-02-21T12:31:22.1186752Z st.global.v4.b32 [ %rd302 + 0 ], { %r8374, %r8375, %r8376, %r8377 }; 2026-02-21T12:31:22.1186815Z // end inline asm 2026-02-21T12:31:22.1186878Z // begin inline asm 2026-02-21T12:31:22.1186992Z st.global.v4.b32 [ %rd303 + 0 ], { %r8378, %r8379, %r8380, %r8381 }; 2026-02-21T12:31:22.1187053Z // end inline asm 2026-02-21T12:31:22.1187114Z // begin inline asm 2026-02-21T12:31:22.1187228Z st.global.v4.b32 [ %rd304 + 0 ], { %r8382, %r8383, %r8384, %r8385 }; 2026-02-21T12:31:22.1187301Z // end inline asm 2026-02-21T12:31:22.1187363Z // begin inline asm 2026-02-21T12:31:22.1187479Z st.global.v4.b32 [ %rd305 + 0 ], { %r8386, %r8387, %r8388, %r8389 }; 2026-02-21T12:31:22.1187620Z // end inline asm 2026-02-21T12:31:22.1187685Z // begin inline asm 2026-02-21T12:31:22.1187801Z st.global.v4.b32 [ %rd306 + 0 ], { %r8390, %r8391, %r8392, %r8393 }; 2026-02-21T12:31:22.1187859Z // end inline asm 2026-02-21T12:31:22.1187987Z // begin inline asm 2026-02-21T12:31:22.1188115Z st.global.v4.b32 [ %rd307 + 0 ], { %r8394, %r8395, %r8396, %r8397 }; 2026-02-21T12:31:22.1188174Z // end inline asm 2026-02-21T12:31:22.1188329Z // begin inline asm 2026-02-21T12:31:22.1188494Z st.global.v4.b32 [ %rd308 + 0 ], { %r8398, %r8399, %r8400, %r8401 }; 2026-02-21T12:31:22.1188595Z // end inline asm 2026-02-21T12:31:22.1188697Z // begin inline asm 2026-02-21T12:31:22.1188893Z st.global.v4.b32 [ %rd309 + 0 ], { %r8402, %r8403, %r8404, %r8405 }; 2026-02-21T12:31:22.1188981Z // end inline asm 2026-02-21T12:31:22.1189070Z // begin inline asm 
2026-02-21T12:31:22.1189261Z st.global.v4.b32 [ %rd310 + 0 ], { %r8406, %r8407, %r8408, %r8409 }; 2026-02-21T12:31:22.1189350Z // end inline asm 2026-02-21T12:31:22.1189555Z // begin inline asm 2026-02-21T12:31:22.1189748Z st.global.v4.b32 [ %rd311 + 0 ], { %r8410, %r8411, %r8412, %r8413 }; 2026-02-21T12:31:22.1189842Z // end inline asm 2026-02-21T12:31:22.1189930Z // begin inline asm 2026-02-21T12:31:22.1190110Z st.global.v4.b32 [ %rd312 + 0 ], { %r8414, %r8415, %r8416, %r8417 }; 2026-02-21T12:31:22.1190202Z // end inline asm 2026-02-21T12:31:22.1190379Z // begin inline asm 2026-02-21T12:31:22.1190553Z st.global.v4.b32 [ %rd313 + 0 ], { %r8418, %r8419, %r8420, %r8421 }; 2026-02-21T12:31:22.1190637Z // end inline asm 2026-02-21T12:31:22.1190729Z // begin inline asm 2026-02-21T12:31:22.1190898Z st.global.v4.b32 [ %rd314 + 0 ], { %r8422, %r8423, %r8424, %r8425 }; 2026-02-21T12:31:22.1190980Z // end inline asm 2026-02-21T12:31:22.1191070Z // begin inline asm 2026-02-21T12:31:22.1191240Z st.global.v4.b32 [ %rd315 + 0 ], { %r8426, %r8427, %r8428, %r8429 }; 2026-02-21T12:31:22.1191327Z // end inline asm 2026-02-21T12:31:22.1191419Z // begin inline asm 2026-02-21T12:31:22.1191588Z st.global.v4.b32 [ %rd316 + 0 ], { %r8430, %r8431, %r8432, %r8433 }; 2026-02-21T12:31:22.1191672Z // end inline asm 2026-02-21T12:31:22.1191757Z // begin inline asm 2026-02-21T12:31:22.1191934Z st.global.v4.b32 [ %rd317 + 0 ], { %r8434, %r8435, %r8436, %r8437 }; 2026-02-21T12:31:22.1192016Z // end inline asm 2026-02-21T12:31:22.1192102Z // begin inline asm 2026-02-21T12:31:22.1192274Z st.global.v4.b32 [ %rd318 + 0 ], { %r8438, %r8439, %r8440, %r8441 }; 2026-02-21T12:31:22.1192356Z // end inline asm 2026-02-21T12:31:22.1192439Z // begin inline asm 2026-02-21T12:31:22.1192606Z st.global.v4.b32 [ %rd319 + 0 ], { %r8442, %r8443, %r8444, %r8445 }; 2026-02-21T12:31:22.1192690Z // end inline asm 2026-02-21T12:31:22.1192775Z // begin inline asm 2026-02-21T12:31:22.1192952Z st.global.v4.b32 [ %rd320 + 0 
], { %r8446, %r8447, %r8448, %r8449 }; 2026-02-21T12:31:22.1193050Z // end inline asm 2026-02-21T12:31:22.1193138Z // begin inline asm 2026-02-21T12:31:22.1193309Z st.global.v4.b32 [ %rd321 + 0 ], { %r8450, %r8451, %r8452, %r8453 }; 2026-02-21T12:31:22.1193405Z // end inline asm 2026-02-21T12:31:22.1193495Z // begin inline asm 2026-02-21T12:31:22.1193667Z st.global.v4.b32 [ %rd322 + 0 ], { %r8454, %r8455, %r8456, %r8457 }; 2026-02-21T12:31:22.1193765Z // end inline asm 2026-02-21T12:31:22.1193870Z // begin inline asm 2026-02-21T12:31:22.1194062Z st.global.v4.b32 [ %rd323 + 0 ], { %r8458, %r8459, %r8460, %r8461 }; 2026-02-21T12:31:22.1194158Z // end inline asm 2026-02-21T12:31:22.1194268Z // begin inline asm 2026-02-21T12:31:22.1194413Z st.global.v4.b32 [ %rd324 + 0 ], { %r8462, %r8463, %r8464, %r8465 }; 2026-02-21T12:31:22.1194471Z // end inline asm 2026-02-21T12:31:22.1194534Z // begin inline asm 2026-02-21T12:31:22.1194656Z st.global.v4.b32 [ %rd325 + 0 ], { %r8466, %r8467, %r8468, %r8469 }; 2026-02-21T12:31:22.1194716Z // end inline asm 2026-02-21T12:31:22.1194776Z // begin inline asm 2026-02-21T12:31:22.1194899Z st.global.v4.b32 [ %rd326 + 0 ], { %r8470, %r8471, %r8472, %r8473 }; 2026-02-21T12:31:22.1195036Z // end inline asm 2026-02-21T12:31:22.1195098Z // begin inline asm 2026-02-21T12:31:22.1195276Z st.global.v4.b32 [ %rd327 + 0 ], { %r8474, %r8475, %r8476, %r8477 }; 2026-02-21T12:31:22.1195465Z // end inline asm 2026-02-21T12:31:22.1195526Z // begin inline asm 2026-02-21T12:31:22.1195653Z st.global.v4.b32 [ %rd328 + 0 ], { %r8478, %r8479, %r8480, %r8481 }; 2026-02-21T12:31:22.1195723Z // end inline asm 2026-02-21T12:31:22.1195784Z // begin inline asm 2026-02-21T12:31:22.1195939Z st.global.v4.b32 [ %rd329 + 0 ], { %r8482, %r8483, %r8484, %r8485 }; 2026-02-21T12:31:22.1196051Z // end inline asm 2026-02-21T12:31:22.1196140Z // begin inline asm 2026-02-21T12:31:22.1196270Z st.global.v4.b32 [ %rd330 + 0 ], { %r8486, %r8487, %r8488, %r8489 }; 
2026-02-21T12:31:22.1196329Z // end inline asm 2026-02-21T12:31:22.1196394Z // begin inline asm 2026-02-21T12:31:22.1196665Z st.global.v4.b32 [ %rd331 + 0 ], { %r8490, %r8491, %r8492, %r8493 }; 2026-02-21T12:31:22.1196824Z // end inline asm 2026-02-21T12:31:22.1196907Z // begin inline asm 2026-02-21T12:31:22.1197028Z st.global.v4.b32 [ %rd332 + 0 ], { %r8494, %r8495, %r8496, %r8497 }; 2026-02-21T12:31:22.1197087Z // end inline asm 2026-02-21T12:31:22.1197152Z // begin inline asm 2026-02-21T12:31:22.1197274Z st.global.v4.b32 [ %rd333 + 0 ], { %r8498, %r8499, %r8500, %r8501 }; 2026-02-21T12:31:22.1197332Z // end inline asm 2026-02-21T12:31:22.1197466Z // begin inline asm 2026-02-21T12:31:22.1197596Z st.global.v4.b32 [ %rd334 + 0 ], { %r8502, %r8503, %r8504, %r8505 }; 2026-02-21T12:31:22.1197653Z // end inline asm 2026-02-21T12:31:22.1197714Z // begin inline asm 2026-02-21T12:31:22.1197834Z st.global.v4.b32 [ %rd335 + 0 ], { %r8506, %r8507, %r8508, %r8509 }; 2026-02-21T12:31:22.1197900Z // end inline asm 2026-02-21T12:31:22.1197965Z // begin inline asm 2026-02-21T12:31:22.1198088Z st.global.v4.b32 [ %rd336 + 0 ], { %r8510, %r8511, %r8512, %r8513 }; 2026-02-21T12:31:22.1198152Z // end inline asm 2026-02-21T12:31:22.1198215Z // begin inline asm 2026-02-21T12:31:22.1198334Z st.global.v4.b32 [ %rd337 + 0 ], { %r8514, %r8515, %r8516, %r8517 }; 2026-02-21T12:31:22.1198395Z // end inline asm 2026-02-21T12:31:22.1198457Z // begin inline asm 2026-02-21T12:31:22.1198578Z st.global.v4.b32 [ %rd338 + 0 ], { %r8518, %r8519, %r8520, %r8521 }; 2026-02-21T12:31:22.1198636Z // end inline asm 2026-02-21T12:31:22.1198701Z // begin inline asm 2026-02-21T12:31:22.1198815Z st.global.v4.b32 [ %rd339 + 0 ], { %r8522, %r8523, %r8524, %r8525 }; 2026-02-21T12:31:22.1198873Z // end inline asm 2026-02-21T12:31:22.1198937Z // begin inline asm 2026-02-21T12:31:22.1199055Z st.global.v4.b32 [ %rd340 + 0 ], { %r8526, %r8527, %r8528, %r8529 }; 2026-02-21T12:31:22.1199113Z // end inline asm 
2026-02-21T12:31:22.1199184Z // begin inline asm 2026-02-21T12:31:22.1199300Z st.global.v4.b32 [ %rd341 + 0 ], { %r8530, %r8531, %r8532, %r8533 }; 2026-02-21T12:31:22.1199359Z // end inline asm 2026-02-21T12:31:22.1199419Z // begin inline asm 2026-02-21T12:31:22.1199543Z st.global.v4.b32 [ %rd342 + 0 ], { %r8534, %r8535, %r8536, %r8537 }; 2026-02-21T12:31:22.1199600Z // end inline asm 2026-02-21T12:31:22.1199659Z // begin inline asm 2026-02-21T12:31:22.1199783Z st.global.v4.b32 [ %rd343 + 0 ], { %r8538, %r8539, %r8540, %r8541 }; 2026-02-21T12:31:22.1199841Z // end inline asm 2026-02-21T12:31:22.1199905Z // begin inline asm 2026-02-21T12:31:22.1200024Z st.global.v4.b32 [ %rd344 + 0 ], { %r8542, %r8543, %r8544, %r8545 }; 2026-02-21T12:31:22.1200089Z // end inline asm 2026-02-21T12:31:22.1200148Z // begin inline asm 2026-02-21T12:31:22.1200261Z st.global.v4.b32 [ %rd345 + 0 ], { %r8546, %r8547, %r8548, %r8549 }; 2026-02-21T12:31:22.1200323Z // end inline asm 2026-02-21T12:31:22.1200384Z // begin inline asm 2026-02-21T12:31:22.1200499Z st.global.v4.b32 [ %rd346 + 0 ], { %r8550, %r8551, %r8552, %r8553 }; 2026-02-21T12:31:22.1200561Z // end inline asm 2026-02-21T12:31:22.1200621Z // begin inline asm 2026-02-21T12:31:22.1200918Z st.global.v4.b32 [ %rd347 + 0 ], { %r8554, %r8555, %r8556, %r8557 }; 2026-02-21T12:31:22.1201016Z // end inline asm 2026-02-21T12:31:22.1201118Z // begin inline asm 2026-02-21T12:31:22.1201307Z st.global.v4.b32 [ %rd348 + 0 ], { %r8558, %r8559, %r8560, %r8561 }; 2026-02-21T12:31:22.1201502Z // end inline asm 2026-02-21T12:31:22.1201608Z // begin inline asm 2026-02-21T12:31:22.1201805Z st.global.v4.b32 [ %rd349 + 0 ], { %r8562, %r8563, %r8564, %r8565 }; 2026-02-21T12:31:22.1201903Z // end inline asm 2026-02-21T12:31:22.1201991Z // begin inline asm 2026-02-21T12:31:22.1202116Z st.global.v4.b32 [ %rd350 + 0 ], { %r8566, %r8567, %r8568, %r8569 }; 2026-02-21T12:31:22.1202174Z // end inline asm 2026-02-21T12:31:22.1202257Z // begin inline asm 
2026-02-21T12:31:22.1202462Z st.global.v4.b32 [ %rd351 + 0 ], { %r8570, %r8571, %r8572, %r8573 }; 2026-02-21T12:31:22.1202560Z // end inline asm 2026-02-21T12:31:22.1202653Z // begin inline asm 2026-02-21T12:31:22.1202932Z st.global.v4.b32 [ %rd352 + 0 ], { %r8574, %r8575, %r8576, %r8577 }; 2026-02-21T12:31:22.1203037Z // end inline asm 2026-02-21T12:31:22.1203132Z // begin inline asm 2026-02-21T12:31:22.1203327Z st.global.v4.b32 [ %rd353 + 0 ], { %r8578, %r8579, %r8580, %r8581 }; 2026-02-21T12:31:22.1203431Z // end inline asm 2026-02-21T12:31:22.1203529Z // begin inline asm 2026-02-21T12:31:22.1203720Z st.global.v4.b32 [ %rd354 + 0 ], { %r8582, %r8583, %r8584, %r8585 }; 2026-02-21T12:31:22.1203889Z // end inline asm 2026-02-21T12:31:22.1203989Z // begin inline asm 2026-02-21T12:31:22.1204184Z st.global.v4.b32 [ %rd355 + 0 ], { %r8586, %r8587, %r8588, %r8589 }; 2026-02-21T12:31:22.1204277Z // end inline asm 2026-02-21T12:31:22.1204377Z // begin inline asm 2026-02-21T12:31:22.1204570Z st.global.v4.b32 [ %rd356 + 0 ], { %r8590, %r8591, %r8592, %r8593 }; 2026-02-21T12:31:22.1204663Z // end inline asm 2026-02-21T12:31:22.1204767Z // begin inline asm 2026-02-21T12:31:22.1204963Z st.global.v4.b32 [ %rd357 + 0 ], { %r8594, %r8595, %r8596, %r8597 }; 2026-02-21T12:31:22.1205072Z // end inline asm 2026-02-21T12:31:22.1205176Z // begin inline asm 2026-02-21T12:31:22.1205389Z st.global.v4.b32 [ %rd358 + 0 ], { %r8598, %r8599, %r8600, %r8601 }; 2026-02-21T12:31:22.1205496Z // end inline asm 2026-02-21T12:31:22.1205606Z // begin inline asm 2026-02-21T12:31:22.1205818Z st.global.v4.b32 [ %rd359 + 0 ], { %r8602, %r8603, %r8604, %r8605 }; 2026-02-21T12:31:22.1205920Z // end inline asm 2026-02-21T12:31:22.1206027Z // begin inline asm 2026-02-21T12:31:22.1206214Z st.global.v4.b32 [ %rd360 + 0 ], { %r8606, %r8607, %r8608, %r8609 }; 2026-02-21T12:31:22.1206274Z // end inline asm 2026-02-21T12:31:22.1206333Z // begin inline asm 2026-02-21T12:31:22.1206583Z st.global.v4.b32 [ %rd361 + 0 
], { %r8610, %r8611, %r8612, %r8613 }; 2026-02-21T12:31:22.1206649Z // end inline asm 2026-02-21T12:31:22.1206708Z // begin inline asm 2026-02-21T12:31:22.1206825Z st.global.v4.b32 [ %rd362 + 0 ], { %r8614, %r8615, %r8616, %r8617 }; 2026-02-21T12:31:22.1206889Z // end inline asm 2026-02-21T12:31:22.1206951Z // begin inline asm 2026-02-21T12:31:22.1207066Z st.global.v4.b32 [ %rd363 + 0 ], { %r8618, %r8619, %r8620, %r8621 }; 2026-02-21T12:31:22.1207125Z // end inline asm 2026-02-21T12:31:22.1207190Z // begin inline asm 2026-02-21T12:31:22.1207307Z st.global.v4.b32 [ %rd364 + 0 ], { %r8622, %r8623, %r8624, %r8625 }; 2026-02-21T12:31:22.1207363Z // end inline asm 2026-02-21T12:31:22.1207430Z // begin inline asm 2026-02-21T12:31:22.1207548Z st.global.v4.b32 [ %rd365 + 0 ], { %r8626, %r8627, %r8628, %r8629 }; 2026-02-21T12:31:22.1207604Z // end inline asm 2026-02-21T12:31:22.1207668Z // begin inline asm 2026-02-21T12:31:22.1207786Z st.global.v4.b32 [ %rd366 + 0 ], { %r8630, %r8631, %r8632, %r8633 }; 2026-02-21T12:31:22.1207843Z // end inline asm 2026-02-21T12:31:22.1207902Z // begin inline asm 2026-02-21T12:31:22.1208024Z st.global.v4.b32 [ %rd367 + 0 ], { %r8634, %r8635, %r8636, %r8637 }; 2026-02-21T12:31:22.1208082Z // end inline asm 2026-02-21T12:31:22.1208249Z // begin inline asm 2026-02-21T12:31:22.1208377Z st.global.v4.b32 [ %rd368 + 0 ], { %r8638, %r8639, %r8640, %r8641 }; 2026-02-21T12:31:22.1208447Z // end inline asm 2026-02-21T12:31:22.1208508Z // begin inline asm 2026-02-21T12:31:22.1208694Z st.global.v4.b32 [ %rd369 + 0 ], { %r8642, %r8643, %r8644, %r8645 }; 2026-02-21T12:31:22.1208755Z // end inline asm 2026-02-21T12:31:22.1208814Z // begin inline asm 2026-02-21T12:31:22.1208931Z st.global.v4.b32 [ %rd370 + 0 ], { %r8646, %r8647, %r8648, %r8649 }; 2026-02-21T12:31:22.1208995Z // end inline asm 2026-02-21T12:31:22.1209054Z // begin inline asm 2026-02-21T12:31:22.1209171Z st.global.v4.b32 [ %rd371 + 0 ], { %r8650, %r8651, %r8652, %r8653 }; 
2026-02-21T12:31:22.1209233Z // end inline asm 2026-02-21T12:31:22.1209293Z // begin inline asm 2026-02-21T12:31:22.1209409Z st.global.v4.b32 [ %rd372 + 0 ], { %r8654, %r8655, %r8656, %r8657 }; 2026-02-21T12:31:22.1209467Z // end inline asm 2026-02-21T12:31:22.1209530Z // begin inline asm 2026-02-21T12:31:22.1209724Z st.global.v4.b32 [ %rd373 + 0 ], { %r8658, %r8659, %r8660, %r8661 }; 2026-02-21T12:31:22.1209787Z // end inline asm 2026-02-21T12:31:22.1209851Z // begin inline asm 2026-02-21T12:31:22.1209969Z st.global.v4.b32 [ %rd374 + 0 ], { %r8662, %r8663, %r8664, %r8665 }; 2026-02-21T12:31:22.1210031Z // end inline asm 2026-02-21T12:31:22.1210091Z // begin inline asm 2026-02-21T12:31:22.1210272Z st.global.v4.b32 [ %rd375 + 0 ], { %r8666, %r8667, %r8668, %r8669 }; 2026-02-21T12:31:22.1210332Z // end inline asm 2026-02-21T12:31:22.1210392Z // begin inline asm 2026-02-21T12:31:22.1210514Z st.global.v4.b32 [ %rd376 + 0 ], { %r8670, %r8671, %r8672, %r8673 }; 2026-02-21T12:31:22.1210573Z // end inline asm 2026-02-21T12:31:22.1210634Z // begin inline asm 2026-02-21T12:31:22.1210750Z st.global.v4.b32 [ %rd377 + 0 ], { %r8674, %r8675, %r8676, %r8677 }; 2026-02-21T12:31:22.1210812Z // end inline asm 2026-02-21T12:31:22.1210871Z // begin inline asm 2026-02-21T12:31:22.1210990Z st.global.v4.b32 [ %rd378 + 0 ], { %r8678, %r8679, %r8680, %r8681 }; 2026-02-21T12:31:22.1211052Z // end inline asm 2026-02-21T12:31:22.1211123Z // begin inline asm 2026-02-21T12:31:22.1211244Z st.global.v4.b32 [ %rd379 + 0 ], { %r8682, %r8683, %r8684, %r8685 }; 2026-02-21T12:31:22.1211311Z // end inline asm 2026-02-21T12:31:22.1211374Z // begin inline asm 2026-02-21T12:31:22.1211493Z st.global.v4.b32 [ %rd380 + 0 ], { %r8686, %r8687, %r8688, %r8689 }; 2026-02-21T12:31:22.1211554Z // end inline asm 2026-02-21T12:31:22.1211617Z // begin inline asm 2026-02-21T12:31:22.1211733Z st.global.v4.b32 [ %rd381 + 0 ], { %r8690, %r8691, %r8692, %r8693 }; 2026-02-21T12:31:22.1211791Z // end inline asm 
2026-02-21T12:31:22.1211856Z // begin inline asm 2026-02-21T12:31:22.1211971Z st.global.v4.b32 [ %rd382 + 0 ], { %r8694, %r8695, %r8696, %r8697 }; 2026-02-21T12:31:22.1212029Z // end inline asm 2026-02-21T12:31:22.1212089Z // begin inline asm 2026-02-21T12:31:22.1212208Z st.global.v4.b32 [ %rd383 + 0 ], { %r8698, %r8699, %r8700, %r8701 }; 2026-02-21T12:31:22.1212271Z // end inline asm 2026-02-21T12:31:22.1212333Z // begin inline asm 2026-02-21T12:31:22.1212452Z st.global.v4.b32 [ %rd384 + 0 ], { %r8702, %r8703, %r8704, %r8705 }; 2026-02-21T12:31:22.1212514Z // end inline asm 2026-02-21T12:31:22.1212575Z // begin inline asm 2026-02-21T12:31:22.1212696Z st.global.v4.b32 [ %rd385 + 0 ], { %r8706, %r8707, %r8708, %r8709 }; 2026-02-21T12:31:22.1212754Z // end inline asm 2026-02-21T12:31:22.1212816Z // begin inline asm 2026-02-21T12:31:22.1212933Z st.global.v4.b32 [ %rd386 + 0 ], { %r8710, %r8711, %r8712, %r8713 }; 2026-02-21T12:31:22.1212996Z // end inline asm 2026-02-21T12:31:22.1213055Z // begin inline asm 2026-02-21T12:31:22.1213171Z st.global.v4.b32 [ %rd387 + 0 ], { %r8714, %r8715, %r8716, %r8717 }; 2026-02-21T12:31:22.1213234Z // end inline asm 2026-02-21T12:31:22.1213312Z mov.b32 %r9236, 0f00000000; 2026-02-21T12:31:22.1213381Z mov.b32 %r9237, %r9236; 2026-02-21T12:31:22.1213442Z mov.b32 %r9238, %r9236; 2026-02-21T12:31:22.1213572Z mov.b32 %r9239, %r9236; 2026-02-21T12:31:22.1213633Z mov.b32 %r9240, %r9236; 2026-02-21T12:31:22.1213693Z mov.b32 %r9241, %r9236; 2026-02-21T12:31:22.1213757Z mov.b32 %r9242, %r9236; 2026-02-21T12:31:22.1213818Z mov.b32 %r9243, %r9236; 2026-02-21T12:31:22.1213929Z mov.b32 %r9244, %r9236; 2026-02-21T12:31:22.1213989Z mov.b32 %r9245, %r9236; 2026-02-21T12:31:22.1214054Z mov.b32 %r9246, %r9236; 2026-02-21T12:31:22.1214115Z mov.b32 %r9247, %r9236; 2026-02-21T12:31:22.1214176Z mov.b32 %r9248, %r9236; 2026-02-21T12:31:22.1214241Z mov.b32 %r9249, %r9236; 2026-02-21T12:31:22.1214304Z mov.b32 %r9250, %r9236; 2026-02-21T12:31:22.1217949Z mov.b32 
%r9251, %r9236; 2026-02-21T12:31:22.1218037Z mov.b32 %r9252, %r9236; 2026-02-21T12:31:22.1218099Z mov.b32 %r9253, %r9236; 2026-02-21T12:31:22.1218159Z mov.b32 %r9254, %r9236; 2026-02-21T12:31:22.1218216Z mov.b32 %r9255, %r9236; 2026-02-21T12:31:22.1218273Z mov.b32 %r9256, %r9236; 2026-02-21T12:31:22.1218330Z mov.b32 %r9257, %r9236; 2026-02-21T12:31:22.1218519Z mov.b32 %r9258, %r9236; 2026-02-21T12:31:22.1218583Z mov.b32 %r9259, %r9236; 2026-02-21T12:31:22.1218639Z mov.b32 %r9260, %r9236; 2026-02-21T12:31:22.1218697Z mov.b32 %r9261, %r9236; 2026-02-21T12:31:22.1218758Z mov.b32 %r9262, %r9236; 2026-02-21T12:31:22.1218814Z mov.b32 %r9263, %r9236; 2026-02-21T12:31:22.1218870Z mov.b32 %r9264, %r9236; 2026-02-21T12:31:22.1218932Z mov.b32 %r9265, %r9236; 2026-02-21T12:31:22.1219057Z mov.b32 %r9266, %r9236; 2026-02-21T12:31:22.1219118Z mov.b32 %r9267, %r9236; 2026-02-21T12:31:22.1219178Z mov.b32 %r9268, %r9236; 2026-02-21T12:31:22.1219246Z mov.b32 %r9269, %r9236; 2026-02-21T12:31:22.1219307Z mov.b32 %r9270, %r9236; 2026-02-21T12:31:22.1219364Z mov.b32 %r9271, %r9236; 2026-02-21T12:31:22.1219427Z mov.b32 %r9272, %r9236; 2026-02-21T12:31:22.1219485Z mov.b32 %r9273, %r9236; 2026-02-21T12:31:22.1219542Z mov.b32 %r9274, %r9236; 2026-02-21T12:31:22.1219605Z mov.b32 %r9275, %r9236; 2026-02-21T12:31:22.1219665Z mov.b32 %r9276, %r9236; 2026-02-21T12:31:22.1219725Z mov.b32 %r9277, %r9236; 2026-02-21T12:31:22.1219784Z mov.b32 %r9278, %r9236; 2026-02-21T12:31:22.1219845Z mov.b32 %r9279, %r9236; 2026-02-21T12:31:22.1219901Z mov.b32 %r9280, %r9236; 2026-02-21T12:31:22.1219961Z mov.b32 %r9281, %r9236; 2026-02-21T12:31:22.1220022Z mov.b32 %r9282, %r9236; 2026-02-21T12:31:22.1220080Z mov.b32 %r9283, %r9236; 2026-02-21T12:31:22.1220139Z mov.b32 %r9284, %r9236; 2026-02-21T12:31:22.1220196Z mov.b32 %r9285, %r9236; 2026-02-21T12:31:22.1220257Z mov.b32 %r9286, %r9236; 2026-02-21T12:31:22.1220314Z mov.b32 %r9287, %r9236; 2026-02-21T12:31:22.1220371Z mov.b32 %r9288, %r9236; 
2026-02-21T12:31:22.1220431Z mov.b32 %r9289, %r9236; 2026-02-21T12:31:22.1220487Z mov.b32 %r9290, %r9236; 2026-02-21T12:31:22.1220545Z mov.b32 %r9291, %r9236; 2026-02-21T12:31:22.1220607Z mov.b32 %r9292, %r9236; 2026-02-21T12:31:22.1220664Z mov.b32 %r9293, %r9236; 2026-02-21T12:31:22.1220721Z mov.b32 %r9294, %r9236; 2026-02-21T12:31:22.1220785Z mov.b32 %r9295, %r9236; 2026-02-21T12:31:22.1220857Z mov.b32 %r9296, %r9236; 2026-02-21T12:31:22.1220920Z mov.b32 %r9297, %r9236; 2026-02-21T12:31:22.1220978Z mov.b32 %r9298, %r9236; 2026-02-21T12:31:22.1221041Z mov.b32 %r9299, %r9236; 2026-02-21T12:31:22.1221101Z mov.b32 %r9300, %r9236; 2026-02-21T12:31:22.1221160Z mov.b32 %r9301, %r9236; 2026-02-21T12:31:22.1221218Z mov.b32 %r9302, %r9236; 2026-02-21T12:31:22.1221284Z mov.b32 %r9303, %r9236; 2026-02-21T12:31:22.1221342Z mov.b32 %r9304, %r9236; 2026-02-21T12:31:22.1221401Z mov.b32 %r9305, %r9236; 2026-02-21T12:31:22.1221462Z mov.b32 %r9306, %r9236; 2026-02-21T12:31:22.1221519Z mov.b32 %r9307, %r9236; 2026-02-21T12:31:22.1221574Z mov.b32 %r9308, %r9236; 2026-02-21T12:31:22.1221632Z mov.b32 %r9309, %r9236; 2026-02-21T12:31:22.1221694Z mov.b32 %r9310, %r9236; 2026-02-21T12:31:22.1221750Z mov.b32 %r9311, %r9236; 2026-02-21T12:31:22.1221808Z mov.b32 %r9312, %r9236; 2026-02-21T12:31:22.1221978Z mov.b32 %r9313, %r9236; 2026-02-21T12:31:22.1222038Z mov.b32 %r9314, %r9236; 2026-02-21T12:31:22.1222094Z mov.b32 %r9315, %r9236; 2026-02-21T12:31:22.1222152Z mov.b32 %r9316, %r9236; 2026-02-21T12:31:22.1222215Z mov.b32 %r9317, %r9236; 2026-02-21T12:31:22.1222359Z mov.b32 %r9318, %r9236; 2026-02-21T12:31:22.1222418Z mov.b32 %r9319, %r9236; 2026-02-21T12:31:22.1222480Z mov.b32 %r9320, %r9236; 2026-02-21T12:31:22.1222541Z mov.b32 %r9321, %r9236; 2026-02-21T12:31:22.1222597Z mov.b32 %r9322, %r9236; 2026-02-21T12:31:22.1222653Z mov.b32 %r9323, %r9236; 2026-02-21T12:31:22.1222716Z mov.b32 %r9324, %r9236; 2026-02-21T12:31:22.1222772Z mov.b32 %r9325, %r9236; 2026-02-21T12:31:22.1222830Z mov.b32 
%r9326, %r9236; 2026-02-21T12:31:22.1222897Z mov.b32 %r9327, %r9236; 2026-02-21T12:31:22.1222956Z mov.b32 %r9328, %r9236; 2026-02-21T12:31:22.1223010Z mov.b32 %r9329, %r9236; 2026-02-21T12:31:22.1223066Z mov.b32 %r9330, %r9236; 2026-02-21T12:31:22.1223127Z mov.b32 %r9331, %r9236; 2026-02-21T12:31:22.1223251Z mov.b32 %r9332, %r9236; 2026-02-21T12:31:22.1223311Z mov.b32 %r9333, %r9236; 2026-02-21T12:31:22.1223373Z mov.b32 %r9334, %r9236; 2026-02-21T12:31:22.1223432Z mov.b32 %r9335, %r9236; 2026-02-21T12:31:22.1223490Z mov.b32 %r9336, %r9236; 2026-02-21T12:31:22.1223555Z mov.b32 %r9337, %r9236; 2026-02-21T12:31:22.1223614Z mov.b32 %r9338, %r9236; 2026-02-21T12:31:22.1223670Z mov.b32 %r9339, %r9236; 2026-02-21T12:31:22.1223774Z mov.b32 %r9340, %r9236; 2026-02-21T12:31:22.1223839Z mov.b32 %r9341, %r9236; 2026-02-21T12:31:22.1223897Z mov.b32 %r9342, %r9236; 2026-02-21T12:31:22.1223954Z mov.b32 %r9343, %r9236; 2026-02-21T12:31:22.1224017Z mov.b32 %r9344, %r9236; 2026-02-21T12:31:22.1224074Z mov.b32 %r9345, %r9236; 2026-02-21T12:31:22.1224129Z mov.b32 %r9346, %r9236; 2026-02-21T12:31:22.1224188Z mov.b32 %r9347, %r9236; 2026-02-21T12:31:22.1224248Z mov.b32 %r9348, %r9236; 2026-02-21T12:31:22.1224305Z mov.b32 %r9349, %r9236; 2026-02-21T12:31:22.1224362Z mov.b32 %r9350, %r9236; 2026-02-21T12:31:22.1224432Z mov.b32 %r9351, %r9236; 2026-02-21T12:31:22.1224489Z mov.b32 %r9352, %r9236; 2026-02-21T12:31:22.1224548Z mov.b32 %r9353, %r9236; 2026-02-21T12:31:22.1224606Z mov.b32 %r9354, %r9236; 2026-02-21T12:31:22.1224673Z mov.b32 %r9355, %r9236; 2026-02-21T12:31:22.1224731Z mov.b32 %r9356, %r9236; 2026-02-21T12:31:22.1224789Z mov.b32 %r9357, %r9236; 2026-02-21T12:31:22.1224849Z mov.b32 %r9358, %r9236; 2026-02-21T12:31:22.1224909Z mov.b32 %r9359, %r9236; 2026-02-21T12:31:22.1224968Z mov.b32 %r9360, %r9236; 2026-02-21T12:31:22.1225025Z mov.b32 %r9361, %r9236; 2026-02-21T12:31:22.1225089Z mov.b32 %r9362, %r9236; 2026-02-21T12:31:22.1225146Z mov.b32 %r9363, %r9236; 
2026-02-21T12:31:22.1225202Z mov.b32 %r9364, %r9236; 2026-02-21T12:31:22.1225268Z mov.b32 %r9365, %r9236; 2026-02-21T12:31:22.1225324Z mov.b32 %r9366, %r9236; 2026-02-21T12:31:22.1225382Z mov.b32 %r9367, %r9236; 2026-02-21T12:31:22.1225439Z mov.b32 %r9368, %r9236; 2026-02-21T12:31:22.1225503Z mov.b32 %r9369, %r9236; 2026-02-21T12:31:22.1225563Z mov.b32 %r9370, %r9236; 2026-02-21T12:31:22.1225621Z mov.b32 %r9371, %r9236; 2026-02-21T12:31:22.1225684Z mov.b32 %r9372, %r9236; 2026-02-21T12:31:22.1225741Z mov.b32 %r9373, %r9236; 2026-02-21T12:31:22.1225801Z mov.b32 %r9374, %r9236; 2026-02-21T12:31:22.1225857Z mov.b32 %r9375, %r9236; 2026-02-21T12:31:22.1225920Z mov.b32 %r9376, %r9236; 2026-02-21T12:31:22.1225978Z mov.b32 %r9377, %r9236; 2026-02-21T12:31:22.1226039Z mov.b32 %r9378, %r9236; 2026-02-21T12:31:22.1226101Z mov.b32 %r9379, %r9236; 2026-02-21T12:31:22.1226157Z mov.b32 %r9380, %r9236; 2026-02-21T12:31:22.1226213Z mov.b32 %r9381, %r9236; 2026-02-21T12:31:22.1226274Z mov.b32 %r9382, %r9236; 2026-02-21T12:31:22.1226333Z mov.b32 %r9383, %r9236; 2026-02-21T12:31:22.1226391Z mov.b32 %r9384, %r9236; 2026-02-21T12:31:22.1226593Z mov.b32 %r9385, %r9236; 2026-02-21T12:31:22.1226663Z mov.b32 %r9386, %r9236; 2026-02-21T12:31:22.1226720Z mov.b32 %r9387, %r9236; 2026-02-21T12:31:22.1226859Z mov.b32 %r9388, %r9236; 2026-02-21T12:31:22.1226923Z mov.b32 %r9389, %r9236; 2026-02-21T12:31:22.1226980Z mov.b32 %r9390, %r9236; 2026-02-21T12:31:22.1227037Z mov.b32 %r9391, %r9236; 2026-02-21T12:31:22.1227094Z mov.b32 %r9392, %r9236; 2026-02-21T12:31:22.1227223Z mov.b32 %r9393, %r9236; 2026-02-21T12:31:22.1227281Z mov.b32 %r9394, %r9236; 2026-02-21T12:31:22.1227352Z mov.b32 %r9395, %r9236; 2026-02-21T12:31:22.1227419Z mov.b32 %r9396, %r9236; 2026-02-21T12:31:22.1227479Z mov.b32 %r9397, %r9236; 2026-02-21T12:31:22.1227534Z mov.b32 %r9398, %r9236; 2026-02-21T12:31:22.1227592Z mov.b32 %r9399, %r9236; 2026-02-21T12:31:22.1227656Z mov.b32 %r9400, %r9236; 2026-02-21T12:31:22.1227712Z mov.b32 
%r9401, %r9236; 2026-02-21T12:31:22.1227771Z mov.b32 %r9402, %r9236; 2026-02-21T12:31:22.1227831Z mov.b32 %r9403, %r9236; 2026-02-21T12:31:22.1227888Z mov.b32 %r9404, %r9236; 2026-02-21T12:31:22.1227947Z mov.b32 %r9405, %r9236; 2026-02-21T12:31:22.1228006Z mov.b32 %r9406, %r9236; 2026-02-21T12:31:22.1228137Z mov.b32 %r9407, %r9236; 2026-02-21T12:31:22.1228197Z mov.b32 %r9408, %r9236; 2026-02-21T12:31:22.1228352Z mov.b32 %r9409, %r9236; 2026-02-21T12:31:22.1228415Z mov.b32 %r9410, %r9236; 2026-02-21T12:31:22.1228476Z mov.b32 %r9411, %r9236; 2026-02-21T12:31:22.1228534Z mov.b32 %r9412, %r9236; 2026-02-21T12:31:22.1228592Z mov.b32 %r9413, %r9236; 2026-02-21T12:31:22.1228738Z mov.b32 %r9414, %r9236; 2026-02-21T12:31:22.1228800Z mov.b32 %r9415, %r9236; 2026-02-21T12:31:22.1228857Z mov.b32 %r9416, %r9236; 2026-02-21T12:31:22.1228918Z mov.b32 %r9417, %r9236; 2026-02-21T12:31:22.1228974Z mov.b32 %r9418, %r9236; 2026-02-21T12:31:22.1229035Z mov.b32 %r9419, %r9236; 2026-02-21T12:31:22.1229092Z mov.b32 %r9420, %r9236; 2026-02-21T12:31:22.1229153Z mov.b32 %r9421, %r9236; 2026-02-21T12:31:22.1229212Z mov.b32 %r9422, %r9236; 2026-02-21T12:31:22.1229269Z mov.b32 %r9423, %r9236; 2026-02-21T12:31:22.1229332Z mov.b32 %r9424, %r9236; 2026-02-21T12:31:22.1229392Z mov.b32 %r9425, %r9236; 2026-02-21T12:31:22.1229452Z mov.b32 %r9426, %r9236; 2026-02-21T12:31:22.1229509Z mov.b32 %r9427, %r9236; 2026-02-21T12:31:22.1229571Z mov.b32 %r9428, %r9236; 2026-02-21T12:31:22.1229629Z mov.b32 %r9429, %r9236; 2026-02-21T12:31:22.1229692Z mov.b32 %r9430, %r9236; 2026-02-21T12:31:22.1229753Z mov.b32 %r9431, %r9236; 2026-02-21T12:31:22.1229810Z mov.b32 %r9432, %r9236; 2026-02-21T12:31:22.1229868Z mov.b32 %r9433, %r9236; 2026-02-21T12:31:22.1229931Z mov.b32 %r9434, %r9236; 2026-02-21T12:31:22.1229988Z mov.b32 %r9435, %r9236; 2026-02-21T12:31:22.1230049Z mov.b32 %r9436, %r9236; 2026-02-21T12:31:22.1230106Z mov.b32 %r9437, %r9236; 2026-02-21T12:31:22.1230171Z mov.b32 %r9438, %r9236; 
2026-02-21T12:31:22.1230228Z mov.b32 %r9439, %r9236; 2026-02-21T12:31:22.1230285Z mov.b32 %r9440, %r9236; 2026-02-21T12:31:22.1230346Z mov.b32 %r9441, %r9236; 2026-02-21T12:31:22.1230404Z mov.b32 %r9442, %r9236; 2026-02-21T12:31:22.1230460Z mov.b32 %r9443, %r9236; 2026-02-21T12:31:22.1230525Z mov.b32 %r9444, %r9236; 2026-02-21T12:31:22.1230588Z mov.b32 %r9445, %r9236; 2026-02-21T12:31:22.1230645Z mov.b32 %r9446, %r9236; 2026-02-21T12:31:22.1230702Z mov.b32 %r9447, %r9236; 2026-02-21T12:31:22.1230770Z mov.b32 %r9448, %r9236; 2026-02-21T12:31:22.1230830Z mov.b32 %r9449, %r9236; 2026-02-21T12:31:22.1230901Z mov.b32 %r9450, %r9236; 2026-02-21T12:31:22.1230961Z mov.b32 %r9451, %r9236; 2026-02-21T12:31:22.1231034Z mov.b32 %r9452, %r9236; 2026-02-21T12:31:22.1231093Z mov.b32 %r9453, %r9236; 2026-02-21T12:31:22.1231151Z mov.b32 %r9454, %r9236; 2026-02-21T12:31:22.1231210Z mov.b32 %r9455, %r9236; 2026-02-21T12:31:22.1231268Z mov.b32 %r9456, %r9236; 2026-02-21T12:31:22.1231325Z mov.b32 %r9457, %r9236; 2026-02-21T12:31:22.1231382Z mov.b32 %r9458, %r9236; 2026-02-21T12:31:22.1231446Z mov.b32 %r9459, %r9236; 2026-02-21T12:31:22.1231505Z mov.b32 %r9460, %r9236; 2026-02-21T12:31:22.1231563Z mov.b32 %r9461, %r9236; 2026-02-21T12:31:22.1231690Z mov.b32 %r9462, %r9236; 2026-02-21T12:31:22.1231749Z mov.b32 %r9463, %r9236; 2026-02-21T12:31:22.1231807Z mov.b32 %r9464, %r9236; 2026-02-21T12:31:22.1231864Z mov.b32 %r9465, %r9236; 2026-02-21T12:31:22.1231928Z mov.b32 %r9466, %r9236; 2026-02-21T12:31:22.1232035Z mov.b32 %r9467, %r9236; 2026-02-21T12:31:22.1232093Z mov.b32 %r9468, %r9236; 2026-02-21T12:31:22.1232158Z mov.b32 %r9469, %r9236; 2026-02-21T12:31:22.1232217Z mov.b32 %r9470, %r9236; 2026-02-21T12:31:22.1232274Z mov.b32 %r9471, %r9236; 2026-02-21T12:31:22.1232333Z mov.b32 %r9472, %r9236; 2026-02-21T12:31:22.1232397Z mov.b32 %r9473, %r9236; 2026-02-21T12:31:22.1232455Z mov.b32 %r9474, %r9236; 2026-02-21T12:31:22.1232511Z mov.b32 %r9475, %r9236; 2026-02-21T12:31:22.1232573Z mov.b32 
%r9476, %r9236; 2026-02-21T12:31:22.1232632Z mov.b32 %r9477, %r9236; 2026-02-21T12:31:22.1232689Z mov.b32 %r9478, %r9236; 2026-02-21T12:31:22.1232749Z mov.b32 %r9479, %r9236; 2026-02-21T12:31:22.1232806Z mov.b32 %r9480, %r9236; 2026-02-21T12:31:22.1232919Z mov.b32 %r9481, %r9236; 2026-02-21T12:31:22.1232979Z mov.b32 %r9482, %r9236; 2026-02-21T12:31:22.1233040Z mov.b32 %r9483, %r9236; 2026-02-21T12:31:22.1233097Z mov.b32 %r9484, %r9236; 2026-02-21T12:31:22.1233154Z mov.b32 %r9485, %r9236; 2026-02-21T12:31:22.1233217Z mov.b32 %r9486, %r9236; 2026-02-21T12:31:22.1233274Z mov.b32 %r9487, %r9236; 2026-02-21T12:31:22.1233331Z mov.b32 %r9488, %r9236; 2026-02-21T12:31:22.1233437Z mov.b32 %r9489, %r9236; 2026-02-21T12:31:22.1233517Z mov.b32 %r9490, %r9236; 2026-02-21T12:31:22.1233576Z mov.b32 %r9491, %r9236; 2026-02-21T12:31:22.1233633Z mov.b32 %r9492, %r9236; 2026-02-21T12:31:22.1233695Z mov.b32 %r9493, %r9236; 2026-02-21T12:31:22.1233754Z mov.b32 %r9494, %r9236; 2026-02-21T12:31:22.1233812Z mov.b32 %r9495, %r9236; 2026-02-21T12:31:22.1233870Z mov.b32 %r9496, %r9236; 2026-02-21T12:31:22.1233934Z mov.b32 %r9497, %r9236; 2026-02-21T12:31:22.1233991Z mov.b32 %r9498, %r9236; 2026-02-21T12:31:22.1234050Z mov.b32 %r9499, %r9236; 2026-02-21T12:31:22.1234121Z mov.b32 %r9500, %r9236; 2026-02-21T12:31:22.1234177Z mov.b32 %r9501, %r9236; 2026-02-21T12:31:22.1234235Z mov.b32 %r9502, %r9236; 2026-02-21T12:31:22.1234292Z mov.b32 %r9503, %r9236; 2026-02-21T12:31:22.1234358Z mov.b32 %r9504, %r9236; 2026-02-21T12:31:22.1234416Z mov.b32 %r9505, %r9236; 2026-02-21T12:31:22.1234474Z mov.b32 %r9506, %r9236; 2026-02-21T12:31:22.1234536Z mov.b32 %r9507, %r9236; 2026-02-21T12:31:22.1234596Z mov.b32 %r9508, %r9236; 2026-02-21T12:31:22.1234653Z mov.b32 %r9509, %r9236; 2026-02-21T12:31:22.1234713Z mov.b32 %r9510, %r9236; 2026-02-21T12:31:22.1234773Z mov.b32 %r9511, %r9236; 2026-02-21T12:31:22.1234830Z mov.b32 %r9512, %r9236; 2026-02-21T12:31:22.1234887Z mov.b32 %r9513, %r9236; 
2026-02-21T12:31:22.1234948Z mov.b32 %r9514, %r9236; 2026-02-21T12:31:22.1235004Z mov.b32 %r9515, %r9236; 2026-02-21T12:31:22.1235060Z mov.b32 %r9516, %r9236; 2026-02-21T12:31:22.1235115Z mov.b32 %r9517, %r9236; 2026-02-21T12:31:22.1235183Z mov.b32 %r9518, %r9236; 2026-02-21T12:31:22.1235243Z mov.b32 %r9519, %r9236; 2026-02-21T12:31:22.1235302Z mov.b32 %r9520, %r9236; 2026-02-21T12:31:22.1235363Z mov.b32 %r9521, %r9236; 2026-02-21T12:31:22.1235422Z mov.b32 %r9522, %r9236; 2026-02-21T12:31:22.1235480Z mov.b32 %r9523, %r9236; 2026-02-21T12:31:22.1235543Z mov.b32 %r9524, %r9236; 2026-02-21T12:31:22.1235612Z mov.b32 %r9525, %r9236; 2026-02-21T12:31:22.1235676Z mov.b32 %r9526, %r9236; 2026-02-21T12:31:22.1235733Z mov.b32 %r9527, %r9236; 2026-02-21T12:31:22.1235795Z mov.b32 %r9528, %r9236; 2026-02-21T12:31:22.1235852Z mov.b32 %r9529, %r9236; 2026-02-21T12:31:22.1235910Z mov.b32 %r9530, %r9236; 2026-02-21T12:31:22.1235973Z mov.b32 %r9531, %r9236; 2026-02-21T12:31:22.1236030Z mov.b32 %r9532, %r9236; 2026-02-21T12:31:22.1236087Z mov.b32 %r9533, %r9236; 2026-02-21T12:31:22.1236143Z mov.b32 %r9534, %r9236; 2026-02-21T12:31:22.1236205Z mov.b32 %r9535, %r9236; 2026-02-21T12:31:22.1236263Z mov.b32 %r9536, %r9236; 2026-02-21T12:31:22.1236382Z mov.b32 %r9537, %r9236; 2026-02-21T12:31:22.1236568Z mov.b32 %r9538, %r9236; 2026-02-21T12:31:22.1236633Z mov.b32 %r9539, %r9236; 2026-02-21T12:31:22.1236695Z mov.b32 %r9540, %r9236; 2026-02-21T12:31:22.1236751Z mov.b32 %r9541, %r9236; 2026-02-21T12:31:22.1236888Z mov.b32 %r9542, %r9236; 2026-02-21T12:31:22.1236948Z mov.b32 %r9543, %r9236; 2026-02-21T12:31:22.1237005Z mov.b32 %r9544, %r9236; 2026-02-21T12:31:22.1237069Z mov.b32 %r9545, %r9236; 2026-02-21T12:31:22.1237125Z mov.b32 %r9546, %r9236; 2026-02-21T12:31:22.1237193Z mov.b32 %r9547, %r9236; 2026-02-21T12:31:22.1237253Z mov.b32 %r9548, %r9236; 2026-02-21T12:31:22.1237316Z mov.b32 %r9549, %r9236; 2026-02-21T12:31:22.1237382Z mov.b32 %r9550, %r9236; 2026-02-21T12:31:22.1237447Z mov.b32 
%r9551, %r9236; 2026-02-21T12:31:22.1237507Z mov.b32 %r9552, %r9236; 2026-02-21T12:31:22.1237567Z mov.b32 %r9553, %r9236; 2026-02-21T12:31:22.1237625Z mov.b32 %r9554, %r9236; 2026-02-21T12:31:22.1237688Z mov.b32 %r9555, %r9236; 2026-02-21T12:31:22.1237819Z mov.b32 %r9556, %r9236; 2026-02-21T12:31:22.1237881Z mov.b32 %r9557, %r9236; 2026-02-21T12:31:22.1237937Z mov.b32 %r9558, %r9236; 2026-02-21T12:31:22.1238000Z mov.b32 %r9559, %r9236; 2026-02-21T12:31:22.1238060Z mov.b32 %r9560, %r9236; 2026-02-21T12:31:22.1238117Z mov.b32 %r9561, %r9236; 2026-02-21T12:31:22.1238178Z mov.b32 %r9562, %r9236; 2026-02-21T12:31:22.1238315Z mov.b32 %r9563, %r9236; 2026-02-21T12:31:22.1238374Z mov.b32 %r9564, %r9236; 2026-02-21T12:31:22.1238431Z mov.b32 %r9565, %r9236; 2026-02-21T12:31:22.1238493Z mov.b32 %r9566, %r9236; 2026-02-21T12:31:22.1238550Z mov.b32 %r9567, %r9236; 2026-02-21T12:31:22.1238612Z mov.b32 %r9568, %r9236; 2026-02-21T12:31:22.1238672Z mov.b32 %r9569, %r9236; 2026-02-21T12:31:22.1238729Z mov.b32 %r9570, %r9236; 2026-02-21T12:31:22.1238786Z mov.b32 %r9571, %r9236; 2026-02-21T12:31:22.1238844Z mov.b32 %r9572, %r9236; 2026-02-21T12:31:22.1238901Z mov.b32 %r9573, %r9236; 2026-02-21T12:31:22.1238961Z mov.b32 %r9574, %r9236; 2026-02-21T12:31:22.1239018Z mov.b32 %r9575, %r9236; 2026-02-21T12:31:22.1239080Z mov.b32 %r9576, %r9236; 2026-02-21T12:31:22.1239136Z mov.b32 %r9577, %r9236; 2026-02-21T12:31:22.1239192Z mov.b32 %r9578, %r9236; 2026-02-21T12:31:22.1239255Z mov.b32 %r9579, %r9236; 2026-02-21T12:31:22.1239311Z mov.b32 %r9580, %r9236; 2026-02-21T12:31:22.1239367Z mov.b32 %r9581, %r9236; 2026-02-21T12:31:22.1239425Z mov.b32 %r9582, %r9236; 2026-02-21T12:31:22.1239488Z mov.b32 %r9583, %r9236; 2026-02-21T12:31:22.1239545Z mov.b32 %r9584, %r9236; 2026-02-21T12:31:22.1239604Z mov.b32 %r9585, %r9236; 2026-02-21T12:31:22.1239666Z mov.b32 %r9586, %r9236; 2026-02-21T12:31:22.1239724Z mov.b32 %r9587, %r9236; 2026-02-21T12:31:22.1239779Z mov.b32 %r9588, %r9236; 
2026-02-21T12:31:22.1239836Z mov.b32 %r9589, %r9236; 2026-02-21T12:31:22.1239898Z mov.b32 %r9590, %r9236; 2026-02-21T12:31:22.1239955Z mov.b32 %r9591, %r9236; 2026-02-21T12:31:22.1240013Z mov.b32 %r9592, %r9236; 2026-02-21T12:31:22.1240078Z mov.b32 %r9593, %r9236; 2026-02-21T12:31:22.1240138Z mov.b32 %r9594, %r9236; 2026-02-21T12:31:22.1240196Z mov.b32 %r9595, %r9236; 2026-02-21T12:31:22.1240252Z mov.b32 %r9596, %r9236; 2026-02-21T12:31:22.1240317Z mov.b32 %r9597, %r9236; 2026-02-21T12:31:22.1240373Z mov.b32 %r9598, %r9236; 2026-02-21T12:31:22.1240430Z mov.b32 %r9599, %r9236; 2026-02-21T12:31:22.1240490Z mov.b32 %r9600, %r9236; 2026-02-21T12:31:22.1240549Z mov.b32 %r9601, %r9236; 2026-02-21T12:31:22.1240606Z mov.b32 %r9602, %r9236; 2026-02-21T12:31:22.1240677Z mov.b32 %r9603, %r9236; 2026-02-21T12:31:22.1240741Z mov.b32 %r9604, %r9236; 2026-02-21T12:31:22.1240798Z mov.b32 %r9605, %r9236; 2026-02-21T12:31:22.1240855Z mov.b32 %r9606, %r9236; 2026-02-21T12:31:22.1240917Z mov.b32 %r9607, %r9236; 2026-02-21T12:31:22.1240977Z mov.b32 %r9608, %r9236; 2026-02-21T12:31:22.1241034Z mov.b32 %r9609, %r9236; 2026-02-21T12:31:22.1241092Z mov.b32 %r9610, %r9236; 2026-02-21T12:31:22.1241244Z mov.b32 %r9611, %r9236; 2026-02-21T12:31:22.1241306Z mov.b32 %r9612, %r9236; 2026-02-21T12:31:22.1241364Z mov.b32 %r9613, %r9236; 2026-02-21T12:31:22.1241425Z mov.b32 %r9614, %r9236; 2026-02-21T12:31:22.1241482Z mov.b32 %r9615, %r9236; 2026-02-21T12:31:22.1241588Z mov.b32 %r9616, %r9236; 2026-02-21T12:31:22.1241645Z mov.b32 %r9617, %r9236; 2026-02-21T12:31:22.1241706Z mov.b32 %r9618, %r9236; 2026-02-21T12:31:22.1241766Z mov.b32 %r9619, %r9236; 2026-02-21T12:31:22.1241824Z mov.b32 %r9620, %r9236; 2026-02-21T12:31:22.1241883Z mov.b32 %r9621, %r9236; 2026-02-21T12:31:22.1241940Z mov.b32 %r9622, %r9236; 2026-02-21T12:31:22.1241996Z mov.b32 %r9623, %r9236; 2026-02-21T12:31:22.1242060Z mov.b32 %r9624, %r9236; 2026-02-21T12:31:22.1242116Z mov.b32 %r9625, %r9236; 2026-02-21T12:31:22.1242172Z mov.b32 
%r9626, %r9236; 2026-02-21T12:31:22.1242228Z mov.b32 %r9627, %r9236; 2026-02-21T12:31:22.1242287Z mov.b32 %r9628, %r9236; 2026-02-21T12:31:22.1242347Z mov.b32 %r9629, %r9236; 2026-02-21T12:31:22.1242461Z mov.b32 %r9630, %r9236; 2026-02-21T12:31:22.1242526Z mov.b32 %r9631, %r9236; 2026-02-21T12:31:22.1242584Z mov.b32 %r9632, %r9236; 2026-02-21T12:31:22.1242641Z mov.b32 %r9633, %r9236; 2026-02-21T12:31:22.1242698Z mov.b32 %r9634, %r9236; 2026-02-21T12:31:22.1242762Z mov.b32 %r9635, %r9236; 2026-02-21T12:31:22.1242817Z mov.b32 %r9636, %r9236; 2026-02-21T12:31:22.1242873Z mov.b32 %r9637, %r9236; 2026-02-21T12:31:22.1242981Z mov.b32 %r9638, %r9236; 2026-02-21T12:31:22.1243040Z mov.b32 %r9639, %r9236; 2026-02-21T12:31:22.1243096Z mov.b32 %r9640, %r9236; 2026-02-21T12:31:22.1243154Z mov.b32 %r9641, %r9236; 2026-02-21T12:31:22.1243216Z mov.b32 %r9642, %r9236; 2026-02-21T12:31:22.1243272Z mov.b32 %r9643, %r9236; 2026-02-21T12:31:22.1243328Z mov.b32 %r9644, %r9236; 2026-02-21T12:31:22.1243387Z mov.b32 %r9645, %r9236; 2026-02-21T12:31:22.1243447Z mov.b32 %r9646, %r9236; 2026-02-21T12:31:22.1243507Z mov.b32 %r9647, %r9236; 2026-02-21T12:31:22.1243565Z mov.b32 %r9648, %r9236; 2026-02-21T12:31:22.1243623Z mov.b32 %r9649, %r9236; 2026-02-21T12:31:22.1243690Z mov.b32 %r9650, %r9236; 2026-02-21T12:31:22.1243754Z mov.b32 %r9651, %r9236; 2026-02-21T12:31:22.1243811Z mov.b32 %r9652, %r9236; 2026-02-21T12:31:22.1243871Z mov.b32 %r9653, %r9236; 2026-02-21T12:31:22.1243931Z mov.b32 %r9654, %r9236; 2026-02-21T12:31:22.1243987Z mov.b32 %r9655, %r9236; 2026-02-21T12:31:22.1244042Z mov.b32 %r9656, %r9236; 2026-02-21T12:31:22.1244103Z mov.b32 %r9657, %r9236; 2026-02-21T12:31:22.1244163Z mov.b32 %r9658, %r9236; 2026-02-21T12:31:22.1244219Z mov.b32 %r9659, %r9236; 2026-02-21T12:31:22.1244275Z mov.b32 %r9660, %r9236; 2026-02-21T12:31:22.1244338Z mov.b32 %r9661, %r9236; 2026-02-21T12:31:22.1244394Z mov.b32 %r9662, %r9236; 2026-02-21T12:31:22.1244449Z mov.b32 %r9663, %r9236; 
2026-02-21T12:31:22.1244504Z mov.b32 %r9664, %r9236; 2026-02-21T12:31:22.1244565Z mov.b32 %r9665, %r9236; 2026-02-21T12:31:22.1244623Z mov.b32 %r9666, %r9236; 2026-02-21T12:31:22.1244682Z mov.b32 %r9667, %r9236; 2026-02-21T12:31:22.1244745Z mov.b32 %r9668, %r9236; 2026-02-21T12:31:22.1244800Z mov.b32 %r9669, %r9236; 2026-02-21T12:31:22.1244857Z mov.b32 %r9670, %r9236; 2026-02-21T12:31:22.1244912Z mov.b32 %r9671, %r9236; 2026-02-21T12:31:22.1244975Z mov.b32 %r9672, %r9236; 2026-02-21T12:31:22.1245030Z mov.b32 %r9673, %r9236; 2026-02-21T12:31:22.1245085Z mov.b32 %r9674, %r9236; 2026-02-21T12:31:22.1245146Z mov.b32 %r9675, %r9236; 2026-02-21T12:31:22.1245201Z mov.b32 %r9676, %r9236; 2026-02-21T12:31:22.1245256Z mov.b32 %r9677, %r9236; 2026-02-21T12:31:22.1245312Z mov.b32 %r9678, %r9236; 2026-02-21T12:31:22.1245371Z mov.b32 %r9679, %r9236; 2026-02-21T12:31:22.1245426Z mov.b32 %r9680, %r9236; 2026-02-21T12:31:22.1245481Z mov.b32 %r9681, %r9236; 2026-02-21T12:31:22.1245540Z mov.b32 %r9682, %r9236; 2026-02-21T12:31:22.1245595Z mov.b32 %r9683, %r9236; 2026-02-21T12:31:22.1245651Z mov.b32 %r9684, %r9236; 2026-02-21T12:31:22.1245710Z mov.b32 %r9685, %r9236; 2026-02-21T12:31:22.1245838Z mov.b32 %r9686, %r9236; 2026-02-21T12:31:22.1245896Z mov.b32 %r9687, %r9236; 2026-02-21T12:31:22.1245950Z mov.b32 %r9688, %r9236; 2026-02-21T12:31:22.1246010Z mov.b32 %r9689, %r9236; 2026-02-21T12:31:22.1246111Z mov.b32 %r9690, %r9236; 2026-02-21T12:31:22.1246166Z mov.b32 %r9691, %r9236; 2026-02-21T12:31:22.1246225Z mov.b32 %r9692, %r9236; 2026-02-21T12:31:22.1246280Z mov.b32 %r9693, %r9236; 2026-02-21T12:31:22.1246338Z mov.b32 %r9694, %r9236; 2026-02-21T12:31:22.1246393Z mov.b32 %r9695, %r9236; 2026-02-21T12:31:22.1246590Z mov.b32 %r9696, %r9236; 2026-02-21T12:31:22.1246697Z mov.b32 %r9697, %r9236; 2026-02-21T12:31:22.1246804Z mov.b32 %r9698, %r9236; 2026-02-21T12:31:22.1246871Z mov.b32 %r9699, %r9236; 2026-02-21T12:31:22.1246928Z mov.b32 %r9700, %r9236; 2026-02-21T12:31:22.1246984Z mov.b32 
%r9701, %r9236; 2026-02-21T12:31:22.1247039Z mov.b32 %r9702, %r9236; 2026-02-21T12:31:22.1247098Z mov.b32 %r9703, %r9236; 2026-02-21T12:31:22.1247157Z mov.b32 %r9704, %r9236; 2026-02-21T12:31:22.1247301Z mov.b32 %r9705, %r9236; 2026-02-21T12:31:22.1247378Z mov.b32 %r9706, %r9236; 2026-02-21T12:31:22.1247435Z mov.b32 %r9707, %r9236; 2026-02-21T12:31:22.1247491Z mov.b32 %r9708, %r9236; 2026-02-21T12:31:22.1247552Z mov.b32 %r9709, %r9236; 2026-02-21T12:31:22.1247613Z mov.b32 %r9710, %r9236; 2026-02-21T12:31:22.1247670Z mov.b32 %r9711, %r9236; 2026-02-21T12:31:22.1247789Z mov.b32 %r9712, %r9236; 2026-02-21T12:31:22.1247852Z mov.b32 %r9713, %r9236; 2026-02-21T12:31:22.1247908Z mov.b32 %r9714, %r9236; 2026-02-21T12:31:22.1247964Z mov.b32 %r9715, %r9236; 2026-02-21T12:31:22.1248020Z mov.b32 %r9716, %r9236; 2026-02-21T12:31:22.1248080Z mov.b32 %r9717, %r9236; 2026-02-21T12:31:22.1248137Z mov.b32 %r9718, %r9236; 2026-02-21T12:31:22.1248194Z mov.b32 %r9719, %r9236; 2026-02-21T12:31:22.1248254Z mov.b32 %r9720, %r9236; 2026-02-21T12:31:22.1248310Z mov.b32 %r9721, %r9236; 2026-02-21T12:31:22.1248367Z mov.b32 %r9722, %r9236; 2026-02-21T12:31:22.1248428Z mov.b32 %r9723, %r9236; 2026-02-21T12:31:22.1248502Z mov.b32 %r9724, %r9236; 2026-02-21T12:31:22.1248559Z mov.b32 %r9725, %r9236; 2026-02-21T12:31:22.1248615Z mov.b32 %r9726, %r9236; 2026-02-21T12:31:22.1248675Z mov.b32 %r9727, %r9236; 2026-02-21T12:31:22.1248736Z mov.b32 %r9728, %r9236; 2026-02-21T12:31:22.1248792Z mov.b32 %r9729, %r9236; 2026-02-21T12:31:22.1248851Z mov.b32 %r9730, %r9236; 2026-02-21T12:31:22.1248910Z mov.b32 %r9731, %r9236; 2026-02-21T12:31:22.1248966Z mov.b32 %r9732, %r9236; 2026-02-21T12:31:22.1249022Z mov.b32 %r9733, %r9236; 2026-02-21T12:31:22.1249088Z mov.b32 %r9734, %r9236; 2026-02-21T12:31:22.1249149Z mov.b32 %r9735, %r9236; 2026-02-21T12:31:22.1249206Z mov.b32 %r9736, %r9236; 2026-02-21T12:31:22.1249268Z mov.b32 %r9737, %r9236; 2026-02-21T12:31:22.1249325Z mov.b32 %r9738, %r9236; 
2026-02-21T12:31:22.1249381Z mov.b32 %r9739, %r9236; 2026-02-21T12:31:22.1249436Z mov.b32 %r9740, %r9236; 2026-02-21T12:31:22.1249495Z mov.b32 %r9741, %r9236; 2026-02-21T12:31:22.1249556Z mov.b32 %r9742, %r9236; 2026-02-21T12:31:22.1249612Z mov.b32 %r9743, %r9236; 2026-02-21T12:31:22.1249673Z mov.b32 %r9744, %r9236; 2026-02-21T12:31:22.1249730Z mov.b32 %r9745, %r9236; 2026-02-21T12:31:22.1249789Z mov.b32 %r9746, %r9236; 2026-02-21T12:31:22.1249844Z mov.b32 %r9747, %r9236; 2026-02-21T12:31:22.1249904Z mov.b32 %r9748, %r9236; 2026-02-21T12:31:22.1249959Z mov.b32 %r9749, %r9236; 2026-02-21T12:31:22.1250018Z mov.b32 %r9750, %r9236; 2026-02-21T12:31:22.1250077Z mov.b32 %r9751, %r9236; 2026-02-21T12:31:22.1250133Z mov.b32 %r9752, %r9236; 2026-02-21T12:31:22.1250190Z mov.b32 %r9753, %r9236; 2026-02-21T12:31:22.1250245Z mov.b32 %r9754, %r9236; 2026-02-21T12:31:22.1250304Z mov.b32 %r9755, %r9236; 2026-02-21T12:31:22.1250361Z mov.b32 %r9756, %r9236; 2026-02-21T12:31:22.1250417Z mov.b32 %r9757, %r9236; 2026-02-21T12:31:22.1250477Z mov.b32 %r9758, %r9236; 2026-02-21T12:31:22.1250535Z mov.b32 %r9759, %r9236; 2026-02-21T12:31:22.1250677Z mov.b32 %r9760, %r9236; 2026-02-21T12:31:22.1250737Z mov.b32 %r9761, %r9236; 2026-02-21T12:31:22.1250799Z mov.b32 %r9762, %r9236; 2026-02-21T12:31:22.1250856Z mov.b32 %r9763, %r9236; 2026-02-21T12:31:22.1250913Z mov.b32 %r9764, %r9236; 2026-02-21T12:31:22.1251029Z mov.b32 %r9765, %r9236; 2026-02-21T12:31:22.1251085Z mov.b32 %r9766, %r9236; 2026-02-21T12:31:22.1251142Z mov.b32 %r9767, %r9236; 2026-02-21T12:31:22.1251202Z mov.b32 %r9768, %r9236; 2026-02-21T12:31:22.1251273Z mov.b32 %r9769, %r9236; 2026-02-21T12:31:22.1251329Z mov.b32 %r9770, %r9236; 2026-02-21T12:31:22.1251385Z mov.b32 %r9771, %r9236; 2026-02-21T12:31:22.1251443Z mov.b32 %r9772, %r9236; 2026-02-21T12:31:22.1251499Z mov.b32 %r9773, %r9236; 2026-02-21T12:31:22.1251555Z mov.b32 %r9774, %r9236; 2026-02-21T12:31:22.1251614Z mov.b32 %r9775, %r9236; 2026-02-21T12:31:22.1251671Z mov.b32 
%r9776, %r9236; 2026-02-21T12:31:22.1251729Z mov.b32 %r9777, %r9236; 2026-02-21T12:31:22.1251784Z mov.b32 %r9778, %r9236; 2026-02-21T12:31:22.1251900Z mov.b32 %r9779, %r9236; 2026-02-21T12:31:22.1251961Z mov.b32 %r9780, %r9236; 2026-02-21T12:31:22.1252016Z mov.b32 %r9781, %r9236; 2026-02-21T12:31:22.1252076Z mov.b32 %r9782, %r9236; 2026-02-21T12:31:22.1252131Z mov.b32 %r9783, %r9236; 2026-02-21T12:31:22.1252191Z mov.b32 %r9784, %r9236; 2026-02-21T12:31:22.1252247Z mov.b32 %r9785, %r9236; 2026-02-21T12:31:22.1252307Z mov.b32 %r9786, %r9236; 2026-02-21T12:31:22.1252413Z mov.b32 %r9787, %r9236; 2026-02-21T12:31:22.1252471Z mov.b32 %r9788, %r9236; 2026-02-21T12:31:22.1252530Z mov.b32 %r9789, %r9236; 2026-02-21T12:31:22.1252587Z mov.b32 %r9790, %r9236; 2026-02-21T12:31:22.1252642Z mov.b32 %r9791, %r9236; 2026-02-21T12:31:22.1252698Z mov.b32 %r9792, %r9236; 2026-02-21T12:31:22.1252758Z mov.b32 %r9793, %r9236; 2026-02-21T12:31:22.1252814Z mov.b32 %r9794, %r9236; 2026-02-21T12:31:22.1252871Z mov.b32 %r9795, %r9236; 2026-02-21T12:31:22.1252928Z mov.b32 %r9796, %r9236; 2026-02-21T12:31:22.1252989Z mov.b32 %r9797, %r9236; 2026-02-21T12:31:22.1253046Z mov.b32 %r9798, %r9236; 2026-02-21T12:31:22.1253102Z mov.b32 %r9799, %r9236; 2026-02-21T12:31:22.1253163Z mov.b32 %r9800, %r9236; 2026-02-21T12:31:22.1253218Z mov.b32 %r9801, %r9236; 2026-02-21T12:31:22.1253277Z mov.b32 %r9802, %r9236; 2026-02-21T12:31:22.1253335Z mov.b32 %r9803, %r9236; 2026-02-21T12:31:22.1253391Z mov.b32 %r9804, %r9236; 2026-02-21T12:31:22.1253462Z mov.b32 %r9805, %r9236; 2026-02-21T12:31:22.1253522Z mov.b32 %r9806, %r9236; 2026-02-21T12:31:22.1253582Z mov.b32 %r9807, %r9236; 2026-02-21T12:31:22.1253639Z mov.b32 %r9808, %r9236; 2026-02-21T12:31:22.1253695Z mov.b32 %r9809, %r9236; 2026-02-21T12:31:22.1253756Z mov.b32 %r9810, %r9236; 2026-02-21T12:31:22.1253812Z mov.b32 %r9811, %r9236; 2026-02-21T12:31:22.1253868Z mov.b32 %r9812, %r9236; 2026-02-21T12:31:22.1253924Z mov.b32 %r9813, %r9236; 
2026-02-21T12:31:22.1253983Z mov.b32 %r9814, %r9236; 2026-02-21T12:31:22.1254037Z mov.b32 %r9815, %r9236; 2026-02-21T12:31:22.1254096Z mov.b32 %r9816, %r9236; 2026-02-21T12:31:22.1254157Z mov.b32 %r9817, %r9236; 2026-02-21T12:31:22.1254211Z mov.b32 %r9818, %r9236; 2026-02-21T12:31:22.1254267Z mov.b32 %r9819, %r9236; 2026-02-21T12:31:22.1254323Z mov.b32 %r9820, %r9236; 2026-02-21T12:31:22.1254386Z mov.b32 %r9821, %r9236; 2026-02-21T12:31:22.1254443Z mov.b32 %r9822, %r9236; 2026-02-21T12:31:22.1254498Z mov.b32 %r9823, %r9236; 2026-02-21T12:31:22.1254558Z mov.b32 %r9824, %r9236; 2026-02-21T12:31:22.1254613Z mov.b32 %r9825, %r9236; 2026-02-21T12:31:22.1254668Z mov.b32 %r9826, %r9236; 2026-02-21T12:31:22.1254726Z mov.b32 %r9827, %r9236; 2026-02-21T12:31:22.1254782Z mov.b32 %r9828, %r9236; 2026-02-21T12:31:22.1254838Z mov.b32 %r9829, %r9236; 2026-02-21T12:31:22.1254897Z mov.b32 %r9830, %r9236; 2026-02-21T12:31:22.1254955Z mov.b32 %r9831, %r9236; 2026-02-21T12:31:22.1255011Z mov.b32 %r9832, %r9236; 2026-02-21T12:31:22.1255067Z mov.b32 %r9833, %r9236; 2026-02-21T12:31:22.1255124Z mov.b32 %r9834, %r9236; 2026-02-21T12:31:22.1255244Z mov.b32 %r9835, %r9236; 2026-02-21T12:31:22.1255300Z mov.b32 %r9836, %r9236; 2026-02-21T12:31:22.1255368Z mov.b32 %r9837, %r9236; 2026-02-21T12:31:22.1255429Z mov.b32 %r9838, %r9236; 2026-02-21T12:31:22.1255533Z mov.b32 %r9839, %r9236; 2026-02-21T12:31:22.1255589Z mov.b32 %r9840, %r9236; 2026-02-21T12:31:22.1255647Z mov.b32 %r9841, %r9236; 2026-02-21T12:31:22.1255701Z mov.b32 %r9842, %r9236; 2026-02-21T12:31:22.1255759Z mov.b32 %r9843, %r9236; 2026-02-21T12:31:22.1255814Z mov.b32 %r9844, %r9236; 2026-02-21T12:31:22.1255873Z mov.b32 %r9845, %r9236; 2026-02-21T12:31:22.1255928Z mov.b32 %r9846, %r9236; 2026-02-21T12:31:22.1255982Z mov.b32 %r9847, %r9236; 2026-02-21T12:31:22.1256041Z mov.b32 %r9848, %r9236; 2026-02-21T12:31:22.1256096Z mov.b32 %r9849, %r9236; 2026-02-21T12:31:22.1256151Z mov.b32 %r9850, %r9236; 2026-02-21T12:31:22.1256208Z mov.b32 
%r9851, %r9236; 2026-02-21T12:31:22.1256267Z mov.b32 %r9852, %r9236; 2026-02-21T12:31:22.1256324Z mov.b32 %r9853, %r9236; 2026-02-21T12:31:22.1256622Z mov.b32 %r9854, %r9236; 2026-02-21T12:31:22.1256711Z mov.b32 %r9855, %r9236; 2026-02-21T12:31:22.1256770Z mov.b32 %r9856, %r9236; 2026-02-21T12:31:22.1256827Z mov.b32 %r9857, %r9236; 2026-02-21T12:31:22.1256888Z mov.b32 %r9858, %r9236; 2026-02-21T12:31:22.1256948Z mov.b32 %r9859, %r9236; 2026-02-21T12:31:22.1257002Z mov.b32 %r9860, %r9236; 2026-02-21T12:31:22.1257144Z mov.b32 %r9861, %r9236; 2026-02-21T12:31:22.1257209Z mov.b32 %r9862, %r9236; 2026-02-21T12:31:22.1257266Z mov.b32 %r9863, %r9236; 2026-02-21T12:31:22.1257322Z mov.b32 %r9864, %r9236; 2026-02-21T12:31:22.1257379Z mov.b32 %r9865, %r9236; 2026-02-21T12:31:22.1257439Z mov.b32 %r9866, %r9236; 2026-02-21T12:31:22.1257495Z mov.b32 %r9867, %r9236; 2026-02-21T12:31:22.1257551Z mov.b32 %r9868, %r9236; 2026-02-21T12:31:22.1257610Z mov.b32 %r9869, %r9236; 2026-02-21T12:31:22.1257666Z mov.b32 %r9870, %r9236; 2026-02-21T12:31:22.1257722Z mov.b32 %r9871, %r9236; 2026-02-21T12:31:22.1257786Z mov.b32 %r9872, %r9236; 2026-02-21T12:31:22.1257842Z mov.b32 %r9873, %r9236; 2026-02-21T12:31:22.1257897Z mov.b32 %r9874, %r9236; 2026-02-21T12:31:22.1257952Z mov.b32 %r9875, %r9236; 2026-02-21T12:31:22.1258009Z mov.b32 %r9876, %r9236; 2026-02-21T12:31:22.1258068Z mov.b32 %r9877, %r9236; 2026-02-21T12:31:22.1258126Z mov.b32 %r9878, %r9236; 2026-02-21T12:31:22.1258185Z mov.b32 %r9879, %r9236; 2026-02-21T12:31:22.1258243Z mov.b32 %r9880, %r9236; 2026-02-21T12:31:22.1258298Z mov.b32 %r9881, %r9236; 2026-02-21T12:31:22.1258355Z mov.b32 %r9882, %r9236; 2026-02-21T12:31:22.1258414Z mov.b32 %r9883, %r9236; 2026-02-21T12:31:22.1258470Z mov.b32 %r9884, %r9236; 2026-02-21T12:31:22.1258526Z mov.b32 %r9885, %r9236; 2026-02-21T12:31:22.1258582Z mov.b32 %r9886, %r9236; 2026-02-21T12:31:22.1258637Z mov.b32 %r9887, %r9236; 2026-02-21T12:31:22.1258694Z mov.b32 %r9888, %r9236; 
2026-02-21T12:31:22.1258750Z mov.b32 %r9889, %r9236; 2026-02-21T12:31:22.1258812Z mov.b32 %r9890, %r9236; 2026-02-21T12:31:22.1258872Z mov.b32 %r9891, %r9236; 2026-02-21T12:31:22.1258929Z mov.b32 %r9892, %r9236; 2026-02-21T12:31:22.1258986Z mov.b32 %r9893, %r9236; 2026-02-21T12:31:22.1259042Z mov.b32 %r9894, %r9236; 2026-02-21T12:31:22.1259100Z mov.b32 %r9895, %r9236; 2026-02-21T12:31:22.1259155Z mov.b32 %r9896, %r9236; 2026-02-21T12:31:22.1259215Z mov.b32 %r9897, %r9236; 2026-02-21T12:31:22.1259274Z mov.b32 %r9898, %r9236; 2026-02-21T12:31:22.1259340Z mov.b32 %r9899, %r9236; 2026-02-21T12:31:22.1259401Z mov.b32 %r9900, %r9236; 2026-02-21T12:31:22.1259458Z mov.b32 %r9901, %r9236; 2026-02-21T12:31:22.1259514Z mov.b32 %r9902, %r9236; 2026-02-21T12:31:22.1259570Z mov.b32 %r9903, %r9236; 2026-02-21T12:31:22.1259629Z mov.b32 %r9904, %r9236; 2026-02-21T12:31:22.1259686Z mov.b32 %r9905, %r9236; 2026-02-21T12:31:22.1259742Z mov.b32 %r9906, %r9236; 2026-02-21T12:31:22.1259804Z mov.b32 %r9907, %r9236; 2026-02-21T12:31:22.1259859Z mov.b32 %r9908, %r9236; 2026-02-21T12:31:22.1260006Z mov.b32 %r9909, %r9236; 2026-02-21T12:31:22.1260066Z mov.b32 %r9910, %r9236; 2026-02-21T12:31:22.1260130Z mov.b32 %r9911, %r9236; 2026-02-21T12:31:22.1260185Z mov.b32 %r9912, %r9236; 2026-02-21T12:31:22.1260242Z mov.b32 %r9913, %r9236; 2026-02-21T12:31:22.1260363Z mov.b32 %r9914, %r9236; 2026-02-21T12:31:22.1260421Z mov.b32 %r9915, %r9236; 2026-02-21T12:31:22.1260477Z mov.b32 %r9916, %r9236; 2026-02-21T12:31:22.1260538Z mov.b32 %r9917, %r9236; 2026-02-21T12:31:22.1260595Z mov.b32 %r9918, %r9236; 2026-02-21T12:31:22.1260651Z mov.b32 %r9919, %r9236; 2026-02-21T12:31:22.1260709Z mov.b32 %r9920, %r9236; 2026-02-21T12:31:22.1260769Z mov.b32 %r9921, %r9236; 2026-02-21T12:31:22.1260827Z mov.b32 %r9922, %r9236; 2026-02-21T12:31:22.1260883Z mov.b32 %r9923, %r9236; 2026-02-21T12:31:22.1260942Z mov.b32 %r9924, %r9236; 2026-02-21T12:31:22.1260998Z mov.b32 %r9925, %r9236; 2026-02-21T12:31:22.1261054Z mov.b32 
%r9926, %r9236; 2026-02-21T12:31:22.1261111Z mov.b32 %r9927, %r9236; 2026-02-21T12:31:22.1261243Z mov.b32 %r9928, %r9236; 2026-02-21T12:31:22.1261311Z mov.b32 %r9929, %r9236; 2026-02-21T12:31:22.1261369Z mov.b32 %r9930, %r9236; 2026-02-21T12:31:22.1261428Z mov.b32 %r9931, %r9236; 2026-02-21T12:31:22.1261484Z mov.b32 %r9932, %r9236; 2026-02-21T12:31:22.1261543Z mov.b32 %r9933, %r9236; 2026-02-21T12:31:22.1261599Z mov.b32 %r9934, %r9236; 2026-02-21T12:31:22.1261659Z mov.b32 %r9935, %r9236; 2026-02-21T12:31:22.1261765Z mov.b32 %r9936, %r9236; 2026-02-21T12:31:22.1261824Z mov.b32 %r9937, %r9236; 2026-02-21T12:31:22.1261882Z mov.b32 %r9938, %r9236; 2026-02-21T12:31:22.1261938Z mov.b32 %r9939, %r9236; 2026-02-21T12:31:22.1261994Z mov.b32 %r9940, %r9236; 2026-02-21T12:31:22.1262050Z mov.b32 %r9941, %r9236; 2026-02-21T12:31:22.1262108Z mov.b32 %r9942, %r9236; 2026-02-21T12:31:22.1262165Z mov.b32 %r9943, %r9236; 2026-02-21T12:31:22.1262221Z mov.b32 %r9944, %r9236; 2026-02-21T12:31:22.1262280Z mov.b32 %r9945, %r9236; 2026-02-21T12:31:22.1262339Z mov.b32 %r9946, %r9236; 2026-02-21T12:31:22.1262398Z mov.b32 %r9947, %r9236; 2026-02-21T12:31:22.1262454Z mov.b32 %r9948, %r9236; 2026-02-21T12:31:22.1262513Z mov.b32 %r9949, %r9236; 2026-02-21T12:31:22.1262569Z mov.b32 %r9950, %r9236; 2026-02-21T12:31:22.1262628Z mov.b32 %r9951, %r9236; 2026-02-21T12:31:22.1262686Z mov.b32 %r9952, %r9236; 2026-02-21T12:31:22.1262741Z mov.b32 %r9953, %r9236; 2026-02-21T12:31:22.1262799Z mov.b32 %r9954, %r9236; 2026-02-21T12:31:22.1262856Z mov.b32 %r9955, %r9236; 2026-02-21T12:31:22.1262915Z mov.b32 %r9956, %r9236; 2026-02-21T12:31:22.1262971Z mov.b32 %r9957, %r9236; 2026-02-21T12:31:22.1263027Z mov.b32 %r9958, %r9236; 2026-02-21T12:31:22.1263087Z mov.b32 %r9959, %r9236; 2026-02-21T12:31:22.1263145Z mov.b32 %r9960, %r9236; 2026-02-21T12:31:22.1263202Z mov.b32 %r9961, %r9236; 2026-02-21T12:31:22.1263265Z mov.b32 %r9962, %r9236; 2026-02-21T12:31:22.1263321Z mov.b32 %r9963, %r9236; 
2026-02-21T12:31:22.1263377Z mov.b32 %r9964, %r9236; 2026-02-21T12:31:22.1263438Z mov.b32 %r9965, %r9236; 2026-02-21T12:31:22.1263498Z mov.b32 %r9966, %r9236; 2026-02-21T12:31:22.1263557Z mov.b32 %r9967, %r9236; 2026-02-21T12:31:22.1263614Z mov.b32 %r9968, %r9236; 2026-02-21T12:31:22.1263673Z mov.b32 %r9969, %r9236; 2026-02-21T12:31:22.1263743Z mov.b32 %r9970, %r9236; 2026-02-21T12:31:22.1263804Z mov.b32 %r9971, %r9236; 2026-02-21T12:31:22.1263861Z mov.b32 %r9972, %r9236; 2026-02-21T12:31:22.1263923Z mov.b32 %r9973, %r9236; 2026-02-21T12:31:22.1263979Z mov.b32 %r9974, %r9236; 2026-02-21T12:31:22.1264035Z mov.b32 %r9975, %r9236; 2026-02-21T12:31:22.1264095Z mov.b32 %r9976, %r9236; 2026-02-21T12:31:22.1264151Z mov.b32 %r9977, %r9236; 2026-02-21T12:31:22.1264207Z mov.b32 %r9978, %r9236; 2026-02-21T12:31:22.1264262Z mov.b32 %r9979, %r9236; 2026-02-21T12:31:22.1264322Z mov.b32 %r9980, %r9236; 2026-02-21T12:31:22.1264378Z mov.b32 %r9981, %r9236; 2026-02-21T12:31:22.1264434Z mov.b32 %r9982, %r9236; 2026-02-21T12:31:22.1264492Z mov.b32 %r9983, %r9236; 2026-02-21T12:31:22.1264617Z mov.b32 %r9984, %r9236; 2026-02-21T12:31:22.1264673Z mov.b32 %r9985, %r9236; 2026-02-21T12:31:22.1264729Z mov.b32 %r9986, %r9236; 2026-02-21T12:31:22.1264789Z mov.b32 %r9987, %r9236; 2026-02-21T12:31:22.1264895Z mov.b32 %r9988, %r9236; 2026-02-21T12:31:22.1264950Z mov.b32 %r9989, %r9236; 2026-02-21T12:31:22.1265009Z mov.b32 %r9990, %r9236; 2026-02-21T12:31:22.1265065Z mov.b32 %r9991, %r9236; 2026-02-21T12:31:22.1265124Z mov.b32 %r9992, %r9236; 2026-02-21T12:31:22.1265182Z mov.b32 %r9993, %r9236; 2026-02-21T12:31:22.1265243Z mov.b32 %r9994, %r9236; 2026-02-21T12:31:22.1265300Z mov.b32 %r9995, %r9236; 2026-02-21T12:31:22.1265356Z mov.b32 %r9996, %r9236; 2026-02-21T12:31:22.1265414Z mov.b32 %r9997, %r9236; 2026-02-21T12:31:22.1265469Z mov.b32 %r9998, %r9236; 2026-02-21T12:31:22.1265524Z mov.b32 %r9999, %r9236; 2026-02-21T12:31:22.1265585Z mov.b32 %r10000, %r9236; 2026-02-21T12:31:22.1265648Z mov.b32 
%r10001, %r9236; 2026-02-21T12:31:22.1265709Z mov.b32 %r10002, %r9236; 2026-02-21T12:31:22.1265826Z mov.b32 %r10003, %r9236; 2026-02-21T12:31:22.1265891Z mov.b32 %r10004, %r9236; 2026-02-21T12:31:22.1265948Z mov.b32 %r10005, %r9236; 2026-02-21T12:31:22.1266005Z mov.b32 %r10006, %r9236; 2026-02-21T12:31:22.1266068Z mov.b32 %r10007, %r9236; 2026-02-21T12:31:22.1266125Z mov.b32 %r10008, %r9236; 2026-02-21T12:31:22.1266181Z mov.b32 %r10009, %r9236; 2026-02-21T12:31:22.1266283Z mov.b32 %r10010, %r9236; 2026-02-21T12:31:22.1266347Z mov.b32 %r10011, %r9236; 2026-02-21T12:31:22.1266404Z mov.b32 %r10012, %r9236; 2026-02-21T12:31:22.1266589Z mov.b32 %r10013, %r9236; 2026-02-21T12:31:22.1266652Z mov.b32 %r10014, %r9236; 2026-02-21T12:31:22.1266709Z mov.b32 %r10015, %r9236; 2026-02-21T12:31:22.1266765Z mov.b32 %r10016, %r9236; 2026-02-21T12:31:22.1266821Z mov.b32 %r10017, %r9236; 2026-02-21T12:31:22.1266881Z mov.b32 %r10018, %r9236; 2026-02-21T12:31:22.1266937Z mov.b32 %r10019, %r9236; 2026-02-21T12:31:22.1266997Z mov.b32 %r10020, %r9236; 2026-02-21T12:31:22.1267059Z mov.b32 %r10021, %r9236; 2026-02-21T12:31:22.1267115Z mov.b32 %r10022, %r9236; 2026-02-21T12:31:22.1267171Z mov.b32 %r10023, %r9236; 2026-02-21T12:31:22.1267227Z mov.b32 %r10024, %r9236; 2026-02-21T12:31:22.1267289Z mov.b32 %r10025, %r9236; 2026-02-21T12:31:22.1267345Z mov.b32 %r10026, %r9236; 2026-02-21T12:31:22.1267403Z mov.b32 %r10027, %r9236; 2026-02-21T12:31:22.1267464Z mov.b32 %r10028, %r9236; 2026-02-21T12:31:22.1267521Z mov.b32 %r10029, %r9236; 2026-02-21T12:31:22.1267577Z mov.b32 %r10030, %r9236; 2026-02-21T12:31:22.1267633Z mov.b32 %r10031, %r9236; 2026-02-21T12:31:22.1267692Z mov.b32 %r10032, %r9236; 2026-02-21T12:31:22.1267748Z mov.b32 %r10033, %r9236; 2026-02-21T12:31:22.1267804Z mov.b32 %r10034, %r9236; 2026-02-21T12:31:22.1267864Z mov.b32 %r10035, %r9236; 2026-02-21T12:31:22.1267921Z mov.b32 %r10036, %r9236; 2026-02-21T12:31:22.1267977Z mov.b32 %r10037, %r9236; 2026-02-21T12:31:22.1268034Z mov.b32 
%r10038, %r9236; 2026-02-21T12:31:22.1268105Z mov.b32 %r10039, %r9236; 2026-02-21T12:31:22.1268167Z mov.b32 %r10040, %r9236; 2026-02-21T12:31:22.1268282Z mov.b32 %r10041, %r9236; 2026-02-21T12:31:22.1268359Z mov.b32 %r10042, %r9236; 2026-02-21T12:31:22.1268422Z mov.b32 %r10043, %r9236; 2026-02-21T12:31:22.1268479Z mov.b32 %r10044, %r9236; 2026-02-21T12:31:22.1268538Z mov.b32 %r10045, %r9236; 2026-02-21T12:31:22.1268603Z mov.b32 %r10046, %r9236; 2026-02-21T12:31:22.1268663Z mov.b32 %r10047, %r9236; 2026-02-21T12:31:22.1268721Z mov.b32 %r10048, %r9236; 2026-02-21T12:31:22.1268782Z mov.b32 %r10049, %r9236; 2026-02-21T12:31:22.1268838Z mov.b32 %r10050, %r9236; 2026-02-21T12:31:22.1268896Z mov.b32 %r10051, %r9236; 2026-02-21T12:31:22.1268953Z mov.b32 %r10052, %r9236; 2026-02-21T12:31:22.1269013Z mov.b32 %r10053, %r9236; 2026-02-21T12:31:22.1269069Z mov.b32 %r10054, %r9236; 2026-02-21T12:31:22.1269129Z mov.b32 %r10055, %r9236; 2026-02-21T12:31:22.1269191Z mov.b32 %r10056, %r9236; 2026-02-21T12:31:22.1269335Z mov.b32 %r10057, %r9236; 2026-02-21T12:31:22.1269394Z mov.b32 %r10058, %r9236; 2026-02-21T12:31:22.1269456Z mov.b32 %r10059, %r9236; 2026-02-21T12:31:22.1269526Z mov.b32 %r10060, %r9236; 2026-02-21T12:31:22.1269586Z mov.b32 %r10061, %r9236; 2026-02-21T12:31:22.1269707Z mov.b32 %r10062, %r9236; 2026-02-21T12:31:22.1269780Z mov.b32 %r10063, %r9236; 2026-02-21T12:31:22.1269839Z mov.b32 %r10064, %r9236; 2026-02-21T12:31:22.1269899Z mov.b32 %r10065, %r9236; 2026-02-21T12:31:22.1269959Z mov.b32 %r10066, %r9236; 2026-02-21T12:31:22.1270018Z mov.b32 %r10067, %r9236; 2026-02-21T12:31:22.1270076Z mov.b32 %r10068, %r9236; 2026-02-21T12:31:22.1270133Z mov.b32 %r10069, %r9236; 2026-02-21T12:31:22.1270195Z mov.b32 %r10070, %r9236; 2026-02-21T12:31:22.1270252Z mov.b32 %r10071, %r9236; 2026-02-21T12:31:22.1270312Z mov.b32 %r10072, %r9236; 2026-02-21T12:31:22.1270372Z mov.b32 %r10073, %r9236; 2026-02-21T12:31:22.1270428Z mov.b32 %r10074, %r9236; 2026-02-21T12:31:22.1270500Z mov.b32 
%r10075, %r9236; 2026-02-21T12:31:22.1270619Z mov.b32 %r10076, %r9236; 2026-02-21T12:31:22.1270683Z mov.b32 %r10077, %r9236; 2026-02-21T12:31:22.1270740Z mov.b32 %r10078, %r9236; 2026-02-21T12:31:22.1270800Z mov.b32 %r10079, %r9236; 2026-02-21T12:31:22.1270864Z mov.b32 %r10080, %r9236; 2026-02-21T12:31:22.1270920Z mov.b32 %r10081, %r9236; 2026-02-21T12:31:22.1270977Z mov.b32 %r10082, %r9236; 2026-02-21T12:31:22.1271095Z mov.b32 %r10083, %r9236; 2026-02-21T12:31:22.1271159Z mov.b32 %r10084, %r9236; 2026-02-21T12:31:22.1271221Z mov.b32 %r10085, %r9236; 2026-02-21T12:31:22.1271278Z mov.b32 %r10086, %r9236; 2026-02-21T12:31:22.1271339Z mov.b32 %r10087, %r9236; 2026-02-21T12:31:22.1271396Z mov.b32 %r10088, %r9236; 2026-02-21T12:31:22.1271451Z mov.b32 %r10089, %r9236; 2026-02-21T12:31:22.1271507Z mov.b32 %r10090, %r9236; 2026-02-21T12:31:22.1271568Z mov.b32 %r10091, %r9236; 2026-02-21T12:31:22.1271626Z mov.b32 %r10092, %r9236; 2026-02-21T12:31:22.1271683Z mov.b32 %r10093, %r9236; 2026-02-21T12:31:22.1271747Z mov.b32 %r10094, %r9236; 2026-02-21T12:31:22.1271805Z mov.b32 %r10095, %r9236; 2026-02-21T12:31:22.1271861Z mov.b32 %r10096, %r9236; 2026-02-21T12:31:22.1271918Z mov.b32 %r10097, %r9236; 2026-02-21T12:31:22.1271981Z mov.b32 %r10098, %r9236; 2026-02-21T12:31:22.1272038Z mov.b32 %r10099, %r9236; 2026-02-21T12:31:22.1272096Z mov.b32 %r10100, %r9236; 2026-02-21T12:31:22.1272157Z mov.b32 %r10101, %r9236; 2026-02-21T12:31:22.1272218Z mov.b32 %r10102, %r9236; 2026-02-21T12:31:22.1272275Z mov.b32 %r10103, %r9236; 2026-02-21T12:31:22.1272336Z mov.b32 %r10104, %r9236; 2026-02-21T12:31:22.1272393Z mov.b32 %r10105, %r9236; 2026-02-21T12:31:22.1272452Z mov.b32 %r10106, %r9236; 2026-02-21T12:31:22.1272509Z mov.b32 %r10107, %r9236; 2026-02-21T12:31:22.1272572Z mov.b32 %r10108, %r9236; 2026-02-21T12:31:22.1272628Z mov.b32 %r10109, %r9236; 2026-02-21T12:31:22.1272686Z mov.b32 %r10110, %r9236; 2026-02-21T12:31:22.1272747Z mov.b32 %r10111, %r9236; 2026-02-21T12:31:22.1272809Z mov.b32 
%r10112, %r9236; 2026-02-21T12:31:22.1272867Z mov.b32 %r10113, %r9236; 2026-02-21T12:31:22.1272923Z mov.b32 %r10114, %r9236; 2026-02-21T12:31:22.1272985Z mov.b32 %r10115, %r9236; 2026-02-21T12:31:22.1273042Z mov.b32 %r10116, %r9236; 2026-02-21T12:31:22.1273100Z mov.b32 %r10117, %r9236; 2026-02-21T12:31:22.1273160Z mov.b32 %r10118, %r9236; 2026-02-21T12:31:22.1273216Z mov.b32 %r10119, %r9236; 2026-02-21T12:31:22.1273275Z mov.b32 %r10120, %r9236; 2026-02-21T12:31:22.1273333Z mov.b32 %r10121, %r9236; 2026-02-21T12:31:22.1273394Z mov.b32 %r10122, %r9236; 2026-02-21T12:31:22.1273452Z mov.b32 %r10123, %r9236; 2026-02-21T12:31:22.1273509Z mov.b32 %r10124, %r9236; 2026-02-21T12:31:22.1273568Z mov.b32 %r10125, %r9236; 2026-02-21T12:31:22.1273627Z mov.b32 %r10126, %r9236; 2026-02-21T12:31:22.1273685Z mov.b32 %r10127, %r9236; 2026-02-21T12:31:22.1273741Z mov.b32 %r10128, %r9236; 2026-02-21T12:31:22.1273802Z mov.b32 %r10129, %r9236; 2026-02-21T12:31:22.1273936Z mov.b32 %r10130, %r9236; 2026-02-21T12:31:22.1273997Z mov.b32 %r10131, %r9236; 2026-02-21T12:31:22.1274060Z mov.b32 %r10132, %r9236; 2026-02-21T12:31:22.1274117Z mov.b32 %r10133, %r9236; 2026-02-21T12:31:22.1274174Z mov.b32 %r10134, %r9236; 2026-02-21T12:31:22.1274300Z mov.b32 %r10135, %r9236; 2026-02-21T12:31:22.1274361Z mov.b32 %r10136, %r9236; 2026-02-21T12:31:22.1274418Z mov.b32 %r10137, %r9236; 2026-02-21T12:31:22.1274477Z mov.b32 %r10138, %r9236; 2026-02-21T12:31:22.1274537Z mov.b32 %r10139, %r9236; 2026-02-21T12:31:22.1274595Z mov.b32 %r10140, %r9236; 2026-02-21T12:31:22.1274650Z mov.b32 %r10141, %r9236; 2026-02-21T12:31:22.1274708Z mov.b32 %r10142, %r9236; 2026-02-21T12:31:22.1274768Z mov.b32 %r10143, %r9236; 2026-02-21T12:31:22.1274829Z mov.b32 %r10144, %r9236; 2026-02-21T12:31:22.1274886Z mov.b32 %r10145, %r9236; 2026-02-21T12:31:22.1274950Z mov.b32 %r10146, %r9236; 2026-02-21T12:31:22.1275008Z mov.b32 %r10147, %r9236; 2026-02-21T12:31:22.1275081Z mov.b32 %r10148, %r9236; 2026-02-21T12:31:22.1275191Z mov.b32 
%r10149, %r9236; 2026-02-21T12:31:22.1275251Z mov.b32 %r10150, %r9236; 2026-02-21T12:31:22.1275309Z mov.b32 %r10151, %r9236; 2026-02-21T12:31:22.1275369Z mov.b32 %r10152, %r9236; 2026-02-21T12:31:22.1275433Z mov.b32 %r10153, %r9236; 2026-02-21T12:31:22.1275489Z mov.b32 %r10154, %r9236; 2026-02-21T12:31:22.1275546Z mov.b32 %r10155, %r9236; 2026-02-21T12:31:22.1275652Z mov.b32 %r10156, %r9236; 2026-02-21T12:31:22.1275713Z mov.b32 %r10157, %r9236; 2026-02-21T12:31:22.1275771Z mov.b32 %r10158, %r9236; 2026-02-21T12:31:22.1275828Z mov.b32 %r10159, %r9236; 2026-02-21T12:31:22.1275889Z mov.b32 %r10160, %r9236; 2026-02-21T12:31:22.1275946Z mov.b32 %r10161, %r9236; 2026-02-21T12:31:22.1276003Z mov.b32 %r10162, %r9236; 2026-02-21T12:31:22.1276066Z mov.b32 %r10163, %r9236; 2026-02-21T12:31:22.1276121Z mov.b32 %r10164, %r9236; 2026-02-21T12:31:22.1276178Z mov.b32 %r10165, %r9236; 2026-02-21T12:31:22.1276235Z mov.b32 %r10166, %r9236; 2026-02-21T12:31:22.1276301Z mov.b32 %r10167, %r9236; 2026-02-21T12:31:22.1276358Z mov.b32 %r10168, %r9236; 2026-02-21T12:31:22.1276416Z mov.b32 %r10169, %r9236; 2026-02-21T12:31:22.1276599Z mov.b32 %r10170, %r9236; 2026-02-21T12:31:22.1276665Z mov.b32 %r10171, %r9236; 2026-02-21T12:31:22.1276723Z mov.b32 %r10172, %r9236; 2026-02-21T12:31:22.1276779Z mov.b32 %r10173, %r9236; 2026-02-21T12:31:22.1276847Z mov.b32 %r10174, %r9236; 2026-02-21T12:31:22.1276912Z mov.b32 %r10175, %r9236; 2026-02-21T12:31:22.1276971Z mov.b32 %r10176, %r9236; 2026-02-21T12:31:22.1277031Z mov.b32 %r10177, %r9236; 2026-02-21T12:31:22.1277089Z mov.b32 %r10178, %r9236; 2026-02-21T12:31:22.1277145Z mov.b32 %r10179, %r9236; 2026-02-21T12:31:22.1277202Z mov.b32 %r10180, %r9236; 2026-02-21T12:31:22.1277264Z mov.b32 %r10181, %r9236; 2026-02-21T12:31:22.1277322Z mov.b32 %r10182, %r9236; 2026-02-21T12:31:22.1277380Z mov.b32 %r10183, %r9236; 2026-02-21T12:31:22.1277441Z mov.b32 %r10184, %r9236; 2026-02-21T12:31:22.1277500Z mov.b32 %r10185, %r9236; 2026-02-21T12:31:22.1277560Z mov.b32 
%r10186, %r9236; 2026-02-21T12:31:22.1277617Z mov.b32 %r10187, %r9236; 2026-02-21T12:31:22.1277677Z mov.b32 %r10188, %r9236; 2026-02-21T12:31:22.1277733Z mov.b32 %r10189, %r9236; 2026-02-21T12:31:22.1277793Z mov.b32 %r10190, %r9236; 2026-02-21T12:31:22.1277853Z mov.b32 %r10191, %r9236; 2026-02-21T12:31:22.1277911Z mov.b32 %r10192, %r9236; 2026-02-21T12:31:22.1277970Z mov.b32 %r10193, %r9236; 2026-02-21T12:31:22.1278032Z mov.b32 %r10194, %r9236; 2026-02-21T12:31:22.1278089Z mov.b32 %r10195, %r9236; 2026-02-21T12:31:22.1278147Z mov.b32 %r10196, %r9236; 2026-02-21T12:31:22.1278203Z mov.b32 %r10197, %r9236; 2026-02-21T12:31:22.1278263Z mov.b32 %r10198, %r9236; 2026-02-21T12:31:22.1278319Z mov.b32 %r10199, %r9236; 2026-02-21T12:31:22.1278376Z mov.b32 %r10200, %r9236; 2026-02-21T12:31:22.1278440Z mov.b32 %r10201, %r9236; 2026-02-21T12:31:22.1278497Z mov.b32 %r10202, %r9236; 2026-02-21T12:31:22.1278646Z mov.b32 %r10203, %r9236; 2026-02-21T12:31:22.1278708Z mov.b32 %r10204, %r9236; 2026-02-21T12:31:22.1278768Z mov.b32 %r10205, %r9236; 2026-02-21T12:31:22.1278826Z mov.b32 %r10206, %r9236; 2026-02-21T12:31:22.1278883Z mov.b32 %r10207, %r9236; 2026-02-21T12:31:22.1279006Z mov.b32 %r10208, %r9236; 2026-02-21T12:31:22.1279064Z mov.b32 %r10209, %r9236; 2026-02-21T12:31:22.1279120Z mov.b32 %r10210, %r9236; 2026-02-21T12:31:22.1279179Z mov.b32 %r10211, %r9236; 2026-02-21T12:31:22.1279240Z mov.b32 %r10212, %r9236; 2026-02-21T12:31:22.1279297Z mov.b32 %r10213, %r9236; 2026-02-21T12:31:22.1279354Z mov.b32 %r10214, %r9236; 2026-02-21T12:31:22.1279415Z mov.b32 %r10215, %r9236; 2026-02-21T12:31:22.1279472Z mov.b32 %r10216, %r9236; 2026-02-21T12:31:22.1279529Z mov.b32 %r10217, %r9236; 2026-02-21T12:31:22.1279586Z mov.b32 %r10218, %r9236; 2026-02-21T12:31:22.1279646Z mov.b32 %r10219, %r9236; 2026-02-21T12:31:22.1279703Z mov.b32 %r10220, %r9236; 2026-02-21T12:31:22.1279759Z mov.b32 %r10221, %r9236; 2026-02-21T12:31:22.1279896Z mov.b32 %r10222, %r9236; 2026-02-21T12:31:22.1279959Z mov.b32 
%r10223, %r9236; 2026-02-21T12:31:22.1280016Z mov.b32 %r10224, %r9236; 2026-02-21T12:31:22.1280074Z mov.b32 %r10225, %r9236; 2026-02-21T12:31:22.1280136Z mov.b32 %r10226, %r9236; 2026-02-21T12:31:22.1280193Z mov.b32 %r10227, %r9236; 2026-02-21T12:31:22.1280251Z mov.b32 %r10228, %r9236; 2026-02-21T12:31:22.1280314Z mov.b32 %r10229, %r9236; 2026-02-21T12:31:22.1280430Z mov.b32 %r10230, %r9236; 2026-02-21T12:31:22.1280490Z mov.b32 %r10231, %r9236; 2026-02-21T12:31:22.1280547Z mov.b32 %r10232, %r9236; 2026-02-21T12:31:22.1280608Z mov.b32 %r10233, %r9236; 2026-02-21T12:31:22.1280665Z mov.b32 %r10234, %r9236; 2026-02-21T12:31:22.1280722Z mov.b32 %r10235, %r9236; 2026-02-21T12:31:22.1280783Z mov.b32 %r10236, %r9236; 2026-02-21T12:31:22.1280840Z mov.b32 %r10237, %r9236; 2026-02-21T12:31:22.1280897Z mov.b32 %r10238, %r9236; 2026-02-21T12:31:22.1280958Z mov.b32 %r10239, %r9236; 2026-02-21T12:31:22.1281020Z mov.b32 %r10240, %r9236; 2026-02-21T12:31:22.1281077Z mov.b32 %r10241, %r9236; 2026-02-21T12:31:22.1281135Z mov.b32 %r10242, %r9236; 2026-02-21T12:31:22.1281195Z mov.b32 %r10243, %r9236; 2026-02-21T12:31:22.1281253Z mov.b32 %r10244, %r9236; 2026-02-21T12:31:22.1281315Z mov.b32 %r10245, %r9236; 2026-02-21T12:31:22.1281387Z mov.b32 %r10246, %r9236; 2026-02-21T12:31:22.1281446Z mov.b32 %r10247, %r9236; 2026-02-21T12:31:22.1281507Z mov.b32 %r10248, %r9236; 2026-02-21T12:31:22.1281565Z mov.b32 %r10249, %r9236; 2026-02-21T12:31:22.1281626Z mov.b32 %r10250, %r9236; 2026-02-21T12:31:22.1281683Z mov.b32 %r10251, %r9236; 2026-02-21T12:31:22.1281741Z mov.b32 %r10252, %r9236; 2026-02-21T12:31:22.1281804Z mov.b32 %r10253, %r9236; 2026-02-21T12:31:22.1281863Z mov.b32 %r10254, %r9236; 2026-02-21T12:31:22.1281920Z mov.b32 %r10255, %r9236; 2026-02-21T12:31:22.1281978Z mov.b32 %r10256, %r9236; 2026-02-21T12:31:22.1282039Z mov.b32 %r10257, %r9236; 2026-02-21T12:31:22.1282102Z mov.b32 %r10258, %r9236; 2026-02-21T12:31:22.1282163Z mov.b32 %r10259, %r9236; 2026-02-21T12:31:22.1282228Z bra.uni 
$L__BB0_9; 2026-02-21T12:31:22.1282333Z $L__BB0_10: // %._crit_edge 2026-02-21T12:31:22.1282573Z .loc 1 19 135 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:19:135 2026-02-21T12:31:22.1282645Z cp.async.wait_group 0; 2026-02-21T12:31:22.1282706Z bar.sync 0; 2026-02-21T12:31:22.1282920Z .loc 1 19 4 // c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py:19:4 2026-02-21T12:31:22.1282971Z ret; 2026-02-21T12:31:22.1283033Z $L__tmp3: 2026-02-21T12:31:22.1283088Z $L__func_end0: 2026-02-21T12:31:22.1283177Z // -- End function 2026-02-21T12:31:22.1283233Z } 2026-02-21T12:31:22.1283487Z .file 1 "/tmp/torchinductor_root/45/c45qhvuddtj3vqketwjcdejjxbrr25fbwjgvcxg7x4jbhvwdixy3.py" 2026-02-21T12:31:22.1283705Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T12:31:22.1283832Z .section .debug_abbrev 2026-02-21T12:31:22.1283887Z { 2026-02-21T12:31:22.1283989Z .b8 1 // Abbreviation Code 2026-02-21T12:31:22.1284095Z .b8 17 // DW_TAG_compile_unit 2026-02-21T12:31:22.1284235Z .b8 1 // DW_CHILDREN_yes 2026-02-21T12:31:22.1284323Z .b8 37 // DW_AT_producer 2026-02-21T12:31:22.1284403Z .b8 8 // DW_FORM_string 2026-02-21T12:31:22.1284486Z .b8 19 // DW_AT_language 2026-02-21T12:31:22.1284569Z .b8 5 // DW_FORM_data2 2026-02-21T12:31:22.1284645Z .b8 3 // DW_AT_name 2026-02-21T12:31:22.1284722Z .b8 8 // DW_FORM_string 2026-02-21T12:31:22.1284810Z .b8 16 // DW_AT_stmt_list 2026-02-21T12:31:22.1284945Z .b8 6 // DW_FORM_data4 2026-02-21T12:31:22.1285031Z .b8 27 // DW_AT_comp_dir 2026-02-21T12:31:22.1285112Z .b8 8 // DW_FORM_string 2026-02-21T12:31:22.1285188Z .b8 0 // EOM(1) 2026-02-21T12:31:22.1285257Z .b8 0 // EOM(2) 2026-02-21T12:31:22.1285395Z .b8 2 // Abbreviation Code 2026-02-21T12:31:22.1285485Z .b8 46 // DW_TAG_subprogram 2026-02-21T12:31:22.1285563Z .b8 0 // DW_CHILDREN_no 2026-02-21T12:31:22.1285641Z .b8 3 // DW_AT_name 2026-02-21T12:31:22.1285722Z .b8 8 // DW_FORM_string 2026-02-21T12:31:22.1285802Z .b8 32 // 
DW_AT_inline 2026-02-21T12:31:22.1285882Z .b8 11 // DW_FORM_data1 2026-02-21T12:31:22.1285961Z .b8 0 // EOM(1) 2026-02-21T12:31:22.1286030Z .b8 0 // EOM(2) 2026-02-21T12:31:22.1286116Z .b8 3 // Abbreviation Code 2026-02-21T12:31:22.1286209Z .b8 46 // DW_TAG_subprogram 2026-02-21T12:31:22.1286291Z .b8 1 // DW_CHILDREN_yes 2026-02-21T12:31:22.1286373Z .b8 17 // DW_AT_low_pc 2026-02-21T12:31:22.1286573Z .b8 1 // DW_FORM_addr 2026-02-21T12:31:22.1286663Z .b8 18 // DW_AT_high_pc 2026-02-21T12:31:22.1286746Z .b8 1 // DW_FORM_addr 2026-02-21T12:31:22.1286839Z .b8 49 // DW_AT_abstract_origin 2026-02-21T12:31:22.1286918Z .b8 19 // DW_FORM_ref4 2026-02-21T12:31:22.1286988Z .b8 0 // EOM(1) 2026-02-21T12:31:22.1287062Z .b8 0 // EOM(2) 2026-02-21T12:31:22.1287154Z .b8 4 // Abbreviation Code 2026-02-21T12:31:22.1287256Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T12:31:22.1287341Z .b8 0 // DW_CHILDREN_no 2026-02-21T12:31:22.1287435Z .b8 49 // DW_AT_abstract_origin 2026-02-21T12:31:22.1287513Z .b8 19 // DW_FORM_ref4 2026-02-21T12:31:22.1287589Z .b8 17 // DW_AT_low_pc 2026-02-21T12:31:22.1287663Z .b8 1 // DW_FORM_addr 2026-02-21T12:31:22.1287748Z .b8 18 // DW_AT_high_pc 2026-02-21T12:31:22.1287823Z .b8 1 // DW_FORM_addr 2026-02-21T12:31:22.1287906Z .b8 88 // DW_AT_call_file 2026-02-21T12:31:22.1287988Z .b8 11 // DW_FORM_data1 2026-02-21T12:31:22.1288160Z .b8 89 // DW_AT_call_line 2026-02-21T12:31:22.1288239Z .b8 11 // DW_FORM_data1 2026-02-21T12:31:22.1288340Z .b8 87 // DW_AT_call_column 2026-02-21T12:31:22.1288481Z .b8 11 // DW_FORM_data1 2026-02-21T12:31:22.1288554Z .b8 0 // EOM(1) 2026-02-21T12:31:22.1288624Z .b8 0 // EOM(2) 2026-02-21T12:31:22.1288695Z .b8 0 // EOM(3) 2026-02-21T12:31:22.1288756Z } 2026-02-21T12:31:22.1288821Z .section .debug_info 2026-02-21T12:31:22.1288875Z { 2026-02-21T12:31:22.1288965Z .b32 178 // Length of Unit 2026-02-21T12:31:22.1289057Z .b8 2 // DWARF version number 2026-02-21T12:31:22.1289111Z .b8 0 2026-02-21T12:31:22.1289255Z .b32 .debug_abbrev 
// Offset Into Abbrev. Section 2026-02-21T12:31:22.1289416Z .b8 8 // Address Size (in bytes) 2026-02-21T12:31:22.1289533Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T12:31:22.1289624Z .b8 116 // DW_AT_producer 2026-02-21T12:31:22.1289681Z .b8 114 2026-02-21T12:31:22.1289735Z .b8 105 2026-02-21T12:31:22.1289790Z .b8 116 2026-02-21T12:31:22.1289839Z .b8 111 2026-02-21T12:31:22.1289951Z .b8 110 2026-02-21T12:31:22.1290005Z .b8 0 2026-02-21T12:31:22.1290088Z .b8 2 // DW_AT_language 2026-02-21T12:31:22.1290140Z .b8 0 2026-02-21T12:31:22.1290229Z .b8 99 // DW_AT_name 2026-02-21T12:31:22.1290286Z .b8 52 2026-02-21T12:31:22.1290337Z .b8 53 2026-02-21T12:31:22.1290390Z .b8 113 2026-02-21T12:31:22.1290443Z .b8 104 2026-02-21T12:31:22.1290497Z .b8 118 2026-02-21T12:31:22.1290546Z .b8 117 2026-02-21T12:31:22.1290596Z .b8 100 2026-02-21T12:31:22.1290650Z .b8 100 2026-02-21T12:31:22.1290702Z .b8 116 2026-02-21T12:31:22.1290754Z .b8 106 2026-02-21T12:31:22.1290803Z .b8 51 2026-02-21T12:31:22.1290857Z .b8 118 2026-02-21T12:31:22.1290908Z .b8 113 2026-02-21T12:31:22.1290960Z .b8 107 2026-02-21T12:31:22.1291015Z .b8 101 2026-02-21T12:31:22.1291070Z .b8 116 2026-02-21T12:31:22.1291122Z .b8 119 2026-02-21T12:31:22.1291174Z .b8 106 2026-02-21T12:31:22.1291229Z .b8 99 2026-02-21T12:31:22.1291279Z .b8 100 2026-02-21T12:31:22.1291331Z .b8 101 2026-02-21T12:31:22.1291383Z .b8 106 2026-02-21T12:31:22.1291436Z .b8 106 2026-02-21T12:31:22.1291486Z .b8 120 2026-02-21T12:31:22.1291536Z .b8 98 2026-02-21T12:31:22.1291588Z .b8 114 2026-02-21T12:31:22.1291639Z .b8 114 2026-02-21T12:31:22.1291690Z .b8 50 2026-02-21T12:31:22.1291740Z .b8 53 2026-02-21T12:31:22.1291794Z .b8 102 2026-02-21T12:31:22.1291842Z .b8 98 2026-02-21T12:31:22.1291891Z .b8 119 2026-02-21T12:31:22.1291944Z .b8 106 2026-02-21T12:31:22.1291995Z .b8 103 2026-02-21T12:31:22.1292047Z .b8 118 2026-02-21T12:31:22.1292097Z .b8 99 2026-02-21T12:31:22.1292153Z .b8 120 2026-02-21T12:31:22.1292205Z .b8 103 
2026-02-21T12:31:22.1292255Z .b8 55 2026-02-21T12:31:22.1292306Z .b8 120 2026-02-21T12:31:22.1292359Z .b8 52 2026-02-21T12:31:22.1292410Z .b8 106 2026-02-21T12:31:22.1292463Z .b8 98 2026-02-21T12:31:22.1292520Z .b8 104 2026-02-21T12:31:22.1292572Z .b8 118 2026-02-21T12:31:22.1292622Z .b8 119 2026-02-21T12:31:22.1292672Z .b8 100 2026-02-21T12:31:22.1292727Z .b8 105 2026-02-21T12:31:22.1292778Z .b8 120 2026-02-21T12:31:22.1292831Z .b8 121 2026-02-21T12:31:22.1292885Z .b8 51 2026-02-21T12:31:22.1292936Z .b8 46 2026-02-21T12:31:22.1292985Z .b8 112 2026-02-21T12:31:22.1293035Z .b8 121 2026-02-21T12:31:22.1293089Z .b8 0 2026-02-21T12:31:22.1293189Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T12:31:22.1293268Z .b8 47 // DW_AT_comp_dir 2026-02-21T12:31:22.1293323Z .b8 116 2026-02-21T12:31:22.1293374Z .b8 109 2026-02-21T12:31:22.1293425Z .b8 112 2026-02-21T12:31:22.1293473Z .b8 47 2026-02-21T12:31:22.1293606Z .b8 116 2026-02-21T12:31:22.1293657Z .b8 111 2026-02-21T12:31:22.1293708Z .b8 114 2026-02-21T12:31:22.1293760Z .b8 99 2026-02-21T12:31:22.1293811Z .b8 104 2026-02-21T12:31:22.1293860Z .b8 105 2026-02-21T12:31:22.1293910Z .b8 110 2026-02-21T12:31:22.1294011Z .b8 100 2026-02-21T12:31:22.1294062Z .b8 117 2026-02-21T12:31:22.1294114Z .b8 99 2026-02-21T12:31:22.1294164Z .b8 116 2026-02-21T12:31:22.1294218Z .b8 111 2026-02-21T12:31:22.1294270Z .b8 114 2026-02-21T12:31:22.1294320Z .b8 95 2026-02-21T12:31:22.1294376Z .b8 114 2026-02-21T12:31:22.1294426Z .b8 111 2026-02-21T12:31:22.1294476Z .b8 111 2026-02-21T12:31:22.1294524Z .b8 116 2026-02-21T12:31:22.1294585Z .b8 47 2026-02-21T12:31:22.1294641Z .b8 52 2026-02-21T12:31:22.1294692Z .b8 53 2026-02-21T12:31:22.1294746Z .b8 0 2026-02-21T12:31:22.1294859Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T12:31:22.1294934Z .b8 95 // DW_AT_name 2026-02-21T12:31:22.1294990Z .b8 104 2026-02-21T12:31:22.1295046Z .b8 101 2026-02-21T12:31:22.1295147Z .b8 108 2026-02-21T12:31:22.1295200Z .b8 105 2026-02-21T12:31:22.1295253Z 
.b8 111 2026-02-21T12:31:22.1295306Z .b8 110 2026-02-21T12:31:22.1295354Z .b8 95 2026-02-21T12:31:22.1295407Z .b8 109 2026-02-21T12:31:22.1295462Z .b8 97 2026-02-21T12:31:22.1295515Z .b8 116 2026-02-21T12:31:22.1295565Z .b8 109 2026-02-21T12:31:22.1295615Z .b8 117 2026-02-21T12:31:22.1295671Z .b8 108 2026-02-21T12:31:22.1295768Z .b8 95 2026-02-21T12:31:22.1295822Z .b8 98 2026-02-21T12:31:22.1295877Z .b8 102 2026-02-21T12:31:22.1295927Z .b8 49 2026-02-21T12:31:22.1295975Z .b8 54 2026-02-21T12:31:22.1296037Z .b8 95 2026-02-21T12:31:22.1296095Z .b8 105 2026-02-21T12:31:22.1296148Z .b8 110 2026-02-21T12:31:22.1296197Z .b8 116 2026-02-21T12:31:22.1296250Z .b8 52 2026-02-21T12:31:22.1296300Z .b8 0 2026-02-21T12:31:22.1296379Z .b8 1 // DW_AT_inline 2026-02-21T12:31:22.1296604Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T12:31:22.1296714Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T12:31:22.1296810Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T12:31:22.1296907Z .b32 108 // DW_AT_abstract_origin 2026-02-21T12:31:22.1297045Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T12:31:22.1297144Z .b32 108 // DW_AT_abstract_origin 2026-02-21T12:31:22.1297232Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T12:31:22.1297324Z .b64 $L__tmp2 // DW_AT_high_pc 2026-02-21T12:31:22.1297405Z .b8 1 // DW_AT_call_file 2026-02-21T12:31:22.1297486Z .b8 84 // DW_AT_call_line 2026-02-21T12:31:22.1297573Z .b8 40 // DW_AT_call_column 2026-02-21T12:31:22.1297673Z .b8 0 // End Of Children Mark 2026-02-21T12:31:22.1297767Z .b8 0 // End Of Children Mark 2026-02-21T12:31:22.1297832Z } 2026-02-21T12:31:22.1297909Z .section .debug_macinfo { } 2026-02-21T12:31:22.1297916Z 2026-02-21T12:31:22.1297997Z ================================================================ 2026-02-21T12:31:22.1298122Z please share the reproducer above with Triton project. 
2026-02-21T12:31:37.3476866Z [7596s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 512, 16], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', 'first'], loop_orders=[[1, 0]], num_sm_multiplier=16, num_stages=4, num_warps=2, pid_type='persistent_interleaved', range_flattens=[True, False], range_multi_buffers=[False, False], range_num_stages=[4, 0], range_unroll_factors=[0, 0], range_warp_specializes=[]) 2026-02-21T12:31:37.3478607Z Tensor-likes are not close! 2026-02-21T12:31:37.3478761Z 2026-02-21T12:31:37.3478871Z Mismatched elements: 334493344 / 335544320 (99.7%) 2026-02-21T12:31:37.3479777Z Greatest absolute difference: 4288.0 at index (153815, 793) (up to 0.01 allowed) 2026-02-21T12:31:37.3480274Z Greatest relative difference: inf at index (60515, 1164) (up to 0.01 allowed) 2026-02-21T12:31:37.3480878Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T12:31:37.3481113Z 2026-02-21T12:31:43.4085174Z [7602s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 512, 64], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', 'last'], loop_orders=[[1, 0]], num_sm_multiplier=16, num_stages=4, num_warps=8, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[False, False], range_num_stages=[4, 0], range_unroll_factors=[0, 0], range_warp_specializes=[]) 2026-02-21T12:31:43.4087433Z Tensor-likes are not close! 2026-02-21T12:31:43.4087608Z 2026-02-21T12:31:43.4087731Z Mismatched elements: 335051172 / 335544320 (99.9%) 2026-02-21T12:31:43.4088620Z Greatest absolute difference: 10240.0 at index (126054, 532) (up to 0.01 allowed) 2026-02-21T12:31:43.4089209Z Greatest relative difference: inf at index (60515, 1164) (up to 0.01 allowed) 2026-02-21T12:31:43.4089692Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T12:31:43.4089967Z 2026-02-21T12:31:45.3924364Z [7604s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 512, 256], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', 'first'], loop_orders=[[1, 0]], num_sm_multiplier=16, num_stages=4, num_warps=2, pid_type='persistent_interleaved', range_flattens=[True, False], range_multi_buffers=[False, False], range_num_stages=[4, 0], range_unroll_factors=[0, 0], range_warp_specializes=[]) 2026-02-21T12:31:45.3926179Z Tensor-likes are not close! 2026-02-21T12:31:45.3926339Z 2026-02-21T12:31:45.3926740Z Mismatched elements: 334524510 / 335544320 (99.7%) 2026-02-21T12:31:45.3927177Z Greatest absolute difference: 4128.0 at index (237144, 114) (up to 0.01 allowed) 2026-02-21T12:31:45.3927696Z Greatest relative difference: inf at index (60515, 1164) (up to 0.01 allowed) 2026-02-21T12:31:45.3928134Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T12:31:45.3928377Z 2026-02-21T12:32:22.7645485Z [7642s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 128, 16], indexing=['tensor_descriptor', 'tensor_descriptor', 'pointer'], l2_groupings=[32], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], num_sm_multiplier=64, num_stages=6, num_warps=1, pid_type='persistent_blocked', range_flattens=[False, False], range_multi_buffers=[False, True], range_num_stages=[0, 1], range_unroll_factors=[3, 3], range_warp_specializes=[]) 2026-02-21T12:32:22.7647559Z Tensor-likes are not close! 2026-02-21T12:32:22.7647711Z 2026-02-21T12:32:22.7647826Z Mismatched elements: 334457783 / 335544320 (99.7%) 2026-02-21T12:32:22.7648249Z Greatest absolute difference: 4160.0 at index (248507, 532) (up to 0.01 allowed) 2026-02-21T12:32:22.7648768Z Greatest relative difference: inf at index (60515, 1164) (up to 0.01 allowed) 2026-02-21T12:32:22.7649202Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T12:32:22.7649452Z 2026-02-21T12:35:31.8370265Z [7831s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 16, 256], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[8], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=128, num_sm_multiplier=32, num_stages=6, num_warps=1, pid_type='persistent_blocked', range_flattens=[False, False], range_multi_buffers=[True, True], range_num_stages=[3, 1], range_unroll_factors=[4, 2], range_warp_specializes=[]) 2026-02-21T12:35:31.8373131Z Tensor-likes are not close! 2026-02-21T12:35:31.8373406Z 2026-02-21T12:35:31.8373613Z Mismatched elements: 334848010 / 335544320 (99.8%) 2026-02-21T12:35:31.8374325Z Greatest absolute difference: 7232.0 at index (126054, 532) (up to 0.01 allowed) 2026-02-21T12:35:32.1854536Z Greatest relative difference: inf at index (60515, 1164) (up to 0.01 allowed) 2026-02-21T12:35:32.1855179Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T12:35:32.1855414Z 2026-02-21T12:36:45.5964972Z [7904s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 16, 16], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[1], load_eviction_policies=['', ''], loop_orders=[[0, 1]], num_stages=1, num_warps=8, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 0], range_warp_specializes=[]) 2026-02-21T12:36:45.8220200Z Tensor-likes are not close! 2026-02-21T12:36:45.8220378Z 2026-02-21T12:36:45.8220518Z Mismatched elements: 333395284 / 335544320 (99.4%) 2026-02-21T12:36:45.8220968Z Greatest absolute difference: 2320.0 at index (140315, 978) (up to 0.01 allowed) 2026-02-21T12:36:45.8221514Z Greatest relative difference: inf at index (60515, 1164) (up to 0.01 allowed) 2026-02-21T12:36:45.8222198Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T12:36:45.8222479Z 2026-02-21T12:37:30.0472030Z 2026-02-21T12:37:30.0475073Z Generation 1: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 109/109 0.2 configs/s 2026-02-21T12:37:30.3425874Z Generation 1: verifying top configs 100% ━━━━━━━━━━━━━━━━━━━━━━━ 4/4 - configs/s 2026-02-21T12:37:31.5169828Z [7950s] Generation 1 complete: 2026-02-21T12:37:31.5170866Z error=12 2026-02-21T12:37:31.5171159Z ok=101 2026-02-21T12:37:31.5171417Z min=49.2954 2026-02-21T12:37:31.5171679Z mid=175.4651 2026-02-21T12:37:31.5171946Z max=4655.5078 2026-02-21T12:37:31.5172250Z best={'block_sizes': [8, 128, 64], 2026-02-21T12:37:31.5172694Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T12:37:31.5173067Z 'l2_groupings': [32], 2026-02-21T12:37:31.5173315Z 'load_eviction_policies': ['last', ''], 2026-02-21T12:37:31.5173585Z 'loop_orders': [[0, 1]], 2026-02-21T12:37:31.5173807Z 'num_stages': 6, 2026-02-21T12:37:31.5174013Z 'num_warps': 4, 2026-02-21T12:37:31.5181433Z 'pid_type': 'flat', 2026-02-21T12:37:31.5181722Z 'range_flattens': [None, False], 2026-02-21T12:37:31.5182031Z 'range_multi_buffers': [None, True], 2026-02-21T12:37:31.5182298Z 'range_num_stages': [0, 2], 2026-02-21T12:37:31.5182538Z 'range_unroll_factors': [0, 3], 2026-02-21T12:37:31.5182797Z 'range_warp_specializes': []} 2026-02-21T12:37:31.5203934Z [7950s] Fitting surrogate: 213 points, 213 targets 2026-02-21T12:37:33.2053650Z [7952s] Generation 2 starting: 99 neighbors, 5 active search path(s) 2026-02-21T12:38:19.7557468Z Generation 2: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 103/103 1.0 configs/s 2026-02-21T12:39:43.3500725Z [8082s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 256], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[8], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=256, num_sm_multiplier=32, num_stages=6, num_warps=4, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[False, 
True], range_num_stages=[3, 1], range_unroll_factors=[4, 2], range_warp_specializes=[]) 2026-02-21T12:39:43.3502518Z Tensor-likes are not close! 2026-02-21T12:39:43.3502690Z 2026-02-21T12:39:43.3502819Z Mismatched elements: 334446786 / 335544320 (99.7%) 2026-02-21T12:39:43.3503347Z Greatest absolute difference: 4064.0 at index (160926, 619) (up to 0.01 allowed) 2026-02-21T12:39:43.3503847Z Greatest relative difference: inf at index (60515, 1164) (up to 0.01 allowed) 2026-02-21T12:39:43.3504256Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T12:39:43.3504480Z 2026-02-21T12:39:47.6251830Z 2026-02-21T12:39:47.6251846Z 2026-02-21T12:39:47.6252069Z ================================================================ 2026-02-21T12:39:47.6252455Z Internal Triton PTX codegen error 2026-02-21T12:39:47.6252733Z `ptxas` stderr: 2026-02-21T12:39:47.6253476Z ptxas fatal : (C7602) Insufficient registers (128) to compile instruction at line 583 in function _helion_matmul_bf16_int4. Try to compile with register target of 158 or higher. 
2026-02-21T12:39:47.6254697Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T12:39:47.6254948Z 2026-02-21T12:39:47.6255598Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpw2lqk6lk.ptx -o /tmp/tmpw2lqk6lk.ptx.o 2026-02-21T12:39:47.6256735Z 2026-02-21T12:39:47.6256752Z 2026-02-21T12:39:47.6256833Z // 2026-02-21T12:39:47.6257039Z // Generated by LLVM NVPTX Back-End 2026-02-21T12:39:47.6257290Z // 2026-02-21T12:39:47.6257382Z 2026-02-21T12:39:47.6257453Z .version 8.7 2026-02-21T12:39:47.6257657Z .target sm_90a 2026-02-21T12:39:47.6257840Z .address_size 64 2026-02-21T12:39:47.6257963Z 2026-02-21T12:39:47.6258189Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T12:39:47.6258624Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T12:39:47.6259071Z // @_helion_matmul_bf16_int4 2026-02-21T12:39:47.6259415Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T12:39:47.6259784Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T12:39:47.6260231Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T12:39:47.6260653Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T12:39:47.6261253Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T12:39:47.6261690Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T12:39:47.6262029Z ) 2026-02-21T12:39:47.6262193Z .reqntid 256 2026-02-21T12:39:47.6262366Z .maxnreg 128 2026-02-21T12:39:47.6262554Z { 2026-02-21T12:39:47.6262724Z .reg .pred %p<44>; 2026-02-21T12:39:47.6262935Z .reg .b16 %rs<561>; 2026-02-21T12:39:47.6263133Z .reg .b32 %r<12028>; 2026-02-21T12:39:47.6263336Z .reg .b64 %rd<632>; 2026-02-21T12:39:47.6263719Z .loc 1 14 0 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:14:0 2026-02-21T12:39:47.6264182Z $L__func_begin0: 
2026-02-21T12:39:47.6264546Z .loc 1 14 0 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:14:0 2026-02-21T12:39:47.6264840Z 2026-02-21T12:39:47.6264899Z // %bb.0: 2026-02-21T12:39:47.6265110Z ld.param.b64 %rd151, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T12:39:47.6265424Z ld.param.b64 %rd150, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T12:39:47.6265728Z ld.param.b64 %rd149, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T12:39:47.6265978Z $L__tmp0: 2026-02-21T12:39:47.6266272Z .loc 1 20 30 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:20:30 2026-02-21T12:39:47.6266880Z mov.u32 %r1503, %ctaid.x; 2026-02-21T12:39:47.6267200Z .loc 1 20 48 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:20:48 2026-02-21T12:39:47.6267574Z mul.wide.u32 %rd589, %r1503, 3; 2026-02-21T12:39:47.6267934Z .loc 1 21 49 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:21:49 2026-02-21T12:39:47.6268295Z min.u64 %rd152, %rd589, 10237; 2026-02-21T12:39:47.6268575Z add.s64 %rd2, %rd152, 3; 2026-02-21T12:39:47.6268916Z .loc 1 22 120 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:22:120 2026-02-21T12:39:47.6269294Z sub.s64 %rd153, %rd2, %rd589; 2026-02-21T12:39:47.6269494Z shr.s64 %rd154, %rd153, 63; 2026-02-21T12:39:47.6269689Z shr.u64 %rd155, %rd154, 62; 2026-02-21T12:39:47.6269874Z add.s64 %rd156, %rd153, %rd155; 2026-02-21T12:39:47.6270080Z and.b64 %rd157, %rd156, -4; 2026-02-21T12:39:47.6270269Z add.s64 %rd629, %rd157, %rd589; 2026-02-21T12:39:47.6270614Z .loc 1 34 45 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:34:45 2026-02-21T12:39:47.6270985Z mov.u32 %r1, %tid.x; 2026-02-21T12:39:47.6271151Z shr.u32 %r2, %r1, 5; 2026-02-21T12:39:47.6271321Z bfe.u32 %r1504, %r1, 2, 6; 2026-02-21T12:39:47.6271597Z or.b32 %r1505, %r1504, 64; 2026-02-21T12:39:47.6271780Z and.b32 %r4, %r1, 224; 2026-02-21T12:39:47.6271953Z bfe.u32 %r1507, %r1, 5, 3; 2026-02-21T12:39:47.6272146Z or.b32 %r1508, %r1507, 8; 2026-02-21T12:39:47.6272317Z or.b32 
%r1509, %r1507, 16; 2026-02-21T12:39:47.6272563Z or.b32 %r1510, %r1507, 24; 2026-02-21T12:39:47.6272733Z or.b32 %r1511, %r1507, 32; 2026-02-21T12:39:47.6272906Z or.b32 %r1512, %r1507, 40; 2026-02-21T12:39:47.6273079Z or.b32 %r1513, %r1507, 48; 2026-02-21T12:39:47.6273248Z or.b32 %r1514, %r1507, 56; 2026-02-21T12:39:47.6273422Z or.b32 %r1515, %r1507, 64; 2026-02-21T12:39:47.6273609Z or.b32 %r1516, %r1507, 72; 2026-02-21T12:39:47.6273783Z or.b32 %r1517, %r1507, 80; 2026-02-21T12:39:47.6273949Z or.b32 %r1518, %r1507, 88; 2026-02-21T12:39:47.6274125Z or.b32 %r1519, %r1507, 96; 2026-02-21T12:39:47.6274305Z or.b32 %r1520, %r1507, 104; 2026-02-21T12:39:47.6274494Z or.b32 %r1521, %r1507, 112; 2026-02-21T12:39:47.6274669Z or.b32 %r1522, %r1507, 120; 2026-02-21T12:39:47.6275076Z .loc 1 34 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:34:32 2026-02-21T12:39:47.6275458Z cvt.u64.u32 %rd7, %r1507; 2026-02-21T12:39:47.6275641Z cvt.u64.u32 %rd8, %r1508; 2026-02-21T12:39:47.6275821Z cvt.u64.u32 %rd9, %r1509; 2026-02-21T12:39:47.6275992Z cvt.u64.u32 %rd10, %r1510; 2026-02-21T12:39:47.6276169Z cvt.u64.u32 %rd11, %r1511; 2026-02-21T12:39:47.6277222Z [8086s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T12:39:47.6278793Z Config: @helion.kernel(config=helion.Config(block_sizes=[8, 128, 256], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[8], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=128, num_sm_multiplier=32, num_stages=6, num_warps=8, pid_type='persistent_blocked', range_flattens=[True, True], range_multi_buffers=[False, True], range_num_stages=[3, 1], range_unroll_factors=[4, 2], range_warp_specializes=[]), static_shapes=True) 2026-02-21T12:39:47.6280280Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T12:39:47.6280571Z `ptxas` stderr: 2026-02-21T12:39:47.6281128Z ptxas fatal : (C7602) Insufficient registers (128) to compile instruction at line 583 in function _helion_matmul_bf16_int4. Try to compile with register target of 158 or higher. 2026-02-21T12:39:47.6281787Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T12:39:47.6281972Z 2026-02-21T12:39:47.6282501Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpw2lqk6lk.ptx -o /tmp/tmpw2lqk6lk.ptx.o 2026-02-21T12:39:47.6283075Z 2026-02-21T12:39:47.6283230Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T12:39:47.6283535Z cvt.u64.u32 %rd12, %r1512; 2026-02-21T12:39:47.6283724Z cvt.u64.u32 %rd13, %r1513; 2026-02-21T12:39:47.6283909Z cvt.u64.u32 %rd14, %r1514; 2026-02-21T12:39:47.6284092Z cvt.u64.u32 %rd15, %r1515; 2026-02-21T12:39:47.6284269Z cvt.u64.u32 %rd16, %r1516; 2026-02-21T12:39:47.6284454Z cvt.u64.u32 %rd17, %r1517; 2026-02-21T12:39:47.6284626Z cvt.u64.u32 %rd18, %r1518; 2026-02-21T12:39:47.6284809Z cvt.u64.u32 %rd19, %r1519; 2026-02-21T12:39:47.6284984Z cvt.u64.u32 %rd20, %r1520; 2026-02-21T12:39:47.6285164Z cvt.u64.u32 %rd21, %r1521; 2026-02-21T12:39:47.6285342Z cvt.u64.u32 %rd22, %r1522; 2026-02-21T12:39:47.6285663Z .loc 1 36 45 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:36:45 2026-02-21T12:39:47.6286027Z and.b32 %r5, %r1, 31; 2026-02-21T12:39:47.6286194Z shl.b32 %r1523, %r5, 3; 2026-02-21T12:39:47.6286636Z .loc 1 36 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:36:32 2026-02-21T12:39:47.6286991Z cvt.u64.u32 %rd23, %r1523; 2026-02-21T12:39:47.6287304Z .loc 1 50 39 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:50:39 2026-02-21T12:39:47.6287756Z and.b32 %r6, %r1, 3; 2026-02-21T12:39:47.6287924Z shl.b32 %r8, %r1, 3; 2026-02-21T12:39:47.6288239Z .loc 1 22 120 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:22:120 2026-02-21T12:39:47.6288603Z setp.gt.s64 %p1, %rd157, 0; 2026-02-21T12:39:47.6288867Z mov.b32 %r10096, global_smem; 2026-02-21T12:39:47.6289052Z shl.b32 %r11232, %r4, 4; 2026-02-21T12:39:47.6289240Z and.b32 %r11233, %r8, 96; 2026-02-21T12:39:47.6289416Z shl.b32 %r11234, %r6, 1; 2026-02-21T12:39:47.6289587Z and.b32 %r11759, %r1, 16; 2026-02-21T12:39:47.6289757Z shl.b32 %r11236, %r5, 2; 2026-02-21T12:39:47.6289927Z and.b32 %r11237, %r2, 3; 2026-02-21T12:39:47.6290102Z and.b32 %r11238, %r1, 128; 2026-02-21T12:39:47.6290280Z and.b32 %r11239, %r1, 7; 2026-02-21T12:39:47.6290461Z shl.b32 %r11240, %r1, 6; 2026-02-21T12:39:47.6290629Z and.b32 %r11241, %r8, 48; 2026-02-21T12:39:47.6290802Z 
and.b32 %r11242, %r1, 248; 2026-02-21T12:39:47.6290975Z bfe.s32 %r11243, %r1, 2, 1; 2026-02-21T12:39:47.6291156Z and.b32 %r11244, %r1, 24; 2026-02-21T12:39:47.6291391Z shl.b32 %r11245, %r1, 1; 2026-02-21T12:39:47.6291568Z bfe.s32 %r11246, %r1, 5, 1; 2026-02-21T12:39:47.6291744Z @%p1 bra $L__BB0_1; 2026-02-21T12:39:47.6291909Z bra.uni $L__BB0_23; 2026-02-21T12:39:47.6292102Z $L__BB0_1: // %.lr.ph 2026-02-21T12:39:47.6292547Z .loc 1 0 120 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:0:120 2026-02-21T12:39:47.6292909Z shr.u32 %r3, %r1, 1; 2026-02-21T12:39:47.6293070Z bfe.u32 %r1506, %r1, 1, 7; 2026-02-21T12:39:47.6293256Z cvt.u64.u32 %rd6, %r1506; 2026-02-21T12:39:47.6293424Z shl.b32 %r1524, %r1, 4; 2026-02-21T12:39:47.6293599Z and.b32 %r1525, %r1524, 3952; 2026-02-21T12:39:47.6293791Z bfe.s32 %r1526, %r1, 3, 1; 2026-02-21T12:39:47.6293972Z and.b32 %r1527, %r1526, 136; 2026-02-21T12:39:47.6294162Z or.b32 %r1528, %r1527, %r1525; 2026-02-21T12:39:47.6294349Z add.s32 %r10, %r10096, %r1528; 2026-02-21T12:39:47.6294538Z xor.b32 %r1530, %r1528, 8; 2026-02-21T12:39:47.6294720Z add.s32 %r11, %r10096, %r1530; 2026-02-21T12:39:47.6294912Z bfe.s32 %r1534, %r1, 4, 1; 2026-02-21T12:39:47.6295085Z and.b32 %r1535, %r1534, 136; 2026-02-21T12:39:47.6295272Z or.b32 %r1536, %r11232, %r11233; 2026-02-21T12:39:47.6295469Z or.b32 %r1537, %r1536, %r11234; 2026-02-21T12:39:47.6295668Z or.b32 %r1538, %r1537, %r1535; 2026-02-21T12:39:47.6295858Z add.s32 %r13, %r10096, %r1538; 2026-02-21T12:39:47.6296049Z xor.b32 %r1539, %r1538, 8; 2026-02-21T12:39:47.6296232Z add.s32 %r14, %r10096, %r1539; 2026-02-21T12:39:47.6296415Z or.b32 %r1543, %r11238, %r11237; 2026-02-21T12:39:47.6296742Z or.b32 %r1544, %r1543, %r11236; 2026-02-21T12:39:47.6296929Z add.s32 %r15, %r10096, %r1544; 2026-02-21T12:39:47.6297115Z xor.b32 %r1545, %r1544, 16; 2026-02-21T12:39:47.6297307Z add.s32 %r16, %r10096, %r1545; 2026-02-21T12:39:47.6297492Z xor.b32 %r1546, %r1544, 32; 2026-02-21T12:39:47.6297665Z add.s32 
%r17, %r10096, %r1546; 2026-02-21T12:39:47.6297851Z xor.b32 %r1547, %r1544, 48; 2026-02-21T12:39:47.6298033Z add.s32 %r18, %r10096, %r1547; 2026-02-21T12:39:47.6298211Z xor.b32 %r1548, %r1544, 64; 2026-02-21T12:39:47.6298389Z add.s32 %r19, %r10096, %r1548; 2026-02-21T12:39:47.6298566Z xor.b32 %r1549, %r1544, 80; 2026-02-21T12:39:47.6298742Z add.s32 %r20, %r10096, %r1549; 2026-02-21T12:39:47.6298919Z xor.b32 %r1550, %r1544, 96; 2026-02-21T12:39:47.6299097Z add.s32 %r21, %r10096, %r1550; 2026-02-21T12:39:47.6299278Z xor.b32 %r1551, %r1544, 112; 2026-02-21T12:39:47.6299460Z add.s32 %r22, %r10096, %r1551; 2026-02-21T12:39:47.6299637Z shl.b32 %r1553, %r11239, 8; 2026-02-21T12:39:47.6299825Z shl.b32 %r1554, %r11239, 4; 2026-02-21T12:39:47.6300011Z and.b32 %r1555, %r3, 124; 2026-02-21T12:39:47.6300188Z xor.b32 %r1556, %r1554, %r1555; 2026-02-21T12:39:47.6300387Z add.s32 %r1557, %r10096, %r1553; 2026-02-21T12:39:47.6300577Z add.s32 %r23, %r1557, %r1556; 2026-02-21T12:39:47.6300763Z and.b32 %r1559, %r11240, 16320; 2026-02-21T12:39:47.6301042Z or.b32 %r1561, %r1559, %r11241; 2026-02-21T12:39:47.6301235Z add.s32 %r24, %r10096, %r1561; 2026-02-21T12:39:47.6301418Z xor.b32 %r1562, %r1561, 16; 2026-02-21T12:39:47.6301600Z add.s32 %r25, %r10096, %r1562; 2026-02-21T12:39:47.6301876Z xor.b32 %r1563, %r1561, 32; 2026-02-21T12:39:47.6302051Z add.s32 %r26, %r10096, %r1563; 2026-02-21T12:39:47.6302234Z xor.b32 %r1564, %r1561, 48; 2026-02-21T12:39:47.6302411Z add.s32 %r27, %r10096, %r1564; 2026-02-21T12:39:47.6302600Z bfe.u32 %r1565, %r10096, 4, 14; 2026-02-21T12:39:47.6302783Z cvt.u64.u32 %rd158, %r1565; 2026-02-21T12:39:47.6302999Z or.b64 %rd404, %rd158, -9223371899348713472; 2026-02-21T12:39:47.6303219Z add.s32 %r1566, %r10096, 32; 2026-02-21T12:39:47.6303403Z bfe.u32 %r1567, %r1566, 4, 14; 2026-02-21T12:39:47.6303585Z cvt.u64.u32 %rd159, %r1567; 2026-02-21T12:39:47.6303787Z or.b64 %rd405, %rd159, -9223371899348713472; 2026-02-21T12:39:47.6304011Z shl.b32 %r1569, %r11242, 4; 
2026-02-21T12:39:47.6304192Z and.b32 %r1571, %r11243, 4112; 2026-02-21T12:39:47.6304455Z or.b32 %r1572, %r1571, %r1569; 2026-02-21T12:39:47.6304648Z mad.lo.s32 %r1573, %r6, 8224, %r1572; 2026-02-21T12:39:47.6304859Z add.s32 %r28, %r10096, %r1573; 2026-02-21T12:39:47.6305043Z xor.b32 %r1574, %r1573, 16; 2026-02-21T12:39:47.6305229Z add.s32 %r29, %r10096, %r1574; 2026-02-21T12:39:47.6305406Z xor.b32 %r1575, %r1573, 32; 2026-02-21T12:39:47.6305659Z add.s32 %r30, %r10096, %r1575; 2026-02-21T12:39:47.6305851Z xor.b32 %r1576, %r1573, 48; 2026-02-21T12:39:47.6306029Z add.s32 %r31, %r10096, %r1576; 2026-02-21T12:39:47.6306217Z xor.b32 %r1577, %r1573, 64; 2026-02-21T12:39:47.6306390Z add.s32 %r32, %r10096, %r1577; 2026-02-21T12:39:47.6306689Z xor.b32 %r1578, %r1573, 80; 2026-02-21T12:39:47.6306862Z add.s32 %r33, %r10096, %r1578; 2026-02-21T12:39:47.6307043Z xor.b32 %r1579, %r1573, 96; 2026-02-21T12:39:47.6307223Z add.s32 %r34, %r10096, %r1579; 2026-02-21T12:39:47.6307422Z xor.b32 %r1580, %r1573, 112; 2026-02-21T12:39:47.6307605Z add.s32 %r35, %r10096, %r1580; 2026-02-21T12:39:47.6307788Z shl.b32 %r1582, %r11244, 10; 2026-02-21T12:39:47.6307963Z shl.b32 %r1583, %r11244, 2; 2026-02-21T12:39:47.6308138Z and.b32 %r1585, %r11245, 384; 2026-02-21T12:39:47.6308322Z and.b32 %r1587, %r11246, 4112; 2026-02-21T12:39:47.6308633Z or.b32 %r1588, %r1582, %r1554; 2026-02-21T12:39:47.6308821Z or.b32 %r1589, %r1583, %r1585; 2026-02-21T12:39:47.6309004Z xor.b32 %r1590, %r1588, %r1589; 2026-02-21T12:39:47.6309195Z xor.b32 %r1591, %r1590, %r1587; 2026-02-21T12:39:47.6309380Z add.s32 %r3291, %r10096, %r1591; 2026-02-21T12:39:47.6309572Z add.s32 %r3296, %r3291, 512; 2026-02-21T12:39:47.6309753Z add.s32 %r3301, %r3291, 1024; 2026-02-21T12:39:47.6309931Z add.s32 %r3306, %r3291, 1536; 2026-02-21T12:39:47.6310112Z add.s32 %r3311, %r3291, 2048; 2026-02-21T12:39:47.6310283Z add.s32 %r3316, %r3291, 2560; 2026-02-21T12:39:47.6310461Z add.s32 %r3321, %r3291, 3072; 2026-02-21T12:39:47.6310648Z add.s32 
%r3326, %r3291, 3584; 2026-02-21T12:39:47.6310836Z shr.u32 %r1592, %r11242, 1; 2026-02-21T12:39:47.6311015Z xor.b32 %r1593, %r1554, %r1592; 2026-02-21T12:39:47.6311206Z add.s32 %r44, %r1557, %r1593; 2026-02-21T12:39:47.6311541Z .loc 1 22 120 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:22:120 2026-02-21T12:39:47.6311934Z shl.b64 %rd160, %rd6, 14; 2026-02-21T12:39:47.6312120Z and.b32 %r1594, %r1, 1; 2026-02-21T12:39:47.6312302Z mul.wide.u32 %rd161, %r1594, 16; 2026-02-21T12:39:47.6312506Z or.b64 %rd162, %rd160, %rd161; 2026-02-21T12:39:47.6312698Z add.s64 %rd163, %rd162, %rd149; 2026-02-21T12:39:47.6312896Z add.s64 %rd26, %rd163, 32; 2026-02-21T12:39:47.6313079Z mul.lo.s64 %rd164, %rd7, 1280; 2026-02-21T12:39:47.6313280Z or.b64 %rd165, %rd164, %rd23; 2026-02-21T12:39:47.6313468Z add.s64 %rd27, %rd150, %rd165; 2026-02-21T12:39:47.6313688Z prmt.b32 %r3189, %r3190, %r3191, 0x3340U; 2026-02-21T12:39:47.6313926Z prmt.b32 %r5089, %r5090, %r5091, 0x3340U; 2026-02-21T12:39:47.6314239Z prmt.b32 %r6989, %r6990, %r6991, 0x3340U; 2026-02-21T12:39:47.6314469Z prmt.b32 %r8889, %r8890, %r8891, 0x3340U; 2026-02-21T12:39:47.6314731Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T12:39:47.6315103Z // Child Loop BB0_6 Depth 2 2026-02-21T12:39:47.6315383Z // Child Loop BB0_11 Depth 2 2026-02-21T12:39:47.6315667Z // Child Loop BB0_16 Depth 2 2026-02-21T12:39:47.6315954Z // Child Loop BB0_21 Depth 2 2026-02-21T12:39:47.6316340Z .loc 1 28 35 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:28:35 2026-02-21T12:39:47.6316882Z mul.hi.u64 %rd166, %rd589, -3689348814741910323; 2026-02-21T12:39:47.6317137Z shr.u64 %rd167, %rd166, 5; 2026-02-21T12:39:47.6317478Z .loc 1 29 33 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:29:33 2026-02-21T12:39:47.6317942Z shl.b64 %rd30, %rd167, 3; 2026-02-21T12:39:47.6318271Z .loc 1 30 39 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:30:39 2026-02-21T12:39:47.6318630Z sub.s64 %rd168, 2048, %rd30; 
2026-02-21T12:39:47.6318949Z .loc 1 30 52 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:30:52 2026-02-21T12:39:47.6319302Z min.s64 %rd31, %rd168, 8; 2026-02-21T12:39:47.6319677Z .loc 1 31 45 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:31:45 2026-02-21T12:39:47.6320038Z mul.lo.s64 %rd169, %rd167, 40; 2026-02-21T12:39:47.6320227Z sub.s64 %rd32, %rd589, %rd169; 2026-02-21T12:39:47.6320548Z .loc 1 32 51 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:32:51 2026-02-21T12:39:47.6320899Z or.b64 %rd170, %rd32, %rd31; 2026-02-21T12:39:47.6321103Z and.b64 %rd171, %rd170, -4294967296; 2026-02-21T12:39:47.6321318Z setp.ne.b64 %p2, %rd171, 0; 2026-02-21T12:39:47.6321506Z @%p2 bra $L__BB0_4; 2026-02-21T12:39:47.6321672Z bra.uni $L__BB0_3; 2026-02-21T12:39:47.6321884Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:39:47.6322150Z div.s64 %rd590, %rd32, %rd31; 2026-02-21T12:39:47.6322331Z bra.uni $L__BB0_5; 2026-02-21T12:39:47.6322536Z $L__BB0_3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:39:47.6322797Z cvt.u32.u64 %r1595, %rd31; 2026-02-21T12:39:47.6322974Z cvt.u32.u64 %r1596, %rd32; 2026-02-21T12:39:47.6323163Z div.u32 %r1597, %r1596, %r1595; 2026-02-21T12:39:47.6323352Z cvt.u64.u32 %rd590, %r1597; 2026-02-21T12:39:47.6323583Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:39:47.6323977Z .loc 1 31 64 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:31:64 2026-02-21T12:39:47.6324342Z mul.lo.s64 %rd173, %rd590, %rd31; 2026-02-21T12:39:47.6324549Z sub.s64 %rd174, %rd32, %rd173; 2026-02-21T12:39:47.6324874Z .loc 1 31 30 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:31:30 2026-02-21T12:39:47.6325254Z add.s64 %rd175, %rd174, %rd30; 2026-02-21T12:39:47.6325579Z .loc 1 33 27 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:33:27 2026-02-21T12:39:47.6325936Z shl.b64 %rd36, %rd175, 7; 2026-02-21T12:39:47.6326246Z .loc 1 35 27 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:35:27 
2026-02-21T12:39:47.6326740Z shl.b64 %rd176, %rd590, 8; 2026-02-21T12:39:47.6327061Z .loc 1 36 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:36:32 2026-02-21T12:39:47.6327422Z or.b64 %rd37, %rd176, %rd23; 2026-02-21T12:39:47.6327758Z .loc 1 43 126 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:43:126 2026-02-21T12:39:47.6328117Z shl.b64 %rd177, %rd175, 21; 2026-02-21T12:39:47.6328311Z add.s64 %rd592, %rd26, %rd177; 2026-02-21T12:39:47.6328586Z add.s64 %rd591, %rd27, %rd176; 2026-02-21T12:39:47.6328792Z mov.b32 %r11247, 0f00000000; 2026-02-21T12:39:47.6328978Z mov.b64 %rd593, -16; 2026-02-21T12:39:47.6329154Z mov.b32 %r11248, %r11247; 2026-02-21T12:39:47.6329412Z mov.b32 %r11249, %r11247; 2026-02-21T12:39:47.6329582Z mov.b32 %r11250, %r11247; 2026-02-21T12:39:47.6329761Z mov.b32 %r11251, %r11247; 2026-02-21T12:39:47.6329930Z mov.b32 %r11252, %r11247; 2026-02-21T12:39:47.6330110Z mov.b32 %r11253, %r11247; 2026-02-21T12:39:47.6330277Z mov.b32 %r11254, %r11247; 2026-02-21T12:39:47.6330455Z mov.b32 %r11255, %r11247; 2026-02-21T12:39:47.6330628Z mov.b32 %r11256, %r11247; 2026-02-21T12:39:47.6330808Z mov.b32 %r11257, %r11247; 2026-02-21T12:39:47.6330980Z mov.b32 %r11258, %r11247; 2026-02-21T12:39:47.6331166Z mov.b32 %r11259, %r11247; 2026-02-21T12:39:47.6331341Z mov.b32 %r11260, %r11247; 2026-02-21T12:39:47.6331509Z mov.b32 %r11261, %r11247; 2026-02-21T12:39:47.6331680Z mov.b32 %r11262, %r11247; 2026-02-21T12:39:47.6331920Z mov.b32 %r11263, %r11247; 2026-02-21T12:39:47.6332096Z mov.b32 %r11264, %r11247; 2026-02-21T12:39:47.6332259Z mov.b32 %r11265, %r11247; 2026-02-21T12:39:47.6332430Z mov.b32 %r11266, %r11247; 2026-02-21T12:39:47.6332595Z mov.b32 %r11267, %r11247; 2026-02-21T12:39:47.6332782Z mov.b32 %r11268, %r11247; 2026-02-21T12:39:47.6332945Z mov.b32 %r11269, %r11247; 2026-02-21T12:39:47.6333182Z mov.b32 %r11270, %r11247; 2026-02-21T12:39:47.6333355Z mov.b32 %r11271, %r11247; 2026-02-21T12:39:47.6333519Z mov.b32 %r11272, %r11247; 
2026-02-21T12:39:47.6333689Z mov.b32 %r11273, %r11247; 2026-02-21T12:39:47.6333852Z mov.b32 %r11274, %r11247; 2026-02-21T12:39:47.6334027Z mov.b32 %r11275, %r11247; 2026-02-21T12:39:47.6334191Z mov.b32 %r11276, %r11247; 2026-02-21T12:39:47.6334362Z mov.b32 %r11277, %r11247; 2026-02-21T12:39:47.6334535Z mov.b32 %r11278, %r11247; 2026-02-21T12:39:47.6334706Z mov.b32 %r11279, %r11247; 2026-02-21T12:39:47.6334875Z mov.b32 %r11280, %r11247; 2026-02-21T12:39:47.6335053Z mov.b32 %r11281, %r11247; 2026-02-21T12:39:47.6335224Z mov.b32 %r11282, %r11247; 2026-02-21T12:39:47.6335392Z mov.b32 %r11283, %r11247; 2026-02-21T12:39:47.6335578Z mov.b32 %r11284, %r11247; 2026-02-21T12:39:47.6335757Z mov.b32 %r11285, %r11247; 2026-02-21T12:39:47.6335936Z mov.b32 %r11286, %r11247; 2026-02-21T12:39:47.6336103Z mov.b32 %r11287, %r11247; 2026-02-21T12:39:47.6336286Z mov.b32 %r11288, %r11247; 2026-02-21T12:39:47.6336584Z mov.b32 %r11289, %r11247; 2026-02-21T12:39:47.6336773Z mov.b32 %r11290, %r11247; 2026-02-21T12:39:47.6336941Z mov.b32 %r11291, %r11247; 2026-02-21T12:39:47.6337127Z mov.b32 %r11292, %r11247; 2026-02-21T12:39:47.6337312Z mov.b32 %r11293, %r11247; 2026-02-21T12:39:47.6337483Z mov.b32 %r11294, %r11247; 2026-02-21T12:39:47.6337661Z mov.b32 %r11295, %r11247; 2026-02-21T12:39:47.6337827Z mov.b32 %r11296, %r11247; 2026-02-21T12:39:47.6337999Z mov.b32 %r11297, %r11247; 2026-02-21T12:39:47.6338167Z mov.b32 %r11298, %r11247; 2026-02-21T12:39:47.6338347Z mov.b32 %r11299, %r11247; 2026-02-21T12:39:47.6338518Z mov.b32 %r11300, %r11247; 2026-02-21T12:39:47.6338690Z mov.b32 %r11301, %r11247; 2026-02-21T12:39:47.6338866Z mov.b32 %r11302, %r11247; 2026-02-21T12:39:47.6339039Z mov.b32 %r11303, %r11247; 2026-02-21T12:39:47.6339213Z mov.b32 %r11304, %r11247; 2026-02-21T12:39:47.6339379Z mov.b32 %r11305, %r11247; 2026-02-21T12:39:47.6339554Z mov.b32 %r11306, %r11247; 2026-02-21T12:39:47.6339723Z mov.b32 %r11307, %r11247; 2026-02-21T12:39:47.6339892Z mov.b32 %r11308, %r11247; 
2026-02-21T12:39:47.6340055Z mov.b32 %r11309, %r11247; 2026-02-21T12:39:47.6340226Z mov.b32 %r11310, %r11247; 2026-02-21T12:39:47.6340389Z mov.b32 %r11311, %r11247; 2026-02-21T12:39:47.6340558Z mov.b32 %r11312, %r11247; 2026-02-21T12:39:47.6340726Z mov.b32 %r11313, %r11247; 2026-02-21T12:39:47.6340890Z mov.b32 %r11314, %r11247; 2026-02-21T12:39:47.6341060Z mov.b32 %r11315, %r11247; 2026-02-21T12:39:47.6341223Z mov.b32 %r11316, %r11247; 2026-02-21T12:39:47.6341487Z mov.b32 %r11317, %r11247; 2026-02-21T12:39:47.6341671Z mov.b32 %r11318, %r11247; 2026-02-21T12:39:47.6341846Z mov.b32 %r11319, %r11247; 2026-02-21T12:39:47.6342010Z mov.b32 %r11320, %r11247; 2026-02-21T12:39:47.6342257Z mov.b32 %r11321, %r11247; 2026-02-21T12:39:47.6342424Z mov.b32 %r11322, %r11247; 2026-02-21T12:39:47.6342594Z mov.b32 %r11323, %r11247; 2026-02-21T12:39:47.6342766Z mov.b32 %r11324, %r11247; 2026-02-21T12:39:47.6342930Z mov.b32 %r11325, %r11247; 2026-02-21T12:39:47.6343102Z mov.b32 %r11326, %r11247; 2026-02-21T12:39:47.6343269Z mov.b32 %r11327, %r11247; 2026-02-21T12:39:47.6343440Z mov.b32 %r11328, %r11247; 2026-02-21T12:39:47.6343614Z mov.b32 %r11329, %r11247; 2026-02-21T12:39:47.6343789Z mov.b32 %r11330, %r11247; 2026-02-21T12:39:47.6343965Z mov.b32 %r11331, %r11247; 2026-02-21T12:39:47.6344138Z mov.b32 %r11332, %r11247; 2026-02-21T12:39:47.6344302Z mov.b32 %r11333, %r11247; 2026-02-21T12:39:47.6344475Z mov.b32 %r11334, %r11247; 2026-02-21T12:39:47.6344740Z mov.b32 %r11335, %r11247; 2026-02-21T12:39:47.6344909Z mov.b32 %r11336, %r11247; 2026-02-21T12:39:47.6345082Z mov.b32 %r11337, %r11247; 2026-02-21T12:39:47.6345245Z mov.b32 %r11338, %r11247; 2026-02-21T12:39:47.6345429Z mov.b32 %r11339, %r11247; 2026-02-21T12:39:47.6345620Z mov.b32 %r11340, %r11247; 2026-02-21T12:39:47.6345796Z mov.b32 %r11341, %r11247; 2026-02-21T12:39:47.6346033Z mov.b32 %r11342, %r11247; 2026-02-21T12:39:47.6346207Z mov.b32 %r11343, %r11247; 2026-02-21T12:39:47.6346372Z mov.b32 %r11344, %r11247; 
2026-02-21T12:39:47.6346665Z mov.b32 %r11345, %r11247; 2026-02-21T12:39:47.6346858Z mov.b32 %r11346, %r11247; 2026-02-21T12:39:47.6347025Z mov.b32 %r11347, %r11247; 2026-02-21T12:39:47.6347197Z mov.b32 %r11348, %r11247; 2026-02-21T12:39:47.6347362Z mov.b32 %r11349, %r11247; 2026-02-21T12:39:47.6347533Z mov.b32 %r11350, %r11247; 2026-02-21T12:39:47.6347699Z mov.b32 %r11351, %r11247; 2026-02-21T12:39:47.6347888Z mov.b32 %r11352, %r11247; 2026-02-21T12:39:47.6348061Z mov.b32 %r11353, %r11247; 2026-02-21T12:39:47.6348235Z mov.b32 %r11354, %r11247; 2026-02-21T12:39:47.6348486Z mov.b32 %r11355, %r11247; 2026-02-21T12:39:47.6348674Z mov.b32 %r11356, %r11247; 2026-02-21T12:39:47.6348848Z mov.b32 %r11357, %r11247; 2026-02-21T12:39:47.6349013Z mov.b32 %r11358, %r11247; 2026-02-21T12:39:47.6349184Z mov.b32 %r11359, %r11247; 2026-02-21T12:39:47.6349352Z mov.b32 %r11360, %r11247; 2026-02-21T12:39:47.6349522Z mov.b32 %r11361, %r11247; 2026-02-21T12:39:47.6349684Z mov.b32 %r11362, %r11247; 2026-02-21T12:39:47.6349853Z mov.b32 %r11363, %r11247; 2026-02-21T12:39:47.6350018Z mov.b32 %r11364, %r11247; 2026-02-21T12:39:47.6350189Z mov.b32 %r11365, %r11247; 2026-02-21T12:39:47.6350353Z mov.b32 %r11366, %r11247; 2026-02-21T12:39:47.6350525Z mov.b32 %r11367, %r11247; 2026-02-21T12:39:47.6350693Z mov.b32 %r11368, %r11247; 2026-02-21T12:39:47.6350877Z mov.b32 %r11369, %r11247; 2026-02-21T12:39:47.6351048Z mov.b32 %r11370, %r11247; 2026-02-21T12:39:47.6351218Z mov.b32 %r11371, %r11247; 2026-02-21T12:39:47.6351389Z mov.b32 %r11372, %r11247; 2026-02-21T12:39:47.6351555Z mov.b32 %r11373, %r11247; 2026-02-21T12:39:47.6351724Z mov.b32 %r11374, %r11247; 2026-02-21T12:39:47.6351947Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T12:39:47.6352270Z // => This Inner Loop Header: Depth=2 2026-02-21T12:39:47.6352676Z .loc 1 51 80 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:80 2026-02-21T12:39:47.6353035Z add.s64 %rd179, %rd592, -32; 2026-02-21T12:39:47.6353223Z // begin inline asm 
2026-02-21T12:39:47.6353385Z mov.u64 %rd178, 0x0; 2026-02-21T12:39:47.6353637Z createpolicy.fractional.L2::evict_last.b64 %rd178, 1.0; 2026-02-21T12:39:47.6353895Z // end inline asm 2026-02-21T12:39:47.6354053Z // begin inline asm 2026-02-21T12:39:47.6354206Z mov.u32 %r1599, 0x0; 2026-02-21T12:39:47.6354365Z mov.u32 %r1600, 0x0; 2026-02-21T12:39:47.6354656Z mov.u32 %r1601, 0x0; 2026-02-21T12:39:47.6354809Z mov.u32 %r1602, 0x0; 2026-02-21T12:39:47.6355152Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1599, %r1600, %r1601, %r1602 }, [ %rd179 + 0 ], %rd178; 2026-02-21T12:39:47.6355585Z // end inline asm 2026-02-21T12:39:47.6355889Z .loc 1 55 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:55:32 2026-02-21T12:39:47.6356239Z bar.sync 0; 2026-02-21T12:39:47.6356417Z st.shared.v2.b32 [%r10], {%r1599, %r1600}; 2026-02-21T12:39:47.6356775Z st.shared.v2.b32 [%r11], {%r1601, %r1602}; 2026-02-21T12:39:47.6356987Z bar.sync 0; 2026-02-21T12:39:47.6357148Z ld.shared.b16 %rs1, [%r13]; 2026-02-21T12:39:47.6357359Z ld.shared.b16 %rs2, [%r13+256]; 2026-02-21T12:39:47.6357564Z ld.shared.b16 %rs3, [%r13+16]; 2026-02-21T12:39:47.6357760Z ld.shared.b16 %rs4, [%r13+272]; 2026-02-21T12:39:47.6357961Z ld.shared.b16 %rs5, [%r14]; 2026-02-21T12:39:47.6358145Z ld.shared.b16 %rs6, [%r14+256]; 2026-02-21T12:39:47.6358434Z ld.shared.b16 %rs7, [%r14+16]; 2026-02-21T12:39:47.6358630Z ld.shared.b16 %rs8, [%r14+272]; 2026-02-21T12:39:47.6358829Z cvt.f32.bf16 %r1861, %rs1; 2026-02-21T12:39:47.6359015Z cvt.f32.bf16 %r1862, %rs2; 2026-02-21T12:39:47.6359198Z cvt.f32.bf16 %r1863, %rs5; 2026-02-21T12:39:47.6359384Z cvt.f32.bf16 %r1864, %rs6; 2026-02-21T12:39:47.6359573Z cvt.f32.bf16 %r2121, %rs3; 2026-02-21T12:39:47.6359839Z cvt.f32.bf16 %r2122, %rs4; 2026-02-21T12:39:47.6360015Z cvt.f32.bf16 %r2123, %rs7; 2026-02-21T12:39:47.6360192Z cvt.f32.bf16 %r2124, %rs8; 2026-02-21T12:39:47.6360510Z .loc 1 57 87 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:57:87 
2026-02-21T12:39:47.6360867Z // begin inline asm 2026-02-21T12:39:47.6361023Z mov.u32 %r1603, 0x0; 2026-02-21T12:39:47.6361182Z mov.u32 %r1604, 0x0; 2026-02-21T12:39:47.6361373Z ld.global.v2.b32 { %r1603, %r1604 }, [ %rd591 + 0 ]; 2026-02-21T12:39:47.6361615Z // end inline asm 2026-02-21T12:39:47.6361919Z .loc 1 65 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:65:28 2026-02-21T12:39:47.6362260Z bar.sync 0; 2026-02-21T12:39:47.6362413Z st.shared.b8 [%r15], %r1603; 2026-02-21T12:39:47.6362601Z prmt.b32 %r3175, %r1603, 0, 0x7771U; 2026-02-21T12:39:47.6362814Z st.shared.b8 [%r16+256], %r3175; 2026-02-21T12:39:47.6363010Z prmt.b32 %r3176, %r1603, 0, 0x7772U; 2026-02-21T12:39:47.6363219Z st.shared.b8 [%r17+512], %r3176; 2026-02-21T12:39:47.6363416Z prmt.b32 %r3177, %r1603, 0, 0x7773U; 2026-02-21T12:39:47.6363610Z st.shared.b8 [%r18+768], %r3177; 2026-02-21T12:39:47.6363808Z st.shared.b8 [%r19+1024], %r1604; 2026-02-21T12:39:47.6364002Z prmt.b32 %r3178, %r1604, 0, 0x7771U; 2026-02-21T12:39:47.6364203Z st.shared.b8 [%r20+1280], %r3178; 2026-02-21T12:39:47.6364403Z prmt.b32 %r3179, %r1604, 0, 0x7772U; 2026-02-21T12:39:47.6364602Z st.shared.b8 [%r21+1536], %r3179; 2026-02-21T12:39:47.6364791Z prmt.b32 %r3180, %r1604, 0, 0x7773U; 2026-02-21T12:39:47.6364994Z st.shared.b8 [%r22+1792], %r3180; 2026-02-21T12:39:47.6365185Z bar.sync 0; 2026-02-21T12:39:47.6365336Z ld.shared.b32 %r3181, [%r23]; 2026-02-21T12:39:47.6365527Z prmt.b32 %r3182, %r3181, 0, 0x7771U; 2026-02-21T12:39:47.6365725Z cvt.u16.u32 %rs9, %r3182; 2026-02-21T12:39:47.6365911Z prmt.b32 %r3183, %r3181, 0, 0x7770U; 2026-02-21T12:39:47.6366105Z cvt.u16.u32 %rs10, %r3183; 2026-02-21T12:39:47.6366291Z prmt.b32 %r3184, %r3181, 0, 0x7773U; 2026-02-21T12:39:47.6366601Z cvt.u16.u32 %rs11, %r3184; 2026-02-21T12:39:47.6366793Z prmt.b32 %r3185, %r3181, 0, 0x7772U; 2026-02-21T12:39:47.6366988Z cvt.u16.u32 %rs12, %r3185; 2026-02-21T12:39:47.6367303Z .loc 1 60 28 // 
conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6367672Z shl.b16 %rs13, %rs10, 4; 2026-02-21T12:39:47.6367845Z shl.b16 %rs14, %rs9, 4; 2026-02-21T12:39:47.6368165Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6368601Z cvt.u32.u16 %r3186, %rs13; 2026-02-21T12:39:47.6368798Z prmt.b32 %r3187, %r3186, %r3188, 0x3340U; 2026-02-21T12:39:47.6369020Z prmt.b32 %r3192, %r3187, %r3189, 0x5410U; 2026-02-21T12:39:47.6369244Z prmt.b32 %r3193, %r3192, %r3181, 0x5040U; 2026-02-21T12:39:47.6369526Z prmt.b32 %r3194, %r3193, 0, 0x9991U; 2026-02-21T12:39:47.6369736Z cvt.u16.u32 %rs15, %r3194; 2026-02-21T12:39:47.6369923Z shr.s16 %rs16, %rs15, 4; 2026-02-21T12:39:47.6370102Z prmt.b32 %r3195, %r3193, 0, 0xbbb3U; 2026-02-21T12:39:47.6370306Z cvt.u16.u32 %rs17, %r3195; 2026-02-21T12:39:47.6370480Z shr.s16 %rs18, %rs17, 4; 2026-02-21T12:39:47.6370656Z cvt.s16.s8 %rs19, %rs13; 2026-02-21T12:39:47.6370823Z shr.s16 %rs20, %rs19, 4; 2026-02-21T12:39:47.6370995Z cvt.s16.s8 %rs21, %rs14; 2026-02-21T12:39:47.6371167Z shr.s16 %rs22, %rs21, 4; 2026-02-21T12:39:47.6371472Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6371907Z cvt.rn.f32.s16 %r3196, %rs18; 2026-02-21T12:39:47.6372098Z cvt.rn.f32.s16 %r3197, %rs16; 2026-02-21T12:39:47.6372280Z cvt.rn.f32.s16 %r3198, %rs22; 2026-02-21T12:39:47.6372457Z cvt.rn.f32.s16 %r3199, %rs20; 2026-02-21T12:39:47.6372787Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6373134Z shl.b16 %rs23, %rs12, 4; 2026-02-21T12:39:47.6373383Z shl.b16 %rs24, %rs11, 4; 2026-02-21T12:39:47.6373702Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6374058Z prmt.b32 %r3200, %r3181, %r3201, 0x3020U; 2026-02-21T12:39:47.6374273Z prmt.b32 %r3202, %r3200, 0, 0x9991U; 2026-02-21T12:39:47.6374481Z cvt.u16.u32 %rs25, %r3202; 2026-02-21T12:39:47.6374659Z 
shr.s16 %rs26, %rs25, 4; 2026-02-21T12:39:47.6374826Z cvt.s16.s8 %rs27, %rs23; 2026-02-21T12:39:47.6375002Z shr.s16 %rs28, %rs27, 4; 2026-02-21T12:39:47.6375168Z cvt.s16.s8 %rs29, %rs24; 2026-02-21T12:39:47.6375347Z shr.s16 %rs30, %rs29, 4; 2026-02-21T12:39:47.6375529Z prmt.b32 %r3203, %r3181, 0, 0xbbb3U; 2026-02-21T12:39:47.6375737Z cvt.u16.u32 %rs31, %r3203; 2026-02-21T12:39:47.6375919Z shr.s16 %rs32, %rs31, 4; 2026-02-21T12:39:47.6376224Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6376704Z cvt.rn.f32.s16 %r3204, %rs26; 2026-02-21T12:39:47.6376896Z cvt.rn.f32.s16 %r3205, %rs32; 2026-02-21T12:39:47.6377080Z cvt.rn.f32.s16 %r3206, %rs30; 2026-02-21T12:39:47.6377269Z cvt.rn.f32.s16 %r3207, %rs28; 2026-02-21T12:39:47.6377591Z .loc 1 65 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:65:28 2026-02-21T12:39:47.6377951Z ld.shared.b32 %r3208, [%r23+128]; 2026-02-21T12:39:47.6378147Z prmt.b32 %r3209, %r3208, 0, 0x7771U; 2026-02-21T12:39:47.6378354Z cvt.u16.u32 %rs33, %r3209; 2026-02-21T12:39:47.6378531Z prmt.b32 %r3210, %r3208, 0, 0x7770U; 2026-02-21T12:39:47.6378735Z cvt.u16.u32 %rs34, %r3210; 2026-02-21T12:39:47.6378915Z prmt.b32 %r3211, %r3208, 0, 0x7773U; 2026-02-21T12:39:47.6379113Z cvt.u16.u32 %rs35, %r3211; 2026-02-21T12:39:47.6379287Z prmt.b32 %r3212, %r3208, 0, 0x7772U; 2026-02-21T12:39:47.6379495Z cvt.u16.u32 %rs36, %r3212; 2026-02-21T12:39:47.6379814Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6380167Z shl.b16 %rs37, %rs34, 4; 2026-02-21T12:39:47.6380361Z shl.b16 %rs38, %rs33, 4; 2026-02-21T12:39:47.6380669Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6381023Z cvt.u32.u16 %r3213, %rs37; 2026-02-21T12:39:47.6381207Z prmt.b32 %r3214, %r3213, %r3215, 0x3340U; 2026-02-21T12:39:47.6381430Z prmt.b32 %r3216, %r3214, %r3189, 0x5410U; 2026-02-21T12:39:47.6381650Z prmt.b32 %r3217, %r3216, %r3208, 
0x5040U; 2026-02-21T12:39:47.6381857Z prmt.b32 %r3218, %r3217, 0, 0x9991U; 2026-02-21T12:39:47.6382160Z cvt.u16.u32 %rs39, %r3218; 2026-02-21T12:39:47.6382335Z shr.s16 %rs40, %rs39, 4; 2026-02-21T12:39:47.6382517Z prmt.b32 %r3219, %r3217, 0, 0xbbb3U; 2026-02-21T12:39:47.6382712Z cvt.u16.u32 %rs41, %r3219; 2026-02-21T12:39:47.6382978Z shr.s16 %rs42, %rs41, 4; 2026-02-21T12:39:47.6383147Z cvt.s16.s8 %rs43, %rs37; 2026-02-21T12:39:47.6383319Z shr.s16 %rs44, %rs43, 4; 2026-02-21T12:39:47.6383485Z cvt.s16.s8 %rs45, %rs38; 2026-02-21T12:39:47.6383657Z shr.s16 %rs46, %rs45, 4; 2026-02-21T12:39:47.6383964Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6384315Z cvt.rn.f32.s16 %r3220, %rs42; 2026-02-21T12:39:47.6384502Z cvt.rn.f32.s16 %r3221, %rs40; 2026-02-21T12:39:47.6384682Z cvt.rn.f32.s16 %r3222, %rs46; 2026-02-21T12:39:47.6384862Z cvt.rn.f32.s16 %r3223, %rs44; 2026-02-21T12:39:47.6385174Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6385621Z shl.b16 %rs47, %rs36, 4; 2026-02-21T12:39:47.6385803Z shl.b16 %rs48, %rs35, 4; 2026-02-21T12:39:47.6386108Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6386589Z prmt.b32 %r3224, %r3208, %r3225, 0x3020U; 2026-02-21T12:39:47.6386825Z prmt.b32 %r3226, %r3224, 0, 0x9991U; 2026-02-21T12:39:47.6387031Z cvt.u16.u32 %rs49, %r3226; 2026-02-21T12:39:47.6387287Z shr.s16 %rs50, %rs49, 4; 2026-02-21T12:39:47.6387473Z cvt.s16.s8 %rs51, %rs47; 2026-02-21T12:39:47.6387647Z shr.s16 %rs52, %rs51, 4; 2026-02-21T12:39:47.6387818Z cvt.s16.s8 %rs53, %rs48; 2026-02-21T12:39:47.6387983Z shr.s16 %rs54, %rs53, 4; 2026-02-21T12:39:47.6388160Z prmt.b32 %r3227, %r3208, 0, 0xbbb3U; 2026-02-21T12:39:47.6388362Z cvt.u16.u32 %rs55, %r3227; 2026-02-21T12:39:47.6388603Z shr.s16 %rs56, %rs55, 4; 2026-02-21T12:39:47.6388914Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 
2026-02-21T12:39:47.6389270Z cvt.rn.f32.s16 %r3228, %rs50; 2026-02-21T12:39:47.6389459Z cvt.rn.f32.s16 %r3229, %rs56; 2026-02-21T12:39:47.6389639Z cvt.rn.f32.s16 %r3230, %rs54; 2026-02-21T12:39:47.6389826Z cvt.rn.f32.s16 %r3231, %rs52; 2026-02-21T12:39:47.6390003Z bar.sync 0; 2026-02-21T12:39:47.6390204Z st.shared.v4.b32 [%r24], {%r3199, %r3197, %r3198, %r3196}; 2026-02-21T12:39:47.6390506Z st.shared.v4.b32 [%r25], {%r3207, %r3204, %r3206, %r3205}; 2026-02-21T12:39:47.6390793Z st.shared.v4.b32 [%r26], {%r3223, %r3221, %r3222, %r3220}; 2026-02-21T12:39:47.6391080Z st.shared.v4.b32 [%r27], {%r3231, %r3228, %r3230, %r3229}; 2026-02-21T12:39:47.6391315Z $L__tmp1: 2026-02-21T12:39:47.6391673Z .loc 2 291 36 // standard.py:291:36 @[ conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:87:40 ] 2026-02-21T12:39:47.6392092Z // begin inline asm 2026-02-21T12:39:47.6392279Z fence.proxy.async.shared::cta; 2026-02-21T12:39:47.6392474Z // end inline asm 2026-02-21T12:39:47.6392641Z bar.sync 0; 2026-02-21T12:39:47.6392813Z shfl.sync.idx.b32 %r3232, %r2, 0, 31, -1; 2026-02-21T12:39:47.6393041Z wgmma.fence.sync.aligned; 2026-02-21T12:39:47.6393230Z mov.pred %p3, -1; 2026-02-21T12:39:47.6393390Z // begin inline asm 2026-02-21T12:39:47.6396235Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r11247,%r11248,%r11249,%r11250,%r11251,%r11252,%r11253,%r11254,%r11255,%r11256,%r11257,%r11258,%r11259,%r11260,%r11261,%r11262,%r11263,%r11264,%r11265,%r11266,%r11267,%r11268,%r11269,%r11270,%r11271,%r11272,%r11273,%r11274,%r11275,%r11276,%r11277,%r11278,%r11279,%r11280,%r11281,%r11282,%r11283,%r11284,%r11285,%r11286,%r11287,%r11288,%r11289,%r11290,%r11291,%r11292,%r11293,%r11294,%r11295,%r11296,%r11297,%r11298,%r11299,%r11300,%r11301,%r11302,%r11303,%r11304,%r11305,%r11306,%r11307,%r11308,%r11309,%r11310,%r11311,%r11312,%r11313,%r11314,%r11315,%r11316,%r11317,%r11318,%r11319,%r11320,%r11321,%r11322,%r11323,%r11324,%r11325,%r11326,%r11327,%r11328,%r11329,%r11330,%r11331,%r11332,%r11333,%r11334,%r11335,%r11336,%r11337,%r11338,%r11339,%r11340,%r11341,%r11342,%r11343,%r11344,%r11345,%r11346,%r11347,%r11348,%r11349,%r11350,%r11351,%r11352,%r11353,%r11354,%r11355,%r11356,%r11357,%r11358,%r11359,%r11360,%r11361,%r11362,%r11363,%r11364,%r11365,%r11366,%r11367,%r11368,%r11369,%r11370,%r11371,%r11372,%r11373,%r11374}, {%r1861,%r1862,%r1863,%r1864}, %rd404, %p3, 1, 1; 2026-02-21T12:39:47.6399564Z // end inline asm 2026-02-21T12:39:47.6399728Z // begin inline asm 2026-02-21T12:39:47.6402685Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r11247,%r11248,%r11249,%r11250,%r11251,%r11252,%r11253,%r11254,%r11255,%r11256,%r11257,%r11258,%r11259,%r11260,%r11261,%r11262,%r11263,%r11264,%r11265,%r11266,%r11267,%r11268,%r11269,%r11270,%r11271,%r11272,%r11273,%r11274,%r11275,%r11276,%r11277,%r11278,%r11279,%r11280,%r11281,%r11282,%r11283,%r11284,%r11285,%r11286,%r11287,%r11288,%r11289,%r11290,%r11291,%r11292,%r11293,%r11294,%r11295,%r11296,%r11297,%r11298,%r11299,%r11300,%r11301,%r11302,%r11303,%r11304,%r11305,%r11306,%r11307,%r11308,%r11309,%r11310,%r11311,%r11312,%r11313,%r11314,%r11315,%r11316,%r11317,%r11318,%r11319,%r11320,%r11321,%r11322,%r11323,%r11324,%r11325,%r11326,%r11327,%r11328,%r11329,%r11330,%r11331,%r11332,%r11333,%r11334,%r11335,%r11336,%r11337,%r11338,%r11339,%r11340,%r11341,%r11342,%r11343,%r11344,%r11345,%r11346,%r11347,%r11348,%r11349,%r11350,%r11351,%r11352,%r11353,%r11354,%r11355,%r11356,%r11357,%r11358,%r11359,%r11360,%r11361,%r11362,%r11363,%r11364,%r11365,%r11366,%r11367,%r11368,%r11369,%r11370,%r11371,%r11372,%r11373,%r11374}, {%r2121,%r2122,%r2123,%r2124}, %rd405, %p3, 1, 1; 2026-02-21T12:39:47.6405697Z // end inline asm 2026-02-21T12:39:47.6405874Z wgmma.commit_group.sync.aligned; 2026-02-21T12:39:47.6406078Z mov.b32 %r3042, 0; 2026-02-21T12:39:47.6406242Z mov.b32 %r2253, %r10096; 2026-02-21T12:39:47.6406414Z mov.b32 %r2254, %r3042; 2026-02-21T12:39:47.6406712Z mov.b32 %r2255, %r3042; 2026-02-21T12:39:47.6406883Z // begin inline asm 2026-02-21T12:39:47.6409508Z // wait for regs: 
%r11247,%r11248,%r11249,%r11250,%r11251,%r11252,%r11253,%r11254,%r11255,%r11256,%r11257,%r11258,%r11259,%r11260,%r11261,%r11262,%r11263,%r11264,%r11265,%r11266,%r11267,%r11268,%r11269,%r11270,%r11271,%r11272,%r11273,%r11274,%r11275,%r11276,%r11277,%r11278,%r11279,%r11280,%r11281,%r11282,%r11283,%r11284,%r11285,%r11286,%r11287,%r11288,%r11289,%r11290,%r11291,%r11292,%r11293,%r11294,%r11295,%r11296,%r11297,%r11298,%r11299,%r11300,%r11301,%r11302,%r11303,%r11304,%r11305,%r11306,%r11307,%r11308,%r11309,%r11310,%r11311,%r11312,%r11313,%r11314,%r11315,%r11316,%r11317,%r11318,%r11319,%r11320,%r11321,%r11322,%r11323,%r11324,%r11325,%r11326,%r11327,%r11328,%r11329,%r11330,%r11331,%r11332,%r11333,%r11334,%r11335,%r11336,%r11337,%r11338,%r11339,%r11340,%r11341,%r11342,%r11343,%r11344,%r11345,%r11346,%r11347,%r11348,%r11349,%r11350,%r11351,%r11352,%r11353,%r11354,%r11355,%r11356,%r11357,%r11358,%r11359,%r11360,%r11361,%r11362,%r11363,%r11364,%r11365,%r11366,%r11367,%r11368,%r11369,%r11370,%r11371,%r11372,%r11373,%r11374,%r2253,%r2254,%r2255 2026-02-21T12:39:47.6412301Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:39:47.6412498Z // end inline asm 2026-02-21T12:39:47.6412645Z $L__tmp2: 2026-02-21T12:39:47.6412942Z .loc 1 51 80 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:80 2026-02-21T12:39:47.6413298Z // begin inline asm 2026-02-21T12:39:47.6413462Z mov.u64 %rd184, 0x0; 2026-02-21T12:39:47.6413704Z createpolicy.fractional.L2::evict_last.b64 %rd184, 1.0; 2026-02-21T12:39:47.6413964Z // end inline asm 2026-02-21T12:39:47.6414110Z // begin inline asm 2026-02-21T12:39:47.6414268Z mov.u32 %r2387, 0x0; 2026-02-21T12:39:47.6414430Z mov.u32 %r2388, 0x0; 2026-02-21T12:39:47.6414578Z mov.u32 %r2389, 0x0; 2026-02-21T12:39:47.6414733Z mov.u32 %r2390, 0x0; 2026-02-21T12:39:47.6415048Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r2387, %r2388, %r2389, %r2390 }, [ %rd592 + 0 ], %rd184; 2026-02-21T12:39:47.6415416Z // end inline asm 2026-02-21T12:39:47.6415810Z .loc 1 
55 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:55:32 2026-02-21T12:39:47.6416163Z bar.sync 0; 2026-02-21T12:39:47.6416346Z st.shared.v2.b32 [%r10], {%r2387, %r2388}; 2026-02-21T12:39:47.6416761Z st.shared.v2.b32 [%r11], {%r2389, %r2390}; 2026-02-21T12:39:47.6416989Z bar.sync 0; 2026-02-21T12:39:47.6417142Z ld.shared.b16 %rs57, [%r13]; 2026-02-21T12:39:47.6417345Z ld.shared.b16 %rs58, [%r13+256]; 2026-02-21T12:39:47.6417557Z ld.shared.b16 %rs59, [%r13+16]; 2026-02-21T12:39:47.6417754Z ld.shared.b16 %rs60, [%r13+272]; 2026-02-21T12:39:47.6417949Z ld.shared.b16 %rs61, [%r14]; 2026-02-21T12:39:47.6418138Z ld.shared.b16 %rs62, [%r14+256]; 2026-02-21T12:39:47.6418331Z ld.shared.b16 %rs63, [%r14+16]; 2026-02-21T12:39:47.6418530Z ld.shared.b16 %rs64, [%r14+272]; 2026-02-21T12:39:47.6418730Z cvt.f32.bf16 %r2649, %rs57; 2026-02-21T12:39:47.6418911Z cvt.f32.bf16 %r2650, %rs58; 2026-02-21T12:39:47.6419091Z cvt.f32.bf16 %r2651, %rs61; 2026-02-21T12:39:47.6419366Z cvt.f32.bf16 %r2652, %rs62; 2026-02-21T12:39:47.6419556Z cvt.f32.bf16 %r2909, %rs59; 2026-02-21T12:39:47.6419729Z cvt.f32.bf16 %r2910, %rs60; 2026-02-21T12:39:47.6419908Z cvt.f32.bf16 %r2911, %rs63; 2026-02-21T12:39:47.6420083Z cvt.f32.bf16 %r2912, %rs64; 2026-02-21T12:39:47.6420407Z .loc 1 57 87 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:57:87 2026-02-21T12:39:47.6420844Z add.s64 %rd187, %rd591, 10240; 2026-02-21T12:39:47.6421037Z // begin inline asm 2026-02-21T12:39:47.6421215Z mov.u32 %r2391, 0x0; 2026-02-21T12:39:47.6421376Z mov.u32 %r2392, 0x0; 2026-02-21T12:39:47.6421573Z ld.global.v2.b32 { %r2391, %r2392 }, [ %rd187 + 0 ]; 2026-02-21T12:39:47.6421809Z // end inline asm 2026-02-21T12:39:47.6422112Z .loc 1 65 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:65:28 2026-02-21T12:39:47.6422457Z bar.sync 0; 2026-02-21T12:39:47.6422634Z st.shared.b8 [%r15], %r2391; 2026-02-21T12:39:47.6422841Z prmt.b32 %r3233, %r2391, 0, 0x7771U; 2026-02-21T12:39:47.6423051Z st.shared.b8 
[%r16+256], %r3233; 2026-02-21T12:39:47.6423255Z prmt.b32 %r3234, %r2391, 0, 0x7772U; 2026-02-21T12:39:47.6423453Z st.shared.b8 [%r17+512], %r3234; 2026-02-21T12:39:47.6423661Z prmt.b32 %r3235, %r2391, 0, 0x7773U; 2026-02-21T12:39:47.6423863Z st.shared.b8 [%r18+768], %r3235; 2026-02-21T12:39:47.6424061Z st.shared.b8 [%r19+1024], %r2392; 2026-02-21T12:39:47.6424258Z prmt.b32 %r3236, %r2392, 0, 0x7771U; 2026-02-21T12:39:47.6424455Z st.shared.b8 [%r20+1280], %r3236; 2026-02-21T12:39:47.6424652Z prmt.b32 %r3237, %r2392, 0, 0x7772U; 2026-02-21T12:39:47.6424846Z st.shared.b8 [%r21+1536], %r3237; 2026-02-21T12:39:47.6425045Z prmt.b32 %r3238, %r2392, 0, 0x7773U; 2026-02-21T12:39:47.6425240Z st.shared.b8 [%r22+1792], %r3238; 2026-02-21T12:39:47.6425430Z bar.sync 0; 2026-02-21T12:39:47.6425580Z ld.shared.b32 %r3239, [%r23]; 2026-02-21T12:39:47.6425771Z prmt.b32 %r3240, %r3239, 0, 0x7771U; 2026-02-21T12:39:47.6425972Z cvt.u16.u32 %rs65, %r3240; 2026-02-21T12:39:47.6426157Z prmt.b32 %r3241, %r3239, 0, 0x7770U; 2026-02-21T12:39:47.6426364Z cvt.u16.u32 %rs66, %r3241; 2026-02-21T12:39:47.6426702Z prmt.b32 %r3242, %r3239, 0, 0x7773U; 2026-02-21T12:39:47.6426909Z cvt.u16.u32 %rs67, %r3242; 2026-02-21T12:39:47.6427091Z prmt.b32 %r3243, %r3239, 0, 0x7772U; 2026-02-21T12:39:47.6427289Z cvt.u16.u32 %rs68, %r3243; 2026-02-21T12:39:47.6427614Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6427978Z shl.b16 %rs69, %rs66, 4; 2026-02-21T12:39:47.6428150Z shl.b16 %rs70, %rs65, 4; 2026-02-21T12:39:47.6428520Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6428877Z cvt.u32.u16 %r3244, %rs69; 2026-02-21T12:39:47.6429066Z prmt.b32 %r3245, %r3244, %r3246, 0x3340U; 2026-02-21T12:39:47.6429289Z prmt.b32 %r3247, %r3245, %r3189, 0x5410U; 2026-02-21T12:39:47.6442922Z prmt.b32 %r3248, %r3247, %r3239, 0x5040U; 2026-02-21T12:39:47.6443268Z prmt.b32 %r3249, %r3248, 0, 0x9991U; 2026-02-21T12:39:47.6443501Z 
cvt.u16.u32 %rs71, %r3249; 2026-02-21T12:39:47.6443709Z shr.s16 %rs72, %rs71, 4; 2026-02-21T12:39:47.6444095Z prmt.b32 %r3250, %r3248, 0, 0xbbb3U; 2026-02-21T12:39:47.6444323Z cvt.u16.u32 %rs73, %r3250; 2026-02-21T12:39:47.6444519Z shr.s16 %rs74, %rs73, 4; 2026-02-21T12:39:47.6444706Z cvt.s16.s8 %rs75, %rs69; 2026-02-21T12:39:47.6444886Z shr.s16 %rs76, %rs75, 4; 2026-02-21T12:39:47.6445063Z cvt.s16.s8 %rs77, %rs70; 2026-02-21T12:39:47.6445240Z shr.s16 %rs78, %rs77, 4; 2026-02-21T12:39:47.6445572Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6445953Z cvt.rn.f32.s16 %r3251, %rs74; 2026-02-21T12:39:47.6446146Z cvt.rn.f32.s16 %r3252, %rs72; 2026-02-21T12:39:47.6446345Z cvt.rn.f32.s16 %r3253, %rs78; 2026-02-21T12:39:47.6446698Z cvt.rn.f32.s16 %r3254, %rs76; 2026-02-21T12:39:47.6447143Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6447522Z shl.b16 %rs79, %rs68, 4; 2026-02-21T12:39:47.6447707Z shl.b16 %rs80, %rs67, 4; 2026-02-21T12:39:47.6448025Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6448388Z prmt.b32 %r3255, %r3239, %r3256, 0x3020U; 2026-02-21T12:39:47.6448701Z prmt.b32 %r3257, %r3255, 0, 0x9991U; 2026-02-21T12:39:47.6448908Z cvt.u16.u32 %rs81, %r3257; 2026-02-21T12:39:47.6449094Z shr.s16 %rs82, %rs81, 4; 2026-02-21T12:39:47.6449265Z cvt.s16.s8 %rs83, %rs79; 2026-02-21T12:39:47.6449436Z shr.s16 %rs84, %rs83, 4; 2026-02-21T12:39:47.6449602Z cvt.s16.s8 %rs85, %rs80; 2026-02-21T12:39:47.6449781Z shr.s16 %rs86, %rs85, 4; 2026-02-21T12:39:47.6449977Z prmt.b32 %r3258, %r3239, 0, 0xbbb3U; 2026-02-21T12:39:47.6450178Z cvt.u16.u32 %rs87, %r3258; 2026-02-21T12:39:47.6450359Z shr.s16 %rs88, %rs87, 4; 2026-02-21T12:39:47.6450667Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6451028Z cvt.rn.f32.s16 %r3259, %rs82; 2026-02-21T12:39:47.6451209Z cvt.rn.f32.s16 %r3260, 
%rs88; 2026-02-21T12:39:47.6451408Z cvt.rn.f32.s16 %r3261, %rs86; 2026-02-21T12:39:47.6451590Z cvt.rn.f32.s16 %r3262, %rs84; 2026-02-21T12:39:47.6451909Z .loc 1 65 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:65:28 2026-02-21T12:39:47.6452268Z ld.shared.b32 %r3263, [%r23+128]; 2026-02-21T12:39:47.6452474Z prmt.b32 %r3264, %r3263, 0, 0x7771U; 2026-02-21T12:39:47.6452679Z cvt.u16.u32 %rs89, %r3264; 2026-02-21T12:39:47.6452858Z prmt.b32 %r3265, %r3263, 0, 0x7770U; 2026-02-21T12:39:47.6453058Z cvt.u16.u32 %rs90, %r3265; 2026-02-21T12:39:47.6453243Z prmt.b32 %r3266, %r3263, 0, 0x7773U; 2026-02-21T12:39:47.6453438Z cvt.u16.u32 %rs91, %r3266; 2026-02-21T12:39:47.6453614Z prmt.b32 %r3267, %r3263, 0, 0x7772U; 2026-02-21T12:39:47.6453820Z cvt.u16.u32 %rs92, %r3267; 2026-02-21T12:39:47.6454138Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6454485Z shl.b16 %rs93, %rs90, 4; 2026-02-21T12:39:47.6454682Z shl.b16 %rs94, %rs89, 4; 2026-02-21T12:39:47.6454989Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6455340Z cvt.u32.u16 %r3268, %rs93; 2026-02-21T12:39:47.6455528Z prmt.b32 %r3269, %r3268, %r3270, 0x3340U; 2026-02-21T12:39:47.6455756Z prmt.b32 %r3271, %r3269, %r3189, 0x5410U; 2026-02-21T12:39:47.6455981Z prmt.b32 %r3272, %r3271, %r3263, 0x5040U; 2026-02-21T12:39:47.6456188Z prmt.b32 %r3273, %r3272, 0, 0x9991U; 2026-02-21T12:39:47.6456393Z cvt.u16.u32 %rs95, %r3273; 2026-02-21T12:39:47.6456687Z shr.s16 %rs96, %rs95, 4; 2026-02-21T12:39:47.6456880Z prmt.b32 %r3274, %r3272, 0, 0xbbb3U; 2026-02-21T12:39:47.6457084Z cvt.u16.u32 %rs97, %r3274; 2026-02-21T12:39:47.6457373Z shr.s16 %rs98, %rs97, 4; 2026-02-21T12:39:47.6457546Z cvt.s16.s8 %rs99, %rs93; 2026-02-21T12:39:47.6457716Z shr.s16 %rs100, %rs99, 4; 2026-02-21T12:39:47.6457895Z cvt.s16.s8 %rs101, %rs94; 2026-02-21T12:39:47.6458146Z shr.s16 %rs102, %rs101, 4; 2026-02-21T12:39:47.6458479Z .loc 1 80 32 // 
conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6458864Z cvt.rn.f32.s16 %r3275, %rs98; 2026-02-21T12:39:47.6459057Z cvt.rn.f32.s16 %r3276, %rs96; 2026-02-21T12:39:47.6459238Z cvt.rn.f32.s16 %r3277, %rs102; 2026-02-21T12:39:47.6459426Z cvt.rn.f32.s16 %r3278, %rs100; 2026-02-21T12:39:47.6459766Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6460126Z shl.b16 %rs103, %rs92, 4; 2026-02-21T12:39:47.6460312Z shl.b16 %rs104, %rs91, 4; 2026-02-21T12:39:47.6460633Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6461082Z prmt.b32 %r3279, %r3263, %r3280, 0x3020U; 2026-02-21T12:39:47.6461308Z prmt.b32 %r3281, %r3279, 0, 0x9991U; 2026-02-21T12:39:47.6461522Z cvt.u16.u32 %rs105, %r3281; 2026-02-21T12:39:47.6461708Z shr.s16 %rs106, %rs105, 4; 2026-02-21T12:39:47.6461889Z cvt.s16.s8 %rs107, %rs103; 2026-02-21T12:39:47.6462060Z shr.s16 %rs108, %rs107, 4; 2026-02-21T12:39:47.6462235Z cvt.s16.s8 %rs109, %rs104; 2026-02-21T12:39:47.6462468Z shr.s16 %rs110, %rs109, 4; 2026-02-21T12:39:47.6462648Z prmt.b32 %r3282, %r3263, 0, 0xbbb3U; 2026-02-21T12:39:47.6462856Z cvt.u16.u32 %rs111, %r3282; 2026-02-21T12:39:47.6463036Z shr.s16 %rs112, %rs111, 4; 2026-02-21T12:39:47.6463364Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6463720Z cvt.rn.f32.s16 %r3283, %rs106; 2026-02-21T12:39:47.6463905Z cvt.rn.f32.s16 %r3284, %rs112; 2026-02-21T12:39:47.6464088Z cvt.rn.f32.s16 %r3285, %rs110; 2026-02-21T12:39:47.6464273Z cvt.rn.f32.s16 %r3286, %rs108; 2026-02-21T12:39:47.6464459Z bar.sync 0; 2026-02-21T12:39:47.6464673Z st.shared.v4.b32 [%r24], {%r3254, %r3252, %r3253, %r3251}; 2026-02-21T12:39:47.6464994Z st.shared.v4.b32 [%r25], {%r3262, %r3259, %r3261, %r3260}; 2026-02-21T12:39:47.6465287Z st.shared.v4.b32 [%r26], {%r3278, %r3276, %r3277, %r3275}; 2026-02-21T12:39:47.6465579Z st.shared.v4.b32 [%r27], {%r3286, %r3283, 
%r3285, %r3284}; 2026-02-21T12:39:47.6465821Z $L__tmp3: 2026-02-21T12:39:47.6466199Z .loc 2 291 36 // standard.py:291:36 @[ conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:87:40 ] 2026-02-21T12:39:47.6466753Z // begin inline asm 2026-02-21T12:39:47.6466960Z fence.proxy.async.shared::cta; 2026-02-21T12:39:47.6467165Z // end inline asm 2026-02-21T12:39:47.6467317Z bar.sync 0; 2026-02-21T12:39:47.6467487Z wgmma.fence.sync.aligned; 2026-02-21T12:39:47.6467688Z // begin inline asm 2026-02-21T12:39:47.6470594Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r11247,%r11248,%r11249,%r11250,%r11251,%r11252,%r11253,%r11254,%r11255,%r11256,%r11257,%r11258,%r11259,%r11260,%r11261,%r11262,%r11263,%r11264,%r11265,%r11266,%r11267,%r11268,%r11269,%r11270,%r11271,%r11272,%r11273,%r11274,%r11275,%r11276,%r11277,%r11278,%r11279,%r11280,%r11281,%r11282,%r11283,%r11284,%r11285,%r11286,%r11287,%r11288,%r11289,%r11290,%r11291,%r11292,%r11293,%r11294,%r11295,%r11296,%r11297,%r11298,%r11299,%r11300,%r11301,%r11302,%r11303,%r11304,%r11305,%r11306,%r11307,%r11308,%r11309,%r11310,%r11311,%r11312,%r11313,%r11314,%r11315,%r11316,%r11317,%r11318,%r11319,%r11320,%r11321,%r11322,%r11323,%r11324,%r11325,%r11326,%r11327,%r11328,%r11329,%r11330,%r11331,%r11332,%r11333,%r11334,%r11335,%r11336,%r11337,%r11338,%r11339,%r11340,%r11341,%r11342,%r11343,%r11344,%r11345,%r11346,%r11347,%r11348,%r11349,%r11350,%r11351,%r11352,%r11353,%r11354,%r11355,%r11356,%r11357,%r11358,%r11359,%r11360,%r11361,%r11362,%r11363,%r11364,%r11365,%r11366,%r11367,%r11368,%r11369,%r11370,%r11371,%r11372,%r11373,%r11374}, {%r2649,%r2650,%r2651,%r2652}, %rd404, %p3, 1, 1; 2026-02-21T12:39:47.6473668Z // end inline asm 2026-02-21T12:39:47.6473823Z // begin inline asm 2026-02-21T12:39:47.6476859Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r11247,%r11248,%r11249,%r11250,%r11251,%r11252,%r11253,%r11254,%r11255,%r11256,%r11257,%r11258,%r11259,%r11260,%r11261,%r11262,%r11263,%r11264,%r11265,%r11266,%r11267,%r11268,%r11269,%r11270,%r11271,%r11272,%r11273,%r11274,%r11275,%r11276,%r11277,%r11278,%r11279,%r11280,%r11281,%r11282,%r11283,%r11284,%r11285,%r11286,%r11287,%r11288,%r11289,%r11290,%r11291,%r11292,%r11293,%r11294,%r11295,%r11296,%r11297,%r11298,%r11299,%r11300,%r11301,%r11302,%r11303,%r11304,%r11305,%r11306,%r11307,%r11308,%r11309,%r11310,%r11311,%r11312,%r11313,%r11314,%r11315,%r11316,%r11317,%r11318,%r11319,%r11320,%r11321,%r11322,%r11323,%r11324,%r11325,%r11326,%r11327,%r11328,%r11329,%r11330,%r11331,%r11332,%r11333,%r11334,%r11335,%r11336,%r11337,%r11338,%r11339,%r11340,%r11341,%r11342,%r11343,%r11344,%r11345,%r11346,%r11347,%r11348,%r11349,%r11350,%r11351,%r11352,%r11353,%r11354,%r11355,%r11356,%r11357,%r11358,%r11359,%r11360,%r11361,%r11362,%r11363,%r11364,%r11365,%r11366,%r11367,%r11368,%r11369,%r11370,%r11371,%r11372,%r11373,%r11374}, {%r2909,%r2910,%r2911,%r2912}, %rd405, %p3, 1, 1; 2026-02-21T12:39:47.6479974Z // end inline asm 2026-02-21T12:39:47.6480149Z wgmma.commit_group.sync.aligned; 2026-02-21T12:39:47.6480431Z mov.b32 %r3041, %r10096; 2026-02-21T12:39:47.6480612Z mov.b32 %r3043, %r3042; 2026-02-21T12:39:47.6480778Z // begin inline asm 2026-02-21T12:39:47.6483393Z // wait for regs: 
%r11247,%r11248,%r11249,%r11250,%r11251,%r11252,%r11253,%r11254,%r11255,%r11256,%r11257,%r11258,%r11259,%r11260,%r11261,%r11262,%r11263,%r11264,%r11265,%r11266,%r11267,%r11268,%r11269,%r11270,%r11271,%r11272,%r11273,%r11274,%r11275,%r11276,%r11277,%r11278,%r11279,%r11280,%r11281,%r11282,%r11283,%r11284,%r11285,%r11286,%r11287,%r11288,%r11289,%r11290,%r11291,%r11292,%r11293,%r11294,%r11295,%r11296,%r11297,%r11298,%r11299,%r11300,%r11301,%r11302,%r11303,%r11304,%r11305,%r11306,%r11307,%r11308,%r11309,%r11310,%r11311,%r11312,%r11313,%r11314,%r11315,%r11316,%r11317,%r11318,%r11319,%r11320,%r11321,%r11322,%r11323,%r11324,%r11325,%r11326,%r11327,%r11328,%r11329,%r11330,%r11331,%r11332,%r11333,%r11334,%r11335,%r11336,%r11337,%r11338,%r11339,%r11340,%r11341,%r11342,%r11343,%r11344,%r11345,%r11346,%r11347,%r11348,%r11349,%r11350,%r11351,%r11352,%r11353,%r11354,%r11355,%r11356,%r11357,%r11358,%r11359,%r11360,%r11361,%r11362,%r11363,%r11364,%r11365,%r11366,%r11367,%r11368,%r11369,%r11370,%r11371,%r11372,%r11373,%r11374,%r3041,%r3042,%r3043 2026-02-21T12:39:47.6486204Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:39:47.6486399Z // end inline asm 2026-02-21T12:39:47.6486677Z $L__tmp4: 2026-02-21T12:39:47.6486983Z .loc 1 43 126 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:43:126 2026-02-21T12:39:47.6487359Z add.s64 %rd593, %rd593, 16; 2026-02-21T12:39:47.6487550Z add.s64 %rd592, %rd592, 64; 2026-02-21T12:39:47.6487736Z add.s64 %rd591, %rd591, 20480; 2026-02-21T12:39:47.6487940Z setp.lt.u64 %p7, %rd593, 4080; 2026-02-21T12:39:47.6488130Z @%p7 bra $L__BB0_6; 2026-02-21T12:39:47.6488353Z // %bb.7: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:39:47.6488774Z .loc 1 34 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:34:32 2026-02-21T12:39:47.6489152Z or.b64 %rd206, %rd36, %rd7; 2026-02-21T12:39:47.6489339Z or.b64 %rd207, %rd36, %rd8; 2026-02-21T12:39:47.6489525Z or.b64 %rd208, %rd36, %rd9; 2026-02-21T12:39:47.6489717Z or.b64 %rd209, %rd36, %rd10; 
2026-02-21T12:39:47.6489918Z or.b64 %rd210, %rd36, %rd11; 2026-02-21T12:39:47.6490114Z or.b64 %rd211, %rd36, %rd12; 2026-02-21T12:39:47.6490293Z or.b64 %rd212, %rd36, %rd13; 2026-02-21T12:39:47.6490478Z or.b64 %rd213, %rd36, %rd14; 2026-02-21T12:39:47.6490655Z or.b64 %rd214, %rd36, %rd15; 2026-02-21T12:39:47.6490843Z or.b64 %rd215, %rd36, %rd16; 2026-02-21T12:39:47.6491106Z or.b64 %rd216, %rd36, %rd17; 2026-02-21T12:39:47.6491299Z or.b64 %rd217, %rd36, %rd18; 2026-02-21T12:39:47.6491480Z or.b64 %rd218, %rd36, %rd19; 2026-02-21T12:39:47.6491669Z or.b64 %rd219, %rd36, %rd20; 2026-02-21T12:39:47.6491924Z or.b64 %rd220, %rd36, %rd21; 2026-02-21T12:39:47.6492103Z or.b64 %rd221, %rd36, %rd22; 2026-02-21T12:39:47.6492434Z .loc 1 90 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:90:28 2026-02-21T12:39:47.6492815Z cvt.rn.bf16x2.f32 %r3431, %r11248, %r11247; 2026-02-21T12:39:47.6493082Z cvt.rn.bf16x2.f32 %r3432, %r11250, %r11249; 2026-02-21T12:39:47.6493319Z cvt.rn.bf16x2.f32 %r3433, %r11252, %r11251; 2026-02-21T12:39:47.6493552Z cvt.rn.bf16x2.f32 %r3434, %r11254, %r11253; 2026-02-21T12:39:47.6493788Z cvt.rn.bf16x2.f32 %r3435, %r11256, %r11255; 2026-02-21T12:39:47.6494010Z cvt.rn.bf16x2.f32 %r3436, %r11258, %r11257; 2026-02-21T12:39:47.6494242Z cvt.rn.bf16x2.f32 %r3437, %r11260, %r11259; 2026-02-21T12:39:47.6494538Z cvt.rn.bf16x2.f32 %r3438, %r11262, %r11261; 2026-02-21T12:39:47.6494776Z cvt.rn.bf16x2.f32 %r3439, %r11264, %r11263; 2026-02-21T12:39:47.6495000Z cvt.rn.bf16x2.f32 %r3440, %r11266, %r11265; 2026-02-21T12:39:47.6495231Z cvt.rn.bf16x2.f32 %r3441, %r11268, %r11267; 2026-02-21T12:39:47.6495463Z cvt.rn.bf16x2.f32 %r3442, %r11270, %r11269; 2026-02-21T12:39:47.6495694Z cvt.rn.bf16x2.f32 %r3443, %r11272, %r11271; 2026-02-21T12:39:47.6496030Z cvt.rn.bf16x2.f32 %r3444, %r11274, %r11273; 2026-02-21T12:39:47.6496261Z cvt.rn.bf16x2.f32 %r3445, %r11276, %r11275; 2026-02-21T12:39:47.6496609Z cvt.rn.bf16x2.f32 %r3446, %r11278, %r11277; 2026-02-21T12:39:47.6496838Z 
cvt.rn.bf16x2.f32 %r3447, %r11280, %r11279; 2026-02-21T12:39:47.6497081Z cvt.rn.bf16x2.f32 %r3448, %r11282, %r11281; 2026-02-21T12:39:47.6497310Z cvt.rn.bf16x2.f32 %r3449, %r11284, %r11283; 2026-02-21T12:39:47.6497534Z cvt.rn.bf16x2.f32 %r3450, %r11286, %r11285; 2026-02-21T12:39:47.6497774Z cvt.rn.bf16x2.f32 %r3451, %r11288, %r11287; 2026-02-21T12:39:47.6498022Z cvt.rn.bf16x2.f32 %r3452, %r11290, %r11289; 2026-02-21T12:39:47.6498262Z cvt.rn.bf16x2.f32 %r3453, %r11292, %r11291; 2026-02-21T12:39:47.6498493Z cvt.rn.bf16x2.f32 %r3454, %r11294, %r11293; 2026-02-21T12:39:47.6498731Z cvt.rn.bf16x2.f32 %r3455, %r11296, %r11295; 2026-02-21T12:39:47.6498957Z cvt.rn.bf16x2.f32 %r3456, %r11298, %r11297; 2026-02-21T12:39:47.6499192Z cvt.rn.bf16x2.f32 %r3457, %r11300, %r11299; 2026-02-21T12:39:47.6499423Z cvt.rn.bf16x2.f32 %r3458, %r11302, %r11301; 2026-02-21T12:39:47.6499649Z cvt.rn.bf16x2.f32 %r3459, %r11304, %r11303; 2026-02-21T12:39:47.6499883Z cvt.rn.bf16x2.f32 %r3460, %r11306, %r11305; 2026-02-21T12:39:47.6500113Z cvt.rn.bf16x2.f32 %r3461, %r11308, %r11307; 2026-02-21T12:39:47.6500362Z cvt.rn.bf16x2.f32 %r3462, %r11310, %r11309; 2026-02-21T12:39:47.6500593Z cvt.rn.bf16x2.f32 %r3463, %r11312, %r11311; 2026-02-21T12:39:47.6500826Z cvt.rn.bf16x2.f32 %r3464, %r11314, %r11313; 2026-02-21T12:39:47.6501056Z cvt.rn.bf16x2.f32 %r3465, %r11316, %r11315; 2026-02-21T12:39:47.6501288Z cvt.rn.bf16x2.f32 %r3466, %r11318, %r11317; 2026-02-21T12:39:47.6501515Z cvt.rn.bf16x2.f32 %r3467, %r11320, %r11319; 2026-02-21T12:39:47.6501739Z cvt.rn.bf16x2.f32 %r3468, %r11322, %r11321; 2026-02-21T12:39:47.6501973Z cvt.rn.bf16x2.f32 %r3469, %r11324, %r11323; 2026-02-21T12:39:47.6502198Z cvt.rn.bf16x2.f32 %r3470, %r11326, %r11325; 2026-02-21T12:39:47.6502432Z cvt.rn.bf16x2.f32 %r3471, %r11328, %r11327; 2026-02-21T12:39:47.6502656Z cvt.rn.bf16x2.f32 %r3472, %r11330, %r11329; 2026-02-21T12:39:47.6502899Z cvt.rn.bf16x2.f32 %r3473, %r11332, %r11331; 2026-02-21T12:39:47.6503125Z cvt.rn.bf16x2.f32 %r3474, 
%r11334, %r11333; 2026-02-21T12:39:47.6503348Z cvt.rn.bf16x2.f32 %r3475, %r11336, %r11335; 2026-02-21T12:39:47.6503579Z cvt.rn.bf16x2.f32 %r3476, %r11338, %r11337; 2026-02-21T12:39:47.6503803Z cvt.rn.bf16x2.f32 %r3477, %r11340, %r11339; 2026-02-21T12:39:47.6504025Z cvt.rn.bf16x2.f32 %r3478, %r11342, %r11341; 2026-02-21T12:39:47.6504344Z cvt.rn.bf16x2.f32 %r3479, %r11344, %r11343; 2026-02-21T12:39:47.6504571Z cvt.rn.bf16x2.f32 %r3480, %r11346, %r11345; 2026-02-21T12:39:47.6504796Z cvt.rn.bf16x2.f32 %r3481, %r11348, %r11347; 2026-02-21T12:39:47.6505090Z cvt.rn.bf16x2.f32 %r3482, %r11350, %r11349; 2026-02-21T12:39:47.6505322Z cvt.rn.bf16x2.f32 %r3483, %r11352, %r11351; 2026-02-21T12:39:47.6505547Z cvt.rn.bf16x2.f32 %r3484, %r11354, %r11353; 2026-02-21T12:39:47.6505791Z cvt.rn.bf16x2.f32 %r3485, %r11356, %r11355; 2026-02-21T12:39:47.6506019Z cvt.rn.bf16x2.f32 %r3486, %r11358, %r11357; 2026-02-21T12:39:47.6506248Z cvt.rn.bf16x2.f32 %r3487, %r11360, %r11359; 2026-02-21T12:39:47.6506582Z cvt.rn.bf16x2.f32 %r3488, %r11362, %r11361; 2026-02-21T12:39:47.6506835Z cvt.rn.bf16x2.f32 %r3489, %r11364, %r11363; 2026-02-21T12:39:47.6507064Z cvt.rn.bf16x2.f32 %r3490, %r11366, %r11365; 2026-02-21T12:39:47.6507296Z cvt.rn.bf16x2.f32 %r3491, %r11368, %r11367; 2026-02-21T12:39:47.6507530Z cvt.rn.bf16x2.f32 %r3492, %r11370, %r11369; 2026-02-21T12:39:47.6507847Z cvt.rn.bf16x2.f32 %r3493, %r11372, %r11371; 2026-02-21T12:39:47.6508074Z cvt.rn.bf16x2.f32 %r3494, %r11374, %r11373; 2026-02-21T12:39:47.6508502Z .loc 1 91 22 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:91:22 2026-02-21T12:39:47.6508882Z mad.lo.s64 %rd222, %rd206, 2560, %rd151; 2026-02-21T12:39:47.6509103Z shl.b64 %rd223, %rd37, 1; 2026-02-21T12:39:47.6509359Z add.s64 %rd190, %rd222, %rd223; 2026-02-21T12:39:47.6509565Z mad.lo.s64 %rd224, %rd207, 2560, %rd151; 2026-02-21T12:39:47.6509772Z add.s64 %rd191, %rd224, %rd223; 2026-02-21T12:39:47.6509970Z mad.lo.s64 %rd225, %rd208, 2560, %rd151; 
2026-02-21T12:39:47.6510178Z add.s64 %rd192, %rd225, %rd223; 2026-02-21T12:39:47.6510368Z mad.lo.s64 %rd226, %rd209, 2560, %rd151; 2026-02-21T12:39:47.6510573Z add.s64 %rd193, %rd226, %rd223; 2026-02-21T12:39:47.6510760Z mad.lo.s64 %rd227, %rd210, 2560, %rd151; 2026-02-21T12:39:47.6510967Z add.s64 %rd194, %rd227, %rd223; 2026-02-21T12:39:47.6511159Z mad.lo.s64 %rd228, %rd211, 2560, %rd151; 2026-02-21T12:39:47.6511374Z add.s64 %rd195, %rd228, %rd223; 2026-02-21T12:39:47.6511566Z mad.lo.s64 %rd229, %rd212, 2560, %rd151; 2026-02-21T12:39:47.6511772Z add.s64 %rd196, %rd229, %rd223; 2026-02-21T12:39:47.6511964Z mad.lo.s64 %rd230, %rd213, 2560, %rd151; 2026-02-21T12:39:47.6512162Z add.s64 %rd197, %rd230, %rd223; 2026-02-21T12:39:47.6512357Z mad.lo.s64 %rd231, %rd214, 2560, %rd151; 2026-02-21T12:39:47.6512557Z add.s64 %rd198, %rd231, %rd223; 2026-02-21T12:39:47.6512747Z mad.lo.s64 %rd232, %rd215, 2560, %rd151; 2026-02-21T12:39:47.6512953Z add.s64 %rd199, %rd232, %rd223; 2026-02-21T12:39:47.6513142Z mad.lo.s64 %rd233, %rd216, 2560, %rd151; 2026-02-21T12:39:47.6513344Z add.s64 %rd200, %rd233, %rd223; 2026-02-21T12:39:47.6513537Z mad.lo.s64 %rd234, %rd217, 2560, %rd151; 2026-02-21T12:39:47.6513741Z add.s64 %rd201, %rd234, %rd223; 2026-02-21T12:39:47.6513929Z mad.lo.s64 %rd235, %rd218, 2560, %rd151; 2026-02-21T12:39:47.6514136Z add.s64 %rd202, %rd235, %rd223; 2026-02-21T12:39:47.6514323Z mad.lo.s64 %rd236, %rd219, 2560, %rd151; 2026-02-21T12:39:47.6514529Z add.s64 %rd203, %rd236, %rd223; 2026-02-21T12:39:47.6514725Z mad.lo.s64 %rd237, %rd220, 2560, %rd151; 2026-02-21T12:39:47.6514931Z add.s64 %rd204, %rd237, %rd223; 2026-02-21T12:39:47.6515120Z mad.lo.s64 %rd238, %rd221, 2560, %rd151; 2026-02-21T12:39:47.6515340Z add.s64 %rd205, %rd238, %rd223; 2026-02-21T12:39:47.6515676Z .loc 1 91 81 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:91:81 2026-02-21T12:39:47.6516031Z bar.sync 0; 2026-02-21T12:39:47.6516232Z st.shared.v4.b32 [%r28], {%r3431, %r3433, %r3435, %r3437}; 
2026-02-21T12:39:47.6516653Z st.shared.v4.b32 [%r29], {%r3439, %r3441, %r3443, %r3445}; 2026-02-21T12:39:47.6516948Z st.shared.v4.b32 [%r30], {%r3447, %r3449, %r3451, %r3453}; 2026-02-21T12:39:47.6517229Z st.shared.v4.b32 [%r31], {%r3455, %r3457, %r3459, %r3461}; 2026-02-21T12:39:47.6517602Z st.shared.v4.b32 [%r32], {%r3463, %r3465, %r3467, %r3469}; 2026-02-21T12:39:47.6517889Z st.shared.v4.b32 [%r33], {%r3471, %r3473, %r3475, %r3477}; 2026-02-21T12:39:47.6518170Z st.shared.v4.b32 [%r34], {%r3479, %r3481, %r3483, %r3485}; 2026-02-21T12:39:47.6518521Z st.shared.v4.b32 [%r35], {%r3487, %r3489, %r3491, %r3493}; 2026-02-21T12:39:47.6518753Z bar.sync 0; 2026-02-21T12:39:47.6518900Z // begin inline asm 2026-02-21T12:39:47.6519188Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3287, %r3288, %r3289, %r3290}, [%r3291]; 2026-02-21T12:39:47.6519524Z // end inline asm 2026-02-21T12:39:47.6519676Z // begin inline asm 2026-02-21T12:39:47.6519960Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3292, %r3293, %r3294, %r3295}, [%r3296]; 2026-02-21T12:39:47.6520301Z // end inline asm 2026-02-21T12:39:47.6520449Z // begin inline asm 2026-02-21T12:39:47.6520724Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3297, %r3298, %r3299, %r3300}, [%r3301]; 2026-02-21T12:39:47.6521045Z // end inline asm 2026-02-21T12:39:47.6521268Z // begin inline asm 2026-02-21T12:39:47.6521541Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3302, %r3303, %r3304, %r3305}, [%r3306]; 2026-02-21T12:39:47.6521868Z // end inline asm 2026-02-21T12:39:47.6522026Z // begin inline asm 2026-02-21T12:39:47.6522308Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3307, %r3308, %r3309, %r3310}, [%r3311]; 2026-02-21T12:39:47.6522635Z // end inline asm 2026-02-21T12:39:47.6522848Z // begin inline asm 2026-02-21T12:39:47.6523123Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3312, %r3313, %r3314, %r3315}, [%r3316]; 2026-02-21T12:39:47.6523445Z // end inline asm 2026-02-21T12:39:47.6523591Z // begin inline asm 2026-02-21T12:39:47.6523863Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3317, %r3318, %r3319, %r3320}, [%r3321]; 2026-02-21T12:39:47.6524185Z // end inline asm 2026-02-21T12:39:47.6524332Z // begin inline asm 2026-02-21T12:39:47.6524613Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3322, %r3323, %r3324, %r3325}, [%r3326]; 2026-02-21T12:39:47.6524942Z // end inline asm 2026-02-21T12:39:47.6525082Z bar.sync 0; 2026-02-21T12:39:47.6525274Z st.shared.v4.b32 [%r28], {%r3432, %r3434, %r3436, %r3438}; 2026-02-21T12:39:47.6525559Z st.shared.v4.b32 [%r29], {%r3440, %r3442, %r3444, %r3446}; 2026-02-21T12:39:47.6525847Z st.shared.v4.b32 [%r30], {%r3448, %r3450, %r3452, %r3454}; 2026-02-21T12:39:47.6526128Z st.shared.v4.b32 [%r31], {%r3456, %r3458, %r3460, %r3462}; 2026-02-21T12:39:47.6526411Z st.shared.v4.b32 [%r32], {%r3464, %r3466, %r3468, %r3470}; 2026-02-21T12:39:47.6526826Z st.shared.v4.b32 [%r33], {%r3472, %r3474, %r3476, %r3478}; 2026-02-21T12:39:47.6527108Z st.shared.v4.b32 [%r34], {%r3480, %r3482, %r3484, %r3486}; 2026-02-21T12:39:47.6527388Z st.shared.v4.b32 [%r35], {%r3488, %r3490, %r3492, %r3494}; 2026-02-21T12:39:47.6527621Z bar.sync 0; 2026-02-21T12:39:47.6527760Z // begin inline asm 2026-02-21T12:39:47.6528035Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3327, %r3328, %r3329, %r3330}, [%r3291]; 2026-02-21T12:39:47.6528366Z // end inline asm 2026-02-21T12:39:47.6528510Z // begin inline asm 2026-02-21T12:39:47.6528782Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3332, %r3333, %r3334, %r3335}, [%r3296]; 2026-02-21T12:39:47.6529104Z // end inline asm 2026-02-21T12:39:47.6529252Z // begin inline asm 2026-02-21T12:39:47.6529529Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3337, %r3338, %r3339, %r3340}, [%r3301]; 2026-02-21T12:39:47.6529850Z // end inline asm 2026-02-21T12:39:47.6529996Z // begin inline asm 2026-02-21T12:39:47.6530265Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3342, %r3343, %r3344, %r3345}, [%r3306]; 2026-02-21T12:39:47.6530590Z // end inline asm 2026-02-21T12:39:47.6530734Z // 
begin inline asm 2026-02-21T12:39:47.6531023Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3347, %r3348, %r3349, %r3350}, [%r3311]; 2026-02-21T12:39:47.6531344Z // end inline asm 2026-02-21T12:39:47.6531488Z // begin inline asm 2026-02-21T12:39:47.6531761Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3352, %r3353, %r3354, %r3355}, [%r3316]; 2026-02-21T12:39:47.6532167Z // end inline asm 2026-02-21T12:39:47.6532327Z // begin inline asm 2026-02-21T12:39:47.6532598Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3357, %r3358, %r3359, %r3360}, [%r3321]; 2026-02-21T12:39:47.6532988Z // end inline asm 2026-02-21T12:39:47.6533131Z // begin inline asm 2026-02-21T12:39:47.6533403Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3362, %r3363, %r3364, %r3365}, [%r3326]; 2026-02-21T12:39:47.6533723Z // end inline asm 2026-02-21T12:39:47.6533866Z // begin inline asm 2026-02-21T12:39:47.6534087Z st.global.v4.b32 [ %rd190 + 0 ], { %r3287, %r3288, %r3289, %r3290 }; 2026-02-21T12:39:47.6534359Z // end inline asm 2026-02-21T12:39:47.6534508Z // begin inline asm 2026-02-21T12:39:47.6534731Z st.global.v4.b32 [ %rd191 + 0 ], { %r3327, %r3328, %r3329, %r3330 }; 2026-02-21T12:39:47.6534988Z // end inline asm 2026-02-21T12:39:47.6535128Z // begin inline asm 2026-02-21T12:39:47.6535341Z st.global.v4.b32 [ %rd192 + 0 ], { %r3292, %r3293, %r3294, %r3295 }; 2026-02-21T12:39:47.6535670Z // end inline asm 2026-02-21T12:39:47.6535818Z // begin inline asm 2026-02-21T12:39:47.6536028Z st.global.v4.b32 [ %rd193 + 0 ], { %r3332, %r3333, %r3334, %r3335 }; 2026-02-21T12:39:47.6536281Z // end inline asm 2026-02-21T12:39:47.6536425Z // begin inline asm 2026-02-21T12:39:47.6536744Z st.global.v4.b32 [ %rd194 + 0 ], { %r3297, %r3298, %r3299, %r3300 }; 2026-02-21T12:39:47.6537077Z // end inline asm 2026-02-21T12:39:47.6537229Z // begin inline asm 2026-02-21T12:39:47.6537436Z st.global.v4.b32 [ %rd195 + 0 ], { %r3337, %r3338, %r3339, %r3340 }; 2026-02-21T12:39:47.6537695Z // end inline asm 
2026-02-21T12:39:47.6537843Z // begin inline asm 2026-02-21T12:39:47.6538064Z st.global.v4.b32 [ %rd196 + 0 ], { %r3302, %r3303, %r3304, %r3305 }; 2026-02-21T12:39:47.6538320Z // end inline asm 2026-02-21T12:39:47.6538472Z // begin inline asm 2026-02-21T12:39:47.6538686Z st.global.v4.b32 [ %rd197 + 0 ], { %r3342, %r3343, %r3344, %r3345 }; 2026-02-21T12:39:47.6538945Z // end inline asm 2026-02-21T12:39:47.6539093Z // begin inline asm 2026-02-21T12:39:47.6539310Z st.global.v4.b32 [ %rd198 + 0 ], { %r3307, %r3308, %r3309, %r3310 }; 2026-02-21T12:39:47.6539569Z // end inline asm 2026-02-21T12:39:47.6539716Z // begin inline asm 2026-02-21T12:39:47.6539927Z st.global.v4.b32 [ %rd199 + 0 ], { %r3347, %r3348, %r3349, %r3350 }; 2026-02-21T12:39:47.6540179Z // end inline asm 2026-02-21T12:39:47.6540328Z // begin inline asm 2026-02-21T12:39:47.6540533Z st.global.v4.b32 [ %rd200 + 0 ], { %r3312, %r3313, %r3314, %r3315 }; 2026-02-21T12:39:47.6540785Z // end inline asm 2026-02-21T12:39:47.6540930Z // begin inline asm 2026-02-21T12:39:47.6541141Z st.global.v4.b32 [ %rd201 + 0 ], { %r3352, %r3353, %r3354, %r3355 }; 2026-02-21T12:39:47.6541392Z // end inline asm 2026-02-21T12:39:47.6541540Z // begin inline asm 2026-02-21T12:39:47.6541656Z st.global.v4.b32 [ %rd202 + 0 ], { %r3317, %r3318, %r3319, %r3320 }; 2026-02-21T12:39:47.6541716Z // end inline asm 2026-02-21T12:39:47.6541778Z // begin inline asm 2026-02-21T12:39:47.6541906Z st.global.v4.b32 [ %rd203 + 0 ], { %r3357, %r3358, %r3359, %r3360 }; 2026-02-21T12:39:47.6541965Z // end inline asm 2026-02-21T12:39:47.6542027Z // begin inline asm 2026-02-21T12:39:47.6542145Z st.global.v4.b32 [ %rd204 + 0 ], { %r3322, %r3323, %r3324, %r3325 }; 2026-02-21T12:39:47.6542202Z // end inline asm 2026-02-21T12:39:47.6542263Z // begin inline asm 2026-02-21T12:39:47.6542379Z st.global.v4.b32 [ %rd205 + 0 ], { %r3362, %r3363, %r3364, %r3365 }; 2026-02-21T12:39:47.6542436Z // end inline asm 2026-02-21T12:39:47.6542656Z .loc 1 22 120 // 
conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:22:120 2026-02-21T12:39:47.6542725Z add.s64 %rd239, %rd589, 1; 2026-02-21T12:39:47.6542928Z .loc 1 28 35 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:28:35 2026-02-21T12:39:47.6543018Z mul.hi.u64 %rd240, %rd239, -3689348814741910323; 2026-02-21T12:39:47.6543167Z shr.u64 %rd241, %rd240, 5; 2026-02-21T12:39:47.6543376Z .loc 1 29 33 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:29:33 2026-02-21T12:39:47.6543442Z shl.b64 %rd46, %rd241, 3; 2026-02-21T12:39:47.6543643Z .loc 1 30 39 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:30:39 2026-02-21T12:39:47.6543781Z sub.s64 %rd242, 2048, %rd46; 2026-02-21T12:39:47.6543983Z .loc 1 30 52 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:30:52 2026-02-21T12:39:47.6544051Z min.s64 %rd47, %rd242, 8; 2026-02-21T12:39:47.6544247Z .loc 1 31 45 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:31:45 2026-02-21T12:39:47.6544315Z mul.lo.s64 %rd243, %rd241, 40; 2026-02-21T12:39:47.6544384Z sub.s64 %rd48, %rd239, %rd243; 2026-02-21T12:39:47.6544580Z .loc 1 32 51 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:32:51 2026-02-21T12:39:47.6544645Z or.b64 %rd244, %rd48, %rd47; 2026-02-21T12:39:47.6544799Z and.b64 %rd245, %rd244, -4294967296; 2026-02-21T12:39:47.6544876Z setp.ne.b64 %p8, %rd245, 0; 2026-02-21T12:39:47.6544951Z @%p8 bra $L__BB0_9; 2026-02-21T12:39:47.6545013Z bra.uni $L__BB0_8; 2026-02-21T12:39:47.6545138Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:39:47.6545206Z div.s64 %rd594, %rd48, %rd47; 2026-02-21T12:39:47.6545315Z bra.uni $L__BB0_10; 2026-02-21T12:39:47.6545431Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:39:47.6545496Z cvt.u32.u64 %r3495, %rd47; 2026-02-21T12:39:47.6545559Z cvt.u32.u64 %r3496, %rd48; 2026-02-21T12:39:47.6545624Z div.u32 %r3497, %r3496, %r3495; 2026-02-21T12:39:47.6545691Z cvt.u64.u32 %rd594, %r3497; 2026-02-21T12:39:47.6545803Z $L__BB0_10: // in Loop: 
Header=BB0_2 Depth=1 2026-02-21T12:39:47.6546005Z .loc 1 31 64 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:31:64 2026-02-21T12:39:47.6546080Z mul.lo.s64 %rd247, %rd594, %rd47; 2026-02-21T12:39:47.6546147Z sub.s64 %rd248, %rd48, %rd247; 2026-02-21T12:39:47.6546345Z .loc 1 31 30 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:31:30 2026-02-21T12:39:47.6546416Z add.s64 %rd249, %rd248, %rd46; 2026-02-21T12:39:47.6546756Z .loc 1 33 27 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:33:27 2026-02-21T12:39:47.6546836Z shl.b64 %rd52, %rd249, 7; 2026-02-21T12:39:47.6547035Z .loc 1 35 27 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:35:27 2026-02-21T12:39:47.6547101Z shl.b64 %rd250, %rd594, 8; 2026-02-21T12:39:47.6547297Z .loc 1 36 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:36:32 2026-02-21T12:39:47.6547362Z or.b64 %rd53, %rd250, %rd23; 2026-02-21T12:39:47.6547571Z .loc 1 43 126 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:43:126 2026-02-21T12:39:47.6547637Z shl.b64 %rd251, %rd249, 21; 2026-02-21T12:39:47.6547702Z add.s64 %rd596, %rd26, %rd251; 2026-02-21T12:39:47.6547770Z add.s64 %rd595, %rd27, %rd250; 2026-02-21T12:39:47.6547831Z mov.b32 %r11375, 0f00000000; 2026-02-21T12:39:47.6547897Z mov.b64 %rd597, -16; 2026-02-21T12:39:47.6547959Z mov.b32 %r11376, %r11375; 2026-02-21T12:39:47.6548021Z mov.b32 %r11377, %r11375; 2026-02-21T12:39:47.6548081Z mov.b32 %r11378, %r11375; 2026-02-21T12:39:47.6548138Z mov.b32 %r11379, %r11375; 2026-02-21T12:39:47.6548199Z mov.b32 %r11380, %r11375; 2026-02-21T12:39:47.6548257Z mov.b32 %r11381, %r11375; 2026-02-21T12:39:47.6548317Z mov.b32 %r11382, %r11375; 2026-02-21T12:39:47.6548430Z mov.b32 %r11383, %r11375; 2026-02-21T12:39:47.6548508Z mov.b32 %r11384, %r11375; 2026-02-21T12:39:47.6548569Z mov.b32 %r11385, %r11375; 2026-02-21T12:39:47.6548628Z mov.b32 %r11386, %r11375; 2026-02-21T12:39:47.6548693Z mov.b32 %r11387, %r11375; 2026-02-21T12:39:47.6548839Z mov.b32 
%r11388, %r11375; 2026-02-21T12:39:47.6548900Z mov.b32 %r11389, %r11375; 2026-02-21T12:39:47.6548959Z mov.b32 %r11390, %r11375; 2026-02-21T12:39:47.6549021Z mov.b32 %r11391, %r11375; 2026-02-21T12:39:47.6549080Z mov.b32 %r11392, %r11375; 2026-02-21T12:39:47.6549201Z mov.b32 %r11393, %r11375; 2026-02-21T12:39:47.6549263Z mov.b32 %r11394, %r11375; 2026-02-21T12:39:47.6549321Z mov.b32 %r11395, %r11375; 2026-02-21T12:39:47.6549381Z mov.b32 %r11396, %r11375; 2026-02-21T12:39:47.6549439Z mov.b32 %r11397, %r11375; 2026-02-21T12:39:47.6549500Z mov.b32 %r11398, %r11375; 2026-02-21T12:39:47.6549558Z mov.b32 %r11399, %r11375; 2026-02-21T12:39:47.6549617Z mov.b32 %r11400, %r11375; 2026-02-21T12:39:47.6549677Z mov.b32 %r11401, %r11375; 2026-02-21T12:39:47.6549734Z mov.b32 %r11402, %r11375; 2026-02-21T12:39:47.6549792Z mov.b32 %r11403, %r11375; 2026-02-21T12:39:47.6549857Z mov.b32 %r11404, %r11375; 2026-02-21T12:39:47.6549916Z mov.b32 %r11405, %r11375; 2026-02-21T12:39:47.6549976Z mov.b32 %r11406, %r11375; 2026-02-21T12:39:47.6550108Z mov.b32 %r11407, %r11375; 2026-02-21T12:39:47.6550178Z mov.b32 %r11408, %r11375; 2026-02-21T12:39:47.6550239Z mov.b32 %r11409, %r11375; 2026-02-21T12:39:47.6550298Z mov.b32 %r11410, %r11375; 2026-02-21T12:39:47.6550361Z mov.b32 %r11411, %r11375; 2026-02-21T12:39:47.6550420Z mov.b32 %r11412, %r11375; 2026-02-21T12:39:47.6550480Z mov.b32 %r11413, %r11375; 2026-02-21T12:39:47.6550597Z mov.b32 %r11414, %r11375; 2026-02-21T12:39:47.6550663Z mov.b32 %r11415, %r11375; 2026-02-21T12:39:47.6550722Z mov.b32 %r11416, %r11375; 2026-02-21T12:39:47.6550780Z mov.b32 %r11417, %r11375; 2026-02-21T12:39:47.6550841Z mov.b32 %r11418, %r11375; 2026-02-21T12:39:47.6550898Z mov.b32 %r11419, %r11375; 2026-02-21T12:39:47.6550957Z mov.b32 %r11420, %r11375; 2026-02-21T12:39:47.6551019Z mov.b32 %r11421, %r11375; 2026-02-21T12:39:47.6551090Z mov.b32 %r11422, %r11375; 2026-02-21T12:39:47.6551163Z mov.b32 %r11423, %r11375; 2026-02-21T12:39:47.6551229Z mov.b32 %r11424, %r11375; 
2026-02-21T12:39:47.6551298Z mov.b32 %r11425, %r11375; 2026-02-21T12:39:47.6551359Z mov.b32 %r11426, %r11375; 2026-02-21T12:39:47.6551420Z mov.b32 %r11427, %r11375; 2026-02-21T12:39:47.6551479Z mov.b32 %r11428, %r11375; 2026-02-21T12:39:47.6551545Z mov.b32 %r11429, %r11375; 2026-02-21T12:39:47.6551606Z mov.b32 %r11430, %r11375; 2026-02-21T12:39:47.6551667Z mov.b32 %r11431, %r11375; 2026-02-21T12:39:47.6551738Z mov.b32 %r11432, %r11375; 2026-02-21T12:39:47.6551800Z mov.b32 %r11433, %r11375; 2026-02-21T12:39:47.6551860Z mov.b32 %r11434, %r11375; 2026-02-21T12:39:47.6551923Z mov.b32 %r11435, %r11375; 2026-02-21T12:39:47.6551989Z mov.b32 %r11436, %r11375; 2026-02-21T12:39:47.6552051Z mov.b32 %r11437, %r11375; 2026-02-21T12:39:47.6552111Z mov.b32 %r11438, %r11375; 2026-02-21T12:39:47.6552176Z mov.b32 %r11439, %r11375; 2026-02-21T12:39:47.6552236Z mov.b32 %r11440, %r11375; 2026-02-21T12:39:47.6552298Z mov.b32 %r11441, %r11375; 2026-02-21T12:39:47.6552359Z mov.b32 %r11442, %r11375; 2026-02-21T12:39:47.6552427Z mov.b32 %r11443, %r11375; 2026-02-21T12:39:47.6552487Z mov.b32 %r11444, %r11375; 2026-02-21T12:39:47.6552548Z mov.b32 %r11445, %r11375; 2026-02-21T12:39:47.6552612Z mov.b32 %r11446, %r11375; 2026-02-21T12:39:47.6552673Z mov.b32 %r11447, %r11375; 2026-02-21T12:39:47.6552733Z mov.b32 %r11448, %r11375; 2026-02-21T12:39:47.6552809Z mov.b32 %r11449, %r11375; 2026-02-21T12:39:47.6552877Z mov.b32 %r11450, %r11375; 2026-02-21T12:39:47.6552936Z mov.b32 %r11451, %r11375; 2026-02-21T12:39:47.6552995Z mov.b32 %r11452, %r11375; 2026-02-21T12:39:47.6553061Z mov.b32 %r11453, %r11375; 2026-02-21T12:39:47.6553120Z mov.b32 %r11454, %r11375; 2026-02-21T12:39:47.6553179Z mov.b32 %r11455, %r11375; 2026-02-21T12:39:47.6553243Z mov.b32 %r11456, %r11375; 2026-02-21T12:39:47.6553303Z mov.b32 %r11457, %r11375; 2026-02-21T12:39:47.6553360Z mov.b32 %r11458, %r11375; 2026-02-21T12:39:47.6553420Z mov.b32 %r11459, %r11375; 2026-02-21T12:39:47.6553548Z mov.b32 %r11460, %r11375; 
2026-02-21T12:39:47.6553609Z mov.b32 %r11461, %r11375; 2026-02-21T12:39:47.6553669Z mov.b32 %r11462, %r11375; 2026-02-21T12:39:47.6553732Z mov.b32 %r11463, %r11375; 2026-02-21T12:39:47.6553791Z mov.b32 %r11464, %r11375; 2026-02-21T12:39:47.6553898Z mov.b32 %r11465, %r11375; 2026-02-21T12:39:47.6553958Z mov.b32 %r11466, %r11375; 2026-02-21T12:39:47.6554022Z mov.b32 %r11467, %r11375; 2026-02-21T12:39:47.6554083Z mov.b32 %r11468, %r11375; 2026-02-21T12:39:47.6554142Z mov.b32 %r11469, %r11375; 2026-02-21T12:39:47.6554206Z mov.b32 %r11470, %r11375; 2026-02-21T12:39:47.6554265Z mov.b32 %r11471, %r11375; 2026-02-21T12:39:47.6554326Z mov.b32 %r11472, %r11375; 2026-02-21T12:39:47.6554384Z mov.b32 %r11473, %r11375; 2026-02-21T12:39:47.6554447Z mov.b32 %r11474, %r11375; 2026-02-21T12:39:47.6554505Z mov.b32 %r11475, %r11375; 2026-02-21T12:39:47.6554567Z mov.b32 %r11476, %r11375; 2026-02-21T12:39:47.6554630Z mov.b32 %r11477, %r11375; 2026-02-21T12:39:47.6554690Z mov.b32 %r11478, %r11375; 2026-02-21T12:39:47.6554797Z mov.b32 %r11479, %r11375; 2026-02-21T12:39:47.6554859Z mov.b32 %r11480, %r11375; 2026-02-21T12:39:47.6554922Z mov.b32 %r11481, %r11375; 2026-02-21T12:39:47.6554983Z mov.b32 %r11482, %r11375; 2026-02-21T12:39:47.6555044Z mov.b32 %r11483, %r11375; 2026-02-21T12:39:47.6555112Z mov.b32 %r11484, %r11375; 2026-02-21T12:39:47.6555189Z mov.b32 %r11485, %r11375; 2026-02-21T12:39:47.6555295Z mov.b32 %r11486, %r11375; 2026-02-21T12:39:47.6555358Z mov.b32 %r11487, %r11375; 2026-02-21T12:39:47.6555421Z mov.b32 %r11488, %r11375; 2026-02-21T12:39:47.6555481Z mov.b32 %r11489, %r11375; 2026-02-21T12:39:47.6555540Z mov.b32 %r11490, %r11375; 2026-02-21T12:39:47.6555604Z mov.b32 %r11491, %r11375; 2026-02-21T12:39:47.6555666Z mov.b32 %r11492, %r11375; 2026-02-21T12:39:47.6555726Z mov.b32 %r11493, %r11375; 2026-02-21T12:39:47.6555787Z mov.b32 %r11494, %r11375; 2026-02-21T12:39:47.6555847Z mov.b32 %r11495, %r11375; 2026-02-21T12:39:47.6555909Z mov.b32 %r11496, %r11375; 
2026-02-21T12:39:47.6555974Z mov.b32 %r11497, %r11375; 2026-02-21T12:39:47.6556037Z mov.b32 %r11498, %r11375; 2026-02-21T12:39:47.6556097Z mov.b32 %r11499, %r11375; 2026-02-21T12:39:47.6556156Z mov.b32 %r11500, %r11375; 2026-02-21T12:39:47.6556221Z mov.b32 %r11501, %r11375; 2026-02-21T12:39:47.6556280Z mov.b32 %r11502, %r11375; 2026-02-21T12:39:47.6556403Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T12:39:47.6556630Z // => This Inner Loop Header: Depth=2 2026-02-21T12:39:47.6556850Z .loc 1 51 80 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:80 2026-02-21T12:39:47.6556921Z add.s64 %rd253, %rd596, -32; 2026-02-21T12:39:47.6556983Z // begin inline asm 2026-02-21T12:39:47.6557051Z mov.u64 %rd252, 0x0; 2026-02-21T12:39:47.6557183Z createpolicy.fractional.L2::evict_last.b64 %rd252, 1.0; 2026-02-21T12:39:47.6557244Z // end inline asm 2026-02-21T12:39:47.6557313Z // begin inline asm 2026-02-21T12:39:47.6557376Z mov.u32 %r3499, 0x0; 2026-02-21T12:39:47.6557436Z mov.u32 %r3500, 0x0; 2026-02-21T12:39:47.6557493Z mov.u32 %r3501, 0x0; 2026-02-21T12:39:47.6557555Z mov.u32 %r3502, 0x0; 2026-02-21T12:39:47.6557788Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r3499, %r3500, %r3501, %r3502 }, [ %rd253 + 0 ], %rd252; 2026-02-21T12:39:47.6557847Z // end inline asm 2026-02-21T12:39:47.6558063Z .loc 1 55 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:55:32 2026-02-21T12:39:47.6558124Z bar.sync 0; 2026-02-21T12:39:47.6558211Z st.shared.v2.b32 [%r10], {%r3499, %r3500}; 2026-02-21T12:39:47.6558296Z st.shared.v2.b32 [%r11], {%r3501, %r3502}; 2026-02-21T12:39:47.6558352Z bar.sync 0; 2026-02-21T12:39:47.6558424Z ld.shared.b16 %rs113, [%r13]; 2026-02-21T12:39:47.6558495Z ld.shared.b16 %rs114, [%r13+256]; 2026-02-21T12:39:47.6558570Z ld.shared.b16 %rs115, [%r13+16]; 2026-02-21T12:39:47.6558735Z ld.shared.b16 %rs116, [%r13+272]; 2026-02-21T12:39:47.6558804Z ld.shared.b16 %rs117, [%r14]; 2026-02-21T12:39:47.6558876Z ld.shared.b16 %rs118, [%r14+256]; 
2026-02-21T12:39:47.6558943Z ld.shared.b16 %rs119, [%r14+16]; 2026-02-21T12:39:47.6559009Z ld.shared.b16 %rs120, [%r14+272]; 2026-02-21T12:39:47.6559141Z cvt.f32.bf16 %r3761, %rs113; 2026-02-21T12:39:47.6559215Z cvt.f32.bf16 %r3762, %rs114; 2026-02-21T12:39:47.6559279Z cvt.f32.bf16 %r3763, %rs117; 2026-02-21T12:39:47.6559343Z cvt.f32.bf16 %r3764, %rs118; 2026-02-21T12:39:47.6559408Z cvt.f32.bf16 %r4021, %rs115; 2026-02-21T12:39:47.6559470Z cvt.f32.bf16 %r4022, %rs116; 2026-02-21T12:39:47.6559534Z cvt.f32.bf16 %r4023, %rs119; 2026-02-21T12:39:47.6559596Z cvt.f32.bf16 %r4024, %rs120; 2026-02-21T12:39:47.6559814Z .loc 1 57 87 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:57:87 2026-02-21T12:39:47.6559878Z // begin inline asm 2026-02-21T12:39:47.6559938Z mov.u32 %r3503, 0x0; 2026-02-21T12:39:47.6560005Z mov.u32 %r3504, 0x0; 2026-02-21T12:39:47.6560168Z ld.global.v2.b32 { %r3503, %r3504 }, [ %rd595 + 0 ]; 2026-02-21T12:39:47.6560231Z // end inline asm 2026-02-21T12:39:47.6560438Z .loc 1 65 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:65:28 2026-02-21T12:39:47.6560498Z bar.sync 0; 2026-02-21T12:39:47.6560564Z st.shared.b8 [%r15], %r3503; 2026-02-21T12:39:47.6560698Z prmt.b32 %r5075, %r3503, 0, 0x7771U; 2026-02-21T12:39:47.6560774Z st.shared.b8 [%r16+256], %r5075; 2026-02-21T12:39:47.6560842Z prmt.b32 %r5076, %r3503, 0, 0x7772U; 2026-02-21T12:39:47.6560908Z st.shared.b8 [%r17+512], %r5076; 2026-02-21T12:39:47.6560978Z prmt.b32 %r5077, %r3503, 0, 0x7773U; 2026-02-21T12:39:47.6561045Z st.shared.b8 [%r18+768], %r5077; 2026-02-21T12:39:47.6561113Z st.shared.b8 [%r19+1024], %r3504; 2026-02-21T12:39:47.6561183Z prmt.b32 %r5078, %r3504, 0, 0x7771U; 2026-02-21T12:39:47.6561256Z st.shared.b8 [%r20+1280], %r5078; 2026-02-21T12:39:47.6561324Z prmt.b32 %r5079, %r3504, 0, 0x7772U; 2026-02-21T12:39:47.6561392Z st.shared.b8 [%r21+1536], %r5079; 2026-02-21T12:39:47.6561466Z prmt.b32 %r5080, %r3504, 0, 0x7773U; 2026-02-21T12:39:47.6561529Z st.shared.b8 
[%r22+1792], %r5080; 2026-02-21T12:39:47.6561587Z bar.sync 0; 2026-02-21T12:39:47.6561657Z ld.shared.b32 %r5081, [%r44]; 2026-02-21T12:39:47.6561722Z prmt.b32 %r5082, %r5081, 0, 0x7771U; 2026-02-21T12:39:47.6561791Z cvt.u16.u32 %rs121, %r5082; 2026-02-21T12:39:47.6561856Z prmt.b32 %r5083, %r5081, 0, 0x7770U; 2026-02-21T12:39:47.6561923Z cvt.u16.u32 %rs122, %r5083; 2026-02-21T12:39:47.6561988Z prmt.b32 %r5084, %r5081, 0, 0x7773U; 2026-02-21T12:39:47.6562049Z cvt.u16.u32 %rs123, %r5084; 2026-02-21T12:39:47.6562118Z prmt.b32 %r5085, %r5081, 0, 0x7772U; 2026-02-21T12:39:47.6562181Z cvt.u16.u32 %rs124, %r5085; 2026-02-21T12:39:47.6562382Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6562451Z shl.b16 %rs125, %rs122, 4; 2026-02-21T12:39:47.6562526Z shl.b16 %rs126, %rs121, 4; 2026-02-21T12:39:47.6562723Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6562786Z cvt.u32.u16 %r5086, %rs125; 2026-02-21T12:39:47.6562879Z prmt.b32 %r5087, %r5086, %r5088, 0x3340U; 2026-02-21T12:39:47.6562956Z prmt.b32 %r5092, %r5087, %r5089, 0x5410U; 2026-02-21T12:39:47.6563030Z prmt.b32 %r5093, %r5092, %r5081, 0x5040U; 2026-02-21T12:39:47.6563102Z prmt.b32 %r5094, %r5093, 0, 0x9991U; 2026-02-21T12:39:47.6563164Z cvt.u16.u32 %rs127, %r5094; 2026-02-21T12:39:47.6563228Z shr.s16 %rs128, %rs127, 4; 2026-02-21T12:39:47.6563295Z prmt.b32 %r5095, %r5093, 0, 0xbbb3U; 2026-02-21T12:39:47.6563361Z cvt.u16.u32 %rs129, %r5095; 2026-02-21T12:39:47.6563423Z shr.s16 %rs130, %rs129, 4; 2026-02-21T12:39:47.6563487Z cvt.s16.s8 %rs131, %rs125; 2026-02-21T12:39:47.6563555Z shr.s16 %rs132, %rs131, 4; 2026-02-21T12:39:47.6563617Z cvt.s16.s8 %rs133, %rs126; 2026-02-21T12:39:47.6563739Z shr.s16 %rs134, %rs133, 4; 2026-02-21T12:39:47.6563936Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6564008Z cvt.rn.f32.s16 %r5096, %rs130; 2026-02-21T12:39:47.6564153Z 
cvt.rn.f32.s16 %r5097, %rs128; 2026-02-21T12:39:47.6564220Z cvt.rn.f32.s16 %r5098, %rs134; 2026-02-21T12:39:47.6564290Z cvt.rn.f32.s16 %r5099, %rs132; 2026-02-21T12:39:47.6564491Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6564555Z shl.b16 %rs135, %rs124, 4; 2026-02-21T12:39:47.6564621Z shl.b16 %rs136, %rs123, 4; 2026-02-21T12:39:47.6564819Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6564891Z prmt.b32 %r5100, %r5081, %r5101, 0x3020U; 2026-02-21T12:39:47.6564957Z prmt.b32 %r5102, %r5100, 0, 0x9991U; 2026-02-21T12:39:47.6565024Z cvt.u16.u32 %rs137, %r5102; 2026-02-21T12:39:47.6565138Z shr.s16 %rs138, %rs137, 4; 2026-02-21T12:39:47.6565202Z cvt.s16.s8 %rs139, %rs135; 2026-02-21T12:39:47.6565269Z shr.s16 %rs140, %rs139, 4; 2026-02-21T12:39:47.6565331Z cvt.s16.s8 %rs141, %rs136; 2026-02-21T12:39:47.6565392Z shr.s16 %rs142, %rs141, 4; 2026-02-21T12:39:47.6565464Z prmt.b32 %r5103, %r5081, 0, 0xbbb3U; 2026-02-21T12:39:47.6565526Z cvt.u16.u32 %rs143, %r5103; 2026-02-21T12:39:47.6565633Z shr.s16 %rs144, %rs143, 4; 2026-02-21T12:39:47.6565831Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6565904Z cvt.rn.f32.s16 %r5104, %rs138; 2026-02-21T12:39:47.6565968Z cvt.rn.f32.s16 %r5105, %rs144; 2026-02-21T12:39:47.6566033Z cvt.rn.f32.s16 %r5106, %rs142; 2026-02-21T12:39:47.6566104Z cvt.rn.f32.s16 %r5107, %rs140; 2026-02-21T12:39:47.6566300Z .loc 1 65 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:65:28 2026-02-21T12:39:47.6566367Z ld.shared.b32 %r5108, [%r44+128]; 2026-02-21T12:39:47.6566437Z prmt.b32 %r5109, %r5108, 0, 0x7771U; 2026-02-21T12:39:47.6566620Z cvt.u16.u32 %rs145, %r5109; 2026-02-21T12:39:47.6566691Z prmt.b32 %r5110, %r5108, 0, 0x7770U; 2026-02-21T12:39:47.6566756Z cvt.u16.u32 %rs146, %r5110; 2026-02-21T12:39:47.6566830Z prmt.b32 %r5111, %r5108, 0, 0x7773U; 2026-02-21T12:39:47.6566892Z cvt.u16.u32 
%rs147, %r5111; 2026-02-21T12:39:47.6566959Z prmt.b32 %r5112, %r5108, 0, 0x7772U; 2026-02-21T12:39:47.6567025Z cvt.u16.u32 %rs148, %r5112; 2026-02-21T12:39:47.6567228Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6567292Z shl.b16 %rs149, %rs146, 4; 2026-02-21T12:39:47.6567354Z shl.b16 %rs150, %rs145, 4; 2026-02-21T12:39:47.6567554Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6567615Z cvt.u32.u16 %r5113, %rs149; 2026-02-21T12:39:47.6567690Z prmt.b32 %r5114, %r5113, %r5115, 0x3340U; 2026-02-21T12:39:47.6567769Z prmt.b32 %r5116, %r5114, %r5089, 0x5410U; 2026-02-21T12:39:47.6567843Z prmt.b32 %r5117, %r5116, %r5108, 0x5040U; 2026-02-21T12:39:47.6567911Z prmt.b32 %r5118, %r5117, 0, 0x9991U; 2026-02-21T12:39:47.6567982Z cvt.u16.u32 %rs151, %r5118; 2026-02-21T12:39:47.6568045Z shr.s16 %rs152, %rs151, 4; 2026-02-21T12:39:47.6568113Z prmt.b32 %r5119, %r5117, 0, 0xbbb3U; 2026-02-21T12:39:47.6568177Z cvt.u16.u32 %rs153, %r5119; 2026-02-21T12:39:47.6568244Z shr.s16 %rs154, %rs153, 4; 2026-02-21T12:39:47.6568308Z cvt.s16.s8 %rs155, %rs149; 2026-02-21T12:39:47.6568370Z shr.s16 %rs156, %rs155, 4; 2026-02-21T12:39:47.6568437Z cvt.s16.s8 %rs157, %rs150; 2026-02-21T12:39:47.6568500Z shr.s16 %rs158, %rs157, 4; 2026-02-21T12:39:47.6568700Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6568767Z cvt.rn.f32.s16 %r5120, %rs154; 2026-02-21T12:39:47.6568839Z cvt.rn.f32.s16 %r5121, %rs152; 2026-02-21T12:39:47.6568987Z cvt.rn.f32.s16 %r5122, %rs158; 2026-02-21T12:39:47.6569051Z cvt.rn.f32.s16 %r5123, %rs156; 2026-02-21T12:39:47.6569254Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6569391Z shl.b16 %rs159, %rs148, 4; 2026-02-21T12:39:47.6569455Z shl.b16 %rs160, %rs147, 4; 2026-02-21T12:39:47.6569659Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 
2026-02-21T12:39:47.6569731Z prmt.b32 %r5124, %r5108, %r5125, 0x3020U; 2026-02-21T12:39:47.6569797Z prmt.b32 %r5126, %r5124, 0, 0x9991U; 2026-02-21T12:39:47.6569859Z cvt.u16.u32 %rs161, %r5126; 2026-02-21T12:39:47.6569927Z shr.s16 %rs162, %rs161, 4; 2026-02-21T12:39:47.6569988Z cvt.s16.s8 %rs163, %rs159; 2026-02-21T12:39:47.6570049Z shr.s16 %rs164, %rs163, 4; 2026-02-21T12:39:47.6570114Z cvt.s16.s8 %rs165, %rs160; 2026-02-21T12:39:47.6570174Z shr.s16 %rs166, %rs165, 4; 2026-02-21T12:39:47.6570242Z prmt.b32 %r5127, %r5108, 0, 0xbbb3U; 2026-02-21T12:39:47.6570369Z cvt.u16.u32 %rs167, %r5127; 2026-02-21T12:39:47.6570440Z shr.s16 %rs168, %rs167, 4; 2026-02-21T12:39:47.6570638Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6570705Z cvt.rn.f32.s16 %r5128, %rs162; 2026-02-21T12:39:47.6570774Z cvt.rn.f32.s16 %r5129, %rs168; 2026-02-21T12:39:47.6570907Z cvt.rn.f32.s16 %r5130, %rs166; 2026-02-21T12:39:47.6570976Z cvt.rn.f32.s16 %r5131, %rs164; 2026-02-21T12:39:47.6571037Z bar.sync 0; 2026-02-21T12:39:47.6571153Z st.shared.v4.b32 [%r24], {%r5099, %r5097, %r5098, %r5096}; 2026-02-21T12:39:47.6571262Z st.shared.v4.b32 [%r25], {%r5107, %r5104, %r5106, %r5105}; 2026-02-21T12:39:47.6571365Z st.shared.v4.b32 [%r26], {%r5123, %r5121, %r5122, %r5120}; 2026-02-21T12:39:47.6571471Z st.shared.v4.b32 [%r27], {%r5131, %r5128, %r5130, %r5129}; 2026-02-21T12:39:47.6571529Z $L__tmp5: 2026-02-21T12:39:47.6571806Z .loc 2 291 36 // standard.py:291:36 @[ conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:87:40 ] 2026-02-21T12:39:47.6571875Z // begin inline asm 2026-02-21T12:39:47.6571955Z fence.proxy.async.shared::cta; 2026-02-21T12:39:47.6572024Z // end inline asm 2026-02-21T12:39:47.6572089Z bar.sync 0; 2026-02-21T12:39:47.6572171Z shfl.sync.idx.b32 %r5132, %r2, 0, 31, -1; 2026-02-21T12:39:47.6572245Z wgmma.fence.sync.aligned; 2026-02-21T12:39:47.6572313Z mov.pred %p9, -1; 2026-02-21T12:39:47.6572378Z // begin inline asm 
2026-02-21T12:39:47.6575103Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r11375,%r11376,%r11377,%r11378,%r11379,%r11380,%r11381,%r11382,%r11383,%r11384,%r11385,%r11386,%r11387,%r11388,%r11389,%r11390,%r11391,%r11392,%r11393,%r11394,%r11395,%r11396,%r11397,%r11398,%r11399,%r11400,%r11401,%r11402,%r11403,%r11404,%r11405,%r11406,%r11407,%r11408,%r11409,%r11410,%r11411,%r11412,%r11413,%r11414,%r11415,%r11416,%r11417,%r11418,%r11419,%r11420,%r11421,%r11422,%r11423,%r11424,%r11425,%r11426,%r11427,%r11428,%r11429,%r11430,%r11431,%r11432,%r11433,%r11434,%r11435,%r11436,%r11437,%r11438,%r11439,%r11440,%r11441,%r11442,%r11443,%r11444,%r11445,%r11446,%r11447,%r11448,%r11449,%r11450,%r11451,%r11452,%r11453,%r11454,%r11455,%r11456,%r11457,%r11458,%r11459,%r11460,%r11461,%r11462,%r11463,%r11464,%r11465,%r11466,%r11467,%r11468,%r11469,%r11470,%r11471,%r11472,%r11473,%r11474,%r11475,%r11476,%r11477,%r11478,%r11479,%r11480,%r11481,%r11482,%r11483,%r11484,%r11485,%r11486,%r11487,%r11488,%r11489,%r11490,%r11491,%r11492,%r11493,%r11494,%r11495,%r11496,%r11497,%r11498,%r11499,%r11500,%r11501,%r11502}, {%r3761,%r3762,%r3763,%r3764}, %rd404, %p9, 1, 1; 2026-02-21T12:39:47.6575173Z // end inline asm 2026-02-21T12:39:47.6575234Z // begin inline asm 2026-02-21T12:39:47.6578064Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r11375,%r11376,%r11377,%r11378,%r11379,%r11380,%r11381,%r11382,%r11383,%r11384,%r11385,%r11386,%r11387,%r11388,%r11389,%r11390,%r11391,%r11392,%r11393,%r11394,%r11395,%r11396,%r11397,%r11398,%r11399,%r11400,%r11401,%r11402,%r11403,%r11404,%r11405,%r11406,%r11407,%r11408,%r11409,%r11410,%r11411,%r11412,%r11413,%r11414,%r11415,%r11416,%r11417,%r11418,%r11419,%r11420,%r11421,%r11422,%r11423,%r11424,%r11425,%r11426,%r11427,%r11428,%r11429,%r11430,%r11431,%r11432,%r11433,%r11434,%r11435,%r11436,%r11437,%r11438,%r11439,%r11440,%r11441,%r11442,%r11443,%r11444,%r11445,%r11446,%r11447,%r11448,%r11449,%r11450,%r11451,%r11452,%r11453,%r11454,%r11455,%r11456,%r11457,%r11458,%r11459,%r11460,%r11461,%r11462,%r11463,%r11464,%r11465,%r11466,%r11467,%r11468,%r11469,%r11470,%r11471,%r11472,%r11473,%r11474,%r11475,%r11476,%r11477,%r11478,%r11479,%r11480,%r11481,%r11482,%r11483,%r11484,%r11485,%r11486,%r11487,%r11488,%r11489,%r11490,%r11491,%r11492,%r11493,%r11494,%r11495,%r11496,%r11497,%r11498,%r11499,%r11500,%r11501,%r11502}, {%r4021,%r4022,%r4023,%r4024}, %rd405, %p9, 1, 1; 2026-02-21T12:39:47.6578289Z // end inline asm 2026-02-21T12:39:47.6578369Z wgmma.commit_group.sync.aligned; 2026-02-21T12:39:47.6578437Z mov.b32 %r4943, 0; 2026-02-21T12:39:47.6578574Z mov.b32 %r4153, %r10096; 2026-02-21T12:39:47.6578642Z mov.b32 %r4154, %r4943; 2026-02-21T12:39:47.6578707Z mov.b32 %r4155, %r4943; 2026-02-21T12:39:47.6578768Z // begin inline asm 2026-02-21T12:39:47.6581343Z // wait for regs: 
%r11375,%r11376,%r11377,%r11378,%r11379,%r11380,%r11381,%r11382,%r11383,%r11384,%r11385,%r11386,%r11387,%r11388,%r11389,%r11390,%r11391,%r11392,%r11393,%r11394,%r11395,%r11396,%r11397,%r11398,%r11399,%r11400,%r11401,%r11402,%r11403,%r11404,%r11405,%r11406,%r11407,%r11408,%r11409,%r11410,%r11411,%r11412,%r11413,%r11414,%r11415,%r11416,%r11417,%r11418,%r11419,%r11420,%r11421,%r11422,%r11423,%r11424,%r11425,%r11426,%r11427,%r11428,%r11429,%r11430,%r11431,%r11432,%r11433,%r11434,%r11435,%r11436,%r11437,%r11438,%r11439,%r11440,%r11441,%r11442,%r11443,%r11444,%r11445,%r11446,%r11447,%r11448,%r11449,%r11450,%r11451,%r11452,%r11453,%r11454,%r11455,%r11456,%r11457,%r11458,%r11459,%r11460,%r11461,%r11462,%r11463,%r11464,%r11465,%r11466,%r11467,%r11468,%r11469,%r11470,%r11471,%r11472,%r11473,%r11474,%r11475,%r11476,%r11477,%r11478,%r11479,%r11480,%r11481,%r11482,%r11483,%r11484,%r11485,%r11486,%r11487,%r11488,%r11489,%r11490,%r11491,%r11492,%r11493,%r11494,%r11495,%r11496,%r11497,%r11498,%r11499,%r11500,%r11501,%r11502,%r4153,%r4154,%r4155 2026-02-21T12:39:47.6581434Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:39:47.6581493Z // end inline asm 2026-02-21T12:39:47.6581556Z $L__tmp6: 2026-02-21T12:39:47.6581768Z .loc 1 51 80 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:80 2026-02-21T12:39:47.6581830Z // begin inline asm 2026-02-21T12:39:47.6581891Z mov.u64 %rd258, 0x0; 2026-02-21T12:39:47.6582024Z createpolicy.fractional.L2::evict_last.b64 %rd258, 1.0; 2026-02-21T12:39:47.6582083Z // end inline asm 2026-02-21T12:39:47.6582142Z // begin inline asm 2026-02-21T12:39:47.6582206Z mov.u32 %r4287, 0x0; 2026-02-21T12:39:47.6582266Z mov.u32 %r4288, 0x0; 2026-02-21T12:39:47.6582326Z mov.u32 %r4289, 0x0; 2026-02-21T12:39:47.6582389Z mov.u32 %r4290, 0x0; 2026-02-21T12:39:47.6582616Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r4287, %r4288, %r4289, %r4290 }, [ %rd596 + 0 ], %rd258; 2026-02-21T12:39:47.6582674Z // end inline asm 2026-02-21T12:39:47.6582879Z .loc 1 
55 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:55:32 2026-02-21T12:39:47.6582944Z bar.sync 0; 2026-02-21T12:39:47.6583027Z st.shared.v2.b32 [%r10], {%r4287, %r4288}; 2026-02-21T12:39:47.6583106Z st.shared.v2.b32 [%r11], {%r4289, %r4290}; 2026-02-21T12:39:47.6583169Z bar.sync 0; 2026-02-21T12:39:47.6583238Z ld.shared.b16 %rs169, [%r13]; 2026-02-21T12:39:47.6583306Z ld.shared.b16 %rs170, [%r13+256]; 2026-02-21T12:39:47.6583379Z ld.shared.b16 %rs171, [%r13+16]; 2026-02-21T12:39:47.6583446Z ld.shared.b16 %rs172, [%r13+272]; 2026-02-21T12:39:47.6583512Z ld.shared.b16 %rs173, [%r14]; 2026-02-21T12:39:47.6583580Z ld.shared.b16 %rs174, [%r14+256]; 2026-02-21T12:39:47.6583715Z ld.shared.b16 %rs175, [%r14+16]; 2026-02-21T12:39:47.6583783Z ld.shared.b16 %rs176, [%r14+272]; 2026-02-21T12:39:47.6583851Z cvt.f32.bf16 %r4549, %rs169; 2026-02-21T12:39:47.6583925Z cvt.f32.bf16 %r4550, %rs170; 2026-02-21T12:39:47.6584055Z cvt.f32.bf16 %r4551, %rs173; 2026-02-21T12:39:47.6584119Z cvt.f32.bf16 %r4552, %rs174; 2026-02-21T12:39:47.6584181Z cvt.f32.bf16 %r4809, %rs171; 2026-02-21T12:39:47.6584247Z cvt.f32.bf16 %r4810, %rs172; 2026-02-21T12:39:47.6584311Z cvt.f32.bf16 %r4811, %rs175; 2026-02-21T12:39:47.6584373Z cvt.f32.bf16 %r4812, %rs176; 2026-02-21T12:39:47.6584582Z .loc 1 57 87 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:57:87 2026-02-21T12:39:47.6584648Z add.s64 %rd261, %rd595, 10240; 2026-02-21T12:39:47.6584708Z // begin inline asm 2026-02-21T12:39:47.6584769Z mov.u32 %r4291, 0x0; 2026-02-21T12:39:47.6584832Z mov.u32 %r4292, 0x0; 2026-02-21T12:39:47.6584930Z ld.global.v2.b32 { %r4291, %r4292 }, [ %rd261 + 0 ]; 2026-02-21T12:39:47.6584990Z // end inline asm 2026-02-21T12:39:47.6585247Z .loc 1 65 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:65:28 2026-02-21T12:39:47.6585308Z bar.sync 0; 2026-02-21T12:39:47.6585373Z st.shared.b8 [%r15], %r4291; 2026-02-21T12:39:47.6585449Z prmt.b32 %r5133, %r4291, 0, 0x7771U; 
2026-02-21T12:39:47.6585517Z st.shared.b8 [%r16+256], %r5133; 2026-02-21T12:39:47.6585630Z prmt.b32 %r5134, %r4291, 0, 0x7772U; 2026-02-21T12:39:47.6585698Z st.shared.b8 [%r17+512], %r5134; 2026-02-21T12:39:47.6585768Z prmt.b32 %r5135, %r4291, 0, 0x7773U; 2026-02-21T12:39:47.6585832Z st.shared.b8 [%r18+768], %r5135; 2026-02-21T12:39:47.6585899Z st.shared.b8 [%r19+1024], %r4292; 2026-02-21T12:39:47.6585968Z prmt.b32 %r5136, %r4292, 0, 0x7771U; 2026-02-21T12:39:47.6586032Z st.shared.b8 [%r20+1280], %r5136; 2026-02-21T12:39:47.6586096Z prmt.b32 %r5137, %r4292, 0, 0x7772U; 2026-02-21T12:39:47.6586160Z st.shared.b8 [%r21+1536], %r5137; 2026-02-21T12:39:47.6586234Z prmt.b32 %r5138, %r4292, 0, 0x7773U; 2026-02-21T12:39:47.6586301Z st.shared.b8 [%r22+1792], %r5138; 2026-02-21T12:39:47.6586369Z bar.sync 0; 2026-02-21T12:39:47.6586444Z ld.shared.b32 %r5139, [%r44]; 2026-02-21T12:39:47.6586625Z prmt.b32 %r5140, %r5139, 0, 0x7771U; 2026-02-21T12:39:47.6586695Z cvt.u16.u32 %rs177, %r5140; 2026-02-21T12:39:47.6586763Z prmt.b32 %r5141, %r5139, 0, 0x7770U; 2026-02-21T12:39:47.6586828Z cvt.u16.u32 %rs178, %r5141; 2026-02-21T12:39:47.6586893Z prmt.b32 %r5142, %r5139, 0, 0x7773U; 2026-02-21T12:39:47.6586956Z cvt.u16.u32 %rs179, %r5142; 2026-02-21T12:39:47.6587025Z prmt.b32 %r5143, %r5139, 0, 0x7772U; 2026-02-21T12:39:47.6587086Z cvt.u16.u32 %rs180, %r5143; 2026-02-21T12:39:47.6587301Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6587371Z shl.b16 %rs181, %rs178, 4; 2026-02-21T12:39:47.6587435Z shl.b16 %rs182, %rs177, 4; 2026-02-21T12:39:47.6587635Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6587705Z cvt.u32.u16 %r5144, %rs181; 2026-02-21T12:39:47.6587780Z prmt.b32 %r5145, %r5144, %r5146, 0x3340U; 2026-02-21T12:39:47.6587855Z prmt.b32 %r5147, %r5145, %r5089, 0x5410U; 2026-02-21T12:39:47.6587928Z prmt.b32 %r5148, %r5147, %r5139, 0x5040U; 2026-02-21T12:39:47.6587998Z prmt.b32 
%r5149, %r5148, 0, 0x9991U; 2026-02-21T12:39:47.6588061Z cvt.u16.u32 %rs183, %r5149; 2026-02-21T12:39:47.6588126Z shr.s16 %rs184, %rs183, 4; 2026-02-21T12:39:47.6588196Z prmt.b32 %r5150, %r5148, 0, 0xbbb3U; 2026-02-21T12:39:47.6588258Z cvt.u16.u32 %rs185, %r5150; 2026-02-21T12:39:47.6588322Z shr.s16 %rs186, %rs185, 4; 2026-02-21T12:39:47.6588438Z cvt.s16.s8 %rs187, %rs181; 2026-02-21T12:39:47.6588510Z shr.s16 %rs188, %rs187, 4; 2026-02-21T12:39:47.6588575Z cvt.s16.s8 %rs189, %rs182; 2026-02-21T12:39:47.6588637Z shr.s16 %rs190, %rs189, 4; 2026-02-21T12:39:47.6588841Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6589014Z cvt.rn.f32.s16 %r5151, %rs186; 2026-02-21T12:39:47.6589080Z cvt.rn.f32.s16 %r5152, %rs184; 2026-02-21T12:39:47.6589148Z cvt.rn.f32.s16 %r5153, %rs190; 2026-02-21T12:39:47.6589273Z cvt.rn.f32.s16 %r5154, %rs188; 2026-02-21T12:39:47.6589471Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6589536Z shl.b16 %rs191, %rs180, 4; 2026-02-21T12:39:47.6589603Z shl.b16 %rs192, %rs179, 4; 2026-02-21T12:39:47.6589800Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6589875Z prmt.b32 %r5155, %r5139, %r5156, 0x3020U; 2026-02-21T12:39:47.6589944Z prmt.b32 %r5157, %r5155, 0, 0x9991U; 2026-02-21T12:39:47.6590008Z cvt.u16.u32 %rs193, %r5157; 2026-02-21T12:39:47.6590069Z shr.s16 %rs194, %rs193, 4; 2026-02-21T12:39:47.6590132Z cvt.s16.s8 %rs195, %rs191; 2026-02-21T12:39:47.6590262Z shr.s16 %rs196, %rs195, 4; 2026-02-21T12:39:47.6590327Z cvt.s16.s8 %rs197, %rs192; 2026-02-21T12:39:47.6590400Z shr.s16 %rs198, %rs197, 4; 2026-02-21T12:39:47.6590472Z prmt.b32 %r5158, %r5139, 0, 0xbbb3U; 2026-02-21T12:39:47.6590538Z cvt.u16.u32 %rs199, %r5158; 2026-02-21T12:39:47.6590599Z shr.s16 %rs200, %rs199, 4; 2026-02-21T12:39:47.6590861Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 
2026-02-21T12:39:47.6590928Z cvt.rn.f32.s16 %r5159, %rs194; 2026-02-21T12:39:47.6590992Z cvt.rn.f32.s16 %r5160, %rs200; 2026-02-21T12:39:47.6591056Z cvt.rn.f32.s16 %r5161, %rs198; 2026-02-21T12:39:47.6591125Z cvt.rn.f32.s16 %r5162, %rs196; 2026-02-21T12:39:47.6591321Z .loc 1 65 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:65:28 2026-02-21T12:39:47.6591389Z ld.shared.b32 %r5163, [%r44+128]; 2026-02-21T12:39:47.6591459Z prmt.b32 %r5164, %r5163, 0, 0x7771U; 2026-02-21T12:39:47.6591523Z cvt.u16.u32 %rs201, %r5164; 2026-02-21T12:39:47.6591591Z prmt.b32 %r5165, %r5163, 0, 0x7770U; 2026-02-21T12:39:47.6591659Z cvt.u16.u32 %rs202, %r5165; 2026-02-21T12:39:47.6591724Z prmt.b32 %r5166, %r5163, 0, 0x7773U; 2026-02-21T12:39:47.6591786Z cvt.u16.u32 %rs203, %r5166; 2026-02-21T12:39:47.6591852Z prmt.b32 %r5167, %r5163, 0, 0x7772U; 2026-02-21T12:39:47.6591925Z cvt.u16.u32 %rs204, %r5167; 2026-02-21T12:39:47.6592125Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6592202Z shl.b16 %rs205, %rs202, 4; 2026-02-21T12:39:47.6592275Z shl.b16 %rs206, %rs201, 4; 2026-02-21T12:39:47.6592479Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6592545Z cvt.u32.u16 %r5168, %rs205; 2026-02-21T12:39:47.6592620Z prmt.b32 %r5169, %r5168, %r5170, 0x3340U; 2026-02-21T12:39:47.6592701Z prmt.b32 %r5171, %r5169, %r5089, 0x5410U; 2026-02-21T12:39:47.6592776Z prmt.b32 %r5172, %r5171, %r5163, 0x5040U; 2026-02-21T12:39:47.6592845Z prmt.b32 %r5173, %r5172, 0, 0x9991U; 2026-02-21T12:39:47.6592916Z cvt.u16.u32 %rs207, %r5173; 2026-02-21T12:39:47.6592980Z shr.s16 %rs208, %rs207, 4; 2026-02-21T12:39:47.6593052Z prmt.b32 %r5174, %r5172, 0, 0xbbb3U; 2026-02-21T12:39:47.6593123Z cvt.u16.u32 %rs209, %r5174; 2026-02-21T12:39:47.6593191Z shr.s16 %rs210, %rs209, 4; 2026-02-21T12:39:47.6593254Z cvt.s16.s8 %rs211, %rs205; 2026-02-21T12:39:47.6593318Z shr.s16 %rs212, %rs211, 4; 2026-02-21T12:39:47.6593390Z 
cvt.s16.s8 %rs213, %rs206; 2026-02-21T12:39:47.6593453Z shr.s16 %rs214, %rs213, 4; 2026-02-21T12:39:47.6593649Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6593734Z cvt.rn.f32.s16 %r5175, %rs210; 2026-02-21T12:39:47.6593804Z cvt.rn.f32.s16 %r5176, %rs208; 2026-02-21T12:39:47.6593870Z cvt.rn.f32.s16 %r5177, %rs214; 2026-02-21T12:39:47.6593937Z cvt.rn.f32.s16 %r5178, %rs212; 2026-02-21T12:39:47.6594203Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6594268Z shl.b16 %rs215, %rs204, 4; 2026-02-21T12:39:47.6594331Z shl.b16 %rs216, %rs203, 4; 2026-02-21T12:39:47.6594587Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6594660Z prmt.b32 %r5179, %r5163, %r5180, 0x3020U; 2026-02-21T12:39:47.6594730Z prmt.b32 %r5181, %r5179, 0, 0x9991U; 2026-02-21T12:39:47.6594803Z cvt.u16.u32 %rs217, %r5181; 2026-02-21T12:39:47.6594867Z shr.s16 %rs218, %rs217, 4; 2026-02-21T12:39:47.6594929Z cvt.s16.s8 %rs219, %rs215; 2026-02-21T12:39:47.6594992Z shr.s16 %rs220, %rs219, 4; 2026-02-21T12:39:47.6595060Z cvt.s16.s8 %rs221, %rs216; 2026-02-21T12:39:47.6595123Z shr.s16 %rs222, %rs221, 4; 2026-02-21T12:39:47.6595188Z prmt.b32 %r5182, %r5163, 0, 0xbbb3U; 2026-02-21T12:39:47.6595257Z cvt.u16.u32 %rs223, %r5182; 2026-02-21T12:39:47.6595322Z shr.s16 %rs224, %rs223, 4; 2026-02-21T12:39:47.6595569Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6595644Z cvt.rn.f32.s16 %r5183, %rs218; 2026-02-21T12:39:47.6595712Z cvt.rn.f32.s16 %r5184, %rs224; 2026-02-21T12:39:47.6595781Z cvt.rn.f32.s16 %r5185, %rs222; 2026-02-21T12:39:47.6595846Z cvt.rn.f32.s16 %r5186, %rs220; 2026-02-21T12:39:47.6595908Z bar.sync 0; 2026-02-21T12:39:47.6596067Z st.shared.v4.b32 [%r24], {%r5154, %r5152, %r5153, %r5151}; 2026-02-21T12:39:47.6596178Z st.shared.v4.b32 [%r25], {%r5162, %r5159, %r5161, %r5160}; 
2026-02-21T12:39:47.6596288Z st.shared.v4.b32 [%r26], {%r5178, %r5176, %r5177, %r5175}; 2026-02-21T12:39:47.6596393Z st.shared.v4.b32 [%r27], {%r5186, %r5183, %r5185, %r5184}; 2026-02-21T12:39:47.6596588Z $L__tmp7: 2026-02-21T12:39:47.6596872Z .loc 2 291 36 // standard.py:291:36 @[ conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:87:40 ] 2026-02-21T12:39:47.6596958Z // begin inline asm 2026-02-21T12:39:47.6597043Z fence.proxy.async.shared::cta; 2026-02-21T12:39:47.6597104Z // end inline asm 2026-02-21T12:39:47.6597167Z bar.sync 0; 2026-02-21T12:39:47.6597242Z wgmma.fence.sync.aligned; 2026-02-21T12:39:47.6597305Z // begin inline asm 2026-02-21T12:39:47.6600037Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r11375,%r11376,%r11377,%r11378,%r11379,%r11380,%r11381,%r11382,%r11383,%r11384,%r11385,%r11386,%r11387,%r11388,%r11389,%r11390,%r11391,%r11392,%r11393,%r11394,%r11395,%r11396,%r11397,%r11398,%r11399,%r11400,%r11401,%r11402,%r11403,%r11404,%r11405,%r11406,%r11407,%r11408,%r11409,%r11410,%r11411,%r11412,%r11413,%r11414,%r11415,%r11416,%r11417,%r11418,%r11419,%r11420,%r11421,%r11422,%r11423,%r11424,%r11425,%r11426,%r11427,%r11428,%r11429,%r11430,%r11431,%r11432,%r11433,%r11434,%r11435,%r11436,%r11437,%r11438,%r11439,%r11440,%r11441,%r11442,%r11443,%r11444,%r11445,%r11446,%r11447,%r11448,%r11449,%r11450,%r11451,%r11452,%r11453,%r11454,%r11455,%r11456,%r11457,%r11458,%r11459,%r11460,%r11461,%r11462,%r11463,%r11464,%r11465,%r11466,%r11467,%r11468,%r11469,%r11470,%r11471,%r11472,%r11473,%r11474,%r11475,%r11476,%r11477,%r11478,%r11479,%r11480,%r11481,%r11482,%r11483,%r11484,%r11485,%r11486,%r11487,%r11488,%r11489,%r11490,%r11491,%r11492,%r11493,%r11494,%r11495,%r11496,%r11497,%r11498,%r11499,%r11500,%r11501,%r11502}, {%r4549,%r4550,%r4551,%r4552}, %rd404, %p9, 1, 1; 2026-02-21T12:39:47.6600103Z // end inline asm 2026-02-21T12:39:47.6600171Z // begin inline asm 2026-02-21T12:39:47.6602883Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r11375,%r11376,%r11377,%r11378,%r11379,%r11380,%r11381,%r11382,%r11383,%r11384,%r11385,%r11386,%r11387,%r11388,%r11389,%r11390,%r11391,%r11392,%r11393,%r11394,%r11395,%r11396,%r11397,%r11398,%r11399,%r11400,%r11401,%r11402,%r11403,%r11404,%r11405,%r11406,%r11407,%r11408,%r11409,%r11410,%r11411,%r11412,%r11413,%r11414,%r11415,%r11416,%r11417,%r11418,%r11419,%r11420,%r11421,%r11422,%r11423,%r11424,%r11425,%r11426,%r11427,%r11428,%r11429,%r11430,%r11431,%r11432,%r11433,%r11434,%r11435,%r11436,%r11437,%r11438,%r11439,%r11440,%r11441,%r11442,%r11443,%r11444,%r11445,%r11446,%r11447,%r11448,%r11449,%r11450,%r11451,%r11452,%r11453,%r11454,%r11455,%r11456,%r11457,%r11458,%r11459,%r11460,%r11461,%r11462,%r11463,%r11464,%r11465,%r11466,%r11467,%r11468,%r11469,%r11470,%r11471,%r11472,%r11473,%r11474,%r11475,%r11476,%r11477,%r11478,%r11479,%r11480,%r11481,%r11482,%r11483,%r11484,%r11485,%r11486,%r11487,%r11488,%r11489,%r11490,%r11491,%r11492,%r11493,%r11494,%r11495,%r11496,%r11497,%r11498,%r11499,%r11500,%r11501,%r11502}, {%r4809,%r4810,%r4811,%r4812}, %rd405, %p9, 1, 1; 2026-02-21T12:39:47.6603088Z // end inline asm 2026-02-21T12:39:47.6603181Z wgmma.commit_group.sync.aligned; 2026-02-21T12:39:47.6603252Z mov.b32 %r4941, %r10096; 2026-02-21T12:39:47.6603323Z mov.b32 %r4942, %r4943; 2026-02-21T12:39:47.6603384Z // begin inline asm 2026-02-21T12:39:47.6605998Z // wait for regs: 
%r11375,%r11376,%r11377,%r11378,%r11379,%r11380,%r11381,%r11382,%r11383,%r11384,%r11385,%r11386,%r11387,%r11388,%r11389,%r11390,%r11391,%r11392,%r11393,%r11394,%r11395,%r11396,%r11397,%r11398,%r11399,%r11400,%r11401,%r11402,%r11403,%r11404,%r11405,%r11406,%r11407,%r11408,%r11409,%r11410,%r11411,%r11412,%r11413,%r11414,%r11415,%r11416,%r11417,%r11418,%r11419,%r11420,%r11421,%r11422,%r11423,%r11424,%r11425,%r11426,%r11427,%r11428,%r11429,%r11430,%r11431,%r11432,%r11433,%r11434,%r11435,%r11436,%r11437,%r11438,%r11439,%r11440,%r11441,%r11442,%r11443,%r11444,%r11445,%r11446,%r11447,%r11448,%r11449,%r11450,%r11451,%r11452,%r11453,%r11454,%r11455,%r11456,%r11457,%r11458,%r11459,%r11460,%r11461,%r11462,%r11463,%r11464,%r11465,%r11466,%r11467,%r11468,%r11469,%r11470,%r11471,%r11472,%r11473,%r11474,%r11475,%r11476,%r11477,%r11478,%r11479,%r11480,%r11481,%r11482,%r11483,%r11484,%r11485,%r11486,%r11487,%r11488,%r11489,%r11490,%r11491,%r11492,%r11493,%r11494,%r11495,%r11496,%r11497,%r11498,%r11499,%r11500,%r11501,%r11502,%r4941,%r4942,%r4943 2026-02-21T12:39:47.6606104Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:39:47.6606167Z // end inline asm 2026-02-21T12:39:47.6606229Z $L__tmp8: 2026-02-21T12:39:47.6606563Z .loc 1 43 126 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:43:126 2026-02-21T12:39:47.6606635Z add.s64 %rd597, %rd597, 16; 2026-02-21T12:39:47.6606700Z add.s64 %rd596, %rd596, 64; 2026-02-21T12:39:47.6606776Z add.s64 %rd595, %rd595, 20480; 2026-02-21T12:39:47.6606849Z setp.lt.u64 %p13, %rd597, 4080; 2026-02-21T12:39:47.6606914Z @%p13 bra $L__BB0_11; 2026-02-21T12:39:47.6607045Z // %bb.12: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:39:47.6607253Z .loc 1 34 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:34:32 2026-02-21T12:39:47.6607320Z or.b64 %rd280, %rd52, %rd7; 2026-02-21T12:39:47.6607389Z or.b64 %rd281, %rd52, %rd8; 2026-02-21T12:39:47.6607453Z or.b64 %rd282, %rd52, %rd9; 2026-02-21T12:39:47.6607519Z or.b64 %rd283, %rd52, %rd10; 
2026-02-21T12:39:47.6607587Z or.b64 %rd284, %rd52, %rd11; 2026-02-21T12:39:47.6607657Z or.b64 %rd285, %rd52, %rd12; 2026-02-21T12:39:47.6607719Z or.b64 %rd286, %rd52, %rd13; 2026-02-21T12:39:47.6607784Z or.b64 %rd287, %rd52, %rd14; 2026-02-21T12:39:47.6607854Z or.b64 %rd288, %rd52, %rd15; 2026-02-21T12:39:47.6607917Z or.b64 %rd289, %rd52, %rd16; 2026-02-21T12:39:47.6607981Z or.b64 %rd290, %rd52, %rd17; 2026-02-21T12:39:47.6608045Z or.b64 %rd291, %rd52, %rd18; 2026-02-21T12:39:47.6608117Z or.b64 %rd292, %rd52, %rd19; 2026-02-21T12:39:47.6608179Z or.b64 %rd293, %rd52, %rd20; 2026-02-21T12:39:47.6608243Z or.b64 %rd294, %rd52, %rd21; 2026-02-21T12:39:47.6608310Z or.b64 %rd295, %rd52, %rd22; 2026-02-21T12:39:47.6608513Z .loc 1 90 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:90:28 2026-02-21T12:39:47.6608603Z cvt.rn.bf16x2.f32 %r5331, %r11376, %r11375; 2026-02-21T12:39:47.6608695Z cvt.rn.bf16x2.f32 %r5332, %r11378, %r11377; 2026-02-21T12:39:47.6608876Z cvt.rn.bf16x2.f32 %r5333, %r11380, %r11379; 2026-02-21T12:39:47.6608959Z cvt.rn.bf16x2.f32 %r5334, %r11382, %r11381; 2026-02-21T12:39:47.6609038Z cvt.rn.bf16x2.f32 %r5335, %r11384, %r11383; 2026-02-21T12:39:47.6609123Z cvt.rn.bf16x2.f32 %r5336, %r11386, %r11385; 2026-02-21T12:39:47.6609263Z cvt.rn.bf16x2.f32 %r5337, %r11388, %r11387; 2026-02-21T12:39:47.6609342Z cvt.rn.bf16x2.f32 %r5338, %r11390, %r11389; 2026-02-21T12:39:47.6609429Z cvt.rn.bf16x2.f32 %r5339, %r11392, %r11391; 2026-02-21T12:39:47.6609508Z cvt.rn.bf16x2.f32 %r5340, %r11394, %r11393; 2026-02-21T12:39:47.6609587Z cvt.rn.bf16x2.f32 %r5341, %r11396, %r11395; 2026-02-21T12:39:47.6609671Z cvt.rn.bf16x2.f32 %r5342, %r11398, %r11397; 2026-02-21T12:39:47.6609750Z cvt.rn.bf16x2.f32 %r5343, %r11400, %r11399; 2026-02-21T12:39:47.6609827Z cvt.rn.bf16x2.f32 %r5344, %r11402, %r11401; 2026-02-21T12:39:47.6609909Z cvt.rn.bf16x2.f32 %r5345, %r11404, %r11403; 2026-02-21T12:39:47.6609992Z cvt.rn.bf16x2.f32 %r5346, %r11406, %r11405; 2026-02-21T12:39:47.6610143Z 
cvt.rn.bf16x2.f32 %r5347, %r11408, %r11407; 2026-02-21T12:39:47.6610225Z cvt.rn.bf16x2.f32 %r5348, %r11410, %r11409; 2026-02-21T12:39:47.6610311Z cvt.rn.bf16x2.f32 %r5349, %r11412, %r11411; 2026-02-21T12:39:47.6610395Z cvt.rn.bf16x2.f32 %r5350, %r11414, %r11413; 2026-02-21T12:39:47.6610485Z cvt.rn.bf16x2.f32 %r5351, %r11416, %r11415; 2026-02-21T12:39:47.6610651Z cvt.rn.bf16x2.f32 %r5352, %r11418, %r11417; 2026-02-21T12:39:47.6610733Z cvt.rn.bf16x2.f32 %r5353, %r11420, %r11419; 2026-02-21T12:39:47.6610814Z cvt.rn.bf16x2.f32 %r5354, %r11422, %r11421; 2026-02-21T12:39:47.6610893Z cvt.rn.bf16x2.f32 %r5355, %r11424, %r11423; 2026-02-21T12:39:47.6610977Z cvt.rn.bf16x2.f32 %r5356, %r11426, %r11425; 2026-02-21T12:39:47.6611056Z cvt.rn.bf16x2.f32 %r5357, %r11428, %r11427; 2026-02-21T12:39:47.6611135Z cvt.rn.bf16x2.f32 %r5358, %r11430, %r11429; 2026-02-21T12:39:47.6611216Z cvt.rn.bf16x2.f32 %r5359, %r11432, %r11431; 2026-02-21T12:39:47.6611297Z cvt.rn.bf16x2.f32 %r5360, %r11434, %r11433; 2026-02-21T12:39:47.6611374Z cvt.rn.bf16x2.f32 %r5361, %r11436, %r11435; 2026-02-21T12:39:47.6611456Z cvt.rn.bf16x2.f32 %r5362, %r11438, %r11437; 2026-02-21T12:39:47.6611533Z cvt.rn.bf16x2.f32 %r5363, %r11440, %r11439; 2026-02-21T12:39:47.6611612Z cvt.rn.bf16x2.f32 %r5364, %r11442, %r11441; 2026-02-21T12:39:47.6611687Z cvt.rn.bf16x2.f32 %r5365, %r11444, %r11443; 2026-02-21T12:39:47.6611771Z cvt.rn.bf16x2.f32 %r5366, %r11446, %r11445; 2026-02-21T12:39:47.6611848Z cvt.rn.bf16x2.f32 %r5367, %r11448, %r11447; 2026-02-21T12:39:47.6611925Z cvt.rn.bf16x2.f32 %r5368, %r11450, %r11449; 2026-02-21T12:39:47.6612006Z cvt.rn.bf16x2.f32 %r5369, %r11452, %r11451; 2026-02-21T12:39:47.6612083Z cvt.rn.bf16x2.f32 %r5370, %r11454, %r11453; 2026-02-21T12:39:47.6612160Z cvt.rn.bf16x2.f32 %r5371, %r11456, %r11455; 2026-02-21T12:39:47.6612236Z cvt.rn.bf16x2.f32 %r5372, %r11458, %r11457; 2026-02-21T12:39:47.6612318Z cvt.rn.bf16x2.f32 %r5373, %r11460, %r11459; 2026-02-21T12:39:47.6612400Z cvt.rn.bf16x2.f32 %r5374, 
%r11462, %r11461; 2026-02-21T12:39:47.6612477Z cvt.rn.bf16x2.f32 %r5375, %r11464, %r11463; 2026-02-21T12:39:47.6612559Z cvt.rn.bf16x2.f32 %r5376, %r11466, %r11465; 2026-02-21T12:39:47.6612638Z cvt.rn.bf16x2.f32 %r5377, %r11468, %r11467; 2026-02-21T12:39:47.6612716Z cvt.rn.bf16x2.f32 %r5378, %r11470, %r11469; 2026-02-21T12:39:47.6612801Z cvt.rn.bf16x2.f32 %r5379, %r11472, %r11471; 2026-02-21T12:39:47.6612881Z cvt.rn.bf16x2.f32 %r5380, %r11474, %r11473; 2026-02-21T12:39:47.6612958Z cvt.rn.bf16x2.f32 %r5381, %r11476, %r11475; 2026-02-21T12:39:47.6613036Z cvt.rn.bf16x2.f32 %r5382, %r11478, %r11477; 2026-02-21T12:39:47.6613124Z cvt.rn.bf16x2.f32 %r5383, %r11480, %r11479; 2026-02-21T12:39:47.6613200Z cvt.rn.bf16x2.f32 %r5384, %r11482, %r11481; 2026-02-21T12:39:47.6613279Z cvt.rn.bf16x2.f32 %r5385, %r11484, %r11483; 2026-02-21T12:39:47.6613364Z cvt.rn.bf16x2.f32 %r5386, %r11486, %r11485; 2026-02-21T12:39:47.6613442Z cvt.rn.bf16x2.f32 %r5387, %r11488, %r11487; 2026-02-21T12:39:47.6613585Z cvt.rn.bf16x2.f32 %r5388, %r11490, %r11489; 2026-02-21T12:39:47.6613669Z cvt.rn.bf16x2.f32 %r5389, %r11492, %r11491; 2026-02-21T12:39:47.6613746Z cvt.rn.bf16x2.f32 %r5390, %r11494, %r11493; 2026-02-21T12:39:47.6613887Z cvt.rn.bf16x2.f32 %r5391, %r11496, %r11495; 2026-02-21T12:39:47.6613964Z cvt.rn.bf16x2.f32 %r5392, %r11498, %r11497; 2026-02-21T12:39:47.6614050Z cvt.rn.bf16x2.f32 %r5393, %r11500, %r11499; 2026-02-21T12:39:47.6614129Z cvt.rn.bf16x2.f32 %r5394, %r11502, %r11501; 2026-02-21T12:39:47.6614331Z .loc 1 91 22 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:91:22 2026-02-21T12:39:47.6614413Z mad.lo.s64 %rd296, %rd280, 2560, %rd151; 2026-02-21T12:39:47.6614480Z shl.b64 %rd297, %rd53, 1; 2026-02-21T12:39:47.6614548Z add.s64 %rd264, %rd296, %rd297; 2026-02-21T12:39:47.6614631Z mad.lo.s64 %rd298, %rd281, 2560, %rd151; 2026-02-21T12:39:47.6614701Z add.s64 %rd265, %rd298, %rd297; 2026-02-21T12:39:47.6614824Z mad.lo.s64 %rd299, %rd282, 2560, %rd151; 
2026-02-21T12:39:47.6614894Z add.s64 %rd266, %rd299, %rd297; 2026-02-21T12:39:47.6614973Z mad.lo.s64 %rd300, %rd283, 2560, %rd151; 2026-02-21T12:39:47.6615038Z add.s64 %rd267, %rd300, %rd297; 2026-02-21T12:39:47.6615111Z mad.lo.s64 %rd301, %rd284, 2560, %rd151; 2026-02-21T12:39:47.6615182Z add.s64 %rd268, %rd301, %rd297; 2026-02-21T12:39:47.6615253Z mad.lo.s64 %rd302, %rd285, 2560, %rd151; 2026-02-21T12:39:47.6615366Z add.s64 %rd269, %rd302, %rd297; 2026-02-21T12:39:47.6615445Z mad.lo.s64 %rd303, %rd286, 2560, %rd151; 2026-02-21T12:39:47.6615511Z add.s64 %rd270, %rd303, %rd297; 2026-02-21T12:39:47.6615583Z mad.lo.s64 %rd304, %rd287, 2560, %rd151; 2026-02-21T12:39:47.6615648Z add.s64 %rd271, %rd304, %rd297; 2026-02-21T12:39:47.6615726Z mad.lo.s64 %rd305, %rd288, 2560, %rd151; 2026-02-21T12:39:47.6615791Z add.s64 %rd272, %rd305, %rd297; 2026-02-21T12:39:47.6615863Z mad.lo.s64 %rd306, %rd289, 2560, %rd151; 2026-02-21T12:39:47.6615939Z add.s64 %rd273, %rd306, %rd297; 2026-02-21T12:39:47.6616012Z mad.lo.s64 %rd307, %rd290, 2560, %rd151; 2026-02-21T12:39:47.6616076Z add.s64 %rd274, %rd307, %rd297; 2026-02-21T12:39:47.6616147Z mad.lo.s64 %rd308, %rd291, 2560, %rd151; 2026-02-21T12:39:47.6616220Z add.s64 %rd275, %rd308, %rd297; 2026-02-21T12:39:47.6616291Z mad.lo.s64 %rd309, %rd292, 2560, %rd151; 2026-02-21T12:39:47.6616356Z add.s64 %rd276, %rd309, %rd297; 2026-02-21T12:39:47.6616442Z mad.lo.s64 %rd310, %rd293, 2560, %rd151; 2026-02-21T12:39:47.6616640Z add.s64 %rd277, %rd310, %rd297; 2026-02-21T12:39:47.6616717Z mad.lo.s64 %rd311, %rd294, 2560, %rd151; 2026-02-21T12:39:47.6616788Z add.s64 %rd278, %rd311, %rd297; 2026-02-21T12:39:47.6616860Z mad.lo.s64 %rd312, %rd295, 2560, %rd151; 2026-02-21T12:39:47.6616926Z add.s64 %rd279, %rd312, %rd297; 2026-02-21T12:39:47.6617129Z .loc 1 91 81 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:91:81 2026-02-21T12:39:47.6617194Z bar.sync 0; 2026-02-21T12:39:47.6617315Z st.shared.v4.b32 [%r28], {%r5331, %r5333, %r5335, %r5337}; 
2026-02-21T12:39:47.6617428Z st.shared.v4.b32 [%r29], {%r5339, %r5341, %r5343, %r5345}; 2026-02-21T12:39:47.6617538Z st.shared.v4.b32 [%r30], {%r5347, %r5349, %r5351, %r5353}; 2026-02-21T12:39:47.6617646Z st.shared.v4.b32 [%r31], {%r5355, %r5357, %r5359, %r5361}; 2026-02-21T12:39:47.6617749Z st.shared.v4.b32 [%r32], {%r5363, %r5365, %r5367, %r5369}; 2026-02-21T12:39:47.6617861Z st.shared.v4.b32 [%r33], {%r5371, %r5373, %r5375, %r5377}; 2026-02-21T12:39:47.6617966Z st.shared.v4.b32 [%r34], {%r5379, %r5381, %r5383, %r5385}; 2026-02-21T12:39:47.6618068Z st.shared.v4.b32 [%r35], {%r5387, %r5389, %r5391, %r5393}; 2026-02-21T12:39:47.6618130Z bar.sync 0; 2026-02-21T12:39:47.6618205Z // begin inline asm 2026-02-21T12:39:47.6618403Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5187, %r5188, %r5189, %r5190}, [%r3291]; 2026-02-21T12:39:47.6618464Z // end inline asm 2026-02-21T12:39:47.6618536Z // begin inline asm 2026-02-21T12:39:47.6618815Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5192, %r5193, %r5194, %r5195}, [%r3296]; 2026-02-21T12:39:47.6618878Z // end inline asm 2026-02-21T12:39:47.6618940Z // begin inline asm 2026-02-21T12:39:47.6619130Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5197, %r5198, %r5199, %r5200}, [%r3301]; 2026-02-21T12:39:47.6619264Z // end inline asm 2026-02-21T12:39:47.6619327Z // begin inline asm 2026-02-21T12:39:47.6619518Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5202, %r5203, %r5204, %r5205}, [%r3306]; 2026-02-21T12:39:47.6619578Z // end inline asm 2026-02-21T12:39:47.6619640Z // begin inline asm 2026-02-21T12:39:47.6619829Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5207, %r5208, %r5209, %r5210}, [%r3311]; 2026-02-21T12:39:47.6619887Z // end inline asm 2026-02-21T12:39:47.6619949Z // begin inline asm 2026-02-21T12:39:47.6620130Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5212, %r5213, %r5214, %r5215}, [%r3316]; 2026-02-21T12:39:47.6620196Z // end inline asm 2026-02-21T12:39:47.6620259Z // begin inline asm 2026-02-21T12:39:47.6620510Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5217, %r5218, %r5219, %r5220}, [%r3321]; 2026-02-21T12:39:47.6620578Z // end inline asm 2026-02-21T12:39:47.6620640Z // begin inline asm 2026-02-21T12:39:47.6620822Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5222, %r5223, %r5224, %r5225}, [%r3326]; 2026-02-21T12:39:47.6620889Z // end inline asm 2026-02-21T12:39:47.6620945Z bar.sync 0; 2026-02-21T12:39:47.6621109Z st.shared.v4.b32 [%r28], {%r5332, %r5334, %r5336, %r5338}; 2026-02-21T12:39:47.6621220Z st.shared.v4.b32 [%r29], {%r5340, %r5342, %r5344, %r5346}; 2026-02-21T12:39:47.6621330Z st.shared.v4.b32 [%r30], {%r5348, %r5350, %r5352, %r5354}; 2026-02-21T12:39:47.6621433Z st.shared.v4.b32 [%r31], {%r5356, %r5358, %r5360, %r5362}; 2026-02-21T12:39:47.6621536Z st.shared.v4.b32 [%r32], {%r5364, %r5366, %r5368, %r5370}; 2026-02-21T12:39:47.6621643Z st.shared.v4.b32 [%r33], {%r5372, %r5374, %r5376, %r5378}; 2026-02-21T12:39:47.6621761Z st.shared.v4.b32 [%r34], {%r5380, %r5382, %r5384, %r5386}; 2026-02-21T12:39:47.6621866Z st.shared.v4.b32 [%r35], {%r5388, %r5390, %r5392, %r5394}; 2026-02-21T12:39:47.6621928Z bar.sync 0; 2026-02-21T12:39:47.6621991Z // begin inline asm 2026-02-21T12:39:47.6622177Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5227, %r5228, %r5229, %r5230}, [%r3291]; 2026-02-21T12:39:47.6622236Z // end inline asm 2026-02-21T12:39:47.6622300Z // begin inline asm 2026-02-21T12:39:47.6622483Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5232, %r5233, %r5234, %r5235}, [%r3296]; 2026-02-21T12:39:47.6622541Z // end inline asm 2026-02-21T12:39:47.6622606Z // begin inline asm 2026-02-21T12:39:47.6622787Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5237, %r5238, %r5239, %r5240}, [%r3301]; 2026-02-21T12:39:47.6622846Z // end inline asm 2026-02-21T12:39:47.6622906Z // begin inline asm 2026-02-21T12:39:47.6623093Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5242, %r5243, %r5244, %r5245}, [%r3306]; 2026-02-21T12:39:47.6623154Z // end inline asm 2026-02-21T12:39:47.6623218Z // 
begin inline asm 2026-02-21T12:39:47.6623406Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5247, %r5248, %r5249, %r5250}, [%r3311]; 2026-02-21T12:39:47.6623466Z // end inline asm 2026-02-21T12:39:47.6623529Z // begin inline asm 2026-02-21T12:39:47.6623716Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5252, %r5253, %r5254, %r5255}, [%r3316]; 2026-02-21T12:39:47.6623774Z // end inline asm 2026-02-21T12:39:47.6623837Z // begin inline asm 2026-02-21T12:39:47.6624020Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5257, %r5258, %r5259, %r5260}, [%r3321]; 2026-02-21T12:39:47.6624084Z // end inline asm 2026-02-21T12:39:47.6624144Z // begin inline asm 2026-02-21T12:39:47.6624326Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5262, %r5263, %r5264, %r5265}, [%r3326]; 2026-02-21T12:39:47.6624389Z // end inline asm 2026-02-21T12:39:47.6624450Z // begin inline asm 2026-02-21T12:39:47.6624583Z st.global.v4.b32 [ %rd264 + 0 ], { %r5187, %r5188, %r5189, %r5190 }; 2026-02-21T12:39:47.6624725Z // end inline asm 2026-02-21T12:39:47.6624789Z // begin inline asm 2026-02-21T12:39:47.6624915Z st.global.v4.b32 [ %rd265 + 0 ], { %r5227, %r5228, %r5229, %r5230 }; 2026-02-21T12:39:47.6624973Z // end inline asm 2026-02-21T12:39:47.6625087Z // begin inline asm 2026-02-21T12:39:47.6625208Z st.global.v4.b32 [ %rd266 + 0 ], { %r5192, %r5193, %r5194, %r5195 }; 2026-02-21T12:39:47.6628036Z // end inline asm 2026-02-21T12:39:47.6628115Z // begin inline asm 2026-02-21T12:39:47.6628267Z st.global.v4.b32 [ %rd267 + 0 ], { %r5232, %r5233, %r5234, %r5235 }; 2026-02-21T12:39:47.6628329Z // end inline asm 2026-02-21T12:39:47.6628469Z // begin inline asm 2026-02-21T12:39:47.6628603Z st.global.v4.b32 [ %rd268 + 0 ], { %r5197, %r5198, %r5199, %r5200 }; 2026-02-21T12:39:47.6628669Z // end inline asm 2026-02-21T12:39:47.6628731Z // begin inline asm 2026-02-21T12:39:47.6628854Z st.global.v4.b32 [ %rd269 + 0 ], { %r5237, %r5238, %r5239, %r5240 }; 2026-02-21T12:39:47.6628924Z // end inline asm 
2026-02-21T12:39:47.6629105Z // begin inline asm 2026-02-21T12:39:47.6629234Z st.global.v4.b32 [ %rd270 + 0 ], { %r5202, %r5203, %r5204, %r5205 }; 2026-02-21T12:39:47.6629300Z // end inline asm 2026-02-21T12:39:47.6629361Z // begin inline asm 2026-02-21T12:39:47.6629482Z st.global.v4.b32 [ %rd271 + 0 ], { %r5242, %r5243, %r5244, %r5245 }; 2026-02-21T12:39:47.6629542Z // end inline asm 2026-02-21T12:39:47.6629623Z // begin inline asm 2026-02-21T12:39:47.6629817Z st.global.v4.b32 [ %rd272 + 0 ], { %r5207, %r5208, %r5209, %r5210 }; 2026-02-21T12:39:47.6629884Z // end inline asm 2026-02-21T12:39:47.6629945Z // begin inline asm 2026-02-21T12:39:47.6630062Z st.global.v4.b32 [ %rd273 + 0 ], { %r5247, %r5248, %r5249, %r5250 }; 2026-02-21T12:39:47.6630125Z // end inline asm 2026-02-21T12:39:47.6630185Z // begin inline asm 2026-02-21T12:39:47.6630301Z st.global.v4.b32 [ %rd274 + 0 ], { %r5212, %r5213, %r5214, %r5215 }; 2026-02-21T12:39:47.6630360Z // end inline asm 2026-02-21T12:39:47.6630427Z // begin inline asm 2026-02-21T12:39:47.6630557Z st.global.v4.b32 [ %rd275 + 0 ], { %r5252, %r5253, %r5254, %r5255 }; 2026-02-21T12:39:47.6630619Z // end inline asm 2026-02-21T12:39:47.6630687Z // begin inline asm 2026-02-21T12:39:47.6630814Z st.global.v4.b32 [ %rd276 + 0 ], { %r5217, %r5218, %r5219, %r5220 }; 2026-02-21T12:39:47.6630877Z // end inline asm 2026-02-21T12:39:47.6630943Z // begin inline asm 2026-02-21T12:39:47.6631069Z st.global.v4.b32 [ %rd277 + 0 ], { %r5257, %r5258, %r5259, %r5260 }; 2026-02-21T12:39:47.6631127Z // end inline asm 2026-02-21T12:39:47.6631187Z // begin inline asm 2026-02-21T12:39:47.6631311Z st.global.v4.b32 [ %rd278 + 0 ], { %r5222, %r5223, %r5224, %r5225 }; 2026-02-21T12:39:47.6631370Z // end inline asm 2026-02-21T12:39:47.6631432Z // begin inline asm 2026-02-21T12:39:47.6631558Z st.global.v4.b32 [ %rd279 + 0 ], { %r5262, %r5263, %r5264, %r5265 }; 2026-02-21T12:39:47.6631617Z // end inline asm 2026-02-21T12:39:47.6631844Z .loc 1 22 120 // 
conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:22:120 2026-02-21T12:39:47.6631923Z add.s64 %rd313, %rd589, 2; 2026-02-21T12:39:47.6632137Z .loc 1 28 35 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:28:35 2026-02-21T12:39:47.6632237Z mul.hi.u64 %rd314, %rd313, -3689348814741910323; 2026-02-21T12:39:47.6632304Z shr.u64 %rd315, %rd314, 5; 2026-02-21T12:39:47.6632525Z .loc 1 29 33 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:29:33 2026-02-21T12:39:47.6632596Z shl.b64 %rd62, %rd315, 3; 2026-02-21T12:39:47.6632797Z .loc 1 30 39 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:30:39 2026-02-21T12:39:47.6632871Z sub.s64 %rd316, 2048, %rd62; 2026-02-21T12:39:47.6633070Z .loc 1 30 52 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:30:52 2026-02-21T12:39:47.6633139Z min.s64 %rd63, %rd316, 8; 2026-02-21T12:39:47.6633348Z .loc 1 31 45 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:31:45 2026-02-21T12:39:47.6633427Z mul.lo.s64 %rd317, %rd315, 40; 2026-02-21T12:39:47.6633496Z sub.s64 %rd64, %rd313, %rd317; 2026-02-21T12:39:47.6633694Z .loc 1 32 51 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:32:51 2026-02-21T12:39:47.6633856Z or.b64 %rd318, %rd64, %rd63; 2026-02-21T12:39:47.6633932Z and.b64 %rd319, %rd318, -4294967296; 2026-02-21T12:39:47.6634180Z setp.ne.b64 %p14, %rd319, 0; 2026-02-21T12:39:47.6634255Z @%p14 bra $L__BB0_14; 2026-02-21T12:39:47.6634318Z bra.uni $L__BB0_13; 2026-02-21T12:39:47.6634439Z $L__BB0_14: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:39:47.6634515Z div.s64 %rd598, %rd64, %rd63; 2026-02-21T12:39:47.6634578Z bra.uni $L__BB0_15; 2026-02-21T12:39:47.6634689Z $L__BB0_13: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:39:47.6634755Z cvt.u32.u64 %r5395, %rd63; 2026-02-21T12:39:47.6634824Z cvt.u32.u64 %r5396, %rd64; 2026-02-21T12:39:47.6634958Z div.u32 %r5397, %r5396, %r5395; 2026-02-21T12:39:47.6635029Z cvt.u64.u32 %rd598, %r5397; 2026-02-21T12:39:47.6635147Z $L__BB0_15: // in 
Loop: Header=BB0_2 Depth=1 2026-02-21T12:39:47.6635358Z .loc 1 31 64 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:31:64 2026-02-21T12:39:47.6635430Z mul.lo.s64 %rd321, %rd598, %rd63; 2026-02-21T12:39:47.6635552Z sub.s64 %rd322, %rd64, %rd321; 2026-02-21T12:39:47.6635755Z .loc 1 31 30 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:31:30 2026-02-21T12:39:47.6635823Z add.s64 %rd323, %rd322, %rd62; 2026-02-21T12:39:47.6636021Z .loc 1 33 27 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:33:27 2026-02-21T12:39:47.6636093Z shl.b64 %rd68, %rd323, 7; 2026-02-21T12:39:47.6636288Z .loc 1 35 27 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:35:27 2026-02-21T12:39:47.6636354Z shl.b64 %rd324, %rd598, 8; 2026-02-21T12:39:47.6636716Z .loc 1 36 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:36:32 2026-02-21T12:39:47.6636786Z or.b64 %rd69, %rd324, %rd23; 2026-02-21T12:39:47.6636998Z .loc 1 43 126 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:43:126 2026-02-21T12:39:47.6637068Z shl.b64 %rd325, %rd323, 21; 2026-02-21T12:39:47.6637140Z add.s64 %rd600, %rd26, %rd325; 2026-02-21T12:39:47.6637205Z add.s64 %rd599, %rd27, %rd324; 2026-02-21T12:39:47.6637269Z mov.b32 %r11503, 0f00000000; 2026-02-21T12:39:47.6637350Z mov.b64 %rd601, -16; 2026-02-21T12:39:47.6637413Z mov.b32 %r11504, %r11503; 2026-02-21T12:39:47.6637474Z mov.b32 %r11505, %r11503; 2026-02-21T12:39:47.6637540Z mov.b32 %r11506, %r11503; 2026-02-21T12:39:47.6637600Z mov.b32 %r11507, %r11503; 2026-02-21T12:39:47.6637662Z mov.b32 %r11508, %r11503; 2026-02-21T12:39:47.6637728Z mov.b32 %r11509, %r11503; 2026-02-21T12:39:47.6637791Z mov.b32 %r11510, %r11503; 2026-02-21T12:39:47.6637851Z mov.b32 %r11511, %r11503; 2026-02-21T12:39:47.6637911Z mov.b32 %r11512, %r11503; 2026-02-21T12:39:47.6637976Z mov.b32 %r11513, %r11503; 2026-02-21T12:39:47.6638038Z mov.b32 %r11514, %r11503; 2026-02-21T12:39:47.6638099Z mov.b32 %r11515, %r11503; 2026-02-21T12:39:47.6638166Z 
mov.b32 %r11516, %r11503; 2026-02-21T12:39:47.6638228Z mov.b32 %r11517, %r11503; 2026-02-21T12:39:47.6638290Z mov.b32 %r11518, %r11503; 2026-02-21T12:39:47.6638350Z mov.b32 %r11519, %r11503; 2026-02-21T12:39:47.6638416Z mov.b32 %r11520, %r11503; 2026-02-21T12:39:47.6638477Z mov.b32 %r11521, %r11503; 2026-02-21T12:39:47.6638537Z mov.b32 %r11522, %r11503; 2026-02-21T12:39:47.6638604Z mov.b32 %r11523, %r11503; 2026-02-21T12:39:47.6638664Z mov.b32 %r11524, %r11503; 2026-02-21T12:39:47.6638724Z mov.b32 %r11525, %r11503; 2026-02-21T12:39:47.6638782Z mov.b32 %r11526, %r11503; 2026-02-21T12:39:47.6638856Z mov.b32 %r11527, %r11503; 2026-02-21T12:39:47.6638923Z mov.b32 %r11528, %r11503; 2026-02-21T12:39:47.6638984Z mov.b32 %r11529, %r11503; 2026-02-21T12:39:47.6639050Z mov.b32 %r11530, %r11503; 2026-02-21T12:39:47.6639110Z mov.b32 %r11531, %r11503; 2026-02-21T12:39:47.6639259Z mov.b32 %r11532, %r11503; 2026-02-21T12:39:47.6639319Z mov.b32 %r11533, %r11503; 2026-02-21T12:39:47.6639385Z mov.b32 %r11534, %r11503; 2026-02-21T12:39:47.6639514Z mov.b32 %r11535, %r11503; 2026-02-21T12:39:47.6639575Z mov.b32 %r11536, %r11503; 2026-02-21T12:39:47.6639642Z mov.b32 %r11537, %r11503; 2026-02-21T12:39:47.6646975Z mov.b32 %r11538, %r11503; 2026-02-21T12:39:47.6647096Z mov.b32 %r11539, %r11503; 2026-02-21T12:39:47.6647169Z mov.b32 %r11540, %r11503; 2026-02-21T12:39:47.6647240Z mov.b32 %r11541, %r11503; 2026-02-21T12:39:47.6647301Z mov.b32 %r11542, %r11503; 2026-02-21T12:39:47.6647360Z mov.b32 %r11543, %r11503; 2026-02-21T12:39:47.6647426Z mov.b32 %r11544, %r11503; 2026-02-21T12:39:47.6647484Z mov.b32 %r11545, %r11503; 2026-02-21T12:39:47.6647687Z mov.b32 %r11546, %r11503; 2026-02-21T12:39:47.6647752Z mov.b32 %r11547, %r11503; 2026-02-21T12:39:47.6647818Z mov.b32 %r11548, %r11503; 2026-02-21T12:39:47.6647877Z mov.b32 %r11549, %r11503; 2026-02-21T12:39:47.6647939Z mov.b32 %r11550, %r11503; 2026-02-21T12:39:47.6648003Z mov.b32 %r11551, %r11503; 2026-02-21T12:39:47.6648065Z mov.b32 %r11552, 
%r11503; 2026-02-21T12:39:47.6648209Z mov.b32 %r11553, %r11503; 2026-02-21T12:39:47.6648270Z mov.b32 %r11554, %r11503; 2026-02-21T12:39:47.6648338Z mov.b32 %r11555, %r11503; 2026-02-21T12:39:47.6648413Z mov.b32 %r11556, %r11503; 2026-02-21T12:39:47.6648475Z mov.b32 %r11557, %r11503; 2026-02-21T12:39:47.6648537Z mov.b32 %r11558, %r11503; 2026-02-21T12:39:47.6648595Z mov.b32 %r11559, %r11503; 2026-02-21T12:39:47.6648654Z mov.b32 %r11560, %r11503; 2026-02-21T12:39:47.6648713Z mov.b32 %r11561, %r11503; 2026-02-21T12:39:47.6648780Z mov.b32 %r11562, %r11503; 2026-02-21T12:39:47.6648839Z mov.b32 %r11563, %r11503; 2026-02-21T12:39:47.6648900Z mov.b32 %r11564, %r11503; 2026-02-21T12:39:47.6648966Z mov.b32 %r11565, %r11503; 2026-02-21T12:39:47.6649024Z mov.b32 %r11566, %r11503; 2026-02-21T12:39:47.6649083Z mov.b32 %r11567, %r11503; 2026-02-21T12:39:47.6649145Z mov.b32 %r11568, %r11503; 2026-02-21T12:39:47.6649211Z mov.b32 %r11569, %r11503; 2026-02-21T12:39:47.6649275Z mov.b32 %r11570, %r11503; 2026-02-21T12:39:47.6649340Z mov.b32 %r11571, %r11503; 2026-02-21T12:39:47.6649409Z mov.b32 %r11572, %r11503; 2026-02-21T12:39:47.6649471Z mov.b32 %r11573, %r11503; 2026-02-21T12:39:47.6649531Z mov.b32 %r11574, %r11503; 2026-02-21T12:39:47.6649594Z mov.b32 %r11575, %r11503; 2026-02-21T12:39:47.6649660Z mov.b32 %r11576, %r11503; 2026-02-21T12:39:47.6649721Z mov.b32 %r11577, %r11503; 2026-02-21T12:39:47.6649780Z mov.b32 %r11578, %r11503; 2026-02-21T12:39:47.6649846Z mov.b32 %r11579, %r11503; 2026-02-21T12:39:47.6649907Z mov.b32 %r11580, %r11503; 2026-02-21T12:39:47.6649968Z mov.b32 %r11581, %r11503; 2026-02-21T12:39:47.6650029Z mov.b32 %r11582, %r11503; 2026-02-21T12:39:47.6650103Z mov.b32 %r11583, %r11503; 2026-02-21T12:39:47.6650163Z mov.b32 %r11584, %r11503; 2026-02-21T12:39:47.6650223Z mov.b32 %r11585, %r11503; 2026-02-21T12:39:47.6650296Z mov.b32 %r11586, %r11503; 2026-02-21T12:39:47.6650355Z mov.b32 %r11587, %r11503; 2026-02-21T12:39:47.6650415Z mov.b32 %r11588, %r11503; 
2026-02-21T12:39:47.6650481Z mov.b32 %r11589, %r11503; 2026-02-21T12:39:47.6650546Z mov.b32 %r11590, %r11503; 2026-02-21T12:39:47.6650605Z mov.b32 %r11591, %r11503; 2026-02-21T12:39:47.6650667Z mov.b32 %r11592, %r11503; 2026-02-21T12:39:47.6650729Z mov.b32 %r11593, %r11503; 2026-02-21T12:39:47.6650790Z mov.b32 %r11594, %r11503; 2026-02-21T12:39:47.6650848Z mov.b32 %r11595, %r11503; 2026-02-21T12:39:47.6650908Z mov.b32 %r11596, %r11503; 2026-02-21T12:39:47.6650968Z mov.b32 %r11597, %r11503; 2026-02-21T12:39:47.6651026Z mov.b32 %r11598, %r11503; 2026-02-21T12:39:47.6651086Z mov.b32 %r11599, %r11503; 2026-02-21T12:39:47.6651152Z mov.b32 %r11600, %r11503; 2026-02-21T12:39:47.6651213Z mov.b32 %r11601, %r11503; 2026-02-21T12:39:47.6651282Z mov.b32 %r11602, %r11503; 2026-02-21T12:39:47.6651349Z mov.b32 %r11603, %r11503; 2026-02-21T12:39:47.6651502Z mov.b32 %r11604, %r11503; 2026-02-21T12:39:47.6651562Z mov.b32 %r11605, %r11503; 2026-02-21T12:39:47.6651620Z mov.b32 %r11606, %r11503; 2026-02-21T12:39:47.6651752Z mov.b32 %r11607, %r11503; 2026-02-21T12:39:47.6651812Z mov.b32 %r11608, %r11503; 2026-02-21T12:39:47.6651872Z mov.b32 %r11609, %r11503; 2026-02-21T12:39:47.6651935Z mov.b32 %r11610, %r11503; 2026-02-21T12:39:47.6651993Z mov.b32 %r11611, %r11503; 2026-02-21T12:39:47.6652052Z mov.b32 %r11612, %r11503; 2026-02-21T12:39:47.6652112Z mov.b32 %r11613, %r11503; 2026-02-21T12:39:47.6652178Z mov.b32 %r11614, %r11503; 2026-02-21T12:39:47.6652238Z mov.b32 %r11615, %r11503; 2026-02-21T12:39:47.6652296Z mov.b32 %r11616, %r11503; 2026-02-21T12:39:47.6652361Z mov.b32 %r11617, %r11503; 2026-02-21T12:39:47.6652468Z mov.b32 %r11618, %r11503; 2026-02-21T12:39:47.6652530Z mov.b32 %r11619, %r11503; 2026-02-21T12:39:47.6652599Z mov.b32 %r11620, %r11503; 2026-02-21T12:39:47.6652665Z mov.b32 %r11621, %r11503; 2026-02-21T12:39:47.6652728Z mov.b32 %r11622, %r11503; 2026-02-21T12:39:47.6652787Z mov.b32 %r11623, %r11503; 2026-02-21T12:39:47.6652850Z mov.b32 %r11624, %r11503; 
2026-02-21T12:39:47.6652981Z mov.b32 %r11625, %r11503; 2026-02-21T12:39:47.6653043Z mov.b32 %r11626, %r11503; 2026-02-21T12:39:47.6653103Z mov.b32 %r11627, %r11503; 2026-02-21T12:39:47.6653168Z mov.b32 %r11628, %r11503; 2026-02-21T12:39:47.6653227Z mov.b32 %r11629, %r11503; 2026-02-21T12:39:47.6653286Z mov.b32 %r11630, %r11503; 2026-02-21T12:39:47.6653426Z $L__BB0_16: // Parent Loop BB0_2 Depth=1 2026-02-21T12:39:47.6653542Z // => This Inner Loop Header: Depth=2 2026-02-21T12:39:47.6653769Z .loc 1 51 80 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:80 2026-02-21T12:39:47.6653850Z add.s64 %rd327, %rd600, -32; 2026-02-21T12:39:47.6653916Z // begin inline asm 2026-02-21T12:39:47.6653978Z mov.u64 %rd326, 0x0; 2026-02-21T12:39:47.6654120Z createpolicy.fractional.L2::evict_last.b64 %rd326, 1.0; 2026-02-21T12:39:47.6654184Z // end inline asm 2026-02-21T12:39:47.6654247Z // begin inline asm 2026-02-21T12:39:47.6654312Z mov.u32 %r5399, 0x0; 2026-02-21T12:39:47.6654375Z mov.u32 %r5400, 0x0; 2026-02-21T12:39:47.6654435Z mov.u32 %r5401, 0x0; 2026-02-21T12:39:47.6654494Z mov.u32 %r5402, 0x0; 2026-02-21T12:39:47.6654744Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r5399, %r5400, %r5401, %r5402 }, [ %rd327 + 0 ], %rd326; 2026-02-21T12:39:47.6654805Z // end inline asm 2026-02-21T12:39:47.6655027Z .loc 1 55 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:55:32 2026-02-21T12:39:47.6655087Z bar.sync 0; 2026-02-21T12:39:47.6655195Z st.shared.v2.b32 [%r10], {%r5399, %r5400}; 2026-02-21T12:39:47.6655279Z st.shared.v2.b32 [%r11], {%r5401, %r5402}; 2026-02-21T12:39:47.6655337Z bar.sync 0; 2026-02-21T12:39:47.6655415Z ld.shared.b16 %rs225, [%r13]; 2026-02-21T12:39:47.6655486Z ld.shared.b16 %rs226, [%r13+256]; 2026-02-21T12:39:47.6655558Z ld.shared.b16 %rs227, [%r13+16]; 2026-02-21T12:39:47.6655628Z ld.shared.b16 %rs228, [%r13+272]; 2026-02-21T12:39:47.6655700Z ld.shared.b16 %rs229, [%r14]; 2026-02-21T12:39:47.6655766Z ld.shared.b16 %rs230, [%r14+256]; 
2026-02-21T12:39:47.6655838Z ld.shared.b16 %rs231, [%r14+16]; 2026-02-21T12:39:47.6655910Z ld.shared.b16 %rs232, [%r14+272]; 2026-02-21T12:39:47.6655978Z cvt.f32.bf16 %r5661, %rs225; 2026-02-21T12:39:47.6656041Z cvt.f32.bf16 %r5662, %rs226; 2026-02-21T12:39:47.6656108Z cvt.f32.bf16 %r5663, %rs229; 2026-02-21T12:39:47.6656171Z cvt.f32.bf16 %r5664, %rs230; 2026-02-21T12:39:47.6656236Z cvt.f32.bf16 %r5921, %rs227; 2026-02-21T12:39:47.6656297Z cvt.f32.bf16 %r5922, %rs228; 2026-02-21T12:39:47.6656362Z cvt.f32.bf16 %r5923, %rs231; 2026-02-21T12:39:47.6656428Z cvt.f32.bf16 %r5924, %rs232; 2026-02-21T12:39:47.6656785Z .loc 1 57 87 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:57:87 2026-02-21T12:39:47.6656960Z // begin inline asm 2026-02-21T12:39:47.6657020Z mov.u32 %r5403, 0x0; 2026-02-21T12:39:47.6657079Z mov.u32 %r5404, 0x0; 2026-02-21T12:39:47.6657247Z ld.global.v2.b32 { %r5403, %r5404 }, [ %rd599 + 0 ]; 2026-02-21T12:39:47.6657307Z // end inline asm 2026-02-21T12:39:47.6657521Z .loc 1 65 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:65:28 2026-02-21T12:39:47.6657578Z bar.sync 0; 2026-02-21T12:39:47.6657652Z st.shared.b8 [%r15], %r5403; 2026-02-21T12:39:47.6657724Z prmt.b32 %r6975, %r5403, 0, 0x7771U; 2026-02-21T12:39:47.6657792Z st.shared.b8 [%r16+256], %r6975; 2026-02-21T12:39:47.6657866Z prmt.b32 %r6976, %r5403, 0, 0x7772U; 2026-02-21T12:39:47.6657933Z st.shared.b8 [%r17+512], %r6976; 2026-02-21T12:39:47.6658062Z prmt.b32 %r6977, %r5403, 0, 0x7773U; 2026-02-21T12:39:47.6658130Z st.shared.b8 [%r18+768], %r6977; 2026-02-21T12:39:47.6658206Z st.shared.b8 [%r19+1024], %r5404; 2026-02-21T12:39:47.6658270Z prmt.b32 %r6978, %r5404, 0, 0x7771U; 2026-02-21T12:39:47.6658337Z st.shared.b8 [%r20+1280], %r6978; 2026-02-21T12:39:47.6658404Z prmt.b32 %r6979, %r5404, 0, 0x7772U; 2026-02-21T12:39:47.6658534Z st.shared.b8 [%r21+1536], %r6979; 2026-02-21T12:39:47.6658600Z prmt.b32 %r6980, %r5404, 0, 0x7773U; 2026-02-21T12:39:47.6658663Z st.shared.b8 
[%r22+1792], %r6980; 2026-02-21T12:39:47.6658721Z bar.sync 0; 2026-02-21T12:39:47.6658786Z ld.shared.b32 %r6981, [%r44]; 2026-02-21T12:39:47.6658850Z prmt.b32 %r6982, %r6981, 0, 0x7771U; 2026-02-21T12:39:47.6658918Z cvt.u16.u32 %rs233, %r6982; 2026-02-21T12:39:47.6658981Z prmt.b32 %r6983, %r6981, 0, 0x7770U; 2026-02-21T12:39:47.6659043Z cvt.u16.u32 %rs234, %r6983; 2026-02-21T12:39:47.6659107Z prmt.b32 %r6984, %r6981, 0, 0x7773U; 2026-02-21T12:39:47.6659177Z cvt.u16.u32 %rs235, %r6984; 2026-02-21T12:39:47.6659243Z prmt.b32 %r6985, %r6981, 0, 0x7772U; 2026-02-21T12:39:47.6659306Z cvt.u16.u32 %rs236, %r6985; 2026-02-21T12:39:47.6659514Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6659582Z shl.b16 %rs237, %rs234, 4; 2026-02-21T12:39:47.6659646Z shl.b16 %rs238, %rs233, 4; 2026-02-21T12:39:47.6659873Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6659937Z cvt.u32.u16 %r6986, %rs237; 2026-02-21T12:39:47.6660013Z prmt.b32 %r6987, %r6986, %r6988, 0x3340U; 2026-02-21T12:39:47.6660085Z prmt.b32 %r6992, %r6987, %r6989, 0x5410U; 2026-02-21T12:39:47.6660158Z prmt.b32 %r6993, %r6992, %r6981, 0x5040U; 2026-02-21T12:39:47.6660223Z prmt.b32 %r6994, %r6993, 0, 0x9991U; 2026-02-21T12:39:47.6660286Z cvt.u16.u32 %rs239, %r6994; 2026-02-21T12:39:47.6660352Z shr.s16 %rs240, %rs239, 4; 2026-02-21T12:39:47.6660421Z prmt.b32 %r6995, %r6993, 0, 0xbbb3U; 2026-02-21T12:39:47.6660486Z cvt.u16.u32 %rs241, %r6995; 2026-02-21T12:39:47.6660557Z shr.s16 %rs242, %rs241, 4; 2026-02-21T12:39:47.6660622Z cvt.s16.s8 %rs243, %rs237; 2026-02-21T12:39:47.6660687Z shr.s16 %rs244, %rs243, 4; 2026-02-21T12:39:47.6660749Z cvt.s16.s8 %rs245, %rs238; 2026-02-21T12:39:47.6660813Z shr.s16 %rs246, %rs245, 4; 2026-02-21T12:39:47.6661033Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6661104Z cvt.rn.f32.s16 %r6996, %rs242; 2026-02-21T12:39:47.6661175Z 
cvt.rn.f32.s16 %r6997, %rs240; 2026-02-21T12:39:47.6661240Z cvt.rn.f32.s16 %r6998, %rs246; 2026-02-21T12:39:47.6661306Z cvt.rn.f32.s16 %r6999, %rs244; 2026-02-21T12:39:47.6661506Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6661574Z shl.b16 %rs247, %rs236, 4; 2026-02-21T12:39:47.6661638Z shl.b16 %rs248, %rs235, 4; 2026-02-21T12:39:47.6661837Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6661915Z prmt.b32 %r7000, %r6981, %r7001, 0x3020U; 2026-02-21T12:39:47.6662060Z prmt.b32 %r7002, %r7000, 0, 0x9991U; 2026-02-21T12:39:47.6662122Z cvt.u16.u32 %rs249, %r7002; 2026-02-21T12:39:47.6662190Z shr.s16 %rs250, %rs249, 4; 2026-02-21T12:39:47.6662311Z cvt.s16.s8 %rs251, %rs247; 2026-02-21T12:39:47.6662374Z shr.s16 %rs252, %rs251, 4; 2026-02-21T12:39:47.6662437Z cvt.s16.s8 %rs253, %rs248; 2026-02-21T12:39:47.6662504Z shr.s16 %rs254, %rs253, 4; 2026-02-21T12:39:47.6662572Z prmt.b32 %r7003, %r6981, 0, 0xbbb3U; 2026-02-21T12:39:47.6662640Z cvt.u16.u32 %rs255, %r7003; 2026-02-21T12:39:47.6662704Z shr.s16 %rs256, %rs255, 4; 2026-02-21T12:39:47.6662904Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6662967Z cvt.rn.f32.s16 %r7004, %rs250; 2026-02-21T12:39:47.6663076Z cvt.rn.f32.s16 %r7005, %rs256; 2026-02-21T12:39:47.6663141Z cvt.rn.f32.s16 %r7006, %rs254; 2026-02-21T12:39:47.6663203Z cvt.rn.f32.s16 %r7007, %rs252; 2026-02-21T12:39:47.6663397Z .loc 1 65 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:65:28 2026-02-21T12:39:47.6663473Z ld.shared.b32 %r7008, [%r44+128]; 2026-02-21T12:39:47.6663537Z prmt.b32 %r7009, %r7008, 0, 0x7771U; 2026-02-21T12:39:47.6663650Z cvt.u16.u32 %rs257, %r7009; 2026-02-21T12:39:47.6663723Z prmt.b32 %r7010, %r7008, 0, 0x7770U; 2026-02-21T12:39:47.6663782Z cvt.u16.u32 %rs258, %r7010; 2026-02-21T12:39:47.6663848Z prmt.b32 %r7011, %r7008, 0, 0x7773U; 2026-02-21T12:39:47.6663910Z cvt.u16.u32 
%rs259, %r7011; 2026-02-21T12:39:47.6663981Z prmt.b32 %r7012, %r7008, 0, 0x7772U; 2026-02-21T12:39:47.6664041Z cvt.u16.u32 %rs260, %r7012; 2026-02-21T12:39:47.6664238Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6664305Z shl.b16 %rs261, %rs258, 4; 2026-02-21T12:39:47.6664367Z shl.b16 %rs262, %rs257, 4; 2026-02-21T12:39:47.6664561Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6664630Z cvt.u32.u16 %r7013, %rs261; 2026-02-21T12:39:47.6664703Z prmt.b32 %r7014, %r7013, %r7015, 0x3340U; 2026-02-21T12:39:47.6664772Z prmt.b32 %r7016, %r7014, %r6989, 0x5410U; 2026-02-21T12:39:47.6664844Z prmt.b32 %r7017, %r7016, %r7008, 0x5040U; 2026-02-21T12:39:47.6664913Z prmt.b32 %r7018, %r7017, 0, 0x9991U; 2026-02-21T12:39:47.6664974Z cvt.u16.u32 %rs263, %r7018; 2026-02-21T12:39:47.6665037Z shr.s16 %rs264, %rs263, 4; 2026-02-21T12:39:47.6665108Z prmt.b32 %r7019, %r7017, 0, 0xbbb3U; 2026-02-21T12:39:47.6665172Z cvt.u16.u32 %rs265, %r7019; 2026-02-21T12:39:47.6665234Z shr.s16 %rs266, %rs265, 4; 2026-02-21T12:39:47.6665301Z cvt.s16.s8 %rs267, %rs261; 2026-02-21T12:39:47.6665363Z shr.s16 %rs268, %rs267, 4; 2026-02-21T12:39:47.6665425Z cvt.s16.s8 %rs269, %rs262; 2026-02-21T12:39:47.6665486Z shr.s16 %rs270, %rs269, 4; 2026-02-21T12:39:47.6665685Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6665750Z cvt.rn.f32.s16 %r7020, %rs266; 2026-02-21T12:39:47.6665817Z cvt.rn.f32.s16 %r7021, %rs264; 2026-02-21T12:39:47.6665884Z cvt.rn.f32.s16 %r7022, %rs270; 2026-02-21T12:39:47.6665948Z cvt.rn.f32.s16 %r7023, %rs268; 2026-02-21T12:39:47.6666144Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6666207Z shl.b16 %rs271, %rs260, 4; 2026-02-21T12:39:47.6666273Z shl.b16 %rs272, %rs259, 4; 2026-02-21T12:39:47.6666597Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 
2026-02-21T12:39:47.6666678Z prmt.b32 %r7024, %r7008, %r7025, 0x3020U; 2026-02-21T12:39:47.6666746Z prmt.b32 %r7026, %r7024, 0, 0x9991U; 2026-02-21T12:39:47.6666809Z cvt.u16.u32 %rs273, %r7026; 2026-02-21T12:39:47.6666872Z shr.s16 %rs274, %rs273, 4; 2026-02-21T12:39:47.6666936Z cvt.s16.s8 %rs275, %rs271; 2026-02-21T12:39:47.6666997Z shr.s16 %rs276, %rs275, 4; 2026-02-21T12:39:47.6667059Z cvt.s16.s8 %rs277, %rs272; 2026-02-21T12:39:47.6667208Z shr.s16 %rs278, %rs277, 4; 2026-02-21T12:39:47.6667280Z prmt.b32 %r7027, %r7008, 0, 0xbbb3U; 2026-02-21T12:39:47.6667404Z cvt.u16.u32 %rs279, %r7027; 2026-02-21T12:39:47.6667468Z shr.s16 %rs280, %rs279, 4; 2026-02-21T12:39:47.6667669Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6667734Z cvt.rn.f32.s16 %r7028, %rs274; 2026-02-21T12:39:47.6667798Z cvt.rn.f32.s16 %r7029, %rs280; 2026-02-21T12:39:47.6667872Z cvt.rn.f32.s16 %r7030, %rs278; 2026-02-21T12:39:47.6667939Z cvt.rn.f32.s16 %r7031, %rs276; 2026-02-21T12:39:47.6667997Z bar.sync 0; 2026-02-21T12:39:47.6668112Z st.shared.v4.b32 [%r24], {%r6999, %r6997, %r6998, %r6996}; 2026-02-21T12:39:47.6668296Z st.shared.v4.b32 [%r25], {%r7007, %r7004, %r7006, %r7005}; 2026-02-21T12:39:47.6668494Z st.shared.v4.b32 [%r26], {%r7023, %r7021, %r7022, %r7020}; 2026-02-21T12:39:47.6668599Z st.shared.v4.b32 [%r27], {%r7031, %r7028, %r7030, %r7029}; 2026-02-21T12:39:47.6668662Z $L__tmp9: 2026-02-21T12:39:47.6669016Z .loc 2 291 36 // standard.py:291:36 @[ conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:87:40 ] 2026-02-21T12:39:47.6669082Z // begin inline asm 2026-02-21T12:39:47.6669164Z fence.proxy.async.shared::cta; 2026-02-21T12:39:47.6669226Z // end inline asm 2026-02-21T12:39:47.6669281Z bar.sync 0; 2026-02-21T12:39:47.6669362Z shfl.sync.idx.b32 %r7032, %r2, 0, 31, -1; 2026-02-21T12:39:47.6669441Z wgmma.fence.sync.aligned; 2026-02-21T12:39:47.6669507Z mov.pred %p15, -1; 2026-02-21T12:39:47.6669566Z // begin inline asm 
2026-02-21T12:39:47.6672299Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r11503,%r11504,%r11505,%r11506,%r11507,%r11508,%r11509,%r11510,%r11511,%r11512,%r11513,%r11514,%r11515,%r11516,%r11517,%r11518,%r11519,%r11520,%r11521,%r11522,%r11523,%r11524,%r11525,%r11526,%r11527,%r11528,%r11529,%r11530,%r11531,%r11532,%r11533,%r11534,%r11535,%r11536,%r11537,%r11538,%r11539,%r11540,%r11541,%r11542,%r11543,%r11544,%r11545,%r11546,%r11547,%r11548,%r11549,%r11550,%r11551,%r11552,%r11553,%r11554,%r11555,%r11556,%r11557,%r11558,%r11559,%r11560,%r11561,%r11562,%r11563,%r11564,%r11565,%r11566,%r11567,%r11568,%r11569,%r11570,%r11571,%r11572,%r11573,%r11574,%r11575,%r11576,%r11577,%r11578,%r11579,%r11580,%r11581,%r11582,%r11583,%r11584,%r11585,%r11586,%r11587,%r11588,%r11589,%r11590,%r11591,%r11592,%r11593,%r11594,%r11595,%r11596,%r11597,%r11598,%r11599,%r11600,%r11601,%r11602,%r11603,%r11604,%r11605,%r11606,%r11607,%r11608,%r11609,%r11610,%r11611,%r11612,%r11613,%r11614,%r11615,%r11616,%r11617,%r11618,%r11619,%r11620,%r11621,%r11622,%r11623,%r11624,%r11625,%r11626,%r11627,%r11628,%r11629,%r11630}, {%r5661,%r5662,%r5663,%r5664}, %rd404, %p15, 1, 1; 2026-02-21T12:39:47.6672370Z // end inline asm 2026-02-21T12:39:47.6672434Z // begin inline asm 2026-02-21T12:39:47.6675162Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r11503,%r11504,%r11505,%r11506,%r11507,%r11508,%r11509,%r11510,%r11511,%r11512,%r11513,%r11514,%r11515,%r11516,%r11517,%r11518,%r11519,%r11520,%r11521,%r11522,%r11523,%r11524,%r11525,%r11526,%r11527,%r11528,%r11529,%r11530,%r11531,%r11532,%r11533,%r11534,%r11535,%r11536,%r11537,%r11538,%r11539,%r11540,%r11541,%r11542,%r11543,%r11544,%r11545,%r11546,%r11547,%r11548,%r11549,%r11550,%r11551,%r11552,%r11553,%r11554,%r11555,%r11556,%r11557,%r11558,%r11559,%r11560,%r11561,%r11562,%r11563,%r11564,%r11565,%r11566,%r11567,%r11568,%r11569,%r11570,%r11571,%r11572,%r11573,%r11574,%r11575,%r11576,%r11577,%r11578,%r11579,%r11580,%r11581,%r11582,%r11583,%r11584,%r11585,%r11586,%r11587,%r11588,%r11589,%r11590,%r11591,%r11592,%r11593,%r11594,%r11595,%r11596,%r11597,%r11598,%r11599,%r11600,%r11601,%r11602,%r11603,%r11604,%r11605,%r11606,%r11607,%r11608,%r11609,%r11610,%r11611,%r11612,%r11613,%r11614,%r11615,%r11616,%r11617,%r11618,%r11619,%r11620,%r11621,%r11622,%r11623,%r11624,%r11625,%r11626,%r11627,%r11628,%r11629,%r11630}, {%r5921,%r5922,%r5923,%r5924}, %rd405, %p15, 1, 1; 2026-02-21T12:39:47.6675289Z // end inline asm 2026-02-21T12:39:47.6675367Z wgmma.commit_group.sync.aligned; 2026-02-21T12:39:47.6675426Z mov.b32 %r6842, 0; 2026-02-21T12:39:47.6675538Z mov.b32 %r6053, %r10096; 2026-02-21T12:39:47.6675601Z mov.b32 %r6054, %r6842; 2026-02-21T12:39:47.6675660Z mov.b32 %r6055, %r6842; 2026-02-21T12:39:47.6675729Z // begin inline asm 2026-02-21T12:39:47.6678527Z // wait for regs: 
%r11503,%r11504,%r11505,%r11506,%r11507,%r11508,%r11509,%r11510,%r11511,%r11512,%r11513,%r11514,%r11515,%r11516,%r11517,%r11518,%r11519,%r11520,%r11521,%r11522,%r11523,%r11524,%r11525,%r11526,%r11527,%r11528,%r11529,%r11530,%r11531,%r11532,%r11533,%r11534,%r11535,%r11536,%r11537,%r11538,%r11539,%r11540,%r11541,%r11542,%r11543,%r11544,%r11545,%r11546,%r11547,%r11548,%r11549,%r11550,%r11551,%r11552,%r11553,%r11554,%r11555,%r11556,%r11557,%r11558,%r11559,%r11560,%r11561,%r11562,%r11563,%r11564,%r11565,%r11566,%r11567,%r11568,%r11569,%r11570,%r11571,%r11572,%r11573,%r11574,%r11575,%r11576,%r11577,%r11578,%r11579,%r11580,%r11581,%r11582,%r11583,%r11584,%r11585,%r11586,%r11587,%r11588,%r11589,%r11590,%r11591,%r11592,%r11593,%r11594,%r11595,%r11596,%r11597,%r11598,%r11599,%r11600,%r11601,%r11602,%r11603,%r11604,%r11605,%r11606,%r11607,%r11608,%r11609,%r11610,%r11611,%r11612,%r11613,%r11614,%r11615,%r11616,%r11617,%r11618,%r11619,%r11620,%r11621,%r11622,%r11623,%r11624,%r11625,%r11626,%r11627,%r11628,%r11629,%r11630,%r6053,%r6054,%r6055 2026-02-21T12:39:47.6678622Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:39:47.6678683Z // end inline asm 2026-02-21T12:39:47.6678738Z $L__tmp10: 2026-02-21T12:39:47.6678954Z .loc 1 51 80 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:80 2026-02-21T12:39:47.6679019Z // begin inline asm 2026-02-21T12:39:47.6679083Z mov.u64 %rd332, 0x0; 2026-02-21T12:39:47.6679214Z createpolicy.fractional.L2::evict_last.b64 %rd332, 1.0; 2026-02-21T12:39:47.6679281Z // end inline asm 2026-02-21T12:39:47.6679340Z // begin inline asm 2026-02-21T12:39:47.6679401Z mov.u32 %r6187, 0x0; 2026-02-21T12:39:47.6679460Z mov.u32 %r6188, 0x0; 2026-02-21T12:39:47.6679520Z mov.u32 %r6189, 0x0; 2026-02-21T12:39:47.6679581Z mov.u32 %r6190, 0x0; 2026-02-21T12:39:47.6679812Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r6187, %r6188, %r6189, %r6190 }, [ %rd600 + 0 ], %rd332; 2026-02-21T12:39:47.6679870Z // end inline asm 2026-02-21T12:39:47.6680081Z .loc 1 
55 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:55:32 2026-02-21T12:39:47.6680138Z bar.sync 0; 2026-02-21T12:39:47.6680221Z st.shared.v2.b32 [%r10], {%r6187, %r6188}; 2026-02-21T12:39:47.6680298Z st.shared.v2.b32 [%r11], {%r6189, %r6190}; 2026-02-21T12:39:47.6680369Z bar.sync 0; 2026-02-21T12:39:47.6680441Z ld.shared.b16 %rs281, [%r13]; 2026-02-21T12:39:47.6680512Z ld.shared.b16 %rs282, [%r13+256]; 2026-02-21T12:39:47.6680584Z ld.shared.b16 %rs283, [%r13+16]; 2026-02-21T12:39:47.6680652Z ld.shared.b16 %rs284, [%r13+272]; 2026-02-21T12:39:47.6680721Z ld.shared.b16 %rs285, [%r14]; 2026-02-21T12:39:47.6680786Z ld.shared.b16 %rs286, [%r14+256]; 2026-02-21T12:39:47.6680855Z ld.shared.b16 %rs287, [%r14+16]; 2026-02-21T12:39:47.6680922Z ld.shared.b16 %rs288, [%r14+272]; 2026-02-21T12:39:47.6680990Z cvt.f32.bf16 %r6449, %rs281; 2026-02-21T12:39:47.6681055Z cvt.f32.bf16 %r6450, %rs282; 2026-02-21T12:39:47.6681117Z cvt.f32.bf16 %r6451, %rs285; 2026-02-21T12:39:47.6681179Z cvt.f32.bf16 %r6452, %rs286; 2026-02-21T12:39:47.6681244Z cvt.f32.bf16 %r6709, %rs283; 2026-02-21T12:39:47.6681305Z cvt.f32.bf16 %r6710, %rs284; 2026-02-21T12:39:47.6681368Z cvt.f32.bf16 %r6711, %rs287; 2026-02-21T12:39:47.6681429Z cvt.f32.bf16 %r6712, %rs288; 2026-02-21T12:39:47.6681640Z .loc 1 57 87 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:57:87 2026-02-21T12:39:47.6681707Z add.s64 %rd335, %rd599, 10240; 2026-02-21T12:39:47.6681769Z // begin inline asm 2026-02-21T12:39:47.6681833Z mov.u32 %r6191, 0x0; 2026-02-21T12:39:47.6681997Z mov.u32 %r6192, 0x0; 2026-02-21T12:39:47.6682097Z ld.global.v2.b32 { %r6191, %r6192 }, [ %rd335 + 0 ]; 2026-02-21T12:39:47.6682156Z // end inline asm 2026-02-21T12:39:47.6682440Z .loc 1 65 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:65:28 2026-02-21T12:39:47.6682498Z bar.sync 0; 2026-02-21T12:39:47.6682566Z st.shared.b8 [%r15], %r6191; 2026-02-21T12:39:47.6682639Z prmt.b32 %r7033, %r6191, 0, 0x7771U; 
2026-02-21T12:39:47.6682709Z st.shared.b8 [%r16+256], %r7033; 2026-02-21T12:39:47.6682775Z prmt.b32 %r7034, %r6191, 0, 0x7772U; 2026-02-21T12:39:47.6682849Z st.shared.b8 [%r17+512], %r7034; 2026-02-21T12:39:47.6682913Z prmt.b32 %r7035, %r6191, 0, 0x7773U; 2026-02-21T12:39:47.6682979Z st.shared.b8 [%r18+768], %r7035; 2026-02-21T12:39:47.6683108Z st.shared.b8 [%r19+1024], %r6192; 2026-02-21T12:39:47.6683180Z prmt.b32 %r7036, %r6192, 0, 0x7771U; 2026-02-21T12:39:47.6683245Z st.shared.b8 [%r20+1280], %r7036; 2026-02-21T12:39:47.6683310Z prmt.b32 %r7037, %r6192, 0, 0x7772U; 2026-02-21T12:39:47.6683381Z st.shared.b8 [%r21+1536], %r7037; 2026-02-21T12:39:47.6683446Z prmt.b32 %r7038, %r6192, 0, 0x7773U; 2026-02-21T12:39:47.6683575Z st.shared.b8 [%r22+1792], %r7038; 2026-02-21T12:39:47.6683634Z bar.sync 0; 2026-02-21T12:39:47.6683704Z ld.shared.b32 %r7039, [%r44]; 2026-02-21T12:39:47.6683768Z prmt.b32 %r7040, %r7039, 0, 0x7771U; 2026-02-21T12:39:47.6683833Z cvt.u16.u32 %rs289, %r7040; 2026-02-21T12:39:47.6683900Z prmt.b32 %r7041, %r7039, 0, 0x7770U; 2026-02-21T12:39:47.6683962Z cvt.u16.u32 %rs290, %r7041; 2026-02-21T12:39:47.6684026Z prmt.b32 %r7042, %r7039, 0, 0x7773U; 2026-02-21T12:39:47.6684090Z cvt.u16.u32 %rs291, %r7042; 2026-02-21T12:39:47.6684160Z prmt.b32 %r7043, %r7039, 0, 0x7772U; 2026-02-21T12:39:47.6684221Z cvt.u16.u32 %rs292, %r7043; 2026-02-21T12:39:47.6684429Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6684501Z shl.b16 %rs293, %rs290, 4; 2026-02-21T12:39:47.6684566Z shl.b16 %rs294, %rs289, 4; 2026-02-21T12:39:47.6684764Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6684832Z cvt.u32.u16 %r7044, %rs293; 2026-02-21T12:39:47.6684921Z prmt.b32 %r7045, %r7044, %r7046, 0x3340U; 2026-02-21T12:39:47.6684997Z prmt.b32 %r7047, %r7045, %r6989, 0x5410U; 2026-02-21T12:39:47.6685068Z prmt.b32 %r7048, %r7047, %r7039, 0x5040U; 2026-02-21T12:39:47.6685137Z prmt.b32 
%r7049, %r7048, 0, 0x9991U; 2026-02-21T12:39:47.6685198Z cvt.u16.u32 %rs295, %r7049; 2026-02-21T12:39:47.6685259Z shr.s16 %rs296, %rs295, 4; 2026-02-21T12:39:47.6685328Z prmt.b32 %r7050, %r7048, 0, 0xbbb3U; 2026-02-21T12:39:47.6685388Z cvt.u16.u32 %rs297, %r7050; 2026-02-21T12:39:47.6685451Z shr.s16 %rs298, %rs297, 4; 2026-02-21T12:39:47.6685521Z cvt.s16.s8 %rs299, %rs293; 2026-02-21T12:39:47.6685582Z shr.s16 %rs300, %rs299, 4; 2026-02-21T12:39:47.6685644Z cvt.s16.s8 %rs301, %rs294; 2026-02-21T12:39:47.6685705Z shr.s16 %rs302, %rs301, 4; 2026-02-21T12:39:47.6685900Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6685981Z cvt.rn.f32.s16 %r7051, %rs298; 2026-02-21T12:39:47.6686044Z cvt.rn.f32.s16 %r7052, %rs296; 2026-02-21T12:39:47.6686108Z cvt.rn.f32.s16 %r7053, %rs302; 2026-02-21T12:39:47.6686171Z cvt.rn.f32.s16 %r7054, %rs300; 2026-02-21T12:39:47.6686368Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6686433Z shl.b16 %rs303, %rs292, 4; 2026-02-21T12:39:47.6686616Z shl.b16 %rs304, %rs291, 4; 2026-02-21T12:39:47.6686817Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6686890Z prmt.b32 %r7055, %r7039, %r7056, 0x3020U; 2026-02-21T12:39:47.6686961Z prmt.b32 %r7057, %r7055, 0, 0x9991U; 2026-02-21T12:39:47.6687023Z cvt.u16.u32 %rs305, %r7057; 2026-02-21T12:39:47.6687189Z shr.s16 %rs306, %rs305, 4; 2026-02-21T12:39:47.6687256Z cvt.s16.s8 %rs307, %rs303; 2026-02-21T12:39:47.6687316Z shr.s16 %rs308, %rs307, 4; 2026-02-21T12:39:47.6687460Z cvt.s16.s8 %rs309, %rs304; 2026-02-21T12:39:47.6687521Z shr.s16 %rs310, %rs309, 4; 2026-02-21T12:39:47.6687591Z prmt.b32 %r7058, %r7039, 0, 0xbbb3U; 2026-02-21T12:39:47.6687657Z cvt.u16.u32 %rs311, %r7058; 2026-02-21T12:39:47.6687722Z shr.s16 %rs312, %rs311, 4; 2026-02-21T12:39:47.6687919Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 
2026-02-21T12:39:47.6687985Z cvt.rn.f32.s16 %r7059, %rs306; 2026-02-21T12:39:47.6688050Z cvt.rn.f32.s16 %r7060, %rs312; 2026-02-21T12:39:47.6688113Z cvt.rn.f32.s16 %r7061, %rs310; 2026-02-21T12:39:47.6688243Z cvt.rn.f32.s16 %r7062, %rs308; 2026-02-21T12:39:47.6688440Z .loc 1 65 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:65:28 2026-02-21T12:39:47.6688510Z ld.shared.b32 %r7063, [%r44+128]; 2026-02-21T12:39:47.6688582Z prmt.b32 %r7064, %r7063, 0, 0x7771U; 2026-02-21T12:39:47.6688644Z cvt.u16.u32 %rs313, %r7064; 2026-02-21T12:39:47.6688793Z prmt.b32 %r7065, %r7063, 0, 0x7770U; 2026-02-21T12:39:47.6688859Z cvt.u16.u32 %rs314, %r7065; 2026-02-21T12:39:47.6688926Z prmt.b32 %r7066, %r7063, 0, 0x7773U; 2026-02-21T12:39:47.6688993Z cvt.u16.u32 %rs315, %r7066; 2026-02-21T12:39:47.6689057Z prmt.b32 %r7067, %r7063, 0, 0x7772U; 2026-02-21T12:39:47.6689118Z cvt.u16.u32 %rs316, %r7067; 2026-02-21T12:39:47.6689312Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6689379Z shl.b16 %rs317, %rs314, 4; 2026-02-21T12:39:47.6689440Z shl.b16 %rs318, %rs313, 4; 2026-02-21T12:39:47.6689636Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6689702Z cvt.u32.u16 %r7068, %rs317; 2026-02-21T12:39:47.6689773Z prmt.b32 %r7069, %r7068, %r7070, 0x3340U; 2026-02-21T12:39:47.6689846Z prmt.b32 %r7071, %r7069, %r6989, 0x5410U; 2026-02-21T12:39:47.6689919Z prmt.b32 %r7072, %r7071, %r7063, 0x5040U; 2026-02-21T12:39:47.6689999Z prmt.b32 %r7073, %r7072, 0, 0x9991U; 2026-02-21T12:39:47.6690061Z cvt.u16.u32 %rs319, %r7073; 2026-02-21T12:39:47.6690122Z shr.s16 %rs320, %rs319, 4; 2026-02-21T12:39:47.6690191Z prmt.b32 %r7074, %r7072, 0, 0xbbb3U; 2026-02-21T12:39:47.6690252Z cvt.u16.u32 %rs321, %r7074; 2026-02-21T12:39:47.6690313Z shr.s16 %rs322, %rs321, 4; 2026-02-21T12:39:47.6690377Z cvt.s16.s8 %rs323, %rs317; 2026-02-21T12:39:47.6690437Z shr.s16 %rs324, %rs323, 4; 2026-02-21T12:39:47.6690496Z 
cvt.s16.s8 %rs325, %rs318; 2026-02-21T12:39:47.6690555Z shr.s16 %rs326, %rs325, 4; 2026-02-21T12:39:47.6690760Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6690824Z cvt.rn.f32.s16 %r7075, %rs322; 2026-02-21T12:39:47.6690890Z cvt.rn.f32.s16 %r7076, %rs320; 2026-02-21T12:39:47.6690957Z cvt.rn.f32.s16 %r7077, %rs326; 2026-02-21T12:39:47.6691021Z cvt.rn.f32.s16 %r7078, %rs324; 2026-02-21T12:39:47.6691218Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6691284Z shl.b16 %rs327, %rs316, 4; 2026-02-21T12:39:47.6691345Z shl.b16 %rs328, %rs315, 4; 2026-02-21T12:39:47.6691539Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6691611Z prmt.b32 %r7079, %r7063, %r7080, 0x3020U; 2026-02-21T12:39:47.6691679Z prmt.b32 %r7081, %r7079, 0, 0x9991U; 2026-02-21T12:39:47.6691740Z cvt.u16.u32 %rs329, %r7081; 2026-02-21T12:39:47.6691801Z shr.s16 %rs330, %rs329, 4; 2026-02-21T12:39:47.6691866Z cvt.s16.s8 %rs331, %rs327; 2026-02-21T12:39:47.6691927Z shr.s16 %rs332, %rs331, 4; 2026-02-21T12:39:47.6691988Z cvt.s16.s8 %rs333, %rs328; 2026-02-21T12:39:47.6692048Z shr.s16 %rs334, %rs333, 4; 2026-02-21T12:39:47.6692115Z prmt.b32 %r7082, %r7063, 0, 0xbbb3U; 2026-02-21T12:39:47.6692247Z cvt.u16.u32 %rs335, %r7082; 2026-02-21T12:39:47.6692309Z shr.s16 %rs336, %rs335, 4; 2026-02-21T12:39:47.6692560Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6692624Z cvt.rn.f32.s16 %r7083, %rs330; 2026-02-21T12:39:47.6692687Z cvt.rn.f32.s16 %r7084, %rs336; 2026-02-21T12:39:47.6692750Z cvt.rn.f32.s16 %r7085, %rs334; 2026-02-21T12:39:47.6692812Z cvt.rn.f32.s16 %r7086, %rs332; 2026-02-21T12:39:47.6692867Z bar.sync 0; 2026-02-21T12:39:47.6692979Z st.shared.v4.b32 [%r24], {%r7054, %r7052, %r7053, %r7051}; 2026-02-21T12:39:47.6693088Z st.shared.v4.b32 [%r25], {%r7062, %r7059, %r7061, %r7060}; 
2026-02-21T12:39:47.6693236Z st.shared.v4.b32 [%r26], {%r7078, %r7076, %r7077, %r7075}; 2026-02-21T12:39:47.6693338Z st.shared.v4.b32 [%r27], {%r7086, %r7083, %r7085, %r7084}; 2026-02-21T12:39:47.6693398Z $L__tmp11: 2026-02-21T12:39:47.6693672Z .loc 2 291 36 // standard.py:291:36 @[ conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:87:40 ] 2026-02-21T12:39:47.6693747Z // begin inline asm 2026-02-21T12:39:47.6693879Z fence.proxy.async.shared::cta; 2026-02-21T12:39:47.6693938Z // end inline asm 2026-02-21T12:39:47.6693996Z bar.sync 0; 2026-02-21T12:39:47.6694067Z wgmma.fence.sync.aligned; 2026-02-21T12:39:47.6694130Z // begin inline asm 2026-02-21T12:39:47.6696976Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r11503,%r11504,%r11505,%r11506,%r11507,%r11508,%r11509,%r11510,%r11511,%r11512,%r11513,%r11514,%r11515,%r11516,%r11517,%r11518,%r11519,%r11520,%r11521,%r11522,%r11523,%r11524,%r11525,%r11526,%r11527,%r11528,%r11529,%r11530,%r11531,%r11532,%r11533,%r11534,%r11535,%r11536,%r11537,%r11538,%r11539,%r11540,%r11541,%r11542,%r11543,%r11544,%r11545,%r11546,%r11547,%r11548,%r11549,%r11550,%r11551,%r11552,%r11553,%r11554,%r11555,%r11556,%r11557,%r11558,%r11559,%r11560,%r11561,%r11562,%r11563,%r11564,%r11565,%r11566,%r11567,%r11568,%r11569,%r11570,%r11571,%r11572,%r11573,%r11574,%r11575,%r11576,%r11577,%r11578,%r11579,%r11580,%r11581,%r11582,%r11583,%r11584,%r11585,%r11586,%r11587,%r11588,%r11589,%r11590,%r11591,%r11592,%r11593,%r11594,%r11595,%r11596,%r11597,%r11598,%r11599,%r11600,%r11601,%r11602,%r11603,%r11604,%r11605,%r11606,%r11607,%r11608,%r11609,%r11610,%r11611,%r11612,%r11613,%r11614,%r11615,%r11616,%r11617,%r11618,%r11619,%r11620,%r11621,%r11622,%r11623,%r11624,%r11625,%r11626,%r11627,%r11628,%r11629,%r11630}, {%r6449,%r6450,%r6451,%r6452}, %rd404, %p15, 1, 1; 2026-02-21T12:39:47.6697053Z // end inline asm 2026-02-21T12:39:47.6697113Z // begin inline asm 2026-02-21T12:39:47.6699856Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r11503,%r11504,%r11505,%r11506,%r11507,%r11508,%r11509,%r11510,%r11511,%r11512,%r11513,%r11514,%r11515,%r11516,%r11517,%r11518,%r11519,%r11520,%r11521,%r11522,%r11523,%r11524,%r11525,%r11526,%r11527,%r11528,%r11529,%r11530,%r11531,%r11532,%r11533,%r11534,%r11535,%r11536,%r11537,%r11538,%r11539,%r11540,%r11541,%r11542,%r11543,%r11544,%r11545,%r11546,%r11547,%r11548,%r11549,%r11550,%r11551,%r11552,%r11553,%r11554,%r11555,%r11556,%r11557,%r11558,%r11559,%r11560,%r11561,%r11562,%r11563,%r11564,%r11565,%r11566,%r11567,%r11568,%r11569,%r11570,%r11571,%r11572,%r11573,%r11574,%r11575,%r11576,%r11577,%r11578,%r11579,%r11580,%r11581,%r11582,%r11583,%r11584,%r11585,%r11586,%r11587,%r11588,%r11589,%r11590,%r11591,%r11592,%r11593,%r11594,%r11595,%r11596,%r11597,%r11598,%r11599,%r11600,%r11601,%r11602,%r11603,%r11604,%r11605,%r11606,%r11607,%r11608,%r11609,%r11610,%r11611,%r11612,%r11613,%r11614,%r11615,%r11616,%r11617,%r11618,%r11619,%r11620,%r11621,%r11622,%r11623,%r11624,%r11625,%r11626,%r11627,%r11628,%r11629,%r11630}, {%r6709,%r6710,%r6711,%r6712}, %rd405, %p15, 1, 1; 2026-02-21T12:39:47.6699921Z // end inline asm 2026-02-21T12:39:47.6700000Z wgmma.commit_group.sync.aligned; 2026-02-21T12:39:47.6700065Z mov.b32 %r6841, %r10096; 2026-02-21T12:39:47.6700128Z mov.b32 %r6843, %r6842; 2026-02-21T12:39:47.6700284Z // begin inline asm 2026-02-21T12:39:47.6702870Z // wait for regs: 
%r11503,%r11504,%r11505,%r11506,%r11507,%r11508,%r11509,%r11510,%r11511,%r11512,%r11513,%r11514,%r11515,%r11516,%r11517,%r11518,%r11519,%r11520,%r11521,%r11522,%r11523,%r11524,%r11525,%r11526,%r11527,%r11528,%r11529,%r11530,%r11531,%r11532,%r11533,%r11534,%r11535,%r11536,%r11537,%r11538,%r11539,%r11540,%r11541,%r11542,%r11543,%r11544,%r11545,%r11546,%r11547,%r11548,%r11549,%r11550,%r11551,%r11552,%r11553,%r11554,%r11555,%r11556,%r11557,%r11558,%r11559,%r11560,%r11561,%r11562,%r11563,%r11564,%r11565,%r11566,%r11567,%r11568,%r11569,%r11570,%r11571,%r11572,%r11573,%r11574,%r11575,%r11576,%r11577,%r11578,%r11579,%r11580,%r11581,%r11582,%r11583,%r11584,%r11585,%r11586,%r11587,%r11588,%r11589,%r11590,%r11591,%r11592,%r11593,%r11594,%r11595,%r11596,%r11597,%r11598,%r11599,%r11600,%r11601,%r11602,%r11603,%r11604,%r11605,%r11606,%r11607,%r11608,%r11609,%r11610,%r11611,%r11612,%r11613,%r11614,%r11615,%r11616,%r11617,%r11618,%r11619,%r11620,%r11621,%r11622,%r11623,%r11624,%r11625,%r11626,%r11627,%r11628,%r11629,%r11630,%r6841,%r6842,%r6843 2026-02-21T12:39:47.6703008Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:39:47.6703140Z // end inline asm 2026-02-21T12:39:47.6703202Z $L__tmp12: 2026-02-21T12:39:47.6703421Z .loc 1 43 126 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:43:126 2026-02-21T12:39:47.6703491Z add.s64 %rd601, %rd601, 16; 2026-02-21T12:39:47.6703553Z add.s64 %rd600, %rd600, 64; 2026-02-21T12:39:47.6703619Z add.s64 %rd599, %rd599, 20480; 2026-02-21T12:39:47.6703690Z setp.lt.u64 %p19, %rd601, 4080; 2026-02-21T12:39:47.6703752Z @%p19 bra $L__BB0_16; 2026-02-21T12:39:47.6703865Z // %bb.17: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:39:47.6704072Z .loc 1 34 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:34:32 2026-02-21T12:39:47.6704139Z or.b64 %rd354, %rd68, %rd7; 2026-02-21T12:39:47.6704203Z or.b64 %rd355, %rd68, %rd8; 2026-02-21T12:39:47.6704266Z or.b64 %rd356, %rd68, %rd9; 2026-02-21T12:39:47.6704330Z or.b64 %rd357, %rd68, %rd10; 
2026-02-21T12:39:47.6704392Z or.b64 %rd358, %rd68, %rd11; 2026-02-21T12:39:47.6704467Z or.b64 %rd359, %rd68, %rd12; 2026-02-21T12:39:47.6704529Z or.b64 %rd360, %rd68, %rd13; 2026-02-21T12:39:47.6704598Z or.b64 %rd361, %rd68, %rd14; 2026-02-21T12:39:47.6704658Z or.b64 %rd362, %rd68, %rd15; 2026-02-21T12:39:47.6704717Z or.b64 %rd363, %rd68, %rd16; 2026-02-21T12:39:47.6704780Z or.b64 %rd364, %rd68, %rd17; 2026-02-21T12:39:47.6704839Z or.b64 %rd365, %rd68, %rd18; 2026-02-21T12:39:47.6704898Z or.b64 %rd366, %rd68, %rd19; 2026-02-21T12:39:47.6704961Z or.b64 %rd367, %rd68, %rd20; 2026-02-21T12:39:47.6705021Z or.b64 %rd368, %rd68, %rd21; 2026-02-21T12:39:47.6705082Z or.b64 %rd369, %rd68, %rd22; 2026-02-21T12:39:47.6705282Z .loc 1 90 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:90:28 2026-02-21T12:39:47.6705372Z cvt.rn.bf16x2.f32 %r7231, %r11504, %r11503; 2026-02-21T12:39:47.6705455Z cvt.rn.bf16x2.f32 %r7232, %r11506, %r11505; 2026-02-21T12:39:47.6705532Z cvt.rn.bf16x2.f32 %r7233, %r11508, %r11507; 2026-02-21T12:39:47.6705615Z cvt.rn.bf16x2.f32 %r7234, %r11510, %r11509; 2026-02-21T12:39:47.6705692Z cvt.rn.bf16x2.f32 %r7235, %r11512, %r11511; 2026-02-21T12:39:47.6705767Z cvt.rn.bf16x2.f32 %r7236, %r11514, %r11513; 2026-02-21T12:39:47.6705843Z cvt.rn.bf16x2.f32 %r7237, %r11516, %r11515; 2026-02-21T12:39:47.6705922Z cvt.rn.bf16x2.f32 %r7238, %r11518, %r11517; 2026-02-21T12:39:47.6705996Z cvt.rn.bf16x2.f32 %r7239, %r11520, %r11519; 2026-02-21T12:39:47.6706069Z cvt.rn.bf16x2.f32 %r7240, %r11522, %r11521; 2026-02-21T12:39:47.6706147Z cvt.rn.bf16x2.f32 %r7241, %r11524, %r11523; 2026-02-21T12:39:47.6706222Z cvt.rn.bf16x2.f32 %r7242, %r11526, %r11525; 2026-02-21T12:39:47.6706296Z cvt.rn.bf16x2.f32 %r7243, %r11528, %r11527; 2026-02-21T12:39:47.6706373Z cvt.rn.bf16x2.f32 %r7244, %r11530, %r11529; 2026-02-21T12:39:47.6706641Z cvt.rn.bf16x2.f32 %r7245, %r11532, %r11531; 2026-02-21T12:39:47.6706719Z cvt.rn.bf16x2.f32 %r7246, %r11534, %r11533; 2026-02-21T12:39:47.6706795Z 
cvt.rn.bf16x2.f32 %r7247, %r11536, %r11535; 2026-02-21T12:39:47.6706949Z cvt.rn.bf16x2.f32 %r7248, %r11538, %r11537; 2026-02-21T12:39:47.6707022Z cvt.rn.bf16x2.f32 %r7249, %r11540, %r11539; 2026-02-21T12:39:47.6707097Z cvt.rn.bf16x2.f32 %r7250, %r11542, %r11541; 2026-02-21T12:39:47.6707174Z cvt.rn.bf16x2.f32 %r7251, %r11544, %r11543; 2026-02-21T12:39:47.6707248Z cvt.rn.bf16x2.f32 %r7252, %r11546, %r11545; 2026-02-21T12:39:47.6707321Z cvt.rn.bf16x2.f32 %r7253, %r11548, %r11547; 2026-02-21T12:39:47.6707398Z cvt.rn.bf16x2.f32 %r7254, %r11550, %r11549; 2026-02-21T12:39:47.6707472Z cvt.rn.bf16x2.f32 %r7255, %r11552, %r11551; 2026-02-21T12:39:47.6707625Z cvt.rn.bf16x2.f32 %r7256, %r11554, %r11553; 2026-02-21T12:39:47.6707703Z cvt.rn.bf16x2.f32 %r7257, %r11556, %r11555; 2026-02-21T12:39:47.6707779Z cvt.rn.bf16x2.f32 %r7258, %r11558, %r11557; 2026-02-21T12:39:47.6707857Z cvt.rn.bf16x2.f32 %r7259, %r11560, %r11559; 2026-02-21T12:39:47.6707943Z cvt.rn.bf16x2.f32 %r7260, %r11562, %r11561; 2026-02-21T12:39:47.6708084Z cvt.rn.bf16x2.f32 %r7261, %r11564, %r11563; 2026-02-21T12:39:47.6708164Z cvt.rn.bf16x2.f32 %r7262, %r11566, %r11565; 2026-02-21T12:39:47.6708238Z cvt.rn.bf16x2.f32 %r7263, %r11568, %r11567; 2026-02-21T12:39:47.6708314Z cvt.rn.bf16x2.f32 %r7264, %r11570, %r11569; 2026-02-21T12:39:47.6708445Z cvt.rn.bf16x2.f32 %r7265, %r11572, %r11571; 2026-02-21T12:39:47.6708524Z cvt.rn.bf16x2.f32 %r7266, %r11574, %r11573; 2026-02-21T12:39:47.6708601Z cvt.rn.bf16x2.f32 %r7267, %r11576, %r11575; 2026-02-21T12:39:47.6708678Z cvt.rn.bf16x2.f32 %r7268, %r11578, %r11577; 2026-02-21T12:39:47.6708755Z cvt.rn.bf16x2.f32 %r7269, %r11580, %r11579; 2026-02-21T12:39:47.6708831Z cvt.rn.bf16x2.f32 %r7270, %r11582, %r11581; 2026-02-21T12:39:47.6708908Z cvt.rn.bf16x2.f32 %r7271, %r11584, %r11583; 2026-02-21T12:39:47.6708981Z cvt.rn.bf16x2.f32 %r7272, %r11586, %r11585; 2026-02-21T12:39:47.6709059Z cvt.rn.bf16x2.f32 %r7273, %r11588, %r11587; 2026-02-21T12:39:47.6709137Z cvt.rn.bf16x2.f32 %r7274, 
%r11590, %r11589; 2026-02-21T12:39:47.6709213Z cvt.rn.bf16x2.f32 %r7275, %r11592, %r11591; 2026-02-21T12:39:47.6709287Z cvt.rn.bf16x2.f32 %r7276, %r11594, %r11593; 2026-02-21T12:39:47.6709363Z cvt.rn.bf16x2.f32 %r7277, %r11596, %r11595; 2026-02-21T12:39:47.6709444Z cvt.rn.bf16x2.f32 %r7278, %r11598, %r11597; 2026-02-21T12:39:47.6709520Z cvt.rn.bf16x2.f32 %r7279, %r11600, %r11599; 2026-02-21T12:39:47.6709593Z cvt.rn.bf16x2.f32 %r7280, %r11602, %r11601; 2026-02-21T12:39:47.6709671Z cvt.rn.bf16x2.f32 %r7281, %r11604, %r11603; 2026-02-21T12:39:47.6709747Z cvt.rn.bf16x2.f32 %r7282, %r11606, %r11605; 2026-02-21T12:39:47.6709825Z cvt.rn.bf16x2.f32 %r7283, %r11608, %r11607; 2026-02-21T12:39:47.6709905Z cvt.rn.bf16x2.f32 %r7284, %r11610, %r11609; 2026-02-21T12:39:47.6709981Z cvt.rn.bf16x2.f32 %r7285, %r11612, %r11611; 2026-02-21T12:39:47.6710058Z cvt.rn.bf16x2.f32 %r7286, %r11614, %r11613; 2026-02-21T12:39:47.6710131Z cvt.rn.bf16x2.f32 %r7287, %r11616, %r11615; 2026-02-21T12:39:47.6710213Z cvt.rn.bf16x2.f32 %r7288, %r11618, %r11617; 2026-02-21T12:39:47.6710289Z cvt.rn.bf16x2.f32 %r7289, %r11620, %r11619; 2026-02-21T12:39:47.6710362Z cvt.rn.bf16x2.f32 %r7290, %r11622, %r11621; 2026-02-21T12:39:47.6710442Z cvt.rn.bf16x2.f32 %r7291, %r11624, %r11623; 2026-02-21T12:39:47.6710516Z cvt.rn.bf16x2.f32 %r7292, %r11626, %r11625; 2026-02-21T12:39:47.6710591Z cvt.rn.bf16x2.f32 %r7293, %r11628, %r11627; 2026-02-21T12:39:47.6710668Z cvt.rn.bf16x2.f32 %r7294, %r11630, %r11629; 2026-02-21T12:39:47.6710876Z .loc 1 91 22 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:91:22 2026-02-21T12:39:47.6710952Z mad.lo.s64 %rd370, %rd354, 2560, %rd151; 2026-02-21T12:39:47.6711018Z shl.b64 %rd371, %rd69, 1; 2026-02-21T12:39:47.6711090Z add.s64 %rd338, %rd370, %rd371; 2026-02-21T12:39:47.6711163Z mad.lo.s64 %rd372, %rd355, 2560, %rd151; 2026-02-21T12:39:47.6711318Z add.s64 %rd339, %rd372, %rd371; 2026-02-21T12:39:47.6711394Z mad.lo.s64 %rd373, %rd356, 2560, %rd151; 
2026-02-21T12:39:47.6711506Z add.s64 %rd340, %rd373, %rd371; 2026-02-21T12:39:47.6711577Z mad.lo.s64 %rd374, %rd357, 2560, %rd151; 2026-02-21T12:39:47.6711640Z add.s64 %rd341, %rd374, %rd371; 2026-02-21T12:39:47.6711715Z mad.lo.s64 %rd375, %rd358, 2560, %rd151; 2026-02-21T12:39:47.6711777Z add.s64 %rd342, %rd375, %rd371; 2026-02-21T12:39:47.6711846Z mad.lo.s64 %rd376, %rd359, 2560, %rd151; 2026-02-21T12:39:47.6711916Z add.s64 %rd343, %rd376, %rd371; 2026-02-21T12:39:47.6711987Z mad.lo.s64 %rd377, %rd360, 2560, %rd151; 2026-02-21T12:39:47.6712050Z add.s64 %rd344, %rd377, %rd371; 2026-02-21T12:39:47.6712187Z mad.lo.s64 %rd378, %rd361, 2560, %rd151; 2026-02-21T12:39:47.6712255Z add.s64 %rd345, %rd378, %rd371; 2026-02-21T12:39:47.6712326Z mad.lo.s64 %rd379, %rd362, 2560, %rd151; 2026-02-21T12:39:47.6712389Z add.s64 %rd346, %rd379, %rd371; 2026-02-21T12:39:47.6712463Z mad.lo.s64 %rd380, %rd363, 2560, %rd151; 2026-02-21T12:39:47.6712526Z add.s64 %rd347, %rd380, %rd371; 2026-02-21T12:39:47.6712645Z mad.lo.s64 %rd381, %rd364, 2560, %rd151; 2026-02-21T12:39:47.6712713Z add.s64 %rd348, %rd381, %rd371; 2026-02-21T12:39:47.6712783Z mad.lo.s64 %rd382, %rd365, 2560, %rd151; 2026-02-21T12:39:47.6712845Z add.s64 %rd349, %rd382, %rd371; 2026-02-21T12:39:47.6712913Z mad.lo.s64 %rd383, %rd366, 2560, %rd151; 2026-02-21T12:39:47.6712978Z add.s64 %rd350, %rd383, %rd371; 2026-02-21T12:39:47.6713047Z mad.lo.s64 %rd384, %rd367, 2560, %rd151; 2026-02-21T12:39:47.6713110Z add.s64 %rd351, %rd384, %rd371; 2026-02-21T12:39:47.6713182Z mad.lo.s64 %rd385, %rd368, 2560, %rd151; 2026-02-21T12:39:47.6713246Z add.s64 %rd352, %rd385, %rd371; 2026-02-21T12:39:47.6713315Z mad.lo.s64 %rd386, %rd369, 2560, %rd151; 2026-02-21T12:39:47.6713381Z add.s64 %rd353, %rd386, %rd371; 2026-02-21T12:39:47.6713588Z .loc 1 91 81 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:91:81 2026-02-21T12:39:47.6713647Z bar.sync 0; 2026-02-21T12:39:47.6713761Z st.shared.v4.b32 [%r28], {%r7231, %r7233, %r7235, %r7237}; 
2026-02-21T12:39:47.6713879Z st.shared.v4.b32 [%r29], {%r7239, %r7241, %r7243, %r7245}; 2026-02-21T12:39:47.6713982Z st.shared.v4.b32 [%r30], {%r7247, %r7249, %r7251, %r7253}; 2026-02-21T12:39:47.6714083Z st.shared.v4.b32 [%r31], {%r7255, %r7257, %r7259, %r7261}; 2026-02-21T12:39:47.6714186Z st.shared.v4.b32 [%r32], {%r7263, %r7265, %r7267, %r7269}; 2026-02-21T12:39:47.6714286Z st.shared.v4.b32 [%r33], {%r7271, %r7273, %r7275, %r7277}; 2026-02-21T12:39:47.6714386Z st.shared.v4.b32 [%r34], {%r7279, %r7281, %r7283, %r7285}; 2026-02-21T12:39:47.6714491Z st.shared.v4.b32 [%r35], {%r7287, %r7289, %r7291, %r7293}; 2026-02-21T12:39:47.6714551Z bar.sync 0; 2026-02-21T12:39:47.6714612Z // begin inline asm 2026-02-21T12:39:47.6714807Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7087, %r7088, %r7089, %r7090}, [%r3291]; 2026-02-21T12:39:47.6714871Z // end inline asm 2026-02-21T12:39:47.6714930Z // begin inline asm 2026-02-21T12:39:47.6715115Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7092, %r7093, %r7094, %r7095}, [%r3296]; 2026-02-21T12:39:47.6715191Z // end inline asm 2026-02-21T12:39:47.6715253Z // begin inline asm 2026-02-21T12:39:47.6715432Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7097, %r7098, %r7099, %r7100}, [%r3301]; 2026-02-21T12:39:47.6715491Z // end inline asm 2026-02-21T12:39:47.6715549Z // begin inline asm 2026-02-21T12:39:47.6715726Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7102, %r7103, %r7104, %r7105}, [%r3306]; 2026-02-21T12:39:47.6715782Z // end inline asm 2026-02-21T12:39:47.6715845Z // begin inline asm 2026-02-21T12:39:47.6716025Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7107, %r7108, %r7109, %r7110}, [%r3311]; 2026-02-21T12:39:47.6716083Z // end inline asm 2026-02-21T12:39:47.6716145Z // begin inline asm 2026-02-21T12:39:47.6716321Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7112, %r7113, %r7114, %r7115}, [%r3316]; 2026-02-21T12:39:47.6716436Z // end inline asm 2026-02-21T12:39:47.6716622Z // begin inline asm 2026-02-21T12:39:47.6716891Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7117, %r7118, %r7119, %r7120}, [%r3321]; 2026-02-21T12:39:47.6716947Z // end inline asm 2026-02-21T12:39:47.6717004Z // begin inline asm 2026-02-21T12:39:47.6717183Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7122, %r7123, %r7124, %r7125}, [%r3326]; 2026-02-21T12:39:47.6717240Z // end inline asm 2026-02-21T12:39:47.6717296Z bar.sync 0; 2026-02-21T12:39:47.6717404Z st.shared.v4.b32 [%r28], {%r7232, %r7234, %r7236, %r7238}; 2026-02-21T12:39:47.6717507Z st.shared.v4.b32 [%r29], {%r7240, %r7242, %r7244, %r7246}; 2026-02-21T12:39:47.6717688Z st.shared.v4.b32 [%r30], {%r7248, %r7250, %r7252, %r7254}; 2026-02-21T12:39:47.6717796Z st.shared.v4.b32 [%r31], {%r7256, %r7258, %r7260, %r7262}; 2026-02-21T12:39:47.6717900Z st.shared.v4.b32 [%r32], {%r7264, %r7266, %r7268, %r7270}; 2026-02-21T12:39:47.6718004Z st.shared.v4.b32 [%r33], {%r7272, %r7274, %r7276, %r7278}; 2026-02-21T12:39:47.6718104Z st.shared.v4.b32 [%r34], {%r7280, %r7282, %r7284, %r7286}; 2026-02-21T12:39:47.6718276Z st.shared.v4.b32 [%r35], {%r7288, %r7290, %r7292, %r7294}; 2026-02-21T12:39:47.6718335Z bar.sync 0; 2026-02-21T12:39:47.6718394Z // begin inline asm 2026-02-21T12:39:47.6718577Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7127, %r7128, %r7129, %r7130}, [%r3291]; 2026-02-21T12:39:47.6718633Z // end inline asm 2026-02-21T12:39:47.6718691Z // begin inline asm 2026-02-21T12:39:47.6718868Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7132, %r7133, %r7134, %r7135}, [%r3296]; 2026-02-21T12:39:47.6718939Z // end inline asm 2026-02-21T12:39:47.6719001Z // begin inline asm 2026-02-21T12:39:47.6719181Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7137, %r7138, %r7139, %r7140}, [%r3301]; 2026-02-21T12:39:47.6719239Z // end inline asm 2026-02-21T12:39:47.6719298Z // begin inline asm 2026-02-21T12:39:47.6719477Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7142, %r7143, %r7144, %r7145}, [%r3306]; 2026-02-21T12:39:47.6719540Z // end inline asm 2026-02-21T12:39:47.6719599Z // 
begin inline asm 2026-02-21T12:39:47.6719776Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7147, %r7148, %r7149, %r7150}, [%r3311]; 2026-02-21T12:39:47.6719832Z // end inline asm 2026-02-21T12:39:47.6719892Z // begin inline asm 2026-02-21T12:39:47.6720067Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7152, %r7153, %r7154, %r7155}, [%r3316]; 2026-02-21T12:39:47.6720123Z // end inline asm 2026-02-21T12:39:47.6720184Z // begin inline asm 2026-02-21T12:39:47.6720361Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7157, %r7158, %r7159, %r7160}, [%r3321]; 2026-02-21T12:39:47.6720417Z // end inline asm 2026-02-21T12:39:47.6720477Z // begin inline asm 2026-02-21T12:39:47.6720657Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7162, %r7163, %r7164, %r7165}, [%r3326]; 2026-02-21T12:39:47.6720725Z // end inline asm 2026-02-21T12:39:47.6720787Z // begin inline asm 2026-02-21T12:39:47.6720922Z st.global.v4.b32 [ %rd338 + 0 ], { %r7087, %r7088, %r7089, %r7090 }; 2026-02-21T12:39:47.6720977Z // end inline asm 2026-02-21T12:39:47.6721038Z // begin inline asm 2026-02-21T12:39:47.6721160Z st.global.v4.b32 [ %rd339 + 0 ], { %r7127, %r7128, %r7129, %r7130 }; 2026-02-21T12:39:47.6721216Z // end inline asm 2026-02-21T12:39:47.6721274Z // begin inline asm 2026-02-21T12:39:47.6721388Z st.global.v4.b32 [ %rd340 + 0 ], { %r7092, %r7093, %r7094, %r7095 }; 2026-02-21T12:39:47.6721446Z // end inline asm 2026-02-21T12:39:47.6721502Z // begin inline asm 2026-02-21T12:39:47.6721615Z st.global.v4.b32 [ %rd341 + 0 ], { %r7132, %r7133, %r7134, %r7135 }; 2026-02-21T12:39:47.6721676Z // end inline asm 2026-02-21T12:39:47.6721733Z // begin inline asm 2026-02-21T12:39:47.6721847Z st.global.v4.b32 [ %rd342 + 0 ], { %r7097, %r7098, %r7099, %r7100 }; 2026-02-21T12:39:47.6721903Z // end inline asm 2026-02-21T12:39:47.6721965Z // begin inline asm 2026-02-21T12:39:47.6722178Z st.global.v4.b32 [ %rd343 + 0 ], { %r7137, %r7138, %r7139, %r7140 }; 2026-02-21T12:39:47.6722237Z // end inline asm 
2026-02-21T12:39:47.6722298Z // begin inline asm 2026-02-21T12:39:47.6722463Z st.global.v4.b32 [ %rd344 + 0 ], { %r7102, %r7103, %r7104, %r7105 }; 2026-02-21T12:39:47.6722517Z // end inline asm 2026-02-21T12:39:47.6722577Z // begin inline asm 2026-02-21T12:39:47.6722690Z st.global.v4.b32 [ %rd345 + 0 ], { %r7142, %r7143, %r7144, %r7145 }; 2026-02-21T12:39:47.6722745Z // end inline asm 2026-02-21T12:39:47.6722802Z // begin inline asm 2026-02-21T12:39:47.6722919Z st.global.v4.b32 [ %rd346 + 0 ], { %r7107, %r7108, %r7109, %r7110 }; 2026-02-21T12:39:47.6722974Z // end inline asm 2026-02-21T12:39:47.6723042Z // begin inline asm 2026-02-21T12:39:47.6723211Z st.global.v4.b32 [ %rd347 + 0 ], { %r7147, %r7148, %r7149, %r7150 }; 2026-02-21T12:39:47.6723269Z // end inline asm 2026-02-21T12:39:47.6723326Z // begin inline asm 2026-02-21T12:39:47.6723441Z st.global.v4.b32 [ %rd348 + 0 ], { %r7112, %r7113, %r7114, %r7115 }; 2026-02-21T12:39:47.6723501Z // end inline asm 2026-02-21T12:39:47.6723559Z // begin inline asm 2026-02-21T12:39:47.6723719Z st.global.v4.b32 [ %rd349 + 0 ], { %r7152, %r7153, %r7154, %r7155 }; 2026-02-21T12:39:47.6723779Z // end inline asm 2026-02-21T12:39:47.6723836Z // begin inline asm 2026-02-21T12:39:47.6723949Z st.global.v4.b32 [ %rd350 + 0 ], { %r7117, %r7118, %r7119, %r7120 }; 2026-02-21T12:39:47.6724009Z // end inline asm 2026-02-21T12:39:47.6724065Z // begin inline asm 2026-02-21T12:39:47.6724177Z st.global.v4.b32 [ %rd351 + 0 ], { %r7157, %r7158, %r7159, %r7160 }; 2026-02-21T12:39:47.6724233Z // end inline asm 2026-02-21T12:39:47.6724295Z // begin inline asm 2026-02-21T12:39:47.6724417Z st.global.v4.b32 [ %rd352 + 0 ], { %r7122, %r7123, %r7124, %r7125 }; 2026-02-21T12:39:47.6724472Z // end inline asm 2026-02-21T12:39:47.6724534Z // begin inline asm 2026-02-21T12:39:47.6724648Z st.global.v4.b32 [ %rd353 + 0 ], { %r7162, %r7163, %r7164, %r7165 }; 2026-02-21T12:39:47.6724706Z // end inline asm 2026-02-21T12:39:47.6724920Z .loc 1 22 120 // 
conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:22:120 2026-02-21T12:39:47.6724994Z add.s64 %rd387, %rd589, 3; 2026-02-21T12:39:47.6725194Z .loc 1 28 35 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:28:35 2026-02-21T12:39:47.6725286Z mul.hi.u64 %rd388, %rd387, -3689348814741910323; 2026-02-21T12:39:47.6725351Z shr.u64 %rd389, %rd388, 5; 2026-02-21T12:39:47.6725547Z .loc 1 29 33 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:29:33 2026-02-21T12:39:47.6725610Z shl.b64 %rd78, %rd389, 3; 2026-02-21T12:39:47.6725811Z .loc 1 30 39 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:30:39 2026-02-21T12:39:47.6725885Z sub.s64 %rd390, 2048, %rd78; 2026-02-21T12:39:47.6726080Z .loc 1 30 52 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:30:52 2026-02-21T12:39:47.6726144Z min.s64 %rd79, %rd390, 8; 2026-02-21T12:39:47.6726338Z .loc 1 31 45 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:31:45 2026-02-21T12:39:47.6726407Z mul.lo.s64 %rd391, %rd389, 40; 2026-02-21T12:39:47.6726600Z sub.s64 %rd80, %rd387, %rd391; 2026-02-21T12:39:47.6726822Z .loc 1 32 51 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:32:51 2026-02-21T12:39:47.6726889Z or.b64 %rd392, %rd80, %rd79; 2026-02-21T12:39:47.6726959Z and.b64 %rd393, %rd392, -4294967296; 2026-02-21T12:39:47.6727030Z setp.ne.b64 %p20, %rd393, 0; 2026-02-21T12:39:47.6727091Z @%p20 bra $L__BB0_19; 2026-02-21T12:39:47.6727150Z bra.uni $L__BB0_18; 2026-02-21T12:39:47.6727273Z $L__BB0_19: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:39:47.6727340Z div.s64 %rd602, %rd80, %rd79; 2026-02-21T12:39:47.6727399Z bra.uni $L__BB0_20; 2026-02-21T12:39:47.6727507Z $L__BB0_18: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:39:47.6727655Z cvt.u32.u64 %r7295, %rd79; 2026-02-21T12:39:47.6727716Z cvt.u32.u64 %r7296, %rd80; 2026-02-21T12:39:47.6727861Z div.u32 %r7297, %r7296, %r7295; 2026-02-21T12:39:47.6727926Z cvt.u64.u32 %rd602, %r7297; 2026-02-21T12:39:47.6728031Z $L__BB0_20: // in 
Loop: Header=BB0_2 Depth=1 2026-02-21T12:39:47.6728235Z .loc 1 31 64 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:31:64 2026-02-21T12:39:47.6728305Z mul.lo.s64 %rd395, %rd602, %rd79; 2026-02-21T12:39:47.6728371Z sub.s64 %rd396, %rd80, %rd395; 2026-02-21T12:39:47.6728568Z .loc 1 31 30 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:31:30 2026-02-21T12:39:47.6728703Z add.s64 %rd397, %rd396, %rd78; 2026-02-21T12:39:47.6728920Z .loc 1 33 27 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:33:27 2026-02-21T12:39:47.6728984Z shl.b64 %rd84, %rd397, 7; 2026-02-21T12:39:47.6729181Z .loc 1 35 27 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:35:27 2026-02-21T12:39:47.6729244Z shl.b64 %rd398, %rd602, 8; 2026-02-21T12:39:47.6729502Z .loc 1 36 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:36:32 2026-02-21T12:39:47.6729567Z or.b64 %rd85, %rd398, %rd23; 2026-02-21T12:39:47.6729774Z .loc 1 43 126 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:43:126 2026-02-21T12:39:47.6729836Z shl.b64 %rd399, %rd397, 21; 2026-02-21T12:39:47.6729898Z add.s64 %rd604, %rd26, %rd399; 2026-02-21T12:39:47.6729960Z add.s64 %rd603, %rd27, %rd398; 2026-02-21T12:39:47.6730026Z mov.b32 %r11631, 0f00000000; 2026-02-21T12:39:47.6730088Z mov.b64 %rd605, -16; 2026-02-21T12:39:47.6730151Z mov.b32 %r11632, %r11631; 2026-02-21T12:39:47.6730213Z mov.b32 %r11633, %r11631; 2026-02-21T12:39:47.6730271Z mov.b32 %r11634, %r11631; 2026-02-21T12:39:47.6730330Z mov.b32 %r11635, %r11631; 2026-02-21T12:39:47.6730390Z mov.b32 %r11636, %r11631; 2026-02-21T12:39:47.6730452Z mov.b32 %r11637, %r11631; 2026-02-21T12:39:47.6730510Z mov.b32 %r11638, %r11631; 2026-02-21T12:39:47.6730571Z mov.b32 %r11639, %r11631; 2026-02-21T12:39:47.6730633Z mov.b32 %r11640, %r11631; 2026-02-21T12:39:47.6730691Z mov.b32 %r11641, %r11631; 2026-02-21T12:39:47.6730750Z mov.b32 %r11642, %r11631; 2026-02-21T12:39:47.6730811Z mov.b32 %r11643, %r11631; 2026-02-21T12:39:47.6730868Z 
mov.b32 %r11644, %r11631; 2026-02-21T12:39:47.6730930Z mov.b32 %r11645, %r11631; 2026-02-21T12:39:47.6730987Z mov.b32 %r11646, %r11631; 2026-02-21T12:39:47.6731047Z mov.b32 %r11647, %r11631; 2026-02-21T12:39:47.6731104Z mov.b32 %r11648, %r11631; 2026-02-21T12:39:47.6731161Z mov.b32 %r11649, %r11631; 2026-02-21T12:39:47.6731225Z mov.b32 %r11650, %r11631; 2026-02-21T12:39:47.6731285Z mov.b32 %r11651, %r11631; 2026-02-21T12:39:47.6731348Z mov.b32 %r11652, %r11631; 2026-02-21T12:39:47.6731407Z mov.b32 %r11653, %r11631; 2026-02-21T12:39:47.6731487Z mov.b32 %r11654, %r11631; 2026-02-21T12:39:47.6731550Z mov.b32 %r11655, %r11631; 2026-02-21T12:39:47.6731611Z mov.b32 %r11656, %r11631; 2026-02-21T12:39:47.6731678Z mov.b32 %r11657, %r11631; 2026-02-21T12:39:47.6731736Z mov.b32 %r11658, %r11631; 2026-02-21T12:39:47.6731795Z mov.b32 %r11659, %r11631; 2026-02-21T12:39:47.6731854Z mov.b32 %r11660, %r11631; 2026-02-21T12:39:47.6731917Z mov.b32 %r11661, %r11631; 2026-02-21T12:39:47.6731976Z mov.b32 %r11662, %r11631; 2026-02-21T12:39:47.6732036Z mov.b32 %r11663, %r11631; 2026-02-21T12:39:47.6732097Z mov.b32 %r11664, %r11631; 2026-02-21T12:39:47.6732157Z mov.b32 %r11665, %r11631; 2026-02-21T12:39:47.6732215Z mov.b32 %r11666, %r11631; 2026-02-21T12:39:47.6732273Z mov.b32 %r11667, %r11631; 2026-02-21T12:39:47.6732337Z mov.b32 %r11668, %r11631; 2026-02-21T12:39:47.6732397Z mov.b32 %r11669, %r11631; 2026-02-21T12:39:47.6732456Z mov.b32 %r11670, %r11631; 2026-02-21T12:39:47.6732519Z mov.b32 %r11671, %r11631; 2026-02-21T12:39:47.6732637Z mov.b32 %r11672, %r11631; 2026-02-21T12:39:47.6732696Z mov.b32 %r11673, %r11631; 2026-02-21T12:39:47.6732755Z mov.b32 %r11674, %r11631; 2026-02-21T12:39:47.6732869Z mov.b32 %r11675, %r11631; 2026-02-21T12:39:47.6732929Z mov.b32 %r11676, %r11631; 2026-02-21T12:39:47.6732987Z mov.b32 %r11677, %r11631; 2026-02-21T12:39:47.6733055Z mov.b32 %r11678, %r11631; 2026-02-21T12:39:47.6733120Z mov.b32 %r11679, %r11631; 2026-02-21T12:39:47.6733179Z mov.b32 %r11680, 
%r11631; 2026-02-21T12:39:47.6733239Z mov.b32 %r11681, %r11631; 2026-02-21T12:39:47.6733301Z mov.b32 %r11682, %r11631; 2026-02-21T12:39:47.6733360Z mov.b32 %r11683, %r11631; 2026-02-21T12:39:47.6733420Z mov.b32 %r11684, %r11631; 2026-02-21T12:39:47.6733485Z mov.b32 %r11685, %r11631; 2026-02-21T12:39:47.6733601Z mov.b32 %r11686, %r11631; 2026-02-21T12:39:47.6733665Z mov.b32 %r11687, %r11631; 2026-02-21T12:39:47.6733725Z mov.b32 %r11688, %r11631; 2026-02-21T12:39:47.6733783Z mov.b32 %r11689, %r11631; 2026-02-21T12:39:47.6733844Z mov.b32 %r11690, %r11631; 2026-02-21T12:39:47.6733903Z mov.b32 %r11691, %r11631; 2026-02-21T12:39:47.6733966Z mov.b32 %r11692, %r11631; 2026-02-21T12:39:47.6734073Z mov.b32 %r11693, %r11631; 2026-02-21T12:39:47.6734133Z mov.b32 %r11694, %r11631; 2026-02-21T12:39:47.6734194Z mov.b32 %r11695, %r11631; 2026-02-21T12:39:47.6734253Z mov.b32 %r11696, %r11631; 2026-02-21T12:39:47.6734312Z mov.b32 %r11697, %r11631; 2026-02-21T12:39:47.6734370Z mov.b32 %r11698, %r11631; 2026-02-21T12:39:47.6734432Z mov.b32 %r11699, %r11631; 2026-02-21T12:39:47.6734489Z mov.b32 %r11700, %r11631; 2026-02-21T12:39:47.6734547Z mov.b32 %r11701, %r11631; 2026-02-21T12:39:47.6734609Z mov.b32 %r11702, %r11631; 2026-02-21T12:39:47.6734667Z mov.b32 %r11703, %r11631; 2026-02-21T12:39:47.6734726Z mov.b32 %r11704, %r11631; 2026-02-21T12:39:47.6734785Z mov.b32 %r11705, %r11631; 2026-02-21T12:39:47.6734847Z mov.b32 %r11706, %r11631; 2026-02-21T12:39:47.6734904Z mov.b32 %r11707, %r11631; 2026-02-21T12:39:47.6734964Z mov.b32 %r11708, %r11631; 2026-02-21T12:39:47.6735026Z mov.b32 %r11709, %r11631; 2026-02-21T12:39:47.6735084Z mov.b32 %r11710, %r11631; 2026-02-21T12:39:47.6735146Z mov.b32 %r11711, %r11631; 2026-02-21T12:39:47.6735209Z mov.b32 %r11712, %r11631; 2026-02-21T12:39:47.6735269Z mov.b32 %r11713, %r11631; 2026-02-21T12:39:47.6735327Z mov.b32 %r11714, %r11631; 2026-02-21T12:39:47.6735386Z mov.b32 %r11715, %r11631; 2026-02-21T12:39:47.6735449Z mov.b32 %r11716, %r11631; 
2026-02-21T12:39:47.6735508Z mov.b32 %r11717, %r11631; 2026-02-21T12:39:47.6735565Z mov.b32 %r11718, %r11631; 2026-02-21T12:39:47.6735623Z mov.b32 %r11719, %r11631; 2026-02-21T12:39:47.6735686Z mov.b32 %r11720, %r11631; 2026-02-21T12:39:47.6735745Z mov.b32 %r11721, %r11631; 2026-02-21T12:39:47.6735804Z mov.b32 %r11722, %r11631; 2026-02-21T12:39:47.6735866Z mov.b32 %r11723, %r11631; 2026-02-21T12:39:47.6735925Z mov.b32 %r11724, %r11631; 2026-02-21T12:39:47.6735982Z mov.b32 %r11725, %r11631; 2026-02-21T12:39:47.6736055Z mov.b32 %r11726, %r11631; 2026-02-21T12:39:47.6736119Z mov.b32 %r11727, %r11631; 2026-02-21T12:39:47.6736177Z mov.b32 %r11728, %r11631; 2026-02-21T12:39:47.6736241Z mov.b32 %r11729, %r11631; 2026-02-21T12:39:47.6736303Z mov.b32 %r11730, %r11631; 2026-02-21T12:39:47.6736361Z mov.b32 %r11731, %r11631; 2026-02-21T12:39:47.6736421Z mov.b32 %r11732, %r11631; 2026-02-21T12:39:47.6736635Z mov.b32 %r11733, %r11631; 2026-02-21T12:39:47.6736710Z mov.b32 %r11734, %r11631; 2026-02-21T12:39:47.6736770Z mov.b32 %r11735, %r11631; 2026-02-21T12:39:47.6736828Z mov.b32 %r11736, %r11631; 2026-02-21T12:39:47.6736892Z mov.b32 %r11737, %r11631; 2026-02-21T12:39:47.6736951Z mov.b32 %r11738, %r11631; 2026-02-21T12:39:47.6737011Z mov.b32 %r11739, %r11631; 2026-02-21T12:39:47.6737072Z mov.b32 %r11740, %r11631; 2026-02-21T12:39:47.6737131Z mov.b32 %r11741, %r11631; 2026-02-21T12:39:47.6737187Z mov.b32 %r11742, %r11631; 2026-02-21T12:39:47.6737246Z mov.b32 %r11743, %r11631; 2026-02-21T12:39:47.6737395Z mov.b32 %r11744, %r11631; 2026-02-21T12:39:47.6737454Z mov.b32 %r11745, %r11631; 2026-02-21T12:39:47.6737511Z mov.b32 %r11746, %r11631; 2026-02-21T12:39:47.6737644Z mov.b32 %r11747, %r11631; 2026-02-21T12:39:47.6737702Z mov.b32 %r11748, %r11631; 2026-02-21T12:39:47.6737761Z mov.b32 %r11749, %r11631; 2026-02-21T12:39:47.6737821Z mov.b32 %r11750, %r11631; 2026-02-21T12:39:47.6737884Z mov.b32 %r11751, %r11631; 2026-02-21T12:39:47.6737942Z mov.b32 %r11752, %r11631; 
2026-02-21T12:39:47.6738000Z mov.b32 %r11753, %r11631; 2026-02-21T12:39:47.6738067Z mov.b32 %r11754, %r11631; 2026-02-21T12:39:47.6738125Z mov.b32 %r11755, %r11631; 2026-02-21T12:39:47.6738183Z mov.b32 %r11756, %r11631; 2026-02-21T12:39:47.6738302Z mov.b32 %r11757, %r11631; 2026-02-21T12:39:47.6738383Z mov.b32 %r11758, %r11631; 2026-02-21T12:39:47.6738499Z $L__BB0_21: // Parent Loop BB0_2 Depth=1 2026-02-21T12:39:47.6738607Z // => This Inner Loop Header: Depth=2 2026-02-21T12:39:47.6738816Z .loc 1 51 80 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:80 2026-02-21T12:39:47.6738950Z add.s64 %rd401, %rd604, -32; 2026-02-21T12:39:47.6739013Z // begin inline asm 2026-02-21T12:39:47.6739077Z mov.u64 %rd400, 0x0; 2026-02-21T12:39:47.6739209Z createpolicy.fractional.L2::evict_last.b64 %rd400, 1.0; 2026-02-21T12:39:47.6739267Z // end inline asm 2026-02-21T12:39:47.6739329Z // begin inline asm 2026-02-21T12:39:47.6739391Z mov.u32 %r7299, 0x0; 2026-02-21T12:39:47.6739449Z mov.u32 %r7300, 0x0; 2026-02-21T12:39:47.6739506Z mov.u32 %r7301, 0x0; 2026-02-21T12:39:47.6739565Z mov.u32 %r7302, 0x0; 2026-02-21T12:39:47.6739795Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r7299, %r7300, %r7301, %r7302 }, [ %rd401 + 0 ], %rd400; 2026-02-21T12:39:47.6739852Z // end inline asm 2026-02-21T12:39:47.6740056Z .loc 1 55 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:55:32 2026-02-21T12:39:47.6740117Z bar.sync 0; 2026-02-21T12:39:47.6740198Z st.shared.v2.b32 [%r10], {%r7299, %r7300}; 2026-02-21T12:39:47.6740281Z st.shared.v2.b32 [%r11], {%r7301, %r7302}; 2026-02-21T12:39:47.6740342Z bar.sync 0; 2026-02-21T12:39:47.6740410Z ld.shared.b16 %rs337, [%r13]; 2026-02-21T12:39:47.6740480Z ld.shared.b16 %rs338, [%r13+256]; 2026-02-21T12:39:47.6740551Z ld.shared.b16 %rs339, [%r13+16]; 2026-02-21T12:39:47.6740618Z ld.shared.b16 %rs340, [%r13+272]; 2026-02-21T12:39:47.6740682Z ld.shared.b16 %rs341, [%r14]; 2026-02-21T12:39:47.6740759Z ld.shared.b16 %rs342, [%r14+256]; 
2026-02-21T12:39:47.6740832Z ld.shared.b16 %rs343, [%r14+16]; 2026-02-21T12:39:47.6740896Z ld.shared.b16 %rs344, [%r14+272]; 2026-02-21T12:39:47.6740963Z cvt.f32.bf16 %r7561, %rs337; 2026-02-21T12:39:47.6741031Z cvt.f32.bf16 %r7562, %rs338; 2026-02-21T12:39:47.6741093Z cvt.f32.bf16 %r7563, %rs341; 2026-02-21T12:39:47.6741153Z cvt.f32.bf16 %r7564, %rs342; 2026-02-21T12:39:47.6741217Z cvt.f32.bf16 %r7821, %rs339; 2026-02-21T12:39:47.6741285Z cvt.f32.bf16 %r7822, %rs340; 2026-02-21T12:39:47.6741346Z cvt.f32.bf16 %r7823, %rs343; 2026-02-21T12:39:47.6741412Z cvt.f32.bf16 %r7824, %rs344; 2026-02-21T12:39:47.6741616Z .loc 1 57 87 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:57:87 2026-02-21T12:39:47.6741676Z // begin inline asm 2026-02-21T12:39:47.6741735Z mov.u32 %r7303, 0x0; 2026-02-21T12:39:47.6741796Z mov.u32 %r7304, 0x0; 2026-02-21T12:39:47.6741893Z ld.global.v2.b32 { %r7303, %r7304 }, [ %rd603 + 0 ]; 2026-02-21T12:39:47.6741950Z // end inline asm 2026-02-21T12:39:47.6742150Z .loc 1 65 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:65:28 2026-02-21T12:39:47.6742213Z bar.sync 0; 2026-02-21T12:39:47.6742279Z st.shared.b8 [%r15], %r7303; 2026-02-21T12:39:47.6742349Z prmt.b32 %r8875, %r7303, 0, 0x7771U; 2026-02-21T12:39:47.6742419Z st.shared.b8 [%r16+256], %r8875; 2026-02-21T12:39:47.6742564Z prmt.b32 %r8876, %r7303, 0, 0x7772U; 2026-02-21T12:39:47.6742632Z st.shared.b8 [%r17+512], %r8876; 2026-02-21T12:39:47.6742701Z prmt.b32 %r8877, %r7303, 0, 0x7773U; 2026-02-21T12:39:47.6742821Z st.shared.b8 [%r18+768], %r8877; 2026-02-21T12:39:47.6742888Z st.shared.b8 [%r19+1024], %r7304; 2026-02-21T12:39:47.6742952Z prmt.b32 %r8878, %r7304, 0, 0x7771U; 2026-02-21T12:39:47.6743020Z st.shared.b8 [%r20+1280], %r8878; 2026-02-21T12:39:47.6743085Z prmt.b32 %r8879, %r7304, 0, 0x7772U; 2026-02-21T12:39:47.6743147Z st.shared.b8 [%r21+1536], %r8879; 2026-02-21T12:39:47.6743215Z prmt.b32 %r8880, %r7304, 0, 0x7773U; 2026-02-21T12:39:47.6743279Z st.shared.b8 
[%r22+1792], %r8880; 2026-02-21T12:39:47.6743335Z bar.sync 0; 2026-02-21T12:39:47.6743449Z ld.shared.b32 %r8881, [%r44]; 2026-02-21T12:39:47.6743518Z prmt.b32 %r8882, %r8881, 0, 0x7771U; 2026-02-21T12:39:47.6743583Z cvt.u16.u32 %rs345, %r8882; 2026-02-21T12:39:47.6743659Z prmt.b32 %r8883, %r8881, 0, 0x7770U; 2026-02-21T12:39:47.6743728Z cvt.u16.u32 %rs346, %r8883; 2026-02-21T12:39:47.6743794Z prmt.b32 %r8884, %r8881, 0, 0x7773U; 2026-02-21T12:39:47.6745305Z cvt.u16.u32 %rs347, %r8884; 2026-02-21T12:39:47.6745513Z prmt.b32 %r8885, %r8881, 0, 0x7772U; 2026-02-21T12:39:47.6745587Z cvt.u16.u32 %rs348, %r8885; 2026-02-21T12:39:47.6745821Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6745888Z shl.b16 %rs349, %rs346, 4; 2026-02-21T12:39:47.6745950Z shl.b16 %rs350, %rs345, 4; 2026-02-21T12:39:47.6746166Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6746231Z cvt.u32.u16 %r8886, %rs349; 2026-02-21T12:39:47.6746311Z prmt.b32 %r8887, %r8886, %r8888, 0x3340U; 2026-02-21T12:39:47.6746386Z prmt.b32 %r8892, %r8887, %r8889, 0x5410U; 2026-02-21T12:39:47.6746649Z prmt.b32 %r8893, %r8892, %r8881, 0x5040U; 2026-02-21T12:39:47.6746725Z prmt.b32 %r8894, %r8893, 0, 0x9991U; 2026-02-21T12:39:47.6746793Z cvt.u16.u32 %rs351, %r8894; 2026-02-21T12:39:47.6746859Z shr.s16 %rs352, %rs351, 4; 2026-02-21T12:39:47.6746940Z prmt.b32 %r8895, %r8893, 0, 0xbbb3U; 2026-02-21T12:39:47.6747007Z cvt.u16.u32 %rs353, %r8895; 2026-02-21T12:39:47.6747069Z shr.s16 %rs354, %rs353, 4; 2026-02-21T12:39:47.6747135Z cvt.s16.s8 %rs355, %rs349; 2026-02-21T12:39:47.6747194Z shr.s16 %rs356, %rs355, 4; 2026-02-21T12:39:47.6747254Z cvt.s16.s8 %rs357, %rs350; 2026-02-21T12:39:47.6747317Z shr.s16 %rs358, %rs357, 4; 2026-02-21T12:39:47.6747537Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6747608Z cvt.rn.f32.s16 %r8896, %rs354; 2026-02-21T12:39:47.6747679Z 
cvt.rn.f32.s16 %r8897, %rs352; 2026-02-21T12:39:47.6747741Z cvt.rn.f32.s16 %r8898, %rs358; 2026-02-21T12:39:47.6747805Z cvt.rn.f32.s16 %r8899, %rs356; 2026-02-21T12:39:47.6748012Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6748083Z shl.b16 %rs359, %rs348, 4; 2026-02-21T12:39:47.6748144Z shl.b16 %rs360, %rs347, 4; 2026-02-21T12:39:47.6748351Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6748523Z prmt.b32 %r8900, %r8881, %r8901, 0x3020U; 2026-02-21T12:39:47.6748592Z prmt.b32 %r8902, %r8900, 0, 0x9991U; 2026-02-21T12:39:47.6748657Z cvt.u16.u32 %rs361, %r8902; 2026-02-21T12:39:47.6748724Z shr.s16 %rs362, %rs361, 4; 2026-02-21T12:39:47.6748787Z cvt.s16.s8 %rs363, %rs359; 2026-02-21T12:39:47.6748848Z shr.s16 %rs364, %rs363, 4; 2026-02-21T12:39:47.6748910Z cvt.s16.s8 %rs365, %rs360; 2026-02-21T12:39:47.6748975Z shr.s16 %rs366, %rs365, 4; 2026-02-21T12:39:47.6749042Z prmt.b32 %r8903, %r8881, 0, 0xbbb3U; 2026-02-21T12:39:47.6749103Z cvt.u16.u32 %rs367, %r8903; 2026-02-21T12:39:47.6749168Z shr.s16 %rs368, %rs367, 4; 2026-02-21T12:39:47.6749368Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6749532Z cvt.rn.f32.s16 %r8904, %rs362; 2026-02-21T12:39:47.6749599Z cvt.rn.f32.s16 %r8905, %rs368; 2026-02-21T12:39:47.6749730Z cvt.rn.f32.s16 %r8906, %rs366; 2026-02-21T12:39:47.6749793Z cvt.rn.f32.s16 %r8907, %rs364; 2026-02-21T12:39:47.6749988Z .loc 1 65 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:65:28 2026-02-21T12:39:47.6750071Z ld.shared.b32 %r8908, [%r44+128]; 2026-02-21T12:39:47.6750141Z prmt.b32 %r8909, %r8908, 0, 0x7771U; 2026-02-21T12:39:47.6750204Z cvt.u16.u32 %rs369, %r8909; 2026-02-21T12:39:47.6750274Z prmt.b32 %r8910, %r8908, 0, 0x7770U; 2026-02-21T12:39:47.6750334Z cvt.u16.u32 %rs370, %r8910; 2026-02-21T12:39:47.6750397Z prmt.b32 %r8911, %r8908, 0, 0x7773U; 2026-02-21T12:39:47.6750456Z cvt.u16.u32 
%rs371, %r8911; 2026-02-21T12:39:47.6750523Z prmt.b32 %r8912, %r8908, 0, 0x7772U; 2026-02-21T12:39:47.6750583Z cvt.u16.u32 %rs372, %r8912; 2026-02-21T12:39:47.6750782Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6751049Z shl.b16 %rs373, %rs370, 4; 2026-02-21T12:39:47.6751268Z shl.b16 %rs374, %rs369, 4; 2026-02-21T12:39:47.6751577Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6751670Z cvt.u32.u16 %r8913, %rs373; 2026-02-21T12:39:47.6751770Z prmt.b32 %r8914, %r8913, %r8915, 0x3340U; 2026-02-21T12:39:47.6751869Z prmt.b32 %r8916, %r8914, %r8889, 0x5410U; 2026-02-21T12:39:47.6751971Z prmt.b32 %r8917, %r8916, %r8908, 0x5040U; 2026-02-21T12:39:47.6752080Z prmt.b32 %r8918, %r8917, 0, 0x9991U; 2026-02-21T12:39:47.6752165Z cvt.u16.u32 %rs375, %r8918; 2026-02-21T12:39:47.6752249Z shr.s16 %rs376, %rs375, 4; 2026-02-21T12:39:47.6752348Z prmt.b32 %r8919, %r8917, 0, 0xbbb3U; 2026-02-21T12:39:47.6752443Z cvt.u16.u32 %rs377, %r8919; 2026-02-21T12:39:47.6752533Z shr.s16 %rs378, %rs377, 4; 2026-02-21T12:39:47.6752624Z cvt.s16.s8 %rs379, %rs373; 2026-02-21T12:39:47.6752713Z shr.s16 %rs380, %rs379, 4; 2026-02-21T12:39:47.6752797Z cvt.s16.s8 %rs381, %rs374; 2026-02-21T12:39:47.6752889Z shr.s16 %rs382, %rs381, 4; 2026-02-21T12:39:47.6753180Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6753276Z cvt.rn.f32.s16 %r8920, %rs378; 2026-02-21T12:39:47.6753366Z cvt.rn.f32.s16 %r8921, %rs376; 2026-02-21T12:39:47.6753454Z cvt.rn.f32.s16 %r8922, %rs382; 2026-02-21T12:39:47.6753549Z cvt.rn.f32.s16 %r8923, %rs380; 2026-02-21T12:39:47.6753855Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6753940Z shl.b16 %rs383, %rs372, 4; 2026-02-21T12:39:47.6754026Z shl.b16 %rs384, %rs371, 4; 2026-02-21T12:39:47.6754335Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 
2026-02-21T12:39:47.6754462Z prmt.b32 %r8924, %r8908, %r8925, 0x3020U; 2026-02-21T12:39:47.6754586Z prmt.b32 %r8926, %r8924, 0, 0x9991U; 2026-02-21T12:39:47.6754656Z cvt.u16.u32 %rs385, %r8926; 2026-02-21T12:39:47.6754723Z shr.s16 %rs386, %rs385, 4; 2026-02-21T12:39:47.6754785Z cvt.s16.s8 %rs387, %rs383; 2026-02-21T12:39:47.6754850Z shr.s16 %rs388, %rs387, 4; 2026-02-21T12:39:47.6754913Z cvt.s16.s8 %rs389, %rs384; 2026-02-21T12:39:47.6754972Z shr.s16 %rs390, %rs389, 4; 2026-02-21T12:39:47.6755044Z prmt.b32 %r8927, %r8908, 0, 0xbbb3U; 2026-02-21T12:39:47.6755105Z cvt.u16.u32 %rs391, %r8927; 2026-02-21T12:39:47.6755167Z shr.s16 %rs392, %rs391, 4; 2026-02-21T12:39:47.6755370Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6755435Z cvt.rn.f32.s16 %r8928, %rs386; 2026-02-21T12:39:47.6755498Z cvt.rn.f32.s16 %r8929, %rs392; 2026-02-21T12:39:47.6755560Z cvt.rn.f32.s16 %r8930, %rs390; 2026-02-21T12:39:47.6755626Z cvt.rn.f32.s16 %r8931, %rs388; 2026-02-21T12:39:47.6755759Z bar.sync 0; 2026-02-21T12:39:47.6755879Z st.shared.v4.b32 [%r24], {%r8899, %r8897, %r8898, %r8896}; 2026-02-21T12:39:47.6755994Z st.shared.v4.b32 [%r25], {%r8907, %r8904, %r8906, %r8905}; 2026-02-21T12:39:47.6756155Z st.shared.v4.b32 [%r26], {%r8923, %r8921, %r8922, %r8920}; 2026-02-21T12:39:47.6756258Z st.shared.v4.b32 [%r27], {%r8931, %r8928, %r8930, %r8929}; 2026-02-21T12:39:47.6756316Z $L__tmp13: 2026-02-21T12:39:47.6756827Z .loc 2 291 36 // standard.py:291:36 @[ conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:87:40 ] 2026-02-21T12:39:47.6756894Z // begin inline asm 2026-02-21T12:39:47.6756980Z fence.proxy.async.shared::cta; 2026-02-21T12:39:47.6757040Z // end inline asm 2026-02-21T12:39:47.6757097Z bar.sync 0; 2026-02-21T12:39:47.6757181Z shfl.sync.idx.b32 %r8932, %r2, 0, 31, -1; 2026-02-21T12:39:47.6757257Z wgmma.fence.sync.aligned; 2026-02-21T12:39:47.6757323Z mov.pred %p21, -1; 2026-02-21T12:39:47.6757383Z // begin inline asm 
2026-02-21T12:39:47.6760301Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r11631,%r11632,%r11633,%r11634,%r11635,%r11636,%r11637,%r11638,%r11639,%r11640,%r11641,%r11642,%r11643,%r11644,%r11645,%r11646,%r11647,%r11648,%r11649,%r11650,%r11651,%r11652,%r11653,%r11654,%r11655,%r11656,%r11657,%r11658,%r11659,%r11660,%r11661,%r11662,%r11663,%r11664,%r11665,%r11666,%r11667,%r11668,%r11669,%r11670,%r11671,%r11672,%r11673,%r11674,%r11675,%r11676,%r11677,%r11678,%r11679,%r11680,%r11681,%r11682,%r11683,%r11684,%r11685,%r11686,%r11687,%r11688,%r11689,%r11690,%r11691,%r11692,%r11693,%r11694,%r11695,%r11696,%r11697,%r11698,%r11699,%r11700,%r11701,%r11702,%r11703,%r11704,%r11705,%r11706,%r11707,%r11708,%r11709,%r11710,%r11711,%r11712,%r11713,%r11714,%r11715,%r11716,%r11717,%r11718,%r11719,%r11720,%r11721,%r11722,%r11723,%r11724,%r11725,%r11726,%r11727,%r11728,%r11729,%r11730,%r11731,%r11732,%r11733,%r11734,%r11735,%r11736,%r11737,%r11738,%r11739,%r11740,%r11741,%r11742,%r11743,%r11744,%r11745,%r11746,%r11747,%r11748,%r11749,%r11750,%r11751,%r11752,%r11753,%r11754,%r11755,%r11756,%r11757,%r11758}, {%r7561,%r7562,%r7563,%r7564}, %rd404, %p21, 1, 1; 2026-02-21T12:39:47.6760373Z // end inline asm 2026-02-21T12:39:47.6760441Z // begin inline asm 2026-02-21T12:39:47.6763158Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r11631,%r11632,%r11633,%r11634,%r11635,%r11636,%r11637,%r11638,%r11639,%r11640,%r11641,%r11642,%r11643,%r11644,%r11645,%r11646,%r11647,%r11648,%r11649,%r11650,%r11651,%r11652,%r11653,%r11654,%r11655,%r11656,%r11657,%r11658,%r11659,%r11660,%r11661,%r11662,%r11663,%r11664,%r11665,%r11666,%r11667,%r11668,%r11669,%r11670,%r11671,%r11672,%r11673,%r11674,%r11675,%r11676,%r11677,%r11678,%r11679,%r11680,%r11681,%r11682,%r11683,%r11684,%r11685,%r11686,%r11687,%r11688,%r11689,%r11690,%r11691,%r11692,%r11693,%r11694,%r11695,%r11696,%r11697,%r11698,%r11699,%r11700,%r11701,%r11702,%r11703,%r11704,%r11705,%r11706,%r11707,%r11708,%r11709,%r11710,%r11711,%r11712,%r11713,%r11714,%r11715,%r11716,%r11717,%r11718,%r11719,%r11720,%r11721,%r11722,%r11723,%r11724,%r11725,%r11726,%r11727,%r11728,%r11729,%r11730,%r11731,%r11732,%r11733,%r11734,%r11735,%r11736,%r11737,%r11738,%r11739,%r11740,%r11741,%r11742,%r11743,%r11744,%r11745,%r11746,%r11747,%r11748,%r11749,%r11750,%r11751,%r11752,%r11753,%r11754,%r11755,%r11756,%r11757,%r11758}, {%r7821,%r7822,%r7823,%r7824}, %rd405, %p21, 1, 1; 2026-02-21T12:39:47.6763227Z // end inline asm 2026-02-21T12:39:47.6763307Z wgmma.commit_group.sync.aligned; 2026-02-21T12:39:47.6763366Z mov.b32 %r8742, 0; 2026-02-21T12:39:47.6763433Z mov.b32 %r7953, %r10096; 2026-02-21T12:39:47.6763495Z mov.b32 %r7954, %r8742; 2026-02-21T12:39:47.6763553Z mov.b32 %r7955, %r8742; 2026-02-21T12:39:47.6763614Z // begin inline asm 2026-02-21T12:39:47.6766155Z // wait for regs: 
%r11631,%r11632,%r11633,%r11634,%r11635,%r11636,%r11637,%r11638,%r11639,%r11640,%r11641,%r11642,%r11643,%r11644,%r11645,%r11646,%r11647,%r11648,%r11649,%r11650,%r11651,%r11652,%r11653,%r11654,%r11655,%r11656,%r11657,%r11658,%r11659,%r11660,%r11661,%r11662,%r11663,%r11664,%r11665,%r11666,%r11667,%r11668,%r11669,%r11670,%r11671,%r11672,%r11673,%r11674,%r11675,%r11676,%r11677,%r11678,%r11679,%r11680,%r11681,%r11682,%r11683,%r11684,%r11685,%r11686,%r11687,%r11688,%r11689,%r11690,%r11691,%r11692,%r11693,%r11694,%r11695,%r11696,%r11697,%r11698,%r11699,%r11700,%r11701,%r11702,%r11703,%r11704,%r11705,%r11706,%r11707,%r11708,%r11709,%r11710,%r11711,%r11712,%r11713,%r11714,%r11715,%r11716,%r11717,%r11718,%r11719,%r11720,%r11721,%r11722,%r11723,%r11724,%r11725,%r11726,%r11727,%r11728,%r11729,%r11730,%r11731,%r11732,%r11733,%r11734,%r11735,%r11736,%r11737,%r11738,%r11739,%r11740,%r11741,%r11742,%r11743,%r11744,%r11745,%r11746,%r11747,%r11748,%r11749,%r11750,%r11751,%r11752,%r11753,%r11754,%r11755,%r11756,%r11757,%r11758,%r7953,%r7954,%r7955 2026-02-21T12:39:47.6766383Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:39:47.6766447Z // end inline asm 2026-02-21T12:39:47.6766654Z $L__tmp14: 2026-02-21T12:39:47.6766872Z .loc 1 51 80 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:80 2026-02-21T12:39:47.6766940Z // begin inline asm 2026-02-21T12:39:47.6767000Z mov.u64 %rd406, 0x0; 2026-02-21T12:39:47.6767269Z createpolicy.fractional.L2::evict_last.b64 %rd406, 1.0; 2026-02-21T12:39:47.6767335Z // end inline asm 2026-02-21T12:39:47.6767395Z // begin inline asm 2026-02-21T12:39:47.6767454Z mov.u32 %r8087, 0x0; 2026-02-21T12:39:47.6767512Z mov.u32 %r8088, 0x0; 2026-02-21T12:39:47.6767573Z mov.u32 %r8089, 0x0; 2026-02-21T12:39:47.6767642Z mov.u32 %r8090, 0x0; 2026-02-21T12:39:47.6767872Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r8087, %r8088, %r8089, %r8090 }, [ %rd604 + 0 ], %rd406; 2026-02-21T12:39:47.6767933Z // end inline asm 2026-02-21T12:39:47.6768137Z .loc 1 
55 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:55:32 2026-02-21T12:39:47.6768195Z bar.sync 0; 2026-02-21T12:39:47.6768278Z st.shared.v2.b32 [%r10], {%r8087, %r8088}; 2026-02-21T12:39:47.6768362Z st.shared.v2.b32 [%r11], {%r8089, %r8090}; 2026-02-21T12:39:47.6768421Z bar.sync 0; 2026-02-21T12:39:47.6768491Z ld.shared.b16 %rs393, [%r13]; 2026-02-21T12:39:47.6768572Z ld.shared.b16 %rs394, [%r13+256]; 2026-02-21T12:39:47.6768641Z ld.shared.b16 %rs395, [%r13+16]; 2026-02-21T12:39:47.6768708Z ld.shared.b16 %rs396, [%r13+272]; 2026-02-21T12:39:47.6768776Z ld.shared.b16 %rs397, [%r14]; 2026-02-21T12:39:47.6768839Z ld.shared.b16 %rs398, [%r14+256]; 2026-02-21T12:39:47.6768904Z ld.shared.b16 %rs399, [%r14+16]; 2026-02-21T12:39:47.6768968Z ld.shared.b16 %rs400, [%r14+272]; 2026-02-21T12:39:47.6769038Z cvt.f32.bf16 %r8349, %rs393; 2026-02-21T12:39:47.6769102Z cvt.f32.bf16 %r8350, %rs394; 2026-02-21T12:39:47.6769176Z cvt.f32.bf16 %r8351, %rs397; 2026-02-21T12:39:47.6769248Z cvt.f32.bf16 %r8352, %rs398; 2026-02-21T12:39:47.6769312Z cvt.f32.bf16 %r8609, %rs395; 2026-02-21T12:39:47.6769374Z cvt.f32.bf16 %r8610, %rs396; 2026-02-21T12:39:47.6769437Z cvt.f32.bf16 %r8611, %rs399; 2026-02-21T12:39:47.6769504Z cvt.f32.bf16 %r8612, %rs400; 2026-02-21T12:39:47.6769719Z .loc 1 57 87 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:57:87 2026-02-21T12:39:47.6769792Z add.s64 %rd409, %rd603, 10240; 2026-02-21T12:39:47.6769857Z // begin inline asm 2026-02-21T12:39:47.6769917Z mov.u32 %r8091, 0x0; 2026-02-21T12:39:47.6769974Z mov.u32 %r8092, 0x0; 2026-02-21T12:39:47.6770078Z ld.global.v2.b32 { %r8091, %r8092 }, [ %rd409 + 0 ]; 2026-02-21T12:39:47.6770136Z // end inline asm 2026-02-21T12:39:47.6770340Z .loc 1 65 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:65:28 2026-02-21T12:39:47.6770399Z bar.sync 0; 2026-02-21T12:39:47.6770471Z st.shared.b8 [%r15], %r8091; 2026-02-21T12:39:47.6770542Z prmt.b32 %r8933, %r8091, 0, 0x7771U; 
2026-02-21T12:39:47.6770615Z st.shared.b8 [%r16+256], %r8933; 2026-02-21T12:39:47.6770686Z prmt.b32 %r8934, %r8091, 0, 0x7772U; 2026-02-21T12:39:47.6770750Z st.shared.b8 [%r17+512], %r8934; 2026-02-21T12:39:47.6770909Z prmt.b32 %r8935, %r8091, 0, 0x7773U; 2026-02-21T12:39:47.6770973Z st.shared.b8 [%r18+768], %r8935; 2026-02-21T12:39:47.6771115Z st.shared.b8 [%r19+1024], %r8092; 2026-02-21T12:39:47.6771182Z prmt.b32 %r8936, %r8092, 0, 0x7771U; 2026-02-21T12:39:47.6771248Z st.shared.b8 [%r20+1280], %r8936; 2026-02-21T12:39:47.6771316Z prmt.b32 %r8937, %r8092, 0, 0x7772U; 2026-02-21T12:39:47.6771381Z st.shared.b8 [%r21+1536], %r8937; 2026-02-21T12:39:47.6771447Z prmt.b32 %r8938, %r8092, 0, 0x7773U; 2026-02-21T12:39:47.6771517Z st.shared.b8 [%r22+1792], %r8938; 2026-02-21T12:39:47.6771573Z bar.sync 0; 2026-02-21T12:39:47.6771640Z ld.shared.b32 %r8939, [%r44]; 2026-02-21T12:39:47.6771705Z prmt.b32 %r8940, %r8939, 0, 0x7771U; 2026-02-21T12:39:47.6771775Z cvt.u16.u32 %rs401, %r8940; 2026-02-21T12:39:47.6771838Z prmt.b32 %r8941, %r8939, 0, 0x7770U; 2026-02-21T12:39:47.6771900Z cvt.u16.u32 %rs402, %r8941; 2026-02-21T12:39:47.6771967Z prmt.b32 %r8942, %r8939, 0, 0x7773U; 2026-02-21T12:39:47.6772031Z cvt.u16.u32 %rs403, %r8942; 2026-02-21T12:39:47.6772094Z prmt.b32 %r8943, %r8939, 0, 0x7772U; 2026-02-21T12:39:47.6772231Z cvt.u16.u32 %rs404, %r8943; 2026-02-21T12:39:47.6772485Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6772553Z shl.b16 %rs405, %rs402, 4; 2026-02-21T12:39:47.6772615Z shl.b16 %rs406, %rs401, 4; 2026-02-21T12:39:47.6772818Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6772882Z cvt.u32.u16 %r8944, %rs405; 2026-02-21T12:39:47.6772960Z prmt.b32 %r8945, %r8944, %r8946, 0x3340U; 2026-02-21T12:39:47.6773037Z prmt.b32 %r8947, %r8945, %r8889, 0x5410U; 2026-02-21T12:39:47.6773117Z prmt.b32 %r8948, %r8947, %r8939, 0x5040U; 2026-02-21T12:39:47.6773186Z prmt.b32 
%r8949, %r8948, 0, 0x9991U; 2026-02-21T12:39:47.6773250Z cvt.u16.u32 %rs407, %r8949; 2026-02-21T12:39:47.6773316Z shr.s16 %rs408, %rs407, 4; 2026-02-21T12:39:47.6773382Z prmt.b32 %r8950, %r8948, 0, 0xbbb3U; 2026-02-21T12:39:47.6773445Z cvt.u16.u32 %rs409, %r8950; 2026-02-21T12:39:47.6773516Z shr.s16 %rs410, %rs409, 4; 2026-02-21T12:39:47.6773582Z cvt.s16.s8 %rs411, %rs405; 2026-02-21T12:39:47.6773644Z shr.s16 %rs412, %rs411, 4; 2026-02-21T12:39:47.6773705Z cvt.s16.s8 %rs413, %rs406; 2026-02-21T12:39:47.6773769Z shr.s16 %rs414, %rs413, 4; 2026-02-21T12:39:47.6773968Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6774034Z cvt.rn.f32.s16 %r8951, %rs410; 2026-02-21T12:39:47.6774105Z cvt.rn.f32.s16 %r8952, %rs408; 2026-02-21T12:39:47.6774168Z cvt.rn.f32.s16 %r8953, %rs414; 2026-02-21T12:39:47.6774232Z cvt.rn.f32.s16 %r8954, %rs412; 2026-02-21T12:39:47.6774439Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6774505Z shl.b16 %rs415, %rs404, 4; 2026-02-21T12:39:47.6774569Z shl.b16 %rs416, %rs403, 4; 2026-02-21T12:39:47.6774768Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6774845Z prmt.b32 %r8955, %r8939, %r8956, 0x3020U; 2026-02-21T12:39:47.6774911Z prmt.b32 %r8957, %r8955, 0, 0x9991U; 2026-02-21T12:39:47.6774972Z cvt.u16.u32 %rs417, %r8957; 2026-02-21T12:39:47.6775037Z shr.s16 %rs418, %rs417, 4; 2026-02-21T12:39:47.6775098Z cvt.s16.s8 %rs419, %rs415; 2026-02-21T12:39:47.6775170Z shr.s16 %rs420, %rs419, 4; 2026-02-21T12:39:47.6775236Z cvt.s16.s8 %rs421, %rs416; 2026-02-21T12:39:47.6775297Z shr.s16 %rs422, %rs421, 4; 2026-02-21T12:39:47.6775362Z prmt.b32 %r8958, %r8939, 0, 0xbbb3U; 2026-02-21T12:39:47.6775424Z cvt.u16.u32 %rs423, %r8958; 2026-02-21T12:39:47.6775490Z shr.s16 %rs424, %rs423, 4; 2026-02-21T12:39:47.6775688Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 
2026-02-21T12:39:47.6775754Z cvt.rn.f32.s16 %r8959, %rs418; 2026-02-21T12:39:47.6775883Z cvt.rn.f32.s16 %r8960, %rs424; 2026-02-21T12:39:47.6775945Z cvt.rn.f32.s16 %r8961, %rs422; 2026-02-21T12:39:47.6776007Z cvt.rn.f32.s16 %r8962, %rs420; 2026-02-21T12:39:47.6776255Z .loc 1 65 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:65:28 2026-02-21T12:39:47.6776324Z ld.shared.b32 %r8963, [%r44+128]; 2026-02-21T12:39:47.6776391Z prmt.b32 %r8964, %r8963, 0, 0x7771U; 2026-02-21T12:39:47.6776600Z cvt.u16.u32 %rs425, %r8964; 2026-02-21T12:39:47.6776675Z prmt.b32 %r8965, %r8963, 0, 0x7770U; 2026-02-21T12:39:47.6776738Z cvt.u16.u32 %rs426, %r8965; 2026-02-21T12:39:47.6776801Z prmt.b32 %r8966, %r8963, 0, 0x7773U; 2026-02-21T12:39:47.6776865Z cvt.u16.u32 %rs427, %r8966; 2026-02-21T12:39:47.6776930Z prmt.b32 %r8967, %r8963, 0, 0x7772U; 2026-02-21T12:39:47.6776990Z cvt.u16.u32 %rs428, %r8967; 2026-02-21T12:39:47.6777187Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6777256Z shl.b16 %rs429, %rs426, 4; 2026-02-21T12:39:47.6777319Z shl.b16 %rs430, %rs425, 4; 2026-02-21T12:39:47.6777654Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6777736Z cvt.u32.u16 %r8968, %rs429; 2026-02-21T12:39:47.6777810Z prmt.b32 %r8969, %r8968, %r8970, 0x3340U; 2026-02-21T12:39:47.6777881Z prmt.b32 %r8971, %r8969, %r8889, 0x5410U; 2026-02-21T12:39:47.6777953Z prmt.b32 %r8972, %r8971, %r8963, 0x5040U; 2026-02-21T12:39:47.6778016Z prmt.b32 %r8973, %r8972, 0, 0x9991U; 2026-02-21T12:39:47.6778077Z cvt.u16.u32 %rs431, %r8973; 2026-02-21T12:39:47.6778140Z shr.s16 %rs432, %rs431, 4; 2026-02-21T12:39:47.6778207Z prmt.b32 %r8974, %r8972, 0, 0xbbb3U; 2026-02-21T12:39:47.6778267Z cvt.u16.u32 %rs433, %r8974; 2026-02-21T12:39:47.6778328Z shr.s16 %rs434, %rs433, 4; 2026-02-21T12:39:47.6778391Z cvt.s16.s8 %rs435, %rs429; 2026-02-21T12:39:47.6778452Z shr.s16 %rs436, %rs435, 4; 2026-02-21T12:39:47.6778514Z 
cvt.s16.s8 %rs437, %rs430; 2026-02-21T12:39:47.6778580Z shr.s16 %rs438, %rs437, 4; 2026-02-21T12:39:47.6778780Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6778846Z cvt.rn.f32.s16 %r8975, %rs434; 2026-02-21T12:39:47.6778908Z cvt.rn.f32.s16 %r8976, %rs432; 2026-02-21T12:39:47.6778975Z cvt.rn.f32.s16 %r8977, %rs438; 2026-02-21T12:39:47.6779039Z cvt.rn.f32.s16 %r8978, %rs436; 2026-02-21T12:39:47.6779235Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6779300Z shl.b16 %rs439, %rs428, 4; 2026-02-21T12:39:47.6779360Z shl.b16 %rs440, %rs427, 4; 2026-02-21T12:39:47.6779555Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6779632Z prmt.b32 %r8979, %r8963, %r8980, 0x3020U; 2026-02-21T12:39:47.6779698Z prmt.b32 %r8981, %r8979, 0, 0x9991U; 2026-02-21T12:39:47.6779761Z cvt.u16.u32 %rs441, %r8981; 2026-02-21T12:39:47.6779821Z shr.s16 %rs442, %rs441, 4; 2026-02-21T12:39:47.6779885Z cvt.s16.s8 %rs443, %rs439; 2026-02-21T12:39:47.6779964Z shr.s16 %rs444, %rs443, 4; 2026-02-21T12:39:47.6780028Z cvt.s16.s8 %rs445, %rs440; 2026-02-21T12:39:47.6780092Z shr.s16 %rs446, %rs445, 4; 2026-02-21T12:39:47.6780158Z prmt.b32 %r8982, %r8963, 0, 0xbbb3U; 2026-02-21T12:39:47.6780220Z cvt.u16.u32 %rs447, %r8982; 2026-02-21T12:39:47.6780280Z shr.s16 %rs448, %rs447, 4; 2026-02-21T12:39:47.6780480Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6780544Z cvt.rn.f32.s16 %r8983, %rs442; 2026-02-21T12:39:47.6780607Z cvt.rn.f32.s16 %r8984, %rs448; 2026-02-21T12:39:47.6780674Z cvt.rn.f32.s16 %r8985, %rs446; 2026-02-21T12:39:47.6780737Z cvt.rn.f32.s16 %r8986, %rs444; 2026-02-21T12:39:47.6780791Z bar.sync 0; 2026-02-21T12:39:47.6780906Z st.shared.v4.b32 [%r24], {%r8954, %r8952, %r8953, %r8951}; 2026-02-21T12:39:47.6781116Z st.shared.v4.b32 [%r25], {%r8962, %r8959, %r8961, %r8960}; 
2026-02-21T12:39:47.6781227Z st.shared.v4.b32 [%r26], {%r8978, %r8976, %r8977, %r8975}; 2026-02-21T12:39:47.6781394Z st.shared.v4.b32 [%r27], {%r8986, %r8983, %r8985, %r8984}; 2026-02-21T12:39:47.6781454Z $L__tmp15: 2026-02-21T12:39:47.6781729Z .loc 2 291 36 // standard.py:291:36 @[ conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:87:40 ] 2026-02-21T12:39:47.6781792Z // begin inline asm 2026-02-21T12:39:47.6781881Z fence.proxy.async.shared::cta; 2026-02-21T12:39:47.6781944Z // end inline asm 2026-02-21T12:39:47.6781999Z bar.sync 0; 2026-02-21T12:39:47.6782073Z wgmma.fence.sync.aligned; 2026-02-21T12:39:47.6782137Z // begin inline asm 2026-02-21T12:39:47.6784962Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r11631,%r11632,%r11633,%r11634,%r11635,%r11636,%r11637,%r11638,%r11639,%r11640,%r11641,%r11642,%r11643,%r11644,%r11645,%r11646,%r11647,%r11648,%r11649,%r11650,%r11651,%r11652,%r11653,%r11654,%r11655,%r11656,%r11657,%r11658,%r11659,%r11660,%r11661,%r11662,%r11663,%r11664,%r11665,%r11666,%r11667,%r11668,%r11669,%r11670,%r11671,%r11672,%r11673,%r11674,%r11675,%r11676,%r11677,%r11678,%r11679,%r11680,%r11681,%r11682,%r11683,%r11684,%r11685,%r11686,%r11687,%r11688,%r11689,%r11690,%r11691,%r11692,%r11693,%r11694,%r11695,%r11696,%r11697,%r11698,%r11699,%r11700,%r11701,%r11702,%r11703,%r11704,%r11705,%r11706,%r11707,%r11708,%r11709,%r11710,%r11711,%r11712,%r11713,%r11714,%r11715,%r11716,%r11717,%r11718,%r11719,%r11720,%r11721,%r11722,%r11723,%r11724,%r11725,%r11726,%r11727,%r11728,%r11729,%r11730,%r11731,%r11732,%r11733,%r11734,%r11735,%r11736,%r11737,%r11738,%r11739,%r11740,%r11741,%r11742,%r11743,%r11744,%r11745,%r11746,%r11747,%r11748,%r11749,%r11750,%r11751,%r11752,%r11753,%r11754,%r11755,%r11756,%r11757,%r11758}, {%r8349,%r8350,%r8351,%r8352}, %rd404, %p21, 1, 1; 2026-02-21T12:39:47.6785030Z // end inline asm 2026-02-21T12:39:47.6785088Z // begin inline asm 2026-02-21T12:39:47.6787945Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r11631,%r11632,%r11633,%r11634,%r11635,%r11636,%r11637,%r11638,%r11639,%r11640,%r11641,%r11642,%r11643,%r11644,%r11645,%r11646,%r11647,%r11648,%r11649,%r11650,%r11651,%r11652,%r11653,%r11654,%r11655,%r11656,%r11657,%r11658,%r11659,%r11660,%r11661,%r11662,%r11663,%r11664,%r11665,%r11666,%r11667,%r11668,%r11669,%r11670,%r11671,%r11672,%r11673,%r11674,%r11675,%r11676,%r11677,%r11678,%r11679,%r11680,%r11681,%r11682,%r11683,%r11684,%r11685,%r11686,%r11687,%r11688,%r11689,%r11690,%r11691,%r11692,%r11693,%r11694,%r11695,%r11696,%r11697,%r11698,%r11699,%r11700,%r11701,%r11702,%r11703,%r11704,%r11705,%r11706,%r11707,%r11708,%r11709,%r11710,%r11711,%r11712,%r11713,%r11714,%r11715,%r11716,%r11717,%r11718,%r11719,%r11720,%r11721,%r11722,%r11723,%r11724,%r11725,%r11726,%r11727,%r11728,%r11729,%r11730,%r11731,%r11732,%r11733,%r11734,%r11735,%r11736,%r11737,%r11738,%r11739,%r11740,%r11741,%r11742,%r11743,%r11744,%r11745,%r11746,%r11747,%r11748,%r11749,%r11750,%r11751,%r11752,%r11753,%r11754,%r11755,%r11756,%r11757,%r11758}, {%r8609,%r8610,%r8611,%r8612}, %rd405, %p21, 1, 1; 2026-02-21T12:39:47.6788013Z // end inline asm 2026-02-21T12:39:47.6788098Z wgmma.commit_group.sync.aligned; 2026-02-21T12:39:47.6788159Z mov.b32 %r8741, %r10096; 2026-02-21T12:39:47.6788221Z mov.b32 %r8743, %r8742; 2026-02-21T12:39:47.6788279Z // begin inline asm 2026-02-21T12:39:47.6790883Z // wait for regs: 
%r11631,%r11632,%r11633,%r11634,%r11635,%r11636,%r11637,%r11638,%r11639,%r11640,%r11641,%r11642,%r11643,%r11644,%r11645,%r11646,%r11647,%r11648,%r11649,%r11650,%r11651,%r11652,%r11653,%r11654,%r11655,%r11656,%r11657,%r11658,%r11659,%r11660,%r11661,%r11662,%r11663,%r11664,%r11665,%r11666,%r11667,%r11668,%r11669,%r11670,%r11671,%r11672,%r11673,%r11674,%r11675,%r11676,%r11677,%r11678,%r11679,%r11680,%r11681,%r11682,%r11683,%r11684,%r11685,%r11686,%r11687,%r11688,%r11689,%r11690,%r11691,%r11692,%r11693,%r11694,%r11695,%r11696,%r11697,%r11698,%r11699,%r11700,%r11701,%r11702,%r11703,%r11704,%r11705,%r11706,%r11707,%r11708,%r11709,%r11710,%r11711,%r11712,%r11713,%r11714,%r11715,%r11716,%r11717,%r11718,%r11719,%r11720,%r11721,%r11722,%r11723,%r11724,%r11725,%r11726,%r11727,%r11728,%r11729,%r11730,%r11731,%r11732,%r11733,%r11734,%r11735,%r11736,%r11737,%r11738,%r11739,%r11740,%r11741,%r11742,%r11743,%r11744,%r11745,%r11746,%r11747,%r11748,%r11749,%r11750,%r11751,%r11752,%r11753,%r11754,%r11755,%r11756,%r11757,%r11758,%r8741,%r8742,%r8743 2026-02-21T12:39:47.6791110Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:39:47.6791168Z // end inline asm 2026-02-21T12:39:47.6791224Z $L__tmp16: 2026-02-21T12:39:47.6791438Z .loc 1 43 126 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:43:126 2026-02-21T12:39:47.6791506Z add.s64 %rd605, %rd605, 16; 2026-02-21T12:39:47.6791568Z add.s64 %rd604, %rd604, 64; 2026-02-21T12:39:47.6791645Z add.s64 %rd603, %rd603, 20480; 2026-02-21T12:39:47.6791718Z setp.lt.u64 %p25, %rd605, 4080; 2026-02-21T12:39:47.6791782Z @%p25 bra $L__BB0_21; 2026-02-21T12:39:47.6791897Z // %bb.22: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:39:47.6792213Z .loc 1 34 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:34:32 2026-02-21T12:39:47.6792286Z or.b64 %rd428, %rd84, %rd7; 2026-02-21T12:39:47.6792346Z or.b64 %rd429, %rd84, %rd8; 2026-02-21T12:39:47.6792408Z or.b64 %rd430, %rd84, %rd9; 2026-02-21T12:39:47.6792476Z or.b64 %rd431, %rd84, %rd10; 
2026-02-21T12:39:47.6792537Z or.b64 %rd432, %rd84, %rd11; 2026-02-21T12:39:47.6792597Z or.b64 %rd433, %rd84, %rd12; 2026-02-21T12:39:47.6792663Z or.b64 %rd434, %rd84, %rd13; 2026-02-21T12:39:47.6792725Z or.b64 %rd435, %rd84, %rd14; 2026-02-21T12:39:47.6792785Z or.b64 %rd436, %rd84, %rd15; 2026-02-21T12:39:47.6792844Z or.b64 %rd437, %rd84, %rd16; 2026-02-21T12:39:47.6792910Z or.b64 %rd438, %rd84, %rd17; 2026-02-21T12:39:47.6792971Z or.b64 %rd439, %rd84, %rd18; 2026-02-21T12:39:47.6793031Z or.b64 %rd440, %rd84, %rd19; 2026-02-21T12:39:47.6793097Z or.b64 %rd441, %rd84, %rd20; 2026-02-21T12:39:47.6793158Z or.b64 %rd442, %rd84, %rd21; 2026-02-21T12:39:47.6793222Z or.b64 %rd443, %rd84, %rd22; 2026-02-21T12:39:47.6793424Z .loc 1 90 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:90:28 2026-02-21T12:39:47.6793516Z cvt.rn.bf16x2.f32 %r9131, %r11632, %r11631; 2026-02-21T12:39:47.6793596Z cvt.rn.bf16x2.f32 %r9132, %r11634, %r11633; 2026-02-21T12:39:47.6793673Z cvt.rn.bf16x2.f32 %r9133, %r11636, %r11635; 2026-02-21T12:39:47.6793756Z cvt.rn.bf16x2.f32 %r9134, %r11638, %r11637; 2026-02-21T12:39:47.6793833Z cvt.rn.bf16x2.f32 %r9135, %r11640, %r11639; 2026-02-21T12:39:47.6793909Z cvt.rn.bf16x2.f32 %r9136, %r11642, %r11641; 2026-02-21T12:39:47.6793990Z cvt.rn.bf16x2.f32 %r9137, %r11644, %r11643; 2026-02-21T12:39:47.6794064Z cvt.rn.bf16x2.f32 %r9138, %r11646, %r11645; 2026-02-21T12:39:47.6794141Z cvt.rn.bf16x2.f32 %r9139, %r11648, %r11647; 2026-02-21T12:39:47.6794217Z cvt.rn.bf16x2.f32 %r9140, %r11650, %r11649; 2026-02-21T12:39:47.6794297Z cvt.rn.bf16x2.f32 %r9141, %r11652, %r11651; 2026-02-21T12:39:47.6794373Z cvt.rn.bf16x2.f32 %r9142, %r11654, %r11653; 2026-02-21T12:39:47.6794456Z cvt.rn.bf16x2.f32 %r9143, %r11656, %r11655; 2026-02-21T12:39:47.6794535Z cvt.rn.bf16x2.f32 %r9144, %r11658, %r11657; 2026-02-21T12:39:47.6794609Z cvt.rn.bf16x2.f32 %r9145, %r11660, %r11659; 2026-02-21T12:39:47.6794684Z cvt.rn.bf16x2.f32 %r9146, %r11662, %r11661; 2026-02-21T12:39:47.6794763Z 
cvt.rn.bf16x2.f32 %r9147, %r11664, %r11663; 2026-02-21T12:39:47.6794838Z cvt.rn.bf16x2.f32 %r9148, %r11666, %r11665; 2026-02-21T12:39:47.6794913Z cvt.rn.bf16x2.f32 %r9149, %r11668, %r11667; 2026-02-21T12:39:47.6794986Z cvt.rn.bf16x2.f32 %r9150, %r11670, %r11669; 2026-02-21T12:39:47.6795066Z cvt.rn.bf16x2.f32 %r9151, %r11672, %r11671; 2026-02-21T12:39:47.6795142Z cvt.rn.bf16x2.f32 %r9152, %r11674, %r11673; 2026-02-21T12:39:47.6795217Z cvt.rn.bf16x2.f32 %r9153, %r11676, %r11675; 2026-02-21T12:39:47.6795370Z cvt.rn.bf16x2.f32 %r9154, %r11678, %r11677; 2026-02-21T12:39:47.6795447Z cvt.rn.bf16x2.f32 %r9155, %r11680, %r11679; 2026-02-21T12:39:47.6795588Z cvt.rn.bf16x2.f32 %r9156, %r11682, %r11681; 2026-02-21T12:39:47.6795665Z cvt.rn.bf16x2.f32 %r9157, %r11684, %r11683; 2026-02-21T12:39:47.6795739Z cvt.rn.bf16x2.f32 %r9158, %r11686, %r11685; 2026-02-21T12:39:47.6795814Z cvt.rn.bf16x2.f32 %r9159, %r11688, %r11687; 2026-02-21T12:39:47.6795889Z cvt.rn.bf16x2.f32 %r9160, %r11690, %r11689; 2026-02-21T12:39:47.6795967Z cvt.rn.bf16x2.f32 %r9161, %r11692, %r11691; 2026-02-21T12:39:47.6796041Z cvt.rn.bf16x2.f32 %r9162, %r11694, %r11693; 2026-02-21T12:39:47.6796117Z cvt.rn.bf16x2.f32 %r9163, %r11696, %r11695; 2026-02-21T12:39:47.6796195Z cvt.rn.bf16x2.f32 %r9164, %r11698, %r11697; 2026-02-21T12:39:47.6796268Z cvt.rn.bf16x2.f32 %r9165, %r11700, %r11699; 2026-02-21T12:39:47.6796345Z cvt.rn.bf16x2.f32 %r9166, %r11702, %r11701; 2026-02-21T12:39:47.6796425Z cvt.rn.bf16x2.f32 %r9167, %r11704, %r11703; 2026-02-21T12:39:47.6796645Z cvt.rn.bf16x2.f32 %r9168, %r11706, %r11705; 2026-02-21T12:39:47.6796797Z cvt.rn.bf16x2.f32 %r9169, %r11708, %r11707; 2026-02-21T12:39:47.6796936Z cvt.rn.bf16x2.f32 %r9170, %r11710, %r11709; 2026-02-21T12:39:47.6797018Z cvt.rn.bf16x2.f32 %r9171, %r11712, %r11711; 2026-02-21T12:39:47.6797092Z cvt.rn.bf16x2.f32 %r9172, %r11714, %r11713; 2026-02-21T12:39:47.6797169Z cvt.rn.bf16x2.f32 %r9173, %r11716, %r11715; 2026-02-21T12:39:47.6797245Z cvt.rn.bf16x2.f32 %r9174, 
%r11718, %r11717; 2026-02-21T12:39:47.6797319Z cvt.rn.bf16x2.f32 %r9175, %r11720, %r11719; 2026-02-21T12:39:47.6797394Z cvt.rn.bf16x2.f32 %r9176, %r11722, %r11721; 2026-02-21T12:39:47.6797467Z cvt.rn.bf16x2.f32 %r9177, %r11724, %r11723; 2026-02-21T12:39:47.6797546Z cvt.rn.bf16x2.f32 %r9178, %r11726, %r11725; 2026-02-21T12:39:47.6797620Z cvt.rn.bf16x2.f32 %r9179, %r11728, %r11727; 2026-02-21T12:39:47.6797709Z cvt.rn.bf16x2.f32 %r9180, %r11730, %r11729; 2026-02-21T12:39:47.6797795Z cvt.rn.bf16x2.f32 %r9181, %r11732, %r11731; 2026-02-21T12:39:47.6797872Z cvt.rn.bf16x2.f32 %r9182, %r11734, %r11733; 2026-02-21T12:39:47.6797952Z cvt.rn.bf16x2.f32 %r9183, %r11736, %r11735; 2026-02-21T12:39:47.6798028Z cvt.rn.bf16x2.f32 %r9184, %r11738, %r11737; 2026-02-21T12:39:47.6798102Z cvt.rn.bf16x2.f32 %r9185, %r11740, %r11739; 2026-02-21T12:39:47.6798175Z cvt.rn.bf16x2.f32 %r9186, %r11742, %r11741; 2026-02-21T12:39:47.6798248Z cvt.rn.bf16x2.f32 %r9187, %r11744, %r11743; 2026-02-21T12:39:47.6798326Z cvt.rn.bf16x2.f32 %r9188, %r11746, %r11745; 2026-02-21T12:39:47.6798400Z cvt.rn.bf16x2.f32 %r9189, %r11748, %r11747; 2026-02-21T12:39:47.6798475Z cvt.rn.bf16x2.f32 %r9190, %r11750, %r11749; 2026-02-21T12:39:47.6798553Z cvt.rn.bf16x2.f32 %r9191, %r11752, %r11751; 2026-02-21T12:39:47.6798627Z cvt.rn.bf16x2.f32 %r9192, %r11754, %r11753; 2026-02-21T12:39:47.6798702Z cvt.rn.bf16x2.f32 %r9193, %r11756, %r11755; 2026-02-21T12:39:47.6798778Z cvt.rn.bf16x2.f32 %r9194, %r11758, %r11757; 2026-02-21T12:39:47.6798983Z .loc 1 91 22 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:91:22 2026-02-21T12:39:47.6799060Z mad.lo.s64 %rd444, %rd428, 2560, %rd151; 2026-02-21T12:39:47.6799124Z shl.b64 %rd445, %rd85, 1; 2026-02-21T12:39:47.6799195Z add.s64 %rd412, %rd444, %rd445; 2026-02-21T12:39:47.6799267Z mad.lo.s64 %rd446, %rd429, 2560, %rd151; 2026-02-21T12:39:47.6799332Z add.s64 %rd413, %rd446, %rd445; 2026-02-21T12:39:47.6799405Z mad.lo.s64 %rd447, %rd430, 2560, %rd151; 
2026-02-21T12:39:47.6799470Z add.s64 %rd414, %rd447, %rd445; 2026-02-21T12:39:47.6799538Z mad.lo.s64 %rd448, %rd431, 2560, %rd151; 2026-02-21T12:39:47.6799608Z add.s64 %rd415, %rd448, %rd445; 2026-02-21T12:39:47.6799688Z mad.lo.s64 %rd449, %rd432, 2560, %rd151; 2026-02-21T12:39:47.6799751Z add.s64 %rd416, %rd449, %rd445; 2026-02-21T12:39:47.6799820Z mad.lo.s64 %rd450, %rd433, 2560, %rd151; 2026-02-21T12:39:47.6799886Z add.s64 %rd417, %rd450, %rd445; 2026-02-21T12:39:47.6800041Z mad.lo.s64 %rd451, %rd434, 2560, %rd151; 2026-02-21T12:39:47.6800105Z add.s64 %rd418, %rd451, %rd445; 2026-02-21T12:39:47.6800179Z mad.lo.s64 %rd452, %rd435, 2560, %rd151; 2026-02-21T12:39:47.6800311Z add.s64 %rd419, %rd452, %rd445; 2026-02-21T12:39:47.6800381Z mad.lo.s64 %rd453, %rd436, 2560, %rd151; 2026-02-21T12:39:47.6800444Z add.s64 %rd420, %rd453, %rd445; 2026-02-21T12:39:47.6800516Z mad.lo.s64 %rd454, %rd437, 2560, %rd151; 2026-02-21T12:39:47.6800579Z add.s64 %rd421, %rd454, %rd445; 2026-02-21T12:39:47.6800649Z mad.lo.s64 %rd455, %rd438, 2560, %rd151; 2026-02-21T12:39:47.6800716Z add.s64 %rd422, %rd455, %rd445; 2026-02-21T12:39:47.6800785Z mad.lo.s64 %rd456, %rd439, 2560, %rd151; 2026-02-21T12:39:47.6800847Z add.s64 %rd423, %rd456, %rd445; 2026-02-21T12:39:47.6800923Z mad.lo.s64 %rd457, %rd440, 2560, %rd151; 2026-02-21T12:39:47.6800984Z add.s64 %rd424, %rd457, %rd445; 2026-02-21T12:39:47.6801053Z mad.lo.s64 %rd458, %rd441, 2560, %rd151; 2026-02-21T12:39:47.6801117Z add.s64 %rd425, %rd458, %rd445; 2026-02-21T12:39:47.6801190Z mad.lo.s64 %rd459, %rd442, 2560, %rd151; 2026-02-21T12:39:47.6801301Z add.s64 %rd426, %rd459, %rd445; 2026-02-21T12:39:47.6801416Z mad.lo.s64 %rd460, %rd443, 2560, %rd151; 2026-02-21T12:39:47.6801485Z add.s64 %rd427, %rd460, %rd445; 2026-02-21T12:39:47.6801683Z .loc 1 91 81 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:91:81 2026-02-21T12:39:47.6801756Z bar.sync 0; 2026-02-21T12:39:47.6801876Z st.shared.v4.b32 [%r28], {%r9131, %r9133, %r9135, %r9137}; 
2026-02-21T12:39:47.6801986Z st.shared.v4.b32 [%r29], {%r9139, %r9141, %r9143, %r9145}; 2026-02-21T12:39:47.6802089Z st.shared.v4.b32 [%r30], {%r9147, %r9149, %r9151, %r9153}; 2026-02-21T12:39:47.6802189Z st.shared.v4.b32 [%r31], {%r9155, %r9157, %r9159, %r9161}; 2026-02-21T12:39:47.6802291Z st.shared.v4.b32 [%r32], {%r9163, %r9165, %r9167, %r9169}; 2026-02-21T12:39:47.6802391Z st.shared.v4.b32 [%r33], {%r9171, %r9173, %r9175, %r9177}; 2026-02-21T12:39:47.6802495Z st.shared.v4.b32 [%r34], {%r9179, %r9181, %r9183, %r9185}; 2026-02-21T12:39:47.6802601Z st.shared.v4.b32 [%r35], {%r9187, %r9189, %r9191, %r9193}; 2026-02-21T12:39:47.6802661Z bar.sync 0; 2026-02-21T12:39:47.6802723Z // begin inline asm 2026-02-21T12:39:47.6802930Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8987, %r8988, %r8989, %r8990}, [%r3291]; 2026-02-21T12:39:47.6802990Z // end inline asm 2026-02-21T12:39:47.6803050Z // begin inline asm 2026-02-21T12:39:47.6803236Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8992, %r8993, %r8994, %r8995}, [%r3296]; 2026-02-21T12:39:47.6803298Z // end inline asm 2026-02-21T12:39:47.6803357Z // begin inline asm 2026-02-21T12:39:47.6803538Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8997, %r8998, %r8999, %r9000}, [%r3301]; 2026-02-21T12:39:47.6803596Z // end inline asm 2026-02-21T12:39:47.6803654Z // begin inline asm 2026-02-21T12:39:47.6803830Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9002, %r9003, %r9004, %r9005}, [%r3306]; 2026-02-21T12:39:47.6803891Z // end inline asm 2026-02-21T12:39:47.6803954Z // begin inline asm 2026-02-21T12:39:47.6804136Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9007, %r9008, %r9009, %r9010}, [%r3311]; 2026-02-21T12:39:47.6804195Z // end inline asm 2026-02-21T12:39:47.6804257Z // begin inline asm 2026-02-21T12:39:47.6804435Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9012, %r9013, %r9014, %r9015}, [%r3316]; 2026-02-21T12:39:47.6804498Z // end inline asm 2026-02-21T12:39:47.6804559Z // begin inline asm 2026-02-21T12:39:47.6804736Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9017, %r9018, %r9019, %r9020}, [%r3321]; 2026-02-21T12:39:47.6804791Z // end inline asm 2026-02-21T12:39:47.6804853Z // begin inline asm 2026-02-21T12:39:47.6805034Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9022, %r9023, %r9024, %r9025}, [%r3326]; 2026-02-21T12:39:47.6805089Z // end inline asm 2026-02-21T12:39:47.6805143Z bar.sync 0; 2026-02-21T12:39:47.6805250Z st.shared.v4.b32 [%r28], {%r9132, %r9134, %r9136, %r9138}; 2026-02-21T12:39:47.6805413Z st.shared.v4.b32 [%r29], {%r9140, %r9142, %r9144, %r9146}; 2026-02-21T12:39:47.6805518Z st.shared.v4.b32 [%r30], {%r9148, %r9150, %r9152, %r9154}; 2026-02-21T12:39:47.6805687Z st.shared.v4.b32 [%r31], {%r9156, %r9158, %r9160, %r9162}; 2026-02-21T12:39:47.6805790Z st.shared.v4.b32 [%r32], {%r9164, %r9166, %r9168, %r9170}; 2026-02-21T12:39:47.6805889Z st.shared.v4.b32 [%r33], {%r9172, %r9174, %r9176, %r9178}; 2026-02-21T12:39:47.6805988Z st.shared.v4.b32 [%r34], {%r9180, %r9182, %r9184, %r9186}; 2026-02-21T12:39:47.6806093Z st.shared.v4.b32 [%r35], {%r9188, %r9190, %r9192, %r9194}; 2026-02-21T12:39:47.6806148Z bar.sync 0; 2026-02-21T12:39:47.6806207Z // begin inline asm 2026-02-21T12:39:47.6806391Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9027, %r9028, %r9029, %r9030}, [%r3291]; 2026-02-21T12:39:47.6806550Z // end inline asm 2026-02-21T12:39:47.6806613Z // begin inline asm 2026-02-21T12:39:47.6806797Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9032, %r9033, %r9034, %r9035}, [%r3296]; 2026-02-21T12:39:47.6806857Z // end inline asm 2026-02-21T12:39:47.6806917Z // begin inline asm 2026-02-21T12:39:47.6807221Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9037, %r9038, %r9039, %r9040}, [%r3301]; 2026-02-21T12:39:47.6807297Z // end inline asm 2026-02-21T12:39:47.6807358Z // begin inline asm 2026-02-21T12:39:47.6807535Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9042, %r9043, %r9044, %r9045}, [%r3306]; 2026-02-21T12:39:47.6807594Z // end inline asm 2026-02-21T12:39:47.6807652Z // 
begin inline asm 2026-02-21T12:39:47.6807830Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9047, %r9048, %r9049, %r9050}, [%r3311]; 2026-02-21T12:39:47.6807886Z // end inline asm 2026-02-21T12:39:47.6807948Z // begin inline asm 2026-02-21T12:39:47.6808122Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9052, %r9053, %r9054, %r9055}, [%r3316]; 2026-02-21T12:39:47.6808179Z // end inline asm 2026-02-21T12:39:47.6808240Z // begin inline asm 2026-02-21T12:39:47.6808421Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9057, %r9058, %r9059, %r9060}, [%r3321]; 2026-02-21T12:39:47.6808478Z // end inline asm 2026-02-21T12:39:47.6808541Z // begin inline asm 2026-02-21T12:39:47.6808721Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9062, %r9063, %r9064, %r9065}, [%r3326]; 2026-02-21T12:39:47.6808777Z // end inline asm 2026-02-21T12:39:47.6808836Z // begin inline asm 2026-02-21T12:39:47.6808967Z st.global.v4.b32 [ %rd412 + 0 ], { %r8987, %r8988, %r8989, %r8990 }; 2026-02-21T12:39:47.6809024Z // end inline asm 2026-02-21T12:39:47.6809083Z // begin inline asm 2026-02-21T12:39:47.6809206Z st.global.v4.b32 [ %rd413 + 0 ], { %r9027, %r9028, %r9029, %r9030 }; 2026-02-21T12:39:47.6809265Z // end inline asm 2026-02-21T12:39:47.6809334Z // begin inline asm 2026-02-21T12:39:47.6809452Z st.global.v4.b32 [ %rd414 + 0 ], { %r8992, %r8993, %r8994, %r8995 }; 2026-02-21T12:39:47.6809513Z // end inline asm 2026-02-21T12:39:47.6809570Z // begin inline asm 2026-02-21T12:39:47.6809688Z st.global.v4.b32 [ %rd415 + 0 ], { %r9032, %r9033, %r9034, %r9035 }; 2026-02-21T12:39:47.6809749Z // end inline asm 2026-02-21T12:39:47.6809808Z // begin inline asm 2026-02-21T12:39:47.6809929Z st.global.v4.b32 [ %rd416 + 0 ], { %r8997, %r8998, %r8999, %r9000 }; 2026-02-21T12:39:47.6809990Z // end inline asm 2026-02-21T12:39:47.6810046Z // begin inline asm 2026-02-21T12:39:47.6810160Z st.global.v4.b32 [ %rd417 + 0 ], { %r9037, %r9038, %r9039, %r9040 }; 2026-02-21T12:39:47.6810214Z // end inline asm 
2026-02-21T12:39:47.6810274Z // begin inline asm 2026-02-21T12:39:47.6810387Z st.global.v4.b32 [ %rd418 + 0 ], { %r9002, %r9003, %r9004, %r9005 }; 2026-02-21T12:39:47.6810443Z // end inline asm 2026-02-21T12:39:47.6810503Z // begin inline asm 2026-02-21T12:39:47.6810617Z st.global.v4.b32 [ %rd419 + 0 ], { %r9042, %r9043, %r9044, %r9045 }; 2026-02-21T12:39:47.6810673Z // end inline asm 2026-02-21T12:39:47.6810730Z // begin inline asm 2026-02-21T12:39:47.6810846Z st.global.v4.b32 [ %rd420 + 0 ], { %r9007, %r9008, %r9009, %r9010 }; 2026-02-21T12:39:47.6810982Z // end inline asm 2026-02-21T12:39:47.6811040Z // begin inline asm 2026-02-21T12:39:47.6811167Z st.global.v4.b32 [ %rd421 + 0 ], { %r9047, %r9048, %r9049, %r9050 }; 2026-02-21T12:39:47.6811290Z // end inline asm 2026-02-21T12:39:47.6811348Z // begin inline asm 2026-02-21T12:39:47.6811467Z st.global.v4.b32 [ %rd422 + 0 ], { %r9012, %r9013, %r9014, %r9015 }; 2026-02-21T12:39:47.6811525Z // end inline asm 2026-02-21T12:39:47.6811583Z // begin inline asm 2026-02-21T12:39:47.6811699Z st.global.v4.b32 [ %rd423 + 0 ], { %r9052, %r9053, %r9054, %r9055 }; 2026-02-21T12:39:47.6811758Z // end inline asm 2026-02-21T12:39:47.6811817Z // begin inline asm 2026-02-21T12:39:47.6811929Z st.global.v4.b32 [ %rd424 + 0 ], { %r9017, %r9018, %r9019, %r9020 }; 2026-02-21T12:39:47.6811987Z // end inline asm 2026-02-21T12:39:47.6812047Z // begin inline asm 2026-02-21T12:39:47.6812159Z st.global.v4.b32 [ %rd425 + 0 ], { %r9057, %r9058, %r9059, %r9060 }; 2026-02-21T12:39:47.6812218Z // end inline asm 2026-02-21T12:39:47.6812292Z // begin inline asm 2026-02-21T12:39:47.6812466Z st.global.v4.b32 [ %rd426 + 0 ], { %r9022, %r9023, %r9024, %r9025 }; 2026-02-21T12:39:47.6812568Z // end inline asm 2026-02-21T12:39:47.6812631Z // begin inline asm 2026-02-21T12:39:47.6812745Z st.global.v4.b32 [ %rd427 + 0 ], { %r9062, %r9063, %r9064, %r9065 }; 2026-02-21T12:39:47.6812801Z // end inline asm 2026-02-21T12:39:47.6813020Z .loc 1 22 120 // 
conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:22:120 2026-02-21T12:39:47.6813087Z add.s64 %rd589, %rd589, 4; 2026-02-21T12:39:47.6813157Z setp.lt.u64 %p26, %rd589, %rd629; 2026-02-21T12:39:47.6813217Z @%p26 bra $L__BB0_2; 2026-02-21T12:39:47.6813311Z $L__BB0_23: // %._crit_edge 2026-02-21T12:39:47.6813508Z .loc 1 0 0 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:0 2026-02-21T12:39:47.6813577Z cvt.u64.u32 %rd4, %r1504; 2026-02-21T12:39:47.6813647Z cvt.u64.u32 %rd5, %r1505; 2026-02-21T12:39:47.6813707Z shl.b32 %r7, %r6, 2; 2026-02-21T12:39:47.6813918Z .loc 1 22 120 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:22:120 2026-02-21T12:39:47.6813999Z sub.s64 %rd95, %rd2, %rd629; 2026-02-21T12:39:47.6814201Z .loc 1 28 35 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:28:35 2026-02-21T12:39:47.6814286Z mul.hi.s64 %rd461, %rd629, 7378697629483820647; 2026-02-21T12:39:47.6814349Z shr.u64 %rd462, %rd461, 63; 2026-02-21T12:39:47.6814415Z shr.s64 %rd463, %rd461, 4; 2026-02-21T12:39:47.6814480Z add.s64 %rd464, %rd463, %rd462; 2026-02-21T12:39:47.6814689Z .loc 1 29 33 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:29:33 2026-02-21T12:39:47.6814759Z shl.b64 %rd97, %rd464, 3; 2026-02-21T12:39:47.6814956Z .loc 1 30 39 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:30:39 2026-02-21T12:39:47.6815020Z sub.s64 %rd465, 2048, %rd97; 2026-02-21T12:39:47.6815217Z .loc 1 30 52 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:30:52 2026-02-21T12:39:47.6815284Z min.s64 %rd98, %rd465, 8; 2026-02-21T12:39:47.6815477Z .loc 1 31 45 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:31:45 2026-02-21T12:39:47.6815542Z mul.lo.s64 %rd466, %rd464, 40; 2026-02-21T12:39:47.6815609Z sub.s64 %rd99, %rd629, %rd466; 2026-02-21T12:39:47.6815803Z .loc 1 32 51 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:32:51 2026-02-21T12:39:47.6815868Z or.b64 %rd467, %rd99, %rd98; 2026-02-21T12:39:47.6815945Z and.b64 
%rd468, %rd467, -4294967296; 2026-02-21T12:39:47.6816011Z setp.ne.b64 %p27, %rd468, 0; 2026-02-21T12:39:47.6816072Z @%p27 bra $L__BB0_25; 2026-02-21T12:39:47.6816135Z bra.uni $L__BB0_24; 2026-02-21T12:39:47.6816190Z $L__BB0_25: 2026-02-21T12:39:47.6816255Z div.s64 %rd606, %rd99, %rd98; 2026-02-21T12:39:47.6816405Z bra.uni $L__BB0_26; 2026-02-21T12:39:47.6816754Z $L__BB0_24: 2026-02-21T12:39:47.6816822Z cvt.u32.u64 %r9195, %rd98; 2026-02-21T12:39:47.6816887Z cvt.u32.u64 %r9196, %rd99; 2026-02-21T12:39:47.6817050Z div.u32 %r9197, %r9196, %r9195; 2026-02-21T12:39:47.6817114Z cvt.u64.u32 %rd606, %r9197; 2026-02-21T12:39:47.6817167Z $L__BB0_26: 2026-02-21T12:39:47.6817380Z .loc 1 22 120 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:22:120 2026-02-21T12:39:47.6817450Z setp.lt.s64 %p28, %rd95, 1; 2026-02-21T12:39:47.6817515Z setp.gt.s64 %p29, %rd95, 0; 2026-02-21T12:39:47.6817730Z .loc 1 31 64 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:31:64 2026-02-21T12:39:47.6817804Z mul.lo.s64 %rd489, %rd606, %rd98; 2026-02-21T12:39:47.6817868Z sub.s64 %rd490, %rd99, %rd489; 2026-02-21T12:39:47.6818064Z .loc 1 31 30 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:31:30 2026-02-21T12:39:47.6818132Z add.s64 %rd491, %rd490, %rd97; 2026-02-21T12:39:47.6818327Z .loc 1 33 27 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:33:27 2026-02-21T12:39:47.6818517Z shl.b64 %rd607, %rd491, 7; 2026-02-21T12:39:47.6818717Z .loc 1 34 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:34:32 2026-02-21T12:39:47.6818786Z or.b64 %rd630, %rd607, %rd4; 2026-02-21T12:39:47.6818847Z or.b64 %rd631, %rd607, %rd5; 2026-02-21T12:39:47.6819041Z .loc 1 51 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:32 2026-02-21T12:39:47.6819113Z shl.b64 %rd492, %rd630, 14; 2026-02-21T12:39:47.6819178Z add.s64 %rd493, %rd149, %rd492; 2026-02-21T12:39:47.6819246Z mul.wide.u32 %rd494, %r7, 2; 2026-02-21T12:39:47.6819316Z add.s64 %rd469, %rd493, %rd494; 
2026-02-21T12:39:47.6823304Z shl.b64 %rd495, %rd631, 14; 2026-02-21T12:39:47.6823420Z add.s64 %rd496, %rd149, %rd495; 2026-02-21T12:39:47.6823491Z add.s64 %rd470, %rd496, %rd494; 2026-02-21T12:39:47.6823735Z .loc 1 51 80 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:80 2026-02-21T12:39:47.6823810Z and.b32 %r9238, %r8, 1912; 2026-02-21T12:39:47.6823886Z setp.eq.b32 %p30, %r11759, 0; 2026-02-21T12:39:47.6823960Z selp.b32 %r1070, 0, 136, %p30; 2026-02-21T12:39:47.6824029Z xor.b32 %r1071, %r1070, %r9238; 2026-02-21T12:39:47.6824095Z add.s32 %r9240, %r10096, %r1071; 2026-02-21T12:39:47.6824166Z add.s32 %r9198, %r9240, 32768; 2026-02-21T12:39:47.6824232Z selp.b32 %r9199, 8, 0, %p29; 2026-02-21T12:39:47.6824294Z // begin inline asm 2026-02-21T12:39:47.6824452Z cp.async.ca.shared.global [ %r9198 + 0 ], [ %rd469 + 0 ], 0x8, %r9199; 2026-02-21T12:39:47.6824518Z // end inline asm 2026-02-21T12:39:47.6824581Z add.s32 %r9200, %r9240, 34816; 2026-02-21T12:39:47.6824642Z // begin inline asm 2026-02-21T12:39:47.6824791Z cp.async.ca.shared.global [ %r9200 + 0 ], [ %rd470 + 0 ], 0x8, %r9199; 2026-02-21T12:39:47.6824861Z // end inline asm 2026-02-21T12:39:47.6824934Z cp.async.commit_group; 2026-02-21T12:39:47.6825160Z .loc 1 51 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:32 2026-02-21T12:39:47.6825232Z add.s64 %rd471, %rd469, 32; 2026-02-21T12:39:47.6825296Z add.s64 %rd472, %rd470, 32; 2026-02-21T12:39:47.6825504Z .loc 1 51 80 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:80 2026-02-21T12:39:47.6825575Z add.s32 %r9202, %r9240, 53248; 2026-02-21T12:39:47.6825635Z // begin inline asm 2026-02-21T12:39:47.6825776Z cp.async.ca.shared.global [ %r9202 + 0 ], [ %rd471 + 0 ], 0x8, %r9199; 2026-02-21T12:39:47.6825840Z // end inline asm 2026-02-21T12:39:47.6825903Z add.s32 %r9204, %r9240, 55296; 2026-02-21T12:39:47.6825963Z // begin inline asm 2026-02-21T12:39:47.6826098Z cp.async.ca.shared.global [ %r9204 + 0 ], [ %rd472 + 0 ], 0x8, %r9199; 
2026-02-21T12:39:47.6826160Z // end inline asm 2026-02-21T12:39:47.6826229Z cp.async.commit_group; 2026-02-21T12:39:47.6826753Z .loc 1 51 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:32 2026-02-21T12:39:47.6826829Z add.s64 %rd473, %rd469, 64; 2026-02-21T12:39:47.6826974Z add.s64 %rd474, %rd470, 64; 2026-02-21T12:39:47.6827186Z .loc 1 51 80 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:80 2026-02-21T12:39:47.6827251Z bar.sync 0; 2026-02-21T12:39:47.6827317Z add.s32 %r9206, %r9240, 36864; 2026-02-21T12:39:47.6827379Z // begin inline asm 2026-02-21T12:39:47.6827527Z cp.async.ca.shared.global [ %r9206 + 0 ], [ %rd473 + 0 ], 0x8, %r9199; 2026-02-21T12:39:47.6827594Z // end inline asm 2026-02-21T12:39:47.6827665Z add.s32 %r9208, %r9240, 38912; 2026-02-21T12:39:47.6827726Z // begin inline asm 2026-02-21T12:39:47.6827869Z cp.async.ca.shared.global [ %r9208 + 0 ], [ %rd474 + 0 ], 0x8, %r9199; 2026-02-21T12:39:47.6827928Z // end inline asm 2026-02-21T12:39:47.6827995Z cp.async.commit_group; 2026-02-21T12:39:47.6828208Z .loc 1 51 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:32 2026-02-21T12:39:47.6828279Z add.s64 %rd475, %rd469, 96; 2026-02-21T12:39:47.6828539Z add.s64 %rd476, %rd470, 96; 2026-02-21T12:39:47.6828752Z .loc 1 51 80 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:80 2026-02-21T12:39:47.6828821Z add.s32 %r9210, %r9240, 57344; 2026-02-21T12:39:47.6828882Z // begin inline asm 2026-02-21T12:39:47.6829021Z cp.async.ca.shared.global [ %r9210 + 0 ], [ %rd475 + 0 ], 0x8, %r9199; 2026-02-21T12:39:47.6829082Z // end inline asm 2026-02-21T12:39:47.6829144Z add.s32 %r9212, %r9240, 59392; 2026-02-21T12:39:47.6829204Z // begin inline asm 2026-02-21T12:39:47.6829338Z cp.async.ca.shared.global [ %r9212 + 0 ], [ %rd476 + 0 ], 0x8, %r9199; 2026-02-21T12:39:47.6829399Z // end inline asm 2026-02-21T12:39:47.6829466Z cp.async.commit_group; 2026-02-21T12:39:47.6829666Z .loc 1 51 32 // 
conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:32 2026-02-21T12:39:47.6829738Z add.s64 %rd477, %rd469, 128; 2026-02-21T12:39:47.6829802Z add.s64 %rd478, %rd470, 128; 2026-02-21T12:39:47.6830018Z .loc 1 51 80 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:80 2026-02-21T12:39:47.6830080Z bar.sync 0; 2026-02-21T12:39:47.6830143Z add.s32 %r9214, %r9240, 40960; 2026-02-21T12:39:47.6830202Z // begin inline asm 2026-02-21T12:39:47.6830337Z cp.async.ca.shared.global [ %r9214 + 0 ], [ %rd477 + 0 ], 0x8, %r9199; 2026-02-21T12:39:47.6830397Z // end inline asm 2026-02-21T12:39:47.6830459Z add.s32 %r9216, %r9240, 43008; 2026-02-21T12:39:47.6830519Z // begin inline asm 2026-02-21T12:39:47.6830661Z cp.async.ca.shared.global [ %r9216 + 0 ], [ %rd478 + 0 ], 0x8, %r9199; 2026-02-21T12:39:47.6830718Z // end inline asm 2026-02-21T12:39:47.6830784Z cp.async.commit_group; 2026-02-21T12:39:47.6830981Z .loc 1 51 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:32 2026-02-21T12:39:47.6831046Z add.s64 %rd479, %rd469, 160; 2026-02-21T12:39:47.6831107Z add.s64 %rd480, %rd470, 160; 2026-02-21T12:39:47.6831307Z .loc 1 51 80 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:80 2026-02-21T12:39:47.6831376Z add.s32 %r9218, %r9240, 61440; 2026-02-21T12:39:47.6831436Z // begin inline asm 2026-02-21T12:39:47.6831576Z cp.async.ca.shared.global [ %r9218 + 0 ], [ %rd479 + 0 ], 0x8, %r9199; 2026-02-21T12:39:47.6831640Z // end inline asm 2026-02-21T12:39:47.6831703Z add.s32 %r9220, %r9240, 63488; 2026-02-21T12:39:47.6831764Z // begin inline asm 2026-02-21T12:39:47.6831903Z cp.async.ca.shared.global [ %r9220 + 0 ], [ %rd480 + 0 ], 0x8, %r9199; 2026-02-21T12:39:47.6831965Z // end inline asm 2026-02-21T12:39:47.6832032Z cp.async.commit_group; 2026-02-21T12:39:47.6832241Z .loc 1 51 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:32 2026-02-21T12:39:47.6832313Z add.s64 %rd481, %rd469, 192; 2026-02-21T12:39:47.6832466Z add.s64 %rd482, 
%rd470, 192; 2026-02-21T12:39:47.6832678Z .loc 1 51 80 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:80 2026-02-21T12:39:47.6832788Z bar.sync 0; 2026-02-21T12:39:47.6832851Z add.s32 %r9222, %r9240, 45056; 2026-02-21T12:39:47.6832912Z // begin inline asm 2026-02-21T12:39:47.6833051Z cp.async.ca.shared.global [ %r9222 + 0 ], [ %rd481 + 0 ], 0x8, %r9199; 2026-02-21T12:39:47.6833113Z // end inline asm 2026-02-21T12:39:47.6833175Z add.s32 %r9224, %r9240, 47104; 2026-02-21T12:39:47.6833235Z // begin inline asm 2026-02-21T12:39:47.6833373Z cp.async.ca.shared.global [ %r9224 + 0 ], [ %rd482 + 0 ], 0x8, %r9199; 2026-02-21T12:39:47.6833431Z // end inline asm 2026-02-21T12:39:47.6833497Z cp.async.commit_group; 2026-02-21T12:39:47.6833703Z .loc 1 51 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:32 2026-02-21T12:39:47.6833768Z add.s64 %rd483, %rd469, 224; 2026-02-21T12:39:47.6833833Z add.s64 %rd484, %rd470, 224; 2026-02-21T12:39:47.6834030Z .loc 1 51 80 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:80 2026-02-21T12:39:47.6834200Z add.s32 %r9226, %r9240, 65536; 2026-02-21T12:39:47.6834263Z // begin inline asm 2026-02-21T12:39:47.6834397Z cp.async.ca.shared.global [ %r9226 + 0 ], [ %rd483 + 0 ], 0x8, %r9199; 2026-02-21T12:39:47.6834457Z // end inline asm 2026-02-21T12:39:47.6834517Z add.s32 %r9228, %r9240, 67584; 2026-02-21T12:39:47.6834578Z // begin inline asm 2026-02-21T12:39:47.6834716Z cp.async.ca.shared.global [ %r9228 + 0 ], [ %rd484 + 0 ], 0x8, %r9199; 2026-02-21T12:39:47.6834780Z // end inline asm 2026-02-21T12:39:47.6834846Z cp.async.commit_group; 2026-02-21T12:39:47.6835044Z .loc 1 51 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:32 2026-02-21T12:39:47.6835111Z add.s64 %rd485, %rd469, 256; 2026-02-21T12:39:47.6835172Z add.s64 %rd486, %rd470, 256; 2026-02-21T12:39:47.6835369Z .loc 1 51 80 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:80 2026-02-21T12:39:47.6835432Z bar.sync 0; 
2026-02-21T12:39:47.6835498Z add.s32 %r9230, %r9240, 49152; 2026-02-21T12:39:47.6835558Z // begin inline asm 2026-02-21T12:39:47.6835692Z cp.async.ca.shared.global [ %r9230 + 0 ], [ %rd485 + 0 ], 0x8, %r9199; 2026-02-21T12:39:47.6835752Z // end inline asm 2026-02-21T12:39:47.6835812Z add.s32 %r9232, %r9240, 51200; 2026-02-21T12:39:47.6835872Z // begin inline asm 2026-02-21T12:39:47.6836006Z cp.async.ca.shared.global [ %r9232 + 0 ], [ %rd486 + 0 ], 0x8, %r9199; 2026-02-21T12:39:47.6836063Z // end inline asm 2026-02-21T12:39:47.6836130Z cp.async.commit_group; 2026-02-21T12:39:47.6836345Z .loc 1 51 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:32 2026-02-21T12:39:47.6836412Z add.s64 %rd487, %rd469, 288; 2026-02-21T12:39:47.6836589Z add.s64 %rd488, %rd470, 288; 2026-02-21T12:39:47.6836803Z .loc 1 51 80 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:80 2026-02-21T12:39:47.6836871Z add.s32 %r9234, %r9240, 69632; 2026-02-21T12:39:47.6836934Z // begin inline asm 2026-02-21T12:39:47.6837073Z cp.async.ca.shared.global [ %r9234 + 0 ], [ %rd487 + 0 ], 0x8, %r9199; 2026-02-21T12:39:47.6837132Z // end inline asm 2026-02-21T12:39:47.6837194Z add.s32 %r9236, %r9240, 71680; 2026-02-21T12:39:47.6837253Z // begin inline asm 2026-02-21T12:39:47.6837398Z cp.async.ca.shared.global [ %r9236 + 0 ], [ %rd488 + 0 ], 0x8, %r9199; 2026-02-21T12:39:47.6837460Z // end inline asm 2026-02-21T12:39:47.6837525Z cp.async.commit_group; 2026-02-21T12:39:47.6837735Z .loc 1 22 120 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:22:120 2026-02-21T12:39:47.6837804Z @%p28 bra $L__BB0_36; 2026-02-21T12:39:47.6837897Z // %bb.27: // %.lr.ph122 2026-02-21T12:39:47.6838092Z .loc 1 0 0 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:0 2026-02-21T12:39:47.6838241Z shl.b64 %rd96, %rd95, 8; 2026-02-21T12:39:47.6838442Z .loc 1 35 27 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:35:27 2026-02-21T12:39:47.6838581Z shl.b64 %rd502, %rd606, 8; 
2026-02-21T12:39:47.6838785Z .loc 1 36 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:36:32 2026-02-21T12:39:47.6838852Z or.b64 %rd615, %rd502, %rd23; 2026-02-21T12:39:47.6838918Z add.s64 %rd107, %rd96, -5; 2026-02-21T12:39:47.6838985Z or.b32 %r9257, %r11232, %r11233; 2026-02-21T12:39:47.6839053Z or.b32 %r9258, %r9257, %r11234; 2026-02-21T12:39:47.6839116Z or.b32 %r1072, %r9258, %r1070; 2026-02-21T12:39:47.6839176Z xor.b32 %r1073, %r1072, 8; 2026-02-21T12:39:47.6839240Z or.b32 %r9262, %r11238, %r11237; 2026-02-21T12:39:47.6839303Z or.b32 %r9263, %r9262, %r11236; 2026-02-21T12:39:47.6839367Z add.s32 %r1074, %r10096, %r9263; 2026-02-21T12:39:47.6839428Z xor.b32 %r9265, %r9263, 16; 2026-02-21T12:39:47.6839499Z add.s32 %r1075, %r10096, %r9265; 2026-02-21T12:39:47.6839558Z xor.b32 %r9266, %r9263, 32; 2026-02-21T12:39:47.6839684Z add.s32 %r1076, %r10096, %r9266; 2026-02-21T12:39:47.6839837Z xor.b32 %r9267, %r9263, 48; 2026-02-21T12:39:47.6839902Z add.s32 %r1077, %r10096, %r9267; 2026-02-21T12:39:47.6839963Z xor.b32 %r9268, %r9263, 64; 2026-02-21T12:39:47.6840029Z add.s32 %r1078, %r10096, %r9268; 2026-02-21T12:39:47.6840091Z xor.b32 %r9269, %r9263, 80; 2026-02-21T12:39:47.6840152Z add.s32 %r1079, %r10096, %r9269; 2026-02-21T12:39:47.6840211Z xor.b32 %r9270, %r9263, 96; 2026-02-21T12:39:47.6840278Z add.s32 %r1080, %r10096, %r9270; 2026-02-21T12:39:47.6840339Z xor.b32 %r9271, %r9263, 112; 2026-02-21T12:39:47.6840398Z add.s32 %r1081, %r10096, %r9271; 2026-02-21T12:39:47.6840461Z shl.b32 %r9273, %r11239, 8; 2026-02-21T12:39:47.6840519Z shl.b32 %r9274, %r11239, 4; 2026-02-21T12:39:47.6840580Z shr.u32 %r9276, %r11242, 1; 2026-02-21T12:39:47.6840644Z xor.b32 %r9277, %r9274, %r9276; 2026-02-21T12:39:47.6840710Z add.s32 %r9278, %r10096, %r9273; 2026-02-21T12:39:47.6840773Z add.s32 %r1082, %r9278, %r9277; 2026-02-21T12:39:47.6840839Z and.b32 %r9280, %r11240, 16320; 2026-02-21T12:39:47.6840906Z or.b32 %r9282, %r9280, %r11241; 2026-02-21T12:39:47.6840968Z add.s32 
%r1083, %r10096, %r9282; 2026-02-21T12:39:47.6841041Z xor.b32 %r9283, %r9282, 16; 2026-02-21T12:39:47.6841106Z add.s32 %r1084, %r10096, %r9283; 2026-02-21T12:39:47.6841169Z xor.b32 %r9284, %r9282, 32; 2026-02-21T12:39:47.6841232Z add.s32 %r1085, %r10096, %r9284; 2026-02-21T12:39:47.6841293Z xor.b32 %r9285, %r9282, 48; 2026-02-21T12:39:47.6841359Z add.s32 %r1086, %r10096, %r9285; 2026-02-21T12:39:47.6841422Z bfe.u32 %r9286, %r10096, 4, 14; 2026-02-21T12:39:47.6841487Z cvt.u64.u32 %rd503, %r9286; 2026-02-21T12:39:47.6841574Z or.b64 %rd519, %rd503, -9223371899348713472; 2026-02-21T12:39:47.6841637Z add.s32 %r9287, %r10096, 32; 2026-02-21T12:39:47.6841701Z bfe.u32 %r9288, %r9287, 4, 14; 2026-02-21T12:39:47.6841768Z cvt.u64.u32 %rd504, %r9288; 2026-02-21T12:39:47.6841849Z or.b64 %rd520, %rd504, -9223371899348713472; 2026-02-21T12:39:47.6841913Z shl.b32 %r9289, %r11242, 4; 2026-02-21T12:39:47.6841979Z and.b32 %r9291, %r11243, 4112; 2026-02-21T12:39:47.6842044Z or.b32 %r9292, %r9291, %r9289; 2026-02-21T12:39:47.6842116Z mad.lo.s32 %r9293, %r6, 8224, %r9292; 2026-02-21T12:39:47.6842183Z add.s32 %r1087, %r10096, %r9293; 2026-02-21T12:39:47.6842244Z xor.b32 %r9294, %r9293, 16; 2026-02-21T12:39:47.6842313Z add.s32 %r1088, %r10096, %r9294; 2026-02-21T12:39:47.6842375Z xor.b32 %r9295, %r9293, 32; 2026-02-21T12:39:47.6842437Z add.s32 %r1089, %r10096, %r9295; 2026-02-21T12:39:47.6842501Z xor.b32 %r9296, %r9293, 48; 2026-02-21T12:39:47.6842562Z add.s32 %r1090, %r10096, %r9296; 2026-02-21T12:39:47.6842620Z xor.b32 %r9297, %r9293, 64; 2026-02-21T12:39:47.6842681Z add.s32 %r1091, %r10096, %r9297; 2026-02-21T12:39:47.6842742Z xor.b32 %r9298, %r9293, 80; 2026-02-21T12:39:47.6842865Z add.s32 %r1092, %r10096, %r9298; 2026-02-21T12:39:47.6842928Z xor.b32 %r9299, %r9293, 96; 2026-02-21T12:39:47.6843001Z add.s32 %r1093, %r10096, %r9299; 2026-02-21T12:39:47.6843119Z xor.b32 %r9300, %r9293, 112; 2026-02-21T12:39:47.6843182Z add.s32 %r1094, %r10096, %r9300; 2026-02-21T12:39:47.6843246Z shl.b32 
%r9302, %r11244, 10; 2026-02-21T12:39:47.6843305Z shl.b32 %r9303, %r11244, 2; 2026-02-21T12:39:47.6843367Z and.b32 %r9305, %r11245, 384; 2026-02-21T12:39:47.6843429Z and.b32 %r9307, %r11246, 4112; 2026-02-21T12:39:47.6843492Z or.b32 %r9308, %r9302, %r9274; 2026-02-21T12:39:47.6843552Z or.b32 %r9309, %r9303, %r9305; 2026-02-21T12:39:47.6843616Z xor.b32 %r9310, %r9308, %r9309; 2026-02-21T12:39:47.6843680Z xor.b32 %r9311, %r9310, %r9307; 2026-02-21T12:39:47.6843745Z add.s32 %r11026, %r10096, %r9311; 2026-02-21T12:39:47.6843807Z add.s32 %r11031, %r11026, 512; 2026-02-21T12:39:47.6843868Z add.s32 %r11036, %r11026, 1024; 2026-02-21T12:39:47.6843932Z add.s32 %r11041, %r11026, 1536; 2026-02-21T12:39:47.6843994Z add.s32 %r11046, %r11026, 2048; 2026-02-21T12:39:47.6844052Z add.s32 %r11051, %r11026, 2560; 2026-02-21T12:39:47.6844209Z add.s32 %r11056, %r11026, 3072; 2026-02-21T12:39:47.6844272Z add.s32 %r11061, %r11026, 3584; 2026-02-21T12:39:47.6844330Z mov.b64 %rd622, 4; 2026-02-21T12:39:47.6844391Z mov.b32 %r11772, 0f00000000; 2026-02-21T12:39:47.6844451Z mov.b32 %r11771, 4; 2026-02-21T12:39:47.6844512Z mov.b32 %r11770, -1; 2026-02-21T12:39:47.6844569Z mov.b32 %r11769, 0; 2026-02-21T12:39:47.6844630Z mov.b32 %r11768, 16; 2026-02-21T12:39:47.6844686Z mov.b32 %r11767, 32; 2026-02-21T12:39:47.6844742Z mov.b32 %r11766, 48; 2026-02-21T12:39:47.6844802Z mov.b32 %r11765, 64; 2026-02-21T12:39:47.6844862Z mov.b32 %r11764, 8; 2026-02-21T12:39:47.6844920Z mov.b32 %r11763, 24; 2026-02-21T12:39:47.6844978Z mov.b32 %r11762, 40; 2026-02-21T12:39:47.6845037Z mov.b32 %r11761, 56; 2026-02-21T12:39:47.6845096Z mov.b32 %r11760, 72; 2026-02-21T12:39:47.6845156Z mov.b64 %rd614, 0; 2026-02-21T12:39:47.6845215Z mov.b64 %rd613, 1; 2026-02-21T12:39:47.6845274Z mov.b64 %rd612, 2; 2026-02-21T12:39:47.6845329Z mov.b64 %rd611, 3; 2026-02-21T12:39:47.6845417Z prmt.b32 %r10911, %r10912, %r10913, 0x3340U; 2026-02-21T12:39:47.6845485Z mov.b64 %rd608, %rd607; 2026-02-21T12:39:47.6845546Z mov.b64 %rd609, 
%rd607; 2026-02-21T12:39:47.6845604Z mov.b64 %rd610, %rd607; 2026-02-21T12:39:47.6845669Z mov.b64 %rd616, %rd615; 2026-02-21T12:39:47.6845728Z mov.b64 %rd617, %rd615; 2026-02-21T12:39:47.6845788Z mov.b64 %rd618, %rd615; 2026-02-21T12:39:47.6845847Z mov.b64 %rd619, %rd615; 2026-02-21T12:39:47.6845910Z mov.b64 %rd620, %rd607; 2026-02-21T12:39:47.6845973Z mov.b32 %r11773, %r11772; 2026-02-21T12:39:47.6846032Z mov.b32 %r11774, %r11772; 2026-02-21T12:39:47.6846094Z mov.b32 %r11775, %r11772; 2026-02-21T12:39:47.6846163Z mov.b32 %r11776, %r11772; 2026-02-21T12:39:47.6846225Z mov.b32 %r11777, %r11772; 2026-02-21T12:39:47.6846283Z mov.b32 %r11778, %r11772; 2026-02-21T12:39:47.6846347Z mov.b32 %r11779, %r11772; 2026-02-21T12:39:47.6846405Z mov.b32 %r11780, %r11772; 2026-02-21T12:39:47.6846577Z mov.b32 %r11781, %r11772; 2026-02-21T12:39:47.6846666Z mov.b32 %r11782, %r11772; 2026-02-21T12:39:47.6846725Z mov.b32 %r11783, %r11772; 2026-02-21T12:39:47.6846784Z mov.b32 %r11784, %r11772; 2026-02-21T12:39:47.6846842Z mov.b32 %r11785, %r11772; 2026-02-21T12:39:47.6846905Z mov.b32 %r11786, %r11772; 2026-02-21T12:39:47.6846962Z mov.b32 %r11787, %r11772; 2026-02-21T12:39:47.6847022Z mov.b32 %r11788, %r11772; 2026-02-21T12:39:47.6847095Z mov.b32 %r11789, %r11772; 2026-02-21T12:39:47.6847155Z mov.b32 %r11790, %r11772; 2026-02-21T12:39:47.6847214Z mov.b32 %r11791, %r11772; 2026-02-21T12:39:47.6847273Z mov.b32 %r11792, %r11772; 2026-02-21T12:39:47.6847334Z mov.b32 %r11793, %r11772; 2026-02-21T12:39:47.6847392Z mov.b32 %r11794, %r11772; 2026-02-21T12:39:47.6847450Z mov.b32 %r11795, %r11772; 2026-02-21T12:39:47.6847511Z mov.b32 %r11796, %r11772; 2026-02-21T12:39:47.6847667Z mov.b32 %r11797, %r11772; 2026-02-21T12:39:47.6847726Z mov.b32 %r11798, %r11772; 2026-02-21T12:39:47.6847788Z mov.b32 %r11799, %r11772; 2026-02-21T12:39:47.6847911Z mov.b32 %r11800, %r11772; 2026-02-21T12:39:47.6847971Z mov.b32 %r11801, %r11772; 2026-02-21T12:39:47.6848029Z mov.b32 %r11802, %r11772; 
2026-02-21T12:39:47.6848090Z mov.b32 %r11803, %r11772; 2026-02-21T12:39:47.6848147Z mov.b32 %r11804, %r11772; 2026-02-21T12:39:47.6848205Z mov.b32 %r11805, %r11772; 2026-02-21T12:39:47.6848264Z mov.b32 %r11806, %r11772; 2026-02-21T12:39:47.6848325Z mov.b32 %r11807, %r11772; 2026-02-21T12:39:47.6848384Z mov.b32 %r11808, %r11772; 2026-02-21T12:39:47.6848445Z mov.b32 %r11809, %r11772; 2026-02-21T12:39:47.6848507Z mov.b32 %r11810, %r11772; 2026-02-21T12:39:47.6848564Z mov.b32 %r11811, %r11772; 2026-02-21T12:39:47.6848622Z mov.b32 %r11812, %r11772; 2026-02-21T12:39:47.6848685Z mov.b32 %r11813, %r11772; 2026-02-21T12:39:47.6848743Z mov.b32 %r11814, %r11772; 2026-02-21T12:39:47.6848804Z mov.b32 %r11815, %r11772; 2026-02-21T12:39:47.6848863Z mov.b32 %r11816, %r11772; 2026-02-21T12:39:47.6848989Z mov.b32 %r11817, %r11772; 2026-02-21T12:39:47.6849108Z mov.b32 %r11818, %r11772; 2026-02-21T12:39:47.6849168Z mov.b32 %r11819, %r11772; 2026-02-21T12:39:47.6849232Z mov.b32 %r11820, %r11772; 2026-02-21T12:39:47.6849291Z mov.b32 %r11821, %r11772; 2026-02-21T12:39:47.6849349Z mov.b32 %r11822, %r11772; 2026-02-21T12:39:47.6849409Z mov.b32 %r11823, %r11772; 2026-02-21T12:39:47.6849466Z mov.b32 %r11824, %r11772; 2026-02-21T12:39:47.6849523Z mov.b32 %r11825, %r11772; 2026-02-21T12:39:47.6849592Z mov.b32 %r11826, %r11772; 2026-02-21T12:39:47.6849655Z mov.b32 %r11827, %r11772; 2026-02-21T12:39:47.6849714Z mov.b32 %r11828, %r11772; 2026-02-21T12:39:47.6849772Z mov.b32 %r11829, %r11772; 2026-02-21T12:39:47.6849833Z mov.b32 %r11830, %r11772; 2026-02-21T12:39:47.6849892Z mov.b32 %r11831, %r11772; 2026-02-21T12:39:47.6849950Z mov.b32 %r11832, %r11772; 2026-02-21T12:39:47.6850010Z mov.b32 %r11833, %r11772; 2026-02-21T12:39:47.6850073Z mov.b32 %r11834, %r11772; 2026-02-21T12:39:47.6850133Z mov.b32 %r11835, %r11772; 2026-02-21T12:39:47.6850194Z mov.b32 %r11836, %r11772; 2026-02-21T12:39:47.6850257Z mov.b32 %r11837, %r11772; 2026-02-21T12:39:47.6850315Z mov.b32 %r11838, %r11772; 
2026-02-21T12:39:47.6850373Z mov.b32 %r11839, %r11772; 2026-02-21T12:39:47.6850431Z mov.b32 %r11840, %r11772; 2026-02-21T12:39:47.6850493Z mov.b32 %r11841, %r11772; 2026-02-21T12:39:47.6850552Z mov.b32 %r11842, %r11772; 2026-02-21T12:39:47.6850610Z mov.b32 %r11843, %r11772; 2026-02-21T12:39:47.6850671Z mov.b32 %r11844, %r11772; 2026-02-21T12:39:47.6850730Z mov.b32 %r11845, %r11772; 2026-02-21T12:39:47.6850788Z mov.b32 %r11846, %r11772; 2026-02-21T12:39:47.6850846Z mov.b32 %r11847, %r11772; 2026-02-21T12:39:47.6850909Z mov.b32 %r11848, %r11772; 2026-02-21T12:39:47.6850970Z mov.b32 %r11849, %r11772; 2026-02-21T12:39:47.6851029Z mov.b32 %r11850, %r11772; 2026-02-21T12:39:47.6851093Z mov.b32 %r11851, %r11772; 2026-02-21T12:39:47.6851151Z mov.b32 %r11852, %r11772; 2026-02-21T12:39:47.6851221Z mov.b32 %r11853, %r11772; 2026-02-21T12:39:47.6851287Z mov.b32 %r11854, %r11772; 2026-02-21T12:39:47.6851349Z mov.b32 %r11855, %r11772; 2026-02-21T12:39:47.6851408Z mov.b32 %r11856, %r11772; 2026-02-21T12:39:47.6851467Z mov.b32 %r11857, %r11772; 2026-02-21T12:39:47.6851529Z mov.b32 %r11858, %r11772; 2026-02-21T12:39:47.6851589Z mov.b32 %r11859, %r11772; 2026-02-21T12:39:47.6851646Z mov.b32 %r11860, %r11772; 2026-02-21T12:39:47.6851706Z mov.b32 %r11861, %r11772; 2026-02-21T12:39:47.6851769Z mov.b32 %r11862, %r11772; 2026-02-21T12:39:47.6851827Z mov.b32 %r11863, %r11772; 2026-02-21T12:39:47.6851885Z mov.b32 %r11864, %r11772; 2026-02-21T12:39:47.6851948Z mov.b32 %r11865, %r11772; 2026-02-21T12:39:47.6852004Z mov.b32 %r11866, %r11772; 2026-02-21T12:39:47.6852062Z mov.b32 %r11867, %r11772; 2026-02-21T12:39:47.6852123Z mov.b32 %r11868, %r11772; 2026-02-21T12:39:47.6852244Z mov.b32 %r11869, %r11772; 2026-02-21T12:39:47.6852301Z mov.b32 %r11870, %r11772; 2026-02-21T12:39:47.6852360Z mov.b32 %r11871, %r11772; 2026-02-21T12:39:47.6852468Z mov.b32 %r11872, %r11772; 2026-02-21T12:39:47.6852527Z mov.b32 %r11873, %r11772; 2026-02-21T12:39:47.6852586Z mov.b32 %r11874, %r11772; 
2026-02-21T12:39:47.6852648Z mov.b32 %r11875, %r11772; 2026-02-21T12:39:47.6852706Z mov.b32 %r11876, %r11772; 2026-02-21T12:39:47.6852765Z mov.b32 %r11877, %r11772; 2026-02-21T12:39:47.6852833Z mov.b32 %r11878, %r11772; 2026-02-21T12:39:47.6852897Z mov.b32 %r11879, %r11772; 2026-02-21T12:39:47.6852956Z mov.b32 %r11880, %r11772; 2026-02-21T12:39:47.6853014Z mov.b32 %r11881, %r11772; 2026-02-21T12:39:47.6853074Z mov.b32 %r11882, %r11772; 2026-02-21T12:39:47.6853132Z mov.b32 %r11883, %r11772; 2026-02-21T12:39:47.6853189Z mov.b32 %r11884, %r11772; 2026-02-21T12:39:47.6853247Z mov.b32 %r11885, %r11772; 2026-02-21T12:39:47.6853308Z mov.b32 %r11886, %r11772; 2026-02-21T12:39:47.6853368Z mov.b32 %r11887, %r11772; 2026-02-21T12:39:47.6853426Z mov.b32 %r11888, %r11772; 2026-02-21T12:39:47.6853540Z mov.b32 %r11889, %r11772; 2026-02-21T12:39:47.6853646Z mov.b32 %r11890, %r11772; 2026-02-21T12:39:47.6853705Z mov.b32 %r11891, %r11772; 2026-02-21T12:39:47.6853764Z mov.b32 %r11892, %r11772; 2026-02-21T12:39:47.6853827Z mov.b32 %r11893, %r11772; 2026-02-21T12:39:47.6853883Z mov.b32 %r11894, %r11772; 2026-02-21T12:39:47.6853941Z mov.b32 %r11895, %r11772; 2026-02-21T12:39:47.6854006Z mov.b32 %r11896, %r11772; 2026-02-21T12:39:47.6854071Z mov.b32 %r11897, %r11772; 2026-02-21T12:39:47.6854130Z mov.b32 %r11898, %r11772; 2026-02-21T12:39:47.6854189Z mov.b32 %r11899, %r11772; 2026-02-21T12:39:47.6854253Z mov.b64 %rd623, %rd614; 2026-02-21T12:39:47.6854313Z mov.b64 %rd627, %rd620; 2026-02-21T12:39:47.6854374Z mov.b64 %rd628, %rd619; 2026-02-21T12:39:47.6854436Z bra.uni $L__BB0_28; 2026-02-21T12:39:47.6854560Z $L__BB0_35: // in Loop: Header=BB0_28 Depth=1 2026-02-21T12:39:47.6854779Z .loc 1 22 120 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:22:120 2026-02-21T12:39:47.6854852Z add.s64 %rd623, %rd623, 1; 2026-02-21T12:39:47.6854925Z setp.ne.b64 %p43, %rd96, %rd623; 2026-02-21T12:39:47.6854989Z mov.b64 %rd607, %rd620; 2026-02-21T12:39:47.6855048Z mov.b64 %rd610, %rd114; 
2026-02-21T12:39:47.6855110Z mov.b64 %rd611, %rd622; 2026-02-21T12:39:47.6855180Z mov.b64 %rd614, %rd118; 2026-02-21T12:39:47.6855248Z mov.b32 %r11764, %r1106; 2026-02-21T12:39:47.6855309Z mov.b64 %rd615, %rd619; 2026-02-21T12:39:47.6855367Z mov.b64 %rd618, %rd122; 2026-02-21T12:39:47.6855429Z mov.b32 %r11769, %r1111; 2026-02-21T12:39:47.6855486Z mov.b64 %rd619, %rd628; 2026-02-21T12:39:47.6855546Z mov.b64 %rd620, %rd627; 2026-02-21T12:39:47.6855604Z mov.b64 %rd622, %rd131; 2026-02-21T12:39:47.6855667Z @%p43 bra $L__BB0_28; 2026-02-21T12:39:47.6855726Z bra.uni $L__BB0_36; 2026-02-21T12:39:47.6855914Z $L__BB0_28: // =>This Inner Loop Header: Depth=1 2026-02-21T12:39:47.6856249Z .loc 1 0 120 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:0:120 2026-02-21T12:39:47.6856340Z mov.b32 %r1111, %r11768; 2026-02-21T12:39:47.6856425Z mov.b32 %r11768, %r11767; 2026-02-21T12:39:47.6856672Z mov.b32 %r11767, %r11766; 2026-02-21T12:39:47.6856769Z mov.b32 %r11766, %r11765; 2026-02-21T12:39:47.6856859Z mov.b64 %rd122, %rd617; 2026-02-21T12:39:47.6856945Z mov.b64 %rd617, %rd616; 2026-02-21T12:39:47.6857030Z mov.b64 %rd616, %rd615; 2026-02-21T12:39:47.6857116Z mov.b32 %r1106, %r11763; 2026-02-21T12:39:47.6857202Z mov.b32 %r11763, %r11762; 2026-02-21T12:39:47.6857289Z mov.b32 %r11762, %r11761; 2026-02-21T12:39:47.6857380Z mov.b32 %r11761, %r11760; 2026-02-21T12:39:47.6857471Z mov.b64 %rd118, %rd613; 2026-02-21T12:39:47.6857557Z mov.b64 %rd613, %rd612; 2026-02-21T12:39:47.6857645Z mov.b64 %rd612, %rd611; 2026-02-21T12:39:47.6857854Z mov.b64 %rd114, %rd609; 2026-02-21T12:39:47.6857936Z mov.b64 %rd609, %rd608; 2026-02-21T12:39:47.6858024Z mov.b64 %rd608, %rd607; 2026-02-21T12:39:47.6858430Z .loc 1 22 120 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:22:120 2026-02-21T12:39:47.6858526Z add.s64 %rd505, %rd622, 1; 2026-02-21T12:39:47.6858621Z setp.eq.b64 %p31, %rd622, 255; 2026-02-21T12:39:47.6858722Z selp.b64 %rd131, 0, %rd505, %p31; 2026-02-21T12:39:47.6858826Z 
setp.ne.b64 %p32, %rd131, 0; 2026-02-21T12:39:47.6858924Z @%p32 bra $L__BB0_33; 2026-02-21T12:39:47.6859104Z // %bb.29: // in Loop: Header=BB0_28 Depth=1 2026-02-21T12:39:47.6859209Z add.s64 %rd629, %rd629, 1; 2026-02-21T12:39:47.6859544Z .loc 1 28 35 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:28:35 2026-02-21T12:39:47.6859688Z mul.hi.s64 %rd506, %rd629, 7378697629483820647; 2026-02-21T12:39:47.6859791Z shr.u64 %rd507, %rd506, 63; 2026-02-21T12:39:47.6859892Z shr.s64 %rd508, %rd506, 4; 2026-02-21T12:39:47.6859999Z add.s64 %rd509, %rd508, %rd507; 2026-02-21T12:39:47.6860576Z .loc 1 29 33 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:29:33 2026-02-21T12:39:47.6860696Z shl.b64 %rd133, %rd509, 3; 2026-02-21T12:39:47.6861051Z .loc 1 30 39 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:30:39 2026-02-21T12:39:47.6861151Z sub.s64 %rd510, 2048, %rd133; 2026-02-21T12:39:47.6861473Z .loc 1 30 52 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:30:52 2026-02-21T12:39:47.6861572Z min.s64 %rd134, %rd510, 8; 2026-02-21T12:39:47.6861874Z .loc 1 31 45 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:31:45 2026-02-21T12:39:47.6861963Z mul.lo.s64 %rd511, %rd509, 40; 2026-02-21T12:39:47.6862049Z sub.s64 %rd135, %rd629, %rd511; 2026-02-21T12:39:47.6862346Z .loc 1 32 51 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:32:51 2026-02-21T12:39:47.6862460Z or.b64 %rd512, %rd135, %rd134; 2026-02-21T12:39:47.6862584Z and.b64 %rd513, %rd512, -4294967296; 2026-02-21T12:39:47.6862664Z setp.ne.b64 %p33, %rd513, 0; 2026-02-21T12:39:47.6862728Z @%p33 bra $L__BB0_31; 2026-02-21T12:39:47.6862787Z bra.uni $L__BB0_30; 2026-02-21T12:39:47.6862905Z $L__BB0_31: // in Loop: Header=BB0_28 Depth=1 2026-02-21T12:39:47.6862974Z div.s64 %rd626, %rd135, %rd134; 2026-02-21T12:39:47.6863034Z bra.uni $L__BB0_32; 2026-02-21T12:39:47.6863146Z $L__BB0_30: // in Loop: Header=BB0_28 Depth=1 2026-02-21T12:39:47.6863211Z cvt.u32.u64 %r9312, %rd134; 
2026-02-21T12:39:47.6863276Z cvt.u32.u64 %r9313, %rd135; 2026-02-21T12:39:47.6863340Z div.u32 %r9314, %r9313, %r9312; 2026-02-21T12:39:47.6863401Z cvt.u64.u32 %rd626, %r9314; 2026-02-21T12:39:47.6863513Z $L__BB0_32: // in Loop: Header=BB0_28 Depth=1 2026-02-21T12:39:47.6863723Z .loc 1 31 64 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:31:64 2026-02-21T12:39:47.6863795Z mul.lo.s64 %rd514, %rd626, %rd134; 2026-02-21T12:39:47.6863876Z sub.s64 %rd515, %rd135, %rd514; 2026-02-21T12:39:47.6864077Z .loc 1 31 30 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:31:30 2026-02-21T12:39:47.6864141Z add.s64 %rd516, %rd515, %rd133; 2026-02-21T12:39:47.6864335Z .loc 1 33 27 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:33:27 2026-02-21T12:39:47.6864400Z shl.b64 %rd627, %rd516, 7; 2026-02-21T12:39:47.6864596Z .loc 1 34 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:34:32 2026-02-21T12:39:47.6864659Z or.b64 %rd630, %rd627, %rd4; 2026-02-21T12:39:47.6864723Z or.b64 %rd631, %rd627, %rd5; 2026-02-21T12:39:47.6864919Z .loc 1 35 27 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:35:27 2026-02-21T12:39:47.6865086Z shl.b64 %rd517, %rd626, 8; 2026-02-21T12:39:47.6865291Z .loc 1 36 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:36:32 2026-02-21T12:39:47.6865410Z or.b64 %rd628, %rd517, %rd23; 2026-02-21T12:39:47.6865518Z $L__BB0_33: // in Loop: Header=BB0_28 Depth=1 2026-02-21T12:39:47.6865840Z .loc 1 22 120 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:22:120 2026-02-21T12:39:47.6865925Z setp.eq.b64 %p38, %rd131, 0; 2026-02-21T12:39:47.6865999Z setp.lt.s64 %p39, %rd623, %rd107; 2026-02-21T12:39:47.6866063Z add.s32 %r10891, %r11770, 1; 2026-02-21T12:39:47.6866132Z setp.gt.s32 %p40, %r10891, 4; 2026-02-21T12:39:47.6866207Z selp.b32 %r11770, 0, %r10891, %p40; 2026-02-21T12:39:47.6866418Z .loc 1 44 35 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:44:35 2026-02-21T12:39:47.6866605Z cvt.s64.s32 
%rd528, %r11769; 2026-02-21T12:39:47.6866687Z add.s64 %rd529, %rd528, %rd7; 2026-02-21T12:39:47.6866890Z .loc 1 51 80 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:80 2026-02-21T12:39:47.6867105Z cp.async.wait_group 8; 2026-02-21T12:39:47.6867167Z bar.sync 0; 2026-02-21T12:39:47.6867231Z shl.b32 %r10892, %r11770, 12; 2026-02-21T12:39:47.6867296Z add.s32 %r10893, %r10096, 32768; 2026-02-21T12:39:47.6867363Z add.s32 %r10894, %r10893, %r10892; 2026-02-21T12:39:47.6867562Z .loc 1 55 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:55:32 2026-02-21T12:39:47.6867638Z add.s32 %r10895, %r10894, %r1072; 2026-02-21T12:39:47.6867715Z ld.shared.b16 %rs449, [%r10895]; 2026-02-21T12:39:47.6867788Z ld.shared.b16 %rs450, [%r10895+256]; 2026-02-21T12:39:47.6867856Z ld.shared.b16 %rs451, [%r10895+16]; 2026-02-21T12:39:47.6867924Z ld.shared.b16 %rs452, [%r10895+272]; 2026-02-21T12:39:47.6867990Z add.s32 %r10896, %r10894, %r1073; 2026-02-21T12:39:47.6868056Z ld.shared.b16 %rs453, [%r10896]; 2026-02-21T12:39:47.6868124Z ld.shared.b16 %rs454, [%r10896+256]; 2026-02-21T12:39:47.6868193Z ld.shared.b16 %rs455, [%r10896+16]; 2026-02-21T12:39:47.6868265Z ld.shared.b16 %rs456, [%r10896+272]; 2026-02-21T12:39:47.6868332Z cvt.f32.bf16 %r9573, %rs449; 2026-02-21T12:39:47.6868495Z cvt.f32.bf16 %r9574, %rs450; 2026-02-21T12:39:47.6868559Z cvt.f32.bf16 %r9575, %rs453; 2026-02-21T12:39:47.6868621Z cvt.f32.bf16 %r9576, %rs454; 2026-02-21T12:39:47.6868682Z cvt.f32.bf16 %r9833, %rs451; 2026-02-21T12:39:47.6868746Z cvt.f32.bf16 %r9834, %rs452; 2026-02-21T12:39:47.6868807Z cvt.f32.bf16 %r9835, %rs455; 2026-02-21T12:39:47.6868868Z cvt.f32.bf16 %r9836, %rs456; 2026-02-21T12:39:47.6869075Z .loc 1 57 34 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:57:34 2026-02-21T12:39:47.6869149Z mad.lo.s64 %rd530, %rd529, 1280, %rd150; 2026-02-21T12:39:47.6869215Z add.s64 %rd518, %rd530, %rd618; 2026-02-21T12:39:47.6869411Z .loc 1 57 87 // 
conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:57:87 2026-02-21T12:39:47.6869479Z // begin inline asm 2026-02-21T12:39:47.6869541Z mov.u32 %r9315, 0x0; 2026-02-21T12:39:47.6869601Z mov.u32 %r9316, 0x0; 2026-02-21T12:39:47.6869708Z ld.global.v2.b32 { %r9315, %r9316 }, [ %rd518 + 0 ]; 2026-02-21T12:39:47.6869767Z // end inline asm 2026-02-21T12:39:47.6869975Z .loc 1 65 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:65:28 2026-02-21T12:39:47.6870045Z st.shared.b8 [%r1074], %r9315; 2026-02-21T12:39:47.6870118Z prmt.b32 %r10897, %r9315, 0, 0x7771U; 2026-02-21T12:39:47.6870187Z st.shared.b8 [%r1075+256], %r10897; 2026-02-21T12:39:47.6870254Z prmt.b32 %r10898, %r9315, 0, 0x7772U; 2026-02-21T12:39:47.6870324Z st.shared.b8 [%r1076+512], %r10898; 2026-02-21T12:39:47.6870391Z prmt.b32 %r10899, %r9315, 0, 0x7773U; 2026-02-21T12:39:47.6870458Z st.shared.b8 [%r1077+768], %r10899; 2026-02-21T12:39:47.6870526Z st.shared.b8 [%r1078+1024], %r9316; 2026-02-21T12:39:47.6870672Z prmt.b32 %r10900, %r9316, 0, 0x7771U; 2026-02-21T12:39:47.6870739Z st.shared.b8 [%r1079+1280], %r10900; 2026-02-21T12:39:47.6870881Z prmt.b32 %r10901, %r9316, 0, 0x7772U; 2026-02-21T12:39:47.6870949Z st.shared.b8 [%r1080+1536], %r10901; 2026-02-21T12:39:47.6871012Z prmt.b32 %r10902, %r9316, 0, 0x7773U; 2026-02-21T12:39:47.6871076Z st.shared.b8 [%r1081+1792], %r10902; 2026-02-21T12:39:47.6871135Z bar.sync 0; 2026-02-21T12:39:47.6871201Z ld.shared.b32 %r10903, [%r1082]; 2026-02-21T12:39:47.6871270Z prmt.b32 %r10904, %r10903, 0, 0x7771U; 2026-02-21T12:39:47.6871338Z cvt.u16.u32 %rs457, %r10904; 2026-02-21T12:39:47.6871405Z prmt.b32 %r10905, %r10903, 0, 0x7770U; 2026-02-21T12:39:47.6871467Z cvt.u16.u32 %rs458, %r10905; 2026-02-21T12:39:47.6871532Z prmt.b32 %r10906, %r10903, 0, 0x7773U; 2026-02-21T12:39:47.6871595Z cvt.u16.u32 %rs459, %r10906; 2026-02-21T12:39:47.6871660Z prmt.b32 %r10907, %r10903, 0, 0x7772U; 2026-02-21T12:39:47.6871724Z cvt.u16.u32 %rs460, %r10907; 
2026-02-21T12:39:47.6871936Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6872097Z shl.b16 %rs461, %rs458, 4; 2026-02-21T12:39:47.6872163Z shl.b16 %rs462, %rs457, 4; 2026-02-21T12:39:47.6872365Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6872425Z cvt.u32.u16 %r10908, %rs461; 2026-02-21T12:39:47.6872509Z prmt.b32 %r10909, %r10908, %r10910, 0x3340U; 2026-02-21T12:39:47.6872586Z prmt.b32 %r10914, %r10909, %r10911, 0x5410U; 2026-02-21T12:39:47.6872667Z prmt.b32 %r10915, %r10914, %r10903, 0x5040U; 2026-02-21T12:39:47.6872734Z prmt.b32 %r10916, %r10915, 0, 0x9991U; 2026-02-21T12:39:47.6872796Z cvt.u16.u32 %rs463, %r10916; 2026-02-21T12:39:47.6872861Z shr.s16 %rs464, %rs463, 4; 2026-02-21T12:39:47.6872927Z prmt.b32 %r10917, %r10915, 0, 0xbbb3U; 2026-02-21T12:39:47.6872990Z cvt.u16.u32 %rs465, %r10917; 2026-02-21T12:39:47.6873053Z shr.s16 %rs466, %rs465, 4; 2026-02-21T12:39:47.6873117Z cvt.s16.s8 %rs467, %rs461; 2026-02-21T12:39:47.6873178Z shr.s16 %rs468, %rs467, 4; 2026-02-21T12:39:47.6873243Z cvt.s16.s8 %rs469, %rs462; 2026-02-21T12:39:47.6873307Z shr.s16 %rs470, %rs469, 4; 2026-02-21T12:39:47.6873508Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6873575Z cvt.rn.f32.s16 %r10918, %rs466; 2026-02-21T12:39:47.6873639Z cvt.rn.f32.s16 %r10919, %rs464; 2026-02-21T12:39:47.6873704Z cvt.rn.f32.s16 %r10920, %rs470; 2026-02-21T12:39:47.6873767Z cvt.rn.f32.s16 %r10921, %rs468; 2026-02-21T12:39:47.6873964Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6874030Z shl.b16 %rs471, %rs460, 4; 2026-02-21T12:39:47.6874092Z shl.b16 %rs472, %rs459, 4; 2026-02-21T12:39:47.6874286Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6874371Z prmt.b32 %r10922, %r10903, %r10923, 0x3020U; 2026-02-21T12:39:47.6874437Z prmt.b32 %r10924, 
%r10922, 0, 0x9991U; 2026-02-21T12:39:47.6874509Z cvt.u16.u32 %rs473, %r10924; 2026-02-21T12:39:47.6874571Z shr.s16 %rs474, %rs473, 4; 2026-02-21T12:39:47.6874633Z cvt.s16.s8 %rs475, %rs471; 2026-02-21T12:39:47.6874692Z shr.s16 %rs476, %rs475, 4; 2026-02-21T12:39:47.6874751Z cvt.s16.s8 %rs477, %rs472; 2026-02-21T12:39:47.6874814Z shr.s16 %rs478, %rs477, 4; 2026-02-21T12:39:47.6874883Z prmt.b32 %r10925, %r10903, 0, 0xbbb3U; 2026-02-21T12:39:47.6874944Z cvt.u16.u32 %rs479, %r10925; 2026-02-21T12:39:47.6875006Z shr.s16 %rs480, %rs479, 4; 2026-02-21T12:39:47.6875203Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6875267Z cvt.rn.f32.s16 %r10926, %rs474; 2026-02-21T12:39:47.6875329Z cvt.rn.f32.s16 %r10927, %rs480; 2026-02-21T12:39:47.6875395Z cvt.rn.f32.s16 %r10928, %rs478; 2026-02-21T12:39:47.6875531Z cvt.rn.f32.s16 %r10929, %rs476; 2026-02-21T12:39:47.6875728Z .loc 1 65 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:65:28 2026-02-21T12:39:47.6875849Z ld.shared.b32 %r10930, [%r1082+128]; 2026-02-21T12:39:47.6875917Z prmt.b32 %r10931, %r10930, 0, 0x7771U; 2026-02-21T12:39:47.6875979Z cvt.u16.u32 %rs481, %r10931; 2026-02-21T12:39:47.6876049Z prmt.b32 %r10932, %r10930, 0, 0x7770U; 2026-02-21T12:39:47.6876112Z cvt.u16.u32 %rs482, %r10932; 2026-02-21T12:39:47.6876177Z prmt.b32 %r10933, %r10930, 0, 0x7773U; 2026-02-21T12:39:47.6876238Z cvt.u16.u32 %rs483, %r10933; 2026-02-21T12:39:47.6876305Z prmt.b32 %r10934, %r10930, 0, 0x7772U; 2026-02-21T12:39:47.6876366Z cvt.u16.u32 %rs484, %r10934; 2026-02-21T12:39:47.6876717Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6876790Z shl.b16 %rs485, %rs482, 4; 2026-02-21T12:39:47.6876851Z shl.b16 %rs486, %rs481, 4; 2026-02-21T12:39:47.6877049Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6877202Z cvt.u32.u16 %r10935, %rs485; 2026-02-21T12:39:47.6877342Z prmt.b32 %r10936, 
%r10935, %r10937, 0x3340U; 2026-02-21T12:39:47.6877421Z prmt.b32 %r10938, %r10936, %r10911, 0x5410U; 2026-02-21T12:39:47.6877496Z prmt.b32 %r10939, %r10938, %r10930, 0x5040U; 2026-02-21T12:39:47.6877565Z prmt.b32 %r10940, %r10939, 0, 0x9991U; 2026-02-21T12:39:47.6877627Z cvt.u16.u32 %rs487, %r10940; 2026-02-21T12:39:47.6877702Z shr.s16 %rs488, %rs487, 4; 2026-02-21T12:39:47.6877773Z prmt.b32 %r10941, %r10939, 0, 0xbbb3U; 2026-02-21T12:39:47.6877833Z cvt.u16.u32 %rs489, %r10941; 2026-02-21T12:39:47.6877896Z shr.s16 %rs490, %rs489, 4; 2026-02-21T12:39:47.6877957Z cvt.s16.s8 %rs491, %rs485; 2026-02-21T12:39:47.6878020Z shr.s16 %rs492, %rs491, 4; 2026-02-21T12:39:47.6878081Z cvt.s16.s8 %rs493, %rs486; 2026-02-21T12:39:47.6878141Z shr.s16 %rs494, %rs493, 4; 2026-02-21T12:39:47.6878342Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6878411Z cvt.rn.f32.s16 %r10942, %rs490; 2026-02-21T12:39:47.6878474Z cvt.rn.f32.s16 %r10943, %rs488; 2026-02-21T12:39:47.6878539Z cvt.rn.f32.s16 %r10944, %rs494; 2026-02-21T12:39:47.6878600Z cvt.rn.f32.s16 %r10945, %rs492; 2026-02-21T12:39:47.6878796Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6878869Z shl.b16 %rs495, %rs484, 4; 2026-02-21T12:39:47.6878935Z shl.b16 %rs496, %rs483, 4; 2026-02-21T12:39:47.6879131Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6879208Z prmt.b32 %r10946, %r10930, %r10947, 0x3020U; 2026-02-21T12:39:47.6879277Z prmt.b32 %r10948, %r10946, 0, 0x9991U; 2026-02-21T12:39:47.6879339Z cvt.u16.u32 %rs497, %r10948; 2026-02-21T12:39:47.6879400Z shr.s16 %rs498, %rs497, 4; 2026-02-21T12:39:47.6879466Z cvt.s16.s8 %rs499, %rs495; 2026-02-21T12:39:47.6879525Z shr.s16 %rs500, %rs499, 4; 2026-02-21T12:39:47.6879588Z cvt.s16.s8 %rs501, %rs496; 2026-02-21T12:39:47.6879650Z shr.s16 %rs502, %rs501, 4; 2026-02-21T12:39:47.6879719Z prmt.b32 %r10949, %r10930, 0, 0xbbb3U; 
2026-02-21T12:39:47.6879781Z cvt.u16.u32 %rs503, %r10949; 2026-02-21T12:39:47.6879840Z shr.s16 %rs504, %rs503, 4; 2026-02-21T12:39:47.6880036Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6880100Z cvt.rn.f32.s16 %r10950, %rs498; 2026-02-21T12:39:47.6880162Z cvt.rn.f32.s16 %r10951, %rs504; 2026-02-21T12:39:47.6880223Z cvt.rn.f32.s16 %r10952, %rs502; 2026-02-21T12:39:47.6880286Z cvt.rn.f32.s16 %r10953, %rs500; 2026-02-21T12:39:47.6880340Z bar.sync 0; 2026-02-21T12:39:47.6880468Z st.shared.v4.b32 [%r1083], {%r10921, %r10919, %r10920, %r10918}; 2026-02-21T12:39:47.6880589Z st.shared.v4.b32 [%r1084], {%r10929, %r10926, %r10928, %r10927}; 2026-02-21T12:39:47.6880805Z st.shared.v4.b32 [%r1085], {%r10945, %r10943, %r10944, %r10942}; 2026-02-21T12:39:47.6880930Z st.shared.v4.b32 [%r1086], {%r10953, %r10950, %r10952, %r10951}; 2026-02-21T12:39:47.6881083Z $L__tmp17: 2026-02-21T12:39:47.6881372Z .loc 2 291 36 // standard.py:291:36 @[ conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:87:40 ] 2026-02-21T12:39:47.6881435Z // begin inline asm 2026-02-21T12:39:47.6881521Z fence.proxy.async.shared::cta; 2026-02-21T12:39:47.6881582Z // end inline asm 2026-02-21T12:39:47.6881640Z bar.sync 0; 2026-02-21T12:39:47.6881727Z shfl.sync.idx.b32 %r10954, %r2, 0, 31, -1; 2026-02-21T12:39:47.6881802Z wgmma.fence.sync.aligned; 2026-02-21T12:39:47.6881868Z mov.pred %p34, -1; 2026-02-21T12:39:47.6881927Z // begin inline asm 2026-02-21T12:39:47.6884698Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r11772,%r11773,%r11774,%r11775,%r11776,%r11777,%r11778,%r11779,%r11780,%r11781,%r11782,%r11783,%r11784,%r11785,%r11786,%r11787,%r11788,%r11789,%r11790,%r11791,%r11792,%r11793,%r11794,%r11795,%r11796,%r11797,%r11798,%r11799,%r11800,%r11801,%r11802,%r11803,%r11804,%r11805,%r11806,%r11807,%r11808,%r11809,%r11810,%r11811,%r11812,%r11813,%r11814,%r11815,%r11816,%r11817,%r11818,%r11819,%r11820,%r11821,%r11822,%r11823,%r11824,%r11825,%r11826,%r11827,%r11828,%r11829,%r11830,%r11831,%r11832,%r11833,%r11834,%r11835,%r11836,%r11837,%r11838,%r11839,%r11840,%r11841,%r11842,%r11843,%r11844,%r11845,%r11846,%r11847,%r11848,%r11849,%r11850,%r11851,%r11852,%r11853,%r11854,%r11855,%r11856,%r11857,%r11858,%r11859,%r11860,%r11861,%r11862,%r11863,%r11864,%r11865,%r11866,%r11867,%r11868,%r11869,%r11870,%r11871,%r11872,%r11873,%r11874,%r11875,%r11876,%r11877,%r11878,%r11879,%r11880,%r11881,%r11882,%r11883,%r11884,%r11885,%r11886,%r11887,%r11888,%r11889,%r11890,%r11891,%r11892,%r11893,%r11894,%r11895,%r11896,%r11897,%r11898,%r11899}, {%r9573,%r9574,%r9575,%r9576}, %rd519, %p34, 1, 1; 2026-02-21T12:39:47.6884778Z // end inline asm 2026-02-21T12:39:47.6884843Z // begin inline asm 2026-02-21T12:39:47.6887659Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r11772,%r11773,%r11774,%r11775,%r11776,%r11777,%r11778,%r11779,%r11780,%r11781,%r11782,%r11783,%r11784,%r11785,%r11786,%r11787,%r11788,%r11789,%r11790,%r11791,%r11792,%r11793,%r11794,%r11795,%r11796,%r11797,%r11798,%r11799,%r11800,%r11801,%r11802,%r11803,%r11804,%r11805,%r11806,%r11807,%r11808,%r11809,%r11810,%r11811,%r11812,%r11813,%r11814,%r11815,%r11816,%r11817,%r11818,%r11819,%r11820,%r11821,%r11822,%r11823,%r11824,%r11825,%r11826,%r11827,%r11828,%r11829,%r11830,%r11831,%r11832,%r11833,%r11834,%r11835,%r11836,%r11837,%r11838,%r11839,%r11840,%r11841,%r11842,%r11843,%r11844,%r11845,%r11846,%r11847,%r11848,%r11849,%r11850,%r11851,%r11852,%r11853,%r11854,%r11855,%r11856,%r11857,%r11858,%r11859,%r11860,%r11861,%r11862,%r11863,%r11864,%r11865,%r11866,%r11867,%r11868,%r11869,%r11870,%r11871,%r11872,%r11873,%r11874,%r11875,%r11876,%r11877,%r11878,%r11879,%r11880,%r11881,%r11882,%r11883,%r11884,%r11885,%r11886,%r11887,%r11888,%r11889,%r11890,%r11891,%r11892,%r11893,%r11894,%r11895,%r11896,%r11897,%r11898,%r11899}, {%r9833,%r9834,%r9835,%r9836}, %rd520, %p34, 1, 1; 2026-02-21T12:39:47.6887732Z // end inline asm 2026-02-21T12:39:47.6887810Z wgmma.commit_group.sync.aligned; 2026-02-21T12:39:47.6887870Z mov.b32 %r10750, 0; 2026-02-21T12:39:47.6887935Z mov.b32 %r9965, %r10096; 2026-02-21T12:39:47.6887996Z mov.b32 %r9966, %r10750; 2026-02-21T12:39:47.6888054Z mov.b32 %r9967, %r10750; 2026-02-21T12:39:47.6888111Z // begin inline asm 2026-02-21T12:39:47.6890640Z // wait for regs: 
%r11772,%r11773,%r11774,%r11775,%r11776,%r11777,%r11778,%r11779,%r11780,%r11781,%r11782,%r11783,%r11784,%r11785,%r11786,%r11787,%r11788,%r11789,%r11790,%r11791,%r11792,%r11793,%r11794,%r11795,%r11796,%r11797,%r11798,%r11799,%r11800,%r11801,%r11802,%r11803,%r11804,%r11805,%r11806,%r11807,%r11808,%r11809,%r11810,%r11811,%r11812,%r11813,%r11814,%r11815,%r11816,%r11817,%r11818,%r11819,%r11820,%r11821,%r11822,%r11823,%r11824,%r11825,%r11826,%r11827,%r11828,%r11829,%r11830,%r11831,%r11832,%r11833,%r11834,%r11835,%r11836,%r11837,%r11838,%r11839,%r11840,%r11841,%r11842,%r11843,%r11844,%r11845,%r11846,%r11847,%r11848,%r11849,%r11850,%r11851,%r11852,%r11853,%r11854,%r11855,%r11856,%r11857,%r11858,%r11859,%r11860,%r11861,%r11862,%r11863,%r11864,%r11865,%r11866,%r11867,%r11868,%r11869,%r11870,%r11871,%r11872,%r11873,%r11874,%r11875,%r11876,%r11877,%r11878,%r11879,%r11880,%r11881,%r11882,%r11883,%r11884,%r11885,%r11886,%r11887,%r11888,%r11889,%r11890,%r11891,%r11892,%r11893,%r11894,%r11895,%r11896,%r11897,%r11898,%r11899,%r9965,%r9966,%r9967 2026-02-21T12:39:47.6890870Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:39:47.6890929Z // end inline asm 2026-02-21T12:39:47.6890983Z $L__tmp18: 2026-02-21T12:39:47.6891198Z .loc 1 44 35 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:44:35 2026-02-21T12:39:47.6891270Z cvt.s64.s32 %rd531, %r11764; 2026-02-21T12:39:47.6891337Z add.s64 %rd532, %rd531, %rd7; 2026-02-21T12:39:47.6891539Z .loc 1 51 80 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:80 2026-02-21T12:39:47.6891725Z add.s32 %r10955, %r10096, 53248; 2026-02-21T12:39:47.6891794Z add.s32 %r10956, %r10955, %r10892; 2026-02-21T12:39:47.6891992Z .loc 1 55 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:55:32 2026-02-21T12:39:47.6892058Z add.s32 %r10957, %r10956, %r1072; 2026-02-21T12:39:47.6892127Z ld.shared.b16 %rs505, [%r10957]; 2026-02-21T12:39:47.6892197Z ld.shared.b16 %rs506, [%r10957+256]; 2026-02-21T12:39:47.6892265Z ld.shared.b16 %rs507, 
[%r10957+16]; 2026-02-21T12:39:47.6892333Z ld.shared.b16 %rs508, [%r10957+272]; 2026-02-21T12:39:47.6892396Z add.s32 %r10958, %r10956, %r1073; 2026-02-21T12:39:47.6892471Z ld.shared.b16 %rs509, [%r10958]; 2026-02-21T12:39:47.6892544Z ld.shared.b16 %rs510, [%r10958+256]; 2026-02-21T12:39:47.6892611Z ld.shared.b16 %rs511, [%r10958+16]; 2026-02-21T12:39:47.6892680Z ld.shared.b16 %rs512, [%r10958+272]; 2026-02-21T12:39:47.6892742Z cvt.f32.bf16 %r10357, %rs505; 2026-02-21T12:39:47.6892808Z cvt.f32.bf16 %r10358, %rs506; 2026-02-21T12:39:47.6892873Z cvt.f32.bf16 %r10359, %rs509; 2026-02-21T12:39:47.6892934Z cvt.f32.bf16 %r10360, %rs510; 2026-02-21T12:39:47.6892999Z cvt.f32.bf16 %r10617, %rs507; 2026-02-21T12:39:47.6893059Z cvt.f32.bf16 %r10618, %rs508; 2026-02-21T12:39:47.6893119Z cvt.f32.bf16 %r10619, %rs511; 2026-02-21T12:39:47.6893182Z cvt.f32.bf16 %r10620, %rs512; 2026-02-21T12:39:47.6893379Z .loc 1 57 34 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:57:34 2026-02-21T12:39:47.6893454Z mad.lo.s64 %rd533, %rd532, 1280, %rd150; 2026-02-21T12:39:47.6893530Z add.s64 %rd521, %rd533, %rd618; 2026-02-21T12:39:47.6893729Z .loc 1 57 87 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:57:87 2026-02-21T12:39:47.6893789Z // begin inline asm 2026-02-21T12:39:47.6893851Z mov.u32 %r10099, 0x0; 2026-02-21T12:39:47.6893910Z mov.u32 %r10100, 0x0; 2026-02-21T12:39:47.6894016Z ld.global.v2.b32 { %r10099, %r10100 }, [ %rd521 + 0 ]; 2026-02-21T12:39:47.6894077Z // end inline asm 2026-02-21T12:39:47.6894281Z .loc 1 65 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:65:28 2026-02-21T12:39:47.6894342Z bar.sync 0; 2026-02-21T12:39:47.6894417Z st.shared.b8 [%r1074], %r10099; 2026-02-21T12:39:47.6894490Z prmt.b32 %r10959, %r10099, 0, 0x7771U; 2026-02-21T12:39:47.6894562Z st.shared.b8 [%r1075+256], %r10959; 2026-02-21T12:39:47.6894629Z prmt.b32 %r10960, %r10099, 0, 0x7772U; 2026-02-21T12:39:47.6894693Z st.shared.b8 [%r1076+512], %r10960; 
2026-02-21T12:39:47.6894762Z prmt.b32 %r10961, %r10099, 0, 0x7773U; 2026-02-21T12:39:47.6894827Z st.shared.b8 [%r1077+768], %r10961; 2026-02-21T12:39:47.6894893Z st.shared.b8 [%r1078+1024], %r10100; 2026-02-21T12:39:47.6894958Z prmt.b32 %r10962, %r10100, 0, 0x7771U; 2026-02-21T12:39:47.6895103Z st.shared.b8 [%r1079+1280], %r10962; 2026-02-21T12:39:47.6895172Z prmt.b32 %r10963, %r10100, 0, 0x7772U; 2026-02-21T12:39:47.6895239Z st.shared.b8 [%r1080+1536], %r10963; 2026-02-21T12:39:47.6895359Z prmt.b32 %r10964, %r10100, 0, 0x7773U; 2026-02-21T12:39:47.6895425Z st.shared.b8 [%r1081+1792], %r10964; 2026-02-21T12:39:47.6895479Z bar.sync 0; 2026-02-21T12:39:47.6895545Z ld.shared.b32 %r10965, [%r1082]; 2026-02-21T12:39:47.6895613Z prmt.b32 %r10966, %r10965, 0, 0x7771U; 2026-02-21T12:39:47.6895678Z cvt.u16.u32 %rs513, %r10966; 2026-02-21T12:39:47.6895743Z prmt.b32 %r10967, %r10965, 0, 0x7770U; 2026-02-21T12:39:47.6895807Z cvt.u16.u32 %rs514, %r10967; 2026-02-21T12:39:47.6895872Z prmt.b32 %r10968, %r10965, 0, 0x7773U; 2026-02-21T12:39:47.6895933Z cvt.u16.u32 %rs515, %r10968; 2026-02-21T12:39:47.6895998Z prmt.b32 %r10969, %r10965, 0, 0x7772U; 2026-02-21T12:39:47.6896070Z cvt.u16.u32 %rs516, %r10969; 2026-02-21T12:39:47.6896273Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6896341Z shl.b16 %rs517, %rs514, 4; 2026-02-21T12:39:47.6896404Z shl.b16 %rs518, %rs513, 4; 2026-02-21T12:39:47.6896855Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6896929Z cvt.u32.u16 %r10970, %rs517; 2026-02-21T12:39:47.6897017Z prmt.b32 %r10971, %r10970, %r10972, 0x3340U; 2026-02-21T12:39:47.6897096Z prmt.b32 %r10973, %r10971, %r10911, 0x5410U; 2026-02-21T12:39:47.6897172Z prmt.b32 %r10974, %r10973, %r10965, 0x5040U; 2026-02-21T12:39:47.6897243Z prmt.b32 %r10975, %r10974, 0, 0x9991U; 2026-02-21T12:39:47.6897306Z cvt.u16.u32 %rs519, %r10975; 2026-02-21T12:39:47.6897369Z shr.s16 %rs520, %rs519, 4; 
2026-02-21T12:39:47.6897437Z prmt.b32 %r10976, %r10974, 0, 0xbbb3U; 2026-02-21T12:39:47.6897504Z cvt.u16.u32 %rs521, %r10976; 2026-02-21T12:39:47.6897566Z shr.s16 %rs522, %rs521, 4; 2026-02-21T12:39:47.6897628Z cvt.s16.s8 %rs523, %rs517; 2026-02-21T12:39:47.6897703Z shr.s16 %rs524, %rs523, 4; 2026-02-21T12:39:47.6897766Z cvt.s16.s8 %rs525, %rs518; 2026-02-21T12:39:47.6897827Z shr.s16 %rs526, %rs525, 4; 2026-02-21T12:39:47.6898037Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6898111Z cvt.rn.f32.s16 %r10977, %rs522; 2026-02-21T12:39:47.6898176Z cvt.rn.f32.s16 %r10978, %rs520; 2026-02-21T12:39:47.6898241Z cvt.rn.f32.s16 %r10979, %rs526; 2026-02-21T12:39:47.6898307Z cvt.rn.f32.s16 %r10980, %rs524; 2026-02-21T12:39:47.6898507Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6898567Z shl.b16 %rs527, %rs516, 4; 2026-02-21T12:39:47.6898631Z shl.b16 %rs528, %rs515, 4; 2026-02-21T12:39:47.6898826Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6898906Z prmt.b32 %r10981, %r10965, %r10982, 0x3020U; 2026-02-21T12:39:47.6898975Z prmt.b32 %r10983, %r10981, 0, 0x9991U; 2026-02-21T12:39:47.6899040Z cvt.u16.u32 %rs529, %r10983; 2026-02-21T12:39:47.6899101Z shr.s16 %rs530, %rs529, 4; 2026-02-21T12:39:47.6899167Z cvt.s16.s8 %rs531, %rs527; 2026-02-21T12:39:47.6899237Z shr.s16 %rs532, %rs531, 4; 2026-02-21T12:39:47.6899304Z cvt.s16.s8 %rs533, %rs528; 2026-02-21T12:39:47.6899365Z shr.s16 %rs534, %rs533, 4; 2026-02-21T12:39:47.6899435Z prmt.b32 %r10984, %r10965, 0, 0xbbb3U; 2026-02-21T12:39:47.6899497Z cvt.u16.u32 %rs535, %r10984; 2026-02-21T12:39:47.6899558Z shr.s16 %rs536, %rs535, 4; 2026-02-21T12:39:47.6899753Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6899821Z cvt.rn.f32.s16 %r10985, %rs530; 2026-02-21T12:39:47.6899883Z cvt.rn.f32.s16 %r10986, %rs536; 
2026-02-21T12:39:47.6899945Z cvt.rn.f32.s16 %r10987, %rs534; 2026-02-21T12:39:47.6900008Z cvt.rn.f32.s16 %r10988, %rs532; 2026-02-21T12:39:47.6900204Z .loc 1 65 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:65:28 2026-02-21T12:39:47.6900366Z ld.shared.b32 %r10989, [%r1082+128]; 2026-02-21T12:39:47.6900499Z prmt.b32 %r10990, %r10989, 0, 0x7771U; 2026-02-21T12:39:47.6900566Z cvt.u16.u32 %rs537, %r10990; 2026-02-21T12:39:47.6900632Z prmt.b32 %r10991, %r10989, 0, 0x7770U; 2026-02-21T12:39:47.6900695Z cvt.u16.u32 %rs538, %r10991; 2026-02-21T12:39:47.6900763Z prmt.b32 %r10992, %r10989, 0, 0x7773U; 2026-02-21T12:39:47.6900824Z cvt.u16.u32 %rs539, %r10992; 2026-02-21T12:39:47.6900888Z prmt.b32 %r10993, %r10989, 0, 0x7772U; 2026-02-21T12:39:47.6900952Z cvt.u16.u32 %rs540, %r10993; 2026-02-21T12:39:47.6901159Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6901223Z shl.b16 %rs541, %rs538, 4; 2026-02-21T12:39:47.6901286Z shl.b16 %rs542, %rs537, 4; 2026-02-21T12:39:47.6901485Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6901550Z cvt.u32.u16 %r10994, %rs541; 2026-02-21T12:39:47.6901628Z prmt.b32 %r10995, %r10994, %r10996, 0x3340U; 2026-02-21T12:39:47.6901811Z prmt.b32 %r10997, %r10995, %r10911, 0x5410U; 2026-02-21T12:39:47.6901892Z prmt.b32 %r10998, %r10997, %r10989, 0x5040U; 2026-02-21T12:39:47.6901964Z prmt.b32 %r10999, %r10998, 0, 0x9991U; 2026-02-21T12:39:47.6902031Z cvt.u16.u32 %rs543, %r10999; 2026-02-21T12:39:47.6902092Z shr.s16 %rs544, %rs543, 4; 2026-02-21T12:39:47.6902159Z prmt.b32 %r11000, %r10998, 0, 0xbbb3U; 2026-02-21T12:39:47.6902221Z cvt.u16.u32 %rs545, %r11000; 2026-02-21T12:39:47.6902285Z shr.s16 %rs546, %rs545, 4; 2026-02-21T12:39:47.6902346Z cvt.s16.s8 %rs547, %rs541; 2026-02-21T12:39:47.6902407Z shr.s16 %rs548, %rs547, 4; 2026-02-21T12:39:47.6902472Z cvt.s16.s8 %rs549, %rs542; 2026-02-21T12:39:47.6902534Z shr.s16 %rs550, %rs549, 4; 
2026-02-21T12:39:47.6902730Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6902799Z cvt.rn.f32.s16 %r11001, %rs546; 2026-02-21T12:39:47.6902868Z cvt.rn.f32.s16 %r11002, %rs544; 2026-02-21T12:39:47.6902950Z cvt.rn.f32.s16 %r11003, %rs550; 2026-02-21T12:39:47.6903014Z cvt.rn.f32.s16 %r11004, %rs548; 2026-02-21T12:39:47.6903216Z .loc 1 60 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:60:28 2026-02-21T12:39:47.6903280Z shl.b16 %rs551, %rs540, 4; 2026-02-21T12:39:47.6903341Z shl.b16 %rs552, %rs539, 4; 2026-02-21T12:39:47.6903541Z .loc 1 62 25 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:62:25 2026-02-21T12:39:47.6903621Z prmt.b32 %r11005, %r10989, %r11006, 0x3020U; 2026-02-21T12:39:47.6903689Z prmt.b32 %r11007, %r11005, 0, 0x9991U; 2026-02-21T12:39:47.6903754Z cvt.u16.u32 %rs553, %r11007; 2026-02-21T12:39:47.6903820Z shr.s16 %rs554, %rs553, 4; 2026-02-21T12:39:47.6903881Z cvt.s16.s8 %rs555, %rs551; 2026-02-21T12:39:47.6903945Z shr.s16 %rs556, %rs555, 4; 2026-02-21T12:39:47.6904011Z cvt.s16.s8 %rs557, %rs552; 2026-02-21T12:39:47.6904072Z shr.s16 %rs558, %rs557, 4; 2026-02-21T12:39:47.6904144Z prmt.b32 %r11008, %r10989, 0, 0xbbb3U; 2026-02-21T12:39:47.6904208Z cvt.u16.u32 %rs559, %r11008; 2026-02-21T12:39:47.6904274Z shr.s16 %rs560, %rs559, 4; 2026-02-21T12:39:47.6904479Z .loc 1 80 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:80:32 2026-02-21T12:39:47.6904544Z cvt.rn.f32.s16 %r11009, %rs554; 2026-02-21T12:39:47.6904611Z cvt.rn.f32.s16 %r11010, %rs560; 2026-02-21T12:39:47.6904673Z cvt.rn.f32.s16 %r11011, %rs558; 2026-02-21T12:39:47.6904735Z cvt.rn.f32.s16 %r11012, %rs556; 2026-02-21T12:39:47.6904795Z bar.sync 0; 2026-02-21T12:39:47.6904922Z st.shared.v4.b32 [%r1083], {%r10980, %r10978, %r10979, %r10977}; 2026-02-21T12:39:47.6905042Z st.shared.v4.b32 [%r1084], {%r10988, %r10985, %r10987, %r10986}; 2026-02-21T12:39:47.6905155Z st.shared.v4.b32 [%r1085], {%r11004, %r11002, 
%r11003, %r11001}; 2026-02-21T12:39:47.6905350Z st.shared.v4.b32 [%r1086], {%r11012, %r11009, %r11011, %r11010}; 2026-02-21T12:39:47.6905406Z $L__tmp19: 2026-02-21T12:39:47.6905730Z .loc 2 291 36 // standard.py:291:36 @[ conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:87:40 ] 2026-02-21T12:39:47.6905797Z // begin inline asm 2026-02-21T12:39:47.6905878Z fence.proxy.async.shared::cta; 2026-02-21T12:39:47.6905935Z // end inline asm 2026-02-21T12:39:47.6905994Z bar.sync 0; 2026-02-21T12:39:47.6906067Z wgmma.fence.sync.aligned; 2026-02-21T12:39:47.6906130Z // begin inline asm 2026-02-21T12:39:47.6909242Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r11772,%r11773,%r11774,%r11775,%r11776,%r11777,%r11778,%r11779,%r11780,%r11781,%r11782,%r11783,%r11784,%r11785,%r11786,%r11787,%r11788,%r11789,%r11790,%r11791,%r11792,%r11793,%r11794,%r11795,%r11796,%r11797,%r11798,%r11799,%r11800,%r11801,%r11802,%r11803,%r11804,%r11805,%r11806,%r11807,%r11808,%r11809,%r11810,%r11811,%r11812,%r11813,%r11814,%r11815,%r11816,%r11817,%r11818,%r11819,%r11820,%r11821,%r11822,%r11823,%r11824,%r11825,%r11826,%r11827,%r11828,%r11829,%r11830,%r11831,%r11832,%r11833,%r11834,%r11835,%r11836,%r11837,%r11838,%r11839,%r11840,%r11841,%r11842,%r11843,%r11844,%r11845,%r11846,%r11847,%r11848,%r11849,%r11850,%r11851,%r11852,%r11853,%r11854,%r11855,%r11856,%r11857,%r11858,%r11859,%r11860,%r11861,%r11862,%r11863,%r11864,%r11865,%r11866,%r11867,%r11868,%r11869,%r11870,%r11871,%r11872,%r11873,%r11874,%r11875,%r11876,%r11877,%r11878,%r11879,%r11880,%r11881,%r11882,%r11883,%r11884,%r11885,%r11886,%r11887,%r11888,%r11889,%r11890,%r11891,%r11892,%r11893,%r11894,%r11895,%r11896,%r11897,%r11898,%r11899}, {%r10357,%r10358,%r10359,%r10360}, %rd519, %p34, 1, 1; 2026-02-21T12:39:47.6909327Z // end inline asm 2026-02-21T12:39:47.6909393Z // begin inline asm 2026-02-21T12:39:47.6912110Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r11772,%r11773,%r11774,%r11775,%r11776,%r11777,%r11778,%r11779,%r11780,%r11781,%r11782,%r11783,%r11784,%r11785,%r11786,%r11787,%r11788,%r11789,%r11790,%r11791,%r11792,%r11793,%r11794,%r11795,%r11796,%r11797,%r11798,%r11799,%r11800,%r11801,%r11802,%r11803,%r11804,%r11805,%r11806,%r11807,%r11808,%r11809,%r11810,%r11811,%r11812,%r11813,%r11814,%r11815,%r11816,%r11817,%r11818,%r11819,%r11820,%r11821,%r11822,%r11823,%r11824,%r11825,%r11826,%r11827,%r11828,%r11829,%r11830,%r11831,%r11832,%r11833,%r11834,%r11835,%r11836,%r11837,%r11838,%r11839,%r11840,%r11841,%r11842,%r11843,%r11844,%r11845,%r11846,%r11847,%r11848,%r11849,%r11850,%r11851,%r11852,%r11853,%r11854,%r11855,%r11856,%r11857,%r11858,%r11859,%r11860,%r11861,%r11862,%r11863,%r11864,%r11865,%r11866,%r11867,%r11868,%r11869,%r11870,%r11871,%r11872,%r11873,%r11874,%r11875,%r11876,%r11877,%r11878,%r11879,%r11880,%r11881,%r11882,%r11883,%r11884,%r11885,%r11886,%r11887,%r11888,%r11889,%r11890,%r11891,%r11892,%r11893,%r11894,%r11895,%r11896,%r11897,%r11898,%r11899}, {%r10617,%r10618,%r10619,%r10620}, %rd520, %p34, 1, 1; 2026-02-21T12:39:47.6912179Z // end inline asm 2026-02-21T12:39:47.6912256Z wgmma.commit_group.sync.aligned; 2026-02-21T12:39:47.6912319Z mov.b32 %r10749, %r10096; 2026-02-21T12:39:47.6912390Z mov.b32 %r10751, %r10750; 2026-02-21T12:39:47.6912450Z // begin inline asm 2026-02-21T12:39:47.6914977Z // wait for regs: 
%r11772,%r11773,%r11774,%r11775,%r11776,%r11777,%r11778,%r11779,%r11780,%r11781,%r11782,%r11783,%r11784,%r11785,%r11786,%r11787,%r11788,%r11789,%r11790,%r11791,%r11792,%r11793,%r11794,%r11795,%r11796,%r11797,%r11798,%r11799,%r11800,%r11801,%r11802,%r11803,%r11804,%r11805,%r11806,%r11807,%r11808,%r11809,%r11810,%r11811,%r11812,%r11813,%r11814,%r11815,%r11816,%r11817,%r11818,%r11819,%r11820,%r11821,%r11822,%r11823,%r11824,%r11825,%r11826,%r11827,%r11828,%r11829,%r11830,%r11831,%r11832,%r11833,%r11834,%r11835,%r11836,%r11837,%r11838,%r11839,%r11840,%r11841,%r11842,%r11843,%r11844,%r11845,%r11846,%r11847,%r11848,%r11849,%r11850,%r11851,%r11852,%r11853,%r11854,%r11855,%r11856,%r11857,%r11858,%r11859,%r11860,%r11861,%r11862,%r11863,%r11864,%r11865,%r11866,%r11867,%r11868,%r11869,%r11870,%r11871,%r11872,%r11873,%r11874,%r11875,%r11876,%r11877,%r11878,%r11879,%r11880,%r11881,%r11882,%r11883,%r11884,%r11885,%r11886,%r11887,%r11888,%r11889,%r11890,%r11891,%r11892,%r11893,%r11894,%r11895,%r11896,%r11897,%r11898,%r11899,%r10749,%r10750,%r10751 2026-02-21T12:39:47.6915199Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:39:47.6915257Z // end inline asm 2026-02-21T12:39:47.6915312Z $L__tmp20: 2026-02-21T12:39:47.6915533Z .loc 1 22 120 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:22:120 2026-02-21T12:39:47.6915595Z add.s32 %r11013, %r11766, 16; 2026-02-21T12:39:47.6915669Z add.s32 %r11014, %r11771, 1; 2026-02-21T12:39:47.6915741Z setp.gt.s32 %p41, %r11014, 4; 2026-02-21T12:39:47.6915814Z selp.b32 %r11771, 0, %r11014, %p41; 2026-02-21T12:39:47.6915883Z selp.b32 %r11765, 0, %r11013, %p38; 2026-02-21T12:39:47.6916089Z .loc 1 48 22 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:48:22 2026-02-21T12:39:47.6916152Z shl.b32 %r11015, %r11765, 1; 2026-02-21T12:39:47.6916436Z .loc 1 50 26 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:50:26 2026-02-21T12:39:47.6916629Z add.s32 %r11016, %r11015, %r7; 2026-02-21T12:39:47.6916848Z .loc 1 51 32 // 
conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:32 2026-02-21T12:39:47.6916915Z shl.b64 %rd534, %rd630, 14; 2026-02-21T12:39:47.6916980Z add.s64 %rd535, %rd149, %rd534; 2026-02-21T12:39:47.6917055Z mul.wide.s32 %rd536, %r11016, 2; 2026-02-21T12:39:47.6917117Z add.s64 %rd524, %rd535, %rd536; 2026-02-21T12:39:47.6917179Z shl.b64 %rd537, %rd631, 14; 2026-02-21T12:39:47.6917246Z add.s64 %rd538, %rd149, %rd537; 2026-02-21T12:39:47.6917308Z add.s64 %rd525, %rd538, %rd536; 2026-02-21T12:39:47.6917503Z .loc 1 51 80 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:80 2026-02-21T12:39:47.6917574Z shl.b32 %r11017, %r11771, 12; 2026-02-21T12:39:47.6917648Z add.s32 %r11018, %r10893, %r11017; 2026-02-21T12:39:47.6917712Z add.s32 %r10883, %r11018, %r1071; 2026-02-21T12:39:47.6917782Z selp.b32 %r10884, 8, 0, %p39; 2026-02-21T12:39:47.6917847Z // begin inline asm 2026-02-21T12:39:47.6918006Z cp.async.ca.shared.global [ %r10883 + 0 ], [ %rd524 + 0 ], 0x8, %r10884; 2026-02-21T12:39:47.6918064Z // end inline asm 2026-02-21T12:39:47.6918131Z add.s32 %r10885, %r10883, 2048; 2026-02-21T12:39:47.6918190Z // begin inline asm 2026-02-21T12:39:47.6918335Z cp.async.ca.shared.global [ %r10885 + 0 ], [ %rd525 + 0 ], 0x8, %r10884; 2026-02-21T12:39:47.6918393Z // end inline asm 2026-02-21T12:39:47.6918465Z cp.async.commit_group; 2026-02-21T12:39:47.6918682Z .loc 1 43 126 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:43:126 2026-02-21T12:39:47.6918746Z add.s32 %r11760, %r11765, 8; 2026-02-21T12:39:47.6918951Z .loc 1 48 22 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:48:22 2026-02-21T12:39:47.6919015Z shl.b32 %r11019, %r11760, 1; 2026-02-21T12:39:47.6919216Z .loc 1 50 26 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:50:26 2026-02-21T12:39:47.6919285Z add.s32 %r11020, %r11019, %r7; 2026-02-21T12:39:47.6919481Z .loc 1 51 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:32 2026-02-21T12:39:47.6919549Z mul.wide.s32 %rd539, 
%r11020, 2; 2026-02-21T12:39:47.6919615Z add.s64 %rd526, %rd535, %rd539; 2026-02-21T12:39:47.6919683Z add.s64 %rd527, %rd538, %rd539; 2026-02-21T12:39:47.6919890Z .loc 1 51 80 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:51:80 2026-02-21T12:39:47.6919957Z add.s32 %r11021, %r10955, %r11017; 2026-02-21T12:39:47.6920025Z add.s32 %r10887, %r11021, %r1071; 2026-02-21T12:39:47.6920084Z // begin inline asm 2026-02-21T12:39:47.6920231Z cp.async.ca.shared.global [ %r10887 + 0 ], [ %rd526 + 0 ], 0x8, %r10884; 2026-02-21T12:39:47.6920378Z // end inline asm 2026-02-21T12:39:47.6920441Z add.s32 %r10889, %r10887, 2048; 2026-02-21T12:39:47.6920500Z // begin inline asm 2026-02-21T12:39:47.6920705Z cp.async.ca.shared.global [ %r10889 + 0 ], [ %rd527 + 0 ], 0x8, %r10884; 2026-02-21T12:39:47.6920767Z // end inline asm 2026-02-21T12:39:47.6920834Z cp.async.commit_group; 2026-02-21T12:39:47.6921041Z .loc 1 22 120 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:22:120 2026-02-21T12:39:47.6921115Z setp.ne.b64 %p42, %rd614, 255; 2026-02-21T12:39:47.6921178Z @%p42 bra $L__BB0_35; 2026-02-21T12:39:47.6921292Z // %bb.34: // in Loop: Header=BB0_28 Depth=1 2026-02-21T12:39:47.6921493Z .loc 1 34 32 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:34:32 2026-02-21T12:39:47.6921559Z add.s64 %rd556, %rd610, %rd7; 2026-02-21T12:39:47.6921623Z add.s64 %rd557, %rd610, %rd8; 2026-02-21T12:39:47.6921685Z add.s64 %rd558, %rd610, %rd9; 2026-02-21T12:39:47.6921754Z add.s64 %rd559, %rd610, %rd10; 2026-02-21T12:39:47.6921816Z add.s64 %rd560, %rd610, %rd11; 2026-02-21T12:39:47.6921950Z add.s64 %rd561, %rd610, %rd12; 2026-02-21T12:39:47.6922075Z add.s64 %rd562, %rd610, %rd13; 2026-02-21T12:39:47.6922140Z add.s64 %rd563, %rd610, %rd14; 2026-02-21T12:39:47.6922203Z add.s64 %rd564, %rd610, %rd15; 2026-02-21T12:39:47.6922265Z add.s64 %rd565, %rd610, %rd16; 2026-02-21T12:39:47.6922344Z add.s64 %rd566, %rd610, %rd17; 2026-02-21T12:39:47.6922408Z add.s64 %rd567, %rd610, %rd18; 
2026-02-21T12:39:47.6922469Z add.s64 %rd568, %rd610, %rd19; 2026-02-21T12:39:47.6922533Z add.s64 %rd569, %rd610, %rd20; 2026-02-21T12:39:47.6922593Z add.s64 %rd570, %rd610, %rd21; 2026-02-21T12:39:47.6922655Z add.s64 %rd571, %rd610, %rd22; 2026-02-21T12:39:47.6922855Z .loc 1 90 28 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:90:28 2026-02-21T12:39:47.6922942Z cvt.rn.bf16x2.f32 %r11167, %r11773, %r11772; 2026-02-21T12:39:47.6923028Z cvt.rn.bf16x2.f32 %r11168, %r11775, %r11774; 2026-02-21T12:39:47.6923107Z cvt.rn.bf16x2.f32 %r11169, %r11777, %r11776; 2026-02-21T12:39:47.6923192Z cvt.rn.bf16x2.f32 %r11170, %r11779, %r11778; 2026-02-21T12:39:47.6923270Z cvt.rn.bf16x2.f32 %r11171, %r11781, %r11780; 2026-02-21T12:39:47.6923345Z cvt.rn.bf16x2.f32 %r11172, %r11783, %r11782; 2026-02-21T12:39:47.6923427Z cvt.rn.bf16x2.f32 %r11173, %r11785, %r11784; 2026-02-21T12:39:47.6923503Z cvt.rn.bf16x2.f32 %r11174, %r11787, %r11786; 2026-02-21T12:39:47.6923578Z cvt.rn.bf16x2.f32 %r11175, %r11789, %r11788; 2026-02-21T12:39:47.6923656Z cvt.rn.bf16x2.f32 %r11176, %r11791, %r11790; 2026-02-21T12:39:47.6923733Z cvt.rn.bf16x2.f32 %r11177, %r11793, %r11792; 2026-02-21T12:39:47.6923809Z cvt.rn.bf16x2.f32 %r11178, %r11795, %r11794; 2026-02-21T12:39:47.6923886Z cvt.rn.bf16x2.f32 %r11179, %r11797, %r11796; 2026-02-21T12:39:47.6923967Z cvt.rn.bf16x2.f32 %r11180, %r11799, %r11798; 2026-02-21T12:39:47.6924043Z cvt.rn.bf16x2.f32 %r11181, %r11801, %r11800; 2026-02-21T12:39:47.6924134Z cvt.rn.bf16x2.f32 %r11182, %r11803, %r11802; 2026-02-21T12:39:47.6924215Z cvt.rn.bf16x2.f32 %r11183, %r11805, %r11804; 2026-02-21T12:39:47.6924297Z cvt.rn.bf16x2.f32 %r11184, %r11807, %r11806; 2026-02-21T12:39:47.6924382Z cvt.rn.bf16x2.f32 %r11185, %r11809, %r11808; 2026-02-21T12:39:47.6924458Z cvt.rn.bf16x2.f32 %r11186, %r11811, %r11810; 2026-02-21T12:39:47.6924537Z cvt.rn.bf16x2.f32 %r11187, %r11813, %r11812; 2026-02-21T12:39:47.6924614Z cvt.rn.bf16x2.f32 %r11188, %r11815, %r11814; 
2026-02-21T12:39:47.6924690Z cvt.rn.bf16x2.f32 %r11189, %r11817, %r11816; 2026-02-21T12:39:47.6924768Z cvt.rn.bf16x2.f32 %r11190, %r11819, %r11818; 2026-02-21T12:39:47.6924843Z cvt.rn.bf16x2.f32 %r11191, %r11821, %r11820; 2026-02-21T12:39:47.6924920Z cvt.rn.bf16x2.f32 %r11192, %r11823, %r11822; 2026-02-21T12:39:47.6924998Z cvt.rn.bf16x2.f32 %r11193, %r11825, %r11824; 2026-02-21T12:39:47.6925073Z cvt.rn.bf16x2.f32 %r11194, %r11827, %r11826; 2026-02-21T12:39:47.6925210Z cvt.rn.bf16x2.f32 %r11195, %r11829, %r11828; 2026-02-21T12:39:47.6925285Z cvt.rn.bf16x2.f32 %r11196, %r11831, %r11830; 2026-02-21T12:39:47.6925425Z cvt.rn.bf16x2.f32 %r11197, %r11833, %r11832; 2026-02-21T12:39:47.6925506Z cvt.rn.bf16x2.f32 %r11198, %r11835, %r11834; 2026-02-21T12:39:47.6925586Z cvt.rn.bf16x2.f32 %r11199, %r11837, %r11836; 2026-02-21T12:39:47.6925668Z cvt.rn.bf16x2.f32 %r11200, %r11839, %r11838; 2026-02-21T12:39:47.6925744Z cvt.rn.bf16x2.f32 %r11201, %r11841, %r11840; 2026-02-21T12:39:47.6925822Z cvt.rn.bf16x2.f32 %r11202, %r11843, %r11842; 2026-02-21T12:39:47.6925902Z cvt.rn.bf16x2.f32 %r11203, %r11845, %r11844; 2026-02-21T12:39:47.6925979Z cvt.rn.bf16x2.f32 %r11204, %r11847, %r11846; 2026-02-21T12:39:47.6926054Z cvt.rn.bf16x2.f32 %r11205, %r11849, %r11848; 2026-02-21T12:39:47.6926129Z cvt.rn.bf16x2.f32 %r11206, %r11851, %r11850; 2026-02-21T12:39:47.6926208Z cvt.rn.bf16x2.f32 %r11207, %r11853, %r11852; 2026-02-21T12:39:47.6926286Z cvt.rn.bf16x2.f32 %r11208, %r11855, %r11854; 2026-02-21T12:39:47.6926361Z cvt.rn.bf16x2.f32 %r11209, %r11857, %r11856; 2026-02-21T12:39:47.6926635Z cvt.rn.bf16x2.f32 %r11210, %r11859, %r11858; 2026-02-21T12:39:47.6926817Z cvt.rn.bf16x2.f32 %r11211, %r11861, %r11860; 2026-02-21T12:39:47.6926902Z cvt.rn.bf16x2.f32 %r11212, %r11863, %r11862; 2026-02-21T12:39:47.6926985Z cvt.rn.bf16x2.f32 %r11213, %r11865, %r11864; 2026-02-21T12:39:47.6927060Z cvt.rn.bf16x2.f32 %r11214, %r11867, %r11866; 2026-02-21T12:39:47.6927136Z cvt.rn.bf16x2.f32 %r11215, %r11869, %r11868; 
2026-02-21T12:39:47.6927212Z cvt.rn.bf16x2.f32 %r11216, %r11871, %r11870; 2026-02-21T12:39:47.6927292Z cvt.rn.bf16x2.f32 %r11217, %r11873, %r11872; 2026-02-21T12:39:47.6927369Z cvt.rn.bf16x2.f32 %r11218, %r11875, %r11874; 2026-02-21T12:39:47.6927445Z cvt.rn.bf16x2.f32 %r11219, %r11877, %r11876; 2026-02-21T12:39:47.6927524Z cvt.rn.bf16x2.f32 %r11220, %r11879, %r11878; 2026-02-21T12:39:47.6927610Z cvt.rn.bf16x2.f32 %r11221, %r11881, %r11880; 2026-02-21T12:39:47.6927691Z cvt.rn.bf16x2.f32 %r11222, %r11883, %r11882; 2026-02-21T12:39:47.6927770Z cvt.rn.bf16x2.f32 %r11223, %r11885, %r11884; 2026-02-21T12:39:47.6927854Z cvt.rn.bf16x2.f32 %r11224, %r11887, %r11886; 2026-02-21T12:39:47.6927933Z cvt.rn.bf16x2.f32 %r11225, %r11889, %r11888; 2026-02-21T12:39:47.6928008Z cvt.rn.bf16x2.f32 %r11226, %r11891, %r11890; 2026-02-21T12:39:47.6928089Z cvt.rn.bf16x2.f32 %r11227, %r11893, %r11892; 2026-02-21T12:39:47.6928166Z cvt.rn.bf16x2.f32 %r11228, %r11895, %r11894; 2026-02-21T12:39:47.6928242Z cvt.rn.bf16x2.f32 %r11229, %r11897, %r11896; 2026-02-21T12:39:47.6928324Z cvt.rn.bf16x2.f32 %r11230, %r11899, %r11898; 2026-02-21T12:39:47.6928535Z .loc 1 91 22 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:91:22 2026-02-21T12:39:47.6928614Z mad.lo.s64 %rd572, %rd556, 2560, %rd151; 2026-02-21T12:39:47.6928683Z shl.b64 %rd573, %rd618, 1; 2026-02-21T12:39:47.6928751Z add.s64 %rd540, %rd572, %rd573; 2026-02-21T12:39:47.6928828Z mad.lo.s64 %rd574, %rd557, 2560, %rd151; 2026-02-21T12:39:47.6928894Z add.s64 %rd541, %rd574, %rd573; 2026-02-21T12:39:47.6928970Z mad.lo.s64 %rd575, %rd558, 2560, %rd151; 2026-02-21T12:39:47.6929039Z add.s64 %rd542, %rd575, %rd573; 2026-02-21T12:39:47.6929110Z mad.lo.s64 %rd576, %rd559, 2560, %rd151; 2026-02-21T12:39:47.6929177Z add.s64 %rd543, %rd576, %rd573; 2026-02-21T12:39:47.6929248Z mad.lo.s64 %rd577, %rd560, 2560, %rd151; 2026-02-21T12:39:47.6929316Z add.s64 %rd544, %rd577, %rd573; 2026-02-21T12:39:47.6929386Z mad.lo.s64 %rd578, %rd561, 2560, %rd151; 
2026-02-21T12:39:47.6929456Z add.s64 %rd545, %rd578, %rd573; 2026-02-21T12:39:47.6929526Z mad.lo.s64 %rd579, %rd562, 2560, %rd151; 2026-02-21T12:39:47.6929590Z add.s64 %rd546, %rd579, %rd573; 2026-02-21T12:39:47.6929669Z mad.lo.s64 %rd580, %rd563, 2560, %rd151; 2026-02-21T12:39:47.6929733Z add.s64 %rd547, %rd580, %rd573; 2026-02-21T12:39:47.6929804Z mad.lo.s64 %rd581, %rd564, 2560, %rd151; 2026-02-21T12:39:47.6929964Z add.s64 %rd548, %rd581, %rd573; 2026-02-21T12:39:47.6930035Z mad.lo.s64 %rd582, %rd565, 2560, %rd151; 2026-02-21T12:39:47.6930101Z add.s64 %rd549, %rd582, %rd573; 2026-02-21T12:39:47.6930246Z mad.lo.s64 %rd583, %rd566, 2560, %rd151; 2026-02-21T12:39:47.6930312Z add.s64 %rd550, %rd583, %rd573; 2026-02-21T12:39:47.6930380Z mad.lo.s64 %rd584, %rd567, 2560, %rd151; 2026-02-21T12:39:47.6930444Z add.s64 %rd551, %rd584, %rd573; 2026-02-21T12:39:47.6930517Z mad.lo.s64 %rd585, %rd568, 2560, %rd151; 2026-02-21T12:39:47.6930579Z add.s64 %rd552, %rd585, %rd573; 2026-02-21T12:39:47.6930648Z mad.lo.s64 %rd586, %rd569, 2560, %rd151; 2026-02-21T12:39:47.6930710Z add.s64 %rd553, %rd586, %rd573; 2026-02-21T12:39:47.6930786Z mad.lo.s64 %rd587, %rd570, 2560, %rd151; 2026-02-21T12:39:47.6930848Z add.s64 %rd554, %rd587, %rd573; 2026-02-21T12:39:47.6930918Z mad.lo.s64 %rd588, %rd571, 2560, %rd151; 2026-02-21T12:39:47.6930984Z add.s64 %rd555, %rd588, %rd573; 2026-02-21T12:39:47.6931192Z .loc 1 91 81 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:91:81 2026-02-21T12:39:47.6931251Z bar.sync 0; 2026-02-21T12:39:47.6931487Z st.shared.v4.b32 [%r1087], {%r11167, %r11169, %r11171, %r11173}; 2026-02-21T12:39:47.6931615Z st.shared.v4.b32 [%r1088], {%r11175, %r11177, %r11179, %r11181}; 2026-02-21T12:39:47.6931739Z st.shared.v4.b32 [%r1089], {%r11183, %r11185, %r11187, %r11189}; 2026-02-21T12:39:47.6931856Z st.shared.v4.b32 [%r1090], {%r11191, %r11193, %r11195, %r11197}; 2026-02-21T12:39:47.6931968Z st.shared.v4.b32 [%r1091], {%r11199, %r11201, %r11203, %r11205}; 
2026-02-21T12:39:47.6932079Z st.shared.v4.b32 [%r1092], {%r11207, %r11209, %r11211, %r11213}; 2026-02-21T12:39:47.6932192Z st.shared.v4.b32 [%r1093], {%r11215, %r11217, %r11219, %r11221}; 2026-02-21T12:39:47.6932305Z st.shared.v4.b32 [%r1094], {%r11223, %r11225, %r11227, %r11229}; 2026-02-21T12:39:47.6932361Z bar.sync 0; 2026-02-21T12:39:47.6932424Z // begin inline asm 2026-02-21T12:39:47.6932645Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11102, %r11103, %r11104, %r11105}, [%r11026]; 2026-02-21T12:39:47.6932707Z // end inline asm 2026-02-21T12:39:47.6932770Z // begin inline asm 2026-02-21T12:39:47.6932973Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11110, %r11111, %r11112, %r11113}, [%r11031]; 2026-02-21T12:39:47.6933031Z // end inline asm 2026-02-21T12:39:47.6933091Z // begin inline asm 2026-02-21T12:39:47.6933284Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11118, %r11119, %r11120, %r11121}, [%r11036]; 2026-02-21T12:39:47.6933346Z // end inline asm 2026-02-21T12:39:47.6933404Z // begin inline asm 2026-02-21T12:39:47.6933596Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11126, %r11127, %r11128, %r11129}, [%r11041]; 2026-02-21T12:39:47.6933654Z // end inline asm 2026-02-21T12:39:47.6933713Z // begin inline asm 2026-02-21T12:39:47.6933905Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11134, %r11135, %r11136, %r11137}, [%r11046]; 2026-02-21T12:39:47.6933961Z // end inline asm 2026-02-21T12:39:47.6934025Z // begin inline asm 2026-02-21T12:39:47.6934216Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11142, %r11143, %r11144, %r11145}, [%r11051]; 2026-02-21T12:39:47.6934279Z // end inline asm 2026-02-21T12:39:47.6934340Z // begin inline asm 2026-02-21T12:39:47.6934529Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11150, %r11151, %r11152, %r11153}, [%r11056]; 2026-02-21T12:39:47.6934587Z // end inline asm 2026-02-21T12:39:47.6934649Z // begin inline asm 2026-02-21T12:39:47.6934839Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11158, %r11159, %r11160, %r11161}, [%r11061]; 
2026-02-21T12:39:47.6934896Z // end inline asm 2026-02-21T12:39:47.6934952Z bar.sync 0; 2026-02-21T12:39:47.6935084Z st.shared.v4.b32 [%r1087], {%r11168, %r11170, %r11172, %r11174}; 2026-02-21T12:39:47.6935203Z st.shared.v4.b32 [%r1088], {%r11176, %r11178, %r11180, %r11182}; 2026-02-21T12:39:47.6935316Z st.shared.v4.b32 [%r1089], {%r11184, %r11186, %r11188, %r11190}; 2026-02-21T12:39:47.6935433Z st.shared.v4.b32 [%r1090], {%r11192, %r11194, %r11196, %r11198}; 2026-02-21T12:39:47.6935612Z st.shared.v4.b32 [%r1091], {%r11200, %r11202, %r11204, %r11206}; 2026-02-21T12:39:47.6935773Z st.shared.v4.b32 [%r1092], {%r11208, %r11210, %r11212, %r11214}; 2026-02-21T12:39:47.6935888Z st.shared.v4.b32 [%r1093], {%r11216, %r11218, %r11220, %r11222}; 2026-02-21T12:39:47.6935999Z st.shared.v4.b32 [%r1094], {%r11224, %r11226, %r11228, %r11230}; 2026-02-21T12:39:47.6936054Z bar.sync 0; 2026-02-21T12:39:47.6936112Z // begin inline asm 2026-02-21T12:39:47.6936311Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11106, %r11107, %r11108, %r11109}, [%r11026]; 2026-02-21T12:39:47.6936369Z // end inline asm 2026-02-21T12:39:47.6936428Z // begin inline asm 2026-02-21T12:39:47.6936750Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11114, %r11115, %r11116, %r11117}, [%r11031]; 2026-02-21T12:39:47.6936811Z // end inline asm 2026-02-21T12:39:47.6936872Z // begin inline asm 2026-02-21T12:39:47.6937069Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11122, %r11123, %r11124, %r11125}, [%r11036]; 2026-02-21T12:39:47.6937128Z // end inline asm 2026-02-21T12:39:47.6937187Z // begin inline asm 2026-02-21T12:39:47.6937523Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11130, %r11131, %r11132, %r11133}, [%r11041]; 2026-02-21T12:39:47.6937591Z // end inline asm 2026-02-21T12:39:47.6937652Z // begin inline asm 2026-02-21T12:39:47.6937849Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11138, %r11139, %r11140, %r11141}, [%r11046]; 2026-02-21T12:39:47.6937914Z // end inline asm 2026-02-21T12:39:47.6937973Z // begin 
inline asm 2026-02-21T12:39:47.6938167Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11146, %r11147, %r11148, %r11149}, [%r11051]; 2026-02-21T12:39:47.6938228Z // end inline asm 2026-02-21T12:39:47.6938287Z // begin inline asm 2026-02-21T12:39:47.6938480Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11154, %r11155, %r11156, %r11157}, [%r11056]; 2026-02-21T12:39:47.6938537Z // end inline asm 2026-02-21T12:39:47.6938598Z // begin inline asm 2026-02-21T12:39:47.6938792Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11162, %r11163, %r11164, %r11165}, [%r11061]; 2026-02-21T12:39:47.6938850Z // end inline asm 2026-02-21T12:39:47.6938914Z // begin inline asm 2026-02-21T12:39:47.6939049Z st.global.v4.b32 [ %rd540 + 0 ], { %r11102, %r11103, %r11104, %r11105 }; 2026-02-21T12:39:47.6939108Z // end inline asm 2026-02-21T12:39:47.6939166Z // begin inline asm 2026-02-21T12:39:47.6939296Z st.global.v4.b32 [ %rd541 + 0 ], { %r11106, %r11107, %r11108, %r11109 }; 2026-02-21T12:39:47.6939353Z // end inline asm 2026-02-21T12:39:47.6939413Z // begin inline asm 2026-02-21T12:39:47.6939538Z st.global.v4.b32 [ %rd542 + 0 ], { %r11110, %r11111, %r11112, %r11113 }; 2026-02-21T12:39:47.6939594Z // end inline asm 2026-02-21T12:39:47.6939654Z // begin inline asm 2026-02-21T12:39:47.6939777Z st.global.v4.b32 [ %rd543 + 0 ], { %r11114, %r11115, %r11116, %r11117 }; 2026-02-21T12:39:47.6939834Z // end inline asm 2026-02-21T12:39:47.6939894Z // begin inline asm 2026-02-21T12:39:47.6940018Z st.global.v4.b32 [ %rd544 + 0 ], { %r11118, %r11119, %r11120, %r11121 }; 2026-02-21T12:39:47.6940080Z // end inline asm 2026-02-21T12:39:47.6940145Z // begin inline asm 2026-02-21T12:39:47.6940267Z st.global.v4.b32 [ %rd545 + 0 ], { %r11122, %r11123, %r11124, %r11125 }; 2026-02-21T12:39:47.6940326Z // end inline asm 2026-02-21T12:39:47.6940383Z // begin inline asm 2026-02-21T12:39:47.6940505Z st.global.v4.b32 [ %rd546 + 0 ], { %r11126, %r11127, %r11128, %r11129 }; 2026-02-21T12:39:47.6940573Z // end inline asm 
2026-02-21T12:39:47.6940640Z // begin inline asm 2026-02-21T12:39:47.6940761Z st.global.v4.b32 [ %rd547 + 0 ], { %r11130, %r11131, %r11132, %r11133 }; 2026-02-21T12:39:47.6940817Z // end inline asm 2026-02-21T12:39:47.6940881Z // begin inline asm 2026-02-21T12:39:47.6941001Z st.global.v4.b32 [ %rd548 + 0 ], { %r11134, %r11135, %r11136, %r11137 }; 2026-02-21T12:39:47.6941057Z // end inline asm 2026-02-21T12:39:47.6941120Z // begin inline asm 2026-02-21T12:39:47.6941328Z st.global.v4.b32 [ %rd549 + 0 ], { %r11138, %r11139, %r11140, %r11141 }; 2026-02-21T12:39:47.6941395Z // end inline asm 2026-02-21T12:39:47.6941519Z // begin inline asm 2026-02-21T12:39:47.6941647Z st.global.v4.b32 [ %rd550 + 0 ], { %r11142, %r11143, %r11144, %r11145 }; 2026-02-21T12:39:47.6941708Z // end inline asm 2026-02-21T12:39:47.6941769Z // begin inline asm 2026-02-21T12:39:47.6941893Z st.global.v4.b32 [ %rd551 + 0 ], { %r11146, %r11147, %r11148, %r11149 }; 2026-02-21T12:39:47.6941950Z // end inline asm 2026-02-21T12:39:47.6942008Z // begin inline asm 2026-02-21T12:39:47.6942129Z st.global.v4.b32 [ %rd552 + 0 ], { %r11150, %r11151, %r11152, %r11153 }; 2026-02-21T12:39:47.6942189Z // end inline asm 2026-02-21T12:39:47.6942249Z // begin inline asm 2026-02-21T12:39:47.6942370Z st.global.v4.b32 [ %rd553 + 0 ], { %r11154, %r11155, %r11156, %r11157 }; 2026-02-21T12:39:47.6942429Z // end inline asm 2026-02-21T12:39:47.6942489Z // begin inline asm 2026-02-21T12:39:47.6942611Z st.global.v4.b32 [ %rd554 + 0 ], { %r11158, %r11159, %r11160, %r11161 }; 2026-02-21T12:39:47.6942671Z // end inline asm 2026-02-21T12:39:47.6942730Z // begin inline asm 2026-02-21T12:39:47.6942942Z st.global.v4.b32 [ %rd555 + 0 ], { %r11162, %r11163, %r11164, %r11165 }; 2026-02-21T12:39:47.6943002Z // end inline asm 2026-02-21T12:39:47.6943072Z mov.b32 %r11772, 0f00000000; 2026-02-21T12:39:47.6943135Z mov.b32 %r11773, %r11772; 2026-02-21T12:39:47.6943195Z mov.b32 %r11774, %r11772; 2026-02-21T12:39:47.6943258Z mov.b32 %r11775, 
%r11772; 2026-02-21T12:39:47.6943317Z mov.b32 %r11776, %r11772; 2026-02-21T12:39:47.6943376Z mov.b32 %r11777, %r11772; 2026-02-21T12:39:47.6943435Z mov.b32 %r11778, %r11772; 2026-02-21T12:39:47.6943499Z mov.b32 %r11779, %r11772; 2026-02-21T12:39:47.6943558Z mov.b32 %r11780, %r11772; 2026-02-21T12:39:47.6943616Z mov.b32 %r11781, %r11772; 2026-02-21T12:39:47.6943678Z mov.b32 %r11782, %r11772; 2026-02-21T12:39:47.6943738Z mov.b32 %r11783, %r11772; 2026-02-21T12:39:47.6943798Z mov.b32 %r11784, %r11772; 2026-02-21T12:39:47.6943858Z mov.b32 %r11785, %r11772; 2026-02-21T12:39:47.6943921Z mov.b32 %r11786, %r11772; 2026-02-21T12:39:47.6943986Z mov.b32 %r11787, %r11772; 2026-02-21T12:39:47.6944044Z mov.b32 %r11788, %r11772; 2026-02-21T12:39:47.6944105Z mov.b32 %r11789, %r11772; 2026-02-21T12:39:47.6944164Z mov.b32 %r11790, %r11772; 2026-02-21T12:39:47.6944223Z mov.b32 %r11791, %r11772; 2026-02-21T12:39:47.6944281Z mov.b32 %r11792, %r11772; 2026-02-21T12:39:47.6944344Z mov.b32 %r11793, %r11772; 2026-02-21T12:39:47.6944416Z mov.b32 %r11794, %r11772; 2026-02-21T12:39:47.6944477Z mov.b32 %r11795, %r11772; 2026-02-21T12:39:47.6944542Z mov.b32 %r11796, %r11772; 2026-02-21T12:39:47.6944601Z mov.b32 %r11797, %r11772; 2026-02-21T12:39:47.6944660Z mov.b32 %r11798, %r11772; 2026-02-21T12:39:47.6944720Z mov.b32 %r11799, %r11772; 2026-02-21T12:39:47.6944782Z mov.b32 %r11800, %r11772; 2026-02-21T12:39:47.6944841Z mov.b32 %r11801, %r11772; 2026-02-21T12:39:47.6944902Z mov.b32 %r11802, %r11772; 2026-02-21T12:39:47.6944964Z mov.b32 %r11803, %r11772; 2026-02-21T12:39:47.6945023Z mov.b32 %r11804, %r11772; 2026-02-21T12:39:47.6945086Z mov.b32 %r11805, %r11772; 2026-02-21T12:39:47.6945147Z mov.b32 %r11806, %r11772; 2026-02-21T12:39:47.6945206Z mov.b32 %r11807, %r11772; 2026-02-21T12:39:47.6945265Z mov.b32 %r11808, %r11772; 2026-02-21T12:39:47.6945327Z mov.b32 %r11809, %r11772; 2026-02-21T12:39:47.6945389Z mov.b32 %r11810, %r11772; 2026-02-21T12:39:47.6945446Z mov.b32 %r11811, %r11772; 
2026-02-21T12:39:47.6945509Z mov.b32 %r11812, %r11772; 2026-02-21T12:39:47.6945571Z mov.b32 %r11813, %r11772; 2026-02-21T12:39:47.6945630Z mov.b32 %r11814, %r11772; 2026-02-21T12:39:47.6945689Z mov.b32 %r11815, %r11772; 2026-02-21T12:39:47.6945749Z mov.b32 %r11816, %r11772; 2026-02-21T12:39:47.6945813Z mov.b32 %r11817, %r11772; 2026-02-21T12:39:47.6945873Z mov.b32 %r11818, %r11772; 2026-02-21T12:39:47.6945933Z mov.b32 %r11819, %r11772; 2026-02-21T12:39:47.6946059Z mov.b32 %r11820, %r11772; 2026-02-21T12:39:47.6946118Z mov.b32 %r11821, %r11772; 2026-02-21T12:39:47.6946177Z mov.b32 %r11822, %r11772; 2026-02-21T12:39:47.6946284Z mov.b32 %r11823, %r11772; 2026-02-21T12:39:47.6946346Z mov.b32 %r11824, %r11772; 2026-02-21T12:39:47.6946407Z mov.b32 %r11825, %r11772; 2026-02-21T12:39:47.6946604Z mov.b32 %r11826, %r11772; 2026-02-21T12:39:47.6946686Z mov.b32 %r11827, %r11772; 2026-02-21T12:39:47.6946748Z mov.b32 %r11828, %r11772; 2026-02-21T12:39:47.6946807Z mov.b32 %r11829, %r11772; 2026-02-21T12:39:47.6946868Z mov.b32 %r11830, %r11772; 2026-02-21T12:39:47.6946930Z mov.b32 %r11831, %r11772; 2026-02-21T12:39:47.6946989Z mov.b32 %r11832, %r11772; 2026-02-21T12:39:47.6947048Z mov.b32 %r11833, %r11772; 2026-02-21T12:39:47.6947112Z mov.b32 %r11834, %r11772; 2026-02-21T12:39:47.6947170Z mov.b32 %r11835, %r11772; 2026-02-21T12:39:47.6947231Z mov.b32 %r11836, %r11772; 2026-02-21T12:39:47.6947289Z mov.b32 %r11837, %r11772; 2026-02-21T12:39:47.6947354Z mov.b32 %r11838, %r11772; 2026-02-21T12:39:47.6947413Z mov.b32 %r11839, %r11772; 2026-02-21T12:39:47.6947471Z mov.b32 %r11840, %r11772; 2026-02-21T12:39:47.6947688Z mov.b32 %r11841, %r11772; 2026-02-21T12:39:47.6947754Z mov.b32 %r11842, %r11772; 2026-02-21T12:39:47.6947814Z mov.b32 %r11843, %r11772; 2026-02-21T12:39:47.6947875Z mov.b32 %r11844, %r11772; 2026-02-21T12:39:47.6947935Z mov.b32 %r11845, %r11772; 2026-02-21T12:39:47.6947993Z mov.b32 %r11846, %r11772; 2026-02-21T12:39:47.6948052Z mov.b32 %r11847, %r11772; 
2026-02-21T12:39:47.6948113Z mov.b32 %r11848, %r11772; 2026-02-21T12:39:47.6948171Z mov.b32 %r11849, %r11772; 2026-02-21T12:39:47.6948229Z mov.b32 %r11850, %r11772; 2026-02-21T12:39:47.6948291Z mov.b32 %r11851, %r11772; 2026-02-21T12:39:47.6948350Z mov.b32 %r11852, %r11772; 2026-02-21T12:39:47.6948465Z mov.b32 %r11853, %r11772; 2026-02-21T12:39:47.6948526Z mov.b32 %r11854, %r11772; 2026-02-21T12:39:47.6948590Z mov.b32 %r11855, %r11772; 2026-02-21T12:39:47.6948652Z mov.b32 %r11856, %r11772; 2026-02-21T12:39:47.6948709Z mov.b32 %r11857, %r11772; 2026-02-21T12:39:47.6948773Z mov.b32 %r11858, %r11772; 2026-02-21T12:39:47.6948836Z mov.b32 %r11859, %r11772; 2026-02-21T12:39:47.6948897Z mov.b32 %r11860, %r11772; 2026-02-21T12:39:47.6948954Z mov.b32 %r11861, %r11772; 2026-02-21T12:39:47.6949018Z mov.b32 %r11862, %r11772; 2026-02-21T12:39:47.6949075Z mov.b32 %r11863, %r11772; 2026-02-21T12:39:47.6949134Z mov.b32 %r11864, %r11772; 2026-02-21T12:39:47.6949195Z mov.b32 %r11865, %r11772; 2026-02-21T12:39:47.6949253Z mov.b32 %r11866, %r11772; 2026-02-21T12:39:47.6949311Z mov.b32 %r11867, %r11772; 2026-02-21T12:39:47.6949370Z mov.b32 %r11868, %r11772; 2026-02-21T12:39:47.6949432Z mov.b32 %r11869, %r11772; 2026-02-21T12:39:47.6949492Z mov.b32 %r11870, %r11772; 2026-02-21T12:39:47.6949550Z mov.b32 %r11871, %r11772; 2026-02-21T12:39:47.6949613Z mov.b32 %r11872, %r11772; 2026-02-21T12:39:47.6949672Z mov.b32 %r11873, %r11772; 2026-02-21T12:39:47.6949732Z mov.b32 %r11874, %r11772; 2026-02-21T12:39:47.6949790Z mov.b32 %r11875, %r11772; 2026-02-21T12:39:47.6949865Z mov.b32 %r11876, %r11772; 2026-02-21T12:39:47.6949931Z mov.b32 %r11877, %r11772; 2026-02-21T12:39:47.6949992Z mov.b32 %r11878, %r11772; 2026-02-21T12:39:47.6950055Z mov.b32 %r11879, %r11772; 2026-02-21T12:39:47.6950113Z mov.b32 %r11880, %r11772; 2026-02-21T12:39:47.6950175Z mov.b32 %r11881, %r11772; 2026-02-21T12:39:47.6950233Z mov.b32 %r11882, %r11772; 2026-02-21T12:39:47.6950295Z mov.b32 %r11883, %r11772; 
2026-02-21T12:39:47.6950352Z mov.b32 %r11884, %r11772; 2026-02-21T12:39:47.6950411Z mov.b32 %r11885, %r11772; 2026-02-21T12:39:47.6950482Z mov.b32 %r11886, %r11772; 2026-02-21T12:39:47.6950547Z mov.b32 %r11887, %r11772; 2026-02-21T12:39:47.6950606Z mov.b32 %r11888, %r11772; 2026-02-21T12:39:47.6950666Z mov.b32 %r11889, %r11772; 2026-02-21T12:39:47.6950725Z mov.b32 %r11890, %r11772; 2026-02-21T12:39:47.6950786Z mov.b32 %r11891, %r11772; 2026-02-21T12:39:47.6950923Z mov.b32 %r11892, %r11772; 2026-02-21T12:39:47.6950986Z mov.b32 %r11893, %r11772; 2026-02-21T12:39:47.6951045Z mov.b32 %r11894, %r11772; 2026-02-21T12:39:47.6951166Z mov.b32 %r11895, %r11772; 2026-02-21T12:39:47.6951230Z mov.b32 %r11896, %r11772; 2026-02-21T12:39:47.6951291Z mov.b32 %r11897, %r11772; 2026-02-21T12:39:47.6951349Z mov.b32 %r11898, %r11772; 2026-02-21T12:39:47.6951408Z mov.b32 %r11899, %r11772; 2026-02-21T12:39:47.6951474Z bra.uni $L__BB0_35; 2026-02-21T12:39:47.6951575Z $L__BB0_36: // %._crit_edge123 2026-02-21T12:39:47.6951808Z .loc 1 22 120 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:22:120 2026-02-21T12:39:47.6951882Z cp.async.wait_group 0; 2026-02-21T12:39:47.6951939Z bar.sync 0; 2026-02-21T12:39:47.6952144Z .loc 1 22 4 // conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py:22:4 2026-02-21T12:39:47.6952198Z ret; 2026-02-21T12:39:47.6952258Z $L__tmp21: 2026-02-21T12:39:47.6952317Z $L__func_end0: 2026-02-21T12:39:47.6952405Z // -- End function 2026-02-21T12:39:47.6952469Z } 2026-02-21T12:39:47.6952817Z .file 1 "/tmp/torchinductor_root/on/conqj6jkv4cgprf4otq3nsi3zp5lmdvd6vje72rqc2wp76pg4y3z.py" 2026-02-21T12:39:47.6953037Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T12:39:47.6953104Z .section .debug_abbrev 2026-02-21T12:39:47.6953157Z { 2026-02-21T12:39:47.6953255Z .b8 1 // Abbreviation Code 2026-02-21T12:39:47.6953362Z .b8 17 // DW_TAG_compile_unit 2026-02-21T12:39:47.6953453Z .b8 1 // DW_CHILDREN_yes 
2026-02-21T12:39:47.6953542Z .b8 37 // DW_AT_producer 2026-02-21T12:39:47.6953624Z .b8 8 // DW_FORM_string 2026-02-21T12:39:47.6953709Z .b8 19 // DW_AT_language 2026-02-21T12:39:47.6953793Z .b8 5 // DW_FORM_data2 2026-02-21T12:39:47.6953872Z .b8 3 // DW_AT_name 2026-02-21T12:39:47.6953962Z .b8 8 // DW_FORM_string 2026-02-21T12:39:47.6954045Z .b8 16 // DW_AT_stmt_list 2026-02-21T12:39:47.6954125Z .b8 6 // DW_FORM_data4 2026-02-21T12:39:47.6954208Z .b8 27 // DW_AT_comp_dir 2026-02-21T12:39:47.6954292Z .b8 8 // DW_FORM_string 2026-02-21T12:39:47.6954367Z .b8 0 // EOM(1) 2026-02-21T12:39:47.6954439Z .b8 0 // EOM(2) 2026-02-21T12:39:47.6954532Z .b8 2 // Abbreviation Code 2026-02-21T12:39:47.6954621Z .b8 46 // DW_TAG_subprogram 2026-02-21T12:39:47.6954701Z .b8 0 // DW_CHILDREN_no 2026-02-21T12:39:47.6954783Z .b8 3 // DW_AT_name 2026-02-21T12:39:47.6954867Z .b8 8 // DW_FORM_string 2026-02-21T12:39:47.6954957Z .b8 32 // DW_AT_inline 2026-02-21T12:39:47.6955038Z .b8 11 // DW_FORM_data1 2026-02-21T12:39:47.6955117Z .b8 0 // EOM(1) 2026-02-21T12:39:47.6955187Z .b8 0 // EOM(2) 2026-02-21T12:39:47.6955275Z .b8 3 // Abbreviation Code 2026-02-21T12:39:47.6955366Z .b8 46 // DW_TAG_subprogram 2026-02-21T12:39:47.6955450Z .b8 1 // DW_CHILDREN_yes 2026-02-21T12:39:47.6955533Z .b8 17 // DW_AT_low_pc 2026-02-21T12:39:47.6955612Z .b8 1 // DW_FORM_addr 2026-02-21T12:39:47.6955696Z .b8 18 // DW_AT_high_pc 2026-02-21T12:39:47.6955832Z .b8 1 // DW_FORM_addr 2026-02-21T12:39:47.6955934Z .b8 49 // DW_AT_abstract_origin 2026-02-21T12:39:47.6956059Z .b8 19 // DW_FORM_ref4 2026-02-21T12:39:47.6956128Z .b8 0 // EOM(1) 2026-02-21T12:39:47.6956199Z .b8 0 // EOM(2) 2026-02-21T12:39:47.6956287Z .b8 4 // Abbreviation Code 2026-02-21T12:39:47.6956399Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T12:39:47.6956599Z .b8 0 // DW_CHILDREN_no 2026-02-21T12:39:47.6956701Z .b8 49 // DW_AT_abstract_origin 2026-02-21T12:39:47.6956779Z .b8 19 // DW_FORM_ref4 2026-02-21T12:39:47.6956856Z .b8 17 // DW_AT_low_pc 
2026-02-21T12:39:47.6956935Z .b8 1 // DW_FORM_addr 2026-02-21T12:39:47.6957028Z .b8 18 // DW_AT_high_pc 2026-02-21T12:39:47.6957185Z .b8 1 // DW_FORM_addr 2026-02-21T12:39:47.6957343Z .b8 88 // DW_AT_call_file 2026-02-21T12:39:47.6957436Z .b8 11 // DW_FORM_data1 2026-02-21T12:39:47.6957517Z .b8 89 // DW_AT_call_line 2026-02-21T12:39:47.6957596Z .b8 11 // DW_FORM_data1 2026-02-21T12:39:47.6957684Z .b8 87 // DW_AT_call_column 2026-02-21T12:39:47.6957763Z .b8 11 // DW_FORM_data1 2026-02-21T12:39:47.6957833Z .b8 0 // EOM(1) 2026-02-21T12:39:47.6957906Z .b8 0 // EOM(2) 2026-02-21T12:39:47.6957979Z .b8 0 // EOM(3) 2026-02-21T12:39:47.6958031Z } 2026-02-21T12:39:47.6958096Z .section .debug_info 2026-02-21T12:39:47.6958152Z { 2026-02-21T12:39:47.6958243Z .b32 178 // Length of Unit 2026-02-21T12:39:47.6958342Z .b8 2 // DWARF version number 2026-02-21T12:39:47.6958398Z .b8 0 2026-02-21T12:39:47.6958532Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T12:39:47.6958629Z .b8 8 // Address Size (in bytes) 2026-02-21T12:39:47.6958747Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T12:39:47.6958836Z .b8 116 // DW_AT_producer 2026-02-21T12:39:47.6958892Z .b8 114 2026-02-21T12:39:47.6958946Z .b8 105 2026-02-21T12:39:47.6959002Z .b8 116 2026-02-21T12:39:47.6959054Z .b8 111 2026-02-21T12:39:47.6959107Z .b8 110 2026-02-21T12:39:47.6959163Z .b8 0 2026-02-21T12:39:47.6959244Z .b8 2 // DW_AT_language 2026-02-21T12:39:47.6959294Z .b8 0 2026-02-21T12:39:47.6959376Z .b8 99 // DW_AT_name 2026-02-21T12:39:47.6959433Z .b8 111 2026-02-21T12:39:47.6959486Z .b8 110 2026-02-21T12:39:47.6959537Z .b8 113 2026-02-21T12:39:47.6959595Z .b8 106 2026-02-21T12:39:47.6959647Z .b8 54 2026-02-21T12:39:47.6959700Z .b8 106 2026-02-21T12:39:47.6959753Z .b8 107 2026-02-21T12:39:47.6959808Z .b8 118 2026-02-21T12:39:47.6959858Z .b8 52 2026-02-21T12:39:47.6959910Z .b8 99 2026-02-21T12:39:47.6959967Z .b8 103 2026-02-21T12:39:47.6960020Z .b8 112 2026-02-21T12:39:47.6960086Z .b8 114 
2026-02-21T12:39:47.6960142Z .b8 102 2026-02-21T12:39:47.6960197Z .b8 52 2026-02-21T12:39:47.6960251Z .b8 111 2026-02-21T12:39:47.6960302Z .b8 116 2026-02-21T12:39:47.6960355Z .b8 113 2026-02-21T12:39:47.6960410Z .b8 51 2026-02-21T12:39:47.6960464Z .b8 110 2026-02-21T12:39:47.6960516Z .b8 115 2026-02-21T12:39:47.6960570Z .b8 105 2026-02-21T12:39:47.6960622Z .b8 51 2026-02-21T12:39:47.6960673Z .b8 122 2026-02-21T12:39:47.6960725Z .b8 112 2026-02-21T12:39:47.6960784Z .b8 53 2026-02-21T12:39:47.6960924Z .b8 108 2026-02-21T12:39:47.6960977Z .b8 109 2026-02-21T12:39:47.6961033Z .b8 100 2026-02-21T12:39:47.6961086Z .b8 118 2026-02-21T12:39:47.6961204Z .b8 100 2026-02-21T12:39:47.6961272Z .b8 54 2026-02-21T12:39:47.6961328Z .b8 118 2026-02-21T12:39:47.6961382Z .b8 106 2026-02-21T12:39:47.6961434Z .b8 101 2026-02-21T12:39:47.6961485Z .b8 55 2026-02-21T12:39:47.6961539Z .b8 50 2026-02-21T12:39:47.6961591Z .b8 114 2026-02-21T12:39:47.6961643Z .b8 113 2026-02-21T12:39:47.6961699Z .b8 99 2026-02-21T12:39:47.6961752Z .b8 50 2026-02-21T12:39:47.6961805Z .b8 119 2026-02-21T12:39:47.6961857Z .b8 112 2026-02-21T12:39:47.6961913Z .b8 55 2026-02-21T12:39:47.6961964Z .b8 54 2026-02-21T12:39:47.6962015Z .b8 112 2026-02-21T12:39:47.6962070Z .b8 103 2026-02-21T12:39:47.6962121Z .b8 52 2026-02-21T12:39:47.6962172Z .b8 121 2026-02-21T12:39:47.6962223Z .b8 51 2026-02-21T12:39:47.6962282Z .b8 122 2026-02-21T12:39:47.6962334Z .b8 46 2026-02-21T12:39:47.6962386Z .b8 112 2026-02-21T12:39:47.6962440Z .b8 121 2026-02-21T12:39:47.6962496Z .b8 0 2026-02-21T12:39:47.6962600Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T12:39:47.6962782Z .b8 47 // DW_AT_comp_dir 2026-02-21T12:39:47.6962843Z .b8 116 2026-02-21T12:39:47.6962893Z .b8 109 2026-02-21T12:39:47.6962947Z .b8 112 2026-02-21T12:39:47.6963002Z .b8 47 2026-02-21T12:39:47.6963066Z .b8 116 2026-02-21T12:39:47.6963121Z .b8 111 2026-02-21T12:39:47.6963173Z .b8 114 2026-02-21T12:39:47.6963227Z .b8 99 2026-02-21T12:39:47.6963278Z .b8 104 
2026-02-21T12:39:47.6963329Z .b8 105 2026-02-21T12:39:47.6963381Z .b8 110 2026-02-21T12:39:47.6963435Z .b8 100 2026-02-21T12:39:47.6963485Z .b8 117 2026-02-21T12:39:47.6963536Z .b8 99 2026-02-21T12:39:47.6963591Z .b8 116 2026-02-21T12:39:47.6963643Z .b8 111 2026-02-21T12:39:47.6963694Z .b8 114 2026-02-21T12:39:47.6963746Z .b8 95 2026-02-21T12:39:47.6963801Z .b8 114 2026-02-21T12:39:47.6963851Z .b8 111 2026-02-21T12:39:47.6963903Z .b8 111 2026-02-21T12:39:47.6963959Z .b8 116 2026-02-21T12:39:47.6964012Z .b8 47 2026-02-21T12:39:47.6964065Z .b8 111 2026-02-21T12:39:47.6964117Z .b8 110 2026-02-21T12:39:47.6964172Z .b8 0 2026-02-21T12:39:47.6964292Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T12:39:47.6964370Z .b8 95 // DW_AT_name 2026-02-21T12:39:47.6964430Z .b8 104 2026-02-21T12:39:47.6964498Z .b8 101 2026-02-21T12:39:47.6964551Z .b8 108 2026-02-21T12:39:47.6964604Z .b8 105 2026-02-21T12:39:47.6964658Z .b8 111 2026-02-21T12:39:47.6964710Z .b8 110 2026-02-21T12:39:47.6964761Z .b8 95 2026-02-21T12:39:47.6964813Z .b8 109 2026-02-21T12:39:47.6964866Z .b8 97 2026-02-21T12:39:47.6964918Z .b8 116 2026-02-21T12:39:47.6964970Z .b8 109 2026-02-21T12:39:47.6965028Z .b8 117 2026-02-21T12:39:47.6965081Z .b8 108 2026-02-21T12:39:47.6965131Z .b8 95 2026-02-21T12:39:47.6965191Z .b8 98 2026-02-21T12:39:47.6965249Z .b8 102 2026-02-21T12:39:47.6965301Z .b8 49 2026-02-21T12:39:47.6965351Z .b8 54 2026-02-21T12:39:47.6965407Z .b8 95 2026-02-21T12:39:47.6965459Z .b8 105 2026-02-21T12:39:47.6965511Z .b8 110 2026-02-21T12:39:47.6965562Z .b8 116 2026-02-21T12:39:47.6965618Z .b8 52 2026-02-21T12:39:47.6965672Z .b8 0 2026-02-21T12:39:47.6965755Z .b8 1 // DW_AT_inline 2026-02-21T12:39:47.6965867Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T12:39:47.6965962Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T12:39:47.6966057Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T12:39:47.6966161Z .b32 108 // DW_AT_abstract_origin 2026-02-21T12:39:47.6966296Z .b8 4 // Abbrev [4] 
0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T12:39:47.6966394Z .b32 108 // DW_AT_abstract_origin 2026-02-21T12:39:47.6966595Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T12:39:47.6966698Z .b64 $L__tmp20 // DW_AT_high_pc 2026-02-21T12:39:47.6966869Z .b8 1 // DW_AT_call_file 2026-02-21T12:39:47.6966963Z .b8 87 // DW_AT_call_line 2026-02-21T12:39:47.6967146Z .b8 40 // DW_AT_call_column 2026-02-21T12:39:47.6967250Z .b8 0 // End Of Children Mark 2026-02-21T12:39:47.6967339Z .b8 0 // End Of Children Mark 2026-02-21T12:39:47.6967394Z } 2026-02-21T12:39:47.6967463Z .section .debug_macinfo { } 2026-02-21T12:39:47.6967469Z 2026-02-21T12:39:47.6967551Z ================================================================ 2026-02-21T12:39:47.6967670Z please share the reproducer above with Triton project. 2026-02-21T12:40:23.7893915Z 2026-02-21T12:40:23.7893932Z 2026-02-21T12:40:23.7893937Z 2026-02-21T12:40:23.7894366Z ================================================================ 2026-02-21T12:40:23.7894766Z Internal Triton PTX codegen error 2026-02-21T12:40:23.7895053Z `ptxas` stderr: 2026-02-21T12:40:23.7896321Z ptxas fatal : (C7602) Insufficient registers (128) to compile instruction at line 705 in function _helion_matmul_bf16_int4. Try to compile with register target of 158 or higher. 
2026-02-21T12:40:23.7897730Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T12:40:23.7897973Z 2026-02-21T12:40:23.7898634Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmppzm9t2nw.ptx -o /tmp/tmppzm9t2nw.ptx.o 2026-02-21T12:40:23.7899371Z 2026-02-21T12:40:23.7899376Z 2026-02-21T12:40:23.7899456Z // 2026-02-21T12:40:23.7899675Z // Generated by LLVM NVPTX Back-End 2026-02-21T12:40:23.7899934Z // 2026-02-21T12:40:23.7900033Z 2026-02-21T12:40:23.7900108Z .version 8.7 2026-02-21T12:40:23.7900290Z .target sm_90a 2026-02-21T12:40:23.7900485Z .address_size 64 2026-02-21T12:40:23.7900602Z 2026-02-21T12:40:23.7900834Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T12:40:23.7901269Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T12:40:23.7901608Z // @_helion_matmul_bf16_int4 2026-02-21T12:40:23.7901953Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T12:40:23.7902336Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T12:40:23.7902786Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T12:40:23.7903248Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T12:40:23.7903676Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T12:40:23.7904104Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T12:40:23.7904449Z ) 2026-02-21T12:40:23.7904605Z .reqntid 128 2026-02-21T12:40:23.7904798Z .maxnreg 128 2026-02-21T12:40:23.7904962Z { 2026-02-21T12:40:23.7905134Z .reg .pred %p<59>; 2026-02-21T12:40:23.7905334Z .reg .b16 %rs<2081>; 2026-02-21T12:40:23.7905546Z .reg .b32 %r<22610>; 2026-02-21T12:40:23.7905759Z .reg .b64 %rd<648>; 2026-02-21T12:40:23.7906109Z .loc 1 14 0 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:14:0 2026-02-21T12:40:23.7906668Z $L__func_begin0: 
2026-02-21T12:40:23.7906996Z .loc 1 14 0 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:14:0 2026-02-21T12:40:23.7907306Z 2026-02-21T12:40:23.7907376Z // %bb.0: 2026-02-21T12:40:23.7907580Z ld.param.b64 %rd162, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T12:40:23.7907903Z ld.param.b64 %rd161, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T12:40:23.7908208Z ld.param.b64 %rd160, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T12:40:23.7908560Z $L__tmp0: 2026-02-21T12:40:23.7908856Z .loc 1 20 30 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:20:30 2026-02-21T12:40:23.7909224Z mov.u32 %r2659, %ctaid.x; 2026-02-21T12:40:23.7909554Z .loc 1 20 48 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:20:48 2026-02-21T12:40:23.7910142Z mul.wide.u32 %rd612, %r2659, 5; 2026-02-21T12:40:23.7910702Z .loc 1 21 49 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:21:49 2026-02-21T12:40:23.7911447Z min.u64 %rd163, %rd612, 20475; 2026-02-21T12:40:23.7911719Z add.s64 %rd2, %rd163, 5; 2026-02-21T12:40:23.7912186Z .loc 1 22 120 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:22:120 2026-02-21T12:40:23.7912796Z sub.s64 %rd164, %rd2, %rd612; 2026-02-21T12:40:23.7913135Z shr.s64 %rd165, %rd164, 63; 2026-02-21T12:40:23.7913431Z shr.u64 %rd166, %rd165, 62; 2026-02-21T12:40:23.7913795Z add.s64 %rd167, %rd164, %rd166; 2026-02-21T12:40:23.7914084Z and.b64 %rd168, %rd167, -4; 2026-02-21T12:40:23.7914350Z add.s64 %rd641, %rd168, %rd612; 2026-02-21T12:40:23.7914863Z .loc 1 34 45 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:34:45 2026-02-21T12:40:23.7915390Z mov.u32 %r1, %tid.x; 2026-02-21T12:40:23.7915630Z shr.u32 %r2, %r1, 5; 2026-02-21T12:40:23.7915879Z bfe.u32 %r2660, %r1, 1, 6; 2026-02-21T12:40:23.7916330Z and.b32 %r3, %r1, 96; 2026-02-21T12:40:23.7917054Z bfe.u32 %r2661, %r1, 5, 2; 2026-02-21T12:40:23.7917383Z or.b32 %r2662, %r2661, 4; 2026-02-21T12:40:23.7917584Z or.b32 %r2663, %r2661, 8; 2026-02-21T12:40:23.7917853Z or.b32 
%r2664, %r2661, 12; 2026-02-21T12:40:23.7918182Z or.b32 %r2665, %r2661, 16; 2026-02-21T12:40:23.7918382Z or.b32 %r2666, %r2661, 20; 2026-02-21T12:40:23.7918571Z or.b32 %r2667, %r2661, 24; 2026-02-21T12:40:23.7918903Z or.b32 %r2668, %r2661, 28; 2026-02-21T12:40:23.7919105Z or.b32 %r2669, %r2661, 32; 2026-02-21T12:40:23.7919285Z or.b32 %r2670, %r2661, 36; 2026-02-21T12:40:23.7919492Z or.b32 %r2671, %r2661, 40; 2026-02-21T12:40:23.7919798Z or.b32 %r2672, %r2661, 44; 2026-02-21T12:40:23.7920096Z or.b32 %r2673, %r2661, 48; 2026-02-21T12:40:23.7920337Z or.b32 %r2674, %r2661, 52; 2026-02-21T12:40:23.7920510Z or.b32 %r2675, %r2661, 56; 2026-02-21T12:40:23.7920812Z or.b32 %r2676, %r2661, 60; 2026-02-21T12:40:23.7921294Z .loc 1 34 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:34:32 2026-02-21T12:40:23.7921814Z cvt.u64.u32 %rd4, %r2660; 2026-02-21T12:40:23.7922072Z cvt.u64.u32 %rd5, %r2661; 2026-02-21T12:40:23.7922279Z cvt.u64.u32 %rd6, %r2662; 2026-02-21T12:40:23.7922557Z cvt.u64.u32 %rd7, %r2663; 2026-02-21T12:40:23.7922836Z cvt.u64.u32 %rd8, %r2664; 2026-02-21T12:40:23.7923033Z cvt.u64.u32 %rd9, %r2665; 2026-02-21T12:40:23.7923214Z cvt.u64.u32 %rd10, %r2666; 2026-02-21T12:40:23.7923548Z cvt.u64.u32 %rd11, %r2667; 2026-02-21T12:40:23.7923779Z cvt.u64.u32 %rd12, %r2668; 2026-02-21T12:40:23.7923959Z cvt.u64.u32 %rd13, %r2669; 2026-02-21T12:40:23.7924209Z cvt.u64.u32 %rd14, %r2670; 2026-02-21T12:40:23.7924534Z cvt.u64.u32 %rd15, %r2671; 2026-02-21T12:40:23.7924717Z cvt.u64.u32 %rd16, %r2672; 2026-02-21T12:40:23.7924896Z cvt.u64.u32 %rd17, %r2673; 2026-02-21T12:40:23.7925219Z cvt.u64.u32 %rd18, %r2674; 2026-02-21T12:40:23.7925466Z cvt.u64.u32 %rd19, %r2675; 2026-02-21T12:40:23.7925647Z cvt.u64.u32 %rd20, %r2676; 2026-02-21T12:40:23.7926184Z .loc 1 36 45 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:36:45 2026-02-21T12:40:23.7926858Z shl.b32 %r4, %r1, 4; 2026-02-21T12:40:23.7927144Z and.b32 %r2677, %r4, 240; 2026-02-21T12:40:23.7927325Z shl.b32 %r5, 
%r1, 3; 2026-02-21T12:40:23.7927556Z and.b32 %r2678, %r5, 248; 2026-02-21T12:40:23.7928052Z .loc 1 36 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:36:32 2026-02-21T12:40:23.7928528Z cvt.u64.u32 %rd21, %r2677; 2026-02-21T12:40:23.7928841Z cvt.u64.u32 %rd22, %r2678; 2026-02-21T12:40:23.7929776Z .loc 1 44 48 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:44:48 2026-02-21T12:40:23.7930256Z and.b32 %r6, %r1, 112; 2026-02-21T12:40:23.7930522Z bfe.u32 %r7, %r1, 4, 3; 2026-02-21T12:40:23.7931095Z .loc 1 50 39 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:50:39 2026-02-21T12:40:23.7931541Z and.b32 %r8, %r5, 8; 2026-02-21T12:40:23.7932174Z .loc 1 22 120 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:22:120 2026-02-21T12:40:23.7932569Z setp.lt.s64 %p1, %rd168, 1; 2026-02-21T12:40:23.7932902Z cvt.u32.u64 %r21312, %rd5; 2026-02-21T12:40:23.7933242Z and.b32 %r21313, %r4, 1904; 2026-02-21T12:40:23.7933469Z and.b32 %r21314, %r1, 8; 2026-02-21T12:40:23.7933686Z bfe.s32 %r21315, %r1, 3, 1; 2026-02-21T12:40:23.7934024Z mov.b32 %r18409, global_smem; 2026-02-21T12:40:23.7934236Z shl.b32 %r21317, %r3, 4; 2026-02-21T12:40:23.7934416Z and.b32 %r21318, %r5, 96; 2026-02-21T12:40:23.7934704Z shl.b32 %r21319, %r1, 1; 2026-02-21T12:40:23.7934965Z bfe.s32 %r21320, %r1, 4, 1; 2026-02-21T12:40:23.7935147Z and.b32 %r21321, %r4, 112; 2026-02-21T12:40:23.7935327Z and.b32 %r21322, %r1, 6; 2026-02-21T12:40:23.7935641Z shl.b32 %r21323, %r1, 6; 2026-02-21T12:40:23.7935866Z and.b32 %r21324, %r1, 3; 2026-02-21T12:40:23.7936052Z and.b32 %r21325, %r4, 1920; 2026-02-21T12:40:23.7936407Z bfe.s32 %r21326, %r1, 2, 1; 2026-02-21T12:40:23.7936805Z and.b32 %r21327, %r1, 24; 2026-02-21T12:40:23.7936997Z bfe.s32 %r21328, %r1, 5, 1; 2026-02-21T12:40:23.7937177Z and.b32 %r21329, %r1, 1; 2026-02-21T12:40:23.7937361Z mul.wide.u32 %rd611, %r7, 1280; 2026-02-21T12:40:23.7937559Z @%p1 bra $L__BB0_4; 2026-02-21T12:40:23.7937743Z // %bb.1: // %.lr.ph 
2026-02-21T12:40:23.7938139Z .loc 1 0 120 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:0:120 2026-02-21T12:40:23.7938500Z and.b32 %r2683, %r21315, 136; 2026-02-21T12:40:23.7938693Z or.b32 %r2684, %r2683, %r21313; 2026-02-21T12:40:23.7938914Z add.s32 %r9, %r18409, %r2684; 2026-02-21T12:40:23.7939099Z xor.b32 %r2686, %r2684, 8; 2026-02-21T12:40:23.7939278Z add.s32 %r10, %r18409, %r2686; 2026-02-21T12:40:23.7939468Z and.b32 %r2690, %r21319, 6; 2026-02-21T12:40:23.7939648Z and.b32 %r2692, %r21320, 136; 2026-02-21T12:40:23.7939824Z or.b32 %r2693, %r21317, %r21318; 2026-02-21T12:40:23.7940026Z or.b32 %r2694, %r2693, %r2690; 2026-02-21T12:40:23.7940206Z or.b32 %r2695, %r2694, %r2692; 2026-02-21T12:40:23.7940391Z add.s32 %r11, %r18409, %r2695; 2026-02-21T12:40:23.7940567Z xor.b32 %r2696, %r2695, 8; 2026-02-21T12:40:23.7940754Z add.s32 %r12, %r18409, %r2696; 2026-02-21T12:40:23.7940965Z shr.u32 %r2698, %r21314, 1; 2026-02-21T12:40:23.7941158Z or.b32 %r2699, %r2698, %r21321; 2026-02-21T12:40:23.7941397Z or.b32 %r2700, %r2699, %r21312; 2026-02-21T12:40:23.7941596Z or.b32 %r2701, %r2700, %r2692; 2026-02-21T12:40:23.7941803Z add.s32 %r13, %r18409, %r2701; 2026-02-21T12:40:23.7941992Z xor.b32 %r2702, %r2701, 8; 2026-02-21T12:40:23.7942192Z add.s32 %r14, %r18409, %r2702; 2026-02-21T12:40:23.7942382Z xor.b32 %r2703, %r2701, 32; 2026-02-21T12:40:23.7942562Z add.s32 %r15, %r18409, %r2703; 2026-02-21T12:40:23.7942742Z xor.b32 %r2704, %r2701, 40; 2026-02-21T12:40:23.7942920Z add.s32 %r16, %r18409, %r2704; 2026-02-21T12:40:23.7943116Z xor.b32 %r2705, %r2701, 64; 2026-02-21T12:40:23.7943301Z add.s32 %r17, %r18409, %r2705; 2026-02-21T12:40:23.7943500Z xor.b32 %r2706, %r2701, 72; 2026-02-21T12:40:23.7943684Z add.s32 %r18, %r18409, %r2706; 2026-02-21T12:40:23.7943873Z xor.b32 %r2707, %r2701, 96; 2026-02-21T12:40:23.7944047Z add.s32 %r19, %r18409, %r2707; 2026-02-21T12:40:23.7944236Z xor.b32 %r2708, %r2701, 104; 2026-02-21T12:40:23.7944414Z add.s32 %r20, %r18409, %r2708; 
2026-02-21T12:40:23.7944600Z xor.b32 %r2709, %r2701, 4; 2026-02-21T12:40:23.7944784Z add.s32 %r21, %r18409, %r2709; 2026-02-21T12:40:23.7944964Z xor.b32 %r2710, %r2701, 12; 2026-02-21T12:40:23.7945144Z add.s32 %r22, %r18409, %r2710; 2026-02-21T12:40:23.7945324Z xor.b32 %r2711, %r2701, 36; 2026-02-21T12:40:23.7945504Z add.s32 %r23, %r18409, %r2711; 2026-02-21T12:40:23.7945682Z xor.b32 %r2712, %r2701, 44; 2026-02-21T12:40:23.7945956Z add.s32 %r24, %r18409, %r2712; 2026-02-21T12:40:23.7946133Z xor.b32 %r2713, %r2701, 68; 2026-02-21T12:40:23.7946314Z add.s32 %r25, %r18409, %r2713; 2026-02-21T12:40:23.7946729Z xor.b32 %r2714, %r2701, 76; 2026-02-21T12:40:23.7946911Z add.s32 %r26, %r18409, %r2714; 2026-02-21T12:40:23.7947101Z xor.b32 %r2715, %r2701, 100; 2026-02-21T12:40:23.7947284Z add.s32 %r27, %r18409, %r2715; 2026-02-21T12:40:23.7947468Z xor.b32 %r2716, %r2701, 108; 2026-02-21T12:40:23.7947640Z add.s32 %r28, %r18409, %r2716; 2026-02-21T12:40:23.7947822Z and.b32 %r2718, %r21315, 1028; 2026-02-21T12:40:23.7948003Z mul.lo.s32 %r2719, %r21322, 144; 2026-02-21T12:40:23.7948197Z xor.b32 %r2720, %r2719, %r6; 2026-02-21T12:40:23.7948469Z or.b32 %r2721, %r2718, %r2720; 2026-02-21T12:40:23.7948656Z or.b32 %r2722, %r2721, %r8; 2026-02-21T12:40:23.7948830Z add.s32 %r29, %r18409, %r2722; 2026-02-21T12:40:23.7949016Z xor.b32 %r2723, %r2722, 4; 2026-02-21T12:40:23.7949195Z add.s32 %r30, %r18409, %r2723; 2026-02-21T12:40:23.7949376Z xor.b32 %r2724, %r2722, 136; 2026-02-21T12:40:23.7949554Z add.s32 %r31, %r18409, %r2724; 2026-02-21T12:40:23.7949880Z xor.b32 %r2725, %r2722, 140; 2026-02-21T12:40:23.7950069Z add.s32 %r32, %r18409, %r2725; 2026-02-21T12:40:23.7950248Z and.b32 %r2727, %r21323, 8128; 2026-02-21T12:40:23.7950429Z shl.b32 %r2728, %r21322, 3; 2026-02-21T12:40:23.7950605Z or.b32 %r2729, %r2727, %r2728; 2026-02-21T12:40:23.7950787Z add.s32 %r33, %r18409, %r2729; 2026-02-21T12:40:23.7950972Z xor.b32 %r2730, %r2729, 16; 2026-02-21T12:40:23.7951144Z add.s32 %r34, %r18409, %r2730; 
2026-02-21T12:40:23.7951329Z xor.b32 %r2731, %r2729, 32; 2026-02-21T12:40:23.7951500Z add.s32 %r35, %r18409, %r2731; 2026-02-21T12:40:23.7951685Z xor.b32 %r2732, %r2729, 48; 2026-02-21T12:40:23.7951857Z add.s32 %r36, %r18409, %r2732; 2026-02-21T12:40:23.7952044Z bfe.u32 %r2733, %r18409, 4, 14; 2026-02-21T12:40:23.7952234Z cvt.u64.u32 %rd169, %r2733; 2026-02-21T12:40:23.7952453Z or.b64 %rd23, %rd169, -9223371899348713472; 2026-02-21T12:40:23.7952685Z add.s32 %r2734, %r18409, 32; 2026-02-21T12:40:23.7952873Z bfe.u32 %r2735, %r2734, 4, 14; 2026-02-21T12:40:23.7953066Z cvt.u64.u32 %rd170, %r2735; 2026-02-21T12:40:23.7953257Z or.b64 %rd24, %rd170, -9223371899348713472; 2026-02-21T12:40:23.7953475Z and.b32 %r2739, %r21326, 2064; 2026-02-21T12:40:23.7953663Z or.b32 %r2740, %r21325, %r2739; 2026-02-21T12:40:23.7953868Z mad.lo.s32 %r2741, %r21324, 4128, %r2740; 2026-02-21T12:40:23.7954080Z add.s32 %r37, %r18409, %r2741; 2026-02-21T12:40:23.7954278Z xor.b32 %r2742, %r2741, 16; 2026-02-21T12:40:23.7954460Z add.s32 %r38, %r18409, %r2742; 2026-02-21T12:40:23.7954647Z xor.b32 %r2743, %r2741, 32; 2026-02-21T12:40:23.7954830Z add.s32 %r39, %r18409, %r2743; 2026-02-21T12:40:23.7955007Z xor.b32 %r2744, %r2741, 48; 2026-02-21T12:40:23.7955187Z add.s32 %r40, %r18409, %r2744; 2026-02-21T12:40:23.7955362Z xor.b32 %r2745, %r2741, 64; 2026-02-21T12:40:23.7955539Z add.s32 %r41, %r18409, %r2745; 2026-02-21T12:40:23.7955720Z xor.b32 %r2746, %r2741, 80; 2026-02-21T12:40:23.7955893Z add.s32 %r42, %r18409, %r2746; 2026-02-21T12:40:23.7956089Z xor.b32 %r2747, %r2741, 96; 2026-02-21T12:40:23.7956267Z add.s32 %r43, %r18409, %r2747; 2026-02-21T12:40:23.7956443Z xor.b32 %r2748, %r2741, 112; 2026-02-21T12:40:23.7956776Z add.s32 %r44, %r18409, %r2748; 2026-02-21T12:40:23.7956957Z shl.b32 %r2750, %r21327, 9; 2026-02-21T12:40:23.7957130Z shl.b32 %r2751, %r21327, 2; 2026-02-21T12:40:23.7957310Z and.b32 %r2753, %r21328, 2064; 2026-02-21T12:40:23.7957490Z and.b32 %r2754, %r21319, 128; 
2026-02-21T12:40:23.7957678Z or.b32 %r2755, %r2750, %r21321; 2026-02-21T12:40:23.7957863Z or.b32 %r2756, %r2753, %r2751; 2026-02-21T12:40:23.7958051Z xor.b32 %r2757, %r2756, %r2755; 2026-02-21T12:40:23.7958236Z add.s32 %r2758, %r18409, %r2754; 2026-02-21T12:40:23.7958434Z add.s32 %r6254, %r2758, %r2757; 2026-02-21T12:40:23.7958635Z add.s32 %r6259, %r6254, 256; 2026-02-21T12:40:23.7958933Z add.s32 %r6264, %r6254, 512; 2026-02-21T12:40:23.7959112Z add.s32 %r6269, %r6254, 768; 2026-02-21T12:40:23.7959287Z add.s32 %r6274, %r6254, 1024; 2026-02-21T12:40:23.7959542Z add.s32 %r6279, %r6254, 1280; 2026-02-21T12:40:23.7959716Z add.s32 %r6284, %r6254, 1536; 2026-02-21T12:40:23.7959893Z add.s32 %r6289, %r6254, 1792; 2026-02-21T12:40:23.7960068Z shl.b32 %r2759, %r21324, 1; 2026-02-21T12:40:23.7960252Z or.b32 %r2760, %r2693, %r2759; 2026-02-21T12:40:23.7960429Z or.b32 %r2761, %r2760, %r2692; 2026-02-21T12:40:23.7960616Z add.s32 %r53, %r18409, %r2761; 2026-02-21T12:40:23.7960802Z xor.b32 %r2762, %r2761, 8; 2026-02-21T12:40:23.7960979Z add.s32 %r54, %r18409, %r2762; 2026-02-21T12:40:23.7961339Z .loc 1 22 120 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:22:120 2026-02-21T12:40:23.7961723Z shl.b64 %rd171, %rd4, 14; 2026-02-21T12:40:23.7962010Z mul.wide.u32 %rd172, %r21329, 16; 2026-02-21T12:40:23.7962213Z or.b64 %rd173, %rd171, %rd172; 2026-02-21T12:40:23.7962403Z add.s64 %rd174, %rd173, %rd160; 2026-02-21T12:40:23.7962594Z add.s64 %rd25, %rd174, 64; 2026-02-21T12:40:23.7962945Z or.b64 %rd176, %rd611, %rd21; 2026-02-21T12:40:23.7963218Z add.s64 %rd26, %rd161, %rd176; 2026-02-21T12:40:23.7963410Z add.s64 %rd27, %rd174, 16320; 2026-02-21T12:40:23.7963697Z add.s64 %rd28, %rd26, 5222400; 2026-02-21T12:40:23.7963936Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T12:40:23.7964245Z // Child Loop BB0_14 Depth 2 2026-02-21T12:40:23.7964607Z // Child Loop BB0_16 Depth 2 2026-02-21T12:40:23.7964888Z // Child Loop BB0_21 Depth 2 2026-02-21T12:40:23.7965177Z // Child Loop 
BB0_23 Depth 2 2026-02-21T12:40:23.7965517Z // Child Loop BB0_28 Depth 2 2026-02-21T12:40:23.7965797Z // Child Loop BB0_30 Depth 2 2026-02-21T12:40:23.7966125Z // Child Loop BB0_35 Depth 2 2026-02-21T12:40:23.7966424Z // Child Loop BB0_37 Depth 2 2026-02-21T12:40:23.7967006Z .loc 1 28 35 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:28:35 2026-02-21T12:40:23.7967442Z mul.hi.u64 %rd177, %rd612, -3689348814741910323; 2026-02-21T12:40:23.7967713Z shr.u64 %rd178, %rd177, 5; 2026-02-21T12:40:23.7968031Z .loc 1 29 33 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:29:33 2026-02-21T12:40:23.7968386Z shl.b64 %rd38, %rd178, 3; 2026-02-21T12:40:23.7968693Z .loc 1 30 39 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:30:39 2026-02-21T12:40:23.7969045Z sub.s64 %rd179, 4096, %rd38; 2026-02-21T12:40:23.7969359Z .loc 1 30 52 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:30:52 2026-02-21T12:40:23.7969711Z min.s64 %rd39, %rd179, 8; 2026-02-21T12:40:23.7970026Z .loc 1 31 45 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:31:45 2026-02-21T12:40:23.7970379Z mul.lo.s64 %rd180, %rd178, 40; 2026-02-21T12:40:23.7970588Z sub.s64 %rd40, %rd612, %rd180; 2026-02-21T12:40:23.7970905Z .loc 1 32 51 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:32:51 2026-02-21T12:40:23.7971253Z or.b64 %rd181, %rd40, %rd39; 2026-02-21T12:40:23.7971437Z and.b64 %rd182, %rd181, -4294967296; 2026-02-21T12:40:23.7971650Z setp.ne.b64 %p2, %rd182, 0; 2026-02-21T12:40:23.7971838Z @%p2 bra $L__BB0_12; 2026-02-21T12:40:23.7972000Z bra.uni $L__BB0_3; 2026-02-21T12:40:23.7972217Z $L__BB0_12: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:40:23.7972481Z div.s64 %rd613, %rd40, %rd39; 2026-02-21T12:40:23.7972683Z bra.uni $L__BB0_13; 2026-02-21T12:40:23.7972895Z $L__BB0_3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:40:23.7973258Z cvt.u32.u64 %r2764, %rd39; 2026-02-21T12:40:23.7973441Z cvt.u32.u64 %r2765, %rd40; 2026-02-21T12:40:23.7973710Z div.u32 
%r2766, %r2765, %r2764; 2026-02-21T12:40:23.7973908Z cvt.u64.u32 %rd613, %r2766; 2026-02-21T12:40:23.7974134Z $L__BB0_13: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:40:23.7974542Z .loc 1 31 64 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:31:64 2026-02-21T12:40:23.7974909Z mul.lo.s64 %rd184, %rd613, %rd39; 2026-02-21T12:40:23.7975118Z sub.s64 %rd185, %rd40, %rd184; 2026-02-21T12:40:23.7975443Z .loc 1 31 30 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:31:30 2026-02-21T12:40:23.7975805Z add.s64 %rd186, %rd185, %rd38; 2026-02-21T12:40:23.7976123Z .loc 1 33 27 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:33:27 2026-02-21T12:40:23.7976627Z shl.b64 %rd44, %rd186, 6; 2026-02-21T12:40:23.7977011Z .loc 1 35 27 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:35:27 2026-02-21T12:40:23.7977448Z shl.b64 %rd45, %rd613, 8; 2026-02-21T12:40:23.7977889Z .loc 1 43 126 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:43:126 2026-02-21T12:40:23.7978257Z shl.b64 %rd46, %rd186, 20; 2026-02-21T12:40:23.7978449Z add.s64 %rd615, %rd25, %rd46; 2026-02-21T12:40:23.7978635Z add.s64 %rd614, %rd26, %rd45; 2026-02-21T12:40:23.7978824Z mov.b32 %r21330, 0f00000000; 2026-02-21T12:40:23.7979012Z mov.b64 %rd616, -24; 2026-02-21T12:40:23.7979177Z mov.b32 %r21331, %r21330; 2026-02-21T12:40:23.7979361Z mov.b32 %r21332, %r21330; 2026-02-21T12:40:23.7979528Z mov.b32 %r21333, %r21330; 2026-02-21T12:40:23.7979703Z mov.b32 %r21334, %r21330; 2026-02-21T12:40:23.7979872Z mov.b32 %r21335, %r21330; 2026-02-21T12:40:23.7980044Z mov.b32 %r21336, %r21330; 2026-02-21T12:40:23.7980209Z mov.b32 %r21337, %r21330; 2026-02-21T12:40:23.7980380Z mov.b32 %r21338, %r21330; 2026-02-21T12:40:23.7980550Z mov.b32 %r21339, %r21330; 2026-02-21T12:40:23.7980714Z mov.b32 %r21340, %r21330; 2026-02-21T12:40:23.7980889Z mov.b32 %r21341, %r21330; 2026-02-21T12:40:23.7981054Z mov.b32 %r21342, %r21330; 2026-02-21T12:40:23.7981222Z mov.b32 %r21343, %r21330; 
2026-02-21T12:40:23.7981386Z mov.b32 %r21344, %r21330; 2026-02-21T12:40:23.7981554Z mov.b32 %r21345, %r21330; 2026-02-21T12:40:23.7981722Z mov.b32 %r21346, %r21330; 2026-02-21T12:40:23.7981890Z mov.b32 %r21347, %r21330; 2026-02-21T12:40:23.7982052Z mov.b32 %r21348, %r21330; 2026-02-21T12:40:23.7982240Z mov.b32 %r21349, %r21330; 2026-02-21T12:40:23.7982414Z mov.b32 %r21350, %r21330; 2026-02-21T12:40:23.7982577Z mov.b32 %r21351, %r21330; 2026-02-21T12:40:23.7982746Z mov.b32 %r21352, %r21330; 2026-02-21T12:40:23.7982909Z mov.b32 %r21353, %r21330; 2026-02-21T12:40:23.7983084Z mov.b32 %r21354, %r21330; 2026-02-21T12:40:23.7983244Z mov.b32 %r21355, %r21330; 2026-02-21T12:40:23.7983416Z mov.b32 %r21356, %r21330; 2026-02-21T12:40:23.7983581Z mov.b32 %r21357, %r21330; 2026-02-21T12:40:23.7983750Z mov.b32 %r21358, %r21330; 2026-02-21T12:40:23.7983920Z mov.b32 %r21359, %r21330; 2026-02-21T12:40:23.7984092Z mov.b32 %r21360, %r21330; 2026-02-21T12:40:23.7984262Z mov.b32 %r21361, %r21330; 2026-02-21T12:40:23.7984428Z mov.b32 %r21362, %r21330; 2026-02-21T12:40:23.7984604Z mov.b32 %r21363, %r21330; 2026-02-21T12:40:23.7984779Z mov.b32 %r21364, %r21330; 2026-02-21T12:40:23.7984951Z mov.b32 %r21365, %r21330; 2026-02-21T12:40:23.7985114Z mov.b32 %r21366, %r21330; 2026-02-21T12:40:23.7985288Z mov.b32 %r21367, %r21330; 2026-02-21T12:40:23.7985461Z mov.b32 %r21368, %r21330; 2026-02-21T12:40:23.7985636Z mov.b32 %r21369, %r21330; 2026-02-21T12:40:23.7985804Z mov.b32 %r21370, %r21330; 2026-02-21T12:40:23.7985975Z mov.b32 %r21371, %r21330; 2026-02-21T12:40:23.7986144Z mov.b32 %r21372, %r21330; 2026-02-21T12:40:23.7986311Z mov.b32 %r21373, %r21330; 2026-02-21T12:40:23.7986718Z mov.b32 %r21374, %r21330; 2026-02-21T12:40:23.7986886Z mov.b32 %r21375, %r21330; 2026-02-21T12:40:23.7987058Z mov.b32 %r21376, %r21330; 2026-02-21T12:40:23.7987303Z mov.b32 %r21377, %r21330; 2026-02-21T12:40:23.7987489Z mov.b32 %r21378, %r21330; 2026-02-21T12:40:23.7987659Z mov.b32 %r21379, %r21330; 
2026-02-21T12:40:23.7987832Z mov.b32 %r21380, %r21330; 2026-02-21T12:40:23.7988001Z mov.b32 %r21381, %r21330; 2026-02-21T12:40:23.7988177Z mov.b32 %r21382, %r21330; 2026-02-21T12:40:23.7988456Z mov.b32 %r21383, %r21330; 2026-02-21T12:40:23.7988639Z mov.b32 %r21384, %r21330; 2026-02-21T12:40:23.7988817Z mov.b32 %r21385, %r21330; 2026-02-21T12:40:23.7988983Z mov.b32 %r21386, %r21330; 2026-02-21T12:40:23.7989158Z mov.b32 %r21387, %r21330; 2026-02-21T12:40:23.7989321Z mov.b32 %r21388, %r21330; 2026-02-21T12:40:23.7989487Z mov.b32 %r21389, %r21330; 2026-02-21T12:40:23.7989654Z mov.b32 %r21390, %r21330; 2026-02-21T12:40:23.7989823Z mov.b32 %r21391, %r21330; 2026-02-21T12:40:23.7989989Z mov.b32 %r21392, %r21330; 2026-02-21T12:40:23.7990160Z mov.b32 %r21393, %r21330; 2026-02-21T12:40:23.7990328Z mov.b32 %r21394, %r21330; 2026-02-21T12:40:23.7990634Z mov.b32 %r21395, %r21330; 2026-02-21T12:40:23.7990812Z mov.b32 %r21396, %r21330; 2026-02-21T12:40:23.7990975Z mov.b32 %r21397, %r21330; 2026-02-21T12:40:23.7991143Z mov.b32 %r21398, %r21330; 2026-02-21T12:40:23.7991307Z mov.b32 %r21399, %r21330; 2026-02-21T12:40:23.7991476Z mov.b32 %r21400, %r21330; 2026-02-21T12:40:23.7991641Z mov.b32 %r21401, %r21330; 2026-02-21T12:40:23.7991808Z mov.b32 %r21402, %r21330; 2026-02-21T12:40:23.7991973Z mov.b32 %r21403, %r21330; 2026-02-21T12:40:23.7992147Z mov.b32 %r21404, %r21330; 2026-02-21T12:40:23.7992318Z mov.b32 %r21405, %r21330; 2026-02-21T12:40:23.7992482Z mov.b32 %r21406, %r21330; 2026-02-21T12:40:23.7992652Z mov.b32 %r21407, %r21330; 2026-02-21T12:40:23.7992815Z mov.b32 %r21408, %r21330; 2026-02-21T12:40:23.7992986Z mov.b32 %r21409, %r21330; 2026-02-21T12:40:23.7993154Z mov.b32 %r21410, %r21330; 2026-02-21T12:40:23.7993325Z mov.b32 %r21411, %r21330; 2026-02-21T12:40:23.7993493Z mov.b32 %r21412, %r21330; 2026-02-21T12:40:23.7993668Z mov.b32 %r21413, %r21330; 2026-02-21T12:40:23.7993833Z mov.b32 %r21414, %r21330; 2026-02-21T12:40:23.7994004Z mov.b32 %r21415, %r21330; 
2026-02-21T12:40:23.7994175Z mov.b32 %r21416, %r21330; 2026-02-21T12:40:23.7994345Z mov.b32 %r21417, %r21330; 2026-02-21T12:40:23.7994517Z mov.b32 %r21418, %r21330; 2026-02-21T12:40:23.7994681Z mov.b32 %r21419, %r21330; 2026-02-21T12:40:23.7994869Z mov.b32 %r21420, %r21330; 2026-02-21T12:40:23.7995039Z mov.b32 %r21421, %r21330; 2026-02-21T12:40:23.7995211Z mov.b32 %r21422, %r21330; 2026-02-21T12:40:23.7995375Z mov.b32 %r21423, %r21330; 2026-02-21T12:40:23.7995545Z mov.b32 %r21424, %r21330; 2026-02-21T12:40:23.7995719Z mov.b32 %r21425, %r21330; 2026-02-21T12:40:23.7995890Z mov.b32 %r21426, %r21330; 2026-02-21T12:40:23.7996063Z mov.b32 %r21427, %r21330; 2026-02-21T12:40:23.7996233Z mov.b32 %r21428, %r21330; 2026-02-21T12:40:23.7996407Z mov.b32 %r21429, %r21330; 2026-02-21T12:40:23.7996705Z mov.b32 %r21430, %r21330; 2026-02-21T12:40:23.7996885Z mov.b32 %r21431, %r21330; 2026-02-21T12:40:23.7997052Z mov.b32 %r21432, %r21330; 2026-02-21T12:40:23.7997226Z mov.b32 %r21433, %r21330; 2026-02-21T12:40:23.7997392Z mov.b32 %r21434, %r21330; 2026-02-21T12:40:23.7997576Z mov.b32 %r21435, %r21330; 2026-02-21T12:40:23.7997749Z mov.b32 %r21436, %r21330; 2026-02-21T12:40:23.7997913Z mov.b32 %r21437, %r21330; 2026-02-21T12:40:23.7998081Z mov.b32 %r21438, %r21330; 2026-02-21T12:40:23.7998247Z mov.b32 %r21439, %r21330; 2026-02-21T12:40:23.7998418Z mov.b32 %r21440, %r21330; 2026-02-21T12:40:23.7998579Z mov.b32 %r21441, %r21330; 2026-02-21T12:40:23.7998748Z mov.b32 %r21442, %r21330; 2026-02-21T12:40:23.7998911Z mov.b32 %r21443, %r21330; 2026-02-21T12:40:23.7999078Z mov.b32 %r21444, %r21330; 2026-02-21T12:40:23.7999242Z mov.b32 %r21445, %r21330; 2026-02-21T12:40:23.7999513Z mov.b32 %r21446, %r21330; 2026-02-21T12:40:23.7999684Z mov.b32 %r21447, %r21330; 2026-02-21T12:40:23.7999848Z mov.b32 %r21448, %r21330; 2026-02-21T12:40:23.8000087Z mov.b32 %r21449, %r21330; 2026-02-21T12:40:23.8000268Z mov.b32 %r21450, %r21330; 2026-02-21T12:40:23.8000447Z mov.b32 %r21451, %r21330; 
2026-02-21T12:40:23.8000620Z mov.b32 %r21452, %r21330; 2026-02-21T12:40:23.8000792Z mov.b32 %r21453, %r21330; 2026-02-21T12:40:23.8000956Z mov.b32 %r21454, %r21330; 2026-02-21T12:40:23.8001130Z mov.b32 %r21455, %r21330; 2026-02-21T12:40:23.8001296Z mov.b32 %r21456, %r21330; 2026-02-21T12:40:23.8001467Z mov.b32 %r21457, %r21330; 2026-02-21T12:40:23.8001697Z $L__BB0_14: // Parent Loop BB0_2 Depth=1 2026-02-21T12:40:23.8002005Z // => This Inner Loop Header: Depth=2 2026-02-21T12:40:23.8002407Z .loc 1 51 80 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:51:80 2026-02-21T12:40:23.8002772Z add.s64 %rd188, %rd615, -64; 2026-02-21T12:40:23.8002963Z // begin inline asm 2026-02-21T12:40:23.8003125Z mov.u64 %rd187, 0x0; 2026-02-21T12:40:23.8003577Z createpolicy.fractional.L2::evict_last.b64 %rd187, 1.0; 2026-02-21T12:40:23.8003849Z // end inline asm 2026-02-21T12:40:23.8004003Z // begin inline asm 2026-02-21T12:40:23.8004164Z mov.u32 %r2768, 0x0; 2026-02-21T12:40:23.8004319Z mov.u32 %r2769, 0x0; 2026-02-21T12:40:23.8004479Z mov.u32 %r2770, 0x0; 2026-02-21T12:40:23.8004630Z mov.u32 %r2771, 0x0; 2026-02-21T12:40:23.8004960Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r2768, %r2769, %r2770, %r2771 }, [ %rd188 + 0 ], %rd187; 2026-02-21T12:40:23.8005327Z // end inline asm 2026-02-21T12:40:23.8005640Z .loc 1 55 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:55:32 2026-02-21T12:40:23.8006010Z bar.sync 0; 2026-02-21T12:40:23.8006179Z st.shared.v2.b32 [%r9], {%r2768, %r2769}; 2026-02-21T12:40:23.8006421Z st.shared.v2.b32 [%r10], {%r2770, %r2771}; 2026-02-21T12:40:23.8006787Z bar.sync 0; 2026-02-21T12:40:23.8006949Z ld.shared.b16 %rs1, [%r11]; 2026-02-21T12:40:23.8007149Z ld.shared.b16 %rs2, [%r11+256]; 2026-02-21T12:40:23.8007359Z ld.shared.b16 %rs3, [%r11+16]; 2026-02-21T12:40:23.8007565Z ld.shared.b16 %rs4, [%r11+272]; 2026-02-21T12:40:23.8007772Z ld.shared.b16 %rs5, [%r12]; 2026-02-21T12:40:23.8007958Z ld.shared.b16 %rs6, [%r12+256]; 
2026-02-21T12:40:23.8008160Z ld.shared.b16 %rs7, [%r12+16]; 2026-02-21T12:40:23.8008356Z ld.shared.b16 %rs8, [%r12+272]; 2026-02-21T12:40:23.8008549Z cvt.f32.bf16 %r3032, %rs1; 2026-02-21T12:40:23.8008737Z cvt.f32.bf16 %r3033, %rs2; 2026-02-21T12:40:23.8008911Z cvt.f32.bf16 %r3034, %rs5; 2026-02-21T12:40:23.8009088Z cvt.f32.bf16 %r3035, %rs6; 2026-02-21T12:40:23.8009260Z cvt.f32.bf16 %r3292, %rs3; 2026-02-21T12:40:23.8009440Z cvt.f32.bf16 %r3293, %rs4; 2026-02-21T12:40:23.8009611Z cvt.f32.bf16 %r3294, %rs7; 2026-02-21T12:40:23.8009790Z cvt.f32.bf16 %r3295, %rs8; 2026-02-21T12:40:23.8010110Z .loc 1 57 87 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:57:87 2026-02-21T12:40:23.8010464Z // begin inline asm 2026-02-21T12:40:23.8010629Z mov.u32 %r2772, 0x0; 2026-02-21T12:40:23.8010785Z mov.u32 %r2773, 0x0; 2026-02-21T12:40:23.8010943Z mov.u32 %r2774, 0x0; 2026-02-21T12:40:23.8011095Z mov.u32 %r2775, 0x0; 2026-02-21T12:40:23.8011321Z ld.global.v4.b32 { %r2772, %r2773, %r2774, %r2775 }, [ %rd614 + 0 ]; 2026-02-21T12:40:23.8011600Z // end inline asm 2026-02-21T12:40:23.8011901Z .loc 1 65 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:65:28 2026-02-21T12:40:23.8012247Z bar.sync 0; 2026-02-21T12:40:23.8012401Z st.shared.b8 [%r13], %r2772; 2026-02-21T12:40:23.8012599Z prmt.b32 %r5138, %r2772, 0, 0x7771U; 2026-02-21T12:40:23.8012807Z st.shared.b8 [%r14], %r5138; 2026-02-21T12:40:23.8012995Z prmt.b32 %r5139, %r2772, 0, 0x7772U; 2026-02-21T12:40:23.8013198Z st.shared.b8 [%r15+256], %r5139; 2026-02-21T12:40:23.8013487Z prmt.b32 %r5140, %r2772, 0, 0x7773U; 2026-02-21T12:40:23.8013686Z st.shared.b8 [%r16+256], %r5140; 2026-02-21T12:40:23.8013982Z st.shared.b8 [%r17+512], %r2773; 2026-02-21T12:40:23.8014176Z prmt.b32 %r5141, %r2773, 0, 0x7771U; 2026-02-21T12:40:23.8014377Z st.shared.b8 [%r18+512], %r5141; 2026-02-21T12:40:23.8014573Z prmt.b32 %r5142, %r2773, 0, 0x7772U; 2026-02-21T12:40:23.8014768Z st.shared.b8 [%r19+768], %r5142; 
2026-02-21T12:40:23.8014963Z prmt.b32 %r5143, %r2773, 0, 0x7773U; 2026-02-21T12:40:23.8015162Z st.shared.b8 [%r20+768], %r5143; 2026-02-21T12:40:23.8015363Z st.shared.b8 [%r21+1024], %r2774; 2026-02-21T12:40:23.8015557Z prmt.b32 %r5144, %r2774, 0, 0x7771U; 2026-02-21T12:40:23.8015759Z st.shared.b8 [%r22+1024], %r5144; 2026-02-21T12:40:23.8015957Z prmt.b32 %r5145, %r2774, 0, 0x7772U; 2026-02-21T12:40:23.8016160Z st.shared.b8 [%r23+1280], %r5145; 2026-02-21T12:40:23.8016358Z prmt.b32 %r5146, %r2774, 0, 0x7773U; 2026-02-21T12:40:23.8016691Z st.shared.b8 [%r24+1280], %r5146; 2026-02-21T12:40:23.8016897Z st.shared.b8 [%r25+1536], %r2775; 2026-02-21T12:40:23.8017167Z prmt.b32 %r5147, %r2775, 0, 0x7771U; 2026-02-21T12:40:23.8017469Z st.shared.b8 [%r26+1536], %r5147; 2026-02-21T12:40:23.8017668Z prmt.b32 %r5148, %r2775, 0, 0x7772U; 2026-02-21T12:40:23.8017876Z st.shared.b8 [%r27+1792], %r5148; 2026-02-21T12:40:23.8018068Z prmt.b32 %r5149, %r2775, 0, 0x7773U; 2026-02-21T12:40:23.8018273Z st.shared.b8 [%r28+1792], %r5149; 2026-02-21T12:40:23.8018462Z bar.sync 0; 2026-02-21T12:40:23.8018616Z ld.shared.b32 %r5150, [%r29]; 2026-02-21T12:40:23.8018811Z prmt.b32 %r5151, %r5150, 0, 0x7770U; 2026-02-21T12:40:23.8019009Z cvt.u16.u32 %rs9, %r5151; 2026-02-21T12:40:23.8019194Z prmt.b32 %r5152, %r5150, 0, 0x7771U; 2026-02-21T12:40:23.8019387Z cvt.u16.u32 %rs10, %r5152; 2026-02-21T12:40:23.8019589Z prmt.b32 %r5153, %r5150, 0, 0x7772U; 2026-02-21T12:40:23.8019786Z cvt.u16.u32 %rs11, %r5153; 2026-02-21T12:40:23.8019971Z prmt.b32 %r5154, %r5150, 0, 0x7773U; 2026-02-21T12:40:23.8020162Z cvt.u16.u32 %rs12, %r5154; 2026-02-21T12:40:23.8020346Z ld.shared.b32 %r5155, [%r30]; 2026-02-21T12:40:23.8020552Z prmt.b32 %r5156, %r5155, 0, 0x7770U; 2026-02-21T12:40:23.8020747Z cvt.u16.u32 %rs13, %r5156; 2026-02-21T12:40:23.8020933Z prmt.b32 %r5157, %r5155, 0, 0x7771U; 2026-02-21T12:40:23.8021124Z cvt.u16.u32 %rs14, %r5157; 2026-02-21T12:40:23.8021303Z prmt.b32 %r5158, %r5155, 0, 0x7772U; 
2026-02-21T12:40:23.8021496Z cvt.u16.u32 %rs15, %r5158; 2026-02-21T12:40:23.8021674Z prmt.b32 %r5159, %r5155, 0, 0x7773U; 2026-02-21T12:40:23.8021865Z cvt.u16.u32 %rs16, %r5159; 2026-02-21T12:40:23.8022051Z ld.shared.b32 %r5160, [%r31]; 2026-02-21T12:40:23.8022239Z prmt.b32 %r5161, %r5160, 0, 0x7770U; 2026-02-21T12:40:23.8022432Z cvt.u16.u32 %rs17, %r5161; 2026-02-21T12:40:23.8022618Z prmt.b32 %r5162, %r5160, 0, 0x7771U; 2026-02-21T12:40:23.8022811Z cvt.u16.u32 %rs18, %r5162; 2026-02-21T12:40:23.8023006Z prmt.b32 %r5163, %r5160, 0, 0x7772U; 2026-02-21T12:40:23.8023203Z cvt.u16.u32 %rs19, %r5163; 2026-02-21T12:40:23.8023385Z prmt.b32 %r5164, %r5160, 0, 0x7773U; 2026-02-21T12:40:23.8023585Z cvt.u16.u32 %rs20, %r5164; 2026-02-21T12:40:23.8023768Z ld.shared.b32 %r5165, [%r32]; 2026-02-21T12:40:23.8023958Z prmt.b32 %r5166, %r5165, 0, 0x7770U; 2026-02-21T12:40:23.8024153Z cvt.u16.u32 %rs21, %r5166; 2026-02-21T12:40:23.8024336Z prmt.b32 %r5167, %r5165, 0, 0x7771U; 2026-02-21T12:40:23.8024530Z cvt.u16.u32 %rs22, %r5167; 2026-02-21T12:40:23.8024712Z prmt.b32 %r5168, %r5165, 0, 0x7772U; 2026-02-21T12:40:23.8024904Z cvt.u16.u32 %rs23, %r5168; 2026-02-21T12:40:23.8025084Z prmt.b32 %r5169, %r5165, 0, 0x7773U; 2026-02-21T12:40:23.8025276Z cvt.u16.u32 %rs24, %r5169; 2026-02-21T12:40:23.8025600Z .loc 1 60 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:60:28 2026-02-21T12:40:23.8025967Z shl.b16 %rs25, %rs9, 4; 2026-02-21T12:40:23.8026139Z shl.b16 %rs26, %rs13, 4; 2026-02-21T12:40:23.8026409Z shl.b16 %rs27, %rs17, 4; 2026-02-21T12:40:23.8026711Z shl.b16 %rs28, %rs21, 4; 2026-02-21T12:40:23.8026888Z shl.b16 %rs29, %rs10, 4; 2026-02-21T12:40:23.8027137Z shl.b16 %rs30, %rs14, 4; 2026-02-21T12:40:23.8027325Z shl.b16 %rs31, %rs18, 4; 2026-02-21T12:40:23.8027495Z shl.b16 %rs32, %rs22, 4; 2026-02-21T12:40:23.8027672Z shl.b16 %rs33, %rs11, 4; 2026-02-21T12:40:23.8027840Z shl.b16 %rs34, %rs15, 4; 2026-02-21T12:40:23.8028009Z shl.b16 %rs35, %rs19, 4; 
2026-02-21T12:40:23.8028185Z shl.b16 %rs36, %rs23, 4; 2026-02-21T12:40:23.8028413Z shl.b16 %rs37, %rs12, 4; 2026-02-21T12:40:23.8028598Z shl.b16 %rs38, %rs16, 4; 2026-02-21T12:40:23.8028766Z shl.b16 %rs39, %rs20, 4; 2026-02-21T12:40:23.8028941Z shl.b16 %rs40, %rs24, 4; 2026-02-21T12:40:23.8029247Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8029604Z cvt.s16.s8 %rs41, %rs25; 2026-02-21T12:40:23.8029774Z shr.s16 %rs42, %rs41, 4; 2026-02-21T12:40:23.8029950Z cvt.s16.s8 %rs43, %rs27; 2026-02-21T12:40:23.8030118Z shr.s16 %rs44, %rs43, 4; 2026-02-21T12:40:23.8030296Z prmt.b32 %r5170, %r5150, 0, 0x8880U; 2026-02-21T12:40:23.8030638Z cvt.u16.u32 %rs45, %r5170; 2026-02-21T12:40:23.8030818Z shr.s16 %rs46, %rs45, 4; 2026-02-21T12:40:23.8030997Z prmt.b32 %r5171, %r5160, 0, 0x8880U; 2026-02-21T12:40:23.8031189Z cvt.u16.u32 %rs47, %r5171; 2026-02-21T12:40:23.8031368Z shr.s16 %rs48, %rs47, 4; 2026-02-21T12:40:23.8031674Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8032042Z cvt.rn.f32.s16 %r5172, %rs48; 2026-02-21T12:40:23.8032223Z cvt.rn.f32.s16 %r5173, %rs46; 2026-02-21T12:40:23.8032409Z cvt.rn.f32.s16 %r5174, %rs44; 2026-02-21T12:40:23.8032590Z cvt.rn.f32.s16 %r5175, %rs42; 2026-02-21T12:40:23.8032904Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8033256Z cvt.s16.s8 %rs49, %rs26; 2026-02-21T12:40:23.8033431Z shr.s16 %rs50, %rs49, 4; 2026-02-21T12:40:23.8033608Z cvt.s16.s8 %rs51, %rs28; 2026-02-21T12:40:23.8033776Z shr.s16 %rs52, %rs51, 4; 2026-02-21T12:40:23.8033958Z prmt.b32 %r5176, %r5155, 0, 0x8880U; 2026-02-21T12:40:23.8034157Z cvt.u16.u32 %rs53, %r5176; 2026-02-21T12:40:23.8034335Z shr.s16 %rs54, %rs53, 4; 2026-02-21T12:40:23.8034512Z prmt.b32 %r5177, %r5165, 0, 0x8880U; 2026-02-21T12:40:23.8034706Z cvt.u16.u32 %rs55, %r5177; 2026-02-21T12:40:23.8034896Z shr.s16 %rs56, %rs55, 4; 2026-02-21T12:40:23.8035207Z .loc 1 
80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8035563Z cvt.rn.f32.s16 %r5178, %rs56; 2026-02-21T12:40:23.8035743Z cvt.rn.f32.s16 %r5179, %rs54; 2026-02-21T12:40:23.8035935Z cvt.rn.f32.s16 %r5180, %rs52; 2026-02-21T12:40:23.8036113Z cvt.rn.f32.s16 %r5181, %rs50; 2026-02-21T12:40:23.8036434Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8036929Z cvt.s16.s8 %rs57, %rs29; 2026-02-21T12:40:23.8037100Z shr.s16 %rs58, %rs57, 4; 2026-02-21T12:40:23.8037280Z cvt.s16.s8 %rs59, %rs31; 2026-02-21T12:40:23.8037460Z shr.s16 %rs60, %rs59, 4; 2026-02-21T12:40:23.8037649Z prmt.b32 %r5182, %r5150, 0, 0x9991U; 2026-02-21T12:40:23.8037844Z cvt.u16.u32 %rs61, %r5182; 2026-02-21T12:40:23.8059398Z shr.s16 %rs62, %rs61, 4; 2026-02-21T12:40:23.8059641Z prmt.b32 %r5183, %r5160, 0, 0x9991U; 2026-02-21T12:40:23.8059874Z cvt.u16.u32 %rs63, %r5183; 2026-02-21T12:40:23.8060078Z shr.s16 %rs64, %rs63, 4; 2026-02-21T12:40:23.8060421Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8060804Z cvt.rn.f32.s16 %r5184, %rs64; 2026-02-21T12:40:23.8061017Z cvt.rn.f32.s16 %r5185, %rs62; 2026-02-21T12:40:23.8061203Z cvt.rn.f32.s16 %r5186, %rs60; 2026-02-21T12:40:23.8061391Z cvt.rn.f32.s16 %r5187, %rs58; 2026-02-21T12:40:23.8061915Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8062285Z cvt.s16.s8 %rs65, %rs30; 2026-02-21T12:40:23.8062560Z shr.s16 %rs66, %rs65, 4; 2026-02-21T12:40:23.8062731Z cvt.s16.s8 %rs67, %rs32; 2026-02-21T12:40:23.8062905Z shr.s16 %rs68, %rs67, 4; 2026-02-21T12:40:23.8063084Z prmt.b32 %r5188, %r5155, 0, 0x9991U; 2026-02-21T12:40:23.8063295Z cvt.u16.u32 %rs69, %r5188; 2026-02-21T12:40:23.8063477Z shr.s16 %rs70, %rs69, 4; 2026-02-21T12:40:23.8063660Z prmt.b32 %r5189, %r5165, 0, 0x9991U; 2026-02-21T12:40:23.8063857Z cvt.u16.u32 %rs71, %r5189; 2026-02-21T12:40:23.8064042Z shr.s16 %rs72, %rs71, 4; 
2026-02-21T12:40:23.8064352Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8064726Z cvt.rn.f32.s16 %r5190, %rs72; 2026-02-21T12:40:23.8064921Z cvt.rn.f32.s16 %r5191, %rs70; 2026-02-21T12:40:23.8065103Z cvt.rn.f32.s16 %r5192, %rs68; 2026-02-21T12:40:23.8065292Z cvt.rn.f32.s16 %r5193, %rs66; 2026-02-21T12:40:23.8065605Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8066095Z cvt.s16.s8 %rs73, %rs33; 2026-02-21T12:40:23.8066270Z shr.s16 %rs74, %rs73, 4; 2026-02-21T12:40:23.8066632Z cvt.s16.s8 %rs75, %rs35; 2026-02-21T12:40:23.8066830Z shr.s16 %rs76, %rs75, 4; 2026-02-21T12:40:23.8067018Z prmt.b32 %r5194, %r5150, 0, 0xaaa2U; 2026-02-21T12:40:23.8067222Z cvt.u16.u32 %rs77, %r5194; 2026-02-21T12:40:23.8067397Z shr.s16 %rs78, %rs77, 4; 2026-02-21T12:40:23.8067576Z prmt.b32 %r5195, %r5160, 0, 0xaaa2U; 2026-02-21T12:40:23.8067769Z cvt.u16.u32 %rs79, %r5195; 2026-02-21T12:40:23.8067947Z shr.s16 %rs80, %rs79, 4; 2026-02-21T12:40:23.8068247Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8068693Z cvt.rn.f32.s16 %r5196, %rs80; 2026-02-21T12:40:23.8068876Z cvt.rn.f32.s16 %r5197, %rs78; 2026-02-21T12:40:23.8069068Z cvt.rn.f32.s16 %r5198, %rs76; 2026-02-21T12:40:23.8069252Z cvt.rn.f32.s16 %r5199, %rs74; 2026-02-21T12:40:23.8069570Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8069927Z cvt.s16.s8 %rs81, %rs34; 2026-02-21T12:40:23.8070095Z shr.s16 %rs82, %rs81, 4; 2026-02-21T12:40:23.8070269Z cvt.s16.s8 %rs83, %rs36; 2026-02-21T12:40:23.8070449Z shr.s16 %rs84, %rs83, 4; 2026-02-21T12:40:23.8070631Z prmt.b32 %r5200, %r5155, 0, 0xaaa2U; 2026-02-21T12:40:23.8070828Z cvt.u16.u32 %rs85, %r5200; 2026-02-21T12:40:23.8071006Z shr.s16 %rs86, %rs85, 4; 2026-02-21T12:40:23.8071181Z prmt.b32 %r5201, %r5165, 0, 0xaaa2U; 2026-02-21T12:40:23.8071377Z cvt.u16.u32 %rs87, %r5201; 
2026-02-21T12:40:23.8071556Z shr.s16 %rs88, %rs87, 4; 2026-02-21T12:40:23.8071861Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8072217Z cvt.rn.f32.s16 %r5202, %rs88; 2026-02-21T12:40:23.8072400Z cvt.rn.f32.s16 %r5203, %rs86; 2026-02-21T12:40:23.8072585Z cvt.rn.f32.s16 %r5204, %rs84; 2026-02-21T12:40:23.8072765Z cvt.rn.f32.s16 %r5205, %rs82; 2026-02-21T12:40:23.8073087Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8073439Z cvt.s16.s8 %rs89, %rs37; 2026-02-21T12:40:23.8073610Z shr.s16 %rs90, %rs89, 4; 2026-02-21T12:40:23.8073783Z cvt.s16.s8 %rs91, %rs39; 2026-02-21T12:40:23.8073944Z shr.s16 %rs92, %rs91, 4; 2026-02-21T12:40:23.8074118Z prmt.b32 %r5206, %r5150, 0, 0xbbb3U; 2026-02-21T12:40:23.8074312Z cvt.u16.u32 %rs93, %r5206; 2026-02-21T12:40:23.8074490Z shr.s16 %rs94, %rs93, 4; 2026-02-21T12:40:23.8074665Z prmt.b32 %r5207, %r5160, 0, 0xbbb3U; 2026-02-21T12:40:23.8074892Z cvt.u16.u32 %rs95, %r5207; 2026-02-21T12:40:23.8075081Z shr.s16 %rs96, %rs95, 4; 2026-02-21T12:40:23.8075395Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8075851Z cvt.rn.f32.s16 %r5208, %rs96; 2026-02-21T12:40:23.8076049Z cvt.rn.f32.s16 %r5209, %rs94; 2026-02-21T12:40:23.8076245Z cvt.rn.f32.s16 %r5210, %rs92; 2026-02-21T12:40:23.8076623Z cvt.rn.f32.s16 %r5211, %rs90; 2026-02-21T12:40:23.8076949Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8077296Z cvt.s16.s8 %rs97, %rs38; 2026-02-21T12:40:23.8077485Z shr.s16 %rs98, %rs97, 4; 2026-02-21T12:40:23.8077681Z cvt.s16.s8 %rs99, %rs40; 2026-02-21T12:40:23.8077861Z shr.s16 %rs100, %rs99, 4; 2026-02-21T12:40:23.8078055Z prmt.b32 %r5212, %r5155, 0, 0xbbb3U; 2026-02-21T12:40:23.8078264Z cvt.u16.u32 %rs101, %r5212; 2026-02-21T12:40:23.8078460Z shr.s16 %rs102, %rs101, 4; 2026-02-21T12:40:23.8078650Z prmt.b32 %r5213, %r5165, 0, 0xbbb3U; 
2026-02-21T12:40:23.8078863Z cvt.u16.u32 %rs103, %r5213; 2026-02-21T12:40:23.8079048Z shr.s16 %rs104, %rs103, 4; 2026-02-21T12:40:23.8079385Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8079753Z cvt.rn.f32.s16 %r5214, %rs104; 2026-02-21T12:40:23.8080073Z cvt.rn.f32.s16 %r5215, %rs102; 2026-02-21T12:40:23.8080259Z cvt.rn.f32.s16 %r5216, %rs100; 2026-02-21T12:40:23.8080435Z cvt.rn.f32.s16 %r5217, %rs98; 2026-02-21T12:40:23.8080609Z bar.sync 0; 2026-02-21T12:40:23.8080804Z st.shared.v4.b32 [%r33], {%r5175, %r5173, %r5174, %r5172}; 2026-02-21T12:40:23.8081123Z st.shared.v4.b32 [%r33+8192], {%r5181, %r5179, %r5180, %r5178}; 2026-02-21T12:40:23.8081428Z st.shared.v4.b32 [%r34], {%r5187, %r5185, %r5186, %r5184}; 2026-02-21T12:40:23.8081732Z st.shared.v4.b32 [%r34+8192], {%r5193, %r5191, %r5192, %r5190}; 2026-02-21T12:40:23.8082048Z st.shared.v4.b32 [%r35], {%r5199, %r5197, %r5198, %r5196}; 2026-02-21T12:40:23.8082340Z st.shared.v4.b32 [%r35+8192], {%r5205, %r5203, %r5204, %r5202}; 2026-02-21T12:40:23.8082639Z st.shared.v4.b32 [%r36], {%r5211, %r5209, %r5210, %r5208}; 2026-02-21T12:40:23.8082933Z st.shared.v4.b32 [%r36+8192], {%r5217, %r5215, %r5216, %r5214}; 2026-02-21T12:40:23.8083189Z $L__tmp1: 2026-02-21T12:40:23.8083550Z .loc 2 291 36 // standard.py:291:36 @[ cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:87:40 ] 2026-02-21T12:40:23.8083969Z // begin inline asm 2026-02-21T12:40:23.8084152Z fence.proxy.async.shared::cta; 2026-02-21T12:40:23.8084340Z // end inline asm 2026-02-21T12:40:23.8084487Z bar.sync 0; 2026-02-21T12:40:23.8084646Z shfl.sync.idx.b32 %r5218, %r2, 0, 31, -1; 2026-02-21T12:40:23.8084881Z wgmma.fence.sync.aligned; 2026-02-21T12:40:23.8085070Z mov.pred %p3, -1; 2026-02-21T12:40:23.8085243Z // begin inline asm 2026-02-21T12:40:23.8088191Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r21330,%r21331,%r21332,%r21333,%r21334,%r21335,%r21336,%r21337,%r21338,%r21339,%r21340,%r21341,%r21342,%r21343,%r21344,%r21345,%r21346,%r21347,%r21348,%r21349,%r21350,%r21351,%r21352,%r21353,%r21354,%r21355,%r21356,%r21357,%r21358,%r21359,%r21360,%r21361,%r21362,%r21363,%r21364,%r21365,%r21366,%r21367,%r21368,%r21369,%r21370,%r21371,%r21372,%r21373,%r21374,%r21375,%r21376,%r21377,%r21378,%r21379,%r21380,%r21381,%r21382,%r21383,%r21384,%r21385,%r21386,%r21387,%r21388,%r21389,%r21390,%r21391,%r21392,%r21393,%r21394,%r21395,%r21396,%r21397,%r21398,%r21399,%r21400,%r21401,%r21402,%r21403,%r21404,%r21405,%r21406,%r21407,%r21408,%r21409,%r21410,%r21411,%r21412,%r21413,%r21414,%r21415,%r21416,%r21417,%r21418,%r21419,%r21420,%r21421,%r21422,%r21423,%r21424,%r21425,%r21426,%r21427,%r21428,%r21429,%r21430,%r21431,%r21432,%r21433,%r21434,%r21435,%r21436,%r21437,%r21438,%r21439,%r21440,%r21441,%r21442,%r21443,%r21444,%r21445,%r21446,%r21447,%r21448,%r21449,%r21450,%r21451,%r21452,%r21453,%r21454,%r21455,%r21456,%r21457}, {%r3032,%r3033,%r3034,%r3035}, %rd23, %p3, 1, 1; 2026-02-21T12:40:23.8091162Z // end inline asm 2026-02-21T12:40:23.8091336Z // begin inline asm 2026-02-21T12:40:23.8094100Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r21330,%r21331,%r21332,%r21333,%r21334,%r21335,%r21336,%r21337,%r21338,%r21339,%r21340,%r21341,%r21342,%r21343,%r21344,%r21345,%r21346,%r21347,%r21348,%r21349,%r21350,%r21351,%r21352,%r21353,%r21354,%r21355,%r21356,%r21357,%r21358,%r21359,%r21360,%r21361,%r21362,%r21363,%r21364,%r21365,%r21366,%r21367,%r21368,%r21369,%r21370,%r21371,%r21372,%r21373,%r21374,%r21375,%r21376,%r21377,%r21378,%r21379,%r21380,%r21381,%r21382,%r21383,%r21384,%r21385,%r21386,%r21387,%r21388,%r21389,%r21390,%r21391,%r21392,%r21393,%r21394,%r21395,%r21396,%r21397,%r21398,%r21399,%r21400,%r21401,%r21402,%r21403,%r21404,%r21405,%r21406,%r21407,%r21408,%r21409,%r21410,%r21411,%r21412,%r21413,%r21414,%r21415,%r21416,%r21417,%r21418,%r21419,%r21420,%r21421,%r21422,%r21423,%r21424,%r21425,%r21426,%r21427,%r21428,%r21429,%r21430,%r21431,%r21432,%r21433,%r21434,%r21435,%r21436,%r21437,%r21438,%r21439,%r21440,%r21441,%r21442,%r21443,%r21444,%r21445,%r21446,%r21447,%r21448,%r21449,%r21450,%r21451,%r21452,%r21453,%r21454,%r21455,%r21456,%r21457}, {%r3292,%r3293,%r3294,%r3295}, %rd24, %p3, 1, 1; 2026-02-21T12:40:23.8097304Z // end inline asm 2026-02-21T12:40:23.8097565Z wgmma.commit_group.sync.aligned; 2026-02-21T12:40:23.8097835Z mov.b32 %r5005, 0; 2026-02-21T12:40:23.8097998Z mov.b32 %r3425, %r5005; 2026-02-21T12:40:23.8098177Z mov.b32 %r3426, %r5005; 2026-02-21T12:40:23.8098349Z mov.b32 %r3424, %r18409; 2026-02-21T12:40:23.8098527Z // begin inline asm 2026-02-21T12:40:23.8101087Z // wait for regs: 
%r21330,%r21331,%r21332,%r21333,%r21334,%r21335,%r21336,%r21337,%r21338,%r21339,%r21340,%r21341,%r21342,%r21343,%r21344,%r21345,%r21346,%r21347,%r21348,%r21349,%r21350,%r21351,%r21352,%r21353,%r21354,%r21355,%r21356,%r21357,%r21358,%r21359,%r21360,%r21361,%r21362,%r21363,%r21364,%r21365,%r21366,%r21367,%r21368,%r21369,%r21370,%r21371,%r21372,%r21373,%r21374,%r21375,%r21376,%r21377,%r21378,%r21379,%r21380,%r21381,%r21382,%r21383,%r21384,%r21385,%r21386,%r21387,%r21388,%r21389,%r21390,%r21391,%r21392,%r21393,%r21394,%r21395,%r21396,%r21397,%r21398,%r21399,%r21400,%r21401,%r21402,%r21403,%r21404,%r21405,%r21406,%r21407,%r21408,%r21409,%r21410,%r21411,%r21412,%r21413,%r21414,%r21415,%r21416,%r21417,%r21418,%r21419,%r21420,%r21421,%r21422,%r21423,%r21424,%r21425,%r21426,%r21427,%r21428,%r21429,%r21430,%r21431,%r21432,%r21433,%r21434,%r21435,%r21436,%r21437,%r21438,%r21439,%r21440,%r21441,%r21442,%r21443,%r21444,%r21445,%r21446,%r21447,%r21448,%r21449,%r21450,%r21451,%r21452,%r21453,%r21454,%r21455,%r21456,%r21457,%r3424,%r3425,%r3426 2026-02-21T12:40:23.8103844Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:40:23.8104048Z // end inline asm 2026-02-21T12:40:23.8104215Z $L__tmp2: 2026-02-21T12:40:23.8104510Z .loc 1 51 80 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:51:80 2026-02-21T12:40:23.8104892Z add.s64 %rd194, %rd615, -32; 2026-02-21T12:40:23.8105079Z // begin inline asm 2026-02-21T12:40:23.8105246Z mov.u64 %rd193, 0x0; 2026-02-21T12:40:23.8105491Z createpolicy.fractional.L2::evict_last.b64 %rd193, 1.0; 2026-02-21T12:40:23.8105764Z // end inline asm 2026-02-21T12:40:23.8105918Z // begin inline asm 2026-02-21T12:40:23.8106081Z mov.u32 %r3558, 0x0; 2026-02-21T12:40:23.8106247Z mov.u32 %r3559, 0x0; 2026-02-21T12:40:23.8106403Z mov.u32 %r3560, 0x0; 2026-02-21T12:40:23.8106703Z mov.u32 %r3561, 0x0; 2026-02-21T12:40:23.8107034Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r3558, %r3559, %r3560, %r3561 }, [ %rd194 + 0 ], %rd193; 
2026-02-21T12:40:23.8107435Z // end inline asm 2026-02-21T12:40:23.8107735Z .loc 1 55 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:55:32 2026-02-21T12:40:23.8108095Z bar.sync 0; 2026-02-21T12:40:23.8108271Z st.shared.v2.b32 [%r9], {%r3558, %r3559}; 2026-02-21T12:40:23.8108558Z st.shared.v2.b32 [%r10], {%r3560, %r3561}; 2026-02-21T12:40:23.8108774Z bar.sync 0; 2026-02-21T12:40:23.8108929Z ld.shared.b16 %rs105, [%r11]; 2026-02-21T12:40:23.8109138Z ld.shared.b16 %rs106, [%r11+256]; 2026-02-21T12:40:23.8109446Z ld.shared.b16 %rs107, [%r11+16]; 2026-02-21T12:40:23.8109649Z ld.shared.b16 %rs108, [%r11+272]; 2026-02-21T12:40:23.8109845Z ld.shared.b16 %rs109, [%r12]; 2026-02-21T12:40:23.8110117Z ld.shared.b16 %rs110, [%r12+256]; 2026-02-21T12:40:23.8110323Z ld.shared.b16 %rs111, [%r12+16]; 2026-02-21T12:40:23.8110530Z ld.shared.b16 %rs112, [%r12+272]; 2026-02-21T12:40:23.8110741Z cvt.f32.bf16 %r3822, %rs105; 2026-02-21T12:40:23.8110932Z cvt.f32.bf16 %r3823, %rs106; 2026-02-21T12:40:23.8111118Z cvt.f32.bf16 %r3824, %rs109; 2026-02-21T12:40:23.8111313Z cvt.f32.bf16 %r3825, %rs110; 2026-02-21T12:40:23.8111498Z cvt.f32.bf16 %r4082, %rs107; 2026-02-21T12:40:23.8111677Z cvt.f32.bf16 %r4083, %rs108; 2026-02-21T12:40:23.8111860Z cvt.f32.bf16 %r4084, %rs111; 2026-02-21T12:40:23.8112040Z cvt.f32.bf16 %r4085, %rs112; 2026-02-21T12:40:23.8112360Z .loc 1 57 87 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:57:87 2026-02-21T12:40:23.8112728Z add.s64 %rd196, %rd614, 10240; 2026-02-21T12:40:23.8112922Z // begin inline asm 2026-02-21T12:40:23.8113089Z mov.u32 %r3562, 0x0; 2026-02-21T12:40:23.8113248Z mov.u32 %r3563, 0x0; 2026-02-21T12:40:23.8113574Z mov.u32 %r3564, 0x0; 2026-02-21T12:40:23.8113735Z mov.u32 %r3565, 0x0; 2026-02-21T12:40:23.8113967Z ld.global.v4.b32 { %r3562, %r3563, %r3564, %r3565 }, [ %rd196 + 0 ]; 2026-02-21T12:40:23.8114257Z // end inline asm 2026-02-21T12:40:23.8114554Z .loc 1 65 28 // 
cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:65:28 2026-02-21T12:40:23.8114905Z bar.sync 0; 2026-02-21T12:40:23.8115057Z st.shared.b8 [%r13], %r3562; 2026-02-21T12:40:23.8115260Z prmt.b32 %r5219, %r3562, 0, 0x7771U; 2026-02-21T12:40:23.8115466Z st.shared.b8 [%r14], %r5219; 2026-02-21T12:40:23.8115671Z prmt.b32 %r5220, %r3562, 0, 0x7772U; 2026-02-21T12:40:23.8115871Z st.shared.b8 [%r15+256], %r5220; 2026-02-21T12:40:23.8116077Z prmt.b32 %r5221, %r3562, 0, 0x7773U; 2026-02-21T12:40:23.8116284Z st.shared.b8 [%r16+256], %r5221; 2026-02-21T12:40:23.8116611Z st.shared.b8 [%r17+512], %r3563; 2026-02-21T12:40:23.8116819Z prmt.b32 %r5222, %r3563, 0, 0x7771U; 2026-02-21T12:40:23.8117023Z st.shared.b8 [%r18+512], %r5222; 2026-02-21T12:40:23.8117221Z prmt.b32 %r5223, %r3563, 0, 0x7772U; 2026-02-21T12:40:23.8117435Z st.shared.b8 [%r19+768], %r5223; 2026-02-21T12:40:23.8117631Z prmt.b32 %r5224, %r3563, 0, 0x7773U; 2026-02-21T12:40:23.8117829Z st.shared.b8 [%r20+768], %r5224; 2026-02-21T12:40:23.8118030Z st.shared.b8 [%r21+1024], %r3564; 2026-02-21T12:40:23.8118236Z prmt.b32 %r5225, %r3564, 0, 0x7771U; 2026-02-21T12:40:23.8118431Z st.shared.b8 [%r22+1024], %r5225; 2026-02-21T12:40:23.8118632Z prmt.b32 %r5226, %r3564, 0, 0x7772U; 2026-02-21T12:40:23.8118824Z st.shared.b8 [%r23+1280], %r5226; 2026-02-21T12:40:23.8119023Z prmt.b32 %r5227, %r3564, 0, 0x7773U; 2026-02-21T12:40:23.8119233Z st.shared.b8 [%r24+1280], %r5227; 2026-02-21T12:40:23.8119439Z st.shared.b8 [%r25+1536], %r3565; 2026-02-21T12:40:23.8119636Z prmt.b32 %r5228, %r3565, 0, 0x7771U; 2026-02-21T12:40:23.8119837Z st.shared.b8 [%r26+1536], %r5228; 2026-02-21T12:40:23.8120031Z prmt.b32 %r5229, %r3565, 0, 0x7772U; 2026-02-21T12:40:23.8120236Z st.shared.b8 [%r27+1792], %r5229; 2026-02-21T12:40:23.8120433Z prmt.b32 %r5230, %r3565, 0, 0x7773U; 2026-02-21T12:40:23.8120629Z st.shared.b8 [%r28+1792], %r5230; 2026-02-21T12:40:23.8120816Z bar.sync 0; 2026-02-21T12:40:23.8120970Z ld.shared.b32 %r5231, [%r29]; 
2026-02-21T12:40:23.8121154Z prmt.b32 %r5232, %r5231, 0, 0x7770U; 2026-02-21T12:40:23.8121348Z cvt.u16.u32 %rs113, %r5232; 2026-02-21T12:40:23.8121541Z prmt.b32 %r5233, %r5231, 0, 0x7771U; 2026-02-21T12:40:23.8121739Z cvt.u16.u32 %rs114, %r5233; 2026-02-21T12:40:23.8121914Z prmt.b32 %r5234, %r5231, 0, 0x7772U; 2026-02-21T12:40:23.8122109Z cvt.u16.u32 %rs115, %r5234; 2026-02-21T12:40:23.8122289Z prmt.b32 %r5235, %r5231, 0, 0x7773U; 2026-02-21T12:40:23.8122476Z cvt.u16.u32 %rs116, %r5235; 2026-02-21T12:40:23.8122656Z ld.shared.b32 %r5236, [%r30]; 2026-02-21T12:40:23.8122924Z prmt.b32 %r5237, %r5236, 0, 0x7770U; 2026-02-21T12:40:23.8123115Z cvt.u16.u32 %rs117, %r5237; 2026-02-21T12:40:23.8123357Z prmt.b32 %r5238, %r5236, 0, 0x7771U; 2026-02-21T12:40:23.8123549Z cvt.u16.u32 %rs118, %r5238; 2026-02-21T12:40:23.8123722Z prmt.b32 %r5239, %r5236, 0, 0x7772U; 2026-02-21T12:40:23.8123913Z cvt.u16.u32 %rs119, %r5239; 2026-02-21T12:40:23.8124091Z prmt.b32 %r5240, %r5236, 0, 0x7773U; 2026-02-21T12:40:23.8124282Z cvt.u16.u32 %rs120, %r5240; 2026-02-21T12:40:23.8124470Z ld.shared.b32 %r5241, [%r31]; 2026-02-21T12:40:23.8124652Z prmt.b32 %r5242, %r5241, 0, 0x7770U; 2026-02-21T12:40:23.8124844Z cvt.u16.u32 %rs121, %r5242; 2026-02-21T12:40:23.8125016Z prmt.b32 %r5243, %r5241, 0, 0x7771U; 2026-02-21T12:40:23.8125206Z cvt.u16.u32 %rs122, %r5243; 2026-02-21T12:40:23.8125379Z prmt.b32 %r5244, %r5241, 0, 0x7772U; 2026-02-21T12:40:23.8125571Z cvt.u16.u32 %rs123, %r5244; 2026-02-21T12:40:23.8125759Z prmt.b32 %r5245, %r5241, 0, 0x7773U; 2026-02-21T12:40:23.8125952Z cvt.u16.u32 %rs124, %r5245; 2026-02-21T12:40:23.8126129Z ld.shared.b32 %r5246, [%r32]; 2026-02-21T12:40:23.8126592Z prmt.b32 %r5247, %r5246, 0, 0x7770U; 2026-02-21T12:40:23.8126811Z cvt.u16.u32 %rs125, %r5247; 2026-02-21T12:40:23.8126993Z prmt.b32 %r5248, %r5246, 0, 0x7771U; 2026-02-21T12:40:23.8127193Z cvt.u16.u32 %rs126, %r5248; 2026-02-21T12:40:23.8127372Z prmt.b32 %r5249, %r5246, 0, 0x7772U; 2026-02-21T12:40:23.8127590Z 
cvt.u16.u32 %rs127, %r5249; 2026-02-21T12:40:23.8127768Z prmt.b32 %r5250, %r5246, 0, 0x7773U; 2026-02-21T12:40:23.8127957Z cvt.u16.u32 %rs128, %r5250; 2026-02-21T12:40:23.8128276Z .loc 1 60 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:60:28 2026-02-21T12:40:23.8128629Z shl.b16 %rs129, %rs113, 4; 2026-02-21T12:40:23.8128810Z shl.b16 %rs130, %rs117, 4; 2026-02-21T12:40:23.8128981Z shl.b16 %rs131, %rs121, 4; 2026-02-21T12:40:23.8129153Z shl.b16 %rs132, %rs125, 4; 2026-02-21T12:40:23.8129322Z shl.b16 %rs133, %rs114, 4; 2026-02-21T12:40:23.8129494Z shl.b16 %rs134, %rs118, 4; 2026-02-21T12:40:23.8129660Z shl.b16 %rs135, %rs122, 4; 2026-02-21T12:40:23.8129839Z shl.b16 %rs136, %rs126, 4; 2026-02-21T12:40:23.8130008Z shl.b16 %rs137, %rs115, 4; 2026-02-21T12:40:23.8130191Z shl.b16 %rs138, %rs119, 4; 2026-02-21T12:40:23.8130367Z shl.b16 %rs139, %rs123, 4; 2026-02-21T12:40:23.8130536Z shl.b16 %rs140, %rs127, 4; 2026-02-21T12:40:23.8130709Z shl.b16 %rs141, %rs116, 4; 2026-02-21T12:40:23.8130874Z shl.b16 %rs142, %rs120, 4; 2026-02-21T12:40:23.8131045Z shl.b16 %rs143, %rs124, 4; 2026-02-21T12:40:23.8131212Z shl.b16 %rs144, %rs128, 4; 2026-02-21T12:40:23.8131519Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8131865Z cvt.s16.s8 %rs145, %rs129; 2026-02-21T12:40:23.8132037Z shr.s16 %rs146, %rs145, 4; 2026-02-21T12:40:23.8132214Z cvt.s16.s8 %rs147, %rs131; 2026-02-21T12:40:23.8132382Z shr.s16 %rs148, %rs147, 4; 2026-02-21T12:40:23.8132559Z prmt.b32 %r5251, %r5231, 0, 0x8880U; 2026-02-21T12:40:23.8132750Z cvt.u16.u32 %rs149, %r5251; 2026-02-21T12:40:23.8132932Z shr.s16 %rs150, %rs149, 4; 2026-02-21T12:40:23.8133105Z prmt.b32 %r5252, %r5241, 0, 0x8880U; 2026-02-21T12:40:23.8133302Z cvt.u16.u32 %rs151, %r5252; 2026-02-21T12:40:23.8133471Z shr.s16 %rs152, %rs151, 4; 2026-02-21T12:40:23.8133781Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8134134Z cvt.rn.f32.s16 
%r5253, %rs152; 2026-02-21T12:40:23.8134319Z cvt.rn.f32.s16 %r5254, %rs150; 2026-02-21T12:40:23.8134506Z cvt.rn.f32.s16 %r5255, %rs148; 2026-02-21T12:40:23.8134687Z cvt.rn.f32.s16 %r5256, %rs146; 2026-02-21T12:40:23.8135003Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8135348Z cvt.s16.s8 %rs153, %rs130; 2026-02-21T12:40:23.8135523Z shr.s16 %rs154, %rs153, 4; 2026-02-21T12:40:23.8135784Z cvt.s16.s8 %rs155, %rs132; 2026-02-21T12:40:23.8135972Z shr.s16 %rs156, %rs155, 4; 2026-02-21T12:40:23.8136152Z prmt.b32 %r5257, %r5236, 0, 0x8880U; 2026-02-21T12:40:23.8136410Z cvt.u16.u32 %rs157, %r5257; 2026-02-21T12:40:23.8136710Z shr.s16 %rs158, %rs157, 4; 2026-02-21T12:40:23.8136883Z prmt.b32 %r5258, %r5246, 0, 0x8880U; 2026-02-21T12:40:23.8137076Z cvt.u16.u32 %rs159, %r5258; 2026-02-21T12:40:23.8137247Z shr.s16 %rs160, %rs159, 4; 2026-02-21T12:40:23.8137558Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8137902Z cvt.rn.f32.s16 %r5259, %rs160; 2026-02-21T12:40:23.8138089Z cvt.rn.f32.s16 %r5260, %rs158; 2026-02-21T12:40:23.8138275Z cvt.rn.f32.s16 %r5261, %rs156; 2026-02-21T12:40:23.8138453Z cvt.rn.f32.s16 %r5262, %rs154; 2026-02-21T12:40:23.8138770Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8139119Z cvt.s16.s8 %rs161, %rs133; 2026-02-21T12:40:23.8139295Z shr.s16 %rs162, %rs161, 4; 2026-02-21T12:40:23.8139467Z cvt.s16.s8 %rs163, %rs135; 2026-02-21T12:40:23.8139728Z shr.s16 %rs164, %rs163, 4; 2026-02-21T12:40:23.8139984Z prmt.b32 %r5263, %r5231, 0, 0x9991U; 2026-02-21T12:40:23.8140183Z cvt.u16.u32 %rs165, %r5263; 2026-02-21T12:40:23.8140356Z shr.s16 %rs166, %rs165, 4; 2026-02-21T12:40:23.8140533Z prmt.b32 %r5264, %r5241, 0, 0x9991U; 2026-02-21T12:40:23.8140725Z cvt.u16.u32 %rs167, %r5264; 2026-02-21T12:40:23.8140894Z shr.s16 %rs168, %rs167, 4; 2026-02-21T12:40:23.8141201Z .loc 1 80 32 // 
cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8141545Z cvt.rn.f32.s16 %r5265, %rs168; 2026-02-21T12:40:23.8141731Z cvt.rn.f32.s16 %r5266, %rs166; 2026-02-21T12:40:23.8141914Z cvt.rn.f32.s16 %r5267, %rs164; 2026-02-21T12:40:23.8142094Z cvt.rn.f32.s16 %r5268, %rs162; 2026-02-21T12:40:23.8142407Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8142750Z cvt.s16.s8 %rs169, %rs134; 2026-02-21T12:40:23.8142922Z shr.s16 %rs170, %rs169, 4; 2026-02-21T12:40:23.8143098Z cvt.s16.s8 %rs171, %rs136; 2026-02-21T12:40:23.8143279Z shr.s16 %rs172, %rs171, 4; 2026-02-21T12:40:23.8143460Z prmt.b32 %r5269, %r5236, 0, 0x9991U; 2026-02-21T12:40:23.8143655Z cvt.u16.u32 %rs173, %r5269; 2026-02-21T12:40:23.8143826Z shr.s16 %rs174, %rs173, 4; 2026-02-21T12:40:23.8144004Z prmt.b32 %r5270, %r5246, 0, 0x9991U; 2026-02-21T12:40:23.8144193Z cvt.u16.u32 %rs175, %r5270; 2026-02-21T12:40:23.8144368Z shr.s16 %rs176, %rs175, 4; 2026-02-21T12:40:23.8144672Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8145022Z cvt.rn.f32.s16 %r5271, %rs176; 2026-02-21T12:40:23.8145211Z cvt.rn.f32.s16 %r5272, %rs174; 2026-02-21T12:40:23.8145388Z cvt.rn.f32.s16 %r5273, %rs172; 2026-02-21T12:40:23.8145568Z cvt.rn.f32.s16 %r5274, %rs170; 2026-02-21T12:40:23.8145881Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8146233Z cvt.s16.s8 %rs177, %rs137; 2026-02-21T12:40:23.8146404Z shr.s16 %rs178, %rs177, 4; 2026-02-21T12:40:23.8146698Z cvt.s16.s8 %rs179, %rs139; 2026-02-21T12:40:23.8146869Z shr.s16 %rs180, %rs179, 4; 2026-02-21T12:40:23.8147055Z prmt.b32 %r5275, %r5231, 0, 0xaaa2U; 2026-02-21T12:40:23.8147248Z cvt.u16.u32 %rs181, %r5275; 2026-02-21T12:40:23.8147419Z shr.s16 %rs182, %rs181, 4; 2026-02-21T12:40:23.8147592Z prmt.b32 %r5276, %r5241, 0, 0xaaa2U; 2026-02-21T12:40:23.8147781Z cvt.u16.u32 %rs183, %r5276; 2026-02-21T12:40:23.8147955Z 
shr.s16 %rs184, %rs183, 4; 2026-02-21T12:40:23.8148257Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8148664Z cvt.rn.f32.s16 %r5277, %rs184; 2026-02-21T12:40:23.8148847Z cvt.rn.f32.s16 %r5278, %rs182; 2026-02-21T12:40:23.8149109Z cvt.rn.f32.s16 %r5279, %rs180; 2026-02-21T12:40:23.8149289Z cvt.rn.f32.s16 %r5280, %rs178; 2026-02-21T12:40:23.8149611Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8150025Z cvt.s16.s8 %rs185, %rs138; 2026-02-21T12:40:23.8150196Z shr.s16 %rs186, %rs185, 4; 2026-02-21T12:40:23.8150367Z cvt.s16.s8 %rs187, %rs140; 2026-02-21T12:40:23.8150541Z shr.s16 %rs188, %rs187, 4; 2026-02-21T12:40:23.8150712Z prmt.b32 %r5281, %r5236, 0, 0xaaa2U; 2026-02-21T12:40:23.8150906Z cvt.u16.u32 %rs189, %r5281; 2026-02-21T12:40:23.8151076Z shr.s16 %rs190, %rs189, 4; 2026-02-21T12:40:23.8151251Z prmt.b32 %r5282, %r5246, 0, 0xaaa2U; 2026-02-21T12:40:23.8151437Z cvt.u16.u32 %rs191, %r5282; 2026-02-21T12:40:23.8151610Z shr.s16 %rs192, %rs191, 4; 2026-02-21T12:40:23.8151912Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8152258Z cvt.rn.f32.s16 %r5283, %rs192; 2026-02-21T12:40:23.8152447Z cvt.rn.f32.s16 %r5284, %rs190; 2026-02-21T12:40:23.8152625Z cvt.rn.f32.s16 %r5285, %rs188; 2026-02-21T12:40:23.8152894Z cvt.rn.f32.s16 %r5286, %rs186; 2026-02-21T12:40:23.8153273Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8153617Z cvt.s16.s8 %rs193, %rs141; 2026-02-21T12:40:23.8153785Z shr.s16 %rs194, %rs193, 4; 2026-02-21T12:40:23.8153956Z cvt.s16.s8 %rs195, %rs143; 2026-02-21T12:40:23.8154121Z shr.s16 %rs196, %rs195, 4; 2026-02-21T12:40:23.8154299Z prmt.b32 %r5287, %r5231, 0, 0xbbb3U; 2026-02-21T12:40:23.8154487Z cvt.u16.u32 %rs197, %r5287; 2026-02-21T12:40:23.8154660Z shr.s16 %rs198, %rs197, 4; 2026-02-21T12:40:23.8154842Z prmt.b32 %r5288, %r5241, 0, 0xbbb3U; 
2026-02-21T12:40:23.8155038Z cvt.u16.u32 %rs199, %r5288; 2026-02-21T12:40:23.8155221Z shr.s16 %rs200, %rs199, 4; 2026-02-21T12:40:23.8155535Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8155908Z cvt.rn.f32.s16 %r5289, %rs200; 2026-02-21T12:40:23.8156097Z cvt.rn.f32.s16 %r5290, %rs198; 2026-02-21T12:40:23.8156287Z cvt.rn.f32.s16 %r5291, %rs196; 2026-02-21T12:40:23.8156588Z cvt.rn.f32.s16 %r5292, %rs194; 2026-02-21T12:40:23.8156911Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8157262Z cvt.s16.s8 %rs201, %rs142; 2026-02-21T12:40:23.8157461Z shr.s16 %rs202, %rs201, 4; 2026-02-21T12:40:23.8157637Z cvt.s16.s8 %rs203, %rs144; 2026-02-21T12:40:23.8157807Z shr.s16 %rs204, %rs203, 4; 2026-02-21T12:40:23.8157983Z prmt.b32 %r5293, %r5236, 0, 0xbbb3U; 2026-02-21T12:40:23.8158180Z cvt.u16.u32 %rs205, %r5293; 2026-02-21T12:40:23.8158354Z shr.s16 %rs206, %rs205, 4; 2026-02-21T12:40:23.8158530Z prmt.b32 %r5294, %r5246, 0, 0xbbb3U; 2026-02-21T12:40:23.8158740Z cvt.u16.u32 %rs207, %r5294; 2026-02-21T12:40:23.8158914Z shr.s16 %rs208, %rs207, 4; 2026-02-21T12:40:23.8159220Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8159569Z cvt.rn.f32.s16 %r5295, %rs208; 2026-02-21T12:40:23.8159754Z cvt.rn.f32.s16 %r5296, %rs206; 2026-02-21T12:40:23.8159941Z cvt.rn.f32.s16 %r5297, %rs204; 2026-02-21T12:40:23.8160119Z cvt.rn.f32.s16 %r5298, %rs202; 2026-02-21T12:40:23.8160294Z bar.sync 0; 2026-02-21T12:40:23.8160487Z st.shared.v4.b32 [%r33], {%r5256, %r5254, %r5255, %r5253}; 2026-02-21T12:40:23.8160799Z st.shared.v4.b32 [%r33+8192], {%r5262, %r5260, %r5261, %r5259}; 2026-02-21T12:40:23.8161100Z st.shared.v4.b32 [%r34], {%r5268, %r5266, %r5267, %r5265}; 2026-02-21T12:40:23.8161394Z st.shared.v4.b32 [%r34+8192], {%r5274, %r5272, %r5273, %r5271}; 2026-02-21T12:40:23.8161707Z st.shared.v4.b32 [%r35], {%r5280, %r5278, %r5279, %r5277}; 
2026-02-21T12:40:23.8162000Z st.shared.v4.b32 [%r35+8192], {%r5286, %r5284, %r5285, %r5283}; 2026-02-21T12:40:23.8162293Z st.shared.v4.b32 [%r36], {%r5292, %r5290, %r5291, %r5289}; 2026-02-21T12:40:23.8162663Z st.shared.v4.b32 [%r36+8192], {%r5298, %r5296, %r5297, %r5295}; 2026-02-21T12:40:23.8162990Z $L__tmp3: 2026-02-21T12:40:23.8163362Z .loc 2 291 36 // standard.py:291:36 @[ cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:87:40 ] 2026-02-21T12:40:23.8163778Z // begin inline asm 2026-02-21T12:40:23.8163965Z fence.proxy.async.shared::cta; 2026-02-21T12:40:23.8164152Z // end inline asm 2026-02-21T12:40:23.8164303Z bar.sync 0; 2026-02-21T12:40:23.8164456Z wgmma.fence.sync.aligned; 2026-02-21T12:40:23.8164637Z // begin inline asm 2026-02-21T12:40:23.8167734Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r21330,%r21331,%r21332,%r21333,%r21334,%r21335,%r21336,%r21337,%r21338,%r21339,%r21340,%r21341,%r21342,%r21343,%r21344,%r21345,%r21346,%r21347,%r21348,%r21349,%r21350,%r21351,%r21352,%r21353,%r21354,%r21355,%r21356,%r21357,%r21358,%r21359,%r21360,%r21361,%r21362,%r21363,%r21364,%r21365,%r21366,%r21367,%r21368,%r21369,%r21370,%r21371,%r21372,%r21373,%r21374,%r21375,%r21376,%r21377,%r21378,%r21379,%r21380,%r21381,%r21382,%r21383,%r21384,%r21385,%r21386,%r21387,%r21388,%r21389,%r21390,%r21391,%r21392,%r21393,%r21394,%r21395,%r21396,%r21397,%r21398,%r21399,%r21400,%r21401,%r21402,%r21403,%r21404,%r21405,%r21406,%r21407,%r21408,%r21409,%r21410,%r21411,%r21412,%r21413,%r21414,%r21415,%r21416,%r21417,%r21418,%r21419,%r21420,%r21421,%r21422,%r21423,%r21424,%r21425,%r21426,%r21427,%r21428,%r21429,%r21430,%r21431,%r21432,%r21433,%r21434,%r21435,%r21436,%r21437,%r21438,%r21439,%r21440,%r21441,%r21442,%r21443,%r21444,%r21445,%r21446,%r21447,%r21448,%r21449,%r21450,%r21451,%r21452,%r21453,%r21454,%r21455,%r21456,%r21457}, {%r3822,%r3823,%r3824,%r3825}, %rd23, %p3, 1, 1; 2026-02-21T12:40:23.8170743Z // end inline asm 2026-02-21T12:40:23.8170899Z // begin inline asm 
2026-02-21T12:40:23.8173712Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r21330,%r21331,%r21332,%r21333,%r21334,%r21335,%r21336,%r21337,%r21338,%r21339,%r21340,%r21341,%r21342,%r21343,%r21344,%r21345,%r21346,%r21347,%r21348,%r21349,%r21350,%r21351,%r21352,%r21353,%r21354,%r21355,%r21356,%r21357,%r21358,%r21359,%r21360,%r21361,%r21362,%r21363,%r21364,%r21365,%r21366,%r21367,%r21368,%r21369,%r21370,%r21371,%r21372,%r21373,%r21374,%r21375,%r21376,%r21377,%r21378,%r21379,%r21380,%r21381,%r21382,%r21383,%r21384,%r21385,%r21386,%r21387,%r21388,%r21389,%r21390,%r21391,%r21392,%r21393,%r21394,%r21395,%r21396,%r21397,%r21398,%r21399,%r21400,%r21401,%r21402,%r21403,%r21404,%r21405,%r21406,%r21407,%r21408,%r21409,%r21410,%r21411,%r21412,%r21413,%r21414,%r21415,%r21416,%r21417,%r21418,%r21419,%r21420,%r21421,%r21422,%r21423,%r21424,%r21425,%r21426,%r21427,%r21428,%r21429,%r21430,%r21431,%r21432,%r21433,%r21434,%r21435,%r21436,%r21437,%r21438,%r21439,%r21440,%r21441,%r21442,%r21443,%r21444,%r21445,%r21446,%r21447,%r21448,%r21449,%r21450,%r21451,%r21452,%r21453,%r21454,%r21455,%r21456,%r21457}, {%r4082,%r4083,%r4084,%r4085}, %rd24, %p3, 1, 1; 2026-02-21T12:40:23.8176846Z // end inline asm 2026-02-21T12:40:23.8177020Z wgmma.commit_group.sync.aligned; 2026-02-21T12:40:23.8177228Z mov.b32 %r4215, %r5005; 2026-02-21T12:40:23.8177417Z mov.b32 %r4216, %r5005; 2026-02-21T12:40:23.8177594Z mov.b32 %r4214, %r18409; 2026-02-21T12:40:23.8177771Z // begin inline asm 2026-02-21T12:40:23.8180379Z // wait for regs: 
%r21330,%r21331,%r21332,%r21333,%r21334,%r21335,%r21336,%r21337,%r21338,%r21339,%r21340,%r21341,%r21342,%r21343,%r21344,%r21345,%r21346,%r21347,%r21348,%r21349,%r21350,%r21351,%r21352,%r21353,%r21354,%r21355,%r21356,%r21357,%r21358,%r21359,%r21360,%r21361,%r21362,%r21363,%r21364,%r21365,%r21366,%r21367,%r21368,%r21369,%r21370,%r21371,%r21372,%r21373,%r21374,%r21375,%r21376,%r21377,%r21378,%r21379,%r21380,%r21381,%r21382,%r21383,%r21384,%r21385,%r21386,%r21387,%r21388,%r21389,%r21390,%r21391,%r21392,%r21393,%r21394,%r21395,%r21396,%r21397,%r21398,%r21399,%r21400,%r21401,%r21402,%r21403,%r21404,%r21405,%r21406,%r21407,%r21408,%r21409,%r21410,%r21411,%r21412,%r21413,%r21414,%r21415,%r21416,%r21417,%r21418,%r21419,%r21420,%r21421,%r21422,%r21423,%r21424,%r21425,%r21426,%r21427,%r21428,%r21429,%r21430,%r21431,%r21432,%r21433,%r21434,%r21435,%r21436,%r21437,%r21438,%r21439,%r21440,%r21441,%r21442,%r21443,%r21444,%r21445,%r21446,%r21447,%r21448,%r21449,%r21450,%r21451,%r21452,%r21453,%r21454,%r21455,%r21456,%r21457,%r4214,%r4215,%r4216 2026-02-21T12:40:23.8183325Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:40:23.8183536Z // end inline asm 2026-02-21T12:40:23.8183683Z $L__tmp4: 2026-02-21T12:40:23.8183974Z .loc 1 51 80 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:51:80 2026-02-21T12:40:23.8184326Z // begin inline asm 2026-02-21T12:40:23.8184487Z mov.u64 %rd199, 0x0; 2026-02-21T12:40:23.8184711Z createpolicy.fractional.L2::evict_last.b64 %rd199, 1.0; 2026-02-21T12:40:23.8184970Z // end inline asm 2026-02-21T12:40:23.8185125Z // begin inline asm 2026-02-21T12:40:23.8185279Z mov.u32 %r4348, 0x0; 2026-02-21T12:40:23.8185441Z mov.u32 %r4349, 0x0; 2026-02-21T12:40:23.8185590Z mov.u32 %r4350, 0x0; 2026-02-21T12:40:23.8185745Z mov.u32 %r4351, 0x0; 2026-02-21T12:40:23.8186192Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r4348, %r4349, %r4350, %r4351 }, [ %rd615 + 0 ], %rd199; 2026-02-21T12:40:23.8186702Z // end inline asm 2026-02-21T12:40:23.8187002Z .loc 1 
55 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:55:32 2026-02-21T12:40:23.8187365Z bar.sync 0; 2026-02-21T12:40:23.8187535Z st.shared.v2.b32 [%r9], {%r4348, %r4349}; 2026-02-21T12:40:23.8187764Z st.shared.v2.b32 [%r10], {%r4350, %r4351}; 2026-02-21T12:40:23.8187976Z bar.sync 0; 2026-02-21T12:40:23.8188047Z ld.shared.b16 %rs209, [%r11]; 2026-02-21T12:40:23.8188118Z ld.shared.b16 %rs210, [%r11+256]; 2026-02-21T12:40:23.8188194Z ld.shared.b16 %rs211, [%r11+16]; 2026-02-21T12:40:23.8188262Z ld.shared.b16 %rs212, [%r11+272]; 2026-02-21T12:40:23.8188376Z ld.shared.b16 %rs213, [%r12]; 2026-02-21T12:40:23.8188453Z ld.shared.b16 %rs214, [%r12+256]; 2026-02-21T12:40:23.8188529Z ld.shared.b16 %rs215, [%r12+16]; 2026-02-21T12:40:23.8188599Z ld.shared.b16 %rs216, [%r12+272]; 2026-02-21T12:40:23.8188672Z cvt.f32.bf16 %r4612, %rs209; 2026-02-21T12:40:23.8188740Z cvt.f32.bf16 %r4613, %rs210; 2026-02-21T12:40:23.8188803Z cvt.f32.bf16 %r4614, %rs213; 2026-02-21T12:40:23.8188865Z cvt.f32.bf16 %r4615, %rs214; 2026-02-21T12:40:23.8188927Z cvt.f32.bf16 %r4872, %rs211; 2026-02-21T12:40:23.8188993Z cvt.f32.bf16 %r4873, %rs212; 2026-02-21T12:40:23.8189056Z cvt.f32.bf16 %r4874, %rs215; 2026-02-21T12:40:23.8189118Z cvt.f32.bf16 %r4875, %rs216; 2026-02-21T12:40:23.8189330Z .loc 1 57 87 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:57:87 2026-02-21T12:40:23.8189398Z add.s64 %rd202, %rd614, 20480; 2026-02-21T12:40:23.8189461Z // begin inline asm 2026-02-21T12:40:23.8189527Z mov.u32 %r4352, 0x0; 2026-02-21T12:40:23.8189586Z mov.u32 %r4353, 0x0; 2026-02-21T12:40:23.8189646Z mov.u32 %r4354, 0x0; 2026-02-21T12:40:23.8189706Z mov.u32 %r4355, 0x0; 2026-02-21T12:40:23.8189842Z ld.global.v4.b32 { %r4352, %r4353, %r4354, %r4355 }, [ %rd202 + 0 ]; 2026-02-21T12:40:23.8189906Z // end inline asm 2026-02-21T12:40:23.8190107Z .loc 1 65 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:65:28 2026-02-21T12:40:23.8190170Z bar.sync 0; 2026-02-21T12:40:23.8190237Z 
st.shared.b8 [%r13], %r4352; 2026-02-21T12:40:23.8190308Z prmt.b32 %r5299, %r4352, 0, 0x7771U; 2026-02-21T12:40:23.8190374Z st.shared.b8 [%r14], %r5299; 2026-02-21T12:40:23.8190447Z prmt.b32 %r5300, %r4352, 0, 0x7772U; 2026-02-21T12:40:23.8190514Z st.shared.b8 [%r15+256], %r5300; 2026-02-21T12:40:23.8190581Z prmt.b32 %r5301, %r4352, 0, 0x7773U; 2026-02-21T12:40:23.8190652Z st.shared.b8 [%r16+256], %r5301; 2026-02-21T12:40:23.8190722Z st.shared.b8 [%r17+512], %r4353; 2026-02-21T12:40:23.8190790Z prmt.b32 %r5302, %r4353, 0, 0x7771U; 2026-02-21T12:40:23.8190856Z st.shared.b8 [%r18+512], %r5302; 2026-02-21T12:40:23.8191030Z prmt.b32 %r5303, %r4353, 0, 0x7772U; 2026-02-21T12:40:23.8191100Z st.shared.b8 [%r19+768], %r5303; 2026-02-21T12:40:23.8191235Z prmt.b32 %r5304, %r4353, 0, 0x7773U; 2026-02-21T12:40:23.8191309Z st.shared.b8 [%r20+768], %r5304; 2026-02-21T12:40:23.8191379Z st.shared.b8 [%r21+1024], %r4354; 2026-02-21T12:40:23.8191446Z prmt.b32 %r5305, %r4354, 0, 0x7771U; 2026-02-21T12:40:23.8191516Z st.shared.b8 [%r22+1024], %r5305; 2026-02-21T12:40:23.8191585Z prmt.b32 %r5306, %r4354, 0, 0x7772U; 2026-02-21T12:40:23.8191649Z st.shared.b8 [%r23+1280], %r5306; 2026-02-21T12:40:23.8191715Z prmt.b32 %r5307, %r4354, 0, 0x7773U; 2026-02-21T12:40:23.8191786Z st.shared.b8 [%r24+1280], %r5307; 2026-02-21T12:40:23.8191850Z st.shared.b8 [%r25+1536], %r4355; 2026-02-21T12:40:23.8191918Z prmt.b32 %r5308, %r4355, 0, 0x7771U; 2026-02-21T12:40:23.8191987Z st.shared.b8 [%r26+1536], %r5308; 2026-02-21T12:40:23.8192052Z prmt.b32 %r5309, %r4355, 0, 0x7772U; 2026-02-21T12:40:23.8192118Z st.shared.b8 [%r27+1792], %r5309; 2026-02-21T12:40:23.8192184Z prmt.b32 %r5310, %r4355, 0, 0x7773U; 2026-02-21T12:40:23.8192321Z st.shared.b8 [%r28+1792], %r5310; 2026-02-21T12:40:23.8192442Z bar.sync 0; 2026-02-21T12:40:23.8192513Z ld.shared.b32 %r5311, [%r29]; 2026-02-21T12:40:23.8192585Z prmt.b32 %r5312, %r5311, 0, 0x7770U; 2026-02-21T12:40:23.8192654Z cvt.u16.u32 %rs217, %r5312; 
2026-02-21T12:40:23.8192719Z prmt.b32 %r5313, %r5311, 0, 0x7771U; 2026-02-21T12:40:23.8192787Z cvt.u16.u32 %rs218, %r5313; 2026-02-21T12:40:23.8192853Z prmt.b32 %r5314, %r5311, 0, 0x7772U; 2026-02-21T12:40:23.8192917Z cvt.u16.u32 %rs219, %r5314; 2026-02-21T12:40:23.8192982Z prmt.b32 %r5315, %r5311, 0, 0x7773U; 2026-02-21T12:40:23.8193051Z cvt.u16.u32 %rs220, %r5315; 2026-02-21T12:40:23.8193117Z ld.shared.b32 %r5316, [%r30]; 2026-02-21T12:40:23.8193182Z prmt.b32 %r5317, %r5316, 0, 0x7770U; 2026-02-21T12:40:23.8193247Z cvt.u16.u32 %rs221, %r5317; 2026-02-21T12:40:23.8193310Z prmt.b32 %r5318, %r5316, 0, 0x7771U; 2026-02-21T12:40:23.8193374Z cvt.u16.u32 %rs222, %r5318; 2026-02-21T12:40:23.8193437Z prmt.b32 %r5319, %r5316, 0, 0x7772U; 2026-02-21T12:40:23.8193507Z cvt.u16.u32 %rs223, %r5319; 2026-02-21T12:40:23.8193571Z prmt.b32 %r5320, %r5316, 0, 0x7773U; 2026-02-21T12:40:23.8193633Z cvt.u16.u32 %rs224, %r5320; 2026-02-21T12:40:23.8193703Z ld.shared.b32 %r5321, [%r31]; 2026-02-21T12:40:23.8193768Z prmt.b32 %r5322, %r5321, 0, 0x7770U; 2026-02-21T12:40:23.8193829Z cvt.u16.u32 %rs225, %r5322; 2026-02-21T12:40:23.8193895Z prmt.b32 %r5323, %r5321, 0, 0x7771U; 2026-02-21T12:40:23.8193963Z cvt.u16.u32 %rs226, %r5323; 2026-02-21T12:40:23.8194027Z prmt.b32 %r5324, %r5321, 0, 0x7772U; 2026-02-21T12:40:23.8194088Z cvt.u16.u32 %rs227, %r5324; 2026-02-21T12:40:23.8194156Z prmt.b32 %r5325, %r5321, 0, 0x7773U; 2026-02-21T12:40:23.8194216Z cvt.u16.u32 %rs228, %r5325; 2026-02-21T12:40:23.8194293Z ld.shared.b32 %r5326, [%r32]; 2026-02-21T12:40:23.8194363Z prmt.b32 %r5327, %r5326, 0, 0x7770U; 2026-02-21T12:40:23.8194427Z cvt.u16.u32 %rs229, %r5327; 2026-02-21T12:40:23.8194492Z prmt.b32 %r5328, %r5326, 0, 0x7771U; 2026-02-21T12:40:23.8194557Z cvt.u16.u32 %rs230, %r5328; 2026-02-21T12:40:23.8194630Z prmt.b32 %r5329, %r5326, 0, 0x7772U; 2026-02-21T12:40:23.8194693Z cvt.u16.u32 %rs231, %r5329; 2026-02-21T12:40:23.8194758Z prmt.b32 %r5330, %r5326, 0, 0x7773U; 2026-02-21T12:40:23.8194823Z 
cvt.u16.u32 %rs232, %r5330; 2026-02-21T12:40:23.8195032Z .loc 1 60 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:60:28 2026-02-21T12:40:23.8195098Z shl.b16 %rs233, %rs217, 4; 2026-02-21T12:40:23.8195162Z shl.b16 %rs234, %rs221, 4; 2026-02-21T12:40:23.8195229Z shl.b16 %rs235, %rs225, 4; 2026-02-21T12:40:23.8195289Z shl.b16 %rs236, %rs229, 4; 2026-02-21T12:40:23.8195354Z shl.b16 %rs237, %rs218, 4; 2026-02-21T12:40:23.8195419Z shl.b16 %rs238, %rs222, 4; 2026-02-21T12:40:23.8195480Z shl.b16 %rs239, %rs226, 4; 2026-02-21T12:40:23.8195540Z shl.b16 %rs240, %rs230, 4; 2026-02-21T12:40:23.8195672Z shl.b16 %rs241, %rs219, 4; 2026-02-21T12:40:23.8195734Z shl.b16 %rs242, %rs223, 4; 2026-02-21T12:40:23.8195856Z shl.b16 %rs243, %rs227, 4; 2026-02-21T12:40:23.8195920Z shl.b16 %rs244, %rs231, 4; 2026-02-21T12:40:23.8195987Z shl.b16 %rs245, %rs220, 4; 2026-02-21T12:40:23.8196048Z shl.b16 %rs246, %rs224, 4; 2026-02-21T12:40:23.8196111Z shl.b16 %rs247, %rs228, 4; 2026-02-21T12:40:23.8196176Z shl.b16 %rs248, %rs232, 4; 2026-02-21T12:40:23.8196376Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8196440Z cvt.s16.s8 %rs249, %rs233; 2026-02-21T12:40:23.8196616Z shr.s16 %rs250, %rs249, 4; 2026-02-21T12:40:23.8196695Z cvt.s16.s8 %rs251, %rs235; 2026-02-21T12:40:23.8196762Z shr.s16 %rs252, %rs251, 4; 2026-02-21T12:40:23.8196831Z prmt.b32 %r5331, %r5311, 0, 0x8880U; 2026-02-21T12:40:23.8196899Z cvt.u16.u32 %rs253, %r5331; 2026-02-21T12:40:23.8196960Z shr.s16 %rs254, %rs253, 4; 2026-02-21T12:40:23.8197029Z prmt.b32 %r5332, %r5321, 0, 0x8880U; 2026-02-21T12:40:23.8197095Z cvt.u16.u32 %rs255, %r5332; 2026-02-21T12:40:23.8197244Z shr.s16 %rs256, %rs255, 4; 2026-02-21T12:40:23.8197517Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8197590Z cvt.rn.f32.s16 %r5333, %rs256; 2026-02-21T12:40:23.8197662Z cvt.rn.f32.s16 %r5334, %rs254; 2026-02-21T12:40:23.8197725Z cvt.rn.f32.s16 %r5335, 
%rs252; 2026-02-21T12:40:23.8197787Z cvt.rn.f32.s16 %r5336, %rs250; 2026-02-21T12:40:23.8197985Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8198049Z cvt.s16.s8 %rs257, %rs234; 2026-02-21T12:40:23.8198111Z shr.s16 %rs258, %rs257, 4; 2026-02-21T12:40:23.8198172Z cvt.s16.s8 %rs259, %rs236; 2026-02-21T12:40:23.8198238Z shr.s16 %rs260, %rs259, 4; 2026-02-21T12:40:23.8198304Z prmt.b32 %r5337, %r5316, 0, 0x8880U; 2026-02-21T12:40:23.8198371Z cvt.u16.u32 %rs261, %r5337; 2026-02-21T12:40:23.8198437Z shr.s16 %rs262, %rs261, 4; 2026-02-21T12:40:23.8198503Z prmt.b32 %r5338, %r5326, 0, 0x8880U; 2026-02-21T12:40:23.8198571Z cvt.u16.u32 %rs263, %r5338; 2026-02-21T12:40:23.8198635Z shr.s16 %rs264, %rs263, 4; 2026-02-21T12:40:23.8198834Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8198900Z cvt.rn.f32.s16 %r5339, %rs264; 2026-02-21T12:40:23.8198963Z cvt.rn.f32.s16 %r5340, %rs262; 2026-02-21T12:40:23.8199029Z cvt.rn.f32.s16 %r5341, %rs260; 2026-02-21T12:40:23.8199094Z cvt.rn.f32.s16 %r5342, %rs258; 2026-02-21T12:40:23.8199288Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8199355Z cvt.s16.s8 %rs265, %rs237; 2026-02-21T12:40:23.8199417Z shr.s16 %rs266, %rs265, 4; 2026-02-21T12:40:23.8199480Z cvt.s16.s8 %rs267, %rs239; 2026-02-21T12:40:23.8199542Z shr.s16 %rs268, %rs267, 4; 2026-02-21T12:40:23.8199616Z prmt.b32 %r5343, %r5311, 0, 0x9991U; 2026-02-21T12:40:23.8199678Z cvt.u16.u32 %rs269, %r5343; 2026-02-21T12:40:23.8199744Z shr.s16 %rs270, %rs269, 4; 2026-02-21T12:40:23.8199819Z prmt.b32 %r5344, %r5321, 0, 0x9991U; 2026-02-21T12:40:23.8199883Z cvt.u16.u32 %rs271, %r5344; 2026-02-21T12:40:23.8199948Z shr.s16 %rs272, %rs271, 4; 2026-02-21T12:40:23.8200149Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8200217Z cvt.rn.f32.s16 %r5345, %rs272; 2026-02-21T12:40:23.8200281Z 
cvt.rn.f32.s16 %r5346, %rs270; 2026-02-21T12:40:23.8200345Z cvt.rn.f32.s16 %r5347, %rs268; 2026-02-21T12:40:23.8200415Z cvt.rn.f32.s16 %r5348, %rs266; 2026-02-21T12:40:23.8200611Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8200676Z cvt.s16.s8 %rs273, %rs238; 2026-02-21T12:40:23.8200746Z shr.s16 %rs274, %rs273, 4; 2026-02-21T12:40:23.8200906Z cvt.s16.s8 %rs275, %rs240; 2026-02-21T12:40:23.8200969Z shr.s16 %rs276, %rs275, 4; 2026-02-21T12:40:23.8201035Z prmt.b32 %r5349, %r5316, 0, 0x9991U; 2026-02-21T12:40:23.8201166Z cvt.u16.u32 %rs277, %r5349; 2026-02-21T12:40:23.8201227Z shr.s16 %rs278, %rs277, 4; 2026-02-21T12:40:23.8201291Z prmt.b32 %r5350, %r5326, 0, 0x9991U; 2026-02-21T12:40:23.8201357Z cvt.u16.u32 %rs279, %r5350; 2026-02-21T12:40:23.8201419Z shr.s16 %rs280, %rs279, 4; 2026-02-21T12:40:23.8201633Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8201703Z cvt.rn.f32.s16 %r5351, %rs280; 2026-02-21T12:40:23.8201765Z cvt.rn.f32.s16 %r5352, %rs278; 2026-02-21T12:40:23.8201826Z cvt.rn.f32.s16 %r5353, %rs276; 2026-02-21T12:40:23.8201888Z cvt.rn.f32.s16 %r5354, %rs274; 2026-02-21T12:40:23.8202089Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8202152Z cvt.s16.s8 %rs281, %rs241; 2026-02-21T12:40:23.8202216Z shr.s16 %rs282, %rs281, 4; 2026-02-21T12:40:23.8202283Z cvt.s16.s8 %rs283, %rs243; 2026-02-21T12:40:23.8202395Z shr.s16 %rs284, %rs283, 4; 2026-02-21T12:40:23.8202510Z prmt.b32 %r5355, %r5311, 0, 0xaaa2U; 2026-02-21T12:40:23.8202574Z cvt.u16.u32 %rs285, %r5355; 2026-02-21T12:40:23.8202642Z shr.s16 %rs286, %rs285, 4; 2026-02-21T12:40:23.8202706Z prmt.b32 %r5356, %r5321, 0, 0xaaa2U; 2026-02-21T12:40:23.8202781Z cvt.u16.u32 %rs287, %r5356; 2026-02-21T12:40:23.8202849Z shr.s16 %rs288, %rs287, 4; 2026-02-21T12:40:23.8203044Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 
2026-02-21T12:40:23.8203108Z cvt.rn.f32.s16 %r5357, %rs288; 2026-02-21T12:40:23.8203175Z cvt.rn.f32.s16 %r5358, %rs286; 2026-02-21T12:40:23.8203235Z cvt.rn.f32.s16 %r5359, %rs284; 2026-02-21T12:40:23.8203297Z cvt.rn.f32.s16 %r5360, %rs282; 2026-02-21T12:40:23.8203492Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8203564Z cvt.s16.s8 %rs289, %rs242; 2026-02-21T12:40:23.8203626Z shr.s16 %rs290, %rs289, 4; 2026-02-21T12:40:23.8203692Z cvt.s16.s8 %rs291, %rs244; 2026-02-21T12:40:23.8203757Z shr.s16 %rs292, %rs291, 4; 2026-02-21T12:40:23.8203822Z prmt.b32 %r5361, %r5316, 0, 0xaaa2U; 2026-02-21T12:40:23.8203885Z cvt.u16.u32 %rs293, %r5361; 2026-02-21T12:40:23.8203948Z shr.s16 %rs294, %rs293, 4; 2026-02-21T12:40:23.8204022Z prmt.b32 %r5362, %r5326, 0, 0xaaa2U; 2026-02-21T12:40:23.8204083Z cvt.u16.u32 %rs295, %r5362; 2026-02-21T12:40:23.8204148Z shr.s16 %rs296, %rs295, 4; 2026-02-21T12:40:23.8204347Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8204411Z cvt.rn.f32.s16 %r5363, %rs296; 2026-02-21T12:40:23.8204474Z cvt.rn.f32.s16 %r5364, %rs294; 2026-02-21T12:40:23.8204541Z cvt.rn.f32.s16 %r5365, %rs292; 2026-02-21T12:40:23.8204603Z cvt.rn.f32.s16 %r5366, %rs290; 2026-02-21T12:40:23.8204798Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8204863Z cvt.s16.s8 %rs297, %rs245; 2026-02-21T12:40:23.8204934Z shr.s16 %rs298, %rs297, 4; 2026-02-21T12:40:23.8204997Z cvt.s16.s8 %rs299, %rs247; 2026-02-21T12:40:23.8205059Z shr.s16 %rs300, %rs299, 4; 2026-02-21T12:40:23.8205130Z prmt.b32 %r5367, %r5311, 0, 0xbbb3U; 2026-02-21T12:40:23.8205194Z cvt.u16.u32 %rs301, %r5367; 2026-02-21T12:40:23.8205256Z shr.s16 %rs302, %rs301, 4; 2026-02-21T12:40:23.8205322Z prmt.b32 %r5368, %r5321, 0, 0xbbb3U; 2026-02-21T12:40:23.8205391Z cvt.u16.u32 %rs303, %r5368; 2026-02-21T12:40:23.8205457Z shr.s16 %rs304, %rs303, 4; 2026-02-21T12:40:23.8205661Z .loc 
1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8205736Z cvt.rn.f32.s16 %r5369, %rs304; 2026-02-21T12:40:23.8205800Z cvt.rn.f32.s16 %r5370, %rs302; 2026-02-21T12:40:23.8205935Z cvt.rn.f32.s16 %r5371, %rs300; 2026-02-21T12:40:23.8206004Z cvt.rn.f32.s16 %r5372, %rs298; 2026-02-21T12:40:23.8206204Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8206319Z cvt.s16.s8 %rs305, %rs246; 2026-02-21T12:40:23.8206381Z shr.s16 %rs306, %rs305, 4; 2026-02-21T12:40:23.8206565Z cvt.s16.s8 %rs307, %rs248; 2026-02-21T12:40:23.8206633Z shr.s16 %rs308, %rs307, 4; 2026-02-21T12:40:23.8206699Z prmt.b32 %r5373, %r5316, 0, 0xbbb3U; 2026-02-21T12:40:23.8206767Z cvt.u16.u32 %rs309, %r5373; 2026-02-21T12:40:23.8206830Z shr.s16 %rs310, %rs309, 4; 2026-02-21T12:40:23.8206897Z prmt.b32 %r5374, %r5326, 0, 0xbbb3U; 2026-02-21T12:40:23.8206965Z cvt.u16.u32 %rs311, %r5374; 2026-02-21T12:40:23.8207028Z shr.s16 %rs312, %rs311, 4; 2026-02-21T12:40:23.8207225Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8207292Z cvt.rn.f32.s16 %r5375, %rs312; 2026-02-21T12:40:23.8207364Z cvt.rn.f32.s16 %r5376, %rs310; 2026-02-21T12:40:23.8207428Z cvt.rn.f32.s16 %r5377, %rs308; 2026-02-21T12:40:23.8207575Z cvt.rn.f32.s16 %r5378, %rs306; 2026-02-21T12:40:23.8207698Z bar.sync 0; 2026-02-21T12:40:23.8207818Z st.shared.v4.b32 [%r33], {%r5336, %r5334, %r5335, %r5333}; 2026-02-21T12:40:23.8207940Z st.shared.v4.b32 [%r33+8192], {%r5342, %r5340, %r5341, %r5339}; 2026-02-21T12:40:23.8208050Z st.shared.v4.b32 [%r34], {%r5348, %r5346, %r5347, %r5345}; 2026-02-21T12:40:23.8208168Z st.shared.v4.b32 [%r34+8192], {%r5354, %r5352, %r5353, %r5351}; 2026-02-21T12:40:23.8208273Z st.shared.v4.b32 [%r35], {%r5360, %r5358, %r5359, %r5357}; 2026-02-21T12:40:23.8208387Z st.shared.v4.b32 [%r35+8192], {%r5366, %r5364, %r5365, %r5363}; 2026-02-21T12:40:23.8208498Z st.shared.v4.b32 [%r36], {%r5372, 
%r5370, %r5371, %r5369}; 2026-02-21T12:40:23.8208609Z st.shared.v4.b32 [%r36+8192], {%r5378, %r5376, %r5377, %r5375}; 2026-02-21T12:40:23.8208667Z $L__tmp5: 2026-02-21T12:40:23.8208948Z .loc 2 291 36 // standard.py:291:36 @[ cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:87:40 ] 2026-02-21T12:40:23.8209015Z // begin inline asm 2026-02-21T12:40:23.8209096Z fence.proxy.async.shared::cta; 2026-02-21T12:40:23.8209158Z // end inline asm 2026-02-21T12:40:23.8209217Z bar.sync 0; 2026-02-21T12:40:23.8209293Z wgmma.fence.sync.aligned; 2026-02-21T12:40:23.8209353Z // begin inline asm 2026-02-21T12:40:23.8212082Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r21330,%r21331,%r21332,%r21333,%r21334,%r21335,%r21336,%r21337,%r21338,%r21339,%r21340,%r21341,%r21342,%r21343,%r21344,%r21345,%r21346,%r21347,%r21348,%r21349,%r21350,%r21351,%r21352,%r21353,%r21354,%r21355,%r21356,%r21357,%r21358,%r21359,%r21360,%r21361,%r21362,%r21363,%r21364,%r21365,%r21366,%r21367,%r21368,%r21369,%r21370,%r21371,%r21372,%r21373,%r21374,%r21375,%r21376,%r21377,%r21378,%r21379,%r21380,%r21381,%r21382,%r21383,%r21384,%r21385,%r21386,%r21387,%r21388,%r21389,%r21390,%r21391,%r21392,%r21393,%r21394,%r21395,%r21396,%r21397,%r21398,%r21399,%r21400,%r21401,%r21402,%r21403,%r21404,%r21405,%r21406,%r21407,%r21408,%r21409,%r21410,%r21411,%r21412,%r21413,%r21414,%r21415,%r21416,%r21417,%r21418,%r21419,%r21420,%r21421,%r21422,%r21423,%r21424,%r21425,%r21426,%r21427,%r21428,%r21429,%r21430,%r21431,%r21432,%r21433,%r21434,%r21435,%r21436,%r21437,%r21438,%r21439,%r21440,%r21441,%r21442,%r21443,%r21444,%r21445,%r21446,%r21447,%r21448,%r21449,%r21450,%r21451,%r21452,%r21453,%r21454,%r21455,%r21456,%r21457}, {%r4612,%r4613,%r4614,%r4615}, %rd23, %p3, 1, 1; 2026-02-21T12:40:23.8212146Z // end inline asm 2026-02-21T12:40:23.8212211Z // begin inline asm 2026-02-21T12:40:23.8214914Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r21330,%r21331,%r21332,%r21333,%r21334,%r21335,%r21336,%r21337,%r21338,%r21339,%r21340,%r21341,%r21342,%r21343,%r21344,%r21345,%r21346,%r21347,%r21348,%r21349,%r21350,%r21351,%r21352,%r21353,%r21354,%r21355,%r21356,%r21357,%r21358,%r21359,%r21360,%r21361,%r21362,%r21363,%r21364,%r21365,%r21366,%r21367,%r21368,%r21369,%r21370,%r21371,%r21372,%r21373,%r21374,%r21375,%r21376,%r21377,%r21378,%r21379,%r21380,%r21381,%r21382,%r21383,%r21384,%r21385,%r21386,%r21387,%r21388,%r21389,%r21390,%r21391,%r21392,%r21393,%r21394,%r21395,%r21396,%r21397,%r21398,%r21399,%r21400,%r21401,%r21402,%r21403,%r21404,%r21405,%r21406,%r21407,%r21408,%r21409,%r21410,%r21411,%r21412,%r21413,%r21414,%r21415,%r21416,%r21417,%r21418,%r21419,%r21420,%r21421,%r21422,%r21423,%r21424,%r21425,%r21426,%r21427,%r21428,%r21429,%r21430,%r21431,%r21432,%r21433,%r21434,%r21435,%r21436,%r21437,%r21438,%r21439,%r21440,%r21441,%r21442,%r21443,%r21444,%r21445,%r21446,%r21447,%r21448,%r21449,%r21450,%r21451,%r21452,%r21453,%r21454,%r21455,%r21456,%r21457}, {%r4872,%r4873,%r4874,%r4875}, %rd24, %p3, 1, 1; 2026-02-21T12:40:23.8215106Z // end inline asm 2026-02-21T12:40:23.8215187Z wgmma.commit_group.sync.aligned; 2026-02-21T12:40:23.8215248Z mov.b32 %r5004, %r18409; 2026-02-21T12:40:23.8215327Z mov.b32 %r5006, %r5005; 2026-02-21T12:40:23.8215392Z // begin inline asm 2026-02-21T12:40:23.8218164Z // wait for regs: 
%r21330,%r21331,%r21332,%r21333,%r21334,%r21335,%r21336,%r21337,%r21338,%r21339,%r21340,%r21341,%r21342,%r21343,%r21344,%r21345,%r21346,%r21347,%r21348,%r21349,%r21350,%r21351,%r21352,%r21353,%r21354,%r21355,%r21356,%r21357,%r21358,%r21359,%r21360,%r21361,%r21362,%r21363,%r21364,%r21365,%r21366,%r21367,%r21368,%r21369,%r21370,%r21371,%r21372,%r21373,%r21374,%r21375,%r21376,%r21377,%r21378,%r21379,%r21380,%r21381,%r21382,%r21383,%r21384,%r21385,%r21386,%r21387,%r21388,%r21389,%r21390,%r21391,%r21392,%r21393,%r21394,%r21395,%r21396,%r21397,%r21398,%r21399,%r21400,%r21401,%r21402,%r21403,%r21404,%r21405,%r21406,%r21407,%r21408,%r21409,%r21410,%r21411,%r21412,%r21413,%r21414,%r21415,%r21416,%r21417,%r21418,%r21419,%r21420,%r21421,%r21422,%r21423,%r21424,%r21425,%r21426,%r21427,%r21428,%r21429,%r21430,%r21431,%r21432,%r21433,%r21434,%r21435,%r21436,%r21437,%r21438,%r21439,%r21440,%r21441,%r21442,%r21443,%r21444,%r21445,%r21446,%r21447,%r21448,%r21449,%r21450,%r21451,%r21452,%r21453,%r21454,%r21455,%r21456,%r21457,%r5004,%r5005,%r5006 2026-02-21T12:40:23.8218256Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:40:23.8218318Z // end inline asm 2026-02-21T12:40:23.8218376Z $L__tmp6: 2026-02-21T12:40:23.8218592Z .loc 1 43 126 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:43:126 2026-02-21T12:40:23.8218658Z add.s64 %rd616, %rd616, 24; 2026-02-21T12:40:23.8218729Z add.s64 %rd615, %rd615, 96; 2026-02-21T12:40:23.8218802Z add.s64 %rd614, %rd614, 30720; 2026-02-21T12:40:23.8218872Z setp.lt.u64 %p9, %rd616, 4056; 2026-02-21T12:40:23.8218934Z @%p9 bra $L__BB0_14; 2026-02-21T12:40:23.8219034Z // %bb.15: // %.preheader157 2026-02-21T12:40:23.8219141Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:40:23.8219208Z add.s64 %rd618, %rd27, %rd46; 2026-02-21T12:40:23.8219276Z add.s64 %rd617, %rd28, %rd45; 2026-02-21T12:40:23.8219340Z mov.b64 %rd619, 4072; 2026-02-21T12:40:23.8219456Z $L__BB0_16: // Parent Loop BB0_2 Depth=1 2026-02-21T12:40:23.8219574Z // => This Inner Loop 
Header: Depth=2 2026-02-21T12:40:23.8219777Z .loc 1 51 80 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:51:80 2026-02-21T12:40:23.8219837Z // begin inline asm 2026-02-21T12:40:23.8219897Z mov.u64 %rd206, 0x0; 2026-02-21T12:40:23.8220028Z createpolicy.fractional.L2::evict_last.b64 %rd206, 1.0; 2026-02-21T12:40:23.8220086Z // end inline asm 2026-02-21T12:40:23.8220146Z // begin inline asm 2026-02-21T12:40:23.8220211Z mov.u32 %r5379, 0x0; 2026-02-21T12:40:23.8220270Z mov.u32 %r5380, 0x0; 2026-02-21T12:40:23.8220328Z mov.u32 %r5381, 0x0; 2026-02-21T12:40:23.8220391Z mov.u32 %r5382, 0x0; 2026-02-21T12:40:23.8220617Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r5379, %r5380, %r5381, %r5382 }, [ %rd618 + 0 ], %rd206; 2026-02-21T12:40:23.8220746Z // end inline asm 2026-02-21T12:40:23.8220946Z .loc 1 55 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:55:32 2026-02-21T12:40:23.8221084Z bar.sync 0; 2026-02-21T12:40:23.8221167Z st.shared.v2.b32 [%r9], {%r5379, %r5380}; 2026-02-21T12:40:23.8221248Z st.shared.v2.b32 [%r10], {%r5381, %r5382}; 2026-02-21T12:40:23.8221309Z bar.sync 0; 2026-02-21T12:40:23.8221376Z ld.shared.b16 %rs313, [%r11]; 2026-02-21T12:40:23.8221445Z ld.shared.b16 %rs314, [%r11+256]; 2026-02-21T12:40:23.8221521Z ld.shared.b16 %rs315, [%r11+16]; 2026-02-21T12:40:23.8221587Z ld.shared.b16 %rs316, [%r11+272]; 2026-02-21T12:40:23.8221651Z ld.shared.b16 %rs317, [%r12]; 2026-02-21T12:40:23.8221717Z ld.shared.b16 %rs318, [%r12+256]; 2026-02-21T12:40:23.8221790Z ld.shared.b16 %rs319, [%r12+16]; 2026-02-21T12:40:23.8221858Z ld.shared.b16 %rs320, [%r12+272]; 2026-02-21T12:40:23.8221924Z cvt.f32.bf16 %r5643, %rs313; 2026-02-21T12:40:23.8221993Z cvt.f32.bf16 %r5644, %rs314; 2026-02-21T12:40:23.8222057Z cvt.f32.bf16 %r5645, %rs317; 2026-02-21T12:40:23.8222120Z cvt.f32.bf16 %r5646, %rs318; 2026-02-21T12:40:23.8222234Z cvt.f32.bf16 %r5903, %rs315; 2026-02-21T12:40:23.8222365Z cvt.f32.bf16 %r5904, %rs316; 2026-02-21T12:40:23.8222430Z 
cvt.f32.bf16 %r5905, %rs319; 2026-02-21T12:40:23.8222493Z cvt.f32.bf16 %r5906, %rs320; 2026-02-21T12:40:23.8222696Z .loc 1 57 87 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:57:87 2026-02-21T12:40:23.8222756Z // begin inline asm 2026-02-21T12:40:23.8222814Z mov.u32 %r5383, 0x0; 2026-02-21T12:40:23.8222873Z mov.u32 %r5384, 0x0; 2026-02-21T12:40:23.8222936Z mov.u32 %r5385, 0x0; 2026-02-21T12:40:23.8222993Z mov.u32 %r5386, 0x0; 2026-02-21T12:40:23.8223121Z ld.global.v4.b32 { %r5383, %r5384, %r5385, %r5386 }, [ %rd617 + 0 ]; 2026-02-21T12:40:23.8223184Z // end inline asm 2026-02-21T12:40:23.8223378Z .loc 1 65 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:65:28 2026-02-21T12:40:23.8223438Z bar.sync 0; 2026-02-21T12:40:23.8223508Z st.shared.b8 [%r13], %r5383; 2026-02-21T12:40:23.8223580Z prmt.b32 %r6169, %r5383, 0, 0x7771U; 2026-02-21T12:40:23.8223648Z st.shared.b8 [%r14], %r6169; 2026-02-21T12:40:23.8223714Z prmt.b32 %r6170, %r5383, 0, 0x7772U; 2026-02-21T12:40:23.8223789Z st.shared.b8 [%r15+256], %r6170; 2026-02-21T12:40:23.8223857Z prmt.b32 %r6171, %r5383, 0, 0x7773U; 2026-02-21T12:40:23.8223934Z st.shared.b8 [%r16+256], %r6171; 2026-02-21T12:40:23.8224007Z st.shared.b8 [%r17+512], %r5384; 2026-02-21T12:40:23.8224077Z prmt.b32 %r6172, %r5384, 0, 0x7771U; 2026-02-21T12:40:23.8224146Z st.shared.b8 [%r18+512], %r6172; 2026-02-21T12:40:23.8224216Z prmt.b32 %r6173, %r5384, 0, 0x7772U; 2026-02-21T12:40:23.8224288Z st.shared.b8 [%r19+768], %r6173; 2026-02-21T12:40:23.8224357Z prmt.b32 %r6174, %r5384, 0, 0x7773U; 2026-02-21T12:40:23.8224424Z st.shared.b8 [%r20+768], %r6174; 2026-02-21T12:40:23.8224496Z st.shared.b8 [%r21+1024], %r5385; 2026-02-21T12:40:23.8224564Z prmt.b32 %r6175, %r5385, 0, 0x7771U; 2026-02-21T12:40:23.8224631Z st.shared.b8 [%r22+1024], %r6175; 2026-02-21T12:40:23.8224713Z prmt.b32 %r6176, %r5385, 0, 0x7772U; 2026-02-21T12:40:23.8224780Z st.shared.b8 [%r23+1280], %r6176; 2026-02-21T12:40:23.8224846Z prmt.b32 %r6177, %r5385, 0, 
0x7773U; 2026-02-21T12:40:23.8224912Z st.shared.b8 [%r24+1280], %r6177; 2026-02-21T12:40:23.8224986Z st.shared.b8 [%r25+1536], %r5386; 2026-02-21T12:40:23.8225053Z prmt.b32 %r6178, %r5386, 0, 0x7771U; 2026-02-21T12:40:23.8225119Z st.shared.b8 [%r26+1536], %r6178; 2026-02-21T12:40:23.8225195Z prmt.b32 %r6179, %r5386, 0, 0x7772U; 2026-02-21T12:40:23.8225261Z st.shared.b8 [%r27+1792], %r6179; 2026-02-21T12:40:23.8225328Z prmt.b32 %r6180, %r5386, 0, 0x7773U; 2026-02-21T12:40:23.8225396Z st.shared.b8 [%r28+1792], %r6180; 2026-02-21T12:40:23.8225464Z bar.sync 0; 2026-02-21T12:40:23.8225531Z ld.shared.b32 %r6181, [%r29]; 2026-02-21T12:40:23.8225597Z prmt.b32 %r6182, %r6181, 0, 0x7770U; 2026-02-21T12:40:23.8225730Z cvt.u16.u32 %rs321, %r6182; 2026-02-21T12:40:23.8225803Z prmt.b32 %r6183, %r6181, 0, 0x7771U; 2026-02-21T12:40:23.8225917Z cvt.u16.u32 %rs322, %r6183; 2026-02-21T12:40:23.8225990Z prmt.b32 %r6184, %r6181, 0, 0x7772U; 2026-02-21T12:40:23.8226055Z cvt.u16.u32 %rs323, %r6184; 2026-02-21T12:40:23.8226121Z prmt.b32 %r6185, %r6181, 0, 0x7773U; 2026-02-21T12:40:23.8226184Z cvt.u16.u32 %rs324, %r6185; 2026-02-21T12:40:23.8226260Z ld.shared.b32 %r6186, [%r30]; 2026-02-21T12:40:23.8226328Z prmt.b32 %r6187, %r6186, 0, 0x7770U; 2026-02-21T12:40:23.8226391Z cvt.u16.u32 %rs325, %r6187; 2026-02-21T12:40:23.8226592Z prmt.b32 %r6188, %r6186, 0, 0x7771U; 2026-02-21T12:40:23.8226664Z cvt.u16.u32 %rs326, %r6188; 2026-02-21T12:40:23.8226731Z prmt.b32 %r6189, %r6186, 0, 0x7772U; 2026-02-21T12:40:23.8226796Z cvt.u16.u32 %rs327, %r6189; 2026-02-21T12:40:23.8226867Z prmt.b32 %r6190, %r6186, 0, 0x7773U; 2026-02-21T12:40:23.8226930Z cvt.u16.u32 %rs328, %r6190; 2026-02-21T12:40:23.8227009Z ld.shared.b32 %r6191, [%r31]; 2026-02-21T12:40:23.8227084Z prmt.b32 %r6192, %r6191, 0, 0x7770U; 2026-02-21T12:40:23.8227222Z cvt.u16.u32 %rs329, %r6192; 2026-02-21T12:40:23.8227351Z prmt.b32 %r6193, %r6191, 0, 0x7771U; 2026-02-21T12:40:23.8227418Z cvt.u16.u32 %rs330, %r6193; 2026-02-21T12:40:23.8227489Z 
prmt.b32 %r6194, %r6191, 0, 0x7772U; 2026-02-21T12:40:23.8227552Z cvt.u16.u32 %rs331, %r6194; 2026-02-21T12:40:23.8227618Z prmt.b32 %r6195, %r6191, 0, 0x7773U; 2026-02-21T12:40:23.8227684Z cvt.u16.u32 %rs332, %r6195; 2026-02-21T12:40:23.8227751Z ld.shared.b32 %r6196, [%r32]; 2026-02-21T12:40:23.8227816Z prmt.b32 %r6197, %r6196, 0, 0x7770U; 2026-02-21T12:40:23.8227884Z cvt.u16.u32 %rs333, %r6197; 2026-02-21T12:40:23.8227949Z prmt.b32 %r6198, %r6196, 0, 0x7771U; 2026-02-21T12:40:23.8228016Z cvt.u16.u32 %rs334, %r6198; 2026-02-21T12:40:23.8228082Z prmt.b32 %r6199, %r6196, 0, 0x7772U; 2026-02-21T12:40:23.8228154Z cvt.u16.u32 %rs335, %r6199; 2026-02-21T12:40:23.8228220Z prmt.b32 %r6200, %r6196, 0, 0x7773U; 2026-02-21T12:40:23.8228283Z cvt.u16.u32 %rs336, %r6200; 2026-02-21T12:40:23.8228558Z .loc 1 60 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:60:28 2026-02-21T12:40:23.8228630Z shl.b16 %rs337, %rs321, 4; 2026-02-21T12:40:23.8228695Z shl.b16 %rs338, %rs325, 4; 2026-02-21T12:40:23.8228758Z shl.b16 %rs339, %rs329, 4; 2026-02-21T12:40:23.8228827Z shl.b16 %rs340, %rs333, 4; 2026-02-21T12:40:23.8228891Z shl.b16 %rs341, %rs322, 4; 2026-02-21T12:40:23.8228953Z shl.b16 %rs342, %rs326, 4; 2026-02-21T12:40:23.8229023Z shl.b16 %rs343, %rs330, 4; 2026-02-21T12:40:23.8229085Z shl.b16 %rs344, %rs334, 4; 2026-02-21T12:40:23.8229147Z shl.b16 %rs345, %rs323, 4; 2026-02-21T12:40:23.8229210Z shl.b16 %rs346, %rs327, 4; 2026-02-21T12:40:23.8229281Z shl.b16 %rs347, %rs331, 4; 2026-02-21T12:40:23.8229343Z shl.b16 %rs348, %rs335, 4; 2026-02-21T12:40:23.8229405Z shl.b16 %rs349, %rs324, 4; 2026-02-21T12:40:23.8229476Z shl.b16 %rs350, %rs328, 4; 2026-02-21T12:40:23.8229538Z shl.b16 %rs351, %rs332, 4; 2026-02-21T12:40:23.8229601Z shl.b16 %rs352, %rs336, 4; 2026-02-21T12:40:23.8229817Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8229882Z cvt.s16.s8 %rs353, %rs337; 2026-02-21T12:40:23.8229944Z shr.s16 %rs354, %rs353, 4; 
2026-02-21T12:40:23.8230006Z cvt.s16.s8 %rs355, %rs339; 2026-02-21T12:40:23.8230076Z shr.s16 %rs356, %rs355, 4; 2026-02-21T12:40:23.8230156Z prmt.b32 %r6201, %r6181, 0, 0x8880U; 2026-02-21T12:40:23.8230222Z cvt.u16.u32 %rs357, %r6201; 2026-02-21T12:40:23.8230299Z shr.s16 %rs358, %rs357, 4; 2026-02-21T12:40:23.8230364Z prmt.b32 %r6202, %r6191, 0, 0x8880U; 2026-02-21T12:40:23.8230429Z cvt.u16.u32 %rs359, %r6202; 2026-02-21T12:40:23.8230492Z shr.s16 %rs360, %rs359, 4; 2026-02-21T12:40:23.8230697Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8230846Z cvt.rn.f32.s16 %r6203, %rs360; 2026-02-21T12:40:23.8230912Z cvt.rn.f32.s16 %r6204, %rs358; 2026-02-21T12:40:23.8230987Z cvt.rn.f32.s16 %r6205, %rs356; 2026-02-21T12:40:23.8231113Z cvt.rn.f32.s16 %r6206, %rs354; 2026-02-21T12:40:23.8231311Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8231383Z cvt.s16.s8 %rs361, %rs338; 2026-02-21T12:40:23.8231447Z shr.s16 %rs362, %rs361, 4; 2026-02-21T12:40:23.8231509Z cvt.s16.s8 %rs363, %rs340; 2026-02-21T12:40:23.8231580Z shr.s16 %rs364, %rs363, 4; 2026-02-21T12:40:23.8231654Z prmt.b32 %r6207, %r6186, 0, 0x8880U; 2026-02-21T12:40:23.8231720Z cvt.u16.u32 %rs365, %r6207; 2026-02-21T12:40:23.8231783Z shr.s16 %rs366, %rs365, 4; 2026-02-21T12:40:23.8231855Z prmt.b32 %r6208, %r6196, 0, 0x8880U; 2026-02-21T12:40:23.8231919Z cvt.u16.u32 %rs367, %r6208; 2026-02-21T12:40:23.8231996Z shr.s16 %rs368, %rs367, 4; 2026-02-21T12:40:23.8232198Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8232273Z cvt.rn.f32.s16 %r6209, %rs368; 2026-02-21T12:40:23.8232434Z cvt.rn.f32.s16 %r6210, %rs366; 2026-02-21T12:40:23.8232503Z cvt.rn.f32.s16 %r6211, %rs364; 2026-02-21T12:40:23.8232576Z cvt.rn.f32.s16 %r6212, %rs362; 2026-02-21T12:40:23.8232769Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8232830Z cvt.s16.s8 
%rs369, %rs341; 2026-02-21T12:40:23.8232892Z shr.s16 %rs370, %rs369, 4; 2026-02-21T12:40:23.8232955Z cvt.s16.s8 %rs371, %rs343; 2026-02-21T12:40:23.8233018Z shr.s16 %rs372, %rs371, 4; 2026-02-21T12:40:23.8233084Z prmt.b32 %r6213, %r6181, 0, 0x9991U; 2026-02-21T12:40:23.8233158Z cvt.u16.u32 %rs373, %r6213; 2026-02-21T12:40:23.8233222Z shr.s16 %rs374, %rs373, 4; 2026-02-21T12:40:23.8233288Z prmt.b32 %r6214, %r6191, 0, 0x9991U; 2026-02-21T12:40:23.8233358Z cvt.u16.u32 %rs375, %r6214; 2026-02-21T12:40:23.8233425Z shr.s16 %rs376, %rs375, 4; 2026-02-21T12:40:23.8233624Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8233702Z cvt.rn.f32.s16 %r6215, %rs376; 2026-02-21T12:40:23.8233769Z cvt.rn.f32.s16 %r6216, %rs374; 2026-02-21T12:40:23.8233837Z cvt.rn.f32.s16 %r6217, %rs372; 2026-02-21T12:40:23.8233903Z cvt.rn.f32.s16 %r6218, %rs370; 2026-02-21T12:40:23.8234105Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8234172Z cvt.s16.s8 %rs377, %rs342; 2026-02-21T12:40:23.8234237Z shr.s16 %rs378, %rs377, 4; 2026-02-21T12:40:23.8234308Z cvt.s16.s8 %rs379, %rs344; 2026-02-21T12:40:23.8234370Z shr.s16 %rs380, %rs379, 4; 2026-02-21T12:40:23.8234439Z prmt.b32 %r6219, %r6186, 0, 0x9991U; 2026-02-21T12:40:23.8234504Z cvt.u16.u32 %rs381, %r6219; 2026-02-21T12:40:23.8234575Z shr.s16 %rs382, %rs381, 4; 2026-02-21T12:40:23.8234643Z prmt.b32 %r6220, %r6196, 0, 0x9991U; 2026-02-21T12:40:23.8234709Z cvt.u16.u32 %rs383, %r6220; 2026-02-21T12:40:23.8234778Z shr.s16 %rs384, %rs383, 4; 2026-02-21T12:40:23.8234975Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8235044Z cvt.rn.f32.s16 %r6221, %rs384; 2026-02-21T12:40:23.8235113Z cvt.rn.f32.s16 %r6222, %rs382; 2026-02-21T12:40:23.8235181Z cvt.rn.f32.s16 %r6223, %rs380; 2026-02-21T12:40:23.8235253Z cvt.rn.f32.s16 %r6224, %rs378; 2026-02-21T12:40:23.8235450Z .loc 1 62 25 // 
cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8235522Z cvt.s16.s8 %rs385, %rs345; 2026-02-21T12:40:23.8235587Z shr.s16 %rs386, %rs385, 4; 2026-02-21T12:40:23.8235653Z cvt.s16.s8 %rs387, %rs347; 2026-02-21T12:40:23.8235727Z shr.s16 %rs388, %rs387, 4; 2026-02-21T12:40:23.8235797Z prmt.b32 %r6225, %r6181, 0, 0xaaa2U; 2026-02-21T12:40:23.8235864Z cvt.u16.u32 %rs389, %r6225; 2026-02-21T12:40:23.8235998Z shr.s16 %rs390, %rs389, 4; 2026-02-21T12:40:23.8236077Z prmt.b32 %r6226, %r6191, 0, 0xaaa2U; 2026-02-21T12:40:23.8236142Z cvt.u16.u32 %rs391, %r6226; 2026-02-21T12:40:23.8236259Z shr.s16 %rs392, %rs391, 4; 2026-02-21T12:40:23.8236594Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8236677Z cvt.rn.f32.s16 %r6227, %rs392; 2026-02-21T12:40:23.8236746Z cvt.rn.f32.s16 %r6228, %rs390; 2026-02-21T12:40:23.8236821Z cvt.rn.f32.s16 %r6229, %rs388; 2026-02-21T12:40:23.8236885Z cvt.rn.f32.s16 %r6230, %rs386; 2026-02-21T12:40:23.8237094Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8237161Z cvt.s16.s8 %rs393, %rs346; 2026-02-21T12:40:23.8237232Z shr.s16 %rs394, %rs393, 4; 2026-02-21T12:40:23.8237298Z cvt.s16.s8 %rs395, %rs348; 2026-02-21T12:40:23.8237364Z shr.s16 %rs396, %rs395, 4; 2026-02-21T12:40:23.8237439Z prmt.b32 %r6231, %r6186, 0, 0xaaa2U; 2026-02-21T12:40:23.8237506Z cvt.u16.u32 %rs397, %r6231; 2026-02-21T12:40:23.8237569Z shr.s16 %rs398, %rs397, 4; 2026-02-21T12:40:23.8237729Z prmt.b32 %r6232, %r6196, 0, 0xaaa2U; 2026-02-21T12:40:23.8237864Z cvt.u16.u32 %rs399, %r6232; 2026-02-21T12:40:23.8237932Z shr.s16 %rs400, %rs399, 4; 2026-02-21T12:40:23.8238128Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8238199Z cvt.rn.f32.s16 %r6233, %rs400; 2026-02-21T12:40:23.8238264Z cvt.rn.f32.s16 %r6234, %rs398; 2026-02-21T12:40:23.8238329Z cvt.rn.f32.s16 %r6235, %rs396; 2026-02-21T12:40:23.8238401Z 
cvt.rn.f32.s16 %r6236, %rs394; 2026-02-21T12:40:23.8238598Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8238664Z cvt.s16.s8 %rs401, %rs349; 2026-02-21T12:40:23.8238727Z shr.s16 %rs402, %rs401, 4; 2026-02-21T12:40:23.8238799Z cvt.s16.s8 %rs403, %rs351; 2026-02-21T12:40:23.8238864Z shr.s16 %rs404, %rs403, 4; 2026-02-21T12:40:23.8238932Z prmt.b32 %r6237, %r6181, 0, 0xbbb3U; 2026-02-21T12:40:23.8239004Z cvt.u16.u32 %rs405, %r6237; 2026-02-21T12:40:23.8239073Z shr.s16 %rs406, %rs405, 4; 2026-02-21T12:40:23.8239142Z prmt.b32 %r6238, %r6191, 0, 0xbbb3U; 2026-02-21T12:40:23.8239207Z cvt.u16.u32 %rs407, %r6238; 2026-02-21T12:40:23.8239281Z shr.s16 %rs408, %rs407, 4; 2026-02-21T12:40:23.8239479Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8239547Z cvt.rn.f32.s16 %r6239, %rs408; 2026-02-21T12:40:23.8239618Z cvt.rn.f32.s16 %r6240, %rs406; 2026-02-21T12:40:23.8239682Z cvt.rn.f32.s16 %r6241, %rs404; 2026-02-21T12:40:23.8239760Z cvt.rn.f32.s16 %r6242, %rs402; 2026-02-21T12:40:23.8239966Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8240029Z cvt.s16.s8 %rs409, %rs350; 2026-02-21T12:40:23.8240095Z shr.s16 %rs410, %rs409, 4; 2026-02-21T12:40:23.8240163Z cvt.s16.s8 %rs411, %rs352; 2026-02-21T12:40:23.8240235Z shr.s16 %rs412, %rs411, 4; 2026-02-21T12:40:23.8240308Z prmt.b32 %r6243, %r6186, 0, 0xbbb3U; 2026-02-21T12:40:23.8240376Z cvt.u16.u32 %rs413, %r6243; 2026-02-21T12:40:23.8240448Z shr.s16 %rs414, %rs413, 4; 2026-02-21T12:40:23.8240516Z prmt.b32 %r6244, %r6196, 0, 0xbbb3U; 2026-02-21T12:40:23.8240580Z cvt.u16.u32 %rs415, %r6244; 2026-02-21T12:40:23.8240643Z shr.s16 %rs416, %rs415, 4; 2026-02-21T12:40:23.8240846Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8240914Z cvt.rn.f32.s16 %r6245, %rs416; 2026-02-21T12:40:23.8240979Z cvt.rn.f32.s16 %r6246, %rs414; 
2026-02-21T12:40:23.8241054Z cvt.rn.f32.s16 %r6247, %rs412; 2026-02-21T12:40:23.8241119Z cvt.rn.f32.s16 %r6248, %rs410; 2026-02-21T12:40:23.8241177Z bar.sync 0; 2026-02-21T12:40:23.8241300Z st.shared.v4.b32 [%r33], {%r6206, %r6204, %r6205, %r6203}; 2026-02-21T12:40:23.8241512Z st.shared.v4.b32 [%r33+8192], {%r6212, %r6210, %r6211, %r6209}; 2026-02-21T12:40:23.8241626Z st.shared.v4.b32 [%r34], {%r6218, %r6216, %r6217, %r6215}; 2026-02-21T12:40:23.8241807Z st.shared.v4.b32 [%r34+8192], {%r6224, %r6222, %r6223, %r6221}; 2026-02-21T12:40:23.8241920Z st.shared.v4.b32 [%r35], {%r6230, %r6228, %r6229, %r6227}; 2026-02-21T12:40:23.8242035Z st.shared.v4.b32 [%r35+8192], {%r6236, %r6234, %r6235, %r6233}; 2026-02-21T12:40:23.8242137Z st.shared.v4.b32 [%r36], {%r6242, %r6240, %r6241, %r6239}; 2026-02-21T12:40:23.8242256Z st.shared.v4.b32 [%r36+8192], {%r6248, %r6246, %r6247, %r6245}; 2026-02-21T12:40:23.8242315Z $L__tmp7: 2026-02-21T12:40:23.8242585Z .loc 2 291 36 // standard.py:291:36 @[ cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:87:40 ] 2026-02-21T12:40:23.8242655Z // begin inline asm 2026-02-21T12:40:23.8242739Z fence.proxy.async.shared::cta; 2026-02-21T12:40:23.8242798Z // end inline asm 2026-02-21T12:40:23.8242855Z bar.sync 0; 2026-02-21T12:40:23.8242951Z shfl.sync.idx.b32 %r6249, %r2, 0, 31, -1; 2026-02-21T12:40:23.8243027Z wgmma.fence.sync.aligned; 2026-02-21T12:40:23.8243162Z mov.pred %p10, -1; 2026-02-21T12:40:23.8243282Z // begin inline asm 2026-02-21T12:40:23.8246015Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r21330,%r21331,%r21332,%r21333,%r21334,%r21335,%r21336,%r21337,%r21338,%r21339,%r21340,%r21341,%r21342,%r21343,%r21344,%r21345,%r21346,%r21347,%r21348,%r21349,%r21350,%r21351,%r21352,%r21353,%r21354,%r21355,%r21356,%r21357,%r21358,%r21359,%r21360,%r21361,%r21362,%r21363,%r21364,%r21365,%r21366,%r21367,%r21368,%r21369,%r21370,%r21371,%r21372,%r21373,%r21374,%r21375,%r21376,%r21377,%r21378,%r21379,%r21380,%r21381,%r21382,%r21383,%r21384,%r21385,%r21386,%r21387,%r21388,%r21389,%r21390,%r21391,%r21392,%r21393,%r21394,%r21395,%r21396,%r21397,%r21398,%r21399,%r21400,%r21401,%r21402,%r21403,%r21404,%r21405,%r21406,%r21407,%r21408,%r21409,%r21410,%r21411,%r21412,%r21413,%r21414,%r21415,%r21416,%r21417,%r21418,%r21419,%r21420,%r21421,%r21422,%r21423,%r21424,%r21425,%r21426,%r21427,%r21428,%r21429,%r21430,%r21431,%r21432,%r21433,%r21434,%r21435,%r21436,%r21437,%r21438,%r21439,%r21440,%r21441,%r21442,%r21443,%r21444,%r21445,%r21446,%r21447,%r21448,%r21449,%r21450,%r21451,%r21452,%r21453,%r21454,%r21455,%r21456,%r21457}, {%r5643,%r5644,%r5645,%r5646}, %rd23, %p10, 1, 1; 2026-02-21T12:40:23.8246096Z // end inline asm 2026-02-21T12:40:23.8246159Z // begin inline asm 2026-02-21T12:40:23.8249014Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r21330,%r21331,%r21332,%r21333,%r21334,%r21335,%r21336,%r21337,%r21338,%r21339,%r21340,%r21341,%r21342,%r21343,%r21344,%r21345,%r21346,%r21347,%r21348,%r21349,%r21350,%r21351,%r21352,%r21353,%r21354,%r21355,%r21356,%r21357,%r21358,%r21359,%r21360,%r21361,%r21362,%r21363,%r21364,%r21365,%r21366,%r21367,%r21368,%r21369,%r21370,%r21371,%r21372,%r21373,%r21374,%r21375,%r21376,%r21377,%r21378,%r21379,%r21380,%r21381,%r21382,%r21383,%r21384,%r21385,%r21386,%r21387,%r21388,%r21389,%r21390,%r21391,%r21392,%r21393,%r21394,%r21395,%r21396,%r21397,%r21398,%r21399,%r21400,%r21401,%r21402,%r21403,%r21404,%r21405,%r21406,%r21407,%r21408,%r21409,%r21410,%r21411,%r21412,%r21413,%r21414,%r21415,%r21416,%r21417,%r21418,%r21419,%r21420,%r21421,%r21422,%r21423,%r21424,%r21425,%r21426,%r21427,%r21428,%r21429,%r21430,%r21431,%r21432,%r21433,%r21434,%r21435,%r21436,%r21437,%r21438,%r21439,%r21440,%r21441,%r21442,%r21443,%r21444,%r21445,%r21446,%r21447,%r21448,%r21449,%r21450,%r21451,%r21452,%r21453,%r21454,%r21455,%r21456,%r21457}, {%r5903,%r5904,%r5905,%r5906}, %rd24, %p10, 1, 1; 2026-02-21T12:40:23.8249084Z // end inline asm 2026-02-21T12:40:23.8249162Z wgmma.commit_group.sync.aligned; 2026-02-21T12:40:23.8249228Z mov.b32 %r6036, 0; 2026-02-21T12:40:23.8249292Z mov.b32 %r6037, %r6036; 2026-02-21T12:40:23.8249355Z mov.b32 %r6035, %r18409; 2026-02-21T12:40:23.8249420Z // begin inline asm 2026-02-21T12:40:23.8251922Z // wait for regs: 
%r21330,%r21331,%r21332,%r21333,%r21334,%r21335,%r21336,%r21337,%r21338,%r21339,%r21340,%r21341,%r21342,%r21343,%r21344,%r21345,%r21346,%r21347,%r21348,%r21349,%r21350,%r21351,%r21352,%r21353,%r21354,%r21355,%r21356,%r21357,%r21358,%r21359,%r21360,%r21361,%r21362,%r21363,%r21364,%r21365,%r21366,%r21367,%r21368,%r21369,%r21370,%r21371,%r21372,%r21373,%r21374,%r21375,%r21376,%r21377,%r21378,%r21379,%r21380,%r21381,%r21382,%r21383,%r21384,%r21385,%r21386,%r21387,%r21388,%r21389,%r21390,%r21391,%r21392,%r21393,%r21394,%r21395,%r21396,%r21397,%r21398,%r21399,%r21400,%r21401,%r21402,%r21403,%r21404,%r21405,%r21406,%r21407,%r21408,%r21409,%r21410,%r21411,%r21412,%r21413,%r21414,%r21415,%r21416,%r21417,%r21418,%r21419,%r21420,%r21421,%r21422,%r21423,%r21424,%r21425,%r21426,%r21427,%r21428,%r21429,%r21430,%r21431,%r21432,%r21433,%r21434,%r21435,%r21436,%r21437,%r21438,%r21439,%r21440,%r21441,%r21442,%r21443,%r21444,%r21445,%r21446,%r21447,%r21448,%r21449,%r21450,%r21451,%r21452,%r21453,%r21454,%r21455,%r21456,%r21457,%r6035,%r6036,%r6037 2026-02-21T12:40:23.8252157Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:40:23.8252216Z // end inline asm 2026-02-21T12:40:23.8252272Z $L__tmp8: 2026-02-21T12:40:23.8252606Z .loc 1 43 126 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:43:126 2026-02-21T12:40:23.8252679Z add.s64 %rd619, %rd619, 8; 2026-02-21T12:40:23.8252743Z add.s64 %rd618, %rd618, 32; 2026-02-21T12:40:23.8252808Z add.s64 %rd617, %rd617, 10240; 2026-02-21T12:40:23.8252880Z setp.lt.u64 %p12, %rd619, 4088; 2026-02-21T12:40:23.8252942Z @%p12 bra $L__BB0_16; 2026-02-21T12:40:23.8253057Z // %bb.17: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:40:23.8253266Z .loc 1 34 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:34:32 2026-02-21T12:40:23.8253331Z or.b64 %rd228, %rd44, %rd5; 2026-02-21T12:40:23.8253393Z or.b64 %rd229, %rd44, %rd6; 2026-02-21T12:40:23.8253464Z or.b64 %rd230, %rd44, %rd7; 2026-02-21T12:40:23.8253535Z or.b64 %rd231, %rd44, %rd8; 
2026-02-21T12:40:23.8253600Z or.b64 %rd232, %rd44, %rd9; 2026-02-21T12:40:23.8253666Z or.b64 %rd233, %rd44, %rd10; 2026-02-21T12:40:23.8253735Z or.b64 %rd234, %rd44, %rd11; 2026-02-21T12:40:23.8253804Z or.b64 %rd235, %rd44, %rd12; 2026-02-21T12:40:23.8253870Z or.b64 %rd236, %rd44, %rd13; 2026-02-21T12:40:23.8253939Z or.b64 %rd237, %rd44, %rd14; 2026-02-21T12:40:23.8254005Z or.b64 %rd238, %rd44, %rd15; 2026-02-21T12:40:23.8254069Z or.b64 %rd239, %rd44, %rd16; 2026-02-21T12:40:23.8254137Z or.b64 %rd240, %rd44, %rd17; 2026-02-21T12:40:23.8254205Z or.b64 %rd241, %rd44, %rd18; 2026-02-21T12:40:23.8254267Z or.b64 %rd242, %rd44, %rd19; 2026-02-21T12:40:23.8254330Z or.b64 %rd243, %rd44, %rd20; 2026-02-21T12:40:23.8254536Z .loc 1 36 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:36:32 2026-02-21T12:40:23.8254600Z or.b64 %rd244, %rd45, %rd22; 2026-02-21T12:40:23.8254806Z .loc 1 90 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:90:28 2026-02-21T12:40:23.8254906Z cvt.rn.bf16x2.f32 %r6394, %r21331, %r21330; 2026-02-21T12:40:23.8254989Z cvt.rn.bf16x2.f32 %r6395, %r21333, %r21332; 2026-02-21T12:40:23.8255082Z cvt.rn.bf16x2.f32 %r6396, %r21335, %r21334; 2026-02-21T12:40:23.8255165Z cvt.rn.bf16x2.f32 %r6397, %r21337, %r21336; 2026-02-21T12:40:23.8255251Z cvt.rn.bf16x2.f32 %r6398, %r21339, %r21338; 2026-02-21T12:40:23.8255329Z cvt.rn.bf16x2.f32 %r6399, %r21341, %r21340; 2026-02-21T12:40:23.8255407Z cvt.rn.bf16x2.f32 %r6400, %r21343, %r21342; 2026-02-21T12:40:23.8255491Z cvt.rn.bf16x2.f32 %r6401, %r21345, %r21344; 2026-02-21T12:40:23.8255569Z cvt.rn.bf16x2.f32 %r6402, %r21347, %r21346; 2026-02-21T12:40:23.8255647Z cvt.rn.bf16x2.f32 %r6403, %r21349, %r21348; 2026-02-21T12:40:23.8255737Z cvt.rn.bf16x2.f32 %r6404, %r21351, %r21350; 2026-02-21T12:40:23.8255820Z cvt.rn.bf16x2.f32 %r6405, %r21353, %r21352; 2026-02-21T12:40:23.8255895Z cvt.rn.bf16x2.f32 %r6406, %r21355, %r21354; 2026-02-21T12:40:23.8255974Z cvt.rn.bf16x2.f32 %r6407, %r21357, %r21356; 
2026-02-21T12:40:23.8256118Z cvt.rn.bf16x2.f32 %r6408, %r21359, %r21358; 2026-02-21T12:40:23.8256200Z cvt.rn.bf16x2.f32 %r6409, %r21361, %r21360; 2026-02-21T12:40:23.8256336Z cvt.rn.bf16x2.f32 %r6410, %r21363, %r21362; 2026-02-21T12:40:23.8256424Z cvt.rn.bf16x2.f32 %r6411, %r21365, %r21364; 2026-02-21T12:40:23.8256607Z cvt.rn.bf16x2.f32 %r6412, %r21367, %r21366; 2026-02-21T12:40:23.8256693Z cvt.rn.bf16x2.f32 %r6413, %r21369, %r21368; 2026-02-21T12:40:23.8256779Z cvt.rn.bf16x2.f32 %r6414, %r21371, %r21370; 2026-02-21T12:40:23.8256857Z cvt.rn.bf16x2.f32 %r6415, %r21373, %r21372; 2026-02-21T12:40:23.8256933Z cvt.rn.bf16x2.f32 %r6416, %r21375, %r21374; 2026-02-21T12:40:23.8257009Z cvt.rn.bf16x2.f32 %r6417, %r21377, %r21376; 2026-02-21T12:40:23.8257095Z cvt.rn.bf16x2.f32 %r6418, %r21379, %r21378; 2026-02-21T12:40:23.8257184Z cvt.rn.bf16x2.f32 %r6419, %r21381, %r21380; 2026-02-21T12:40:23.8257262Z cvt.rn.bf16x2.f32 %r6420, %r21383, %r21382; 2026-02-21T12:40:23.8257350Z cvt.rn.bf16x2.f32 %r6421, %r21385, %r21384; 2026-02-21T12:40:23.8257426Z cvt.rn.bf16x2.f32 %r6422, %r21387, %r21386; 2026-02-21T12:40:23.8257632Z cvt.rn.bf16x2.f32 %r6423, %r21389, %r21388; 2026-02-21T12:40:23.8257713Z cvt.rn.bf16x2.f32 %r6424, %r21391, %r21390; 2026-02-21T12:40:23.8257797Z cvt.rn.bf16x2.f32 %r6425, %r21393, %r21392; 2026-02-21T12:40:23.8257876Z cvt.rn.bf16x2.f32 %r6426, %r21395, %r21394; 2026-02-21T12:40:23.8257967Z cvt.rn.bf16x2.f32 %r6427, %r21397, %r21396; 2026-02-21T12:40:23.8258056Z cvt.rn.bf16x2.f32 %r6428, %r21399, %r21398; 2026-02-21T12:40:23.8258135Z cvt.rn.bf16x2.f32 %r6429, %r21401, %r21400; 2026-02-21T12:40:23.8258214Z cvt.rn.bf16x2.f32 %r6430, %r21403, %r21402; 2026-02-21T12:40:23.8258295Z cvt.rn.bf16x2.f32 %r6431, %r21405, %r21404; 2026-02-21T12:40:23.8258372Z cvt.rn.bf16x2.f32 %r6432, %r21407, %r21406; 2026-02-21T12:40:23.8258449Z cvt.rn.bf16x2.f32 %r6433, %r21409, %r21408; 2026-02-21T12:40:23.8258527Z cvt.rn.bf16x2.f32 %r6434, %r21411, %r21410; 2026-02-21T12:40:23.8258614Z 
cvt.rn.bf16x2.f32 %r6435, %r21413, %r21412; 2026-02-21T12:40:23.8258694Z cvt.rn.bf16x2.f32 %r6436, %r21415, %r21414; 2026-02-21T12:40:23.8258777Z cvt.rn.bf16x2.f32 %r6437, %r21417, %r21416; 2026-02-21T12:40:23.8258860Z cvt.rn.bf16x2.f32 %r6438, %r21419, %r21418; 2026-02-21T12:40:23.8258935Z cvt.rn.bf16x2.f32 %r6439, %r21421, %r21420; 2026-02-21T12:40:23.8259013Z cvt.rn.bf16x2.f32 %r6440, %r21423, %r21422; 2026-02-21T12:40:23.8259097Z cvt.rn.bf16x2.f32 %r6441, %r21425, %r21424; 2026-02-21T12:40:23.8259174Z cvt.rn.bf16x2.f32 %r6442, %r21427, %r21426; 2026-02-21T12:40:23.8259250Z cvt.rn.bf16x2.f32 %r6443, %r21429, %r21428; 2026-02-21T12:40:23.8259325Z cvt.rn.bf16x2.f32 %r6444, %r21431, %r21430; 2026-02-21T12:40:23.8259409Z cvt.rn.bf16x2.f32 %r6445, %r21433, %r21432; 2026-02-21T12:40:23.8259485Z cvt.rn.bf16x2.f32 %r6446, %r21435, %r21434; 2026-02-21T12:40:23.8259561Z cvt.rn.bf16x2.f32 %r6447, %r21437, %r21436; 2026-02-21T12:40:23.8259644Z cvt.rn.bf16x2.f32 %r6448, %r21439, %r21438; 2026-02-21T12:40:23.8259719Z cvt.rn.bf16x2.f32 %r6449, %r21441, %r21440; 2026-02-21T12:40:23.8259798Z cvt.rn.bf16x2.f32 %r6450, %r21443, %r21442; 2026-02-21T12:40:23.8259882Z cvt.rn.bf16x2.f32 %r6451, %r21445, %r21444; 2026-02-21T12:40:23.8259973Z cvt.rn.bf16x2.f32 %r6452, %r21447, %r21446; 2026-02-21T12:40:23.8260054Z cvt.rn.bf16x2.f32 %r6453, %r21449, %r21448; 2026-02-21T12:40:23.8260131Z cvt.rn.bf16x2.f32 %r6454, %r21451, %r21450; 2026-02-21T12:40:23.8260213Z cvt.rn.bf16x2.f32 %r6455, %r21453, %r21452; 2026-02-21T12:40:23.8260288Z cvt.rn.bf16x2.f32 %r6456, %r21455, %r21454; 2026-02-21T12:40:23.8260365Z cvt.rn.bf16x2.f32 %r6457, %r21457, %r21456; 2026-02-21T12:40:23.8260571Z .loc 1 91 22 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:91:22 2026-02-21T12:40:23.8260648Z mad.lo.s64 %rd245, %rd228, 2560, %rd162; 2026-02-21T12:40:23.8260715Z shl.b64 %rd246, %rd244, 1; 2026-02-21T12:40:23.8260789Z add.s64 %rd212, %rd245, %rd246; 2026-02-21T12:40:23.8260941Z mad.lo.s64 %rd247, %rd229, 
2560, %rd162; 2026-02-21T12:40:23.8261007Z add.s64 %rd213, %rd247, %rd246; 2026-02-21T12:40:23.8261153Z mad.lo.s64 %rd248, %rd230, 2560, %rd162; 2026-02-21T12:40:23.8261226Z add.s64 %rd214, %rd248, %rd246; 2026-02-21T12:40:23.8261297Z mad.lo.s64 %rd249, %rd231, 2560, %rd162; 2026-02-21T12:40:23.8261360Z add.s64 %rd215, %rd249, %rd246; 2026-02-21T12:40:23.8261451Z mad.lo.s64 %rd250, %rd232, 2560, %rd162; 2026-02-21T12:40:23.8261517Z add.s64 %rd216, %rd250, %rd246; 2026-02-21T12:40:23.8261588Z mad.lo.s64 %rd251, %rd233, 2560, %rd162; 2026-02-21T12:40:23.8261652Z add.s64 %rd217, %rd251, %rd246; 2026-02-21T12:40:23.8261727Z mad.lo.s64 %rd252, %rd234, 2560, %rd162; 2026-02-21T12:40:23.8261793Z add.s64 %rd218, %rd252, %rd246; 2026-02-21T12:40:23.8261862Z mad.lo.s64 %rd253, %rd235, 2560, %rd162; 2026-02-21T12:40:23.8261931Z add.s64 %rd219, %rd253, %rd246; 2026-02-21T12:40:23.8262002Z mad.lo.s64 %rd254, %rd236, 2560, %rd162; 2026-02-21T12:40:23.8262070Z add.s64 %rd220, %rd254, %rd246; 2026-02-21T12:40:23.8262146Z mad.lo.s64 %rd255, %rd237, 2560, %rd162; 2026-02-21T12:40:23.8262306Z add.s64 %rd221, %rd255, %rd246; 2026-02-21T12:40:23.8262379Z mad.lo.s64 %rd256, %rd238, 2560, %rd162; 2026-02-21T12:40:23.8262445Z add.s64 %rd222, %rd256, %rd246; 2026-02-21T12:40:23.8262520Z mad.lo.s64 %rd257, %rd239, 2560, %rd162; 2026-02-21T12:40:23.8262584Z add.s64 %rd223, %rd257, %rd246; 2026-02-21T12:40:23.8262655Z mad.lo.s64 %rd258, %rd240, 2560, %rd162; 2026-02-21T12:40:23.8262723Z add.s64 %rd224, %rd258, %rd246; 2026-02-21T12:40:23.8262793Z mad.lo.s64 %rd259, %rd241, 2560, %rd162; 2026-02-21T12:40:23.8262856Z add.s64 %rd225, %rd259, %rd246; 2026-02-21T12:40:23.8262930Z mad.lo.s64 %rd260, %rd242, 2560, %rd162; 2026-02-21T12:40:23.8262993Z add.s64 %rd226, %rd260, %rd246; 2026-02-21T12:40:23.8263063Z mad.lo.s64 %rd261, %rd243, 2560, %rd162; 2026-02-21T12:40:23.8263140Z add.s64 %rd227, %rd261, %rd246; 2026-02-21T12:40:23.8263352Z .loc 1 91 81 // 
cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:91:81 2026-02-21T12:40:23.8263413Z bar.sync 0; 2026-02-21T12:40:23.8263533Z st.shared.v4.b32 [%r37], {%r6394, %r6396, %r6398, %r6400}; 2026-02-21T12:40:23.8263649Z st.shared.v4.b32 [%r38], {%r6402, %r6404, %r6406, %r6408}; 2026-02-21T12:40:23.8263754Z st.shared.v4.b32 [%r39], {%r6410, %r6412, %r6414, %r6416}; 2026-02-21T12:40:23.8263856Z st.shared.v4.b32 [%r40], {%r6418, %r6420, %r6422, %r6424}; 2026-02-21T12:40:23.8263964Z st.shared.v4.b32 [%r41], {%r6426, %r6428, %r6430, %r6432}; 2026-02-21T12:40:23.8264066Z st.shared.v4.b32 [%r42], {%r6434, %r6436, %r6438, %r6440}; 2026-02-21T12:40:23.8264170Z st.shared.v4.b32 [%r43], {%r6442, %r6444, %r6446, %r6448}; 2026-02-21T12:40:23.8264272Z st.shared.v4.b32 [%r44], {%r6450, %r6452, %r6454, %r6456}; 2026-02-21T12:40:23.8264339Z bar.sync 0; 2026-02-21T12:40:23.8264401Z // begin inline asm 2026-02-21T12:40:23.8264596Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6250, %r6251, %r6252, %r6253}, [%r6254]; 2026-02-21T12:40:23.8264663Z // end inline asm 2026-02-21T12:40:23.8264724Z // begin inline asm 2026-02-21T12:40:23.8264913Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6255, %r6256, %r6257, %r6258}, [%r6259]; 2026-02-21T12:40:23.8264972Z // end inline asm 2026-02-21T12:40:23.8265040Z // begin inline asm 2026-02-21T12:40:23.8265221Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6260, %r6261, %r6262, %r6263}, [%r6264]; 2026-02-21T12:40:23.8265280Z // end inline asm 2026-02-21T12:40:23.8265348Z // begin inline asm 2026-02-21T12:40:23.8265529Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6265, %r6266, %r6267, %r6268}, [%r6269]; 2026-02-21T12:40:23.8265591Z // end inline asm 2026-02-21T12:40:23.8265662Z // begin inline asm 2026-02-21T12:40:23.8265841Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6270, %r6271, %r6272, %r6273}, [%r6274]; 2026-02-21T12:40:23.8265898Z // end inline asm 2026-02-21T12:40:23.8265958Z // begin inline asm 2026-02-21T12:40:23.8266214Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6275, %r6276, %r6277, %r6278}, [%r6279]; 2026-02-21T12:40:23.8266274Z // end inline asm 2026-02-21T12:40:23.8271846Z // begin inline asm 2026-02-21T12:40:23.8272132Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6280, %r6281, %r6282, %r6283}, [%r6284]; 2026-02-21T12:40:23.8272202Z // end inline asm 2026-02-21T12:40:23.8272271Z // begin inline asm 2026-02-21T12:40:23.8272483Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6285, %r6286, %r6287, %r6288}, [%r6289]; 2026-02-21T12:40:23.8272544Z // end inline asm 2026-02-21T12:40:23.8272606Z bar.sync 0; 2026-02-21T12:40:23.8272725Z st.shared.v4.b32 [%r37], {%r6395, %r6397, %r6399, %r6401}; 2026-02-21T12:40:23.8272836Z st.shared.v4.b32 [%r38], {%r6403, %r6405, %r6407, %r6409}; 2026-02-21T12:40:23.8272941Z st.shared.v4.b32 [%r39], {%r6411, %r6413, %r6415, %r6417}; 2026-02-21T12:40:23.8273050Z st.shared.v4.b32 [%r40], {%r6419, %r6421, %r6423, %r6425}; 2026-02-21T12:40:23.8273152Z st.shared.v4.b32 [%r41], {%r6427, %r6429, %r6431, %r6433}; 2026-02-21T12:40:23.8273251Z st.shared.v4.b32 [%r42], {%r6435, %r6437, %r6439, %r6441}; 2026-02-21T12:40:23.8273568Z st.shared.v4.b32 [%r43], {%r6443, %r6445, %r6447, %r6449}; 2026-02-21T12:40:23.8273675Z st.shared.v4.b32 [%r44], {%r6451, %r6453, %r6455, %r6457}; 2026-02-21T12:40:23.8273734Z bar.sync 0; 2026-02-21T12:40:23.8273804Z // begin inline asm 2026-02-21T12:40:23.8274000Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6290, %r6291, %r6292, %r6293}, [%r6254]; 2026-02-21T12:40:23.8274060Z // end inline asm 2026-02-21T12:40:23.8274121Z // begin inline asm 2026-02-21T12:40:23.8274318Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6295, %r6296, %r6297, %r6298}, [%r6259]; 2026-02-21T12:40:23.8274378Z // end inline asm 2026-02-21T12:40:23.8274437Z // begin inline asm 2026-02-21T12:40:23.8274626Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6300, %r6301, %r6302, %r6303}, [%r6264]; 2026-02-21T12:40:23.8274684Z // end inline asm 2026-02-21T12:40:23.8274744Z // 
begin inline asm 2026-02-21T12:40:23.8274932Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6305, %r6306, %r6307, %r6308}, [%r6269]; 2026-02-21T12:40:23.8274993Z // end inline asm 2026-02-21T12:40:23.8275054Z // begin inline asm 2026-02-21T12:40:23.8275230Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6310, %r6311, %r6312, %r6313}, [%r6274]; 2026-02-21T12:40:23.8275295Z // end inline asm 2026-02-21T12:40:23.8275353Z // begin inline asm 2026-02-21T12:40:23.8275533Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6315, %r6316, %r6317, %r6318}, [%r6279]; 2026-02-21T12:40:23.8275596Z // end inline asm 2026-02-21T12:40:23.8275657Z // begin inline asm 2026-02-21T12:40:23.8275851Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6320, %r6321, %r6322, %r6323}, [%r6284]; 2026-02-21T12:40:23.8275908Z // end inline asm 2026-02-21T12:40:23.8275976Z // begin inline asm 2026-02-21T12:40:23.8276155Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6325, %r6326, %r6327, %r6328}, [%r6289]; 2026-02-21T12:40:23.8276228Z // end inline asm 2026-02-21T12:40:23.8276293Z // begin inline asm 2026-02-21T12:40:23.8276427Z st.global.v4.b32 [ %rd212 + 0 ], { %r6250, %r6251, %r6252, %r6253 }; 2026-02-21T12:40:23.8276652Z // end inline asm 2026-02-21T12:40:23.8276722Z // begin inline asm 2026-02-21T12:40:23.8276850Z st.global.v4.b32 [ %rd213 + 0 ], { %r6255, %r6256, %r6257, %r6258 }; 2026-02-21T12:40:23.8276909Z // end inline asm 2026-02-21T12:40:23.8276968Z // begin inline asm 2026-02-21T12:40:23.8277096Z st.global.v4.b32 [ %rd214 + 0 ], { %r6290, %r6291, %r6292, %r6293 }; 2026-02-21T12:40:23.8277163Z // end inline asm 2026-02-21T12:40:23.8277223Z // begin inline asm 2026-02-21T12:40:23.8277347Z st.global.v4.b32 [ %rd215 + 0 ], { %r6295, %r6296, %r6297, %r6298 }; 2026-02-21T12:40:23.8277405Z // end inline asm 2026-02-21T12:40:23.8277463Z // begin inline asm 2026-02-21T12:40:23.8277580Z st.global.v4.b32 [ %rd216 + 0 ], { %r6260, %r6261, %r6262, %r6263 }; 2026-02-21T12:40:23.8277642Z // end inline asm 
2026-02-21T12:40:23.8277796Z // begin inline asm 2026-02-21T12:40:23.8277914Z st.global.v4.b32 [ %rd217 + 0 ], { %r6265, %r6266, %r6267, %r6268 }; 2026-02-21T12:40:23.8278035Z // end inline asm 2026-02-21T12:40:23.8278094Z // begin inline asm 2026-02-21T12:40:23.8278209Z st.global.v4.b32 [ %rd218 + 0 ], { %r6300, %r6301, %r6302, %r6303 }; 2026-02-21T12:40:23.8278269Z // end inline asm 2026-02-21T12:40:23.8278329Z // begin inline asm 2026-02-21T12:40:23.8278444Z st.global.v4.b32 [ %rd219 + 0 ], { %r6305, %r6306, %r6307, %r6308 }; 2026-02-21T12:40:23.8278501Z // end inline asm 2026-02-21T12:40:23.8278566Z // begin inline asm 2026-02-21T12:40:23.8278681Z st.global.v4.b32 [ %rd220 + 0 ], { %r6270, %r6271, %r6272, %r6273 }; 2026-02-21T12:40:23.8278738Z // end inline asm 2026-02-21T12:40:23.8278801Z // begin inline asm 2026-02-21T12:40:23.8278914Z st.global.v4.b32 [ %rd221 + 0 ], { %r6275, %r6276, %r6277, %r6278 }; 2026-02-21T12:40:23.8278971Z // end inline asm 2026-02-21T12:40:23.8279029Z // begin inline asm 2026-02-21T12:40:23.8279149Z st.global.v4.b32 [ %rd222 + 0 ], { %r6310, %r6311, %r6312, %r6313 }; 2026-02-21T12:40:23.8279204Z // end inline asm 2026-02-21T12:40:23.8279331Z // begin inline asm 2026-02-21T12:40:23.8279517Z st.global.v4.b32 [ %rd223 + 0 ], { %r6315, %r6316, %r6317, %r6318 }; 2026-02-21T12:40:23.8279577Z // end inline asm 2026-02-21T12:40:23.8279637Z // begin inline asm 2026-02-21T12:40:23.8279758Z st.global.v4.b32 [ %rd224 + 0 ], { %r6280, %r6281, %r6282, %r6283 }; 2026-02-21T12:40:23.8279816Z // end inline asm 2026-02-21T12:40:23.8279874Z // begin inline asm 2026-02-21T12:40:23.8279988Z st.global.v4.b32 [ %rd225 + 0 ], { %r6285, %r6286, %r6287, %r6288 }; 2026-02-21T12:40:23.8280047Z // end inline asm 2026-02-21T12:40:23.8280103Z // begin inline asm 2026-02-21T12:40:23.8280217Z st.global.v4.b32 [ %rd226 + 0 ], { %r6320, %r6321, %r6322, %r6323 }; 2026-02-21T12:40:23.8280278Z // end inline asm 2026-02-21T12:40:23.8280336Z // begin inline asm 
2026-02-21T12:40:23.8280450Z st.global.v4.b32 [ %rd227 + 0 ], { %r6325, %r6326, %r6327, %r6328 }; 2026-02-21T12:40:23.8280510Z // end inline asm 2026-02-21T12:40:23.8280744Z .loc 1 22 120 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:22:120 2026-02-21T12:40:23.8280816Z add.s64 %rd262, %rd612, 1; 2026-02-21T12:40:23.8281023Z .loc 1 28 35 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:28:35 2026-02-21T12:40:23.8281121Z mul.hi.u64 %rd263, %rd262, -3689348814741910323; 2026-02-21T12:40:23.8281187Z shr.u64 %rd264, %rd263, 5; 2026-02-21T12:40:23.8281383Z .loc 1 29 33 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:29:33 2026-02-21T12:40:23.8281451Z shl.b64 %rd63, %rd264, 3; 2026-02-21T12:40:23.8281644Z .loc 1 30 39 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:30:39 2026-02-21T12:40:23.8281721Z sub.s64 %rd265, 4096, %rd63; 2026-02-21T12:40:23.8281921Z .loc 1 30 52 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:30:52 2026-02-21T12:40:23.8281986Z min.s64 %rd64, %rd265, 8; 2026-02-21T12:40:23.8282182Z .loc 1 31 45 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:31:45 2026-02-21T12:40:23.8282251Z mul.lo.s64 %rd266, %rd264, 40; 2026-02-21T12:40:23.8282322Z sub.s64 %rd65, %rd262, %rd266; 2026-02-21T12:40:23.8282513Z .loc 1 32 51 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:32:51 2026-02-21T12:40:23.8282577Z or.b64 %rd267, %rd65, %rd64; 2026-02-21T12:40:23.8282653Z and.b64 %rd268, %rd267, -4294967296; 2026-02-21T12:40:23.8282722Z setp.ne.b64 %p13, %rd268, 0; 2026-02-21T12:40:23.8282785Z @%p13 bra $L__BB0_19; 2026-02-21T12:40:23.8282850Z bra.uni $L__BB0_18; 2026-02-21T12:40:23.8282969Z $L__BB0_19: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:40:23.8283038Z div.s64 %rd620, %rd65, %rd64; 2026-02-21T12:40:23.8283097Z bra.uni $L__BB0_20; 2026-02-21T12:40:23.8283272Z $L__BB0_18: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:40:23.8283337Z cvt.u32.u64 %r6458, %rd64; 2026-02-21T12:40:23.8283449Z 
cvt.u32.u64 %r6459, %rd65; 2026-02-21T12:40:23.8283519Z div.u32 %r6460, %r6459, %r6458; 2026-02-21T12:40:23.8283585Z cvt.u64.u32 %rd620, %r6460; 2026-02-21T12:40:23.8283692Z $L__BB0_20: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:40:23.8283911Z .loc 1 31 64 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:31:64 2026-02-21T12:40:23.8283979Z mul.lo.s64 %rd270, %rd620, %rd64; 2026-02-21T12:40:23.8284044Z sub.s64 %rd271, %rd65, %rd270; 2026-02-21T12:40:23.8284256Z .loc 1 31 30 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:31:30 2026-02-21T12:40:23.8284330Z add.s64 %rd272, %rd271, %rd63; 2026-02-21T12:40:23.8284528Z .loc 1 33 27 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:33:27 2026-02-21T12:40:23.8284596Z shl.b64 %rd69, %rd272, 6; 2026-02-21T12:40:23.8284841Z .loc 1 35 27 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:35:27 2026-02-21T12:40:23.8284959Z shl.b64 %rd70, %rd620, 8; 2026-02-21T12:40:23.8285165Z .loc 1 43 126 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:43:126 2026-02-21T12:40:23.8285236Z shl.b64 %rd71, %rd272, 20; 2026-02-21T12:40:23.8285302Z add.s64 %rd622, %rd25, %rd71; 2026-02-21T12:40:23.8285364Z add.s64 %rd621, %rd26, %rd70; 2026-02-21T12:40:23.8285433Z mov.b32 %r21586, 0f00000000; 2026-02-21T12:40:23.8285498Z mov.b64 %rd623, -24; 2026-02-21T12:40:23.8285560Z mov.b32 %r21587, %r21586; 2026-02-21T12:40:23.8285619Z mov.b32 %r21588, %r21586; 2026-02-21T12:40:23.8285683Z mov.b32 %r21589, %r21586; 2026-02-21T12:40:23.8285741Z mov.b32 %r21590, %r21586; 2026-02-21T12:40:23.8285807Z mov.b32 %r21591, %r21586; 2026-02-21T12:40:23.8285866Z mov.b32 %r21592, %r21586; 2026-02-21T12:40:23.8285938Z mov.b32 %r21593, %r21586; 2026-02-21T12:40:23.8286001Z mov.b32 %r21594, %r21586; 2026-02-21T12:40:23.8286060Z mov.b32 %r21595, %r21586; 2026-02-21T12:40:23.8286130Z mov.b32 %r21596, %r21586; 2026-02-21T12:40:23.8286190Z mov.b32 %r21597, %r21586; 2026-02-21T12:40:23.8286249Z mov.b32 %r21598, %r21586; 
2026-02-21T12:40:23.8286308Z mov.b32 %r21599, %r21586; 2026-02-21T12:40:23.8286373Z mov.b32 %r21600, %r21586; 2026-02-21T12:40:23.8286434Z mov.b32 %r21601, %r21586; 2026-02-21T12:40:23.8286616Z mov.b32 %r21602, %r21586; 2026-02-21T12:40:23.8286683Z mov.b32 %r21603, %r21586; 2026-02-21T12:40:23.8286740Z mov.b32 %r21604, %r21586; 2026-02-21T12:40:23.8286798Z mov.b32 %r21605, %r21586; 2026-02-21T12:40:23.8286859Z mov.b32 %r21606, %r21586; 2026-02-21T12:40:23.8286924Z mov.b32 %r21607, %r21586; 2026-02-21T12:40:23.8286983Z mov.b32 %r21608, %r21586; 2026-02-21T12:40:23.8287043Z mov.b32 %r21609, %r21586; 2026-02-21T12:40:23.8287108Z mov.b32 %r21610, %r21586; 2026-02-21T12:40:23.8287170Z mov.b32 %r21611, %r21586; 2026-02-21T12:40:23.8287230Z mov.b32 %r21612, %r21586; 2026-02-21T12:40:23.8287294Z mov.b32 %r21613, %r21586; 2026-02-21T12:40:23.8287358Z mov.b32 %r21614, %r21586; 2026-02-21T12:40:23.8287417Z mov.b32 %r21615, %r21586; 2026-02-21T12:40:23.8287476Z mov.b32 %r21616, %r21586; 2026-02-21T12:40:23.8287554Z mov.b32 %r21617, %r21586; 2026-02-21T12:40:23.8287614Z mov.b32 %r21618, %r21586; 2026-02-21T12:40:23.8287671Z mov.b32 %r21619, %r21586; 2026-02-21T12:40:23.8287735Z mov.b32 %r21620, %r21586; 2026-02-21T12:40:23.8287794Z mov.b32 %r21621, %r21586; 2026-02-21T12:40:23.8287852Z mov.b32 %r21622, %r21586; 2026-02-21T12:40:23.8287908Z mov.b32 %r21623, %r21586; 2026-02-21T12:40:23.8287967Z mov.b32 %r21624, %r21586; 2026-02-21T12:40:23.8288024Z mov.b32 %r21625, %r21586; 2026-02-21T12:40:23.8288081Z mov.b32 %r21626, %r21586; 2026-02-21T12:40:23.8288140Z mov.b32 %r21627, %r21586; 2026-02-21T12:40:23.8288197Z mov.b32 %r21628, %r21586; 2026-02-21T12:40:23.8288341Z mov.b32 %r21629, %r21586; 2026-02-21T12:40:23.8288400Z mov.b32 %r21630, %r21586; 2026-02-21T12:40:23.8288460Z mov.b32 %r21631, %r21586; 2026-02-21T12:40:23.8288584Z mov.b32 %r21632, %r21586; 2026-02-21T12:40:23.8288643Z mov.b32 %r21633, %r21586; 2026-02-21T12:40:23.8288715Z mov.b32 %r21634, %r21586; 
2026-02-21T12:40:23.8288778Z mov.b32 %r21635, %r21586; 2026-02-21T12:40:23.8288837Z mov.b32 %r21636, %r21586; 2026-02-21T12:40:23.8288897Z mov.b32 %r21637, %r21586; 2026-02-21T12:40:23.8288962Z mov.b32 %r21638, %r21586; 2026-02-21T12:40:23.8289020Z mov.b32 %r21639, %r21586; 2026-02-21T12:40:23.8289079Z mov.b32 %r21640, %r21586; 2026-02-21T12:40:23.8289143Z mov.b32 %r21641, %r21586; 2026-02-21T12:40:23.8289204Z mov.b32 %r21642, %r21586; 2026-02-21T12:40:23.8289264Z mov.b32 %r21643, %r21586; 2026-02-21T12:40:23.8289323Z mov.b32 %r21644, %r21586; 2026-02-21T12:40:23.8289388Z mov.b32 %r21645, %r21586; 2026-02-21T12:40:23.8289447Z mov.b32 %r21646, %r21586; 2026-02-21T12:40:23.8289507Z mov.b32 %r21647, %r21586; 2026-02-21T12:40:23.8289573Z mov.b32 %r21648, %r21586; 2026-02-21T12:40:23.8289634Z mov.b32 %r21649, %r21586; 2026-02-21T12:40:23.8289813Z mov.b32 %r21650, %r21586; 2026-02-21T12:40:23.8289876Z mov.b32 %r21651, %r21586; 2026-02-21T12:40:23.8289937Z mov.b32 %r21652, %r21586; 2026-02-21T12:40:23.8289993Z mov.b32 %r21653, %r21586; 2026-02-21T12:40:23.8290049Z mov.b32 %r21654, %r21586; 2026-02-21T12:40:23.8290108Z mov.b32 %r21655, %r21586; 2026-02-21T12:40:23.8290165Z mov.b32 %r21656, %r21586; 2026-02-21T12:40:23.8290220Z mov.b32 %r21657, %r21586; 2026-02-21T12:40:23.8290279Z mov.b32 %r21658, %r21586; 2026-02-21T12:40:23.8290339Z mov.b32 %r21659, %r21586; 2026-02-21T12:40:23.8290399Z mov.b32 %r21660, %r21586; 2026-02-21T12:40:23.8290459Z mov.b32 %r21661, %r21586; 2026-02-21T12:40:23.8290523Z mov.b32 %r21662, %r21586; 2026-02-21T12:40:23.8290583Z mov.b32 %r21663, %r21586; 2026-02-21T12:40:23.8290643Z mov.b32 %r21664, %r21586; 2026-02-21T12:40:23.8290709Z mov.b32 %r21665, %r21586; 2026-02-21T12:40:23.8290768Z mov.b32 %r21666, %r21586; 2026-02-21T12:40:23.8290827Z mov.b32 %r21667, %r21586; 2026-02-21T12:40:23.8290891Z mov.b32 %r21668, %r21586; 2026-02-21T12:40:23.8290955Z mov.b32 %r21669, %r21586; 2026-02-21T12:40:23.8291014Z mov.b32 %r21670, %r21586; 
2026-02-21T12:40:23.8291072Z mov.b32 %r21671, %r21586; 2026-02-21T12:40:23.8291136Z mov.b32 %r21672, %r21586; 2026-02-21T12:40:23.8291194Z mov.b32 %r21673, %r21586; 2026-02-21T12:40:23.8291252Z mov.b32 %r21674, %r21586; 2026-02-21T12:40:23.8291323Z mov.b32 %r21675, %r21586; 2026-02-21T12:40:23.8291388Z mov.b32 %r21676, %r21586; 2026-02-21T12:40:23.8291448Z mov.b32 %r21677, %r21586; 2026-02-21T12:40:23.8291505Z mov.b32 %r21678, %r21586; 2026-02-21T12:40:23.8291563Z mov.b32 %r21679, %r21586; 2026-02-21T12:40:23.8291623Z mov.b32 %r21680, %r21586; 2026-02-21T12:40:23.8291681Z mov.b32 %r21681, %r21586; 2026-02-21T12:40:23.8291738Z mov.b32 %r21682, %r21586; 2026-02-21T12:40:23.8291802Z mov.b32 %r21683, %r21586; 2026-02-21T12:40:23.8291862Z mov.b32 %r21684, %r21586; 2026-02-21T12:40:23.8291920Z mov.b32 %r21685, %r21586; 2026-02-21T12:40:23.8291988Z mov.b32 %r21686, %r21586; 2026-02-21T12:40:23.8292047Z mov.b32 %r21687, %r21586; 2026-02-21T12:40:23.8292105Z mov.b32 %r21688, %r21586; 2026-02-21T12:40:23.8292165Z mov.b32 %r21689, %r21586; 2026-02-21T12:40:23.8292227Z mov.b32 %r21690, %r21586; 2026-02-21T12:40:23.8292287Z mov.b32 %r21691, %r21586; 2026-02-21T12:40:23.8292346Z mov.b32 %r21692, %r21586; 2026-02-21T12:40:23.8292407Z mov.b32 %r21693, %r21586; 2026-02-21T12:40:23.8292468Z mov.b32 %r21694, %r21586; 2026-02-21T12:40:23.8292526Z mov.b32 %r21695, %r21586; 2026-02-21T12:40:23.8292585Z mov.b32 %r21696, %r21586; 2026-02-21T12:40:23.8292645Z mov.b32 %r21697, %r21586; 2026-02-21T12:40:23.8292705Z mov.b32 %r21698, %r21586; 2026-02-21T12:40:23.8292764Z mov.b32 %r21699, %r21586; 2026-02-21T12:40:23.8292826Z mov.b32 %r21700, %r21586; 2026-02-21T12:40:23.8292973Z mov.b32 %r21701, %r21586; 2026-02-21T12:40:23.8293030Z mov.b32 %r21702, %r21586; 2026-02-21T12:40:23.8293091Z mov.b32 %r21703, %r21586; 2026-02-21T12:40:23.8293202Z mov.b32 %r21704, %r21586; 2026-02-21T12:40:23.8293262Z mov.b32 %r21705, %r21586; 2026-02-21T12:40:23.8293320Z mov.b32 %r21706, %r21586; 
2026-02-21T12:40:23.8293384Z mov.b32 %r21707, %r21586; 2026-02-21T12:40:23.8293442Z mov.b32 %r21708, %r21586; 2026-02-21T12:40:23.8293501Z mov.b32 %r21709, %r21586; 2026-02-21T12:40:23.8293565Z mov.b32 %r21710, %r21586; 2026-02-21T12:40:23.8293624Z mov.b32 %r21711, %r21586; 2026-02-21T12:40:23.8293682Z mov.b32 %r21712, %r21586; 2026-02-21T12:40:23.8293741Z mov.b32 %r21713, %r21586; 2026-02-21T12:40:23.8293861Z $L__BB0_21: // Parent Loop BB0_2 Depth=1 2026-02-21T12:40:23.8293966Z // => This Inner Loop Header: Depth=2 2026-02-21T12:40:23.8294174Z .loc 1 51 80 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:51:80 2026-02-21T12:40:23.8294250Z add.s64 %rd274, %rd622, -64; 2026-02-21T12:40:23.8294309Z // begin inline asm 2026-02-21T12:40:23.8294461Z mov.u64 %rd273, 0x0; 2026-02-21T12:40:23.8294602Z createpolicy.fractional.L2::evict_last.b64 %rd273, 1.0; 2026-02-21T12:40:23.8294663Z // end inline asm 2026-02-21T12:40:23.8294722Z // begin inline asm 2026-02-21T12:40:23.8294781Z mov.u32 %r6462, 0x0; 2026-02-21T12:40:23.8294844Z mov.u32 %r6463, 0x0; 2026-02-21T12:40:23.8294902Z mov.u32 %r6464, 0x0; 2026-02-21T12:40:23.8294960Z mov.u32 %r6465, 0x0; 2026-02-21T12:40:23.8295195Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r6462, %r6463, %r6464, %r6465 }, [ %rd274 + 0 ], %rd273; 2026-02-21T12:40:23.8295254Z // end inline asm 2026-02-21T12:40:23.8295460Z .loc 1 55 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:55:32 2026-02-21T12:40:23.8295523Z bar.sync 0; 2026-02-21T12:40:23.8295604Z st.shared.v2.b32 [%r9], {%r6462, %r6463}; 2026-02-21T12:40:23.8295686Z st.shared.v2.b32 [%r10], {%r6464, %r6465}; 2026-02-21T12:40:23.8295745Z bar.sync 0; 2026-02-21T12:40:23.8295826Z ld.shared.b16 %rs417, [%r53]; 2026-02-21T12:40:23.8295900Z ld.shared.b16 %rs418, [%r53+256]; 2026-02-21T12:40:23.8295971Z ld.shared.b16 %rs419, [%r53+16]; 2026-02-21T12:40:23.8296055Z ld.shared.b16 %rs420, [%r53+272]; 2026-02-21T12:40:23.8296124Z ld.shared.b16 %rs421, [%r54]; 
2026-02-21T12:40:23.8296189Z ld.shared.b16 %rs422, [%r54+256]; 2026-02-21T12:40:23.8296255Z ld.shared.b16 %rs423, [%r54+16]; 2026-02-21T12:40:23.8296324Z ld.shared.b16 %rs424, [%r54+272]; 2026-02-21T12:40:23.8296394Z cvt.f32.bf16 %r6726, %rs417; 2026-02-21T12:40:23.8296583Z cvt.f32.bf16 %r6727, %rs418; 2026-02-21T12:40:23.8296657Z cvt.f32.bf16 %r6728, %rs421; 2026-02-21T12:40:23.8296734Z cvt.f32.bf16 %r6729, %rs422; 2026-02-21T12:40:23.8296799Z cvt.f32.bf16 %r6986, %rs419; 2026-02-21T12:40:23.8296860Z cvt.f32.bf16 %r6987, %rs420; 2026-02-21T12:40:23.8296934Z cvt.f32.bf16 %r6988, %rs423; 2026-02-21T12:40:23.8296998Z cvt.f32.bf16 %r6989, %rs424; 2026-02-21T12:40:23.8297222Z .loc 1 57 87 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:57:87 2026-02-21T12:40:23.8297292Z // begin inline asm 2026-02-21T12:40:23.8297350Z mov.u32 %r6466, 0x0; 2026-02-21T12:40:23.8297411Z mov.u32 %r6467, 0x0; 2026-02-21T12:40:23.8297470Z mov.u32 %r6468, 0x0; 2026-02-21T12:40:23.8297531Z mov.u32 %r6469, 0x0; 2026-02-21T12:40:23.8297667Z ld.global.v4.b32 { %r6466, %r6467, %r6468, %r6469 }, [ %rd621 + 0 ]; 2026-02-21T12:40:23.8297728Z // end inline asm 2026-02-21T12:40:23.8297929Z .loc 1 65 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:65:28 2026-02-21T12:40:23.8297989Z bar.sync 0; 2026-02-21T12:40:23.8298056Z st.shared.b8 [%r13], %r6466; 2026-02-21T12:40:23.8298130Z prmt.b32 %r8832, %r6466, 0, 0x7771U; 2026-02-21T12:40:23.8298203Z st.shared.b8 [%r14], %r8832; 2026-02-21T12:40:23.8298271Z prmt.b32 %r8833, %r6466, 0, 0x7772U; 2026-02-21T12:40:23.8298424Z st.shared.b8 [%r15+256], %r8833; 2026-02-21T12:40:23.8298490Z prmt.b32 %r8834, %r6466, 0, 0x7773U; 2026-02-21T12:40:23.8298619Z st.shared.b8 [%r16+256], %r8834; 2026-02-21T12:40:23.8298684Z st.shared.b8 [%r17+512], %r6467; 2026-02-21T12:40:23.8298749Z prmt.b32 %r8835, %r6467, 0, 0x7771U; 2026-02-21T12:40:23.8298815Z st.shared.b8 [%r18+512], %r8835; 2026-02-21T12:40:23.8298879Z prmt.b32 %r8836, %r6467, 0, 0x7772U; 
2026-02-21T12:40:23.8298944Z st.shared.b8 [%r19+768], %r8836; 2026-02-21T12:40:23.8299008Z prmt.b32 %r8837, %r6467, 0, 0x7773U; 2026-02-21T12:40:23.8299079Z st.shared.b8 [%r20+768], %r8837; 2026-02-21T12:40:23.8299144Z st.shared.b8 [%r21+1024], %r6468; 2026-02-21T12:40:23.8299208Z prmt.b32 %r8838, %r6468, 0, 0x7771U; 2026-02-21T12:40:23.8299276Z st.shared.b8 [%r22+1024], %r8838; 2026-02-21T12:40:23.8299342Z prmt.b32 %r8839, %r6468, 0, 0x7772U; 2026-02-21T12:40:23.8299407Z st.shared.b8 [%r23+1280], %r8839; 2026-02-21T12:40:23.8299480Z prmt.b32 %r8840, %r6468, 0, 0x7773U; 2026-02-21T12:40:23.8299555Z st.shared.b8 [%r24+1280], %r8840; 2026-02-21T12:40:23.8299687Z st.shared.b8 [%r25+1536], %r6469; 2026-02-21T12:40:23.8299810Z prmt.b32 %r8841, %r6469, 0, 0x7771U; 2026-02-21T12:40:23.8299882Z st.shared.b8 [%r26+1536], %r8841; 2026-02-21T12:40:23.8299946Z prmt.b32 %r8842, %r6469, 0, 0x7772U; 2026-02-21T12:40:23.8300009Z st.shared.b8 [%r27+1792], %r8842; 2026-02-21T12:40:23.8300076Z prmt.b32 %r8843, %r6469, 0, 0x7773U; 2026-02-21T12:40:23.8300140Z st.shared.b8 [%r28+1792], %r8843; 2026-02-21T12:40:23.8300198Z bar.sync 0; 2026-02-21T12:40:23.8300267Z ld.shared.b32 %r8844, [%r29]; 2026-02-21T12:40:23.8300337Z prmt.b32 %r8845, %r8844, 0, 0x7770U; 2026-02-21T12:40:23.8300404Z cvt.u16.u32 %rs425, %r8845; 2026-02-21T12:40:23.8300468Z prmt.b32 %r8846, %r8844, 0, 0x7771U; 2026-02-21T12:40:23.8300534Z cvt.u16.u32 %rs426, %r8846; 2026-02-21T12:40:23.8300599Z prmt.b32 %r8847, %r8844, 0, 0x7772U; 2026-02-21T12:40:23.8300664Z cvt.u16.u32 %rs427, %r8847; 2026-02-21T12:40:23.8300728Z prmt.b32 %r8848, %r8844, 0, 0x7773U; 2026-02-21T12:40:23.8300795Z cvt.u16.u32 %rs428, %r8848; 2026-02-21T12:40:23.8300863Z ld.shared.b32 %r8849, [%r30]; 2026-02-21T12:40:23.8300928Z prmt.b32 %r8850, %r8849, 0, 0x7770U; 2026-02-21T12:40:23.8301003Z cvt.u16.u32 %rs429, %r8850; 2026-02-21T12:40:23.8301066Z prmt.b32 %r8851, %r8849, 0, 0x7771U; 2026-02-21T12:40:23.8301127Z cvt.u16.u32 %rs430, %r8851; 
2026-02-21T12:40:23.8301194Z prmt.b32 %r8852, %r8849, 0, 0x7772U; 2026-02-21T12:40:23.8301254Z cvt.u16.u32 %rs431, %r8852; 2026-02-21T12:40:23.8301318Z prmt.b32 %r8853, %r8849, 0, 0x7773U; 2026-02-21T12:40:23.8301380Z cvt.u16.u32 %rs432, %r8853; 2026-02-21T12:40:23.8301449Z ld.shared.b32 %r8854, [%r31]; 2026-02-21T12:40:23.8301513Z prmt.b32 %r8855, %r8854, 0, 0x7770U; 2026-02-21T12:40:23.8301574Z cvt.u16.u32 %rs433, %r8855; 2026-02-21T12:40:23.8301641Z prmt.b32 %r8856, %r8854, 0, 0x7771U; 2026-02-21T12:40:23.8301703Z cvt.u16.u32 %rs434, %r8856; 2026-02-21T12:40:23.8301765Z prmt.b32 %r8857, %r8854, 0, 0x7772U; 2026-02-21T12:40:23.8301826Z cvt.u16.u32 %rs435, %r8857; 2026-02-21T12:40:23.8301900Z prmt.b32 %r8858, %r8854, 0, 0x7773U; 2026-02-21T12:40:23.8301965Z cvt.u16.u32 %rs436, %r8858; 2026-02-21T12:40:23.8302031Z ld.shared.b32 %r8859, [%r32]; 2026-02-21T12:40:23.8302100Z prmt.b32 %r8860, %r8859, 0, 0x7770U; 2026-02-21T12:40:23.8302164Z cvt.u16.u32 %rs437, %r8860; 2026-02-21T12:40:23.8302229Z prmt.b32 %r8861, %r8859, 0, 0x7771U; 2026-02-21T12:40:23.8302296Z cvt.u16.u32 %rs438, %r8861; 2026-02-21T12:40:23.8302369Z prmt.b32 %r8862, %r8859, 0, 0x7772U; 2026-02-21T12:40:23.8302433Z cvt.u16.u32 %rs439, %r8862; 2026-02-21T12:40:23.8302496Z prmt.b32 %r8863, %r8859, 0, 0x7773U; 2026-02-21T12:40:23.8302560Z cvt.u16.u32 %rs440, %r8863; 2026-02-21T12:40:23.8302780Z .loc 1 60 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:60:28 2026-02-21T12:40:23.8302908Z shl.b16 %rs441, %rs425, 4; 2026-02-21T12:40:23.8302974Z shl.b16 %rs442, %rs429, 4; 2026-02-21T12:40:23.8303036Z shl.b16 %rs443, %rs433, 4; 2026-02-21T12:40:23.8303159Z shl.b16 %rs444, %rs437, 4; 2026-02-21T12:40:23.8303220Z shl.b16 %rs445, %rs426, 4; 2026-02-21T12:40:23.8303286Z shl.b16 %rs446, %rs430, 4; 2026-02-21T12:40:23.8303348Z shl.b16 %rs447, %rs434, 4; 2026-02-21T12:40:23.8303407Z shl.b16 %rs448, %rs438, 4; 2026-02-21T12:40:23.8303473Z shl.b16 %rs449, %rs427, 4; 2026-02-21T12:40:23.8303534Z shl.b16 
%rs450, %rs431, 4; 2026-02-21T12:40:23.8303594Z shl.b16 %rs451, %rs435, 4; 2026-02-21T12:40:23.8303655Z shl.b16 %rs452, %rs439, 4; 2026-02-21T12:40:23.8303719Z shl.b16 %rs453, %rs428, 4; 2026-02-21T12:40:23.8303791Z shl.b16 %rs454, %rs432, 4; 2026-02-21T12:40:23.8303854Z shl.b16 %rs455, %rs436, 4; 2026-02-21T12:40:23.8303920Z shl.b16 %rs456, %rs440, 4; 2026-02-21T12:40:23.8304125Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8304191Z cvt.s16.s8 %rs457, %rs441; 2026-02-21T12:40:23.8304254Z shr.s16 %rs458, %rs457, 4; 2026-02-21T12:40:23.8304365Z cvt.s16.s8 %rs459, %rs443; 2026-02-21T12:40:23.8304477Z shr.s16 %rs460, %rs459, 4; 2026-02-21T12:40:23.8304544Z prmt.b32 %r8864, %r8844, 0, 0x8880U; 2026-02-21T12:40:23.8304608Z cvt.u16.u32 %rs461, %r8864; 2026-02-21T12:40:23.8304668Z shr.s16 %rs462, %rs461, 4; 2026-02-21T12:40:23.8304736Z prmt.b32 %r8865, %r8854, 0, 0x8880U; 2026-02-21T12:40:23.8304808Z cvt.u16.u32 %rs463, %r8865; 2026-02-21T12:40:23.8304872Z shr.s16 %rs464, %rs463, 4; 2026-02-21T12:40:23.8305071Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8305137Z cvt.rn.f32.s16 %r8866, %rs464; 2026-02-21T12:40:23.8305200Z cvt.rn.f32.s16 %r8867, %rs462; 2026-02-21T12:40:23.8305263Z cvt.rn.f32.s16 %r8868, %rs460; 2026-02-21T12:40:23.8305326Z cvt.rn.f32.s16 %r8869, %rs458; 2026-02-21T12:40:23.8305525Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8305587Z cvt.s16.s8 %rs465, %rs442; 2026-02-21T12:40:23.8305655Z shr.s16 %rs466, %rs465, 4; 2026-02-21T12:40:23.8305715Z cvt.s16.s8 %rs467, %rs444; 2026-02-21T12:40:23.8305775Z shr.s16 %rs468, %rs467, 4; 2026-02-21T12:40:23.8305841Z prmt.b32 %r8870, %r8849, 0, 0x8880U; 2026-02-21T12:40:23.8305910Z cvt.u16.u32 %rs469, %r8870; 2026-02-21T12:40:23.8305971Z shr.s16 %rs470, %rs469, 4; 2026-02-21T12:40:23.8306035Z prmt.b32 %r8871, %r8859, 0, 0x8880U; 2026-02-21T12:40:23.8306098Z 
cvt.u16.u32 %rs471, %r8871; 2026-02-21T12:40:23.8306157Z shr.s16 %rs472, %rs471, 4; 2026-02-21T12:40:23.8306352Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8306419Z cvt.rn.f32.s16 %r8872, %rs472; 2026-02-21T12:40:23.8306605Z cvt.rn.f32.s16 %r8873, %rs470; 2026-02-21T12:40:23.8306674Z cvt.rn.f32.s16 %r8874, %rs468; 2026-02-21T12:40:23.8306738Z cvt.rn.f32.s16 %r8875, %rs466; 2026-02-21T12:40:23.8306939Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8307004Z cvt.s16.s8 %rs473, %rs445; 2026-02-21T12:40:23.8307076Z shr.s16 %rs474, %rs473, 4; 2026-02-21T12:40:23.8307140Z cvt.s16.s8 %rs475, %rs447; 2026-02-21T12:40:23.8307199Z shr.s16 %rs476, %rs475, 4; 2026-02-21T12:40:23.8307263Z prmt.b32 %r8876, %r8844, 0, 0x9991U; 2026-02-21T12:40:23.8307326Z cvt.u16.u32 %rs477, %r8876; 2026-02-21T12:40:23.8307386Z shr.s16 %rs478, %rs477, 4; 2026-02-21T12:40:23.8307450Z prmt.b32 %r8877, %r8854, 0, 0x9991U; 2026-02-21T12:40:23.8307512Z cvt.u16.u32 %rs479, %r8877; 2026-02-21T12:40:23.8307576Z shr.s16 %rs480, %rs479, 4; 2026-02-21T12:40:23.8307768Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8307831Z cvt.rn.f32.s16 %r8878, %rs480; 2026-02-21T12:40:23.8307896Z cvt.rn.f32.s16 %r8879, %rs478; 2026-02-21T12:40:23.8308039Z cvt.rn.f32.s16 %r8880, %rs476; 2026-02-21T12:40:23.8308100Z cvt.rn.f32.s16 %r8881, %rs474; 2026-02-21T12:40:23.8308445Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8308515Z cvt.s16.s8 %rs481, %rs446; 2026-02-21T12:40:23.8308577Z shr.s16 %rs482, %rs481, 4; 2026-02-21T12:40:23.8308637Z cvt.s16.s8 %rs483, %rs448; 2026-02-21T12:40:23.8308700Z shr.s16 %rs484, %rs483, 4; 2026-02-21T12:40:23.8308764Z prmt.b32 %r8882, %r8849, 0, 0x9991U; 2026-02-21T12:40:23.8308825Z cvt.u16.u32 %rs485, %r8882; 2026-02-21T12:40:23.8308887Z shr.s16 %rs486, %rs485, 4; 
2026-02-21T12:40:23.8308951Z prmt.b32 %r8883, %r8859, 0, 0x9991U; 2026-02-21T12:40:23.8309011Z cvt.u16.u32 %rs487, %r8883; 2026-02-21T12:40:23.8309071Z shr.s16 %rs488, %rs487, 4; 2026-02-21T12:40:23.8309266Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8309330Z cvt.rn.f32.s16 %r8884, %rs488; 2026-02-21T12:40:23.8309392Z cvt.rn.f32.s16 %r8885, %rs486; 2026-02-21T12:40:23.8309457Z cvt.rn.f32.s16 %r8886, %rs484; 2026-02-21T12:40:23.8309660Z cvt.rn.f32.s16 %r8887, %rs482; 2026-02-21T12:40:23.8309857Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8309920Z cvt.s16.s8 %rs489, %rs449; 2026-02-21T12:40:23.8309980Z shr.s16 %rs490, %rs489, 4; 2026-02-21T12:40:23.8310040Z cvt.s16.s8 %rs491, %rs451; 2026-02-21T12:40:23.8310100Z shr.s16 %rs492, %rs491, 4; 2026-02-21T12:40:23.8310166Z prmt.b32 %r8888, %r8844, 0, 0xaaa2U; 2026-02-21T12:40:23.8310239Z cvt.u16.u32 %rs493, %r8888; 2026-02-21T12:40:23.8310301Z shr.s16 %rs494, %rs493, 4; 2026-02-21T12:40:23.8310367Z prmt.b32 %r8889, %r8854, 0, 0xaaa2U; 2026-02-21T12:40:23.8310429Z cvt.u16.u32 %rs495, %r8889; 2026-02-21T12:40:23.8310491Z shr.s16 %rs496, %rs495, 4; 2026-02-21T12:40:23.8310682Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8310750Z cvt.rn.f32.s16 %r8890, %rs496; 2026-02-21T12:40:23.8310817Z cvt.rn.f32.s16 %r8891, %rs494; 2026-02-21T12:40:23.8310880Z cvt.rn.f32.s16 %r8892, %rs492; 2026-02-21T12:40:23.8310945Z cvt.rn.f32.s16 %r8893, %rs490; 2026-02-21T12:40:23.8311138Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8311199Z cvt.s16.s8 %rs497, %rs450; 2026-02-21T12:40:23.8311262Z shr.s16 %rs498, %rs497, 4; 2026-02-21T12:40:23.8311322Z cvt.s16.s8 %rs499, %rs452; 2026-02-21T12:40:23.8311381Z shr.s16 %rs500, %rs499, 4; 2026-02-21T12:40:23.8311445Z prmt.b32 %r8894, %r8849, 0, 0xaaa2U; 2026-02-21T12:40:23.8311509Z cvt.u16.u32 
%rs501, %r8894; 2026-02-21T12:40:23.8311569Z shr.s16 %rs502, %rs501, 4; 2026-02-21T12:40:23.8311634Z prmt.b32 %r8895, %r8859, 0, 0xaaa2U; 2026-02-21T12:40:23.8311698Z cvt.u16.u32 %rs503, %r8895; 2026-02-21T12:40:23.8311761Z shr.s16 %rs504, %rs503, 4; 2026-02-21T12:40:23.8311953Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8312024Z cvt.rn.f32.s16 %r8896, %rs504; 2026-02-21T12:40:23.8312090Z cvt.rn.f32.s16 %r8897, %rs502; 2026-02-21T12:40:23.8312152Z cvt.rn.f32.s16 %r8898, %rs500; 2026-02-21T12:40:23.8312213Z cvt.rn.f32.s16 %r8899, %rs498; 2026-02-21T12:40:23.8312411Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8312474Z cvt.s16.s8 %rs505, %rs453; 2026-02-21T12:40:23.8312533Z shr.s16 %rs506, %rs505, 4; 2026-02-21T12:40:23.8312594Z cvt.s16.s8 %rs507, %rs455; 2026-02-21T12:40:23.8312653Z shr.s16 %rs508, %rs507, 4; 2026-02-21T12:40:23.8312716Z prmt.b32 %r8900, %r8844, 0, 0xbbb3U; 2026-02-21T12:40:23.8312777Z cvt.u16.u32 %rs509, %r8900; 2026-02-21T12:40:23.8312839Z shr.s16 %rs510, %rs509, 4; 2026-02-21T12:40:23.8312902Z prmt.b32 %r8901, %r8854, 0, 0xbbb3U; 2026-02-21T12:40:23.8313037Z cvt.u16.u32 %rs511, %r8901; 2026-02-21T12:40:23.8313101Z shr.s16 %rs512, %rs511, 4; 2026-02-21T12:40:23.8313298Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8313408Z cvt.rn.f32.s16 %r8902, %rs512; 2026-02-21T12:40:23.8313470Z cvt.rn.f32.s16 %r8903, %rs510; 2026-02-21T12:40:23.8313536Z cvt.rn.f32.s16 %r8904, %rs508; 2026-02-21T12:40:23.8313598Z cvt.rn.f32.s16 %r8905, %rs506; 2026-02-21T12:40:23.8313790Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8313853Z cvt.s16.s8 %rs513, %rs454; 2026-02-21T12:40:23.8313912Z shr.s16 %rs514, %rs513, 4; 2026-02-21T12:40:23.8313970Z cvt.s16.s8 %rs515, %rs456; 2026-02-21T12:40:23.8314032Z shr.s16 %rs516, %rs515, 4; 2026-02-21T12:40:23.8314095Z 
prmt.b32 %r8906, %r8849, 0, 0xbbb3U; 2026-02-21T12:40:23.8314156Z cvt.u16.u32 %rs517, %r8906; 2026-02-21T12:40:23.8314218Z shr.s16 %rs518, %rs517, 4; 2026-02-21T12:40:23.8314286Z prmt.b32 %r8907, %r8859, 0, 0xbbb3U; 2026-02-21T12:40:23.8314346Z cvt.u16.u32 %rs519, %r8907; 2026-02-21T12:40:23.8314502Z shr.s16 %rs520, %rs519, 4; 2026-02-21T12:40:23.8314702Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8314765Z cvt.rn.f32.s16 %r8908, %rs520; 2026-02-21T12:40:23.8314827Z cvt.rn.f32.s16 %r8909, %rs518; 2026-02-21T12:40:23.8314889Z cvt.rn.f32.s16 %r8910, %rs516; 2026-02-21T12:40:23.8314954Z cvt.rn.f32.s16 %r8911, %rs514; 2026-02-21T12:40:23.8315010Z bar.sync 0; 2026-02-21T12:40:23.8315126Z st.shared.v4.b32 [%r33], {%r8869, %r8867, %r8868, %r8866}; 2026-02-21T12:40:23.8315250Z st.shared.v4.b32 [%r33+8192], {%r8875, %r8873, %r8874, %r8872}; 2026-02-21T12:40:23.8315355Z st.shared.v4.b32 [%r34], {%r8881, %r8879, %r8880, %r8878}; 2026-02-21T12:40:23.8315467Z st.shared.v4.b32 [%r34+8192], {%r8887, %r8885, %r8886, %r8884}; 2026-02-21T12:40:23.8315574Z st.shared.v4.b32 [%r35], {%r8893, %r8891, %r8892, %r8890}; 2026-02-21T12:40:23.8315684Z st.shared.v4.b32 [%r35+8192], {%r8899, %r8897, %r8898, %r8896}; 2026-02-21T12:40:23.8315796Z st.shared.v4.b32 [%r36], {%r8905, %r8903, %r8904, %r8902}; 2026-02-21T12:40:23.8315906Z st.shared.v4.b32 [%r36+8192], {%r8911, %r8909, %r8910, %r8908}; 2026-02-21T12:40:23.8315965Z $L__tmp9: 2026-02-21T12:40:23.8316237Z .loc 2 291 36 // standard.py:291:36 @[ cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:87:40 ] 2026-02-21T12:40:23.8316301Z // begin inline asm 2026-02-21T12:40:23.8316383Z fence.proxy.async.shared::cta; 2026-02-21T12:40:23.8316439Z // end inline asm 2026-02-21T12:40:23.8316629Z bar.sync 0; 2026-02-21T12:40:23.8316716Z shfl.sync.idx.b32 %r8912, %r2, 0, 31, -1; 2026-02-21T12:40:23.8316791Z wgmma.fence.sync.aligned; 2026-02-21T12:40:23.8316855Z mov.pred %p14, -1; 
2026-02-21T12:40:23.8316913Z // begin inline asm 2026-02-21T12:40:23.8319623Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r21586,%r21587,%r21588,%r21589,%r21590,%r21591,%r21592,%r21593,%r21594,%r21595,%r21596,%r21597,%r21598,%r21599,%r21600,%r21601,%r21602,%r21603,%r21604,%r21605,%r21606,%r21607,%r21608,%r21609,%r21610,%r21611,%r21612,%r21613,%r21614,%r21615,%r21616,%r21617,%r21618,%r21619,%r21620,%r21621,%r21622,%r21623,%r21624,%r21625,%r21626,%r21627,%r21628,%r21629,%r21630,%r21631,%r21632,%r21633,%r21634,%r21635,%r21636,%r21637,%r21638,%r21639,%r21640,%r21641,%r21642,%r21643,%r21644,%r21645,%r21646,%r21647,%r21648,%r21649,%r21650,%r21651,%r21652,%r21653,%r21654,%r21655,%r21656,%r21657,%r21658,%r21659,%r21660,%r21661,%r21662,%r21663,%r21664,%r21665,%r21666,%r21667,%r21668,%r21669,%r21670,%r21671,%r21672,%r21673,%r21674,%r21675,%r21676,%r21677,%r21678,%r21679,%r21680,%r21681,%r21682,%r21683,%r21684,%r21685,%r21686,%r21687,%r21688,%r21689,%r21690,%r21691,%r21692,%r21693,%r21694,%r21695,%r21696,%r21697,%r21698,%r21699,%r21700,%r21701,%r21702,%r21703,%r21704,%r21705,%r21706,%r21707,%r21708,%r21709,%r21710,%r21711,%r21712,%r21713}, {%r6726,%r6727,%r6728,%r6729}, %rd23, %p14, 1, 1; 2026-02-21T12:40:23.8319771Z // end inline asm 2026-02-21T12:40:23.8319889Z // begin inline asm 2026-02-21T12:40:23.8322708Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r21586,%r21587,%r21588,%r21589,%r21590,%r21591,%r21592,%r21593,%r21594,%r21595,%r21596,%r21597,%r21598,%r21599,%r21600,%r21601,%r21602,%r21603,%r21604,%r21605,%r21606,%r21607,%r21608,%r21609,%r21610,%r21611,%r21612,%r21613,%r21614,%r21615,%r21616,%r21617,%r21618,%r21619,%r21620,%r21621,%r21622,%r21623,%r21624,%r21625,%r21626,%r21627,%r21628,%r21629,%r21630,%r21631,%r21632,%r21633,%r21634,%r21635,%r21636,%r21637,%r21638,%r21639,%r21640,%r21641,%r21642,%r21643,%r21644,%r21645,%r21646,%r21647,%r21648,%r21649,%r21650,%r21651,%r21652,%r21653,%r21654,%r21655,%r21656,%r21657,%r21658,%r21659,%r21660,%r21661,%r21662,%r21663,%r21664,%r21665,%r21666,%r21667,%r21668,%r21669,%r21670,%r21671,%r21672,%r21673,%r21674,%r21675,%r21676,%r21677,%r21678,%r21679,%r21680,%r21681,%r21682,%r21683,%r21684,%r21685,%r21686,%r21687,%r21688,%r21689,%r21690,%r21691,%r21692,%r21693,%r21694,%r21695,%r21696,%r21697,%r21698,%r21699,%r21700,%r21701,%r21702,%r21703,%r21704,%r21705,%r21706,%r21707,%r21708,%r21709,%r21710,%r21711,%r21712,%r21713}, {%r6986,%r6987,%r6988,%r6989}, %rd24, %p14, 1, 1; 2026-02-21T12:40:23.8322775Z // end inline asm 2026-02-21T12:40:23.8322853Z wgmma.commit_group.sync.aligned; 2026-02-21T12:40:23.8322913Z mov.b32 %r8700, 0; 2026-02-21T12:40:23.8322974Z mov.b32 %r7118, %r18409; 2026-02-21T12:40:23.8323033Z mov.b32 %r7119, %r8700; 2026-02-21T12:40:23.8323091Z mov.b32 %r7120, %r8700; 2026-02-21T12:40:23.8323153Z // begin inline asm 2026-02-21T12:40:23.8325670Z // wait for regs: 
%r21586,%r21587,%r21588,%r21589,%r21590,%r21591,%r21592,%r21593,%r21594,%r21595,%r21596,%r21597,%r21598,%r21599,%r21600,%r21601,%r21602,%r21603,%r21604,%r21605,%r21606,%r21607,%r21608,%r21609,%r21610,%r21611,%r21612,%r21613,%r21614,%r21615,%r21616,%r21617,%r21618,%r21619,%r21620,%r21621,%r21622,%r21623,%r21624,%r21625,%r21626,%r21627,%r21628,%r21629,%r21630,%r21631,%r21632,%r21633,%r21634,%r21635,%r21636,%r21637,%r21638,%r21639,%r21640,%r21641,%r21642,%r21643,%r21644,%r21645,%r21646,%r21647,%r21648,%r21649,%r21650,%r21651,%r21652,%r21653,%r21654,%r21655,%r21656,%r21657,%r21658,%r21659,%r21660,%r21661,%r21662,%r21663,%r21664,%r21665,%r21666,%r21667,%r21668,%r21669,%r21670,%r21671,%r21672,%r21673,%r21674,%r21675,%r21676,%r21677,%r21678,%r21679,%r21680,%r21681,%r21682,%r21683,%r21684,%r21685,%r21686,%r21687,%r21688,%r21689,%r21690,%r21691,%r21692,%r21693,%r21694,%r21695,%r21696,%r21697,%r21698,%r21699,%r21700,%r21701,%r21702,%r21703,%r21704,%r21705,%r21706,%r21707,%r21708,%r21709,%r21710,%r21711,%r21712,%r21713,%r7118,%r7119,%r7120 2026-02-21T12:40:23.8325762Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:40:23.8325831Z // end inline asm 2026-02-21T12:40:23.8325888Z $L__tmp10: 2026-02-21T12:40:23.8326103Z .loc 1 51 80 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:51:80 2026-02-21T12:40:23.8326173Z add.s64 %rd280, %rd622, -32; 2026-02-21T12:40:23.8326232Z // begin inline asm 2026-02-21T12:40:23.8326297Z mov.u64 %rd279, 0x0; 2026-02-21T12:40:23.8326425Z createpolicy.fractional.L2::evict_last.b64 %rd279, 1.0; 2026-02-21T12:40:23.8326592Z // end inline asm 2026-02-21T12:40:23.8326654Z // begin inline asm 2026-02-21T12:40:23.8326717Z mov.u32 %r7252, 0x0; 2026-02-21T12:40:23.8326774Z mov.u32 %r7253, 0x0; 2026-02-21T12:40:23.8326830Z mov.u32 %r7254, 0x0; 2026-02-21T12:40:23.8326893Z mov.u32 %r7255, 0x0; 2026-02-21T12:40:23.8327120Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r7252, %r7253, %r7254, %r7255 }, [ %rd280 + 0 ], %rd279; 
2026-02-21T12:40:23.8327178Z // end inline asm 2026-02-21T12:40:23.8327389Z .loc 1 55 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:55:32 2026-02-21T12:40:23.8327448Z bar.sync 0; 2026-02-21T12:40:23.8327527Z st.shared.v2.b32 [%r9], {%r7252, %r7253}; 2026-02-21T12:40:23.8327686Z st.shared.v2.b32 [%r10], {%r7254, %r7255}; 2026-02-21T12:40:23.8327743Z bar.sync 0; 2026-02-21T12:40:23.8327809Z ld.shared.b16 %rs521, [%r53]; 2026-02-21T12:40:23.8327941Z ld.shared.b16 %rs522, [%r53+256]; 2026-02-21T12:40:23.8328014Z ld.shared.b16 %rs523, [%r53+16]; 2026-02-21T12:40:23.8328079Z ld.shared.b16 %rs524, [%r53+272]; 2026-02-21T12:40:23.8328143Z ld.shared.b16 %rs525, [%r54]; 2026-02-21T12:40:23.8328207Z ld.shared.b16 %rs526, [%r54+256]; 2026-02-21T12:40:23.8328276Z ld.shared.b16 %rs527, [%r54+16]; 2026-02-21T12:40:23.8328338Z ld.shared.b16 %rs528, [%r54+272]; 2026-02-21T12:40:23.8328403Z cvt.f32.bf16 %r7516, %rs521; 2026-02-21T12:40:23.8328466Z cvt.f32.bf16 %r7517, %rs522; 2026-02-21T12:40:23.8328528Z cvt.f32.bf16 %r7518, %rs525; 2026-02-21T12:40:23.8328588Z cvt.f32.bf16 %r7519, %rs526; 2026-02-21T12:40:23.8328651Z cvt.f32.bf16 %r7776, %rs523; 2026-02-21T12:40:23.8328714Z cvt.f32.bf16 %r7777, %rs524; 2026-02-21T12:40:23.8328774Z cvt.f32.bf16 %r7778, %rs527; 2026-02-21T12:40:23.8328836Z cvt.f32.bf16 %r7779, %rs528; 2026-02-21T12:40:23.8329152Z .loc 1 57 87 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:57:87 2026-02-21T12:40:23.8329223Z add.s64 %rd282, %rd621, 10240; 2026-02-21T12:40:23.8329283Z // begin inline asm 2026-02-21T12:40:23.8329343Z mov.u32 %r7256, 0x0; 2026-02-21T12:40:23.8329399Z mov.u32 %r7257, 0x0; 2026-02-21T12:40:23.8329455Z mov.u32 %r7258, 0x0; 2026-02-21T12:40:23.8329511Z mov.u32 %r7259, 0x0; 2026-02-21T12:40:23.8329643Z ld.global.v4.b32 { %r7256, %r7257, %r7258, %r7259 }, [ %rd282 + 0 ]; 2026-02-21T12:40:23.8329700Z // end inline asm 2026-02-21T12:40:23.8329892Z .loc 1 65 28 // 
cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:65:28 2026-02-21T12:40:23.8329951Z bar.sync 0; 2026-02-21T12:40:23.8330015Z st.shared.b8 [%r13], %r7256; 2026-02-21T12:40:23.8330094Z prmt.b32 %r8913, %r7256, 0, 0x7771U; 2026-02-21T12:40:23.8330161Z st.shared.b8 [%r14], %r8913; 2026-02-21T12:40:23.8330231Z prmt.b32 %r8914, %r7256, 0, 0x7772U; 2026-02-21T12:40:23.8330296Z st.shared.b8 [%r15+256], %r8914; 2026-02-21T12:40:23.8330364Z prmt.b32 %r8915, %r7256, 0, 0x7773U; 2026-02-21T12:40:23.8330433Z st.shared.b8 [%r16+256], %r8915; 2026-02-21T12:40:23.8330497Z st.shared.b8 [%r17+512], %r7257; 2026-02-21T12:40:23.8330560Z prmt.b32 %r8916, %r7257, 0, 0x7771U; 2026-02-21T12:40:23.8330627Z st.shared.b8 [%r18+512], %r8916; 2026-02-21T12:40:23.8330690Z prmt.b32 %r8917, %r7257, 0, 0x7772U; 2026-02-21T12:40:23.8330753Z st.shared.b8 [%r19+768], %r8917; 2026-02-21T12:40:23.8330815Z prmt.b32 %r8918, %r7257, 0, 0x7773U; 2026-02-21T12:40:23.8330882Z st.shared.b8 [%r20+768], %r8918; 2026-02-21T12:40:23.8330945Z st.shared.b8 [%r21+1024], %r7258; 2026-02-21T12:40:23.8331011Z prmt.b32 %r8919, %r7258, 0, 0x7771U; 2026-02-21T12:40:23.8331076Z st.shared.b8 [%r22+1024], %r8919; 2026-02-21T12:40:23.8331139Z prmt.b32 %r8920, %r7258, 0, 0x7772U; 2026-02-21T12:40:23.8331204Z st.shared.b8 [%r23+1280], %r8920; 2026-02-21T12:40:23.8331270Z prmt.b32 %r8921, %r7258, 0, 0x7773U; 2026-02-21T12:40:23.8331334Z st.shared.b8 [%r24+1280], %r8921; 2026-02-21T12:40:23.8331399Z st.shared.b8 [%r25+1536], %r7259; 2026-02-21T12:40:23.8331463Z prmt.b32 %r8922, %r7259, 0, 0x7771U; 2026-02-21T12:40:23.8331528Z st.shared.b8 [%r26+1536], %r8922; 2026-02-21T12:40:23.8331592Z prmt.b32 %r8923, %r7259, 0, 0x7772U; 2026-02-21T12:40:23.8331655Z st.shared.b8 [%r27+1792], %r8923; 2026-02-21T12:40:23.8331720Z prmt.b32 %r8924, %r7259, 0, 0x7773U; 2026-02-21T12:40:23.8331782Z st.shared.b8 [%r28+1792], %r8924; 2026-02-21T12:40:23.8331837Z bar.sync 0; 2026-02-21T12:40:23.8331901Z ld.shared.b32 %r8925, [%r29]; 
2026-02-21T12:40:23.8331968Z prmt.b32 %r8926, %r8925, 0, 0x7770U; 2026-02-21T12:40:23.8332033Z cvt.u16.u32 %rs529, %r8926; 2026-02-21T12:40:23.8332095Z prmt.b32 %r8927, %r8925, 0, 0x7771U; 2026-02-21T12:40:23.8332159Z cvt.u16.u32 %rs530, %r8927; 2026-02-21T12:40:23.8332286Z prmt.b32 %r8928, %r8925, 0, 0x7772U; 2026-02-21T12:40:23.8332345Z cvt.u16.u32 %rs531, %r8928; 2026-02-21T12:40:23.8332407Z prmt.b32 %r8929, %r8925, 0, 0x7773U; 2026-02-21T12:40:23.8332523Z cvt.u16.u32 %rs532, %r8929; 2026-02-21T12:40:23.8332587Z ld.shared.b32 %r8930, [%r30]; 2026-02-21T12:40:23.8332650Z prmt.b32 %r8931, %r8930, 0, 0x7770U; 2026-02-21T12:40:23.8332713Z cvt.u16.u32 %rs533, %r8931; 2026-02-21T12:40:23.8332775Z prmt.b32 %r8932, %r8930, 0, 0x7771U; 2026-02-21T12:40:23.8332834Z cvt.u16.u32 %rs534, %r8932; 2026-02-21T12:40:23.8332897Z prmt.b32 %r8933, %r8930, 0, 0x7772U; 2026-02-21T12:40:23.8332961Z cvt.u16.u32 %rs535, %r8933; 2026-02-21T12:40:23.8333024Z prmt.b32 %r8934, %r8930, 0, 0x7773U; 2026-02-21T12:40:23.8333084Z cvt.u16.u32 %rs536, %r8934; 2026-02-21T12:40:23.8333151Z ld.shared.b32 %r8935, [%r31]; 2026-02-21T12:40:23.8333227Z prmt.b32 %r8936, %r8935, 0, 0x7770U; 2026-02-21T12:40:23.8333288Z cvt.u16.u32 %rs537, %r8936; 2026-02-21T12:40:23.8333355Z prmt.b32 %r8937, %r8935, 0, 0x7771U; 2026-02-21T12:40:23.8333416Z cvt.u16.u32 %rs538, %r8937; 2026-02-21T12:40:23.8333480Z prmt.b32 %r8938, %r8935, 0, 0x7772U; 2026-02-21T12:40:23.8333650Z cvt.u16.u32 %rs539, %r8938; 2026-02-21T12:40:23.8333723Z prmt.b32 %r8939, %r8935, 0, 0x7773U; 2026-02-21T12:40:23.8333782Z cvt.u16.u32 %rs540, %r8939; 2026-02-21T12:40:23.8333846Z ld.shared.b32 %r8940, [%r32]; 2026-02-21T12:40:23.8333910Z prmt.b32 %r8941, %r8940, 0, 0x7770U; 2026-02-21T12:40:23.8333970Z cvt.u16.u32 %rs541, %r8941; 2026-02-21T12:40:23.8334032Z prmt.b32 %r8942, %r8940, 0, 0x7771U; 2026-02-21T12:40:23.8334091Z cvt.u16.u32 %rs542, %r8942; 2026-02-21T12:40:23.8334156Z prmt.b32 %r8943, %r8940, 0, 0x7772U; 2026-02-21T12:40:23.8334215Z 
cvt.u16.u32 %rs543, %r8943; 2026-02-21T12:40:23.8334278Z prmt.b32 %r8944, %r8940, 0, 0x7773U; 2026-02-21T12:40:23.8334344Z cvt.u16.u32 %rs544, %r8944; 2026-02-21T12:40:23.8334550Z .loc 1 60 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:60:28 2026-02-21T12:40:23.8334617Z shl.b16 %rs545, %rs529, 4; 2026-02-21T12:40:23.8334682Z shl.b16 %rs546, %rs533, 4; 2026-02-21T12:40:23.8334744Z shl.b16 %rs547, %rs537, 4; 2026-02-21T12:40:23.8334807Z shl.b16 %rs548, %rs541, 4; 2026-02-21T12:40:23.8334867Z shl.b16 %rs549, %rs530, 4; 2026-02-21T12:40:23.8334929Z shl.b16 %rs550, %rs534, 4; 2026-02-21T12:40:23.8334989Z shl.b16 %rs551, %rs538, 4; 2026-02-21T12:40:23.8335048Z shl.b16 %rs552, %rs542, 4; 2026-02-21T12:40:23.8335111Z shl.b16 %rs553, %rs531, 4; 2026-02-21T12:40:23.8335170Z shl.b16 %rs554, %rs535, 4; 2026-02-21T12:40:23.8335232Z shl.b16 %rs555, %rs539, 4; 2026-02-21T12:40:23.8335292Z shl.b16 %rs556, %rs543, 4; 2026-02-21T12:40:23.8335354Z shl.b16 %rs557, %rs532, 4; 2026-02-21T12:40:23.8335414Z shl.b16 %rs558, %rs536, 4; 2026-02-21T12:40:23.8335472Z shl.b16 %rs559, %rs540, 4; 2026-02-21T12:40:23.8335534Z shl.b16 %rs560, %rs544, 4; 2026-02-21T12:40:23.8335729Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8335793Z cvt.s16.s8 %rs561, %rs545; 2026-02-21T12:40:23.8335855Z shr.s16 %rs562, %rs561, 4; 2026-02-21T12:40:23.8335921Z cvt.s16.s8 %rs563, %rs547; 2026-02-21T12:40:23.8335981Z shr.s16 %rs564, %rs563, 4; 2026-02-21T12:40:23.8336049Z prmt.b32 %r8945, %r8925, 0, 0x8880U; 2026-02-21T12:40:23.8336115Z cvt.u16.u32 %rs565, %r8945; 2026-02-21T12:40:23.8336177Z shr.s16 %rs566, %rs565, 4; 2026-02-21T12:40:23.8336242Z prmt.b32 %r8946, %r8935, 0, 0x8880U; 2026-02-21T12:40:23.8336315Z cvt.u16.u32 %rs567, %r8946; 2026-02-21T12:40:23.8336380Z shr.s16 %rs568, %rs567, 4; 2026-02-21T12:40:23.8336710Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8336779Z cvt.rn.f32.s16 
%r8947, %rs568; 2026-02-21T12:40:23.8336845Z cvt.rn.f32.s16 %r8948, %rs566; 2026-02-21T12:40:23.8336906Z cvt.rn.f32.s16 %r8949, %rs564; 2026-02-21T12:40:23.8336968Z cvt.rn.f32.s16 %r8950, %rs562; 2026-02-21T12:40:23.8337262Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8337388Z cvt.s16.s8 %rs569, %rs546; 2026-02-21T12:40:23.8337452Z shr.s16 %rs570, %rs569, 4; 2026-02-21T12:40:23.8337513Z cvt.s16.s8 %rs571, %rs548; 2026-02-21T12:40:23.8337577Z shr.s16 %rs572, %rs571, 4; 2026-02-21T12:40:23.8337642Z prmt.b32 %r8951, %r8930, 0, 0x8880U; 2026-02-21T12:40:23.8337703Z cvt.u16.u32 %rs573, %r8951; 2026-02-21T12:40:23.8337767Z shr.s16 %rs574, %rs573, 4; 2026-02-21T12:40:23.8337831Z prmt.b32 %r8952, %r8940, 0, 0x8880U; 2026-02-21T12:40:23.8337892Z cvt.u16.u32 %rs575, %r8952; 2026-02-21T12:40:23.8337956Z shr.s16 %rs576, %rs575, 4; 2026-02-21T12:40:23.8338152Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8338214Z cvt.rn.f32.s16 %r8953, %rs576; 2026-02-21T12:40:23.8338276Z cvt.rn.f32.s16 %r8954, %rs574; 2026-02-21T12:40:23.8338344Z cvt.rn.f32.s16 %r8955, %rs572; 2026-02-21T12:40:23.8338409Z cvt.rn.f32.s16 %r8956, %rs570; 2026-02-21T12:40:23.8338721Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8338792Z cvt.s16.s8 %rs577, %rs549; 2026-02-21T12:40:23.8338852Z shr.s16 %rs578, %rs577, 4; 2026-02-21T12:40:23.8338910Z cvt.s16.s8 %rs579, %rs551; 2026-02-21T12:40:23.8338970Z shr.s16 %rs580, %rs579, 4; 2026-02-21T12:40:23.8339037Z prmt.b32 %r8957, %r8925, 0, 0x9991U; 2026-02-21T12:40:23.8339096Z cvt.u16.u32 %rs581, %r8957; 2026-02-21T12:40:23.8339167Z shr.s16 %rs582, %rs581, 4; 2026-02-21T12:40:23.8339236Z prmt.b32 %r8958, %r8935, 0, 0x9991U; 2026-02-21T12:40:23.8339296Z cvt.u16.u32 %rs583, %r8958; 2026-02-21T12:40:23.8339356Z shr.s16 %rs584, %rs583, 4; 2026-02-21T12:40:23.8339552Z .loc 1 80 32 // 
cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8339616Z cvt.rn.f32.s16 %r8959, %rs584; 2026-02-21T12:40:23.8339680Z cvt.rn.f32.s16 %r8960, %rs582; 2026-02-21T12:40:23.8339741Z cvt.rn.f32.s16 %r8961, %rs580; 2026-02-21T12:40:23.8339807Z cvt.rn.f32.s16 %r8962, %rs578; 2026-02-21T12:40:23.8340003Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8340064Z cvt.s16.s8 %rs585, %rs550; 2026-02-21T12:40:23.8340127Z shr.s16 %rs586, %rs585, 4; 2026-02-21T12:40:23.8340188Z cvt.s16.s8 %rs587, %rs552; 2026-02-21T12:40:23.8340247Z shr.s16 %rs588, %rs587, 4; 2026-02-21T12:40:23.8340313Z prmt.b32 %r8963, %r8930, 0, 0x9991U; 2026-02-21T12:40:23.8340376Z cvt.u16.u32 %rs589, %r8963; 2026-02-21T12:40:23.8340434Z shr.s16 %rs590, %rs589, 4; 2026-02-21T12:40:23.8340497Z prmt.b32 %r8964, %r8940, 0, 0x9991U; 2026-02-21T12:40:23.8340560Z cvt.u16.u32 %rs591, %r8964; 2026-02-21T12:40:23.8340618Z shr.s16 %rs592, %rs591, 4; 2026-02-21T12:40:23.8340810Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8340876Z cvt.rn.f32.s16 %r8965, %rs592; 2026-02-21T12:40:23.8340937Z cvt.rn.f32.s16 %r8966, %rs590; 2026-02-21T12:40:23.8341002Z cvt.rn.f32.s16 %r8967, %rs588; 2026-02-21T12:40:23.8341063Z cvt.rn.f32.s16 %r8968, %rs586; 2026-02-21T12:40:23.8341257Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8341318Z cvt.s16.s8 %rs593, %rs553; 2026-02-21T12:40:23.8341378Z shr.s16 %rs594, %rs593, 4; 2026-02-21T12:40:23.8341442Z cvt.s16.s8 %rs595, %rs555; 2026-02-21T12:40:23.8341501Z shr.s16 %rs596, %rs595, 4; 2026-02-21T12:40:23.8341565Z prmt.b32 %r8969, %r8925, 0, 0xaaa2U; 2026-02-21T12:40:23.8341626Z cvt.u16.u32 %rs597, %r8969; 2026-02-21T12:40:23.8341689Z shr.s16 %rs598, %rs597, 4; 2026-02-21T12:40:23.8341753Z prmt.b32 %r8970, %r8935, 0, 0xaaa2U; 2026-02-21T12:40:23.8341812Z cvt.u16.u32 %rs599, %r8970; 2026-02-21T12:40:23.8341874Z 
shr.s16 %rs600, %rs599, 4; 2026-02-21T12:40:23.8342139Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8342205Z cvt.rn.f32.s16 %r8971, %rs600; 2026-02-21T12:40:23.8342319Z cvt.rn.f32.s16 %r8972, %rs598; 2026-02-21T12:40:23.8342383Z cvt.rn.f32.s16 %r8973, %rs596; 2026-02-21T12:40:23.8342444Z cvt.rn.f32.s16 %r8974, %rs594; 2026-02-21T12:40:23.8342636Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8342701Z cvt.s16.s8 %rs601, %rs554; 2026-02-21T12:40:23.8342761Z shr.s16 %rs602, %rs601, 4; 2026-02-21T12:40:23.8342820Z cvt.s16.s8 %rs603, %rs556; 2026-02-21T12:40:23.8342883Z shr.s16 %rs604, %rs603, 4; 2026-02-21T12:40:23.8342946Z prmt.b32 %r8975, %r8930, 0, 0xaaa2U; 2026-02-21T12:40:23.8343007Z cvt.u16.u32 %rs605, %r8975; 2026-02-21T12:40:23.8343069Z shr.s16 %rs606, %rs605, 4; 2026-02-21T12:40:23.8343131Z prmt.b32 %r8976, %r8940, 0, 0xaaa2U; 2026-02-21T12:40:23.8343191Z cvt.u16.u32 %rs607, %r8976; 2026-02-21T12:40:23.8343253Z shr.s16 %rs608, %rs607, 4; 2026-02-21T12:40:23.8343537Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8343604Z cvt.rn.f32.s16 %r8977, %rs608; 2026-02-21T12:40:23.8343667Z cvt.rn.f32.s16 %r8978, %rs606; 2026-02-21T12:40:23.8343730Z cvt.rn.f32.s16 %r8979, %rs604; 2026-02-21T12:40:23.8343791Z cvt.rn.f32.s16 %r8980, %rs602; 2026-02-21T12:40:23.8343982Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8344044Z cvt.s16.s8 %rs609, %rs557; 2026-02-21T12:40:23.8344109Z shr.s16 %rs610, %rs609, 4; 2026-02-21T12:40:23.8344169Z cvt.s16.s8 %rs611, %rs559; 2026-02-21T12:40:23.8344229Z shr.s16 %rs612, %rs611, 4; 2026-02-21T12:40:23.8344295Z prmt.b32 %r8981, %r8925, 0, 0xbbb3U; 2026-02-21T12:40:23.8344355Z cvt.u16.u32 %rs613, %r8981; 2026-02-21T12:40:23.8344415Z shr.s16 %rs614, %rs613, 4; 2026-02-21T12:40:23.8344486Z prmt.b32 %r8982, %r8935, 0, 0xbbb3U; 
2026-02-21T12:40:23.8344559Z cvt.u16.u32 %rs615, %r8982; 2026-02-21T12:40:23.8344622Z shr.s16 %rs616, %rs615, 4; 2026-02-21T12:40:23.8344821Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8344893Z cvt.rn.f32.s16 %r8983, %rs616; 2026-02-21T12:40:23.8344957Z cvt.rn.f32.s16 %r8984, %rs614; 2026-02-21T12:40:23.8345020Z cvt.rn.f32.s16 %r8985, %rs612; 2026-02-21T12:40:23.8345087Z cvt.rn.f32.s16 %r8986, %rs610; 2026-02-21T12:40:23.8345280Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8345342Z cvt.s16.s8 %rs617, %rs558; 2026-02-21T12:40:23.8345404Z shr.s16 %rs618, %rs617, 4; 2026-02-21T12:40:23.8345470Z cvt.s16.s8 %rs619, %rs560; 2026-02-21T12:40:23.8345530Z shr.s16 %rs620, %rs619, 4; 2026-02-21T12:40:23.8345598Z prmt.b32 %r8987, %r8930, 0, 0xbbb3U; 2026-02-21T12:40:23.8345664Z cvt.u16.u32 %rs621, %r8987; 2026-02-21T12:40:23.8345728Z shr.s16 %rs622, %rs621, 4; 2026-02-21T12:40:23.8345793Z prmt.b32 %r8988, %r8940, 0, 0xbbb3U; 2026-02-21T12:40:23.8345863Z cvt.u16.u32 %rs623, %r8988; 2026-02-21T12:40:23.8345926Z shr.s16 %rs624, %rs623, 4; 2026-02-21T12:40:23.8346132Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8346198Z cvt.rn.f32.s16 %r8989, %rs624; 2026-02-21T12:40:23.8346266Z cvt.rn.f32.s16 %r8990, %rs622; 2026-02-21T12:40:23.8346330Z cvt.rn.f32.s16 %r8991, %rs620; 2026-02-21T12:40:23.8346391Z cvt.rn.f32.s16 %r8992, %rs618; 2026-02-21T12:40:23.8346564Z bar.sync 0; 2026-02-21T12:40:23.8346681Z st.shared.v4.b32 [%r33], {%r8950, %r8948, %r8949, %r8947}; 2026-02-21T12:40:23.8346815Z st.shared.v4.b32 [%r33+8192], {%r8956, %r8954, %r8955, %r8953}; 2026-02-21T12:40:23.8346921Z st.shared.v4.b32 [%r34], {%r8962, %r8960, %r8961, %r8959}; 2026-02-21T12:40:23.8347039Z st.shared.v4.b32 [%r34+8192], {%r8968, %r8966, %r8967, %r8965}; 2026-02-21T12:40:23.8347239Z st.shared.v4.b32 [%r35], {%r8974, %r8972, %r8973, %r8971}; 
2026-02-21T12:40:23.8347354Z st.shared.v4.b32 [%r35+8192], {%r8980, %r8978, %r8979, %r8977}; 2026-02-21T12:40:23.8347524Z st.shared.v4.b32 [%r36], {%r8986, %r8984, %r8985, %r8983}; 2026-02-21T12:40:23.8347636Z st.shared.v4.b32 [%r36+8192], {%r8992, %r8990, %r8991, %r8989}; 2026-02-21T12:40:23.8347703Z $L__tmp11: 2026-02-21T12:40:23.8347984Z .loc 2 291 36 // standard.py:291:36 @[ cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:87:40 ] 2026-02-21T12:40:23.8348048Z // begin inline asm 2026-02-21T12:40:23.8348128Z fence.proxy.async.shared::cta; 2026-02-21T12:40:23.8348188Z // end inline asm 2026-02-21T12:40:23.8348243Z bar.sync 0; 2026-02-21T12:40:23.8348315Z wgmma.fence.sync.aligned; 2026-02-21T12:40:23.8348449Z // begin inline asm 2026-02-21T12:40:23.8351312Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r21586,%r21587,%r21588,%r21589,%r21590,%r21591,%r21592,%r21593,%r21594,%r21595,%r21596,%r21597,%r21598,%r21599,%r21600,%r21601,%r21602,%r21603,%r21604,%r21605,%r21606,%r21607,%r21608,%r21609,%r21610,%r21611,%r21612,%r21613,%r21614,%r21615,%r21616,%r21617,%r21618,%r21619,%r21620,%r21621,%r21622,%r21623,%r21624,%r21625,%r21626,%r21627,%r21628,%r21629,%r21630,%r21631,%r21632,%r21633,%r21634,%r21635,%r21636,%r21637,%r21638,%r21639,%r21640,%r21641,%r21642,%r21643,%r21644,%r21645,%r21646,%r21647,%r21648,%r21649,%r21650,%r21651,%r21652,%r21653,%r21654,%r21655,%r21656,%r21657,%r21658,%r21659,%r21660,%r21661,%r21662,%r21663,%r21664,%r21665,%r21666,%r21667,%r21668,%r21669,%r21670,%r21671,%r21672,%r21673,%r21674,%r21675,%r21676,%r21677,%r21678,%r21679,%r21680,%r21681,%r21682,%r21683,%r21684,%r21685,%r21686,%r21687,%r21688,%r21689,%r21690,%r21691,%r21692,%r21693,%r21694,%r21695,%r21696,%r21697,%r21698,%r21699,%r21700,%r21701,%r21702,%r21703,%r21704,%r21705,%r21706,%r21707,%r21708,%r21709,%r21710,%r21711,%r21712,%r21713}, {%r7516,%r7517,%r7518,%r7519}, %rd23, %p14, 1, 1; 2026-02-21T12:40:23.8351380Z // end inline asm 2026-02-21T12:40:23.8351443Z // begin inline 
asm 2026-02-21T12:40:23.8354154Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r21586,%r21587,%r21588,%r21589,%r21590,%r21591,%r21592,%r21593,%r21594,%r21595,%r21596,%r21597,%r21598,%r21599,%r21600,%r21601,%r21602,%r21603,%r21604,%r21605,%r21606,%r21607,%r21608,%r21609,%r21610,%r21611,%r21612,%r21613,%r21614,%r21615,%r21616,%r21617,%r21618,%r21619,%r21620,%r21621,%r21622,%r21623,%r21624,%r21625,%r21626,%r21627,%r21628,%r21629,%r21630,%r21631,%r21632,%r21633,%r21634,%r21635,%r21636,%r21637,%r21638,%r21639,%r21640,%r21641,%r21642,%r21643,%r21644,%r21645,%r21646,%r21647,%r21648,%r21649,%r21650,%r21651,%r21652,%r21653,%r21654,%r21655,%r21656,%r21657,%r21658,%r21659,%r21660,%r21661,%r21662,%r21663,%r21664,%r21665,%r21666,%r21667,%r21668,%r21669,%r21670,%r21671,%r21672,%r21673,%r21674,%r21675,%r21676,%r21677,%r21678,%r21679,%r21680,%r21681,%r21682,%r21683,%r21684,%r21685,%r21686,%r21687,%r21688,%r21689,%r21690,%r21691,%r21692,%r21693,%r21694,%r21695,%r21696,%r21697,%r21698,%r21699,%r21700,%r21701,%r21702,%r21703,%r21704,%r21705,%r21706,%r21707,%r21708,%r21709,%r21710,%r21711,%r21712,%r21713}, {%r7776,%r7777,%r7778,%r7779}, %rd24, %p14, 1, 1; 2026-02-21T12:40:23.8354219Z // end inline asm 2026-02-21T12:40:23.8354297Z wgmma.commit_group.sync.aligned; 2026-02-21T12:40:23.8354360Z mov.b32 %r7908, %r18409; 2026-02-21T12:40:23.8354423Z mov.b32 %r7909, %r8700; 2026-02-21T12:40:23.8354481Z mov.b32 %r7910, %r8700; 2026-02-21T12:40:23.8354539Z // begin inline asm 2026-02-21T12:40:23.8357178Z // wait for regs: 
%r21586,%r21587,%r21588,%r21589,%r21590,%r21591,%r21592,%r21593,%r21594,%r21595,%r21596,%r21597,%r21598,%r21599,%r21600,%r21601,%r21602,%r21603,%r21604,%r21605,%r21606,%r21607,%r21608,%r21609,%r21610,%r21611,%r21612,%r21613,%r21614,%r21615,%r21616,%r21617,%r21618,%r21619,%r21620,%r21621,%r21622,%r21623,%r21624,%r21625,%r21626,%r21627,%r21628,%r21629,%r21630,%r21631,%r21632,%r21633,%r21634,%r21635,%r21636,%r21637,%r21638,%r21639,%r21640,%r21641,%r21642,%r21643,%r21644,%r21645,%r21646,%r21647,%r21648,%r21649,%r21650,%r21651,%r21652,%r21653,%r21654,%r21655,%r21656,%r21657,%r21658,%r21659,%r21660,%r21661,%r21662,%r21663,%r21664,%r21665,%r21666,%r21667,%r21668,%r21669,%r21670,%r21671,%r21672,%r21673,%r21674,%r21675,%r21676,%r21677,%r21678,%r21679,%r21680,%r21681,%r21682,%r21683,%r21684,%r21685,%r21686,%r21687,%r21688,%r21689,%r21690,%r21691,%r21692,%r21693,%r21694,%r21695,%r21696,%r21697,%r21698,%r21699,%r21700,%r21701,%r21702,%r21703,%r21704,%r21705,%r21706,%r21707,%r21708,%r21709,%r21710,%r21711,%r21712,%r21713,%r7908,%r7909,%r7910 2026-02-21T12:40:23.8357417Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:40:23.8357478Z // end inline asm 2026-02-21T12:40:23.8357536Z $L__tmp12: 2026-02-21T12:40:23.8357741Z .loc 1 51 80 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:51:80 2026-02-21T12:40:23.8357803Z // begin inline asm 2026-02-21T12:40:23.8357863Z mov.u64 %rd285, 0x0; 2026-02-21T12:40:23.8357990Z createpolicy.fractional.L2::evict_last.b64 %rd285, 1.0; 2026-02-21T12:40:23.8358046Z // end inline asm 2026-02-21T12:40:23.8358175Z // begin inline asm 2026-02-21T12:40:23.8358296Z mov.u32 %r8042, 0x0; 2026-02-21T12:40:23.8358358Z mov.u32 %r8043, 0x0; 2026-02-21T12:40:23.8358422Z mov.u32 %r8044, 0x0; 2026-02-21T12:40:23.8358479Z mov.u32 %r8045, 0x0; 2026-02-21T12:40:23.8358705Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r8042, %r8043, %r8044, %r8045 }, [ %rd622 + 0 ], %rd285; 2026-02-21T12:40:23.8358778Z // end inline asm 2026-02-21T12:40:23.8358979Z .loc 1 
55 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:55:32 2026-02-21T12:40:23.8359035Z bar.sync 0; 2026-02-21T12:40:23.8359118Z st.shared.v2.b32 [%r9], {%r8042, %r8043}; 2026-02-21T12:40:23.8359201Z st.shared.v2.b32 [%r10], {%r8044, %r8045}; 2026-02-21T12:40:23.8359256Z bar.sync 0; 2026-02-21T12:40:23.8359322Z ld.shared.b16 %rs625, [%r53]; 2026-02-21T12:40:23.8359399Z ld.shared.b16 %rs626, [%r53+256]; 2026-02-21T12:40:23.8359468Z ld.shared.b16 %rs627, [%r53+16]; 2026-02-21T12:40:23.8359534Z ld.shared.b16 %rs628, [%r53+272]; 2026-02-21T12:40:23.8359611Z ld.shared.b16 %rs629, [%r54]; 2026-02-21T12:40:23.8359675Z ld.shared.b16 %rs630, [%r54+256]; 2026-02-21T12:40:23.8359739Z ld.shared.b16 %rs631, [%r54+16]; 2026-02-21T12:40:23.8359804Z ld.shared.b16 %rs632, [%r54+272]; 2026-02-21T12:40:23.8359873Z cvt.f32.bf16 %r8306, %rs625; 2026-02-21T12:40:23.8359935Z cvt.f32.bf16 %r8307, %rs626; 2026-02-21T12:40:23.8359995Z cvt.f32.bf16 %r8308, %rs629; 2026-02-21T12:40:23.8360058Z cvt.f32.bf16 %r8309, %rs630; 2026-02-21T12:40:23.8360119Z cvt.f32.bf16 %r8566, %rs627; 2026-02-21T12:40:23.8360178Z cvt.f32.bf16 %r8567, %rs628; 2026-02-21T12:40:23.8360239Z cvt.f32.bf16 %r8568, %rs631; 2026-02-21T12:40:23.8360305Z cvt.f32.bf16 %r8569, %rs632; 2026-02-21T12:40:23.8360501Z .loc 1 57 87 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:57:87 2026-02-21T12:40:23.8360568Z add.s64 %rd288, %rd621, 20480; 2026-02-21T12:40:23.8360629Z // begin inline asm 2026-02-21T12:40:23.8360688Z mov.u32 %r8046, 0x0; 2026-02-21T12:40:23.8360749Z mov.u32 %r8047, 0x0; 2026-02-21T12:40:23.8360807Z mov.u32 %r8048, 0x0; 2026-02-21T12:40:23.8360869Z mov.u32 %r8049, 0x0; 2026-02-21T12:40:23.8360994Z ld.global.v4.b32 { %r8046, %r8047, %r8048, %r8049 }, [ %rd288 + 0 ]; 2026-02-21T12:40:23.8361051Z // end inline asm 2026-02-21T12:40:23.8361257Z .loc 1 65 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:65:28 2026-02-21T12:40:23.8361314Z bar.sync 0; 2026-02-21T12:40:23.8361379Z 
st.shared.b8 [%r13], %r8046; 2026-02-21T12:40:23.8361450Z prmt.b32 %r8993, %r8046, 0, 0x7771U; 2026-02-21T12:40:23.8361514Z st.shared.b8 [%r14], %r8993; 2026-02-21T12:40:23.8361580Z prmt.b32 %r8994, %r8046, 0, 0x7772U; 2026-02-21T12:40:23.8361645Z st.shared.b8 [%r15+256], %r8994; 2026-02-21T12:40:23.8361711Z prmt.b32 %r8995, %r8046, 0, 0x7773U; 2026-02-21T12:40:23.8361845Z st.shared.b8 [%r16+256], %r8995; 2026-02-21T12:40:23.8361919Z st.shared.b8 [%r17+512], %r8047; 2026-02-21T12:40:23.8362037Z prmt.b32 %r8996, %r8047, 0, 0x7771U; 2026-02-21T12:40:23.8362103Z st.shared.b8 [%r18+512], %r8996; 2026-02-21T12:40:23.8362168Z prmt.b32 %r8997, %r8047, 0, 0x7772U; 2026-02-21T12:40:23.8362233Z st.shared.b8 [%r19+768], %r8997; 2026-02-21T12:40:23.8362302Z prmt.b32 %r8998, %r8047, 0, 0x7773U; 2026-02-21T12:40:23.8362366Z st.shared.b8 [%r20+768], %r8998; 2026-02-21T12:40:23.8362431Z st.shared.b8 [%r21+1024], %r8048; 2026-02-21T12:40:23.8362499Z prmt.b32 %r8999, %r8048, 0, 0x7771U; 2026-02-21T12:40:23.8362562Z st.shared.b8 [%r22+1024], %r8999; 2026-02-21T12:40:23.8362625Z prmt.b32 %r9000, %r8048, 0, 0x7772U; 2026-02-21T12:40:23.8362693Z st.shared.b8 [%r23+1280], %r9000; 2026-02-21T12:40:23.8362758Z prmt.b32 %r9001, %r8048, 0, 0x7773U; 2026-02-21T12:40:23.8362821Z st.shared.b8 [%r24+1280], %r9001; 2026-02-21T12:40:23.8362886Z st.shared.b8 [%r25+1536], %r8049; 2026-02-21T12:40:23.8362956Z prmt.b32 %r9002, %r8049, 0, 0x7771U; 2026-02-21T12:40:23.8363072Z st.shared.b8 [%r26+1536], %r9002; 2026-02-21T12:40:23.8363189Z prmt.b32 %r9003, %r8049, 0, 0x7772U; 2026-02-21T12:40:23.8363257Z st.shared.b8 [%r27+1792], %r9003; 2026-02-21T12:40:23.8363319Z prmt.b32 %r9004, %r8049, 0, 0x7773U; 2026-02-21T12:40:23.8363381Z st.shared.b8 [%r28+1792], %r9004; 2026-02-21T12:40:23.8363439Z bar.sync 0; 2026-02-21T12:40:23.8363507Z ld.shared.b32 %r9005, [%r29]; 2026-02-21T12:40:23.8363569Z prmt.b32 %r9006, %r9005, 0, 0x7770U; 2026-02-21T12:40:23.8363633Z cvt.u16.u32 %rs633, %r9006; 
2026-02-21T12:40:23.8363699Z prmt.b32 %r9007, %r9005, 0, 0x7771U; 2026-02-21T12:40:23.8363761Z cvt.u16.u32 %rs634, %r9007; 2026-02-21T12:40:23.8363824Z prmt.b32 %r9008, %r9005, 0, 0x7772U; 2026-02-21T12:40:23.8363888Z cvt.u16.u32 %rs635, %r9008; 2026-02-21T12:40:23.8363952Z prmt.b32 %r9009, %r9005, 0, 0x7773U; 2026-02-21T12:40:23.8364013Z cvt.u16.u32 %rs636, %r9009; 2026-02-21T12:40:23.8364078Z ld.shared.b32 %r9010, [%r30]; 2026-02-21T12:40:23.8364145Z prmt.b32 %r9011, %r9010, 0, 0x7770U; 2026-02-21T12:40:23.8364209Z cvt.u16.u32 %rs637, %r9011; 2026-02-21T12:40:23.8364271Z prmt.b32 %r9012, %r9010, 0, 0x7771U; 2026-02-21T12:40:23.8364335Z cvt.u16.u32 %rs638, %r9012; 2026-02-21T12:40:23.8364398Z prmt.b32 %r9013, %r9010, 0, 0x7772U; 2026-02-21T12:40:23.8364459Z cvt.u16.u32 %rs639, %r9013; 2026-02-21T12:40:23.8364521Z prmt.b32 %r9014, %r9010, 0, 0x7773U; 2026-02-21T12:40:23.8364584Z cvt.u16.u32 %rs640, %r9014; 2026-02-21T12:40:23.8364648Z ld.shared.b32 %r9015, [%r31]; 2026-02-21T12:40:23.8364711Z prmt.b32 %r9016, %r9015, 0, 0x7770U; 2026-02-21T12:40:23.8364775Z cvt.u16.u32 %rs641, %r9016; 2026-02-21T12:40:23.8364838Z prmt.b32 %r9017, %r9015, 0, 0x7771U; 2026-02-21T12:40:23.8364898Z cvt.u16.u32 %rs642, %r9017; 2026-02-21T12:40:23.8364971Z prmt.b32 %r9018, %r9015, 0, 0x7772U; 2026-02-21T12:40:23.8365036Z cvt.u16.u32 %rs643, %r9018; 2026-02-21T12:40:23.8365103Z prmt.b32 %r9019, %r9015, 0, 0x7773U; 2026-02-21T12:40:23.8365162Z cvt.u16.u32 %rs644, %r9019; 2026-02-21T12:40:23.8365234Z ld.shared.b32 %r9020, [%r32]; 2026-02-21T12:40:23.8365297Z prmt.b32 %r9021, %r9020, 0, 0x7770U; 2026-02-21T12:40:23.8365358Z cvt.u16.u32 %rs645, %r9021; 2026-02-21T12:40:23.8365426Z prmt.b32 %r9022, %r9020, 0, 0x7771U; 2026-02-21T12:40:23.8365487Z cvt.u16.u32 %rs646, %r9022; 2026-02-21T12:40:23.8365550Z prmt.b32 %r9023, %r9020, 0, 0x7772U; 2026-02-21T12:40:23.8365609Z cvt.u16.u32 %rs647, %r9023; 2026-02-21T12:40:23.8365674Z prmt.b32 %r9024, %r9020, 0, 0x7773U; 2026-02-21T12:40:23.8365734Z 
cvt.u16.u32 %rs648, %r9024; 2026-02-21T12:40:23.8365931Z .loc 1 60 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:60:28 2026-02-21T12:40:23.8365998Z shl.b16 %rs649, %rs633, 4; 2026-02-21T12:40:23.8366061Z shl.b16 %rs650, %rs637, 4; 2026-02-21T12:40:23.8366122Z shl.b16 %rs651, %rs641, 4; 2026-02-21T12:40:23.8366253Z shl.b16 %rs652, %rs645, 4; 2026-02-21T12:40:23.8366323Z shl.b16 %rs653, %rs634, 4; 2026-02-21T12:40:23.8366384Z shl.b16 %rs654, %rs638, 4; 2026-02-21T12:40:23.8366612Z shl.b16 %rs655, %rs642, 4; 2026-02-21T12:40:23.8366681Z shl.b16 %rs656, %rs646, 4; 2026-02-21T12:40:23.8366742Z shl.b16 %rs657, %rs635, 4; 2026-02-21T12:40:23.8366802Z shl.b16 %rs658, %rs639, 4; 2026-02-21T12:40:23.8366863Z shl.b16 %rs659, %rs643, 4; 2026-02-21T12:40:23.8366928Z shl.b16 %rs660, %rs647, 4; 2026-02-21T12:40:23.8366988Z shl.b16 %rs661, %rs636, 4; 2026-02-21T12:40:23.8367049Z shl.b16 %rs662, %rs640, 4; 2026-02-21T12:40:23.8367119Z shl.b16 %rs663, %rs644, 4; 2026-02-21T12:40:23.8367180Z shl.b16 %rs664, %rs648, 4; 2026-02-21T12:40:23.8367391Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8367457Z cvt.s16.s8 %rs665, %rs649; 2026-02-21T12:40:23.8367517Z shr.s16 %rs666, %rs665, 4; 2026-02-21T12:40:23.8367579Z cvt.s16.s8 %rs667, %rs651; 2026-02-21T12:40:23.8367640Z shr.s16 %rs668, %rs667, 4; 2026-02-21T12:40:23.8367710Z prmt.b32 %r9025, %r9005, 0, 0x8880U; 2026-02-21T12:40:23.8367909Z cvt.u16.u32 %rs669, %r9025; 2026-02-21T12:40:23.8367975Z shr.s16 %rs670, %rs669, 4; 2026-02-21T12:40:23.8368044Z prmt.b32 %r9026, %r9015, 0, 0x8880U; 2026-02-21T12:40:23.8368105Z cvt.u16.u32 %rs671, %r9026; 2026-02-21T12:40:23.8368169Z shr.s16 %rs672, %rs671, 4; 2026-02-21T12:40:23.8368364Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8368434Z cvt.rn.f32.s16 %r9027, %rs672; 2026-02-21T12:40:23.8368497Z cvt.rn.f32.s16 %r9028, %rs670; 2026-02-21T12:40:23.8368559Z cvt.rn.f32.s16 %r9029, 
%rs668; 2026-02-21T12:40:23.8368624Z cvt.rn.f32.s16 %r9030, %rs666; 2026-02-21T12:40:23.8368817Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8368878Z cvt.s16.s8 %rs673, %rs650; 2026-02-21T12:40:23.8368945Z shr.s16 %rs674, %rs673, 4; 2026-02-21T12:40:23.8369004Z cvt.s16.s8 %rs675, %rs652; 2026-02-21T12:40:23.8369067Z shr.s16 %rs676, %rs675, 4; 2026-02-21T12:40:23.8369135Z prmt.b32 %r9031, %r9010, 0, 0x8880U; 2026-02-21T12:40:23.8369199Z cvt.u16.u32 %rs677, %r9031; 2026-02-21T12:40:23.8369259Z shr.s16 %rs678, %rs677, 4; 2026-02-21T12:40:23.8369325Z prmt.b32 %r9032, %r9020, 0, 0x8880U; 2026-02-21T12:40:23.8369388Z cvt.u16.u32 %rs679, %r9032; 2026-02-21T12:40:23.8369448Z shr.s16 %rs680, %rs679, 4; 2026-02-21T12:40:23.8369643Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8369710Z cvt.rn.f32.s16 %r9033, %rs680; 2026-02-21T12:40:23.8369775Z cvt.rn.f32.s16 %r9034, %rs678; 2026-02-21T12:40:23.8369839Z cvt.rn.f32.s16 %r9035, %rs676; 2026-02-21T12:40:23.8369901Z cvt.rn.f32.s16 %r9036, %rs674; 2026-02-21T12:40:23.8370100Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8370164Z cvt.s16.s8 %rs681, %rs653; 2026-02-21T12:40:23.8370224Z shr.s16 %rs682, %rs681, 4; 2026-02-21T12:40:23.8370292Z cvt.s16.s8 %rs683, %rs655; 2026-02-21T12:40:23.8370353Z shr.s16 %rs684, %rs683, 4; 2026-02-21T12:40:23.8370417Z prmt.b32 %r9037, %r9005, 0, 0x9991U; 2026-02-21T12:40:23.8370480Z cvt.u16.u32 %rs685, %r9037; 2026-02-21T12:40:23.8370543Z shr.s16 %rs686, %rs685, 4; 2026-02-21T12:40:23.8370607Z prmt.b32 %r9038, %r9015, 0, 0x9991U; 2026-02-21T12:40:23.8370668Z cvt.u16.u32 %rs687, %r9038; 2026-02-21T12:40:23.8370732Z shr.s16 %rs688, %rs687, 4; 2026-02-21T12:40:23.8370924Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8370988Z cvt.rn.f32.s16 %r9039, %rs688; 2026-02-21T12:40:23.8371055Z 
cvt.rn.f32.s16 %r9040, %rs686; 2026-02-21T12:40:23.8371121Z cvt.rn.f32.s16 %r9041, %rs684; 2026-02-21T12:40:23.8371184Z cvt.rn.f32.s16 %r9042, %rs682; 2026-02-21T12:40:23.8371467Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8371535Z cvt.s16.s8 %rs689, %rs654; 2026-02-21T12:40:23.8371660Z shr.s16 %rs690, %rs689, 4; 2026-02-21T12:40:23.8371721Z cvt.s16.s8 %rs691, %rs656; 2026-02-21T12:40:23.8371783Z shr.s16 %rs692, %rs691, 4; 2026-02-21T12:40:23.8371849Z prmt.b32 %r9043, %r9010, 0, 0x9991U; 2026-02-21T12:40:23.8371910Z cvt.u16.u32 %rs693, %r9043; 2026-02-21T12:40:23.8371969Z shr.s16 %rs694, %rs693, 4; 2026-02-21T12:40:23.8372035Z prmt.b32 %r9044, %r9020, 0, 0x9991U; 2026-02-21T12:40:23.8372094Z cvt.u16.u32 %rs695, %r9044; 2026-02-21T12:40:23.8372154Z shr.s16 %rs696, %rs695, 4; 2026-02-21T12:40:23.8372347Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8372408Z cvt.rn.f32.s16 %r9045, %rs696; 2026-02-21T12:40:23.8372471Z cvt.rn.f32.s16 %r9046, %rs694; 2026-02-21T12:40:23.8372536Z cvt.rn.f32.s16 %r9047, %rs692; 2026-02-21T12:40:23.8372600Z cvt.rn.f32.s16 %r9048, %rs690; 2026-02-21T12:40:23.8372895Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8372961Z cvt.s16.s8 %rs697, %rs657; 2026-02-21T12:40:23.8373025Z shr.s16 %rs698, %rs697, 4; 2026-02-21T12:40:23.8373085Z cvt.s16.s8 %rs699, %rs659; 2026-02-21T12:40:23.8373145Z shr.s16 %rs700, %rs699, 4; 2026-02-21T12:40:23.8373212Z prmt.b32 %r9049, %r9005, 0, 0xaaa2U; 2026-02-21T12:40:23.8373272Z cvt.u16.u32 %rs701, %r9049; 2026-02-21T12:40:23.8373331Z shr.s16 %rs702, %rs701, 4; 2026-02-21T12:40:23.8373395Z prmt.b32 %r9050, %r9015, 0, 0xaaa2U; 2026-02-21T12:40:23.8373458Z cvt.u16.u32 %rs703, %r9050; 2026-02-21T12:40:23.8373518Z shr.s16 %rs704, %rs703, 4; 2026-02-21T12:40:23.8373710Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 
2026-02-21T12:40:23.8373776Z cvt.rn.f32.s16 %r9051, %rs704; 2026-02-21T12:40:23.8373854Z cvt.rn.f32.s16 %r9052, %rs702; 2026-02-21T12:40:23.8373918Z cvt.rn.f32.s16 %r9053, %rs700; 2026-02-21T12:40:23.8373984Z cvt.rn.f32.s16 %r9054, %rs698; 2026-02-21T12:40:23.8374182Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8374243Z cvt.s16.s8 %rs705, %rs658; 2026-02-21T12:40:23.8374304Z shr.s16 %rs706, %rs705, 4; 2026-02-21T12:40:23.8374366Z cvt.s16.s8 %rs707, %rs660; 2026-02-21T12:40:23.8374426Z shr.s16 %rs708, %rs707, 4; 2026-02-21T12:40:23.8374490Z prmt.b32 %r9055, %r9010, 0, 0xaaa2U; 2026-02-21T12:40:23.8374555Z cvt.u16.u32 %rs709, %r9055; 2026-02-21T12:40:23.8374614Z shr.s16 %rs710, %rs709, 4; 2026-02-21T12:40:23.8374677Z prmt.b32 %r9056, %r9020, 0, 0xaaa2U; 2026-02-21T12:40:23.8374738Z cvt.u16.u32 %rs711, %r9056; 2026-02-21T12:40:23.8374802Z shr.s16 %rs712, %rs711, 4; 2026-02-21T12:40:23.8374993Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8375058Z cvt.rn.f32.s16 %r9057, %rs712; 2026-02-21T12:40:23.8375124Z cvt.rn.f32.s16 %r9058, %rs710; 2026-02-21T12:40:23.8375193Z cvt.rn.f32.s16 %r9059, %rs708; 2026-02-21T12:40:23.8375255Z cvt.rn.f32.s16 %r9060, %rs706; 2026-02-21T12:40:23.8375451Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8375514Z cvt.s16.s8 %rs713, %rs661; 2026-02-21T12:40:23.8375576Z shr.s16 %rs714, %rs713, 4; 2026-02-21T12:40:23.8375637Z cvt.s16.s8 %rs715, %rs663; 2026-02-21T12:40:23.8375699Z shr.s16 %rs716, %rs715, 4; 2026-02-21T12:40:23.8375763Z prmt.b32 %r9061, %r9005, 0, 0xbbb3U; 2026-02-21T12:40:23.8375824Z cvt.u16.u32 %rs717, %r9061; 2026-02-21T12:40:23.8375887Z shr.s16 %rs718, %rs717, 4; 2026-02-21T12:40:23.8375951Z prmt.b32 %r9062, %r9015, 0, 0xbbb3U; 2026-02-21T12:40:23.8376012Z cvt.u16.u32 %rs719, %r9062; 2026-02-21T12:40:23.8376074Z shr.s16 %rs720, %rs719, 4; 2026-02-21T12:40:23.8376365Z .loc 
1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8376433Z cvt.rn.f32.s16 %r9063, %rs720; 2026-02-21T12:40:23.8376675Z cvt.rn.f32.s16 %r9064, %rs718; 2026-02-21T12:40:23.8376744Z cvt.rn.f32.s16 %r9065, %rs716; 2026-02-21T12:40:23.8376806Z cvt.rn.f32.s16 %r9066, %rs714; 2026-02-21T12:40:23.8377000Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8377065Z cvt.s16.s8 %rs721, %rs662; 2026-02-21T12:40:23.8377126Z shr.s16 %rs722, %rs721, 4; 2026-02-21T12:40:23.8377185Z cvt.s16.s8 %rs723, %rs664; 2026-02-21T12:40:23.8377245Z shr.s16 %rs724, %rs723, 4; 2026-02-21T12:40:23.8377323Z prmt.b32 %r9067, %r9010, 0, 0xbbb3U; 2026-02-21T12:40:23.8377386Z cvt.u16.u32 %rs725, %r9067; 2026-02-21T12:40:23.8377448Z shr.s16 %rs726, %rs725, 4; 2026-02-21T12:40:23.8377515Z prmt.b32 %r9068, %r9020, 0, 0xbbb3U; 2026-02-21T12:40:23.8377576Z cvt.u16.u32 %rs727, %r9068; 2026-02-21T12:40:23.8377640Z shr.s16 %rs728, %rs727, 4; 2026-02-21T12:40:23.8377917Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8378044Z cvt.rn.f32.s16 %r9069, %rs728; 2026-02-21T12:40:23.8378110Z cvt.rn.f32.s16 %r9070, %rs726; 2026-02-21T12:40:23.8378173Z cvt.rn.f32.s16 %r9071, %rs724; 2026-02-21T12:40:23.8378238Z cvt.rn.f32.s16 %r9072, %rs722; 2026-02-21T12:40:23.8378294Z bar.sync 0; 2026-02-21T12:40:23.8378408Z st.shared.v4.b32 [%r33], {%r9030, %r9028, %r9029, %r9027}; 2026-02-21T12:40:23.8378530Z st.shared.v4.b32 [%r33+8192], {%r9036, %r9034, %r9035, %r9033}; 2026-02-21T12:40:23.8378635Z st.shared.v4.b32 [%r34], {%r9042, %r9040, %r9041, %r9039}; 2026-02-21T12:40:23.8378747Z st.shared.v4.b32 [%r34+8192], {%r9048, %r9046, %r9047, %r9045}; 2026-02-21T12:40:23.8378847Z st.shared.v4.b32 [%r35], {%r9054, %r9052, %r9053, %r9051}; 2026-02-21T12:40:23.8378961Z st.shared.v4.b32 [%r35+8192], {%r9060, %r9058, %r9059, %r9057}; 2026-02-21T12:40:23.8379066Z st.shared.v4.b32 [%r36], {%r9066, 
%r9064, %r9065, %r9063}; 2026-02-21T12:40:23.8379178Z st.shared.v4.b32 [%r36+8192], {%r9072, %r9070, %r9071, %r9069}; 2026-02-21T12:40:23.8379238Z $L__tmp13: 2026-02-21T12:40:23.8379510Z .loc 2 291 36 // standard.py:291:36 @[ cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:87:40 ] 2026-02-21T12:40:23.8379574Z // begin inline asm 2026-02-21T12:40:23.8379656Z fence.proxy.async.shared::cta; 2026-02-21T12:40:23.8379713Z // end inline asm 2026-02-21T12:40:23.8379769Z bar.sync 0; 2026-02-21T12:40:23.8379842Z wgmma.fence.sync.aligned; 2026-02-21T12:40:23.8379905Z // begin inline asm 2026-02-21T12:40:23.8382641Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r21586,%r21587,%r21588,%r21589,%r21590,%r21591,%r21592,%r21593,%r21594,%r21595,%r21596,%r21597,%r21598,%r21599,%r21600,%r21601,%r21602,%r21603,%r21604,%r21605,%r21606,%r21607,%r21608,%r21609,%r21610,%r21611,%r21612,%r21613,%r21614,%r21615,%r21616,%r21617,%r21618,%r21619,%r21620,%r21621,%r21622,%r21623,%r21624,%r21625,%r21626,%r21627,%r21628,%r21629,%r21630,%r21631,%r21632,%r21633,%r21634,%r21635,%r21636,%r21637,%r21638,%r21639,%r21640,%r21641,%r21642,%r21643,%r21644,%r21645,%r21646,%r21647,%r21648,%r21649,%r21650,%r21651,%r21652,%r21653,%r21654,%r21655,%r21656,%r21657,%r21658,%r21659,%r21660,%r21661,%r21662,%r21663,%r21664,%r21665,%r21666,%r21667,%r21668,%r21669,%r21670,%r21671,%r21672,%r21673,%r21674,%r21675,%r21676,%r21677,%r21678,%r21679,%r21680,%r21681,%r21682,%r21683,%r21684,%r21685,%r21686,%r21687,%r21688,%r21689,%r21690,%r21691,%r21692,%r21693,%r21694,%r21695,%r21696,%r21697,%r21698,%r21699,%r21700,%r21701,%r21702,%r21703,%r21704,%r21705,%r21706,%r21707,%r21708,%r21709,%r21710,%r21711,%r21712,%r21713}, {%r8306,%r8307,%r8308,%r8309}, %rd23, %p14, 1, 1; 2026-02-21T12:40:23.8382712Z // end inline asm 2026-02-21T12:40:23.8382771Z // begin inline asm 2026-02-21T12:40:23.8385484Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r21586,%r21587,%r21588,%r21589,%r21590,%r21591,%r21592,%r21593,%r21594,%r21595,%r21596,%r21597,%r21598,%r21599,%r21600,%r21601,%r21602,%r21603,%r21604,%r21605,%r21606,%r21607,%r21608,%r21609,%r21610,%r21611,%r21612,%r21613,%r21614,%r21615,%r21616,%r21617,%r21618,%r21619,%r21620,%r21621,%r21622,%r21623,%r21624,%r21625,%r21626,%r21627,%r21628,%r21629,%r21630,%r21631,%r21632,%r21633,%r21634,%r21635,%r21636,%r21637,%r21638,%r21639,%r21640,%r21641,%r21642,%r21643,%r21644,%r21645,%r21646,%r21647,%r21648,%r21649,%r21650,%r21651,%r21652,%r21653,%r21654,%r21655,%r21656,%r21657,%r21658,%r21659,%r21660,%r21661,%r21662,%r21663,%r21664,%r21665,%r21666,%r21667,%r21668,%r21669,%r21670,%r21671,%r21672,%r21673,%r21674,%r21675,%r21676,%r21677,%r21678,%r21679,%r21680,%r21681,%r21682,%r21683,%r21684,%r21685,%r21686,%r21687,%r21688,%r21689,%r21690,%r21691,%r21692,%r21693,%r21694,%r21695,%r21696,%r21697,%r21698,%r21699,%r21700,%r21701,%r21702,%r21703,%r21704,%r21705,%r21706,%r21707,%r21708,%r21709,%r21710,%r21711,%r21712,%r21713}, {%r8566,%r8567,%r8568,%r8569}, %rd24, %p14, 1, 1; 2026-02-21T12:40:23.8385693Z // end inline asm 2026-02-21T12:40:23.8385773Z wgmma.commit_group.sync.aligned; 2026-02-21T12:40:23.8385922Z mov.b32 %r8699, %r8700; 2026-02-21T12:40:23.8385986Z mov.b32 %r8698, %r18409; 2026-02-21T12:40:23.8386045Z // begin inline asm 2026-02-21T12:40:23.8388768Z // wait for regs: 
%r21586,%r21587,%r21588,%r21589,%r21590,%r21591,%r21592,%r21593,%r21594,%r21595,%r21596,%r21597,%r21598,%r21599,%r21600,%r21601,%r21602,%r21603,%r21604,%r21605,%r21606,%r21607,%r21608,%r21609,%r21610,%r21611,%r21612,%r21613,%r21614,%r21615,%r21616,%r21617,%r21618,%r21619,%r21620,%r21621,%r21622,%r21623,%r21624,%r21625,%r21626,%r21627,%r21628,%r21629,%r21630,%r21631,%r21632,%r21633,%r21634,%r21635,%r21636,%r21637,%r21638,%r21639,%r21640,%r21641,%r21642,%r21643,%r21644,%r21645,%r21646,%r21647,%r21648,%r21649,%r21650,%r21651,%r21652,%r21653,%r21654,%r21655,%r21656,%r21657,%r21658,%r21659,%r21660,%r21661,%r21662,%r21663,%r21664,%r21665,%r21666,%r21667,%r21668,%r21669,%r21670,%r21671,%r21672,%r21673,%r21674,%r21675,%r21676,%r21677,%r21678,%r21679,%r21680,%r21681,%r21682,%r21683,%r21684,%r21685,%r21686,%r21687,%r21688,%r21689,%r21690,%r21691,%r21692,%r21693,%r21694,%r21695,%r21696,%r21697,%r21698,%r21699,%r21700,%r21701,%r21702,%r21703,%r21704,%r21705,%r21706,%r21707,%r21708,%r21709,%r21710,%r21711,%r21712,%r21713,%r8698,%r8699,%r8700 2026-02-21T12:40:23.8388858Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:40:23.8388915Z // end inline asm 2026-02-21T12:40:23.8388968Z $L__tmp14: 2026-02-21T12:40:23.8389181Z .loc 1 43 126 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:43:126 2026-02-21T12:40:23.8389250Z add.s64 %rd623, %rd623, 24; 2026-02-21T12:40:23.8389311Z add.s64 %rd622, %rd622, 96; 2026-02-21T12:40:23.8389374Z add.s64 %rd621, %rd621, 30720; 2026-02-21T12:40:23.8389442Z setp.lt.u64 %p20, %rd623, 4056; 2026-02-21T12:40:23.8389503Z @%p20 bra $L__BB0_21; 2026-02-21T12:40:23.8389599Z // %bb.22: // %.preheader156 2026-02-21T12:40:23.8389704Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:40:23.8389778Z add.s64 %rd625, %rd27, %rd71; 2026-02-21T12:40:23.8389852Z add.s64 %rd624, %rd28, %rd70; 2026-02-21T12:40:23.8389913Z mov.b64 %rd626, 4072; 2026-02-21T12:40:23.8390029Z $L__BB0_23: // Parent Loop BB0_2 Depth=1 2026-02-21T12:40:23.8390134Z // => This Inner Loop 
Header: Depth=2 2026-02-21T12:40:23.8390337Z .loc 1 51 80 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:51:80 2026-02-21T12:40:23.8390400Z // begin inline asm 2026-02-21T12:40:23.8390460Z mov.u64 %rd292, 0x0; 2026-02-21T12:40:23.8390583Z createpolicy.fractional.L2::evict_last.b64 %rd292, 1.0; 2026-02-21T12:40:23.8390640Z // end inline asm 2026-02-21T12:40:23.8390701Z // begin inline asm 2026-02-21T12:40:23.8390759Z mov.u32 %r9073, 0x0; 2026-02-21T12:40:23.8390905Z mov.u32 %r9074, 0x0; 2026-02-21T12:40:23.8390966Z mov.u32 %r9075, 0x0; 2026-02-21T12:40:23.8391022Z mov.u32 %r9076, 0x0; 2026-02-21T12:40:23.8391249Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r9073, %r9074, %r9075, %r9076 }, [ %rd625 + 0 ], %rd292; 2026-02-21T12:40:23.8391373Z // end inline asm 2026-02-21T12:40:23.8391571Z .loc 1 55 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:55:32 2026-02-21T12:40:23.8391626Z bar.sync 0; 2026-02-21T12:40:23.8391704Z st.shared.v2.b32 [%r9], {%r9073, %r9074}; 2026-02-21T12:40:23.8391787Z st.shared.v2.b32 [%r10], {%r9075, %r9076}; 2026-02-21T12:40:23.8391843Z bar.sync 0; 2026-02-21T12:40:23.8391910Z ld.shared.b16 %rs729, [%r53]; 2026-02-21T12:40:23.8391983Z ld.shared.b16 %rs730, [%r53+256]; 2026-02-21T12:40:23.8392051Z ld.shared.b16 %rs731, [%r53+16]; 2026-02-21T12:40:23.8392117Z ld.shared.b16 %rs732, [%r53+272]; 2026-02-21T12:40:23.8392180Z ld.shared.b16 %rs733, [%r54]; 2026-02-21T12:40:23.8392251Z ld.shared.b16 %rs734, [%r54+256]; 2026-02-21T12:40:23.8392314Z ld.shared.b16 %rs735, [%r54+16]; 2026-02-21T12:40:23.8392377Z ld.shared.b16 %rs736, [%r54+272]; 2026-02-21T12:40:23.8392579Z cvt.f32.bf16 %r9337, %rs729; 2026-02-21T12:40:23.8392648Z cvt.f32.bf16 %r9338, %rs730; 2026-02-21T12:40:23.8392711Z cvt.f32.bf16 %r9339, %rs733; 2026-02-21T12:40:23.8392776Z cvt.f32.bf16 %r9340, %rs734; 2026-02-21T12:40:23.8392840Z cvt.f32.bf16 %r9597, %rs731; 2026-02-21T12:40:23.8392901Z cvt.f32.bf16 %r9598, %rs732; 2026-02-21T12:40:23.8392961Z 
cvt.f32.bf16 %r9599, %rs735; 2026-02-21T12:40:23.8393025Z cvt.f32.bf16 %r9600, %rs736; 2026-02-21T12:40:23.8393228Z .loc 1 57 87 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:57:87 2026-02-21T12:40:23.8393288Z // begin inline asm 2026-02-21T12:40:23.8393349Z mov.u32 %r9077, 0x0; 2026-02-21T12:40:23.8393407Z mov.u32 %r9078, 0x0; 2026-02-21T12:40:23.8393463Z mov.u32 %r9079, 0x0; 2026-02-21T12:40:23.8393519Z mov.u32 %r9080, 0x0; 2026-02-21T12:40:23.8393653Z ld.global.v4.b32 { %r9077, %r9078, %r9079, %r9080 }, [ %rd624 + 0 ]; 2026-02-21T12:40:23.8393711Z // end inline asm 2026-02-21T12:40:23.8393919Z .loc 1 65 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:65:28 2026-02-21T12:40:23.8393980Z bar.sync 0; 2026-02-21T12:40:23.8394048Z st.shared.b8 [%r13], %r9077; 2026-02-21T12:40:23.8394119Z prmt.b32 %r9863, %r9077, 0, 0x7771U; 2026-02-21T12:40:23.8394186Z st.shared.b8 [%r14], %r9863; 2026-02-21T12:40:23.8394252Z prmt.b32 %r9864, %r9077, 0, 0x7772U; 2026-02-21T12:40:23.8394319Z st.shared.b8 [%r15+256], %r9864; 2026-02-21T12:40:23.8394384Z prmt.b32 %r9865, %r9077, 0, 0x7773U; 2026-02-21T12:40:23.8394451Z st.shared.b8 [%r16+256], %r9865; 2026-02-21T12:40:23.8394513Z st.shared.b8 [%r17+512], %r9078; 2026-02-21T12:40:23.8394576Z prmt.b32 %r9866, %r9078, 0, 0x7771U; 2026-02-21T12:40:23.8394641Z st.shared.b8 [%r18+512], %r9866; 2026-02-21T12:40:23.8394704Z prmt.b32 %r9867, %r9078, 0, 0x7772U; 2026-02-21T12:40:23.8394770Z st.shared.b8 [%r19+768], %r9867; 2026-02-21T12:40:23.8394833Z prmt.b32 %r9868, %r9078, 0, 0x7773U; 2026-02-21T12:40:23.8394903Z st.shared.b8 [%r20+768], %r9868; 2026-02-21T12:40:23.8394968Z st.shared.b8 [%r21+1024], %r9079; 2026-02-21T12:40:23.8395032Z prmt.b32 %r9869, %r9079, 0, 0x7771U; 2026-02-21T12:40:23.8395098Z st.shared.b8 [%r22+1024], %r9869; 2026-02-21T12:40:23.8395161Z prmt.b32 %r9870, %r9079, 0, 0x7772U; 2026-02-21T12:40:23.8395225Z st.shared.b8 [%r23+1280], %r9870; 2026-02-21T12:40:23.8395292Z prmt.b32 %r9871, %r9079, 0, 
0x7773U; 2026-02-21T12:40:23.8395358Z st.shared.b8 [%r24+1280], %r9871; 2026-02-21T12:40:23.8395423Z st.shared.b8 [%r25+1536], %r9080; 2026-02-21T12:40:23.8395486Z prmt.b32 %r9872, %r9080, 0, 0x7771U; 2026-02-21T12:40:23.8395553Z st.shared.b8 [%r26+1536], %r9872; 2026-02-21T12:40:23.8395617Z prmt.b32 %r9873, %r9080, 0, 0x7772U; 2026-02-21T12:40:23.8395686Z st.shared.b8 [%r27+1792], %r9873; 2026-02-21T12:40:23.8395821Z prmt.b32 %r9874, %r9080, 0, 0x7773U; 2026-02-21T12:40:23.8395885Z st.shared.b8 [%r28+1792], %r9874; 2026-02-21T12:40:23.8395940Z bar.sync 0; 2026-02-21T12:40:23.8396057Z ld.shared.b32 %r9875, [%r29]; 2026-02-21T12:40:23.8396125Z prmt.b32 %r9876, %r9875, 0, 0x7770U; 2026-02-21T12:40:23.8396189Z cvt.u16.u32 %rs737, %r9876; 2026-02-21T12:40:23.8396252Z prmt.b32 %r9877, %r9875, 0, 0x7771U; 2026-02-21T12:40:23.8396316Z cvt.u16.u32 %rs738, %r9877; 2026-02-21T12:40:23.8396380Z prmt.b32 %r9878, %r9875, 0, 0x7772U; 2026-02-21T12:40:23.8396443Z cvt.u16.u32 %rs739, %r9878; 2026-02-21T12:40:23.8396624Z prmt.b32 %r9879, %r9875, 0, 0x7773U; 2026-02-21T12:40:23.8396704Z cvt.u16.u32 %rs740, %r9879; 2026-02-21T12:40:23.8396770Z ld.shared.b32 %r9880, [%r30]; 2026-02-21T12:40:23.8396835Z prmt.b32 %r9881, %r9880, 0, 0x7770U; 2026-02-21T12:40:23.8396897Z cvt.u16.u32 %rs741, %r9881; 2026-02-21T12:40:23.8396961Z prmt.b32 %r9882, %r9880, 0, 0x7771U; 2026-02-21T12:40:23.8397020Z cvt.u16.u32 %rs742, %r9882; 2026-02-21T12:40:23.8397089Z prmt.b32 %r9883, %r9880, 0, 0x7772U; 2026-02-21T12:40:23.8397148Z cvt.u16.u32 %rs743, %r9883; 2026-02-21T12:40:23.8397359Z prmt.b32 %r9884, %r9880, 0, 0x7773U; 2026-02-21T12:40:23.8397434Z cvt.u16.u32 %rs744, %r9884; 2026-02-21T12:40:23.8397502Z ld.shared.b32 %r9885, [%r31]; 2026-02-21T12:40:23.8397566Z prmt.b32 %r9886, %r9885, 0, 0x7770U; 2026-02-21T12:40:23.8397625Z cvt.u16.u32 %rs745, %r9886; 2026-02-21T12:40:23.8397690Z prmt.b32 %r9887, %r9885, 0, 0x7771U; 2026-02-21T12:40:23.8397753Z cvt.u16.u32 %rs746, %r9887; 2026-02-21T12:40:23.8397815Z 
prmt.b32 %r9888, %r9885, 0, 0x7772U; 2026-02-21T12:40:23.8397876Z cvt.u16.u32 %rs747, %r9888; 2026-02-21T12:40:23.8397942Z prmt.b32 %r9889, %r9885, 0, 0x7773U; 2026-02-21T12:40:23.8398002Z cvt.u16.u32 %rs748, %r9889; 2026-02-21T12:40:23.8398067Z ld.shared.b32 %r9890, [%r32]; 2026-02-21T12:40:23.8398133Z prmt.b32 %r9891, %r9890, 0, 0x7770U; 2026-02-21T12:40:23.8398192Z cvt.u16.u32 %rs749, %r9891; 2026-02-21T12:40:23.8398257Z prmt.b32 %r9892, %r9890, 0, 0x7771U; 2026-02-21T12:40:23.8398319Z cvt.u16.u32 %rs750, %r9892; 2026-02-21T12:40:23.8398387Z prmt.b32 %r9893, %r9890, 0, 0x7772U; 2026-02-21T12:40:23.8398449Z cvt.u16.u32 %rs751, %r9893; 2026-02-21T12:40:23.8398512Z prmt.b32 %r9894, %r9890, 0, 0x7773U; 2026-02-21T12:40:23.8398578Z cvt.u16.u32 %rs752, %r9894; 2026-02-21T12:40:23.8398779Z .loc 1 60 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:60:28 2026-02-21T12:40:23.8398845Z shl.b16 %rs753, %rs737, 4; 2026-02-21T12:40:23.8398911Z shl.b16 %rs754, %rs741, 4; 2026-02-21T12:40:23.8398972Z shl.b16 %rs755, %rs745, 4; 2026-02-21T12:40:23.8399031Z shl.b16 %rs756, %rs749, 4; 2026-02-21T12:40:23.8399091Z shl.b16 %rs757, %rs738, 4; 2026-02-21T12:40:23.8399155Z shl.b16 %rs758, %rs742, 4; 2026-02-21T12:40:23.8399214Z shl.b16 %rs759, %rs746, 4; 2026-02-21T12:40:23.8399286Z shl.b16 %rs760, %rs750, 4; 2026-02-21T12:40:23.8399350Z shl.b16 %rs761, %rs739, 4; 2026-02-21T12:40:23.8399414Z shl.b16 %rs762, %rs743, 4; 2026-02-21T12:40:23.8399474Z shl.b16 %rs763, %rs747, 4; 2026-02-21T12:40:23.8399539Z shl.b16 %rs764, %rs751, 4; 2026-02-21T12:40:23.8399604Z shl.b16 %rs765, %rs740, 4; 2026-02-21T12:40:23.8399664Z shl.b16 %rs766, %rs744, 4; 2026-02-21T12:40:23.8399724Z shl.b16 %rs767, %rs748, 4; 2026-02-21T12:40:23.8399786Z shl.b16 %rs768, %rs752, 4; 2026-02-21T12:40:23.8399980Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8400041Z cvt.s16.s8 %rs769, %rs753; 2026-02-21T12:40:23.8400105Z shr.s16 %rs770, %rs769, 4; 
2026-02-21T12:40:23.8400167Z cvt.s16.s8 %rs771, %rs755; 2026-02-21T12:40:23.8400227Z shr.s16 %rs772, %rs771, 4; 2026-02-21T12:40:23.8400290Z prmt.b32 %r9895, %r9875, 0, 0x8880U; 2026-02-21T12:40:23.8400353Z cvt.u16.u32 %rs773, %r9895; 2026-02-21T12:40:23.8400414Z shr.s16 %rs774, %rs773, 4; 2026-02-21T12:40:23.8400478Z prmt.b32 %r9896, %r9885, 0, 0x8880U; 2026-02-21T12:40:23.8400634Z cvt.u16.u32 %rs775, %r9896; 2026-02-21T12:40:23.8400695Z shr.s16 %rs776, %rs775, 4; 2026-02-21T12:40:23.8400964Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8401031Z cvt.rn.f32.s16 %r9897, %rs776; 2026-02-21T12:40:23.8401101Z cvt.rn.f32.s16 %r9898, %rs774; 2026-02-21T12:40:23.8401164Z cvt.rn.f32.s16 %r9899, %rs772; 2026-02-21T12:40:23.8401229Z cvt.rn.f32.s16 %r9900, %rs770; 2026-02-21T12:40:23.8401424Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8401486Z cvt.s16.s8 %rs777, %rs754; 2026-02-21T12:40:23.8401546Z shr.s16 %rs778, %rs777, 4; 2026-02-21T12:40:23.8401609Z cvt.s16.s8 %rs779, %rs756; 2026-02-21T12:40:23.8401670Z shr.s16 %rs780, %rs779, 4; 2026-02-21T12:40:23.8401733Z prmt.b32 %r9901, %r9880, 0, 0x8880U; 2026-02-21T12:40:23.8401794Z cvt.u16.u32 %rs781, %r9901; 2026-02-21T12:40:23.8401859Z shr.s16 %rs782, %rs781, 4; 2026-02-21T12:40:23.8401923Z prmt.b32 %r9902, %r9890, 0, 0x8880U; 2026-02-21T12:40:23.8401983Z cvt.u16.u32 %rs783, %r9902; 2026-02-21T12:40:23.8402138Z shr.s16 %rs784, %rs783, 4; 2026-02-21T12:40:23.8402336Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8402400Z cvt.rn.f32.s16 %r9903, %rs784; 2026-02-21T12:40:23.8402463Z cvt.rn.f32.s16 %r9904, %rs782; 2026-02-21T12:40:23.8402531Z cvt.rn.f32.s16 %r9905, %rs780; 2026-02-21T12:40:23.8402594Z cvt.rn.f32.s16 %r9906, %rs778; 2026-02-21T12:40:23.8402789Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8402853Z cvt.s16.s8 
%rs785, %rs757; 2026-02-21T12:40:23.8402915Z shr.s16 %rs786, %rs785, 4; 2026-02-21T12:40:23.8402976Z cvt.s16.s8 %rs787, %rs759; 2026-02-21T12:40:23.8403039Z shr.s16 %rs788, %rs787, 4; 2026-02-21T12:40:23.8403103Z prmt.b32 %r9907, %r9875, 0, 0x9991U; 2026-02-21T12:40:23.8403164Z cvt.u16.u32 %rs789, %r9907; 2026-02-21T12:40:23.8403226Z shr.s16 %rs790, %rs789, 4; 2026-02-21T12:40:23.8403294Z prmt.b32 %r9908, %r9885, 0, 0x9991U; 2026-02-21T12:40:23.8403357Z cvt.u16.u32 %rs791, %r9908; 2026-02-21T12:40:23.8403419Z shr.s16 %rs792, %rs791, 4; 2026-02-21T12:40:23.8403615Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8403679Z cvt.rn.f32.s16 %r9909, %rs792; 2026-02-21T12:40:23.8403741Z cvt.rn.f32.s16 %r9910, %rs790; 2026-02-21T12:40:23.8403805Z cvt.rn.f32.s16 %r9911, %rs788; 2026-02-21T12:40:23.8403871Z cvt.rn.f32.s16 %r9912, %rs786; 2026-02-21T12:40:23.8404063Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8404128Z cvt.s16.s8 %rs793, %rs758; 2026-02-21T12:40:23.8404193Z shr.s16 %rs794, %rs793, 4; 2026-02-21T12:40:23.8404253Z cvt.s16.s8 %rs795, %rs760; 2026-02-21T12:40:23.8404315Z shr.s16 %rs796, %rs795, 4; 2026-02-21T12:40:23.8404383Z prmt.b32 %r9913, %r9880, 0, 0x9991U; 2026-02-21T12:40:23.8404446Z cvt.u16.u32 %rs797, %r9913; 2026-02-21T12:40:23.8404510Z shr.s16 %rs798, %rs797, 4; 2026-02-21T12:40:23.8404577Z prmt.b32 %r9914, %r9890, 0, 0x9991U; 2026-02-21T12:40:23.8404642Z cvt.u16.u32 %rs799, %r9914; 2026-02-21T12:40:23.8404702Z shr.s16 %rs800, %rs799, 4; 2026-02-21T12:40:23.8404895Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8404962Z cvt.rn.f32.s16 %r9915, %rs800; 2026-02-21T12:40:23.8405023Z cvt.rn.f32.s16 %r9916, %rs798; 2026-02-21T12:40:23.8405086Z cvt.rn.f32.s16 %r9917, %rs796; 2026-02-21T12:40:23.8405151Z cvt.rn.f32.s16 %r9918, %rs794; 2026-02-21T12:40:23.8405346Z .loc 1 62 25 // 
cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8405406Z cvt.s16.s8 %rs801, %rs761; 2026-02-21T12:40:23.8405479Z shr.s16 %rs802, %rs801, 4; 2026-02-21T12:40:23.8405604Z cvt.s16.s8 %rs803, %rs763; 2026-02-21T12:40:23.8405665Z shr.s16 %rs804, %rs803, 4; 2026-02-21T12:40:23.8405732Z prmt.b32 %r9919, %r9875, 0, 0xaaa2U; 2026-02-21T12:40:23.8405842Z cvt.u16.u32 %rs805, %r9919; 2026-02-21T12:40:23.8405904Z shr.s16 %rs806, %rs805, 4; 2026-02-21T12:40:23.8405969Z prmt.b32 %r9920, %r9885, 0, 0xaaa2U; 2026-02-21T12:40:23.8406031Z cvt.u16.u32 %rs807, %r9920; 2026-02-21T12:40:23.8406096Z shr.s16 %rs808, %rs807, 4; 2026-02-21T12:40:23.8406288Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8406350Z cvt.rn.f32.s16 %r9921, %rs808; 2026-02-21T12:40:23.8406415Z cvt.rn.f32.s16 %r9922, %rs806; 2026-02-21T12:40:23.8406590Z cvt.rn.f32.s16 %r9923, %rs804; 2026-02-21T12:40:23.8406658Z cvt.rn.f32.s16 %r9924, %rs802; 2026-02-21T12:40:23.8406856Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8406922Z cvt.s16.s8 %rs809, %rs762; 2026-02-21T12:40:23.8406986Z shr.s16 %rs810, %rs809, 4; 2026-02-21T12:40:23.8407046Z cvt.s16.s8 %rs811, %rs764; 2026-02-21T12:40:23.8407243Z shr.s16 %rs812, %rs811, 4; 2026-02-21T12:40:23.8407313Z prmt.b32 %r9925, %r9880, 0, 0xaaa2U; 2026-02-21T12:40:23.8407374Z cvt.u16.u32 %rs813, %r9925; 2026-02-21T12:40:23.8407449Z shr.s16 %rs814, %rs813, 4; 2026-02-21T12:40:23.8407517Z prmt.b32 %r9926, %r9890, 0, 0xaaa2U; 2026-02-21T12:40:23.8407578Z cvt.u16.u32 %rs815, %r9926; 2026-02-21T12:40:23.8407639Z shr.s16 %rs816, %rs815, 4; 2026-02-21T12:40:23.8407837Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8407900Z cvt.rn.f32.s16 %r9927, %rs816; 2026-02-21T12:40:23.8407962Z cvt.rn.f32.s16 %r9928, %rs814; 2026-02-21T12:40:23.8408028Z cvt.rn.f32.s16 %r9929, %rs812; 2026-02-21T12:40:23.8408088Z 
cvt.rn.f32.s16 %r9930, %rs810; 2026-02-21T12:40:23.8408281Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8408348Z cvt.s16.s8 %rs817, %rs765; 2026-02-21T12:40:23.8408410Z shr.s16 %rs818, %rs817, 4; 2026-02-21T12:40:23.8408472Z cvt.s16.s8 %rs819, %rs767; 2026-02-21T12:40:23.8408533Z shr.s16 %rs820, %rs819, 4; 2026-02-21T12:40:23.8408602Z prmt.b32 %r9931, %r9875, 0, 0xbbb3U; 2026-02-21T12:40:23.8408663Z cvt.u16.u32 %rs821, %r9931; 2026-02-21T12:40:23.8408722Z shr.s16 %rs822, %rs821, 4; 2026-02-21T12:40:23.8408791Z prmt.b32 %r9932, %r9885, 0, 0xbbb3U; 2026-02-21T12:40:23.8408851Z cvt.u16.u32 %rs823, %r9932; 2026-02-21T12:40:23.8408912Z shr.s16 %rs824, %rs823, 4; 2026-02-21T12:40:23.8409103Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8409168Z cvt.rn.f32.s16 %r9933, %rs824; 2026-02-21T12:40:23.8409231Z cvt.rn.f32.s16 %r9934, %rs822; 2026-02-21T12:40:23.8409292Z cvt.rn.f32.s16 %r9935, %rs820; 2026-02-21T12:40:23.8409359Z cvt.rn.f32.s16 %r9936, %rs818; 2026-02-21T12:40:23.8409551Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8409617Z cvt.s16.s8 %rs825, %rs766; 2026-02-21T12:40:23.8409682Z shr.s16 %rs826, %rs825, 4; 2026-02-21T12:40:23.8409746Z cvt.s16.s8 %rs827, %rs768; 2026-02-21T12:40:23.8409806Z shr.s16 %rs828, %rs827, 4; 2026-02-21T12:40:23.8409869Z prmt.b32 %r9937, %r9880, 0, 0xbbb3U; 2026-02-21T12:40:23.8409933Z cvt.u16.u32 %rs829, %r9937; 2026-02-21T12:40:23.8409993Z shr.s16 %rs830, %rs829, 4; 2026-02-21T12:40:23.8410057Z prmt.b32 %r9938, %r9890, 0, 0xbbb3U; 2026-02-21T12:40:23.8410121Z cvt.u16.u32 %rs831, %r9938; 2026-02-21T12:40:23.8410181Z shr.s16 %rs832, %rs831, 4; 2026-02-21T12:40:23.8410374Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8410440Z cvt.rn.f32.s16 %r9939, %rs832; 2026-02-21T12:40:23.8410501Z cvt.rn.f32.s16 %r9940, %rs830; 
2026-02-21T12:40:23.8410654Z cvt.rn.f32.s16 %r9941, %rs828; 2026-02-21T12:40:23.8410718Z cvt.rn.f32.s16 %r9942, %rs826; 2026-02-21T12:40:23.8410776Z bar.sync 0; 2026-02-21T12:40:23.8410955Z st.shared.v4.b32 [%r33], {%r9900, %r9898, %r9899, %r9897}; 2026-02-21T12:40:23.8411075Z st.shared.v4.b32 [%r33+8192], {%r9906, %r9904, %r9905, %r9903}; 2026-02-21T12:40:23.8411185Z st.shared.v4.b32 [%r34], {%r9912, %r9910, %r9911, %r9909}; 2026-02-21T12:40:23.8411298Z st.shared.v4.b32 [%r34+8192], {%r9918, %r9916, %r9917, %r9915}; 2026-02-21T12:40:23.8411405Z st.shared.v4.b32 [%r35], {%r9924, %r9922, %r9923, %r9921}; 2026-02-21T12:40:23.8411519Z st.shared.v4.b32 [%r35+8192], {%r9930, %r9928, %r9929, %r9927}; 2026-02-21T12:40:23.8411626Z st.shared.v4.b32 [%r36], {%r9936, %r9934, %r9935, %r9933}; 2026-02-21T12:40:23.8411737Z st.shared.v4.b32 [%r36+8192], {%r9942, %r9940, %r9941, %r9939}; 2026-02-21T12:40:23.8411791Z $L__tmp15: 2026-02-21T12:40:23.8412066Z .loc 2 291 36 // standard.py:291:36 @[ cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:87:40 ] 2026-02-21T12:40:23.8412130Z // begin inline asm 2026-02-21T12:40:23.8412257Z fence.proxy.async.shared::cta; 2026-02-21T12:40:23.8412377Z // end inline asm 2026-02-21T12:40:23.8412437Z bar.sync 0; 2026-02-21T12:40:23.8412518Z shfl.sync.idx.b32 %r9943, %r2, 0, 31, -1; 2026-02-21T12:40:23.8412591Z wgmma.fence.sync.aligned; 2026-02-21T12:40:23.8412659Z mov.pred %p21, -1; 2026-02-21T12:40:23.8412717Z // begin inline asm 2026-02-21T12:40:23.8415435Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r21586,%r21587,%r21588,%r21589,%r21590,%r21591,%r21592,%r21593,%r21594,%r21595,%r21596,%r21597,%r21598,%r21599,%r21600,%r21601,%r21602,%r21603,%r21604,%r21605,%r21606,%r21607,%r21608,%r21609,%r21610,%r21611,%r21612,%r21613,%r21614,%r21615,%r21616,%r21617,%r21618,%r21619,%r21620,%r21621,%r21622,%r21623,%r21624,%r21625,%r21626,%r21627,%r21628,%r21629,%r21630,%r21631,%r21632,%r21633,%r21634,%r21635,%r21636,%r21637,%r21638,%r21639,%r21640,%r21641,%r21642,%r21643,%r21644,%r21645,%r21646,%r21647,%r21648,%r21649,%r21650,%r21651,%r21652,%r21653,%r21654,%r21655,%r21656,%r21657,%r21658,%r21659,%r21660,%r21661,%r21662,%r21663,%r21664,%r21665,%r21666,%r21667,%r21668,%r21669,%r21670,%r21671,%r21672,%r21673,%r21674,%r21675,%r21676,%r21677,%r21678,%r21679,%r21680,%r21681,%r21682,%r21683,%r21684,%r21685,%r21686,%r21687,%r21688,%r21689,%r21690,%r21691,%r21692,%r21693,%r21694,%r21695,%r21696,%r21697,%r21698,%r21699,%r21700,%r21701,%r21702,%r21703,%r21704,%r21705,%r21706,%r21707,%r21708,%r21709,%r21710,%r21711,%r21712,%r21713}, {%r9337,%r9338,%r9339,%r9340}, %rd23, %p21, 1, 1; 2026-02-21T12:40:23.8415497Z // end inline asm 2026-02-21T12:40:23.8415556Z // begin inline asm 2026-02-21T12:40:23.8418404Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r21586,%r21587,%r21588,%r21589,%r21590,%r21591,%r21592,%r21593,%r21594,%r21595,%r21596,%r21597,%r21598,%r21599,%r21600,%r21601,%r21602,%r21603,%r21604,%r21605,%r21606,%r21607,%r21608,%r21609,%r21610,%r21611,%r21612,%r21613,%r21614,%r21615,%r21616,%r21617,%r21618,%r21619,%r21620,%r21621,%r21622,%r21623,%r21624,%r21625,%r21626,%r21627,%r21628,%r21629,%r21630,%r21631,%r21632,%r21633,%r21634,%r21635,%r21636,%r21637,%r21638,%r21639,%r21640,%r21641,%r21642,%r21643,%r21644,%r21645,%r21646,%r21647,%r21648,%r21649,%r21650,%r21651,%r21652,%r21653,%r21654,%r21655,%r21656,%r21657,%r21658,%r21659,%r21660,%r21661,%r21662,%r21663,%r21664,%r21665,%r21666,%r21667,%r21668,%r21669,%r21670,%r21671,%r21672,%r21673,%r21674,%r21675,%r21676,%r21677,%r21678,%r21679,%r21680,%r21681,%r21682,%r21683,%r21684,%r21685,%r21686,%r21687,%r21688,%r21689,%r21690,%r21691,%r21692,%r21693,%r21694,%r21695,%r21696,%r21697,%r21698,%r21699,%r21700,%r21701,%r21702,%r21703,%r21704,%r21705,%r21706,%r21707,%r21708,%r21709,%r21710,%r21711,%r21712,%r21713}, {%r9597,%r9598,%r9599,%r9600}, %rd24, %p21, 1, 1; 2026-02-21T12:40:23.8418474Z // end inline asm 2026-02-21T12:40:23.8418551Z wgmma.commit_group.sync.aligned; 2026-02-21T12:40:23.8418609Z mov.b32 %r9730, 0; 2026-02-21T12:40:23.8418745Z mov.b32 %r9731, %r9730; 2026-02-21T12:40:23.8418808Z mov.b32 %r9729, %r18409; 2026-02-21T12:40:23.8418866Z // begin inline asm 2026-02-21T12:40:23.8421575Z // wait for regs: 
%r21586,%r21587,%r21588,%r21589,%r21590,%r21591,%r21592,%r21593,%r21594,%r21595,%r21596,%r21597,%r21598,%r21599,%r21600,%r21601,%r21602,%r21603,%r21604,%r21605,%r21606,%r21607,%r21608,%r21609,%r21610,%r21611,%r21612,%r21613,%r21614,%r21615,%r21616,%r21617,%r21618,%r21619,%r21620,%r21621,%r21622,%r21623,%r21624,%r21625,%r21626,%r21627,%r21628,%r21629,%r21630,%r21631,%r21632,%r21633,%r21634,%r21635,%r21636,%r21637,%r21638,%r21639,%r21640,%r21641,%r21642,%r21643,%r21644,%r21645,%r21646,%r21647,%r21648,%r21649,%r21650,%r21651,%r21652,%r21653,%r21654,%r21655,%r21656,%r21657,%r21658,%r21659,%r21660,%r21661,%r21662,%r21663,%r21664,%r21665,%r21666,%r21667,%r21668,%r21669,%r21670,%r21671,%r21672,%r21673,%r21674,%r21675,%r21676,%r21677,%r21678,%r21679,%r21680,%r21681,%r21682,%r21683,%r21684,%r21685,%r21686,%r21687,%r21688,%r21689,%r21690,%r21691,%r21692,%r21693,%r21694,%r21695,%r21696,%r21697,%r21698,%r21699,%r21700,%r21701,%r21702,%r21703,%r21704,%r21705,%r21706,%r21707,%r21708,%r21709,%r21710,%r21711,%r21712,%r21713,%r9729,%r9730,%r9731 2026-02-21T12:40:23.8421662Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:40:23.8421719Z // end inline asm 2026-02-21T12:40:23.8421776Z $L__tmp16: 2026-02-21T12:40:23.8421991Z .loc 1 43 126 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:43:126 2026-02-21T12:40:23.8422055Z add.s64 %rd626, %rd626, 8; 2026-02-21T12:40:23.8422121Z add.s64 %rd625, %rd625, 32; 2026-02-21T12:40:23.8422186Z add.s64 %rd624, %rd624, 10240; 2026-02-21T12:40:23.8422252Z setp.lt.u64 %p23, %rd626, 4088; 2026-02-21T12:40:23.8422313Z @%p23 bra $L__BB0_23; 2026-02-21T12:40:23.8422430Z // %bb.24: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:40:23.8422634Z .loc 1 34 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:34:32 2026-02-21T12:40:23.8422700Z or.b64 %rd314, %rd69, %rd5; 2026-02-21T12:40:23.8422765Z or.b64 %rd315, %rd69, %rd6; 2026-02-21T12:40:23.8422832Z or.b64 %rd316, %rd69, %rd7; 2026-02-21T12:40:23.8422892Z or.b64 %rd317, %rd69, %rd8; 
2026-02-21T12:40:23.8422955Z or.b64 %rd318, %rd69, %rd9; 2026-02-21T12:40:23.8423019Z or.b64 %rd319, %rd69, %rd10; 2026-02-21T12:40:23.8423081Z or.b64 %rd320, %rd69, %rd11; 2026-02-21T12:40:23.8423140Z or.b64 %rd321, %rd69, %rd12; 2026-02-21T12:40:23.8423215Z or.b64 %rd322, %rd69, %rd13; 2026-02-21T12:40:23.8423277Z or.b64 %rd323, %rd69, %rd14; 2026-02-21T12:40:23.8423337Z or.b64 %rd324, %rd69, %rd15; 2026-02-21T12:40:23.8423400Z or.b64 %rd325, %rd69, %rd16; 2026-02-21T12:40:23.8423459Z or.b64 %rd326, %rd69, %rd17; 2026-02-21T12:40:23.8423520Z or.b64 %rd327, %rd69, %rd18; 2026-02-21T12:40:23.8423580Z or.b64 %rd328, %rd69, %rd19; 2026-02-21T12:40:23.8423644Z or.b64 %rd329, %rd69, %rd20; 2026-02-21T12:40:23.8423845Z .loc 1 36 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:36:32 2026-02-21T12:40:23.8423907Z or.b64 %rd330, %rd70, %rd22; 2026-02-21T12:40:23.8424105Z .loc 1 90 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:90:28 2026-02-21T12:40:23.8424193Z cvt.rn.bf16x2.f32 %r10088, %r21587, %r21586; 2026-02-21T12:40:23.8424273Z cvt.rn.bf16x2.f32 %r10089, %r21589, %r21588; 2026-02-21T12:40:23.8424352Z cvt.rn.bf16x2.f32 %r10090, %r21591, %r21590; 2026-02-21T12:40:23.8424428Z cvt.rn.bf16x2.f32 %r10091, %r21593, %r21592; 2026-02-21T12:40:23.8424503Z cvt.rn.bf16x2.f32 %r10092, %r21595, %r21594; 2026-02-21T12:40:23.8424579Z cvt.rn.bf16x2.f32 %r10093, %r21597, %r21596; 2026-02-21T12:40:23.8424656Z cvt.rn.bf16x2.f32 %r10094, %r21599, %r21598; 2026-02-21T12:40:23.8424731Z cvt.rn.bf16x2.f32 %r10095, %r21601, %r21600; 2026-02-21T12:40:23.8424805Z cvt.rn.bf16x2.f32 %r10096, %r21603, %r21602; 2026-02-21T12:40:23.8424884Z cvt.rn.bf16x2.f32 %r10097, %r21605, %r21604; 2026-02-21T12:40:23.8425019Z cvt.rn.bf16x2.f32 %r10098, %r21607, %r21606; 2026-02-21T12:40:23.8425096Z cvt.rn.bf16x2.f32 %r10099, %r21609, %r21608; 2026-02-21T12:40:23.8425221Z cvt.rn.bf16x2.f32 %r10100, %r21611, %r21610; 2026-02-21T12:40:23.8425298Z cvt.rn.bf16x2.f32 %r10101, %r21613, 
%r21612; 2026-02-21T12:40:23.8425375Z cvt.rn.bf16x2.f32 %r10102, %r21615, %r21614; 2026-02-21T12:40:23.8425448Z cvt.rn.bf16x2.f32 %r10103, %r21617, %r21616; 2026-02-21T12:40:23.8425528Z cvt.rn.bf16x2.f32 %r10104, %r21619, %r21618; 2026-02-21T12:40:23.8425616Z cvt.rn.bf16x2.f32 %r10105, %r21621, %r21620; 2026-02-21T12:40:23.8425693Z cvt.rn.bf16x2.f32 %r10106, %r21623, %r21622; 2026-02-21T12:40:23.8425774Z cvt.rn.bf16x2.f32 %r10107, %r21625, %r21624; 2026-02-21T12:40:23.8425849Z cvt.rn.bf16x2.f32 %r10108, %r21627, %r21626; 2026-02-21T12:40:23.8425923Z cvt.rn.bf16x2.f32 %r10109, %r21629, %r21628; 2026-02-21T12:40:23.8426000Z cvt.rn.bf16x2.f32 %r10110, %r21631, %r21630; 2026-02-21T12:40:23.8426076Z cvt.rn.bf16x2.f32 %r10111, %r21633, %r21632; 2026-02-21T12:40:23.8426152Z cvt.rn.bf16x2.f32 %r10112, %r21635, %r21634; 2026-02-21T12:40:23.8426320Z cvt.rn.bf16x2.f32 %r10113, %r21637, %r21636; 2026-02-21T12:40:23.8426403Z cvt.rn.bf16x2.f32 %r10114, %r21639, %r21638; 2026-02-21T12:40:23.8426590Z cvt.rn.bf16x2.f32 %r10115, %r21641, %r21640; 2026-02-21T12:40:23.8426672Z cvt.rn.bf16x2.f32 %r10116, %r21643, %r21642; 2026-02-21T12:40:23.8426751Z cvt.rn.bf16x2.f32 %r10117, %r21645, %r21644; 2026-02-21T12:40:23.8426840Z cvt.rn.bf16x2.f32 %r10118, %r21647, %r21646; 2026-02-21T12:40:23.8426916Z cvt.rn.bf16x2.f32 %r10119, %r21649, %r21648; 2026-02-21T12:40:23.8426992Z cvt.rn.bf16x2.f32 %r10120, %r21651, %r21650; 2026-02-21T12:40:23.8427072Z cvt.rn.bf16x2.f32 %r10121, %r21653, %r21652; 2026-02-21T12:40:23.8427149Z cvt.rn.bf16x2.f32 %r10122, %r21655, %r21654; 2026-02-21T12:40:23.8427223Z cvt.rn.bf16x2.f32 %r10123, %r21657, %r21656; 2026-02-21T12:40:23.8427304Z cvt.rn.bf16x2.f32 %r10124, %r21659, %r21658; 2026-02-21T12:40:23.8427391Z cvt.rn.bf16x2.f32 %r10125, %r21661, %r21660; 2026-02-21T12:40:23.8427468Z cvt.rn.bf16x2.f32 %r10126, %r21663, %r21662; 2026-02-21T12:40:23.8427550Z cvt.rn.bf16x2.f32 %r10127, %r21665, %r21664; 2026-02-21T12:40:23.8427627Z cvt.rn.bf16x2.f32 %r10128, %r21667, 
%r21666; 2026-02-21T12:40:23.8427702Z cvt.rn.bf16x2.f32 %r10129, %r21669, %r21668; 2026-02-21T12:40:23.8427776Z cvt.rn.bf16x2.f32 %r10130, %r21671, %r21670; 2026-02-21T12:40:23.8427856Z cvt.rn.bf16x2.f32 %r10131, %r21673, %r21672; 2026-02-21T12:40:23.8427932Z cvt.rn.bf16x2.f32 %r10132, %r21675, %r21674; 2026-02-21T12:40:23.8428008Z cvt.rn.bf16x2.f32 %r10133, %r21677, %r21676; 2026-02-21T12:40:23.8428088Z cvt.rn.bf16x2.f32 %r10134, %r21679, %r21678; 2026-02-21T12:40:23.8428163Z cvt.rn.bf16x2.f32 %r10135, %r21681, %r21680; 2026-02-21T12:40:23.8428240Z cvt.rn.bf16x2.f32 %r10136, %r21683, %r21682; 2026-02-21T12:40:23.8428318Z cvt.rn.bf16x2.f32 %r10137, %r21685, %r21684; 2026-02-21T12:40:23.8428464Z cvt.rn.bf16x2.f32 %r10138, %r21687, %r21686; 2026-02-21T12:40:23.8428540Z cvt.rn.bf16x2.f32 %r10139, %r21689, %r21688; 2026-02-21T12:40:23.8428619Z cvt.rn.bf16x2.f32 %r10140, %r21691, %r21690; 2026-02-21T12:40:23.8428698Z cvt.rn.bf16x2.f32 %r10141, %r21693, %r21692; 2026-02-21T12:40:23.8428773Z cvt.rn.bf16x2.f32 %r10142, %r21695, %r21694; 2026-02-21T12:40:23.8428848Z cvt.rn.bf16x2.f32 %r10143, %r21697, %r21696; 2026-02-21T12:40:23.8428925Z cvt.rn.bf16x2.f32 %r10144, %r21699, %r21698; 2026-02-21T12:40:23.8428999Z cvt.rn.bf16x2.f32 %r10145, %r21701, %r21700; 2026-02-21T12:40:23.8429073Z cvt.rn.bf16x2.f32 %r10146, %r21703, %r21702; 2026-02-21T12:40:23.8429152Z cvt.rn.bf16x2.f32 %r10147, %r21705, %r21704; 2026-02-21T12:40:23.8429226Z cvt.rn.bf16x2.f32 %r10148, %r21707, %r21706; 2026-02-21T12:40:23.8429300Z cvt.rn.bf16x2.f32 %r10149, %r21709, %r21708; 2026-02-21T12:40:23.8429387Z cvt.rn.bf16x2.f32 %r10150, %r21711, %r21710; 2026-02-21T12:40:23.8429468Z cvt.rn.bf16x2.f32 %r10151, %r21713, %r21712; 2026-02-21T12:40:23.8429754Z .loc 1 91 22 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:91:22 2026-02-21T12:40:23.8429890Z mad.lo.s64 %rd331, %rd314, 2560, %rd162; 2026-02-21T12:40:23.8429964Z shl.b64 %rd332, %rd330, 1; 2026-02-21T12:40:23.8430034Z add.s64 %rd298, %rd331, 
%rd332; 2026-02-21T12:40:23.8430106Z mad.lo.s64 %rd333, %rd315, 2560, %rd162; 2026-02-21T12:40:23.8430173Z add.s64 %rd299, %rd333, %rd332; 2026-02-21T12:40:23.8430243Z mad.lo.s64 %rd334, %rd316, 2560, %rd162; 2026-02-21T12:40:23.8430304Z add.s64 %rd300, %rd334, %rd332; 2026-02-21T12:40:23.8430373Z mad.lo.s64 %rd335, %rd317, 2560, %rd162; 2026-02-21T12:40:23.8430437Z add.s64 %rd301, %rd335, %rd332; 2026-02-21T12:40:23.8430505Z mad.lo.s64 %rd336, %rd318, 2560, %rd162; 2026-02-21T12:40:23.8430569Z add.s64 %rd302, %rd336, %rd332; 2026-02-21T12:40:23.8430641Z mad.lo.s64 %rd337, %rd319, 2560, %rd162; 2026-02-21T12:40:23.8430704Z add.s64 %rd303, %rd337, %rd332; 2026-02-21T12:40:23.8430773Z mad.lo.s64 %rd338, %rd320, 2560, %rd162; 2026-02-21T12:40:23.8430836Z add.s64 %rd304, %rd338, %rd332; 2026-02-21T12:40:23.8431036Z mad.lo.s64 %rd339, %rd321, 2560, %rd162; 2026-02-21T12:40:23.8431104Z add.s64 %rd305, %rd339, %rd332; 2026-02-21T12:40:23.8431177Z mad.lo.s64 %rd340, %rd322, 2560, %rd162; 2026-02-21T12:40:23.8431243Z add.s64 %rd306, %rd340, %rd332; 2026-02-21T12:40:23.8431315Z mad.lo.s64 %rd341, %rd323, 2560, %rd162; 2026-02-21T12:40:23.8431381Z add.s64 %rd307, %rd341, %rd332; 2026-02-21T12:40:23.8431453Z mad.lo.s64 %rd342, %rd324, 2560, %rd162; 2026-02-21T12:40:23.8431514Z add.s64 %rd308, %rd342, %rd332; 2026-02-21T12:40:23.8431583Z mad.lo.s64 %rd343, %rd325, 2560, %rd162; 2026-02-21T12:40:23.8431645Z add.s64 %rd309, %rd343, %rd332; 2026-02-21T12:40:23.8431717Z mad.lo.s64 %rd344, %rd326, 2560, %rd162; 2026-02-21T12:40:23.8431778Z add.s64 %rd310, %rd344, %rd332; 2026-02-21T12:40:23.8431846Z mad.lo.s64 %rd345, %rd327, 2560, %rd162; 2026-02-21T12:40:23.8431913Z add.s64 %rd311, %rd345, %rd332; 2026-02-21T12:40:23.8431980Z mad.lo.s64 %rd346, %rd328, 2560, %rd162; 2026-02-21T12:40:23.8432044Z add.s64 %rd312, %rd346, %rd332; 2026-02-21T12:40:23.8432114Z mad.lo.s64 %rd347, %rd329, 2560, %rd162; 2026-02-21T12:40:23.8432180Z add.s64 %rd313, %rd347, %rd332; 
2026-02-21T12:40:23.8432381Z .loc 1 91 81 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:91:81 2026-02-21T12:40:23.8432439Z bar.sync 0; 2026-02-21T12:40:23.8432562Z st.shared.v4.b32 [%r37], {%r10088, %r10090, %r10092, %r10094}; 2026-02-21T12:40:23.8432675Z st.shared.v4.b32 [%r38], {%r10096, %r10098, %r10100, %r10102}; 2026-02-21T12:40:23.8432783Z st.shared.v4.b32 [%r39], {%r10104, %r10106, %r10108, %r10110}; 2026-02-21T12:40:23.8432893Z st.shared.v4.b32 [%r40], {%r10112, %r10114, %r10116, %r10118}; 2026-02-21T12:40:23.8433002Z st.shared.v4.b32 [%r41], {%r10120, %r10122, %r10124, %r10126}; 2026-02-21T12:40:23.8433107Z st.shared.v4.b32 [%r42], {%r10128, %r10130, %r10132, %r10134}; 2026-02-21T12:40:23.8433218Z st.shared.v4.b32 [%r43], {%r10136, %r10138, %r10140, %r10142}; 2026-02-21T12:40:23.8433339Z st.shared.v4.b32 [%r44], {%r10144, %r10146, %r10148, %r10150}; 2026-02-21T12:40:23.8433397Z bar.sync 0; 2026-02-21T12:40:23.8433459Z // begin inline asm 2026-02-21T12:40:23.8433659Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9944, %r9945, %r9946, %r9947}, [%r6254]; 2026-02-21T12:40:23.8433718Z // end inline asm 2026-02-21T12:40:23.8433777Z // begin inline asm 2026-02-21T12:40:23.8433965Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9949, %r9950, %r9951, %r9952}, [%r6259]; 2026-02-21T12:40:23.8434022Z // end inline asm 2026-02-21T12:40:23.8434080Z // begin inline asm 2026-02-21T12:40:23.8434264Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9954, %r9955, %r9956, %r9957}, [%r6264]; 2026-02-21T12:40:23.8434321Z // end inline asm 2026-02-21T12:40:23.8434388Z // begin inline asm 2026-02-21T12:40:23.8434570Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9959, %r9960, %r9961, %r9962}, [%r6269]; 2026-02-21T12:40:23.8434691Z // end inline asm 2026-02-21T12:40:23.8434748Z // begin inline asm 2026-02-21T12:40:23.8434974Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9964, %r9965, %r9966, %r9967}, [%r6274]; 2026-02-21T12:40:23.8435040Z // end inline asm 
2026-02-21T12:40:23.8435100Z // begin inline asm 2026-02-21T12:40:23.8435283Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9969, %r9970, %r9971, %r9972}, [%r6279]; 2026-02-21T12:40:23.8435342Z // end inline asm 2026-02-21T12:40:23.8446598Z // begin inline asm 2026-02-21T12:40:23.8446839Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9974, %r9975, %r9976, %r9977}, [%r6284]; 2026-02-21T12:40:23.8446901Z // end inline asm 2026-02-21T12:40:23.8446965Z // begin inline asm 2026-02-21T12:40:23.8447161Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9979, %r9980, %r9981, %r9982}, [%r6289]; 2026-02-21T12:40:23.8447220Z // end inline asm 2026-02-21T12:40:23.8447276Z bar.sync 0; 2026-02-21T12:40:23.8447413Z st.shared.v4.b32 [%r37], {%r10089, %r10091, %r10093, %r10095}; 2026-02-21T12:40:23.8447529Z st.shared.v4.b32 [%r38], {%r10097, %r10099, %r10101, %r10103}; 2026-02-21T12:40:23.8447843Z st.shared.v4.b32 [%r39], {%r10105, %r10107, %r10109, %r10111}; 2026-02-21T12:40:23.8447965Z st.shared.v4.b32 [%r40], {%r10113, %r10115, %r10117, %r10119}; 2026-02-21T12:40:23.8448073Z st.shared.v4.b32 [%r41], {%r10121, %r10123, %r10125, %r10127}; 2026-02-21T12:40:23.8448179Z st.shared.v4.b32 [%r42], {%r10129, %r10131, %r10133, %r10135}; 2026-02-21T12:40:23.8448288Z st.shared.v4.b32 [%r43], {%r10137, %r10139, %r10141, %r10143}; 2026-02-21T12:40:23.8448392Z st.shared.v4.b32 [%r44], {%r10145, %r10147, %r10149, %r10151}; 2026-02-21T12:40:23.8448449Z bar.sync 0; 2026-02-21T12:40:23.8448510Z // begin inline asm 2026-02-21T12:40:23.8448704Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9984, %r9985, %r9986, %r9987}, [%r6254]; 2026-02-21T12:40:23.8448762Z // end inline asm 2026-02-21T12:40:23.8448820Z // begin inline asm 2026-02-21T12:40:23.8449015Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9989, %r9990, %r9991, %r9992}, [%r6259]; 2026-02-21T12:40:23.8449072Z // end inline asm 2026-02-21T12:40:23.8449133Z // begin inline asm 2026-02-21T12:40:23.8449314Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9994, 
%r9995, %r9996, %r9997}, [%r6264]; 2026-02-21T12:40:23.8449368Z // end inline asm 2026-02-21T12:40:23.8449438Z // begin inline asm 2026-02-21T12:40:23.8449632Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r9999, %r10000, %r10001, %r10002}, [%r6269]; 2026-02-21T12:40:23.8449694Z // end inline asm 2026-02-21T12:40:23.8449752Z // begin inline asm 2026-02-21T12:40:23.8449941Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10004, %r10005, %r10006, %r10007}, [%r6274]; 2026-02-21T12:40:23.8449999Z // end inline asm 2026-02-21T12:40:23.8450057Z // begin inline asm 2026-02-21T12:40:23.8450244Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10009, %r10010, %r10011, %r10012}, [%r6279]; 2026-02-21T12:40:23.8450308Z // end inline asm 2026-02-21T12:40:23.8450366Z // begin inline asm 2026-02-21T12:40:23.8450550Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10014, %r10015, %r10016, %r10017}, [%r6284]; 2026-02-21T12:40:23.8450609Z // end inline asm 2026-02-21T12:40:23.8450668Z // begin inline asm 2026-02-21T12:40:23.8450854Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r10019, %r10020, %r10021, %r10022}, [%r6289]; 2026-02-21T12:40:23.8450910Z // end inline asm 2026-02-21T12:40:23.8450970Z // begin inline asm 2026-02-21T12:40:23.8451096Z st.global.v4.b32 [ %rd298 + 0 ], { %r9944, %r9945, %r9946, %r9947 }; 2026-02-21T12:40:23.8451153Z // end inline asm 2026-02-21T12:40:23.8451211Z // begin inline asm 2026-02-21T12:40:23.8451333Z st.global.v4.b32 [ %rd299 + 0 ], { %r9949, %r9950, %r9951, %r9952 }; 2026-02-21T12:40:23.8451388Z // end inline asm 2026-02-21T12:40:23.8451446Z // begin inline asm 2026-02-21T12:40:23.8451570Z st.global.v4.b32 [ %rd300 + 0 ], { %r9984, %r9985, %r9986, %r9987 }; 2026-02-21T12:40:23.8451722Z // end inline asm 2026-02-21T12:40:23.8451778Z // begin inline asm 2026-02-21T12:40:23.8451896Z st.global.v4.b32 [ %rd301 + 0 ], { %r9989, %r9990, %r9991, %r9992 }; 2026-02-21T12:40:23.8452049Z // end inline asm 2026-02-21T12:40:23.8452106Z // begin inline asm 
2026-02-21T12:40:23.8452219Z st.global.v4.b32 [ %rd302 + 0 ], { %r9954, %r9955, %r9956, %r9957 }; 2026-02-21T12:40:23.8452279Z // end inline asm 2026-02-21T12:40:23.8452335Z // begin inline asm 2026-02-21T12:40:23.8452447Z st.global.v4.b32 [ %rd303 + 0 ], { %r9959, %r9960, %r9961, %r9962 }; 2026-02-21T12:40:23.8452505Z // end inline asm 2026-02-21T12:40:23.8452560Z // begin inline asm 2026-02-21T12:40:23.8452670Z st.global.v4.b32 [ %rd304 + 0 ], { %r9994, %r9995, %r9996, %r9997 }; 2026-02-21T12:40:23.8452723Z // end inline asm 2026-02-21T12:40:23.8452781Z // begin inline asm 2026-02-21T12:40:23.8452902Z st.global.v4.b32 [ %rd305 + 0 ], { %r9999, %r10000, %r10001, %r10002 }; 2026-02-21T12:40:23.8452956Z // end inline asm 2026-02-21T12:40:23.8453019Z // begin inline asm 2026-02-21T12:40:23.8453130Z st.global.v4.b32 [ %rd306 + 0 ], { %r9964, %r9965, %r9966, %r9967 }; 2026-02-21T12:40:23.8453234Z // end inline asm 2026-02-21T12:40:23.8453353Z // begin inline asm 2026-02-21T12:40:23.8453470Z st.global.v4.b32 [ %rd307 + 0 ], { %r9969, %r9970, %r9971, %r9972 }; 2026-02-21T12:40:23.8453526Z // end inline asm 2026-02-21T12:40:23.8453584Z // begin inline asm 2026-02-21T12:40:23.8453710Z st.global.v4.b32 [ %rd308 + 0 ], { %r10004, %r10005, %r10006, %r10007 }; 2026-02-21T12:40:23.8453765Z // end inline asm 2026-02-21T12:40:23.8453823Z // begin inline asm 2026-02-21T12:40:23.8453943Z st.global.v4.b32 [ %rd309 + 0 ], { %r10009, %r10010, %r10011, %r10012 }; 2026-02-21T12:40:23.8453999Z // end inline asm 2026-02-21T12:40:23.8454057Z // begin inline asm 2026-02-21T12:40:23.8454179Z st.global.v4.b32 [ %rd310 + 0 ], { %r9974, %r9975, %r9976, %r9977 }; 2026-02-21T12:40:23.8454240Z // end inline asm 2026-02-21T12:40:23.8454299Z // begin inline asm 2026-02-21T12:40:23.8454419Z st.global.v4.b32 [ %rd311 + 0 ], { %r9979, %r9980, %r9981, %r9982 }; 2026-02-21T12:40:23.8454480Z // end inline asm 2026-02-21T12:40:23.8454541Z // begin inline asm 2026-02-21T12:40:23.8454670Z st.global.v4.b32 [ 
%rd312 + 0 ], { %r10014, %r10015, %r10016, %r10017 }; 2026-02-21T12:40:23.8454730Z // end inline asm 2026-02-21T12:40:23.8454790Z // begin inline asm 2026-02-21T12:40:23.8454931Z st.global.v4.b32 [ %rd313 + 0 ], { %r10019, %r10020, %r10021, %r10022 }; 2026-02-21T12:40:23.8454988Z // end inline asm 2026-02-21T12:40:23.8455215Z .loc 1 22 120 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:22:120 2026-02-21T12:40:23.8455282Z add.s64 %rd348, %rd612, 2; 2026-02-21T12:40:23.8455487Z .loc 1 28 35 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:28:35 2026-02-21T12:40:23.8455581Z mul.hi.u64 %rd349, %rd348, -3689348814741910323; 2026-02-21T12:40:23.8455645Z shr.u64 %rd350, %rd349, 5; 2026-02-21T12:40:23.8455842Z .loc 1 29 33 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:29:33 2026-02-21T12:40:23.8455906Z shl.b64 %rd88, %rd350, 3; 2026-02-21T12:40:23.8456101Z .loc 1 30 39 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:30:39 2026-02-21T12:40:23.8456165Z sub.s64 %rd351, 4096, %rd88; 2026-02-21T12:40:23.8456355Z .loc 1 30 52 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:30:52 2026-02-21T12:40:23.8456420Z min.s64 %rd89, %rd351, 8; 2026-02-21T12:40:23.8456735Z .loc 1 31 45 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:31:45 2026-02-21T12:40:23.8456806Z mul.lo.s64 %rd352, %rd350, 40; 2026-02-21T12:40:23.8456874Z sub.s64 %rd90, %rd348, %rd352; 2026-02-21T12:40:23.8457069Z .loc 1 32 51 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:32:51 2026-02-21T12:40:23.8457142Z or.b64 %rd353, %rd90, %rd89; 2026-02-21T12:40:23.8457295Z and.b64 %rd354, %rd353, -4294967296; 2026-02-21T12:40:23.8457362Z setp.ne.b64 %p24, %rd354, 0; 2026-02-21T12:40:23.8457423Z @%p24 bra $L__BB0_26; 2026-02-21T12:40:23.8457550Z bra.uni $L__BB0_25; 2026-02-21T12:40:23.8457673Z $L__BB0_26: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:40:23.8457739Z div.s64 %rd627, %rd90, %rd89; 2026-02-21T12:40:23.8457798Z bra.uni $L__BB0_27; 
2026-02-21T12:40:23.8457906Z $L__BB0_25: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:40:23.8457971Z cvt.u32.u64 %r10152, %rd89; 2026-02-21T12:40:23.8458030Z cvt.u32.u64 %r10153, %rd90; 2026-02-21T12:40:23.8458099Z div.u32 %r10154, %r10153, %r10152; 2026-02-21T12:40:23.8458163Z cvt.u64.u32 %rd627, %r10154; 2026-02-21T12:40:23.8458266Z $L__BB0_27: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:40:23.8458460Z .loc 1 31 64 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:31:64 2026-02-21T12:40:23.8458534Z mul.lo.s64 %rd356, %rd627, %rd89; 2026-02-21T12:40:23.8458599Z sub.s64 %rd357, %rd90, %rd356; 2026-02-21T12:40:23.8458914Z .loc 1 31 30 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:31:30 2026-02-21T12:40:23.8458990Z add.s64 %rd358, %rd357, %rd88; 2026-02-21T12:40:23.8459197Z .loc 1 33 27 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:33:27 2026-02-21T12:40:23.8459261Z shl.b64 %rd94, %rd358, 6; 2026-02-21T12:40:23.8459456Z .loc 1 35 27 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:35:27 2026-02-21T12:40:23.8459516Z shl.b64 %rd95, %rd627, 8; 2026-02-21T12:40:23.8459725Z .loc 1 43 126 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:43:126 2026-02-21T12:40:23.8459788Z shl.b64 %rd96, %rd358, 20; 2026-02-21T12:40:23.8459856Z add.s64 %rd629, %rd25, %rd96; 2026-02-21T12:40:23.8459915Z add.s64 %rd628, %rd26, %rd95; 2026-02-21T12:40:23.8459980Z mov.b32 %r21842, 0f00000000; 2026-02-21T12:40:23.8460045Z mov.b64 %rd630, -24; 2026-02-21T12:40:23.8460108Z mov.b32 %r21843, %r21842; 2026-02-21T12:40:23.8460170Z mov.b32 %r21844, %r21842; 2026-02-21T12:40:23.8460235Z mov.b32 %r21845, %r21842; 2026-02-21T12:40:23.8460300Z mov.b32 %r21846, %r21842; 2026-02-21T12:40:23.8460358Z mov.b32 %r21847, %r21842; 2026-02-21T12:40:23.8460417Z mov.b32 %r21848, %r21842; 2026-02-21T12:40:23.8460476Z mov.b32 %r21849, %r21842; 2026-02-21T12:40:23.8460532Z mov.b32 %r21850, %r21842; 2026-02-21T12:40:23.8460589Z mov.b32 %r21851, %r21842; 
2026-02-21T12:40:23.8460647Z mov.b32 %r21852, %r21842; 2026-02-21T12:40:23.8460705Z mov.b32 %r21853, %r21842; 2026-02-21T12:40:23.8460761Z mov.b32 %r21854, %r21842; 2026-02-21T12:40:23.8460819Z mov.b32 %r21855, %r21842; 2026-02-21T12:40:23.8460884Z mov.b32 %r21856, %r21842; 2026-02-21T12:40:23.8460941Z mov.b32 %r21857, %r21842; 2026-02-21T12:40:23.8460999Z mov.b32 %r21858, %r21842; 2026-02-21T12:40:23.8461062Z mov.b32 %r21859, %r21842; 2026-02-21T12:40:23.8461118Z mov.b32 %r21860, %r21842; 2026-02-21T12:40:23.8461176Z mov.b32 %r21861, %r21842; 2026-02-21T12:40:23.8461238Z mov.b32 %r21862, %r21842; 2026-02-21T12:40:23.8461298Z mov.b32 %r21863, %r21842; 2026-02-21T12:40:23.8461355Z mov.b32 %r21864, %r21842; 2026-02-21T12:40:23.8461412Z mov.b32 %r21865, %r21842; 2026-02-21T12:40:23.8461473Z mov.b32 %r21866, %r21842; 2026-02-21T12:40:23.8461531Z mov.b32 %r21867, %r21842; 2026-02-21T12:40:23.8461589Z mov.b32 %r21868, %r21842; 2026-02-21T12:40:23.8461647Z mov.b32 %r21869, %r21842; 2026-02-21T12:40:23.8461708Z mov.b32 %r21870, %r21842; 2026-02-21T12:40:23.8461769Z mov.b32 %r21871, %r21842; 2026-02-21T12:40:23.8461827Z mov.b32 %r21872, %r21842; 2026-02-21T12:40:23.8461887Z mov.b32 %r21873, %r21842; 2026-02-21T12:40:23.8461943Z mov.b32 %r21874, %r21842; 2026-02-21T12:40:23.8462001Z mov.b32 %r21875, %r21842; 2026-02-21T12:40:23.8462058Z mov.b32 %r21876, %r21842; 2026-02-21T12:40:23.8462179Z mov.b32 %r21877, %r21842; 2026-02-21T12:40:23.8462237Z mov.b32 %r21878, %r21842; 2026-02-21T12:40:23.8462294Z mov.b32 %r21879, %r21842; 2026-02-21T12:40:23.8462402Z mov.b32 %r21880, %r21842; 2026-02-21T12:40:23.8462458Z mov.b32 %r21881, %r21842; 2026-02-21T12:40:23.8462514Z mov.b32 %r21882, %r21842; 2026-02-21T12:40:23.8462571Z mov.b32 %r21883, %r21842; 2026-02-21T12:40:23.8462631Z mov.b32 %r21884, %r21842; 2026-02-21T12:40:23.8462687Z mov.b32 %r21885, %r21842; 2026-02-21T12:40:23.8462743Z mov.b32 %r21886, %r21842; 2026-02-21T12:40:23.8462801Z mov.b32 %r21887, %r21842; 
2026-02-21T12:40:23.8462859Z mov.b32 %r21888, %r21842; 2026-02-21T12:40:23.8462915Z mov.b32 %r21889, %r21842; 2026-02-21T12:40:23.8462973Z mov.b32 %r21890, %r21842; 2026-02-21T12:40:23.8463030Z mov.b32 %r21891, %r21842; 2026-02-21T12:40:23.8463086Z mov.b32 %r21892, %r21842; 2026-02-21T12:40:23.8463156Z mov.b32 %r21893, %r21842; 2026-02-21T12:40:23.8463220Z mov.b32 %r21894, %r21842; 2026-02-21T12:40:23.8463282Z mov.b32 %r21895, %r21842; 2026-02-21T12:40:23.8463339Z mov.b32 %r21896, %r21842; 2026-02-21T12:40:23.8463401Z mov.b32 %r21897, %r21842; 2026-02-21T12:40:23.8463550Z mov.b32 %r21898, %r21842; 2026-02-21T12:40:23.8463611Z mov.b32 %r21899, %r21842; 2026-02-21T12:40:23.8463669Z mov.b32 %r21900, %r21842; 2026-02-21T12:40:23.8463731Z mov.b32 %r21901, %r21842; 2026-02-21T12:40:23.8463787Z mov.b32 %r21902, %r21842; 2026-02-21T12:40:23.8463844Z mov.b32 %r21903, %r21842; 2026-02-21T12:40:23.8463904Z mov.b32 %r21904, %r21842; 2026-02-21T12:40:23.8463959Z mov.b32 %r21905, %r21842; 2026-02-21T12:40:23.8464017Z mov.b32 %r21906, %r21842; 2026-02-21T12:40:23.8464074Z mov.b32 %r21907, %r21842; 2026-02-21T12:40:23.8464135Z mov.b32 %r21908, %r21842; 2026-02-21T12:40:23.8464192Z mov.b32 %r21909, %r21842; 2026-02-21T12:40:23.8464248Z mov.b32 %r21910, %r21842; 2026-02-21T12:40:23.8464306Z mov.b32 %r21911, %r21842; 2026-02-21T12:40:23.8464363Z mov.b32 %r21912, %r21842; 2026-02-21T12:40:23.8464421Z mov.b32 %r21913, %r21842; 2026-02-21T12:40:23.8464477Z mov.b32 %r21914, %r21842; 2026-02-21T12:40:23.8464536Z mov.b32 %r21915, %r21842; 2026-02-21T12:40:23.8464596Z mov.b32 %r21916, %r21842; 2026-02-21T12:40:23.8464653Z mov.b32 %r21917, %r21842; 2026-02-21T12:40:23.8464714Z mov.b32 %r21918, %r21842; 2026-02-21T12:40:23.8464773Z mov.b32 %r21919, %r21842; 2026-02-21T12:40:23.8464830Z mov.b32 %r21920, %r21842; 2026-02-21T12:40:23.8464888Z mov.b32 %r21921, %r21842; 2026-02-21T12:40:23.8464949Z mov.b32 %r21922, %r21842; 2026-02-21T12:40:23.8465008Z mov.b32 %r21923, %r21842; 
2026-02-21T12:40:23.8465065Z mov.b32 %r21924, %r21842; 2026-02-21T12:40:23.8465125Z mov.b32 %r21925, %r21842; 2026-02-21T12:40:23.8465181Z mov.b32 %r21926, %r21842; 2026-02-21T12:40:23.8465238Z mov.b32 %r21927, %r21842; 2026-02-21T12:40:23.8465296Z mov.b32 %r21928, %r21842; 2026-02-21T12:40:23.8465357Z mov.b32 %r21929, %r21842; 2026-02-21T12:40:23.8465415Z mov.b32 %r21930, %r21842; 2026-02-21T12:40:23.8465473Z mov.b32 %r21931, %r21842; 2026-02-21T12:40:23.8465536Z mov.b32 %r21932, %r21842; 2026-02-21T12:40:23.8465592Z mov.b32 %r21933, %r21842; 2026-02-21T12:40:23.8465653Z mov.b32 %r21934, %r21842; 2026-02-21T12:40:23.8465713Z mov.b32 %r21935, %r21842; 2026-02-21T12:40:23.8465770Z mov.b32 %r21936, %r21842; 2026-02-21T12:40:23.8465827Z mov.b32 %r21937, %r21842; 2026-02-21T12:40:23.8465884Z mov.b32 %r21938, %r21842; 2026-02-21T12:40:23.8465944Z mov.b32 %r21939, %r21842; 2026-02-21T12:40:23.8466000Z mov.b32 %r21940, %r21842; 2026-02-21T12:40:23.8466056Z mov.b32 %r21941, %r21842; 2026-02-21T12:40:23.8466115Z mov.b32 %r21942, %r21842; 2026-02-21T12:40:23.8466172Z mov.b32 %r21943, %r21842; 2026-02-21T12:40:23.8466227Z mov.b32 %r21944, %r21842; 2026-02-21T12:40:23.8466283Z mov.b32 %r21945, %r21842; 2026-02-21T12:40:23.8466344Z mov.b32 %r21946, %r21842; 2026-02-21T12:40:23.8466402Z mov.b32 %r21947, %r21842; 2026-02-21T12:40:23.8466580Z mov.b32 %r21948, %r21842; 2026-02-21T12:40:23.8466732Z mov.b32 %r21949, %r21842; 2026-02-21T12:40:23.8466789Z mov.b32 %r21950, %r21842; 2026-02-21T12:40:23.8466846Z mov.b32 %r21951, %r21842; 2026-02-21T12:40:23.8466969Z mov.b32 %r21952, %r21842; 2026-02-21T12:40:23.8467035Z mov.b32 %r21953, %r21842; 2026-02-21T12:40:23.8467098Z mov.b32 %r21954, %r21842; 2026-02-21T12:40:23.8467159Z mov.b32 %r21955, %r21842; 2026-02-21T12:40:23.8467218Z mov.b32 %r21956, %r21842; 2026-02-21T12:40:23.8467276Z mov.b32 %r21957, %r21842; 2026-02-21T12:40:23.8467331Z mov.b32 %r21958, %r21842; 2026-02-21T12:40:23.8467387Z mov.b32 %r21959, %r21842; 
2026-02-21T12:40:23.8467449Z mov.b32 %r21960, %r21842; 2026-02-21T12:40:23.8467507Z mov.b32 %r21961, %r21842; 2026-02-21T12:40:23.8467562Z mov.b32 %r21962, %r21842; 2026-02-21T12:40:23.8467622Z mov.b32 %r21963, %r21842; 2026-02-21T12:40:23.8467680Z mov.b32 %r21964, %r21842; 2026-02-21T12:40:23.8467735Z mov.b32 %r21965, %r21842; 2026-02-21T12:40:23.8467792Z mov.b32 %r21966, %r21842; 2026-02-21T12:40:23.8467852Z mov.b32 %r21967, %r21842; 2026-02-21T12:40:23.8467910Z mov.b32 %r21968, %r21842; 2026-02-21T12:40:23.8467967Z mov.b32 %r21969, %r21842; 2026-02-21T12:40:23.8468212Z $L__BB0_28: // Parent Loop BB0_2 Depth=1 2026-02-21T12:40:23.8468392Z // => This Inner Loop Header: Depth=2 2026-02-21T12:40:23.8468609Z .loc 1 51 80 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:51:80 2026-02-21T12:40:23.8468679Z add.s64 %rd360, %rd629, -64; 2026-02-21T12:40:23.8468738Z // begin inline asm 2026-02-21T12:40:23.8468797Z mov.u64 %rd359, 0x0; 2026-02-21T12:40:23.8468929Z createpolicy.fractional.L2::evict_last.b64 %rd359, 1.0; 2026-02-21T12:40:23.8468989Z // end inline asm 2026-02-21T12:40:23.8469048Z // begin inline asm 2026-02-21T12:40:23.8469107Z mov.u32 %r10156, 0x0; 2026-02-21T12:40:23.8469165Z mov.u32 %r10157, 0x0; 2026-02-21T12:40:23.8469225Z mov.u32 %r10158, 0x0; 2026-02-21T12:40:23.8469283Z mov.u32 %r10159, 0x0; 2026-02-21T12:40:23.8469534Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r10156, %r10157, %r10158, %r10159 }, [ %rd360 + 0 ], %rd359; 2026-02-21T12:40:23.8469598Z // end inline asm 2026-02-21T12:40:23.8469807Z .loc 1 55 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:55:32 2026-02-21T12:40:23.8469863Z bar.sync 0; 2026-02-21T12:40:23.8469949Z st.shared.v2.b32 [%r9], {%r10156, %r10157}; 2026-02-21T12:40:23.8470031Z st.shared.v2.b32 [%r10], {%r10158, %r10159}; 2026-02-21T12:40:23.8470087Z bar.sync 0; 2026-02-21T12:40:23.8470169Z ld.shared.b16 %rs833, [%r53]; 2026-02-21T12:40:23.8470239Z ld.shared.b16 %rs834, [%r53+256]; 
2026-02-21T12:40:23.8470305Z ld.shared.b16 %rs835, [%r53+16]; 2026-02-21T12:40:23.8470368Z ld.shared.b16 %rs836, [%r53+272]; 2026-02-21T12:40:23.8470436Z ld.shared.b16 %rs837, [%r54]; 2026-02-21T12:40:23.8470506Z ld.shared.b16 %rs838, [%r54+256]; 2026-02-21T12:40:23.8470569Z ld.shared.b16 %rs839, [%r54+16]; 2026-02-21T12:40:23.8470638Z ld.shared.b16 %rs840, [%r54+272]; 2026-02-21T12:40:23.8470700Z cvt.f32.bf16 %r10420, %rs833; 2026-02-21T12:40:23.8470762Z cvt.f32.bf16 %r10421, %rs834; 2026-02-21T12:40:23.8470827Z cvt.f32.bf16 %r10422, %rs837; 2026-02-21T12:40:23.8470887Z cvt.f32.bf16 %r10423, %rs838; 2026-02-21T12:40:23.8470947Z cvt.f32.bf16 %r10680, %rs835; 2026-02-21T12:40:23.8471006Z cvt.f32.bf16 %r10681, %rs836; 2026-02-21T12:40:23.8471068Z cvt.f32.bf16 %r10682, %rs839; 2026-02-21T12:40:23.8471128Z cvt.f32.bf16 %r10683, %rs840; 2026-02-21T12:40:23.8471331Z .loc 1 57 87 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:57:87 2026-02-21T12:40:23.8471395Z // begin inline asm 2026-02-21T12:40:23.8471453Z mov.u32 %r10160, 0x0; 2026-02-21T12:40:23.8471511Z mov.u32 %r10161, 0x0; 2026-02-21T12:40:23.8471568Z mov.u32 %r10162, 0x0; 2026-02-21T12:40:23.8471628Z mov.u32 %r10163, 0x0; 2026-02-21T12:40:23.8471766Z ld.global.v4.b32 { %r10160, %r10161, %r10162, %r10163 }, [ %rd628 + 0 ]; 2026-02-21T12:40:23.8471906Z // end inline asm 2026-02-21T12:40:23.8472112Z .loc 1 65 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:65:28 2026-02-21T12:40:23.8472211Z bar.sync 0; 2026-02-21T12:40:23.8472277Z st.shared.b8 [%r13], %r10160; 2026-02-21T12:40:23.8472353Z prmt.b32 %r12526, %r10160, 0, 0x7771U; 2026-02-21T12:40:23.8472415Z st.shared.b8 [%r14], %r12526; 2026-02-21T12:40:23.8472480Z prmt.b32 %r12527, %r10160, 0, 0x7772U; 2026-02-21T12:40:23.8472546Z st.shared.b8 [%r15+256], %r12527; 2026-02-21T12:40:23.8472611Z prmt.b32 %r12528, %r10160, 0, 0x7773U; 2026-02-21T12:40:23.8472674Z st.shared.b8 [%r16+256], %r12528; 2026-02-21T12:40:23.8472734Z st.shared.b8 
[%r17+512], %r10161; 2026-02-21T12:40:23.8472801Z prmt.b32 %r12529, %r10161, 0, 0x7771U; 2026-02-21T12:40:23.8472863Z st.shared.b8 [%r18+512], %r12529; 2026-02-21T12:40:23.8472930Z prmt.b32 %r12530, %r10161, 0, 0x7772U; 2026-02-21T12:40:23.8472995Z st.shared.b8 [%r19+768], %r12530; 2026-02-21T12:40:23.8473060Z prmt.b32 %r12531, %r10161, 0, 0x7773U; 2026-02-21T12:40:23.8473123Z st.shared.b8 [%r20+768], %r12531; 2026-02-21T12:40:23.8473279Z st.shared.b8 [%r21+1024], %r10162; 2026-02-21T12:40:23.8473348Z prmt.b32 %r12532, %r10162, 0, 0x7771U; 2026-02-21T12:40:23.8473412Z st.shared.b8 [%r22+1024], %r12532; 2026-02-21T12:40:23.8473473Z prmt.b32 %r12533, %r10162, 0, 0x7772U; 2026-02-21T12:40:23.8473538Z st.shared.b8 [%r23+1280], %r12533; 2026-02-21T12:40:23.8473600Z prmt.b32 %r12534, %r10162, 0, 0x7773U; 2026-02-21T12:40:23.8473662Z st.shared.b8 [%r24+1280], %r12534; 2026-02-21T12:40:23.8473731Z st.shared.b8 [%r25+1536], %r10163; 2026-02-21T12:40:23.8473807Z prmt.b32 %r12535, %r10163, 0, 0x7771U; 2026-02-21T12:40:23.8473872Z st.shared.b8 [%r26+1536], %r12535; 2026-02-21T12:40:23.8473937Z prmt.b32 %r12536, %r10163, 0, 0x7772U; 2026-02-21T12:40:23.8474002Z st.shared.b8 [%r27+1792], %r12536; 2026-02-21T12:40:23.8474067Z prmt.b32 %r12537, %r10163, 0, 0x7773U; 2026-02-21T12:40:23.8474132Z st.shared.b8 [%r28+1792], %r12537; 2026-02-21T12:40:23.8474189Z bar.sync 0; 2026-02-21T12:40:23.8474256Z ld.shared.b32 %r12538, [%r29]; 2026-02-21T12:40:23.8474324Z prmt.b32 %r12539, %r12538, 0, 0x7770U; 2026-02-21T12:40:23.8474388Z cvt.u16.u32 %rs841, %r12539; 2026-02-21T12:40:23.8474455Z prmt.b32 %r12540, %r12538, 0, 0x7771U; 2026-02-21T12:40:23.8474517Z cvt.u16.u32 %rs842, %r12540; 2026-02-21T12:40:23.8474582Z prmt.b32 %r12541, %r12538, 0, 0x7772U; 2026-02-21T12:40:23.8474644Z cvt.u16.u32 %rs843, %r12541; 2026-02-21T12:40:23.8474707Z prmt.b32 %r12542, %r12538, 0, 0x7773U; 2026-02-21T12:40:23.8474767Z cvt.u16.u32 %rs844, %r12542; 2026-02-21T12:40:23.8474842Z ld.shared.b32 %r12543, [%r30]; 
2026-02-21T12:40:23.8474910Z prmt.b32 %r12544, %r12543, 0, 0x7770U; 2026-02-21T12:40:23.8474969Z cvt.u16.u32 %rs845, %r12544; 2026-02-21T12:40:23.8475031Z prmt.b32 %r12545, %r12543, 0, 0x7771U; 2026-02-21T12:40:23.8475093Z cvt.u16.u32 %rs846, %r12545; 2026-02-21T12:40:23.8475161Z prmt.b32 %r12546, %r12543, 0, 0x7772U; 2026-02-21T12:40:23.8475222Z cvt.u16.u32 %rs847, %r12546; 2026-02-21T12:40:23.8475294Z prmt.b32 %r12547, %r12543, 0, 0x7773U; 2026-02-21T12:40:23.8475352Z cvt.u16.u32 %rs848, %r12547; 2026-02-21T12:40:23.8475415Z ld.shared.b32 %r12548, [%r31]; 2026-02-21T12:40:23.8475479Z prmt.b32 %r12549, %r12548, 0, 0x7770U; 2026-02-21T12:40:23.8475542Z cvt.u16.u32 %rs849, %r12549; 2026-02-21T12:40:23.8475605Z prmt.b32 %r12550, %r12548, 0, 0x7771U; 2026-02-21T12:40:23.8475664Z cvt.u16.u32 %rs850, %r12550; 2026-02-21T12:40:23.8475735Z prmt.b32 %r12551, %r12548, 0, 0x7772U; 2026-02-21T12:40:23.8475795Z cvt.u16.u32 %rs851, %r12551; 2026-02-21T12:40:23.8475857Z prmt.b32 %r12552, %r12548, 0, 0x7773U; 2026-02-21T12:40:23.8475918Z cvt.u16.u32 %rs852, %r12552; 2026-02-21T12:40:23.8475984Z ld.shared.b32 %r12553, [%r32]; 2026-02-21T12:40:23.8476061Z prmt.b32 %r12554, %r12553, 0, 0x7770U; 2026-02-21T12:40:23.8476123Z cvt.u16.u32 %rs853, %r12554; 2026-02-21T12:40:23.8476251Z prmt.b32 %r12555, %r12553, 0, 0x7771U; 2026-02-21T12:40:23.8476311Z cvt.u16.u32 %rs854, %r12555; 2026-02-21T12:40:23.8476424Z prmt.b32 %r12556, %r12553, 0, 0x7772U; 2026-02-21T12:40:23.8476608Z cvt.u16.u32 %rs855, %r12556; 2026-02-21T12:40:23.8476679Z prmt.b32 %r12557, %r12553, 0, 0x7773U; 2026-02-21T12:40:23.8476739Z cvt.u16.u32 %rs856, %r12557; 2026-02-21T12:40:23.8476941Z .loc 1 60 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:60:28 2026-02-21T12:40:23.8477007Z shl.b16 %rs857, %rs841, 4; 2026-02-21T12:40:23.8477067Z shl.b16 %rs858, %rs845, 4; 2026-02-21T12:40:23.8477127Z shl.b16 %rs859, %rs849, 4; 2026-02-21T12:40:23.8477199Z shl.b16 %rs860, %rs853, 4; 2026-02-21T12:40:23.8477258Z shl.b16 
%rs861, %rs842, 4; 2026-02-21T12:40:23.8477317Z shl.b16 %rs862, %rs846, 4; 2026-02-21T12:40:23.8477376Z shl.b16 %rs863, %rs850, 4; 2026-02-21T12:40:23.8477438Z shl.b16 %rs864, %rs854, 4; 2026-02-21T12:40:23.8477500Z shl.b16 %rs865, %rs843, 4; 2026-02-21T12:40:23.8477560Z shl.b16 %rs866, %rs847, 4; 2026-02-21T12:40:23.8477621Z shl.b16 %rs867, %rs851, 4; 2026-02-21T12:40:23.8477812Z shl.b16 %rs868, %rs855, 4; 2026-02-21T12:40:23.8477874Z shl.b16 %rs869, %rs844, 4; 2026-02-21T12:40:23.8477933Z shl.b16 %rs870, %rs848, 4; 2026-02-21T12:40:23.8477994Z shl.b16 %rs871, %rs852, 4; 2026-02-21T12:40:23.8478052Z shl.b16 %rs872, %rs856, 4; 2026-02-21T12:40:23.8478255Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8478320Z cvt.s16.s8 %rs873, %rs857; 2026-02-21T12:40:23.8478380Z shr.s16 %rs874, %rs873, 4; 2026-02-21T12:40:23.8478439Z cvt.s16.s8 %rs875, %rs859; 2026-02-21T12:40:23.8478500Z shr.s16 %rs876, %rs875, 4; 2026-02-21T12:40:23.8478580Z prmt.b32 %r12558, %r12538, 0, 0x8880U; 2026-02-21T12:40:23.8478643Z cvt.u16.u32 %rs877, %r12558; 2026-02-21T12:40:23.8478712Z shr.s16 %rs878, %rs877, 4; 2026-02-21T12:40:23.8478780Z prmt.b32 %r12559, %r12548, 0, 0x8880U; 2026-02-21T12:40:23.8478842Z cvt.u16.u32 %rs879, %r12559; 2026-02-21T12:40:23.8478905Z shr.s16 %rs880, %rs879, 4; 2026-02-21T12:40:23.8479107Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8479174Z cvt.rn.f32.s16 %r12560, %rs880; 2026-02-21T12:40:23.8479239Z cvt.rn.f32.s16 %r12561, %rs878; 2026-02-21T12:40:23.8479299Z cvt.rn.f32.s16 %r12562, %rs876; 2026-02-21T12:40:23.8479358Z cvt.rn.f32.s16 %r12563, %rs874; 2026-02-21T12:40:23.8479550Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8479614Z cvt.s16.s8 %rs881, %rs858; 2026-02-21T12:40:23.8479673Z shr.s16 %rs882, %rs881, 4; 2026-02-21T12:40:23.8479734Z cvt.s16.s8 %rs883, %rs860; 2026-02-21T12:40:23.8479795Z shr.s16 
%rs884, %rs883, 4; 2026-02-21T12:40:23.8479861Z prmt.b32 %r12564, %r12543, 0, 0x8880U; 2026-02-21T12:40:23.8479921Z cvt.u16.u32 %rs885, %r12564; 2026-02-21T12:40:23.8479983Z shr.s16 %rs886, %rs885, 4; 2026-02-21T12:40:23.8480048Z prmt.b32 %r12565, %r12553, 0, 0x8880U; 2026-02-21T12:40:23.8480112Z cvt.u16.u32 %rs887, %r12565; 2026-02-21T12:40:23.8480173Z shr.s16 %rs888, %rs887, 4; 2026-02-21T12:40:23.8480369Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8480431Z cvt.rn.f32.s16 %r12566, %rs888; 2026-02-21T12:40:23.8480492Z cvt.rn.f32.s16 %r12567, %rs886; 2026-02-21T12:40:23.8480557Z cvt.rn.f32.s16 %r12568, %rs884; 2026-02-21T12:40:23.8480616Z cvt.rn.f32.s16 %r12569, %rs882; 2026-02-21T12:40:23.8480808Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8480870Z cvt.s16.s8 %rs889, %rs861; 2026-02-21T12:40:23.8480929Z shr.s16 %rs890, %rs889, 4; 2026-02-21T12:40:23.8480987Z cvt.s16.s8 %rs891, %rs863; 2026-02-21T12:40:23.8481046Z shr.s16 %rs892, %rs891, 4; 2026-02-21T12:40:23.8481207Z prmt.b32 %r12570, %r12538, 0, 0x9991U; 2026-02-21T12:40:23.8481267Z cvt.u16.u32 %rs893, %r12570; 2026-02-21T12:40:23.8481326Z shr.s16 %rs894, %rs893, 4; 2026-02-21T12:40:23.8481477Z prmt.b32 %r12571, %r12548, 0, 0x9991U; 2026-02-21T12:40:23.8481538Z cvt.u16.u32 %rs895, %r12571; 2026-02-21T12:40:23.8481597Z shr.s16 %rs896, %rs895, 4; 2026-02-21T12:40:23.8481788Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8481851Z cvt.rn.f32.s16 %r12572, %rs896; 2026-02-21T12:40:23.8481912Z cvt.rn.f32.s16 %r12573, %rs894; 2026-02-21T12:40:23.8481971Z cvt.rn.f32.s16 %r12574, %rs892; 2026-02-21T12:40:23.8482032Z cvt.rn.f32.s16 %r12575, %rs890; 2026-02-21T12:40:23.8482221Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8482280Z cvt.s16.s8 %rs897, %rs862; 2026-02-21T12:40:23.8482340Z shr.s16 %rs898, %rs897, 4; 
2026-02-21T12:40:23.8482402Z cvt.s16.s8 %rs899, %rs864; 2026-02-21T12:40:23.8482460Z shr.s16 %rs900, %rs899, 4; 2026-02-21T12:40:23.8482524Z prmt.b32 %r12576, %r12543, 0, 0x9991U; 2026-02-21T12:40:23.8482684Z cvt.u16.u32 %rs901, %r12576; 2026-02-21T12:40:23.8482746Z shr.s16 %rs902, %rs901, 4; 2026-02-21T12:40:23.8482809Z prmt.b32 %r12577, %r12553, 0, 0x9991U; 2026-02-21T12:40:23.8482870Z cvt.u16.u32 %rs903, %r12577; 2026-02-21T12:40:23.8482930Z shr.s16 %rs904, %rs903, 4; 2026-02-21T12:40:23.8483127Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8483188Z cvt.rn.f32.s16 %r12578, %rs904; 2026-02-21T12:40:23.8483250Z cvt.rn.f32.s16 %r12579, %rs902; 2026-02-21T12:40:23.8483312Z cvt.rn.f32.s16 %r12580, %rs900; 2026-02-21T12:40:23.8483372Z cvt.rn.f32.s16 %r12581, %rs898; 2026-02-21T12:40:23.8483565Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8483624Z cvt.s16.s8 %rs905, %rs865; 2026-02-21T12:40:23.8483684Z shr.s16 %rs906, %rs905, 4; 2026-02-21T12:40:23.8483746Z cvt.s16.s8 %rs907, %rs867; 2026-02-21T12:40:23.8483806Z shr.s16 %rs908, %rs907, 4; 2026-02-21T12:40:23.8483874Z prmt.b32 %r12582, %r12538, 0, 0xaaa2U; 2026-02-21T12:40:23.8483935Z cvt.u16.u32 %rs909, %r12582; 2026-02-21T12:40:23.8483997Z shr.s16 %rs910, %rs909, 4; 2026-02-21T12:40:23.8484061Z prmt.b32 %r12583, %r12548, 0, 0xaaa2U; 2026-02-21T12:40:23.8484121Z cvt.u16.u32 %rs911, %r12583; 2026-02-21T12:40:23.8484181Z shr.s16 %rs912, %rs911, 4; 2026-02-21T12:40:23.8484370Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8484431Z cvt.rn.f32.s16 %r12584, %rs912; 2026-02-21T12:40:23.8484492Z cvt.rn.f32.s16 %r12585, %rs910; 2026-02-21T12:40:23.8484554Z cvt.rn.f32.s16 %r12586, %rs908; 2026-02-21T12:40:23.8484613Z cvt.rn.f32.s16 %r12587, %rs906; 2026-02-21T12:40:23.8484803Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 
2026-02-21T12:40:23.8484883Z cvt.s16.s8 %rs913, %rs866; 2026-02-21T12:40:23.8484946Z shr.s16 %rs914, %rs913, 4; 2026-02-21T12:40:23.8485009Z cvt.s16.s8 %rs915, %rs868; 2026-02-21T12:40:23.8485073Z shr.s16 %rs916, %rs915, 4; 2026-02-21T12:40:23.8485138Z prmt.b32 %r12588, %r12543, 0, 0xaaa2U; 2026-02-21T12:40:23.8485198Z cvt.u16.u32 %rs917, %r12588; 2026-02-21T12:40:23.8485256Z shr.s16 %rs918, %rs917, 4; 2026-02-21T12:40:23.8485324Z prmt.b32 %r12589, %r12553, 0, 0xaaa2U; 2026-02-21T12:40:23.8485382Z cvt.u16.u32 %rs919, %r12589; 2026-02-21T12:40:23.8485441Z shr.s16 %rs920, %rs919, 4; 2026-02-21T12:40:23.8485634Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8485695Z cvt.rn.f32.s16 %r12590, %rs920; 2026-02-21T12:40:23.8485762Z cvt.rn.f32.s16 %r12591, %rs918; 2026-02-21T12:40:23.8485822Z cvt.rn.f32.s16 %r12592, %rs916; 2026-02-21T12:40:23.8485885Z cvt.rn.f32.s16 %r12593, %rs914; 2026-02-21T12:40:23.8486138Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8486247Z cvt.s16.s8 %rs921, %rs869; 2026-02-21T12:40:23.8486309Z shr.s16 %rs922, %rs921, 4; 2026-02-21T12:40:23.8486368Z cvt.s16.s8 %rs923, %rs871; 2026-02-21T12:40:23.8486426Z shr.s16 %rs924, %rs923, 4; 2026-02-21T12:40:23.8486631Z prmt.b32 %r12594, %r12538, 0, 0xbbb3U; 2026-02-21T12:40:23.8486701Z cvt.u16.u32 %rs925, %r12594; 2026-02-21T12:40:23.8486763Z shr.s16 %rs926, %rs925, 4; 2026-02-21T12:40:23.8486827Z prmt.b32 %r12595, %r12548, 0, 0xbbb3U; 2026-02-21T12:40:23.8486889Z cvt.u16.u32 %rs927, %r12595; 2026-02-21T12:40:23.8486950Z shr.s16 %rs928, %rs927, 4; 2026-02-21T12:40:23.8487141Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8487205Z cvt.rn.f32.s16 %r12596, %rs928; 2026-02-21T12:40:23.8487267Z cvt.rn.f32.s16 %r12597, %rs926; 2026-02-21T12:40:23.8487329Z cvt.rn.f32.s16 %r12598, %rs924; 2026-02-21T12:40:23.8487407Z cvt.rn.f32.s16 %r12599, %rs922; 
2026-02-21T12:40:23.8487752Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8487818Z cvt.s16.s8 %rs929, %rs870; 2026-02-21T12:40:23.8487878Z shr.s16 %rs930, %rs929, 4; 2026-02-21T12:40:23.8487939Z cvt.s16.s8 %rs931, %rs872; 2026-02-21T12:40:23.8487997Z shr.s16 %rs932, %rs931, 4; 2026-02-21T12:40:23.8488062Z prmt.b32 %r12600, %r12543, 0, 0xbbb3U; 2026-02-21T12:40:23.8488126Z cvt.u16.u32 %rs933, %r12600; 2026-02-21T12:40:23.8488185Z shr.s16 %rs934, %rs933, 4; 2026-02-21T12:40:23.8488251Z prmt.b32 %r12601, %r12553, 0, 0xbbb3U; 2026-02-21T12:40:23.8488311Z cvt.u16.u32 %rs935, %r12601; 2026-02-21T12:40:23.8488373Z shr.s16 %rs936, %rs935, 4; 2026-02-21T12:40:23.8488563Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8488627Z cvt.rn.f32.s16 %r12602, %rs936; 2026-02-21T12:40:23.8488692Z cvt.rn.f32.s16 %r12603, %rs934; 2026-02-21T12:40:23.8488753Z cvt.rn.f32.s16 %r12604, %rs932; 2026-02-21T12:40:23.8488821Z cvt.rn.f32.s16 %r12605, %rs930; 2026-02-21T12:40:23.8488879Z bar.sync 0; 2026-02-21T12:40:23.8488999Z st.shared.v4.b32 [%r33], {%r12563, %r12561, %r12562, %r12560}; 2026-02-21T12:40:23.8489131Z st.shared.v4.b32 [%r33+8192], {%r12569, %r12567, %r12568, %r12566}; 2026-02-21T12:40:23.8489252Z st.shared.v4.b32 [%r34], {%r12575, %r12573, %r12574, %r12572}; 2026-02-21T12:40:23.8489377Z st.shared.v4.b32 [%r34+8192], {%r12581, %r12579, %r12580, %r12578}; 2026-02-21T12:40:23.8489485Z st.shared.v4.b32 [%r35], {%r12587, %r12585, %r12586, %r12584}; 2026-02-21T12:40:23.8489600Z st.shared.v4.b32 [%r35+8192], {%r12593, %r12591, %r12592, %r12590}; 2026-02-21T12:40:23.8489711Z st.shared.v4.b32 [%r36], {%r12599, %r12597, %r12598, %r12596}; 2026-02-21T12:40:23.8489825Z st.shared.v4.b32 [%r36+8192], {%r12605, %r12603, %r12604, %r12602}; 2026-02-21T12:40:23.8489881Z $L__tmp17: 2026-02-21T12:40:23.8490170Z .loc 2 291 36 // standard.py:291:36 @[ 
cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:87:40 ] 2026-02-21T12:40:23.8490233Z // begin inline asm 2026-02-21T12:40:23.8490310Z fence.proxy.async.shared::cta; 2026-02-21T12:40:23.8490365Z // end inline asm 2026-02-21T12:40:23.8490423Z bar.sync 0; 2026-02-21T12:40:23.8490506Z shfl.sync.idx.b32 %r12606, %r2, 0, 31, -1; 2026-02-21T12:40:23.8490579Z wgmma.fence.sync.aligned; 2026-02-21T12:40:23.8490645Z mov.pred %p25, -1; 2026-02-21T12:40:23.8490703Z // begin inline asm 2026-02-21T12:40:23.8493432Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r21842,%r21843,%r21844,%r21845,%r21846,%r21847,%r21848,%r21849,%r21850,%r21851,%r21852,%r21853,%r21854,%r21855,%r21856,%r21857,%r21858,%r21859,%r21860,%r21861,%r21862,%r21863,%r21864,%r21865,%r21866,%r21867,%r21868,%r21869,%r21870,%r21871,%r21872,%r21873,%r21874,%r21875,%r21876,%r21877,%r21878,%r21879,%r21880,%r21881,%r21882,%r21883,%r21884,%r21885,%r21886,%r21887,%r21888,%r21889,%r21890,%r21891,%r21892,%r21893,%r21894,%r21895,%r21896,%r21897,%r21898,%r21899,%r21900,%r21901,%r21902,%r21903,%r21904,%r21905,%r21906,%r21907,%r21908,%r21909,%r21910,%r21911,%r21912,%r21913,%r21914,%r21915,%r21916,%r21917,%r21918,%r21919,%r21920,%r21921,%r21922,%r21923,%r21924,%r21925,%r21926,%r21927,%r21928,%r21929,%r21930,%r21931,%r21932,%r21933,%r21934,%r21935,%r21936,%r21937,%r21938,%r21939,%r21940,%r21941,%r21942,%r21943,%r21944,%r21945,%r21946,%r21947,%r21948,%r21949,%r21950,%r21951,%r21952,%r21953,%r21954,%r21955,%r21956,%r21957,%r21958,%r21959,%r21960,%r21961,%r21962,%r21963,%r21964,%r21965,%r21966,%r21967,%r21968,%r21969}, {%r10420,%r10421,%r10422,%r10423}, %rd23, %p25, 1, 1; 2026-02-21T12:40:23.8493635Z // end inline asm 2026-02-21T12:40:23.8493694Z // begin inline asm 2026-02-21T12:40:23.8496674Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r21842,%r21843,%r21844,%r21845,%r21846,%r21847,%r21848,%r21849,%r21850,%r21851,%r21852,%r21853,%r21854,%r21855,%r21856,%r21857,%r21858,%r21859,%r21860,%r21861,%r21862,%r21863,%r21864,%r21865,%r21866,%r21867,%r21868,%r21869,%r21870,%r21871,%r21872,%r21873,%r21874,%r21875,%r21876,%r21877,%r21878,%r21879,%r21880,%r21881,%r21882,%r21883,%r21884,%r21885,%r21886,%r21887,%r21888,%r21889,%r21890,%r21891,%r21892,%r21893,%r21894,%r21895,%r21896,%r21897,%r21898,%r21899,%r21900,%r21901,%r21902,%r21903,%r21904,%r21905,%r21906,%r21907,%r21908,%r21909,%r21910,%r21911,%r21912,%r21913,%r21914,%r21915,%r21916,%r21917,%r21918,%r21919,%r21920,%r21921,%r21922,%r21923,%r21924,%r21925,%r21926,%r21927,%r21928,%r21929,%r21930,%r21931,%r21932,%r21933,%r21934,%r21935,%r21936,%r21937,%r21938,%r21939,%r21940,%r21941,%r21942,%r21943,%r21944,%r21945,%r21946,%r21947,%r21948,%r21949,%r21950,%r21951,%r21952,%r21953,%r21954,%r21955,%r21956,%r21957,%r21958,%r21959,%r21960,%r21961,%r21962,%r21963,%r21964,%r21965,%r21966,%r21967,%r21968,%r21969}, {%r10680,%r10681,%r10682,%r10683}, %rd24, %p25, 1, 1; 2026-02-21T12:40:23.8496759Z // end inline asm 2026-02-21T12:40:23.8496841Z wgmma.commit_group.sync.aligned; 2026-02-21T12:40:23.8496899Z mov.b32 %r12394, 0; 2026-02-21T12:40:23.8496966Z mov.b32 %r10812, %r18409; 2026-02-21T12:40:23.8497025Z mov.b32 %r10813, %r12394; 2026-02-21T12:40:23.8497083Z mov.b32 %r10814, %r12394; 2026-02-21T12:40:23.8497142Z // begin inline asm 2026-02-21T12:40:23.8499699Z // wait for regs: 
%r21842,%r21843,%r21844,%r21845,%r21846,%r21847,%r21848,%r21849,%r21850,%r21851,%r21852,%r21853,%r21854,%r21855,%r21856,%r21857,%r21858,%r21859,%r21860,%r21861,%r21862,%r21863,%r21864,%r21865,%r21866,%r21867,%r21868,%r21869,%r21870,%r21871,%r21872,%r21873,%r21874,%r21875,%r21876,%r21877,%r21878,%r21879,%r21880,%r21881,%r21882,%r21883,%r21884,%r21885,%r21886,%r21887,%r21888,%r21889,%r21890,%r21891,%r21892,%r21893,%r21894,%r21895,%r21896,%r21897,%r21898,%r21899,%r21900,%r21901,%r21902,%r21903,%r21904,%r21905,%r21906,%r21907,%r21908,%r21909,%r21910,%r21911,%r21912,%r21913,%r21914,%r21915,%r21916,%r21917,%r21918,%r21919,%r21920,%r21921,%r21922,%r21923,%r21924,%r21925,%r21926,%r21927,%r21928,%r21929,%r21930,%r21931,%r21932,%r21933,%r21934,%r21935,%r21936,%r21937,%r21938,%r21939,%r21940,%r21941,%r21942,%r21943,%r21944,%r21945,%r21946,%r21947,%r21948,%r21949,%r21950,%r21951,%r21952,%r21953,%r21954,%r21955,%r21956,%r21957,%r21958,%r21959,%r21960,%r21961,%r21962,%r21963,%r21964,%r21965,%r21966,%r21967,%r21968,%r21969,%r10812,%r10813,%r10814 2026-02-21T12:40:23.8499783Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:40:23.8499843Z // end inline asm 2026-02-21T12:40:23.8499897Z $L__tmp18: 2026-02-21T12:40:23.8500110Z .loc 1 51 80 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:51:80 2026-02-21T12:40:23.8500176Z add.s64 %rd366, %rd629, -32; 2026-02-21T12:40:23.8500238Z // begin inline asm 2026-02-21T12:40:23.8500297Z mov.u64 %rd365, 0x0; 2026-02-21T12:40:23.8500422Z createpolicy.fractional.L2::evict_last.b64 %rd365, 1.0; 2026-02-21T12:40:23.8500482Z // end inline asm 2026-02-21T12:40:23.8500612Z // begin inline asm 2026-02-21T12:40:23.8500670Z mov.u32 %r10946, 0x0; 2026-02-21T12:40:23.8500726Z mov.u32 %r10947, 0x0; 2026-02-21T12:40:23.8500846Z mov.u32 %r10948, 0x0; 2026-02-21T12:40:23.8500902Z mov.u32 %r10949, 0x0; 2026-02-21T12:40:23.8501139Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r10946, %r10947, %r10948, %r10949 }, [ %rd366 + 0 ], %rd365; 
2026-02-21T12:40:23.8501199Z // end inline asm 2026-02-21T12:40:23.8501400Z .loc 1 55 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:55:32 2026-02-21T12:40:23.8501466Z bar.sync 0; 2026-02-21T12:40:23.8501553Z st.shared.v2.b32 [%r9], {%r10946, %r10947}; 2026-02-21T12:40:23.8501635Z st.shared.v2.b32 [%r10], {%r10948, %r10949}; 2026-02-21T12:40:23.8501689Z bar.sync 0; 2026-02-21T12:40:23.8501755Z ld.shared.b16 %rs937, [%r53]; 2026-02-21T12:40:23.8501824Z ld.shared.b16 %rs938, [%r53+256]; 2026-02-21T12:40:23.8501889Z ld.shared.b16 %rs939, [%r53+16]; 2026-02-21T12:40:23.8501953Z ld.shared.b16 %rs940, [%r53+272]; 2026-02-21T12:40:23.8502018Z ld.shared.b16 %rs941, [%r54]; 2026-02-21T12:40:23.8502080Z ld.shared.b16 %rs942, [%r54+256]; 2026-02-21T12:40:23.8502261Z ld.shared.b16 %rs943, [%r54+16]; 2026-02-21T12:40:23.8502331Z ld.shared.b16 %rs944, [%r54+272]; 2026-02-21T12:40:23.8502396Z cvt.f32.bf16 %r11210, %rs937; 2026-02-21T12:40:23.8502456Z cvt.f32.bf16 %r11211, %rs938; 2026-02-21T12:40:23.8502516Z cvt.f32.bf16 %r11212, %rs941; 2026-02-21T12:40:23.8502579Z cvt.f32.bf16 %r11213, %rs942; 2026-02-21T12:40:23.8502637Z cvt.f32.bf16 %r11470, %rs939; 2026-02-21T12:40:23.8502695Z cvt.f32.bf16 %r11471, %rs940; 2026-02-21T12:40:23.8502757Z cvt.f32.bf16 %r11472, %rs943; 2026-02-21T12:40:23.8502814Z cvt.f32.bf16 %r11473, %rs944; 2026-02-21T12:40:23.8503016Z .loc 1 57 87 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:57:87 2026-02-21T12:40:23.8503080Z add.s64 %rd368, %rd628, 10240; 2026-02-21T12:40:23.8503141Z // begin inline asm 2026-02-21T12:40:23.8503200Z mov.u32 %r10950, 0x0; 2026-02-21T12:40:23.8503256Z mov.u32 %r10951, 0x0; 2026-02-21T12:40:23.8503314Z mov.u32 %r10952, 0x0; 2026-02-21T12:40:23.8503387Z mov.u32 %r10953, 0x0; 2026-02-21T12:40:23.8503526Z ld.global.v4.b32 { %r10950, %r10951, %r10952, %r10953 }, [ %rd368 + 0 ]; 2026-02-21T12:40:23.8503582Z // end inline asm 2026-02-21T12:40:23.8503780Z .loc 1 65 28 // 
cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:65:28 2026-02-21T12:40:23.8503834Z bar.sync 0; 2026-02-21T12:40:23.8503897Z st.shared.b8 [%r13], %r10950; 2026-02-21T12:40:23.8503972Z prmt.b32 %r12607, %r10950, 0, 0x7771U; 2026-02-21T12:40:23.8504033Z st.shared.b8 [%r14], %r12607; 2026-02-21T12:40:23.8504099Z prmt.b32 %r12608, %r10950, 0, 0x7772U; 2026-02-21T12:40:23.8504163Z st.shared.b8 [%r15+256], %r12608; 2026-02-21T12:40:23.8504227Z prmt.b32 %r12609, %r10950, 0, 0x7773U; 2026-02-21T12:40:23.8504289Z st.shared.b8 [%r16+256], %r12609; 2026-02-21T12:40:23.8504353Z st.shared.b8 [%r17+512], %r10951; 2026-02-21T12:40:23.8504432Z prmt.b32 %r12610, %r10951, 0, 0x7771U; 2026-02-21T12:40:23.8504497Z st.shared.b8 [%r18+512], %r12610; 2026-02-21T12:40:23.8504566Z prmt.b32 %r12611, %r10951, 0, 0x7772U; 2026-02-21T12:40:23.8504631Z st.shared.b8 [%r19+768], %r12611; 2026-02-21T12:40:23.8504698Z prmt.b32 %r12612, %r10951, 0, 0x7773U; 2026-02-21T12:40:23.8504760Z st.shared.b8 [%r20+768], %r12612; 2026-02-21T12:40:23.8504824Z st.shared.b8 [%r21+1024], %r10952; 2026-02-21T12:40:23.8504893Z prmt.b32 %r12613, %r10952, 0, 0x7771U; 2026-02-21T12:40:23.8504957Z st.shared.b8 [%r22+1024], %r12613; 2026-02-21T12:40:23.8505020Z prmt.b32 %r12614, %r10952, 0, 0x7772U; 2026-02-21T12:40:23.8505083Z st.shared.b8 [%r23+1280], %r12614; 2026-02-21T12:40:23.8505149Z prmt.b32 %r12615, %r10952, 0, 0x7773U; 2026-02-21T12:40:23.8505209Z st.shared.b8 [%r24+1280], %r12615; 2026-02-21T12:40:23.8505272Z st.shared.b8 [%r25+1536], %r10953; 2026-02-21T12:40:23.8505395Z prmt.b32 %r12616, %r10953, 0, 0x7771U; 2026-02-21T12:40:23.8505457Z st.shared.b8 [%r26+1536], %r12616; 2026-02-21T12:40:23.8505531Z prmt.b32 %r12617, %r10953, 0, 0x7772U; 2026-02-21T12:40:23.8505650Z st.shared.b8 [%r27+1792], %r12617; 2026-02-21T12:40:23.8505715Z prmt.b32 %r12618, %r10953, 0, 0x7773U; 2026-02-21T12:40:23.8505776Z st.shared.b8 [%r28+1792], %r12618; 2026-02-21T12:40:23.8505831Z bar.sync 0; 2026-02-21T12:40:23.8505895Z 
ld.shared.b32 %r12619, [%r29]; 2026-02-21T12:40:23.8505959Z prmt.b32 %r12620, %r12619, 0, 0x7770U; 2026-02-21T12:40:23.8506021Z cvt.u16.u32 %rs945, %r12620; 2026-02-21T12:40:23.8506087Z prmt.b32 %r12621, %r12619, 0, 0x7771U; 2026-02-21T12:40:23.8506148Z cvt.u16.u32 %rs946, %r12621; 2026-02-21T12:40:23.8506211Z prmt.b32 %r12622, %r12619, 0, 0x7772U; 2026-02-21T12:40:23.8506271Z cvt.u16.u32 %rs947, %r12622; 2026-02-21T12:40:23.8506334Z prmt.b32 %r12623, %r12619, 0, 0x7773U; 2026-02-21T12:40:23.8506394Z cvt.u16.u32 %rs948, %r12623; 2026-02-21T12:40:23.8506584Z ld.shared.b32 %r12624, [%r30]; 2026-02-21T12:40:23.8506655Z prmt.b32 %r12625, %r12624, 0, 0x7770U; 2026-02-21T12:40:23.8506714Z cvt.u16.u32 %rs949, %r12625; 2026-02-21T12:40:23.8506912Z prmt.b32 %r12626, %r12624, 0, 0x7771U; 2026-02-21T12:40:23.8506978Z cvt.u16.u32 %rs950, %r12626; 2026-02-21T12:40:23.8507044Z prmt.b32 %r12627, %r12624, 0, 0x7772U; 2026-02-21T12:40:23.8507105Z cvt.u16.u32 %rs951, %r12627; 2026-02-21T12:40:23.8507170Z prmt.b32 %r12628, %r12624, 0, 0x7773U; 2026-02-21T12:40:23.8507242Z cvt.u16.u32 %rs952, %r12628; 2026-02-21T12:40:23.8507308Z ld.shared.b32 %r12629, [%r31]; 2026-02-21T12:40:23.8507372Z prmt.b32 %r12630, %r12629, 0, 0x7770U; 2026-02-21T12:40:23.8507434Z cvt.u16.u32 %rs953, %r12630; 2026-02-21T12:40:23.8507497Z prmt.b32 %r12631, %r12629, 0, 0x7771U; 2026-02-21T12:40:23.8507556Z cvt.u16.u32 %rs954, %r12631; 2026-02-21T12:40:23.8507622Z prmt.b32 %r12632, %r12629, 0, 0x7772U; 2026-02-21T12:40:23.8507681Z cvt.u16.u32 %rs955, %r12632; 2026-02-21T12:40:23.8507747Z prmt.b32 %r12633, %r12629, 0, 0x7773U; 2026-02-21T12:40:23.8507805Z cvt.u16.u32 %rs956, %r12633; 2026-02-21T12:40:23.8507870Z ld.shared.b32 %r12634, [%r32]; 2026-02-21T12:40:23.8507936Z prmt.b32 %r12635, %r12634, 0, 0x7770U; 2026-02-21T12:40:23.8507995Z cvt.u16.u32 %rs957, %r12635; 2026-02-21T12:40:23.8508061Z prmt.b32 %r12636, %r12634, 0, 0x7771U; 2026-02-21T12:40:23.8508120Z cvt.u16.u32 %rs958, %r12636; 
2026-02-21T12:40:23.8508182Z prmt.b32 %r12637, %r12634, 0, 0x7772U; 2026-02-21T12:40:23.8508244Z cvt.u16.u32 %rs959, %r12637; 2026-02-21T12:40:23.8508306Z prmt.b32 %r12638, %r12634, 0, 0x7773U; 2026-02-21T12:40:23.8508436Z cvt.u16.u32 %rs960, %r12638; 2026-02-21T12:40:23.8508636Z .loc 1 60 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:60:28 2026-02-21T12:40:23.8508702Z shl.b16 %rs961, %rs945, 4; 2026-02-21T12:40:23.8508762Z shl.b16 %rs962, %rs949, 4; 2026-02-21T12:40:23.8508821Z shl.b16 %rs963, %rs953, 4; 2026-02-21T12:40:23.8508882Z shl.b16 %rs964, %rs957, 4; 2026-02-21T12:40:23.8508940Z shl.b16 %rs965, %rs946, 4; 2026-02-21T12:40:23.8508999Z shl.b16 %rs966, %rs950, 4; 2026-02-21T12:40:23.8509063Z shl.b16 %rs967, %rs954, 4; 2026-02-21T12:40:23.8509123Z shl.b16 %rs968, %rs958, 4; 2026-02-21T12:40:23.8509182Z shl.b16 %rs969, %rs947, 4; 2026-02-21T12:40:23.8509240Z shl.b16 %rs970, %rs951, 4; 2026-02-21T12:40:23.8509300Z shl.b16 %rs971, %rs955, 4; 2026-02-21T12:40:23.8509358Z shl.b16 %rs972, %rs959, 4; 2026-02-21T12:40:23.8509416Z shl.b16 %rs973, %rs948, 4; 2026-02-21T12:40:23.8509474Z shl.b16 %rs974, %rs952, 4; 2026-02-21T12:40:23.8509536Z shl.b16 %rs975, %rs956, 4; 2026-02-21T12:40:23.8509594Z shl.b16 %rs976, %rs960, 4; 2026-02-21T12:40:23.8509788Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8509851Z cvt.s16.s8 %rs977, %rs961; 2026-02-21T12:40:23.8509909Z shr.s16 %rs978, %rs977, 4; 2026-02-21T12:40:23.8509969Z cvt.s16.s8 %rs979, %rs963; 2026-02-21T12:40:23.8510108Z shr.s16 %rs980, %rs979, 4; 2026-02-21T12:40:23.8510173Z prmt.b32 %r12639, %r12619, 0, 0x8880U; 2026-02-21T12:40:23.8510292Z cvt.u16.u32 %rs981, %r12639; 2026-02-21T12:40:23.8510351Z shr.s16 %rs982, %rs981, 4; 2026-02-21T12:40:23.8510417Z prmt.b32 %r12640, %r12629, 0, 0x8880U; 2026-02-21T12:40:23.8510475Z cvt.u16.u32 %rs983, %r12640; 2026-02-21T12:40:23.8510534Z shr.s16 %rs984, %rs983, 4; 2026-02-21T12:40:23.8510742Z .loc 1 80 32 // 
cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8510808Z cvt.rn.f32.s16 %r12641, %rs984; 2026-02-21T12:40:23.8510873Z cvt.rn.f32.s16 %r12642, %rs982; 2026-02-21T12:40:23.8510932Z cvt.rn.f32.s16 %r12643, %rs980; 2026-02-21T12:40:23.8510996Z cvt.rn.f32.s16 %r12644, %rs978; 2026-02-21T12:40:23.8511188Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8511249Z cvt.s16.s8 %rs985, %rs962; 2026-02-21T12:40:23.8511312Z shr.s16 %rs986, %rs985, 4; 2026-02-21T12:40:23.8511371Z cvt.s16.s8 %rs987, %rs964; 2026-02-21T12:40:23.8511503Z shr.s16 %rs988, %rs987, 4; 2026-02-21T12:40:23.8511620Z prmt.b32 %r12645, %r12624, 0, 0x8880U; 2026-02-21T12:40:23.8511683Z cvt.u16.u32 %rs989, %r12645; 2026-02-21T12:40:23.8511742Z shr.s16 %rs990, %rs989, 4; 2026-02-21T12:40:23.8511807Z prmt.b32 %r12646, %r12634, 0, 0x8880U; 2026-02-21T12:40:23.8511867Z cvt.u16.u32 %rs991, %r12646; 2026-02-21T12:40:23.8511925Z shr.s16 %rs992, %rs991, 4; 2026-02-21T12:40:23.8512118Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8512182Z cvt.rn.f32.s16 %r12647, %rs992; 2026-02-21T12:40:23.8512242Z cvt.rn.f32.s16 %r12648, %rs990; 2026-02-21T12:40:23.8512301Z cvt.rn.f32.s16 %r12649, %rs988; 2026-02-21T12:40:23.8512363Z cvt.rn.f32.s16 %r12650, %rs986; 2026-02-21T12:40:23.8512554Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8512615Z cvt.s16.s8 %rs993, %rs965; 2026-02-21T12:40:23.8512674Z shr.s16 %rs994, %rs993, 4; 2026-02-21T12:40:23.8512741Z cvt.s16.s8 %rs995, %rs967; 2026-02-21T12:40:23.8512800Z shr.s16 %rs996, %rs995, 4; 2026-02-21T12:40:23.8512878Z prmt.b32 %r12651, %r12619, 0, 0x9991U; 2026-02-21T12:40:23.8512941Z cvt.u16.u32 %rs997, %r12651; 2026-02-21T12:40:23.8513000Z shr.s16 %rs998, %rs997, 4; 2026-02-21T12:40:23.8513066Z prmt.b32 %r12652, %r12629, 0, 0x9991U; 2026-02-21T12:40:23.8513126Z cvt.u16.u32 %rs999, %r12652; 
2026-02-21T12:40:23.8513190Z shr.s16 %rs1000, %rs999, 4; 2026-02-21T12:40:23.8513382Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8513444Z cvt.rn.f32.s16 %r12653, %rs1000; 2026-02-21T12:40:23.8513508Z cvt.rn.f32.s16 %r12654, %rs998; 2026-02-21T12:40:23.8513569Z cvt.rn.f32.s16 %r12655, %rs996; 2026-02-21T12:40:23.8513628Z cvt.rn.f32.s16 %r12656, %rs994; 2026-02-21T12:40:23.8513825Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8513891Z cvt.s16.s8 %rs1001, %rs966; 2026-02-21T12:40:23.8513952Z shr.s16 %rs1002, %rs1001, 4; 2026-02-21T12:40:23.8514011Z cvt.s16.s8 %rs1003, %rs968; 2026-02-21T12:40:23.8514072Z shr.s16 %rs1004, %rs1003, 4; 2026-02-21T12:40:23.8514136Z prmt.b32 %r12657, %r12624, 0, 0x9991U; 2026-02-21T12:40:23.8514197Z cvt.u16.u32 %rs1005, %r12657; 2026-02-21T12:40:23.8514258Z shr.s16 %rs1006, %rs1005, 4; 2026-02-21T12:40:23.8514323Z prmt.b32 %r12658, %r12634, 0, 0x9991U; 2026-02-21T12:40:23.8514383Z cvt.u16.u32 %rs1007, %r12658; 2026-02-21T12:40:23.8515025Z [8123s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T12:40:23.8516227Z Config: @helion.kernel(config=helion.Config(block_sizes=[8, 64, 256], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[8], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=128, num_sm_multiplier=32, num_stages=6, num_warps=4, pid_type='persistent_blocked', range_flattens=[True, True], range_multi_buffers=[False, True], range_num_stages=[3, 1], range_unroll_factors=[4, 3], range_warp_specializes=[]), static_shapes=True) 2026-02-21T12:40:23.8516601Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T12:40:23.8516667Z `ptxas` stderr: 2026-02-21T12:40:23.8517142Z ptxas fatal : (C7602) Insufficient registers (128) to compile instruction at line 705 in function _helion_matmul_bf16_int4. 
Try to compile with register target of 158 or higher. 2026-02-21T12:40:23.8517244Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T12:40:23.8517251Z 2026-02-21T12:40:23.8517757Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmppzm9t2nw.ptx -o /tmp/tmppzm9t2nw.ptx.o 2026-02-21T12:40:23.8517762Z 2026-02-21T12:40:23.8517912Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T12:40:23.8517981Z shr.s16 %rs1008, %rs1007, 4; 2026-02-21T12:40:23.8518324Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8518400Z cvt.rn.f32.s16 %r12659, %rs1008; 2026-02-21T12:40:23.8518464Z cvt.rn.f32.s16 %r12660, %rs1006; 2026-02-21T12:40:23.8518526Z cvt.rn.f32.s16 %r12661, %rs1004; 2026-02-21T12:40:23.8518587Z cvt.rn.f32.s16 %r12662, %rs1002; 2026-02-21T12:40:23.8518792Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8518859Z cvt.s16.s8 %rs1009, %rs969; 2026-02-21T12:40:23.8518921Z shr.s16 %rs1010, %rs1009, 4; 2026-02-21T12:40:23.8518982Z cvt.s16.s8 %rs1011, %rs971; 2026-02-21T12:40:23.8519043Z shr.s16 %rs1012, %rs1011, 4; 2026-02-21T12:40:23.8519125Z prmt.b32 %r12663, %r12619, 0, 0xaaa2U; 2026-02-21T12:40:23.8519195Z cvt.u16.u32 %rs1013, %r12663; 2026-02-21T12:40:23.8519258Z shr.s16 %rs1014, %rs1013, 4; 2026-02-21T12:40:23.8519331Z prmt.b32 %r12664, %r12629, 0, 0xaaa2U; 2026-02-21T12:40:23.8519393Z cvt.u16.u32 %rs1015, %r12664; 2026-02-21T12:40:23.8519456Z shr.s16 %rs1016, %rs1015, 4; 2026-02-21T12:40:23.8519669Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8519738Z cvt.rn.f32.s16 %r12665, %rs1016; 2026-02-21T12:40:23.8519802Z cvt.rn.f32.s16 %r12666, %rs1014; 2026-02-21T12:40:23.8519863Z cvt.rn.f32.s16 %r12667, %rs1012; 2026-02-21T12:40:23.8519927Z cvt.rn.f32.s16 %r12668, %rs1010; 
2026-02-21T12:40:23.8520126Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8520198Z cvt.s16.s8 %rs1017, %rs970; 2026-02-21T12:40:23.8520266Z shr.s16 %rs1018, %rs1017, 4; 2026-02-21T12:40:23.8520326Z cvt.s16.s8 %rs1019, %rs972; 2026-02-21T12:40:23.8520388Z shr.s16 %rs1020, %rs1019, 4; 2026-02-21T12:40:23.8520457Z prmt.b32 %r12669, %r12624, 0, 0xaaa2U; 2026-02-21T12:40:23.8520520Z cvt.u16.u32 %rs1021, %r12669; 2026-02-21T12:40:23.8520580Z shr.s16 %rs1022, %rs1021, 4; 2026-02-21T12:40:23.8520649Z prmt.b32 %r12670, %r12634, 0, 0xaaa2U; 2026-02-21T12:40:23.8520714Z cvt.u16.u32 %rs1023, %r12670; 2026-02-21T12:40:23.8520775Z shr.s16 %rs1024, %rs1023, 4; 2026-02-21T12:40:23.8520971Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8521035Z cvt.rn.f32.s16 %r12671, %rs1024; 2026-02-21T12:40:23.8521101Z cvt.rn.f32.s16 %r12672, %rs1022; 2026-02-21T12:40:23.8521162Z cvt.rn.f32.s16 %r12673, %rs1020; 2026-02-21T12:40:23.8521225Z cvt.rn.f32.s16 %r12674, %rs1018; 2026-02-21T12:40:23.8521423Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8521484Z cvt.s16.s8 %rs1025, %rs973; 2026-02-21T12:40:23.8521544Z shr.s16 %rs1026, %rs1025, 4; 2026-02-21T12:40:23.8521608Z cvt.s16.s8 %rs1027, %rs975; 2026-02-21T12:40:23.8521746Z shr.s16 %rs1028, %rs1027, 4; 2026-02-21T12:40:23.8521812Z prmt.b32 %r12675, %r12619, 0, 0xbbb3U; 2026-02-21T12:40:23.8521935Z cvt.u16.u32 %rs1029, %r12675; 2026-02-21T12:40:23.8522000Z shr.s16 %rs1030, %rs1029, 4; 2026-02-21T12:40:23.8522065Z prmt.b32 %r12676, %r12629, 0, 0xbbb3U; 2026-02-21T12:40:23.8522137Z cvt.u16.u32 %rs1031, %r12676; 2026-02-21T12:40:23.8522199Z shr.s16 %rs1032, %rs1031, 4; 2026-02-21T12:40:23.8522395Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8522457Z cvt.rn.f32.s16 %r12677, %rs1032; 2026-02-21T12:40:23.8522519Z cvt.rn.f32.s16 %r12678, %rs1030; 
2026-02-21T12:40:23.8522583Z cvt.rn.f32.s16 %r12679, %rs1028; 2026-02-21T12:40:23.8522645Z cvt.rn.f32.s16 %r12680, %rs1026; 2026-02-21T12:40:23.8522836Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8522902Z cvt.s16.s8 %rs1033, %rs974; 2026-02-21T12:40:23.8522964Z shr.s16 %rs1034, %rs1033, 4; 2026-02-21T12:40:23.8523023Z cvt.s16.s8 %rs1035, %rs976; 2026-02-21T12:40:23.8523139Z shr.s16 %rs1036, %rs1035, 4; 2026-02-21T12:40:23.8523249Z prmt.b32 %r12681, %r12624, 0, 0xbbb3U; 2026-02-21T12:40:23.8523311Z cvt.u16.u32 %rs1037, %r12681; 2026-02-21T12:40:23.8523373Z shr.s16 %rs1038, %rs1037, 4; 2026-02-21T12:40:23.8523441Z prmt.b32 %r12682, %r12634, 0, 0xbbb3U; 2026-02-21T12:40:23.8523502Z cvt.u16.u32 %rs1039, %r12682; 2026-02-21T12:40:23.8523561Z shr.s16 %rs1040, %rs1039, 4; 2026-02-21T12:40:23.8523769Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8523833Z cvt.rn.f32.s16 %r12683, %rs1040; 2026-02-21T12:40:23.8523894Z cvt.rn.f32.s16 %r12684, %rs1038; 2026-02-21T12:40:23.8523957Z cvt.rn.f32.s16 %r12685, %rs1036; 2026-02-21T12:40:23.8524019Z cvt.rn.f32.s16 %r12686, %rs1034; 2026-02-21T12:40:23.8524075Z bar.sync 0; 2026-02-21T12:40:23.8524197Z st.shared.v4.b32 [%r33], {%r12644, %r12642, %r12643, %r12641}; 2026-02-21T12:40:23.8524327Z st.shared.v4.b32 [%r33+8192], {%r12650, %r12648, %r12649, %r12647}; 2026-02-21T12:40:23.8524441Z st.shared.v4.b32 [%r34], {%r12656, %r12654, %r12655, %r12653}; 2026-02-21T12:40:23.8524558Z st.shared.v4.b32 [%r34+8192], {%r12662, %r12660, %r12661, %r12659}; 2026-02-21T12:40:23.8524670Z st.shared.v4.b32 [%r35], {%r12668, %r12666, %r12667, %r12665}; 2026-02-21T12:40:23.8524786Z st.shared.v4.b32 [%r35+8192], {%r12674, %r12672, %r12673, %r12671}; 2026-02-21T12:40:23.8524891Z st.shared.v4.b32 [%r36], {%r12680, %r12678, %r12679, %r12677}; 2026-02-21T12:40:23.8525007Z st.shared.v4.b32 [%r36+8192], {%r12686, %r12684, %r12685, %r12683}; 
2026-02-21T12:40:23.8525061Z $L__tmp19: 2026-02-21T12:40:23.8525333Z .loc 2 291 36 // standard.py:291:36 @[ cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:87:40 ] 2026-02-21T12:40:23.8525395Z // begin inline asm 2026-02-21T12:40:23.8525479Z fence.proxy.async.shared::cta; 2026-02-21T12:40:23.8525534Z // end inline asm 2026-02-21T12:40:23.8525588Z bar.sync 0; 2026-02-21T12:40:23.8525663Z wgmma.fence.sync.aligned; 2026-02-21T12:40:23.8525722Z // begin inline asm 2026-02-21T12:40:23.8528584Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r21842,%r21843,%r21844,%r21845,%r21846,%r21847,%r21848,%r21849,%r21850,%r21851,%r21852,%r21853,%r21854,%r21855,%r21856,%r21857,%r21858,%r21859,%r21860,%r21861,%r21862,%r21863,%r21864,%r21865,%r21866,%r21867,%r21868,%r21869,%r21870,%r21871,%r21872,%r21873,%r21874,%r21875,%r21876,%r21877,%r21878,%r21879,%r21880,%r21881,%r21882,%r21883,%r21884,%r21885,%r21886,%r21887,%r21888,%r21889,%r21890,%r21891,%r21892,%r21893,%r21894,%r21895,%r21896,%r21897,%r21898,%r21899,%r21900,%r21901,%r21902,%r21903,%r21904,%r21905,%r21906,%r21907,%r21908,%r21909,%r21910,%r21911,%r21912,%r21913,%r21914,%r21915,%r21916,%r21917,%r21918,%r21919,%r21920,%r21921,%r21922,%r21923,%r21924,%r21925,%r21926,%r21927,%r21928,%r21929,%r21930,%r21931,%r21932,%r21933,%r21934,%r21935,%r21936,%r21937,%r21938,%r21939,%r21940,%r21941,%r21942,%r21943,%r21944,%r21945,%r21946,%r21947,%r21948,%r21949,%r21950,%r21951,%r21952,%r21953,%r21954,%r21955,%r21956,%r21957,%r21958,%r21959,%r21960,%r21961,%r21962,%r21963,%r21964,%r21965,%r21966,%r21967,%r21968,%r21969}, {%r11210,%r11211,%r11212,%r11213}, %rd23, %p25, 1, 1; 2026-02-21T12:40:23.8528787Z // end inline asm 2026-02-21T12:40:23.8528858Z // begin inline asm 2026-02-21T12:40:23.8531694Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r21842,%r21843,%r21844,%r21845,%r21846,%r21847,%r21848,%r21849,%r21850,%r21851,%r21852,%r21853,%r21854,%r21855,%r21856,%r21857,%r21858,%r21859,%r21860,%r21861,%r21862,%r21863,%r21864,%r21865,%r21866,%r21867,%r21868,%r21869,%r21870,%r21871,%r21872,%r21873,%r21874,%r21875,%r21876,%r21877,%r21878,%r21879,%r21880,%r21881,%r21882,%r21883,%r21884,%r21885,%r21886,%r21887,%r21888,%r21889,%r21890,%r21891,%r21892,%r21893,%r21894,%r21895,%r21896,%r21897,%r21898,%r21899,%r21900,%r21901,%r21902,%r21903,%r21904,%r21905,%r21906,%r21907,%r21908,%r21909,%r21910,%r21911,%r21912,%r21913,%r21914,%r21915,%r21916,%r21917,%r21918,%r21919,%r21920,%r21921,%r21922,%r21923,%r21924,%r21925,%r21926,%r21927,%r21928,%r21929,%r21930,%r21931,%r21932,%r21933,%r21934,%r21935,%r21936,%r21937,%r21938,%r21939,%r21940,%r21941,%r21942,%r21943,%r21944,%r21945,%r21946,%r21947,%r21948,%r21949,%r21950,%r21951,%r21952,%r21953,%r21954,%r21955,%r21956,%r21957,%r21958,%r21959,%r21960,%r21961,%r21962,%r21963,%r21964,%r21965,%r21966,%r21967,%r21968,%r21969}, {%r11470,%r11471,%r11472,%r11473}, %rd24, %p25, 1, 1; 2026-02-21T12:40:23.8531758Z // end inline asm 2026-02-21T12:40:23.8531839Z wgmma.commit_group.sync.aligned; 2026-02-21T12:40:23.8531900Z mov.b32 %r11602, %r18409; 2026-02-21T12:40:23.8531959Z mov.b32 %r11603, %r12394; 2026-02-21T12:40:23.8532023Z mov.b32 %r11604, %r12394; 2026-02-21T12:40:23.8532079Z // begin inline asm 2026-02-21T12:40:23.8534597Z // wait for regs: 
%r21842,%r21843,%r21844,%r21845,%r21846,%r21847,%r21848,%r21849,%r21850,%r21851,%r21852,%r21853,%r21854,%r21855,%r21856,%r21857,%r21858,%r21859,%r21860,%r21861,%r21862,%r21863,%r21864,%r21865,%r21866,%r21867,%r21868,%r21869,%r21870,%r21871,%r21872,%r21873,%r21874,%r21875,%r21876,%r21877,%r21878,%r21879,%r21880,%r21881,%r21882,%r21883,%r21884,%r21885,%r21886,%r21887,%r21888,%r21889,%r21890,%r21891,%r21892,%r21893,%r21894,%r21895,%r21896,%r21897,%r21898,%r21899,%r21900,%r21901,%r21902,%r21903,%r21904,%r21905,%r21906,%r21907,%r21908,%r21909,%r21910,%r21911,%r21912,%r21913,%r21914,%r21915,%r21916,%r21917,%r21918,%r21919,%r21920,%r21921,%r21922,%r21923,%r21924,%r21925,%r21926,%r21927,%r21928,%r21929,%r21930,%r21931,%r21932,%r21933,%r21934,%r21935,%r21936,%r21937,%r21938,%r21939,%r21940,%r21941,%r21942,%r21943,%r21944,%r21945,%r21946,%r21947,%r21948,%r21949,%r21950,%r21951,%r21952,%r21953,%r21954,%r21955,%r21956,%r21957,%r21958,%r21959,%r21960,%r21961,%r21962,%r21963,%r21964,%r21965,%r21966,%r21967,%r21968,%r21969,%r11602,%r11603,%r11604 2026-02-21T12:40:23.8534682Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:40:23.8534738Z // end inline asm 2026-02-21T12:40:23.8534792Z $L__tmp20: 2026-02-21T12:40:23.8534999Z .loc 1 51 80 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:51:80 2026-02-21T12:40:23.8535057Z // begin inline asm 2026-02-21T12:40:23.8535115Z mov.u64 %rd371, 0x0; 2026-02-21T12:40:23.8535257Z createpolicy.fractional.L2::evict_last.b64 %rd371, 1.0; 2026-02-21T12:40:23.8535313Z // end inline asm 2026-02-21T12:40:23.8535369Z // begin inline asm 2026-02-21T12:40:23.8535429Z mov.u32 %r11736, 0x0; 2026-02-21T12:40:23.8535487Z mov.u32 %r11737, 0x0; 2026-02-21T12:40:23.8535542Z mov.u32 %r11738, 0x0; 2026-02-21T12:40:23.8535599Z mov.u32 %r11739, 0x0; 2026-02-21T12:40:23.8535839Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r11736, %r11737, %r11738, %r11739 }, [ %rd629 + 0 ], %rd371; 2026-02-21T12:40:23.8535894Z // end inline asm 
2026-02-21T12:40:23.8536091Z .loc 1 55 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:55:32 2026-02-21T12:40:23.8536208Z bar.sync 0; 2026-02-21T12:40:23.8536292Z st.shared.v2.b32 [%r9], {%r11736, %r11737}; 2026-02-21T12:40:23.8536436Z st.shared.v2.b32 [%r10], {%r11738, %r11739}; 2026-02-21T12:40:23.8536600Z bar.sync 0; 2026-02-21T12:40:23.8536673Z ld.shared.b16 %rs1041, [%r53]; 2026-02-21T12:40:23.8536741Z ld.shared.b16 %rs1042, [%r53+256]; 2026-02-21T12:40:23.8536806Z ld.shared.b16 %rs1043, [%r53+16]; 2026-02-21T12:40:23.8536875Z ld.shared.b16 %rs1044, [%r53+272]; 2026-02-21T12:40:23.8536938Z ld.shared.b16 %rs1045, [%r54]; 2026-02-21T12:40:23.8537001Z ld.shared.b16 %rs1046, [%r54+256]; 2026-02-21T12:40:23.8537069Z ld.shared.b16 %rs1047, [%r54+16]; 2026-02-21T12:40:23.8537132Z ld.shared.b16 %rs1048, [%r54+272]; 2026-02-21T12:40:23.8537194Z cvt.f32.bf16 %r12000, %rs1041; 2026-02-21T12:40:23.8537270Z cvt.f32.bf16 %r12001, %rs1042; 2026-02-21T12:40:23.8537332Z cvt.f32.bf16 %r12002, %rs1045; 2026-02-21T12:40:23.8537395Z cvt.f32.bf16 %r12003, %rs1046; 2026-02-21T12:40:23.8537455Z cvt.f32.bf16 %r12260, %rs1043; 2026-02-21T12:40:23.8537519Z cvt.f32.bf16 %r12261, %rs1044; 2026-02-21T12:40:23.8537717Z cvt.f32.bf16 %r12262, %rs1047; 2026-02-21T12:40:23.8537781Z cvt.f32.bf16 %r12263, %rs1048; 2026-02-21T12:40:23.8537982Z .loc 1 57 87 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:57:87 2026-02-21T12:40:23.8538043Z add.s64 %rd374, %rd628, 20480; 2026-02-21T12:40:23.8538102Z // begin inline asm 2026-02-21T12:40:23.8538159Z mov.u32 %r11740, 0x0; 2026-02-21T12:40:23.8538218Z mov.u32 %r11741, 0x0; 2026-02-21T12:40:23.8538273Z mov.u32 %r11742, 0x0; 2026-02-21T12:40:23.8538329Z mov.u32 %r11743, 0x0; 2026-02-21T12:40:23.8538465Z ld.global.v4.b32 { %r11740, %r11741, %r11742, %r11743 }, [ %rd374 + 0 ]; 2026-02-21T12:40:23.8538521Z // end inline asm 2026-02-21T12:40:23.8538713Z .loc 1 65 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:65:28 
2026-02-21T12:40:23.8538774Z bar.sync 0; 2026-02-21T12:40:23.8538841Z st.shared.b8 [%r13], %r11740; 2026-02-21T12:40:23.8538913Z prmt.b32 %r12687, %r11740, 0, 0x7771U; 2026-02-21T12:40:23.8538979Z st.shared.b8 [%r14], %r12687; 2026-02-21T12:40:23.8539047Z prmt.b32 %r12688, %r11740, 0, 0x7772U; 2026-02-21T12:40:23.8539111Z st.shared.b8 [%r15+256], %r12688; 2026-02-21T12:40:23.8539177Z prmt.b32 %r12689, %r11740, 0, 0x7773U; 2026-02-21T12:40:23.8539242Z st.shared.b8 [%r16+256], %r12689; 2026-02-21T12:40:23.8539303Z st.shared.b8 [%r17+512], %r11741; 2026-02-21T12:40:23.8539367Z prmt.b32 %r12690, %r11741, 0, 0x7771U; 2026-02-21T12:40:23.8539428Z st.shared.b8 [%r18+512], %r12690; 2026-02-21T12:40:23.8539494Z prmt.b32 %r12691, %r11741, 0, 0x7772U; 2026-02-21T12:40:23.8539555Z st.shared.b8 [%r19+768], %r12691; 2026-02-21T12:40:23.8539618Z prmt.b32 %r12692, %r11741, 0, 0x7773U; 2026-02-21T12:40:23.8539682Z st.shared.b8 [%r20+768], %r12692; 2026-02-21T12:40:23.8539744Z st.shared.b8 [%r21+1024], %r11742; 2026-02-21T12:40:23.8539811Z prmt.b32 %r12693, %r11742, 0, 0x7771U; 2026-02-21T12:40:23.8539873Z st.shared.b8 [%r22+1024], %r12693; 2026-02-21T12:40:23.8539959Z prmt.b32 %r12694, %r11742, 0, 0x7772U; 2026-02-21T12:40:23.8540023Z st.shared.b8 [%r23+1280], %r12694; 2026-02-21T12:40:23.8540086Z prmt.b32 %r12695, %r11742, 0, 0x7773U; 2026-02-21T12:40:23.8540151Z st.shared.b8 [%r24+1280], %r12695; 2026-02-21T12:40:23.8540213Z st.shared.b8 [%r25+1536], %r11743; 2026-02-21T12:40:23.8540277Z prmt.b32 %r12696, %r11743, 0, 0x7771U; 2026-02-21T12:40:23.8540340Z st.shared.b8 [%r26+1536], %r12696; 2026-02-21T12:40:23.8540403Z prmt.b32 %r12697, %r11743, 0, 0x7772U; 2026-02-21T12:40:23.8540465Z st.shared.b8 [%r27+1792], %r12697; 2026-02-21T12:40:23.8540529Z prmt.b32 %r12698, %r11743, 0, 0x7773U; 2026-02-21T12:40:23.8540594Z st.shared.b8 [%r28+1792], %r12698; 2026-02-21T12:40:23.8540647Z bar.sync 0; 2026-02-21T12:40:23.8540711Z ld.shared.b32 %r12699, [%r29]; 2026-02-21T12:40:23.8540868Z 
prmt.b32 %r12700, %r12699, 0, 0x7770U; 2026-02-21T12:40:23.8540929Z cvt.u16.u32 %rs1049, %r12700; 2026-02-21T12:40:23.8540993Z prmt.b32 %r12701, %r12699, 0, 0x7771U; 2026-02-21T12:40:23.8541118Z cvt.u16.u32 %rs1050, %r12701; 2026-02-21T12:40:23.8541183Z prmt.b32 %r12702, %r12699, 0, 0x7772U; 2026-02-21T12:40:23.8541243Z cvt.u16.u32 %rs1051, %r12702; 2026-02-21T12:40:23.8541306Z prmt.b32 %r12703, %r12699, 0, 0x7773U; 2026-02-21T12:40:23.8541368Z cvt.u16.u32 %rs1052, %r12703; 2026-02-21T12:40:23.8541431Z ld.shared.b32 %r12704, [%r30]; 2026-02-21T12:40:23.8541506Z prmt.b32 %r12705, %r12704, 0, 0x7770U; 2026-02-21T12:40:23.8541570Z cvt.u16.u32 %rs1053, %r12705; 2026-02-21T12:40:23.8541634Z prmt.b32 %r12706, %r12704, 0, 0x7771U; 2026-02-21T12:40:23.8541695Z cvt.u16.u32 %rs1054, %r12706; 2026-02-21T12:40:23.8541758Z prmt.b32 %r12707, %r12704, 0, 0x7772U; 2026-02-21T12:40:23.8541821Z cvt.u16.u32 %rs1055, %r12707; 2026-02-21T12:40:23.8541884Z prmt.b32 %r12708, %r12704, 0, 0x7773U; 2026-02-21T12:40:23.8541946Z cvt.u16.u32 %rs1056, %r12708; 2026-02-21T12:40:23.8542010Z ld.shared.b32 %r12709, [%r31]; 2026-02-21T12:40:23.8542169Z prmt.b32 %r12710, %r12709, 0, 0x7770U; 2026-02-21T12:40:23.8542232Z cvt.u16.u32 %rs1057, %r12710; 2026-02-21T12:40:23.8542294Z prmt.b32 %r12711, %r12709, 0, 0x7771U; 2026-02-21T12:40:23.8542356Z cvt.u16.u32 %rs1058, %r12711; 2026-02-21T12:40:23.8542418Z prmt.b32 %r12712, %r12709, 0, 0x7772U; 2026-02-21T12:40:23.8542477Z cvt.u16.u32 %rs1059, %r12712; 2026-02-21T12:40:23.8542542Z prmt.b32 %r12713, %r12709, 0, 0x7773U; 2026-02-21T12:40:23.8542602Z cvt.u16.u32 %rs1060, %r12713; 2026-02-21T12:40:23.8542665Z ld.shared.b32 %r12714, [%r32]; 2026-02-21T12:40:23.8542728Z prmt.b32 %r12715, %r12714, 0, 0x7770U; 2026-02-21T12:40:23.8542792Z cvt.u16.u32 %rs1061, %r12715; 2026-02-21T12:40:23.8542855Z prmt.b32 %r12716, %r12714, 0, 0x7771U; 2026-02-21T12:40:23.8542914Z cvt.u16.u32 %rs1062, %r12716; 2026-02-21T12:40:23.8542981Z prmt.b32 %r12717, %r12714, 0, 0x7772U; 
2026-02-21T12:40:23.8543042Z cvt.u16.u32 %rs1063, %r12717; 2026-02-21T12:40:23.8543107Z prmt.b32 %r12718, %r12714, 0, 0x7773U; 2026-02-21T12:40:23.8543176Z cvt.u16.u32 %rs1064, %r12718; 2026-02-21T12:40:23.8543389Z .loc 1 60 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:60:28 2026-02-21T12:40:23.8543454Z shl.b16 %rs1065, %rs1049, 4; 2026-02-21T12:40:23.8543515Z shl.b16 %rs1066, %rs1053, 4; 2026-02-21T12:40:23.8543578Z shl.b16 %rs1067, %rs1057, 4; 2026-02-21T12:40:23.8543638Z shl.b16 %rs1068, %rs1061, 4; 2026-02-21T12:40:23.8543697Z shl.b16 %rs1069, %rs1050, 4; 2026-02-21T12:40:23.8543760Z shl.b16 %rs1070, %rs1054, 4; 2026-02-21T12:40:23.8543818Z shl.b16 %rs1071, %rs1058, 4; 2026-02-21T12:40:23.8543876Z shl.b16 %rs1072, %rs1062, 4; 2026-02-21T12:40:23.8543934Z shl.b16 %rs1073, %rs1051, 4; 2026-02-21T12:40:23.8543994Z shl.b16 %rs1074, %rs1055, 4; 2026-02-21T12:40:23.8544053Z shl.b16 %rs1075, %rs1059, 4; 2026-02-21T12:40:23.8544113Z shl.b16 %rs1076, %rs1063, 4; 2026-02-21T12:40:23.8544176Z shl.b16 %rs1077, %rs1052, 4; 2026-02-21T12:40:23.8544235Z shl.b16 %rs1078, %rs1056, 4; 2026-02-21T12:40:23.8544299Z shl.b16 %rs1079, %rs1060, 4; 2026-02-21T12:40:23.8544362Z shl.b16 %rs1080, %rs1064, 4; 2026-02-21T12:40:23.8544558Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8544618Z cvt.s16.s8 %rs1081, %rs1065; 2026-02-21T12:40:23.8544677Z shr.s16 %rs1082, %rs1081, 4; 2026-02-21T12:40:23.8544738Z cvt.s16.s8 %rs1083, %rs1067; 2026-02-21T12:40:23.8544798Z shr.s16 %rs1084, %rs1083, 4; 2026-02-21T12:40:23.8544865Z prmt.b32 %r12719, %r12699, 0, 0x8880U; 2026-02-21T12:40:23.8544929Z cvt.u16.u32 %rs1085, %r12719; 2026-02-21T12:40:23.8544987Z shr.s16 %rs1086, %rs1085, 4; 2026-02-21T12:40:23.8545051Z prmt.b32 %r12720, %r12709, 0, 0x8880U; 2026-02-21T12:40:23.8545112Z cvt.u16.u32 %rs1087, %r12720; 2026-02-21T12:40:23.8545173Z shr.s16 %rs1088, %rs1087, 4; 2026-02-21T12:40:23.8545430Z .loc 1 80 32 // 
cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8545544Z cvt.rn.f32.s16 %r12721, %rs1088; 2026-02-21T12:40:23.8545609Z cvt.rn.f32.s16 %r12722, %rs1086; 2026-02-21T12:40:23.8545685Z cvt.rn.f32.s16 %r12723, %rs1084; 2026-02-21T12:40:23.8545748Z cvt.rn.f32.s16 %r12724, %rs1082; 2026-02-21T12:40:23.8545946Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8546008Z cvt.s16.s8 %rs1089, %rs1066; 2026-02-21T12:40:23.8546068Z shr.s16 %rs1090, %rs1089, 4; 2026-02-21T12:40:23.8546127Z cvt.s16.s8 %rs1091, %rs1068; 2026-02-21T12:40:23.8546190Z shr.s16 %rs1092, %rs1091, 4; 2026-02-21T12:40:23.8546254Z prmt.b32 %r12725, %r12704, 0, 0x8880U; 2026-02-21T12:40:23.8546314Z cvt.u16.u32 %rs1093, %r12725; 2026-02-21T12:40:23.8546375Z shr.s16 %rs1094, %rs1093, 4; 2026-02-21T12:40:23.8546439Z prmt.b32 %r12726, %r12714, 0, 0x8880U; 2026-02-21T12:40:23.8546616Z cvt.u16.u32 %rs1095, %r12726; 2026-02-21T12:40:23.8546681Z shr.s16 %rs1096, %rs1095, 4; 2026-02-21T12:40:23.8547015Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8547095Z cvt.rn.f32.s16 %r12727, %rs1096; 2026-02-21T12:40:23.8547157Z cvt.rn.f32.s16 %r12728, %rs1094; 2026-02-21T12:40:23.8547222Z cvt.rn.f32.s16 %r12729, %rs1092; 2026-02-21T12:40:23.8547281Z cvt.rn.f32.s16 %r12730, %rs1090; 2026-02-21T12:40:23.8547474Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8547537Z cvt.s16.s8 %rs1097, %rs1069; 2026-02-21T12:40:23.8547596Z shr.s16 %rs1098, %rs1097, 4; 2026-02-21T12:40:23.8547654Z cvt.s16.s8 %rs1099, %rs1071; 2026-02-21T12:40:23.8547715Z shr.s16 %rs1100, %rs1099, 4; 2026-02-21T12:40:23.8547783Z prmt.b32 %r12731, %r12699, 0, 0x9991U; 2026-02-21T12:40:23.8547843Z cvt.u16.u32 %rs1101, %r12731; 2026-02-21T12:40:23.8547905Z shr.s16 %rs1102, %rs1101, 4; 2026-02-21T12:40:23.8547971Z prmt.b32 %r12732, %r12709, 0, 0x9991U; 2026-02-21T12:40:23.8548035Z 
cvt.u16.u32 %rs1103, %r12732; 2026-02-21T12:40:23.8548097Z shr.s16 %rs1104, %rs1103, 4; 2026-02-21T12:40:23.8548294Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8548438Z cvt.rn.f32.s16 %r12733, %rs1104; 2026-02-21T12:40:23.8548501Z cvt.rn.f32.s16 %r12734, %rs1102; 2026-02-21T12:40:23.8548561Z cvt.rn.f32.s16 %r12735, %rs1100; 2026-02-21T12:40:23.8548624Z cvt.rn.f32.s16 %r12736, %rs1098; 2026-02-21T12:40:23.8548817Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8548877Z cvt.s16.s8 %rs1105, %rs1070; 2026-02-21T12:40:23.8548948Z shr.s16 %rs1106, %rs1105, 4; 2026-02-21T12:40:23.8549009Z cvt.s16.s8 %rs1107, %rs1072; 2026-02-21T12:40:23.8549069Z shr.s16 %rs1108, %rs1107, 4; 2026-02-21T12:40:23.8549141Z prmt.b32 %r12737, %r12704, 0, 0x9991U; 2026-02-21T12:40:23.8549201Z cvt.u16.u32 %rs1109, %r12737; 2026-02-21T12:40:23.8549262Z shr.s16 %rs1110, %rs1109, 4; 2026-02-21T12:40:23.8549330Z prmt.b32 %r12738, %r12714, 0, 0x9991U; 2026-02-21T12:40:23.8549394Z cvt.u16.u32 %rs1111, %r12738; 2026-02-21T12:40:23.8549452Z shr.s16 %rs1112, %rs1111, 4; 2026-02-21T12:40:23.8549643Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8549709Z cvt.rn.f32.s16 %r12739, %rs1112; 2026-02-21T12:40:23.8549769Z cvt.rn.f32.s16 %r12740, %rs1110; 2026-02-21T12:40:23.8549828Z cvt.rn.f32.s16 %r12741, %rs1108; 2026-02-21T12:40:23.8549889Z cvt.rn.f32.s16 %r12742, %rs1106; 2026-02-21T12:40:23.8550083Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8550142Z cvt.s16.s8 %rs1113, %rs1073; 2026-02-21T12:40:23.8550202Z shr.s16 %rs1114, %rs1113, 4; 2026-02-21T12:40:23.8550355Z cvt.s16.s8 %rs1115, %rs1075; 2026-02-21T12:40:23.8550415Z shr.s16 %rs1116, %rs1115, 4; 2026-02-21T12:40:23.8550482Z prmt.b32 %r12743, %r12699, 0, 0xaaa2U; 2026-02-21T12:40:23.8550611Z cvt.u16.u32 %rs1117, %r12743; 
2026-02-21T12:40:23.8550672Z shr.s16 %rs1118, %rs1117, 4; 2026-02-21T12:40:23.8550735Z prmt.b32 %r12744, %r12709, 0, 0xaaa2U; 2026-02-21T12:40:23.8550795Z cvt.u16.u32 %rs1119, %r12744; 2026-02-21T12:40:23.8550858Z shr.s16 %rs1120, %rs1119, 4; 2026-02-21T12:40:23.8551051Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8551113Z cvt.rn.f32.s16 %r12745, %rs1120; 2026-02-21T12:40:23.8551190Z cvt.rn.f32.s16 %r12746, %rs1118; 2026-02-21T12:40:23.8551253Z cvt.rn.f32.s16 %r12747, %rs1116; 2026-02-21T12:40:23.8551315Z cvt.rn.f32.s16 %r12748, %rs1114; 2026-02-21T12:40:23.8551514Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8551576Z cvt.s16.s8 %rs1121, %rs1074; 2026-02-21T12:40:23.8551635Z shr.s16 %rs1122, %rs1121, 4; 2026-02-21T12:40:23.8551695Z cvt.s16.s8 %rs1123, %rs1076; 2026-02-21T12:40:23.8551851Z shr.s16 %rs1124, %rs1123, 4; 2026-02-21T12:40:23.8551919Z prmt.b32 %r12749, %r12704, 0, 0xaaa2U; 2026-02-21T12:40:23.8551981Z cvt.u16.u32 %rs1125, %r12749; 2026-02-21T12:40:23.8552044Z shr.s16 %rs1126, %rs1125, 4; 2026-02-21T12:40:23.8552108Z prmt.b32 %r12750, %r12714, 0, 0xaaa2U; 2026-02-21T12:40:23.8552167Z cvt.u16.u32 %rs1127, %r12750; 2026-02-21T12:40:23.8552229Z shr.s16 %rs1128, %rs1127, 4; 2026-02-21T12:40:23.8552424Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8552486Z cvt.rn.f32.s16 %r12751, %rs1128; 2026-02-21T12:40:23.8552547Z cvt.rn.f32.s16 %r12752, %rs1126; 2026-02-21T12:40:23.8552609Z cvt.rn.f32.s16 %r12753, %rs1124; 2026-02-21T12:40:23.8552671Z cvt.rn.f32.s16 %r12754, %rs1122; 2026-02-21T12:40:23.8552865Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8552927Z cvt.s16.s8 %rs1129, %rs1077; 2026-02-21T12:40:23.8552992Z shr.s16 %rs1130, %rs1129, 4; 2026-02-21T12:40:23.8553053Z cvt.s16.s8 %rs1131, %rs1079; 2026-02-21T12:40:23.8553112Z shr.s16 %rs1132, %rs1131, 
4; 2026-02-21T12:40:23.8553178Z prmt.b32 %r12755, %r12699, 0, 0xbbb3U; 2026-02-21T12:40:23.8553239Z cvt.u16.u32 %rs1133, %r12755; 2026-02-21T12:40:23.8553299Z shr.s16 %rs1134, %rs1133, 4; 2026-02-21T12:40:23.8553366Z prmt.b32 %r12756, %r12709, 0, 0xbbb3U; 2026-02-21T12:40:23.8553427Z cvt.u16.u32 %rs1135, %r12756; 2026-02-21T12:40:23.8553485Z shr.s16 %rs1136, %rs1135, 4; 2026-02-21T12:40:23.8553676Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8553742Z cvt.rn.f32.s16 %r12757, %rs1136; 2026-02-21T12:40:23.8553803Z cvt.rn.f32.s16 %r12758, %rs1134; 2026-02-21T12:40:23.8553864Z cvt.rn.f32.s16 %r12759, %rs1132; 2026-02-21T12:40:23.8553931Z cvt.rn.f32.s16 %r12760, %rs1130; 2026-02-21T12:40:23.8554125Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8554189Z cvt.s16.s8 %rs1137, %rs1078; 2026-02-21T12:40:23.8554250Z shr.s16 %rs1138, %rs1137, 4; 2026-02-21T12:40:23.8554310Z cvt.s16.s8 %rs1139, %rs1080; 2026-02-21T12:40:23.8554368Z shr.s16 %rs1140, %rs1139, 4; 2026-02-21T12:40:23.8554431Z prmt.b32 %r12761, %r12704, 0, 0xbbb3U; 2026-02-21T12:40:23.8554493Z cvt.u16.u32 %rs1141, %r12761; 2026-02-21T12:40:23.8554552Z shr.s16 %rs1142, %rs1141, 4; 2026-02-21T12:40:23.8554616Z prmt.b32 %r12762, %r12714, 0, 0xbbb3U; 2026-02-21T12:40:23.8554680Z cvt.u16.u32 %rs1143, %r12762; 2026-02-21T12:40:23.8554741Z shr.s16 %rs1144, %rs1143, 4; 2026-02-21T12:40:23.8554939Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8555003Z cvt.rn.f32.s16 %r12763, %rs1144; 2026-02-21T12:40:23.8555154Z cvt.rn.f32.s16 %r12764, %rs1142; 2026-02-21T12:40:23.8555215Z cvt.rn.f32.s16 %r12765, %rs1140; 2026-02-21T12:40:23.8555324Z cvt.rn.f32.s16 %r12766, %rs1138; 2026-02-21T12:40:23.8555382Z bar.sync 0; 2026-02-21T12:40:23.8555499Z st.shared.v4.b32 [%r33], {%r12724, %r12722, %r12723, %r12721}; 2026-02-21T12:40:23.8555624Z st.shared.v4.b32 [%r33+8192], {%r12730, 
%r12728, %r12729, %r12727}; 2026-02-21T12:40:23.8555735Z st.shared.v4.b32 [%r34], {%r12736, %r12734, %r12735, %r12733}; 2026-02-21T12:40:23.8555852Z st.shared.v4.b32 [%r34+8192], {%r12742, %r12740, %r12741, %r12739}; 2026-02-21T12:40:23.8555970Z st.shared.v4.b32 [%r35], {%r12748, %r12746, %r12747, %r12745}; 2026-02-21T12:40:23.8556091Z st.shared.v4.b32 [%r35+8192], {%r12754, %r12752, %r12753, %r12751}; 2026-02-21T12:40:23.8556198Z st.shared.v4.b32 [%r36], {%r12760, %r12758, %r12759, %r12757}; 2026-02-21T12:40:23.8556311Z st.shared.v4.b32 [%r36+8192], {%r12766, %r12764, %r12765, %r12763}; 2026-02-21T12:40:23.8556367Z $L__tmp21: 2026-02-21T12:40:23.8556844Z .loc 2 291 36 // standard.py:291:36 @[ cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:87:40 ] 2026-02-21T12:40:23.8556983Z // begin inline asm 2026-02-21T12:40:23.8557064Z fence.proxy.async.shared::cta; 2026-02-21T12:40:23.8557123Z // end inline asm 2026-02-21T12:40:23.8557177Z bar.sync 0; 2026-02-21T12:40:23.8557247Z wgmma.fence.sync.aligned; 2026-02-21T12:40:23.8557307Z // begin inline asm 2026-02-21T12:40:23.8560043Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r21842,%r21843,%r21844,%r21845,%r21846,%r21847,%r21848,%r21849,%r21850,%r21851,%r21852,%r21853,%r21854,%r21855,%r21856,%r21857,%r21858,%r21859,%r21860,%r21861,%r21862,%r21863,%r21864,%r21865,%r21866,%r21867,%r21868,%r21869,%r21870,%r21871,%r21872,%r21873,%r21874,%r21875,%r21876,%r21877,%r21878,%r21879,%r21880,%r21881,%r21882,%r21883,%r21884,%r21885,%r21886,%r21887,%r21888,%r21889,%r21890,%r21891,%r21892,%r21893,%r21894,%r21895,%r21896,%r21897,%r21898,%r21899,%r21900,%r21901,%r21902,%r21903,%r21904,%r21905,%r21906,%r21907,%r21908,%r21909,%r21910,%r21911,%r21912,%r21913,%r21914,%r21915,%r21916,%r21917,%r21918,%r21919,%r21920,%r21921,%r21922,%r21923,%r21924,%r21925,%r21926,%r21927,%r21928,%r21929,%r21930,%r21931,%r21932,%r21933,%r21934,%r21935,%r21936,%r21937,%r21938,%r21939,%r21940,%r21941,%r21942,%r21943,%r21944,%r21945,%r21946,%r21947,%r21948,%r21949,%r21950,%r21951,%r21952,%r21953,%r21954,%r21955,%r21956,%r21957,%r21958,%r21959,%r21960,%r21961,%r21962,%r21963,%r21964,%r21965,%r21966,%r21967,%r21968,%r21969}, {%r12000,%r12001,%r12002,%r12003}, %rd23, %p25, 1, 1; 2026-02-21T12:40:23.8560105Z // end inline asm 2026-02-21T12:40:23.8560163Z // begin inline asm 2026-02-21T12:40:23.8562879Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r21842,%r21843,%r21844,%r21845,%r21846,%r21847,%r21848,%r21849,%r21850,%r21851,%r21852,%r21853,%r21854,%r21855,%r21856,%r21857,%r21858,%r21859,%r21860,%r21861,%r21862,%r21863,%r21864,%r21865,%r21866,%r21867,%r21868,%r21869,%r21870,%r21871,%r21872,%r21873,%r21874,%r21875,%r21876,%r21877,%r21878,%r21879,%r21880,%r21881,%r21882,%r21883,%r21884,%r21885,%r21886,%r21887,%r21888,%r21889,%r21890,%r21891,%r21892,%r21893,%r21894,%r21895,%r21896,%r21897,%r21898,%r21899,%r21900,%r21901,%r21902,%r21903,%r21904,%r21905,%r21906,%r21907,%r21908,%r21909,%r21910,%r21911,%r21912,%r21913,%r21914,%r21915,%r21916,%r21917,%r21918,%r21919,%r21920,%r21921,%r21922,%r21923,%r21924,%r21925,%r21926,%r21927,%r21928,%r21929,%r21930,%r21931,%r21932,%r21933,%r21934,%r21935,%r21936,%r21937,%r21938,%r21939,%r21940,%r21941,%r21942,%r21943,%r21944,%r21945,%r21946,%r21947,%r21948,%r21949,%r21950,%r21951,%r21952,%r21953,%r21954,%r21955,%r21956,%r21957,%r21958,%r21959,%r21960,%r21961,%r21962,%r21963,%r21964,%r21965,%r21966,%r21967,%r21968,%r21969}, {%r12260,%r12261,%r12262,%r12263}, %rd24, %p25, 1, 1; 2026-02-21T12:40:23.8562942Z // end inline asm 2026-02-21T12:40:23.8563017Z wgmma.commit_group.sync.aligned; 2026-02-21T12:40:23.8563153Z mov.b32 %r12393, %r12394; 2026-02-21T12:40:23.8563213Z mov.b32 %r12392, %r18409; 2026-02-21T12:40:23.8563270Z // begin inline asm 2026-02-21T12:40:23.8565957Z // wait for regs: 
%r21842,%r21843,%r21844,%r21845,%r21846,%r21847,%r21848,%r21849,%r21850,%r21851,%r21852,%r21853,%r21854,%r21855,%r21856,%r21857,%r21858,%r21859,%r21860,%r21861,%r21862,%r21863,%r21864,%r21865,%r21866,%r21867,%r21868,%r21869,%r21870,%r21871,%r21872,%r21873,%r21874,%r21875,%r21876,%r21877,%r21878,%r21879,%r21880,%r21881,%r21882,%r21883,%r21884,%r21885,%r21886,%r21887,%r21888,%r21889,%r21890,%r21891,%r21892,%r21893,%r21894,%r21895,%r21896,%r21897,%r21898,%r21899,%r21900,%r21901,%r21902,%r21903,%r21904,%r21905,%r21906,%r21907,%r21908,%r21909,%r21910,%r21911,%r21912,%r21913,%r21914,%r21915,%r21916,%r21917,%r21918,%r21919,%r21920,%r21921,%r21922,%r21923,%r21924,%r21925,%r21926,%r21927,%r21928,%r21929,%r21930,%r21931,%r21932,%r21933,%r21934,%r21935,%r21936,%r21937,%r21938,%r21939,%r21940,%r21941,%r21942,%r21943,%r21944,%r21945,%r21946,%r21947,%r21948,%r21949,%r21950,%r21951,%r21952,%r21953,%r21954,%r21955,%r21956,%r21957,%r21958,%r21959,%r21960,%r21961,%r21962,%r21963,%r21964,%r21965,%r21966,%r21967,%r21968,%r21969,%r12392,%r12393,%r12394 2026-02-21T12:40:23.8566042Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:40:23.8566103Z // end inline asm 2026-02-21T12:40:23.8566155Z $L__tmp22: 2026-02-21T12:40:23.8566368Z .loc 1 43 126 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:43:126 2026-02-21T12:40:23.8566432Z add.s64 %rd630, %rd630, 24; 2026-02-21T12:40:23.8566618Z add.s64 %rd629, %rd629, 96; 2026-02-21T12:40:23.8566685Z add.s64 %rd628, %rd628, 30720; 2026-02-21T12:40:23.8566751Z setp.lt.u64 %p31, %rd630, 4056; 2026-02-21T12:40:23.8566814Z @%p31 bra $L__BB0_28; 2026-02-21T12:40:23.8566907Z // %bb.29: // %.preheader155 2026-02-21T12:40:23.8567011Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:40:23.8567077Z add.s64 %rd632, %rd27, %rd96; 2026-02-21T12:40:23.8567141Z add.s64 %rd631, %rd28, %rd95; 2026-02-21T12:40:23.8567197Z mov.b64 %rd633, 4072; 2026-02-21T12:40:23.8567320Z $L__BB0_30: // Parent Loop BB0_2 Depth=1 2026-02-21T12:40:23.8567432Z // => This Inner 
Loop Header: Depth=2 2026-02-21T12:40:23.8567635Z .loc 1 51 80 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:51:80 2026-02-21T12:40:23.8567693Z // begin inline asm 2026-02-21T12:40:23.8567754Z mov.u64 %rd378, 0x0; 2026-02-21T12:40:23.8567878Z createpolicy.fractional.L2::evict_last.b64 %rd378, 1.0; 2026-02-21T12:40:23.8567935Z // end inline asm 2026-02-21T12:40:23.8567994Z // begin inline asm 2026-02-21T12:40:23.8568053Z mov.u32 %r12767, 0x0; 2026-02-21T12:40:23.8568108Z mov.u32 %r12768, 0x0; 2026-02-21T12:40:23.8568165Z mov.u32 %r12769, 0x0; 2026-02-21T12:40:23.8568226Z mov.u32 %r12770, 0x0; 2026-02-21T12:40:23.8568462Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r12767, %r12768, %r12769, %r12770 }, [ %rd632 + 0 ], %rd378; 2026-02-21T12:40:23.8568520Z // end inline asm 2026-02-21T12:40:23.8568721Z .loc 1 55 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:55:32 2026-02-21T12:40:23.8568777Z bar.sync 0; 2026-02-21T12:40:23.8568859Z st.shared.v2.b32 [%r9], {%r12767, %r12768}; 2026-02-21T12:40:23.8568943Z st.shared.v2.b32 [%r10], {%r12769, %r12770}; 2026-02-21T12:40:23.8568998Z bar.sync 0; 2026-02-21T12:40:23.8569065Z ld.shared.b16 %rs1145, [%r53]; 2026-02-21T12:40:23.8569132Z ld.shared.b16 %rs1146, [%r53+256]; 2026-02-21T12:40:23.8569202Z ld.shared.b16 %rs1147, [%r53+16]; 2026-02-21T12:40:23.8569266Z ld.shared.b16 %rs1148, [%r53+272]; 2026-02-21T12:40:23.8569331Z ld.shared.b16 %rs1149, [%r54]; 2026-02-21T12:40:23.8569395Z ld.shared.b16 %rs1150, [%r54+256]; 2026-02-21T12:40:23.8569458Z ld.shared.b16 %rs1151, [%r54+16]; 2026-02-21T12:40:23.8569519Z ld.shared.b16 %rs1152, [%r54+272]; 2026-02-21T12:40:23.8569679Z cvt.f32.bf16 %r13031, %rs1145; 2026-02-21T12:40:23.8569744Z cvt.f32.bf16 %r13032, %rs1146; 2026-02-21T12:40:23.8569808Z cvt.f32.bf16 %r13033, %rs1149; 2026-02-21T12:40:23.8569931Z cvt.f32.bf16 %r13034, %rs1150; 2026-02-21T12:40:23.8569995Z cvt.f32.bf16 %r13291, %rs1147; 2026-02-21T12:40:23.8570055Z cvt.f32.bf16 %r13292, %rs1148; 
2026-02-21T12:40:23.8570118Z cvt.f32.bf16 %r13293, %rs1151; 2026-02-21T12:40:23.8570177Z cvt.f32.bf16 %r13294, %rs1152; 2026-02-21T12:40:23.8570381Z .loc 1 57 87 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:57:87 2026-02-21T12:40:23.8570440Z // begin inline asm 2026-02-21T12:40:23.8570497Z mov.u32 %r12771, 0x0; 2026-02-21T12:40:23.8570569Z mov.u32 %r12772, 0x0; 2026-02-21T12:40:23.8570626Z mov.u32 %r12773, 0x0; 2026-02-21T12:40:23.8570685Z mov.u32 %r12774, 0x0; 2026-02-21T12:40:23.8570820Z ld.global.v4.b32 { %r12771, %r12772, %r12773, %r12774 }, [ %rd631 + 0 ]; 2026-02-21T12:40:23.8570877Z // end inline asm 2026-02-21T12:40:23.8571076Z .loc 1 65 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:65:28 2026-02-21T12:40:23.8571198Z bar.sync 0; 2026-02-21T12:40:23.8571324Z st.shared.b8 [%r13], %r12771; 2026-02-21T12:40:23.8571396Z prmt.b32 %r13557, %r12771, 0, 0x7771U; 2026-02-21T12:40:23.8571458Z st.shared.b8 [%r14], %r13557; 2026-02-21T12:40:23.8571525Z prmt.b32 %r13558, %r12771, 0, 0x7772U; 2026-02-21T12:40:23.8571590Z st.shared.b8 [%r15+256], %r13558; 2026-02-21T12:40:23.8571654Z prmt.b32 %r13559, %r12771, 0, 0x7773U; 2026-02-21T12:40:23.8571716Z st.shared.b8 [%r16+256], %r13559; 2026-02-21T12:40:23.8571781Z st.shared.b8 [%r17+512], %r12772; 2026-02-21T12:40:23.8571857Z prmt.b32 %r13560, %r12772, 0, 0x7771U; 2026-02-21T12:40:23.8571920Z st.shared.b8 [%r18+512], %r13560; 2026-02-21T12:40:23.8571987Z prmt.b32 %r13561, %r12772, 0, 0x7772U; 2026-02-21T12:40:23.8572048Z st.shared.b8 [%r19+768], %r13561; 2026-02-21T12:40:23.8572112Z prmt.b32 %r13562, %r12772, 0, 0x7773U; 2026-02-21T12:40:23.8572179Z st.shared.b8 [%r20+768], %r13562; 2026-02-21T12:40:23.8572241Z st.shared.b8 [%r21+1024], %r12773; 2026-02-21T12:40:23.8572309Z prmt.b32 %r13563, %r12773, 0, 0x7771U; 2026-02-21T12:40:23.8572373Z st.shared.b8 [%r22+1024], %r13563; 2026-02-21T12:40:23.8572438Z prmt.b32 %r13564, %r12773, 0, 0x7772U; 2026-02-21T12:40:23.8572500Z st.shared.b8 [%r23+1280], 
%r13564; 2026-02-21T12:40:23.8572564Z prmt.b32 %r13565, %r12773, 0, 0x7773U; 2026-02-21T12:40:23.8572628Z st.shared.b8 [%r24+1280], %r13565; 2026-02-21T12:40:23.8572690Z st.shared.b8 [%r25+1536], %r12774; 2026-02-21T12:40:23.8572754Z prmt.b32 %r13566, %r12774, 0, 0x7771U; 2026-02-21T12:40:23.8572816Z st.shared.b8 [%r26+1536], %r13566; 2026-02-21T12:40:23.8572883Z prmt.b32 %r13567, %r12774, 0, 0x7772U; 2026-02-21T12:40:23.8572946Z st.shared.b8 [%r27+1792], %r13567; 2026-02-21T12:40:23.8573009Z prmt.b32 %r13568, %r12774, 0, 0x7773U; 2026-02-21T12:40:23.8573074Z st.shared.b8 [%r28+1792], %r13568; 2026-02-21T12:40:23.8573129Z bar.sync 0; 2026-02-21T12:40:23.8573192Z ld.shared.b32 %r13569, [%r29]; 2026-02-21T12:40:23.8573260Z prmt.b32 %r13570, %r13569, 0, 0x7770U; 2026-02-21T12:40:23.8573325Z cvt.u16.u32 %rs1153, %r13570; 2026-02-21T12:40:23.8573389Z prmt.b32 %r13571, %r13569, 0, 0x7771U; 2026-02-21T12:40:23.8573449Z cvt.u16.u32 %rs1154, %r13571; 2026-02-21T12:40:23.8573515Z prmt.b32 %r13572, %r13569, 0, 0x7772U; 2026-02-21T12:40:23.8573573Z cvt.u16.u32 %rs1155, %r13572; 2026-02-21T12:40:23.8573637Z prmt.b32 %r13573, %r13569, 0, 0x7773U; 2026-02-21T12:40:23.8573699Z cvt.u16.u32 %rs1156, %r13573; 2026-02-21T12:40:23.8573763Z ld.shared.b32 %r13574, [%r30]; 2026-02-21T12:40:23.8573839Z prmt.b32 %r13575, %r13574, 0, 0x7770U; 2026-02-21T12:40:23.8573901Z cvt.u16.u32 %rs1157, %r13575; 2026-02-21T12:40:23.8573968Z prmt.b32 %r13576, %r13574, 0, 0x7771U; 2026-02-21T12:40:23.8574028Z cvt.u16.u32 %rs1158, %r13576; 2026-02-21T12:40:23.8574092Z prmt.b32 %r13577, %r13574, 0, 0x7772U; 2026-02-21T12:40:23.8574214Z cvt.u16.u32 %rs1159, %r13577; 2026-02-21T12:40:23.8574278Z prmt.b32 %r13578, %r13574, 0, 0x7773U; 2026-02-21T12:40:23.8574385Z cvt.u16.u32 %rs1160, %r13578; 2026-02-21T12:40:23.8574450Z ld.shared.b32 %r13579, [%r31]; 2026-02-21T12:40:23.8574518Z prmt.b32 %r13580, %r13579, 0, 0x7770U; 2026-02-21T12:40:23.8574577Z cvt.u16.u32 %rs1161, %r13580; 2026-02-21T12:40:23.8574639Z 
prmt.b32 %r13581, %r13579, 0, 0x7771U; 2026-02-21T12:40:23.8574701Z cvt.u16.u32 %rs1162, %r13581; 2026-02-21T12:40:23.8574764Z prmt.b32 %r13582, %r13579, 0, 0x7772U; 2026-02-21T12:40:23.8574824Z cvt.u16.u32 %rs1163, %r13582; 2026-02-21T12:40:23.8574889Z prmt.b32 %r13583, %r13579, 0, 0x7773U; 2026-02-21T12:40:23.8574949Z cvt.u16.u32 %rs1164, %r13583; 2026-02-21T12:40:23.8575012Z ld.shared.b32 %r13584, [%r32]; 2026-02-21T12:40:23.8575074Z prmt.b32 %r13585, %r13584, 0, 0x7770U; 2026-02-21T12:40:23.8575136Z cvt.u16.u32 %rs1165, %r13585; 2026-02-21T12:40:23.8575198Z prmt.b32 %r13586, %r13584, 0, 0x7771U; 2026-02-21T12:40:23.8575259Z cvt.u16.u32 %rs1166, %r13586; 2026-02-21T12:40:23.8575324Z prmt.b32 %r13587, %r13584, 0, 0x7772U; 2026-02-21T12:40:23.8575493Z cvt.u16.u32 %rs1167, %r13587; 2026-02-21T12:40:23.8575560Z prmt.b32 %r13588, %r13584, 0, 0x7773U; 2026-02-21T12:40:23.8575632Z cvt.u16.u32 %rs1168, %r13588; 2026-02-21T12:40:23.8575834Z .loc 1 60 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:60:28 2026-02-21T12:40:23.8575898Z shl.b16 %rs1169, %rs1153, 4; 2026-02-21T12:40:23.8575960Z shl.b16 %rs1170, %rs1157, 4; 2026-02-21T12:40:23.8576021Z shl.b16 %rs1171, %rs1161, 4; 2026-02-21T12:40:23.8576083Z shl.b16 %rs1172, %rs1165, 4; 2026-02-21T12:40:23.8576144Z shl.b16 %rs1173, %rs1154, 4; 2026-02-21T12:40:23.8576207Z shl.b16 %rs1174, %rs1158, 4; 2026-02-21T12:40:23.8576266Z shl.b16 %rs1175, %rs1162, 4; 2026-02-21T12:40:23.8576324Z shl.b16 %rs1176, %rs1166, 4; 2026-02-21T12:40:23.8576382Z shl.b16 %rs1177, %rs1155, 4; 2026-02-21T12:40:23.8576565Z shl.b16 %rs1178, %rs1159, 4; 2026-02-21T12:40:23.8576631Z shl.b16 %rs1179, %rs1163, 4; 2026-02-21T12:40:23.8576691Z shl.b16 %rs1180, %rs1167, 4; 2026-02-21T12:40:23.8576755Z shl.b16 %rs1181, %rs1156, 4; 2026-02-21T12:40:23.8576815Z shl.b16 %rs1182, %rs1160, 4; 2026-02-21T12:40:23.8576875Z shl.b16 %rs1183, %rs1164, 4; 2026-02-21T12:40:23.8576934Z shl.b16 %rs1184, %rs1168, 4; 2026-02-21T12:40:23.8577146Z .loc 1 62 25 // 
cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8577208Z cvt.s16.s8 %rs1185, %rs1169; 2026-02-21T12:40:23.8577266Z shr.s16 %rs1186, %rs1185, 4; 2026-02-21T12:40:23.8577328Z cvt.s16.s8 %rs1187, %rs1171; 2026-02-21T12:40:23.8577387Z shr.s16 %rs1188, %rs1187, 4; 2026-02-21T12:40:23.8577451Z prmt.b32 %r13589, %r13569, 0, 0x8880U; 2026-02-21T12:40:23.8577515Z cvt.u16.u32 %rs1189, %r13589; 2026-02-21T12:40:23.8577573Z shr.s16 %rs1190, %rs1189, 4; 2026-02-21T12:40:23.8577637Z prmt.b32 %r13590, %r13579, 0, 0x8880U; 2026-02-21T12:40:23.8577698Z cvt.u16.u32 %rs1191, %r13590; 2026-02-21T12:40:23.8577760Z shr.s16 %rs1192, %rs1191, 4; 2026-02-21T12:40:23.8577957Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8578033Z cvt.rn.f32.s16 %r13591, %rs1192; 2026-02-21T12:40:23.8578102Z cvt.rn.f32.s16 %r13592, %rs1190; 2026-02-21T12:40:23.8578166Z cvt.rn.f32.s16 %r13593, %rs1188; 2026-02-21T12:40:23.8578230Z cvt.rn.f32.s16 %r13594, %rs1186; 2026-02-21T12:40:23.8578493Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8578557Z cvt.s16.s8 %rs1193, %rs1170; 2026-02-21T12:40:23.8578619Z shr.s16 %rs1194, %rs1193, 4; 2026-02-21T12:40:23.8578681Z cvt.s16.s8 %rs1195, %rs1172; 2026-02-21T12:40:23.8578748Z shr.s16 %rs1196, %rs1195, 4; 2026-02-21T12:40:23.8578813Z prmt.b32 %r13595, %r13574, 0, 0x8880U; 2026-02-21T12:40:23.8578875Z cvt.u16.u32 %rs1197, %r13595; 2026-02-21T12:40:23.8579031Z shr.s16 %rs1198, %rs1197, 4; 2026-02-21T12:40:23.8579097Z prmt.b32 %r13596, %r13584, 0, 0x8880U; 2026-02-21T12:40:23.8579220Z cvt.u16.u32 %rs1199, %r13596; 2026-02-21T12:40:23.8579281Z shr.s16 %rs1200, %rs1199, 4; 2026-02-21T12:40:23.8579477Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8579540Z cvt.rn.f32.s16 %r13597, %rs1200; 2026-02-21T12:40:23.8579601Z cvt.rn.f32.s16 %r13598, %rs1198; 2026-02-21T12:40:23.8579666Z 
cvt.rn.f32.s16 %r13599, %rs1196; 2026-02-21T12:40:23.8579726Z cvt.rn.f32.s16 %r13600, %rs1194; 2026-02-21T12:40:23.8579918Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8579981Z cvt.s16.s8 %rs1201, %rs1173; 2026-02-21T12:40:23.8580040Z shr.s16 %rs1202, %rs1201, 4; 2026-02-21T12:40:23.8580099Z cvt.s16.s8 %rs1203, %rs1175; 2026-02-21T12:40:23.8580156Z shr.s16 %rs1204, %rs1203, 4; 2026-02-21T12:40:23.8580227Z prmt.b32 %r13601, %r13569, 0, 0x9991U; 2026-02-21T12:40:23.8580288Z cvt.u16.u32 %rs1205, %r13601; 2026-02-21T12:40:23.8580409Z shr.s16 %rs1206, %rs1205, 4; 2026-02-21T12:40:23.8580539Z prmt.b32 %r13602, %r13579, 0, 0x9991U; 2026-02-21T12:40:23.8580602Z cvt.u16.u32 %rs1207, %r13602; 2026-02-21T12:40:23.8580661Z shr.s16 %rs1208, %rs1207, 4; 2026-02-21T12:40:23.8580854Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8580919Z cvt.rn.f32.s16 %r13603, %rs1208; 2026-02-21T12:40:23.8580992Z cvt.rn.f32.s16 %r13604, %rs1206; 2026-02-21T12:40:23.8581054Z cvt.rn.f32.s16 %r13605, %rs1204; 2026-02-21T12:40:23.8581115Z cvt.rn.f32.s16 %r13606, %rs1202; 2026-02-21T12:40:23.8581307Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8581367Z cvt.s16.s8 %rs1209, %rs1174; 2026-02-21T12:40:23.8581429Z shr.s16 %rs1210, %rs1209, 4; 2026-02-21T12:40:23.8581490Z cvt.s16.s8 %rs1211, %rs1176; 2026-02-21T12:40:23.8581548Z shr.s16 %rs1212, %rs1211, 4; 2026-02-21T12:40:23.8581615Z prmt.b32 %r13607, %r13574, 0, 0x9991U; 2026-02-21T12:40:23.8581680Z cvt.u16.u32 %rs1213, %r13607; 2026-02-21T12:40:23.8581740Z shr.s16 %rs1214, %rs1213, 4; 2026-02-21T12:40:23.8581805Z prmt.b32 %r13608, %r13584, 0, 0x9991U; 2026-02-21T12:40:23.8581865Z cvt.u16.u32 %rs1215, %r13608; 2026-02-21T12:40:23.8581925Z shr.s16 %rs1216, %rs1215, 4; 2026-02-21T12:40:23.8582116Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 
2026-02-21T12:40:23.8582178Z cvt.rn.f32.s16 %r13609, %rs1216; 2026-02-21T12:40:23.8582243Z cvt.rn.f32.s16 %r13610, %rs1214; 2026-02-21T12:40:23.8582305Z cvt.rn.f32.s16 %r13611, %rs1212; 2026-02-21T12:40:23.8582366Z cvt.rn.f32.s16 %r13612, %rs1210; 2026-02-21T12:40:23.8582561Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8582623Z cvt.s16.s8 %rs1217, %rs1177; 2026-02-21T12:40:23.8582683Z shr.s16 %rs1218, %rs1217, 4; 2026-02-21T12:40:23.8582745Z cvt.s16.s8 %rs1219, %rs1179; 2026-02-21T12:40:23.8582806Z shr.s16 %rs1220, %rs1219, 4; 2026-02-21T12:40:23.8582871Z prmt.b32 %r13613, %r13569, 0, 0xaaa2U; 2026-02-21T12:40:23.8582933Z cvt.u16.u32 %rs1221, %r13613; 2026-02-21T12:40:23.8582995Z shr.s16 %rs1222, %rs1221, 4; 2026-02-21T12:40:23.8583060Z prmt.b32 %r13614, %r13579, 0, 0xaaa2U; 2026-02-21T12:40:23.8583119Z cvt.u16.u32 %rs1223, %r13614; 2026-02-21T12:40:23.8583181Z shr.s16 %rs1224, %rs1223, 4; 2026-02-21T12:40:23.8583373Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8583434Z cvt.rn.f32.s16 %r13615, %rs1224; 2026-02-21T12:40:23.8583496Z cvt.rn.f32.s16 %r13616, %rs1222; 2026-02-21T12:40:23.8583558Z cvt.rn.f32.s16 %r13617, %rs1220; 2026-02-21T12:40:23.8583618Z cvt.rn.f32.s16 %r13618, %rs1218; 2026-02-21T12:40:23.8583873Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8583937Z cvt.s16.s8 %rs1225, %rs1178; 2026-02-21T12:40:23.8584044Z shr.s16 %rs1226, %rs1225, 4; 2026-02-21T12:40:23.8584115Z cvt.s16.s8 %rs1227, %rs1180; 2026-02-21T12:40:23.8584180Z shr.s16 %rs1228, %rs1227, 4; 2026-02-21T12:40:23.8584247Z prmt.b32 %r13619, %r13574, 0, 0xaaa2U; 2026-02-21T12:40:23.8584308Z cvt.u16.u32 %rs1229, %r13619; 2026-02-21T12:40:23.8584368Z shr.s16 %rs1230, %rs1229, 4; 2026-02-21T12:40:23.8584435Z prmt.b32 %r13620, %r13584, 0, 0xaaa2U; 2026-02-21T12:40:23.8584495Z cvt.u16.u32 %rs1231, %r13620; 2026-02-21T12:40:23.8584554Z shr.s16 
%rs1232, %rs1231, 4; 2026-02-21T12:40:23.8584749Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8584809Z cvt.rn.f32.s16 %r13621, %rs1232; 2026-02-21T12:40:23.8584871Z cvt.rn.f32.s16 %r13622, %rs1230; 2026-02-21T12:40:23.8584936Z cvt.rn.f32.s16 %r13623, %rs1228; 2026-02-21T12:40:23.8584997Z cvt.rn.f32.s16 %r13624, %rs1226; 2026-02-21T12:40:23.8585280Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8585344Z cvt.s16.s8 %rs1233, %rs1181; 2026-02-21T12:40:23.8585405Z shr.s16 %rs1234, %rs1233, 4; 2026-02-21T12:40:23.8585465Z cvt.s16.s8 %rs1235, %rs1183; 2026-02-21T12:40:23.8585525Z shr.s16 %rs1236, %rs1235, 4; 2026-02-21T12:40:23.8585605Z prmt.b32 %r13625, %r13569, 0, 0xbbb3U; 2026-02-21T12:40:23.8585668Z cvt.u16.u32 %rs1237, %r13625; 2026-02-21T12:40:23.8585727Z shr.s16 %rs1238, %rs1237, 4; 2026-02-21T12:40:23.8585795Z prmt.b32 %r13626, %r13579, 0, 0xbbb3U; 2026-02-21T12:40:23.8585855Z cvt.u16.u32 %rs1239, %r13626; 2026-02-21T12:40:23.8585913Z shr.s16 %rs1240, %rs1239, 4; 2026-02-21T12:40:23.8586103Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8586169Z cvt.rn.f32.s16 %r13627, %rs1240; 2026-02-21T12:40:23.8586232Z cvt.rn.f32.s16 %r13628, %rs1238; 2026-02-21T12:40:23.8586295Z cvt.rn.f32.s16 %r13629, %rs1236; 2026-02-21T12:40:23.8586362Z cvt.rn.f32.s16 %r13630, %rs1234; 2026-02-21T12:40:23.8586677Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8586741Z cvt.s16.s8 %rs1241, %rs1182; 2026-02-21T12:40:23.8586801Z shr.s16 %rs1242, %rs1241, 4; 2026-02-21T12:40:23.8586864Z cvt.s16.s8 %rs1243, %rs1184; 2026-02-21T12:40:23.8586923Z shr.s16 %rs1244, %rs1243, 4; 2026-02-21T12:40:23.8586989Z prmt.b32 %r13631, %r13574, 0, 0xbbb3U; 2026-02-21T12:40:23.8587051Z cvt.u16.u32 %rs1245, %r13631; 2026-02-21T12:40:23.8587109Z shr.s16 %rs1246, %rs1245, 4; 2026-02-21T12:40:23.8587175Z 
prmt.b32 %r13632, %r13584, 0, 0xbbb3U; 2026-02-21T12:40:23.8587238Z cvt.u16.u32 %rs1247, %r13632; 2026-02-21T12:40:23.8587309Z shr.s16 %rs1248, %rs1247, 4; 2026-02-21T12:40:23.8587503Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8587568Z cvt.rn.f32.s16 %r13633, %rs1248; 2026-02-21T12:40:23.8587637Z cvt.rn.f32.s16 %r13634, %rs1246; 2026-02-21T12:40:23.8587698Z cvt.rn.f32.s16 %r13635, %rs1244; 2026-02-21T12:40:23.8587759Z cvt.rn.f32.s16 %r13636, %rs1242; 2026-02-21T12:40:23.8587815Z bar.sync 0; 2026-02-21T12:40:23.8587931Z st.shared.v4.b32 [%r33], {%r13594, %r13592, %r13593, %r13591}; 2026-02-21T12:40:23.8588052Z st.shared.v4.b32 [%r33+8192], {%r13600, %r13598, %r13599, %r13597}; 2026-02-21T12:40:23.8588162Z st.shared.v4.b32 [%r34], {%r13606, %r13604, %r13605, %r13603}; 2026-02-21T12:40:23.8588277Z st.shared.v4.b32 [%r34+8192], {%r13612, %r13610, %r13611, %r13609}; 2026-02-21T12:40:23.8588445Z st.shared.v4.b32 [%r35], {%r13618, %r13616, %r13617, %r13615}; 2026-02-21T12:40:23.8588562Z st.shared.v4.b32 [%r35+8192], {%r13624, %r13622, %r13623, %r13621}; 2026-02-21T12:40:23.8588673Z st.shared.v4.b32 [%r36], {%r13630, %r13628, %r13629, %r13627}; 2026-02-21T12:40:23.8588884Z st.shared.v4.b32 [%r36+8192], {%r13636, %r13634, %r13635, %r13633}; 2026-02-21T12:40:23.8588939Z $L__tmp23: 2026-02-21T12:40:23.8589279Z .loc 2 291 36 // standard.py:291:36 @[ cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:87:40 ] 2026-02-21T12:40:23.8589339Z // begin inline asm 2026-02-21T12:40:23.8589414Z fence.proxy.async.shared::cta; 2026-02-21T12:40:23.8589472Z // end inline asm 2026-02-21T12:40:23.8589525Z bar.sync 0; 2026-02-21T12:40:23.8589606Z shfl.sync.idx.b32 %r13637, %r2, 0, 31, -1; 2026-02-21T12:40:23.8589678Z wgmma.fence.sync.aligned; 2026-02-21T12:40:23.8589743Z mov.pred %p32, -1; 2026-02-21T12:40:23.8589800Z // begin inline asm 2026-02-21T12:40:23.8592653Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r21842,%r21843,%r21844,%r21845,%r21846,%r21847,%r21848,%r21849,%r21850,%r21851,%r21852,%r21853,%r21854,%r21855,%r21856,%r21857,%r21858,%r21859,%r21860,%r21861,%r21862,%r21863,%r21864,%r21865,%r21866,%r21867,%r21868,%r21869,%r21870,%r21871,%r21872,%r21873,%r21874,%r21875,%r21876,%r21877,%r21878,%r21879,%r21880,%r21881,%r21882,%r21883,%r21884,%r21885,%r21886,%r21887,%r21888,%r21889,%r21890,%r21891,%r21892,%r21893,%r21894,%r21895,%r21896,%r21897,%r21898,%r21899,%r21900,%r21901,%r21902,%r21903,%r21904,%r21905,%r21906,%r21907,%r21908,%r21909,%r21910,%r21911,%r21912,%r21913,%r21914,%r21915,%r21916,%r21917,%r21918,%r21919,%r21920,%r21921,%r21922,%r21923,%r21924,%r21925,%r21926,%r21927,%r21928,%r21929,%r21930,%r21931,%r21932,%r21933,%r21934,%r21935,%r21936,%r21937,%r21938,%r21939,%r21940,%r21941,%r21942,%r21943,%r21944,%r21945,%r21946,%r21947,%r21948,%r21949,%r21950,%r21951,%r21952,%r21953,%r21954,%r21955,%r21956,%r21957,%r21958,%r21959,%r21960,%r21961,%r21962,%r21963,%r21964,%r21965,%r21966,%r21967,%r21968,%r21969}, {%r13031,%r13032,%r13033,%r13034}, %rd23, %p32, 1, 1; 2026-02-21T12:40:23.8592717Z // end inline asm 2026-02-21T12:40:23.8592774Z // begin inline asm 2026-02-21T12:40:23.8595497Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r21842,%r21843,%r21844,%r21845,%r21846,%r21847,%r21848,%r21849,%r21850,%r21851,%r21852,%r21853,%r21854,%r21855,%r21856,%r21857,%r21858,%r21859,%r21860,%r21861,%r21862,%r21863,%r21864,%r21865,%r21866,%r21867,%r21868,%r21869,%r21870,%r21871,%r21872,%r21873,%r21874,%r21875,%r21876,%r21877,%r21878,%r21879,%r21880,%r21881,%r21882,%r21883,%r21884,%r21885,%r21886,%r21887,%r21888,%r21889,%r21890,%r21891,%r21892,%r21893,%r21894,%r21895,%r21896,%r21897,%r21898,%r21899,%r21900,%r21901,%r21902,%r21903,%r21904,%r21905,%r21906,%r21907,%r21908,%r21909,%r21910,%r21911,%r21912,%r21913,%r21914,%r21915,%r21916,%r21917,%r21918,%r21919,%r21920,%r21921,%r21922,%r21923,%r21924,%r21925,%r21926,%r21927,%r21928,%r21929,%r21930,%r21931,%r21932,%r21933,%r21934,%r21935,%r21936,%r21937,%r21938,%r21939,%r21940,%r21941,%r21942,%r21943,%r21944,%r21945,%r21946,%r21947,%r21948,%r21949,%r21950,%r21951,%r21952,%r21953,%r21954,%r21955,%r21956,%r21957,%r21958,%r21959,%r21960,%r21961,%r21962,%r21963,%r21964,%r21965,%r21966,%r21967,%r21968,%r21969}, {%r13291,%r13292,%r13293,%r13294}, %rd24, %p32, 1, 1; 2026-02-21T12:40:23.8595559Z // end inline asm 2026-02-21T12:40:23.8595638Z wgmma.commit_group.sync.aligned; 2026-02-21T12:40:23.8595695Z mov.b32 %r13424, 0; 2026-02-21T12:40:23.8595755Z mov.b32 %r13425, %r13424; 2026-02-21T12:40:23.8595819Z mov.b32 %r13423, %r18409; 2026-02-21T12:40:23.8595878Z // begin inline asm 2026-02-21T12:40:23.8598581Z // wait for regs: 
%r21842,%r21843,%r21844,%r21845,%r21846,%r21847,%r21848,%r21849,%r21850,%r21851,%r21852,%r21853,%r21854,%r21855,%r21856,%r21857,%r21858,%r21859,%r21860,%r21861,%r21862,%r21863,%r21864,%r21865,%r21866,%r21867,%r21868,%r21869,%r21870,%r21871,%r21872,%r21873,%r21874,%r21875,%r21876,%r21877,%r21878,%r21879,%r21880,%r21881,%r21882,%r21883,%r21884,%r21885,%r21886,%r21887,%r21888,%r21889,%r21890,%r21891,%r21892,%r21893,%r21894,%r21895,%r21896,%r21897,%r21898,%r21899,%r21900,%r21901,%r21902,%r21903,%r21904,%r21905,%r21906,%r21907,%r21908,%r21909,%r21910,%r21911,%r21912,%r21913,%r21914,%r21915,%r21916,%r21917,%r21918,%r21919,%r21920,%r21921,%r21922,%r21923,%r21924,%r21925,%r21926,%r21927,%r21928,%r21929,%r21930,%r21931,%r21932,%r21933,%r21934,%r21935,%r21936,%r21937,%r21938,%r21939,%r21940,%r21941,%r21942,%r21943,%r21944,%r21945,%r21946,%r21947,%r21948,%r21949,%r21950,%r21951,%r21952,%r21953,%r21954,%r21955,%r21956,%r21957,%r21958,%r21959,%r21960,%r21961,%r21962,%r21963,%r21964,%r21965,%r21966,%r21967,%r21968,%r21969,%r13423,%r13424,%r13425 2026-02-21T12:40:23.8598797Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:40:23.8598863Z // end inline asm 2026-02-21T12:40:23.8598919Z $L__tmp24: 2026-02-21T12:40:23.8599133Z .loc 1 43 126 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:43:126 2026-02-21T12:40:23.8599197Z add.s64 %rd633, %rd633, 8; 2026-02-21T12:40:23.8599262Z add.s64 %rd632, %rd632, 32; 2026-02-21T12:40:23.8599324Z add.s64 %rd631, %rd631, 10240; 2026-02-21T12:40:23.8599389Z setp.lt.u64 %p34, %rd633, 4088; 2026-02-21T12:40:23.8599451Z @%p34 bra $L__BB0_30; 2026-02-21T12:40:23.8599571Z // %bb.31: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:40:23.8599905Z .loc 1 34 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:34:32 2026-02-21T12:40:23.8599972Z or.b64 %rd400, %rd94, %rd5; 2026-02-21T12:40:23.8600035Z or.b64 %rd401, %rd94, %rd6; 2026-02-21T12:40:23.8600096Z or.b64 %rd402, %rd94, %rd7; 2026-02-21T12:40:23.8600155Z or.b64 %rd403, %rd94, %rd8; 
2026-02-21T12:40:23.8600214Z or.b64 %rd404, %rd94, %rd9; 2026-02-21T12:40:23.8600278Z or.b64 %rd405, %rd94, %rd10; 2026-02-21T12:40:23.8600336Z or.b64 %rd406, %rd94, %rd11; 2026-02-21T12:40:23.8600394Z or.b64 %rd407, %rd94, %rd12; 2026-02-21T12:40:23.8600462Z or.b64 %rd408, %rd94, %rd13; 2026-02-21T12:40:23.8600526Z or.b64 %rd409, %rd94, %rd14; 2026-02-21T12:40:23.8600588Z or.b64 %rd410, %rd94, %rd15; 2026-02-21T12:40:23.8600650Z or.b64 %rd411, %rd94, %rd16; 2026-02-21T12:40:23.8600710Z or.b64 %rd412, %rd94, %rd17; 2026-02-21T12:40:23.8600768Z or.b64 %rd413, %rd94, %rd18; 2026-02-21T12:40:23.8600826Z or.b64 %rd414, %rd94, %rd19; 2026-02-21T12:40:23.8600894Z or.b64 %rd415, %rd94, %rd20; 2026-02-21T12:40:23.8601090Z .loc 1 36 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:36:32 2026-02-21T12:40:23.8601149Z or.b64 %rd416, %rd95, %rd22; 2026-02-21T12:40:23.8601341Z .loc 1 90 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:90:28 2026-02-21T12:40:23.8601425Z cvt.rn.bf16x2.f32 %r13782, %r21843, %r21842; 2026-02-21T12:40:23.8601505Z cvt.rn.bf16x2.f32 %r13783, %r21845, %r21844; 2026-02-21T12:40:23.8601585Z cvt.rn.bf16x2.f32 %r13784, %r21847, %r21846; 2026-02-21T12:40:23.8601661Z cvt.rn.bf16x2.f32 %r13785, %r21849, %r21848; 2026-02-21T12:40:23.8601736Z cvt.rn.bf16x2.f32 %r13786, %r21851, %r21850; 2026-02-21T12:40:23.8601813Z cvt.rn.bf16x2.f32 %r13787, %r21853, %r21852; 2026-02-21T12:40:23.8601906Z cvt.rn.bf16x2.f32 %r13788, %r21855, %r21854; 2026-02-21T12:40:23.8601983Z cvt.rn.bf16x2.f32 %r13789, %r21857, %r21856; 2026-02-21T12:40:23.8602063Z cvt.rn.bf16x2.f32 %r13790, %r21859, %r21858; 2026-02-21T12:40:23.8602141Z cvt.rn.bf16x2.f32 %r13791, %r21861, %r21860; 2026-02-21T12:40:23.8602216Z cvt.rn.bf16x2.f32 %r13792, %r21863, %r21862; 2026-02-21T12:40:23.8602289Z cvt.rn.bf16x2.f32 %r13793, %r21865, %r21864; 2026-02-21T12:40:23.8602365Z cvt.rn.bf16x2.f32 %r13794, %r21867, %r21866; 2026-02-21T12:40:23.8602444Z cvt.rn.bf16x2.f32 %r13795, %r21869, 
%r21868; 2026-02-21T12:40:23.8602518Z cvt.rn.bf16x2.f32 %r13796, %r21871, %r21870; 2026-02-21T12:40:23.8602594Z cvt.rn.bf16x2.f32 %r13797, %r21873, %r21872; 2026-02-21T12:40:23.8602671Z cvt.rn.bf16x2.f32 %r13798, %r21875, %r21874; 2026-02-21T12:40:23.8602745Z cvt.rn.bf16x2.f32 %r13799, %r21877, %r21876; 2026-02-21T12:40:23.8602820Z cvt.rn.bf16x2.f32 %r13800, %r21879, %r21878; 2026-02-21T12:40:23.8602895Z cvt.rn.bf16x2.f32 %r13801, %r21881, %r21880; 2026-02-21T12:40:23.8603032Z cvt.rn.bf16x2.f32 %r13802, %r21883, %r21882; 2026-02-21T12:40:23.8603108Z cvt.rn.bf16x2.f32 %r13803, %r21885, %r21884; 2026-02-21T12:40:23.8603229Z cvt.rn.bf16x2.f32 %r13804, %r21887, %r21886; 2026-02-21T12:40:23.8603321Z cvt.rn.bf16x2.f32 %r13805, %r21889, %r21888; 2026-02-21T12:40:23.8603398Z cvt.rn.bf16x2.f32 %r13806, %r21891, %r21890; 2026-02-21T12:40:23.8603472Z cvt.rn.bf16x2.f32 %r13807, %r21893, %r21892; 2026-02-21T12:40:23.8603549Z cvt.rn.bf16x2.f32 %r13808, %r21895, %r21894; 2026-02-21T12:40:23.8603623Z cvt.rn.bf16x2.f32 %r13809, %r21897, %r21896; 2026-02-21T12:40:23.8603696Z cvt.rn.bf16x2.f32 %r13810, %r21899, %r21898; 2026-02-21T12:40:23.8603773Z cvt.rn.bf16x2.f32 %r13811, %r21901, %r21900; 2026-02-21T12:40:23.8603847Z cvt.rn.bf16x2.f32 %r13812, %r21903, %r21902; 2026-02-21T12:40:23.8603920Z cvt.rn.bf16x2.f32 %r13813, %r21905, %r21904; 2026-02-21T12:40:23.8603993Z cvt.rn.bf16x2.f32 %r13814, %r21907, %r21906; 2026-02-21T12:40:23.8604072Z cvt.rn.bf16x2.f32 %r13815, %r21909, %r21908; 2026-02-21T12:40:23.8604146Z cvt.rn.bf16x2.f32 %r13816, %r21911, %r21910; 2026-02-21T12:40:23.8604316Z cvt.rn.bf16x2.f32 %r13817, %r21913, %r21912; 2026-02-21T12:40:23.8604397Z cvt.rn.bf16x2.f32 %r13818, %r21915, %r21914; 2026-02-21T12:40:23.8604471Z cvt.rn.bf16x2.f32 %r13819, %r21917, %r21916; 2026-02-21T12:40:23.8604544Z cvt.rn.bf16x2.f32 %r13820, %r21919, %r21918; 2026-02-21T12:40:23.8604620Z cvt.rn.bf16x2.f32 %r13821, %r21921, %r21920; 2026-02-21T12:40:23.8604694Z cvt.rn.bf16x2.f32 %r13822, %r21923, 
%r21922; 2026-02-21T12:40:23.8604770Z cvt.rn.bf16x2.f32 %r13823, %r21925, %r21924; 2026-02-21T12:40:23.8604843Z cvt.rn.bf16x2.f32 %r13824, %r21927, %r21926; 2026-02-21T12:40:23.8604918Z cvt.rn.bf16x2.f32 %r13825, %r21929, %r21928; 2026-02-21T12:40:23.8604989Z cvt.rn.bf16x2.f32 %r13826, %r21931, %r21930; 2026-02-21T12:40:23.8605063Z cvt.rn.bf16x2.f32 %r13827, %r21933, %r21932; 2026-02-21T12:40:23.8605145Z cvt.rn.bf16x2.f32 %r13828, %r21935, %r21934; 2026-02-21T12:40:23.8605227Z cvt.rn.bf16x2.f32 %r13829, %r21937, %r21936; 2026-02-21T12:40:23.8605305Z cvt.rn.bf16x2.f32 %r13830, %r21939, %r21938; 2026-02-21T12:40:23.8605385Z cvt.rn.bf16x2.f32 %r13831, %r21941, %r21940; 2026-02-21T12:40:23.8605460Z cvt.rn.bf16x2.f32 %r13832, %r21943, %r21942; 2026-02-21T12:40:23.8605532Z cvt.rn.bf16x2.f32 %r13833, %r21945, %r21944; 2026-02-21T12:40:23.8605606Z cvt.rn.bf16x2.f32 %r13834, %r21947, %r21946; 2026-02-21T12:40:23.8605683Z cvt.rn.bf16x2.f32 %r13835, %r21949, %r21948; 2026-02-21T12:40:23.8605758Z cvt.rn.bf16x2.f32 %r13836, %r21951, %r21950; 2026-02-21T12:40:23.8605830Z cvt.rn.bf16x2.f32 %r13837, %r21953, %r21952; 2026-02-21T12:40:23.8605906Z cvt.rn.bf16x2.f32 %r13838, %r21955, %r21954; 2026-02-21T12:40:23.8605980Z cvt.rn.bf16x2.f32 %r13839, %r21957, %r21956; 2026-02-21T12:40:23.8606053Z cvt.rn.bf16x2.f32 %r13840, %r21959, %r21958; 2026-02-21T12:40:23.8606129Z cvt.rn.bf16x2.f32 %r13841, %r21961, %r21960; 2026-02-21T12:40:23.8606205Z cvt.rn.bf16x2.f32 %r13842, %r21963, %r21962; 2026-02-21T12:40:23.8606278Z cvt.rn.bf16x2.f32 %r13843, %r21965, %r21964; 2026-02-21T12:40:23.8606355Z cvt.rn.bf16x2.f32 %r13844, %r21967, %r21966; 2026-02-21T12:40:23.8606433Z cvt.rn.bf16x2.f32 %r13845, %r21969, %r21968; 2026-02-21T12:40:23.8606797Z .loc 1 91 22 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:91:22 2026-02-21T12:40:23.8606876Z mad.lo.s64 %rd417, %rd400, 2560, %rd162; 2026-02-21T12:40:23.8606941Z shl.b64 %rd418, %rd416, 1; 2026-02-21T12:40:23.8607006Z add.s64 %rd384, %rd417, 
%rd418; 2026-02-21T12:40:23.8607076Z mad.lo.s64 %rd419, %rd401, 2560, %rd162; 2026-02-21T12:40:23.8607141Z add.s64 %rd385, %rd419, %rd418; 2026-02-21T12:40:23.8607210Z mad.lo.s64 %rd420, %rd402, 2560, %rd162; 2026-02-21T12:40:23.8607281Z add.s64 %rd386, %rd420, %rd418; 2026-02-21T12:40:23.8607351Z mad.lo.s64 %rd421, %rd403, 2560, %rd162; 2026-02-21T12:40:23.8607416Z add.s64 %rd387, %rd421, %rd418; 2026-02-21T12:40:23.8607571Z mad.lo.s64 %rd422, %rd404, 2560, %rd162; 2026-02-21T12:40:23.8607632Z add.s64 %rd388, %rd422, %rd418; 2026-02-21T12:40:23.8607762Z mad.lo.s64 %rd423, %rd405, 2560, %rd162; 2026-02-21T12:40:23.8607825Z add.s64 %rd389, %rd423, %rd418; 2026-02-21T12:40:23.8607893Z mad.lo.s64 %rd424, %rd406, 2560, %rd162; 2026-02-21T12:40:23.8607956Z add.s64 %rd390, %rd424, %rd418; 2026-02-21T12:40:23.8608025Z mad.lo.s64 %rd425, %rd407, 2560, %rd162; 2026-02-21T12:40:23.8608087Z add.s64 %rd391, %rd425, %rd418; 2026-02-21T12:40:23.8608157Z mad.lo.s64 %rd426, %rd408, 2560, %rd162; 2026-02-21T12:40:23.8608219Z add.s64 %rd392, %rd426, %rd418; 2026-02-21T12:40:23.8608286Z mad.lo.s64 %rd427, %rd409, 2560, %rd162; 2026-02-21T12:40:23.8608346Z add.s64 %rd393, %rd427, %rd418; 2026-02-21T12:40:23.8608415Z mad.lo.s64 %rd428, %rd410, 2560, %rd162; 2026-02-21T12:40:23.8608476Z add.s64 %rd394, %rd428, %rd418; 2026-02-21T12:40:23.8608543Z mad.lo.s64 %rd429, %rd411, 2560, %rd162; 2026-02-21T12:40:23.8608605Z add.s64 %rd395, %rd429, %rd418; 2026-02-21T12:40:23.8608677Z mad.lo.s64 %rd430, %rd412, 2560, %rd162; 2026-02-21T12:40:23.8608805Z add.s64 %rd396, %rd430, %rd418; 2026-02-21T12:40:23.8608939Z mad.lo.s64 %rd431, %rd413, 2560, %rd162; 2026-02-21T12:40:23.8609003Z add.s64 %rd397, %rd431, %rd418; 2026-02-21T12:40:23.8609083Z mad.lo.s64 %rd432, %rd414, 2560, %rd162; 2026-02-21T12:40:23.8609146Z add.s64 %rd398, %rd432, %rd418; 2026-02-21T12:40:23.8609214Z mad.lo.s64 %rd433, %rd415, 2560, %rd162; 2026-02-21T12:40:23.8609278Z add.s64 %rd399, %rd433, %rd418; 
2026-02-21T12:40:23.8609473Z .loc 1 91 81 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:91:81 2026-02-21T12:40:23.8609529Z bar.sync 0; 2026-02-21T12:40:23.8609647Z st.shared.v4.b32 [%r37], {%r13782, %r13784, %r13786, %r13788}; 2026-02-21T12:40:23.8609756Z st.shared.v4.b32 [%r38], {%r13790, %r13792, %r13794, %r13796}; 2026-02-21T12:40:23.8609864Z st.shared.v4.b32 [%r39], {%r13798, %r13800, %r13802, %r13804}; 2026-02-21T12:40:23.8609977Z st.shared.v4.b32 [%r40], {%r13806, %r13808, %r13810, %r13812}; 2026-02-21T12:40:23.8610089Z st.shared.v4.b32 [%r41], {%r13814, %r13816, %r13818, %r13820}; 2026-02-21T12:40:23.8612916Z st.shared.v4.b32 [%r42], {%r13822, %r13824, %r13826, %r13828}; 2026-02-21T12:40:23.8613070Z st.shared.v4.b32 [%r43], {%r13830, %r13832, %r13834, %r13836}; 2026-02-21T12:40:23.8613204Z st.shared.v4.b32 [%r44], {%r13838, %r13840, %r13842, %r13844}; 2026-02-21T12:40:23.8613266Z bar.sync 0; 2026-02-21T12:40:23.8613333Z // begin inline asm 2026-02-21T12:40:23.8613542Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13638, %r13639, %r13640, %r13641}, [%r6254]; 2026-02-21T12:40:23.8613601Z // end inline asm 2026-02-21T12:40:23.8613664Z // begin inline asm 2026-02-21T12:40:23.8613857Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13643, %r13644, %r13645, %r13646}, [%r6259]; 2026-02-21T12:40:23.8613914Z // end inline asm 2026-02-21T12:40:23.8613973Z // begin inline asm 2026-02-21T12:40:23.8614166Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13648, %r13649, %r13650, %r13651}, [%r6264]; 2026-02-21T12:40:23.8614220Z // end inline asm 2026-02-21T12:40:23.8614285Z // begin inline asm 2026-02-21T12:40:23.8614471Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13653, %r13654, %r13655, %r13656}, [%r6269]; 2026-02-21T12:40:23.8614524Z // end inline asm 2026-02-21T12:40:23.8614583Z // begin inline asm 2026-02-21T12:40:23.8614769Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13658, %r13659, %r13660, %r13661}, [%r6274]; 2026-02-21T12:40:23.8614823Z // end inline asm 
2026-02-21T12:40:23.8614878Z // begin inline asm 2026-02-21T12:40:23.8615064Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13663, %r13664, %r13665, %r13666}, [%r6279]; 2026-02-21T12:40:23.8615120Z // end inline asm 2026-02-21T12:40:23.8615174Z // begin inline asm 2026-02-21T12:40:23.8615360Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13668, %r13669, %r13670, %r13671}, [%r6284]; 2026-02-21T12:40:23.8615509Z // end inline asm 2026-02-21T12:40:23.8615566Z // begin inline asm 2026-02-21T12:40:23.8615751Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13673, %r13674, %r13675, %r13676}, [%r6289]; 2026-02-21T12:40:23.8615869Z // end inline asm 2026-02-21T12:40:23.8615923Z bar.sync 0; 2026-02-21T12:40:23.8616035Z st.shared.v4.b32 [%r37], {%r13783, %r13785, %r13787, %r13789}; 2026-02-21T12:40:23.8616144Z st.shared.v4.b32 [%r38], {%r13791, %r13793, %r13795, %r13797}; 2026-02-21T12:40:23.8616248Z st.shared.v4.b32 [%r39], {%r13799, %r13801, %r13803, %r13805}; 2026-02-21T12:40:23.8616351Z st.shared.v4.b32 [%r40], {%r13807, %r13809, %r13811, %r13813}; 2026-02-21T12:40:23.8616631Z st.shared.v4.b32 [%r41], {%r13815, %r13817, %r13819, %r13821}; 2026-02-21T12:40:23.8616745Z st.shared.v4.b32 [%r42], {%r13823, %r13825, %r13827, %r13829}; 2026-02-21T12:40:23.8616848Z st.shared.v4.b32 [%r43], {%r13831, %r13833, %r13835, %r13837}; 2026-02-21T12:40:23.8616952Z st.shared.v4.b32 [%r44], {%r13839, %r13841, %r13843, %r13845}; 2026-02-21T12:40:23.8617014Z bar.sync 0; 2026-02-21T12:40:23.8617071Z // begin inline asm 2026-02-21T12:40:23.8617420Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13678, %r13679, %r13680, %r13681}, [%r6254]; 2026-02-21T12:40:23.8617488Z // end inline asm 2026-02-21T12:40:23.8617545Z // begin inline asm 2026-02-21T12:40:23.8617732Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13683, %r13684, %r13685, %r13686}, [%r6259]; 2026-02-21T12:40:23.8617787Z // end inline asm 2026-02-21T12:40:23.8617846Z // begin inline asm 2026-02-21T12:40:23.8618030Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13688, %r13689, %r13690, %r13691}, [%r6264]; 2026-02-21T12:40:23.8618083Z // end inline asm 2026-02-21T12:40:23.8618150Z // begin inline asm 2026-02-21T12:40:23.8618340Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13693, %r13694, %r13695, %r13696}, [%r6269]; 2026-02-21T12:40:23.8618395Z // end inline asm 2026-02-21T12:40:23.8618455Z // begin inline asm 2026-02-21T12:40:23.8618647Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13698, %r13699, %r13700, %r13701}, [%r6274]; 2026-02-21T12:40:23.8618705Z // end inline asm 2026-02-21T12:40:23.8618761Z // begin inline asm 2026-02-21T12:40:23.8618960Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13703, %r13704, %r13705, %r13706}, [%r6279]; 2026-02-21T12:40:23.8619014Z // end inline asm 2026-02-21T12:40:23.8619069Z // begin inline asm 2026-02-21T12:40:23.8619258Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13708, %r13709, %r13710, %r13711}, [%r6284]; 2026-02-21T12:40:23.8619313Z // end inline asm 2026-02-21T12:40:23.8619368Z // begin inline asm 2026-02-21T12:40:23.8619553Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13713, %r13714, %r13715, %r13716}, [%r6289]; 2026-02-21T12:40:23.8619608Z // end inline asm 2026-02-21T12:40:23.8619664Z // begin inline asm 2026-02-21T12:40:23.8619793Z st.global.v4.b32 [ %rd384 + 0 ], { %r13638, %r13639, %r13640, %r13641 }; 2026-02-21T12:40:23.8619852Z // end inline asm 2026-02-21T12:40:23.8619906Z // begin inline asm 2026-02-21T12:40:23.8620030Z st.global.v4.b32 [ %rd385 + 0 ], { %r13643, %r13644, %r13645, %r13646 }; 2026-02-21T12:40:23.8620086Z // end inline asm 2026-02-21T12:40:23.8620147Z // begin inline asm 2026-02-21T12:40:23.8620278Z st.global.v4.b32 [ %rd386 + 0 ], { %r13678, %r13679, %r13680, %r13681 }; 2026-02-21T12:40:23.8620336Z // end inline asm 2026-02-21T12:40:23.8620395Z // begin inline asm 2026-02-21T12:40:23.8620511Z st.global.v4.b32 [ %rd387 + 0 ], { %r13683, %r13684, %r13685, %r13686 }; 2026-02-21T12:40:23.8620564Z // end inline asm 
2026-02-21T12:40:23.8620622Z // begin inline asm 2026-02-21T12:40:23.8620738Z st.global.v4.b32 [ %rd388 + 0 ], { %r13648, %r13649, %r13650, %r13651 }; 2026-02-21T12:40:23.8620793Z // end inline asm 2026-02-21T12:40:23.8620850Z // begin inline asm 2026-02-21T12:40:23.8620965Z st.global.v4.b32 [ %rd389 + 0 ], { %r13653, %r13654, %r13655, %r13656 }; 2026-02-21T12:40:23.8621019Z // end inline asm 2026-02-21T12:40:23.8621074Z // begin inline asm 2026-02-21T12:40:23.8621280Z st.global.v4.b32 [ %rd390 + 0 ], { %r13688, %r13689, %r13690, %r13691 }; 2026-02-21T12:40:23.8621334Z // end inline asm 2026-02-21T12:40:23.8621389Z // begin inline asm 2026-02-21T12:40:23.8621593Z st.global.v4.b32 [ %rd391 + 0 ], { %r13693, %r13694, %r13695, %r13696 }; 2026-02-21T12:40:23.8621650Z // end inline asm 2026-02-21T12:40:23.8621707Z // begin inline asm 2026-02-21T12:40:23.8621824Z st.global.v4.b32 [ %rd392 + 0 ], { %r13658, %r13659, %r13660, %r13661 }; 2026-02-21T12:40:23.8621881Z // end inline asm 2026-02-21T12:40:23.8621948Z // begin inline asm 2026-02-21T12:40:23.8622066Z st.global.v4.b32 [ %rd393 + 0 ], { %r13663, %r13664, %r13665, %r13666 }; 2026-02-21T12:40:23.8622122Z // end inline asm 2026-02-21T12:40:23.8622178Z // begin inline asm 2026-02-21T12:40:23.8622292Z st.global.v4.b32 [ %rd394 + 0 ], { %r13698, %r13699, %r13700, %r13701 }; 2026-02-21T12:40:23.8622348Z // end inline asm 2026-02-21T12:40:23.8622406Z // begin inline asm 2026-02-21T12:40:23.8622520Z st.global.v4.b32 [ %rd395 + 0 ], { %r13703, %r13704, %r13705, %r13706 }; 2026-02-21T12:40:23.8622577Z // end inline asm 2026-02-21T12:40:23.8622636Z // begin inline asm 2026-02-21T12:40:23.8622847Z st.global.v4.b32 [ %rd396 + 0 ], { %r13668, %r13669, %r13670, %r13671 }; 2026-02-21T12:40:23.8622903Z // end inline asm 2026-02-21T12:40:23.8622962Z // begin inline asm 2026-02-21T12:40:23.8623078Z st.global.v4.b32 [ %rd397 + 0 ], { %r13673, %r13674, %r13675, %r13676 }; 2026-02-21T12:40:23.8623132Z // end inline asm 
2026-02-21T12:40:23.8623187Z // begin inline asm 2026-02-21T12:40:23.8623303Z st.global.v4.b32 [ %rd398 + 0 ], { %r13708, %r13709, %r13710, %r13711 }; 2026-02-21T12:40:23.8623356Z // end inline asm 2026-02-21T12:40:23.8623411Z // begin inline asm 2026-02-21T12:40:23.8623529Z st.global.v4.b32 [ %rd399 + 0 ], { %r13713, %r13714, %r13715, %r13716 }; 2026-02-21T12:40:23.8623584Z // end inline asm 2026-02-21T12:40:23.8623804Z .loc 1 22 120 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:22:120 2026-02-21T12:40:23.8623876Z add.s64 %rd434, %rd612, 3; 2026-02-21T12:40:23.8624095Z .loc 1 28 35 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:28:35 2026-02-21T12:40:23.8624192Z mul.hi.u64 %rd435, %rd434, -3689348814741910323; 2026-02-21T12:40:23.8624258Z shr.u64 %rd436, %rd435, 5; 2026-02-21T12:40:23.8624469Z .loc 1 29 33 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:29:33 2026-02-21T12:40:23.8624530Z shl.b64 %rd113, %rd436, 3; 2026-02-21T12:40:23.8624725Z .loc 1 30 39 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:30:39 2026-02-21T12:40:23.8624794Z sub.s64 %rd437, 4096, %rd113; 2026-02-21T12:40:23.8624986Z .loc 1 30 52 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:30:52 2026-02-21T12:40:23.8625046Z min.s64 %rd114, %rd437, 8; 2026-02-21T12:40:23.8625240Z .loc 1 31 45 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:31:45 2026-02-21T12:40:23.8625307Z mul.lo.s64 %rd438, %rd436, 40; 2026-02-21T12:40:23.8625371Z sub.s64 %rd115, %rd434, %rd438; 2026-02-21T12:40:23.8625569Z .loc 1 32 51 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:32:51 2026-02-21T12:40:23.8625631Z or.b64 %rd439, %rd115, %rd114; 2026-02-21T12:40:23.8625700Z and.b64 %rd440, %rd439, -4294967296; 2026-02-21T12:40:23.8625767Z setp.ne.b64 %p35, %rd440, 0; 2026-02-21T12:40:23.8625831Z @%p35 bra $L__BB0_33; 2026-02-21T12:40:23.8625889Z bra.uni $L__BB0_32; 2026-02-21T12:40:23.8626001Z $L__BB0_33: // in Loop: Header=BB0_2 Depth=1 
2026-02-21T12:40:23.8626065Z div.s64 %rd634, %rd115, %rd114; 2026-02-21T12:40:23.8626123Z bra.uni $L__BB0_34; 2026-02-21T12:40:23.8626227Z $L__BB0_32: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:40:23.8626290Z cvt.u32.u64 %r13846, %rd114; 2026-02-21T12:40:23.8626365Z cvt.u32.u64 %r13847, %rd115; 2026-02-21T12:40:23.8626606Z div.u32 %r13848, %r13847, %r13846; 2026-02-21T12:40:23.8626670Z cvt.u64.u32 %rd634, %r13848; 2026-02-21T12:40:23.8626780Z $L__BB0_34: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:40:23.8627065Z .loc 1 31 64 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:31:64 2026-02-21T12:40:23.8627133Z mul.lo.s64 %rd442, %rd634, %rd114; 2026-02-21T12:40:23.8627196Z sub.s64 %rd443, %rd115, %rd442; 2026-02-21T12:40:23.8627391Z .loc 1 31 30 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:31:30 2026-02-21T12:40:23.8627454Z add.s64 %rd444, %rd443, %rd113; 2026-02-21T12:40:23.8627644Z .loc 1 33 27 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:33:27 2026-02-21T12:40:23.8627718Z shl.b64 %rd119, %rd444, 6; 2026-02-21T12:40:23.8627911Z .loc 1 35 27 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:35:27 2026-02-21T12:40:23.8627973Z shl.b64 %rd120, %rd634, 8; 2026-02-21T12:40:23.8628178Z .loc 1 43 126 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:43:126 2026-02-21T12:40:23.8628455Z shl.b64 %rd121, %rd444, 20; 2026-02-21T12:40:23.8628534Z add.s64 %rd636, %rd25, %rd121; 2026-02-21T12:40:23.8628599Z add.s64 %rd635, %rd26, %rd120; 2026-02-21T12:40:23.8628662Z mov.b32 %r22098, 0f00000000; 2026-02-21T12:40:23.8628722Z mov.b64 %rd637, -24; 2026-02-21T12:40:23.8628784Z mov.b32 %r22099, %r22098; 2026-02-21T12:40:23.8628841Z mov.b32 %r22100, %r22098; 2026-02-21T12:40:23.8628900Z mov.b32 %r22101, %r22098; 2026-02-21T12:40:23.8628956Z mov.b32 %r22102, %r22098; 2026-02-21T12:40:23.8629018Z mov.b32 %r22103, %r22098; 2026-02-21T12:40:23.8629075Z mov.b32 %r22104, %r22098; 2026-02-21T12:40:23.8629131Z mov.b32 %r22105, 
%r22098; 2026-02-21T12:40:23.8629191Z mov.b32 %r22106, %r22098; 2026-02-21T12:40:23.8629249Z mov.b32 %r22107, %r22098; 2026-02-21T12:40:23.8629304Z mov.b32 %r22108, %r22098; 2026-02-21T12:40:23.8629364Z mov.b32 %r22109, %r22098; 2026-02-21T12:40:23.8629423Z mov.b32 %r22110, %r22098; 2026-02-21T12:40:23.8629481Z mov.b32 %r22111, %r22098; 2026-02-21T12:40:23.8629541Z mov.b32 %r22112, %r22098; 2026-02-21T12:40:23.8629602Z mov.b32 %r22113, %r22098; 2026-02-21T12:40:23.8629657Z mov.b32 %r22114, %r22098; 2026-02-21T12:40:23.8629713Z mov.b32 %r22115, %r22098; 2026-02-21T12:40:23.8629769Z mov.b32 %r22116, %r22098; 2026-02-21T12:40:23.8629828Z mov.b32 %r22117, %r22098; 2026-02-21T12:40:23.8629884Z mov.b32 %r22118, %r22098; 2026-02-21T12:40:23.8629942Z mov.b32 %r22119, %r22098; 2026-02-21T12:40:23.8630000Z mov.b32 %r22120, %r22098; 2026-02-21T12:40:23.8630056Z mov.b32 %r22121, %r22098; 2026-02-21T12:40:23.8630114Z mov.b32 %r22122, %r22098; 2026-02-21T12:40:23.8630170Z mov.b32 %r22123, %r22098; 2026-02-21T12:40:23.8630229Z mov.b32 %r22124, %r22098; 2026-02-21T12:40:23.8630286Z mov.b32 %r22125, %r22098; 2026-02-21T12:40:23.8630342Z mov.b32 %r22126, %r22098; 2026-02-21T12:40:23.8630403Z mov.b32 %r22127, %r22098; 2026-02-21T12:40:23.8630459Z mov.b32 %r22128, %r22098; 2026-02-21T12:40:23.8630521Z mov.b32 %r22129, %r22098; 2026-02-21T12:40:23.8630580Z mov.b32 %r22130, %r22098; 2026-02-21T12:40:23.8630654Z mov.b32 %r22131, %r22098; 2026-02-21T12:40:23.8630712Z mov.b32 %r22132, %r22098; 2026-02-21T12:40:23.8630768Z mov.b32 %r22133, %r22098; 2026-02-21T12:40:23.8630828Z mov.b32 %r22134, %r22098; 2026-02-21T12:40:23.8630886Z mov.b32 %r22135, %r22098; 2026-02-21T12:40:23.8630941Z mov.b32 %r22136, %r22098; 2026-02-21T12:40:23.8630996Z mov.b32 %r22137, %r22098; 2026-02-21T12:40:23.8631056Z mov.b32 %r22138, %r22098; 2026-02-21T12:40:23.8631111Z mov.b32 %r22139, %r22098; 2026-02-21T12:40:23.8631168Z mov.b32 %r22140, %r22098; 2026-02-21T12:40:23.8631228Z mov.b32 %r22141, %r22098; 
2026-02-21T12:40:23.8631287Z mov.b32 %r22142, %r22098; 2026-02-21T12:40:23.8631345Z mov.b32 %r22143, %r22098; 2026-02-21T12:40:23.8631405Z mov.b32 %r22144, %r22098; 2026-02-21T12:40:23.8631544Z mov.b32 %r22145, %r22098; 2026-02-21T12:40:23.8631600Z mov.b32 %r22146, %r22098; 2026-02-21T12:40:23.8631657Z mov.b32 %r22147, %r22098; 2026-02-21T12:40:23.8631763Z mov.b32 %r22148, %r22098; 2026-02-21T12:40:23.8631820Z mov.b32 %r22149, %r22098; 2026-02-21T12:40:23.8631875Z mov.b32 %r22150, %r22098; 2026-02-21T12:40:23.8631933Z mov.b32 %r22151, %r22098; 2026-02-21T12:40:23.8631990Z mov.b32 %r22152, %r22098; 2026-02-21T12:40:23.8632047Z mov.b32 %r22153, %r22098; 2026-02-21T12:40:23.8632103Z mov.b32 %r22154, %r22098; 2026-02-21T12:40:23.8632163Z mov.b32 %r22155, %r22098; 2026-02-21T12:40:23.8632221Z mov.b32 %r22156, %r22098; 2026-02-21T12:40:23.8632278Z mov.b32 %r22157, %r22098; 2026-02-21T12:40:23.8632336Z mov.b32 %r22158, %r22098; 2026-02-21T12:40:23.8632393Z mov.b32 %r22159, %r22098; 2026-02-21T12:40:23.8632448Z mov.b32 %r22160, %r22098; 2026-02-21T12:40:23.8632503Z mov.b32 %r22161, %r22098; 2026-02-21T12:40:23.8632562Z mov.b32 %r22162, %r22098; 2026-02-21T12:40:23.8632621Z mov.b32 %r22163, %r22098; 2026-02-21T12:40:23.8632678Z mov.b32 %r22164, %r22098; 2026-02-21T12:40:23.8632792Z mov.b32 %r22165, %r22098; 2026-02-21T12:40:23.8632919Z mov.b32 %r22166, %r22098; 2026-02-21T12:40:23.8632977Z mov.b32 %r22167, %r22098; 2026-02-21T12:40:23.8633042Z mov.b32 %r22168, %r22098; 2026-02-21T12:40:23.8633143Z mov.b32 %r22169, %r22098; 2026-02-21T12:40:23.8633242Z mov.b32 %r22170, %r22098; 2026-02-21T12:40:23.8633344Z mov.b32 %r22171, %r22098; 2026-02-21T12:40:23.8633430Z mov.b32 %r22172, %r22098; 2026-02-21T12:40:23.8633489Z mov.b32 %r22173, %r22098; 2026-02-21T12:40:23.8633544Z mov.b32 %r22174, %r22098; 2026-02-21T12:40:23.8633600Z mov.b32 %r22175, %r22098; 2026-02-21T12:40:23.8633660Z mov.b32 %r22176, %r22098; 2026-02-21T12:40:23.8633715Z mov.b32 %r22177, %r22098; 
2026-02-21T12:40:23.8633770Z mov.b32 %r22178, %r22098; 2026-02-21T12:40:23.8633829Z mov.b32 %r22179, %r22098; 2026-02-21T12:40:23.8633885Z mov.b32 %r22180, %r22098; 2026-02-21T12:40:23.8633945Z mov.b32 %r22181, %r22098; 2026-02-21T12:40:23.8634001Z mov.b32 %r22182, %r22098; 2026-02-21T12:40:23.8634093Z mov.b32 %r22183, %r22098; 2026-02-21T12:40:23.8634201Z mov.b32 %r22184, %r22098; 2026-02-21T12:40:23.8634281Z mov.b32 %r22185, %r22098; 2026-02-21T12:40:23.8634341Z mov.b32 %r22186, %r22098; 2026-02-21T12:40:23.8634399Z mov.b32 %r22187, %r22098; 2026-02-21T12:40:23.8634456Z mov.b32 %r22188, %r22098; 2026-02-21T12:40:23.8634513Z mov.b32 %r22189, %r22098; 2026-02-21T12:40:23.8634572Z mov.b32 %r22190, %r22098; 2026-02-21T12:40:23.8634628Z mov.b32 %r22191, %r22098; 2026-02-21T12:40:23.8634684Z mov.b32 %r22192, %r22098; 2026-02-21T12:40:23.8634745Z mov.b32 %r22193, %r22098; 2026-02-21T12:40:23.8634801Z mov.b32 %r22194, %r22098; 2026-02-21T12:40:23.8634857Z mov.b32 %r22195, %r22098; 2026-02-21T12:40:23.8634937Z mov.b32 %r22196, %r22098; 2026-02-21T12:40:23.8635040Z mov.b32 %r22197, %r22098; 2026-02-21T12:40:23.8635126Z mov.b32 %r22198, %r22098; 2026-02-21T12:40:23.8635188Z mov.b32 %r22199, %r22098; 2026-02-21T12:40:23.8635246Z mov.b32 %r22200, %r22098; 2026-02-21T12:40:23.8635304Z mov.b32 %r22201, %r22098; 2026-02-21T12:40:23.8635364Z mov.b32 %r22202, %r22098; 2026-02-21T12:40:23.8635427Z mov.b32 %r22203, %r22098; 2026-02-21T12:40:23.8635482Z mov.b32 %r22204, %r22098; 2026-02-21T12:40:23.8635538Z mov.b32 %r22205, %r22098; 2026-02-21T12:40:23.8635593Z mov.b32 %r22206, %r22098; 2026-02-21T12:40:23.8635652Z mov.b32 %r22207, %r22098; 2026-02-21T12:40:23.8635707Z mov.b32 %r22208, %r22098; 2026-02-21T12:40:23.8635780Z mov.b32 %r22209, %r22098; 2026-02-21T12:40:23.8635885Z mov.b32 %r22210, %r22098; 2026-02-21T12:40:23.8635981Z mov.b32 %r22211, %r22098; 2026-02-21T12:40:23.8636080Z mov.b32 %r22212, %r22098; 2026-02-21T12:40:23.8636178Z mov.b32 %r22213, %r22098; 
2026-02-21T12:40:23.8636280Z mov.b32 %r22214, %r22098; 2026-02-21T12:40:23.8636366Z mov.b32 %r22215, %r22098; 2026-02-21T12:40:23.8636424Z mov.b32 %r22216, %r22098; 2026-02-21T12:40:23.8636790Z mov.b32 %r22217, %r22098; 2026-02-21T12:40:23.8636886Z mov.b32 %r22218, %r22098; 2026-02-21T12:40:23.8636984Z mov.b32 %r22219, %r22098; 2026-02-21T12:40:23.8637132Z mov.b32 %r22220, %r22098; 2026-02-21T12:40:23.8637191Z mov.b32 %r22221, %r22098; 2026-02-21T12:40:23.8637247Z mov.b32 %r22222, %r22098; 2026-02-21T12:40:23.8637303Z mov.b32 %r22223, %r22098; 2026-02-21T12:40:23.8637364Z mov.b32 %r22224, %r22098; 2026-02-21T12:40:23.8637464Z mov.b32 %r22225, %r22098; 2026-02-21T12:40:23.8637663Z $L__BB0_35: // Parent Loop BB0_2 Depth=1 2026-02-21T12:40:23.8637851Z // => This Inner Loop Header: Depth=2 2026-02-21T12:40:23.8638077Z .loc 1 51 80 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:51:80 2026-02-21T12:40:23.8638149Z add.s64 %rd446, %rd636, -64; 2026-02-21T12:40:23.8638207Z // begin inline asm 2026-02-21T12:40:23.8638295Z mov.u64 %rd445, 0x0; 2026-02-21T12:40:23.8638521Z createpolicy.fractional.L2::evict_last.b64 %rd445, 1.0; 2026-02-21T12:40:23.8638620Z // end inline asm 2026-02-21T12:40:23.8638806Z // begin inline asm 2026-02-21T12:40:23.8638939Z mov.u32 %r13850, 0x0; 2026-02-21T12:40:23.8638997Z mov.u32 %r13851, 0x0; 2026-02-21T12:40:23.8639052Z mov.u32 %r13852, 0x0; 2026-02-21T12:40:23.8639108Z mov.u32 %r13853, 0x0; 2026-02-21T12:40:23.8639349Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r13850, %r13851, %r13852, %r13853 }, [ %rd446 + 0 ], %rd445; 2026-02-21T12:40:23.8639404Z // end inline asm 2026-02-21T12:40:23.8639616Z .loc 1 55 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:55:32 2026-02-21T12:40:23.8639672Z bar.sync 0; 2026-02-21T12:40:23.8639754Z st.shared.v2.b32 [%r9], {%r13850, %r13851}; 2026-02-21T12:40:23.8639835Z st.shared.v2.b32 [%r10], {%r13852, %r13853}; 2026-02-21T12:40:23.8639888Z bar.sync 0; 2026-02-21T12:40:23.8639958Z 
ld.shared.b16 %rs1249, [%r53]; 2026-02-21T12:40:23.8640030Z ld.shared.b16 %rs1250, [%r53+256]; 2026-02-21T12:40:23.8640095Z ld.shared.b16 %rs1251, [%r53+16]; 2026-02-21T12:40:23.8640159Z ld.shared.b16 %rs1252, [%r53+272]; 2026-02-21T12:40:23.8640226Z ld.shared.b16 %rs1253, [%r54]; 2026-02-21T12:40:23.8640291Z ld.shared.b16 %rs1254, [%r54+256]; 2026-02-21T12:40:23.8640352Z ld.shared.b16 %rs1255, [%r54+16]; 2026-02-21T12:40:23.8640415Z ld.shared.b16 %rs1256, [%r54+272]; 2026-02-21T12:40:23.8640478Z cvt.f32.bf16 %r14114, %rs1249; 2026-02-21T12:40:23.8640539Z cvt.f32.bf16 %r14115, %rs1250; 2026-02-21T12:40:23.8640596Z cvt.f32.bf16 %r14116, %rs1253; 2026-02-21T12:40:23.8640657Z cvt.f32.bf16 %r14117, %rs1254; 2026-02-21T12:40:23.8640716Z cvt.f32.bf16 %r14374, %rs1251; 2026-02-21T12:40:23.8640774Z cvt.f32.bf16 %r14375, %rs1252; 2026-02-21T12:40:23.8640833Z cvt.f32.bf16 %r14376, %rs1255; 2026-02-21T12:40:23.8640896Z cvt.f32.bf16 %r14377, %rs1256; 2026-02-21T12:40:23.8641100Z .loc 1 57 87 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:57:87 2026-02-21T12:40:23.8641161Z // begin inline asm 2026-02-21T12:40:23.8641219Z mov.u32 %r13854, 0x0; 2026-02-21T12:40:23.8641279Z mov.u32 %r13855, 0x0; 2026-02-21T12:40:23.8641335Z mov.u32 %r13856, 0x0; 2026-02-21T12:40:23.8641389Z mov.u32 %r13857, 0x0; 2026-02-21T12:40:23.8641527Z ld.global.v4.b32 { %r13854, %r13855, %r13856, %r13857 }, [ %rd635 + 0 ]; 2026-02-21T12:40:23.8641582Z // end inline asm 2026-02-21T12:40:23.8641782Z .loc 1 65 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:65:28 2026-02-21T12:40:23.8641847Z bar.sync 0; 2026-02-21T12:40:23.8641922Z st.shared.b8 [%r13], %r13854; 2026-02-21T12:40:23.8641993Z prmt.b32 %r16220, %r13854, 0, 0x7771U; 2026-02-21T12:40:23.8642057Z st.shared.b8 [%r14], %r16220; 2026-02-21T12:40:23.8642124Z prmt.b32 %r16221, %r13854, 0, 0x7772U; 2026-02-21T12:40:23.8642188Z st.shared.b8 [%r15+256], %r16221; 2026-02-21T12:40:23.8642250Z prmt.b32 %r16222, %r13854, 0, 0x7773U; 
2026-02-21T12:40:23.8642429Z st.shared.b8 [%r16+256], %r16222; 2026-02-21T12:40:23.8642489Z st.shared.b8 [%r17+512], %r13855; 2026-02-21T12:40:23.8642603Z prmt.b32 %r16223, %r13855, 0, 0x7771U; 2026-02-21T12:40:23.8642666Z st.shared.b8 [%r18+512], %r16223; 2026-02-21T12:40:23.8642728Z prmt.b32 %r16224, %r13855, 0, 0x7772U; 2026-02-21T12:40:23.8642788Z st.shared.b8 [%r19+768], %r16224; 2026-02-21T12:40:23.8642851Z prmt.b32 %r16225, %r13855, 0, 0x7773U; 2026-02-21T12:40:23.8642914Z st.shared.b8 [%r20+768], %r16225; 2026-02-21T12:40:23.8642975Z st.shared.b8 [%r21+1024], %r13856; 2026-02-21T12:40:23.8643038Z prmt.b32 %r16226, %r13856, 0, 0x7771U; 2026-02-21T12:40:23.8643103Z st.shared.b8 [%r22+1024], %r16226; 2026-02-21T12:40:23.8643179Z prmt.b32 %r16227, %r13856, 0, 0x7772U; 2026-02-21T12:40:23.8643241Z st.shared.b8 [%r23+1280], %r16227; 2026-02-21T12:40:23.8643306Z prmt.b32 %r16228, %r13856, 0, 0x7773U; 2026-02-21T12:40:23.8643367Z st.shared.b8 [%r24+1280], %r16228; 2026-02-21T12:40:23.8643430Z st.shared.b8 [%r25+1536], %r13857; 2026-02-21T12:40:23.8643492Z prmt.b32 %r16229, %r13857, 0, 0x7771U; 2026-02-21T12:40:23.8643649Z st.shared.b8 [%r26+1536], %r16229; 2026-02-21T12:40:23.8643714Z prmt.b32 %r16230, %r13857, 0, 0x7772U; 2026-02-21T12:40:23.8643774Z st.shared.b8 [%r27+1792], %r16230; 2026-02-21T12:40:23.8643837Z prmt.b32 %r16231, %r13857, 0, 0x7773U; 2026-02-21T12:40:23.8643898Z st.shared.b8 [%r28+1792], %r16231; 2026-02-21T12:40:23.8643950Z bar.sync 0; 2026-02-21T12:40:23.8644012Z ld.shared.b32 %r16232, [%r29]; 2026-02-21T12:40:23.8644079Z prmt.b32 %r16233, %r16232, 0, 0x7770U; 2026-02-21T12:40:23.8644138Z cvt.u16.u32 %rs1257, %r16233; 2026-02-21T12:40:23.8644201Z prmt.b32 %r16234, %r16232, 0, 0x7771U; 2026-02-21T12:40:23.8644261Z cvt.u16.u32 %rs1258, %r16234; 2026-02-21T12:40:23.8644324Z prmt.b32 %r16235, %r16232, 0, 0x7772U; 2026-02-21T12:40:23.8644384Z cvt.u16.u32 %rs1259, %r16235; 2026-02-21T12:40:23.8644458Z prmt.b32 %r16236, %r16232, 0, 0x7773U; 
2026-02-21T12:40:23.8644524Z cvt.u16.u32 %rs1260, %r16236; 2026-02-21T12:40:23.8644587Z ld.shared.b32 %r16237, [%r30]; 2026-02-21T12:40:23.8644653Z prmt.b32 %r16238, %r16237, 0, 0x7770U; 2026-02-21T12:40:23.8644713Z cvt.u16.u32 %rs1261, %r16238; 2026-02-21T12:40:23.8644774Z prmt.b32 %r16239, %r16237, 0, 0x7771U; 2026-02-21T12:40:23.8644833Z cvt.u16.u32 %rs1262, %r16239; 2026-02-21T12:40:23.8644896Z prmt.b32 %r16240, %r16237, 0, 0x7772U; 2026-02-21T12:40:23.8644954Z cvt.u16.u32 %rs1263, %r16240; 2026-02-21T12:40:23.8645016Z prmt.b32 %r16241, %r16237, 0, 0x7773U; 2026-02-21T12:40:23.8645075Z cvt.u16.u32 %rs1264, %r16241; 2026-02-21T12:40:23.8645139Z ld.shared.b32 %r16242, [%r31]; 2026-02-21T12:40:23.8645200Z prmt.b32 %r16243, %r16242, 0, 0x7770U; 2026-02-21T12:40:23.8645259Z cvt.u16.u32 %rs1265, %r16243; 2026-02-21T12:40:23.8645322Z prmt.b32 %r16244, %r16242, 0, 0x7771U; 2026-02-21T12:40:23.8645380Z cvt.u16.u32 %rs1266, %r16244; 2026-02-21T12:40:23.8645445Z prmt.b32 %r16245, %r16242, 0, 0x7772U; 2026-02-21T12:40:23.8645507Z cvt.u16.u32 %rs1267, %r16245; 2026-02-21T12:40:23.8645572Z prmt.b32 %r16246, %r16242, 0, 0x7773U; 2026-02-21T12:40:23.8645635Z cvt.u16.u32 %rs1268, %r16246; 2026-02-21T12:40:23.8645698Z ld.shared.b32 %r16247, [%r32]; 2026-02-21T12:40:23.8645762Z prmt.b32 %r16248, %r16247, 0, 0x7770U; 2026-02-21T12:40:23.8645820Z cvt.u16.u32 %rs1269, %r16248; 2026-02-21T12:40:23.8645881Z prmt.b32 %r16249, %r16247, 0, 0x7771U; 2026-02-21T12:40:23.8645952Z cvt.u16.u32 %rs1270, %r16249; 2026-02-21T12:40:23.8646023Z prmt.b32 %r16250, %r16247, 0, 0x7772U; 2026-02-21T12:40:23.8646082Z cvt.u16.u32 %rs1271, %r16250; 2026-02-21T12:40:23.8646146Z prmt.b32 %r16251, %r16247, 0, 0x7773U; 2026-02-21T12:40:23.8646207Z cvt.u16.u32 %rs1272, %r16251; 2026-02-21T12:40:23.8646404Z .loc 1 60 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:60:28 2026-02-21T12:40:23.8646639Z shl.b16 %rs1273, %rs1257, 4; 2026-02-21T12:40:23.8646792Z shl.b16 %rs1274, %rs1261, 4; 
2026-02-21T12:40:23.8646851Z shl.b16 %rs1275, %rs1265, 4; 2026-02-21T12:40:23.8646912Z shl.b16 %rs1276, %rs1269, 4; 2026-02-21T12:40:23.8647034Z shl.b16 %rs1277, %rs1258, 4; 2026-02-21T12:40:23.8647092Z shl.b16 %rs1278, %rs1262, 4; 2026-02-21T12:40:23.8647151Z shl.b16 %rs1279, %rs1266, 4; 2026-02-21T12:40:23.8647212Z shl.b16 %rs1280, %rs1270, 4; 2026-02-21T12:40:23.8647270Z shl.b16 %rs1281, %rs1259, 4; 2026-02-21T12:40:23.8647329Z shl.b16 %rs1282, %rs1263, 4; 2026-02-21T12:40:23.8647390Z shl.b16 %rs1283, %rs1267, 4; 2026-02-21T12:40:23.8647447Z shl.b16 %rs1284, %rs1271, 4; 2026-02-21T12:40:23.8647518Z shl.b16 %rs1285, %rs1260, 4; 2026-02-21T12:40:23.8647578Z shl.b16 %rs1286, %rs1264, 4; 2026-02-21T12:40:23.8647638Z shl.b16 %rs1287, %rs1268, 4; 2026-02-21T12:40:23.8647696Z shl.b16 %rs1288, %rs1272, 4; 2026-02-21T12:40:23.8647894Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8647961Z cvt.s16.s8 %rs1289, %rs1273; 2026-02-21T12:40:23.8648017Z shr.s16 %rs1290, %rs1289, 4; 2026-02-21T12:40:23.8648075Z cvt.s16.s8 %rs1291, %rs1275; 2026-02-21T12:40:23.8648256Z shr.s16 %rs1292, %rs1291, 4; 2026-02-21T12:40:23.8648329Z prmt.b32 %r16252, %r16232, 0, 0x8880U; 2026-02-21T12:40:23.8648388Z cvt.u16.u32 %rs1293, %r16252; 2026-02-21T12:40:23.8648447Z shr.s16 %rs1294, %rs1293, 4; 2026-02-21T12:40:23.8648515Z prmt.b32 %r16253, %r16242, 0, 0x8880U; 2026-02-21T12:40:23.8648572Z cvt.u16.u32 %rs1295, %r16253; 2026-02-21T12:40:23.8648629Z shr.s16 %rs1296, %rs1295, 4; 2026-02-21T12:40:23.8648822Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8648885Z cvt.rn.f32.s16 %r16254, %rs1296; 2026-02-21T12:40:23.8648946Z cvt.rn.f32.s16 %r16255, %rs1294; 2026-02-21T12:40:23.8649005Z cvt.rn.f32.s16 %r16256, %rs1292; 2026-02-21T12:40:23.8649067Z cvt.rn.f32.s16 %r16257, %rs1290; 2026-02-21T12:40:23.8649260Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 
2026-02-21T12:40:23.8649319Z cvt.s16.s8 %rs1297, %rs1274; 2026-02-21T12:40:23.8649383Z shr.s16 %rs1298, %rs1297, 4; 2026-02-21T12:40:23.8649441Z cvt.s16.s8 %rs1299, %rs1276; 2026-02-21T12:40:23.8649498Z shr.s16 %rs1300, %rs1299, 4; 2026-02-21T12:40:23.8649563Z prmt.b32 %r16258, %r16237, 0, 0x8880U; 2026-02-21T12:40:23.8649621Z cvt.u16.u32 %rs1301, %r16258; 2026-02-21T12:40:23.8649679Z shr.s16 %rs1302, %rs1301, 4; 2026-02-21T12:40:23.8649741Z prmt.b32 %r16259, %r16247, 0, 0x8880U; 2026-02-21T12:40:23.8649801Z cvt.u16.u32 %rs1303, %r16259; 2026-02-21T12:40:23.8649871Z shr.s16 %rs1304, %rs1303, 4; 2026-02-21T12:40:23.8650068Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8650133Z cvt.rn.f32.s16 %r16260, %rs1304; 2026-02-21T12:40:23.8650194Z cvt.rn.f32.s16 %r16261, %rs1302; 2026-02-21T12:40:23.8650253Z cvt.rn.f32.s16 %r16262, %rs1300; 2026-02-21T12:40:23.8650316Z cvt.rn.f32.s16 %r16263, %rs1298; 2026-02-21T12:40:23.8650509Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8650571Z cvt.s16.s8 %rs1305, %rs1277; 2026-02-21T12:40:23.8650630Z shr.s16 %rs1306, %rs1305, 4; 2026-02-21T12:40:23.8650690Z cvt.s16.s8 %rs1307, %rs1279; 2026-02-21T12:40:23.8650749Z shr.s16 %rs1308, %rs1307, 4; 2026-02-21T12:40:23.8650814Z prmt.b32 %r16264, %r16232, 0, 0x9991U; 2026-02-21T12:40:23.8650875Z cvt.u16.u32 %rs1309, %r16264; 2026-02-21T12:40:23.8650934Z shr.s16 %rs1310, %rs1309, 4; 2026-02-21T12:40:23.8650996Z prmt.b32 %r16265, %r16242, 0, 0x9991U; 2026-02-21T12:40:23.8651055Z cvt.u16.u32 %rs1311, %r16265; 2026-02-21T12:40:23.8651116Z shr.s16 %rs1312, %rs1311, 4; 2026-02-21T12:40:23.8651314Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8651375Z cvt.rn.f32.s16 %r16266, %rs1312; 2026-02-21T12:40:23.8651500Z cvt.rn.f32.s16 %r16267, %rs1310; 2026-02-21T12:40:23.8651559Z cvt.rn.f32.s16 %r16268, %rs1308; 2026-02-21T12:40:23.8651667Z 
cvt.rn.f32.s16 %r16269, %rs1306; 2026-02-21T12:40:23.8651862Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8651921Z cvt.s16.s8 %rs1313, %rs1278; 2026-02-21T12:40:23.8651981Z shr.s16 %rs1314, %rs1313, 4; 2026-02-21T12:40:23.8652040Z cvt.s16.s8 %rs1315, %rs1280; 2026-02-21T12:40:23.8652100Z shr.s16 %rs1316, %rs1315, 4; 2026-02-21T12:40:23.8652164Z prmt.b32 %r16270, %r16237, 0, 0x9991U; 2026-02-21T12:40:23.8652223Z cvt.u16.u32 %rs1317, %r16270; 2026-02-21T12:40:23.8652294Z shr.s16 %rs1318, %rs1317, 4; 2026-02-21T12:40:23.8652359Z prmt.b32 %r16271, %r16247, 0, 0x9991U; 2026-02-21T12:40:23.8652419Z cvt.u16.u32 %rs1319, %r16271; 2026-02-21T12:40:23.8652478Z shr.s16 %rs1320, %rs1319, 4; 2026-02-21T12:40:23.8652675Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8652740Z cvt.rn.f32.s16 %r16272, %rs1320; 2026-02-21T12:40:23.8652855Z cvt.rn.f32.s16 %r16273, %rs1318; 2026-02-21T12:40:23.8652964Z cvt.rn.f32.s16 %r16274, %rs1316; 2026-02-21T12:40:23.8653026Z cvt.rn.f32.s16 %r16275, %rs1314; 2026-02-21T12:40:23.8653219Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8653283Z cvt.s16.s8 %rs1321, %rs1281; 2026-02-21T12:40:23.8653343Z shr.s16 %rs1322, %rs1321, 4; 2026-02-21T12:40:23.8653400Z cvt.s16.s8 %rs1323, %rs1283; 2026-02-21T12:40:23.8653459Z shr.s16 %rs1324, %rs1323, 4; 2026-02-21T12:40:23.8653530Z prmt.b32 %r16276, %r16232, 0, 0xaaa2U; 2026-02-21T12:40:23.8653592Z cvt.u16.u32 %rs1325, %r16276; 2026-02-21T12:40:23.8653651Z shr.s16 %rs1326, %rs1325, 4; 2026-02-21T12:40:23.8653719Z prmt.b32 %r16277, %r16242, 0, 0xaaa2U; 2026-02-21T12:40:23.8653778Z cvt.u16.u32 %rs1327, %r16277; 2026-02-21T12:40:23.8653839Z shr.s16 %rs1328, %rs1327, 4; 2026-02-21T12:40:23.8654040Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8654105Z cvt.rn.f32.s16 %r16278, %rs1328; 
2026-02-21T12:40:23.8654168Z cvt.rn.f32.s16 %r16279, %rs1326; 2026-02-21T12:40:23.8654227Z cvt.rn.f32.s16 %r16280, %rs1324; 2026-02-21T12:40:23.8654301Z cvt.rn.f32.s16 %r16281, %rs1322; 2026-02-21T12:40:23.8654497Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8654558Z cvt.s16.s8 %rs1329, %rs1282; 2026-02-21T12:40:23.8654620Z shr.s16 %rs1330, %rs1329, 4; 2026-02-21T12:40:23.8654679Z cvt.s16.s8 %rs1331, %rs1284; 2026-02-21T12:40:23.8654737Z shr.s16 %rs1332, %rs1331, 4; 2026-02-21T12:40:23.8654803Z prmt.b32 %r16282, %r16237, 0, 0xaaa2U; 2026-02-21T12:40:23.8654865Z cvt.u16.u32 %rs1333, %r16282; 2026-02-21T12:40:23.8654923Z shr.s16 %rs1334, %rs1333, 4; 2026-02-21T12:40:23.8654987Z prmt.b32 %r16283, %r16247, 0, 0xaaa2U; 2026-02-21T12:40:23.8655048Z cvt.u16.u32 %rs1335, %r16283; 2026-02-21T12:40:23.8655108Z shr.s16 %rs1336, %rs1335, 4; 2026-02-21T12:40:23.8655302Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8655365Z cvt.rn.f32.s16 %r16284, %rs1336; 2026-02-21T12:40:23.8655424Z cvt.rn.f32.s16 %r16285, %rs1334; 2026-02-21T12:40:23.8655483Z cvt.rn.f32.s16 %r16286, %rs1332; 2026-02-21T12:40:23.8655542Z cvt.rn.f32.s16 %r16287, %rs1330; 2026-02-21T12:40:23.8655735Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8655794Z cvt.s16.s8 %rs1337, %rs1285; 2026-02-21T12:40:23.8655853Z shr.s16 %rs1338, %rs1337, 4; 2026-02-21T12:40:23.8655914Z cvt.s16.s8 %rs1339, %rs1287; 2026-02-21T12:40:23.8655972Z shr.s16 %rs1340, %rs1339, 4; 2026-02-21T12:40:23.8656035Z prmt.b32 %r16288, %r16232, 0, 0xbbb3U; 2026-02-21T12:40:23.8656160Z cvt.u16.u32 %rs1341, %r16288; 2026-02-21T12:40:23.8656218Z shr.s16 %rs1342, %rs1341, 4; 2026-02-21T12:40:23.8656282Z prmt.b32 %r16289, %r16242, 0, 0xbbb3U; 2026-02-21T12:40:23.8656404Z cvt.u16.u32 %rs1343, %r16289; 2026-02-21T12:40:23.8656590Z shr.s16 %rs1344, %rs1343, 4; 2026-02-21T12:40:23.8656786Z .loc 1 80 
32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8656847Z cvt.rn.f32.s16 %r16290, %rs1344; 2026-02-21T12:40:23.8656909Z cvt.rn.f32.s16 %r16291, %rs1342; 2026-02-21T12:40:23.8656968Z cvt.rn.f32.s16 %r16292, %rs1340; 2026-02-21T12:40:23.8657028Z cvt.rn.f32.s16 %r16293, %rs1338; 2026-02-21T12:40:23.8657230Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8657294Z cvt.s16.s8 %rs1345, %rs1286; 2026-02-21T12:40:23.8657353Z shr.s16 %rs1346, %rs1345, 4; 2026-02-21T12:40:23.8657411Z cvt.s16.s8 %rs1347, %rs1288; 2026-02-21T12:40:23.8657474Z shr.s16 %rs1348, %rs1347, 4; 2026-02-21T12:40:23.8657538Z prmt.b32 %r16294, %r16237, 0, 0xbbb3U; 2026-02-21T12:40:23.8657597Z cvt.u16.u32 %rs1349, %r16294; 2026-02-21T12:40:23.8657804Z shr.s16 %rs1350, %rs1349, 4; 2026-02-21T12:40:23.8657872Z prmt.b32 %r16295, %r16247, 0, 0xbbb3U; 2026-02-21T12:40:23.8657932Z cvt.u16.u32 %rs1351, %r16295; 2026-02-21T12:40:23.8657991Z shr.s16 %rs1352, %rs1351, 4; 2026-02-21T12:40:23.8658184Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8658246Z cvt.rn.f32.s16 %r16296, %rs1352; 2026-02-21T12:40:23.8658307Z cvt.rn.f32.s16 %r16297, %rs1350; 2026-02-21T12:40:23.8658368Z cvt.rn.f32.s16 %r16298, %rs1348; 2026-02-21T12:40:23.8658428Z cvt.rn.f32.s16 %r16299, %rs1346; 2026-02-21T12:40:23.8658482Z bar.sync 0; 2026-02-21T12:40:23.8658601Z st.shared.v4.b32 [%r33], {%r16257, %r16255, %r16256, %r16254}; 2026-02-21T12:40:23.8658725Z st.shared.v4.b32 [%r33+8192], {%r16263, %r16261, %r16262, %r16260}; 2026-02-21T12:40:23.8658836Z st.shared.v4.b32 [%r34], {%r16269, %r16267, %r16268, %r16266}; 2026-02-21T12:40:23.8658953Z st.shared.v4.b32 [%r34+8192], {%r16275, %r16273, %r16274, %r16272}; 2026-02-21T12:40:23.8659063Z st.shared.v4.b32 [%r35], {%r16281, %r16279, %r16280, %r16278}; 2026-02-21T12:40:23.8659174Z st.shared.v4.b32 [%r35+8192], {%r16287, %r16285, %r16286, %r16284}; 
2026-02-21T12:40:23.8659278Z st.shared.v4.b32 [%r36], {%r16293, %r16291, %r16292, %r16290}; 2026-02-21T12:40:23.8659392Z st.shared.v4.b32 [%r36+8192], {%r16299, %r16297, %r16298, %r16296}; 2026-02-21T12:40:23.8659446Z $L__tmp25: 2026-02-21T12:40:23.8659718Z .loc 2 291 36 // standard.py:291:36 @[ cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:87:40 ] 2026-02-21T12:40:23.8659780Z // begin inline asm 2026-02-21T12:40:23.8659857Z fence.proxy.async.shared::cta; 2026-02-21T12:40:23.8659912Z // end inline asm 2026-02-21T12:40:23.8659964Z bar.sync 0; 2026-02-21T12:40:23.8660049Z shfl.sync.idx.b32 %r16300, %r2, 0, 31, -1; 2026-02-21T12:40:23.8660121Z wgmma.fence.sync.aligned; 2026-02-21T12:40:23.8660185Z mov.pred %p36, -1; 2026-02-21T12:40:23.8660247Z // begin inline asm 2026-02-21T12:40:23.8662920Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r22098,%r22099,%r22100,%r22101,%r22102,%r22103,%r22104,%r22105,%r22106,%r22107,%r22108,%r22109,%r22110,%r22111,%r22112,%r22113,%r22114,%r22115,%r22116,%r22117,%r22118,%r22119,%r22120,%r22121,%r22122,%r22123,%r22124,%r22125,%r22126,%r22127,%r22128,%r22129,%r22130,%r22131,%r22132,%r22133,%r22134,%r22135,%r22136,%r22137,%r22138,%r22139,%r22140,%r22141,%r22142,%r22143,%r22144,%r22145,%r22146,%r22147,%r22148,%r22149,%r22150,%r22151,%r22152,%r22153,%r22154,%r22155,%r22156,%r22157,%r22158,%r22159,%r22160,%r22161,%r22162,%r22163,%r22164,%r22165,%r22166,%r22167,%r22168,%r22169,%r22170,%r22171,%r22172,%r22173,%r22174,%r22175,%r22176,%r22177,%r22178,%r22179,%r22180,%r22181,%r22182,%r22183,%r22184,%r22185,%r22186,%r22187,%r22188,%r22189,%r22190,%r22191,%r22192,%r22193,%r22194,%r22195,%r22196,%r22197,%r22198,%r22199,%r22200,%r22201,%r22202,%r22203,%r22204,%r22205,%r22206,%r22207,%r22208,%r22209,%r22210,%r22211,%r22212,%r22213,%r22214,%r22215,%r22216,%r22217,%r22218,%r22219,%r22220,%r22221,%r22222,%r22223,%r22224,%r22225}, {%r14114,%r14115,%r14116,%r14117}, %rd23, %p36, 1, 1; 2026-02-21T12:40:23.8663115Z // end inline asm 
2026-02-21T12:40:23.8663171Z // begin inline asm 2026-02-21T12:40:23.8665924Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r22098,%r22099,%r22100,%r22101,%r22102,%r22103,%r22104,%r22105,%r22106,%r22107,%r22108,%r22109,%r22110,%r22111,%r22112,%r22113,%r22114,%r22115,%r22116,%r22117,%r22118,%r22119,%r22120,%r22121,%r22122,%r22123,%r22124,%r22125,%r22126,%r22127,%r22128,%r22129,%r22130,%r22131,%r22132,%r22133,%r22134,%r22135,%r22136,%r22137,%r22138,%r22139,%r22140,%r22141,%r22142,%r22143,%r22144,%r22145,%r22146,%r22147,%r22148,%r22149,%r22150,%r22151,%r22152,%r22153,%r22154,%r22155,%r22156,%r22157,%r22158,%r22159,%r22160,%r22161,%r22162,%r22163,%r22164,%r22165,%r22166,%r22167,%r22168,%r22169,%r22170,%r22171,%r22172,%r22173,%r22174,%r22175,%r22176,%r22177,%r22178,%r22179,%r22180,%r22181,%r22182,%r22183,%r22184,%r22185,%r22186,%r22187,%r22188,%r22189,%r22190,%r22191,%r22192,%r22193,%r22194,%r22195,%r22196,%r22197,%r22198,%r22199,%r22200,%r22201,%r22202,%r22203,%r22204,%r22205,%r22206,%r22207,%r22208,%r22209,%r22210,%r22211,%r22212,%r22213,%r22214,%r22215,%r22216,%r22217,%r22218,%r22219,%r22220,%r22221,%r22222,%r22223,%r22224,%r22225}, {%r14374,%r14375,%r14376,%r14377}, %rd24, %p36, 1, 1; 2026-02-21T12:40:23.8665986Z // end inline asm 2026-02-21T12:40:23.8666062Z wgmma.commit_group.sync.aligned; 2026-02-21T12:40:23.8666126Z mov.b32 %r16088, 0; 2026-02-21T12:40:23.8666187Z mov.b32 %r14506, %r18409; 2026-02-21T12:40:23.8666243Z mov.b32 %r14507, %r16088; 2026-02-21T12:40:23.8666301Z mov.b32 %r14508, %r16088; 2026-02-21T12:40:23.8666356Z // begin inline asm 2026-02-21T12:40:23.8669009Z // wait for regs: 
%r22098,%r22099,%r22100,%r22101,%r22102,%r22103,%r22104,%r22105,%r22106,%r22107,%r22108,%r22109,%r22110,%r22111,%r22112,%r22113,%r22114,%r22115,%r22116,%r22117,%r22118,%r22119,%r22120,%r22121,%r22122,%r22123,%r22124,%r22125,%r22126,%r22127,%r22128,%r22129,%r22130,%r22131,%r22132,%r22133,%r22134,%r22135,%r22136,%r22137,%r22138,%r22139,%r22140,%r22141,%r22142,%r22143,%r22144,%r22145,%r22146,%r22147,%r22148,%r22149,%r22150,%r22151,%r22152,%r22153,%r22154,%r22155,%r22156,%r22157,%r22158,%r22159,%r22160,%r22161,%r22162,%r22163,%r22164,%r22165,%r22166,%r22167,%r22168,%r22169,%r22170,%r22171,%r22172,%r22173,%r22174,%r22175,%r22176,%r22177,%r22178,%r22179,%r22180,%r22181,%r22182,%r22183,%r22184,%r22185,%r22186,%r22187,%r22188,%r22189,%r22190,%r22191,%r22192,%r22193,%r22194,%r22195,%r22196,%r22197,%r22198,%r22199,%r22200,%r22201,%r22202,%r22203,%r22204,%r22205,%r22206,%r22207,%r22208,%r22209,%r22210,%r22211,%r22212,%r22213,%r22214,%r22215,%r22216,%r22217,%r22218,%r22219,%r22220,%r22221,%r22222,%r22223,%r22224,%r22225,%r14506,%r14507,%r14508 2026-02-21T12:40:23.8669098Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:40:23.8669152Z // end inline asm 2026-02-21T12:40:23.8669206Z $L__tmp26: 2026-02-21T12:40:23.8669419Z .loc 1 51 80 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:51:80 2026-02-21T12:40:23.8669485Z add.s64 %rd452, %rd636, -32; 2026-02-21T12:40:23.8669543Z // begin inline asm 2026-02-21T12:40:23.8669603Z mov.u64 %rd451, 0x0; 2026-02-21T12:40:23.8669728Z createpolicy.fractional.L2::evict_last.b64 %rd451, 1.0; 2026-02-21T12:40:23.8669783Z // end inline asm 2026-02-21T12:40:23.8669841Z // begin inline asm 2026-02-21T12:40:23.8669899Z mov.u32 %r14640, 0x0; 2026-02-21T12:40:23.8669955Z mov.u32 %r14641, 0x0; 2026-02-21T12:40:23.8670009Z mov.u32 %r14642, 0x0; 2026-02-21T12:40:23.8670066Z mov.u32 %r14643, 0x0; 2026-02-21T12:40:23.8670304Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r14640, %r14641, %r14642, %r14643 }, [ %rd452 + 0 ], %rd451; 
2026-02-21T12:40:23.8670370Z // end inline asm 2026-02-21T12:40:23.8670656Z .loc 1 55 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:55:32 2026-02-21T12:40:23.8670770Z bar.sync 0; 2026-02-21T12:40:23.8670856Z st.shared.v2.b32 [%r9], {%r14640, %r14641}; 2026-02-21T12:40:23.8670936Z st.shared.v2.b32 [%r10], {%r14642, %r14643}; 2026-02-21T12:40:23.8670989Z bar.sync 0; 2026-02-21T12:40:23.8671054Z ld.shared.b16 %rs1353, [%r53]; 2026-02-21T12:40:23.8671120Z ld.shared.b16 %rs1354, [%r53+256]; 2026-02-21T12:40:23.8671187Z ld.shared.b16 %rs1355, [%r53+16]; 2026-02-21T12:40:23.8671251Z ld.shared.b16 %rs1356, [%r53+272]; 2026-02-21T12:40:23.8671314Z ld.shared.b16 %rs1357, [%r54]; 2026-02-21T12:40:23.8671379Z ld.shared.b16 %rs1358, [%r54+256]; 2026-02-21T12:40:23.8671442Z ld.shared.b16 %rs1359, [%r54+16]; 2026-02-21T12:40:23.8671502Z ld.shared.b16 %rs1360, [%r54+272]; 2026-02-21T12:40:23.8671563Z cvt.f32.bf16 %r14904, %rs1353; 2026-02-21T12:40:23.8671626Z cvt.f32.bf16 %r14905, %rs1354; 2026-02-21T12:40:23.8671687Z cvt.f32.bf16 %r14906, %rs1357; 2026-02-21T12:40:23.8671745Z cvt.f32.bf16 %r14907, %rs1358; 2026-02-21T12:40:23.8671869Z cvt.f32.bf16 %r15164, %rs1355; 2026-02-21T12:40:23.8671992Z cvt.f32.bf16 %r15165, %rs1356; 2026-02-21T12:40:23.8672054Z cvt.f32.bf16 %r15166, %rs1359; 2026-02-21T12:40:23.8672113Z cvt.f32.bf16 %r15167, %rs1360; 2026-02-21T12:40:23.8672308Z .loc 1 57 87 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:57:87 2026-02-21T12:40:23.8672369Z add.s64 %rd454, %rd635, 10240; 2026-02-21T12:40:23.8672425Z // begin inline asm 2026-02-21T12:40:23.8672485Z mov.u32 %r14644, 0x0; 2026-02-21T12:40:23.8672553Z mov.u32 %r14645, 0x0; 2026-02-21T12:40:23.8672609Z mov.u32 %r14646, 0x0; 2026-02-21T12:40:23.8672667Z mov.u32 %r14647, 0x0; 2026-02-21T12:40:23.8672799Z ld.global.v4.b32 { %r14644, %r14645, %r14646, %r14647 }, [ %rd454 + 0 ]; 2026-02-21T12:40:23.8672854Z // end inline asm 2026-02-21T12:40:23.8673046Z .loc 1 65 28 // 
cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:65:28 2026-02-21T12:40:23.8673103Z bar.sync 0; 2026-02-21T12:40:23.8673168Z st.shared.b8 [%r13], %r14644; 2026-02-21T12:40:23.8673239Z prmt.b32 %r16301, %r14644, 0, 0x7771U; 2026-02-21T12:40:23.8673304Z st.shared.b8 [%r14], %r16301; 2026-02-21T12:40:23.8673369Z prmt.b32 %r16302, %r14644, 0, 0x7772U; 2026-02-21T12:40:23.8673431Z st.shared.b8 [%r15+256], %r16302; 2026-02-21T12:40:23.8673494Z prmt.b32 %r16303, %r14644, 0, 0x7773U; 2026-02-21T12:40:23.8673554Z st.shared.b8 [%r16+256], %r16303; 2026-02-21T12:40:23.8673614Z st.shared.b8 [%r17+512], %r14645; 2026-02-21T12:40:23.8673676Z prmt.b32 %r16304, %r14645, 0, 0x7771U; 2026-02-21T12:40:23.8673739Z st.shared.b8 [%r18+512], %r16304; 2026-02-21T12:40:23.8673802Z prmt.b32 %r16305, %r14645, 0, 0x7772U; 2026-02-21T12:40:23.8673863Z st.shared.b8 [%r19+768], %r16305; 2026-02-21T12:40:23.8673927Z prmt.b32 %r16306, %r14645, 0, 0x7773U; 2026-02-21T12:40:23.8673986Z st.shared.b8 [%r20+768], %r16306; 2026-02-21T12:40:23.8674050Z st.shared.b8 [%r21+1024], %r14646; 2026-02-21T12:40:23.8674113Z prmt.b32 %r16307, %r14646, 0, 0x7771U; 2026-02-21T12:40:23.8674180Z st.shared.b8 [%r22+1024], %r16307; 2026-02-21T12:40:23.8674242Z prmt.b32 %r16308, %r14646, 0, 0x7772U; 2026-02-21T12:40:23.8674302Z st.shared.b8 [%r23+1280], %r16308; 2026-02-21T12:40:23.8674366Z prmt.b32 %r16309, %r14646, 0, 0x7773U; 2026-02-21T12:40:23.8674426Z st.shared.b8 [%r24+1280], %r16309; 2026-02-21T12:40:23.8674486Z st.shared.b8 [%r25+1536], %r14647; 2026-02-21T12:40:23.8674552Z prmt.b32 %r16310, %r14647, 0, 0x7771U; 2026-02-21T12:40:23.8674613Z st.shared.b8 [%r26+1536], %r16310; 2026-02-21T12:40:23.8674675Z prmt.b32 %r16311, %r14647, 0, 0x7772U; 2026-02-21T12:40:23.8674736Z st.shared.b8 [%r27+1792], %r16311; 2026-02-21T12:40:23.8674801Z prmt.b32 %r16312, %r14647, 0, 0x7773U; 2026-02-21T12:40:23.8674862Z st.shared.b8 [%r28+1792], %r16312; 2026-02-21T12:40:23.8674916Z bar.sync 0; 2026-02-21T12:40:23.8675051Z 
ld.shared.b32 %r16313, [%r29]; 2026-02-21T12:40:23.8675115Z prmt.b32 %r16314, %r16313, 0, 0x7770U; 2026-02-21T12:40:23.8675177Z cvt.u16.u32 %rs1361, %r16314; 2026-02-21T12:40:23.8675288Z prmt.b32 %r16315, %r16313, 0, 0x7771U; 2026-02-21T12:40:23.8675350Z cvt.u16.u32 %rs1362, %r16315; 2026-02-21T12:40:23.8675412Z prmt.b32 %r16316, %r16313, 0, 0x7772U; 2026-02-21T12:40:23.8675471Z cvt.u16.u32 %rs1363, %r16316; 2026-02-21T12:40:23.8675535Z prmt.b32 %r16317, %r16313, 0, 0x7773U; 2026-02-21T12:40:23.8675593Z cvt.u16.u32 %rs1364, %r16317; 2026-02-21T12:40:23.8675655Z ld.shared.b32 %r16318, [%r30]; 2026-02-21T12:40:23.8675717Z prmt.b32 %r16319, %r16318, 0, 0x7770U; 2026-02-21T12:40:23.8675778Z cvt.u16.u32 %rs1365, %r16319; 2026-02-21T12:40:23.8675839Z prmt.b32 %r16320, %r16318, 0, 0x7771U; 2026-02-21T12:40:23.8675897Z cvt.u16.u32 %rs1366, %r16320; 2026-02-21T12:40:23.8675971Z prmt.b32 %r16321, %r16318, 0, 0x7772U; 2026-02-21T12:40:23.8676031Z cvt.u16.u32 %rs1367, %r16321; 2026-02-21T12:40:23.8676095Z prmt.b32 %r16322, %r16318, 0, 0x7773U; 2026-02-21T12:40:23.8676154Z cvt.u16.u32 %rs1368, %r16322; 2026-02-21T12:40:23.8676307Z ld.shared.b32 %r16323, [%r31]; 2026-02-21T12:40:23.8676373Z prmt.b32 %r16324, %r16323, 0, 0x7770U; 2026-02-21T12:40:23.8676430Z cvt.u16.u32 %rs1369, %r16324; 2026-02-21T12:40:23.8676706Z prmt.b32 %r16325, %r16323, 0, 0x7771U; 2026-02-21T12:40:23.8676774Z cvt.u16.u32 %rs1370, %r16325; 2026-02-21T12:40:23.8676842Z prmt.b32 %r16326, %r16323, 0, 0x7772U; 2026-02-21T12:40:23.8676905Z cvt.u16.u32 %rs1371, %r16326; 2026-02-21T12:40:23.8676969Z prmt.b32 %r16327, %r16323, 0, 0x7773U; 2026-02-21T12:40:23.8677029Z cvt.u16.u32 %rs1372, %r16327; 2026-02-21T12:40:23.8677093Z ld.shared.b32 %r16328, [%r32]; 2026-02-21T12:40:23.8677160Z prmt.b32 %r16329, %r16328, 0, 0x7770U; 2026-02-21T12:40:23.8677219Z cvt.u16.u32 %rs1373, %r16329; 2026-02-21T12:40:23.8677280Z prmt.b32 %r16330, %r16328, 0, 0x7771U; 2026-02-21T12:40:23.8677346Z cvt.u16.u32 %rs1374, %r16330; 
2026-02-21T12:40:23.8677408Z prmt.b32 %r16331, %r16328, 0, 0x7772U; 2026-02-21T12:40:23.8677470Z cvt.u16.u32 %rs1375, %r16331; 2026-02-21T12:40:23.8677551Z prmt.b32 %r16332, %r16328, 0, 0x7773U; 2026-02-21T12:40:23.8677611Z cvt.u16.u32 %rs1376, %r16332; 2026-02-21T12:40:23.8677818Z .loc 1 60 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:60:28 2026-02-21T12:40:23.8677882Z shl.b16 %rs1377, %rs1361, 4; 2026-02-21T12:40:23.8677946Z shl.b16 %rs1378, %rs1365, 4; 2026-02-21T12:40:23.8678005Z shl.b16 %rs1379, %rs1369, 4; 2026-02-21T12:40:23.8678064Z shl.b16 %rs1380, %rs1373, 4; 2026-02-21T12:40:23.8678125Z shl.b16 %rs1381, %rs1362, 4; 2026-02-21T12:40:23.8678184Z shl.b16 %rs1382, %rs1366, 4; 2026-02-21T12:40:23.8678241Z shl.b16 %rs1383, %rs1370, 4; 2026-02-21T12:40:23.8678299Z shl.b16 %rs1384, %rs1374, 4; 2026-02-21T12:40:23.8678359Z shl.b16 %rs1385, %rs1363, 4; 2026-02-21T12:40:23.8678417Z shl.b16 %rs1386, %rs1367, 4; 2026-02-21T12:40:23.8678487Z shl.b16 %rs1387, %rs1371, 4; 2026-02-21T12:40:23.8678550Z shl.b16 %rs1388, %rs1375, 4; 2026-02-21T12:40:23.8678613Z shl.b16 %rs1389, %rs1364, 4; 2026-02-21T12:40:23.8678674Z shl.b16 %rs1390, %rs1368, 4; 2026-02-21T12:40:23.8678731Z shl.b16 %rs1391, %rs1372, 4; 2026-02-21T12:40:23.8678792Z shl.b16 %rs1392, %rs1376, 4; 2026-02-21T12:40:23.8678990Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8679051Z cvt.s16.s8 %rs1393, %rs1377; 2026-02-21T12:40:23.8679112Z shr.s16 %rs1394, %rs1393, 4; 2026-02-21T12:40:23.8679170Z cvt.s16.s8 %rs1395, %rs1379; 2026-02-21T12:40:23.8679229Z shr.s16 %rs1396, %rs1395, 4; 2026-02-21T12:40:23.8679296Z prmt.b32 %r16333, %r16313, 0, 0x8880U; 2026-02-21T12:40:23.8679356Z cvt.u16.u32 %rs1397, %r16333; 2026-02-21T12:40:23.8679413Z shr.s16 %rs1398, %rs1397, 4; 2026-02-21T12:40:23.8679478Z prmt.b32 %r16334, %r16323, 0, 0x8880U; 2026-02-21T12:40:23.8679626Z cvt.u16.u32 %rs1399, %r16334; 2026-02-21T12:40:23.8679685Z shr.s16 %rs1400, %rs1399, 4; 
2026-02-21T12:40:23.8679883Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8680013Z cvt.rn.f32.s16 %r16335, %rs1400; 2026-02-21T12:40:23.8680075Z cvt.rn.f32.s16 %r16336, %rs1398; 2026-02-21T12:40:23.8680135Z cvt.rn.f32.s16 %r16337, %rs1396; 2026-02-21T12:40:23.8680196Z cvt.rn.f32.s16 %r16338, %rs1394; 2026-02-21T12:40:23.8680400Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8680462Z cvt.s16.s8 %rs1401, %rs1378; 2026-02-21T12:40:23.8680520Z shr.s16 %rs1402, %rs1401, 4; 2026-02-21T12:40:23.8680581Z cvt.s16.s8 %rs1403, %rs1380; 2026-02-21T12:40:23.8680637Z shr.s16 %rs1404, %rs1403, 4; 2026-02-21T12:40:23.8680700Z prmt.b32 %r16339, %r16318, 0, 0x8880U; 2026-02-21T12:40:23.8680762Z cvt.u16.u32 %rs1405, %r16339; 2026-02-21T12:40:23.8680820Z shr.s16 %rs1406, %rs1405, 4; 2026-02-21T12:40:23.8680887Z prmt.b32 %r16340, %r16328, 0, 0x8880U; 2026-02-21T12:40:23.8680946Z cvt.u16.u32 %rs1407, %r16340; 2026-02-21T12:40:23.8681151Z shr.s16 %rs1408, %rs1407, 4; 2026-02-21T12:40:23.8681347Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8681409Z cvt.rn.f32.s16 %r16341, %rs1408; 2026-02-21T12:40:23.8681472Z cvt.rn.f32.s16 %r16342, %rs1406; 2026-02-21T12:40:23.8681531Z cvt.rn.f32.s16 %r16343, %rs1404; 2026-02-21T12:40:23.8681589Z cvt.rn.f32.s16 %r16344, %rs1402; 2026-02-21T12:40:23.8681781Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8681839Z cvt.s16.s8 %rs1409, %rs1381; 2026-02-21T12:40:23.8681897Z shr.s16 %rs1410, %rs1409, 4; 2026-02-21T12:40:23.8681955Z cvt.s16.s8 %rs1411, %rs1383; 2026-02-21T12:40:23.8682016Z shr.s16 %rs1412, %rs1411, 4; 2026-02-21T12:40:23.8682079Z prmt.b32 %r16345, %r16313, 0, 0x9991U; 2026-02-21T12:40:23.8682140Z cvt.u16.u32 %rs1413, %r16345; 2026-02-21T12:40:23.8682200Z shr.s16 %rs1414, %rs1413, 4; 2026-02-21T12:40:23.8682265Z prmt.b32 %r16346, %r16323, 
0, 0x9991U; 2026-02-21T12:40:23.8682326Z cvt.u16.u32 %rs1415, %r16346; 2026-02-21T12:40:23.8682387Z shr.s16 %rs1416, %rs1415, 4; 2026-02-21T12:40:23.8682577Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8682638Z cvt.rn.f32.s16 %r16347, %rs1416; 2026-02-21T12:40:23.8682697Z cvt.rn.f32.s16 %r16348, %rs1414; 2026-02-21T12:40:23.8682758Z cvt.rn.f32.s16 %r16349, %rs1412; 2026-02-21T12:40:23.8682818Z cvt.rn.f32.s16 %r16350, %rs1410; 2026-02-21T12:40:23.8683008Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8683068Z cvt.s16.s8 %rs1417, %rs1382; 2026-02-21T12:40:23.8683127Z shr.s16 %rs1418, %rs1417, 4; 2026-02-21T12:40:23.8683186Z cvt.s16.s8 %rs1419, %rs1384; 2026-02-21T12:40:23.8683246Z shr.s16 %rs1420, %rs1419, 4; 2026-02-21T12:40:23.8683327Z prmt.b32 %r16351, %r16318, 0, 0x9991U; 2026-02-21T12:40:23.8683390Z cvt.u16.u32 %rs1421, %r16351; 2026-02-21T12:40:23.8683451Z shr.s16 %rs1422, %rs1421, 4; 2026-02-21T12:40:23.8683518Z prmt.b32 %r16352, %r16328, 0, 0x9991U; 2026-02-21T12:40:23.8683577Z cvt.u16.u32 %rs1423, %r16352; 2026-02-21T12:40:23.8683634Z shr.s16 %rs1424, %rs1423, 4; 2026-02-21T12:40:23.8683827Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8683887Z cvt.rn.f32.s16 %r16353, %rs1424; 2026-02-21T12:40:23.8683947Z cvt.rn.f32.s16 %r16354, %rs1422; 2026-02-21T12:40:23.8684007Z cvt.rn.f32.s16 %r16355, %rs1420; 2026-02-21T12:40:23.8684069Z cvt.rn.f32.s16 %r16356, %rs1418; 2026-02-21T12:40:23.8684258Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8684317Z cvt.s16.s8 %rs1425, %rs1385; 2026-02-21T12:40:23.8684439Z shr.s16 %rs1426, %rs1425, 4; 2026-02-21T12:40:23.8684498Z cvt.s16.s8 %rs1427, %rs1387; 2026-02-21T12:40:23.8684557Z shr.s16 %rs1428, %rs1427, 4; 2026-02-21T12:40:23.8684673Z prmt.b32 %r16357, %r16313, 0, 0xaaa2U; 2026-02-21T12:40:23.8684733Z cvt.u16.u32 
%rs1429, %r16357; 2026-02-21T12:40:23.8684791Z shr.s16 %rs1430, %rs1429, 4; 2026-02-21T12:40:23.8684856Z prmt.b32 %r16358, %r16323, 0, 0xaaa2U; 2026-02-21T12:40:23.8684918Z cvt.u16.u32 %rs1431, %r16358; 2026-02-21T12:40:23.8684977Z shr.s16 %rs1432, %rs1431, 4; 2026-02-21T12:40:23.8685166Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8685228Z cvt.rn.f32.s16 %r16359, %rs1432; 2026-02-21T12:40:23.8685287Z cvt.rn.f32.s16 %r16360, %rs1430; 2026-02-21T12:40:23.8685346Z cvt.rn.f32.s16 %r16361, %rs1428; 2026-02-21T12:40:23.8685406Z cvt.rn.f32.s16 %r16362, %rs1426; 2026-02-21T12:40:23.8685604Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8685679Z cvt.s16.s8 %rs1433, %rs1386; 2026-02-21T12:40:23.8685738Z shr.s16 %rs1434, %rs1433, 4; 2026-02-21T12:40:23.8685892Z cvt.s16.s8 %rs1435, %rs1388; 2026-02-21T12:40:23.8685954Z shr.s16 %rs1436, %rs1435, 4; 2026-02-21T12:40:23.8686019Z prmt.b32 %r16363, %r16318, 0, 0xaaa2U; 2026-02-21T12:40:23.8686083Z cvt.u16.u32 %rs1437, %r16363; 2026-02-21T12:40:23.8686144Z shr.s16 %rs1438, %rs1437, 4; 2026-02-21T12:40:23.8686214Z prmt.b32 %r16364, %r16328, 0, 0xaaa2U; 2026-02-21T12:40:23.8686273Z cvt.u16.u32 %rs1439, %r16364; 2026-02-21T12:40:23.8686333Z shr.s16 %rs1440, %rs1439, 4; 2026-02-21T12:40:23.8686673Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8686740Z cvt.rn.f32.s16 %r16365, %rs1440; 2026-02-21T12:40:23.8686805Z cvt.rn.f32.s16 %r16366, %rs1438; 2026-02-21T12:40:23.8686864Z cvt.rn.f32.s16 %r16367, %rs1436; 2026-02-21T12:40:23.8686927Z cvt.rn.f32.s16 %r16368, %rs1434; 2026-02-21T12:40:23.8687120Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8687185Z cvt.s16.s8 %rs1441, %rs1389; 2026-02-21T12:40:23.8687245Z shr.s16 %rs1442, %rs1441, 4; 2026-02-21T12:40:23.8687304Z cvt.s16.s8 %rs1443, %rs1391; 2026-02-21T12:40:23.8687366Z shr.s16 
%rs1444, %rs1443, 4; 2026-02-21T12:40:23.8687432Z prmt.b32 %r16369, %r16313, 0, 0xbbb3U; 2026-02-21T12:40:23.8687494Z cvt.u16.u32 %rs1445, %r16369; 2026-02-21T12:40:23.8687554Z shr.s16 %rs1446, %rs1445, 4; 2026-02-21T12:40:23.8687617Z prmt.b32 %r16370, %r16323, 0, 0xbbb3U; 2026-02-21T12:40:23.8687687Z cvt.u16.u32 %rs1447, %r16370; 2026-02-21T12:40:23.8687747Z shr.s16 %rs1448, %rs1447, 4; 2026-02-21T12:40:23.8687942Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8688002Z cvt.rn.f32.s16 %r16371, %rs1448; 2026-02-21T12:40:23.8688064Z cvt.rn.f32.s16 %r16372, %rs1446; 2026-02-21T12:40:23.8688129Z cvt.rn.f32.s16 %r16373, %rs1444; 2026-02-21T12:40:23.8688189Z cvt.rn.f32.s16 %r16374, %rs1442; 2026-02-21T12:40:23.8688383Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8688444Z cvt.s16.s8 %rs1449, %rs1390; 2026-02-21T12:40:23.8688502Z shr.s16 %rs1450, %rs1449, 4; 2026-02-21T12:40:23.8688562Z cvt.s16.s8 %rs1451, %rs1392; 2026-02-21T12:40:23.8688633Z shr.s16 %rs1452, %rs1451, 4; 2026-02-21T12:40:23.8688704Z prmt.b32 %r16375, %r16318, 0, 0xbbb3U; 2026-02-21T12:40:23.8688764Z cvt.u16.u32 %rs1453, %r16375; 2026-02-21T12:40:23.8688823Z shr.s16 %rs1454, %rs1453, 4; 2026-02-21T12:40:23.8688892Z prmt.b32 %r16376, %r16328, 0, 0xbbb3U; 2026-02-21T12:40:23.8688951Z cvt.u16.u32 %rs1455, %r16376; 2026-02-21T12:40:23.8689010Z shr.s16 %rs1456, %rs1455, 4; 2026-02-21T12:40:23.8689202Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8689349Z cvt.rn.f32.s16 %r16377, %rs1456; 2026-02-21T12:40:23.8689408Z cvt.rn.f32.s16 %r16378, %rs1454; 2026-02-21T12:40:23.8689534Z cvt.rn.f32.s16 %r16379, %rs1452; 2026-02-21T12:40:23.8689598Z cvt.rn.f32.s16 %r16380, %rs1450; 2026-02-21T12:40:23.8689652Z bar.sync 0; 2026-02-21T12:40:23.8689770Z st.shared.v4.b32 [%r33], {%r16338, %r16336, %r16337, %r16335}; 2026-02-21T12:40:23.8689898Z st.shared.v4.b32 
[%r33+8192], {%r16344, %r16342, %r16343, %r16341}; 2026-02-21T12:40:23.8690018Z st.shared.v4.b32 [%r34], {%r16350, %r16348, %r16349, %r16347}; 2026-02-21T12:40:23.8690140Z st.shared.v4.b32 [%r34+8192], {%r16356, %r16354, %r16355, %r16353}; 2026-02-21T12:40:23.8690247Z st.shared.v4.b32 [%r35], {%r16362, %r16360, %r16361, %r16359}; 2026-02-21T12:40:23.8690362Z st.shared.v4.b32 [%r35+8192], {%r16368, %r16366, %r16367, %r16365}; 2026-02-21T12:40:23.8690465Z st.shared.v4.b32 [%r36], {%r16374, %r16372, %r16373, %r16371}; 2026-02-21T12:40:23.8690578Z st.shared.v4.b32 [%r36+8192], {%r16380, %r16378, %r16379, %r16377}; 2026-02-21T12:40:23.8690635Z $L__tmp27: 2026-02-21T12:40:23.8691025Z .loc 2 291 36 // standard.py:291:36 @[ cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:87:40 ] 2026-02-21T12:40:23.8691089Z // begin inline asm 2026-02-21T12:40:23.8691168Z fence.proxy.async.shared::cta; 2026-02-21T12:40:23.8691223Z // end inline asm 2026-02-21T12:40:23.8691274Z bar.sync 0; 2026-02-21T12:40:23.8691343Z wgmma.fence.sync.aligned; 2026-02-21T12:40:23.8691402Z // begin inline asm 2026-02-21T12:40:23.8694075Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22098,%r22099,%r22100,%r22101,%r22102,%r22103,%r22104,%r22105,%r22106,%r22107,%r22108,%r22109,%r22110,%r22111,%r22112,%r22113,%r22114,%r22115,%r22116,%r22117,%r22118,%r22119,%r22120,%r22121,%r22122,%r22123,%r22124,%r22125,%r22126,%r22127,%r22128,%r22129,%r22130,%r22131,%r22132,%r22133,%r22134,%r22135,%r22136,%r22137,%r22138,%r22139,%r22140,%r22141,%r22142,%r22143,%r22144,%r22145,%r22146,%r22147,%r22148,%r22149,%r22150,%r22151,%r22152,%r22153,%r22154,%r22155,%r22156,%r22157,%r22158,%r22159,%r22160,%r22161,%r22162,%r22163,%r22164,%r22165,%r22166,%r22167,%r22168,%r22169,%r22170,%r22171,%r22172,%r22173,%r22174,%r22175,%r22176,%r22177,%r22178,%r22179,%r22180,%r22181,%r22182,%r22183,%r22184,%r22185,%r22186,%r22187,%r22188,%r22189,%r22190,%r22191,%r22192,%r22193,%r22194,%r22195,%r22196,%r22197,%r22198,%r22199,%r22200,%r22201,%r22202,%r22203,%r22204,%r22205,%r22206,%r22207,%r22208,%r22209,%r22210,%r22211,%r22212,%r22213,%r22214,%r22215,%r22216,%r22217,%r22218,%r22219,%r22220,%r22221,%r22222,%r22223,%r22224,%r22225}, {%r14904,%r14905,%r14906,%r14907}, %rd23, %p36, 1, 1; 2026-02-21T12:40:23.8694139Z // end inline asm 2026-02-21T12:40:23.8694196Z // begin inline asm 2026-02-21T12:40:23.8696993Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22098,%r22099,%r22100,%r22101,%r22102,%r22103,%r22104,%r22105,%r22106,%r22107,%r22108,%r22109,%r22110,%r22111,%r22112,%r22113,%r22114,%r22115,%r22116,%r22117,%r22118,%r22119,%r22120,%r22121,%r22122,%r22123,%r22124,%r22125,%r22126,%r22127,%r22128,%r22129,%r22130,%r22131,%r22132,%r22133,%r22134,%r22135,%r22136,%r22137,%r22138,%r22139,%r22140,%r22141,%r22142,%r22143,%r22144,%r22145,%r22146,%r22147,%r22148,%r22149,%r22150,%r22151,%r22152,%r22153,%r22154,%r22155,%r22156,%r22157,%r22158,%r22159,%r22160,%r22161,%r22162,%r22163,%r22164,%r22165,%r22166,%r22167,%r22168,%r22169,%r22170,%r22171,%r22172,%r22173,%r22174,%r22175,%r22176,%r22177,%r22178,%r22179,%r22180,%r22181,%r22182,%r22183,%r22184,%r22185,%r22186,%r22187,%r22188,%r22189,%r22190,%r22191,%r22192,%r22193,%r22194,%r22195,%r22196,%r22197,%r22198,%r22199,%r22200,%r22201,%r22202,%r22203,%r22204,%r22205,%r22206,%r22207,%r22208,%r22209,%r22210,%r22211,%r22212,%r22213,%r22214,%r22215,%r22216,%r22217,%r22218,%r22219,%r22220,%r22221,%r22222,%r22223,%r22224,%r22225}, {%r15164,%r15165,%r15166,%r15167}, %rd24, %p36, 1, 1; 2026-02-21T12:40:23.8697066Z // end inline asm 2026-02-21T12:40:23.8697219Z wgmma.commit_group.sync.aligned; 2026-02-21T12:40:23.8697281Z mov.b32 %r15296, %r18409; 2026-02-21T12:40:23.8697339Z mov.b32 %r15297, %r16088; 2026-02-21T12:40:23.8697457Z mov.b32 %r15298, %r16088; 2026-02-21T12:40:23.8697516Z // begin inline asm 2026-02-21T12:40:23.8700091Z // wait for regs: 
%r22098,%r22099,%r22100,%r22101,%r22102,%r22103,%r22104,%r22105,%r22106,%r22107,%r22108,%r22109,%r22110,%r22111,%r22112,%r22113,%r22114,%r22115,%r22116,%r22117,%r22118,%r22119,%r22120,%r22121,%r22122,%r22123,%r22124,%r22125,%r22126,%r22127,%r22128,%r22129,%r22130,%r22131,%r22132,%r22133,%r22134,%r22135,%r22136,%r22137,%r22138,%r22139,%r22140,%r22141,%r22142,%r22143,%r22144,%r22145,%r22146,%r22147,%r22148,%r22149,%r22150,%r22151,%r22152,%r22153,%r22154,%r22155,%r22156,%r22157,%r22158,%r22159,%r22160,%r22161,%r22162,%r22163,%r22164,%r22165,%r22166,%r22167,%r22168,%r22169,%r22170,%r22171,%r22172,%r22173,%r22174,%r22175,%r22176,%r22177,%r22178,%r22179,%r22180,%r22181,%r22182,%r22183,%r22184,%r22185,%r22186,%r22187,%r22188,%r22189,%r22190,%r22191,%r22192,%r22193,%r22194,%r22195,%r22196,%r22197,%r22198,%r22199,%r22200,%r22201,%r22202,%r22203,%r22204,%r22205,%r22206,%r22207,%r22208,%r22209,%r22210,%r22211,%r22212,%r22213,%r22214,%r22215,%r22216,%r22217,%r22218,%r22219,%r22220,%r22221,%r22222,%r22223,%r22224,%r22225,%r15296,%r15297,%r15298 2026-02-21T12:40:23.8700175Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:40:23.8700229Z // end inline asm 2026-02-21T12:40:23.8700281Z $L__tmp28: 2026-02-21T12:40:23.8700484Z .loc 1 51 80 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:51:80 2026-02-21T12:40:23.8700542Z // begin inline asm 2026-02-21T12:40:23.8700599Z mov.u64 %rd457, 0x0; 2026-02-21T12:40:23.8700722Z createpolicy.fractional.L2::evict_last.b64 %rd457, 1.0; 2026-02-21T12:40:23.8700777Z // end inline asm 2026-02-21T12:40:23.8700832Z // begin inline asm 2026-02-21T12:40:23.8700888Z mov.u32 %r15430, 0x0; 2026-02-21T12:40:23.8700946Z mov.u32 %r15431, 0x0; 2026-02-21T12:40:23.8701004Z mov.u32 %r15432, 0x0; 2026-02-21T12:40:23.8701062Z mov.u32 %r15433, 0x0; 2026-02-21T12:40:23.8701297Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r15430, %r15431, %r15432, %r15433 }, [ %rd636 + 0 ], %rd457; 2026-02-21T12:40:23.8701355Z // end inline asm 
2026-02-21T12:40:23.8701552Z .loc 1 55 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:55:32 2026-02-21T12:40:23.8701607Z bar.sync 0; 2026-02-21T12:40:23.8701687Z st.shared.v2.b32 [%r9], {%r15430, %r15431}; 2026-02-21T12:40:23.8701767Z st.shared.v2.b32 [%r10], {%r15432, %r15433}; 2026-02-21T12:40:23.8701820Z bar.sync 0; 2026-02-21T12:40:23.8701888Z ld.shared.b16 %rs1457, [%r53]; 2026-02-21T12:40:23.8701956Z ld.shared.b16 %rs1458, [%r53+256]; 2026-02-21T12:40:23.8702033Z ld.shared.b16 %rs1459, [%r53+16]; 2026-02-21T12:40:23.8702101Z ld.shared.b16 %rs1460, [%r53+272]; 2026-02-21T12:40:23.8702166Z ld.shared.b16 %rs1461, [%r54]; 2026-02-21T12:40:23.8702231Z ld.shared.b16 %rs1462, [%r54+256]; 2026-02-21T12:40:23.8702299Z ld.shared.b16 %rs1463, [%r54+16]; 2026-02-21T12:40:23.8702362Z ld.shared.b16 %rs1464, [%r54+272]; 2026-02-21T12:40:23.8702426Z cvt.f32.bf16 %r15694, %rs1457; 2026-02-21T12:40:23.8702487Z cvt.f32.bf16 %r15695, %rs1458; 2026-02-21T12:40:23.8702550Z cvt.f32.bf16 %r15696, %rs1461; 2026-02-21T12:40:23.8702609Z cvt.f32.bf16 %r15697, %rs1462; 2026-02-21T12:40:23.8702668Z cvt.f32.bf16 %r15954, %rs1459; 2026-02-21T12:40:23.8702729Z cvt.f32.bf16 %r15955, %rs1460; 2026-02-21T12:40:23.8702787Z cvt.f32.bf16 %r15956, %rs1463; 2026-02-21T12:40:23.8702845Z cvt.f32.bf16 %r15957, %rs1464; 2026-02-21T12:40:23.8703038Z .loc 1 57 87 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:57:87 2026-02-21T12:40:23.8703101Z add.s64 %rd460, %rd635, 20480; 2026-02-21T12:40:23.8703157Z // begin inline asm 2026-02-21T12:40:23.8703213Z mov.u32 %r15434, 0x0; 2026-02-21T12:40:23.8703273Z mov.u32 %r15435, 0x0; 2026-02-21T12:40:23.8703328Z mov.u32 %r15436, 0x0; 2026-02-21T12:40:23.8703445Z mov.u32 %r15437, 0x0; 2026-02-21T12:40:23.8703576Z ld.global.v4.b32 { %r15434, %r15435, %r15436, %r15437 }, [ %rd460 + 0 ]; 2026-02-21T12:40:23.8703701Z // end inline asm 2026-02-21T12:40:23.8703897Z .loc 1 65 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:65:28 
2026-02-21T12:40:23.8703949Z bar.sync 0; 2026-02-21T12:40:23.8704017Z st.shared.b8 [%r13], %r15434; 2026-02-21T12:40:23.8704084Z prmt.b32 %r16381, %r15434, 0, 0x7771U; 2026-02-21T12:40:23.8704145Z st.shared.b8 [%r14], %r16381; 2026-02-21T12:40:23.8704212Z prmt.b32 %r16382, %r15434, 0, 0x7772U; 2026-02-21T12:40:23.8704275Z st.shared.b8 [%r15+256], %r16382; 2026-02-21T12:40:23.8704338Z prmt.b32 %r16383, %r15434, 0, 0x7773U; 2026-02-21T12:40:23.8704400Z st.shared.b8 [%r16+256], %r16383; 2026-02-21T12:40:23.8704463Z st.shared.b8 [%r17+512], %r15435; 2026-02-21T12:40:23.8704525Z prmt.b32 %r16384, %r15435, 0, 0x7771U; 2026-02-21T12:40:23.8704585Z st.shared.b8 [%r18+512], %r16384; 2026-02-21T12:40:23.8704652Z prmt.b32 %r16385, %r15435, 0, 0x7772U; 2026-02-21T12:40:23.8704712Z st.shared.b8 [%r19+768], %r16385; 2026-02-21T12:40:23.8704870Z prmt.b32 %r16386, %r15435, 0, 0x7773U; 2026-02-21T12:40:23.8704934Z st.shared.b8 [%r20+768], %r16386; 2026-02-21T12:40:23.8704999Z st.shared.b8 [%r21+1024], %r15436; 2026-02-21T12:40:23.8705062Z prmt.b32 %r16387, %r15436, 0, 0x7771U; 2026-02-21T12:40:23.8705123Z st.shared.b8 [%r22+1024], %r16387; 2026-02-21T12:40:23.8705187Z prmt.b32 %r16388, %r15436, 0, 0x7772U; 2026-02-21T12:40:23.8705247Z st.shared.b8 [%r23+1280], %r16388; 2026-02-21T12:40:23.8705309Z prmt.b32 %r16389, %r15436, 0, 0x7773U; 2026-02-21T12:40:23.8705371Z st.shared.b8 [%r24+1280], %r16389; 2026-02-21T12:40:23.8705430Z st.shared.b8 [%r25+1536], %r15437; 2026-02-21T12:40:23.8705492Z prmt.b32 %r16390, %r15437, 0, 0x7771U; 2026-02-21T12:40:23.8705552Z st.shared.b8 [%r26+1536], %r16390; 2026-02-21T12:40:23.8705617Z prmt.b32 %r16391, %r15437, 0, 0x7772U; 2026-02-21T12:40:23.8705680Z st.shared.b8 [%r27+1792], %r16391; 2026-02-21T12:40:23.8705742Z prmt.b32 %r16392, %r15437, 0, 0x7773U; 2026-02-21T12:40:23.8705816Z st.shared.b8 [%r28+1792], %r16392; 2026-02-21T12:40:23.8705869Z bar.sync 0; 2026-02-21T12:40:23.8705934Z ld.shared.b32 %r16393, [%r29]; 2026-02-21T12:40:23.8705996Z 
prmt.b32 %r16394, %r16393, 0, 0x7770U; 2026-02-21T12:40:23.8706059Z cvt.u16.u32 %rs1465, %r16394; 2026-02-21T12:40:23.8706122Z prmt.b32 %r16395, %r16393, 0, 0x7771U; 2026-02-21T12:40:23.8706181Z cvt.u16.u32 %rs1466, %r16395; 2026-02-21T12:40:23.8706249Z prmt.b32 %r16396, %r16393, 0, 0x7772U; 2026-02-21T12:40:23.8706307Z cvt.u16.u32 %rs1467, %r16396; 2026-02-21T12:40:23.8706369Z prmt.b32 %r16397, %r16393, 0, 0x7773U; 2026-02-21T12:40:23.8706434Z cvt.u16.u32 %rs1468, %r16397; 2026-02-21T12:40:23.8706625Z ld.shared.b32 %r16398, [%r30]; 2026-02-21T12:40:23.8706692Z prmt.b32 %r16399, %r16398, 0, 0x7770U; 2026-02-21T12:40:23.8706751Z cvt.u16.u32 %rs1469, %r16399; 2026-02-21T12:40:23.8706819Z prmt.b32 %r16400, %r16398, 0, 0x7771U; 2026-02-21T12:40:23.8706877Z cvt.u16.u32 %rs1470, %r16400; 2026-02-21T12:40:23.8706943Z prmt.b32 %r16401, %r16398, 0, 0x7772U; 2026-02-21T12:40:23.8707005Z cvt.u16.u32 %rs1471, %r16401; 2026-02-21T12:40:23.8707067Z prmt.b32 %r16402, %r16398, 0, 0x7773U; 2026-02-21T12:40:23.8707126Z cvt.u16.u32 %rs1472, %r16402; 2026-02-21T12:40:23.8707188Z ld.shared.b32 %r16403, [%r31]; 2026-02-21T12:40:23.8707252Z prmt.b32 %r16404, %r16403, 0, 0x7770U; 2026-02-21T12:40:23.8707323Z cvt.u16.u32 %rs1473, %r16404; 2026-02-21T12:40:23.8707388Z prmt.b32 %r16405, %r16403, 0, 0x7771U; 2026-02-21T12:40:23.8707448Z cvt.u16.u32 %rs1474, %r16405; 2026-02-21T12:40:23.8707512Z prmt.b32 %r16406, %r16403, 0, 0x7772U; 2026-02-21T12:40:23.8707569Z cvt.u16.u32 %rs1475, %r16406; 2026-02-21T12:40:23.8707631Z prmt.b32 %r16407, %r16403, 0, 0x7773U; 2026-02-21T12:40:23.8707692Z cvt.u16.u32 %rs1476, %r16407; 2026-02-21T12:40:23.8707848Z ld.shared.b32 %r16408, [%r32]; 2026-02-21T12:40:23.8707912Z prmt.b32 %r16409, %r16408, 0, 0x7770U; 2026-02-21T12:40:23.8707972Z cvt.u16.u32 %rs1477, %r16409; 2026-02-21T12:40:23.8708102Z prmt.b32 %r16410, %r16408, 0, 0x7771U; 2026-02-21T12:40:23.8708160Z cvt.u16.u32 %rs1478, %r16410; 2026-02-21T12:40:23.8708224Z prmt.b32 %r16411, %r16408, 0, 0x7772U; 
2026-02-21T12:40:23.8708283Z cvt.u16.u32 %rs1479, %r16411; 2026-02-21T12:40:23.8708406Z prmt.b32 %r16412, %r16408, 0, 0x7773U; 2026-02-21T12:40:23.8708468Z cvt.u16.u32 %rs1480, %r16412; 2026-02-21T12:40:23.8708668Z .loc 1 60 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:60:28 2026-02-21T12:40:23.8708731Z shl.b16 %rs1481, %rs1465, 4; 2026-02-21T12:40:23.8708790Z shl.b16 %rs1482, %rs1469, 4; 2026-02-21T12:40:23.8708852Z shl.b16 %rs1483, %rs1473, 4; 2026-02-21T12:40:23.8708910Z shl.b16 %rs1484, %rs1477, 4; 2026-02-21T12:40:23.8708968Z shl.b16 %rs1485, %rs1466, 4; 2026-02-21T12:40:23.8709026Z shl.b16 %rs1486, %rs1470, 4; 2026-02-21T12:40:23.8709091Z shl.b16 %rs1487, %rs1474, 4; 2026-02-21T12:40:23.8709148Z shl.b16 %rs1488, %rs1478, 4; 2026-02-21T12:40:23.8709331Z shl.b16 %rs1489, %rs1467, 4; 2026-02-21T12:40:23.8709396Z shl.b16 %rs1490, %rs1471, 4; 2026-02-21T12:40:23.8709453Z shl.b16 %rs1491, %rs1475, 4; 2026-02-21T12:40:23.8709510Z shl.b16 %rs1492, %rs1479, 4; 2026-02-21T12:40:23.8709569Z shl.b16 %rs1493, %rs1468, 4; 2026-02-21T12:40:23.8709626Z shl.b16 %rs1494, %rs1472, 4; 2026-02-21T12:40:23.8709684Z shl.b16 %rs1495, %rs1476, 4; 2026-02-21T12:40:23.8709743Z shl.b16 %rs1496, %rs1480, 4; 2026-02-21T12:40:23.8709940Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8709999Z cvt.s16.s8 %rs1497, %rs1481; 2026-02-21T12:40:23.8710057Z shr.s16 %rs1498, %rs1497, 4; 2026-02-21T12:40:23.8710116Z cvt.s16.s8 %rs1499, %rs1483; 2026-02-21T12:40:23.8710175Z shr.s16 %rs1500, %rs1499, 4; 2026-02-21T12:40:23.8710242Z prmt.b32 %r16413, %r16393, 0, 0x8880U; 2026-02-21T12:40:23.8710301Z cvt.u16.u32 %rs1501, %r16413; 2026-02-21T12:40:23.8710361Z shr.s16 %rs1502, %rs1501, 4; 2026-02-21T12:40:23.8710429Z prmt.b32 %r16414, %r16403, 0, 0x8880U; 2026-02-21T12:40:23.8710487Z cvt.u16.u32 %rs1503, %r16414; 2026-02-21T12:40:23.8710548Z shr.s16 %rs1504, %rs1503, 4; 2026-02-21T12:40:23.8710740Z .loc 1 80 32 // 
cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8710804Z cvt.rn.f32.s16 %r16415, %rs1504; 2026-02-21T12:40:23.8710869Z cvt.rn.f32.s16 %r16416, %rs1502; 2026-02-21T12:40:23.8710929Z cvt.rn.f32.s16 %r16417, %rs1500; 2026-02-21T12:40:23.8710989Z cvt.rn.f32.s16 %r16418, %rs1498; 2026-02-21T12:40:23.8711179Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8711240Z cvt.s16.s8 %rs1505, %rs1482; 2026-02-21T12:40:23.8711301Z shr.s16 %rs1506, %rs1505, 4; 2026-02-21T12:40:23.8711362Z cvt.s16.s8 %rs1507, %rs1484; 2026-02-21T12:40:23.8711423Z shr.s16 %rs1508, %rs1507, 4; 2026-02-21T12:40:23.8711499Z prmt.b32 %r16419, %r16398, 0, 0x8880U; 2026-02-21T12:40:23.8711564Z cvt.u16.u32 %rs1509, %r16419; 2026-02-21T12:40:23.8711623Z shr.s16 %rs1510, %rs1509, 4; 2026-02-21T12:40:23.8711691Z prmt.b32 %r16420, %r16408, 0, 0x8880U; 2026-02-21T12:40:23.8711748Z cvt.u16.u32 %rs1511, %r16420; 2026-02-21T12:40:23.8711807Z shr.s16 %rs1512, %rs1511, 4; 2026-02-21T12:40:23.8712003Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8712064Z cvt.rn.f32.s16 %r16421, %rs1512; 2026-02-21T12:40:23.8712125Z cvt.rn.f32.s16 %r16422, %rs1510; 2026-02-21T12:40:23.8712186Z cvt.rn.f32.s16 %r16423, %rs1508; 2026-02-21T12:40:23.8712244Z cvt.rn.f32.s16 %r16424, %rs1506; 2026-02-21T12:40:23.8712433Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8712558Z cvt.s16.s8 %rs1513, %rs1485; 2026-02-21T12:40:23.8712621Z shr.s16 %rs1514, %rs1513, 4; 2026-02-21T12:40:23.8712678Z cvt.s16.s8 %rs1515, %rs1487; 2026-02-21T12:40:23.8712783Z shr.s16 %rs1516, %rs1515, 4; 2026-02-21T12:40:23.8712848Z prmt.b32 %r16425, %r16393, 0, 0x9991U; 2026-02-21T12:40:23.8712907Z cvt.u16.u32 %rs1517, %r16425; 2026-02-21T12:40:23.8712964Z shr.s16 %rs1518, %rs1517, 4; 2026-02-21T12:40:23.8713030Z prmt.b32 %r16426, %r16403, 0, 0x9991U; 2026-02-21T12:40:23.8713089Z 
cvt.u16.u32 %rs1519, %r16426; 2026-02-21T12:40:23.8713146Z shr.s16 %rs1520, %rs1519, 4; 2026-02-21T12:40:23.8713338Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8713408Z cvt.rn.f32.s16 %r16427, %rs1520; 2026-02-21T12:40:23.8713474Z cvt.rn.f32.s16 %r16428, %rs1518; 2026-02-21T12:40:23.8713534Z cvt.rn.f32.s16 %r16429, %rs1516; 2026-02-21T12:40:23.8713596Z cvt.rn.f32.s16 %r16430, %rs1514; 2026-02-21T12:40:23.8713786Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8713846Z cvt.s16.s8 %rs1521, %rs1486; 2026-02-21T12:40:23.8714001Z shr.s16 %rs1522, %rs1521, 4; 2026-02-21T12:40:23.8714063Z cvt.s16.s8 %rs1523, %rs1488; 2026-02-21T12:40:23.8714121Z shr.s16 %rs1524, %rs1523, 4; 2026-02-21T12:40:23.8714194Z prmt.b32 %r16431, %r16398, 0, 0x9991U; 2026-02-21T12:40:23.8714260Z cvt.u16.u32 %rs1525, %r16431; 2026-02-21T12:40:23.8714323Z shr.s16 %rs1526, %rs1525, 4; 2026-02-21T12:40:23.8714393Z prmt.b32 %r16432, %r16408, 0, 0x9991U; 2026-02-21T12:40:23.8714457Z cvt.u16.u32 %rs1527, %r16432; 2026-02-21T12:40:23.8714519Z shr.s16 %rs1528, %rs1527, 4; 2026-02-21T12:40:23.8714718Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8714783Z cvt.rn.f32.s16 %r16433, %rs1528; 2026-02-21T12:40:23.8714847Z cvt.rn.f32.s16 %r16434, %rs1526; 2026-02-21T12:40:23.8714906Z cvt.rn.f32.s16 %r16435, %rs1524; 2026-02-21T12:40:23.8714969Z cvt.rn.f32.s16 %r16436, %rs1522; 2026-02-21T12:40:23.8715170Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8715232Z cvt.s16.s8 %rs1529, %rs1489; 2026-02-21T12:40:23.8715292Z shr.s16 %rs1530, %rs1529, 4; 2026-02-21T12:40:23.8715352Z cvt.s16.s8 %rs1531, %rs1491; 2026-02-21T12:40:23.8715410Z shr.s16 %rs1532, %rs1531, 4; 2026-02-21T12:40:23.8715478Z prmt.b32 %r16437, %r16393, 0, 0xaaa2U; 2026-02-21T12:40:23.8715538Z cvt.u16.u32 %rs1533, %r16437; 
2026-02-21T12:40:23.8715599Z shr.s16 %rs1534, %rs1533, 4; 2026-02-21T12:40:23.8715663Z prmt.b32 %r16438, %r16403, 0, 0xaaa2U; 2026-02-21T12:40:23.8715723Z cvt.u16.u32 %rs1535, %r16438; 2026-02-21T12:40:23.8715784Z shr.s16 %rs1536, %rs1535, 4; 2026-02-21T12:40:23.8715996Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8716063Z cvt.rn.f32.s16 %r16439, %rs1536; 2026-02-21T12:40:23.8716126Z cvt.rn.f32.s16 %r16440, %rs1534; 2026-02-21T12:40:23.8716190Z cvt.rn.f32.s16 %r16441, %rs1532; 2026-02-21T12:40:23.8716253Z cvt.rn.f32.s16 %r16442, %rs1530; 2026-02-21T12:40:23.8716561Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8716629Z cvt.s16.s8 %rs1537, %rs1490; 2026-02-21T12:40:23.8716687Z shr.s16 %rs1538, %rs1537, 4; 2026-02-21T12:40:23.8716747Z cvt.s16.s8 %rs1539, %rs1492; 2026-02-21T12:40:23.8716806Z shr.s16 %rs1540, %rs1539, 4; 2026-02-21T12:40:23.8716872Z prmt.b32 %r16443, %r16398, 0, 0xaaa2U; 2026-02-21T12:40:23.8716931Z cvt.u16.u32 %rs1541, %r16443; 2026-02-21T12:40:23.8716990Z shr.s16 %rs1542, %rs1541, 4; 2026-02-21T12:40:23.8717056Z prmt.b32 %r16444, %r16408, 0, 0xaaa2U; 2026-02-21T12:40:23.8717115Z cvt.u16.u32 %rs1543, %r16444; 2026-02-21T12:40:23.8717172Z shr.s16 %rs1544, %rs1543, 4; 2026-02-21T12:40:23.8717379Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8717523Z cvt.rn.f32.s16 %r16445, %rs1544; 2026-02-21T12:40:23.8717589Z cvt.rn.f32.s16 %r16446, %rs1542; 2026-02-21T12:40:23.8717716Z cvt.rn.f32.s16 %r16447, %rs1540; 2026-02-21T12:40:23.8717777Z cvt.rn.f32.s16 %r16448, %rs1538; 2026-02-21T12:40:23.8717968Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8718028Z cvt.s16.s8 %rs1545, %rs1493; 2026-02-21T12:40:23.8718091Z shr.s16 %rs1546, %rs1545, 4; 2026-02-21T12:40:23.8718150Z cvt.s16.s8 %rs1547, %rs1495; 2026-02-21T12:40:23.8718209Z shr.s16 %rs1548, %rs1547, 
4; 2026-02-21T12:40:23.8718276Z prmt.b32 %r16449, %r16393, 0, 0xbbb3U; 2026-02-21T12:40:23.8718335Z cvt.u16.u32 %rs1549, %r16449; 2026-02-21T12:40:23.8718394Z shr.s16 %rs1550, %rs1549, 4; 2026-02-21T12:40:23.8718457Z prmt.b32 %r16450, %r16403, 0, 0xbbb3U; 2026-02-21T12:40:23.8718519Z cvt.u16.u32 %rs1551, %r16450; 2026-02-21T12:40:23.8718580Z shr.s16 %rs1552, %rs1551, 4; 2026-02-21T12:40:23.8718772Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8718955Z cvt.rn.f32.s16 %r16451, %rs1552; 2026-02-21T12:40:23.8719020Z cvt.rn.f32.s16 %r16452, %rs1550; 2026-02-21T12:40:23.8719081Z cvt.rn.f32.s16 %r16453, %rs1548; 2026-02-21T12:40:23.8719142Z cvt.rn.f32.s16 %r16454, %rs1546; 2026-02-21T12:40:23.8719332Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8719392Z cvt.s16.s8 %rs1553, %rs1494; 2026-02-21T12:40:23.8719450Z shr.s16 %rs1554, %rs1553, 4; 2026-02-21T12:40:23.8719510Z cvt.s16.s8 %rs1555, %rs1496; 2026-02-21T12:40:23.8719568Z shr.s16 %rs1556, %rs1555, 4; 2026-02-21T12:40:23.8719631Z prmt.b32 %r16455, %r16398, 0, 0xbbb3U; 2026-02-21T12:40:23.8719692Z cvt.u16.u32 %rs1557, %r16455; 2026-02-21T12:40:23.8719751Z shr.s16 %rs1558, %rs1557, 4; 2026-02-21T12:40:23.8719829Z prmt.b32 %r16456, %r16408, 0, 0xbbb3U; 2026-02-21T12:40:23.8719894Z cvt.u16.u32 %rs1559, %r16456; 2026-02-21T12:40:23.8719955Z shr.s16 %rs1560, %rs1559, 4; 2026-02-21T12:40:23.8720153Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8720214Z cvt.rn.f32.s16 %r16457, %rs1560; 2026-02-21T12:40:23.8720278Z cvt.rn.f32.s16 %r16458, %rs1558; 2026-02-21T12:40:23.8720337Z cvt.rn.f32.s16 %r16459, %rs1556; 2026-02-21T12:40:23.8720397Z cvt.rn.f32.s16 %r16460, %rs1554; 2026-02-21T12:40:23.8720453Z bar.sync 0; 2026-02-21T12:40:23.8720572Z st.shared.v4.b32 [%r33], {%r16418, %r16416, %r16417, %r16415}; 2026-02-21T12:40:23.8720694Z st.shared.v4.b32 [%r33+8192], {%r16424, 
%r16422, %r16423, %r16421}; 2026-02-21T12:40:23.8720801Z st.shared.v4.b32 [%r34], {%r16430, %r16428, %r16429, %r16427}; 2026-02-21T12:40:23.8720921Z st.shared.v4.b32 [%r34+8192], {%r16436, %r16434, %r16435, %r16433}; 2026-02-21T12:40:23.8721026Z st.shared.v4.b32 [%r35], {%r16442, %r16440, %r16441, %r16439}; 2026-02-21T12:40:23.8721140Z st.shared.v4.b32 [%r35+8192], {%r16448, %r16446, %r16447, %r16445}; 2026-02-21T12:40:23.8721252Z st.shared.v4.b32 [%r36], {%r16454, %r16452, %r16453, %r16451}; 2026-02-21T12:40:23.8721366Z st.shared.v4.b32 [%r36+8192], {%r16460, %r16458, %r16459, %r16457}; 2026-02-21T12:40:23.8721419Z $L__tmp29: 2026-02-21T12:40:23.8721692Z .loc 2 291 36 // standard.py:291:36 @[ cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:87:40 ] 2026-02-21T12:40:23.8721762Z // begin inline asm 2026-02-21T12:40:23.8721839Z fence.proxy.async.shared::cta; 2026-02-21T12:40:23.8721896Z // end inline asm 2026-02-21T12:40:23.8721950Z bar.sync 0; 2026-02-21T12:40:23.8722021Z wgmma.fence.sync.aligned; 2026-02-21T12:40:23.8722078Z // begin inline asm 2026-02-21T12:40:23.8724744Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22098,%r22099,%r22100,%r22101,%r22102,%r22103,%r22104,%r22105,%r22106,%r22107,%r22108,%r22109,%r22110,%r22111,%r22112,%r22113,%r22114,%r22115,%r22116,%r22117,%r22118,%r22119,%r22120,%r22121,%r22122,%r22123,%r22124,%r22125,%r22126,%r22127,%r22128,%r22129,%r22130,%r22131,%r22132,%r22133,%r22134,%r22135,%r22136,%r22137,%r22138,%r22139,%r22140,%r22141,%r22142,%r22143,%r22144,%r22145,%r22146,%r22147,%r22148,%r22149,%r22150,%r22151,%r22152,%r22153,%r22154,%r22155,%r22156,%r22157,%r22158,%r22159,%r22160,%r22161,%r22162,%r22163,%r22164,%r22165,%r22166,%r22167,%r22168,%r22169,%r22170,%r22171,%r22172,%r22173,%r22174,%r22175,%r22176,%r22177,%r22178,%r22179,%r22180,%r22181,%r22182,%r22183,%r22184,%r22185,%r22186,%r22187,%r22188,%r22189,%r22190,%r22191,%r22192,%r22193,%r22194,%r22195,%r22196,%r22197,%r22198,%r22199,%r22200,%r22201,%r22202,%r22203,%r22204,%r22205,%r22206,%r22207,%r22208,%r22209,%r22210,%r22211,%r22212,%r22213,%r22214,%r22215,%r22216,%r22217,%r22218,%r22219,%r22220,%r22221,%r22222,%r22223,%r22224,%r22225}, {%r15694,%r15695,%r15696,%r15697}, %rd23, %p36, 1, 1; 2026-02-21T12:40:23.8724920Z // end inline asm 2026-02-21T12:40:23.8724981Z // begin inline asm 2026-02-21T12:40:23.8727994Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22098,%r22099,%r22100,%r22101,%r22102,%r22103,%r22104,%r22105,%r22106,%r22107,%r22108,%r22109,%r22110,%r22111,%r22112,%r22113,%r22114,%r22115,%r22116,%r22117,%r22118,%r22119,%r22120,%r22121,%r22122,%r22123,%r22124,%r22125,%r22126,%r22127,%r22128,%r22129,%r22130,%r22131,%r22132,%r22133,%r22134,%r22135,%r22136,%r22137,%r22138,%r22139,%r22140,%r22141,%r22142,%r22143,%r22144,%r22145,%r22146,%r22147,%r22148,%r22149,%r22150,%r22151,%r22152,%r22153,%r22154,%r22155,%r22156,%r22157,%r22158,%r22159,%r22160,%r22161,%r22162,%r22163,%r22164,%r22165,%r22166,%r22167,%r22168,%r22169,%r22170,%r22171,%r22172,%r22173,%r22174,%r22175,%r22176,%r22177,%r22178,%r22179,%r22180,%r22181,%r22182,%r22183,%r22184,%r22185,%r22186,%r22187,%r22188,%r22189,%r22190,%r22191,%r22192,%r22193,%r22194,%r22195,%r22196,%r22197,%r22198,%r22199,%r22200,%r22201,%r22202,%r22203,%r22204,%r22205,%r22206,%r22207,%r22208,%r22209,%r22210,%r22211,%r22212,%r22213,%r22214,%r22215,%r22216,%r22217,%r22218,%r22219,%r22220,%r22221,%r22222,%r22223,%r22224,%r22225}, {%r15954,%r15955,%r15956,%r15957}, %rd24, %p36, 1, 1; 2026-02-21T12:40:23.8728077Z // end inline asm 2026-02-21T12:40:23.8728159Z wgmma.commit_group.sync.aligned; 2026-02-21T12:40:23.8728222Z mov.b32 %r16087, %r16088; 2026-02-21T12:40:23.8728283Z mov.b32 %r16086, %r18409; 2026-02-21T12:40:23.8728342Z // begin inline asm 2026-02-21T12:40:23.8731169Z // wait for regs: 
%r22098,%r22099,%r22100,%r22101,%r22102,%r22103,%r22104,%r22105,%r22106,%r22107,%r22108,%r22109,%r22110,%r22111,%r22112,%r22113,%r22114,%r22115,%r22116,%r22117,%r22118,%r22119,%r22120,%r22121,%r22122,%r22123,%r22124,%r22125,%r22126,%r22127,%r22128,%r22129,%r22130,%r22131,%r22132,%r22133,%r22134,%r22135,%r22136,%r22137,%r22138,%r22139,%r22140,%r22141,%r22142,%r22143,%r22144,%r22145,%r22146,%r22147,%r22148,%r22149,%r22150,%r22151,%r22152,%r22153,%r22154,%r22155,%r22156,%r22157,%r22158,%r22159,%r22160,%r22161,%r22162,%r22163,%r22164,%r22165,%r22166,%r22167,%r22168,%r22169,%r22170,%r22171,%r22172,%r22173,%r22174,%r22175,%r22176,%r22177,%r22178,%r22179,%r22180,%r22181,%r22182,%r22183,%r22184,%r22185,%r22186,%r22187,%r22188,%r22189,%r22190,%r22191,%r22192,%r22193,%r22194,%r22195,%r22196,%r22197,%r22198,%r22199,%r22200,%r22201,%r22202,%r22203,%r22204,%r22205,%r22206,%r22207,%r22208,%r22209,%r22210,%r22211,%r22212,%r22213,%r22214,%r22215,%r22216,%r22217,%r22218,%r22219,%r22220,%r22221,%r22222,%r22223,%r22224,%r22225,%r16086,%r16087,%r16088 2026-02-21T12:40:23.8731258Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:40:23.8731316Z // end inline asm 2026-02-21T12:40:23.8731372Z $L__tmp30: 2026-02-21T12:40:23.8731650Z .loc 1 43 126 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:43:126 2026-02-21T12:40:23.8731717Z add.s64 %rd637, %rd637, 24; 2026-02-21T12:40:23.8731780Z add.s64 %rd636, %rd636, 96; 2026-02-21T12:40:23.8731845Z add.s64 %rd635, %rd635, 30720; 2026-02-21T12:40:23.8732010Z setp.lt.u64 %p42, %rd637, 4056; 2026-02-21T12:40:23.8732072Z @%p42 bra $L__BB0_35; 2026-02-21T12:40:23.8732174Z // %bb.36: // %.preheader154 2026-02-21T12:40:23.8732419Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:40:23.8732488Z add.s64 %rd639, %rd27, %rd121; 2026-02-21T12:40:23.8732596Z add.s64 %rd638, %rd28, %rd120; 2026-02-21T12:40:23.8732669Z mov.b64 %rd640, 4072; 2026-02-21T12:40:23.8732790Z $L__BB0_37: // Parent Loop BB0_2 Depth=1 2026-02-21T12:40:23.8732895Z // => This Inner 
Loop Header: Depth=2 2026-02-21T12:40:23.8733103Z .loc 1 51 80 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:51:80 2026-02-21T12:40:23.8733161Z // begin inline asm 2026-02-21T12:40:23.8733220Z mov.u64 %rd464, 0x0; 2026-02-21T12:40:23.8733398Z createpolicy.fractional.L2::evict_last.b64 %rd464, 1.0; 2026-02-21T12:40:23.8733488Z // end inline asm 2026-02-21T12:40:23.8733547Z // begin inline asm 2026-02-21T12:40:23.8733608Z mov.u32 %r16461, 0x0; 2026-02-21T12:40:23.8733666Z mov.u32 %r16462, 0x0; 2026-02-21T12:40:23.8733830Z mov.u32 %r16463, 0x0; 2026-02-21T12:40:23.8733892Z mov.u32 %r16464, 0x0; 2026-02-21T12:40:23.8734144Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r16461, %r16462, %r16463, %r16464 }, [ %rd639 + 0 ], %rd464; 2026-02-21T12:40:23.8734242Z // end inline asm 2026-02-21T12:40:23.8734486Z .loc 1 55 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:55:32 2026-02-21T12:40:23.8734546Z bar.sync 0; 2026-02-21T12:40:23.8734629Z st.shared.v2.b32 [%r9], {%r16461, %r16462}; 2026-02-21T12:40:23.8734709Z st.shared.v2.b32 [%r10], {%r16463, %r16464}; 2026-02-21T12:40:23.8734766Z bar.sync 0; 2026-02-21T12:40:23.8734832Z ld.shared.b16 %rs1561, [%r53]; 2026-02-21T12:40:23.8734898Z ld.shared.b16 %rs1562, [%r53+256]; 2026-02-21T12:40:23.8734971Z ld.shared.b16 %rs1563, [%r53+16]; 2026-02-21T12:40:23.8735089Z ld.shared.b16 %rs1564, [%r53+272]; 2026-02-21T12:40:23.8735179Z ld.shared.b16 %rs1565, [%r54]; 2026-02-21T12:40:23.8735247Z ld.shared.b16 %rs1566, [%r54+256]; 2026-02-21T12:40:23.8735321Z ld.shared.b16 %rs1567, [%r54+16]; 2026-02-21T12:40:23.8735386Z ld.shared.b16 %rs1568, [%r54+272]; 2026-02-21T12:40:23.8735446Z cvt.f32.bf16 %r16725, %rs1561; 2026-02-21T12:40:23.8735510Z cvt.f32.bf16 %r16726, %rs1562; 2026-02-21T12:40:23.8735569Z cvt.f32.bf16 %r16727, %rs1565; 2026-02-21T12:40:23.8735628Z cvt.f32.bf16 %r16728, %rs1566; 2026-02-21T12:40:23.8735686Z cvt.f32.bf16 %r16985, %rs1563; 2026-02-21T12:40:23.8735747Z cvt.f32.bf16 %r16986, %rs1564; 
2026-02-21T12:40:23.8735807Z cvt.f32.bf16 %r16987, %rs1567; 2026-02-21T12:40:23.8735865Z cvt.f32.bf16 %r16988, %rs1568; 2026-02-21T12:40:23.8736063Z .loc 1 57 87 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:57:87 2026-02-21T12:40:23.8736125Z // begin inline asm 2026-02-21T12:40:23.8736225Z mov.u32 %r16465, 0x0; 2026-02-21T12:40:23.8736330Z mov.u32 %r16466, 0x0; 2026-02-21T12:40:23.8736429Z mov.u32 %r16467, 0x0; 2026-02-21T12:40:23.8736697Z mov.u32 %r16468, 0x0; 2026-02-21T12:40:23.8736932Z ld.global.v4.b32 { %r16465, %r16466, %r16467, %r16468 }, [ %rd638 + 0 ]; 2026-02-21T12:40:23.8737011Z // end inline asm 2026-02-21T12:40:23.8737209Z .loc 1 65 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:65:28 2026-02-21T12:40:23.8737263Z bar.sync 0; 2026-02-21T12:40:23.8737332Z st.shared.b8 [%r13], %r16465; 2026-02-21T12:40:23.8737414Z prmt.b32 %r17251, %r16465, 0, 0x7771U; 2026-02-21T12:40:23.8737523Z st.shared.b8 [%r14], %r17251; 2026-02-21T12:40:23.8737635Z prmt.b32 %r17252, %r16465, 0, 0x7772U; 2026-02-21T12:40:23.8737754Z st.shared.b8 [%r15+256], %r17252; 2026-02-21T12:40:23.8737861Z prmt.b32 %r17253, %r16465, 0, 0x7773U; 2026-02-21T12:40:23.8737927Z st.shared.b8 [%r16+256], %r17253; 2026-02-21T12:40:23.8737992Z st.shared.b8 [%r17+512], %r16466; 2026-02-21T12:40:23.8738151Z prmt.b32 %r17254, %r16466, 0, 0x7771U; 2026-02-21T12:40:23.8738213Z st.shared.b8 [%r18+512], %r17254; 2026-02-21T12:40:23.8738403Z prmt.b32 %r17255, %r16466, 0, 0x7772U; 2026-02-21T12:40:23.8738517Z st.shared.b8 [%r19+768], %r17255; 2026-02-21T12:40:23.8738631Z prmt.b32 %r17256, %r16466, 0, 0x7773U; 2026-02-21T12:40:23.8738723Z st.shared.b8 [%r20+768], %r17256; 2026-02-21T12:40:23.8738795Z st.shared.b8 [%r21+1024], %r16467; 2026-02-21T12:40:23.8738862Z prmt.b32 %r17257, %r16467, 0, 0x7771U; 2026-02-21T12:40:23.8738925Z st.shared.b8 [%r22+1024], %r17257; 2026-02-21T12:40:23.8738991Z prmt.b32 %r17258, %r16467, 0, 0x7772U; 2026-02-21T12:40:23.8739053Z st.shared.b8 [%r23+1280], 
%r17258; 2026-02-21T12:40:23.8739117Z prmt.b32 %r17259, %r16467, 0, 0x7773U; 2026-02-21T12:40:23.8739179Z st.shared.b8 [%r24+1280], %r17259; 2026-02-21T12:40:23.8739242Z st.shared.b8 [%r25+1536], %r16468; 2026-02-21T12:40:23.8739304Z prmt.b32 %r17260, %r16468, 0, 0x7771U; 2026-02-21T12:40:23.8739368Z st.shared.b8 [%r26+1536], %r17260; 2026-02-21T12:40:23.8739456Z prmt.b32 %r17261, %r16468, 0, 0x7772U; 2026-02-21T12:40:23.8739745Z st.shared.b8 [%r27+1792], %r17261; 2026-02-21T12:40:23.8739848Z prmt.b32 %r17262, %r16468, 0, 0x7773U; 2026-02-21T12:40:23.8739912Z st.shared.b8 [%r28+1792], %r17262; 2026-02-21T12:40:23.8739971Z bar.sync 0; 2026-02-21T12:40:23.8740037Z ld.shared.b32 %r17263, [%r29]; 2026-02-21T12:40:23.8740100Z prmt.b32 %r17264, %r17263, 0, 0x7770U; 2026-02-21T12:40:23.8740165Z cvt.u16.u32 %rs1569, %r17264; 2026-02-21T12:40:23.8740253Z prmt.b32 %r17265, %r17263, 0, 0x7771U; 2026-02-21T12:40:23.8740361Z cvt.u16.u32 %rs1570, %r17265; 2026-02-21T12:40:23.8740480Z prmt.b32 %r17266, %r17263, 0, 0x7772U; 2026-02-21T12:40:23.8740587Z cvt.u16.u32 %rs1571, %r17266; 2026-02-21T12:40:23.8740693Z prmt.b32 %r17267, %r17263, 0, 0x7773U; 2026-02-21T12:40:23.8740756Z cvt.u16.u32 %rs1572, %r17267; 2026-02-21T12:40:23.8740824Z ld.shared.b32 %r17268, [%r30]; 2026-02-21T12:40:23.8740891Z prmt.b32 %r17269, %r17268, 0, 0x7770U; 2026-02-21T12:40:23.8740949Z cvt.u16.u32 %rs1573, %r17269; 2026-02-21T12:40:23.8741022Z prmt.b32 %r17270, %r17268, 0, 0x7771U; 2026-02-21T12:40:23.8741088Z cvt.u16.u32 %rs1574, %r17270; 2026-02-21T12:40:23.8741198Z prmt.b32 %r17271, %r17268, 0, 0x7772U; 2026-02-21T12:40:23.8741305Z cvt.u16.u32 %rs1575, %r17271; 2026-02-21T12:40:23.8741424Z prmt.b32 %r17272, %r17268, 0, 0x7773U; 2026-02-21T12:40:23.8741529Z cvt.u16.u32 %rs1576, %r17272; 2026-02-21T12:40:23.8741599Z ld.shared.b32 %r17273, [%r31]; 2026-02-21T12:40:23.8741668Z prmt.b32 %r17274, %r17273, 0, 0x7770U; 2026-02-21T12:40:23.8741728Z cvt.u16.u32 %rs1577, %r17274; 2026-02-21T12:40:23.8741791Z 
prmt.b32 %r17275, %r17273, 0, 0x7771U; 2026-02-21T12:40:23.8741865Z cvt.u16.u32 %rs1578, %r17275; 2026-02-21T12:40:23.8741931Z prmt.b32 %r17276, %r17273, 0, 0x7772U; 2026-02-21T12:40:23.8742033Z cvt.u16.u32 %rs1579, %r17276; 2026-02-21T12:40:23.8742147Z prmt.b32 %r17277, %r17273, 0, 0x7773U; 2026-02-21T12:40:23.8742261Z cvt.u16.u32 %rs1580, %r17277; 2026-02-21T12:40:23.8742375Z ld.shared.b32 %r17278, [%r32]; 2026-02-21T12:40:23.8742453Z prmt.b32 %r17279, %r17278, 0, 0x7770U; 2026-02-21T12:40:23.8742516Z cvt.u16.u32 %rs1581, %r17279; 2026-02-21T12:40:23.8742580Z prmt.b32 %r17280, %r17278, 0, 0x7771U; 2026-02-21T12:40:23.8742640Z cvt.u16.u32 %rs1582, %r17280; 2026-02-21T12:40:23.8742716Z prmt.b32 %r17281, %r17278, 0, 0x7772U; 2026-02-21T12:40:23.8742822Z cvt.u16.u32 %rs1583, %r17281; 2026-02-21T12:40:23.8742934Z prmt.b32 %r17282, %r17278, 0, 0x7773U; 2026-02-21T12:40:23.8743034Z cvt.u16.u32 %rs1584, %r17282; 2026-02-21T12:40:23.8743283Z .loc 1 60 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:60:28 2026-02-21T12:40:23.8743348Z shl.b16 %rs1585, %rs1569, 4; 2026-02-21T12:40:23.8743408Z shl.b16 %rs1586, %rs1573, 4; 2026-02-21T12:40:23.8743469Z shl.b16 %rs1587, %rs1577, 4; 2026-02-21T12:40:23.8743669Z shl.b16 %rs1588, %rs1581, 4; 2026-02-21T12:40:23.8743746Z shl.b16 %rs1589, %rs1570, 4; 2026-02-21T12:40:23.8743808Z shl.b16 %rs1590, %rs1574, 4; 2026-02-21T12:40:23.8743927Z shl.b16 %rs1591, %rs1578, 4; 2026-02-21T12:40:23.8743987Z shl.b16 %rs1592, %rs1582, 4; 2026-02-21T12:40:23.8744044Z shl.b16 %rs1593, %rs1571, 4; 2026-02-21T12:40:23.8744105Z shl.b16 %rs1594, %rs1575, 4; 2026-02-21T12:40:23.8744163Z shl.b16 %rs1595, %rs1579, 4; 2026-02-21T12:40:23.8744224Z shl.b16 %rs1596, %rs1583, 4; 2026-02-21T12:40:23.8744280Z shl.b16 %rs1597, %rs1572, 4; 2026-02-21T12:40:23.8744342Z shl.b16 %rs1598, %rs1576, 4; 2026-02-21T12:40:23.8744441Z shl.b16 %rs1599, %rs1580, 4; 2026-02-21T12:40:23.8744549Z shl.b16 %rs1600, %rs1584, 4; 2026-02-21T12:40:23.8744767Z .loc 1 62 25 // 
cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8744831Z cvt.s16.s8 %rs1601, %rs1585; 2026-02-21T12:40:23.8744892Z shr.s16 %rs1602, %rs1601, 4; 2026-02-21T12:40:23.8744954Z cvt.s16.s8 %rs1603, %rs1587; 2026-02-21T12:40:23.8745019Z shr.s16 %rs1604, %rs1603, 4; 2026-02-21T12:40:23.8745087Z prmt.b32 %r17283, %r17263, 0, 0x8880U; 2026-02-21T12:40:23.8745301Z cvt.u16.u32 %rs1605, %r17283; 2026-02-21T12:40:23.8745416Z shr.s16 %rs1606, %rs1605, 4; 2026-02-21T12:40:23.8745484Z prmt.b32 %r17284, %r17273, 0, 0x8880U; 2026-02-21T12:40:23.8745544Z cvt.u16.u32 %rs1607, %r17284; 2026-02-21T12:40:23.8745607Z shr.s16 %rs1608, %rs1607, 4; 2026-02-21T12:40:23.8745809Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8745876Z cvt.rn.f32.s16 %r17285, %rs1608; 2026-02-21T12:40:23.8745938Z cvt.rn.f32.s16 %r17286, %rs1606; 2026-02-21T12:40:23.8746001Z cvt.rn.f32.s16 %r17287, %rs1604; 2026-02-21T12:40:23.8746067Z cvt.rn.f32.s16 %r17288, %rs1602; 2026-02-21T12:40:23.8746349Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8746418Z cvt.s16.s8 %rs1609, %rs1586; 2026-02-21T12:40:23.8746655Z shr.s16 %rs1610, %rs1609, 4; 2026-02-21T12:40:23.8746765Z cvt.s16.s8 %rs1611, %rs1588; 2026-02-21T12:40:23.8746840Z shr.s16 %rs1612, %rs1611, 4; 2026-02-21T12:40:23.8746910Z prmt.b32 %r17289, %r17268, 0, 0x8880U; 2026-02-21T12:40:23.8746970Z cvt.u16.u32 %rs1613, %r17289; 2026-02-21T12:40:23.8747029Z shr.s16 %rs1614, %rs1613, 4; 2026-02-21T12:40:23.8747097Z prmt.b32 %r17290, %r17278, 0, 0x8880U; 2026-02-21T12:40:23.8747159Z cvt.u16.u32 %rs1615, %r17290; 2026-02-21T12:40:23.8747218Z shr.s16 %rs1616, %rs1615, 4; 2026-02-21T12:40:23.8747425Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8747536Z cvt.rn.f32.s16 %r17291, %rs1616; 2026-02-21T12:40:23.8747631Z cvt.rn.f32.s16 %r17292, %rs1614; 2026-02-21T12:40:23.8747693Z 
cvt.rn.f32.s16 %r17293, %rs1612; 2026-02-21T12:40:23.8747755Z cvt.rn.f32.s16 %r17294, %rs1610; 2026-02-21T12:40:23.8747965Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8748031Z cvt.s16.s8 %rs1617, %rs1589; 2026-02-21T12:40:23.8748098Z shr.s16 %rs1618, %rs1617, 4; 2026-02-21T12:40:23.8748159Z cvt.s16.s8 %rs1619, %rs1591; 2026-02-21T12:40:23.8748216Z shr.s16 %rs1620, %rs1619, 4; 2026-02-21T12:40:23.8748309Z prmt.b32 %r17295, %r17263, 0, 0x9991U; 2026-02-21T12:40:23.8748491Z cvt.u16.u32 %rs1621, %r17295; 2026-02-21T12:40:23.8748571Z shr.s16 %rs1622, %rs1621, 4; 2026-02-21T12:40:23.8748639Z prmt.b32 %r17296, %r17273, 0, 0x9991U; 2026-02-21T12:40:23.8748703Z cvt.u16.u32 %rs1623, %r17296; 2026-02-21T12:40:23.8748761Z shr.s16 %rs1624, %rs1623, 4; 2026-02-21T12:40:23.8748956Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8749022Z cvt.rn.f32.s16 %r17297, %rs1624; 2026-02-21T12:40:23.8749085Z cvt.rn.f32.s16 %r17298, %rs1622; 2026-02-21T12:40:23.8749145Z cvt.rn.f32.s16 %r17299, %rs1620; 2026-02-21T12:40:23.8749370Z cvt.rn.f32.s16 %r17300, %rs1618; 2026-02-21T12:40:23.8749658Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8749811Z cvt.s16.s8 %rs1625, %rs1590; 2026-02-21T12:40:23.8749873Z shr.s16 %rs1626, %rs1625, 4; 2026-02-21T12:40:23.8749938Z cvt.s16.s8 %rs1627, %rs1592; 2026-02-21T12:40:23.8749998Z shr.s16 %rs1628, %rs1627, 4; 2026-02-21T12:40:23.8750065Z prmt.b32 %r17301, %r17268, 0, 0x9991U; 2026-02-21T12:40:23.8750129Z cvt.u16.u32 %rs1629, %r17301; 2026-02-21T12:40:23.8750189Z shr.s16 %rs1630, %rs1629, 4; 2026-02-21T12:40:23.8750300Z prmt.b32 %r17302, %r17278, 0, 0x9991U; 2026-02-21T12:40:23.8750407Z cvt.u16.u32 %rs1631, %r17302; 2026-02-21T12:40:23.8750475Z shr.s16 %rs1632, %rs1631, 4; 2026-02-21T12:40:23.8750677Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 
2026-02-21T12:40:23.8750741Z cvt.rn.f32.s16 %r17303, %rs1632; 2026-02-21T12:40:23.8750812Z cvt.rn.f32.s16 %r17304, %rs1630; 2026-02-21T12:40:23.8750873Z cvt.rn.f32.s16 %r17305, %rs1628; 2026-02-21T12:40:23.8751010Z cvt.rn.f32.s16 %r17306, %rs1626; 2026-02-21T12:40:23.8751378Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8751449Z cvt.s16.s8 %rs1633, %rs1593; 2026-02-21T12:40:23.8751509Z shr.s16 %rs1634, %rs1633, 4; 2026-02-21T12:40:23.8751569Z cvt.s16.s8 %rs1635, %rs1595; 2026-02-21T12:40:23.8751634Z shr.s16 %rs1636, %rs1635, 4; 2026-02-21T12:40:23.8751704Z prmt.b32 %r17307, %r17263, 0, 0xaaa2U; 2026-02-21T12:40:23.8751767Z cvt.u16.u32 %rs1637, %r17307; 2026-02-21T12:40:23.8751832Z shr.s16 %rs1638, %rs1637, 4; 2026-02-21T12:40:23.8751898Z prmt.b32 %r17308, %r17273, 0, 0xaaa2U; 2026-02-21T12:40:23.8752001Z cvt.u16.u32 %rs1639, %r17308; 2026-02-21T12:40:23.8752110Z shr.s16 %rs1640, %rs1639, 4; 2026-02-21T12:40:23.8752321Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8752388Z cvt.rn.f32.s16 %r17309, %rs1640; 2026-02-21T12:40:23.8752450Z cvt.rn.f32.s16 %r17310, %rs1638; 2026-02-21T12:40:23.8752518Z cvt.rn.f32.s16 %r17311, %rs1636; 2026-02-21T12:40:23.8752580Z cvt.rn.f32.s16 %r17312, %rs1634; 2026-02-21T12:40:23.8752782Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8752893Z cvt.s16.s8 %rs1641, %rs1594; 2026-02-21T12:40:23.8752987Z shr.s16 %rs1642, %rs1641, 4; 2026-02-21T12:40:23.8753048Z cvt.s16.s8 %rs1643, %rs1596; 2026-02-21T12:40:23.8753109Z shr.s16 %rs1644, %rs1643, 4; 2026-02-21T12:40:23.8753179Z prmt.b32 %r17313, %r17268, 0, 0xaaa2U; 2026-02-21T12:40:23.8753239Z cvt.u16.u32 %rs1645, %r17313; 2026-02-21T12:40:23.8753299Z shr.s16 %rs1646, %rs1645, 4; 2026-02-21T12:40:23.8753367Z prmt.b32 %r17314, %r17278, 0, 0xaaa2U; 2026-02-21T12:40:23.8753427Z cvt.u16.u32 %rs1647, %r17314; 2026-02-21T12:40:23.8753489Z shr.s16 
%rs1648, %rs1647, 4; 2026-02-21T12:40:23.8753742Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8753841Z cvt.rn.f32.s16 %r17315, %rs1648; 2026-02-21T12:40:23.8753905Z cvt.rn.f32.s16 %r17316, %rs1646; 2026-02-21T12:40:23.8753966Z cvt.rn.f32.s16 %r17317, %rs1644; 2026-02-21T12:40:23.8754030Z cvt.rn.f32.s16 %r17318, %rs1642; 2026-02-21T12:40:23.8754223Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8754285Z cvt.s16.s8 %rs1649, %rs1597; 2026-02-21T12:40:23.8754358Z shr.s16 %rs1650, %rs1649, 4; 2026-02-21T12:40:23.8754420Z cvt.s16.s8 %rs1651, %rs1599; 2026-02-21T12:40:23.8754501Z shr.s16 %rs1652, %rs1651, 4; 2026-02-21T12:40:23.8754616Z prmt.b32 %r17319, %r17263, 0, 0xbbb3U; 2026-02-21T12:40:23.8754696Z cvt.u16.u32 %rs1653, %r17319; 2026-02-21T12:40:23.8754759Z shr.s16 %rs1654, %rs1653, 4; 2026-02-21T12:40:23.8754903Z prmt.b32 %r17320, %r17273, 0, 0xbbb3U; 2026-02-21T12:40:23.8754967Z cvt.u16.u32 %rs1655, %r17320; 2026-02-21T12:40:23.8755027Z shr.s16 %rs1656, %rs1655, 4; 2026-02-21T12:40:23.8755286Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8755391Z cvt.rn.f32.s16 %r17321, %rs1656; 2026-02-21T12:40:23.8755505Z cvt.rn.f32.s16 %r17322, %rs1654; 2026-02-21T12:40:23.8755573Z cvt.rn.f32.s16 %r17323, %rs1652; 2026-02-21T12:40:23.8755635Z cvt.rn.f32.s16 %r17324, %rs1650; 2026-02-21T12:40:23.8755832Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:23.8755893Z cvt.s16.s8 %rs1657, %rs1598; 2026-02-21T12:40:23.8755952Z shr.s16 %rs1658, %rs1657, 4; 2026-02-21T12:40:23.8756016Z cvt.s16.s8 %rs1659, %rs1600; 2026-02-21T12:40:23.8756077Z shr.s16 %rs1660, %rs1659, 4; 2026-02-21T12:40:23.8756143Z prmt.b32 %r17325, %r17268, 0, 0xbbb3U; 2026-02-21T12:40:23.8756244Z cvt.u16.u32 %rs1661, %r17325; 2026-02-21T12:40:23.8756347Z shr.s16 %rs1662, %rs1661, 4; 2026-02-21T12:40:23.8756415Z 
prmt.b32 %r17326, %r17278, 0, 0xbbb3U; 2026-02-21T12:40:23.8756824Z cvt.u16.u32 %rs1663, %r17326; 2026-02-21T12:40:23.8756910Z shr.s16 %rs1664, %rs1663, 4; 2026-02-21T12:40:23.8757143Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:23.8757212Z cvt.rn.f32.s16 %r17327, %rs1664; 2026-02-21T12:40:23.8757278Z cvt.rn.f32.s16 %r17328, %rs1662; 2026-02-21T12:40:23.8757340Z cvt.rn.f32.s16 %r17329, %rs1660; 2026-02-21T12:40:23.8757436Z cvt.rn.f32.s16 %r17330, %rs1658; 2026-02-21T12:40:23.8757531Z bar.sync 0; 2026-02-21T12:40:23.8757748Z st.shared.v4.b32 [%r33], {%r17288, %r17286, %r17287, %r17285}; 2026-02-21T12:40:23.8757946Z st.shared.v4.b32 [%r33+8192], {%r17294, %r17292, %r17293, %r17291}; 2026-02-21T12:40:23.8758070Z st.shared.v4.b32 [%r34], {%r17300, %r17298, %r17299, %r17297}; 2026-02-21T12:40:23.8758196Z st.shared.v4.b32 [%r34+8192], {%r17306, %r17304, %r17305, %r17303}; 2026-02-21T12:40:23.8758370Z st.shared.v4.b32 [%r35], {%r17312, %r17310, %r17311, %r17309}; 2026-02-21T12:40:23.8758525Z st.shared.v4.b32 [%r35+8192], {%r17318, %r17316, %r17317, %r17315}; 2026-02-21T12:40:23.8758635Z st.shared.v4.b32 [%r36], {%r17324, %r17322, %r17323, %r17321}; 2026-02-21T12:40:23.8758749Z st.shared.v4.b32 [%r36+8192], {%r17330, %r17328, %r17329, %r17327}; 2026-02-21T12:40:23.8758802Z $L__tmp31: 2026-02-21T12:40:23.8759085Z .loc 2 291 36 // standard.py:291:36 @[ cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:87:40 ] 2026-02-21T12:40:23.8759192Z // begin inline asm 2026-02-21T12:40:23.8759303Z fence.proxy.async.shared::cta; 2026-02-21T12:40:23.8759362Z // end inline asm 2026-02-21T12:40:23.8759419Z bar.sync 0; 2026-02-21T12:40:23.8759504Z shfl.sync.idx.b32 %r17331, %r2, 0, 31, -1; 2026-02-21T12:40:23.8759577Z wgmma.fence.sync.aligned; 2026-02-21T12:40:23.8759650Z mov.pred %p43, -1; 2026-02-21T12:40:23.8759707Z // begin inline asm 2026-02-21T12:40:23.8762673Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22098,%r22099,%r22100,%r22101,%r22102,%r22103,%r22104,%r22105,%r22106,%r22107,%r22108,%r22109,%r22110,%r22111,%r22112,%r22113,%r22114,%r22115,%r22116,%r22117,%r22118,%r22119,%r22120,%r22121,%r22122,%r22123,%r22124,%r22125,%r22126,%r22127,%r22128,%r22129,%r22130,%r22131,%r22132,%r22133,%r22134,%r22135,%r22136,%r22137,%r22138,%r22139,%r22140,%r22141,%r22142,%r22143,%r22144,%r22145,%r22146,%r22147,%r22148,%r22149,%r22150,%r22151,%r22152,%r22153,%r22154,%r22155,%r22156,%r22157,%r22158,%r22159,%r22160,%r22161,%r22162,%r22163,%r22164,%r22165,%r22166,%r22167,%r22168,%r22169,%r22170,%r22171,%r22172,%r22173,%r22174,%r22175,%r22176,%r22177,%r22178,%r22179,%r22180,%r22181,%r22182,%r22183,%r22184,%r22185,%r22186,%r22187,%r22188,%r22189,%r22190,%r22191,%r22192,%r22193,%r22194,%r22195,%r22196,%r22197,%r22198,%r22199,%r22200,%r22201,%r22202,%r22203,%r22204,%r22205,%r22206,%r22207,%r22208,%r22209,%r22210,%r22211,%r22212,%r22213,%r22214,%r22215,%r22216,%r22217,%r22218,%r22219,%r22220,%r22221,%r22222,%r22223,%r22224,%r22225}, {%r16725,%r16726,%r16727,%r16728}, %rd23, %p43, 1, 1; 2026-02-21T12:40:24.4223981Z // end inline asm 2026-02-21T12:40:24.4224200Z // begin inline asm 2026-02-21T12:40:24.4228162Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22098,%r22099,%r22100,%r22101,%r22102,%r22103,%r22104,%r22105,%r22106,%r22107,%r22108,%r22109,%r22110,%r22111,%r22112,%r22113,%r22114,%r22115,%r22116,%r22117,%r22118,%r22119,%r22120,%r22121,%r22122,%r22123,%r22124,%r22125,%r22126,%r22127,%r22128,%r22129,%r22130,%r22131,%r22132,%r22133,%r22134,%r22135,%r22136,%r22137,%r22138,%r22139,%r22140,%r22141,%r22142,%r22143,%r22144,%r22145,%r22146,%r22147,%r22148,%r22149,%r22150,%r22151,%r22152,%r22153,%r22154,%r22155,%r22156,%r22157,%r22158,%r22159,%r22160,%r22161,%r22162,%r22163,%r22164,%r22165,%r22166,%r22167,%r22168,%r22169,%r22170,%r22171,%r22172,%r22173,%r22174,%r22175,%r22176,%r22177,%r22178,%r22179,%r22180,%r22181,%r22182,%r22183,%r22184,%r22185,%r22186,%r22187,%r22188,%r22189,%r22190,%r22191,%r22192,%r22193,%r22194,%r22195,%r22196,%r22197,%r22198,%r22199,%r22200,%r22201,%r22202,%r22203,%r22204,%r22205,%r22206,%r22207,%r22208,%r22209,%r22210,%r22211,%r22212,%r22213,%r22214,%r22215,%r22216,%r22217,%r22218,%r22219,%r22220,%r22221,%r22222,%r22223,%r22224,%r22225}, {%r16985,%r16986,%r16987,%r16988}, %rd24, %p43, 1, 1; 2026-02-21T12:40:24.4231578Z // end inline asm 2026-02-21T12:40:24.4231773Z wgmma.commit_group.sync.aligned; 2026-02-21T12:40:24.4231998Z mov.b32 %r17118, 0; 2026-02-21T12:40:24.4232167Z mov.b32 %r17119, %r17118; 2026-02-21T12:40:24.4232367Z mov.b32 %r17117, %r18409; 2026-02-21T12:40:24.4232547Z // begin inline asm 2026-02-21T12:40:24.4235444Z // wait for regs: 
%r22098,%r22099,%r22100,%r22101,%r22102,%r22103,%r22104,%r22105,%r22106,%r22107,%r22108,%r22109,%r22110,%r22111,%r22112,%r22113,%r22114,%r22115,%r22116,%r22117,%r22118,%r22119,%r22120,%r22121,%r22122,%r22123,%r22124,%r22125,%r22126,%r22127,%r22128,%r22129,%r22130,%r22131,%r22132,%r22133,%r22134,%r22135,%r22136,%r22137,%r22138,%r22139,%r22140,%r22141,%r22142,%r22143,%r22144,%r22145,%r22146,%r22147,%r22148,%r22149,%r22150,%r22151,%r22152,%r22153,%r22154,%r22155,%r22156,%r22157,%r22158,%r22159,%r22160,%r22161,%r22162,%r22163,%r22164,%r22165,%r22166,%r22167,%r22168,%r22169,%r22170,%r22171,%r22172,%r22173,%r22174,%r22175,%r22176,%r22177,%r22178,%r22179,%r22180,%r22181,%r22182,%r22183,%r22184,%r22185,%r22186,%r22187,%r22188,%r22189,%r22190,%r22191,%r22192,%r22193,%r22194,%r22195,%r22196,%r22197,%r22198,%r22199,%r22200,%r22201,%r22202,%r22203,%r22204,%r22205,%r22206,%r22207,%r22208,%r22209,%r22210,%r22211,%r22212,%r22213,%r22214,%r22215,%r22216,%r22217,%r22218,%r22219,%r22220,%r22221,%r22222,%r22223,%r22224,%r22225,%r17117,%r17118,%r17119 2026-02-21T12:40:24.4238517Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:40:24.4238711Z // end inline asm 2026-02-21T12:40:24.4238863Z $L__tmp32: 2026-02-21T12:40:24.4239179Z .loc 1 43 126 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:43:126 2026-02-21T12:40:24.4239558Z add.s64 %rd640, %rd640, 8; 2026-02-21T12:40:24.4239766Z add.s64 %rd639, %rd639, 32; 2026-02-21T12:40:24.4239955Z add.s64 %rd638, %rd638, 10240; 2026-02-21T12:40:24.4240151Z setp.lt.u64 %p45, %rd640, 4088; 2026-02-21T12:40:24.4240339Z @%p45 bra $L__BB0_37; 2026-02-21T12:40:24.4240562Z // %bb.38: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:40:24.4240977Z .loc 1 34 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:34:32 2026-02-21T12:40:24.4241353Z or.b64 %rd486, %rd119, %rd5; 2026-02-21T12:40:24.4241540Z or.b64 %rd487, %rd119, %rd6; 2026-02-21T12:40:24.4241719Z or.b64 %rd488, %rd119, %rd7; 2026-02-21T12:40:24.4241898Z or.b64 %rd489, %rd119, 
%rd8; 2026-02-21T12:40:24.4242073Z or.b64 %rd490, %rd119, %rd9; 2026-02-21T12:40:24.4242260Z or.b64 %rd491, %rd119, %rd10; 2026-02-21T12:40:24.4242559Z or.b64 %rd492, %rd119, %rd11; 2026-02-21T12:40:24.4242741Z or.b64 %rd493, %rd119, %rd12; 2026-02-21T12:40:24.4242911Z or.b64 %rd494, %rd119, %rd13; 2026-02-21T12:40:24.4243175Z or.b64 %rd495, %rd119, %rd14; 2026-02-21T12:40:24.4243351Z or.b64 %rd496, %rd119, %rd15; 2026-02-21T12:40:24.4243528Z or.b64 %rd497, %rd119, %rd16; 2026-02-21T12:40:24.4243700Z or.b64 %rd498, %rd119, %rd17; 2026-02-21T12:40:24.4243891Z or.b64 %rd499, %rd119, %rd18; 2026-02-21T12:40:24.4244073Z or.b64 %rd500, %rd119, %rd19; 2026-02-21T12:40:24.4244247Z or.b64 %rd501, %rd119, %rd20; 2026-02-21T12:40:24.4244577Z .loc 1 36 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:36:32 2026-02-21T12:40:24.4244934Z or.b64 %rd502, %rd120, %rd22; 2026-02-21T12:40:24.4245265Z .loc 1 90 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:90:28 2026-02-21T12:40:24.4245638Z cvt.rn.bf16x2.f32 %r17476, %r22099, %r22098; 2026-02-21T12:40:24.4245896Z cvt.rn.bf16x2.f32 %r17477, %r22101, %r22100; 2026-02-21T12:40:24.4246130Z cvt.rn.bf16x2.f32 %r17478, %r22103, %r22102; 2026-02-21T12:40:24.4246600Z cvt.rn.bf16x2.f32 %r17479, %r22105, %r22104; 2026-02-21T12:40:24.4246942Z cvt.rn.bf16x2.f32 %r17480, %r22107, %r22106; 2026-02-21T12:40:24.4247170Z cvt.rn.bf16x2.f32 %r17481, %r22109, %r22108; 2026-02-21T12:40:24.4247400Z cvt.rn.bf16x2.f32 %r17482, %r22111, %r22110; 2026-02-21T12:40:24.4247622Z cvt.rn.bf16x2.f32 %r17483, %r22113, %r22112; 2026-02-21T12:40:24.4247852Z cvt.rn.bf16x2.f32 %r17484, %r22115, %r22114; 2026-02-21T12:40:24.4248072Z cvt.rn.bf16x2.f32 %r17485, %r22117, %r22116; 2026-02-21T12:40:24.4248314Z cvt.rn.bf16x2.f32 %r17486, %r22119, %r22118; 2026-02-21T12:40:24.4248546Z cvt.rn.bf16x2.f32 %r17487, %r22121, %r22120; 2026-02-21T12:40:24.4248769Z cvt.rn.bf16x2.f32 %r17488, %r22123, %r22122; 2026-02-21T12:40:24.4248997Z cvt.rn.bf16x2.f32 
%r17489, %r22125, %r22124; 2026-02-21T12:40:24.4249217Z cvt.rn.bf16x2.f32 %r17490, %r22127, %r22126; 2026-02-21T12:40:24.4249447Z cvt.rn.bf16x2.f32 %r17491, %r22129, %r22128; 2026-02-21T12:40:24.4249672Z cvt.rn.bf16x2.f32 %r17492, %r22131, %r22130; 2026-02-21T12:40:24.4249907Z cvt.rn.bf16x2.f32 %r17493, %r22133, %r22132; 2026-02-21T12:40:24.4250150Z cvt.rn.bf16x2.f32 %r17494, %r22135, %r22134; 2026-02-21T12:40:24.4261241Z cvt.rn.bf16x2.f32 %r17495, %r22137, %r22136; 2026-02-21T12:40:24.4261552Z cvt.rn.bf16x2.f32 %r17496, %r22139, %r22138; 2026-02-21T12:40:24.4261846Z cvt.rn.bf16x2.f32 %r17497, %r22141, %r22140; 2026-02-21T12:40:24.4262102Z cvt.rn.bf16x2.f32 %r17498, %r22143, %r22142; 2026-02-21T12:40:24.4262345Z cvt.rn.bf16x2.f32 %r17499, %r22145, %r22144; 2026-02-21T12:40:24.4262583Z cvt.rn.bf16x2.f32 %r17500, %r22147, %r22146; 2026-02-21T12:40:24.4262840Z cvt.rn.bf16x2.f32 %r17501, %r22149, %r22148; 2026-02-21T12:40:24.4263087Z cvt.rn.bf16x2.f32 %r17502, %r22151, %r22150; 2026-02-21T12:40:24.4263325Z cvt.rn.bf16x2.f32 %r17503, %r22153, %r22152; 2026-02-21T12:40:24.4263569Z cvt.rn.bf16x2.f32 %r17504, %r22155, %r22154; 2026-02-21T12:40:24.4263809Z cvt.rn.bf16x2.f32 %r17505, %r22157, %r22156; 2026-02-21T12:40:24.4264053Z cvt.rn.bf16x2.f32 %r17506, %r22159, %r22158; 2026-02-21T12:40:24.4264289Z cvt.rn.bf16x2.f32 %r17507, %r22161, %r22160; 2026-02-21T12:40:24.4264541Z cvt.rn.bf16x2.f32 %r17508, %r22163, %r22162; 2026-02-21T12:40:24.4264777Z cvt.rn.bf16x2.f32 %r17509, %r22165, %r22164; 2026-02-21T12:40:24.4265022Z cvt.rn.bf16x2.f32 %r17510, %r22167, %r22166; 2026-02-21T12:40:24.4265264Z cvt.rn.bf16x2.f32 %r17511, %r22169, %r22168; 2026-02-21T12:40:24.4265499Z cvt.rn.bf16x2.f32 %r17512, %r22171, %r22170; 2026-02-21T12:40:24.4265736Z cvt.rn.bf16x2.f32 %r17513, %r22173, %r22172; 2026-02-21T12:40:24.4265975Z cvt.rn.bf16x2.f32 %r17514, %r22175, %r22174; 2026-02-21T12:40:24.4266211Z cvt.rn.bf16x2.f32 %r17515, %r22177, %r22176; 2026-02-21T12:40:24.4266439Z cvt.rn.bf16x2.f32 
%r17516, %r22179, %r22178; 2026-02-21T12:40:24.4266860Z cvt.rn.bf16x2.f32 %r17517, %r22181, %r22180; 2026-02-21T12:40:24.4267251Z cvt.rn.bf16x2.f32 %r17518, %r22183, %r22182; 2026-02-21T12:40:24.4267490Z cvt.rn.bf16x2.f32 %r17519, %r22185, %r22184; 2026-02-21T12:40:24.4267812Z cvt.rn.bf16x2.f32 %r17520, %r22187, %r22186; 2026-02-21T12:40:24.4268045Z cvt.rn.bf16x2.f32 %r17521, %r22189, %r22188; 2026-02-21T12:40:24.4268281Z cvt.rn.bf16x2.f32 %r17522, %r22191, %r22190; 2026-02-21T12:40:24.4268598Z cvt.rn.bf16x2.f32 %r17523, %r22193, %r22192; 2026-02-21T12:40:24.4268839Z cvt.rn.bf16x2.f32 %r17524, %r22195, %r22194; 2026-02-21T12:40:24.4269069Z cvt.rn.bf16x2.f32 %r17525, %r22197, %r22196; 2026-02-21T12:40:24.4269304Z cvt.rn.bf16x2.f32 %r17526, %r22199, %r22198; 2026-02-21T12:40:24.4269544Z cvt.rn.bf16x2.f32 %r17527, %r22201, %r22200; 2026-02-21T12:40:24.4269777Z cvt.rn.bf16x2.f32 %r17528, %r22203, %r22202; 2026-02-21T12:40:24.4270015Z cvt.rn.bf16x2.f32 %r17529, %r22205, %r22204; 2026-02-21T12:40:24.4270244Z cvt.rn.bf16x2.f32 %r17530, %r22207, %r22206; 2026-02-21T12:40:24.4270490Z cvt.rn.bf16x2.f32 %r17531, %r22209, %r22208; 2026-02-21T12:40:24.4270726Z cvt.rn.bf16x2.f32 %r17532, %r22211, %r22210; 2026-02-21T12:40:24.4271054Z cvt.rn.bf16x2.f32 %r17533, %r22213, %r22212; 2026-02-21T12:40:24.4271364Z cvt.rn.bf16x2.f32 %r17534, %r22215, %r22214; 2026-02-21T12:40:24.4271604Z cvt.rn.bf16x2.f32 %r17535, %r22217, %r22216; 2026-02-21T12:40:24.4271839Z cvt.rn.bf16x2.f32 %r17536, %r22219, %r22218; 2026-02-21T12:40:24.4272071Z cvt.rn.bf16x2.f32 %r17537, %r22221, %r22220; 2026-02-21T12:40:24.4272317Z cvt.rn.bf16x2.f32 %r17538, %r22223, %r22222; 2026-02-21T12:40:24.4272556Z cvt.rn.bf16x2.f32 %r17539, %r22225, %r22224; 2026-02-21T12:40:24.4272936Z .loc 1 91 22 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:91:22 2026-02-21T12:40:24.4273320Z mad.lo.s64 %rd503, %rd486, 2560, %rd162; 2026-02-21T12:40:24.4273553Z shl.b64 %rd504, %rd502, 1; 2026-02-21T12:40:24.4273767Z add.s64 
%rd470, %rd503, %rd504; 2026-02-21T12:40:24.4273986Z mad.lo.s64 %rd505, %rd487, 2560, %rd162; 2026-02-21T12:40:24.4274222Z add.s64 %rd471, %rd505, %rd504; 2026-02-21T12:40:24.4274427Z mad.lo.s64 %rd506, %rd488, 2560, %rd162; 2026-02-21T12:40:24.4274654Z add.s64 %rd472, %rd506, %rd504; 2026-02-21T12:40:24.4274853Z mad.lo.s64 %rd507, %rd489, 2560, %rd162; 2026-02-21T12:40:24.4295722Z add.s64 %rd473, %rd507, %rd504; 2026-02-21T12:40:24.4295957Z mad.lo.s64 %rd508, %rd490, 2560, %rd162; 2026-02-21T12:40:24.4296177Z add.s64 %rd474, %rd508, %rd504; 2026-02-21T12:40:24.4296373Z mad.lo.s64 %rd509, %rd491, 2560, %rd162; 2026-02-21T12:40:24.4296701Z add.s64 %rd475, %rd509, %rd504; 2026-02-21T12:40:24.4296892Z mad.lo.s64 %rd510, %rd492, 2560, %rd162; 2026-02-21T12:40:24.4297108Z add.s64 %rd476, %rd510, %rd504; 2026-02-21T12:40:24.4297295Z mad.lo.s64 %rd511, %rd493, 2560, %rd162; 2026-02-21T12:40:24.4297493Z add.s64 %rd477, %rd511, %rd504; 2026-02-21T12:40:24.4297681Z mad.lo.s64 %rd512, %rd494, 2560, %rd162; 2026-02-21T12:40:24.4297879Z add.s64 %rd478, %rd512, %rd504; 2026-02-21T12:40:24.4298067Z mad.lo.s64 %rd513, %rd495, 2560, %rd162; 2026-02-21T12:40:24.4298265Z add.s64 %rd479, %rd513, %rd504; 2026-02-21T12:40:24.4298458Z mad.lo.s64 %rd514, %rd496, 2560, %rd162; 2026-02-21T12:40:24.4298655Z add.s64 %rd480, %rd514, %rd504; 2026-02-21T12:40:24.4298839Z mad.lo.s64 %rd515, %rd497, 2560, %rd162; 2026-02-21T12:40:24.4299038Z add.s64 %rd481, %rd515, %rd504; 2026-02-21T12:40:24.4299220Z mad.lo.s64 %rd516, %rd498, 2560, %rd162; 2026-02-21T12:40:24.4299417Z add.s64 %rd482, %rd516, %rd504; 2026-02-21T12:40:24.4299601Z mad.lo.s64 %rd517, %rd499, 2560, %rd162; 2026-02-21T12:40:24.4299797Z add.s64 %rd483, %rd517, %rd504; 2026-02-21T12:40:24.4299981Z mad.lo.s64 %rd518, %rd500, 2560, %rd162; 2026-02-21T12:40:24.4300179Z add.s64 %rd484, %rd518, %rd504; 2026-02-21T12:40:24.4300363Z mad.lo.s64 %rd519, %rd501, 2560, %rd162; 2026-02-21T12:40:24.4300560Z add.s64 %rd485, %rd519, %rd504; 
2026-02-21T12:40:24.4300893Z .loc 1 91 81 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:91:81 2026-02-21T12:40:24.4301361Z bar.sync 0; 2026-02-21T12:40:24.4301574Z st.shared.v4.b32 [%r37], {%r17476, %r17478, %r17480, %r17482}; 2026-02-21T12:40:24.4301954Z st.shared.v4.b32 [%r38], {%r17484, %r17486, %r17488, %r17490}; 2026-02-21T12:40:24.4302249Z st.shared.v4.b32 [%r39], {%r17492, %r17494, %r17496, %r17498}; 2026-02-21T12:40:24.4302539Z st.shared.v4.b32 [%r40], {%r17500, %r17502, %r17504, %r17506}; 2026-02-21T12:40:24.4302828Z st.shared.v4.b32 [%r41], {%r17508, %r17510, %r17512, %r17514}; 2026-02-21T12:40:24.4303131Z st.shared.v4.b32 [%r42], {%r17516, %r17518, %r17520, %r17522}; 2026-02-21T12:40:24.4303419Z st.shared.v4.b32 [%r43], {%r17524, %r17526, %r17528, %r17530}; 2026-02-21T12:40:24.4303706Z st.shared.v4.b32 [%r44], {%r17532, %r17534, %r17536, %r17538}; 2026-02-21T12:40:24.4303945Z bar.sync 0; 2026-02-21T12:40:24.4304088Z // begin inline asm 2026-02-21T12:40:24.4304391Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17332, %r17333, %r17334, %r17335}, [%r6254]; 2026-02-21T12:40:24.4304741Z // end inline asm 2026-02-21T12:40:24.4304889Z // begin inline asm 2026-02-21T12:40:24.4305299Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17337, %r17338, %r17339, %r17340}, [%r6259]; 2026-02-21T12:40:24.4305635Z // end inline asm 2026-02-21T12:40:24.4305778Z // begin inline asm 2026-02-21T12:40:24.4306074Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17342, %r17343, %r17344, %r17345}, [%r6264]; 2026-02-21T12:40:24.4306429Z // end inline asm 2026-02-21T12:40:24.4306696Z // begin inline asm 2026-02-21T12:40:24.4306996Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17347, %r17348, %r17349, %r17350}, [%r6269]; 2026-02-21T12:40:24.4307325Z // end inline asm 2026-02-21T12:40:24.4307467Z // begin inline asm 2026-02-21T12:40:24.4307744Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17352, %r17353, %r17354, %r17355}, [%r6274]; 2026-02-21T12:40:24.4308073Z // end inline asm 
2026-02-21T12:40:24.4308215Z // begin inline asm 2026-02-21T12:40:24.4308577Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17357, %r17358, %r17359, %r17360}, [%r6279]; 2026-02-21T12:40:24.4308906Z // end inline asm 2026-02-21T12:40:24.4309062Z // begin inline asm 2026-02-21T12:40:24.4309344Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17362, %r17363, %r17364, %r17365}, [%r6284]; 2026-02-21T12:40:24.4309695Z // end inline asm 2026-02-21T12:40:24.4309848Z // begin inline asm 2026-02-21T12:40:24.4310141Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17367, %r17368, %r17369, %r17370}, [%r6289]; 2026-02-21T12:40:24.4310488Z // end inline asm 2026-02-21T12:40:24.4310640Z bar.sync 0; 2026-02-21T12:40:24.4310849Z st.shared.v4.b32 [%r37], {%r17477, %r17479, %r17481, %r17483}; 2026-02-21T12:40:24.4311160Z st.shared.v4.b32 [%r38], {%r17485, %r17487, %r17489, %r17491}; 2026-02-21T12:40:24.4311488Z st.shared.v4.b32 [%r39], {%r17493, %r17495, %r17497, %r17499}; 2026-02-21T12:40:24.4311795Z st.shared.v4.b32 [%r40], {%r17501, %r17503, %r17505, %r17507}; 2026-02-21T12:40:24.4312105Z st.shared.v4.b32 [%r41], {%r17509, %r17511, %r17513, %r17515}; 2026-02-21T12:40:24.4312409Z st.shared.v4.b32 [%r42], {%r17517, %r17519, %r17521, %r17523}; 2026-02-21T12:40:24.4312711Z st.shared.v4.b32 [%r43], {%r17525, %r17527, %r17529, %r17531}; 2026-02-21T12:40:24.4313030Z st.shared.v4.b32 [%r44], {%r17533, %r17535, %r17537, %r17539}; 2026-02-21T12:40:24.4313277Z bar.sync 0; 2026-02-21T12:40:24.4313435Z // begin inline asm 2026-02-21T12:40:24.4313732Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17372, %r17373, %r17374, %r17375}, [%r6254]; 2026-02-21T12:40:24.4314086Z // end inline asm 2026-02-21T12:40:24.4314239Z // begin inline asm 2026-02-21T12:40:24.4314534Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17377, %r17378, %r17379, %r17380}, [%r6259]; 2026-02-21T12:40:24.4314879Z // end inline asm 2026-02-21T12:40:24.4315031Z // begin inline asm 2026-02-21T12:40:24.4315321Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17382, %r17383, %r17384, %r17385}, [%r6264]; 2026-02-21T12:40:24.4315760Z // end inline asm 2026-02-21T12:40:24.4315915Z // begin inline asm 2026-02-21T12:40:24.4316198Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17387, %r17388, %r17389, %r17390}, [%r6269]; 2026-02-21T12:40:24.4316732Z // end inline asm 2026-02-21T12:40:24.4316881Z // begin inline asm 2026-02-21T12:40:24.4317168Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17392, %r17393, %r17394, %r17395}, [%r6274]; 2026-02-21T12:40:24.4317507Z // end inline asm 2026-02-21T12:40:24.4317654Z // begin inline asm 2026-02-21T12:40:24.4317943Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17397, %r17398, %r17399, %r17400}, [%r6279]; 2026-02-21T12:40:24.4318274Z // end inline asm 2026-02-21T12:40:24.4318445Z // begin inline asm 2026-02-21T12:40:24.4318730Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17402, %r17403, %r17404, %r17405}, [%r6284]; 2026-02-21T12:40:24.4319070Z // end inline asm 2026-02-21T12:40:24.4319224Z // begin inline asm 2026-02-21T12:40:24.4319507Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r17407, %r17408, %r17409, %r17410}, [%r6289]; 2026-02-21T12:40:24.4319849Z // end inline asm 2026-02-21T12:40:24.4319999Z // begin inline asm 2026-02-21T12:40:24.4320379Z st.global.v4.b32 [ %rd470 + 0 ], { %r17332, %r17333, %r17334, %r17335 }; 2026-02-21T12:40:24.4320659Z // end inline asm 2026-02-21T12:40:24.4320811Z // begin inline asm 2026-02-21T12:40:24.4321029Z st.global.v4.b32 [ %rd471 + 0 ], { %r17337, %r17338, %r17339, %r17340 }; 2026-02-21T12:40:24.4321302Z // end inline asm 2026-02-21T12:40:24.4321451Z // begin inline asm 2026-02-21T12:40:24.4321676Z st.global.v4.b32 [ %rd472 + 0 ], { %r17372, %r17373, %r17374, %r17375 }; 2026-02-21T12:40:24.4321949Z // end inline asm 2026-02-21T12:40:24.4322096Z // begin inline asm 2026-02-21T12:40:24.4322315Z st.global.v4.b32 [ %rd473 + 0 ], { %r17377, %r17378, %r17379, %r17380 }; 2026-02-21T12:40:24.4322576Z // end inline asm 
2026-02-21T12:40:24.4322732Z // begin inline asm 2026-02-21T12:40:24.4322947Z st.global.v4.b32 [ %rd474 + 0 ], { %r17342, %r17343, %r17344, %r17345 }; 2026-02-21T12:40:24.4323219Z // end inline asm 2026-02-21T12:40:24.4323367Z // begin inline asm 2026-02-21T12:40:24.4323592Z st.global.v4.b32 [ %rd475 + 0 ], { %r17347, %r17348, %r17349, %r17350 }; 2026-02-21T12:40:24.4323860Z // end inline asm 2026-02-21T12:40:24.4324010Z // begin inline asm 2026-02-21T12:40:24.4324234Z st.global.v4.b32 [ %rd476 + 0 ], { %r17382, %r17383, %r17384, %r17385 }; 2026-02-21T12:40:24.4324496Z // end inline asm 2026-02-21T12:40:24.4324653Z // begin inline asm 2026-02-21T12:40:24.4324871Z st.global.v4.b32 [ %rd477 + 0 ], { %r17387, %r17388, %r17389, %r17390 }; 2026-02-21T12:40:24.4325142Z // end inline asm 2026-02-21T12:40:24.4325293Z // begin inline asm 2026-02-21T12:40:24.4325516Z st.global.v4.b32 [ %rd478 + 0 ], { %r17352, %r17353, %r17354, %r17355 }; 2026-02-21T12:40:24.4325780Z // end inline asm 2026-02-21T12:40:24.4325934Z // begin inline asm 2026-02-21T12:40:24.4326156Z st.global.v4.b32 [ %rd479 + 0 ], { %r17357, %r17358, %r17359, %r17360 }; 2026-02-21T12:40:24.4326421Z // end inline asm 2026-02-21T12:40:24.4326704Z // begin inline asm 2026-02-21T12:40:24.4326939Z st.global.v4.b32 [ %rd480 + 0 ], { %r17392, %r17393, %r17394, %r17395 }; 2026-02-21T12:40:24.4327216Z // end inline asm 2026-02-21T12:40:24.4327367Z // begin inline asm 2026-02-21T12:40:24.4327599Z st.global.v4.b32 [ %rd481 + 0 ], { %r17397, %r17398, %r17399, %r17400 }; 2026-02-21T12:40:24.4327872Z // end inline asm 2026-02-21T12:40:24.4328022Z // begin inline asm 2026-02-21T12:40:24.4328248Z st.global.v4.b32 [ %rd482 + 0 ], { %r17362, %r17363, %r17364, %r17365 }; 2026-02-21T12:40:24.4328513Z // end inline asm 2026-02-21T12:40:24.4328679Z // begin inline asm 2026-02-21T12:40:24.4328895Z st.global.v4.b32 [ %rd483 + 0 ], { %r17367, %r17368, %r17369, %r17370 }; 2026-02-21T12:40:24.4329166Z // end inline asm 
2026-02-21T12:40:24.4329313Z // begin inline asm 2026-02-21T12:40:24.4329537Z st.global.v4.b32 [ %rd484 + 0 ], { %r17402, %r17403, %r17404, %r17405 }; 2026-02-21T12:40:24.4329908Z // end inline asm 2026-02-21T12:40:24.4330055Z // begin inline asm 2026-02-21T12:40:24.4330280Z st.global.v4.b32 [ %rd485 + 0 ], { %r17407, %r17408, %r17409, %r17410 }; 2026-02-21T12:40:24.4330612Z // end inline asm 2026-02-21T12:40:24.4330924Z .loc 1 22 120 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:22:120 2026-02-21T12:40:24.4331297Z add.s64 %rd612, %rd612, 4; 2026-02-21T12:40:24.4331501Z setp.lt.u64 %p46, %rd612, %rd641; 2026-02-21T12:40:24.4331702Z @%p46 bra $L__BB0_2; 2026-02-21T12:40:24.4331913Z $L__BB0_4: // %.preheader153 2026-02-21T12:40:24.4332164Z setp.ge.s64 %p47, %rd641, %rd2; 2026-02-21T12:40:24.4332363Z @%p47 bra $L__BB0_11; 2026-02-21T12:40:24.4332567Z // %bb.5: // %.lr.ph177 2026-02-21T12:40:24.4332949Z .loc 1 0 120 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:0:120 2026-02-21T12:40:24.4333333Z and.b32 %r17544, %r21315, 136; 2026-02-21T12:40:24.4333533Z or.b32 %r17545, %r17544, %r21313; 2026-02-21T12:40:24.4333744Z add.s32 %r55, %r18409, %r17545; 2026-02-21T12:40:24.4334027Z xor.b32 %r17547, %r17545, 8; 2026-02-21T12:40:24.4334282Z add.s32 %r56, %r18409, %r17547; 2026-02-21T12:40:24.4334477Z and.b32 %r17551, %r21319, 6; 2026-02-21T12:40:24.4334672Z and.b32 %r17553, %r21320, 136; 2026-02-21T12:40:24.4334863Z or.b32 %r17554, %r21317, %r21318; 2026-02-21T12:40:24.4335050Z or.b32 %r17555, %r17554, %r17551; 2026-02-21T12:40:24.4335241Z or.b32 %r17556, %r17555, %r17553; 2026-02-21T12:40:24.4335430Z add.s32 %r57, %r18409, %r17556; 2026-02-21T12:40:24.4335620Z xor.b32 %r17557, %r17556, 8; 2026-02-21T12:40:24.4335804Z add.s32 %r58, %r18409, %r17557; 2026-02-21T12:40:24.4335995Z shr.u32 %r17559, %r21314, 1; 2026-02-21T12:40:24.4336184Z or.b32 %r17560, %r17559, %r21321; 2026-02-21T12:40:24.4336372Z or.b32 %r17561, %r17560, %r21312; 
2026-02-21T12:40:24.4336693Z or.b32 %r17562, %r17561, %r17553; 2026-02-21T12:40:24.4336891Z add.s32 %r59, %r18409, %r17562; 2026-02-21T12:40:24.4337084Z xor.b32 %r17563, %r17562, 8; 2026-02-21T12:40:24.4337264Z add.s32 %r60, %r18409, %r17563; 2026-02-21T12:40:24.4337461Z xor.b32 %r17564, %r17562, 32; 2026-02-21T12:40:24.4337647Z add.s32 %r61, %r18409, %r17564; 2026-02-21T12:40:24.4337837Z xor.b32 %r17565, %r17562, 40; 2026-02-21T12:40:24.4338021Z add.s32 %r62, %r18409, %r17565; 2026-02-21T12:40:24.4338208Z xor.b32 %r17566, %r17562, 64; 2026-02-21T12:40:24.4338389Z add.s32 %r63, %r18409, %r17566; 2026-02-21T12:40:24.4338570Z xor.b32 %r17567, %r17562, 72; 2026-02-21T12:40:24.4338753Z add.s32 %r64, %r18409, %r17567; 2026-02-21T12:40:24.4338934Z xor.b32 %r17568, %r17562, 96; 2026-02-21T12:40:24.4339116Z add.s32 %r65, %r18409, %r17568; 2026-02-21T12:40:24.4339310Z xor.b32 %r17569, %r17562, 104; 2026-02-21T12:40:24.4339504Z add.s32 %r66, %r18409, %r17569; 2026-02-21T12:40:24.4339686Z xor.b32 %r17570, %r17562, 4; 2026-02-21T12:40:24.4339868Z add.s32 %r67, %r18409, %r17570; 2026-02-21T12:40:24.4340055Z xor.b32 %r17571, %r17562, 12; 2026-02-21T12:40:24.4340231Z add.s32 %r68, %r18409, %r17571; 2026-02-21T12:40:24.4340424Z xor.b32 %r17572, %r17562, 36; 2026-02-21T12:40:24.4340602Z add.s32 %r69, %r18409, %r17572; 2026-02-21T12:40:24.4340790Z xor.b32 %r17573, %r17562, 44; 2026-02-21T12:40:24.4340966Z add.s32 %r70, %r18409, %r17573; 2026-02-21T12:40:24.4341155Z xor.b32 %r17574, %r17562, 68; 2026-02-21T12:40:24.4341340Z add.s32 %r71, %r18409, %r17574; 2026-02-21T12:40:24.4341528Z xor.b32 %r17575, %r17562, 76; 2026-02-21T12:40:24.4341709Z add.s32 %r72, %r18409, %r17575; 2026-02-21T12:40:24.4341903Z xor.b32 %r17576, %r17562, 100; 2026-02-21T12:40:24.4342094Z add.s32 %r73, %r18409, %r17576; 2026-02-21T12:40:24.4342278Z xor.b32 %r17577, %r17562, 108; 2026-02-21T12:40:24.4342462Z add.s32 %r74, %r18409, %r17577; 2026-02-21T12:40:24.4342658Z and.b32 %r17579, %r21315, 1028; 
2026-02-21T12:40:24.4342846Z mul.lo.s32 %r17580, %r21322, 144; 2026-02-21T12:40:24.4343130Z xor.b32 %r17581, %r17580, %r6; 2026-02-21T12:40:24.4343313Z or.b32 %r17582, %r17579, %r17581; 2026-02-21T12:40:24.4343507Z or.b32 %r17583, %r17582, %r8; 2026-02-21T12:40:24.4343754Z add.s32 %r75, %r18409, %r17583; 2026-02-21T12:40:24.4343960Z xor.b32 %r17584, %r17583, 4; 2026-02-21T12:40:24.4344149Z add.s32 %r76, %r18409, %r17584; 2026-02-21T12:40:24.4344328Z xor.b32 %r17585, %r17583, 136; 2026-02-21T12:40:24.4344514Z add.s32 %r77, %r18409, %r17585; 2026-02-21T12:40:24.4344698Z xor.b32 %r17586, %r17583, 140; 2026-02-21T12:40:24.4344886Z add.s32 %r78, %r18409, %r17586; 2026-02-21T12:40:24.4345067Z and.b32 %r17588, %r21323, 8128; 2026-02-21T12:40:24.4345252Z shl.b32 %r17589, %r21322, 3; 2026-02-21T12:40:24.4345427Z or.b32 %r17590, %r17588, %r17589; 2026-02-21T12:40:24.4345617Z add.s32 %r79, %r18409, %r17590; 2026-02-21T12:40:24.4345798Z xor.b32 %r17591, %r17590, 16; 2026-02-21T12:40:24.4345982Z add.s32 %r80, %r18409, %r17591; 2026-02-21T12:40:24.4346165Z xor.b32 %r17592, %r17590, 32; 2026-02-21T12:40:24.4346347Z add.s32 %r81, %r18409, %r17592; 2026-02-21T12:40:24.4346657Z xor.b32 %r17593, %r17590, 48; 2026-02-21T12:40:24.4346969Z add.s32 %r82, %r18409, %r17593; 2026-02-21T12:40:24.4347161Z bfe.u32 %r17594, %r18409, 4, 14; 2026-02-21T12:40:24.4347350Z cvt.u64.u32 %rd520, %r17594; 2026-02-21T12:40:24.4347559Z or.b64 %rd30, %rd520, -9223371899348713472; 2026-02-21T12:40:24.4347779Z add.s32 %r17595, %r18409, 32; 2026-02-21T12:40:24.4347968Z bfe.u32 %r17596, %r17595, 4, 14; 2026-02-21T12:40:24.4348159Z cvt.u64.u32 %rd521, %r17596; 2026-02-21T12:40:24.4348440Z or.b64 %rd31, %rd521, -9223371899348713472; 2026-02-21T12:40:24.4348673Z and.b32 %r17600, %r21326, 2064; 2026-02-21T12:40:24.4348860Z or.b32 %r17601, %r21325, %r17600; 2026-02-21T12:40:24.4349078Z mad.lo.s32 %r17602, %r21324, 4128, %r17601; 2026-02-21T12:40:24.4349291Z add.s32 %r83, %r18409, %r17602; 
2026-02-21T12:40:24.4349483Z xor.b32 %r17603, %r17602, 16; 2026-02-21T12:40:24.4349666Z add.s32 %r84, %r18409, %r17603; 2026-02-21T12:40:24.4349854Z xor.b32 %r17604, %r17602, 32; 2026-02-21T12:40:24.4350036Z add.s32 %r85, %r18409, %r17604; 2026-02-21T12:40:24.4350224Z xor.b32 %r17605, %r17602, 48; 2026-02-21T12:40:24.4350406Z add.s32 %r86, %r18409, %r17605; 2026-02-21T12:40:24.4350591Z xor.b32 %r17606, %r17602, 64; 2026-02-21T12:40:24.4350771Z add.s32 %r87, %r18409, %r17606; 2026-02-21T12:40:24.4350953Z xor.b32 %r17607, %r17602, 80; 2026-02-21T12:40:24.4351147Z add.s32 %r88, %r18409, %r17607; 2026-02-21T12:40:24.4351333Z xor.b32 %r17608, %r17602, 96; 2026-02-21T12:40:24.4351519Z add.s32 %r89, %r18409, %r17608; 2026-02-21T12:40:24.4351704Z xor.b32 %r17609, %r17602, 112; 2026-02-21T12:40:24.4351894Z add.s32 %r90, %r18409, %r17609; 2026-02-21T12:40:24.4352084Z shl.b32 %r17611, %r21327, 9; 2026-02-21T12:40:24.4352261Z shl.b32 %r17612, %r21327, 2; 2026-02-21T12:40:24.4352447Z and.b32 %r17614, %r21328, 2064; 2026-02-21T12:40:24.4352629Z and.b32 %r17615, %r21319, 128; 2026-02-21T12:40:24.4352823Z or.b32 %r17616, %r17611, %r21321; 2026-02-21T12:40:24.4353014Z or.b32 %r17617, %r17614, %r17612; 2026-02-21T12:40:24.4353211Z xor.b32 %r17618, %r17617, %r17616; 2026-02-21T12:40:24.4353412Z add.s32 %r17619, %r18409, %r17615; 2026-02-21T12:40:24.4353608Z add.s32 %r21108, %r17619, %r17618; 2026-02-21T12:40:24.4353806Z add.s32 %r21113, %r21108, 256; 2026-02-21T12:40:24.4353991Z add.s32 %r21118, %r21108, 512; 2026-02-21T12:40:24.4354184Z add.s32 %r21123, %r21108, 768; 2026-02-21T12:40:24.4354366Z add.s32 %r21128, %r21108, 1024; 2026-02-21T12:40:24.4354556Z add.s32 %r21133, %r21108, 1280; 2026-02-21T12:40:24.4354743Z add.s32 %r21138, %r21108, 1536; 2026-02-21T12:40:24.4354933Z add.s32 %r21143, %r21108, 1792; 2026-02-21T12:40:24.4355277Z .loc 1 22 120 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:22:120 2026-02-21T12:40:24.4355669Z mad.wide.u32 %rd522, %r21329, 16, %rd160; 
2026-02-21T12:40:24.4355896Z add.s64 %rd32, %rd522, 64; 2026-02-21T12:40:24.4356185Z or.b64 %rd524, %rd611, %rd21; 2026-02-21T12:40:24.4356372Z add.s64 %rd33, %rd161, %rd524; 2026-02-21T12:40:24.4356673Z add.s64 %rd34, %rd522, 16320; 2026-02-21T12:40:24.4356936Z add.s64 %rd35, %rd33, 5222400; 2026-02-21T12:40:24.4357172Z $L__BB0_6: // =>This Loop Header: Depth=1 2026-02-21T12:40:24.4357487Z // Child Loop BB0_7 Depth 2 2026-02-21T12:40:24.4357753Z // Child Loop BB0_9 Depth 2 2026-02-21T12:40:24.4358136Z .loc 1 28 35 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:28:35 2026-02-21T12:40:24.4358525Z mul.hi.u64 %rd526, %rd641, -3689348814741910323; 2026-02-21T12:40:24.4358756Z shr.u64 %rd527, %rd526, 5; 2026-02-21T12:40:24.4359076Z .loc 1 31 45 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:31:45 2026-02-21T12:40:24.4359429Z mul.lo.s64 %rd528, %rd527, 40; 2026-02-21T12:40:24.4359627Z sub.s64 %rd529, %rd641, %rd528; 2026-02-21T12:40:24.4360018Z .loc 1 33 27 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:33:27 2026-02-21T12:40:24.4360434Z shl.b64 %rd530, %rd527, 9; 2026-02-21T12:40:24.4360618Z shl.b64 %rd531, %rd529, 6; 2026-02-21T12:40:24.4360795Z and.b64 %rd532, %rd531, 448; 2026-02-21T12:40:24.4360986Z or.b64 %rd140, %rd532, %rd530; 2026-02-21T12:40:24.4361306Z .loc 1 35 27 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:35:27 2026-02-21T12:40:24.4361664Z shl.b64 %rd533, %rd529, 5; 2026-02-21T12:40:24.4361855Z and.b64 %rd141, %rd533, 1792; 2026-02-21T12:40:24.4362188Z .loc 1 43 126 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:43:126 2026-02-21T12:40:24.4362550Z or.b64 %rd534, %rd4, %rd530; 2026-02-21T12:40:24.4362735Z or.b64 %rd535, %rd534, %rd532; 2026-02-21T12:40:24.4362925Z shl.b64 %rd142, %rd535, 14; 2026-02-21T12:40:24.4363111Z add.s64 %rd643, %rd32, %rd142; 2026-02-21T12:40:24.4363301Z add.s64 %rd642, %rd33, %rd141; 2026-02-21T12:40:24.4363482Z mov.b32 %r22354, 0f00000000; 2026-02-21T12:40:24.4363667Z 
mov.b64 %rd644, -24; 2026-02-21T12:40:24.4363837Z mov.b32 %r22355, %r22354; 2026-02-21T12:40:24.4364013Z mov.b32 %r22356, %r22354; 2026-02-21T12:40:24.4364184Z mov.b32 %r22357, %r22354; 2026-02-21T12:40:24.4364355Z mov.b32 %r22358, %r22354; 2026-02-21T12:40:24.4364529Z mov.b32 %r22359, %r22354; 2026-02-21T12:40:24.4364699Z mov.b32 %r22360, %r22354; 2026-02-21T12:40:24.4364887Z mov.b32 %r22361, %r22354; 2026-02-21T12:40:24.4365056Z mov.b32 %r22362, %r22354; 2026-02-21T12:40:24.4365226Z mov.b32 %r22363, %r22354; 2026-02-21T12:40:24.4365391Z mov.b32 %r22364, %r22354; 2026-02-21T12:40:24.4365566Z mov.b32 %r22365, %r22354; 2026-02-21T12:40:24.4365731Z mov.b32 %r22366, %r22354; 2026-02-21T12:40:24.4365906Z mov.b32 %r22367, %r22354; 2026-02-21T12:40:24.4366079Z mov.b32 %r22368, %r22354; 2026-02-21T12:40:24.4366257Z mov.b32 %r22369, %r22354; 2026-02-21T12:40:24.4366425Z mov.b32 %r22370, %r22354; 2026-02-21T12:40:24.4366717Z mov.b32 %r22371, %r22354; 2026-02-21T12:40:24.4366898Z mov.b32 %r22372, %r22354; 2026-02-21T12:40:24.4367063Z mov.b32 %r22373, %r22354; 2026-02-21T12:40:24.4367234Z mov.b32 %r22374, %r22354; 2026-02-21T12:40:24.4367413Z mov.b32 %r22375, %r22354; 2026-02-21T12:40:24.4367583Z mov.b32 %r22376, %r22354; 2026-02-21T12:40:24.4367750Z mov.b32 %r22377, %r22354; 2026-02-21T12:40:24.4367920Z mov.b32 %r22378, %r22354; 2026-02-21T12:40:24.4368084Z mov.b32 %r22379, %r22354; 2026-02-21T12:40:24.4368256Z mov.b32 %r22380, %r22354; 2026-02-21T12:40:24.4368425Z mov.b32 %r22381, %r22354; 2026-02-21T12:40:24.4368588Z mov.b32 %r22382, %r22354; 2026-02-21T12:40:24.4368755Z mov.b32 %r22383, %r22354; 2026-02-21T12:40:24.4368919Z mov.b32 %r22384, %r22354; 2026-02-21T12:40:24.4369092Z mov.b32 %r22385, %r22354; 2026-02-21T12:40:24.4369256Z mov.b32 %r22386, %r22354; 2026-02-21T12:40:24.4369517Z mov.b32 %r22387, %r22354; 2026-02-21T12:40:24.4369680Z mov.b32 %r22388, %r22354; 2026-02-21T12:40:24.4369855Z mov.b32 %r22389, %r22354; 2026-02-21T12:40:24.4370086Z mov.b32 %r22390, 
%r22354; 2026-02-21T12:40:24.4370260Z mov.b32 %r22391, %r22354; 2026-02-21T12:40:24.4370427Z mov.b32 %r22392, %r22354; 2026-02-21T12:40:24.4370592Z mov.b32 %r22393, %r22354; 2026-02-21T12:40:24.4370762Z mov.b32 %r22394, %r22354; 2026-02-21T12:40:24.4370927Z mov.b32 %r22395, %r22354; 2026-02-21T12:40:24.4371099Z mov.b32 %r22396, %r22354; 2026-02-21T12:40:24.4371263Z mov.b32 %r22397, %r22354; 2026-02-21T12:40:24.4371434Z mov.b32 %r22398, %r22354; 2026-02-21T12:40:24.4371596Z mov.b32 %r22399, %r22354; 2026-02-21T12:40:24.4371769Z mov.b32 %r22400, %r22354; 2026-02-21T12:40:24.4371931Z mov.b32 %r22401, %r22354; 2026-02-21T12:40:24.4372098Z mov.b32 %r22402, %r22354; 2026-02-21T12:40:24.4372263Z mov.b32 %r22403, %r22354; 2026-02-21T12:40:24.4372427Z mov.b32 %r22404, %r22354; 2026-02-21T12:40:24.4372599Z mov.b32 %r22405, %r22354; 2026-02-21T12:40:24.4372761Z mov.b32 %r22406, %r22354; 2026-02-21T12:40:24.4372944Z mov.b32 %r22407, %r22354; 2026-02-21T12:40:24.4373239Z mov.b32 %r22408, %r22354; 2026-02-21T12:40:24.4373411Z mov.b32 %r22409, %r22354; 2026-02-21T12:40:24.4373573Z mov.b32 %r22410, %r22354; 2026-02-21T12:40:24.4373743Z mov.b32 %r22411, %r22354; 2026-02-21T12:40:24.4373908Z mov.b32 %r22412, %r22354; 2026-02-21T12:40:24.4374076Z mov.b32 %r22413, %r22354; 2026-02-21T12:40:24.4374244Z mov.b32 %r22414, %r22354; 2026-02-21T12:40:24.4374407Z mov.b32 %r22415, %r22354; 2026-02-21T12:40:24.4374577Z mov.b32 %r22416, %r22354; 2026-02-21T12:40:24.4374741Z mov.b32 %r22417, %r22354; 2026-02-21T12:40:24.4374914Z mov.b32 %r22418, %r22354; 2026-02-21T12:40:24.4375082Z mov.b32 %r22419, %r22354; 2026-02-21T12:40:24.4375251Z mov.b32 %r22420, %r22354; 2026-02-21T12:40:24.4375419Z mov.b32 %r22421, %r22354; 2026-02-21T12:40:24.4375598Z mov.b32 %r22422, %r22354; 2026-02-21T12:40:24.4375772Z mov.b32 %r22423, %r22354; 2026-02-21T12:40:24.4375936Z mov.b32 %r22424, %r22354; 2026-02-21T12:40:24.4376102Z mov.b32 %r22425, %r22354; 2026-02-21T12:40:24.4376271Z mov.b32 %r22426, %r22354; 
2026-02-21T12:40:24.4376437Z mov.b32 %r22427, %r22354; 2026-02-21T12:40:24.4376723Z mov.b32 %r22428, %r22354; 2026-02-21T12:40:24.4376893Z mov.b32 %r22429, %r22354; 2026-02-21T12:40:24.4377057Z mov.b32 %r22430, %r22354; 2026-02-21T12:40:24.4377223Z mov.b32 %r22431, %r22354; 2026-02-21T12:40:24.4377389Z mov.b32 %r22432, %r22354; 2026-02-21T12:40:24.4377560Z mov.b32 %r22433, %r22354; 2026-02-21T12:40:24.4377732Z mov.b32 %r22434, %r22354; 2026-02-21T12:40:24.4377907Z mov.b32 %r22435, %r22354; 2026-02-21T12:40:24.4378074Z mov.b32 %r22436, %r22354; 2026-02-21T12:40:24.4378239Z mov.b32 %r22437, %r22354; 2026-02-21T12:40:24.4378405Z mov.b32 %r22438, %r22354; 2026-02-21T12:40:24.4378565Z mov.b32 %r22439, %r22354; 2026-02-21T12:40:24.4378736Z mov.b32 %r22440, %r22354; 2026-02-21T12:40:24.4378905Z mov.b32 %r22441, %r22354; 2026-02-21T12:40:24.4379072Z mov.b32 %r22442, %r22354; 2026-02-21T12:40:24.4379235Z mov.b32 %r22443, %r22354; 2026-02-21T12:40:24.4379409Z mov.b32 %r22444, %r22354; 2026-02-21T12:40:24.4379580Z mov.b32 %r22445, %r22354; 2026-02-21T12:40:24.4379741Z mov.b32 %r22446, %r22354; 2026-02-21T12:40:24.4379906Z mov.b32 %r22447, %r22354; 2026-02-21T12:40:24.4380070Z mov.b32 %r22448, %r22354; 2026-02-21T12:40:24.4380129Z mov.b32 %r22449, %r22354; 2026-02-21T12:40:24.4380203Z mov.b32 %r22450, %r22354; 2026-02-21T12:40:24.4380267Z mov.b32 %r22451, %r22354; 2026-02-21T12:40:24.4380328Z mov.b32 %r22452, %r22354; 2026-02-21T12:40:24.4380393Z mov.b32 %r22453, %r22354; 2026-02-21T12:40:24.4380451Z mov.b32 %r22454, %r22354; 2026-02-21T12:40:24.4380510Z mov.b32 %r22455, %r22354; 2026-02-21T12:40:24.4380570Z mov.b32 %r22456, %r22354; 2026-02-21T12:40:24.4380634Z mov.b32 %r22457, %r22354; 2026-02-21T12:40:24.4380693Z mov.b32 %r22458, %r22354; 2026-02-21T12:40:24.4380843Z mov.b32 %r22459, %r22354; 2026-02-21T12:40:24.4380908Z mov.b32 %r22460, %r22354; 2026-02-21T12:40:24.4380966Z mov.b32 %r22461, %r22354; 2026-02-21T12:40:24.4381119Z mov.b32 %r22462, %r22354; 
2026-02-21T12:40:24.4381180Z mov.b32 %r22463, %r22354; 2026-02-21T12:40:24.4381249Z mov.b32 %r22464, %r22354; 2026-02-21T12:40:24.4381308Z mov.b32 %r22465, %r22354; 2026-02-21T12:40:24.4381367Z mov.b32 %r22466, %r22354; 2026-02-21T12:40:24.4381430Z mov.b32 %r22467, %r22354; 2026-02-21T12:40:24.4381489Z mov.b32 %r22468, %r22354; 2026-02-21T12:40:24.4381549Z mov.b32 %r22469, %r22354; 2026-02-21T12:40:24.4381619Z mov.b32 %r22470, %r22354; 2026-02-21T12:40:24.4381680Z mov.b32 %r22471, %r22354; 2026-02-21T12:40:24.4381742Z mov.b32 %r22472, %r22354; 2026-02-21T12:40:24.4381806Z mov.b32 %r22473, %r22354; 2026-02-21T12:40:24.4381875Z mov.b32 %r22474, %r22354; 2026-02-21T12:40:24.4381936Z mov.b32 %r22475, %r22354; 2026-02-21T12:40:24.4381997Z mov.b32 %r22476, %r22354; 2026-02-21T12:40:24.4382065Z mov.b32 %r22477, %r22354; 2026-02-21T12:40:24.4382125Z mov.b32 %r22478, %r22354; 2026-02-21T12:40:24.4382186Z mov.b32 %r22479, %r22354; 2026-02-21T12:40:24.4382374Z mov.b32 %r22480, %r22354; 2026-02-21T12:40:24.4382444Z mov.b32 %r22481, %r22354; 2026-02-21T12:40:24.4382563Z $L__BB0_7: // Parent Loop BB0_6 Depth=1 2026-02-21T12:40:24.4382674Z // => This Inner Loop Header: Depth=2 2026-02-21T12:40:24.4382889Z .loc 1 51 80 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:51:80 2026-02-21T12:40:24.4382960Z add.s64 %rd537, %rd643, -64; 2026-02-21T12:40:24.4383036Z // begin inline asm 2026-02-21T12:40:24.4383108Z mov.u64 %rd536, 0x0; 2026-02-21T12:40:24.4383257Z createpolicy.fractional.L2::evict_last.b64 %rd536, 1.0; 2026-02-21T12:40:24.4383320Z // end inline asm 2026-02-21T12:40:24.4383382Z // begin inline asm 2026-02-21T12:40:24.4383450Z mov.u32 %r17622, 0x0; 2026-02-21T12:40:24.4383515Z mov.u32 %r17623, 0x0; 2026-02-21T12:40:24.4383574Z mov.u32 %r17624, 0x0; 2026-02-21T12:40:24.4383642Z mov.u32 %r17625, 0x0; 2026-02-21T12:40:24.4383887Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r17622, %r17623, %r17624, %r17625 }, [ %rd537 + 0 ], %rd536; 2026-02-21T12:40:24.4383949Z 
// end inline asm 2026-02-21T12:40:24.4384152Z .loc 1 55 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:55:32 2026-02-21T12:40:24.4384217Z bar.sync 0; 2026-02-21T12:40:24.4384302Z st.shared.v2.b32 [%r55], {%r17622, %r17623}; 2026-02-21T12:40:24.4384379Z st.shared.v2.b32 [%r56], {%r17624, %r17625}; 2026-02-21T12:40:24.4384441Z bar.sync 0; 2026-02-21T12:40:24.4384511Z ld.shared.b16 %rs1665, [%r57]; 2026-02-21T12:40:24.4384592Z ld.shared.b16 %rs1666, [%r57+256]; 2026-02-21T12:40:24.4384669Z ld.shared.b16 %rs1667, [%r57+16]; 2026-02-21T12:40:24.4384737Z ld.shared.b16 %rs1668, [%r57+272]; 2026-02-21T12:40:24.4384802Z ld.shared.b16 %rs1669, [%r58]; 2026-02-21T12:40:24.4384870Z ld.shared.b16 %rs1670, [%r58+256]; 2026-02-21T12:40:24.4384942Z ld.shared.b16 %rs1671, [%r58+16]; 2026-02-21T12:40:24.4385009Z ld.shared.b16 %rs1672, [%r58+272]; 2026-02-21T12:40:24.4385077Z cvt.f32.bf16 %r17886, %rs1665; 2026-02-21T12:40:24.4385145Z cvt.f32.bf16 %r17887, %rs1666; 2026-02-21T12:40:24.4385208Z cvt.f32.bf16 %r17888, %rs1669; 2026-02-21T12:40:24.4385270Z cvt.f32.bf16 %r17889, %rs1670; 2026-02-21T12:40:24.4385332Z cvt.f32.bf16 %r18146, %rs1667; 2026-02-21T12:40:24.4385399Z cvt.f32.bf16 %r18147, %rs1668; 2026-02-21T12:40:24.4385461Z cvt.f32.bf16 %r18148, %rs1671; 2026-02-21T12:40:24.4385523Z cvt.f32.bf16 %r18149, %rs1672; 2026-02-21T12:40:24.4385728Z .loc 1 57 87 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:57:87 2026-02-21T12:40:24.4385789Z // begin inline asm 2026-02-21T12:40:24.4385848Z mov.u32 %r17626, 0x0; 2026-02-21T12:40:24.4385914Z mov.u32 %r17627, 0x0; 2026-02-21T12:40:24.4385976Z mov.u32 %r17628, 0x0; 2026-02-21T12:40:24.4386099Z mov.u32 %r17629, 0x0; 2026-02-21T12:40:24.4386238Z ld.global.v4.b32 { %r17626, %r17627, %r17628, %r17629 }, [ %rd642 + 0 ]; 2026-02-21T12:40:24.4386353Z // end inline asm 2026-02-21T12:40:24.4386677Z .loc 1 65 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:65:28 2026-02-21T12:40:24.4386739Z bar.sync 0; 
2026-02-21T12:40:24.4386812Z st.shared.b8 [%r59], %r17626; 2026-02-21T12:40:24.4386886Z prmt.b32 %r19992, %r17626, 0, 0x7771U; 2026-02-21T12:40:24.4386952Z st.shared.b8 [%r60], %r19992; 2026-02-21T12:40:24.4387022Z prmt.b32 %r19993, %r17626, 0, 0x7772U; 2026-02-21T12:40:24.4387094Z st.shared.b8 [%r61+256], %r19993; 2026-02-21T12:40:24.4387164Z prmt.b32 %r19994, %r17626, 0, 0x7773U; 2026-02-21T12:40:24.4387230Z st.shared.b8 [%r62+256], %r19994; 2026-02-21T12:40:24.4387300Z st.shared.b8 [%r63+512], %r17627; 2026-02-21T12:40:24.4387367Z prmt.b32 %r19995, %r17627, 0, 0x7771U; 2026-02-21T12:40:24.4387431Z st.shared.b8 [%r64+512], %r19995; 2026-02-21T12:40:24.4387508Z prmt.b32 %r19996, %r17627, 0, 0x7772U; 2026-02-21T12:40:24.4387573Z st.shared.b8 [%r65+768], %r19996; 2026-02-21T12:40:24.4387768Z prmt.b32 %r19997, %r17627, 0, 0x7773U; 2026-02-21T12:40:24.4387838Z st.shared.b8 [%r66+768], %r19997; 2026-02-21T12:40:24.4387910Z st.shared.b8 [%r67+1024], %r17628; 2026-02-21T12:40:24.4387979Z prmt.b32 %r19998, %r17628, 0, 0x7771U; 2026-02-21T12:40:24.4388059Z st.shared.b8 [%r68+1024], %r19998; 2026-02-21T12:40:24.4388133Z prmt.b32 %r19999, %r17628, 0, 0x7772U; 2026-02-21T12:40:24.4388198Z st.shared.b8 [%r69+1280], %r19999; 2026-02-21T12:40:24.4388265Z prmt.b32 %r20000, %r17628, 0, 0x7773U; 2026-02-21T12:40:24.4388417Z st.shared.b8 [%r70+1280], %r20000; 2026-02-21T12:40:24.4388491Z st.shared.b8 [%r71+1536], %r17629; 2026-02-21T12:40:24.4388559Z prmt.b32 %r20001, %r17629, 0, 0x7771U; 2026-02-21T12:40:24.4388625Z st.shared.b8 [%r72+1536], %r20001; 2026-02-21T12:40:24.4388697Z prmt.b32 %r20002, %r17629, 0, 0x7772U; 2026-02-21T12:40:24.4388766Z st.shared.b8 [%r73+1792], %r20002; 2026-02-21T12:40:24.4388835Z prmt.b32 %r20003, %r17629, 0, 0x7773U; 2026-02-21T12:40:24.4388911Z st.shared.b8 [%r74+1792], %r20003; 2026-02-21T12:40:24.4388970Z bar.sync 0; 2026-02-21T12:40:24.4389039Z ld.shared.b32 %r20004, [%r75]; 2026-02-21T12:40:24.4389108Z prmt.b32 %r20005, %r20004, 0, 0x7770U; 
2026-02-21T12:40:24.4389178Z cvt.u16.u32 %rs1673, %r20005; 2026-02-21T12:40:24.4389244Z prmt.b32 %r20006, %r20004, 0, 0x7771U; 2026-02-21T12:40:24.4389308Z cvt.u16.u32 %rs1674, %r20006; 2026-02-21T12:40:24.4389380Z prmt.b32 %r20007, %r20004, 0, 0x7772U; 2026-02-21T12:40:24.4389443Z cvt.u16.u32 %rs1675, %r20007; 2026-02-21T12:40:24.4389510Z prmt.b32 %r20008, %r20004, 0, 0x7773U; 2026-02-21T12:40:24.4389573Z cvt.u16.u32 %rs1676, %r20008; 2026-02-21T12:40:24.4389646Z ld.shared.b32 %r20009, [%r76]; 2026-02-21T12:40:24.4389712Z prmt.b32 %r20010, %r20009, 0, 0x7770U; 2026-02-21T12:40:24.4389775Z cvt.u16.u32 %rs1677, %r20010; 2026-02-21T12:40:24.4389853Z prmt.b32 %r20011, %r20009, 0, 0x7771U; 2026-02-21T12:40:24.4389918Z cvt.u16.u32 %rs1678, %r20011; 2026-02-21T12:40:24.4389989Z prmt.b32 %r20012, %r20009, 0, 0x7772U; 2026-02-21T12:40:24.4390055Z cvt.u16.u32 %rs1679, %r20012; 2026-02-21T12:40:24.4390126Z prmt.b32 %r20013, %r20009, 0, 0x7773U; 2026-02-21T12:40:24.4390190Z cvt.u16.u32 %rs1680, %r20013; 2026-02-21T12:40:24.4390258Z ld.shared.b32 %r20014, [%r77]; 2026-02-21T12:40:24.4390332Z prmt.b32 %r20015, %r20014, 0, 0x7770U; 2026-02-21T12:40:24.4390400Z cvt.u16.u32 %rs1681, %r20015; 2026-02-21T12:40:24.4390466Z prmt.b32 %r20016, %r20014, 0, 0x7771U; 2026-02-21T12:40:24.4390534Z cvt.u16.u32 %rs1682, %r20016; 2026-02-21T12:40:24.4390602Z prmt.b32 %r20017, %r20014, 0, 0x7772U; 2026-02-21T12:40:24.4390667Z cvt.u16.u32 %rs1683, %r20017; 2026-02-21T12:40:24.4390734Z prmt.b32 %r20018, %r20014, 0, 0x7773U; 2026-02-21T12:40:24.4390804Z cvt.u16.u32 %rs1684, %r20018; 2026-02-21T12:40:24.4390962Z ld.shared.b32 %r20019, [%r78]; 2026-02-21T12:40:24.4391032Z prmt.b32 %r20020, %r20019, 0, 0x7770U; 2026-02-21T12:40:24.4391102Z cvt.u16.u32 %rs1685, %r20020; 2026-02-21T12:40:24.4391238Z prmt.b32 %r20021, %r20019, 0, 0x7771U; 2026-02-21T12:40:24.4391302Z cvt.u16.u32 %rs1686, %r20021; 2026-02-21T12:40:24.4391368Z prmt.b32 %r20022, %r20019, 0, 0x7772U; 2026-02-21T12:40:24.4391443Z cvt.u16.u32 
%rs1687, %r20022; 2026-02-21T12:40:24.4391510Z prmt.b32 %r20023, %r20019, 0, 0x7773U; 2026-02-21T12:40:24.4391573Z cvt.u16.u32 %rs1688, %r20023; 2026-02-21T12:40:24.4391782Z .loc 1 60 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:60:28 2026-02-21T12:40:24.4391851Z shl.b16 %rs1689, %rs1673, 4; 2026-02-21T12:40:24.4391914Z shl.b16 %rs1690, %rs1677, 4; 2026-02-21T12:40:24.4391981Z shl.b16 %rs1691, %rs1681, 4; 2026-02-21T12:40:24.4392045Z shl.b16 %rs1692, %rs1685, 4; 2026-02-21T12:40:24.4392108Z shl.b16 %rs1693, %rs1674, 4; 2026-02-21T12:40:24.4392170Z shl.b16 %rs1694, %rs1678, 4; 2026-02-21T12:40:24.4392243Z shl.b16 %rs1695, %rs1682, 4; 2026-02-21T12:40:24.4392305Z shl.b16 %rs1696, %rs1686, 4; 2026-02-21T12:40:24.4392464Z shl.b16 %rs1697, %rs1675, 4; 2026-02-21T12:40:24.4392534Z shl.b16 %rs1698, %rs1679, 4; 2026-02-21T12:40:24.4392597Z shl.b16 %rs1699, %rs1683, 4; 2026-02-21T12:40:24.4392658Z shl.b16 %rs1700, %rs1687, 4; 2026-02-21T12:40:24.4392720Z shl.b16 %rs1701, %rs1676, 4; 2026-02-21T12:40:24.4392789Z shl.b16 %rs1702, %rs1680, 4; 2026-02-21T12:40:24.4392851Z shl.b16 %rs1703, %rs1684, 4; 2026-02-21T12:40:24.4392912Z shl.b16 %rs1704, %rs1688, 4; 2026-02-21T12:40:24.4393113Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4393176Z cvt.s16.s8 %rs1705, %rs1689; 2026-02-21T12:40:24.4393237Z shr.s16 %rs1706, %rs1705, 4; 2026-02-21T12:40:24.4393307Z cvt.s16.s8 %rs1707, %rs1691; 2026-02-21T12:40:24.4393370Z shr.s16 %rs1708, %rs1707, 4; 2026-02-21T12:40:24.4393444Z prmt.b32 %r20024, %r20004, 0, 0x8880U; 2026-02-21T12:40:24.4393508Z cvt.u16.u32 %rs1709, %r20024; 2026-02-21T12:40:24.4393575Z shr.s16 %rs1710, %rs1709, 4; 2026-02-21T12:40:24.4393648Z prmt.b32 %r20025, %r20014, 0, 0x8880U; 2026-02-21T12:40:24.4393711Z cvt.u16.u32 %rs1711, %r20025; 2026-02-21T12:40:24.4393778Z shr.s16 %rs1712, %rs1711, 4; 2026-02-21T12:40:24.4393972Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 
2026-02-21T12:40:24.4394040Z cvt.rn.f32.s16 %r20026, %rs1712; 2026-02-21T12:40:24.4394107Z cvt.rn.f32.s16 %r20027, %rs1710; 2026-02-21T12:40:24.4394176Z cvt.rn.f32.s16 %r20028, %rs1708; 2026-02-21T12:40:24.4394242Z cvt.rn.f32.s16 %r20029, %rs1706; 2026-02-21T12:40:24.4394436Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4394513Z cvt.s16.s8 %rs1713, %rs1690; 2026-02-21T12:40:24.4394583Z shr.s16 %rs1714, %rs1713, 4; 2026-02-21T12:40:24.4394648Z cvt.s16.s8 %rs1715, %rs1692; 2026-02-21T12:40:24.4394715Z shr.s16 %rs1716, %rs1715, 4; 2026-02-21T12:40:24.4394784Z prmt.b32 %r20030, %r20009, 0, 0x8880U; 2026-02-21T12:40:24.4394853Z cvt.u16.u32 %rs1717, %r20030; 2026-02-21T12:40:24.4394916Z shr.s16 %rs1718, %rs1717, 4; 2026-02-21T12:40:24.4394989Z prmt.b32 %r20031, %r20019, 0, 0x8880U; 2026-02-21T12:40:24.4395053Z cvt.u16.u32 %rs1719, %r20031; 2026-02-21T12:40:24.4395115Z shr.s16 %rs1720, %rs1719, 4; 2026-02-21T12:40:24.4395314Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4395380Z cvt.rn.f32.s16 %r20032, %rs1720; 2026-02-21T12:40:24.4395457Z cvt.rn.f32.s16 %r20033, %rs1718; 2026-02-21T12:40:24.4395524Z cvt.rn.f32.s16 %r20034, %rs1716; 2026-02-21T12:40:24.4395600Z cvt.rn.f32.s16 %r20035, %rs1714; 2026-02-21T12:40:24.4395795Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4395921Z cvt.s16.s8 %rs1721, %rs1693; 2026-02-21T12:40:24.4395990Z shr.s16 %rs1722, %rs1721, 4; 2026-02-21T12:40:24.4396053Z cvt.s16.s8 %rs1723, %rs1695; 2026-02-21T12:40:24.4396174Z shr.s16 %rs1724, %rs1723, 4; 2026-02-21T12:40:24.4396247Z prmt.b32 %r20036, %r20004, 0, 0x9991U; 2026-02-21T12:40:24.4396311Z cvt.u16.u32 %rs1725, %r20036; 2026-02-21T12:40:24.4396373Z shr.s16 %rs1726, %rs1725, 4; 2026-02-21T12:40:24.4396442Z prmt.b32 %r20037, %r20014, 0, 0x9991U; 2026-02-21T12:40:24.4396632Z cvt.u16.u32 %rs1727, %r20037; 2026-02-21T12:40:24.4396700Z shr.s16 
%rs1728, %rs1727, 4; 2026-02-21T12:40:24.4396896Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4396967Z cvt.rn.f32.s16 %r20038, %rs1728; 2026-02-21T12:40:24.4397032Z cvt.rn.f32.s16 %r20039, %rs1726; 2026-02-21T12:40:24.4397097Z cvt.rn.f32.s16 %r20040, %rs1724; 2026-02-21T12:40:24.4397166Z cvt.rn.f32.s16 %r20041, %rs1722; 2026-02-21T12:40:24.4397361Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4397427Z cvt.s16.s8 %rs1729, %rs1694; 2026-02-21T12:40:24.4397625Z shr.s16 %rs1730, %rs1729, 4; 2026-02-21T12:40:24.4397704Z cvt.s16.s8 %rs1731, %rs1696; 2026-02-21T12:40:24.4397770Z shr.s16 %rs1732, %rs1731, 4; 2026-02-21T12:40:24.4397841Z prmt.b32 %r20042, %r20009, 0, 0x9991U; 2026-02-21T12:40:24.4397912Z cvt.u16.u32 %rs1733, %r20042; 2026-02-21T12:40:24.4397973Z shr.s16 %rs1734, %rs1733, 4; 2026-02-21T12:40:24.4398042Z prmt.b32 %r20043, %r20019, 0, 0x9991U; 2026-02-21T12:40:24.4398106Z cvt.u16.u32 %rs1735, %r20043; 2026-02-21T12:40:24.4398180Z shr.s16 %rs1736, %rs1735, 4; 2026-02-21T12:40:24.4398387Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4398454Z cvt.rn.f32.s16 %r20044, %rs1736; 2026-02-21T12:40:24.4398523Z cvt.rn.f32.s16 %r20045, %rs1734; 2026-02-21T12:40:24.4398587Z cvt.rn.f32.s16 %r20046, %rs1732; 2026-02-21T12:40:24.4398656Z cvt.rn.f32.s16 %r20047, %rs1730; 2026-02-21T12:40:24.4398861Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4398930Z cvt.s16.s8 %rs1737, %rs1697; 2026-02-21T12:40:24.4398994Z shr.s16 %rs1738, %rs1737, 4; 2026-02-21T12:40:24.4399056Z cvt.s16.s8 %rs1739, %rs1699; 2026-02-21T12:40:24.4399125Z shr.s16 %rs1740, %rs1739, 4; 2026-02-21T12:40:24.4399199Z prmt.b32 %r20048, %r20004, 0, 0xaaa2U; 2026-02-21T12:40:24.4399264Z cvt.u16.u32 %rs1741, %r20048; 2026-02-21T12:40:24.4399337Z shr.s16 %rs1742, %rs1741, 4; 2026-02-21T12:40:24.4399406Z 
prmt.b32 %r20049, %r20014, 0, 0xaaa2U; 2026-02-21T12:40:24.4399471Z cvt.u16.u32 %rs1743, %r20049; 2026-02-21T12:40:24.4399535Z shr.s16 %rs1744, %rs1743, 4; 2026-02-21T12:40:24.4399742Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4399808Z cvt.rn.f32.s16 %r20050, %rs1744; 2026-02-21T12:40:24.4399876Z cvt.rn.f32.s16 %r20051, %rs1742; 2026-02-21T12:40:24.4399950Z cvt.rn.f32.s16 %r20052, %rs1740; 2026-02-21T12:40:24.4400016Z cvt.rn.f32.s16 %r20053, %rs1738; 2026-02-21T12:40:24.4400214Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4400283Z cvt.s16.s8 %rs1745, %rs1698; 2026-02-21T12:40:24.4400344Z shr.s16 %rs1746, %rs1745, 4; 2026-02-21T12:40:24.4400406Z cvt.s16.s8 %rs1747, %rs1700; 2026-02-21T12:40:24.4400469Z shr.s16 %rs1748, %rs1747, 4; 2026-02-21T12:40:24.4400545Z prmt.b32 %r20054, %r20009, 0, 0xaaa2U; 2026-02-21T12:40:24.4400608Z cvt.u16.u32 %rs1749, %r20054; 2026-02-21T12:40:24.4400671Z shr.s16 %rs1750, %rs1749, 4; 2026-02-21T12:40:24.4400743Z prmt.b32 %r20055, %r20019, 0, 0xaaa2U; 2026-02-21T12:40:24.4400807Z cvt.u16.u32 %rs1751, %r20055; 2026-02-21T12:40:24.4400869Z shr.s16 %rs1752, %rs1751, 4; 2026-02-21T12:40:24.4401067Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4401235Z cvt.rn.f32.s16 %r20056, %rs1752; 2026-02-21T12:40:24.4401303Z cvt.rn.f32.s16 %r20057, %rs1750; 2026-02-21T12:40:24.4401437Z cvt.rn.f32.s16 %r20058, %rs1748; 2026-02-21T12:40:24.4401507Z cvt.rn.f32.s16 %r20059, %rs1746; 2026-02-21T12:40:24.4401702Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4401766Z cvt.s16.s8 %rs1753, %rs1701; 2026-02-21T12:40:24.4401833Z shr.s16 %rs1754, %rs1753, 4; 2026-02-21T12:40:24.4401894Z cvt.s16.s8 %rs1755, %rs1703; 2026-02-21T12:40:24.4401956Z shr.s16 %rs1756, %rs1755, 4; 2026-02-21T12:40:24.4402027Z prmt.b32 %r20060, %r20004, 0, 0xbbb3U; 
2026-02-21T12:40:24.4402090Z cvt.u16.u32 %rs1757, %r20060; 2026-02-21T12:40:24.4402166Z shr.s16 %rs1758, %rs1757, 4; 2026-02-21T12:40:24.4402235Z prmt.b32 %r20061, %r20014, 0, 0xbbb3U; 2026-02-21T12:40:24.4402303Z cvt.u16.u32 %rs1759, %r20061; 2026-02-21T12:40:24.4402371Z shr.s16 %rs1760, %rs1759, 4; 2026-02-21T12:40:24.4402566Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4402732Z cvt.rn.f32.s16 %r20062, %rs1760; 2026-02-21T12:40:24.4402799Z cvt.rn.f32.s16 %r20063, %rs1758; 2026-02-21T12:40:24.4402861Z cvt.rn.f32.s16 %r20064, %rs1756; 2026-02-21T12:40:24.4402926Z cvt.rn.f32.s16 %r20065, %rs1754; 2026-02-21T12:40:24.4403124Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4403191Z cvt.s16.s8 %rs1761, %rs1702; 2026-02-21T12:40:24.4403254Z shr.s16 %rs1762, %rs1761, 4; 2026-02-21T12:40:24.4403322Z cvt.s16.s8 %rs1763, %rs1704; 2026-02-21T12:40:24.4403385Z shr.s16 %rs1764, %rs1763, 4; 2026-02-21T12:40:24.4403465Z prmt.b32 %r20066, %r20009, 0, 0xbbb3U; 2026-02-21T12:40:24.4403536Z cvt.u16.u32 %rs1765, %r20066; 2026-02-21T12:40:24.4403600Z shr.s16 %rs1766, %rs1765, 4; 2026-02-21T12:40:24.4403668Z prmt.b32 %r20067, %r20019, 0, 0xbbb3U; 2026-02-21T12:40:24.4403734Z cvt.u16.u32 %rs1767, %r20067; 2026-02-21T12:40:24.4403804Z shr.s16 %rs1768, %rs1767, 4; 2026-02-21T12:40:24.4404005Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4404070Z cvt.rn.f32.s16 %r20068, %rs1768; 2026-02-21T12:40:24.4404142Z cvt.rn.f32.s16 %r20069, %rs1766; 2026-02-21T12:40:24.4404206Z cvt.rn.f32.s16 %r20070, %rs1764; 2026-02-21T12:40:24.4404270Z cvt.rn.f32.s16 %r20071, %rs1762; 2026-02-21T12:40:24.4404327Z bar.sync 0; 2026-02-21T12:40:24.4404454Z st.shared.v4.b32 [%r79], {%r20029, %r20027, %r20028, %r20026}; 2026-02-21T12:40:24.4404584Z st.shared.v4.b32 [%r79+8192], {%r20035, %r20033, %r20034, %r20032}; 2026-02-21T12:40:24.4404699Z st.shared.v4.b32 
[%r80], {%r20041, %r20039, %r20040, %r20038}; 2026-02-21T12:40:24.4404825Z st.shared.v4.b32 [%r80+8192], {%r20047, %r20045, %r20046, %r20044}; 2026-02-21T12:40:24.4404937Z st.shared.v4.b32 [%r81], {%r20053, %r20051, %r20052, %r20050}; 2026-02-21T12:40:24.4405057Z st.shared.v4.b32 [%r81+8192], {%r20059, %r20057, %r20058, %r20056}; 2026-02-21T12:40:24.4405176Z st.shared.v4.b32 [%r82], {%r20065, %r20063, %r20064, %r20062}; 2026-02-21T12:40:24.4405294Z st.shared.v4.b32 [%r82+8192], {%r20071, %r20069, %r20070, %r20068}; 2026-02-21T12:40:24.4405354Z $L__tmp33: 2026-02-21T12:40:24.4405636Z .loc 2 291 36 // standard.py:291:36 @[ cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:87:40 ] 2026-02-21T12:40:24.4405700Z // begin inline asm 2026-02-21T12:40:24.4405782Z fence.proxy.async.shared::cta; 2026-02-21T12:40:24.4405841Z // end inline asm 2026-02-21T12:40:24.4405904Z bar.sync 0; 2026-02-21T12:40:24.4405999Z shfl.sync.idx.b32 %r20072, %r2, 0, 31, -1; 2026-02-21T12:40:24.4406074Z wgmma.fence.sync.aligned; 2026-02-21T12:40:24.4406146Z mov.pred %p48, -1; 2026-02-21T12:40:24.4406207Z // begin inline asm 2026-02-21T12:40:24.4409064Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22354,%r22355,%r22356,%r22357,%r22358,%r22359,%r22360,%r22361,%r22362,%r22363,%r22364,%r22365,%r22366,%r22367,%r22368,%r22369,%r22370,%r22371,%r22372,%r22373,%r22374,%r22375,%r22376,%r22377,%r22378,%r22379,%r22380,%r22381,%r22382,%r22383,%r22384,%r22385,%r22386,%r22387,%r22388,%r22389,%r22390,%r22391,%r22392,%r22393,%r22394,%r22395,%r22396,%r22397,%r22398,%r22399,%r22400,%r22401,%r22402,%r22403,%r22404,%r22405,%r22406,%r22407,%r22408,%r22409,%r22410,%r22411,%r22412,%r22413,%r22414,%r22415,%r22416,%r22417,%r22418,%r22419,%r22420,%r22421,%r22422,%r22423,%r22424,%r22425,%r22426,%r22427,%r22428,%r22429,%r22430,%r22431,%r22432,%r22433,%r22434,%r22435,%r22436,%r22437,%r22438,%r22439,%r22440,%r22441,%r22442,%r22443,%r22444,%r22445,%r22446,%r22447,%r22448,%r22449,%r22450,%r22451,%r22452,%r22453,%r22454,%r22455,%r22456,%r22457,%r22458,%r22459,%r22460,%r22461,%r22462,%r22463,%r22464,%r22465,%r22466,%r22467,%r22468,%r22469,%r22470,%r22471,%r22472,%r22473,%r22474,%r22475,%r22476,%r22477,%r22478,%r22479,%r22480,%r22481}, {%r17886,%r17887,%r17888,%r17889}, %rd30, %p48, 1, 1; 2026-02-21T12:40:24.4409285Z // end inline asm 2026-02-21T12:40:24.4409347Z // begin inline asm 2026-02-21T12:40:24.4412176Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22354,%r22355,%r22356,%r22357,%r22358,%r22359,%r22360,%r22361,%r22362,%r22363,%r22364,%r22365,%r22366,%r22367,%r22368,%r22369,%r22370,%r22371,%r22372,%r22373,%r22374,%r22375,%r22376,%r22377,%r22378,%r22379,%r22380,%r22381,%r22382,%r22383,%r22384,%r22385,%r22386,%r22387,%r22388,%r22389,%r22390,%r22391,%r22392,%r22393,%r22394,%r22395,%r22396,%r22397,%r22398,%r22399,%r22400,%r22401,%r22402,%r22403,%r22404,%r22405,%r22406,%r22407,%r22408,%r22409,%r22410,%r22411,%r22412,%r22413,%r22414,%r22415,%r22416,%r22417,%r22418,%r22419,%r22420,%r22421,%r22422,%r22423,%r22424,%r22425,%r22426,%r22427,%r22428,%r22429,%r22430,%r22431,%r22432,%r22433,%r22434,%r22435,%r22436,%r22437,%r22438,%r22439,%r22440,%r22441,%r22442,%r22443,%r22444,%r22445,%r22446,%r22447,%r22448,%r22449,%r22450,%r22451,%r22452,%r22453,%r22454,%r22455,%r22456,%r22457,%r22458,%r22459,%r22460,%r22461,%r22462,%r22463,%r22464,%r22465,%r22466,%r22467,%r22468,%r22469,%r22470,%r22471,%r22472,%r22473,%r22474,%r22475,%r22476,%r22477,%r22478,%r22479,%r22480,%r22481}, {%r18146,%r18147,%r18148,%r18149}, %rd31, %p48, 1, 1; 2026-02-21T12:40:24.4412244Z // end inline asm 2026-02-21T12:40:24.4412331Z wgmma.commit_group.sync.aligned; 2026-02-21T12:40:24.4412394Z mov.b32 %r19860, 0; 2026-02-21T12:40:24.4412459Z mov.b32 %r18278, %r18409; 2026-02-21T12:40:24.4412527Z mov.b32 %r18279, %r19860; 2026-02-21T12:40:24.4412588Z mov.b32 %r18280, %r19860; 2026-02-21T12:40:24.4412649Z // begin inline asm 2026-02-21T12:40:24.4415175Z // wait for regs: 
%r22354,%r22355,%r22356,%r22357,%r22358,%r22359,%r22360,%r22361,%r22362,%r22363,%r22364,%r22365,%r22366,%r22367,%r22368,%r22369,%r22370,%r22371,%r22372,%r22373,%r22374,%r22375,%r22376,%r22377,%r22378,%r22379,%r22380,%r22381,%r22382,%r22383,%r22384,%r22385,%r22386,%r22387,%r22388,%r22389,%r22390,%r22391,%r22392,%r22393,%r22394,%r22395,%r22396,%r22397,%r22398,%r22399,%r22400,%r22401,%r22402,%r22403,%r22404,%r22405,%r22406,%r22407,%r22408,%r22409,%r22410,%r22411,%r22412,%r22413,%r22414,%r22415,%r22416,%r22417,%r22418,%r22419,%r22420,%r22421,%r22422,%r22423,%r22424,%r22425,%r22426,%r22427,%r22428,%r22429,%r22430,%r22431,%r22432,%r22433,%r22434,%r22435,%r22436,%r22437,%r22438,%r22439,%r22440,%r22441,%r22442,%r22443,%r22444,%r22445,%r22446,%r22447,%r22448,%r22449,%r22450,%r22451,%r22452,%r22453,%r22454,%r22455,%r22456,%r22457,%r22458,%r22459,%r22460,%r22461,%r22462,%r22463,%r22464,%r22465,%r22466,%r22467,%r22468,%r22469,%r22470,%r22471,%r22472,%r22473,%r22474,%r22475,%r22476,%r22477,%r22478,%r22479,%r22480,%r22481,%r18278,%r18279,%r18280 2026-02-21T12:40:24.4415259Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:40:24.4415338Z // end inline asm 2026-02-21T12:40:24.4415397Z $L__tmp34: 2026-02-21T12:40:24.4415603Z .loc 1 51 80 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:51:80 2026-02-21T12:40:24.4415728Z add.s64 %rd543, %rd643, -32; 2026-02-21T12:40:24.4415798Z // begin inline asm 2026-02-21T12:40:24.4415918Z mov.u64 %rd542, 0x0; 2026-02-21T12:40:24.4416051Z createpolicy.fractional.L2::evict_last.b64 %rd542, 1.0; 2026-02-21T12:40:24.4416119Z // end inline asm 2026-02-21T12:40:24.4416180Z // begin inline asm 2026-02-21T12:40:24.4416242Z mov.u32 %r18412, 0x0; 2026-02-21T12:40:24.4416303Z mov.u32 %r18413, 0x0; 2026-02-21T12:40:24.4416368Z mov.u32 %r18414, 0x0; 2026-02-21T12:40:24.4416429Z mov.u32 %r18415, 0x0; 2026-02-21T12:40:24.4416792Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r18412, %r18413, %r18414, %r18415 }, [ %rd543 + 0 ], %rd542; 
2026-02-21T12:40:24.4416861Z // end inline asm 2026-02-21T12:40:24.4417063Z .loc 1 55 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:55:32 2026-02-21T12:40:24.4417123Z bar.sync 0; 2026-02-21T12:40:24.4417217Z st.shared.v2.b32 [%r55], {%r18412, %r18413}; 2026-02-21T12:40:24.4417300Z st.shared.v2.b32 [%r56], {%r18414, %r18415}; 2026-02-21T12:40:24.4417358Z bar.sync 0; 2026-02-21T12:40:24.4417503Z ld.shared.b16 %rs1769, [%r57]; 2026-02-21T12:40:24.4417639Z ld.shared.b16 %rs1770, [%r57+256]; 2026-02-21T12:40:24.4417710Z ld.shared.b16 %rs1771, [%r57+16]; 2026-02-21T12:40:24.4417777Z ld.shared.b16 %rs1772, [%r57+272]; 2026-02-21T12:40:24.4417848Z ld.shared.b16 %rs1773, [%r58]; 2026-02-21T12:40:24.4417914Z ld.shared.b16 %rs1774, [%r58+256]; 2026-02-21T12:40:24.4417980Z ld.shared.b16 %rs1775, [%r58+16]; 2026-02-21T12:40:24.4418048Z ld.shared.b16 %rs1776, [%r58+272]; 2026-02-21T12:40:24.4418118Z cvt.f32.bf16 %r18676, %rs1769; 2026-02-21T12:40:24.4418181Z cvt.f32.bf16 %r18677, %rs1770; 2026-02-21T12:40:24.4418244Z cvt.f32.bf16 %r18678, %rs1773; 2026-02-21T12:40:24.4418311Z cvt.f32.bf16 %r18679, %rs1774; 2026-02-21T12:40:24.4418374Z cvt.f32.bf16 %r18936, %rs1771; 2026-02-21T12:40:24.4418435Z cvt.f32.bf16 %r18937, %rs1772; 2026-02-21T12:40:24.4418507Z cvt.f32.bf16 %r18938, %rs1775; 2026-02-21T12:40:24.4418579Z cvt.f32.bf16 %r18939, %rs1776; 2026-02-21T12:40:24.4418789Z .loc 1 57 87 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:57:87 2026-02-21T12:40:24.4418860Z add.s64 %rd545, %rd642, 10240; 2026-02-21T12:40:24.4418927Z // begin inline asm 2026-02-21T12:40:24.4418987Z mov.u32 %r18416, 0x0; 2026-02-21T12:40:24.4419046Z mov.u32 %r18417, 0x0; 2026-02-21T12:40:24.4419109Z mov.u32 %r18418, 0x0; 2026-02-21T12:40:24.4419169Z mov.u32 %r18419, 0x0; 2026-02-21T12:40:24.4419304Z ld.global.v4.b32 { %r18416, %r18417, %r18418, %r18419 }, [ %rd545 + 0 ]; 2026-02-21T12:40:24.4419363Z // end inline asm 2026-02-21T12:40:24.4419565Z .loc 1 65 28 // 
cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:65:28 2026-02-21T12:40:24.4419622Z bar.sync 0; 2026-02-21T12:40:24.4419689Z st.shared.b8 [%r59], %r18416; 2026-02-21T12:40:24.4419766Z prmt.b32 %r20073, %r18416, 0, 0x7771U; 2026-02-21T12:40:24.4419834Z st.shared.b8 [%r60], %r20073; 2026-02-21T12:40:24.4419904Z prmt.b32 %r20074, %r18416, 0, 0x7772U; 2026-02-21T12:40:24.4419979Z st.shared.b8 [%r61+256], %r20074; 2026-02-21T12:40:24.4420051Z prmt.b32 %r20075, %r18416, 0, 0x7773U; 2026-02-21T12:40:24.4420117Z st.shared.b8 [%r62+256], %r20075; 2026-02-21T12:40:24.4420182Z st.shared.b8 [%r63+512], %r18417; 2026-02-21T12:40:24.4420255Z prmt.b32 %r20076, %r18417, 0, 0x7771U; 2026-02-21T12:40:24.4420320Z st.shared.b8 [%r64+512], %r20076; 2026-02-21T12:40:24.4420386Z prmt.b32 %r20077, %r18417, 0, 0x7772U; 2026-02-21T12:40:24.4420456Z st.shared.b8 [%r65+768], %r20077; 2026-02-21T12:40:24.4420523Z prmt.b32 %r20078, %r18417, 0, 0x7773U; 2026-02-21T12:40:24.4420587Z st.shared.b8 [%r66+768], %r20078; 2026-02-21T12:40:24.4420654Z st.shared.b8 [%r67+1024], %r18418; 2026-02-21T12:40:24.4420726Z prmt.b32 %r20079, %r18418, 0, 0x7771U; 2026-02-21T12:40:24.4420791Z st.shared.b8 [%r68+1024], %r20079; 2026-02-21T12:40:24.4420947Z prmt.b32 %r20080, %r18418, 0, 0x7772U; 2026-02-21T12:40:24.4421020Z st.shared.b8 [%r69+1280], %r20080; 2026-02-21T12:40:24.4421089Z prmt.b32 %r20081, %r18418, 0, 0x7773U; 2026-02-21T12:40:24.4421215Z st.shared.b8 [%r70+1280], %r20081; 2026-02-21T12:40:24.4421286Z st.shared.b8 [%r71+1536], %r18419; 2026-02-21T12:40:24.4421352Z prmt.b32 %r20082, %r18419, 0, 0x7771U; 2026-02-21T12:40:24.4421418Z st.shared.b8 [%r72+1536], %r20082; 2026-02-21T12:40:24.4421485Z prmt.b32 %r20083, %r18419, 0, 0x7772U; 2026-02-21T12:40:24.4421555Z st.shared.b8 [%r73+1792], %r20083; 2026-02-21T12:40:24.4421622Z prmt.b32 %r20084, %r18419, 0, 0x7773U; 2026-02-21T12:40:24.4421689Z st.shared.b8 [%r74+1792], %r20084; 2026-02-21T12:40:24.4421755Z bar.sync 0; 2026-02-21T12:40:24.4421823Z 
ld.shared.b32 %r20085, [%r75]; 2026-02-21T12:40:24.4421890Z prmt.b32 %r20086, %r20085, 0, 0x7770U; 2026-02-21T12:40:24.4421955Z cvt.u16.u32 %rs1777, %r20086; 2026-02-21T12:40:24.4422029Z prmt.b32 %r20087, %r20085, 0, 0x7771U; 2026-02-21T12:40:24.4422095Z cvt.u16.u32 %rs1778, %r20087; 2026-02-21T12:40:24.4422161Z prmt.b32 %r20088, %r20085, 0, 0x7772U; 2026-02-21T12:40:24.4422278Z cvt.u16.u32 %rs1779, %r20088; 2026-02-21T12:40:24.4422410Z prmt.b32 %r20089, %r20085, 0, 0x7773U; 2026-02-21T12:40:24.4422475Z cvt.u16.u32 %rs1780, %r20089; 2026-02-21T12:40:24.4422542Z ld.shared.b32 %r20090, [%r76]; 2026-02-21T12:40:24.4422614Z prmt.b32 %r20091, %r20090, 0, 0x7770U; 2026-02-21T12:40:24.4422677Z cvt.u16.u32 %rs1781, %r20091; 2026-02-21T12:40:24.4422743Z prmt.b32 %r20092, %r20090, 0, 0x7771U; 2026-02-21T12:40:24.4422814Z cvt.u16.u32 %rs1782, %r20092; 2026-02-21T12:40:24.4422880Z prmt.b32 %r20093, %r20090, 0, 0x7772U; 2026-02-21T12:40:24.4422943Z cvt.u16.u32 %rs1783, %r20093; 2026-02-21T12:40:24.4423013Z prmt.b32 %r20094, %r20090, 0, 0x7773U; 2026-02-21T12:40:24.4423079Z cvt.u16.u32 %rs1784, %r20094; 2026-02-21T12:40:24.4423147Z ld.shared.b32 %r20095, [%r77]; 2026-02-21T12:40:24.4423215Z prmt.b32 %r20096, %r20095, 0, 0x7770U; 2026-02-21T12:40:24.4423288Z cvt.u16.u32 %rs1785, %r20096; 2026-02-21T12:40:24.4423355Z prmt.b32 %r20097, %r20095, 0, 0x7771U; 2026-02-21T12:40:24.4423425Z cvt.u16.u32 %rs1786, %r20097; 2026-02-21T12:40:24.4423498Z prmt.b32 %r20098, %r20095, 0, 0x7772U; 2026-02-21T12:40:24.4423563Z cvt.u16.u32 %rs1787, %r20098; 2026-02-21T12:40:24.4423629Z prmt.b32 %r20099, %r20095, 0, 0x7773U; 2026-02-21T12:40:24.4423693Z cvt.u16.u32 %rs1788, %r20099; 2026-02-21T12:40:24.4423767Z ld.shared.b32 %r20100, [%r78]; 2026-02-21T12:40:24.4423846Z prmt.b32 %r20101, %r20100, 0, 0x7770U; 2026-02-21T12:40:24.4423913Z cvt.u16.u32 %rs1789, %r20101; 2026-02-21T12:40:24.4423988Z prmt.b32 %r20102, %r20100, 0, 0x7771U; 2026-02-21T12:40:24.4424052Z cvt.u16.u32 %rs1790, %r20102; 
2026-02-21T12:40:24.4424121Z prmt.b32 %r20103, %r20100, 0, 0x7772U; 2026-02-21T12:40:24.4424195Z cvt.u16.u32 %rs1791, %r20103; 2026-02-21T12:40:24.4424262Z prmt.b32 %r20104, %r20100, 0, 0x7773U; 2026-02-21T12:40:24.4424329Z cvt.u16.u32 %rs1792, %r20104; 2026-02-21T12:40:24.4424531Z .loc 1 60 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:60:28 2026-02-21T12:40:24.4424609Z shl.b16 %rs1793, %rs1777, 4; 2026-02-21T12:40:24.4424672Z shl.b16 %rs1794, %rs1781, 4; 2026-02-21T12:40:24.4424737Z shl.b16 %rs1795, %rs1785, 4; 2026-02-21T12:40:24.4424806Z shl.b16 %rs1796, %rs1789, 4; 2026-02-21T12:40:24.4424869Z shl.b16 %rs1797, %rs1778, 4; 2026-02-21T12:40:24.4424933Z shl.b16 %rs1798, %rs1782, 4; 2026-02-21T12:40:24.4424996Z shl.b16 %rs1799, %rs1786, 4; 2026-02-21T12:40:24.4425064Z shl.b16 %rs1800, %rs1790, 4; 2026-02-21T12:40:24.4425125Z shl.b16 %rs1801, %rs1779, 4; 2026-02-21T12:40:24.4425187Z shl.b16 %rs1802, %rs1783, 4; 2026-02-21T12:40:24.4425256Z shl.b16 %rs1803, %rs1787, 4; 2026-02-21T12:40:24.4425318Z shl.b16 %rs1804, %rs1791, 4; 2026-02-21T12:40:24.4425379Z shl.b16 %rs1805, %rs1780, 4; 2026-02-21T12:40:24.4425442Z shl.b16 %rs1806, %rs1784, 4; 2026-02-21T12:40:24.4425571Z shl.b16 %rs1807, %rs1788, 4; 2026-02-21T12:40:24.4425634Z shl.b16 %rs1808, %rs1792, 4; 2026-02-21T12:40:24.4425834Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4425952Z cvt.s16.s8 %rs1809, %rs1793; 2026-02-21T12:40:24.4426017Z shr.s16 %rs1810, %rs1809, 4; 2026-02-21T12:40:24.4426079Z cvt.s16.s8 %rs1811, %rs1795; 2026-02-21T12:40:24.4426146Z shr.s16 %rs1812, %rs1811, 4; 2026-02-21T12:40:24.4426216Z prmt.b32 %r20105, %r20085, 0, 0x8880U; 2026-02-21T12:40:24.4426295Z cvt.u16.u32 %rs1813, %r20105; 2026-02-21T12:40:24.4426365Z shr.s16 %rs1814, %rs1813, 4; 2026-02-21T12:40:24.4426439Z prmt.b32 %r20106, %r20095, 0, 0x8880U; 2026-02-21T12:40:24.4426608Z cvt.u16.u32 %rs1815, %r20106; 2026-02-21T12:40:24.4426672Z shr.s16 %rs1816, %rs1815, 4; 
2026-02-21T12:40:24.4426873Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4426940Z cvt.rn.f32.s16 %r20107, %rs1816; 2026-02-21T12:40:24.4427008Z cvt.rn.f32.s16 %r20108, %rs1814; 2026-02-21T12:40:24.4427078Z cvt.rn.f32.s16 %r20109, %rs1812; 2026-02-21T12:40:24.4427273Z cvt.rn.f32.s16 %r20110, %rs1810; 2026-02-21T12:40:24.4427487Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4427553Z cvt.s16.s8 %rs1817, %rs1794; 2026-02-21T12:40:24.4427622Z shr.s16 %rs1818, %rs1817, 4; 2026-02-21T12:40:24.4427685Z cvt.s16.s8 %rs1819, %rs1796; 2026-02-21T12:40:24.4427747Z shr.s16 %rs1820, %rs1819, 4; 2026-02-21T12:40:24.4427821Z prmt.b32 %r20111, %r20090, 0, 0x8880U; 2026-02-21T12:40:24.4427884Z cvt.u16.u32 %rs1821, %r20111; 2026-02-21T12:40:24.4427946Z shr.s16 %rs1822, %rs1821, 4; 2026-02-21T12:40:24.4428016Z prmt.b32 %r20112, %r20100, 0, 0x8880U; 2026-02-21T12:40:24.4428085Z cvt.u16.u32 %rs1823, %r20112; 2026-02-21T12:40:24.4428148Z shr.s16 %rs1824, %rs1823, 4; 2026-02-21T12:40:24.4428411Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4428489Z cvt.rn.f32.s16 %r20113, %rs1824; 2026-02-21T12:40:24.4428561Z cvt.rn.f32.s16 %r20114, %rs1822; 2026-02-21T12:40:24.4428625Z cvt.rn.f32.s16 %r20115, %rs1820; 2026-02-21T12:40:24.4428697Z cvt.rn.f32.s16 %r20116, %rs1818; 2026-02-21T12:40:24.4428891Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4428956Z cvt.s16.s8 %rs1825, %rs1797; 2026-02-21T12:40:24.4429019Z shr.s16 %rs1826, %rs1825, 4; 2026-02-21T12:40:24.4429089Z cvt.s16.s8 %rs1827, %rs1799; 2026-02-21T12:40:24.4429150Z shr.s16 %rs1828, %rs1827, 4; 2026-02-21T12:40:24.4429219Z prmt.b32 %r20117, %r20085, 0, 0x9991U; 2026-02-21T12:40:24.4429296Z cvt.u16.u32 %rs1829, %r20117; 2026-02-21T12:40:24.4429363Z shr.s16 %rs1830, %rs1829, 4; 2026-02-21T12:40:24.4429432Z prmt.b32 %r20118, %r20095, 
0, 0x9991U; 2026-02-21T12:40:24.4429496Z cvt.u16.u32 %rs1831, %r20118; 2026-02-21T12:40:24.4429567Z shr.s16 %rs1832, %rs1831, 4; 2026-02-21T12:40:24.4429762Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4429831Z cvt.rn.f32.s16 %r20119, %rs1832; 2026-02-21T12:40:24.4429903Z cvt.rn.f32.s16 %r20120, %rs1830; 2026-02-21T12:40:24.4429968Z cvt.rn.f32.s16 %r20121, %rs1828; 2026-02-21T12:40:24.4430033Z cvt.rn.f32.s16 %r20122, %rs1826; 2026-02-21T12:40:24.4430233Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4430297Z cvt.s16.s8 %rs1833, %rs1798; 2026-02-21T12:40:24.4430360Z shr.s16 %rs1834, %rs1833, 4; 2026-02-21T12:40:24.4430423Z cvt.s16.s8 %rs1835, %rs1800; 2026-02-21T12:40:24.4430499Z shr.s16 %rs1836, %rs1835, 4; 2026-02-21T12:40:24.4430568Z prmt.b32 %r20123, %r20090, 0, 0x9991U; 2026-02-21T12:40:24.4430634Z cvt.u16.u32 %rs1837, %r20123; 2026-02-21T12:40:24.4430709Z shr.s16 %rs1838, %rs1837, 4; 2026-02-21T12:40:24.4430857Z prmt.b32 %r20124, %r20100, 0, 0x9991U; 2026-02-21T12:40:24.4430921Z cvt.u16.u32 %rs1839, %r20124; 2026-02-21T12:40:24.4431002Z shr.s16 %rs1840, %rs1839, 4; 2026-02-21T12:40:24.4431261Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4431327Z cvt.rn.f32.s16 %r20125, %rs1840; 2026-02-21T12:40:24.4431391Z cvt.rn.f32.s16 %r20126, %rs1838; 2026-02-21T12:40:24.4431462Z cvt.rn.f32.s16 %r20127, %rs1836; 2026-02-21T12:40:24.4431528Z cvt.rn.f32.s16 %r20128, %rs1834; 2026-02-21T12:40:24.4431726Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4431798Z cvt.s16.s8 %rs1841, %rs1801; 2026-02-21T12:40:24.4431862Z shr.s16 %rs1842, %rs1841, 4; 2026-02-21T12:40:24.4431925Z cvt.s16.s8 %rs1843, %rs1803; 2026-02-21T12:40:24.4431994Z shr.s16 %rs1844, %rs1843, 4; 2026-02-21T12:40:24.4432062Z prmt.b32 %r20129, %r20085, 0, 0xaaa2U; 2026-02-21T12:40:24.4432128Z cvt.u16.u32 
%rs1845, %r20129; 2026-02-21T12:40:24.4432194Z shr.s16 %rs1846, %rs1845, 4; 2026-02-21T12:40:24.4432317Z prmt.b32 %r20130, %r20095, 0, 0xaaa2U; 2026-02-21T12:40:24.4432455Z cvt.u16.u32 %rs1847, %r20130; 2026-02-21T12:40:24.4432520Z shr.s16 %rs1848, %rs1847, 4; 2026-02-21T12:40:24.4432724Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4432790Z cvt.rn.f32.s16 %r20131, %rs1848; 2026-02-21T12:40:24.4432855Z cvt.rn.f32.s16 %r20132, %rs1846; 2026-02-21T12:40:24.4432921Z cvt.rn.f32.s16 %r20133, %rs1844; 2026-02-21T12:40:24.4432991Z cvt.rn.f32.s16 %r20134, %rs1842; 2026-02-21T12:40:24.4433184Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4433250Z cvt.s16.s8 %rs1849, %rs1802; 2026-02-21T12:40:24.4433318Z shr.s16 %rs1850, %rs1849, 4; 2026-02-21T12:40:24.4433382Z cvt.s16.s8 %rs1851, %rs1804; 2026-02-21T12:40:24.4433448Z shr.s16 %rs1852, %rs1851, 4; 2026-02-21T12:40:24.4433523Z prmt.b32 %r20135, %r20090, 0, 0xaaa2U; 2026-02-21T12:40:24.4433586Z cvt.u16.u32 %rs1853, %r20135; 2026-02-21T12:40:24.4433654Z shr.s16 %rs1854, %rs1853, 4; 2026-02-21T12:40:24.4433720Z prmt.b32 %r20136, %r20100, 0, 0xaaa2U; 2026-02-21T12:40:24.4433787Z cvt.u16.u32 %rs1855, %r20136; 2026-02-21T12:40:24.4433849Z shr.s16 %rs1856, %rs1855, 4; 2026-02-21T12:40:24.4434042Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4434111Z cvt.rn.f32.s16 %r20137, %rs1856; 2026-02-21T12:40:24.4434176Z cvt.rn.f32.s16 %r20138, %rs1854; 2026-02-21T12:40:24.4434241Z cvt.rn.f32.s16 %r20139, %rs1852; 2026-02-21T12:40:24.4434304Z cvt.rn.f32.s16 %r20140, %rs1850; 2026-02-21T12:40:24.4434507Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4434570Z cvt.s16.s8 %rs1857, %rs1805; 2026-02-21T12:40:24.4434635Z shr.s16 %rs1858, %rs1857, 4; 2026-02-21T12:40:24.4434703Z cvt.s16.s8 %rs1859, %rs1807; 2026-02-21T12:40:24.4434765Z shr.s16 
%rs1860, %rs1859, 4; 2026-02-21T12:40:24.4434839Z prmt.b32 %r20141, %r20085, 0, 0xbbb3U; 2026-02-21T12:40:24.4434907Z cvt.u16.u32 %rs1861, %r20141; 2026-02-21T12:40:24.4434969Z shr.s16 %rs1862, %rs1861, 4; 2026-02-21T12:40:24.4435038Z prmt.b32 %r20142, %r20095, 0, 0xbbb3U; 2026-02-21T12:40:24.4435101Z cvt.u16.u32 %rs1863, %r20142; 2026-02-21T12:40:24.4435168Z shr.s16 %rs1864, %rs1863, 4; 2026-02-21T12:40:24.4435378Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4435444Z cvt.rn.f32.s16 %r20143, %rs1864; 2026-02-21T12:40:24.4435514Z cvt.rn.f32.s16 %r20144, %rs1862; 2026-02-21T12:40:24.4435576Z cvt.rn.f32.s16 %r20145, %rs1860; 2026-02-21T12:40:24.4435641Z cvt.rn.f32.s16 %r20146, %rs1858; 2026-02-21T12:40:24.4435840Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4435983Z cvt.s16.s8 %rs1865, %rs1806; 2026-02-21T12:40:24.4436046Z shr.s16 %rs1866, %rs1865, 4; 2026-02-21T12:40:24.4436163Z cvt.s16.s8 %rs1867, %rs1808; 2026-02-21T12:40:24.4436232Z shr.s16 %rs1868, %rs1867, 4; 2026-02-21T12:40:24.4436300Z prmt.b32 %r20147, %r20090, 0, 0xbbb3U; 2026-02-21T12:40:24.4436364Z cvt.u16.u32 %rs1869, %r20147; 2026-02-21T12:40:24.4436432Z shr.s16 %rs1870, %rs1869, 4; 2026-02-21T12:40:24.4436642Z prmt.b32 %r20148, %r20100, 0, 0xbbb3U; 2026-02-21T12:40:24.4436710Z cvt.u16.u32 %rs1871, %r20148; 2026-02-21T12:40:24.4436774Z shr.s16 %rs1872, %rs1871, 4; 2026-02-21T12:40:24.4436991Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4437058Z cvt.rn.f32.s16 %r20149, %rs1872; 2026-02-21T12:40:24.4437122Z cvt.rn.f32.s16 %r20150, %rs1870; 2026-02-21T12:40:24.4437192Z cvt.rn.f32.s16 %r20151, %rs1868; 2026-02-21T12:40:24.4437255Z cvt.rn.f32.s16 %r20152, %rs1866; 2026-02-21T12:40:24.4437317Z bar.sync 0; 2026-02-21T12:40:24.4437444Z st.shared.v4.b32 [%r79], {%r20110, %r20108, %r20109, %r20107}; 2026-02-21T12:40:24.4437713Z st.shared.v4.b32 
[%r79+8192], {%r20116, %r20114, %r20115, %r20113}; 2026-02-21T12:40:24.4437836Z st.shared.v4.b32 [%r80], {%r20122, %r20120, %r20121, %r20119}; 2026-02-21T12:40:24.4437958Z st.shared.v4.b32 [%r80+8192], {%r20128, %r20126, %r20127, %r20125}; 2026-02-21T12:40:24.4438076Z st.shared.v4.b32 [%r81], {%r20134, %r20132, %r20133, %r20131}; 2026-02-21T12:40:24.4438193Z st.shared.v4.b32 [%r81+8192], {%r20140, %r20138, %r20139, %r20137}; 2026-02-21T12:40:24.4438301Z st.shared.v4.b32 [%r82], {%r20146, %r20144, %r20145, %r20143}; 2026-02-21T12:40:24.4438424Z st.shared.v4.b32 [%r82+8192], {%r20152, %r20150, %r20151, %r20149}; 2026-02-21T12:40:24.4438482Z $L__tmp35: 2026-02-21T12:40:24.4438767Z .loc 2 291 36 // standard.py:291:36 @[ cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:87:40 ] 2026-02-21T12:40:24.4438839Z // begin inline asm 2026-02-21T12:40:24.4438921Z fence.proxy.async.shared::cta; 2026-02-21T12:40:24.4438982Z // end inline asm 2026-02-21T12:40:24.4439045Z bar.sync 0; 2026-02-21T12:40:24.4439128Z wgmma.fence.sync.aligned; 2026-02-21T12:40:24.4439189Z // begin inline asm 2026-02-21T12:40:24.4441924Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22354,%r22355,%r22356,%r22357,%r22358,%r22359,%r22360,%r22361,%r22362,%r22363,%r22364,%r22365,%r22366,%r22367,%r22368,%r22369,%r22370,%r22371,%r22372,%r22373,%r22374,%r22375,%r22376,%r22377,%r22378,%r22379,%r22380,%r22381,%r22382,%r22383,%r22384,%r22385,%r22386,%r22387,%r22388,%r22389,%r22390,%r22391,%r22392,%r22393,%r22394,%r22395,%r22396,%r22397,%r22398,%r22399,%r22400,%r22401,%r22402,%r22403,%r22404,%r22405,%r22406,%r22407,%r22408,%r22409,%r22410,%r22411,%r22412,%r22413,%r22414,%r22415,%r22416,%r22417,%r22418,%r22419,%r22420,%r22421,%r22422,%r22423,%r22424,%r22425,%r22426,%r22427,%r22428,%r22429,%r22430,%r22431,%r22432,%r22433,%r22434,%r22435,%r22436,%r22437,%r22438,%r22439,%r22440,%r22441,%r22442,%r22443,%r22444,%r22445,%r22446,%r22447,%r22448,%r22449,%r22450,%r22451,%r22452,%r22453,%r22454,%r22455,%r22456,%r22457,%r22458,%r22459,%r22460,%r22461,%r22462,%r22463,%r22464,%r22465,%r22466,%r22467,%r22468,%r22469,%r22470,%r22471,%r22472,%r22473,%r22474,%r22475,%r22476,%r22477,%r22478,%r22479,%r22480,%r22481}, {%r18676,%r18677,%r18678,%r18679}, %rd30, %p48, 1, 1; 2026-02-21T12:40:24.4441990Z // end inline asm 2026-02-21T12:40:24.4442052Z // begin inline asm 2026-02-21T12:40:24.4444769Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22354,%r22355,%r22356,%r22357,%r22358,%r22359,%r22360,%r22361,%r22362,%r22363,%r22364,%r22365,%r22366,%r22367,%r22368,%r22369,%r22370,%r22371,%r22372,%r22373,%r22374,%r22375,%r22376,%r22377,%r22378,%r22379,%r22380,%r22381,%r22382,%r22383,%r22384,%r22385,%r22386,%r22387,%r22388,%r22389,%r22390,%r22391,%r22392,%r22393,%r22394,%r22395,%r22396,%r22397,%r22398,%r22399,%r22400,%r22401,%r22402,%r22403,%r22404,%r22405,%r22406,%r22407,%r22408,%r22409,%r22410,%r22411,%r22412,%r22413,%r22414,%r22415,%r22416,%r22417,%r22418,%r22419,%r22420,%r22421,%r22422,%r22423,%r22424,%r22425,%r22426,%r22427,%r22428,%r22429,%r22430,%r22431,%r22432,%r22433,%r22434,%r22435,%r22436,%r22437,%r22438,%r22439,%r22440,%r22441,%r22442,%r22443,%r22444,%r22445,%r22446,%r22447,%r22448,%r22449,%r22450,%r22451,%r22452,%r22453,%r22454,%r22455,%r22456,%r22457,%r22458,%r22459,%r22460,%r22461,%r22462,%r22463,%r22464,%r22465,%r22466,%r22467,%r22468,%r22469,%r22470,%r22471,%r22472,%r22473,%r22474,%r22475,%r22476,%r22477,%r22478,%r22479,%r22480,%r22481}, {%r18936,%r18937,%r18938,%r18939}, %rd31, %p48, 1, 1; 2026-02-21T12:40:24.4444954Z // end inline asm 2026-02-21T12:40:24.4445033Z wgmma.commit_group.sync.aligned; 2026-02-21T12:40:24.4445096Z mov.b32 %r19068, %r18409; 2026-02-21T12:40:24.4445160Z mov.b32 %r19069, %r19860; 2026-02-21T12:40:24.4445228Z mov.b32 %r19070, %r19860; 2026-02-21T12:40:24.4445290Z // begin inline asm 2026-02-21T12:40:24.4448114Z // wait for regs: 
%r22354,%r22355,%r22356,%r22357,%r22358,%r22359,%r22360,%r22361,%r22362,%r22363,%r22364,%r22365,%r22366,%r22367,%r22368,%r22369,%r22370,%r22371,%r22372,%r22373,%r22374,%r22375,%r22376,%r22377,%r22378,%r22379,%r22380,%r22381,%r22382,%r22383,%r22384,%r22385,%r22386,%r22387,%r22388,%r22389,%r22390,%r22391,%r22392,%r22393,%r22394,%r22395,%r22396,%r22397,%r22398,%r22399,%r22400,%r22401,%r22402,%r22403,%r22404,%r22405,%r22406,%r22407,%r22408,%r22409,%r22410,%r22411,%r22412,%r22413,%r22414,%r22415,%r22416,%r22417,%r22418,%r22419,%r22420,%r22421,%r22422,%r22423,%r22424,%r22425,%r22426,%r22427,%r22428,%r22429,%r22430,%r22431,%r22432,%r22433,%r22434,%r22435,%r22436,%r22437,%r22438,%r22439,%r22440,%r22441,%r22442,%r22443,%r22444,%r22445,%r22446,%r22447,%r22448,%r22449,%r22450,%r22451,%r22452,%r22453,%r22454,%r22455,%r22456,%r22457,%r22458,%r22459,%r22460,%r22461,%r22462,%r22463,%r22464,%r22465,%r22466,%r22467,%r22468,%r22469,%r22470,%r22471,%r22472,%r22473,%r22474,%r22475,%r22476,%r22477,%r22478,%r22479,%r22480,%r22481,%r19068,%r19069,%r19070 2026-02-21T12:40:24.4448212Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:40:24.4448278Z // end inline asm 2026-02-21T12:40:24.4448342Z $L__tmp36: 2026-02-21T12:40:24.4448557Z .loc 1 51 80 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:51:80 2026-02-21T12:40:24.4448626Z // begin inline asm 2026-02-21T12:40:24.4448695Z mov.u64 %rd548, 0x0; 2026-02-21T12:40:24.4448829Z createpolicy.fractional.L2::evict_last.b64 %rd548, 1.0; 2026-02-21T12:40:24.4448889Z // end inline asm 2026-02-21T12:40:24.4448951Z // begin inline asm 2026-02-21T12:40:24.4449022Z mov.u32 %r19202, 0x0; 2026-02-21T12:40:24.4449083Z mov.u32 %r19203, 0x0; 2026-02-21T12:40:24.4449144Z mov.u32 %r19204, 0x0; 2026-02-21T12:40:24.4449210Z mov.u32 %r19205, 0x0; 2026-02-21T12:40:24.4449453Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r19202, %r19203, %r19204, %r19205 }, [ %rd643 + 0 ], %rd548; 2026-02-21T12:40:24.4449514Z // end inline asm 
2026-02-21T12:40:24.4449726Z .loc 1 55 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:55:32 2026-02-21T12:40:24.4449789Z bar.sync 0; 2026-02-21T12:40:24.4449879Z st.shared.v2.b32 [%r55], {%r19202, %r19203}; 2026-02-21T12:40:24.4449962Z st.shared.v2.b32 [%r56], {%r19204, %r19205}; 2026-02-21T12:40:24.4450028Z bar.sync 0; 2026-02-21T12:40:24.4450100Z ld.shared.b16 %rs1873, [%r57]; 2026-02-21T12:40:24.4450172Z ld.shared.b16 %rs1874, [%r57+256]; 2026-02-21T12:40:24.4450248Z ld.shared.b16 %rs1875, [%r57+16]; 2026-02-21T12:40:24.4450316Z ld.shared.b16 %rs1876, [%r57+272]; 2026-02-21T12:40:24.4450383Z ld.shared.b16 %rs1877, [%r58]; 2026-02-21T12:40:24.4450451Z ld.shared.b16 %rs1878, [%r58+256]; 2026-02-21T12:40:24.4450526Z ld.shared.b16 %rs1879, [%r58+16]; 2026-02-21T12:40:24.4450595Z ld.shared.b16 %rs1880, [%r58+272]; 2026-02-21T12:40:24.4450663Z cvt.f32.bf16 %r19466, %rs1873; 2026-02-21T12:40:24.4450734Z cvt.f32.bf16 %r19467, %rs1874; 2026-02-21T12:40:24.4450885Z cvt.f32.bf16 %r19468, %rs1877; 2026-02-21T12:40:24.4450948Z cvt.f32.bf16 %r19469, %rs1878; 2026-02-21T12:40:24.4451016Z cvt.f32.bf16 %r19726, %rs1875; 2026-02-21T12:40:24.4451142Z cvt.f32.bf16 %r19727, %rs1876; 2026-02-21T12:40:24.4451206Z cvt.f32.bf16 %r19728, %rs1879; 2026-02-21T12:40:24.4451268Z cvt.f32.bf16 %r19729, %rs1880; 2026-02-21T12:40:24.4451477Z .loc 1 57 87 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:57:87 2026-02-21T12:40:24.4451544Z add.s64 %rd551, %rd642, 20480; 2026-02-21T12:40:24.4451605Z // begin inline asm 2026-02-21T12:40:24.4451668Z mov.u32 %r19206, 0x0; 2026-02-21T12:40:24.4451729Z mov.u32 %r19207, 0x0; 2026-02-21T12:40:24.4451787Z mov.u32 %r19208, 0x0; 2026-02-21T12:40:24.4451847Z mov.u32 %r19209, 0x0; 2026-02-21T12:40:24.4451988Z ld.global.v4.b32 { %r19206, %r19207, %r19208, %r19209 }, [ %rd551 + 0 ]; 2026-02-21T12:40:24.4452047Z // end inline asm 2026-02-21T12:40:24.4452246Z .loc 1 65 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:65:28 
2026-02-21T12:40:24.4452313Z bar.sync 0; 2026-02-21T12:40:24.4452427Z st.shared.b8 [%r59], %r19206; 2026-02-21T12:40:24.4452548Z prmt.b32 %r20153, %r19206, 0, 0x7771U; 2026-02-21T12:40:24.4452621Z st.shared.b8 [%r60], %r20153; 2026-02-21T12:40:24.4452690Z prmt.b32 %r20154, %r19206, 0, 0x7772U; 2026-02-21T12:40:24.4452757Z st.shared.b8 [%r61+256], %r20154; 2026-02-21T12:40:24.4452823Z prmt.b32 %r20155, %r19206, 0, 0x7773U; 2026-02-21T12:40:24.4452895Z st.shared.b8 [%r62+256], %r20155; 2026-02-21T12:40:24.4452959Z st.shared.b8 [%r63+512], %r19207; 2026-02-21T12:40:24.4453025Z prmt.b32 %r20156, %r19207, 0, 0x7771U; 2026-02-21T12:40:24.4453107Z st.shared.b8 [%r64+512], %r20156; 2026-02-21T12:40:24.4453174Z prmt.b32 %r20157, %r19207, 0, 0x7772U; 2026-02-21T12:40:24.4453243Z st.shared.b8 [%r65+768], %r20157; 2026-02-21T12:40:24.4453311Z prmt.b32 %r20158, %r19207, 0, 0x7773U; 2026-02-21T12:40:24.4453383Z st.shared.b8 [%r66+768], %r20158; 2026-02-21T12:40:24.4453450Z st.shared.b8 [%r67+1024], %r19208; 2026-02-21T12:40:24.4453521Z prmt.b32 %r20159, %r19208, 0, 0x7771U; 2026-02-21T12:40:24.4453603Z st.shared.b8 [%r68+1024], %r20159; 2026-02-21T12:40:24.4453671Z prmt.b32 %r20160, %r19208, 0, 0x7772U; 2026-02-21T12:40:24.4453735Z st.shared.b8 [%r69+1280], %r20160; 2026-02-21T12:40:24.4453808Z prmt.b32 %r20161, %r19208, 0, 0x7773U; 2026-02-21T12:40:24.4453873Z st.shared.b8 [%r70+1280], %r20161; 2026-02-21T12:40:24.4453939Z st.shared.b8 [%r71+1536], %r19209; 2026-02-21T12:40:24.4454005Z prmt.b32 %r20162, %r19209, 0, 0x7771U; 2026-02-21T12:40:24.4454078Z st.shared.b8 [%r72+1536], %r20162; 2026-02-21T12:40:24.4454144Z prmt.b32 %r20163, %r19209, 0, 0x7772U; 2026-02-21T12:40:24.4454209Z st.shared.b8 [%r73+1792], %r20163; 2026-02-21T12:40:24.4454283Z prmt.b32 %r20164, %r19209, 0, 0x7773U; 2026-02-21T12:40:24.4454361Z st.shared.b8 [%r74+1792], %r20164; 2026-02-21T12:40:24.4454421Z bar.sync 0; 2026-02-21T12:40:24.4454491Z ld.shared.b32 %r20165, [%r75]; 2026-02-21T12:40:24.4454564Z 
prmt.b32 %r20166, %r20165, 0, 0x7770U; 2026-02-21T12:40:24.4454633Z cvt.u16.u32 %rs1881, %r20166; 2026-02-21T12:40:24.4454703Z prmt.b32 %r20167, %r20165, 0, 0x7771U; 2026-02-21T12:40:24.4454773Z cvt.u16.u32 %rs1882, %r20167; 2026-02-21T12:40:24.4454840Z prmt.b32 %r20168, %r20165, 0, 0x7772U; 2026-02-21T12:40:24.4454903Z cvt.u16.u32 %rs1883, %r20168; 2026-02-21T12:40:24.4454973Z prmt.b32 %r20169, %r20165, 0, 0x7773U; 2026-02-21T12:40:24.4455042Z cvt.u16.u32 %rs1884, %r20169; 2026-02-21T12:40:24.4455110Z ld.shared.b32 %r20170, [%r76]; 2026-02-21T12:40:24.4455187Z prmt.b32 %r20171, %r20170, 0, 0x7770U; 2026-02-21T12:40:24.4455260Z cvt.u16.u32 %rs1885, %r20171; 2026-02-21T12:40:24.4455327Z prmt.b32 %r20172, %r20170, 0, 0x7771U; 2026-02-21T12:40:24.4455391Z cvt.u16.u32 %rs1886, %r20172; 2026-02-21T12:40:24.4455467Z prmt.b32 %r20173, %r20170, 0, 0x7772U; 2026-02-21T12:40:24.4455530Z cvt.u16.u32 %rs1887, %r20173; 2026-02-21T12:40:24.4455655Z prmt.b32 %r20174, %r20170, 0, 0x7773U; 2026-02-21T12:40:24.4455720Z cvt.u16.u32 %rs1888, %r20174; 2026-02-21T12:40:24.4460080Z ld.shared.b32 %r20175, [%r77]; 2026-02-21T12:40:24.4460206Z prmt.b32 %r20176, %r20175, 0, 0x7770U; 2026-02-21T12:40:24.4460279Z cvt.u16.u32 %rs1889, %r20176; 2026-02-21T12:40:24.4460357Z prmt.b32 %r20177, %r20175, 0, 0x7771U; 2026-02-21T12:40:24.4460422Z cvt.u16.u32 %rs1890, %r20177; 2026-02-21T12:40:24.4460493Z prmt.b32 %r20178, %r20175, 0, 0x7772U; 2026-02-21T12:40:24.4460559Z cvt.u16.u32 %rs1891, %r20178; 2026-02-21T12:40:24.4460625Z prmt.b32 %r20179, %r20175, 0, 0x7773U; 2026-02-21T12:40:24.4460685Z cvt.u16.u32 %rs1892, %r20179; 2026-02-21T12:40:24.4460752Z ld.shared.b32 %r20180, [%r78]; 2026-02-21T12:40:24.4460822Z prmt.b32 %r20181, %r20180, 0, 0x7770U; 2026-02-21T12:40:24.4460884Z cvt.u16.u32 %rs1893, %r20181; 2026-02-21T12:40:24.4460950Z prmt.b32 %r20182, %r20180, 0, 0x7771U; 2026-02-21T12:40:24.4461014Z cvt.u16.u32 %rs1894, %r20182; 2026-02-21T12:40:24.4461079Z prmt.b32 %r20183, %r20180, 0, 0x7772U; 
2026-02-21T12:40:24.4461138Z cvt.u16.u32 %rs1895, %r20183; 2026-02-21T12:40:24.4461389Z prmt.b32 %r20184, %r20180, 0, 0x7773U; 2026-02-21T12:40:24.4461462Z cvt.u16.u32 %rs1896, %r20184; 2026-02-21T12:40:24.4461690Z .loc 1 60 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:60:28 2026-02-21T12:40:24.4461757Z shl.b16 %rs1897, %rs1881, 4; 2026-02-21T12:40:24.4461822Z shl.b16 %rs1898, %rs1885, 4; 2026-02-21T12:40:24.4461882Z shl.b16 %rs1899, %rs1889, 4; 2026-02-21T12:40:24.4461941Z shl.b16 %rs1900, %rs1893, 4; 2026-02-21T12:40:24.4462002Z shl.b16 %rs1901, %rs1882, 4; 2026-02-21T12:40:24.4462060Z shl.b16 %rs1902, %rs1886, 4; 2026-02-21T12:40:24.4462119Z shl.b16 %rs1903, %rs1890, 4; 2026-02-21T12:40:24.4462177Z shl.b16 %rs1904, %rs1894, 4; 2026-02-21T12:40:24.4462238Z shl.b16 %rs1905, %rs1883, 4; 2026-02-21T12:40:24.4462297Z shl.b16 %rs1906, %rs1887, 4; 2026-02-21T12:40:24.4462357Z shl.b16 %rs1907, %rs1891, 4; 2026-02-21T12:40:24.4462417Z shl.b16 %rs1908, %rs1895, 4; 2026-02-21T12:40:24.4462477Z shl.b16 %rs1909, %rs1884, 4; 2026-02-21T12:40:24.4462539Z shl.b16 %rs1910, %rs1888, 4; 2026-02-21T12:40:24.4462598Z shl.b16 %rs1911, %rs1892, 4; 2026-02-21T12:40:24.4462659Z shl.b16 %rs1912, %rs1896, 4; 2026-02-21T12:40:24.4462872Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4462936Z cvt.s16.s8 %rs1913, %rs1897; 2026-02-21T12:40:24.4462999Z shr.s16 %rs1914, %rs1913, 4; 2026-02-21T12:40:24.4463058Z cvt.s16.s8 %rs1915, %rs1899; 2026-02-21T12:40:24.4463117Z shr.s16 %rs1916, %rs1915, 4; 2026-02-21T12:40:24.4463189Z prmt.b32 %r20185, %r20165, 0, 0x8880U; 2026-02-21T12:40:24.4463255Z cvt.u16.u32 %rs1917, %r20185; 2026-02-21T12:40:24.4463313Z shr.s16 %rs1918, %rs1917, 4; 2026-02-21T12:40:24.4463394Z prmt.b32 %r20186, %r20175, 0, 0x8880U; 2026-02-21T12:40:24.4463461Z cvt.u16.u32 %rs1919, %r20186; 2026-02-21T12:40:24.4463522Z shr.s16 %rs1920, %rs1919, 4; 2026-02-21T12:40:24.4463738Z .loc 1 80 32 // 
cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4463812Z cvt.rn.f32.s16 %r20187, %rs1920; 2026-02-21T12:40:24.4463874Z cvt.rn.f32.s16 %r20188, %rs1918; 2026-02-21T12:40:24.4463935Z cvt.rn.f32.s16 %r20189, %rs1916; 2026-02-21T12:40:24.4463996Z cvt.rn.f32.s16 %r20190, %rs1914; 2026-02-21T12:40:24.4464194Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4464256Z cvt.s16.s8 %rs1921, %rs1898; 2026-02-21T12:40:24.4464316Z shr.s16 %rs1922, %rs1921, 4; 2026-02-21T12:40:24.4464379Z cvt.s16.s8 %rs1923, %rs1900; 2026-02-21T12:40:24.4464439Z shr.s16 %rs1924, %rs1923, 4; 2026-02-21T12:40:24.4464505Z prmt.b32 %r20191, %r20170, 0, 0x8880U; 2026-02-21T12:40:24.4464579Z cvt.u16.u32 %rs1925, %r20191; 2026-02-21T12:40:24.4464642Z shr.s16 %rs1926, %rs1925, 4; 2026-02-21T12:40:24.4464793Z prmt.b32 %r20192, %r20180, 0, 0x8880U; 2026-02-21T12:40:24.4464854Z cvt.u16.u32 %rs1927, %r20192; 2026-02-21T12:40:24.4464979Z shr.s16 %rs1928, %rs1927, 4; 2026-02-21T12:40:24.4465174Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4465236Z cvt.rn.f32.s16 %r20193, %rs1928; 2026-02-21T12:40:24.4465301Z cvt.rn.f32.s16 %r20194, %rs1926; 2026-02-21T12:40:24.4465362Z cvt.rn.f32.s16 %r20195, %rs1924; 2026-02-21T12:40:24.4465421Z cvt.rn.f32.s16 %r20196, %rs1922; 2026-02-21T12:40:24.4465617Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4465690Z cvt.s16.s8 %rs1929, %rs1901; 2026-02-21T12:40:24.4465755Z shr.s16 %rs1930, %rs1929, 4; 2026-02-21T12:40:24.4465818Z cvt.s16.s8 %rs1931, %rs1903; 2026-02-21T12:40:24.4465882Z shr.s16 %rs1932, %rs1931, 4; 2026-02-21T12:40:24.4465953Z prmt.b32 %r20197, %r20165, 0, 0x9991U; 2026-02-21T12:40:24.4466026Z cvt.u16.u32 %rs1933, %r20197; 2026-02-21T12:40:24.4466088Z shr.s16 %rs1934, %rs1933, 4; 2026-02-21T12:40:24.4466248Z prmt.b32 %r20198, %r20175, 0, 0x9991U; 2026-02-21T12:40:24.4466317Z 
cvt.u16.u32 %rs1935, %r20198; 2026-02-21T12:40:24.4466380Z shr.s16 %rs1936, %rs1935, 4; 2026-02-21T12:40:24.4466736Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4466808Z cvt.rn.f32.s16 %r20199, %rs1936; 2026-02-21T12:40:24.4466872Z cvt.rn.f32.s16 %r20200, %rs1934; 2026-02-21T12:40:24.4466936Z cvt.rn.f32.s16 %r20201, %rs1932; 2026-02-21T12:40:24.4466998Z cvt.rn.f32.s16 %r20202, %rs1930; 2026-02-21T12:40:24.4467194Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4467257Z cvt.s16.s8 %rs1937, %rs1902; 2026-02-21T12:40:24.4467319Z shr.s16 %rs1938, %rs1937, 4; 2026-02-21T12:40:24.4467378Z cvt.s16.s8 %rs1939, %rs1904; 2026-02-21T12:40:24.4467453Z shr.s16 %rs1940, %rs1939, 4; 2026-02-21T12:40:24.4467525Z prmt.b32 %r20203, %r20170, 0, 0x9991U; 2026-02-21T12:40:24.4467591Z cvt.u16.u32 %rs1941, %r20203; 2026-02-21T12:40:24.4467653Z shr.s16 %rs1942, %rs1941, 4; 2026-02-21T12:40:24.4467722Z prmt.b32 %r20204, %r20180, 0, 0x9991U; 2026-02-21T12:40:24.4467785Z cvt.u16.u32 %rs1943, %r20204; 2026-02-21T12:40:24.4467849Z shr.s16 %rs1944, %rs1943, 4; 2026-02-21T12:40:24.4468049Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4468117Z cvt.rn.f32.s16 %r20205, %rs1944; 2026-02-21T12:40:24.4468178Z cvt.rn.f32.s16 %r20206, %rs1942; 2026-02-21T12:40:24.4468240Z cvt.rn.f32.s16 %r20207, %rs1940; 2026-02-21T12:40:24.4468306Z cvt.rn.f32.s16 %r20208, %rs1938; 2026-02-21T12:40:24.4468588Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4468652Z cvt.s16.s8 %rs1945, %rs1905; 2026-02-21T12:40:24.4468719Z shr.s16 %rs1946, %rs1945, 4; 2026-02-21T12:40:24.4468779Z cvt.s16.s8 %rs1947, %rs1907; 2026-02-21T12:40:24.4468840Z shr.s16 %rs1948, %rs1947, 4; 2026-02-21T12:40:24.4468912Z prmt.b32 %r20209, %r20165, 0, 0xaaa2U; 2026-02-21T12:40:24.4468978Z cvt.u16.u32 %rs1949, %r20209; 
2026-02-21T12:40:24.4469038Z shr.s16 %rs1950, %rs1949, 4; 2026-02-21T12:40:24.4469104Z prmt.b32 %r20210, %r20175, 0, 0xaaa2U; 2026-02-21T12:40:24.4469175Z cvt.u16.u32 %rs1951, %r20210; 2026-02-21T12:40:24.4469242Z shr.s16 %rs1952, %rs1951, 4; 2026-02-21T12:40:24.4469438Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4469511Z cvt.rn.f32.s16 %r20211, %rs1952; 2026-02-21T12:40:24.4469576Z cvt.rn.f32.s16 %r20212, %rs1950; 2026-02-21T12:40:24.4469639Z cvt.rn.f32.s16 %r20213, %rs1948; 2026-02-21T12:40:24.4469701Z cvt.rn.f32.s16 %r20214, %rs1946; 2026-02-21T12:40:24.4469901Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4470051Z cvt.s16.s8 %rs1953, %rs1906; 2026-02-21T12:40:24.4470118Z shr.s16 %rs1954, %rs1953, 4; 2026-02-21T12:40:24.4470269Z cvt.s16.s8 %rs1955, %rs1908; 2026-02-21T12:40:24.4470336Z shr.s16 %rs1956, %rs1955, 4; 2026-02-21T12:40:24.4470404Z prmt.b32 %r20215, %r20170, 0, 0xaaa2U; 2026-02-21T12:40:24.4470468Z cvt.u16.u32 %rs1957, %r20215; 2026-02-21T12:40:24.4470538Z shr.s16 %rs1958, %rs1957, 4; 2026-02-21T12:40:24.4470620Z prmt.b32 %r20216, %r20180, 0, 0xaaa2U; 2026-02-21T12:40:24.4470685Z cvt.u16.u32 %rs1959, %r20216; 2026-02-21T12:40:24.4470755Z shr.s16 %rs1960, %rs1959, 4; 2026-02-21T12:40:24.4470950Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4471015Z cvt.rn.f32.s16 %r20217, %rs1960; 2026-02-21T12:40:24.4471084Z cvt.rn.f32.s16 %r20218, %rs1958; 2026-02-21T12:40:24.4471149Z cvt.rn.f32.s16 %r20219, %rs1956; 2026-02-21T12:40:24.4471214Z cvt.rn.f32.s16 %r20220, %rs1954; 2026-02-21T12:40:24.4471473Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4471611Z cvt.s16.s8 %rs1961, %rs1909; 2026-02-21T12:40:24.4471677Z shr.s16 %rs1962, %rs1961, 4; 2026-02-21T12:40:24.4471740Z cvt.s16.s8 %rs1963, %rs1911; 2026-02-21T12:40:24.4471808Z shr.s16 %rs1964, %rs1963, 
4; 2026-02-21T12:40:24.4471874Z prmt.b32 %r20221, %r20165, 0, 0xbbb3U; 2026-02-21T12:40:24.4471949Z cvt.u16.u32 %rs1965, %r20221; 2026-02-21T12:40:24.4472021Z shr.s16 %rs1966, %rs1965, 4; 2026-02-21T12:40:24.4472089Z prmt.b32 %r20222, %r20175, 0, 0xbbb3U; 2026-02-21T12:40:24.4472153Z cvt.u16.u32 %rs1967, %r20222; 2026-02-21T12:40:24.4472218Z shr.s16 %rs1968, %rs1967, 4; 2026-02-21T12:40:24.4472419Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4472484Z cvt.rn.f32.s16 %r20223, %rs1968; 2026-02-21T12:40:24.4472549Z cvt.rn.f32.s16 %r20224, %rs1966; 2026-02-21T12:40:24.4472618Z cvt.rn.f32.s16 %r20225, %rs1964; 2026-02-21T12:40:24.4472681Z cvt.rn.f32.s16 %r20226, %rs1962; 2026-02-21T12:40:24.4472882Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4472945Z cvt.s16.s8 %rs1969, %rs1910; 2026-02-21T12:40:24.4473015Z shr.s16 %rs1970, %rs1969, 4; 2026-02-21T12:40:24.4473077Z cvt.s16.s8 %rs1971, %rs1912; 2026-02-21T12:40:24.4473139Z shr.s16 %rs1972, %rs1971, 4; 2026-02-21T12:40:24.4473212Z prmt.b32 %r20227, %r20170, 0, 0xbbb3U; 2026-02-21T12:40:24.4473275Z cvt.u16.u32 %rs1973, %r20227; 2026-02-21T12:40:24.4473338Z shr.s16 %rs1974, %rs1973, 4; 2026-02-21T12:40:24.4473411Z prmt.b32 %r20228, %r20180, 0, 0xbbb3U; 2026-02-21T12:40:24.4473474Z cvt.u16.u32 %rs1975, %r20228; 2026-02-21T12:40:24.4473535Z shr.s16 %rs1976, %rs1975, 4; 2026-02-21T12:40:24.4473731Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4473805Z cvt.rn.f32.s16 %r20229, %rs1976; 2026-02-21T12:40:24.4473869Z cvt.rn.f32.s16 %r20230, %rs1974; 2026-02-21T12:40:24.4473939Z cvt.rn.f32.s16 %r20231, %rs1972; 2026-02-21T12:40:24.4474010Z cvt.rn.f32.s16 %r20232, %rs1970; 2026-02-21T12:40:24.4474068Z bar.sync 0; 2026-02-21T12:40:24.4474192Z st.shared.v4.b32 [%r79], {%r20190, %r20188, %r20189, %r20187}; 2026-02-21T12:40:24.4474325Z st.shared.v4.b32 [%r79+8192], {%r20196, 
%r20194, %r20195, %r20193}; 2026-02-21T12:40:24.4474439Z st.shared.v4.b32 [%r80], {%r20202, %r20200, %r20201, %r20199}; 2026-02-21T12:40:24.4474559Z st.shared.v4.b32 [%r80+8192], {%r20208, %r20206, %r20207, %r20205}; 2026-02-21T12:40:24.4474669Z st.shared.v4.b32 [%r81], {%r20214, %r20212, %r20213, %r20211}; 2026-02-21T12:40:24.4474791Z st.shared.v4.b32 [%r81+8192], {%r20220, %r20218, %r20219, %r20217}; 2026-02-21T12:40:24.4474901Z st.shared.v4.b32 [%r82], {%r20226, %r20224, %r20225, %r20223}; 2026-02-21T12:40:24.4475083Z st.shared.v4.b32 [%r82+8192], {%r20232, %r20230, %r20231, %r20229}; 2026-02-21T12:40:24.4475147Z $L__tmp37: 2026-02-21T12:40:24.4475429Z .loc 2 291 36 // standard.py:291:36 @[ cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:87:40 ] 2026-02-21T12:40:24.4475549Z // begin inline asm 2026-02-21T12:40:24.4475639Z fence.proxy.async.shared::cta; 2026-02-21T12:40:24.4475699Z // end inline asm 2026-02-21T12:40:24.4475755Z bar.sync 0; 2026-02-21T12:40:24.4475831Z wgmma.fence.sync.aligned; 2026-02-21T12:40:24.4475897Z // begin inline asm 2026-02-21T12:40:24.4478856Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22354,%r22355,%r22356,%r22357,%r22358,%r22359,%r22360,%r22361,%r22362,%r22363,%r22364,%r22365,%r22366,%r22367,%r22368,%r22369,%r22370,%r22371,%r22372,%r22373,%r22374,%r22375,%r22376,%r22377,%r22378,%r22379,%r22380,%r22381,%r22382,%r22383,%r22384,%r22385,%r22386,%r22387,%r22388,%r22389,%r22390,%r22391,%r22392,%r22393,%r22394,%r22395,%r22396,%r22397,%r22398,%r22399,%r22400,%r22401,%r22402,%r22403,%r22404,%r22405,%r22406,%r22407,%r22408,%r22409,%r22410,%r22411,%r22412,%r22413,%r22414,%r22415,%r22416,%r22417,%r22418,%r22419,%r22420,%r22421,%r22422,%r22423,%r22424,%r22425,%r22426,%r22427,%r22428,%r22429,%r22430,%r22431,%r22432,%r22433,%r22434,%r22435,%r22436,%r22437,%r22438,%r22439,%r22440,%r22441,%r22442,%r22443,%r22444,%r22445,%r22446,%r22447,%r22448,%r22449,%r22450,%r22451,%r22452,%r22453,%r22454,%r22455,%r22456,%r22457,%r22458,%r22459,%r22460,%r22461,%r22462,%r22463,%r22464,%r22465,%r22466,%r22467,%r22468,%r22469,%r22470,%r22471,%r22472,%r22473,%r22474,%r22475,%r22476,%r22477,%r22478,%r22479,%r22480,%r22481}, {%r19466,%r19467,%r19468,%r19469}, %rd30, %p48, 1, 1; 2026-02-21T12:40:24.4478947Z // end inline asm 2026-02-21T12:40:24.4479010Z // begin inline asm 2026-02-21T12:40:24.4481684Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22354,%r22355,%r22356,%r22357,%r22358,%r22359,%r22360,%r22361,%r22362,%r22363,%r22364,%r22365,%r22366,%r22367,%r22368,%r22369,%r22370,%r22371,%r22372,%r22373,%r22374,%r22375,%r22376,%r22377,%r22378,%r22379,%r22380,%r22381,%r22382,%r22383,%r22384,%r22385,%r22386,%r22387,%r22388,%r22389,%r22390,%r22391,%r22392,%r22393,%r22394,%r22395,%r22396,%r22397,%r22398,%r22399,%r22400,%r22401,%r22402,%r22403,%r22404,%r22405,%r22406,%r22407,%r22408,%r22409,%r22410,%r22411,%r22412,%r22413,%r22414,%r22415,%r22416,%r22417,%r22418,%r22419,%r22420,%r22421,%r22422,%r22423,%r22424,%r22425,%r22426,%r22427,%r22428,%r22429,%r22430,%r22431,%r22432,%r22433,%r22434,%r22435,%r22436,%r22437,%r22438,%r22439,%r22440,%r22441,%r22442,%r22443,%r22444,%r22445,%r22446,%r22447,%r22448,%r22449,%r22450,%r22451,%r22452,%r22453,%r22454,%r22455,%r22456,%r22457,%r22458,%r22459,%r22460,%r22461,%r22462,%r22463,%r22464,%r22465,%r22466,%r22467,%r22468,%r22469,%r22470,%r22471,%r22472,%r22473,%r22474,%r22475,%r22476,%r22477,%r22478,%r22479,%r22480,%r22481}, {%r19726,%r19727,%r19728,%r19729}, %rd31, %p48, 1, 1; 2026-02-21T12:40:24.4481751Z // end inline asm 2026-02-21T12:40:24.4481838Z wgmma.commit_group.sync.aligned; 2026-02-21T12:40:24.4481903Z mov.b32 %r19859, %r19860; 2026-02-21T12:40:24.4481964Z mov.b32 %r19858, %r18409; 2026-02-21T12:40:24.4482029Z // begin inline asm 2026-02-21T12:40:24.4484571Z // wait for regs: 
%r22354,%r22355,%r22356,%r22357,%r22358,%r22359,%r22360,%r22361,%r22362,%r22363,%r22364,%r22365,%r22366,%r22367,%r22368,%r22369,%r22370,%r22371,%r22372,%r22373,%r22374,%r22375,%r22376,%r22377,%r22378,%r22379,%r22380,%r22381,%r22382,%r22383,%r22384,%r22385,%r22386,%r22387,%r22388,%r22389,%r22390,%r22391,%r22392,%r22393,%r22394,%r22395,%r22396,%r22397,%r22398,%r22399,%r22400,%r22401,%r22402,%r22403,%r22404,%r22405,%r22406,%r22407,%r22408,%r22409,%r22410,%r22411,%r22412,%r22413,%r22414,%r22415,%r22416,%r22417,%r22418,%r22419,%r22420,%r22421,%r22422,%r22423,%r22424,%r22425,%r22426,%r22427,%r22428,%r22429,%r22430,%r22431,%r22432,%r22433,%r22434,%r22435,%r22436,%r22437,%r22438,%r22439,%r22440,%r22441,%r22442,%r22443,%r22444,%r22445,%r22446,%r22447,%r22448,%r22449,%r22450,%r22451,%r22452,%r22453,%r22454,%r22455,%r22456,%r22457,%r22458,%r22459,%r22460,%r22461,%r22462,%r22463,%r22464,%r22465,%r22466,%r22467,%r22468,%r22469,%r22470,%r22471,%r22472,%r22473,%r22474,%r22475,%r22476,%r22477,%r22478,%r22479,%r22480,%r22481,%r19858,%r19859,%r19860 2026-02-21T12:40:24.4484786Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:40:24.4484854Z // end inline asm 2026-02-21T12:40:24.4484911Z $L__tmp38: 2026-02-21T12:40:24.4485138Z .loc 1 43 126 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:43:126 2026-02-21T12:40:24.4485212Z add.s64 %rd644, %rd644, 24; 2026-02-21T12:40:24.4485276Z add.s64 %rd643, %rd643, 96; 2026-02-21T12:40:24.4485345Z add.s64 %rd642, %rd642, 30720; 2026-02-21T12:40:24.4485424Z setp.lt.u64 %p54, %rd644, 4056; 2026-02-21T12:40:24.4485488Z @%p54 bra $L__BB0_7; 2026-02-21T12:40:24.4485582Z // %bb.8: // %.preheader 2026-02-21T12:40:24.4485692Z // in Loop: Header=BB0_6 Depth=1 2026-02-21T12:40:24.4485769Z add.s64 %rd646, %rd34, %rd142; 2026-02-21T12:40:24.4485832Z add.s64 %rd645, %rd35, %rd141; 2026-02-21T12:40:24.4485993Z mov.b64 %rd647, 4072; 2026-02-21T12:40:24.4486116Z $L__BB0_9: // Parent Loop BB0_6 Depth=1 2026-02-21T12:40:24.4486226Z // => This Inner Loop 
Header: Depth=2 2026-02-21T12:40:24.4486438Z .loc 1 51 80 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:51:80 2026-02-21T12:40:24.4486624Z // begin inline asm 2026-02-21T12:40:24.4486692Z mov.u64 %rd555, 0x0; 2026-02-21T12:40:24.4486822Z createpolicy.fractional.L2::evict_last.b64 %rd555, 1.0; 2026-02-21T12:40:24.4486883Z // end inline asm 2026-02-21T12:40:24.4486951Z // begin inline asm 2026-02-21T12:40:24.4487011Z mov.u32 %r20233, 0x0; 2026-02-21T12:40:24.4487071Z mov.u32 %r20234, 0x0; 2026-02-21T12:40:24.4487137Z mov.u32 %r20235, 0x0; 2026-02-21T12:40:24.4487197Z mov.u32 %r20236, 0x0; 2026-02-21T12:40:24.4487442Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r20233, %r20234, %r20235, %r20236 }, [ %rd646 + 0 ], %rd555; 2026-02-21T12:40:24.4487507Z // end inline asm 2026-02-21T12:40:24.4487723Z .loc 1 55 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:55:32 2026-02-21T12:40:24.4487782Z bar.sync 0; 2026-02-21T12:40:24.4487870Z st.shared.v2.b32 [%r55], {%r20233, %r20234}; 2026-02-21T12:40:24.4487968Z st.shared.v2.b32 [%r56], {%r20235, %r20236}; 2026-02-21T12:40:24.4488030Z bar.sync 0; 2026-02-21T12:40:24.4488101Z ld.shared.b16 %rs1977, [%r57]; 2026-02-21T12:40:24.4488176Z ld.shared.b16 %rs1978, [%r57+256]; 2026-02-21T12:40:24.4488246Z ld.shared.b16 %rs1979, [%r57+16]; 2026-02-21T12:40:24.4488314Z ld.shared.b16 %rs1980, [%r57+272]; 2026-02-21T12:40:24.4488382Z ld.shared.b16 %rs1981, [%r58]; 2026-02-21T12:40:24.4488453Z ld.shared.b16 %rs1982, [%r58+256]; 2026-02-21T12:40:24.4488519Z ld.shared.b16 %rs1983, [%r58+16]; 2026-02-21T12:40:24.4488588Z ld.shared.b16 %rs1984, [%r58+272]; 2026-02-21T12:40:24.4488658Z cvt.f32.bf16 %r20497, %rs1977; 2026-02-21T12:40:24.4488722Z cvt.f32.bf16 %r20498, %rs1978; 2026-02-21T12:40:24.4488790Z cvt.f32.bf16 %r20499, %rs1981; 2026-02-21T12:40:24.4488856Z cvt.f32.bf16 %r20500, %rs1982; 2026-02-21T12:40:24.4488924Z cvt.f32.bf16 %r20757, %rs1979; 2026-02-21T12:40:24.4488989Z cvt.f32.bf16 %r20758, %rs1980; 
2026-02-21T12:40:24.4489051Z cvt.f32.bf16 %r20759, %rs1983; 2026-02-21T12:40:24.4489118Z cvt.f32.bf16 %r20760, %rs1984; 2026-02-21T12:40:24.4489324Z .loc 1 57 87 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:57:87 2026-02-21T12:40:24.4489386Z // begin inline asm 2026-02-21T12:40:24.4489451Z mov.u32 %r20237, 0x0; 2026-02-21T12:40:24.4489512Z mov.u32 %r20238, 0x0; 2026-02-21T12:40:24.4489571Z mov.u32 %r20239, 0x0; 2026-02-21T12:40:24.4489630Z mov.u32 %r20240, 0x0; 2026-02-21T12:40:24.4489774Z ld.global.v4.b32 { %r20237, %r20238, %r20239, %r20240 }, [ %rd645 + 0 ]; 2026-02-21T12:40:24.4489918Z // end inline asm 2026-02-21T12:40:24.4490125Z .loc 1 65 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:65:28 2026-02-21T12:40:24.4490250Z bar.sync 0; 2026-02-21T12:40:24.4490320Z st.shared.b8 [%r59], %r20237; 2026-02-21T12:40:24.4490393Z prmt.b32 %r21023, %r20237, 0, 0x7771U; 2026-02-21T12:40:24.4490460Z st.shared.b8 [%r60], %r21023; 2026-02-21T12:40:24.4490549Z prmt.b32 %r21024, %r20237, 0, 0x7772U; 2026-02-21T12:40:24.4490619Z st.shared.b8 [%r61+256], %r21024; 2026-02-21T12:40:24.4490689Z prmt.b32 %r21025, %r20237, 0, 0x7773U; 2026-02-21T12:40:24.4490760Z st.shared.b8 [%r62+256], %r21025; 2026-02-21T12:40:24.4490827Z st.shared.b8 [%r63+512], %r20238; 2026-02-21T12:40:24.4490894Z prmt.b32 %r21026, %r20238, 0, 0x7771U; 2026-02-21T12:40:24.4490963Z st.shared.b8 [%r64+512], %r21026; 2026-02-21T12:40:24.4491032Z prmt.b32 %r21027, %r20238, 0, 0x7772U; 2026-02-21T12:40:24.4491098Z st.shared.b8 [%r65+768], %r21027; 2026-02-21T12:40:24.4491167Z prmt.b32 %r21028, %r20238, 0, 0x7773U; 2026-02-21T12:40:24.4491238Z st.shared.b8 [%r66+768], %r21028; 2026-02-21T12:40:24.4491425Z st.shared.b8 [%r67+1024], %r20239; 2026-02-21T12:40:24.4491498Z prmt.b32 %r21029, %r20239, 0, 0x7771U; 2026-02-21T12:40:24.4491568Z st.shared.b8 [%r68+1024], %r21029; 2026-02-21T12:40:24.4491635Z prmt.b32 %r21030, %r20239, 0, 0x7772U; 2026-02-21T12:40:24.4491702Z st.shared.b8 [%r69+1280], 
%r21030; 2026-02-21T12:40:24.4491768Z prmt.b32 %r21031, %r20239, 0, 0x7773U; 2026-02-21T12:40:24.4491849Z st.shared.b8 [%r70+1280], %r21031; 2026-02-21T12:40:24.4491918Z st.shared.b8 [%r71+1536], %r20240; 2026-02-21T12:40:24.4491984Z prmt.b32 %r21032, %r20240, 0, 0x7771U; 2026-02-21T12:40:24.4492055Z st.shared.b8 [%r72+1536], %r21032; 2026-02-21T12:40:24.4492120Z prmt.b32 %r21033, %r20240, 0, 0x7772U; 2026-02-21T12:40:24.4492186Z st.shared.b8 [%r73+1792], %r21033; 2026-02-21T12:40:24.4492257Z prmt.b32 %r21034, %r20240, 0, 0x7773U; 2026-02-21T12:40:24.4492325Z st.shared.b8 [%r74+1792], %r21034; 2026-02-21T12:40:24.4492381Z bar.sync 0; 2026-02-21T12:40:24.4492451Z ld.shared.b32 %r21035, [%r75]; 2026-02-21T12:40:24.4492526Z prmt.b32 %r21036, %r21035, 0, 0x7770U; 2026-02-21T12:40:24.4492601Z cvt.u16.u32 %rs1985, %r21036; 2026-02-21T12:40:24.4492672Z prmt.b32 %r21037, %r21035, 0, 0x7771U; 2026-02-21T12:40:24.4492745Z cvt.u16.u32 %rs1986, %r21037; 2026-02-21T12:40:24.4492812Z prmt.b32 %r21038, %r21035, 0, 0x7772U; 2026-02-21T12:40:24.4492879Z cvt.u16.u32 %rs1987, %r21038; 2026-02-21T12:40:24.4492948Z prmt.b32 %r21039, %r21035, 0, 0x7773U; 2026-02-21T12:40:24.4493015Z cvt.u16.u32 %rs1988, %r21039; 2026-02-21T12:40:24.4493082Z ld.shared.b32 %r21040, [%r76]; 2026-02-21T12:40:24.4493149Z prmt.b32 %r21041, %r21040, 0, 0x7770U; 2026-02-21T12:40:24.4493217Z cvt.u16.u32 %rs1989, %r21041; 2026-02-21T12:40:24.4493283Z prmt.b32 %r21042, %r21040, 0, 0x7771U; 2026-02-21T12:40:24.4493347Z cvt.u16.u32 %rs1990, %r21042; 2026-02-21T12:40:24.4493419Z prmt.b32 %r21043, %r21040, 0, 0x7772U; 2026-02-21T12:40:24.4493487Z cvt.u16.u32 %rs1991, %r21043; 2026-02-21T12:40:24.4493558Z prmt.b32 %r21044, %r21040, 0, 0x7773U; 2026-02-21T12:40:24.4493621Z cvt.u16.u32 %rs1992, %r21044; 2026-02-21T12:40:24.4493692Z ld.shared.b32 %r21045, [%r77]; 2026-02-21T12:40:24.4493758Z prmt.b32 %r21046, %r21045, 0, 0x7770U; 2026-02-21T12:40:24.4493821Z cvt.u16.u32 %rs1993, %r21046; 2026-02-21T12:40:24.4493889Z 
prmt.b32 %r21047, %r21045, 0, 0x7771U; 2026-02-21T12:40:24.4493953Z cvt.u16.u32 %rs1994, %r21047; 2026-02-21T12:40:24.4494022Z prmt.b32 %r21048, %r21045, 0, 0x7772U; 2026-02-21T12:40:24.4494088Z cvt.u16.u32 %rs1995, %r21048; 2026-02-21T12:40:24.4494157Z prmt.b32 %r21049, %r21045, 0, 0x7773U; 2026-02-21T12:40:24.4494219Z cvt.u16.u32 %rs1996, %r21049; 2026-02-21T12:40:24.4494285Z ld.shared.b32 %r21050, [%r78]; 2026-02-21T12:40:24.4494355Z prmt.b32 %r21051, %r21050, 0, 0x7770U; 2026-02-21T12:40:24.4494496Z cvt.u16.u32 %rs1997, %r21051; 2026-02-21T12:40:24.4494562Z prmt.b32 %r21052, %r21050, 0, 0x7771U; 2026-02-21T12:40:24.4494628Z cvt.u16.u32 %rs1998, %r21052; 2026-02-21T12:40:24.4494749Z prmt.b32 %r21053, %r21050, 0, 0x7772U; 2026-02-21T12:40:24.4494821Z cvt.u16.u32 %rs1999, %r21053; 2026-02-21T12:40:24.4494888Z prmt.b32 %r21054, %r21050, 0, 0x7773U; 2026-02-21T12:40:24.4494953Z cvt.u16.u32 %rs2000, %r21054; 2026-02-21T12:40:24.4495157Z .loc 1 60 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:60:28 2026-02-21T12:40:24.4495224Z shl.b16 %rs2001, %rs1985, 4; 2026-02-21T12:40:24.4495291Z shl.b16 %rs2002, %rs1989, 4; 2026-02-21T12:40:24.4495353Z shl.b16 %rs2003, %rs1993, 4; 2026-02-21T12:40:24.4495415Z shl.b16 %rs2004, %rs1997, 4; 2026-02-21T12:40:24.4495480Z shl.b16 %rs2005, %rs1986, 4; 2026-02-21T12:40:24.4495542Z shl.b16 %rs2006, %rs1990, 4; 2026-02-21T12:40:24.4495603Z shl.b16 %rs2007, %rs1994, 4; 2026-02-21T12:40:24.4495667Z shl.b16 %rs2008, %rs1998, 4; 2026-02-21T12:40:24.4495735Z shl.b16 %rs2009, %rs1987, 4; 2026-02-21T12:40:24.4495798Z shl.b16 %rs2010, %rs1991, 4; 2026-02-21T12:40:24.4495913Z shl.b16 %rs2011, %rs1995, 4; 2026-02-21T12:40:24.4496027Z shl.b16 %rs2012, %rs1999, 4; 2026-02-21T12:40:24.4496091Z shl.b16 %rs2013, %rs1988, 4; 2026-02-21T12:40:24.4496156Z shl.b16 %rs2014, %rs1992, 4; 2026-02-21T12:40:24.4496216Z shl.b16 %rs2015, %rs1996, 4; 2026-02-21T12:40:24.4496281Z shl.b16 %rs2016, %rs2000, 4; 2026-02-21T12:40:24.4496600Z .loc 1 62 25 // 
cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4496669Z cvt.s16.s8 %rs2017, %rs2001; 2026-02-21T12:40:24.4496735Z shr.s16 %rs2018, %rs2017, 4; 2026-02-21T12:40:24.4496798Z cvt.s16.s8 %rs2019, %rs2003; 2026-02-21T12:40:24.4496859Z shr.s16 %rs2020, %rs2019, 4; 2026-02-21T12:40:24.4496930Z prmt.b32 %r21055, %r21035, 0, 0x8880U; 2026-02-21T12:40:24.4496994Z cvt.u16.u32 %rs2021, %r21055; 2026-02-21T12:40:24.4497058Z shr.s16 %rs2022, %rs2021, 4; 2026-02-21T12:40:24.4497125Z prmt.b32 %r21056, %r21045, 0, 0x8880U; 2026-02-21T12:40:24.4497197Z cvt.u16.u32 %rs2023, %r21056; 2026-02-21T12:40:24.4497264Z shr.s16 %rs2024, %rs2023, 4; 2026-02-21T12:40:24.4497470Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4497544Z cvt.rn.f32.s16 %r21057, %rs2024; 2026-02-21T12:40:24.4497610Z cvt.rn.f32.s16 %r21058, %rs2022; 2026-02-21T12:40:24.4497672Z cvt.rn.f32.s16 %r21059, %rs2020; 2026-02-21T12:40:24.4497735Z cvt.rn.f32.s16 %r21060, %rs2018; 2026-02-21T12:40:24.4497938Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4498000Z cvt.s16.s8 %rs2025, %rs2002; 2026-02-21T12:40:24.4498062Z shr.s16 %rs2026, %rs2025, 4; 2026-02-21T12:40:24.4498129Z cvt.s16.s8 %rs2027, %rs2004; 2026-02-21T12:40:24.4498190Z shr.s16 %rs2028, %rs2027, 4; 2026-02-21T12:40:24.4498258Z prmt.b32 %r21061, %r21040, 0, 0x8880U; 2026-02-21T12:40:24.4498327Z cvt.u16.u32 %rs2029, %r21061; 2026-02-21T12:40:24.4498390Z shr.s16 %rs2030, %rs2029, 4; 2026-02-21T12:40:24.4498461Z prmt.b32 %r21062, %r21050, 0, 0x8880U; 2026-02-21T12:40:24.4498524Z cvt.u16.u32 %rs2031, %r21062; 2026-02-21T12:40:24.4498591Z shr.s16 %rs2032, %rs2031, 4; 2026-02-21T12:40:24.4498796Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4498862Z cvt.rn.f32.s16 %r21063, %rs2032; 2026-02-21T12:40:24.4498930Z cvt.rn.f32.s16 %r21064, %rs2030; 2026-02-21T12:40:24.4498996Z 
cvt.rn.f32.s16 %r21065, %rs2028; 2026-02-21T12:40:24.4499058Z cvt.rn.f32.s16 %r21066, %rs2026; 2026-02-21T12:40:24.4499258Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4499321Z cvt.s16.s8 %rs2033, %rs2005; 2026-02-21T12:40:24.4499383Z shr.s16 %rs2034, %rs2033, 4; 2026-02-21T12:40:24.4499444Z cvt.s16.s8 %rs2035, %rs2007; 2026-02-21T12:40:24.4499602Z shr.s16 %rs2036, %rs2035, 4; 2026-02-21T12:40:24.4499670Z prmt.b32 %r21067, %r21035, 0, 0x9991U; 2026-02-21T12:40:24.4499802Z cvt.u16.u32 %rs2037, %r21067; 2026-02-21T12:40:24.4499867Z shr.s16 %rs2038, %rs2037, 4; 2026-02-21T12:40:24.4499935Z prmt.b32 %r21068, %r21045, 0, 0x9991U; 2026-02-21T12:40:24.4499998Z cvt.u16.u32 %rs2039, %r21068; 2026-02-21T12:40:24.4500060Z shr.s16 %rs2040, %rs2039, 4; 2026-02-21T12:40:24.4500257Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4500324Z cvt.rn.f32.s16 %r21069, %rs2040; 2026-02-21T12:40:24.4500388Z cvt.rn.f32.s16 %r21070, %rs2038; 2026-02-21T12:40:24.4500457Z cvt.rn.f32.s16 %r21071, %rs2036; 2026-02-21T12:40:24.4500522Z cvt.rn.f32.s16 %r21072, %rs2034; 2026-02-21T12:40:24.4500717Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4500783Z cvt.s16.s8 %rs2041, %rs2006; 2026-02-21T12:40:24.4500848Z shr.s16 %rs2042, %rs2041, 4; 2026-02-21T12:40:24.4500914Z cvt.s16.s8 %rs2043, %rs2008; 2026-02-21T12:40:24.4501040Z shr.s16 %rs2044, %rs2043, 4; 2026-02-21T12:40:24.4501184Z prmt.b32 %r21073, %r21040, 0, 0x9991U; 2026-02-21T12:40:24.4501251Z cvt.u16.u32 %rs2045, %r21073; 2026-02-21T12:40:24.4501313Z shr.s16 %rs2046, %rs2045, 4; 2026-02-21T12:40:24.4501386Z prmt.b32 %r21074, %r21050, 0, 0x9991U; 2026-02-21T12:40:24.4501451Z cvt.u16.u32 %rs2047, %r21074; 2026-02-21T12:40:24.4501514Z shr.s16 %rs2048, %rs2047, 4; 2026-02-21T12:40:24.4501714Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 
2026-02-21T12:40:24.4501778Z cvt.rn.f32.s16 %r21075, %rs2048; 2026-02-21T12:40:24.4501839Z cvt.rn.f32.s16 %r21076, %rs2046; 2026-02-21T12:40:24.4501908Z cvt.rn.f32.s16 %r21077, %rs2044; 2026-02-21T12:40:24.4501970Z cvt.rn.f32.s16 %r21078, %rs2042; 2026-02-21T12:40:24.4502175Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4502245Z cvt.s16.s8 %rs2049, %rs2009; 2026-02-21T12:40:24.4502308Z shr.s16 %rs2050, %rs2049, 4; 2026-02-21T12:40:24.4502373Z cvt.s16.s8 %rs2051, %rs2011; 2026-02-21T12:40:24.4502439Z shr.s16 %rs2052, %rs2051, 4; 2026-02-21T12:40:24.4502506Z prmt.b32 %r21079, %r21035, 0, 0xaaa2U; 2026-02-21T12:40:24.4502566Z cvt.u16.u32 %rs2053, %r21079; 2026-02-21T12:40:24.4502627Z shr.s16 %rs2054, %rs2053, 4; 2026-02-21T12:40:24.4502697Z prmt.b32 %r21080, %r21045, 0, 0xaaa2U; 2026-02-21T12:40:24.4502756Z cvt.u16.u32 %rs2055, %r21080; 2026-02-21T12:40:24.4502817Z shr.s16 %rs2056, %rs2055, 4; 2026-02-21T12:40:24.4503015Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4503078Z cvt.rn.f32.s16 %r21081, %rs2056; 2026-02-21T12:40:24.4503140Z cvt.rn.f32.s16 %r21082, %rs2054; 2026-02-21T12:40:24.4503203Z cvt.rn.f32.s16 %r21083, %rs2052; 2026-02-21T12:40:24.4503267Z cvt.rn.f32.s16 %r21084, %rs2050; 2026-02-21T12:40:24.4503462Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4503526Z cvt.s16.s8 %rs2057, %rs2010; 2026-02-21T12:40:24.4503591Z shr.s16 %rs2058, %rs2057, 4; 2026-02-21T12:40:24.4503650Z cvt.s16.s8 %rs2059, %rs2012; 2026-02-21T12:40:24.4503710Z shr.s16 %rs2060, %rs2059, 4; 2026-02-21T12:40:24.4503778Z prmt.b32 %r21085, %r21040, 0, 0xaaa2U; 2026-02-21T12:40:24.4503839Z cvt.u16.u32 %rs2061, %r21085; 2026-02-21T12:40:24.4503899Z shr.s16 %rs2062, %rs2061, 4; 2026-02-21T12:40:24.4503966Z prmt.b32 %r21086, %r21050, 0, 0xaaa2U; 2026-02-21T12:40:24.4504029Z cvt.u16.u32 %rs2063, %r21086; 2026-02-21T12:40:24.4504088Z shr.s16 
%rs2064, %rs2063, 4; 2026-02-21T12:40:24.4504278Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4504347Z cvt.rn.f32.s16 %r21087, %rs2064; 2026-02-21T12:40:24.4504470Z cvt.rn.f32.s16 %r21088, %rs2062; 2026-02-21T12:40:24.4504532Z cvt.rn.f32.s16 %r21089, %rs2060; 2026-02-21T12:40:24.4504598Z cvt.rn.f32.s16 %r21090, %rs2058; 2026-02-21T12:40:24.4504836Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4504898Z cvt.s16.s8 %rs2065, %rs2013; 2026-02-21T12:40:24.4504960Z shr.s16 %rs2066, %rs2065, 4; 2026-02-21T12:40:24.4505022Z cvt.s16.s8 %rs2067, %rs2015; 2026-02-21T12:40:24.4505082Z shr.s16 %rs2068, %rs2067, 4; 2026-02-21T12:40:24.4505146Z prmt.b32 %r21091, %r21035, 0, 0xbbb3U; 2026-02-21T12:40:24.4505216Z cvt.u16.u32 %rs2069, %r21091; 2026-02-21T12:40:24.4505283Z shr.s16 %rs2070, %rs2069, 4; 2026-02-21T12:40:24.4505349Z prmt.b32 %r21092, %r21045, 0, 0xbbb3U; 2026-02-21T12:40:24.4505412Z cvt.u16.u32 %rs2071, %r21092; 2026-02-21T12:40:24.4505471Z shr.s16 %rs2072, %rs2071, 4; 2026-02-21T12:40:24.4505664Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4505728Z cvt.rn.f32.s16 %r21093, %rs2072; 2026-02-21T12:40:24.4505792Z cvt.rn.f32.s16 %r21094, %rs2070; 2026-02-21T12:40:24.4505948Z cvt.rn.f32.s16 %r21095, %rs2068; 2026-02-21T12:40:24.4506013Z cvt.rn.f32.s16 %r21096, %rs2066; 2026-02-21T12:40:24.4506212Z .loc 1 62 25 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:62:25 2026-02-21T12:40:24.4506273Z cvt.s16.s8 %rs2073, %rs2014; 2026-02-21T12:40:24.4506334Z shr.s16 %rs2074, %rs2073, 4; 2026-02-21T12:40:24.4506394Z cvt.s16.s8 %rs2075, %rs2016; 2026-02-21T12:40:24.4506569Z shr.s16 %rs2076, %rs2075, 4; 2026-02-21T12:40:24.4506643Z prmt.b32 %r21097, %r21040, 0, 0xbbb3U; 2026-02-21T12:40:24.4506705Z cvt.u16.u32 %rs2077, %r21097; 2026-02-21T12:40:24.4506768Z shr.s16 %rs2078, %rs2077, 4; 2026-02-21T12:40:24.4506833Z 
prmt.b32 %r21098, %r21050, 0, 0xbbb3U; 2026-02-21T12:40:24.4506894Z cvt.u16.u32 %rs2079, %r21098; 2026-02-21T12:40:24.4506957Z shr.s16 %rs2080, %rs2079, 4; 2026-02-21T12:40:24.4507152Z .loc 1 80 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:80:32 2026-02-21T12:40:24.4507219Z cvt.rn.f32.s16 %r21099, %rs2080; 2026-02-21T12:40:24.4507281Z cvt.rn.f32.s16 %r21100, %rs2078; 2026-02-21T12:40:24.4507345Z cvt.rn.f32.s16 %r21101, %rs2076; 2026-02-21T12:40:24.4507406Z cvt.rn.f32.s16 %r21102, %rs2074; 2026-02-21T12:40:24.4507471Z bar.sync 0; 2026-02-21T12:40:24.4507598Z st.shared.v4.b32 [%r79], {%r21060, %r21058, %r21059, %r21057}; 2026-02-21T12:40:24.4507724Z st.shared.v4.b32 [%r79+8192], {%r21066, %r21064, %r21065, %r21063}; 2026-02-21T12:40:24.4507838Z st.shared.v4.b32 [%r80], {%r21072, %r21070, %r21071, %r21069}; 2026-02-21T12:40:24.4507958Z st.shared.v4.b32 [%r80+8192], {%r21078, %r21076, %r21077, %r21075}; 2026-02-21T12:40:24.4508065Z st.shared.v4.b32 [%r81], {%r21084, %r21082, %r21083, %r21081}; 2026-02-21T12:40:24.4508181Z st.shared.v4.b32 [%r81+8192], {%r21090, %r21088, %r21089, %r21087}; 2026-02-21T12:40:24.4508292Z st.shared.v4.b32 [%r82], {%r21096, %r21094, %r21095, %r21093}; 2026-02-21T12:40:24.4508504Z st.shared.v4.b32 [%r82+8192], {%r21102, %r21100, %r21101, %r21099}; 2026-02-21T12:40:24.4508563Z $L__tmp39: 2026-02-21T12:40:24.4508838Z .loc 2 291 36 // standard.py:291:36 @[ cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:87:40 ] 2026-02-21T12:40:24.4508902Z // begin inline asm 2026-02-21T12:40:24.4508980Z fence.proxy.async.shared::cta; 2026-02-21T12:40:24.4509038Z // end inline asm 2026-02-21T12:40:24.4509094Z bar.sync 0; 2026-02-21T12:40:24.4509178Z shfl.sync.idx.b32 %r21103, %r2, 0, 31, -1; 2026-02-21T12:40:24.4509250Z wgmma.fence.sync.aligned; 2026-02-21T12:40:24.4509315Z mov.pred %p55, -1; 2026-02-21T12:40:24.4509379Z // begin inline asm 2026-02-21T12:40:24.4512103Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22354,%r22355,%r22356,%r22357,%r22358,%r22359,%r22360,%r22361,%r22362,%r22363,%r22364,%r22365,%r22366,%r22367,%r22368,%r22369,%r22370,%r22371,%r22372,%r22373,%r22374,%r22375,%r22376,%r22377,%r22378,%r22379,%r22380,%r22381,%r22382,%r22383,%r22384,%r22385,%r22386,%r22387,%r22388,%r22389,%r22390,%r22391,%r22392,%r22393,%r22394,%r22395,%r22396,%r22397,%r22398,%r22399,%r22400,%r22401,%r22402,%r22403,%r22404,%r22405,%r22406,%r22407,%r22408,%r22409,%r22410,%r22411,%r22412,%r22413,%r22414,%r22415,%r22416,%r22417,%r22418,%r22419,%r22420,%r22421,%r22422,%r22423,%r22424,%r22425,%r22426,%r22427,%r22428,%r22429,%r22430,%r22431,%r22432,%r22433,%r22434,%r22435,%r22436,%r22437,%r22438,%r22439,%r22440,%r22441,%r22442,%r22443,%r22444,%r22445,%r22446,%r22447,%r22448,%r22449,%r22450,%r22451,%r22452,%r22453,%r22454,%r22455,%r22456,%r22457,%r22458,%r22459,%r22460,%r22461,%r22462,%r22463,%r22464,%r22465,%r22466,%r22467,%r22468,%r22469,%r22470,%r22471,%r22472,%r22473,%r22474,%r22475,%r22476,%r22477,%r22478,%r22479,%r22480,%r22481}, {%r20497,%r20498,%r20499,%r20500}, %rd30, %p55, 1, 1; 2026-02-21T12:40:24.4512318Z // end inline asm 2026-02-21T12:40:24.4512382Z // begin inline asm 2026-02-21T12:40:24.4515239Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r22354,%r22355,%r22356,%r22357,%r22358,%r22359,%r22360,%r22361,%r22362,%r22363,%r22364,%r22365,%r22366,%r22367,%r22368,%r22369,%r22370,%r22371,%r22372,%r22373,%r22374,%r22375,%r22376,%r22377,%r22378,%r22379,%r22380,%r22381,%r22382,%r22383,%r22384,%r22385,%r22386,%r22387,%r22388,%r22389,%r22390,%r22391,%r22392,%r22393,%r22394,%r22395,%r22396,%r22397,%r22398,%r22399,%r22400,%r22401,%r22402,%r22403,%r22404,%r22405,%r22406,%r22407,%r22408,%r22409,%r22410,%r22411,%r22412,%r22413,%r22414,%r22415,%r22416,%r22417,%r22418,%r22419,%r22420,%r22421,%r22422,%r22423,%r22424,%r22425,%r22426,%r22427,%r22428,%r22429,%r22430,%r22431,%r22432,%r22433,%r22434,%r22435,%r22436,%r22437,%r22438,%r22439,%r22440,%r22441,%r22442,%r22443,%r22444,%r22445,%r22446,%r22447,%r22448,%r22449,%r22450,%r22451,%r22452,%r22453,%r22454,%r22455,%r22456,%r22457,%r22458,%r22459,%r22460,%r22461,%r22462,%r22463,%r22464,%r22465,%r22466,%r22467,%r22468,%r22469,%r22470,%r22471,%r22472,%r22473,%r22474,%r22475,%r22476,%r22477,%r22478,%r22479,%r22480,%r22481}, {%r20757,%r20758,%r20759,%r20760}, %rd31, %p55, 1, 1; 2026-02-21T12:40:24.4515307Z // end inline asm 2026-02-21T12:40:24.4515388Z wgmma.commit_group.sync.aligned; 2026-02-21T12:40:24.4515448Z mov.b32 %r20890, 0; 2026-02-21T12:40:24.4515513Z mov.b32 %r20891, %r20890; 2026-02-21T12:40:24.4515571Z mov.b32 %r20889, %r18409; 2026-02-21T12:40:24.4515631Z // begin inline asm 2026-02-21T12:40:24.4518324Z // wait for regs: 
%r22354,%r22355,%r22356,%r22357,%r22358,%r22359,%r22360,%r22361,%r22362,%r22363,%r22364,%r22365,%r22366,%r22367,%r22368,%r22369,%r22370,%r22371,%r22372,%r22373,%r22374,%r22375,%r22376,%r22377,%r22378,%r22379,%r22380,%r22381,%r22382,%r22383,%r22384,%r22385,%r22386,%r22387,%r22388,%r22389,%r22390,%r22391,%r22392,%r22393,%r22394,%r22395,%r22396,%r22397,%r22398,%r22399,%r22400,%r22401,%r22402,%r22403,%r22404,%r22405,%r22406,%r22407,%r22408,%r22409,%r22410,%r22411,%r22412,%r22413,%r22414,%r22415,%r22416,%r22417,%r22418,%r22419,%r22420,%r22421,%r22422,%r22423,%r22424,%r22425,%r22426,%r22427,%r22428,%r22429,%r22430,%r22431,%r22432,%r22433,%r22434,%r22435,%r22436,%r22437,%r22438,%r22439,%r22440,%r22441,%r22442,%r22443,%r22444,%r22445,%r22446,%r22447,%r22448,%r22449,%r22450,%r22451,%r22452,%r22453,%r22454,%r22455,%r22456,%r22457,%r22458,%r22459,%r22460,%r22461,%r22462,%r22463,%r22464,%r22465,%r22466,%r22467,%r22468,%r22469,%r22470,%r22471,%r22472,%r22473,%r22474,%r22475,%r22476,%r22477,%r22478,%r22479,%r22480,%r22481,%r20889,%r20890,%r20891 2026-02-21T12:40:24.4518412Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:40:24.4518469Z // end inline asm 2026-02-21T12:40:24.4518523Z $L__tmp40: 2026-02-21T12:40:24.4518740Z .loc 1 43 126 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:43:126 2026-02-21T12:40:24.4518804Z add.s64 %rd647, %rd647, 8; 2026-02-21T12:40:24.4518868Z add.s64 %rd646, %rd646, 32; 2026-02-21T12:40:24.4518935Z add.s64 %rd645, %rd645, 10240; 2026-02-21T12:40:24.4519092Z setp.lt.u64 %p57, %rd647, 4088; 2026-02-21T12:40:24.4519153Z @%p57 bra $L__BB0_9; 2026-02-21T12:40:24.4519326Z // %bb.10: // in Loop: Header=BB0_6 Depth=1 2026-02-21T12:40:24.4519533Z .loc 1 34 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:34:32 2026-02-21T12:40:24.4519598Z or.b64 %rd577, %rd140, %rd5; 2026-02-21T12:40:24.4519658Z or.b64 %rd578, %rd140, %rd6; 2026-02-21T12:40:24.4519722Z or.b64 %rd579, %rd140, %rd7; 2026-02-21T12:40:24.4519782Z or.b64 %rd580, %rd140, 
%rd8; 2026-02-21T12:40:24.4519841Z or.b64 %rd581, %rd140, %rd9; 2026-02-21T12:40:24.4519903Z or.b64 %rd582, %rd140, %rd10; 2026-02-21T12:40:24.4519968Z or.b64 %rd583, %rd140, %rd11; 2026-02-21T12:40:24.4520028Z or.b64 %rd584, %rd140, %rd12; 2026-02-21T12:40:24.4520089Z or.b64 %rd585, %rd140, %rd13; 2026-02-21T12:40:24.4520153Z or.b64 %rd586, %rd140, %rd14; 2026-02-21T12:40:24.4520212Z or.b64 %rd587, %rd140, %rd15; 2026-02-21T12:40:24.4520273Z or.b64 %rd588, %rd140, %rd16; 2026-02-21T12:40:24.4520336Z or.b64 %rd589, %rd140, %rd17; 2026-02-21T12:40:24.4520465Z or.b64 %rd590, %rd140, %rd18; 2026-02-21T12:40:24.4520583Z or.b64 %rd591, %rd140, %rd19; 2026-02-21T12:40:24.4520645Z or.b64 %rd592, %rd140, %rd20; 2026-02-21T12:40:24.4520845Z .loc 1 36 32 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:36:32 2026-02-21T12:40:24.4520905Z or.b64 %rd593, %rd141, %rd22; 2026-02-21T12:40:24.4521095Z .loc 1 90 28 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:90:28 2026-02-21T12:40:24.4521184Z cvt.rn.bf16x2.f32 %r21248, %r22355, %r22354; 2026-02-21T12:40:24.4521263Z cvt.rn.bf16x2.f32 %r21249, %r22357, %r22356; 2026-02-21T12:40:24.4521340Z cvt.rn.bf16x2.f32 %r21250, %r22359, %r22358; 2026-02-21T12:40:24.4521417Z cvt.rn.bf16x2.f32 %r21251, %r22361, %r22360; 2026-02-21T12:40:24.4521492Z cvt.rn.bf16x2.f32 %r21252, %r22363, %r22362; 2026-02-21T12:40:24.4521569Z cvt.rn.bf16x2.f32 %r21253, %r22365, %r22364; 2026-02-21T12:40:24.4521644Z cvt.rn.bf16x2.f32 %r21254, %r22367, %r22366; 2026-02-21T12:40:24.4521726Z cvt.rn.bf16x2.f32 %r21255, %r22369, %r22368; 2026-02-21T12:40:24.4521802Z cvt.rn.bf16x2.f32 %r21256, %r22371, %r22370; 2026-02-21T12:40:24.4521877Z cvt.rn.bf16x2.f32 %r21257, %r22373, %r22372; 2026-02-21T12:40:24.4521955Z cvt.rn.bf16x2.f32 %r21258, %r22375, %r22374; 2026-02-21T12:40:24.4522029Z cvt.rn.bf16x2.f32 %r21259, %r22377, %r22376; 2026-02-21T12:40:24.4522105Z cvt.rn.bf16x2.f32 %r21260, %r22379, %r22378; 2026-02-21T12:40:24.4522179Z cvt.rn.bf16x2.f32 
%r21261, %r22381, %r22380; 2026-02-21T12:40:24.4522257Z cvt.rn.bf16x2.f32 %r21262, %r22383, %r22382; 2026-02-21T12:40:24.4522332Z cvt.rn.bf16x2.f32 %r21263, %r22385, %r22384; 2026-02-21T12:40:24.4522406Z cvt.rn.bf16x2.f32 %r21264, %r22387, %r22386; 2026-02-21T12:40:24.4522483Z cvt.rn.bf16x2.f32 %r21265, %r22389, %r22388; 2026-02-21T12:40:24.4522556Z cvt.rn.bf16x2.f32 %r21266, %r22391, %r22390; 2026-02-21T12:40:24.4522633Z cvt.rn.bf16x2.f32 %r21267, %r22393, %r22392; 2026-02-21T12:40:24.4522711Z cvt.rn.bf16x2.f32 %r21268, %r22395, %r22394; 2026-02-21T12:40:24.4522806Z cvt.rn.bf16x2.f32 %r21269, %r22397, %r22396; 2026-02-21T12:40:24.4522887Z cvt.rn.bf16x2.f32 %r21270, %r22399, %r22398; 2026-02-21T12:40:24.4522963Z cvt.rn.bf16x2.f32 %r21271, %r22401, %r22400; 2026-02-21T12:40:24.4523042Z cvt.rn.bf16x2.f32 %r21272, %r22403, %r22402; 2026-02-21T12:40:24.4523116Z cvt.rn.bf16x2.f32 %r21273, %r22405, %r22404; 2026-02-21T12:40:24.4523191Z cvt.rn.bf16x2.f32 %r21274, %r22407, %r22406; 2026-02-21T12:40:24.4523268Z cvt.rn.bf16x2.f32 %r21275, %r22409, %r22408; 2026-02-21T12:40:24.4523343Z cvt.rn.bf16x2.f32 %r21276, %r22411, %r22410; 2026-02-21T12:40:24.4523417Z cvt.rn.bf16x2.f32 %r21277, %r22413, %r22412; 2026-02-21T12:40:24.4523495Z cvt.rn.bf16x2.f32 %r21278, %r22415, %r22414; 2026-02-21T12:40:24.4523570Z cvt.rn.bf16x2.f32 %r21279, %r22417, %r22416; 2026-02-21T12:40:24.4523708Z cvt.rn.bf16x2.f32 %r21280, %r22419, %r22418; 2026-02-21T12:40:24.4523783Z cvt.rn.bf16x2.f32 %r21281, %r22421, %r22420; 2026-02-21T12:40:24.4523908Z cvt.rn.bf16x2.f32 %r21282, %r22423, %r22422; 2026-02-21T12:40:24.4523985Z cvt.rn.bf16x2.f32 %r21283, %r22425, %r22424; 2026-02-21T12:40:24.4524063Z cvt.rn.bf16x2.f32 %r21284, %r22427, %r22426; 2026-02-21T12:40:24.4524139Z cvt.rn.bf16x2.f32 %r21285, %r22429, %r22428; 2026-02-21T12:40:24.4524215Z cvt.rn.bf16x2.f32 %r21286, %r22431, %r22430; 2026-02-21T12:40:24.4524294Z cvt.rn.bf16x2.f32 %r21287, %r22433, %r22432; 2026-02-21T12:40:24.4524373Z cvt.rn.bf16x2.f32 
%r21288, %r22435, %r22434; 2026-02-21T12:40:24.4524449Z cvt.rn.bf16x2.f32 %r21289, %r22437, %r22436; 2026-02-21T12:40:24.4524524Z cvt.rn.bf16x2.f32 %r21290, %r22439, %r22438; 2026-02-21T12:40:24.4524600Z cvt.rn.bf16x2.f32 %r21291, %r22441, %r22440; 2026-02-21T12:40:24.4524677Z cvt.rn.bf16x2.f32 %r21292, %r22443, %r22442; 2026-02-21T12:40:24.4524755Z cvt.rn.bf16x2.f32 %r21293, %r22445, %r22444; 2026-02-21T12:40:24.4524829Z cvt.rn.bf16x2.f32 %r21294, %r22447, %r22446; 2026-02-21T12:40:24.4524952Z cvt.rn.bf16x2.f32 %r21295, %r22449, %r22448; 2026-02-21T12:40:24.4525075Z cvt.rn.bf16x2.f32 %r21296, %r22451, %r22450; 2026-02-21T12:40:24.4525152Z cvt.rn.bf16x2.f32 %r21297, %r22453, %r22452; 2026-02-21T12:40:24.4525228Z cvt.rn.bf16x2.f32 %r21298, %r22455, %r22454; 2026-02-21T12:40:24.4525303Z cvt.rn.bf16x2.f32 %r21299, %r22457, %r22456; 2026-02-21T12:40:24.4525376Z cvt.rn.bf16x2.f32 %r21300, %r22459, %r22458; 2026-02-21T12:40:24.4525451Z cvt.rn.bf16x2.f32 %r21301, %r22461, %r22460; 2026-02-21T12:40:24.4525529Z cvt.rn.bf16x2.f32 %r21302, %r22463, %r22462; 2026-02-21T12:40:24.4525604Z cvt.rn.bf16x2.f32 %r21303, %r22465, %r22464; 2026-02-21T12:40:24.4525680Z cvt.rn.bf16x2.f32 %r21304, %r22467, %r22466; 2026-02-21T12:40:24.4525759Z cvt.rn.bf16x2.f32 %r21305, %r22469, %r22468; 2026-02-21T12:40:24.4525845Z cvt.rn.bf16x2.f32 %r21306, %r22471, %r22470; 2026-02-21T12:40:24.4525925Z cvt.rn.bf16x2.f32 %r21307, %r22473, %r22472; 2026-02-21T12:40:24.4526012Z cvt.rn.bf16x2.f32 %r21308, %r22475, %r22474; 2026-02-21T12:40:24.4526093Z cvt.rn.bf16x2.f32 %r21309, %r22477, %r22476; 2026-02-21T12:40:24.4526168Z cvt.rn.bf16x2.f32 %r21310, %r22479, %r22478; 2026-02-21T12:40:24.4526242Z cvt.rn.bf16x2.f32 %r21311, %r22481, %r22480; 2026-02-21T12:40:24.4526581Z .loc 1 91 22 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:91:22 2026-02-21T12:40:24.4526673Z mad.lo.s64 %rd594, %rd577, 2560, %rd162; 2026-02-21T12:40:24.4526741Z shl.b64 %rd595, %rd593, 1; 2026-02-21T12:40:24.4526811Z add.s64 
%rd561, %rd594, %rd595; 2026-02-21T12:40:24.4526885Z mad.lo.s64 %rd596, %rd578, 2560, %rd162; 2026-02-21T12:40:24.4526949Z add.s64 %rd562, %rd596, %rd595; 2026-02-21T12:40:24.4527022Z mad.lo.s64 %rd597, %rd579, 2560, %rd162; 2026-02-21T12:40:24.4527086Z add.s64 %rd563, %rd597, %rd595; 2026-02-21T12:40:24.4527155Z mad.lo.s64 %rd598, %rd580, 2560, %rd162; 2026-02-21T12:40:24.4527219Z add.s64 %rd564, %rd598, %rd595; 2026-02-21T12:40:24.4527290Z mad.lo.s64 %rd599, %rd581, 2560, %rd162; 2026-02-21T12:40:24.4527356Z add.s64 %rd565, %rd599, %rd595; 2026-02-21T12:40:24.4527425Z mad.lo.s64 %rd600, %rd582, 2560, %rd162; 2026-02-21T12:40:24.4527490Z add.s64 %rd566, %rd600, %rd595; 2026-02-21T12:40:24.4527561Z mad.lo.s64 %rd601, %rd583, 2560, %rd162; 2026-02-21T12:40:24.4527622Z add.s64 %rd567, %rd601, %rd595; 2026-02-21T12:40:24.4527690Z mad.lo.s64 %rd602, %rd584, 2560, %rd162; 2026-02-21T12:40:24.4527756Z add.s64 %rd568, %rd602, %rd595; 2026-02-21T12:40:24.4527824Z mad.lo.s64 %rd603, %rd585, 2560, %rd162; 2026-02-21T12:40:24.4527885Z add.s64 %rd569, %rd603, %rd595; 2026-02-21T12:40:24.4527955Z mad.lo.s64 %rd604, %rd586, 2560, %rd162; 2026-02-21T12:40:24.4528018Z add.s64 %rd570, %rd604, %rd595; 2026-02-21T12:40:24.4528086Z mad.lo.s64 %rd605, %rd587, 2560, %rd162; 2026-02-21T12:40:24.4528150Z add.s64 %rd571, %rd605, %rd595; 2026-02-21T12:40:24.4528308Z mad.lo.s64 %rd606, %rd588, 2560, %rd162; 2026-02-21T12:40:24.4528371Z add.s64 %rd572, %rd606, %rd595; 2026-02-21T12:40:24.4528503Z mad.lo.s64 %rd607, %rd589, 2560, %rd162; 2026-02-21T12:40:24.4528568Z add.s64 %rd573, %rd607, %rd595; 2026-02-21T12:40:24.4528636Z mad.lo.s64 %rd608, %rd590, 2560, %rd162; 2026-02-21T12:40:24.4528698Z add.s64 %rd574, %rd608, %rd595; 2026-02-21T12:40:24.4528768Z mad.lo.s64 %rd609, %rd591, 2560, %rd162; 2026-02-21T12:40:24.4528829Z add.s64 %rd575, %rd609, %rd595; 2026-02-21T12:40:24.4528896Z mad.lo.s64 %rd610, %rd592, 2560, %rd162; 2026-02-21T12:40:24.4528957Z add.s64 %rd576, %rd610, %rd595; 
2026-02-21T12:40:24.4529162Z .loc 1 91 81 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:91:81 2026-02-21T12:40:24.4529220Z bar.sync 0; 2026-02-21T12:40:24.4529339Z st.shared.v4.b32 [%r83], {%r21248, %r21250, %r21252, %r21254}; 2026-02-21T12:40:24.4529453Z st.shared.v4.b32 [%r84], {%r21256, %r21258, %r21260, %r21262}; 2026-02-21T12:40:24.4529565Z st.shared.v4.b32 [%r85], {%r21264, %r21266, %r21268, %r21270}; 2026-02-21T12:40:24.4529789Z st.shared.v4.b32 [%r86], {%r21272, %r21274, %r21276, %r21278}; 2026-02-21T12:40:24.4529903Z st.shared.v4.b32 [%r87], {%r21280, %r21282, %r21284, %r21286}; 2026-02-21T12:40:24.4530008Z st.shared.v4.b32 [%r88], {%r21288, %r21290, %r21292, %r21294}; 2026-02-21T12:40:24.4530114Z st.shared.v4.b32 [%r89], {%r21296, %r21298, %r21300, %r21302}; 2026-02-21T12:40:24.4530234Z st.shared.v4.b32 [%r90], {%r21304, %r21306, %r21308, %r21310}; 2026-02-21T12:40:24.4530292Z bar.sync 0; 2026-02-21T12:40:24.4530353Z // begin inline asm 2026-02-21T12:40:24.4530559Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21104, %r21105, %r21106, %r21107}, [%r21108]; 2026-02-21T12:40:24.4530619Z // end inline asm 2026-02-21T12:40:24.4530679Z // begin inline asm 2026-02-21T12:40:24.4530874Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21109, %r21110, %r21111, %r21112}, [%r21113]; 2026-02-21T12:40:24.4530937Z // end inline asm 2026-02-21T12:40:24.4530994Z // begin inline asm 2026-02-21T12:40:24.4531189Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21114, %r21115, %r21116, %r21117}, [%r21118]; 2026-02-21T12:40:24.4531247Z // end inline asm 2026-02-21T12:40:24.4531307Z // begin inline asm 2026-02-21T12:40:24.4531496Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21119, %r21120, %r21121, %r21122}, [%r21123]; 2026-02-21T12:40:24.4531551Z // end inline asm 2026-02-21T12:40:24.4531611Z // begin inline asm 2026-02-21T12:40:24.4531799Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21124, %r21125, %r21126, %r21127}, [%r21128]; 2026-02-21T12:40:24.4531854Z // end inline asm 
2026-02-21T12:40:24.4531914Z // begin inline asm 2026-02-21T12:40:24.4532104Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21129, %r21130, %r21131, %r21132}, [%r21133]; 2026-02-21T12:40:24.4532159Z // end inline asm 2026-02-21T12:40:24.4532215Z // begin inline asm 2026-02-21T12:40:24.4532408Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21134, %r21135, %r21136, %r21137}, [%r21138]; 2026-02-21T12:40:24.4532465Z // end inline asm 2026-02-21T12:40:24.4532523Z // begin inline asm 2026-02-21T12:40:24.4532719Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21139, %r21140, %r21141, %r21142}, [%r21143]; 2026-02-21T12:40:24.4532775Z // end inline asm 2026-02-21T12:40:24.4532830Z bar.sync 0; 2026-02-21T12:40:24.4532946Z st.shared.v4.b32 [%r83], {%r21249, %r21251, %r21253, %r21255}; 2026-02-21T12:40:24.4533054Z st.shared.v4.b32 [%r84], {%r21257, %r21259, %r21261, %r21263}; 2026-02-21T12:40:24.4533159Z st.shared.v4.b32 [%r85], {%r21265, %r21267, %r21269, %r21271}; 2026-02-21T12:40:24.4533275Z st.shared.v4.b32 [%r86], {%r21273, %r21275, %r21277, %r21279}; 2026-02-21T12:40:24.4533386Z st.shared.v4.b32 [%r87], {%r21281, %r21283, %r21285, %r21287}; 2026-02-21T12:40:24.4533495Z st.shared.v4.b32 [%r88], {%r21289, %r21291, %r21293, %r21295}; 2026-02-21T12:40:24.4533602Z st.shared.v4.b32 [%r89], {%r21297, %r21299, %r21301, %r21303}; 2026-02-21T12:40:24.4533770Z st.shared.v4.b32 [%r90], {%r21305, %r21307, %r21309, %r21311}; 2026-02-21T12:40:24.4533824Z bar.sync 0; 2026-02-21T12:40:24.4533931Z // begin inline asm 2026-02-21T12:40:24.4534128Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21144, %r21145, %r21146, %r21147}, [%r21108]; 2026-02-21T12:40:24.4534185Z // end inline asm 2026-02-21T12:40:24.4534243Z // begin inline asm 2026-02-21T12:40:24.4534433Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21149, %r21150, %r21151, %r21152}, [%r21113]; 2026-02-21T12:40:24.4534501Z // end inline asm 2026-02-21T12:40:24.4534564Z // begin inline asm 2026-02-21T12:40:24.4534758Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21154, %r21155, %r21156, %r21157}, [%r21118]; 2026-02-21T12:40:24.4534816Z // end inline asm 2026-02-21T12:40:24.4534873Z // begin inline asm 2026-02-21T12:40:24.4535060Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21159, %r21160, %r21161, %r21162}, [%r21123]; 2026-02-21T12:40:24.4535116Z // end inline asm 2026-02-21T12:40:24.4535176Z // begin inline asm 2026-02-21T12:40:24.4535365Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21164, %r21165, %r21166, %r21167}, [%r21128]; 2026-02-21T12:40:24.4535534Z // end inline asm 2026-02-21T12:40:24.4535603Z // begin inline asm 2026-02-21T12:40:24.4535797Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21169, %r21170, %r21171, %r21172}, [%r21133]; 2026-02-21T12:40:24.4535856Z // end inline asm 2026-02-21T12:40:24.4535921Z // begin inline asm 2026-02-21T12:40:24.4536117Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21174, %r21175, %r21176, %r21177}, [%r21138]; 2026-02-21T12:40:24.4536175Z // end inline asm 2026-02-21T12:40:24.4536234Z // begin inline asm 2026-02-21T12:40:24.4536430Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r21179, %r21180, %r21181, %r21182}, [%r21143]; 2026-02-21T12:40:24.4536604Z // end inline asm 2026-02-21T12:40:24.4536682Z // begin inline asm 2026-02-21T12:40:24.4536830Z st.global.v4.b32 [ %rd561 + 0 ], { %r21104, %r21105, %r21106, %r21107 }; 2026-02-21T12:40:24.4536894Z // end inline asm 2026-02-21T12:40:24.4536954Z // begin inline asm 2026-02-21T12:40:24.4537088Z st.global.v4.b32 [ %rd562 + 0 ], { %r21109, %r21110, %r21111, %r21112 }; 2026-02-21T12:40:24.4537154Z // end inline asm 2026-02-21T12:40:24.4537215Z // begin inline asm 2026-02-21T12:40:24.4537340Z st.global.v4.b32 [ %rd563 + 0 ], { %r21144, %r21145, %r21146, %r21147 }; 2026-02-21T12:40:24.4537406Z // end inline asm 2026-02-21T12:40:24.4537467Z // begin inline asm 2026-02-21T12:40:24.4537591Z st.global.v4.b32 [ %rd564 + 0 ], { %r21149, %r21150, %r21151, %r21152 }; 2026-02-21T12:40:24.4537658Z // end inline 
asm 2026-02-21T12:40:24.4537718Z // begin inline asm 2026-02-21T12:40:24.4537841Z st.global.v4.b32 [ %rd565 + 0 ], { %r21114, %r21115, %r21116, %r21117 }; 2026-02-21T12:40:24.4537900Z // end inline asm 2026-02-21T12:40:24.4537971Z // begin inline asm 2026-02-21T12:40:24.4538095Z st.global.v4.b32 [ %rd566 + 0 ], { %r21119, %r21120, %r21121, %r21122 }; 2026-02-21T12:40:24.4538155Z // end inline asm 2026-02-21T12:40:24.4538223Z // begin inline asm 2026-02-21T12:40:24.4538344Z st.global.v4.b32 [ %rd567 + 0 ], { %r21154, %r21155, %r21156, %r21157 }; 2026-02-21T12:40:24.4538406Z // end inline asm 2026-02-21T12:40:24.4538475Z // begin inline asm 2026-02-21T12:40:24.4538596Z st.global.v4.b32 [ %rd568 + 0 ], { %r21159, %r21160, %r21161, %r21162 }; 2026-02-21T12:40:24.4538656Z // end inline asm 2026-02-21T12:40:24.4538720Z // begin inline asm 2026-02-21T12:40:24.4538845Z st.global.v4.b32 [ %rd569 + 0 ], { %r21124, %r21125, %r21126, %r21127 }; 2026-02-21T12:40:24.4538904Z // end inline asm 2026-02-21T12:40:24.4538963Z // begin inline asm 2026-02-21T12:40:24.4539090Z st.global.v4.b32 [ %rd570 + 0 ], { %r21129, %r21130, %r21131, %r21132 }; 2026-02-21T12:40:24.4539148Z // end inline asm 2026-02-21T12:40:24.4539209Z // begin inline asm 2026-02-21T12:40:24.4539330Z st.global.v4.b32 [ %rd571 + 0 ], { %r21164, %r21165, %r21166, %r21167 }; 2026-02-21T12:40:24.4539393Z // end inline asm 2026-02-21T12:40:24.4539545Z // begin inline asm 2026-02-21T12:40:24.4539671Z st.global.v4.b32 [ %rd572 + 0 ], { %r21169, %r21170, %r21171, %r21172 }; 2026-02-21T12:40:24.4539801Z // end inline asm 2026-02-21T12:40:24.4539861Z // begin inline asm 2026-02-21T12:40:24.4539983Z st.global.v4.b32 [ %rd573 + 0 ], { %r21134, %r21135, %r21136, %r21137 }; 2026-02-21T12:40:24.4540045Z // end inline asm 2026-02-21T12:40:24.4540108Z // begin inline asm 2026-02-21T12:40:24.4540230Z st.global.v4.b32 [ %rd574 + 0 ], { %r21139, %r21140, %r21141, %r21142 }; 2026-02-21T12:40:24.4540289Z // end inline asm 
2026-02-21T12:40:24.4540353Z // begin inline asm 2026-02-21T12:40:24.4540475Z st.global.v4.b32 [ %rd575 + 0 ], { %r21174, %r21175, %r21176, %r21177 }; 2026-02-21T12:40:24.4540533Z // end inline asm 2026-02-21T12:40:24.4540597Z // begin inline asm 2026-02-21T12:40:24.4540718Z st.global.v4.b32 [ %rd576 + 0 ], { %r21179, %r21180, %r21181, %r21182 }; 2026-02-21T12:40:24.4540776Z // end inline asm 2026-02-21T12:40:24.4540995Z .loc 1 22 120 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:22:120 2026-02-21T12:40:24.4541132Z add.s64 %rd641, %rd641, 1; 2026-02-21T12:40:24.4541267Z setp.ne.b64 %p58, %rd641, %rd2; 2026-02-21T12:40:24.4541336Z @%p58 bra $L__BB0_6; 2026-02-21T12:40:24.4541437Z $L__BB0_11: // %._crit_edge 2026-02-21T12:40:24.4541649Z .loc 1 22 4 // cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py:22:4 2026-02-21T12:40:24.4541703Z ret; 2026-02-21T12:40:24.4541766Z $L__tmp41: 2026-02-21T12:40:24.4541828Z $L__func_end0: 2026-02-21T12:40:24.4541919Z // -- End function 2026-02-21T12:40:24.4541980Z } 2026-02-21T12:40:24.4542231Z .file 1 "/tmp/torchinductor_root/kt/cktdtwaoq5ja44zhf6gxdqabu4xmjsoz3h5d336qt7bbwl4tes66.py" 2026-02-21T12:40:24.4542445Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T12:40:24.4542516Z .section .debug_abbrev 2026-02-21T12:40:24.4542580Z { 2026-02-21T12:40:24.4542678Z .b8 1 // Abbreviation Code 2026-02-21T12:40:24.4542780Z .b8 17 // DW_TAG_compile_unit 2026-02-21T12:40:24.4542873Z .b8 1 // DW_CHILDREN_yes 2026-02-21T12:40:24.4542961Z .b8 37 // DW_AT_producer 2026-02-21T12:40:24.4543042Z .b8 8 // DW_FORM_string 2026-02-21T12:40:24.4543123Z .b8 19 // DW_AT_language 2026-02-21T12:40:24.4543214Z .b8 5 // DW_FORM_data2 2026-02-21T12:40:24.4543295Z .b8 3 // DW_AT_name 2026-02-21T12:40:24.4543377Z .b8 8 // DW_FORM_string 2026-02-21T12:40:24.4543467Z .b8 16 // DW_AT_stmt_list 2026-02-21T12:40:24.4543552Z .b8 6 // DW_FORM_data4 2026-02-21T12:40:24.4543635Z .b8 27 // 
DW_AT_comp_dir 2026-02-21T12:40:24.4543720Z .b8 8 // DW_FORM_string 2026-02-21T12:40:24.4543801Z .b8 0 // EOM(1) 2026-02-21T12:40:24.4543877Z .b8 0 // EOM(2) 2026-02-21T12:40:24.4543967Z .b8 2 // Abbreviation Code 2026-02-21T12:40:24.4544064Z .b8 46 // DW_TAG_subprogram 2026-02-21T12:40:24.4544146Z .b8 0 // DW_CHILDREN_no 2026-02-21T12:40:24.4544227Z .b8 3 // DW_AT_name 2026-02-21T12:40:24.4544314Z .b8 8 // DW_FORM_string 2026-02-21T12:40:24.4544398Z .b8 32 // DW_AT_inline 2026-02-21T12:40:24.4544480Z .b8 11 // DW_FORM_data1 2026-02-21T12:40:24.4544558Z .b8 0 // EOM(1) 2026-02-21T12:40:24.4544703Z .b8 0 // EOM(2) 2026-02-21T12:40:24.4544792Z .b8 3 // Abbreviation Code 2026-02-21T12:40:24.4544937Z .b8 46 // DW_TAG_subprogram 2026-02-21T12:40:24.4545024Z .b8 1 // DW_CHILDREN_yes 2026-02-21T12:40:24.4545105Z .b8 17 // DW_AT_low_pc 2026-02-21T12:40:24.4545184Z .b8 1 // DW_FORM_addr 2026-02-21T12:40:24.4545272Z .b8 18 // DW_AT_high_pc 2026-02-21T12:40:24.4545349Z .b8 1 // DW_FORM_addr 2026-02-21T12:40:24.4545444Z .b8 49 // DW_AT_abstract_origin 2026-02-21T12:40:24.4545529Z .b8 19 // DW_FORM_ref4 2026-02-21T12:40:24.4545602Z .b8 0 // EOM(1) 2026-02-21T12:40:24.4545677Z .b8 0 // EOM(2) 2026-02-21T12:40:24.4545772Z .b8 4 // Abbreviation Code 2026-02-21T12:40:24.4545875Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T12:40:24.4546056Z .b8 0 // DW_CHILDREN_no 2026-02-21T12:40:24.4546152Z .b8 49 // DW_AT_abstract_origin 2026-02-21T12:40:24.4546237Z .b8 19 // DW_FORM_ref4 2026-02-21T12:40:24.4546315Z .b8 17 // DW_AT_low_pc 2026-02-21T12:40:24.4546390Z .b8 1 // DW_FORM_addr 2026-02-21T12:40:24.4546596Z .b8 18 // DW_AT_high_pc 2026-02-21T12:40:24.4546678Z .b8 1 // DW_FORM_addr 2026-02-21T12:40:24.4546763Z .b8 88 // DW_AT_call_file 2026-02-21T12:40:24.4546846Z .b8 11 // DW_FORM_data1 2026-02-21T12:40:24.4546938Z .b8 89 // DW_AT_call_line 2026-02-21T12:40:24.4547022Z .b8 11 // DW_FORM_data1 2026-02-21T12:40:24.4547110Z .b8 87 // DW_AT_call_column 2026-02-21T12:40:24.4547196Z .b8 
11 // DW_FORM_data1 2026-02-21T12:40:24.4547267Z .b8 0 // EOM(1) 2026-02-21T12:40:24.4547338Z .b8 0 // EOM(2) 2026-02-21T12:40:24.4547413Z .b8 0 // EOM(3) 2026-02-21T12:40:24.4547469Z } 2026-02-21T12:40:24.4547533Z .section .debug_info 2026-02-21T12:40:24.4547588Z { 2026-02-21T12:40:24.4547677Z .b32 178 // Length of Unit 2026-02-21T12:40:24.4547774Z .b8 2 // DWARF version number 2026-02-21T12:40:24.4547828Z .b8 0 2026-02-21T12:40:24.4547965Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T12:40:24.4548061Z .b8 8 // Address Size (in bytes) 2026-02-21T12:40:24.4548179Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T12:40:24.4548272Z .b8 116 // DW_AT_producer 2026-02-21T12:40:24.4548386Z .b8 114 2026-02-21T12:40:24.4548444Z .b8 105 2026-02-21T12:40:24.4548502Z .b8 116 2026-02-21T12:40:24.4548555Z .b8 111 2026-02-21T12:40:24.4548607Z .b8 110 2026-02-21T12:40:24.4548658Z .b8 0 2026-02-21T12:40:24.4548744Z .b8 2 // DW_AT_language 2026-02-21T12:40:24.4548796Z .b8 0 2026-02-21T12:40:24.4548878Z .b8 99 // DW_AT_name 2026-02-21T12:40:24.4548936Z .b8 107 2026-02-21T12:40:24.4548988Z .b8 116 2026-02-21T12:40:24.4549040Z .b8 100 2026-02-21T12:40:24.4549092Z .b8 116 2026-02-21T12:40:24.4549146Z .b8 119 2026-02-21T12:40:24.4549198Z .b8 97 2026-02-21T12:40:24.4549249Z .b8 111 2026-02-21T12:40:24.4549302Z .b8 113 2026-02-21T12:40:24.4549357Z .b8 53 2026-02-21T12:40:24.4549410Z .b8 106 2026-02-21T12:40:24.4549555Z .b8 97 2026-02-21T12:40:24.4549609Z .b8 52 2026-02-21T12:40:24.4549662Z .b8 52 2026-02-21T12:40:24.4549714Z .b8 122 2026-02-21T12:40:24.4549766Z .b8 104 2026-02-21T12:40:24.4549888Z .b8 102 2026-02-21T12:40:24.4549940Z .b8 54 2026-02-21T12:40:24.4549992Z .b8 103 2026-02-21T12:40:24.4550047Z .b8 120 2026-02-21T12:40:24.4550097Z .b8 100 2026-02-21T12:40:24.4550150Z .b8 113 2026-02-21T12:40:24.4550202Z .b8 97 2026-02-21T12:40:24.4550263Z .b8 98 2026-02-21T12:40:24.4550315Z .b8 117 2026-02-21T12:40:24.4550365Z .b8 52 2026-02-21T12:40:24.4550417Z .b8 
120 2026-02-21T12:40:24.4550473Z .b8 109 2026-02-21T12:40:24.4550525Z .b8 106 2026-02-21T12:40:24.4550576Z .b8 115 2026-02-21T12:40:24.4550633Z .b8 111 2026-02-21T12:40:24.4550686Z .b8 122 2026-02-21T12:40:24.4550737Z .b8 51 2026-02-21T12:40:24.4550789Z .b8 104 2026-02-21T12:40:24.4550843Z .b8 53 2026-02-21T12:40:24.4550894Z .b8 100 2026-02-21T12:40:24.4550945Z .b8 51 2026-02-21T12:40:24.4551000Z .b8 51 2026-02-21T12:40:24.4551052Z .b8 54 2026-02-21T12:40:24.4551107Z .b8 113 2026-02-21T12:40:24.4551157Z .b8 116 2026-02-21T12:40:24.4551212Z .b8 55 2026-02-21T12:40:24.4551263Z .b8 98 2026-02-21T12:40:24.4551318Z .b8 98 2026-02-21T12:40:24.4551506Z .b8 119 2026-02-21T12:40:24.4551564Z .b8 108 2026-02-21T12:40:24.4551615Z .b8 52 2026-02-21T12:40:24.4551669Z .b8 116 2026-02-21T12:40:24.4551724Z .b8 101 2026-02-21T12:40:24.4551777Z .b8 115 2026-02-21T12:40:24.4551828Z .b8 54 2026-02-21T12:40:24.4551879Z .b8 54 2026-02-21T12:40:24.4551934Z .b8 46 2026-02-21T12:40:24.4551989Z .b8 112 2026-02-21T12:40:24.4552041Z .b8 121 2026-02-21T12:40:24.4552095Z .b8 0 2026-02-21T12:40:24.4552204Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T12:40:24.4552289Z .b8 47 // DW_AT_comp_dir 2026-02-21T12:40:24.4552341Z .b8 116 2026-02-21T12:40:24.4552398Z .b8 109 2026-02-21T12:40:24.4552449Z .b8 112 2026-02-21T12:40:24.4552500Z .b8 47 2026-02-21T12:40:24.4552554Z .b8 116 2026-02-21T12:40:24.4552610Z .b8 111 2026-02-21T12:40:24.4552666Z .b8 114 2026-02-21T12:40:24.4552722Z .b8 99 2026-02-21T12:40:24.4552781Z .b8 104 2026-02-21T12:40:24.4552833Z .b8 105 2026-02-21T12:40:24.4552885Z .b8 110 2026-02-21T12:40:24.4552960Z .b8 100 2026-02-21T12:40:24.4553015Z .b8 117 2026-02-21T12:40:24.4553067Z .b8 99 2026-02-21T12:40:24.4553120Z .b8 116 2026-02-21T12:40:24.4553176Z .b8 111 2026-02-21T12:40:24.4553229Z .b8 114 2026-02-21T12:40:24.4553282Z .b8 95 2026-02-21T12:40:24.4553335Z .b8 114 2026-02-21T12:40:24.4553390Z .b8 111 2026-02-21T12:40:24.4553443Z .b8 111 2026-02-21T12:40:24.4553497Z .b8 116 
2026-02-21T12:40:24.4553552Z .b8 47 2026-02-21T12:40:24.4553605Z .b8 107 2026-02-21T12:40:24.4553657Z .b8 116 2026-02-21T12:40:24.4553708Z .b8 0 2026-02-21T12:40:24.4553826Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T12:40:24.4553906Z .b8 95 // DW_AT_name 2026-02-21T12:40:24.4553958Z .b8 104 2026-02-21T12:40:24.4554014Z .b8 101 2026-02-21T12:40:24.4554065Z .b8 108 2026-02-21T12:40:24.4554121Z .b8 105 2026-02-21T12:40:24.4554172Z .b8 111 2026-02-21T12:40:24.4554229Z .b8 110 2026-02-21T12:40:24.4554280Z .b8 95 2026-02-21T12:40:24.4554336Z .b8 109 2026-02-21T12:40:24.4554392Z .b8 97 2026-02-21T12:40:24.4554444Z .b8 116 2026-02-21T12:40:24.4554496Z .b8 109 2026-02-21T12:40:24.4554548Z .b8 117 2026-02-21T12:40:24.4554602Z .b8 108 2026-02-21T12:40:24.4554652Z .b8 95 2026-02-21T12:40:24.4554707Z .b8 98 2026-02-21T12:40:24.4554762Z .b8 102 2026-02-21T12:40:24.4554812Z .b8 49 2026-02-21T12:40:24.4554862Z .b8 54 2026-02-21T12:40:24.4554914Z .b8 95 2026-02-21T12:40:24.4554971Z .b8 105 2026-02-21T12:40:24.4555023Z .b8 110 2026-02-21T12:40:24.4555074Z .b8 116 2026-02-21T12:40:24.4555124Z .b8 52 2026-02-21T12:40:24.4555179Z .b8 0 2026-02-21T12:40:24.4555262Z .b8 1 // DW_AT_inline 2026-02-21T12:40:24.4555370Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T12:40:24.4555469Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T12:40:24.4555640Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T12:40:24.4555743Z .b32 108 // DW_AT_abstract_origin 2026-02-21T12:40:24.4555949Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T12:40:24.4556049Z .b32 108 // DW_AT_abstract_origin 2026-02-21T12:40:24.4556140Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T12:40:24.4556236Z .b64 $L__tmp40 // DW_AT_high_pc 2026-02-21T12:40:24.4556323Z .b8 1 // DW_AT_call_file 2026-02-21T12:40:24.4556405Z .b8 87 // DW_AT_call_line 2026-02-21T12:40:24.4556613Z .b8 40 // DW_AT_call_column 2026-02-21T12:40:24.4556716Z .b8 0 // End Of Children Mark 2026-02-21T12:40:24.4556804Z .b8 0 
// End Of Children Mark 2026-02-21T12:40:24.4556859Z } 2026-02-21T12:40:24.4556936Z .section .debug_macinfo { } 2026-02-21T12:40:24.4556942Z 2026-02-21T12:40:24.4557151Z ================================================================ 2026-02-21T12:40:24.4557280Z please share the reproducer above with Triton project. 2026-02-21T12:40:40.9457167Z [8140s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 256], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[8], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], num_sm_multiplier=32, num_stages=6, num_warps=8, pid_type='persistent_blocked', range_flattens=[True, True], range_multi_buffers=[False, True], range_num_stages=[4, 1], range_unroll_factors=[4, 2], range_warp_specializes=[]) 2026-02-21T12:40:40.9458930Z Tensor-likes are not close! 2026-02-21T12:40:40.9459108Z 2026-02-21T12:40:40.9459241Z Mismatched elements: 334377873 / 335544320 (99.7%) 2026-02-21T12:40:40.9459685Z Greatest absolute difference: 4160.0 at index (242617, 1196) (up to 0.01 allowed) 2026-02-21T12:40:40.9460248Z Greatest relative difference: inf at index (60515, 1164) (up to 0.01 allowed) 2026-02-21T12:40:40.9460735Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T12:40:40.9461024Z 2026-02-21T12:41:41.7859266Z 2026-02-21T12:41:41.7860037Z Generation 2: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 103/103 0.4 configs/s 2026-02-21T12:41:42.3259883Z Generation 2: verifying top configs 100% ━━━━━━━━━━━━━━━━━━━━━━━ 5/5 - configs/s 2026-02-21T12:41:45.0327861Z [8204s] Generation 2 complete: 2026-02-21T12:41:45.0328170Z error=15 2026-02-21T12:41:45.0328348Z ok=90 2026-02-21T12:41:45.0328512Z min=38.2089 2026-02-21T12:41:45.0328690Z mid=107.5174 2026-02-21T12:41:45.0328859Z max=2299.0566 2026-02-21T12:41:45.0329053Z best={'block_sizes': [32, 128, 64], 2026-02-21T12:41:45.0329394Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'pointer'], 2026-02-21T12:41:45.0329760Z 'l2_groupings': [32], 2026-02-21T12:41:45.0329987Z 'load_eviction_policies': ['last', ''], 2026-02-21T12:41:45.0330279Z 'loop_orders': [[1, 0]], 2026-02-21T12:41:45.0330494Z 'num_stages': 6, 2026-02-21T12:41:45.0330677Z 'num_warps': 4, 2026-02-21T12:41:45.0330895Z 'pid_type': 'flat', 2026-02-21T12:41:45.0331108Z 'range_flattens': [None, False], 2026-02-21T12:41:45.0331366Z 'range_multi_buffers': [None, True], 2026-02-21T12:41:45.0331619Z 'range_num_stages': [0, 2], 2026-02-21T12:41:45.0331852Z 'range_unroll_factors': [0, 3], 2026-02-21T12:41:45.0332085Z 'range_warp_specializes': []} 2026-02-21T12:41:45.0365477Z [8204s] Fitting surrogate: 318 points, 318 targets 2026-02-21T12:41:46.6761226Z [8205s] Generation 3 starting: 100 neighbors, 5 active search path(s) 2026-02-21T12:42:41.8343516Z Generation 3: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━ 101/101 0.9 configs/s 2026-02-21T12:43:14.3009390Z 2026-02-21T12:43:14.3009405Z 2026-02-21T12:43:14.3009822Z ================================================================ 2026-02-21T12:43:14.3010227Z Internal Triton PTX codegen error 2026-02-21T12:43:14.3010953Z `ptxas` stderr: 2026-02-21T12:43:14.3011730Z ptxas fatal : (C7602) Insufficient registers (64) to compile instruction at line 376 in function 
_helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 2026-02-21T12:43:14.3012771Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T12:43:14.3013009Z 2026-02-21T12:43:14.3013683Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpoqe17qpi.ptx -o /tmp/tmpoqe17qpi.ptx.o 2026-02-21T12:43:14.3014460Z 2026-02-21T12:43:14.3014465Z 2026-02-21T12:43:14.3014552Z // 2026-02-21T12:43:14.3014760Z // Generated by LLVM NVPTX Back-End 2026-02-21T12:43:14.3015006Z // 2026-02-21T12:43:14.3015107Z 2026-02-21T12:43:14.3015180Z .version 8.7 2026-02-21T12:43:14.3015362Z .target sm_90a 2026-02-21T12:43:14.3015554Z .address_size 64 2026-02-21T12:43:14.3015675Z 2026-02-21T12:43:14.3015903Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T12:43:14.3016350Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T12:43:14.3017436Z // @_helion_matmul_bf16_int4 2026-02-21T12:43:14.3017826Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T12:43:14.3018212Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T12:43:14.3018741Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T12:43:14.3019213Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T12:43:14.3019637Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T12:43:14.3020066Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T12:43:14.3020400Z ) 2026-02-21T12:43:14.3020553Z .reqntid 1024 2026-02-21T12:43:14.3020726Z .maxnreg 64 2026-02-21T12:43:14.3020883Z { 2026-02-21T12:43:14.3021047Z .reg .pred %p<52>; 2026-02-21T12:43:14.3021244Z .reg .b16 %rs<310>; 2026-02-21T12:43:14.3021543Z .reg .b32 %r<7873>; 2026-02-21T12:43:14.3021732Z .reg .b64 %rd<415>; 2026-02-21T12:43:14.3022126Z .loc 1 14 0 // 
cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:14:0 2026-02-21T12:43:14.3022610Z $L__func_begin0: 2026-02-21T12:43:14.3022979Z .loc 1 14 0 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:14:0 2026-02-21T12:43:14.3023346Z 2026-02-21T12:43:14.3023418Z // %bb.0: 2026-02-21T12:43:14.3023649Z ld.param.b64 %rd88, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T12:43:14.3024030Z ld.param.b64 %rd87, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T12:43:14.3024397Z ld.param.b64 %rd86, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T12:43:14.3024709Z $L__tmp0: 2026-02-21T12:43:14.3025089Z .loc 1 19 46 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:19:46 2026-02-21T12:43:14.3025546Z mov.u32 %r561, %ctaid.x; 2026-02-21T12:43:14.3025944Z .loc 1 19 52 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:19:52 2026-02-21T12:43:14.3026393Z cvt.u64.u32 %rd398, %r561; 2026-02-21T12:43:14.3027004Z .loc 1 19 124 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:19:124 2026-02-21T12:43:14.3027474Z sub.s64 %rd89, 5383, %rd398; 2026-02-21T12:43:14.3027744Z mul.hi.u64 %rd90, %rd89, 1117984489315730401; 2026-02-21T12:43:14.3028028Z shr.u64 %rd91, %rd90, 4; 2026-02-21T12:43:14.3028261Z mul.hi.u64 %rd92, %rd91, 6148914691236517206; 2026-02-21T12:43:14.3028617Z mad.lo.s64 %rd411, %rd92, 792, %rd398; 2026-02-21T12:43:14.3029107Z .loc 1 31 45 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:31:45 2026-02-21T12:43:14.3029499Z mov.u32 %r1, %tid.x; 2026-02-21T12:43:14.3029673Z shr.u32 %r2, %r1, 5; 2026-02-21T12:43:14.3029834Z shl.b32 %r3, %r1, 3; 2026-02-21T12:43:14.3030005Z and.b32 %r562, %r3, 120; 2026-02-21T12:43:14.3030182Z and.b32 %r563, %r1, 127; 2026-02-21T12:43:14.3030647Z .loc 1 31 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:31:32 2026-02-21T12:43:14.3031015Z cvt.u64.u32 %rd3, %r562; 2026-02-21T12:43:14.3031285Z cvt.u64.u32 %rd4, %r563; 2026-02-21T12:43:14.3031594Z .loc 1 33 45 // 
cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:33:45 2026-02-21T12:43:14.3031948Z shr.u32 %r564, %r1, 1; 2026-02-21T12:43:14.3032117Z shr.u32 %r565, %r1, 4; 2026-02-21T12:43:14.3032289Z or.b32 %r566, %r565, 64; 2026-02-21T12:43:14.3032465Z or.b32 %r567, %r565, 128; 2026-02-21T12:43:14.3032642Z or.b32 %r568, %r565, 192; 2026-02-21T12:43:14.3032825Z or.b32 %r569, %r565, 256; 2026-02-21T12:43:14.3032996Z or.b32 %r570, %r565, 320; 2026-02-21T12:43:14.3033181Z or.b32 %r571, %r565, 384; 2026-02-21T12:43:14.3033347Z or.b32 %r572, %r565, 448; 2026-02-21T12:43:14.3033664Z .loc 1 33 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:33:32 2026-02-21T12:43:14.3034022Z cvt.u64.u32 %rd5, %r564; 2026-02-21T12:43:14.3034198Z cvt.u64.u32 %rd6, %r565; 2026-02-21T12:43:14.3034368Z cvt.u64.u32 %rd7, %r566; 2026-02-21T12:43:14.3034692Z cvt.u64.u32 %rd8, %r567; 2026-02-21T12:43:14.3034878Z cvt.u64.u32 %rd9, %r568; 2026-02-21T12:43:14.3035048Z cvt.u64.u32 %rd10, %r569; 2026-02-21T12:43:14.3035222Z cvt.u64.u32 %rd11, %r570; 2026-02-21T12:43:14.3035390Z cvt.u64.u32 %rd12, %r571; 2026-02-21T12:43:14.3035564Z cvt.u64.u32 %rd13, %r572; 2026-02-21T12:43:14.3035868Z .loc 1 41 48 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:41:48 2026-02-21T12:43:14.3036227Z and.b32 %r4, %r1, 896; 2026-02-21T12:43:14.3036393Z shr.u32 %r5, %r1, 7; 2026-02-21T12:43:14.3036838Z .loc 1 65 38 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:65:38 2026-02-21T12:43:14.3037195Z and.b32 %r6, %r1, 128; 2026-02-21T12:43:14.3037520Z .loc 1 19 124 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:19:124 2026-02-21T12:43:14.3037898Z setp.le.s64 %p1, %rd411, %rd398; 2026-02-21T12:43:14.3038095Z cvt.u32.u64 %r7599, %rd6; 2026-02-21T12:43:14.3038288Z cvt.u32.u64 %r7600, %rd5; 2026-02-21T12:43:14.3038459Z cvt.u32.u64 %r7601, %rd4; 2026-02-21T12:43:14.3038635Z shl.b32 %r7602, %r1, 4; 2026-02-21T12:43:14.3038823Z bfe.s32 %r7603, %r1, 3, 1; 2026-02-21T12:43:14.3039004Z mov.b32 
%r6277, global_smem; 2026-02-21T12:43:14.3039191Z and.b32 %r7605, %r3, 96; 2026-02-21T12:43:14.3039355Z shl.b32 %r7606, %r1, 1; 2026-02-21T12:43:14.3039525Z bfe.s32 %r7607, %r1, 4, 1; 2026-02-21T12:43:14.3039694Z shl.b32 %r7608, %r1, 2; 2026-02-21T12:43:14.3039860Z and.b32 %r7609, %r1, 384; 2026-02-21T12:43:14.3040026Z and.b32 %r7610, %r3, 512; 2026-02-21T12:43:14.3040201Z setp.gt.u32 %p50, %r1, 511; 2026-02-21T12:43:14.3040382Z and.b32 %r7611, %r3, 48; 2026-02-21T12:43:14.3040556Z shr.u32 %r7612, %r4, 5; 2026-02-21T12:43:14.3040726Z and.b32 %r7613, %r1, 3; 2026-02-21T12:43:14.3040889Z shl.b32 %r7614, %r1, 5; 2026-02-21T12:43:14.3041054Z and.b32 %r7615, %r1, 24; 2026-02-21T12:43:14.3041219Z shl.b64 %rd396, %rd5, 14; 2026-02-21T12:43:14.3041392Z and.b32 %r7616, %r1, 1; 2026-02-21T12:43:14.3041562Z mul.wide.u32 %rd397, %r5, 1280; 2026-02-21T12:43:14.3041757Z setp.eq.b32 %p51, %r6, 0; 2026-02-21T12:43:14.3041926Z @%p1 bra $L__BB0_4; 2026-02-21T12:43:14.3042112Z // %bb.1: // %.lr.ph 2026-02-21T12:43:14.3042478Z .loc 1 0 124 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:0:124 2026-02-21T12:43:14.3042849Z and.b32 %r577, %r7602, 16240; 2026-02-21T12:43:14.3043042Z and.b32 %r579, %r7603, 136; 2026-02-21T12:43:14.3043217Z or.b32 %r580, %r579, %r577; 2026-02-21T12:43:14.3043400Z add.s32 %r7, %r6277, %r580; 2026-02-21T12:43:14.3043573Z xor.b32 %r582, %r580, 8; 2026-02-21T12:43:14.3043750Z add.s32 %r8, %r6277, %r582; 2026-02-21T12:43:14.3043922Z and.b32 %r583, %r7602, 15872; 2026-02-21T12:43:14.3044106Z and.b32 %r586, %r7606, 6; 2026-02-21T12:43:14.3044407Z and.b32 %r588, %r7607, 136; 2026-02-21T12:43:14.3044590Z or.b32 %r589, %r583, %r7605; 2026-02-21T12:43:14.3044864Z or.b32 %r590, %r589, %r586; 2026-02-21T12:43:14.3045039Z or.b32 %r591, %r590, %r588; 2026-02-21T12:43:14.3045229Z add.s32 %r9, %r6277, %r591; 2026-02-21T12:43:14.3045403Z xor.b32 %r592, %r591, 8; 2026-02-21T12:43:14.3045585Z add.s32 %r10, %r6277, %r592; 2026-02-21T12:43:14.3045768Z and.b32 %r594, 
%r7608, 124; 2026-02-21T12:43:14.3045954Z and.b32 %r596, %r7599, 2; 2026-02-21T12:43:14.3046132Z selp.b32 %r598, 1, 0, %p50; 2026-02-21T12:43:14.3046334Z add.s32 %r599, %r6277, %r7609; 2026-02-21T12:43:14.3046656Z add.s32 %r600, %r599, %r598; 2026-02-21T12:43:14.3046850Z add.s32 %r601, %r600, %r7610; 2026-02-21T12:43:14.3047036Z add.s32 %r602, %r601, %r596; 2026-02-21T12:43:14.3047213Z add.s32 %r11, %r602, %r594; 2026-02-21T12:43:14.3047397Z and.b32 %r603, %r7600, 384; 2026-02-21T12:43:14.3047576Z add.s32 %r604, %r6277, %r596; 2026-02-21T12:43:14.3047784Z add.s32 %r605, %r604, %r603; 2026-02-21T12:43:14.3047957Z add.s32 %r606, %r605, %r594; 2026-02-21T12:43:14.3048231Z add.s32 %r12, %r606, %r7610; 2026-02-21T12:43:14.3048488Z shl.b32 %r607, %r7601, 6; 2026-02-21T12:43:14.3048672Z xor.b32 %r610, %r7611, %r7612; 2026-02-21T12:43:14.3048859Z or.b32 %r611, %r610, %r607; 2026-02-21T12:43:14.3049040Z add.s32 %r13, %r6277, %r611; 2026-02-21T12:43:14.3049227Z xor.b32 %r612, %r611, 32; 2026-02-21T12:43:14.3049403Z add.s32 %r14, %r6277, %r612; 2026-02-21T12:43:14.3049582Z bfe.u32 %r613, %r6277, 4, 14; 2026-02-21T12:43:14.3049760Z cvt.u64.u32 %rd93, %r613; 2026-02-21T12:43:14.3049955Z or.b64 %rd14, %rd93, -9223371899382267904; 2026-02-21T12:43:14.3050171Z add.s32 %r614, %r6277, 32; 2026-02-21T12:43:14.3050351Z bfe.u32 %r615, %r614, 4, 14; 2026-02-21T12:43:14.3050523Z cvt.u64.u32 %rd94, %r615; 2026-02-21T12:43:14.3050713Z or.b64 %rd15, %rd94, -9223371899382267904; 2026-02-21T12:43:14.3050924Z shl.b32 %r617, %r7613, 15; 2026-02-21T12:43:14.3051098Z and.b32 %r619, %r7614, 31840; 2026-02-21T12:43:14.3051284Z shl.b32 %r621, %r7615, 4; 2026-02-21T12:43:14.3051456Z and.b32 %r622, %r7608, 16; 2026-02-21T12:43:14.3051634Z or.b32 %r623, %r617, %r622; 2026-02-21T12:43:14.3051806Z or.b32 %r624, %r619, %r621; 2026-02-21T12:43:14.3051982Z or.b32 %r625, %r623, %r624; 2026-02-21T12:43:14.3052152Z add.s32 %r15, %r6277, %r625; 2026-02-21T12:43:14.3052334Z xor.b32 %r626, %r625, 32; 
2026-02-21T12:43:14.3052500Z add.s32 %r16, %r6277, %r626; 2026-02-21T12:43:14.3052677Z xor.b32 %r627, %r625, 64; 2026-02-21T12:43:14.3052847Z add.s32 %r17, %r6277, %r627; 2026-02-21T12:43:14.3053018Z xor.b32 %r628, %r625, 96; 2026-02-21T12:43:14.3053189Z add.s32 %r18, %r6277, %r628; 2026-02-21T12:43:14.3053358Z shl.b32 %r629, %r7615, 12; 2026-02-21T12:43:14.3053531Z shl.b32 %r630, %r7613, 5; 2026-02-21T12:43:14.3053695Z and.b32 %r631, %r7608, 4080; 2026-02-21T12:43:14.3053872Z or.b32 %r632, %r629, %r630; 2026-02-21T12:43:14.3054046Z xor.b32 %r633, %r632, %r631; 2026-02-21T12:43:14.3054226Z add.s32 %r2264, %r6277, %r633; 2026-02-21T12:43:14.3054428Z add.s32 %r2269, %r2264, 4096; 2026-02-21T12:43:14.3054612Z add.s32 %r2274, %r2264, 8192; 2026-02-21T12:43:14.3054793Z add.s32 %r2279, %r2264, 12288; 2026-02-21T12:43:14.3054972Z add.s32 %r2284, %r2264, 16384; 2026-02-21T12:43:14.3055157Z add.s32 %r2289, %r2264, 20480; 2026-02-21T12:43:14.3055333Z add.s32 %r2294, %r2264, 24576; 2026-02-21T12:43:14.3055516Z add.s32 %r2299, %r2264, 28672; 2026-02-21T12:43:14.3055691Z shl.b32 %r634, %r7613, 1; 2026-02-21T12:43:14.3055884Z or.b32 %r635, %r589, %r634; 2026-02-21T12:43:14.3066706Z or.b32 %r636, %r635, %r588; 2026-02-21T12:43:14.3066940Z add.s32 %r27, %r6277, %r636; 2026-02-21T12:43:14.3067148Z xor.b32 %r637, %r636, 8; 2026-02-21T12:43:14.3067344Z add.s32 %r28, %r6277, %r637; 2026-02-21T12:43:14.3067705Z .loc 1 19 124 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:19:124 2026-02-21T12:43:14.3068255Z mul.wide.u32 %rd96, %r7616, 16; 2026-02-21T12:43:14.3068459Z or.b64 %rd97, %rd396, %rd96; 2026-02-21T12:43:14.3068756Z add.s64 %rd98, %rd97, %rd86; 2026-02-21T12:43:14.3069036Z add.s64 %rd16, %rd98, 96; 2026-02-21T12:43:14.3069225Z or.b64 %rd100, %rd397, %rd4; 2026-02-21T12:43:14.3069408Z add.s64 %rd101, %rd100, %rd87; 2026-02-21T12:43:14.3069607Z add.s64 %rd17, %rd101, 30720; 2026-02-21T12:43:14.3069849Z $L__BB0_2: // =>This Loop Header: Depth=1 
2026-02-21T12:43:14.3070144Z // Child Loop BB0_12 Depth 2 2026-02-21T12:43:14.3070428Z // Child Loop BB0_17 Depth 2 2026-02-21T12:43:14.3070700Z // Child Loop BB0_22 Depth 2 2026-02-21T12:43:14.3071113Z .loc 1 25 35 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:25:35 2026-02-21T12:43:14.3071485Z shr.s64 %rd102, %rd398, 63; 2026-02-21T12:43:14.3071683Z shr.u64 %rd103, %rd102, 52; 2026-02-21T12:43:14.3071873Z add.s64 %rd104, %rd398, %rd103; 2026-02-21T12:43:14.3072159Z shr.s64 %rd105, %rd104, 12; 2026-02-21T12:43:14.3072591Z .loc 1 26 33 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:26:33 2026-02-21T12:43:14.3072969Z shl.b64 %rd25, %rd105, 3; 2026-02-21T12:43:14.3073301Z .loc 1 27 39 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:27:39 2026-02-21T12:43:14.3073653Z sub.s64 %rd106, 10, %rd25; 2026-02-21T12:43:14.3073980Z .loc 1 27 52 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:27:52 2026-02-21T12:43:14.3074334Z min.s64 %rd26, %rd106, 8; 2026-02-21T12:43:14.3074647Z .loc 1 28 45 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:28:45 2026-02-21T12:43:14.3075023Z and.b64 %rd107, %rd104, -4096; 2026-02-21T12:43:14.3075219Z sub.s64 %rd27, %rd398, %rd107; 2026-02-21T12:43:14.3075569Z .loc 1 29 51 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:29:51 2026-02-21T12:43:14.3075933Z or.b64 %rd108, %rd27, %rd26; 2026-02-21T12:43:14.3076137Z and.b64 %rd109, %rd108, -4294967296; 2026-02-21T12:43:14.3076347Z setp.ne.b64 %p3, %rd109, 0; 2026-02-21T12:43:14.3076700Z @%p3 bra $L__BB0_10; 2026-02-21T12:43:14.3076873Z bra.uni $L__BB0_3; 2026-02-21T12:43:14.3077086Z $L__BB0_10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:43:14.3077360Z div.s64 %rd399, %rd27, %rd26; 2026-02-21T12:43:14.3077562Z bra.uni $L__BB0_11; 2026-02-21T12:43:14.3077782Z $L__BB0_3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:43:14.3078037Z cvt.u32.u64 %r639, %rd26; 2026-02-21T12:43:14.3078220Z cvt.u32.u64 %r640, %rd27; 
2026-02-21T12:43:14.3078392Z div.u32 %r641, %r640, %r639; 2026-02-21T12:43:14.3078580Z cvt.u64.u32 %rd399, %r641; 2026-02-21T12:43:14.3078808Z $L__BB0_11: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:43:14.3079211Z .loc 1 28 64 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:28:64 2026-02-21T12:43:14.3079595Z mul.lo.s64 %rd111, %rd399, %rd26; 2026-02-21T12:43:14.3079795Z sub.s64 %rd112, %rd27, %rd111; 2026-02-21T12:43:14.3080129Z .loc 1 28 30 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:28:30 2026-02-21T12:43:14.3080486Z add.s64 %rd113, %rd112, %rd25; 2026-02-21T12:43:14.3080820Z .loc 1 30 27 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:30:27 2026-02-21T12:43:14.3081190Z shl.b64 %rd31, %rd113, 7; 2026-02-21T12:43:14.3081510Z .loc 1 32 27 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:32:27 2026-02-21T12:43:14.3081873Z shl.b64 %rd32, %rd399, 9; 2026-02-21T12:43:14.3082217Z .loc 1 40 126 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:40:126 2026-02-21T12:43:14.3082684Z shl.b64 %rd114, %rd399, 23; 2026-02-21T12:43:14.3082875Z add.s64 %rd401, %rd16, %rd114; 2026-02-21T12:43:14.3083076Z add.s64 %rd400, %rd17, %rd31; 2026-02-21T12:43:14.3083359Z mov.b32 %r7617, 0f00000000; 2026-02-21T12:43:14.3083541Z mov.b64 %rd402, -32; 2026-02-21T12:43:14.3083713Z mov.b32 %r7618, %r7617; 2026-02-21T12:43:14.3083883Z mov.b32 %r7619, %r7617; 2026-02-21T12:43:14.3084053Z mov.b32 %r7620, %r7617; 2026-02-21T12:43:14.3084215Z mov.b32 %r7621, %r7617; 2026-02-21T12:43:14.3084390Z mov.b32 %r7622, %r7617; 2026-02-21T12:43:14.3084548Z mov.b32 %r7623, %r7617; 2026-02-21T12:43:14.3084717Z mov.b32 %r7624, %r7617; 2026-02-21T12:43:14.3084878Z mov.b32 %r7625, %r7617; 2026-02-21T12:43:14.3085061Z mov.b32 %r7626, %r7617; 2026-02-21T12:43:14.3085231Z mov.b32 %r7627, %r7617; 2026-02-21T12:43:14.3085392Z mov.b32 %r7628, %r7617; 2026-02-21T12:43:14.3085560Z mov.b32 %r7629, %r7617; 2026-02-21T12:43:14.3085720Z mov.b32 %r7630, %r7617; 
2026-02-21T12:43:14.3085887Z mov.b32 %r7631, %r7617; 2026-02-21T12:43:14.3086050Z mov.b32 %r7632, %r7617; 2026-02-21T12:43:14.3086216Z mov.b32 %r7633, %r7617; 2026-02-21T12:43:14.3086594Z mov.b32 %r7634, %r7617; 2026-02-21T12:43:14.3086866Z mov.b32 %r7635, %r7617; 2026-02-21T12:43:14.3087033Z mov.b32 %r7636, %r7617; 2026-02-21T12:43:14.3087201Z mov.b32 %r7637, %r7617; 2026-02-21T12:43:14.3087368Z mov.b32 %r7638, %r7617; 2026-02-21T12:43:14.3087530Z mov.b32 %r7639, %r7617; 2026-02-21T12:43:14.3087718Z mov.b32 %r7640, %r7617; 2026-02-21T12:43:14.3087882Z mov.b32 %r7641, %r7617; 2026-02-21T12:43:14.3088049Z mov.b32 %r7642, %r7617; 2026-02-21T12:43:14.3088209Z mov.b32 %r7643, %r7617; 2026-02-21T12:43:14.3088381Z mov.b32 %r7644, %r7617; 2026-02-21T12:43:14.3088541Z mov.b32 %r7645, %r7617; 2026-02-21T12:43:14.3088721Z mov.b32 %r7646, %r7617; 2026-02-21T12:43:14.3088880Z mov.b32 %r7647, %r7617; 2026-02-21T12:43:14.3089047Z mov.b32 %r7648, %r7617; 2026-02-21T12:43:14.3089214Z mov.b32 %r7649, %r7617; 2026-02-21T12:43:14.3089392Z mov.b32 %r7650, %r7617; 2026-02-21T12:43:14.3089569Z mov.b32 %r7651, %r7617; 2026-02-21T12:43:14.3089730Z mov.b32 %r7652, %r7617; 2026-02-21T12:43:14.3089907Z mov.b32 %r7653, %r7617; 2026-02-21T12:43:14.3090071Z mov.b32 %r7654, %r7617; 2026-02-21T12:43:14.3090241Z mov.b32 %r7655, %r7617; 2026-02-21T12:43:14.3090406Z mov.b32 %r7656, %r7617; 2026-02-21T12:43:14.3090577Z mov.b32 %r7657, %r7617; 2026-02-21T12:43:14.3090742Z mov.b32 %r7658, %r7617; 2026-02-21T12:43:14.3090916Z mov.b32 %r7659, %r7617; 2026-02-21T12:43:14.3091086Z mov.b32 %r7660, %r7617; 2026-02-21T12:43:14.3091252Z mov.b32 %r7661, %r7617; 2026-02-21T12:43:14.3091426Z mov.b32 %r7662, %r7617; 2026-02-21T12:43:14.3091589Z mov.b32 %r7663, %r7617; 2026-02-21T12:43:14.3091760Z mov.b32 %r7664, %r7617; 2026-02-21T12:43:14.3091921Z mov.b32 %r7665, %r7617; 2026-02-21T12:43:14.3092094Z mov.b32 %r7666, %r7617; 2026-02-21T12:43:14.3092256Z mov.b32 %r7667, %r7617; 2026-02-21T12:43:14.3092428Z mov.b32 
%r7668, %r7617; 2026-02-21T12:43:14.3092597Z mov.b32 %r7669, %r7617; 2026-02-21T12:43:14.3092769Z mov.b32 %r7670, %r7617; 2026-02-21T12:43:14.3092939Z mov.b32 %r7671, %r7617; 2026-02-21T12:43:14.3093103Z mov.b32 %r7672, %r7617; 2026-02-21T12:43:14.3093271Z mov.b32 %r7673, %r7617; 2026-02-21T12:43:14.3093437Z mov.b32 %r7674, %r7617; 2026-02-21T12:43:14.3093614Z mov.b32 %r7675, %r7617; 2026-02-21T12:43:14.3093783Z mov.b32 %r7676, %r7617; 2026-02-21T12:43:14.3093953Z mov.b32 %r7677, %r7617; 2026-02-21T12:43:14.3094114Z mov.b32 %r7678, %r7617; 2026-02-21T12:43:14.3094280Z mov.b32 %r7679, %r7617; 2026-02-21T12:43:14.3094442Z mov.b32 %r7680, %r7617; 2026-02-21T12:43:14.3094665Z $L__BB0_12: // Parent Loop BB0_2 Depth=1 2026-02-21T12:43:14.3094966Z // => This Inner Loop Header: Depth=2 2026-02-21T12:43:14.3095358Z .loc 1 48 80 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:48:80 2026-02-21T12:43:14.3095831Z add.s64 %rd116, %rd401, -96; 2026-02-21T12:43:14.3096017Z // begin inline asm 2026-02-21T12:43:14.3096190Z mov.u64 %rd115, 0x0; 2026-02-21T12:43:14.3096672Z createpolicy.fractional.L2::evict_last.b64 %rd115, 1.0; 2026-02-21T12:43:14.3096940Z // end inline asm 2026-02-21T12:43:14.3097095Z // begin inline asm 2026-02-21T12:43:14.3097261Z mov.u32 %r643, 0x0; 2026-02-21T12:43:14.3097420Z mov.u32 %r644, 0x0; 2026-02-21T12:43:14.3097581Z mov.u32 %r645, 0x0; 2026-02-21T12:43:14.3097741Z mov.u32 %r646, 0x0; 2026-02-21T12:43:14.3098047Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r643, %r644, %r645, %r646 }, [ %rd116 + 0 ], %rd115; 2026-02-21T12:43:14.3098408Z // end inline asm 2026-02-21T12:43:14.3098703Z .loc 1 52 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:52:32 2026-02-21T12:43:14.3099058Z bar.sync 0; 2026-02-21T12:43:14.3099225Z st.shared.v2.b32 [%r7], {%r643, %r644}; 2026-02-21T12:43:14.3099444Z st.shared.v2.b32 [%r8], {%r645, %r646}; 2026-02-21T12:43:14.3099652Z bar.sync 0; 2026-02-21T12:43:14.3099804Z ld.shared.b16 %rs5, [%r9]; 
2026-02-21T12:43:14.3100002Z ld.shared.b16 %rs6, [%r9+256]; 2026-02-21T12:43:14.3100359Z ld.shared.b16 %rs7, [%r9+16]; 2026-02-21T12:43:14.3100567Z ld.shared.b16 %rs8, [%r9+272]; 2026-02-21T12:43:14.3100760Z ld.shared.b16 %rs9, [%r10]; 2026-02-21T12:43:14.3100958Z ld.shared.b16 %rs10, [%r10+256]; 2026-02-21T12:43:14.3101162Z ld.shared.b16 %rs11, [%r10+16]; 2026-02-21T12:43:14.3101375Z ld.shared.b16 %rs12, [%r10+272]; 2026-02-21T12:43:14.3101585Z cvt.f32.bf16 %r775, %rs5; 2026-02-21T12:43:14.3101761Z cvt.f32.bf16 %r776, %rs6; 2026-02-21T12:43:14.3101936Z cvt.f32.bf16 %r777, %rs9; 2026-02-21T12:43:14.3102106Z cvt.f32.bf16 %r778, %rs10; 2026-02-21T12:43:14.3102286Z cvt.f32.bf16 %r907, %rs7; 2026-02-21T12:43:14.3102455Z cvt.f32.bf16 %r908, %rs8; 2026-02-21T12:43:14.3102629Z cvt.f32.bf16 %r909, %rs11; 2026-02-21T12:43:14.3102806Z cvt.f32.bf16 %r910, %rs12; 2026-02-21T12:43:14.3103123Z .loc 1 54 87 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:54:87 2026-02-21T12:43:14.3103494Z add.s64 %rd119, %rd400, -30720; 2026-02-21T12:43:14.3103685Z // begin inline asm 2026-02-21T12:43:14.3103860Z mov.u64 %rd118, 0x0; 2026-02-21T12:43:14.3104090Z createpolicy.fractional.L2::evict_first.b64 %rd118, 1.0; 2026-02-21T12:43:14.3104362Z // end inline asm 2026-02-21T12:43:14.3104519Z // begin inline asm 2026-02-21T12:43:14.3104685Z mov.u16 %rs1, 0x0; 2026-02-21T12:43:14.3104952Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs1 }, [ %rd119 + 0 ], %rd118; 2026-02-21T12:43:14.3105258Z // end inline asm 2026-02-21T12:43:14.3105569Z .loc 1 62 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:62:28 2026-02-21T12:43:14.3105926Z bar.sync 0; 2026-02-21T12:43:14.3106087Z st.shared.b8 [%r11], %rs1; 2026-02-21T12:43:14.3106272Z bar.sync 0; 2026-02-21T12:43:14.3106561Z ld.shared.v2.b8 {%rs13, %rs14}, [%r12]; 2026-02-21T12:43:14.3106927Z .loc 1 57 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:57:28 2026-02-21T12:43:14.3107288Z shl.b16 %rs15, %rs13, 4; 
2026-02-21T12:43:14.3107484Z shl.b16 %rs16, %rs14, 4; 2026-02-21T12:43:14.3107794Z .loc 1 72 58 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:72:58 2026-02-21T12:43:14.3108158Z selp.b16 %rs17, %rs15, %rs13, %p51; 2026-02-21T12:43:14.3108360Z cvt.s16.s8 %rs18, %rs17; 2026-02-21T12:43:14.3108534Z shr.s16 %rs19, %rs18, 4; 2026-02-21T12:43:14.3108793Z selp.b16 %rs20, %rs16, %rs14, %p51; 2026-02-21T12:43:14.3108996Z cvt.s16.s8 %rs21, %rs20; 2026-02-21T12:43:14.3109163Z shr.s16 %rs22, %rs21, 4; 2026-02-21T12:43:14.3109475Z .loc 1 77 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:77:32 2026-02-21T12:43:14.3109833Z cvt.rn.f32.s16 %r2251, %rs19; 2026-02-21T12:43:14.3110023Z cvt.rn.f32.s16 %r2252, %rs22; 2026-02-21T12:43:14.3110204Z bar.sync 0; 2026-02-21T12:43:14.3110462Z st.shared.b32 [%r13], %r2251; 2026-02-21T12:43:14.3110649Z st.shared.b32 [%r14], %r2252; 2026-02-21T12:43:14.3110833Z $L__tmp1: 2026-02-21T12:43:14.3111273Z .loc 2 291 36 // standard.py:291:36 @[ cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:84:40 ] 2026-02-21T12:43:14.3111688Z // begin inline asm 2026-02-21T12:43:14.3111869Z fence.proxy.async.shared::cta; 2026-02-21T12:43:14.3112059Z // end inline asm 2026-02-21T12:43:14.3112206Z bar.sync 0; 2026-02-21T12:43:14.3112373Z shfl.sync.idx.b32 %r2253, %r2, 0, 31, -1; 2026-02-21T12:43:14.3112599Z wgmma.fence.sync.aligned; 2026-02-21T12:43:14.3112785Z mov.pred %p4, -1; 2026-02-21T12:43:14.3112945Z // begin inline asm 2026-02-21T12:43:14.3114465Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r7617,%r7618,%r7619,%r7620,%r7621,%r7622,%r7623,%r7624,%r7625,%r7626,%r7627,%r7628,%r7629,%r7630,%r7631,%r7632,%r7633,%r7634,%r7635,%r7636,%r7637,%r7638,%r7639,%r7640,%r7641,%r7642,%r7643,%r7644,%r7645,%r7646,%r7647,%r7648,%r7649,%r7650,%r7651,%r7652,%r7653,%r7654,%r7655,%r7656,%r7657,%r7658,%r7659,%r7660,%r7661,%r7662,%r7663,%r7664,%r7665,%r7666,%r7667,%r7668,%r7669,%r7670,%r7671,%r7672,%r7673,%r7674,%r7675,%r7676,%r7677,%r7678,%r7679,%r7680}, {%r775,%r776,%r777,%r778}, %rd14, %p4, 1, 1; 2026-02-21T12:43:14.3115911Z // end inline asm 2026-02-21T12:43:14.3116078Z // begin inline asm 2026-02-21T12:43:14.3117567Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7617,%r7618,%r7619,%r7620,%r7621,%r7622,%r7623,%r7624,%r7625,%r7626,%r7627,%r7628,%r7629,%r7630,%r7631,%r7632,%r7633,%r7634,%r7635,%r7636,%r7637,%r7638,%r7639,%r7640,%r7641,%r7642,%r7643,%r7644,%r7645,%r7646,%r7647,%r7648,%r7649,%r7650,%r7651,%r7652,%r7653,%r7654,%r7655,%r7656,%r7657,%r7658,%r7659,%r7660,%r7661,%r7662,%r7663,%r7664,%r7665,%r7666,%r7667,%r7668,%r7669,%r7670,%r7671,%r7672,%r7673,%r7674,%r7675,%r7676,%r7677,%r7678,%r7679,%r7680}, {%r907,%r908,%r909,%r910}, %rd15, %p4, 1, 1; 2026-02-21T12:43:14.3118957Z // end inline asm 2026-02-21T12:43:14.3119132Z wgmma.commit_group.sync.aligned; 2026-02-21T12:43:14.3119334Z mov.b32 %r2182, 0; 2026-02-21T12:43:14.3119489Z mov.b32 %r976, %r2182; 2026-02-21T12:43:14.3119665Z mov.b32 %r977, %r2182; 2026-02-21T12:43:14.3119831Z mov.b32 %r975, %r6277; 2026-02-21T12:43:14.3119996Z // begin inline asm 2026-02-21T12:43:14.3121144Z // wait for regs: 
%r7617,%r7618,%r7619,%r7620,%r7621,%r7622,%r7623,%r7624,%r7625,%r7626,%r7627,%r7628,%r7629,%r7630,%r7631,%r7632,%r7633,%r7634,%r7635,%r7636,%r7637,%r7638,%r7639,%r7640,%r7641,%r7642,%r7643,%r7644,%r7645,%r7646,%r7647,%r7648,%r7649,%r7650,%r7651,%r7652,%r7653,%r7654,%r7655,%r7656,%r7657,%r7658,%r7659,%r7660,%r7661,%r7662,%r7663,%r7664,%r7665,%r7666,%r7667,%r7668,%r7669,%r7670,%r7671,%r7672,%r7673,%r7674,%r7675,%r7676,%r7677,%r7678,%r7679,%r7680,%r975,%r976,%r977 2026-02-21T12:43:14.3122371Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:43:14.3122564Z // end inline asm 2026-02-21T12:43:14.3122720Z $L__tmp2: 2026-02-21T12:43:14.3123013Z .loc 1 48 80 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:48:80 2026-02-21T12:43:14.3123380Z add.s64 %rd124, %rd401, -64; 2026-02-21T12:43:14.3123574Z // begin inline asm 2026-02-21T12:43:14.3123727Z mov.u64 %rd123, 0x0; 2026-02-21T12:43:14.3123951Z createpolicy.fractional.L2::evict_last.b64 %rd123, 1.0; 2026-02-21T12:43:14.3124209Z // end inline asm 2026-02-21T12:43:14.3124355Z // begin inline asm 2026-02-21T12:43:14.3124511Z mov.u32 %r1045, 0x0; 2026-02-21T12:43:14.3124665Z mov.u32 %r1046, 0x0; 2026-02-21T12:43:14.3124818Z mov.u32 %r1047, 0x0; 2026-02-21T12:43:14.3124965Z mov.u32 %r1048, 0x0; 2026-02-21T12:43:14.3125282Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1045, %r1046, %r1047, %r1048 }, [ %rd124 + 0 ], %rd123; 2026-02-21T12:43:14.3125645Z // end inline asm 2026-02-21T12:43:14.3125939Z .loc 1 52 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:52:32 2026-02-21T12:43:14.3126304Z bar.sync 0; 2026-02-21T12:43:14.3126580Z st.shared.v2.b32 [%r7], {%r1045, %r1046}; 2026-02-21T12:43:14.3126923Z st.shared.v2.b32 [%r8], {%r1047, %r1048}; 2026-02-21T12:43:14.3127126Z bar.sync 0; 2026-02-21T12:43:14.3127401Z ld.shared.b16 %rs23, [%r9]; 2026-02-21T12:43:14.3127595Z ld.shared.b16 %rs24, [%r9+256]; 2026-02-21T12:43:14.3127796Z ld.shared.b16 %rs25, [%r9+16]; 2026-02-21T12:43:14.3127987Z ld.shared.b16 
%rs26, [%r9+272]; 2026-02-21T12:43:14.3128185Z ld.shared.b16 %rs27, [%r10]; 2026-02-21T12:43:14.3128394Z ld.shared.b16 %rs28, [%r10+256]; 2026-02-21T12:43:14.3128601Z ld.shared.b16 %rs29, [%r10+16]; 2026-02-21T12:43:14.3128799Z ld.shared.b16 %rs30, [%r10+272]; 2026-02-21T12:43:14.3128995Z cvt.f32.bf16 %r1177, %rs23; 2026-02-21T12:43:14.3129183Z cvt.f32.bf16 %r1178, %rs24; 2026-02-21T12:43:14.3129363Z cvt.f32.bf16 %r1179, %rs27; 2026-02-21T12:43:14.3129548Z cvt.f32.bf16 %r1180, %rs28; 2026-02-21T12:43:14.3129735Z cvt.f32.bf16 %r1309, %rs25; 2026-02-21T12:43:14.3129918Z cvt.f32.bf16 %r1310, %rs26; 2026-02-21T12:43:14.3130101Z cvt.f32.bf16 %r1311, %rs29; 2026-02-21T12:43:14.3130281Z cvt.f32.bf16 %r1312, %rs30; 2026-02-21T12:43:14.3130751Z .loc 1 54 87 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:54:87 2026-02-21T12:43:14.3131125Z add.s64 %rd127, %rd400, -20480; 2026-02-21T12:43:14.3131344Z // begin inline asm 2026-02-21T12:43:14.3131515Z mov.u64 %rd126, 0x0; 2026-02-21T12:43:14.3131756Z createpolicy.fractional.L2::evict_first.b64 %rd126, 1.0; 2026-02-21T12:43:14.3132019Z // end inline asm 2026-02-21T12:43:14.3132184Z // begin inline asm 2026-02-21T12:43:14.3132338Z mov.u16 %rs2, 0x0; 2026-02-21T12:43:14.3132594Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs2 }, [ %rd127 + 0 ], %rd126; 2026-02-21T12:43:14.3132898Z // end inline asm 2026-02-21T12:43:14.3133192Z .loc 1 62 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:62:28 2026-02-21T12:43:14.3133546Z bar.sync 0; 2026-02-21T12:43:14.3133699Z st.shared.b8 [%r11], %rs2; 2026-02-21T12:43:14.3133882Z bar.sync 0; 2026-02-21T12:43:14.3134045Z ld.shared.v2.b8 {%rs31, %rs32}, [%r12]; 2026-02-21T12:43:14.3134399Z .loc 1 57 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:57:28 2026-02-21T12:43:14.3134759Z shl.b16 %rs33, %rs31, 4; 2026-02-21T12:43:14.3134935Z shl.b16 %rs34, %rs32, 4; 2026-02-21T12:43:14.3135246Z .loc 1 72 58 // 
cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:72:58 2026-02-21T12:43:14.3135602Z selp.b16 %rs35, %rs33, %rs31, %p51; 2026-02-21T12:43:14.3135817Z cvt.s16.s8 %rs36, %rs35; 2026-02-21T12:43:14.3135995Z shr.s16 %rs37, %rs36, 4; 2026-02-21T12:43:14.3136180Z selp.b16 %rs38, %rs34, %rs32, %p51; 2026-02-21T12:43:14.3136380Z cvt.s16.s8 %rs39, %rs38; 2026-02-21T12:43:14.3136681Z shr.s16 %rs40, %rs39, 4; 2026-02-21T12:43:14.3137005Z .loc 1 77 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:77:32 2026-02-21T12:43:14.3137374Z cvt.rn.f32.s16 %r2254, %rs37; 2026-02-21T12:43:14.3137567Z cvt.rn.f32.s16 %r2255, %rs40; 2026-02-21T12:43:14.3137738Z bar.sync 0; 2026-02-21T12:43:14.3137892Z st.shared.b32 [%r13], %r2254; 2026-02-21T12:43:14.3138074Z st.shared.b32 [%r14], %r2255; 2026-02-21T12:43:14.3138256Z $L__tmp3: 2026-02-21T12:43:14.3138610Z .loc 2 291 36 // standard.py:291:36 @[ cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:84:40 ] 2026-02-21T12:43:14.3139037Z // begin inline asm 2026-02-21T12:43:14.3139217Z fence.proxy.async.shared::cta; 2026-02-21T12:43:14.3139399Z // end inline asm 2026-02-21T12:43:14.3139564Z bar.sync 0; 2026-02-21T12:43:14.3139723Z wgmma.fence.sync.aligned; 2026-02-21T12:43:14.3139904Z // begin inline asm 2026-02-21T12:43:14.3141264Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7617,%r7618,%r7619,%r7620,%r7621,%r7622,%r7623,%r7624,%r7625,%r7626,%r7627,%r7628,%r7629,%r7630,%r7631,%r7632,%r7633,%r7634,%r7635,%r7636,%r7637,%r7638,%r7639,%r7640,%r7641,%r7642,%r7643,%r7644,%r7645,%r7646,%r7647,%r7648,%r7649,%r7650,%r7651,%r7652,%r7653,%r7654,%r7655,%r7656,%r7657,%r7658,%r7659,%r7660,%r7661,%r7662,%r7663,%r7664,%r7665,%r7666,%r7667,%r7668,%r7669,%r7670,%r7671,%r7672,%r7673,%r7674,%r7675,%r7676,%r7677,%r7678,%r7679,%r7680}, {%r1177,%r1178,%r1179,%r1180}, %rd14, %p4, 1, 1; 2026-02-21T12:43:14.3142830Z // end inline asm 2026-02-21T12:43:14.3142992Z // begin inline asm 2026-02-21T12:43:14.3144356Z 
wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7617,%r7618,%r7619,%r7620,%r7621,%r7622,%r7623,%r7624,%r7625,%r7626,%r7627,%r7628,%r7629,%r7630,%r7631,%r7632,%r7633,%r7634,%r7635,%r7636,%r7637,%r7638,%r7639,%r7640,%r7641,%r7642,%r7643,%r7644,%r7645,%r7646,%r7647,%r7648,%r7649,%r7650,%r7651,%r7652,%r7653,%r7654,%r7655,%r7656,%r7657,%r7658,%r7659,%r7660,%r7661,%r7662,%r7663,%r7664,%r7665,%r7666,%r7667,%r7668,%r7669,%r7670,%r7671,%r7672,%r7673,%r7674,%r7675,%r7676,%r7677,%r7678,%r7679,%r7680}, {%r1309,%r1310,%r1311,%r1312}, %rd15, %p4, 1, 1; 2026-02-21T12:43:14.3145762Z // end inline asm 2026-02-21T12:43:14.3145934Z wgmma.commit_group.sync.aligned; 2026-02-21T12:43:14.3146136Z mov.b32 %r1378, %r2182; 2026-02-21T12:43:14.3146382Z mov.b32 %r1379, %r2182; 2026-02-21T12:43:14.3146753Z mov.b32 %r1377, %r6277; 2026-02-21T12:43:14.3146928Z // begin inline asm 2026-02-21T12:43:14.3148107Z // wait for regs: %r7617,%r7618,%r7619,%r7620,%r7621,%r7622,%r7623,%r7624,%r7625,%r7626,%r7627,%r7628,%r7629,%r7630,%r7631,%r7632,%r7633,%r7634,%r7635,%r7636,%r7637,%r7638,%r7639,%r7640,%r7641,%r7642,%r7643,%r7644,%r7645,%r7646,%r7647,%r7648,%r7649,%r7650,%r7651,%r7652,%r7653,%r7654,%r7655,%r7656,%r7657,%r7658,%r7659,%r7660,%r7661,%r7662,%r7663,%r7664,%r7665,%r7666,%r7667,%r7668,%r7669,%r7670,%r7671,%r7672,%r7673,%r7674,%r7675,%r7676,%r7677,%r7678,%r7679,%r7680,%r1377,%r1378,%r1379 2026-02-21T12:43:14.3149419Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:43:14.3149615Z // end inline asm 2026-02-21T12:43:14.3149760Z $L__tmp4: 2026-02-21T12:43:14.3150063Z .loc 1 48 80 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:48:80 2026-02-21T12:43:14.3150440Z add.s64 %rd132, %rd401, -32; 2026-02-21T12:43:14.3150624Z // begin inline asm 2026-02-21T12:43:14.3150794Z mov.u64 %rd131, 0x0; 2026-02-21T12:43:14.3151020Z createpolicy.fractional.L2::evict_last.b64 %rd131, 1.0; 2026-02-21T12:43:14.3151277Z // end inline asm 2026-02-21T12:43:14.3151424Z // begin inline asm 
2026-02-21T12:43:14.3151582Z mov.u32 %r1447, 0x0; 2026-02-21T12:43:14.3151736Z mov.u32 %r1448, 0x0; 2026-02-21T12:43:14.3151891Z mov.u32 %r1449, 0x0; 2026-02-21T12:43:14.3152043Z mov.u32 %r1450, 0x0; 2026-02-21T12:43:14.3152360Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1447, %r1448, %r1449, %r1450 }, [ %rd132 + 0 ], %rd131; 2026-02-21T12:43:14.3152727Z // end inline asm 2026-02-21T12:43:14.3153022Z .loc 1 52 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:52:32 2026-02-21T12:43:14.3153373Z bar.sync 0; 2026-02-21T12:43:14.3153535Z st.shared.v2.b32 [%r7], {%r1447, %r1448}; 2026-02-21T12:43:14.3153768Z st.shared.v2.b32 [%r8], {%r1449, %r1450}; 2026-02-21T12:43:14.3153980Z bar.sync 0; 2026-02-21T12:43:14.3154141Z ld.shared.b16 %rs41, [%r9]; 2026-02-21T12:43:14.3154338Z ld.shared.b16 %rs42, [%r9+256]; 2026-02-21T12:43:14.3154531Z ld.shared.b16 %rs43, [%r9+16]; 2026-02-21T12:43:14.3154727Z ld.shared.b16 %rs44, [%r9+272]; 2026-02-21T12:43:14.3154917Z ld.shared.b16 %rs45, [%r10]; 2026-02-21T12:43:14.3155108Z ld.shared.b16 %rs46, [%r10+256]; 2026-02-21T12:43:14.3155303Z ld.shared.b16 %rs47, [%r10+16]; 2026-02-21T12:43:14.3155498Z ld.shared.b16 %rs48, [%r10+272]; 2026-02-21T12:43:14.3155690Z cvt.f32.bf16 %r1579, %rs41; 2026-02-21T12:43:14.3155875Z cvt.f32.bf16 %r1580, %rs42; 2026-02-21T12:43:14.3156053Z cvt.f32.bf16 %r1581, %rs45; 2026-02-21T12:43:14.3156225Z cvt.f32.bf16 %r1582, %rs46; 2026-02-21T12:43:14.3156403Z cvt.f32.bf16 %r1711, %rs43; 2026-02-21T12:43:14.3156737Z cvt.f32.bf16 %r1712, %rs44; 2026-02-21T12:43:14.3156913Z cvt.f32.bf16 %r1713, %rs47; 2026-02-21T12:43:14.3157191Z cvt.f32.bf16 %r1714, %rs48; 2026-02-21T12:43:14.3157535Z .loc 1 54 87 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:54:87 2026-02-21T12:43:14.3157982Z add.s64 %rd135, %rd400, -10240; 2026-02-21T12:43:14.3158182Z // begin inline asm 2026-02-21T12:43:14.3158352Z mov.u64 %rd134, 0x0; 2026-02-21T12:43:14.3158588Z createpolicy.fractional.L2::evict_first.b64 %rd134, 
1.0; 2026-02-21T12:43:14.3158857Z // end inline asm 2026-02-21T12:43:14.3159005Z // begin inline asm 2026-02-21T12:43:14.3159162Z mov.u16 %rs3, 0x0; 2026-02-21T12:43:14.3159425Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs3 }, [ %rd135 + 0 ], %rd134; 2026-02-21T12:43:14.3159735Z // end inline asm 2026-02-21T12:43:14.3160033Z .loc 1 62 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:62:28 2026-02-21T12:43:14.3160387Z bar.sync 0; 2026-02-21T12:43:14.3160540Z st.shared.b8 [%r11], %rs3; 2026-02-21T12:43:14.3160720Z bar.sync 0; 2026-02-21T12:43:14.3160883Z ld.shared.v2.b8 {%rs49, %rs50}, [%r12]; 2026-02-21T12:43:14.3161369Z .loc 1 57 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:57:28 2026-02-21T12:43:14.3161751Z shl.b16 %rs51, %rs49, 4; 2026-02-21T12:43:14.3161929Z shl.b16 %rs52, %rs50, 4; 2026-02-21T12:43:14.3162243Z .loc 1 72 58 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:72:58 2026-02-21T12:43:14.3162603Z selp.b16 %rs53, %rs51, %rs49, %p51; 2026-02-21T12:43:14.3162810Z cvt.s16.s8 %rs54, %rs53; 2026-02-21T12:43:14.3162986Z shr.s16 %rs55, %rs54, 4; 2026-02-21T12:43:14.3163159Z selp.b16 %rs56, %rs52, %rs50, %p51; 2026-02-21T12:43:14.3163358Z cvt.s16.s8 %rs57, %rs56; 2026-02-21T12:43:14.3163523Z shr.s16 %rs58, %rs57, 4; 2026-02-21T12:43:14.3163835Z .loc 1 77 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:77:32 2026-02-21T12:43:14.3164185Z cvt.rn.f32.s16 %r2256, %rs55; 2026-02-21T12:43:14.3164374Z cvt.rn.f32.s16 %r2257, %rs58; 2026-02-21T12:43:14.3164545Z bar.sync 0; 2026-02-21T12:43:14.3164715Z st.shared.b32 [%r13], %r2256; 2026-02-21T12:43:14.3164906Z st.shared.b32 [%r14], %r2257; 2026-02-21T12:43:14.3165075Z $L__tmp5: 2026-02-21T12:43:14.3165430Z .loc 2 291 36 // standard.py:291:36 @[ cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:84:40 ] 2026-02-21T12:43:14.3165845Z // begin inline asm 2026-02-21T12:43:14.3166020Z fence.proxy.async.shared::cta; 2026-02-21T12:43:14.3166204Z // end inline asm 
2026-02-21T12:43:14.3166351Z bar.sync 0; 2026-02-21T12:43:14.3166634Z wgmma.fence.sync.aligned; 2026-02-21T12:43:14.3166818Z // begin inline asm 2026-02-21T12:43:14.3168198Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7617,%r7618,%r7619,%r7620,%r7621,%r7622,%r7623,%r7624,%r7625,%r7626,%r7627,%r7628,%r7629,%r7630,%r7631,%r7632,%r7633,%r7634,%r7635,%r7636,%r7637,%r7638,%r7639,%r7640,%r7641,%r7642,%r7643,%r7644,%r7645,%r7646,%r7647,%r7648,%r7649,%r7650,%r7651,%r7652,%r7653,%r7654,%r7655,%r7656,%r7657,%r7658,%r7659,%r7660,%r7661,%r7662,%r7663,%r7664,%r7665,%r7666,%r7667,%r7668,%r7669,%r7670,%r7671,%r7672,%r7673,%r7674,%r7675,%r7676,%r7677,%r7678,%r7679,%r7680}, {%r1579,%r1580,%r1581,%r1582}, %rd14, %p4, 1, 1; 2026-02-21T12:43:14.3169601Z // end inline asm 2026-02-21T12:43:14.3169754Z // begin inline asm 2026-02-21T12:43:14.3171108Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7617,%r7618,%r7619,%r7620,%r7621,%r7622,%r7623,%r7624,%r7625,%r7626,%r7627,%r7628,%r7629,%r7630,%r7631,%r7632,%r7633,%r7634,%r7635,%r7636,%r7637,%r7638,%r7639,%r7640,%r7641,%r7642,%r7643,%r7644,%r7645,%r7646,%r7647,%r7648,%r7649,%r7650,%r7651,%r7652,%r7653,%r7654,%r7655,%r7656,%r7657,%r7658,%r7659,%r7660,%r7661,%r7662,%r7663,%r7664,%r7665,%r7666,%r7667,%r7668,%r7669,%r7670,%r7671,%r7672,%r7673,%r7674,%r7675,%r7676,%r7677,%r7678,%r7679,%r7680}, {%r1711,%r1712,%r1713,%r1714}, %rd15, %p4, 1, 1; 2026-02-21T12:43:14.3172492Z // end inline asm 2026-02-21T12:43:14.3172748Z wgmma.commit_group.sync.aligned; 2026-02-21T12:43:14.3172963Z mov.b32 %r1780, %r2182; 2026-02-21T12:43:14.3173131Z mov.b32 %r1781, %r2182; 2026-02-21T12:43:14.3173402Z mov.b32 %r1779, %r6277; 2026-02-21T12:43:14.3173563Z // begin inline asm 2026-02-21T12:43:14.3174729Z // wait for regs: 
%r7617,%r7618,%r7619,%r7620,%r7621,%r7622,%r7623,%r7624,%r7625,%r7626,%r7627,%r7628,%r7629,%r7630,%r7631,%r7632,%r7633,%r7634,%r7635,%r7636,%r7637,%r7638,%r7639,%r7640,%r7641,%r7642,%r7643,%r7644,%r7645,%r7646,%r7647,%r7648,%r7649,%r7650,%r7651,%r7652,%r7653,%r7654,%r7655,%r7656,%r7657,%r7658,%r7659,%r7660,%r7661,%r7662,%r7663,%r7664,%r7665,%r7666,%r7667,%r7668,%r7669,%r7670,%r7671,%r7672,%r7673,%r7674,%r7675,%r7676,%r7677,%r7678,%r7679,%r7680,%r1779,%r1780,%r1781 2026-02-21T12:43:14.3175946Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:43:14.3176138Z // end inline asm 2026-02-21T12:43:14.3176286Z $L__tmp6: 2026-02-21T12:43:14.3176687Z .loc 1 48 80 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:48:80 2026-02-21T12:43:14.3177049Z // begin inline asm 2026-02-21T12:43:14.3177217Z mov.u64 %rd139, 0x0; 2026-02-21T12:43:14.3177601Z createpolicy.fractional.L2::evict_last.b64 %rd139, 1.0; 2026-02-21T12:43:14.3177876Z // end inline asm 2026-02-21T12:43:14.3178024Z // begin inline asm 2026-02-21T12:43:14.3178180Z mov.u32 %r1849, 0x0; 2026-02-21T12:43:14.3178332Z mov.u32 %r1850, 0x0; 2026-02-21T12:43:14.3178494Z mov.u32 %r1851, 0x0; 2026-02-21T12:43:14.3178646Z mov.u32 %r1852, 0x0; 2026-02-21T12:43:14.3178970Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r1849, %r1850, %r1851, %r1852 }, [ %rd401 + 0 ], %rd139; 2026-02-21T12:43:14.3179336Z // end inline asm 2026-02-21T12:43:14.3179636Z .loc 1 52 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:52:32 2026-02-21T12:43:14.3179992Z bar.sync 0; 2026-02-21T12:43:14.3180154Z st.shared.v2.b32 [%r7], {%r1849, %r1850}; 2026-02-21T12:43:14.3180388Z st.shared.v2.b32 [%r8], {%r1851, %r1852}; 2026-02-21T12:43:14.3180594Z bar.sync 0; 2026-02-21T12:43:14.3180760Z ld.shared.b16 %rs59, [%r9]; 2026-02-21T12:43:14.3180955Z ld.shared.b16 %rs60, [%r9+256]; 2026-02-21T12:43:14.3181161Z ld.shared.b16 %rs61, [%r9+16]; 2026-02-21T12:43:14.3181353Z ld.shared.b16 %rs62, [%r9+272]; 2026-02-21T12:43:14.3181549Z ld.shared.b16 
%rs63, [%r10]; 2026-02-21T12:43:14.3181736Z ld.shared.b16 %rs64, [%r10+256]; 2026-02-21T12:43:14.3181938Z ld.shared.b16 %rs65, [%r10+16]; 2026-02-21T12:43:14.3182130Z ld.shared.b16 %rs66, [%r10+272]; 2026-02-21T12:43:14.3182321Z cvt.f32.bf16 %r1981, %rs59; 2026-02-21T12:43:14.3182504Z cvt.f32.bf16 %r1982, %rs60; 2026-02-21T12:43:14.3182676Z cvt.f32.bf16 %r1983, %rs63; 2026-02-21T12:43:14.3182854Z cvt.f32.bf16 %r1984, %rs64; 2026-02-21T12:43:14.3183025Z cvt.f32.bf16 %r2113, %rs61; 2026-02-21T12:43:14.3183201Z cvt.f32.bf16 %r2114, %rs62; 2026-02-21T12:43:14.3183372Z cvt.f32.bf16 %r2115, %rs65; 2026-02-21T12:43:14.3183547Z cvt.f32.bf16 %r2116, %rs66; 2026-02-21T12:43:14.3183871Z .loc 1 54 87 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:54:87 2026-02-21T12:43:14.3184220Z // begin inline asm 2026-02-21T12:43:14.3184389Z mov.u64 %rd142, 0x0; 2026-02-21T12:43:14.3184609Z createpolicy.fractional.L2::evict_first.b64 %rd142, 1.0; 2026-02-21T12:43:14.3184873Z // end inline asm 2026-02-21T12:43:14.3185023Z // begin inline asm 2026-02-21T12:43:14.3185181Z mov.u16 %rs4, 0x0; 2026-02-21T12:43:14.3185431Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs4 }, [ %rd400 + 0 ], %rd142; 2026-02-21T12:43:14.3185735Z // end inline asm 2026-02-21T12:43:14.3186039Z .loc 1 62 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:62:28 2026-02-21T12:43:14.3186391Z bar.sync 0; 2026-02-21T12:43:14.3186673Z st.shared.b8 [%r11], %rs4; 2026-02-21T12:43:14.3186859Z bar.sync 0; 2026-02-21T12:43:14.3187026Z ld.shared.v2.b8 {%rs67, %rs68}, [%r12]; 2026-02-21T12:43:14.3187385Z .loc 1 57 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:57:28 2026-02-21T12:43:14.3187844Z shl.b16 %rs69, %rs67, 4; 2026-02-21T12:43:14.3188031Z shl.b16 %rs70, %rs68, 4; 2026-02-21T12:43:14.3188421Z .loc 1 72 58 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:72:58 2026-02-21T12:43:14.3188865Z selp.b16 %rs71, %rs69, %rs67, %p51; 2026-02-21T12:43:14.3189065Z cvt.s16.s8 %rs72, 
%rs71; 2026-02-21T12:43:14.3189241Z shr.s16 %rs73, %rs72, 4; 2026-02-21T12:43:14.3189415Z selp.b16 %rs74, %rs70, %rs68, %p51; 2026-02-21T12:43:14.3189620Z cvt.s16.s8 %rs75, %rs74; 2026-02-21T12:43:14.3189784Z shr.s16 %rs76, %rs75, 4; 2026-02-21T12:43:14.3190097Z .loc 1 77 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:77:32 2026-02-21T12:43:14.3190457Z cvt.rn.f32.s16 %r2258, %rs73; 2026-02-21T12:43:14.3190643Z cvt.rn.f32.s16 %r2259, %rs76; 2026-02-21T12:43:14.3190826Z bar.sync 0; 2026-02-21T12:43:14.3190974Z st.shared.b32 [%r13], %r2258; 2026-02-21T12:43:14.3191167Z st.shared.b32 [%r14], %r2259; 2026-02-21T12:43:14.3191340Z $L__tmp7: 2026-02-21T12:43:14.3191850Z .loc 2 291 36 // standard.py:291:36 @[ cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:84:40 ] 2026-02-21T12:43:14.3192290Z // begin inline asm 2026-02-21T12:43:14.3192472Z fence.proxy.async.shared::cta; 2026-02-21T12:43:14.3192667Z // end inline asm 2026-02-21T12:43:14.3192814Z bar.sync 0; 2026-02-21T12:43:14.3192979Z wgmma.fence.sync.aligned; 2026-02-21T12:43:14.3193158Z // begin inline asm 2026-02-21T12:43:14.3194518Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7617,%r7618,%r7619,%r7620,%r7621,%r7622,%r7623,%r7624,%r7625,%r7626,%r7627,%r7628,%r7629,%r7630,%r7631,%r7632,%r7633,%r7634,%r7635,%r7636,%r7637,%r7638,%r7639,%r7640,%r7641,%r7642,%r7643,%r7644,%r7645,%r7646,%r7647,%r7648,%r7649,%r7650,%r7651,%r7652,%r7653,%r7654,%r7655,%r7656,%r7657,%r7658,%r7659,%r7660,%r7661,%r7662,%r7663,%r7664,%r7665,%r7666,%r7667,%r7668,%r7669,%r7670,%r7671,%r7672,%r7673,%r7674,%r7675,%r7676,%r7677,%r7678,%r7679,%r7680}, {%r1981,%r1982,%r1983,%r1984}, %rd14, %p4, 1, 1; 2026-02-21T12:43:14.3195921Z // end inline asm 2026-02-21T12:43:14.3196073Z // begin inline asm 2026-02-21T12:43:14.3197555Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r7617,%r7618,%r7619,%r7620,%r7621,%r7622,%r7623,%r7624,%r7625,%r7626,%r7627,%r7628,%r7629,%r7630,%r7631,%r7632,%r7633,%r7634,%r7635,%r7636,%r7637,%r7638,%r7639,%r7640,%r7641,%r7642,%r7643,%r7644,%r7645,%r7646,%r7647,%r7648,%r7649,%r7650,%r7651,%r7652,%r7653,%r7654,%r7655,%r7656,%r7657,%r7658,%r7659,%r7660,%r7661,%r7662,%r7663,%r7664,%r7665,%r7666,%r7667,%r7668,%r7669,%r7670,%r7671,%r7672,%r7673,%r7674,%r7675,%r7676,%r7677,%r7678,%r7679,%r7680}, {%r2113,%r2114,%r2115,%r2116}, %rd15, %p4, 1, 1; 2026-02-21T12:43:14.3198950Z // end inline asm 2026-02-21T12:43:14.3199115Z wgmma.commit_group.sync.aligned; 2026-02-21T12:43:14.3199317Z mov.b32 %r2181, %r6277; 2026-02-21T12:43:14.3199482Z mov.b32 %r2183, %r2182; 2026-02-21T12:43:14.3199651Z // begin inline asm 2026-02-21T12:43:14.3200825Z // wait for regs: %r7617,%r7618,%r7619,%r7620,%r7621,%r7622,%r7623,%r7624,%r7625,%r7626,%r7627,%r7628,%r7629,%r7630,%r7631,%r7632,%r7633,%r7634,%r7635,%r7636,%r7637,%r7638,%r7639,%r7640,%r7641,%r7642,%r7643,%r7644,%r7645,%r7646,%r7647,%r7648,%r7649,%r7650,%r7651,%r7652,%r7653,%r7654,%r7655,%r7656,%r7657,%r7658,%r7659,%r7660,%r7661,%r7662,%r7663,%r7664,%r7665,%r7666,%r7667,%r7668,%r7669,%r7670,%r7671,%r7672,%r7673,%r7674,%r7675,%r7676,%r7677,%r7678,%r7679,%r7680,%r2181,%r2182,%r2183 2026-02-21T12:43:14.3202045Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:43:14.3202240Z // end inline asm 2026-02-21T12:43:14.3202384Z $L__tmp8: 2026-02-21T12:43:14.3202678Z .loc 1 40 126 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:40:126 2026-02-21T12:43:14.3203042Z add.s64 %rd402, %rd402, 32; 2026-02-21T12:43:14.3203234Z add.s64 %rd401, %rd401, 128; 2026-02-21T12:43:14.3203421Z add.s64 %rd400, %rd400, 40960; 2026-02-21T12:43:14.3203710Z setp.lt.u64 %p13, %rd402, 4064; 2026-02-21T12:43:14.3203904Z @%p13 bra $L__BB0_12; 2026-02-21T12:43:14.3204199Z // %bb.13: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:43:14.3204610Z .loc 1 31 32 // 
cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:31:32 2026-02-21T12:43:14.3204978Z or.b64 %rd155, %rd31, %rd3; 2026-02-21T12:43:14.3205313Z .loc 1 33 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:33:32 2026-02-21T12:43:14.3205675Z or.b64 %rd156, %rd32, %rd6; 2026-02-21T12:43:14.3205852Z or.b64 %rd157, %rd32, %rd7; 2026-02-21T12:43:14.3206032Z or.b64 %rd158, %rd32, %rd8; 2026-02-21T12:43:14.3206209Z or.b64 %rd159, %rd32, %rd9; 2026-02-21T12:43:14.3206393Z or.b64 %rd160, %rd32, %rd10; 2026-02-21T12:43:14.3206707Z or.b64 %rd161, %rd32, %rd11; 2026-02-21T12:43:14.3206892Z or.b64 %rd162, %rd32, %rd12; 2026-02-21T12:43:14.3207069Z or.b64 %rd163, %rd32, %rd13; 2026-02-21T12:43:14.3207392Z .loc 1 87 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:87:28 2026-02-21T12:43:14.3207910Z cvt.rn.bf16x2.f32 %r2332, %r7618, %r7617; 2026-02-21T12:43:14.3208157Z cvt.rn.bf16x2.f32 %r2333, %r7620, %r7619; 2026-02-21T12:43:14.3208385Z cvt.rn.bf16x2.f32 %r2334, %r7622, %r7621; 2026-02-21T12:43:14.3208602Z cvt.rn.bf16x2.f32 %r2335, %r7624, %r7623; 2026-02-21T12:43:14.3208830Z cvt.rn.bf16x2.f32 %r2336, %r7626, %r7625; 2026-02-21T12:43:14.3209047Z cvt.rn.bf16x2.f32 %r2337, %r7628, %r7627; 2026-02-21T12:43:14.3209268Z cvt.rn.bf16x2.f32 %r2338, %r7630, %r7629; 2026-02-21T12:43:14.3209484Z cvt.rn.bf16x2.f32 %r2339, %r7632, %r7631; 2026-02-21T12:43:14.3209705Z cvt.rn.bf16x2.f32 %r2340, %r7634, %r7633; 2026-02-21T12:43:14.3209928Z cvt.rn.bf16x2.f32 %r2341, %r7636, %r7635; 2026-02-21T12:43:14.3210145Z cvt.rn.bf16x2.f32 %r2342, %r7638, %r7637; 2026-02-21T12:43:14.3210372Z cvt.rn.bf16x2.f32 %r2343, %r7640, %r7639; 2026-02-21T12:43:14.3210589Z cvt.rn.bf16x2.f32 %r2344, %r7642, %r7641; 2026-02-21T12:43:14.3210811Z cvt.rn.bf16x2.f32 %r2345, %r7644, %r7643; 2026-02-21T12:43:14.3211031Z cvt.rn.bf16x2.f32 %r2346, %r7646, %r7645; 2026-02-21T12:43:14.3211256Z cvt.rn.bf16x2.f32 %r2347, %r7648, %r7647; 2026-02-21T12:43:14.3211471Z cvt.rn.bf16x2.f32 %r2348, %r7650, %r7649; 
2026-02-21T12:43:14.3211695Z cvt.rn.bf16x2.f32 %r2349, %r7652, %r7651;
2026-02-21T12:43:14.3211915Z cvt.rn.bf16x2.f32 %r2350, %r7654, %r7653;
2026-02-21T12:43:14.3212530Z [8293s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config.
2026-02-21T12:43:14.3214145Z Config: @helion.kernel(config=helion.Config(block_sizes=[8, 512, 128], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[8], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=2, num_stages=7, num_warps=32, pid_type='persistent_interleaved', range_flattens=[None, False], range_multi_buffers=[True, True], range_num_stages=[2, 4], range_unroll_factors=[3, 4], range_warp_specializes=[]), static_shapes=True)
2026-02-21T12:43:14.3215658Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error
2026-02-21T12:43:14.3215941Z `ptxas` stderr:
2026-02-21T12:43:14.3216603Z ptxas fatal : (C7602) Insufficient registers (64) to compile instruction at line 376 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher.
2026-02-21T12:43:14.3217244Z ptxas fatal : Ptx assembly aborted due to errors
2026-02-21T12:43:14.3217439Z
2026-02-21T12:43:14.3217943Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpoqe17qpi.ptx -o /tmp/tmpoqe17qpi.ptx.o
2026-02-21T12:43:14.3218526Z
2026-02-21T12:43:14.3218687Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code.
2026-02-21T12:43:14.3218996Z cvt.rn.bf16x2.f32 %r2351, %r7656, %r7655; 2026-02-21T12:43:14.3219222Z cvt.rn.bf16x2.f32 %r2352, %r7658, %r7657; 2026-02-21T12:43:14.3219557Z cvt.rn.bf16x2.f32 %r2353, %r7660, %r7659; 2026-02-21T12:43:14.3219771Z cvt.rn.bf16x2.f32 %r2354, %r7662, %r7661; 2026-02-21T12:43:14.3220072Z cvt.rn.bf16x2.f32 %r2355, %r7664, %r7663; 2026-02-21T12:43:14.3220286Z cvt.rn.bf16x2.f32 %r2356, %r7666, %r7665; 2026-02-21T12:43:14.3220503Z cvt.rn.bf16x2.f32 %r2357, %r7668, %r7667; 2026-02-21T12:43:14.3220715Z cvt.rn.bf16x2.f32 %r2358, %r7670, %r7669; 2026-02-21T12:43:14.3220933Z cvt.rn.bf16x2.f32 %r2359, %r7672, %r7671; 2026-02-21T12:43:14.3221153Z cvt.rn.bf16x2.f32 %r2360, %r7674, %r7673; 2026-02-21T12:43:14.3221367Z cvt.rn.bf16x2.f32 %r2361, %r7676, %r7675; 2026-02-21T12:43:14.3221588Z cvt.rn.bf16x2.f32 %r2362, %r7678, %r7677; 2026-02-21T12:43:14.3221798Z cvt.rn.bf16x2.f32 %r2363, %r7680, %r7679; 2026-02-21T12:43:14.3222153Z .loc 1 88 22 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:88:22 2026-02-21T12:43:14.3222523Z mad.lo.s64 %rd164, %rd156, 2560, %rd88; 2026-02-21T12:43:14.3222739Z shl.b64 %rd165, %rd155, 1; 2026-02-21T12:43:14.3222925Z add.s64 %rd147, %rd164, %rd165; 2026-02-21T12:43:14.3223297Z mad.lo.s64 %rd166, %rd157, 2560, %rd88; 2026-02-21T12:43:14.3223521Z add.s64 %rd148, %rd166, %rd165; 2026-02-21T12:43:14.3223715Z mad.lo.s64 %rd167, %rd158, 2560, %rd88; 2026-02-21T12:43:14.3223923Z add.s64 %rd149, %rd167, %rd165; 2026-02-21T12:43:14.3224111Z mad.lo.s64 %rd168, %rd159, 2560, %rd88; 2026-02-21T12:43:14.3224317Z add.s64 %rd150, %rd168, %rd165; 2026-02-21T12:43:14.3224507Z mad.lo.s64 %rd169, %rd160, 2560, %rd88; 2026-02-21T12:43:14.3224713Z add.s64 %rd151, %rd169, %rd165; 2026-02-21T12:43:14.3224902Z mad.lo.s64 %rd170, %rd161, 2560, %rd88; 2026-02-21T12:43:14.3225110Z add.s64 %rd152, %rd170, %rd165; 2026-02-21T12:43:14.3225309Z mad.lo.s64 %rd171, %rd162, 2560, %rd88; 2026-02-21T12:43:14.3225509Z add.s64 %rd153, %rd171, %rd165; 
2026-02-21T12:43:14.3225704Z mad.lo.s64 %rd172, %rd163, 2560, %rd88; 2026-02-21T12:43:14.3225916Z add.s64 %rd154, %rd172, %rd165; 2026-02-21T12:43:14.3226253Z .loc 1 88 81 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:88:81 2026-02-21T12:43:14.3226727Z bar.sync 0; 2026-02-21T12:43:14.3226934Z st.shared.v4.b32 [%r15], {%r2332, %r2334, %r2336, %r2338}; 2026-02-21T12:43:14.3227255Z st.shared.v4.b32 [%r15+512], {%r2333, %r2335, %r2337, %r2339}; 2026-02-21T12:43:14.3227559Z st.shared.v4.b32 [%r16], {%r2340, %r2342, %r2344, %r2346}; 2026-02-21T12:43:14.3227859Z st.shared.v4.b32 [%r16+512], {%r2341, %r2343, %r2345, %r2347}; 2026-02-21T12:43:14.3228154Z st.shared.v4.b32 [%r17], {%r2348, %r2350, %r2352, %r2354}; 2026-02-21T12:43:14.3228453Z st.shared.v4.b32 [%r17+512], {%r2349, %r2351, %r2353, %r2355}; 2026-02-21T12:43:14.3228817Z st.shared.v4.b32 [%r18], {%r2356, %r2358, %r2360, %r2362}; 2026-02-21T12:43:14.3229112Z st.shared.v4.b32 [%r18+512], {%r2357, %r2359, %r2361, %r2363}; 2026-02-21T12:43:14.3229355Z bar.sync 0; 2026-02-21T12:43:14.3229510Z // begin inline asm 2026-02-21T12:43:14.3229803Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2260, %r2261, %r2262, %r2263}, [%r2264]; 2026-02-21T12:43:14.3230140Z // end inline asm 2026-02-21T12:43:14.3230298Z // begin inline asm 2026-02-21T12:43:14.3230578Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2265, %r2266, %r2267, %r2268}, [%r2269]; 2026-02-21T12:43:14.3230911Z // end inline asm 2026-02-21T12:43:14.3231058Z // begin inline asm 2026-02-21T12:43:14.3231340Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2270, %r2271, %r2272, %r2273}, [%r2274]; 2026-02-21T12:43:14.3231675Z // end inline asm 2026-02-21T12:43:14.3231825Z // begin inline asm 2026-02-21T12:43:14.3232102Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2275, %r2276, %r2277, %r2278}, [%r2279]; 2026-02-21T12:43:14.3232424Z // end inline asm 2026-02-21T12:43:14.3232576Z // begin inline asm 2026-02-21T12:43:14.3232843Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 
{%r2280, %r2281, %r2282, %r2283}, [%r2284]; 2026-02-21T12:43:14.3233301Z // end inline asm 2026-02-21T12:43:14.3233447Z // begin inline asm 2026-02-21T12:43:14.3233725Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2285, %r2286, %r2287, %r2288}, [%r2289]; 2026-02-21T12:43:14.3234133Z // end inline asm 2026-02-21T12:43:14.3234298Z // begin inline asm 2026-02-21T12:43:14.3234578Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2290, %r2291, %r2292, %r2293}, [%r2294]; 2026-02-21T12:43:14.3234902Z // end inline asm 2026-02-21T12:43:14.3235055Z // begin inline asm 2026-02-21T12:43:14.3235324Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2295, %r2296, %r2297, %r2298}, [%r2299]; 2026-02-21T12:43:14.3235653Z // end inline asm 2026-02-21T12:43:14.3235800Z // begin inline asm 2026-02-21T12:43:14.3236031Z st.global.v4.b32 [ %rd147 + 0 ], { %r2260, %r2261, %r2262, %r2263 }; 2026-02-21T12:43:14.3236295Z // end inline asm 2026-02-21T12:43:14.3236450Z // begin inline asm 2026-02-21T12:43:14.3236797Z st.global.v4.b32 [ %rd148 + 0 ], { %r2265, %r2266, %r2267, %r2268 }; 2026-02-21T12:43:14.3237056Z // end inline asm 2026-02-21T12:43:14.3237210Z // begin inline asm 2026-02-21T12:43:14.3237521Z st.global.v4.b32 [ %rd149 + 0 ], { %r2270, %r2271, %r2272, %r2273 }; 2026-02-21T12:43:14.3237871Z // end inline asm 2026-02-21T12:43:14.3238024Z // begin inline asm 2026-02-21T12:43:14.3238241Z st.global.v4.b32 [ %rd150 + 0 ], { %r2275, %r2276, %r2277, %r2278 }; 2026-02-21T12:43:14.3238497Z // end inline asm 2026-02-21T12:43:14.3238653Z // begin inline asm 2026-02-21T12:43:14.3238876Z st.global.v4.b32 [ %rd151 + 0 ], { %r2280, %r2281, %r2282, %r2283 }; 2026-02-21T12:43:14.3239132Z // end inline asm 2026-02-21T12:43:14.3239299Z // begin inline asm 2026-02-21T12:43:14.3239517Z st.global.v4.b32 [ %rd152 + 0 ], { %r2285, %r2286, %r2287, %r2288 }; 2026-02-21T12:43:14.3239780Z // end inline asm 2026-02-21T12:43:14.3239928Z // begin inline asm 2026-02-21T12:43:14.3240146Z st.global.v4.b32 [ %rd153 + 0 ], { 
%r2290, %r2291, %r2292, %r2293 }; 2026-02-21T12:43:14.3240403Z // end inline asm 2026-02-21T12:43:14.3240554Z // begin inline asm 2026-02-21T12:43:14.3240769Z st.global.v4.b32 [ %rd154 + 0 ], { %r2295, %r2296, %r2297, %r2298 }; 2026-02-21T12:43:14.3241025Z // end inline asm 2026-02-21T12:43:14.3241409Z .loc 1 19 124 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:19:124 2026-02-21T12:43:14.3241775Z add.s64 %rd173, %rd398, 264; 2026-02-21T12:43:14.3242106Z .loc 1 25 35 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:25:35 2026-02-21T12:43:14.3242464Z shr.s64 %rd174, %rd173, 63; 2026-02-21T12:43:14.3242651Z shr.u64 %rd175, %rd174, 52; 2026-02-21T12:43:14.3242829Z add.s64 %rd176, %rd173, %rd175; 2026-02-21T12:43:14.3243022Z shr.s64 %rd177, %rd176, 12; 2026-02-21T12:43:14.3243350Z .loc 1 26 33 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:26:33 2026-02-21T12:43:14.3243705Z shl.b64 %rd41, %rd177, 3; 2026-02-21T12:43:14.3244024Z .loc 1 27 39 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:27:39 2026-02-21T12:43:14.3244375Z sub.s64 %rd178, 10, %rd41; 2026-02-21T12:43:14.3244695Z .loc 1 27 52 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:27:52 2026-02-21T12:43:14.3245048Z min.s64 %rd42, %rd178, 8; 2026-02-21T12:43:14.3245365Z .loc 1 28 45 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:28:45 2026-02-21T12:43:14.3245727Z and.b64 %rd179, %rd176, -4096; 2026-02-21T12:43:14.3245918Z sub.s64 %rd43, %rd173, %rd179; 2026-02-21T12:43:14.3246257Z .loc 1 29 51 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:29:51 2026-02-21T12:43:14.3246769Z or.b64 %rd180, %rd43, %rd42; 2026-02-21T12:43:14.3246965Z and.b64 %rd181, %rd180, -4294967296; 2026-02-21T12:43:14.3247172Z setp.ne.b64 %p14, %rd181, 0; 2026-02-21T12:43:14.3247382Z @%p14 bra $L__BB0_15; 2026-02-21T12:43:14.3247559Z bra.uni $L__BB0_14; 2026-02-21T12:43:14.3247779Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:43:14.3248145Z div.s64 
%rd403, %rd43, %rd42; 2026-02-21T12:43:14.3248328Z bra.uni $L__BB0_16; 2026-02-21T12:43:14.3248624Z $L__BB0_14: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:43:14.3248887Z cvt.u32.u64 %r2364, %rd42; 2026-02-21T12:43:14.3249068Z cvt.u32.u64 %r2365, %rd43; 2026-02-21T12:43:14.3249242Z div.u32 %r2366, %r2365, %r2364; 2026-02-21T12:43:14.3249450Z cvt.u64.u32 %rd403, %r2366; 2026-02-21T12:43:14.3249682Z $L__BB0_16: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:43:14.3250078Z .loc 1 28 64 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:28:64 2026-02-21T12:43:14.3250440Z mul.lo.s64 %rd183, %rd403, %rd42; 2026-02-21T12:43:14.3250637Z sub.s64 %rd184, %rd43, %rd183; 2026-02-21T12:43:14.3250973Z .loc 1 28 30 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:28:30 2026-02-21T12:43:14.3251330Z add.s64 %rd185, %rd184, %rd41; 2026-02-21T12:43:14.3251657Z .loc 1 30 27 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:30:27 2026-02-21T12:43:14.3252160Z shl.b64 %rd47, %rd185, 7; 2026-02-21T12:43:14.3252497Z .loc 1 32 27 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:32:27 2026-02-21T12:43:14.3252855Z shl.b64 %rd48, %rd403, 9; 2026-02-21T12:43:14.3253172Z .loc 1 40 126 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:40:126 2026-02-21T12:43:14.3253539Z shl.b64 %rd186, %rd403, 23; 2026-02-21T12:43:14.3253724Z add.s64 %rd405, %rd16, %rd186; 2026-02-21T12:43:14.3253916Z add.s64 %rd404, %rd17, %rd47; 2026-02-21T12:43:14.3254095Z mov.b32 %r7681, 0f00000000; 2026-02-21T12:43:14.3254279Z mov.b64 %rd406, -32; 2026-02-21T12:43:14.3254453Z mov.b32 %r7682, %r7681; 2026-02-21T12:43:14.3254622Z mov.b32 %r7683, %r7681; 2026-02-21T12:43:14.3254792Z mov.b32 %r7684, %r7681; 2026-02-21T12:43:14.3254957Z mov.b32 %r7685, %r7681; 2026-02-21T12:43:14.3255120Z mov.b32 %r7686, %r7681; 2026-02-21T12:43:14.3255279Z mov.b32 %r7687, %r7681; 2026-02-21T12:43:14.3255448Z mov.b32 %r7688, %r7681; 2026-02-21T12:43:14.3255609Z mov.b32 %r7689, %r7681; 
2026-02-21T12:43:14.3255778Z mov.b32 %r7690, %r7681; 2026-02-21T12:43:14.3255942Z mov.b32 %r7691, %r7681; 2026-02-21T12:43:14.3256110Z mov.b32 %r7692, %r7681; 2026-02-21T12:43:14.3256275Z mov.b32 %r7693, %r7681; 2026-02-21T12:43:14.3256435Z mov.b32 %r7694, %r7681; 2026-02-21T12:43:14.3256752Z mov.b32 %r7695, %r7681; 2026-02-21T12:43:14.3256920Z mov.b32 %r7696, %r7681; 2026-02-21T12:43:14.3257086Z mov.b32 %r7697, %r7681; 2026-02-21T12:43:14.3257246Z mov.b32 %r7698, %r7681; 2026-02-21T12:43:14.3257422Z mov.b32 %r7699, %r7681; 2026-02-21T12:43:14.3257585Z mov.b32 %r7700, %r7681; 2026-02-21T12:43:14.3257752Z mov.b32 %r7701, %r7681; 2026-02-21T12:43:14.3257913Z mov.b32 %r7702, %r7681; 2026-02-21T12:43:14.3258079Z mov.b32 %r7703, %r7681; 2026-02-21T12:43:14.3258250Z mov.b32 %r7704, %r7681; 2026-02-21T12:43:14.3258412Z mov.b32 %r7705, %r7681; 2026-02-21T12:43:14.3258578Z mov.b32 %r7706, %r7681; 2026-02-21T12:43:14.3258752Z mov.b32 %r7707, %r7681; 2026-02-21T12:43:14.3258918Z mov.b32 %r7708, %r7681; 2026-02-21T12:43:14.3259080Z mov.b32 %r7709, %r7681; 2026-02-21T12:43:14.3259248Z mov.b32 %r7710, %r7681; 2026-02-21T12:43:14.3259408Z mov.b32 %r7711, %r7681; 2026-02-21T12:43:14.3259573Z mov.b32 %r7712, %r7681; 2026-02-21T12:43:14.3259733Z mov.b32 %r7713, %r7681; 2026-02-21T12:43:14.3259911Z mov.b32 %r7714, %r7681; 2026-02-21T12:43:14.3260077Z mov.b32 %r7715, %r7681; 2026-02-21T12:43:14.3260237Z mov.b32 %r7716, %r7681; 2026-02-21T12:43:14.3260406Z mov.b32 %r7717, %r7681; 2026-02-21T12:43:14.3260568Z mov.b32 %r7718, %r7681; 2026-02-21T12:43:14.3260739Z mov.b32 %r7719, %r7681; 2026-02-21T12:43:14.3260902Z mov.b32 %r7720, %r7681; 2026-02-21T12:43:14.3261070Z mov.b32 %r7721, %r7681; 2026-02-21T12:43:14.3261336Z mov.b32 %r7722, %r7681; 2026-02-21T12:43:14.3261509Z mov.b32 %r7723, %r7681; 2026-02-21T12:43:14.3261675Z mov.b32 %r7724, %r7681; 2026-02-21T12:43:14.3261919Z mov.b32 %r7725, %r7681; 2026-02-21T12:43:14.3262093Z mov.b32 %r7726, %r7681; 2026-02-21T12:43:14.3262256Z mov.b32 
%r7727, %r7681; 2026-02-21T12:43:14.3262424Z mov.b32 %r7728, %r7681; 2026-02-21T12:43:14.3262588Z mov.b32 %r7729, %r7681; 2026-02-21T12:43:14.3262756Z mov.b32 %r7730, %r7681; 2026-02-21T12:43:14.3262916Z mov.b32 %r7731, %r7681; 2026-02-21T12:43:14.3263086Z mov.b32 %r7732, %r7681; 2026-02-21T12:43:14.3263250Z mov.b32 %r7733, %r7681; 2026-02-21T12:43:14.3263415Z mov.b32 %r7734, %r7681; 2026-02-21T12:43:14.3263576Z mov.b32 %r7735, %r7681; 2026-02-21T12:43:14.3263742Z mov.b32 %r7736, %r7681; 2026-02-21T12:43:14.3263910Z mov.b32 %r7737, %r7681; 2026-02-21T12:43:14.3264070Z mov.b32 %r7738, %r7681; 2026-02-21T12:43:14.3264237Z mov.b32 %r7739, %r7681; 2026-02-21T12:43:14.3264412Z mov.b32 %r7740, %r7681; 2026-02-21T12:43:14.3264581Z mov.b32 %r7741, %r7681; 2026-02-21T12:43:14.3264742Z mov.b32 %r7742, %r7681; 2026-02-21T12:43:14.3264910Z mov.b32 %r7743, %r7681; 2026-02-21T12:43:14.3265220Z mov.b32 %r7744, %r7681; 2026-02-21T12:43:14.3265456Z $L__BB0_17: // Parent Loop BB0_2 Depth=1 2026-02-21T12:43:14.3265754Z // => This Inner Loop Header: Depth=2 2026-02-21T12:43:14.3266145Z .loc 1 48 80 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:48:80 2026-02-21T12:43:14.3266625Z add.s64 %rd188, %rd405, -96; 2026-02-21T12:43:14.3266812Z // begin inline asm 2026-02-21T12:43:14.3266976Z mov.u64 %rd187, 0x0; 2026-02-21T12:43:14.3267202Z createpolicy.fractional.L2::evict_last.b64 %rd187, 1.0; 2026-02-21T12:43:14.3267479Z // end inline asm 2026-02-21T12:43:14.3267631Z // begin inline asm 2026-02-21T12:43:14.3267790Z mov.u32 %r2368, 0x0; 2026-02-21T12:43:14.3267951Z mov.u32 %r2369, 0x0; 2026-02-21T12:43:14.3268107Z mov.u32 %r2370, 0x0; 2026-02-21T12:43:14.3268263Z mov.u32 %r2371, 0x0; 2026-02-21T12:43:14.3268661Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r2368, %r2369, %r2370, %r2371 }, [ %rd188 + 0 ], %rd187; 2026-02-21T12:43:14.3269055Z // end inline asm 2026-02-21T12:43:14.3269352Z .loc 1 52 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:52:32 
2026-02-21T12:43:14.3269705Z bar.sync 0; 2026-02-21T12:43:14.3269874Z st.shared.v2.b32 [%r7], {%r2368, %r2369}; 2026-02-21T12:43:14.3270109Z st.shared.v2.b32 [%r8], {%r2370, %r2371}; 2026-02-21T12:43:14.3270318Z bar.sync 0; 2026-02-21T12:43:14.3270470Z ld.shared.b16 %rs81, [%r27]; 2026-02-21T12:43:14.3270669Z ld.shared.b16 %rs82, [%r27+256]; 2026-02-21T12:43:14.3270867Z ld.shared.b16 %rs83, [%r27+16]; 2026-02-21T12:43:14.3271080Z ld.shared.b16 %rs84, [%r27+272]; 2026-02-21T12:43:14.3271274Z ld.shared.b16 %rs85, [%r28]; 2026-02-21T12:43:14.3271467Z ld.shared.b16 %rs86, [%r28+256]; 2026-02-21T12:43:14.3271667Z ld.shared.b16 %rs87, [%r28+16]; 2026-02-21T12:43:14.3271870Z ld.shared.b16 %rs88, [%r28+272]; 2026-02-21T12:43:14.3272071Z cvt.f32.bf16 %r2500, %rs81; 2026-02-21T12:43:14.3272259Z cvt.f32.bf16 %r2501, %rs82; 2026-02-21T12:43:14.3272445Z cvt.f32.bf16 %r2502, %rs85; 2026-02-21T12:43:14.3272621Z cvt.f32.bf16 %r2503, %rs86; 2026-02-21T12:43:14.3272802Z cvt.f32.bf16 %r2632, %rs83; 2026-02-21T12:43:14.3272977Z cvt.f32.bf16 %r2633, %rs84; 2026-02-21T12:43:14.3273158Z cvt.f32.bf16 %r2634, %rs87; 2026-02-21T12:43:14.3273349Z cvt.f32.bf16 %r2635, %rs88; 2026-02-21T12:43:14.3273680Z .loc 1 54 87 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:54:87 2026-02-21T12:43:14.3274050Z add.s64 %rd191, %rd404, -30720; 2026-02-21T12:43:14.3274239Z // begin inline asm 2026-02-21T12:43:14.3274402Z mov.u64 %rd190, 0x0; 2026-02-21T12:43:14.3274628Z createpolicy.fractional.L2::evict_first.b64 %rd190, 1.0; 2026-02-21T12:43:14.3274894Z // end inline asm 2026-02-21T12:43:14.3275144Z // begin inline asm 2026-02-21T12:43:14.3275306Z mov.u16 %rs77, 0x0; 2026-02-21T12:43:14.3275564Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs77 }, [ %rd191 + 0 ], %rd190; 2026-02-21T12:43:14.3275956Z // end inline asm 2026-02-21T12:43:14.3276258Z .loc 1 62 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:62:28 2026-02-21T12:43:14.3276742Z bar.sync 0; 2026-02-21T12:43:14.3276902Z 
st.shared.b8 [%r11], %rs77; 2026-02-21T12:43:14.3277076Z bar.sync 0; 2026-02-21T12:43:14.3277239Z ld.shared.v2.b8 {%rs89, %rs90}, [%r12]; 2026-02-21T12:43:14.3277608Z .loc 1 57 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:57:28 2026-02-21T12:43:14.3277966Z shl.b16 %rs91, %rs89, 4; 2026-02-21T12:43:14.3278138Z shl.b16 %rs92, %rs90, 4; 2026-02-21T12:43:14.3278462Z .loc 1 72 58 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:72:58 2026-02-21T12:43:14.3278825Z selp.b16 %rs93, %rs91, %rs89, %p51; 2026-02-21T12:43:14.3279040Z cvt.s16.s8 %rs94, %rs93; 2026-02-21T12:43:14.3279217Z shr.s16 %rs95, %rs94, 4; 2026-02-21T12:43:14.3279472Z selp.b16 %rs96, %rs92, %rs90, %p51; 2026-02-21T12:43:14.3279773Z cvt.s16.s8 %rs97, %rs96; 2026-02-21T12:43:14.3279949Z shr.s16 %rs98, %rs97, 4; 2026-02-21T12:43:14.3280263Z .loc 1 77 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:77:32 2026-02-21T12:43:14.3280624Z cvt.rn.f32.s16 %r3976, %rs95; 2026-02-21T12:43:14.3280811Z cvt.rn.f32.s16 %r3977, %rs98; 2026-02-21T12:43:14.3281002Z bar.sync 0; 2026-02-21T12:43:14.3281158Z st.shared.b32 [%r13], %r3976; 2026-02-21T12:43:14.3281346Z st.shared.b32 [%r14], %r3977; 2026-02-21T12:43:14.3281518Z $L__tmp9: 2026-02-21T12:43:14.3281876Z .loc 2 291 36 // standard.py:291:36 @[ cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:84:40 ] 2026-02-21T12:43:14.3282297Z // begin inline asm 2026-02-21T12:43:14.3282477Z fence.proxy.async.shared::cta; 2026-02-21T12:43:14.3282665Z // end inline asm 2026-02-21T12:43:14.3282834Z bar.sync 0; 2026-02-21T12:43:14.3283003Z shfl.sync.idx.b32 %r3978, %r2, 0, 31, -1; 2026-02-21T12:43:14.3283234Z wgmma.fence.sync.aligned; 2026-02-21T12:43:14.3289039Z mov.pred %p15, -1; 2026-02-21T12:43:14.3289249Z // begin inline asm 2026-02-21T12:43:14.3290647Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r7681,%r7682,%r7683,%r7684,%r7685,%r7686,%r7687,%r7688,%r7689,%r7690,%r7691,%r7692,%r7693,%r7694,%r7695,%r7696,%r7697,%r7698,%r7699,%r7700,%r7701,%r7702,%r7703,%r7704,%r7705,%r7706,%r7707,%r7708,%r7709,%r7710,%r7711,%r7712,%r7713,%r7714,%r7715,%r7716,%r7717,%r7718,%r7719,%r7720,%r7721,%r7722,%r7723,%r7724,%r7725,%r7726,%r7727,%r7728,%r7729,%r7730,%r7731,%r7732,%r7733,%r7734,%r7735,%r7736,%r7737,%r7738,%r7739,%r7740,%r7741,%r7742,%r7743,%r7744}, {%r2500,%r2501,%r2502,%r2503}, %rd14, %p15, 1, 1; 2026-02-21T12:43:14.3292085Z // end inline asm 2026-02-21T12:43:14.3292255Z // begin inline asm 2026-02-21T12:43:14.3293633Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7681,%r7682,%r7683,%r7684,%r7685,%r7686,%r7687,%r7688,%r7689,%r7690,%r7691,%r7692,%r7693,%r7694,%r7695,%r7696,%r7697,%r7698,%r7699,%r7700,%r7701,%r7702,%r7703,%r7704,%r7705,%r7706,%r7707,%r7708,%r7709,%r7710,%r7711,%r7712,%r7713,%r7714,%r7715,%r7716,%r7717,%r7718,%r7719,%r7720,%r7721,%r7722,%r7723,%r7724,%r7725,%r7726,%r7727,%r7728,%r7729,%r7730,%r7731,%r7732,%r7733,%r7734,%r7735,%r7736,%r7737,%r7738,%r7739,%r7740,%r7741,%r7742,%r7743,%r7744}, {%r2632,%r2633,%r2634,%r2635}, %rd15, %p15, 1, 1; 2026-02-21T12:43:14.3295042Z // end inline asm 2026-02-21T12:43:14.3295221Z wgmma.commit_group.sync.aligned; 2026-02-21T12:43:14.3295429Z mov.b32 %r3908, 0; 2026-02-21T12:43:14.3295596Z mov.b32 %r2700, %r6277; 2026-02-21T12:43:14.3295765Z mov.b32 %r2701, %r3908; 2026-02-21T12:43:14.3295948Z mov.b32 %r2702, %r3908; 2026-02-21T12:43:14.3296117Z // begin inline asm 2026-02-21T12:43:14.3297448Z // wait for regs: 
%r7681,%r7682,%r7683,%r7684,%r7685,%r7686,%r7687,%r7688,%r7689,%r7690,%r7691,%r7692,%r7693,%r7694,%r7695,%r7696,%r7697,%r7698,%r7699,%r7700,%r7701,%r7702,%r7703,%r7704,%r7705,%r7706,%r7707,%r7708,%r7709,%r7710,%r7711,%r7712,%r7713,%r7714,%r7715,%r7716,%r7717,%r7718,%r7719,%r7720,%r7721,%r7722,%r7723,%r7724,%r7725,%r7726,%r7727,%r7728,%r7729,%r7730,%r7731,%r7732,%r7733,%r7734,%r7735,%r7736,%r7737,%r7738,%r7739,%r7740,%r7741,%r7742,%r7743,%r7744,%r2700,%r2701,%r2702 2026-02-21T12:43:14.3298953Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:43:14.3299154Z // end inline asm 2026-02-21T12:43:14.3299308Z $L__tmp10: 2026-02-21T12:43:14.3299608Z .loc 1 48 80 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:48:80 2026-02-21T12:43:14.3299984Z add.s64 %rd196, %rd405, -64; 2026-02-21T12:43:14.3300177Z // begin inline asm 2026-02-21T12:43:14.3300337Z mov.u64 %rd195, 0x0; 2026-02-21T12:43:14.3300579Z createpolicy.fractional.L2::evict_last.b64 %rd195, 1.0; 2026-02-21T12:43:14.3300836Z // end inline asm 2026-02-21T12:43:14.3301003Z // begin inline asm 2026-02-21T12:43:14.3301161Z mov.u32 %r2770, 0x0; 2026-02-21T12:43:14.3301337Z mov.u32 %r2771, 0x0; 2026-02-21T12:43:14.3301578Z mov.u32 %r2772, 0x0; 2026-02-21T12:43:14.3301819Z mov.u32 %r2773, 0x0; 2026-02-21T12:43:14.3302156Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r2770, %r2771, %r2772, %r2773 }, [ %rd196 + 0 ], %rd195; 2026-02-21T12:43:14.3302526Z // end inline asm 2026-02-21T12:43:14.3302837Z .loc 1 52 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:52:32 2026-02-21T12:43:14.3303201Z bar.sync 0; 2026-02-21T12:43:14.3303381Z st.shared.v2.b32 [%r7], {%r2770, %r2771}; 2026-02-21T12:43:14.3303611Z st.shared.v2.b32 [%r8], {%r2772, %r2773}; 2026-02-21T12:43:14.3303823Z bar.sync 0; 2026-02-21T12:43:14.3303985Z ld.shared.b16 %rs99, [%r27]; 2026-02-21T12:43:14.3304184Z ld.shared.b16 %rs100, [%r27+256]; 2026-02-21T12:43:14.3304396Z ld.shared.b16 %rs101, [%r27+16]; 2026-02-21T12:43:14.3304593Z 
ld.shared.b16 %rs102, [%r27+272]; 2026-02-21T12:43:14.3304799Z ld.shared.b16 %rs103, [%r28]; 2026-02-21T12:43:14.3304985Z ld.shared.b16 %rs104, [%r28+256]; 2026-02-21T12:43:14.3305192Z ld.shared.b16 %rs105, [%r28+16]; 2026-02-21T12:43:14.3305388Z ld.shared.b16 %rs106, [%r28+272]; 2026-02-21T12:43:14.3305595Z cvt.f32.bf16 %r2902, %rs99; 2026-02-21T12:43:14.3305783Z cvt.f32.bf16 %r2903, %rs100; 2026-02-21T12:43:14.3305972Z cvt.f32.bf16 %r2904, %rs103; 2026-02-21T12:43:14.3306160Z cvt.f32.bf16 %r2905, %rs104; 2026-02-21T12:43:14.3306339Z cvt.f32.bf16 %r3034, %rs101; 2026-02-21T12:43:14.3306660Z cvt.f32.bf16 %r3035, %rs102; 2026-02-21T12:43:14.3306837Z cvt.f32.bf16 %r3036, %rs105; 2026-02-21T12:43:14.3307018Z cvt.f32.bf16 %r3037, %rs106; 2026-02-21T12:43:14.3307372Z .loc 1 54 87 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:54:87 2026-02-21T12:43:14.3307768Z add.s64 %rd199, %rd404, -20480; 2026-02-21T12:43:14.3307964Z // begin inline asm 2026-02-21T12:43:14.3308042Z mov.u64 %rd198, 0x0; 2026-02-21T12:43:14.3308177Z createpolicy.fractional.L2::evict_first.b64 %rd198, 1.0; 2026-02-21T12:43:14.3308239Z // end inline asm 2026-02-21T12:43:14.3308315Z // begin inline asm 2026-02-21T12:43:14.3308377Z mov.u16 %rs78, 0x0; 2026-02-21T12:43:14.3308549Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs78 }, [ %rd199 + 0 ], %rd198; 2026-02-21T12:43:14.3308696Z // end inline asm 2026-02-21T12:43:14.3308914Z .loc 1 62 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:62:28 2026-02-21T12:43:14.3308975Z bar.sync 0; 2026-02-21T12:43:14.3309046Z st.shared.b8 [%r11], %rs78; 2026-02-21T12:43:14.3309112Z bar.sync 0; 2026-02-21T12:43:14.3309196Z ld.shared.v2.b8 {%rs107, %rs108}, [%r12]; 2026-02-21T12:43:14.3309399Z .loc 1 57 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:57:28 2026-02-21T12:43:14.3309473Z shl.b16 %rs109, %rs107, 4; 2026-02-21T12:43:14.3309538Z shl.b16 %rs110, %rs108, 4; 2026-02-21T12:43:14.3309861Z .loc 1 72 58 // 
cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:72:58 2026-02-21T12:43:14.3309949Z selp.b16 %rs111, %rs109, %rs107, %p51; 2026-02-21T12:43:14.3310092Z cvt.s16.s8 %rs112, %rs111; 2026-02-21T12:43:14.3310158Z shr.s16 %rs113, %rs112, 4; 2026-02-21T12:43:14.3310242Z selp.b16 %rs114, %rs110, %rs108, %p51; 2026-02-21T12:43:14.3310318Z cvt.s16.s8 %rs115, %rs114; 2026-02-21T12:43:14.3310385Z shr.s16 %rs116, %rs115, 4; 2026-02-21T12:43:14.3310593Z .loc 1 77 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:77:32 2026-02-21T12:43:14.3310670Z cvt.rn.f32.s16 %r3979, %rs113; 2026-02-21T12:43:14.3310738Z cvt.rn.f32.s16 %r3980, %rs116; 2026-02-21T12:43:14.3310799Z bar.sync 0; 2026-02-21T12:43:14.3310868Z st.shared.b32 [%r13], %r3979; 2026-02-21T12:43:14.3310939Z st.shared.b32 [%r14], %r3980; 2026-02-21T12:43:14.3310997Z $L__tmp11: 2026-02-21T12:43:14.3311275Z .loc 2 291 36 // standard.py:291:36 @[ cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:84:40 ] 2026-02-21T12:43:14.3311350Z // begin inline asm 2026-02-21T12:43:14.3311566Z fence.proxy.async.shared::cta; 2026-02-21T12:43:14.3311637Z // end inline asm 2026-02-21T12:43:14.3311702Z bar.sync 0; 2026-02-21T12:43:14.3311779Z wgmma.fence.sync.aligned; 2026-02-21T12:43:14.3311841Z // begin inline asm 2026-02-21T12:43:14.3313123Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7681,%r7682,%r7683,%r7684,%r7685,%r7686,%r7687,%r7688,%r7689,%r7690,%r7691,%r7692,%r7693,%r7694,%r7695,%r7696,%r7697,%r7698,%r7699,%r7700,%r7701,%r7702,%r7703,%r7704,%r7705,%r7706,%r7707,%r7708,%r7709,%r7710,%r7711,%r7712,%r7713,%r7714,%r7715,%r7716,%r7717,%r7718,%r7719,%r7720,%r7721,%r7722,%r7723,%r7724,%r7725,%r7726,%r7727,%r7728,%r7729,%r7730,%r7731,%r7732,%r7733,%r7734,%r7735,%r7736,%r7737,%r7738,%r7739,%r7740,%r7741,%r7742,%r7743,%r7744}, {%r2902,%r2903,%r2904,%r2905}, %rd14, %p15, 1, 1; 2026-02-21T12:43:14.3313185Z // end inline asm 2026-02-21T12:43:14.3313248Z // begin inline asm 2026-02-21T12:43:14.3314522Z 
wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7681,%r7682,%r7683,%r7684,%r7685,%r7686,%r7687,%r7688,%r7689,%r7690,%r7691,%r7692,%r7693,%r7694,%r7695,%r7696,%r7697,%r7698,%r7699,%r7700,%r7701,%r7702,%r7703,%r7704,%r7705,%r7706,%r7707,%r7708,%r7709,%r7710,%r7711,%r7712,%r7713,%r7714,%r7715,%r7716,%r7717,%r7718,%r7719,%r7720,%r7721,%r7722,%r7723,%r7724,%r7725,%r7726,%r7727,%r7728,%r7729,%r7730,%r7731,%r7732,%r7733,%r7734,%r7735,%r7736,%r7737,%r7738,%r7739,%r7740,%r7741,%r7742,%r7743,%r7744}, {%r3034,%r3035,%r3036,%r3037}, %rd15, %p15, 1, 1; 2026-02-21T12:43:14.3314585Z // end inline asm 2026-02-21T12:43:14.3314666Z wgmma.commit_group.sync.aligned; 2026-02-21T12:43:14.3314737Z mov.b32 %r3102, %r6277; 2026-02-21T12:43:14.3314799Z mov.b32 %r3103, %r3908; 2026-02-21T12:43:14.3314859Z mov.b32 %r3104, %r3908; 2026-02-21T12:43:14.3314926Z // begin inline asm 2026-02-21T12:43:14.3315996Z // wait for regs: %r7681,%r7682,%r7683,%r7684,%r7685,%r7686,%r7687,%r7688,%r7689,%r7690,%r7691,%r7692,%r7693,%r7694,%r7695,%r7696,%r7697,%r7698,%r7699,%r7700,%r7701,%r7702,%r7703,%r7704,%r7705,%r7706,%r7707,%r7708,%r7709,%r7710,%r7711,%r7712,%r7713,%r7714,%r7715,%r7716,%r7717,%r7718,%r7719,%r7720,%r7721,%r7722,%r7723,%r7724,%r7725,%r7726,%r7727,%r7728,%r7729,%r7730,%r7731,%r7732,%r7733,%r7734,%r7735,%r7736,%r7737,%r7738,%r7739,%r7740,%r7741,%r7742,%r7743,%r7744,%r3102,%r3103,%r3104 2026-02-21T12:43:14.3316080Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:43:14.3316146Z // end inline asm 2026-02-21T12:43:14.3316203Z $L__tmp12: 2026-02-21T12:43:14.3316414Z .loc 1 48 80 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:48:80 2026-02-21T12:43:14.3316607Z add.s64 %rd204, %rd405, -32; 2026-02-21T12:43:14.3316681Z // begin inline asm 2026-02-21T12:43:14.3316743Z mov.u64 %rd203, 0x0; 2026-02-21T12:43:14.3316879Z createpolicy.fractional.L2::evict_last.b64 %rd203, 1.0; 2026-02-21T12:43:14.3316946Z // end inline asm 2026-02-21T12:43:14.3317111Z // begin inline asm 
2026-02-21T12:43:14.3317174Z mov.u32 %r3172, 0x0; 2026-02-21T12:43:14.3317242Z mov.u32 %r3173, 0x0; 2026-02-21T12:43:14.3317379Z mov.u32 %r3174, 0x0; 2026-02-21T12:43:14.3317452Z mov.u32 %r3175, 0x0; 2026-02-21T12:43:14.3317688Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r3172, %r3173, %r3174, %r3175 }, [ %rd204 + 0 ], %rd203; 2026-02-21T12:43:14.3317755Z // end inline asm 2026-02-21T12:43:14.3317968Z .loc 1 52 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:52:32 2026-02-21T12:43:14.3318026Z bar.sync 0; 2026-02-21T12:43:14.3318114Z st.shared.v2.b32 [%r7], {%r3172, %r3173}; 2026-02-21T12:43:14.3318192Z st.shared.v2.b32 [%r8], {%r3174, %r3175}; 2026-02-21T12:43:14.3318252Z bar.sync 0; 2026-02-21T12:43:14.3318329Z ld.shared.b16 %rs117, [%r27]; 2026-02-21T12:43:14.3318400Z ld.shared.b16 %rs118, [%r27+256]; 2026-02-21T12:43:14.3318471Z ld.shared.b16 %rs119, [%r27+16]; 2026-02-21T12:43:14.3318541Z ld.shared.b16 %rs120, [%r27+272]; 2026-02-21T12:43:14.3318613Z ld.shared.b16 %rs121, [%r28]; 2026-02-21T12:43:14.3318680Z ld.shared.b16 %rs122, [%r28+256]; 2026-02-21T12:43:14.3318886Z ld.shared.b16 %rs123, [%r28+16]; 2026-02-21T12:43:14.3318975Z ld.shared.b16 %rs124, [%r28+272]; 2026-02-21T12:43:14.3319046Z cvt.f32.bf16 %r3304, %rs117; 2026-02-21T12:43:14.3319111Z cvt.f32.bf16 %r3305, %rs118; 2026-02-21T12:43:14.3319177Z cvt.f32.bf16 %r3306, %rs121; 2026-02-21T12:43:14.3319250Z cvt.f32.bf16 %r3307, %rs122; 2026-02-21T12:43:14.3319314Z cvt.f32.bf16 %r3436, %rs119; 2026-02-21T12:43:14.3319377Z cvt.f32.bf16 %r3437, %rs120; 2026-02-21T12:43:14.3319446Z cvt.f32.bf16 %r3438, %rs123; 2026-02-21T12:43:14.3319509Z cvt.f32.bf16 %r3439, %rs124; 2026-02-21T12:43:14.3319720Z .loc 1 54 87 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:54:87 2026-02-21T12:43:14.3319797Z add.s64 %rd207, %rd404, -10240; 2026-02-21T12:43:14.3319860Z // begin inline asm 2026-02-21T12:43:14.3319924Z mov.u64 %rd206, 0x0; 2026-02-21T12:43:14.3320058Z 
createpolicy.fractional.L2::evict_first.b64 %rd206, 1.0; 2026-02-21T12:43:14.3320130Z // end inline asm 2026-02-21T12:43:14.3320195Z // begin inline asm 2026-02-21T12:43:14.3320257Z mov.u16 %rs79, 0x0; 2026-02-21T12:43:14.3320430Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs79 }, [ %rd207 + 0 ], %rd206; 2026-02-21T12:43:14.3320492Z // end inline asm 2026-02-21T12:43:14.3320699Z .loc 1 62 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:62:28 2026-02-21T12:43:14.3320760Z bar.sync 0; 2026-02-21T12:43:14.3320837Z st.shared.b8 [%r11], %rs79; 2026-02-21T12:43:14.3320896Z bar.sync 0; 2026-02-21T12:43:14.3320976Z ld.shared.v2.b8 {%rs125, %rs126}, [%r12]; 2026-02-21T12:43:14.3321186Z .loc 1 57 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:57:28 2026-02-21T12:43:14.3321254Z shl.b16 %rs127, %rs125, 4; 2026-02-21T12:43:14.3321319Z shl.b16 %rs128, %rs126, 4; 2026-02-21T12:43:14.3321530Z .loc 1 72 58 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:72:58 2026-02-21T12:43:14.3321616Z selp.b16 %rs129, %rs127, %rs125, %p51; 2026-02-21T12:43:14.3321681Z cvt.s16.s8 %rs130, %rs129; 2026-02-21T12:43:14.3321745Z shr.s16 %rs131, %rs130, 4; 2026-02-21T12:43:14.3321828Z selp.b16 %rs132, %rs128, %rs126, %p51; 2026-02-21T12:43:14.3321895Z cvt.s16.s8 %rs133, %rs132; 2026-02-21T12:43:14.3321960Z shr.s16 %rs134, %rs133, 4; 2026-02-21T12:43:14.3322165Z .loc 1 77 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:77:32 2026-02-21T12:43:14.3322235Z cvt.rn.f32.s16 %r3981, %rs131; 2026-02-21T12:43:14.3322304Z cvt.rn.f32.s16 %r3982, %rs134; 2026-02-21T12:43:14.3322368Z bar.sync 0; 2026-02-21T12:43:14.3322438Z st.shared.b32 [%r13], %r3981; 2026-02-21T12:43:14.3322503Z st.shared.b32 [%r14], %r3982; 2026-02-21T12:43:14.3322560Z $L__tmp13: 2026-02-21T12:43:14.3322843Z .loc 2 291 36 // standard.py:291:36 @[ cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:84:40 ] 2026-02-21T12:43:14.3322982Z // begin inline asm 2026-02-21T12:43:14.3323121Z 
fence.proxy.async.shared::cta; 2026-02-21T12:43:14.3323189Z // end inline asm 2026-02-21T12:43:14.3323246Z bar.sync 0; 2026-02-21T12:43:14.3323322Z wgmma.fence.sync.aligned; 2026-02-21T12:43:14.3323386Z // begin inline asm 2026-02-21T12:43:14.3324668Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7681,%r7682,%r7683,%r7684,%r7685,%r7686,%r7687,%r7688,%r7689,%r7690,%r7691,%r7692,%r7693,%r7694,%r7695,%r7696,%r7697,%r7698,%r7699,%r7700,%r7701,%r7702,%r7703,%r7704,%r7705,%r7706,%r7707,%r7708,%r7709,%r7710,%r7711,%r7712,%r7713,%r7714,%r7715,%r7716,%r7717,%r7718,%r7719,%r7720,%r7721,%r7722,%r7723,%r7724,%r7725,%r7726,%r7727,%r7728,%r7729,%r7730,%r7731,%r7732,%r7733,%r7734,%r7735,%r7736,%r7737,%r7738,%r7739,%r7740,%r7741,%r7742,%r7743,%r7744}, {%r3304,%r3305,%r3306,%r3307}, %rd14, %p15, 1, 1; 2026-02-21T12:43:14.3324732Z // end inline asm 2026-02-21T12:43:14.3324800Z // begin inline asm 2026-02-21T12:43:14.3326181Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7681,%r7682,%r7683,%r7684,%r7685,%r7686,%r7687,%r7688,%r7689,%r7690,%r7691,%r7692,%r7693,%r7694,%r7695,%r7696,%r7697,%r7698,%r7699,%r7700,%r7701,%r7702,%r7703,%r7704,%r7705,%r7706,%r7707,%r7708,%r7709,%r7710,%r7711,%r7712,%r7713,%r7714,%r7715,%r7716,%r7717,%r7718,%r7719,%r7720,%r7721,%r7722,%r7723,%r7724,%r7725,%r7726,%r7727,%r7728,%r7729,%r7730,%r7731,%r7732,%r7733,%r7734,%r7735,%r7736,%r7737,%r7738,%r7739,%r7740,%r7741,%r7742,%r7743,%r7744}, {%r3436,%r3437,%r3438,%r3439}, %rd15, %p15, 1, 1; 2026-02-21T12:43:14.3326253Z // end inline asm 2026-02-21T12:43:14.3326343Z wgmma.commit_group.sync.aligned; 2026-02-21T12:43:14.3326407Z mov.b32 %r3504, %r6277; 2026-02-21T12:43:14.3326591Z mov.b32 %r3505, %r3908; 2026-02-21T12:43:14.3326663Z mov.b32 %r3506, %r3908; 2026-02-21T12:43:14.3326724Z // begin inline asm 2026-02-21T12:43:14.3327812Z // wait for regs: 
%r7681,%r7682,%r7683,%r7684,%r7685,%r7686,%r7687,%r7688,%r7689,%r7690,%r7691,%r7692,%r7693,%r7694,%r7695,%r7696,%r7697,%r7698,%r7699,%r7700,%r7701,%r7702,%r7703,%r7704,%r7705,%r7706,%r7707,%r7708,%r7709,%r7710,%r7711,%r7712,%r7713,%r7714,%r7715,%r7716,%r7717,%r7718,%r7719,%r7720,%r7721,%r7722,%r7723,%r7724,%r7725,%r7726,%r7727,%r7728,%r7729,%r7730,%r7731,%r7732,%r7733,%r7734,%r7735,%r7736,%r7737,%r7738,%r7739,%r7740,%r7741,%r7742,%r7743,%r7744,%r3504,%r3505,%r3506 2026-02-21T12:43:14.3327898Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:43:14.3327958Z // end inline asm 2026-02-21T12:43:14.3328012Z $L__tmp14: 2026-02-21T12:43:14.3328221Z .loc 1 48 80 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:48:80 2026-02-21T12:43:14.3328284Z // begin inline asm 2026-02-21T12:43:14.3328346Z mov.u64 %rd211, 0x0; 2026-02-21T12:43:14.3328471Z createpolicy.fractional.L2::evict_last.b64 %rd211, 1.0; 2026-02-21T12:43:14.3328536Z // end inline asm 2026-02-21T12:43:14.3328599Z // begin inline asm 2026-02-21T12:43:14.3328661Z mov.u32 %r3574, 0x0; 2026-02-21T12:43:14.3328732Z mov.u32 %r3575, 0x0; 2026-02-21T12:43:14.3328799Z mov.u32 %r3576, 0x0; 2026-02-21T12:43:14.3328861Z mov.u32 %r3577, 0x0; 2026-02-21T12:43:14.3329089Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r3574, %r3575, %r3576, %r3577 }, [ %rd405 + 0 ], %rd211; 2026-02-21T12:43:14.3329157Z // end inline asm 2026-02-21T12:43:14.3329358Z .loc 1 52 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:52:32 2026-02-21T12:43:14.3329416Z bar.sync 0; 2026-02-21T12:43:14.3329502Z st.shared.v2.b32 [%r7], {%r3574, %r3575}; 2026-02-21T12:43:14.3329579Z st.shared.v2.b32 [%r8], {%r3576, %r3577}; 2026-02-21T12:43:14.3329636Z bar.sync 0; 2026-02-21T12:43:14.3329715Z ld.shared.b16 %rs135, [%r27]; 2026-02-21T12:43:14.3329784Z ld.shared.b16 %rs136, [%r27+256]; 2026-02-21T12:43:14.3329852Z ld.shared.b16 %rs137, [%r27+16]; 2026-02-21T12:43:14.3329918Z ld.shared.b16 %rs138, [%r27+272]; 2026-02-21T12:43:14.3330092Z 
ld.shared.b16 %rs139, [%r28]; 2026-02-21T12:43:14.3330161Z ld.shared.b16 %rs140, [%r28+256]; 2026-02-21T12:43:14.3330306Z ld.shared.b16 %rs141, [%r28+16]; 2026-02-21T12:43:14.3330382Z ld.shared.b16 %rs142, [%r28+272]; 2026-02-21T12:43:14.3330450Z cvt.f32.bf16 %r3706, %rs135; 2026-02-21T12:43:14.3330514Z cvt.f32.bf16 %r3707, %rs136; 2026-02-21T12:43:14.3330577Z cvt.f32.bf16 %r3708, %rs139; 2026-02-21T12:43:14.3330650Z cvt.f32.bf16 %r3709, %rs140; 2026-02-21T12:43:14.3330713Z cvt.f32.bf16 %r3838, %rs137; 2026-02-21T12:43:14.3330775Z cvt.f32.bf16 %r3839, %rs138; 2026-02-21T12:43:14.3330847Z cvt.f32.bf16 %r3840, %rs141; 2026-02-21T12:43:14.3330910Z cvt.f32.bf16 %r3841, %rs142; 2026-02-21T12:43:14.3331112Z .loc 1 54 87 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:54:87 2026-02-21T12:43:14.3331180Z // begin inline asm 2026-02-21T12:43:14.3331244Z mov.u64 %rd214, 0x0; 2026-02-21T12:43:14.3331371Z createpolicy.fractional.L2::evict_first.b64 %rd214, 1.0; 2026-02-21T12:43:14.3331438Z // end inline asm 2026-02-21T12:43:14.3331507Z // begin inline asm 2026-02-21T12:43:14.3331647Z mov.u16 %rs80, 0x0; 2026-02-21T12:43:14.3331891Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs80 }, [ %rd404 + 0 ], %rd214; 2026-02-21T12:43:14.3331962Z // end inline asm 2026-02-21T12:43:14.3332165Z .loc 1 62 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:62:28 2026-02-21T12:43:14.3332225Z bar.sync 0; 2026-02-21T12:43:14.3332301Z st.shared.b8 [%r11], %rs80; 2026-02-21T12:43:14.3332359Z bar.sync 0; 2026-02-21T12:43:14.3332441Z ld.shared.v2.b8 {%rs143, %rs144}, [%r12]; 2026-02-21T12:43:14.3332641Z .loc 1 57 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:57:28 2026-02-21T12:43:14.3332714Z shl.b16 %rs145, %rs143, 4; 2026-02-21T12:43:14.3332778Z shl.b16 %rs146, %rs144, 4; 2026-02-21T12:43:14.3332978Z .loc 1 72 58 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:72:58 2026-02-21T12:43:14.3333063Z selp.b16 %rs147, %rs145, %rs143, %p51; 
2026-02-21T12:43:14.3333130Z cvt.s16.s8 %rs148, %rs147; 2026-02-21T12:43:14.3333200Z shr.s16 %rs149, %rs148, 4; 2026-02-21T12:43:14.3333275Z selp.b16 %rs150, %rs146, %rs144, %p51; 2026-02-21T12:43:14.3333344Z cvt.s16.s8 %rs151, %rs150; 2026-02-21T12:43:14.3333411Z shr.s16 %rs152, %rs151, 4; 2026-02-21T12:43:14.3333617Z .loc 1 77 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:77:32 2026-02-21T12:43:14.3333685Z cvt.rn.f32.s16 %r3983, %rs149; 2026-02-21T12:43:14.3333750Z cvt.rn.f32.s16 %r3984, %rs152; 2026-02-21T12:43:14.3333806Z bar.sync 0; 2026-02-21T12:43:14.3333881Z st.shared.b32 [%r13], %r3983; 2026-02-21T12:43:14.3333946Z st.shared.b32 [%r14], %r3984; 2026-02-21T12:43:14.3334003Z $L__tmp15: 2026-02-21T12:43:14.3334280Z .loc 2 291 36 // standard.py:291:36 @[ cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:84:40 ] 2026-02-21T12:43:14.3334355Z // begin inline asm 2026-02-21T12:43:14.3334436Z fence.proxy.async.shared::cta; 2026-02-21T12:43:14.3334497Z // end inline asm 2026-02-21T12:43:14.3334563Z bar.sync 0; 2026-02-21T12:43:14.3334638Z wgmma.fence.sync.aligned; 2026-02-21T12:43:14.3334700Z // begin inline asm 2026-02-21T12:43:14.3335971Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7681,%r7682,%r7683,%r7684,%r7685,%r7686,%r7687,%r7688,%r7689,%r7690,%r7691,%r7692,%r7693,%r7694,%r7695,%r7696,%r7697,%r7698,%r7699,%r7700,%r7701,%r7702,%r7703,%r7704,%r7705,%r7706,%r7707,%r7708,%r7709,%r7710,%r7711,%r7712,%r7713,%r7714,%r7715,%r7716,%r7717,%r7718,%r7719,%r7720,%r7721,%r7722,%r7723,%r7724,%r7725,%r7726,%r7727,%r7728,%r7729,%r7730,%r7731,%r7732,%r7733,%r7734,%r7735,%r7736,%r7737,%r7738,%r7739,%r7740,%r7741,%r7742,%r7743,%r7744}, {%r3706,%r3707,%r3708,%r3709}, %rd14, %p15, 1, 1; 2026-02-21T12:43:14.3336031Z // end inline asm 2026-02-21T12:43:14.3336093Z // begin inline asm 2026-02-21T12:43:14.3337602Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r7681,%r7682,%r7683,%r7684,%r7685,%r7686,%r7687,%r7688,%r7689,%r7690,%r7691,%r7692,%r7693,%r7694,%r7695,%r7696,%r7697,%r7698,%r7699,%r7700,%r7701,%r7702,%r7703,%r7704,%r7705,%r7706,%r7707,%r7708,%r7709,%r7710,%r7711,%r7712,%r7713,%r7714,%r7715,%r7716,%r7717,%r7718,%r7719,%r7720,%r7721,%r7722,%r7723,%r7724,%r7725,%r7726,%r7727,%r7728,%r7729,%r7730,%r7731,%r7732,%r7733,%r7734,%r7735,%r7736,%r7737,%r7738,%r7739,%r7740,%r7741,%r7742,%r7743,%r7744}, {%r3838,%r3839,%r3840,%r3841}, %rd15, %p15, 1, 1; 2026-02-21T12:43:14.3337744Z // end inline asm 2026-02-21T12:43:14.3337830Z wgmma.commit_group.sync.aligned; 2026-02-21T12:43:14.3337894Z mov.b32 %r3906, %r6277; 2026-02-21T12:43:14.3337954Z mov.b32 %r3907, %r3908; 2026-02-21T12:43:14.3338015Z // begin inline asm 2026-02-21T12:43:14.3339217Z // wait for regs: %r7681,%r7682,%r7683,%r7684,%r7685,%r7686,%r7687,%r7688,%r7689,%r7690,%r7691,%r7692,%r7693,%r7694,%r7695,%r7696,%r7697,%r7698,%r7699,%r7700,%r7701,%r7702,%r7703,%r7704,%r7705,%r7706,%r7707,%r7708,%r7709,%r7710,%r7711,%r7712,%r7713,%r7714,%r7715,%r7716,%r7717,%r7718,%r7719,%r7720,%r7721,%r7722,%r7723,%r7724,%r7725,%r7726,%r7727,%r7728,%r7729,%r7730,%r7731,%r7732,%r7733,%r7734,%r7735,%r7736,%r7737,%r7738,%r7739,%r7740,%r7741,%r7742,%r7743,%r7744,%r3906,%r3907,%r3908 2026-02-21T12:43:14.3339313Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:43:14.3339380Z // end inline asm 2026-02-21T12:43:14.3339438Z $L__tmp16: 2026-02-21T12:43:14.3339662Z .loc 1 40 126 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:40:126 2026-02-21T12:43:14.3339729Z add.s64 %rd406, %rd406, 32; 2026-02-21T12:43:14.3339801Z add.s64 %rd405, %rd405, 128; 2026-02-21T12:43:14.3339867Z add.s64 %rd404, %rd404, 40960; 2026-02-21T12:43:14.3339938Z setp.lt.u64 %p24, %rd406, 4064; 2026-02-21T12:43:14.3340010Z @%p24 bra $L__BB0_17; 2026-02-21T12:43:14.3340125Z // %bb.18: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:43:14.3340334Z .loc 1 31 32 // 
cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:31:32 2026-02-21T12:43:14.3340409Z or.b64 %rd227, %rd47, %rd3; 2026-02-21T12:43:14.3340613Z .loc 1 33 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:33:32 2026-02-21T12:43:14.3340679Z or.b64 %rd228, %rd48, %rd6; 2026-02-21T12:43:14.3340742Z or.b64 %rd229, %rd48, %rd7; 2026-02-21T12:43:14.3340812Z or.b64 %rd230, %rd48, %rd8; 2026-02-21T12:43:14.3340875Z or.b64 %rd231, %rd48, %rd9; 2026-02-21T12:43:14.3340939Z or.b64 %rd232, %rd48, %rd10; 2026-02-21T12:43:14.3341007Z or.b64 %rd233, %rd48, %rd11; 2026-02-21T12:43:14.3341070Z or.b64 %rd234, %rd48, %rd12; 2026-02-21T12:43:14.3341131Z or.b64 %rd235, %rd48, %rd13; 2026-02-21T12:43:14.3341329Z .loc 1 87 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:87:28 2026-02-21T12:43:14.3341414Z cvt.rn.bf16x2.f32 %r4057, %r7682, %r7681; 2026-02-21T12:43:14.3341492Z cvt.rn.bf16x2.f32 %r4058, %r7684, %r7683; 2026-02-21T12:43:14.3341565Z cvt.rn.bf16x2.f32 %r4059, %r7686, %r7685; 2026-02-21T12:43:14.3341643Z cvt.rn.bf16x2.f32 %r4060, %r7688, %r7687; 2026-02-21T12:43:14.3341718Z cvt.rn.bf16x2.f32 %r4061, %r7690, %r7689; 2026-02-21T12:43:14.3341790Z cvt.rn.bf16x2.f32 %r4062, %r7692, %r7691; 2026-02-21T12:43:14.3341869Z cvt.rn.bf16x2.f32 %r4063, %r7694, %r7693; 2026-02-21T12:43:14.3341942Z cvt.rn.bf16x2.f32 %r4064, %r7696, %r7695; 2026-02-21T12:43:14.3342013Z cvt.rn.bf16x2.f32 %r4065, %r7698, %r7697; 2026-02-21T12:43:14.3342086Z cvt.rn.bf16x2.f32 %r4066, %r7700, %r7699; 2026-02-21T12:43:14.3342164Z cvt.rn.bf16x2.f32 %r4067, %r7702, %r7701; 2026-02-21T12:43:14.3342235Z cvt.rn.bf16x2.f32 %r4068, %r7704, %r7703; 2026-02-21T12:43:14.3342307Z cvt.rn.bf16x2.f32 %r4069, %r7706, %r7705; 2026-02-21T12:43:14.3342384Z cvt.rn.bf16x2.f32 %r4070, %r7708, %r7707; 2026-02-21T12:43:14.3342458Z cvt.rn.bf16x2.f32 %r4071, %r7710, %r7709; 2026-02-21T12:43:14.3342624Z cvt.rn.bf16x2.f32 %r4072, %r7712, %r7711; 2026-02-21T12:43:14.3342703Z cvt.rn.bf16x2.f32 %r4073, %r7714, %r7713; 
2026-02-21T12:43:14.3342776Z cvt.rn.bf16x2.f32 %r4074, %r7716, %r7715; 2026-02-21T12:43:14.3342907Z cvt.rn.bf16x2.f32 %r4075, %r7718, %r7717; 2026-02-21T12:43:14.3342980Z cvt.rn.bf16x2.f32 %r4076, %r7720, %r7719; 2026-02-21T12:43:14.3343060Z cvt.rn.bf16x2.f32 %r4077, %r7722, %r7721; 2026-02-21T12:43:14.3343133Z cvt.rn.bf16x2.f32 %r4078, %r7724, %r7723; 2026-02-21T12:43:14.3343204Z cvt.rn.bf16x2.f32 %r4079, %r7726, %r7725; 2026-02-21T12:43:14.3343283Z cvt.rn.bf16x2.f32 %r4080, %r7728, %r7727; 2026-02-21T12:43:14.3343355Z cvt.rn.bf16x2.f32 %r4081, %r7730, %r7729; 2026-02-21T12:43:14.3343428Z cvt.rn.bf16x2.f32 %r4082, %r7732, %r7731; 2026-02-21T12:43:14.3343500Z cvt.rn.bf16x2.f32 %r4083, %r7734, %r7733; 2026-02-21T12:43:14.3343578Z cvt.rn.bf16x2.f32 %r4084, %r7736, %r7735; 2026-02-21T12:43:14.3343653Z cvt.rn.bf16x2.f32 %r4085, %r7738, %r7737; 2026-02-21T12:43:14.3343728Z cvt.rn.bf16x2.f32 %r4086, %r7740, %r7739; 2026-02-21T12:43:14.3343804Z cvt.rn.bf16x2.f32 %r4087, %r7742, %r7741; 2026-02-21T12:43:14.3343936Z cvt.rn.bf16x2.f32 %r4088, %r7744, %r7743; 2026-02-21T12:43:14.3344203Z .loc 1 88 22 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:88:22 2026-02-21T12:43:14.3344286Z mad.lo.s64 %rd236, %rd228, 2560, %rd88; 2026-02-21T12:43:14.3344353Z shl.b64 %rd237, %rd227, 1; 2026-02-21T12:43:14.3344421Z add.s64 %rd219, %rd236, %rd237; 2026-02-21T12:43:14.3344493Z mad.lo.s64 %rd238, %rd229, 2560, %rd88; 2026-02-21T12:43:14.3344564Z add.s64 %rd220, %rd238, %rd237; 2026-02-21T12:43:14.3344634Z mad.lo.s64 %rd239, %rd230, 2560, %rd88; 2026-02-21T12:43:14.3344699Z add.s64 %rd221, %rd239, %rd237; 2026-02-21T12:43:14.3344775Z mad.lo.s64 %rd240, %rd231, 2560, %rd88; 2026-02-21T12:43:14.3344840Z add.s64 %rd222, %rd240, %rd237; 2026-02-21T12:43:14.3344923Z mad.lo.s64 %rd241, %rd232, 2560, %rd88; 2026-02-21T12:43:14.3344993Z add.s64 %rd223, %rd241, %rd237; 2026-02-21T12:43:14.3345066Z mad.lo.s64 %rd242, %rd233, 2560, %rd88; 2026-02-21T12:43:14.3345142Z add.s64 %rd224, %rd242, 
%rd237; 2026-02-21T12:43:14.3345217Z mad.lo.s64 %rd243, %rd234, 2560, %rd88; 2026-02-21T12:43:14.3345282Z add.s64 %rd225, %rd243, %rd237; 2026-02-21T12:43:14.3345352Z mad.lo.s64 %rd244, %rd235, 2560, %rd88; 2026-02-21T12:43:14.3345423Z add.s64 %rd226, %rd244, %rd237; 2026-02-21T12:43:14.3345626Z .loc 1 88 81 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:88:81 2026-02-21T12:43:14.3345684Z bar.sync 0; 2026-02-21T12:43:14.3345803Z st.shared.v4.b32 [%r15], {%r4057, %r4059, %r4061, %r4063}; 2026-02-21T12:43:14.3345920Z st.shared.v4.b32 [%r15+512], {%r4058, %r4060, %r4062, %r4064}; 2026-02-21T12:43:14.3346027Z st.shared.v4.b32 [%r16], {%r4065, %r4067, %r4069, %r4071}; 2026-02-21T12:43:14.3346144Z st.shared.v4.b32 [%r16+512], {%r4066, %r4068, %r4070, %r4072}; 2026-02-21T12:43:14.3346251Z st.shared.v4.b32 [%r17], {%r4073, %r4075, %r4077, %r4079}; 2026-02-21T12:43:14.3346365Z st.shared.v4.b32 [%r17+512], {%r4074, %r4076, %r4078, %r4080}; 2026-02-21T12:43:14.3346628Z st.shared.v4.b32 [%r18], {%r4081, %r4083, %r4085, %r4087}; 2026-02-21T12:43:14.3346755Z st.shared.v4.b32 [%r18+512], {%r4082, %r4084, %r4086, %r4088}; 2026-02-21T12:43:14.3346815Z bar.sync 0; 2026-02-21T12:43:14.3346879Z // begin inline asm 2026-02-21T12:43:14.3347079Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3985, %r3986, %r3987, %r3988}, [%r2264]; 2026-02-21T12:43:14.3347140Z // end inline asm 2026-02-21T12:43:14.3347202Z // begin inline asm 2026-02-21T12:43:14.3347394Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3990, %r3991, %r3992, %r3993}, [%r2269]; 2026-02-21T12:43:14.3347454Z // end inline asm 2026-02-21T12:43:14.3347515Z // begin inline asm 2026-02-21T12:43:14.3347712Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3995, %r3996, %r3997, %r3998}, [%r2274]; 2026-02-21T12:43:14.3347777Z // end inline asm 2026-02-21T12:43:14.3347932Z // begin inline asm 2026-02-21T12:43:14.3348112Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4000, %r4001, %r4002, %r4003}, [%r2279]; 2026-02-21T12:43:14.3348272Z 
// end inline asm 2026-02-21T12:43:14.3348335Z // begin inline asm 2026-02-21T12:43:14.3348516Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4005, %r4006, %r4007, %r4008}, [%r2284]; 2026-02-21T12:43:14.3348660Z // end inline asm 2026-02-21T12:43:14.3348730Z // begin inline asm 2026-02-21T12:43:14.3348915Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4010, %r4011, %r4012, %r4013}, [%r2289]; 2026-02-21T12:43:14.3348972Z // end inline asm 2026-02-21T12:43:14.3349037Z // begin inline asm 2026-02-21T12:43:14.3349216Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4015, %r4016, %r4017, %r4018}, [%r2294]; 2026-02-21T12:43:14.3349274Z // end inline asm 2026-02-21T12:43:14.3349339Z // begin inline asm 2026-02-21T12:43:14.3349518Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r4020, %r4021, %r4022, %r4023}, [%r2299]; 2026-02-21T12:43:14.3349578Z // end inline asm 2026-02-21T12:43:14.3349639Z // begin inline asm 2026-02-21T12:43:14.3349772Z st.global.v4.b32 [ %rd219 + 0 ], { %r3985, %r3986, %r3987, %r3988 }; 2026-02-21T12:43:14.3349968Z // end inline asm 2026-02-21T12:43:14.3350036Z // begin inline asm 2026-02-21T12:43:14.3350168Z st.global.v4.b32 [ %rd220 + 0 ], { %r3990, %r3991, %r3992, %r3993 }; 2026-02-21T12:43:14.3350226Z // end inline asm 2026-02-21T12:43:14.3350285Z // begin inline asm 2026-02-21T12:43:14.3350403Z st.global.v4.b32 [ %rd221 + 0 ], { %r3995, %r3996, %r3997, %r3998 }; 2026-02-21T12:43:14.3350466Z // end inline asm 2026-02-21T12:43:14.3350526Z // begin inline asm 2026-02-21T12:43:14.3350644Z st.global.v4.b32 [ %rd222 + 0 ], { %r4000, %r4001, %r4002, %r4003 }; 2026-02-21T12:43:14.3350714Z // end inline asm 2026-02-21T12:43:14.3350780Z // begin inline asm 2026-02-21T12:43:14.3350897Z st.global.v4.b32 [ %rd223 + 0 ], { %r4005, %r4006, %r4007, %r4008 }; 2026-02-21T12:43:14.3350959Z // end inline asm 2026-02-21T12:43:14.3351022Z // begin inline asm 2026-02-21T12:43:14.3351141Z st.global.v4.b32 [ %rd224 + 0 ], { %r4010, %r4011, %r4012, %r4013 }; 
2026-02-21T12:43:14.3351200Z // end inline asm 2026-02-21T12:43:14.3351270Z // begin inline asm 2026-02-21T12:43:14.3351383Z st.global.v4.b32 [ %rd225 + 0 ], { %r4015, %r4016, %r4017, %r4018 }; 2026-02-21T12:43:14.3351443Z // end inline asm 2026-02-21T12:43:14.3351506Z // begin inline asm 2026-02-21T12:43:14.3351620Z st.global.v4.b32 [ %rd226 + 0 ], { %r4020, %r4021, %r4022, %r4023 }; 2026-02-21T12:43:14.3351678Z // end inline asm 2026-02-21T12:43:14.3351896Z .loc 1 19 124 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:19:124 2026-02-21T12:43:14.3351970Z add.s64 %rd245, %rd398, 528; 2026-02-21T12:43:14.3352174Z .loc 1 25 35 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:25:35 2026-02-21T12:43:14.3352241Z shr.s64 %rd246, %rd245, 63; 2026-02-21T12:43:14.3352310Z shr.u64 %rd247, %rd246, 52; 2026-02-21T12:43:14.3352378Z add.s64 %rd248, %rd245, %rd247; 2026-02-21T12:43:14.3352443Z shr.s64 %rd249, %rd248, 12; 2026-02-21T12:43:14.3352651Z .loc 1 26 33 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:26:33 2026-02-21T12:43:14.3352720Z shl.b64 %rd57, %rd249, 3; 2026-02-21T12:43:14.3352919Z .loc 1 27 39 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:27:39 2026-02-21T12:43:14.3352991Z sub.s64 %rd250, 10, %rd57; 2026-02-21T12:43:14.3353189Z .loc 1 27 52 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:27:52 2026-02-21T12:43:14.3353255Z min.s64 %rd58, %rd250, 8; 2026-02-21T12:43:14.3353453Z .loc 1 28 45 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:28:45 2026-02-21T12:43:14.3353529Z and.b64 %rd251, %rd248, -4096; 2026-02-21T12:43:14.3353596Z sub.s64 %rd59, %rd245, %rd251; 2026-02-21T12:43:14.3353793Z .loc 1 29 51 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:29:51 2026-02-21T12:43:14.3353938Z or.b64 %rd252, %rd59, %rd58; 2026-02-21T12:43:14.3354012Z and.b64 %rd253, %rd252, -4294967296; 2026-02-21T12:43:14.3354141Z setp.ne.b64 %p25, %rd253, 0; 2026-02-21T12:43:14.3354210Z @%p25 bra $L__BB0_20; 
2026-02-21T12:43:14.3354274Z bra.uni $L__BB0_19; 2026-02-21T12:43:14.3354393Z $L__BB0_20: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:43:14.3354462Z div.s64 %rd407, %rd59, %rd58; 2026-02-21T12:43:14.3354529Z bra.uni $L__BB0_21; 2026-02-21T12:43:14.3354641Z $L__BB0_19: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:43:14.3354717Z cvt.u32.u64 %r4089, %rd58; 2026-02-21T12:43:14.3354788Z cvt.u32.u64 %r4090, %rd59; 2026-02-21T12:43:14.3354856Z div.u32 %r4091, %r4090, %r4089; 2026-02-21T12:43:14.3354920Z cvt.u64.u32 %rd407, %r4091; 2026-02-21T12:43:14.3355026Z $L__BB0_21: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:43:14.3355235Z .loc 1 28 64 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:28:64 2026-02-21T12:43:14.3355304Z mul.lo.s64 %rd255, %rd407, %rd58; 2026-02-21T12:43:14.3355472Z sub.s64 %rd256, %rd59, %rd255; 2026-02-21T12:43:14.3355696Z .loc 1 28 30 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:28:30 2026-02-21T12:43:14.3355764Z add.s64 %rd257, %rd256, %rd57; 2026-02-21T12:43:14.3355967Z .loc 1 30 27 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:30:27 2026-02-21T12:43:14.3356039Z shl.b64 %rd63, %rd257, 7; 2026-02-21T12:43:14.3356237Z .loc 1 32 27 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:32:27 2026-02-21T12:43:14.3356309Z shl.b64 %rd64, %rd407, 9; 2026-02-21T12:43:14.3356653Z .loc 1 40 126 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:40:126 2026-02-21T12:43:14.3356724Z shl.b64 %rd258, %rd407, 23; 2026-02-21T12:43:14.3356791Z add.s64 %rd409, %rd16, %rd258; 2026-02-21T12:43:14.3356856Z add.s64 %rd408, %rd17, %rd63; 2026-02-21T12:43:14.3356925Z mov.b32 %r7745, 0f00000000; 2026-02-21T12:43:14.3356993Z mov.b64 %rd410, -32; 2026-02-21T12:43:14.3357069Z mov.b32 %r7746, %r7745; 2026-02-21T12:43:14.3357141Z mov.b32 %r7747, %r7745; 2026-02-21T12:43:14.3357202Z mov.b32 %r7748, %r7745; 2026-02-21T12:43:14.3357261Z mov.b32 %r7749, %r7745; 2026-02-21T12:43:14.3357323Z mov.b32 %r7750, %r7745; 
2026-02-21T12:43:14.3357388Z mov.b32 %r7751, %r7745; 2026-02-21T12:43:14.3357451Z mov.b32 %r7752, %r7745; 2026-02-21T12:43:14.3357512Z mov.b32 %r7753, %r7745; 2026-02-21T12:43:14.3357579Z mov.b32 %r7754, %r7745; 2026-02-21T12:43:14.3357639Z mov.b32 %r7755, %r7745; 2026-02-21T12:43:14.3357700Z mov.b32 %r7756, %r7745; 2026-02-21T12:43:14.3357762Z mov.b32 %r7757, %r7745; 2026-02-21T12:43:14.3357827Z mov.b32 %r7758, %r7745; 2026-02-21T12:43:14.3357899Z mov.b32 %r7759, %r7745; 2026-02-21T12:43:14.3357961Z mov.b32 %r7760, %r7745; 2026-02-21T12:43:14.3358030Z mov.b32 %r7761, %r7745; 2026-02-21T12:43:14.3358092Z mov.b32 %r7762, %r7745; 2026-02-21T12:43:14.3358151Z mov.b32 %r7763, %r7745; 2026-02-21T12:43:14.3358216Z mov.b32 %r7764, %r7745; 2026-02-21T12:43:14.3358281Z mov.b32 %r7765, %r7745; 2026-02-21T12:43:14.3358339Z mov.b32 %r7766, %r7745; 2026-02-21T12:43:14.3358399Z mov.b32 %r7767, %r7745; 2026-02-21T12:43:14.3358464Z mov.b32 %r7768, %r7745; 2026-02-21T12:43:14.3358525Z mov.b32 %r7769, %r7745; 2026-02-21T12:43:14.3358584Z mov.b32 %r7770, %r7745; 2026-02-21T12:43:14.3358649Z mov.b32 %r7771, %r7745; 2026-02-21T12:43:14.3358712Z mov.b32 %r7772, %r7745; 2026-02-21T12:43:14.3358773Z mov.b32 %r7773, %r7745; 2026-02-21T12:43:14.3358832Z mov.b32 %r7774, %r7745; 2026-02-21T12:43:14.3358900Z mov.b32 %r7775, %r7745; 2026-02-21T12:43:14.3358961Z mov.b32 %r7776, %r7745; 2026-02-21T12:43:14.3359029Z mov.b32 %r7777, %r7745; 2026-02-21T12:43:14.3359093Z mov.b32 %r7778, %r7745; 2026-02-21T12:43:14.3359254Z mov.b32 %r7779, %r7745; 2026-02-21T12:43:14.3359314Z mov.b32 %r7780, %r7745; 2026-02-21T12:43:14.3359373Z mov.b32 %r7781, %r7745; 2026-02-21T12:43:14.3359519Z mov.b32 %r7782, %r7745; 2026-02-21T12:43:14.3359582Z mov.b32 %r7783, %r7745; 2026-02-21T12:43:14.3359643Z mov.b32 %r7784, %r7745; 2026-02-21T12:43:14.3359708Z mov.b32 %r7785, %r7745; 2026-02-21T12:43:14.3359768Z mov.b32 %r7786, %r7745; 2026-02-21T12:43:14.3359828Z mov.b32 %r7787, %r7745; 2026-02-21T12:43:14.3359887Z mov.b32 
%r7788, %r7745; 2026-02-21T12:43:14.3359952Z mov.b32 %r7789, %r7745; 2026-02-21T12:43:14.3360012Z mov.b32 %r7790, %r7745; 2026-02-21T12:43:14.3360072Z mov.b32 %r7791, %r7745; 2026-02-21T12:43:14.3360135Z mov.b32 %r7792, %r7745; 2026-02-21T12:43:14.3360195Z mov.b32 %r7793, %r7745; 2026-02-21T12:43:14.3360255Z mov.b32 %r7794, %r7745; 2026-02-21T12:43:14.3360316Z mov.b32 %r7795, %r7745; 2026-02-21T12:43:14.3360381Z mov.b32 %r7796, %r7745; 2026-02-21T12:43:14.3360441Z mov.b32 %r7797, %r7745; 2026-02-21T12:43:14.3360505Z mov.b32 %r7798, %r7745; 2026-02-21T12:43:14.3360570Z mov.b32 %r7799, %r7745; 2026-02-21T12:43:14.3360631Z mov.b32 %r7800, %r7745; 2026-02-21T12:43:14.3360830Z mov.b32 %r7801, %r7745; 2026-02-21T12:43:14.3360907Z mov.b32 %r7802, %r7745; 2026-02-21T12:43:14.3360975Z mov.b32 %r7803, %r7745; 2026-02-21T12:43:14.3361036Z mov.b32 %r7804, %r7745; 2026-02-21T12:43:14.3361096Z mov.b32 %r7805, %r7745; 2026-02-21T12:43:14.3361162Z mov.b32 %r7806, %r7745; 2026-02-21T12:43:14.3361224Z mov.b32 %r7807, %r7745; 2026-02-21T12:43:14.3361283Z mov.b32 %r7808, %r7745; 2026-02-21T12:43:14.3361407Z $L__BB0_22: // Parent Loop BB0_2 Depth=1 2026-02-21T12:43:14.3361519Z // => This Inner Loop Header: Depth=2 2026-02-21T12:43:14.3361729Z .loc 1 48 80 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:48:80 2026-02-21T12:43:14.3361795Z add.s64 %rd260, %rd409, -96; 2026-02-21T12:43:14.3361866Z // begin inline asm 2026-02-21T12:43:14.3361927Z mov.u64 %rd259, 0x0; 2026-02-21T12:43:14.3362056Z createpolicy.fractional.L2::evict_last.b64 %rd259, 1.0; 2026-02-21T12:43:14.3362126Z // end inline asm 2026-02-21T12:43:14.3362199Z // begin inline asm 2026-02-21T12:43:14.3362260Z mov.u32 %r4093, 0x0; 2026-02-21T12:43:14.3362322Z mov.u32 %r4094, 0x0; 2026-02-21T12:43:14.3362389Z mov.u32 %r4095, 0x0; 2026-02-21T12:43:14.3362449Z mov.u32 %r4096, 0x0; 2026-02-21T12:43:14.3362676Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r4093, %r4094, %r4095, %r4096 }, [ %rd260 + 0 ], %rd259; 
2026-02-21T12:43:14.3362739Z // end inline asm 2026-02-21T12:43:14.3362944Z .loc 1 52 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:52:32 2026-02-21T12:43:14.3363003Z bar.sync 0; 2026-02-21T12:43:14.3363089Z st.shared.v2.b32 [%r7], {%r4093, %r4094}; 2026-02-21T12:43:14.3363166Z st.shared.v2.b32 [%r8], {%r4095, %r4096}; 2026-02-21T12:43:14.3363223Z bar.sync 0; 2026-02-21T12:43:14.3363304Z ld.shared.b16 %rs157, [%r27]; 2026-02-21T12:43:14.3363384Z ld.shared.b16 %rs158, [%r27+256]; 2026-02-21T12:43:14.3363457Z ld.shared.b16 %rs159, [%r27+16]; 2026-02-21T12:43:14.3363528Z ld.shared.b16 %rs160, [%r27+272]; 2026-02-21T12:43:14.3363603Z ld.shared.b16 %rs161, [%r28]; 2026-02-21T12:43:14.3363669Z ld.shared.b16 %rs162, [%r28+256]; 2026-02-21T12:43:14.3363739Z ld.shared.b16 %rs163, [%r28+16]; 2026-02-21T12:43:14.3363810Z ld.shared.b16 %rs164, [%r28+272]; 2026-02-21T12:43:14.3363879Z cvt.f32.bf16 %r4225, %rs157; 2026-02-21T12:43:14.3363944Z cvt.f32.bf16 %r4226, %rs158; 2026-02-21T12:43:14.3364006Z cvt.f32.bf16 %r4227, %rs161; 2026-02-21T12:43:14.3364076Z cvt.f32.bf16 %r4228, %rs162; 2026-02-21T12:43:14.3364139Z cvt.f32.bf16 %r4357, %rs159; 2026-02-21T12:43:14.3364201Z cvt.f32.bf16 %r4358, %rs160; 2026-02-21T12:43:14.3364268Z cvt.f32.bf16 %r4359, %rs163; 2026-02-21T12:43:14.3364331Z cvt.f32.bf16 %r4360, %rs164; 2026-02-21T12:43:14.3364540Z .loc 1 54 87 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:54:87 2026-02-21T12:43:14.3364685Z add.s64 %rd263, %rd408, -30720; 2026-02-21T12:43:14.3364818Z // begin inline asm 2026-02-21T12:43:14.3364883Z mov.u64 %rd262, 0x0; 2026-02-21T12:43:14.3365017Z createpolicy.fractional.L2::evict_first.b64 %rd262, 1.0; 2026-02-21T12:43:14.3365084Z // end inline asm 2026-02-21T12:43:14.3365146Z // begin inline asm 2026-02-21T12:43:14.3365209Z mov.u16 %rs153, 0x0; 2026-02-21T12:43:14.3365378Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs153 }, [ %rd263 + 0 ], %rd262; 2026-02-21T12:43:14.3365441Z // end inline asm 
2026-02-21T12:43:14.3365642Z .loc 1 62 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:62:28 2026-02-21T12:43:14.3365701Z bar.sync 0; 2026-02-21T12:43:14.3365776Z st.shared.b8 [%r11], %rs153; 2026-02-21T12:43:14.3365834Z bar.sync 0; 2026-02-21T12:43:14.3365914Z ld.shared.v2.b8 {%rs165, %rs166}, [%r12]; 2026-02-21T12:43:14.3366121Z .loc 1 57 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:57:28 2026-02-21T12:43:14.3366189Z shl.b16 %rs167, %rs165, 4; 2026-02-21T12:43:14.3366358Z shl.b16 %rs168, %rs166, 4; 2026-02-21T12:43:14.3366689Z .loc 1 72 58 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:72:58 2026-02-21T12:43:14.3366769Z selp.b16 %rs169, %rs167, %rs165, %p51; 2026-02-21T12:43:14.3366835Z cvt.s16.s8 %rs170, %rs169; 2026-02-21T12:43:14.3366896Z shr.s16 %rs171, %rs170, 4; 2026-02-21T12:43:14.3366972Z selp.b16 %rs172, %rs168, %rs166, %p51; 2026-02-21T12:43:14.3367035Z cvt.s16.s8 %rs173, %rs172; 2026-02-21T12:43:14.3367098Z shr.s16 %rs174, %rs173, 4; 2026-02-21T12:43:14.3367302Z .loc 1 77 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:77:32 2026-02-21T12:43:14.3367383Z cvt.rn.f32.s16 %r5701, %rs171; 2026-02-21T12:43:14.3367450Z cvt.rn.f32.s16 %r5702, %rs174; 2026-02-21T12:43:14.3367507Z bar.sync 0; 2026-02-21T12:43:14.3367582Z st.shared.b32 [%r13], %r5701; 2026-02-21T12:43:14.3367647Z st.shared.b32 [%r14], %r5702; 2026-02-21T12:43:14.3367702Z $L__tmp17: 2026-02-21T12:43:14.3367989Z .loc 2 291 36 // standard.py:291:36 @[ cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:84:40 ] 2026-02-21T12:43:14.3368056Z // begin inline asm 2026-02-21T12:43:14.3368134Z fence.proxy.async.shared::cta; 2026-02-21T12:43:14.3368199Z // end inline asm 2026-02-21T12:43:14.3368254Z bar.sync 0; 2026-02-21T12:43:14.3368334Z shfl.sync.idx.b32 %r5703, %r2, 0, 31, -1; 2026-02-21T12:43:14.3368411Z wgmma.fence.sync.aligned; 2026-02-21T12:43:14.3368483Z mov.pred %p26, -1; 2026-02-21T12:43:14.3368544Z // begin inline asm 
2026-02-21T12:43:14.3369821Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7745,%r7746,%r7747,%r7748,%r7749,%r7750,%r7751,%r7752,%r7753,%r7754,%r7755,%r7756,%r7757,%r7758,%r7759,%r7760,%r7761,%r7762,%r7763,%r7764,%r7765,%r7766,%r7767,%r7768,%r7769,%r7770,%r7771,%r7772,%r7773,%r7774,%r7775,%r7776,%r7777,%r7778,%r7779,%r7780,%r7781,%r7782,%r7783,%r7784,%r7785,%r7786,%r7787,%r7788,%r7789,%r7790,%r7791,%r7792,%r7793,%r7794,%r7795,%r7796,%r7797,%r7798,%r7799,%r7800,%r7801,%r7802,%r7803,%r7804,%r7805,%r7806,%r7807,%r7808}, {%r4225,%r4226,%r4227,%r4228}, %rd14, %p26, 1, 1; 2026-02-21T12:43:14.3369890Z // end inline asm 2026-02-21T12:43:14.3369950Z // begin inline asm 2026-02-21T12:43:14.3371218Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7745,%r7746,%r7747,%r7748,%r7749,%r7750,%r7751,%r7752,%r7753,%r7754,%r7755,%r7756,%r7757,%r7758,%r7759,%r7760,%r7761,%r7762,%r7763,%r7764,%r7765,%r7766,%r7767,%r7768,%r7769,%r7770,%r7771,%r7772,%r7773,%r7774,%r7775,%r7776,%r7777,%r7778,%r7779,%r7780,%r7781,%r7782,%r7783,%r7784,%r7785,%r7786,%r7787,%r7788,%r7789,%r7790,%r7791,%r7792,%r7793,%r7794,%r7795,%r7796,%r7797,%r7798,%r7799,%r7800,%r7801,%r7802,%r7803,%r7804,%r7805,%r7806,%r7807,%r7808}, {%r4357,%r4358,%r4359,%r4360}, %rd15, %p26, 1, 1; 2026-02-21T12:43:14.3371278Z // end inline asm 2026-02-21T12:43:14.3371473Z wgmma.commit_group.sync.aligned; 2026-02-21T12:43:14.3371537Z mov.b32 %r5632, 0; 2026-02-21T12:43:14.3371601Z mov.b32 %r4425, %r6277; 2026-02-21T12:43:14.3371754Z mov.b32 %r4426, %r5632; 2026-02-21T12:43:14.3371814Z mov.b32 %r4427, %r5632; 2026-02-21T12:43:14.3371880Z // begin inline asm 2026-02-21T12:43:14.3372942Z // wait for regs: 
%r7745,%r7746,%r7747,%r7748,%r7749,%r7750,%r7751,%r7752,%r7753,%r7754,%r7755,%r7756,%r7757,%r7758,%r7759,%r7760,%r7761,%r7762,%r7763,%r7764,%r7765,%r7766,%r7767,%r7768,%r7769,%r7770,%r7771,%r7772,%r7773,%r7774,%r7775,%r7776,%r7777,%r7778,%r7779,%r7780,%r7781,%r7782,%r7783,%r7784,%r7785,%r7786,%r7787,%r7788,%r7789,%r7790,%r7791,%r7792,%r7793,%r7794,%r7795,%r7796,%r7797,%r7798,%r7799,%r7800,%r7801,%r7802,%r7803,%r7804,%r7805,%r7806,%r7807,%r7808,%r4425,%r4426,%r4427 2026-02-21T12:43:14.3373022Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:43:14.3373087Z // end inline asm 2026-02-21T12:43:14.3373144Z $L__tmp18: 2026-02-21T12:43:14.3373358Z .loc 1 48 80 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:48:80 2026-02-21T12:43:14.3373433Z add.s64 %rd268, %rd409, -64; 2026-02-21T12:43:14.3373637Z // begin inline asm 2026-02-21T12:43:14.3373704Z mov.u64 %rd267, 0x0; 2026-02-21T12:43:14.3373841Z createpolicy.fractional.L2::evict_last.b64 %rd267, 1.0; 2026-02-21T12:43:14.3373901Z // end inline asm 2026-02-21T12:43:14.3373962Z // begin inline asm 2026-02-21T12:43:14.3374022Z mov.u32 %r4495, 0x0; 2026-02-21T12:43:14.3374086Z mov.u32 %r4496, 0x0; 2026-02-21T12:43:14.3374144Z mov.u32 %r4497, 0x0; 2026-02-21T12:43:14.3374203Z mov.u32 %r4498, 0x0; 2026-02-21T12:43:14.3374434Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r4495, %r4496, %r4497, %r4498 }, [ %rd268 + 0 ], %rd267; 2026-02-21T12:43:14.3374494Z // end inline asm 2026-02-21T12:43:14.3374700Z .loc 1 52 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:52:32 2026-02-21T12:43:14.3374779Z bar.sync 0; 2026-02-21T12:43:14.3374865Z st.shared.v2.b32 [%r7], {%r4495, %r4496}; 2026-02-21T12:43:14.3374940Z st.shared.v2.b32 [%r8], {%r4497, %r4498}; 2026-02-21T12:43:14.3375000Z bar.sync 0; 2026-02-21T12:43:14.3375077Z ld.shared.b16 %rs175, [%r27]; 2026-02-21T12:43:14.3375149Z ld.shared.b16 %rs176, [%r27+256]; 2026-02-21T12:43:14.3375220Z ld.shared.b16 %rs177, [%r27+16]; 2026-02-21T12:43:14.3375291Z 
ld.shared.b16 %rs178, [%r27+272]; 2026-02-21T12:43:14.3375357Z ld.shared.b16 %rs179, [%r28]; 2026-02-21T12:43:14.3375425Z ld.shared.b16 %rs180, [%r28+256]; 2026-02-21T12:43:14.3375492Z ld.shared.b16 %rs181, [%r28+16]; 2026-02-21T12:43:14.3375564Z ld.shared.b16 %rs182, [%r28+272]; 2026-02-21T12:43:14.3375631Z cvt.f32.bf16 %r4627, %rs175; 2026-02-21T12:43:14.3375695Z cvt.f32.bf16 %r4628, %rs176; 2026-02-21T12:43:14.3375764Z cvt.f32.bf16 %r4629, %rs179; 2026-02-21T12:43:14.3375826Z cvt.f32.bf16 %r4630, %rs180; 2026-02-21T12:43:14.3375890Z cvt.f32.bf16 %r4759, %rs177; 2026-02-21T12:43:14.3375966Z cvt.f32.bf16 %r4760, %rs178; 2026-02-21T12:43:14.3376037Z cvt.f32.bf16 %r4761, %rs181; 2026-02-21T12:43:14.3376100Z cvt.f32.bf16 %r4762, %rs182; 2026-02-21T12:43:14.3376310Z .loc 1 54 87 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:54:87 2026-02-21T12:43:14.3376388Z add.s64 %rd271, %rd408, -20480; 2026-02-21T12:43:14.3376578Z // begin inline asm 2026-02-21T12:43:14.3376647Z mov.u64 %rd270, 0x0; 2026-02-21T12:43:14.3376787Z createpolicy.fractional.L2::evict_first.b64 %rd270, 1.0; 2026-02-21T12:43:14.3376849Z // end inline asm 2026-02-21T12:43:14.3376914Z // begin inline asm 2026-02-21T12:43:14.3376979Z mov.u16 %rs154, 0x0; 2026-02-21T12:43:14.3377151Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs154 }, [ %rd271 + 0 ], %rd270; 2026-02-21T12:43:14.3377211Z // end inline asm 2026-02-21T12:43:14.3377429Z .loc 1 62 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:62:28 2026-02-21T12:43:14.3377495Z bar.sync 0; 2026-02-21T12:43:14.3377659Z st.shared.b8 [%r11], %rs154; 2026-02-21T12:43:14.3377718Z bar.sync 0; 2026-02-21T12:43:14.3377803Z ld.shared.v2.b8 {%rs183, %rs184}, [%r12]; 2026-02-21T12:43:14.3378085Z .loc 1 57 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:57:28 2026-02-21T12:43:14.3378154Z shl.b16 %rs185, %rs183, 4; 2026-02-21T12:43:14.3378219Z shl.b16 %rs186, %rs184, 4; 2026-02-21T12:43:14.3378430Z .loc 1 72 58 // 
cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:72:58 2026-02-21T12:43:14.3378506Z selp.b16 %rs187, %rs185, %rs183, %p51; 2026-02-21T12:43:14.3378571Z cvt.s16.s8 %rs188, %rs187; 2026-02-21T12:43:14.3378638Z shr.s16 %rs189, %rs188, 4; 2026-02-21T12:43:14.3378720Z selp.b16 %rs190, %rs186, %rs184, %p51; 2026-02-21T12:43:14.3378783Z cvt.s16.s8 %rs191, %rs190; 2026-02-21T12:43:14.3378847Z shr.s16 %rs192, %rs191, 4; 2026-02-21T12:43:14.3379052Z .loc 1 77 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:77:32 2026-02-21T12:43:14.3379122Z cvt.rn.f32.s16 %r5704, %rs189; 2026-02-21T12:43:14.3379189Z cvt.rn.f32.s16 %r5705, %rs192; 2026-02-21T12:43:14.3379332Z bar.sync 0; 2026-02-21T12:43:14.3379475Z st.shared.b32 [%r13], %r5704; 2026-02-21T12:43:14.3379550Z st.shared.b32 [%r14], %r5705; 2026-02-21T12:43:14.3379612Z $L__tmp19: 2026-02-21T12:43:14.3379890Z .loc 2 291 36 // standard.py:291:36 @[ cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:84:40 ] 2026-02-21T12:43:14.3379951Z // begin inline asm 2026-02-21T12:43:14.3380030Z fence.proxy.async.shared::cta; 2026-02-21T12:43:14.3380092Z // end inline asm 2026-02-21T12:43:14.3380151Z bar.sync 0; 2026-02-21T12:43:14.3380225Z wgmma.fence.sync.aligned; 2026-02-21T12:43:14.3380289Z // begin inline asm 2026-02-21T12:43:14.3381563Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7745,%r7746,%r7747,%r7748,%r7749,%r7750,%r7751,%r7752,%r7753,%r7754,%r7755,%r7756,%r7757,%r7758,%r7759,%r7760,%r7761,%r7762,%r7763,%r7764,%r7765,%r7766,%r7767,%r7768,%r7769,%r7770,%r7771,%r7772,%r7773,%r7774,%r7775,%r7776,%r7777,%r7778,%r7779,%r7780,%r7781,%r7782,%r7783,%r7784,%r7785,%r7786,%r7787,%r7788,%r7789,%r7790,%r7791,%r7792,%r7793,%r7794,%r7795,%r7796,%r7797,%r7798,%r7799,%r7800,%r7801,%r7802,%r7803,%r7804,%r7805,%r7806,%r7807,%r7808}, {%r4627,%r4628,%r4629,%r4630}, %rd14, %p26, 1, 1; 2026-02-21T12:43:14.3381627Z // end inline asm 2026-02-21T12:43:14.3381691Z // begin inline asm 2026-02-21T12:43:14.3382948Z 
wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7745,%r7746,%r7747,%r7748,%r7749,%r7750,%r7751,%r7752,%r7753,%r7754,%r7755,%r7756,%r7757,%r7758,%r7759,%r7760,%r7761,%r7762,%r7763,%r7764,%r7765,%r7766,%r7767,%r7768,%r7769,%r7770,%r7771,%r7772,%r7773,%r7774,%r7775,%r7776,%r7777,%r7778,%r7779,%r7780,%r7781,%r7782,%r7783,%r7784,%r7785,%r7786,%r7787,%r7788,%r7789,%r7790,%r7791,%r7792,%r7793,%r7794,%r7795,%r7796,%r7797,%r7798,%r7799,%r7800,%r7801,%r7802,%r7803,%r7804,%r7805,%r7806,%r7807,%r7808}, {%r4759,%r4760,%r4761,%r4762}, %rd15, %p26, 1, 1; 2026-02-21T12:43:14.3383015Z // end inline asm 2026-02-21T12:43:14.3383092Z wgmma.commit_group.sync.aligned; 2026-02-21T12:43:14.3383158Z mov.b32 %r4827, %r6277; 2026-02-21T12:43:14.3383221Z mov.b32 %r4828, %r5632; 2026-02-21T12:43:14.3383286Z mov.b32 %r4829, %r5632; 2026-02-21T12:43:14.3383347Z // begin inline asm 2026-02-21T12:43:14.3384420Z // wait for regs: %r7745,%r7746,%r7747,%r7748,%r7749,%r7750,%r7751,%r7752,%r7753,%r7754,%r7755,%r7756,%r7757,%r7758,%r7759,%r7760,%r7761,%r7762,%r7763,%r7764,%r7765,%r7766,%r7767,%r7768,%r7769,%r7770,%r7771,%r7772,%r7773,%r7774,%r7775,%r7776,%r7777,%r7778,%r7779,%r7780,%r7781,%r7782,%r7783,%r7784,%r7785,%r7786,%r7787,%r7788,%r7789,%r7790,%r7791,%r7792,%r7793,%r7794,%r7795,%r7796,%r7797,%r7798,%r7799,%r7800,%r7801,%r7802,%r7803,%r7804,%r7805,%r7806,%r7807,%r7808,%r4827,%r4828,%r4829 2026-02-21T12:43:14.3384504Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:43:14.3384562Z // end inline asm 2026-02-21T12:43:14.3384686Z $L__tmp20: 2026-02-21T12:43:14.3384899Z .loc 1 48 80 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:48:80 2026-02-21T12:43:14.3385019Z add.s64 %rd276, %rd409, -32; 2026-02-21T12:43:14.3385080Z // begin inline asm 2026-02-21T12:43:14.3385144Z mov.u64 %rd275, 0x0; 2026-02-21T12:43:14.3385272Z createpolicy.fractional.L2::evict_last.b64 %rd275, 1.0; 2026-02-21T12:43:14.3385341Z // end inline asm 2026-02-21T12:43:14.3385405Z // begin inline asm 
2026-02-21T12:43:14.3385470Z mov.u32 %r4897, 0x0; 2026-02-21T12:43:14.3385529Z mov.u32 %r4898, 0x0; 2026-02-21T12:43:14.3385587Z mov.u32 %r4899, 0x0; 2026-02-21T12:43:14.3385650Z mov.u32 %r4900, 0x0; 2026-02-21T12:43:14.3385874Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r4897, %r4898, %r4899, %r4900 }, [ %rd276 + 0 ], %rd275; 2026-02-21T12:43:14.3385934Z // end inline asm 2026-02-21T12:43:14.3386144Z .loc 1 52 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:52:32 2026-02-21T12:43:14.3386205Z bar.sync 0; 2026-02-21T12:43:14.3386284Z st.shared.v2.b32 [%r7], {%r4897, %r4898}; 2026-02-21T12:43:14.3386413Z st.shared.v2.b32 [%r8], {%r4899, %r4900}; 2026-02-21T12:43:14.3386680Z bar.sync 0; 2026-02-21T12:43:14.3386764Z ld.shared.b16 %rs193, [%r27]; 2026-02-21T12:43:14.3386834Z ld.shared.b16 %rs194, [%r27+256]; 2026-02-21T12:43:14.3386908Z ld.shared.b16 %rs195, [%r27+16]; 2026-02-21T12:43:14.3386975Z ld.shared.b16 %rs196, [%r27+272]; 2026-02-21T12:43:14.3387040Z ld.shared.b16 %rs197, [%r28]; 2026-02-21T12:43:14.3387108Z ld.shared.b16 %rs198, [%r28+256]; 2026-02-21T12:43:14.3387190Z ld.shared.b16 %rs199, [%r28+16]; 2026-02-21T12:43:14.3387258Z ld.shared.b16 %rs200, [%r28+272]; 2026-02-21T12:43:14.3387326Z cvt.f32.bf16 %r5029, %rs193; 2026-02-21T12:43:14.3387396Z cvt.f32.bf16 %r5030, %rs194; 2026-02-21T12:43:14.3387461Z cvt.f32.bf16 %r5031, %rs197; 2026-02-21T12:43:14.3387524Z cvt.f32.bf16 %r5032, %rs198; 2026-02-21T12:43:14.3387593Z cvt.f32.bf16 %r5161, %rs195; 2026-02-21T12:43:14.3387657Z cvt.f32.bf16 %r5162, %rs196; 2026-02-21T12:43:14.3387719Z cvt.f32.bf16 %r5163, %rs199; 2026-02-21T12:43:14.3387785Z cvt.f32.bf16 %r5164, %rs200; 2026-02-21T12:43:14.3388005Z .loc 1 54 87 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:54:87 2026-02-21T12:43:14.3388082Z add.s64 %rd279, %rd408, -10240; 2026-02-21T12:43:14.3388147Z // begin inline asm 2026-02-21T12:43:14.3388213Z mov.u64 %rd278, 0x0; 2026-02-21T12:43:14.3388342Z 
createpolicy.fractional.L2::evict_first.b64 %rd278, 1.0; 2026-02-21T12:43:14.3388401Z // end inline asm 2026-02-21T12:43:14.3388462Z // begin inline asm 2026-02-21T12:43:14.3388534Z mov.u16 %rs155, 0x0; 2026-02-21T12:43:14.3388804Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs155 }, [ %rd279 + 0 ], %rd278; 2026-02-21T12:43:14.3388865Z // end inline asm 2026-02-21T12:43:14.3389077Z .loc 1 62 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:62:28 2026-02-21T12:43:14.3389138Z bar.sync 0; 2026-02-21T12:43:14.3389206Z st.shared.b8 [%r11], %rs155; 2026-02-21T12:43:14.3389271Z bar.sync 0; 2026-02-21T12:43:14.3389353Z ld.shared.v2.b8 {%rs201, %rs202}, [%r12]; 2026-02-21T12:43:14.3389559Z .loc 1 57 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:57:28 2026-02-21T12:43:14.3389627Z shl.b16 %rs203, %rs201, 4; 2026-02-21T12:43:14.3389697Z shl.b16 %rs204, %rs202, 4; 2026-02-21T12:43:14.3389903Z .loc 1 72 58 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:72:58 2026-02-21T12:43:14.3389977Z selp.b16 %rs205, %rs203, %rs201, %p51; 2026-02-21T12:43:14.3390048Z cvt.s16.s8 %rs206, %rs205; 2026-02-21T12:43:14.3390112Z shr.s16 %rs207, %rs206, 4; 2026-02-21T12:43:14.3390185Z selp.b16 %rs208, %rs204, %rs202, %p51; 2026-02-21T12:43:14.3390253Z cvt.s16.s8 %rs209, %rs208; 2026-02-21T12:43:14.3390327Z shr.s16 %rs210, %rs209, 4; 2026-02-21T12:43:14.3390529Z .loc 1 77 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:77:32 2026-02-21T12:43:14.3390695Z cvt.rn.f32.s16 %r5706, %rs207; 2026-02-21T12:43:14.3390771Z cvt.rn.f32.s16 %r5707, %rs210; 2026-02-21T12:43:14.3390912Z bar.sync 0; 2026-02-21T12:43:14.3390981Z st.shared.b32 [%r13], %r5706; 2026-02-21T12:43:14.3391053Z st.shared.b32 [%r14], %r5707; 2026-02-21T12:43:14.3391110Z $L__tmp21: 2026-02-21T12:43:14.3391383Z .loc 2 291 36 // standard.py:291:36 @[ cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:84:40 ] 2026-02-21T12:43:14.3391451Z // begin inline asm 2026-02-21T12:43:14.3391528Z 
fence.proxy.async.shared::cta; 2026-02-21T12:43:14.3391587Z // end inline asm 2026-02-21T12:43:14.3391644Z bar.sync 0; 2026-02-21T12:43:14.3391723Z wgmma.fence.sync.aligned; 2026-02-21T12:43:14.3391785Z // begin inline asm 2026-02-21T12:43:14.3393198Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7745,%r7746,%r7747,%r7748,%r7749,%r7750,%r7751,%r7752,%r7753,%r7754,%r7755,%r7756,%r7757,%r7758,%r7759,%r7760,%r7761,%r7762,%r7763,%r7764,%r7765,%r7766,%r7767,%r7768,%r7769,%r7770,%r7771,%r7772,%r7773,%r7774,%r7775,%r7776,%r7777,%r7778,%r7779,%r7780,%r7781,%r7782,%r7783,%r7784,%r7785,%r7786,%r7787,%r7788,%r7789,%r7790,%r7791,%r7792,%r7793,%r7794,%r7795,%r7796,%r7797,%r7798,%r7799,%r7800,%r7801,%r7802,%r7803,%r7804,%r7805,%r7806,%r7807,%r7808}, {%r5029,%r5030,%r5031,%r5032}, %rd14, %p26, 1, 1; 2026-02-21T12:43:14.3393273Z // end inline asm 2026-02-21T12:43:14.3393335Z // begin inline asm 2026-02-21T12:43:14.3394592Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7745,%r7746,%r7747,%r7748,%r7749,%r7750,%r7751,%r7752,%r7753,%r7754,%r7755,%r7756,%r7757,%r7758,%r7759,%r7760,%r7761,%r7762,%r7763,%r7764,%r7765,%r7766,%r7767,%r7768,%r7769,%r7770,%r7771,%r7772,%r7773,%r7774,%r7775,%r7776,%r7777,%r7778,%r7779,%r7780,%r7781,%r7782,%r7783,%r7784,%r7785,%r7786,%r7787,%r7788,%r7789,%r7790,%r7791,%r7792,%r7793,%r7794,%r7795,%r7796,%r7797,%r7798,%r7799,%r7800,%r7801,%r7802,%r7803,%r7804,%r7805,%r7806,%r7807,%r7808}, {%r5161,%r5162,%r5163,%r5164}, %rd15, %p26, 1, 1; 2026-02-21T12:43:14.3394659Z // end inline asm 2026-02-21T12:43:14.3394737Z wgmma.commit_group.sync.aligned; 2026-02-21T12:43:14.3394804Z mov.b32 %r5229, %r6277; 2026-02-21T12:43:14.3394873Z mov.b32 %r5230, %r5632; 2026-02-21T12:43:14.3394931Z mov.b32 %r5231, %r5632; 2026-02-21T12:43:14.3394991Z // begin inline asm 2026-02-21T12:43:14.3396078Z // wait for regs: 
%r7745,%r7746,%r7747,%r7748,%r7749,%r7750,%r7751,%r7752,%r7753,%r7754,%r7755,%r7756,%r7757,%r7758,%r7759,%r7760,%r7761,%r7762,%r7763,%r7764,%r7765,%r7766,%r7767,%r7768,%r7769,%r7770,%r7771,%r7772,%r7773,%r7774,%r7775,%r7776,%r7777,%r7778,%r7779,%r7780,%r7781,%r7782,%r7783,%r7784,%r7785,%r7786,%r7787,%r7788,%r7789,%r7790,%r7791,%r7792,%r7793,%r7794,%r7795,%r7796,%r7797,%r7798,%r7799,%r7800,%r7801,%r7802,%r7803,%r7804,%r7805,%r7806,%r7807,%r7808,%r5229,%r5230,%r5231 2026-02-21T12:43:14.3396158Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:43:14.3396216Z // end inline asm 2026-02-21T12:43:14.3396278Z $L__tmp22: 2026-02-21T12:43:14.3396599Z .loc 1 48 80 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:48:80 2026-02-21T12:43:14.3396674Z // begin inline asm 2026-02-21T12:43:14.3396742Z mov.u64 %rd283, 0x0; 2026-02-21T12:43:14.3396867Z createpolicy.fractional.L2::evict_last.b64 %rd283, 1.0; 2026-02-21T12:43:14.3396926Z // end inline asm 2026-02-21T12:43:14.3396986Z // begin inline asm 2026-02-21T12:43:14.3397051Z mov.u32 %r5299, 0x0; 2026-02-21T12:43:14.3397110Z mov.u32 %r5300, 0x0; 2026-02-21T12:43:14.3397171Z mov.u32 %r5301, 0x0; 2026-02-21T12:43:14.3397233Z mov.u32 %r5302, 0x0; 2026-02-21T12:43:14.3397469Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r5299, %r5300, %r5301, %r5302 }, [ %rd409 + 0 ], %rd283; 2026-02-21T12:43:14.3397531Z // end inline asm 2026-02-21T12:43:14.3397742Z .loc 1 52 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:52:32 2026-02-21T12:43:14.3397802Z bar.sync 0; 2026-02-21T12:43:14.3397979Z st.shared.v2.b32 [%r7], {%r5299, %r5300}; 2026-02-21T12:43:14.3398059Z st.shared.v2.b32 [%r8], {%r5301, %r5302}; 2026-02-21T12:43:14.3398198Z bar.sync 0; 2026-02-21T12:43:14.3398273Z ld.shared.b16 %rs211, [%r27]; 2026-02-21T12:43:14.3398345Z ld.shared.b16 %rs212, [%r27+256]; 2026-02-21T12:43:14.3398419Z ld.shared.b16 %rs213, [%r27+16]; 2026-02-21T12:43:14.3398486Z ld.shared.b16 %rs214, [%r27+272]; 2026-02-21T12:43:14.3398551Z 
ld.shared.b16 %rs215, [%r28]; 2026-02-21T12:43:14.3398622Z ld.shared.b16 %rs216, [%r28+256]; 2026-02-21T12:43:14.3398690Z ld.shared.b16 %rs217, [%r28+16]; 2026-02-21T12:43:14.3398756Z ld.shared.b16 %rs218, [%r28+272]; 2026-02-21T12:43:14.3398823Z cvt.f32.bf16 %r5431, %rs211; 2026-02-21T12:43:14.3398892Z cvt.f32.bf16 %r5432, %rs212; 2026-02-21T12:43:14.3398954Z cvt.f32.bf16 %r5433, %rs215; 2026-02-21T12:43:14.3399017Z cvt.f32.bf16 %r5434, %rs216; 2026-02-21T12:43:14.3399083Z cvt.f32.bf16 %r5563, %rs213; 2026-02-21T12:43:14.3399147Z cvt.f32.bf16 %r5564, %rs214; 2026-02-21T12:43:14.3399209Z cvt.f32.bf16 %r5565, %rs217; 2026-02-21T12:43:14.3399274Z cvt.f32.bf16 %r5566, %rs218; 2026-02-21T12:43:14.3399637Z .loc 1 54 87 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:54:87 2026-02-21T12:43:14.3399715Z // begin inline asm 2026-02-21T12:43:14.3399779Z mov.u64 %rd286, 0x0; 2026-02-21T12:43:14.3399913Z createpolicy.fractional.L2::evict_first.b64 %rd286, 1.0; 2026-02-21T12:43:14.3399973Z // end inline asm 2026-02-21T12:43:14.3400035Z // begin inline asm 2026-02-21T12:43:14.3400100Z mov.u16 %rs156, 0x0; 2026-02-21T12:43:14.3400262Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs156 }, [ %rd408 + 0 ], %rd286; 2026-02-21T12:43:14.3400321Z // end inline asm 2026-02-21T12:43:14.3400523Z .loc 1 62 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:62:28 2026-02-21T12:43:14.3400589Z bar.sync 0; 2026-02-21T12:43:14.3400656Z st.shared.b8 [%r11], %rs156; 2026-02-21T12:43:14.3400715Z bar.sync 0; 2026-02-21T12:43:14.3400799Z ld.shared.v2.b8 {%rs219, %rs220}, [%r12]; 2026-02-21T12:43:14.3401004Z .loc 1 57 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:57:28 2026-02-21T12:43:14.3401075Z shl.b16 %rs221, %rs219, 4; 2026-02-21T12:43:14.3401144Z shl.b16 %rs222, %rs220, 4; 2026-02-21T12:43:14.3401343Z .loc 1 72 58 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:72:58 2026-02-21T12:43:14.3401419Z selp.b16 %rs223, %rs221, %rs219, %p51; 
2026-02-21T12:43:14.3401483Z cvt.s16.s8 %rs224, %rs223; 2026-02-21T12:43:14.3401556Z shr.s16 %rs225, %rs224, 4; 2026-02-21T12:43:14.3401629Z selp.b16 %rs226, %rs222, %rs220, %p51; 2026-02-21T12:43:14.3401693Z cvt.s16.s8 %rs227, %rs226; 2026-02-21T12:43:14.3401764Z shr.s16 %rs228, %rs227, 4; 2026-02-21T12:43:14.3401961Z .loc 1 77 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:77:32 2026-02-21T12:43:14.3402031Z cvt.rn.f32.s16 %r5708, %rs225; 2026-02-21T12:43:14.3402098Z cvt.rn.f32.s16 %r5709, %rs228; 2026-02-21T12:43:14.3402162Z bar.sync 0; 2026-02-21T12:43:14.3402234Z st.shared.b32 [%r13], %r5708; 2026-02-21T12:43:14.3402300Z st.shared.b32 [%r14], %r5709; 2026-02-21T12:43:14.3402364Z $L__tmp23: 2026-02-21T12:43:14.3402638Z .loc 2 291 36 // standard.py:291:36 @[ cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:84:40 ] 2026-02-21T12:43:14.3402701Z // begin inline asm 2026-02-21T12:43:14.3402790Z fence.proxy.async.shared::cta; 2026-02-21T12:43:14.3402851Z // end inline asm 2026-02-21T12:43:14.3402920Z bar.sync 0; 2026-02-21T12:43:14.3402995Z wgmma.fence.sync.aligned; 2026-02-21T12:43:14.3403066Z // begin inline asm 2026-02-21T12:43:14.3404335Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7745,%r7746,%r7747,%r7748,%r7749,%r7750,%r7751,%r7752,%r7753,%r7754,%r7755,%r7756,%r7757,%r7758,%r7759,%r7760,%r7761,%r7762,%r7763,%r7764,%r7765,%r7766,%r7767,%r7768,%r7769,%r7770,%r7771,%r7772,%r7773,%r7774,%r7775,%r7776,%r7777,%r7778,%r7779,%r7780,%r7781,%r7782,%r7783,%r7784,%r7785,%r7786,%r7787,%r7788,%r7789,%r7790,%r7791,%r7792,%r7793,%r7794,%r7795,%r7796,%r7797,%r7798,%r7799,%r7800,%r7801,%r7802,%r7803,%r7804,%r7805,%r7806,%r7807,%r7808}, {%r5431,%r5432,%r5433,%r5434}, %rd14, %p26, 1, 1; 2026-02-21T12:43:14.3404517Z // end inline asm 2026-02-21T12:43:14.3404580Z // begin inline asm 2026-02-21T12:43:14.3405850Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r7745,%r7746,%r7747,%r7748,%r7749,%r7750,%r7751,%r7752,%r7753,%r7754,%r7755,%r7756,%r7757,%r7758,%r7759,%r7760,%r7761,%r7762,%r7763,%r7764,%r7765,%r7766,%r7767,%r7768,%r7769,%r7770,%r7771,%r7772,%r7773,%r7774,%r7775,%r7776,%r7777,%r7778,%r7779,%r7780,%r7781,%r7782,%r7783,%r7784,%r7785,%r7786,%r7787,%r7788,%r7789,%r7790,%r7791,%r7792,%r7793,%r7794,%r7795,%r7796,%r7797,%r7798,%r7799,%r7800,%r7801,%r7802,%r7803,%r7804,%r7805,%r7806,%r7807,%r7808}, {%r5563,%r5564,%r5565,%r5566}, %rd15, %p26, 1, 1; 2026-02-21T12:43:14.3405920Z // end inline asm 2026-02-21T12:43:14.3405997Z wgmma.commit_group.sync.aligned; 2026-02-21T12:43:14.3406059Z mov.b32 %r5631, %r6277; 2026-02-21T12:43:14.3406224Z mov.b32 %r5633, %r5632; 2026-02-21T12:43:14.3406290Z // begin inline asm 2026-02-21T12:43:14.3407470Z // wait for regs: %r7745,%r7746,%r7747,%r7748,%r7749,%r7750,%r7751,%r7752,%r7753,%r7754,%r7755,%r7756,%r7757,%r7758,%r7759,%r7760,%r7761,%r7762,%r7763,%r7764,%r7765,%r7766,%r7767,%r7768,%r7769,%r7770,%r7771,%r7772,%r7773,%r7774,%r7775,%r7776,%r7777,%r7778,%r7779,%r7780,%r7781,%r7782,%r7783,%r7784,%r7785,%r7786,%r7787,%r7788,%r7789,%r7790,%r7791,%r7792,%r7793,%r7794,%r7795,%r7796,%r7797,%r7798,%r7799,%r7800,%r7801,%r7802,%r7803,%r7804,%r7805,%r7806,%r7807,%r7808,%r5631,%r5632,%r5633 2026-02-21T12:43:14.3407556Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:43:14.3407614Z // end inline asm 2026-02-21T12:43:14.3407681Z $L__tmp24: 2026-02-21T12:43:14.3407899Z .loc 1 40 126 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:40:126 2026-02-21T12:43:14.3407974Z add.s64 %rd410, %rd410, 32; 2026-02-21T12:43:14.3408038Z add.s64 %rd409, %rd409, 128; 2026-02-21T12:43:14.3408108Z add.s64 %rd408, %rd408, 40960; 2026-02-21T12:43:14.3408181Z setp.lt.u64 %p35, %rd410, 4064; 2026-02-21T12:43:14.3408243Z @%p35 bra $L__BB0_22; 2026-02-21T12:43:14.3408359Z // %bb.23: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:43:14.3408568Z .loc 1 31 32 // 
cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:31:32 2026-02-21T12:43:14.3408642Z or.b64 %rd299, %rd63, %rd3; 2026-02-21T12:43:14.3408839Z .loc 1 33 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:33:32 2026-02-21T12:43:14.3408903Z or.b64 %rd300, %rd64, %rd6; 2026-02-21T12:43:14.3408970Z or.b64 %rd301, %rd64, %rd7; 2026-02-21T12:43:14.3409032Z or.b64 %rd302, %rd64, %rd8; 2026-02-21T12:43:14.3409094Z or.b64 %rd303, %rd64, %rd9; 2026-02-21T12:43:14.3409162Z or.b64 %rd304, %rd64, %rd10; 2026-02-21T12:43:14.3409226Z or.b64 %rd305, %rd64, %rd11; 2026-02-21T12:43:14.3409289Z or.b64 %rd306, %rd64, %rd12; 2026-02-21T12:43:14.3409359Z or.b64 %rd307, %rd64, %rd13; 2026-02-21T12:43:14.3409558Z .loc 1 87 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:87:28 2026-02-21T12:43:14.3409636Z cvt.rn.bf16x2.f32 %r5782, %r7746, %r7745; 2026-02-21T12:43:14.3409711Z cvt.rn.bf16x2.f32 %r5783, %r7748, %r7747; 2026-02-21T12:43:14.3409789Z cvt.rn.bf16x2.f32 %r5784, %r7750, %r7749; 2026-02-21T12:43:14.3409864Z cvt.rn.bf16x2.f32 %r5785, %r7752, %r7751; 2026-02-21T12:43:14.3409936Z cvt.rn.bf16x2.f32 %r5786, %r7754, %r7753; 2026-02-21T12:43:14.3410014Z cvt.rn.bf16x2.f32 %r5787, %r7756, %r7755; 2026-02-21T12:43:14.3410088Z cvt.rn.bf16x2.f32 %r5788, %r7758, %r7757; 2026-02-21T12:43:14.3410161Z cvt.rn.bf16x2.f32 %r5789, %r7760, %r7759; 2026-02-21T12:43:14.3410238Z cvt.rn.bf16x2.f32 %r5790, %r7762, %r7761; 2026-02-21T12:43:14.3410413Z cvt.rn.bf16x2.f32 %r5791, %r7764, %r7763; 2026-02-21T12:43:14.3410488Z cvt.rn.bf16x2.f32 %r5792, %r7766, %r7765; 2026-02-21T12:43:14.3410562Z cvt.rn.bf16x2.f32 %r5793, %r7768, %r7767; 2026-02-21T12:43:14.3410718Z cvt.rn.bf16x2.f32 %r5794, %r7770, %r7769; 2026-02-21T12:43:14.3410791Z cvt.rn.bf16x2.f32 %r5795, %r7772, %r7771; 2026-02-21T12:43:14.3410864Z cvt.rn.bf16x2.f32 %r5796, %r7774, %r7773; 2026-02-21T12:43:14.3410942Z cvt.rn.bf16x2.f32 %r5797, %r7776, %r7775; 2026-02-21T12:43:14.3411016Z cvt.rn.bf16x2.f32 %r5798, %r7778, %r7777; 
2026-02-21T12:43:14.3411088Z cvt.rn.bf16x2.f32 %r5799, %r7780, %r7779; 2026-02-21T12:43:14.3411160Z cvt.rn.bf16x2.f32 %r5800, %r7782, %r7781; 2026-02-21T12:43:14.3411241Z cvt.rn.bf16x2.f32 %r5801, %r7784, %r7783; 2026-02-21T12:43:14.3411315Z cvt.rn.bf16x2.f32 %r5802, %r7786, %r7785; 2026-02-21T12:43:14.3411387Z cvt.rn.bf16x2.f32 %r5803, %r7788, %r7787; 2026-02-21T12:43:14.3411469Z cvt.rn.bf16x2.f32 %r5804, %r7790, %r7789; 2026-02-21T12:43:14.3411545Z cvt.rn.bf16x2.f32 %r5805, %r7792, %r7791; 2026-02-21T12:43:14.3411618Z cvt.rn.bf16x2.f32 %r5806, %r7794, %r7793; 2026-02-21T12:43:14.3411775Z cvt.rn.bf16x2.f32 %r5807, %r7796, %r7795; 2026-02-21T12:43:14.3411917Z cvt.rn.bf16x2.f32 %r5808, %r7798, %r7797; 2026-02-21T12:43:14.3412003Z cvt.rn.bf16x2.f32 %r5809, %r7800, %r7799; 2026-02-21T12:43:14.3412077Z cvt.rn.bf16x2.f32 %r5810, %r7802, %r7801; 2026-02-21T12:43:14.3412160Z cvt.rn.bf16x2.f32 %r5811, %r7804, %r7803; 2026-02-21T12:43:14.3412232Z cvt.rn.bf16x2.f32 %r5812, %r7806, %r7805; 2026-02-21T12:43:14.3412310Z cvt.rn.bf16x2.f32 %r5813, %r7808, %r7807; 2026-02-21T12:43:14.3412522Z .loc 1 88 22 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:88:22 2026-02-21T12:43:14.3412598Z mad.lo.s64 %rd308, %rd300, 2560, %rd88; 2026-02-21T12:43:14.3412663Z shl.b64 %rd309, %rd299, 1; 2026-02-21T12:43:14.3412735Z add.s64 %rd291, %rd308, %rd309; 2026-02-21T12:43:14.3412806Z mad.lo.s64 %rd310, %rd301, 2560, %rd88; 2026-02-21T12:43:14.3412874Z add.s64 %rd292, %rd310, %rd309; 2026-02-21T12:43:14.3412943Z mad.lo.s64 %rd311, %rd302, 2560, %rd88; 2026-02-21T12:43:14.3413015Z add.s64 %rd293, %rd311, %rd309; 2026-02-21T12:43:14.3413090Z mad.lo.s64 %rd312, %rd303, 2560, %rd88; 2026-02-21T12:43:14.3413154Z add.s64 %rd294, %rd312, %rd309; 2026-02-21T12:43:14.3413229Z mad.lo.s64 %rd313, %rd304, 2560, %rd88; 2026-02-21T12:43:14.3413294Z add.s64 %rd295, %rd313, %rd309; 2026-02-21T12:43:14.3413363Z mad.lo.s64 %rd314, %rd305, 2560, %rd88; 2026-02-21T12:43:14.3413427Z add.s64 %rd296, %rd314, 
%rd309; 2026-02-21T12:43:14.3413504Z mad.lo.s64 %rd315, %rd306, 2560, %rd88; 2026-02-21T12:43:14.3413570Z add.s64 %rd297, %rd315, %rd309; 2026-02-21T12:43:14.3413639Z mad.lo.s64 %rd316, %rd307, 2560, %rd88; 2026-02-21T12:43:14.3413714Z add.s64 %rd298, %rd316, %rd309; 2026-02-21T12:43:14.3413913Z .loc 1 88 81 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:88:81 2026-02-21T12:43:14.3413973Z bar.sync 0; 2026-02-21T12:43:14.3414094Z st.shared.v4.b32 [%r15], {%r5782, %r5784, %r5786, %r5788}; 2026-02-21T12:43:14.3414216Z st.shared.v4.b32 [%r15+512], {%r5783, %r5785, %r5787, %r5789}; 2026-02-21T12:43:14.3414327Z st.shared.v4.b32 [%r16], {%r5790, %r5792, %r5794, %r5796}; 2026-02-21T12:43:14.3414441Z st.shared.v4.b32 [%r16+512], {%r5791, %r5793, %r5795, %r5797}; 2026-02-21T12:43:14.3414554Z st.shared.v4.b32 [%r17], {%r5798, %r5800, %r5802, %r5804}; 2026-02-21T12:43:14.3414671Z st.shared.v4.b32 [%r17+512], {%r5799, %r5801, %r5803, %r5805}; 2026-02-21T12:43:14.3414781Z st.shared.v4.b32 [%r18], {%r5806, %r5808, %r5810, %r5812}; 2026-02-21T12:43:14.3414901Z st.shared.v4.b32 [%r18+512], {%r5807, %r5809, %r5811, %r5813}; 2026-02-21T12:43:14.3414959Z bar.sync 0; 2026-02-21T12:43:14.3415024Z // begin inline asm 2026-02-21T12:43:14.3415221Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5710, %r5711, %r5712, %r5713}, [%r2264]; 2026-02-21T12:43:14.3415282Z // end inline asm 2026-02-21T12:43:14.3415413Z // begin inline asm 2026-02-21T12:43:14.3415605Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5715, %r5716, %r5717, %r5718}, [%r2269]; 2026-02-21T12:43:14.3415732Z // end inline asm 2026-02-21T12:43:14.3415795Z // begin inline asm 2026-02-21T12:43:14.3415989Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5720, %r5721, %r5722, %r5723}, [%r2274]; 2026-02-21T12:43:14.3416057Z // end inline asm 2026-02-21T12:43:14.3416120Z // begin inline asm 2026-02-21T12:43:14.3416305Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5725, %r5726, %r5727, %r5728}, [%r2279]; 2026-02-21T12:43:14.3416369Z 
// end inline asm 2026-02-21T12:43:14.3416429Z // begin inline asm 2026-02-21T12:43:14.3416745Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5730, %r5731, %r5732, %r5733}, [%r2284]; 2026-02-21T12:43:14.3416806Z // end inline asm 2026-02-21T12:43:14.3416873Z // begin inline asm 2026-02-21T12:43:14.3417051Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5735, %r5736, %r5737, %r5738}, [%r2289]; 2026-02-21T12:43:14.3417111Z // end inline asm 2026-02-21T12:43:14.3417175Z // begin inline asm 2026-02-21T12:43:14.3417457Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5740, %r5741, %r5742, %r5743}, [%r2294]; 2026-02-21T12:43:14.3417586Z // end inline asm 2026-02-21T12:43:14.3417656Z // begin inline asm 2026-02-21T12:43:14.3417849Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r5745, %r5746, %r5747, %r5748}, [%r2299]; 2026-02-21T12:43:14.3417910Z // end inline asm 2026-02-21T12:43:14.3417970Z // begin inline asm 2026-02-21T12:43:14.3418104Z st.global.v4.b32 [ %rd291 + 0 ], { %r5710, %r5711, %r5712, %r5713 }; 2026-02-21T12:43:14.3418165Z // end inline asm 2026-02-21T12:43:14.3418225Z // begin inline asm 2026-02-21T12:43:14.3418349Z st.global.v4.b32 [ %rd292 + 0 ], { %r5715, %r5716, %r5717, %r5718 }; 2026-02-21T12:43:14.3418407Z // end inline asm 2026-02-21T12:43:14.3418467Z // begin inline asm 2026-02-21T12:43:14.3418583Z st.global.v4.b32 [ %rd293 + 0 ], { %r5720, %r5721, %r5722, %r5723 }; 2026-02-21T12:43:14.3418655Z // end inline asm 2026-02-21T12:43:14.3418718Z // begin inline asm 2026-02-21T12:43:14.3418836Z st.global.v4.b32 [ %rd294 + 0 ], { %r5725, %r5726, %r5727, %r5728 }; 2026-02-21T12:43:14.3418904Z // end inline asm 2026-02-21T12:43:14.3418964Z // begin inline asm 2026-02-21T12:43:14.3419081Z st.global.v4.b32 [ %rd295 + 0 ], { %r5730, %r5731, %r5732, %r5733 }; 2026-02-21T12:43:14.3419145Z // end inline asm 2026-02-21T12:43:14.3419205Z // begin inline asm 2026-02-21T12:43:14.3419322Z st.global.v4.b32 [ %rd296 + 0 ], { %r5735, %r5736, %r5737, %r5738 }; 
2026-02-21T12:43:14.3419379Z // end inline asm 2026-02-21T12:43:14.3419444Z // begin inline asm 2026-02-21T12:43:14.3419559Z st.global.v4.b32 [ %rd297 + 0 ], { %r5740, %r5741, %r5742, %r5743 }; 2026-02-21T12:43:14.3419619Z // end inline asm 2026-02-21T12:43:14.3419683Z // begin inline asm 2026-02-21T12:43:14.3419800Z st.global.v4.b32 [ %rd298 + 0 ], { %r5745, %r5746, %r5747, %r5748 }; 2026-02-21T12:43:14.3419858Z // end inline asm 2026-02-21T12:43:14.3420080Z .loc 1 19 124 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:19:124 2026-02-21T12:43:14.3420155Z add.s64 %rd398, %rd398, 792; 2026-02-21T12:43:14.3420233Z setp.lt.s64 %p36, %rd398, %rd411; 2026-02-21T12:43:14.3420298Z @%p36 bra $L__BB0_2; 2026-02-21T12:43:14.3420394Z $L__BB0_4: // %.preheader 2026-02-21T12:43:14.3420468Z setp.gt.s64 %p37, %rd411, 5119; 2026-02-21T12:43:14.3420529Z @%p37 bra $L__BB0_9; 2026-02-21T12:43:14.3420619Z // %bb.5: // %.lr.ph133 2026-02-21T12:43:14.3420835Z .loc 1 0 124 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:0:124 2026-02-21T12:43:14.3420902Z and.b32 %r5818, %r7602, 16240; 2026-02-21T12:43:14.3420967Z and.b32 %r5820, %r7603, 136; 2026-02-21T12:43:14.3421035Z or.b32 %r5821, %r5820, %r5818; 2026-02-21T12:43:14.3421101Z add.s32 %r29, %r6277, %r5821; 2026-02-21T12:43:14.3421166Z xor.b32 %r5823, %r5821, 8; 2026-02-21T12:43:14.3421327Z add.s32 %r30, %r6277, %r5823; 2026-02-21T12:43:14.3421392Z and.b32 %r5824, %r7602, 15872; 2026-02-21T12:43:14.3421455Z and.b32 %r5827, %r7606, 6; 2026-02-21T12:43:14.3421594Z and.b32 %r5829, %r7607, 136; 2026-02-21T12:43:14.3421662Z or.b32 %r5830, %r5824, %r7605; 2026-02-21T12:43:14.3421725Z or.b32 %r5831, %r5830, %r5827; 2026-02-21T12:43:14.3421786Z or.b32 %r5832, %r5831, %r5829; 2026-02-21T12:43:14.3421858Z add.s32 %r31, %r6277, %r5832; 2026-02-21T12:43:14.3421919Z xor.b32 %r5833, %r5832, 8; 2026-02-21T12:43:14.3421982Z add.s32 %r32, %r6277, %r5833; 2026-02-21T12:43:14.3422053Z and.b32 %r5835, %r7608, 124; 
2026-02-21T12:43:14.3422118Z and.b32 %r5837, %r7599, 2; 2026-02-21T12:43:14.3422184Z selp.b32 %r5839, 1, 0, %p50; 2026-02-21T12:43:14.3422250Z add.s32 %r5840, %r6277, %r7609; 2026-02-21T12:43:14.3422320Z add.s32 %r5841, %r5840, %r5839; 2026-02-21T12:43:14.3422384Z add.s32 %r5842, %r5841, %r7610; 2026-02-21T12:43:14.3422447Z add.s32 %r5843, %r5842, %r5837; 2026-02-21T12:43:14.3422522Z add.s32 %r33, %r5843, %r5835; 2026-02-21T12:43:14.3422584Z and.b32 %r5844, %r7600, 384; 2026-02-21T12:43:14.3422706Z add.s32 %r5845, %r6277, %r5837; 2026-02-21T12:43:14.3422846Z add.s32 %r5846, %r5845, %r5844; 2026-02-21T12:43:14.3422918Z add.s32 %r5847, %r5846, %r5835; 2026-02-21T12:43:14.3422983Z add.s32 %r34, %r5847, %r7610; 2026-02-21T12:43:14.3423047Z shl.b32 %r5848, %r7601, 6; 2026-02-21T12:43:14.3423116Z xor.b32 %r5851, %r7611, %r7612; 2026-02-21T12:43:14.3423179Z or.b32 %r5852, %r5851, %r5848; 2026-02-21T12:43:14.3423242Z add.s32 %r35, %r6277, %r5852; 2026-02-21T12:43:14.3423307Z xor.b32 %r5853, %r5852, 32; 2026-02-21T12:43:14.3423376Z add.s32 %r36, %r6277, %r5853; 2026-02-21T12:43:14.3423440Z bfe.u32 %r5854, %r6277, 4, 14; 2026-02-21T12:43:14.3423505Z cvt.u64.u32 %rd317, %r5854; 2026-02-21T12:43:14.3423594Z or.b64 %rd19, %rd317, -9223371899382267904; 2026-02-21T12:43:14.3423662Z add.s32 %r5855, %r6277, 32; 2026-02-21T12:43:14.3423726Z bfe.u32 %r5856, %r5855, 4, 14; 2026-02-21T12:43:14.3423801Z cvt.u64.u32 %rd318, %r5856; 2026-02-21T12:43:14.3423879Z or.b64 %rd20, %rd318, -9223371899382267904; 2026-02-21T12:43:14.3423948Z shl.b32 %r5858, %r7613, 15; 2026-02-21T12:43:14.3424011Z and.b32 %r5860, %r7614, 31840; 2026-02-21T12:43:14.3424082Z shl.b32 %r5862, %r7615, 4; 2026-02-21T12:43:14.3424144Z and.b32 %r5863, %r7608, 16; 2026-02-21T12:43:14.3424207Z or.b32 %r5864, %r5858, %r5863; 2026-02-21T12:43:14.3424278Z or.b32 %r5865, %r5860, %r5862; 2026-02-21T12:43:14.3424344Z or.b32 %r5866, %r5864, %r5865; 2026-02-21T12:43:14.3424407Z add.s32 %r37, %r6277, %r5866; 
2026-02-21T12:43:14.3424470Z xor.b32 %r5867, %r5866, 32; 2026-02-21T12:43:14.3424541Z add.s32 %r38, %r6277, %r5867; 2026-02-21T12:43:14.3424604Z xor.b32 %r5868, %r5866, 64; 2026-02-21T12:43:14.3424669Z add.s32 %r39, %r6277, %r5868; 2026-02-21T12:43:14.3424741Z xor.b32 %r5869, %r5866, 96; 2026-02-21T12:43:14.3424812Z add.s32 %r40, %r6277, %r5869; 2026-02-21T12:43:14.3424879Z shl.b32 %r5870, %r7615, 12; 2026-02-21T12:43:14.3424941Z shl.b32 %r5871, %r7613, 5; 2026-02-21T12:43:14.3425012Z and.b32 %r5872, %r7608, 4080; 2026-02-21T12:43:14.3425078Z or.b32 %r5873, %r5870, %r5871; 2026-02-21T12:43:14.3425144Z xor.b32 %r5874, %r5873, %r5872; 2026-02-21T12:43:14.3425213Z add.s32 %r7499, %r6277, %r5874; 2026-02-21T12:43:14.3425274Z add.s32 %r7504, %r7499, 4096; 2026-02-21T12:43:14.3425334Z add.s32 %r7509, %r7499, 8192; 2026-02-21T12:43:14.3425398Z add.s32 %r7514, %r7499, 12288; 2026-02-21T12:43:14.3425465Z add.s32 %r7519, %r7499, 16384; 2026-02-21T12:43:14.3425527Z add.s32 %r7524, %r7499, 20480; 2026-02-21T12:43:14.3425588Z add.s32 %r7529, %r7499, 24576; 2026-02-21T12:43:14.3425655Z add.s32 %r7534, %r7499, 28672; 2026-02-21T12:43:14.3425869Z .loc 1 19 124 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:19:124 2026-02-21T12:43:14.3425939Z mul.wide.u32 %rd320, %r7616, 16; 2026-02-21T12:43:14.3426011Z or.b64 %rd321, %rd396, %rd320; 2026-02-21T12:43:14.3426143Z add.s64 %rd322, %rd321, %rd86; 2026-02-21T12:43:14.3426207Z add.s64 %rd21, %rd322, 96; 2026-02-21T12:43:14.3426273Z or.b64 %rd324, %rd397, %rd4; 2026-02-21T12:43:14.3426408Z add.s64 %rd325, %rd324, %rd87; 2026-02-21T12:43:14.3426594Z add.s64 %rd22, %rd325, 30720; 2026-02-21T12:43:14.3426718Z $L__BB0_6: // =>This Loop Header: Depth=1 2026-02-21T12:43:14.3426825Z // Child Loop BB0_7 Depth 2 2026-02-21T12:43:14.3427033Z .loc 1 25 35 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:25:35 2026-02-21T12:43:14.3427097Z shr.s64 %rd327, %rd411, 63; 2026-02-21T12:43:14.3427177Z shr.u64 %rd328, %rd327, 52; 
2026-02-21T12:43:14.3427243Z add.s64 %rd329, %rd411, %rd328; 2026-02-21T12:43:14.3427306Z shr.s64 %rd330, %rd329, 12; 2026-02-21T12:43:14.3427509Z .loc 1 26 33 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:26:33 2026-02-21T12:43:14.3427581Z shl.b64 %rd331, %rd330, 3; 2026-02-21T12:43:14.3427866Z .loc 1 27 39 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:27:39 2026-02-21T12:43:14.3427997Z sub.s64 %rd332, 10, %rd331; 2026-02-21T12:43:14.3428216Z .loc 1 27 52 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:27:52 2026-02-21T12:43:14.3428282Z min.u64 %rd333, %rd332, 8; 2026-02-21T12:43:14.3428485Z .loc 1 28 45 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:28:45 2026-02-21T12:43:14.3428616Z and.b64 %rd334, %rd329, 61440; 2026-02-21T12:43:14.3428688Z sub.s64 %rd335, %rd411, %rd334; 2026-02-21T12:43:14.3428887Z .loc 1 28 64 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:28:64 2026-02-21T12:43:14.3428951Z cvt.u16.u64 %rs229, %rd335; 2026-02-21T12:43:14.3429020Z cvt.u16.u64 %rs230, %rd333; 2026-02-21T12:43:14.3429217Z .loc 1 29 51 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:29:51 2026-02-21T12:43:14.3429284Z div.s16 %rs231, %rs229, %rs230; 2026-02-21T12:43:14.3429489Z .loc 1 28 64 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:28:64 2026-02-21T12:43:14.3429560Z mul.lo.s16 %rs232, %rs231, %rs230; 2026-02-21T12:43:14.3429627Z sub.s16 %rs233, %rs229, %rs232; 2026-02-21T12:43:14.3429697Z cvt.s64.s16 %rd336, %rs233; 2026-02-21T12:43:14.3429896Z .loc 1 28 30 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:28:30 2026-02-21T12:43:14.3429963Z add.s64 %rd337, %rd331, %rd336; 2026-02-21T12:43:14.3430167Z .loc 1 30 27 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:30:27 2026-02-21T12:43:14.3430234Z shl.b64 %rd75, %rd337, 7; 2026-02-21T12:43:14.3430434Z .loc 1 32 27 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:32:27 2026-02-21T12:43:14.3430510Z cvt.s32.s16 %r5877, 
%rs231; 2026-02-21T12:43:14.3430589Z mul.wide.s32 %rd76, %r5877, 512; 2026-02-21T12:43:14.3430798Z .loc 1 40 126 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:40:126 2026-02-21T12:43:14.3430887Z mad.wide.s32 %rd413, %r5877, 8388608, %rd21; 2026-02-21T12:43:14.3430959Z add.s64 %rd412, %rd22, %rd75; 2026-02-21T12:43:14.3431022Z mov.b32 %r7809, 0f00000000; 2026-02-21T12:43:14.3431084Z mov.b64 %rd414, -32; 2026-02-21T12:43:14.3431154Z mov.b32 %r7810, %r7809; 2026-02-21T12:43:14.3431215Z mov.b32 %r7811, %r7809; 2026-02-21T12:43:14.3431276Z mov.b32 %r7812, %r7809; 2026-02-21T12:43:14.3431334Z mov.b32 %r7813, %r7809; 2026-02-21T12:43:14.3431395Z mov.b32 %r7814, %r7809; 2026-02-21T12:43:14.3431455Z mov.b32 %r7815, %r7809; 2026-02-21T12:43:14.3431514Z mov.b32 %r7816, %r7809; 2026-02-21T12:43:14.3431578Z mov.b32 %r7817, %r7809; 2026-02-21T12:43:14.3431639Z mov.b32 %r7818, %r7809; 2026-02-21T12:43:14.3431698Z mov.b32 %r7819, %r7809; 2026-02-21T12:43:14.3431758Z mov.b32 %r7820, %r7809; 2026-02-21T12:43:14.3431913Z mov.b32 %r7821, %r7809; 2026-02-21T12:43:14.3431975Z mov.b32 %r7822, %r7809; 2026-02-21T12:43:14.3432036Z mov.b32 %r7823, %r7809; 2026-02-21T12:43:14.3432173Z mov.b32 %r7824, %r7809; 2026-02-21T12:43:14.3432232Z mov.b32 %r7825, %r7809; 2026-02-21T12:43:14.3432292Z mov.b32 %r7826, %r7809; 2026-02-21T12:43:14.3432351Z mov.b32 %r7827, %r7809; 2026-02-21T12:43:14.3432416Z mov.b32 %r7828, %r7809; 2026-02-21T12:43:14.3432477Z mov.b32 %r7829, %r7809; 2026-02-21T12:43:14.3432535Z mov.b32 %r7830, %r7809; 2026-02-21T12:43:14.3432602Z mov.b32 %r7831, %r7809; 2026-02-21T12:43:14.3432662Z mov.b32 %r7832, %r7809; 2026-02-21T12:43:14.3432722Z mov.b32 %r7833, %r7809; 2026-02-21T12:43:14.3432781Z mov.b32 %r7834, %r7809; 2026-02-21T12:43:14.3432846Z mov.b32 %r7835, %r7809; 2026-02-21T12:43:14.3432908Z mov.b32 %r7836, %r7809; 2026-02-21T12:43:14.3432970Z mov.b32 %r7837, %r7809; 2026-02-21T12:43:14.3433035Z mov.b32 %r7838, %r7809; 2026-02-21T12:43:14.3433101Z mov.b32 
%r7839, %r7809; 2026-02-21T12:43:14.3433160Z mov.b32 %r7840, %r7809; 2026-02-21T12:43:14.3433220Z mov.b32 %r7841, %r7809; 2026-02-21T12:43:14.3433352Z mov.b32 %r7842, %r7809; 2026-02-21T12:43:14.3433463Z mov.b32 %r7843, %r7809; 2026-02-21T12:43:14.3433526Z mov.b32 %r7844, %r7809; 2026-02-21T12:43:14.3433596Z mov.b32 %r7845, %r7809; 2026-02-21T12:43:14.3433655Z mov.b32 %r7846, %r7809; 2026-02-21T12:43:14.3433713Z mov.b32 %r7847, %r7809; 2026-02-21T12:43:14.3433785Z mov.b32 %r7848, %r7809; 2026-02-21T12:43:14.3433847Z mov.b32 %r7849, %r7809; 2026-02-21T12:43:14.3433907Z mov.b32 %r7850, %r7809; 2026-02-21T12:43:14.3433969Z mov.b32 %r7851, %r7809; 2026-02-21T12:43:14.3434034Z mov.b32 %r7852, %r7809; 2026-02-21T12:43:14.3434094Z mov.b32 %r7853, %r7809; 2026-02-21T12:43:14.3434153Z mov.b32 %r7854, %r7809; 2026-02-21T12:43:14.3434217Z mov.b32 %r7855, %r7809; 2026-02-21T12:43:14.3434276Z mov.b32 %r7856, %r7809; 2026-02-21T12:43:14.3434335Z mov.b32 %r7857, %r7809; 2026-02-21T12:43:14.3434395Z mov.b32 %r7858, %r7809; 2026-02-21T12:43:14.3434458Z mov.b32 %r7859, %r7809; 2026-02-21T12:43:14.3434518Z mov.b32 %r7860, %r7809; 2026-02-21T12:43:14.3434582Z mov.b32 %r7861, %r7809; 2026-02-21T12:43:14.3434647Z mov.b32 %r7862, %r7809; 2026-02-21T12:43:14.3434707Z mov.b32 %r7863, %r7809; 2026-02-21T12:43:14.3434766Z mov.b32 %r7864, %r7809; 2026-02-21T12:43:14.3434825Z mov.b32 %r7865, %r7809; 2026-02-21T12:43:14.3434889Z mov.b32 %r7866, %r7809; 2026-02-21T12:43:14.3434948Z mov.b32 %r7867, %r7809; 2026-02-21T12:43:14.3435008Z mov.b32 %r7868, %r7809; 2026-02-21T12:43:14.3435074Z mov.b32 %r7869, %r7809; 2026-02-21T12:43:14.3435134Z mov.b32 %r7870, %r7809; 2026-02-21T12:43:14.3435193Z mov.b32 %r7871, %r7809; 2026-02-21T12:43:14.3435252Z mov.b32 %r7872, %r7809; 2026-02-21T12:43:14.3435378Z $L__BB0_7: // Parent Loop BB0_6 Depth=1 2026-02-21T12:43:14.3435488Z // => This Inner Loop Header: Depth=2 2026-02-21T12:43:14.3435695Z .loc 1 48 80 // 
cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:48:80 2026-02-21T12:43:14.3435771Z add.s64 %rd339, %rd413, -96; 2026-02-21T12:43:14.3435833Z // begin inline asm 2026-02-21T12:43:14.3435895Z mov.u64 %rd338, 0x0; 2026-02-21T12:43:14.3436030Z createpolicy.fractional.L2::evict_last.b64 %rd338, 1.0; 2026-02-21T12:43:14.3436090Z // end inline asm 2026-02-21T12:43:14.3436150Z // begin inline asm 2026-02-21T12:43:14.3436209Z mov.u32 %r5878, 0x0; 2026-02-21T12:43:14.3436274Z mov.u32 %r5879, 0x0; 2026-02-21T12:43:14.3436332Z mov.u32 %r5880, 0x0; 2026-02-21T12:43:14.3436390Z mov.u32 %r5881, 0x0; 2026-02-21T12:43:14.3436766Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r5878, %r5879, %r5880, %r5881 }, [ %rd339 + 0 ], %rd338; 2026-02-21T12:43:14.3436829Z // end inline asm 2026-02-21T12:43:14.3437035Z .loc 1 52 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:52:32 2026-02-21T12:43:14.3437198Z bar.sync 0; 2026-02-21T12:43:14.3437283Z st.shared.v2.b32 [%r29], {%r5878, %r5879}; 2026-02-21T12:43:14.3437361Z st.shared.v2.b32 [%r30], {%r5880, %r5881}; 2026-02-21T12:43:14.3437513Z bar.sync 0; 2026-02-21T12:43:14.3437599Z ld.shared.b16 %rs238, [%r31]; 2026-02-21T12:43:14.3437671Z ld.shared.b16 %rs239, [%r31+256]; 2026-02-21T12:43:14.3437741Z ld.shared.b16 %rs240, [%r31+16]; 2026-02-21T12:43:14.3437813Z ld.shared.b16 %rs241, [%r31+272]; 2026-02-21T12:43:14.3437880Z ld.shared.b16 %rs242, [%r32]; 2026-02-21T12:43:14.3437947Z ld.shared.b16 %rs243, [%r32+256]; 2026-02-21T12:43:14.3438017Z ld.shared.b16 %rs244, [%r32+16]; 2026-02-21T12:43:14.3438088Z ld.shared.b16 %rs245, [%r32+272]; 2026-02-21T12:43:14.3438153Z cvt.f32.bf16 %r6010, %rs238; 2026-02-21T12:43:14.3438217Z cvt.f32.bf16 %r6011, %rs239; 2026-02-21T12:43:14.3438285Z cvt.f32.bf16 %r6012, %rs242; 2026-02-21T12:43:14.3438346Z cvt.f32.bf16 %r6013, %rs243; 2026-02-21T12:43:14.3438410Z cvt.f32.bf16 %r6142, %rs240; 2026-02-21T12:43:14.3438474Z cvt.f32.bf16 %r6143, %rs241; 2026-02-21T12:43:14.3438544Z cvt.f32.bf16 
%r6144, %rs244; 2026-02-21T12:43:14.3438682Z cvt.f32.bf16 %r6145, %rs245; 2026-02-21T12:43:14.3438964Z .loc 1 54 87 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:54:87 2026-02-21T12:43:14.3439051Z add.s64 %rd342, %rd412, -30720; 2026-02-21T12:43:14.3439115Z // begin inline asm 2026-02-21T12:43:14.3439176Z mov.u64 %rd341, 0x0; 2026-02-21T12:43:14.3439313Z createpolicy.fractional.L2::evict_first.b64 %rd341, 1.0; 2026-02-21T12:43:14.3439371Z // end inline asm 2026-02-21T12:43:14.3439431Z // begin inline asm 2026-02-21T12:43:14.3439490Z mov.u16 %rs234, 0x0; 2026-02-21T12:43:14.3439659Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs234 }, [ %rd342 + 0 ], %rd341; 2026-02-21T12:43:14.3439720Z // end inline asm 2026-02-21T12:43:14.3439927Z .loc 1 62 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:62:28 2026-02-21T12:43:14.3439993Z bar.sync 0; 2026-02-21T12:43:14.3440060Z st.shared.b8 [%r33], %rs234; 2026-02-21T12:43:14.3440116Z bar.sync 0; 2026-02-21T12:43:14.3440206Z ld.shared.v2.b8 {%rs246, %rs247}, [%r34]; 2026-02-21T12:43:14.3440411Z .loc 1 57 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:57:28 2026-02-21T12:43:14.3440480Z shl.b16 %rs248, %rs246, 4; 2026-02-21T12:43:14.3440544Z shl.b16 %rs249, %rs247, 4; 2026-02-21T12:43:14.3440748Z .loc 1 72 58 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:72:58 2026-02-21T12:43:14.3440826Z selp.b16 %rs250, %rs248, %rs246, %p51; 2026-02-21T12:43:14.3440891Z cvt.s16.s8 %rs251, %rs250; 2026-02-21T12:43:14.3440967Z shr.s16 %rs252, %rs251, 4; 2026-02-21T12:43:14.3441043Z selp.b16 %rs253, %rs249, %rs247, %p51; 2026-02-21T12:43:14.3441108Z cvt.s16.s8 %rs254, %rs253; 2026-02-21T12:43:14.3441177Z shr.s16 %rs255, %rs254, 4; 2026-02-21T12:43:14.3441379Z .loc 1 77 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:77:32 2026-02-21T12:43:14.3441450Z cvt.rn.f32.s16 %r7486, %rs252; 2026-02-21T12:43:14.3441518Z cvt.rn.f32.s16 %r7487, %rs255; 2026-02-21T12:43:14.3441582Z bar.sync 
0; 2026-02-21T12:43:14.3441647Z st.shared.b32 [%r35], %r7486; 2026-02-21T12:43:14.3441714Z st.shared.b32 [%r36], %r7487; 2026-02-21T12:43:14.3441772Z $L__tmp25: 2026-02-21T12:43:14.3442050Z .loc 2 291 36 // standard.py:291:36 @[ cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:84:40 ] 2026-02-21T12:43:14.3442112Z // begin inline asm 2026-02-21T12:43:14.3442190Z fence.proxy.async.shared::cta; 2026-02-21T12:43:14.3442254Z // end inline asm 2026-02-21T12:43:14.3442310Z bar.sync 0; 2026-02-21T12:43:14.3442391Z shfl.sync.idx.b32 %r7488, %r2, 0, 31, -1; 2026-02-21T12:43:14.3442470Z wgmma.fence.sync.aligned; 2026-02-21T12:43:14.3442535Z mov.pred %p39, -1; 2026-02-21T12:43:14.3442593Z // begin inline asm 2026-02-21T12:43:14.3443868Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7809,%r7810,%r7811,%r7812,%r7813,%r7814,%r7815,%r7816,%r7817,%r7818,%r7819,%r7820,%r7821,%r7822,%r7823,%r7824,%r7825,%r7826,%r7827,%r7828,%r7829,%r7830,%r7831,%r7832,%r7833,%r7834,%r7835,%r7836,%r7837,%r7838,%r7839,%r7840,%r7841,%r7842,%r7843,%r7844,%r7845,%r7846,%r7847,%r7848,%r7849,%r7850,%r7851,%r7852,%r7853,%r7854,%r7855,%r7856,%r7857,%r7858,%r7859,%r7860,%r7861,%r7862,%r7863,%r7864,%r7865,%r7866,%r7867,%r7868,%r7869,%r7870,%r7871,%r7872}, {%r6010,%r6011,%r6012,%r6013}, %rd19, %p39, 1, 1; 2026-02-21T12:43:14.3444043Z // end inline asm 2026-02-21T12:43:14.3444105Z // begin inline asm 2026-02-21T12:43:14.3445419Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7809,%r7810,%r7811,%r7812,%r7813,%r7814,%r7815,%r7816,%r7817,%r7818,%r7819,%r7820,%r7821,%r7822,%r7823,%r7824,%r7825,%r7826,%r7827,%r7828,%r7829,%r7830,%r7831,%r7832,%r7833,%r7834,%r7835,%r7836,%r7837,%r7838,%r7839,%r7840,%r7841,%r7842,%r7843,%r7844,%r7845,%r7846,%r7847,%r7848,%r7849,%r7850,%r7851,%r7852,%r7853,%r7854,%r7855,%r7856,%r7857,%r7858,%r7859,%r7860,%r7861,%r7862,%r7863,%r7864,%r7865,%r7866,%r7867,%r7868,%r7869,%r7870,%r7871,%r7872}, {%r6142,%r6143,%r6144,%r6145}, %rd20, %p39, 1, 1; 
2026-02-21T12:43:14.3445528Z // end inline asm 2026-02-21T12:43:14.3445610Z wgmma.commit_group.sync.aligned; 2026-02-21T12:43:14.3445675Z mov.b32 %r7417, 0; 2026-02-21T12:43:14.3445737Z mov.b32 %r6210, %r6277; 2026-02-21T12:43:14.3445798Z mov.b32 %r6211, %r7417; 2026-02-21T12:43:14.3445870Z mov.b32 %r6212, %r7417; 2026-02-21T12:43:14.3445937Z // begin inline asm 2026-02-21T12:43:14.3447143Z // wait for regs: %r7809,%r7810,%r7811,%r7812,%r7813,%r7814,%r7815,%r7816,%r7817,%r7818,%r7819,%r7820,%r7821,%r7822,%r7823,%r7824,%r7825,%r7826,%r7827,%r7828,%r7829,%r7830,%r7831,%r7832,%r7833,%r7834,%r7835,%r7836,%r7837,%r7838,%r7839,%r7840,%r7841,%r7842,%r7843,%r7844,%r7845,%r7846,%r7847,%r7848,%r7849,%r7850,%r7851,%r7852,%r7853,%r7854,%r7855,%r7856,%r7857,%r7858,%r7859,%r7860,%r7861,%r7862,%r7863,%r7864,%r7865,%r7866,%r7867,%r7868,%r7869,%r7870,%r7871,%r7872,%r6210,%r6211,%r6212 2026-02-21T12:43:14.3447232Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:43:14.3447294Z // end inline asm 2026-02-21T12:43:14.3447353Z $L__tmp26: 2026-02-21T12:43:14.3447567Z .loc 1 48 80 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:48:80 2026-02-21T12:43:14.3447634Z add.s64 %rd347, %rd413, -64; 2026-02-21T12:43:14.3447696Z // begin inline asm 2026-02-21T12:43:14.3447756Z mov.u64 %rd346, 0x0; 2026-02-21T12:43:14.3447888Z createpolicy.fractional.L2::evict_last.b64 %rd346, 1.0; 2026-02-21T12:43:14.3447947Z // end inline asm 2026-02-21T12:43:14.3448007Z // begin inline asm 2026-02-21T12:43:14.3448072Z mov.u32 %r6280, 0x0; 2026-02-21T12:43:14.3448131Z mov.u32 %r6281, 0x0; 2026-02-21T12:43:14.3448190Z mov.u32 %r6282, 0x0; 2026-02-21T12:43:14.3448248Z mov.u32 %r6283, 0x0; 2026-02-21T12:43:14.3448476Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r6280, %r6281, %r6282, %r6283 }, [ %rd347 + 0 ], %rd346; 2026-02-21T12:43:14.3448537Z // end inline asm 2026-02-21T12:43:14.3448750Z .loc 1 52 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:52:32 2026-02-21T12:43:14.3448826Z 
bar.sync 0; 2026-02-21T12:43:14.3448909Z st.shared.v2.b32 [%r29], {%r6280, %r6281}; 2026-02-21T12:43:14.3448987Z st.shared.v2.b32 [%r30], {%r6282, %r6283}; 2026-02-21T12:43:14.3449051Z bar.sync 0; 2026-02-21T12:43:14.3449118Z ld.shared.b16 %rs256, [%r31]; 2026-02-21T12:43:14.3449188Z ld.shared.b16 %rs257, [%r31+256]; 2026-02-21T12:43:14.3449259Z ld.shared.b16 %rs258, [%r31+16]; 2026-02-21T12:43:14.3449331Z ld.shared.b16 %rs259, [%r31+272]; 2026-02-21T12:43:14.3449395Z ld.shared.b16 %rs260, [%r32]; 2026-02-21T12:43:14.3449460Z ld.shared.b16 %rs261, [%r32+256]; 2026-02-21T12:43:14.3449532Z ld.shared.b16 %rs262, [%r32+16]; 2026-02-21T12:43:14.3449598Z ld.shared.b16 %rs263, [%r32+272]; 2026-02-21T12:43:14.3449665Z cvt.f32.bf16 %r6412, %rs256; 2026-02-21T12:43:14.3449734Z cvt.f32.bf16 %r6413, %rs257; 2026-02-21T12:43:14.3449893Z cvt.f32.bf16 %r6414, %rs260; 2026-02-21T12:43:14.3449957Z cvt.f32.bf16 %r6415, %rs261; 2026-02-21T12:43:14.3450104Z cvt.f32.bf16 %r6544, %rs258; 2026-02-21T12:43:14.3450172Z cvt.f32.bf16 %r6545, %rs259; 2026-02-21T12:43:14.3450238Z cvt.f32.bf16 %r6546, %rs262; 2026-02-21T12:43:14.3450301Z cvt.f32.bf16 %r6547, %rs263; 2026-02-21T12:43:14.3450508Z .loc 1 54 87 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:54:87 2026-02-21T12:43:14.3450578Z add.s64 %rd350, %rd412, -20480; 2026-02-21T12:43:14.3450639Z // begin inline asm 2026-02-21T12:43:14.3450699Z mov.u64 %rd349, 0x0; 2026-02-21T12:43:14.3450833Z createpolicy.fractional.L2::evict_first.b64 %rd349, 1.0; 2026-02-21T12:43:14.3450893Z // end inline asm 2026-02-21T12:43:14.3450954Z // begin inline asm 2026-02-21T12:43:14.3451019Z mov.u16 %rs235, 0x0; 2026-02-21T12:43:14.3451186Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs235 }, [ %rd350 + 0 ], %rd349; 2026-02-21T12:43:14.3451247Z // end inline asm 2026-02-21T12:43:14.3451537Z .loc 1 62 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:62:28 2026-02-21T12:43:14.3451683Z bar.sync 0; 2026-02-21T12:43:14.3451754Z st.shared.b8 
[%r33], %rs235; 2026-02-21T12:43:14.3451811Z bar.sync 0; 2026-02-21T12:43:14.3451897Z ld.shared.v2.b8 {%rs264, %rs265}, [%r34]; 2026-02-21T12:43:14.3452099Z .loc 1 57 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:57:28 2026-02-21T12:43:14.3452164Z shl.b16 %rs266, %rs264, 4; 2026-02-21T12:43:14.3452237Z shl.b16 %rs267, %rs265, 4; 2026-02-21T12:43:14.3452435Z .loc 1 72 58 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:72:58 2026-02-21T12:43:14.3452510Z selp.b16 %rs268, %rs266, %rs264, %p51; 2026-02-21T12:43:14.3452582Z cvt.s16.s8 %rs269, %rs268; 2026-02-21T12:43:14.3452645Z shr.s16 %rs270, %rs269, 4; 2026-02-21T12:43:14.3452718Z selp.b16 %rs271, %rs267, %rs265, %p51; 2026-02-21T12:43:14.3452787Z cvt.s16.s8 %rs272, %rs271; 2026-02-21T12:43:14.3452854Z shr.s16 %rs273, %rs272, 4; 2026-02-21T12:43:14.3453056Z .loc 1 77 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:77:32 2026-02-21T12:43:14.3453126Z cvt.rn.f32.s16 %r7489, %rs270; 2026-02-21T12:43:14.3453200Z cvt.rn.f32.s16 %r7490, %rs273; 2026-02-21T12:43:14.3453257Z bar.sync 0; 2026-02-21T12:43:14.3453323Z st.shared.b32 [%r35], %r7489; 2026-02-21T12:43:14.3453388Z st.shared.b32 [%r36], %r7490; 2026-02-21T12:43:14.3453450Z $L__tmp27: 2026-02-21T12:43:14.3453721Z .loc 2 291 36 // standard.py:291:36 @[ cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:84:40 ] 2026-02-21T12:43:14.3453783Z // begin inline asm 2026-02-21T12:43:14.3453865Z fence.proxy.async.shared::cta; 2026-02-21T12:43:14.3453923Z // end inline asm 2026-02-21T12:43:14.3453980Z bar.sync 0; 2026-02-21T12:43:14.3454060Z wgmma.fence.sync.aligned; 2026-02-21T12:43:14.3454124Z // begin inline asm 2026-02-21T12:43:14.3455389Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r7809,%r7810,%r7811,%r7812,%r7813,%r7814,%r7815,%r7816,%r7817,%r7818,%r7819,%r7820,%r7821,%r7822,%r7823,%r7824,%r7825,%r7826,%r7827,%r7828,%r7829,%r7830,%r7831,%r7832,%r7833,%r7834,%r7835,%r7836,%r7837,%r7838,%r7839,%r7840,%r7841,%r7842,%r7843,%r7844,%r7845,%r7846,%r7847,%r7848,%r7849,%r7850,%r7851,%r7852,%r7853,%r7854,%r7855,%r7856,%r7857,%r7858,%r7859,%r7860,%r7861,%r7862,%r7863,%r7864,%r7865,%r7866,%r7867,%r7868,%r7869,%r7870,%r7871,%r7872}, {%r6412,%r6413,%r6414,%r6415}, %rd19, %p39, 1, 1; 2026-02-21T12:43:14.3455456Z // end inline asm 2026-02-21T12:43:14.3455516Z // begin inline asm 2026-02-21T12:43:14.3456906Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7809,%r7810,%r7811,%r7812,%r7813,%r7814,%r7815,%r7816,%r7817,%r7818,%r7819,%r7820,%r7821,%r7822,%r7823,%r7824,%r7825,%r7826,%r7827,%r7828,%r7829,%r7830,%r7831,%r7832,%r7833,%r7834,%r7835,%r7836,%r7837,%r7838,%r7839,%r7840,%r7841,%r7842,%r7843,%r7844,%r7845,%r7846,%r7847,%r7848,%r7849,%r7850,%r7851,%r7852,%r7853,%r7854,%r7855,%r7856,%r7857,%r7858,%r7859,%r7860,%r7861,%r7862,%r7863,%r7864,%r7865,%r7866,%r7867,%r7868,%r7869,%r7870,%r7871,%r7872}, {%r6544,%r6545,%r6546,%r6547}, %rd20, %p39, 1, 1; 2026-02-21T12:43:14.3457147Z // end inline asm 2026-02-21T12:43:14.3457228Z wgmma.commit_group.sync.aligned; 2026-02-21T12:43:14.3457300Z mov.b32 %r6612, %r6277; 2026-02-21T12:43:14.3457368Z mov.b32 %r6613, %r7417; 2026-02-21T12:43:14.3457432Z mov.b32 %r6614, %r7417; 2026-02-21T12:43:14.3457493Z // begin inline asm 2026-02-21T12:43:14.3458563Z // wait for regs: 
%r7809,%r7810,%r7811,%r7812,%r7813,%r7814,%r7815,%r7816,%r7817,%r7818,%r7819,%r7820,%r7821,%r7822,%r7823,%r7824,%r7825,%r7826,%r7827,%r7828,%r7829,%r7830,%r7831,%r7832,%r7833,%r7834,%r7835,%r7836,%r7837,%r7838,%r7839,%r7840,%r7841,%r7842,%r7843,%r7844,%r7845,%r7846,%r7847,%r7848,%r7849,%r7850,%r7851,%r7852,%r7853,%r7854,%r7855,%r7856,%r7857,%r7858,%r7859,%r7860,%r7861,%r7862,%r7863,%r7864,%r7865,%r7866,%r7867,%r7868,%r7869,%r7870,%r7871,%r7872,%r6612,%r6613,%r6614 2026-02-21T12:43:14.3458643Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:43:14.3458776Z // end inline asm 2026-02-21T12:43:14.3458908Z $L__tmp28: 2026-02-21T12:43:14.3459126Z .loc 1 48 80 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:48:80 2026-02-21T12:43:14.3459193Z add.s64 %rd355, %rd413, -32; 2026-02-21T12:43:14.3459256Z // begin inline asm 2026-02-21T12:43:14.3459316Z mov.u64 %rd354, 0x0; 2026-02-21T12:43:14.3459442Z createpolicy.fractional.L2::evict_last.b64 %rd354, 1.0; 2026-02-21T12:43:14.3459501Z // end inline asm 2026-02-21T12:43:14.3459566Z // begin inline asm 2026-02-21T12:43:14.3459626Z mov.u32 %r6682, 0x0; 2026-02-21T12:43:14.3459685Z mov.u32 %r6683, 0x0; 2026-02-21T12:43:14.3459747Z mov.u32 %r6684, 0x0; 2026-02-21T12:43:14.3459805Z mov.u32 %r6685, 0x0; 2026-02-21T12:43:14.3460028Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r6682, %r6683, %r6684, %r6685 }, [ %rd355 + 0 ], %rd354; 2026-02-21T12:43:14.3460089Z // end inline asm 2026-02-21T12:43:14.3460298Z .loc 1 52 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:52:32 2026-02-21T12:43:14.3460356Z bar.sync 0; 2026-02-21T12:43:14.3460437Z st.shared.v2.b32 [%r29], {%r6682, %r6683}; 2026-02-21T12:43:14.3460518Z st.shared.v2.b32 [%r30], {%r6684, %r6685}; 2026-02-21T12:43:14.3460575Z bar.sync 0; 2026-02-21T12:43:14.3460644Z ld.shared.b16 %rs274, [%r31]; 2026-02-21T12:43:14.3460719Z ld.shared.b16 %rs275, [%r31+256]; 2026-02-21T12:43:14.3460788Z ld.shared.b16 %rs276, [%r31+16]; 2026-02-21T12:43:14.3460855Z 
ld.shared.b16 %rs277, [%r31+272]; 2026-02-21T12:43:14.3460920Z ld.shared.b16 %rs278, [%r32]; 2026-02-21T12:43:14.3460989Z ld.shared.b16 %rs279, [%r32+256]; 2026-02-21T12:43:14.3461059Z ld.shared.b16 %rs280, [%r32+16]; 2026-02-21T12:43:14.3461126Z ld.shared.b16 %rs281, [%r32+272]; 2026-02-21T12:43:14.3461193Z cvt.f32.bf16 %r6814, %rs274; 2026-02-21T12:43:14.3461259Z cvt.f32.bf16 %r6815, %rs275; 2026-02-21T12:43:14.3461322Z cvt.f32.bf16 %r6816, %rs278; 2026-02-21T12:43:14.3461385Z cvt.f32.bf16 %r6817, %rs279; 2026-02-21T12:43:14.3461456Z cvt.f32.bf16 %r6946, %rs276; 2026-02-21T12:43:14.3461519Z cvt.f32.bf16 %r6947, %rs277; 2026-02-21T12:43:14.3461582Z cvt.f32.bf16 %r6948, %rs280; 2026-02-21T12:43:14.3461650Z cvt.f32.bf16 %r6949, %rs281; 2026-02-21T12:43:14.3461854Z .loc 1 54 87 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:54:87 2026-02-21T12:43:14.3461922Z add.s64 %rd358, %rd412, -10240; 2026-02-21T12:43:14.3461989Z // begin inline asm 2026-02-21T12:43:14.3462049Z mov.u64 %rd357, 0x0; 2026-02-21T12:43:14.3462189Z createpolicy.fractional.L2::evict_first.b64 %rd357, 1.0; 2026-02-21T12:43:14.3462251Z // end inline asm 2026-02-21T12:43:14.3462320Z // begin inline asm 2026-02-21T12:43:14.3462382Z mov.u16 %rs236, 0x0; 2026-02-21T12:43:14.3462546Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs236 }, [ %rd358 + 0 ], %rd357; 2026-02-21T12:43:14.3462700Z // end inline asm 2026-02-21T12:43:14.3462907Z .loc 1 62 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:62:28 2026-02-21T12:43:14.3463019Z bar.sync 0; 2026-02-21T12:43:14.3463091Z st.shared.b8 [%r33], %rs236; 2026-02-21T12:43:14.3463147Z bar.sync 0; 2026-02-21T12:43:14.3463238Z ld.shared.v2.b8 {%rs282, %rs283}, [%r34]; 2026-02-21T12:43:14.3463441Z .loc 1 57 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:57:28 2026-02-21T12:43:14.3463515Z shl.b16 %rs284, %rs282, 4; 2026-02-21T12:43:14.3463580Z shl.b16 %rs285, %rs283, 4; 2026-02-21T12:43:14.3463779Z .loc 1 72 58 // 
cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:72:58 2026-02-21T12:43:14.3463858Z selp.b16 %rs286, %rs284, %rs282, %p51; 2026-02-21T12:43:14.3463924Z cvt.s16.s8 %rs287, %rs286; 2026-02-21T12:43:14.3463986Z shr.s16 %rs288, %rs287, 4; 2026-02-21T12:43:14.3464062Z selp.b16 %rs289, %rs285, %rs283, %p51; 2026-02-21T12:43:14.3464129Z cvt.s16.s8 %rs290, %rs289; 2026-02-21T12:43:14.3464191Z shr.s16 %rs291, %rs290, 4; 2026-02-21T12:43:14.3464504Z .loc 1 77 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:77:32 2026-02-21T12:43:14.3464583Z cvt.rn.f32.s16 %r7491, %rs288; 2026-02-21T12:43:14.3464662Z cvt.rn.f32.s16 %r7492, %rs291; 2026-02-21T12:43:14.3464722Z bar.sync 0; 2026-02-21T12:43:14.3464795Z st.shared.b32 [%r35], %r7491; 2026-02-21T12:43:14.3464861Z st.shared.b32 [%r36], %r7492; 2026-02-21T12:43:14.3464917Z $L__tmp29: 2026-02-21T12:43:14.3465190Z .loc 2 291 36 // standard.py:291:36 @[ cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:84:40 ] 2026-02-21T12:43:14.3465262Z // begin inline asm 2026-02-21T12:43:14.3465340Z fence.proxy.async.shared::cta; 2026-02-21T12:43:14.3465397Z // end inline asm 2026-02-21T12:43:14.3465461Z bar.sync 0; 2026-02-21T12:43:14.3465533Z wgmma.fence.sync.aligned; 2026-02-21T12:43:14.3465595Z // begin inline asm 2026-02-21T12:43:14.3466981Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7809,%r7810,%r7811,%r7812,%r7813,%r7814,%r7815,%r7816,%r7817,%r7818,%r7819,%r7820,%r7821,%r7822,%r7823,%r7824,%r7825,%r7826,%r7827,%r7828,%r7829,%r7830,%r7831,%r7832,%r7833,%r7834,%r7835,%r7836,%r7837,%r7838,%r7839,%r7840,%r7841,%r7842,%r7843,%r7844,%r7845,%r7846,%r7847,%r7848,%r7849,%r7850,%r7851,%r7852,%r7853,%r7854,%r7855,%r7856,%r7857,%r7858,%r7859,%r7860,%r7861,%r7862,%r7863,%r7864,%r7865,%r7866,%r7867,%r7868,%r7869,%r7870,%r7871,%r7872}, {%r6814,%r6815,%r6816,%r6817}, %rd19, %p39, 1, 1; 2026-02-21T12:43:14.3467050Z // end inline asm 2026-02-21T12:43:14.3467113Z // begin inline asm 2026-02-21T12:43:14.3468392Z 
wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7809,%r7810,%r7811,%r7812,%r7813,%r7814,%r7815,%r7816,%r7817,%r7818,%r7819,%r7820,%r7821,%r7822,%r7823,%r7824,%r7825,%r7826,%r7827,%r7828,%r7829,%r7830,%r7831,%r7832,%r7833,%r7834,%r7835,%r7836,%r7837,%r7838,%r7839,%r7840,%r7841,%r7842,%r7843,%r7844,%r7845,%r7846,%r7847,%r7848,%r7849,%r7850,%r7851,%r7852,%r7853,%r7854,%r7855,%r7856,%r7857,%r7858,%r7859,%r7860,%r7861,%r7862,%r7863,%r7864,%r7865,%r7866,%r7867,%r7868,%r7869,%r7870,%r7871,%r7872}, {%r6946,%r6947,%r6948,%r6949}, %rd20, %p39, 1, 1; 2026-02-21T12:43:14.3468454Z // end inline asm 2026-02-21T12:43:14.3468532Z wgmma.commit_group.sync.aligned; 2026-02-21T12:43:14.3468684Z mov.b32 %r7014, %r6277; 2026-02-21T12:43:14.3468754Z mov.b32 %r7015, %r7417; 2026-02-21T12:43:14.3468815Z mov.b32 %r7016, %r7417; 2026-02-21T12:43:14.3468882Z // begin inline asm 2026-02-21T12:43:14.3469951Z // wait for regs: %r7809,%r7810,%r7811,%r7812,%r7813,%r7814,%r7815,%r7816,%r7817,%r7818,%r7819,%r7820,%r7821,%r7822,%r7823,%r7824,%r7825,%r7826,%r7827,%r7828,%r7829,%r7830,%r7831,%r7832,%r7833,%r7834,%r7835,%r7836,%r7837,%r7838,%r7839,%r7840,%r7841,%r7842,%r7843,%r7844,%r7845,%r7846,%r7847,%r7848,%r7849,%r7850,%r7851,%r7852,%r7853,%r7854,%r7855,%r7856,%r7857,%r7858,%r7859,%r7860,%r7861,%r7862,%r7863,%r7864,%r7865,%r7866,%r7867,%r7868,%r7869,%r7870,%r7871,%r7872,%r7014,%r7015,%r7016 2026-02-21T12:43:14.3470122Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:43:14.3470253Z // end inline asm 2026-02-21T12:43:14.3470312Z $L__tmp30: 2026-02-21T12:43:14.3470518Z .loc 1 48 80 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:48:80 2026-02-21T12:43:14.3470583Z // begin inline asm 2026-02-21T12:43:14.3470644Z mov.u64 %rd362, 0x0; 2026-02-21T12:43:14.3470768Z createpolicy.fractional.L2::evict_last.b64 %rd362, 1.0; 2026-02-21T12:43:14.3470826Z // end inline asm 2026-02-21T12:43:14.3470889Z // begin inline asm 2026-02-21T12:43:14.3470959Z mov.u32 %r7084, 0x0; 
2026-02-21T12:43:14.3471020Z mov.u32 %r7085, 0x0; 2026-02-21T12:43:14.3471085Z mov.u32 %r7086, 0x0; 2026-02-21T12:43:14.3471145Z mov.u32 %r7087, 0x0; 2026-02-21T12:43:14.3471370Z ld.global.L1::evict_last.L2::cache_hint.v4.b32 { %r7084, %r7085, %r7086, %r7087 }, [ %rd413 + 0 ], %rd362; 2026-02-21T12:43:14.3471435Z // end inline asm 2026-02-21T12:43:14.3471720Z .loc 1 52 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:52:32 2026-02-21T12:43:14.3471843Z bar.sync 0; 2026-02-21T12:43:14.3471938Z st.shared.v2.b32 [%r29], {%r7084, %r7085}; 2026-02-21T12:43:14.3472022Z st.shared.v2.b32 [%r30], {%r7086, %r7087}; 2026-02-21T12:43:14.3472079Z bar.sync 0; 2026-02-21T12:43:14.3472148Z ld.shared.b16 %rs292, [%r31]; 2026-02-21T12:43:14.3472219Z ld.shared.b16 %rs293, [%r31+256]; 2026-02-21T12:43:14.3472288Z ld.shared.b16 %rs294, [%r31+16]; 2026-02-21T12:43:14.3472354Z ld.shared.b16 %rs295, [%r31+272]; 2026-02-21T12:43:14.3472420Z ld.shared.b16 %rs296, [%r32]; 2026-02-21T12:43:14.3472488Z ld.shared.b16 %rs297, [%r32+256]; 2026-02-21T12:43:14.3472555Z ld.shared.b16 %rs298, [%r32+16]; 2026-02-21T12:43:14.3472619Z ld.shared.b16 %rs299, [%r32+272]; 2026-02-21T12:43:14.3472688Z cvt.f32.bf16 %r7216, %rs292; 2026-02-21T12:43:14.3472753Z cvt.f32.bf16 %r7217, %rs293; 2026-02-21T12:43:14.3472818Z cvt.f32.bf16 %r7218, %rs296; 2026-02-21T12:43:14.3472885Z cvt.f32.bf16 %r7219, %rs297; 2026-02-21T12:43:14.3472953Z cvt.f32.bf16 %r7348, %rs294; 2026-02-21T12:43:14.3473016Z cvt.f32.bf16 %r7349, %rs295; 2026-02-21T12:43:14.3473080Z cvt.f32.bf16 %r7350, %rs298; 2026-02-21T12:43:14.3473146Z cvt.f32.bf16 %r7351, %rs299; 2026-02-21T12:43:14.3473347Z .loc 1 54 87 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:54:87 2026-02-21T12:43:14.3473409Z // begin inline asm 2026-02-21T12:43:14.3473473Z mov.u64 %rd365, 0x0; 2026-02-21T12:43:14.3473600Z createpolicy.fractional.L2::evict_first.b64 %rd365, 1.0; 2026-02-21T12:43:14.3473660Z // end inline asm 2026-02-21T12:43:14.3473721Z // 
begin inline asm 2026-02-21T12:43:14.3473786Z mov.u16 %rs237, 0x0; 2026-02-21T12:43:14.3473944Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs237 }, [ %rd412 + 0 ], %rd365; 2026-02-21T12:43:14.3474003Z // end inline asm 2026-02-21T12:43:14.3474214Z .loc 1 62 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:62:28 2026-02-21T12:43:14.3474270Z bar.sync 0; 2026-02-21T12:43:14.3474340Z st.shared.b8 [%r33], %rs237; 2026-02-21T12:43:14.3474402Z bar.sync 0; 2026-02-21T12:43:14.3474479Z ld.shared.v2.b8 {%rs300, %rs301}, [%r34]; 2026-02-21T12:43:14.3474677Z .loc 1 57 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:57:28 2026-02-21T12:43:14.3474742Z shl.b16 %rs302, %rs300, 4; 2026-02-21T12:43:14.3474811Z shl.b16 %rs303, %rs301, 4; 2026-02-21T12:43:14.3475008Z .loc 1 72 58 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:72:58 2026-02-21T12:43:14.3475081Z selp.b16 %rs304, %rs302, %rs300, %p51; 2026-02-21T12:43:14.3475151Z cvt.s16.s8 %rs305, %rs304; 2026-02-21T12:43:14.3475214Z shr.s16 %rs306, %rs305, 4; 2026-02-21T12:43:14.3475297Z selp.b16 %rs307, %rs303, %rs301, %p51; 2026-02-21T12:43:14.3475368Z cvt.s16.s8 %rs308, %rs307; 2026-02-21T12:43:14.3475501Z shr.s16 %rs309, %rs308, 4; 2026-02-21T12:43:14.3475706Z .loc 1 77 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:77:32 2026-02-21T12:43:14.3475831Z cvt.rn.f32.s16 %r7493, %rs306; 2026-02-21T12:43:14.3475908Z cvt.rn.f32.s16 %r7494, %rs309; 2026-02-21T12:43:14.3475965Z bar.sync 0; 2026-02-21T12:43:14.3476031Z st.shared.b32 [%r35], %r7493; 2026-02-21T12:43:14.3476102Z st.shared.b32 [%r36], %r7494; 2026-02-21T12:43:14.3476158Z $L__tmp31: 2026-02-21T12:43:14.3476429Z .loc 2 291 36 // standard.py:291:36 @[ cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:84:40 ] 2026-02-21T12:43:14.3476614Z // begin inline asm 2026-02-21T12:43:14.3476702Z fence.proxy.async.shared::cta; 2026-02-21T12:43:14.3476762Z // end inline asm 2026-02-21T12:43:14.3476818Z bar.sync 0; 
2026-02-21T12:43:14.3476896Z wgmma.fence.sync.aligned; 2026-02-21T12:43:14.3476956Z // begin inline asm 2026-02-21T12:43:14.3478385Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7809,%r7810,%r7811,%r7812,%r7813,%r7814,%r7815,%r7816,%r7817,%r7818,%r7819,%r7820,%r7821,%r7822,%r7823,%r7824,%r7825,%r7826,%r7827,%r7828,%r7829,%r7830,%r7831,%r7832,%r7833,%r7834,%r7835,%r7836,%r7837,%r7838,%r7839,%r7840,%r7841,%r7842,%r7843,%r7844,%r7845,%r7846,%r7847,%r7848,%r7849,%r7850,%r7851,%r7852,%r7853,%r7854,%r7855,%r7856,%r7857,%r7858,%r7859,%r7860,%r7861,%r7862,%r7863,%r7864,%r7865,%r7866,%r7867,%r7868,%r7869,%r7870,%r7871,%r7872}, {%r7216,%r7217,%r7218,%r7219}, %rd19, %p39, 1, 1; 2026-02-21T12:43:14.3478458Z // end inline asm 2026-02-21T12:43:14.3478518Z // begin inline asm 2026-02-21T12:43:14.3479795Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r7809,%r7810,%r7811,%r7812,%r7813,%r7814,%r7815,%r7816,%r7817,%r7818,%r7819,%r7820,%r7821,%r7822,%r7823,%r7824,%r7825,%r7826,%r7827,%r7828,%r7829,%r7830,%r7831,%r7832,%r7833,%r7834,%r7835,%r7836,%r7837,%r7838,%r7839,%r7840,%r7841,%r7842,%r7843,%r7844,%r7845,%r7846,%r7847,%r7848,%r7849,%r7850,%r7851,%r7852,%r7853,%r7854,%r7855,%r7856,%r7857,%r7858,%r7859,%r7860,%r7861,%r7862,%r7863,%r7864,%r7865,%r7866,%r7867,%r7868,%r7869,%r7870,%r7871,%r7872}, {%r7348,%r7349,%r7350,%r7351}, %rd20, %p39, 1, 1; 2026-02-21T12:43:14.3479865Z // end inline asm 2026-02-21T12:43:14.3482978Z wgmma.commit_group.sync.aligned; 2026-02-21T12:43:14.3483072Z mov.b32 %r7416, %r6277; 2026-02-21T12:43:14.3483139Z mov.b32 %r7418, %r7417; 2026-02-21T12:43:14.3483209Z // begin inline asm 2026-02-21T12:43:14.3484290Z // wait for regs: 
%r7809,%r7810,%r7811,%r7812,%r7813,%r7814,%r7815,%r7816,%r7817,%r7818,%r7819,%r7820,%r7821,%r7822,%r7823,%r7824,%r7825,%r7826,%r7827,%r7828,%r7829,%r7830,%r7831,%r7832,%r7833,%r7834,%r7835,%r7836,%r7837,%r7838,%r7839,%r7840,%r7841,%r7842,%r7843,%r7844,%r7845,%r7846,%r7847,%r7848,%r7849,%r7850,%r7851,%r7852,%r7853,%r7854,%r7855,%r7856,%r7857,%r7858,%r7859,%r7860,%r7861,%r7862,%r7863,%r7864,%r7865,%r7866,%r7867,%r7868,%r7869,%r7870,%r7871,%r7872,%r7416,%r7417,%r7418 2026-02-21T12:43:14.3484382Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:43:14.3484449Z // end inline asm 2026-02-21T12:43:14.3484512Z $L__tmp32: 2026-02-21T12:43:14.3484746Z .loc 1 40 126 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:40:126 2026-02-21T12:43:14.3484816Z add.s64 %rd414, %rd414, 32; 2026-02-21T12:43:14.3484888Z add.s64 %rd413, %rd413, 128; 2026-02-21T12:43:14.3484955Z add.s64 %rd412, %rd412, 40960; 2026-02-21T12:43:14.3485024Z setp.lt.u64 %p48, %rd414, 4064; 2026-02-21T12:43:14.3485094Z @%p48 bra $L__BB0_7; 2026-02-21T12:43:14.3485209Z // %bb.8: // in Loop: Header=BB0_6 Depth=1 2026-02-21T12:43:14.3485423Z .loc 1 31 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:31:32 2026-02-21T12:43:14.3485497Z or.b64 %rd378, %rd75, %rd3; 2026-02-21T12:43:14.3485696Z .loc 1 33 32 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:33:32 2026-02-21T12:43:14.3485760Z or.b64 %rd379, %rd76, %rd6; 2026-02-21T12:43:14.3485956Z or.b64 %rd380, %rd76, %rd7; 2026-02-21T12:43:14.3486026Z or.b64 %rd381, %rd76, %rd8; 2026-02-21T12:43:14.3486177Z or.b64 %rd382, %rd76, %rd9; 2026-02-21T12:43:14.3486246Z or.b64 %rd383, %rd76, %rd10; 2026-02-21T12:43:14.3486316Z or.b64 %rd384, %rd76, %rd11; 2026-02-21T12:43:14.3486379Z or.b64 %rd385, %rd76, %rd12; 2026-02-21T12:43:14.3486440Z or.b64 %rd386, %rd76, %rd13; 2026-02-21T12:43:14.3486827Z .loc 1 87 28 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:87:28 2026-02-21T12:43:14.3486915Z cvt.rn.bf16x2.f32 %r7567, %r7810, %r7809; 
2026-02-21T12:43:14.3486995Z cvt.rn.bf16x2.f32 %r7568, %r7812, %r7811; 2026-02-21T12:43:14.3487067Z cvt.rn.bf16x2.f32 %r7569, %r7814, %r7813; 2026-02-21T12:43:14.3487143Z cvt.rn.bf16x2.f32 %r7570, %r7816, %r7815; 2026-02-21T12:43:14.3487214Z cvt.rn.bf16x2.f32 %r7571, %r7818, %r7817; 2026-02-21T12:43:14.3487297Z cvt.rn.bf16x2.f32 %r7572, %r7820, %r7819; 2026-02-21T12:43:14.3487384Z cvt.rn.bf16x2.f32 %r7573, %r7822, %r7821; 2026-02-21T12:43:14.3487459Z cvt.rn.bf16x2.f32 %r7574, %r7824, %r7823; 2026-02-21T12:43:14.3487606Z cvt.rn.bf16x2.f32 %r7575, %r7826, %r7825; 2026-02-21T12:43:14.3487777Z cvt.rn.bf16x2.f32 %r7576, %r7828, %r7827; 2026-02-21T12:43:14.3487865Z cvt.rn.bf16x2.f32 %r7577, %r7830, %r7829; 2026-02-21T12:43:14.3487941Z cvt.rn.bf16x2.f32 %r7578, %r7832, %r7831; 2026-02-21T12:43:14.3488021Z cvt.rn.bf16x2.f32 %r7579, %r7834, %r7833; 2026-02-21T12:43:14.3488100Z cvt.rn.bf16x2.f32 %r7580, %r7836, %r7835; 2026-02-21T12:43:14.3488173Z cvt.rn.bf16x2.f32 %r7581, %r7838, %r7837; 2026-02-21T12:43:14.3488243Z cvt.rn.bf16x2.f32 %r7582, %r7840, %r7839; 2026-02-21T12:43:14.3488323Z cvt.rn.bf16x2.f32 %r7583, %r7842, %r7841; 2026-02-21T12:43:14.3488396Z cvt.rn.bf16x2.f32 %r7584, %r7844, %r7843; 2026-02-21T12:43:14.3488466Z cvt.rn.bf16x2.f32 %r7585, %r7846, %r7845; 2026-02-21T12:43:14.3488538Z cvt.rn.bf16x2.f32 %r7586, %r7848, %r7847; 2026-02-21T12:43:14.3488621Z cvt.rn.bf16x2.f32 %r7587, %r7850, %r7849; 2026-02-21T12:43:14.3488693Z cvt.rn.bf16x2.f32 %r7588, %r7852, %r7851; 2026-02-21T12:43:14.3488772Z cvt.rn.bf16x2.f32 %r7589, %r7854, %r7853; 2026-02-21T12:43:14.3488852Z cvt.rn.bf16x2.f32 %r7590, %r7856, %r7855; 2026-02-21T12:43:14.3488923Z cvt.rn.bf16x2.f32 %r7591, %r7858, %r7857; 2026-02-21T12:43:14.3488998Z cvt.rn.bf16x2.f32 %r7592, %r7860, %r7859; 2026-02-21T12:43:14.3489077Z cvt.rn.bf16x2.f32 %r7593, %r7862, %r7861; 2026-02-21T12:43:14.3489149Z cvt.rn.bf16x2.f32 %r7594, %r7864, %r7863; 2026-02-21T12:43:14.3489219Z cvt.rn.bf16x2.f32 %r7595, %r7866, %r7865; 
2026-02-21T12:43:14.3489290Z cvt.rn.bf16x2.f32 %r7596, %r7868, %r7867; 2026-02-21T12:43:14.3489367Z cvt.rn.bf16x2.f32 %r7597, %r7870, %r7869; 2026-02-21T12:43:14.3489438Z cvt.rn.bf16x2.f32 %r7598, %r7872, %r7871; 2026-02-21T12:43:14.3489661Z .loc 1 88 22 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:88:22 2026-02-21T12:43:14.3489745Z mad.lo.s64 %rd387, %rd379, 2560, %rd88; 2026-02-21T12:43:14.3489814Z shl.b64 %rd388, %rd378, 1; 2026-02-21T12:43:14.3489883Z add.s64 %rd370, %rd387, %rd388; 2026-02-21T12:43:14.3489967Z mad.lo.s64 %rd389, %rd380, 2560, %rd88; 2026-02-21T12:43:14.3490033Z add.s64 %rd371, %rd389, %rd388; 2026-02-21T12:43:14.3490104Z mad.lo.s64 %rd390, %rd381, 2560, %rd88; 2026-02-21T12:43:14.3490169Z add.s64 %rd372, %rd390, %rd388; 2026-02-21T12:43:14.3490245Z mad.lo.s64 %rd391, %rd382, 2560, %rd88; 2026-02-21T12:43:14.3490309Z add.s64 %rd373, %rd391, %rd388; 2026-02-21T12:43:14.3490378Z mad.lo.s64 %rd392, %rd383, 2560, %rd88; 2026-02-21T12:43:14.3490448Z add.s64 %rd374, %rd392, %rd388; 2026-02-21T12:43:14.3490518Z mad.lo.s64 %rd393, %rd384, 2560, %rd88; 2026-02-21T12:43:14.3490583Z add.s64 %rd375, %rd393, %rd388; 2026-02-21T12:43:14.3490654Z mad.lo.s64 %rd394, %rd385, 2560, %rd88; 2026-02-21T12:43:14.3490723Z add.s64 %rd376, %rd394, %rd388; 2026-02-21T12:43:14.3490791Z mad.lo.s64 %rd395, %rd386, 2560, %rd88; 2026-02-21T12:43:14.3490949Z add.s64 %rd377, %rd395, %rd388; 2026-02-21T12:43:14.3491177Z .loc 1 88 81 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:88:81 2026-02-21T12:43:14.3491307Z bar.sync 0; 2026-02-21T12:43:14.3491425Z st.shared.v4.b32 [%r37], {%r7567, %r7569, %r7571, %r7573}; 2026-02-21T12:43:14.3491547Z st.shared.v4.b32 [%r37+512], {%r7568, %r7570, %r7572, %r7574}; 2026-02-21T12:43:14.3491653Z st.shared.v4.b32 [%r38], {%r7575, %r7577, %r7579, %r7581}; 2026-02-21T12:43:14.3491764Z st.shared.v4.b32 [%r38+512], {%r7576, %r7578, %r7580, %r7582}; 2026-02-21T12:43:14.3491875Z st.shared.v4.b32 [%r39], {%r7583, %r7585, 
%r7587, %r7589}; 2026-02-21T12:43:14.3491984Z st.shared.v4.b32 [%r39+512], {%r7584, %r7586, %r7588, %r7590}; 2026-02-21T12:43:14.3492089Z st.shared.v4.b32 [%r40], {%r7591, %r7593, %r7595, %r7597}; 2026-02-21T12:43:14.3492199Z st.shared.v4.b32 [%r40+512], {%r7592, %r7594, %r7596, %r7598}; 2026-02-21T12:43:14.3492266Z bar.sync 0; 2026-02-21T12:43:14.3492330Z // begin inline asm 2026-02-21T12:43:14.3492525Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7495, %r7496, %r7497, %r7498}, [%r7499]; 2026-02-21T12:43:14.3492699Z // end inline asm 2026-02-21T12:43:14.3492769Z // begin inline asm 2026-02-21T12:43:14.3492956Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7500, %r7501, %r7502, %r7503}, [%r7504]; 2026-02-21T12:43:14.3493015Z // end inline asm 2026-02-21T12:43:14.3493079Z // begin inline asm 2026-02-21T12:43:14.3493259Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7505, %r7506, %r7507, %r7508}, [%r7509]; 2026-02-21T12:43:14.3493316Z // end inline asm 2026-02-21T12:43:14.3493382Z // begin inline asm 2026-02-21T12:43:14.3493561Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7510, %r7511, %r7512, %r7513}, [%r7514]; 2026-02-21T12:43:14.3493620Z // end inline asm 2026-02-21T12:43:14.3493685Z // begin inline asm 2026-02-21T12:43:14.3493864Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7515, %r7516, %r7517, %r7518}, [%r7519]; 2026-02-21T12:43:14.3493925Z // end inline asm 2026-02-21T12:43:14.3493985Z // begin inline asm 2026-02-21T12:43:14.3494172Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7520, %r7521, %r7522, %r7523}, [%r7524]; 2026-02-21T12:43:14.3494232Z // end inline asm 2026-02-21T12:43:14.3494293Z // begin inline asm 2026-02-21T12:43:14.3494480Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7525, %r7526, %r7527, %r7528}, [%r7529]; 2026-02-21T12:43:14.3494538Z // end inline asm 2026-02-21T12:43:14.3494599Z // begin inline asm 2026-02-21T12:43:14.3494782Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r7530, %r7531, %r7532, %r7533}, [%r7534]; 2026-02-21T12:43:14.3494843Z // end 
inline asm 2026-02-21T12:43:14.3494903Z // begin inline asm 2026-02-21T12:43:14.3495030Z st.global.v4.b32 [ %rd370 + 0 ], { %r7495, %r7496, %r7497, %r7498 }; 2026-02-21T12:43:14.3495096Z // end inline asm 2026-02-21T12:43:14.3495155Z // begin inline asm 2026-02-21T12:43:14.3495276Z st.global.v4.b32 [ %rd371 + 0 ], { %r7500, %r7501, %r7502, %r7503 }; 2026-02-21T12:43:14.3495343Z // end inline asm 2026-02-21T12:43:14.3495406Z // begin inline asm 2026-02-21T12:43:14.3495526Z st.global.v4.b32 [ %rd372 + 0 ], { %r7505, %r7506, %r7507, %r7508 }; 2026-02-21T12:43:14.3495588Z // end inline asm 2026-02-21T12:43:14.3495654Z // begin inline asm 2026-02-21T12:43:14.3495769Z st.global.v4.b32 [ %rd373 + 0 ], { %r7510, %r7511, %r7512, %r7513 }; 2026-02-21T12:43:14.3495828Z // end inline asm 2026-02-21T12:43:14.3495893Z // begin inline asm 2026-02-21T12:43:14.3496007Z st.global.v4.b32 [ %rd374 + 0 ], { %r7515, %r7516, %r7517, %r7518 }; 2026-02-21T12:43:14.3496066Z // end inline asm 2026-02-21T12:43:14.3496128Z // begin inline asm 2026-02-21T12:43:14.3496269Z st.global.v4.b32 [ %rd375 + 0 ], { %r7520, %r7521, %r7522, %r7523 }; 2026-02-21T12:43:14.3496330Z // end inline asm 2026-02-21T12:43:14.3496390Z // begin inline asm 2026-02-21T12:43:14.3496641Z st.global.v4.b32 [ %rd376 + 0 ], { %r7525, %r7526, %r7527, %r7528 }; 2026-02-21T12:43:14.3496703Z // end inline asm 2026-02-21T12:43:14.3496852Z // begin inline asm 2026-02-21T12:43:14.3496976Z st.global.v4.b32 [ %rd377 + 0 ], { %r7530, %r7531, %r7532, %r7533 }; 2026-02-21T12:43:14.3497109Z // end inline asm 2026-02-21T12:43:14.3497331Z .loc 1 19 124 // cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:19:124 2026-02-21T12:43:14.3497400Z add.s64 %rd85, %rd411, 264; 2026-02-21T12:43:14.3497477Z setp.lt.s64 %p49, %rd411, 4856; 2026-02-21T12:43:14.3497553Z mov.b64 %rd411, %rd85; 2026-02-21T12:43:14.3497619Z @%p49 bra $L__BB0_6; 2026-02-21T12:43:14.3497721Z $L__BB0_9: // %._crit_edge 2026-02-21T12:43:14.3497926Z .loc 1 19 4 // 
cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py:19:4 2026-02-21T12:43:14.3497981Z ret; 2026-02-21T12:43:14.3498045Z $L__tmp33: 2026-02-21T12:43:14.3498104Z $L__func_end0: 2026-02-21T12:43:14.3498193Z // -- End function 2026-02-21T12:43:14.3498251Z } 2026-02-21T12:43:14.3498508Z .file 1 "/tmp/torchinductor_root/fz/cfz6dqyl5r35f2v3dxbymqjqzxqlrdk6czpojh43tcgllkbojwiw.py" 2026-02-21T12:43:14.3498863Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T12:43:14.3498943Z .section .debug_abbrev 2026-02-21T12:43:14.3499007Z { 2026-02-21T12:43:14.3499105Z .b8 1 // Abbreviation Code 2026-02-21T12:43:14.3499204Z .b8 17 // DW_TAG_compile_unit 2026-02-21T12:43:14.3499300Z .b8 1 // DW_CHILDREN_yes 2026-02-21T12:43:14.3499387Z .b8 37 // DW_AT_producer 2026-02-21T12:43:14.3499467Z .b8 8 // DW_FORM_string 2026-02-21T12:43:14.3499547Z .b8 19 // DW_AT_language 2026-02-21T12:43:14.3499637Z .b8 5 // DW_FORM_data2 2026-02-21T12:43:14.3499719Z .b8 3 // DW_AT_name 2026-02-21T12:43:14.3499801Z .b8 8 // DW_FORM_string 2026-02-21T12:43:14.3499895Z .b8 16 // DW_AT_stmt_list 2026-02-21T12:43:14.3499978Z .b8 6 // DW_FORM_data4 2026-02-21T12:43:14.3500062Z .b8 27 // DW_AT_comp_dir 2026-02-21T12:43:14.3500148Z .b8 8 // DW_FORM_string 2026-02-21T12:43:14.3500225Z .b8 0 // EOM(1) 2026-02-21T12:43:14.3500301Z .b8 0 // EOM(2) 2026-02-21T12:43:14.3500396Z .b8 2 // Abbreviation Code 2026-02-21T12:43:14.3500492Z .b8 46 // DW_TAG_subprogram 2026-02-21T12:43:14.3500574Z .b8 0 // DW_CHILDREN_no 2026-02-21T12:43:14.3500652Z .b8 3 // DW_AT_name 2026-02-21T12:43:14.3500737Z .b8 8 // DW_FORM_string 2026-02-21T12:43:14.3500821Z .b8 32 // DW_AT_inline 2026-02-21T12:43:14.3500905Z .b8 11 // DW_FORM_data1 2026-02-21T12:43:14.3500986Z .b8 0 // EOM(1) 2026-02-21T12:43:14.3501058Z .b8 0 // EOM(2) 2026-02-21T12:43:14.3501143Z .b8 3 // Abbreviation Code 2026-02-21T12:43:14.3501230Z .b8 46 // DW_TAG_subprogram 2026-02-21T12:43:14.3501321Z .b8 1 // 
DW_CHILDREN_yes 2026-02-21T12:43:14.3501409Z .b8 17 // DW_AT_low_pc 2026-02-21T12:43:14.3501491Z .b8 1 // DW_FORM_addr 2026-02-21T12:43:14.3501585Z .b8 18 // DW_AT_high_pc 2026-02-21T12:43:14.3501668Z .b8 1 // DW_FORM_addr 2026-02-21T12:43:14.3501765Z .b8 49 // DW_AT_abstract_origin 2026-02-21T12:43:14.3501922Z .b8 19 // DW_FORM_ref4 2026-02-21T12:43:14.3502000Z .b8 0 // EOM(1) 2026-02-21T12:43:14.3502130Z .b8 0 // EOM(2) 2026-02-21T12:43:14.3502228Z .b8 4 // Abbreviation Code 2026-02-21T12:43:14.3502334Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T12:43:14.3502416Z .b8 0 // DW_CHILDREN_no 2026-02-21T12:43:14.3502510Z .b8 49 // DW_AT_abstract_origin 2026-02-21T12:43:14.3502595Z .b8 19 // DW_FORM_ref4 2026-02-21T12:43:14.3502674Z .b8 17 // DW_AT_low_pc 2026-02-21T12:43:14.3502750Z .b8 1 // DW_FORM_addr 2026-02-21T12:43:14.3502838Z .b8 18 // DW_AT_high_pc 2026-02-21T12:43:14.3502915Z .b8 1 // DW_FORM_addr 2026-02-21T12:43:14.3503001Z .b8 88 // DW_AT_call_file 2026-02-21T12:43:14.3503197Z .b8 11 // DW_FORM_data1 2026-02-21T12:43:14.3503291Z .b8 89 // DW_AT_call_line 2026-02-21T12:43:14.3503375Z .b8 11 // DW_FORM_data1 2026-02-21T12:43:14.3503466Z .b8 87 // DW_AT_call_column 2026-02-21T12:43:14.3503554Z .b8 11 // DW_FORM_data1 2026-02-21T12:43:14.3503628Z .b8 0 // EOM(1) 2026-02-21T12:43:14.3503701Z .b8 0 // EOM(2) 2026-02-21T12:43:14.3503779Z .b8 0 // EOM(3) 2026-02-21T12:43:14.3503834Z } 2026-02-21T12:43:14.3503900Z .section .debug_info 2026-02-21T12:43:14.3503960Z { 2026-02-21T12:43:14.3504054Z .b32 178 // Length of Unit 2026-02-21T12:43:14.3504154Z .b8 2 // DWARF version number 2026-02-21T12:43:14.3504209Z .b8 0 2026-02-21T12:43:14.3504357Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T12:43:14.3504457Z .b8 8 // Address Size (in bytes) 2026-02-21T12:43:14.3504577Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T12:43:14.3504672Z .b8 116 // DW_AT_producer 2026-02-21T12:43:14.3504728Z .b8 114 2026-02-21T12:43:14.3504783Z .b8 105 2026-02-21T12:43:14.3504837Z .b8 116 2026-02-21T12:43:14.3504896Z .b8 111 2026-02-21T12:43:14.3504951Z .b8 110 2026-02-21T12:43:14.3505004Z .b8 0 2026-02-21T12:43:14.3505094Z .b8 2 // DW_AT_language 2026-02-21T12:43:14.3505147Z .b8 0 2026-02-21T12:43:14.3505228Z .b8 99 // DW_AT_name 2026-02-21T12:43:14.3505281Z .b8 102 2026-02-21T12:43:14.3505340Z .b8 122 2026-02-21T12:43:14.3505396Z .b8 54 2026-02-21T12:43:14.3505451Z .b8 100 2026-02-21T12:43:14.3505513Z .b8 113 2026-02-21T12:43:14.3505568Z .b8 121 2026-02-21T12:43:14.3505621Z .b8 108 2026-02-21T12:43:14.3505680Z .b8 53 2026-02-21T12:43:14.3505749Z .b8 114 2026-02-21T12:43:14.3505806Z .b8 51 2026-02-21T12:43:14.3505861Z .b8 53 2026-02-21T12:43:14.3505922Z .b8 102 2026-02-21T12:43:14.3505975Z .b8 50 2026-02-21T12:43:14.3506028Z .b8 118 2026-02-21T12:43:14.3506081Z .b8 51 2026-02-21T12:43:14.3506142Z .b8 100 2026-02-21T12:43:14.3506195Z .b8 120 2026-02-21T12:43:14.3506248Z .b8 98 2026-02-21T12:43:14.3506307Z .b8 121 2026-02-21T12:43:14.3506362Z .b8 109 2026-02-21T12:43:14.3506415Z .b8 113 2026-02-21T12:43:14.3506589Z .b8 106 2026-02-21T12:43:14.3506652Z .b8 113 2026-02-21T12:43:14.3506704Z .b8 122 2026-02-21T12:43:14.3506757Z .b8 120 2026-02-21T12:43:14.3506810Z .b8 113 2026-02-21T12:43:14.3506870Z .b8 108 2026-02-21T12:43:14.3506924Z .b8 114 2026-02-21T12:43:14.3506978Z .b8 100 2026-02-21T12:43:14.3507037Z .b8 107 2026-02-21T12:43:14.3507185Z .b8 54 2026-02-21T12:43:14.3507240Z .b8 99 2026-02-21T12:43:14.3507304Z .b8 122 2026-02-21T12:43:14.3507367Z .b8 112 2026-02-21T12:43:14.3507420Z .b8 111 2026-02-21T12:43:14.3507549Z .b8 106 2026-02-21T12:43:14.3507609Z .b8 104 2026-02-21T12:43:14.3507661Z .b8 52 2026-02-21T12:43:14.3507718Z .b8 51 
2026-02-21T12:43:14.3507783Z .b8 116 2026-02-21T12:43:14.3507842Z .b8 99 2026-02-21T12:43:14.3507897Z .b8 103 2026-02-21T12:43:14.3507949Z .b8 108 2026-02-21T12:43:14.3508003Z .b8 108 2026-02-21T12:43:14.3508062Z .b8 107 2026-02-21T12:43:14.3508114Z .b8 98 2026-02-21T12:43:14.3508168Z .b8 111 2026-02-21T12:43:14.3508226Z .b8 106 2026-02-21T12:43:14.3508280Z .b8 119 2026-02-21T12:43:14.3508333Z .b8 105 2026-02-21T12:43:14.3508390Z .b8 119 2026-02-21T12:43:14.3508448Z .b8 46 2026-02-21T12:43:14.3508502Z .b8 112 2026-02-21T12:43:14.3508644Z .b8 121 2026-02-21T12:43:14.3508719Z .b8 0 2026-02-21T12:43:14.3508828Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T12:43:14.3508916Z .b8 47 // DW_AT_comp_dir 2026-02-21T12:43:14.3508975Z .b8 116 2026-02-21T12:43:14.3509035Z .b8 109 2026-02-21T12:43:14.3509089Z .b8 112 2026-02-21T12:43:14.3509239Z .b8 47 2026-02-21T12:43:14.3509368Z .b8 116 2026-02-21T12:43:14.3509428Z .b8 111 2026-02-21T12:43:14.3509481Z .b8 114 2026-02-21T12:43:14.3509534Z .b8 99 2026-02-21T12:43:14.3509597Z .b8 104 2026-02-21T12:43:14.3509650Z .b8 105 2026-02-21T12:43:14.3509704Z .b8 110 2026-02-21T12:43:14.3509765Z .b8 100 2026-02-21T12:43:14.3509817Z .b8 117 2026-02-21T12:43:14.3509881Z .b8 99 2026-02-21T12:43:14.3509936Z .b8 116 2026-02-21T12:43:14.3509998Z .b8 111 2026-02-21T12:43:14.3510052Z .b8 114 2026-02-21T12:43:14.3510106Z .b8 95 2026-02-21T12:43:14.3510159Z .b8 114 2026-02-21T12:43:14.3510216Z .b8 111 2026-02-21T12:43:14.3510272Z .b8 111 2026-02-21T12:43:14.3510324Z .b8 116 2026-02-21T12:43:14.3510382Z .b8 47 2026-02-21T12:43:14.3510434Z .b8 102 2026-02-21T12:43:14.3510486Z .b8 122 2026-02-21T12:43:14.3510547Z .b8 0 2026-02-21T12:43:14.3510667Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T12:43:14.3510749Z .b8 95 // DW_AT_name 2026-02-21T12:43:14.3510807Z .b8 104 2026-02-21T12:43:14.3510867Z .b8 101 2026-02-21T12:43:14.3510919Z .b8 108 2026-02-21T12:43:14.3510972Z .b8 105 2026-02-21T12:43:14.3511029Z .b8 111 
2026-02-21T12:43:14.3511082Z .b8 110 2026-02-21T12:43:14.3511134Z .b8 95 2026-02-21T12:43:14.3511186Z .b8 109 2026-02-21T12:43:14.3511246Z .b8 97 2026-02-21T12:43:14.3511301Z .b8 116 2026-02-21T12:43:14.3511353Z .b8 109 2026-02-21T12:43:14.3511412Z .b8 117 2026-02-21T12:43:14.3511465Z .b8 108 2026-02-21T12:43:14.3511517Z .b8 95 2026-02-21T12:43:14.3511569Z .b8 98 2026-02-21T12:43:14.3511626Z .b8 102 2026-02-21T12:43:14.3511678Z .b8 49 2026-02-21T12:43:14.3511730Z .b8 54 2026-02-21T12:43:14.3511784Z .b8 95 2026-02-21T12:43:14.3511844Z .b8 105 2026-02-21T12:43:14.3511897Z .b8 110 2026-02-21T12:43:14.3511951Z .b8 116 2026-02-21T12:43:14.3512011Z .b8 52 2026-02-21T12:43:14.3512065Z .b8 0 2026-02-21T12:43:14.3512148Z .b8 1 // DW_AT_inline 2026-02-21T12:43:14.3512263Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T12:43:14.3512370Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T12:43:14.3512469Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T12:43:14.3512571Z .b32 108 // DW_AT_abstract_origin 2026-02-21T12:43:14.3512708Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T12:43:14.3512809Z .b32 108 // DW_AT_abstract_origin 2026-02-21T12:43:14.3512899Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T12:43:14.3512998Z .b64 $L__tmp32 // DW_AT_high_pc 2026-02-21T12:43:14.3513084Z .b8 1 // DW_AT_call_file 2026-02-21T12:43:14.3513169Z .b8 84 // DW_AT_call_line 2026-02-21T12:43:14.3513337Z .b8 40 // DW_AT_call_column 2026-02-21T12:43:14.3513430Z .b8 0 // End Of Children Mark 2026-02-21T12:43:14.3513575Z .b8 0 // End Of Children Mark 2026-02-21T12:43:14.3513627Z } 2026-02-21T12:43:14.3513703Z .section .debug_macinfo { } 2026-02-21T12:43:14.3513709Z 2026-02-21T12:43:14.3513790Z ================================================================ 2026-02-21T12:43:14.3513911Z please share the reproducer above with Triton project. 
2026-02-21T12:43:29.4309025Z [8308s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 512, 64], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[8], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=2, num_stages=7, num_warps=32, pid_type='persistent_interleaved', range_flattens=[False, False], range_multi_buffers=[True, False], range_num_stages=[3, 4], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T12:43:29.4310960Z Tensor-likes are not close! 2026-02-21T12:43:29.4311137Z 2026-02-21T12:43:29.4311794Z Mismatched elements: 334832970 / 335544320 (99.8%) 2026-02-21T12:43:29.4312278Z Greatest absolute difference: 7232.0 at index (126054, 532) (up to 0.01 allowed) 2026-02-21T12:43:29.4312818Z Greatest relative difference: inf at index (60515, 1164) (up to 0.01 allowed) 2026-02-21T12:43:29.4313337Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T12:43:29.4313654Z 2026-02-21T12:44:28.5650145Z 2026-02-21T12:44:28.5650160Z 2026-02-21T12:44:28.5650165Z 2026-02-21T12:44:28.5650602Z ================================================================ 2026-02-21T12:44:28.5650990Z Internal Triton PTX codegen error 2026-02-21T12:44:28.5651242Z `ptxas` stderr: 2026-02-21T12:44:28.5652071Z ptxas fatal : (C7602) Insufficient registers (128) to compile instruction at line 495 in function _helion_matmul_bf16_int4. Try to compile with register target of 158 or higher. 
2026-02-21T12:44:28.5653089Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T12:44:28.5653367Z 2026-02-21T12:44:28.5654143Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmppcb8cdjd.ptx -o /tmp/tmppcb8cdjd.ptx.o 2026-02-21T12:44:28.5655049Z 2026-02-21T12:44:28.5655323Z 2026-02-21T12:44:28.5655407Z // 2026-02-21T12:44:28.5655610Z // Generated by LLVM NVPTX Back-End 2026-02-21T12:44:28.5655897Z // 2026-02-21T12:44:28.5655996Z 2026-02-21T12:44:28.5656078Z .version 8.7 2026-02-21T12:44:28.5656281Z .target sm_90a 2026-02-21T12:44:28.5656691Z .address_size 64 2026-02-21T12:44:28.5656828Z 2026-02-21T12:44:28.5657090Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T12:44:28.5657590Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T12:44:28.5657968Z // @_helion_matmul_bf16_int4 2026-02-21T12:44:28.5658346Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T12:44:28.5658766Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T12:44:28.5659304Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T12:44:28.5659825Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T12:44:28.5660329Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T12:44:28.5660827Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T12:44:28.5661214Z ) 2026-02-21T12:44:28.5661398Z .reqntid 256 2026-02-21T12:44:28.5661595Z .maxnreg 128 2026-02-21T12:44:28.5661785Z { 2026-02-21T12:44:28.5661977Z .reg .pred %p<43>; 2026-02-21T12:44:28.5662191Z .reg .b16 %rs<561>; 2026-02-21T12:44:28.5662374Z .reg .b32 %r<14604>; 2026-02-21T12:44:28.5662549Z .reg .b64 %rd<699>; 2026-02-21T12:44:28.5662916Z .loc 1 14 0 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:14:0 2026-02-21T12:44:28.5663762Z $L__func_begin0: 
2026-02-21T12:44:28.5664139Z .loc 1 14 0 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:14:0 2026-02-21T12:44:28.5664674Z 2026-02-21T12:44:28.5664739Z // %bb.0: 2026-02-21T12:44:28.5664974Z ld.param.b64 %rd167, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T12:44:28.5665339Z ld.param.b64 %rd166, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T12:44:28.5665688Z ld.param.b64 %rd165, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T12:44:28.5665978Z $L__tmp0: 2026-02-21T12:44:28.5666325Z .loc 1 20 30 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:20:30 2026-02-21T12:44:28.5666932Z mov.u32 %r1495, %ctaid.x; 2026-02-21T12:44:28.5667316Z .loc 1 20 48 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:20:48 2026-02-21T12:44:28.5667744Z shl.b32 %r1496, %r1495, 1; 2026-02-21T12:44:28.5667962Z cvt.u64.u32 %rd647, %r1496; 2026-02-21T12:44:28.5668358Z .loc 1 21 49 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:21:49 2026-02-21T12:44:28.5669016Z min.u64 %rd168, %rd647, 10238; 2026-02-21T12:44:28.5669376Z add.s64 %rd2, %rd168, 2; 2026-02-21T12:44:28.5669827Z .loc 1 22 120 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:22:120 2026-02-21T12:44:28.5670268Z sub.s64 %rd169, %rd2, %rd647; 2026-02-21T12:44:28.5670485Z shr.u64 %rd170, %rd169, 62; 2026-02-21T12:44:28.5670711Z add.s64 %rd171, %rd169, %rd170; 2026-02-21T12:44:28.5670928Z and.b64 %rd172, %rd171, -4; 2026-02-21T12:44:28.5671151Z add.s64 %rd698, %rd172, %rd647; 2026-02-21T12:44:28.5671537Z .loc 1 34 45 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:34:45 2026-02-21T12:44:28.5671956Z mov.u32 %r1, %tid.x; 2026-02-21T12:44:28.5672135Z shr.u32 %r2, %r1, 5; 2026-02-21T12:44:28.5672323Z bfe.u32 %r1497, %r1, 1, 7; 2026-02-21T12:44:28.5672518Z and.b32 %r3, %r1, 224; 2026-02-21T12:44:28.5672696Z bfe.u32 %r1498, %r1, 5, 3; 2026-02-21T12:44:28.5672875Z or.b32 %r1499, %r1498, 8; 2026-02-21T12:44:28.5673062Z or.b32 %r1500, %r1498, 16; 2026-02-21T12:44:28.5673257Z or.b32 
%r1501, %r1498, 24; 2026-02-21T12:44:28.5673433Z or.b32 %r1502, %r1498, 32; 2026-02-21T12:44:28.5673608Z or.b32 %r1503, %r1498, 40; 2026-02-21T12:44:28.5673786Z or.b32 %r1504, %r1498, 48; 2026-02-21T12:44:28.5673971Z or.b32 %r1505, %r1498, 56; 2026-02-21T12:44:28.5674142Z or.b32 %r1506, %r1498, 64; 2026-02-21T12:44:28.5674333Z or.b32 %r1507, %r1498, 72; 2026-02-21T12:44:28.5674508Z or.b32 %r1508, %r1498, 80; 2026-02-21T12:44:28.5674689Z or.b32 %r1509, %r1498, 88; 2026-02-21T12:44:28.5674871Z or.b32 %r1510, %r1498, 96; 2026-02-21T12:44:28.5675045Z or.b32 %r1511, %r1498, 104; 2026-02-21T12:44:28.5675285Z or.b32 %r1512, %r1498, 112; 2026-02-21T12:44:28.5675458Z or.b32 %r1513, %r1498, 120; 2026-02-21T12:44:28.5675809Z .loc 1 34 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:34:32 2026-02-21T12:44:28.5676190Z cvt.u64.u32 %rd4, %r1497; 2026-02-21T12:44:28.5676398Z cvt.u64.u32 %rd5, %r1498; 2026-02-21T12:44:28.5676735Z cvt.u64.u32 %rd6, %r1499; 2026-02-21T12:44:28.5676952Z cvt.u64.u32 %rd7, %r1500; 2026-02-21T12:44:28.5677133Z cvt.u64.u32 %rd8, %r1501; 2026-02-21T12:44:28.5677307Z cvt.u64.u32 %rd9, %r1502; 2026-02-21T12:44:28.5677498Z cvt.u64.u32 %rd10, %r1503; 2026-02-21T12:44:28.5677692Z cvt.u64.u32 %rd11, %r1504; 2026-02-21T12:44:28.5677883Z cvt.u64.u32 %rd12, %r1505; 2026-02-21T12:44:28.5678058Z cvt.u64.u32 %rd13, %r1506; 2026-02-21T12:44:28.5678238Z cvt.u64.u32 %rd14, %r1507; 2026-02-21T12:44:28.5678414Z cvt.u64.u32 %rd15, %r1508; 2026-02-21T12:44:28.5678590Z cvt.u64.u32 %rd16, %r1509; 2026-02-21T12:44:28.5678769Z cvt.u64.u32 %rd17, %r1510; 2026-02-21T12:44:28.5678954Z cvt.u64.u32 %rd18, %r1511; 2026-02-21T12:44:28.5679130Z cvt.u64.u32 %rd19, %r1512; 2026-02-21T12:44:28.5679322Z cvt.u64.u32 %rd20, %r1513; 2026-02-21T12:44:28.5679790Z .loc 1 36 45 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:36:45 2026-02-21T12:44:28.5680153Z shl.b32 %r4, %r1, 2; 2026-02-21T12:44:28.5680403Z and.b32 %r1514, %r4, 252; 2026-02-21T12:44:28.5680574Z shl.b32 
%r5, %r1, 3; 2026-02-21T12:44:28.5680738Z and.b32 %r1515, %r5, 248; 2026-02-21T12:44:28.5681053Z .loc 1 36 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:36:32 2026-02-21T12:44:28.5681410Z cvt.u64.u32 %rd21, %r1514; 2026-02-21T12:44:28.5681592Z cvt.u64.u32 %rd22, %r1515; 2026-02-21T12:44:28.5681909Z .loc 1 44 48 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:44:48 2026-02-21T12:44:28.5682277Z and.b32 %r6, %r1, 192; 2026-02-21T12:44:28.5682451Z bfe.u32 %r1516, %r1, 6, 2; 2026-02-21T12:44:28.5682785Z .loc 1 44 66 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:44:66 2026-02-21T12:44:28.5683134Z cvt.u64.u32 %rd23, %r1516; 2026-02-21T12:44:28.5683456Z .loc 1 50 39 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:50:39 2026-02-21T12:44:28.5683884Z and.b32 %r7, %r4, 4; 2026-02-21T12:44:28.5684242Z .loc 1 50 26 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:50:26 2026-02-21T12:44:28.5684602Z or.b32 %r1517, %r4, 8184; 2026-02-21T12:44:28.5684906Z .loc 1 57 55 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:55 2026-02-21T12:44:28.5685279Z mul.wide.u32 %rd24, %r1516, 1280; 2026-02-21T12:44:28.5685629Z .loc 1 22 120 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:22:120 2026-02-21T12:44:28.5686004Z mad.wide.u32 %rd25, %r1517, 2, %rd165; 2026-02-21T12:44:28.5686222Z setp.gt.s64 %p1, %rd172, 0; 2026-02-21T12:44:28.5686414Z mov.b32 %r11895, global_smem; 2026-02-21T12:44:28.5686758Z shl.b32 %r13808, %r3, 3; 2026-02-21T12:44:28.5686939Z and.b32 %r13809, %r4, 112; 2026-02-21T12:44:28.5687116Z and.b32 %r13810, %r1, 3; 2026-02-21T12:44:28.5687298Z or.b64 %rd644, %rd23, %rd21; 2026-02-21T12:44:28.5687485Z and.b32 %r13811, %r1, 252; 2026-02-21T12:44:28.5687660Z shl.b32 %r13812, %r1, 5; 2026-02-21T12:44:28.5687835Z bfe.s32 %r13813, %r1, 2, 1; 2026-02-21T12:44:28.5688017Z add.s64 %rd645, %rd166, %rd24; 2026-02-21T12:44:28.5688217Z shl.b32 %r13814, %r1, 4; 2026-02-21T12:44:28.5688394Z and.b32 %r13815, 
%r1, 24; 2026-02-21T12:44:28.5688571Z shl.b32 %r13816, %r6, 1; 2026-02-21T12:44:28.5688739Z bfe.s32 %r13817, %r1, 5, 1; 2026-02-21T12:44:28.5688918Z cvt.u64.u32 %rd664, %r7; 2026-02-21T12:44:28.5689084Z and.b32 %r14330, %r5, 2040; 2026-02-21T12:44:28.5689264Z @%p1 bra $L__BB0_1; 2026-02-21T12:44:28.5689440Z bra.uni $L__BB0_23; 2026-02-21T12:44:28.5689625Z $L__BB0_1: // %.lr.ph 2026-02-21T12:44:28.5689999Z .loc 1 0 120 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:0:120 2026-02-21T12:44:28.5690361Z add.s32 %r10, %r11895, %r14330; 2026-02-21T12:44:28.5690551Z shl.b32 %r1522, %r13810, 1; 2026-02-21T12:44:28.5690731Z add.s32 %r1523, %r11895, %r13808; 2026-02-21T12:44:28.5690935Z add.s32 %r1524, %r1523, %r13809; 2026-02-21T12:44:28.5691124Z add.s32 %r11, %r1524, %r1522; 2026-02-21T12:44:28.5691308Z cvt.u32.u64 %r1525, %rd644; 2026-02-21T12:44:28.5691495Z add.s32 %r12, %r11895, %r1525; 2026-02-21T12:44:28.5691675Z xor.b32 %r1526, %r1525, 32; 2026-02-21T12:44:28.5691864Z add.s32 %r13, %r11895, %r1526; 2026-02-21T12:44:28.5692040Z xor.b32 %r1527, %r1525, 64; 2026-02-21T12:44:28.5692221Z add.s32 %r14, %r11895, %r1527; 2026-02-21T12:44:28.5692399Z xor.b32 %r1528, %r1525, 96; 2026-02-21T12:44:28.5692577Z add.s32 %r15, %r11895, %r1528; 2026-02-21T12:44:28.5692759Z shl.b32 %r1529, %r13810, 8; 2026-02-21T12:44:28.5692952Z shl.b32 %r1530, %r13810, 5; 2026-02-21T12:44:28.5693131Z xor.b32 %r1532, %r1530, %r13811; 2026-02-21T12:44:28.5693329Z add.s32 %r1533, %r11895, %r1529; 2026-02-21T12:44:28.5693613Z add.s32 %r16, %r1533, %r1532; 2026-02-21T12:44:28.5693791Z and.b32 %r1535, %r13812, 8032; 2026-02-21T12:44:28.5693995Z and.b32 %r1537, %r13813, 144; 2026-02-21T12:44:28.5694252Z or.b32 %r1538, %r1537, %r1535; 2026-02-21T12:44:28.5694439Z add.s32 %r17, %r11895, %r1538; 2026-02-21T12:44:28.5694617Z xor.b32 %r1539, %r1538, 16; 2026-02-21T12:44:28.5694798Z add.s32 %r18, %r11895, %r1539; 2026-02-21T12:44:28.5694980Z bfe.u32 %r1540, %r11895, 4, 14; 2026-02-21T12:44:28.5695176Z 
cvt.u64.u32 %rd174, %r1540; 2026-02-21T12:44:28.5695382Z or.b64 %rd466, %rd174, -4611685949674356736; 2026-02-21T12:44:28.5695600Z shl.b32 %r1541, %r13810, 13; 2026-02-21T12:44:28.5695785Z and.b32 %r1543, %r13814, 3968; 2026-02-21T12:44:28.5695964Z and.b32 %r1544, %r13813, 4112; 2026-02-21T12:44:28.5696149Z or.b32 %r1545, %r1543, %r1544; 2026-02-21T12:44:28.5696326Z or.b32 %r1546, %r1545, %r1541; 2026-02-21T12:44:28.5696664Z or.b32 %r1547, %r1546, %r1530; 2026-02-21T12:44:28.5696859Z add.s32 %r19, %r11895, %r1547; 2026-02-21T12:44:28.5697058Z xor.b32 %r1548, %r1547, 16; 2026-02-21T12:44:28.5697238Z add.s32 %r20, %r11895, %r1548; 2026-02-21T12:44:28.5697556Z xor.b32 %r1549, %r1547, 32; 2026-02-21T12:44:28.5697747Z add.s32 %r21, %r11895, %r1549; 2026-02-21T12:44:28.5697941Z xor.b32 %r1550, %r1547, 48; 2026-02-21T12:44:28.5698128Z add.s32 %r22, %r11895, %r1550; 2026-02-21T12:44:28.5698307Z xor.b32 %r1551, %r1547, 64; 2026-02-21T12:44:28.5698488Z add.s32 %r23, %r11895, %r1551; 2026-02-21T12:44:28.5698668Z xor.b32 %r1552, %r1547, 80; 2026-02-21T12:44:28.5698846Z add.s32 %r24, %r11895, %r1552; 2026-02-21T12:44:28.5699025Z xor.b32 %r1553, %r1547, 96; 2026-02-21T12:44:28.5699206Z add.s32 %r25, %r11895, %r1553; 2026-02-21T12:44:28.5699390Z xor.b32 %r1554, %r1547, 112; 2026-02-21T12:44:28.5699566Z add.s32 %r26, %r11895, %r1554; 2026-02-21T12:44:28.5699750Z shl.b32 %r1556, %r13815, 10; 2026-02-21T12:44:28.5699924Z and.b32 %r1557, %r13814, 112; 2026-02-21T12:44:28.5700109Z shl.b32 %r1558, %r13815, 2; 2026-02-21T12:44:28.5700282Z and.b32 %r1561, %r13817, 4112; 2026-02-21T12:44:28.5700464Z or.b32 %r1562, %r1556, %r1557; 2026-02-21T12:44:28.5700650Z or.b32 %r1563, %r1558, %r13816; 2026-02-21T12:44:28.5700840Z xor.b32 %r1564, %r1562, %r1563; 2026-02-21T12:44:28.5701027Z xor.b32 %r1565, %r1564, %r1561; 2026-02-21T12:44:28.5701214Z add.s32 %r3760, %r11895, %r1565; 2026-02-21T12:44:28.5701404Z add.s32 %r3765, %r3760, 512; 2026-02-21T12:44:28.5701578Z add.s32 %r3770, %r3760, 1024; 
2026-02-21T12:44:28.5701764Z add.s32 %r3775, %r3760, 1536; 2026-02-21T12:44:28.5701942Z add.s32 %r3780, %r3760, 2048; 2026-02-21T12:44:28.5702119Z add.s32 %r3785, %r3760, 2560; 2026-02-21T12:44:28.5702292Z add.s32 %r3790, %r3760, 3072; 2026-02-21T12:44:28.5702490Z add.s32 %r3795, %r3760, 3584; 2026-02-21T12:44:28.5702832Z .loc 1 22 120 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:22:120 2026-02-21T12:44:28.5703209Z shl.b64 %rd175, %rd4, 14; 2026-02-21T12:44:28.5703398Z and.b32 %r1566, %r1, 1; 2026-02-21T12:44:28.5703573Z mul.wide.u32 %rd176, %r1566, 8; 2026-02-21T12:44:28.5703776Z or.b64 %rd177, %rd175, %rd176; 2026-02-21T12:44:28.5703964Z add.s64 %rd178, %rd177, %rd165; 2026-02-21T12:44:28.5704156Z add.s64 %rd30, %rd178, 32; 2026-02-21T12:44:28.5704337Z add.s64 %rd179, %rd24, %rd21; 2026-02-21T12:44:28.5704540Z add.s64 %rd31, %rd166, %rd179; 2026-02-21T12:44:28.5704739Z prmt.b32 %r3157, %r3158, %r3159, 0x3340U; 2026-02-21T12:44:28.5704965Z prmt.b32 %r3911, %r3912, %r3913, 0x3340U; 2026-02-21T12:44:28.5705192Z prmt.b32 %r5585, %r5586, %r5587, 0x3340U; 2026-02-21T12:44:28.5705406Z prmt.b32 %r6338, %r6339, %r6340, 0x3340U; 2026-02-21T12:44:28.5705619Z prmt.b32 %r8011, %r8012, %r8013, 0x3340U; 2026-02-21T12:44:28.5705827Z prmt.b32 %r8764, %r8765, %r8766, 0x3340U; 2026-02-21T12:44:28.5706060Z prmt.b32 %r10437, %r10438, %r10439, 0x3340U; 2026-02-21T12:44:28.5706291Z prmt.b32 %r11190, %r11191, %r11192, 0x3340U; 2026-02-21T12:44:28.5706782Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T12:44:28.5707075Z // Child Loop BB0_6 Depth 2 2026-02-21T12:44:28.5707432Z // Child Loop BB0_11 Depth 2 2026-02-21T12:44:28.5707710Z // Child Loop BB0_16 Depth 2 2026-02-21T12:44:28.5707974Z // Child Loop BB0_21 Depth 2 2026-02-21T12:44:28.5708366Z .loc 1 28 35 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:28:35 2026-02-21T12:44:28.5708879Z mul.hi.u64 %rd180, %rd647, -3689348814741910323; 2026-02-21T12:44:28.5709124Z shr.u64 %rd181, %rd180, 4; 
2026-02-21T12:44:28.5709450Z .loc 1 29 33 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:29:33 2026-02-21T12:44:28.5709819Z shl.b64 %rd33, %rd181, 2; 2026-02-21T12:44:28.5710142Z .loc 1 30 39 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:30:39 2026-02-21T12:44:28.5710500Z sub.s64 %rd182, 2048, %rd33; 2026-02-21T12:44:28.5710977Z .loc 1 30 52 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:30:52 2026-02-21T12:44:28.5711353Z min.s64 %rd34, %rd182, 4; 2026-02-21T12:44:28.5711673Z .loc 1 31 45 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:31:45 2026-02-21T12:44:28.5712039Z mul.lo.s64 %rd183, %rd181, 20; 2026-02-21T12:44:28.5712229Z sub.s64 %rd35, %rd647, %rd183; 2026-02-21T12:44:28.5712552Z .loc 1 32 51 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:32:51 2026-02-21T12:44:28.5712902Z or.b64 %rd184, %rd35, %rd34; 2026-02-21T12:44:28.5713096Z and.b64 %rd185, %rd184, -4294967296; 2026-02-21T12:44:28.5713300Z setp.ne.b64 %p2, %rd185, 0; 2026-02-21T12:44:28.5713489Z @%p2 bra $L__BB0_4; 2026-02-21T12:44:28.5713652Z bra.uni $L__BB0_3; 2026-02-21T12:44:28.5713869Z $L__BB0_4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:28.5714142Z div.s64 %rd648, %rd35, %rd34; 2026-02-21T12:44:28.5714347Z bra.uni $L__BB0_5; 2026-02-21T12:44:28.5714561Z $L__BB0_3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:28.5714818Z cvt.u32.u64 %r1567, %rd34; 2026-02-21T12:44:28.5715001Z cvt.u32.u64 %r1568, %rd35; 2026-02-21T12:44:28.5715177Z div.u32 %r1569, %r1568, %r1567; 2026-02-21T12:44:28.5715369Z cvt.u64.u32 %rd648, %r1569; 2026-02-21T12:44:28.5715604Z $L__BB0_5: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:28.5716012Z .loc 1 31 64 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:31:64 2026-02-21T12:44:28.5716379Z mul.lo.s64 %rd187, %rd648, %rd34; 2026-02-21T12:44:28.5716769Z sub.s64 %rd188, %rd35, %rd187; 2026-02-21T12:44:28.5717101Z .loc 1 31 30 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:31:30 
2026-02-21T12:44:28.5717453Z add.s64 %rd189, %rd188, %rd33; 2026-02-21T12:44:28.5717788Z .loc 1 33 27 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:33:27 2026-02-21T12:44:28.5718161Z shl.b64 %rd39, %rd189, 7; 2026-02-21T12:44:28.5718489Z .loc 1 34 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:34:32 2026-02-21T12:44:28.5718852Z or.b64 %rd190, %rd39, %rd4; 2026-02-21T12:44:28.5719171Z .loc 1 35 27 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:35:27 2026-02-21T12:44:28.5719529Z shl.b64 %rd40, %rd648, 8; 2026-02-21T12:44:28.5719838Z .loc 1 36 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:36:32 2026-02-21T12:44:28.5720195Z or.b64 %rd41, %rd40, %rd21; 2026-02-21T12:44:28.5720510Z .loc 1 51 53 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:53 2026-02-21T12:44:28.5720866Z shl.b64 %rd42, %rd190, 13; 2026-02-21T12:44:28.5721297Z .loc 1 43 126 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:43:126 2026-02-21T12:44:28.5721658Z shl.b64 %rd191, %rd189, 21; 2026-02-21T12:44:28.5721914Z add.s64 %rd650, %rd30, %rd191; 2026-02-21T12:44:28.5722105Z add.s64 %rd649, %rd31, %rd40; 2026-02-21T12:44:28.5722302Z mov.b32 %r13818, 0f00000000; 2026-02-21T12:44:28.5722484Z mov.b64 %rd651, -12; 2026-02-21T12:44:28.5722657Z mov.b32 %r13819, %r13818; 2026-02-21T12:44:28.5722829Z mov.b32 %r13820, %r13818; 2026-02-21T12:44:28.5723007Z mov.b32 %r13821, %r13818; 2026-02-21T12:44:28.5723182Z mov.b32 %r13822, %r13818; 2026-02-21T12:44:28.5723350Z mov.b32 %r13823, %r13818; 2026-02-21T12:44:28.5723526Z mov.b32 %r13824, %r13818; 2026-02-21T12:44:28.5723692Z mov.b32 %r13825, %r13818; 2026-02-21T12:44:28.5723866Z mov.b32 %r13826, %r13818; 2026-02-21T12:44:28.5724034Z mov.b32 %r13827, %r13818; 2026-02-21T12:44:28.5724205Z mov.b32 %r13828, %r13818; 2026-02-21T12:44:28.5724378Z mov.b32 %r13829, %r13818; 2026-02-21T12:44:28.5724553Z mov.b32 %r13830, %r13818; 2026-02-21T12:44:28.5724718Z mov.b32 %r13831, %r13818; 
2026-02-21T12:44:28.5724984Z mov.b32 %r13832, %r13818; 2026-02-21T12:44:28.5725219Z mov.b32 %r13833, %r13818; 2026-02-21T12:44:28.5725388Z mov.b32 %r13834, %r13818; 2026-02-21T12:44:28.5725562Z mov.b32 %r13835, %r13818; 2026-02-21T12:44:28.5725730Z mov.b32 %r13836, %r13818; 2026-02-21T12:44:28.5725901Z mov.b32 %r13837, %r13818; 2026-02-21T12:44:28.5726067Z mov.b32 %r13838, %r13818; 2026-02-21T12:44:28.5726238Z mov.b32 %r13839, %r13818; 2026-02-21T12:44:28.5726400Z mov.b32 %r13840, %r13818; 2026-02-21T12:44:28.5726747Z mov.b32 %r13841, %r13818; 2026-02-21T12:44:28.5726912Z mov.b32 %r13842, %r13818; 2026-02-21T12:44:28.5727083Z mov.b32 %r13843, %r13818; 2026-02-21T12:44:28.5727252Z mov.b32 %r13844, %r13818; 2026-02-21T12:44:28.5727419Z mov.b32 %r13845, %r13818; 2026-02-21T12:44:28.5727588Z mov.b32 %r13846, %r13818; 2026-02-21T12:44:28.5727753Z mov.b32 %r13847, %r13818; 2026-02-21T12:44:28.5727926Z mov.b32 %r13848, %r13818; 2026-02-21T12:44:28.5728089Z mov.b32 %r13849, %r13818; 2026-02-21T12:44:28.5728263Z mov.b32 %r13850, %r13818; 2026-02-21T12:44:28.5728434Z mov.b32 %r13851, %r13818; 2026-02-21T12:44:28.5728604Z mov.b32 %r13852, %r13818; 2026-02-21T12:44:28.5728776Z mov.b32 %r13853, %r13818; 2026-02-21T12:44:28.5728946Z mov.b32 %r13854, %r13818; 2026-02-21T12:44:28.5729117Z mov.b32 %r13855, %r13818; 2026-02-21T12:44:28.5729282Z mov.b32 %r13856, %r13818; 2026-02-21T12:44:28.5729453Z mov.b32 %r13857, %r13818; 2026-02-21T12:44:28.5729617Z mov.b32 %r13858, %r13818; 2026-02-21T12:44:28.5729789Z mov.b32 %r13859, %r13818; 2026-02-21T12:44:28.5729954Z mov.b32 %r13860, %r13818; 2026-02-21T12:44:28.5730124Z mov.b32 %r13861, %r13818; 2026-02-21T12:44:28.5730287Z mov.b32 %r13862, %r13818; 2026-02-21T12:44:28.5730458Z mov.b32 %r13863, %r13818; 2026-02-21T12:44:28.5730644Z mov.b32 %r13864, %r13818; 2026-02-21T12:44:28.5730816Z mov.b32 %r13865, %r13818; 2026-02-21T12:44:28.5730997Z mov.b32 %r13866, %r13818; 2026-02-21T12:44:28.5731163Z mov.b32 %r13867, %r13818; 
2026-02-21T12:44:28.5731336Z mov.b32 %r13868, %r13818; 2026-02-21T12:44:28.5731505Z mov.b32 %r13869, %r13818; 2026-02-21T12:44:28.5731684Z mov.b32 %r13870, %r13818; 2026-02-21T12:44:28.5731853Z mov.b32 %r13871, %r13818; 2026-02-21T12:44:28.5732026Z mov.b32 %r13872, %r13818; 2026-02-21T12:44:28.5732197Z mov.b32 %r13873, %r13818; 2026-02-21T12:44:28.5732372Z mov.b32 %r13874, %r13818; 2026-02-21T12:44:28.5732547Z mov.b32 %r13875, %r13818; 2026-02-21T12:44:28.5732713Z mov.b32 %r13876, %r13818; 2026-02-21T12:44:28.5732886Z mov.b32 %r13877, %r13818; 2026-02-21T12:44:28.5733052Z mov.b32 %r13878, %r13818; 2026-02-21T12:44:28.5733224Z mov.b32 %r13879, %r13818; 2026-02-21T12:44:28.5733389Z mov.b32 %r13880, %r13818; 2026-02-21T12:44:28.5733560Z mov.b32 %r13881, %r13818; 2026-02-21T12:44:28.5733725Z mov.b32 %r13882, %r13818; 2026-02-21T12:44:28.5733895Z mov.b32 %r13883, %r13818; 2026-02-21T12:44:28.5734157Z mov.b32 %r13884, %r13818; 2026-02-21T12:44:28.5734327Z mov.b32 %r13885, %r13818; 2026-02-21T12:44:28.5734497Z mov.b32 %r13886, %r13818; 2026-02-21T12:44:28.5734726Z mov.b32 %r13887, %r13818; 2026-02-21T12:44:28.5734901Z mov.b32 %r13888, %r13818; 2026-02-21T12:44:28.5735065Z mov.b32 %r13889, %r13818; 2026-02-21T12:44:28.5735235Z mov.b32 %r13890, %r13818; 2026-02-21T12:44:28.5735400Z mov.b32 %r13891, %r13818; 2026-02-21T12:44:28.5735570Z mov.b32 %r13892, %r13818; 2026-02-21T12:44:28.5735751Z mov.b32 %r13893, %r13818; 2026-02-21T12:44:28.5735924Z mov.b32 %r13894, %r13818; 2026-02-21T12:44:28.5736090Z mov.b32 %r13895, %r13818; 2026-02-21T12:44:28.5736261Z mov.b32 %r13896, %r13818; 2026-02-21T12:44:28.5736433Z mov.b32 %r13897, %r13818; 2026-02-21T12:44:28.5736725Z mov.b32 %r13898, %r13818; 2026-02-21T12:44:28.5736898Z mov.b32 %r13899, %r13818; 2026-02-21T12:44:28.5737075Z mov.b32 %r13900, %r13818; 2026-02-21T12:44:28.5737251Z mov.b32 %r13901, %r13818; 2026-02-21T12:44:28.5737433Z mov.b32 %r13902, %r13818; 2026-02-21T12:44:28.5737607Z mov.b32 %r13903, %r13818; 
2026-02-21T12:44:28.5737852Z mov.b32 %r13904, %r13818; 2026-02-21T12:44:28.5738085Z mov.b32 %r13905, %r13818; 2026-02-21T12:44:28.5738255Z mov.b32 %r13906, %r13818; 2026-02-21T12:44:28.5738428Z mov.b32 %r13907, %r13818; 2026-02-21T12:44:28.5738599Z mov.b32 %r13908, %r13818; 2026-02-21T12:44:28.5738766Z mov.b32 %r13909, %r13818; 2026-02-21T12:44:28.5738950Z mov.b32 %r13910, %r13818; 2026-02-21T12:44:28.5739119Z mov.b32 %r13911, %r13818; 2026-02-21T12:44:28.5739292Z mov.b32 %r13912, %r13818; 2026-02-21T12:44:28.5739460Z mov.b32 %r13913, %r13818; 2026-02-21T12:44:28.5739629Z mov.b32 %r13914, %r13818; 2026-02-21T12:44:28.5739797Z mov.b32 %r13915, %r13818; 2026-02-21T12:44:28.5739970Z mov.b32 %r13916, %r13818; 2026-02-21T12:44:28.5740138Z mov.b32 %r13917, %r13818; 2026-02-21T12:44:28.5740317Z mov.b32 %r13918, %r13818; 2026-02-21T12:44:28.5740489Z mov.b32 %r13919, %r13818; 2026-02-21T12:44:28.5740659Z mov.b32 %r13920, %r13818; 2026-02-21T12:44:28.5740833Z mov.b32 %r13921, %r13818; 2026-02-21T12:44:28.5741002Z mov.b32 %r13922, %r13818; 2026-02-21T12:44:28.5741193Z mov.b32 %r13923, %r13818; 2026-02-21T12:44:28.5741367Z mov.b32 %r13924, %r13818; 2026-02-21T12:44:28.5741538Z mov.b32 %r13925, %r13818; 2026-02-21T12:44:28.5741706Z mov.b32 %r13926, %r13818; 2026-02-21T12:44:28.5741880Z mov.b32 %r13927, %r13818; 2026-02-21T12:44:28.5742045Z mov.b32 %r13928, %r13818; 2026-02-21T12:44:28.5742221Z mov.b32 %r13929, %r13818; 2026-02-21T12:44:28.5742393Z mov.b32 %r13930, %r13818; 2026-02-21T12:44:28.5742558Z mov.b32 %r13931, %r13818; 2026-02-21T12:44:28.5742727Z mov.b32 %r13932, %r13818; 2026-02-21T12:44:28.5742892Z mov.b32 %r13933, %r13818; 2026-02-21T12:44:28.5743062Z mov.b32 %r13934, %r13818; 2026-02-21T12:44:28.5743227Z mov.b32 %r13935, %r13818; 2026-02-21T12:44:28.5743396Z mov.b32 %r13936, %r13818; 2026-02-21T12:44:28.5743562Z mov.b32 %r13937, %r13818; 2026-02-21T12:44:28.5743735Z mov.b32 %r13938, %r13818; 2026-02-21T12:44:28.5743899Z mov.b32 %r13939, %r13818; 
2026-02-21T12:44:28.5744074Z mov.b32 %r13940, %r13818; 2026-02-21T12:44:28.5744247Z mov.b32 %r13941, %r13818; 2026-02-21T12:44:28.5744414Z mov.b32 %r13942, %r13818; 2026-02-21T12:44:28.5744584Z mov.b32 %r13943, %r13818; 2026-02-21T12:44:28.5744751Z mov.b32 %r13944, %r13818; 2026-02-21T12:44:28.5744923Z mov.b32 %r13945, %r13818; 2026-02-21T12:44:28.5745145Z $L__BB0_6: // Parent Loop BB0_2 Depth=1 2026-02-21T12:44:28.5745452Z // => This Inner Loop Header: Depth=2 2026-02-21T12:44:28.5745874Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.5746241Z add.s64 %rd193, %rd650, -32; 2026-02-21T12:44:28.5746432Z // begin inline asm 2026-02-21T12:44:28.5746724Z mov.u64 %rd192, 0x0; 2026-02-21T12:44:28.5746979Z createpolicy.fractional.L2::evict_last.b64 %rd192, 1.0; 2026-02-21T12:44:28.5747333Z // end inline asm 2026-02-21T12:44:28.5747493Z // begin inline asm 2026-02-21T12:44:28.5747718Z mov.u32 %r1571, 0x0; 2026-02-21T12:44:28.5747885Z mov.u32 %r1572, 0x0; 2026-02-21T12:44:28.5748180Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r1571, %r1572 }, [ %rd193 + 0 ], %rd192; 2026-02-21T12:44:28.5748592Z // end inline asm 2026-02-21T12:44:28.5748909Z .loc 1 55 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:55:32 2026-02-21T12:44:28.5749262Z bar.sync 0; 2026-02-21T12:44:28.5749438Z st.shared.v2.b32 [%r10], {%r1571, %r1572}; 2026-02-21T12:44:28.5749650Z bar.sync 0; 2026-02-21T12:44:28.5749811Z ld.shared.b16 %rs1, [%r11]; 2026-02-21T12:44:28.5750003Z ld.shared.b16 %rs2, [%r11+128]; 2026-02-21T12:44:28.5750206Z ld.shared.b16 %rs3, [%r11+8]; 2026-02-21T12:44:28.5750405Z ld.shared.b16 %rs4, [%r11+136]; 2026-02-21T12:44:28.5750598Z cvt.f32.bf16 %r1830, %rs1; 2026-02-21T12:44:28.5750794Z cvt.f32.bf16 %r1831, %rs2; 2026-02-21T12:44:28.5750981Z cvt.f32.bf16 %r1832, %rs3; 2026-02-21T12:44:28.5751164Z cvt.f32.bf16 %r1833, %rs4; 2026-02-21T12:44:28.5751618Z .loc 1 57 87 // 
cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:87 2026-02-21T12:44:28.5751992Z // begin inline asm 2026-02-21T12:44:28.5752162Z mov.u32 %r1573, 0x0; 2026-02-21T12:44:28.5752350Z ld.global.b32 { %r1573 }, [ %rd649 + 0 ]; 2026-02-21T12:44:28.5752560Z // end inline asm 2026-02-21T12:44:28.5752864Z .loc 1 65 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:65:28 2026-02-21T12:44:28.5753221Z bar.sync 0; 2026-02-21T12:44:28.5753374Z st.shared.b8 [%r12], %r1573; 2026-02-21T12:44:28.5753575Z prmt.b32 %r3146, %r1573, 0, 0x7771U; 2026-02-21T12:44:28.5753779Z st.shared.b8 [%r13+256], %r3146; 2026-02-21T12:44:28.5753982Z prmt.b32 %r3147, %r1573, 0, 0x7772U; 2026-02-21T12:44:28.5754182Z st.shared.b8 [%r14+512], %r3147; 2026-02-21T12:44:28.5754382Z prmt.b32 %r3148, %r1573, 0, 0x7773U; 2026-02-21T12:44:28.5754579Z st.shared.b8 [%r15+768], %r3148; 2026-02-21T12:44:28.5754768Z bar.sync 0; 2026-02-21T12:44:28.5754934Z ld.shared.b32 %r3149, [%r16]; 2026-02-21T12:44:28.5755119Z prmt.b32 %r3150, %r3149, 0, 0x7771U; 2026-02-21T12:44:28.5755321Z cvt.u16.u32 %rs5, %r3150; 2026-02-21T12:44:28.5755499Z prmt.b32 %r3151, %r3149, 0, 0x7770U; 2026-02-21T12:44:28.5755698Z cvt.u16.u32 %rs6, %r3151; 2026-02-21T12:44:28.5755874Z prmt.b32 %r3152, %r3149, 0, 0x7773U; 2026-02-21T12:44:28.5756074Z cvt.u16.u32 %rs7, %r3152; 2026-02-21T12:44:28.5756248Z prmt.b32 %r3153, %r3149, 0, 0x7772U; 2026-02-21T12:44:28.5756445Z cvt.u16.u32 %rs8, %r3153; 2026-02-21T12:44:28.5756919Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.5757275Z shl.b16 %rs9, %rs6, 4; 2026-02-21T12:44:28.5757454Z shl.b16 %rs10, %rs5, 4; 2026-02-21T12:44:28.5757765Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.5758128Z cvt.u32.u16 %r3154, %rs9; 2026-02-21T12:44:28.5758316Z prmt.b32 %r3155, %r3154, %r3156, 0x3340U; 2026-02-21T12:44:28.5758546Z prmt.b32 %r3160, %r3155, %r3157, 0x5410U; 2026-02-21T12:44:28.5758778Z 
prmt.b32 %r3161, %r3160, %r3149, 0x5040U; 2026-02-21T12:44:28.5758998Z prmt.b32 %r3162, %r3161, 0, 0x9991U; 2026-02-21T12:44:28.5759201Z cvt.u16.u32 %rs11, %r3162; 2026-02-21T12:44:28.5759379Z shr.s16 %rs12, %rs11, 4; 2026-02-21T12:44:28.5759561Z prmt.b32 %r3163, %r3161, 0, 0xbbb3U; 2026-02-21T12:44:28.5759756Z cvt.u16.u32 %rs13, %r3163; 2026-02-21T12:44:28.5759938Z shr.s16 %rs14, %rs13, 4; 2026-02-21T12:44:28.5760111Z cvt.s16.s8 %rs15, %rs9; 2026-02-21T12:44:28.5760300Z shr.s16 %rs16, %rs15, 4; 2026-02-21T12:44:28.5760474Z cvt.s16.s8 %rs17, %rs10; 2026-02-21T12:44:28.5760651Z shr.s16 %rs18, %rs17, 4; 2026-02-21T12:44:28.5760966Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.5761406Z cvt.rn.f32.s16 %r3164, %rs14; 2026-02-21T12:44:28.5761597Z cvt.rn.f32.s16 %r3165, %rs12; 2026-02-21T12:44:28.5761884Z cvt.rn.f32.s16 %r3166, %rs18; 2026-02-21T12:44:28.5762089Z cvt.rn.f32.s16 %r3167, %rs16; 2026-02-21T12:44:28.5762411Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.5762773Z shl.b16 %rs19, %rs8, 4; 2026-02-21T12:44:28.5762942Z shl.b16 %rs20, %rs7, 4; 2026-02-21T12:44:28.5763253Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.5763624Z prmt.b32 %r3168, %r3149, %r3169, 0x3020U; 2026-02-21T12:44:28.5763838Z prmt.b32 %r3170, %r3168, 0, 0x9991U; 2026-02-21T12:44:28.5764044Z cvt.u16.u32 %rs21, %r3170; 2026-02-21T12:44:28.5764221Z shr.s16 %rs22, %rs21, 4; 2026-02-21T12:44:28.5764415Z cvt.s16.s8 %rs23, %rs19; 2026-02-21T12:44:28.5764587Z shr.s16 %rs24, %rs23, 4; 2026-02-21T12:44:28.5764766Z cvt.s16.s8 %rs25, %rs20; 2026-02-21T12:44:28.5764939Z shr.s16 %rs26, %rs25, 4; 2026-02-21T12:44:28.5765194Z prmt.b32 %r3171, %r3149, 0, 0xbbb3U; 2026-02-21T12:44:28.5765459Z cvt.u16.u32 %rs27, %r3171; 2026-02-21T12:44:28.5765637Z shr.s16 %rs28, %rs27, 4; 2026-02-21T12:44:28.5765948Z .loc 1 80 32 // 
cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.5766299Z cvt.rn.f32.s16 %r3172, %rs22; 2026-02-21T12:44:28.5766601Z cvt.rn.f32.s16 %r3173, %rs28; 2026-02-21T12:44:28.5766789Z cvt.rn.f32.s16 %r3174, %rs26; 2026-02-21T12:44:28.5766972Z cvt.rn.f32.s16 %r3175, %rs24; 2026-02-21T12:44:28.5767141Z bar.sync 0; 2026-02-21T12:44:28.5767341Z st.shared.v4.b32 [%r17], {%r3167, %r3165, %r3166, %r3164}; 2026-02-21T12:44:28.5767645Z st.shared.v4.b32 [%r18], {%r3175, %r3172, %r3174, %r3173}; 2026-02-21T12:44:28.5767898Z $L__tmp1: 2026-02-21T12:44:28.5768260Z .loc 2 291 36 // standard.py:291:36 @[ cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:87:40 ] 2026-02-21T12:44:28.5768679Z // begin inline asm 2026-02-21T12:44:28.5768861Z fence.proxy.async.shared::cta; 2026-02-21T12:44:28.5769054Z // end inline asm 2026-02-21T12:44:28.5769211Z bar.sync 0; 2026-02-21T12:44:28.5769373Z shfl.sync.idx.b32 %r3176, %r2, 0, 31, -1; 2026-02-21T12:44:28.5769605Z wgmma.fence.sync.aligned; 2026-02-21T12:44:28.5769794Z mov.pred %p3, -1; 2026-02-21T12:44:28.5769952Z // begin inline asm 2026-02-21T12:44:28.5772800Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r13818,%r13819,%r13820,%r13821,%r13822,%r13823,%r13824,%r13825,%r13826,%r13827,%r13828,%r13829,%r13830,%r13831,%r13832,%r13833,%r13834,%r13835,%r13836,%r13837,%r13838,%r13839,%r13840,%r13841,%r13842,%r13843,%r13844,%r13845,%r13846,%r13847,%r13848,%r13849,%r13850,%r13851,%r13852,%r13853,%r13854,%r13855,%r13856,%r13857,%r13858,%r13859,%r13860,%r13861,%r13862,%r13863,%r13864,%r13865,%r13866,%r13867,%r13868,%r13869,%r13870,%r13871,%r13872,%r13873,%r13874,%r13875,%r13876,%r13877,%r13878,%r13879,%r13880,%r13881,%r13882,%r13883,%r13884,%r13885,%r13886,%r13887,%r13888,%r13889,%r13890,%r13891,%r13892,%r13893,%r13894,%r13895,%r13896,%r13897,%r13898,%r13899,%r13900,%r13901,%r13902,%r13903,%r13904,%r13905,%r13906,%r13907,%r13908,%r13909,%r13910,%r13911,%r13912,%r13913,%r13914,%r13915,%r13916,%r13917,%r13918,%r13919,%r13920,%r13921,%r13922,%r13923,%r13924,%r13925,%r13926,%r13927,%r13928,%r13929,%r13930,%r13931,%r13932,%r13933,%r13934,%r13935,%r13936,%r13937,%r13938,%r13939,%r13940,%r13941,%r13942,%r13943,%r13944,%r13945}, {%r1830,%r1831,%r1832,%r1833}, %rd466, %p3, 1, 1; 2026-02-21T12:44:28.5775852Z // end inline asm 2026-02-21T12:44:28.5776029Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:28.5776229Z mov.b32 %r3623, 0; 2026-02-21T12:44:28.5776397Z mov.b32 %r1962, %r11895; 2026-02-21T12:44:28.5776693Z mov.b32 %r1963, %r3623; 2026-02-21T12:44:28.5776869Z mov.b32 %r1964, %r3623; 2026-02-21T12:44:28.5777031Z // begin inline asm 2026-02-21T12:44:28.5779751Z // wait for regs: 
%r13818,%r13819,%r13820,%r13821,%r13822,%r13823,%r13824,%r13825,%r13826,%r13827,%r13828,%r13829,%r13830,%r13831,%r13832,%r13833,%r13834,%r13835,%r13836,%r13837,%r13838,%r13839,%r13840,%r13841,%r13842,%r13843,%r13844,%r13845,%r13846,%r13847,%r13848,%r13849,%r13850,%r13851,%r13852,%r13853,%r13854,%r13855,%r13856,%r13857,%r13858,%r13859,%r13860,%r13861,%r13862,%r13863,%r13864,%r13865,%r13866,%r13867,%r13868,%r13869,%r13870,%r13871,%r13872,%r13873,%r13874,%r13875,%r13876,%r13877,%r13878,%r13879,%r13880,%r13881,%r13882,%r13883,%r13884,%r13885,%r13886,%r13887,%r13888,%r13889,%r13890,%r13891,%r13892,%r13893,%r13894,%r13895,%r13896,%r13897,%r13898,%r13899,%r13900,%r13901,%r13902,%r13903,%r13904,%r13905,%r13906,%r13907,%r13908,%r13909,%r13910,%r13911,%r13912,%r13913,%r13914,%r13915,%r13916,%r13917,%r13918,%r13919,%r13920,%r13921,%r13922,%r13923,%r13924,%r13925,%r13926,%r13927,%r13928,%r13929,%r13930,%r13931,%r13932,%r13933,%r13934,%r13935,%r13936,%r13937,%r13938,%r13939,%r13940,%r13941,%r13942,%r13943,%r13944,%r13945,%r1962,%r1963,%r1964 2026-02-21T12:44:28.5782610Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:28.5782805Z // end inline asm 2026-02-21T12:44:28.5783069Z $L__tmp2: 2026-02-21T12:44:28.5783386Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.5797971Z add.s64 %rd198, %rd650, -16; 2026-02-21T12:44:28.5798265Z // begin inline asm 2026-02-21T12:44:28.5798459Z mov.u64 %rd197, 0x0; 2026-02-21T12:44:28.5798709Z createpolicy.fractional.L2::evict_last.b64 %rd197, 1.0; 2026-02-21T12:44:28.5798991Z // end inline asm 2026-02-21T12:44:28.5799156Z // begin inline asm 2026-02-21T12:44:28.5799331Z mov.u32 %r2096, 0x0; 2026-02-21T12:44:28.5799501Z mov.u32 %r2097, 0x0; 2026-02-21T12:44:28.5799793Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r2096, %r2097 }, [ %rd198 + 0 ], %rd197; 2026-02-21T12:44:28.5800141Z // end inline asm 2026-02-21T12:44:28.5800456Z .loc 1 55 32 // 
cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:55:32 2026-02-21T12:44:28.5800855Z bar.sync 0; 2026-02-21T12:44:28.5801033Z st.shared.v2.b32 [%r10], {%r2096, %r2097}; 2026-02-21T12:44:28.5801264Z bar.sync 0; 2026-02-21T12:44:28.5801431Z ld.shared.b16 %rs29, [%r11]; 2026-02-21T12:44:28.5801627Z ld.shared.b16 %rs30, [%r11+128]; 2026-02-21T12:44:28.5801833Z ld.shared.b16 %rs31, [%r11+8]; 2026-02-21T12:44:28.5802038Z ld.shared.b16 %rs32, [%r11+136]; 2026-02-21T12:44:28.5802248Z cvt.f32.bf16 %r2355, %rs29; 2026-02-21T12:44:28.5802432Z cvt.f32.bf16 %r2356, %rs30; 2026-02-21T12:44:28.5802614Z cvt.f32.bf16 %r2357, %rs31; 2026-02-21T12:44:28.5802791Z cvt.f32.bf16 %r2358, %rs32; 2026-02-21T12:44:28.5803128Z .loc 1 57 87 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:87 2026-02-21T12:44:28.5803513Z add.s64 %rd200, %rd649, 5120; 2026-02-21T12:44:28.5803723Z // begin inline asm 2026-02-21T12:44:28.5803904Z mov.u32 %r2098, 0x0; 2026-02-21T12:44:28.5804090Z ld.global.b32 { %r2098 }, [ %rd200 + 0 ]; 2026-02-21T12:44:28.5804313Z // end inline asm 2026-02-21T12:44:28.5804629Z .loc 1 65 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:65:28 2026-02-21T12:44:28.5804999Z bar.sync 0; 2026-02-21T12:44:28.5805160Z st.shared.b8 [%r12], %r2098; 2026-02-21T12:44:28.5805364Z prmt.b32 %r3177, %r2098, 0, 0x7771U; 2026-02-21T12:44:28.5805574Z st.shared.b8 [%r13+256], %r3177; 2026-02-21T12:44:28.5805778Z prmt.b32 %r3178, %r2098, 0, 0x7772U; 2026-02-21T12:44:28.5805995Z st.shared.b8 [%r14+512], %r3178; 2026-02-21T12:44:28.5806189Z prmt.b32 %r3179, %r2098, 0, 0x7773U; 2026-02-21T12:44:28.5806393Z st.shared.b8 [%r15+768], %r3179; 2026-02-21T12:44:28.5806764Z bar.sync 0; 2026-02-21T12:44:28.5806923Z ld.shared.b32 %r3180, [%r16]; 2026-02-21T12:44:28.5807128Z prmt.b32 %r3181, %r3180, 0, 0x7771U; 2026-02-21T12:44:28.5807338Z cvt.u16.u32 %rs33, %r3181; 2026-02-21T12:44:28.5807519Z prmt.b32 %r3182, %r3180, 0, 0x7770U; 2026-02-21T12:44:28.5807907Z cvt.u16.u32 %rs34, 
%r3182; 2026-02-21T12:44:28.5808091Z prmt.b32 %r3183, %r3180, 0, 0x7773U; 2026-02-21T12:44:28.5808291Z cvt.u16.u32 %rs35, %r3183; 2026-02-21T12:44:28.5808564Z prmt.b32 %r3184, %r3180, 0, 0x7772U; 2026-02-21T12:44:28.5808759Z cvt.u16.u32 %rs36, %r3184; 2026-02-21T12:44:28.5809085Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.5809442Z shl.b16 %rs37, %rs34, 4; 2026-02-21T12:44:28.5809628Z shl.b16 %rs38, %rs33, 4; 2026-02-21T12:44:28.5809938Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.5810299Z cvt.u32.u16 %r3185, %rs37; 2026-02-21T12:44:28.5810499Z prmt.b32 %r3186, %r3185, %r3187, 0x3340U; 2026-02-21T12:44:28.5810738Z prmt.b32 %r3188, %r3186, %r3157, 0x5410U; 2026-02-21T12:44:28.5810980Z prmt.b32 %r3189, %r3188, %r3180, 0x5040U; 2026-02-21T12:44:28.5811213Z prmt.b32 %r3190, %r3189, 0, 0x9991U; 2026-02-21T12:44:28.5811429Z cvt.u16.u32 %rs39, %r3190; 2026-02-21T12:44:28.5811609Z shr.s16 %rs40, %rs39, 4; 2026-02-21T12:44:28.5811887Z prmt.b32 %r3191, %r3189, 0, 0xbbb3U; 2026-02-21T12:44:28.5812156Z cvt.u16.u32 %rs41, %r3191; 2026-02-21T12:44:28.5812344Z shr.s16 %rs42, %rs41, 4; 2026-02-21T12:44:28.5812522Z cvt.s16.s8 %rs43, %rs37; 2026-02-21T12:44:28.5812694Z shr.s16 %rs44, %rs43, 4; 2026-02-21T12:44:28.5812868Z cvt.s16.s8 %rs45, %rs38; 2026-02-21T12:44:28.5813053Z shr.s16 %rs46, %rs45, 4; 2026-02-21T12:44:28.5813392Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.5813768Z cvt.rn.f32.s16 %r3192, %rs42; 2026-02-21T12:44:28.5813964Z cvt.rn.f32.s16 %r3193, %rs40; 2026-02-21T12:44:28.5814152Z cvt.rn.f32.s16 %r3194, %rs46; 2026-02-21T12:44:28.5814336Z cvt.rn.f32.s16 %r3195, %rs44; 2026-02-21T12:44:28.5814668Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.5815031Z shl.b16 %rs47, %rs36, 4; 2026-02-21T12:44:28.5815215Z shl.b16 %rs48, %rs35, 4; 2026-02-21T12:44:28.5815530Z 
.loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.5815911Z prmt.b32 %r3196, %r3180, %r3197, 0x3020U; 2026-02-21T12:44:28.5816132Z prmt.b32 %r3198, %r3196, 0, 0x9991U; 2026-02-21T12:44:28.5816341Z cvt.u16.u32 %rs49, %r3198; 2026-02-21T12:44:28.5816671Z shr.s16 %rs50, %rs49, 4; 2026-02-21T12:44:28.5816853Z cvt.s16.s8 %rs51, %rs47; 2026-02-21T12:44:28.5817029Z shr.s16 %rs52, %rs51, 4; 2026-02-21T12:44:28.5817198Z cvt.s16.s8 %rs53, %rs48; 2026-02-21T12:44:28.5817386Z shr.s16 %rs54, %rs53, 4; 2026-02-21T12:44:28.5817563Z prmt.b32 %r3199, %r3180, 0, 0xbbb3U; 2026-02-21T12:44:28.5817769Z cvt.u16.u32 %rs55, %r3199; 2026-02-21T12:44:28.5817947Z shr.s16 %rs56, %rs55, 4; 2026-02-21T12:44:28.5818286Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.5818646Z cvt.rn.f32.s16 %r3200, %rs50; 2026-02-21T12:44:28.5818839Z cvt.rn.f32.s16 %r3201, %rs56; 2026-02-21T12:44:28.5819030Z cvt.rn.f32.s16 %r3202, %rs54; 2026-02-21T12:44:28.5819211Z cvt.rn.f32.s16 %r3203, %rs52; 2026-02-21T12:44:28.5819391Z bar.sync 0; 2026-02-21T12:44:28.5819592Z st.shared.v4.b32 [%r17], {%r3195, %r3193, %r3194, %r3192}; 2026-02-21T12:44:28.5819893Z st.shared.v4.b32 [%r18], {%r3203, %r3200, %r3202, %r3201}; 2026-02-21T12:44:28.5820137Z $L__tmp3: 2026-02-21T12:44:28.5820506Z .loc 2 291 36 // standard.py:291:36 @[ cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:87:40 ] 2026-02-21T12:44:28.5820932Z // begin inline asm 2026-02-21T12:44:28.5821121Z fence.proxy.async.shared::cta; 2026-02-21T12:44:28.5821318Z // end inline asm 2026-02-21T12:44:28.5821471Z bar.sync 0; 2026-02-21T12:44:28.5821640Z wgmma.fence.sync.aligned; 2026-02-21T12:44:28.5821820Z // begin inline asm 2026-02-21T12:44:28.5824760Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r13818,%r13819,%r13820,%r13821,%r13822,%r13823,%r13824,%r13825,%r13826,%r13827,%r13828,%r13829,%r13830,%r13831,%r13832,%r13833,%r13834,%r13835,%r13836,%r13837,%r13838,%r13839,%r13840,%r13841,%r13842,%r13843,%r13844,%r13845,%r13846,%r13847,%r13848,%r13849,%r13850,%r13851,%r13852,%r13853,%r13854,%r13855,%r13856,%r13857,%r13858,%r13859,%r13860,%r13861,%r13862,%r13863,%r13864,%r13865,%r13866,%r13867,%r13868,%r13869,%r13870,%r13871,%r13872,%r13873,%r13874,%r13875,%r13876,%r13877,%r13878,%r13879,%r13880,%r13881,%r13882,%r13883,%r13884,%r13885,%r13886,%r13887,%r13888,%r13889,%r13890,%r13891,%r13892,%r13893,%r13894,%r13895,%r13896,%r13897,%r13898,%r13899,%r13900,%r13901,%r13902,%r13903,%r13904,%r13905,%r13906,%r13907,%r13908,%r13909,%r13910,%r13911,%r13912,%r13913,%r13914,%r13915,%r13916,%r13917,%r13918,%r13919,%r13920,%r13921,%r13922,%r13923,%r13924,%r13925,%r13926,%r13927,%r13928,%r13929,%r13930,%r13931,%r13932,%r13933,%r13934,%r13935,%r13936,%r13937,%r13938,%r13939,%r13940,%r13941,%r13942,%r13943,%r13944,%r13945}, {%r2355,%r2356,%r2357,%r2358}, %rd466, %p3, 1, 1; 2026-02-21T12:44:28.5827997Z // end inline asm 2026-02-21T12:44:28.5828312Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:28.5828606Z mov.b32 %r2487, %r11895; 2026-02-21T12:44:28.5828796Z mov.b32 %r2488, %r3623; 2026-02-21T12:44:28.5828975Z mov.b32 %r2489, %r3623; 2026-02-21T12:44:28.5829155Z // begin inline asm 2026-02-21T12:44:28.5831775Z // wait for regs: 
%r13818,%r13819,%r13820,%r13821,%r13822,%r13823,%r13824,%r13825,%r13826,%r13827,%r13828,%r13829,%r13830,%r13831,%r13832,%r13833,%r13834,%r13835,%r13836,%r13837,%r13838,%r13839,%r13840,%r13841,%r13842,%r13843,%r13844,%r13845,%r13846,%r13847,%r13848,%r13849,%r13850,%r13851,%r13852,%r13853,%r13854,%r13855,%r13856,%r13857,%r13858,%r13859,%r13860,%r13861,%r13862,%r13863,%r13864,%r13865,%r13866,%r13867,%r13868,%r13869,%r13870,%r13871,%r13872,%r13873,%r13874,%r13875,%r13876,%r13877,%r13878,%r13879,%r13880,%r13881,%r13882,%r13883,%r13884,%r13885,%r13886,%r13887,%r13888,%r13889,%r13890,%r13891,%r13892,%r13893,%r13894,%r13895,%r13896,%r13897,%r13898,%r13899,%r13900,%r13901,%r13902,%r13903,%r13904,%r13905,%r13906,%r13907,%r13908,%r13909,%r13910,%r13911,%r13912,%r13913,%r13914,%r13915,%r13916,%r13917,%r13918,%r13919,%r13920,%r13921,%r13922,%r13923,%r13924,%r13925,%r13926,%r13927,%r13928,%r13929,%r13930,%r13931,%r13932,%r13933,%r13934,%r13935,%r13936,%r13937,%r13938,%r13939,%r13940,%r13941,%r13942,%r13943,%r13944,%r13945,%r2487,%r2488,%r2489 2026-02-21T12:44:28.5834591Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:28.5834796Z // end inline asm 2026-02-21T12:44:28.5834946Z $L__tmp4: 2026-02-21T12:44:28.5835260Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.5835623Z // begin inline asm 2026-02-21T12:44:28.5835792Z mov.u64 %rd202, 0x0; 2026-02-21T12:44:28.5836025Z createpolicy.fractional.L2::evict_last.b64 %rd202, 1.0; 2026-02-21T12:44:28.5836291Z // end inline asm 2026-02-21T12:44:28.5836593Z // begin inline asm 2026-02-21T12:44:28.5836768Z mov.u32 %r2621, 0x0; 2026-02-21T12:44:28.5836939Z mov.u32 %r2622, 0x0; 2026-02-21T12:44:28.5837234Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r2621, %r2622 }, [ %rd650 + 0 ], %rd202; 2026-02-21T12:44:28.5837578Z // end inline asm 2026-02-21T12:44:28.5837888Z .loc 1 55 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:55:32 2026-02-21T12:44:28.5838256Z bar.sync 0; 
2026-02-21T12:44:28.5838430Z st.shared.v2.b32 [%r10], {%r2621, %r2622}; 2026-02-21T12:44:28.5838646Z bar.sync 0; 2026-02-21T12:44:28.5838813Z ld.shared.b16 %rs57, [%r11]; 2026-02-21T12:44:28.5839013Z ld.shared.b16 %rs58, [%r11+128]; 2026-02-21T12:44:28.5839224Z ld.shared.b16 %rs59, [%r11+8]; 2026-02-21T12:44:28.5839434Z ld.shared.b16 %rs60, [%r11+136]; 2026-02-21T12:44:28.5839648Z cvt.f32.bf16 %r2880, %rs57; 2026-02-21T12:44:28.5839843Z cvt.f32.bf16 %r2881, %rs58; 2026-02-21T12:44:28.5840030Z cvt.f32.bf16 %r2882, %rs59; 2026-02-21T12:44:28.5840304Z cvt.f32.bf16 %r2883, %rs60; 2026-02-21T12:44:28.5840649Z .loc 1 57 87 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:87 2026-02-21T12:44:28.5841098Z add.s64 %rd205, %rd649, 10240; 2026-02-21T12:44:28.5841286Z // begin inline asm 2026-02-21T12:44:28.5841454Z mov.u32 %r2623, 0x0; 2026-02-21T12:44:28.5841630Z ld.global.b32 { %r2623 }, [ %rd205 + 0 ]; 2026-02-21T12:44:28.5841863Z // end inline asm 2026-02-21T12:44:28.5842172Z .loc 1 65 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:65:28 2026-02-21T12:44:28.5842534Z bar.sync 0; 2026-02-21T12:44:28.5842696Z st.shared.b8 [%r12], %r2623; 2026-02-21T12:44:28.5842903Z prmt.b32 %r3204, %r2623, 0, 0x7771U; 2026-02-21T12:44:28.5843119Z st.shared.b8 [%r13+256], %r3204; 2026-02-21T12:44:28.5843318Z prmt.b32 %r3205, %r2623, 0, 0x7772U; 2026-02-21T12:44:28.5843532Z st.shared.b8 [%r14+512], %r3205; 2026-02-21T12:44:28.5843727Z prmt.b32 %r3206, %r2623, 0, 0x7773U; 2026-02-21T12:44:28.5843937Z st.shared.b8 [%r15+768], %r3206; 2026-02-21T12:44:28.5844126Z bar.sync 0; 2026-02-21T12:44:28.5844360Z ld.shared.b32 %r3207, [%r16]; 2026-02-21T12:44:28.5844636Z prmt.b32 %r3208, %r3207, 0, 0x7771U; 2026-02-21T12:44:28.5844844Z cvt.u16.u32 %rs61, %r3208; 2026-02-21T12:44:28.5845036Z prmt.b32 %r3209, %r3207, 0, 0x7770U; 2026-02-21T12:44:28.5845233Z cvt.u16.u32 %rs62, %r3209; 2026-02-21T12:44:28.5845418Z prmt.b32 %r3210, %r3207, 0, 0x7773U; 2026-02-21T12:44:28.5845611Z 
cvt.u16.u32 %rs63, %r3210; 2026-02-21T12:44:28.5845795Z prmt.b32 %r3211, %r3207, 0, 0x7772U; 2026-02-21T12:44:28.5845987Z cvt.u16.u32 %rs64, %r3211; 2026-02-21T12:44:28.5846316Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.5846823Z shl.b16 %rs65, %rs62, 4; 2026-02-21T12:44:28.5847003Z shl.b16 %rs66, %rs61, 4; 2026-02-21T12:44:28.5847332Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.5847690Z cvt.u32.u16 %r3212, %rs65; 2026-02-21T12:44:28.5847908Z prmt.b32 %r3213, %r3212, %r3214, 0x3340U; 2026-02-21T12:44:28.5848140Z prmt.b32 %r3215, %r3213, %r3157, 0x5410U; 2026-02-21T12:44:28.5848371Z prmt.b32 %r3216, %r3215, %r3207, 0x5040U; 2026-02-21T12:44:28.5848593Z prmt.b32 %r3217, %r3216, 0, 0x9991U; 2026-02-21T12:44:28.5848807Z cvt.u16.u32 %rs67, %r3217; 2026-02-21T12:44:28.5848989Z shr.s16 %rs68, %rs67, 4; 2026-02-21T12:44:28.5849167Z prmt.b32 %r3218, %r3216, 0, 0xbbb3U; 2026-02-21T12:44:28.5849370Z cvt.u16.u32 %rs69, %r3218; 2026-02-21T12:44:28.5849545Z shr.s16 %rs70, %rs69, 4; 2026-02-21T12:44:28.5849726Z cvt.s16.s8 %rs71, %rs65; 2026-02-21T12:44:28.5849896Z shr.s16 %rs72, %rs71, 4; 2026-02-21T12:44:28.5850070Z cvt.s16.s8 %rs73, %rs66; 2026-02-21T12:44:28.5850238Z shr.s16 %rs74, %rs73, 4; 2026-02-21T12:44:28.5850570Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.5850940Z cvt.rn.f32.s16 %r3219, %rs70; 2026-02-21T12:44:28.5851126Z cvt.rn.f32.s16 %r3220, %rs68; 2026-02-21T12:44:28.5851318Z cvt.rn.f32.s16 %r3221, %rs74; 2026-02-21T12:44:28.5851498Z cvt.rn.f32.s16 %r3222, %rs72; 2026-02-21T12:44:28.5851822Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.5852187Z shl.b16 %rs75, %rs64, 4; 2026-02-21T12:44:28.5852365Z shl.b16 %rs76, %rs63, 4; 2026-02-21T12:44:28.5852677Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 
2026-02-21T12:44:28.5853046Z prmt.b32 %r3223, %r3207, %r3224, 0x3020U; 2026-02-21T12:44:28.5853264Z prmt.b32 %r3225, %r3223, 0, 0x9991U; 2026-02-21T12:44:28.5853462Z cvt.u16.u32 %rs77, %r3225; 2026-02-21T12:44:28.5853654Z shr.s16 %rs78, %rs77, 4; 2026-02-21T12:44:28.5853827Z cvt.s16.s8 %rs79, %rs75; 2026-02-21T12:44:28.5854004Z shr.s16 %rs80, %rs79, 4; 2026-02-21T12:44:28.5854259Z cvt.s16.s8 %rs81, %rs76; 2026-02-21T12:44:28.5854434Z shr.s16 %rs82, %rs81, 4; 2026-02-21T12:44:28.5854607Z prmt.b32 %r3226, %r3207, 0, 0xbbb3U; 2026-02-21T12:44:28.5854888Z cvt.u16.u32 %rs83, %r3226; 2026-02-21T12:44:28.5855070Z shr.s16 %rs84, %rs83, 4; 2026-02-21T12:44:28.5855386Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.5855740Z cvt.rn.f32.s16 %r3227, %rs78; 2026-02-21T12:44:28.5855924Z cvt.rn.f32.s16 %r3228, %rs84; 2026-02-21T12:44:28.5856098Z cvt.rn.f32.s16 %r3229, %rs82; 2026-02-21T12:44:28.5856274Z cvt.rn.f32.s16 %r3230, %rs80; 2026-02-21T12:44:28.5856442Z bar.sync 0; 2026-02-21T12:44:28.5856777Z st.shared.v4.b32 [%r17], {%r3222, %r3220, %r3221, %r3219}; 2026-02-21T12:44:28.5857072Z st.shared.v4.b32 [%r18], {%r3230, %r3227, %r3229, %r3228}; 2026-02-21T12:44:28.5857320Z $L__tmp5: 2026-02-21T12:44:28.5857696Z .loc 2 291 36 // standard.py:291:36 @[ cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:87:40 ] 2026-02-21T12:44:28.5858129Z // begin inline asm 2026-02-21T12:44:28.5858310Z fence.proxy.async.shared::cta; 2026-02-21T12:44:28.5858649Z // end inline asm 2026-02-21T12:44:28.5858809Z bar.sync 0; 2026-02-21T12:44:28.5858963Z wgmma.fence.sync.aligned; 2026-02-21T12:44:28.5859142Z // begin inline asm 2026-02-21T12:44:28.5861965Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r13818,%r13819,%r13820,%r13821,%r13822,%r13823,%r13824,%r13825,%r13826,%r13827,%r13828,%r13829,%r13830,%r13831,%r13832,%r13833,%r13834,%r13835,%r13836,%r13837,%r13838,%r13839,%r13840,%r13841,%r13842,%r13843,%r13844,%r13845,%r13846,%r13847,%r13848,%r13849,%r13850,%r13851,%r13852,%r13853,%r13854,%r13855,%r13856,%r13857,%r13858,%r13859,%r13860,%r13861,%r13862,%r13863,%r13864,%r13865,%r13866,%r13867,%r13868,%r13869,%r13870,%r13871,%r13872,%r13873,%r13874,%r13875,%r13876,%r13877,%r13878,%r13879,%r13880,%r13881,%r13882,%r13883,%r13884,%r13885,%r13886,%r13887,%r13888,%r13889,%r13890,%r13891,%r13892,%r13893,%r13894,%r13895,%r13896,%r13897,%r13898,%r13899,%r13900,%r13901,%r13902,%r13903,%r13904,%r13905,%r13906,%r13907,%r13908,%r13909,%r13910,%r13911,%r13912,%r13913,%r13914,%r13915,%r13916,%r13917,%r13918,%r13919,%r13920,%r13921,%r13922,%r13923,%r13924,%r13925,%r13926,%r13927,%r13928,%r13929,%r13930,%r13931,%r13932,%r13933,%r13934,%r13935,%r13936,%r13937,%r13938,%r13939,%r13940,%r13941,%r13942,%r13943,%r13944,%r13945}, {%r2880,%r2881,%r2882,%r2883}, %rd466, %p3, 1, 1; 2026-02-21T12:44:28.5864959Z // end inline asm 2026-02-21T12:44:28.5865133Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:28.5865339Z mov.b32 %r3013, %r3623; 2026-02-21T12:44:28.5865506Z mov.b32 %r3012, %r11895; 2026-02-21T12:44:28.5865675Z mov.b32 %r3014, %r3623; 2026-02-21T12:44:28.5865833Z // begin inline asm 2026-02-21T12:44:28.5868635Z // wait for regs: 
%r13818,%r13819,%r13820,%r13821,%r13822,%r13823,%r13824,%r13825,%r13826,%r13827,%r13828,%r13829,%r13830,%r13831,%r13832,%r13833,%r13834,%r13835,%r13836,%r13837,%r13838,%r13839,%r13840,%r13841,%r13842,%r13843,%r13844,%r13845,%r13846,%r13847,%r13848,%r13849,%r13850,%r13851,%r13852,%r13853,%r13854,%r13855,%r13856,%r13857,%r13858,%r13859,%r13860,%r13861,%r13862,%r13863,%r13864,%r13865,%r13866,%r13867,%r13868,%r13869,%r13870,%r13871,%r13872,%r13873,%r13874,%r13875,%r13876,%r13877,%r13878,%r13879,%r13880,%r13881,%r13882,%r13883,%r13884,%r13885,%r13886,%r13887,%r13888,%r13889,%r13890,%r13891,%r13892,%r13893,%r13894,%r13895,%r13896,%r13897,%r13898,%r13899,%r13900,%r13901,%r13902,%r13903,%r13904,%r13905,%r13906,%r13907,%r13908,%r13909,%r13910,%r13911,%r13912,%r13913,%r13914,%r13915,%r13916,%r13917,%r13918,%r13919,%r13920,%r13921,%r13922,%r13923,%r13924,%r13925,%r13926,%r13927,%r13928,%r13929,%r13930,%r13931,%r13932,%r13933,%r13934,%r13935,%r13936,%r13937,%r13938,%r13939,%r13940,%r13941,%r13942,%r13943,%r13944,%r13945,%r3012,%r3013,%r3014 2026-02-21T12:44:28.5871451Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:28.5871644Z // end inline asm 2026-02-21T12:44:28.5871901Z $L__tmp6: 2026-02-21T12:44:28.5872208Z .loc 1 43 126 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:43:126 2026-02-21T12:44:28.5872662Z add.s64 %rd651, %rd651, 12; 2026-02-21T12:44:28.5872849Z add.s64 %rd650, %rd650, 48; 2026-02-21T12:44:28.5873029Z add.s64 %rd649, %rd649, 15360; 2026-02-21T12:44:28.5873223Z setp.lt.u64 %p6, %rd651, 4080; 2026-02-21T12:44:28.5873409Z @%p6 bra $L__BB0_6; 2026-02-21T12:44:28.5873623Z // %bb.7: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:28.5874037Z .loc 1 34 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:34:32 2026-02-21T12:44:28.5874404Z or.b64 %rd228, %rd39, %rd5; 2026-02-21T12:44:28.5874591Z or.b64 %rd229, %rd39, %rd6; 2026-02-21T12:44:28.5874767Z or.b64 %rd230, %rd39, %rd7; 2026-02-21T12:44:28.5874942Z or.b64 %rd231, %rd39, %rd8; 
2026-02-21T12:44:28.5875114Z or.b64 %rd232, %rd39, %rd9; 2026-02-21T12:44:28.5875306Z or.b64 %rd233, %rd39, %rd10; 2026-02-21T12:44:28.5875492Z or.b64 %rd234, %rd39, %rd11; 2026-02-21T12:44:28.5875673Z or.b64 %rd235, %rd39, %rd12; 2026-02-21T12:44:28.5875917Z or.b64 %rd236, %rd39, %rd13; 2026-02-21T12:44:28.5876156Z or.b64 %rd237, %rd39, %rd14; 2026-02-21T12:44:28.5876333Z or.b64 %rd238, %rd39, %rd15; 2026-02-21T12:44:28.5876653Z or.b64 %rd239, %rd39, %rd16; 2026-02-21T12:44:28.5876837Z or.b64 %rd240, %rd39, %rd17; 2026-02-21T12:44:28.5877012Z or.b64 %rd241, %rd39, %rd18; 2026-02-21T12:44:28.5877188Z or.b64 %rd242, %rd39, %rd19; 2026-02-21T12:44:28.5877362Z or.b64 %rd243, %rd39, %rd20; 2026-02-21T12:44:28.5877687Z .loc 1 36 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:36:32 2026-02-21T12:44:28.5878043Z or.b64 %rd244, %rd40, %rd22; 2026-02-21T12:44:28.5878362Z .loc 1 51 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:32 2026-02-21T12:44:28.5878714Z shl.b64 %rd245, %rd42, 1; 2026-02-21T12:44:28.5878904Z add.s64 %rd208, %rd25, %rd245; 2026-02-21T12:44:28.5879236Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.5879589Z // begin inline asm 2026-02-21T12:44:28.5879752Z mov.u64 %rd207, 0x0; 2026-02-21T12:44:28.5879976Z createpolicy.fractional.L2::evict_last.b64 %rd207, 1.0; 2026-02-21T12:44:28.5880248Z // end inline asm 2026-02-21T12:44:28.5880399Z // begin inline asm 2026-02-21T12:44:28.5880557Z mov.u32 %r3231, 0x0; 2026-02-21T12:44:28.5880710Z mov.u32 %r3232, 0x0; 2026-02-21T12:44:28.5880998Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r3231, %r3232 }, [ %rd208 + 0 ], %rd207; 2026-02-21T12:44:28.5881334Z // end inline asm 2026-02-21T12:44:28.5881627Z .loc 1 55 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:55:32 2026-02-21T12:44:28.5881982Z bar.sync 0; 2026-02-21T12:44:28.5882148Z st.shared.v2.b32 [%r10], {%r3231, %r3232}; 2026-02-21T12:44:28.5882359Z bar.sync 0; 
2026-02-21T12:44:28.5882514Z ld.shared.b16 %rs85, [%r11]; 2026-02-21T12:44:28.5882704Z ld.shared.b16 %rs86, [%r11+128]; 2026-02-21T12:44:28.5882902Z ld.shared.b16 %rs87, [%r11+8]; 2026-02-21T12:44:28.5883096Z ld.shared.b16 %rs88, [%r11+136]; 2026-02-21T12:44:28.5883289Z cvt.f32.bf16 %r3490, %rs85; 2026-02-21T12:44:28.5883472Z cvt.f32.bf16 %r3491, %rs86; 2026-02-21T12:44:28.5883652Z cvt.f32.bf16 %r3492, %rs87; 2026-02-21T12:44:28.5883825Z cvt.f32.bf16 %r3493, %rs88; 2026-02-21T12:44:28.5884146Z .loc 1 57 34 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:34 2026-02-21T12:44:28.5884518Z add.s64 %rd246, %rd645, %rd41; 2026-02-21T12:44:28.5884710Z add.s64 %rd210, %rd246, 5237760; 2026-02-21T12:44:28.5885040Z .loc 1 57 87 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:87 2026-02-21T12:44:28.5885386Z // begin inline asm 2026-02-21T12:44:28.5885548Z mov.u32 %r3233, 0x0; 2026-02-21T12:44:28.5885725Z ld.global.b32 { %r3233 }, [ %rd210 + 0 ]; 2026-02-21T12:44:28.5886016Z // end inline asm 2026-02-21T12:44:28.5886314Z .loc 1 65 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:65:28 2026-02-21T12:44:28.5886878Z bar.sync 0; 2026-02-21T12:44:28.5887032Z st.shared.b8 [%r12], %r3233; 2026-02-21T12:44:28.5887226Z prmt.b32 %r3900, %r3233, 0, 0x7771U; 2026-02-21T12:44:28.5887434Z st.shared.b8 [%r13+256], %r3900; 2026-02-21T12:44:28.5887627Z prmt.b32 %r3901, %r3233, 0, 0x7772U; 2026-02-21T12:44:28.5887831Z st.shared.b8 [%r14+512], %r3901; 2026-02-21T12:44:28.5888020Z prmt.b32 %r3902, %r3233, 0, 0x7773U; 2026-02-21T12:44:28.5888221Z st.shared.b8 [%r15+768], %r3902; 2026-02-21T12:44:28.5888399Z bar.sync 0; 2026-02-21T12:44:28.5888552Z ld.shared.b32 %r3903, [%r16]; 2026-02-21T12:44:28.5888739Z prmt.b32 %r3904, %r3903, 0, 0x7771U; 2026-02-21T12:44:28.5888938Z cvt.u16.u32 %rs89, %r3904; 2026-02-21T12:44:28.5889119Z prmt.b32 %r3905, %r3903, 0, 0x7770U; 2026-02-21T12:44:28.5889327Z cvt.u16.u32 %rs90, %r3905; 2026-02-21T12:44:28.5889516Z prmt.b32 
%r3906, %r3903, 0, 0x7773U; 2026-02-21T12:44:28.5889710Z cvt.u16.u32 %rs91, %r3906; 2026-02-21T12:44:28.5890023Z prmt.b32 %r3907, %r3903, 0, 0x7772U; 2026-02-21T12:44:28.5890219Z cvt.u16.u32 %rs92, %r3907; 2026-02-21T12:44:28.5890538Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.5890898Z shl.b16 %rs93, %rs90, 4; 2026-02-21T12:44:28.5891073Z shl.b16 %rs94, %rs89, 4; 2026-02-21T12:44:28.5891385Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.5891740Z cvt.u32.u16 %r3908, %rs93; 2026-02-21T12:44:28.5891931Z prmt.b32 %r3909, %r3908, %r3910, 0x3340U; 2026-02-21T12:44:28.5892149Z prmt.b32 %r3914, %r3909, %r3911, 0x5410U; 2026-02-21T12:44:28.5892359Z prmt.b32 %r3915, %r3914, %r3903, 0x5040U; 2026-02-21T12:44:28.5892565Z prmt.b32 %r3916, %r3915, 0, 0x9991U; 2026-02-21T12:44:28.5892765Z cvt.u16.u32 %rs95, %r3916; 2026-02-21T12:44:28.5892938Z shr.s16 %rs96, %rs95, 4; 2026-02-21T12:44:28.5893116Z prmt.b32 %r3917, %r3915, 0, 0xbbb3U; 2026-02-21T12:44:28.5893315Z cvt.u16.u32 %rs97, %r3917; 2026-02-21T12:44:28.5893500Z shr.s16 %rs98, %rs97, 4; 2026-02-21T12:44:28.5893679Z cvt.s16.s8 %rs99, %rs93; 2026-02-21T12:44:28.5893847Z shr.s16 %rs100, %rs99, 4; 2026-02-21T12:44:28.5894025Z cvt.s16.s8 %rs101, %rs94; 2026-02-21T12:44:28.5894195Z shr.s16 %rs102, %rs101, 4; 2026-02-21T12:44:28.5894515Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.5894873Z cvt.rn.f32.s16 %r3918, %rs98; 2026-02-21T12:44:28.5895064Z cvt.rn.f32.s16 %r3919, %rs96; 2026-02-21T12:44:28.5895245Z cvt.rn.f32.s16 %r3920, %rs102; 2026-02-21T12:44:28.5895433Z cvt.rn.f32.s16 %r3921, %rs100; 2026-02-21T12:44:28.5895758Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.5896110Z shl.b16 %rs103, %rs92, 4; 2026-02-21T12:44:28.5896287Z shl.b16 %rs104, %rs91, 4; 2026-02-21T12:44:28.5896734Z .loc 1 62 25 // 
cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.5897108Z prmt.b32 %r3922, %r3903, %r3923, 0x3020U; 2026-02-21T12:44:28.5897319Z prmt.b32 %r3924, %r3922, 0, 0x9991U; 2026-02-21T12:44:28.5897531Z cvt.u16.u32 %rs105, %r3924; 2026-02-21T12:44:28.5897720Z shr.s16 %rs106, %rs105, 4; 2026-02-21T12:44:28.5897896Z cvt.s16.s8 %rs107, %rs103; 2026-02-21T12:44:28.5898076Z shr.s16 %rs108, %rs107, 4; 2026-02-21T12:44:28.5898246Z cvt.s16.s8 %rs109, %rs104; 2026-02-21T12:44:28.5898420Z shr.s16 %rs110, %rs109, 4; 2026-02-21T12:44:28.5898596Z prmt.b32 %r3925, %r3903, 0, 0xbbb3U; 2026-02-21T12:44:28.5898806Z cvt.u16.u32 %rs111, %r3925; 2026-02-21T12:44:28.5898983Z shr.s16 %rs112, %rs111, 4; 2026-02-21T12:44:28.5899300Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.5899744Z cvt.rn.f32.s16 %r3926, %rs106; 2026-02-21T12:44:28.5899929Z cvt.rn.f32.s16 %r3927, %rs112; 2026-02-21T12:44:28.5900118Z cvt.rn.f32.s16 %r3928, %rs110; 2026-02-21T12:44:28.5900366Z cvt.rn.f32.s16 %r3929, %rs108; 2026-02-21T12:44:28.5900559Z bar.sync 0; 2026-02-21T12:44:28.5900754Z st.shared.v4.b32 [%r17], {%r3921, %r3919, %r3920, %r3918}; 2026-02-21T12:44:28.5901050Z st.shared.v4.b32 [%r18], {%r3929, %r3926, %r3928, %r3927}; 2026-02-21T12:44:28.5901291Z $L__tmp7: 2026-02-21T12:44:28.5901654Z .loc 2 291 36 // standard.py:291:36 @[ cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:87:40 ] 2026-02-21T12:44:28.5902080Z // begin inline asm 2026-02-21T12:44:28.5902261Z fence.proxy.async.shared::cta; 2026-02-21T12:44:28.5902452Z // end inline asm 2026-02-21T12:44:28.5902601Z bar.sync 0; 2026-02-21T12:44:28.5902767Z shfl.sync.idx.b32 %r3930, %r2, 0, 31, -1; 2026-02-21T12:44:28.5903004Z wgmma.fence.sync.aligned; 2026-02-21T12:44:28.5903191Z // begin inline asm 2026-02-21T12:44:28.5906149Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r13818,%r13819,%r13820,%r13821,%r13822,%r13823,%r13824,%r13825,%r13826,%r13827,%r13828,%r13829,%r13830,%r13831,%r13832,%r13833,%r13834,%r13835,%r13836,%r13837,%r13838,%r13839,%r13840,%r13841,%r13842,%r13843,%r13844,%r13845,%r13846,%r13847,%r13848,%r13849,%r13850,%r13851,%r13852,%r13853,%r13854,%r13855,%r13856,%r13857,%r13858,%r13859,%r13860,%r13861,%r13862,%r13863,%r13864,%r13865,%r13866,%r13867,%r13868,%r13869,%r13870,%r13871,%r13872,%r13873,%r13874,%r13875,%r13876,%r13877,%r13878,%r13879,%r13880,%r13881,%r13882,%r13883,%r13884,%r13885,%r13886,%r13887,%r13888,%r13889,%r13890,%r13891,%r13892,%r13893,%r13894,%r13895,%r13896,%r13897,%r13898,%r13899,%r13900,%r13901,%r13902,%r13903,%r13904,%r13905,%r13906,%r13907,%r13908,%r13909,%r13910,%r13911,%r13912,%r13913,%r13914,%r13915,%r13916,%r13917,%r13918,%r13919,%r13920,%r13921,%r13922,%r13923,%r13924,%r13925,%r13926,%r13927,%r13928,%r13929,%r13930,%r13931,%r13932,%r13933,%r13934,%r13935,%r13936,%r13937,%r13938,%r13939,%r13940,%r13941,%r13942,%r13943,%r13944,%r13945}, {%r3490,%r3491,%r3492,%r3493}, %rd466, %p3, 1, 1; 2026-02-21T12:44:28.5909361Z // end inline asm 2026-02-21T12:44:28.5909538Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:28.5909741Z mov.b32 %r3622, %r11895; 2026-02-21T12:44:28.5909910Z mov.b32 %r3624, %r3623; 2026-02-21T12:44:28.5910078Z // begin inline asm 2026-02-21T12:44:28.5912687Z // wait for regs: 
%r13818,%r13819,%r13820,%r13821,%r13822,%r13823,%r13824,%r13825,%r13826,%r13827,%r13828,%r13829,%r13830,%r13831,%r13832,%r13833,%r13834,%r13835,%r13836,%r13837,%r13838,%r13839,%r13840,%r13841,%r13842,%r13843,%r13844,%r13845,%r13846,%r13847,%r13848,%r13849,%r13850,%r13851,%r13852,%r13853,%r13854,%r13855,%r13856,%r13857,%r13858,%r13859,%r13860,%r13861,%r13862,%r13863,%r13864,%r13865,%r13866,%r13867,%r13868,%r13869,%r13870,%r13871,%r13872,%r13873,%r13874,%r13875,%r13876,%r13877,%r13878,%r13879,%r13880,%r13881,%r13882,%r13883,%r13884,%r13885,%r13886,%r13887,%r13888,%r13889,%r13890,%r13891,%r13892,%r13893,%r13894,%r13895,%r13896,%r13897,%r13898,%r13899,%r13900,%r13901,%r13902,%r13903,%r13904,%r13905,%r13906,%r13907,%r13908,%r13909,%r13910,%r13911,%r13912,%r13913,%r13914,%r13915,%r13916,%r13917,%r13918,%r13919,%r13920,%r13921,%r13922,%r13923,%r13924,%r13925,%r13926,%r13927,%r13928,%r13929,%r13930,%r13931,%r13932,%r13933,%r13934,%r13935,%r13936,%r13937,%r13938,%r13939,%r13940,%r13941,%r13942,%r13943,%r13944,%r13945,%r3622,%r3623,%r3624 2026-02-21T12:44:28.5915487Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:28.5915681Z // end inline asm 2026-02-21T12:44:28.5915837Z $L__tmp8: 2026-02-21T12:44:28.5916130Z .loc 1 90 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:90:28 2026-02-21T12:44:28.5916640Z cvt.rn.bf16x2.f32 %r3931, %r13819, %r13818; 2026-02-21T12:44:28.5916881Z cvt.rn.bf16x2.f32 %r3932, %r13821, %r13820; 2026-02-21T12:44:28.5917109Z cvt.rn.bf16x2.f32 %r3933, %r13823, %r13822; 2026-02-21T12:44:28.5917429Z cvt.rn.bf16x2.f32 %r3934, %r13825, %r13824; 2026-02-21T12:44:28.5917654Z cvt.rn.bf16x2.f32 %r3935, %r13827, %r13826; 2026-02-21T12:44:28.5917958Z cvt.rn.bf16x2.f32 %r3936, %r13829, %r13828; 2026-02-21T12:44:28.5918186Z cvt.rn.bf16x2.f32 %r3937, %r13831, %r13830; 2026-02-21T12:44:28.5918408Z cvt.rn.bf16x2.f32 %r3938, %r13833, %r13832; 2026-02-21T12:44:28.5918631Z cvt.rn.bf16x2.f32 %r3939, %r13835, %r13834; 2026-02-21T12:44:28.5918856Z 
cvt.rn.bf16x2.f32 %r3940, %r13837, %r13836; 2026-02-21T12:44:28.5919092Z cvt.rn.bf16x2.f32 %r3941, %r13839, %r13838; 2026-02-21T12:44:28.5919316Z cvt.rn.bf16x2.f32 %r3942, %r13841, %r13840; 2026-02-21T12:44:28.5919535Z cvt.rn.bf16x2.f32 %r3943, %r13843, %r13842; 2026-02-21T12:44:28.5919760Z cvt.rn.bf16x2.f32 %r3944, %r13845, %r13844; 2026-02-21T12:44:28.5919981Z cvt.rn.bf16x2.f32 %r3945, %r13847, %r13846; 2026-02-21T12:44:28.5920205Z cvt.rn.bf16x2.f32 %r3946, %r13849, %r13848; 2026-02-21T12:44:28.5920426Z cvt.rn.bf16x2.f32 %r3947, %r13851, %r13850; 2026-02-21T12:44:28.5920657Z cvt.rn.bf16x2.f32 %r3948, %r13853, %r13852; 2026-02-21T12:44:28.5920884Z cvt.rn.bf16x2.f32 %r3949, %r13855, %r13854; 2026-02-21T12:44:28.5921228Z cvt.rn.bf16x2.f32 %r3950, %r13857, %r13856; 2026-02-21T12:44:28.5921460Z cvt.rn.bf16x2.f32 %r3951, %r13859, %r13858; 2026-02-21T12:44:28.5921681Z cvt.rn.bf16x2.f32 %r3952, %r13861, %r13860; 2026-02-21T12:44:28.5921906Z cvt.rn.bf16x2.f32 %r3953, %r13863, %r13862; 2026-02-21T12:44:28.5922128Z cvt.rn.bf16x2.f32 %r3954, %r13865, %r13864; 2026-02-21T12:44:28.5922353Z cvt.rn.bf16x2.f32 %r3955, %r13867, %r13866; 2026-02-21T12:44:28.5922581Z cvt.rn.bf16x2.f32 %r3956, %r13869, %r13868; 2026-02-21T12:44:28.5922814Z cvt.rn.bf16x2.f32 %r3957, %r13871, %r13870; 2026-02-21T12:44:28.5923044Z cvt.rn.bf16x2.f32 %r3958, %r13873, %r13872; 2026-02-21T12:44:28.5923273Z cvt.rn.bf16x2.f32 %r3959, %r13875, %r13874; 2026-02-21T12:44:28.5923503Z cvt.rn.bf16x2.f32 %r3960, %r13877, %r13876; 2026-02-21T12:44:28.5923729Z cvt.rn.bf16x2.f32 %r3961, %r13879, %r13878; 2026-02-21T12:44:28.5923957Z cvt.rn.bf16x2.f32 %r3962, %r13881, %r13880; 2026-02-21T12:44:28.5924180Z cvt.rn.bf16x2.f32 %r3963, %r13883, %r13882; 2026-02-21T12:44:28.5924413Z cvt.rn.bf16x2.f32 %r3964, %r13885, %r13884; 2026-02-21T12:44:28.5924658Z cvt.rn.bf16x2.f32 %r3965, %r13887, %r13886; 2026-02-21T12:44:28.5924888Z cvt.rn.bf16x2.f32 %r3966, %r13889, %r13888; 2026-02-21T12:44:28.5925118Z cvt.rn.bf16x2.f32 %r3967, 
%r13891, %r13890; 2026-02-21T12:44:28.5925344Z cvt.rn.bf16x2.f32 %r3968, %r13893, %r13892; 2026-02-21T12:44:28.5925573Z cvt.rn.bf16x2.f32 %r3969, %r13895, %r13894; 2026-02-21T12:44:28.5925797Z cvt.rn.bf16x2.f32 %r3970, %r13897, %r13896; 2026-02-21T12:44:28.5926025Z cvt.rn.bf16x2.f32 %r3971, %r13899, %r13898; 2026-02-21T12:44:28.5926267Z cvt.rn.bf16x2.f32 %r3972, %r13901, %r13900; 2026-02-21T12:44:28.5926615Z cvt.rn.bf16x2.f32 %r3973, %r13903, %r13902; 2026-02-21T12:44:28.5926846Z cvt.rn.bf16x2.f32 %r3974, %r13905, %r13904; 2026-02-21T12:44:28.5927072Z cvt.rn.bf16x2.f32 %r3975, %r13907, %r13906; 2026-02-21T12:44:28.5927298Z cvt.rn.bf16x2.f32 %r3976, %r13909, %r13908; 2026-02-21T12:44:28.5927527Z cvt.rn.bf16x2.f32 %r3977, %r13911, %r13910; 2026-02-21T12:44:28.5927759Z cvt.rn.bf16x2.f32 %r3978, %r13913, %r13912; 2026-02-21T12:44:28.5927980Z cvt.rn.bf16x2.f32 %r3979, %r13915, %r13914; 2026-02-21T12:44:28.5928210Z cvt.rn.bf16x2.f32 %r3980, %r13917, %r13916; 2026-02-21T12:44:28.5928434Z cvt.rn.bf16x2.f32 %r3981, %r13919, %r13918; 2026-02-21T12:44:28.5928680Z cvt.rn.bf16x2.f32 %r3982, %r13921, %r13920; 2026-02-21T12:44:28.5928908Z cvt.rn.bf16x2.f32 %r3983, %r13923, %r13922; 2026-02-21T12:44:28.5929130Z cvt.rn.bf16x2.f32 %r3984, %r13925, %r13924; 2026-02-21T12:44:28.5929357Z cvt.rn.bf16x2.f32 %r3985, %r13927, %r13926; 2026-02-21T12:44:28.5929580Z cvt.rn.bf16x2.f32 %r3986, %r13929, %r13928; 2026-02-21T12:44:28.5929809Z cvt.rn.bf16x2.f32 %r3987, %r13931, %r13930; 2026-02-21T12:44:28.5930033Z cvt.rn.bf16x2.f32 %r3988, %r13933, %r13932; 2026-02-21T12:44:28.5930359Z cvt.rn.bf16x2.f32 %r3989, %r13935, %r13934; 2026-02-21T12:44:28.5930586Z cvt.rn.bf16x2.f32 %r3990, %r13937, %r13936; 2026-02-21T12:44:28.5930877Z cvt.rn.bf16x2.f32 %r3991, %r13939, %r13938; 2026-02-21T12:44:28.5931106Z cvt.rn.bf16x2.f32 %r3992, %r13941, %r13940; 2026-02-21T12:44:28.5931330Z cvt.rn.bf16x2.f32 %r3993, %r13943, %r13942; 2026-02-21T12:44:28.5931561Z cvt.rn.bf16x2.f32 %r3994, %r13945, %r13944; 
2026-02-21T12:44:28.5931918Z .loc 1 91 22 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:91:22 2026-02-21T12:44:28.5932294Z mad.lo.s64 %rd247, %rd228, 2560, %rd167; 2026-02-21T12:44:28.5932506Z shl.b64 %rd248, %rd244, 1; 2026-02-21T12:44:28.5932695Z add.s64 %rd212, %rd247, %rd248; 2026-02-21T12:44:28.5932898Z mad.lo.s64 %rd249, %rd229, 2560, %rd167; 2026-02-21T12:44:28.5933106Z add.s64 %rd213, %rd249, %rd248; 2026-02-21T12:44:28.5933300Z mad.lo.s64 %rd250, %rd230, 2560, %rd167; 2026-02-21T12:44:28.5933524Z add.s64 %rd214, %rd250, %rd248; 2026-02-21T12:44:28.5933720Z mad.lo.s64 %rd251, %rd231, 2560, %rd167; 2026-02-21T12:44:28.5933996Z add.s64 %rd215, %rd251, %rd248; 2026-02-21T12:44:28.5934252Z mad.lo.s64 %rd252, %rd232, 2560, %rd167; 2026-02-21T12:44:28.5934457Z add.s64 %rd216, %rd252, %rd248; 2026-02-21T12:44:28.5934656Z mad.lo.s64 %rd253, %rd233, 2560, %rd167; 2026-02-21T12:44:28.5934870Z add.s64 %rd217, %rd253, %rd248; 2026-02-21T12:44:28.5935071Z mad.lo.s64 %rd254, %rd234, 2560, %rd167; 2026-02-21T12:44:28.5935286Z add.s64 %rd218, %rd254, %rd248; 2026-02-21T12:44:28.5935481Z mad.lo.s64 %rd255, %rd235, 2560, %rd167; 2026-02-21T12:44:28.5935692Z add.s64 %rd219, %rd255, %rd248; 2026-02-21T12:44:28.5935886Z mad.lo.s64 %rd256, %rd236, 2560, %rd167; 2026-02-21T12:44:28.5936099Z add.s64 %rd220, %rd256, %rd248; 2026-02-21T12:44:28.5936292Z mad.lo.s64 %rd257, %rd237, 2560, %rd167; 2026-02-21T12:44:28.5936637Z add.s64 %rd221, %rd257, %rd248; 2026-02-21T12:44:28.5936840Z mad.lo.s64 %rd258, %rd238, 2560, %rd167; 2026-02-21T12:44:28.5937049Z add.s64 %rd222, %rd258, %rd248; 2026-02-21T12:44:28.5937247Z mad.lo.s64 %rd259, %rd239, 2560, %rd167; 2026-02-21T12:44:28.5937475Z add.s64 %rd223, %rd259, %rd248; 2026-02-21T12:44:28.5937673Z mad.lo.s64 %rd260, %rd240, 2560, %rd167; 2026-02-21T12:44:28.5937878Z add.s64 %rd224, %rd260, %rd248; 2026-02-21T12:44:28.5938073Z mad.lo.s64 %rd261, %rd241, 2560, %rd167; 2026-02-21T12:44:28.5938275Z add.s64 %rd225, %rd261, %rd248; 
2026-02-21T12:44:28.5938471Z mad.lo.s64 %rd262, %rd242, 2560, %rd167; 2026-02-21T12:44:28.5938683Z add.s64 %rd226, %rd262, %rd248; 2026-02-21T12:44:28.5938873Z mad.lo.s64 %rd263, %rd243, 2560, %rd167; 2026-02-21T12:44:28.5939082Z add.s64 %rd227, %rd263, %rd248; 2026-02-21T12:44:28.5939420Z .loc 1 91 81 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:91:81 2026-02-21T12:44:28.5939785Z bar.sync 0; 2026-02-21T12:44:28.5939984Z st.shared.v4.b32 [%r19], {%r3931, %r3933, %r3935, %r3937}; 2026-02-21T12:44:28.5940284Z st.shared.v4.b32 [%r20], {%r3939, %r3941, %r3943, %r3945}; 2026-02-21T12:44:28.5940589Z st.shared.v4.b32 [%r21], {%r3947, %r3949, %r3951, %r3953}; 2026-02-21T12:44:28.5940885Z st.shared.v4.b32 [%r22], {%r3955, %r3957, %r3959, %r3961}; 2026-02-21T12:44:28.5941170Z st.shared.v4.b32 [%r23], {%r3963, %r3965, %r3967, %r3969}; 2026-02-21T12:44:28.5941452Z st.shared.v4.b32 [%r24], {%r3971, %r3973, %r3975, %r3977}; 2026-02-21T12:44:28.5941738Z st.shared.v4.b32 [%r25], {%r3979, %r3981, %r3983, %r3985}; 2026-02-21T12:44:28.5942020Z st.shared.v4.b32 [%r26], {%r3987, %r3989, %r3991, %r3993}; 2026-02-21T12:44:28.5942260Z bar.sync 0; 2026-02-21T12:44:28.5942407Z // begin inline asm 2026-02-21T12:44:28.5942699Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3756, %r3757, %r3758, %r3759}, [%r3760]; 2026-02-21T12:44:28.5943038Z // end inline asm 2026-02-21T12:44:28.5943192Z // begin inline asm 2026-02-21T12:44:28.5943478Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3761, %r3762, %r3763, %r3764}, [%r3765]; 2026-02-21T12:44:28.5943903Z // end inline asm 2026-02-21T12:44:28.5944059Z // begin inline asm 2026-02-21T12:44:28.5944396Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3766, %r3767, %r3768, %r3769}, [%r3770]; 2026-02-21T12:44:28.5944728Z // end inline asm 2026-02-21T12:44:28.5944879Z // begin inline asm 2026-02-21T12:44:28.5945160Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3771, %r3772, %r3773, %r3774}, [%r3775]; 2026-02-21T12:44:28.5945491Z // end inline asm 
2026-02-21T12:44:28.5945641Z // begin inline asm 2026-02-21T12:44:28.5945921Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3776, %r3777, %r3778, %r3779}, [%r3780]; 2026-02-21T12:44:28.5946248Z // end inline asm 2026-02-21T12:44:28.5946399Z // begin inline asm 2026-02-21T12:44:28.5946802Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3781, %r3782, %r3783, %r3784}, [%r3785]; 2026-02-21T12:44:28.5947131Z // end inline asm 2026-02-21T12:44:28.5947277Z // begin inline asm 2026-02-21T12:44:28.5947559Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3786, %r3787, %r3788, %r3789}, [%r3790]; 2026-02-21T12:44:28.5947886Z // end inline asm 2026-02-21T12:44:28.5948111Z // begin inline asm 2026-02-21T12:44:28.5948456Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3791, %r3792, %r3793, %r3794}, [%r3795]; 2026-02-21T12:44:28.5948852Z // end inline asm 2026-02-21T12:44:28.5949003Z bar.sync 0; 2026-02-21T12:44:28.5949193Z st.shared.v4.b32 [%r19], {%r3932, %r3934, %r3936, %r3938}; 2026-02-21T12:44:28.5949486Z st.shared.v4.b32 [%r20], {%r3940, %r3942, %r3944, %r3946}; 2026-02-21T12:44:28.5949770Z st.shared.v4.b32 [%r21], {%r3948, %r3950, %r3952, %r3954}; 2026-02-21T12:44:28.5950056Z st.shared.v4.b32 [%r22], {%r3956, %r3958, %r3960, %r3962}; 2026-02-21T12:44:28.5950345Z st.shared.v4.b32 [%r23], {%r3964, %r3966, %r3968, %r3970}; 2026-02-21T12:44:28.5950627Z st.shared.v4.b32 [%r24], {%r3972, %r3974, %r3976, %r3978}; 2026-02-21T12:44:28.5950911Z st.shared.v4.b32 [%r25], {%r3980, %r3982, %r3984, %r3986}; 2026-02-21T12:44:28.5951194Z st.shared.v4.b32 [%r26], {%r3988, %r3990, %r3992, %r3994}; 2026-02-21T12:44:28.5951433Z bar.sync 0; 2026-02-21T12:44:28.5951577Z // begin inline asm 2026-02-21T12:44:28.5951866Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3796, %r3797, %r3798, %r3799}, [%r3760]; 2026-02-21T12:44:28.5952198Z // end inline asm 2026-02-21T12:44:28.5952351Z // begin inline asm 2026-02-21T12:44:28.5952541Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3801, %r3802, %r3803, %r3804}, [%r3765]; 
2026-02-21T12:44:28.5952606Z // end inline asm 2026-02-21T12:44:28.5952678Z // begin inline asm 2026-02-21T12:44:28.5952866Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3806, %r3807, %r3808, %r3809}, [%r3770]; 2026-02-21T12:44:28.5952929Z // end inline asm 2026-02-21T12:44:28.5952990Z // begin inline asm 2026-02-21T12:44:28.5953170Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3811, %r3812, %r3813, %r3814}, [%r3775]; 2026-02-21T12:44:28.5953228Z // end inline asm 2026-02-21T12:44:28.5953294Z // begin inline asm 2026-02-21T12:44:28.5953475Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3816, %r3817, %r3818, %r3819}, [%r3780]; 2026-02-21T12:44:28.5953532Z // end inline asm 2026-02-21T12:44:28.5953604Z // begin inline asm 2026-02-21T12:44:28.5953783Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3821, %r3822, %r3823, %r3824}, [%r3785]; 2026-02-21T12:44:28.5953842Z // end inline asm 2026-02-21T12:44:28.5953908Z // begin inline asm 2026-02-21T12:44:28.5954089Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3826, %r3827, %r3828, %r3829}, [%r3790]; 2026-02-21T12:44:28.5954159Z // end inline asm 2026-02-21T12:44:28.5954222Z // begin inline asm 2026-02-21T12:44:28.5954409Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3831, %r3832, %r3833, %r3834}, [%r3795]; 2026-02-21T12:44:28.5954467Z // end inline asm 2026-02-21T12:44:28.5954528Z // begin inline asm 2026-02-21T12:44:28.5954663Z st.global.v4.b32 [ %rd212 + 0 ], { %r3756, %r3757, %r3758, %r3759 }; 2026-02-21T12:44:28.5954725Z // end inline asm 2026-02-21T12:44:28.5954869Z // begin inline asm 2026-02-21T12:44:28.5954996Z st.global.v4.b32 [ %rd213 + 0 ], { %r3796, %r3797, %r3798, %r3799 }; 2026-02-21T12:44:28.5955061Z // end inline asm 2026-02-21T12:44:28.5955186Z // begin inline asm 2026-02-21T12:44:28.5955309Z st.global.v4.b32 [ %rd214 + 0 ], { %r3761, %r3762, %r3763, %r3764 }; 2026-02-21T12:44:28.5955372Z // end inline asm 2026-02-21T12:44:28.5955432Z // begin inline asm 2026-02-21T12:44:28.5955550Z st.global.v4.b32 [ %rd215 + 0 ], 
{ %r3801, %r3802, %r3803, %r3804 }; 2026-02-21T12:44:28.5955613Z // end inline asm 2026-02-21T12:44:28.5955676Z // begin inline asm 2026-02-21T12:44:28.5955806Z st.global.v4.b32 [ %rd216 + 0 ], { %r3766, %r3767, %r3768, %r3769 }; 2026-02-21T12:44:28.5955873Z // end inline asm 2026-02-21T12:44:28.5955938Z // begin inline asm 2026-02-21T12:44:28.5956057Z st.global.v4.b32 [ %rd217 + 0 ], { %r3806, %r3807, %r3808, %r3809 }; 2026-02-21T12:44:28.5956117Z // end inline asm 2026-02-21T12:44:28.5956180Z // begin inline asm 2026-02-21T12:44:28.5956301Z st.global.v4.b32 [ %rd218 + 0 ], { %r3771, %r3772, %r3773, %r3774 }; 2026-02-21T12:44:28.5956360Z // end inline asm 2026-02-21T12:44:28.5956619Z // begin inline asm 2026-02-21T12:44:28.5956841Z st.global.v4.b32 [ %rd219 + 0 ], { %r3811, %r3812, %r3813, %r3814 }; 2026-02-21T12:44:28.5956908Z // end inline asm 2026-02-21T12:44:28.5956969Z // begin inline asm 2026-02-21T12:44:28.5957095Z st.global.v4.b32 [ %rd220 + 0 ], { %r3776, %r3777, %r3778, %r3779 }; 2026-02-21T12:44:28.5957154Z // end inline asm 2026-02-21T12:44:28.5957216Z // begin inline asm 2026-02-21T12:44:28.5957342Z st.global.v4.b32 [ %rd221 + 0 ], { %r3816, %r3817, %r3818, %r3819 }; 2026-02-21T12:44:28.5957402Z // end inline asm 2026-02-21T12:44:28.5957462Z // begin inline asm 2026-02-21T12:44:28.5957580Z st.global.v4.b32 [ %rd222 + 0 ], { %r3781, %r3782, %r3783, %r3784 }; 2026-02-21T12:44:28.5957646Z // end inline asm 2026-02-21T12:44:28.5957709Z // begin inline asm 2026-02-21T12:44:28.5957824Z st.global.v4.b32 [ %rd223 + 0 ], { %r3821, %r3822, %r3823, %r3824 }; 2026-02-21T12:44:28.5957889Z // end inline asm 2026-02-21T12:44:28.5957950Z // begin inline asm 2026-02-21T12:44:28.5958073Z st.global.v4.b32 [ %rd224 + 0 ], { %r3786, %r3787, %r3788, %r3789 }; 2026-02-21T12:44:28.5958132Z // end inline asm 2026-02-21T12:44:28.5958198Z // begin inline asm 2026-02-21T12:44:28.5958318Z st.global.v4.b32 [ %rd225 + 0 ], { %r3826, %r3827, %r3828, %r3829 }; 
2026-02-21T12:44:28.5958375Z // end inline asm 2026-02-21T12:44:28.5958441Z // begin inline asm 2026-02-21T12:44:28.5958558Z st.global.v4.b32 [ %rd226 + 0 ], { %r3791, %r3792, %r3793, %r3794 }; 2026-02-21T12:44:28.5958618Z // end inline asm 2026-02-21T12:44:28.5958681Z // begin inline asm 2026-02-21T12:44:28.5958798Z st.global.v4.b32 [ %rd227 + 0 ], { %r3831, %r3832, %r3833, %r3834 }; 2026-02-21T12:44:28.5958857Z // end inline asm 2026-02-21T12:44:28.5959081Z .loc 1 22 120 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:22:120 2026-02-21T12:44:28.5959156Z or.b64 %rd264, %rd647, 1; 2026-02-21T12:44:28.5959366Z .loc 1 31 45 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:31:45 2026-02-21T12:44:28.5959460Z mul.hi.u64 %rd265, %rd264, -3689348814741910323; 2026-02-21T12:44:28.5959534Z shr.u64 %rd266, %rd265, 4; 2026-02-21T12:44:28.5959603Z mul.lo.s64 %rd267, %rd266, 20; 2026-02-21T12:44:28.5959671Z sub.s64 %rd268, %rd264, %rd267; 2026-02-21T12:44:28.5959878Z .loc 1 32 51 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:32:51 2026-02-21T12:44:28.5959947Z or.b64 %rd269, %rd268, %rd34; 2026-02-21T12:44:28.5960019Z and.b64 %rd270, %rd269, -4294967296; 2026-02-21T12:44:28.5960088Z setp.ne.b64 %p8, %rd270, 0; 2026-02-21T12:44:28.5960155Z @%p8 bra $L__BB0_9; 2026-02-21T12:44:28.5960215Z bra.uni $L__BB0_8; 2026-02-21T12:44:28.5960331Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:28.5960403Z div.s64 %rd652, %rd268, %rd34; 2026-02-21T12:44:28.5960569Z bra.uni $L__BB0_10; 2026-02-21T12:44:28.5960683Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:28.5960810Z cvt.u32.u64 %r3995, %rd34; 2026-02-21T12:44:28.5960878Z cvt.u32.u64 %r3996, %rd268; 2026-02-21T12:44:28.5960943Z div.u32 %r3997, %r3996, %r3995; 2026-02-21T12:44:28.5961006Z cvt.u64.u32 %rd652, %r3997; 2026-02-21T12:44:28.5961122Z $L__BB0_10: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:28.5961328Z .loc 1 31 64 // 
cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:31:64 2026-02-21T12:44:28.5961397Z mul.lo.s64 %rd272, %rd652, %rd34; 2026-02-21T12:44:28.5961468Z sub.s64 %rd273, %rd268, %rd272; 2026-02-21T12:44:28.5961669Z .loc 1 31 30 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:31:30 2026-02-21T12:44:28.5961738Z add.s64 %rd274, %rd273, %rd33; 2026-02-21T12:44:28.5961948Z .loc 1 33 27 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:33:27 2026-02-21T12:44:28.5962025Z shl.b64 %rd55, %rd274, 7; 2026-02-21T12:44:28.5962326Z .loc 1 34 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:34:32 2026-02-21T12:44:28.5962397Z or.b64 %rd275, %rd55, %rd4; 2026-02-21T12:44:28.5962615Z .loc 1 35 27 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:35:27 2026-02-21T12:44:28.5962683Z shl.b64 %rd56, %rd652, 8; 2026-02-21T12:44:28.5962885Z .loc 1 36 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:36:32 2026-02-21T12:44:28.5962954Z or.b64 %rd57, %rd56, %rd21; 2026-02-21T12:44:28.5963155Z .loc 1 51 53 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:53 2026-02-21T12:44:28.5963222Z shl.b64 %rd58, %rd275, 13; 2026-02-21T12:44:28.5963438Z .loc 1 43 126 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:43:126 2026-02-21T12:44:28.5963505Z shl.b64 %rd276, %rd274, 21; 2026-02-21T12:44:28.5963571Z add.s64 %rd654, %rd30, %rd276; 2026-02-21T12:44:28.5963637Z add.s64 %rd653, %rd31, %rd56; 2026-02-21T12:44:28.5963710Z mov.b32 %r13946, 0f00000000; 2026-02-21T12:44:28.5963776Z mov.b64 %rd655, -12; 2026-02-21T12:44:28.5963838Z mov.b32 %r13947, %r13946; 2026-02-21T12:44:28.5963903Z mov.b32 %r13948, %r13946; 2026-02-21T12:44:28.5963964Z mov.b32 %r13949, %r13946; 2026-02-21T12:44:28.5964025Z mov.b32 %r13950, %r13946; 2026-02-21T12:44:28.5964086Z mov.b32 %r13951, %r13946; 2026-02-21T12:44:28.5964151Z mov.b32 %r13952, %r13946; 2026-02-21T12:44:28.5964210Z mov.b32 %r13953, %r13946; 2026-02-21T12:44:28.5964271Z mov.b32 %r13954, 
%r13946; 2026-02-21T12:44:28.5964335Z mov.b32 %r13955, %r13946; 2026-02-21T12:44:28.5964396Z mov.b32 %r13956, %r13946; 2026-02-21T12:44:28.5964457Z mov.b32 %r13957, %r13946; 2026-02-21T12:44:28.5964520Z mov.b32 %r13958, %r13946; 2026-02-21T12:44:28.5964582Z mov.b32 %r13959, %r13946; 2026-02-21T12:44:28.5964644Z mov.b32 %r13960, %r13946; 2026-02-21T12:44:28.5964705Z mov.b32 %r13961, %r13946; 2026-02-21T12:44:28.5964770Z mov.b32 %r13962, %r13946; 2026-02-21T12:44:28.5964833Z mov.b32 %r13963, %r13946; 2026-02-21T12:44:28.5964893Z mov.b32 %r13964, %r13946; 2026-02-21T12:44:28.5964958Z mov.b32 %r13965, %r13946; 2026-02-21T12:44:28.5965019Z mov.b32 %r13966, %r13946; 2026-02-21T12:44:28.5965078Z mov.b32 %r13967, %r13946; 2026-02-21T12:44:28.5965139Z mov.b32 %r13968, %r13946; 2026-02-21T12:44:28.5965203Z mov.b32 %r13969, %r13946; 2026-02-21T12:44:28.5965264Z mov.b32 %r13970, %r13946; 2026-02-21T12:44:28.5965325Z mov.b32 %r13971, %r13946; 2026-02-21T12:44:28.5965401Z mov.b32 %r13972, %r13946; 2026-02-21T12:44:28.5965466Z mov.b32 %r13973, %r13946; 2026-02-21T12:44:28.5965529Z mov.b32 %r13974, %r13946; 2026-02-21T12:44:28.5965590Z mov.b32 %r13975, %r13946; 2026-02-21T12:44:28.5965652Z mov.b32 %r13976, %r13946; 2026-02-21T12:44:28.5965713Z mov.b32 %r13977, %r13946; 2026-02-21T12:44:28.5965835Z mov.b32 %r13978, %r13946; 2026-02-21T12:44:28.5965905Z mov.b32 %r13979, %r13946; 2026-02-21T12:44:28.5965969Z mov.b32 %r13980, %r13946; 2026-02-21T12:44:28.5966081Z mov.b32 %r13981, %r13946; 2026-02-21T12:44:28.5966142Z mov.b32 %r13982, %r13946; 2026-02-21T12:44:28.5966208Z mov.b32 %r13983, %r13946; 2026-02-21T12:44:28.5966271Z mov.b32 %r13984, %r13946; 2026-02-21T12:44:28.5966332Z mov.b32 %r13985, %r13946; 2026-02-21T12:44:28.5966399Z mov.b32 %r13986, %r13946; 2026-02-21T12:44:28.5966602Z mov.b32 %r13987, %r13946; 2026-02-21T12:44:28.5966666Z mov.b32 %r13988, %r13946; 2026-02-21T12:44:28.5966725Z mov.b32 %r13989, %r13946; 2026-02-21T12:44:28.5966798Z mov.b32 %r13990, %r13946; 
2026-02-21T12:44:28.5966860Z mov.b32 %r13991, %r13946; 2026-02-21T12:44:28.5966924Z mov.b32 %r13992, %r13946; 2026-02-21T12:44:28.5967002Z mov.b32 %r13993, %r13946; 2026-02-21T12:44:28.5967068Z mov.b32 %r13994, %r13946; 2026-02-21T12:44:28.5967127Z mov.b32 %r13995, %r13946; 2026-02-21T12:44:28.5967192Z mov.b32 %r13996, %r13946; 2026-02-21T12:44:28.5967260Z mov.b32 %r13997, %r13946; 2026-02-21T12:44:28.5967397Z mov.b32 %r13998, %r13946; 2026-02-21T12:44:28.5967518Z mov.b32 %r13999, %r13946; 2026-02-21T12:44:28.5967590Z mov.b32 %r14000, %r13946; 2026-02-21T12:44:28.5967653Z mov.b32 %r14001, %r13946; 2026-02-21T12:44:28.5967714Z mov.b32 %r14002, %r13946; 2026-02-21T12:44:28.5967780Z mov.b32 %r14003, %r13946; 2026-02-21T12:44:28.5967844Z mov.b32 %r14004, %r13946; 2026-02-21T12:44:28.5967909Z mov.b32 %r14005, %r13946; 2026-02-21T12:44:28.5967971Z mov.b32 %r14006, %r13946; 2026-02-21T12:44:28.5968036Z mov.b32 %r14007, %r13946; 2026-02-21T12:44:28.5968098Z mov.b32 %r14008, %r13946; 2026-02-21T12:44:28.5968159Z mov.b32 %r14009, %r13946; 2026-02-21T12:44:28.5968227Z mov.b32 %r14010, %r13946; 2026-02-21T12:44:28.5968289Z mov.b32 %r14011, %r13946; 2026-02-21T12:44:28.5968351Z mov.b32 %r14012, %r13946; 2026-02-21T12:44:28.5968413Z mov.b32 %r14013, %r13946; 2026-02-21T12:44:28.5968483Z mov.b32 %r14014, %r13946; 2026-02-21T12:44:28.5968543Z mov.b32 %r14015, %r13946; 2026-02-21T12:44:28.5968607Z mov.b32 %r14016, %r13946; 2026-02-21T12:44:28.5968678Z mov.b32 %r14017, %r13946; 2026-02-21T12:44:28.5968741Z mov.b32 %r14018, %r13946; 2026-02-21T12:44:28.5968803Z mov.b32 %r14019, %r13946; 2026-02-21T12:44:28.5968864Z mov.b32 %r14020, %r13946; 2026-02-21T12:44:28.5968930Z mov.b32 %r14021, %r13946; 2026-02-21T12:44:28.5968991Z mov.b32 %r14022, %r13946; 2026-02-21T12:44:28.5969051Z mov.b32 %r14023, %r13946; 2026-02-21T12:44:28.5969118Z mov.b32 %r14024, %r13946; 2026-02-21T12:44:28.5969179Z mov.b32 %r14025, %r13946; 2026-02-21T12:44:28.5969239Z mov.b32 %r14026, %r13946; 
2026-02-21T12:44:28.5969300Z mov.b32 %r14027, %r13946; 2026-02-21T12:44:28.5969382Z mov.b32 %r14028, %r13946; 2026-02-21T12:44:28.5969447Z mov.b32 %r14029, %r13946; 2026-02-21T12:44:28.5969508Z mov.b32 %r14030, %r13946; 2026-02-21T12:44:28.5969575Z mov.b32 %r14031, %r13946; 2026-02-21T12:44:28.5969638Z mov.b32 %r14032, %r13946; 2026-02-21T12:44:28.5969700Z mov.b32 %r14033, %r13946; 2026-02-21T12:44:28.5969762Z mov.b32 %r14034, %r13946; 2026-02-21T12:44:28.5969832Z mov.b32 %r14035, %r13946; 2026-02-21T12:44:28.5969893Z mov.b32 %r14036, %r13946; 2026-02-21T12:44:28.5969954Z mov.b32 %r14037, %r13946; 2026-02-21T12:44:28.5970022Z mov.b32 %r14038, %r13946; 2026-02-21T12:44:28.5970082Z mov.b32 %r14039, %r13946; 2026-02-21T12:44:28.5970143Z mov.b32 %r14040, %r13946; 2026-02-21T12:44:28.5970204Z mov.b32 %r14041, %r13946; 2026-02-21T12:44:28.5970272Z mov.b32 %r14042, %r13946; 2026-02-21T12:44:28.5970333Z mov.b32 %r14043, %r13946; 2026-02-21T12:44:28.5970396Z mov.b32 %r14044, %r13946; 2026-02-21T12:44:28.5970462Z mov.b32 %r14045, %r13946; 2026-02-21T12:44:28.5970523Z mov.b32 %r14046, %r13946; 2026-02-21T12:44:28.5970584Z mov.b32 %r14047, %r13946; 2026-02-21T12:44:28.5970652Z mov.b32 %r14048, %r13946; 2026-02-21T12:44:28.5970715Z mov.b32 %r14049, %r13946; 2026-02-21T12:44:28.5970852Z mov.b32 %r14050, %r13946; 2026-02-21T12:44:28.5970916Z mov.b32 %r14051, %r13946; 2026-02-21T12:44:28.5970998Z mov.b32 %r14052, %r13946; 2026-02-21T12:44:28.5971118Z mov.b32 %r14053, %r13946; 2026-02-21T12:44:28.5971179Z mov.b32 %r14054, %r13946; 2026-02-21T12:44:28.5971245Z mov.b32 %r14055, %r13946; 2026-02-21T12:44:28.5971319Z mov.b32 %r14056, %r13946; 2026-02-21T12:44:28.5971383Z mov.b32 %r14057, %r13946; 2026-02-21T12:44:28.5971444Z mov.b32 %r14058, %r13946; 2026-02-21T12:44:28.5971510Z mov.b32 %r14059, %r13946; 2026-02-21T12:44:28.5971572Z mov.b32 %r14060, %r13946; 2026-02-21T12:44:28.5971633Z mov.b32 %r14061, %r13946; 2026-02-21T12:44:28.5971702Z mov.b32 %r14062, %r13946; 
2026-02-21T12:44:28.5971764Z mov.b32 %r14063, %r13946; 2026-02-21T12:44:28.5971826Z mov.b32 %r14064, %r13946; 2026-02-21T12:44:28.5971893Z mov.b32 %r14065, %r13946; 2026-02-21T12:44:28.5971966Z mov.b32 %r14066, %r13946; 2026-02-21T12:44:28.5972033Z mov.b32 %r14067, %r13946; 2026-02-21T12:44:28.5972100Z mov.b32 %r14068, %r13946; 2026-02-21T12:44:28.5972175Z mov.b32 %r14069, %r13946; 2026-02-21T12:44:28.5972288Z mov.b32 %r14070, %r13946; 2026-02-21T12:44:28.5972399Z mov.b32 %r14071, %r13946; 2026-02-21T12:44:28.5972467Z mov.b32 %r14072, %r13946; 2026-02-21T12:44:28.5972543Z mov.b32 %r14073, %r13946; 2026-02-21T12:44:28.5972662Z $L__BB0_11: // Parent Loop BB0_2 Depth=1 2026-02-21T12:44:28.5972776Z // => This Inner Loop Header: Depth=2 2026-02-21T12:44:28.5972998Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.5973070Z add.s64 %rd278, %rd654, -32; 2026-02-21T12:44:28.5973138Z // begin inline asm 2026-02-21T12:44:28.5973215Z mov.u64 %rd277, 0x0; 2026-02-21T12:44:28.5973352Z createpolicy.fractional.L2::evict_last.b64 %rd277, 1.0; 2026-02-21T12:44:28.5973417Z // end inline asm 2026-02-21T12:44:28.5973488Z // begin inline asm 2026-02-21T12:44:28.5973567Z mov.u32 %r3999, 0x0; 2026-02-21T12:44:28.5973630Z mov.u32 %r4000, 0x0; 2026-02-21T12:44:28.5973828Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r3999, %r4000 }, [ %rd278 + 0 ], %rd277; 2026-02-21T12:44:28.5973905Z // end inline asm 2026-02-21T12:44:28.5974116Z .loc 1 55 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:55:32 2026-02-21T12:44:28.5974177Z bar.sync 0; 2026-02-21T12:44:28.5974277Z st.shared.v2.b32 [%r10], {%r3999, %r4000}; 2026-02-21T12:44:28.5974336Z bar.sync 0; 2026-02-21T12:44:28.5974410Z ld.shared.b16 %rs113, [%r11]; 2026-02-21T12:44:28.5974485Z ld.shared.b16 %rs114, [%r11+128]; 2026-02-21T12:44:28.5974566Z ld.shared.b16 %rs115, [%r11+8]; 2026-02-21T12:44:28.5974638Z ld.shared.b16 %rs116, [%r11+136]; 2026-02-21T12:44:28.5974706Z 
cvt.f32.bf16 %r4258, %rs113; 2026-02-21T12:44:28.5974771Z cvt.f32.bf16 %r4259, %rs114; 2026-02-21T12:44:28.5974839Z cvt.f32.bf16 %r4260, %rs115; 2026-02-21T12:44:28.5974911Z cvt.f32.bf16 %r4261, %rs116; 2026-02-21T12:44:28.5975117Z .loc 1 57 87 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:87 2026-02-21T12:44:28.5975190Z // begin inline asm 2026-02-21T12:44:28.5975252Z mov.u32 %r4001, 0x0; 2026-02-21T12:44:28.5975332Z ld.global.b32 { %r4001 }, [ %rd653 + 0 ]; 2026-02-21T12:44:28.5975400Z // end inline asm 2026-02-21T12:44:28.5975602Z .loc 1 65 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:65:28 2026-02-21T12:44:28.5975661Z bar.sync 0; 2026-02-21T12:44:28.5975735Z st.shared.b8 [%r12], %r4001; 2026-02-21T12:44:28.5975808Z prmt.b32 %r5574, %r4001, 0, 0x7771U; 2026-02-21T12:44:28.5975877Z st.shared.b8 [%r13+256], %r5574; 2026-02-21T12:44:28.5975958Z prmt.b32 %r5575, %r4001, 0, 0x7772U; 2026-02-21T12:44:28.5976034Z st.shared.b8 [%r14+512], %r5575; 2026-02-21T12:44:28.5976102Z prmt.b32 %r5576, %r4001, 0, 0x7773U; 2026-02-21T12:44:28.5976170Z st.shared.b8 [%r15+768], %r5576; 2026-02-21T12:44:28.5976296Z bar.sync 0; 2026-02-21T12:44:28.5976367Z ld.shared.b32 %r5577, [%r16]; 2026-02-21T12:44:28.5976437Z prmt.b32 %r5578, %r5577, 0, 0x7771U; 2026-02-21T12:44:28.5976744Z cvt.u16.u32 %rs117, %r5578; 2026-02-21T12:44:28.5976824Z prmt.b32 %r5579, %r5577, 0, 0x7770U; 2026-02-21T12:44:28.5976893Z cvt.u16.u32 %rs118, %r5579; 2026-02-21T12:44:28.5976964Z prmt.b32 %r5580, %r5577, 0, 0x7773U; 2026-02-21T12:44:28.5977034Z cvt.u16.u32 %rs119, %r5580; 2026-02-21T12:44:28.5977103Z prmt.b32 %r5581, %r5577, 0, 0x7772U; 2026-02-21T12:44:28.5977168Z cvt.u16.u32 %rs120, %r5581; 2026-02-21T12:44:28.5977411Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.5977483Z shl.b16 %rs121, %rs118, 4; 2026-02-21T12:44:28.5977558Z shl.b16 %rs122, %rs117, 4; 2026-02-21T12:44:28.5977782Z .loc 1 62 25 // 
cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.5977859Z cvt.u32.u16 %r5582, %rs121; 2026-02-21T12:44:28.5977939Z prmt.b32 %r5583, %r5582, %r5584, 0x3340U; 2026-02-21T12:44:28.5978089Z prmt.b32 %r5588, %r5583, %r5585, 0x5410U; 2026-02-21T12:44:28.5978255Z prmt.b32 %r5589, %r5588, %r5577, 0x5040U; 2026-02-21T12:44:28.5978325Z prmt.b32 %r5590, %r5589, 0, 0x9991U; 2026-02-21T12:44:28.5978392Z cvt.u16.u32 %rs123, %r5590; 2026-02-21T12:44:28.5978457Z shr.s16 %rs124, %rs123, 4; 2026-02-21T12:44:28.5978528Z prmt.b32 %r5591, %r5589, 0, 0xbbb3U; 2026-02-21T12:44:28.5978591Z cvt.u16.u32 %rs125, %r5591; 2026-02-21T12:44:28.5978656Z shr.s16 %rs126, %rs125, 4; 2026-02-21T12:44:28.5978730Z cvt.s16.s8 %rs127, %rs121; 2026-02-21T12:44:28.5978794Z shr.s16 %rs128, %rs127, 4; 2026-02-21T12:44:28.5978856Z cvt.s16.s8 %rs129, %rs122; 2026-02-21T12:44:28.5978924Z shr.s16 %rs130, %rs129, 4; 2026-02-21T12:44:28.5979130Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.5979201Z cvt.rn.f32.s16 %r5592, %rs126; 2026-02-21T12:44:28.5979269Z cvt.rn.f32.s16 %r5593, %rs124; 2026-02-21T12:44:28.5979338Z cvt.rn.f32.s16 %r5594, %rs130; 2026-02-21T12:44:28.5979407Z cvt.rn.f32.s16 %r5595, %rs128; 2026-02-21T12:44:28.5979610Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.5979679Z shl.b16 %rs131, %rs120, 4; 2026-02-21T12:44:28.5979744Z shl.b16 %rs132, %rs119, 4; 2026-02-21T12:44:28.5979945Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.5980025Z prmt.b32 %r5596, %r5577, %r5597, 0x3020U; 2026-02-21T12:44:28.5980094Z prmt.b32 %r5598, %r5596, 0, 0x9991U; 2026-02-21T12:44:28.5980159Z cvt.u16.u32 %rs133, %r5598; 2026-02-21T12:44:28.5980223Z shr.s16 %rs134, %rs133, 4; 2026-02-21T12:44:28.5980292Z cvt.s16.s8 %rs135, %rs131; 2026-02-21T12:44:28.5980358Z shr.s16 %rs136, %rs135, 4; 2026-02-21T12:44:28.5980425Z cvt.s16.s8 %rs137, 
%rs132; 2026-02-21T12:44:28.5980497Z shr.s16 %rs138, %rs137, 4; 2026-02-21T12:44:28.5980570Z prmt.b32 %r5599, %r5577, 0, 0xbbb3U; 2026-02-21T12:44:28.5980639Z cvt.u16.u32 %rs139, %r5599; 2026-02-21T12:44:28.5980709Z shr.s16 %rs140, %rs139, 4; 2026-02-21T12:44:28.5980939Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.5981009Z cvt.rn.f32.s16 %r5600, %rs134; 2026-02-21T12:44:28.5981076Z cvt.rn.f32.s16 %r5601, %rs140; 2026-02-21T12:44:28.5981153Z cvt.rn.f32.s16 %r5602, %rs138; 2026-02-21T12:44:28.5981219Z cvt.rn.f32.s16 %r5603, %rs136; 2026-02-21T12:44:28.5981277Z bar.sync 0; 2026-02-21T12:44:28.5981403Z st.shared.v4.b32 [%r17], {%r5595, %r5593, %r5594, %r5592}; 2026-02-21T12:44:28.5981515Z st.shared.v4.b32 [%r18], {%r5603, %r5600, %r5602, %r5601}; 2026-02-21T12:44:28.5981580Z $L__tmp9: 2026-02-21T12:44:28.5981864Z .loc 2 291 36 // standard.py:291:36 @[ cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:87:40 ] 2026-02-21T12:44:28.5982038Z // begin inline asm 2026-02-21T12:44:28.5982121Z fence.proxy.async.shared::cta; 2026-02-21T12:44:28.5982235Z // end inline asm 2026-02-21T12:44:28.5982310Z bar.sync 0; 2026-02-21T12:44:28.5982390Z wgmma.fence.sync.aligned; 2026-02-21T12:44:28.5982458Z mov.pred %p9, -1; 2026-02-21T12:44:28.5982519Z // begin inline asm 2026-02-21T12:44:28.5985336Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r13946,%r13947,%r13948,%r13949,%r13950,%r13951,%r13952,%r13953,%r13954,%r13955,%r13956,%r13957,%r13958,%r13959,%r13960,%r13961,%r13962,%r13963,%r13964,%r13965,%r13966,%r13967,%r13968,%r13969,%r13970,%r13971,%r13972,%r13973,%r13974,%r13975,%r13976,%r13977,%r13978,%r13979,%r13980,%r13981,%r13982,%r13983,%r13984,%r13985,%r13986,%r13987,%r13988,%r13989,%r13990,%r13991,%r13992,%r13993,%r13994,%r13995,%r13996,%r13997,%r13998,%r13999,%r14000,%r14001,%r14002,%r14003,%r14004,%r14005,%r14006,%r14007,%r14008,%r14009,%r14010,%r14011,%r14012,%r14013,%r14014,%r14015,%r14016,%r14017,%r14018,%r14019,%r14020,%r14021,%r14022,%r14023,%r14024,%r14025,%r14026,%r14027,%r14028,%r14029,%r14030,%r14031,%r14032,%r14033,%r14034,%r14035,%r14036,%r14037,%r14038,%r14039,%r14040,%r14041,%r14042,%r14043,%r14044,%r14045,%r14046,%r14047,%r14048,%r14049,%r14050,%r14051,%r14052,%r14053,%r14054,%r14055,%r14056,%r14057,%r14058,%r14059,%r14060,%r14061,%r14062,%r14063,%r14064,%r14065,%r14066,%r14067,%r14068,%r14069,%r14070,%r14071,%r14072,%r14073}, {%r4258,%r4259,%r4260,%r4261}, %rd466, %p9, 1, 1; 2026-02-21T12:44:28.5985416Z // end inline asm 2026-02-21T12:44:28.5985501Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:28.5985566Z mov.b32 %r6051, 0; 2026-02-21T12:44:28.5985630Z mov.b32 %r4390, %r11895; 2026-02-21T12:44:28.5985707Z mov.b32 %r4391, %r6051; 2026-02-21T12:44:28.5985769Z mov.b32 %r4392, %r6051; 2026-02-21T12:44:28.5985833Z // begin inline asm 2026-02-21T12:44:28.5988531Z // wait for regs: 
%r13946,%r13947,%r13948,%r13949,%r13950,%r13951,%r13952,%r13953,%r13954,%r13955,%r13956,%r13957,%r13958,%r13959,%r13960,%r13961,%r13962,%r13963,%r13964,%r13965,%r13966,%r13967,%r13968,%r13969,%r13970,%r13971,%r13972,%r13973,%r13974,%r13975,%r13976,%r13977,%r13978,%r13979,%r13980,%r13981,%r13982,%r13983,%r13984,%r13985,%r13986,%r13987,%r13988,%r13989,%r13990,%r13991,%r13992,%r13993,%r13994,%r13995,%r13996,%r13997,%r13998,%r13999,%r14000,%r14001,%r14002,%r14003,%r14004,%r14005,%r14006,%r14007,%r14008,%r14009,%r14010,%r14011,%r14012,%r14013,%r14014,%r14015,%r14016,%r14017,%r14018,%r14019,%r14020,%r14021,%r14022,%r14023,%r14024,%r14025,%r14026,%r14027,%r14028,%r14029,%r14030,%r14031,%r14032,%r14033,%r14034,%r14035,%r14036,%r14037,%r14038,%r14039,%r14040,%r14041,%r14042,%r14043,%r14044,%r14045,%r14046,%r14047,%r14048,%r14049,%r14050,%r14051,%r14052,%r14053,%r14054,%r14055,%r14056,%r14057,%r14058,%r14059,%r14060,%r14061,%r14062,%r14063,%r14064,%r14065,%r14066,%r14067,%r14068,%r14069,%r14070,%r14071,%r14072,%r14073,%r4390,%r4391,%r4392 2026-02-21T12:44:28.5988631Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:28.5988700Z // end inline asm 2026-02-21T12:44:28.5988760Z $L__tmp10: 2026-02-21T12:44:28.5988988Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.5989073Z add.s64 %rd283, %rd654, -16; 2026-02-21T12:44:28.5989137Z // begin inline asm 2026-02-21T12:44:28.5989203Z mov.u64 %rd282, 0x0; 2026-02-21T12:44:28.5989338Z createpolicy.fractional.L2::evict_last.b64 %rd282, 1.0; 2026-02-21T12:44:28.5989405Z // end inline asm 2026-02-21T12:44:28.5989470Z // begin inline asm 2026-02-21T12:44:28.5989529Z mov.u32 %r4524, 0x0; 2026-02-21T12:44:28.5989599Z mov.u32 %r4525, 0x0; 2026-02-21T12:44:28.5989800Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r4524, %r4525 }, [ %rd283 + 0 ], %rd282; 2026-02-21T12:44:28.5989872Z // end inline asm 2026-02-21T12:44:28.5990092Z .loc 1 55 32 // 
cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:55:32 2026-02-21T12:44:28.5990153Z bar.sync 0; 2026-02-21T12:44:28.5990321Z st.shared.v2.b32 [%r10], {%r4524, %r4525}; 2026-02-21T12:44:28.5990382Z bar.sync 0; 2026-02-21T12:44:28.5990461Z ld.shared.b16 %rs141, [%r11]; 2026-02-21T12:44:28.5990598Z ld.shared.b16 %rs142, [%r11+128]; 2026-02-21T12:44:28.5990668Z ld.shared.b16 %rs143, [%r11+8]; 2026-02-21T12:44:28.5990747Z ld.shared.b16 %rs144, [%r11+136]; 2026-02-21T12:44:28.5990818Z cvt.f32.bf16 %r4783, %rs141; 2026-02-21T12:44:28.5990884Z cvt.f32.bf16 %r4784, %rs142; 2026-02-21T12:44:28.5990949Z cvt.f32.bf16 %r4785, %rs143; 2026-02-21T12:44:28.5991024Z cvt.f32.bf16 %r4786, %rs144; 2026-02-21T12:44:28.5991232Z .loc 1 57 87 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:87 2026-02-21T12:44:28.5991312Z add.s64 %rd285, %rd653, 5120; 2026-02-21T12:44:28.5991379Z // begin inline asm 2026-02-21T12:44:28.5991442Z mov.u32 %r4526, 0x0; 2026-02-21T12:44:28.5991521Z ld.global.b32 { %r4526 }, [ %rd285 + 0 ]; 2026-02-21T12:44:28.5991590Z // end inline asm 2026-02-21T12:44:28.5991796Z .loc 1 65 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:65:28 2026-02-21T12:44:28.5991857Z bar.sync 0; 2026-02-21T12:44:28.5992043Z st.shared.b8 [%r12], %r4526; 2026-02-21T12:44:28.5992124Z prmt.b32 %r5604, %r4526, 0, 0x7771U; 2026-02-21T12:44:28.5992201Z st.shared.b8 [%r13+256], %r5604; 2026-02-21T12:44:28.5992269Z prmt.b32 %r5605, %r4526, 0, 0x7772U; 2026-02-21T12:44:28.5992339Z st.shared.b8 [%r14+512], %r5605; 2026-02-21T12:44:28.5992408Z prmt.b32 %r5606, %r4526, 0, 0x7773U; 2026-02-21T12:44:28.5992477Z st.shared.b8 [%r15+768], %r5606; 2026-02-21T12:44:28.5992534Z bar.sync 0; 2026-02-21T12:44:28.5992614Z ld.shared.b32 %r5607, [%r16]; 2026-02-21T12:44:28.5992679Z prmt.b32 %r5608, %r5607, 0, 0x7771U; 2026-02-21T12:44:28.5992744Z cvt.u16.u32 %rs145, %r5608; 2026-02-21T12:44:28.5992816Z prmt.b32 %r5609, %r5607, 0, 0x7770U; 2026-02-21T12:44:28.5992886Z cvt.u16.u32 
%rs146, %r5609; 2026-02-21T12:44:28.5992952Z prmt.b32 %r5610, %r5607, 0, 0x7773U; 2026-02-21T12:44:28.5993026Z cvt.u16.u32 %rs147, %r5610; 2026-02-21T12:44:28.5993103Z prmt.b32 %r5611, %r5607, 0, 0x7772U; 2026-02-21T12:44:28.5993171Z cvt.u16.u32 %rs148, %r5611; 2026-02-21T12:44:28.5993376Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.5993448Z shl.b16 %rs149, %rs146, 4; 2026-02-21T12:44:28.5993511Z shl.b16 %rs150, %rs145, 4; 2026-02-21T12:44:28.5993711Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.5993780Z cvt.u32.u16 %r5612, %rs149; 2026-02-21T12:44:28.5993856Z prmt.b32 %r5613, %r5612, %r5614, 0x3340U; 2026-02-21T12:44:28.5993929Z prmt.b32 %r5615, %r5613, %r5585, 0x5410U; 2026-02-21T12:44:28.5994003Z prmt.b32 %r5616, %r5615, %r5607, 0x5040U; 2026-02-21T12:44:28.5994070Z prmt.b32 %r5617, %r5616, 0, 0x9991U; 2026-02-21T12:44:28.5994133Z cvt.u16.u32 %rs151, %r5617; 2026-02-21T12:44:28.5994199Z shr.s16 %rs152, %rs151, 4; 2026-02-21T12:44:28.5994271Z prmt.b32 %r5618, %r5616, 0, 0xbbb3U; 2026-02-21T12:44:28.5994336Z cvt.u16.u32 %rs153, %r5618; 2026-02-21T12:44:28.5994407Z shr.s16 %rs154, %rs153, 4; 2026-02-21T12:44:28.5994479Z cvt.s16.s8 %rs155, %rs149; 2026-02-21T12:44:28.5994550Z shr.s16 %rs156, %rs155, 4; 2026-02-21T12:44:28.5994615Z cvt.s16.s8 %rs157, %rs150; 2026-02-21T12:44:28.5994679Z shr.s16 %rs158, %rs157, 4; 2026-02-21T12:44:28.5994895Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.5994964Z cvt.rn.f32.s16 %r5619, %rs154; 2026-02-21T12:44:28.5995030Z cvt.rn.f32.s16 %r5620, %rs152; 2026-02-21T12:44:28.5995102Z cvt.rn.f32.s16 %r5621, %rs158; 2026-02-21T12:44:28.5995170Z cvt.rn.f32.s16 %r5622, %rs156; 2026-02-21T12:44:28.5995372Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.5995443Z shl.b16 %rs159, %rs148, 4; 2026-02-21T12:44:28.5995570Z shl.b16 %rs160, 
%rs147, 4; 2026-02-21T12:44:28.5995771Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.5995890Z prmt.b32 %r5623, %r5607, %r5624, 0x3020U; 2026-02-21T12:44:28.5995972Z prmt.b32 %r5625, %r5623, 0, 0x9991U; 2026-02-21T12:44:28.5996042Z cvt.u16.u32 %rs161, %r5625; 2026-02-21T12:44:28.5996108Z shr.s16 %rs162, %rs161, 4; 2026-02-21T12:44:28.5996186Z cvt.s16.s8 %rs163, %rs159; 2026-02-21T12:44:28.5996252Z shr.s16 %rs164, %rs163, 4; 2026-02-21T12:44:28.5996317Z cvt.s16.s8 %rs165, %rs160; 2026-02-21T12:44:28.5996379Z shr.s16 %rs166, %rs165, 4; 2026-02-21T12:44:28.5996648Z prmt.b32 %r5626, %r5607, 0, 0xbbb3U; 2026-02-21T12:44:28.5996718Z cvt.u16.u32 %rs167, %r5626; 2026-02-21T12:44:28.5996781Z shr.s16 %rs168, %rs167, 4; 2026-02-21T12:44:28.5996996Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.5997063Z cvt.rn.f32.s16 %r5627, %rs162; 2026-02-21T12:44:28.5997131Z cvt.rn.f32.s16 %r5628, %rs168; 2026-02-21T12:44:28.5997202Z cvt.rn.f32.s16 %r5629, %rs166; 2026-02-21T12:44:28.5997406Z cvt.rn.f32.s16 %r5630, %rs164; 2026-02-21T12:44:28.5997472Z bar.sync 0; 2026-02-21T12:44:28.5997590Z st.shared.v4.b32 [%r17], {%r5622, %r5620, %r5621, %r5619}; 2026-02-21T12:44:28.5997705Z st.shared.v4.b32 [%r18], {%r5630, %r5627, %r5629, %r5628}; 2026-02-21T12:44:28.5997764Z $L__tmp11: 2026-02-21T12:44:28.5998044Z .loc 2 291 36 // standard.py:291:36 @[ cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:87:40 ] 2026-02-21T12:44:28.5998117Z // begin inline asm 2026-02-21T12:44:28.5998198Z fence.proxy.async.shared::cta; 2026-02-21T12:44:28.5998259Z // end inline asm 2026-02-21T12:44:28.5998318Z bar.sync 0; 2026-02-21T12:44:28.5998402Z wgmma.fence.sync.aligned; 2026-02-21T12:44:28.5998465Z // begin inline asm 2026-02-21T12:44:28.6001192Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r13946,%r13947,%r13948,%r13949,%r13950,%r13951,%r13952,%r13953,%r13954,%r13955,%r13956,%r13957,%r13958,%r13959,%r13960,%r13961,%r13962,%r13963,%r13964,%r13965,%r13966,%r13967,%r13968,%r13969,%r13970,%r13971,%r13972,%r13973,%r13974,%r13975,%r13976,%r13977,%r13978,%r13979,%r13980,%r13981,%r13982,%r13983,%r13984,%r13985,%r13986,%r13987,%r13988,%r13989,%r13990,%r13991,%r13992,%r13993,%r13994,%r13995,%r13996,%r13997,%r13998,%r13999,%r14000,%r14001,%r14002,%r14003,%r14004,%r14005,%r14006,%r14007,%r14008,%r14009,%r14010,%r14011,%r14012,%r14013,%r14014,%r14015,%r14016,%r14017,%r14018,%r14019,%r14020,%r14021,%r14022,%r14023,%r14024,%r14025,%r14026,%r14027,%r14028,%r14029,%r14030,%r14031,%r14032,%r14033,%r14034,%r14035,%r14036,%r14037,%r14038,%r14039,%r14040,%r14041,%r14042,%r14043,%r14044,%r14045,%r14046,%r14047,%r14048,%r14049,%r14050,%r14051,%r14052,%r14053,%r14054,%r14055,%r14056,%r14057,%r14058,%r14059,%r14060,%r14061,%r14062,%r14063,%r14064,%r14065,%r14066,%r14067,%r14068,%r14069,%r14070,%r14071,%r14072,%r14073}, {%r4783,%r4784,%r4785,%r4786}, %rd466, %p9, 1, 1; 2026-02-21T12:44:28.6001259Z // end inline asm 2026-02-21T12:44:28.6001342Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:28.6001415Z mov.b32 %r4915, %r11895; 2026-02-21T12:44:28.6001480Z mov.b32 %r4916, %r6051; 2026-02-21T12:44:28.6001554Z mov.b32 %r4917, %r6051; 2026-02-21T12:44:28.6001628Z // begin inline asm 2026-02-21T12:44:28.6004140Z // wait for regs: 
%r13946,%r13947,%r13948,%r13949,%r13950,%r13951,%r13952,%r13953,%r13954,%r13955,%r13956,%r13957,%r13958,%r13959,%r13960,%r13961,%r13962,%r13963,%r13964,%r13965,%r13966,%r13967,%r13968,%r13969,%r13970,%r13971,%r13972,%r13973,%r13974,%r13975,%r13976,%r13977,%r13978,%r13979,%r13980,%r13981,%r13982,%r13983,%r13984,%r13985,%r13986,%r13987,%r13988,%r13989,%r13990,%r13991,%r13992,%r13993,%r13994,%r13995,%r13996,%r13997,%r13998,%r13999,%r14000,%r14001,%r14002,%r14003,%r14004,%r14005,%r14006,%r14007,%r14008,%r14009,%r14010,%r14011,%r14012,%r14013,%r14014,%r14015,%r14016,%r14017,%r14018,%r14019,%r14020,%r14021,%r14022,%r14023,%r14024,%r14025,%r14026,%r14027,%r14028,%r14029,%r14030,%r14031,%r14032,%r14033,%r14034,%r14035,%r14036,%r14037,%r14038,%r14039,%r14040,%r14041,%r14042,%r14043,%r14044,%r14045,%r14046,%r14047,%r14048,%r14049,%r14050,%r14051,%r14052,%r14053,%r14054,%r14055,%r14056,%r14057,%r14058,%r14059,%r14060,%r14061,%r14062,%r14063,%r14064,%r14065,%r14066,%r14067,%r14068,%r14069,%r14070,%r14071,%r14072,%r14073,%r4915,%r4916,%r4917 2026-02-21T12:44:28.6004353Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:28.6004413Z // end inline asm 2026-02-21T12:44:28.6004474Z $L__tmp12: 2026-02-21T12:44:28.6004695Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6004757Z // begin inline asm 2026-02-21T12:44:28.6004820Z mov.u64 %rd287, 0x0; 2026-02-21T12:44:28.6004946Z createpolicy.fractional.L2::evict_last.b64 %rd287, 1.0; 2026-02-21T12:44:28.6005013Z // end inline asm 2026-02-21T12:44:28.6005090Z // begin inline asm 2026-02-21T12:44:28.6005158Z mov.u32 %r5049, 0x0; 2026-02-21T12:44:28.6005222Z mov.u32 %r5050, 0x0; 2026-02-21T12:44:28.6005516Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r5049, %r5050 }, [ %rd654 + 0 ], %rd287; 2026-02-21T12:44:28.6005583Z // end inline asm 2026-02-21T12:44:28.6005793Z .loc 1 55 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:55:32 2026-02-21T12:44:28.6005853Z bar.sync 0; 
2026-02-21T12:44:28.6005934Z st.shared.v2.b32 [%r10], {%r5049, %r5050}; 2026-02-21T12:44:28.6005992Z bar.sync 0; 2026-02-21T12:44:28.6006069Z ld.shared.b16 %rs169, [%r11]; 2026-02-21T12:44:28.6006144Z ld.shared.b16 %rs170, [%r11+128]; 2026-02-21T12:44:28.6006211Z ld.shared.b16 %rs171, [%r11+8]; 2026-02-21T12:44:28.6006290Z ld.shared.b16 %rs172, [%r11+136]; 2026-02-21T12:44:28.6006356Z cvt.f32.bf16 %r5308, %rs169; 2026-02-21T12:44:28.6006420Z cvt.f32.bf16 %r5309, %rs170; 2026-02-21T12:44:28.6006649Z cvt.f32.bf16 %r5310, %rs171; 2026-02-21T12:44:28.6006731Z cvt.f32.bf16 %r5311, %rs172; 2026-02-21T12:44:28.6006937Z .loc 1 57 87 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:87 2026-02-21T12:44:28.6007009Z add.s64 %rd290, %rd653, 10240; 2026-02-21T12:44:28.6007076Z // begin inline asm 2026-02-21T12:44:28.6007140Z mov.u32 %r5051, 0x0; 2026-02-21T12:44:28.6007216Z ld.global.b32 { %r5051 }, [ %rd290 + 0 ]; 2026-02-21T12:44:28.6007279Z // end inline asm 2026-02-21T12:44:28.6007484Z .loc 1 65 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:65:28 2026-02-21T12:44:28.6007548Z bar.sync 0; 2026-02-21T12:44:28.6007626Z st.shared.b8 [%r12], %r5051; 2026-02-21T12:44:28.6007705Z prmt.b32 %r5631, %r5051, 0, 0x7771U; 2026-02-21T12:44:28.6007774Z st.shared.b8 [%r13+256], %r5631; 2026-02-21T12:44:28.6007840Z prmt.b32 %r5632, %r5051, 0, 0x7772U; 2026-02-21T12:44:28.6007911Z st.shared.b8 [%r14+512], %r5632; 2026-02-21T12:44:28.6007977Z prmt.b32 %r5633, %r5051, 0, 0x7773U; 2026-02-21T12:44:28.6008046Z st.shared.b8 [%r15+768], %r5633; 2026-02-21T12:44:28.6008106Z bar.sync 0; 2026-02-21T12:44:28.6008179Z ld.shared.b32 %r5634, [%r16]; 2026-02-21T12:44:28.6008253Z prmt.b32 %r5635, %r5634, 0, 0x7771U; 2026-02-21T12:44:28.6008319Z cvt.u16.u32 %rs173, %r5635; 2026-02-21T12:44:28.6008391Z prmt.b32 %r5636, %r5634, 0, 0x7770U; 2026-02-21T12:44:28.6008456Z cvt.u16.u32 %rs174, %r5636; 2026-02-21T12:44:28.6008528Z prmt.b32 %r5637, %r5634, 0, 0x7773U; 
2026-02-21T12:44:28.6008596Z cvt.u16.u32 %rs175, %r5637; 2026-02-21T12:44:28.6008663Z prmt.b32 %r5638, %r5634, 0, 0x7772U; 2026-02-21T12:44:28.6008727Z cvt.u16.u32 %rs176, %r5638; 2026-02-21T12:44:28.6008930Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6009000Z shl.b16 %rs177, %rs174, 4; 2026-02-21T12:44:28.6009064Z shl.b16 %rs178, %rs173, 4; 2026-02-21T12:44:28.6009264Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6009414Z cvt.u32.u16 %r5639, %rs177; 2026-02-21T12:44:28.6009489Z prmt.b32 %r5640, %r5639, %r5641, 0x3340U; 2026-02-21T12:44:28.6009634Z prmt.b32 %r5642, %r5640, %r5585, 0x5410U; 2026-02-21T12:44:28.6009711Z prmt.b32 %r5643, %r5642, %r5634, 0x5040U; 2026-02-21T12:44:28.6009779Z prmt.b32 %r5644, %r5643, 0, 0x9991U; 2026-02-21T12:44:28.6009844Z cvt.u16.u32 %rs179, %r5644; 2026-02-21T12:44:28.6009910Z shr.s16 %rs180, %rs179, 4; 2026-02-21T12:44:28.6009983Z prmt.b32 %r5645, %r5643, 0, 0xbbb3U; 2026-02-21T12:44:28.6010048Z cvt.u16.u32 %rs181, %r5645; 2026-02-21T12:44:28.6010112Z shr.s16 %rs182, %rs181, 4; 2026-02-21T12:44:28.6010183Z cvt.s16.s8 %rs183, %rs177; 2026-02-21T12:44:28.6010249Z shr.s16 %rs184, %rs183, 4; 2026-02-21T12:44:28.6010312Z cvt.s16.s8 %rs185, %rs178; 2026-02-21T12:44:28.6010374Z shr.s16 %rs186, %rs185, 4; 2026-02-21T12:44:28.6010583Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6010652Z cvt.rn.f32.s16 %r5646, %rs182; 2026-02-21T12:44:28.6010718Z cvt.rn.f32.s16 %r5647, %rs180; 2026-02-21T12:44:28.6010862Z cvt.rn.f32.s16 %r5648, %rs186; 2026-02-21T12:44:28.6010992Z cvt.rn.f32.s16 %r5649, %rs184; 2026-02-21T12:44:28.6011198Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6011271Z shl.b16 %rs187, %rs176, 4; 2026-02-21T12:44:28.6011336Z shl.b16 %rs188, %rs175, 4; 2026-02-21T12:44:28.6011535Z .loc 1 62 25 // 
cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6011608Z prmt.b32 %r5650, %r5634, %r5651, 0x3020U; 2026-02-21T12:44:28.6011680Z prmt.b32 %r5652, %r5650, 0, 0x9991U; 2026-02-21T12:44:28.6011742Z cvt.u16.u32 %rs189, %r5652; 2026-02-21T12:44:28.6011805Z shr.s16 %rs190, %rs189, 4; 2026-02-21T12:44:28.6011876Z cvt.s16.s8 %rs191, %rs187; 2026-02-21T12:44:28.6011941Z shr.s16 %rs192, %rs191, 4; 2026-02-21T12:44:28.6017011Z cvt.s16.s8 %rs193, %rs188; 2026-02-21T12:44:28.6017111Z shr.s16 %rs194, %rs193, 4; 2026-02-21T12:44:28.6017190Z prmt.b32 %r5653, %r5634, 0, 0xbbb3U; 2026-02-21T12:44:28.6017276Z cvt.u16.u32 %rs195, %r5653; 2026-02-21T12:44:28.6017340Z shr.s16 %rs196, %rs195, 4; 2026-02-21T12:44:28.6017574Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6017645Z cvt.rn.f32.s16 %r5654, %rs190; 2026-02-21T12:44:28.6017714Z cvt.rn.f32.s16 %r5655, %rs196; 2026-02-21T12:44:28.6017777Z cvt.rn.f32.s16 %r5656, %rs194; 2026-02-21T12:44:28.6017838Z cvt.rn.f32.s16 %r5657, %rs192; 2026-02-21T12:44:28.6017902Z bar.sync 0; 2026-02-21T12:44:28.6018032Z st.shared.v4.b32 [%r17], {%r5649, %r5647, %r5648, %r5646}; 2026-02-21T12:44:28.6018144Z st.shared.v4.b32 [%r18], {%r5657, %r5654, %r5656, %r5655}; 2026-02-21T12:44:28.6018200Z $L__tmp13: 2026-02-21T12:44:28.6018500Z .loc 2 291 36 // standard.py:291:36 @[ cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:87:40 ] 2026-02-21T12:44:28.6018567Z // begin inline asm 2026-02-21T12:44:28.6018653Z fence.proxy.async.shared::cta; 2026-02-21T12:44:28.6018721Z // end inline asm 2026-02-21T12:44:28.6018778Z bar.sync 0; 2026-02-21T12:44:28.6018851Z wgmma.fence.sync.aligned; 2026-02-21T12:44:28.6018917Z // begin inline asm 2026-02-21T12:44:28.6021597Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r13946,%r13947,%r13948,%r13949,%r13950,%r13951,%r13952,%r13953,%r13954,%r13955,%r13956,%r13957,%r13958,%r13959,%r13960,%r13961,%r13962,%r13963,%r13964,%r13965,%r13966,%r13967,%r13968,%r13969,%r13970,%r13971,%r13972,%r13973,%r13974,%r13975,%r13976,%r13977,%r13978,%r13979,%r13980,%r13981,%r13982,%r13983,%r13984,%r13985,%r13986,%r13987,%r13988,%r13989,%r13990,%r13991,%r13992,%r13993,%r13994,%r13995,%r13996,%r13997,%r13998,%r13999,%r14000,%r14001,%r14002,%r14003,%r14004,%r14005,%r14006,%r14007,%r14008,%r14009,%r14010,%r14011,%r14012,%r14013,%r14014,%r14015,%r14016,%r14017,%r14018,%r14019,%r14020,%r14021,%r14022,%r14023,%r14024,%r14025,%r14026,%r14027,%r14028,%r14029,%r14030,%r14031,%r14032,%r14033,%r14034,%r14035,%r14036,%r14037,%r14038,%r14039,%r14040,%r14041,%r14042,%r14043,%r14044,%r14045,%r14046,%r14047,%r14048,%r14049,%r14050,%r14051,%r14052,%r14053,%r14054,%r14055,%r14056,%r14057,%r14058,%r14059,%r14060,%r14061,%r14062,%r14063,%r14064,%r14065,%r14066,%r14067,%r14068,%r14069,%r14070,%r14071,%r14072,%r14073}, {%r5308,%r5309,%r5310,%r5311}, %rd466, %p9, 1, 1; 2026-02-21T12:44:28.6021888Z // end inline asm 2026-02-21T12:44:28.6021972Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:28.6022035Z mov.b32 %r5442, %r6051; 2026-02-21T12:44:28.6022102Z mov.b32 %r5440, %r11895; 2026-02-21T12:44:28.6022161Z mov.b32 %r5441, %r6051; 2026-02-21T12:44:28.6022222Z // begin inline asm 2026-02-21T12:44:28.6024802Z // wait for regs: 
%r13946,%r13947,%r13948,%r13949,%r13950,%r13951,%r13952,%r13953,%r13954,%r13955,%r13956,%r13957,%r13958,%r13959,%r13960,%r13961,%r13962,%r13963,%r13964,%r13965,%r13966,%r13967,%r13968,%r13969,%r13970,%r13971,%r13972,%r13973,%r13974,%r13975,%r13976,%r13977,%r13978,%r13979,%r13980,%r13981,%r13982,%r13983,%r13984,%r13985,%r13986,%r13987,%r13988,%r13989,%r13990,%r13991,%r13992,%r13993,%r13994,%r13995,%r13996,%r13997,%r13998,%r13999,%r14000,%r14001,%r14002,%r14003,%r14004,%r14005,%r14006,%r14007,%r14008,%r14009,%r14010,%r14011,%r14012,%r14013,%r14014,%r14015,%r14016,%r14017,%r14018,%r14019,%r14020,%r14021,%r14022,%r14023,%r14024,%r14025,%r14026,%r14027,%r14028,%r14029,%r14030,%r14031,%r14032,%r14033,%r14034,%r14035,%r14036,%r14037,%r14038,%r14039,%r14040,%r14041,%r14042,%r14043,%r14044,%r14045,%r14046,%r14047,%r14048,%r14049,%r14050,%r14051,%r14052,%r14053,%r14054,%r14055,%r14056,%r14057,%r14058,%r14059,%r14060,%r14061,%r14062,%r14063,%r14064,%r14065,%r14066,%r14067,%r14068,%r14069,%r14070,%r14071,%r14072,%r14073,%r5440,%r5441,%r5442 2026-02-21T12:44:28.6024889Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:28.6024948Z // end inline asm 2026-02-21T12:44:28.6025010Z $L__tmp14: 2026-02-21T12:44:28.6025242Z .loc 1 43 126 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:43:126 2026-02-21T12:44:28.6025315Z add.s64 %rd655, %rd655, 12; 2026-02-21T12:44:28.6025386Z add.s64 %rd654, %rd654, 48; 2026-02-21T12:44:28.6025453Z add.s64 %rd653, %rd653, 15360; 2026-02-21T12:44:28.6025522Z setp.lt.u64 %p12, %rd655, 4080; 2026-02-21T12:44:28.6025591Z @%p12 bra $L__BB0_11; 2026-02-21T12:44:28.6025715Z // %bb.12: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:28.6025931Z .loc 1 34 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:34:32 2026-02-21T12:44:28.6025998Z or.b64 %rd313, %rd55, %rd5; 2026-02-21T12:44:28.6026068Z or.b64 %rd314, %rd55, %rd6; 2026-02-21T12:44:28.6026131Z or.b64 %rd315, %rd55, %rd7; 2026-02-21T12:44:28.6026192Z or.b64 %rd316, %rd55, %rd8; 
2026-02-21T12:44:28.6026255Z or.b64 %rd317, %rd55, %rd9; 2026-02-21T12:44:28.6026320Z or.b64 %rd318, %rd55, %rd10; 2026-02-21T12:44:28.6026383Z or.b64 %rd319, %rd55, %rd11; 2026-02-21T12:44:28.6026445Z or.b64 %rd320, %rd55, %rd12; 2026-02-21T12:44:28.6026655Z or.b64 %rd321, %rd55, %rd13; 2026-02-21T12:44:28.6026717Z or.b64 %rd322, %rd55, %rd14; 2026-02-21T12:44:28.6026780Z or.b64 %rd323, %rd55, %rd15; 2026-02-21T12:44:28.6026844Z or.b64 %rd324, %rd55, %rd16; 2026-02-21T12:44:28.6026903Z or.b64 %rd325, %rd55, %rd17; 2026-02-21T12:44:28.6026962Z or.b64 %rd326, %rd55, %rd18; 2026-02-21T12:44:28.6027038Z or.b64 %rd327, %rd55, %rd19; 2026-02-21T12:44:28.6027100Z or.b64 %rd328, %rd55, %rd20; 2026-02-21T12:44:28.6027305Z .loc 1 36 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:36:32 2026-02-21T12:44:28.6027367Z or.b64 %rd329, %rd56, %rd22; 2026-02-21T12:44:28.6027575Z .loc 1 51 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:32 2026-02-21T12:44:28.6027639Z shl.b64 %rd330, %rd58, 1; 2026-02-21T12:44:28.6027785Z add.s64 %rd293, %rd25, %rd330; 2026-02-21T12:44:28.6028375Z [8367s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T12:44:28.6029767Z Config: @helion.kernel(config=helion.Config(block_sizes=[4, 128, 256], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[4], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=128, num_sm_multiplier=64, num_stages=6, num_warps=8, pid_type='persistent_blocked', range_flattens=[True, True], range_multi_buffers=[False, True], range_num_stages=[3, 1], range_unroll_factors=[4, 3], range_warp_specializes=[]), static_shapes=True) 2026-02-21T12:44:28.6029920Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T12:44:28.6029982Z `ptxas` stderr: 2026-02-21T12:44:28.6030447Z ptxas fatal : (C7602) Insufficient registers (128) to compile instruction at line 495 in function _helion_matmul_bf16_int4. 
Try to compile with register target of 158 or higher. 2026-02-21T12:44:28.6030552Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T12:44:28.6030562Z 2026-02-21T12:44:28.6031194Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmppcb8cdjd.ptx -o /tmp/tmppcb8cdjd.ptx.o 2026-02-21T12:44:28.6031202Z 2026-02-21T12:44:28.6031366Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T12:44:28.6031605Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6031669Z // begin inline asm 2026-02-21T12:44:28.6031732Z mov.u64 %rd292, 0x0; 2026-02-21T12:44:28.6031862Z createpolicy.fractional.L2::evict_last.b64 %rd292, 1.0; 2026-02-21T12:44:28.6031928Z // end inline asm 2026-02-21T12:44:28.6031988Z // begin inline asm 2026-02-21T12:44:28.6032054Z mov.u32 %r5658, 0x0; 2026-02-21T12:44:28.6032115Z mov.u32 %r5659, 0x0; 2026-02-21T12:44:28.6032306Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r5658, %r5659 }, [ %rd293 + 0 ], %rd292; 2026-02-21T12:44:28.6032365Z // end inline asm 2026-02-21T12:44:28.6032582Z .loc 1 55 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:55:32 2026-02-21T12:44:28.6032642Z bar.sync 0; 2026-02-21T12:44:28.6032725Z st.shared.v2.b32 [%r10], {%r5658, %r5659}; 2026-02-21T12:44:28.6032781Z bar.sync 0; 2026-02-21T12:44:28.6032856Z ld.shared.b16 %rs197, [%r11]; 2026-02-21T12:44:28.6032925Z ld.shared.b16 %rs198, [%r11+128]; 2026-02-21T12:44:28.6032993Z ld.shared.b16 %rs199, [%r11+8]; 2026-02-21T12:44:28.6033062Z ld.shared.b16 %rs200, [%r11+136]; 2026-02-21T12:44:28.6033141Z cvt.f32.bf16 %r5917, %rs197; 2026-02-21T12:44:28.6033209Z cvt.f32.bf16 %r5918, %rs198; 2026-02-21T12:44:28.6033270Z cvt.f32.bf16 %r5919, %rs199; 2026-02-21T12:44:28.6033336Z cvt.f32.bf16 %r5920, %rs200; 2026-02-21T12:44:28.6033546Z .loc 1 57 34 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:34 
2026-02-21T12:44:28.6033614Z add.s64 %rd331, %rd645, %rd57; 2026-02-21T12:44:28.6033682Z add.s64 %rd295, %rd331, 5237760; 2026-02-21T12:44:28.6033885Z .loc 1 57 87 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:87 2026-02-21T12:44:28.6033949Z // begin inline asm 2026-02-21T12:44:28.6034010Z mov.u32 %r5660, 0x0; 2026-02-21T12:44:28.6034086Z ld.global.b32 { %r5660 }, [ %rd295 + 0 ]; 2026-02-21T12:44:28.6034145Z // end inline asm 2026-02-21T12:44:28.6034340Z .loc 1 65 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:65:28 2026-02-21T12:44:28.6034400Z bar.sync 0; 2026-02-21T12:44:28.6034464Z st.shared.b8 [%r12], %r5660; 2026-02-21T12:44:28.6034535Z prmt.b32 %r6327, %r5660, 0, 0x7771U; 2026-02-21T12:44:28.6034608Z st.shared.b8 [%r13+256], %r6327; 2026-02-21T12:44:28.6034676Z prmt.b32 %r6328, %r5660, 0, 0x7772U; 2026-02-21T12:44:28.6034753Z st.shared.b8 [%r14+512], %r6328; 2026-02-21T12:44:28.6034825Z prmt.b32 %r6329, %r5660, 0, 0x7773U; 2026-02-21T12:44:28.6034951Z st.shared.b8 [%r15+768], %r6329; 2026-02-21T12:44:28.6035007Z bar.sync 0; 2026-02-21T12:44:28.6035073Z ld.shared.b32 %r6330, [%r16]; 2026-02-21T12:44:28.6035221Z prmt.b32 %r6331, %r6330, 0, 0x7771U; 2026-02-21T12:44:28.6035288Z cvt.u16.u32 %rs201, %r6331; 2026-02-21T12:44:28.6035355Z prmt.b32 %r6332, %r6330, 0, 0x7770U; 2026-02-21T12:44:28.6035419Z cvt.u16.u32 %rs202, %r6332; 2026-02-21T12:44:28.6035483Z prmt.b32 %r6333, %r6330, 0, 0x7773U; 2026-02-21T12:44:28.6035544Z cvt.u16.u32 %rs203, %r6333; 2026-02-21T12:44:28.6035607Z prmt.b32 %r6334, %r6330, 0, 0x7772U; 2026-02-21T12:44:28.6035672Z cvt.u16.u32 %rs204, %r6334; 2026-02-21T12:44:28.6035885Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6035951Z shl.b16 %rs205, %rs202, 4; 2026-02-21T12:44:28.6036017Z shl.b16 %rs206, %rs201, 4; 2026-02-21T12:44:28.6036225Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6036291Z cvt.u32.u16 
%r6335, %rs205; 2026-02-21T12:44:28.6036374Z prmt.b32 %r6336, %r6335, %r6337, 0x3340U; 2026-02-21T12:44:28.6036715Z prmt.b32 %r6341, %r6336, %r6338, 0x5410U; 2026-02-21T12:44:28.6036814Z prmt.b32 %r6342, %r6341, %r6330, 0x5040U; 2026-02-21T12:44:28.6036887Z prmt.b32 %r6343, %r6342, 0, 0x9991U; 2026-02-21T12:44:28.6036955Z cvt.u16.u32 %rs207, %r6343; 2026-02-21T12:44:28.6037018Z shr.s16 %rs208, %rs207, 4; 2026-02-21T12:44:28.6037085Z prmt.b32 %r6344, %r6342, 0, 0xbbb3U; 2026-02-21T12:44:28.6037152Z cvt.u16.u32 %rs209, %r6344; 2026-02-21T12:44:28.6037214Z shr.s16 %rs210, %rs209, 4; 2026-02-21T12:44:28.6037276Z cvt.s16.s8 %rs211, %rs205; 2026-02-21T12:44:28.6037337Z shr.s16 %rs212, %rs211, 4; 2026-02-21T12:44:28.6037403Z cvt.s16.s8 %rs213, %rs206; 2026-02-21T12:44:28.6037463Z shr.s16 %rs214, %rs213, 4; 2026-02-21T12:44:28.6037685Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6037760Z cvt.rn.f32.s16 %r6345, %rs210; 2026-02-21T12:44:28.6037822Z cvt.rn.f32.s16 %r6346, %rs208; 2026-02-21T12:44:28.6037891Z cvt.rn.f32.s16 %r6347, %rs214; 2026-02-21T12:44:28.6037959Z cvt.rn.f32.s16 %r6348, %rs212; 2026-02-21T12:44:28.6038165Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6038228Z shl.b16 %rs215, %rs204, 4; 2026-02-21T12:44:28.6038290Z shl.b16 %rs216, %rs203, 4; 2026-02-21T12:44:28.6038500Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6038577Z prmt.b32 %r6349, %r6330, %r6350, 0x3020U; 2026-02-21T12:44:28.6038644Z prmt.b32 %r6351, %r6349, 0, 0x9991U; 2026-02-21T12:44:28.6038712Z cvt.u16.u32 %rs217, %r6351; 2026-02-21T12:44:28.6038777Z shr.s16 %rs218, %rs217, 4; 2026-02-21T12:44:28.6038840Z cvt.s16.s8 %rs219, %rs215; 2026-02-21T12:44:28.6038909Z shr.s16 %rs220, %rs219, 4; 2026-02-21T12:44:28.6038987Z cvt.s16.s8 %rs221, %rs216; 2026-02-21T12:44:28.6039048Z shr.s16 %rs222, %rs221, 4; 2026-02-21T12:44:28.6039113Z prmt.b32 
%r6352, %r6330, 0, 0xbbb3U; 2026-02-21T12:44:28.6039184Z cvt.u16.u32 %rs223, %r6352; 2026-02-21T12:44:28.6039244Z shr.s16 %rs224, %rs223, 4; 2026-02-21T12:44:28.6039455Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6039527Z cvt.rn.f32.s16 %r6353, %rs218; 2026-02-21T12:44:28.6039590Z cvt.rn.f32.s16 %r6354, %rs224; 2026-02-21T12:44:28.6039653Z cvt.rn.f32.s16 %r6355, %rs222; 2026-02-21T12:44:28.6039713Z cvt.rn.f32.s16 %r6356, %rs220; 2026-02-21T12:44:28.6039775Z bar.sync 0; 2026-02-21T12:44:28.6039891Z st.shared.v4.b32 [%r17], {%r6348, %r6346, %r6347, %r6345}; 2026-02-21T12:44:28.6039997Z st.shared.v4.b32 [%r18], {%r6356, %r6353, %r6355, %r6354}; 2026-02-21T12:44:28.6040057Z $L__tmp15: 2026-02-21T12:44:28.6040334Z .loc 2 291 36 // standard.py:291:36 @[ cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:87:40 ] 2026-02-21T12:44:28.6040473Z // begin inline asm 2026-02-21T12:44:28.6040561Z fence.proxy.async.shared::cta; 2026-02-21T12:44:28.6040679Z // end inline asm 2026-02-21T12:44:28.6040734Z bar.sync 0; 2026-02-21T12:44:28.6040807Z wgmma.fence.sync.aligned; 2026-02-21T12:44:28.6040884Z // begin inline asm 2026-02-21T12:44:28.6043694Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r13946,%r13947,%r13948,%r13949,%r13950,%r13951,%r13952,%r13953,%r13954,%r13955,%r13956,%r13957,%r13958,%r13959,%r13960,%r13961,%r13962,%r13963,%r13964,%r13965,%r13966,%r13967,%r13968,%r13969,%r13970,%r13971,%r13972,%r13973,%r13974,%r13975,%r13976,%r13977,%r13978,%r13979,%r13980,%r13981,%r13982,%r13983,%r13984,%r13985,%r13986,%r13987,%r13988,%r13989,%r13990,%r13991,%r13992,%r13993,%r13994,%r13995,%r13996,%r13997,%r13998,%r13999,%r14000,%r14001,%r14002,%r14003,%r14004,%r14005,%r14006,%r14007,%r14008,%r14009,%r14010,%r14011,%r14012,%r14013,%r14014,%r14015,%r14016,%r14017,%r14018,%r14019,%r14020,%r14021,%r14022,%r14023,%r14024,%r14025,%r14026,%r14027,%r14028,%r14029,%r14030,%r14031,%r14032,%r14033,%r14034,%r14035,%r14036,%r14037,%r14038,%r14039,%r14040,%r14041,%r14042,%r14043,%r14044,%r14045,%r14046,%r14047,%r14048,%r14049,%r14050,%r14051,%r14052,%r14053,%r14054,%r14055,%r14056,%r14057,%r14058,%r14059,%r14060,%r14061,%r14062,%r14063,%r14064,%r14065,%r14066,%r14067,%r14068,%r14069,%r14070,%r14071,%r14072,%r14073}, {%r5917,%r5918,%r5919,%r5920}, %rd466, %p9, 1, 1; 2026-02-21T12:44:28.6043768Z // end inline asm 2026-02-21T12:44:28.6043850Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:28.6043914Z mov.b32 %r6050, %r6051; 2026-02-21T12:44:28.6043978Z mov.b32 %r6049, %r11895; 2026-02-21T12:44:28.6044038Z // begin inline asm 2026-02-21T12:44:28.6046694Z // wait for regs: 
%r13946,%r13947,%r13948,%r13949,%r13950,%r13951,%r13952,%r13953,%r13954,%r13955,%r13956,%r13957,%r13958,%r13959,%r13960,%r13961,%r13962,%r13963,%r13964,%r13965,%r13966,%r13967,%r13968,%r13969,%r13970,%r13971,%r13972,%r13973,%r13974,%r13975,%r13976,%r13977,%r13978,%r13979,%r13980,%r13981,%r13982,%r13983,%r13984,%r13985,%r13986,%r13987,%r13988,%r13989,%r13990,%r13991,%r13992,%r13993,%r13994,%r13995,%r13996,%r13997,%r13998,%r13999,%r14000,%r14001,%r14002,%r14003,%r14004,%r14005,%r14006,%r14007,%r14008,%r14009,%r14010,%r14011,%r14012,%r14013,%r14014,%r14015,%r14016,%r14017,%r14018,%r14019,%r14020,%r14021,%r14022,%r14023,%r14024,%r14025,%r14026,%r14027,%r14028,%r14029,%r14030,%r14031,%r14032,%r14033,%r14034,%r14035,%r14036,%r14037,%r14038,%r14039,%r14040,%r14041,%r14042,%r14043,%r14044,%r14045,%r14046,%r14047,%r14048,%r14049,%r14050,%r14051,%r14052,%r14053,%r14054,%r14055,%r14056,%r14057,%r14058,%r14059,%r14060,%r14061,%r14062,%r14063,%r14064,%r14065,%r14066,%r14067,%r14068,%r14069,%r14070,%r14071,%r14072,%r14073,%r6049,%r6050,%r6051 2026-02-21T12:44:28.6046789Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:28.6046848Z // end inline asm 2026-02-21T12:44:28.6046907Z $L__tmp16: 2026-02-21T12:44:28.6047129Z .loc 1 90 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:90:28 2026-02-21T12:44:28.6047218Z cvt.rn.bf16x2.f32 %r6357, %r13947, %r13946; 2026-02-21T12:44:28.6047306Z cvt.rn.bf16x2.f32 %r6358, %r13949, %r13948; 2026-02-21T12:44:28.6047382Z cvt.rn.bf16x2.f32 %r6359, %r13951, %r13950; 2026-02-21T12:44:28.6047458Z cvt.rn.bf16x2.f32 %r6360, %r13953, %r13952; 2026-02-21T12:44:28.6047532Z cvt.rn.bf16x2.f32 %r6361, %r13955, %r13954; 2026-02-21T12:44:28.6047611Z cvt.rn.bf16x2.f32 %r6362, %r13957, %r13956; 2026-02-21T12:44:28.6047685Z cvt.rn.bf16x2.f32 %r6363, %r13959, %r13958; 2026-02-21T12:44:28.6047758Z cvt.rn.bf16x2.f32 %r6364, %r13961, %r13960; 2026-02-21T12:44:28.6047837Z cvt.rn.bf16x2.f32 %r6365, %r13963, %r13962; 2026-02-21T12:44:28.6047913Z 
cvt.rn.bf16x2.f32 %r6366, %r13965, %r13964; 2026-02-21T12:44:28.6047987Z cvt.rn.bf16x2.f32 %r6367, %r13967, %r13966; 2026-02-21T12:44:28.6048065Z cvt.rn.bf16x2.f32 %r6368, %r13969, %r13968; 2026-02-21T12:44:28.6048232Z cvt.rn.bf16x2.f32 %r6369, %r13971, %r13970; 2026-02-21T12:44:28.6048308Z cvt.rn.bf16x2.f32 %r6370, %r13973, %r13972; 2026-02-21T12:44:28.6048454Z cvt.rn.bf16x2.f32 %r6371, %r13975, %r13974; 2026-02-21T12:44:28.6048536Z cvt.rn.bf16x2.f32 %r6372, %r13977, %r13976; 2026-02-21T12:44:28.6048611Z cvt.rn.bf16x2.f32 %r6373, %r13979, %r13978; 2026-02-21T12:44:28.6048685Z cvt.rn.bf16x2.f32 %r6374, %r13981, %r13980; 2026-02-21T12:44:28.6048768Z cvt.rn.bf16x2.f32 %r6375, %r13983, %r13982; 2026-02-21T12:44:28.6048843Z cvt.rn.bf16x2.f32 %r6376, %r13985, %r13984; 2026-02-21T12:44:28.6048924Z cvt.rn.bf16x2.f32 %r6377, %r13987, %r13986; 2026-02-21T12:44:28.6048999Z cvt.rn.bf16x2.f32 %r6378, %r13989, %r13988; 2026-02-21T12:44:28.6049078Z cvt.rn.bf16x2.f32 %r6379, %r13991, %r13990; 2026-02-21T12:44:28.6049152Z cvt.rn.bf16x2.f32 %r6380, %r13993, %r13992; 2026-02-21T12:44:28.6049226Z cvt.rn.bf16x2.f32 %r6381, %r13995, %r13994; 2026-02-21T12:44:28.6049310Z cvt.rn.bf16x2.f32 %r6382, %r13997, %r13996; 2026-02-21T12:44:28.6049391Z cvt.rn.bf16x2.f32 %r6383, %r13999, %r13998; 2026-02-21T12:44:28.6049468Z cvt.rn.bf16x2.f32 %r6384, %r14001, %r14000; 2026-02-21T12:44:28.6049663Z cvt.rn.bf16x2.f32 %r6385, %r14003, %r14002; 2026-02-21T12:44:28.6049743Z cvt.rn.bf16x2.f32 %r6386, %r14005, %r14004; 2026-02-21T12:44:28.6049817Z cvt.rn.bf16x2.f32 %r6387, %r14007, %r14006; 2026-02-21T12:44:28.6049891Z cvt.rn.bf16x2.f32 %r6388, %r14009, %r14008; 2026-02-21T12:44:28.6049970Z cvt.rn.bf16x2.f32 %r6389, %r14011, %r14010; 2026-02-21T12:44:28.6050045Z cvt.rn.bf16x2.f32 %r6390, %r14013, %r14012; 2026-02-21T12:44:28.6050120Z cvt.rn.bf16x2.f32 %r6391, %r14015, %r14014; 2026-02-21T12:44:28.6050198Z cvt.rn.bf16x2.f32 %r6392, %r14017, %r14016; 2026-02-21T12:44:28.6050273Z cvt.rn.bf16x2.f32 %r6393, 
%r14019, %r14018; 2026-02-21T12:44:28.6050348Z cvt.rn.bf16x2.f32 %r6394, %r14021, %r14020; 2026-02-21T12:44:28.6050424Z cvt.rn.bf16x2.f32 %r6395, %r14023, %r14022; 2026-02-21T12:44:28.6050498Z cvt.rn.bf16x2.f32 %r6396, %r14025, %r14024; 2026-02-21T12:44:28.6050575Z cvt.rn.bf16x2.f32 %r6397, %r14027, %r14026; 2026-02-21T12:44:28.6050651Z cvt.rn.bf16x2.f32 %r6398, %r14029, %r14028; 2026-02-21T12:44:28.6050734Z cvt.rn.bf16x2.f32 %r6399, %r14031, %r14030; 2026-02-21T12:44:28.6050808Z cvt.rn.bf16x2.f32 %r6400, %r14033, %r14032; 2026-02-21T12:44:28.6050882Z cvt.rn.bf16x2.f32 %r6401, %r14035, %r14034; 2026-02-21T12:44:28.6050962Z cvt.rn.bf16x2.f32 %r6402, %r14037, %r14036; 2026-02-21T12:44:28.6051040Z cvt.rn.bf16x2.f32 %r6403, %r14039, %r14038; 2026-02-21T12:44:28.6051117Z cvt.rn.bf16x2.f32 %r6404, %r14041, %r14040; 2026-02-21T12:44:28.6051195Z cvt.rn.bf16x2.f32 %r6405, %r14043, %r14042; 2026-02-21T12:44:28.6051272Z cvt.rn.bf16x2.f32 %r6406, %r14045, %r14044; 2026-02-21T12:44:28.6051347Z cvt.rn.bf16x2.f32 %r6407, %r14047, %r14046; 2026-02-21T12:44:28.6051436Z cvt.rn.bf16x2.f32 %r6408, %r14049, %r14048; 2026-02-21T12:44:28.6051516Z cvt.rn.bf16x2.f32 %r6409, %r14051, %r14050; 2026-02-21T12:44:28.6051596Z cvt.rn.bf16x2.f32 %r6410, %r14053, %r14052; 2026-02-21T12:44:28.6051672Z cvt.rn.bf16x2.f32 %r6411, %r14055, %r14054; 2026-02-21T12:44:28.6051756Z cvt.rn.bf16x2.f32 %r6412, %r14057, %r14056; 2026-02-21T12:44:28.6051831Z cvt.rn.bf16x2.f32 %r6413, %r14059, %r14058; 2026-02-21T12:44:28.6051907Z cvt.rn.bf16x2.f32 %r6414, %r14061, %r14060; 2026-02-21T12:44:28.6051987Z cvt.rn.bf16x2.f32 %r6415, %r14063, %r14062; 2026-02-21T12:44:28.6052062Z cvt.rn.bf16x2.f32 %r6416, %r14065, %r14064; 2026-02-21T12:44:28.6052139Z cvt.rn.bf16x2.f32 %r6417, %r14067, %r14066; 2026-02-21T12:44:28.6052212Z cvt.rn.bf16x2.f32 %r6418, %r14069, %r14068; 2026-02-21T12:44:28.6052293Z cvt.rn.bf16x2.f32 %r6419, %r14071, %r14070; 2026-02-21T12:44:28.6052368Z cvt.rn.bf16x2.f32 %r6420, %r14073, %r14072; 
2026-02-21T12:44:28.6052585Z .loc 1 91 22 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:91:22 2026-02-21T12:44:28.6052665Z mad.lo.s64 %rd332, %rd313, 2560, %rd167; 2026-02-21T12:44:28.6052802Z shl.b64 %rd333, %rd329, 1; 2026-02-21T12:44:28.6052869Z add.s64 %rd297, %rd332, %rd333; 2026-02-21T12:44:28.6052945Z mad.lo.s64 %rd334, %rd314, 2560, %rd167; 2026-02-21T12:44:28.6053056Z add.s64 %rd298, %rd334, %rd333; 2026-02-21T12:44:28.6053126Z mad.lo.s64 %rd335, %rd315, 2560, %rd167; 2026-02-21T12:44:28.6053187Z add.s64 %rd299, %rd335, %rd333; 2026-02-21T12:44:28.6053262Z mad.lo.s64 %rd336, %rd316, 2560, %rd167; 2026-02-21T12:44:28.6053325Z add.s64 %rd300, %rd336, %rd333; 2026-02-21T12:44:28.6053392Z mad.lo.s64 %rd337, %rd317, 2560, %rd167; 2026-02-21T12:44:28.6053461Z add.s64 %rd301, %rd337, %rd333; 2026-02-21T12:44:28.6053537Z mad.lo.s64 %rd338, %rd318, 2560, %rd167; 2026-02-21T12:44:28.6053600Z add.s64 %rd302, %rd338, %rd333; 2026-02-21T12:44:28.6053669Z mad.lo.s64 %rd339, %rd319, 2560, %rd167; 2026-02-21T12:44:28.6053735Z add.s64 %rd303, %rd339, %rd333; 2026-02-21T12:44:28.6053802Z mad.lo.s64 %rd340, %rd320, 2560, %rd167; 2026-02-21T12:44:28.6053864Z add.s64 %rd304, %rd340, %rd333; 2026-02-21T12:44:28.6053937Z mad.lo.s64 %rd341, %rd321, 2560, %rd167; 2026-02-21T12:44:28.6053999Z add.s64 %rd305, %rd341, %rd333; 2026-02-21T12:44:28.6054157Z mad.lo.s64 %rd342, %rd322, 2560, %rd167; 2026-02-21T12:44:28.6054226Z add.s64 %rd306, %rd342, %rd333; 2026-02-21T12:44:28.6054294Z mad.lo.s64 %rd343, %rd323, 2560, %rd167; 2026-02-21T12:44:28.6054358Z add.s64 %rd307, %rd343, %rd333; 2026-02-21T12:44:28.6054427Z mad.lo.s64 %rd344, %rd324, 2560, %rd167; 2026-02-21T12:44:28.6054492Z add.s64 %rd308, %rd344, %rd333; 2026-02-21T12:44:28.6054560Z mad.lo.s64 %rd345, %rd325, 2560, %rd167; 2026-02-21T12:44:28.6054620Z add.s64 %rd309, %rd345, %rd333; 2026-02-21T12:44:28.6054694Z mad.lo.s64 %rd346, %rd326, 2560, %rd167; 2026-02-21T12:44:28.6054755Z add.s64 %rd310, %rd346, %rd333; 
2026-02-21T12:44:28.6054822Z mad.lo.s64 %rd347, %rd327, 2560, %rd167; 2026-02-21T12:44:28.6054884Z add.s64 %rd311, %rd347, %rd333; 2026-02-21T12:44:28.6054958Z mad.lo.s64 %rd348, %rd328, 2560, %rd167; 2026-02-21T12:44:28.6055022Z add.s64 %rd312, %rd348, %rd333; 2026-02-21T12:44:28.6055239Z .loc 1 91 81 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:91:81 2026-02-21T12:44:28.6055310Z bar.sync 0; 2026-02-21T12:44:28.6055425Z st.shared.v4.b32 [%r19], {%r6357, %r6359, %r6361, %r6363}; 2026-02-21T12:44:28.6055531Z st.shared.v4.b32 [%r20], {%r6365, %r6367, %r6369, %r6371}; 2026-02-21T12:44:28.6055635Z st.shared.v4.b32 [%r21], {%r6373, %r6375, %r6377, %r6379}; 2026-02-21T12:44:28.6055737Z st.shared.v4.b32 [%r22], {%r6381, %r6383, %r6385, %r6387}; 2026-02-21T12:44:28.6055835Z st.shared.v4.b32 [%r23], {%r6389, %r6391, %r6393, %r6395}; 2026-02-21T12:44:28.6055940Z st.shared.v4.b32 [%r24], {%r6397, %r6399, %r6401, %r6403}; 2026-02-21T12:44:28.6056041Z st.shared.v4.b32 [%r25], {%r6405, %r6407, %r6409, %r6411}; 2026-02-21T12:44:28.6056140Z st.shared.v4.b32 [%r26], {%r6413, %r6415, %r6417, %r6419}; 2026-02-21T12:44:28.6056200Z bar.sync 0; 2026-02-21T12:44:28.6056264Z // begin inline asm 2026-02-21T12:44:28.6056581Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6183, %r6184, %r6185, %r6186}, [%r3760]; 2026-02-21T12:44:28.6056660Z // end inline asm 2026-02-21T12:44:28.6056724Z // begin inline asm 2026-02-21T12:44:28.6056911Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6188, %r6189, %r6190, %r6191}, [%r3765]; 2026-02-21T12:44:28.6056967Z // end inline asm 2026-02-21T12:44:28.6057029Z // begin inline asm 2026-02-21T12:44:28.6057205Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6193, %r6194, %r6195, %r6196}, [%r3770]; 2026-02-21T12:44:28.6057264Z // end inline asm 2026-02-21T12:44:28.6057325Z // begin inline asm 2026-02-21T12:44:28.6057503Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6198, %r6199, %r6200, %r6201}, [%r3775]; 2026-02-21T12:44:28.6057558Z // end inline asm 
2026-02-21T12:44:28.6057620Z // begin inline asm 2026-02-21T12:44:28.6057798Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6203, %r6204, %r6205, %r6206}, [%r3780]; 2026-02-21T12:44:28.6057955Z // end inline asm 2026-02-21T12:44:28.6058014Z // begin inline asm 2026-02-21T12:44:28.6058198Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6208, %r6209, %r6210, %r6211}, [%r3785]; 2026-02-21T12:44:28.6058318Z // end inline asm 2026-02-21T12:44:28.6058375Z // begin inline asm 2026-02-21T12:44:28.6058555Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6213, %r6214, %r6215, %r6216}, [%r3790]; 2026-02-21T12:44:28.6058611Z // end inline asm 2026-02-21T12:44:28.6058670Z // begin inline asm 2026-02-21T12:44:28.6058844Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6218, %r6219, %r6220, %r6221}, [%r3795]; 2026-02-21T12:44:28.6058904Z // end inline asm 2026-02-21T12:44:28.6058959Z bar.sync 0; 2026-02-21T12:44:28.6059061Z st.shared.v4.b32 [%r19], {%r6358, %r6360, %r6362, %r6364}; 2026-02-21T12:44:28.6059166Z st.shared.v4.b32 [%r20], {%r6366, %r6368, %r6370, %r6372}; 2026-02-21T12:44:28.6059282Z st.shared.v4.b32 [%r21], {%r6374, %r6376, %r6378, %r6380}; 2026-02-21T12:44:28.6059386Z st.shared.v4.b32 [%r22], {%r6382, %r6384, %r6386, %r6388}; 2026-02-21T12:44:28.6059489Z st.shared.v4.b32 [%r23], {%r6390, %r6392, %r6394, %r6396}; 2026-02-21T12:44:28.6059702Z st.shared.v4.b32 [%r24], {%r6398, %r6400, %r6402, %r6404}; 2026-02-21T12:44:28.6059805Z st.shared.v4.b32 [%r25], {%r6406, %r6408, %r6410, %r6412}; 2026-02-21T12:44:28.6059905Z st.shared.v4.b32 [%r26], {%r6414, %r6416, %r6418, %r6420}; 2026-02-21T12:44:28.6059966Z bar.sync 0; 2026-02-21T12:44:28.6060024Z // begin inline asm 2026-02-21T12:44:28.6060203Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6223, %r6224, %r6225, %r6226}, [%r3760]; 2026-02-21T12:44:28.6060263Z // end inline asm 2026-02-21T12:44:28.6060322Z // begin inline asm 2026-02-21T12:44:28.6060499Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6228, %r6229, %r6230, %r6231}, [%r3765]; 
2026-02-21T12:44:28.6060561Z // end inline asm 2026-02-21T12:44:28.6060620Z // begin inline asm 2026-02-21T12:44:28.6060795Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6233, %r6234, %r6235, %r6236}, [%r3770]; 2026-02-21T12:44:28.6060853Z // end inline asm 2026-02-21T12:44:28.6060913Z // begin inline asm 2026-02-21T12:44:28.6061091Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6238, %r6239, %r6240, %r6241}, [%r3775]; 2026-02-21T12:44:28.6061147Z // end inline asm 2026-02-21T12:44:28.6061208Z // begin inline asm 2026-02-21T12:44:28.6061383Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6243, %r6244, %r6245, %r6246}, [%r3780]; 2026-02-21T12:44:28.6061451Z // end inline asm 2026-02-21T12:44:28.6061516Z // begin inline asm 2026-02-21T12:44:28.6061693Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6248, %r6249, %r6250, %r6251}, [%r3785]; 2026-02-21T12:44:28.6061748Z // end inline asm 2026-02-21T12:44:28.6061806Z // begin inline asm 2026-02-21T12:44:28.6061986Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6253, %r6254, %r6255, %r6256}, [%r3790]; 2026-02-21T12:44:28.6062042Z // end inline asm 2026-02-21T12:44:28.6062100Z // begin inline asm 2026-02-21T12:44:28.6062279Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r6258, %r6259, %r6260, %r6261}, [%r3795]; 2026-02-21T12:44:28.6062349Z // end inline asm 2026-02-21T12:44:28.6062407Z // begin inline asm 2026-02-21T12:44:28.6062540Z st.global.v4.b32 [ %rd297 + 0 ], { %r6183, %r6184, %r6185, %r6186 }; 2026-02-21T12:44:28.6062599Z // end inline asm 2026-02-21T12:44:28.6062655Z // begin inline asm 2026-02-21T12:44:28.6062774Z st.global.v4.b32 [ %rd298 + 0 ], { %r6223, %r6224, %r6225, %r6226 }; 2026-02-21T12:44:28.6062833Z // end inline asm 2026-02-21T12:44:28.6062890Z // begin inline asm 2026-02-21T12:44:28.6063006Z st.global.v4.b32 [ %rd299 + 0 ], { %r6188, %r6189, %r6190, %r6191 }; 2026-02-21T12:44:28.6063069Z // end inline asm 2026-02-21T12:44:28.6063126Z // begin inline asm 2026-02-21T12:44:28.6063239Z st.global.v4.b32 [ %rd300 + 0 ], 
{ %r6228, %r6229, %r6230, %r6231 }; 2026-02-21T12:44:28.6063294Z // end inline asm 2026-02-21T12:44:28.6063356Z // begin inline asm 2026-02-21T12:44:28.6063467Z st.global.v4.b32 [ %rd301 + 0 ], { %r6193, %r6194, %r6195, %r6196 }; 2026-02-21T12:44:28.6063592Z // end inline asm 2026-02-21T12:44:28.6063654Z // begin inline asm 2026-02-21T12:44:28.6063814Z st.global.v4.b32 [ %rd302 + 0 ], { %r6233, %r6234, %r6235, %r6236 }; 2026-02-21T12:44:28.6063873Z // end inline asm 2026-02-21T12:44:28.6063931Z // begin inline asm 2026-02-21T12:44:28.6064046Z st.global.v4.b32 [ %rd303 + 0 ], { %r6198, %r6199, %r6200, %r6201 }; 2026-02-21T12:44:28.6064102Z // end inline asm 2026-02-21T12:44:28.6064158Z // begin inline asm 2026-02-21T12:44:28.6064276Z st.global.v4.b32 [ %rd304 + 0 ], { %r6238, %r6239, %r6240, %r6241 }; 2026-02-21T12:44:28.6064330Z // end inline asm 2026-02-21T12:44:28.6064387Z // begin inline asm 2026-02-21T12:44:28.6064504Z st.global.v4.b32 [ %rd305 + 0 ], { %r6203, %r6204, %r6205, %r6206 }; 2026-02-21T12:44:28.6064560Z // end inline asm 2026-02-21T12:44:28.6064618Z // begin inline asm 2026-02-21T12:44:28.6064729Z st.global.v4.b32 [ %rd306 + 0 ], { %r6243, %r6244, %r6245, %r6246 }; 2026-02-21T12:44:28.6064794Z // end inline asm 2026-02-21T12:44:28.6064851Z // begin inline asm 2026-02-21T12:44:28.6065017Z st.global.v4.b32 [ %rd307 + 0 ], { %r6208, %r6209, %r6210, %r6211 }; 2026-02-21T12:44:28.6065132Z // end inline asm 2026-02-21T12:44:28.6065193Z // begin inline asm 2026-02-21T12:44:28.6065308Z st.global.v4.b32 [ %rd308 + 0 ], { %r6248, %r6249, %r6250, %r6251 }; 2026-02-21T12:44:28.6065363Z // end inline asm 2026-02-21T12:44:28.6065419Z // begin inline asm 2026-02-21T12:44:28.6065529Z st.global.v4.b32 [ %rd309 + 0 ], { %r6213, %r6214, %r6215, %r6216 }; 2026-02-21T12:44:28.6065586Z // end inline asm 2026-02-21T12:44:28.6065641Z // begin inline asm 2026-02-21T12:44:28.6065752Z st.global.v4.b32 [ %rd310 + 0 ], { %r6253, %r6254, %r6255, %r6256 }; 
2026-02-21T12:44:28.6065809Z // end inline asm 2026-02-21T12:44:28.6065864Z // begin inline asm 2026-02-21T12:44:28.6065974Z st.global.v4.b32 [ %rd311 + 0 ], { %r6218, %r6219, %r6220, %r6221 }; 2026-02-21T12:44:28.6066029Z // end inline asm 2026-02-21T12:44:28.6066091Z // begin inline asm 2026-02-21T12:44:28.6066201Z st.global.v4.b32 [ %rd312 + 0 ], { %r6258, %r6259, %r6260, %r6261 }; 2026-02-21T12:44:28.6066260Z // end inline asm 2026-02-21T12:44:28.6066616Z .loc 1 22 120 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:22:120 2026-02-21T12:44:28.6066685Z add.s64 %rd349, %rd647, 2; 2026-02-21T12:44:28.6066891Z .loc 1 28 35 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:28:35 2026-02-21T12:44:28.6066983Z mul.hi.u64 %rd350, %rd349, -3689348814741910323; 2026-02-21T12:44:28.6067046Z shr.u64 %rd351, %rd350, 4; 2026-02-21T12:44:28.6067246Z .loc 1 29 33 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:29:33 2026-02-21T12:44:28.6067313Z shl.b64 %rd67, %rd351, 2; 2026-02-21T12:44:28.6067509Z .loc 1 30 39 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:30:39 2026-02-21T12:44:28.6067569Z sub.s64 %rd352, 2048, %rd67; 2026-02-21T12:44:28.6067767Z .loc 1 30 52 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:30:52 2026-02-21T12:44:28.6067835Z min.s64 %rd68, %rd352, 4; 2026-02-21T12:44:28.6068030Z .loc 1 31 45 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:31:45 2026-02-21T12:44:28.6068095Z mul.lo.s64 %rd353, %rd351, 20; 2026-02-21T12:44:28.6068161Z sub.s64 %rd69, %rd349, %rd353; 2026-02-21T12:44:28.6068357Z .loc 1 32 51 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:32:51 2026-02-21T12:44:28.6068420Z or.b64 %rd354, %rd69, %rd68; 2026-02-21T12:44:28.6068538Z and.b64 %rd355, %rd354, -4294967296; 2026-02-21T12:44:28.6068616Z setp.ne.b64 %p14, %rd355, 0; 2026-02-21T12:44:28.6068678Z @%p14 bra $L__BB0_14; 2026-02-21T12:44:28.6068736Z bra.uni $L__BB0_13; 2026-02-21T12:44:28.6068857Z $L__BB0_14: // in Loop: 
Header=BB0_2 Depth=1 2026-02-21T12:44:28.6069003Z div.s64 %rd656, %rd69, %rd68; 2026-02-21T12:44:28.6069060Z bra.uni $L__BB0_15; 2026-02-21T12:44:28.6069174Z $L__BB0_13: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:28.6069303Z cvt.u32.u64 %r6421, %rd68; 2026-02-21T12:44:28.6069361Z cvt.u32.u64 %r6422, %rd69; 2026-02-21T12:44:28.6069424Z div.u32 %r6423, %r6422, %r6421; 2026-02-21T12:44:28.6069488Z cvt.u64.u32 %rd656, %r6423; 2026-02-21T12:44:28.6069591Z $L__BB0_15: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:28.6069790Z .loc 1 31 64 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:31:64 2026-02-21T12:44:28.6069862Z mul.lo.s64 %rd357, %rd656, %rd68; 2026-02-21T12:44:28.6069925Z sub.s64 %rd358, %rd69, %rd357; 2026-02-21T12:44:28.6070121Z .loc 1 31 30 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:31:30 2026-02-21T12:44:28.6070185Z add.s64 %rd359, %rd358, %rd67; 2026-02-21T12:44:28.6070378Z .loc 1 33 27 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:33:27 2026-02-21T12:44:28.6070442Z shl.b64 %rd73, %rd359, 7; 2026-02-21T12:44:28.6070754Z .loc 1 34 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:34:32 2026-02-21T12:44:28.6070820Z or.b64 %rd360, %rd73, %rd4; 2026-02-21T12:44:28.6071014Z .loc 1 35 27 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:35:27 2026-02-21T12:44:28.6071086Z shl.b64 %rd74, %rd656, 8; 2026-02-21T12:44:28.6071290Z .loc 1 36 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:36:32 2026-02-21T12:44:28.6071349Z or.b64 %rd75, %rd74, %rd21; 2026-02-21T12:44:28.6071544Z .loc 1 51 53 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:53 2026-02-21T12:44:28.6071608Z shl.b64 %rd76, %rd360, 13; 2026-02-21T12:44:28.6071815Z .loc 1 43 126 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:43:126 2026-02-21T12:44:28.6071887Z shl.b64 %rd361, %rd359, 21; 2026-02-21T12:44:28.6071951Z add.s64 %rd658, %rd30, %rd361; 2026-02-21T12:44:28.6072018Z add.s64 %rd657, %rd31, 
%rd74; 2026-02-21T12:44:28.6072080Z mov.b32 %r14074, 0f00000000; 2026-02-21T12:44:28.6072140Z mov.b64 %rd659, -12; 2026-02-21T12:44:28.6072203Z mov.b32 %r14075, %r14074; 2026-02-21T12:44:28.6072261Z mov.b32 %r14076, %r14074; 2026-02-21T12:44:28.6072318Z mov.b32 %r14077, %r14074; 2026-02-21T12:44:28.6072380Z mov.b32 %r14078, %r14074; 2026-02-21T12:44:28.6072436Z mov.b32 %r14079, %r14074; 2026-02-21T12:44:28.6072492Z mov.b32 %r14080, %r14074; 2026-02-21T12:44:28.6072549Z mov.b32 %r14081, %r14074; 2026-02-21T12:44:28.6072607Z mov.b32 %r14082, %r14074; 2026-02-21T12:44:28.6072663Z mov.b32 %r14083, %r14074; 2026-02-21T12:44:28.6072719Z mov.b32 %r14084, %r14074; 2026-02-21T12:44:28.6072779Z mov.b32 %r14085, %r14074; 2026-02-21T12:44:28.6072837Z mov.b32 %r14086, %r14074; 2026-02-21T12:44:28.6072897Z mov.b32 %r14087, %r14074; 2026-02-21T12:44:28.6072970Z mov.b32 %r14088, %r14074; 2026-02-21T12:44:28.6073029Z mov.b32 %r14089, %r14074; 2026-02-21T12:44:28.6073090Z mov.b32 %r14090, %r14074; 2026-02-21T12:44:28.6073148Z mov.b32 %r14091, %r14074; 2026-02-21T12:44:28.6073212Z mov.b32 %r14092, %r14074; 2026-02-21T12:44:28.6073269Z mov.b32 %r14093, %r14074; 2026-02-21T12:44:28.6073326Z mov.b32 %r14094, %r14074; 2026-02-21T12:44:28.6073383Z mov.b32 %r14095, %r14074; 2026-02-21T12:44:28.6073441Z mov.b32 %r14096, %r14074; 2026-02-21T12:44:28.6073499Z mov.b32 %r14097, %r14074; 2026-02-21T12:44:28.6073556Z mov.b32 %r14098, %r14074; 2026-02-21T12:44:28.6073615Z mov.b32 %r14099, %r14074; 2026-02-21T12:44:28.6073675Z mov.b32 %r14100, %r14074; 2026-02-21T12:44:28.6073731Z mov.b32 %r14101, %r14074; 2026-02-21T12:44:28.6073792Z mov.b32 %r14102, %r14074; 2026-02-21T12:44:28.6073850Z mov.b32 %r14103, %r14074; 2026-02-21T12:44:28.6073907Z mov.b32 %r14104, %r14074; 2026-02-21T12:44:28.6074024Z mov.b32 %r14105, %r14074; 2026-02-21T12:44:28.6074085Z mov.b32 %r14106, %r14074; 2026-02-21T12:44:28.6074141Z mov.b32 %r14107, %r14074; 2026-02-21T12:44:28.6074246Z mov.b32 %r14108, %r14074; 
2026-02-21T12:44:28.6074307Z mov.b32 %r14109, %r14074; 2026-02-21T12:44:28.6074363Z mov.b32 %r14110, %r14074; 2026-02-21T12:44:28.6074421Z mov.b32 %r14111, %r14074; 2026-02-21T12:44:28.6074478Z mov.b32 %r14112, %r14074; 2026-02-21T12:44:28.6074538Z mov.b32 %r14113, %r14074; 2026-02-21T12:44:28.6074594Z mov.b32 %r14114, %r14074; 2026-02-21T12:44:28.6074650Z mov.b32 %r14115, %r14074; 2026-02-21T12:44:28.6074710Z mov.b32 %r14116, %r14074; 2026-02-21T12:44:28.6074767Z mov.b32 %r14117, %r14074; 2026-02-21T12:44:28.6074823Z mov.b32 %r14118, %r14074; 2026-02-21T12:44:28.6074879Z mov.b32 %r14119, %r14074; 2026-02-21T12:44:28.6074942Z mov.b32 %r14120, %r14074; 2026-02-21T12:44:28.6075000Z mov.b32 %r14121, %r14074; 2026-02-21T12:44:28.6075056Z mov.b32 %r14122, %r14074; 2026-02-21T12:44:28.6075118Z mov.b32 %r14123, %r14074; 2026-02-21T12:44:28.6075175Z mov.b32 %r14124, %r14074; 2026-02-21T12:44:28.6075232Z mov.b32 %r14125, %r14074; 2026-02-21T12:44:28.6075402Z mov.b32 %r14126, %r14074; 2026-02-21T12:44:28.6075465Z mov.b32 %r14127, %r14074; 2026-02-21T12:44:28.6075522Z mov.b32 %r14128, %r14074; 2026-02-21T12:44:28.6075578Z mov.b32 %r14129, %r14074; 2026-02-21T12:44:28.6075638Z mov.b32 %r14130, %r14074; 2026-02-21T12:44:28.6075695Z mov.b32 %r14131, %r14074; 2026-02-21T12:44:28.6075751Z mov.b32 %r14132, %r14074; 2026-02-21T12:44:28.6075809Z mov.b32 %r14133, %r14074; 2026-02-21T12:44:28.6075863Z mov.b32 %r14134, %r14074; 2026-02-21T12:44:28.6075920Z mov.b32 %r14135, %r14074; 2026-02-21T12:44:28.6075978Z mov.b32 %r14136, %r14074; 2026-02-21T12:44:28.6076038Z mov.b32 %r14137, %r14074; 2026-02-21T12:44:28.6076094Z mov.b32 %r14138, %r14074; 2026-02-21T12:44:28.6076151Z mov.b32 %r14139, %r14074; 2026-02-21T12:44:28.6076210Z mov.b32 %r14140, %r14074; 2026-02-21T12:44:28.6076268Z mov.b32 %r14141, %r14074; 2026-02-21T12:44:28.6076327Z mov.b32 %r14142, %r14074; 2026-02-21T12:44:28.6076382Z mov.b32 %r14143, %r14074; 2026-02-21T12:44:28.6076572Z mov.b32 %r14144, %r14074; 
2026-02-21T12:44:28.6076637Z mov.b32 %r14145, %r14074; 2026-02-21T12:44:28.6076693Z mov.b32 %r14146, %r14074; 2026-02-21T12:44:28.6076752Z mov.b32 %r14147, %r14074; 2026-02-21T12:44:28.6076808Z mov.b32 %r14148, %r14074; 2026-02-21T12:44:28.6076865Z mov.b32 %r14149, %r14074; 2026-02-21T12:44:28.6076922Z mov.b32 %r14150, %r14074; 2026-02-21T12:44:28.6076981Z mov.b32 %r14151, %r14074; 2026-02-21T12:44:28.6077038Z mov.b32 %r14152, %r14074; 2026-02-21T12:44:28.6077095Z mov.b32 %r14153, %r14074; 2026-02-21T12:44:28.6077156Z mov.b32 %r14154, %r14074; 2026-02-21T12:44:28.6077213Z mov.b32 %r14155, %r14074; 2026-02-21T12:44:28.6077271Z mov.b32 %r14156, %r14074; 2026-02-21T12:44:28.6077327Z mov.b32 %r14157, %r14074; 2026-02-21T12:44:28.6077386Z mov.b32 %r14158, %r14074; 2026-02-21T12:44:28.6077445Z mov.b32 %r14159, %r14074; 2026-02-21T12:44:28.6077501Z mov.b32 %r14160, %r14074; 2026-02-21T12:44:28.6077560Z mov.b32 %r14161, %r14074; 2026-02-21T12:44:28.6077621Z mov.b32 %r14162, %r14074; 2026-02-21T12:44:28.6077677Z mov.b32 %r14163, %r14074; 2026-02-21T12:44:28.6077734Z mov.b32 %r14164, %r14074; 2026-02-21T12:44:28.6077794Z mov.b32 %r14165, %r14074; 2026-02-21T12:44:28.6077851Z mov.b32 %r14166, %r14074; 2026-02-21T12:44:28.6077908Z mov.b32 %r14167, %r14074; 2026-02-21T12:44:28.6077968Z mov.b32 %r14168, %r14074; 2026-02-21T12:44:28.6078024Z mov.b32 %r14169, %r14074; 2026-02-21T12:44:28.6078081Z mov.b32 %r14170, %r14074; 2026-02-21T12:44:28.6078137Z mov.b32 %r14171, %r14074; 2026-02-21T12:44:28.6078198Z mov.b32 %r14172, %r14074; 2026-02-21T12:44:28.6078255Z mov.b32 %r14173, %r14074; 2026-02-21T12:44:28.6078312Z mov.b32 %r14174, %r14074; 2026-02-21T12:44:28.6078370Z mov.b32 %r14175, %r14074; 2026-02-21T12:44:28.6078426Z mov.b32 %r14176, %r14074; 2026-02-21T12:44:28.6078580Z mov.b32 %r14177, %r14074; 2026-02-21T12:44:28.6078639Z mov.b32 %r14178, %r14074; 2026-02-21T12:44:28.6078695Z mov.b32 %r14179, %r14074; 2026-02-21T12:44:28.6078822Z mov.b32 %r14180, %r14074; 
2026-02-21T12:44:28.6078878Z mov.b32 %r14181, %r14074; 2026-02-21T12:44:28.6078938Z mov.b32 %r14182, %r14074; 2026-02-21T12:44:28.6078993Z mov.b32 %r14183, %r14074; 2026-02-21T12:44:28.6079049Z mov.b32 %r14184, %r14074; 2026-02-21T12:44:28.6079107Z mov.b32 %r14185, %r14074; 2026-02-21T12:44:28.6079164Z mov.b32 %r14186, %r14074; 2026-02-21T12:44:28.6079223Z mov.b32 %r14187, %r14074; 2026-02-21T12:44:28.6079280Z mov.b32 %r14188, %r14074; 2026-02-21T12:44:28.6079339Z mov.b32 %r14189, %r14074; 2026-02-21T12:44:28.6079395Z mov.b32 %r14190, %r14074; 2026-02-21T12:44:28.6079455Z mov.b32 %r14191, %r14074; 2026-02-21T12:44:28.6079515Z mov.b32 %r14192, %r14074; 2026-02-21T12:44:28.6079571Z mov.b32 %r14193, %r14074; 2026-02-21T12:44:28.6079629Z mov.b32 %r14194, %r14074; 2026-02-21T12:44:28.6079687Z mov.b32 %r14195, %r14074; 2026-02-21T12:44:28.6079747Z mov.b32 %r14196, %r14074; 2026-02-21T12:44:28.6079802Z mov.b32 %r14197, %r14074; 2026-02-21T12:44:28.6079976Z mov.b32 %r14198, %r14074; 2026-02-21T12:44:28.6080040Z mov.b32 %r14199, %r14074; 2026-02-21T12:44:28.6080096Z mov.b32 %r14200, %r14074; 2026-02-21T12:44:28.6080153Z mov.b32 %r14201, %r14074; 2026-02-21T12:44:28.6080269Z $L__BB0_16: // Parent Loop BB0_2 Depth=1 2026-02-21T12:44:28.6080374Z // => This Inner Loop Header: Depth=2 2026-02-21T12:44:28.6080579Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6080646Z add.s64 %rd363, %rd658, -32; 2026-02-21T12:44:28.6080710Z // begin inline asm 2026-02-21T12:44:28.6080770Z mov.u64 %rd362, 0x0; 2026-02-21T12:44:28.6080900Z createpolicy.fractional.L2::evict_last.b64 %rd362, 1.0; 2026-02-21T12:44:28.6080959Z // end inline asm 2026-02-21T12:44:28.6081019Z // begin inline asm 2026-02-21T12:44:28.6081076Z mov.u32 %r6425, 0x0; 2026-02-21T12:44:28.6081132Z mov.u32 %r6426, 0x0; 2026-02-21T12:44:28.6081331Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r6425, %r6426 }, [ %rd363 + 0 ], %rd362; 2026-02-21T12:44:28.6081388Z // end inline 
asm 2026-02-21T12:44:28.6081607Z .loc 1 55 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:55:32 2026-02-21T12:44:28.6081666Z bar.sync 0; 2026-02-21T12:44:28.6081747Z st.shared.v2.b32 [%r10], {%r6425, %r6426}; 2026-02-21T12:44:28.6081803Z bar.sync 0; 2026-02-21T12:44:28.6081873Z ld.shared.b16 %rs225, [%r11]; 2026-02-21T12:44:28.6081940Z ld.shared.b16 %rs226, [%r11+128]; 2026-02-21T12:44:28.6082007Z ld.shared.b16 %rs227, [%r11+8]; 2026-02-21T12:44:28.6082077Z ld.shared.b16 %rs228, [%r11+136]; 2026-02-21T12:44:28.6082145Z cvt.f32.bf16 %r6684, %rs225; 2026-02-21T12:44:28.6082207Z cvt.f32.bf16 %r6685, %rs226; 2026-02-21T12:44:28.6082268Z cvt.f32.bf16 %r6686, %rs227; 2026-02-21T12:44:28.6082333Z cvt.f32.bf16 %r6687, %rs228; 2026-02-21T12:44:28.6082539Z .loc 1 57 87 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:87 2026-02-21T12:44:28.6082600Z // begin inline asm 2026-02-21T12:44:28.6082658Z mov.u32 %r6427, 0x0; 2026-02-21T12:44:28.6082734Z ld.global.b32 { %r6427 }, [ %rd657 + 0 ]; 2026-02-21T12:44:28.6082790Z // end inline asm 2026-02-21T12:44:28.6082987Z .loc 1 65 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:65:28 2026-02-21T12:44:28.6083046Z bar.sync 0; 2026-02-21T12:44:28.6083108Z st.shared.b8 [%r12], %r6427; 2026-02-21T12:44:28.6083188Z prmt.b32 %r8000, %r6427, 0, 0x7771U; 2026-02-21T12:44:28.6083257Z st.shared.b8 [%r13+256], %r8000; 2026-02-21T12:44:28.6083322Z prmt.b32 %r8001, %r6427, 0, 0x7772U; 2026-02-21T12:44:28.6083386Z st.shared.b8 [%r14+512], %r8001; 2026-02-21T12:44:28.6083448Z prmt.b32 %r8002, %r6427, 0, 0x7773U; 2026-02-21T12:44:28.6083571Z st.shared.b8 [%r15+768], %r8002; 2026-02-21T12:44:28.6083625Z bar.sync 0; 2026-02-21T12:44:28.6083687Z ld.shared.b32 %r8003, [%r16]; 2026-02-21T12:44:28.6083799Z prmt.b32 %r8004, %r8003, 0, 0x7771U; 2026-02-21T12:44:28.6083861Z cvt.u16.u32 %rs229, %r8004; 2026-02-21T12:44:28.6083923Z prmt.b32 %r8005, %r8003, 0, 0x7770U; 2026-02-21T12:44:28.6083983Z cvt.u16.u32 %rs230, 
%r8005; 2026-02-21T12:44:28.6084046Z prmt.b32 %r8006, %r8003, 0, 0x7773U; 2026-02-21T12:44:28.6084105Z cvt.u16.u32 %rs231, %r8006; 2026-02-21T12:44:28.6084165Z prmt.b32 %r8007, %r8003, 0, 0x7772U; 2026-02-21T12:44:28.6084229Z cvt.u16.u32 %rs232, %r8007; 2026-02-21T12:44:28.6084428Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6084491Z shl.b16 %rs233, %rs230, 4; 2026-02-21T12:44:28.6084557Z shl.b16 %rs234, %rs229, 4; 2026-02-21T12:44:28.6084754Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6084817Z cvt.u32.u16 %r8008, %rs233; 2026-02-21T12:44:28.6084889Z prmt.b32 %r8009, %r8008, %r8010, 0x3340U; 2026-02-21T12:44:28.6085050Z prmt.b32 %r8014, %r8009, %r8011, 0x5410U; 2026-02-21T12:44:28.6085122Z prmt.b32 %r8015, %r8014, %r8003, 0x5040U; 2026-02-21T12:44:28.6085184Z prmt.b32 %r8016, %r8015, 0, 0x9991U; 2026-02-21T12:44:28.6085247Z cvt.u16.u32 %rs235, %r8016; 2026-02-21T12:44:28.6085306Z shr.s16 %rs236, %rs235, 4; 2026-02-21T12:44:28.6085368Z prmt.b32 %r8017, %r8015, 0, 0xbbb3U; 2026-02-21T12:44:28.6085428Z cvt.u16.u32 %rs237, %r8017; 2026-02-21T12:44:28.6085493Z shr.s16 %rs238, %rs237, 4; 2026-02-21T12:44:28.6085553Z cvt.s16.s8 %rs239, %rs233; 2026-02-21T12:44:28.6085612Z shr.s16 %rs240, %rs239, 4; 2026-02-21T12:44:28.6085673Z cvt.s16.s8 %rs241, %rs234; 2026-02-21T12:44:28.6085732Z shr.s16 %rs242, %rs241, 4; 2026-02-21T12:44:28.6085929Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6086000Z cvt.rn.f32.s16 %r8018, %rs238; 2026-02-21T12:44:28.6086063Z cvt.rn.f32.s16 %r8019, %rs236; 2026-02-21T12:44:28.6086125Z cvt.rn.f32.s16 %r8020, %rs242; 2026-02-21T12:44:28.6086189Z cvt.rn.f32.s16 %r8021, %rs240; 2026-02-21T12:44:28.6086391Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6086580Z shl.b16 %rs243, %rs232, 4; 2026-02-21T12:44:28.6086645Z shl.b16 %rs244, %rs231, 4; 
2026-02-21T12:44:28.6086848Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6086919Z prmt.b32 %r8022, %r8003, %r8023, 0x3020U; 2026-02-21T12:44:28.6086983Z prmt.b32 %r8024, %r8022, 0, 0x9991U; 2026-02-21T12:44:28.6087046Z cvt.u16.u32 %rs245, %r8024; 2026-02-21T12:44:28.6087106Z shr.s16 %rs246, %rs245, 4; 2026-02-21T12:44:28.6087166Z cvt.s16.s8 %rs247, %rs243; 2026-02-21T12:44:28.6087226Z shr.s16 %rs248, %rs247, 4; 2026-02-21T12:44:28.6087293Z cvt.s16.s8 %rs249, %rs244; 2026-02-21T12:44:28.6087363Z shr.s16 %rs250, %rs249, 4; 2026-02-21T12:44:28.6087430Z prmt.b32 %r8025, %r8003, 0, 0xbbb3U; 2026-02-21T12:44:28.6087496Z cvt.u16.u32 %rs251, %r8025; 2026-02-21T12:44:28.6087557Z shr.s16 %rs252, %rs251, 4; 2026-02-21T12:44:28.6087755Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6087819Z cvt.rn.f32.s16 %r8026, %rs246; 2026-02-21T12:44:28.6087883Z cvt.rn.f32.s16 %r8027, %rs252; 2026-02-21T12:44:28.6087944Z cvt.rn.f32.s16 %r8028, %rs250; 2026-02-21T12:44:28.6088005Z cvt.rn.f32.s16 %r8029, %rs248; 2026-02-21T12:44:28.6088061Z bar.sync 0; 2026-02-21T12:44:28.6088173Z st.shared.v4.b32 [%r17], {%r8021, %r8019, %r8020, %r8018}; 2026-02-21T12:44:28.6088277Z st.shared.v4.b32 [%r18], {%r8029, %r8026, %r8028, %r8027}; 2026-02-21T12:44:28.6088336Z $L__tmp17: 2026-02-21T12:44:28.6088611Z .loc 2 291 36 // standard.py:291:36 @[ cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:87:40 ] 2026-02-21T12:44:28.6088752Z // begin inline asm 2026-02-21T12:44:28.6088828Z fence.proxy.async.shared::cta; 2026-02-21T12:44:28.6088946Z // end inline asm 2026-02-21T12:44:28.6089011Z bar.sync 0; 2026-02-21T12:44:28.6089083Z wgmma.fence.sync.aligned; 2026-02-21T12:44:28.6089150Z mov.pred %p15, -1; 2026-02-21T12:44:28.6089206Z // begin inline asm 2026-02-21T12:44:28.6092037Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14074,%r14075,%r14076,%r14077,%r14078,%r14079,%r14080,%r14081,%r14082,%r14083,%r14084,%r14085,%r14086,%r14087,%r14088,%r14089,%r14090,%r14091,%r14092,%r14093,%r14094,%r14095,%r14096,%r14097,%r14098,%r14099,%r14100,%r14101,%r14102,%r14103,%r14104,%r14105,%r14106,%r14107,%r14108,%r14109,%r14110,%r14111,%r14112,%r14113,%r14114,%r14115,%r14116,%r14117,%r14118,%r14119,%r14120,%r14121,%r14122,%r14123,%r14124,%r14125,%r14126,%r14127,%r14128,%r14129,%r14130,%r14131,%r14132,%r14133,%r14134,%r14135,%r14136,%r14137,%r14138,%r14139,%r14140,%r14141,%r14142,%r14143,%r14144,%r14145,%r14146,%r14147,%r14148,%r14149,%r14150,%r14151,%r14152,%r14153,%r14154,%r14155,%r14156,%r14157,%r14158,%r14159,%r14160,%r14161,%r14162,%r14163,%r14164,%r14165,%r14166,%r14167,%r14168,%r14169,%r14170,%r14171,%r14172,%r14173,%r14174,%r14175,%r14176,%r14177,%r14178,%r14179,%r14180,%r14181,%r14182,%r14183,%r14184,%r14185,%r14186,%r14187,%r14188,%r14189,%r14190,%r14191,%r14192,%r14193,%r14194,%r14195,%r14196,%r14197,%r14198,%r14199,%r14200,%r14201}, {%r6684,%r6685,%r6686,%r6687}, %rd466, %p15, 1, 1; 2026-02-21T12:44:28.6092105Z // end inline asm 2026-02-21T12:44:28.6092181Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:28.6092240Z mov.b32 %r8477, 0; 2026-02-21T12:44:28.6092298Z mov.b32 %r6816, %r11895; 2026-02-21T12:44:28.6092358Z mov.b32 %r6817, %r8477; 2026-02-21T12:44:28.6092415Z mov.b32 %r6818, %r8477; 2026-02-21T12:44:28.6092475Z // begin inline asm 2026-02-21T12:44:28.6094991Z // wait for regs: 
%r14074,%r14075,%r14076,%r14077,%r14078,%r14079,%r14080,%r14081,%r14082,%r14083,%r14084,%r14085,%r14086,%r14087,%r14088,%r14089,%r14090,%r14091,%r14092,%r14093,%r14094,%r14095,%r14096,%r14097,%r14098,%r14099,%r14100,%r14101,%r14102,%r14103,%r14104,%r14105,%r14106,%r14107,%r14108,%r14109,%r14110,%r14111,%r14112,%r14113,%r14114,%r14115,%r14116,%r14117,%r14118,%r14119,%r14120,%r14121,%r14122,%r14123,%r14124,%r14125,%r14126,%r14127,%r14128,%r14129,%r14130,%r14131,%r14132,%r14133,%r14134,%r14135,%r14136,%r14137,%r14138,%r14139,%r14140,%r14141,%r14142,%r14143,%r14144,%r14145,%r14146,%r14147,%r14148,%r14149,%r14150,%r14151,%r14152,%r14153,%r14154,%r14155,%r14156,%r14157,%r14158,%r14159,%r14160,%r14161,%r14162,%r14163,%r14164,%r14165,%r14166,%r14167,%r14168,%r14169,%r14170,%r14171,%r14172,%r14173,%r14174,%r14175,%r14176,%r14177,%r14178,%r14179,%r14180,%r14181,%r14182,%r14183,%r14184,%r14185,%r14186,%r14187,%r14188,%r14189,%r14190,%r14191,%r14192,%r14193,%r14194,%r14195,%r14196,%r14197,%r14198,%r14199,%r14200,%r14201,%r6816,%r6817,%r6818 2026-02-21T12:44:28.6095073Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:28.6095129Z // end inline asm 2026-02-21T12:44:28.6095182Z $L__tmp18: 2026-02-21T12:44:28.6095394Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6095461Z add.s64 %rd368, %rd658, -16; 2026-02-21T12:44:28.6095520Z // begin inline asm 2026-02-21T12:44:28.6095580Z mov.u64 %rd367, 0x0; 2026-02-21T12:44:28.6095702Z createpolicy.fractional.L2::evict_last.b64 %rd367, 1.0; 2026-02-21T12:44:28.6095758Z // end inline asm 2026-02-21T12:44:28.6095814Z // begin inline asm 2026-02-21T12:44:28.6095875Z mov.u32 %r6950, 0x0; 2026-02-21T12:44:28.6095929Z mov.u32 %r6951, 0x0; 2026-02-21T12:44:28.6096127Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r6950, %r6951 }, [ %rd368 + 0 ], %rd367; 2026-02-21T12:44:28.6096187Z // end inline asm 2026-02-21T12:44:28.6096389Z .loc 1 55 32 // 
cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:55:32 2026-02-21T12:44:28.6096637Z bar.sync 0; 2026-02-21T12:44:28.6096721Z st.shared.v2.b32 [%r10], {%r6950, %r6951}; 2026-02-21T12:44:28.6096777Z bar.sync 0; 2026-02-21T12:44:28.6096844Z ld.shared.b16 %rs253, [%r11]; 2026-02-21T12:44:28.6096985Z ld.shared.b16 %rs254, [%r11+128]; 2026-02-21T12:44:28.6097052Z ld.shared.b16 %rs255, [%r11+8]; 2026-02-21T12:44:28.6097115Z ld.shared.b16 %rs256, [%r11+136]; 2026-02-21T12:44:28.6097179Z cvt.f32.bf16 %r7209, %rs253; 2026-02-21T12:44:28.6097244Z cvt.f32.bf16 %r7210, %rs254; 2026-02-21T12:44:28.6097304Z cvt.f32.bf16 %r7211, %rs255; 2026-02-21T12:44:28.6097363Z cvt.f32.bf16 %r7212, %rs256; 2026-02-21T12:44:28.6097569Z .loc 1 57 87 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:87 2026-02-21T12:44:28.6097635Z add.s64 %rd370, %rd657, 5120; 2026-02-21T12:44:28.6097693Z // begin inline asm 2026-02-21T12:44:28.6097748Z mov.u32 %r6952, 0x0; 2026-02-21T12:44:28.6097825Z ld.global.b32 { %r6952 }, [ %rd370 + 0 ]; 2026-02-21T12:44:28.6097883Z // end inline asm 2026-02-21T12:44:28.6098091Z .loc 1 65 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:65:28 2026-02-21T12:44:28.6098235Z bar.sync 0; 2026-02-21T12:44:28.6098355Z st.shared.b8 [%r12], %r6952; 2026-02-21T12:44:28.6098423Z prmt.b32 %r8030, %r6952, 0, 0x7771U; 2026-02-21T12:44:28.6098490Z st.shared.b8 [%r13+256], %r8030; 2026-02-21T12:44:28.6098558Z prmt.b32 %r8031, %r6952, 0, 0x7772U; 2026-02-21T12:44:28.6098621Z st.shared.b8 [%r14+512], %r8031; 2026-02-21T12:44:28.6098684Z prmt.b32 %r8032, %r6952, 0, 0x7773U; 2026-02-21T12:44:28.6098748Z st.shared.b8 [%r15+768], %r8032; 2026-02-21T12:44:28.6098801Z bar.sync 0; 2026-02-21T12:44:28.6098864Z ld.shared.b32 %r8033, [%r16]; 2026-02-21T12:44:28.6098924Z prmt.b32 %r8034, %r8033, 0, 0x7771U; 2026-02-21T12:44:28.6098989Z cvt.u16.u32 %rs257, %r8034; 2026-02-21T12:44:28.6099051Z prmt.b32 %r8035, %r8033, 0, 0x7770U; 2026-02-21T12:44:28.6099112Z cvt.u16.u32 
%rs258, %r8035; 2026-02-21T12:44:28.6099179Z prmt.b32 %r8036, %r8033, 0, 0x7773U; 2026-02-21T12:44:28.6099239Z cvt.u16.u32 %rs259, %r8036; 2026-02-21T12:44:28.6099300Z prmt.b32 %r8037, %r8033, 0, 0x7772U; 2026-02-21T12:44:28.6099371Z cvt.u16.u32 %rs260, %r8037; 2026-02-21T12:44:28.6099589Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6099658Z shl.b16 %rs261, %rs258, 4; 2026-02-21T12:44:28.6099721Z shl.b16 %rs262, %rs257, 4; 2026-02-21T12:44:28.6099926Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6099989Z cvt.u32.u16 %r8038, %rs261; 2026-02-21T12:44:28.6100064Z prmt.b32 %r8039, %r8038, %r8040, 0x3340U; 2026-02-21T12:44:28.6100136Z prmt.b32 %r8041, %r8039, %r8011, 0x5410U; 2026-02-21T12:44:28.6100204Z prmt.b32 %r8042, %r8041, %r8033, 0x5040U; 2026-02-21T12:44:28.6100269Z prmt.b32 %r8043, %r8042, 0, 0x9991U; 2026-02-21T12:44:28.6100334Z cvt.u16.u32 %rs263, %r8043; 2026-02-21T12:44:28.6100398Z shr.s16 %rs264, %rs263, 4; 2026-02-21T12:44:28.6100464Z prmt.b32 %r8044, %r8042, 0, 0xbbb3U; 2026-02-21T12:44:28.6100525Z cvt.u16.u32 %rs265, %r8044; 2026-02-21T12:44:28.6100593Z shr.s16 %rs266, %rs265, 4; 2026-02-21T12:44:28.6100653Z cvt.s16.s8 %rs267, %rs261; 2026-02-21T12:44:28.6100711Z shr.s16 %rs268, %rs267, 4; 2026-02-21T12:44:28.6100773Z cvt.s16.s8 %rs269, %rs262; 2026-02-21T12:44:28.6100832Z shr.s16 %rs270, %rs269, 4; 2026-02-21T12:44:28.6101032Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6101098Z cvt.rn.f32.s16 %r8045, %rs266; 2026-02-21T12:44:28.6101175Z cvt.rn.f32.s16 %r8046, %rs264; 2026-02-21T12:44:28.6101238Z cvt.rn.f32.s16 %r8047, %rs270; 2026-02-21T12:44:28.6101300Z cvt.rn.f32.s16 %r8048, %rs268; 2026-02-21T12:44:28.6101501Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6101561Z shl.b16 %rs271, %rs260, 4; 2026-02-21T12:44:28.6101700Z shl.b16 %rs272, 
%rs259, 4; 2026-02-21T12:44:28.6101913Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6102044Z prmt.b32 %r8049, %r8033, %r8050, 0x3020U; 2026-02-21T12:44:28.6102110Z prmt.b32 %r8051, %r8049, 0, 0x9991U; 2026-02-21T12:44:28.6102171Z cvt.u16.u32 %rs273, %r8051; 2026-02-21T12:44:28.6102233Z shr.s16 %rs274, %rs273, 4; 2026-02-21T12:44:28.6102292Z cvt.s16.s8 %rs275, %rs271; 2026-02-21T12:44:28.6102352Z shr.s16 %rs276, %rs275, 4; 2026-02-21T12:44:28.6102414Z cvt.s16.s8 %rs277, %rs272; 2026-02-21T12:44:28.6102472Z shr.s16 %rs278, %rs277, 4; 2026-02-21T12:44:28.6102541Z prmt.b32 %r8052, %r8033, 0, 0xbbb3U; 2026-02-21T12:44:28.6102601Z cvt.u16.u32 %rs279, %r8052; 2026-02-21T12:44:28.6102664Z shr.s16 %rs280, %rs279, 4; 2026-02-21T12:44:28.6102861Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6102927Z cvt.rn.f32.s16 %r8053, %rs274; 2026-02-21T12:44:28.6102994Z cvt.rn.f32.s16 %r8054, %rs280; 2026-02-21T12:44:28.6103055Z cvt.rn.f32.s16 %r8055, %rs278; 2026-02-21T12:44:28.6103218Z cvt.rn.f32.s16 %r8056, %rs276; 2026-02-21T12:44:28.6103282Z bar.sync 0; 2026-02-21T12:44:28.6103398Z st.shared.v4.b32 [%r17], {%r8048, %r8046, %r8047, %r8045}; 2026-02-21T12:44:28.6103503Z st.shared.v4.b32 [%r18], {%r8056, %r8053, %r8055, %r8054}; 2026-02-21T12:44:28.6103558Z $L__tmp19: 2026-02-21T12:44:28.6103835Z .loc 2 291 36 // standard.py:291:36 @[ cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:87:40 ] 2026-02-21T12:44:28.6103894Z // begin inline asm 2026-02-21T12:44:28.6103973Z fence.proxy.async.shared::cta; 2026-02-21T12:44:28.6104031Z // end inline asm 2026-02-21T12:44:28.6104084Z bar.sync 0; 2026-02-21T12:44:28.6104158Z wgmma.fence.sync.aligned; 2026-02-21T12:44:28.6104217Z // begin inline asm 2026-02-21T12:44:28.6107081Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14074,%r14075,%r14076,%r14077,%r14078,%r14079,%r14080,%r14081,%r14082,%r14083,%r14084,%r14085,%r14086,%r14087,%r14088,%r14089,%r14090,%r14091,%r14092,%r14093,%r14094,%r14095,%r14096,%r14097,%r14098,%r14099,%r14100,%r14101,%r14102,%r14103,%r14104,%r14105,%r14106,%r14107,%r14108,%r14109,%r14110,%r14111,%r14112,%r14113,%r14114,%r14115,%r14116,%r14117,%r14118,%r14119,%r14120,%r14121,%r14122,%r14123,%r14124,%r14125,%r14126,%r14127,%r14128,%r14129,%r14130,%r14131,%r14132,%r14133,%r14134,%r14135,%r14136,%r14137,%r14138,%r14139,%r14140,%r14141,%r14142,%r14143,%r14144,%r14145,%r14146,%r14147,%r14148,%r14149,%r14150,%r14151,%r14152,%r14153,%r14154,%r14155,%r14156,%r14157,%r14158,%r14159,%r14160,%r14161,%r14162,%r14163,%r14164,%r14165,%r14166,%r14167,%r14168,%r14169,%r14170,%r14171,%r14172,%r14173,%r14174,%r14175,%r14176,%r14177,%r14178,%r14179,%r14180,%r14181,%r14182,%r14183,%r14184,%r14185,%r14186,%r14187,%r14188,%r14189,%r14190,%r14191,%r14192,%r14193,%r14194,%r14195,%r14196,%r14197,%r14198,%r14199,%r14200,%r14201}, {%r7209,%r7210,%r7211,%r7212}, %rd466, %p15, 1, 1; 2026-02-21T12:44:28.6107154Z // end inline asm 2026-02-21T12:44:28.6107233Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:28.6107295Z mov.b32 %r7341, %r11895; 2026-02-21T12:44:28.6107357Z mov.b32 %r7342, %r8477; 2026-02-21T12:44:28.6107420Z mov.b32 %r7343, %r8477; 2026-02-21T12:44:28.6107480Z // begin inline asm 2026-02-21T12:44:28.6110043Z // wait for regs: 
%r14074,%r14075,%r14076,%r14077,%r14078,%r14079,%r14080,%r14081,%r14082,%r14083,%r14084,%r14085,%r14086,%r14087,%r14088,%r14089,%r14090,%r14091,%r14092,%r14093,%r14094,%r14095,%r14096,%r14097,%r14098,%r14099,%r14100,%r14101,%r14102,%r14103,%r14104,%r14105,%r14106,%r14107,%r14108,%r14109,%r14110,%r14111,%r14112,%r14113,%r14114,%r14115,%r14116,%r14117,%r14118,%r14119,%r14120,%r14121,%r14122,%r14123,%r14124,%r14125,%r14126,%r14127,%r14128,%r14129,%r14130,%r14131,%r14132,%r14133,%r14134,%r14135,%r14136,%r14137,%r14138,%r14139,%r14140,%r14141,%r14142,%r14143,%r14144,%r14145,%r14146,%r14147,%r14148,%r14149,%r14150,%r14151,%r14152,%r14153,%r14154,%r14155,%r14156,%r14157,%r14158,%r14159,%r14160,%r14161,%r14162,%r14163,%r14164,%r14165,%r14166,%r14167,%r14168,%r14169,%r14170,%r14171,%r14172,%r14173,%r14174,%r14175,%r14176,%r14177,%r14178,%r14179,%r14180,%r14181,%r14182,%r14183,%r14184,%r14185,%r14186,%r14187,%r14188,%r14189,%r14190,%r14191,%r14192,%r14193,%r14194,%r14195,%r14196,%r14197,%r14198,%r14199,%r14200,%r14201,%r7341,%r7342,%r7343 2026-02-21T12:44:28.6110267Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:28.6110323Z // end inline asm 2026-02-21T12:44:28.6110379Z $L__tmp20: 2026-02-21T12:44:28.6110590Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6110650Z // begin inline asm 2026-02-21T12:44:28.6110712Z mov.u64 %rd372, 0x0; 2026-02-21T12:44:28.6110839Z createpolicy.fractional.L2::evict_last.b64 %rd372, 1.0; 2026-02-21T12:44:28.6110895Z // end inline asm 2026-02-21T12:44:28.6110955Z // begin inline asm 2026-02-21T12:44:28.6111015Z mov.u32 %r7475, 0x0; 2026-02-21T12:44:28.6111073Z mov.u32 %r7476, 0x0; 2026-02-21T12:44:28.6111383Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r7475, %r7476 }, [ %rd658 + 0 ], %rd372; 2026-02-21T12:44:28.6111452Z // end inline asm 2026-02-21T12:44:28.6111657Z .loc 1 55 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:55:32 2026-02-21T12:44:28.6111712Z bar.sync 0; 
2026-02-21T12:44:28.6111795Z st.shared.v2.b32 [%r10], {%r7475, %r7476}; 2026-02-21T12:44:28.6111859Z bar.sync 0; 2026-02-21T12:44:28.6111927Z ld.shared.b16 %rs281, [%r11]; 2026-02-21T12:44:28.6111994Z ld.shared.b16 %rs282, [%r11+128]; 2026-02-21T12:44:28.6112065Z ld.shared.b16 %rs283, [%r11+8]; 2026-02-21T12:44:28.6112130Z ld.shared.b16 %rs284, [%r11+136]; 2026-02-21T12:44:28.6112195Z cvt.f32.bf16 %r7734, %rs281; 2026-02-21T12:44:28.6112259Z cvt.f32.bf16 %r7735, %rs282; 2026-02-21T12:44:28.6112318Z cvt.f32.bf16 %r7736, %rs283; 2026-02-21T12:44:28.6112383Z cvt.f32.bf16 %r7737, %rs284; 2026-02-21T12:44:28.6112591Z .loc 1 57 87 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:87 2026-02-21T12:44:28.6112662Z add.s64 %rd375, %rd657, 10240; 2026-02-21T12:44:28.6112722Z // begin inline asm 2026-02-21T12:44:28.6112779Z mov.u32 %r7477, 0x0; 2026-02-21T12:44:28.6112857Z ld.global.b32 { %r7477 }, [ %rd375 + 0 ]; 2026-02-21T12:44:28.6112913Z // end inline asm 2026-02-21T12:44:28.6113115Z .loc 1 65 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:65:28 2026-02-21T12:44:28.6113172Z bar.sync 0; 2026-02-21T12:44:28.6113236Z st.shared.b8 [%r12], %r7477; 2026-02-21T12:44:28.6113301Z prmt.b32 %r8057, %r7477, 0, 0x7771U; 2026-02-21T12:44:28.6113367Z st.shared.b8 [%r13+256], %r8057; 2026-02-21T12:44:28.6113435Z prmt.b32 %r8058, %r7477, 0, 0x7772U; 2026-02-21T12:44:28.6113499Z st.shared.b8 [%r14+512], %r8058; 2026-02-21T12:44:28.6113563Z prmt.b32 %r8059, %r7477, 0, 0x7773U; 2026-02-21T12:44:28.6113634Z st.shared.b8 [%r15+768], %r8059; 2026-02-21T12:44:28.6113688Z bar.sync 0; 2026-02-21T12:44:28.6113753Z ld.shared.b32 %r8060, [%r16]; 2026-02-21T12:44:28.6113821Z prmt.b32 %r8061, %r8060, 0, 0x7771U; 2026-02-21T12:44:28.6113901Z cvt.u16.u32 %rs285, %r8061; 2026-02-21T12:44:28.6113966Z prmt.b32 %r8062, %r8060, 0, 0x7770U; 2026-02-21T12:44:28.6114027Z cvt.u16.u32 %rs286, %r8062; 2026-02-21T12:44:28.6114094Z prmt.b32 %r8063, %r8060, 0, 0x7773U; 
2026-02-21T12:44:28.6114154Z cvt.u16.u32 %rs287, %r8063; 2026-02-21T12:44:28.6114216Z prmt.b32 %r8064, %r8060, 0, 0x7772U; 2026-02-21T12:44:28.6114279Z cvt.u16.u32 %rs288, %r8064; 2026-02-21T12:44:28.6114477Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6114540Z shl.b16 %rs289, %rs286, 4; 2026-02-21T12:44:28.6114601Z shl.b16 %rs290, %rs285, 4; 2026-02-21T12:44:28.6114801Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6114924Z cvt.u32.u16 %r8065, %rs289; 2026-02-21T12:44:28.6114998Z prmt.b32 %r8066, %r8065, %r8067, 0x3340U; 2026-02-21T12:44:28.6115131Z prmt.b32 %r8068, %r8066, %r8011, 0x5410U; 2026-02-21T12:44:28.6115199Z prmt.b32 %r8069, %r8068, %r8060, 0x5040U; 2026-02-21T12:44:28.6115262Z prmt.b32 %r8070, %r8069, 0, 0x9991U; 2026-02-21T12:44:28.6115324Z cvt.u16.u32 %rs291, %r8070; 2026-02-21T12:44:28.6115385Z shr.s16 %rs292, %rs291, 4; 2026-02-21T12:44:28.6115447Z prmt.b32 %r8071, %r8069, 0, 0xbbb3U; 2026-02-21T12:44:28.6115507Z cvt.u16.u32 %rs293, %r8071; 2026-02-21T12:44:28.6115573Z shr.s16 %rs294, %rs293, 4; 2026-02-21T12:44:28.6115633Z cvt.s16.s8 %rs295, %rs289; 2026-02-21T12:44:28.6115692Z shr.s16 %rs296, %rs295, 4; 2026-02-21T12:44:28.6115754Z cvt.s16.s8 %rs297, %rs290; 2026-02-21T12:44:28.6115815Z shr.s16 %rs298, %rs297, 4; 2026-02-21T12:44:28.6116013Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6116079Z cvt.rn.f32.s16 %r8072, %rs294; 2026-02-21T12:44:28.6116144Z cvt.rn.f32.s16 %r8073, %rs292; 2026-02-21T12:44:28.6116292Z cvt.rn.f32.s16 %r8074, %rs298; 2026-02-21T12:44:28.6116358Z cvt.rn.f32.s16 %r8075, %rs296; 2026-02-21T12:44:28.6116689Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6116754Z shl.b16 %rs299, %rs288, 4; 2026-02-21T12:44:28.6116816Z shl.b16 %rs300, %rs287, 4; 2026-02-21T12:44:28.6117016Z .loc 1 62 25 // 
cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6117086Z prmt.b32 %r8076, %r8060, %r8077, 0x3020U; 2026-02-21T12:44:28.6117149Z prmt.b32 %r8078, %r8076, 0, 0x9991U; 2026-02-21T12:44:28.6117208Z cvt.u16.u32 %rs301, %r8078; 2026-02-21T12:44:28.6117272Z shr.s16 %rs302, %rs301, 4; 2026-02-21T12:44:28.6117331Z cvt.s16.s8 %rs303, %rs299; 2026-02-21T12:44:28.6117391Z shr.s16 %rs304, %rs303, 4; 2026-02-21T12:44:28.6117457Z cvt.s16.s8 %rs305, %rs300; 2026-02-21T12:44:28.6117517Z shr.s16 %rs306, %rs305, 4; 2026-02-21T12:44:28.6117582Z prmt.b32 %r8079, %r8060, 0, 0xbbb3U; 2026-02-21T12:44:28.6117645Z cvt.u16.u32 %rs307, %r8079; 2026-02-21T12:44:28.6117709Z shr.s16 %rs308, %rs307, 4; 2026-02-21T12:44:28.6117907Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6117970Z cvt.rn.f32.s16 %r8080, %rs302; 2026-02-21T12:44:28.6118037Z cvt.rn.f32.s16 %r8081, %rs308; 2026-02-21T12:44:28.6118099Z cvt.rn.f32.s16 %r8082, %rs306; 2026-02-21T12:44:28.6118162Z cvt.rn.f32.s16 %r8083, %rs304; 2026-02-21T12:44:28.6118219Z bar.sync 0; 2026-02-21T12:44:28.6118328Z st.shared.v4.b32 [%r17], {%r8075, %r8073, %r8074, %r8072}; 2026-02-21T12:44:28.6118431Z st.shared.v4.b32 [%r18], {%r8083, %r8080, %r8082, %r8081}; 2026-02-21T12:44:28.6118484Z $L__tmp21: 2026-02-21T12:44:28.6118760Z .loc 2 291 36 // standard.py:291:36 @[ cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:87:40 ] 2026-02-21T12:44:28.6118822Z // begin inline asm 2026-02-21T12:44:28.6118903Z fence.proxy.async.shared::cta; 2026-02-21T12:44:28.6118965Z // end inline asm 2026-02-21T12:44:28.6119019Z bar.sync 0; 2026-02-21T12:44:28.6119089Z wgmma.fence.sync.aligned; 2026-02-21T12:44:28.6119149Z // begin inline asm 2026-02-21T12:44:28.6121870Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14074,%r14075,%r14076,%r14077,%r14078,%r14079,%r14080,%r14081,%r14082,%r14083,%r14084,%r14085,%r14086,%r14087,%r14088,%r14089,%r14090,%r14091,%r14092,%r14093,%r14094,%r14095,%r14096,%r14097,%r14098,%r14099,%r14100,%r14101,%r14102,%r14103,%r14104,%r14105,%r14106,%r14107,%r14108,%r14109,%r14110,%r14111,%r14112,%r14113,%r14114,%r14115,%r14116,%r14117,%r14118,%r14119,%r14120,%r14121,%r14122,%r14123,%r14124,%r14125,%r14126,%r14127,%r14128,%r14129,%r14130,%r14131,%r14132,%r14133,%r14134,%r14135,%r14136,%r14137,%r14138,%r14139,%r14140,%r14141,%r14142,%r14143,%r14144,%r14145,%r14146,%r14147,%r14148,%r14149,%r14150,%r14151,%r14152,%r14153,%r14154,%r14155,%r14156,%r14157,%r14158,%r14159,%r14160,%r14161,%r14162,%r14163,%r14164,%r14165,%r14166,%r14167,%r14168,%r14169,%r14170,%r14171,%r14172,%r14173,%r14174,%r14175,%r14176,%r14177,%r14178,%r14179,%r14180,%r14181,%r14182,%r14183,%r14184,%r14185,%r14186,%r14187,%r14188,%r14189,%r14190,%r14191,%r14192,%r14193,%r14194,%r14195,%r14196,%r14197,%r14198,%r14199,%r14200,%r14201}, {%r7734,%r7735,%r7736,%r7737}, %rd466, %p15, 1, 1; 2026-02-21T12:44:28.6122096Z // end inline asm 2026-02-21T12:44:28.6122174Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:28.6122242Z mov.b32 %r7868, %r8477; 2026-02-21T12:44:28.6122301Z mov.b32 %r7866, %r11895; 2026-02-21T12:44:28.6122363Z mov.b32 %r7867, %r8477; 2026-02-21T12:44:28.6122420Z // begin inline asm 2026-02-21T12:44:28.6125040Z // wait for regs: 
%r14074,%r14075,%r14076,%r14077,%r14078,%r14079,%r14080,%r14081,%r14082,%r14083,%r14084,%r14085,%r14086,%r14087,%r14088,%r14089,%r14090,%r14091,%r14092,%r14093,%r14094,%r14095,%r14096,%r14097,%r14098,%r14099,%r14100,%r14101,%r14102,%r14103,%r14104,%r14105,%r14106,%r14107,%r14108,%r14109,%r14110,%r14111,%r14112,%r14113,%r14114,%r14115,%r14116,%r14117,%r14118,%r14119,%r14120,%r14121,%r14122,%r14123,%r14124,%r14125,%r14126,%r14127,%r14128,%r14129,%r14130,%r14131,%r14132,%r14133,%r14134,%r14135,%r14136,%r14137,%r14138,%r14139,%r14140,%r14141,%r14142,%r14143,%r14144,%r14145,%r14146,%r14147,%r14148,%r14149,%r14150,%r14151,%r14152,%r14153,%r14154,%r14155,%r14156,%r14157,%r14158,%r14159,%r14160,%r14161,%r14162,%r14163,%r14164,%r14165,%r14166,%r14167,%r14168,%r14169,%r14170,%r14171,%r14172,%r14173,%r14174,%r14175,%r14176,%r14177,%r14178,%r14179,%r14180,%r14181,%r14182,%r14183,%r14184,%r14185,%r14186,%r14187,%r14188,%r14189,%r14190,%r14191,%r14192,%r14193,%r14194,%r14195,%r14196,%r14197,%r14198,%r14199,%r14200,%r14201,%r7866,%r7867,%r7868 2026-02-21T12:44:28.6125124Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:28.6125182Z // end inline asm 2026-02-21T12:44:28.6125240Z $L__tmp22: 2026-02-21T12:44:28.6125459Z .loc 1 43 126 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:43:126 2026-02-21T12:44:28.6125527Z add.s64 %rd659, %rd659, 12; 2026-02-21T12:44:28.6125593Z add.s64 %rd658, %rd658, 48; 2026-02-21T12:44:28.6125655Z add.s64 %rd657, %rd657, 15360; 2026-02-21T12:44:28.6125721Z setp.lt.u64 %p18, %rd659, 4080; 2026-02-21T12:44:28.6125781Z @%p18 bra $L__BB0_16; 2026-02-21T12:44:28.6125894Z // %bb.17: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:28.6126097Z .loc 1 34 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:34:32 2026-02-21T12:44:28.6126158Z or.b64 %rd398, %rd73, %rd5; 2026-02-21T12:44:28.6126222Z or.b64 %rd399, %rd73, %rd6; 2026-02-21T12:44:28.6126295Z or.b64 %rd400, %rd73, %rd7; 2026-02-21T12:44:28.6126357Z or.b64 %rd401, %rd73, %rd8; 
2026-02-21T12:44:28.6126419Z or.b64 %rd402, %rd73, %rd9; 2026-02-21T12:44:28.6126630Z or.b64 %rd403, %rd73, %rd10; 2026-02-21T12:44:28.6126696Z or.b64 %rd404, %rd73, %rd11; 2026-02-21T12:44:28.6126756Z or.b64 %rd405, %rd73, %rd12; 2026-02-21T12:44:28.6126826Z or.b64 %rd406, %rd73, %rd13; 2026-02-21T12:44:28.6126885Z or.b64 %rd407, %rd73, %rd14; 2026-02-21T12:44:28.6126944Z or.b64 %rd408, %rd73, %rd15; 2026-02-21T12:44:28.6127005Z or.b64 %rd409, %rd73, %rd16; 2026-02-21T12:44:28.6127065Z or.b64 %rd410, %rd73, %rd17; 2026-02-21T12:44:28.6127124Z or.b64 %rd411, %rd73, %rd18; 2026-02-21T12:44:28.6127194Z or.b64 %rd412, %rd73, %rd19; 2026-02-21T12:44:28.6127257Z or.b64 %rd413, %rd73, %rd20; 2026-02-21T12:44:28.6127458Z .loc 1 36 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:36:32 2026-02-21T12:44:28.6127518Z or.b64 %rd414, %rd74, %rd22; 2026-02-21T12:44:28.6127715Z .loc 1 51 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:32 2026-02-21T12:44:28.6127776Z shl.b64 %rd415, %rd76, 1; 2026-02-21T12:44:28.6127923Z add.s64 %rd378, %rd25, %rd415; 2026-02-21T12:44:28.6128125Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6128260Z // begin inline asm 2026-02-21T12:44:28.6128320Z mov.u64 %rd377, 0x0; 2026-02-21T12:44:28.6128451Z createpolicy.fractional.L2::evict_last.b64 %rd377, 1.0; 2026-02-21T12:44:28.6128507Z // end inline asm 2026-02-21T12:44:28.6128566Z // begin inline asm 2026-02-21T12:44:28.6128624Z mov.u32 %r8084, 0x0; 2026-02-21T12:44:28.6128684Z mov.u32 %r8085, 0x0; 2026-02-21T12:44:28.6128871Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r8084, %r8085 }, [ %rd378 + 0 ], %rd377; 2026-02-21T12:44:28.6128927Z // end inline asm 2026-02-21T12:44:28.6129129Z .loc 1 55 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:55:32 2026-02-21T12:44:28.6129184Z bar.sync 0; 2026-02-21T12:44:28.6129262Z st.shared.v2.b32 [%r10], {%r8084, %r8085}; 2026-02-21T12:44:28.6129320Z bar.sync 0; 
2026-02-21T12:44:28.6129390Z ld.shared.b16 %rs309, [%r11]; 2026-02-21T12:44:28.6129457Z ld.shared.b16 %rs310, [%r11+128]; 2026-02-21T12:44:28.6129639Z ld.shared.b16 %rs311, [%r11+8]; 2026-02-21T12:44:28.6129711Z ld.shared.b16 %rs312, [%r11+136]; 2026-02-21T12:44:28.6129775Z cvt.f32.bf16 %r8343, %rs309; 2026-02-21T12:44:28.6129836Z cvt.f32.bf16 %r8344, %rs310; 2026-02-21T12:44:28.6129899Z cvt.f32.bf16 %r8345, %rs311; 2026-02-21T12:44:28.6129959Z cvt.f32.bf16 %r8346, %rs312; 2026-02-21T12:44:28.6130162Z .loc 1 57 34 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:34 2026-02-21T12:44:28.6130223Z add.s64 %rd416, %rd645, %rd75; 2026-02-21T12:44:28.6130293Z add.s64 %rd380, %rd416, 5237760; 2026-02-21T12:44:28.6130492Z .loc 1 57 87 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:87 2026-02-21T12:44:28.6130551Z // begin inline asm 2026-02-21T12:44:28.6130611Z mov.u32 %r8086, 0x0; 2026-02-21T12:44:28.6130688Z ld.global.b32 { %r8086 }, [ %rd380 + 0 ]; 2026-02-21T12:44:28.6130742Z // end inline asm 2026-02-21T12:44:28.6130942Z .loc 1 65 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:65:28 2026-02-21T12:44:28.6131002Z bar.sync 0; 2026-02-21T12:44:28.6131065Z st.shared.b8 [%r12], %r8086; 2026-02-21T12:44:28.6131137Z prmt.b32 %r8753, %r8086, 0, 0x7771U; 2026-02-21T12:44:28.6131215Z st.shared.b8 [%r13+256], %r8753; 2026-02-21T12:44:28.6131282Z prmt.b32 %r8754, %r8086, 0, 0x7772U; 2026-02-21T12:44:28.6131346Z st.shared.b8 [%r14+512], %r8754; 2026-02-21T12:44:28.6131413Z prmt.b32 %r8755, %r8086, 0, 0x7773U; 2026-02-21T12:44:28.6131477Z st.shared.b8 [%r15+768], %r8755; 2026-02-21T12:44:28.6131531Z bar.sync 0; 2026-02-21T12:44:28.6131603Z ld.shared.b32 %r8756, [%r16]; 2026-02-21T12:44:28.6131669Z prmt.b32 %r8757, %r8756, 0, 0x7771U; 2026-02-21T12:44:28.6131731Z cvt.u16.u32 %rs313, %r8757; 2026-02-21T12:44:28.6131793Z prmt.b32 %r8758, %r8756, 0, 0x7770U; 2026-02-21T12:44:28.6131859Z cvt.u16.u32 %rs314, %r8758; 2026-02-21T12:44:28.6131921Z 
prmt.b32 %r8759, %r8756, 0, 0x7773U; 2026-02-21T12:44:28.6131985Z cvt.u16.u32 %rs315, %r8759; 2026-02-21T12:44:28.6132050Z prmt.b32 %r8760, %r8756, 0, 0x7772U; 2026-02-21T12:44:28.6132113Z cvt.u16.u32 %rs316, %r8760; 2026-02-21T12:44:28.6132313Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6132376Z shl.b16 %rs317, %rs314, 4; 2026-02-21T12:44:28.6132443Z shl.b16 %rs318, %rs313, 4; 2026-02-21T12:44:28.6132640Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6132699Z cvt.u32.u16 %r8761, %rs317; 2026-02-21T12:44:28.6132774Z prmt.b32 %r8762, %r8761, %r8763, 0x3340U; 2026-02-21T12:44:28.6132843Z prmt.b32 %r8767, %r8762, %r8764, 0x5410U; 2026-02-21T12:44:28.6132914Z prmt.b32 %r8768, %r8767, %r8756, 0x5040U; 2026-02-21T12:44:28.6132976Z prmt.b32 %r8769, %r8768, 0, 0x9991U; 2026-02-21T12:44:28.6133098Z cvt.u16.u32 %rs319, %r8769; 2026-02-21T12:44:28.6133159Z shr.s16 %rs320, %rs319, 4; 2026-02-21T12:44:28.6133269Z prmt.b32 %r8770, %r8768, 0, 0xbbb3U; 2026-02-21T12:44:28.6133331Z cvt.u16.u32 %rs321, %r8770; 2026-02-21T12:44:28.6133391Z shr.s16 %rs322, %rs321, 4; 2026-02-21T12:44:28.6133450Z cvt.s16.s8 %rs323, %rs317; 2026-02-21T12:44:28.6133512Z shr.s16 %rs324, %rs323, 4; 2026-02-21T12:44:28.6133573Z cvt.s16.s8 %rs325, %rs318; 2026-02-21T12:44:28.6133633Z shr.s16 %rs326, %rs325, 4; 2026-02-21T12:44:28.6133832Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6133899Z cvt.rn.f32.s16 %r8771, %rs322; 2026-02-21T12:44:28.6133960Z cvt.rn.f32.s16 %r8772, %rs320; 2026-02-21T12:44:28.6134034Z cvt.rn.f32.s16 %r8773, %rs326; 2026-02-21T12:44:28.6134098Z cvt.rn.f32.s16 %r8774, %rs324; 2026-02-21T12:44:28.6134295Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6134359Z shl.b16 %rs327, %rs316, 4; 2026-02-21T12:44:28.6134419Z shl.b16 %rs328, %rs315, 4; 2026-02-21T12:44:28.6134705Z .loc 1 62 
25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6134777Z prmt.b32 %r8775, %r8756, %r8776, 0x3020U; 2026-02-21T12:44:28.6134841Z prmt.b32 %r8777, %r8775, 0, 0x9991U; 2026-02-21T12:44:28.6134906Z cvt.u16.u32 %rs329, %r8777; 2026-02-21T12:44:28.6134966Z shr.s16 %rs330, %rs329, 4; 2026-02-21T12:44:28.6135025Z cvt.s16.s8 %rs331, %rs327; 2026-02-21T12:44:28.6135088Z shr.s16 %rs332, %rs331, 4; 2026-02-21T12:44:28.6135148Z cvt.s16.s8 %rs333, %rs328; 2026-02-21T12:44:28.6135209Z shr.s16 %rs334, %rs333, 4; 2026-02-21T12:44:28.6135273Z prmt.b32 %r8778, %r8756, 0, 0xbbb3U; 2026-02-21T12:44:28.6135337Z cvt.u16.u32 %rs335, %r8778; 2026-02-21T12:44:28.6135407Z shr.s16 %rs336, %rs335, 4; 2026-02-21T12:44:28.6135605Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6135672Z cvt.rn.f32.s16 %r8779, %rs330; 2026-02-21T12:44:28.6135737Z cvt.rn.f32.s16 %r8780, %rs336; 2026-02-21T12:44:28.6135801Z cvt.rn.f32.s16 %r8781, %rs334; 2026-02-21T12:44:28.6135862Z cvt.rn.f32.s16 %r8782, %rs332; 2026-02-21T12:44:28.6135920Z bar.sync 0; 2026-02-21T12:44:28.6136031Z st.shared.v4.b32 [%r17], {%r8774, %r8772, %r8773, %r8771}; 2026-02-21T12:44:28.6136135Z st.shared.v4.b32 [%r18], {%r8782, %r8779, %r8781, %r8780}; 2026-02-21T12:44:28.6136194Z $L__tmp23: 2026-02-21T12:44:28.6136588Z .loc 2 291 36 // standard.py:291:36 @[ cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:87:40 ] 2026-02-21T12:44:28.6136652Z // begin inline asm 2026-02-21T12:44:28.6136732Z fence.proxy.async.shared::cta; 2026-02-21T12:44:28.6136788Z // end inline asm 2026-02-21T12:44:28.6136842Z bar.sync 0; 2026-02-21T12:44:28.6136926Z wgmma.fence.sync.aligned; 2026-02-21T12:44:28.6136993Z // begin inline asm 2026-02-21T12:44:28.6139719Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14074,%r14075,%r14076,%r14077,%r14078,%r14079,%r14080,%r14081,%r14082,%r14083,%r14084,%r14085,%r14086,%r14087,%r14088,%r14089,%r14090,%r14091,%r14092,%r14093,%r14094,%r14095,%r14096,%r14097,%r14098,%r14099,%r14100,%r14101,%r14102,%r14103,%r14104,%r14105,%r14106,%r14107,%r14108,%r14109,%r14110,%r14111,%r14112,%r14113,%r14114,%r14115,%r14116,%r14117,%r14118,%r14119,%r14120,%r14121,%r14122,%r14123,%r14124,%r14125,%r14126,%r14127,%r14128,%r14129,%r14130,%r14131,%r14132,%r14133,%r14134,%r14135,%r14136,%r14137,%r14138,%r14139,%r14140,%r14141,%r14142,%r14143,%r14144,%r14145,%r14146,%r14147,%r14148,%r14149,%r14150,%r14151,%r14152,%r14153,%r14154,%r14155,%r14156,%r14157,%r14158,%r14159,%r14160,%r14161,%r14162,%r14163,%r14164,%r14165,%r14166,%r14167,%r14168,%r14169,%r14170,%r14171,%r14172,%r14173,%r14174,%r14175,%r14176,%r14177,%r14178,%r14179,%r14180,%r14181,%r14182,%r14183,%r14184,%r14185,%r14186,%r14187,%r14188,%r14189,%r14190,%r14191,%r14192,%r14193,%r14194,%r14195,%r14196,%r14197,%r14198,%r14199,%r14200,%r14201}, {%r8343,%r8344,%r8345,%r8346}, %rd466, %p15, 1, 1; 2026-02-21T12:44:28.6139927Z // end inline asm 2026-02-21T12:44:28.6140004Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:28.6140066Z mov.b32 %r8476, %r8477; 2026-02-21T12:44:28.6140128Z mov.b32 %r8475, %r11895; 2026-02-21T12:44:28.6140185Z // begin inline asm 2026-02-21T12:44:28.6142816Z // wait for regs: 
%r14074,%r14075,%r14076,%r14077,%r14078,%r14079,%r14080,%r14081,%r14082,%r14083,%r14084,%r14085,%r14086,%r14087,%r14088,%r14089,%r14090,%r14091,%r14092,%r14093,%r14094,%r14095,%r14096,%r14097,%r14098,%r14099,%r14100,%r14101,%r14102,%r14103,%r14104,%r14105,%r14106,%r14107,%r14108,%r14109,%r14110,%r14111,%r14112,%r14113,%r14114,%r14115,%r14116,%r14117,%r14118,%r14119,%r14120,%r14121,%r14122,%r14123,%r14124,%r14125,%r14126,%r14127,%r14128,%r14129,%r14130,%r14131,%r14132,%r14133,%r14134,%r14135,%r14136,%r14137,%r14138,%r14139,%r14140,%r14141,%r14142,%r14143,%r14144,%r14145,%r14146,%r14147,%r14148,%r14149,%r14150,%r14151,%r14152,%r14153,%r14154,%r14155,%r14156,%r14157,%r14158,%r14159,%r14160,%r14161,%r14162,%r14163,%r14164,%r14165,%r14166,%r14167,%r14168,%r14169,%r14170,%r14171,%r14172,%r14173,%r14174,%r14175,%r14176,%r14177,%r14178,%r14179,%r14180,%r14181,%r14182,%r14183,%r14184,%r14185,%r14186,%r14187,%r14188,%r14189,%r14190,%r14191,%r14192,%r14193,%r14194,%r14195,%r14196,%r14197,%r14198,%r14199,%r14200,%r14201,%r8475,%r8476,%r8477 2026-02-21T12:44:28.6142900Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:28.6142956Z // end inline asm 2026-02-21T12:44:28.6143012Z $L__tmp24: 2026-02-21T12:44:28.6143217Z .loc 1 90 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:90:28 2026-02-21T12:44:28.6143299Z cvt.rn.bf16x2.f32 %r8783, %r14075, %r14074; 2026-02-21T12:44:28.6143378Z cvt.rn.bf16x2.f32 %r8784, %r14077, %r14076; 2026-02-21T12:44:28.6143454Z cvt.rn.bf16x2.f32 %r8785, %r14079, %r14078; 2026-02-21T12:44:28.6143530Z cvt.rn.bf16x2.f32 %r8786, %r14081, %r14080; 2026-02-21T12:44:28.6143603Z cvt.rn.bf16x2.f32 %r8787, %r14083, %r14082; 2026-02-21T12:44:28.6143685Z cvt.rn.bf16x2.f32 %r8788, %r14085, %r14084; 2026-02-21T12:44:28.6143758Z cvt.rn.bf16x2.f32 %r8789, %r14087, %r14086; 2026-02-21T12:44:28.6143830Z cvt.rn.bf16x2.f32 %r8790, %r14089, %r14088; 2026-02-21T12:44:28.6143907Z cvt.rn.bf16x2.f32 %r8791, %r14091, %r14090; 2026-02-21T12:44:28.6143981Z 
cvt.rn.bf16x2.f32 %r8792, %r14093, %r14092; 2026-02-21T12:44:28.6144054Z cvt.rn.bf16x2.f32 %r8793, %r14095, %r14094; 2026-02-21T12:44:28.6144131Z cvt.rn.bf16x2.f32 %r8794, %r14097, %r14096; 2026-02-21T12:44:28.6144218Z cvt.rn.bf16x2.f32 %r8795, %r14099, %r14098; 2026-02-21T12:44:28.6144295Z cvt.rn.bf16x2.f32 %r8796, %r14101, %r14100; 2026-02-21T12:44:28.6144368Z cvt.rn.bf16x2.f32 %r8797, %r14103, %r14102; 2026-02-21T12:44:28.6144444Z cvt.rn.bf16x2.f32 %r8798, %r14105, %r14104; 2026-02-21T12:44:28.6144517Z cvt.rn.bf16x2.f32 %r8799, %r14107, %r14106; 2026-02-21T12:44:28.6144594Z cvt.rn.bf16x2.f32 %r8800, %r14109, %r14108; 2026-02-21T12:44:28.6144669Z cvt.rn.bf16x2.f32 %r8801, %r14111, %r14110; 2026-02-21T12:44:28.6144747Z cvt.rn.bf16x2.f32 %r8802, %r14113, %r14112; 2026-02-21T12:44:28.6144821Z cvt.rn.bf16x2.f32 %r8803, %r14115, %r14114; 2026-02-21T12:44:28.6144898Z cvt.rn.bf16x2.f32 %r8804, %r14117, %r14116; 2026-02-21T12:44:28.6144970Z cvt.rn.bf16x2.f32 %r8805, %r14119, %r14118; 2026-02-21T12:44:28.6145042Z cvt.rn.bf16x2.f32 %r8806, %r14121, %r14120; 2026-02-21T12:44:28.6145115Z cvt.rn.bf16x2.f32 %r8807, %r14123, %r14122; 2026-02-21T12:44:28.6145190Z cvt.rn.bf16x2.f32 %r8808, %r14125, %r14124; 2026-02-21T12:44:28.6145262Z cvt.rn.bf16x2.f32 %r8809, %r14127, %r14126; 2026-02-21T12:44:28.6145335Z cvt.rn.bf16x2.f32 %r8810, %r14129, %r14128; 2026-02-21T12:44:28.6145411Z cvt.rn.bf16x2.f32 %r8811, %r14131, %r14130; 2026-02-21T12:44:28.6145486Z cvt.rn.bf16x2.f32 %r8812, %r14133, %r14132; 2026-02-21T12:44:28.6145638Z cvt.rn.bf16x2.f32 %r8813, %r14135, %r14134; 2026-02-21T12:44:28.6145714Z cvt.rn.bf16x2.f32 %r8814, %r14137, %r14136; 2026-02-21T12:44:28.6145790Z cvt.rn.bf16x2.f32 %r8815, %r14139, %r14138; 2026-02-21T12:44:28.6145909Z cvt.rn.bf16x2.f32 %r8816, %r14141, %r14140; 2026-02-21T12:44:28.6145985Z cvt.rn.bf16x2.f32 %r8817, %r14143, %r14142; 2026-02-21T12:44:28.6146061Z cvt.rn.bf16x2.f32 %r8818, %r14145, %r14144; 2026-02-21T12:44:28.6146134Z cvt.rn.bf16x2.f32 %r8819, 
%r14147, %r14146; 2026-02-21T12:44:28.6146206Z cvt.rn.bf16x2.f32 %r8820, %r14149, %r14148; 2026-02-21T12:44:28.6146280Z cvt.rn.bf16x2.f32 %r8821, %r14151, %r14150; 2026-02-21T12:44:28.6146366Z cvt.rn.bf16x2.f32 %r8822, %r14153, %r14152; 2026-02-21T12:44:28.6146440Z cvt.rn.bf16x2.f32 %r8823, %r14155, %r14154; 2026-02-21T12:44:28.6146640Z cvt.rn.bf16x2.f32 %r8824, %r14157, %r14156; 2026-02-21T12:44:28.6146718Z cvt.rn.bf16x2.f32 %r8825, %r14159, %r14158; 2026-02-21T12:44:28.6146793Z cvt.rn.bf16x2.f32 %r8826, %r14161, %r14160; 2026-02-21T12:44:28.6146870Z cvt.rn.bf16x2.f32 %r8827, %r14163, %r14162; 2026-02-21T12:44:28.6146947Z cvt.rn.bf16x2.f32 %r8828, %r14165, %r14164; 2026-02-21T12:44:28.6147149Z cvt.rn.bf16x2.f32 %r8829, %r14167, %r14166; 2026-02-21T12:44:28.6147241Z cvt.rn.bf16x2.f32 %r8830, %r14169, %r14168; 2026-02-21T12:44:28.6147318Z cvt.rn.bf16x2.f32 %r8831, %r14171, %r14170; 2026-02-21T12:44:28.6147391Z cvt.rn.bf16x2.f32 %r8832, %r14173, %r14172; 2026-02-21T12:44:28.6147463Z cvt.rn.bf16x2.f32 %r8833, %r14175, %r14174; 2026-02-21T12:44:28.6147537Z cvt.rn.bf16x2.f32 %r8834, %r14177, %r14176; 2026-02-21T12:44:28.6147613Z cvt.rn.bf16x2.f32 %r8835, %r14179, %r14178; 2026-02-21T12:44:28.6147686Z cvt.rn.bf16x2.f32 %r8836, %r14181, %r14180; 2026-02-21T12:44:28.6147759Z cvt.rn.bf16x2.f32 %r8837, %r14183, %r14182; 2026-02-21T12:44:28.6147835Z cvt.rn.bf16x2.f32 %r8838, %r14185, %r14184; 2026-02-21T12:44:28.6147909Z cvt.rn.bf16x2.f32 %r8839, %r14187, %r14186; 2026-02-21T12:44:28.6147983Z cvt.rn.bf16x2.f32 %r8840, %r14189, %r14188; 2026-02-21T12:44:28.6148064Z cvt.rn.bf16x2.f32 %r8841, %r14191, %r14190; 2026-02-21T12:44:28.6148138Z cvt.rn.bf16x2.f32 %r8842, %r14193, %r14192; 2026-02-21T12:44:28.6148216Z cvt.rn.bf16x2.f32 %r8843, %r14195, %r14194; 2026-02-21T12:44:28.6148290Z cvt.rn.bf16x2.f32 %r8844, %r14197, %r14196; 2026-02-21T12:44:28.6148367Z cvt.rn.bf16x2.f32 %r8845, %r14199, %r14198; 2026-02-21T12:44:28.6148440Z cvt.rn.bf16x2.f32 %r8846, %r14201, %r14200; 
2026-02-21T12:44:28.6148733Z .loc 1 91 22 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:91:22 2026-02-21T12:44:28.6148812Z mad.lo.s64 %rd417, %rd398, 2560, %rd167; 2026-02-21T12:44:28.6148876Z shl.b64 %rd418, %rd414, 1; 2026-02-21T12:44:28.6148938Z add.s64 %rd382, %rd417, %rd418; 2026-02-21T12:44:28.6149013Z mad.lo.s64 %rd419, %rd399, 2560, %rd167; 2026-02-21T12:44:28.6149074Z add.s64 %rd383, %rd419, %rd418; 2026-02-21T12:44:28.6149144Z mad.lo.s64 %rd420, %rd400, 2560, %rd167; 2026-02-21T12:44:28.6149209Z add.s64 %rd384, %rd420, %rd418; 2026-02-21T12:44:28.6149281Z mad.lo.s64 %rd421, %rd401, 2560, %rd167; 2026-02-21T12:44:28.6149343Z add.s64 %rd385, %rd421, %rd418; 2026-02-21T12:44:28.6149418Z mad.lo.s64 %rd422, %rd402, 2560, %rd167; 2026-02-21T12:44:28.6149484Z add.s64 %rd386, %rd422, %rd418; 2026-02-21T12:44:28.6149550Z mad.lo.s64 %rd423, %rd403, 2560, %rd167; 2026-02-21T12:44:28.6149610Z add.s64 %rd387, %rd423, %rd418; 2026-02-21T12:44:28.6149680Z mad.lo.s64 %rd424, %rd404, 2560, %rd167; 2026-02-21T12:44:28.6149740Z add.s64 %rd388, %rd424, %rd418; 2026-02-21T12:44:28.6149807Z mad.lo.s64 %rd425, %rd405, 2560, %rd167; 2026-02-21T12:44:28.6149869Z add.s64 %rd389, %rd425, %rd418; 2026-02-21T12:44:28.6149940Z mad.lo.s64 %rd426, %rd406, 2560, %rd167; 2026-02-21T12:44:28.6150002Z add.s64 %rd390, %rd426, %rd418; 2026-02-21T12:44:28.6150068Z mad.lo.s64 %rd427, %rd407, 2560, %rd167; 2026-02-21T12:44:28.6150134Z add.s64 %rd391, %rd427, %rd418; 2026-02-21T12:44:28.6150201Z mad.lo.s64 %rd428, %rd408, 2560, %rd167; 2026-02-21T12:44:28.6150341Z add.s64 %rd392, %rd428, %rd418; 2026-02-21T12:44:28.6150411Z mad.lo.s64 %rd429, %rd409, 2560, %rd167; 2026-02-21T12:44:28.6150533Z add.s64 %rd393, %rd429, %rd418; 2026-02-21T12:44:28.6150602Z mad.lo.s64 %rd430, %rd410, 2560, %rd167; 2026-02-21T12:44:28.6150665Z add.s64 %rd394, %rd430, %rd418; 2026-02-21T12:44:28.6150737Z mad.lo.s64 %rd431, %rd411, 2560, %rd167; 2026-02-21T12:44:28.6150797Z add.s64 %rd395, %rd431, %rd418; 
2026-02-21T12:44:28.6150865Z mad.lo.s64 %rd432, %rd412, 2560, %rd167; 2026-02-21T12:44:28.6150930Z add.s64 %rd396, %rd432, %rd418; 2026-02-21T12:44:28.6150998Z mad.lo.s64 %rd433, %rd413, 2560, %rd167; 2026-02-21T12:44:28.6151072Z add.s64 %rd397, %rd433, %rd418; 2026-02-21T12:44:28.6151275Z .loc 1 91 81 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:91:81 2026-02-21T12:44:28.6151334Z bar.sync 0; 2026-02-21T12:44:28.6151443Z st.shared.v4.b32 [%r19], {%r8783, %r8785, %r8787, %r8789}; 2026-02-21T12:44:28.6151548Z st.shared.v4.b32 [%r20], {%r8791, %r8793, %r8795, %r8797}; 2026-02-21T12:44:28.6151703Z st.shared.v4.b32 [%r21], {%r8799, %r8801, %r8803, %r8805}; 2026-02-21T12:44:28.6151848Z st.shared.v4.b32 [%r22], {%r8807, %r8809, %r8811, %r8813}; 2026-02-21T12:44:28.6151949Z st.shared.v4.b32 [%r23], {%r8815, %r8817, %r8819, %r8821}; 2026-02-21T12:44:28.6152050Z st.shared.v4.b32 [%r24], {%r8823, %r8825, %r8827, %r8829}; 2026-02-21T12:44:28.6152151Z st.shared.v4.b32 [%r25], {%r8831, %r8833, %r8835, %r8837}; 2026-02-21T12:44:28.6152250Z st.shared.v4.b32 [%r26], {%r8839, %r8841, %r8843, %r8845}; 2026-02-21T12:44:28.6152304Z bar.sync 0; 2026-02-21T12:44:28.6152366Z // begin inline asm 2026-02-21T12:44:28.6152560Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8609, %r8610, %r8611, %r8612}, [%r3760]; 2026-02-21T12:44:28.6152615Z // end inline asm 2026-02-21T12:44:28.6152676Z // begin inline asm 2026-02-21T12:44:28.6152857Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8614, %r8615, %r8616, %r8617}, [%r3765]; 2026-02-21T12:44:28.6152915Z // end inline asm 2026-02-21T12:44:28.6152974Z // begin inline asm 2026-02-21T12:44:28.6153159Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8619, %r8620, %r8621, %r8622}, [%r3770]; 2026-02-21T12:44:28.6153228Z // end inline asm 2026-02-21T12:44:28.6153288Z // begin inline asm 2026-02-21T12:44:28.6153471Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8624, %r8625, %r8626, %r8627}, [%r3775]; 2026-02-21T12:44:28.6153526Z // end inline asm 
2026-02-21T12:44:28.6153584Z // begin inline asm 2026-02-21T12:44:28.6153764Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8629, %r8630, %r8631, %r8632}, [%r3780]; 2026-02-21T12:44:28.6153819Z // end inline asm 2026-02-21T12:44:28.6153875Z // begin inline asm 2026-02-21T12:44:28.6154051Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8634, %r8635, %r8636, %r8637}, [%r3785]; 2026-02-21T12:44:28.6154112Z // end inline asm 2026-02-21T12:44:28.6154169Z // begin inline asm 2026-02-21T12:44:28.6154345Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8639, %r8640, %r8641, %r8642}, [%r3790]; 2026-02-21T12:44:28.6154406Z // end inline asm 2026-02-21T12:44:28.6154465Z // begin inline asm 2026-02-21T12:44:28.6154645Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8644, %r8645, %r8646, %r8647}, [%r3795]; 2026-02-21T12:44:28.6154707Z // end inline asm 2026-02-21T12:44:28.6154762Z bar.sync 0; 2026-02-21T12:44:28.6154862Z st.shared.v4.b32 [%r19], {%r8784, %r8786, %r8788, %r8790}; 2026-02-21T12:44:28.6154962Z st.shared.v4.b32 [%r20], {%r8792, %r8794, %r8796, %r8798}; 2026-02-21T12:44:28.6155063Z st.shared.v4.b32 [%r21], {%r8800, %r8802, %r8804, %r8806}; 2026-02-21T12:44:28.6155161Z st.shared.v4.b32 [%r22], {%r8808, %r8810, %r8812, %r8814}; 2026-02-21T12:44:28.6155259Z st.shared.v4.b32 [%r23], {%r8816, %r8818, %r8820, %r8822}; 2026-02-21T12:44:28.6155373Z st.shared.v4.b32 [%r24], {%r8824, %r8826, %r8828, %r8830}; 2026-02-21T12:44:28.6155475Z st.shared.v4.b32 [%r25], {%r8832, %r8834, %r8836, %r8838}; 2026-02-21T12:44:28.6155635Z st.shared.v4.b32 [%r26], {%r8840, %r8842, %r8844, %r8846}; 2026-02-21T12:44:28.6155693Z bar.sync 0; 2026-02-21T12:44:28.6155751Z // begin inline asm 2026-02-21T12:44:28.6155976Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8649, %r8650, %r8651, %r8652}, [%r3760]; 2026-02-21T12:44:28.6156032Z // end inline asm 2026-02-21T12:44:28.6156092Z // begin inline asm 2026-02-21T12:44:28.6156267Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8654, %r8655, %r8656, %r8657}, [%r3765]; 
2026-02-21T12:44:28.6156322Z // end inline asm 2026-02-21T12:44:28.6156384Z // begin inline asm 2026-02-21T12:44:28.6156693Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8659, %r8660, %r8661, %r8662}, [%r3770]; 2026-02-21T12:44:28.6156752Z // end inline asm 2026-02-21T12:44:28.6156812Z // begin inline asm 2026-02-21T12:44:28.6156989Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8664, %r8665, %r8666, %r8667}, [%r3775]; 2026-02-21T12:44:28.6157044Z // end inline asm 2026-02-21T12:44:28.6157100Z // begin inline asm 2026-02-21T12:44:28.6157283Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8669, %r8670, %r8671, %r8672}, [%r3780]; 2026-02-21T12:44:28.6157339Z // end inline asm 2026-02-21T12:44:28.6157531Z // begin inline asm 2026-02-21T12:44:28.6157715Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8674, %r8675, %r8676, %r8677}, [%r3785]; 2026-02-21T12:44:28.6157770Z // end inline asm 2026-02-21T12:44:28.6157828Z // begin inline asm 2026-02-21T12:44:28.6158003Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8679, %r8680, %r8681, %r8682}, [%r3790]; 2026-02-21T12:44:28.6158062Z // end inline asm 2026-02-21T12:44:28.6158119Z // begin inline asm 2026-02-21T12:44:28.6158293Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r8684, %r8685, %r8686, %r8687}, [%r3795]; 2026-02-21T12:44:28.6158350Z // end inline asm 2026-02-21T12:44:28.6158408Z // begin inline asm 2026-02-21T12:44:28.6158535Z st.global.v4.b32 [ %rd382 + 0 ], { %r8609, %r8610, %r8611, %r8612 }; 2026-02-21T12:44:28.6158593Z // end inline asm 2026-02-21T12:44:28.6158652Z // begin inline asm 2026-02-21T12:44:28.6158771Z st.global.v4.b32 [ %rd383 + 0 ], { %r8649, %r8650, %r8651, %r8652 }; 2026-02-21T12:44:28.6158829Z // end inline asm 2026-02-21T12:44:28.6158908Z // begin inline asm 2026-02-21T12:44:28.6159028Z st.global.v4.b32 [ %rd384 + 0 ], { %r8614, %r8615, %r8616, %r8617 }; 2026-02-21T12:44:28.6159082Z // end inline asm 2026-02-21T12:44:28.6159143Z // begin inline asm 2026-02-21T12:44:28.6159261Z st.global.v4.b32 [ %rd385 + 0 ], 
{ %r8654, %r8655, %r8656, %r8657 }; 2026-02-21T12:44:28.6159317Z // end inline asm 2026-02-21T12:44:28.6159377Z // begin inline asm 2026-02-21T12:44:28.6159496Z st.global.v4.b32 [ %rd386 + 0 ], { %r8619, %r8620, %r8621, %r8622 }; 2026-02-21T12:44:28.6159552Z // end inline asm 2026-02-21T12:44:28.6159610Z // begin inline asm 2026-02-21T12:44:28.6159726Z st.global.v4.b32 [ %rd387 + 0 ], { %r8659, %r8660, %r8661, %r8662 }; 2026-02-21T12:44:28.6159782Z // end inline asm 2026-02-21T12:44:28.6159841Z // begin inline asm 2026-02-21T12:44:28.6159961Z st.global.v4.b32 [ %rd388 + 0 ], { %r8624, %r8625, %r8626, %r8627 }; 2026-02-21T12:44:28.6160015Z // end inline asm 2026-02-21T12:44:28.6160073Z // begin inline asm 2026-02-21T12:44:28.6160202Z st.global.v4.b32 [ %rd389 + 0 ], { %r8664, %r8665, %r8666, %r8667 }; 2026-02-21T12:44:28.6160265Z // end inline asm 2026-02-21T12:44:28.6160323Z // begin inline asm 2026-02-21T12:44:28.6160435Z st.global.v4.b32 [ %rd390 + 0 ], { %r8629, %r8630, %r8631, %r8632 }; 2026-02-21T12:44:28.6160498Z // end inline asm 2026-02-21T12:44:28.6160556Z // begin inline asm 2026-02-21T12:44:28.6160669Z st.global.v4.b32 [ %rd391 + 0 ], { %r8669, %r8670, %r8671, %r8672 }; 2026-02-21T12:44:28.6160725Z // end inline asm 2026-02-21T12:44:28.6160784Z // begin inline asm 2026-02-21T12:44:28.6160895Z st.global.v4.b32 [ %rd392 + 0 ], { %r8634, %r8635, %r8636, %r8637 }; 2026-02-21T12:44:28.6160952Z // end inline asm 2026-02-21T12:44:28.6161012Z // begin inline asm 2026-02-21T12:44:28.6161124Z st.global.v4.b32 [ %rd393 + 0 ], { %r8674, %r8675, %r8676, %r8677 }; 2026-02-21T12:44:28.6161255Z // end inline asm 2026-02-21T12:44:28.6161314Z // begin inline asm 2026-02-21T12:44:28.6161489Z st.global.v4.b32 [ %rd394 + 0 ], { %r8639, %r8640, %r8641, %r8642 }; 2026-02-21T12:44:28.6161546Z // end inline asm 2026-02-21T12:44:28.6161603Z // begin inline asm 2026-02-21T12:44:28.6161717Z st.global.v4.b32 [ %rd395 + 0 ], { %r8679, %r8680, %r8681, %r8682 }; 
2026-02-21T12:44:28.6161784Z // end inline asm 2026-02-21T12:44:28.6161844Z // begin inline asm 2026-02-21T12:44:28.6161961Z st.global.v4.b32 [ %rd396 + 0 ], { %r8644, %r8645, %r8646, %r8647 }; 2026-02-21T12:44:28.6162017Z // end inline asm 2026-02-21T12:44:28.6162075Z // begin inline asm 2026-02-21T12:44:28.6162191Z st.global.v4.b32 [ %rd397 + 0 ], { %r8684, %r8685, %r8686, %r8687 }; 2026-02-21T12:44:28.6162249Z // end inline asm 2026-02-21T12:44:28.6162466Z .loc 1 22 120 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:22:120 2026-02-21T12:44:28.6162531Z add.s64 %rd434, %rd647, 3; 2026-02-21T12:44:28.6162786Z .loc 1 28 35 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:28:35 2026-02-21T12:44:28.6162944Z mul.hi.u64 %rd435, %rd434, -3689348814741910323; 2026-02-21T12:44:28.6163007Z shr.u64 %rd436, %rd435, 4; 2026-02-21T12:44:28.6163210Z .loc 1 29 33 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:29:33 2026-02-21T12:44:28.6163274Z shl.b64 %rd85, %rd436, 2; 2026-02-21T12:44:28.6163468Z .loc 1 30 39 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:30:39 2026-02-21T12:44:28.6163531Z sub.s64 %rd437, 2048, %rd85; 2026-02-21T12:44:28.6163729Z .loc 1 30 52 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:30:52 2026-02-21T12:44:28.6163789Z min.s64 %rd86, %rd437, 4; 2026-02-21T12:44:28.6163986Z .loc 1 31 45 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:31:45 2026-02-21T12:44:28.6164057Z mul.lo.s64 %rd438, %rd436, 20; 2026-02-21T12:44:28.6164121Z sub.s64 %rd87, %rd434, %rd438; 2026-02-21T12:44:28.6164318Z .loc 1 32 51 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:32:51 2026-02-21T12:44:28.6164385Z or.b64 %rd439, %rd87, %rd86; 2026-02-21T12:44:28.6164454Z and.b64 %rd440, %rd439, -4294967296; 2026-02-21T12:44:28.6164518Z setp.ne.b64 %p20, %rd440, 0; 2026-02-21T12:44:28.6164582Z @%p20 bra $L__BB0_19; 2026-02-21T12:44:28.6164642Z bra.uni $L__BB0_18; 2026-02-21T12:44:28.6164757Z $L__BB0_19: // in Loop: 
Header=BB0_2 Depth=1 2026-02-21T12:44:28.6164821Z div.s64 %rd660, %rd87, %rd86; 2026-02-21T12:44:28.6164880Z bra.uni $L__BB0_20; 2026-02-21T12:44:28.6164985Z $L__BB0_18: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:28.6165048Z cvt.u32.u64 %r8847, %rd86; 2026-02-21T12:44:28.6165111Z cvt.u32.u64 %r8848, %rd87; 2026-02-21T12:44:28.6165174Z div.u32 %r8849, %r8848, %r8847; 2026-02-21T12:44:28.6165238Z cvt.u64.u32 %rd660, %r8849; 2026-02-21T12:44:28.6165343Z $L__BB0_20: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:28.6165549Z .loc 1 31 64 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:31:64 2026-02-21T12:44:28.6165616Z mul.lo.s64 %rd442, %rd660, %rd86; 2026-02-21T12:44:28.6165677Z sub.s64 %rd443, %rd87, %rd442; 2026-02-21T12:44:28.6165879Z .loc 1 31 30 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:31:30 2026-02-21T12:44:28.6165940Z add.s64 %rd444, %rd443, %rd85; 2026-02-21T12:44:28.6166134Z .loc 1 33 27 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:33:27 2026-02-21T12:44:28.6166197Z shl.b64 %rd91, %rd444, 7; 2026-02-21T12:44:28.6166406Z .loc 1 34 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:34:32 2026-02-21T12:44:28.6166590Z or.b64 %rd445, %rd91, %rd4; 2026-02-21T12:44:28.6166895Z .loc 1 35 27 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:35:27 2026-02-21T12:44:28.6166962Z shl.b64 %rd92, %rd660, 8; 2026-02-21T12:44:28.6167233Z .loc 1 36 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:36:32 2026-02-21T12:44:28.6167295Z or.b64 %rd93, %rd92, %rd21; 2026-02-21T12:44:28.6167496Z .loc 1 51 53 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:53 2026-02-21T12:44:28.6167558Z shl.b64 %rd94, %rd445, 13; 2026-02-21T12:44:28.6167762Z .loc 1 43 126 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:43:126 2026-02-21T12:44:28.6167825Z shl.b64 %rd446, %rd444, 21; 2026-02-21T12:44:28.6167886Z add.s64 %rd662, %rd30, %rd446; 2026-02-21T12:44:28.6167948Z add.s64 %rd661, %rd31, 
%rd92; 2026-02-21T12:44:28.6168013Z mov.b32 %r14202, 0f00000000; 2026-02-21T12:44:28.6168077Z mov.b64 %rd663, -12; 2026-02-21T12:44:28.6168137Z mov.b32 %r14203, %r14202; 2026-02-21T12:44:28.6168197Z mov.b32 %r14204, %r14202; 2026-02-21T12:44:28.6168256Z mov.b32 %r14205, %r14202; 2026-02-21T12:44:28.6168380Z mov.b32 %r14206, %r14202; 2026-02-21T12:44:28.6168501Z mov.b32 %r14207, %r14202; 2026-02-21T12:44:28.6168566Z mov.b32 %r14208, %r14202; 2026-02-21T12:44:28.6168623Z mov.b32 %r14209, %r14202; 2026-02-21T12:44:28.6168680Z mov.b32 %r14210, %r14202; 2026-02-21T12:44:28.6168740Z mov.b32 %r14211, %r14202; 2026-02-21T12:44:28.6168802Z mov.b32 %r14212, %r14202; 2026-02-21T12:44:28.6168859Z mov.b32 %r14213, %r14202; 2026-02-21T12:44:28.6168917Z mov.b32 %r14214, %r14202; 2026-02-21T12:44:28.6168978Z mov.b32 %r14215, %r14202; 2026-02-21T12:44:28.6169034Z mov.b32 %r14216, %r14202; 2026-02-21T12:44:28.6169091Z mov.b32 %r14217, %r14202; 2026-02-21T12:44:28.6169150Z mov.b32 %r14218, %r14202; 2026-02-21T12:44:28.6169212Z mov.b32 %r14219, %r14202; 2026-02-21T12:44:28.6169271Z mov.b32 %r14220, %r14202; 2026-02-21T12:44:28.6169331Z mov.b32 %r14221, %r14202; 2026-02-21T12:44:28.6169392Z mov.b32 %r14222, %r14202; 2026-02-21T12:44:28.6169449Z mov.b32 %r14223, %r14202; 2026-02-21T12:44:28.6169512Z mov.b32 %r14224, %r14202; 2026-02-21T12:44:28.6169571Z mov.b32 %r14225, %r14202; 2026-02-21T12:44:28.6169632Z mov.b32 %r14226, %r14202; 2026-02-21T12:44:28.6169701Z mov.b32 %r14227, %r14202; 2026-02-21T12:44:28.6169761Z mov.b32 %r14228, %r14202; 2026-02-21T12:44:28.6169823Z mov.b32 %r14229, %r14202; 2026-02-21T12:44:28.6169882Z mov.b32 %r14230, %r14202; 2026-02-21T12:44:28.6169938Z mov.b32 %r14231, %r14202; 2026-02-21T12:44:28.6169997Z mov.b32 %r14232, %r14202; 2026-02-21T12:44:28.6170055Z mov.b32 %r14233, %r14202; 2026-02-21T12:44:28.6170112Z mov.b32 %r14234, %r14202; 2026-02-21T12:44:28.6170168Z mov.b32 %r14235, %r14202; 2026-02-21T12:44:28.6170228Z mov.b32 %r14236, %r14202; 
2026-02-21T12:44:28.6170286Z mov.b32 %r14237, %r14202; 2026-02-21T12:44:28.6170344Z mov.b32 %r14238, %r14202; 2026-02-21T12:44:28.6170406Z mov.b32 %r14239, %r14202; 2026-02-21T12:44:28.6170463Z mov.b32 %r14240, %r14202; 2026-02-21T12:44:28.6170520Z mov.b32 %r14241, %r14202; 2026-02-21T12:44:28.6170581Z mov.b32 %r14242, %r14202; 2026-02-21T12:44:28.6170644Z mov.b32 %r14243, %r14202; 2026-02-21T12:44:28.6170703Z mov.b32 %r14244, %r14202; 2026-02-21T12:44:28.6170760Z mov.b32 %r14245, %r14202; 2026-02-21T12:44:28.6170822Z mov.b32 %r14246, %r14202; 2026-02-21T12:44:28.6170880Z mov.b32 %r14247, %r14202; 2026-02-21T12:44:28.6170937Z mov.b32 %r14248, %r14202; 2026-02-21T12:44:28.6170995Z mov.b32 %r14249, %r14202; 2026-02-21T12:44:28.6171057Z mov.b32 %r14250, %r14202; 2026-02-21T12:44:28.6171115Z mov.b32 %r14251, %r14202; 2026-02-21T12:44:28.6171172Z mov.b32 %r14252, %r14202; 2026-02-21T12:44:28.6171232Z mov.b32 %r14253, %r14202; 2026-02-21T12:44:28.6171301Z mov.b32 %r14254, %r14202; 2026-02-21T12:44:28.6171360Z mov.b32 %r14255, %r14202; 2026-02-21T12:44:28.6171418Z mov.b32 %r14256, %r14202; 2026-02-21T12:44:28.6171538Z mov.b32 %r14257, %r14202; 2026-02-21T12:44:28.6171597Z mov.b32 %r14258, %r14202; 2026-02-21T12:44:28.6171662Z mov.b32 %r14259, %r14202; 2026-02-21T12:44:28.6171769Z mov.b32 %r14260, %r14202; 2026-02-21T12:44:28.6171828Z mov.b32 %r14261, %r14202; 2026-02-21T12:44:28.6171886Z mov.b32 %r14262, %r14202; 2026-02-21T12:44:28.6171943Z mov.b32 %r14263, %r14202; 2026-02-21T12:44:28.6172003Z mov.b32 %r14264, %r14202; 2026-02-21T12:44:28.6172059Z mov.b32 %r14265, %r14202; 2026-02-21T12:44:28.6172114Z mov.b32 %r14266, %r14202; 2026-02-21T12:44:28.6172174Z mov.b32 %r14267, %r14202; 2026-02-21T12:44:28.6172230Z mov.b32 %r14268, %r14202; 2026-02-21T12:44:28.6172287Z mov.b32 %r14269, %r14202; 2026-02-21T12:44:28.6172345Z mov.b32 %r14270, %r14202; 2026-02-21T12:44:28.6172405Z mov.b32 %r14271, %r14202; 2026-02-21T12:44:28.6172462Z mov.b32 %r14272, %r14202; 
2026-02-21T12:44:28.6172519Z mov.b32 %r14273, %r14202; 2026-02-21T12:44:28.6172578Z mov.b32 %r14274, %r14202; 2026-02-21T12:44:28.6172638Z mov.b32 %r14275, %r14202; 2026-02-21T12:44:28.6172696Z mov.b32 %r14276, %r14202; 2026-02-21T12:44:28.6172754Z mov.b32 %r14277, %r14202; 2026-02-21T12:44:28.6172901Z mov.b32 %r14278, %r14202; 2026-02-21T12:44:28.6172964Z mov.b32 %r14279, %r14202; 2026-02-21T12:44:28.6173021Z mov.b32 %r14280, %r14202; 2026-02-21T12:44:28.6173078Z mov.b32 %r14281, %r14202; 2026-02-21T12:44:28.6173134Z mov.b32 %r14282, %r14202; 2026-02-21T12:44:28.6173191Z mov.b32 %r14283, %r14202; 2026-02-21T12:44:28.6173250Z mov.b32 %r14284, %r14202; 2026-02-21T12:44:28.6173308Z mov.b32 %r14285, %r14202; 2026-02-21T12:44:28.6173365Z mov.b32 %r14286, %r14202; 2026-02-21T12:44:28.6173421Z mov.b32 %r14287, %r14202; 2026-02-21T12:44:28.6173481Z mov.b32 %r14288, %r14202; 2026-02-21T12:44:28.6173538Z mov.b32 %r14289, %r14202; 2026-02-21T12:44:28.6173594Z mov.b32 %r14290, %r14202; 2026-02-21T12:44:28.6173656Z mov.b32 %r14291, %r14202; 2026-02-21T12:44:28.6173711Z mov.b32 %r14292, %r14202; 2026-02-21T12:44:28.6173771Z mov.b32 %r14293, %r14202; 2026-02-21T12:44:28.6173826Z mov.b32 %r14294, %r14202; 2026-02-21T12:44:28.6173886Z mov.b32 %r14295, %r14202; 2026-02-21T12:44:28.6173948Z mov.b32 %r14296, %r14202; 2026-02-21T12:44:28.6174005Z mov.b32 %r14297, %r14202; 2026-02-21T12:44:28.6174066Z mov.b32 %r14298, %r14202; 2026-02-21T12:44:28.6174123Z mov.b32 %r14299, %r14202; 2026-02-21T12:44:28.6174181Z mov.b32 %r14300, %r14202; 2026-02-21T12:44:28.6174236Z mov.b32 %r14301, %r14202; 2026-02-21T12:44:28.6174296Z mov.b32 %r14302, %r14202; 2026-02-21T12:44:28.6174352Z mov.b32 %r14303, %r14202; 2026-02-21T12:44:28.6174421Z mov.b32 %r14304, %r14202; 2026-02-21T12:44:28.6174483Z mov.b32 %r14305, %r14202; 2026-02-21T12:44:28.6174542Z mov.b32 %r14306, %r14202; 2026-02-21T12:44:28.6174600Z mov.b32 %r14307, %r14202; 2026-02-21T12:44:28.6174657Z mov.b32 %r14308, %r14202; 
2026-02-21T12:44:28.6174717Z mov.b32 %r14309, %r14202; 2026-02-21T12:44:28.6174773Z mov.b32 %r14310, %r14202; 2026-02-21T12:44:28.6174832Z mov.b32 %r14311, %r14202; 2026-02-21T12:44:28.6174893Z mov.b32 %r14312, %r14202; 2026-02-21T12:44:28.6174951Z mov.b32 %r14313, %r14202; 2026-02-21T12:44:28.6175013Z mov.b32 %r14314, %r14202; 2026-02-21T12:44:28.6175071Z mov.b32 %r14315, %r14202; 2026-02-21T12:44:28.6175131Z mov.b32 %r14316, %r14202; 2026-02-21T12:44:28.6175197Z mov.b32 %r14317, %r14202; 2026-02-21T12:44:28.6175256Z mov.b32 %r14318, %r14202; 2026-02-21T12:44:28.6175317Z mov.b32 %r14319, %r14202; 2026-02-21T12:44:28.6175374Z mov.b32 %r14320, %r14202; 2026-02-21T12:44:28.6175431Z mov.b32 %r14321, %r14202; 2026-02-21T12:44:28.6175491Z mov.b32 %r14322, %r14202; 2026-02-21T12:44:28.6175549Z mov.b32 %r14323, %r14202; 2026-02-21T12:44:28.6175605Z mov.b32 %r14324, %r14202; 2026-02-21T12:44:28.6175663Z mov.b32 %r14325, %r14202; 2026-02-21T12:44:28.6175723Z mov.b32 %r14326, %r14202; 2026-02-21T12:44:28.6175780Z mov.b32 %r14327, %r14202; 2026-02-21T12:44:28.6175837Z mov.b32 %r14328, %r14202; 2026-02-21T12:44:28.6175956Z mov.b32 %r14329, %r14202; 2026-02-21T12:44:28.6176073Z $L__BB0_21: // Parent Loop BB0_2 Depth=1 2026-02-21T12:44:28.6176226Z // => This Inner Loop Header: Depth=2 2026-02-21T12:44:28.6176446Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6176639Z add.s64 %rd448, %rd662, -32; 2026-02-21T12:44:28.6176699Z // begin inline asm 2026-02-21T12:44:28.6176756Z mov.u64 %rd447, 0x0; 2026-02-21T12:44:28.6176886Z createpolicy.fractional.L2::evict_last.b64 %rd447, 1.0; 2026-02-21T12:44:28.6176943Z // end inline asm 2026-02-21T12:44:28.6177001Z // begin inline asm 2026-02-21T12:44:28.6177062Z mov.u32 %r8851, 0x0; 2026-02-21T12:44:28.6177122Z mov.u32 %r8852, 0x0; 2026-02-21T12:44:28.6177311Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r8851, %r8852 }, [ %rd448 + 0 ], %rd447; 2026-02-21T12:44:28.6177366Z // end inline 
asm 2026-02-21T12:44:28.6177572Z .loc 1 55 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:55:32 2026-02-21T12:44:28.6177641Z bar.sync 0; 2026-02-21T12:44:28.6177853Z st.shared.v2.b32 [%r10], {%r8851, %r8852}; 2026-02-21T12:44:28.6177916Z bar.sync 0; 2026-02-21T12:44:28.6177983Z ld.shared.b16 %rs337, [%r11]; 2026-02-21T12:44:28.6178050Z ld.shared.b16 %rs338, [%r11+128]; 2026-02-21T12:44:28.6178116Z ld.shared.b16 %rs339, [%r11+8]; 2026-02-21T12:44:28.6178183Z ld.shared.b16 %rs340, [%r11+136]; 2026-02-21T12:44:28.6178245Z cvt.f32.bf16 %r9110, %rs337; 2026-02-21T12:44:28.6178309Z cvt.f32.bf16 %r9111, %rs338; 2026-02-21T12:44:28.6178373Z cvt.f32.bf16 %r9112, %rs339; 2026-02-21T12:44:28.6178434Z cvt.f32.bf16 %r9113, %rs340; 2026-02-21T12:44:28.6178642Z .loc 1 57 87 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:87 2026-02-21T12:44:28.6178709Z // begin inline asm 2026-02-21T12:44:28.6178767Z mov.u32 %r8853, 0x0; 2026-02-21T12:44:28.6178841Z ld.global.b32 { %r8853 }, [ %rd661 + 0 ]; 2026-02-21T12:44:28.6178913Z // end inline asm 2026-02-21T12:44:28.6179120Z .loc 1 65 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:65:28 2026-02-21T12:44:28.6179177Z bar.sync 0; 2026-02-21T12:44:28.6179241Z st.shared.b8 [%r12], %r8853; 2026-02-21T12:44:28.6179314Z prmt.b32 %r10426, %r8853, 0, 0x7771U; 2026-02-21T12:44:28.6179378Z st.shared.b8 [%r13+256], %r10426; 2026-02-21T12:44:28.6179444Z prmt.b32 %r10427, %r8853, 0, 0x7772U; 2026-02-21T12:44:28.6179506Z st.shared.b8 [%r14+512], %r10427; 2026-02-21T12:44:28.6179573Z prmt.b32 %r10428, %r8853, 0, 0x7773U; 2026-02-21T12:44:28.6179636Z st.shared.b8 [%r15+768], %r10428; 2026-02-21T12:44:28.6179690Z bar.sync 0; 2026-02-21T12:44:28.6179757Z ld.shared.b32 %r10429, [%r16]; 2026-02-21T12:44:28.6179824Z prmt.b32 %r10430, %r10429, 0, 0x7771U; 2026-02-21T12:44:28.6179886Z cvt.u16.u32 %rs341, %r10430; 2026-02-21T12:44:28.6179953Z prmt.b32 %r10431, %r10429, 0, 0x7770U; 2026-02-21T12:44:28.6180015Z cvt.u16.u32 
%rs342, %r10431; 2026-02-21T12:44:28.6180081Z prmt.b32 %r10432, %r10429, 0, 0x7773U; 2026-02-21T12:44:28.6180142Z cvt.u16.u32 %rs343, %r10432; 2026-02-21T12:44:28.6180212Z prmt.b32 %r10433, %r10429, 0, 0x7772U; 2026-02-21T12:44:28.6180273Z cvt.u16.u32 %rs344, %r10433; 2026-02-21T12:44:28.6180471Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6180536Z shl.b16 %rs345, %rs342, 4; 2026-02-21T12:44:28.6180597Z shl.b16 %rs346, %rs341, 4; 2026-02-21T12:44:28.6180791Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6180856Z cvt.u32.u16 %r10434, %rs345; 2026-02-21T12:44:28.6180946Z prmt.b32 %r10435, %r10434, %r10436, 0x3340U; 2026-02-21T12:44:28.6181028Z prmt.b32 %r10440, %r10435, %r10437, 0x5410U; 2026-02-21T12:44:28.6181102Z prmt.b32 %r10441, %r10440, %r10429, 0x5040U; 2026-02-21T12:44:28.6181169Z prmt.b32 %r10442, %r10441, 0, 0x9991U; 2026-02-21T12:44:28.6181305Z cvt.u16.u32 %rs347, %r10442; 2026-02-21T12:44:28.6181365Z shr.s16 %rs348, %rs347, 4; 2026-02-21T12:44:28.6181490Z prmt.b32 %r10443, %r10441, 0, 0xbbb3U; 2026-02-21T12:44:28.6181550Z cvt.u16.u32 %rs349, %r10443; 2026-02-21T12:44:28.6181609Z shr.s16 %rs350, %rs349, 4; 2026-02-21T12:44:28.6181669Z cvt.s16.s8 %rs351, %rs345; 2026-02-21T12:44:28.6181732Z shr.s16 %rs352, %rs351, 4; 2026-02-21T12:44:28.6181791Z cvt.s16.s8 %rs353, %rs346; 2026-02-21T12:44:28.6181857Z shr.s16 %rs354, %rs353, 4; 2026-02-21T12:44:28.6182059Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6182124Z cvt.rn.f32.s16 %r10444, %rs350; 2026-02-21T12:44:28.6182187Z cvt.rn.f32.s16 %r10445, %rs348; 2026-02-21T12:44:28.6182251Z cvt.rn.f32.s16 %r10446, %rs354; 2026-02-21T12:44:28.6182313Z cvt.rn.f32.s16 %r10447, %rs352; 2026-02-21T12:44:28.6182515Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6182578Z shl.b16 %rs355, %rs344, 4; 
2026-02-21T12:44:28.6182709Z shl.b16 %rs356, %rs343, 4; 2026-02-21T12:44:28.6182950Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6183029Z prmt.b32 %r10448, %r10429, %r10449, 0x3020U; 2026-02-21T12:44:28.6183099Z prmt.b32 %r10450, %r10448, 0, 0x9991U; 2026-02-21T12:44:28.6183160Z cvt.u16.u32 %rs357, %r10450; 2026-02-21T12:44:28.6183218Z shr.s16 %rs358, %rs357, 4; 2026-02-21T12:44:28.6183282Z cvt.s16.s8 %rs359, %rs355; 2026-02-21T12:44:28.6183343Z shr.s16 %rs360, %rs359, 4; 2026-02-21T12:44:28.6183402Z cvt.s16.s8 %rs361, %rs356; 2026-02-21T12:44:28.6183460Z shr.s16 %rs362, %rs361, 4; 2026-02-21T12:44:28.6183529Z prmt.b32 %r10451, %r10429, 0, 0xbbb3U; 2026-02-21T12:44:28.6183590Z cvt.u16.u32 %rs363, %r10451; 2026-02-21T12:44:28.6183650Z shr.s16 %rs364, %rs363, 4; 2026-02-21T12:44:28.6183849Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6183911Z cvt.rn.f32.s16 %r10452, %rs358; 2026-02-21T12:44:28.6183977Z cvt.rn.f32.s16 %r10453, %rs364; 2026-02-21T12:44:28.6184039Z cvt.rn.f32.s16 %r10454, %rs362; 2026-02-21T12:44:28.6184102Z cvt.rn.f32.s16 %r10455, %rs360; 2026-02-21T12:44:28.6184157Z bar.sync 0; 2026-02-21T12:44:28.6184274Z st.shared.v4.b32 [%r17], {%r10447, %r10445, %r10446, %r10444}; 2026-02-21T12:44:28.6184390Z st.shared.v4.b32 [%r18], {%r10455, %r10452, %r10454, %r10453}; 2026-02-21T12:44:28.6184444Z $L__tmp25: 2026-02-21T12:44:28.6184721Z .loc 2 291 36 // standard.py:291:36 @[ cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:87:40 ] 2026-02-21T12:44:28.6184782Z // begin inline asm 2026-02-21T12:44:28.6184870Z fence.proxy.async.shared::cta; 2026-02-21T12:44:28.6184929Z // end inline asm 2026-02-21T12:44:28.6184983Z bar.sync 0; 2026-02-21T12:44:28.6185059Z wgmma.fence.sync.aligned; 2026-02-21T12:44:28.6185124Z mov.pred %p21, -1; 2026-02-21T12:44:28.6185182Z // begin inline asm 2026-02-21T12:44:28.6188040Z 
wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 {%r14202,%r14203,%r14204,%r14205,%r14206,%r14207,%r14208,%r14209,%r14210,%r14211,%r14212,%r14213,%r14214,%r14215,%r14216,%r14217,%r14218,%r14219,%r14220,%r14221,%r14222,%r14223,%r14224,%r14225,%r14226,%r14227,%r14228,%r14229,%r14230,%r14231,%r14232,%r14233,%r14234,%r14235,%r14236,%r14237,%r14238,%r14239,%r14240,%r14241,%r14242,%r14243,%r14244,%r14245,%r14246,%r14247,%r14248,%r14249,%r14250,%r14251,%r14252,%r14253,%r14254,%r14255,%r14256,%r14257,%r14258,%r14259,%r14260,%r14261,%r14262,%r14263,%r14264,%r14265,%r14266,%r14267,%r14268,%r14269,%r14270,%r14271,%r14272,%r14273,%r14274,%r14275,%r14276,%r14277,%r14278,%r14279,%r14280,%r14281,%r14282,%r14283,%r14284,%r14285,%r14286,%r14287,%r14288,%r14289,%r14290,%r14291,%r14292,%r14293,%r14294,%r14295,%r14296,%r14297,%r14298,%r14299,%r14300,%r14301,%r14302,%r14303,%r14304,%r14305,%r14306,%r14307,%r14308,%r14309,%r14310,%r14311,%r14312,%r14313,%r14314,%r14315,%r14316,%r14317,%r14318,%r14319,%r14320,%r14321,%r14322,%r14323,%r14324,%r14325,%r14326,%r14327,%r14328,%r14329}, {%r9110,%r9111,%r9112,%r9113}, %rd466, %p21, 1, 1; 2026-02-21T12:44:28.6188243Z // end inline asm 2026-02-21T12:44:28.6188324Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:28.6188383Z mov.b32 %r10903, 0; 2026-02-21T12:44:28.6188446Z mov.b32 %r9242, %r11895; 2026-02-21T12:44:28.6188573Z mov.b32 %r9243, %r10903; 2026-02-21T12:44:28.6188638Z mov.b32 %r9244, %r10903; 2026-02-21T12:44:28.6188700Z // begin inline asm 2026-02-21T12:44:28.6197889Z // wait for regs: 
%r14202,%r14203,%r14204,%r14205,%r14206,%r14207,%r14208,%r14209,%r14210,%r14211,%r14212,%r14213,%r14214,%r14215,%r14216,%r14217,%r14218,%r14219,%r14220,%r14221,%r14222,%r14223,%r14224,%r14225,%r14226,%r14227,%r14228,%r14229,%r14230,%r14231,%r14232,%r14233,%r14234,%r14235,%r14236,%r14237,%r14238,%r14239,%r14240,%r14241,%r14242,%r14243,%r14244,%r14245,%r14246,%r14247,%r14248,%r14249,%r14250,%r14251,%r14252,%r14253,%r14254,%r14255,%r14256,%r14257,%r14258,%r14259,%r14260,%r14261,%r14262,%r14263,%r14264,%r14265,%r14266,%r14267,%r14268,%r14269,%r14270,%r14271,%r14272,%r14273,%r14274,%r14275,%r14276,%r14277,%r14278,%r14279,%r14280,%r14281,%r14282,%r14283,%r14284,%r14285,%r14286,%r14287,%r14288,%r14289,%r14290,%r14291,%r14292,%r14293,%r14294,%r14295,%r14296,%r14297,%r14298,%r14299,%r14300,%r14301,%r14302,%r14303,%r14304,%r14305,%r14306,%r14307,%r14308,%r14309,%r14310,%r14311,%r14312,%r14313,%r14314,%r14315,%r14316,%r14317,%r14318,%r14319,%r14320,%r14321,%r14322,%r14323,%r14324,%r14325,%r14326,%r14327,%r14328,%r14329,%r9242,%r9243,%r9244 2026-02-21T12:44:28.6198025Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:28.6198087Z // end inline asm 2026-02-21T12:44:28.6198143Z $L__tmp26: 2026-02-21T12:44:28.6198384Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6198464Z add.s64 %rd453, %rd662, -16; 2026-02-21T12:44:28.6198531Z // begin inline asm 2026-02-21T12:44:28.6198593Z mov.u64 %rd452, 0x0; 2026-02-21T12:44:28.6198740Z createpolicy.fractional.L2::evict_last.b64 %rd452, 1.0; 2026-02-21T12:44:28.6198802Z // end inline asm 2026-02-21T12:44:28.6198861Z // begin inline asm 2026-02-21T12:44:28.6198920Z mov.u32 %r9376, 0x0; 2026-02-21T12:44:28.6198984Z mov.u32 %r9377, 0x0; 2026-02-21T12:44:28.6199194Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r9376, %r9377 }, [ %rd453 + 0 ], %rd452; 2026-02-21T12:44:28.6199255Z // end inline asm 2026-02-21T12:44:28.6199480Z .loc 1 55 32 // 
cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:55:32 2026-02-21T12:44:28.6199540Z bar.sync 0; 2026-02-21T12:44:28.6199625Z st.shared.v2.b32 [%r10], {%r9376, %r9377}; 2026-02-21T12:44:28.6199685Z bar.sync 0; 2026-02-21T12:44:28.6199757Z ld.shared.b16 %rs365, [%r11]; 2026-02-21T12:44:28.6199827Z ld.shared.b16 %rs366, [%r11+128]; 2026-02-21T12:44:28.6199901Z ld.shared.b16 %rs367, [%r11+8]; 2026-02-21T12:44:28.6199971Z ld.shared.b16 %rs368, [%r11+136]; 2026-02-21T12:44:28.6200039Z cvt.f32.bf16 %r9635, %rs365; 2026-02-21T12:44:28.6200104Z cvt.f32.bf16 %r9636, %rs366; 2026-02-21T12:44:28.6200169Z cvt.f32.bf16 %r9637, %rs367; 2026-02-21T12:44:28.6200230Z cvt.f32.bf16 %r9638, %rs368; 2026-02-21T12:44:28.6200446Z .loc 1 57 87 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:87 2026-02-21T12:44:28.6200513Z add.s64 %rd455, %rd661, 5120; 2026-02-21T12:44:28.6200574Z // begin inline asm 2026-02-21T12:44:28.6200633Z mov.u32 %r9378, 0x0; 2026-02-21T12:44:28.6200711Z ld.global.b32 { %r9378 }, [ %rd455 + 0 ]; 2026-02-21T12:44:28.6200773Z // end inline asm 2026-02-21T12:44:28.6200977Z .loc 1 65 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:65:28 2026-02-21T12:44:28.6201033Z bar.sync 0; 2026-02-21T12:44:28.6201100Z st.shared.b8 [%r12], %r9378; 2026-02-21T12:44:28.6201261Z prmt.b32 %r10456, %r9378, 0, 0x7771U; 2026-02-21T12:44:28.6201329Z st.shared.b8 [%r13+256], %r10456; 2026-02-21T12:44:28.6201394Z prmt.b32 %r10457, %r9378, 0, 0x7772U; 2026-02-21T12:44:28.6201525Z st.shared.b8 [%r14+512], %r10457; 2026-02-21T12:44:28.6201593Z prmt.b32 %r10458, %r9378, 0, 0x7773U; 2026-02-21T12:44:28.6201660Z st.shared.b8 [%r15+768], %r10458; 2026-02-21T12:44:28.6201720Z bar.sync 0; 2026-02-21T12:44:28.6201789Z ld.shared.b32 %r10459, [%r16]; 2026-02-21T12:44:28.6201862Z prmt.b32 %r10460, %r10459, 0, 0x7771U; 2026-02-21T12:44:28.6201929Z cvt.u16.u32 %rs369, %r10460; 2026-02-21T12:44:28.6202000Z prmt.b32 %r10461, %r10459, 0, 0x7770U; 2026-02-21T12:44:28.6202075Z 
cvt.u16.u32 %rs370, %r10461; 2026-02-21T12:44:28.6202139Z prmt.b32 %r10462, %r10459, 0, 0x7773U; 2026-02-21T12:44:28.6202202Z cvt.u16.u32 %rs371, %r10462; 2026-02-21T12:44:28.6202265Z prmt.b32 %r10463, %r10459, 0, 0x7772U; 2026-02-21T12:44:28.6202326Z cvt.u16.u32 %rs372, %r10463; 2026-02-21T12:44:28.6202554Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6202620Z shl.b16 %rs373, %rs370, 4; 2026-02-21T12:44:28.6202794Z shl.b16 %rs374, %rs369, 4; 2026-02-21T12:44:28.6203005Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6203073Z cvt.u32.u16 %r10464, %rs373; 2026-02-21T12:44:28.6203161Z prmt.b32 %r10465, %r10464, %r10466, 0x3340U; 2026-02-21T12:44:28.6203241Z prmt.b32 %r10467, %r10465, %r10437, 0x5410U; 2026-02-21T12:44:28.6203324Z prmt.b32 %r10468, %r10467, %r10459, 0x5040U; 2026-02-21T12:44:28.6203405Z prmt.b32 %r10469, %r10468, 0, 0x9991U; 2026-02-21T12:44:28.6203471Z cvt.u16.u32 %rs375, %r10469; 2026-02-21T12:44:28.6203536Z shr.s16 %rs376, %rs375, 4; 2026-02-21T12:44:28.6203607Z prmt.b32 %r10470, %r10468, 0, 0xbbb3U; 2026-02-21T12:44:28.6203670Z cvt.u16.u32 %rs377, %r10470; 2026-02-21T12:44:28.6203729Z shr.s16 %rs378, %rs377, 4; 2026-02-21T12:44:28.6203795Z cvt.s16.s8 %rs379, %rs373; 2026-02-21T12:44:28.6203858Z shr.s16 %rs380, %rs379, 4; 2026-02-21T12:44:28.6203918Z cvt.s16.s8 %rs381, %rs374; 2026-02-21T12:44:28.6203984Z shr.s16 %rs382, %rs381, 4; 2026-02-21T12:44:28.6204200Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6204270Z cvt.rn.f32.s16 %r10471, %rs378; 2026-02-21T12:44:28.6204336Z cvt.rn.f32.s16 %r10472, %rs376; 2026-02-21T12:44:28.6204401Z cvt.rn.f32.s16 %r10473, %rs382; 2026-02-21T12:44:28.6204462Z cvt.rn.f32.s16 %r10474, %rs380; 2026-02-21T12:44:28.6204665Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6204729Z shl.b16 %rs383, %rs372, 4; 
2026-02-21T12:44:28.6204790Z shl.b16 %rs384, %rs371, 4; 2026-02-21T12:44:28.6204990Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6205072Z prmt.b32 %r10475, %r10459, %r10476, 0x3020U; 2026-02-21T12:44:28.6205141Z prmt.b32 %r10477, %r10475, 0, 0x9991U; 2026-02-21T12:44:28.6205203Z cvt.u16.u32 %rs385, %r10477; 2026-02-21T12:44:28.6205266Z shr.s16 %rs386, %rs385, 4; 2026-02-21T12:44:28.6205331Z cvt.s16.s8 %rs387, %rs383; 2026-02-21T12:44:28.6205389Z shr.s16 %rs388, %rs387, 4; 2026-02-21T12:44:28.6205449Z cvt.s16.s8 %rs389, %rs384; 2026-02-21T12:44:28.6205519Z shr.s16 %rs390, %rs389, 4; 2026-02-21T12:44:28.6205594Z prmt.b32 %r10478, %r10459, 0, 0xbbb3U; 2026-02-21T12:44:28.6205657Z cvt.u16.u32 %rs391, %r10478; 2026-02-21T12:44:28.6205716Z shr.s16 %rs392, %rs391, 4; 2026-02-21T12:44:28.6205923Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6205987Z cvt.rn.f32.s16 %r10479, %rs386; 2026-02-21T12:44:28.6206047Z cvt.rn.f32.s16 %r10480, %rs392; 2026-02-21T12:44:28.6206113Z cvt.rn.f32.s16 %r10481, %rs390; 2026-02-21T12:44:28.6206175Z cvt.rn.f32.s16 %r10482, %rs388; 2026-02-21T12:44:28.6206292Z bar.sync 0; 2026-02-21T12:44:28.6206417Z st.shared.v4.b32 [%r17], {%r10474, %r10472, %r10473, %r10471}; 2026-02-21T12:44:28.6206738Z st.shared.v4.b32 [%r18], {%r10482, %r10479, %r10481, %r10480}; 2026-02-21T12:44:28.6206796Z $L__tmp27: 2026-02-21T12:44:28.6207074Z .loc 2 291 36 // standard.py:291:36 @[ cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:87:40 ] 2026-02-21T12:44:28.6207153Z // begin inline asm 2026-02-21T12:44:28.6207234Z fence.proxy.async.shared::cta; 2026-02-21T12:44:28.6207291Z // end inline asm 2026-02-21T12:44:28.6207349Z bar.sync 0; 2026-02-21T12:44:28.6207422Z wgmma.fence.sync.aligned; 2026-02-21T12:44:28.6207481Z // begin inline asm 2026-02-21T12:44:28.6210334Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14202,%r14203,%r14204,%r14205,%r14206,%r14207,%r14208,%r14209,%r14210,%r14211,%r14212,%r14213,%r14214,%r14215,%r14216,%r14217,%r14218,%r14219,%r14220,%r14221,%r14222,%r14223,%r14224,%r14225,%r14226,%r14227,%r14228,%r14229,%r14230,%r14231,%r14232,%r14233,%r14234,%r14235,%r14236,%r14237,%r14238,%r14239,%r14240,%r14241,%r14242,%r14243,%r14244,%r14245,%r14246,%r14247,%r14248,%r14249,%r14250,%r14251,%r14252,%r14253,%r14254,%r14255,%r14256,%r14257,%r14258,%r14259,%r14260,%r14261,%r14262,%r14263,%r14264,%r14265,%r14266,%r14267,%r14268,%r14269,%r14270,%r14271,%r14272,%r14273,%r14274,%r14275,%r14276,%r14277,%r14278,%r14279,%r14280,%r14281,%r14282,%r14283,%r14284,%r14285,%r14286,%r14287,%r14288,%r14289,%r14290,%r14291,%r14292,%r14293,%r14294,%r14295,%r14296,%r14297,%r14298,%r14299,%r14300,%r14301,%r14302,%r14303,%r14304,%r14305,%r14306,%r14307,%r14308,%r14309,%r14310,%r14311,%r14312,%r14313,%r14314,%r14315,%r14316,%r14317,%r14318,%r14319,%r14320,%r14321,%r14322,%r14323,%r14324,%r14325,%r14326,%r14327,%r14328,%r14329}, {%r9635,%r9636,%r9637,%r9638}, %rd466, %p21, 1, 1; 2026-02-21T12:44:28.6210401Z // end inline asm 2026-02-21T12:44:28.6210483Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:28.6210545Z mov.b32 %r9767, %r11895; 2026-02-21T12:44:28.6210607Z mov.b32 %r9768, %r10903; 2026-02-21T12:44:28.6210664Z mov.b32 %r9769, %r10903; 2026-02-21T12:44:28.6210728Z // begin inline asm 2026-02-21T12:44:28.6213242Z // wait for regs: 
%r14202,%r14203,%r14204,%r14205,%r14206,%r14207,%r14208,%r14209,%r14210,%r14211,%r14212,%r14213,%r14214,%r14215,%r14216,%r14217,%r14218,%r14219,%r14220,%r14221,%r14222,%r14223,%r14224,%r14225,%r14226,%r14227,%r14228,%r14229,%r14230,%r14231,%r14232,%r14233,%r14234,%r14235,%r14236,%r14237,%r14238,%r14239,%r14240,%r14241,%r14242,%r14243,%r14244,%r14245,%r14246,%r14247,%r14248,%r14249,%r14250,%r14251,%r14252,%r14253,%r14254,%r14255,%r14256,%r14257,%r14258,%r14259,%r14260,%r14261,%r14262,%r14263,%r14264,%r14265,%r14266,%r14267,%r14268,%r14269,%r14270,%r14271,%r14272,%r14273,%r14274,%r14275,%r14276,%r14277,%r14278,%r14279,%r14280,%r14281,%r14282,%r14283,%r14284,%r14285,%r14286,%r14287,%r14288,%r14289,%r14290,%r14291,%r14292,%r14293,%r14294,%r14295,%r14296,%r14297,%r14298,%r14299,%r14300,%r14301,%r14302,%r14303,%r14304,%r14305,%r14306,%r14307,%r14308,%r14309,%r14310,%r14311,%r14312,%r14313,%r14314,%r14315,%r14316,%r14317,%r14318,%r14319,%r14320,%r14321,%r14322,%r14323,%r14324,%r14325,%r14326,%r14327,%r14328,%r14329,%r9767,%r9768,%r9769 2026-02-21T12:44:28.6213326Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:28.6213384Z // end inline asm 2026-02-21T12:44:28.6213439Z $L__tmp28: 2026-02-21T12:44:28.6213653Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6213714Z // begin inline asm 2026-02-21T12:44:28.6213773Z mov.u64 %rd457, 0x0; 2026-02-21T12:44:28.6213903Z createpolicy.fractional.L2::evict_last.b64 %rd457, 1.0; 2026-02-21T12:44:28.6213961Z // end inline asm 2026-02-21T12:44:28.6214021Z // begin inline asm 2026-02-21T12:44:28.6214094Z mov.u32 %r9901, 0x0; 2026-02-21T12:44:28.6214155Z mov.u32 %r9902, 0x0; 2026-02-21T12:44:28.6214347Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r9901, %r9902 }, [ %rd662 + 0 ], %rd457; 2026-02-21T12:44:28.6214472Z // end inline asm 2026-02-21T12:44:28.6214684Z .loc 1 55 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:55:32 2026-02-21T12:44:28.6214816Z bar.sync 0; 
2026-02-21T12:44:28.6214898Z st.shared.v2.b32 [%r10], {%r9901, %r9902}; 2026-02-21T12:44:28.6214969Z bar.sync 0; 2026-02-21T12:44:28.6215037Z ld.shared.b16 %rs393, [%r11]; 2026-02-21T12:44:28.6215106Z ld.shared.b16 %rs394, [%r11+128]; 2026-02-21T12:44:28.6215172Z ld.shared.b16 %rs395, [%r11+8]; 2026-02-21T12:44:28.6215241Z ld.shared.b16 %rs396, [%r11+136]; 2026-02-21T12:44:28.6215302Z cvt.f32.bf16 %r10160, %rs393; 2026-02-21T12:44:28.6215362Z cvt.f32.bf16 %r10161, %rs394; 2026-02-21T12:44:28.6215429Z cvt.f32.bf16 %r10162, %rs395; 2026-02-21T12:44:28.6215491Z cvt.f32.bf16 %r10163, %rs396; 2026-02-21T12:44:28.6215695Z .loc 1 57 87 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:87 2026-02-21T12:44:28.6215764Z add.s64 %rd460, %rd661, 10240; 2026-02-21T12:44:28.6215825Z // begin inline asm 2026-02-21T12:44:28.6215883Z mov.u32 %r9903, 0x0; 2026-02-21T12:44:28.6216008Z ld.global.b32 { %r9903 }, [ %rd460 + 0 ]; 2026-02-21T12:44:28.6216115Z // end inline asm 2026-02-21T12:44:28.6216315Z .loc 1 65 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:65:28 2026-02-21T12:44:28.6216371Z bar.sync 0; 2026-02-21T12:44:28.6216440Z st.shared.b8 [%r12], %r9903; 2026-02-21T12:44:28.6216644Z prmt.b32 %r10483, %r9903, 0, 0x7771U; 2026-02-21T12:44:28.6216712Z st.shared.b8 [%r13+256], %r10483; 2026-02-21T12:44:28.6216776Z prmt.b32 %r10484, %r9903, 0, 0x7772U; 2026-02-21T12:44:28.6216843Z st.shared.b8 [%r14+512], %r10484; 2026-02-21T12:44:28.6216907Z prmt.b32 %r10485, %r9903, 0, 0x7773U; 2026-02-21T12:44:28.6216971Z st.shared.b8 [%r15+768], %r10485; 2026-02-21T12:44:28.6217030Z bar.sync 0; 2026-02-21T12:44:28.6217096Z ld.shared.b32 %r10486, [%r16]; 2026-02-21T12:44:28.6217163Z prmt.b32 %r10487, %r10486, 0, 0x7771U; 2026-02-21T12:44:28.6217235Z cvt.u16.u32 %rs397, %r10487; 2026-02-21T12:44:28.6217300Z prmt.b32 %r10488, %r10486, 0, 0x7770U; 2026-02-21T12:44:28.6217366Z cvt.u16.u32 %rs398, %r10488; 2026-02-21T12:44:28.6217431Z prmt.b32 %r10489, %r10486, 0, 0x7773U; 
2026-02-21T12:44:28.6217496Z cvt.u16.u32 %rs399, %r10489; 2026-02-21T12:44:28.6217561Z prmt.b32 %r10490, %r10486, 0, 0x7772U; 2026-02-21T12:44:28.6217621Z cvt.u16.u32 %rs400, %r10490; 2026-02-21T12:44:28.6217823Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6217887Z shl.b16 %rs401, %rs398, 4; 2026-02-21T12:44:28.6217948Z shl.b16 %rs402, %rs397, 4; 2026-02-21T12:44:28.6218144Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6218209Z cvt.u32.u16 %r10491, %rs401; 2026-02-21T12:44:28.6218288Z prmt.b32 %r10492, %r10491, %r10493, 0x3340U; 2026-02-21T12:44:28.6218366Z prmt.b32 %r10494, %r10492, %r10437, 0x5410U; 2026-02-21T12:44:28.6218443Z prmt.b32 %r10495, %r10494, %r10486, 0x5040U; 2026-02-21T12:44:28.6218511Z prmt.b32 %r10496, %r10495, 0, 0x9991U; 2026-02-21T12:44:28.6218577Z cvt.u16.u32 %rs403, %r10496; 2026-02-21T12:44:28.6218641Z shr.s16 %rs404, %rs403, 4; 2026-02-21T12:44:28.6218706Z prmt.b32 %r10497, %r10495, 0, 0xbbb3U; 2026-02-21T12:44:28.6218766Z cvt.u16.u32 %rs405, %r10497; 2026-02-21T12:44:28.6218826Z shr.s16 %rs406, %rs405, 4; 2026-02-21T12:44:28.6218890Z cvt.s16.s8 %rs407, %rs401; 2026-02-21T12:44:28.6218963Z shr.s16 %rs408, %rs407, 4; 2026-02-21T12:44:28.6219026Z cvt.s16.s8 %rs409, %rs402; 2026-02-21T12:44:28.6219087Z shr.s16 %rs410, %rs409, 4; 2026-02-21T12:44:28.6219289Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6219355Z cvt.rn.f32.s16 %r10498, %rs406; 2026-02-21T12:44:28.6219423Z cvt.rn.f32.s16 %r10499, %rs404; 2026-02-21T12:44:28.6219567Z cvt.rn.f32.s16 %r10500, %rs410; 2026-02-21T12:44:28.6219630Z cvt.rn.f32.s16 %r10501, %rs408; 2026-02-21T12:44:28.6219830Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6219953Z shl.b16 %rs411, %rs400, 4; 2026-02-21T12:44:28.6220023Z shl.b16 %rs412, %rs399, 4; 2026-02-21T12:44:28.6220224Z .loc 1 62 25 // 
cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6220303Z prmt.b32 %r10502, %r10486, %r10503, 0x3020U; 2026-02-21T12:44:28.6220370Z prmt.b32 %r10504, %r10502, 0, 0x9991U; 2026-02-21T12:44:28.6220431Z cvt.u16.u32 %rs413, %r10504; 2026-02-21T12:44:28.6220494Z shr.s16 %rs414, %rs413, 4; 2026-02-21T12:44:28.6220555Z cvt.s16.s8 %rs415, %rs411; 2026-02-21T12:44:28.6220615Z shr.s16 %rs416, %rs415, 4; 2026-02-21T12:44:28.6220675Z cvt.s16.s8 %rs417, %rs412; 2026-02-21T12:44:28.6220739Z shr.s16 %rs418, %rs417, 4; 2026-02-21T12:44:28.6220804Z prmt.b32 %r10505, %r10486, 0, 0xbbb3U; 2026-02-21T12:44:28.6220867Z cvt.u16.u32 %rs419, %r10505; 2026-02-21T12:44:28.6220929Z shr.s16 %rs420, %rs419, 4; 2026-02-21T12:44:28.6221247Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6221313Z cvt.rn.f32.s16 %r10506, %rs414; 2026-02-21T12:44:28.6221378Z cvt.rn.f32.s16 %r10507, %rs420; 2026-02-21T12:44:28.6221443Z cvt.rn.f32.s16 %r10508, %rs418; 2026-02-21T12:44:28.6221505Z cvt.rn.f32.s16 %r10509, %rs416; 2026-02-21T12:44:28.6221561Z bar.sync 0; 2026-02-21T12:44:28.6221685Z st.shared.v4.b32 [%r17], {%r10501, %r10499, %r10500, %r10498}; 2026-02-21T12:44:28.6221798Z st.shared.v4.b32 [%r18], {%r10509, %r10506, %r10508, %r10507}; 2026-02-21T12:44:28.6221853Z $L__tmp29: 2026-02-21T12:44:28.6222132Z .loc 2 291 36 // standard.py:291:36 @[ cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:87:40 ] 2026-02-21T12:44:28.6222196Z // begin inline asm 2026-02-21T12:44:28.6222275Z fence.proxy.async.shared::cta; 2026-02-21T12:44:28.6222332Z // end inline asm 2026-02-21T12:44:28.6222391Z bar.sync 0; 2026-02-21T12:44:28.6222465Z wgmma.fence.sync.aligned; 2026-02-21T12:44:28.6222528Z // begin inline asm 2026-02-21T12:44:28.6225260Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14202,%r14203,%r14204,%r14205,%r14206,%r14207,%r14208,%r14209,%r14210,%r14211,%r14212,%r14213,%r14214,%r14215,%r14216,%r14217,%r14218,%r14219,%r14220,%r14221,%r14222,%r14223,%r14224,%r14225,%r14226,%r14227,%r14228,%r14229,%r14230,%r14231,%r14232,%r14233,%r14234,%r14235,%r14236,%r14237,%r14238,%r14239,%r14240,%r14241,%r14242,%r14243,%r14244,%r14245,%r14246,%r14247,%r14248,%r14249,%r14250,%r14251,%r14252,%r14253,%r14254,%r14255,%r14256,%r14257,%r14258,%r14259,%r14260,%r14261,%r14262,%r14263,%r14264,%r14265,%r14266,%r14267,%r14268,%r14269,%r14270,%r14271,%r14272,%r14273,%r14274,%r14275,%r14276,%r14277,%r14278,%r14279,%r14280,%r14281,%r14282,%r14283,%r14284,%r14285,%r14286,%r14287,%r14288,%r14289,%r14290,%r14291,%r14292,%r14293,%r14294,%r14295,%r14296,%r14297,%r14298,%r14299,%r14300,%r14301,%r14302,%r14303,%r14304,%r14305,%r14306,%r14307,%r14308,%r14309,%r14310,%r14311,%r14312,%r14313,%r14314,%r14315,%r14316,%r14317,%r14318,%r14319,%r14320,%r14321,%r14322,%r14323,%r14324,%r14325,%r14326,%r14327,%r14328,%r14329}, {%r10160,%r10161,%r10162,%r10163}, %rd466, %p21, 1, 1; 2026-02-21T12:44:28.6225322Z // end inline asm 2026-02-21T12:44:28.6225403Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:28.6225477Z mov.b32 %r10294, %r10903; 2026-02-21T12:44:28.6225540Z mov.b32 %r10292, %r11895; 2026-02-21T12:44:28.6225601Z mov.b32 %r10293, %r10903; 2026-02-21T12:44:28.6225659Z // begin inline asm 2026-02-21T12:44:28.6228302Z // wait for regs: 
%r14202,%r14203,%r14204,%r14205,%r14206,%r14207,%r14208,%r14209,%r14210,%r14211,%r14212,%r14213,%r14214,%r14215,%r14216,%r14217,%r14218,%r14219,%r14220,%r14221,%r14222,%r14223,%r14224,%r14225,%r14226,%r14227,%r14228,%r14229,%r14230,%r14231,%r14232,%r14233,%r14234,%r14235,%r14236,%r14237,%r14238,%r14239,%r14240,%r14241,%r14242,%r14243,%r14244,%r14245,%r14246,%r14247,%r14248,%r14249,%r14250,%r14251,%r14252,%r14253,%r14254,%r14255,%r14256,%r14257,%r14258,%r14259,%r14260,%r14261,%r14262,%r14263,%r14264,%r14265,%r14266,%r14267,%r14268,%r14269,%r14270,%r14271,%r14272,%r14273,%r14274,%r14275,%r14276,%r14277,%r14278,%r14279,%r14280,%r14281,%r14282,%r14283,%r14284,%r14285,%r14286,%r14287,%r14288,%r14289,%r14290,%r14291,%r14292,%r14293,%r14294,%r14295,%r14296,%r14297,%r14298,%r14299,%r14300,%r14301,%r14302,%r14303,%r14304,%r14305,%r14306,%r14307,%r14308,%r14309,%r14310,%r14311,%r14312,%r14313,%r14314,%r14315,%r14316,%r14317,%r14318,%r14319,%r14320,%r14321,%r14322,%r14323,%r14324,%r14325,%r14326,%r14327,%r14328,%r14329,%r10292,%r10293,%r10294 2026-02-21T12:44:28.6228586Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:28.6228647Z // end inline asm 2026-02-21T12:44:28.6228702Z $L__tmp30: 2026-02-21T12:44:28.6228928Z .loc 1 43 126 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:43:126 2026-02-21T12:44:28.6228994Z add.s64 %rd663, %rd663, 12; 2026-02-21T12:44:28.6229055Z add.s64 %rd662, %rd662, 48; 2026-02-21T12:44:28.6229239Z add.s64 %rd661, %rd661, 15360; 2026-02-21T12:44:28.6229308Z setp.lt.u64 %p24, %rd663, 4080; 2026-02-21T12:44:28.6229369Z @%p24 bra $L__BB0_21; 2026-02-21T12:44:28.6229484Z // %bb.22: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:28.6229693Z .loc 1 34 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:34:32 2026-02-21T12:44:28.6229757Z or.b64 %rd483, %rd91, %rd5; 2026-02-21T12:44:28.6229818Z or.b64 %rd484, %rd91, %rd6; 2026-02-21T12:44:28.6229883Z or.b64 %rd485, %rd91, %rd7; 2026-02-21T12:44:28.6229944Z or.b64 %rd486, %rd91, %rd8; 
2026-02-21T12:44:28.6230006Z or.b64 %rd487, %rd91, %rd9; 2026-02-21T12:44:28.6230074Z or.b64 %rd488, %rd91, %rd10; 2026-02-21T12:44:28.6230134Z or.b64 %rd489, %rd91, %rd11; 2026-02-21T12:44:28.6230193Z or.b64 %rd490, %rd91, %rd12; 2026-02-21T12:44:28.6230260Z or.b64 %rd491, %rd91, %rd13; 2026-02-21T12:44:28.6230320Z or.b64 %rd492, %rd91, %rd14; 2026-02-21T12:44:28.6230384Z or.b64 %rd493, %rd91, %rd15; 2026-02-21T12:44:28.6230445Z or.b64 %rd494, %rd91, %rd16; 2026-02-21T12:44:28.6230510Z or.b64 %rd495, %rd91, %rd17; 2026-02-21T12:44:28.6230571Z or.b64 %rd496, %rd91, %rd18; 2026-02-21T12:44:28.6230631Z or.b64 %rd497, %rd91, %rd19; 2026-02-21T12:44:28.6230695Z or.b64 %rd498, %rd91, %rd20; 2026-02-21T12:44:28.6230895Z .loc 1 36 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:36:32 2026-02-21T12:44:28.6230956Z or.b64 %rd499, %rd92, %rd22; 2026-02-21T12:44:28.6231159Z .loc 1 51 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:32 2026-02-21T12:44:28.6231227Z shl.b64 %rd500, %rd94, 1; 2026-02-21T12:44:28.6231292Z add.s64 %rd463, %rd25, %rd500; 2026-02-21T12:44:28.6231492Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6231558Z // begin inline asm 2026-02-21T12:44:28.6231617Z mov.u64 %rd462, 0x0; 2026-02-21T12:44:28.6231750Z createpolicy.fractional.L2::evict_last.b64 %rd462, 1.0; 2026-02-21T12:44:28.6231810Z // end inline asm 2026-02-21T12:44:28.6231868Z // begin inline asm 2026-02-21T12:44:28.6231927Z mov.u32 %r10510, 0x0; 2026-02-21T12:44:28.6231985Z mov.u32 %r10511, 0x0; 2026-02-21T12:44:28.6232185Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r10510, %r10511 }, [ %rd463 + 0 ], %rd462; 2026-02-21T12:44:28.6232243Z // end inline asm 2026-02-21T12:44:28.6232443Z .loc 1 55 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:55:32 2026-02-21T12:44:28.6232513Z bar.sync 0; 2026-02-21T12:44:28.6232600Z st.shared.v2.b32 [%r10], {%r10510, %r10511}; 2026-02-21T12:44:28.6232657Z bar.sync 
0; 2026-02-21T12:44:28.6232727Z ld.shared.b16 %rs421, [%r11]; 2026-02-21T12:44:28.6232795Z ld.shared.b16 %rs422, [%r11+128]; 2026-02-21T12:44:28.6232921Z ld.shared.b16 %rs423, [%r11+8]; 2026-02-21T12:44:28.6232987Z ld.shared.b16 %rs424, [%r11+136]; 2026-02-21T12:44:28.6233101Z cvt.f32.bf16 %r10769, %rs421; 2026-02-21T12:44:28.6233162Z cvt.f32.bf16 %r10770, %rs422; 2026-02-21T12:44:28.6233222Z cvt.f32.bf16 %r10771, %rs423; 2026-02-21T12:44:28.6233285Z cvt.f32.bf16 %r10772, %rs424; 2026-02-21T12:44:28.6233503Z .loc 1 57 34 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:34 2026-02-21T12:44:28.6233571Z add.s64 %rd501, %rd645, %rd93; 2026-02-21T12:44:28.6233636Z add.s64 %rd465, %rd501, 5237760; 2026-02-21T12:44:28.6233842Z .loc 1 57 87 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:87 2026-02-21T12:44:28.6233903Z // begin inline asm 2026-02-21T12:44:28.6233961Z mov.u32 %r10512, 0x0; 2026-02-21T12:44:28.6234039Z ld.global.b32 { %r10512 }, [ %rd465 + 0 ]; 2026-02-21T12:44:28.6234096Z // end inline asm 2026-02-21T12:44:28.6234297Z .loc 1 65 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:65:28 2026-02-21T12:44:28.6234354Z bar.sync 0; 2026-02-21T12:44:28.6234515Z st.shared.b8 [%r12], %r10512; 2026-02-21T12:44:28.6234589Z prmt.b32 %r11179, %r10512, 0, 0x7771U; 2026-02-21T12:44:28.6234655Z st.shared.b8 [%r13+256], %r11179; 2026-02-21T12:44:28.6234726Z prmt.b32 %r11180, %r10512, 0, 0x7772U; 2026-02-21T12:44:28.6234790Z st.shared.b8 [%r14+512], %r11180; 2026-02-21T12:44:28.6234855Z prmt.b32 %r11181, %r10512, 0, 0x7773U; 2026-02-21T12:44:28.6234921Z st.shared.b8 [%r15+768], %r11181; 2026-02-21T12:44:28.6234977Z bar.sync 0; 2026-02-21T12:44:28.6235043Z ld.shared.b32 %r11182, [%r16]; 2026-02-21T12:44:28.6235113Z prmt.b32 %r11183, %r11182, 0, 0x7771U; 2026-02-21T12:44:28.6235177Z cvt.u16.u32 %rs425, %r11183; 2026-02-21T12:44:28.6235241Z prmt.b32 %r11184, %r11182, 0, 0x7770U; 2026-02-21T12:44:28.6235303Z cvt.u16.u32 %rs426, %r11184; 
2026-02-21T12:44:28.6235368Z prmt.b32 %r11185, %r11182, 0, 0x7773U; 2026-02-21T12:44:28.6235431Z cvt.u16.u32 %rs427, %r11185; 2026-02-21T12:44:28.6235494Z prmt.b32 %r11186, %r11182, 0, 0x7772U; 2026-02-21T12:44:28.6235563Z cvt.u16.u32 %rs428, %r11186; 2026-02-21T12:44:28.6235763Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6235826Z shl.b16 %rs429, %rs426, 4; 2026-02-21T12:44:28.6235889Z shl.b16 %rs430, %rs425, 4; 2026-02-21T12:44:28.6236085Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6236147Z cvt.u32.u16 %r11187, %rs429; 2026-02-21T12:44:28.6236226Z prmt.b32 %r11188, %r11187, %r11189, 0x3340U; 2026-02-21T12:44:28.6236305Z prmt.b32 %r11193, %r11188, %r11190, 0x5410U; 2026-02-21T12:44:28.6236379Z prmt.b32 %r11194, %r11193, %r11182, 0x5040U; 2026-02-21T12:44:28.6236586Z prmt.b32 %r11195, %r11194, 0, 0x9991U; 2026-02-21T12:44:28.6236657Z cvt.u16.u32 %rs431, %r11195; 2026-02-21T12:44:28.6236721Z shr.s16 %rs432, %rs431, 4; 2026-02-21T12:44:28.6236787Z prmt.b32 %r11196, %r11194, 0, 0xbbb3U; 2026-02-21T12:44:28.6236850Z cvt.u16.u32 %rs433, %r11196; 2026-02-21T12:44:28.6236916Z shr.s16 %rs434, %rs433, 4; 2026-02-21T12:44:28.6236978Z cvt.s16.s8 %rs435, %rs429; 2026-02-21T12:44:28.6237039Z shr.s16 %rs436, %rs435, 4; 2026-02-21T12:44:28.6237103Z cvt.s16.s8 %rs437, %rs430; 2026-02-21T12:44:28.6237163Z shr.s16 %rs438, %rs437, 4; 2026-02-21T12:44:28.6237373Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6237444Z cvt.rn.f32.s16 %r11197, %rs434; 2026-02-21T12:44:28.6237509Z cvt.rn.f32.s16 %r11198, %rs432; 2026-02-21T12:44:28.6237571Z cvt.rn.f32.s16 %r11199, %rs438; 2026-02-21T12:44:28.6237634Z cvt.rn.f32.s16 %r11200, %rs436; 2026-02-21T12:44:28.6237837Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6237897Z shl.b16 %rs439, %rs428, 4; 2026-02-21T12:44:28.6238057Z shl.b16 
%rs440, %rs427, 4; 2026-02-21T12:44:28.6238275Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6238416Z prmt.b32 %r11201, %r11182, %r11202, 0x3020U; 2026-02-21T12:44:28.6238482Z prmt.b32 %r11203, %r11201, 0, 0x9991U; 2026-02-21T12:44:28.6238551Z cvt.u16.u32 %rs441, %r11203; 2026-02-21T12:44:28.6238612Z shr.s16 %rs442, %rs441, 4; 2026-02-21T12:44:28.6238672Z cvt.s16.s8 %rs443, %rs439; 2026-02-21T12:44:28.6238733Z shr.s16 %rs444, %rs443, 4; 2026-02-21T12:44:28.6238796Z cvt.s16.s8 %rs445, %rs440; 2026-02-21T12:44:28.6238856Z shr.s16 %rs446, %rs445, 4; 2026-02-21T12:44:28.6238920Z prmt.b32 %r11204, %r11182, 0, 0xbbb3U; 2026-02-21T12:44:28.6238985Z cvt.u16.u32 %rs447, %r11204; 2026-02-21T12:44:28.6239058Z shr.s16 %rs448, %rs447, 4; 2026-02-21T12:44:28.6239257Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6239324Z cvt.rn.f32.s16 %r11205, %rs442; 2026-02-21T12:44:28.6239386Z cvt.rn.f32.s16 %r11206, %rs448; 2026-02-21T12:44:28.6239512Z cvt.rn.f32.s16 %r11207, %rs446; 2026-02-21T12:44:28.6239632Z cvt.rn.f32.s16 %r11208, %rs444; 2026-02-21T12:44:28.6239691Z bar.sync 0; 2026-02-21T12:44:28.6239811Z st.shared.v4.b32 [%r17], {%r11200, %r11198, %r11199, %r11197}; 2026-02-21T12:44:28.6239921Z st.shared.v4.b32 [%r18], {%r11208, %r11205, %r11207, %r11206}; 2026-02-21T12:44:28.6239977Z $L__tmp31: 2026-02-21T12:44:28.6240251Z .loc 2 291 36 // standard.py:291:36 @[ cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:87:40 ] 2026-02-21T12:44:28.6240310Z // begin inline asm 2026-02-21T12:44:28.6240390Z fence.proxy.async.shared::cta; 2026-02-21T12:44:28.6240458Z // end inline asm 2026-02-21T12:44:28.6240515Z bar.sync 0; 2026-02-21T12:44:28.6240588Z wgmma.fence.sync.aligned; 2026-02-21T12:44:28.6240649Z // begin inline asm 2026-02-21T12:44:28.6243371Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14202,%r14203,%r14204,%r14205,%r14206,%r14207,%r14208,%r14209,%r14210,%r14211,%r14212,%r14213,%r14214,%r14215,%r14216,%r14217,%r14218,%r14219,%r14220,%r14221,%r14222,%r14223,%r14224,%r14225,%r14226,%r14227,%r14228,%r14229,%r14230,%r14231,%r14232,%r14233,%r14234,%r14235,%r14236,%r14237,%r14238,%r14239,%r14240,%r14241,%r14242,%r14243,%r14244,%r14245,%r14246,%r14247,%r14248,%r14249,%r14250,%r14251,%r14252,%r14253,%r14254,%r14255,%r14256,%r14257,%r14258,%r14259,%r14260,%r14261,%r14262,%r14263,%r14264,%r14265,%r14266,%r14267,%r14268,%r14269,%r14270,%r14271,%r14272,%r14273,%r14274,%r14275,%r14276,%r14277,%r14278,%r14279,%r14280,%r14281,%r14282,%r14283,%r14284,%r14285,%r14286,%r14287,%r14288,%r14289,%r14290,%r14291,%r14292,%r14293,%r14294,%r14295,%r14296,%r14297,%r14298,%r14299,%r14300,%r14301,%r14302,%r14303,%r14304,%r14305,%r14306,%r14307,%r14308,%r14309,%r14310,%r14311,%r14312,%r14313,%r14314,%r14315,%r14316,%r14317,%r14318,%r14319,%r14320,%r14321,%r14322,%r14323,%r14324,%r14325,%r14326,%r14327,%r14328,%r14329}, {%r10769,%r10770,%r10771,%r10772}, %rd466, %p21, 1, 1; 2026-02-21T12:44:28.6243438Z // end inline asm 2026-02-21T12:44:28.6243524Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:28.6243587Z mov.b32 %r10902, %r10903; 2026-02-21T12:44:28.6243650Z mov.b32 %r10901, %r11895; 2026-02-21T12:44:28.6243709Z // begin inline asm 2026-02-21T12:44:28.6246224Z // wait for regs: 
%r14202,%r14203,%r14204,%r14205,%r14206,%r14207,%r14208,%r14209,%r14210,%r14211,%r14212,%r14213,%r14214,%r14215,%r14216,%r14217,%r14218,%r14219,%r14220,%r14221,%r14222,%r14223,%r14224,%r14225,%r14226,%r14227,%r14228,%r14229,%r14230,%r14231,%r14232,%r14233,%r14234,%r14235,%r14236,%r14237,%r14238,%r14239,%r14240,%r14241,%r14242,%r14243,%r14244,%r14245,%r14246,%r14247,%r14248,%r14249,%r14250,%r14251,%r14252,%r14253,%r14254,%r14255,%r14256,%r14257,%r14258,%r14259,%r14260,%r14261,%r14262,%r14263,%r14264,%r14265,%r14266,%r14267,%r14268,%r14269,%r14270,%r14271,%r14272,%r14273,%r14274,%r14275,%r14276,%r14277,%r14278,%r14279,%r14280,%r14281,%r14282,%r14283,%r14284,%r14285,%r14286,%r14287,%r14288,%r14289,%r14290,%r14291,%r14292,%r14293,%r14294,%r14295,%r14296,%r14297,%r14298,%r14299,%r14300,%r14301,%r14302,%r14303,%r14304,%r14305,%r14306,%r14307,%r14308,%r14309,%r14310,%r14311,%r14312,%r14313,%r14314,%r14315,%r14316,%r14317,%r14318,%r14319,%r14320,%r14321,%r14322,%r14323,%r14324,%r14325,%r14326,%r14327,%r14328,%r14329,%r10901,%r10902,%r10903 2026-02-21T12:44:28.6246396Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:28.6246577Z // end inline asm 2026-02-21T12:44:28.6246639Z $L__tmp32: 2026-02-21T12:44:28.6246844Z .loc 1 90 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:90:28 2026-02-21T12:44:28.6246926Z cvt.rn.bf16x2.f32 %r11209, %r14203, %r14202; 2026-02-21T12:44:28.6247009Z cvt.rn.bf16x2.f32 %r11210, %r14205, %r14204; 2026-02-21T12:44:28.6247085Z cvt.rn.bf16x2.f32 %r11211, %r14207, %r14206; 2026-02-21T12:44:28.6247160Z cvt.rn.bf16x2.f32 %r11212, %r14209, %r14208; 2026-02-21T12:44:28.6247239Z cvt.rn.bf16x2.f32 %r11213, %r14211, %r14210; 2026-02-21T12:44:28.6247318Z cvt.rn.bf16x2.f32 %r11214, %r14213, %r14212; 2026-02-21T12:44:28.6247522Z cvt.rn.bf16x2.f32 %r11215, %r14215, %r14214; 2026-02-21T12:44:28.6247601Z cvt.rn.bf16x2.f32 %r11216, %r14217, %r14216; 2026-02-21T12:44:28.6247679Z cvt.rn.bf16x2.f32 %r11217, %r14219, %r14218; 2026-02-21T12:44:28.6247754Z 
cvt.rn.bf16x2.f32 %r11218, %r14221, %r14220; 2026-02-21T12:44:28.6247830Z cvt.rn.bf16x2.f32 %r11219, %r14223, %r14222; 2026-02-21T12:44:28.6247905Z cvt.rn.bf16x2.f32 %r11220, %r14225, %r14224; 2026-02-21T12:44:28.6247995Z cvt.rn.bf16x2.f32 %r11221, %r14227, %r14226; 2026-02-21T12:44:28.6248075Z cvt.rn.bf16x2.f32 %r11222, %r14229, %r14228; 2026-02-21T12:44:28.6248152Z cvt.rn.bf16x2.f32 %r11223, %r14231, %r14230; 2026-02-21T12:44:28.6248230Z cvt.rn.bf16x2.f32 %r11224, %r14233, %r14232; 2026-02-21T12:44:28.6248306Z cvt.rn.bf16x2.f32 %r11225, %r14235, %r14234; 2026-02-21T12:44:28.6248383Z cvt.rn.bf16x2.f32 %r11226, %r14237, %r14236; 2026-02-21T12:44:28.6248459Z cvt.rn.bf16x2.f32 %r11227, %r14239, %r14238; 2026-02-21T12:44:28.6248540Z cvt.rn.bf16x2.f32 %r11228, %r14241, %r14240; 2026-02-21T12:44:28.6248620Z cvt.rn.bf16x2.f32 %r11229, %r14243, %r14242; 2026-02-21T12:44:28.6248696Z cvt.rn.bf16x2.f32 %r11230, %r14245, %r14244; 2026-02-21T12:44:28.6248771Z cvt.rn.bf16x2.f32 %r11231, %r14247, %r14246; 2026-02-21T12:44:28.6248846Z cvt.rn.bf16x2.f32 %r11232, %r14249, %r14248; 2026-02-21T12:44:28.6248929Z cvt.rn.bf16x2.f32 %r11233, %r14251, %r14250; 2026-02-21T12:44:28.6249006Z cvt.rn.bf16x2.f32 %r11234, %r14253, %r14252; 2026-02-21T12:44:28.6249080Z cvt.rn.bf16x2.f32 %r11235, %r14255, %r14254; 2026-02-21T12:44:28.6249159Z cvt.rn.bf16x2.f32 %r11236, %r14257, %r14256; 2026-02-21T12:44:28.6249233Z cvt.rn.bf16x2.f32 %r11237, %r14259, %r14258; 2026-02-21T12:44:28.6249307Z cvt.rn.bf16x2.f32 %r11238, %r14261, %r14260; 2026-02-21T12:44:28.6249384Z cvt.rn.bf16x2.f32 %r11239, %r14263, %r14262; 2026-02-21T12:44:28.6249462Z cvt.rn.bf16x2.f32 %r11240, %r14265, %r14264; 2026-02-21T12:44:28.6249538Z cvt.rn.bf16x2.f32 %r11241, %r14267, %r14266; 2026-02-21T12:44:28.6249617Z cvt.rn.bf16x2.f32 %r11242, %r14269, %r14268; 2026-02-21T12:44:28.6249710Z cvt.rn.bf16x2.f32 %r11243, %r14271, %r14270; 2026-02-21T12:44:28.6249788Z cvt.rn.bf16x2.f32 %r11244, %r14273, %r14272; 2026-02-21T12:44:28.6249862Z 
cvt.rn.bf16x2.f32 %r11245, %r14275, %r14274; 2026-02-21T12:44:28.6249939Z cvt.rn.bf16x2.f32 %r11246, %r14277, %r14276; 2026-02-21T12:44:28.6250014Z cvt.rn.bf16x2.f32 %r11247, %r14279, %r14278; 2026-02-21T12:44:28.6250089Z cvt.rn.bf16x2.f32 %r11248, %r14281, %r14280; 2026-02-21T12:44:28.6250168Z cvt.rn.bf16x2.f32 %r11249, %r14283, %r14282; 2026-02-21T12:44:28.6250243Z cvt.rn.bf16x2.f32 %r11250, %r14285, %r14284; 2026-02-21T12:44:28.6250318Z cvt.rn.bf16x2.f32 %r11251, %r14287, %r14286; 2026-02-21T12:44:28.6250391Z cvt.rn.bf16x2.f32 %r11252, %r14289, %r14288; 2026-02-21T12:44:28.6250546Z cvt.rn.bf16x2.f32 %r11253, %r14291, %r14290; 2026-02-21T12:44:28.6250620Z cvt.rn.bf16x2.f32 %r11254, %r14293, %r14292; 2026-02-21T12:44:28.6250697Z cvt.rn.bf16x2.f32 %r11255, %r14295, %r14294; 2026-02-21T12:44:28.6250836Z cvt.rn.bf16x2.f32 %r11256, %r14297, %r14296; 2026-02-21T12:44:28.6250911Z cvt.rn.bf16x2.f32 %r11257, %r14299, %r14298; 2026-02-21T12:44:28.6250985Z cvt.rn.bf16x2.f32 %r11258, %r14301, %r14300; 2026-02-21T12:44:28.6251061Z cvt.rn.bf16x2.f32 %r11259, %r14303, %r14302; 2026-02-21T12:44:28.6251135Z cvt.rn.bf16x2.f32 %r11260, %r14305, %r14304; 2026-02-21T12:44:28.6251210Z cvt.rn.bf16x2.f32 %r11261, %r14307, %r14306; 2026-02-21T12:44:28.6251285Z cvt.rn.bf16x2.f32 %r11262, %r14309, %r14308; 2026-02-21T12:44:28.6251362Z cvt.rn.bf16x2.f32 %r11263, %r14311, %r14310; 2026-02-21T12:44:28.6251437Z cvt.rn.bf16x2.f32 %r11264, %r14313, %r14312; 2026-02-21T12:44:28.6251511Z cvt.rn.bf16x2.f32 %r11265, %r14315, %r14314; 2026-02-21T12:44:28.6251589Z cvt.rn.bf16x2.f32 %r11266, %r14317, %r14316; 2026-02-21T12:44:28.6251665Z cvt.rn.bf16x2.f32 %r11267, %r14319, %r14318; 2026-02-21T12:44:28.6251738Z cvt.rn.bf16x2.f32 %r11268, %r14321, %r14320; 2026-02-21T12:44:28.6251908Z cvt.rn.bf16x2.f32 %r11269, %r14323, %r14322; 2026-02-21T12:44:28.6251987Z cvt.rn.bf16x2.f32 %r11270, %r14325, %r14324; 2026-02-21T12:44:28.6252062Z cvt.rn.bf16x2.f32 %r11271, %r14327, %r14326; 2026-02-21T12:44:28.6252137Z 
cvt.rn.bf16x2.f32 %r11272, %r14329, %r14328; 2026-02-21T12:44:28.6252360Z .loc 1 91 22 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:91:22 2026-02-21T12:44:28.6252436Z mad.lo.s64 %rd502, %rd483, 2560, %rd167; 2026-02-21T12:44:28.6252500Z shl.b64 %rd503, %rd499, 1; 2026-02-21T12:44:28.6252569Z add.s64 %rd467, %rd502, %rd503; 2026-02-21T12:44:28.6252641Z mad.lo.s64 %rd504, %rd484, 2560, %rd167; 2026-02-21T12:44:28.6252704Z add.s64 %rd468, %rd504, %rd503; 2026-02-21T12:44:28.6252774Z mad.lo.s64 %rd505, %rd485, 2560, %rd167; 2026-02-21T12:44:28.6252840Z add.s64 %rd469, %rd505, %rd503; 2026-02-21T12:44:28.6252908Z mad.lo.s64 %rd506, %rd486, 2560, %rd167; 2026-02-21T12:44:28.6252970Z add.s64 %rd470, %rd506, %rd503; 2026-02-21T12:44:28.6253044Z mad.lo.s64 %rd507, %rd487, 2560, %rd167; 2026-02-21T12:44:28.6253107Z add.s64 %rd471, %rd507, %rd503; 2026-02-21T12:44:28.6253174Z mad.lo.s64 %rd508, %rd488, 2560, %rd167; 2026-02-21T12:44:28.6253238Z add.s64 %rd472, %rd508, %rd503; 2026-02-21T12:44:28.6253306Z mad.lo.s64 %rd509, %rd489, 2560, %rd167; 2026-02-21T12:44:28.6253368Z add.s64 %rd473, %rd509, %rd503; 2026-02-21T12:44:28.6253436Z mad.lo.s64 %rd510, %rd490, 2560, %rd167; 2026-02-21T12:44:28.6253499Z add.s64 %rd474, %rd510, %rd503; 2026-02-21T12:44:28.6253567Z mad.lo.s64 %rd511, %rd491, 2560, %rd167; 2026-02-21T12:44:28.6253628Z add.s64 %rd475, %rd511, %rd503; 2026-02-21T12:44:28.6253700Z mad.lo.s64 %rd512, %rd492, 2560, %rd167; 2026-02-21T12:44:28.6253762Z add.s64 %rd476, %rd512, %rd503; 2026-02-21T12:44:28.6253828Z mad.lo.s64 %rd513, %rd493, 2560, %rd167; 2026-02-21T12:44:28.6253891Z add.s64 %rd477, %rd513, %rd503; 2026-02-21T12:44:28.6253962Z mad.lo.s64 %rd514, %rd494, 2560, %rd167; 2026-02-21T12:44:28.6254028Z add.s64 %rd478, %rd514, %rd503; 2026-02-21T12:44:28.6254098Z mad.lo.s64 %rd515, %rd495, 2560, %rd167; 2026-02-21T12:44:28.6254163Z add.s64 %rd479, %rd515, %rd503; 2026-02-21T12:44:28.6254232Z mad.lo.s64 %rd516, %rd496, 2560, %rd167; 
2026-02-21T12:44:28.6254293Z add.s64 %rd480, %rd516, %rd503; 2026-02-21T12:44:28.6254364Z mad.lo.s64 %rd517, %rd497, 2560, %rd167; 2026-02-21T12:44:28.6254425Z add.s64 %rd481, %rd517, %rd503; 2026-02-21T12:44:28.6254492Z mad.lo.s64 %rd518, %rd498, 2560, %rd167; 2026-02-21T12:44:28.6254553Z add.s64 %rd482, %rd518, %rd503; 2026-02-21T12:44:28.6254771Z .loc 1 91 81 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:91:81 2026-02-21T12:44:28.6254829Z bar.sync 0; 2026-02-21T12:44:28.6254950Z st.shared.v4.b32 [%r19], {%r11209, %r11211, %r11213, %r11215}; 2026-02-21T12:44:28.6255125Z st.shared.v4.b32 [%r20], {%r11217, %r11219, %r11221, %r11223}; 2026-02-21T12:44:28.6255236Z st.shared.v4.b32 [%r21], {%r11225, %r11227, %r11229, %r11231}; 2026-02-21T12:44:28.6255388Z st.shared.v4.b32 [%r22], {%r11233, %r11235, %r11237, %r11239}; 2026-02-21T12:44:28.6255497Z st.shared.v4.b32 [%r23], {%r11241, %r11243, %r11245, %r11247}; 2026-02-21T12:44:28.6255604Z st.shared.v4.b32 [%r24], {%r11249, %r11251, %r11253, %r11255}; 2026-02-21T12:44:28.6255711Z st.shared.v4.b32 [%r25], {%r11257, %r11259, %r11261, %r11263}; 2026-02-21T12:44:28.6255817Z st.shared.v4.b32 [%r26], {%r11265, %r11267, %r11269, %r11271}; 2026-02-21T12:44:28.6255876Z bar.sync 0; 2026-02-21T12:44:28.6255935Z // begin inline asm 2026-02-21T12:44:28.6256139Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11035, %r11036, %r11037, %r11038}, [%r3760]; 2026-02-21T12:44:28.6256201Z // end inline asm 2026-02-21T12:44:28.6256260Z // begin inline asm 2026-02-21T12:44:28.6256575Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11040, %r11041, %r11042, %r11043}, [%r3765]; 2026-02-21T12:44:28.6256643Z // end inline asm 2026-02-21T12:44:28.6256775Z // begin inline asm 2026-02-21T12:44:28.6257047Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11045, %r11046, %r11047, %r11048}, [%r3770]; 2026-02-21T12:44:28.6257106Z // end inline asm 2026-02-21T12:44:28.6257167Z // begin inline asm 2026-02-21T12:44:28.6257354Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11050, %r11051, %r11052, %r11053}, [%r3775]; 2026-02-21T12:44:28.6257410Z // end inline asm 2026-02-21T12:44:28.6257482Z // begin inline asm 2026-02-21T12:44:28.6257672Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11055, %r11056, %r11057, %r11058}, [%r3780]; 2026-02-21T12:44:28.6257727Z // end inline asm 2026-02-21T12:44:28.6257788Z // begin inline asm 2026-02-21T12:44:28.6257975Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11060, %r11061, %r11062, %r11063}, [%r3785]; 2026-02-21T12:44:28.6258031Z // end inline asm 2026-02-21T12:44:28.6258092Z // begin inline asm 2026-02-21T12:44:28.6258282Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11065, %r11066, %r11067, %r11068}, [%r3790]; 2026-02-21T12:44:28.6258342Z // end inline asm 2026-02-21T12:44:28.6258403Z // begin inline asm 2026-02-21T12:44:28.6258593Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11070, %r11071, %r11072, %r11073}, [%r3795]; 2026-02-21T12:44:28.6258650Z // end inline asm 2026-02-21T12:44:28.6258706Z bar.sync 0; 2026-02-21T12:44:28.6258816Z st.shared.v4.b32 [%r19], {%r11210, %r11212, %r11214, %r11216}; 2026-02-21T12:44:28.6258928Z st.shared.v4.b32 [%r20], {%r11218, %r11220, %r11222, %r11224}; 2026-02-21T12:44:28.6259034Z st.shared.v4.b32 [%r21], {%r11226, %r11228, %r11230, %r11232}; 2026-02-21T12:44:28.6259140Z st.shared.v4.b32 [%r22], {%r11234, %r11236, %r11238, %r11240}; 2026-02-21T12:44:28.6259248Z st.shared.v4.b32 [%r23], {%r11242, %r11244, %r11246, %r11248}; 2026-02-21T12:44:28.6259353Z st.shared.v4.b32 [%r24], {%r11250, %r11252, %r11254, %r11256}; 2026-02-21T12:44:28.6259459Z st.shared.v4.b32 [%r25], {%r11258, %r11260, %r11262, %r11264}; 2026-02-21T12:44:28.6259571Z st.shared.v4.b32 [%r26], {%r11266, %r11268, %r11270, %r11272}; 2026-02-21T12:44:28.6259631Z bar.sync 0; 2026-02-21T12:44:28.6259689Z // begin inline asm 2026-02-21T12:44:28.6259883Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11075, %r11076, %r11077, %r11078}, [%r3760]; 
2026-02-21T12:44:28.6259939Z // end inline asm 2026-02-21T12:44:28.6259997Z // begin inline asm 2026-02-21T12:44:28.6260184Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11080, %r11081, %r11082, %r11083}, [%r3765]; 2026-02-21T12:44:28.6260243Z // end inline asm 2026-02-21T12:44:28.6260301Z // begin inline asm 2026-02-21T12:44:28.6260488Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11085, %r11086, %r11087, %r11088}, [%r3770]; 2026-02-21T12:44:28.6260546Z // end inline asm 2026-02-21T12:44:28.6260603Z // begin inline asm 2026-02-21T12:44:28.6260790Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11090, %r11091, %r11092, %r11093}, [%r3775]; 2026-02-21T12:44:28.6260923Z // end inline asm 2026-02-21T12:44:28.6260984Z // begin inline asm 2026-02-21T12:44:28.6261174Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11095, %r11096, %r11097, %r11098}, [%r3780]; 2026-02-21T12:44:28.6261289Z // end inline asm 2026-02-21T12:44:28.6261349Z // begin inline asm 2026-02-21T12:44:28.6261536Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11100, %r11101, %r11102, %r11103}, [%r3785]; 2026-02-21T12:44:28.6261604Z // end inline asm 2026-02-21T12:44:28.6261665Z // begin inline asm 2026-02-21T12:44:28.6261862Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11105, %r11106, %r11107, %r11108}, [%r3790]; 2026-02-21T12:44:28.6261918Z // end inline asm 2026-02-21T12:44:28.6261975Z // begin inline asm 2026-02-21T12:44:28.6262164Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r11110, %r11111, %r11112, %r11113}, [%r3795]; 2026-02-21T12:44:28.6262221Z // end inline asm 2026-02-21T12:44:28.6262279Z // begin inline asm 2026-02-21T12:44:28.6262411Z st.global.v4.b32 [ %rd467 + 0 ], { %r11035, %r11036, %r11037, %r11038 }; 2026-02-21T12:44:28.6262470Z // end inline asm 2026-02-21T12:44:28.6262528Z // begin inline asm 2026-02-21T12:44:28.6262743Z st.global.v4.b32 [ %rd468 + 0 ], { %r11075, %r11076, %r11077, %r11078 }; 2026-02-21T12:44:28.6262802Z // end inline asm 2026-02-21T12:44:28.6262859Z // begin inline asm 
2026-02-21T12:44:28.6262978Z st.global.v4.b32 [ %rd469 + 0 ], { %r11040, %r11041, %r11042, %r11043 }; 2026-02-21T12:44:28.6263036Z // end inline asm 2026-02-21T12:44:28.6263094Z // begin inline asm 2026-02-21T12:44:28.6263212Z st.global.v4.b32 [ %rd470 + 0 ], { %r11080, %r11081, %r11082, %r11083 }; 2026-02-21T12:44:28.6263270Z // end inline asm 2026-02-21T12:44:28.6263327Z // begin inline asm 2026-02-21T12:44:28.6263443Z st.global.v4.b32 [ %rd471 + 0 ], { %r11045, %r11046, %r11047, %r11048 }; 2026-02-21T12:44:28.6263500Z // end inline asm 2026-02-21T12:44:28.6263561Z // begin inline asm 2026-02-21T12:44:28.6263680Z st.global.v4.b32 [ %rd472 + 0 ], { %r11085, %r11086, %r11087, %r11088 }; 2026-02-21T12:44:28.6263738Z // end inline asm 2026-02-21T12:44:28.6263801Z // begin inline asm 2026-02-21T12:44:28.6263925Z st.global.v4.b32 [ %rd473 + 0 ], { %r11050, %r11051, %r11052, %r11053 }; 2026-02-21T12:44:28.6263982Z // end inline asm 2026-02-21T12:44:28.6264045Z // begin inline asm 2026-02-21T12:44:28.6264164Z st.global.v4.b32 [ %rd474 + 0 ], { %r11090, %r11091, %r11092, %r11093 }; 2026-02-21T12:44:28.6264220Z // end inline asm 2026-02-21T12:44:28.6264277Z // begin inline asm 2026-02-21T12:44:28.6264397Z st.global.v4.b32 [ %rd475 + 0 ], { %r11055, %r11056, %r11057, %r11058 }; 2026-02-21T12:44:28.6264452Z // end inline asm 2026-02-21T12:44:28.6264509Z // begin inline asm 2026-02-21T12:44:28.6264629Z st.global.v4.b32 [ %rd476 + 0 ], { %r11095, %r11096, %r11097, %r11098 }; 2026-02-21T12:44:28.6264685Z // end inline asm 2026-02-21T12:44:28.6264743Z // begin inline asm 2026-02-21T12:44:28.6264859Z st.global.v4.b32 [ %rd477 + 0 ], { %r11060, %r11061, %r11062, %r11063 }; 2026-02-21T12:44:28.6264918Z // end inline asm 2026-02-21T12:44:28.6264975Z // begin inline asm 2026-02-21T12:44:28.6265095Z st.global.v4.b32 [ %rd478 + 0 ], { %r11100, %r11101, %r11102, %r11103 }; 2026-02-21T12:44:28.6265169Z // end inline asm 2026-02-21T12:44:28.6265229Z // begin inline asm 
2026-02-21T12:44:28.6265347Z st.global.v4.b32 [ %rd479 + 0 ], { %r11065, %r11066, %r11067, %r11068 }; 2026-02-21T12:44:28.6265407Z // end inline asm 2026-02-21T12:44:28.6265464Z // begin inline asm 2026-02-21T12:44:28.6265581Z st.global.v4.b32 [ %rd480 + 0 ], { %r11105, %r11106, %r11107, %r11108 }; 2026-02-21T12:44:28.6265637Z // end inline asm 2026-02-21T12:44:28.6265697Z // begin inline asm 2026-02-21T12:44:28.6265814Z st.global.v4.b32 [ %rd481 + 0 ], { %r11070, %r11071, %r11072, %r11073 }; 2026-02-21T12:44:28.6265869Z // end inline asm 2026-02-21T12:44:28.6265930Z // begin inline asm 2026-02-21T12:44:28.6266049Z st.global.v4.b32 [ %rd482 + 0 ], { %r11110, %r11111, %r11112, %r11113 }; 2026-02-21T12:44:28.6266161Z // end inline asm 2026-02-21T12:44:28.6266380Z .loc 1 22 120 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:22:120 2026-02-21T12:44:28.6266616Z add.s64 %rd647, %rd647, 4; 2026-02-21T12:44:28.6266692Z setp.lt.s64 %p26, %rd647, %rd698; 2026-02-21T12:44:28.6266754Z @%p26 bra $L__BB0_2; 2026-02-21T12:44:28.6266855Z $L__BB0_23: // %._crit_edge 2026-02-21T12:44:28.6266927Z sub.s64 %rd105, %rd2, %rd698; 2026-02-21T12:44:28.6267135Z .loc 1 28 35 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:28:35 2026-02-21T12:44:28.6267224Z mul.hi.s64 %rd519, %rd698, 7378697629483820647; 2026-02-21T12:44:28.6267288Z shr.u64 %rd520, %rd519, 63; 2026-02-21T12:44:28.6267361Z shr.s64 %rd521, %rd519, 3; 2026-02-21T12:44:28.6267430Z add.s64 %rd522, %rd521, %rd520; 2026-02-21T12:44:28.6267633Z .loc 1 29 33 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:29:33 2026-02-21T12:44:28.6267697Z shl.b64 %rd107, %rd522, 2; 2026-02-21T12:44:28.6267966Z .loc 1 30 39 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:30:39 2026-02-21T12:44:28.6268093Z sub.s64 %rd523, 2048, %rd107; 2026-02-21T12:44:28.6268292Z .loc 1 30 52 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:30:52 2026-02-21T12:44:28.6268354Z min.s64 %rd108, %rd523, 4; 
2026-02-21T12:44:28.6268612Z .loc 1 31 45 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:31:45 2026-02-21T12:44:28.6268681Z mul.lo.s64 %rd524, %rd522, 20; 2026-02-21T12:44:28.6268745Z sub.s64 %rd109, %rd698, %rd524; 2026-02-21T12:44:28.6268946Z .loc 1 32 51 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:32:51 2026-02-21T12:44:28.6269010Z or.b64 %rd525, %rd109, %rd108; 2026-02-21T12:44:28.6269081Z and.b64 %rd526, %rd525, -4294967296; 2026-02-21T12:44:28.6269148Z setp.ne.b64 %p27, %rd526, 0; 2026-02-21T12:44:28.6269216Z @%p27 bra $L__BB0_25; 2026-02-21T12:44:28.6269276Z bra.uni $L__BB0_24; 2026-02-21T12:44:28.6269331Z $L__BB0_25: 2026-02-21T12:44:28.6269402Z div.s64 %rd665, %rd109, %rd108; 2026-02-21T12:44:28.6269460Z bra.uni $L__BB0_26; 2026-02-21T12:44:28.6269513Z $L__BB0_24: 2026-02-21T12:44:28.6269578Z cvt.u32.u64 %r11273, %rd108; 2026-02-21T12:44:28.6269643Z cvt.u32.u64 %r11274, %rd109; 2026-02-21T12:44:28.6269707Z div.u32 %r11275, %r11274, %r11273; 2026-02-21T12:44:28.6269769Z cvt.u64.u32 %rd665, %r11275; 2026-02-21T12:44:28.6269824Z $L__BB0_26: 2026-02-21T12:44:28.6270032Z .loc 1 22 120 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:22:120 2026-02-21T12:44:28.6270097Z setp.lt.s64 %p28, %rd105, 1; 2026-02-21T12:44:28.6270161Z setp.gt.s64 %p29, %rd105, 0; 2026-02-21T12:44:28.6270363Z .loc 1 31 64 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:31:64 2026-02-21T12:44:28.6270429Z mul.lo.s64 %rd542, %rd665, %rd108; 2026-02-21T12:44:28.6270495Z sub.s64 %rd543, %rd109, %rd542; 2026-02-21T12:44:28.6270694Z .loc 1 31 30 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:31:30 2026-02-21T12:44:28.6270758Z add.s64 %rd544, %rd543, %rd107; 2026-02-21T12:44:28.6270954Z .loc 1 33 27 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:33:27 2026-02-21T12:44:28.6271019Z shl.b64 %rd672, %rd544, 7; 2026-02-21T12:44:28.6271215Z .loc 1 34 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:34:32 
2026-02-21T12:44:28.6271277Z or.b64 %rd666, %rd672, %rd4; 2026-02-21T12:44:28.6271474Z .loc 1 51 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:32 2026-02-21T12:44:28.6271537Z shl.b64 %rd545, %rd666, 14; 2026-02-21T12:44:28.6271598Z add.s64 %rd546, %rd165, %rd545; 2026-02-21T12:44:28.6271660Z shl.b64 %rd547, %rd664, 1; 2026-02-21T12:44:28.6271725Z add.s64 %rd527, %rd546, %rd547; 2026-02-21T12:44:28.6272010Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6272133Z add.s32 %r11307, %r11895, %r14330; 2026-02-21T12:44:28.6272199Z add.s32 %r11276, %r11307, 32768; 2026-02-21T12:44:28.6272266Z selp.b32 %r11277, 8, 0, %p29; 2026-02-21T12:44:28.6272325Z // begin inline asm 2026-02-21T12:44:28.6272476Z cp.async.ca.shared.global [ %r11276 + 0 ], [ %rd527 + 0 ], 0x8, %r11277; 2026-02-21T12:44:28.6272534Z // end inline asm 2026-02-21T12:44:28.6272600Z cp.async.commit_group; 2026-02-21T12:44:28.6272798Z .loc 1 51 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:32 2026-02-21T12:44:28.6272865Z add.s64 %rd528, %rd527, 16; 2026-02-21T12:44:28.6273059Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6273126Z add.s32 %r11278, %r11307, 43008; 2026-02-21T12:44:28.6273197Z // begin inline asm 2026-02-21T12:44:28.6273344Z cp.async.ca.shared.global [ %r11278 + 0 ], [ %rd528 + 0 ], 0x8, %r11277; 2026-02-21T12:44:28.6273401Z // end inline asm 2026-02-21T12:44:28.6273556Z cp.async.commit_group; 2026-02-21T12:44:28.6273759Z .loc 1 51 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:32 2026-02-21T12:44:28.6273822Z add.s64 %rd529, %rd527, 32; 2026-02-21T12:44:28.6274019Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6274083Z add.s32 %r11280, %r11307, 53248; 2026-02-21T12:44:28.6274146Z // begin inline asm 2026-02-21T12:44:28.6274285Z cp.async.ca.shared.global [ %r11280 + 0 ], [ %rd529 + 0 ], 0x8, 
%r11277; 2026-02-21T12:44:28.6274344Z // end inline asm 2026-02-21T12:44:28.6274409Z cp.async.commit_group; 2026-02-21T12:44:28.6274603Z .loc 1 51 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:32 2026-02-21T12:44:28.6274668Z add.s64 %rd530, %rd527, 48; 2026-02-21T12:44:28.6274867Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6274926Z bar.sync 0; 2026-02-21T12:44:28.6274988Z add.s32 %r11282, %r11307, 34816; 2026-02-21T12:44:28.6275064Z // begin inline asm 2026-02-21T12:44:28.6275205Z cp.async.ca.shared.global [ %r11282 + 0 ], [ %rd530 + 0 ], 0x8, %r11277; 2026-02-21T12:44:28.6275262Z // end inline asm 2026-02-21T12:44:28.6275330Z cp.async.commit_group; 2026-02-21T12:44:28.6275527Z .loc 1 51 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:32 2026-02-21T12:44:28.6275588Z add.s64 %rd531, %rd527, 64; 2026-02-21T12:44:28.6275787Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6275848Z add.s32 %r11284, %r11307, 45056; 2026-02-21T12:44:28.6275908Z // begin inline asm 2026-02-21T12:44:28.6276046Z cp.async.ca.shared.global [ %r11284 + 0 ], [ %rd531 + 0 ], 0x8, %r11277; 2026-02-21T12:44:28.6276106Z // end inline asm 2026-02-21T12:44:28.6276171Z cp.async.commit_group; 2026-02-21T12:44:28.6276369Z .loc 1 51 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:32 2026-02-21T12:44:28.6276436Z add.s64 %rd532, %rd527, 80; 2026-02-21T12:44:28.6276758Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6276832Z add.s32 %r11286, %r11307, 55296; 2026-02-21T12:44:28.6276896Z // begin inline asm 2026-02-21T12:44:28.6277036Z cp.async.ca.shared.global [ %r11286 + 0 ], [ %rd532 + 0 ], 0x8, %r11277; 2026-02-21T12:44:28.6277093Z // end inline asm 2026-02-21T12:44:28.6277159Z cp.async.commit_group; 2026-02-21T12:44:28.6277369Z .loc 1 51 32 // 
cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:32 2026-02-21T12:44:28.6277433Z add.s64 %rd533, %rd527, 96; 2026-02-21T12:44:28.6277629Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6277765Z bar.sync 0; 2026-02-21T12:44:28.6277828Z add.s32 %r11288, %r11307, 36864; 2026-02-21T12:44:28.6277950Z // begin inline asm 2026-02-21T12:44:28.6278087Z cp.async.ca.shared.global [ %r11288 + 0 ], [ %rd533 + 0 ], 0x8, %r11277; 2026-02-21T12:44:28.6278148Z // end inline asm 2026-02-21T12:44:28.6278212Z cp.async.commit_group; 2026-02-21T12:44:28.6278408Z .loc 1 51 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:32 2026-02-21T12:44:28.6278474Z add.s64 %rd534, %rd527, 112; 2026-02-21T12:44:28.6278686Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6278748Z add.s32 %r11290, %r11307, 47104; 2026-02-21T12:44:28.6278811Z // begin inline asm 2026-02-21T12:44:28.6278949Z cp.async.ca.shared.global [ %r11290 + 0 ], [ %rd534 + 0 ], 0x8, %r11277; 2026-02-21T12:44:28.6279006Z // end inline asm 2026-02-21T12:44:28.6279074Z cp.async.commit_group; 2026-02-21T12:44:28.6279275Z .loc 1 51 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:32 2026-02-21T12:44:28.6279480Z add.s64 %rd535, %rd527, 128; 2026-02-21T12:44:28.6279681Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6279747Z add.s32 %r11292, %r11307, 57344; 2026-02-21T12:44:28.6279805Z // begin inline asm 2026-02-21T12:44:28.6279941Z cp.async.ca.shared.global [ %r11292 + 0 ], [ %rd535 + 0 ], 0x8, %r11277; 2026-02-21T12:44:28.6280000Z // end inline asm 2026-02-21T12:44:28.6280065Z cp.async.commit_group; 2026-02-21T12:44:28.6280258Z .loc 1 51 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:32 2026-02-21T12:44:28.6280327Z add.s64 %rd536, %rd527, 144; 2026-02-21T12:44:28.6280522Z .loc 1 51 80 // 
cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6280581Z bar.sync 0; 2026-02-21T12:44:28.6280641Z add.s32 %r11294, %r11307, 38912; 2026-02-21T12:44:28.6280702Z // begin inline asm 2026-02-21T12:44:28.6280844Z cp.async.ca.shared.global [ %r11294 + 0 ], [ %rd536 + 0 ], 0x8, %r11277; 2026-02-21T12:44:28.6280901Z // end inline asm 2026-02-21T12:44:28.6280967Z cp.async.commit_group; 2026-02-21T12:44:28.6281162Z .loc 1 51 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:32 2026-02-21T12:44:28.6281224Z add.s64 %rd537, %rd527, 160; 2026-02-21T12:44:28.6281419Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6281481Z add.s32 %r11296, %r11307, 49152; 2026-02-21T12:44:28.6281540Z // begin inline asm 2026-02-21T12:44:28.6281677Z cp.async.ca.shared.global [ %r11296 + 0 ], [ %rd537 + 0 ], 0x8, %r11277; 2026-02-21T12:44:28.6281737Z // end inline asm 2026-02-21T12:44:28.6281801Z cp.async.commit_group; 2026-02-21T12:44:28.6282009Z .loc 1 51 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:32 2026-02-21T12:44:28.6282077Z add.s64 %rd538, %rd527, 176; 2026-02-21T12:44:28.6282278Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6282338Z add.s32 %r11298, %r11307, 59392; 2026-02-21T12:44:28.6282396Z // begin inline asm 2026-02-21T12:44:28.6282536Z cp.async.ca.shared.global [ %r11298 + 0 ], [ %rd538 + 0 ], 0x8, %r11277; 2026-02-21T12:44:28.6282592Z // end inline asm 2026-02-21T12:44:28.6282656Z cp.async.commit_group; 2026-02-21T12:44:28.6282852Z .loc 1 51 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:32 2026-02-21T12:44:28.6282913Z add.s64 %rd539, %rd527, 192; 2026-02-21T12:44:28.6283107Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6283163Z bar.sync 0; 2026-02-21T12:44:28.6283222Z add.s32 %r11300, %r11307, 40960; 2026-02-21T12:44:28.6283340Z // begin 
inline asm 2026-02-21T12:44:28.6283475Z cp.async.ca.shared.global [ %r11300 + 0 ], [ %rd539 + 0 ], 0x8, %r11277; 2026-02-21T12:44:28.6283579Z // end inline asm 2026-02-21T12:44:28.6283645Z cp.async.commit_group; 2026-02-21T12:44:28.6283842Z .loc 1 51 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:32 2026-02-21T12:44:28.6283907Z add.s64 %rd540, %rd527, 208; 2026-02-21T12:44:28.6284113Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6284175Z add.s32 %r11302, %r11307, 51200; 2026-02-21T12:44:28.6284236Z // begin inline asm 2026-02-21T12:44:28.6284374Z cp.async.ca.shared.global [ %r11302 + 0 ], [ %rd540 + 0 ], 0x8, %r11277; 2026-02-21T12:44:28.6284430Z // end inline asm 2026-02-21T12:44:28.6284494Z cp.async.commit_group; 2026-02-21T12:44:28.6284691Z .loc 1 51 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:32 2026-02-21T12:44:28.6284756Z add.s64 %rd541, %rd527, 224; 2026-02-21T12:44:28.6285046Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6285114Z add.s32 %r11304, %r11307, 61440; 2026-02-21T12:44:28.6285173Z // begin inline asm 2026-02-21T12:44:28.6285310Z cp.async.ca.shared.global [ %r11304 + 0 ], [ %rd541 + 0 ], 0x8, %r11277; 2026-02-21T12:44:28.6285370Z // end inline asm 2026-02-21T12:44:28.6285434Z cp.async.commit_group; 2026-02-21T12:44:28.6285640Z .loc 1 22 120 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:22:120 2026-02-21T12:44:28.6285700Z @%p28 bra $L__BB0_36; 2026-02-21T12:44:28.6285788Z // %bb.27: // %.lr.ph136 2026-02-21T12:44:28.6285985Z .loc 1 0 0 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:0 2026-02-21T12:44:28.6286053Z mul.lo.s64 %rd106, %rd105, 341; 2026-02-21T12:44:28.6286256Z .loc 1 35 27 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:35:27 2026-02-21T12:44:28.6286323Z shl.b64 %rd668, %rd665, 8; 2026-02-21T12:44:28.6286648Z .loc 1 36 32 // 
cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:36:32 2026-02-21T12:44:28.6286718Z or.b64 %rd680, %rd668, %rd21; 2026-02-21T12:44:28.6286783Z add.s64 %rd117, %rd106, -5; 2026-02-21T12:44:28.6286845Z shl.b32 %r11328, %r13810, 1; 2026-02-21T12:44:28.6286912Z or.b32 %r11329, %r13808, %r13809; 2026-02-21T12:44:28.6286975Z or.b32 %r1063, %r11329, %r11328; 2026-02-21T12:44:28.6287037Z cvt.u32.u64 %r11330, %rd644; 2026-02-21T12:44:28.6287099Z add.s32 %r1064, %r11895, %r11330; 2026-02-21T12:44:28.6287163Z xor.b32 %r11332, %r11330, 32; 2026-02-21T12:44:28.6287223Z add.s32 %r1065, %r11895, %r11332; 2026-02-21T12:44:28.6287283Z xor.b32 %r11333, %r11330, 64; 2026-02-21T12:44:28.6287344Z add.s32 %r1066, %r11895, %r11333; 2026-02-21T12:44:28.6287404Z xor.b32 %r11334, %r11330, 96; 2026-02-21T12:44:28.6287467Z add.s32 %r1067, %r11895, %r11334; 2026-02-21T12:44:28.6287530Z shl.b32 %r11335, %r13810, 8; 2026-02-21T12:44:28.6287595Z shl.b32 %r11336, %r13810, 5; 2026-02-21T12:44:28.6287665Z xor.b32 %r11338, %r11336, %r13811; 2026-02-21T12:44:28.6287729Z add.s32 %r11339, %r11895, %r11335; 2026-02-21T12:44:28.6287793Z add.s32 %r1068, %r11339, %r11338; 2026-02-21T12:44:28.6287856Z and.b32 %r11341, %r13812, 8032; 2026-02-21T12:44:28.6287919Z and.b32 %r11343, %r13813, 144; 2026-02-21T12:44:28.6287979Z or.b32 %r11344, %r11343, %r11341; 2026-02-21T12:44:28.6288056Z add.s32 %r1069, %r11895, %r11344; 2026-02-21T12:44:28.6288118Z xor.b32 %r11345, %r11344, 16; 2026-02-21T12:44:28.6288182Z add.s32 %r1070, %r11895, %r11345; 2026-02-21T12:44:28.6288246Z bfe.u32 %r11346, %r11895, 4, 14; 2026-02-21T12:44:28.6288308Z cvt.u64.u32 %rd554, %r11346; 2026-02-21T12:44:28.6288389Z or.b64 %rd591, %rd554, -4611685949674356736; 2026-02-21T12:44:28.6288453Z add.s32 %r1072, %r11895, %r1063; 2026-02-21T12:44:28.6288599Z shl.b32 %r11347, %r13810, 13; 2026-02-21T12:44:28.6288660Z and.b32 %r11349, %r13814, 3968; 2026-02-21T12:44:28.6288721Z and.b32 %r11350, %r13813, 4112; 2026-02-21T12:44:28.6288844Z or.b32 
%r11351, %r11349, %r11350; 2026-02-21T12:44:28.6288904Z or.b32 %r11352, %r11351, %r11347; 2026-02-21T12:44:28.6288965Z or.b32 %r11353, %r11352, %r11336; 2026-02-21T12:44:28.6289027Z add.s32 %r1073, %r11895, %r11353; 2026-02-21T12:44:28.6289087Z xor.b32 %r11354, %r11353, 16; 2026-02-21T12:44:28.6289148Z add.s32 %r1074, %r11895, %r11354; 2026-02-21T12:44:28.6289208Z xor.b32 %r11355, %r11353, 32; 2026-02-21T12:44:28.6289271Z add.s32 %r1075, %r11895, %r11355; 2026-02-21T12:44:28.6289331Z xor.b32 %r11356, %r11353, 48; 2026-02-21T12:44:28.6289391Z add.s32 %r1076, %r11895, %r11356; 2026-02-21T12:44:28.6289465Z xor.b32 %r11357, %r11353, 64; 2026-02-21T12:44:28.6289528Z add.s32 %r1077, %r11895, %r11357; 2026-02-21T12:44:28.6289588Z xor.b32 %r11358, %r11353, 80; 2026-02-21T12:44:28.6289653Z add.s32 %r1078, %r11895, %r11358; 2026-02-21T12:44:28.6289717Z xor.b32 %r11359, %r11353, 96; 2026-02-21T12:44:28.6289844Z add.s32 %r1079, %r11895, %r11359; 2026-02-21T12:44:28.6289966Z xor.b32 %r11360, %r11353, 112; 2026-02-21T12:44:28.6290031Z add.s32 %r1080, %r11895, %r11360; 2026-02-21T12:44:28.6290091Z shl.b32 %r11362, %r13815, 10; 2026-02-21T12:44:28.6290152Z and.b32 %r11363, %r13814, 112; 2026-02-21T12:44:28.6290214Z shl.b32 %r11364, %r13815, 2; 2026-02-21T12:44:28.6290276Z and.b32 %r11367, %r13817, 4112; 2026-02-21T12:44:28.6290336Z or.b32 %r11368, %r11362, %r11363; 2026-02-21T12:44:28.6290395Z or.b32 %r11369, %r11364, %r13816; 2026-02-21T12:44:28.6290460Z xor.b32 %r11370, %r11368, %r11369; 2026-02-21T12:44:28.6290521Z xor.b32 %r11371, %r11370, %r11367; 2026-02-21T12:44:28.6290594Z add.s32 %r13571, %r11895, %r11371; 2026-02-21T12:44:28.6290660Z add.s32 %r13576, %r13571, 512; 2026-02-21T12:44:28.6290723Z add.s32 %r13581, %r13571, 1024; 2026-02-21T12:44:28.6290786Z add.s32 %r13586, %r13571, 1536; 2026-02-21T12:44:28.6290852Z add.s32 %r13591, %r13571, 2048; 2026-02-21T12:44:28.6290917Z add.s32 %r13596, %r13571, 2560; 2026-02-21T12:44:28.6290982Z add.s32 %r13601, %r13571, 3072; 
2026-02-21T12:44:28.6291041Z add.s32 %r13606, %r13571, 3584; 2026-02-21T12:44:28.6291101Z mov.b64 %rd688, 4; 2026-02-21T12:44:28.6291160Z mov.b32 %r14348, 0f00000000; 2026-02-21T12:44:28.6291219Z mov.b32 %r14346, -1; 2026-02-21T12:44:28.6291278Z mov.b32 %r14345, 0; 2026-02-21T12:44:28.6291338Z mov.b32 %r14344, 12; 2026-02-21T12:44:28.6291395Z mov.b32 %r14343, 24; 2026-02-21T12:44:28.6291453Z mov.b32 %r14342, 36; 2026-02-21T12:44:28.6291513Z mov.b32 %r14341, 48; 2026-02-21T12:44:28.6291570Z mov.b32 %r14340, 4; 2026-02-21T12:44:28.6291626Z mov.b32 %r14339, 16; 2026-02-21T12:44:28.6291682Z mov.b32 %r14338, 28; 2026-02-21T12:44:28.6291742Z mov.b32 %r14337, 40; 2026-02-21T12:44:28.6291798Z mov.b32 %r14336, 52; 2026-02-21T12:44:28.6291855Z mov.b32 %r14335, 8; 2026-02-21T12:44:28.6291916Z mov.b32 %r14334, 20; 2026-02-21T12:44:28.6291973Z mov.b32 %r14333, 32; 2026-02-21T12:44:28.6292030Z mov.b32 %r14332, 44; 2026-02-21T12:44:28.6292093Z mov.b32 %r14331, 56; 2026-02-21T12:44:28.6292152Z mov.b64 %rd679, 0; 2026-02-21T12:44:28.6292209Z mov.b64 %rd678, 1; 2026-02-21T12:44:28.6292264Z mov.b64 %rd677, 2; 2026-02-21T12:44:28.6292324Z mov.b64 %rd676, 3; 2026-02-21T12:44:28.6292407Z prmt.b32 %r12965, %r12966, %r12967, 0x3340U; 2026-02-21T12:44:28.6292484Z prmt.b32 %r13723, %r13724, %r13725, 0x3340U; 2026-02-21T12:44:28.6292548Z mov.b64 %rd667, %rd666; 2026-02-21T12:44:28.6292608Z mov.b64 %rd669, %rd668; 2026-02-21T12:44:28.6292666Z mov.b64 %rd670, %rd668; 2026-02-21T12:44:28.6292725Z mov.b64 %rd671, %rd668; 2026-02-21T12:44:28.6292786Z mov.b64 %rd673, %rd672; 2026-02-21T12:44:28.6292845Z mov.b64 %rd674, %rd672; 2026-02-21T12:44:28.6292904Z mov.b64 %rd675, %rd672; 2026-02-21T12:44:28.6292966Z mov.b64 %rd681, %rd680; 2026-02-21T12:44:28.6293089Z mov.b64 %rd682, %rd680; 2026-02-21T12:44:28.6293148Z mov.b64 %rd683, %rd680; 2026-02-21T12:44:28.6293209Z mov.b32 %r14347, %r14340; 2026-02-21T12:44:28.6293317Z mov.b64 %rd684, %rd680; 2026-02-21T12:44:28.6293376Z mov.b64 %rd685, %rd668; 
2026-02-21T12:44:28.6293436Z mov.b64 %rd686, %rd672; 2026-02-21T12:44:28.6293501Z mov.b32 %r14349, %r14348; 2026-02-21T12:44:28.6293570Z mov.b32 %r14350, %r14348; 2026-02-21T12:44:28.6293631Z mov.b32 %r14351, %r14348; 2026-02-21T12:44:28.6293691Z mov.b32 %r14352, %r14348; 2026-02-21T12:44:28.6293753Z mov.b32 %r14353, %r14348; 2026-02-21T12:44:28.6293812Z mov.b32 %r14354, %r14348; 2026-02-21T12:44:28.6293870Z mov.b32 %r14355, %r14348; 2026-02-21T12:44:28.6293930Z mov.b32 %r14356, %r14348; 2026-02-21T12:44:28.6293989Z mov.b32 %r14357, %r14348; 2026-02-21T12:44:28.6294047Z mov.b32 %r14358, %r14348; 2026-02-21T12:44:28.6294105Z mov.b32 %r14359, %r14348; 2026-02-21T12:44:28.6294166Z mov.b32 %r14360, %r14348; 2026-02-21T12:44:28.6294226Z mov.b32 %r14361, %r14348; 2026-02-21T12:44:28.6294285Z mov.b32 %r14362, %r14348; 2026-02-21T12:44:28.6294345Z mov.b32 %r14363, %r14348; 2026-02-21T12:44:28.6294497Z mov.b32 %r14364, %r14348; 2026-02-21T12:44:28.6294558Z mov.b32 %r14365, %r14348; 2026-02-21T12:44:28.6294616Z mov.b32 %r14366, %r14348; 2026-02-21T12:44:28.6294677Z mov.b32 %r14367, %r14348; 2026-02-21T12:44:28.6294735Z mov.b32 %r14368, %r14348; 2026-02-21T12:44:28.6294794Z mov.b32 %r14369, %r14348; 2026-02-21T12:44:28.6294854Z mov.b32 %r14370, %r14348; 2026-02-21T12:44:28.6294912Z mov.b32 %r14371, %r14348; 2026-02-21T12:44:28.6294970Z mov.b32 %r14372, %r14348; 2026-02-21T12:44:28.6295028Z mov.b32 %r14373, %r14348; 2026-02-21T12:44:28.6295089Z mov.b32 %r14374, %r14348; 2026-02-21T12:44:28.6295147Z mov.b32 %r14375, %r14348; 2026-02-21T12:44:28.6295205Z mov.b32 %r14376, %r14348; 2026-02-21T12:44:28.6295266Z mov.b32 %r14377, %r14348; 2026-02-21T12:44:28.6295324Z mov.b32 %r14378, %r14348; 2026-02-21T12:44:28.6295384Z mov.b32 %r14379, %r14348; 2026-02-21T12:44:28.6295445Z mov.b32 %r14380, %r14348; 2026-02-21T12:44:28.6295504Z mov.b32 %r14381, %r14348; 2026-02-21T12:44:28.6295566Z mov.b32 %r14382, %r14348; 2026-02-21T12:44:28.6295625Z mov.b32 %r14383, %r14348; 
2026-02-21T12:44:28.6295687Z mov.b32 %r14384, %r14348; 2026-02-21T12:44:28.6295745Z mov.b32 %r14385, %r14348; 2026-02-21T12:44:28.6295803Z mov.b32 %r14386, %r14348; 2026-02-21T12:44:28.6295865Z mov.b32 %r14387, %r14348; 2026-02-21T12:44:28.6295923Z mov.b32 %r14388, %r14348; 2026-02-21T12:44:28.6295980Z mov.b32 %r14389, %r14348; 2026-02-21T12:44:28.6296039Z mov.b32 %r14390, %r14348; 2026-02-21T12:44:28.6296100Z mov.b32 %r14391, %r14348; 2026-02-21T12:44:28.6296157Z mov.b32 %r14392, %r14348; 2026-02-21T12:44:28.6296214Z mov.b32 %r14393, %r14348; 2026-02-21T12:44:28.6296279Z mov.b32 %r14394, %r14348; 2026-02-21T12:44:28.6296338Z mov.b32 %r14395, %r14348; 2026-02-21T12:44:28.6296395Z mov.b32 %r14396, %r14348; 2026-02-21T12:44:28.6296583Z mov.b32 %r14397, %r14348; 2026-02-21T12:44:28.6296651Z mov.b32 %r14398, %r14348; 2026-02-21T12:44:28.6296712Z mov.b32 %r14399, %r14348; 2026-02-21T12:44:28.6296776Z mov.b32 %r14400, %r14348; 2026-02-21T12:44:28.6296837Z mov.b32 %r14401, %r14348; 2026-02-21T12:44:28.6296894Z mov.b32 %r14402, %r14348; 2026-02-21T12:44:28.6296954Z mov.b32 %r14403, %r14348; 2026-02-21T12:44:28.6297012Z mov.b32 %r14404, %r14348; 2026-02-21T12:44:28.6297073Z mov.b32 %r14405, %r14348; 2026-02-21T12:44:28.6297131Z mov.b32 %r14406, %r14348; 2026-02-21T12:44:28.6297189Z mov.b32 %r14407, %r14348; 2026-02-21T12:44:28.6297252Z mov.b32 %r14408, %r14348; 2026-02-21T12:44:28.6297311Z mov.b32 %r14409, %r14348; 2026-02-21T12:44:28.6297371Z mov.b32 %r14410, %r14348; 2026-02-21T12:44:28.6297429Z mov.b32 %r14411, %r14348; 2026-02-21T12:44:28.6297495Z mov.b32 %r14412, %r14348; 2026-02-21T12:44:28.6297553Z mov.b32 %r14413, %r14348; 2026-02-21T12:44:28.6297610Z mov.b32 %r14414, %r14348; 2026-02-21T12:44:28.6297752Z mov.b32 %r14415, %r14348; 2026-02-21T12:44:28.6297822Z mov.b32 %r14416, %r14348; 2026-02-21T12:44:28.6297882Z mov.b32 %r14417, %r14348; 2026-02-21T12:44:28.6298024Z mov.b32 %r14418, %r14348; 2026-02-21T12:44:28.6298086Z mov.b32 %r14419, %r14348; 
2026-02-21T12:44:28.6298143Z mov.b32 %r14420, %r14348; 2026-02-21T12:44:28.6298201Z mov.b32 %r14421, %r14348; 2026-02-21T12:44:28.6298261Z mov.b32 %r14422, %r14348; 2026-02-21T12:44:28.6298319Z mov.b32 %r14423, %r14348; 2026-02-21T12:44:28.6298375Z mov.b32 %r14424, %r14348; 2026-02-21T12:44:28.6298436Z mov.b32 %r14425, %r14348; 2026-02-21T12:44:28.6298493Z mov.b32 %r14426, %r14348; 2026-02-21T12:44:28.6298551Z mov.b32 %r14427, %r14348; 2026-02-21T12:44:28.6298608Z mov.b32 %r14428, %r14348; 2026-02-21T12:44:28.6298669Z mov.b32 %r14429, %r14348; 2026-02-21T12:44:28.6298730Z mov.b32 %r14430, %r14348; 2026-02-21T12:44:28.6298787Z mov.b32 %r14431, %r14348; 2026-02-21T12:44:28.6298847Z mov.b32 %r14432, %r14348; 2026-02-21T12:44:28.6298907Z mov.b32 %r14433, %r14348; 2026-02-21T12:44:28.6298964Z mov.b32 %r14434, %r14348; 2026-02-21T12:44:28.6299021Z mov.b32 %r14435, %r14348; 2026-02-21T12:44:28.6299217Z mov.b32 %r14436, %r14348; 2026-02-21T12:44:28.6299282Z mov.b32 %r14437, %r14348; 2026-02-21T12:44:28.6299341Z mov.b32 %r14438, %r14348; 2026-02-21T12:44:28.6299402Z mov.b32 %r14439, %r14348; 2026-02-21T12:44:28.6299461Z mov.b32 %r14440, %r14348; 2026-02-21T12:44:28.6299519Z mov.b32 %r14441, %r14348; 2026-02-21T12:44:28.6299576Z mov.b32 %r14442, %r14348; 2026-02-21T12:44:28.6299637Z mov.b32 %r14443, %r14348; 2026-02-21T12:44:28.6299696Z mov.b32 %r14444, %r14348; 2026-02-21T12:44:28.6299754Z mov.b32 %r14445, %r14348; 2026-02-21T12:44:28.6299815Z mov.b32 %r14446, %r14348; 2026-02-21T12:44:28.6299873Z mov.b32 %r14447, %r14348; 2026-02-21T12:44:28.6299931Z mov.b32 %r14448, %r14348; 2026-02-21T12:44:28.6299993Z mov.b32 %r14449, %r14348; 2026-02-21T12:44:28.6300055Z mov.b32 %r14450, %r14348; 2026-02-21T12:44:28.6300116Z mov.b32 %r14451, %r14348; 2026-02-21T12:44:28.6300174Z mov.b32 %r14452, %r14348; 2026-02-21T12:44:28.6300234Z mov.b32 %r14453, %r14348; 2026-02-21T12:44:28.6300297Z mov.b32 %r14454, %r14348; 2026-02-21T12:44:28.6300356Z mov.b32 %r14455, %r14348; 
2026-02-21T12:44:28.6300414Z mov.b32 %r14456, %r14348; 2026-02-21T12:44:28.6300474Z mov.b32 %r14457, %r14348; 2026-02-21T12:44:28.6300534Z mov.b32 %r14458, %r14348; 2026-02-21T12:44:28.6300591Z mov.b32 %r14459, %r14348; 2026-02-21T12:44:28.6300654Z mov.b32 %r14460, %r14348; 2026-02-21T12:44:28.6300711Z mov.b32 %r14461, %r14348; 2026-02-21T12:44:28.6300769Z mov.b32 %r14462, %r14348; 2026-02-21T12:44:28.6300827Z mov.b32 %r14463, %r14348; 2026-02-21T12:44:28.6300889Z mov.b32 %r14464, %r14348; 2026-02-21T12:44:28.6300948Z mov.b32 %r14465, %r14348; 2026-02-21T12:44:28.6301006Z mov.b32 %r14466, %r14348; 2026-02-21T12:44:28.6301067Z mov.b32 %r14467, %r14348; 2026-02-21T12:44:28.6301127Z mov.b32 %r14468, %r14348; 2026-02-21T12:44:28.6301189Z mov.b32 %r14469, %r14348; 2026-02-21T12:44:28.6301251Z mov.b32 %r14470, %r14348; 2026-02-21T12:44:28.6301311Z mov.b32 %r14471, %r14348; 2026-02-21T12:44:28.6301373Z mov.b32 %r14472, %r14348; 2026-02-21T12:44:28.6301433Z mov.b32 %r14473, %r14348; 2026-02-21T12:44:28.6301494Z mov.b32 %r14474, %r14348; 2026-02-21T12:44:28.6301552Z mov.b32 %r14475, %r14348; 2026-02-21T12:44:28.6301612Z mov.b64 %rd689, %rd679; 2026-02-21T12:44:28.6301672Z mov.b64 %rd690, %rd666; 2026-02-21T12:44:28.6301732Z mov.b64 %rd691, %rd666; 2026-02-21T12:44:28.6301791Z mov.b64 %rd692, %rd666; 2026-02-21T12:44:28.6301850Z mov.b64 %rd694, %rd686; 2026-02-21T12:44:28.6301913Z mov.b64 %rd695, %rd690; 2026-02-21T12:44:28.6301973Z mov.b64 %rd696, %rd685; 2026-02-21T12:44:28.6302031Z mov.b64 %rd697, %rd684; 2026-02-21T12:44:28.6302094Z bra.uni $L__BB0_28; 2026-02-21T12:44:28.6302229Z $L__BB0_35: // in Loop: Header=BB0_28 Depth=1 2026-02-21T12:44:28.6302448Z .loc 1 22 120 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:22:120 2026-02-21T12:44:28.6302573Z add.s64 %rd689, %rd689, 1; 2026-02-21T12:44:28.6302691Z setp.ne.b64 %p42, %rd106, %rd689; 2026-02-21T12:44:28.6302752Z mov.b64 %rd666, %rd690; 2026-02-21T12:44:28.6302811Z mov.b64 %rd667, %rd692; 
2026-02-21T12:44:28.6302871Z mov.b64 %rd668, %rd685; 2026-02-21T12:44:28.6302930Z mov.b64 %rd671, %rd124; 2026-02-21T12:44:28.6302989Z mov.b64 %rd672, %rd686; 2026-02-21T12:44:28.6303050Z mov.b64 %rd675, %rd128; 2026-02-21T12:44:28.6303110Z mov.b64 %rd676, %rd688; 2026-02-21T12:44:28.6303168Z mov.b64 %rd679, %rd132; 2026-02-21T12:44:28.6303227Z mov.b32 %r14335, %r1092; 2026-02-21T12:44:28.6303289Z mov.b32 %r14340, %r1097; 2026-02-21T12:44:28.6303350Z mov.b64 %rd680, %rd684; 2026-02-21T12:44:28.6303409Z mov.b64 %rd683, %rd136; 2026-02-21T12:44:28.6303470Z mov.b32 %r14345, %r1102; 2026-02-21T12:44:28.6303528Z mov.b64 %rd684, %rd697; 2026-02-21T12:44:28.6303589Z mov.b64 %rd685, %rd696; 2026-02-21T12:44:28.6303646Z mov.b64 %rd686, %rd694; 2026-02-21T12:44:28.6303720Z mov.b64 %rd688, %rd147; 2026-02-21T12:44:28.6303828Z mov.b64 %rd690, %rd695; 2026-02-21T12:44:28.6303932Z mov.b64 %rd691, %rd121; 2026-02-21T12:44:28.6303995Z mov.b64 %rd692, %rd120; 2026-02-21T12:44:28.6304057Z @%p42 bra $L__BB0_28; 2026-02-21T12:44:28.6304115Z bra.uni $L__BB0_36; 2026-02-21T12:44:28.6304238Z $L__BB0_28: // =>This Inner Loop Header: Depth=1 2026-02-21T12:44:28.6304451Z .loc 1 0 120 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:0:120 2026-02-21T12:44:28.6304511Z mov.b32 %r1102, %r14344; 2026-02-21T12:44:28.6304571Z mov.b32 %r14344, %r14343; 2026-02-21T12:44:28.6304633Z mov.b32 %r14343, %r14342; 2026-02-21T12:44:28.6304691Z mov.b32 %r14342, %r14341; 2026-02-21T12:44:28.6304750Z mov.b64 %rd136, %rd682; 2026-02-21T12:44:28.6304809Z mov.b64 %rd682, %rd681; 2026-02-21T12:44:28.6304883Z mov.b64 %rd681, %rd680; 2026-02-21T12:44:28.6304945Z mov.b32 %r1097, %r14339; 2026-02-21T12:44:28.6305004Z mov.b32 %r14339, %r14338; 2026-02-21T12:44:28.6305067Z mov.b32 %r14338, %r14337; 2026-02-21T12:44:28.6305130Z mov.b32 %r14337, %r14336; 2026-02-21T12:44:28.6305190Z mov.b32 %r1092, %r14334; 2026-02-21T12:44:28.6305252Z mov.b32 %r14334, %r14333; 2026-02-21T12:44:28.6305310Z mov.b32 %r14333, 
%r14332; 2026-02-21T12:44:28.6305370Z mov.b32 %r14332, %r14331; 2026-02-21T12:44:28.6305429Z mov.b64 %rd132, %rd678; 2026-02-21T12:44:28.6305491Z mov.b64 %rd678, %rd677; 2026-02-21T12:44:28.6305550Z mov.b64 %rd677, %rd676; 2026-02-21T12:44:28.6305608Z mov.b64 %rd128, %rd674; 2026-02-21T12:44:28.6305668Z mov.b64 %rd674, %rd673; 2026-02-21T12:44:28.6305727Z mov.b64 %rd673, %rd672; 2026-02-21T12:44:28.6305786Z mov.b64 %rd124, %rd670; 2026-02-21T12:44:28.6305844Z mov.b64 %rd670, %rd669; 2026-02-21T12:44:28.6305906Z mov.b64 %rd669, %rd668; 2026-02-21T12:44:28.6305966Z mov.b64 %rd121, %rd667; 2026-02-21T12:44:28.6306027Z mov.b64 %rd120, %rd666; 2026-02-21T12:44:28.6306239Z .loc 1 22 120 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:22:120 2026-02-21T12:44:28.6306307Z add.s64 %rd555, %rd688, 1; 2026-02-21T12:44:28.6306374Z setp.eq.b64 %p30, %rd688, 340; 2026-02-21T12:44:28.6306442Z selp.b64 %rd147, 0, %rd555, %p30; 2026-02-21T12:44:28.6306631Z setp.ne.b64 %p31, %rd147, 0; 2026-02-21T12:44:28.6306692Z @%p31 bra $L__BB0_33; 2026-02-21T12:44:28.6306799Z // %bb.29: // in Loop: Header=BB0_28 Depth=1 2026-02-21T12:44:28.6306864Z add.s64 %rd698, %rd698, 1; 2026-02-21T12:44:28.6307068Z .loc 1 28 35 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:28:35 2026-02-21T12:44:28.6307155Z mul.hi.s64 %rd556, %rd698, 7378697629483820647; 2026-02-21T12:44:28.6307220Z shr.u64 %rd557, %rd556, 63; 2026-02-21T12:44:28.6307282Z shr.s64 %rd558, %rd556, 3; 2026-02-21T12:44:28.6307346Z add.s64 %rd559, %rd558, %rd557; 2026-02-21T12:44:28.6307634Z .loc 1 29 33 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:29:33 2026-02-21T12:44:28.6307759Z shl.b64 %rd149, %rd559, 2; 2026-02-21T12:44:28.6307958Z .loc 1 30 39 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:30:39 2026-02-21T12:44:28.6308021Z sub.s64 %rd560, 2048, %rd149; 2026-02-21T12:44:28.6308220Z .loc 1 30 52 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:30:52 2026-02-21T12:44:28.6308282Z 
min.s64 %rd150, %rd560, 4; 2026-02-21T12:44:28.6308478Z .loc 1 31 45 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:31:45 2026-02-21T12:44:28.6308608Z mul.lo.s64 %rd561, %rd559, 20; 2026-02-21T12:44:28.6308673Z sub.s64 %rd151, %rd698, %rd561; 2026-02-21T12:44:28.6308871Z .loc 1 32 51 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:32:51 2026-02-21T12:44:28.6308937Z or.b64 %rd562, %rd151, %rd150; 2026-02-21T12:44:28.6309008Z and.b64 %rd563, %rd562, -4294967296; 2026-02-21T12:44:28.6309076Z setp.ne.b64 %p32, %rd563, 0; 2026-02-21T12:44:28.6309201Z @%p32 bra $L__BB0_31; 2026-02-21T12:44:28.6309323Z bra.uni $L__BB0_30; 2026-02-21T12:44:28.6309435Z $L__BB0_31: // in Loop: Header=BB0_28 Depth=1 2026-02-21T12:44:28.6309499Z div.s64 %rd693, %rd151, %rd150; 2026-02-21T12:44:28.6309559Z bra.uni $L__BB0_32; 2026-02-21T12:44:28.6309664Z $L__BB0_30: // in Loop: Header=BB0_28 Depth=1 2026-02-21T12:44:28.6309726Z cvt.u32.u64 %r11372, %rd150; 2026-02-21T12:44:28.6309787Z cvt.u32.u64 %r11373, %rd151; 2026-02-21T12:44:28.6309853Z div.u32 %r11374, %r11373, %r11372; 2026-02-21T12:44:28.6309915Z cvt.u64.u32 %rd693, %r11374; 2026-02-21T12:44:28.6310017Z $L__BB0_32: // in Loop: Header=BB0_28 Depth=1 2026-02-21T12:44:28.6310216Z .loc 1 31 64 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:31:64 2026-02-21T12:44:28.6310296Z mul.lo.s64 %rd564, %rd693, %rd150; 2026-02-21T12:44:28.6310359Z sub.s64 %rd565, %rd151, %rd564; 2026-02-21T12:44:28.6310563Z .loc 1 31 30 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:31:30 2026-02-21T12:44:28.6310626Z add.s64 %rd566, %rd565, %rd149; 2026-02-21T12:44:28.6310823Z .loc 1 33 27 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:33:27 2026-02-21T12:44:28.6310886Z shl.b64 %rd694, %rd566, 7; 2026-02-21T12:44:28.6311082Z .loc 1 34 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:34:32 2026-02-21T12:44:28.6311143Z or.b64 %rd695, %rd694, %rd4; 2026-02-21T12:44:28.6311337Z .loc 1 35 27 // 
cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:35:27 2026-02-21T12:44:28.6311401Z shl.b64 %rd696, %rd693, 8; 2026-02-21T12:44:28.6311596Z .loc 1 36 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:36:32 2026-02-21T12:44:28.6311661Z or.b64 %rd697, %rd696, %rd21; 2026-02-21T12:44:28.6311769Z $L__BB0_33: // in Loop: Header=BB0_28 Depth=1 2026-02-21T12:44:28.6311979Z .loc 1 22 120 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:22:120 2026-02-21T12:44:28.6312045Z setp.eq.b64 %p36, %rd147, 0; 2026-02-21T12:44:28.6312114Z setp.lt.s64 %p37, %rd689, %rd117; 2026-02-21T12:44:28.6312175Z add.s32 %r12950, %r14346, 1; 2026-02-21T12:44:28.6312242Z setp.gt.s32 %p38, %r12950, 4; 2026-02-21T12:44:28.6312312Z selp.b32 %r14346, 0, %r12950, %p38; 2026-02-21T12:44:28.6312512Z .loc 1 44 35 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:44:35 2026-02-21T12:44:28.6312573Z cvt.s64.s32 %rd576, %r14345; 2026-02-21T12:44:28.6312637Z add.s64 %rd577, %rd576, %rd23; 2026-02-21T12:44:28.6312835Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6312970Z cp.async.wait_group 12; 2026-02-21T12:44:28.6313028Z bar.sync 0; 2026-02-21T12:44:28.6313091Z shl.b32 %r12951, %r14346, 11; 2026-02-21T12:44:28.6313200Z add.s32 %r12952, %r11895, %r12951; 2026-02-21T12:44:28.6313264Z add.s32 %r12953, %r12952, %r1063; 2026-02-21T12:44:28.6313461Z .loc 1 55 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:55:32 2026-02-21T12:44:28.6313539Z ld.shared.b16 %rs449, [%r12953+32768]; 2026-02-21T12:44:28.6313611Z ld.shared.b16 %rs450, [%r12953+32896]; 2026-02-21T12:44:28.6313680Z ld.shared.b16 %rs451, [%r12953+32776]; 2026-02-21T12:44:28.6313748Z ld.shared.b16 %rs452, [%r12953+32904]; 2026-02-21T12:44:28.6313812Z cvt.f32.bf16 %r11632, %rs449; 2026-02-21T12:44:28.6313873Z cvt.f32.bf16 %r11633, %rs450; 2026-02-21T12:44:28.6313933Z cvt.f32.bf16 %r11634, %rs451; 2026-02-21T12:44:28.6313997Z cvt.f32.bf16 %r11635, %rs452; 
2026-02-21T12:44:28.6314204Z .loc 1 57 34 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:34 2026-02-21T12:44:28.6314281Z mad.lo.s64 %rd578, %rd577, 1280, %rd166; 2026-02-21T12:44:28.6314396Z add.s64 %rd567, %rd578, %rd683; 2026-02-21T12:44:28.6314649Z .loc 1 57 87 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:87 2026-02-21T12:44:28.6314714Z // begin inline asm 2026-02-21T12:44:28.6314778Z mov.u32 %r11375, 0x0; 2026-02-21T12:44:28.6314859Z ld.global.b32 { %r11375 }, [ %rd567 + 0 ]; 2026-02-21T12:44:28.6314919Z // end inline asm 2026-02-21T12:44:28.6315128Z .loc 1 65 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:65:28 2026-02-21T12:44:28.6315198Z st.shared.b8 [%r1064], %r11375; 2026-02-21T12:44:28.6315268Z prmt.b32 %r12954, %r11375, 0, 0x7771U; 2026-02-21T12:44:28.6315338Z st.shared.b8 [%r1065+256], %r12954; 2026-02-21T12:44:28.6315406Z prmt.b32 %r12955, %r11375, 0, 0x7772U; 2026-02-21T12:44:28.6315471Z st.shared.b8 [%r1066+512], %r12955; 2026-02-21T12:44:28.6315540Z prmt.b32 %r12956, %r11375, 0, 0x7773U; 2026-02-21T12:44:28.6315608Z st.shared.b8 [%r1067+768], %r12956; 2026-02-21T12:44:28.6315665Z bar.sync 0; 2026-02-21T12:44:28.6315736Z ld.shared.b32 %r12957, [%r1068]; 2026-02-21T12:44:28.6315801Z prmt.b32 %r12958, %r12957, 0, 0x7771U; 2026-02-21T12:44:28.6315868Z cvt.u16.u32 %rs453, %r12958; 2026-02-21T12:44:28.6315932Z prmt.b32 %r12959, %r12957, 0, 0x7770U; 2026-02-21T12:44:28.6315993Z cvt.u16.u32 %rs454, %r12959; 2026-02-21T12:44:28.6316060Z prmt.b32 %r12960, %r12957, 0, 0x7773U; 2026-02-21T12:44:28.6316122Z cvt.u16.u32 %rs455, %r12960; 2026-02-21T12:44:28.6316187Z prmt.b32 %r12961, %r12957, 0, 0x7772U; 2026-02-21T12:44:28.6316248Z cvt.u16.u32 %rs456, %r12961; 2026-02-21T12:44:28.6316585Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6316655Z shl.b16 %rs457, %rs454, 4; 2026-02-21T12:44:28.6316717Z shl.b16 %rs458, %rs453, 4; 2026-02-21T12:44:28.6316931Z .loc 1 62 25 
// cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6316995Z cvt.u32.u16 %r12962, %rs457; 2026-02-21T12:44:28.6317081Z prmt.b32 %r12963, %r12962, %r12964, 0x3340U; 2026-02-21T12:44:28.6317164Z prmt.b32 %r12968, %r12963, %r12965, 0x5410U; 2026-02-21T12:44:28.6317249Z prmt.b32 %r12969, %r12968, %r12957, 0x5040U; 2026-02-21T12:44:28.6317319Z prmt.b32 %r12970, %r12969, 0, 0x9991U; 2026-02-21T12:44:28.6317382Z cvt.u16.u32 %rs459, %r12970; 2026-02-21T12:44:28.6317448Z shr.s16 %rs460, %rs459, 4; 2026-02-21T12:44:28.6317514Z prmt.b32 %r12971, %r12969, 0, 0xbbb3U; 2026-02-21T12:44:28.6317575Z cvt.u16.u32 %rs461, %r12971; 2026-02-21T12:44:28.6317641Z shr.s16 %rs462, %rs461, 4; 2026-02-21T12:44:28.6317703Z cvt.s16.s8 %rs463, %rs457; 2026-02-21T12:44:28.6317763Z shr.s16 %rs464, %rs463, 4; 2026-02-21T12:44:28.6317826Z cvt.s16.s8 %rs465, %rs458; 2026-02-21T12:44:28.6317886Z shr.s16 %rs466, %rs465, 4; 2026-02-21T12:44:28.6318192Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6318261Z cvt.rn.f32.s16 %r12972, %rs462; 2026-02-21T12:44:28.6318389Z cvt.rn.f32.s16 %r12973, %rs460; 2026-02-21T12:44:28.6318452Z cvt.rn.f32.s16 %r12974, %rs466; 2026-02-21T12:44:28.6318513Z cvt.rn.f32.s16 %r12975, %rs464; 2026-02-21T12:44:28.6318717Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6318780Z shl.b16 %rs467, %rs456, 4; 2026-02-21T12:44:28.6318842Z shl.b16 %rs468, %rs455, 4; 2026-02-21T12:44:28.6319044Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6319132Z prmt.b32 %r12976, %r12957, %r12977, 0x3020U; 2026-02-21T12:44:28.6319201Z prmt.b32 %r12978, %r12976, 0, 0x9991U; 2026-02-21T12:44:28.6319263Z cvt.u16.u32 %rs469, %r12978; 2026-02-21T12:44:28.6319328Z shr.s16 %rs470, %rs469, 4; 2026-02-21T12:44:28.6319391Z cvt.s16.s8 %rs471, %rs467; 2026-02-21T12:44:28.6319452Z shr.s16 %rs472, %rs471, 4; 
2026-02-21T12:44:28.6319516Z cvt.s16.s8 %rs473, %rs468; 2026-02-21T12:44:28.6319695Z shr.s16 %rs474, %rs473, 4; 2026-02-21T12:44:28.6319767Z prmt.b32 %r12979, %r12957, 0, 0xbbb3U; 2026-02-21T12:44:28.6319830Z cvt.u16.u32 %rs475, %r12979; 2026-02-21T12:44:28.6319894Z shr.s16 %rs476, %rs475, 4; 2026-02-21T12:44:28.6320092Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6320155Z cvt.rn.f32.s16 %r12980, %rs470; 2026-02-21T12:44:28.6320221Z cvt.rn.f32.s16 %r12981, %rs476; 2026-02-21T12:44:28.6320282Z cvt.rn.f32.s16 %r12982, %rs474; 2026-02-21T12:44:28.6320343Z cvt.rn.f32.s16 %r12983, %rs472; 2026-02-21T12:44:28.6320402Z bar.sync 0; 2026-02-21T12:44:28.6320530Z st.shared.v4.b32 [%r1069], {%r12975, %r12973, %r12974, %r12972}; 2026-02-21T12:44:28.6320649Z st.shared.v4.b32 [%r1070], {%r12983, %r12980, %r12982, %r12981}; 2026-02-21T12:44:28.6320707Z $L__tmp33: 2026-02-21T12:44:28.6320993Z .loc 2 291 36 // standard.py:291:36 @[ cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:87:40 ] 2026-02-21T12:44:28.6321058Z // begin inline asm 2026-02-21T12:44:28.6321140Z fence.proxy.async.shared::cta; 2026-02-21T12:44:28.6321199Z // end inline asm 2026-02-21T12:44:28.6321255Z bar.sync 0; 2026-02-21T12:44:28.6321341Z shfl.sync.idx.b32 %r12984, %r2, 0, 31, -1; 2026-02-21T12:44:28.6321415Z wgmma.fence.sync.aligned; 2026-02-21T12:44:28.6321482Z mov.pred %p33, -1; 2026-02-21T12:44:28.6321553Z // begin inline asm 2026-02-21T12:44:28.6324296Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14348,%r14349,%r14350,%r14351,%r14352,%r14353,%r14354,%r14355,%r14356,%r14357,%r14358,%r14359,%r14360,%r14361,%r14362,%r14363,%r14364,%r14365,%r14366,%r14367,%r14368,%r14369,%r14370,%r14371,%r14372,%r14373,%r14374,%r14375,%r14376,%r14377,%r14378,%r14379,%r14380,%r14381,%r14382,%r14383,%r14384,%r14385,%r14386,%r14387,%r14388,%r14389,%r14390,%r14391,%r14392,%r14393,%r14394,%r14395,%r14396,%r14397,%r14398,%r14399,%r14400,%r14401,%r14402,%r14403,%r14404,%r14405,%r14406,%r14407,%r14408,%r14409,%r14410,%r14411,%r14412,%r14413,%r14414,%r14415,%r14416,%r14417,%r14418,%r14419,%r14420,%r14421,%r14422,%r14423,%r14424,%r14425,%r14426,%r14427,%r14428,%r14429,%r14430,%r14431,%r14432,%r14433,%r14434,%r14435,%r14436,%r14437,%r14438,%r14439,%r14440,%r14441,%r14442,%r14443,%r14444,%r14445,%r14446,%r14447,%r14448,%r14449,%r14450,%r14451,%r14452,%r14453,%r14454,%r14455,%r14456,%r14457,%r14458,%r14459,%r14460,%r14461,%r14462,%r14463,%r14464,%r14465,%r14466,%r14467,%r14468,%r14469,%r14470,%r14471,%r14472,%r14473,%r14474,%r14475}, {%r11632,%r11633,%r11634,%r11635}, %rd591, %p33, 1, 1; 2026-02-21T12:44:28.6324362Z // end inline asm 2026-02-21T12:44:28.6324439Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:28.6324500Z mov.b32 %r12812, 0; 2026-02-21T12:44:28.6324564Z mov.b32 %r11764, %r11895; 2026-02-21T12:44:28.6324623Z mov.b32 %r11765, %r12812; 2026-02-21T12:44:28.6324742Z mov.b32 %r11766, %r12812; 2026-02-21T12:44:28.6324802Z // begin inline asm 2026-02-21T12:44:28.6327498Z // wait for regs: 
%r14348,%r14349,%r14350,%r14351,%r14352,%r14353,%r14354,%r14355,%r14356,%r14357,%r14358,%r14359,%r14360,%r14361,%r14362,%r14363,%r14364,%r14365,%r14366,%r14367,%r14368,%r14369,%r14370,%r14371,%r14372,%r14373,%r14374,%r14375,%r14376,%r14377,%r14378,%r14379,%r14380,%r14381,%r14382,%r14383,%r14384,%r14385,%r14386,%r14387,%r14388,%r14389,%r14390,%r14391,%r14392,%r14393,%r14394,%r14395,%r14396,%r14397,%r14398,%r14399,%r14400,%r14401,%r14402,%r14403,%r14404,%r14405,%r14406,%r14407,%r14408,%r14409,%r14410,%r14411,%r14412,%r14413,%r14414,%r14415,%r14416,%r14417,%r14418,%r14419,%r14420,%r14421,%r14422,%r14423,%r14424,%r14425,%r14426,%r14427,%r14428,%r14429,%r14430,%r14431,%r14432,%r14433,%r14434,%r14435,%r14436,%r14437,%r14438,%r14439,%r14440,%r14441,%r14442,%r14443,%r14444,%r14445,%r14446,%r14447,%r14448,%r14449,%r14450,%r14451,%r14452,%r14453,%r14454,%r14455,%r14456,%r14457,%r14458,%r14459,%r14460,%r14461,%r14462,%r14463,%r14464,%r14465,%r14466,%r14467,%r14468,%r14469,%r14470,%r14471,%r14472,%r14473,%r14474,%r14475,%r11764,%r11765,%r11766 2026-02-21T12:44:28.6327704Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:28.6327766Z // end inline asm 2026-02-21T12:44:28.6327825Z $L__tmp34: 2026-02-21T12:44:28.6328038Z .loc 1 44 35 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:44:35 2026-02-21T12:44:28.6328102Z cvt.s64.s32 %rd579, %r14340; 2026-02-21T12:44:28.6328168Z add.s64 %rd580, %rd579, %rd23; 2026-02-21T12:44:28.6328371Z .loc 1 55 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:55:32 2026-02-21T12:44:28.6328443Z ld.shared.b16 %rs477, [%r12953+43008]; 2026-02-21T12:44:28.6328511Z ld.shared.b16 %rs478, [%r12953+43136]; 2026-02-21T12:44:28.6328582Z ld.shared.b16 %rs479, [%r12953+43016]; 2026-02-21T12:44:28.6328649Z ld.shared.b16 %rs480, [%r12953+43144]; 2026-02-21T12:44:28.6328717Z cvt.f32.bf16 %r12155, %rs477; 2026-02-21T12:44:28.6328783Z cvt.f32.bf16 %r12156, %rs478; 2026-02-21T12:44:28.6328843Z cvt.f32.bf16 %r12157, %rs479; 
2026-02-21T12:44:28.6328920Z cvt.f32.bf16 %r12158, %rs480; 2026-02-21T12:44:28.6329124Z .loc 1 57 34 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:34 2026-02-21T12:44:28.6329201Z mad.lo.s64 %rd581, %rd580, 1280, %rd166; 2026-02-21T12:44:28.6329267Z add.s64 %rd569, %rd581, %rd683; 2026-02-21T12:44:28.6329466Z .loc 1 57 87 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:87 2026-02-21T12:44:28.6329528Z // begin inline asm 2026-02-21T12:44:28.6329587Z mov.u32 %r11898, 0x0; 2026-02-21T12:44:28.6329663Z ld.global.b32 { %r11898 }, [ %rd569 + 0 ]; 2026-02-21T12:44:28.6329722Z // end inline asm 2026-02-21T12:44:28.6329920Z .loc 1 65 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:65:28 2026-02-21T12:44:28.6329976Z bar.sync 0; 2026-02-21T12:44:28.6330044Z st.shared.b8 [%r1064], %r11898; 2026-02-21T12:44:28.6330115Z prmt.b32 %r12985, %r11898, 0, 0x7771U; 2026-02-21T12:44:28.6330189Z st.shared.b8 [%r1065+256], %r12985; 2026-02-21T12:44:28.6330256Z prmt.b32 %r12986, %r11898, 0, 0x7772U; 2026-02-21T12:44:28.6330325Z st.shared.b8 [%r1066+512], %r12986; 2026-02-21T12:44:28.6330391Z prmt.b32 %r12987, %r11898, 0, 0x7773U; 2026-02-21T12:44:28.6330458Z st.shared.b8 [%r1067+768], %r12987; 2026-02-21T12:44:28.6330514Z bar.sync 0; 2026-02-21T12:44:28.6330587Z ld.shared.b32 %r12988, [%r1068]; 2026-02-21T12:44:28.6330653Z prmt.b32 %r12989, %r12988, 0, 0x7771U; 2026-02-21T12:44:28.6330718Z cvt.u16.u32 %rs481, %r12989; 2026-02-21T12:44:28.6330784Z prmt.b32 %r12990, %r12988, 0, 0x7770U; 2026-02-21T12:44:28.6330846Z cvt.u16.u32 %rs482, %r12990; 2026-02-21T12:44:28.6330912Z prmt.b32 %r12991, %r12988, 0, 0x7773U; 2026-02-21T12:44:28.6330975Z cvt.u16.u32 %rs483, %r12991; 2026-02-21T12:44:28.6331040Z prmt.b32 %r12992, %r12988, 0, 0x7772U; 2026-02-21T12:44:28.6331188Z cvt.u16.u32 %rs484, %r12992; 2026-02-21T12:44:28.6331394Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6331504Z shl.b16 %rs485, %rs482, 
4; 2026-02-21T12:44:28.6331566Z shl.b16 %rs486, %rs481, 4; 2026-02-21T12:44:28.6331764Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6331835Z cvt.u32.u16 %r12993, %rs485; 2026-02-21T12:44:28.6331918Z prmt.b32 %r12994, %r12993, %r12995, 0x3340U; 2026-02-21T12:44:28.6331995Z prmt.b32 %r12996, %r12994, %r12965, 0x5410U; 2026-02-21T12:44:28.6332072Z prmt.b32 %r12997, %r12996, %r12988, 0x5040U; 2026-02-21T12:44:28.6332138Z prmt.b32 %r12998, %r12997, 0, 0x9991U; 2026-02-21T12:44:28.6332201Z cvt.u16.u32 %rs487, %r12998; 2026-02-21T12:44:28.6332260Z shr.s16 %rs488, %rs487, 4; 2026-02-21T12:44:28.6332327Z prmt.b32 %r12999, %r12997, 0, 0xbbb3U; 2026-02-21T12:44:28.6332390Z cvt.u16.u32 %rs489, %r12999; 2026-02-21T12:44:28.6332451Z shr.s16 %rs490, %rs489, 4; 2026-02-21T12:44:28.6332515Z cvt.s16.s8 %rs491, %rs485; 2026-02-21T12:44:28.6332668Z shr.s16 %rs492, %rs491, 4; 2026-02-21T12:44:28.6332731Z cvt.s16.s8 %rs493, %rs486; 2026-02-21T12:44:28.6332791Z shr.s16 %rs494, %rs493, 4; 2026-02-21T12:44:28.6332993Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6333060Z cvt.rn.f32.s16 %r13000, %rs490; 2026-02-21T12:44:28.6333124Z cvt.rn.f32.s16 %r13001, %rs488; 2026-02-21T12:44:28.6333190Z cvt.rn.f32.s16 %r13002, %rs494; 2026-02-21T12:44:28.6333251Z cvt.rn.f32.s16 %r13003, %rs492; 2026-02-21T12:44:28.6333448Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6333512Z shl.b16 %rs495, %rs484, 4; 2026-02-21T12:44:28.6333572Z shl.b16 %rs496, %rs483, 4; 2026-02-21T12:44:28.6333769Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6333847Z prmt.b32 %r13004, %r12988, %r13005, 0x3020U; 2026-02-21T12:44:28.6333920Z prmt.b32 %r13006, %r13004, 0, 0x9991U; 2026-02-21T12:44:28.6333993Z cvt.u16.u32 %rs497, %r13006; 2026-02-21T12:44:28.6334055Z shr.s16 %rs498, %rs497, 4; 2026-02-21T12:44:28.6334120Z 
cvt.s16.s8 %rs499, %rs495; 2026-02-21T12:44:28.6334182Z shr.s16 %rs500, %rs499, 4; 2026-02-21T12:44:28.6334245Z cvt.s16.s8 %rs501, %rs496; 2026-02-21T12:44:28.6334308Z shr.s16 %rs502, %rs501, 4; 2026-02-21T12:44:28.6334379Z prmt.b32 %r13007, %r12988, 0, 0xbbb3U; 2026-02-21T12:44:28.6334440Z cvt.u16.u32 %rs503, %r13007; 2026-02-21T12:44:28.6334500Z shr.s16 %rs504, %rs503, 4; 2026-02-21T12:44:28.6334702Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6334766Z cvt.rn.f32.s16 %r13008, %rs498; 2026-02-21T12:44:28.6334830Z cvt.rn.f32.s16 %r13009, %rs504; 2026-02-21T12:44:28.6334897Z cvt.rn.f32.s16 %r13010, %rs502; 2026-02-21T12:44:28.6334960Z cvt.rn.f32.s16 %r13011, %rs500; 2026-02-21T12:44:28.6335015Z bar.sync 0; 2026-02-21T12:44:28.6335142Z st.shared.v4.b32 [%r1069], {%r13003, %r13001, %r13002, %r13000}; 2026-02-21T12:44:28.6335260Z st.shared.v4.b32 [%r1070], {%r13011, %r13008, %r13010, %r13009}; 2026-02-21T12:44:28.6335314Z $L__tmp35: 2026-02-21T12:44:28.6335589Z .loc 2 291 36 // standard.py:291:36 @[ cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:87:40 ] 2026-02-21T12:44:28.6335655Z // begin inline asm 2026-02-21T12:44:28.6335734Z fence.proxy.async.shared::cta; 2026-02-21T12:44:28.6335790Z // end inline asm 2026-02-21T12:44:28.6335849Z bar.sync 0; 2026-02-21T12:44:28.6335921Z wgmma.fence.sync.aligned; 2026-02-21T12:44:28.6335980Z // begin inline asm 2026-02-21T12:44:28.6338841Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14348,%r14349,%r14350,%r14351,%r14352,%r14353,%r14354,%r14355,%r14356,%r14357,%r14358,%r14359,%r14360,%r14361,%r14362,%r14363,%r14364,%r14365,%r14366,%r14367,%r14368,%r14369,%r14370,%r14371,%r14372,%r14373,%r14374,%r14375,%r14376,%r14377,%r14378,%r14379,%r14380,%r14381,%r14382,%r14383,%r14384,%r14385,%r14386,%r14387,%r14388,%r14389,%r14390,%r14391,%r14392,%r14393,%r14394,%r14395,%r14396,%r14397,%r14398,%r14399,%r14400,%r14401,%r14402,%r14403,%r14404,%r14405,%r14406,%r14407,%r14408,%r14409,%r14410,%r14411,%r14412,%r14413,%r14414,%r14415,%r14416,%r14417,%r14418,%r14419,%r14420,%r14421,%r14422,%r14423,%r14424,%r14425,%r14426,%r14427,%r14428,%r14429,%r14430,%r14431,%r14432,%r14433,%r14434,%r14435,%r14436,%r14437,%r14438,%r14439,%r14440,%r14441,%r14442,%r14443,%r14444,%r14445,%r14446,%r14447,%r14448,%r14449,%r14450,%r14451,%r14452,%r14453,%r14454,%r14455,%r14456,%r14457,%r14458,%r14459,%r14460,%r14461,%r14462,%r14463,%r14464,%r14465,%r14466,%r14467,%r14468,%r14469,%r14470,%r14471,%r14472,%r14473,%r14474,%r14475}, {%r12155,%r12156,%r12157,%r12158}, %rd591, %p33, 1, 1; 2026-02-21T12:44:28.6339039Z // end inline asm 2026-02-21T12:44:28.6339121Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:28.6339184Z mov.b32 %r12287, %r11895; 2026-02-21T12:44:28.6339301Z mov.b32 %r12288, %r12812; 2026-02-21T12:44:28.6339434Z mov.b32 %r12289, %r12812; 2026-02-21T12:44:28.6339497Z // begin inline asm 2026-02-21T12:44:28.6342038Z // wait for regs: 
%r14348,%r14349,%r14350,%r14351,%r14352,%r14353,%r14354,%r14355,%r14356,%r14357,%r14358,%r14359,%r14360,%r14361,%r14362,%r14363,%r14364,%r14365,%r14366,%r14367,%r14368,%r14369,%r14370,%r14371,%r14372,%r14373,%r14374,%r14375,%r14376,%r14377,%r14378,%r14379,%r14380,%r14381,%r14382,%r14383,%r14384,%r14385,%r14386,%r14387,%r14388,%r14389,%r14390,%r14391,%r14392,%r14393,%r14394,%r14395,%r14396,%r14397,%r14398,%r14399,%r14400,%r14401,%r14402,%r14403,%r14404,%r14405,%r14406,%r14407,%r14408,%r14409,%r14410,%r14411,%r14412,%r14413,%r14414,%r14415,%r14416,%r14417,%r14418,%r14419,%r14420,%r14421,%r14422,%r14423,%r14424,%r14425,%r14426,%r14427,%r14428,%r14429,%r14430,%r14431,%r14432,%r14433,%r14434,%r14435,%r14436,%r14437,%r14438,%r14439,%r14440,%r14441,%r14442,%r14443,%r14444,%r14445,%r14446,%r14447,%r14448,%r14449,%r14450,%r14451,%r14452,%r14453,%r14454,%r14455,%r14456,%r14457,%r14458,%r14459,%r14460,%r14461,%r14462,%r14463,%r14464,%r14465,%r14466,%r14467,%r14468,%r14469,%r14470,%r14471,%r14472,%r14473,%r14474,%r14475,%r12287,%r12288,%r12289 2026-02-21T12:44:28.6342123Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:28.6342181Z // end inline asm 2026-02-21T12:44:28.6342234Z $L__tmp36: 2026-02-21T12:44:28.6342454Z .loc 1 44 35 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:44:35 2026-02-21T12:44:28.6342519Z cvt.s64.s32 %rd582, %r14335; 2026-02-21T12:44:28.6342586Z add.s64 %rd583, %rd582, %rd23; 2026-02-21T12:44:28.6342792Z .loc 1 55 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:55:32 2026-02-21T12:44:28.6342867Z ld.shared.b16 %rs505, [%r12953+53248]; 2026-02-21T12:44:28.6342936Z ld.shared.b16 %rs506, [%r12953+53376]; 2026-02-21T12:44:28.6343005Z ld.shared.b16 %rs507, [%r12953+53256]; 2026-02-21T12:44:28.6343091Z ld.shared.b16 %rs508, [%r12953+53384]; 2026-02-21T12:44:28.6343159Z cvt.f32.bf16 %r12678, %rs505; 2026-02-21T12:44:28.6343222Z cvt.f32.bf16 %r12679, %rs506; 2026-02-21T12:44:28.6343287Z cvt.f32.bf16 %r12680, %rs507; 
2026-02-21T12:44:28.6343348Z cvt.f32.bf16 %r12681, %rs508; 2026-02-21T12:44:28.6343553Z .loc 1 57 34 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:34 2026-02-21T12:44:28.6343630Z mad.lo.s64 %rd584, %rd583, 1280, %rd166; 2026-02-21T12:44:28.6343695Z add.s64 %rd571, %rd584, %rd683; 2026-02-21T12:44:28.6343893Z .loc 1 57 87 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:87 2026-02-21T12:44:28.6343953Z // begin inline asm 2026-02-21T12:44:28.6344015Z mov.u32 %r12421, 0x0; 2026-02-21T12:44:28.6344089Z ld.global.b32 { %r12421 }, [ %rd571 + 0 ]; 2026-02-21T12:44:28.6344149Z // end inline asm 2026-02-21T12:44:28.6344408Z .loc 1 65 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:65:28 2026-02-21T12:44:28.6344509Z bar.sync 0; 2026-02-21T12:44:28.6344576Z st.shared.b8 [%r1064], %r12421; 2026-02-21T12:44:28.6344647Z prmt.b32 %r13012, %r12421, 0, 0x7771U; 2026-02-21T12:44:28.6344719Z st.shared.b8 [%r1065+256], %r13012; 2026-02-21T12:44:28.6344786Z prmt.b32 %r13013, %r12421, 0, 0x7772U; 2026-02-21T12:44:28.6344855Z st.shared.b8 [%r1066+512], %r13013; 2026-02-21T12:44:28.6344923Z prmt.b32 %r13014, %r12421, 0, 0x7773U; 2026-02-21T12:44:28.6344987Z st.shared.b8 [%r1067+768], %r13014; 2026-02-21T12:44:28.6345042Z bar.sync 0; 2026-02-21T12:44:28.6345113Z ld.shared.b32 %r13015, [%r1068]; 2026-02-21T12:44:28.6345178Z prmt.b32 %r13016, %r13015, 0, 0x7771U; 2026-02-21T12:44:28.6345239Z cvt.u16.u32 %rs509, %r13016; 2026-02-21T12:44:28.6345303Z prmt.b32 %r13017, %r13015, 0, 0x7770U; 2026-02-21T12:44:28.6345367Z cvt.u16.u32 %rs510, %r13017; 2026-02-21T12:44:28.6345432Z prmt.b32 %r13018, %r13015, 0, 0x7773U; 2026-02-21T12:44:28.6345495Z cvt.u16.u32 %rs511, %r13018; 2026-02-21T12:44:28.6345670Z prmt.b32 %r13019, %r13015, 0, 0x7772U; 2026-02-21T12:44:28.6345737Z cvt.u16.u32 %rs512, %r13019; 2026-02-21T12:44:28.6345939Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6346003Z shl.b16 %rs513, %rs510, 
4; 2026-02-21T12:44:28.6346069Z shl.b16 %rs514, %rs509, 4; 2026-02-21T12:44:28.6346267Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6346329Z cvt.u32.u16 %r13020, %rs513; 2026-02-21T12:44:28.6346411Z prmt.b32 %r13021, %r13020, %r13022, 0x3340U; 2026-02-21T12:44:28.6346607Z prmt.b32 %r13023, %r13021, %r12965, 0x5410U; 2026-02-21T12:44:28.6346696Z prmt.b32 %r13024, %r13023, %r13015, 0x5040U; 2026-02-21T12:44:28.6346765Z prmt.b32 %r13025, %r13024, 0, 0x9991U; 2026-02-21T12:44:28.6346830Z cvt.u16.u32 %rs515, %r13025; 2026-02-21T12:44:28.6346890Z shr.s16 %rs516, %rs515, 4; 2026-02-21T12:44:28.6346956Z prmt.b32 %r13026, %r13024, 0, 0xbbb3U; 2026-02-21T12:44:28.6347024Z cvt.u16.u32 %rs517, %r13026; 2026-02-21T12:44:28.6347085Z shr.s16 %rs518, %rs517, 4; 2026-02-21T12:44:28.6347146Z cvt.s16.s8 %rs519, %rs513; 2026-02-21T12:44:28.6347210Z shr.s16 %rs520, %rs519, 4; 2026-02-21T12:44:28.6347272Z cvt.s16.s8 %rs521, %rs514; 2026-02-21T12:44:28.6347333Z shr.s16 %rs522, %rs521, 4; 2026-02-21T12:44:28.6347537Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6347603Z cvt.rn.f32.s16 %r13027, %rs518; 2026-02-21T12:44:28.6347667Z cvt.rn.f32.s16 %r13028, %rs516; 2026-02-21T12:44:28.6347728Z cvt.rn.f32.s16 %r13029, %rs522; 2026-02-21T12:44:28.6347797Z cvt.rn.f32.s16 %r13030, %rs520; 2026-02-21T12:44:28.6347994Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6348068Z shl.b16 %rs523, %rs512, 4; 2026-02-21T12:44:28.6348134Z shl.b16 %rs524, %rs511, 4; 2026-02-21T12:44:28.6348335Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6348414Z prmt.b32 %r13031, %r13015, %r13032, 0x3020U; 2026-02-21T12:44:28.6348483Z prmt.b32 %r13033, %r13031, 0, 0x9991U; 2026-02-21T12:44:28.6348644Z cvt.u16.u32 %rs525, %r13033; 2026-02-21T12:44:28.6348706Z shr.s16 %rs526, %rs525, 4; 2026-02-21T12:44:28.6348767Z 
cvt.s16.s8 %rs527, %rs523; 2026-02-21T12:44:28.6348831Z shr.s16 %rs528, %rs527, 4; 2026-02-21T12:44:28.6348892Z cvt.s16.s8 %rs529, %rs524; 2026-02-21T12:44:28.6348951Z shr.s16 %rs530, %rs529, 4; 2026-02-21T12:44:28.6349020Z prmt.b32 %r13034, %r13015, 0, 0xbbb3U; 2026-02-21T12:44:28.6349080Z cvt.u16.u32 %rs531, %r13034; 2026-02-21T12:44:28.6349140Z shr.s16 %rs532, %rs531, 4; 2026-02-21T12:44:28.6349338Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6349488Z cvt.rn.f32.s16 %r13035, %rs526; 2026-02-21T12:44:28.6349552Z cvt.rn.f32.s16 %r13036, %rs532; 2026-02-21T12:44:28.6349673Z cvt.rn.f32.s16 %r13037, %rs530; 2026-02-21T12:44:28.6349738Z cvt.rn.f32.s16 %r13038, %rs528; 2026-02-21T12:44:28.6349793Z bar.sync 0; 2026-02-21T12:44:28.6349915Z st.shared.v4.b32 [%r1069], {%r13030, %r13028, %r13029, %r13027}; 2026-02-21T12:44:28.6350034Z st.shared.v4.b32 [%r1070], {%r13038, %r13035, %r13037, %r13036}; 2026-02-21T12:44:28.6350090Z $L__tmp37: 2026-02-21T12:44:28.6350367Z .loc 2 291 36 // standard.py:291:36 @[ cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:87:40 ] 2026-02-21T12:44:28.6350429Z // begin inline asm 2026-02-21T12:44:28.6350510Z fence.proxy.async.shared::cta; 2026-02-21T12:44:28.6350566Z // end inline asm 2026-02-21T12:44:28.6350621Z bar.sync 0; 2026-02-21T12:44:28.6350696Z wgmma.fence.sync.aligned; 2026-02-21T12:44:28.6350769Z // begin inline asm 2026-02-21T12:44:28.6353627Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14348,%r14349,%r14350,%r14351,%r14352,%r14353,%r14354,%r14355,%r14356,%r14357,%r14358,%r14359,%r14360,%r14361,%r14362,%r14363,%r14364,%r14365,%r14366,%r14367,%r14368,%r14369,%r14370,%r14371,%r14372,%r14373,%r14374,%r14375,%r14376,%r14377,%r14378,%r14379,%r14380,%r14381,%r14382,%r14383,%r14384,%r14385,%r14386,%r14387,%r14388,%r14389,%r14390,%r14391,%r14392,%r14393,%r14394,%r14395,%r14396,%r14397,%r14398,%r14399,%r14400,%r14401,%r14402,%r14403,%r14404,%r14405,%r14406,%r14407,%r14408,%r14409,%r14410,%r14411,%r14412,%r14413,%r14414,%r14415,%r14416,%r14417,%r14418,%r14419,%r14420,%r14421,%r14422,%r14423,%r14424,%r14425,%r14426,%r14427,%r14428,%r14429,%r14430,%r14431,%r14432,%r14433,%r14434,%r14435,%r14436,%r14437,%r14438,%r14439,%r14440,%r14441,%r14442,%r14443,%r14444,%r14445,%r14446,%r14447,%r14448,%r14449,%r14450,%r14451,%r14452,%r14453,%r14454,%r14455,%r14456,%r14457,%r14458,%r14459,%r14460,%r14461,%r14462,%r14463,%r14464,%r14465,%r14466,%r14467,%r14468,%r14469,%r14470,%r14471,%r14472,%r14473,%r14474,%r14475}, {%r12678,%r12679,%r12680,%r12681}, %rd591, %p33, 1, 1; 2026-02-21T12:44:28.6353697Z // end inline asm 2026-02-21T12:44:28.6353772Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:28.6353834Z mov.b32 %r12810, %r11895; 2026-02-21T12:44:28.6353894Z mov.b32 %r12811, %r12812; 2026-02-21T12:44:28.6353952Z // begin inline asm 2026-02-21T12:44:28.6356579Z // wait for regs: 
%r14348,%r14349,%r14350,%r14351,%r14352,%r14353,%r14354,%r14355,%r14356,%r14357,%r14358,%r14359,%r14360,%r14361,%r14362,%r14363,%r14364,%r14365,%r14366,%r14367,%r14368,%r14369,%r14370,%r14371,%r14372,%r14373,%r14374,%r14375,%r14376,%r14377,%r14378,%r14379,%r14380,%r14381,%r14382,%r14383,%r14384,%r14385,%r14386,%r14387,%r14388,%r14389,%r14390,%r14391,%r14392,%r14393,%r14394,%r14395,%r14396,%r14397,%r14398,%r14399,%r14400,%r14401,%r14402,%r14403,%r14404,%r14405,%r14406,%r14407,%r14408,%r14409,%r14410,%r14411,%r14412,%r14413,%r14414,%r14415,%r14416,%r14417,%r14418,%r14419,%r14420,%r14421,%r14422,%r14423,%r14424,%r14425,%r14426,%r14427,%r14428,%r14429,%r14430,%r14431,%r14432,%r14433,%r14434,%r14435,%r14436,%r14437,%r14438,%r14439,%r14440,%r14441,%r14442,%r14443,%r14444,%r14445,%r14446,%r14447,%r14448,%r14449,%r14450,%r14451,%r14452,%r14453,%r14454,%r14455,%r14456,%r14457,%r14458,%r14459,%r14460,%r14461,%r14462,%r14463,%r14464,%r14465,%r14466,%r14467,%r14468,%r14469,%r14470,%r14471,%r14472,%r14473,%r14474,%r14475,%r12810,%r12811,%r12812 2026-02-21T12:44:28.6356663Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:28.6356736Z // end inline asm 2026-02-21T12:44:28.6356793Z $L__tmp38: 2026-02-21T12:44:28.6357010Z .loc 1 22 120 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:22:120 2026-02-21T12:44:28.6357072Z add.s32 %r13039, %r14342, 12; 2026-02-21T12:44:28.6357137Z add.s32 %r13040, %r14347, 1; 2026-02-21T12:44:28.6357203Z setp.gt.s32 %p39, %r13040, 4; 2026-02-21T12:44:28.6357272Z selp.b32 %r14347, 0, %r13040, %p39; 2026-02-21T12:44:28.6357413Z selp.b32 %r14341, 0, %r13039, %p36; 2026-02-21T12:44:28.6357617Z .loc 1 48 22 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:48:22 2026-02-21T12:44:28.6357735Z shl.b32 %r13041, %r14341, 1; 2026-02-21T12:44:28.6357938Z .loc 1 50 26 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:50:26 2026-02-21T12:44:28.6358002Z add.s32 %r13042, %r13041, %r7; 2026-02-21T12:44:28.6358202Z .loc 1 51 32 // 
cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:32 2026-02-21T12:44:28.6358264Z shl.b64 %rd585, %rd695, 14; 2026-02-21T12:44:28.6358330Z add.s64 %rd586, %rd165, %rd585; 2026-02-21T12:44:28.6358404Z mad.wide.s32 %rd573, %r13042, 2, %rd586; 2026-02-21T12:44:28.6358599Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6358664Z shl.b32 %r13043, %r14347, 11; 2026-02-21T12:44:28.6358728Z add.s32 %r12944, %r11276, %r13043; 2026-02-21T12:44:28.6358793Z selp.b32 %r12945, 8, 0, %p37; 2026-02-21T12:44:28.6358856Z // begin inline asm 2026-02-21T12:44:28.6359121Z cp.async.ca.shared.global [ %r12944 + 0 ], [ %rd573 + 0 ], 0x8, %r12945; 2026-02-21T12:44:28.6359182Z // end inline asm 2026-02-21T12:44:28.6359248Z cp.async.commit_group; 2026-02-21T12:44:28.6359460Z .loc 1 43 126 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:43:126 2026-02-21T12:44:28.6359522Z add.s32 %r14336, %r14341, 4; 2026-02-21T12:44:28.6359719Z .loc 1 48 22 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:48:22 2026-02-21T12:44:28.6359783Z shl.b32 %r13044, %r14336, 1; 2026-02-21T12:44:28.6359979Z .loc 1 50 26 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:50:26 2026-02-21T12:44:28.6360042Z add.s32 %r13045, %r13044, %r7; 2026-02-21T12:44:28.6360240Z .loc 1 51 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:32 2026-02-21T12:44:28.6360315Z mad.wide.s32 %rd574, %r13045, 2, %rd586; 2026-02-21T12:44:28.6360525Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6360592Z add.s32 %r12946, %r11278, %r13043; 2026-02-21T12:44:28.6360658Z // begin inline asm 2026-02-21T12:44:28.6360802Z cp.async.ca.shared.global [ %r12946 + 0 ], [ %rd574 + 0 ], 0x8, %r12945; 2026-02-21T12:44:28.6360859Z // end inline asm 2026-02-21T12:44:28.6360927Z cp.async.commit_group; 2026-02-21T12:44:28.6361135Z .loc 1 43 126 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:43:126 
2026-02-21T12:44:28.6361196Z add.s32 %r14331, %r14341, 8; 2026-02-21T12:44:28.6361396Z .loc 1 48 22 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:48:22 2026-02-21T12:44:28.6361456Z shl.b32 %r13046, %r14331, 1; 2026-02-21T12:44:28.6361651Z .loc 1 50 26 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:50:26 2026-02-21T12:44:28.6361716Z add.s32 %r13047, %r13046, %r7; 2026-02-21T12:44:28.6361915Z .loc 1 51 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:32 2026-02-21T12:44:28.6361990Z mad.wide.s32 %rd575, %r13047, 2, %rd586; 2026-02-21T12:44:28.6362193Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6362258Z add.s32 %r12948, %r11280, %r13043; 2026-02-21T12:44:28.6362318Z // begin inline asm 2026-02-21T12:44:28.6362456Z cp.async.ca.shared.global [ %r12948 + 0 ], [ %rd575 + 0 ], 0x8, %r12945; 2026-02-21T12:44:28.6362515Z // end inline asm 2026-02-21T12:44:28.6362582Z cp.async.commit_group; 2026-02-21T12:44:28.6362788Z .loc 1 22 120 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:22:120 2026-02-21T12:44:28.6362858Z setp.ne.b64 %p40, %rd679, 340; 2026-02-21T12:44:28.6362920Z @%p40 bra $L__BB0_35; 2026-02-21T12:44:28.6363091Z // %bb.34: // in Loop: Header=BB0_28 Depth=1 2026-02-21T12:44:28.6363307Z .loc 1 34 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:34:32 2026-02-21T12:44:28.6363421Z add.s64 %rd608, %rd675, %rd5; 2026-02-21T12:44:28.6363496Z add.s64 %rd609, %rd675, %rd6; 2026-02-21T12:44:28.6363559Z add.s64 %rd610, %rd675, %rd7; 2026-02-21T12:44:28.6363622Z add.s64 %rd611, %rd675, %rd8; 2026-02-21T12:44:28.6363685Z add.s64 %rd612, %rd675, %rd9; 2026-02-21T12:44:28.6363748Z add.s64 %rd613, %rd675, %rd10; 2026-02-21T12:44:28.6363811Z add.s64 %rd614, %rd675, %rd11; 2026-02-21T12:44:28.6363875Z add.s64 %rd615, %rd675, %rd12; 2026-02-21T12:44:28.6363937Z add.s64 %rd616, %rd675, %rd13; 2026-02-21T12:44:28.6363999Z add.s64 %rd617, %rd675, %rd14; 
2026-02-21T12:44:28.6364063Z add.s64 %rd618, %rd675, %rd15; 2026-02-21T12:44:28.6364123Z add.s64 %rd619, %rd675, %rd16; 2026-02-21T12:44:28.6364185Z add.s64 %rd620, %rd675, %rd17; 2026-02-21T12:44:28.6364252Z add.s64 %rd621, %rd675, %rd18; 2026-02-21T12:44:28.6364314Z add.s64 %rd622, %rd675, %rd19; 2026-02-21T12:44:28.6364446Z add.s64 %rd623, %rd675, %rd20; 2026-02-21T12:44:28.6364698Z .loc 1 36 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:36:32 2026-02-21T12:44:28.6364765Z add.s64 %rd624, %rd671, %rd22; 2026-02-21T12:44:28.6364962Z .loc 1 51 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:32 2026-02-21T12:44:28.6365024Z shl.b64 %rd625, %rd691, 14; 2026-02-21T12:44:28.6365090Z add.s64 %rd588, %rd25, %rd625; 2026-02-21T12:44:28.6365284Z .loc 1 51 80 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:51:80 2026-02-21T12:44:28.6365344Z // begin inline asm 2026-02-21T12:44:28.6365406Z mov.u64 %rd589, 0x0; 2026-02-21T12:44:28.6365546Z createpolicy.fractional.L2::evict_last.b64 %rd589, 1.0; 2026-02-21T12:44:28.6365609Z // end inline asm 2026-02-21T12:44:28.6365672Z // begin inline asm 2026-02-21T12:44:28.6365734Z mov.u32 %r13048, 0x0; 2026-02-21T12:44:28.6365793Z mov.u32 %r13049, 0x0; 2026-02-21T12:44:28.6365993Z ld.global.L1::evict_last.L2::cache_hint.v2.b32 { %r13048, %r13049 }, [ %rd588 + 0 ], %rd589; 2026-02-21T12:44:28.6366053Z // end inline asm 2026-02-21T12:44:28.6366255Z .loc 1 55 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:55:32 2026-02-21T12:44:28.6366311Z bar.sync 0; 2026-02-21T12:44:28.6366404Z st.shared.v2.b32 [%r11307], {%r13048, %r13049}; 2026-02-21T12:44:28.6366574Z bar.sync 0; 2026-02-21T12:44:28.6366647Z ld.shared.b16 %rs533, [%r1072]; 2026-02-21T12:44:28.6366717Z ld.shared.b16 %rs534, [%r1072+128]; 2026-02-21T12:44:28.6366789Z ld.shared.b16 %rs535, [%r1072+8]; 2026-02-21T12:44:28.6366856Z ld.shared.b16 %rs536, [%r1072+136]; 2026-02-21T12:44:28.6366919Z cvt.f32.bf16 %r13307, %rs533; 
2026-02-21T12:44:28.6366983Z cvt.f32.bf16 %r13308, %rs534; 2026-02-21T12:44:28.6367044Z cvt.f32.bf16 %r13309, %rs535; 2026-02-21T12:44:28.6367108Z cvt.f32.bf16 %r13310, %rs536; 2026-02-21T12:44:28.6367311Z .loc 1 57 34 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:34 2026-02-21T12:44:28.6367381Z add.s64 %rd626, %rd645, %rd683; 2026-02-21T12:44:28.6367445Z add.s64 %rd590, %rd626, 5237760; 2026-02-21T12:44:28.6367655Z .loc 1 57 87 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:57:87 2026-02-21T12:44:28.6367720Z // begin inline asm 2026-02-21T12:44:28.6367778Z mov.u32 %r13050, 0x0; 2026-02-21T12:44:28.6367853Z ld.global.b32 { %r13050 }, [ %rd590 + 0 ]; 2026-02-21T12:44:28.6367915Z // end inline asm 2026-02-21T12:44:28.6368113Z .loc 1 65 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:65:28 2026-02-21T12:44:28.6368169Z bar.sync 0; 2026-02-21T12:44:28.6368236Z st.shared.b8 [%r1064], %r13050; 2026-02-21T12:44:28.6368310Z prmt.b32 %r13712, %r13050, 0, 0x7771U; 2026-02-21T12:44:28.6368469Z st.shared.b8 [%r1065+256], %r13712; 2026-02-21T12:44:28.6368539Z prmt.b32 %r13713, %r13050, 0, 0x7772U; 2026-02-21T12:44:28.6368609Z st.shared.b8 [%r1066+512], %r13713; 2026-02-21T12:44:28.6368742Z prmt.b32 %r13714, %r13050, 0, 0x7773U; 2026-02-21T12:44:28.6368808Z st.shared.b8 [%r1067+768], %r13714; 2026-02-21T12:44:28.6368862Z bar.sync 0; 2026-02-21T12:44:28.6368930Z ld.shared.b32 %r13715, [%r1068]; 2026-02-21T12:44:28.6368996Z prmt.b32 %r13716, %r13715, 0, 0x7771U; 2026-02-21T12:44:28.6369060Z cvt.u16.u32 %rs537, %r13716; 2026-02-21T12:44:28.6369130Z prmt.b32 %r13717, %r13715, 0, 0x7770U; 2026-02-21T12:44:28.6369192Z cvt.u16.u32 %rs538, %r13717; 2026-02-21T12:44:28.6369256Z prmt.b32 %r13718, %r13715, 0, 0x7773U; 2026-02-21T12:44:28.6369321Z cvt.u16.u32 %rs539, %r13718; 2026-02-21T12:44:28.6369386Z prmt.b32 %r13719, %r13715, 0, 0x7772U; 2026-02-21T12:44:28.6369446Z cvt.u16.u32 %rs540, %r13719; 2026-02-21T12:44:28.6369649Z .loc 1 60 28 // 
cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6369719Z shl.b16 %rs541, %rs538, 4; 2026-02-21T12:44:28.6369781Z shl.b16 %rs542, %rs537, 4; 2026-02-21T12:44:28.6370116Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6370188Z cvt.u32.u16 %r13720, %rs541; 2026-02-21T12:44:28.6372795Z prmt.b32 %r13721, %r13720, %r13722, 0x3340U; 2026-02-21T12:44:28.6372925Z prmt.b32 %r13726, %r13721, %r13723, 0x5410U; 2026-02-21T12:44:28.6373013Z prmt.b32 %r13727, %r13726, %r13715, 0x5040U; 2026-02-21T12:44:28.6373084Z prmt.b32 %r13728, %r13727, 0, 0x9991U; 2026-02-21T12:44:28.6373166Z cvt.u16.u32 %rs543, %r13728; 2026-02-21T12:44:28.6373233Z shr.s16 %rs544, %rs543, 4; 2026-02-21T12:44:28.6373307Z prmt.b32 %r13729, %r13727, 0, 0xbbb3U; 2026-02-21T12:44:28.6373371Z cvt.u16.u32 %rs545, %r13729; 2026-02-21T12:44:28.6373438Z shr.s16 %rs546, %rs545, 4; 2026-02-21T12:44:28.6373502Z cvt.s16.s8 %rs547, %rs541; 2026-02-21T12:44:28.6373567Z shr.s16 %rs548, %rs547, 4; 2026-02-21T12:44:28.6373628Z cvt.s16.s8 %rs549, %rs542; 2026-02-21T12:44:28.6373692Z shr.s16 %rs550, %rs549, 4; 2026-02-21T12:44:28.6373929Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6374003Z cvt.rn.f32.s16 %r13730, %rs546; 2026-02-21T12:44:28.6374067Z cvt.rn.f32.s16 %r13731, %rs544; 2026-02-21T12:44:28.6374130Z cvt.rn.f32.s16 %r13732, %rs550; 2026-02-21T12:44:28.6374190Z cvt.rn.f32.s16 %r13733, %rs548; 2026-02-21T12:44:28.6374405Z .loc 1 60 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:60:28 2026-02-21T12:44:28.6374469Z shl.b16 %rs551, %rs540, 4; 2026-02-21T12:44:28.6374529Z shl.b16 %rs552, %rs539, 4; 2026-02-21T12:44:28.6374741Z .loc 1 62 25 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:62:25 2026-02-21T12:44:28.6374828Z prmt.b32 %r13734, %r13715, %r13735, 0x3020U; 2026-02-21T12:44:28.6374902Z prmt.b32 %r13736, %r13734, 0, 0x9991U; 2026-02-21T12:44:28.6374970Z 
cvt.u16.u32 %rs553, %r13736; 2026-02-21T12:44:28.6375035Z shr.s16 %rs554, %rs553, 4; 2026-02-21T12:44:28.6375101Z cvt.s16.s8 %rs555, %rs551; 2026-02-21T12:44:28.6375161Z shr.s16 %rs556, %rs555, 4; 2026-02-21T12:44:28.6375225Z cvt.s16.s8 %rs557, %rs552; 2026-02-21T12:44:28.6375284Z shr.s16 %rs558, %rs557, 4; 2026-02-21T12:44:28.6375353Z prmt.b32 %r13737, %r13715, 0, 0xbbb3U; 2026-02-21T12:44:28.6375419Z cvt.u16.u32 %rs559, %r13737; 2026-02-21T12:44:28.6375479Z shr.s16 %rs560, %rs559, 4; 2026-02-21T12:44:28.6375692Z .loc 1 80 32 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:80:32 2026-02-21T12:44:28.6375761Z cvt.rn.f32.s16 %r13738, %rs554; 2026-02-21T12:44:28.6375827Z cvt.rn.f32.s16 %r13739, %rs560; 2026-02-21T12:44:28.6375890Z cvt.rn.f32.s16 %r13740, %rs558; 2026-02-21T12:44:28.6375951Z cvt.rn.f32.s16 %r13741, %rs556; 2026-02-21T12:44:28.6376010Z bar.sync 0; 2026-02-21T12:44:28.6376232Z st.shared.v4.b32 [%r1069], {%r13733, %r13731, %r13732, %r13730}; 2026-02-21T12:44:28.6376352Z st.shared.v4.b32 [%r1070], {%r13741, %r13738, %r13740, %r13739}; 2026-02-21T12:44:28.6376623Z $L__tmp39: 2026-02-21T12:44:28.6376913Z .loc 2 291 36 // standard.py:291:36 @[ cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:87:40 ] 2026-02-21T12:44:28.6376975Z // begin inline asm 2026-02-21T12:44:28.6377060Z fence.proxy.async.shared::cta; 2026-02-21T12:44:28.6377131Z // end inline asm 2026-02-21T12:44:28.6377188Z bar.sync 0; 2026-02-21T12:44:28.6377262Z wgmma.fence.sync.aligned; 2026-02-21T12:44:28.6377325Z // begin inline asm 2026-02-21T12:44:28.6380140Z wgmma.mma_async.sync.aligned.m64n256k8.f32.tf32.tf32 
{%r14348,%r14349,%r14350,%r14351,%r14352,%r14353,%r14354,%r14355,%r14356,%r14357,%r14358,%r14359,%r14360,%r14361,%r14362,%r14363,%r14364,%r14365,%r14366,%r14367,%r14368,%r14369,%r14370,%r14371,%r14372,%r14373,%r14374,%r14375,%r14376,%r14377,%r14378,%r14379,%r14380,%r14381,%r14382,%r14383,%r14384,%r14385,%r14386,%r14387,%r14388,%r14389,%r14390,%r14391,%r14392,%r14393,%r14394,%r14395,%r14396,%r14397,%r14398,%r14399,%r14400,%r14401,%r14402,%r14403,%r14404,%r14405,%r14406,%r14407,%r14408,%r14409,%r14410,%r14411,%r14412,%r14413,%r14414,%r14415,%r14416,%r14417,%r14418,%r14419,%r14420,%r14421,%r14422,%r14423,%r14424,%r14425,%r14426,%r14427,%r14428,%r14429,%r14430,%r14431,%r14432,%r14433,%r14434,%r14435,%r14436,%r14437,%r14438,%r14439,%r14440,%r14441,%r14442,%r14443,%r14444,%r14445,%r14446,%r14447,%r14448,%r14449,%r14450,%r14451,%r14452,%r14453,%r14454,%r14455,%r14456,%r14457,%r14458,%r14459,%r14460,%r14461,%r14462,%r14463,%r14464,%r14465,%r14466,%r14467,%r14468,%r14469,%r14470,%r14471,%r14472,%r14473,%r14474,%r14475}, {%r13307,%r13308,%r13309,%r13310}, %rd591, %p33, 1, 1; 2026-02-21T12:44:28.6380213Z // end inline asm 2026-02-21T12:44:28.6380292Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:28.6380354Z // begin inline asm 2026-02-21T12:44:28.6382774Z // wait for regs: 
%r14348,%r14349,%r14350,%r14351,%r14352,%r14353,%r14354,%r14355,%r14356,%r14357,%r14358,%r14359,%r14360,%r14361,%r14362,%r14363,%r14364,%r14365,%r14366,%r14367,%r14368,%r14369,%r14370,%r14371,%r14372,%r14373,%r14374,%r14375,%r14376,%r14377,%r14378,%r14379,%r14380,%r14381,%r14382,%r14383,%r14384,%r14385,%r14386,%r14387,%r14388,%r14389,%r14390,%r14391,%r14392,%r14393,%r14394,%r14395,%r14396,%r14397,%r14398,%r14399,%r14400,%r14401,%r14402,%r14403,%r14404,%r14405,%r14406,%r14407,%r14408,%r14409,%r14410,%r14411,%r14412,%r14413,%r14414,%r14415,%r14416,%r14417,%r14418,%r14419,%r14420,%r14421,%r14422,%r14423,%r14424,%r14425,%r14426,%r14427,%r14428,%r14429,%r14430,%r14431,%r14432,%r14433,%r14434,%r14435,%r14436,%r14437,%r14438,%r14439,%r14440,%r14441,%r14442,%r14443,%r14444,%r14445,%r14446,%r14447,%r14448,%r14449,%r14450,%r14451,%r14452,%r14453,%r14454,%r14455,%r14456,%r14457,%r14458,%r14459,%r14460,%r14461,%r14462,%r14463,%r14464,%r14465,%r14466,%r14467,%r14468,%r14469,%r14470,%r14471,%r14472,%r14473,%r14474,%r14475 2026-02-21T12:44:28.6382856Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:28.6382916Z // end inline asm 2026-02-21T12:44:28.6382974Z $L__tmp40: 2026-02-21T12:44:28.6383193Z .loc 1 90 28 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:90:28 2026-02-21T12:44:28.6383296Z cvt.rn.bf16x2.f32 %r13742, %r14349, %r14348; 2026-02-21T12:44:28.6383377Z cvt.rn.bf16x2.f32 %r13743, %r14351, %r14350; 2026-02-21T12:44:28.6383453Z cvt.rn.bf16x2.f32 %r13744, %r14353, %r14352; 2026-02-21T12:44:28.6383527Z cvt.rn.bf16x2.f32 %r13745, %r14355, %r14354; 2026-02-21T12:44:28.6383605Z cvt.rn.bf16x2.f32 %r13746, %r14357, %r14356; 2026-02-21T12:44:28.6383679Z cvt.rn.bf16x2.f32 %r13747, %r14359, %r14358; 2026-02-21T12:44:28.6383754Z cvt.rn.bf16x2.f32 %r13748, %r14361, %r14360; 2026-02-21T12:44:28.6383830Z cvt.rn.bf16x2.f32 %r13749, %r14363, %r14362; 2026-02-21T12:44:28.6383904Z cvt.rn.bf16x2.f32 %r13750, %r14365, %r14364; 2026-02-21T12:44:28.6384049Z cvt.rn.bf16x2.f32 
%r13751, %r14367, %r14366; 2026-02-21T12:44:28.6384129Z cvt.rn.bf16x2.f32 %r13752, %r14369, %r14368; 2026-02-21T12:44:28.6384207Z cvt.rn.bf16x2.f32 %r13753, %r14371, %r14370; 2026-02-21T12:44:28.6384342Z cvt.rn.bf16x2.f32 %r13754, %r14373, %r14372; 2026-02-21T12:44:28.6384419Z cvt.rn.bf16x2.f32 %r13755, %r14375, %r14374; 2026-02-21T12:44:28.6384497Z cvt.rn.bf16x2.f32 %r13756, %r14377, %r14376; 2026-02-21T12:44:28.6384571Z cvt.rn.bf16x2.f32 %r13757, %r14379, %r14378; 2026-02-21T12:44:28.6384645Z cvt.rn.bf16x2.f32 %r13758, %r14381, %r14380; 2026-02-21T12:44:28.6384723Z cvt.rn.bf16x2.f32 %r13759, %r14383, %r14382; 2026-02-21T12:44:28.6384797Z cvt.rn.bf16x2.f32 %r13760, %r14385, %r14384; 2026-02-21T12:44:28.6384883Z cvt.rn.bf16x2.f32 %r13761, %r14387, %r14386; 2026-02-21T12:44:28.6384961Z cvt.rn.bf16x2.f32 %r13762, %r14389, %r14388; 2026-02-21T12:44:28.6385038Z cvt.rn.bf16x2.f32 %r13763, %r14391, %r14390; 2026-02-21T12:44:28.6385114Z cvt.rn.bf16x2.f32 %r13764, %r14393, %r14392; 2026-02-21T12:44:28.6385190Z cvt.rn.bf16x2.f32 %r13765, %r14395, %r14394; 2026-02-21T12:44:28.6385268Z cvt.rn.bf16x2.f32 %r13766, %r14397, %r14396; 2026-02-21T12:44:28.6385434Z cvt.rn.bf16x2.f32 %r13767, %r14399, %r14398; 2026-02-21T12:44:28.6385511Z cvt.rn.bf16x2.f32 %r13768, %r14401, %r14400; 2026-02-21T12:44:28.6385591Z cvt.rn.bf16x2.f32 %r13769, %r14403, %r14402; 2026-02-21T12:44:28.6385667Z cvt.rn.bf16x2.f32 %r13770, %r14405, %r14404; 2026-02-21T12:44:28.6385741Z cvt.rn.bf16x2.f32 %r13771, %r14407, %r14406; 2026-02-21T12:44:28.6385816Z cvt.rn.bf16x2.f32 %r13772, %r14409, %r14408; 2026-02-21T12:44:28.6385892Z cvt.rn.bf16x2.f32 %r13773, %r14411, %r14410; 2026-02-21T12:44:28.6385967Z cvt.rn.bf16x2.f32 %r13774, %r14413, %r14412; 2026-02-21T12:44:28.6386042Z cvt.rn.bf16x2.f32 %r13775, %r14415, %r14414; 2026-02-21T12:44:28.6386120Z cvt.rn.bf16x2.f32 %r13776, %r14417, %r14416; 2026-02-21T12:44:28.6386197Z cvt.rn.bf16x2.f32 %r13777, %r14419, %r14418; 2026-02-21T12:44:28.6386274Z cvt.rn.bf16x2.f32 
%r13778, %r14421, %r14420; 2026-02-21T12:44:28.6386350Z cvt.rn.bf16x2.f32 %r13779, %r14423, %r14422; 2026-02-21T12:44:28.6386427Z cvt.rn.bf16x2.f32 %r13780, %r14425, %r14424; 2026-02-21T12:44:28.6386631Z cvt.rn.bf16x2.f32 %r13781, %r14427, %r14426; 2026-02-21T12:44:28.6386710Z cvt.rn.bf16x2.f32 %r13782, %r14429, %r14428; 2026-02-21T12:44:28.6386793Z cvt.rn.bf16x2.f32 %r13783, %r14431, %r14430; 2026-02-21T12:44:28.6386869Z cvt.rn.bf16x2.f32 %r13784, %r14433, %r14432; 2026-02-21T12:44:28.6386944Z cvt.rn.bf16x2.f32 %r13785, %r14435, %r14434; 2026-02-21T12:44:28.6387023Z cvt.rn.bf16x2.f32 %r13786, %r14437, %r14436; 2026-02-21T12:44:28.6387098Z cvt.rn.bf16x2.f32 %r13787, %r14439, %r14438; 2026-02-21T12:44:28.6387172Z cvt.rn.bf16x2.f32 %r13788, %r14441, %r14440; 2026-02-21T12:44:28.6387249Z cvt.rn.bf16x2.f32 %r13789, %r14443, %r14442; 2026-02-21T12:44:28.6387323Z cvt.rn.bf16x2.f32 %r13790, %r14445, %r14444; 2026-02-21T12:44:28.6387397Z cvt.rn.bf16x2.f32 %r13791, %r14447, %r14446; 2026-02-21T12:44:28.6387475Z cvt.rn.bf16x2.f32 %r13792, %r14449, %r14448; 2026-02-21T12:44:28.6387553Z cvt.rn.bf16x2.f32 %r13793, %r14451, %r14450; 2026-02-21T12:44:28.6387634Z cvt.rn.bf16x2.f32 %r13794, %r14453, %r14452; 2026-02-21T12:44:28.6387710Z cvt.rn.bf16x2.f32 %r13795, %r14455, %r14454; 2026-02-21T12:44:28.6387787Z cvt.rn.bf16x2.f32 %r13796, %r14457, %r14456; 2026-02-21T12:44:28.6387862Z cvt.rn.bf16x2.f32 %r13797, %r14459, %r14458; 2026-02-21T12:44:28.6387936Z cvt.rn.bf16x2.f32 %r13798, %r14461, %r14460; 2026-02-21T12:44:28.6388014Z cvt.rn.bf16x2.f32 %r13799, %r14463, %r14462; 2026-02-21T12:44:28.6388089Z cvt.rn.bf16x2.f32 %r13800, %r14465, %r14464; 2026-02-21T12:44:28.6388164Z cvt.rn.bf16x2.f32 %r13801, %r14467, %r14466; 2026-02-21T12:44:28.6388241Z cvt.rn.bf16x2.f32 %r13802, %r14469, %r14468; 2026-02-21T12:44:28.6388319Z cvt.rn.bf16x2.f32 %r13803, %r14471, %r14470; 2026-02-21T12:44:28.6388394Z cvt.rn.bf16x2.f32 %r13804, %r14473, %r14472; 2026-02-21T12:44:28.6388638Z cvt.rn.bf16x2.f32 
%r13805, %r14475, %r14474; 2026-02-21T12:44:28.6388857Z .loc 1 91 22 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:91:22 2026-02-21T12:44:28.6389020Z mad.lo.s64 %rd627, %rd608, 2560, %rd167; 2026-02-21T12:44:28.6389084Z shl.b64 %rd628, %rd624, 1; 2026-02-21T12:44:28.6389154Z add.s64 %rd592, %rd627, %rd628; 2026-02-21T12:44:28.6389227Z mad.lo.s64 %rd629, %rd609, 2560, %rd167; 2026-02-21T12:44:28.6389292Z add.s64 %rd593, %rd629, %rd628; 2026-02-21T12:44:28.6389361Z mad.lo.s64 %rd630, %rd610, 2560, %rd167; 2026-02-21T12:44:28.6389425Z add.s64 %rd594, %rd630, %rd628; 2026-02-21T12:44:28.6389492Z mad.lo.s64 %rd631, %rd611, 2560, %rd167; 2026-02-21T12:44:28.6389555Z add.s64 %rd595, %rd631, %rd628; 2026-02-21T12:44:28.6389624Z mad.lo.s64 %rd632, %rd612, 2560, %rd167; 2026-02-21T12:44:28.6389687Z add.s64 %rd596, %rd632, %rd628; 2026-02-21T12:44:28.6389765Z mad.lo.s64 %rd633, %rd613, 2560, %rd167; 2026-02-21T12:44:28.6389829Z add.s64 %rd597, %rd633, %rd628; 2026-02-21T12:44:28.6389904Z mad.lo.s64 %rd634, %rd614, 2560, %rd167; 2026-02-21T12:44:28.6389966Z add.s64 %rd598, %rd634, %rd628; 2026-02-21T12:44:28.6390150Z mad.lo.s64 %rd635, %rd615, 2560, %rd167; 2026-02-21T12:44:28.6390218Z add.s64 %rd599, %rd635, %rd628; 2026-02-21T12:44:28.6390287Z mad.lo.s64 %rd636, %rd616, 2560, %rd167; 2026-02-21T12:44:28.6390349Z add.s64 %rd600, %rd636, %rd628; 2026-02-21T12:44:28.6390420Z mad.lo.s64 %rd637, %rd617, 2560, %rd167; 2026-02-21T12:44:28.6390482Z add.s64 %rd601, %rd637, %rd628; 2026-02-21T12:44:28.6390550Z mad.lo.s64 %rd638, %rd618, 2560, %rd167; 2026-02-21T12:44:28.6390611Z add.s64 %rd602, %rd638, %rd628; 2026-02-21T12:44:28.6390682Z mad.lo.s64 %rd639, %rd619, 2560, %rd167; 2026-02-21T12:44:28.6390744Z add.s64 %rd603, %rd639, %rd628; 2026-02-21T12:44:28.6390811Z mad.lo.s64 %rd640, %rd620, 2560, %rd167; 2026-02-21T12:44:28.6390875Z add.s64 %rd604, %rd640, %rd628; 2026-02-21T12:44:28.6390943Z mad.lo.s64 %rd641, %rd621, 2560, %rd167; 2026-02-21T12:44:28.6391008Z add.s64 
%rd605, %rd641, %rd628; 2026-02-21T12:44:28.6391077Z mad.lo.s64 %rd642, %rd622, 2560, %rd167; 2026-02-21T12:44:28.6391148Z add.s64 %rd606, %rd642, %rd628; 2026-02-21T12:44:28.6391216Z mad.lo.s64 %rd643, %rd623, 2560, %rd167; 2026-02-21T12:44:28.6391279Z add.s64 %rd607, %rd643, %rd628; 2026-02-21T12:44:28.6391491Z .loc 1 91 81 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:91:81 2026-02-21T12:44:28.6391549Z bar.sync 0; 2026-02-21T12:44:28.6391676Z st.shared.v4.b32 [%r1073], {%r13742, %r13744, %r13746, %r13748}; 2026-02-21T12:44:28.6391800Z st.shared.v4.b32 [%r1074], {%r13750, %r13752, %r13754, %r13756}; 2026-02-21T12:44:28.6391911Z st.shared.v4.b32 [%r1075], {%r13758, %r13760, %r13762, %r13764}; 2026-02-21T12:44:28.6392021Z st.shared.v4.b32 [%r1076], {%r13766, %r13768, %r13770, %r13772}; 2026-02-21T12:44:28.6392132Z st.shared.v4.b32 [%r1077], {%r13774, %r13776, %r13778, %r13780}; 2026-02-21T12:44:28.6392243Z st.shared.v4.b32 [%r1078], {%r13782, %r13784, %r13786, %r13788}; 2026-02-21T12:44:28.6392353Z st.shared.v4.b32 [%r1079], {%r13790, %r13792, %r13794, %r13796}; 2026-02-21T12:44:28.6392466Z st.shared.v4.b32 [%r1080], {%r13798, %r13800, %r13802, %r13804}; 2026-02-21T12:44:28.6392525Z bar.sync 0; 2026-02-21T12:44:28.6392585Z // begin inline asm 2026-02-21T12:44:28.6392792Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13647, %r13648, %r13649, %r13650}, [%r13571]; 2026-02-21T12:44:28.6392852Z // end inline asm 2026-02-21T12:44:28.6392910Z // begin inline asm 2026-02-21T12:44:28.6393103Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13655, %r13656, %r13657, %r13658}, [%r13576]; 2026-02-21T12:44:28.6393161Z // end inline asm 2026-02-21T12:44:28.6393219Z // begin inline asm 2026-02-21T12:44:28.6393411Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13663, %r13664, %r13665, %r13666}, [%r13581]; 2026-02-21T12:44:28.6393468Z // end inline asm 2026-02-21T12:44:28.6393528Z // begin inline asm 2026-02-21T12:44:28.6393795Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 
{%r13671, %r13672, %r13673, %r13674}, [%r13586]; 2026-02-21T12:44:28.6393851Z // end inline asm 2026-02-21T12:44:28.6393960Z // begin inline asm 2026-02-21T12:44:28.6394150Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13679, %r13680, %r13681, %r13682}, [%r13591]; 2026-02-21T12:44:28.6394206Z // end inline asm 2026-02-21T12:44:28.6394263Z // begin inline asm 2026-02-21T12:44:28.6394456Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13687, %r13688, %r13689, %r13690}, [%r13596]; 2026-02-21T12:44:28.6394512Z // end inline asm 2026-02-21T12:44:28.6394572Z // begin inline asm 2026-02-21T12:44:28.6394764Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13695, %r13696, %r13697, %r13698}, [%r13601]; 2026-02-21T12:44:28.6394821Z // end inline asm 2026-02-21T12:44:28.6394879Z // begin inline asm 2026-02-21T12:44:28.6395070Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13703, %r13704, %r13705, %r13706}, [%r13606]; 2026-02-21T12:44:28.6395126Z // end inline asm 2026-02-21T12:44:28.6395183Z bar.sync 0; 2026-02-21T12:44:28.6395306Z st.shared.v4.b32 [%r1073], {%r13743, %r13745, %r13747, %r13749}; 2026-02-21T12:44:28.6395521Z st.shared.v4.b32 [%r1074], {%r13751, %r13753, %r13755, %r13757}; 2026-02-21T12:44:28.6395635Z st.shared.v4.b32 [%r1075], {%r13759, %r13761, %r13763, %r13765}; 2026-02-21T12:44:28.6395745Z st.shared.v4.b32 [%r1076], {%r13767, %r13769, %r13771, %r13773}; 2026-02-21T12:44:28.6395857Z st.shared.v4.b32 [%r1077], {%r13775, %r13777, %r13779, %r13781}; 2026-02-21T12:44:28.6395983Z st.shared.v4.b32 [%r1078], {%r13783, %r13785, %r13787, %r13789}; 2026-02-21T12:44:28.6396093Z st.shared.v4.b32 [%r1079], {%r13791, %r13793, %r13795, %r13797}; 2026-02-21T12:44:28.6396204Z st.shared.v4.b32 [%r1080], {%r13799, %r13801, %r13803, %r13805}; 2026-02-21T12:44:28.6396259Z bar.sync 0; 2026-02-21T12:44:28.6396318Z // begin inline asm 2026-02-21T12:44:28.6396634Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13651, %r13652, %r13653, %r13654}, [%r13571]; 2026-02-21T12:44:28.6396704Z // end 
inline asm 2026-02-21T12:44:28.6396763Z // begin inline asm 2026-02-21T12:44:28.6396959Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13659, %r13660, %r13661, %r13662}, [%r13576]; 2026-02-21T12:44:28.6397022Z // end inline asm 2026-02-21T12:44:28.6397081Z // begin inline asm 2026-02-21T12:44:28.6397282Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13667, %r13668, %r13669, %r13670}, [%r13581]; 2026-02-21T12:44:28.6397342Z // end inline asm 2026-02-21T12:44:28.6397401Z // begin inline asm 2026-02-21T12:44:28.6397593Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13675, %r13676, %r13677, %r13678}, [%r13586]; 2026-02-21T12:44:28.6397648Z // end inline asm 2026-02-21T12:44:28.6397708Z // begin inline asm 2026-02-21T12:44:28.6397898Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13683, %r13684, %r13685, %r13686}, [%r13591]; 2026-02-21T12:44:28.6397953Z // end inline asm 2026-02-21T12:44:28.6398014Z // begin inline asm 2026-02-21T12:44:28.6398204Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13691, %r13692, %r13693, %r13694}, [%r13596]; 2026-02-21T12:44:28.6398262Z // end inline asm 2026-02-21T12:44:28.6398323Z // begin inline asm 2026-02-21T12:44:28.6398518Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13699, %r13700, %r13701, %r13702}, [%r13601]; 2026-02-21T12:44:28.6398573Z // end inline asm 2026-02-21T12:44:28.6398630Z // begin inline asm 2026-02-21T12:44:28.6398819Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r13707, %r13708, %r13709, %r13710}, [%r13606]; 2026-02-21T12:44:28.6398874Z // end inline asm 2026-02-21T12:44:28.6398931Z // begin inline asm 2026-02-21T12:44:28.6399064Z st.global.v4.b32 [ %rd592 + 0 ], { %r13647, %r13648, %r13649, %r13650 }; 2026-02-21T12:44:28.6399121Z // end inline asm 2026-02-21T12:44:28.6399178Z // begin inline asm 2026-02-21T12:44:28.6399302Z st.global.v4.b32 [ %rd593 + 0 ], { %r13651, %r13652, %r13653, %r13654 }; 2026-02-21T12:44:28.6399359Z // end inline asm 2026-02-21T12:44:28.6399416Z // begin inline asm 2026-02-21T12:44:28.6399628Z 
st.global.v4.b32 [ %rd594 + 0 ], { %r13655, %r13656, %r13657, %r13658 }; 2026-02-21T12:44:28.6399686Z // end inline asm 2026-02-21T12:44:28.6399804Z // begin inline asm 2026-02-21T12:44:28.6399923Z st.global.v4.b32 [ %rd595 + 0 ], { %r13659, %r13660, %r13661, %r13662 }; 2026-02-21T12:44:28.6399981Z // end inline asm 2026-02-21T12:44:28.6400039Z // begin inline asm 2026-02-21T12:44:28.6400155Z st.global.v4.b32 [ %rd596 + 0 ], { %r13663, %r13664, %r13665, %r13666 }; 2026-02-21T12:44:28.6400212Z // end inline asm 2026-02-21T12:44:28.6400272Z // begin inline asm 2026-02-21T12:44:28.6400391Z st.global.v4.b32 [ %rd597 + 0 ], { %r13667, %r13668, %r13669, %r13670 }; 2026-02-21T12:44:28.6400447Z // end inline asm 2026-02-21T12:44:28.6400507Z // begin inline asm 2026-02-21T12:44:28.6400623Z st.global.v4.b32 [ %rd598 + 0 ], { %r13671, %r13672, %r13673, %r13674 }; 2026-02-21T12:44:28.6400678Z // end inline asm 2026-02-21T12:44:28.6400733Z // begin inline asm 2026-02-21T12:44:28.6400856Z st.global.v4.b32 [ %rd599 + 0 ], { %r13675, %r13676, %r13677, %r13678 }; 2026-02-21T12:44:28.6400913Z // end inline asm 2026-02-21T12:44:28.6400970Z // begin inline asm 2026-02-21T12:44:28.6401208Z st.global.v4.b32 [ %rd600 + 0 ], { %r13679, %r13680, %r13681, %r13682 }; 2026-02-21T12:44:28.6401268Z // end inline asm 2026-02-21T12:44:28.6401327Z // begin inline asm 2026-02-21T12:44:28.6401447Z st.global.v4.b32 [ %rd601 + 0 ], { %r13683, %r13684, %r13685, %r13686 }; 2026-02-21T12:44:28.6401503Z // end inline asm 2026-02-21T12:44:28.6401560Z // begin inline asm 2026-02-21T12:44:28.6401676Z st.global.v4.b32 [ %rd602 + 0 ], { %r13687, %r13688, %r13689, %r13690 }; 2026-02-21T12:44:28.6401735Z // end inline asm 2026-02-21T12:44:28.6401793Z // begin inline asm 2026-02-21T12:44:28.6401910Z st.global.v4.b32 [ %rd603 + 0 ], { %r13691, %r13692, %r13693, %r13694 }; 2026-02-21T12:44:28.6401968Z // end inline asm 2026-02-21T12:44:28.6402025Z // begin inline asm 2026-02-21T12:44:28.6402149Z st.global.v4.b32 [ 
%rd604 + 0 ], { %r13695, %r13696, %r13697, %r13698 }; 2026-02-21T12:44:28.6402207Z // end inline asm 2026-02-21T12:44:28.6402266Z // begin inline asm 2026-02-21T12:44:28.6402387Z st.global.v4.b32 [ %rd605 + 0 ], { %r13699, %r13700, %r13701, %r13702 }; 2026-02-21T12:44:28.6402443Z // end inline asm 2026-02-21T12:44:28.6402503Z // begin inline asm 2026-02-21T12:44:28.6402621Z st.global.v4.b32 [ %rd606 + 0 ], { %r13703, %r13704, %r13705, %r13706 }; 2026-02-21T12:44:28.6402676Z // end inline asm 2026-02-21T12:44:28.6402748Z // begin inline asm 2026-02-21T12:44:28.6402869Z st.global.v4.b32 [ %rd607 + 0 ], { %r13707, %r13708, %r13709, %r13710 }; 2026-02-21T12:44:28.6402926Z // end inline asm 2026-02-21T12:44:28.6402989Z mov.b32 %r14348, 0f00000000; 2026-02-21T12:44:28.6403055Z mov.b32 %r14349, %r14348; 2026-02-21T12:44:28.6403115Z mov.b32 %r14350, %r14348; 2026-02-21T12:44:28.6403173Z mov.b32 %r14351, %r14348; 2026-02-21T12:44:28.6403233Z mov.b32 %r14352, %r14348; 2026-02-21T12:44:28.6403293Z mov.b32 %r14353, %r14348; 2026-02-21T12:44:28.6403352Z mov.b32 %r14354, %r14348; 2026-02-21T12:44:28.6403409Z mov.b32 %r14355, %r14348; 2026-02-21T12:44:28.6403473Z mov.b32 %r14356, %r14348; 2026-02-21T12:44:28.6403532Z mov.b32 %r14357, %r14348; 2026-02-21T12:44:28.6403591Z mov.b32 %r14358, %r14348; 2026-02-21T12:44:28.6403651Z mov.b32 %r14359, %r14348; 2026-02-21T12:44:28.6403708Z mov.b32 %r14360, %r14348; 2026-02-21T12:44:28.6403765Z mov.b32 %r14361, %r14348; 2026-02-21T12:44:28.6403822Z mov.b32 %r14362, %r14348; 2026-02-21T12:44:28.6403882Z mov.b32 %r14363, %r14348; 2026-02-21T12:44:28.6403939Z mov.b32 %r14364, %r14348; 2026-02-21T12:44:28.6403996Z mov.b32 %r14365, %r14348; 2026-02-21T12:44:28.6404056Z mov.b32 %r14366, %r14348; 2026-02-21T12:44:28.6404115Z mov.b32 %r14367, %r14348; 2026-02-21T12:44:28.6404173Z mov.b32 %r14368, %r14348; 2026-02-21T12:44:28.6404231Z mov.b32 %r14369, %r14348; 2026-02-21T12:44:28.6404292Z mov.b32 %r14370, %r14348; 2026-02-21T12:44:28.6404411Z mov.b32 
%r14371, %r14348; 2026-02-21T12:44:28.6404469Z mov.b32 %r14372, %r14348; 2026-02-21T12:44:28.6404531Z mov.b32 %r14373, %r14348; 2026-02-21T12:44:28.6404638Z mov.b32 %r14374, %r14348; 2026-02-21T12:44:28.6404699Z mov.b32 %r14375, %r14348; 2026-02-21T12:44:28.6404765Z mov.b32 %r14376, %r14348; 2026-02-21T12:44:28.6404830Z mov.b32 %r14377, %r14348; 2026-02-21T12:44:28.6404888Z mov.b32 %r14378, %r14348; 2026-02-21T12:44:28.6404946Z mov.b32 %r14379, %r14348; 2026-02-21T12:44:28.6405008Z mov.b32 %r14380, %r14348; 2026-02-21T12:44:28.6405066Z mov.b32 %r14381, %r14348; 2026-02-21T12:44:28.6405125Z mov.b32 %r14382, %r14348; 2026-02-21T12:44:28.6405185Z mov.b32 %r14383, %r14348; 2026-02-21T12:44:28.6405242Z mov.b32 %r14384, %r14348; 2026-02-21T12:44:28.6405301Z mov.b32 %r14385, %r14348; 2026-02-21T12:44:28.6405363Z mov.b32 %r14386, %r14348; 2026-02-21T12:44:28.6405422Z mov.b32 %r14387, %r14348; 2026-02-21T12:44:28.6405480Z mov.b32 %r14388, %r14348; 2026-02-21T12:44:28.6405541Z mov.b32 %r14389, %r14348; 2026-02-21T12:44:28.6405602Z mov.b32 %r14390, %r14348; 2026-02-21T12:44:28.6405660Z mov.b32 %r14391, %r14348; 2026-02-21T12:44:28.6405816Z mov.b32 %r14392, %r14348; 2026-02-21T12:44:28.6405881Z mov.b32 %r14393, %r14348; 2026-02-21T12:44:28.6405939Z mov.b32 %r14394, %r14348; 2026-02-21T12:44:28.6405997Z mov.b32 %r14395, %r14348; 2026-02-21T12:44:28.6406054Z mov.b32 %r14396, %r14348; 2026-02-21T12:44:28.6406114Z mov.b32 %r14397, %r14348; 2026-02-21T12:44:28.6406171Z mov.b32 %r14398, %r14348; 2026-02-21T12:44:28.6406229Z mov.b32 %r14399, %r14348; 2026-02-21T12:44:28.6406291Z mov.b32 %r14400, %r14348; 2026-02-21T12:44:28.6406349Z mov.b32 %r14401, %r14348; 2026-02-21T12:44:28.6406406Z mov.b32 %r14402, %r14348; 2026-02-21T12:44:28.6406589Z mov.b32 %r14403, %r14348; 2026-02-21T12:44:28.6406657Z mov.b32 %r14404, %r14348; 2026-02-21T12:44:28.6406715Z mov.b32 %r14405, %r14348; 2026-02-21T12:44:28.6406772Z mov.b32 %r14406, %r14348; 2026-02-21T12:44:28.6406836Z mov.b32 %r14407, %r14348; 
2026-02-21T12:44:28.6406894Z mov.b32 %r14408, %r14348; 2026-02-21T12:44:28.6406952Z mov.b32 %r14409, %r14348; 2026-02-21T12:44:28.6407015Z mov.b32 %r14410, %r14348; 2026-02-21T12:44:28.6407075Z mov.b32 %r14411, %r14348; 2026-02-21T12:44:28.6407134Z mov.b32 %r14412, %r14348; 2026-02-21T12:44:28.6407193Z mov.b32 %r14413, %r14348; 2026-02-21T12:44:28.6407252Z mov.b32 %r14414, %r14348; 2026-02-21T12:44:28.6407323Z mov.b32 %r14415, %r14348; 2026-02-21T12:44:28.6407383Z mov.b32 %r14416, %r14348; 2026-02-21T12:44:28.6407442Z mov.b32 %r14417, %r14348; 2026-02-21T12:44:28.6407502Z mov.b32 %r14418, %r14348; 2026-02-21T12:44:28.6407561Z mov.b32 %r14419, %r14348; 2026-02-21T12:44:28.6407619Z mov.b32 %r14420, %r14348; 2026-02-21T12:44:28.6407678Z mov.b32 %r14421, %r14348; 2026-02-21T12:44:28.6407736Z mov.b32 %r14422, %r14348; 2026-02-21T12:44:28.6407793Z mov.b32 %r14423, %r14348; 2026-02-21T12:44:28.6407851Z mov.b32 %r14424, %r14348; 2026-02-21T12:44:28.6407913Z mov.b32 %r14425, %r14348; 2026-02-21T12:44:28.6407970Z mov.b32 %r14426, %r14348; 2026-02-21T12:44:28.6408026Z mov.b32 %r14427, %r14348; 2026-02-21T12:44:28.6408092Z mov.b32 %r14428, %r14348; 2026-02-21T12:44:28.6408152Z mov.b32 %r14429, %r14348; 2026-02-21T12:44:28.6408210Z mov.b32 %r14430, %r14348; 2026-02-21T12:44:28.6408270Z mov.b32 %r14431, %r14348; 2026-02-21T12:44:28.6408328Z mov.b32 %r14432, %r14348; 2026-02-21T12:44:28.6408385Z mov.b32 %r14433, %r14348; 2026-02-21T12:44:28.6408442Z mov.b32 %r14434, %r14348; 2026-02-21T12:44:28.6408501Z mov.b32 %r14435, %r14348; 2026-02-21T12:44:28.6408571Z mov.b32 %r14436, %r14348; 2026-02-21T12:44:28.6408630Z mov.b32 %r14437, %r14348; 2026-02-21T12:44:28.6408691Z mov.b32 %r14438, %r14348; 2026-02-21T12:44:28.6408749Z mov.b32 %r14439, %r14348; 2026-02-21T12:44:28.6408806Z mov.b32 %r14440, %r14348; 2026-02-21T12:44:28.6408862Z mov.b32 %r14441, %r14348; 2026-02-21T12:44:28.6408921Z mov.b32 %r14442, %r14348; 2026-02-21T12:44:28.6409084Z mov.b32 %r14443, %r14348; 
2026-02-21T12:44:28.6409142Z mov.b32 %r14444, %r14348; 2026-02-21T12:44:28.6409200Z mov.b32 %r14445, %r14348; 2026-02-21T12:44:28.6409328Z mov.b32 %r14446, %r14348; 2026-02-21T12:44:28.6409387Z mov.b32 %r14447, %r14348; 2026-02-21T12:44:28.6409444Z mov.b32 %r14448, %r14348; 2026-02-21T12:44:28.6409504Z mov.b32 %r14449, %r14348; 2026-02-21T12:44:28.6409561Z mov.b32 %r14450, %r14348; 2026-02-21T12:44:28.6409618Z mov.b32 %r14451, %r14348; 2026-02-21T12:44:28.6409677Z mov.b32 %r14452, %r14348; 2026-02-21T12:44:28.6409735Z mov.b32 %r14453, %r14348; 2026-02-21T12:44:28.6409793Z mov.b32 %r14454, %r14348; 2026-02-21T12:44:28.6409850Z mov.b32 %r14455, %r14348; 2026-02-21T12:44:28.6409910Z mov.b32 %r14456, %r14348; 2026-02-21T12:44:28.6409967Z mov.b32 %r14457, %r14348; 2026-02-21T12:44:28.6410023Z mov.b32 %r14458, %r14348; 2026-02-21T12:44:28.6410083Z mov.b32 %r14459, %r14348; 2026-02-21T12:44:28.6410140Z mov.b32 %r14460, %r14348; 2026-02-21T12:44:28.6410200Z mov.b32 %r14461, %r14348; 2026-02-21T12:44:28.6410259Z mov.b32 %r14462, %r14348; 2026-02-21T12:44:28.6410319Z mov.b32 %r14463, %r14348; 2026-02-21T12:44:28.6410496Z mov.b32 %r14464, %r14348; 2026-02-21T12:44:28.6410560Z mov.b32 %r14465, %r14348; 2026-02-21T12:44:28.6410626Z mov.b32 %r14466, %r14348; 2026-02-21T12:44:28.6410690Z mov.b32 %r14467, %r14348; 2026-02-21T12:44:28.6410754Z mov.b32 %r14468, %r14348; 2026-02-21T12:44:28.6410812Z mov.b32 %r14469, %r14348; 2026-02-21T12:44:28.6410869Z mov.b32 %r14470, %r14348; 2026-02-21T12:44:28.6410925Z mov.b32 %r14471, %r14348; 2026-02-21T12:44:28.6410985Z mov.b32 %r14472, %r14348; 2026-02-21T12:44:28.6411042Z mov.b32 %r14473, %r14348; 2026-02-21T12:44:28.6411100Z mov.b32 %r14474, %r14348; 2026-02-21T12:44:28.6411160Z mov.b32 %r14475, %r14348; 2026-02-21T12:44:28.6411219Z bra.uni $L__BB0_35; 2026-02-21T12:44:28.6411316Z $L__BB0_36: // %._crit_edge137 2026-02-21T12:44:28.6411537Z .loc 1 22 120 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:22:120 
2026-02-21T12:44:28.6411612Z cp.async.wait_group 0; 2026-02-21T12:44:28.6411671Z bar.sync 0; 2026-02-21T12:44:28.6411884Z .loc 1 22 4 // cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py:22:4 2026-02-21T12:44:28.6411939Z ret; 2026-02-21T12:44:28.6411994Z $L__tmp41: 2026-02-21T12:44:28.6412049Z $L__func_end0: 2026-02-21T12:44:28.6412138Z // -- End function 2026-02-21T12:44:28.6412193Z } 2026-02-21T12:44:28.6412445Z .file 1 "/tmp/torchinductor_root/pr/cpr6yfl2cckuh4grsfskfccdvmmnu2u5b2yehxppb6jooz6x3a5u.py" 2026-02-21T12:44:28.6412659Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T12:44:28.6412729Z .section .debug_abbrev 2026-02-21T12:44:28.6412781Z { 2026-02-21T12:44:28.6412878Z .b8 1 // Abbreviation Code 2026-02-21T12:44:28.6412973Z .b8 17 // DW_TAG_compile_unit 2026-02-21T12:44:28.6413064Z .b8 1 // DW_CHILDREN_yes 2026-02-21T12:44:28.6413160Z .b8 37 // DW_AT_producer 2026-02-21T12:44:28.6413248Z .b8 8 // DW_FORM_string 2026-02-21T12:44:28.6413328Z .b8 19 // DW_AT_language 2026-02-21T12:44:28.6413410Z .b8 5 // DW_FORM_data2 2026-02-21T12:44:28.6413489Z .b8 3 // DW_AT_name 2026-02-21T12:44:28.6413571Z .b8 8 // DW_FORM_string 2026-02-21T12:44:28.6413651Z .b8 16 // DW_AT_stmt_list 2026-02-21T12:44:28.6413730Z .b8 6 // DW_FORM_data4 2026-02-21T12:44:28.6413812Z .b8 27 // DW_AT_comp_dir 2026-02-21T12:44:28.6413890Z .b8 8 // DW_FORM_string 2026-02-21T12:44:28.6414033Z .b8 0 // EOM(1) 2026-02-21T12:44:28.6414105Z .b8 0 // EOM(2) 2026-02-21T12:44:28.6414197Z .b8 2 // Abbreviation Code 2026-02-21T12:44:28.6414329Z .b8 46 // DW_TAG_subprogram 2026-02-21T12:44:28.6414410Z .b8 0 // DW_CHILDREN_no 2026-02-21T12:44:28.6414485Z .b8 3 // DW_AT_name 2026-02-21T12:44:28.6414561Z .b8 8 // DW_FORM_string 2026-02-21T12:44:28.6414641Z .b8 32 // DW_AT_inline 2026-02-21T12:44:28.6414722Z .b8 11 // DW_FORM_data1 2026-02-21T12:44:28.6414790Z .b8 0 // EOM(1) 2026-02-21T12:44:28.6414857Z .b8 0 // EOM(2) 
2026-02-21T12:44:28.6414946Z .b8 3 // Abbreviation Code 2026-02-21T12:44:28.6415031Z .b8 46 // DW_TAG_subprogram 2026-02-21T12:44:28.6415114Z .b8 1 // DW_CHILDREN_yes 2026-02-21T12:44:28.6415287Z .b8 17 // DW_AT_low_pc 2026-02-21T12:44:28.6415364Z .b8 1 // DW_FORM_addr 2026-02-21T12:44:28.6415446Z .b8 18 // DW_AT_high_pc 2026-02-21T12:44:28.6415522Z .b8 1 // DW_FORM_addr 2026-02-21T12:44:28.6415617Z .b8 49 // DW_AT_abstract_origin 2026-02-21T12:44:28.6415692Z .b8 19 // DW_FORM_ref4 2026-02-21T12:44:28.6415763Z .b8 0 // EOM(1) 2026-02-21T12:44:28.6415833Z .b8 0 // EOM(2) 2026-02-21T12:44:28.6415918Z .b8 4 // Abbreviation Code 2026-02-21T12:44:28.6416027Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T12:44:28.6416112Z .b8 0 // DW_CHILDREN_no 2026-02-21T12:44:28.6416202Z .b8 49 // DW_AT_abstract_origin 2026-02-21T12:44:28.6416282Z .b8 19 // DW_FORM_ref4 2026-02-21T12:44:28.6416358Z .b8 17 // DW_AT_low_pc 2026-02-21T12:44:28.6416435Z .b8 1 // DW_FORM_addr 2026-02-21T12:44:28.6416692Z .b8 18 // DW_AT_high_pc 2026-02-21T12:44:28.6416825Z .b8 1 // DW_FORM_addr 2026-02-21T12:44:28.6416960Z .b8 88 // DW_AT_call_file 2026-02-21T12:44:28.6417078Z .b8 11 // DW_FORM_data1 2026-02-21T12:44:28.6417202Z .b8 89 // DW_AT_call_line 2026-02-21T12:44:28.6417319Z .b8 11 // DW_FORM_data1 2026-02-21T12:44:28.6417454Z .b8 87 // DW_AT_call_column 2026-02-21T12:44:28.6417547Z .b8 11 // DW_FORM_data1 2026-02-21T12:44:28.6417624Z .b8 0 // EOM(1) 2026-02-21T12:44:28.6417696Z .b8 0 // EOM(2) 2026-02-21T12:44:28.6417765Z .b8 0 // EOM(3) 2026-02-21T12:44:28.6417817Z } 2026-02-21T12:44:28.6417883Z .section .debug_info 2026-02-21T12:44:28.6417934Z { 2026-02-21T12:44:28.6418024Z .b32 178 // Length of Unit 2026-02-21T12:44:28.6418121Z .b8 2 // DWARF version number 2026-02-21T12:44:28.6418173Z .b8 0 2026-02-21T12:44:28.6418308Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T12:44:28.6418406Z .b8 8 // Address Size (in bytes) 2026-02-21T12:44:28.6418524Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T12:44:28.6418611Z .b8 116 // DW_AT_producer 2026-02-21T12:44:28.6418763Z .b8 114 2026-02-21T12:44:28.6418817Z .b8 105 2026-02-21T12:44:28.6418868Z .b8 116 2026-02-21T12:44:28.6418978Z .b8 111 2026-02-21T12:44:28.6419032Z .b8 110 2026-02-21T12:44:28.6419085Z .b8 0 2026-02-21T12:44:28.6419171Z .b8 2 // DW_AT_language 2026-02-21T12:44:28.6419221Z .b8 0 2026-02-21T12:44:28.6419305Z .b8 99 // DW_AT_name 2026-02-21T12:44:28.6419358Z .b8 112 2026-02-21T12:44:28.6419410Z .b8 114 2026-02-21T12:44:28.6419462Z .b8 54 2026-02-21T12:44:28.6419516Z .b8 121 2026-02-21T12:44:28.6419568Z .b8 102 2026-02-21T12:44:28.6419619Z .b8 108 2026-02-21T12:44:28.6419672Z .b8 50 2026-02-21T12:44:28.6419723Z .b8 99 2026-02-21T12:44:28.6419773Z .b8 99 2026-02-21T12:44:28.6419824Z .b8 107 2026-02-21T12:44:28.6419877Z .b8 117 2026-02-21T12:44:28.6419926Z .b8 104 2026-02-21T12:44:28.6419976Z .b8 52 2026-02-21T12:44:28.6420030Z .b8 103 2026-02-21T12:44:28.6420081Z .b8 114 2026-02-21T12:44:28.6420134Z .b8 115 2026-02-21T12:44:28.6420185Z .b8 102 2026-02-21T12:44:28.6420238Z .b8 115 2026-02-21T12:44:28.6420293Z .b8 107 2026-02-21T12:44:28.6420344Z .b8 102 2026-02-21T12:44:28.6420467Z .b8 99 2026-02-21T12:44:28.6420594Z .b8 99 2026-02-21T12:44:28.6420649Z .b8 100 2026-02-21T12:44:28.6420699Z .b8 118 2026-02-21T12:44:28.6420752Z .b8 109 2026-02-21T12:44:28.6420805Z .b8 109 2026-02-21T12:44:28.6420855Z .b8 110 2026-02-21T12:44:28.6420905Z .b8 117 2026-02-21T12:44:28.6420958Z .b8 50 2026-02-21T12:44:28.6421008Z .b8 117 2026-02-21T12:44:28.6421057Z .b8 53 2026-02-21T12:44:28.6421109Z .b8 98 2026-02-21T12:44:28.6421160Z .b8 50 2026-02-21T12:44:28.6421211Z .b8 121 2026-02-21T12:44:28.6421262Z .b8 101 2026-02-21T12:44:28.6421316Z .b8 104 2026-02-21T12:44:28.6421366Z .b8 120 2026-02-21T12:44:28.6421417Z .b8 112 2026-02-21T12:44:28.6421482Z .b8 112 
2026-02-21T12:44:28.6421536Z .b8 98 2026-02-21T12:44:28.6421587Z .b8 54 2026-02-21T12:44:28.6421638Z .b8 106 2026-02-21T12:44:28.6421692Z .b8 111 2026-02-21T12:44:28.6421745Z .b8 111 2026-02-21T12:44:28.6421796Z .b8 122 2026-02-21T12:44:28.6421846Z .b8 54 2026-02-21T12:44:28.6421899Z .b8 120 2026-02-21T12:44:28.6421953Z .b8 51 2026-02-21T12:44:28.6422006Z .b8 97 2026-02-21T12:44:28.6422059Z .b8 53 2026-02-21T12:44:28.6422119Z .b8 117 2026-02-21T12:44:28.6422169Z .b8 46 2026-02-21T12:44:28.6422219Z .b8 112 2026-02-21T12:44:28.6422272Z .b8 121 2026-02-21T12:44:28.6422322Z .b8 0 2026-02-21T12:44:28.6422438Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T12:44:28.6422533Z .b8 47 // DW_AT_comp_dir 2026-02-21T12:44:28.6422584Z .b8 116 2026-02-21T12:44:28.6422634Z .b8 109 2026-02-21T12:44:28.6422685Z .b8 112 2026-02-21T12:44:28.6422737Z .b8 47 2026-02-21T12:44:28.6422787Z .b8 116 2026-02-21T12:44:28.6422837Z .b8 111 2026-02-21T12:44:28.6422890Z .b8 114 2026-02-21T12:44:28.6422939Z .b8 99 2026-02-21T12:44:28.6422990Z .b8 104 2026-02-21T12:44:28.6423040Z .b8 105 2026-02-21T12:44:28.6423094Z .b8 110 2026-02-21T12:44:28.6423147Z .b8 100 2026-02-21T12:44:28.6423198Z .b8 117 2026-02-21T12:44:28.6423250Z .b8 99 2026-02-21T12:44:28.6423302Z .b8 116 2026-02-21T12:44:28.6423352Z .b8 111 2026-02-21T12:44:28.6423405Z .b8 114 2026-02-21T12:44:28.6423459Z .b8 95 2026-02-21T12:44:28.6423511Z .b8 114 2026-02-21T12:44:28.6423561Z .b8 111 2026-02-21T12:44:28.6423613Z .b8 111 2026-02-21T12:44:28.6423667Z .b8 116 2026-02-21T12:44:28.6423717Z .b8 47 2026-02-21T12:44:28.6423767Z .b8 112 2026-02-21T12:44:28.6423820Z .b8 114 2026-02-21T12:44:28.6423869Z .b8 0 2026-02-21T12:44:28.6423987Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T12:44:28.6424068Z .b8 95 // DW_AT_name 2026-02-21T12:44:28.6424123Z .b8 104 2026-02-21T12:44:28.6424174Z .b8 101 2026-02-21T12:44:28.6424226Z .b8 108 2026-02-21T12:44:28.6424281Z .b8 105 2026-02-21T12:44:28.6424332Z .b8 111 2026-02-21T12:44:28.6424382Z 
.b8 110 2026-02-21T12:44:28.6424432Z .b8 95 2026-02-21T12:44:28.6424487Z .b8 109 2026-02-21T12:44:28.6424613Z .b8 97 2026-02-21T12:44:28.6424663Z .b8 116 2026-02-21T12:44:28.6424716Z .b8 109 2026-02-21T12:44:28.6424767Z .b8 117 2026-02-21T12:44:28.6424818Z .b8 108 2026-02-21T12:44:28.6424918Z .b8 95 2026-02-21T12:44:28.6424971Z .b8 98 2026-02-21T12:44:28.6425022Z .b8 102 2026-02-21T12:44:28.6425072Z .b8 49 2026-02-21T12:44:28.6425121Z .b8 54 2026-02-21T12:44:28.6425186Z .b8 95 2026-02-21T12:44:28.6425239Z .b8 105 2026-02-21T12:44:28.6425290Z .b8 110 2026-02-21T12:44:28.6425343Z .b8 116 2026-02-21T12:44:28.6425393Z .b8 52 2026-02-21T12:44:28.6425444Z .b8 0 2026-02-21T12:44:28.6425529Z .b8 1 // DW_AT_inline 2026-02-21T12:44:28.6425640Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T12:44:28.6425740Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T12:44:28.6425838Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T12:44:28.6425941Z .b32 108 // DW_AT_abstract_origin 2026-02-21T12:44:28.6426077Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T12:44:28.6426283Z .b32 108 // DW_AT_abstract_origin 2026-02-21T12:44:28.6426379Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T12:44:28.6426597Z .b64 $L__tmp40 // DW_AT_high_pc 2026-02-21T12:44:28.6426690Z .b8 1 // DW_AT_call_file 2026-02-21T12:44:28.6426774Z .b8 87 // DW_AT_call_line 2026-02-21T12:44:28.6426863Z .b8 40 // DW_AT_call_column 2026-02-21T12:44:28.6426953Z .b8 0 // End Of Children Mark 2026-02-21T12:44:28.6427040Z .b8 0 // End Of Children Mark 2026-02-21T12:44:28.6427094Z } 2026-02-21T12:44:28.6427163Z .section .debug_macinfo { } 2026-02-21T12:44:28.6427169Z 2026-02-21T12:44:28.6427250Z ================================================================ 2026-02-21T12:44:28.6427384Z please share the reproducer above with Triton project. 
2026-02-21T12:44:48.3088254Z [8387s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 256], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[8], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=128, num_sm_multiplier=64, num_stages=6, num_warps=1, pid_type='persistent_blocked', range_flattens=[True, True], range_multi_buffers=[False, True], range_num_stages=[3, 0], range_unroll_factors=[4, 3], range_warp_specializes=[]) 2026-02-21T12:44:48.3089995Z Tensor-likes are not close! 2026-02-21T12:44:48.3090153Z 2026-02-21T12:44:48.3090271Z Mismatched elements: 334497811 / 335544320 (99.7%) 2026-02-21T12:44:48.3090670Z Greatest absolute difference: 4256.0 at index (10096, 918) (up to 0.01 allowed) 2026-02-21T12:44:48.3091167Z Greatest relative difference: inf at index (60515, 1164) (up to 0.01 allowed) 2026-02-21T12:44:48.3091599Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T12:44:48.3091865Z 2026-02-21T12:44:51.6630709Z 2026-02-21T12:44:51.6630736Z 2026-02-21T12:44:51.6630771Z 2026-02-21T12:44:51.6631230Z ================================================================ 2026-02-21T12:44:51.6631627Z Internal Triton PTX codegen error 2026-02-21T12:44:51.6631950Z `ptxas` stderr: 2026-02-21T12:44:51.6632667Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 373 in function _helion_matmul_bf16_int4. Try to compile with register target of 38 or higher. 
2026-02-21T12:44:51.6633492Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T12:44:51.6633723Z 2026-02-21T12:44:51.6634365Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp4d0jcm0u.ptx -o /tmp/tmp4d0jcm0u.ptx.o 2026-02-21T12:44:51.6635133Z 2026-02-21T12:44:51.6635138Z 2026-02-21T12:44:51.6635213Z // 2026-02-21T12:44:51.6635834Z // Generated by LLVM NVPTX Back-End 2026-02-21T12:44:51.6636090Z // 2026-02-21T12:44:51.6636189Z 2026-02-21T12:44:51.6636262Z .version 8.7 2026-02-21T12:44:51.6637060Z .target sm_90a 2026-02-21T12:44:51.6637278Z .address_size 64 2026-02-21T12:44:51.6637405Z 2026-02-21T12:44:51.6637649Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T12:44:51.6638116Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T12:44:51.6638443Z // @_helion_matmul_bf16_int4 2026-02-21T12:44:51.6638835Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T12:44:51.6639301Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T12:44:51.6639761Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T12:44:51.6640234Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T12:44:51.6640689Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T12:44:51.6641130Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T12:44:51.6641516Z ) 2026-02-21T12:44:51.6641697Z .reqntid 512 2026-02-21T12:44:51.6642015Z .maxnreg 32 2026-02-21T12:44:51.6642337Z { 2026-02-21T12:44:51.6642516Z .reg .pred %p<135>; 2026-02-21T12:44:51.6642717Z .reg .b16 %rs<476>; 2026-02-21T12:44:51.6642910Z .reg .b32 %r<4099>; 2026-02-21T12:44:51.6643097Z .reg .b64 %rd<526>; 2026-02-21T12:44:51.6643483Z .loc 1 14 0 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:14:0 2026-02-21T12:44:51.6643944Z $L__func_begin0: 
2026-02-21T12:44:51.6644310Z .loc 1 14 0 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:14:0 2026-02-21T12:44:51.6644685Z 2026-02-21T12:44:51.6644752Z // %bb.0: 2026-02-21T12:44:51.6644998Z ld.param.b64 %rd123, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T12:44:51.6645383Z ld.param.b64 %rd122, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T12:44:51.6645754Z ld.param.b64 %rd121, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T12:44:51.6646068Z $L__tmp0: 2026-02-21T12:44:51.6646445Z .loc 1 20 30 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:20:30 2026-02-21T12:44:51.6647048Z mov.u32 %r117, %ctaid.x; 2026-02-21T12:44:51.6647447Z .loc 1 20 48 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:20:48 2026-02-21T12:44:51.6647912Z mul.wide.u32 %rd505, %r117, 10; 2026-02-21T12:44:51.6648336Z .loc 1 21 49 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:21:49 2026-02-21T12:44:51.6648792Z min.u64 %rd124, %rd505, 81910; 2026-02-21T12:44:51.6649027Z add.s64 %rd2, %rd124, 10; 2026-02-21T12:44:51.6649438Z .loc 1 22 120 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:22:120 2026-02-21T12:44:51.6649827Z sub.s64 %rd125, %rd2, %rd505; 2026-02-21T12:44:51.6650028Z shr.s64 %rd126, %rd125, 63; 2026-02-21T12:44:51.6650211Z shr.u64 %rd127, %rd126, 62; 2026-02-21T12:44:51.6650403Z add.s64 %rd128, %rd125, %rd127; 2026-02-21T12:44:51.6650596Z and.b64 %rd129, %rd128, -4; 2026-02-21T12:44:51.6650788Z add.s64 %rd522, %rd129, %rd505; 2026-02-21T12:44:51.6651149Z .loc 1 34 45 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:34:45 2026-02-21T12:44:51.6651526Z mov.u32 %r1, %tid.x; 2026-02-21T12:44:51.6651702Z and.b32 %r2, %r1, 31; 2026-02-21T12:44:51.6651863Z shr.u32 %r3, %r1, 5; 2026-02-21T12:44:51.6652036Z and.b32 %r4, %r1, 504; 2026-02-21T12:44:51.6652209Z bfe.u32 %r118, %r1, 3, 6; 2026-02-21T12:44:51.6652386Z and.b32 %r5, %r1, 7; 2026-02-21T12:44:51.6652544Z and.b32 %r119, %r1, 63; 2026-02-21T12:44:51.6652865Z .loc 1 34 32 // 
cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:34:32 2026-02-21T12:44:51.6653237Z cvt.u64.u32 %rd4, %r118; 2026-02-21T12:44:51.6653420Z mul.wide.u32 %rd5, %r5, 8; 2026-02-21T12:44:51.6653613Z cvt.u64.u32 %rd6, %r119; 2026-02-21T12:44:51.6654040Z .loc 1 44 48 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:44:48 2026-02-21T12:44:51.6654407Z and.b32 %r6, %r1, 448; 2026-02-21T12:44:51.6655172Z [8390s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T12:44:51.6656824Z Config: @helion.kernel(config=helion.Config(block_sizes=[8, 64, 64], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[8], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=64, num_stages=6, num_warps=16, pid_type='persistent_blocked', range_flattens=[True, True], range_multi_buffers=[False, True], range_num_stages=[3, 1], range_unroll_factors=[4, 3], range_warp_specializes=[]), static_shapes=True) 2026-02-21T12:44:51.6658277Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T12:44:51.6658566Z `ptxas` stderr: 2026-02-21T12:44:51.6659123Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 373 in function _helion_matmul_bf16_int4. Try to compile with register target of 38 or higher. 2026-02-21T12:44:51.6659759Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T12:44:51.6660033Z 2026-02-21T12:44:51.6660605Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmp4d0jcm0u.ptx -o /tmp/tmp4d0jcm0u.ptx.o 2026-02-21T12:44:51.6661184Z 2026-02-21T12:44:51.6661337Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 
2026-02-21T12:44:51.6661635Z bfe.u32 %r7, %r1, 6, 3; 2026-02-21T12:44:51.6661961Z .loc 1 68 38 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:68:38 2026-02-21T12:44:51.6662339Z and.b32 %r8, %r1, 64; 2026-02-21T12:44:51.6662664Z .loc 1 22 120 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:22:120 2026-02-21T12:44:51.6663039Z setp.lt.s64 %p1, %rd129, 1; 2026-02-21T12:44:51.6663228Z cvt.u32.u64 %r4039, %rd6; 2026-02-21T12:44:51.6663414Z shl.b32 %r4040, %r1, 2; 2026-02-21T12:44:51.6663587Z and.b32 %r4041, %r1, 32; 2026-02-21T12:44:51.6663759Z bfe.s32 %r4042, %r1, 5, 1; 2026-02-21T12:44:51.6663948Z mov.b32 %r3946, global_smem; 2026-02-21T12:44:51.6664125Z and.b32 %r4044, %r1, 96; 2026-02-21T12:44:51.6664297Z shl.b32 %r4045, %r1, 3; 2026-02-21T12:44:51.6664459Z shl.b32 %r4046, %r1, 1; 2026-02-21T12:44:51.6664625Z and.b32 %r4047, %r1, 16; 2026-02-21T12:44:51.6664788Z bfe.s32 %r4048, %r1, 4, 1; 2026-02-21T12:44:51.6664965Z bfe.u32 %r4049, %r1, 8, 1; 2026-02-21T12:44:51.6665132Z and.b32 %r4050, %r1, 384; 2026-02-21T12:44:51.6665308Z shr.u32 %r4051, %r6, 4; 2026-02-21T12:44:51.6665478Z shl.b32 %r4052, %r3, 3; 2026-02-21T12:44:51.6665640Z shl.b32 %r4053, %r1, 6; 2026-02-21T12:44:51.6665806Z shl.b32 %r4054, %r5, 4; 2026-02-21T12:44:51.6665971Z shl.b32 %r4055, %r1, 4; 2026-02-21T12:44:51.6666139Z shl.b32 %r4056, %r5, 10; 2026-02-21T12:44:51.6666303Z shl.b32 %r4057, %r1, 7; 2026-02-21T12:44:51.6666605Z shl.b32 %r4058, %r4, 1; 2026-02-21T12:44:51.6666787Z add.s64 %rd501, %rd122, %rd6; 2026-02-21T12:44:51.6666984Z mul.wide.u32 %rd502, %r7, 1280; 2026-02-21T12:44:51.6667189Z mul.wide.u32 %rd503, %r5, 4; 2026-02-21T12:44:51.6667374Z setp.eq.b32 %p134, %r8, 0; 2026-02-21T12:44:51.6667554Z shl.b64 %rd504, %rd5, 1; 2026-02-21T12:44:51.6667727Z @%p1 bra $L__BB0_4; 2026-02-21T12:44:51.6667916Z // %bb.1: // %.lr.ph 2026-02-21T12:44:51.6668320Z .loc 1 0 120 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:0:120 2026-02-21T12:44:51.6668860Z and.b32 %r122, 
%r4040, 1916; 2026-02-21T12:44:51.6669055Z and.b32 %r125, %r4042, 136; 2026-02-21T12:44:51.6669242Z xor.b32 %r126, %r125, %r122; 2026-02-21T12:44:51.6669419Z add.s32 %r9, %r3946, %r126; 2026-02-21T12:44:51.6669601Z shl.b32 %r129, %r4044, 4; 2026-02-21T12:44:51.6669780Z and.b32 %r131, %r4045, 96; 2026-02-21T12:44:51.6669955Z and.b32 %r133, %r4046, 6; 2026-02-21T12:44:51.6670224Z and.b32 %r136, %r4048, 136; 2026-02-21T12:44:51.6670398Z or.b32 %r137, %r129, %r131; 2026-02-21T12:44:51.6670575Z or.b32 %r138, %r137, %r133; 2026-02-21T12:44:51.6670812Z or.b32 %r139, %r138, %r136; 2026-02-21T12:44:51.6671008Z add.s32 %r10, %r3946, %r139; 2026-02-21T12:44:51.6671189Z xor.b32 %r140, %r139, 8; 2026-02-21T12:44:51.6671370Z add.s32 %r11, %r3946, %r140; 2026-02-21T12:44:51.6671557Z and.b32 %r141, %r4040, 124; 2026-02-21T12:44:51.6671730Z and.b32 %r142, %r4046, 384; 2026-02-21T12:44:51.6671905Z shr.u32 %r143, %r4041, 4; 2026-02-21T12:44:51.6672079Z add.s32 %r145, %r3946, %r4049; 2026-02-21T12:44:51.6672268Z add.s32 %r146, %r145, %r142; 2026-02-21T12:44:51.6672442Z add.s32 %r147, %r146, %r143; 2026-02-21T12:44:51.6672624Z add.s32 %r12, %r147, %r141; 2026-02-21T12:44:51.6672801Z add.s32 %r149, %r3946, %r143; 2026-02-21T12:44:51.6672985Z add.s32 %r150, %r149, %r4050; 2026-02-21T12:44:51.6673162Z add.s32 %r13, %r150, %r141; 2026-02-21T12:44:51.6673345Z shl.b32 %r151, %r4039, 6; 2026-02-21T12:44:51.6673522Z and.b32 %r152, %r4045, 48; 2026-02-21T12:44:51.6673695Z xor.b32 %r154, %r152, %r4051; 2026-02-21T12:44:51.6674012Z or.b32 %r155, %r154, %r151; 2026-02-21T12:44:51.6674198Z add.s32 %r14, %r3946, %r155; 2026-02-21T12:44:51.6674381Z xor.b32 %r156, %r155, 32; 2026-02-21T12:44:51.6674551Z add.s32 %r15, %r3946, %r156; 2026-02-21T12:44:51.6674731Z and.b32 %r158, %r4052, 120; 2026-02-21T12:44:51.6674904Z or.b32 %r159, %r158, %r2; 2026-02-21T12:44:51.6675082Z shl.b32 %r160, %r159, 4; 2026-02-21T12:44:51.6675250Z add.s32 %r161, %r3946, 4096; 2026-02-21T12:44:51.6675429Z add.s32 %r712, %r161, 
%r160; 2026-02-21T12:44:51.6675609Z and.b32 %r163, %r4053, 1536; 2026-02-21T12:44:51.6675779Z shl.b32 %r165, %r4044, 2; 2026-02-21T12:44:51.6675955Z add.s32 %r166, %r161, %r163; 2026-02-21T12:44:51.6676131Z add.s32 %r167, %r166, %r4054; 2026-02-21T12:44:51.6676311Z add.s32 %r213, %r167, %r165; 2026-02-21T12:44:51.6676610Z bfe.u32 %r168, %r3946, 4, 14; 2026-02-21T12:44:51.6676823Z cvt.u64.u32 %rd130, %r168; 2026-02-21T12:44:51.6677021Z or.b64 %rd7, %rd130, -9223371899399045120; 2026-02-21T12:44:51.6677249Z add.s32 %r169, %r3946, 32; 2026-02-21T12:44:51.6677435Z bfe.u32 %r170, %r169, 4, 14; 2026-02-21T12:44:51.6677624Z cvt.u64.u32 %rd131, %r170; 2026-02-21T12:44:51.6677821Z or.b64 %rd8, %rd131, -9223371899399045120; 2026-02-21T12:44:51.6678030Z add.s32 %r171, %r3946, 1024; 2026-02-21T12:44:51.6678212Z bfe.u32 %r172, %r171, 4, 14; 2026-02-21T12:44:51.6678386Z cvt.u64.u32 %rd132, %r172; 2026-02-21T12:44:51.6678577Z or.b64 %rd9, %rd132, -9223371899399045120; 2026-02-21T12:44:51.6678783Z add.s32 %r173, %r3946, 1056; 2026-02-21T12:44:51.6678971Z bfe.u32 %r174, %r173, 4, 14; 2026-02-21T12:44:51.6679145Z cvt.u64.u32 %rd133, %r174; 2026-02-21T12:44:51.6679341Z or.b64 %rd10, %rd133, -9223371899399045120; 2026-02-21T12:44:51.6679560Z add.s32 %r175, %r3946, 2048; 2026-02-21T12:44:51.6679735Z bfe.u32 %r176, %r175, 4, 14; 2026-02-21T12:44:51.6679919Z cvt.u64.u32 %rd134, %r176; 2026-02-21T12:44:51.6680107Z or.b64 %rd11, %rd134, -9223371899399045120; 2026-02-21T12:44:51.6680326Z add.s32 %r177, %r3946, 2080; 2026-02-21T12:44:51.6680507Z bfe.u32 %r178, %r177, 4, 14; 2026-02-21T12:44:51.6680688Z cvt.u64.u32 %rd135, %r178; 2026-02-21T12:44:51.6680877Z or.b64 %rd12, %rd135, -9223371899399045120; 2026-02-21T12:44:51.6681093Z add.s32 %r179, %r3946, 3072; 2026-02-21T12:44:51.6681283Z bfe.u32 %r180, %r179, 4, 14; 2026-02-21T12:44:51.6681463Z cvt.u64.u32 %rd136, %r180; 2026-02-21T12:44:51.6681654Z or.b64 %rd13, %rd136, -9223371899399045120; 2026-02-21T12:44:51.6681859Z add.s32 %r181, %r3946, 
3104; 2026-02-21T12:44:51.6682035Z bfe.u32 %r182, %r181, 4, 14; 2026-02-21T12:44:51.6682207Z cvt.u64.u32 %rd137, %r182; 2026-02-21T12:44:51.6682393Z or.b64 %rd14, %rd137, -9223371899399045120; 2026-02-21T12:44:51.6682600Z and.b32 %r184, %r4055, 1536; 2026-02-21T12:44:51.6682780Z or.b32 %r185, %r184, %r131; 2026-02-21T12:44:51.6683060Z or.b32 %r186, %r185, %r133; 2026-02-21T12:44:51.6683230Z or.b32 %r187, %r186, %r136; 2026-02-21T12:44:51.6683407Z add.s32 %r18, %r3946, %r187; 2026-02-21T12:44:51.6683664Z xor.b32 %r188, %r187, 8; 2026-02-21T12:44:51.6683839Z add.s32 %r19, %r3946, %r188; 2026-02-21T12:44:51.6684012Z and.b32 %r190, %r4055, 240; 2026-02-21T12:44:51.6684190Z shl.b32 %r191, %r4044, 3; 2026-02-21T12:44:51.6684357Z shr.u32 %r192, %r4050, 2; 2026-02-21T12:44:51.6684530Z or.b32 %r193, %r190, %r191; 2026-02-21T12:44:51.6684702Z or.b32 %r194, %r192, %r4047; 2026-02-21T12:44:51.6684879Z xor.b32 %r195, %r193, %r194; 2026-02-21T12:44:51.6685064Z add.s32 %r196, %r3946, %r4056; 2026-02-21T12:44:51.6685248Z add.s32 %r20, %r196, %r195; 2026-02-21T12:44:51.6685430Z and.b32 %r198, %r4057, 7168; 2026-02-21T12:44:51.6685621Z xor.b32 %r200, %r4054, %r4058; 2026-02-21T12:44:51.6685815Z add.s32 %r201, %r3946, %r198; 2026-02-21T12:44:51.6685994Z add.s32 %r21, %r201, %r200; 2026-02-21T12:44:51.6686177Z add.s32 %r202, %r149, %r141; 2026-02-21T12:44:51.6686353Z add.s32 %r22, %r202, %r4050; 2026-02-21T12:44:51.6686666Z add.s32 %r203, %r166, %r165; 2026-02-21T12:44:51.6687012Z add.s32 %r964, %r203, %r4054; 2026-02-21T12:44:51.6687370Z .loc 1 50 26 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:50:26 2026-02-21T12:44:51.6687760Z mul.wide.u32 %rd15, %r5, 2; 2026-02-21T12:44:51.6688098Z .loc 1 22 120 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:22:120 2026-02-21T12:44:51.6688470Z shl.b64 %rd138, %rd4, 14; 2026-02-21T12:44:51.6688650Z or.b64 %rd140, %rd138, %rd503; 2026-02-21T12:44:51.6688870Z add.s64 %rd141, %rd140, %rd121; 2026-02-21T12:44:51.6689062Z 
add.s64 %rd18, %rd141, 64; 2026-02-21T12:44:51.6689247Z or.b64 %rd142, %rd502, %rd6; 2026-02-21T12:44:51.6689435Z add.s64 %rd19, %rd122, %rd142; 2026-02-21T12:44:51.6689669Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T12:44:51.6689970Z // Child Loop BB0_10 Depth 2 2026-02-21T12:44:51.6690253Z // Child Loop BB0_15 Depth 2 2026-02-21T12:44:51.6690541Z // Child Loop BB0_20 Depth 2 2026-02-21T12:44:51.6690808Z // Child Loop BB0_25 Depth 2 2026-02-21T12:44:51.6691206Z .loc 1 28 35 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:28:35 2026-02-21T12:44:51.6691603Z mul.hi.u64 %rd143, %rd505, -3689348814741910323; 2026-02-21T12:44:51.6691834Z shr.u64 %rd144, %rd143, 7; 2026-02-21T12:44:51.6692169Z .loc 1 29 33 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:29:33 2026-02-21T12:44:51.6692532Z shl.b64 %rd36, %rd144, 3; 2026-02-21T12:44:51.6692855Z .loc 1 30 39 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:30:39 2026-02-21T12:44:51.6693214Z sub.s64 %rd145, 4096, %rd36; 2026-02-21T12:44:51.6693540Z .loc 1 30 52 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:30:52 2026-02-21T12:44:51.6693902Z min.s64 %rd37, %rd145, 8; 2026-02-21T12:44:51.6694222Z .loc 1 31 45 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:31:45 2026-02-21T12:44:51.6694585Z mul.lo.s64 %rd146, %rd144, 160; 2026-02-21T12:44:51.6694782Z sub.s64 %rd38, %rd505, %rd146; 2026-02-21T12:44:51.6695112Z .loc 1 32 51 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:32:51 2026-02-21T12:44:51.6695468Z or.b64 %rd147, %rd38, %rd37; 2026-02-21T12:44:51.6695659Z and.b64 %rd148, %rd147, -4294967296; 2026-02-21T12:44:51.6695868Z setp.ne.b64 %p2, %rd148, 0; 2026-02-21T12:44:51.6696053Z @%p2 bra $L__BB0_8; 2026-02-21T12:44:51.6696220Z bra.uni $L__BB0_3; 2026-02-21T12:44:51.6696602Z $L__BB0_8: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:51.6696881Z div.s64 %rd506, %rd38, %rd37; 2026-02-21T12:44:51.6697145Z bra.uni $L__BB0_9; 2026-02-21T12:44:51.6697356Z 
$L__BB0_3: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:51.6697681Z cvt.u32.u64 %r204, %rd37; 2026-02-21T12:44:51.6697875Z cvt.u32.u64 %r205, %rd38; 2026-02-21T12:44:51.6698062Z div.u32 %r206, %r205, %r204; 2026-02-21T12:44:51.6698245Z cvt.u64.u32 %rd506, %r206; 2026-02-21T12:44:51.6698478Z $L__BB0_9: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:51.6698887Z .loc 1 31 64 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:31:64 2026-02-21T12:44:51.6699276Z mul.lo.s64 %rd150, %rd506, %rd37; 2026-02-21T12:44:51.6699475Z sub.s64 %rd151, %rd38, %rd150; 2026-02-21T12:44:51.6699814Z .loc 1 31 30 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:31:30 2026-02-21T12:44:51.6700172Z add.s64 %rd152, %rd151, %rd36; 2026-02-21T12:44:51.6700505Z .loc 1 33 27 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:33:27 2026-02-21T12:44:51.6700869Z shl.b64 %rd153, %rd152, 6; 2026-02-21T12:44:51.6701308Z .loc 1 34 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:34:32 2026-02-21T12:44:51.6701681Z or.b64 %rd42, %rd153, %rd4; 2026-02-21T12:44:51.6702002Z .loc 1 35 27 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:35:27 2026-02-21T12:44:51.6702361Z shl.b64 %rd43, %rd506, 6; 2026-02-21T12:44:51.6702536Z shl.b64 %rd154, %rd42, 14; 2026-02-21T12:44:51.6702719Z add.s64 %rd44, %rd121, %rd154; 2026-02-21T12:44:51.6702910Z add.s64 %rd45, %rd501, %rd43; 2026-02-21T12:44:51.6703245Z .loc 1 43 126 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:43:126 2026-02-21T12:44:51.6703617Z shl.b64 %rd155, %rd152, 20; 2026-02-21T12:44:51.6703801Z add.s64 %rd509, %rd18, %rd155; 2026-02-21T12:44:51.6703992Z add.s64 %rd508, %rd19, %rd43; 2026-02-21T12:44:51.6704191Z mov.b32 %r767, 0f00000000; 2026-02-21T12:44:51.6704374Z mov.b64 %rd510, -24; 2026-02-21T12:44:51.6704541Z mov.b32 %r768, %r767; 2026-02-21T12:44:51.6704712Z mov.b32 %r769, %r767; 2026-02-21T12:44:51.6704883Z mov.b32 %r770, %r767; 2026-02-21T12:44:51.6705041Z mov.b32 %r771, %r767; 
2026-02-21T12:44:51.6705206Z mov.b32 %r772, %r767; 2026-02-21T12:44:51.6705367Z mov.b32 %r773, %r767; 2026-02-21T12:44:51.6705531Z mov.b32 %r774, %r767; 2026-02-21T12:44:51.6705744Z $L__BB0_10: // Parent Loop BB0_2 Depth=1 2026-02-21T12:44:51.6706046Z // => This Inner Loop Header: Depth=2 2026-02-21T12:44:51.6706444Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 2026-02-21T12:44:51.6706957Z add.s64 %rd157, %rd509, -64; 2026-02-21T12:44:51.6707161Z // begin inline asm 2026-02-21T12:44:51.6707328Z mov.u64 %rd156, 0x0; 2026-02-21T12:44:51.6707583Z createpolicy.fractional.L2::evict_last.b64 %rd156, 1.0; 2026-02-21T12:44:51.6707843Z // end inline asm 2026-02-21T12:44:51.6707998Z // begin inline asm 2026-02-21T12:44:51.6708150Z mov.u32 %r208, 0x0; 2026-02-21T12:44:51.6708416Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r208 }, [ %rd157 + 0 ], %rd156; 2026-02-21T12:44:51.6708784Z // end inline asm 2026-02-21T12:44:51.6709095Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 2026-02-21T12:44:51.6709453Z bar.sync 0; 2026-02-21T12:44:51.6709606Z st.shared.b32 [%r9], %r208; 2026-02-21T12:44:51.6709788Z bar.sync 0; 2026-02-21T12:44:51.6709936Z ld.shared.b16 %rs4, [%r10]; 2026-02-21T12:44:51.6710124Z ld.shared.b16 %rs5, [%r10+256]; 2026-02-21T12:44:51.6710320Z ld.shared.b16 %rs6, [%r10+16]; 2026-02-21T12:44:51.6710515Z ld.shared.b16 %rs7, [%r10+272]; 2026-02-21T12:44:51.6710706Z ld.shared.b16 %rs8, [%r11]; 2026-02-21T12:44:51.6710893Z ld.shared.b16 %rs9, [%r11+256]; 2026-02-21T12:44:51.6711087Z ld.shared.b16 %rs10, [%r11+16]; 2026-02-21T12:44:51.6711370Z ld.shared.b16 %rs11, [%r11+272]; 2026-02-21T12:44:51.6711572Z cvt.f32.bf16 %r305, %rs4; 2026-02-21T12:44:51.6711764Z cvt.f32.bf16 %r306, %rs5; 2026-02-21T12:44:51.6712010Z cvt.f32.bf16 %r307, %rs8; 2026-02-21T12:44:51.6712181Z cvt.f32.bf16 %r308, %rs9; 2026-02-21T12:44:51.6712354Z cvt.f32.bf16 %r325, %rs6; 2026-02-21T12:44:51.6712522Z cvt.f32.bf16 %r326, %rs7; 
2026-02-21T12:44:51.6712700Z cvt.f32.bf16 %r327, %rs10; 2026-02-21T12:44:51.6712875Z cvt.f32.bf16 %r328, %rs11; 2026-02-21T12:44:51.6713200Z .loc 1 57 87 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.6713564Z // begin inline asm 2026-02-21T12:44:51.6713720Z mov.u16 %rs1, 0x0; 2026-02-21T12:44:51.6713892Z ld.global.b8 { %rs1 }, [ %rd508 + 0 ]; 2026-02-21T12:44:51.6714093Z // end inline asm 2026-02-21T12:44:51.6714401Z .loc 1 65 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 2026-02-21T12:44:51.6714755Z bar.sync 0; 2026-02-21T12:44:51.6714918Z st.shared.b8 [%r12], %rs1; 2026-02-21T12:44:51.6715102Z bar.sync 0; 2026-02-21T12:44:51.6715349Z ld.shared.v2.b8 {%rs12, %rs13}, [%r13]; 2026-02-21T12:44:51.6715794Z .loc 1 60 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.6716168Z shl.b16 %rs14, %rs12, 4; 2026-02-21T12:44:51.6716349Z shl.b16 %rs15, %rs13, 4; 2026-02-21T12:44:51.6716822Z .loc 1 75 58 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.6717199Z selp.b16 %rs16, %rs14, %rs12, %p134; 2026-02-21T12:44:51.6717422Z cvt.s16.s8 %rs17, %rs16; 2026-02-21T12:44:51.6717608Z shr.s16 %rs18, %rs17, 4; 2026-02-21T12:44:51.6717796Z selp.b16 %rs19, %rs15, %rs13, %p134; 2026-02-21T12:44:51.6717994Z cvt.s16.s8 %rs20, %rs19; 2026-02-21T12:44:51.6718168Z shr.s16 %rs21, %rs20, 4; 2026-02-21T12:44:51.6718482Z .loc 1 80 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.6718854Z cvt.rn.f32.s16 %r789, %rs18; 2026-02-21T12:44:51.6719037Z cvt.rn.f32.s16 %r790, %rs21; 2026-02-21T12:44:51.6719230Z bar.sync 0; 2026-02-21T12:44:51.6719378Z st.shared.b32 [%r14], %r789; 2026-02-21T12:44:51.6719561Z st.shared.b32 [%r15], %r790; 2026-02-21T12:44:51.6719814Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r767}; 2026-02-21T12:44:51.6720082Z bar.sync 0; 2026-02-21T12:44:51.6720231Z // begin inline asm 
2026-02-21T12:44:51.6720507Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r269, %r309, %r349, %r389}, [%r213]; 2026-02-21T12:44:51.6720830Z // end inline asm 2026-02-21T12:44:51.6720978Z bar.sync 0; 2026-02-21T12:44:51.6721190Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r769}; 2026-02-21T12:44:51.6721456Z bar.sync 0; 2026-02-21T12:44:51.6721604Z // begin inline asm 2026-02-21T12:44:51.6721875Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r271, %r311, %r351, %r391}, [%r213]; 2026-02-21T12:44:51.6722220Z // end inline asm 2026-02-21T12:44:51.6722374Z bar.sync 0; 2026-02-21T12:44:51.6722584Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r768}; 2026-02-21T12:44:51.6722855Z bar.sync 0; 2026-02-21T12:44:51.6723001Z // begin inline asm 2026-02-21T12:44:51.6723389Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r270, %r310, %r350, %r390}, [%r213]; 2026-02-21T12:44:51.6723739Z // end inline asm 2026-02-21T12:44:51.6723891Z bar.sync 0; 2026-02-21T12:44:51.6724121Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r770}; 2026-02-21T12:44:51.6724474Z bar.sync 0; 2026-02-21T12:44:51.6724626Z // begin inline asm 2026-02-21T12:44:51.6725020Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r272, %r312, %r352, %r392}, [%r213]; 2026-02-21T12:44:51.6725450Z // end inline asm 2026-02-21T12:44:51.6725599Z bar.sync 0; 2026-02-21T12:44:51.6725814Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r771}; 2026-02-21T12:44:51.6726093Z bar.sync 0; 2026-02-21T12:44:51.6726335Z // begin inline asm 2026-02-21T12:44:51.6727176Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r273, %r313, %r353, %r393}, [%r213]; 2026-02-21T12:44:51.6727540Z // end inline asm 2026-02-21T12:44:51.6727805Z bar.sync 0; 2026-02-21T12:44:51.6728031Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r773}; 2026-02-21T12:44:51.6728315Z bar.sync 0; 2026-02-21T12:44:51.6728462Z // begin inline asm 2026-02-21T12:44:51.6728743Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r275, %r315, %r355, %r395}, [%r213]; 
2026-02-21T12:44:51.6729084Z // end inline asm 2026-02-21T12:44:51.6729240Z bar.sync 0; 2026-02-21T12:44:51.6729452Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r772}; 2026-02-21T12:44:51.6729724Z bar.sync 0; 2026-02-21T12:44:51.6729870Z // begin inline asm 2026-02-21T12:44:51.6730147Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r274, %r314, %r354, %r394}, [%r213]; 2026-02-21T12:44:51.6730473Z // end inline asm 2026-02-21T12:44:51.6730624Z bar.sync 0; 2026-02-21T12:44:51.6730843Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r774}; 2026-02-21T12:44:51.6731106Z bar.sync 0; 2026-02-21T12:44:51.6731254Z // begin inline asm 2026-02-21T12:44:51.6731677Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r276, %r316, %r356, %r396}, [%r213]; 2026-02-21T12:44:51.6732010Z // end inline asm 2026-02-21T12:44:51.6732157Z $L__tmp1: 2026-02-21T12:44:51.6732523Z .loc 2 291 36 // standard.py:291:36 @[ cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 ] 2026-02-21T12:44:51.6732951Z // begin inline asm 2026-02-21T12:44:51.6733135Z fence.proxy.async.shared::cta; 2026-02-21T12:44:51.6733333Z // end inline asm 2026-02-21T12:44:51.6733499Z shfl.sync.idx.b32 %r791, %r3, 0, 31, -1; 2026-02-21T12:44:51.6733732Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.6733919Z mov.pred %p20, -1; 2026-02-21T12:44:51.6734083Z // begin inline asm 2026-02-21T12:44:51.6734521Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r269,%r270,%r271,%r272,%r273,%r274,%r275,%r276}, {%r305,%r306,%r307,%r308}, %rd7, %p20, 1, 1; 2026-02-21T12:44:51.6735185Z // end inline asm 2026-02-21T12:44:51.6735405Z // begin inline asm 2026-02-21T12:44:51.6736036Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r269,%r270,%r271,%r272,%r273,%r274,%r275,%r276}, {%r325,%r326,%r327,%r328}, %rd8, %p20, 1, 1; 2026-02-21T12:44:51.6737029Z // end inline asm 2026-02-21T12:44:51.6737252Z // begin inline asm 2026-02-21T12:44:51.6737899Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 
{%r309,%r310,%r311,%r312,%r313,%r314,%r315,%r316}, {%r305,%r306,%r307,%r308}, %rd9, %p20, 1, 1; 2026-02-21T12:44:51.6738646Z // end inline asm 2026-02-21T12:44:51.6738864Z // begin inline asm 2026-02-21T12:44:51.6739499Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r309,%r310,%r311,%r312,%r313,%r314,%r315,%r316}, {%r325,%r326,%r327,%r328}, %rd10, %p20, 1, 1; 2026-02-21T12:44:51.6740188Z // end inline asm 2026-02-21T12:44:51.6740407Z // begin inline asm 2026-02-21T12:44:51.6741036Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r349,%r350,%r351,%r352,%r353,%r354,%r355,%r356}, {%r305,%r306,%r307,%r308}, %rd11, %p20, 1, 1; 2026-02-21T12:44:51.6741725Z // end inline asm 2026-02-21T12:44:51.6741940Z // begin inline asm 2026-02-21T12:44:51.6742591Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r349,%r350,%r351,%r352,%r353,%r354,%r355,%r356}, {%r325,%r326,%r327,%r328}, %rd12, %p20, 1, 1; 2026-02-21T12:44:51.6743205Z // end inline asm 2026-02-21T12:44:51.6743354Z // begin inline asm 2026-02-21T12:44:51.6743786Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r389,%r390,%r391,%r392,%r393,%r394,%r395,%r396}, {%r305,%r306,%r307,%r308}, %rd13, %p20, 1, 1; 2026-02-21T12:44:51.6744261Z // end inline asm 2026-02-21T12:44:51.6744414Z // begin inline asm 2026-02-21T12:44:51.6744834Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r389,%r390,%r391,%r392,%r393,%r394,%r395,%r396}, {%r325,%r326,%r327,%r328}, %rd14, %p20, 1, 1; 2026-02-21T12:44:51.6745314Z // end inline asm 2026-02-21T12:44:51.6757641Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.6758060Z mov.b32 %r916, 0; 2026-02-21T12:44:51.6758234Z mov.b32 %r442, %r916; 2026-02-21T12:44:51.6758416Z mov.b32 %r443, %r916; 2026-02-21T12:44:51.6758680Z mov.b32 %r441, %r3946; 2026-02-21T12:44:51.6758864Z // begin inline asm 2026-02-21T12:44:51.6759445Z // wait for regs: 
%r269,%r270,%r271,%r272,%r273,%r274,%r275,%r276,%r309,%r310,%r311,%r312,%r313,%r314,%r315,%r316,%r349,%r350,%r351,%r352,%r353,%r354,%r355,%r356,%r389,%r390,%r391,%r392,%r393,%r394,%r395,%r396,%r441,%r442,%r443 2026-02-21T12:44:51.6760089Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:51.6760297Z // end inline asm 2026-02-21T12:44:51.6760450Z $L__tmp2: 2026-02-21T12:44:51.6760765Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 2026-02-21T12:44:51.6761158Z add.s64 %rd169, %rd509, -32; 2026-02-21T12:44:51.6761360Z // begin inline asm 2026-02-21T12:44:51.6761540Z mov.u64 %rd168, 0x0; 2026-02-21T12:44:51.6761787Z createpolicy.fractional.L2::evict_last.b64 %rd168, 1.0; 2026-02-21T12:44:51.6762049Z // end inline asm 2026-02-21T12:44:51.6762212Z // begin inline asm 2026-02-21T12:44:51.6762448Z mov.u32 %r479, 0x0; 2026-02-21T12:44:51.6762786Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r479 }, [ %rd169 + 0 ], %rd168; 2026-02-21T12:44:51.6763114Z // end inline asm 2026-02-21T12:44:51.6763432Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 2026-02-21T12:44:51.6763801Z bar.sync 0; 2026-02-21T12:44:51.6763958Z st.shared.b32 [%r9], %r479; 2026-02-21T12:44:51.6764145Z bar.sync 0; 2026-02-21T12:44:51.6764302Z ld.shared.b16 %rs22, [%r10]; 2026-02-21T12:44:51.6764501Z ld.shared.b16 %rs23, [%r10+256]; 2026-02-21T12:44:51.6764704Z ld.shared.b16 %rs24, [%r10+16]; 2026-02-21T12:44:51.6764907Z ld.shared.b16 %rs25, [%r10+272]; 2026-02-21T12:44:51.6765106Z ld.shared.b16 %rs26, [%r11]; 2026-02-21T12:44:51.6765304Z ld.shared.b16 %rs27, [%r11+256]; 2026-02-21T12:44:51.6765509Z ld.shared.b16 %rs28, [%r11+16]; 2026-02-21T12:44:51.6765702Z ld.shared.b16 %rs29, [%r11+272]; 2026-02-21T12:44:51.6765904Z cvt.f32.bf16 %r536, %rs22; 2026-02-21T12:44:51.6766089Z cvt.f32.bf16 %r537, %rs23; 2026-02-21T12:44:51.6766275Z cvt.f32.bf16 %r538, %rs26; 2026-02-21T12:44:51.6766447Z cvt.f32.bf16 %r539, %rs27; 2026-02-21T12:44:51.6766783Z 
cvt.f32.bf16 %r556, %rs24; 2026-02-21T12:44:51.6766964Z cvt.f32.bf16 %r557, %rs25; 2026-02-21T12:44:51.6767139Z cvt.f32.bf16 %r558, %rs28; 2026-02-21T12:44:51.6767319Z cvt.f32.bf16 %r559, %rs29; 2026-02-21T12:44:51.6767657Z .loc 1 57 87 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.6768033Z add.s64 %rd171, %rd508, 10240; 2026-02-21T12:44:51.6768224Z // begin inline asm 2026-02-21T12:44:51.6768392Z mov.u16 %rs2, 0x0; 2026-02-21T12:44:51.6768563Z ld.global.b8 { %rs2 }, [ %rd171 + 0 ]; 2026-02-21T12:44:51.6768784Z // end inline asm 2026-02-21T12:44:51.6769096Z .loc 1 65 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 2026-02-21T12:44:51.6769452Z bar.sync 0; 2026-02-21T12:44:51.6769613Z st.shared.b8 [%r12], %rs2; 2026-02-21T12:44:51.6769794Z bar.sync 0; 2026-02-21T12:44:51.6769964Z ld.shared.v2.b8 {%rs30, %rs31}, [%r13]; 2026-02-21T12:44:51.6770324Z .loc 1 60 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.6770702Z shl.b16 %rs32, %rs30, 4; 2026-02-21T12:44:51.6770881Z shl.b16 %rs33, %rs31, 4; 2026-02-21T12:44:51.6771207Z .loc 1 75 58 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.6771590Z selp.b16 %rs34, %rs32, %rs30, %p134; 2026-02-21T12:44:51.6771798Z cvt.s16.s8 %rs35, %rs34; 2026-02-21T12:44:51.6771984Z shr.s16 %rs36, %rs35, 4; 2026-02-21T12:44:51.6772161Z selp.b16 %rs37, %rs33, %rs31, %p134; 2026-02-21T12:44:51.6772368Z cvt.s16.s8 %rs38, %rs37; 2026-02-21T12:44:51.6772542Z shr.s16 %rs39, %rs38, 4; 2026-02-21T12:44:51.6772957Z .loc 1 80 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.6773339Z cvt.rn.f32.s16 %r792, %rs36; 2026-02-21T12:44:51.6773616Z cvt.rn.f32.s16 %r793, %rs39; 2026-02-21T12:44:51.6773803Z bar.sync 0; 2026-02-21T12:44:51.6773957Z st.shared.b32 [%r14], %r792; 2026-02-21T12:44:51.6774147Z st.shared.b32 [%r15], %r793; 2026-02-21T12:44:51.6774318Z $L__tmp3: 
2026-02-21T12:44:51.6774698Z .loc 2 291 36 // standard.py:291:36 @[ cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 ] 2026-02-21T12:44:51.6775138Z // begin inline asm 2026-02-21T12:44:51.6775336Z fence.proxy.async.shared::cta; 2026-02-21T12:44:51.6775533Z // end inline asm 2026-02-21T12:44:51.6775690Z bar.sync 0; 2026-02-21T12:44:51.6775859Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.6776049Z // begin inline asm 2026-02-21T12:44:51.6776656Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r269,%r270,%r271,%r272,%r273,%r274,%r275,%r276}, {%r536,%r537,%r538,%r539}, %rd7, %p20, 1, 1; 2026-02-21T12:44:51.6777153Z // end inline asm 2026-02-21T12:44:51.6777310Z // begin inline asm 2026-02-21T12:44:51.6777899Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r269,%r270,%r271,%r272,%r273,%r274,%r275,%r276}, {%r556,%r557,%r558,%r559}, %rd8, %p20, 1, 1; 2026-02-21T12:44:51.6778402Z // end inline asm 2026-02-21T12:44:51.6778562Z // begin inline asm 2026-02-21T12:44:51.6779000Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r309,%r310,%r311,%r312,%r313,%r314,%r315,%r316}, {%r536,%r537,%r538,%r539}, %rd9, %p20, 1, 1; 2026-02-21T12:44:51.6779488Z // end inline asm 2026-02-21T12:44:51.6779636Z // begin inline asm 2026-02-21T12:44:51.6780070Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r309,%r310,%r311,%r312,%r313,%r314,%r315,%r316}, {%r556,%r557,%r558,%r559}, %rd10, %p20, 1, 1; 2026-02-21T12:44:51.6780548Z // end inline asm 2026-02-21T12:44:51.6780703Z // begin inline asm 2026-02-21T12:44:51.6781136Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r349,%r350,%r351,%r352,%r353,%r354,%r355,%r356}, {%r536,%r537,%r538,%r539}, %rd11, %p20, 1, 1; 2026-02-21T12:44:51.6781614Z // end inline asm 2026-02-21T12:44:51.6781770Z // begin inline asm 2026-02-21T12:44:51.6782196Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r349,%r350,%r351,%r352,%r353,%r354,%r355,%r356}, {%r556,%r557,%r558,%r559}, %rd12, %p20, 1, 1; 
2026-02-21T12:44:51.6782680Z // end inline asm 2026-02-21T12:44:51.6782829Z // begin inline asm 2026-02-21T12:44:51.6783259Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r389,%r390,%r391,%r392,%r393,%r394,%r395,%r396}, {%r536,%r537,%r538,%r539}, %rd13, %p20, 1, 1; 2026-02-21T12:44:51.6783741Z // end inline asm 2026-02-21T12:44:51.6783889Z // begin inline asm 2026-02-21T12:44:51.6784320Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r389,%r390,%r391,%r392,%r393,%r394,%r395,%r396}, {%r556,%r557,%r558,%r559}, %rd14, %p20, 1, 1; 2026-02-21T12:44:51.6784794Z // end inline asm 2026-02-21T12:44:51.6784972Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.6785183Z mov.b32 %r673, %r916; 2026-02-21T12:44:51.6785359Z mov.b32 %r674, %r916; 2026-02-21T12:44:51.6785532Z mov.b32 %r672, %r3946; 2026-02-21T12:44:51.6785711Z // begin inline asm 2026-02-21T12:44:51.6786279Z // wait for regs: %r269,%r270,%r271,%r272,%r273,%r274,%r275,%r276,%r309,%r310,%r311,%r312,%r313,%r314,%r315,%r316,%r349,%r350,%r351,%r352,%r353,%r354,%r355,%r356,%r389,%r390,%r391,%r392,%r393,%r394,%r395,%r396,%r672,%r673,%r674 2026-02-21T12:44:51.6787068Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:51.6787274Z // end inline asm 2026-02-21T12:44:51.6787422Z $L__tmp4: 2026-02-21T12:44:51.6787729Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 2026-02-21T12:44:51.6788095Z // begin inline asm 2026-02-21T12:44:51.6788260Z mov.u64 %rd180, 0x0; 2026-02-21T12:44:51.6788590Z createpolicy.fractional.L2::evict_last.b64 %rd180, 1.0; 2026-02-21T12:44:51.6788964Z // end inline asm 2026-02-21T12:44:51.6789127Z // begin inline asm 2026-02-21T12:44:51.6789285Z mov.u32 %r710, 0x0; 2026-02-21T12:44:51.6789556Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r710 }, [ %rd509 + 0 ], %rd180; 2026-02-21T12:44:51.6789925Z // end inline asm 2026-02-21T12:44:51.6790234Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 2026-02-21T12:44:51.6790589Z 
bar.sync 0; 2026-02-21T12:44:51.6790761Z st.shared.b32 [%r9], %r710; 2026-02-21T12:44:51.6790950Z bar.sync 0; 2026-02-21T12:44:51.6791100Z ld.shared.b16 %rs40, [%r10]; 2026-02-21T12:44:51.6791301Z ld.shared.b16 %rs41, [%r10+256]; 2026-02-21T12:44:51.6791498Z ld.shared.b16 %rs42, [%r10+16]; 2026-02-21T12:44:51.6791696Z ld.shared.b16 %rs43, [%r10+272]; 2026-02-21T12:44:51.6791889Z ld.shared.b16 %rs44, [%r11]; 2026-02-21T12:44:51.6792083Z ld.shared.b16 %rs45, [%r11+256]; 2026-02-21T12:44:51.6792272Z ld.shared.b16 %rs46, [%r11+16]; 2026-02-21T12:44:51.6792471Z ld.shared.b16 %rs47, [%r11+272]; 2026-02-21T12:44:51.6792667Z cvt.f32.bf16 %r743, %rs40; 2026-02-21T12:44:51.6792851Z cvt.f32.bf16 %r744, %rs41; 2026-02-21T12:44:51.6793190Z cvt.f32.bf16 %r745, %rs44; 2026-02-21T12:44:51.6793370Z cvt.f32.bf16 %r746, %rs45; 2026-02-21T12:44:51.6793551Z cvt.f32.bf16 %r763, %rs42; 2026-02-21T12:44:51.6793723Z cvt.f32.bf16 %r764, %rs43; 2026-02-21T12:44:51.6793901Z cvt.f32.bf16 %r765, %rs46; 2026-02-21T12:44:51.6794073Z cvt.f32.bf16 %r766, %rs47; 2026-02-21T12:44:51.6794399Z .loc 1 57 87 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.6794762Z add.s64 %rd183, %rd508, 20480; 2026-02-21T12:44:51.6794957Z // begin inline asm 2026-02-21T12:44:51.6795117Z mov.u16 %rs3, 0x0; 2026-02-21T12:44:51.6795284Z ld.global.b8 { %rs3 }, [ %rd183 + 0 ]; 2026-02-21T12:44:51.6795495Z // end inline asm 2026-02-21T12:44:51.6795789Z .loc 1 65 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 2026-02-21T12:44:51.6796147Z bar.sync 0; 2026-02-21T12:44:51.6796306Z st.shared.b8 [%r12], %rs3; 2026-02-21T12:44:51.6796618Z bar.sync 0; 2026-02-21T12:44:51.6796792Z ld.shared.v2.b8 {%rs48, %rs49}, [%r13]; 2026-02-21T12:44:51.6797152Z .loc 1 60 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.6797526Z shl.b16 %rs50, %rs48, 4; 2026-02-21T12:44:51.6797707Z shl.b16 %rs51, %rs49, 4; 2026-02-21T12:44:51.6798031Z .loc 1 75 58 // 
cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.6798398Z selp.b16 %rs52, %rs50, %rs48, %p134; 2026-02-21T12:44:51.6798611Z cvt.s16.s8 %rs53, %rs52; 2026-02-21T12:44:51.6798787Z shr.s16 %rs54, %rs53, 4; 2026-02-21T12:44:51.6798966Z selp.b16 %rs55, %rs51, %rs49, %p134; 2026-02-21T12:44:51.6799171Z cvt.s16.s8 %rs56, %rs55; 2026-02-21T12:44:51.6799340Z shr.s16 %rs57, %rs56, 4; 2026-02-21T12:44:51.6799655Z .loc 1 80 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.6800012Z cvt.rn.f32.s16 %r794, %rs54; 2026-02-21T12:44:51.6800203Z cvt.rn.f32.s16 %r795, %rs57; 2026-02-21T12:44:51.6800379Z bar.sync 0; 2026-02-21T12:44:51.6800532Z st.shared.b32 [%r14], %r794; 2026-02-21T12:44:51.6800720Z st.shared.b32 [%r15], %r795; 2026-02-21T12:44:51.6800891Z $L__tmp5: 2026-02-21T12:44:51.6801256Z .loc 2 291 36 // standard.py:291:36 @[ cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 ] 2026-02-21T12:44:51.6801797Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r213], {%r269, %r309, %r349, %r389}; 2026-02-21T12:44:51.6802132Z bar.sync 0; 2026-02-21T12:44:51.6802351Z // begin inline asm 2026-02-21T12:44:51.6802720Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r767}, [%r712]; 2026-02-21T12:44:51.6803117Z // end inline asm 2026-02-21T12:44:51.6803330Z bar.sync 0; 2026-02-21T12:44:51.6803706Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r213], {%r271, %r311, %r351, %r391}; 2026-02-21T12:44:51.6804286Z bar.sync 0; 2026-02-21T12:44:51.6804489Z // begin inline asm 2026-02-21T12:44:51.6804820Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r769}, [%r712]; 2026-02-21T12:44:51.6805296Z // end inline asm 2026-02-21T12:44:51.6805501Z bar.sync 0; 2026-02-21T12:44:51.6805870Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r213], {%r270, %r310, %r350, %r390}; 2026-02-21T12:44:51.6806323Z bar.sync 0; 2026-02-21T12:44:51.6806726Z // begin inline asm 2026-02-21T12:44:51.6807055Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r768}, [%r712]; 
2026-02-21T12:44:51.6807428Z // end inline asm 2026-02-21T12:44:51.6807634Z bar.sync 0; 2026-02-21T12:44:51.6807988Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r213], {%r272, %r312, %r352, %r392}; 2026-02-21T12:44:51.6808482Z bar.sync 0; 2026-02-21T12:44:51.6808719Z // begin inline asm 2026-02-21T12:44:51.6809000Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r770}, [%r712]; 2026-02-21T12:44:51.6809263Z // end inline asm 2026-02-21T12:44:51.6809419Z bar.sync 0; 2026-02-21T12:44:51.6809668Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r213], {%r273, %r313, %r353, %r393}; 2026-02-21T12:44:51.6810103Z bar.sync 0; 2026-02-21T12:44:51.6810331Z // begin inline asm 2026-02-21T12:44:51.6810556Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r771}, [%r712]; 2026-02-21T12:44:51.6810820Z // end inline asm 2026-02-21T12:44:51.6810962Z bar.sync 0; 2026-02-21T12:44:51.6811217Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r213], {%r275, %r315, %r355, %r395}; 2026-02-21T12:44:51.6811530Z bar.sync 0; 2026-02-21T12:44:51.6811681Z // begin inline asm 2026-02-21T12:44:51.6811895Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r773}, [%r712]; 2026-02-21T12:44:51.6812162Z // end inline asm 2026-02-21T12:44:51.6812310Z bar.sync 0; 2026-02-21T12:44:51.6812564Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r213], {%r274, %r314, %r354, %r394}; 2026-02-21T12:44:51.6812878Z bar.sync 0; 2026-02-21T12:44:51.6813024Z // begin inline asm 2026-02-21T12:44:51.6813243Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r772}, [%r712]; 2026-02-21T12:44:51.6813508Z // end inline asm 2026-02-21T12:44:51.6813650Z bar.sync 0; 2026-02-21T12:44:51.6813912Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r213], {%r276, %r316, %r356, %r396}; 2026-02-21T12:44:51.6814224Z bar.sync 0; 2026-02-21T12:44:51.6814371Z // begin inline asm 2026-02-21T12:44:51.6814585Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r774}, [%r712]; 2026-02-21T12:44:51.6814848Z // end inline asm 2026-02-21T12:44:51.6814999Z // begin inline asm 2026-02-21T12:44:51.6815189Z 
fence.proxy.async.shared::cta; 2026-02-21T12:44:51.6815383Z // end inline asm 2026-02-21T12:44:51.6815541Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.6815725Z shl.b32 %r796, %r791, 8; 2026-02-21T12:44:51.6815894Z and.b32 %r797, %r796, 3072; 2026-02-21T12:44:51.6816077Z add.s32 %r798, %r797, %r3946; 2026-02-21T12:44:51.6816257Z bfe.u32 %r799, %r798, 4, 14; 2026-02-21T12:44:51.6816440Z cvt.u64.u32 %rd186, %r799; 2026-02-21T12:44:51.6816828Z or.b64 %rd184, %rd186, -9223371899399045120; 2026-02-21T12:44:51.6817046Z // begin inline asm 2026-02-21T12:44:51.6817501Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r767,%r768,%r769,%r770,%r771,%r772,%r773,%r774}, {%r743,%r744,%r745,%r746}, %rd184, %p20, 1, 1; 2026-02-21T12:44:51.6817989Z // end inline asm 2026-02-21T12:44:51.6818144Z add.s32 %r800, %r798, 32; 2026-02-21T12:44:51.6818314Z bfe.u32 %r801, %r800, 4, 14; 2026-02-21T12:44:51.6818492Z cvt.u64.u32 %rd187, %r801; 2026-02-21T12:44:51.6818680Z or.b64 %rd185, %rd187, -9223371899399045120; 2026-02-21T12:44:51.6818891Z // begin inline asm 2026-02-21T12:44:51.6819334Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r767,%r768,%r769,%r770,%r771,%r772,%r773,%r774}, {%r763,%r764,%r765,%r766}, %rd185, %p20, 1, 1; 2026-02-21T12:44:51.6819812Z // end inline asm 2026-02-21T12:44:51.6819981Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.6820179Z mov.b32 %r776, %r916; 2026-02-21T12:44:51.6820441Z mov.b32 %r777, %r916; 2026-02-21T12:44:51.6820597Z mov.b32 %r775, %r3946; 2026-02-21T12:44:51.6820765Z // begin inline asm 2026-02-21T12:44:51.6821014Z // wait for regs: %r767,%r768,%r769,%r770,%r771,%r772,%r773,%r774,%r775,%r776,%r777 2026-02-21T12:44:51.6821397Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:51.6821592Z // end inline asm 2026-02-21T12:44:51.6821734Z $L__tmp6: 2026-02-21T12:44:51.6822039Z .loc 1 43 126 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:43:126 2026-02-21T12:44:51.6822408Z add.s64 %rd510, %rd510, 24; 
2026-02-21T12:44:51.6822591Z add.s64 %rd509, %rd509, 96; 2026-02-21T12:44:51.6822771Z add.s64 %rd508, %rd508, 30720; 2026-02-21T12:44:51.6822967Z setp.lt.u64 %p22, %rd510, 4056; 2026-02-21T12:44:51.6823154Z @%p22 bra $L__BB0_10; 2026-02-21T12:44:51.6823347Z // %bb.11: // %.preheader211 2026-02-21T12:44:51.6823628Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:51.6824032Z .loc 1 51 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:32 2026-02-21T12:44:51.6824466Z shl.b64 %rd201, %rd15, 1; 2026-02-21T12:44:51.6824707Z add.s64 %rd202, %rd44, %rd201; 2026-02-21T12:44:51.6824899Z add.s64 %rd189, %rd202, 16320; 2026-02-21T12:44:51.6825224Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 2026-02-21T12:44:51.6825582Z // begin inline asm 2026-02-21T12:44:51.6825737Z mov.u64 %rd188, 0x0; 2026-02-21T12:44:51.6825980Z createpolicy.fractional.L2::evict_last.b64 %rd188, 1.0; 2026-02-21T12:44:51.6826237Z // end inline asm 2026-02-21T12:44:51.6826388Z // begin inline asm 2026-02-21T12:44:51.6826675Z mov.u32 %r802, 0x0; 2026-02-21T12:44:51.6826930Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r802 }, [ %rd189 + 0 ], %rd188; 2026-02-21T12:44:51.6827235Z // end inline asm 2026-02-21T12:44:51.6827537Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 2026-02-21T12:44:51.6827922Z bar.sync 0; 2026-02-21T12:44:51.6828082Z st.shared.b32 [%r9], %r802; 2026-02-21T12:44:51.6828262Z bar.sync 0; 2026-02-21T12:44:51.6828420Z ld.shared.b16 %rs60, [%r18]; 2026-02-21T12:44:51.6828712Z ld.shared.b16 %rs61, [%r18+256]; 2026-02-21T12:44:51.6828916Z ld.shared.b16 %rs62, [%r18+16]; 2026-02-21T12:44:51.6829105Z ld.shared.b16 %rs63, [%r18+272]; 2026-02-21T12:44:51.6829304Z ld.shared.b16 %rs64, [%r19]; 2026-02-21T12:44:51.6829483Z ld.shared.b16 %rs65, [%r19+256]; 2026-02-21T12:44:51.6829678Z ld.shared.b16 %rs66, [%r19+16]; 2026-02-21T12:44:51.6829862Z ld.shared.b16 %rs67, [%r19+272]; 2026-02-21T12:44:51.6830052Z cvt.f32.bf16 
%r819, %rs60; 2026-02-21T12:44:51.6830233Z cvt.f32.bf16 %r820, %rs61; 2026-02-21T12:44:51.6830404Z cvt.f32.bf16 %r821, %rs64; 2026-02-21T12:44:51.6830578Z cvt.f32.bf16 %r822, %rs65; 2026-02-21T12:44:51.6830743Z cvt.f32.bf16 %r839, %rs62; 2026-02-21T12:44:51.6830913Z cvt.f32.bf16 %r840, %rs63; 2026-02-21T12:44:51.6831084Z cvt.f32.bf16 %r841, %rs66; 2026-02-21T12:44:51.6831255Z cvt.f32.bf16 %r842, %rs67; 2026-02-21T12:44:51.6831575Z .loc 1 57 34 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:34 2026-02-21T12:44:51.6831944Z add.s64 %rd203, %rd45, %rd502; 2026-02-21T12:44:51.6832136Z add.s64 %rd191, %rd203, 5222400; 2026-02-21T12:44:51.6832466Z .loc 1 57 87 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.6832820Z // begin inline asm 2026-02-21T12:44:51.6832975Z mov.u16 %rs58, 0x0; 2026-02-21T12:44:51.6833144Z ld.global.b8 { %rs58 }, [ %rd191 + 0 ]; 2026-02-21T12:44:51.6833345Z // end inline asm 2026-02-21T12:44:51.6833644Z .loc 1 65 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 2026-02-21T12:44:51.6833986Z bar.sync 0; 2026-02-21T12:44:51.6834137Z st.shared.b8 [%r12], %rs58; 2026-02-21T12:44:51.6834316Z bar.sync 0; 2026-02-21T12:44:51.6834577Z ld.shared.v2.b8 {%rs68, %rs69}, [%r13]; 2026-02-21T12:44:51.6834933Z .loc 1 60 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.6835355Z shl.b16 %rs70, %rs68, 4; 2026-02-21T12:44:51.6835530Z shl.b16 %rs71, %rs69, 4; 2026-02-21T12:44:51.6835838Z .loc 1 75 58 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.6836203Z selp.b16 %rs72, %rs70, %rs68, %p134; 2026-02-21T12:44:51.6836406Z cvt.s16.s8 %rs73, %rs72; 2026-02-21T12:44:51.6836704Z shr.s16 %rs74, %rs73, 4; 2026-02-21T12:44:51.6836887Z selp.b16 %rs75, %rs71, %rs69, %p134; 2026-02-21T12:44:51.6837081Z cvt.s16.s8 %rs76, %rs75; 2026-02-21T12:44:51.6837249Z shr.s16 %rs77, %rs76, 4; 2026-02-21T12:44:51.6837575Z .loc 1 80 32 // 
cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.6837942Z cvt.rn.f32.s16 %r932, %rs74; 2026-02-21T12:44:51.6838120Z cvt.rn.f32.s16 %r933, %rs77; 2026-02-21T12:44:51.6838300Z bar.sync 0; 2026-02-21T12:44:51.6838452Z st.shared.b32 [%r14], %r932; 2026-02-21T12:44:51.6838722Z st.shared.b32 [%r15], %r933; 2026-02-21T12:44:51.6838959Z $L__tmp7: 2026-02-21T12:44:51.6839326Z .loc 2 291 36 // standard.py:291:36 @[ cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 ] 2026-02-21T12:44:51.6839748Z // begin inline asm 2026-02-21T12:44:51.6839928Z fence.proxy.async.shared::cta; 2026-02-21T12:44:51.6840123Z // end inline asm 2026-02-21T12:44:51.6840269Z bar.sync 0; 2026-02-21T12:44:51.6840453Z shfl.sync.idx.b32 %r934, %r3, 0, 31, -1; 2026-02-21T12:44:51.6840680Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.6840866Z shl.b32 %r935, %r934, 8; 2026-02-21T12:44:51.6841037Z and.b32 %r936, %r935, 3072; 2026-02-21T12:44:51.6841216Z add.s32 %r937, %r936, %r3946; 2026-02-21T12:44:51.6841401Z bfe.u32 %r938, %r937, 4, 14; 2026-02-21T12:44:51.6841581Z cvt.u64.u32 %rd204, %r938; 2026-02-21T12:44:51.6841783Z or.b64 %rd192, %rd204, -9223371899399045120; 2026-02-21T12:44:51.6841991Z // begin inline asm 2026-02-21T12:44:51.6842445Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r767,%r768,%r769,%r770,%r771,%r772,%r773,%r774}, {%r819,%r820,%r821,%r822}, %rd192, %p20, 1, 1; 2026-02-21T12:44:51.6842936Z // end inline asm 2026-02-21T12:44:51.6843095Z add.s32 %r940, %r936, %r169; 2026-02-21T12:44:51.6843273Z bfe.u32 %r941, %r940, 4, 14; 2026-02-21T12:44:51.6843446Z cvt.u64.u32 %rd205, %r941; 2026-02-21T12:44:51.6843639Z or.b64 %rd193, %rd205, -9223371899399045120; 2026-02-21T12:44:51.6843862Z // begin inline asm 2026-02-21T12:44:51.6844304Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r767,%r768,%r769,%r770,%r771,%r772,%r773,%r774}, {%r839,%r840,%r841,%r842}, %rd193, %p20, 1, 1; 2026-02-21T12:44:51.6844784Z // end inline asm 
2026-02-21T12:44:51.6844955Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.6845151Z mov.b32 %r852, %r916; 2026-02-21T12:44:51.6845315Z mov.b32 %r853, %r916; 2026-02-21T12:44:51.6845477Z mov.b32 %r851, %r3946; 2026-02-21T12:44:51.6845646Z // begin inline asm 2026-02-21T12:44:51.6845900Z // wait for regs: %r767,%r768,%r769,%r770,%r771,%r772,%r773,%r774,%r851,%r852,%r853 2026-02-21T12:44:51.6846219Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:51.6846416Z // end inline asm 2026-02-21T12:44:51.6846687Z $L__tmp8: 2026-02-21T12:44:51.6846980Z .loc 1 51 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:32 2026-02-21T12:44:51.6847357Z add.s64 %rd195, %rd202, 16352; 2026-02-21T12:44:51.6847691Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 2026-02-21T12:44:51.6848046Z // begin inline asm 2026-02-21T12:44:51.6848201Z mov.u64 %rd194, 0x0; 2026-02-21T12:44:51.6848422Z createpolicy.fractional.L2::evict_last.b64 %rd194, 1.0; 2026-02-21T12:44:51.6848673Z // end inline asm 2026-02-21T12:44:51.6848824Z // begin inline asm 2026-02-21T12:44:51.6848975Z mov.u32 %r865, 0x0; 2026-02-21T12:44:51.6849326Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r865 }, [ %rd195 + 0 ], %rd194; 2026-02-21T12:44:51.6849627Z // end inline asm 2026-02-21T12:44:51.6849987Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 2026-02-21T12:44:51.6850340Z bar.sync 0; 2026-02-21T12:44:51.6850490Z st.shared.b32 [%r9], %r865; 2026-02-21T12:44:51.6850668Z bar.sync 0; 2026-02-21T12:44:51.6850812Z ld.shared.b16 %rs78, [%r18]; 2026-02-21T12:44:51.6851003Z ld.shared.b16 %rs79, [%r18+256]; 2026-02-21T12:44:51.6851211Z ld.shared.b16 %rs80, [%r18+16]; 2026-02-21T12:44:51.6851414Z ld.shared.b16 %rs81, [%r18+272]; 2026-02-21T12:44:51.6851602Z ld.shared.b16 %rs82, [%r19]; 2026-02-21T12:44:51.6851789Z ld.shared.b16 %rs83, [%r19+256]; 2026-02-21T12:44:51.6851980Z ld.shared.b16 %rs84, [%r19+16]; 2026-02-21T12:44:51.6852166Z ld.shared.b16 
%rs85, [%r19+272]; 2026-02-21T12:44:51.6852356Z cvt.f32.bf16 %r882, %rs78; 2026-02-21T12:44:51.6852533Z cvt.f32.bf16 %r883, %rs79; 2026-02-21T12:44:51.6852709Z cvt.f32.bf16 %r884, %rs82; 2026-02-21T12:44:51.6852878Z cvt.f32.bf16 %r885, %rs83; 2026-02-21T12:44:51.6853181Z cvt.f32.bf16 %r902, %rs80; 2026-02-21T12:44:51.6853359Z cvt.f32.bf16 %r903, %rs81; 2026-02-21T12:44:51.6853536Z cvt.f32.bf16 %r904, %rs84; 2026-02-21T12:44:51.6853706Z cvt.f32.bf16 %r905, %rs85; 2026-02-21T12:44:51.6854019Z .loc 1 57 34 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:34 2026-02-21T12:44:51.6854377Z add.s64 %rd197, %rd203, 5232640; 2026-02-21T12:44:51.6854721Z .loc 1 57 87 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.6855072Z // begin inline asm 2026-02-21T12:44:51.6855224Z mov.u16 %rs59, 0x0; 2026-02-21T12:44:51.6855395Z ld.global.b8 { %rs59 }, [ %rd197 + 0 ]; 2026-02-21T12:44:51.6855593Z // end inline asm 2026-02-21T12:44:51.6855887Z .loc 1 65 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 2026-02-21T12:44:51.6856242Z bar.sync 0; 2026-02-21T12:44:51.6856387Z st.shared.b8 [%r12], %rs59; 2026-02-21T12:44:51.6856703Z bar.sync 0; 2026-02-21T12:44:51.6856862Z ld.shared.v2.b8 {%rs86, %rs87}, [%r13]; 2026-02-21T12:44:51.6857219Z .loc 1 60 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.6857586Z shl.b16 %rs88, %rs86, 4; 2026-02-21T12:44:51.6857765Z shl.b16 %rs89, %rs87, 4; 2026-02-21T12:44:51.6858073Z .loc 1 75 58 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.6858438Z selp.b16 %rs90, %rs88, %rs86, %p134; 2026-02-21T12:44:51.6858644Z cvt.s16.s8 %rs91, %rs90; 2026-02-21T12:44:51.6858814Z shr.s16 %rs92, %rs91, 4; 2026-02-21T12:44:51.6858995Z selp.b16 %rs93, %rs89, %rs87, %p134; 2026-02-21T12:44:51.6859191Z cvt.s16.s8 %rs94, %rs93; 2026-02-21T12:44:51.6859365Z shr.s16 %rs95, %rs94, 4; 2026-02-21T12:44:51.6859674Z .loc 1 80 32 // 
cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.6860032Z cvt.rn.f32.s16 %r942, %rs92; 2026-02-21T12:44:51.6860237Z cvt.rn.f32.s16 %r943, %rs95; 2026-02-21T12:44:51.6860410Z bar.sync 0; 2026-02-21T12:44:51.6860570Z st.shared.b32 [%r14], %r942; 2026-02-21T12:44:51.6860753Z st.shared.b32 [%r15], %r943; 2026-02-21T12:44:51.6860933Z $L__tmp9: 2026-02-21T12:44:51.6861285Z .loc 2 291 36 // standard.py:291:36 @[ cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 ] 2026-02-21T12:44:51.6861712Z // begin inline asm 2026-02-21T12:44:51.6861891Z fence.proxy.async.shared::cta; 2026-02-21T12:44:51.6862088Z // end inline asm 2026-02-21T12:44:51.6862261Z bar.sync 0; 2026-02-21T12:44:51.6862427Z shfl.sync.idx.b32 %r944, %r3, 0, 31, -1; 2026-02-21T12:44:51.6862661Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.6862839Z shl.b32 %r945, %r944, 8; 2026-02-21T12:44:51.6863011Z and.b32 %r946, %r945, 3072; 2026-02-21T12:44:51.6863275Z add.s32 %r947, %r946, %r3946; 2026-02-21T12:44:51.6863460Z bfe.u32 %r948, %r947, 4, 14; 2026-02-21T12:44:51.6863640Z cvt.u64.u32 %rd206, %r948; 2026-02-21T12:44:51.6863949Z or.b64 %rd198, %rd206, -9223371899399045120; 2026-02-21T12:44:51.6864179Z // begin inline asm 2026-02-21T12:44:51.6864636Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r767,%r768,%r769,%r770,%r771,%r772,%r773,%r774}, {%r882,%r883,%r884,%r885}, %rd198, %p20, 1, 1; 2026-02-21T12:44:51.6865132Z // end inline asm 2026-02-21T12:44:51.6865292Z add.s32 %r949, %r946, %r169; 2026-02-21T12:44:51.6865489Z bfe.u32 %r950, %r949, 4, 14; 2026-02-21T12:44:51.6865667Z cvt.u64.u32 %rd207, %r950; 2026-02-21T12:44:51.6865865Z or.b64 %rd199, %rd207, -9223371899399045120; 2026-02-21T12:44:51.6866188Z // begin inline asm 2026-02-21T12:44:51.6866909Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r767,%r768,%r769,%r770,%r771,%r772,%r773,%r774}, {%r902,%r903,%r904,%r905}, %rd199, %p20, 1, 1; 2026-02-21T12:44:51.6867751Z // end inline asm 
2026-02-21T12:44:51.6868033Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.6868355Z mov.b32 %r915, %r916; 2026-02-21T12:44:51.6868892Z mov.b32 %r914, %r3946; 2026-02-21T12:44:51.6869156Z // begin inline asm 2026-02-21T12:44:51.6869515Z // wait for regs: %r767,%r768,%r769,%r770,%r771,%r772,%r773,%r774,%r914,%r915,%r916 2026-02-21T12:44:51.6869982Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:51.6870263Z // end inline asm 2026-02-21T12:44:51.6870477Z $L__tmp10: 2026-02-21T12:44:51.6870920Z .loc 1 90 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:90:28 2026-02-21T12:44:51.6871461Z cvt.rn.bf16x2.f32 %r951, %r768, %r767; 2026-02-21T12:44:51.6871778Z cvt.rn.bf16x2.f32 %r952, %r770, %r769; 2026-02-21T12:44:51.6872074Z cvt.rn.bf16x2.f32 %r953, %r772, %r771; 2026-02-21T12:44:51.6872370Z cvt.rn.bf16x2.f32 %r954, %r774, %r773; 2026-02-21T12:44:51.6872919Z .loc 1 91 22 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:91:22 2026-02-21T12:44:51.6873519Z mad.lo.s64 %rd208, %rd42, 2560, %rd123; 2026-02-21T12:44:51.6873825Z shl.b64 %rd209, %rd43, 1; 2026-02-21T12:44:51.6874030Z add.s64 %rd210, %rd208, %rd209; 2026-02-21T12:44:51.6874226Z add.s64 %rd200, %rd210, %rd504; 2026-02-21T12:44:51.6874559Z .loc 1 91 81 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:91:81 2026-02-21T12:44:51.6874915Z bar.sync 0; 2026-02-21T12:44:51.6875254Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r20], {%r951, %r952, %r953, %r954}; 2026-02-21T12:44:51.6875587Z bar.sync 0; 2026-02-21T12:44:51.6875806Z ld.shared.v4.b32 {%r928, %r929, %r930, %r931}, [%r21]; 2026-02-21T12:44:51.6876107Z // begin inline asm 2026-02-21T12:44:51.6876329Z st.global.v4.b32 [ %rd200 + 0 ], { %r928, %r929, %r930, %r931 }; 2026-02-21T12:44:51.6876767Z // end inline asm 2026-02-21T12:44:51.6877091Z .loc 1 22 120 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:22:120 2026-02-21T12:44:51.6877605Z or.b64 %rd212, %rd505, 1; 2026-02-21T12:44:51.6877958Z .loc 1 31 45 // 
cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:31:45 2026-02-21T12:44:51.6878344Z mul.hi.u64 %rd213, %rd212, -3689348814741910323; 2026-02-21T12:44:51.6878580Z shr.u64 %rd214, %rd213, 7; 2026-02-21T12:44:51.6878765Z mul.lo.s64 %rd215, %rd214, 160; 2026-02-21T12:44:51.6879066Z sub.s64 %rd216, %rd212, %rd215; 2026-02-21T12:44:51.6879431Z .loc 1 32 51 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:32:51 2026-02-21T12:44:51.6879786Z or.b64 %rd217, %rd216, %rd37; 2026-02-21T12:44:51.6879980Z and.b64 %rd218, %rd217, -4294967296; 2026-02-21T12:44:51.6880188Z setp.ne.b64 %p28, %rd218, 0; 2026-02-21T12:44:51.6880376Z @%p28 bra $L__BB0_13; 2026-02-21T12:44:51.6880540Z bra.uni $L__BB0_12; 2026-02-21T12:44:51.6880758Z $L__BB0_13: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:51.6881156Z div.s64 %rd507, %rd216, %rd37; 2026-02-21T12:44:51.6881342Z bra.uni $L__BB0_14; 2026-02-21T12:44:51.6881555Z $L__BB0_12: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:51.6881890Z cvt.u32.u64 %r955, %rd37; 2026-02-21T12:44:51.6882072Z cvt.u32.u64 %r956, %rd216; 2026-02-21T12:44:51.6882251Z div.u32 %r957, %r956, %r955; 2026-02-21T12:44:51.6882431Z cvt.u64.u32 %rd507, %r957; 2026-02-21T12:44:51.6882649Z $L__BB0_14: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:51.6883179Z .loc 1 31 64 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:31:64 2026-02-21T12:44:51.6883562Z mul.lo.s64 %rd220, %rd507, %rd37; 2026-02-21T12:44:51.6883846Z sub.s64 %rd221, %rd216, %rd220; 2026-02-21T12:44:51.6884478Z .loc 1 31 30 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:31:30 2026-02-21T12:44:51.6885134Z add.s64 %rd222, %rd221, %rd36; 2026-02-21T12:44:51.6885725Z .loc 1 33 27 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:33:27 2026-02-21T12:44:51.6886381Z shl.b64 %rd223, %rd222, 6; 2026-02-21T12:44:51.6887386Z .loc 1 34 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:34:32 2026-02-21T12:44:51.6888072Z or.b64 %rd52, %rd223, %rd4; 
2026-02-21T12:44:51.6888481Z .loc 1 35 27 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:35:27 2026-02-21T12:44:51.6888848Z shl.b64 %rd53, %rd507, 6; 2026-02-21T12:44:51.6889027Z shl.b64 %rd224, %rd52, 14; 2026-02-21T12:44:51.6889229Z add.s64 %rd54, %rd121, %rd224; 2026-02-21T12:44:51.6889503Z add.s64 %rd55, %rd501, %rd53; 2026-02-21T12:44:51.6889850Z .loc 1 43 126 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:43:126 2026-02-21T12:44:51.6890215Z shl.b64 %rd225, %rd222, 20; 2026-02-21T12:44:51.6890403Z add.s64 %rd513, %rd18, %rd225; 2026-02-21T12:44:51.6890594Z add.s64 %rd512, %rd19, %rd53; 2026-02-21T12:44:51.6890787Z mov.b32 %r1518, 0f00000000; 2026-02-21T12:44:51.6890966Z mov.b64 %rd514, -24; 2026-02-21T12:44:51.6891136Z mov.b32 %r1519, %r1518; 2026-02-21T12:44:51.6891317Z mov.b32 %r1520, %r1518; 2026-02-21T12:44:51.6891478Z mov.b32 %r1521, %r1518; 2026-02-21T12:44:51.6891648Z mov.b32 %r1522, %r1518; 2026-02-21T12:44:51.6891809Z mov.b32 %r1523, %r1518; 2026-02-21T12:44:51.6891975Z mov.b32 %r1524, %r1518; 2026-02-21T12:44:51.6892137Z mov.b32 %r1525, %r1518; 2026-02-21T12:44:51.6892457Z $L__BB0_15: // Parent Loop BB0_2 Depth=1 2026-02-21T12:44:51.6892761Z // => This Inner Loop Header: Depth=2 2026-02-21T12:44:51.6893260Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 2026-02-21T12:44:51.6893674Z add.s64 %rd227, %rd513, -64; 2026-02-21T12:44:51.6893857Z // begin inline asm 2026-02-21T12:44:51.6894024Z mov.u64 %rd226, 0x0; 2026-02-21T12:44:51.6894257Z createpolicy.fractional.L2::evict_last.b64 %rd226, 1.0; 2026-02-21T12:44:51.6894518Z // end inline asm 2026-02-21T12:44:51.6894677Z // begin inline asm 2026-02-21T12:44:51.6894836Z mov.u32 %r959, 0x0; 2026-02-21T12:44:51.6895113Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r959 }, [ %rd227 + 0 ], %rd226; 2026-02-21T12:44:51.6895415Z // end inline asm 2026-02-21T12:44:51.6895717Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 
2026-02-21T12:44:51.6896158Z bar.sync 0; 2026-02-21T12:44:51.6896319Z st.shared.b32 [%r9], %r959; 2026-02-21T12:44:51.6896813Z bar.sync 0; 2026-02-21T12:44:51.6896995Z ld.shared.b16 %rs99, [%r10]; 2026-02-21T12:44:51.6897243Z ld.shared.b16 %rs100, [%r10+256]; 2026-02-21T12:44:51.6897557Z ld.shared.b16 %rs101, [%r10+16]; 2026-02-21T12:44:51.6897787Z ld.shared.b16 %rs102, [%r10+272]; 2026-02-21T12:44:51.6897987Z ld.shared.b16 %rs103, [%r11]; 2026-02-21T12:44:51.6898179Z ld.shared.b16 %rs104, [%r11+256]; 2026-02-21T12:44:51.6898514Z ld.shared.b16 %rs105, [%r11+16]; 2026-02-21T12:44:51.6898710Z ld.shared.b16 %rs106, [%r11+272]; 2026-02-21T12:44:51.6898902Z cvt.f32.bf16 %r1056, %rs99; 2026-02-21T12:44:51.6899158Z cvt.f32.bf16 %r1057, %rs100; 2026-02-21T12:44:51.6899340Z cvt.f32.bf16 %r1058, %rs103; 2026-02-21T12:44:51.6899516Z cvt.f32.bf16 %r1059, %rs104; 2026-02-21T12:44:51.6899713Z cvt.f32.bf16 %r1076, %rs101; 2026-02-21T12:44:51.6899964Z cvt.f32.bf16 %r1077, %rs102; 2026-02-21T12:44:51.6900235Z cvt.f32.bf16 %r1078, %rs105; 2026-02-21T12:44:51.6900415Z cvt.f32.bf16 %r1079, %rs106; 2026-02-21T12:44:51.6900813Z .loc 1 57 87 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.6901358Z // begin inline asm 2026-02-21T12:44:51.6901538Z mov.u16 %rs96, 0x0; 2026-02-21T12:44:51.6901712Z ld.global.b8 { %rs96 }, [ %rd512 + 0 ]; 2026-02-21T12:44:51.6901923Z // end inline asm 2026-02-21T12:44:51.6902235Z .loc 1 65 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 2026-02-21T12:44:51.6902595Z bar.sync 0; 2026-02-21T12:44:51.6902761Z st.shared.b8 [%r12], %rs96; 2026-02-21T12:44:51.6903087Z bar.sync 0; 2026-02-21T12:44:51.6903262Z ld.shared.v2.b8 {%rs107, %rs108}, [%r22]; 2026-02-21T12:44:51.6903636Z .loc 1 60 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.6904141Z shl.b16 %rs109, %rs107, 4; 2026-02-21T12:44:51.6904338Z shl.b16 %rs110, %rs108, 4; 2026-02-21T12:44:51.6904764Z .loc 1 75 58 // 
cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.6905146Z selp.b16 %rs111, %rs109, %rs107, %p134; 2026-02-21T12:44:51.6905356Z cvt.s16.s8 %rs112, %rs111; 2026-02-21T12:44:51.6905539Z shr.s16 %rs113, %rs112, 4; 2026-02-21T12:44:51.6905720Z selp.b16 %rs114, %rs110, %rs108, %p134; 2026-02-21T12:44:51.6905928Z cvt.s16.s8 %rs115, %rs114; 2026-02-21T12:44:51.6906102Z shr.s16 %rs116, %rs115, 4; 2026-02-21T12:44:51.6906428Z .loc 1 80 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.6906965Z cvt.rn.f32.s16 %r1540, %rs113; 2026-02-21T12:44:51.6907169Z cvt.rn.f32.s16 %r1541, %rs116; 2026-02-21T12:44:51.6907434Z bar.sync 0; 2026-02-21T12:44:51.6907618Z st.shared.b32 [%r14], %r1540; 2026-02-21T12:44:51.6907872Z st.shared.b32 [%r15], %r1541; 2026-02-21T12:44:51.6908130Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r1518}; 2026-02-21T12:44:51.6908627Z bar.sync 0; 2026-02-21T12:44:51.6908779Z // begin inline asm 2026-02-21T12:44:51.6909071Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1020, %r1060, %r1100, %r1140}, [%r964]; 2026-02-21T12:44:51.6909406Z // end inline asm 2026-02-21T12:44:51.6909553Z bar.sync 0; 2026-02-21T12:44:51.6909773Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r1520}; 2026-02-21T12:44:51.6910036Z bar.sync 0; 2026-02-21T12:44:51.6910184Z // begin inline asm 2026-02-21T12:44:51.6910470Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1022, %r1062, %r1102, %r1142}, [%r964]; 2026-02-21T12:44:51.6910891Z // end inline asm 2026-02-21T12:44:51.6911094Z bar.sync 0; 2026-02-21T12:44:51.6911317Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r1519}; 2026-02-21T12:44:51.6911641Z bar.sync 0; 2026-02-21T12:44:51.6911844Z // begin inline asm 2026-02-21T12:44:51.6912274Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1021, %r1061, %r1101, %r1141}, [%r964]; 2026-02-21T12:44:51.6912606Z // end inline asm 2026-02-21T12:44:51.6912756Z bar.sync 0; 2026-02-21T12:44:51.6912967Z 
stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r1521}; 2026-02-21T12:44:51.6913234Z bar.sync 0; 2026-02-21T12:44:51.6913374Z // begin inline asm 2026-02-21T12:44:51.6913653Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1023, %r1063, %r1103, %r1143}, [%r964]; 2026-02-21T12:44:51.6913980Z // end inline asm 2026-02-21T12:44:51.6914131Z bar.sync 0; 2026-02-21T12:44:51.6914375Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r1522}; 2026-02-21T12:44:51.6914787Z bar.sync 0; 2026-02-21T12:44:51.6915019Z // begin inline asm 2026-02-21T12:44:51.6915549Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1024, %r1064, %r1104, %r1144}, [%r964]; 2026-02-21T12:44:51.6916288Z // end inline asm 2026-02-21T12:44:51.6916724Z bar.sync 0; 2026-02-21T12:44:51.6917117Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r1524}; 2026-02-21T12:44:51.6917603Z bar.sync 0; 2026-02-21T12:44:51.6917848Z // begin inline asm 2026-02-21T12:44:51.6918347Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1026, %r1066, %r1106, %r1146}, [%r964]; 2026-02-21T12:44:51.6918938Z // end inline asm 2026-02-21T12:44:51.6919210Z bar.sync 0; 2026-02-21T12:44:51.6919430Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r1523}; 2026-02-21T12:44:51.6919700Z bar.sync 0; 2026-02-21T12:44:51.6919855Z // begin inline asm 2026-02-21T12:44:51.6920201Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1025, %r1065, %r1105, %r1145}, [%r964]; 2026-02-21T12:44:51.6920563Z // end inline asm 2026-02-21T12:44:51.6920720Z bar.sync 0; 2026-02-21T12:44:51.6920931Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r1525}; 2026-02-21T12:44:51.6921378Z bar.sync 0; 2026-02-21T12:44:51.6921536Z // begin inline asm 2026-02-21T12:44:51.6921816Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1027, %r1067, %r1107, %r1147}, [%r964]; 2026-02-21T12:44:51.6922148Z // end inline asm 2026-02-21T12:44:51.6922296Z $L__tmp11: 2026-02-21T12:44:51.6922681Z .loc 2 291 36 // standard.py:291:36 @[ cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 
] 2026-02-21T12:44:51.6923208Z // begin inline asm 2026-02-21T12:44:51.6923398Z fence.proxy.async.shared::cta; 2026-02-21T12:44:51.6923676Z // end inline asm 2026-02-21T12:44:51.6923914Z shfl.sync.idx.b32 %r1542, %r3, 0, 31, -1; 2026-02-21T12:44:51.6924154Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.6924345Z mov.pred %p46, -1; 2026-02-21T12:44:51.6924512Z // begin inline asm 2026-02-21T12:44:51.6924987Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1020,%r1021,%r1022,%r1023,%r1024,%r1025,%r1026,%r1027}, {%r1056,%r1057,%r1058,%r1059}, %rd7, %p46, 1, 1; 2026-02-21T12:44:51.6925521Z // end inline asm 2026-02-21T12:44:51.6925672Z // begin inline asm 2026-02-21T12:44:51.6926126Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1020,%r1021,%r1022,%r1023,%r1024,%r1025,%r1026,%r1027}, {%r1076,%r1077,%r1078,%r1079}, %rd8, %p46, 1, 1; 2026-02-21T12:44:51.6926191Z // end inline asm 2026-02-21T12:44:51.6926252Z // begin inline asm 2026-02-21T12:44:51.6926784Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1060,%r1061,%r1062,%r1063,%r1064,%r1065,%r1066,%r1067}, {%r1056,%r1057,%r1058,%r1059}, %rd9, %p46, 1, 1; 2026-02-21T12:44:51.6926887Z // end inline asm 2026-02-21T12:44:51.6926994Z // begin inline asm 2026-02-21T12:44:51.6927427Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1060,%r1061,%r1062,%r1063,%r1064,%r1065,%r1066,%r1067}, {%r1076,%r1077,%r1078,%r1079}, %rd10, %p46, 1, 1; 2026-02-21T12:44:51.6927493Z // end inline asm 2026-02-21T12:44:51.6927556Z // begin inline asm 2026-02-21T12:44:51.6927932Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1100,%r1101,%r1102,%r1103,%r1104,%r1105,%r1106,%r1107}, {%r1056,%r1057,%r1058,%r1059}, %rd11, %p46, 1, 1; 2026-02-21T12:44:51.6927992Z // end inline asm 2026-02-21T12:44:51.6928054Z // begin inline asm 2026-02-21T12:44:51.6928419Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1100,%r1101,%r1102,%r1103,%r1104,%r1105,%r1106,%r1107}, {%r1076,%r1077,%r1078,%r1079}, %rd12, %p46, 1, 1; 
2026-02-21T12:44:51.6928484Z // end inline asm 2026-02-21T12:44:51.6928545Z // begin inline asm 2026-02-21T12:44:51.6928914Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1140,%r1141,%r1142,%r1143,%r1144,%r1145,%r1146,%r1147}, {%r1056,%r1057,%r1058,%r1059}, %rd13, %p46, 1, 1; 2026-02-21T12:44:51.6928978Z // end inline asm 2026-02-21T12:44:51.6929039Z // begin inline asm 2026-02-21T12:44:51.6929486Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1140,%r1141,%r1142,%r1143,%r1144,%r1145,%r1146,%r1147}, {%r1076,%r1077,%r1078,%r1079}, %rd14, %p46, 1, 1; 2026-02-21T12:44:51.6929752Z // end inline asm 2026-02-21T12:44:51.6929837Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.6929899Z mov.b32 %r1667, 0; 2026-02-21T12:44:51.6929963Z mov.b32 %r1192, %r3946; 2026-02-21T12:44:51.6930032Z mov.b32 %r1193, %r1667; 2026-02-21T12:44:51.6930094Z mov.b32 %r1194, %r1667; 2026-02-21T12:44:51.6930175Z // begin inline asm 2026-02-21T12:44:51.6930974Z // wait for regs: %r1020,%r1021,%r1022,%r1023,%r1024,%r1025,%r1026,%r1027,%r1060,%r1061,%r1062,%r1063,%r1064,%r1065,%r1066,%r1067,%r1100,%r1101,%r1102,%r1103,%r1104,%r1105,%r1106,%r1107,%r1140,%r1141,%r1142,%r1143,%r1144,%r1145,%r1146,%r1147,%r1192,%r1193,%r1194 2026-02-21T12:44:51.6931061Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:51.6931121Z // end inline asm 2026-02-21T12:44:51.6931188Z $L__tmp12: 2026-02-21T12:44:51.6931410Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 2026-02-21T12:44:51.6931488Z add.s64 %rd239, %rd513, -32; 2026-02-21T12:44:51.6931645Z // begin inline asm 2026-02-21T12:44:51.6931797Z mov.u64 %rd238, 0x0; 2026-02-21T12:44:51.6931933Z createpolicy.fractional.L2::evict_last.b64 %rd238, 1.0; 2026-02-21T12:44:51.6931993Z // end inline asm 2026-02-21T12:44:51.6932076Z // begin inline asm 2026-02-21T12:44:51.6932140Z mov.u32 %r1230, 0x0; 2026-02-21T12:44:51.6932310Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r1230 }, [ %rd239 + 0 ], %rd238; 
2026-02-21T12:44:51.6932375Z // end inline asm 2026-02-21T12:44:51.6932589Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 2026-02-21T12:44:51.6932651Z bar.sync 0; 2026-02-21T12:44:51.6932720Z st.shared.b32 [%r9], %r1230; 2026-02-21T12:44:51.6932782Z bar.sync 0; 2026-02-21T12:44:51.6932895Z ld.shared.b16 %rs117, [%r10]; 2026-02-21T12:44:51.6933018Z ld.shared.b16 %rs118, [%r10+256]; 2026-02-21T12:44:51.6933100Z ld.shared.b16 %rs119, [%r10+16]; 2026-02-21T12:44:51.6933168Z ld.shared.b16 %rs120, [%r10+272]; 2026-02-21T12:44:51.6933240Z ld.shared.b16 %rs121, [%r11]; 2026-02-21T12:44:51.6933316Z ld.shared.b16 %rs122, [%r11+256]; 2026-02-21T12:44:51.6933425Z ld.shared.b16 %rs123, [%r11+16]; 2026-02-21T12:44:51.6933547Z ld.shared.b16 %rs124, [%r11+272]; 2026-02-21T12:44:51.6933667Z cvt.f32.bf16 %r1287, %rs117; 2026-02-21T12:44:51.6933792Z cvt.f32.bf16 %r1288, %rs118; 2026-02-21T12:44:51.6933903Z cvt.f32.bf16 %r1289, %rs121; 2026-02-21T12:44:51.6934018Z cvt.f32.bf16 %r1290, %rs122; 2026-02-21T12:44:51.6934133Z cvt.f32.bf16 %r1307, %rs119; 2026-02-21T12:44:51.6934241Z cvt.f32.bf16 %r1308, %rs120; 2026-02-21T12:44:51.6934343Z cvt.f32.bf16 %r1309, %rs123; 2026-02-21T12:44:51.6934432Z cvt.f32.bf16 %r1310, %rs124; 2026-02-21T12:44:51.6934757Z .loc 1 57 87 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.6934877Z add.s64 %rd241, %rd512, 10240; 2026-02-21T12:44:51.6934976Z // begin inline asm 2026-02-21T12:44:51.6935079Z mov.u16 %rs97, 0x0; 2026-02-21T12:44:51.6935205Z ld.global.b8 { %rs97 }, [ %rd241 + 0 ]; 2026-02-21T12:44:51.6935295Z // end inline asm 2026-02-21T12:44:51.6935670Z .loc 1 65 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 2026-02-21T12:44:51.6935764Z bar.sync 0; 2026-02-21T12:44:51.6935866Z st.shared.b8 [%r12], %rs97; 2026-02-21T12:44:51.6935945Z bar.sync 0; 2026-02-21T12:44:51.6936064Z ld.shared.v2.b8 {%rs125, %rs126}, [%r22]; 2026-02-21T12:44:51.6936372Z .loc 1 60 28 // 
cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.6936651Z shl.b16 %rs127, %rs125, 4; 2026-02-21T12:44:51.6936758Z shl.b16 %rs128, %rs126, 4; 2026-02-21T12:44:51.6937079Z .loc 1 75 58 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.6937190Z selp.b16 %rs129, %rs127, %rs125, %p134; 2026-02-21T12:44:51.6937437Z cvt.s16.s8 %rs130, %rs129; 2026-02-21T12:44:51.6937536Z shr.s16 %rs131, %rs130, 4; 2026-02-21T12:44:51.6937749Z selp.b16 %rs132, %rs128, %rs126, %p134; 2026-02-21T12:44:51.6937840Z cvt.s16.s8 %rs133, %rs132; 2026-02-21T12:44:51.6937939Z shr.s16 %rs134, %rs133, 4; 2026-02-21T12:44:51.6938264Z .loc 1 80 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.6938357Z cvt.rn.f32.s16 %r1543, %rs131; 2026-02-21T12:44:51.6938452Z cvt.rn.f32.s16 %r1544, %rs134; 2026-02-21T12:44:51.6938536Z bar.sync 0; 2026-02-21T12:44:51.6938630Z st.shared.b32 [%r14], %r1543; 2026-02-21T12:44:51.6938723Z st.shared.b32 [%r15], %r1544; 2026-02-21T12:44:51.6938808Z $L__tmp13: 2026-02-21T12:44:51.6939256Z .loc 2 291 36 // standard.py:291:36 @[ cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 ] 2026-02-21T12:44:51.6939345Z // begin inline asm 2026-02-21T12:44:51.6939479Z fence.proxy.async.shared::cta; 2026-02-21T12:44:51.6939564Z // end inline asm 2026-02-21T12:44:51.6939646Z bar.sync 0; 2026-02-21T12:44:51.6939845Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.6940028Z // begin inline asm 2026-02-21T12:44:51.6940598Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1020,%r1021,%r1022,%r1023,%r1024,%r1025,%r1026,%r1027}, {%r1287,%r1288,%r1289,%r1290}, %rd7, %p46, 1, 1; 2026-02-21T12:44:51.6940686Z // end inline asm 2026-02-21T12:44:51.6940780Z // begin inline asm 2026-02-21T12:44:51.6941330Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1020,%r1021,%r1022,%r1023,%r1024,%r1025,%r1026,%r1027}, {%r1307,%r1308,%r1309,%r1310}, %rd8, %p46, 1, 1; 
2026-02-21T12:44:51.6941414Z // end inline asm 2026-02-21T12:44:51.6941503Z // begin inline asm 2026-02-21T12:44:51.6942025Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1060,%r1061,%r1062,%r1063,%r1064,%r1065,%r1066,%r1067}, {%r1287,%r1288,%r1289,%r1290}, %rd9, %p46, 1, 1; 2026-02-21T12:44:51.6942115Z // end inline asm 2026-02-21T12:44:51.6942216Z // begin inline asm 2026-02-21T12:44:51.6942780Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1060,%r1061,%r1062,%r1063,%r1064,%r1065,%r1066,%r1067}, {%r1307,%r1308,%r1309,%r1310}, %rd10, %p46, 1, 1; 2026-02-21T12:44:51.6942847Z // end inline asm 2026-02-21T12:44:51.6942907Z // begin inline asm 2026-02-21T12:44:51.6943285Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1100,%r1101,%r1102,%r1103,%r1104,%r1105,%r1106,%r1107}, {%r1287,%r1288,%r1289,%r1290}, %rd11, %p46, 1, 1; 2026-02-21T12:44:51.6943346Z // end inline asm 2026-02-21T12:44:51.6943408Z // begin inline asm 2026-02-21T12:44:51.6943780Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1100,%r1101,%r1102,%r1103,%r1104,%r1105,%r1106,%r1107}, {%r1307,%r1308,%r1309,%r1310}, %rd12, %p46, 1, 1; 2026-02-21T12:44:51.6943840Z // end inline asm 2026-02-21T12:44:51.6943900Z // begin inline asm 2026-02-21T12:44:51.6944272Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1140,%r1141,%r1142,%r1143,%r1144,%r1145,%r1146,%r1147}, {%r1287,%r1288,%r1289,%r1290}, %rd13, %p46, 1, 1; 2026-02-21T12:44:51.6944334Z // end inline asm 2026-02-21T12:44:51.6944400Z // begin inline asm 2026-02-21T12:44:51.6944772Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1140,%r1141,%r1142,%r1143,%r1144,%r1145,%r1146,%r1147}, {%r1307,%r1308,%r1309,%r1310}, %rd14, %p46, 1, 1; 2026-02-21T12:44:51.6944830Z // end inline asm 2026-02-21T12:44:51.6944912Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.6944980Z mov.b32 %r1423, %r3946; 2026-02-21T12:44:51.6945049Z mov.b32 %r1424, %r1667; 2026-02-21T12:44:51.6945110Z mov.b32 %r1425, %r1667; 
2026-02-21T12:44:51.6945173Z // begin inline asm 2026-02-21T12:44:51.6945738Z // wait for regs: %r1020,%r1021,%r1022,%r1023,%r1024,%r1025,%r1026,%r1027,%r1060,%r1061,%r1062,%r1063,%r1064,%r1065,%r1066,%r1067,%r1100,%r1101,%r1102,%r1103,%r1104,%r1105,%r1106,%r1107,%r1140,%r1141,%r1142,%r1143,%r1144,%r1145,%r1146,%r1147,%r1423,%r1424,%r1425 2026-02-21T12:44:51.6945889Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:51.6945948Z // end inline asm 2026-02-21T12:44:51.6946016Z $L__tmp14: 2026-02-21T12:44:51.6946294Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 2026-02-21T12:44:51.6946358Z // begin inline asm 2026-02-21T12:44:51.6946421Z mov.u64 %rd250, 0x0; 2026-02-21T12:44:51.6946691Z createpolicy.fractional.L2::evict_last.b64 %rd250, 1.0; 2026-02-21T12:44:51.6946754Z // end inline asm 2026-02-21T12:44:51.6946815Z // begin inline asm 2026-02-21T12:44:51.6946882Z mov.u32 %r1461, 0x0; 2026-02-21T12:44:51.6947046Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r1461 }, [ %rd513 + 0 ], %rd250; 2026-02-21T12:44:51.6947106Z // end inline asm 2026-02-21T12:44:51.6947324Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 2026-02-21T12:44:51.6947383Z bar.sync 0; 2026-02-21T12:44:51.6947452Z st.shared.b32 [%r9], %r1461; 2026-02-21T12:44:51.6947516Z bar.sync 0; 2026-02-21T12:44:51.6947591Z ld.shared.b16 %rs135, [%r10]; 2026-02-21T12:44:51.6947663Z ld.shared.b16 %rs136, [%r10+256]; 2026-02-21T12:44:51.6947891Z ld.shared.b16 %rs137, [%r10+16]; 2026-02-21T12:44:51.6947972Z ld.shared.b16 %rs138, [%r10+272]; 2026-02-21T12:44:51.6948043Z ld.shared.b16 %rs139, [%r11]; 2026-02-21T12:44:51.6948110Z ld.shared.b16 %rs140, [%r11+256]; 2026-02-21T12:44:51.6948185Z ld.shared.b16 %rs141, [%r11+16]; 2026-02-21T12:44:51.6948251Z ld.shared.b16 %rs142, [%r11+272]; 2026-02-21T12:44:51.6948319Z cvt.f32.bf16 %r1494, %rs135; 2026-02-21T12:44:51.6948383Z cvt.f32.bf16 %r1495, %rs136; 2026-02-21T12:44:51.6948453Z cvt.f32.bf16 %r1496, %rs139; 
2026-02-21T12:44:51.6948624Z cvt.f32.bf16 %r1497, %rs140; 2026-02-21T12:44:51.6948690Z cvt.f32.bf16 %r1514, %rs137; 2026-02-21T12:44:51.6948759Z cvt.f32.bf16 %r1515, %rs138; 2026-02-21T12:44:51.6948824Z cvt.f32.bf16 %r1516, %rs141; 2026-02-21T12:44:51.6948887Z cvt.f32.bf16 %r1517, %rs142; 2026-02-21T12:44:51.6949101Z .loc 1 57 87 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.6949183Z add.s64 %rd253, %rd512, 20480; 2026-02-21T12:44:51.6949254Z // begin inline asm 2026-02-21T12:44:51.6949316Z mov.u16 %rs98, 0x0; 2026-02-21T12:44:51.6949401Z ld.global.b8 { %rs98 }, [ %rd253 + 0 ]; 2026-02-21T12:44:51.6949463Z // end inline asm 2026-02-21T12:44:51.6949666Z .loc 1 65 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 2026-02-21T12:44:51.6949732Z bar.sync 0; 2026-02-21T12:44:51.6949804Z st.shared.b8 [%r12], %rs98; 2026-02-21T12:44:51.6949861Z bar.sync 0; 2026-02-21T12:44:51.6949944Z ld.shared.v2.b8 {%rs143, %rs144}, [%r22]; 2026-02-21T12:44:51.6950155Z .loc 1 60 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.6950222Z shl.b16 %rs145, %rs143, 4; 2026-02-21T12:44:51.6950287Z shl.b16 %rs146, %rs144, 4; 2026-02-21T12:44:51.6950499Z .loc 1 75 58 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.6950576Z selp.b16 %rs147, %rs145, %rs143, %p134; 2026-02-21T12:44:51.6950645Z cvt.s16.s8 %rs148, %rs147; 2026-02-21T12:44:51.6950711Z shr.s16 %rs149, %rs148, 4; 2026-02-21T12:44:51.6950793Z selp.b16 %rs150, %rs146, %rs144, %p134; 2026-02-21T12:44:51.6950859Z cvt.s16.s8 %rs151, %rs150; 2026-02-21T12:44:51.6950923Z shr.s16 %rs152, %rs151, 4; 2026-02-21T12:44:51.6951129Z .loc 1 80 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.6951197Z cvt.rn.f32.s16 %r1545, %rs149; 2026-02-21T12:44:51.6951262Z cvt.rn.f32.s16 %r1546, %rs152; 2026-02-21T12:44:51.6951328Z bar.sync 0; 2026-02-21T12:44:51.6951394Z st.shared.b32 [%r14], 
%r1545; 2026-02-21T12:44:51.6951458Z st.shared.b32 [%r15], %r1546; 2026-02-21T12:44:51.6951513Z $L__tmp15: 2026-02-21T12:44:51.6951810Z .loc 2 291 36 // standard.py:291:36 @[ cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 ] 2026-02-21T12:44:51.6952091Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r964], {%r1020, %r1060, %r1100, %r1140}; 2026-02-21T12:44:51.6952212Z bar.sync 0; 2026-02-21T12:44:51.6952282Z // begin inline asm 2026-02-21T12:44:51.6952423Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1518}, [%r712]; 2026-02-21T12:44:51.6952483Z // end inline asm 2026-02-21T12:44:51.6952540Z bar.sync 0; 2026-02-21T12:44:51.6952729Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r964], {%r1022, %r1062, %r1102, %r1142}; 2026-02-21T12:44:51.6952786Z bar.sync 0; 2026-02-21T12:44:51.6952846Z // begin inline asm 2026-02-21T12:44:51.6952984Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1520}, [%r712]; 2026-02-21T12:44:51.6953042Z // end inline asm 2026-02-21T12:44:51.6953100Z bar.sync 0; 2026-02-21T12:44:51.6953283Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r964], {%r1021, %r1061, %r1101, %r1141}; 2026-02-21T12:44:51.6953339Z bar.sync 0; 2026-02-21T12:44:51.6953417Z // begin inline asm 2026-02-21T12:44:51.6953547Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1519}, [%r712]; 2026-02-21T12:44:51.6953609Z // end inline asm 2026-02-21T12:44:51.6953786Z bar.sync 0; 2026-02-21T12:44:51.6953967Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r964], {%r1023, %r1063, %r1103, %r1143}; 2026-02-21T12:44:51.6954040Z bar.sync 0; 2026-02-21T12:44:51.6954103Z // begin inline asm 2026-02-21T12:44:51.6954231Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1521}, [%r712]; 2026-02-21T12:44:51.6954290Z // end inline asm 2026-02-21T12:44:51.6954355Z bar.sync 0; 2026-02-21T12:44:51.6954533Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r964], {%r1024, %r1064, %r1104, %r1144}; 2026-02-21T12:44:51.6954589Z bar.sync 0; 2026-02-21T12:44:51.6954655Z // begin inline asm 2026-02-21T12:44:51.6954781Z 
ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1522}, [%r712]; 2026-02-21T12:44:51.6954838Z // end inline asm 2026-02-21T12:44:51.6954900Z bar.sync 0; 2026-02-21T12:44:51.6955080Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r964], {%r1026, %r1066, %r1106, %r1146}; 2026-02-21T12:44:51.6955137Z bar.sync 0; 2026-02-21T12:44:51.6955198Z // begin inline asm 2026-02-21T12:44:51.6955338Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1524}, [%r712]; 2026-02-21T12:44:51.6955396Z // end inline asm 2026-02-21T12:44:51.6955453Z bar.sync 0; 2026-02-21T12:44:51.6955637Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r964], {%r1025, %r1065, %r1105, %r1145}; 2026-02-21T12:44:51.6955694Z bar.sync 0; 2026-02-21T12:44:51.6955756Z // begin inline asm 2026-02-21T12:44:51.6955895Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1523}, [%r712]; 2026-02-21T12:44:51.6955962Z // end inline asm 2026-02-21T12:44:51.6956022Z bar.sync 0; 2026-02-21T12:44:51.6956201Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r964], {%r1027, %r1067, %r1107, %r1147}; 2026-02-21T12:44:51.6956263Z bar.sync 0; 2026-02-21T12:44:51.6956324Z // begin inline asm 2026-02-21T12:44:51.6956585Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r1525}, [%r712]; 2026-02-21T12:44:51.6956658Z // end inline asm 2026-02-21T12:44:51.6956719Z // begin inline asm 2026-02-21T12:44:51.6956800Z fence.proxy.async.shared::cta; 2026-02-21T12:44:51.6956862Z // end inline asm 2026-02-21T12:44:51.6956941Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.6957005Z shl.b32 %r1547, %r1542, 8; 2026-02-21T12:44:51.6957069Z and.b32 %r1548, %r1547, 3072; 2026-02-21T12:44:51.6957145Z add.s32 %r1549, %r1548, %r3946; 2026-02-21T12:44:51.6957217Z bfe.u32 %r1550, %r1549, 4, 14; 2026-02-21T12:44:51.6957285Z cvt.u64.u32 %rd256, %r1550; 2026-02-21T12:44:51.6957370Z or.b64 %rd254, %rd256, -9223371899399045120; 2026-02-21T12:44:51.6957440Z // begin inline asm 2026-02-21T12:44:51.6957838Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 
{%r1518,%r1519,%r1520,%r1521,%r1522,%r1523,%r1524,%r1525}, {%r1494,%r1495,%r1496,%r1497}, %rd254, %p46, 1, 1; 2026-02-21T12:44:51.6957899Z // end inline asm 2026-02-21T12:44:51.6957968Z add.s32 %r1551, %r1549, 32; 2026-02-21T12:44:51.6958117Z bfe.u32 %r1552, %r1551, 4, 14; 2026-02-21T12:44:51.6958181Z cvt.u64.u32 %rd257, %r1552; 2026-02-21T12:44:51.6958270Z or.b64 %rd255, %rd257, -9223371899399045120; 2026-02-21T12:44:51.6958391Z // begin inline asm 2026-02-21T12:44:51.6958767Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1518,%r1519,%r1520,%r1521,%r1522,%r1523,%r1524,%r1525}, {%r1514,%r1515,%r1516,%r1517}, %rd255, %p46, 1, 1; 2026-02-21T12:44:51.6958827Z // end inline asm 2026-02-21T12:44:51.6958910Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.6958973Z mov.b32 %r1526, %r3946; 2026-02-21T12:44:51.6959035Z mov.b32 %r1527, %r1667; 2026-02-21T12:44:51.6959102Z mov.b32 %r1528, %r1667; 2026-02-21T12:44:51.6959165Z // begin inline asm 2026-02-21T12:44:51.6959345Z // wait for regs: %r1518,%r1519,%r1520,%r1521,%r1522,%r1523,%r1524,%r1525,%r1526,%r1527,%r1528 2026-02-21T12:44:51.6959423Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:51.6959491Z // end inline asm 2026-02-21T12:44:51.6959553Z $L__tmp16: 2026-02-21T12:44:51.6959778Z .loc 1 43 126 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:43:126 2026-02-21T12:44:51.6959923Z add.s64 %rd514, %rd514, 24; 2026-02-21T12:44:51.6960064Z add.s64 %rd513, %rd513, 96; 2026-02-21T12:44:51.6960135Z add.s64 %rd512, %rd512, 30720; 2026-02-21T12:44:51.6960210Z setp.lt.u64 %p48, %rd514, 4056; 2026-02-21T12:44:51.6960275Z @%p48 bra $L__BB0_15; 2026-02-21T12:44:51.6960375Z // %bb.16: // %.preheader210 2026-02-21T12:44:51.6960487Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:51.6960704Z .loc 1 51 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:32 2026-02-21T12:44:51.6960772Z add.s64 %rd272, %rd54, %rd201; 2026-02-21T12:44:51.6960835Z add.s64 %rd259, %rd272, 16320; 
2026-02-21T12:44:51.6961049Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 2026-02-21T12:44:51.6961114Z // begin inline asm 2026-02-21T12:44:51.6961176Z mov.u64 %rd258, 0x0; 2026-02-21T12:44:51.6961313Z createpolicy.fractional.L2::evict_last.b64 %rd258, 1.0; 2026-02-21T12:44:51.6961380Z // end inline asm 2026-02-21T12:44:51.6961441Z // begin inline asm 2026-02-21T12:44:51.6961503Z mov.u32 %r1553, 0x0; 2026-02-21T12:44:51.6961672Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r1553 }, [ %rd259 + 0 ], %rd258; 2026-02-21T12:44:51.6961731Z // end inline asm 2026-02-21T12:44:51.6961934Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 2026-02-21T12:44:51.6961998Z bar.sync 0; 2026-02-21T12:44:51.6962066Z st.shared.b32 [%r9], %r1553; 2026-02-21T12:44:51.6962124Z bar.sync 0; 2026-02-21T12:44:51.6962195Z ld.shared.b16 %rs155, [%r10]; 2026-02-21T12:44:51.6962268Z ld.shared.b16 %rs156, [%r10+256]; 2026-02-21T12:44:51.6962342Z ld.shared.b16 %rs157, [%r10+16]; 2026-02-21T12:44:51.6962412Z ld.shared.b16 %rs158, [%r10+272]; 2026-02-21T12:44:51.6962489Z ld.shared.b16 %rs159, [%r11]; 2026-02-21T12:44:51.6962555Z ld.shared.b16 %rs160, [%r11+256]; 2026-02-21T12:44:51.6962627Z ld.shared.b16 %rs161, [%r11+16]; 2026-02-21T12:44:51.6962707Z ld.shared.b16 %rs162, [%r11+272]; 2026-02-21T12:44:51.6962773Z cvt.f32.bf16 %r1570, %rs155; 2026-02-21T12:44:51.6962836Z cvt.f32.bf16 %r1571, %rs156; 2026-02-21T12:44:51.6962898Z cvt.f32.bf16 %r1572, %rs159; 2026-02-21T12:44:51.6962965Z cvt.f32.bf16 %r1573, %rs160; 2026-02-21T12:44:51.6963026Z cvt.f32.bf16 %r1590, %rs157; 2026-02-21T12:44:51.6963091Z cvt.f32.bf16 %r1591, %rs158; 2026-02-21T12:44:51.6963169Z cvt.f32.bf16 %r1592, %rs161; 2026-02-21T12:44:51.6963249Z cvt.f32.bf16 %r1593, %rs162; 2026-02-21T12:44:51.6963455Z .loc 1 57 34 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:34 2026-02-21T12:44:51.6963534Z add.s64 %rd273, %rd55, %rd502; 2026-02-21T12:44:51.6963606Z 
add.s64 %rd261, %rd273, 5222400; 2026-02-21T12:44:51.6963806Z .loc 1 57 87 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.6963937Z // begin inline asm 2026-02-21T12:44:51.6964066Z mov.u16 %rs153, 0x0; 2026-02-21T12:44:51.6964144Z ld.global.b8 { %rs153 }, [ %rd261 + 0 ]; 2026-02-21T12:44:51.6964204Z // end inline asm 2026-02-21T12:44:51.6964412Z .loc 1 65 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 2026-02-21T12:44:51.6964471Z bar.sync 0; 2026-02-21T12:44:51.6964539Z st.shared.b8 [%r12], %rs153; 2026-02-21T12:44:51.6964601Z bar.sync 0; 2026-02-21T12:44:51.6964696Z ld.shared.v2.b8 {%rs163, %rs164}, [%r22]; 2026-02-21T12:44:51.6964912Z .loc 1 60 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.6964978Z shl.b16 %rs165, %rs163, 4; 2026-02-21T12:44:51.6965058Z shl.b16 %rs166, %rs164, 4; 2026-02-21T12:44:51.6965263Z .loc 1 75 58 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.6965354Z selp.b16 %rs167, %rs165, %rs163, %p134; 2026-02-21T12:44:51.6965435Z cvt.s16.s8 %rs168, %rs167; 2026-02-21T12:44:51.6965599Z shr.s16 %rs169, %rs168, 4; 2026-02-21T12:44:51.6965679Z selp.b16 %rs170, %rs166, %rs164, %p134; 2026-02-21T12:44:51.6965744Z cvt.s16.s8 %rs171, %rs170; 2026-02-21T12:44:51.6965817Z shr.s16 %rs172, %rs171, 4; 2026-02-21T12:44:51.6966029Z .loc 1 80 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.6966101Z cvt.rn.f32.s16 %r1683, %rs169; 2026-02-21T12:44:51.6966180Z cvt.rn.f32.s16 %r1684, %rs172; 2026-02-21T12:44:51.6966238Z bar.sync 0; 2026-02-21T12:44:51.6966305Z st.shared.b32 [%r14], %r1683; 2026-02-21T12:44:51.6966379Z st.shared.b32 [%r15], %r1684; 2026-02-21T12:44:51.6966436Z $L__tmp17: 2026-02-21T12:44:51.6966841Z .loc 2 291 36 // standard.py:291:36 @[ cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 ] 2026-02-21T12:44:51.6966921Z // begin inline asm 2026-02-21T12:44:51.6967006Z 
fence.proxy.async.shared::cta; 2026-02-21T12:44:51.6967068Z // end inline asm 2026-02-21T12:44:51.6967131Z bar.sync 0; 2026-02-21T12:44:51.6967225Z shfl.sync.idx.b32 %r1685, %r3, 0, 31, -1; 2026-02-21T12:44:51.6967299Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.6967365Z shl.b32 %r1686, %r1685, 8; 2026-02-21T12:44:51.6967432Z and.b32 %r1687, %r1686, 3072; 2026-02-21T12:44:51.6967510Z add.s32 %r1688, %r1687, %r3946; 2026-02-21T12:44:51.6967584Z bfe.u32 %r1689, %r1688, 4, 14; 2026-02-21T12:44:51.6967654Z cvt.u64.u32 %rd274, %r1689; 2026-02-21T12:44:51.6967746Z or.b64 %rd262, %rd274, -9223371899399045120; 2026-02-21T12:44:51.6967810Z // begin inline asm 2026-02-21T12:44:51.6968197Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1518,%r1519,%r1520,%r1521,%r1522,%r1523,%r1524,%r1525}, {%r1570,%r1571,%r1572,%r1573}, %rd262, %p46, 1, 1; 2026-02-21T12:44:51.6968262Z // end inline asm 2026-02-21T12:44:51.6968327Z add.s32 %r1691, %r1687, %r169; 2026-02-21T12:44:51.6968396Z bfe.u32 %r1692, %r1691, 4, 14; 2026-02-21T12:44:51.6968461Z cvt.u64.u32 %rd275, %r1692; 2026-02-21T12:44:51.6968551Z or.b64 %rd263, %rd275, -9223371899399045120; 2026-02-21T12:44:51.6968620Z // begin inline asm 2026-02-21T12:44:51.6968994Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1518,%r1519,%r1520,%r1521,%r1522,%r1523,%r1524,%r1525}, {%r1590,%r1591,%r1592,%r1593}, %rd263, %p46, 1, 1; 2026-02-21T12:44:51.6974168Z // end inline asm 2026-02-21T12:44:51.6974297Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.6974380Z mov.b32 %r1602, %r3946; 2026-02-21T12:44:51.6974447Z mov.b32 %r1603, %r1667; 2026-02-21T12:44:51.6974511Z mov.b32 %r1604, %r1667; 2026-02-21T12:44:51.6974582Z // begin inline asm 2026-02-21T12:44:51.6974788Z // wait for regs: %r1518,%r1519,%r1520,%r1521,%r1522,%r1523,%r1524,%r1525,%r1602,%r1603,%r1604 2026-02-21T12:44:51.6974873Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:51.6974936Z // end inline asm 2026-02-21T12:44:51.6975152Z $L__tmp18: 
2026-02-21T12:44:51.6975389Z .loc 1 51 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:32 2026-02-21T12:44:51.6975543Z add.s64 %rd265, %rd272, 16352; 2026-02-21T12:44:51.6975775Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 2026-02-21T12:44:51.6975842Z // begin inline asm 2026-02-21T12:44:51.6975907Z mov.u64 %rd264, 0x0; 2026-02-21T12:44:51.6976069Z createpolicy.fractional.L2::evict_last.b64 %rd264, 1.0; 2026-02-21T12:44:51.6976134Z // end inline asm 2026-02-21T12:44:51.6976200Z // begin inline asm 2026-02-21T12:44:51.6976261Z mov.u32 %r1616, 0x0; 2026-02-21T12:44:51.6976439Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r1616 }, [ %rd265 + 0 ], %rd264; 2026-02-21T12:44:51.6976723Z // end inline asm 2026-02-21T12:44:51.6976945Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 2026-02-21T12:44:51.6977019Z bar.sync 0; 2026-02-21T12:44:51.6977093Z st.shared.b32 [%r9], %r1616; 2026-02-21T12:44:51.6977151Z bar.sync 0; 2026-02-21T12:44:51.6977229Z ld.shared.b16 %rs173, [%r10]; 2026-02-21T12:44:51.6977460Z ld.shared.b16 %rs174, [%r10+256]; 2026-02-21T12:44:51.6977539Z ld.shared.b16 %rs175, [%r10+16]; 2026-02-21T12:44:51.6977608Z ld.shared.b16 %rs176, [%r10+272]; 2026-02-21T12:44:51.6977683Z ld.shared.b16 %rs177, [%r11]; 2026-02-21T12:44:51.6977750Z ld.shared.b16 %rs178, [%r11+256]; 2026-02-21T12:44:51.6977817Z ld.shared.b16 %rs179, [%r11+16]; 2026-02-21T12:44:51.6977891Z ld.shared.b16 %rs180, [%r11+272]; 2026-02-21T12:44:51.6977959Z cvt.f32.bf16 %r1633, %rs173; 2026-02-21T12:44:51.6978024Z cvt.f32.bf16 %r1634, %rs174; 2026-02-21T12:44:51.6978088Z cvt.f32.bf16 %r1635, %rs177; 2026-02-21T12:44:51.6978157Z cvt.f32.bf16 %r1636, %rs178; 2026-02-21T12:44:51.6978218Z cvt.f32.bf16 %r1653, %rs175; 2026-02-21T12:44:51.6978281Z cvt.f32.bf16 %r1654, %rs176; 2026-02-21T12:44:51.6978348Z cvt.f32.bf16 %r1655, %rs179; 2026-02-21T12:44:51.6978413Z cvt.f32.bf16 %r1656, %rs180; 2026-02-21T12:44:51.6978628Z .loc 
1 57 34 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:34 2026-02-21T12:44:51.6978705Z add.s64 %rd267, %rd273, 5232640; 2026-02-21T12:44:51.6978916Z .loc 1 57 87 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.6978979Z // begin inline asm 2026-02-21T12:44:51.6979043Z mov.u16 %rs154, 0x0; 2026-02-21T12:44:51.6979127Z ld.global.b8 { %rs154 }, [ %rd267 + 0 ]; 2026-02-21T12:44:51.6979188Z // end inline asm 2026-02-21T12:44:51.6979411Z .loc 1 65 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 2026-02-21T12:44:51.6979477Z bar.sync 0; 2026-02-21T12:44:51.6979550Z st.shared.b8 [%r12], %rs154; 2026-02-21T12:44:51.6979607Z bar.sync 0; 2026-02-21T12:44:51.6979694Z ld.shared.v2.b8 {%rs181, %rs182}, [%r22]; 2026-02-21T12:44:51.6979928Z .loc 1 60 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.6979998Z shl.b16 %rs183, %rs181, 4; 2026-02-21T12:44:51.6980066Z shl.b16 %rs184, %rs182, 4; 2026-02-21T12:44:51.6980286Z .loc 1 75 58 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.6980363Z selp.b16 %rs185, %rs183, %rs181, %p134; 2026-02-21T12:44:51.6980428Z cvt.s16.s8 %rs186, %rs185; 2026-02-21T12:44:51.6980497Z shr.s16 %rs187, %rs186, 4; 2026-02-21T12:44:51.6980569Z selp.b16 %rs188, %rs184, %rs182, %p134; 2026-02-21T12:44:51.6980633Z cvt.s16.s8 %rs189, %rs188; 2026-02-21T12:44:51.6980697Z shr.s16 %rs190, %rs189, 4; 2026-02-21T12:44:51.6980906Z .loc 1 80 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.6980978Z cvt.rn.f32.s16 %r1693, %rs187; 2026-02-21T12:44:51.6981045Z cvt.rn.f32.s16 %r1694, %rs190; 2026-02-21T12:44:51.6981109Z bar.sync 0; 2026-02-21T12:44:51.6981265Z st.shared.b32 [%r14], %r1693; 2026-02-21T12:44:51.6981330Z st.shared.b32 [%r15], %r1694; 2026-02-21T12:44:51.6981387Z $L__tmp19: 2026-02-21T12:44:51.6981922Z .loc 2 291 36 // standard.py:291:36 @[ 
cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 ] 2026-02-21T12:44:51.6981986Z // begin inline asm 2026-02-21T12:44:51.6982064Z fence.proxy.async.shared::cta; 2026-02-21T12:44:51.6982129Z // end inline asm 2026-02-21T12:44:51.6982185Z bar.sync 0; 2026-02-21T12:44:51.6982269Z shfl.sync.idx.b32 %r1695, %r3, 0, 31, -1; 2026-02-21T12:44:51.6982350Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.6982416Z shl.b32 %r1696, %r1695, 8; 2026-02-21T12:44:51.6982480Z and.b32 %r1697, %r1696, 3072; 2026-02-21T12:44:51.6982547Z add.s32 %r1698, %r1697, %r3946; 2026-02-21T12:44:51.6982618Z bfe.u32 %r1699, %r1698, 4, 14; 2026-02-21T12:44:51.6982685Z cvt.u64.u32 %rd276, %r1699; 2026-02-21T12:44:51.6982768Z or.b64 %rd268, %rd276, -9223371899399045120; 2026-02-21T12:44:51.6982838Z // begin inline asm 2026-02-21T12:44:51.6983355Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1518,%r1519,%r1520,%r1521,%r1522,%r1523,%r1524,%r1525}, {%r1633,%r1634,%r1635,%r1636}, %rd268, %p46, 1, 1; 2026-02-21T12:44:51.6983422Z // end inline asm 2026-02-21T12:44:51.6983496Z add.s32 %r1700, %r1697, %r169; 2026-02-21T12:44:51.6983560Z bfe.u32 %r1701, %r1700, 4, 14; 2026-02-21T12:44:51.6983627Z cvt.u64.u32 %rd277, %r1701; 2026-02-21T12:44:51.6983710Z or.b64 %rd269, %rd277, -9223371899399045120; 2026-02-21T12:44:51.6983779Z // begin inline asm 2026-02-21T12:44:51.6984165Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1518,%r1519,%r1520,%r1521,%r1522,%r1523,%r1524,%r1525}, {%r1653,%r1654,%r1655,%r1656}, %rd269, %p46, 1, 1; 2026-02-21T12:44:51.6984228Z // end inline asm 2026-02-21T12:44:51.6984314Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.6984378Z mov.b32 %r1665, %r3946; 2026-02-21T12:44:51.6984440Z mov.b32 %r1666, %r1667; 2026-02-21T12:44:51.6984509Z // begin inline asm 2026-02-21T12:44:51.6984696Z // wait for regs: %r1518,%r1519,%r1520,%r1521,%r1522,%r1523,%r1524,%r1525,%r1665,%r1666,%r1667 2026-02-21T12:44:51.6984780Z wgmma.wait_group.sync.aligned 0; 
2026-02-21T12:44:51.6984839Z // end inline asm 2026-02-21T12:44:51.6984906Z $L__tmp20: 2026-02-21T12:44:51.6985131Z .loc 1 90 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:90:28 2026-02-21T12:44:51.6985219Z cvt.rn.bf16x2.f32 %r1702, %r1519, %r1518; 2026-02-21T12:44:51.6985305Z cvt.rn.bf16x2.f32 %r1703, %r1521, %r1520; 2026-02-21T12:44:51.6985381Z cvt.rn.bf16x2.f32 %r1704, %r1523, %r1522; 2026-02-21T12:44:51.6985454Z cvt.rn.bf16x2.f32 %r1705, %r1525, %r1524; 2026-02-21T12:44:51.6985674Z .loc 1 91 22 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:91:22 2026-02-21T12:44:51.6985752Z mad.lo.s64 %rd278, %rd52, 2560, %rd123; 2026-02-21T12:44:51.6985819Z shl.b64 %rd279, %rd53, 1; 2026-02-21T12:44:51.6985887Z add.s64 %rd280, %rd278, %rd279; 2026-02-21T12:44:51.6985962Z add.s64 %rd270, %rd280, %rd504; 2026-02-21T12:44:51.6986173Z .loc 1 91 81 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:91:81 2026-02-21T12:44:51.6986234Z bar.sync 0; 2026-02-21T12:44:51.6986432Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r20], {%r1702, %r1703, %r1704, %r1705}; 2026-02-21T12:44:51.6986636Z bar.sync 0; 2026-02-21T12:44:51.6986761Z ld.shared.v4.b32 {%r1679, %r1680, %r1681, %r1682}, [%r21]; 2026-02-21T12:44:51.6986826Z // begin inline asm 2026-02-21T12:44:51.6986978Z st.global.v4.b32 [ %rd270 + 0 ], { %r1679, %r1680, %r1681, %r1682 }; 2026-02-21T12:44:51.6987044Z // end inline asm 2026-02-21T12:44:51.6987265Z .loc 1 22 120 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:22:120 2026-02-21T12:44:51.6987339Z add.s64 %rd282, %rd505, 2; 2026-02-21T12:44:51.6987544Z .loc 1 28 35 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:28:35 2026-02-21T12:44:51.6987725Z mul.hi.u64 %rd283, %rd282, -3689348814741910323; 2026-02-21T12:44:51.6987804Z shr.u64 %rd284, %rd283, 7; 2026-02-21T12:44:51.6988076Z .loc 1 29 33 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:29:33 2026-02-21T12:44:51.6988144Z shl.b64 %rd64, %rd284, 3; 
2026-02-21T12:44:51.6988355Z .loc 1 30 39 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:30:39 2026-02-21T12:44:51.6988422Z sub.s64 %rd285, 4096, %rd64; 2026-02-21T12:44:51.6988719Z .loc 1 30 52 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:30:52 2026-02-21T12:44:51.6988786Z min.s64 %rd65, %rd285, 8; 2026-02-21T12:44:51.6988995Z .loc 1 31 45 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:31:45 2026-02-21T12:44:51.6989065Z mul.lo.s64 %rd286, %rd284, 160; 2026-02-21T12:44:51.6989138Z sub.s64 %rd66, %rd282, %rd286; 2026-02-21T12:44:51.6989349Z .loc 1 32 51 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:32:51 2026-02-21T12:44:51.6989416Z or.b64 %rd287, %rd66, %rd65; 2026-02-21T12:44:51.6989622Z and.b64 %rd288, %rd287, -4294967296; 2026-02-21T12:44:51.6989704Z setp.ne.b64 %p54, %rd288, 0; 2026-02-21T12:44:51.6989770Z @%p54 bra $L__BB0_18; 2026-02-21T12:44:51.6989832Z bra.uni $L__BB0_17; 2026-02-21T12:44:51.6989953Z $L__BB0_18: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:51.6990028Z div.s64 %rd511, %rd66, %rd65; 2026-02-21T12:44:51.6990088Z bra.uni $L__BB0_19; 2026-02-21T12:44:51.6990200Z $L__BB0_17: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:51.6990272Z cvt.u32.u64 %r1706, %rd65; 2026-02-21T12:44:51.6990335Z cvt.u32.u64 %r1707, %rd66; 2026-02-21T12:44:51.6990401Z div.u32 %r1708, %r1707, %r1706; 2026-02-21T12:44:51.6990474Z cvt.u64.u32 %rd511, %r1708; 2026-02-21T12:44:51.6990580Z $L__BB0_19: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:51.6990793Z .loc 1 31 64 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:31:64 2026-02-21T12:44:51.6990869Z mul.lo.s64 %rd290, %rd511, %rd65; 2026-02-21T12:44:51.6990943Z sub.s64 %rd291, %rd66, %rd290; 2026-02-21T12:44:51.6991147Z .loc 1 31 30 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:31:30 2026-02-21T12:44:51.6991213Z add.s64 %rd292, %rd291, %rd64; 2026-02-21T12:44:51.6991422Z .loc 1 33 27 // 
cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:33:27 2026-02-21T12:44:51.6991485Z shl.b64 %rd293, %rd292, 6; 2026-02-21T12:44:51.6991686Z .loc 1 34 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:34:32 2026-02-21T12:44:51.6991757Z or.b64 %rd70, %rd293, %rd4; 2026-02-21T12:44:51.6991958Z .loc 1 35 27 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:35:27 2026-02-21T12:44:51.6992022Z shl.b64 %rd71, %rd511, 6; 2026-02-21T12:44:51.6992087Z shl.b64 %rd294, %rd70, 14; 2026-02-21T12:44:51.6992160Z add.s64 %rd72, %rd121, %rd294; 2026-02-21T12:44:51.6992226Z add.s64 %rd73, %rd501, %rd71; 2026-02-21T12:44:51.6992448Z .loc 1 43 126 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:43:126 2026-02-21T12:44:51.6992531Z shl.b64 %rd295, %rd292, 20; 2026-02-21T12:44:51.6992597Z add.s64 %rd517, %rd18, %rd295; 2026-02-21T12:44:51.6992660Z add.s64 %rd516, %rd19, %rd71; 2026-02-21T12:44:51.6992729Z mov.b32 %r2269, 0f00000000; 2026-02-21T12:44:51.6992793Z mov.b64 %rd518, -24; 2026-02-21T12:44:51.6992856Z mov.b32 %r2270, %r2269; 2026-02-21T12:44:51.6992917Z mov.b32 %r2271, %r2269; 2026-02-21T12:44:51.6992984Z mov.b32 %r2272, %r2269; 2026-02-21T12:44:51.6993044Z mov.b32 %r2273, %r2269; 2026-02-21T12:44:51.6993103Z mov.b32 %r2274, %r2269; 2026-02-21T12:44:51.6993169Z mov.b32 %r2275, %r2269; 2026-02-21T12:44:51.6993228Z mov.b32 %r2276, %r2269; 2026-02-21T12:44:51.6993404Z $L__BB0_20: // Parent Loop BB0_2 Depth=1 2026-02-21T12:44:51.6993513Z // => This Inner Loop Header: Depth=2 2026-02-21T12:44:51.6993773Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 2026-02-21T12:44:51.6993843Z add.s64 %rd297, %rd517, -64; 2026-02-21T12:44:51.6993906Z // begin inline asm 2026-02-21T12:44:51.6993973Z mov.u64 %rd296, 0x0; 2026-02-21T12:44:51.6994104Z createpolicy.fractional.L2::evict_last.b64 %rd296, 1.0; 2026-02-21T12:44:51.6994163Z // end inline asm 2026-02-21T12:44:51.6994231Z // begin inline asm 2026-02-21T12:44:51.6994291Z 
mov.u32 %r1710, 0x0; 2026-02-21T12:44:51.6994460Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r1710 }, [ %rd297 + 0 ], %rd296; 2026-02-21T12:44:51.6994520Z // end inline asm 2026-02-21T12:44:51.6994733Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 2026-02-21T12:44:51.6994794Z bar.sync 0; 2026-02-21T12:44:51.6994864Z st.shared.b32 [%r9], %r1710; 2026-02-21T12:44:51.6994929Z bar.sync 0; 2026-02-21T12:44:51.6995045Z ld.shared.b16 %rs194, [%r10]; 2026-02-21T12:44:51.6995162Z ld.shared.b16 %rs195, [%r10+256]; 2026-02-21T12:44:51.6995240Z ld.shared.b16 %rs196, [%r10+16]; 2026-02-21T12:44:51.6995308Z ld.shared.b16 %rs197, [%r10+272]; 2026-02-21T12:44:51.6995373Z ld.shared.b16 %rs198, [%r11]; 2026-02-21T12:44:51.6995439Z ld.shared.b16 %rs199, [%r11+256]; 2026-02-21T12:44:51.6995514Z ld.shared.b16 %rs200, [%r11+16]; 2026-02-21T12:44:51.6995582Z ld.shared.b16 %rs201, [%r11+272]; 2026-02-21T12:44:51.6995649Z cvt.f32.bf16 %r1807, %rs194; 2026-02-21T12:44:51.6995721Z cvt.f32.bf16 %r1808, %rs195; 2026-02-21T12:44:51.6995790Z cvt.f32.bf16 %r1809, %rs198; 2026-02-21T12:44:51.6995857Z cvt.f32.bf16 %r1810, %rs199; 2026-02-21T12:44:51.6995920Z cvt.f32.bf16 %r1827, %rs196; 2026-02-21T12:44:51.6995993Z cvt.f32.bf16 %r1828, %rs197; 2026-02-21T12:44:51.6996059Z cvt.f32.bf16 %r1829, %rs200; 2026-02-21T12:44:51.6996123Z cvt.f32.bf16 %r1830, %rs201; 2026-02-21T12:44:51.6996342Z .loc 1 57 87 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.6996409Z // begin inline asm 2026-02-21T12:44:51.6996611Z mov.u16 %rs191, 0x0; 2026-02-21T12:44:51.6996701Z ld.global.b8 { %rs191 }, [ %rd516 + 0 ]; 2026-02-21T12:44:51.6996764Z // end inline asm 2026-02-21T12:44:51.6996971Z .loc 1 65 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 2026-02-21T12:44:51.6997029Z bar.sync 0; 2026-02-21T12:44:51.6997101Z st.shared.b8 [%r12], %rs191; 2026-02-21T12:44:51.6997159Z bar.sync 0; 2026-02-21T12:44:51.6997242Z ld.shared.v2.b8 
{%rs202, %rs203}, [%r22]; 2026-02-21T12:44:51.6997455Z .loc 1 60 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.6997534Z shl.b16 %rs204, %rs202, 4; 2026-02-21T12:44:51.6997601Z shl.b16 %rs205, %rs203, 4; 2026-02-21T12:44:51.6997809Z .loc 1 75 58 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.6997899Z selp.b16 %rs206, %rs204, %rs202, %p134; 2026-02-21T12:44:51.6997965Z cvt.s16.s8 %rs207, %rs206; 2026-02-21T12:44:51.6998039Z shr.s16 %rs208, %rs207, 4; 2026-02-21T12:44:51.6998123Z selp.b16 %rs209, %rs205, %rs203, %p134; 2026-02-21T12:44:51.6998186Z cvt.s16.s8 %rs210, %rs209; 2026-02-21T12:44:51.6998249Z shr.s16 %rs211, %rs210, 4; 2026-02-21T12:44:51.6998457Z .loc 1 80 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.6998525Z cvt.rn.f32.s16 %r2291, %rs208; 2026-02-21T12:44:51.6998592Z cvt.rn.f32.s16 %r2292, %rs211; 2026-02-21T12:44:51.6998652Z bar.sync 0; 2026-02-21T12:44:51.6998728Z st.shared.b32 [%r14], %r2291; 2026-02-21T12:44:51.6998791Z st.shared.b32 [%r15], %r2292; 2026-02-21T12:44:51.6998939Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r2269}; 2026-02-21T12:44:51.6999085Z bar.sync 0; 2026-02-21T12:44:51.6999148Z // begin inline asm 2026-02-21T12:44:51.6999343Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1771, %r1811, %r1851, %r1891}, [%r964]; 2026-02-21T12:44:51.6999464Z // end inline asm 2026-02-21T12:44:51.6999528Z bar.sync 0; 2026-02-21T12:44:51.6999660Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r2271}; 2026-02-21T12:44:51.6999716Z bar.sync 0; 2026-02-21T12:44:51.6999784Z // begin inline asm 2026-02-21T12:44:51.6999968Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1773, %r1813, %r1853, %r1893}, [%r964]; 2026-02-21T12:44:51.7000029Z // end inline asm 2026-02-21T12:44:51.7000091Z bar.sync 0; 2026-02-21T12:44:51.7000221Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r2270}; 2026-02-21T12:44:51.7000279Z bar.sync 0; 
2026-02-21T12:44:51.7000339Z // begin inline asm 2026-02-21T12:44:51.7000527Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1772, %r1812, %r1852, %r1892}, [%r964]; 2026-02-21T12:44:51.7000588Z // end inline asm 2026-02-21T12:44:51.7000644Z bar.sync 0; 2026-02-21T12:44:51.7000777Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r2272}; 2026-02-21T12:44:51.7000897Z bar.sync 0; 2026-02-21T12:44:51.7001016Z // begin inline asm 2026-02-21T12:44:51.7001196Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1774, %r1814, %r1854, %r1894}, [%r964]; 2026-02-21T12:44:51.7001262Z // end inline asm 2026-02-21T12:44:51.7001318Z bar.sync 0; 2026-02-21T12:44:51.7001445Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r2273}; 2026-02-21T12:44:51.7001507Z bar.sync 0; 2026-02-21T12:44:51.7001567Z // begin inline asm 2026-02-21T12:44:51.7001744Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1775, %r1815, %r1855, %r1895}, [%r964]; 2026-02-21T12:44:51.7001808Z // end inline asm 2026-02-21T12:44:51.7001864Z bar.sync 0; 2026-02-21T12:44:51.7001989Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r2275}; 2026-02-21T12:44:51.7002047Z bar.sync 0; 2026-02-21T12:44:51.7002132Z // begin inline asm 2026-02-21T12:44:51.7002314Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1777, %r1817, %r1857, %r1897}, [%r964]; 2026-02-21T12:44:51.7002374Z // end inline asm 2026-02-21T12:44:51.7002441Z bar.sync 0; 2026-02-21T12:44:51.7002570Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r2274}; 2026-02-21T12:44:51.7002628Z bar.sync 0; 2026-02-21T12:44:51.7002690Z // begin inline asm 2026-02-21T12:44:51.7002872Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1776, %r1816, %r1856, %r1896}, [%r964]; 2026-02-21T12:44:51.7002930Z // end inline asm 2026-02-21T12:44:51.7002986Z bar.sync 0; 2026-02-21T12:44:51.7003118Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r2276}; 2026-02-21T12:44:51.7003177Z bar.sync 0; 2026-02-21T12:44:51.7003237Z // begin inline asm 2026-02-21T12:44:51.7003414Z 
ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r1778, %r1818, %r1858, %r1898}, [%r964]; 2026-02-21T12:44:51.7003481Z // end inline asm 2026-02-21T12:44:51.7003538Z $L__tmp21: 2026-02-21T12:44:51.7003820Z .loc 2 291 36 // standard.py:291:36 @[ cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 ] 2026-02-21T12:44:51.7003889Z // begin inline asm 2026-02-21T12:44:51.7003973Z fence.proxy.async.shared::cta; 2026-02-21T12:44:51.7004031Z // end inline asm 2026-02-21T12:44:51.7004118Z shfl.sync.idx.b32 %r2293, %r3, 0, 31, -1; 2026-02-21T12:44:51.7004192Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.7004259Z mov.pred %p72, -1; 2026-02-21T12:44:51.7004319Z // begin inline asm 2026-02-21T12:44:51.7004707Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1771,%r1772,%r1773,%r1774,%r1775,%r1776,%r1777,%r1778}, {%r1807,%r1808,%r1809,%r1810}, %rd7, %p72, 1, 1; 2026-02-21T12:44:51.7004767Z // end inline asm 2026-02-21T12:44:51.7004828Z // begin inline asm 2026-02-21T12:44:51.7005203Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1771,%r1772,%r1773,%r1774,%r1775,%r1776,%r1777,%r1778}, {%r1827,%r1828,%r1829,%r1830}, %rd8, %p72, 1, 1; 2026-02-21T12:44:51.7005264Z // end inline asm 2026-02-21T12:44:51.7005386Z // begin inline asm 2026-02-21T12:44:51.7005767Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1811,%r1812,%r1813,%r1814,%r1815,%r1816,%r1817,%r1818}, {%r1807,%r1808,%r1809,%r1810}, %rd9, %p72, 1, 1; 2026-02-21T12:44:51.7005898Z // end inline asm 2026-02-21T12:44:51.7005960Z // begin inline asm 2026-02-21T12:44:51.7006344Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1811,%r1812,%r1813,%r1814,%r1815,%r1816,%r1817,%r1818}, {%r1827,%r1828,%r1829,%r1830}, %rd10, %p72, 1, 1; 2026-02-21T12:44:51.7006403Z // end inline asm 2026-02-21T12:44:51.7006592Z // begin inline asm 2026-02-21T12:44:51.7006969Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1851,%r1852,%r1853,%r1854,%r1855,%r1856,%r1857,%r1858}, {%r1807,%r1808,%r1809,%r1810}, 
%rd11, %p72, 1, 1; 2026-02-21T12:44:51.7007027Z // end inline asm 2026-02-21T12:44:51.7007088Z // begin inline asm 2026-02-21T12:44:51.7007470Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1851,%r1852,%r1853,%r1854,%r1855,%r1856,%r1857,%r1858}, {%r1827,%r1828,%r1829,%r1830}, %rd12, %p72, 1, 1; 2026-02-21T12:44:51.7007535Z // end inline asm 2026-02-21T12:44:51.7007596Z // begin inline asm 2026-02-21T12:44:51.7008124Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1891,%r1892,%r1893,%r1894,%r1895,%r1896,%r1897,%r1898}, {%r1807,%r1808,%r1809,%r1810}, %rd13, %p72, 1, 1; 2026-02-21T12:44:51.7008194Z // end inline asm 2026-02-21T12:44:51.7008254Z // begin inline asm 2026-02-21T12:44:51.7008618Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1891,%r1892,%r1893,%r1894,%r1895,%r1896,%r1897,%r1898}, {%r1827,%r1828,%r1829,%r1830}, %rd14, %p72, 1, 1; 2026-02-21T12:44:51.7008681Z // end inline asm 2026-02-21T12:44:51.7008761Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.7008821Z mov.b32 %r2418, 0; 2026-02-21T12:44:51.7008890Z mov.b32 %r1944, %r2418; 2026-02-21T12:44:51.7008950Z mov.b32 %r1945, %r2418; 2026-02-21T12:44:51.7009010Z mov.b32 %r1943, %r3946; 2026-02-21T12:44:51.7009069Z // begin inline asm 2026-02-21T12:44:51.7009640Z // wait for regs: %r1771,%r1772,%r1773,%r1774,%r1775,%r1776,%r1777,%r1778,%r1811,%r1812,%r1813,%r1814,%r1815,%r1816,%r1817,%r1818,%r1851,%r1852,%r1853,%r1854,%r1855,%r1856,%r1857,%r1858,%r1891,%r1892,%r1893,%r1894,%r1895,%r1896,%r1897,%r1898,%r1943,%r1944,%r1945 2026-02-21T12:44:51.7009720Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:51.7009778Z // end inline asm 2026-02-21T12:44:51.7009842Z $L__tmp22: 2026-02-21T12:44:51.7010055Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 2026-02-21T12:44:51.7010124Z add.s64 %rd309, %rd517, -32; 2026-02-21T12:44:51.7010190Z // begin inline asm 2026-02-21T12:44:51.7010251Z mov.u64 %rd308, 0x0; 2026-02-21T12:44:51.7010377Z 
createpolicy.fractional.L2::evict_last.b64 %rd308, 1.0; 2026-02-21T12:44:51.7010437Z // end inline asm 2026-02-21T12:44:51.7010507Z // begin inline asm 2026-02-21T12:44:51.7010571Z mov.u32 %r1981, 0x0; 2026-02-21T12:44:51.7010733Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r1981 }, [ %rd309 + 0 ], %rd308; 2026-02-21T12:44:51.7010807Z // end inline asm 2026-02-21T12:44:51.7011028Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 2026-02-21T12:44:51.7011089Z bar.sync 0; 2026-02-21T12:44:51.7011165Z st.shared.b32 [%r9], %r1981; 2026-02-21T12:44:51.7011223Z bar.sync 0; 2026-02-21T12:44:51.7011290Z ld.shared.b16 %rs212, [%r10]; 2026-02-21T12:44:51.7011361Z ld.shared.b16 %rs213, [%r10+256]; 2026-02-21T12:44:51.7011436Z ld.shared.b16 %rs214, [%r10+16]; 2026-02-21T12:44:51.7011504Z ld.shared.b16 %rs215, [%r10+272]; 2026-02-21T12:44:51.7011571Z ld.shared.b16 %rs216, [%r11]; 2026-02-21T12:44:51.7011644Z ld.shared.b16 %rs217, [%r11+256]; 2026-02-21T12:44:51.7011710Z ld.shared.b16 %rs218, [%r11+16]; 2026-02-21T12:44:51.7011776Z ld.shared.b16 %rs219, [%r11+272]; 2026-02-21T12:44:51.7011840Z cvt.f32.bf16 %r2038, %rs212; 2026-02-21T12:44:51.7011909Z cvt.f32.bf16 %r2039, %rs213; 2026-02-21T12:44:51.7012059Z cvt.f32.bf16 %r2040, %rs216; 2026-02-21T12:44:51.7012121Z cvt.f32.bf16 %r2041, %rs217; 2026-02-21T12:44:51.7012190Z cvt.f32.bf16 %r2058, %rs214; 2026-02-21T12:44:51.7012321Z cvt.f32.bf16 %r2059, %rs215; 2026-02-21T12:44:51.7012398Z cvt.f32.bf16 %r2060, %rs218; 2026-02-21T12:44:51.7012465Z cvt.f32.bf16 %r2061, %rs219; 2026-02-21T12:44:51.7012686Z .loc 1 57 87 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.7012755Z add.s64 %rd311, %rd516, 10240; 2026-02-21T12:44:51.7012818Z // begin inline asm 2026-02-21T12:44:51.7012885Z mov.u16 %rs192, 0x0; 2026-02-21T12:44:51.7012965Z ld.global.b8 { %rs192 }, [ %rd311 + 0 ]; 2026-02-21T12:44:51.7013023Z // end inline asm 2026-02-21T12:44:51.7013236Z .loc 1 65 28 // 
cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 2026-02-21T12:44:51.7013295Z bar.sync 0; 2026-02-21T12:44:51.7013361Z st.shared.b8 [%r12], %rs192; 2026-02-21T12:44:51.7013419Z bar.sync 0; 2026-02-21T12:44:51.7013507Z ld.shared.v2.b8 {%rs220, %rs221}, [%r22]; 2026-02-21T12:44:51.7013805Z .loc 1 60 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.7013876Z shl.b16 %rs222, %rs220, 4; 2026-02-21T12:44:51.7013944Z shl.b16 %rs223, %rs221, 4; 2026-02-21T12:44:51.7014144Z .loc 1 75 58 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.7014219Z selp.b16 %rs224, %rs222, %rs220, %p134; 2026-02-21T12:44:51.7014287Z cvt.s16.s8 %rs225, %rs224; 2026-02-21T12:44:51.7014348Z shr.s16 %rs226, %rs225, 4; 2026-02-21T12:44:51.7014419Z selp.b16 %rs227, %rs223, %rs221, %p134; 2026-02-21T12:44:51.7014482Z cvt.s16.s8 %rs228, %rs227; 2026-02-21T12:44:51.7014553Z shr.s16 %rs229, %rs228, 4; 2026-02-21T12:44:51.7014753Z .loc 1 80 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.7014831Z cvt.rn.f32.s16 %r2294, %rs226; 2026-02-21T12:44:51.7014908Z cvt.rn.f32.s16 %r2295, %rs229; 2026-02-21T12:44:51.7014966Z bar.sync 0; 2026-02-21T12:44:51.7015038Z st.shared.b32 [%r14], %r2294; 2026-02-21T12:44:51.7015108Z st.shared.b32 [%r15], %r2295; 2026-02-21T12:44:51.7015170Z $L__tmp23: 2026-02-21T12:44:51.7015459Z .loc 2 291 36 // standard.py:291:36 @[ cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 ] 2026-02-21T12:44:51.7015524Z // begin inline asm 2026-02-21T12:44:51.7015619Z fence.proxy.async.shared::cta; 2026-02-21T12:44:51.7015684Z // end inline asm 2026-02-21T12:44:51.7015742Z bar.sync 0; 2026-02-21T12:44:51.7015822Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.7015884Z // begin inline asm 2026-02-21T12:44:51.7016271Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1771,%r1772,%r1773,%r1774,%r1775,%r1776,%r1777,%r1778}, {%r2038,%r2039,%r2040,%r2041}, %rd7, 
%p72, 1, 1; 2026-02-21T12:44:51.7016330Z // end inline asm 2026-02-21T12:44:51.7016398Z // begin inline asm 2026-02-21T12:44:51.7016911Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1771,%r1772,%r1773,%r1774,%r1775,%r1776,%r1777,%r1778}, {%r2058,%r2059,%r2060,%r2061}, %rd8, %p72, 1, 1; 2026-02-21T12:44:51.7016979Z // end inline asm 2026-02-21T12:44:51.7017042Z // begin inline asm 2026-02-21T12:44:51.7017409Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1811,%r1812,%r1813,%r1814,%r1815,%r1816,%r1817,%r1818}, {%r2038,%r2039,%r2040,%r2041}, %rd9, %p72, 1, 1; 2026-02-21T12:44:51.7017474Z // end inline asm 2026-02-21T12:44:51.7017537Z // begin inline asm 2026-02-21T12:44:51.7017922Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1811,%r1812,%r1813,%r1814,%r1815,%r1816,%r1817,%r1818}, {%r2058,%r2059,%r2060,%r2061}, %rd10, %p72, 1, 1; 2026-02-21T12:44:51.7017991Z // end inline asm 2026-02-21T12:44:51.7018051Z // begin inline asm 2026-02-21T12:44:51.7018418Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1851,%r1852,%r1853,%r1854,%r1855,%r1856,%r1857,%r1858}, {%r2038,%r2039,%r2040,%r2041}, %rd11, %p72, 1, 1; 2026-02-21T12:44:51.7018562Z // end inline asm 2026-02-21T12:44:51.7018624Z // begin inline asm 2026-02-21T12:44:51.7019062Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1851,%r1852,%r1853,%r1854,%r1855,%r1856,%r1857,%r1858}, {%r2058,%r2059,%r2060,%r2061}, %rd12, %p72, 1, 1; 2026-02-21T12:44:51.7019120Z // end inline asm 2026-02-21T12:44:51.7019187Z // begin inline asm 2026-02-21T12:44:51.7019551Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1891,%r1892,%r1893,%r1894,%r1895,%r1896,%r1897,%r1898}, {%r2038,%r2039,%r2040,%r2041}, %rd13, %p72, 1, 1; 2026-02-21T12:44:51.7019610Z // end inline asm 2026-02-21T12:44:51.7019677Z // begin inline asm 2026-02-21T12:44:51.7020040Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r1891,%r1892,%r1893,%r1894,%r1895,%r1896,%r1897,%r1898}, {%r2058,%r2059,%r2060,%r2061}, %rd14, 
%p72, 1, 1; 2026-02-21T12:44:51.7020099Z // end inline asm 2026-02-21T12:44:51.7020183Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.7020250Z mov.b32 %r2175, %r2418; 2026-02-21T12:44:51.7020312Z mov.b32 %r2176, %r2418; 2026-02-21T12:44:51.7020377Z mov.b32 %r2174, %r3946; 2026-02-21T12:44:51.7020564Z // begin inline asm 2026-02-21T12:44:51.7021127Z // wait for regs: %r1771,%r1772,%r1773,%r1774,%r1775,%r1776,%r1777,%r1778,%r1811,%r1812,%r1813,%r1814,%r1815,%r1816,%r1817,%r1818,%r1851,%r1852,%r1853,%r1854,%r1855,%r1856,%r1857,%r1858,%r1891,%r1892,%r1893,%r1894,%r1895,%r1896,%r1897,%r1898,%r2174,%r2175,%r2176 2026-02-21T12:44:51.7021204Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:51.7021282Z // end inline asm 2026-02-21T12:44:51.7021342Z $L__tmp24: 2026-02-21T12:44:51.7021561Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 2026-02-21T12:44:51.7021628Z // begin inline asm 2026-02-21T12:44:51.7021689Z mov.u64 %rd320, 0x0; 2026-02-21T12:44:51.7021821Z createpolicy.fractional.L2::evict_last.b64 %rd320, 1.0; 2026-02-21T12:44:51.7021883Z // end inline asm 2026-02-21T12:44:51.7021946Z // begin inline asm 2026-02-21T12:44:51.7022006Z mov.u32 %r2212, 0x0; 2026-02-21T12:44:51.7022172Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r2212 }, [ %rd517 + 0 ], %rd320; 2026-02-21T12:44:51.7022237Z // end inline asm 2026-02-21T12:44:51.7022444Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 2026-02-21T12:44:51.7022503Z bar.sync 0; 2026-02-21T12:44:51.7022575Z st.shared.b32 [%r9], %r2212; 2026-02-21T12:44:51.7022632Z bar.sync 0; 2026-02-21T12:44:51.7022699Z ld.shared.b16 %rs230, [%r10]; 2026-02-21T12:44:51.7022768Z ld.shared.b16 %rs231, [%r10+256]; 2026-02-21T12:44:51.7022842Z ld.shared.b16 %rs232, [%r10+16]; 2026-02-21T12:44:51.7022910Z ld.shared.b16 %rs233, [%r10+272]; 2026-02-21T12:44:51.7022975Z ld.shared.b16 %rs234, [%r11]; 2026-02-21T12:44:51.7023044Z ld.shared.b16 %rs235, [%r11+256]; 
2026-02-21T12:44:51.7023120Z ld.shared.b16 %rs236, [%r11+16]; 2026-02-21T12:44:51.7023189Z ld.shared.b16 %rs237, [%r11+272]; 2026-02-21T12:44:51.7023260Z cvt.f32.bf16 %r2245, %rs230; 2026-02-21T12:44:51.7023326Z cvt.f32.bf16 %r2246, %rs231; 2026-02-21T12:44:51.7023393Z cvt.f32.bf16 %r2247, %rs234; 2026-02-21T12:44:51.7023455Z cvt.f32.bf16 %r2248, %rs235; 2026-02-21T12:44:51.7023522Z cvt.f32.bf16 %r2265, %rs232; 2026-02-21T12:44:51.7023585Z cvt.f32.bf16 %r2266, %rs233; 2026-02-21T12:44:51.7023647Z cvt.f32.bf16 %r2267, %rs236; 2026-02-21T12:44:51.7023713Z cvt.f32.bf16 %r2268, %rs237; 2026-02-21T12:44:51.7023920Z .loc 1 57 87 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.7023986Z add.s64 %rd323, %rd516, 20480; 2026-02-21T12:44:51.7024048Z // begin inline asm 2026-02-21T12:44:51.7024116Z mov.u16 %rs193, 0x0; 2026-02-21T12:44:51.7024192Z ld.global.b8 { %rs193 }, [ %rd323 + 0 ]; 2026-02-21T12:44:51.7024250Z // end inline asm 2026-02-21T12:44:51.7024461Z .loc 1 65 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 2026-02-21T12:44:51.7024581Z bar.sync 0; 2026-02-21T12:44:51.7024647Z st.shared.b8 [%r12], %rs193; 2026-02-21T12:44:51.7024708Z bar.sync 0; 2026-02-21T12:44:51.7024838Z ld.shared.v2.b8 {%rs238, %rs239}, [%r22]; 2026-02-21T12:44:51.7025041Z .loc 1 60 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.7025106Z shl.b16 %rs240, %rs238, 4; 2026-02-21T12:44:51.7025177Z shl.b16 %rs241, %rs239, 4; 2026-02-21T12:44:51.7025376Z .loc 1 75 58 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.7025450Z selp.b16 %rs242, %rs240, %rs238, %p134; 2026-02-21T12:44:51.7025520Z cvt.s16.s8 %rs243, %rs242; 2026-02-21T12:44:51.7025583Z shr.s16 %rs244, %rs243, 4; 2026-02-21T12:44:51.7025654Z selp.b16 %rs245, %rs241, %rs239, %p134; 2026-02-21T12:44:51.7025725Z cvt.s16.s8 %rs246, %rs245; 2026-02-21T12:44:51.7025788Z shr.s16 %rs247, %rs246, 4; 
2026-02-21T12:44:51.7025991Z .loc 1 80 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.7026071Z cvt.rn.f32.s16 %r2296, %rs244; 2026-02-21T12:44:51.7026239Z cvt.rn.f32.s16 %r2297, %rs247; 2026-02-21T12:44:51.7026300Z bar.sync 0; 2026-02-21T12:44:51.7026366Z st.shared.b32 [%r14], %r2296; 2026-02-21T12:44:51.7026438Z st.shared.b32 [%r15], %r2297; 2026-02-21T12:44:51.7026624Z $L__tmp25: 2026-02-21T12:44:51.7026904Z .loc 2 291 36 // standard.py:291:36 @[ cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 ] 2026-02-21T12:44:51.7027100Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r964], {%r1771, %r1811, %r1851, %r1891}; 2026-02-21T12:44:51.7027157Z bar.sync 0; 2026-02-21T12:44:51.7027218Z // begin inline asm 2026-02-21T12:44:51.7027356Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2269}, [%r712]; 2026-02-21T12:44:51.7027423Z // end inline asm 2026-02-21T12:44:51.7027481Z bar.sync 0; 2026-02-21T12:44:51.7027662Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r964], {%r1773, %r1813, %r1853, %r1893}; 2026-02-21T12:44:51.7027725Z bar.sync 0; 2026-02-21T12:44:51.7027785Z // begin inline asm 2026-02-21T12:44:51.7027921Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2271}, [%r712]; 2026-02-21T12:44:51.7027980Z // end inline asm 2026-02-21T12:44:51.7028042Z bar.sync 0; 2026-02-21T12:44:51.7028223Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r964], {%r1772, %r1812, %r1852, %r1892}; 2026-02-21T12:44:51.7028279Z bar.sync 0; 2026-02-21T12:44:51.7028345Z // begin inline asm 2026-02-21T12:44:51.7028473Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2270}, [%r712]; 2026-02-21T12:44:51.7028617Z // end inline asm 2026-02-21T12:44:51.7028680Z bar.sync 0; 2026-02-21T12:44:51.7028869Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r964], {%r1774, %r1814, %r1854, %r1894}; 2026-02-21T12:44:51.7028926Z bar.sync 0; 2026-02-21T12:44:51.7028991Z // begin inline asm 2026-02-21T12:44:51.7029133Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2272}, [%r712]; 
2026-02-21T12:44:51.7029193Z // end inline asm 2026-02-21T12:44:51.7029250Z bar.sync 0; 2026-02-21T12:44:51.7029440Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r964], {%r1775, %r1815, %r1855, %r1895}; 2026-02-21T12:44:51.7029498Z bar.sync 0; 2026-02-21T12:44:51.7029561Z // begin inline asm 2026-02-21T12:44:51.7029690Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2273}, [%r712]; 2026-02-21T12:44:51.7029753Z // end inline asm 2026-02-21T12:44:51.7029808Z bar.sync 0; 2026-02-21T12:44:51.7029986Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r964], {%r1777, %r1817, %r1857, %r1897}; 2026-02-21T12:44:51.7030048Z bar.sync 0; 2026-02-21T12:44:51.7030110Z // begin inline asm 2026-02-21T12:44:51.7030237Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2275}, [%r712]; 2026-02-21T12:44:51.7030298Z // end inline asm 2026-02-21T12:44:51.7030361Z bar.sync 0; 2026-02-21T12:44:51.7030539Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r964], {%r1776, %r1816, %r1856, %r1896}; 2026-02-21T12:44:51.7030595Z bar.sync 0; 2026-02-21T12:44:51.7030774Z // begin inline asm 2026-02-21T12:44:51.7030906Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2274}, [%r712]; 2026-02-21T12:44:51.7031030Z // end inline asm 2026-02-21T12:44:51.7031087Z bar.sync 0; 2026-02-21T12:44:51.7031278Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r964], {%r1778, %r1818, %r1858, %r1898}; 2026-02-21T12:44:51.7031336Z bar.sync 0; 2026-02-21T12:44:51.7031397Z // begin inline asm 2026-02-21T12:44:51.7031529Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r2276}, [%r712]; 2026-02-21T12:44:51.7031589Z // end inline asm 2026-02-21T12:44:51.7031647Z // begin inline asm 2026-02-21T12:44:51.7031732Z fence.proxy.async.shared::cta; 2026-02-21T12:44:51.7031789Z // end inline asm 2026-02-21T12:44:51.7031863Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.7031925Z shl.b32 %r2298, %r2293, 8; 2026-02-21T12:44:51.7031991Z and.b32 %r2299, %r2298, 3072; 2026-02-21T12:44:51.7032056Z add.s32 %r2300, %r2299, %r3946; 2026-02-21T12:44:51.7032119Z bfe.u32 %r2301, 
%r2300, 4, 14; 2026-02-21T12:44:51.7032191Z cvt.u64.u32 %rd326, %r2301; 2026-02-21T12:44:51.7032275Z or.b64 %rd324, %rd326, -9223371899399045120; 2026-02-21T12:44:51.7032456Z // begin inline asm 2026-02-21T12:44:51.7032842Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2269,%r2270,%r2271,%r2272,%r2273,%r2274,%r2275,%r2276}, {%r2245,%r2246,%r2247,%r2248}, %rd324, %p72, 1, 1; 2026-02-21T12:44:51.7032906Z // end inline asm 2026-02-21T12:44:51.7032969Z add.s32 %r2302, %r2300, 32; 2026-02-21T12:44:51.7033031Z bfe.u32 %r2303, %r2302, 4, 14; 2026-02-21T12:44:51.7033114Z cvt.u64.u32 %rd327, %r2303; 2026-02-21T12:44:51.7033195Z or.b64 %rd325, %rd327, -9223371899399045120; 2026-02-21T12:44:51.7033256Z // begin inline asm 2026-02-21T12:44:51.7033632Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2269,%r2270,%r2271,%r2272,%r2273,%r2274,%r2275,%r2276}, {%r2265,%r2266,%r2267,%r2268}, %rd325, %p72, 1, 1; 2026-02-21T12:44:51.7033691Z // end inline asm 2026-02-21T12:44:51.7033767Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.7033832Z mov.b32 %r2278, %r2418; 2026-02-21T12:44:51.7033899Z mov.b32 %r2277, %r3946; 2026-02-21T12:44:51.7033963Z mov.b32 %r2279, %r2418; 2026-02-21T12:44:51.7034026Z // begin inline asm 2026-02-21T12:44:51.7034211Z // wait for regs: %r2269,%r2270,%r2271,%r2272,%r2273,%r2274,%r2275,%r2276,%r2277,%r2278,%r2279 2026-02-21T12:44:51.7034286Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:51.7034344Z // end inline asm 2026-02-21T12:44:51.7034404Z $L__tmp26: 2026-02-21T12:44:51.7034630Z .loc 1 43 126 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:43:126 2026-02-21T12:44:51.7034696Z add.s64 %rd518, %rd518, 24; 2026-02-21T12:44:51.7034758Z add.s64 %rd517, %rd517, 96; 2026-02-21T12:44:51.7034827Z add.s64 %rd516, %rd516, 30720; 2026-02-21T12:44:51.7034894Z setp.lt.u64 %p74, %rd518, 4056; 2026-02-21T12:44:51.7034956Z @%p74 bra $L__BB0_20; 2026-02-21T12:44:51.7035057Z // %bb.21: // %.preheader209 2026-02-21T12:44:51.7035164Z // in 
Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:51.7035377Z .loc 1 51 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:32 2026-02-21T12:44:51.7035454Z add.s64 %rd342, %rd72, %rd201; 2026-02-21T12:44:51.7035517Z add.s64 %rd329, %rd342, 16320; 2026-02-21T12:44:51.7035721Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 2026-02-21T12:44:51.7035783Z // begin inline asm 2026-02-21T12:44:51.7035850Z mov.u64 %rd328, 0x0; 2026-02-21T12:44:51.7035976Z createpolicy.fractional.L2::evict_last.b64 %rd328, 1.0; 2026-02-21T12:44:51.7036035Z // end inline asm 2026-02-21T12:44:51.7036102Z // begin inline asm 2026-02-21T12:44:51.7036161Z mov.u32 %r2304, 0x0; 2026-02-21T12:44:51.7036323Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r2304 }, [ %rd329 + 0 ], %rd328; 2026-02-21T12:44:51.7036400Z // end inline asm 2026-02-21T12:44:51.7036804Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 2026-02-21T12:44:51.7036862Z bar.sync 0; 2026-02-21T12:44:51.7037010Z st.shared.b32 [%r9], %r2304; 2026-02-21T12:44:51.7037072Z bar.sync 0; 2026-02-21T12:44:51.7037139Z ld.shared.b16 %rs250, [%r10]; 2026-02-21T12:44:51.7037207Z ld.shared.b16 %rs251, [%r10+256]; 2026-02-21T12:44:51.7037293Z ld.shared.b16 %rs252, [%r10+16]; 2026-02-21T12:44:51.7037361Z ld.shared.b16 %rs253, [%r10+272]; 2026-02-21T12:44:51.7037428Z ld.shared.b16 %rs254, [%r11]; 2026-02-21T12:44:51.7037494Z ld.shared.b16 %rs255, [%r11+256]; 2026-02-21T12:44:51.7037569Z ld.shared.b16 %rs256, [%r11+16]; 2026-02-21T12:44:51.7037634Z ld.shared.b16 %rs257, [%r11+272]; 2026-02-21T12:44:51.7037699Z cvt.f32.bf16 %r2321, %rs250; 2026-02-21T12:44:51.7037767Z cvt.f32.bf16 %r2322, %rs251; 2026-02-21T12:44:51.7037830Z cvt.f32.bf16 %r2323, %rs254; 2026-02-21T12:44:51.7037892Z cvt.f32.bf16 %r2324, %rs255; 2026-02-21T12:44:51.7037961Z cvt.f32.bf16 %r2341, %rs252; 2026-02-21T12:44:51.7038028Z cvt.f32.bf16 %r2342, %rs253; 2026-02-21T12:44:51.7038091Z cvt.f32.bf16 %r2343, %rs256; 
2026-02-21T12:44:51.7038285Z cvt.f32.bf16 %r2344, %rs257; 2026-02-21T12:44:51.7038511Z .loc 1 57 34 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:34 2026-02-21T12:44:51.7038578Z add.s64 %rd343, %rd73, %rd502; 2026-02-21T12:44:51.7038644Z add.s64 %rd331, %rd343, 5222400; 2026-02-21T12:44:51.7038855Z .loc 1 57 87 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.7038918Z // begin inline asm 2026-02-21T12:44:51.7038977Z mov.u16 %rs248, 0x0; 2026-02-21T12:44:51.7039056Z ld.global.b8 { %rs248 }, [ %rd331 + 0 ]; 2026-02-21T12:44:51.7039125Z // end inline asm 2026-02-21T12:44:51.7039327Z .loc 1 65 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 2026-02-21T12:44:51.7039385Z bar.sync 0; 2026-02-21T12:44:51.7039459Z st.shared.b8 [%r12], %rs248; 2026-02-21T12:44:51.7039514Z bar.sync 0; 2026-02-21T12:44:51.7039595Z ld.shared.v2.b8 {%rs258, %rs259}, [%r22]; 2026-02-21T12:44:51.7039808Z .loc 1 60 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.7039874Z shl.b16 %rs260, %rs258, 4; 2026-02-21T12:44:51.7039940Z shl.b16 %rs261, %rs259, 4; 2026-02-21T12:44:51.7040143Z .loc 1 75 58 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.7040226Z selp.b16 %rs262, %rs260, %rs258, %p134; 2026-02-21T12:44:51.7040290Z cvt.s16.s8 %rs263, %rs262; 2026-02-21T12:44:51.7040352Z shr.s16 %rs264, %rs263, 4; 2026-02-21T12:44:51.7040432Z selp.b16 %rs265, %rs261, %rs259, %p134; 2026-02-21T12:44:51.7040493Z cvt.s16.s8 %rs266, %rs265; 2026-02-21T12:44:51.7040554Z shr.s16 %rs267, %rs266, 4; 2026-02-21T12:44:51.7040764Z .loc 1 80 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.7040832Z cvt.rn.f32.s16 %r2434, %rs264; 2026-02-21T12:44:51.7040896Z cvt.rn.f32.s16 %r2435, %rs267; 2026-02-21T12:44:51.7040955Z bar.sync 0; 2026-02-21T12:44:51.7041029Z st.shared.b32 [%r14], %r2434; 2026-02-21T12:44:51.7041107Z st.shared.b32 [%r15], 
%r2435; 2026-02-21T12:44:51.7041164Z $L__tmp27: 2026-02-21T12:44:51.7041449Z .loc 2 291 36 // standard.py:291:36 @[ cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 ] 2026-02-21T12:44:51.7041510Z // begin inline asm 2026-02-21T12:44:51.7041587Z fence.proxy.async.shared::cta; 2026-02-21T12:44:51.7041645Z // end inline asm 2026-02-21T12:44:51.7041706Z bar.sync 0; 2026-02-21T12:44:51.7041787Z shfl.sync.idx.b32 %r2436, %r3, 0, 31, -1; 2026-02-21T12:44:51.7041861Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.7041931Z shl.b32 %r2437, %r2436, 8; 2026-02-21T12:44:51.7041991Z and.b32 %r2438, %r2437, 3072; 2026-02-21T12:44:51.7042056Z add.s32 %r2439, %r2438, %r3946; 2026-02-21T12:44:51.7042193Z bfe.u32 %r2440, %r2439, 4, 14; 2026-02-21T12:44:51.7042263Z cvt.u64.u32 %rd344, %r2440; 2026-02-21T12:44:51.7042343Z or.b64 %rd332, %rd344, -9223371899399045120; 2026-02-21T12:44:51.7042460Z // begin inline asm 2026-02-21T12:44:51.7042866Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2269,%r2270,%r2271,%r2272,%r2273,%r2274,%r2275,%r2276}, {%r2321,%r2322,%r2323,%r2324}, %rd332, %p72, 1, 1; 2026-02-21T12:44:51.7042928Z // end inline asm 2026-02-21T12:44:51.7042994Z add.s32 %r2442, %r2438, %r169; 2026-02-21T12:44:51.7043061Z bfe.u32 %r2443, %r2442, 4, 14; 2026-02-21T12:44:51.7043129Z cvt.u64.u32 %rd345, %r2443; 2026-02-21T12:44:51.7043210Z or.b64 %rd333, %rd345, -9223371899399045120; 2026-02-21T12:44:51.7043272Z // begin inline asm 2026-02-21T12:44:51.7043656Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2269,%r2270,%r2271,%r2272,%r2273,%r2274,%r2275,%r2276}, {%r2341,%r2342,%r2343,%r2344}, %rd333, %p72, 1, 1; 2026-02-21T12:44:51.7043715Z // end inline asm 2026-02-21T12:44:51.7043797Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.7043866Z mov.b32 %r2354, %r2418; 2026-02-21T12:44:51.7043930Z mov.b32 %r2355, %r2418; 2026-02-21T12:44:51.7044092Z mov.b32 %r2353, %r3946; 2026-02-21T12:44:51.7044163Z // begin inline asm 2026-02-21T12:44:51.7044348Z // 
wait for regs: %r2269,%r2270,%r2271,%r2272,%r2273,%r2274,%r2275,%r2276,%r2353,%r2354,%r2355 2026-02-21T12:44:51.7044427Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:51.7044485Z // end inline asm 2026-02-21T12:44:51.7044548Z $L__tmp28: 2026-02-21T12:44:51.7044760Z .loc 1 51 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:32 2026-02-21T12:44:51.7044827Z add.s64 %rd335, %rd342, 16352; 2026-02-21T12:44:51.7045038Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 2026-02-21T12:44:51.7045101Z // begin inline asm 2026-02-21T12:44:51.7045161Z mov.u64 %rd334, 0x0; 2026-02-21T12:44:51.7045296Z createpolicy.fractional.L2::evict_last.b64 %rd334, 1.0; 2026-02-21T12:44:51.7045356Z // end inline asm 2026-02-21T12:44:51.7045415Z // begin inline asm 2026-02-21T12:44:51.7045478Z mov.u32 %r2367, 0x0; 2026-02-21T12:44:51.7045648Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r2367 }, [ %rd335 + 0 ], %rd334; 2026-02-21T12:44:51.7045708Z // end inline asm 2026-02-21T12:44:51.7045918Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 2026-02-21T12:44:51.7045979Z bar.sync 0; 2026-02-21T12:44:51.7046048Z st.shared.b32 [%r9], %r2367; 2026-02-21T12:44:51.7046105Z bar.sync 0; 2026-02-21T12:44:51.7046177Z ld.shared.b16 %rs268, [%r10]; 2026-02-21T12:44:51.7046248Z ld.shared.b16 %rs269, [%r10+256]; 2026-02-21T12:44:51.7046318Z ld.shared.b16 %rs270, [%r10+16]; 2026-02-21T12:44:51.7046385Z ld.shared.b16 %rs271, [%r10+272]; 2026-02-21T12:44:51.7046587Z ld.shared.b16 %rs272, [%r11]; 2026-02-21T12:44:51.7046659Z ld.shared.b16 %rs273, [%r11+256]; 2026-02-21T12:44:51.7046730Z ld.shared.b16 %rs274, [%r11+16]; 2026-02-21T12:44:51.7046804Z ld.shared.b16 %rs275, [%r11+272]; 2026-02-21T12:44:51.7046873Z cvt.f32.bf16 %r2384, %rs268; 2026-02-21T12:44:51.7046940Z cvt.f32.bf16 %r2385, %rs269; 2026-02-21T12:44:51.7047002Z cvt.f32.bf16 %r2386, %rs272; 2026-02-21T12:44:51.7047073Z cvt.f32.bf16 %r2387, %rs273; 
2026-02-21T12:44:51.7047136Z cvt.f32.bf16 %r2404, %rs270; 2026-02-21T12:44:51.7047200Z cvt.f32.bf16 %r2405, %rs271; 2026-02-21T12:44:51.7047266Z cvt.f32.bf16 %r2406, %rs274; 2026-02-21T12:44:51.7047327Z cvt.f32.bf16 %r2407, %rs275; 2026-02-21T12:44:51.7047534Z .loc 1 57 34 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:34 2026-02-21T12:44:51.7047603Z add.s64 %rd337, %rd343, 5232640; 2026-02-21T12:44:51.7047820Z .loc 1 57 87 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.7047882Z // begin inline asm 2026-02-21T12:44:51.7047944Z mov.u16 %rs249, 0x0; 2026-02-21T12:44:51.7048118Z ld.global.b8 { %rs249 }, [ %rd337 + 0 ]; 2026-02-21T12:44:51.7048176Z // end inline asm 2026-02-21T12:44:51.7048382Z .loc 1 65 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 2026-02-21T12:44:51.7048504Z bar.sync 0; 2026-02-21T12:44:51.7048570Z st.shared.b8 [%r12], %rs249; 2026-02-21T12:44:51.7048626Z bar.sync 0; 2026-02-21T12:44:51.7048710Z ld.shared.v2.b8 {%rs276, %rs277}, [%r22]; 2026-02-21T12:44:51.7048918Z .loc 1 60 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.7048987Z shl.b16 %rs278, %rs276, 4; 2026-02-21T12:44:51.7049052Z shl.b16 %rs279, %rs277, 4; 2026-02-21T12:44:51.7049259Z .loc 1 75 58 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.7049332Z selp.b16 %rs280, %rs278, %rs276, %p134; 2026-02-21T12:44:51.7049397Z cvt.s16.s8 %rs281, %rs280; 2026-02-21T12:44:51.7049464Z shr.s16 %rs282, %rs281, 4; 2026-02-21T12:44:51.7049539Z selp.b16 %rs283, %rs279, %rs277, %p134; 2026-02-21T12:44:51.7049601Z cvt.s16.s8 %rs284, %rs283; 2026-02-21T12:44:51.7049799Z shr.s16 %rs285, %rs284, 4; 2026-02-21T12:44:51.7050011Z .loc 1 80 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.7050079Z cvt.rn.f32.s16 %r2444, %rs282; 2026-02-21T12:44:51.7050145Z cvt.rn.f32.s16 %r2445, %rs285; 2026-02-21T12:44:51.7050206Z bar.sync 0; 
2026-02-21T12:44:51.7050272Z st.shared.b32 [%r14], %r2444; 2026-02-21T12:44:51.7050337Z st.shared.b32 [%r15], %r2445; 2026-02-21T12:44:51.7050396Z $L__tmp29: 2026-02-21T12:44:51.7050673Z .loc 2 291 36 // standard.py:291:36 @[ cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 ] 2026-02-21T12:44:51.7050735Z // begin inline asm 2026-02-21T12:44:51.7050812Z fence.proxy.async.shared::cta; 2026-02-21T12:44:51.7050872Z // end inline asm 2026-02-21T12:44:51.7050929Z bar.sync 0; 2026-02-21T12:44:51.7051011Z shfl.sync.idx.b32 %r2446, %r3, 0, 31, -1; 2026-02-21T12:44:51.7051095Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.7051169Z shl.b32 %r2447, %r2446, 8; 2026-02-21T12:44:51.7051238Z and.b32 %r2448, %r2447, 3072; 2026-02-21T12:44:51.7051304Z add.s32 %r2449, %r2448, %r3946; 2026-02-21T12:44:51.7051373Z bfe.u32 %r2450, %r2449, 4, 14; 2026-02-21T12:44:51.7051439Z cvt.u64.u32 %rd346, %r2450; 2026-02-21T12:44:51.7051519Z or.b64 %rd338, %rd346, -9223371899399045120; 2026-02-21T12:44:51.7051583Z // begin inline asm 2026-02-21T12:44:51.7051959Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2269,%r2270,%r2271,%r2272,%r2273,%r2274,%r2275,%r2276}, {%r2384,%r2385,%r2386,%r2387}, %rd338, %p72, 1, 1; 2026-02-21T12:44:51.7052019Z // end inline asm 2026-02-21T12:44:51.7052085Z add.s32 %r2451, %r2448, %r169; 2026-02-21T12:44:51.7052147Z bfe.u32 %r2452, %r2451, 4, 14; 2026-02-21T12:44:51.7052210Z cvt.u64.u32 %rd347, %r2452; 2026-02-21T12:44:51.7052286Z or.b64 %rd339, %rd347, -9223371899399045120; 2026-02-21T12:44:51.7052353Z // begin inline asm 2026-02-21T12:44:51.7052726Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2269,%r2270,%r2271,%r2272,%r2273,%r2274,%r2275,%r2276}, {%r2404,%r2405,%r2406,%r2407}, %rd339, %p72, 1, 1; 2026-02-21T12:44:51.7052786Z // end inline asm 2026-02-21T12:44:51.7052869Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.7052932Z mov.b32 %r2416, %r3946; 2026-02-21T12:44:51.7052992Z mov.b32 %r2417, %r2418; 
2026-02-21T12:44:51.7053055Z // begin inline asm 2026-02-21T12:44:51.7053232Z // wait for regs: %r2269,%r2270,%r2271,%r2272,%r2273,%r2274,%r2275,%r2276,%r2416,%r2417,%r2418 2026-02-21T12:44:51.7053317Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:51.7053376Z // end inline asm 2026-02-21T12:44:51.7053436Z $L__tmp30: 2026-02-21T12:44:51.7053646Z .loc 1 90 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:90:28 2026-02-21T12:44:51.7053723Z cvt.rn.bf16x2.f32 %r2453, %r2270, %r2269; 2026-02-21T12:44:51.7053860Z cvt.rn.bf16x2.f32 %r2454, %r2272, %r2271; 2026-02-21T12:44:51.7053932Z cvt.rn.bf16x2.f32 %r2455, %r2274, %r2273; 2026-02-21T12:44:51.7054051Z cvt.rn.bf16x2.f32 %r2456, %r2276, %r2275; 2026-02-21T12:44:51.7054260Z .loc 1 91 22 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:91:22 2026-02-21T12:44:51.7054331Z mad.lo.s64 %rd348, %rd70, 2560, %rd123; 2026-02-21T12:44:51.7054394Z shl.b64 %rd349, %rd71, 1; 2026-02-21T12:44:51.7054459Z add.s64 %rd350, %rd348, %rd349; 2026-02-21T12:44:51.7054527Z add.s64 %rd340, %rd350, %rd504; 2026-02-21T12:44:51.7054729Z .loc 1 91 81 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:91:81 2026-02-21T12:44:51.7054786Z bar.sync 0; 2026-02-21T12:44:51.7054978Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r20], {%r2453, %r2454, %r2455, %r2456}; 2026-02-21T12:44:51.7055034Z bar.sync 0; 2026-02-21T12:44:51.7055146Z ld.shared.v4.b32 {%r2430, %r2431, %r2432, %r2433}, [%r21]; 2026-02-21T12:44:51.7055209Z // begin inline asm 2026-02-21T12:44:51.7055340Z st.global.v4.b32 [ %rd340 + 0 ], { %r2430, %r2431, %r2432, %r2433 }; 2026-02-21T12:44:51.7055450Z // end inline asm 2026-02-21T12:44:51.7055711Z .loc 1 22 120 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:22:120 2026-02-21T12:44:51.7055792Z add.s64 %rd352, %rd505, 3; 2026-02-21T12:44:51.7056000Z .loc 1 28 35 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:28:35 2026-02-21T12:44:51.7056089Z mul.hi.u64 %rd353, %rd352, -3689348814741910323; 
2026-02-21T12:44:51.7056155Z shr.u64 %rd354, %rd353, 7; 2026-02-21T12:44:51.7056355Z .loc 1 29 33 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:29:33 2026-02-21T12:44:51.7056418Z shl.b64 %rd82, %rd354, 3; 2026-02-21T12:44:51.7056779Z .loc 1 30 39 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:30:39 2026-02-21T12:44:51.7056845Z sub.s64 %rd355, 4096, %rd82; 2026-02-21T12:44:51.7057048Z .loc 1 30 52 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:30:52 2026-02-21T12:44:51.7057116Z min.s64 %rd83, %rd355, 8; 2026-02-21T12:44:51.7057317Z .loc 1 31 45 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:31:45 2026-02-21T12:44:51.7057383Z mul.lo.s64 %rd356, %rd354, 160; 2026-02-21T12:44:51.7057448Z sub.s64 %rd84, %rd352, %rd356; 2026-02-21T12:44:51.7057652Z .loc 1 32 51 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:32:51 2026-02-21T12:44:51.7057716Z or.b64 %rd357, %rd84, %rd83; 2026-02-21T12:44:51.7057785Z and.b64 %rd358, %rd357, -4294967296; 2026-02-21T12:44:51.7057858Z setp.ne.b64 %p80, %rd358, 0; 2026-02-21T12:44:51.7057919Z @%p80 bra $L__BB0_23; 2026-02-21T12:44:51.7057978Z bra.uni $L__BB0_22; 2026-02-21T12:44:51.7058093Z $L__BB0_23: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:51.7058165Z div.s64 %rd515, %rd84, %rd83; 2026-02-21T12:44:51.7058231Z bra.uni $L__BB0_24; 2026-02-21T12:44:51.7058342Z $L__BB0_22: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:51.7058413Z cvt.u32.u64 %r2457, %rd83; 2026-02-21T12:44:51.7058479Z cvt.u32.u64 %r2458, %rd84; 2026-02-21T12:44:51.7058542Z div.u32 %r2459, %r2458, %r2457; 2026-02-21T12:44:51.7058611Z cvt.u64.u32 %rd515, %r2459; 2026-02-21T12:44:51.7058718Z $L__BB0_24: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:51.7058926Z .loc 1 31 64 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:31:64 2026-02-21T12:44:51.7058992Z mul.lo.s64 %rd360, %rd515, %rd83; 2026-02-21T12:44:51.7059062Z sub.s64 %rd361, %rd84, %rd360; 2026-02-21T12:44:51.7059262Z .loc 1 31 30 // 
cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:31:30 2026-02-21T12:44:51.7059327Z add.s64 %rd362, %rd361, %rd82; 2026-02-21T12:44:51.7059528Z .loc 1 33 27 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:33:27 2026-02-21T12:44:51.7059683Z shl.b64 %rd363, %rd362, 6; 2026-02-21T12:44:51.7059959Z .loc 1 34 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:34:32 2026-02-21T12:44:51.7060027Z or.b64 %rd88, %rd363, %rd4; 2026-02-21T12:44:51.7060229Z .loc 1 35 27 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:35:27 2026-02-21T12:44:51.7060291Z shl.b64 %rd89, %rd515, 6; 2026-02-21T12:44:51.7060367Z shl.b64 %rd364, %rd88, 14; 2026-02-21T12:44:51.7060438Z add.s64 %rd90, %rd121, %rd364; 2026-02-21T12:44:51.7060505Z add.s64 %rd91, %rd501, %rd89; 2026-02-21T12:44:51.7060719Z .loc 1 43 126 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:43:126 2026-02-21T12:44:51.7060787Z shl.b64 %rd365, %rd362, 20; 2026-02-21T12:44:51.7060853Z add.s64 %rd520, %rd18, %rd365; 2026-02-21T12:44:51.7060915Z add.s64 %rd519, %rd19, %rd89; 2026-02-21T12:44:51.7060985Z mov.b32 %r3020, 0f00000000; 2026-02-21T12:44:51.7061047Z mov.b64 %rd521, -24; 2026-02-21T12:44:51.7061109Z mov.b32 %r3021, %r3020; 2026-02-21T12:44:51.7061307Z mov.b32 %r3022, %r3020; 2026-02-21T12:44:51.7061376Z mov.b32 %r3023, %r3020; 2026-02-21T12:44:51.7061438Z mov.b32 %r3024, %r3020; 2026-02-21T12:44:51.7061498Z mov.b32 %r3025, %r3020; 2026-02-21T12:44:51.7061562Z mov.b32 %r3026, %r3020; 2026-02-21T12:44:51.7061622Z mov.b32 %r3027, %r3020; 2026-02-21T12:44:51.7061735Z $L__BB0_25: // Parent Loop BB0_2 Depth=1 2026-02-21T12:44:51.7061843Z // => This Inner Loop Header: Depth=2 2026-02-21T12:44:51.7062052Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 2026-02-21T12:44:51.7062117Z add.s64 %rd367, %rd520, -64; 2026-02-21T12:44:51.7062178Z // begin inline asm 2026-02-21T12:44:51.7062247Z mov.u64 %rd366, 0x0; 2026-02-21T12:44:51.7062389Z 
createpolicy.fractional.L2::evict_last.b64 %rd366, 1.0; 2026-02-21T12:44:51.7062450Z // end inline asm 2026-02-21T12:44:51.7062517Z // begin inline asm 2026-02-21T12:44:51.7062582Z mov.u32 %r2461, 0x0; 2026-02-21T12:44:51.7062747Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r2461 }, [ %rd367 + 0 ], %rd366; 2026-02-21T12:44:51.7062806Z // end inline asm 2026-02-21T12:44:51.7063014Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 2026-02-21T12:44:51.7063072Z bar.sync 0; 2026-02-21T12:44:51.7063139Z st.shared.b32 [%r9], %r2461; 2026-02-21T12:44:51.7063199Z bar.sync 0; 2026-02-21T12:44:51.7063266Z ld.shared.b16 %rs289, [%r10]; 2026-02-21T12:44:51.7063334Z ld.shared.b16 %rs290, [%r10+256]; 2026-02-21T12:44:51.7063408Z ld.shared.b16 %rs291, [%r10+16]; 2026-02-21T12:44:51.7063475Z ld.shared.b16 %rs292, [%r10+272]; 2026-02-21T12:44:51.7063540Z ld.shared.b16 %rs293, [%r11]; 2026-02-21T12:44:51.7063605Z ld.shared.b16 %rs294, [%r11+256]; 2026-02-21T12:44:51.7063678Z ld.shared.b16 %rs295, [%r11+16]; 2026-02-21T12:44:51.7063744Z ld.shared.b16 %rs296, [%r11+272]; 2026-02-21T12:44:51.7063812Z cvt.f32.bf16 %r2558, %rs289; 2026-02-21T12:44:51.7063886Z cvt.f32.bf16 %r2559, %rs290; 2026-02-21T12:44:51.7063950Z cvt.f32.bf16 %r2560, %rs293; 2026-02-21T12:44:51.7064012Z cvt.f32.bf16 %r2561, %rs294; 2026-02-21T12:44:51.7064074Z cvt.f32.bf16 %r2578, %rs291; 2026-02-21T12:44:51.7064142Z cvt.f32.bf16 %r2579, %rs292; 2026-02-21T12:44:51.7064205Z cvt.f32.bf16 %r2580, %rs295; 2026-02-21T12:44:51.7064267Z cvt.f32.bf16 %r2581, %rs296; 2026-02-21T12:44:51.7064475Z .loc 1 57 87 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.7064537Z // begin inline asm 2026-02-21T12:44:51.7064598Z mov.u16 %rs286, 0x0; 2026-02-21T12:44:51.7064679Z ld.global.b8 { %rs286 }, [ %rd519 + 0 ]; 2026-02-21T12:44:51.7064737Z // end inline asm 2026-02-21T12:44:51.7064939Z .loc 1 65 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 
2026-02-21T12:44:51.7065079Z bar.sync 0; 2026-02-21T12:44:51.7065152Z st.shared.b8 [%r12], %rs286; 2026-02-21T12:44:51.7065258Z bar.sync 0; 2026-02-21T12:44:51.7065340Z ld.shared.v2.b8 {%rs297, %rs298}, [%r22]; 2026-02-21T12:44:51.7065547Z .loc 1 60 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.7065610Z shl.b16 %rs299, %rs297, 4; 2026-02-21T12:44:51.7065676Z shl.b16 %rs300, %rs298, 4; 2026-02-21T12:44:51.7065878Z .loc 1 75 58 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.7065956Z selp.b16 %rs301, %rs299, %rs297, %p134; 2026-02-21T12:44:51.7066020Z cvt.s16.s8 %rs302, %rs301; 2026-02-21T12:44:51.7066081Z shr.s16 %rs303, %rs302, 4; 2026-02-21T12:44:51.7066156Z selp.b16 %rs304, %rs300, %rs298, %p134; 2026-02-21T12:44:51.7066219Z cvt.s16.s8 %rs305, %rs304; 2026-02-21T12:44:51.7066282Z shr.s16 %rs306, %rs305, 4; 2026-02-21T12:44:51.7066670Z .loc 1 80 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.7066880Z cvt.rn.f32.s16 %r3042, %rs303; 2026-02-21T12:44:51.7066952Z cvt.rn.f32.s16 %r3043, %rs306; 2026-02-21T12:44:51.7067010Z bar.sync 0; 2026-02-21T12:44:51.7067080Z st.shared.b32 [%r14], %r3042; 2026-02-21T12:44:51.7067146Z st.shared.b32 [%r15], %r3043; 2026-02-21T12:44:51.7067285Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r3020}; 2026-02-21T12:44:51.7067347Z bar.sync 0; 2026-02-21T12:44:51.7067408Z // begin inline asm 2026-02-21T12:44:51.7067596Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2522, %r2562, %r2602, %r2642}, [%r964]; 2026-02-21T12:44:51.7067660Z // end inline asm 2026-02-21T12:44:51.7067717Z bar.sync 0; 2026-02-21T12:44:51.7067849Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r3022}; 2026-02-21T12:44:51.7067905Z bar.sync 0; 2026-02-21T12:44:51.7067974Z // begin inline asm 2026-02-21T12:44:51.7068169Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2524, %r2564, %r2604, %r2644}, [%r964]; 2026-02-21T12:44:51.7068227Z // end inline 
asm 2026-02-21T12:44:51.7068292Z bar.sync 0; 2026-02-21T12:44:51.7068424Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r3021}; 2026-02-21T12:44:51.7068480Z bar.sync 0; 2026-02-21T12:44:51.7068612Z // begin inline asm 2026-02-21T12:44:51.7068798Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2523, %r2563, %r2603, %r2643}, [%r964]; 2026-02-21T12:44:51.7068857Z // end inline asm 2026-02-21T12:44:51.7068914Z bar.sync 0; 2026-02-21T12:44:51.7069049Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r3023}; 2026-02-21T12:44:51.7069104Z bar.sync 0; 2026-02-21T12:44:51.7069165Z // begin inline asm 2026-02-21T12:44:51.7069342Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2525, %r2565, %r2605, %r2645}, [%r964]; 2026-02-21T12:44:51.7069406Z // end inline asm 2026-02-21T12:44:51.7069465Z bar.sync 0; 2026-02-21T12:44:51.7069592Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r3024}; 2026-02-21T12:44:51.7069657Z bar.sync 0; 2026-02-21T12:44:51.7069717Z // begin inline asm 2026-02-21T12:44:51.7069898Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2526, %r2566, %r2606, %r2646}, [%r964]; 2026-02-21T12:44:51.7069969Z // end inline asm 2026-02-21T12:44:51.7070024Z bar.sync 0; 2026-02-21T12:44:51.7070151Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r3026}; 2026-02-21T12:44:51.7070207Z bar.sync 0; 2026-02-21T12:44:51.7070273Z // begin inline asm 2026-02-21T12:44:51.7070447Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2528, %r2568, %r2608, %r2648}, [%r964]; 2026-02-21T12:44:51.7070505Z // end inline asm 2026-02-21T12:44:51.7070564Z bar.sync 0; 2026-02-21T12:44:51.7070689Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r3025}; 2026-02-21T12:44:51.7070745Z bar.sync 0; 2026-02-21T12:44:51.7070803Z // begin inline asm 2026-02-21T12:44:51.7070984Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2527, %r2567, %r2607, %r2647}, [%r964]; 2026-02-21T12:44:51.7071123Z // end inline asm 2026-02-21T12:44:51.7071179Z bar.sync 0; 2026-02-21T12:44:51.7071310Z 
stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r712], {%r3027}; 2026-02-21T12:44:51.7071427Z bar.sync 0; 2026-02-21T12:44:51.7071500Z // begin inline asm 2026-02-21T12:44:51.7071684Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r2529, %r2569, %r2609, %r2649}, [%r964]; 2026-02-21T12:44:51.7071746Z // end inline asm 2026-02-21T12:44:51.7071802Z $L__tmp31: 2026-02-21T12:44:51.7072082Z .loc 2 291 36 // standard.py:291:36 @[ cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 ] 2026-02-21T12:44:51.7072146Z // begin inline asm 2026-02-21T12:44:51.7072224Z fence.proxy.async.shared::cta; 2026-02-21T12:44:51.7072282Z // end inline asm 2026-02-21T12:44:51.7072367Z shfl.sync.idx.b32 %r3044, %r3, 0, 31, -1; 2026-02-21T12:44:51.7072441Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.7072505Z mov.pred %p98, -1; 2026-02-21T12:44:51.7072566Z // begin inline asm 2026-02-21T12:44:51.7072952Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2522,%r2523,%r2524,%r2525,%r2526,%r2527,%r2528,%r2529}, {%r2558,%r2559,%r2560,%r2561}, %rd7, %p98, 1, 1; 2026-02-21T12:44:51.7073124Z // end inline asm 2026-02-21T12:44:51.7073187Z // begin inline asm 2026-02-21T12:44:51.7073562Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2522,%r2523,%r2524,%r2525,%r2526,%r2527,%r2528,%r2529}, {%r2578,%r2579,%r2580,%r2581}, %rd8, %p98, 1, 1; 2026-02-21T12:44:51.7073624Z // end inline asm 2026-02-21T12:44:51.7073685Z // begin inline asm 2026-02-21T12:44:51.7074054Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2562,%r2563,%r2564,%r2565,%r2566,%r2567,%r2568,%r2569}, {%r2558,%r2559,%r2560,%r2561}, %rd9, %p98, 1, 1; 2026-02-21T12:44:51.7074112Z // end inline asm 2026-02-21T12:44:51.7074171Z // begin inline asm 2026-02-21T12:44:51.7074543Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2562,%r2563,%r2564,%r2565,%r2566,%r2567,%r2568,%r2569}, {%r2578,%r2579,%r2580,%r2581}, %rd10, %p98, 1, 1; 2026-02-21T12:44:51.7074603Z // end inline asm 2026-02-21T12:44:51.7074663Z // begin inline 
asm 2026-02-21T12:44:51.7075030Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2602,%r2603,%r2604,%r2605,%r2606,%r2607,%r2608,%r2609}, {%r2558,%r2559,%r2560,%r2561}, %rd11, %p98, 1, 1; 2026-02-21T12:44:51.7075098Z // end inline asm 2026-02-21T12:44:51.7075158Z // begin inline asm 2026-02-21T12:44:51.7075523Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2602,%r2603,%r2604,%r2605,%r2606,%r2607,%r2608,%r2609}, {%r2578,%r2579,%r2580,%r2581}, %rd12, %p98, 1, 1; 2026-02-21T12:44:51.7075588Z // end inline asm 2026-02-21T12:44:51.7075651Z // begin inline asm 2026-02-21T12:44:51.7076014Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2642,%r2643,%r2644,%r2645,%r2646,%r2647,%r2648,%r2649}, {%r2558,%r2559,%r2560,%r2561}, %rd13, %p98, 1, 1; 2026-02-21T12:44:51.7076078Z // end inline asm 2026-02-21T12:44:51.7076138Z // begin inline asm 2026-02-21T12:44:51.7076639Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2642,%r2643,%r2644,%r2645,%r2646,%r2647,%r2648,%r2649}, {%r2578,%r2579,%r2580,%r2581}, %rd14, %p98, 1, 1; 2026-02-21T12:44:51.7076713Z // end inline asm 2026-02-21T12:44:51.7076793Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.7076851Z mov.b32 %r3169, 0; 2026-02-21T12:44:51.7076913Z mov.b32 %r2694, %r3946; 2026-02-21T12:44:51.7076980Z mov.b32 %r2695, %r3169; 2026-02-21T12:44:51.7077041Z mov.b32 %r2696, %r3169; 2026-02-21T12:44:51.7077100Z // begin inline asm 2026-02-21T12:44:51.7077683Z // wait for regs: %r2522,%r2523,%r2524,%r2525,%r2526,%r2527,%r2528,%r2529,%r2562,%r2563,%r2564,%r2565,%r2566,%r2567,%r2568,%r2569,%r2602,%r2603,%r2604,%r2605,%r2606,%r2607,%r2608,%r2609,%r2642,%r2643,%r2644,%r2645,%r2646,%r2647,%r2648,%r2649,%r2694,%r2695,%r2696 2026-02-21T12:44:51.7077763Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:51.7077823Z // end inline asm 2026-02-21T12:44:51.7077881Z $L__tmp32: 2026-02-21T12:44:51.7078097Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 2026-02-21T12:44:51.7078246Z 
add.s64 %rd379, %rd520, -32; 2026-02-21T12:44:51.7078364Z // begin inline asm 2026-02-21T12:44:51.7078433Z mov.u64 %rd378, 0x0; 2026-02-21T12:44:51.7078561Z createpolicy.fractional.L2::evict_last.b64 %rd378, 1.0; 2026-02-21T12:44:51.7078620Z // end inline asm 2026-02-21T12:44:51.7078685Z // begin inline asm 2026-02-21T12:44:51.7078744Z mov.u32 %r2732, 0x0; 2026-02-21T12:44:51.7078908Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r2732 }, [ %rd379 + 0 ], %rd378; 2026-02-21T12:44:51.7078975Z // end inline asm 2026-02-21T12:44:51.7079182Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 2026-02-21T12:44:51.7079238Z bar.sync 0; 2026-02-21T12:44:51.7079306Z st.shared.b32 [%r9], %r2732; 2026-02-21T12:44:51.7079369Z bar.sync 0; 2026-02-21T12:44:51.7079434Z ld.shared.b16 %rs307, [%r10]; 2026-02-21T12:44:51.7079503Z ld.shared.b16 %rs308, [%r10+256]; 2026-02-21T12:44:51.7079577Z ld.shared.b16 %rs309, [%r10+16]; 2026-02-21T12:44:51.7079645Z ld.shared.b16 %rs310, [%r10+272]; 2026-02-21T12:44:51.7079780Z ld.shared.b16 %rs311, [%r11]; 2026-02-21T12:44:51.7079907Z ld.shared.b16 %rs312, [%r11+256]; 2026-02-21T12:44:51.7079981Z ld.shared.b16 %rs313, [%r11+16]; 2026-02-21T12:44:51.7080047Z ld.shared.b16 %rs314, [%r11+272]; 2026-02-21T12:44:51.7080114Z cvt.f32.bf16 %r2789, %rs307; 2026-02-21T12:44:51.7080186Z cvt.f32.bf16 %r2790, %rs308; 2026-02-21T12:44:51.7080251Z cvt.f32.bf16 %r2791, %rs311; 2026-02-21T12:44:51.7080315Z cvt.f32.bf16 %r2792, %rs312; 2026-02-21T12:44:51.7080377Z cvt.f32.bf16 %r2809, %rs309; 2026-02-21T12:44:51.7080444Z cvt.f32.bf16 %r2810, %rs310; 2026-02-21T12:44:51.7080510Z cvt.f32.bf16 %r2811, %rs313; 2026-02-21T12:44:51.7080586Z cvt.f32.bf16 %r2812, %rs314; 2026-02-21T12:44:51.7080802Z .loc 1 57 87 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.7080870Z add.s64 %rd381, %rd519, 10240; 2026-02-21T12:44:51.7080931Z // begin inline asm 2026-02-21T12:44:51.7080996Z mov.u16 %rs287, 0x0; 
2026-02-21T12:44:51.7081079Z ld.global.b8 { %rs287 }, [ %rd381 + 0 ]; 2026-02-21T12:44:51.7081139Z // end inline asm 2026-02-21T12:44:51.7081340Z .loc 1 65 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 2026-02-21T12:44:51.7081403Z bar.sync 0; 2026-02-21T12:44:51.7081469Z st.shared.b8 [%r12], %rs287; 2026-02-21T12:44:51.7081524Z bar.sync 0; 2026-02-21T12:44:51.7081608Z ld.shared.v2.b8 {%rs315, %rs316}, [%r22]; 2026-02-21T12:44:51.7081810Z .loc 1 60 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.7081875Z shl.b16 %rs317, %rs315, 4; 2026-02-21T12:44:51.7081944Z shl.b16 %rs318, %rs316, 4; 2026-02-21T12:44:51.7082143Z .loc 1 75 58 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.7082218Z selp.b16 %rs319, %rs317, %rs315, %p134; 2026-02-21T12:44:51.7082280Z cvt.s16.s8 %rs320, %rs319; 2026-02-21T12:44:51.7082348Z shr.s16 %rs321, %rs320, 4; 2026-02-21T12:44:51.7082422Z selp.b16 %rs322, %rs318, %rs316, %p134; 2026-02-21T12:44:51.7082485Z cvt.s16.s8 %rs323, %rs322; 2026-02-21T12:44:51.7082551Z shr.s16 %rs324, %rs323, 4; 2026-02-21T12:44:51.7082752Z .loc 1 80 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.7082818Z cvt.rn.f32.s16 %r3045, %rs321; 2026-02-21T12:44:51.7082892Z cvt.rn.f32.s16 %r3046, %rs324; 2026-02-21T12:44:51.7082949Z bar.sync 0; 2026-02-21T12:44:51.7083012Z st.shared.b32 [%r14], %r3045; 2026-02-21T12:44:51.7083076Z st.shared.b32 [%r15], %r3046; 2026-02-21T12:44:51.7083136Z $L__tmp33: 2026-02-21T12:44:51.7083410Z .loc 2 291 36 // standard.py:291:36 @[ cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 ] 2026-02-21T12:44:51.7083471Z // begin inline asm 2026-02-21T12:44:51.7083615Z fence.proxy.async.shared::cta; 2026-02-21T12:44:51.7083672Z // end inline asm 2026-02-21T12:44:51.7083728Z bar.sync 0; 2026-02-21T12:44:51.7083861Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.7083928Z // begin inline asm 
2026-02-21T12:44:51.7084303Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2522,%r2523,%r2524,%r2525,%r2526,%r2527,%r2528,%r2529}, {%r2789,%r2790,%r2791,%r2792}, %rd7, %p98, 1, 1; 2026-02-21T12:44:51.7084361Z // end inline asm 2026-02-21T12:44:51.7084426Z // begin inline asm 2026-02-21T12:44:51.7084800Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2522,%r2523,%r2524,%r2525,%r2526,%r2527,%r2528,%r2529}, {%r2809,%r2810,%r2811,%r2812}, %rd8, %p98, 1, 1; 2026-02-21T12:44:51.7084858Z // end inline asm 2026-02-21T12:44:51.7084921Z // begin inline asm 2026-02-21T12:44:51.7085284Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2562,%r2563,%r2564,%r2565,%r2566,%r2567,%r2568,%r2569}, {%r2789,%r2790,%r2791,%r2792}, %rd9, %p98, 1, 1; 2026-02-21T12:44:51.7085344Z // end inline asm 2026-02-21T12:44:51.7085403Z // begin inline asm 2026-02-21T12:44:51.7085866Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2562,%r2563,%r2564,%r2565,%r2566,%r2567,%r2568,%r2569}, {%r2809,%r2810,%r2811,%r2812}, %rd10, %p98, 1, 1; 2026-02-21T12:44:51.7085930Z // end inline asm 2026-02-21T12:44:51.7085990Z // begin inline asm 2026-02-21T12:44:51.7086358Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2602,%r2603,%r2604,%r2605,%r2606,%r2607,%r2608,%r2609}, {%r2789,%r2790,%r2791,%r2792}, %rd11, %p98, 1, 1; 2026-02-21T12:44:51.7086417Z // end inline asm 2026-02-21T12:44:51.7086612Z // begin inline asm 2026-02-21T12:44:51.7086991Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2602,%r2603,%r2604,%r2605,%r2606,%r2607,%r2608,%r2609}, {%r2809,%r2810,%r2811,%r2812}, %rd12, %p98, 1, 1; 2026-02-21T12:44:51.7087050Z // end inline asm 2026-02-21T12:44:51.7087110Z // begin inline asm 2026-02-21T12:44:51.7087492Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2642,%r2643,%r2644,%r2645,%r2646,%r2647,%r2648,%r2649}, {%r2789,%r2790,%r2791,%r2792}, %rd13, %p98, 1, 1; 2026-02-21T12:44:51.7087554Z // end inline asm 2026-02-21T12:44:51.7087619Z // begin inline asm 
2026-02-21T12:44:51.7087989Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r2642,%r2643,%r2644,%r2645,%r2646,%r2647,%r2648,%r2649}, {%r2809,%r2810,%r2811,%r2812}, %rd14, %p98, 1, 1; 2026-02-21T12:44:51.7088047Z // end inline asm 2026-02-21T12:44:51.7088124Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.7088189Z mov.b32 %r2925, %r3946; 2026-02-21T12:44:51.7088255Z mov.b32 %r2926, %r3169; 2026-02-21T12:44:51.7088315Z mov.b32 %r2927, %r3169; 2026-02-21T12:44:51.7088375Z // begin inline asm 2026-02-21T12:44:51.7088950Z // wait for regs: %r2522,%r2523,%r2524,%r2525,%r2526,%r2527,%r2528,%r2529,%r2562,%r2563,%r2564,%r2565,%r2566,%r2567,%r2568,%r2569,%r2602,%r2603,%r2604,%r2605,%r2606,%r2607,%r2608,%r2609,%r2642,%r2643,%r2644,%r2645,%r2646,%r2647,%r2648,%r2649,%r2925,%r2926,%r2927 2026-02-21T12:44:51.7089031Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:51.7089092Z // end inline asm 2026-02-21T12:44:51.7089153Z $L__tmp34: 2026-02-21T12:44:51.7089370Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 2026-02-21T12:44:51.7089431Z // begin inline asm 2026-02-21T12:44:51.7089492Z mov.u64 %rd390, 0x0; 2026-02-21T12:44:51.7089626Z createpolicy.fractional.L2::evict_last.b64 %rd390, 1.0; 2026-02-21T12:44:51.7089684Z // end inline asm 2026-02-21T12:44:51.7089745Z // begin inline asm 2026-02-21T12:44:51.7089813Z mov.u32 %r2963, 0x0; 2026-02-21T12:44:51.7089974Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r2963 }, [ %rd520 + 0 ], %rd390; 2026-02-21T12:44:51.7090033Z // end inline asm 2026-02-21T12:44:51.7090244Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 2026-02-21T12:44:51.7090302Z bar.sync 0; 2026-02-21T12:44:51.7090370Z st.shared.b32 [%r9], %r2963; 2026-02-21T12:44:51.7090511Z bar.sync 0; 2026-02-21T12:44:51.7090586Z ld.shared.b16 %rs325, [%r10]; 2026-02-21T12:44:51.7090655Z ld.shared.b16 %rs326, [%r10+256]; 2026-02-21T12:44:51.7090786Z ld.shared.b16 %rs327, [%r10+16]; 
2026-02-21T12:44:51.7090857Z ld.shared.b16 %rs328, [%r10+272]; 2026-02-21T12:44:51.7090923Z ld.shared.b16 %rs329, [%r11]; 2026-02-21T12:44:51.7090990Z ld.shared.b16 %rs330, [%r11+256]; 2026-02-21T12:44:51.7091056Z ld.shared.b16 %rs331, [%r11+16]; 2026-02-21T12:44:51.7091128Z ld.shared.b16 %rs332, [%r11+272]; 2026-02-21T12:44:51.7091193Z cvt.f32.bf16 %r2996, %rs325; 2026-02-21T12:44:51.7091270Z cvt.f32.bf16 %r2997, %rs326; 2026-02-21T12:44:51.7091341Z cvt.f32.bf16 %r2998, %rs329; 2026-02-21T12:44:51.7091404Z cvt.f32.bf16 %r2999, %rs330; 2026-02-21T12:44:51.7091466Z cvt.f32.bf16 %r3016, %rs327; 2026-02-21T12:44:51.7091527Z cvt.f32.bf16 %r3017, %rs328; 2026-02-21T12:44:51.7091595Z cvt.f32.bf16 %r3018, %rs331; 2026-02-21T12:44:51.7091657Z cvt.f32.bf16 %r3019, %rs332; 2026-02-21T12:44:51.7091868Z .loc 1 57 87 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.7092007Z add.s64 %rd393, %rd519, 20480; 2026-02-21T12:44:51.7092129Z // begin inline asm 2026-02-21T12:44:51.7092194Z mov.u16 %rs288, 0x0; 2026-02-21T12:44:51.7092275Z ld.global.b8 { %rs288 }, [ %rd393 + 0 ]; 2026-02-21T12:44:51.7092334Z // end inline asm 2026-02-21T12:44:51.7092535Z .loc 1 65 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 2026-02-21T12:44:51.7092592Z bar.sync 0; 2026-02-21T12:44:51.7092660Z st.shared.b8 [%r12], %rs288; 2026-02-21T12:44:51.7092719Z bar.sync 0; 2026-02-21T12:44:51.7092798Z ld.shared.v2.b8 {%rs333, %rs334}, [%r22]; 2026-02-21T12:44:51.7093004Z .loc 1 60 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.7093071Z shl.b16 %rs335, %rs333, 4; 2026-02-21T12:44:51.7093135Z shl.b16 %rs336, %rs334, 4; 2026-02-21T12:44:51.7093342Z .loc 1 75 58 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.7093417Z selp.b16 %rs337, %rs335, %rs333, %p134; 2026-02-21T12:44:51.7093484Z cvt.s16.s8 %rs338, %rs337; 2026-02-21T12:44:51.7093546Z shr.s16 %rs339, %rs338, 4; 
2026-02-21T12:44:51.7093623Z selp.b16 %rs340, %rs336, %rs334, %p134; 2026-02-21T12:44:51.7093685Z cvt.s16.s8 %rs341, %rs340; 2026-02-21T12:44:51.7093748Z shr.s16 %rs342, %rs341, 4; 2026-02-21T12:44:51.7093953Z .loc 1 80 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.7094019Z cvt.rn.f32.s16 %r3047, %rs339; 2026-02-21T12:44:51.7094086Z cvt.rn.f32.s16 %r3048, %rs342; 2026-02-21T12:44:51.7094155Z bar.sync 0; 2026-02-21T12:44:51.7094228Z st.shared.b32 [%r14], %r3047; 2026-02-21T12:44:51.7094294Z st.shared.b32 [%r15], %r3048; 2026-02-21T12:44:51.7094350Z $L__tmp35: 2026-02-21T12:44:51.7094632Z .loc 2 291 36 // standard.py:291:36 @[ cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 ] 2026-02-21T12:44:51.7094830Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r964], {%r2522, %r2562, %r2602, %r2642}; 2026-02-21T12:44:51.7094888Z bar.sync 0; 2026-02-21T12:44:51.7094953Z // begin inline asm 2026-02-21T12:44:51.7095091Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3020}, [%r712]; 2026-02-21T12:44:51.7095151Z // end inline asm 2026-02-21T12:44:51.7095207Z bar.sync 0; 2026-02-21T12:44:51.7095396Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r964], {%r2524, %r2564, %r2604, %r2644}; 2026-02-21T12:44:51.7095452Z bar.sync 0; 2026-02-21T12:44:51.7095512Z // begin inline asm 2026-02-21T12:44:51.7095646Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3022}, [%r712]; 2026-02-21T12:44:51.7095702Z // end inline asm 2026-02-21T12:44:51.7095757Z bar.sync 0; 2026-02-21T12:44:51.7095937Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r964], {%r2523, %r2563, %r2603, %r2643}; 2026-02-21T12:44:51.7096011Z bar.sync 0; 2026-02-21T12:44:51.7096132Z // begin inline asm 2026-02-21T12:44:51.7096263Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3021}, [%r712]; 2026-02-21T12:44:51.7096392Z // end inline asm 2026-02-21T12:44:51.7096568Z bar.sync 0; 2026-02-21T12:44:51.7096752Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r964], {%r2525, %r2565, %r2605, %r2645}; 
2026-02-21T12:44:51.7096813Z bar.sync 0; 2026-02-21T12:44:51.7096872Z // begin inline asm 2026-02-21T12:44:51.7096998Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3023}, [%r712]; 2026-02-21T12:44:51.7097056Z // end inline asm 2026-02-21T12:44:51.7097117Z bar.sync 0; 2026-02-21T12:44:51.7097307Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r964], {%r2526, %r2566, %r2606, %r2646}; 2026-02-21T12:44:51.7097366Z bar.sync 0; 2026-02-21T12:44:51.7097435Z // begin inline asm 2026-02-21T12:44:51.7097565Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3024}, [%r712]; 2026-02-21T12:44:51.7097623Z // end inline asm 2026-02-21T12:44:51.7097679Z bar.sync 0; 2026-02-21T12:44:51.7097862Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r964], {%r2528, %r2568, %r2608, %r2648}; 2026-02-21T12:44:51.7097920Z bar.sync 0; 2026-02-21T12:44:51.7098057Z // begin inline asm 2026-02-21T12:44:51.7098253Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3026}, [%r712]; 2026-02-21T12:44:51.7098314Z // end inline asm 2026-02-21T12:44:51.7098373Z bar.sync 0; 2026-02-21T12:44:51.7098554Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r964], {%r2527, %r2567, %r2607, %r2647}; 2026-02-21T12:44:51.7098611Z bar.sync 0; 2026-02-21T12:44:51.7098672Z // begin inline asm 2026-02-21T12:44:51.7098799Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3025}, [%r712]; 2026-02-21T12:44:51.7098863Z // end inline asm 2026-02-21T12:44:51.7098920Z bar.sync 0; 2026-02-21T12:44:51.7099097Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r964], {%r2529, %r2569, %r2609, %r2649}; 2026-02-21T12:44:51.7099160Z bar.sync 0; 2026-02-21T12:44:51.7099219Z // begin inline asm 2026-02-21T12:44:51.7099349Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3027}, [%r712]; 2026-02-21T12:44:51.7099408Z // end inline asm 2026-02-21T12:44:51.7099473Z // begin inline asm 2026-02-21T12:44:51.7099566Z fence.proxy.async.shared::cta; 2026-02-21T12:44:51.7099628Z // end inline asm 2026-02-21T12:44:51.7099709Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.7099771Z shl.b32 
%r3049, %r3044, 8; 2026-02-21T12:44:51.7099835Z and.b32 %r3050, %r3049, 3072; 2026-02-21T12:44:51.7099902Z add.s32 %r3051, %r3050, %r3946; 2026-02-21T12:44:51.7099972Z bfe.u32 %r3052, %r3051, 4, 14; 2026-02-21T12:44:51.7100038Z cvt.u64.u32 %rd396, %r3052; 2026-02-21T12:44:51.7100122Z or.b64 %rd394, %rd396, -9223371899399045120; 2026-02-21T12:44:51.7100193Z // begin inline asm 2026-02-21T12:44:51.7100575Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3020,%r3021,%r3022,%r3023,%r3024,%r3025,%r3026,%r3027}, {%r2996,%r2997,%r2998,%r2999}, %rd394, %p98, 1, 1; 2026-02-21T12:44:51.7100634Z // end inline asm 2026-02-21T12:44:51.7100702Z add.s32 %r3053, %r3051, 32; 2026-02-21T12:44:51.7100767Z bfe.u32 %r3054, %r3053, 4, 14; 2026-02-21T12:44:51.7100832Z cvt.u64.u32 %rd397, %r3054; 2026-02-21T12:44:51.7100913Z or.b64 %rd395, %rd397, -9223371899399045120; 2026-02-21T12:44:51.7100984Z // begin inline asm 2026-02-21T12:44:51.7101360Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3020,%r3021,%r3022,%r3023,%r3024,%r3025,%r3026,%r3027}, {%r3016,%r3017,%r3018,%r3019}, %rd395, %p98, 1, 1; 2026-02-21T12:44:51.7101421Z // end inline asm 2026-02-21T12:44:51.7101505Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.7101568Z mov.b32 %r3030, %r3169; 2026-02-21T12:44:51.7101630Z mov.b32 %r3028, %r3946; 2026-02-21T12:44:51.7101697Z mov.b32 %r3029, %r3169; 2026-02-21T12:44:51.7101757Z // begin inline asm 2026-02-21T12:44:51.7101938Z // wait for regs: %r3020,%r3021,%r3022,%r3023,%r3024,%r3025,%r3026,%r3027,%r3028,%r3029,%r3030 2026-02-21T12:44:51.7102013Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:51.7102077Z // end inline asm 2026-02-21T12:44:51.7102209Z $L__tmp36: 2026-02-21T12:44:51.7102430Z .loc 1 43 126 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:43:126 2026-02-21T12:44:51.7102571Z add.s64 %rd521, %rd521, 24; 2026-02-21T12:44:51.7102635Z add.s64 %rd520, %rd520, 96; 2026-02-21T12:44:51.7102698Z add.s64 %rd519, %rd519, 30720; 
2026-02-21T12:44:51.7102766Z setp.lt.u64 %p100, %rd521, 4056; 2026-02-21T12:44:51.7102833Z @%p100 bra $L__BB0_25; 2026-02-21T12:44:51.7102926Z // %bb.26: // %.preheader208 2026-02-21T12:44:51.7103029Z // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:44:51.7103242Z .loc 1 51 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:32 2026-02-21T12:44:51.7103309Z add.s64 %rd412, %rd90, %rd201; 2026-02-21T12:44:51.7103371Z add.s64 %rd399, %rd412, 16320; 2026-02-21T12:44:51.7103581Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 2026-02-21T12:44:51.7103646Z // begin inline asm 2026-02-21T12:44:51.7103705Z mov.u64 %rd398, 0x0; 2026-02-21T12:44:51.7103897Z createpolicy.fractional.L2::evict_last.b64 %rd398, 1.0; 2026-02-21T12:44:51.7104009Z // end inline asm 2026-02-21T12:44:51.7104070Z // begin inline asm 2026-02-21T12:44:51.7104130Z mov.u32 %r3055, 0x0; 2026-02-21T12:44:51.7104297Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r3055 }, [ %rd399 + 0 ], %rd398; 2026-02-21T12:44:51.7104355Z // end inline asm 2026-02-21T12:44:51.7104556Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 2026-02-21T12:44:51.7104615Z bar.sync 0; 2026-02-21T12:44:51.7104682Z st.shared.b32 [%r9], %r3055; 2026-02-21T12:44:51.7104737Z bar.sync 0; 2026-02-21T12:44:51.7104806Z ld.shared.b16 %rs345, [%r10]; 2026-02-21T12:44:51.7104879Z ld.shared.b16 %rs346, [%r10+256]; 2026-02-21T12:44:51.7104946Z ld.shared.b16 %rs347, [%r10+16]; 2026-02-21T12:44:51.7105014Z ld.shared.b16 %rs348, [%r10+272]; 2026-02-21T12:44:51.7105085Z ld.shared.b16 %rs349, [%r11]; 2026-02-21T12:44:51.7105150Z ld.shared.b16 %rs350, [%r11+256]; 2026-02-21T12:44:51.7105220Z ld.shared.b16 %rs351, [%r11+16]; 2026-02-21T12:44:51.7105294Z ld.shared.b16 %rs352, [%r11+272]; 2026-02-21T12:44:51.7105358Z cvt.f32.bf16 %r3072, %rs345; 2026-02-21T12:44:51.7105423Z cvt.f32.bf16 %r3073, %rs346; 2026-02-21T12:44:51.7105485Z cvt.f32.bf16 %r3074, %rs349; 2026-02-21T12:44:51.7105553Z 
cvt.f32.bf16 %r3075, %rs350; 2026-02-21T12:44:51.7105618Z cvt.f32.bf16 %r3092, %rs347; 2026-02-21T12:44:51.7105679Z cvt.f32.bf16 %r3093, %rs348; 2026-02-21T12:44:51.7105745Z cvt.f32.bf16 %r3094, %rs351; 2026-02-21T12:44:51.7105807Z cvt.f32.bf16 %r3095, %rs352; 2026-02-21T12:44:51.7106009Z .loc 1 57 34 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:34 2026-02-21T12:44:51.7106074Z add.s64 %rd413, %rd91, %rd502; 2026-02-21T12:44:51.7106143Z add.s64 %rd401, %rd413, 5222400; 2026-02-21T12:44:51.7106347Z .loc 1 57 87 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.7106410Z // begin inline asm 2026-02-21T12:44:51.7106595Z mov.u16 %rs343, 0x0; 2026-02-21T12:44:51.7106678Z ld.global.b8 { %rs343 }, [ %rd401 + 0 ]; 2026-02-21T12:44:51.7106736Z // end inline asm 2026-02-21T12:44:51.7106940Z .loc 1 65 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 2026-02-21T12:44:51.7106997Z bar.sync 0; 2026-02-21T12:44:51.7107062Z st.shared.b8 [%r12], %rs343; 2026-02-21T12:44:51.7107117Z bar.sync 0; 2026-02-21T12:44:51.7107217Z ld.shared.v2.b8 {%rs353, %rs354}, [%r22]; 2026-02-21T12:44:51.7107422Z .loc 1 60 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.7107486Z shl.b16 %rs355, %rs353, 4; 2026-02-21T12:44:51.7107557Z shl.b16 %rs356, %rs354, 4; 2026-02-21T12:44:51.7107757Z .loc 1 75 58 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.7107915Z selp.b16 %rs357, %rs355, %rs353, %p134; 2026-02-21T12:44:51.7107984Z cvt.s16.s8 %rs358, %rs357; 2026-02-21T12:44:51.7108113Z shr.s16 %rs359, %rs358, 4; 2026-02-21T12:44:51.7108184Z selp.b16 %rs360, %rs356, %rs354, %p134; 2026-02-21T12:44:51.7108246Z cvt.s16.s8 %rs361, %rs360; 2026-02-21T12:44:51.7108312Z shr.s16 %rs362, %rs361, 4; 2026-02-21T12:44:51.7108575Z .loc 1 80 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.7108645Z cvt.rn.f32.s16 %r3185, %rs359; 
2026-02-21T12:44:51.7108715Z cvt.rn.f32.s16 %r3186, %rs362; 2026-02-21T12:44:51.7108772Z bar.sync 0; 2026-02-21T12:44:51.7108837Z st.shared.b32 [%r14], %r3185; 2026-02-21T12:44:51.7108906Z st.shared.b32 [%r15], %r3186; 2026-02-21T12:44:51.7108972Z $L__tmp37: 2026-02-21T12:44:51.7109246Z .loc 2 291 36 // standard.py:291:36 @[ cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 ] 2026-02-21T12:44:51.7109310Z // begin inline asm 2026-02-21T12:44:51.7109393Z fence.proxy.async.shared::cta; 2026-02-21T12:44:51.7109451Z // end inline asm 2026-02-21T12:44:51.7109586Z bar.sync 0; 2026-02-21T12:44:51.7109731Z shfl.sync.idx.b32 %r3187, %r3, 0, 31, -1; 2026-02-21T12:44:51.7109808Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.7109871Z shl.b32 %r3188, %r3187, 8; 2026-02-21T12:44:51.7109933Z and.b32 %r3189, %r3188, 3072; 2026-02-21T12:44:51.7110004Z add.s32 %r3190, %r3189, %r3946; 2026-02-21T12:44:51.7110068Z bfe.u32 %r3191, %r3190, 4, 14; 2026-02-21T12:44:51.7110132Z cvt.u64.u32 %rd414, %r3191; 2026-02-21T12:44:51.7110215Z or.b64 %rd402, %rd414, -9223371899399045120; 2026-02-21T12:44:51.7110277Z // begin inline asm 2026-02-21T12:44:51.7110659Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3020,%r3021,%r3022,%r3023,%r3024,%r3025,%r3026,%r3027}, {%r3072,%r3073,%r3074,%r3075}, %rd402, %p98, 1, 1; 2026-02-21T12:44:51.7110723Z // end inline asm 2026-02-21T12:44:51.7110788Z add.s32 %r3193, %r3189, %r169; 2026-02-21T12:44:51.7110851Z bfe.u32 %r3194, %r3193, 4, 14; 2026-02-21T12:44:51.7110918Z cvt.u64.u32 %rd415, %r3194; 2026-02-21T12:44:51.7111004Z or.b64 %rd403, %rd415, -9223371899399045120; 2026-02-21T12:44:51.7111064Z // begin inline asm 2026-02-21T12:44:51.7111434Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3020,%r3021,%r3022,%r3023,%r3024,%r3025,%r3026,%r3027}, {%r3092,%r3093,%r3094,%r3095}, %rd403, %p98, 1, 1; 2026-02-21T12:44:51.7111498Z // end inline asm 2026-02-21T12:44:51.7111574Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.7111636Z 
mov.b32 %r3105, %r3169; 2026-02-21T12:44:51.7111700Z mov.b32 %r3106, %r3169; 2026-02-21T12:44:51.7111781Z mov.b32 %r3104, %r3946; 2026-02-21T12:44:51.7111843Z // begin inline asm 2026-02-21T12:44:51.7112025Z // wait for regs: %r3020,%r3021,%r3022,%r3023,%r3024,%r3025,%r3026,%r3027,%r3104,%r3105,%r3106 2026-02-21T12:44:51.7112104Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:51.7112162Z // end inline asm 2026-02-21T12:44:51.7112220Z $L__tmp38: 2026-02-21T12:44:51.7112435Z .loc 1 51 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:32 2026-02-21T12:44:51.7112504Z add.s64 %rd405, %rd412, 16352; 2026-02-21T12:44:51.7112707Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 2026-02-21T12:44:51.7112769Z // begin inline asm 2026-02-21T12:44:51.7112833Z mov.u64 %rd404, 0x0; 2026-02-21T12:44:51.7112958Z createpolicy.fractional.L2::evict_last.b64 %rd404, 1.0; 2026-02-21T12:44:51.7113019Z // end inline asm 2026-02-21T12:44:51.7113083Z // begin inline asm 2026-02-21T12:44:51.7113143Z mov.u32 %r3118, 0x0; 2026-02-21T12:44:51.7113303Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r3118 }, [ %rd405 + 0 ], %rd404; 2026-02-21T12:44:51.7113366Z // end inline asm 2026-02-21T12:44:51.7113570Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 2026-02-21T12:44:51.7113699Z bar.sync 0; 2026-02-21T12:44:51.7113766Z st.shared.b32 [%r9], %r3118; 2026-02-21T12:44:51.7113829Z bar.sync 0; 2026-02-21T12:44:51.7113896Z ld.shared.b16 %rs363, [%r10]; 2026-02-21T12:44:51.7114012Z ld.shared.b16 %rs364, [%r10+256]; 2026-02-21T12:44:51.7114082Z ld.shared.b16 %rs365, [%r10+16]; 2026-02-21T12:44:51.7114148Z ld.shared.b16 %rs366, [%r10+272]; 2026-02-21T12:44:51.7114212Z ld.shared.b16 %rs367, [%r11]; 2026-02-21T12:44:51.7114277Z ld.shared.b16 %rs368, [%r11+256]; 2026-02-21T12:44:51.7114348Z ld.shared.b16 %rs369, [%r11+16]; 2026-02-21T12:44:51.7114412Z ld.shared.b16 %rs370, [%r11+272]; 2026-02-21T12:44:51.7114476Z cvt.f32.bf16 
%r3135, %rs363; 2026-02-21T12:44:51.7114542Z cvt.f32.bf16 %r3136, %rs364; 2026-02-21T12:44:51.7114605Z cvt.f32.bf16 %r3137, %rs367; 2026-02-21T12:44:51.7114665Z cvt.f32.bf16 %r3138, %rs368; 2026-02-21T12:44:51.7114733Z cvt.f32.bf16 %r3155, %rs365; 2026-02-21T12:44:51.7114794Z cvt.f32.bf16 %r3156, %rs366; 2026-02-21T12:44:51.7114858Z cvt.f32.bf16 %r3157, %rs369; 2026-02-21T12:44:51.7114919Z cvt.f32.bf16 %r3158, %rs370; 2026-02-21T12:44:51.7115225Z .loc 1 57 34 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:34 2026-02-21T12:44:51.7115294Z add.s64 %rd407, %rd413, 5232640; 2026-02-21T12:44:51.7115494Z .loc 1 57 87 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.7115558Z // begin inline asm 2026-02-21T12:44:51.7115617Z mov.u16 %rs344, 0x0; 2026-02-21T12:44:51.7115691Z ld.global.b8 { %rs344 }, [ %rd407 + 0 ]; 2026-02-21T12:44:51.7115749Z // end inline asm 2026-02-21T12:44:51.7115952Z .loc 1 65 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 2026-02-21T12:44:51.7116010Z bar.sync 0; 2026-02-21T12:44:51.7116073Z st.shared.b8 [%r12], %rs344; 2026-02-21T12:44:51.7116137Z bar.sync 0; 2026-02-21T12:44:51.7116215Z ld.shared.v2.b8 {%rs371, %rs372}, [%r22]; 2026-02-21T12:44:51.7116418Z .loc 1 60 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.7116613Z shl.b16 %rs373, %rs371, 4; 2026-02-21T12:44:51.7116686Z shl.b16 %rs374, %rs372, 4; 2026-02-21T12:44:51.7116887Z .loc 1 75 58 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.7116960Z selp.b16 %rs375, %rs373, %rs371, %p134; 2026-02-21T12:44:51.7117027Z cvt.s16.s8 %rs376, %rs375; 2026-02-21T12:44:51.7117089Z shr.s16 %rs377, %rs376, 4; 2026-02-21T12:44:51.7117159Z selp.b16 %rs378, %rs374, %rs372, %p134; 2026-02-21T12:44:51.7117227Z cvt.s16.s8 %rs379, %rs378; 2026-02-21T12:44:51.7117302Z shr.s16 %rs380, %rs379, 4; 2026-02-21T12:44:51.7117507Z .loc 1 80 32 // 
cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.7117575Z cvt.rn.f32.s16 %r3195, %rs377; 2026-02-21T12:44:51.7117638Z cvt.rn.f32.s16 %r3196, %rs380; 2026-02-21T12:44:51.7117694Z bar.sync 0; 2026-02-21T12:44:51.7117760Z st.shared.b32 [%r14], %r3195; 2026-02-21T12:44:51.7117831Z st.shared.b32 [%r15], %r3196; 2026-02-21T12:44:51.7117887Z $L__tmp39: 2026-02-21T12:44:51.7118164Z .loc 2 291 36 // standard.py:291:36 @[ cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 ] 2026-02-21T12:44:51.7118230Z // begin inline asm 2026-02-21T12:44:51.7118309Z fence.proxy.async.shared::cta; 2026-02-21T12:44:51.7118367Z // end inline asm 2026-02-21T12:44:51.7118426Z bar.sync 0; 2026-02-21T12:44:51.7118508Z shfl.sync.idx.b32 %r3197, %r3, 0, 31, -1; 2026-02-21T12:44:51.7118581Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.7118643Z shl.b32 %r3198, %r3197, 8; 2026-02-21T12:44:51.7118707Z and.b32 %r3199, %r3198, 3072; 2026-02-21T12:44:51.7118772Z add.s32 %r3200, %r3199, %r3946; 2026-02-21T12:44:51.7118836Z bfe.u32 %r3201, %r3200, 4, 14; 2026-02-21T12:44:51.7118904Z cvt.u64.u32 %rd416, %r3201; 2026-02-21T12:44:51.7118983Z or.b64 %rd408, %rd416, -9223371899399045120; 2026-02-21T12:44:51.7119146Z // begin inline asm 2026-02-21T12:44:51.7119524Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3020,%r3021,%r3022,%r3023,%r3024,%r3025,%r3026,%r3027}, {%r3135,%r3136,%r3137,%r3138}, %rd408, %p98, 1, 1; 2026-02-21T12:44:51.7119645Z // end inline asm 2026-02-21T12:44:51.7119721Z add.s32 %r3202, %r3199, %r169; 2026-02-21T12:44:51.7119785Z bfe.u32 %r3203, %r3202, 4, 14; 2026-02-21T12:44:51.7119855Z cvt.u64.u32 %rd417, %r3203; 2026-02-21T12:44:51.7119932Z or.b64 %rd409, %rd417, -9223371899399045120; 2026-02-21T12:44:51.7119996Z // begin inline asm 2026-02-21T12:44:51.7120372Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3020,%r3021,%r3022,%r3023,%r3024,%r3025,%r3026,%r3027}, {%r3155,%r3156,%r3157,%r3158}, %rd409, %p98, 1, 1; 
2026-02-21T12:44:51.7120432Z // end inline asm 2026-02-21T12:44:51.7120507Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.7120570Z mov.b32 %r3168, %r3169; 2026-02-21T12:44:51.7120636Z mov.b32 %r3167, %r3946; 2026-02-21T12:44:51.7120698Z // begin inline asm 2026-02-21T12:44:51.7120874Z // wait for regs: %r3020,%r3021,%r3022,%r3023,%r3024,%r3025,%r3026,%r3027,%r3167,%r3168,%r3169 2026-02-21T12:44:51.7121076Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:51.7121137Z // end inline asm 2026-02-21T12:44:51.7121191Z $L__tmp40: 2026-02-21T12:44:51.7121399Z .loc 1 90 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:90:28 2026-02-21T12:44:51.7121480Z cvt.rn.bf16x2.f32 %r3204, %r3021, %r3020; 2026-02-21T12:44:51.7121557Z cvt.rn.bf16x2.f32 %r3205, %r3023, %r3022; 2026-02-21T12:44:51.7121631Z cvt.rn.bf16x2.f32 %r3206, %r3025, %r3024; 2026-02-21T12:44:51.7121710Z cvt.rn.bf16x2.f32 %r3207, %r3027, %r3026; 2026-02-21T12:44:51.7121913Z .loc 1 91 22 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:91:22 2026-02-21T12:44:51.7121983Z mad.lo.s64 %rd418, %rd88, 2560, %rd123; 2026-02-21T12:44:51.7122050Z shl.b64 %rd419, %rd89, 1; 2026-02-21T12:44:51.7122119Z add.s64 %rd420, %rd418, %rd419; 2026-02-21T12:44:51.7122184Z add.s64 %rd410, %rd420, %rd504; 2026-02-21T12:44:51.7122389Z .loc 1 91 81 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:91:81 2026-02-21T12:44:51.7122452Z bar.sync 0; 2026-02-21T12:44:51.7122642Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r20], {%r3204, %r3205, %r3206, %r3207}; 2026-02-21T12:44:51.7122699Z bar.sync 0; 2026-02-21T12:44:51.7122816Z ld.shared.v4.b32 {%r3181, %r3182, %r3183, %r3184}, [%r21]; 2026-02-21T12:44:51.7122877Z // begin inline asm 2026-02-21T12:44:51.7123003Z st.global.v4.b32 [ %rd410 + 0 ], { %r3181, %r3182, %r3183, %r3184 }; 2026-02-21T12:44:51.7123066Z // end inline asm 2026-02-21T12:44:51.7123280Z .loc 1 22 120 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:22:120 
2026-02-21T12:44:51.7123345Z add.s64 %rd505, %rd505, 4; 2026-02-21T12:44:51.7123416Z setp.lt.u64 %p106, %rd505, %rd522; 2026-02-21T12:44:51.7123485Z @%p106 bra $L__BB0_2; 2026-02-21T12:44:51.7123583Z $L__BB0_4: // %.preheader207 2026-02-21T12:44:51.7123652Z setp.ge.s64 %p107, %rd522, %rd2; 2026-02-21T12:44:51.7123739Z @%p107 bra $L__BB0_28; 2026-02-21T12:44:51.7123826Z // %bb.5: // %.lr.ph221 2026-02-21T12:44:51.7124034Z .loc 1 0 120 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:0:120 2026-02-21T12:44:51.7124100Z and.b32 %r3210, %r4040, 1916; 2026-02-21T12:44:51.7124162Z and.b32 %r3213, %r4042, 136; 2026-02-21T12:44:51.7124226Z xor.b32 %r3214, %r3213, %r3210; 2026-02-21T12:44:51.7124288Z add.s32 %r24, %r3946, %r3214; 2026-02-21T12:44:51.7124354Z shl.b32 %r3217, %r4044, 4; 2026-02-21T12:44:51.7124416Z and.b32 %r3219, %r4045, 96; 2026-02-21T12:44:51.7124477Z and.b32 %r3221, %r4046, 6; 2026-02-21T12:44:51.7124541Z and.b32 %r3224, %r4048, 136; 2026-02-21T12:44:51.7124604Z or.b32 %r3225, %r3217, %r3219; 2026-02-21T12:44:51.7124666Z or.b32 %r3226, %r3225, %r3221; 2026-02-21T12:44:51.7124785Z or.b32 %r3227, %r3226, %r3224; 2026-02-21T12:44:51.7124851Z add.s32 %r25, %r3946, %r3227; 2026-02-21T12:44:51.7124912Z xor.b32 %r3228, %r3227, 8; 2026-02-21T12:44:51.7125021Z add.s32 %r26, %r3946, %r3228; 2026-02-21T12:44:51.7125088Z and.b32 %r3229, %r4040, 124; 2026-02-21T12:44:51.7125153Z and.b32 %r3230, %r4046, 384; 2026-02-21T12:44:51.7125213Z shr.u32 %r3231, %r4041, 4; 2026-02-21T12:44:51.7125280Z add.s32 %r3233, %r3946, %r4049; 2026-02-21T12:44:51.7125341Z add.s32 %r3234, %r3233, %r3230; 2026-02-21T12:44:51.7125402Z add.s32 %r3235, %r3234, %r3231; 2026-02-21T12:44:51.7125462Z add.s32 %r27, %r3235, %r3229; 2026-02-21T12:44:51.7125528Z add.s32 %r3237, %r3946, %r3231; 2026-02-21T12:44:51.7125602Z add.s32 %r3238, %r3237, %r4050; 2026-02-21T12:44:51.7125666Z add.s32 %r28, %r3238, %r3229; 2026-02-21T12:44:51.7125732Z shl.b32 %r3239, %r4039, 6; 
2026-02-21T12:44:51.7125794Z and.b32 %r3240, %r4045, 48; 2026-02-21T12:44:51.7125856Z xor.b32 %r3242, %r3240, %r4051; 2026-02-21T12:44:51.7125921Z or.b32 %r3243, %r3242, %r3239; 2026-02-21T12:44:51.7125986Z add.s32 %r29, %r3946, %r3243; 2026-02-21T12:44:51.7126097Z xor.b32 %r3244, %r3243, 32; 2026-02-21T12:44:51.7126205Z add.s32 %r30, %r3946, %r3244; 2026-02-21T12:44:51.7126274Z and.b32 %r3246, %r4052, 120; 2026-02-21T12:44:51.7126335Z or.b32 %r3247, %r3246, %r2; 2026-02-21T12:44:51.7126395Z shl.b32 %r3248, %r3247, 4; 2026-02-21T12:44:51.7126588Z add.s32 %r3249, %r3946, 4096; 2026-02-21T12:44:51.7126659Z add.s32 %r3796, %r3249, %r3248; 2026-02-21T12:44:51.7126719Z and.b32 %r3251, %r4053, 1536; 2026-02-21T12:44:51.7126792Z shl.b32 %r3253, %r4044, 2; 2026-02-21T12:44:51.7126864Z add.s32 %r3254, %r3249, %r3251; 2026-02-21T12:44:51.7126926Z add.s32 %r3255, %r3254, %r4054; 2026-02-21T12:44:51.7126988Z add.s32 %r3297, %r3255, %r3253; 2026-02-21T12:44:51.7127050Z bfe.u32 %r3256, %r3946, 4, 14; 2026-02-21T12:44:51.7127118Z cvt.u64.u32 %rd422, %r3256; 2026-02-21T12:44:51.7127202Z or.b64 %rd21, %rd422, -9223371899399045120; 2026-02-21T12:44:51.7127264Z add.s32 %r3257, %r3946, 32; 2026-02-21T12:44:51.7127331Z bfe.u32 %r3258, %r3257, 4, 14; 2026-02-21T12:44:51.7127399Z cvt.u64.u32 %rd423, %r3258; 2026-02-21T12:44:51.7127477Z or.b64 %rd22, %rd423, -9223371899399045120; 2026-02-21T12:44:51.7127541Z add.s32 %r3259, %r3946, 1024; 2026-02-21T12:44:51.7127603Z bfe.u32 %r3260, %r3259, 4, 14; 2026-02-21T12:44:51.7127667Z cvt.u64.u32 %rd424, %r3260; 2026-02-21T12:44:51.7127743Z or.b64 %rd23, %rd424, -9223371899399045120; 2026-02-21T12:44:51.7127809Z add.s32 %r3261, %r3946, 1056; 2026-02-21T12:44:51.7127871Z bfe.u32 %r3262, %r3261, 4, 14; 2026-02-21T12:44:51.7127933Z cvt.u64.u32 %rd425, %r3262; 2026-02-21T12:44:51.7128010Z or.b64 %rd24, %rd425, -9223371899399045120; 2026-02-21T12:44:51.7128072Z add.s32 %r3263, %r3946, 2048; 2026-02-21T12:44:51.7128132Z bfe.u32 %r3264, %r3263, 4, 14; 
2026-02-21T12:44:51.7128194Z cvt.u64.u32 %rd426, %r3264; 2026-02-21T12:44:51.7128287Z or.b64 %rd25, %rd426, -9223371899399045120; 2026-02-21T12:44:51.7128348Z add.s32 %r3265, %r3946, 2080; 2026-02-21T12:44:51.7128410Z bfe.u32 %r3266, %r3265, 4, 14; 2026-02-21T12:44:51.7128480Z cvt.u64.u32 %rd427, %r3266; 2026-02-21T12:44:51.7128555Z or.b64 %rd26, %rd427, -9223371899399045120; 2026-02-21T12:44:51.7128628Z add.s32 %r3267, %r3946, 3072; 2026-02-21T12:44:51.7128697Z bfe.u32 %r3268, %r3267, 4, 14; 2026-02-21T12:44:51.7128761Z cvt.u64.u32 %rd428, %r3268; 2026-02-21T12:44:51.7128833Z or.b64 %rd27, %rd428, -9223371899399045120; 2026-02-21T12:44:51.7128893Z add.s32 %r3269, %r3946, 3104; 2026-02-21T12:44:51.7128959Z bfe.u32 %r3270, %r3269, 4, 14; 2026-02-21T12:44:51.7129023Z cvt.u64.u32 %rd429, %r3270; 2026-02-21T12:44:51.7129097Z or.b64 %rd28, %rd429, -9223371899399045120; 2026-02-21T12:44:51.7129164Z and.b32 %r3272, %r4055, 1536; 2026-02-21T12:44:51.7129225Z or.b32 %r3273, %r3272, %r3219; 2026-02-21T12:44:51.7129286Z or.b32 %r3274, %r3273, %r3221; 2026-02-21T12:44:51.7129427Z or.b32 %r3275, %r3274, %r3224; 2026-02-21T12:44:51.7129493Z add.s32 %r33, %r3946, %r3275; 2026-02-21T12:44:51.7129553Z xor.b32 %r3276, %r3275, 8; 2026-02-21T12:44:51.7129677Z add.s32 %r34, %r3946, %r3276; 2026-02-21T12:44:51.7129744Z and.b32 %r3278, %r4055, 240; 2026-02-21T12:44:51.7129805Z and.b32 %r3279, %r4045, 768; 2026-02-21T12:44:51.7129868Z shr.u32 %r3280, %r1, 2; 2026-02-21T12:44:51.7129929Z and.b32 %r3281, %r3280, 96; 2026-02-21T12:44:51.7129995Z or.b32 %r3282, %r3278, %r3279; 2026-02-21T12:44:51.7130056Z or.b32 %r3283, %r3281, %r4047; 2026-02-21T12:44:51.7130121Z xor.b32 %r3284, %r3282, %r3283; 2026-02-21T12:44:51.7130187Z add.s32 %r3285, %r3946, %r4056; 2026-02-21T12:44:51.7130247Z add.s32 %r35, %r3285, %r3284; 2026-02-21T12:44:51.7130318Z and.b32 %r3287, %r4057, 7168; 2026-02-21T12:44:51.7130386Z xor.b32 %r3289, %r4054, %r4058; 2026-02-21T12:44:51.7130448Z add.s32 %r3290, %r3946, %r3287; 
2026-02-21T12:44:51.7130509Z add.s32 %r36, %r3290, %r3289; 2026-02-21T12:44:51.7130727Z .loc 1 22 120 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:22:120 2026-02-21T12:44:51.7130864Z add.s64 %rd430, %rd503, %rd121; 2026-02-21T12:44:51.7130991Z add.s64 %rd32, %rd430, 64; 2026-02-21T12:44:51.7131068Z or.b64 %rd431, %rd502, %rd6; 2026-02-21T12:44:51.7131136Z add.s64 %rd33, %rd122, %rd431; 2026-02-21T12:44:51.7131251Z $L__BB0_6: // =>This Loop Header: Depth=1 2026-02-21T12:44:51.7131351Z // Child Loop BB0_7 Depth 2 2026-02-21T12:44:51.7131563Z .loc 1 28 35 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:28:35 2026-02-21T12:44:51.7131652Z mul.hi.u64 %rd433, %rd522, -3689348814741910323; 2026-02-21T12:44:51.7131716Z shr.u64 %rd434, %rd433, 7; 2026-02-21T12:44:51.7131914Z .loc 1 31 45 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:31:45 2026-02-21T12:44:51.7131983Z mul.lo.s64 %rd435, %rd434, 160; 2026-02-21T12:44:51.7132048Z sub.s64 %rd436, %rd522, %rd435; 2026-02-21T12:44:51.7132252Z .loc 1 33 27 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:33:27 2026-02-21T12:44:51.7132321Z shl.b64 %rd437, %rd434, 9; 2026-02-21T12:44:51.7132386Z shl.b64 %rd438, %rd436, 6; 2026-02-21T12:44:51.7132448Z and.b64 %rd439, %rd438, 448; 2026-02-21T12:44:51.7132514Z or.b64 %rd440, %rd439, %rd437; 2026-02-21T12:44:51.7132716Z .loc 1 34 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:34:32 2026-02-21T12:44:51.7132781Z or.b64 %rd108, %rd440, %rd4; 2026-02-21T12:44:51.7132979Z .loc 1 35 27 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:35:27 2026-02-21T12:44:51.7133044Z shl.b64 %rd441, %rd436, 3; 2026-02-21T12:44:51.7133107Z and.b64 %rd109, %rd441, 1984; 2026-02-21T12:44:51.7133170Z shl.b64 %rd442, %rd108, 14; 2026-02-21T12:44:51.7133237Z add.s64 %rd110, %rd121, %rd442; 2026-02-21T12:44:51.7133303Z add.s64 %rd111, %rd501, %rd109; 2026-02-21T12:44:51.7133511Z .loc 1 43 126 // 
cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:43:126 2026-02-21T12:44:51.7133583Z or.b64 %rd443, %rd4, %rd437; 2026-02-21T12:44:51.7133646Z or.b64 %rd444, %rd443, %rd439; 2026-02-21T12:44:51.7133708Z shl.b64 %rd445, %rd444, 14; 2026-02-21T12:44:51.7133785Z add.s64 %rd524, %rd32, %rd445; 2026-02-21T12:44:51.7133855Z add.s64 %rd523, %rd33, %rd109; 2026-02-21T12:44:51.7133920Z mov.b32 %r3851, 0f00000000; 2026-02-21T12:44:51.7133981Z mov.b64 %rd525, -24; 2026-02-21T12:44:51.7134047Z mov.b32 %r3852, %r3851; 2026-02-21T12:44:51.7134108Z mov.b32 %r3853, %r3851; 2026-02-21T12:44:51.7134167Z mov.b32 %r3854, %r3851; 2026-02-21T12:44:51.7134227Z mov.b32 %r3855, %r3851; 2026-02-21T12:44:51.7134291Z mov.b32 %r3856, %r3851; 2026-02-21T12:44:51.7134351Z mov.b32 %r3857, %r3851; 2026-02-21T12:44:51.7134413Z mov.b32 %r3858, %r3851; 2026-02-21T12:44:51.7134529Z $L__BB0_7: // Parent Loop BB0_6 Depth=1 2026-02-21T12:44:51.7134698Z // => This Inner Loop Header: Depth=2 2026-02-21T12:44:51.7134952Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 2026-02-21T12:44:51.7135027Z add.s64 %rd447, %rd524, -64; 2026-02-21T12:44:51.7135088Z // begin inline asm 2026-02-21T12:44:51.7135149Z mov.u64 %rd446, 0x0; 2026-02-21T12:44:51.7135277Z createpolicy.fractional.L2::evict_last.b64 %rd446, 1.0; 2026-02-21T12:44:51.7135343Z // end inline asm 2026-02-21T12:44:51.7135404Z // begin inline asm 2026-02-21T12:44:51.7135464Z mov.u32 %r3292, 0x0; 2026-02-21T12:44:51.7135632Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r3292 }, [ %rd447 + 0 ], %rd446; 2026-02-21T12:44:51.7135691Z // end inline asm 2026-02-21T12:44:51.7135896Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 2026-02-21T12:44:51.7135963Z bar.sync 0; 2026-02-21T12:44:51.7136033Z st.shared.b32 [%r24], %r3292; 2026-02-21T12:44:51.7136089Z bar.sync 0; 2026-02-21T12:44:51.7136156Z ld.shared.b16 %rs384, [%r25]; 2026-02-21T12:44:51.7136343Z ld.shared.b16 %rs385, [%r25+256]; 
2026-02-21T12:44:51.7136414Z ld.shared.b16 %rs386, [%r25+16]; 2026-02-21T12:44:51.7136609Z ld.shared.b16 %rs387, [%r25+272]; 2026-02-21T12:44:51.7136684Z ld.shared.b16 %rs388, [%r26]; 2026-02-21T12:44:51.7136753Z ld.shared.b16 %rs389, [%r26+256]; 2026-02-21T12:44:51.7139779Z ld.shared.b16 %rs390, [%r26+16]; 2026-02-21T12:44:51.7139889Z ld.shared.b16 %rs391, [%r26+272]; 2026-02-21T12:44:51.7139966Z cvt.f32.bf16 %r3389, %rs384; 2026-02-21T12:44:51.7140033Z cvt.f32.bf16 %r3390, %rs385; 2026-02-21T12:44:51.7140102Z cvt.f32.bf16 %r3391, %rs388; 2026-02-21T12:44:51.7140165Z cvt.f32.bf16 %r3392, %rs389; 2026-02-21T12:44:51.7140226Z cvt.f32.bf16 %r3409, %rs386; 2026-02-21T12:44:51.7140301Z cvt.f32.bf16 %r3410, %rs387; 2026-02-21T12:44:51.7140373Z cvt.f32.bf16 %r3411, %rs390; 2026-02-21T12:44:51.7140447Z cvt.f32.bf16 %r3412, %rs391; 2026-02-21T12:44:51.7140693Z .loc 1 57 87 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.7140762Z // begin inline asm 2026-02-21T12:44:51.7140826Z mov.u16 %rs381, 0x0; 2026-02-21T12:44:51.7140908Z ld.global.b8 { %rs381 }, [ %rd523 + 0 ]; 2026-02-21T12:44:51.7140979Z // end inline asm 2026-02-21T12:44:51.7141206Z .loc 1 65 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 2026-02-21T12:44:51.7141267Z bar.sync 0; 2026-02-21T12:44:51.7141344Z st.shared.b8 [%r27], %rs381; 2026-02-21T12:44:51.7141401Z bar.sync 0; 2026-02-21T12:44:51.7141486Z ld.shared.v2.b8 {%rs392, %rs393}, [%r28]; 2026-02-21T12:44:51.7141704Z .loc 1 60 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.7141782Z shl.b16 %rs394, %rs392, 4; 2026-02-21T12:44:51.7141846Z shl.b16 %rs395, %rs393, 4; 2026-02-21T12:44:51.7142054Z .loc 1 75 58 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.7142137Z selp.b16 %rs396, %rs394, %rs392, %p134; 2026-02-21T12:44:51.7142206Z cvt.s16.s8 %rs397, %rs396; 2026-02-21T12:44:51.7142269Z shr.s16 %rs398, %rs397, 4; 
2026-02-21T12:44:51.7142346Z selp.b16 %rs399, %rs395, %rs393, %p134; 2026-02-21T12:44:51.7142409Z cvt.s16.s8 %rs400, %rs399; 2026-02-21T12:44:51.7142470Z shr.s16 %rs401, %rs400, 4; 2026-02-21T12:44:51.7142674Z .loc 1 80 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.7142752Z cvt.rn.f32.s16 %r3873, %rs398; 2026-02-21T12:44:51.7142821Z cvt.rn.f32.s16 %r3874, %rs401; 2026-02-21T12:44:51.7142879Z bar.sync 0; 2026-02-21T12:44:51.7142952Z st.shared.b32 [%r29], %r3873; 2026-02-21T12:44:51.7143018Z st.shared.b32 [%r30], %r3874; 2026-02-21T12:44:51.7143166Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3796], {%r3851}; 2026-02-21T12:44:51.7143359Z bar.sync 0; 2026-02-21T12:44:51.7143425Z // begin inline asm 2026-02-21T12:44:51.7143625Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3353, %r3393, %r3433, %r3473}, [%r3297]; 2026-02-21T12:44:51.7143752Z // end inline asm 2026-02-21T12:44:51.7143814Z bar.sync 0; 2026-02-21T12:44:51.7143946Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3796], {%r3853}; 2026-02-21T12:44:51.7144002Z bar.sync 0; 2026-02-21T12:44:51.7144068Z // begin inline asm 2026-02-21T12:44:51.7144247Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3355, %r3395, %r3435, %r3475}, [%r3297]; 2026-02-21T12:44:51.7144305Z // end inline asm 2026-02-21T12:44:51.7144360Z bar.sync 0; 2026-02-21T12:44:51.7144495Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3796], {%r3852}; 2026-02-21T12:44:51.7144552Z bar.sync 0; 2026-02-21T12:44:51.7144611Z // begin inline asm 2026-02-21T12:44:51.7144798Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3354, %r3394, %r3434, %r3474}, [%r3297]; 2026-02-21T12:44:51.7144857Z // end inline asm 2026-02-21T12:44:51.7144916Z bar.sync 0; 2026-02-21T12:44:51.7145053Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3796], {%r3854}; 2026-02-21T12:44:51.7145189Z bar.sync 0; 2026-02-21T12:44:51.7145316Z // begin inline asm 2026-02-21T12:44:51.7145500Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3356, %r3396, %r3436, %r3476}, 
[%r3297]; 2026-02-21T12:44:51.7145563Z // end inline asm 2026-02-21T12:44:51.7145620Z bar.sync 0; 2026-02-21T12:44:51.7145751Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3796], {%r3855}; 2026-02-21T12:44:51.7145813Z bar.sync 0; 2026-02-21T12:44:51.7145872Z // begin inline asm 2026-02-21T12:44:51.7146051Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3357, %r3397, %r3437, %r3477}, [%r3297]; 2026-02-21T12:44:51.7146108Z // end inline asm 2026-02-21T12:44:51.7146169Z bar.sync 0; 2026-02-21T12:44:51.7146298Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3796], {%r3857}; 2026-02-21T12:44:51.7146354Z bar.sync 0; 2026-02-21T12:44:51.7146416Z // begin inline asm 2026-02-21T12:44:51.7146751Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3359, %r3399, %r3439, %r3479}, [%r3297]; 2026-02-21T12:44:51.7146811Z // end inline asm 2026-02-21T12:44:51.7146872Z bar.sync 0; 2026-02-21T12:44:51.7147009Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3796], {%r3856}; 2026-02-21T12:44:51.7147075Z bar.sync 0; 2026-02-21T12:44:51.7147138Z // begin inline asm 2026-02-21T12:44:51.7147322Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3358, %r3398, %r3438, %r3478}, [%r3297]; 2026-02-21T12:44:51.7147381Z // end inline asm 2026-02-21T12:44:51.7147447Z bar.sync 0; 2026-02-21T12:44:51.7147586Z stmatrix.sync.aligned.m8n8.x1.shared.b16 [%r3796], {%r3858}; 2026-02-21T12:44:51.7147643Z bar.sync 0; 2026-02-21T12:44:51.7147703Z // begin inline asm 2026-02-21T12:44:51.7147880Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3360, %r3400, %r3440, %r3480}, [%r3297]; 2026-02-21T12:44:51.7147944Z // end inline asm 2026-02-21T12:44:51.7148004Z $L__tmp41: 2026-02-21T12:44:51.7148293Z .loc 2 291 36 // standard.py:291:36 @[ cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 ] 2026-02-21T12:44:51.7148362Z // begin inline asm 2026-02-21T12:44:51.7148452Z fence.proxy.async.shared::cta; 2026-02-21T12:44:51.7148581Z // end inline asm 2026-02-21T12:44:51.7148671Z shfl.sync.idx.b32 %r3875, %r3, 0, 31, -1; 
2026-02-21T12:44:51.7148753Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.7148829Z mov.pred %p125, -1; 2026-02-21T12:44:51.7148890Z // begin inline asm 2026-02-21T12:44:51.7149287Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3353,%r3354,%r3355,%r3356,%r3357,%r3358,%r3359,%r3360}, {%r3389,%r3390,%r3391,%r3392}, %rd21, %p125, 1, 1; 2026-02-21T12:44:51.7149347Z // end inline asm 2026-02-21T12:44:51.7149407Z // begin inline asm 2026-02-21T12:44:51.7149779Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3353,%r3354,%r3355,%r3356,%r3357,%r3358,%r3359,%r3360}, {%r3409,%r3410,%r3411,%r3412}, %rd22, %p125, 1, 1; 2026-02-21T12:44:51.7149837Z // end inline asm 2026-02-21T12:44:51.7149984Z // begin inline asm 2026-02-21T12:44:51.7150352Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3393,%r3394,%r3395,%r3396,%r3397,%r3398,%r3399,%r3400}, {%r3389,%r3390,%r3391,%r3392}, %rd23, %p125, 1, 1; 2026-02-21T12:44:51.7150471Z // end inline asm 2026-02-21T12:44:51.7150532Z // begin inline asm 2026-02-21T12:44:51.7150892Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3393,%r3394,%r3395,%r3396,%r3397,%r3398,%r3399,%r3400}, {%r3409,%r3410,%r3411,%r3412}, %rd24, %p125, 1, 1; 2026-02-21T12:44:51.7150957Z // end inline asm 2026-02-21T12:44:51.7151020Z // begin inline asm 2026-02-21T12:44:51.7151379Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3433,%r3434,%r3435,%r3436,%r3437,%r3438,%r3439,%r3440}, {%r3389,%r3390,%r3391,%r3392}, %rd25, %p125, 1, 1; 2026-02-21T12:44:51.7151445Z // end inline asm 2026-02-21T12:44:51.7151505Z // begin inline asm 2026-02-21T12:44:51.7151867Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3433,%r3434,%r3435,%r3436,%r3437,%r3438,%r3439,%r3440}, {%r3409,%r3410,%r3411,%r3412}, %rd26, %p125, 1, 1; 2026-02-21T12:44:51.7151936Z // end inline asm 2026-02-21T12:44:51.7152057Z // begin inline asm 2026-02-21T12:44:51.7152478Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 
{%r3473,%r3474,%r3475,%r3476,%r3477,%r3478,%r3479,%r3480}, {%r3389,%r3390,%r3391,%r3392}, %rd27, %p125, 1, 1; 2026-02-21T12:44:51.7152545Z // end inline asm 2026-02-21T12:44:51.7152607Z // begin inline asm 2026-02-21T12:44:51.7152968Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3473,%r3474,%r3475,%r3476,%r3477,%r3478,%r3479,%r3480}, {%r3409,%r3410,%r3411,%r3412}, %rd28, %p125, 1, 1; 2026-02-21T12:44:51.7153027Z // end inline asm 2026-02-21T12:44:51.7153115Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.7153175Z mov.b32 %r4000, 0; 2026-02-21T12:44:51.7153252Z mov.b32 %r3526, %r4000; 2026-02-21T12:44:51.7153322Z mov.b32 %r3527, %r4000; 2026-02-21T12:44:51.7153383Z mov.b32 %r3525, %r3946; 2026-02-21T12:44:51.7153445Z // begin inline asm 2026-02-21T12:44:51.7154028Z // wait for regs: %r3353,%r3354,%r3355,%r3356,%r3357,%r3358,%r3359,%r3360,%r3393,%r3394,%r3395,%r3396,%r3397,%r3398,%r3399,%r3400,%r3433,%r3434,%r3435,%r3436,%r3437,%r3438,%r3439,%r3440,%r3473,%r3474,%r3475,%r3476,%r3477,%r3478,%r3479,%r3480,%r3525,%r3526,%r3527 2026-02-21T12:44:51.7154112Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:51.7154172Z // end inline asm 2026-02-21T12:44:51.7154228Z $L__tmp42: 2026-02-21T12:44:51.7154464Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 2026-02-21T12:44:51.7154536Z add.s64 %rd459, %rd524, -32; 2026-02-21T12:44:51.7154597Z // begin inline asm 2026-02-21T12:44:51.7154664Z mov.u64 %rd458, 0x0; 2026-02-21T12:44:51.7154795Z createpolicy.fractional.L2::evict_last.b64 %rd458, 1.0; 2026-02-21T12:44:51.7154856Z // end inline asm 2026-02-21T12:44:51.7154920Z // begin inline asm 2026-02-21T12:44:51.7154980Z mov.u32 %r3563, 0x0; 2026-02-21T12:44:51.7155147Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r3563 }, [ %rd459 + 0 ], %rd458; 2026-02-21T12:44:51.7155207Z // end inline asm 2026-02-21T12:44:51.7155447Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 2026-02-21T12:44:51.7155509Z bar.sync 
0; 2026-02-21T12:44:51.7155577Z st.shared.b32 [%r24], %r3563; 2026-02-21T12:44:51.7155640Z bar.sync 0; 2026-02-21T12:44:51.7155708Z ld.shared.b16 %rs402, [%r25]; 2026-02-21T12:44:51.7155779Z ld.shared.b16 %rs403, [%r25+256]; 2026-02-21T12:44:51.7155851Z ld.shared.b16 %rs404, [%r25+16]; 2026-02-21T12:44:51.7155918Z ld.shared.b16 %rs405, [%r25+272]; 2026-02-21T12:44:51.7155984Z ld.shared.b16 %rs406, [%r26]; 2026-02-21T12:44:51.7156048Z ld.shared.b16 %rs407, [%r26+256]; 2026-02-21T12:44:51.7156118Z ld.shared.b16 %rs408, [%r26+16]; 2026-02-21T12:44:51.7156182Z ld.shared.b16 %rs409, [%r26+272]; 2026-02-21T12:44:51.7156249Z cvt.f32.bf16 %r3620, %rs402; 2026-02-21T12:44:51.7156318Z cvt.f32.bf16 %r3621, %rs403; 2026-02-21T12:44:51.7156443Z cvt.f32.bf16 %r3622, %rs406; 2026-02-21T12:44:51.7156661Z cvt.f32.bf16 %r3623, %rs407; 2026-02-21T12:44:51.7156732Z cvt.f32.bf16 %r3640, %rs404; 2026-02-21T12:44:51.7156879Z cvt.f32.bf16 %r3641, %rs405; 2026-02-21T12:44:51.7156941Z cvt.f32.bf16 %r3642, %rs408; 2026-02-21T12:44:51.7157003Z cvt.f32.bf16 %r3643, %rs409; 2026-02-21T12:44:51.7157217Z .loc 1 57 87 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.7157297Z add.s64 %rd461, %rd523, 10240; 2026-02-21T12:44:51.7157359Z // begin inline asm 2026-02-21T12:44:51.7157421Z mov.u16 %rs382, 0x0; 2026-02-21T12:44:51.7157505Z ld.global.b8 { %rs382 }, [ %rd461 + 0 ]; 2026-02-21T12:44:51.7157565Z // end inline asm 2026-02-21T12:44:51.7157772Z .loc 1 65 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 2026-02-21T12:44:51.7157834Z bar.sync 0; 2026-02-21T12:44:51.7157902Z st.shared.b8 [%r27], %rs382; 2026-02-21T12:44:51.7157960Z bar.sync 0; 2026-02-21T12:44:51.7158046Z ld.shared.v2.b8 {%rs410, %rs411}, [%r28]; 2026-02-21T12:44:51.7158367Z .loc 1 60 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.7158439Z shl.b16 %rs412, %rs410, 4; 2026-02-21T12:44:51.7158501Z shl.b16 %rs413, %rs411, 4; 
2026-02-21T12:44:51.7158709Z .loc 1 75 58 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.7158782Z selp.b16 %rs414, %rs412, %rs410, %p134; 2026-02-21T12:44:51.7158847Z cvt.s16.s8 %rs415, %rs414; 2026-02-21T12:44:51.7158913Z shr.s16 %rs416, %rs415, 4; 2026-02-21T12:44:51.7158983Z selp.b16 %rs417, %rs413, %rs411, %p134; 2026-02-21T12:44:51.7159044Z cvt.s16.s8 %rs418, %rs417; 2026-02-21T12:44:51.7159111Z shr.s16 %rs419, %rs418, 4; 2026-02-21T12:44:51.7159314Z .loc 1 80 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.7159391Z cvt.rn.f32.s16 %r3876, %rs416; 2026-02-21T12:44:51.7159456Z cvt.rn.f32.s16 %r3877, %rs419; 2026-02-21T12:44:51.7159518Z bar.sync 0; 2026-02-21T12:44:51.7159585Z st.shared.b32 [%r29], %r3876; 2026-02-21T12:44:51.7159653Z st.shared.b32 [%r30], %r3877; 2026-02-21T12:44:51.7159714Z $L__tmp43: 2026-02-21T12:44:51.7159995Z .loc 2 291 36 // standard.py:291:36 @[ cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 ] 2026-02-21T12:44:51.7160056Z // begin inline asm 2026-02-21T12:44:51.7160140Z fence.proxy.async.shared::cta; 2026-02-21T12:44:51.7160198Z // end inline asm 2026-02-21T12:44:51.7160253Z bar.sync 0; 2026-02-21T12:44:51.7160328Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.7160396Z // begin inline asm 2026-02-21T12:44:51.7160781Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3353,%r3354,%r3355,%r3356,%r3357,%r3358,%r3359,%r3360}, {%r3620,%r3621,%r3622,%r3623}, %rd21, %p125, 1, 1; 2026-02-21T12:44:51.7160839Z // end inline asm 2026-02-21T12:44:51.7160907Z // begin inline asm 2026-02-21T12:44:51.7161279Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3353,%r3354,%r3355,%r3356,%r3357,%r3358,%r3359,%r3360}, {%r3640,%r3641,%r3642,%r3643}, %rd22, %p125, 1, 1; 2026-02-21T12:44:51.7161340Z // end inline asm 2026-02-21T12:44:51.7161399Z // begin inline asm 2026-02-21T12:44:51.7161771Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 
{%r3393,%r3394,%r3395,%r3396,%r3397,%r3398,%r3399,%r3400}, {%r3620,%r3621,%r3622,%r3623}, %rd23, %p125, 1, 1; 2026-02-21T12:44:51.7161829Z // end inline asm 2026-02-21T12:44:51.7161889Z // begin inline asm 2026-02-21T12:44:51.7162261Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3393,%r3394,%r3395,%r3396,%r3397,%r3398,%r3399,%r3400}, {%r3640,%r3641,%r3642,%r3643}, %rd24, %p125, 1, 1; 2026-02-21T12:44:51.7162320Z // end inline asm 2026-02-21T12:44:51.7162382Z // begin inline asm 2026-02-21T12:44:51.7162751Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3433,%r3434,%r3435,%r3436,%r3437,%r3438,%r3439,%r3440}, {%r3620,%r3621,%r3622,%r3623}, %rd25, %p125, 1, 1; 2026-02-21T12:44:51.7162893Z // end inline asm 2026-02-21T12:44:51.7162953Z // begin inline asm 2026-02-21T12:44:51.7163373Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3433,%r3434,%r3435,%r3436,%r3437,%r3438,%r3439,%r3440}, {%r3640,%r3641,%r3642,%r3643}, %rd26, %p125, 1, 1; 2026-02-21T12:44:51.7163433Z // end inline asm 2026-02-21T12:44:51.7163494Z // begin inline asm 2026-02-21T12:44:51.7163864Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3473,%r3474,%r3475,%r3476,%r3477,%r3478,%r3479,%r3480}, {%r3620,%r3621,%r3622,%r3623}, %rd27, %p125, 1, 1; 2026-02-21T12:44:51.7163921Z // end inline asm 2026-02-21T12:44:51.7163980Z // begin inline asm 2026-02-21T12:44:51.7164349Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3473,%r3474,%r3475,%r3476,%r3477,%r3478,%r3479,%r3480}, {%r3640,%r3641,%r3642,%r3643}, %rd28, %p125, 1, 1; 2026-02-21T12:44:51.7164413Z // end inline asm 2026-02-21T12:44:51.7164506Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.7164573Z mov.b32 %r3757, %r4000; 2026-02-21T12:44:51.7164643Z mov.b32 %r3758, %r4000; 2026-02-21T12:44:51.7164780Z mov.b32 %r3756, %r3946; 2026-02-21T12:44:51.7164888Z // begin inline asm 2026-02-21T12:44:51.7165460Z // wait for regs: 
%r3353,%r3354,%r3355,%r3356,%r3357,%r3358,%r3359,%r3360,%r3393,%r3394,%r3395,%r3396,%r3397,%r3398,%r3399,%r3400,%r3433,%r3434,%r3435,%r3436,%r3437,%r3438,%r3439,%r3440,%r3473,%r3474,%r3475,%r3476,%r3477,%r3478,%r3479,%r3480,%r3756,%r3757,%r3758 2026-02-21T12:44:51.7165550Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:51.7165609Z // end inline asm 2026-02-21T12:44:51.7165664Z $L__tmp44: 2026-02-21T12:44:51.7165883Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 2026-02-21T12:44:51.7165942Z // begin inline asm 2026-02-21T12:44:51.7166002Z mov.u64 %rd470, 0x0; 2026-02-21T12:44:51.7166134Z createpolicy.fractional.L2::evict_last.b64 %rd470, 1.0; 2026-02-21T12:44:51.7166194Z // end inline asm 2026-02-21T12:44:51.7166252Z // begin inline asm 2026-02-21T12:44:51.7166320Z mov.u32 %r3794, 0x0; 2026-02-21T12:44:51.7166610Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r3794 }, [ %rd524 + 0 ], %rd470; 2026-02-21T12:44:51.7166676Z // end inline asm 2026-02-21T12:44:51.7166881Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 2026-02-21T12:44:51.7166943Z bar.sync 0; 2026-02-21T12:44:51.7167009Z st.shared.b32 [%r24], %r3794; 2026-02-21T12:44:51.7167065Z bar.sync 0; 2026-02-21T12:44:51.7167140Z ld.shared.b16 %rs420, [%r25]; 2026-02-21T12:44:51.7167209Z ld.shared.b16 %rs421, [%r25+256]; 2026-02-21T12:44:51.7167279Z ld.shared.b16 %rs422, [%r25+16]; 2026-02-21T12:44:51.7167362Z ld.shared.b16 %rs423, [%r25+272]; 2026-02-21T12:44:51.7167431Z ld.shared.b16 %rs424, [%r26]; 2026-02-21T12:44:51.7167497Z ld.shared.b16 %rs425, [%r26+256]; 2026-02-21T12:44:51.7167563Z ld.shared.b16 %rs426, [%r26+16]; 2026-02-21T12:44:51.7167638Z ld.shared.b16 %rs427, [%r26+272]; 2026-02-21T12:44:51.7167705Z cvt.f32.bf16 %r3827, %rs420; 2026-02-21T12:44:51.7167768Z cvt.f32.bf16 %r3828, %rs421; 2026-02-21T12:44:51.7167843Z cvt.f32.bf16 %r3829, %rs424; 2026-02-21T12:44:51.7167906Z cvt.f32.bf16 %r3830, %rs425; 2026-02-21T12:44:51.7167968Z 
cvt.f32.bf16 %r3847, %rs422; 2026-02-21T12:44:51.7168031Z cvt.f32.bf16 %r3848, %rs423; 2026-02-21T12:44:51.7168100Z cvt.f32.bf16 %r3849, %rs426; 2026-02-21T12:44:51.7168165Z cvt.f32.bf16 %r3850, %rs427; 2026-02-21T12:44:51.7168369Z .loc 1 57 87 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.7168439Z add.s64 %rd473, %rd523, 20480; 2026-02-21T12:44:51.7168500Z // begin inline asm 2026-02-21T12:44:51.7168561Z mov.u16 %rs383, 0x0; 2026-02-21T12:44:51.7168636Z ld.global.b8 { %rs383 }, [ %rd473 + 0 ]; 2026-02-21T12:44:51.7168698Z // end inline asm 2026-02-21T12:44:51.7168901Z .loc 1 65 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 2026-02-21T12:44:51.7169043Z bar.sync 0; 2026-02-21T12:44:51.7169114Z st.shared.b8 [%r27], %rs383; 2026-02-21T12:44:51.7169231Z bar.sync 0; 2026-02-21T12:44:51.7169313Z ld.shared.v2.b8 {%rs428, %rs429}, [%r28]; 2026-02-21T12:44:51.7169522Z .loc 1 60 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.7169587Z shl.b16 %rs430, %rs428, 4; 2026-02-21T12:44:51.7169652Z shl.b16 %rs431, %rs429, 4; 2026-02-21T12:44:51.7169853Z .loc 1 75 58 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.7169931Z selp.b16 %rs432, %rs430, %rs428, %p134; 2026-02-21T12:44:51.7169994Z cvt.s16.s8 %rs433, %rs432; 2026-02-21T12:44:51.7170057Z shr.s16 %rs434, %rs433, 4; 2026-02-21T12:44:51.7170134Z selp.b16 %rs435, %rs431, %rs429, %p134; 2026-02-21T12:44:51.7170197Z cvt.s16.s8 %rs436, %rs435; 2026-02-21T12:44:51.7170258Z shr.s16 %rs437, %rs436, 4; 2026-02-21T12:44:51.7170481Z .loc 1 80 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.7170615Z cvt.rn.f32.s16 %r3878, %rs434; 2026-02-21T12:44:51.7170827Z cvt.rn.f32.s16 %r3879, %rs437; 2026-02-21T12:44:51.7170890Z bar.sync 0; 2026-02-21T12:44:51.7170961Z st.shared.b32 [%r29], %r3878; 2026-02-21T12:44:51.7171026Z st.shared.b32 [%r30], %r3879; 
2026-02-21T12:44:51.7171084Z $L__tmp45: 2026-02-21T12:44:51.7171367Z .loc 2 291 36 // standard.py:291:36 @[ cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 ] 2026-02-21T12:44:51.7171563Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r3297], {%r3353, %r3393, %r3433, %r3473}; 2026-02-21T12:44:51.7171622Z bar.sync 0; 2026-02-21T12:44:51.7171688Z // begin inline asm 2026-02-21T12:44:51.7171827Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3851}, [%r3796]; 2026-02-21T12:44:51.7171886Z // end inline asm 2026-02-21T12:44:51.7171942Z bar.sync 0; 2026-02-21T12:44:51.7172130Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r3297], {%r3355, %r3395, %r3435, %r3475}; 2026-02-21T12:44:51.7172187Z bar.sync 0; 2026-02-21T12:44:51.7172251Z // begin inline asm 2026-02-21T12:44:51.7172389Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3853}, [%r3796]; 2026-02-21T12:44:51.7172446Z // end inline asm 2026-02-21T12:44:51.7172500Z bar.sync 0; 2026-02-21T12:44:51.7172680Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r3297], {%r3354, %r3394, %r3434, %r3474}; 2026-02-21T12:44:51.7172743Z bar.sync 0; 2026-02-21T12:44:51.7172808Z // begin inline asm 2026-02-21T12:44:51.7172941Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3852}, [%r3796]; 2026-02-21T12:44:51.7173009Z // end inline asm 2026-02-21T12:44:51.7173066Z bar.sync 0; 2026-02-21T12:44:51.7173244Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r3297], {%r3356, %r3396, %r3436, %r3476}; 2026-02-21T12:44:51.7173308Z bar.sync 0; 2026-02-21T12:44:51.7173370Z // begin inline asm 2026-02-21T12:44:51.7173504Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3854}, [%r3796]; 2026-02-21T12:44:51.7173572Z // end inline asm 2026-02-21T12:44:51.7173628Z bar.sync 0; 2026-02-21T12:44:51.7173809Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r3297], {%r3357, %r3397, %r3437, %r3477}; 2026-02-21T12:44:51.7173871Z bar.sync 0; 2026-02-21T12:44:51.7173936Z // begin inline asm 2026-02-21T12:44:51.7174064Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3855}, [%r3796]; 
2026-02-21T12:44:51.7174123Z // end inline asm 2026-02-21T12:44:51.7174183Z bar.sync 0; 2026-02-21T12:44:51.7174360Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r3297], {%r3359, %r3399, %r3439, %r3479}; 2026-02-21T12:44:51.7174416Z bar.sync 0; 2026-02-21T12:44:51.7174475Z // begin inline asm 2026-02-21T12:44:51.7174609Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3857}, [%r3796]; 2026-02-21T12:44:51.7174666Z // end inline asm 2026-02-21T12:44:51.7174722Z bar.sync 0; 2026-02-21T12:44:51.7174906Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r3297], {%r3358, %r3398, %r3438, %r3478}; 2026-02-21T12:44:51.7175034Z bar.sync 0; 2026-02-21T12:44:51.7175093Z // begin inline asm 2026-02-21T12:44:51.7175226Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3856}, [%r3796]; 2026-02-21T12:44:51.7175339Z // end inline asm 2026-02-21T12:44:51.7175395Z bar.sync 0; 2026-02-21T12:44:51.7175572Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r3297], {%r3360, %r3400, %r3440, %r3480}; 2026-02-21T12:44:51.7175633Z bar.sync 0; 2026-02-21T12:44:51.7175691Z // begin inline asm 2026-02-21T12:44:51.7175819Z ldmatrix.sync.aligned.m8n8.x1.shared.b16 {%r3858}, [%r3796]; 2026-02-21T12:44:51.7175881Z // end inline asm 2026-02-21T12:44:51.7175941Z // begin inline asm 2026-02-21T12:44:51.7176021Z fence.proxy.async.shared::cta; 2026-02-21T12:44:51.7176077Z // end inline asm 2026-02-21T12:44:51.7176162Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.7176225Z shl.b32 %r3880, %r3875, 8; 2026-02-21T12:44:51.7176288Z and.b32 %r3881, %r3880, 3072; 2026-02-21T12:44:51.7176360Z add.s32 %r3882, %r3881, %r3946; 2026-02-21T12:44:51.7176429Z bfe.u32 %r3883, %r3882, 4, 14; 2026-02-21T12:44:51.7176635Z cvt.u64.u32 %rd476, %r3883; 2026-02-21T12:44:51.7176803Z or.b64 %rd474, %rd476, -9223371899399045120; 2026-02-21T12:44:51.7176929Z // begin inline asm 2026-02-21T12:44:51.7177331Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3851,%r3852,%r3853,%r3854,%r3855,%r3856,%r3857,%r3858}, {%r3827,%r3828,%r3829,%r3830}, %rd474, 
%p125, 1, 1; 2026-02-21T12:44:51.7177390Z // end inline asm 2026-02-21T12:44:51.7177460Z add.s32 %r3884, %r3882, 32; 2026-02-21T12:44:51.7177524Z bfe.u32 %r3885, %r3884, 4, 14; 2026-02-21T12:44:51.7177587Z cvt.u64.u32 %rd477, %r3885; 2026-02-21T12:44:51.7177672Z or.b64 %rd475, %rd477, -9223371899399045120; 2026-02-21T12:44:51.7177736Z // begin inline asm 2026-02-21T12:44:51.7178113Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3851,%r3852,%r3853,%r3854,%r3855,%r3856,%r3857,%r3858}, {%r3847,%r3848,%r3849,%r3850}, %rd475, %p125, 1, 1; 2026-02-21T12:44:51.7178171Z // end inline asm 2026-02-21T12:44:51.7178255Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.7178317Z mov.b32 %r3860, %r4000; 2026-02-21T12:44:51.7178378Z mov.b32 %r3861, %r4000; 2026-02-21T12:44:51.7178448Z mov.b32 %r3859, %r3946; 2026-02-21T12:44:51.7178509Z // begin inline asm 2026-02-21T12:44:51.7178688Z // wait for regs: %r3851,%r3852,%r3853,%r3854,%r3855,%r3856,%r3857,%r3858,%r3859,%r3860,%r3861 2026-02-21T12:44:51.7178767Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:51.7178826Z // end inline asm 2026-02-21T12:44:51.7178880Z $L__tmp46: 2026-02-21T12:44:51.7179104Z .loc 1 43 126 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:43:126 2026-02-21T12:44:51.7179173Z add.s64 %rd525, %rd525, 24; 2026-02-21T12:44:51.7179234Z add.s64 %rd524, %rd524, 96; 2026-02-21T12:44:51.7179299Z add.s64 %rd523, %rd523, 30720; 2026-02-21T12:44:51.7179372Z setp.lt.u64 %p127, %rd525, 4056; 2026-02-21T12:44:51.7179433Z @%p127 bra $L__BB0_7; 2026-02-21T12:44:51.7179521Z // %bb.27: // %.preheader 2026-02-21T12:44:51.7179629Z // in Loop: Header=BB0_6 Depth=1 2026-02-21T12:44:51.7179849Z .loc 1 51 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:32 2026-02-21T12:44:51.7179913Z add.s64 %rd491, %rd110, %rd503; 2026-02-21T12:44:51.7179976Z add.s64 %rd479, %rd491, 16320; 2026-02-21T12:44:51.7180181Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 
2026-02-21T12:44:51.7180242Z // begin inline asm 2026-02-21T12:44:51.7180301Z mov.u64 %rd478, 0x0; 2026-02-21T12:44:51.7180431Z createpolicy.fractional.L2::evict_last.b64 %rd478, 1.0; 2026-02-21T12:44:51.7180489Z // end inline asm 2026-02-21T12:44:51.7180547Z // begin inline asm 2026-02-21T12:44:51.7180607Z mov.u32 %r3886, 0x0; 2026-02-21T12:44:51.7180777Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r3886 }, [ %rd479 + 0 ], %rd478; 2026-02-21T12:44:51.7180836Z // end inline asm 2026-02-21T12:44:51.7181128Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 2026-02-21T12:44:51.7181251Z bar.sync 0; 2026-02-21T12:44:51.7181319Z st.shared.b32 [%r24], %r3886; 2026-02-21T12:44:51.7181375Z bar.sync 0; 2026-02-21T12:44:51.7181446Z ld.shared.b16 %rs440, [%r33]; 2026-02-21T12:44:51.7181518Z ld.shared.b16 %rs441, [%r33+256]; 2026-02-21T12:44:51.7181587Z ld.shared.b16 %rs442, [%r33+16]; 2026-02-21T12:44:51.7181654Z ld.shared.b16 %rs443, [%r33+272]; 2026-02-21T12:44:51.7181723Z ld.shared.b16 %rs444, [%r34]; 2026-02-21T12:44:51.7181787Z ld.shared.b16 %rs445, [%r34+256]; 2026-02-21T12:44:51.7181853Z ld.shared.b16 %rs446, [%r34+16]; 2026-02-21T12:44:51.7181923Z ld.shared.b16 %rs447, [%r34+272]; 2026-02-21T12:44:51.7181989Z cvt.f32.bf16 %r3903, %rs440; 2026-02-21T12:44:51.7182051Z cvt.f32.bf16 %r3904, %rs441; 2026-02-21T12:44:51.7182116Z cvt.f32.bf16 %r3905, %rs444; 2026-02-21T12:44:51.7182182Z cvt.f32.bf16 %r3906, %rs445; 2026-02-21T12:44:51.7182246Z cvt.f32.bf16 %r3923, %rs442; 2026-02-21T12:44:51.7182314Z cvt.f32.bf16 %r3924, %rs443; 2026-02-21T12:44:51.7182429Z cvt.f32.bf16 %r3925, %rs446; 2026-02-21T12:44:51.7182535Z cvt.f32.bf16 %r3926, %rs447; 2026-02-21T12:44:51.7182752Z .loc 1 57 34 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:34 2026-02-21T12:44:51.7182819Z add.s64 %rd492, %rd111, %rd502; 2026-02-21T12:44:51.7182881Z add.s64 %rd481, %rd492, 5222400; 2026-02-21T12:44:51.7183091Z .loc 1 57 87 // 
cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.7183152Z // begin inline asm 2026-02-21T12:44:51.7183210Z mov.u16 %rs438, 0x0; 2026-02-21T12:44:51.7183288Z ld.global.b8 { %rs438 }, [ %rd481 + 0 ]; 2026-02-21T12:44:51.7183350Z // end inline asm 2026-02-21T12:44:51.7183549Z .loc 1 65 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 2026-02-21T12:44:51.7183608Z bar.sync 0; 2026-02-21T12:44:51.7183679Z st.shared.b8 [%r27], %rs438; 2026-02-21T12:44:51.7183734Z bar.sync 0; 2026-02-21T12:44:51.7183817Z ld.shared.v2.b8 {%rs448, %rs449}, [%r28]; 2026-02-21T12:44:51.7184028Z .loc 1 60 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.7184092Z shl.b16 %rs450, %rs448, 4; 2026-02-21T12:44:51.7184158Z shl.b16 %rs451, %rs449, 4; 2026-02-21T12:44:51.7184360Z .loc 1 75 58 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.7184440Z selp.b16 %rs452, %rs450, %rs448, %p134; 2026-02-21T12:44:51.7184517Z cvt.s16.s8 %rs453, %rs452; 2026-02-21T12:44:51.7184579Z shr.s16 %rs454, %rs453, 4; 2026-02-21T12:44:51.7184653Z selp.b16 %rs455, %rs451, %rs449, %p134; 2026-02-21T12:44:51.7184716Z cvt.s16.s8 %rs456, %rs455; 2026-02-21T12:44:51.7184779Z shr.s16 %rs457, %rs456, 4; 2026-02-21T12:44:51.7184981Z .loc 1 80 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.7185055Z cvt.rn.f32.s16 %r4016, %rs454; 2026-02-21T12:44:51.7185120Z cvt.rn.f32.s16 %r4017, %rs457; 2026-02-21T12:44:51.7185179Z bar.sync 0; 2026-02-21T12:44:51.7185247Z st.shared.b32 [%r29], %r4016; 2026-02-21T12:44:51.7185310Z st.shared.b32 [%r30], %r4017; 2026-02-21T12:44:51.7185363Z $L__tmp47: 2026-02-21T12:44:51.7185643Z .loc 2 291 36 // standard.py:291:36 @[ cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 ] 2026-02-21T12:44:51.7185705Z // begin inline asm 2026-02-21T12:44:51.7185781Z fence.proxy.async.shared::cta; 2026-02-21T12:44:51.7185842Z // end inline asm 
2026-02-21T12:44:51.7185900Z bar.sync 0; 2026-02-21T12:44:51.7185980Z shfl.sync.idx.b32 %r4018, %r3, 0, 31, -1; 2026-02-21T12:44:51.7186053Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.7186117Z shl.b32 %r4019, %r4018, 8; 2026-02-21T12:44:51.7186176Z and.b32 %r4020, %r4019, 3072; 2026-02-21T12:44:51.7186306Z add.s32 %r4021, %r4020, %r3946; 2026-02-21T12:44:51.7186372Z bfe.u32 %r4022, %r4021, 4, 14; 2026-02-21T12:44:51.7186439Z cvt.u64.u32 %rd493, %r4022; 2026-02-21T12:44:51.7186743Z or.b64 %rd482, %rd493, -9223371899399045120; 2026-02-21T12:44:51.7186806Z // begin inline asm 2026-02-21T12:44:51.7187198Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3851,%r3852,%r3853,%r3854,%r3855,%r3856,%r3857,%r3858}, {%r3903,%r3904,%r3905,%r3906}, %rd482, %p125, 1, 1; 2026-02-21T12:44:51.7187257Z // end inline asm 2026-02-21T12:44:51.7187321Z add.s32 %r4024, %r4020, %r3257; 2026-02-21T12:44:51.7187387Z bfe.u32 %r4025, %r4024, 4, 14; 2026-02-21T12:44:51.7187461Z cvt.u64.u32 %rd494, %r4025; 2026-02-21T12:44:51.7187543Z or.b64 %rd483, %rd494, -9223371899399045120; 2026-02-21T12:44:51.7187603Z // begin inline asm 2026-02-21T12:44:51.7187991Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3851,%r3852,%r3853,%r3854,%r3855,%r3856,%r3857,%r3858}, {%r3923,%r3924,%r3925,%r3926}, %rd483, %p125, 1, 1; 2026-02-21T12:44:51.7188054Z // end inline asm 2026-02-21T12:44:51.7188128Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.7188194Z mov.b32 %r3936, %r4000; 2026-02-21T12:44:51.7188372Z mov.b32 %r3937, %r4000; 2026-02-21T12:44:51.7188435Z mov.b32 %r3935, %r3946; 2026-02-21T12:44:51.7188550Z // begin inline asm 2026-02-21T12:44:51.7188743Z // wait for regs: %r3851,%r3852,%r3853,%r3854,%r3855,%r3856,%r3857,%r3858,%r3935,%r3936,%r3937 2026-02-21T12:44:51.7188818Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:44:51.7188874Z // end inline asm 2026-02-21T12:44:51.7188931Z $L__tmp48: 2026-02-21T12:44:51.7189151Z .loc 1 51 32 // 
cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:32 2026-02-21T12:44:51.7189218Z add.s64 %rd485, %rd491, 16352; 2026-02-21T12:44:51.7189433Z .loc 1 51 80 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:51:80 2026-02-21T12:44:51.7189494Z // begin inline asm 2026-02-21T12:44:51.7189555Z mov.u64 %rd484, 0x0; 2026-02-21T12:44:51.7189694Z createpolicy.fractional.L2::evict_last.b64 %rd484, 1.0; 2026-02-21T12:44:51.7189752Z // end inline asm 2026-02-21T12:44:51.7189814Z // begin inline asm 2026-02-21T12:44:51.7189874Z mov.u32 %r3949, 0x0; 2026-02-21T12:44:51.7190045Z ld.global.L1::evict_last.L2::cache_hint.b32 { %r3949 }, [ %rd485 + 0 ], %rd484; 2026-02-21T12:44:51.7190104Z // end inline asm 2026-02-21T12:44:51.7190319Z .loc 1 55 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:55:32 2026-02-21T12:44:51.7190381Z bar.sync 0; 2026-02-21T12:44:51.7190450Z st.shared.b32 [%r24], %r3949; 2026-02-21T12:44:51.7190505Z bar.sync 0; 2026-02-21T12:44:51.7190570Z ld.shared.b16 %rs458, [%r33]; 2026-02-21T12:44:51.7190643Z ld.shared.b16 %rs459, [%r33+256]; 2026-02-21T12:44:51.7190711Z ld.shared.b16 %rs460, [%r33+16]; 2026-02-21T12:44:51.7190777Z ld.shared.b16 %rs461, [%r33+272]; 2026-02-21T12:44:51.7190845Z ld.shared.b16 %rs462, [%r34]; 2026-02-21T12:44:51.7190909Z ld.shared.b16 %rs463, [%r34+256]; 2026-02-21T12:44:51.7190977Z ld.shared.b16 %rs464, [%r34+16]; 2026-02-21T12:44:51.7191045Z ld.shared.b16 %rs465, [%r34+272]; 2026-02-21T12:44:51.7191115Z cvt.f32.bf16 %r3966, %rs458; 2026-02-21T12:44:51.7191178Z cvt.f32.bf16 %r3967, %rs459; 2026-02-21T12:44:51.7191240Z cvt.f32.bf16 %r3968, %rs462; 2026-02-21T12:44:51.7191306Z cvt.f32.bf16 %r3969, %rs463; 2026-02-21T12:44:51.7191367Z cvt.f32.bf16 %r3986, %rs460; 2026-02-21T12:44:51.7191429Z cvt.f32.bf16 %r3987, %rs461; 2026-02-21T12:44:51.7191492Z cvt.f32.bf16 %r3988, %rs464; 2026-02-21T12:44:51.7191553Z cvt.f32.bf16 %r3989, %rs465; 2026-02-21T12:44:51.7191764Z .loc 1 57 34 // 
cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:34 2026-02-21T12:44:51.7191829Z add.s64 %rd487, %rd492, 5232640; 2026-02-21T12:44:51.7192039Z .loc 1 57 87 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:57:87 2026-02-21T12:44:51.7192099Z // begin inline asm 2026-02-21T12:44:51.7192255Z mov.u16 %rs439, 0x0; 2026-02-21T12:44:51.7192337Z ld.global.b8 { %rs439 }, [ %rd487 + 0 ]; 2026-02-21T12:44:51.7192395Z // end inline asm 2026-02-21T12:44:51.7192657Z .loc 1 65 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:65:28 2026-02-21T12:44:51.7192717Z bar.sync 0; 2026-02-21T12:44:51.7192782Z st.shared.b8 [%r27], %rs439; 2026-02-21T12:44:51.7192838Z bar.sync 0; 2026-02-21T12:44:51.7192919Z ld.shared.v2.b8 {%rs466, %rs467}, [%r28]; 2026-02-21T12:44:51.7193135Z .loc 1 60 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:60:28 2026-02-21T12:44:51.7193200Z shl.b16 %rs468, %rs466, 4; 2026-02-21T12:44:51.7193260Z shl.b16 %rs469, %rs467, 4; 2026-02-21T12:44:51.7193470Z .loc 1 75 58 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:75:58 2026-02-21T12:44:51.7193546Z selp.b16 %rs470, %rs468, %rs466, %p134; 2026-02-21T12:44:51.7193609Z cvt.s16.s8 %rs471, %rs470; 2026-02-21T12:44:51.7193676Z shr.s16 %rs472, %rs471, 4; 2026-02-21T12:44:51.7193750Z selp.b16 %rs473, %rs469, %rs467, %p134; 2026-02-21T12:44:51.7193863Z cvt.s16.s8 %rs474, %rs473; 2026-02-21T12:44:51.7193979Z shr.s16 %rs475, %rs474, 4; 2026-02-21T12:44:51.7194211Z .loc 1 80 32 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:80:32 2026-02-21T12:44:51.7194285Z cvt.rn.f32.s16 %r4026, %rs472; 2026-02-21T12:44:51.7194353Z cvt.rn.f32.s16 %r4027, %rs475; 2026-02-21T12:44:51.7194415Z bar.sync 0; 2026-02-21T12:44:51.7194483Z st.shared.b32 [%r29], %r4026; 2026-02-21T12:44:51.7194548Z st.shared.b32 [%r30], %r4027; 2026-02-21T12:44:51.7194605Z $L__tmp49: 2026-02-21T12:44:51.7194888Z .loc 2 291 36 // standard.py:291:36 @[ 
cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:87:40 ] 2026-02-21T12:44:51.7194950Z // begin inline asm 2026-02-21T12:44:51.7195027Z fence.proxy.async.shared::cta; 2026-02-21T12:44:51.7195087Z // end inline asm 2026-02-21T12:44:51.7195145Z bar.sync 0; 2026-02-21T12:44:51.7195226Z shfl.sync.idx.b32 %r4028, %r3, 0, 31, -1; 2026-02-21T12:44:51.7195307Z wgmma.fence.sync.aligned; 2026-02-21T12:44:51.7195384Z shl.b32 %r4029, %r4028, 8; 2026-02-21T12:44:51.7195450Z and.b32 %r4030, %r4029, 3072; 2026-02-21T12:44:51.7195514Z add.s32 %r4031, %r4030, %r3946; 2026-02-21T12:44:51.7195581Z bfe.u32 %r4032, %r4031, 4, 14; 2026-02-21T12:44:51.7195645Z cvt.u64.u32 %rd495, %r4032; 2026-02-21T12:44:51.7195729Z or.b64 %rd488, %rd495, -9223371899399045120; 2026-02-21T12:44:51.7195793Z // begin inline asm 2026-02-21T12:44:51.7196185Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3851,%r3852,%r3853,%r3854,%r3855,%r3856,%r3857,%r3858}, {%r3966,%r3967,%r3968,%r3969}, %rd488, %p125, 1, 1; 2026-02-21T12:44:51.7196247Z // end inline asm 2026-02-21T12:44:51.7196315Z add.s32 %r4033, %r4030, %r3257; 2026-02-21T12:44:51.7196378Z bfe.u32 %r4034, %r4033, 4, 14; 2026-02-21T12:44:51.7196441Z cvt.u64.u32 %rd496, %r4034; 2026-02-21T12:44:51.7196648Z or.b64 %rd489, %rd496, -9223371899399045120; 2026-02-21T12:44:51.7196717Z // begin inline asm 2026-02-21T12:44:51.7197104Z wgmma.mma_async.sync.aligned.m64n16k8.f32.tf32.tf32 {%r3851,%r3852,%r3853,%r3854,%r3855,%r3856,%r3857,%r3858}, {%r3986,%r3987,%r3988,%r3989}, %rd489, %p125, 1, 1; 2026-02-21T12:44:51.7197163Z // end inline asm 2026-02-21T12:44:51.7197244Z wgmma.commit_group.sync.aligned; 2026-02-21T12:44:51.7197305Z mov.b32 %r3999, %r4000; 2026-02-21T12:44:51.7197365Z mov.b32 %r3998, %r3946; 2026-02-21T12:44:51.7197425Z // begin inline asm 2026-02-21T12:44:51.7197614Z // wait for regs: %r3851,%r3852,%r3853,%r3854,%r3855,%r3856,%r3857,%r3858,%r3998,%r3999,%r4000 2026-02-21T12:44:51.7197691Z wgmma.wait_group.sync.aligned 0; 
2026-02-21T12:44:51.7197749Z // end inline asm 2026-02-21T12:44:51.7197807Z $L__tmp50: 2026-02-21T12:44:51.7198020Z .loc 1 90 28 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:90:28 2026-02-21T12:44:51.7198177Z cvt.rn.bf16x2.f32 %r4035, %r3852, %r3851; 2026-02-21T12:44:51.7198255Z cvt.rn.bf16x2.f32 %r4036, %r3854, %r3853; 2026-02-21T12:44:51.7198328Z cvt.rn.bf16x2.f32 %r4037, %r3856, %r3855; 2026-02-21T12:44:51.7198458Z cvt.rn.bf16x2.f32 %r4038, %r3858, %r3857; 2026-02-21T12:44:51.7198664Z .loc 1 91 22 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:91:22 2026-02-21T12:44:51.7198744Z mad.lo.s64 %rd497, %rd108, 2560, %rd123; 2026-02-21T12:44:51.7198809Z shl.b64 %rd498, %rd109, 1; 2026-02-21T12:44:51.7198875Z add.s64 %rd499, %rd497, %rd498; 2026-02-21T12:44:51.7198943Z add.s64 %rd490, %rd499, %rd504; 2026-02-21T12:44:51.7199143Z .loc 1 91 81 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:91:81 2026-02-21T12:44:51.7199201Z bar.sync 0; 2026-02-21T12:44:51.7199394Z stmatrix.sync.aligned.m8n8.x4.shared.b16 [%r35], {%r4035, %r4036, %r4037, %r4038}; 2026-02-21T12:44:51.7199450Z bar.sync 0; 2026-02-21T12:44:51.7199566Z ld.shared.v4.b32 {%r4012, %r4013, %r4014, %r4015}, [%r36]; 2026-02-21T12:44:51.7199630Z // begin inline asm 2026-02-21T12:44:51.7199826Z st.global.v4.b32 [ %rd490 + 0 ], { %r4012, %r4013, %r4014, %r4015 }; 2026-02-21T12:44:51.7199956Z // end inline asm 2026-02-21T12:44:51.7200181Z .loc 1 22 120 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:22:120 2026-02-21T12:44:51.7200251Z add.s64 %rd522, %rd522, 1; 2026-02-21T12:44:51.7200321Z setp.eq.b64 %p133, %rd522, %rd2; 2026-02-21T12:44:51.7200386Z @%p133 bra $L__BB0_28; 2026-02-21T12:44:51.7200450Z bra.uni $L__BB0_6; 2026-02-21T12:44:51.7200540Z $L__BB0_28: // %._crit_edge 2026-02-21T12:44:51.7200744Z .loc 1 22 4 // cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py:22:4 2026-02-21T12:44:51.7200799Z ret; 2026-02-21T12:44:51.7200859Z $L__tmp51: 
2026-02-21T12:44:51.7200916Z $L__func_end0: 2026-02-21T12:44:51.7201005Z // -- End function 2026-02-21T12:44:51.7201068Z } 2026-02-21T12:44:51.7201321Z .file 1 "/tmp/torchinductor_root/we/cwedbvvwmfzugixoxxston5hwqig43cunfrn34nueroqdjinzl3r.py" 2026-02-21T12:44:51.7201539Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T12:44:51.7201609Z .section .debug_abbrev 2026-02-21T12:44:51.7201663Z { 2026-02-21T12:44:51.7201760Z .b8 1 // Abbreviation Code 2026-02-21T12:44:51.7201857Z .b8 17 // DW_TAG_compile_unit 2026-02-21T12:44:51.7201948Z .b8 1 // DW_CHILDREN_yes 2026-02-21T12:44:51.7202038Z .b8 37 // DW_AT_producer 2026-02-21T12:44:51.7202122Z .b8 8 // DW_FORM_string 2026-02-21T12:44:51.7202203Z .b8 19 // DW_AT_language 2026-02-21T12:44:51.7202286Z .b8 5 // DW_FORM_data2 2026-02-21T12:44:51.7202367Z .b8 3 // DW_AT_name 2026-02-21T12:44:51.7202453Z .b8 8 // DW_FORM_string 2026-02-21T12:44:51.7202540Z .b8 16 // DW_AT_stmt_list 2026-02-21T12:44:51.7202624Z .b8 6 // DW_FORM_data4 2026-02-21T12:44:51.7202704Z .b8 27 // DW_AT_comp_dir 2026-02-21T12:44:51.7202788Z .b8 8 // DW_FORM_string 2026-02-21T12:44:51.7202877Z .b8 0 // EOM(1) 2026-02-21T12:44:51.7202952Z .b8 0 // EOM(2) 2026-02-21T12:44:51.7203047Z .b8 2 // Abbreviation Code 2026-02-21T12:44:51.7203133Z .b8 46 // DW_TAG_subprogram 2026-02-21T12:44:51.7203212Z .b8 0 // DW_CHILDREN_no 2026-02-21T12:44:51.7203289Z .b8 3 // DW_AT_name 2026-02-21T12:44:51.7203368Z .b8 8 // DW_FORM_string 2026-02-21T12:44:51.7203510Z .b8 32 // DW_AT_inline 2026-02-21T12:44:51.7203594Z .b8 11 // DW_FORM_data1 2026-02-21T12:44:51.7203712Z .b8 0 // EOM(1) 2026-02-21T12:44:51.7203782Z .b8 0 // EOM(2) 2026-02-21T12:44:51.7203868Z .b8 3 // Abbreviation Code 2026-02-21T12:44:51.7203960Z .b8 46 // DW_TAG_subprogram 2026-02-21T12:44:51.7204043Z .b8 1 // DW_CHILDREN_yes 2026-02-21T12:44:51.7204124Z .b8 17 // DW_AT_low_pc 2026-02-21T12:44:51.7204205Z .b8 1 // DW_FORM_addr 2026-02-21T12:44:51.7204287Z 
.b8 18 // DW_AT_high_pc 2026-02-21T12:44:51.7204365Z .b8 1 // DW_FORM_addr 2026-02-21T12:44:51.7204461Z .b8 49 // DW_AT_abstract_origin 2026-02-21T12:44:51.7204544Z .b8 19 // DW_FORM_ref4 2026-02-21T12:44:51.7204719Z .b8 0 // EOM(1) 2026-02-21T12:44:51.7204797Z .b8 0 // EOM(2) 2026-02-21T12:44:51.7204886Z .b8 4 // Abbreviation Code 2026-02-21T12:44:51.7204990Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T12:44:51.7205072Z .b8 0 // DW_CHILDREN_no 2026-02-21T12:44:51.7205167Z .b8 49 // DW_AT_abstract_origin 2026-02-21T12:44:51.7205245Z .b8 19 // DW_FORM_ref4 2026-02-21T12:44:51.7205323Z .b8 17 // DW_AT_low_pc 2026-02-21T12:44:51.7205402Z .b8 1 // DW_FORM_addr 2026-02-21T12:44:51.7205483Z .b8 18 // DW_AT_high_pc 2026-02-21T12:44:51.7205560Z .b8 1 // DW_FORM_addr 2026-02-21T12:44:51.7205642Z .b8 88 // DW_AT_call_file 2026-02-21T12:44:51.7205730Z .b8 11 // DW_FORM_data1 2026-02-21T12:44:51.7205811Z .b8 89 // DW_AT_call_line 2026-02-21T12:44:51.7205889Z .b8 11 // DW_FORM_data1 2026-02-21T12:44:51.7205975Z .b8 87 // DW_AT_call_column 2026-02-21T12:44:51.7206053Z .b8 11 // DW_FORM_data1 2026-02-21T12:44:51.7206123Z .b8 0 // EOM(1) 2026-02-21T12:44:51.7206195Z .b8 0 // EOM(2) 2026-02-21T12:44:51.7206264Z .b8 0 // EOM(3) 2026-02-21T12:44:51.7206317Z } 2026-02-21T12:44:51.7206381Z .section .debug_info 2026-02-21T12:44:51.7206436Z { 2026-02-21T12:44:51.7206642Z .b32 178 // Length of Unit 2026-02-21T12:44:51.7206741Z .b8 2 // DWARF version number 2026-02-21T12:44:51.7206798Z .b8 0 2026-02-21T12:44:51.7206938Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T12:44:51.7207047Z .b8 8 // Address Size (in bytes) 2026-02-21T12:44:51.7207166Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T12:44:51.7207253Z .b8 116 // DW_AT_producer 2026-02-21T12:44:51.7207308Z .b8 114 2026-02-21T12:44:51.7207365Z .b8 105 2026-02-21T12:44:51.7207422Z .b8 116 2026-02-21T12:44:51.7207473Z .b8 111 2026-02-21T12:44:51.7207525Z .b8 110 2026-02-21T12:44:51.7207590Z .b8 0 2026-02-21T12:44:51.7207673Z .b8 2 // DW_AT_language 2026-02-21T12:44:51.7207724Z .b8 0 2026-02-21T12:44:51.7207803Z .b8 99 // DW_AT_name 2026-02-21T12:44:51.7207861Z .b8 119 2026-02-21T12:44:51.7208025Z .b8 101 2026-02-21T12:44:51.7208078Z .b8 100 2026-02-21T12:44:51.7208135Z .b8 98 2026-02-21T12:44:51.7208186Z .b8 118 2026-02-21T12:44:51.7208239Z .b8 118 2026-02-21T12:44:51.7208353Z .b8 119 2026-02-21T12:44:51.7208410Z .b8 109 2026-02-21T12:44:51.7208462Z .b8 102 2026-02-21T12:44:51.7208514Z .b8 122 2026-02-21T12:44:51.7208565Z .b8 117 2026-02-21T12:44:51.7208620Z .b8 103 2026-02-21T12:44:51.7208671Z .b8 105 2026-02-21T12:44:51.7208722Z .b8 120 2026-02-21T12:44:51.7208790Z .b8 111 2026-02-21T12:44:51.7208844Z .b8 120 2026-02-21T12:44:51.7208896Z .b8 120 2026-02-21T12:44:51.7208948Z .b8 115 2026-02-21T12:44:51.7209004Z .b8 116 2026-02-21T12:44:51.7209055Z .b8 111 2026-02-21T12:44:51.7209108Z .b8 110 2026-02-21T12:44:51.7209162Z .b8 53 2026-02-21T12:44:51.7209214Z .b8 104 2026-02-21T12:44:51.7209266Z .b8 119 2026-02-21T12:44:51.7209317Z .b8 113 2026-02-21T12:44:51.7209373Z .b8 105 2026-02-21T12:44:51.7209424Z .b8 103 2026-02-21T12:44:51.7209476Z .b8 52 2026-02-21T12:44:51.7209527Z .b8 51 2026-02-21T12:44:51.7209585Z .b8 99 2026-02-21T12:44:51.7209636Z .b8 117 2026-02-21T12:44:51.7209689Z .b8 110 2026-02-21T12:44:51.7209744Z .b8 102 2026-02-21T12:44:51.7209796Z .b8 114 2026-02-21T12:44:51.7209925Z .b8 110 2026-02-21T12:44:51.7210059Z .b8 51 2026-02-21T12:44:51.7210120Z .b8 52 2026-02-21T12:44:51.7210173Z .b8 110 2026-02-21T12:44:51.7210228Z .b8 117 
2026-02-21T12:44:51.7210284Z .b8 101 2026-02-21T12:44:51.7210336Z .b8 114 2026-02-21T12:44:51.7210387Z .b8 111 2026-02-21T12:44:51.7210439Z .b8 113 2026-02-21T12:44:51.7210494Z .b8 100 2026-02-21T12:44:51.7210544Z .b8 106 2026-02-21T12:44:51.7210598Z .b8 105 2026-02-21T12:44:51.7210654Z .b8 110 2026-02-21T12:44:51.7210705Z .b8 122 2026-02-21T12:44:51.7210757Z .b8 108 2026-02-21T12:44:51.7210810Z .b8 51 2026-02-21T12:44:51.7210866Z .b8 114 2026-02-21T12:44:51.7210916Z .b8 46 2026-02-21T12:44:51.7210968Z .b8 112 2026-02-21T12:44:51.7211020Z .b8 121 2026-02-21T12:44:51.7211073Z .b8 0 2026-02-21T12:44:51.7211179Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T12:44:51.7211268Z .b8 47 // DW_AT_comp_dir 2026-02-21T12:44:51.7211323Z .b8 116 2026-02-21T12:44:51.7211375Z .b8 109 2026-02-21T12:44:51.7211430Z .b8 112 2026-02-21T12:44:51.7211484Z .b8 47 2026-02-21T12:44:51.7211544Z .b8 116 2026-02-21T12:44:51.7211596Z .b8 111 2026-02-21T12:44:51.7211646Z .b8 114 2026-02-21T12:44:51.7211700Z .b8 99 2026-02-21T12:44:51.7211751Z .b8 104 2026-02-21T12:44:51.7211803Z .b8 105 2026-02-21T12:44:51.7211854Z .b8 110 2026-02-21T12:44:51.7211909Z .b8 100 2026-02-21T12:44:51.7211959Z .b8 117 2026-02-21T12:44:51.7212010Z .b8 99 2026-02-21T12:44:51.7212069Z .b8 116 2026-02-21T12:44:51.7212131Z .b8 111 2026-02-21T12:44:51.7212185Z .b8 114 2026-02-21T12:44:51.7212235Z .b8 95 2026-02-21T12:44:51.7212291Z .b8 114 2026-02-21T12:44:51.7212343Z .b8 111 2026-02-21T12:44:51.7212397Z .b8 111 2026-02-21T12:44:51.7212448Z .b8 116 2026-02-21T12:44:51.7212502Z .b8 47 2026-02-21T12:44:51.7212555Z .b8 119 2026-02-21T12:44:51.7212608Z .b8 101 2026-02-21T12:44:51.7212664Z .b8 0 2026-02-21T12:44:51.7212779Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T12:44:51.7212862Z .b8 95 // DW_AT_name 2026-02-21T12:44:51.7212920Z .b8 104 2026-02-21T12:44:51.7212973Z .b8 101 2026-02-21T12:44:51.7213024Z .b8 108 2026-02-21T12:44:51.7213075Z .b8 105 2026-02-21T12:44:51.7213130Z .b8 111 
2026-02-21T12:44:51.7213181Z .b8 110 2026-02-21T12:44:51.7213231Z .b8 95 2026-02-21T12:44:51.7213295Z .b8 109 2026-02-21T12:44:51.7213353Z .b8 97 2026-02-21T12:44:51.7213405Z .b8 116 2026-02-21T12:44:51.7213457Z .b8 109 2026-02-21T12:44:51.7213515Z .b8 117 2026-02-21T12:44:51.7213570Z .b8 108 2026-02-21T12:44:51.7213621Z .b8 95 2026-02-21T12:44:51.7213672Z .b8 98 2026-02-21T12:44:51.7213727Z .b8 102 2026-02-21T12:44:51.7213779Z .b8 49 2026-02-21T12:44:51.7213830Z .b8 54 2026-02-21T12:44:51.7213885Z .b8 95 2026-02-21T12:44:51.7213938Z .b8 105 2026-02-21T12:44:51.7213990Z .b8 110 2026-02-21T12:44:51.7214042Z .b8 116 2026-02-21T12:44:51.7214159Z .b8 52 2026-02-21T12:44:51.7214211Z .b8 0 2026-02-21T12:44:51.7214295Z .b8 1 // DW_AT_inline 2026-02-21T12:44:51.7214458Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T12:44:51.7214555Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T12:44:51.7214655Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T12:44:51.7214756Z .b32 108 // DW_AT_abstract_origin 2026-02-21T12:44:51.7214895Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T12:44:51.7214994Z .b32 108 // DW_AT_abstract_origin 2026-02-21T12:44:51.7215084Z .b64 $L__tmp1 // DW_AT_low_pc 2026-02-21T12:44:51.7215182Z .b64 $L__tmp50 // DW_AT_high_pc 2026-02-21T12:44:51.7215265Z .b8 1 // DW_AT_call_file 2026-02-21T12:44:51.7215349Z .b8 87 // DW_AT_call_line 2026-02-21T12:44:51.7215440Z .b8 40 // DW_AT_call_column 2026-02-21T12:44:51.7215624Z .b8 0 // End Of Children Mark 2026-02-21T12:44:51.7215715Z .b8 0 // End Of Children Mark 2026-02-21T12:44:51.7215769Z } 2026-02-21T12:44:51.7215846Z .section .debug_macinfo { } 2026-02-21T12:44:51.7215854Z 2026-02-21T12:44:51.7215934Z ================================================================ 2026-02-21T12:44:51.7216054Z please share the reproducer above with Triton project. 
2026-02-21T12:44:52.6912046Z [8391s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[4, 32, 256], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[8], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], maxnreg=128, num_sm_multiplier=64, num_stages=6, num_warps=1, pid_type='persistent_blocked', range_flattens=[True, False], range_multi_buffers=[False, True], range_num_stages=[3, 1], range_unroll_factors=[4, 3], range_warp_specializes=[]) 2026-02-21T12:44:52.6913655Z Tensor-likes are not close! 2026-02-21T12:44:52.6913810Z 2026-02-21T12:44:52.6913934Z Mismatched elements: 334497811 / 335544320 (99.7%) 2026-02-21T12:44:52.6914400Z Greatest absolute difference: 4256.0 at index (10096, 918) (up to 0.01 allowed) 2026-02-21T12:44:52.6915101Z Greatest relative difference: inf at index (60515, 1164) (up to 0.01 allowed) 2026-02-21T12:44:52.6915812Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T12:44:52.6916212Z 2026-02-21T12:44:58.7410379Z [8397s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 32, 256], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[8], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], num_sm_multiplier=64, num_stages=6, num_warps=1, pid_type='persistent_blocked', range_flattens=[True, True], range_multi_buffers=[None, True], range_num_stages=[3, 1], range_unroll_factors=[4, 3], range_warp_specializes=[]) 2026-02-21T12:44:58.7412610Z Tensor-likes are not close! 2026-02-21T12:44:58.7412823Z 2026-02-21T12:44:58.7412986Z Mismatched elements: 334859556 / 335544320 (99.8%) 2026-02-21T12:44:58.7413550Z Greatest absolute difference: 7360.0 at index (10096, 918) (up to 0.01 allowed) 2026-02-21T12:44:58.7414190Z Greatest relative difference: inf at index (60515, 1164) (up to 0.01 allowed) 2026-02-21T12:44:58.7414780Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T12:44:58.7415091Z 2026-02-21T12:45:18.8601622Z [8418s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 32, 256], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[8], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], num_stages=6, num_warps=16, pid_type='flat', range_flattens=[None, True], range_multi_buffers=[None, True], range_num_stages=[0, 1], range_unroll_factors=[0, 3], range_warp_specializes=[]) 2026-02-21T12:45:18.8603225Z Tensor-likes are not close! 2026-02-21T12:45:18.8603915Z 2026-02-21T12:45:18.8604034Z Mismatched elements: 334818436 / 335544320 (99.8%) 2026-02-21T12:45:18.8604490Z Greatest absolute difference: 7168.0 at index (5531, 926) (up to 0.01 allowed) 2026-02-21T12:45:18.8605184Z Greatest relative difference: inf at index (60515, 1164) (up to 0.01 allowed) 2026-02-21T12:45:18.8605657Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T12:45:18.8605913Z 2026-02-21T12:45:31.2143177Z [8430s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[2, 32, 256], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[4], load_eviction_policies=['last', ''], loop_orders=[[0, 1]], num_sm_multiplier=64, num_stages=6, num_warps=2, pid_type='persistent_blocked', range_flattens=[True, True], range_multi_buffers=[False, True], range_num_stages=[3, 1], range_unroll_factors=[4, 3], range_warp_specializes=[]) 2026-02-21T12:45:31.2145754Z Tensor-likes are not close! 2026-02-21T12:45:31.2146025Z 2026-02-21T12:45:31.2146217Z Mismatched elements: 334855905 / 335544320 (99.8%) 2026-02-21T12:45:31.2147658Z Greatest absolute difference: 7200.0 at index (5531, 830) (up to 0.01 allowed) 2026-02-21T12:45:31.2149522Z Greatest relative difference: inf at index (60515, 1164) (up to 0.01 allowed) 2026-02-21T12:45:31.2150305Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T12:45:31.2150693Z 2026-02-21T12:46:36.6122671Z 2026-02-21T12:46:36.6123882Z Generation 3: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━ 101/101 0.5 configs/s 2026-02-21T12:46:37.9231845Z Generation 3: verifying top configs 100% ━━━━━━━━━━━━━━━━━━━━━━━ 5/5 - configs/s 2026-02-21T12:46:44.3828193Z [8503s] Generation 3 complete: 2026-02-21T12:46:44.3828549Z error=15 2026-02-21T12:46:44.3828708Z ok=90 2026-02-21T12:46:44.3828935Z min=38.4440 2026-02-21T12:46:44.3829099Z mid=70.4581 2026-02-21T12:46:44.3829250Z max=2293.8730 2026-02-21T12:46:44.3829420Z best={'block_sizes': [16, 64, 256], 2026-02-21T12:46:44.3829750Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T12:46:44.3830043Z 'l2_groupings': [1], 2026-02-21T12:46:44.3830269Z 'load_eviction_policies': ['', ''], 2026-02-21T12:46:44.3830492Z 'loop_orders': [[1, 0]], 2026-02-21T12:46:44.3830716Z 'num_stages': 1, 2026-02-21T12:46:44.3830882Z 'num_warps': 4, 2026-02-21T12:46:44.3831050Z 'pid_type': 'flat', 2026-02-21T12:46:44.3831234Z 'range_flattens': [None, None], 2026-02-21T12:46:44.3831519Z 'range_multi_buffers': [None, True], 2026-02-21T12:46:44.3831754Z 'range_num_stages': [0, 0], 2026-02-21T12:46:44.3831955Z 'range_unroll_factors': [0, 0], 2026-02-21T12:46:44.3832224Z 'range_warp_specializes': []} 2026-02-21T12:46:44.3872544Z [8503s] Fitting surrogate: 423 points, 423 targets 2026-02-21T12:46:45.9648738Z [8505s] Generation 4 starting: 96 neighbors, 5 active search path(s) 2026-02-21T12:47:24.8659186Z Generation 4: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98/98 3.3 configs/s 2026-02-21T12:50:27.7399976Z Generation 4: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━━ 98/98 0.5 configs/s 2026-02-21T12:50:28.5624038Z Generation 4: verifying top configs 100% ━━━━━━━━━━━━━━━━━━━━━━━ 6/6 - configs/s 2026-02-21T12:50:33.3517072Z [8732s] Generation 4 complete: 2026-02-21T12:50:33.3517440Z error=22 2026-02-21T12:50:33.3517611Z ok=79 2026-02-21T12:50:33.3517777Z min=27.3375 
2026-02-21T12:50:33.3517950Z mid=58.2346 2026-02-21T12:50:33.3518110Z max=1975.1863 2026-02-21T12:50:33.3518306Z best={'block_sizes': [32, 64, 256], 2026-02-21T12:50:33.3518599Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T12:50:33.3518906Z 'l2_groupings': [1], 2026-02-21T12:50:33.3519131Z 'load_eviction_policies': ['', ''], 2026-02-21T12:50:33.3519396Z 'loop_orders': [[1, 0]], 2026-02-21T12:50:33.3519608Z 'num_stages': 1, 2026-02-21T12:50:33.3519795Z 'num_warps': 4, 2026-02-21T12:50:33.3519977Z 'pid_type': 'flat', 2026-02-21T12:50:33.3520187Z 'range_flattens': [None, None], 2026-02-21T12:50:33.3520439Z 'range_multi_buffers': [None, True], 2026-02-21T12:50:33.3520695Z 'range_num_stages': [0, 0], 2026-02-21T12:50:33.3521472Z 'range_unroll_factors': [0, 1], 2026-02-21T12:50:33.3521716Z 'range_warp_specializes': []} 2026-02-21T12:50:33.3563368Z [8732s] Fitting surrogate: 524 points, 524 targets 2026-02-21T12:50:36.4085907Z [8735s] Generation 5 starting: 98 neighbors, 5 active search path(s) 2026-02-21T12:51:14.0346112Z Generation 5: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 98/98 3.0 configs/s 2026-02-21T12:51:27.5033936Z [8786s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 
2026-02-21T12:51:27.5035928Z Config: @helion.kernel(config=helion.Config(block_sizes=[16, 256, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['', 'first'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=32, num_stages=5, num_warps=16, pid_type='persistent_interleaved', range_flattens=[None, False], range_multi_buffers=[False, False], range_num_stages=[4, 0], range_unroll_factors=[0, 0], range_warp_specializes=[]), static_shapes=True) 2026-02-21T12:51:27.5038159Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T12:51:27.5038490Z `ptxas` stderr: 2026-02-21T12:51:27.5039704Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 393 in function _helion_matmul_bf16_int4. Try to compile with register target of 46 or higher. 2026-02-21T12:51:27.5040481Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T12:51:27.5040699Z 2026-02-21T12:51:27.5041307Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpbw3czelp.ptx -o /tmp/tmpbw3czelp.ptx.o 2026-02-21T12:51:27.5042045Z 2026-02-21T12:51:27.5042241Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T12:51:27.5042544Z 2026-02-21T12:51:27.5042548Z 2026-02-21T12:51:27.5042650Z ================================================================ 2026-02-21T12:51:27.5042957Z Internal Triton PTX codegen error 2026-02-21T12:51:27.5043214Z `ptxas` stderr: 2026-02-21T12:51:27.5043927Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 393 in function _helion_matmul_bf16_int4. Try to compile with register target of 46 or higher. 
2026-02-21T12:51:27.5044739Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T12:51:27.5044973Z 2026-02-21T12:51:27.5045617Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpbw3czelp.ptx -o /tmp/tmpbw3czelp.ptx.o 2026-02-21T12:51:27.5046367Z 2026-02-21T12:51:27.5046370Z 2026-02-21T12:51:27.5046440Z // 2026-02-21T12:51:27.5046774Z // Generated by LLVM NVPTX Back-End 2026-02-21T12:51:27.5047008Z // 2026-02-21T12:51:27.5047095Z 2026-02-21T12:51:27.5047170Z .version 8.7 2026-02-21T12:51:27.5047345Z .target sm_90a 2026-02-21T12:51:27.5047524Z .address_size 64 2026-02-21T12:51:27.5047645Z 2026-02-21T12:51:27.5047860Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T12:51:27.5048466Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T12:51:27.5048785Z // @_helion_matmul_bf16_int4 2026-02-21T12:51:27.5049107Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T12:51:27.5049468Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T12:51:27.5049896Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T12:51:27.5050325Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T12:51:27.5050744Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T12:51:27.5051171Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T12:51:27.5051512Z ) 2026-02-21T12:51:27.5051666Z .reqntid 512 2026-02-21T12:51:27.5051852Z .maxnreg 32 2026-02-21T12:51:27.5052013Z { 2026-02-21T12:51:27.5052176Z .reg .pred %p<12>; 2026-02-21T12:51:27.5052358Z .reg .b16 %rs<32>; 2026-02-21T12:51:27.5052522Z .reg .b32 %r<424>; 2026-02-21T12:51:27.5052680Z .reg .b64 %rd<103>; 2026-02-21T12:51:27.5052842Z $L__func_begin0: 2026-02-21T12:51:27.5053037Z 2026-02-21T12:51:27.5053232Z // %bb.0: 2026-02-21T12:51:27.5053535Z .loc 1 19 46 // 
cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:19:46 2026-02-21T12:51:27.5053996Z mov.u32 %r1, %ctaid.x; 2026-02-21T12:51:27.5054318Z .loc 1 19 103 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:19:103 2026-02-21T12:51:27.5054698Z setp.gt.u32 %p1, %r1, 40959; 2026-02-21T12:51:27.5054888Z @%p1 bra $L__BB0_5; 2026-02-21T12:51:27.5055072Z // %bb.1: // %.lr.ph 2026-02-21T12:51:27.5055446Z .loc 1 0 103 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:0:103 2026-02-21T12:51:27.5055863Z ld.param.b64 %rd32, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T12:51:27.5056168Z ld.param.b64 %rd31, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T12:51:27.5056605Z ld.param.b64 %rd30, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T12:51:27.5056867Z mov.u32 %r2, %tid.x; 2026-02-21T12:51:27.5057116Z shr.u32 %r3, %r2, 5; 2026-02-21T12:51:27.5057280Z and.b32 %r4, %r2, 7; 2026-02-21T12:51:27.5057432Z and.b32 %r5, %r2, 3; 2026-02-21T12:51:27.5057609Z and.b32 %r72, %r2, 31; 2026-02-21T12:51:27.5057787Z mul.wide.u32 %rd1, %r4, 4; 2026-02-21T12:51:27.5057976Z mul.wide.u32 %rd2, %r5, 8; 2026-02-21T12:51:27.5058156Z cvt.u64.u32 %rd3, %r72; 2026-02-21T12:51:27.5058335Z and.b32 %r6, %r2, 504; 2026-02-21T12:51:27.5058512Z bfe.u32 %r73, %r2, 3, 6; 2026-02-21T12:51:27.5058683Z or.b32 %r74, %r73, 64; 2026-02-21T12:51:27.5058853Z or.b32 %r75, %r73, 128; 2026-02-21T12:51:27.5059017Z or.b32 %r76, %r73, 192; 2026-02-21T12:51:27.5059188Z shr.u32 %r77, %r2, 2; 2026-02-21T12:51:27.5059354Z bfe.u32 %r78, %r2, 2, 7; 2026-02-21T12:51:27.5059527Z or.b32 %r79, %r77, 128; 2026-02-21T12:51:27.5059689Z cvt.u64.u32 %rd4, %r73; 2026-02-21T12:51:27.5059862Z cvt.u64.u32 %rd5, %r74; 2026-02-21T12:51:27.5060031Z cvt.u64.u32 %rd6, %r75; 2026-02-21T12:51:27.5060216Z cvt.u64.u32 %rd7, %r76; 2026-02-21T12:51:27.5060393Z cvt.u64.u32 %rd8, %r78; 2026-02-21T12:51:27.5060556Z cvt.u64.u32 %rd9, %r79; 2026-02-21T12:51:27.5060724Z and.b32 %r7, %r2, 32; 2026-02-21T12:51:27.5060884Z cvt.u32.u64 %r80, 
%rd3; 2026-02-21T12:51:27.5061208Z .loc 1 41 48 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:41:48 2026-02-21T12:51:27.5061566Z and.b32 %r81, %r2, 480; 2026-02-21T12:51:27.5061747Z bfe.u32 %r82, %r2, 5, 4; 2026-02-21T12:51:27.5062067Z .loc 1 19 52 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:19:52 2026-02-21T12:51:27.5062425Z cvt.u64.u32 %rd99, %r1; 2026-02-21T12:51:27.5062603Z shl.b32 %r83, %r2, 3; 2026-02-21T12:51:27.5062768Z and.b32 %r84, %r83, 4088; 2026-02-21T12:51:27.5062950Z shr.u32 %r85, %r2, 1; 2026-02-21T12:51:27.5063110Z and.b32 %r86, %r85, 24; 2026-02-21T12:51:27.5063362Z xor.b32 %r8, %r84, %r86; 2026-02-21T12:51:27.5063532Z mov.b32 %r87, global_smem; 2026-02-21T12:51:27.5063715Z add.s32 %r137, %r87, %r8; 2026-02-21T12:51:27.5063897Z add.s32 %r139, %r137, 4096; 2026-02-21T12:51:27.5064083Z add.s32 %r141, %r137, 8192; 2026-02-21T12:51:27.5064260Z add.s32 %r143, %r137, 12288; 2026-02-21T12:51:27.5064435Z add.s32 %r145, %r137, 16384; 2026-02-21T12:51:27.5064612Z add.s32 %r147, %r137, 20480; 2026-02-21T12:51:27.5064795Z add.s32 %r149, %r137, 24576; 2026-02-21T12:51:27.5064974Z add.s32 %r151, %r137, 28672; 2026-02-21T12:51:27.5065144Z add.s32 %r153, %r137, 32768; 2026-02-21T12:51:27.5065319Z add.s32 %r155, %r137, 36864; 2026-02-21T12:51:27.5065493Z add.s32 %r157, %r137, 40960; 2026-02-21T12:51:27.5065666Z add.s32 %r159, %r137, 45056; 2026-02-21T12:51:27.5065833Z add.s32 %r161, %r137, 49152; 2026-02-21T12:51:27.5066006Z add.s32 %r163, %r137, 53248; 2026-02-21T12:51:27.5066181Z add.s32 %r165, %r137, 57344; 2026-02-21T12:51:27.5066352Z add.s32 %r167, %r137, 61440; 2026-02-21T12:51:27.5066664Z shl.b32 %r88, %r81, 5; 2026-02-21T12:51:27.5066832Z shl.b32 %r89, %r2, 4; 2026-02-21T12:51:27.5067139Z and.b32 %r90, %r89, 448; 2026-02-21T12:51:27.5067323Z shl.b32 %r91, %r5, 1; 2026-02-21T12:51:27.5067484Z and.b32 %r92, %r2, 24; 2026-02-21T12:51:27.5067646Z or.b32 %r93, %r88, %r90; 2026-02-21T12:51:27.5067816Z or.b32 %r94, %r91, %r92; 
2026-02-21T12:51:27.5067978Z or.b32 %r25, %r93, %r94; 2026-02-21T12:51:27.5068152Z xor.b32 %r26, %r25, 8; 2026-02-21T12:51:27.5068318Z xor.b32 %r27, %r25, 16; 2026-02-21T12:51:27.5068587Z xor.b32 %r28, %r25, 24; 2026-02-21T12:51:27.5068758Z shl.b32 %r95, %r80, 2; 2026-02-21T12:51:27.5068916Z shl.b32 %r96, %r2, 1; 2026-02-21T12:51:27.5069081Z and.b32 %r97, %r96, 384; 2026-02-21T12:51:27.5069244Z shr.u32 %r98, %r7, 4; 2026-02-21T12:51:27.5069407Z bfe.u32 %r99, %r2, 8, 1; 2026-02-21T12:51:27.5069572Z add.s32 %r100, %r87, 65536; 2026-02-21T12:51:27.5069749Z add.s32 %r101, %r100, %r97; 2026-02-21T12:51:27.5069924Z add.s32 %r102, %r101, %r98; 2026-02-21T12:51:27.5070102Z add.s32 %r103, %r102, %r99; 2026-02-21T12:51:27.5070360Z add.s32 %r29, %r103, %r95; 2026-02-21T12:51:27.5070541Z and.b32 %r104, %r2, 384; 2026-02-21T12:51:27.5070713Z and.b32 %r105, %r3, 2; 2026-02-21T12:51:27.5070875Z add.s32 %r106, %r100, %r105; 2026-02-21T12:51:27.5071056Z add.s32 %r107, %r106, %r104; 2026-02-21T12:51:27.5071231Z add.s32 %r30, %r107, %r95; 2026-02-21T12:51:27.5071409Z shl.b32 %r108, %r80, 7; 2026-02-21T12:51:27.5071576Z shl.b32 %r109, %r4, 4; 2026-02-21T12:51:27.5071758Z shr.u32 %r110, %r81, 3; 2026-02-21T12:51:27.5071928Z xor.b32 %r111, %r109, %r110; 2026-02-21T12:51:27.5072107Z or.b32 %r112, %r111, %r108; 2026-02-21T12:51:27.5072290Z add.s32 %r31, %r100, %r112; 2026-02-21T12:51:27.5072466Z xor.b32 %r113, %r112, 64; 2026-02-21T12:51:27.5072648Z add.s32 %r32, %r100, %r113; 2026-02-21T12:51:27.5072821Z bfe.u32 %r114, %r100, 4, 14; 2026-02-21T12:51:27.5073005Z cvt.u64.u32 %rd33, %r114; 2026-02-21T12:51:27.5073192Z or.b64 %rd83, %rd33, 4611686293322072064; 2026-02-21T12:51:27.5073405Z add.s32 %r115, %r87, 65568; 2026-02-21T12:51:27.5073580Z bfe.u32 %r116, %r115, 4, 14; 2026-02-21T12:51:27.5073757Z cvt.u64.u32 %rd34, %r116; 2026-02-21T12:51:27.5073941Z or.b64 %rd84, %rd34, 4611686293322072064; 2026-02-21T12:51:27.5074141Z add.s32 %r117, %r87, 65600; 2026-02-21T12:51:27.5074320Z bfe.u32 
%r118, %r117, 4, 14; 2026-02-21T12:51:27.5074491Z cvt.u64.u32 %rd35, %r118; 2026-02-21T12:51:27.5074671Z or.b64 %rd85, %rd35, 4611686293322072064; 2026-02-21T12:51:27.5074871Z add.s32 %r119, %r87, 65632; 2026-02-21T12:51:27.5075048Z bfe.u32 %r120, %r119, 4, 14; 2026-02-21T12:51:27.5075220Z cvt.u64.u32 %rd36, %r120; 2026-02-21T12:51:27.5075417Z or.b64 %rd86, %rd36, 4611686293322072064; 2026-02-21T12:51:27.5075616Z shl.b32 %r121, %r2, 9; 2026-02-21T12:51:27.5075782Z and.b32 %r122, %r121, 12288; 2026-02-21T12:51:27.5075956Z shl.b32 %r123, %r5, 5; 2026-02-21T12:51:27.5076192Z shl.b32 %r124, %r81, 3; 2026-02-21T12:51:27.5076359Z shl.b32 %r125, %r2, 2; 2026-02-21T12:51:27.5076653Z and.b32 %r126, %r125, 112; 2026-02-21T12:51:27.5076837Z or.b32 %r127, %r123, %r124; 2026-02-21T12:51:27.5077007Z xor.b32 %r128, %r127, %r126; 2026-02-21T12:51:27.5077186Z add.s32 %r129, %r87, %r122; 2026-02-21T12:51:27.5077373Z add.s32 %r33, %r129, %r128; 2026-02-21T12:51:27.5077550Z shl.b32 %r130, %r2, 11; 2026-02-21T12:51:27.5077712Z and.b32 %r131, %r130, 12288; 2026-02-21T12:51:27.5077889Z shl.b32 %r132, %r6, 2; 2026-02-21T12:51:27.5078054Z or.b32 %r133, %r131, %r109; 2026-02-21T12:51:27.5078224Z xor.b32 %r134, %r133, %r132; 2026-02-21T12:51:27.5078401Z add.s32 %r384, %r87, %r134; 2026-02-21T12:51:27.5078572Z add.s32 %r389, %r384, 2048; 2026-02-21T12:51:27.5078915Z .loc 1 19 103 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:19:103 2026-02-21T12:51:27.5079286Z and.b32 %r135, %r1, 1; 2026-02-21T12:51:27.5079464Z cvt.u64.u32 %rd15, %r135; 2026-02-21T12:51:27.5079648Z shl.b64 %rd37, %rd4, 14; 2026-02-21T12:51:27.5079832Z mul.wide.u32 %rd38, %r4, 8; 2026-02-21T12:51:27.5080019Z or.b64 %rd39, %rd37, %rd38; 2026-02-21T12:51:27.5080363Z add.s64 %rd40, %rd39, %rd30; 2026-02-21T12:51:27.5080554Z add.s64 %rd16, %rd40, 3145984; 2026-02-21T12:51:27.5087890Z shr.u64 %rd41, %rd99, 1; 2026-02-21T12:51:27.5088167Z cvt.u16.u64 %rs31, %rd41; 2026-02-21T12:51:27.5088393Z mul.wide.u32 %rd42, %r82, 
1280; 2026-02-21T12:51:27.5088609Z mul.wide.u32 %rd43, %r135, 32; 2026-02-21T12:51:27.5088809Z or.b64 %rd44, %rd42, %rd43; 2026-02-21T12:51:27.5089006Z or.b64 %rd45, %rd44, %rd3; 2026-02-21T12:51:27.5089198Z add.s64 %rd17, %rd31, %rd45; 2026-02-21T12:51:27.5089388Z shl.b64 %rd73, %rd1, 1; 2026-02-21T12:51:27.5089574Z setp.eq.b32 %p6, %r7, 0; 2026-02-21T12:51:27.5089831Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T12:51:27.5090139Z // Child Loop BB0_3 Depth 2 2026-02-21T12:51:27.5090563Z .loc 1 26 33 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:26:33 2026-02-21T12:51:27.5091073Z cvt.u32.u16 %r172, %rs31; 2026-02-21T12:51:27.5091272Z and.b32 %r173, %r172, 1023; 2026-02-21T12:51:27.5091487Z mad.wide.u32 %rd101, %r173, 4194304, %rd16; 2026-02-21T12:51:27.5091707Z cvt.u32.u64 %r174, %rd99; 2026-02-21T12:51:27.5091901Z bfe.u32 %r175, %r174, 11, 5; 2026-02-21T12:51:27.5092091Z mad.wide.u32 %rd100, %r175, 64, %rd17; 2026-02-21T12:51:27.5092304Z shr.u64 %rd63, %rd99, 10; 2026-02-21T12:51:27.5092475Z and.b64 %rd64, %rd63, 62; 2026-02-21T12:51:27.5092807Z .loc 1 28 30 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:28:30 2026-02-21T12:51:27.5093174Z or.b64 %rd65, %rd64, %rd15; 2026-02-21T12:51:27.5093490Z .loc 1 30 27 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:30:27 2026-02-21T12:51:27.5093847Z shl.b64 %rd21, %rd65, 5; 2026-02-21T12:51:27.5094174Z .loc 1 32 27 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:32:27 2026-02-21T12:51:27.5094532Z shl.b64 %rd66, %rd99, 7; 2026-02-21T12:51:27.5094712Z and.b64 %rd22, %rd66, 261888; 2026-02-21T12:51:27.5095037Z .loc 1 33 32 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:33:32 2026-02-21T12:51:27.5095391Z or.b64 %rd67, %rd22, %rd4; 2026-02-21T12:51:27.5095571Z or.b64 %rd68, %rd22, %rd5; 2026-02-21T12:51:27.5095753Z or.b64 %rd69, %rd22, %rd6; 2026-02-21T12:51:27.5095927Z or.b64 %rd70, %rd22, %rd7; 2026-02-21T12:51:27.5096244Z .loc 1 48 32 // 
cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:48:32 2026-02-21T12:51:27.5096745Z shl.b64 %rd71, %rd67, 14; 2026-02-21T12:51:27.5096930Z add.s64 %rd72, %rd30, %rd71; 2026-02-21T12:51:27.5097116Z add.s64 %rd46, %rd72, %rd73; 2026-02-21T12:51:27.5097305Z shl.b64 %rd74, %rd68, 14; 2026-02-21T12:51:27.5097485Z add.s64 %rd75, %rd30, %rd74; 2026-02-21T12:51:27.5097767Z add.s64 %rd47, %rd75, %rd73; 2026-02-21T12:51:27.5097955Z shl.b64 %rd76, %rd69, 14; 2026-02-21T12:51:27.5098125Z add.s64 %rd77, %rd30, %rd76; 2026-02-21T12:51:27.5098311Z add.s64 %rd48, %rd77, %rd73; 2026-02-21T12:51:27.5098487Z shl.b64 %rd78, %rd70, 14; 2026-02-21T12:51:27.5098665Z add.s64 %rd79, %rd30, %rd78; 2026-02-21T12:51:27.5098842Z add.s64 %rd49, %rd79, %rd73; 2026-02-21T12:51:27.5099159Z .loc 1 48 80 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:48:80 2026-02-21T12:51:27.5099510Z bar.sync 0; 2026-02-21T12:51:27.5099657Z mov.b32 %r138, 8; 2026-02-21T12:51:27.5099819Z // begin inline asm 2026-02-21T12:51:27.5100061Z cp.async.ca.shared.global [ %r137 + 0 ], [ %rd46 + 0 ], 0x8, %r138; 2026-02-21T12:51:27.5100344Z // end inline asm 2026-02-21T12:51:27.5100500Z // begin inline asm 2026-02-21T12:51:27.5100739Z cp.async.ca.shared.global [ %r139 + 0 ], [ %rd47 + 0 ], 0x8, %r138; 2026-02-21T12:51:27.5101012Z // end inline asm 2026-02-21T12:51:27.5101178Z // begin inline asm 2026-02-21T12:51:27.5101409Z cp.async.ca.shared.global [ %r141 + 0 ], [ %rd48 + 0 ], 0x8, %r138; 2026-02-21T12:51:27.5101832Z // end inline asm 2026-02-21T12:51:27.5101995Z // begin inline asm 2026-02-21T12:51:27.5102220Z cp.async.ca.shared.global [ %r143 + 0 ], [ %rd49 + 0 ], 0x8, %r138; 2026-02-21T12:51:27.5102505Z // end inline asm 2026-02-21T12:51:27.5102664Z cp.async.commit_group; 2026-02-21T12:51:27.5102990Z .loc 1 48 32 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:48:32 2026-02-21T12:51:27.5103348Z add.s64 %rd50, %rd46, 64; 2026-02-21T12:51:27.5103532Z add.s64 %rd51, %rd47, 64; 
2026-02-21T12:51:27.5103711Z add.s64 %rd52, %rd48, 64; 2026-02-21T12:51:27.5103878Z add.s64 %rd53, %rd49, 64; 2026-02-21T12:51:27.5104195Z .loc 1 48 80 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:48:80 2026-02-21T12:51:27.5104542Z bar.sync 0; 2026-02-21T12:51:27.5104700Z // begin inline asm 2026-02-21T12:51:27.5104933Z cp.async.ca.shared.global [ %r145 + 0 ], [ %rd50 + 0 ], 0x8, %r138; 2026-02-21T12:51:27.5105210Z // end inline asm 2026-02-21T12:51:27.5105440Z // begin inline asm 2026-02-21T12:51:27.5105676Z cp.async.ca.shared.global [ %r147 + 0 ], [ %rd51 + 0 ], 0x8, %r138; 2026-02-21T12:51:27.5105950Z // end inline asm 2026-02-21T12:51:27.5106112Z // begin inline asm 2026-02-21T12:51:27.5106339Z cp.async.ca.shared.global [ %r149 + 0 ], [ %rd52 + 0 ], 0x8, %r138; 2026-02-21T12:51:27.5106754Z // end inline asm 2026-02-21T12:51:27.5106915Z // begin inline asm 2026-02-21T12:51:27.5107134Z cp.async.ca.shared.global [ %r151 + 0 ], [ %rd53 + 0 ], 0x8, %r138; 2026-02-21T12:51:27.5107417Z // end inline asm 2026-02-21T12:51:27.5107577Z cp.async.commit_group; 2026-02-21T12:51:27.5107895Z .loc 1 48 32 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:48:32 2026-02-21T12:51:27.5108255Z add.s64 %rd54, %rd46, 128; 2026-02-21T12:51:27.5108499Z add.s64 %rd55, %rd47, 128; 2026-02-21T12:51:27.5108697Z add.s64 %rd56, %rd48, 128; 2026-02-21T12:51:27.5108871Z add.s64 %rd57, %rd49, 128; 2026-02-21T12:51:27.5109193Z .loc 1 48 80 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:48:80 2026-02-21T12:51:27.5109536Z bar.sync 0; 2026-02-21T12:51:27.5109695Z // begin inline asm 2026-02-21T12:51:27.5109924Z cp.async.ca.shared.global [ %r153 + 0 ], [ %rd54 + 0 ], 0x8, %r138; 2026-02-21T12:51:27.5110201Z // end inline asm 2026-02-21T12:51:27.5110368Z // begin inline asm 2026-02-21T12:51:27.5110589Z cp.async.ca.shared.global [ %r155 + 0 ], [ %rd55 + 0 ], 0x8, %r138; 2026-02-21T12:51:27.5110860Z // end inline asm 2026-02-21T12:51:27.5111008Z // begin inline asm 
2026-02-21T12:51:27.5111239Z cp.async.ca.shared.global [ %r157 + 0 ], [ %rd56 + 0 ], 0x8, %r138; 2026-02-21T12:51:27.5111522Z // end inline asm 2026-02-21T12:51:27.5111682Z // begin inline asm 2026-02-21T12:51:27.5111917Z cp.async.ca.shared.global [ %r159 + 0 ], [ %rd57 + 0 ], 0x8, %r138; 2026-02-21T12:51:27.5112293Z // end inline asm 2026-02-21T12:51:27.5112454Z cp.async.commit_group; 2026-02-21T12:51:27.5112792Z .loc 1 48 32 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:48:32 2026-02-21T12:51:27.5113163Z add.s64 %rd58, %rd46, 192; 2026-02-21T12:51:27.5113343Z add.s64 %rd59, %rd47, 192; 2026-02-21T12:51:27.5113526Z add.s64 %rd60, %rd48, 192; 2026-02-21T12:51:27.5113701Z add.s64 %rd61, %rd49, 192; 2026-02-21T12:51:27.5114024Z .loc 1 48 80 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:48:80 2026-02-21T12:51:27.5114366Z bar.sync 0; 2026-02-21T12:51:27.5114521Z // begin inline asm 2026-02-21T12:51:27.5114757Z cp.async.ca.shared.global [ %r161 + 0 ], [ %rd58 + 0 ], 0x8, %r138; 2026-02-21T12:51:27.5115024Z // end inline asm 2026-02-21T12:51:27.5115180Z // begin inline asm 2026-02-21T12:51:27.5115403Z cp.async.ca.shared.global [ %r163 + 0 ], [ %rd59 + 0 ], 0x8, %r138; 2026-02-21T12:51:27.5115695Z // end inline asm 2026-02-21T12:51:27.5115844Z // begin inline asm 2026-02-21T12:51:27.5116077Z cp.async.ca.shared.global [ %r165 + 0 ], [ %rd60 + 0 ], 0x8, %r138; 2026-02-21T12:51:27.5116629Z // end inline asm 2026-02-21T12:51:27.5116802Z // begin inline asm 2026-02-21T12:51:27.5117032Z cp.async.ca.shared.global [ %r167 + 0 ], [ %rd61 + 0 ], 0x8, %r138; 2026-02-21T12:51:27.5117304Z // end inline asm 2026-02-21T12:51:27.5117468Z cp.async.commit_group; 2026-02-21T12:51:27.5117644Z mov.b32 %r408, 0f00000000; 2026-02-21T12:51:27.5117825Z mov.b32 %r407, 3; 2026-02-21T12:51:27.5117979Z mov.b32 %r406, -1; 2026-02-21T12:51:27.5118142Z mov.b64 %rd102, -16; 2026-02-21T12:51:27.5118304Z mov.b32 %r409, %r408; 2026-02-21T12:51:27.5118470Z mov.b32 %r410, %r408; 
2026-02-21T12:51:27.5118626Z mov.b32 %r411, %r408; 2026-02-21T12:51:27.5118793Z mov.b32 %r412, %r408; 2026-02-21T12:51:27.5118966Z mov.b32 %r413, %r408; 2026-02-21T12:51:27.5119131Z mov.b32 %r414, %r408; 2026-02-21T12:51:27.5119298Z mov.b32 %r415, %r408; 2026-02-21T12:51:27.5119456Z mov.b32 %r416, %r408; 2026-02-21T12:51:27.5119619Z mov.b32 %r417, %r408; 2026-02-21T12:51:27.5119872Z mov.b32 %r418, %r408; 2026-02-21T12:51:27.5120038Z mov.b32 %r419, %r408; 2026-02-21T12:51:27.5120194Z mov.b32 %r420, %r408; 2026-02-21T12:51:27.5120362Z mov.b32 %r421, %r408; 2026-02-21T12:51:27.5120517Z mov.b32 %r422, %r408; 2026-02-21T12:51:27.5120679Z mov.b32 %r423, %r408; 2026-02-21T12:51:27.5120889Z $L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T12:51:27.5121187Z // => This Inner Loop Header: Depth=2 2026-02-21T12:51:27.5121590Z .loc 1 40 89 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:40:89 2026-02-21T12:51:27.5121945Z add.s64 %rd102, %rd102, 16; 2026-02-21T12:51:27.5122149Z setp.lt.u64 %p7, %rd102, 4032; 2026-02-21T12:51:27.5122341Z add.s32 %r366, %r406, 1; 2026-02-21T12:51:27.5122528Z setp.gt.s32 %p8, %r366, 3; 2026-02-21T12:51:27.5122716Z selp.b32 %r406, 0, %r366, %p8; 2026-02-21T12:51:27.5123049Z .loc 1 48 80 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:48:80 2026-02-21T12:51:27.5123412Z cp.async.wait_group 3; 2026-02-21T12:51:27.5123598Z bar.sync 0; 2026-02-21T12:51:27.5123756Z shl.b32 %r367, %r406, 14; 2026-02-21T12:51:27.5123932Z add.s32 %r369, %r87, %r367; 2026-02-21T12:51:27.5124256Z .loc 1 52 32 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:52:32 2026-02-21T12:51:27.5124602Z add.s32 %r370, %r369, %r25; 2026-02-21T12:51:27.5124793Z ld.shared.b16 %rs5, [%r370]; 2026-02-21T12:51:27.5124989Z ld.shared.b16 %rs6, [%r370+512]; 2026-02-21T12:51:27.5125190Z ld.shared.b16 %rs7, [%r370+32]; 2026-02-21T12:51:27.5125391Z ld.shared.b16 %rs8, [%r370+544]; 2026-02-21T12:51:27.5125582Z add.s32 %r371, %r369, %r26; 2026-02-21T12:51:27.5125770Z 
ld.shared.b16 %rs9, [%r371]; 2026-02-21T12:51:27.5125960Z ld.shared.b16 %rs10, [%r371+512]; 2026-02-21T12:51:27.5126249Z ld.shared.b16 %rs11, [%r371+32]; 2026-02-21T12:51:27.5126442Z ld.shared.b16 %rs12, [%r371+544]; 2026-02-21T12:51:27.5126786Z add.s32 %r372, %r369, %r27; 2026-02-21T12:51:27.5126979Z ld.shared.b16 %rs13, [%r372]; 2026-02-21T12:51:27.5127168Z ld.shared.b16 %rs14, [%r372+512]; 2026-02-21T12:51:27.5127391Z ld.shared.b16 %rs15, [%r372+32]; 2026-02-21T12:51:27.5127586Z ld.shared.b16 %rs16, [%r372+544]; 2026-02-21T12:51:27.5127784Z add.s32 %r373, %r369, %r28; 2026-02-21T12:51:27.5127965Z ld.shared.b16 %rs17, [%r373]; 2026-02-21T12:51:27.5128157Z ld.shared.b16 %rs18, [%r373+512]; 2026-02-21T12:51:27.5128351Z ld.shared.b16 %rs19, [%r373+32]; 2026-02-21T12:51:27.5128553Z ld.shared.b16 %rs20, [%r373+544]; 2026-02-21T12:51:27.5128748Z cvt.f32.bf16 %r208, %rs5; 2026-02-21T12:51:27.5128930Z cvt.f32.bf16 %r209, %rs6; 2026-02-21T12:51:27.5129104Z cvt.f32.bf16 %r210, %rs9; 2026-02-21T12:51:27.5129282Z cvt.f32.bf16 %r211, %rs10; 2026-02-21T12:51:27.5129469Z cvt.f32.bf16 %r244, %rs13; 2026-02-21T12:51:27.5129641Z cvt.f32.bf16 %r245, %rs14; 2026-02-21T12:51:27.5129818Z cvt.f32.bf16 %r246, %rs17; 2026-02-21T12:51:27.5130147Z cvt.f32.bf16 %r247, %rs18; 2026-02-21T12:51:27.5130329Z cvt.f32.bf16 %r280, %rs7; 2026-02-21T12:51:27.5130495Z cvt.f32.bf16 %r281, %rs8; 2026-02-21T12:51:27.5130671Z cvt.f32.bf16 %r282, %rs11; 2026-02-21T12:51:27.5130843Z cvt.f32.bf16 %r283, %rs12; 2026-02-21T12:51:27.5131016Z cvt.f32.bf16 %r316, %rs15; 2026-02-21T12:51:27.5131189Z cvt.f32.bf16 %r317, %rs16; 2026-02-21T12:51:27.5131359Z cvt.f32.bf16 %r318, %rs19; 2026-02-21T12:51:27.5131536Z cvt.f32.bf16 %r319, %rs20; 2026-02-21T12:51:27.5131849Z .loc 1 54 87 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:54:87 2026-02-21T12:51:27.5132214Z // begin inline asm 2026-02-21T12:51:27.5132387Z mov.u64 %rd80, 0x0; 2026-02-21T12:51:27.5132639Z createpolicy.fractional.L2::evict_first.b64 
%rd80, 1.0; 2026-02-21T12:51:27.5132899Z // end inline asm 2026-02-21T12:51:27.5133058Z // begin inline asm 2026-02-21T12:51:27.5133221Z mov.u16 %rs4, 0x0; 2026-02-21T12:51:27.5133550Z ld.global.L1::evict_first.L2::cache_hint.b8 { %rs4 }, [ %rd100 + 0 ], %rd80; 2026-02-21T12:51:27.5133859Z // end inline asm 2026-02-21T12:51:27.5134164Z .loc 1 62 28 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:62:28 2026-02-21T12:51:27.5134520Z st.shared.b8 [%r29], %rs4; 2026-02-21T12:51:27.5134700Z bar.sync 0; 2026-02-21T12:51:27.5134865Z ld.shared.v2.b8 {%rs21, %rs22}, [%r30]; 2026-02-21T12:51:27.5135220Z .loc 1 57 28 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:57:28 2026-02-21T12:51:27.5135567Z shl.b16 %rs23, %rs21, 4; 2026-02-21T12:51:27.5135762Z shl.b16 %rs24, %rs22, 4; 2026-02-21T12:51:27.5136089Z .loc 1 72 58 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:72:58 2026-02-21T12:51:27.5136618Z selp.b16 %rs25, %rs23, %rs21, %p6; 2026-02-21T12:51:27.5136845Z cvt.s16.s8 %rs26, %rs25; 2026-02-21T12:51:27.5137024Z shr.s16 %rs27, %rs26, 4; 2026-02-21T12:51:27.5137225Z selp.b16 %rs28, %rs24, %rs22, %p6; 2026-02-21T12:51:27.5137433Z cvt.s16.s8 %rs29, %rs28; 2026-02-21T12:51:27.5137608Z shr.s16 %rs30, %rs29, 4; 2026-02-21T12:51:27.5137922Z .loc 1 77 32 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:77:32 2026-02-21T12:51:27.5138301Z cvt.rn.f32.s16 %r374, %rs27; 2026-02-21T12:51:27.5138494Z cvt.rn.f32.s16 %r375, %rs30; 2026-02-21T12:51:27.5138669Z bar.sync 0; 2026-02-21T12:51:27.5138824Z st.shared.b32 [%r31], %r374; 2026-02-21T12:51:27.5138999Z st.shared.b32 [%r32], %r375; 2026-02-21T12:51:27.5139172Z $L__tmp0: 2026-02-21T12:51:27.5139520Z .loc 2 291 36 // standard.py:291:36 @[ cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:84:40 ] 2026-02-21T12:51:27.5139935Z // begin inline asm 2026-02-21T12:51:27.5140116Z fence.proxy.async.shared::cta; 2026-02-21T12:51:27.5140409Z // end inline asm 2026-02-21T12:51:27.5140557Z bar.sync 0; 
2026-02-21T12:51:27.5140722Z shfl.sync.idx.b32 %r376, %r3, 0, 31, -1; 2026-02-21T12:51:27.5140953Z wgmma.fence.sync.aligned; 2026-02-21T12:51:27.5141135Z mov.pred %p2, -1; 2026-02-21T12:51:27.5141301Z // begin inline asm 2026-02-21T12:51:27.5141856Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r408,%r409,%r410,%r411,%r412,%r413,%r414,%r415,%r416,%r417,%r418,%r419,%r420,%r421,%r422,%r423}, {%r208,%r209,%r210,%r211}, %rd83, %p2, 1, 1; 2026-02-21T12:51:27.5142442Z // end inline asm 2026-02-21T12:51:27.5142595Z // begin inline asm 2026-02-21T12:51:27.5143132Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r408,%r409,%r410,%r411,%r412,%r413,%r414,%r415,%r416,%r417,%r418,%r419,%r420,%r421,%r422,%r423}, {%r244,%r245,%r246,%r247}, %rd84, %p2, 1, 1; 2026-02-21T12:51:27.5143712Z // end inline asm 2026-02-21T12:51:27.5143859Z // begin inline asm 2026-02-21T12:51:27.5144408Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r408,%r409,%r410,%r411,%r412,%r413,%r414,%r415,%r416,%r417,%r418,%r419,%r420,%r421,%r422,%r423}, {%r280,%r281,%r282,%r283}, %rd85, %p2, 1, 1; 2026-02-21T12:51:27.5145146Z // end inline asm 2026-02-21T12:51:27.5145308Z // begin inline asm 2026-02-21T12:51:27.5145847Z wgmma.mma_async.sync.aligned.m64n32k8.f32.tf32.tf32 {%r408,%r409,%r410,%r411,%r412,%r413,%r414,%r415,%r416,%r417,%r418,%r419,%r420,%r421,%r422,%r423}, {%r316,%r317,%r318,%r319}, %rd86, %p2, 1, 1; 2026-02-21T12:51:27.5146434Z // end inline asm 2026-02-21T12:51:27.5146743Z wgmma.commit_group.sync.aligned; 2026-02-21T12:51:27.5146938Z mov.b32 %r337, 0; 2026-02-21T12:51:27.5147094Z mov.b32 %r336, %r100; 2026-02-21T12:51:27.5147253Z mov.b32 %r338, %r337; 2026-02-21T12:51:27.5147412Z // begin inline asm 2026-02-21T12:51:27.5147764Z // wait for regs: %r408,%r409,%r410,%r411,%r412,%r413,%r414,%r415,%r416,%r417,%r418,%r419,%r420,%r421,%r422,%r423,%r336,%r337,%r338 2026-02-21T12:51:27.5148179Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:51:27.5148380Z // end inline asm 
2026-02-21T12:51:27.5148600Z $L__tmp1: 2026-02-21T12:51:27.5148980Z .loc 1 40 89 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:40:89 2026-02-21T12:51:27.5149343Z add.s32 %r377, %r407, 1; 2026-02-21T12:51:27.5149532Z setp.gt.s32 %p9, %r377, 3; 2026-02-21T12:51:27.5149715Z selp.b32 %r407, 0, %r377, %p9; 2026-02-21T12:51:27.5150046Z .loc 1 48 32 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:48:32 2026-02-21T12:51:27.5150407Z add.s64 %rd87, %rd101, -3145728; 2026-02-21T12:51:27.5150614Z add.s64 %rd88, %rd101, -2097152; 2026-02-21T12:51:27.5150811Z add.s64 %rd89, %rd101, -1048576; 2026-02-21T12:51:27.5151152Z .loc 1 48 80 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:48:80 2026-02-21T12:51:27.5151514Z shl.b32 %r378, %r407, 14; 2026-02-21T12:51:27.5151692Z add.s32 %r379, %r87, %r378; 2026-02-21T12:51:27.5151875Z add.s32 %r358, %r379, %r8; 2026-02-21T12:51:27.5152061Z selp.b32 %r359, 8, 0, %p7; 2026-02-21T12:51:27.5152238Z // begin inline asm 2026-02-21T12:51:27.5152481Z cp.async.ca.shared.global [ %r358 + 0 ], [ %rd87 + 0 ], 0x8, %r359; 2026-02-21T12:51:27.5152757Z // end inline asm 2026-02-21T12:51:27.5152909Z add.s32 %r360, %r358, 4096; 2026-02-21T12:51:27.5153081Z // begin inline asm 2026-02-21T12:51:27.5153315Z cp.async.ca.shared.global [ %r360 + 0 ], [ %rd88 + 0 ], 0x8, %r359; 2026-02-21T12:51:27.5153580Z // end inline asm 2026-02-21T12:51:27.5153733Z add.s32 %r362, %r358, 8192; 2026-02-21T12:51:27.5153906Z // begin inline asm 2026-02-21T12:51:27.5154129Z cp.async.ca.shared.global [ %r362 + 0 ], [ %rd89 + 0 ], 0x8, %r359; 2026-02-21T12:51:27.5154394Z // end inline asm 2026-02-21T12:51:27.5154544Z add.s32 %r364, %r358, 12288; 2026-02-21T12:51:27.5154719Z // begin inline asm 2026-02-21T12:51:27.5154942Z cp.async.ca.shared.global [ %r364 + 0 ], [ %rd101 + 0 ], 0x8, %r359; 2026-02-21T12:51:27.5155212Z // end inline asm 2026-02-21T12:51:27.5155463Z cp.async.commit_group; 2026-02-21T12:51:27.5155782Z .loc 1 40 89 // 
cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:40:89 2026-02-21T12:51:27.5156139Z add.s64 %rd101, %rd101, 64; 2026-02-21T12:51:27.5156323Z add.s64 %rd100, %rd100, 20480; 2026-02-21T12:51:27.5156661Z setp.lt.u64 %p10, %rd102, 4080; 2026-02-21T12:51:27.5156852Z @%p10 bra $L__BB0_3; 2026-02-21T12:51:27.5157078Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:51:27.5157488Z .loc 1 31 32 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:31:32 2026-02-21T12:51:27.5157854Z or.b64 %rd93, %rd21, %rd2; 2026-02-21T12:51:27.5158181Z .loc 1 33 32 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:33:32 2026-02-21T12:51:27.5158539Z or.b64 %rd94, %rd22, %rd8; 2026-02-21T12:51:27.5158715Z or.b64 %rd95, %rd22, %rd9; 2026-02-21T12:51:27.5159026Z .loc 1 40 89 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:40:89 2026-02-21T12:51:27.5159385Z cp.async.wait_group 0; 2026-02-21T12:51:27.5159554Z bar.sync 0; 2026-02-21T12:51:27.5159988Z .loc 1 87 28 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:87:28 2026-02-21T12:51:27.5160348Z cvt.rn.bf16x2.f32 %r398, %r409, %r408; 2026-02-21T12:51:27.5160565Z cvt.rn.bf16x2.f32 %r399, %r411, %r410; 2026-02-21T12:51:27.5160771Z cvt.rn.bf16x2.f32 %r400, %r413, %r412; 2026-02-21T12:51:27.5160979Z cvt.rn.bf16x2.f32 %r401, %r415, %r414; 2026-02-21T12:51:27.5161185Z cvt.rn.bf16x2.f32 %r402, %r417, %r416; 2026-02-21T12:51:27.5161402Z cvt.rn.bf16x2.f32 %r403, %r419, %r418; 2026-02-21T12:51:27.5161613Z cvt.rn.bf16x2.f32 %r404, %r421, %r420; 2026-02-21T12:51:27.5161814Z cvt.rn.bf16x2.f32 %r405, %r423, %r422; 2026-02-21T12:51:27.5162158Z .loc 1 88 22 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:88:22 2026-02-21T12:51:27.5162518Z mad.lo.s64 %rd96, %rd94, 2560, %rd32; 2026-02-21T12:51:27.5162722Z shl.b64 %rd97, %rd93, 1; 2026-02-21T12:51:27.5162900Z add.s64 %rd91, %rd96, %rd97; 2026-02-21T12:51:27.5163157Z mad.lo.s64 %rd98, %rd95, 2560, %rd32; 2026-02-21T12:51:27.5163364Z add.s64 %rd92, %rd98, 
%rd97; 2026-02-21T12:51:27.5163677Z .loc 1 88 81 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:88:81 2026-02-21T12:51:27.5164072Z st.shared.v4.b32 [%r33], {%r398, %r400, %r402, %r404}; 2026-02-21T12:51:27.5164358Z st.shared.v4.b32 [%r33+128], {%r399, %r401, %r403, %r405}; 2026-02-21T12:51:27.5164600Z bar.sync 0; 2026-02-21T12:51:27.5164745Z // begin inline asm 2026-02-21T12:51:27.5165027Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r380, %r381, %r382, %r383}, [%r384]; 2026-02-21T12:51:27.5165355Z // end inline asm 2026-02-21T12:51:27.5165515Z // begin inline asm 2026-02-21T12:51:27.5165787Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r385, %r386, %r387, %r388}, [%r389]; 2026-02-21T12:51:27.5166111Z // end inline asm 2026-02-21T12:51:27.5166263Z // begin inline asm 2026-02-21T12:51:27.5166602Z st.global.v4.b32 [ %rd91 + 0 ], { %r380, %r381, %r382, %r383 }; 2026-02-21T12:51:27.5166876Z // end inline asm 2026-02-21T12:51:27.5167026Z // begin inline asm 2026-02-21T12:51:27.5167231Z st.global.v4.b32 [ %rd92 + 0 ], { %r385, %r386, %r387, %r388 }; 2026-02-21T12:51:27.5167493Z // end inline asm 2026-02-21T12:51:27.5167795Z .loc 1 19 103 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:19:103 2026-02-21T12:51:27.5168163Z add.s64 %rd29, %rd99, 4224; 2026-02-21T12:51:27.5168345Z add.s16 %rs31, %rs31, 64; 2026-02-21T12:51:27.5168529Z setp.lt.u64 %p11, %rd99, 36736; 2026-02-21T12:51:27.5168716Z mov.b64 %rd99, %rd29; 2026-02-21T12:51:27.5168881Z @%p11 bra $L__BB0_2; 2026-02-21T12:51:27.5169062Z $L__BB0_5: // %._crit_edge 2026-02-21T12:51:27.5169436Z .loc 1 19 4 // cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py:19:4 2026-02-21T12:51:27.5169868Z ret; 2026-02-21T12:51:27.5169997Z $L__tmp2: 2026-02-21T12:51:27.5170132Z $L__func_end0: 2026-02-21T12:51:27.5170309Z // -- End function 2026-02-21T12:51:27.5170524Z } 2026-02-21T12:51:27.5170843Z .file 1 "/tmp/torchinductor_root/pg/cpgwzrs7mmixohiu7daqnej436mdeol37xfi743wekcv6wtrridf.py" 
2026-02-21T12:51:27.5171398Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T12:51:27.5171764Z .section .debug_abbrev 2026-02-21T12:51:27.5171928Z { 2026-02-21T12:51:27.5172100Z .b8 1 // Abbreviation Code 2026-02-21T12:51:27.5172362Z .b8 17 // DW_TAG_compile_unit 2026-02-21T12:51:27.5172630Z .b8 1 // DW_CHILDREN_yes 2026-02-21T12:51:27.5172876Z .b8 37 // DW_AT_producer 2026-02-21T12:51:27.5173124Z .b8 8 // DW_FORM_string 2026-02-21T12:51:27.5173379Z .b8 19 // DW_AT_language 2026-02-21T12:51:27.5173622Z .b8 5 // DW_FORM_data2 2026-02-21T12:51:27.5174004Z .b8 3 // DW_AT_name 2026-02-21T12:51:27.5174238Z .b8 8 // DW_FORM_string 2026-02-21T12:51:27.5174479Z .b8 16 // DW_AT_stmt_list 2026-02-21T12:51:27.5174718Z .b8 6 // DW_FORM_data4 2026-02-21T12:51:27.5174956Z .b8 27 // DW_AT_comp_dir 2026-02-21T12:51:27.5175191Z .b8 8 // DW_FORM_string 2026-02-21T12:51:27.5175425Z .b8 0 // EOM(1) 2026-02-21T12:51:27.5175644Z .b8 0 // EOM(2) 2026-02-21T12:51:27.5175879Z .b8 2 // Abbreviation Code 2026-02-21T12:51:27.5176140Z .b8 46 // DW_TAG_subprogram 2026-02-21T12:51:27.5176385Z .b8 0 // DW_CHILDREN_no 2026-02-21T12:51:27.5176838Z .b8 3 // DW_AT_name 2026-02-21T12:51:27.5177077Z .b8 8 // DW_FORM_string 2026-02-21T12:51:27.5177318Z .b8 32 // DW_AT_inline 2026-02-21T12:51:27.5177569Z .b8 11 // DW_FORM_data1 2026-02-21T12:51:27.5177799Z .b8 0 // EOM(1) 2026-02-21T12:51:27.5178017Z .b8 0 // EOM(2) 2026-02-21T12:51:27.5178244Z .b8 3 // Abbreviation Code 2026-02-21T12:51:27.5178499Z .b8 46 // DW_TAG_subprogram 2026-02-21T12:51:27.5178741Z .b8 1 // DW_CHILDREN_yes 2026-02-21T12:51:27.5178987Z .b8 17 // DW_AT_low_pc 2026-02-21T12:51:27.5179218Z .b8 1 // DW_FORM_addr 2026-02-21T12:51:27.5179456Z .b8 18 // DW_AT_high_pc 2026-02-21T12:51:27.5179692Z .b8 1 // DW_FORM_addr 2026-02-21T12:51:27.5179940Z .b8 49 // DW_AT_abstract_origin 2026-02-21T12:51:27.5180189Z .b8 19 // DW_FORM_ref4 2026-02-21T12:51:27.5180411Z .b8 0 // EOM(1) 
2026-02-21T12:51:27.5180628Z .b8 0 // EOM(2) 2026-02-21T12:51:27.5180857Z .b8 4 // Abbreviation Code 2026-02-21T12:51:27.5181122Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T12:51:27.5181385Z .b8 0 // DW_CHILDREN_no 2026-02-21T12:51:27.5181633Z .b8 49 // DW_AT_abstract_origin 2026-02-21T12:51:27.5181896Z .b8 19 // DW_FORM_ref4 2026-02-21T12:51:27.5182124Z .b8 17 // DW_AT_low_pc 2026-02-21T12:51:27.5182447Z .b8 1 // DW_FORM_addr 2026-02-21T12:51:27.5182680Z .b8 18 // DW_AT_high_pc 2026-02-21T12:51:27.5182917Z .b8 1 // DW_FORM_addr 2026-02-21T12:51:27.5183154Z .b8 88 // DW_AT_call_file 2026-02-21T12:51:27.5183391Z .b8 11 // DW_FORM_data1 2026-02-21T12:51:27.5183629Z .b8 89 // DW_AT_call_line 2026-02-21T12:51:27.5183862Z .b8 11 // DW_FORM_data1 2026-02-21T12:51:27.5184104Z .b8 87 // DW_AT_call_column 2026-02-21T12:51:27.5184342Z .b8 11 // DW_FORM_data1 2026-02-21T12:51:27.5184572Z .b8 0 // EOM(1) 2026-02-21T12:51:27.5184791Z .b8 0 // EOM(2) 2026-02-21T12:51:27.5185007Z .b8 0 // EOM(3) 2026-02-21T12:51:27.5185213Z } 2026-02-21T12:51:27.5185347Z .section .debug_info 2026-02-21T12:51:27.5185502Z { 2026-02-21T12:51:27.5185807Z .b32 178 // Length of Unit 2026-02-21T12:51:27.5186072Z .b8 2 // DWARF version number 2026-02-21T12:51:27.5186297Z .b8 0 2026-02-21T12:51:27.5186647Z .b32 .debug_abbrev // Offset Into Abbrev. 
Section 2026-02-21T12:51:27.5186962Z .b8 8 // Address Size (in bytes) 2026-02-21T12:51:27.5187264Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T12:51:27.5187574Z .b8 116 // DW_AT_producer 2026-02-21T12:51:27.5187797Z .b8 114 2026-02-21T12:51:27.5187939Z .b8 105 2026-02-21T12:51:27.5188066Z .b8 116 2026-02-21T12:51:27.5188199Z .b8 111 2026-02-21T12:51:27.5188325Z .b8 110 2026-02-21T12:51:27.5188512Z .b8 0 2026-02-21T12:51:27.5188675Z .b8 2 // DW_AT_language 2026-02-21T12:51:27.5188897Z .b8 0 2026-02-21T12:51:27.5189048Z .b8 99 // DW_AT_name 2026-02-21T12:51:27.5189349Z .b8 112 2026-02-21T12:51:27.5189490Z .b8 103 2026-02-21T12:51:27.5189615Z .b8 119 2026-02-21T12:51:27.5189745Z .b8 122 2026-02-21T12:51:27.5189867Z .b8 114 2026-02-21T12:51:27.5190001Z .b8 115 2026-02-21T12:51:27.5190127Z .b8 55 2026-02-21T12:51:27.5190258Z .b8 109 2026-02-21T12:51:27.5190382Z .b8 109 2026-02-21T12:51:27.5190509Z .b8 105 2026-02-21T12:51:27.5190632Z .b8 120 2026-02-21T12:51:27.5190761Z .b8 111 2026-02-21T12:51:27.5190886Z .b8 104 2026-02-21T12:51:27.5191016Z .b8 105 2026-02-21T12:51:27.5191140Z .b8 117 2026-02-21T12:51:27.5191277Z .b8 55 2026-02-21T12:51:27.5191414Z .b8 100 2026-02-21T12:51:27.5191550Z .b8 97 2026-02-21T12:51:27.5191682Z .b8 113 2026-02-21T12:51:27.5191806Z .b8 110 2026-02-21T12:51:27.5191933Z .b8 101 2026-02-21T12:51:27.5192058Z .b8 106 2026-02-21T12:51:27.5192190Z .b8 52 2026-02-21T12:51:27.5192317Z .b8 51 2026-02-21T12:51:27.5192446Z .b8 54 2026-02-21T12:51:27.5192573Z .b8 109 2026-02-21T12:51:27.5192702Z .b8 100 2026-02-21T12:51:27.5192830Z .b8 101 2026-02-21T12:51:27.5192963Z .b8 111 2026-02-21T12:51:27.5193088Z .b8 108 2026-02-21T12:51:27.5193220Z .b8 51 2026-02-21T12:51:27.5193351Z .b8 55 2026-02-21T12:51:27.5193473Z .b8 120 2026-02-21T12:51:27.5193603Z .b8 102 2026-02-21T12:51:27.5193726Z .b8 105 2026-02-21T12:51:27.5193855Z .b8 55 2026-02-21T12:51:27.5193978Z .b8 52 2026-02-21T12:51:27.5194120Z .b8 51 2026-02-21T12:51:27.5194248Z .b8 119 
2026-02-21T12:51:27.5194380Z .b8 101 2026-02-21T12:51:27.5194508Z .b8 107 2026-02-21T12:51:27.5194642Z .b8 99 2026-02-21T12:51:27.5194767Z .b8 118 2026-02-21T12:51:27.5194900Z .b8 54 2026-02-21T12:51:27.5195027Z .b8 119 2026-02-21T12:51:27.5195160Z .b8 116 2026-02-21T12:51:27.5195294Z .b8 114 2026-02-21T12:51:27.5195418Z .b8 114 2026-02-21T12:51:27.5195546Z .b8 105 2026-02-21T12:51:27.5195674Z .b8 100 2026-02-21T12:51:27.5195801Z .b8 102 2026-02-21T12:51:27.5196027Z .b8 46 2026-02-21T12:51:27.5196155Z .b8 112 2026-02-21T12:51:27.5196279Z .b8 121 2026-02-21T12:51:27.5196405Z .b8 0 2026-02-21T12:51:27.5196734Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T12:51:27.5197008Z .b8 47 // DW_AT_comp_dir 2026-02-21T12:51:27.5197218Z .b8 116 2026-02-21T12:51:27.5197349Z .b8 109 2026-02-21T12:51:27.5197471Z .b8 112 2026-02-21T12:51:27.5197599Z .b8 47 2026-02-21T12:51:27.5197725Z .b8 116 2026-02-21T12:51:27.5197847Z .b8 111 2026-02-21T12:51:27.5197991Z .b8 114 2026-02-21T12:51:27.5198114Z .b8 99 2026-02-21T12:51:27.5198248Z .b8 104 2026-02-21T12:51:27.5198369Z .b8 105 2026-02-21T12:51:27.5198497Z .b8 110 2026-02-21T12:51:27.5198621Z .b8 100 2026-02-21T12:51:27.5198749Z .b8 117 2026-02-21T12:51:27.5198871Z .b8 99 2026-02-21T12:51:27.5199001Z .b8 116 2026-02-21T12:51:27.5199125Z .b8 111 2026-02-21T12:51:27.5199253Z .b8 114 2026-02-21T12:51:27.5199375Z .b8 95 2026-02-21T12:51:27.5199509Z .b8 114 2026-02-21T12:51:27.5199645Z .b8 111 2026-02-21T12:51:27.5199770Z .b8 111 2026-02-21T12:51:27.5199899Z .b8 116 2026-02-21T12:51:27.5200021Z .b8 47 2026-02-21T12:51:27.5200150Z .b8 112 2026-02-21T12:51:27.5200360Z .b8 103 2026-02-21T12:51:27.5200553Z .b8 0 2026-02-21T12:51:27.5200740Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T12:51:27.5201028Z .b8 95 // DW_AT_name 2026-02-21T12:51:27.5201240Z .b8 104 2026-02-21T12:51:27.5201370Z .b8 101 2026-02-21T12:51:27.5201497Z .b8 108 2026-02-21T12:51:27.5201628Z .b8 105 2026-02-21T12:51:27.5201759Z .b8 111 
2026-02-21T12:51:27.5201886Z .b8 110 2026-02-21T12:51:27.5202018Z .b8 95 2026-02-21T12:51:27.5202148Z .b8 109 2026-02-21T12:51:27.5202278Z .b8 97 2026-02-21T12:51:27.5202405Z .b8 116 2026-02-21T12:51:27.5202535Z .b8 109 2026-02-21T12:51:27.5202662Z .b8 117 2026-02-21T12:51:27.5202794Z .b8 108 2026-02-21T12:51:27.5202933Z .b8 95 2026-02-21T12:51:27.5203069Z .b8 98 2026-02-21T12:51:27.5203192Z .b8 102 2026-02-21T12:51:27.5203324Z .b8 49 2026-02-21T12:51:27.5203446Z .b8 54 2026-02-21T12:51:27.5203576Z .b8 95 2026-02-21T12:51:27.5203704Z .b8 105 2026-02-21T12:51:27.5203828Z .b8 110 2026-02-21T12:51:27.5204037Z .b8 116 2026-02-21T12:51:27.5204164Z .b8 52 2026-02-21T12:51:27.5204291Z .b8 0 2026-02-21T12:51:27.5204445Z .b8 1 // DW_AT_inline 2026-02-21T12:51:27.5204725Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T12:51:27.5205020Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T12:51:27.5205292Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T12:51:27.5205566Z .b32 108 // DW_AT_abstract_origin 2026-02-21T12:51:27.5205877Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 2026-02-21T12:51:27.5206200Z .b32 108 // DW_AT_abstract_origin 2026-02-21T12:51:27.5206601Z .b64 $L__tmp0 // DW_AT_low_pc 2026-02-21T12:51:27.5206879Z .b64 $L__tmp1 // DW_AT_high_pc 2026-02-21T12:51:27.5207132Z .b8 1 // DW_AT_call_file 2026-02-21T12:51:27.5207388Z .b8 84 // DW_AT_call_line 2026-02-21T12:51:27.5207648Z .b8 40 // DW_AT_call_column 2026-02-21T12:51:27.5207915Z .b8 0 // End Of Children Mark 2026-02-21T12:51:27.5208180Z .b8 0 // End Of Children Mark 2026-02-21T12:51:27.5208403Z } 2026-02-21T12:51:27.5208554Z .section .debug_macinfo { } 2026-02-21T12:51:27.5208680Z 2026-02-21T12:51:27.5208763Z ================================================================ 2026-02-21T12:51:27.5209048Z please share the reproducer above with Triton project. 
2026-02-21T12:52:20.4639174Z 2026-02-21T12:52:20.4640006Z Generation 5: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━━ 98/98 1.6 configs/s 2026-02-21T12:52:21.0159429Z Generation 5: verifying top configs 100% ━━━━━━━━━━━━━━━━━━━━━━━ 7/7 - configs/s 2026-02-21T12:52:24.9182385Z [8844s] Generation 5 complete: 2026-02-21T12:52:24.9182763Z error=42 2026-02-21T12:52:24.9183043Z ok=61 2026-02-21T12:52:24.9183246Z min=28.2028 2026-02-21T12:52:24.9183447Z mid=58.4061 2026-02-21T12:52:24.9183657Z max=1431.7927 2026-02-21T12:52:24.9183888Z best={'block_sizes': [32, 64, 256], 2026-02-21T12:52:24.9184255Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T12:52:24.9184619Z 'l2_groupings': [1], 2026-02-21T12:52:24.9184897Z 'load_eviction_policies': ['', ''], 2026-02-21T12:52:24.9185214Z 'loop_orders': [[1, 0]], 2026-02-21T12:52:24.9185474Z 'num_stages': 1, 2026-02-21T12:52:24.9185709Z 'num_warps': 4, 2026-02-21T12:52:24.9185934Z 'pid_type': 'flat', 2026-02-21T12:52:24.9186196Z 'range_flattens': [None, None], 2026-02-21T12:52:24.9186938Z 'range_multi_buffers': [None, True], 2026-02-21T12:52:24.9187305Z 'range_num_stages': [0, 0], 2026-02-21T12:52:24.9187591Z 'range_unroll_factors': [0, 1], 2026-02-21T12:52:24.9187946Z 'range_warp_specializes': []} 2026-02-21T12:52:24.9225817Z [8844s] Fitting surrogate: 627 points, 627 targets 2026-02-21T12:52:26.3817434Z [8845s] Generation 6 starting: 77 neighbors, 5 active search path(s) 2026-02-21T12:52:57.0627064Z Generation 6: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 77/77 3.6 configs/s 2026-02-21T12:54:06.7678863Z Generation 6: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━━ 77/77 0.9 configs/s 2026-02-21T12:54:08.3351203Z Generation 6: verifying top configs 100% ━━━━━━━━━━━━━━━━━━━━━ 7/7 2.8 configs/s 2026-02-21T12:54:15.6540211Z [8954s] Generation 6 complete: 2026-02-21T12:54:15.6540493Z error=15 2026-02-21T12:54:15.6540643Z ok=67 2026-02-21T12:54:15.6540796Z min=27.5955 2026-02-21T12:54:15.6540942Z mid=42.2552 
2026-02-21T12:54:15.6541094Z max=1956.1366 2026-02-21T12:54:15.6541260Z best={'block_sizes': [32, 64, 256], 2026-02-21T12:54:15.6541522Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T12:54:15.6541785Z 'l2_groupings': [1], 2026-02-21T12:54:15.6541999Z 'load_eviction_policies': ['', ''], 2026-02-21T12:54:15.6542226Z 'loop_orders': [[1, 0]], 2026-02-21T12:54:15.6542966Z 'num_stages': 2, 2026-02-21T12:54:15.6543170Z 'num_warps': 4, 2026-02-21T12:54:15.6543331Z 'pid_type': 'flat', 2026-02-21T12:54:15.6543518Z 'range_flattens': [None, None], 2026-02-21T12:54:15.6543732Z 'range_multi_buffers': [None, True], 2026-02-21T12:54:15.6543964Z 'range_num_stages': [0, 0], 2026-02-21T12:54:15.6544163Z 'range_unroll_factors': [0, 1], 2026-02-21T12:54:15.6544381Z 'range_warp_specializes': []} 2026-02-21T12:54:15.6591719Z [8954s] Fitting surrogate: 709 points, 709 targets 2026-02-21T12:54:16.6401396Z [8955s] Generation 7 starting: 55 neighbors, 3 active search path(s) 2026-02-21T12:54:38.3945682Z Generation 7: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 55/55 2.5 configs/s 2026-02-21T12:55:35.8281657Z Generation 7: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━━ 55/55 0.8 configs/s 2026-02-21T12:55:36.5605771Z Generation 7: verifying top configs 100% ━━━━━━━━━━━━━━━━━━━━━━━ 7/7 - configs/s 2026-02-21T12:55:41.6777019Z [9040s] Generation 7 complete: 2026-02-21T12:55:41.6777478Z error=10 2026-02-21T12:55:41.6777779Z ok=48 2026-02-21T12:55:41.6778052Z min=27.9060 2026-02-21T12:55:41.6778315Z mid=46.4983 2026-02-21T12:55:41.6778589Z max=2072.8018 2026-02-21T12:55:41.6778890Z best={'block_sizes': [32, 64, 128], 2026-02-21T12:55:41.6779366Z 'indexing': ['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], 2026-02-21T12:55:41.6779747Z 'l2_groupings': [8], 2026-02-21T12:55:41.6779953Z 'load_eviction_policies': ['', 'first'], 2026-02-21T12:55:41.6780205Z 'loop_orders': [[1, 0]], 2026-02-21T12:55:41.6780399Z 'maxnreg': 256, 2026-02-21T12:55:41.6780597Z 
'num_sm_multiplier': 32, 2026-02-21T12:55:41.6780798Z 'num_stages': 5, 2026-02-21T12:55:41.6780985Z 'num_warps': 4, 2026-02-21T12:55:41.6781177Z 'pid_type': 'persistent_interleaved', 2026-02-21T12:55:41.6781426Z 'range_flattens': [None, False], 2026-02-21T12:55:41.6782076Z 'range_multi_buffers': [False, False], 2026-02-21T12:55:41.6782315Z 'range_num_stages': [4, 0], 2026-02-21T12:55:41.6782522Z 'range_unroll_factors': [0, 0], 2026-02-21T12:55:41.6782754Z 'range_warp_specializes': []} 2026-02-21T12:55:41.6835104Z [9040s] Fitting surrogate: 767 points, 767 targets 2026-02-21T12:55:42.6827158Z [9041s] Generation 8 starting: 56 neighbors, 3 active search path(s) 2026-02-21T12:56:06.2875545Z Generation 8: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56/56 2.1 configs/s 2026-02-21T12:56:17.2116310Z 2026-02-21T12:56:17.2116326Z 2026-02-21T12:56:17.2116851Z ================================================================ 2026-02-21T12:56:17.2117221Z Internal Triton PTX codegen error 2026-02-21T12:56:17.2117468Z `ptxas` stderr: 2026-02-21T12:56:17.2118201Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 1297 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 
2026-02-21T12:56:17.2119030Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T12:56:17.2119289Z 2026-02-21T12:56:17.2120529Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpc0ojczf1.ptx -o /tmp/tmpc0ojczf1.ptx.o 2026-02-21T12:56:17.2121426Z 2026-02-21T12:56:17.2121432Z 2026-02-21T12:56:17.2121510Z // 2026-02-21T12:56:17.2121690Z // Generated by LLVM NVPTX Back-End 2026-02-21T12:56:17.2121936Z // 2026-02-21T12:56:17.2122022Z 2026-02-21T12:56:17.2122092Z .version 8.7 2026-02-21T12:56:17.2122268Z .target sm_90a 2026-02-21T12:56:17.2122435Z .address_size 64 2026-02-21T12:56:17.2122543Z 2026-02-21T12:56:17.2122751Z // .globl _helion_matmul_bf16_int4 // -- Begin function _helion_matmul_bf16_int4 2026-02-21T12:56:17.2123153Z .extern .shared .align 16 .b8 global_smem[]; 2026-02-21T12:56:17.2123444Z // @_helion_matmul_bf16_int4 2026-02-21T12:56:17.2123745Z .visible .entry _helion_matmul_bf16_int4( 2026-02-21T12:56:17.2124084Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_0, 2026-02-21T12:56:17.2124609Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_1, 2026-02-21T12:56:17.2125028Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_2, 2026-02-21T12:56:17.2125423Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_3, 2026-02-21T12:56:17.2125830Z .param .u64 .ptr .global .align 1 _helion_matmul_bf16_int4_param_4 2026-02-21T12:56:17.2126140Z ) 2026-02-21T12:56:17.2126290Z .reqntid 128 2026-02-21T12:56:17.2126654Z .maxnreg 32 2026-02-21T12:56:17.2126823Z { 2026-02-21T12:56:17.2126973Z .reg .pred %p<23>; 2026-02-21T12:56:17.2127158Z .reg .b16 %rs<231>; 2026-02-21T12:56:17.2127333Z .reg .b32 %r<3430>; 2026-02-21T12:56:17.2127511Z .reg .b64 %rd<260>; 2026-02-21T12:56:17.2127686Z $L__func_begin0: 2026-02-21T12:56:17.2127789Z 2026-02-21T12:56:17.2127851Z // %bb.0: 2026-02-21T12:56:17.2128205Z .loc 1 19 46 // 
cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:19:46 2026-02-21T12:56:17.2128641Z mov.u32 %r1, %ctaid.x; 2026-02-21T12:56:17.2129031Z .loc 1 19 103 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:19:103 2026-02-21T12:56:17.2129482Z setp.gt.u32 %p1, %r1, 20479; 2026-02-21T12:56:17.2129696Z @%p1 bra $L__BB0_5; 2026-02-21T12:56:17.2129901Z // %bb.1: // %.lr.ph 2026-02-21T12:56:17.2130330Z .loc 1 0 103 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:0:103 2026-02-21T12:56:17.2130827Z ld.param.b64 %rd75, [_helion_matmul_bf16_int4_param_2]; 2026-02-21T12:56:17.2131169Z ld.param.b64 %rd74, [_helion_matmul_bf16_int4_param_1]; 2026-02-21T12:56:17.2131509Z ld.param.b64 %rd73, [_helion_matmul_bf16_int4_param_0]; 2026-02-21T12:56:17.2131803Z mov.u32 %r2, %tid.x; 2026-02-21T12:56:17.2132032Z shr.u32 %r3, %r2, 5; 2026-02-21T12:56:17.2132216Z and.b32 %r4, %r2, 112; 2026-02-21T12:56:17.2132414Z bfe.u32 %r370, %r2, 4, 3; 2026-02-21T12:56:17.2132749Z or.b32 %r371, %r370, 8; 2026-02-21T12:56:17.2132940Z or.b32 %r372, %r370, 16; 2026-02-21T12:56:17.2133128Z or.b32 %r373, %r370, 24; 2026-02-21T12:56:17.2133312Z or.b32 %r374, %r370, 32; 2026-02-21T12:56:17.2133494Z or.b32 %r375, %r370, 40; 2026-02-21T12:56:17.2133664Z or.b32 %r376, %r370, 48; 2026-02-21T12:56:17.2133839Z or.b32 %r377, %r370, 56; 2026-02-21T12:56:17.2134005Z or.b32 %r378, %r370, 64; 2026-02-21T12:56:17.2134177Z or.b32 %r379, %r370, 72; 2026-02-21T12:56:17.2134354Z or.b32 %r380, %r370, 80; 2026-02-21T12:56:17.2134542Z or.b32 %r381, %r370, 88; 2026-02-21T12:56:17.2144049Z or.b32 %r382, %r370, 96; 2026-02-21T12:56:17.2144263Z or.b32 %r383, %r370, 104; 2026-02-21T12:56:17.2144464Z or.b32 %r384, %r370, 112; 2026-02-21T12:56:17.2144655Z or.b32 %r385, %r370, 120; 2026-02-21T12:56:17.2144828Z shl.b32 %r386, %r2, 4; 2026-02-21T12:56:17.2145017Z and.b32 %r387, %r386, 112; 2026-02-21T12:56:17.2145200Z and.b32 %r5, %r2, 15; 2026-02-21T12:56:17.2145400Z cvt.u64.u32 %rd1, %r370; 
2026-02-21T12:56:17.2145579Z cvt.u64.u32 %rd2, %r371; 2026-02-21T12:56:17.2145757Z cvt.u64.u32 %rd3, %r372; 2026-02-21T12:56:17.2146156Z cvt.u64.u32 %rd4, %r373; 2026-02-21T12:56:17.2146347Z cvt.u64.u32 %rd5, %r374; 2026-02-21T12:56:17.2146698Z cvt.u64.u32 %rd6, %r375; 2026-02-21T12:56:17.2146866Z cvt.u64.u32 %rd7, %r376; 2026-02-21T12:56:17.2147039Z cvt.u64.u32 %rd8, %r377; 2026-02-21T12:56:17.2147207Z cvt.u64.u32 %rd9, %r378; 2026-02-21T12:56:17.2147393Z cvt.u64.u32 %rd10, %r379; 2026-02-21T12:56:17.2147570Z cvt.u64.u32 %rd11, %r380; 2026-02-21T12:56:17.2147743Z cvt.u64.u32 %rd12, %r381; 2026-02-21T12:56:17.2147910Z cvt.u64.u32 %rd13, %r382; 2026-02-21T12:56:17.2148096Z cvt.u64.u32 %rd14, %r383; 2026-02-21T12:56:17.2148266Z cvt.u64.u32 %rd15, %r384; 2026-02-21T12:56:17.2148547Z cvt.u64.u32 %rd16, %r385; 2026-02-21T12:56:17.2148720Z cvt.u64.u32 %rd17, %r387; 2026-02-21T12:56:17.2148892Z mul.wide.u32 %rd18, %r5, 8; 2026-02-21T12:56:17.2149084Z and.b32 %r6, %r2, 120; 2026-02-21T12:56:17.2149252Z bfe.u32 %r388, %r2, 3, 4; 2026-02-21T12:56:17.2149428Z or.b32 %r389, %r388, 16; 2026-02-21T12:56:17.2149698Z cvt.u64.u32 %rd19, %r388; 2026-02-21T12:56:17.2149883Z cvt.u64.u32 %rd20, %r389; 2026-02-21T12:56:17.2150056Z cvt.u32.u64 %r390, %rd17; 2026-02-21T12:56:17.2150486Z .loc 1 19 52 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:19:52 2026-02-21T12:56:17.2150882Z cvt.u64.u32 %rd257, %r1; 2026-02-21T12:56:17.2151064Z mul.wide.u32 %rd22, %r5, 4; 2026-02-21T12:56:17.2151254Z and.b32 %r7, %r2, 127; 2026-02-21T12:56:17.2151417Z shl.b32 %r391, %r7, 3; 2026-02-21T12:56:17.2151582Z shr.u32 %r392, %r4, 1; 2026-02-21T12:56:17.2151753Z xor.b32 %r8, %r391, %r392; 2026-02-21T12:56:17.2151945Z mov.b32 %r393, global_smem; 2026-02-21T12:56:17.2152121Z add.s32 %r444, %r393, %r8; 2026-02-21T12:56:17.2152297Z add.s32 %r446, %r444, 1024; 2026-02-21T12:56:17.2152472Z add.s32 %r448, %r444, 2048; 2026-02-21T12:56:17.2152667Z add.s32 %r450, %r444, 3072; 2026-02-21T12:56:17.2152843Z 
add.s32 %r452, %r444, 4096; 2026-02-21T12:56:17.2153015Z add.s32 %r454, %r444, 5120; 2026-02-21T12:56:17.2153197Z add.s32 %r456, %r444, 6144; 2026-02-21T12:56:17.2153368Z add.s32 %r458, %r444, 7168; 2026-02-21T12:56:17.2153555Z add.s32 %r460, %r444, 8192; 2026-02-21T12:56:17.2153728Z add.s32 %r462, %r444, 9216; 2026-02-21T12:56:17.2153909Z add.s32 %r464, %r444, 10240; 2026-02-21T12:56:17.2154086Z add.s32 %r466, %r444, 11264; 2026-02-21T12:56:17.2154264Z add.s32 %r468, %r444, 12288; 2026-02-21T12:56:17.2154440Z add.s32 %r470, %r444, 13312; 2026-02-21T12:56:17.2154620Z add.s32 %r472, %r444, 14336; 2026-02-21T12:56:17.2154804Z add.s32 %r474, %r444, 15360; 2026-02-21T12:56:17.2155006Z mad.lo.s64 %rd23, %rd19, 1280, %rd74; 2026-02-21T12:56:17.2155234Z mad.lo.s64 %rd24, %rd20, 1280, %rd74; 2026-02-21T12:56:17.2155434Z shl.b32 %r25, %r7, 4; 2026-02-21T12:56:17.2155979Z [9076s] Triton compile failed. This likely indicates a bug in Triton. Skipping failing config. 2026-02-21T12:56:17.2157870Z Config: @helion.kernel(config=helion.Config(block_sizes=[32, 128, 128], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[8], load_eviction_policies=['', 'first'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=32, num_stages=5, num_warps=4, pid_type='persistent_interleaved', range_flattens=[None, False], range_multi_buffers=[False, None], range_num_stages=[4, 0], range_unroll_factors=[0, 0], range_warp_specializes=[]), static_shapes=True) 2026-02-21T12:56:17.2159435Z Error: PTXASError: PTXAS error: Internal Triton PTX codegen error 2026-02-21T12:56:17.2159726Z `ptxas` stderr: 2026-02-21T12:56:17.2160286Z ptxas fatal : (C7602) Insufficient registers (32) to compile instruction at line 1297 in function _helion_matmul_bf16_int4. Try to compile with register target of 94 or higher. 
2026-02-21T12:56:17.2160925Z ptxas fatal : Ptx assembly aborted due to errors 2026-02-21T12:56:17.2161116Z 2026-02-21T12:56:17.2161730Z Repro command: /__w/helion/helion/.venv/lib/python3.12/site-packages/triton/backends/nvidia/bin/ptxas -lineinfo -v --gpu-name=sm_90a /tmp/tmpc0ojczf1.ptx -o /tmp/tmpc0ojczf1.ptx.o 2026-02-21T12:56:17.2162383Z 2026-02-21T12:56:17.2162546Z Enable HELION_AUTOTUNE_LOG_LEVEL=DEBUG to log generated Triton code. 2026-02-21T12:56:17.2162846Z add.s32 %r394, %r393, %r25; 2026-02-21T12:56:17.2163049Z add.s32 %r476, %r394, 98304; 2026-02-21T12:56:17.2163233Z add.s32 %r478, %r394, 100352; 2026-02-21T12:56:17.2163422Z add.s32 %r480, %r444, 16384; 2026-02-21T12:56:17.2163596Z add.s32 %r482, %r444, 17408; 2026-02-21T12:56:17.2163784Z add.s32 %r484, %r444, 18432; 2026-02-21T12:56:17.2163967Z add.s32 %r486, %r444, 19456; 2026-02-21T12:56:17.2164137Z add.s32 %r488, %r444, 20480; 2026-02-21T12:56:17.2164317Z add.s32 %r490, %r444, 21504; 2026-02-21T12:56:17.2164487Z add.s32 %r492, %r444, 22528; 2026-02-21T12:56:17.2164680Z add.s32 %r494, %r444, 23552; 2026-02-21T12:56:17.2164855Z add.s32 %r496, %r444, 24576; 2026-02-21T12:56:17.2165033Z add.s32 %r498, %r444, 25600; 2026-02-21T12:56:17.2165206Z add.s32 %r500, %r444, 26624; 2026-02-21T12:56:17.2165472Z add.s32 %r502, %r444, 27648; 2026-02-21T12:56:17.2165657Z add.s32 %r504, %r444, 28672; 2026-02-21T12:56:17.2165829Z add.s32 %r506, %r444, 29696; 2026-02-21T12:56:17.2166004Z add.s32 %r508, %r444, 30720; 2026-02-21T12:56:17.2166186Z add.s32 %r510, %r444, 31744; 2026-02-21T12:56:17.2166371Z add.s32 %r512, %r394, 102400; 2026-02-21T12:56:17.2166694Z add.s32 %r514, %r394, 104448; 2026-02-21T12:56:17.2166881Z add.s32 %r516, %r444, 32768; 2026-02-21T12:56:17.2167054Z add.s32 %r518, %r444, 33792; 2026-02-21T12:56:17.2167233Z add.s32 %r520, %r444, 34816; 2026-02-21T12:56:17.2167425Z add.s32 %r522, %r444, 35840; 2026-02-21T12:56:17.2167604Z add.s32 %r524, %r444, 36864; 2026-02-21T12:56:17.2167785Z add.s32 %r526, 
%r444, 37888; 2026-02-21T12:56:17.2167967Z add.s32 %r528, %r444, 38912; 2026-02-21T12:56:17.2168150Z add.s32 %r530, %r444, 39936; 2026-02-21T12:56:17.2168321Z add.s32 %r532, %r444, 40960; 2026-02-21T12:56:17.2168514Z add.s32 %r534, %r444, 41984; 2026-02-21T12:56:17.2168694Z add.s32 %r536, %r444, 43008; 2026-02-21T12:56:17.2168876Z add.s32 %r538, %r444, 44032; 2026-02-21T12:56:17.2169050Z add.s32 %r540, %r444, 45056; 2026-02-21T12:56:17.2169230Z add.s32 %r542, %r444, 46080; 2026-02-21T12:56:17.2169415Z add.s32 %r544, %r444, 47104; 2026-02-21T12:56:17.2169598Z add.s32 %r546, %r444, 48128; 2026-02-21T12:56:17.2169782Z add.s32 %r548, %r394, 106496; 2026-02-21T12:56:17.2169957Z add.s32 %r550, %r394, 108544; 2026-02-21T12:56:17.2170139Z add.s32 %r552, %r444, 49152; 2026-02-21T12:56:17.2170308Z add.s32 %r554, %r444, 50176; 2026-02-21T12:56:17.2170487Z add.s32 %r556, %r444, 51200; 2026-02-21T12:56:17.2170657Z add.s32 %r558, %r444, 52224; 2026-02-21T12:56:17.2170838Z add.s32 %r560, %r444, 53248; 2026-02-21T12:56:17.2171007Z add.s32 %r562, %r444, 54272; 2026-02-21T12:56:17.2171283Z add.s32 %r564, %r444, 55296; 2026-02-21T12:56:17.2171462Z add.s32 %r566, %r444, 56320; 2026-02-21T12:56:17.2171631Z add.s32 %r568, %r444, 57344; 2026-02-21T12:56:17.2171839Z add.s32 %r570, %r444, 58368; 2026-02-21T12:56:17.2172010Z add.s32 %r572, %r444, 59392; 2026-02-21T12:56:17.2172188Z add.s32 %r574, %r444, 60416; 2026-02-21T12:56:17.2172357Z add.s32 %r576, %r444, 61440; 2026-02-21T12:56:17.2172535Z add.s32 %r578, %r444, 62464; 2026-02-21T12:56:17.2172706Z add.s32 %r580, %r444, 63488; 2026-02-21T12:56:17.2172884Z add.s32 %r582, %r444, 64512; 2026-02-21T12:56:17.2173060Z add.s32 %r584, %r394, 110592; 2026-02-21T12:56:17.2173251Z add.s32 %r586, %r394, 112640; 2026-02-21T12:56:17.2173430Z shl.b32 %r395, %r2, 6; 2026-02-21T12:56:17.2173598Z and.b32 %r396, %r395, 6144; 2026-02-21T12:56:17.2173783Z shl.b32 %r397, %r2, 5; 2026-02-21T12:56:17.2173950Z and.b32 %r398, %r397, 896; 
2026-02-21T12:56:17.2174129Z shl.b32 %r399, %r2, 1; 2026-02-21T12:56:17.2174293Z and.b32 %r400, %r399, 62; 2026-02-21T12:56:17.2174493Z or.b32 %r401, %r396, %r398; 2026-02-21T12:56:17.2174670Z or.b32 %r82, %r401, %r400; 2026-02-21T12:56:17.2174925Z xor.b32 %r83, %r82, 8; 2026-02-21T12:56:17.2175172Z xor.b32 %r84, %r82, 16; 2026-02-21T12:56:17.2175350Z xor.b32 %r85, %r82, 24; 2026-02-21T12:56:17.2175524Z xor.b32 %r86, %r82, 32; 2026-02-21T12:56:17.2175697Z xor.b32 %r87, %r82, 40; 2026-02-21T12:56:17.2175866Z xor.b32 %r88, %r82, 48; 2026-02-21T12:56:17.2176025Z xor.b32 %r89, %r82, 56; 2026-02-21T12:56:17.2176194Z or.b32 %r90, %r2, 896; 2026-02-21T12:56:17.2176356Z or.b32 %r91, %r2, 1920; 2026-02-21T12:56:17.2176656Z or.b32 %r92, %r2, 2944; 2026-02-21T12:56:17.2176821Z or.b32 %r93, %r2, 3968; 2026-02-21T12:56:17.2176989Z shl.b32 %r402, %r7, 7; 2026-02-21T12:56:17.2177165Z or.b32 %r403, %r402, %r390; 2026-02-21T12:56:17.2177350Z add.s32 %r404, %r393, 65536; 2026-02-21T12:56:17.2177547Z add.s32 %r94, %r404, %r403; 2026-02-21T12:56:17.2177722Z xor.b32 %r405, %r403, 16; 2026-02-21T12:56:17.2177901Z add.s32 %r95, %r404, %r405; 2026-02-21T12:56:17.2178072Z xor.b32 %r406, %r403, 32; 2026-02-21T12:56:17.2178326Z add.s32 %r96, %r404, %r406; 2026-02-21T12:56:17.2178521Z xor.b32 %r407, %r403, 48; 2026-02-21T12:56:17.2178700Z add.s32 %r97, %r404, %r407; 2026-02-21T12:56:17.2178875Z xor.b32 %r408, %r403, 64; 2026-02-21T12:56:17.2179040Z add.s32 %r98, %r404, %r408; 2026-02-21T12:56:17.2179214Z xor.b32 %r409, %r403, 80; 2026-02-21T12:56:17.2179376Z add.s32 %r99, %r404, %r409; 2026-02-21T12:56:17.2179555Z xor.b32 %r410, %r403, 96; 2026-02-21T12:56:17.2179723Z add.s32 %r100, %r404, %r410; 2026-02-21T12:56:17.2179915Z xor.b32 %r411, %r403, 112; 2026-02-21T12:56:17.2180089Z add.s32 %r101, %r404, %r411; 2026-02-21T12:56:17.2180273Z bfe.u32 %r412, %r404, 4, 14; 2026-02-21T12:56:17.2180449Z cvt.u64.u32 %rd76, %r412; 2026-02-21T12:56:17.2180642Z or.b64 %rd183, %rd76, 4611686293372403712; 
2026-02-21T12:56:17.2180862Z add.s32 %r413, %r393, 65568; 2026-02-21T12:56:17.2181037Z bfe.u32 %r414, %r413, 4, 14; 2026-02-21T12:56:17.2181214Z cvt.u64.u32 %rd77, %r414; 2026-02-21T12:56:17.2181410Z or.b64 %rd184, %rd77, 4611686293372403712; 2026-02-21T12:56:17.2181625Z add.s32 %r415, %r393, 65600; 2026-02-21T12:56:17.2181795Z bfe.u32 %r416, %r415, 4, 14; 2026-02-21T12:56:17.2181983Z cvt.u64.u32 %rd78, %r416; 2026-02-21T12:56:17.2182162Z or.b64 %rd185, %rd78, 4611686293372403712; 2026-02-21T12:56:17.2182370Z add.s32 %r417, %r393, 65632; 2026-02-21T12:56:17.2182544Z bfe.u32 %r418, %r417, 4, 14; 2026-02-21T12:56:17.2182717Z cvt.u64.u32 %rd79, %r418; 2026-02-21T12:56:17.2182897Z or.b64 %rd186, %rd79, 4611686293372403712; 2026-02-21T12:56:17.2183099Z add.s32 %r419, %r393, 81920; 2026-02-21T12:56:17.2183270Z bfe.u32 %r420, %r419, 4, 14; 2026-02-21T12:56:17.2183440Z cvt.u64.u32 %rd80, %r420; 2026-02-21T12:56:17.2183622Z or.b64 %rd187, %rd80, 4611686293372403712; 2026-02-21T12:56:17.2183826Z add.s32 %r421, %r393, 81952; 2026-02-21T12:56:17.2184102Z bfe.u32 %r422, %r421, 4, 14; 2026-02-21T12:56:17.2184274Z cvt.u64.u32 %rd81, %r422; 2026-02-21T12:56:17.2184454Z or.b64 %rd188, %rd81, 4611686293372403712; 2026-02-21T12:56:17.2184663Z add.s32 %r423, %r393, 81984; 2026-02-21T12:56:17.2184832Z bfe.u32 %r424, %r423, 4, 14; 2026-02-21T12:56:17.2185006Z cvt.u64.u32 %rd82, %r424; 2026-02-21T12:56:17.2185183Z or.b64 %rd189, %rd82, 4611686293372403712; 2026-02-21T12:56:17.2185385Z add.s32 %r425, %r393, 82016; 2026-02-21T12:56:17.2185555Z bfe.u32 %r426, %r425, 4, 14; 2026-02-21T12:56:17.2185735Z cvt.u64.u32 %rd83, %r426; 2026-02-21T12:56:17.2185913Z or.b64 %rd190, %rd83, 4611686293372403712; 2026-02-21T12:56:17.2186119Z and.b32 %r427, %r2, 3; 2026-02-21T12:56:17.2186287Z shl.b32 %r428, %r427, 11; 2026-02-21T12:56:17.2186586Z shl.b32 %r429, %r427, 5; 2026-02-21T12:56:17.2186776Z shl.b32 %r430, %r6, 4; 2026-02-21T12:56:17.2186938Z shl.b32 %r431, %r2, 2; 2026-02-21T12:56:17.2187103Z 
and.b32 %r432, %r431, 16; 2026-02-21T12:56:17.2187299Z or.b32 %r433, %r430, %r432; 2026-02-21T12:56:17.2187473Z or.b32 %r434, %r433, %r428; 2026-02-21T12:56:17.2187653Z or.b32 %r435, %r434, %r429; 2026-02-21T12:56:17.2187992Z add.s32 %r102, %r393, %r435; 2026-02-21T12:56:17.2188173Z xor.b32 %r436, %r435, 32; 2026-02-21T12:56:17.2188402Z add.s32 %r103, %r393, %r436; 2026-02-21T12:56:17.2188588Z xor.b32 %r437, %r435, 64; 2026-02-21T12:56:17.2188754Z add.s32 %r104, %r393, %r437; 2026-02-21T12:56:17.2188929Z xor.b32 %r438, %r435, 96; 2026-02-21T12:56:17.2189103Z add.s32 %r105, %r393, %r438; 2026-02-21T12:56:17.2189277Z shl.b32 %r439, %r2, 8; 2026-02-21T12:56:17.2189443Z and.b32 %r440, %r439, 6144; 2026-02-21T12:56:17.2189614Z and.b32 %r441, %r431, 496; 2026-02-21T12:56:17.2189784Z or.b32 %r442, %r440, %r429; 2026-02-21T12:56:17.2189951Z xor.b32 %r443, %r442, %r441; 2026-02-21T12:56:17.2190140Z add.s32 %r3096, %r393, %r443; 2026-02-21T12:56:17.2190316Z add.s32 %r3101, %r3096, 512; 2026-02-21T12:56:17.2190493Z add.s32 %r3106, %r3096, 1024; 2026-02-21T12:56:17.2190677Z add.s32 %r3111, %r3096, 1536; 2026-02-21T12:56:17.2190900Z $L__BB0_2: // =>This Loop Header: Depth=1 2026-02-21T12:56:17.2191329Z // Child Loop BB0_3 Depth 2 2026-02-21T12:56:17.2191731Z .loc 1 26 33 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:26:33 2026-02-21T12:56:17.2192102Z shr.u64 %rd158, %rd257, 11; 2026-02-21T12:56:17.2192290Z and.b64 %rd159, %rd158, 8; 2026-02-21T12:56:17.2192615Z .loc 1 27 39 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:27:39 2026-02-21T12:56:17.2192975Z xor.b64 %rd160, %rd159, 10; 2026-02-21T12:56:17.2193296Z .loc 1 27 52 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:27:52 2026-02-21T12:56:17.2193656Z min.u64 %rd161, %rd160, 8; 2026-02-21T12:56:17.2193967Z .loc 1 28 64 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:28:64 2026-02-21T12:56:17.2194327Z cvt.u16.u64 %rs1, %rd257; 2026-02-21T12:56:17.2194511Z and.b16 %rs2, %rs1, 
16383; 2026-02-21T12:56:17.2194693Z cvt.u16.u64 %rs3, %rd161; 2026-02-21T12:56:17.2195010Z .loc 1 29 51 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:29:51 2026-02-21T12:56:17.2195364Z div.u16 %rs4, %rs2, %rs3; 2026-02-21T12:56:17.2195681Z .loc 1 28 64 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:28:64 2026-02-21T12:56:17.2196033Z mul.lo.s16 %rs5, %rs4, %rs3; 2026-02-21T12:56:17.2196213Z sub.s16 %rs6, %rs2, %rs5; 2026-02-21T12:56:17.2196381Z cvt.u64.u16 %rd162, %rs6; 2026-02-21T12:56:17.2196856Z .loc 1 28 30 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:28:30 2026-02-21T12:56:17.2197212Z add.s64 %rd163, %rd159, %rd162; 2026-02-21T12:56:17.2197547Z .loc 1 30 27 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:30:27 2026-02-21T12:56:17.2198018Z shl.b64 %rd34, %rd163, 7; 2026-02-21T12:56:17.2198324Z .loc 1 31 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:31:32 2026-02-21T12:56:17.2198685Z or.b64 %rd164, %rd34, %rd17; 2026-02-21T12:56:17.2198999Z .loc 1 32 27 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:32:27 2026-02-21T12:56:17.2199360Z cvt.u32.u16 %r591, %rs4; 2026-02-21T12:56:17.2199534Z mul.wide.u32 %rd165, %r591, 128; 2026-02-21T12:56:17.2199872Z .loc 1 33 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:33:32 2026-02-21T12:56:17.2200228Z or.b64 %rd35, %rd165, %rd1; 2026-02-21T12:56:17.2200401Z or.b64 %rd36, %rd165, %rd2; 2026-02-21T12:56:17.2200581Z or.b64 %rd37, %rd165, %rd3; 2026-02-21T12:56:17.2200753Z or.b64 %rd38, %rd165, %rd4; 2026-02-21T12:56:17.2200931Z or.b64 %rd39, %rd165, %rd5; 2026-02-21T12:56:17.2201101Z or.b64 %rd40, %rd165, %rd6; 2026-02-21T12:56:17.2201292Z or.b64 %rd41, %rd165, %rd7; 2026-02-21T12:56:17.2201466Z or.b64 %rd42, %rd165, %rd8; 2026-02-21T12:56:17.2201643Z or.b64 %rd43, %rd165, %rd9; 2026-02-21T12:56:17.2201823Z or.b64 %rd44, %rd165, %rd10; 2026-02-21T12:56:17.2202150Z or.b64 %rd45, %rd165, %rd11; 2026-02-21T12:56:17.2202337Z or.b64 %rd46, 
%rd165, %rd12; 2026-02-21T12:56:17.2202509Z or.b64 %rd47, %rd165, %rd13; 2026-02-21T12:56:17.2202687Z or.b64 %rd48, %rd165, %rd14; 2026-02-21T12:56:17.2202859Z or.b64 %rd49, %rd165, %rd15; 2026-02-21T12:56:17.2203033Z or.b64 %rd50, %rd165, %rd16; 2026-02-21T12:56:17.2203344Z .loc 1 48 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:48:32 2026-02-21T12:56:17.2203699Z shl.b64 %rd166, %rd35, 14; 2026-02-21T12:56:17.2203891Z add.s64 %rd51, %rd73, %rd166; 2026-02-21T12:56:17.2204072Z shl.b64 %rd167, %rd22, 1; 2026-02-21T12:56:17.2204246Z add.s64 %rd84, %rd51, %rd167; 2026-02-21T12:56:17.2204418Z shl.b64 %rd168, %rd36, 14; 2026-02-21T12:56:17.2204595Z add.s64 %rd52, %rd73, %rd168; 2026-02-21T12:56:17.2204770Z add.s64 %rd85, %rd52, %rd167; 2026-02-21T12:56:17.2204946Z shl.b64 %rd169, %rd37, 14; 2026-02-21T12:56:17.2205194Z add.s64 %rd53, %rd73, %rd169; 2026-02-21T12:56:17.2205385Z add.s64 %rd86, %rd53, %rd167; 2026-02-21T12:56:17.2205561Z shl.b64 %rd170, %rd38, 14; 2026-02-21T12:56:17.2205736Z add.s64 %rd54, %rd73, %rd170; 2026-02-21T12:56:17.2205915Z add.s64 %rd87, %rd54, %rd167; 2026-02-21T12:56:17.2206102Z shl.b64 %rd171, %rd39, 14; 2026-02-21T12:56:17.2206279Z add.s64 %rd55, %rd73, %rd171; 2026-02-21T12:56:17.2206577Z add.s64 %rd88, %rd55, %rd167; 2026-02-21T12:56:17.2206781Z shl.b64 %rd172, %rd40, 14; 2026-02-21T12:56:17.2206953Z add.s64 %rd56, %rd73, %rd172; 2026-02-21T12:56:17.2207133Z add.s64 %rd89, %rd56, %rd167; 2026-02-21T12:56:17.2207320Z shl.b64 %rd173, %rd41, 14; 2026-02-21T12:56:17.2207502Z add.s64 %rd57, %rd73, %rd173; 2026-02-21T12:56:17.2207681Z add.s64 %rd90, %rd57, %rd167; 2026-02-21T12:56:17.2207853Z shl.b64 %rd174, %rd42, 14; 2026-02-21T12:56:17.2208038Z add.s64 %rd58, %rd73, %rd174; 2026-02-21T12:56:17.2208212Z add.s64 %rd91, %rd58, %rd167; 2026-02-21T12:56:17.2208395Z shl.b64 %rd175, %rd43, 14; 2026-02-21T12:56:17.2208583Z add.s64 %rd59, %rd73, %rd175; 2026-02-21T12:56:17.2208762Z add.s64 %rd92, %rd59, %rd167; 
2026-02-21T12:56:17.2208938Z shl.b64 %rd176, %rd44, 14; 2026-02-21T12:56:17.2209118Z add.s64 %rd60, %rd73, %rd176; 2026-02-21T12:56:17.2209290Z add.s64 %rd93, %rd60, %rd167; 2026-02-21T12:56:17.2209469Z shl.b64 %rd177, %rd45, 14; 2026-02-21T12:56:17.2209647Z add.s64 %rd61, %rd73, %rd177; 2026-02-21T12:56:17.2209833Z add.s64 %rd94, %rd61, %rd167; 2026-02-21T12:56:17.2210020Z shl.b64 %rd178, %rd46, 14; 2026-02-21T12:56:17.2210191Z add.s64 %rd62, %rd73, %rd178; 2026-02-21T12:56:17.2210370Z add.s64 %rd95, %rd62, %rd167; 2026-02-21T12:56:17.2210560Z shl.b64 %rd179, %rd47, 14; 2026-02-21T12:56:17.2210739Z add.s64 %rd63, %rd73, %rd179; 2026-02-21T12:56:17.2210915Z add.s64 %rd96, %rd63, %rd167; 2026-02-21T12:56:17.2211192Z shl.b64 %rd180, %rd48, 14; 2026-02-21T12:56:17.2211364Z add.s64 %rd64, %rd73, %rd180; 2026-02-21T12:56:17.2211551Z add.s64 %rd97, %rd64, %rd167; 2026-02-21T12:56:17.2211739Z shl.b64 %rd181, %rd49, 14; 2026-02-21T12:56:17.2211928Z add.s64 %rd65, %rd73, %rd181; 2026-02-21T12:56:17.2212115Z add.s64 %rd98, %rd65, %rd167; 2026-02-21T12:56:17.2212291Z shl.b64 %rd182, %rd50, 14; 2026-02-21T12:56:17.2212468Z add.s64 %rd66, %rd73, %rd182; 2026-02-21T12:56:17.2212640Z add.s64 %rd99, %rd66, %rd167; 2026-02-21T12:56:17.2212973Z .loc 1 48 80 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:48:80 2026-02-21T12:56:17.2213324Z bar.sync 0; 2026-02-21T12:56:17.2213470Z mov.b32 %r445, 8; 2026-02-21T12:56:17.2213625Z // begin inline asm 2026-02-21T12:56:17.2213857Z cp.async.ca.shared.global [ %r444 + 0 ], [ %rd84 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2214135Z // end inline asm 2026-02-21T12:56:17.2214282Z // begin inline asm 2026-02-21T12:56:17.2214512Z cp.async.ca.shared.global [ %r446 + 0 ], [ %rd85 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2214777Z // end inline asm 2026-02-21T12:56:17.2214921Z // begin inline asm 2026-02-21T12:56:17.2215291Z cp.async.ca.shared.global [ %r448 + 0 ], [ %rd86 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2215556Z // end inline asm 
2026-02-21T12:56:17.2215702Z // begin inline asm 2026-02-21T12:56:17.2215919Z cp.async.ca.shared.global [ %r450 + 0 ], [ %rd87 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2216180Z // end inline asm 2026-02-21T12:56:17.2216322Z // begin inline asm 2026-02-21T12:56:17.2216686Z cp.async.ca.shared.global [ %r452 + 0 ], [ %rd88 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2216945Z // end inline asm 2026-02-21T12:56:17.2217086Z // begin inline asm 2026-02-21T12:56:17.2217296Z cp.async.ca.shared.global [ %r454 + 0 ], [ %rd89 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2217578Z // end inline asm 2026-02-21T12:56:17.2217720Z // begin inline asm 2026-02-21T12:56:17.2217939Z cp.async.ca.shared.global [ %r456 + 0 ], [ %rd90 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2218201Z // end inline asm 2026-02-21T12:56:17.2218344Z // begin inline asm 2026-02-21T12:56:17.2218643Z cp.async.ca.shared.global [ %r458 + 0 ], [ %rd91 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2218908Z // end inline asm 2026-02-21T12:56:17.2219054Z // begin inline asm 2026-02-21T12:56:17.2219271Z cp.async.ca.shared.global [ %r460 + 0 ], [ %rd92 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2219549Z // end inline asm 2026-02-21T12:56:17.2219691Z // begin inline asm 2026-02-21T12:56:17.2219910Z cp.async.ca.shared.global [ %r462 + 0 ], [ %rd93 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2220173Z // end inline asm 2026-02-21T12:56:17.2220312Z // begin inline asm 2026-02-21T12:56:17.2220530Z cp.async.ca.shared.global [ %r464 + 0 ], [ %rd94 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2220789Z // end inline asm 2026-02-21T12:56:17.2220936Z // begin inline asm 2026-02-21T12:56:17.2221148Z cp.async.ca.shared.global [ %r466 + 0 ], [ %rd95 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2221412Z // end inline asm 2026-02-21T12:56:17.2221552Z // begin inline asm 2026-02-21T12:56:17.2221782Z cp.async.ca.shared.global [ %r468 + 0 ], [ %rd96 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2222062Z // end inline asm 2026-02-21T12:56:17.2222202Z // begin inline asm 
2026-02-21T12:56:17.2222431Z cp.async.ca.shared.global [ %r470 + 0 ], [ %rd97 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2222694Z // end inline asm 2026-02-21T12:56:17.2222841Z // begin inline asm 2026-02-21T12:56:17.2223056Z cp.async.ca.shared.global [ %r472 + 0 ], [ %rd98 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2223320Z // end inline asm 2026-02-21T12:56:17.2223461Z // begin inline asm 2026-02-21T12:56:17.2223685Z cp.async.ca.shared.global [ %r474 + 0 ], [ %rd99 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2223948Z // end inline asm 2026-02-21T12:56:17.2224108Z cp.async.commit_group; 2026-02-21T12:56:17.2224444Z .loc 1 54 34 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:54:34 2026-02-21T12:56:17.2224908Z add.s64 %rd100, %rd23, %rd164; 2026-02-21T12:56:17.2225095Z add.s64 %rd101, %rd24, %rd164; 2026-02-21T12:56:17.2225278Z mov.b32 %r477, 16; 2026-02-21T12:56:17.2225584Z .loc 1 54 87 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:54:87 2026-02-21T12:56:17.2225931Z // begin inline asm 2026-02-21T12:56:17.2226163Z cp.async.cg.shared.global [ %r476 + 0 ], [ %rd100 + 0 ], 0x10, %r477; 2026-02-21T12:56:17.2226555Z // end inline asm 2026-02-21T12:56:17.2226711Z // begin inline asm 2026-02-21T12:56:17.2226953Z cp.async.cg.shared.global [ %r478 + 0 ], [ %rd101 + 0 ], 0x10, %r477; 2026-02-21T12:56:17.2227216Z // end inline asm 2026-02-21T12:56:17.2227378Z cp.async.commit_group; 2026-02-21T12:56:17.2227691Z .loc 1 48 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:48:32 2026-02-21T12:56:17.2228050Z add.s64 %rd102, %rd84, 128; 2026-02-21T12:56:17.2228226Z add.s64 %rd103, %rd85, 128; 2026-02-21T12:56:17.2228460Z add.s64 %rd104, %rd86, 128; 2026-02-21T12:56:17.2228636Z add.s64 %rd105, %rd87, 128; 2026-02-21T12:56:17.2228814Z add.s64 %rd106, %rd88, 128; 2026-02-21T12:56:17.2229175Z add.s64 %rd107, %rd89, 128; 2026-02-21T12:56:17.2229351Z add.s64 %rd108, %rd90, 128; 2026-02-21T12:56:17.2229538Z add.s64 %rd109, %rd91, 128; 2026-02-21T12:56:17.2229709Z 
add.s64 %rd110, %rd92, 128; 2026-02-21T12:56:17.2229881Z add.s64 %rd111, %rd93, 128; 2026-02-21T12:56:17.2230046Z add.s64 %rd112, %rd94, 128; 2026-02-21T12:56:17.2230217Z add.s64 %rd113, %rd95, 128; 2026-02-21T12:56:17.2230387Z add.s64 %rd114, %rd96, 128; 2026-02-21T12:56:17.2230560Z add.s64 %rd115, %rd97, 128; 2026-02-21T12:56:17.2230731Z add.s64 %rd116, %rd98, 128; 2026-02-21T12:56:17.2230897Z add.s64 %rd117, %rd99, 128; 2026-02-21T12:56:17.2231224Z .loc 1 48 80 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:48:80 2026-02-21T12:56:17.2231572Z bar.sync 0; 2026-02-21T12:56:17.2231719Z // begin inline asm 2026-02-21T12:56:17.2231951Z cp.async.ca.shared.global [ %r480 + 0 ], [ %rd102 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2232230Z // end inline asm 2026-02-21T12:56:17.2232472Z // begin inline asm 2026-02-21T12:56:17.2232721Z cp.async.ca.shared.global [ %r482 + 0 ], [ %rd103 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2233015Z // end inline asm 2026-02-21T12:56:17.2233167Z // begin inline asm 2026-02-21T12:56:17.2233403Z cp.async.ca.shared.global [ %r484 + 0 ], [ %rd104 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2233674Z // end inline asm 2026-02-21T12:56:17.2233841Z // begin inline asm 2026-02-21T12:56:17.2234069Z cp.async.ca.shared.global [ %r486 + 0 ], [ %rd105 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2234347Z // end inline asm 2026-02-21T12:56:17.2234496Z // begin inline asm 2026-02-21T12:56:17.2234723Z cp.async.ca.shared.global [ %r488 + 0 ], [ %rd106 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2234995Z // end inline asm 2026-02-21T12:56:17.2235155Z // begin inline asm 2026-02-21T12:56:17.2235384Z cp.async.ca.shared.global [ %r490 + 0 ], [ %rd107 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2235651Z // end inline asm 2026-02-21T12:56:17.2235801Z // begin inline asm 2026-02-21T12:56:17.2236024Z cp.async.ca.shared.global [ %r492 + 0 ], [ %rd108 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2236293Z // end inline asm 2026-02-21T12:56:17.2236437Z // begin inline asm 
2026-02-21T12:56:17.2236794Z cp.async.ca.shared.global [ %r494 + 0 ], [ %rd109 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2237063Z // end inline asm 2026-02-21T12:56:17.2237209Z // begin inline asm 2026-02-21T12:56:17.2237431Z cp.async.ca.shared.global [ %r496 + 0 ], [ %rd110 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2237708Z // end inline asm 2026-02-21T12:56:17.2237859Z // begin inline asm 2026-02-21T12:56:17.2238074Z cp.async.ca.shared.global [ %r498 + 0 ], [ %rd111 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2238343Z // end inline asm 2026-02-21T12:56:17.2238489Z // begin inline asm 2026-02-21T12:56:17.2238714Z cp.async.ca.shared.global [ %r500 + 0 ], [ %rd112 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2239071Z // end inline asm 2026-02-21T12:56:17.2239217Z // begin inline asm 2026-02-21T12:56:17.2239444Z cp.async.ca.shared.global [ %r502 + 0 ], [ %rd113 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2239728Z // end inline asm 2026-02-21T12:56:17.2239883Z // begin inline asm 2026-02-21T12:56:17.2240101Z cp.async.ca.shared.global [ %r504 + 0 ], [ %rd114 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2240369Z // end inline asm 2026-02-21T12:56:17.2240512Z // begin inline asm 2026-02-21T12:56:17.2240735Z cp.async.ca.shared.global [ %r506 + 0 ], [ %rd115 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2241013Z // end inline asm 2026-02-21T12:56:17.2241166Z // begin inline asm 2026-02-21T12:56:17.2241391Z cp.async.ca.shared.global [ %r508 + 0 ], [ %rd116 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2241657Z // end inline asm 2026-02-21T12:56:17.2241809Z // begin inline asm 2026-02-21T12:56:17.2242029Z cp.async.ca.shared.global [ %r510 + 0 ], [ %rd117 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2242302Z // end inline asm 2026-02-21T12:56:17.2242455Z cp.async.commit_group; 2026-02-21T12:56:17.2242862Z .loc 1 54 34 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:54:34 2026-02-21T12:56:17.2243323Z add.s64 %rd118, %rd100, 40960; 2026-02-21T12:56:17.2243520Z add.s64 %rd119, %rd101, 40960; 
2026-02-21T12:56:17.2243853Z .loc 1 54 87 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:54:87 2026-02-21T12:56:17.2244206Z // begin inline asm 2026-02-21T12:56:17.2244443Z cp.async.cg.shared.global [ %r512 + 0 ], [ %rd118 + 0 ], 0x10, %r477; 2026-02-21T12:56:17.2244716Z // end inline asm 2026-02-21T12:56:17.2244869Z // begin inline asm 2026-02-21T12:56:17.2245103Z cp.async.cg.shared.global [ %r514 + 0 ], [ %rd119 + 0 ], 0x10, %r477; 2026-02-21T12:56:17.2245386Z // end inline asm 2026-02-21T12:56:17.2245547Z cp.async.commit_group; 2026-02-21T12:56:17.2245869Z .loc 1 48 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:48:32 2026-02-21T12:56:17.2246239Z add.s64 %rd120, %rd84, 256; 2026-02-21T12:56:17.2246419Z add.s64 %rd121, %rd85, 256; 2026-02-21T12:56:17.2246843Z add.s64 %rd122, %rd86, 256; 2026-02-21T12:56:17.2247029Z add.s64 %rd123, %rd87, 256; 2026-02-21T12:56:17.2247206Z add.s64 %rd124, %rd88, 256; 2026-02-21T12:56:17.2247386Z add.s64 %rd125, %rd89, 256; 2026-02-21T12:56:17.2247564Z add.s64 %rd126, %rd90, 256; 2026-02-21T12:56:17.2247735Z add.s64 %rd127, %rd91, 256; 2026-02-21T12:56:17.2247910Z add.s64 %rd128, %rd92, 256; 2026-02-21T12:56:17.2248085Z add.s64 %rd129, %rd93, 256; 2026-02-21T12:56:17.2248252Z add.s64 %rd130, %rd94, 256; 2026-02-21T12:56:17.2248436Z add.s64 %rd131, %rd95, 256; 2026-02-21T12:56:17.2248617Z add.s64 %rd132, %rd96, 256; 2026-02-21T12:56:17.2248796Z add.s64 %rd133, %rd97, 256; 2026-02-21T12:56:17.2248965Z add.s64 %rd134, %rd98, 256; 2026-02-21T12:56:17.2249139Z add.s64 %rd135, %rd99, 256; 2026-02-21T12:56:17.2249477Z .loc 1 48 80 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:48:80 2026-02-21T12:56:17.2249839Z bar.sync 0; 2026-02-21T12:56:17.2249988Z // begin inline asm 2026-02-21T12:56:17.2250243Z cp.async.ca.shared.global [ %r516 + 0 ], [ %rd120 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2250524Z // end inline asm 2026-02-21T12:56:17.2250677Z // begin inline asm 2026-02-21T12:56:17.2250905Z 
cp.async.ca.shared.global [ %r518 + 0 ], [ %rd121 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2251173Z // end inline asm 2026-02-21T12:56:17.2251327Z // begin inline asm 2026-02-21T12:56:17.2251550Z cp.async.ca.shared.global [ %r520 + 0 ], [ %rd122 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2251839Z // end inline asm 2026-02-21T12:56:17.2251993Z // begin inline asm 2026-02-21T12:56:17.2252214Z cp.async.ca.shared.global [ %r522 + 0 ], [ %rd123 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2252490Z // end inline asm 2026-02-21T12:56:17.2252636Z // begin inline asm 2026-02-21T12:56:17.2252861Z cp.async.ca.shared.global [ %r524 + 0 ], [ %rd124 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2253214Z // end inline asm 2026-02-21T12:56:17.2253368Z // begin inline asm 2026-02-21T12:56:17.2253591Z cp.async.ca.shared.global [ %r526 + 0 ], [ %rd125 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2253862Z // end inline asm 2026-02-21T12:56:17.2254016Z // begin inline asm 2026-02-21T12:56:17.2254237Z cp.async.ca.shared.global [ %r528 + 0 ], [ %rd126 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2254508Z // end inline asm 2026-02-21T12:56:17.2254654Z // begin inline asm 2026-02-21T12:56:17.2254879Z cp.async.ca.shared.global [ %r530 + 0 ], [ %rd127 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2255146Z // end inline asm 2026-02-21T12:56:17.2255299Z // begin inline asm 2026-02-21T12:56:17.2255518Z cp.async.ca.shared.global [ %r532 + 0 ], [ %rd128 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2255792Z // end inline asm 2026-02-21T12:56:17.2255942Z // begin inline asm 2026-02-21T12:56:17.2256185Z cp.async.ca.shared.global [ %r534 + 0 ], [ %rd129 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2256587Z // end inline asm 2026-02-21T12:56:17.2256745Z // begin inline asm 2026-02-21T12:56:17.2256981Z cp.async.ca.shared.global [ %r536 + 0 ], [ %rd130 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2257404Z // end inline asm 2026-02-21T12:56:17.2257560Z // begin inline asm 2026-02-21T12:56:17.2257782Z cp.async.ca.shared.global [ %r538 + 0 ], [ %rd131 + 0 ], 
0x8, %r445; 2026-02-21T12:56:17.2258058Z // end inline asm 2026-02-21T12:56:17.2258202Z // begin inline asm 2026-02-21T12:56:17.2258425Z cp.async.ca.shared.global [ %r540 + 0 ], [ %rd132 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2258690Z // end inline asm 2026-02-21T12:56:17.2258835Z // begin inline asm 2026-02-21T12:56:17.2259073Z cp.async.ca.shared.global [ %r542 + 0 ], [ %rd133 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2259338Z // end inline asm 2026-02-21T12:56:17.2259488Z // begin inline asm 2026-02-21T12:56:17.2259708Z cp.async.ca.shared.global [ %r544 + 0 ], [ %rd134 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2259975Z // end inline asm 2026-02-21T12:56:17.2260122Z // begin inline asm 2026-02-21T12:56:17.2260346Z cp.async.ca.shared.global [ %r546 + 0 ], [ %rd135 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2260703Z // end inline asm 2026-02-21T12:56:17.2260863Z cp.async.commit_group; 2026-02-21T12:56:17.2261187Z .loc 1 54 34 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:54:34 2026-02-21T12:56:17.2261551Z add.s64 %rd136, %rd100, 81920; 2026-02-21T12:56:17.2261745Z add.s64 %rd137, %rd101, 81920; 2026-02-21T12:56:17.2262073Z .loc 1 54 87 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:54:87 2026-02-21T12:56:17.2262429Z // begin inline asm 2026-02-21T12:56:17.2262657Z cp.async.cg.shared.global [ %r548 + 0 ], [ %rd136 + 0 ], 0x10, %r477; 2026-02-21T12:56:17.2262944Z // end inline asm 2026-02-21T12:56:17.2263102Z // begin inline asm 2026-02-21T12:56:17.2263326Z cp.async.cg.shared.global [ %r550 + 0 ], [ %rd137 + 0 ], 0x10, %r477; 2026-02-21T12:56:17.2263601Z // end inline asm 2026-02-21T12:56:17.2263758Z cp.async.commit_group; 2026-02-21T12:56:17.2264084Z .loc 1 48 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:48:32 2026-02-21T12:56:17.2264448Z add.s64 %rd138, %rd84, 384; 2026-02-21T12:56:17.2264634Z add.s64 %rd139, %rd85, 384; 2026-02-21T12:56:17.2264816Z add.s64 %rd140, %rd86, 384; 2026-02-21T12:56:17.2264988Z add.s64 %rd141, %rd87, 384; 
2026-02-21T12:56:17.2265167Z add.s64 %rd142, %rd88, 384; 2026-02-21T12:56:17.2265339Z add.s64 %rd143, %rd89, 384; 2026-02-21T12:56:17.2265518Z add.s64 %rd144, %rd90, 384; 2026-02-21T12:56:17.2265691Z add.s64 %rd145, %rd91, 384; 2026-02-21T12:56:17.2265880Z add.s64 %rd146, %rd92, 384; 2026-02-21T12:56:17.2266059Z add.s64 %rd147, %rd93, 384; 2026-02-21T12:56:17.2266236Z add.s64 %rd148, %rd94, 384; 2026-02-21T12:56:17.2266409Z add.s64 %rd149, %rd95, 384; 2026-02-21T12:56:17.2266720Z add.s64 %rd150, %rd96, 384; 2026-02-21T12:56:17.2266900Z add.s64 %rd151, %rd97, 384; 2026-02-21T12:56:17.2267155Z add.s64 %rd152, %rd98, 384; 2026-02-21T12:56:17.2267349Z add.s64 %rd153, %rd99, 384; 2026-02-21T12:56:17.2267669Z .loc 1 48 80 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:48:80 2026-02-21T12:56:17.2268024Z bar.sync 0; 2026-02-21T12:56:17.2268170Z // begin inline asm 2026-02-21T12:56:17.2268495Z cp.async.ca.shared.global [ %r552 + 0 ], [ %rd138 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2268767Z // end inline asm 2026-02-21T12:56:17.2268923Z // begin inline asm 2026-02-21T12:56:17.2269154Z cp.async.ca.shared.global [ %r554 + 0 ], [ %rd139 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2269423Z // end inline asm 2026-02-21T12:56:17.2269587Z // begin inline asm 2026-02-21T12:56:17.2269810Z cp.async.ca.shared.global [ %r556 + 0 ], [ %rd140 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2270078Z // end inline asm 2026-02-21T12:56:17.2270222Z // begin inline asm 2026-02-21T12:56:17.2270458Z cp.async.ca.shared.global [ %r558 + 0 ], [ %rd141 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2270725Z // end inline asm 2026-02-21T12:56:17.2270875Z // begin inline asm 2026-02-21T12:56:17.2271176Z cp.async.ca.shared.global [ %r560 + 0 ], [ %rd142 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2271514Z // end inline asm 2026-02-21T12:56:17.2271667Z // begin inline asm 2026-02-21T12:56:17.2271881Z cp.async.ca.shared.global [ %r562 + 0 ], [ %rd143 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2272157Z // end inline asm 
2026-02-21T12:56:17.2272301Z // begin inline asm 2026-02-21T12:56:17.2272522Z cp.async.ca.shared.global [ %r564 + 0 ], [ %rd144 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2272798Z // end inline asm 2026-02-21T12:56:17.2272951Z // begin inline asm 2026-02-21T12:56:17.2273172Z cp.async.ca.shared.global [ %r566 + 0 ], [ %rd145 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2273441Z // end inline asm 2026-02-21T12:56:17.2273589Z // begin inline asm 2026-02-21T12:56:17.2273807Z cp.async.ca.shared.global [ %r568 + 0 ], [ %rd146 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2274081Z // end inline asm 2026-02-21T12:56:17.2274229Z // begin inline asm 2026-02-21T12:56:17.2274450Z cp.async.ca.shared.global [ %r570 + 0 ], [ %rd147 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2274800Z // end inline asm 2026-02-21T12:56:17.2274958Z // begin inline asm 2026-02-21T12:56:17.2275177Z cp.async.ca.shared.global [ %r572 + 0 ], [ %rd148 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2275445Z // end inline asm 2026-02-21T12:56:17.2275597Z // begin inline asm 2026-02-21T12:56:17.2275816Z cp.async.ca.shared.global [ %r574 + 0 ], [ %rd149 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2276082Z // end inline asm 2026-02-21T12:56:17.2276226Z // begin inline asm 2026-02-21T12:56:17.2276584Z cp.async.ca.shared.global [ %r576 + 0 ], [ %rd150 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2276855Z // end inline asm 2026-02-21T12:56:17.2277003Z // begin inline asm 2026-02-21T12:56:17.2277235Z cp.async.ca.shared.global [ %r578 + 0 ], [ %rd151 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2277520Z // end inline asm 2026-02-21T12:56:17.2277679Z // begin inline asm 2026-02-21T12:56:17.2277899Z cp.async.ca.shared.global [ %r580 + 0 ], [ %rd152 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2278174Z // end inline asm 2026-02-21T12:56:17.2278339Z // begin inline asm 2026-02-21T12:56:17.2278571Z cp.async.ca.shared.global [ %r582 + 0 ], [ %rd153 + 0 ], 0x8, %r445; 2026-02-21T12:56:17.2278837Z // end inline asm 2026-02-21T12:56:17.2278998Z cp.async.commit_group; 
2026-02-21T12:56:17.2279313Z .loc 1 54 34 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:54:34 2026-02-21T12:56:17.2279683Z add.s64 %rd154, %rd100, 122880; 2026-02-21T12:56:17.2279884Z add.s64 %rd155, %rd101, 122880; 2026-02-21T12:56:17.2280208Z .loc 1 54 87 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:54:87 2026-02-21T12:56:17.2280565Z // begin inline asm 2026-02-21T12:56:17.2280794Z cp.async.cg.shared.global [ %r584 + 0 ], [ %rd154 + 0 ], 0x10, %r477; 2026-02-21T12:56:17.2281072Z // end inline asm 2026-02-21T12:56:17.2281314Z // begin inline asm 2026-02-21T12:56:17.2281543Z cp.async.cg.shared.global [ %r586 + 0 ], [ %rd155 + 0 ], 0x10, %r477; 2026-02-21T12:56:17.2281825Z // end inline asm 2026-02-21T12:56:17.2281988Z cp.async.commit_group; 2026-02-21T12:56:17.2282299Z .loc 1 40 57 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:40:57 2026-02-21T12:56:17.2282654Z add.s64 %rd67, %rd74, %rd164; 2026-02-21T12:56:17.2282841Z mov.b32 %r3302, 0f00000000; 2026-02-21T12:56:17.2283013Z mov.b32 %r3301, 3; 2026-02-21T12:56:17.2283178Z mov.b32 %r3300, -1; 2026-02-21T12:56:17.2283335Z mov.b64 %rd259, 0; 2026-02-21T12:56:17.2283492Z mov.b64 %rd258, -32; 2026-02-21T12:56:17.2283653Z mov.b32 %r3303, %r3302; 2026-02-21T12:56:17.2283823Z mov.b32 %r3304, %r3302; 2026-02-21T12:56:17.2283995Z mov.b32 %r3305, %r3302; 2026-02-21T12:56:17.2284157Z mov.b32 %r3306, %r3302; 2026-02-21T12:56:17.2284326Z mov.b32 %r3307, %r3302; 2026-02-21T12:56:17.2284486Z mov.b32 %r3308, %r3302; 2026-02-21T12:56:17.2284653Z mov.b32 %r3309, %r3302; 2026-02-21T12:56:17.2284813Z mov.b32 %r3310, %r3302; 2026-02-21T12:56:17.2284976Z mov.b32 %r3311, %r3302; 2026-02-21T12:56:17.2285287Z mov.b32 %r3312, %r3302; 2026-02-21T12:56:17.2285455Z mov.b32 %r3313, %r3302; 2026-02-21T12:56:17.2285627Z mov.b32 %r3314, %r3302; 2026-02-21T12:56:17.2285794Z mov.b32 %r3315, %r3302; 2026-02-21T12:56:17.2285959Z mov.b32 %r3316, %r3302; 2026-02-21T12:56:17.2286118Z mov.b32 %r3317, %r3302; 
2026-02-21T12:56:17.2286284Z mov.b32 %r3318, %r3302; 2026-02-21T12:56:17.2286445Z mov.b32 %r3319, %r3302; 2026-02-21T12:56:17.2286745Z mov.b32 %r3320, %r3302; 2026-02-21T12:56:17.2286904Z mov.b32 %r3321, %r3302; 2026-02-21T12:56:17.2287078Z mov.b32 %r3322, %r3302; 2026-02-21T12:56:17.2287247Z mov.b32 %r3323, %r3302; 2026-02-21T12:56:17.2287416Z mov.b32 %r3324, %r3302; 2026-02-21T12:56:17.2287576Z mov.b32 %r3325, %r3302; 2026-02-21T12:56:17.2287741Z mov.b32 %r3326, %r3302; 2026-02-21T12:56:17.2287913Z mov.b32 %r3327, %r3302; 2026-02-21T12:56:17.2288076Z mov.b32 %r3328, %r3302; 2026-02-21T12:56:17.2288242Z mov.b32 %r3329, %r3302; 2026-02-21T12:56:17.2288495Z mov.b32 %r3330, %r3302; 2026-02-21T12:56:17.2288674Z mov.b32 %r3331, %r3302; 2026-02-21T12:56:17.2288836Z mov.b32 %r3332, %r3302; 2026-02-21T12:56:17.2289008Z mov.b32 %r3333, %r3302; 2026-02-21T12:56:17.2289167Z mov.b32 %r3334, %r3302; 2026-02-21T12:56:17.2289344Z mov.b32 %r3335, %r3302; 2026-02-21T12:56:17.2289509Z mov.b32 %r3336, %r3302; 2026-02-21T12:56:17.2289675Z mov.b32 %r3337, %r3302; 2026-02-21T12:56:17.2289846Z mov.b32 %r3338, %r3302; 2026-02-21T12:56:17.2290004Z mov.b32 %r3339, %r3302; 2026-02-21T12:56:17.2290169Z mov.b32 %r3340, %r3302; 2026-02-21T12:56:17.2290339Z mov.b32 %r3341, %r3302; 2026-02-21T12:56:17.2290510Z mov.b32 %r3342, %r3302; 2026-02-21T12:56:17.2290668Z mov.b32 %r3343, %r3302; 2026-02-21T12:56:17.2290832Z mov.b32 %r3344, %r3302; 2026-02-21T12:56:17.2290989Z mov.b32 %r3345, %r3302; 2026-02-21T12:56:17.2291160Z mov.b32 %r3346, %r3302; 2026-02-21T12:56:17.2291320Z mov.b32 %r3347, %r3302; 2026-02-21T12:56:17.2291485Z mov.b32 %r3348, %r3302; 2026-02-21T12:56:17.2291654Z mov.b32 %r3349, %r3302; 2026-02-21T12:56:17.2291813Z mov.b32 %r3350, %r3302; 2026-02-21T12:56:17.2291986Z mov.b32 %r3351, %r3302; 2026-02-21T12:56:17.2292143Z mov.b32 %r3352, %r3302; 2026-02-21T12:56:17.2292306Z mov.b32 %r3353, %r3302; 2026-02-21T12:56:17.2292462Z mov.b32 %r3354, %r3302; 2026-02-21T12:56:17.2292630Z mov.b32 
%r3355, %r3302; 2026-02-21T12:56:17.2292788Z mov.b32 %r3356, %r3302; 2026-02-21T12:56:17.2292964Z mov.b32 %r3357, %r3302; 2026-02-21T12:56:17.2293124Z mov.b32 %r3358, %r3302; 2026-02-21T12:56:17.2293286Z mov.b32 %r3359, %r3302; 2026-02-21T12:56:17.2293448Z mov.b32 %r3360, %r3302; 2026-02-21T12:56:17.2293607Z mov.b32 %r3361, %r3302; 2026-02-21T12:56:17.2293771Z mov.b32 %r3362, %r3302; 2026-02-21T12:56:17.2293930Z mov.b32 %r3363, %r3302; 2026-02-21T12:56:17.2294093Z mov.b32 %r3364, %r3302; 2026-02-21T12:56:17.2294342Z mov.b32 %r3365, %r3302; 2026-02-21T12:56:17.2294504Z mov.b32 %r3366, %r3302; 2026-02-21T12:56:17.2294663Z mov.b32 %r3367, %r3302; 2026-02-21T12:56:17.2294831Z mov.b32 %r3368, %r3302; 2026-02-21T12:56:17.2294989Z mov.b32 %r3369, %r3302; 2026-02-21T12:56:17.2295168Z mov.b32 %r3370, %r3302; 2026-02-21T12:56:17.2295332Z mov.b32 %r3371, %r3302; 2026-02-21T12:56:17.2295491Z mov.b32 %r3372, %r3302; 2026-02-21T12:56:17.2295656Z mov.b32 %r3373, %r3302; 2026-02-21T12:56:17.2295813Z mov.b32 %r3374, %r3302; 2026-02-21T12:56:17.2295978Z mov.b32 %r3375, %r3302; 2026-02-21T12:56:17.2296136Z mov.b32 %r3376, %r3302; 2026-02-21T12:56:17.2296301Z mov.b32 %r3377, %r3302; 2026-02-21T12:56:17.2296582Z mov.b32 %r3378, %r3302; 2026-02-21T12:56:17.2296758Z mov.b32 %r3379, %r3302; 2026-02-21T12:56:17.2296918Z mov.b32 %r3380, %r3302; 2026-02-21T12:56:17.2297095Z mov.b32 %r3381, %r3302; 2026-02-21T12:56:17.2297273Z mov.b32 %r3382, %r3302; 2026-02-21T12:56:17.2297441Z mov.b32 %r3383, %r3302; 2026-02-21T12:56:17.2297609Z mov.b32 %r3384, %r3302; 2026-02-21T12:56:17.2297767Z mov.b32 %r3385, %r3302; 2026-02-21T12:56:17.2298049Z mov.b32 %r3386, %r3302; 2026-02-21T12:56:17.2298277Z mov.b32 %r3387, %r3302; 2026-02-21T12:56:17.2298444Z mov.b32 %r3388, %r3302; 2026-02-21T12:56:17.2298606Z mov.b32 %r3389, %r3302; 2026-02-21T12:56:17.2298773Z mov.b32 %r3390, %r3302; 2026-02-21T12:56:17.2298931Z mov.b32 %r3391, %r3302; 2026-02-21T12:56:17.2299098Z mov.b32 %r3392, %r3302; 
2026-02-21T12:56:17.2299264Z mov.b32 %r3393, %r3302; 2026-02-21T12:56:17.2299425Z mov.b32 %r3394, %r3302; 2026-02-21T12:56:17.2299598Z mov.b32 %r3395, %r3302; 2026-02-21T12:56:17.2299765Z mov.b32 %r3396, %r3302; 2026-02-21T12:56:17.2299932Z mov.b32 %r3397, %r3302; 2026-02-21T12:56:17.2300089Z mov.b32 %r3398, %r3302; 2026-02-21T12:56:17.2300254Z mov.b32 %r3399, %r3302; 2026-02-21T12:56:17.2300416Z mov.b32 %r3400, %r3302; 2026-02-21T12:56:17.2300579Z mov.b32 %r3401, %r3302; 2026-02-21T12:56:17.2300748Z mov.b32 %r3402, %r3302; 2026-02-21T12:56:17.2300915Z mov.b32 %r3403, %r3302; 2026-02-21T12:56:17.2301076Z mov.b32 %r3404, %r3302; 2026-02-21T12:56:17.2301329Z mov.b32 %r3405, %r3302; 2026-02-21T12:56:17.2301504Z mov.b32 %r3406, %r3302; 2026-02-21T12:56:17.2301663Z mov.b32 %r3407, %r3302; 2026-02-21T12:56:17.2301828Z mov.b32 %r3408, %r3302; 2026-02-21T12:56:17.2301985Z mov.b32 %r3409, %r3302; 2026-02-21T12:56:17.2302154Z mov.b32 %r3410, %r3302; 2026-02-21T12:56:17.2302314Z mov.b32 %r3411, %r3302; 2026-02-21T12:56:17.2302479Z mov.b32 %r3412, %r3302; 2026-02-21T12:56:17.2302635Z mov.b32 %r3413, %r3302; 2026-02-21T12:56:17.2302801Z mov.b32 %r3414, %r3302; 2026-02-21T12:56:17.2302965Z mov.b32 %r3415, %r3302; 2026-02-21T12:56:17.2303124Z mov.b32 %r3416, %r3302; 2026-02-21T12:56:17.2303304Z mov.b32 %r3417, %r3302; 2026-02-21T12:56:17.2303467Z mov.b32 %r3418, %r3302; 2026-02-21T12:56:17.2303634Z mov.b32 %r3419, %r3302; 2026-02-21T12:56:17.2303803Z mov.b32 %r3420, %r3302; 2026-02-21T12:56:17.2303978Z mov.b32 %r3421, %r3302; 2026-02-21T12:56:17.2304136Z mov.b32 %r3422, %r3302; 2026-02-21T12:56:17.2304303Z mov.b32 %r3423, %r3302; 2026-02-21T12:56:17.2304468Z mov.b32 %r3424, %r3302; 2026-02-21T12:56:17.2304638Z mov.b32 %r3425, %r3302; 2026-02-21T12:56:17.2304801Z mov.b32 %r3426, %r3302; 2026-02-21T12:56:17.2304961Z mov.b32 %r3427, %r3302; 2026-02-21T12:56:17.2305123Z mov.b32 %r3428, %r3302; 2026-02-21T12:56:17.2305280Z mov.b32 %r3429, %r3302; 2026-02-21T12:56:17.2305514Z 
$L__BB0_3: // Parent Loop BB0_2 Depth=1 2026-02-21T12:56:17.2305807Z // => This Inner Loop Header: Depth=2 2026-02-21T12:56:17.2306074Z add.s64 %rd258, %rd258, 32; 2026-02-21T12:56:17.2306261Z setp.lt.u64 %p18, %rd258, 3968; 2026-02-21T12:56:17.2306589Z add.s32 %r3002, %r3300, 1; 2026-02-21T12:56:17.2306797Z setp.gt.s32 %p19, %r3002, 3; 2026-02-21T12:56:17.2306990Z selp.b32 %r3300, 0, %r3002, %p19; 2026-02-21T12:56:17.2307441Z .loc 1 48 80 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:48:80 2026-02-21T12:56:17.2307814Z cp.async.wait_group 6; 2026-02-21T12:56:17.2308005Z bar.sync 0; 2026-02-21T12:56:17.2308154Z shl.b32 %r3003, %r3300, 14; 2026-02-21T12:56:17.2308405Z add.s32 %r3005, %r393, %r3003; 2026-02-21T12:56:17.2308734Z .loc 1 52 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:52:32 2026-02-21T12:56:17.2309104Z add.s32 %r3006, %r3005, %r82; 2026-02-21T12:56:17.2309294Z ld.shared.b16 %rs7, [%r3006]; 2026-02-21T12:56:17.2309479Z ld.shared.b16 %rs8, [%r3006+1024]; 2026-02-21T12:56:17.2309690Z ld.shared.b16 %rs9, [%r3006+64]; 2026-02-21T12:56:17.2309888Z ld.shared.b16 %rs10, [%r3006+1088]; 2026-02-21T12:56:17.2310094Z ld.shared.b16 %rs11, [%r3006+8192]; 2026-02-21T12:56:17.2310288Z ld.shared.b16 %rs12, [%r3006+9216]; 2026-02-21T12:56:17.2310497Z ld.shared.b16 %rs13, [%r3006+8256]; 2026-02-21T12:56:17.2310697Z ld.shared.b16 %rs14, [%r3006+9280]; 2026-02-21T12:56:17.2310891Z add.s32 %r3007, %r3005, %r83; 2026-02-21T12:56:17.2311076Z ld.shared.b16 %rs15, [%r3007]; 2026-02-21T12:56:17.2311412Z ld.shared.b16 %rs16, [%r3007+1024]; 2026-02-21T12:56:17.2311617Z ld.shared.b16 %rs17, [%r3007+64]; 2026-02-21T12:56:17.2311815Z ld.shared.b16 %rs18, [%r3007+1088]; 2026-02-21T12:56:17.2312018Z ld.shared.b16 %rs19, [%r3007+8192]; 2026-02-21T12:56:17.2312211Z ld.shared.b16 %rs20, [%r3007+9216]; 2026-02-21T12:56:17.2312411Z ld.shared.b16 %rs21, [%r3007+8256]; 2026-02-21T12:56:17.2312605Z ld.shared.b16 %rs22, [%r3007+9280]; 2026-02-21T12:56:17.2312812Z 
add.s32 %r3008, %r3005, %r84; 2026-02-21T12:56:17.2312998Z ld.shared.b16 %rs23, [%r3008]; 2026-02-21T12:56:17.2313192Z ld.shared.b16 %rs24, [%r3008+1024]; 2026-02-21T12:56:17.2313391Z ld.shared.b16 %rs25, [%r3008+64]; 2026-02-21T12:56:17.2313583Z ld.shared.b16 %rs26, [%r3008+1088]; 2026-02-21T12:56:17.2313786Z ld.shared.b16 %rs27, [%r3008+8192]; 2026-02-21T12:56:17.2313986Z ld.shared.b16 %rs28, [%r3008+9216]; 2026-02-21T12:56:17.2314184Z ld.shared.b16 %rs29, [%r3008+8256]; 2026-02-21T12:56:17.2314469Z ld.shared.b16 %rs30, [%r3008+9280]; 2026-02-21T12:56:17.2314677Z add.s32 %r3009, %r3005, %r85; 2026-02-21T12:56:17.2314858Z ld.shared.b16 %rs31, [%r3009]; 2026-02-21T12:56:17.2315051Z ld.shared.b16 %rs32, [%r3009+1024]; 2026-02-21T12:56:17.2315253Z ld.shared.b16 %rs33, [%r3009+64]; 2026-02-21T12:56:17.2315456Z ld.shared.b16 %rs34, [%r3009+1088]; 2026-02-21T12:56:17.2315657Z ld.shared.b16 %rs35, [%r3009+8192]; 2026-02-21T12:56:17.2315850Z ld.shared.b16 %rs36, [%r3009+9216]; 2026-02-21T12:56:17.2316046Z ld.shared.b16 %rs37, [%r3009+8256]; 2026-02-21T12:56:17.2316241Z ld.shared.b16 %rs38, [%r3009+9280]; 2026-02-21T12:56:17.2316436Z add.s32 %r3010, %r3005, %r86; 2026-02-21T12:56:17.2322968Z ld.shared.b16 %rs39, [%r3010]; 2026-02-21T12:56:17.2323255Z ld.shared.b16 %rs40, [%r3010+1024]; 2026-02-21T12:56:17.2323484Z ld.shared.b16 %rs41, [%r3010+64]; 2026-02-21T12:56:17.2323704Z ld.shared.b16 %rs42, [%r3010+1088]; 2026-02-21T12:56:17.2323920Z ld.shared.b16 %rs43, [%r3010+8192]; 2026-02-21T12:56:17.2324132Z ld.shared.b16 %rs44, [%r3010+9216]; 2026-02-21T12:56:17.2324329Z ld.shared.b16 %rs45, [%r3010+8256]; 2026-02-21T12:56:17.2324530Z ld.shared.b16 %rs46, [%r3010+9280]; 2026-02-21T12:56:17.2324739Z add.s32 %r3011, %r3005, %r87; 2026-02-21T12:56:17.2324940Z ld.shared.b16 %rs47, [%r3011]; 2026-02-21T12:56:17.2325132Z ld.shared.b16 %rs48, [%r3011+1024]; 2026-02-21T12:56:17.2325336Z ld.shared.b16 %rs49, [%r3011+64]; 2026-02-21T12:56:17.2325541Z ld.shared.b16 %rs50, [%r3011+1088]; 
2026-02-21T12:56:17.2325738Z ld.shared.b16 %rs51, [%r3011+8192]; 2026-02-21T12:56:17.2325938Z ld.shared.b16 %rs52, [%r3011+9216]; 2026-02-21T12:56:17.2326130Z ld.shared.b16 %rs53, [%r3011+8256]; 2026-02-21T12:56:17.2326331Z ld.shared.b16 %rs54, [%r3011+9280]; 2026-02-21T12:56:17.2326708Z add.s32 %r3012, %r3005, %r88; 2026-02-21T12:56:17.2327077Z ld.shared.b16 %rs55, [%r3012]; 2026-02-21T12:56:17.2327285Z ld.shared.b16 %rs56, [%r3012+1024]; 2026-02-21T12:56:17.2327495Z ld.shared.b16 %rs57, [%r3012+64]; 2026-02-21T12:56:17.2327713Z ld.shared.b16 %rs58, [%r3012+1088]; 2026-02-21T12:56:17.2327922Z ld.shared.b16 %rs59, [%r3012+8192]; 2026-02-21T12:56:17.2328122Z ld.shared.b16 %rs60, [%r3012+9216]; 2026-02-21T12:56:17.2328316Z ld.shared.b16 %rs61, [%r3012+8256]; 2026-02-21T12:56:17.2328514Z ld.shared.b16 %rs62, [%r3012+9280]; 2026-02-21T12:56:17.2328710Z add.s32 %r3013, %r3005, %r89; 2026-02-21T12:56:17.2328907Z ld.shared.b16 %rs63, [%r3013]; 2026-02-21T12:56:17.2329095Z ld.shared.b16 %rs64, [%r3013+1024]; 2026-02-21T12:56:17.2329316Z ld.shared.b16 %rs65, [%r3013+64]; 2026-02-21T12:56:17.2329516Z ld.shared.b16 %rs66, [%r3013+1088]; 2026-02-21T12:56:17.2329717Z ld.shared.b16 %rs67, [%r3013+8192]; 2026-02-21T12:56:17.2329914Z ld.shared.b16 %rs68, [%r3013+9216]; 2026-02-21T12:56:17.2330107Z ld.shared.b16 %rs69, [%r3013+8256]; 2026-02-21T12:56:17.2330307Z ld.shared.b16 %rs70, [%r3013+9280]; 2026-02-21T12:56:17.2330500Z cvt.f32.bf16 %r720, %rs7; 2026-02-21T12:56:17.2330686Z cvt.f32.bf16 %r721, %rs8; 2026-02-21T12:56:17.2331045Z cvt.f32.bf16 %r722, %rs15; 2026-02-21T12:56:17.2331237Z cvt.f32.bf16 %r723, %rs16; 2026-02-21T12:56:17.2331413Z cvt.f32.bf16 %r852, %rs23; 2026-02-21T12:56:17.2331588Z cvt.f32.bf16 %r853, %rs24; 2026-02-21T12:56:17.2331765Z cvt.f32.bf16 %r854, %rs31; 2026-02-21T12:56:17.2331935Z cvt.f32.bf16 %r855, %rs32; 2026-02-21T12:56:17.2332115Z cvt.f32.bf16 %r984, %rs39; 2026-02-21T12:56:17.2332291Z cvt.f32.bf16 %r985, %rs40; 2026-02-21T12:56:17.2332466Z 
cvt.f32.bf16 %r986, %rs47; 2026-02-21T12:56:17.2332635Z cvt.f32.bf16 %r987, %rs48; 2026-02-21T12:56:17.2332814Z cvt.f32.bf16 %r1116, %rs55; 2026-02-21T12:56:17.2332991Z cvt.f32.bf16 %r1117, %rs56; 2026-02-21T12:56:17.2333180Z cvt.f32.bf16 %r1118, %rs63; 2026-02-21T12:56:17.2333358Z cvt.f32.bf16 %r1119, %rs64; 2026-02-21T12:56:17.2333539Z cvt.f32.bf16 %r1248, %rs9; 2026-02-21T12:56:17.2333714Z cvt.f32.bf16 %r1249, %rs10; 2026-02-21T12:56:17.2333886Z cvt.f32.bf16 %r1250, %rs17; 2026-02-21T12:56:17.2334147Z cvt.f32.bf16 %r1251, %rs18; 2026-02-21T12:56:17.2334326Z cvt.f32.bf16 %r1380, %rs25; 2026-02-21T12:56:17.2334505Z cvt.f32.bf16 %r1381, %rs26; 2026-02-21T12:56:17.2334686Z cvt.f32.bf16 %r1382, %rs33; 2026-02-21T12:56:17.2334867Z cvt.f32.bf16 %r1383, %rs34; 2026-02-21T12:56:17.2335038Z cvt.f32.bf16 %r1512, %rs41; 2026-02-21T12:56:17.2335215Z cvt.f32.bf16 %r1513, %rs42; 2026-02-21T12:56:17.2335387Z cvt.f32.bf16 %r1514, %rs49; 2026-02-21T12:56:17.2335578Z cvt.f32.bf16 %r1515, %rs50; 2026-02-21T12:56:17.2335758Z cvt.f32.bf16 %r1644, %rs57; 2026-02-21T12:56:17.2335931Z cvt.f32.bf16 %r1645, %rs58; 2026-02-21T12:56:17.2336109Z cvt.f32.bf16 %r1646, %rs65; 2026-02-21T12:56:17.2336279Z cvt.f32.bf16 %r1647, %rs66; 2026-02-21T12:56:17.2336589Z cvt.f32.bf16 %r1776, %rs11; 2026-02-21T12:56:17.2336778Z cvt.f32.bf16 %r1777, %rs12; 2026-02-21T12:56:17.2336960Z cvt.f32.bf16 %r1778, %rs19; 2026-02-21T12:56:17.2337134Z cvt.f32.bf16 %r1779, %rs20; 2026-02-21T12:56:17.2337332Z cvt.f32.bf16 %r1908, %rs27; 2026-02-21T12:56:17.2337512Z cvt.f32.bf16 %r1909, %rs28; 2026-02-21T12:56:17.2337684Z cvt.f32.bf16 %r1910, %rs35; 2026-02-21T12:56:17.2337860Z cvt.f32.bf16 %r1911, %rs36; 2026-02-21T12:56:17.2338039Z cvt.f32.bf16 %r2040, %rs43; 2026-02-21T12:56:17.2338221Z cvt.f32.bf16 %r2041, %rs44; 2026-02-21T12:56:17.2338393Z cvt.f32.bf16 %r2042, %rs51; 2026-02-21T12:56:17.2338570Z cvt.f32.bf16 %r2043, %rs52; 2026-02-21T12:56:17.2338741Z cvt.f32.bf16 %r2172, %rs59; 2026-02-21T12:56:17.2338922Z 
cvt.f32.bf16 %r2173, %rs60; 2026-02-21T12:56:17.2339100Z cvt.f32.bf16 %r2174, %rs67; 2026-02-21T12:56:17.2339280Z cvt.f32.bf16 %r2175, %rs68; 2026-02-21T12:56:17.2339454Z cvt.f32.bf16 %r2304, %rs13; 2026-02-21T12:56:17.2339625Z cvt.f32.bf16 %r2305, %rs14; 2026-02-21T12:56:17.2339803Z cvt.f32.bf16 %r2306, %rs21; 2026-02-21T12:56:17.2340067Z cvt.f32.bf16 %r2307, %rs22; 2026-02-21T12:56:17.2340244Z cvt.f32.bf16 %r2436, %rs29; 2026-02-21T12:56:17.2340412Z cvt.f32.bf16 %r2437, %rs30; 2026-02-21T12:56:17.2340605Z cvt.f32.bf16 %r2438, %rs37; 2026-02-21T12:56:17.2340778Z cvt.f32.bf16 %r2439, %rs38; 2026-02-21T12:56:17.2340951Z cvt.f32.bf16 %r2568, %rs45; 2026-02-21T12:56:17.2341131Z cvt.f32.bf16 %r2569, %rs46; 2026-02-21T12:56:17.2341313Z cvt.f32.bf16 %r2570, %rs53; 2026-02-21T12:56:17.2341496Z cvt.f32.bf16 %r2571, %rs54; 2026-02-21T12:56:17.2341670Z cvt.f32.bf16 %r2700, %rs61; 2026-02-21T12:56:17.2341848Z cvt.f32.bf16 %r2701, %rs62; 2026-02-21T12:56:17.2342040Z cvt.f32.bf16 %r2702, %rs69; 2026-02-21T12:56:17.2342222Z cvt.f32.bf16 %r2703, %rs70; 2026-02-21T12:56:17.2342564Z .loc 1 54 87 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:54:87 2026-02-21T12:56:17.2342942Z shl.b32 %r3014, %r3300, 12; 2026-02-21T12:56:17.2343122Z add.s32 %r3015, %r393, 98304; 2026-02-21T12:56:17.2343313Z add.s32 %r3016, %r3015, %r3014; 2026-02-21T12:56:17.2343661Z .loc 1 67 45 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:67:45 2026-02-21T12:56:17.2344171Z add.s32 %r3017, %r3016, %r7; 2026-02-21T12:56:17.2344360Z add.s32 %r3018, %r3016, %r90; 2026-02-21T12:56:17.2344552Z add.s32 %r3019, %r3016, %r91; 2026-02-21T12:56:17.2344736Z add.s32 %r3020, %r3016, %r92; 2026-02-21T12:56:17.2344910Z add.s32 %r3021, %r3016, %r93; 2026-02-21T12:56:17.2345238Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2345603Z ld.shared.s8 %rs71, [%r3017]; 2026-02-21T12:56:17.2345922Z .loc 1 57 28 // 
cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2346278Z shl.b16 %rs72, %rs71, 4; 2026-02-21T12:56:17.2346729Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2347113Z ld.shared.s8 %rs73, [%r3017+128]; 2026-02-21T12:56:17.2347534Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2347923Z shl.b16 %rs74, %rs73, 4; 2026-02-21T12:56:17.2348245Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2348673Z ld.shared.s8 %rs75, [%r3017+256]; 2026-02-21T12:56:17.2349012Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2349366Z shl.b16 %rs76, %rs75, 4; 2026-02-21T12:56:17.2349681Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2350037Z ld.shared.s8 %rs77, [%r3017+384]; 2026-02-21T12:56:17.2350381Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2350744Z shl.b16 %rs78, %rs77, 4; 2026-02-21T12:56:17.2351064Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2351429Z ld.shared.s8 %rs79, [%r3017+512]; 2026-02-21T12:56:17.2351764Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2352140Z shl.b16 %rs80, %rs79, 4; 2026-02-21T12:56:17.2352453Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2352812Z ld.shared.s8 %rs81, [%r3017+640]; 2026-02-21T12:56:17.2353146Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2353496Z shl.b16 %rs82, %rs81, 4; 2026-02-21T12:56:17.2353810Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2354163Z ld.shared.s8 %rs83, [%r3017+768]; 2026-02-21T12:56:17.2354512Z .loc 1 57 28 // 
cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2354974Z shl.b16 %rs84, %rs83, 4; 2026-02-21T12:56:17.2355292Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2355654Z ld.shared.s8 %rs85, [%r3018]; 2026-02-21T12:56:17.2355979Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2356347Z shl.b16 %rs86, %rs85, 4; 2026-02-21T12:56:17.2356789Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2357151Z ld.shared.s8 %rs87, [%r3017+1024]; 2026-02-21T12:56:17.2357503Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2357854Z shl.b16 %rs88, %rs87, 4; 2026-02-21T12:56:17.2358165Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2358535Z ld.shared.s8 %rs89, [%r3017+1152]; 2026-02-21T12:56:17.2358880Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2359381Z shl.b16 %rs90, %rs89, 4; 2026-02-21T12:56:17.2359696Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2360066Z ld.shared.s8 %rs91, [%r3017+1280]; 2026-02-21T12:56:17.2360400Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2360752Z shl.b16 %rs92, %rs91, 4; 2026-02-21T12:56:17.2361060Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2361419Z ld.shared.s8 %rs93, [%r3017+1408]; 2026-02-21T12:56:17.2361747Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2362110Z shl.b16 %rs94, %rs93, 4; 2026-02-21T12:56:17.2362429Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2362856Z ld.shared.s8 %rs95, [%r3017+1536]; 2026-02-21T12:56:17.2363208Z .loc 1 57 28 // 
cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2363558Z shl.b16 %rs96, %rs95, 4; 2026-02-21T12:56:17.2363869Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2364219Z ld.shared.s8 %rs97, [%r3017+1664]; 2026-02-21T12:56:17.2364552Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2364906Z shl.b16 %rs98, %rs97, 4; 2026-02-21T12:56:17.2365211Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2365569Z ld.shared.s8 %rs99, [%r3017+1792]; 2026-02-21T12:56:17.2365903Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2366264Z shl.b16 %rs100, %rs99, 4; 2026-02-21T12:56:17.2366718Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2367096Z ld.shared.s8 %rs101, [%r3019]; 2026-02-21T12:56:17.2367445Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2367798Z shl.b16 %rs102, %rs101, 4; 2026-02-21T12:56:17.2368005Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2368077Z ld.shared.s8 %rs103, [%r3017+2048]; 2026-02-21T12:56:17.2368278Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2368340Z shl.b16 %rs104, %rs103, 4; 2026-02-21T12:56:17.2368545Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2368704Z ld.shared.s8 %rs105, [%r3017+2176]; 2026-02-21T12:56:17.2368918Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2369003Z shl.b16 %rs106, %rs105, 4; 2026-02-21T12:56:17.2369210Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2369282Z ld.shared.s8 %rs107, [%r3017+2304]; 2026-02-21T12:56:17.2369493Z .loc 1 57 28 // 
cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2369559Z shl.b16 %rs108, %rs107, 4; 2026-02-21T12:56:17.2369758Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2369828Z ld.shared.s8 %rs109, [%r3017+2432]; 2026-02-21T12:56:17.2370026Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2370088Z shl.b16 %rs110, %rs109, 4; 2026-02-21T12:56:17.2370289Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2370509Z ld.shared.s8 %rs111, [%r3017+2560]; 2026-02-21T12:56:17.2370715Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2370779Z shl.b16 %rs112, %rs111, 4; 2026-02-21T12:56:17.2370984Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2371051Z ld.shared.s8 %rs113, [%r3017+2688]; 2026-02-21T12:56:17.2371253Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2371320Z shl.b16 %rs114, %rs113, 4; 2026-02-21T12:56:17.2371521Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2371589Z ld.shared.s8 %rs115, [%r3017+2816]; 2026-02-21T12:56:17.2371796Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2371857Z shl.b16 %rs116, %rs115, 4; 2026-02-21T12:56:17.2372148Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2372225Z ld.shared.s8 %rs117, [%r3020]; 2026-02-21T12:56:17.2372425Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2372492Z shl.b16 %rs118, %rs117, 4; 2026-02-21T12:56:17.2372692Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2372763Z ld.shared.s8 %rs119, [%r3017+3072]; 2026-02-21T12:56:17.2372963Z .loc 1 57 
28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2373025Z shl.b16 %rs120, %rs119, 4; 2026-02-21T12:56:17.2373230Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2373299Z ld.shared.s8 %rs121, [%r3017+3200]; 2026-02-21T12:56:17.2373500Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2373568Z shl.b16 %rs122, %rs121, 4; 2026-02-21T12:56:17.2373767Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2373833Z ld.shared.s8 %rs123, [%r3017+3328]; 2026-02-21T12:56:17.2374038Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2374101Z shl.b16 %rs124, %rs123, 4; 2026-02-21T12:56:17.2374301Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2374374Z ld.shared.s8 %rs125, [%r3017+3456]; 2026-02-21T12:56:17.2374575Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2374705Z shl.b16 %rs126, %rs125, 4; 2026-02-21T12:56:17.2374910Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2374978Z ld.shared.s8 %rs127, [%r3017+3584]; 2026-02-21T12:56:17.2375177Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2375251Z shl.b16 %rs128, %rs127, 4; 2026-02-21T12:56:17.2375471Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2375541Z ld.shared.s8 %rs129, [%r3017+3712]; 2026-02-21T12:56:17.2375752Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2375814Z shl.b16 %rs130, %rs129, 4; 2026-02-21T12:56:17.2376014Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2376083Z ld.shared.s8 %rs131, [%r3017+3840]; 2026-02-21T12:56:17.2376285Z 
.loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2376419Z shl.b16 %rs132, %rs131, 4; 2026-02-21T12:56:17.2376821Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2376909Z ld.shared.s8 %rs133, [%r3021]; 2026-02-21T12:56:17.2377114Z .loc 1 57 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:57:28 2026-02-21T12:56:17.2377174Z shl.b16 %rs134, %rs133, 4; 2026-02-21T12:56:17.2377378Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2377452Z cvt.s16.s8 %rs135, %rs72; 2026-02-21T12:56:17.2377519Z shr.s16 %rs136, %rs135, 4; 2026-02-21T12:56:17.2377583Z cvt.s16.s8 %rs137, %rs74; 2026-02-21T12:56:17.2377643Z shr.s16 %rs138, %rs137, 4; 2026-02-21T12:56:17.2377703Z shr.s16 %rs139, %rs71, 4; 2026-02-21T12:56:17.2377765Z shr.s16 %rs140, %rs73, 4; 2026-02-21T12:56:17.2377968Z .loc 1 77 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:77:32 2026-02-21T12:56:17.2378111Z cvt.rn.f32.s16 %r3022, %rs140; 2026-02-21T12:56:17.2378180Z cvt.rn.f32.s16 %r3023, %rs139; 2026-02-21T12:56:17.2378245Z cvt.rn.f32.s16 %r3024, %rs138; 2026-02-21T12:56:17.2378306Z cvt.rn.f32.s16 %r3025, %rs136; 2026-02-21T12:56:17.2378508Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2378571Z cvt.s16.s8 %rs141, %rs76; 2026-02-21T12:56:17.2378631Z shr.s16 %rs142, %rs141, 4; 2026-02-21T12:56:17.2378689Z cvt.s16.s8 %rs143, %rs78; 2026-02-21T12:56:17.2378749Z shr.s16 %rs144, %rs143, 4; 2026-02-21T12:56:17.2378821Z shr.s16 %rs145, %rs75, 4; 2026-02-21T12:56:17.2378884Z shr.s16 %rs146, %rs77, 4; 2026-02-21T12:56:17.2379085Z .loc 1 77 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:77:32 2026-02-21T12:56:17.2379152Z cvt.rn.f32.s16 %r3026, %rs146; 2026-02-21T12:56:17.2379213Z cvt.rn.f32.s16 %r3027, %rs145; 2026-02-21T12:56:17.2379276Z cvt.rn.f32.s16 %r3028, %rs144; 
2026-02-21T12:56:17.2379342Z cvt.rn.f32.s16 %r3029, %rs142; 2026-02-21T12:56:17.2379545Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2379604Z cvt.s16.s8 %rs147, %rs80; 2026-02-21T12:56:17.2379674Z shr.s16 %rs148, %rs147, 4; 2026-02-21T12:56:17.2379740Z cvt.s16.s8 %rs149, %rs82; 2026-02-21T12:56:17.2379799Z shr.s16 %rs150, %rs149, 4; 2026-02-21T12:56:17.2379858Z shr.s16 %rs151, %rs79, 4; 2026-02-21T12:56:17.2379919Z shr.s16 %rs152, %rs81, 4; 2026-02-21T12:56:17.2380120Z .loc 1 77 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:77:32 2026-02-21T12:56:17.2380181Z cvt.rn.f32.s16 %r3030, %rs152; 2026-02-21T12:56:17.2380244Z cvt.rn.f32.s16 %r3031, %rs151; 2026-02-21T12:56:17.2380308Z cvt.rn.f32.s16 %r3032, %rs150; 2026-02-21T12:56:17.2380447Z cvt.rn.f32.s16 %r3033, %rs148; 2026-02-21T12:56:17.2380650Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2380727Z cvt.s16.s8 %rs153, %rs84; 2026-02-21T12:56:17.2380788Z shr.s16 %rs154, %rs153, 4; 2026-02-21T12:56:17.2380849Z cvt.s16.s8 %rs155, %rs86; 2026-02-21T12:56:17.2380911Z shr.s16 %rs156, %rs155, 4; 2026-02-21T12:56:17.2380971Z shr.s16 %rs157, %rs83, 4; 2026-02-21T12:56:17.2381030Z shr.s16 %rs158, %rs85, 4; 2026-02-21T12:56:17.2381230Z .loc 1 77 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:77:32 2026-02-21T12:56:17.2381297Z cvt.rn.f32.s16 %r3034, %rs158; 2026-02-21T12:56:17.2381356Z cvt.rn.f32.s16 %r3035, %rs157; 2026-02-21T12:56:17.2381419Z cvt.rn.f32.s16 %r3036, %rs156; 2026-02-21T12:56:17.2381482Z cvt.rn.f32.s16 %r3037, %rs154; 2026-02-21T12:56:17.2381680Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2381741Z cvt.s16.s8 %rs159, %rs88; 2026-02-21T12:56:17.2381802Z shr.s16 %rs160, %rs159, 4; 2026-02-21T12:56:17.2381989Z cvt.s16.s8 %rs161, %rs90; 2026-02-21T12:56:17.2382062Z shr.s16 %rs162, %rs161, 4; 2026-02-21T12:56:17.2382122Z shr.s16 
%rs163, %rs87, 4; 2026-02-21T12:56:17.2382186Z shr.s16 %rs164, %rs89, 4; 2026-02-21T12:56:17.2382392Z .loc 1 77 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:77:32 2026-02-21T12:56:17.2382453Z cvt.rn.f32.s16 %r3038, %rs164; 2026-02-21T12:56:17.2382517Z cvt.rn.f32.s16 %r3039, %rs163; 2026-02-21T12:56:17.2382577Z cvt.rn.f32.s16 %r3040, %rs162; 2026-02-21T12:56:17.2382638Z cvt.rn.f32.s16 %r3041, %rs160; 2026-02-21T12:56:17.2382836Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2382897Z cvt.s16.s8 %rs165, %rs92; 2026-02-21T12:56:17.2382958Z shr.s16 %rs166, %rs165, 4; 2026-02-21T12:56:17.2383019Z cvt.s16.s8 %rs167, %rs94; 2026-02-21T12:56:17.2383082Z shr.s16 %rs168, %rs167, 4; 2026-02-21T12:56:17.2383142Z shr.s16 %rs169, %rs91, 4; 2026-02-21T12:56:17.2383282Z shr.s16 %rs170, %rs93, 4; 2026-02-21T12:56:17.2383496Z .loc 1 77 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:77:32 2026-02-21T12:56:17.2383564Z cvt.rn.f32.s16 %r3042, %rs170; 2026-02-21T12:56:17.2383627Z cvt.rn.f32.s16 %r3043, %rs169; 2026-02-21T12:56:17.2383688Z cvt.rn.f32.s16 %r3044, %rs168; 2026-02-21T12:56:17.2383753Z cvt.rn.f32.s16 %r3045, %rs166; 2026-02-21T12:56:17.2383949Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2384009Z cvt.s16.s8 %rs171, %rs96; 2026-02-21T12:56:17.2384071Z shr.s16 %rs172, %rs171, 4; 2026-02-21T12:56:17.2384130Z cvt.s16.s8 %rs173, %rs98; 2026-02-21T12:56:17.2384190Z shr.s16 %rs174, %rs173, 4; 2026-02-21T12:56:17.2384250Z shr.s16 %rs175, %rs95, 4; 2026-02-21T12:56:17.2384315Z shr.s16 %rs176, %rs97, 4; 2026-02-21T12:56:17.2384517Z .loc 1 77 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:77:32 2026-02-21T12:56:17.2384593Z cvt.rn.f32.s16 %r3046, %rs176; 2026-02-21T12:56:17.2384658Z cvt.rn.f32.s16 %r3047, %rs175; 2026-02-21T12:56:17.2384718Z cvt.rn.f32.s16 %r3048, %rs174; 2026-02-21T12:56:17.2384778Z cvt.rn.f32.s16 %r3049, %rs172; 
2026-02-21T12:56:17.2384980Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2385042Z cvt.s16.s8 %rs177, %rs100; 2026-02-21T12:56:17.2385102Z shr.s16 %rs178, %rs177, 4; 2026-02-21T12:56:17.2385161Z cvt.s16.s8 %rs179, %rs102; 2026-02-21T12:56:17.2385225Z shr.s16 %rs180, %rs179, 4; 2026-02-21T12:56:17.2385283Z shr.s16 %rs181, %rs99, 4; 2026-02-21T12:56:17.2385353Z shr.s16 %rs182, %rs101, 4; 2026-02-21T12:56:17.2385560Z .loc 1 77 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:77:32 2026-02-21T12:56:17.2385681Z cvt.rn.f32.s16 %r3050, %rs182; 2026-02-21T12:56:17.2385743Z cvt.rn.f32.s16 %r3051, %rs181; 2026-02-21T12:56:17.2385819Z cvt.rn.f32.s16 %r3052, %rs180; 2026-02-21T12:56:17.2385886Z cvt.rn.f32.s16 %r3053, %rs178; 2026-02-21T12:56:17.2386086Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2386147Z cvt.s16.s8 %rs183, %rs104; 2026-02-21T12:56:17.2386210Z shr.s16 %rs184, %rs183, 4; 2026-02-21T12:56:17.2386268Z cvt.s16.s8 %rs185, %rs106; 2026-02-21T12:56:17.2386326Z shr.s16 %rs186, %rs185, 4; 2026-02-21T12:56:17.2386387Z shr.s16 %rs187, %rs103, 4; 2026-02-21T12:56:17.2386565Z shr.s16 %rs188, %rs105, 4; 2026-02-21T12:56:17.2386785Z .loc 1 77 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:77:32 2026-02-21T12:56:17.2386849Z cvt.rn.f32.s16 %r3054, %rs188; 2026-02-21T12:56:17.2386913Z cvt.rn.f32.s16 %r3055, %rs187; 2026-02-21T12:56:17.2386978Z cvt.rn.f32.s16 %r3056, %rs186; 2026-02-21T12:56:17.2387038Z cvt.rn.f32.s16 %r3057, %rs184; 2026-02-21T12:56:17.2387325Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2387452Z cvt.s16.s8 %rs189, %rs108; 2026-02-21T12:56:17.2387513Z shr.s16 %rs190, %rs189, 4; 2026-02-21T12:56:17.2387576Z cvt.s16.s8 %rs191, %rs110; 2026-02-21T12:56:17.2387637Z shr.s16 %rs192, %rs191, 4; 2026-02-21T12:56:17.2387697Z shr.s16 %rs193, %rs107, 4; 2026-02-21T12:56:17.2387756Z 
shr.s16 %rs194, %rs109, 4; 2026-02-21T12:56:17.2387958Z .loc 1 77 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:77:32 2026-02-21T12:56:17.2388020Z cvt.rn.f32.s16 %r3058, %rs194; 2026-02-21T12:56:17.2388080Z cvt.rn.f32.s16 %r3059, %rs193; 2026-02-21T12:56:17.2388146Z cvt.rn.f32.s16 %r3060, %rs192; 2026-02-21T12:56:17.2388208Z cvt.rn.f32.s16 %r3061, %rs190; 2026-02-21T12:56:17.2388482Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2388551Z cvt.s16.s8 %rs195, %rs112; 2026-02-21T12:56:17.2388687Z shr.s16 %rs196, %rs195, 4; 2026-02-21T12:56:17.2388753Z cvt.s16.s8 %rs197, %rs114; 2026-02-21T12:56:17.2388815Z shr.s16 %rs198, %rs197, 4; 2026-02-21T12:56:17.2388883Z shr.s16 %rs199, %rs111, 4; 2026-02-21T12:56:17.2388949Z shr.s16 %rs200, %rs113, 4; 2026-02-21T12:56:17.2389158Z .loc 1 77 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:77:32 2026-02-21T12:56:17.2389224Z cvt.rn.f32.s16 %r3062, %rs200; 2026-02-21T12:56:17.2389285Z cvt.rn.f32.s16 %r3063, %rs199; 2026-02-21T12:56:17.2389347Z cvt.rn.f32.s16 %r3064, %rs198; 2026-02-21T12:56:17.2389406Z cvt.rn.f32.s16 %r3065, %rs196; 2026-02-21T12:56:17.2389615Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2389686Z cvt.s16.s8 %rs201, %rs116; 2026-02-21T12:56:17.2389747Z shr.s16 %rs202, %rs201, 4; 2026-02-21T12:56:17.2389813Z cvt.s16.s8 %rs203, %rs118; 2026-02-21T12:56:17.2389874Z shr.s16 %rs204, %rs203, 4; 2026-02-21T12:56:17.2389935Z shr.s16 %rs205, %rs115, 4; 2026-02-21T12:56:17.2389999Z shr.s16 %rs206, %rs117, 4; 2026-02-21T12:56:17.2390203Z .loc 1 77 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:77:32 2026-02-21T12:56:17.2390265Z cvt.rn.f32.s16 %r3066, %rs206; 2026-02-21T12:56:17.2390335Z cvt.rn.f32.s16 %r3067, %rs205; 2026-02-21T12:56:17.2390402Z cvt.rn.f32.s16 %r3068, %rs204; 2026-02-21T12:56:17.2390463Z cvt.rn.f32.s16 %r3069, %rs202; 2026-02-21T12:56:17.2390668Z .loc 1 59 25 // 
cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2390735Z cvt.s16.s8 %rs207, %rs120; 2026-02-21T12:56:17.2390794Z shr.s16 %rs208, %rs207, 4; 2026-02-21T12:56:17.2390854Z cvt.s16.s8 %rs209, %rs122; 2026-02-21T12:56:17.2390913Z shr.s16 %rs210, %rs209, 4; 2026-02-21T12:56:17.2391064Z shr.s16 %rs211, %rs119, 4; 2026-02-21T12:56:17.2391124Z shr.s16 %rs212, %rs121, 4; 2026-02-21T12:56:17.2391333Z .loc 1 77 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:77:32 2026-02-21T12:56:17.2391412Z cvt.rn.f32.s16 %r3070, %rs212; 2026-02-21T12:56:17.2391473Z cvt.rn.f32.s16 %r3071, %rs211; 2026-02-21T12:56:17.2391533Z cvt.rn.f32.s16 %r3072, %rs210; 2026-02-21T12:56:17.2391597Z cvt.rn.f32.s16 %r3073, %rs208; 2026-02-21T12:56:17.2391798Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2391867Z cvt.s16.s8 %rs213, %rs124; 2026-02-21T12:56:17.2391925Z shr.s16 %rs214, %rs213, 4; 2026-02-21T12:56:17.2391988Z cvt.s16.s8 %rs215, %rs126; 2026-02-21T12:56:17.2392047Z shr.s16 %rs216, %rs215, 4; 2026-02-21T12:56:17.2392107Z shr.s16 %rs217, %rs123, 4; 2026-02-21T12:56:17.2392167Z shr.s16 %rs218, %rs125, 4; 2026-02-21T12:56:17.2392367Z .loc 1 77 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:77:32 2026-02-21T12:56:17.2392429Z cvt.rn.f32.s16 %r3074, %rs218; 2026-02-21T12:56:17.2392492Z cvt.rn.f32.s16 %r3075, %rs217; 2026-02-21T12:56:17.2392664Z cvt.rn.f32.s16 %r3076, %rs216; 2026-02-21T12:56:17.2392728Z cvt.rn.f32.s16 %r3077, %rs214; 2026-02-21T12:56:17.2392930Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2392993Z cvt.s16.s8 %rs219, %rs128; 2026-02-21T12:56:17.2393053Z shr.s16 %rs220, %rs219, 4; 2026-02-21T12:56:17.2393112Z cvt.s16.s8 %rs221, %rs130; 2026-02-21T12:56:17.2393174Z shr.s16 %rs222, %rs221, 4; 2026-02-21T12:56:17.2393234Z shr.s16 %rs223, %rs127, 4; 2026-02-21T12:56:17.2393294Z shr.s16 %rs224, %rs129, 4; 
2026-02-21T12:56:17.2393492Z .loc 1 77 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:77:32 2026-02-21T12:56:17.2393556Z cvt.rn.f32.s16 %r3078, %rs224; 2026-02-21T12:56:17.2393620Z cvt.rn.f32.s16 %r3079, %rs223; 2026-02-21T12:56:17.2393684Z cvt.rn.f32.s16 %r3080, %rs222; 2026-02-21T12:56:17.2393747Z cvt.rn.f32.s16 %r3081, %rs220; 2026-02-21T12:56:17.2394008Z .loc 1 59 25 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:59:25 2026-02-21T12:56:17.2394071Z cvt.s16.s8 %rs225, %rs132; 2026-02-21T12:56:17.2394135Z shr.s16 %rs226, %rs225, 4; 2026-02-21T12:56:17.2394207Z cvt.s16.s8 %rs227, %rs134; 2026-02-21T12:56:17.2394268Z shr.s16 %rs228, %rs227, 4; 2026-02-21T12:56:17.2394328Z shr.s16 %rs229, %rs131, 4; 2026-02-21T12:56:17.2394390Z shr.s16 %rs230, %rs133, 4; 2026-02-21T12:56:17.2394591Z .loc 1 77 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:77:32 2026-02-21T12:56:17.2394654Z cvt.rn.f32.s16 %r3082, %rs230; 2026-02-21T12:56:17.2394719Z cvt.rn.f32.s16 %r3083, %rs229; 2026-02-21T12:56:17.2394779Z cvt.rn.f32.s16 %r3084, %rs228; 2026-02-21T12:56:17.2394841Z cvt.rn.f32.s16 %r3085, %rs226; 2026-02-21T12:56:17.2394958Z st.shared.v4.b32 [%r94], {%r3025, %r3023, %r3024, %r3022}; 2026-02-21T12:56:17.2395083Z st.shared.v4.b32 [%r94+16384], {%r3057, %r3055, %r3056, %r3054}; 2026-02-21T12:56:17.2395193Z st.shared.v4.b32 [%r95], {%r3029, %r3027, %r3028, %r3026}; 2026-02-21T12:56:17.2395307Z st.shared.v4.b32 [%r95+16384], {%r3061, %r3059, %r3060, %r3058}; 2026-02-21T12:56:17.2395410Z st.shared.v4.b32 [%r96], {%r3033, %r3031, %r3032, %r3030}; 2026-02-21T12:56:17.2395520Z st.shared.v4.b32 [%r96+16384], {%r3065, %r3063, %r3064, %r3062}; 2026-02-21T12:56:17.2395620Z st.shared.v4.b32 [%r97], {%r3037, %r3035, %r3036, %r3034}; 2026-02-21T12:56:17.2395738Z st.shared.v4.b32 [%r97+16384], {%r3069, %r3067, %r3068, %r3066}; 2026-02-21T12:56:17.2395846Z st.shared.v4.b32 [%r98], {%r3041, %r3039, %r3040, %r3038}; 2026-02-21T12:56:17.2395957Z st.shared.v4.b32 
[%r98+16384], {%r3073, %r3071, %r3072, %r3070}; 2026-02-21T12:56:17.2396062Z st.shared.v4.b32 [%r99], {%r3045, %r3043, %r3044, %r3042}; 2026-02-21T12:56:17.2396239Z st.shared.v4.b32 [%r99+16384], {%r3077, %r3075, %r3076, %r3074}; 2026-02-21T12:56:17.2396346Z st.shared.v4.b32 [%r100], {%r3049, %r3047, %r3048, %r3046}; 2026-02-21T12:56:17.2396604Z st.shared.v4.b32 [%r100+16384], {%r3081, %r3079, %r3080, %r3078}; 2026-02-21T12:56:17.2396734Z st.shared.v4.b32 [%r101], {%r3053, %r3051, %r3052, %r3050}; 2026-02-21T12:56:17.2396856Z st.shared.v4.b32 [%r101+16384], {%r3085, %r3083, %r3084, %r3082}; 2026-02-21T12:56:17.2396912Z $L__tmp0: 2026-02-21T12:56:17.2397205Z .loc 2 291 36 // standard.py:291:36 @[ cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:84:40 ] 2026-02-21T12:56:17.2397277Z // begin inline asm 2026-02-21T12:56:17.2397378Z fence.proxy.async.shared::cta; 2026-02-21T12:56:17.2397439Z // end inline asm 2026-02-21T12:56:17.2397494Z bar.sync 0; 2026-02-21T12:56:17.2397579Z shfl.sync.idx.b32 %r3086, %r3, 0, 31, -1; 2026-02-21T12:56:17.2397652Z wgmma.fence.sync.aligned; 2026-02-21T12:56:17.2397718Z mov.pred %p2, -1; 2026-02-21T12:56:17.2397778Z // begin inline asm 2026-02-21T12:56:17.2399128Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r3302,%r3303,%r3304,%r3305,%r3306,%r3307,%r3308,%r3309,%r3310,%r3311,%r3312,%r3313,%r3314,%r3315,%r3316,%r3317,%r3318,%r3319,%r3320,%r3321,%r3322,%r3323,%r3324,%r3325,%r3326,%r3327,%r3328,%r3329,%r3330,%r3331,%r3332,%r3333,%r3334,%r3335,%r3336,%r3337,%r3338,%r3339,%r3340,%r3341,%r3342,%r3343,%r3344,%r3345,%r3346,%r3347,%r3348,%r3349,%r3350,%r3351,%r3352,%r3353,%r3354,%r3355,%r3356,%r3357,%r3358,%r3359,%r3360,%r3361,%r3362,%r3363,%r3364,%r3365}, {%r720,%r721,%r722,%r723}, %rd183, %p2, 1, 1; 2026-02-21T12:56:17.2399249Z // end inline asm 2026-02-21T12:56:17.2399308Z // begin inline asm 2026-02-21T12:56:17.2400631Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 
{%r3302,%r3303,%r3304,%r3305,%r3306,%r3307,%r3308,%r3309,%r3310,%r3311,%r3312,%r3313,%r3314,%r3315,%r3316,%r3317,%r3318,%r3319,%r3320,%r3321,%r3322,%r3323,%r3324,%r3325,%r3326,%r3327,%r3328,%r3329,%r3330,%r3331,%r3332,%r3333,%r3334,%r3335,%r3336,%r3337,%r3338,%r3339,%r3340,%r3341,%r3342,%r3343,%r3344,%r3345,%r3346,%r3347,%r3348,%r3349,%r3350,%r3351,%r3352,%r3353,%r3354,%r3355,%r3356,%r3357,%r3358,%r3359,%r3360,%r3361,%r3362,%r3363,%r3364,%r3365}, {%r852,%r853,%r854,%r855}, %rd184, %p2, 1, 1; 2026-02-21T12:56:17.2400698Z // end inline asm 2026-02-21T12:56:17.2400756Z // begin inline asm 2026-02-21T12:56:17.2402010Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r3302,%r3303,%r3304,%r3305,%r3306,%r3307,%r3308,%r3309,%r3310,%r3311,%r3312,%r3313,%r3314,%r3315,%r3316,%r3317,%r3318,%r3319,%r3320,%r3321,%r3322,%r3323,%r3324,%r3325,%r3326,%r3327,%r3328,%r3329,%r3330,%r3331,%r3332,%r3333,%r3334,%r3335,%r3336,%r3337,%r3338,%r3339,%r3340,%r3341,%r3342,%r3343,%r3344,%r3345,%r3346,%r3347,%r3348,%r3349,%r3350,%r3351,%r3352,%r3353,%r3354,%r3355,%r3356,%r3357,%r3358,%r3359,%r3360,%r3361,%r3362,%r3363,%r3364,%r3365}, {%r984,%r985,%r986,%r987}, %rd185, %p2, 1, 1; 2026-02-21T12:56:17.2402067Z // end inline asm 2026-02-21T12:56:17.2402126Z // begin inline asm 2026-02-21T12:56:17.2403407Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r3302,%r3303,%r3304,%r3305,%r3306,%r3307,%r3308,%r3309,%r3310,%r3311,%r3312,%r3313,%r3314,%r3315,%r3316,%r3317,%r3318,%r3319,%r3320,%r3321,%r3322,%r3323,%r3324,%r3325,%r3326,%r3327,%r3328,%r3329,%r3330,%r3331,%r3332,%r3333,%r3334,%r3335,%r3336,%r3337,%r3338,%r3339,%r3340,%r3341,%r3342,%r3343,%r3344,%r3345,%r3346,%r3347,%r3348,%r3349,%r3350,%r3351,%r3352,%r3353,%r3354,%r3355,%r3356,%r3357,%r3358,%r3359,%r3360,%r3361,%r3362,%r3363,%r3364,%r3365}, {%r1116,%r1117,%r1118,%r1119}, %rd186, %p2, 1, 1; 2026-02-21T12:56:17.2403467Z // end inline asm 2026-02-21T12:56:17.2403525Z // begin inline asm 2026-02-21T12:56:17.2404791Z 
wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r3302,%r3303,%r3304,%r3305,%r3306,%r3307,%r3308,%r3309,%r3310,%r3311,%r3312,%r3313,%r3314,%r3315,%r3316,%r3317,%r3318,%r3319,%r3320,%r3321,%r3322,%r3323,%r3324,%r3325,%r3326,%r3327,%r3328,%r3329,%r3330,%r3331,%r3332,%r3333,%r3334,%r3335,%r3336,%r3337,%r3338,%r3339,%r3340,%r3341,%r3342,%r3343,%r3344,%r3345,%r3346,%r3347,%r3348,%r3349,%r3350,%r3351,%r3352,%r3353,%r3354,%r3355,%r3356,%r3357,%r3358,%r3359,%r3360,%r3361,%r3362,%r3363,%r3364,%r3365}, {%r1248,%r1249,%r1250,%r1251}, %rd187, %p2, 1, 1; 2026-02-21T12:56:17.2404926Z // end inline asm 2026-02-21T12:56:17.2404987Z // begin inline asm 2026-02-21T12:56:17.2406256Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r3302,%r3303,%r3304,%r3305,%r3306,%r3307,%r3308,%r3309,%r3310,%r3311,%r3312,%r3313,%r3314,%r3315,%r3316,%r3317,%r3318,%r3319,%r3320,%r3321,%r3322,%r3323,%r3324,%r3325,%r3326,%r3327,%r3328,%r3329,%r3330,%r3331,%r3332,%r3333,%r3334,%r3335,%r3336,%r3337,%r3338,%r3339,%r3340,%r3341,%r3342,%r3343,%r3344,%r3345,%r3346,%r3347,%r3348,%r3349,%r3350,%r3351,%r3352,%r3353,%r3354,%r3355,%r3356,%r3357,%r3358,%r3359,%r3360,%r3361,%r3362,%r3363,%r3364,%r3365}, {%r1380,%r1381,%r1382,%r1383}, %rd188, %p2, 1, 1; 2026-02-21T12:56:17.2406312Z // end inline asm 2026-02-21T12:56:17.2406375Z // begin inline asm 2026-02-21T12:56:17.2407836Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r3302,%r3303,%r3304,%r3305,%r3306,%r3307,%r3308,%r3309,%r3310,%r3311,%r3312,%r3313,%r3314,%r3315,%r3316,%r3317,%r3318,%r3319,%r3320,%r3321,%r3322,%r3323,%r3324,%r3325,%r3326,%r3327,%r3328,%r3329,%r3330,%r3331,%r3332,%r3333,%r3334,%r3335,%r3336,%r3337,%r3338,%r3339,%r3340,%r3341,%r3342,%r3343,%r3344,%r3345,%r3346,%r3347,%r3348,%r3349,%r3350,%r3351,%r3352,%r3353,%r3354,%r3355,%r3356,%r3357,%r3358,%r3359,%r3360,%r3361,%r3362,%r3363,%r3364,%r3365}, {%r1512,%r1513,%r1514,%r1515}, %rd189, %p2, 1, 1; 2026-02-21T12:56:17.2407986Z // end inline asm 2026-02-21T12:56:17.2408045Z // begin 
inline asm 2026-02-21T12:56:17.2409397Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r3302,%r3303,%r3304,%r3305,%r3306,%r3307,%r3308,%r3309,%r3310,%r3311,%r3312,%r3313,%r3314,%r3315,%r3316,%r3317,%r3318,%r3319,%r3320,%r3321,%r3322,%r3323,%r3324,%r3325,%r3326,%r3327,%r3328,%r3329,%r3330,%r3331,%r3332,%r3333,%r3334,%r3335,%r3336,%r3337,%r3338,%r3339,%r3340,%r3341,%r3342,%r3343,%r3344,%r3345,%r3346,%r3347,%r3348,%r3349,%r3350,%r3351,%r3352,%r3353,%r3354,%r3355,%r3356,%r3357,%r3358,%r3359,%r3360,%r3361,%r3362,%r3363,%r3364,%r3365}, {%r1644,%r1645,%r1646,%r1647}, %rd190, %p2, 1, 1; 2026-02-21T12:56:17.2409474Z // end inline asm 2026-02-21T12:56:17.2409534Z // begin inline asm 2026-02-21T12:56:17.2410789Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r3366,%r3367,%r3368,%r3369,%r3370,%r3371,%r3372,%r3373,%r3374,%r3375,%r3376,%r3377,%r3378,%r3379,%r3380,%r3381,%r3382,%r3383,%r3384,%r3385,%r3386,%r3387,%r3388,%r3389,%r3390,%r3391,%r3392,%r3393,%r3394,%r3395,%r3396,%r3397,%r3398,%r3399,%r3400,%r3401,%r3402,%r3403,%r3404,%r3405,%r3406,%r3407,%r3408,%r3409,%r3410,%r3411,%r3412,%r3413,%r3414,%r3415,%r3416,%r3417,%r3418,%r3419,%r3420,%r3421,%r3422,%r3423,%r3424,%r3425,%r3426,%r3427,%r3428,%r3429}, {%r1776,%r1777,%r1778,%r1779}, %rd183, %p2, 1, 1; 2026-02-21T12:56:17.2410849Z // end inline asm 2026-02-21T12:56:17.2410909Z // begin inline asm 2026-02-21T12:56:17.2412173Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r3366,%r3367,%r3368,%r3369,%r3370,%r3371,%r3372,%r3373,%r3374,%r3375,%r3376,%r3377,%r3378,%r3379,%r3380,%r3381,%r3382,%r3383,%r3384,%r3385,%r3386,%r3387,%r3388,%r3389,%r3390,%r3391,%r3392,%r3393,%r3394,%r3395,%r3396,%r3397,%r3398,%r3399,%r3400,%r3401,%r3402,%r3403,%r3404,%r3405,%r3406,%r3407,%r3408,%r3409,%r3410,%r3411,%r3412,%r3413,%r3414,%r3415,%r3416,%r3417,%r3418,%r3419,%r3420,%r3421,%r3422,%r3423,%r3424,%r3425,%r3426,%r3427,%r3428,%r3429}, {%r1908,%r1909,%r1910,%r1911}, %rd184, %p2, 1, 1; 2026-02-21T12:56:17.2412233Z // end inline asm 
2026-02-21T12:56:17.2412291Z // begin inline asm 2026-02-21T12:56:17.2413548Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r3366,%r3367,%r3368,%r3369,%r3370,%r3371,%r3372,%r3373,%r3374,%r3375,%r3376,%r3377,%r3378,%r3379,%r3380,%r3381,%r3382,%r3383,%r3384,%r3385,%r3386,%r3387,%r3388,%r3389,%r3390,%r3391,%r3392,%r3393,%r3394,%r3395,%r3396,%r3397,%r3398,%r3399,%r3400,%r3401,%r3402,%r3403,%r3404,%r3405,%r3406,%r3407,%r3408,%r3409,%r3410,%r3411,%r3412,%r3413,%r3414,%r3415,%r3416,%r3417,%r3418,%r3419,%r3420,%r3421,%r3422,%r3423,%r3424,%r3425,%r3426,%r3427,%r3428,%r3429}, {%r2040,%r2041,%r2042,%r2043}, %rd185, %p2, 1, 1; 2026-02-21T12:56:17.2413685Z // end inline asm 2026-02-21T12:56:17.2413743Z // begin inline asm 2026-02-21T12:56:17.2415008Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r3366,%r3367,%r3368,%r3369,%r3370,%r3371,%r3372,%r3373,%r3374,%r3375,%r3376,%r3377,%r3378,%r3379,%r3380,%r3381,%r3382,%r3383,%r3384,%r3385,%r3386,%r3387,%r3388,%r3389,%r3390,%r3391,%r3392,%r3393,%r3394,%r3395,%r3396,%r3397,%r3398,%r3399,%r3400,%r3401,%r3402,%r3403,%r3404,%r3405,%r3406,%r3407,%r3408,%r3409,%r3410,%r3411,%r3412,%r3413,%r3414,%r3415,%r3416,%r3417,%r3418,%r3419,%r3420,%r3421,%r3422,%r3423,%r3424,%r3425,%r3426,%r3427,%r3428,%r3429}, {%r2172,%r2173,%r2174,%r2175}, %rd186, %p2, 1, 1; 2026-02-21T12:56:17.2415068Z // end inline asm 2026-02-21T12:56:17.2415129Z // begin inline asm 2026-02-21T12:56:17.2416442Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r3366,%r3367,%r3368,%r3369,%r3370,%r3371,%r3372,%r3373,%r3374,%r3375,%r3376,%r3377,%r3378,%r3379,%r3380,%r3381,%r3382,%r3383,%r3384,%r3385,%r3386,%r3387,%r3388,%r3389,%r3390,%r3391,%r3392,%r3393,%r3394,%r3395,%r3396,%r3397,%r3398,%r3399,%r3400,%r3401,%r3402,%r3403,%r3404,%r3405,%r3406,%r3407,%r3408,%r3409,%r3410,%r3411,%r3412,%r3413,%r3414,%r3415,%r3416,%r3417,%r3418,%r3419,%r3420,%r3421,%r3422,%r3423,%r3424,%r3425,%r3426,%r3427,%r3428,%r3429}, {%r2304,%r2305,%r2306,%r2307}, %rd187, %p2, 1, 1; 
2026-02-21T12:56:17.2416696Z // end inline asm 2026-02-21T12:56:17.2416757Z // begin inline asm 2026-02-21T12:56:17.2418111Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r3366,%r3367,%r3368,%r3369,%r3370,%r3371,%r3372,%r3373,%r3374,%r3375,%r3376,%r3377,%r3378,%r3379,%r3380,%r3381,%r3382,%r3383,%r3384,%r3385,%r3386,%r3387,%r3388,%r3389,%r3390,%r3391,%r3392,%r3393,%r3394,%r3395,%r3396,%r3397,%r3398,%r3399,%r3400,%r3401,%r3402,%r3403,%r3404,%r3405,%r3406,%r3407,%r3408,%r3409,%r3410,%r3411,%r3412,%r3413,%r3414,%r3415,%r3416,%r3417,%r3418,%r3419,%r3420,%r3421,%r3422,%r3423,%r3424,%r3425,%r3426,%r3427,%r3428,%r3429}, {%r2436,%r2437,%r2438,%r2439}, %rd188, %p2, 1, 1; 2026-02-21T12:56:17.2418175Z // end inline asm 2026-02-21T12:56:17.2418240Z // begin inline asm 2026-02-21T12:56:17.2419505Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r3366,%r3367,%r3368,%r3369,%r3370,%r3371,%r3372,%r3373,%r3374,%r3375,%r3376,%r3377,%r3378,%r3379,%r3380,%r3381,%r3382,%r3383,%r3384,%r3385,%r3386,%r3387,%r3388,%r3389,%r3390,%r3391,%r3392,%r3393,%r3394,%r3395,%r3396,%r3397,%r3398,%r3399,%r3400,%r3401,%r3402,%r3403,%r3404,%r3405,%r3406,%r3407,%r3408,%r3409,%r3410,%r3411,%r3412,%r3413,%r3414,%r3415,%r3416,%r3417,%r3418,%r3419,%r3420,%r3421,%r3422,%r3423,%r3424,%r3425,%r3426,%r3427,%r3428,%r3429}, {%r2568,%r2569,%r2570,%r2571}, %rd189, %p2, 1, 1; 2026-02-21T12:56:17.2419562Z // end inline asm 2026-02-21T12:56:17.2419625Z // begin inline asm 2026-02-21T12:56:17.2420893Z wgmma.mma_async.sync.aligned.m64n128k8.f32.tf32.tf32 {%r3366,%r3367,%r3368,%r3369,%r3370,%r3371,%r3372,%r3373,%r3374,%r3375,%r3376,%r3377,%r3378,%r3379,%r3380,%r3381,%r3382,%r3383,%r3384,%r3385,%r3386,%r3387,%r3388,%r3389,%r3390,%r3391,%r3392,%r3393,%r3394,%r3395,%r3396,%r3397,%r3398,%r3399,%r3400,%r3401,%r3402,%r3403,%r3404,%r3405,%r3406,%r3407,%r3408,%r3409,%r3410,%r3411,%r3412,%r3413,%r3414,%r3415,%r3416,%r3417,%r3418,%r3419,%r3420,%r3421,%r3422,%r3423,%r3424,%r3425,%r3426,%r3427,%r3428,%r3429}, 
{%r2700,%r2701,%r2702,%r2703}, %rd190, %p2, 1, 1; 2026-02-21T12:56:17.2420954Z // end inline asm 2026-02-21T12:56:17.2421033Z wgmma.commit_group.sync.aligned; 2026-02-21T12:56:17.2421093Z mov.b32 %r2833, 0; 2026-02-21T12:56:17.2421153Z mov.b32 %r2832, %r404; 2026-02-21T12:56:17.2421217Z mov.b32 %r2834, %r2833; 2026-02-21T12:56:17.2421274Z // begin inline asm 2026-02-21T12:56:17.2423342Z // wait for regs: %r3302,%r3303,%r3304,%r3305,%r3306,%r3307,%r3308,%r3309,%r3310,%r3311,%r3312,%r3313,%r3314,%r3315,%r3316,%r3317,%r3318,%r3319,%r3320,%r3321,%r3322,%r3323,%r3324,%r3325,%r3326,%r3327,%r3328,%r3329,%r3330,%r3331,%r3332,%r3333,%r3334,%r3335,%r3336,%r3337,%r3338,%r3339,%r3340,%r3341,%r3342,%r3343,%r3344,%r3345,%r3346,%r3347,%r3348,%r3349,%r3350,%r3351,%r3352,%r3353,%r3354,%r3355,%r3356,%r3357,%r3358,%r3359,%r3360,%r3361,%r3362,%r3363,%r3364,%r3365,%r3366,%r3367,%r3368,%r3369,%r3370,%r3371,%r3372,%r3373,%r3374,%r3375,%r3376,%r3377,%r3378,%r3379,%r3380,%r3381,%r3382,%r3383,%r3384,%r3385,%r3386,%r3387,%r3388,%r3389,%r3390,%r3391,%r3392,%r3393,%r3394,%r3395,%r3396,%r3397,%r3398,%r3399,%r3400,%r3401,%r3402,%r3403,%r3404,%r3405,%r3406,%r3407,%r3408,%r3409,%r3410,%r3411,%r3412,%r3413,%r3414,%r3415,%r3416,%r3417,%r3418,%r3419,%r3420,%r3421,%r3422,%r3423,%r3424,%r3425,%r3426,%r3427,%r3428,%r3429,%r2832,%r2833,%r2834 2026-02-21T12:56:17.2423495Z wgmma.wait_group.sync.aligned 0; 2026-02-21T12:56:17.2423553Z // end inline asm 2026-02-21T12:56:17.2423621Z $L__tmp1: 2026-02-21T12:56:17.2423849Z .loc 1 40 57 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:40:57 2026-02-21T12:56:17.2423913Z add.s32 %r3087, %r3301, 1; 2026-02-21T12:56:17.2424081Z setp.gt.s32 %p20, %r3087, 3; 2026-02-21T12:56:17.2424157Z selp.b32 %r3301, 0, %r3087, %p20; 2026-02-21T12:56:17.2424230Z add.s64 %rd217, %rd259, 128; 2026-02-21T12:56:17.2424442Z .loc 1 41 35 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:41:35 2026-02-21T12:56:17.2424508Z or.b64 %rd218, %rd217, %rd19; 
2026-02-21T12:56:17.2424569Z or.b64 %rd219, %rd217, %rd20; 2026-02-21T12:56:17.2424770Z .loc 1 48 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:48:32 2026-02-21T12:56:17.2424834Z shl.b64 %rd221, %rd217, 2; 2026-02-21T12:56:17.2424897Z or.b64 %rd222, %rd221, %rd167; 2026-02-21T12:56:17.2424960Z add.s64 %rd199, %rd51, %rd222; 2026-02-21T12:56:17.2425022Z add.s64 %rd200, %rd52, %rd222; 2026-02-21T12:56:17.2425087Z add.s64 %rd201, %rd53, %rd222; 2026-02-21T12:56:17.2425148Z add.s64 %rd202, %rd54, %rd222; 2026-02-21T12:56:17.2425207Z add.s64 %rd203, %rd55, %rd222; 2026-02-21T12:56:17.2425330Z add.s64 %rd204, %rd56, %rd222; 2026-02-21T12:56:17.2425396Z add.s64 %rd205, %rd57, %rd222; 2026-02-21T12:56:17.2425457Z add.s64 %rd206, %rd58, %rd222; 2026-02-21T12:56:17.2425516Z add.s64 %rd207, %rd59, %rd222; 2026-02-21T12:56:17.2425580Z add.s64 %rd208, %rd60, %rd222; 2026-02-21T12:56:17.2425639Z add.s64 %rd209, %rd61, %rd222; 2026-02-21T12:56:17.2425698Z add.s64 %rd210, %rd62, %rd222; 2026-02-21T12:56:17.2425759Z add.s64 %rd211, %rd63, %rd222; 2026-02-21T12:56:17.2425819Z add.s64 %rd212, %rd64, %rd222; 2026-02-21T12:56:17.2425880Z add.s64 %rd213, %rd65, %rd222; 2026-02-21T12:56:17.2425942Z add.s64 %rd214, %rd66, %rd222; 2026-02-21T12:56:17.2426143Z .loc 1 48 80 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:48:80 2026-02-21T12:56:17.2426205Z shl.b32 %r3088, %r3301, 14; 2026-02-21T12:56:17.2426268Z add.s32 %r3089, %r393, %r3088; 2026-02-21T12:56:17.2426333Z add.s32 %r2966, %r3089, %r8; 2026-02-21T12:56:17.2426410Z selp.b32 %r2967, 8, 0, %p18; 2026-02-21T12:56:17.2426616Z // begin inline asm 2026-02-21T12:56:17.2426783Z cp.async.ca.shared.global [ %r2966 + 0 ], [ %rd199 + 0 ], 0x8, %r2967; 2026-02-21T12:56:17.2426841Z // end inline asm 2026-02-21T12:56:17.2426902Z add.s32 %r2968, %r2966, 1024; 2026-02-21T12:56:17.2426960Z // begin inline asm 2026-02-21T12:56:17.2427098Z cp.async.ca.shared.global [ %r2968 + 0 ], [ %rd200 + 0 ], 0x8, %r2967; 
2026-02-21T12:56:17.2427153Z // end inline asm 2026-02-21T12:56:17.2427223Z add.s32 %r2970, %r2966, 2048; 2026-02-21T12:56:17.2427284Z // begin inline asm 2026-02-21T12:56:17.2427415Z cp.async.ca.shared.global [ %r2970 + 0 ], [ %rd201 + 0 ], 0x8, %r2967; 2026-02-21T12:56:17.2427471Z // end inline asm 2026-02-21T12:56:17.2427533Z add.s32 %r2972, %r2966, 3072; 2026-02-21T12:56:17.2427590Z // begin inline asm 2026-02-21T12:56:17.2427809Z cp.async.ca.shared.global [ %r2972 + 0 ], [ %rd202 + 0 ], 0x8, %r2967; 2026-02-21T12:56:17.2427864Z // end inline asm 2026-02-21T12:56:17.2427931Z add.s32 %r2974, %r2966, 4096; 2026-02-21T12:56:17.2427988Z // begin inline asm 2026-02-21T12:56:17.2428116Z cp.async.ca.shared.global [ %r2974 + 0 ], [ %rd203 + 0 ], 0x8, %r2967; 2026-02-21T12:56:17.2428172Z // end inline asm 2026-02-21T12:56:17.2428241Z add.s32 %r2976, %r2966, 5120; 2026-02-21T12:56:17.2428382Z // begin inline asm 2026-02-21T12:56:17.2428516Z cp.async.ca.shared.global [ %r2976 + 0 ], [ %rd204 + 0 ], 0x8, %r2967; 2026-02-21T12:56:17.2428574Z // end inline asm 2026-02-21T12:56:17.2428633Z add.s32 %r2978, %r2966, 6144; 2026-02-21T12:56:17.2428692Z // begin inline asm 2026-02-21T12:56:17.2428822Z cp.async.ca.shared.global [ %r2978 + 0 ], [ %rd205 + 0 ], 0x8, %r2967; 2026-02-21T12:56:17.2428876Z // end inline asm 2026-02-21T12:56:17.2428936Z add.s32 %r2980, %r2966, 7168; 2026-02-21T12:56:17.2428993Z // begin inline asm 2026-02-21T12:56:17.2429129Z cp.async.ca.shared.global [ %r2980 + 0 ], [ %rd206 + 0 ], 0x8, %r2967; 2026-02-21T12:56:17.2429184Z // end inline asm 2026-02-21T12:56:17.2429388Z add.s32 %r2982, %r2966, 8192; 2026-02-21T12:56:17.2429452Z // begin inline asm 2026-02-21T12:56:17.2429583Z cp.async.ca.shared.global [ %r2982 + 0 ], [ %rd207 + 0 ], 0x8, %r2967; 2026-02-21T12:56:17.2429637Z // end inline asm 2026-02-21T12:56:17.2429699Z add.s32 %r2984, %r2966, 9216; 2026-02-21T12:56:17.2429755Z // begin inline asm 2026-02-21T12:56:17.2429894Z cp.async.ca.shared.global [ 
%r2984 + 0 ], [ %rd208 + 0 ], 0x8, %r2967; 2026-02-21T12:56:17.2429955Z // end inline asm 2026-02-21T12:56:17.2430019Z add.s32 %r2986, %r2966, 10240; 2026-02-21T12:56:17.2430076Z // begin inline asm 2026-02-21T12:56:17.2430209Z cp.async.ca.shared.global [ %r2986 + 0 ], [ %rd209 + 0 ], 0x8, %r2967; 2026-02-21T12:56:17.2430269Z // end inline asm 2026-02-21T12:56:17.2430329Z add.s32 %r2988, %r2966, 11264; 2026-02-21T12:56:17.2430389Z // begin inline asm 2026-02-21T12:56:17.2430526Z cp.async.ca.shared.global [ %r2988 + 0 ], [ %rd210 + 0 ], 0x8, %r2967; 2026-02-21T12:56:17.2430650Z // end inline asm 2026-02-21T12:56:17.2430716Z add.s32 %r2990, %r2966, 12288; 2026-02-21T12:56:17.2430783Z // begin inline asm 2026-02-21T12:56:17.2430918Z cp.async.ca.shared.global [ %r2990 + 0 ], [ %rd211 + 0 ], 0x8, %r2967; 2026-02-21T12:56:17.2430974Z // end inline asm 2026-02-21T12:56:17.2431041Z add.s32 %r2992, %r2966, 13312; 2026-02-21T12:56:17.2431103Z // begin inline asm 2026-02-21T12:56:17.2431232Z cp.async.ca.shared.global [ %r2992 + 0 ], [ %rd212 + 0 ], 0x8, %r2967; 2026-02-21T12:56:17.2431290Z // end inline asm 2026-02-21T12:56:17.2431348Z add.s32 %r2994, %r2966, 14336; 2026-02-21T12:56:17.2431409Z // begin inline asm 2026-02-21T12:56:17.2431537Z cp.async.ca.shared.global [ %r2994 + 0 ], [ %rd213 + 0 ], 0x8, %r2967; 2026-02-21T12:56:17.2431592Z // end inline asm 2026-02-21T12:56:17.2431657Z add.s32 %r2996, %r2966, 15360; 2026-02-21T12:56:17.2431715Z // begin inline asm 2026-02-21T12:56:17.2431844Z cp.async.ca.shared.global [ %r2996 + 0 ], [ %rd214 + 0 ], 0x8, %r2967; 2026-02-21T12:56:17.2431906Z // end inline asm 2026-02-21T12:56:17.2431975Z cp.async.commit_group; 2026-02-21T12:56:17.2432201Z .loc 1 54 34 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:54:34 2026-02-21T12:56:17.2432278Z mad.lo.s64 %rd215, %rd218, 1280, %rd67; 2026-02-21T12:56:17.2432356Z mad.lo.s64 %rd216, %rd219, 1280, %rd67; 2026-02-21T12:56:17.2432560Z .loc 1 54 87 // 
cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:54:87 2026-02-21T12:56:17.2432623Z shl.b32 %r3090, %r3301, 12; 2026-02-21T12:56:17.2432691Z add.s32 %r3091, %r3015, %r3090; 2026-02-21T12:56:17.2432762Z add.s32 %r2998, %r3091, %r25; 2026-02-21T12:56:17.2432827Z selp.b32 %r2999, 16, 0, %p18; 2026-02-21T12:56:17.2432888Z // begin inline asm 2026-02-21T12:56:17.2433035Z cp.async.cg.shared.global [ %r2998 + 0 ], [ %rd215 + 0 ], 0x10, %r2999; 2026-02-21T12:56:17.2433166Z // end inline asm 2026-02-21T12:56:17.2433228Z add.s32 %r3000, %r2998, 2048; 2026-02-21T12:56:17.2433301Z // begin inline asm 2026-02-21T12:56:17.2433448Z cp.async.cg.shared.global [ %r3000 + 0 ], [ %rd216 + 0 ], 0x10, %r2999; 2026-02-21T12:56:17.2433506Z // end inline asm 2026-02-21T12:56:17.2433572Z cp.async.commit_group; 2026-02-21T12:56:17.2433789Z .loc 1 40 57 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:40:57 2026-02-21T12:56:17.2433855Z add.s64 %rd259, %rd259, 32; 2026-02-21T12:56:17.2433923Z setp.lt.u64 %p21, %rd258, 4064; 2026-02-21T12:56:17.2433993Z @%p21 bra $L__BB0_3; 2026-02-21T12:56:17.2434105Z // %bb.4: // in Loop: Header=BB0_2 Depth=1 2026-02-21T12:56:17.2434311Z .loc 1 31 32 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:31:32 2026-02-21T12:56:17.2434379Z or.b64 %rd239, %rd34, %rd18; 2026-02-21T12:56:17.2434586Z .loc 1 40 57 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:40:57 2026-02-21T12:56:17.2434655Z cp.async.wait_group 0; 2026-02-21T12:56:17.2434799Z bar.sync 0; 2026-02-21T12:56:17.2435048Z .loc 1 87 28 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:87:28 2026-02-21T12:56:17.2435128Z cvt.rn.bf16x2.f32 %r3236, %r3303, %r3302; 2026-02-21T12:56:17.2435201Z cvt.rn.bf16x2.f32 %r3237, %r3305, %r3304; 2026-02-21T12:56:17.2435277Z cvt.rn.bf16x2.f32 %r3238, %r3307, %r3306; 2026-02-21T12:56:17.2435348Z cvt.rn.bf16x2.f32 %r3239, %r3309, %r3308; 2026-02-21T12:56:17.2435418Z cvt.rn.bf16x2.f32 %r3240, %r3311, %r3310; 
2026-02-21T12:56:17.2435493Z cvt.rn.bf16x2.f32 %r3241, %r3313, %r3312; 2026-02-21T12:56:17.2435564Z cvt.rn.bf16x2.f32 %r3242, %r3315, %r3314; 2026-02-21T12:56:17.2435635Z cvt.rn.bf16x2.f32 %r3243, %r3317, %r3316; 2026-02-21T12:56:17.2435708Z cvt.rn.bf16x2.f32 %r3244, %r3319, %r3318; 2026-02-21T12:56:17.2435780Z cvt.rn.bf16x2.f32 %r3245, %r3321, %r3320; 2026-02-21T12:56:17.2435855Z cvt.rn.bf16x2.f32 %r3246, %r3323, %r3322; 2026-02-21T12:56:17.2435973Z cvt.rn.bf16x2.f32 %r3247, %r3325, %r3324; 2026-02-21T12:56:17.2436066Z cvt.rn.bf16x2.f32 %r3248, %r3327, %r3326; 2026-02-21T12:56:17.2436138Z cvt.rn.bf16x2.f32 %r3249, %r3329, %r3328; 2026-02-21T12:56:17.2436208Z cvt.rn.bf16x2.f32 %r3250, %r3331, %r3330; 2026-02-21T12:56:17.2436285Z cvt.rn.bf16x2.f32 %r3251, %r3333, %r3332; 2026-02-21T12:56:17.2436354Z cvt.rn.bf16x2.f32 %r3252, %r3335, %r3334; 2026-02-21T12:56:17.2436424Z cvt.rn.bf16x2.f32 %r3253, %r3337, %r3336; 2026-02-21T12:56:17.2436622Z cvt.rn.bf16x2.f32 %r3254, %r3339, %r3338; 2026-02-21T12:56:17.2436702Z cvt.rn.bf16x2.f32 %r3255, %r3341, %r3340; 2026-02-21T12:56:17.2436783Z cvt.rn.bf16x2.f32 %r3256, %r3343, %r3342; 2026-02-21T12:56:17.2436859Z cvt.rn.bf16x2.f32 %r3257, %r3345, %r3344; 2026-02-21T12:56:17.2436934Z cvt.rn.bf16x2.f32 %r3258, %r3347, %r3346; 2026-02-21T12:56:17.2437004Z cvt.rn.bf16x2.f32 %r3259, %r3349, %r3348; 2026-02-21T12:56:17.2437078Z cvt.rn.bf16x2.f32 %r3260, %r3351, %r3350; 2026-02-21T12:56:17.2437153Z cvt.rn.bf16x2.f32 %r3261, %r3353, %r3352; 2026-02-21T12:56:17.2437226Z cvt.rn.bf16x2.f32 %r3262, %r3355, %r3354; 2026-02-21T12:56:17.2437296Z cvt.rn.bf16x2.f32 %r3263, %r3357, %r3356; 2026-02-21T12:56:17.2437378Z cvt.rn.bf16x2.f32 %r3264, %r3359, %r3358; 2026-02-21T12:56:17.2437454Z cvt.rn.bf16x2.f32 %r3265, %r3361, %r3360; 2026-02-21T12:56:17.2437524Z cvt.rn.bf16x2.f32 %r3266, %r3363, %r3362; 2026-02-21T12:56:17.2437593Z cvt.rn.bf16x2.f32 %r3267, %r3365, %r3364; 2026-02-21T12:56:17.2437672Z cvt.rn.bf16x2.f32 %r3268, %r3367, %r3366; 
2026-02-21T12:56:17.2437743Z cvt.rn.bf16x2.f32 %r3269, %r3369, %r3368; 2026-02-21T12:56:17.2437811Z cvt.rn.bf16x2.f32 %r3270, %r3371, %r3370; 2026-02-21T12:56:17.2437883Z cvt.rn.bf16x2.f32 %r3271, %r3373, %r3372; 2026-02-21T12:56:17.2437954Z cvt.rn.bf16x2.f32 %r3272, %r3375, %r3374; 2026-02-21T12:56:17.2438024Z cvt.rn.bf16x2.f32 %r3273, %r3377, %r3376; 2026-02-21T12:56:17.2438179Z cvt.rn.bf16x2.f32 %r3274, %r3379, %r3378; 2026-02-21T12:56:17.2438269Z cvt.rn.bf16x2.f32 %r3275, %r3381, %r3380; 2026-02-21T12:56:17.2438343Z cvt.rn.bf16x2.f32 %r3276, %r3383, %r3382; 2026-02-21T12:56:17.2438415Z cvt.rn.bf16x2.f32 %r3277, %r3385, %r3384; 2026-02-21T12:56:17.2438489Z cvt.rn.bf16x2.f32 %r3278, %r3387, %r3386; 2026-02-21T12:56:17.2438559Z cvt.rn.bf16x2.f32 %r3279, %r3389, %r3388; 2026-02-21T12:56:17.2438629Z cvt.rn.bf16x2.f32 %r3280, %r3391, %r3390; 2026-02-21T12:56:17.2438699Z cvt.rn.bf16x2.f32 %r3281, %r3393, %r3392; 2026-02-21T12:56:17.2438773Z cvt.rn.bf16x2.f32 %r3282, %r3395, %r3394; 2026-02-21T12:56:17.2438842Z cvt.rn.bf16x2.f32 %r3283, %r3397, %r3396; 2026-02-21T12:56:17.2438912Z cvt.rn.bf16x2.f32 %r3284, %r3399, %r3398; 2026-02-21T12:56:17.2438986Z cvt.rn.bf16x2.f32 %r3285, %r3401, %r3400; 2026-02-21T12:56:17.2439065Z cvt.rn.bf16x2.f32 %r3286, %r3403, %r3402; 2026-02-21T12:56:17.2439137Z cvt.rn.bf16x2.f32 %r3287, %r3405, %r3404; 2026-02-21T12:56:17.2439214Z cvt.rn.bf16x2.f32 %r3288, %r3407, %r3406; 2026-02-21T12:56:17.2439284Z cvt.rn.bf16x2.f32 %r3289, %r3409, %r3408; 2026-02-21T12:56:17.2439482Z cvt.rn.bf16x2.f32 %r3290, %r3411, %r3410; 2026-02-21T12:56:17.2439557Z cvt.rn.bf16x2.f32 %r3291, %r3413, %r3412; 2026-02-21T12:56:17.2439629Z cvt.rn.bf16x2.f32 %r3292, %r3415, %r3414; 2026-02-21T12:56:17.2439699Z cvt.rn.bf16x2.f32 %r3293, %r3417, %r3416; 2026-02-21T12:56:17.2439768Z cvt.rn.bf16x2.f32 %r3294, %r3419, %r3418; 2026-02-21T12:56:17.2439841Z cvt.rn.bf16x2.f32 %r3295, %r3421, %r3420; 2026-02-21T12:56:17.2439910Z cvt.rn.bf16x2.f32 %r3296, %r3423, %r3422; 
2026-02-21T12:56:17.2439979Z cvt.rn.bf16x2.f32 %r3297, %r3425, %r3424; 2026-02-21T12:56:17.2440052Z cvt.rn.bf16x2.f32 %r3298, %r3427, %r3426; 2026-02-21T12:56:17.2440123Z cvt.rn.bf16x2.f32 %r3299, %r3429, %r3428; 2026-02-21T12:56:17.2440337Z .loc 1 88 22 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:88:22 2026-02-21T12:56:17.2440410Z mad.lo.s64 %rd240, %rd35, 2560, %rd75; 2026-02-21T12:56:17.2440483Z shl.b64 %rd241, %rd239, 1; 2026-02-21T12:56:17.2440616Z add.s64 %rd223, %rd240, %rd241; 2026-02-21T12:56:17.2440686Z mad.lo.s64 %rd242, %rd36, 2560, %rd75; 2026-02-21T12:56:17.2440758Z add.s64 %rd224, %rd242, %rd241; 2026-02-21T12:56:17.2440832Z mad.lo.s64 %rd243, %rd37, 2560, %rd75; 2026-02-21T12:56:17.2440894Z add.s64 %rd225, %rd243, %rd241; 2026-02-21T12:56:17.2440961Z mad.lo.s64 %rd244, %rd38, 2560, %rd75; 2026-02-21T12:56:17.2441028Z add.s64 %rd226, %rd244, %rd241; 2026-02-21T12:56:17.2441094Z mad.lo.s64 %rd245, %rd39, 2560, %rd75; 2026-02-21T12:56:17.2441155Z add.s64 %rd227, %rd245, %rd241; 2026-02-21T12:56:17.2441224Z mad.lo.s64 %rd246, %rd40, 2560, %rd75; 2026-02-21T12:56:17.2441289Z add.s64 %rd228, %rd246, %rd241; 2026-02-21T12:56:17.2441356Z mad.lo.s64 %rd247, %rd41, 2560, %rd75; 2026-02-21T12:56:17.2441424Z add.s64 %rd229, %rd247, %rd241; 2026-02-21T12:56:17.2441491Z mad.lo.s64 %rd248, %rd42, 2560, %rd75; 2026-02-21T12:56:17.2441553Z add.s64 %rd230, %rd248, %rd241; 2026-02-21T12:56:17.2441630Z mad.lo.s64 %rd249, %rd43, 2560, %rd75; 2026-02-21T12:56:17.2441702Z add.s64 %rd231, %rd249, %rd241; 2026-02-21T12:56:17.2441768Z mad.lo.s64 %rd250, %rd44, 2560, %rd75; 2026-02-21T12:56:17.2441833Z add.s64 %rd232, %rd250, %rd241; 2026-02-21T12:56:17.2441918Z mad.lo.s64 %rd251, %rd45, 2560, %rd75; 2026-02-21T12:56:17.2441989Z add.s64 %rd233, %rd251, %rd241; 2026-02-21T12:56:17.2442055Z mad.lo.s64 %rd252, %rd46, 2560, %rd75; 2026-02-21T12:56:17.2442120Z add.s64 %rd234, %rd252, %rd241; 2026-02-21T12:56:17.2442188Z mad.lo.s64 %rd253, %rd47, 2560, %rd75; 
2026-02-21T12:56:17.2442249Z add.s64 %rd235, %rd253, %rd241; 2026-02-21T12:56:17.2442316Z mad.lo.s64 %rd254, %rd48, 2560, %rd75; 2026-02-21T12:56:17.2442381Z add.s64 %rd236, %rd254, %rd241; 2026-02-21T12:56:17.2442447Z mad.lo.s64 %rd255, %rd49, 2560, %rd75; 2026-02-21T12:56:17.2442510Z add.s64 %rd237, %rd255, %rd241; 2026-02-21T12:56:17.2442648Z mad.lo.s64 %rd256, %rd50, 2560, %rd75; 2026-02-21T12:56:17.2442710Z add.s64 %rd238, %rd256, %rd241; 2026-02-21T12:56:17.2442923Z .loc 1 88 81 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:88:81 2026-02-21T12:56:17.2443040Z st.shared.v4.b32 [%r102], {%r3236, %r3238, %r3240, %r3242}; 2026-02-21T12:56:17.2443150Z st.shared.v4.b32 [%r103], {%r3244, %r3246, %r3248, %r3250}; 2026-02-21T12:56:17.2443253Z st.shared.v4.b32 [%r104], {%r3252, %r3254, %r3256, %r3258}; 2026-02-21T12:56:17.2443355Z st.shared.v4.b32 [%r105], {%r3260, %r3262, %r3264, %r3266}; 2026-02-21T12:56:17.2443425Z bar.sync 0; 2026-02-21T12:56:17.2443488Z // begin inline asm 2026-02-21T12:56:17.2443683Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3092, %r3093, %r3094, %r3095}, [%r3096]; 2026-02-21T12:56:17.2443747Z // end inline asm 2026-02-21T12:56:17.2443807Z // begin inline asm 2026-02-21T12:56:17.2443990Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3097, %r3098, %r3099, %r3100}, [%r3101]; 2026-02-21T12:56:17.2444049Z // end inline asm 2026-02-21T12:56:17.2444114Z // begin inline asm 2026-02-21T12:56:17.2444407Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3102, %r3103, %r3104, %r3105}, [%r3106]; 2026-02-21T12:56:17.2444465Z // end inline asm 2026-02-21T12:56:17.2444524Z // begin inline asm 2026-02-21T12:56:17.2444701Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3107, %r3108, %r3109, %r3110}, [%r3111]; 2026-02-21T12:56:17.2444757Z // end inline asm 2026-02-21T12:56:17.2444815Z bar.sync 0; 2026-02-21T12:56:17.2444919Z st.shared.v4.b32 [%r102], {%r3237, %r3239, %r3241, %r3243}; 2026-02-21T12:56:17.2445023Z st.shared.v4.b32 [%r103], {%r3245, %r3247, 
%r3249, %r3251}; 2026-02-21T12:56:17.2445123Z st.shared.v4.b32 [%r104], {%r3253, %r3255, %r3257, %r3259}; 2026-02-21T12:56:17.2445229Z st.shared.v4.b32 [%r105], {%r3261, %r3263, %r3265, %r3267}; 2026-02-21T12:56:17.2445285Z bar.sync 0; 2026-02-21T12:56:17.2445343Z // begin inline asm 2026-02-21T12:56:17.2445539Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3112, %r3113, %r3114, %r3115}, [%r3096]; 2026-02-21T12:56:17.2445597Z // end inline asm 2026-02-21T12:56:17.2445717Z // begin inline asm 2026-02-21T12:56:17.2445901Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3117, %r3118, %r3119, %r3120}, [%r3101]; 2026-02-21T12:56:17.2445961Z // end inline asm 2026-02-21T12:56:17.2446019Z // begin inline asm 2026-02-21T12:56:17.2446199Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3122, %r3123, %r3124, %r3125}, [%r3106]; 2026-02-21T12:56:17.2446258Z // end inline asm 2026-02-21T12:56:17.2446316Z // begin inline asm 2026-02-21T12:56:17.2446636Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3127, %r3128, %r3129, %r3130}, [%r3111]; 2026-02-21T12:56:17.2446710Z // end inline asm 2026-02-21T12:56:17.2446767Z bar.sync 0; 2026-02-21T12:56:17.2446873Z st.shared.v4.b32 [%r102], {%r3268, %r3270, %r3272, %r3274}; 2026-02-21T12:56:17.2446975Z st.shared.v4.b32 [%r103], {%r3276, %r3278, %r3280, %r3282}; 2026-02-21T12:56:17.2447083Z st.shared.v4.b32 [%r104], {%r3284, %r3286, %r3288, %r3290}; 2026-02-21T12:56:17.2447185Z st.shared.v4.b32 [%r105], {%r3292, %r3294, %r3296, %r3298}; 2026-02-21T12:56:17.2447244Z bar.sync 0; 2026-02-21T12:56:17.2447306Z // begin inline asm 2026-02-21T12:56:17.2447499Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3132, %r3133, %r3134, %r3135}, [%r3096]; 2026-02-21T12:56:17.2447556Z // end inline asm 2026-02-21T12:56:17.2447618Z // begin inline asm 2026-02-21T12:56:17.2447799Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3137, %r3138, %r3139, %r3140}, [%r3101]; 2026-02-21T12:56:17.2447854Z // end inline asm 2026-02-21T12:56:17.2447912Z // begin inline asm 
2026-02-21T12:56:17.2448094Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3142, %r3143, %r3144, %r3145}, [%r3106]; 2026-02-21T12:56:17.2448150Z // end inline asm 2026-02-21T12:56:17.2448208Z // begin inline asm 2026-02-21T12:56:17.2448389Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3147, %r3148, %r3149, %r3150}, [%r3111]; 2026-02-21T12:56:17.2448536Z // end inline asm 2026-02-21T12:56:17.2448592Z bar.sync 0; 2026-02-21T12:56:17.2448698Z st.shared.v4.b32 [%r102], {%r3269, %r3271, %r3273, %r3275}; 2026-02-21T12:56:17.2448821Z st.shared.v4.b32 [%r103], {%r3277, %r3279, %r3281, %r3283}; 2026-02-21T12:56:17.2448925Z st.shared.v4.b32 [%r104], {%r3285, %r3287, %r3289, %r3291}; 2026-02-21T12:56:17.2449026Z st.shared.v4.b32 [%r105], {%r3293, %r3295, %r3297, %r3299}; 2026-02-21T12:56:17.2449086Z bar.sync 0; 2026-02-21T12:56:17.2449148Z // begin inline asm 2026-02-21T12:56:17.2449326Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3152, %r3153, %r3154, %r3155}, [%r3096]; 2026-02-21T12:56:17.2449387Z // end inline asm 2026-02-21T12:56:17.2449446Z // begin inline asm 2026-02-21T12:56:17.2449624Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3157, %r3158, %r3159, %r3160}, [%r3101]; 2026-02-21T12:56:17.2449683Z // end inline asm 2026-02-21T12:56:17.2449745Z // begin inline asm 2026-02-21T12:56:17.2449922Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3162, %r3163, %r3164, %r3165}, [%r3106]; 2026-02-21T12:56:17.2449980Z // end inline asm 2026-02-21T12:56:17.2450041Z // begin inline asm 2026-02-21T12:56:17.2450351Z ldmatrix.sync.aligned.m8n8.x4.shared.b16 {%r3167, %r3168, %r3169, %r3170}, [%r3111]; 2026-02-21T12:56:17.2450422Z // end inline asm 2026-02-21T12:56:17.2450484Z // begin inline asm 2026-02-21T12:56:17.2450612Z st.global.v4.b32 [ %rd223 + 0 ], { %r3092, %r3093, %r3094, %r3095 }; 2026-02-21T12:56:17.2450669Z // end inline asm 2026-02-21T12:56:17.2450726Z // begin inline asm 2026-02-21T12:56:17.2450849Z st.global.v4.b32 [ %rd224 + 0 ], { %r3112, %r3113, %r3114, %r3115 }; 
2026-02-21T12:56:17.2450905Z // end inline asm 2026-02-21T12:56:17.2450964Z // begin inline asm 2026-02-21T12:56:17.2451085Z st.global.v4.b32 [ %rd225 + 0 ], { %r3097, %r3098, %r3099, %r3100 }; 2026-02-21T12:56:17.2451140Z // end inline asm 2026-02-21T12:56:17.2451198Z // begin inline asm 2026-02-21T12:56:17.2451312Z st.global.v4.b32 [ %rd226 + 0 ], { %r3117, %r3118, %r3119, %r3120 }; 2026-02-21T12:56:17.2451375Z // end inline asm 2026-02-21T12:56:17.2451432Z // begin inline asm 2026-02-21T12:56:17.2451625Z st.global.v4.b32 [ %rd227 + 0 ], { %r3102, %r3103, %r3104, %r3105 }; 2026-02-21T12:56:17.2451690Z // end inline asm 2026-02-21T12:56:17.2451757Z // begin inline asm 2026-02-21T12:56:17.2451874Z st.global.v4.b32 [ %rd228 + 0 ], { %r3122, %r3123, %r3124, %r3125 }; 2026-02-21T12:56:17.2451935Z // end inline asm 2026-02-21T12:56:17.2451993Z // begin inline asm 2026-02-21T12:56:17.2452105Z st.global.v4.b32 [ %rd229 + 0 ], { %r3107, %r3108, %r3109, %r3110 }; 2026-02-21T12:56:17.2452161Z // end inline asm 2026-02-21T12:56:17.2452221Z // begin inline asm 2026-02-21T12:56:17.2452332Z st.global.v4.b32 [ %rd230 + 0 ], { %r3127, %r3128, %r3129, %r3130 }; 2026-02-21T12:56:17.2452390Z // end inline asm 2026-02-21T12:56:17.2452453Z // begin inline asm 2026-02-21T12:56:17.2452564Z st.global.v4.b32 [ %rd231 + 0 ], { %r3132, %r3133, %r3134, %r3135 }; 2026-02-21T12:56:17.2452621Z // end inline asm 2026-02-21T12:56:17.2452679Z // begin inline asm 2026-02-21T12:56:17.2452799Z st.global.v4.b32 [ %rd232 + 0 ], { %r3152, %r3153, %r3154, %r3155 }; 2026-02-21T12:56:17.2452869Z // end inline asm 2026-02-21T12:56:17.2452929Z // begin inline asm 2026-02-21T12:56:17.2453046Z st.global.v4.b32 [ %rd233 + 0 ], { %r3137, %r3138, %r3139, %r3140 }; 2026-02-21T12:56:17.2453103Z // end inline asm 2026-02-21T12:56:17.2453160Z // begin inline asm 2026-02-21T12:56:17.2453276Z st.global.v4.b32 [ %rd234 + 0 ], { %r3157, %r3158, %r3159, %r3160 }; 2026-02-21T12:56:17.2453342Z // end inline asm 
2026-02-21T12:56:17.2453402Z // begin inline asm 2026-02-21T12:56:17.2453516Z st.global.v4.b32 [ %rd235 + 0 ], { %r3142, %r3143, %r3144, %r3145 }; 2026-02-21T12:56:17.2453575Z // end inline asm 2026-02-21T12:56:17.2453633Z // begin inline asm 2026-02-21T12:56:17.2453745Z st.global.v4.b32 [ %rd236 + 0 ], { %r3162, %r3163, %r3164, %r3165 }; 2026-02-21T12:56:17.2453888Z // end inline asm 2026-02-21T12:56:17.2453958Z // begin inline asm 2026-02-21T12:56:17.2454074Z st.global.v4.b32 [ %rd237 + 0 ], { %r3147, %r3148, %r3149, %r3150 }; 2026-02-21T12:56:17.2454136Z // end inline asm 2026-02-21T12:56:17.2454196Z // begin inline asm 2026-02-21T12:56:17.2454308Z st.global.v4.b32 [ %rd238 + 0 ], { %r3167, %r3168, %r3169, %r3170 }; 2026-02-21T12:56:17.2454363Z // end inline asm 2026-02-21T12:56:17.2454591Z .loc 1 19 103 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:19:103 2026-02-21T12:56:17.2454668Z add.s64 %rd72, %rd257, 4224; 2026-02-21T12:56:17.2454741Z setp.lt.u64 %p22, %rd257, 16256; 2026-02-21T12:56:17.2454808Z mov.b64 %rd257, %rd72; 2026-02-21T12:56:17.2454868Z @%p22 bra $L__BB0_2; 2026-02-21T12:56:17.2454957Z $L__BB0_5: // %._crit_edge 2026-02-21T12:56:17.2455163Z .loc 1 19 4 // cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py:19:4 2026-02-21T12:56:17.2455222Z ret; 2026-02-21T12:56:17.2455281Z $L__tmp2: 2026-02-21T12:56:17.2455337Z $L__func_end0: 2026-02-21T12:56:17.2455428Z // -- End function 2026-02-21T12:56:17.2455548Z } 2026-02-21T12:56:17.2455852Z .file 1 "/tmp/torchinductor_root/gc/cgcwgwe74bzskzgsca2z2ztghivlqtumxvqoippqcjaqduvx4x4w.py" 2026-02-21T12:56:17.2456070Z .file 2 "/__w/helion/helion/.venv/lib/python3.12/site-packages/triton/language/standard.py" 2026-02-21T12:56:17.2456140Z .section .debug_abbrev 2026-02-21T12:56:17.2456192Z { 2026-02-21T12:56:17.2456285Z .b8 1 // Abbreviation Code 2026-02-21T12:56:17.2456392Z .b8 17 // DW_TAG_compile_unit 2026-02-21T12:56:17.2456622Z .b8 1 // DW_CHILDREN_yes 2026-02-21T12:56:17.2456716Z .b8 37 // 
DW_AT_producer 2026-02-21T12:56:17.2456803Z .b8 8 // DW_FORM_string 2026-02-21T12:56:17.2456885Z .b8 19 // DW_AT_language 2026-02-21T12:56:17.2456972Z .b8 5 // DW_FORM_data2 2026-02-21T12:56:17.2457136Z .b8 3 // DW_AT_name 2026-02-21T12:56:17.2457231Z .b8 8 // DW_FORM_string 2026-02-21T12:56:17.2457316Z .b8 16 // DW_AT_stmt_list 2026-02-21T12:56:17.2457397Z .b8 6 // DW_FORM_data4 2026-02-21T12:56:17.2457494Z .b8 27 // DW_AT_comp_dir 2026-02-21T12:56:17.2457575Z .b8 8 // DW_FORM_string 2026-02-21T12:56:17.2457650Z .b8 0 // EOM(1) 2026-02-21T12:56:17.2457724Z .b8 0 // EOM(2) 2026-02-21T12:56:17.2457813Z .b8 2 // Abbreviation Code 2026-02-21T12:56:17.2457900Z .b8 46 // DW_TAG_subprogram 2026-02-21T12:56:17.2457987Z .b8 0 // DW_CHILDREN_no 2026-02-21T12:56:17.2458065Z .b8 3 // DW_AT_name 2026-02-21T12:56:17.2458146Z .b8 8 // DW_FORM_string 2026-02-21T12:56:17.2458229Z .b8 32 // DW_AT_inline 2026-02-21T12:56:17.2458314Z .b8 11 // DW_FORM_data1 2026-02-21T12:56:17.2458385Z .b8 0 // EOM(1) 2026-02-21T12:56:17.2458454Z .b8 0 // EOM(2) 2026-02-21T12:56:17.2458543Z .b8 3 // Abbreviation Code 2026-02-21T12:56:17.2458627Z .b8 46 // DW_TAG_subprogram 2026-02-21T12:56:17.2458710Z .b8 1 // DW_CHILDREN_yes 2026-02-21T12:56:17.2458791Z .b8 17 // DW_AT_low_pc 2026-02-21T12:56:17.2458868Z .b8 1 // DW_FORM_addr 2026-02-21T12:56:17.2458949Z .b8 18 // DW_AT_high_pc 2026-02-21T12:56:17.2459117Z .b8 1 // DW_FORM_addr 2026-02-21T12:56:17.2459221Z .b8 49 // DW_AT_abstract_origin 2026-02-21T12:56:17.2459300Z .b8 19 // DW_FORM_ref4 2026-02-21T12:56:17.2459370Z .b8 0 // EOM(1) 2026-02-21T12:56:17.2459441Z .b8 0 // EOM(2) 2026-02-21T12:56:17.2459526Z .b8 4 // Abbreviation Code 2026-02-21T12:56:17.2459626Z .b8 29 // DW_TAG_inlined_subroutine 2026-02-21T12:56:17.2459709Z .b8 0 // DW_CHILDREN_no 2026-02-21T12:56:17.2459800Z .b8 49 // DW_AT_abstract_origin 2026-02-21T12:56:17.2459881Z .b8 19 // DW_FORM_ref4 2026-02-21T12:56:17.2459961Z .b8 17 // DW_AT_low_pc 2026-02-21T12:56:17.2460042Z .b8 1 // 
DW_FORM_addr 2026-02-21T12:56:17.2460121Z .b8 18 // DW_AT_high_pc 2026-02-21T12:56:17.2460329Z .b8 1 // DW_FORM_addr 2026-02-21T12:56:17.2460417Z .b8 88 // DW_AT_call_file 2026-02-21T12:56:17.2460494Z .b8 11 // DW_FORM_data1 2026-02-21T12:56:17.2460574Z .b8 89 // DW_AT_call_line 2026-02-21T12:56:17.2460657Z .b8 11 // DW_FORM_data1 2026-02-21T12:56:17.2460741Z .b8 87 // DW_AT_call_column 2026-02-21T12:56:17.2460817Z .b8 11 // DW_FORM_data1 2026-02-21T12:56:17.2460891Z .b8 0 // EOM(1) 2026-02-21T12:56:17.2460961Z .b8 0 // EOM(2) 2026-02-21T12:56:17.2461030Z .b8 0 // EOM(3) 2026-02-21T12:56:17.2461084Z } 2026-02-21T12:56:17.2461149Z .section .debug_info 2026-02-21T12:56:17.2461203Z { 2026-02-21T12:56:17.2461292Z .b32 178 // Length of Unit 2026-02-21T12:56:17.2461443Z .b8 2 // DWARF version number 2026-02-21T12:56:17.2461496Z .b8 0 2026-02-21T12:56:17.2461641Z .b32 .debug_abbrev // Offset Into Abbrev. Section 2026-02-21T12:56:17.2461737Z .b8 8 // Address Size (in bytes) 2026-02-21T12:56:17.2461857Z .b8 1 // Abbrev [1] 0xb:0xab DW_TAG_compile_unit 2026-02-21T12:56:17.2461943Z .b8 116 // DW_AT_producer 2026-02-21T12:56:17.2461996Z .b8 114 2026-02-21T12:56:17.2462053Z .b8 105 2026-02-21T12:56:17.2462106Z .b8 116 2026-02-21T12:56:17.2462157Z .b8 111 2026-02-21T12:56:17.2462212Z .b8 110 2026-02-21T12:56:17.2462263Z .b8 0 2026-02-21T12:56:17.2462351Z .b8 2 // DW_AT_language 2026-02-21T12:56:17.2462405Z .b8 0 2026-02-21T12:56:17.2462487Z .b8 99 // DW_AT_name 2026-02-21T12:56:17.2462540Z .b8 103 2026-02-21T12:56:17.2462593Z .b8 99 2026-02-21T12:56:17.2462650Z .b8 119 2026-02-21T12:56:17.2462700Z .b8 103 2026-02-21T12:56:17.2462751Z .b8 119 2026-02-21T12:56:17.2462801Z .b8 101 2026-02-21T12:56:17.2462854Z .b8 55 2026-02-21T12:56:17.2462903Z .b8 52 2026-02-21T12:56:17.2462954Z .b8 98 2026-02-21T12:56:17.2463005Z .b8 122 2026-02-21T12:56:17.2463070Z .b8 115 2026-02-21T12:56:17.2463124Z .b8 107 2026-02-21T12:56:17.2463176Z .b8 122 2026-02-21T12:56:17.2463231Z .b8 103 
2026-02-21T12:56:17.2463282Z .b8 115 2026-02-21T12:56:17.2463342Z .b8 99 2026-02-21T12:56:17.2463394Z .b8 97 2026-02-21T12:56:17.2463449Z .b8 50 2026-02-21T12:56:17.2463501Z .b8 122 2026-02-21T12:56:17.2463551Z .b8 50 2026-02-21T12:56:17.2463605Z .b8 122 2026-02-21T12:56:17.2463656Z .b8 116 2026-02-21T12:56:17.2463706Z .b8 103 2026-02-21T12:56:17.2463758Z .b8 104 2026-02-21T12:56:17.2463814Z .b8 105 2026-02-21T12:56:17.2463937Z .b8 118 2026-02-21T12:56:17.2463989Z .b8 108 2026-02-21T12:56:17.2464039Z .b8 113 2026-02-21T12:56:17.2464097Z .b8 116 2026-02-21T12:56:17.2464150Z .b8 117 2026-02-21T12:56:17.2464204Z .b8 109 2026-02-21T12:56:17.2464259Z .b8 120 2026-02-21T12:56:17.2464311Z .b8 118 2026-02-21T12:56:17.2464361Z .b8 113 2026-02-21T12:56:17.2464412Z .b8 111 2026-02-21T12:56:17.2464466Z .b8 105 2026-02-21T12:56:17.2464517Z .b8 112 2026-02-21T12:56:17.2464569Z .b8 112 2026-02-21T12:56:17.2464622Z .b8 113 2026-02-21T12:56:17.2464674Z .b8 99 2026-02-21T12:56:17.2464727Z .b8 106 2026-02-21T12:56:17.2464777Z .b8 97 2026-02-21T12:56:17.2464832Z .b8 113 2026-02-21T12:56:17.2464883Z .b8 100 2026-02-21T12:56:17.2464934Z .b8 117 2026-02-21T12:56:17.2464988Z .b8 118 2026-02-21T12:56:17.2465039Z .b8 120 2026-02-21T12:56:17.2465090Z .b8 52 2026-02-21T12:56:17.2465144Z .b8 120 2026-02-21T12:56:17.2465200Z .b8 52 2026-02-21T12:56:17.2465251Z .b8 119 2026-02-21T12:56:17.2465302Z .b8 46 2026-02-21T12:56:17.2465364Z .b8 112 2026-02-21T12:56:17.2465424Z .b8 121 2026-02-21T12:56:17.2465476Z .b8 0 2026-02-21T12:56:17.2465577Z .b32 .debug_line // DW_AT_stmt_list 2026-02-21T12:56:17.2465765Z .b8 47 // DW_AT_comp_dir 2026-02-21T12:56:17.2465818Z .b8 116 2026-02-21T12:56:17.2465880Z .b8 109 2026-02-21T12:56:17.2465934Z .b8 112 2026-02-21T12:56:17.2465991Z .b8 47 2026-02-21T12:56:17.2466042Z .b8 116 2026-02-21T12:56:17.2466092Z .b8 111 2026-02-21T12:56:17.2466151Z .b8 114 2026-02-21T12:56:17.2466204Z .b8 99 2026-02-21T12:56:17.2466256Z .b8 104 2026-02-21T12:56:17.2466307Z .b8 105 
2026-02-21T12:56:17.2466364Z .b8 110 2026-02-21T12:56:17.2466416Z .b8 100 2026-02-21T12:56:17.2466594Z .b8 117 2026-02-21T12:56:17.2466656Z .b8 99 2026-02-21T12:56:17.2466712Z .b8 116 2026-02-21T12:56:17.2466764Z .b8 111 2026-02-21T12:56:17.2466817Z .b8 114 2026-02-21T12:56:17.2466873Z .b8 95 2026-02-21T12:56:17.2466925Z .b8 114 2026-02-21T12:56:17.2466978Z .b8 111 2026-02-21T12:56:17.2467033Z .b8 111 2026-02-21T12:56:17.2467094Z .b8 116 2026-02-21T12:56:17.2467147Z .b8 47 2026-02-21T12:56:17.2467211Z .b8 103 2026-02-21T12:56:17.2467267Z .b8 99 2026-02-21T12:56:17.2467320Z .b8 0 2026-02-21T12:56:17.2467515Z .b8 2 // Abbrev [2] 0x6c:0x1b DW_TAG_subprogram 2026-02-21T12:56:17.2467604Z .b8 95 // DW_AT_name 2026-02-21T12:56:17.2467674Z .b8 104 2026-02-21T12:56:17.2467728Z .b8 101 2026-02-21T12:56:17.2467782Z .b8 108 2026-02-21T12:56:17.2467839Z .b8 105 2026-02-21T12:56:17.2467892Z .b8 111 2026-02-21T12:56:17.2467943Z .b8 110 2026-02-21T12:56:17.2467994Z .b8 95 2026-02-21T12:56:17.2468052Z .b8 109 2026-02-21T12:56:17.2468104Z .b8 97 2026-02-21T12:56:17.2468157Z .b8 116 2026-02-21T12:56:17.2468213Z .b8 109 2026-02-21T12:56:17.2468265Z .b8 117 2026-02-21T12:56:17.2468401Z .b8 108 2026-02-21T12:56:17.2468456Z .b8 95 2026-02-21T12:56:17.2468517Z .b8 98 2026-02-21T12:56:17.2468570Z .b8 102 2026-02-21T12:56:17.2468630Z .b8 49 2026-02-21T12:56:17.2468689Z .b8 54 2026-02-21T12:56:17.2468741Z .b8 95 2026-02-21T12:56:17.2468792Z .b8 105 2026-02-21T12:56:17.2468844Z .b8 110 2026-02-21T12:56:17.2468901Z .b8 116 2026-02-21T12:56:17.2468967Z .b8 52 2026-02-21T12:56:17.2469023Z .b8 0 2026-02-21T12:56:17.2469108Z .b8 1 // DW_AT_inline 2026-02-21T12:56:17.2469220Z .b8 3 // Abbrev [3] 0x87:0x2e DW_TAG_subprogram 2026-02-21T12:56:17.2469314Z .b64 $L__func_begin0 // DW_AT_low_pc 2026-02-21T12:56:17.2469414Z .b64 $L__func_end0 // DW_AT_high_pc 2026-02-21T12:56:17.2469522Z .b32 108 // DW_AT_abstract_origin 2026-02-21T12:56:17.2469654Z .b8 4 // Abbrev [4] 0x9c:0x18 DW_TAG_inlined_subroutine 
2026-02-21T12:56:17.2469752Z .b32 108 // DW_AT_abstract_origin 2026-02-21T12:56:17.2469845Z .b64 $L__tmp0 // DW_AT_low_pc 2026-02-21T12:56:17.2469935Z .b64 $L__tmp1 // DW_AT_high_pc 2026-02-21T12:56:17.2470111Z .b8 1 // DW_AT_call_file 2026-02-21T12:56:17.2470204Z .b8 84 // DW_AT_call_line 2026-02-21T12:56:17.2470306Z .b8 40 // DW_AT_call_column 2026-02-21T12:56:17.2470402Z .b8 0 // End Of Children Mark 2026-02-21T12:56:17.2470491Z .b8 0 // End Of Children Mark 2026-02-21T12:56:17.2470549Z } 2026-02-21T12:56:17.2470621Z .section .debug_macinfo { } 2026-02-21T12:56:17.2470627Z 2026-02-21T12:56:17.2470708Z ================================================================ 2026-02-21T12:56:17.2470832Z please share the reproducer above with Triton project. 2026-02-21T12:56:40.8114366Z 2026-02-21T12:56:40.8115282Z Generation 8: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━━ 56/56 1.5 configs/s 2026-02-21T12:56:41.3748188Z Generation 8: verifying top configs 100% ━━━━━━━━━━━━━━━━━━━━━━━ 7/7 - configs/s 2026-02-21T12:56:45.3039494Z [9104s] Generation 8 complete: 2026-02-21T12:56:45.3039812Z error=14 2026-02-21T12:56:45.3039989Z ok=45 2026-02-21T12:56:45.3040786Z min=26.3145 2026-02-21T12:56:45.3041176Z mid=53.3515 2026-02-21T12:56:45.3041339Z max=765.7447 2026-02-21T12:56:45.3041533Z best={'block_sizes': [32, 64, 256], 2026-02-21T12:56:45.3041826Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T12:56:45.3042125Z 'l2_groupings': [1], 2026-02-21T12:56:45.3042353Z 'load_eviction_policies': ['last', ''], 2026-02-21T12:56:45.3042614Z 'loop_orders': [[1, 0]], 2026-02-21T12:56:45.3042826Z 'num_stages': 2, 2026-02-21T12:56:45.3043010Z 'num_warps': 4, 2026-02-21T12:56:45.3043198Z 'pid_type': 'flat', 2026-02-21T12:56:45.3043401Z 'range_flattens': [None, None], 2026-02-21T12:56:45.3043659Z 'range_multi_buffers': [None, True], 2026-02-21T12:56:45.3043916Z 'range_num_stages': [0, 0], 2026-02-21T12:56:45.3044151Z 'range_unroll_factors': [0, 1], 2026-02-21T12:56:45.3044398Z 
'range_warp_specializes': []} 2026-02-21T12:56:45.3084056Z [9104s] Fitting surrogate: 826 points, 826 targets 2026-02-21T12:56:45.6415823Z [9104s] Generation 9 starting: 12 neighbors, 1 active search path(s) 2026-02-21T12:56:50.6174949Z Generation 9: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 12/12 5.3 configs/s 2026-02-21T12:56:54.7266109Z Generation 9: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━━ 12/12 2.7 configs/s 2026-02-21T12:56:54.8729257Z Generation 9: verifying top configs 100% ━━━━━━━━━━━━━━━━━━━━━━━ 7/7 - configs/s 2026-02-21T12:56:55.8725811Z [9115s] Generation 9 complete: 2026-02-21T12:56:55.8726123Z error=6 2026-02-21T12:56:55.8726374Z ok=8 2026-02-21T12:56:55.8727113Z min=25.8132 2026-02-21T12:56:55.8727391Z mid=43.4807 2026-02-21T12:56:55.8727631Z max=124.8562 2026-02-21T12:56:55.8727901Z best={'block_sizes': [32, 64, 256], 2026-02-21T12:56:55.8728336Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T12:56:55.8728769Z 'l2_groupings': [1], 2026-02-21T12:56:55.8729110Z 'load_eviction_policies': ['last', ''], 2026-02-21T12:56:55.8729525Z 'loop_orders': [[1, 0]], 2026-02-21T12:56:55.8729844Z 'num_stages': 2, 2026-02-21T12:56:55.8730114Z 'num_warps': 4, 2026-02-21T12:56:55.8730425Z 'pid_type': 'flat', 2026-02-21T12:56:55.8730731Z 'range_flattens': [None, None], 2026-02-21T12:56:55.8731108Z 'range_multi_buffers': [None, True], 2026-02-21T12:56:55.8731496Z 'range_num_stages': [0, 0], 2026-02-21T12:56:55.8731855Z 'range_unroll_factors': [0, 1], 2026-02-21T12:56:55.8732205Z 'range_warp_specializes': []} 2026-02-21T12:56:55.8781070Z [9115s] Fitting surrogate: 840 points, 840 targets 2026-02-21T12:56:56.2654999Z [9115s] Generation 10 starting: 16 neighbors, 1 active search path(s) 2026-02-21T12:57:03.0809257Z Generation 10: precompiling 100% ━━━━━━━━━━━━━━━━━━━━━━━━━━━ 16/16 3.1 configs/s 2026-02-21T12:57:09.9327461Z Generation 10: exploring neighbors 100% ━━━━━━━━━━━━━━━━━━━━ 16/16 1.9 configs/s 2026-02-21T12:57:10.0004977Z Generation 10: 
verifying top configs 100% ━━━━━━━━━━━━━━━━━━━━━━ 7/7 - configs/s 2026-02-21T12:57:10.4470056Z [9129s] Generation 10 complete: 2026-02-21T12:57:10.4470367Z error=7 2026-02-21T12:57:10.4470547Z ok=11 2026-02-21T12:57:10.4471160Z min=25.9983 2026-02-21T12:57:10.4471357Z mid=64.0419 2026-02-21T12:57:10.4471522Z max=133.8478 2026-02-21T12:57:10.4471728Z best={'block_sizes': [32, 64, 256], 2026-02-21T12:57:10.4472049Z 'indexing': ['pointer', 'pointer', 'pointer'], 2026-02-21T12:57:10.4472368Z 'l2_groupings': [1], 2026-02-21T12:57:10.4472617Z 'load_eviction_policies': ['last', ''], 2026-02-21T12:57:10.4472888Z 'loop_orders': [[1, 0]], 2026-02-21T12:57:10.4473109Z 'num_stages': 2, 2026-02-21T12:57:10.4473304Z 'num_warps': 4, 2026-02-21T12:57:10.4473504Z 'pid_type': 'flat', 2026-02-21T12:57:10.4473719Z 'range_flattens': [None, None], 2026-02-21T12:57:10.4473998Z 'range_multi_buffers': [None, True], 2026-02-21T12:57:10.4474271Z 'range_num_stages': [0, 0], 2026-02-21T12:57:10.4474508Z 'range_unroll_factors': [0, 1], 2026-02-21T12:57:10.4474761Z 'range_warp_specializes': []} 2026-02-21T12:57:10.4517823Z [9129s] Fitting surrogate: 858 points, 858 targets 2026-02-21T12:57:10.6279411Z [9129s] Autotuning complete in 9129.9s after searching 817 configs. 
2026-02-21T12:57:10.6280238Z One can hardcode the best config and skip autotuning with: 2026-02-21T12:57:10.6281990Z @helion.kernel(config=helion.Config(block_sizes=[32, 64, 256], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[1], load_eviction_policies=['last', ''], loop_orders=[[1, 0]], num_stages=2, num_warps=4, pid_type='flat', range_flattens=[None, None], range_multi_buffers=[None, True], range_num_stages=[0, 0], range_unroll_factors=[0, 1], range_warp_specializes=[]), static_shapes=True) 2026-02-21T12:57:10.6283637Z 2026-02-21T12:57:10.6284051Z [9129s] Code of selected kernel: /tmp/torchinductor_root/ph/cpha2cojrzs5taqqbpdarc3dgk7vf7q42mn3wfsnhaizjp7ioqbx.py 2026-02-21T12:57:11.7041043Z WARNING:tritonbench.utils.triton_op:Completed input ID 28: 2026-02-21T12:57:11.7041431Z x_val 2026-02-21T12:57:11.7041627Z ---------------------- 2026-02-21T12:57:11.7041852Z (64, 4096, 1280, 8192) 2026-02-21T12:57:11.7041977Z 2026-02-21T12:57:11.7083834Z 90%|█████████ | 9/10 [4:38:52<1:05:36, 3936.54s/it]WARNING:tritonbench.utils.triton_op:Running input ID 31: 2026-02-21T12:57:11.7084314Z x_val 2026-02-21T12:57:11.7084479Z ---------------------- 2026-02-21T12:57:11.7084678Z (64, 4096, 8192, 3584) 2026-02-21T12:57:11.7143258Z INFO:tritonbench.utils.triton_op:Took 0.29ms to get benchmark function for preprocessed_eager_int4_gemm 2026-02-21T12:57:12.6800062Z INFO:tritonbench.utils.triton_op:Took 3.68ms to get benchmark function for preprocessed_torch_compile_int4_gemm 2026-02-21T12:57:29.1979948Z Autotune Choices Stats: 2026-02-21T12:57:29.1981332Z {"num_choices": 20, "num_triton_choices": 19, "best_kernel": "mm", "best_time": 22.76041603088379, "best_triton_pos": 1, "best_triton_time": 30.75257682800293, "best_triton_kernel": "triton_mm_145", "best_triton_kernel_desc": "ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8"} 2026-02-21T12:57:29.3610136Z AUTOTUNE 
mm(262144x3584, 3584x8192) 2026-02-21T12:57:29.3610507Z strides: [3584, 1], [8192, 1] 2026-02-21T12:57:29.3610779Z dtypes: torch.bfloat16, torch.bfloat16 2026-02-21T12:57:29.3611057Z mm 22.7604 ms 100.0% 2026-02-21T12:57:29.3611752Z triton_mm_145 30.7526 ms 74.0% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=5, num_warps=8 2026-02-21T12:57:29.3612924Z triton_mm_144 32.6344 ms 69.7% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T12:57:29.3614132Z triton_mm_143 34.0899 ms 66.8% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T12:57:29.3615656Z triton_mm_142 43.6877 ms 52.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=2, num_warps=8 2026-02-21T12:57:29.3617258Z triton_mm_139 45.6358 ms 49.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=128, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=4 2026-02-21T12:57:29.3618368Z triton_mm_138 47.8874 ms 47.5% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=64, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T12:57:29.3619436Z triton_mm_141 49.5655 ms 45.9% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2026-02-21T12:57:29.3620508Z triton_mm_140 50.4129 ms 45.1% ACC_TYPE='tl.float32', ALLOW_TF32=False, BLOCK_K=32, BLOCK_M=128, BLOCK_N=64, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=3, num_warps=4 2026-02-21T12:57:29.3621808Z triton_mm_137 51.2420 ms 44.4% ACC_TYPE='tl.float32', ALLOW_TF32=False, 
BLOCK_K=32, BLOCK_M=64, BLOCK_N=128, EVEN_K=True, GROUP_M=8, USE_FAST_ACCUM=False, num_stages=4, num_warps=8 2026-02-21T12:57:29.3622709Z SingleProcess AUTOTUNE benchmarking takes 10.9861 seconds and 5.4323 seconds precompiling for 20 choices 2026-02-21T12:57:46.8651359Z INFO:tritonbench.utils.triton_op:Took 0.23ms to get benchmark function for preprocessed_triton_int4_gemm 2026-02-21T12:57:52.8685284Z WARNING:__main__:Input tensor metadata: 2026-02-21T12:57:52.8688071Z { 'args': ( { 'device': 'cuda:0', 2026-02-21T12:57:52.8688372Z 'dtype': 'torch.bfloat16', 2026-02-21T12:57:52.8694385Z 'shape': (64, 4096, 3584), 2026-02-21T12:57:52.8694733Z 'stride': (14680064, 3584, 1)}, 2026-02-21T12:57:52.8695011Z { 'device': 'cuda:0', 2026-02-21T12:57:52.8695267Z 'dtype': 'torch.int32', 2026-02-21T12:57:52.8695524Z 'shape': (3584, 8192), 2026-02-21T12:57:52.8695788Z 'stride': (8192, 1)}), 2026-02-21T12:57:52.8696031Z 'kwargs': {}} 2026-02-21T12:57:52.8778749Z INFO:tritonbench.utils.triton_op:Took 9.87ms to get benchmark function for helion_int4_gemm_tritonbench 2026-02-21T12:57:53.3109487Z [0s] Autotune random seed: 2135373392 2026-02-21T12:57:54.5237643Z [0s] Starting LFBOPatternSearch with initial_population=FROM_RANDOM, copies=5, max_generations=20, similarity_penalty=1.0 2026-02-21T12:58:31.2035425Z [36s] Timeout after 33s compiling Config(block_sizes=[16, 8192, 2], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['first', 'last'], loop_orders=[[1, 0]], maxnreg=256, num_sm_multiplier=4, num_stages=3, num_warps=8, pid_type='persistent_interleaved', range_flattens=[None, None], range_multi_buffers=[None, None], range_num_stages=[1, 3], range_unroll_factors=[0, 2], range_warp_specializes=[]) 2026-02-21T12:58:35.8690386Z [41s] Timeout after 33s compiling Config(block_sizes=[4, 32768, 1], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[16], load_eviction_policies=['last', 'last'], loop_orders=[[1, 0]], 
maxnreg=128, num_sm_multiplier=1, num_stages=5, num_warps=8, pid_type='persistent_blocked', range_flattens=[False, True], range_multi_buffers=[None, None], range_num_stages=[1, 4], range_unroll_factors=[3, 1], range_warp_specializes=[]) 2026-02-21T12:58:40.0151658Z [45s] Timeout after 33s compiling Config(block_sizes=[32, 8, 2048], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[64], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=4, num_stages=7, num_warps=8, pid_type='persistent_interleaved', range_flattens=[False, True], range_multi_buffers=[True, True], range_num_stages=[4, 2], range_unroll_factors=[0, 3], range_warp_specializes=[]) 2026-02-21T12:58:40.4297115Z [45s] Timeout after 33s compiling Config(block_sizes=[16, 16384, 1], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[32], load_eviction_policies=['first', 'last'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=32, num_stages=7, num_warps=32, pid_type='persistent_interleaved', range_flattens=[False, True], range_multi_buffers=[False, False], range_num_stages=[3, 3], range_unroll_factors=[3, 0], range_warp_specializes=[]) 2026-02-21T12:58:40.9908213Z [46s] Timeout after 30s compiling Config(block_sizes=[64, 4096, 1], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[2], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], maxnreg=64, num_sm_multiplier=32, num_stages=8, num_warps=8, pid_type='persistent_blocked', range_flattens=[False, None], range_multi_buffers=[False, True], range_num_stages=[3, 0], range_unroll_factors=[3, 1], range_warp_specializes=[]) 2026-02-21T12:58:42.2617587Z [47s] Timeout after 30s compiling Config(block_sizes=[32, 1, 2], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[1], load_eviction_policies=['last', 'first'], loop_orders=[[1, 0]], num_stages=5, num_warps=8, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, 
False], range_num_stages=[0, 3], range_unroll_factors=[0, 0], range_warp_specializes=[]) 2026-02-21T12:58:43.4249940Z [48s] Timeout after 30s compiling Config(block_sizes=[256, 4, 128], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[2], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], num_stages=1, num_warps=32, pid_type='flat', range_flattens=[None, True], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 3], range_warp_specializes=[]) 2026-02-21T12:58:44.5884694Z [50s] Timeout after 30s compiling Config(block_sizes=[2, 2048, 1], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[8], load_eviction_policies=['last', 'first'], loop_orders=[[0, 1]], maxnreg=256, num_sm_multiplier=128, num_stages=2, num_warps=2, pid_type='persistent_interleaved', range_flattens=[True, True], range_multi_buffers=[None, None], range_num_stages=[4, 2], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T12:58:44.6280349Z [50s] Timeout after 31s compiling Config(block_sizes=[1, 16, 4], indexing=['pointer', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['', 'last'], loop_orders=[[1, 0]], num_stages=1, num_warps=4, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, False], range_num_stages=[0, 2], range_unroll_factors=[0, 4], range_warp_specializes=[]) 2026-02-21T12:58:45.0687202Z [50s] Timeout after 30s compiling Config(block_sizes=[32, 128, 4], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[1], load_eviction_policies=['first', 'first'], loop_orders=[[0, 1]], maxnreg=32, num_sm_multiplier=8, num_stages=3, num_warps=1, pid_type='persistent_blocked', range_flattens=[False, False], range_multi_buffers=[False, False], range_num_stages=[0, 0], range_unroll_factors=[1, 2], range_warp_specializes=[]) 2026-02-21T12:58:47.6333800Z [53s] Timeout after 30s compiling Config(block_sizes=[1, 65536, 1], 
indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[2], load_eviction_policies=['', 'first'], loop_orders=[[1, 0]], num_stages=7, num_warps=2, pid_type='flat', range_flattens=[None, False], range_multi_buffers=[None, False], range_num_stages=[0, 0], range_unroll_factors=[0, 4], range_warp_specializes=[]) 2026-02-21T12:58:50.5737874Z [56s] Timeout after 30s compiling Config(block_sizes=[128, 128, 2], indexing=['pointer', 'pointer', 'pointer'], l2_groupings=[8], load_eviction_policies=['first', ''], loop_orders=[[1, 0]], maxnreg=32, num_sm_multiplier=4, num_stages=8, num_warps=8, pid_type='persistent_blocked', range_flattens=[False, True], range_multi_buffers=[False, False], range_num_stages=[1, 1], range_unroll_factors=[3, 4], range_warp_specializes=[]) 2026-02-21T12:58:56.2011562Z [61s] Timeout after 30s compiling Config(block_sizes=[16, 512, 8], indexing=['tensor_descriptor', 'pointer', 'pointer'], l2_groupings=[8], load_eviction_policies=['first', 'first'], loop_orders=[[0, 1]], maxnreg=64, num_sm_multiplier=16, num_stages=2, num_warps=2, pid_type='persistent_blocked', range_flattens=[None, True], range_multi_buffers=[None, None], range_num_stages=[2, 0], range_unroll_factors=[4, 2], range_warp_specializes=[]) 2026-02-21T12:58:56.3075530Z Initial population precompiling 100% ━━━━━━━━━━━━━━━━━━━━━ 100/100 3.6 configs/s 2026-02-21T13:48:02.4825793Z [3007s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 256, 16], indexing=['pointer', 'tensor_descriptor', 'pointer'], l2_groupings=[32], load_eviction_policies=['last', 'last'], loop_orders=[[1, 0]], num_sm_multiplier=4, num_stages=1, num_warps=2, pid_type='persistent_interleaved', range_flattens=[False, False], range_multi_buffers=[False, True], range_num_stages=[4, 1], range_unroll_factors=[0, 4], range_warp_specializes=[]) 2026-02-21T13:48:02.4828018Z Tensor-likes are not close! 
2026-02-21T13:48:02.4828169Z 2026-02-21T13:48:02.4828292Z Mismatched elements: 2144433975 / 2147483648 (99.9%) 2026-02-21T13:48:02.4828779Z Greatest absolute difference: 6656.0 at index (197501, 140) (up to 0.01 allowed) 2026-02-21T13:48:02.4829920Z Greatest relative difference: inf at index (15192, 72) (up to 0.01 allowed) 2026-02-21T13:48:02.4830365Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T13:48:02.4830598Z 2026-02-21T13:50:00.6852225Z [3126s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 2048, 32], indexing=['tensor_descriptor', 'tensor_descriptor', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['first', 'first'], loop_orders=[[1, 0]], num_sm_multiplier=64, num_stages=2, num_warps=32, pid_type='persistent_blocked', range_flattens=[True, None], range_multi_buffers=[False, True], range_num_stages=[4, 2], range_unroll_factors=[4, 1], range_warp_specializes=[]) 2026-02-21T13:50:00.6854247Z Tensor-likes are not close! 2026-02-21T13:50:00.6854400Z 2026-02-21T13:50:00.6854519Z Mismatched elements: 2144425011 / 2147483648 (99.9%) 2026-02-21T13:50:00.6854945Z Greatest absolute difference: 7168.0 at index (197949, 132) (up to 0.01 allowed) 2026-02-21T13:50:00.6855449Z Greatest relative difference: inf at index (15192, 72) (up to 0.01 allowed) 2026-02-21T13:50:00.6855895Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 
2026-02-21T13:50:00.6856133Z 2026-02-21T14:03:50.4378682Z [3955s] Skipping config with accuracy mismatch: helion.Config(block_sizes=[1, 32, 32], indexing=['tensor_descriptor', 'pointer', 'tensor_descriptor'], l2_groupings=[4], load_eviction_policies=['last', 'last'], loop_orders=[[0, 1]], maxnreg=128, num_sm_multiplier=128, num_stages=1, num_warps=16, pid_type='persistent_interleaved', range_flattens=[False, None], range_multi_buffers=[False, None], range_num_stages=[2, 4], range_unroll_factors=[1, 1], range_warp_specializes=[]) 2026-02-21T14:03:50.4380556Z Tensor-likes are not close! 2026-02-21T14:03:50.4380725Z 2026-02-21T14:03:50.4380865Z Mismatched elements: 2147462237 / 2147483648 (100.0%) 2026-02-21T14:03:50.4381362Z Greatest absolute difference: 5248.0 at index (207574, 4977) (up to 0.01 allowed) 2026-02-21T14:03:50.4381931Z Greatest relative difference: 3.171875 at index (233956, 2995) (up to 0.01 allowed) 2026-02-21T14:03:50.4382370Z Use HELION_AUTOTUNE_ACCURACY_CHECK=0 to disable this check. 2026-02-21T14:03:50.4382599Z 2026-02-21T14:04:42.2013180Z ##[error]The operation was canceled. 2026-02-21T14:04:42.2081649Z Post job cleanup. 
2026-02-21T14:04:42.2085495Z ##[command]/usr/bin/docker exec dfaa2b0d874681230c8443d6e8ad7743474c44dc43659fe49fb8f6fcfb8fa204 sh -c "cat /etc/*release | grep ^ID" 2026-02-21T14:04:42.3793557Z [command]/usr/bin/git version 2026-02-21T14:04:42.3833318Z git version 2.43.0 2026-02-21T14:04:42.3871446Z Temporarily overriding HOME='/__w/_temp/474b6bda-15c6-436d-a762-e332dcdfe2e8' before making global git config changes 2026-02-21T14:04:42.3872092Z Adding repository directory to the temporary git global config as a safe directory 2026-02-21T14:04:42.3875974Z [command]/usr/bin/git config --global --add safe.directory /__w/helion/helion 2026-02-21T14:04:42.3906794Z Removing SSH command configuration 2026-02-21T14:04:42.3912035Z [command]/usr/bin/git config --local --name-only --get-regexp core\.sshCommand 2026-02-21T14:04:42.3942355Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'core\.sshCommand' && git config --local --unset-all 'core.sshCommand' || :" 2026-02-21T14:04:42.4271940Z Removing HTTP extra header 2026-02-21T14:04:42.4275320Z [command]/usr/bin/git config --local --name-only --get-regexp http\.https\:\/\/github\.com\/\.extraheader 2026-02-21T14:04:42.4307156Z [command]/usr/bin/git submodule foreach --recursive sh -c "git config --local --name-only --get-regexp 'http\.https\:\/\/github\.com\/\.extraheader' && git config --local --unset-all 'http.https://github.com/.extraheader' || :" 2026-02-21T14:04:42.4583605Z Removing includeIf entries pointing to credentials config files 2026-02-21T14:04:42.4596846Z [command]/usr/bin/git config --local --name-only --get-regexp ^includeIf\.gitdir: 2026-02-21T14:04:42.4614821Z includeif.gitdir:/__w/helion/helion/.git.path 2026-02-21T14:04:42.4615199Z includeif.gitdir:/__w/helion/helion/.git/worktrees/*.path 2026-02-21T14:04:42.4615553Z includeif.gitdir:/github/workspace/.git.path 2026-02-21T14:04:42.4615895Z includeif.gitdir:/github/workspace/.git/worktrees/*.path 
2026-02-21T14:04:42.4621578Z [command]/usr/bin/git config --local --get-all includeif.gitdir:/__w/helion/helion/.git.path 2026-02-21T14:04:42.4645740Z /__w/_temp/git-credentials-9c9762d5-bda2-41fd-b36a-d25e9eae85c5.config 2026-02-21T14:04:42.4654002Z [command]/usr/bin/git config --local --unset includeif.gitdir:/__w/helion/helion/.git.path /__w/_temp/git-credentials-9c9762d5-bda2-41fd-b36a-d25e9eae85c5.config 2026-02-21T14:04:42.4688902Z [command]/usr/bin/git config --local --get-all includeif.gitdir:/__w/helion/helion/.git/worktrees/*.path 2026-02-21T14:04:42.4712947Z /__w/_temp/git-credentials-9c9762d5-bda2-41fd-b36a-d25e9eae85c5.config 2026-02-21T14:04:42.4721445Z [command]/usr/bin/git config --local --unset includeif.gitdir:/__w/helion/helion/.git/worktrees/*.path /__w/_temp/git-credentials-9c9762d5-bda2-41fd-b36a-d25e9eae85c5.config 2026-02-21T14:04:42.4754828Z [command]/usr/bin/git config --local --get-all includeif.gitdir:/github/workspace/.git.path 2026-02-21T14:04:42.4778360Z /github/runner_temp/git-credentials-9c9762d5-bda2-41fd-b36a-d25e9eae85c5.config 2026-02-21T14:04:42.4785497Z [command]/usr/bin/git config --local --unset includeif.gitdir:/github/workspace/.git.path /github/runner_temp/git-credentials-9c9762d5-bda2-41fd-b36a-d25e9eae85c5.config 2026-02-21T14:04:42.4817907Z [command]/usr/bin/git config --local --get-all includeif.gitdir:/github/workspace/.git/worktrees/*.path 2026-02-21T14:04:42.4842674Z /github/runner_temp/git-credentials-9c9762d5-bda2-41fd-b36a-d25e9eae85c5.config 2026-02-21T14:04:42.4849580Z [command]/usr/bin/git config --local --unset includeif.gitdir:/github/workspace/.git/worktrees/*.path /github/runner_temp/git-credentials-9c9762d5-bda2-41fd-b36a-d25e9eae85c5.config 2026-02-21T14:04:42.4883224Z [command]/usr/bin/git submodule foreach --recursive git config --local --show-origin --name-only --get-regexp remote.origin.url 2026-02-21T14:04:42.5160841Z Removing credentials config 
'/__w/_temp/git-credentials-9c9762d5-bda2-41fd-b36a-d25e9eae85c5.config' 2026-02-21T14:04:42.5311805Z Stop and remove container: a713bb541b394d40a9e372a466ae966c_nvidiacuda1281develubuntu2404_ca6977 2026-02-21T14:04:42.5316369Z ##[command]/usr/bin/docker rm --force dfaa2b0d874681230c8443d6e8ad7743474c44dc43659fe49fb8f6fcfb8fa204 2026-02-21T14:04:57.9538522Z dfaa2b0d874681230c8443d6e8ad7743474c44dc43659fe49fb8f6fcfb8fa204 2026-02-21T14:04:57.9584132Z Remove container network: github_network_3a5cb79b050545ab97b2786230e17941 2026-02-21T14:04:57.9588695Z ##[command]/usr/bin/docker network rm github_network_3a5cb79b050545ab97b2786230e17941 2026-02-21T14:04:58.4283998Z github_network_3a5cb79b050545ab97b2786230e17941 2026-02-21T14:04:58.4351968Z Evaluate and set job outputs 2026-02-21T14:04:58.4358321Z Cleaning up orphan processes